Estimating neighbourhood death rates using the random forest algorithm

Andrew John George Cairns*, Jie Wen, Torsten Kleinow

*Corresponding author for this work

Research output: Contribution to specialist publication (Article)


Recent decades have seen increasing evidence of mortality inequality between socio-economic groups in various national populations. Socio-economic characteristics are not themselves the cause of these inequalities. Rather, the socio-economic characteristics of a population are correlated with the prevalence of various health-related lifestyle factors such as smoking, diet and exercise.
In addition, they can be related to the availability of preventive health care, crime rates, air pollution and other external factors that affect health and mortality. Observed inequalities are often measured using existing socio-economic indicators such as income (e.g. Chetty et al., 2016), affluence (Cairns et al., 2019), deprivation (Villegas and Haberman, 2014) or education (e.g. Mackenbach et al., 2003, 2015).
However, many of these metrics were designed for other purposes. The English Index of Multiple Deprivation, for example, is a good predictor of mortality, but a customised mortality index might do better. Specifically, it might be possible to improve on existing approaches by combining individual pieces of socio-economic information at the individual or (as we do here) neighbourhood level and analysing them with modern data-science techniques: here, the random forest algorithm. This method allows us to capture how mortality rates respond to a range of variables, potentially in a non-linear way.
Original language: English
Specialist publication: Longevity Bulletin
Publication status: Published - 23 Aug 2023


  • LIFE Index
  • random forest algorithm
  • socio-economic mortality
