1 Introduction

Over the past century, the average human lifespan has increased significantly. In 1913, the global life expectancy was estimated to be 34.1 years, while in 2001 it increased to 66.6 years, and by 2015, it had further increased to 71.8 years (Riley 2005; Wang et al. 2016). Forecasting the evolution of mortality rates is very important for actuarial practice, as well as for health care and pension systems in general, and so, forecasts should be carefully assessed.

Yet, forecasts are based on past and present trends and a fundamental uncertainty is the extent to which these trends will remain valid in future. Hence, it is important to model uncertainty not only to the extent that it is expressed as variability around current trends, but also as to incorporate sources of variation that can affect, perhaps even reverse current trends. Human health depends on a large number of factors, including biological, environmental and social ones. These drivers may have reinforcing or opposing effects and are subject to varying degrees of uncertainty.

The most common mortality models, with Lee and Carter (1992) perhaps being the most famous, are extrapolative in nature. They are simple, robust and have good statistical properties, but rely on one critical assumption: that the past trends are representative of future developments. Ignoring factors that drive trends in mortality means being unable to model turning points. For example, the dissolution of the Soviet Union caused diverging patterns in mortality in Lithuania, Belarus and Russia (Grigoriev et al. 2010), three countries that had convergent paths until then. Modeling the future when the trend changes is difficult. But in that case, models with covariates would have allowed a researcher in 1991 to model possible pathways, based on possible economic and social developments, while extrapolative models couldn’t have.

This paper aims to develop a model that can incorporate a wide range of covariates and be used to forecast mortality under different scenarios. The main assumption is that mortality and the covariates do not diverge in the long run, without requiring causality assumptions.

The contribution of this study is to show how broadening the scope of external variables included in stochastic mortality models can improve forecasts compared to Niu and Melenberg (2014) and at the same time allow actuarial practitioners and policymakers to use interpretable models to assess the impact of different scenarios on mortality. The inclusion of covariates from different domains allows the researcher to model the full uncertainty surrounding mortality improvements. Mortality rates do not exist in a void: they are the product of the material, technological and environmental characteristics of the populations subject to them. Additionally, the study will explore the extent to which different variables have a disparate impact on mortality in different countries, highlighting the need to tailor the analysis to each country’s specific characteristics and history.

Previous research in the field of stochastic mortality models incorporating external variables is relatively scarce. Economic prosperity, among the first variables to be associated with reduced mortality (Preston 1975), is the main and often the only factor used to drive mortality reductions. Niu and Melenberg (2014) and Seklecka et al. (2019) approach the issue with a single-population model, while Boonen and Li (2017) present a multipopulation model using principal components, an approach used in the present study as well. More recently, environmental effects have been considered, with Dutton et al. (2020) extending Seklecka et al. (2019) by including the effect of temperature anomalies on mortality. Lifestyle factors like nutrition and smoking have been considered by French and O’Hare (2014). On the other hand, there is a vast literature of both demographic and medical studies that highlight links between mortality and a multiplicity of variables, including the ones in this study. Healthy habits are consistently linked with lower mortality, while air pollution is linked to excess deaths. However, to the author’s knowledge this it is the first study in which variables from these three domains are included together in a stochastic mortality model.

The rest of the paper is organized as follows. Section 2 presents a review of the literature on stochastic mortality models, with a focus on the ways to include external variables in the modeling process. The single-population model with external variables (SEV) is introduced in Sect. 3 along with details about the fitting process, about the variables and an analysis of the characteristics of the time series used. The results are presented in Sect. 4, with a focus on five countries and a general overview of goodness of fit and forecasting performance for all countries considered. Section 5 concludes with a discussion of the findings.

2 Literature review

The approaches to forecast mortality are generally classified into three categories, namely the extrapolation, explanation and the expectation ones (Stoeldraijer et al. 2013). Extrapolation approaches assume that existing trends and patterns in mortality rates by age are regular enough to continue into the future. Explanation approaches model future mortality with exogenous variables that have a known link to mortality, like smoking and lung cancer, using structural or epidemiological equations. The expectation approach instead incorporates expert opinions regarding various aspects of mortality. The three approaches can coexist in a single model, and the boundaries between the approaches can be blurred. Using external covariates to model mortality is close to the explanation approach. However, when applied without explicitly modeling the dependency structure between variables, this results in a model closer to extrapolative ones. The present work hence draws partly from the explanation and partly from the extrapolation approach, which is the one stochastic modeling has mostly concentrated on (Cairns et al. 2011). A common approach to extrapolative modeling is to extend past trends into the future by fitting linear trends for log-mortality rates with a temporal component that can be specified, for example, through a random walk with a linear or quadratic drift. These models may be accurate in the short run and backtesting, but they are unable to identify turning points and are not useful for assessing mortality under various development and policy scenarios. Even when models allow for the incorporation of expert judgment to set mortality targets (e.g., Boumezoued et al. (2019)), these targets are essentially set exogenously.

The link between economic development and mortality has been widely studied over the past decades. Preston (1975) described the association between life expectancy and per capita income almost 50 years ago. Further studies established both positive (Brenner (2005), Birchenall (2007)) and negative (Tapia Granados 2008; Tapia Granados and Ionides 2011; Tapia Granados and Ionides 2017) effects of economic growth.

There are examples of stochastic mortality models that incorporate economic variables in the context of extrapolative models based on Lee and Carter (1992). Hanewald (2011) describes a relationship between the latent factor of the Lee–Carter model, GDP and unemployment. Building on that, Niu and Melenberg (2014) propose a more general model that allows for multiple latent and multiple exogenous factors and offers an application using GDP as a predictive variable. An extension to the multipopulation case is provided by Boonen and Li (2017), with a model that allows for additional exogenous variables and where the latent factor of the Lee–Carter model is dropped entirely. Most notably, they focus on multiple groups of populations, including post-Soviet countries who experienced a mortality increase after the drop in GDP due to the dissolution of the Soviet Union: the temporary increase in mortality in those countries before GDP recovered solidifies the relationship between GDP and mortality. Bozzo et al. (2021) represented a further development applied to the mortality between regions in Italy.

Other models lend themselves to the inclusion of economic variables. The model from O’Hare and Li (2012), which extends the model of Plat (2009) and allows for additional terms to estimate young mortality, is modified by Seklecka et al. (2019) by using the correlation between GDP and mortality. Dutton et al. (2020) took this further by multiplying a term of the model by the correlation between temperature anomalies and mortality, emphasizing the importance of environmental effects. Non-economic explanatory variables have also been included in some models. For example, French and O’Hare (2014), building on King and Soneji (2011) and on the literature that links mortality to lifestyle and dietary variables, present a model with GDP, health care expenditures, tobacco and alcohol consumption, fat intake and fruit and vegetables consumption.

More recently, Li and Shi (2021) propose a global vector auto-regressive (GVAR) approach that can model mortality rates for a large number of populations with the inclusion of global factors. These could be external factors like GDP or other covariates. In their application, however, the authors only employ the average of the mortality rates of the 15 countries they include, as a proxy for the global advancement of medical treatments.

An important challenge when modeling mortality by including multiple economic, social, environmental and technological factors is the lack of micro-level data. Ideally, death rates would need to be available for every possible subpopulation defined by every combination of the factors one wishes to study, or even at the individual level in case of continuous factors, covering several decades and the whole population. Although there are some studies working with individual data (i.e., Cairns et al. (2019)), the time frame of available data and the selection of covariates is usually limited, resulting in models that apply only for a limited age range. Therefore, it is necessary to use factors that can be thought of having a wider, general effect on a population. Since the relationship between the external factor and mortality is rarely direct, much less known with certainty, these models lie somewhere between pure extrapolation models and explanation models.

An alternative approach to modeling mortality with external factors is to divide the population into subpopulations based on a specific covariate, such as an affluence or development index, and use a multipopulation model for the different subpopulations, ensuring coherent forecasts. This method does not explicitly include the covariate in the model, but instead forecasts mortality for each subpopulation separately, assuming that the subdivision of the population according to the covariate is stable and accurate. Like regions in a state (i.e., Bozzo et al. (2021) and Danesi et al. (2015) for two examples with Italian regions), other groupings of the population at the subnational level can be reasonably believed to have a converging mortality pattern in the long run, while allowing for divergence in the short run. For ranked groups, a desirable outcome is for groups to preserve their ordering, that is, to avoid crossovers in estimates of mortality rates.

An example of this approach is given by Villegas and Haberman (2014), where a composite deprivation index is used to rank small areas in England, classify them based on their rank and track their mortality counts. The authors show how less deprived quintiles had lower mortality throughout the period and also experienced faster mortality declines than more deprived quintiles, leading to a widening of mortality differentials by socioeconomic status. The population can be subdivided into subpopulations either by identifying homogenuous geographical entities (Villegas and Haberman 2014; Wen et al. 2021) or by classifying the individuals directly (Cairns et al. 2019).

3 Methodology and data

The availability and comparability of data across multiple countries is a main concern of this study, which meant that data sources have been carefully selected in order to ensure comparability and both the countries analyzed and the variables included reflect the availability of high-quality, comparable data.

The mortality data used in this article is limited to males and has been sourced from the Human Mortality Database (HMD), imported into R through the HMDHFDplus package. All 42 available countries were considered. The choice of the external variables has been more problematic. On the one hand, plausible covariates are variables that have already shown a link with mortality at the individual level or in forecasting or non-forecasting population-wide models. These include GDP, health care spending, other affluence measures, education, up to lifestyle variables like alcohol and tobacco consumption, obesity, marital/cohabitation status, etc. On the other hand, comparability required that the external variables be chosen among those available from reliable, official data sources, like the databases from the United Nations’ World Health Organization (WHO) or Food and Agriculture Organization (FAO), Penn World Tables and others, with time series extending as far back as the 1970s. Due to the absence of data for one or more covariates, 5 countries (Hong Kong, Israel, Luxembourg, New Zealand, Taiwan) have been excluded and the analysis performed on the remaining 37. The model estimates males and females separately and although only males are discussed for brevity, the covariates are available for females as well and broken down by gender where possible. Model estimation has been carried out with a version of the StMoMo R package modified by the author to allow for the inclusion of external variables.

3.1 Modeling methodology

The model proposed in this article is based on the Lee–Carter model, where the latent factor is substituted by one or more factors obtained from external variables through principal components analysis (PCA). The methodology is a mix of Niu and Melenberg (2014), Boonen and Li (2017) and French and O’Hare (2014) and can be summarized as follows:

  1. 1.

    Select possible covariates based on availability of data for a wide set of countries, an extended number of years and from reputable data sources. In addition to GDP and other affluence-related variables, consider variables related to environmental effects (i.e., air quality) and to lifestyle choices (i.e., alcohol consumption) that have a strong empirical link to mortality;

  2. 2.

    Perform tests on the possible covariates to assess their characteristics, i.e., determine whether they are stationary, non-stationary or stationary with structural breaks. Explore the long-term relationships between them and mortality, i.e., by checking whether cointegration relationships exist and are stable between countries;

  3. 3.

    For all countries with available data, fit the following single-population model, based on the model specified by Niu and Melenberg (2014):

    $$\log \left( {m_{x,t} } \right) = a_{x} + \mathop \sum \limits_{j = 1}^{J} b_{j,x} k_{j,t} + \mathop \sum \limits_{l = 1}^{L} c_{l,x} g_{l,t} + \epsilon_{x,t} ,$$
    (1)

with \(J\) age-period terms (\(J=0\) or \(J=1\) in the applications) and \(L\) orthogonal external factors \({g}_{l,t}\). The external factors are combinations of multiple external variables, in order to reduce dimensionality and solve identification issues akin to the use of principal components in Boonen and Li (2017).

The proposed approach tries to minimize assumptions about the regularity of mortality rates and of the covariates. The main assumption instead is that there are stable long-term relationships between covariates and mortality, even if the series themselves aren’t stationary or exhibit structural breaks. To this end, the cointegration analysis at step 2 investigates the existence of said long-term relationships: the cointegration coefficients themselves, though, aren’t used in the model. The assumption of a long-term relationship between variables is a crucial one if the resulting model is to be used in scenario planning where forecasts are required under deviations from current trends.

In the stochastic mortality modeling framework presented, covariates are treated uniformly across all age groups. Variations in their effect on mortality rates across ages are briefly discussed in Sect. 3.3. Overall, while some differences in impact exist, they do not appear substantial at a preliminary analysis and any age-specific effects are anticipated to be captured through the age loadings within the model’s components. The model is specifically tailored for planners who may encounter challenges in accessing or forecasting age-disaggregated data. For certain variables, such as economic or environmental ones, the breakdown by age may be impractical or impossible.

For this kind of applications, the practitioner faces a trade-off between model simplicity and data availability versus the intricate modeling of external effects on mortality, involving the identification of causal relationships, lagged effects, etc. The approach advocated in this study follows Niu and Melenberg (2014) and Boonen and Li (2017), favoring the former: in general, practical usability necessitates simplifications in any model, though balancing accuracy and feasibility amidst real-world data constraints remains a challenge.

3.2 Parameter estimation

The fitting procedure of the SEV models draws upon the Generalized Age-Period-Cohort (GAPC) models as implemented in the StMoMo R package (Villegas et al. 2018), which is itself based on the gnm R package (Turner and Firth 2023), drawing from Niu and Melenberg (2014) as well. The StMoMo package has been extended in order to allow for external terms in the fitting procedure. Even though StMoMo allows offsetting terms to be included in the model, they are wholly external to the fitting process and therefore age loadings can’t be fit for them without modifying the package. The following implementation bridges this gap in capability.

The SEV model for \(J\) age-period terms and \(L\) external terms is described in Eq. (1). The GAPC models model deaths instead of death rates. This is equivalent to fitting death rates and therefore death rates will be used in the following instead of deaths and exposures for compactness.

The terms \({g}_{l,t}\) are linear combinations of the \(O\) external variables \({h}_{o,t}\), with \(L<O\). In the applications, \(J=1\).

The fitting algorithm is as follows:

  • Obtain yearly deaths by age \({\text{D}}_{x,t}\), their correspondent exposures \({\text{E}}_{x,t}\) (the ratio of these two quantities is equivalent to death rates \({\text{m}}_{x,t}\)) and \(O\) external variables \({h}_{o,t}\) for the given country;

  • Perform a singular value decomposition on the scaled matrix of \({h}_{o,t}\) external variables, retain the \(L\) components which explain a share of variance larger than a set threshold, up to a given maximum of components, denote them with \({g}_{l,t}\);

  • Estimate the model with gnm, obtaining the quantities \({a}_{x}\), \({b}_{j,x}\), \({k}_{j,t}\) and \({c}_{l,x}\);

  • Fit the rates using the estimated parameters;

  • Transform the parameters so that they satisfy the identifiability constraints;

  • Fit the rates with the transformed parameters and check whether the transformation preserves the rates. If so, output the model.

The variance threshold, the maximum number of principal components and whether to include an age-period term are parameters set before fitting. The parameters, as is usual for mortality models based on the Lee–Carter model, are not identified without additional constraints. For example, setting \(J=1\), with \({c}^{*}\in {\mathbb{R}}^{J}\), \({d}^{*}\ne 0\) and \({e}^{*}\in {\mathbb{R}}^{L}\), it is possible to obtain, for the logarithm of the fitted rate \({m}_{x,t}\):

$$\begin{aligned} m_{{x,t}} = & a_{x} + b_{x} k_{t} + \sum\limits_{{l = 1}}^{L} {c_{{l,x}} } g_{{l,t}} \\ = & a_{x} + b_{x} k_{t} + \sum\limits_{{l = 1}}^{L} {c_{{l,x}} } g_{{l,t}} + \sum\limits_{{l = 1}}^{L} {e_{l} } b_{x} g_{{l,t}} - \sum\limits_{{l = 1}}^{L} {e_{l} } b_{x} g_{{l,t}} + b_{x} c^{*} - b_{x} c^{*} \\ = & \left( {a_{x} - b_{x} c^{*} } \right) + \frac{{b_{x} }}{{d^{*} }}d^{*} \left( {k_{t} - \sum\limits_{{l = 1}}^{L} {e_{l}^{*} } g_{{l,t}} + c^{*} } \right) + \sum\limits_{{l = 1}}^{L} {\left( {c_{{l,x}} + e_{l}^{*} b_{x} } \right)} g_{{l,t}} \\ = & \widetilde{a}_{x} + \widetilde{b}_{x} \widetilde{k}_{t} + \sum\limits_{{l = 1}}^{L} {\widetilde{c}_{{l,x}} } g_{{l,t}} , \\ \end{aligned}$$

with

$$\begin{array}{ll}{\widetilde{a}}_{x}& ={a}_{x}-{b}_{x}{c}^{*}\\ {\widetilde{b}}_{x}& =\frac{{b}_{x}}{{d}^{*}}\\ {\widetilde{k}}_{t}& ={d}^{*}\left({k}_{t}-\sum_{l=1}^{L}{e}_{l}^{*}{g}_{l,t}+{c}^{*}\right)\\ {\widetilde{c}}_{l,x}& ={c}_{l,x}+{e}_{l}^{*}{b}_{x}.\end{array}$$

It follows then that \({c}^{*}\), \({d}^{*}\) and \({e}_{l}^{*}\) need to be functions of the parameters such that, after the transformation, calculating \({c}^{*}\), \({d}^{*}\) and \({e}_{l}^{*}\) again yield \({c}^{*}=0\), \({d}^{*}=1\) and \({e}_{l}^{*}=0\).

The following normalization constraints based on Niu and Melenberg (2014) and Boonen and Li (2017) are proposed:

$$\sum_{x=1}^{N}{b}_{x}=1,$$
$$\sum_{t=1}^{T}{k}_{t}=0,$$
$$k=\left({k}_{1},\dots ,{k}_{T}\right)\ne 0,$$
$$\sum_{t=1}^{T}{k}_{t}{g}_{l,t}=0,\hspace{0.25em}{\text{for}}\hspace{0.25em}l=1,\dots ,L,$$

with the last constraint describing that the sample covariance of \({k}_{t}\) and \({g}_{l,t}\) is \(0\). These correspond to the following transformations:

$$\begin{array}{cc}{c}^{*}& =-\frac{\sum_{t=1}^{T}{k}_{t}}{T}\\ {d}^{*}& =\sum_{x=1}^{N}{b}_{x}\\ {e}_{l}^{*}& =cov\left({g}_{lt},{k}_{t}\right)/var\left({g}_{lt}\right)\hspace{0.25em}{\text{for}}\hspace{0.25em}l=1,\dots ,L,\end{array}$$

with cov() and var() denoting sample covariance and variance, respectively. In order to demonstrate that the proposed constraints identify the model uniquely, the following theorem must be proven:

Theorem 1

(Identification) Let \(m={\left({m}_{x,t}\right)}_{x=1,\dots ,N,t=1,\dots ,T}=m\left(\theta \right)\), where \(m=m\left(\theta \right)\) satisfies \({m}_{x,t}={a}_{x}+{b}_{x}{k}_{t}+\sum_{l=1}^{L}{c}_{l,x}{g}_{l,t}\) for some \(\theta =\left({\left({a}_{x}\right)}_{x=1,\dots ,N},{\left({b}_{x}\right)}_{x=1,\dots ,N},{\left({k}_{t}\right)}_{t=1,\dots ,T},{\left({c}_{l,x}\right)}_{x=1,\dots ,N,l=1,\dots ,L}\right)\). Then the parametrization \({\theta }^{0}\) satisfying the normalization constraints above satisfies the following:

  • \({\theta }^{0}\) is a function of \(\theta\).

  • \(m\) is a function of \(\theta\) through \({\theta }^{0}\).

  • The parametrization of \(m\) by \({\theta }^{0}\) is exactly identified. That is, if \({\theta }^{1}\ne {\theta }^{2}\) are two sets of parameters satisfying the normalization constraints, then \(m\left({\theta }^{1}\right)\ne m\left({\theta }^{2}\right)\).

The proof is in the Appendix.

3.3 Variables

The external variables chosen are meant to represent widely available and easily measurable variables that may have a plausible, although perhaps weak or indirect, effect on mortality. This is consistent with the overall focus on forecast improvement and scenario building. They are as follows (sources after dashes, literature linking them to mortality in parentheses):

  • Average height of men aged 18, as height is linked with a lower disease load in children which affects mortality—(NCD-RisC) (2016) and (NCD-RisC) (2020) estimate, considered as non-stochastic quantities [Hatton (2011); Quanjer (2023)]

  • Real GDP per capita—Feenstra et al. (2015) and PWT (Boonen and Li 2017; Niu and Melenberg 2014; Seklecka et al. 2019)

  • Age-standardized share of men with raised blood pressure—Zhou et al. (2017), NCD-RisC estimate, considered as non-stochastic quantities (Benjamin et al. 2017; Lim et al. 2012; Sundström et al. 2011)

  • Fruit consumption per capitaFootnote 1—FAO (Benjamin et al. 2017; Lim et al. 2012)

  • Vegetable consumption per capita—FAO (Benjamin et al. 2017; Lim et al. 2012)

  • Daily supply of calories per person—FAO (Benjamin et al. 2017; Lim et al. 2012)

  • Recorded alcohol consumption in liters per capita (15 +)—WHO and Wine Economics Research Centre, University of Adelaide, (Benjamin et al. 2017; Holmes and Anderson; 2017; Lim et al. 2012)

  • Cigarette consumption—International Cigarette Consumption Database, (Benjamin et al. 2017; Lim et al. 2012; Poirier et al.; 2019)

  • Surface temperature anomaly in degrees Celsius (difference between average country temperature and 1961–1990 global average temperature)—HadCRUT4 (Patz et al. 2005)

  • Fossil fuel consumption per capita—BP Statistical Review of World Energy via Our World in Data, a proxy variable for air pollution (Lim et al. 2012)

The preceding variables have both data available for a very high share of HMD countries and exhibit a high correlation with both age-specific death rates and the \({k}_{t}\) time index of the Lee–Carter model.

The graphs of correlations with mortality by age and country show distinctive patterns, displaying either correlations above 0.75 in absolute value for most ages or smaller correlations that are erratic across ages. An example with three variables in seven countries is shown in Fig. 1. Correlations usually become much weaker or disappear entirely at ages above 90. In some cases, e.g., for the share of men that are obese, the correlation with mortality weakens markedly for ages from 20 to 40, around the mortality hump. In countries with a more complicated mortality history like Russia or with shorter time series like Slovenia, correlations for a given variable across ages are more likely to be erratic than in countries with longer time series and a simpler mortality history.

Fig. 1
figure 1

Correlations between mortality rates and three potential covariates, by age

Table 1 summarizes the correlation between yearly age-standardized mortality rates and external factors for seven HMD countries. The age weights used are the WHO 2000–2025 Standard Million (Ahmad et al. 2001).

Table 1 Correlations between external variables and age-standardized mortality rates

3.4 Stationarity of mortality time indices

Having covariates that are correlated with mortality is not, by itself, enough to build a mortality model, since if any time series is a non-stationary process, the model estimates will be inconsistent and the correlations may be spurious. It is therefore necessary to investigate whether mortality rates—summarized by the time index \({k}_{t}\) of the Lee–Carter model—and external variables are stationary or not. In the latter case, cointegration analysis needs to be performed in order to ascertain whether the time series have a common stochastic trend.

As in Seklecka et al. (2019), the Lee–Carter \({k}_{t}\) index for male mortality and all countries has been tested for stationarity with both the Phillips–Perron (Phillips and Perron 1988) and KPSS (Kwiatkowski et al. 1992) tests, for which the null hypotheses are non-stationarity and stationarity, respectively. The version of the KPSS test for which the null is trend stationarity has been used.

The two tests agree on non-stationarity of \({k}_{t}\) (by not rejecting the null hypothesis for the Phillips–Perron test and by rejecting it for the KPSS test, in both cases at the 5% level) in 35 cases out of 42 countries. The tests disagree on the non-stationarity of \({k}_{t}\) for Japan, Republic of Korea and Russia. The \({k}_{t}\) time index appears stationary for Chile, Croatia, Hong Kong and Taiwan. For these seven countries, with the exception of Japan and Russia, the time series of mortality rates start relatively late, between 1970 for Taiwan and 2003 for the Republic of Korea. The detailed results are presented in Table 2, with p values > 0.1 for the KPSS test presented as 0.1 and p values < 0.01 presented as 0.01.

Table 2 Phillips–Perron and KPSS tests on the Lee–Carter time index

3.5 Structural breaks in mortality and covariates

Following Boonen and Li (2017) and Berkum et al. (2016), the possible presence of structural breaks has been investigated. While in Boonen and Li (2017) structural breaks are only used to calibrate forecasts, a non-stationary series may instead be trend stationary with structural breaks (Perron 1989), which may alleviate issues with variables who show a high order of integration. Moreover, the presence of a common structural break in both mortality series and an external covariate, with a similar trend for the two variables both before and after the break, would reinforce the credibility of a link between them.

The variables considered were the \({k}_{t}\) Lee–Carter mortality index and the stochastic covariatesFootnote 2: logarithm of real GDP, temperature anomalies, fossil fuel consumption, caloric supply, fruit and vegetable consumption, alcohol consumption, cigarette consumption.

The methods to test for structural change are the generalized fluctuation tests, described by Zeileis et al. (2010) and implemented in the R package strucchange. Recursive residuals have been used to analyze both cumulative sums of residuals (CUSUM processes) and moving sums of residuals (MOSUM processes). For each time series, a linear time trend is estimated and then residuals are calculated. The series is deemed to have at least a structural break if the null hypothesis of no break is rejected by the structural change test with \(\alpha =0.05\) for at least one of the CUSUM and MOSUM processes. The optimal break points are then estimated by using the algorithm of Bai and Perron (2003), with up to 1 break point identified per series.

Out of 42 countries, 35 have at least one break point in either the \({k}_{t}\) Lee–Carter time index or in a covariate. Belgium, Hong Kong, Israel, Republic of Korea, Luxembourg, New Zealand and Taiwan show no breakpoints. More specifically, 21 countries have at least one structural break in the \({k}_{t}\) index. Temperature anomalies show a structural break in 3 countries, fossil fuel consumption in 11 countries, fruit and vegetable consumption in 13 countries, while the other covariates have a structural break in between 16 and 20 countries.

For each of the 21 structural breaks in the time index \({k}_{t}\), it has been checked whether a covariate had a structural break in the same or in the preceding year. Fossil fuel consumption has a concurrent or immediately preceding break point in two cases, same as cigarette consumption per capita, followed by logarithm of real GDP, caloric supply, alcohol consumption and fruit and vegetable consumption with one case.

To sum up, both the mortality index \({k}_{t}\) and the external variables are often non-stationary and prone to having structural breaks even in relatively short time series (40 years or less). It does not appear, though, that breaks in \({k}_{t}\) are systematically preceded or followed by breaks in one or more of the external variables; hence, there is no evidence that a structural break in a variable causes a break in \({k}_{t}\) or vice versa.

Additionally, to check whether the relationship between mortality and the other variables is stable over time, it has been investigated whether there is a structural break in the series of the residuals of the time index \({k}_{t}\) regressed on the stochastic external variables. The only country to show such a break is East Germany, with a break estimated in 1996. When regressed on all variables, including the non-stochastic share of men with raised blood pressure and male height at age 18, Spain shows a structural break around 1985 and Poland in 1988. Except these three countries, the relationship between the mortality index and the external variables is stable over time.

3.6 Integration of covariates

It has been shown that both the time index \({k}_{t}\) and the external variables are non-stationary in most cases. In order to be able to model their long-term relationship, non-stationary variables need to be cointegrated and a prerequisite for cointegration is that the series are I(1). To investigate whether this is the case, the order of integration of all variables for all countries has been explored with both the Phillips–Perron and the KPSS tests (both at \(\alpha =0.05\)), with the following procedure:

  • Run both the Phillips–Perron test and the KPSS test on the series;

  • If both tests agree on non-stationarity (that is, the Phillips–Perron does not reject the null and the KPSS test rejects the null), differentiate the series and repeat the procedure on the differentiated series;

  • If both tests disagree with each other (i.e., they both fail to reject the null), for illustrative purposes only add 0.6 to the order of integration if the Phillips–Perron test points to non-stationarity and 0.4 if the KPSS test points at non-stationarity.

The results are presented in Fig. 2. Temperature anomalies are mostly stationary, with 5 countries non-stationary according to the Phillips–Perron test only and 1 country non-stationary according to the KPSS test only. The other variables are generally non-stationary, with multiple cases of discordance between the PP and KPSS tests about whether a series is I(1) or I(2). Alcohol consumption in the Republic of Korea, which has a short time series, is I(3) according to the Phillips–Perron test and I(0) according to the KPSS test, while for real GDP in Croatia is I(3) for the Phillips–Perron test and I(1) for the KPSS test. The procedure has been repeated considering structural breaks as well for series that had an order of integration of 1 or above. For series that had an order of integration below 1, structural breaks were ignored. Due to the shortness of the series, considering the structural break usually results in the null not being rejected for both the Phillips–Perron and the KPSS tests.

Fig. 2
figure 2

Orders of integration for all countries by variable

All in all, temperature anomalies are mostly stationary and the other variables are non-stationary. No series is unambiguously I(2) or higher, though for a sizable number of series the tests disagree whether they are I(1) or I(2). When considering structural breaks, the series are either I(1) or are non-stationary according to the Phillips–Perron test and stationary according to the KPSS test. Therefore, it cannot be argued that the series are stationary with structural breaks.

3.7 Cointegration analysis

As already outlined, most of the time series considered are I(1). Following the methodology outlined in Seklecka et al. (2019), the presence of cointegration relationships in the data has been explored. Cointegration is a prerequisite for producing consistent estimates in case of non-stationary time series. The procedure by Johansen (1991) has been applied to all countries with at least 20 years of data (27 countries), assuming a linear trend in cointegration. The results are presented in Table 3. With the exception of Denmark and Finland, all countries have at least one cointegration relationship with a p value under 5% and 23 countries (85%) have at least one cointegration relationship with a p value under 1%. In both Denmark and Finland the p value is just above 5%. For countries with less than 20 years of data, the series were not tested for cointegration due to the small number of years available. Although the long-run behavior of the series has not been ascertained for these countries, the fact that the series are cointegrated for almost all countries that were tested means that cointegration is likely in countries with short time series as well.

Table 3 Cointegration relationships between the Lee–Carter time index and external variables

Despite the non-stationary nature of both mortality rates, represented by the Lee–Carter time index (\({k}_{t}\)), and the covariates, the identification of a cointegration relationship points to their long-term coherence. This implies that these variables do not deviate significantly from each other over time, allowing for a reliable modeling approach with consistent estimates. Furthermore, the infrequent occurrence of structural breaks in \({k}_{t}\) when regressed on the covariates reinforces this finding.

4 Results

The SEV model is applied to male mortality between the ages of 40 to 90, with a maximum of two principal components and a threshold of 15% of variance explained by the principal component in order for it to be included in the model. The variance threshold and the limit on the maximum number of principal components ensure the best combination of fit and parsimony, as shown in Sect. 4.5 along with a brief discussion of alternative specifications. The 15% threshold was chosen since a preliminary principal components analysis on the covariates for a subset of countries evidenced a consistent presence of at least one principal component beyond the first explaining more than 10% of the variance, with a share of variance explained ranging between 15 and 25% depending on the country.

The overall performance of the SEV model for all countries analyzed is discussed first, using several goodness of fit measures comprising both absolute and percentage errors. The distribution of age at death is concentrated on higher ages; therefore, absolute residuals will be more influenced by the fit at old ages, while percentage residuals are more sensitive at younger ages, when the number of deaths is small and a small residual in absolute terms can lead to large errors in percentage terms. Subsequently, the forecasting performance is evaluated for various jump-off years, which represent the starting points for mortality projections based on historical data.

The impact of individual variables on mortality is examined by investigating the loadings of principal components, which form the external factors \({g}_{l,t}\) used in mortality modeling. Afterward, the contribution of specific variables to overall mortality reductions is scrutinized, focusing on selected countries for analysis.

The Appendix contains an in-depth discussion of the results for a selection of countries, with a specific focus on whether the model fit captures the mortality trend adequately by analyzing the model residuals, again both in raw number of deaths and in percentage terms.

4.1 Goodness of fit and model comparison

The fit of the SEV models is compared to the Lee–Carter model and to the Niu-Melenberg model, estimated on the same set of years and ages, using the mean absolute deviation (MAD), mean absolute percentage error (MAPE) and the Bayesian information criterion (BIC). They are defined as follows, with the predicted deaths \({\widehat{D}}_{x,t}=exp\{{a}_{x}+\sum_{j=1}^{J}{b}_{j,x}{k}_{j,t}+\sum_{l=1}^{L}{c}_{l,x}{d}_{l,x}{g}_{l,t}\}{E}_{x,t}\):

$${\text{MAD}} = \frac{{\mathop \sum \nolimits_{x} \mathop \sum \nolimits_{t} \left| {D_{x,t} - \hat{D}_{x,t} } \right|}}{XT}$$
$${\text{MAPE}} = \frac{{\mathop \sum \nolimits_{x} \mathop \sum \nolimits_{t} \frac{{\left| {D_{x,t} - \hat{D}_{x,t} } \right|}}{{D_{x,t} }}}}{XT}$$
$${\text{BIC}} = m\;\log M - 2\log \hat{L},$$

with \(X\) being the number of ages, \(T\) the number of years, \(m\) the number of parameters, \(M\) the number of observations and \(\widehat{L}\) the likelihood in the model. A lower value of the BIC indicates a better fit.

The results are presented in Figs. 3, 4 and 5, while a more detailed breakdown is provided in the Appendix. The SEV models outperform both the Lee–Carter and the Niu-Melenberg models in terms of both MAD and MAPE. As the two metrics are each more sensitive to different ages, better performance in terms of both suggests a better fit at all ages.

Fig. 3
figure 3

Mean absolute deviation (MAD) for SEV, Niu-Melenberg and Lee–Carter models

Fig. 4
figure 4

Mean absolute percentage error (MAPE) for SEV, Niu-Melenberg and Lee–Carter models

Fig. 5
figure 5

Bayesian information criterion (BIC) for SEV, Niu-Melenberg and Lee–Carter models

In terms of BIC, the SEV models have either a lower BIC or a very close BIC to both the Lee–Carter and Niu-Melenberg models, outperforming the Lee–Carter model for 18 countries out of 37 and the Niu-Melenberg model for 12 countries out of 37. The proposed single-population models therefore improve the fit beyond their cost in additional model complexity for between a third and half of the countries examined, while in the other countries the improved fit comes at a negligible added complexity.

4.2 Forecasting performance

Even more important than fitting historical data is the model’s ability to forecast future mortality rates. The forecasting performance of the single-population models is evaluated on historical data by estimating the model up to a jump-off year, forecasting the remaining years until the end of the sample and then comparing the forecasts with the actual mortality rates. The metric used is the relative root mean forecast square error (RMFSE), as in Boonen and Li (2017).

Given a jump-off year \(\widehat{u}\), a predicted logarithm of mortality rate \(\widehat{log\hspace{0.25em}{m}_{x,i,t}}\) for country \(i\), age \(x\) and year \(t\), and \({U}_{i}\) being the end of sample year for population \(i\), the RMFSE is:

$${\text{RMFSE}}_{M} \left( {i,\hat{u}} \right) = \sqrt {\frac{1}{{N\left( {U_{i} - \hat{u}} \right)}}\mathop \sum \limits_{{u = \hat{u} + 1}}^{{U_{i} }} \mathop \sum \limits_{x = 0}^{N} \frac{{\left( {\log m_{i,x,u} - \widehat{{\log m_{x,i,t} }}} \right)^{2} }}{{\left| {\log m_{i,x,u} } \right|}}}$$

The period term \({k}_{t}\) has been modeled as a random walk with drift. The models are evaluated with jump-off years between 2000 and 2010, in order to compare different forecast horizons. The results are presented for all countries in Table 4 for years 2000, 2005 and 2010. For the whole span of jump-off years 2000–2010, the results are presented in Fig. 6 for a group of low-mortality countries with similar mortality paths and in Fig. 7 for a group of similar high-mortality countries.

Table 4 Ratio between root mean square forecast squared error (RMFSE) for single-population models and correspondent Niu-Melenberg (NM) models, jump-off year 2000, 2005 and 2010, only countries with at least 9 years of data up to jump-off year
Fig. 6
figure 6

Difference between RMFSE for the SEV model for the Niu-Melenberg model, low-mortality countries (SEV is performing better if the line is under the dashed line)

Fig. 7
figure 7

Difference between RMFSE for the SEV model for the Niu-Melenberg model, high-mortality countries (SEV is performing better if the line is under the dashed line)

The forecasting performance has been tested for a given jump-off year if the model had at least 10 years of data up to the jump-off year included, if data was available for at least one year after the jump-off year and if the country is big enough to have at least 5.000 deaths per year across all ages considered.

For 12 countries out of 23, the ratio between the RMFSE of the SEV model and the Niu-Melenberg model is lower than 1, that is, the SEV model outperforms the Niu-Melenberg model in terms of RMFSE. For jump-off years 2005 and 2010 the countries outperforming the Niu-Melenberg model are 18 out of 31 and 12 out of 34, respectively. Overall, adding external variables beyond GDP improves the forecasting performance of the model for about half of the countries.

The relative performance of the SEV and Niu-Melenberg models can vary from year to year. As shown in Fig. 6, the SEV model is generally outperforming the Niu-Melenberg model for Australia, Switzerland and Canada.Footnote 3 For France, Italy and Spain, the better model changes from year to year. Still, for all six countries there are years when the SEV model outperforms the Niu-Melenberg model and vice versa.

This is even more evident in Fig. 7: the SEV model decidedly outperforms the Niu-Melenberg model in Bulgaria and, to a lesser extent, Slovakia, while it underperforms in Hungary. For all three countries, though, there are years where the opposite is true.

There is no clear relationship between the number of years used to train the model and the relative forecasting performance. The two countries with data starting in the 2000s (Belgium and Croatia) do not outperform the Niu-Melenberg model, but there are both countries with data starting in 1975 (the Netherlands) and countries with data from 1992 onwards (Russia) that outperform the Niu-Melenberg model for all applicable jump-off years.

4.3 Composition of external factors

While the \(O\) external variables \({h}_{o,t}\) used for the model are the same for all countries, the external factors \({g}_{l,t}\) actually used in the fitting process are linear combinations of the \({h}_{o,t}\), obtained through singular value decomposition. The loadings of the principal components express the relative importance of the variables for each principal component, and their analysis highlights whether variables’ importance is constant or differs across countries.

As shown in Table 5, the first principal component has a positive loading for real GDP that is quite stable across countries, as can be seen by the low standard deviation of the loading. Likewise, temperature anomalies have a positive loading and a low standard deviation, same for male height at age 18. These three variables all have a positive impact on mortality reductions, while the share of males with raised blood pressure has the opposite impact and has consequently a negative loading.

Table 5 Loadings summary for first and second principal component (PC) used in single-population models

The variable loadings for the second principal component are more varied, with most variables more or less equally split between positive and negative loadings. A more detailed look at the loadings in Table 6 reveals how the second principal component usually has one or two (in the case of Spain, three) variables with a high loading, over 0.5 in absolute value. Cigarette consumption and caloric supply have a loading over 0.5 in absolute value for the second principal component for 11 countries, then fruit and vegetable consumption (10 countries), fossil fuel consumption (9), alcohol consumption (7), male height at age 18 (5), temperature anomalies (2), share of men with raised blood pressure (1), while real GDP per capita never has a loading over 0.43 in absolute value.

Table 6 Loadings of the second principal component used in single-population models by country

4.4 Effects of external variables on mortality

The changes in the variables and their effect on mortality vary across countries. The changes in variables between the first and the last year considered and their respective contribution to log-mortality reduction are presented in Figs. 8 and 9, respectively. These countries are analyzed in more detail in the Appendix.

Fig. 8
figure 8

Change of standardized variables between the first and last year of the time span analyzed by variable, selected countries

Fig. 9
figure 9

Contributions to log-mortality reduction between the first and the last year of the time span analyzed by age and variable, selected countries

The reduction in alcohol consumption in the Netherlands and in Spain appears to have had a substantial effect on mortality reduction. On the other hand, the effect is much smaller in Poland, where alcohol consumption actually increased, possibly due to more affluence. The effect is small and differs by age in Sweden. Rising temperatures contributed to strong mortality reductions in Spain and the Netherlands, while having a negligible effect in Poland and Sweden. The effect is likely due to milder winters, since heat waves and their impact on mortality are a relatively recent effect. Cigarette consumption is falling in all four countries and contributes significantly to mortality reductions, as well as increases in GDP. The reduction in fossil fuel consumption has a substantial effect in Sweden only.

The results are overall in line with the literature: reductions in alcohol and tobacco consumption drive down mortality, as well as increases in GDP, better cardiovascular health, increased fruit and vegetable consumption. Somehow surprising are the effects of caloric supply, with strong increases in the Netherlands and Sweden contributing to reduction in mortality, while having no or negative effects in Spain and Poland, respectively. The reduction in fossil fuel consumption had notable positive effects on mortality in Sweden only, while having little to no effect in the Netherlands, Spain and Poland. The effect of rising temperatures is also unclear.

Isolating the contribution of multiple variables allows to build scenarios regarding one or more variables: in example, cigarette consumption in Sweden was equal to 1,797 cigarettes per capita in 1975 and 749 in 2014. In standardized terms, consumption fell from a value of 1.329 to -1.329. A further fall to 500 cigarettes per person would translate into a reduction of mortality at, say, age 60 by a further 2.66%.

4.5 Robustness checks and alternative specifications

Several alternative specifications of the SEV model have been tested, as well as various covariate sets.

While the number of external variables has no direct effect on model complexity, since the external factors are principal components of the SVD of the external variables matrix, it has been tested nevertheless whether excluding certain variables improves model fit by reducing noise. The tests have been carried out for models covering the 40–90 age range, with a maximum of two principal components and an additional age-period term. The tested sets are presented in Table 7. No set is clearly superior in terms of MAD and MAPE, while considering BIC the full variable set performs best, minimizing the BIC for 11 countries and being the second best choice for 4 more countries. It is therefore possible to further improve on the results presented in Sect. 4.1 by choosing a different subset of covariates, as the full variable set was chosen for all countries for comparability.

Table 7 Alternative variable sets to the base set and number of countries for which the set maximizes MAD and MAPE and minimizes BIC

For both the 0–90 and 40–90 age ranges, it has been investigated whether the inclusion of an age-period term and different limitations on the number of external factors can improve model fit. These tests have been carried out with the full covariate set and the results are presented in Table 8. While more relaxed criteria on inclusion of principal components of external variables increase fit, the BIC suggests more stringent limits. For the 0–90 age range, BIC is minimized for most countries with the inclusion of 3 principal components, while for 6 countries the BIC is minimized with the inclusion of a fourth principal component. For the 40–90 age range, two to three principal components are generally sufficient and additional components add little value. An age-period term markedly improves model fit in all cases.

Table 8 Parameter sets and number of countries for which the set maximizes MAD and MAPE and minimizes BIC

The age range chosen for this study, 40–90, excludes both younger and older ages. As shown in Sect. 3.3, the correlation of mortality rates with external variables drops substantially at ages over 90. In a model covering a larger range of ages, the role of an age-period term would be to model and consequently forecast those ages for which external variables hold little explanatory power.

Younger ages, on the other hand, pose different challenges. They are correlated with external variables, but since over the years they are representing an increasingly smaller share of total deaths, they are more sensitive to specific events that target those ages specifically. A telling example is the AIDS epidemic, which had a very strong effect on male mortality for ages up to 40 until 1995. Its effect on mortality over 40 is briefly discussed in the Appendix when discussing the model results for the USA. Moreover, the correlation of some variables with mortality rates shows a significant drop in the age range 20–40 for males, i.e., GDP as evidenced by Seklecka et al. (2019). While the identification of factors affecting young mortality specifically is outside the scope of this study, calibration of the model on a per-country basis would benefit from the inclusion of external variables disproportionately affecting younger ages.

Ultimately, the 40–90 age range has been chosen due to an overall better forecasting performance compared to the Niu-Melenberg model, with the RMFSE ratio being lower than 1 for a higher number of countries at all jump-off years tested.

Model specification has a larger effect on the BIC than the choice of covariates: while testing alternative specifications, the BIC would vary substantially in terms of countries outperforming the corresponding Lee–Carter model, while as far as the choice of variables is concerned, once a model outperformed the Lee–Carter model, it would continue doing so under all sets of variables tested for almost all countries.

4.5.1 Cohort terms

As shown in the previous sections, some countries exhibit cohort effects; therefore, it is sensible to investigate whether including a cohort term would improve the fit. Unfortunately, when fitting the model with both an age-period term and a cohort term, an infinite deviance is produced, preventing a successful estimate of the parameters; hence, the inclusion of a cohort term requires the omission of the age-period term.

A single-population model with a cohort term and no age-period term has been estimated for three countries where model residuals suggested a cohort effect: Spain, Poland and the USA; the results are presented in Table 9. The inclusion of a cohort term yields mixed results for the USA, where there is a decrease in MAPE and BIC, but not in MAD, while for Spain and Poland the fit is noticeably worse.

Table 9 Mean absolute deviation (MAD), mean absolute percentage error (MAPE) and Bayesian information criterion (BIC) for single-population models with cohort term and no age-period term

4.5.2 Stationarity of the age-period term

The age-period term \({b}_{x}{k}_{t}\) fits trends not captured by the external factors \({g}_{l,t}\). But it is possible that the age-period term captures temporary idiosyncrasies, which vanish in the long term. To this end, the stationarity of the \({k}_{t}\) terms has been investigated with the Phillips–Perron test. For 11 countries, the period term \({k}_{t}\) is stationary (with a p value less than 0.05) and therefore for these countries there are no systematic trends not captured by the external factors \({g}_{l,t}\).

5 Discussion

The effect of economic development, environmental and lifestyle factors on mortality is well documented. Prosperity, living in a clean environment and having healthy habits all contribute to a long life. Mortality rates and these factors are correlated and cointegration analysis has shown that this relationship is stable over the long run even if the individual series are usually non-stationary and also include structural breaks. Hence, external variables can be credibly incorporated in a stochastic mortality model. On one hand, this can improve fit and forecasting performance compared to the Lee–Carter model and to the Niu-Melenberg model which only includes GDP. In addition to improving model performance, including a wider set of variables can also improve the interpretability of stochastic mortality models and offer insights on the relative importance of different factors and how they vary across countries. This makes it easier to elaborate scenarios considering the trade-offs, i.e., between economic growth and environmental protection.

The main goal of this study was to build a model that could be useful in scenario building and policy planning. To this end, expanding the scope of variables included in the model is crucial. If we don’t model a variable, it is ignored: the optimal policy to reduce mortality based on a model which only incorporates GDP is to maximize GDP, no matter the effect on environment and on public health, which is clearly nonsensical.

Mortality is a complex phenomenon and stochastic mortality modeling is in a constant trade-off between simple, parsimonious models and more comprehensive models that capture the multifaceted nature of mortality. The complexity of mortality is reflected in the fact that in none of the analyzed countries the relationship between mortality and the other variables could be described by a single principal component. To put it otherwise, economic prosperity is not enough. Moreover, while GDP features prominently in the first principal component in all countries, the composition of the second principal component is much more varied, suggesting that different factors have different weight in explaining and predicting country-level mortality. Caloric supply, cigarette consumption, fossil fuel consumption, a proxy for air quality, fruit and vegetable consumption and, to a lesser degree, alcohol consumption feature prominently in the second principal component.

The differences in the weights of the covariates between countries suggest that each country, with its own characteristics and evolution of mortality, faces different challenges. The present analysis, with a fixed set of covariates for all countries, is meant to show the potential benefits of a more flexible model. The SEV model with the full set of external variables outperforms the Lee–Carter model for 18 countries, the Niu-Melenberg model for 12 countries and both models for a diverse set of 11 countries, that is, Australia, Bulgaria, Hungary, Italy, Japan, Netherlands, Russia, Spain, Ukraine, UK and USA, plus Canada for which a lack of data for a covariate prevents the calculation of the Lee–Carter model for the same years. The choice of the variable set has been shown to have an impact on the BIC and the full variable set minimizes the BIC for just 11 countries (4 of which also outperform the Lee–Carter and Niu-Melenberg models). One can reasonably conclude that for each individual country, both fit and forecasting performance can be improved upon with different sets of covariates, tailored on that country’s characteristics, and an appropriate variable selection.

A more in-depth analysis of the contributions of the various variables to mortality reductions over the period of analysis has been carried out for four countries. While GDP accounts for a substantial part of the mortality reductions over the period, the other variables are important as well, with effects generally in line with those estimated in the literature. Across countries and ages, variables’ importance can differ markedly, highlighting the need to tailor the models to the specific country’s characteristics and mortality history. Temperature anomalies—that is, deviations from long-run averages due to warming—appear to have contributed to mortality reductions in countries where their effect is non-negligible. This result should be further investigated and could perhaps be linked to reduced winter mortality. Over the period considered, milder winters might have had a stronger effect on mortality than hotter summers and heat waves.

Nevertheless, the choice of covariates appears to have captured the most common factors influencing mortality. Beyond the external variables, the additional age-period term has no clear trend across countries and doesn’t have a common pattern for the age loadings either. Its interpretation varies from country to country, suggesting no easily identifiable omitted variables. To the extent where the age loadings peak at a moderately old age (65 years in Italy, 75 in Spain), it could conceivably be interpreted as the effect of the healthcare system’s ability to reduce mortality beyond national affluence and lifestyle choices. For a little under a third of all countries considered, though, it’s a stationary process and thus it may simply capture past idiosyncrasies with no relevance for forecasting. This suggests that the external factors used actually captured the relevant trends in mortality.

The inclusion of a diverse set of covariates can improve forecasting performance upon the Niu-Melenberg model, as shown in backtesting. Arguably, though, the biggest advantage of using covariates covering a number of different factors is that it allows for wide-ranging model-based scenario planning based on actual historical data. How would mortality be affected if excise taxes were to be raised to the point that alcohol consumption is halved? How would a protracted stagnation impact on mortality? The impacts on mortality of policy decisions do not concern only policymakers, but actuaries as well, since policies with a future effect on mortality, like a smoking ban, have an immediate effect on pricing. Trends in mortality should be modeled using a range of scenarios incorporating both current trends and stated policy objectives in order to account for the full uncertainty of all factors.

5.1 Limitations

Even though mortality rates and the external variables used in this study have historically been correlated, these correlations are not guaranteed to remain valid in future as well. The covariates are proxies of actual factors that influence mortality and the relationship between the proxy and the factor may not hold in time, there could be possible non-linear effects (i.e., temperature anomalies and heat waves) and other unforeseen factors that may influence mortality may emerge (i.e., vaping as an alternative to smoking).

While the study refrains from asserting causal relationships between external variables and mortality, it acknowledges the potential for spurious correlations. It is important to highlight, however, that existing literature offers credible biological reasons linking mortality to these variables and hence using them for stochastic mortality modeling is grounded in existing scientific discourse.

Possibly the main difficulty in applying the SEV model to forecasts lies in the need to predict covariates, especially considering the inherent difficulty in forecasting economic variables. The incorporation of a broader range of variables both exacerbates and mitigates this challenge. On the one hand, forecasting multiple variables adds complexity and uncertainty. On the other hand, errors in forecasting individual variables are more likely to offset each other, at least partially. Larger errors in a single covariate have a relatively diminished impact on forecasts compared to single-covariate models like the Niu-Melenberg. As for how to obtain estimates, the practitioner might either use expert forecasts, i.e., the International Energy Agency’s scenarios for fossil fuel consumption, or extrapolative methods.

A purely extrapolative model, despite its apparent simplicity, does not inherently reduce uncertainty. Rather, it implicitly overlooks the uncertainty linked to the influence of covariates on mortality. This uncertainty persists whether explicitly modeled or not; the crucial distinction lies in acknowledging and quantifying this uncertainty in the forecasting process. Therefore, forecasting covariates, although challenging, presents a more comprehensive and transparent approach to mortality prediction. This approach accounts for the inherent uncertainty in external factors affecting mortality outcomes.

In some countries model residuals evidenced the presence of cohort effects, a well-known phenomenon in the study of mortality. The model presented allows for the inclusion of cohort terms at the expense of age-period terms. In general, a cohort term can improve model fit, but it is not a given and its inclusion should be considered carefully against the alternatives and considering the estimation difficulties.

A technical limitation of the model is its need to use uncorrelated factors, which leads to the use of principal components and therefore limits the direct interpretability of the model coefficients. Nevertheless, the impact of a specific variable can be derived from the model coefficients and this doesn’t limit the model ability to forecast mortality rates using arbitrary future values of the external variables, which is its main application. Another limitation of stochastic mortality models is their need of data for a relatively large number of years, which limits the set of available covariates and raises questions about the comparability of data across space and time.

A possible avenue for future research would be an extension to the multipopulation case. On the same note, further research is needed to determine why the importance of the different variables in the single-population model varies across countries.