Introduction

Every two years, the United Nations Population Division (UN) publishes an updated edition of the World Population Prospects (WPP), which has included projections of the populations of all the countries in the world to 2050, broken down by age and sex. The UN issues several variants of its population projections, of which the most important is the medium variant, viewed as the best projection of future population trends. It also issues high and low variants, obtained by increasing and decreasing the total fertility rate (TFR) by half a child, respectively. The high and low variants are scenarios designed to indicate what would happen if the assumptions underlying the medium variant were violated in various ways; they are not probabilistic projections. The UN does not issue variants indicating the likely impact of differences in future mortality rates. Largely because of the need to project climate change over the next century, the UN has recently extended its population projections to 2100.

Fully probabilistic population projections are an alternative to scenarios, which may be preferable because, unlike scenarios, they indicate the likely range of future population outcomes (Bongaarts and Bulatao 2000). To produce probabilistic population projections for all the world’s countries, one needs probabilistic projections of the main demographic processes affecting national populations: fertility, mortality, and international migration.

This article is part of a research program that aims to produce such projections. The UN produces many of its current (deterministic) projections by projecting broad summaries of population processes and then breaking them down into age-specific rates using model schedules and relational models, yielding the age- and sex-specific fertility, mortality, and migration rates required by the standard cohort-component population projection method. Our research therefore focuses on probabilistic projection of summary population measures. Fully probabilistic projections of the TFR for all countries have already been produced (Alkema et al. 2008, 2011; Raftery et al. 2009).

The best known approach to probabilistic projection of mortality is that of Lee and Carter (1992), which uses past age-specific death rates for each year over time in a country. They used a log-linear model for the age-specific mortality rate at age x in year t, \(\log (m_{xt}) = a_{x} + k_{t} b_{x} + \upvarepsilon _{xt}\), where \(k_{t}\) is the mortality index. They projected \(k_{t}\) into the future using a random walk model with constant drift. The Lee-Carter method has been found to perform well for some developed countries (Bell 1997; Booth et al. 2005). In simultaneous forecasts for a group of countries, a common age parameter is fixed to ensure consistent forecasts of multiple countries (Li and Lee 2005).
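As a rough illustration of the Lee-Carter approach just described, the following Python sketch fits the log-linear model and projects the mortality index as a random walk with constant drift. Estimation by singular value decomposition and the normalization (the \(b_{x}\) summing to 1, the \(k_{t}\) summing to 0) are common conventions assumed here, not details given in the text. The function names and the synthetic, noise-free data are hypothetical.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit log m[x, t] = a[x] + b[x] * k[t] by SVD (rows: ages, columns: years)."""
    a = log_m.mean(axis=1)                      # age pattern: mean log rate over time
    u, s, vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
    scale = u[:, 0].sum()                       # normalise so that b sums to 1
    b = u[:, 0] / scale
    k = s[0] * vt[0] * scale                    # mortality index; sums to 0 by construction
    return a, b, k

def forecast_index(k, horizon):
    """Random walk with constant drift: point forecast and standard error of k.

    The standard error ignores uncertainty in the estimated drift.
    """
    steps = np.diff(k)
    drift = steps.mean()
    sigma = steps.std(ddof=1)
    h = np.arange(1, horizon + 1)
    return k[-1] + drift * h, sigma * np.sqrt(h)

# Synthetic example: all values below are made up for illustration.
ages, years = 5, 12
a_true = np.linspace(-6.0, -2.0, ages)
b_true = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
k_true = -(np.arange(years) - (years - 1) / 2)  # falls by 1 per year, sums to 0
log_m = a_true[:, None] + np.outer(b_true, k_true)

a, b, k = fit_lee_carter(log_m)
k_mean, k_se = forecast_index(k, horizon=3)
```

Because the drift is estimated from the whole fitting period, the forecast depends directly on which reference period is chosen.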

However, the Lee-Carter method requires that age-specific death rates be available for at least three time periods, which is not the case for many developing countries (Li et al. 2004). Also, whereas the Lee-Carter model specifies changes in age-specific mortality rates to have a constant distribution on the logarithmic scale, White (2002) found that linear changes in life expectancy gave a better fit to data from 21 industrialized countries. Further, Lee and Miller (2001) found that the forecasts of the Lee-Carter method for the United States, Japan, Canada, Sweden, and France systematically underestimated future life expectancy.

Girosi and King (2008) proposed a Bayesian method based on smoothing age-specific mortality rates over age and time. They argued that this would outperform other methods when informative covariates are observed. They applied their method to all-cause mortality for males for 48 countries (in their section 12.1), using tobacco use and GDP as covariates, and showed that it outperformed the Lee-Carter method. Most of these 48 countries are among the 39 % of countries that have good vital registration and other data. Our goal, however, is to provide forecasts for all the countries of the world, many of which do not have available covariate and age-specific mortality data of the kind and quality used by Girosi and King (2008: chap. 12).

The Lee and Carter (1992) and the Girosi and King (2008) approaches are based on the assumption that the rate of change of age-specific mortality remains constant over time. This does not hold for more than a few decades into the future (Li and Gerland 2011). However, the UN’s projections are for 90 years into the future, and these methods can be problematic for such a long projection horizon. The choice of the reference period can have a substantial influence on the results, and reversals of historical mortality trends, as in the former Soviet Union, can lead to implausible projections.

Lutz and colleagues at the International Institute for Applied Systems Analysis (IIASA) have taken a different approach, producing probabilistic projections from expert opinion. They addressed data limitations by aggregating countries into regions and forecasting regional life expectancy based on expert-based probabilistic projections (Lutz et al. 2004).

We propose an approach that is adapted to the UN’s task of projecting populations for all the countries in the world, many of which have data that are patchy and of variable quality. We project period life expectancy directly, using a random walk model with a nonconstant drift. The drift term is a nonlinear function of current life expectancy and reflects the fact that life expectancy tends to improve more slowly for the countries with the lowest and highest life expectancies, and more quickly for the countries in the middle. Also, the overall rate of improvement varies by country. We use a Bayesian hierarchical model (BHM), which allows us to estimate the rate of improvement in life expectancy for a country using past data from that country, and also taking account of the observed past patterns in all other countries.

We develop a one-sex model for males and discuss potential extensions to a two-sex model in the closing “Discussion” section.

The article is organized as follows. We first discuss the UN’s current projection methodology. We then describe our proposed model, which is a natural extension of the UN’s current practices from deterministic to probabilistic projections. We assess our model by holding back the last 10 years of data (1995–2005), reestimating our model using only the data from 1950 to 1995, generating probabilistic projections for 1995–2005, and comparing them with what was actually observed.

We then present probabilistic projections of life expectancy to 2100 in three countries with widely different current life expectancies (Madagascar, Latvia, and Japan), each of which presents different forecasting challenges. Life expectancy in Madagascar has been improving steadily but is still in the lowest quartile. Latvia has experienced a mortality crisis in the past generation, with both declining and increasing life expectancy. Japan is a leading country, with one of the highest life expectancies. For each country, we present out-of-sample projections for 1995–2005. Lastly, we discuss aggregation of country-specific projections. Comparisons are made with the regional projections for South Asia by IIASA.

Methodology

Data

We use the estimates of male period life expectancy at birth from the UN World Population Prospects (WPP) 2008 Revision from 1950 through 2005 (United Nations 2009). Period life expectancy refers to the life expectancy of a hypothetical cohort subjected to current mortality rates throughout its life (Preston et al. 2001: Section 3.1). Because of the significant impact of the HIV/AIDS epidemic on mortality rates, we do not include countries with a generalized HIV/AIDS epidemic in this analysis. We base our results on data from 158 countries, comprising about 90 % of the world’s population.

The UN produces estimates of age-specific mortality and period life expectancy at birth for 230 countries and areas, updated every two years. The data available vary widely between countries, with only 89 countries (39 %) having good vital registration data allowing direct and accurate estimation of age-specific mortality rates. Other countries have incomplete vital registration data (32 countries), summary estimates of child and adult mortality (38 countries), or estimates of child mortality only (49 countries), based on surveys, censuses or administrative records. Twenty-two small areas have no recent data at all.

There are many problems with the available data, particularly in the majority of countries without good vital registration data. These include the absence of age and sex breakdowns in census data, highly questionable census counts, incomplete geographical coverage, and major divergences between mortality estimates from different sources. The UN adjusts the available data in light of knowledge about the biases and quality of the different data sources. About half the countries lack age-specific mortality data; in these cases, the UN uses model life tables and relational models to estimate life expectancy from the available summary information about child and adult mortality.

There is considerable regional variation in the availability of reliable data. Of the 50 countries in Asia, 56 % have “reliable” or “fairly reliable” vital statistics, whereas 95 % of the countries in Europe and North America have reliable statistics. This number decreases dramatically in Africa, where only 5 of the 54 countries (9 %) maintain “reliable” or “fairly reliable” vital statistics (United Nations 2006).

Current UN Population Projection Methodology

Currently, the UN projects life expectancy at birth deterministically. The life expectancy at birth, \(\ell _{c,t + 1}\), for country c, in the next five-year period, \(t + 1\), is projected to be the life expectancy in the current time period, \(\ell _{c,t}\), plus the expected gain in life expectancy, \(g(\ell _{c,t})\). Observed five-year gains in life expectancy for 158 countries from 1950 to 2005 are plotted in Fig. 1. This figure highlights the nonconstant rate of change in life expectancy. To capture this, the UN has developed models that represent the gains in life expectancy by a double-logistic function of current life expectancy. The five deterministic UN models are shown in the bottom panel, where models vary by pace of gains in life expectancy.

Fig. 1

Observed five-year gains in life expectancy, plotted against the life expectancy at the beginning of the five-year period. UN estimates for 158 countries from 1950 to 2005 are included in this figure (\(n = 1,738\)). Each point represents an observed five-year gain in life expectancy within a country. The black line is a locally weighted polynomial (lowess) regression of the observations, which highlights the nonconstant rate of gains in life expectancy. Included in the upper plot are the fitted posterior median double-logistic functions for Japan and Madagascar from our model. The UN deterministic models are included in the lower plot. (Note that 31 observations (1.8 %) are outside the range of the plot and are not shown, but they were included in the local regression)

The double-logistic function (Meyer 1994) has six parameters, as illustrated in Fig. 2. Four of them identify intervals of life expectancy when the rate of life expectancy gains is changing, one describes the approximate maximum gain in life expectancy, and the last parameter gives the asymptotic rate of gains as life expectancy increases. For each country, a UN analyst chooses one of five prescribed choices of the six parameters (see footnote 1) by assessing the recently observed pace of mortality decline (United Nations 2009). The model implies that beyond a certain point, life expectancy increases at an effectively constant rate. This is consistent with research indicating that there is no evidence of an upper limit to life expectancy (Cohen and Oppenheim 2012; Oeppen and Vaupel 2002).

Fig. 2

Illustration of the double-logistic function, based on a curve from the posterior distribution for Japan. The left plot illustrates the double-logistic function of five-year gains in life expectancy. The right plot is a trajectory of life expectancy with gains modeled according to the double-logistic function

The logistic function has been used for more than a century to model population growth. Marchetti et al. (1996) showed that a sum of logistic functions can be used to model not only the adoption and substitution of competing technological innovations (Fisher and Pry 1971; Meyer 1994; Meyer et al. 1999) but also the social diffusion, learning, and adoption of new ideas, norms, attitudes, and behaviors associated with the fertility and mortality transitions (Marchetti 1997; Marchetti et al. 1996; Potter et al. 2010) or nuptiality (Goldstein and Kenney 2001; Hernes 1972; Li and Wu 2008).

The transition from high to low mortality can be decomposed into two processes, each of which can be approximated by a logistic function. The first process consists of initial slow growth and diffusion of progress against mortality (e.g., small mortality improvements at low levels of life expectancy associated with diffusion of hygiene and improved nutrition), followed by a period of accelerated improvements, especially for infants and children (e.g., larger gains associated with greater social and economic development, and mass immunization). The second process kicks in once the easiest gains have been achieved against infectious diseases, and produces continuing gains against noncommunicable diseases. These improvements occur at a slower pace because of ever-greater challenges to the prevention of premature deaths at older ages resulting from cardiovascular diseases or neoplasms, and to the delay of the onset of aging (Fogel 2004; Riley 2001).

To summarize, the UN projects life expectancy in the next time period deterministically using the equation

$$ \ell_{c,t+1} = \ell_{c,t} + g(\ell_{c,t}). $$
(1)

The expected five-year gain in life expectancy is a double-logistic function of the current level of life expectancy—namely,

$$ \begin{array}{lll} g(\ell_{c,t}| \boldsymbol{\uptheta}^{c}) &=& \frac{k^{c}}{1+\exp\left(-\frac{A_{1}}{\Delta_{2}^{c}}\left(\ell_{c,t}-\Delta_{1}^{c} - A_{2} \Delta_{2}^{c}\right)\right)} \\&& + \frac{z^{c}-k^{c}}{1+\exp\left(-\frac{A_{1}} {\Delta_{4}^{c}} \left(\ell_{c,t}-\sum_{i=1}^{3}\Delta_{i}^{c}-A_{2} \Delta_{4}^{c}\right)\right)}. \end{array} $$
(2)

In Eq. (2), \(\boldsymbol {\uptheta }^{c} = (\Delta _{1}^{c},\Delta _{2}^{c},\Delta _{3}^{c},\Delta _{4}^{c},k^{c},z^{c})\) are the six parameters of the double-logistic function for country c, whose meaning is illustrated in Fig. 2. The parameter vector \(\boldsymbol {\uptheta }^{c}\) for country c is chosen by a UN analyst from the five possibilities \((\boldsymbol {\uptheta }^{\text {Very Slow}}, \boldsymbol {\uptheta }^{\text {Slow}}, \boldsymbol {\uptheta }^{\text {Medium}}, \boldsymbol {\uptheta }^{\text {Fast}},\boldsymbol {\uptheta }^{\text {Very Fast}})\). The constants \(A_{1} = 4.4, \, A_{2} = 0.5\) are chosen so that the parameters \(\{ \Delta _{i}^{c}: \, i=1,2,3,4 \}\) are on an interpretable scale, but they are arbitrary in that they could be changed without altering the results, provided that their product, \(A_{1} A_{2}\), remains unchanged.
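Equations (1) and (2) can be sketched in a few lines of Python. The parameter values are those quoted for the UN medium-pace model in the Parameter Estimation section. The function names and the starting value of 58.5 years are illustrative choices, and the sketch is not the UN's own implementation.

```python
import math

A1, A2 = 4.4, 0.5  # constants from Eq. (2)

def double_logistic_gain(life_exp, d1, d2, d3, d4, k, z):
    """Expected five-year gain g(l | theta) from Eq. (2)."""
    first = k / (1.0 + math.exp(-A1 / d2 * (life_exp - d1 - A2 * d2)))
    second = (z - k) / (
        1.0 + math.exp(-A1 / d4 * (life_exp - (d1 + d2 + d3) - A2 * d4))
    )
    return first + second

def project(life_exp, theta, n_periods):
    """Iterate Eq. (1) for n_periods five-year steps; returns the whole path."""
    path = [life_exp]
    for _ in range(n_periods):
        path.append(path[-1] + double_logistic_gain(path[-1], *theta))
    return path

# (Delta_1, Delta_2, Delta_3, Delta_4, k, z) for the UN medium-pace model
UN_MEDIUM = (15.77, 40.97, 0.21, 19.82, 2.93, 0.40)

# Illustrative starting level; eight five-year periods ahead
path = project(58.5, UN_MEDIUM, n_periods=8)
```

At high life expectancy the first logistic term approaches \(k^{c}\) and the second approaches \(z^{c} - k^{c}\), so the gain tends to \(z^{c}\), the asymptotic rate of increase.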

Stochastic Model

The UN projection method is deterministic and does not account for uncertainty. We now extend it to a stochastic model to allow for uncertainty. This involves two extensions. The first allows for stochastic changes within a country by replacing the deterministic model in Eq. (1) with a stochastic one by adding a random perturbation to Eq. (1). It then becomes a random walk with drift, where the drift term is given by the double-logistic function.

The second extension is to allow the parameters of the double-logistic function to vary between countries over a continuous range rather than among the current five UN possibilities. The resulting hierarchical model is

$$ \ell_{c,t+1} = \ell_{c,t} + g\left(\ell_{c,t}| \boldsymbol{\uptheta}^{c} \right) + \upvarepsilon_{c,t+1} , $$
(3)

where

$$\begin{array}{rll} g\left(\ell_{c,t}| \boldsymbol{\uptheta}^{c} \right) &=& \text{Double-Logistic function with parameters }\boldsymbol{\uptheta}^{c} , \\ \boldsymbol{\uptheta}^{c} &=& \left(\Delta_{1}^{c},\Delta_{2}^{c},\Delta_{3}^{c},\Delta_{4}^{c},k^{c},z^{c}\right) , \\ \Delta_{i}^{c}|\upsigma_{\Delta_{i}} &\stackrel{\textrm{iid}}{\sim}& \text{Normal}_{[0,100]}\left(\Delta_{i}, \upsigma_{\Delta_{i}}^{2}\right), \;\;\;\; i=1,\ldots,4 , \\ k^{c} |\upsigma_{k} & \stackrel{\textrm{iid}}{\sim} & \text{Normal}_{[0,10]}\left(k, \upsigma_{k}^{2}\right) , \\ z^{c} | \upsigma_{z}& \stackrel{\textrm{iid}}{\sim} & \text{Normal}_{[0,1.15]}\left(z, \upsigma_{z}^{2}\right), \end{array}$$

where Normal\(_{[a,b]} (\upmu , \upsigma ^{2})\) denotes a normal distribution with mean \(\upmu \) and standard deviation \(\upsigma \), truncated to lie between a and b.

This model allows us to pool information about the rates of gains across countries by assuming that each set of country-specific double-logistic parameters is randomly sampled from a common truncated normal distribution. The normal distribution is truncated such that all the double-logistic parameters are positive.
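The country level of the hierarchy can be illustrated by drawing country-specific parameters from truncated normal distributions. In the sketch below, the truncation bounds come from the model above and the world means are set to the UN medium-pace values. The world standard deviations are hypothetical placeholders, not estimates from this article, and the rejection sampler is just one simple way to sample a truncated normal.

```python
import numpy as np

def truncated_normal(rng, mean, sd, low, high, size):
    """Rejection sampler for Normal(mean, sd^2) truncated to [low, high]."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        draw = rng.normal(mean, sd, size=size)
        keep = draw[(draw >= low) & (draw <= high)][: size - filled]
        out[filled:filled + keep.size] = keep
        filled += keep.size
    return out

# Truncation bounds from the hierarchical model
BOUNDS = {"d1": (0, 100), "d2": (0, 100), "d3": (0, 100), "d4": (0, 100),
          "k": (0, 10), "z": (0, 1.15)}
# World means: UN medium-pace values (used here only as an example)
WORLD_MEAN = {"d1": 15.77, "d2": 40.97, "d3": 0.21, "d4": 19.82,
              "k": 2.93, "z": 0.40}
# HYPOTHETICAL world standard deviations, chosen only for illustration
WORLD_SD = {"d1": 5.0, "d2": 8.0, "d3": 3.0, "d4": 5.0, "k": 1.0, "z": 0.3}

rng = np.random.default_rng(0)
n_countries = 158
theta = {
    name: truncated_normal(rng, WORLD_MEAN[name], WORLD_SD[name],
                           *BOUNDS[name], size=n_countries)
    for name in BOUNDS
}
```

Each entry of `theta` holds one draw per country, so countries share a common distribution while keeping their own parameter values.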

The parameter \(z^{c}\) is the asymptotic average rate of increase in life expectancy per five-year period. Our prior distribution for this is informed by the results of Oeppen and Vaupel (2002), who found a strong positive linear trend in the “best practices” life expectancy (i.e., the highest life expectancy in a given year) from the mid-nineteenth century through 2000. By assuming that \(z^{c}\) is nonnegative, we are assuming that life expectancy will continue to increase, on average. In their regression of highest male life expectancy on year, Oeppen and Vaupel (2002) estimated a slope of 1.11 years per five-year period, with \(R^{2} = .98\). Because this is the rate of increase for “best practices” countries, we assume that the asymptotic rate of increase for any given country will not exceed the upper bound of a 99.9 % confidence interval for this estimate—namely, 1.15.

To specify the distribution of the random perturbations, \(\upvarepsilon _{c,t}\), we first estimated the model assuming them to be normally distributed with a constant variance, using the estimation method described later. Figure 3 shows the absolute residuals from this fit with a fitted regression spline. The spread of the residuals clearly decreases with increasing life expectancy. To account for this, we modeled \(\upvarepsilon _{c,t}\) as normally distributed with standard deviation proportional to the regression spline fitted to the absolute residuals shown in Fig. 3, so that

$$ \upvarepsilon_{c,t} \stackrel{\textrm{iid}}{\sim} N\left(0, (\upomega\times f(\ell_{c,t-1}))^{2}\right) . $$
(4)

Fig. 3

Absolute residuals from the constant variance model plotted against life expectancy, with fitted regression spline. (Note that 44 (2.8 %) of the residuals are outside the range of the plot but were included in the regression spline fit)
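A minimal simulation of Eqs. (3) and (4) for fixed parameters might look as follows. The fitted spline \(f\) is not reproduced in the text, so `noise_scale` is a hypothetical decreasing stand-in, and the value of `omega` is likewise an assumption. The sketch holds \(\boldsymbol{\uptheta}^{c}\) fixed at the UN medium-pace values, whereas the full model also propagates uncertainty about the parameters.

```python
import numpy as np

A1, A2 = 4.4, 0.5

def gain(l, d1, d2, d3, d4, k, z):
    """Double-logistic drift of Eq. (2), vectorised over l."""
    return (k / (1 + np.exp(-A1 / d2 * (l - d1 - A2 * d2)))
            + (z - k) / (1 + np.exp(-A1 / d4 * (l - d1 - d2 - d3 - A2 * d4))))

def noise_scale(l):
    """HYPOTHETICAL stand-in for the spline f: decreasing in life expectancy."""
    return np.interp(l, [40.0, 60.0, 80.0], [2.0, 1.2, 0.6])

def simulate(l0, theta, omega, n_periods, n_paths, rng):
    """Draw trajectories from Eqs. (3)-(4) with theta and omega held fixed."""
    paths = np.empty((n_paths, n_periods + 1))
    paths[:, 0] = l0
    for t in range(n_periods):
        current = paths[:, t]
        # Standard deviation omega * f(l) depends on the current level
        eps = rng.normal(0.0, omega * noise_scale(current))
        paths[:, t + 1] = current + gain(current, *theta) + eps
    return paths

rng = np.random.default_rng(1)
theta = (15.77, 40.97, 0.21, 19.82, 2.93, 0.40)   # UN medium-pace values
paths = simulate(58.5, theta, omega=1.0, n_periods=18, n_paths=2000, rng=rng)

# 80 % interval and median at the end of the horizon
q10, q50, q90 = np.quantile(paths[:, -1], [0.10, 0.50, 0.90])
```

Because the noise scale shrinks as life expectancy rises, simulated paths are more erratic at low levels of life expectancy than at high levels.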

Our stochastic model builds on that proposed by Alkema et al. (2011) for probabilistic projection of the TFR for all countries. However, it differs in several respects. The double logistic model for the gains in life expectancy is more general than that for total fertility rate, since it asymptotes at a nonzero level, \(z^{c}\), which is estimated from the data for each country c. Also, prior information about the range of plausible values of \(z^{c}\) is available from other research, and this is incorporated explicitly via the Bayesian prior distribution. In the TFR model, in contrast, the prior distributions were largely uninformative.

Parameter Estimation

We adopt a Bayesian approach to estimating our model, making it a Bayesian hierarchical model. This requires specifying prior distributions for the 13 world parameters of the model: \((\Delta _{i}, \upsigma ^{2}_{\Delta _{i}})\) for \(i=1,\ldots ,4\); k, \(\upsigma ^{2}_{k}\), z, \(\upsigma ^{2}_{z}\), and \(\upomega \). We specify prior distributions that are proper but much more diffuse than the posterior distributions.

We set \(\Delta _{i} \sim N_{[0,100]}\left (a_{i},\updelta _{i}^{2}\right )\) for \(i=1,\ldots ,4\), \(k \sim N_{[0,10]}\left (a_{5}, \updelta _{5}^{2}\right )\) and \(z\,\sim N_{[0,1.15]}\left (a_{6}, \updelta _{6}^{2}\right )\). We set \((a_{1},\ldots ,a_{6})\) to the values specifying the UN medium-pace model: \((15.77,40.97,0.21,19.82,2.93,0.40)\). We set \(\left (\updelta _{1}^{2},\ldots ,\updelta _{6}^{2}\right )\) to the variances of the parameters among the different UN models.

For the world variance parameters, \(\upsigma ^{2}_{\Delta _{i}} (i=1,\ldots ,4)\), \(\upsigma ^{2}_{k}\), and \(\upsigma ^{2}_{z}\), we used inverse-gamma prior distributions with 4 degrees of freedom (i.e., a shape parameter equal to 2). To set the parameters of these priors, we first fit the double-logistic model by least squares to the data from each country individually; then, for each parameter, we computed the empirical average squared deviations from the values for the UN medium-pace model. Next, we set the prior means of the reciprocals of the world variance parameters equal to the reciprocals of these values. This yielded rate parameters \((15.6^{2}, 23.5^{2}, 14.5^{2}, 14.7^{2}, 3.5^{2}, 0.6^{2})\) for the six inverse-gamma prior distributions. The resulting prior distributions are guaranteed to be much more spread out than the posterior distribution. Finally, a diffuse Uniform [0,10] prior was used for \(\upomega \).

Experiments showed that the results were insensitive to changes in these priors, which is to be expected because the resulting prior distribution is much more spread out than the posterior distribution.

The posterior distribution of the world and country-level parameters was approximated by Markov chain Monte Carlo implemented in R. The approximately 1,000 parameters were updated one at a time, using Gibbs sampling (Gelfand and Smith 1990), Metropolis-Hastings sampling (Chib and Greenberg 1995; Hastings 1970), or slice sampling (Neal 2003). We ran three chains, each of length 100,000 scans, with a burn-in of 10,000 scans. Visual inspection of the trace plots, the Raftery-Lewis diagnostic (Raftery and Lewis 1992), and the Gelman-Rubin statistic (Gelman and Rubin 1992) indicated that the chains had converged and had explored the posterior distribution enough to yield good estimates of posterior quantiles of interest.
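For readers unfamiliar with the Gelman-Rubin statistic, the sketch below computes its basic textbook form for a single scalar parameter from several chains. It is an illustration in Python with synthetic chains, not the diagnostic code used for this article, and the function name is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: array of shape (n_chains, n_draws), burn-in already discarded.
    Values close to 1 suggest the chains are sampling the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    pooled = (n - 1) / n * within + between / n       # pooled variance estimate
    return np.sqrt(pooled / within)

rng = np.random.default_rng(2)
mixed = rng.normal(size=(3, 5000))                    # three chains, same target
stuck = mixed + np.array([[0.0], [5.0], [10.0]])      # chains stuck in different places

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Chains that disagree about the location of the posterior inflate the between-chain term and push the statistic well above 1.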

A free, publicly available R package called bayesLife implements the method (Ševčíková and Raftery 2011). An additional R package called bayesDem, also freely and publicly available, provides a graphical user interface for bayesLife (Ševčíková 2011).

Model Validation

To assess our probabilistic projections, we fit our model to data from 1950 to 1995, giving 1,422 country-period combinations. We then used the resulting model to forecast life expectancy for males for the two five-year periods 1995–2000 and 2000–2005, and compared the forecasts with what was actually observed. We had 316 out-of-sample predictions.

Comparing probabilistic forecasts with outcomes is challenging because it involves comparing two different kinds of objects: a probability distribution and a single value (Gneiting et al. 2007). A probabilistic forecast should be calibrated; that is, x % prediction intervals should contain the truth x % of the time, on average. It should also be sharp; that is, the intervals should be as narrow as possible.

To assess the predictive ability of our model, we examined the mean absolute error (MAE) of our point forecasts. To assess the calibration of our probabilistic forecasts, we used two measures: the standardized absolute predictive error (SAPE) of our predictions, and the calibration of our predictive intervals. The SAPE is the absolute difference between the observed life expectancy (\(\ell _{c,t}\)) and the median forecast (\(\hat {\ell }_{c,t}\)), standardized by the standard deviation of the predictive distribution and scaled so that its expected value is 1 when the model is correctly specified: \(\text {SAPE}_{c,t} = \sqrt {\frac {\pi }{2}} \, \frac {|\ell _{c,t} - \hat {\ell }_{c,t}|}{\hat {\upsigma }_{\text {pred},c,t}}\). The scaling factor arises because a standard normal variable Z has \(E|Z| = \sqrt {2/\pi }\). If the model is well calibrated, we expect the mean SAPE to be close to 1.
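The validation measures above can be computed directly from samples of the predictive distribution. The following Python sketch assumes equal-tailed prediction intervals and uses synthetic data in which the forecasts are calibrated by construction. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def validation_metrics(observed, samples):
    """MAE, mean SAPE, and interval coverage.

    observed: shape (n_cases,), the values actually observed.
    samples:  shape (n_draws, n_cases), draws from each predictive distribution.
    """
    median = np.median(samples, axis=0)
    abs_err = np.abs(observed - median)
    sape = np.sqrt(np.pi / 2) * abs_err / samples.std(axis=0, ddof=1)

    def coverage(level):
        # Equal-tailed interval from the predictive quantiles
        lo, hi = np.quantile(samples, [(1 - level) / 2, (1 + level) / 2], axis=0)
        return np.mean((observed >= lo) & (observed <= hi))

    return {"mae": abs_err.mean(), "mean_sape": sape.mean(),
            "cover80": coverage(0.80), "cover95": coverage(0.95)}

# Synthetic check: observations come from the same distributions as the forecasts
rng = np.random.default_rng(3)
n_cases, n_draws = 2000, 1000
centre = rng.uniform(50, 80, n_cases)
spread = rng.uniform(0.5, 2.0, n_cases)
samples = rng.normal(centre, spread, size=(n_draws, n_cases))
observed = rng.normal(centre, spread)

metrics = validation_metrics(observed, samples)
```

In this calibrated setting the mean SAPE should be near 1 and the coverage near the nominal levels; departures in real data point to predictive distributions that are too narrow or too wide.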

These metrics are given in Table 1. The MAE of our median predictions was 1.07 years. Thus, our “best guess” was within slightly more than one year of the actual observation, on average. The mean SAPE was 1.04, which is close to the theoretical mean of 1, indicating that our predictive standard deviations were accurate. Overall, our model was well calibrated with our 95 % prediction intervals capturing the actual observations 92 % of the time and the 80 % intervals capturing the observations 82 % of the time.

Table 1 Summary measures for 10-year out-of-sample predictions for the Bayesian hierarchical model (BHM) and the current UN methodology

We compared the predictive ability of our model with the current UN methodology. Replicating our cross-validation methodology by using WPP 2008 data through 1995, a UN analyst computed life expectancy forecasts for 1995–2005 using one of the five prescribed UN models of gains in life expectancy at birth based on levels and trends in the preceding two decades. The MAE of our method was substantially smaller than that of the current UN method (Table 1), improving on it by more than 40 %.

We assessed the sharpness of our projections using the distribution of 80 % prediction interval half-widths. (If a prediction interval is symmetric, it is equal to the point prediction plus or minus the half-width.) For the 1995–2000 period, the prediction interval half-widths for different countries ranged from 0.7 to 1.9 years, with an average half-width of 1.3 years. For the next five-year period, 2000–2005, the interval half-widths increased to a range of 1.0 to 3.2 years, with an average half-width of 1.9 years. Average life expectancy at birth and prediction interval widths both varied by region. Among the continental regions from 1995 to 2005, Africa had the lowest life expectancy of 59.8 years, with an average interval half-width of 2.2 years, even after we excluded the 38 countries with generalized HIV/AIDS epidemics. With an average life expectancy of 73.5 years and an interval half-width of 1.3 years, North America had both the highest life expectancy and the narrowest prediction intervals.

Our model implies the possibility of crossovers between countries in the future, and one concern is whether this is plausible. Crossovers have happened in the past, so a probabilistic projection method should allow for the possibility in the future. The question is whether the model overestimates the probability of crossovers.

To investigate this, we computed the proportion of pairs of countries for which there was a crossover during our 55-year data period, 1950–2005, and compared it with our projected probability of crossover during the next 55-year period, 2005–2060. During 1950–2005, there were 4,202 crossovers among the \({158 \choose 2} = 12,403\) pairs of countries, a proportion of .34. For the projected 2005–2060 period, the posterior predictive probability of a crossover is .24, actually lower than the proportion in the historical period. The proportion of crossovers between the posterior median projections in 2005–2060 is .11, considerably lower than the historic proportion. Thus, it seems unlikely that the projection method is substantially overestimating the probability of crossovers in the future.
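The pair counting behind these proportions can be sketched as follows. The text does not give an operational definition of a crossover, so the sketch assumes that a pair of countries crosses if the sign of the difference between their life expectancies changes at some point in the period. The function name and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations
from math import comb

def crossover_proportion(life_exp):
    """Share of country pairs whose ranking flips at least once.

    life_exp: array of shape (n_countries, n_periods).
    ASSUMED definition: a crossover is a change in the sign of the
    difference between two countries' life expectancies.
    """
    n = life_exp.shape[0]
    crossed = 0
    for i, j in combinations(range(n), 2):
        sign = np.sign(life_exp[i] - life_exp[j])
        sign = sign[sign != 0]                   # ignore exact ties
        crossed += int(np.unique(sign).size > 1)
    return crossed / comb(n, 2)

# Toy data: three countries, three periods; only the first two cross
toy = np.array([[50.0, 60.0, 70.0],
                [55.0, 58.0, 65.0],
                [40.0, 45.0, 50.0]])
toy_share = crossover_proportion(toy)

# Figures quoted in the text
n_pairs = comb(158, 2)
historical_share = 4202 / n_pairs
```

For the posterior predictive probability, the same function could be applied to each simulated set of country trajectories and the results averaged.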

Case Studies

Typical Country: Madagascar

In the WPP 2008, the UN estimated the current life expectancy at birth among males in Madagascar to be 58.5 years. Panel a of Fig. 4 shows projections of life expectancy starting from 2005–2010. The median forecast from our BHM is similar to the UN’s WPP 2008 forecasts through 2050. The WPP 2008 projects male life expectancy in 2045–2050 will be 69.7 years. We project life expectancy will be 71.4 years, with an 80 % prediction interval of (65.5, 77.8). We project that 50 years later, in 2095–2100, life expectancy will reach 80.4 years, with a wider 80 % prediction interval of (72.6, 88.5). Panel b of Fig. 4 shows out-of-sample projections for Madagascar with projections beginning in 1990–1995. UN observed estimates are indicated in the plot by brown squares and are close to our median projection. The exclusion of two time periods results in more uncertainty in our projections for 2095–2100, with a median of 76.6 years and an 80 % prediction interval (64.7, 87.8).

Fig. 4

Life expectancy projections for males in Madagascar. The plots include the UN projections and our median projections, with 80 % and 95 % prediction intervals. The life expectancy values used to estimate our model are indicated by grey circles. Panel a shows projections starting from 2005–2010. A typical stochastic trajectory is shown in black, illustrating that the future trajectory is likely to be less smooth than the median projection. Panel b shows cross-validation projections starting from 1990–1995. Observed life expectancies for the period 1995–2005 are shown as squares

Along with quantiles of the projected life expectancy distribution, we also include in Fig. 4a (as well as Figs. 5a and 6a for Latvia and Japan) a sample stochastic trajectory for Madagascar. We see that unlike the quantiles, the sample trajectory does not follow a smooth path. For this trajectory, the mean absolute deviation from the median is equal to the median mean absolute deviation among the posterior sample of projected trajectories. It can be viewed as a trajectory with typical deviation from the projected median (see footnote 2).
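The selection rule for the typical trajectory can be written compactly. The sketch below assumes the projected trajectories are stored as rows of an array and uses synthetic random-walk paths as stand-ins for posterior draws. The function name is hypothetical.

```python
import numpy as np

def typical_trajectory(paths):
    """Index of the trajectory with typical deviation from the median path.

    paths: array of shape (n_paths, n_periods).
    Picks the path whose mean absolute deviation from the pointwise median
    is closest to the median of that quantity over all paths.
    """
    median_path = np.median(paths, axis=0)
    mad = np.abs(paths - median_path).mean(axis=1)
    return int(np.argmin(np.abs(mad - np.median(mad))))

# Synthetic stand-in for projected trajectories (odd count, so the median
# deviation is attained exactly by one path)
rng = np.random.default_rng(4)
paths = 70.0 + np.cumsum(rng.normal(0.5, 1.0, size=(501, 18)), axis=1)
idx = typical_trajectory(paths)
typical = paths[idx]
```

Plotting `typical` alongside the quantiles shows how jagged a single plausible future can be compared with the smooth median.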

Fig. 5

Life expectancy projections for Latvia. The plots include the UN projections and BHM median projections, with 80 % and 95 % prediction intervals. The past values of life expectancy used to estimate our model are shown by grey circles. Panel a shows projections to 2100 starting from 2005–2010. A typical stochastic trajectory is shown in black, illustrating the nonsmoothness of individual projections. Panel b shows out-of-sample projections starting from 1990–1995. Observed life expectancies for 1995–2005 are shown as squares. By 1995, Latvia had not yet recovered from its mortality crisis; the BHM projection intervals reflect uncertainty about a full recovery

Fig. 6

Life expectancy projections for Japan. The plots include the UN projections and BHM median projections, with 80 % and 95 % prediction intervals. The life expectancy values used to estimate our model are shown by grey circles. Panel a shows projections starting from 2005–2010 with a sample trajectory. National Institute of Population and Social Security Research (IPSS) medium-variant projections are the same as the UN projections, with uncertainty bounds indicated in the shaded region. We include a trajectory with a constant increase of 1.11 years per five-year period, as estimated by Oeppen and Vaupel (O & V) (2002) for the “best practices” country. A typical stochastic trajectory is shown in black. Panel b shows cross-validation projections starting from 1990–1995. Observed life expectancies for 1995–2005 are shown as squares

Mortality Crisis: Latvia

Figure 5 shows estimated and projected life expectancies for Latvia. Male life expectancy in Latvia increased from 62.5 years in 1950 to 66.3 years in 1965. In the subsequent 15 years, however, male life expectancy in Latvia decreased by 2.2 years, to 64.1 years. Life expectancy increased again until a 3.8-year decline was recorded between 1985–1990 and 1990–1995. Since then, life expectancy in Latvia has been increasing again. Both the UN’s and our median projections predict a continuous increase in life expectancy, but ours predict a slower increase.

As can be seen from panel b of Fig. 5, our 80 % prediction intervals capture the observed estimates of life expectancy for 1995–2005. For the first time period, 1995–2000, the upper bound of our 80 % prediction interval was 64.4 years. Yet, the lower bound of our 80 % prediction interval, 61.1 years, indicates that life expectancy may continue to decrease. In fact, our prediction intervals allow for the possibility of life expectancy not increasing for the following 50 years, reflecting the erratic progress over the previous 40 years.

Leading Country: Japan

One of the difficulties with projecting mortality is accurately projecting the country with the highest life expectancy. Historically, “pessimists” believed that life expectancy could not keep rising at historic rates and assumed that there must be a “ceiling” to life expectancy for humans (Fries 1980; Olshansky et al. 1990, 2001, 2002, 2005). “Optimists,” on the other hand, saw no evidence of a limit to increases in life expectancy (Cohen and Oppenheim 2012; Oeppen and Vaupel 2002; Tuljapurkar 2005; Tuljapurkar et al. 2000). Past estimates of the “maximum life expectancy” have continually been surpassed (Oeppen and Vaupel 2002), and old-age mortality rates continue to decline (Vaupel et al. 1998). Oeppen and Vaupel (2002) presented evidence that the world’s highest, or “best practices,” life expectancy at birth has increased linearly across time and shows no signs of leveling off. They estimated that the “best practices” life expectancy for males has increased at a rate of 1.11 years per five-year period. See Bongaarts (2006) for a review of the historical debate on life expectancy limits.

Although Japan does not currently have the highest male life expectancy (that title has belonged to Iceland since 2000), it has been the country with the highest overall life expectancy since 1980. Panel a of Fig. 6 plots male life expectancy in Japan. Also included in the plot is what the trajectory would be if male life expectancy in Japan increased at the “best practices” rate of 1.11 years per five-year period. Vallin and Meslé (2009) updated and expanded the data time period (from 1840–2000 to 1750–2005) for “best practices” life expectancy. They found that a segmented line fit the extended time frame better, with the most recent segment (1960–2005) still having a strong positive slope (1.13 years per five-year period for women), and concluded that the Oeppen-Vaupel line may be too optimistic for the long-term future. Our median projection increases more slowly than the Oeppen-Vaupel “best practices” linear projection, reflecting the fact that the “best practices” line is for the best country at each time, and thus is likely to increase faster than for an individual country. However, the “best practices” trajectory is just within the upper bound of our 80 % prediction interval.

Bongaarts (2006) also found the Oeppen-Vaupel “best practices” rate to be overly optimistic. By decomposing mortality into juvenile, background, and senescent mortality, he observed that historically large gains in life expectancy were due to declines in juvenile mortality. Then, as juvenile mortality reached low levels, the rate of gains in life expectancy diminished. The rationale for this decomposition is similar to that for the double-logistic function in which there are periods of high gains in life expectancy (i.e., when juvenile mortality is declining) followed by a leveling off of gains (i.e., when the gains in life expectancy are due to incremental declines in senescent mortality). Bongaarts (2006) found that senescent life expectancy in countries with low mortality, on average, increased at a rate of 0.75 years per five-year period. The average asymptotic rate of gains estimated in our model was 0.84 years per five-year period, which is closer to the Bongaarts (2006) projected gains in senescent life expectancy than the “best practices” rate of increase of about 1.11 years per five-year period.

Recently, the Japanese official projections made by the National Institute of Population and Social Security Research (IPSSR) extended the Lee-Carter method to provide more refined estimates of mortality at higher ages. The original Lee-Carter method estimated age-specific mortality rates for five-year age groups, with the last age group aggregating those 85 and older. More recently, the IPSSR used the shifting logistic model (Bongaarts 2005) to account for continued increases in life expectancy in Japan. IPSSR projections (low/medium/high rates of mortality decline variants, with the medium-variant being equivalent to the UN projections) (Kaneko et al. 2008) are included in panel a of Fig. 6. The IPSSR projections are more conservative and project an earlier leveling off of life expectancy than our projections, but they are still within our prediction intervals.

When looking at out-of-sample projections in panel b of Fig. 6, which begin in 1990–1995, we see that the UN out-of-sample projections suggested an immediate leveling off of life expectancy, unlike our projections. In fact, the observed life expectancy in 1995–2005 did not level off and instead continued to increase.

Other Countries

Updated projections of both male and female life expectancy for all the 158 countries analyzed in this article are shown in Online Resource 1.

Aggregation to Regional Projections: South Asia

Researchers at the International Institute for Applied Systems Analysis (IIASA) (Lutz et al. 2004) produced regional probabilistic prediction intervals for life expectancy using Delphi-type methods. A group of experts were asked to give 90 % prediction intervals for future life expectancy in each of 13 specified regions. Linear paths were then drawn from a normal distribution to produce probabilistic predictive distributions. This method uses demographic knowledge as its main input, whereas time-series methods (including ours) rely on past trends.

Country-specific projections allow regional projections to be made regardless of how the region is defined. To compare our projections with those of IIASA for South Asia, we aggregated the UN estimates and projections, as well as our projections, weighting country values proportionately to the male populations from 2005 to 2010 (see footnote 3).

The countries included in the IIASA-defined region of South Asia (with population percentage) are India (75.1 %), Pakistan (10.5 %), Bangladesh (9.9 %), Nepal (1.7 %), Afghanistan (1.5 %), Sri Lanka (1.3 %), Bhutan (0.04 %), and Maldives (0.02 %).
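The weighting step can be illustrated with a minimal sketch. The population shares are those listed above; the life expectancy values are hypothetical placeholders rather than estimates from this article, and the function name `weighted_regional_value` is introduced here only for illustration.

```python
# Population shares (%) of the IIASA-defined South Asia region, as listed above.
POP_SHARE = {
    "India": 75.1, "Pakistan": 10.5, "Bangladesh": 9.9, "Nepal": 1.7,
    "Afghanistan": 1.5, "Sri Lanka": 1.3, "Bhutan": 0.04, "Maldives": 0.02,
}


def weighted_regional_value(country_values, weights):
    """Population-weighted mean of country-level values."""
    total_weight = sum(weights[c] for c in country_values)
    weighted_sum = sum(country_values[c] * weights[c] for c in country_values)
    return weighted_sum / total_weight


# HYPOTHETICAL male life expectancies (years) for one period; illustrative only.
example_e0 = {
    "India": 63.0, "Pakistan": 64.0, "Bangladesh": 65.0, "Nepal": 65.5,
    "Afghanistan": 44.0, "Sri Lanka": 70.0, "Bhutan": 64.5, "Maldives": 70.5,
}

regional_e0 = weighted_regional_value(example_e0, POP_SHARE)
print(round(regional_e0, 2))
```

Because India carries about three-quarters of the weight, the regional value is dominated by the Indian value; the weights are normalized inside the function, so shares that do not sum exactly to 100 cause no problem.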

Our model takes no account of correlations between the random perturbations in life expectancy gains in different countries. Previous work (Alho 2008) has suggested that cross-country correlations are nonzero and should be modeled as such. Within South Asia, in the past 60 years, life expectancies were indeed highly correlated, as follows:

$$\left(\begin{array}{cccccccc}\text{Afghanistan} & .91 & .91 & .99 & .98 & .94 & .98 & .99 \\.91 & \text{Bangladesh} & .99 & .92 & .96 & .99 & .97 & .89 \\.91 & .99 & \text{Bhutan} & .92 & .96 & .99 & .96 & .89 \\.99 & .92 & .92 & \text{India} & .99 & .95 & .99 & .99 \\.98 & .96 & .96 & .99 & \text{Maldives} & .98 & .99 & .98 \\.94 & .99 & .99 & .95 & .98 & \text{Nepal} & .98 & .92 \\.98 & .97 & .96 & .99 & .99 & .98 & \text{Pakistan} & .97 \\.99 & .89 & .89 & .99 & .98 & .92 & .97 & \text{Sri Lanka}\\\end{array}\right).$$

However, for probabilistic projections, what matters are the cross-country correlations between the random perturbations (effectively the forecast errors) rather than between the life expectancies themselves. The hierarchical modeling of the change in life expectancy allows for between-country correlation in life expectancy. We are interested in the residual correlations in life expectancy gains, \(\uprho _{c_{i},c_{j}}\), between countries \(c_{i}\) and \(c_{j}\)—namely,

$$(\hat{\uprho}_{c_{i},c_{j}} ) = \left(\begin{array}{cccccccc}\text{Afghanistan} & -.28 & -.23 & .09 & 0 & -.07 & -.08 & .19 \\-.28 & \text{Bangladesh} & .05 & -.03 & .03 & 0 & .05 & -.19 \\-.23 & .05 & \text{Bhutan} & -.03 & .05 & .09 & .07 & .20 \\.09 & -.03 & -.03 & \text{India} & .16 & .03 & .03 & 0 \\0 & .03 & .05 & .16 & \text{Maldives} & .24 & .25 & .34 \\-.07 & 0 & .09 & .03 & .24 & \text{Nepal} & .08 & .05 \\-.08 & .05 & .07 & .03 & .25 & .08 & \text{Pakistan} & -.01 \\.19 & -.19 & .20 & 0 & .34 & .05 & -.01 & \text{Sri Lanka}\\\end{array}\right).$$

These correlations are much smaller than those between the life expectancies themselves, and in most cases are consistent with the absence of any correlation. Thus, our projection model has accounted for all or most of the between-country correlation in South Asia.

Country-specific projections were then made by sampling the vector of random perturbations, \((\updelta _{c_{i},t} )\), from a multivariate normal distribution whose covariance matrix incorporates the residual correlation matrix, \((\hat {\uprho }_{c_{i},c_{j}} )\). The aggregated projections for the South Asian region are shown in Fig. 7. The 2007 IIASA projections available on their website (Lutz et al. 2008) are also depicted. We found that our median projections were fairly similar to IIASA’s median projections, but our intervals were much sharper, ranging from 36 % to 72 % narrower than those of IIASA.
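As a rough illustration of the sampling step, the sketch below draws correlated normal perturbations for three of the eight countries using a Cholesky factor of the covariance matrix. The correlations are taken from the residual correlation matrix above (India, Maldives, Sri Lanka); the standard deviations are hypothetical, and the helper names are ours, not part of the published method or software.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_perturbations(corr, sds, rng):
    """One draw of zero-mean normal perturbations with the given correlations."""
    n = len(sds)
    # Covariance = D R D, with D the diagonal matrix of standard deviations.
    cov = [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


# Residual correlations for India, Maldives, Sri Lanka (from the matrix above).
CORR = [
    [1.00, 0.16, 0.00],
    [0.16, 1.00, 0.34],
    [0.00, 0.34, 1.00],
]
SDS = [0.5, 0.5, 0.5]  # HYPOTHETICAL standard deviations (years per period)

rng = random.Random(1)
draws = [sample_perturbations(CORR, SDS, rng) for _ in range(20000)]
```

In the full model the perturbation variance is not a fixed constant as assumed here, so this sketch shows only how the residual correlation matrix enters the draw.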

Fig. 7

Life expectancy projections for South Asia (IIASA-defined) for our BHM model, IIASA, and the UN. The median projections for BHM and IIASA are similar, but the IIASA 80 % interval is much wider than the BHM 80 % interval

Our approach yields stochastic trajectories that can fluctuate around the projected life expectancy. By contrast, the IIASA method samples random linear trajectories, which change at a constant rate throughout the projection period, with no fluctuations.
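The difference between the two kinds of trajectory can be seen in a deliberately simplified sketch. It is not the BHM itself: the double-logistic gain function is replaced by a constant mean gain (0.84 years per period, the average asymptotic rate reported above), and the starting value and standard deviations are hypothetical.

```python
import random


def fluctuating_trajectory(start, mean_gain, sd, n_periods, rng):
    """Each period's gain is the mean gain plus its own random perturbation."""
    path = [start]
    for _ in range(n_periods):
        path.append(path[-1] + mean_gain + rng.gauss(0.0, sd))
    return path


def linear_trajectory(start, slope_mean, slope_sd, n_periods, rng):
    """A single random slope is drawn once and held for the whole horizon."""
    slope = rng.gauss(slope_mean, slope_sd)
    return [start + slope * k for k in range(n_periods + 1)]


rng = random.Random(0)
N_PERIODS = 18  # five-year steps from 2005-2010 to 2095-2100

# Starting value 65.0 and both standard deviations are HYPOTHETICAL.
wiggly = fluctuating_trajectory(65.0, 0.84, 0.5, N_PERIODS, rng)
straight = linear_trajectory(65.0, 0.84, 0.3, N_PERIODS, rng)

wiggly_gains = [b - a for a, b in zip(wiggly, wiggly[1:])]
straight_gains = [b - a for a, b in zip(straight, straight[1:])]
```

The period-to-period gains in `straight_gains` are all equal, whereas those in `wiggly_gains` vary, which is the distinction drawn in the text.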

Discussion

We have proposed a Bayesian way to produce probabilistic projections of life expectancy for all the countries of the world. One possible use of these projections is as inputs to the UN’s population estimates and projections. We have shown through out-of-sample cross-validation that the method gives well-calibrated and sharp prediction intervals, and provides better forecasts than the current UN method.

We have restricted our analysis to countries without generalized HIV/AIDS epidemics because of the singular nature of their demographic impact, mainly on sexually active adults. To explore the possible applicability of our method to the excluded countries, we loosened the exclusion rule and fitted our model to all countries with a 2000–2005 HIV/AIDS prevalence rate of less than 4 % (\(n =\) 179), an additional 21 countries. In the 10-year cross-validation of these countries, the 80 % prediction intervals included the observed life expectancy 84 % of the time. Including the additional countries did not greatly degrade sharpness: the mean absolute error was 1.2 years, and the average 80 % prediction interval half-width was 1.9 years, compared with 1.0 and 1.7 years (respectively) without the HIV/AIDS countries. Further research is needed to generalize our model to countries with a generalized HIV/AIDS epidemic while properly accounting for the uncertainty in AIDS mortality, but these results suggest that the current model could provide reasonable approximate results.
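The three validation summaries quoted here (coverage, mean absolute error, and interval half-width) can be computed as in the following minimal sketch. All numbers in it are hypothetical held-out values chosen for illustration, and the function name `interval_metrics` is ours.

```python
from statistics import fmean


def interval_metrics(observed, lower, median, upper):
    """Return (coverage, mean absolute error, mean interval half-width)."""
    n = len(observed)
    covered = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    mae = fmean(abs(y - m) for y, m in zip(observed, median))
    half_width = fmean((hi - lo) / 2 for lo, hi in zip(lower, upper))
    return covered / n, mae, half_width


# HYPOTHETICAL held-out life expectancies and 80 % intervals (years).
observed = [64.0, 71.2, 57.9, 76.5, 68.0]
median = [63.1, 71.0, 60.2, 76.0, 69.5]
lower = [61.4, 69.3, 58.5, 74.3, 67.8]
upper = [64.8, 72.7, 61.9, 77.7, 71.2]

coverage, mae, half_width = interval_metrics(observed, lower, median, upper)
print(coverage, round(mae, 2), round(half_width, 2))
```

A well-calibrated 80 % interval should cover close to 80 % of held-out observations; among calibrated methods, a smaller half-width indicates a sharper forecast.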

Our method projects male life expectancy only. Further research is needed to apply this model to life expectancy at birth among males and females simultaneously while ensuring that trajectories by sex do not diverge or cross. Lee and Carter (1992) suggested doing this by modeling the two sexes independently and introducing a new parameter to ensure that the stochastic trajectories do not cross or diverge. Such a model could also be made more complex by allowing the double-logistic parameters for the two sexes to be correlated. An alternative approach would be to combine two stochastic models: one model for one sex (as here) or for average life expectancy, and a second model for the gap between the sexes.

Our model assumes that the rate of increase of life expectancy will decline in the future but that gains will continue, asymptotically, at a linear rate, on average. The model could be modified to allow the rate of increase to decline to zero over time, but experience to date does not provide much support for this (Oeppen and Vaupel 2002).

Modifying the model this way could also be dangerous. Olshansky et al. (2009) argued that current official forecasts of life expectancy by the U.S. Social Security Administration and the U.S. Census Bureau may be underestimated, by about three years for males and eight years for females. They estimated that this discrepancy could cost as much as $3 to $8 trillion more than currently projected for Medicare and Social Security. The current official government forecasts of U.S. male life expectancy in 2050 are 80–81 years, whereas Olshansky et al. (2009) projected 83–86 years under their two main scenarios, with a range of 78–113 years for their more extreme scenarios. Our BHM projections are close to theirs, with a median of 84.5 years and an 80 % prediction interval of 82.2–86.4.

Our model estimates a different asymptotic average rate of increase for each country, and so it is theoretically possible that the gaps between countries could grow over time, which might be viewed as unrealistic. Over our 90-year projection period, however, this does not happen, and in fact the opposite occurs. The between-country standard deviation of the country-specific posterior predictive medians declines from 6.4 in 2005–2010 to 6.1 for 2095–2100. (For comparison, the between-country standard deviation of observed life expectancies in 2000–2005 was 6.5.) This is because the Bayesian hierarchical model assumes that each country’s average rate of improvement converges to an asymptote as life expectancy increases, and finds the asymptotes for different countries to be similar (the standard deviation of the posterior medians of the asymptotes is only 0.05 years per five-year period). This convergence effect dominates the small growth in the gap due to differences in the estimated asymptotes over the projection period.

Much other research has been done on the forecasting of mortality (e.g., Booth 2006; Dowd et al. 2010). These efforts, however, have focused on developed countries, for which reliable age-specific data are available. The best known time-series method for forecasting age-specific mortality rates is the Lee-Carter method and its various parallels (e.g., Renshaw and Haberman 2006), generalizations (e.g., Brouhns et al. 2002; de Jong and Tickle 2006; Hyndman and Ullah 2007; Koissi et al. 2006; Pedroza 2006), and extensions (e.g., Ishii 2008; Li and Lee 2005; Li et al. 2004). As discussed earlier, the empirical evidence about the Lee-Carter method’s performance for forecasting life expectancy is mixed.

Other time-series methods to estimate and project age-specific mortality rates have been proposed. The Brass relational method fits a two-parameter model in which the age-specific mortality rates are assumed to be given by a linear function of a user-chosen model life table on a logit scale (Brass 1971). The Heligman–Pollard model is an eight-parameter model with three parts describing mortality in childhood, young adulthood, and later life (Heligman and Pollard 1980). Although both models have fit mortality data well (Hartmann 1987; Keyfitz 1991), difficulties may arise in projecting the parameters (Keyfitz 1981). Like the Heligman–Pollard model, the Bongaarts shifting logistic model (Bongaarts 2005) differentiates mortality at different ages by fitting a three-parameter logistic model and fixing the slope parameter across time while allowing the other two parameters to vary with time. The shifting logistic model, however, focuses on senescent mortality and may be most relevant when infant/child and adult mortality are already negligible. Other models have focused on senescent mortality in terms of biological and evolutionary phenomena that incorporate the idea of heterogeneity in the population and frailty (Steinsaltz and Wachter 2006; Yashin et al. 2000). These approaches require age-specific data.
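To make the Brass relational idea concrete, the sketch below uses its usual formulation in terms of survivorship, \(Y(x) = \alpha + \beta Y_{s}(x)\) with \(Y(x) = \tfrac{1}{2}\ln [(1 - l(x))/l(x)]\), where \(l_{s}(x)\) is the standard schedule. The standard survivorship values and the parameter values are hypothetical, and the function names are ours.

```python
import math


def brass_logit(l_x):
    """Brass logit of survivorship l(x): 0.5 * ln((1 - l) / l)."""
    return 0.5 * math.log((1.0 - l_x) / l_x)


def inverse_brass_logit(y):
    """Survivorship implied by a Brass logit value."""
    return 1.0 / (1.0 + math.exp(2.0 * y))


def relational_survivorship(standard_lx, alpha, beta):
    """Apply Y(x) = alpha + beta * Y_s(x) to a standard survivorship schedule."""
    return {
        age: inverse_brass_logit(alpha + beta * brass_logit(l_s))
        for age, l_s in standard_lx.items()
    }


# HYPOTHETICAL standard schedule: proportion surviving to each exact age.
STANDARD_LX = {1: 0.95, 5: 0.93, 20: 0.91, 40: 0.86, 60: 0.72, 80: 0.30}

same = relational_survivorship(STANDARD_LX, alpha=0.0, beta=1.0)
lower_mortality = relational_survivorship(STANDARD_LX, alpha=-0.3, beta=1.0)
```

With \(\alpha = 0\) and \(\beta = 1\) the standard is reproduced; a negative \(\alpha\) raises survivorship at every age, which is why projecting mortality with this model reduces to projecting the two parameters.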

Torri and Vaupel (2012) proposed a model for forecasting life expectancy in which the life expectancy for the leading country is modeled by an autoregressive integrated moving average (ARIMA) model with constant drift, and the gap between the life expectancy for any given country and the leading country is modeled by a geometric Brownian motion model, estimated separately for each country. They applied their method to Italy and the United States.

Instead of modeling age-specific mortality rates, Gage (1993) used a competing hazards model originally developed by Siler (1979) for animal mortality. The five-parameter model describes the hazard function as made up of three components: immature (or childhood), residual (or background), and senescent mortalities. Bongaarts (2006) also suggested decomposing mortality into these three components with future projections focusing on the senescent mortality component. Both approaches require cause-of-death data, and Gage (1993) acknowledged that this is a limitation because of issues of data quality and cause-of-death classification.
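The three-component structure can be sketched using the standard form of the Siler hazard, \(h(x) = a_{1} e^{-b_{1} x} + a_{2} + a_{3} e^{b_{3} x}\). The parameter values below are hypothetical and chosen only to give a plausible human-like shape; they are not estimates from Gage (1993) or from this article.

```python
import math


def siler_hazard(age, a1, b1, a2, a3, b3):
    """Immature (declining) + residual (constant) + senescent (rising) hazard."""
    return a1 * math.exp(-b1 * age) + a2 + a3 * math.exp(b3 * age)


def siler_survival(age, a1, b1, a2, a3, b3):
    """Survival to `age`: exp of minus the integrated hazard (closed form)."""
    cumulative = (
        (a1 / b1) * (1.0 - math.exp(-b1 * age))
        + a2 * age
        + (a3 / b3) * (math.exp(b3 * age) - 1.0)
    )
    return math.exp(-cumulative)


def life_expectancy_at_birth(params, max_age=120.0, step=0.1):
    """Trapezoidal integral of the survival curve from 0 to max_age."""
    n_steps = int(round(max_age / step))
    total = 0.0
    for i in range(n_steps):
        s0 = siler_survival(i * step, *params)
        s1 = siler_survival((i + 1) * step, *params)
        total += 0.5 * (s0 + s1) * step
    return total


# HYPOTHETICAL parameters (a1, b1, a2, a3, b3); illustrative only.
PARAMS = (0.02, 1.0, 0.001, 0.00005, 0.09)
e0 = life_expectancy_at_birth(PARAMS)
print(round(e0, 1))
```

Lowering `a1` mimics declines in childhood mortality, whereas lowering `a3` or `b3` mimics declines in senescent mortality, which is the component on which Bongaarts (2006) suggested focusing future projections.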

In addition to time-series approaches, there are two other main approaches to developing probabilistic projections (Lee 1998). As previously discussed, Lutz and colleagues at the IIASA (Lutz et al. 1998, 2004, 2008) produced expert-based probabilistic projections. This method, however, does not explicitly rely on the use of available data, instead relying on a collection of experts and their ability to specify probabilistic bounds. The other main alternative to time-series methods is ex-post analysis of previous projections (Keyfitz 1981; Smith and Sincich 1990; Stoto 1983). In this method, the errors of previous forecasts are used to construct probabilistic error bounds for future projections.
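The ex-post idea can be illustrated with a minimal sketch that turns the empirical distribution of past forecast errors into an 80 % interval around a new point forecast. The error values and the point forecast are hypothetical, the function name is ours, and published ex-post methods typically condition on forecast horizon, which this sketch ignores.

```python
from statistics import quantiles


def ex_post_interval(point_forecast, past_errors):
    """Empirical 80 % interval from past errors (observed minus forecast)."""
    deciles = quantiles(past_errors, n=10)  # nine cut points: 10th to 90th percentile
    return point_forecast + deciles[0], point_forecast + deciles[-1]


# HYPOTHETICAL past forecast errors (years) at a fixed horizon.
PAST_ERRORS = [-2.1, -1.4, -0.9, -0.5, -0.2, 0.1, 0.4, 0.8, 1.3, 1.9, 2.6]

lower, upper = ex_post_interval(70.0, PAST_ERRORS)
print(round(lower, 2), round(upper, 2))
```

The approach assumes that future errors will resemble past ones, so its intervals are only as informative as the archive of earlier projections.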

Several other Bayesian approaches have been proposed. As already discussed, Girosi and King (2008) proposed a Bayesian method that can incorporate covariates. They showed that, on average, their method outperformed the Lee-Carter method (without covariates) for 48 countries with better mortality data. This result was obtained when covariates were used, but this requires additional data that may not be reliable or available in many countries. They did not give probabilistic forecasts, and their software does not produce them, although their method may in principle be able to provide them.

Czado et al. (2005) presented a Bayesian method for estimating the Poisson log-bilinear formulation of the Brouhns et al. (2002) Lee-Carter model. Pedroza (2006) proposed a Bayesian approach to the Lee-Carter model by accounting for the uncertainty in the age parameters as well as the mortality index usually forecasted. While the latter two approaches account for uncertainty in the Lee-Carter model, their generalization to all countries is hindered by the nonavailability of age-specific mortality rates.