1 Introduction

Even before the significant global mortality shock of the COVID-19 pandemic in 2020 [63], a slowdown in the rates of mortality improvements had been seen in several developed countries. For example, in the 20 years between 1991 and 2011, life expectancy at birth for males in England and Wales grew by almost 5 years and by more than 4 years for females and then, in 2015, a sharp spike in the number of deaths, especially among older people, resulted in an unprecedented fall in life expectancy [25, 26]. This phenomenon was seen across several European countries and also in the USA [6, 44, 46].

In the first published analyses of this phenomenon, different viewpoints emerged. Ho and Hendi [29] took the view that, in contrast to the USA, the simultaneous spike in mortality in 2015 in several high-income European countries was largely attributable to influenza. However, Hiam et al. [25, 26] focused on the high rates of mortality in the elderly over 75 in England in 2015 and attributed these to the negative consequences of the UK Government’s austerity policies and reductions in the funding of health and care services [41]. The ‘austerity’ explanation was subsequently challenged, mainly on the grounds that association does not prove causation. Furthermore, in the UK, pensioners were being better protected from spending cuts than other age groups, with state pension guaranteed to increase by the greatest of average earnings, consumer price index and \(2.5\%\) p.a. [19].

Subsequent analyses of data for England by the Continuous Mortality Investigation (for example, [9]) have focused on the underlying mortality improvement rates (the annual relative change over time in age-specific mortality rates). These analyses show that the stagnation in rates of mortality improvement was observable across all levels of socioeconomic class and deprivation [9, 47]. This contrasts with the USA, which has experienced a real fall in life expectancy at birth for two consecutive years (2015 and 2016). However, the fall in the USA has been driven by increases in mortality in a specific range of ages and for a specific sub-population, namely middle-aged, low educated, white adult men [6]. In England, mortality improvements have slowed across all socioeconomic strata as well as at older ages, with the greatest mortality slowdown for the most deprived groups. Thus, in England, mortality inequalities have widened further for those aged 65 and over [7, 9, 36, 40, 47].

Analysis of mortality for England and Wales by cause of death points to a reduction in improvement in mortality from cardiovascular disease (coronary heart disease and stroke) among those aged 65 and older as the main driver [49]. Reductions in cardiovascular disease mortality, which is the leading cause of death, have historically driven improvements in life expectancy and any change in these rates has a large impact on trends.

Over the same period, deaths attributed to dementia have increased sharply from about 2006 onwards. With more people surviving to old age, an increase in deaths from dementia is to be expected. However, changes to the coding of the underlying causes of death with the introduction of ICD10 and its revisions in 2011 and 2014 (e.g. all vascular dementias previously coded to circulatory diseases are now being coded to dementia), alongside a greater awareness of the condition by certifying doctors, may have also contributed to the sharp rise in dementia mortality rates. Changes in the classification of causes of death do not affect the trends in overall deaths. However, changes in prevention and treatment of dementia, which delay subsequent mortality, are likely in the future.

The slowdown in the rates of mortality improvements observed in the UK is not unique and similar changes have been seen in other European countries. Six of the largest EU countries (France, Germany, Italy, Poland, Spain, UK) have seen a fall in life expectancy for both men and women between 2014 and 2015, with female life expectancy at birth falling in 23 of the 28 EU countries, while male life expectancy at birth has fallen in 16 EU countries [46, 50]. The European mortality monitoring network has attributed excess mortality in Europe in the particular winters of 2015, 2016 and 2017 to flu, and the particular strain prevalent, A(H3N2), noting the low efficacy of the relevant vaccine despite a high take-up. There is agreement that flu has played a role in the increased volatility of recent mortality trends in many countries. But the significance of its role in the general slowdown in mortality improvement in the EU countries is unclear.

Reports regarding the over 65s in the Netherlands [58] and in Germany [62] also show that inequalities in life expectancy by socio-economic status measures have widened in recent years. However, more in-depth analysis of mortality patterns by cause of death or inequalities has not so far been conducted.

Nevertheless, several analyses (using different start-end dates, different summary measures of mortality, different age-ranges, different sets of comparator countries) have all concluded that, among high income countries other than the US, the greatest slowdown in the rates of mortality improvements has been in the UK [38]. For example, Leon et al. [38] compare mortality trends for England and Wales with those of the median of a group of 22 comparator high income countries. They find that England and Wales mortality rates at ages 25–49 are “appreciably higher” than in the comparator group, that the trend in life expectancy for England and Wales since 2011 is among the worst performing and that this is particularly the case for women.

The comparative country analyses of Leon et al. [38] and Raleigh [50] provide important insights which challenge some simplistic hypotheses, e.g.:

  • that countries with the highest life expectancy would experience the slowest improvements if the slowdown was because limits to mortality improvements were being approached, but this has not been so for Switzerland and Japan, which have not experienced a slowdown at all;

  • that the larger the rates of improvement in the years prior to 2010, the greater the slowdown—but this has not been so in Italy with its high and steady rates of improvement in the first decade of this century;

  • the higher the level of austerity, the bigger the reduction—Greece and Spain have experienced rising life expectancy despite having experienced higher levels of austerity cuts than the UK; and

  • the differences in the gender specific trends within countries remain unexplained.

While policymakers are alert to the need to reverse the adverse trends in mortality, it remains unclear what levers to pull. In order to inform policies, we need a much better understanding of the drivers of the recent change in rates of mortality improvement. It remains uncertain if the flattening out in the downward trend in mortality rates will persist—while some recent reports have suggested an upturn in mortality improvement trends in 2018 and 2019 [38, 45], 2020 and 2021 have been marked globally by the negative mortality shock caused by COVID-19.

We believe that a robust and detailed international comparison of mortality improvement trends across ages and genders would play a useful role in informing policymakers and the insurance and pensions industry. For example, if many other developed countries have not experienced a slowdown in mortality improvement recently, so that the UK and USA are outliers, then this may suggest that there is scope for a reversal of the recent slowdown in mortality improvement rates in these two countries. This could happen through policy changes, for example through more targeted health improvements in the more deprived population, or improved protection from a surge in winter deaths among the frail elderly.

Most of the published work on the recent slowdown in mortality trends has come from the disciplinary perspectives of epidemiology, public health or demography. Researchers have tended to express their results in summary statistics such as life expectancy or standardised mortality rates. While these indices are effective in summarising trends and communicating emerging issues to the public, they lack the granularity required to understand what is happening at individual ages and to assist the insurance and financial sectors in the calculations of reserves (and prices) for longevity risks for portfolios of annuities or pensions.

A deeper understanding of stagnating mortality patterns is important for several reasons. From a policy viewpoint, it is important to understand potential drivers of the slowdown to inform policies to reverse these trends in order to better implement and target health and social policies. Furthermore, mortality forecasts employed by many statistical institutions (and other users like the insurance industry) use linear extrapolative methods [59], based on time series, which may struggle to provide reliable forecasts when there are structural changes and when departures from a linear trend are observed, for example during periods of a slowdown in the trend. A better understanding of stagnating patterns could be used potentially to improve, or challenge, existing forecasting methodologies.

In this paper, we have attempted to fill this gap by analysing the historical mortality trends of 21 developed countries by age and gender to understand better the dynamic of the slowdown in improvement in mortality, and, in some cases the rise in mortality, which has been observed. The detailed analysis and modelling of historic mortality data for individual ages and calendar years since 1965 (from the Human Mortality Database) will enable us to understand more fully the dynamic of the slowdown in mortality improvement rates, to quantify the extent to which current mortality patterns deviate from what we might have expected and to analyse in detail the differences in the underlying trends in mortality rates across countries.

We first describe the historic mortality improvement rates for men and women in each of these countries for ages 50–95, and then analyse mortality improvement trends using stochastic mortality models which have been designed to fit the experience over time for this age group.

Specifically, we have used stochastic models to analyse the historic mortality trends of various countries between 1965 and 2010 and to project trends from 2011 onwards. We pose two important questions:

  • Given the historical trends, would we have forecast any slowdown in mortality improvement rates since 2011, when compared with the recent past?

  • What has happened since 2011 when compared with the forecasts?

The answer to the first question is: Yes, for many countries and with a gender difference. When we look at the period from 2000, the forecast mortality improvement rates in 2011–2017 are lower (by a threshold of \(0.25\%\)) than the actual improvement rates in the preceding decade among men in 16 countries and among women in 8 countries. These results are consistent with hypotheses that stalling mortality improvements emerged before 2010. Examples of such hypotheses include unfavourable trends in obesity, diabetes, cardiovascular-related deaths and dementia deaths.

Regarding the second question, we compare the forecasts with actual mortality improvement rates since 2011. Again, we observe a gender difference. Women in 18 countries but men in 8 countries have experienced lower mortality improvements than projected during 2011–2017 (by a threshold of \(0.25\%\)). For women, Greece, Italy and Spain are the 3 worst performing countries by this measure; for men, it is Taiwan, Germany and the UK. This observation, at least for some countries, is consistent with suggestions that austerity and the unusually high winter deaths during this period may have adversely affected mortality trends. Some of the Scandinavian populations have bucked the stalling mortality improvement trend, and have experienced higher mortality improvement rates than the projections.

In conclusion, we find that part of the slowdown in mortality improvement rates of the over 50s since 2011 would have been expected from historical trends in many countries, especially among men. There has been a notable slowdown, compared with the model forecasts, since 2011 in many countries, especially among women. However, there are some countries with higher mortality improvement rates than projected. A better understanding of the drivers behind these complex trends will inform health and social policies.

The paper is divided into the following sections. In Sect. 2, we describe the data and its source. In Sect. 3, we present a description of the methodology that we have used. Here, we describe our methodology for fitting the stochastic mortality models, for choosing the “best” models for each combination of gender and country, and for forecasting. In Sect. 4, we present our main results. In Sect. 5, we discuss the results, provide some concluding comments and suggestions for future work.

2 Data

Twenty-one countries are considered in this analysis. Countries with large populations, developed economies and good quality data have been chosen. We obtained data from the Human Mortality Database (HMD) [31]. Whilst HMD provides data by single years of age from 0–105 for all countries, the range of calendar years available varies by country: see Table 1. Since the focus of the analysis is on modelling recent mortality trends, data prior to 1965 have been discarded.

Previous studies have shown that the stagnation in mortality improvement rates has been particularly marked at older ages, and hence we have limited our analyses to trends in mortality rates and mortality improvement rates by single year of age from 50 to 95, by individual calendar year, and by gender for each country. We note that data availability at very advanced ages (e.g. beyond age 100) is sparse in many countries, and, hence, we have restricted the analysis to ages up to 95.

Table 1 Data range available in the Human Mortality Database for selected countries

3 Methods

3.1 Descriptive analysis of trends

We first present a descriptive analysis of the observed trends in directly age standardised mortality rates (ASMR) in the 21 selected countries. The ASMR for a country in a particular year is derived by applying its observed age-specific mortality rates to the corresponding age distribution in a standard reference population. The reference population is the same for each country and in each year. The reference population used as the standard in this study is the population aggregated by single year of age of persons aged 50–95 across these selected countries in 2010. We found that the overall shape of this reference population across ages is similar to that of the European Standard Population, 2013, as illustrated by Fig. 8 in the Appendix.

Thus, the analysis of the trends in ASMR controls for the effect of differential age structures over time and between countries, with annual changes in ASMRs being wholly attributable to changes in observed mortality rates rather than changes in the population age structure. We have also used the same standard reference population to calculate ASMRs by gender, to allow like for like comparisons of ASMRs by sex.
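
The direct standardisation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the function name `direct_asmr` and the three-age inputs are hypothetical, and the only assumption is that the ASMR is the reference-population-weighted average of the age-specific rates.

```python
from typing import Sequence


def direct_asmr(rates: Sequence[float], ref_pop: Sequence[float]) -> float:
    """Directly age-standardised mortality rate.

    rates   -- observed age-specific mortality rates, one per age
    ref_pop -- reference population counts at the same ages
    """
    if len(rates) != len(ref_pop):
        raise ValueError("rates and ref_pop must cover the same ages")
    total = sum(ref_pop)
    if total <= 0:
        raise ValueError("reference population must be positive")
    # Weighted average of the rates; the weights are fixed by the reference
    # population, so they do not change between countries or years.
    return sum(m * p for m, p in zip(rates, ref_pop)) / total


# Hypothetical three-age illustration (not HMD data).
ref_pop = [500.0, 300.0, 200.0]
rates_a = [0.010, 0.020, 0.040]
rates_b = [0.010, 0.020, 0.030]  # lower mortality at the oldest age only

asmr_a = direct_asmr(rates_a, ref_pop)
asmr_b = direct_asmr(rates_b, ref_pop)
```

Because both calls share `ref_pop`, the difference between `asmr_a` and `asmr_b` reflects the difference in mortality rates alone, which is the property relied on in the comparisons across countries and over time.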

3.2 Stochastic mortality models

We then use stochastic mortality forecasting models to explore in more detail the extent of the slowdown of mortality improvements over the most recent years for the countries being considered. This process involves two stages. At the first stage, a number of stochastic mortality forecasting models are calibrated to each population by gender, excluding the most recent years (e.g. using data up to 2010 only). For each country–gender combination, we identify the models that best describe the trends for the period 1965–2010. At the second stage, the calibrated models from the first stage are used to forecast mortality rates over the most recent years (e.g. 2011–2017). The forecast mortality rates and resulting improvement rates are then compared in detail against the observed experience (post 2010).

Although we model trends in mortality rates, we analyse the results in terms of the underlying mortality improvement rates (MIRs). There has been growing interest recently among researchers in using mortality improvement rates as an effective tool for presenting and modelling mortality trends—see, for example, [2, 22, 23, 27, 43, 52]. The mortality improvement rate at age x in calendar year t, which we denote by \(\text {MIR}_{x,t}\), is given by

$$\begin{aligned} \text {MIR}_{x,t} = 1-\dfrac{m_{x,t}}{m_{x,t-1}} \end{aligned}$$
(1)

where \(m_{x,t}\) represents the central mortality rate for age x in calendar year t.
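
As a small worked illustration of Eq. (1), the sketch below computes yearly improvement rates from a series of mortality rates at a single age and averages them. The function names and the input values are hypothetical; the simple arithmetic mean is an assumption made for illustration.

```python
def improvement_rates(m):
    """Yearly improvement rates 1 - m[t] / m[t-1] for a series of rates m."""
    if len(m) < 2:
        raise ValueError("need at least two years of rates")
    return [1.0 - m[t] / m[t - 1] for t in range(1, len(m))]


def average_improvement(m):
    """Arithmetic mean of the yearly improvement rates."""
    mirs = improvement_rates(m)
    return sum(mirs) / len(mirs)


# Hypothetical central mortality rates at one age over four calendar years.
rates = [0.0200, 0.0190, 0.0190, 0.0171]
mirs = improvement_rates(rates)      # one value per year after the first
avg_mir = average_improvement(rates)
```

A positive value indicates falling mortality; a year with no change in the rate gives an improvement rate of zero.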

In order to choose the best fitting model, we have considered nine widely used mortality projection models from the literature. These models are the Age–Period–Cohort (APC) model [12], Lee–Carter (LC) model [37], Renshaw–Haberman (RH) model [51], two-factor Cairns–Blake–Dowd (CBD5) model [3] and its extensions (CBD6, CBD7, CBD8: [4]), Plat model [48], and two-dimensional P-spline model [11]. Many of these can be fitted to data through R packages [5, 33, 60, 61]. The nine models have been fitted to each country separately for males and females using data for ages 50 to 95 and for calendar years 1965 to 2010. For some countries, however, data are available for later years only (e.g. from 1981 for Greece, 1990 for Germany) and, in these cases, the models are calibrated using the data available from that starting year up to 2010.

An outline of the mathematical structure of these models is given in Table 2. Each model provides a mathematical formulation of the central mortality rate, \(m_{x,t}\), as a function of age, x, and calendar year, t.

Table 2 Mortality projection models implemented in this analysis; \(m_{x,t}\) denotes the central mortality rate at age x and calendar year t

In terms of broad structure, the LC model is a non-linear model where the terms on the right-hand side can be regarded as representing the sum of an average age effect, \(\alpha _x\), and a term that represents a product of a time trend, \(\kappa _t\), and an age gradient, \(\beta _x\). The RH model additionally includes a term that represents a cohort effect relating to year of birth \(t-x\). The APC model is a specific case of the RH model. These three models are from the same family. Next, the CBD models also form a family and take advantage of the finding that, at the older ages, the data often suggest that the structure of the model may be simplified with linear predetermined terms replacing the estimated age gradient term \(\beta _x\). CBD5 is the simplest with a linear age term. CBD6 includes a cohort term like RH, and the Plat model adapts CBD6 by including \(\alpha _x\). CBD7 includes a quadratic age term and CBD8 is a more complex version of CBD6. The two-dimensional P-splines model is a different form of model and uses a combination of penalised B-splines to provide a smooth representation of the mortality surface viewed as a function of age and time.

3.2.1 Fitting the models and forecasting

The models in Table 2 are fitted using the methodology of constrained and penalised generalized non-linear models [13, 17, 32, 60] under the Poisson assumption about the distribution of death counts:

$$\begin{aligned} D_{x,t} \sim \text {Poisson}(E_{x,t} \times m_{x,t}), \end{aligned}$$
(2)

where \(D_{x,t}\), \(E_{x,t}\) and \(m_{x,t}\) represent the death counts, exposed-to-risk and central mortality rates at age x and calendar year t.

Among these models, the simplest to fit is CBD5. Indeed, once the death and exposure data are specified, the two unknown period components \(\kappa _{0,t}\) and \(\kappa _{1,t}\) in CBD5 can be fitted using standard functions designed to fit generalized linear models.

Apart from CBD5 and P-splines, the seven other models are not identifiable. Thus, different sets of parameter estimates can yield the same estimate of the mortality rates. Looking at the LC model for example, the two sets of parameters \((\alpha _x,\beta _x, \kappa _t)\) and \((\alpha _x-a\beta _x,\beta _x, \kappa _t+a)\) yield identical fitted values of the mortality rates \(m_{x,t}\) for any value of a. The modern approach to managing non-identifiability is to impose appropriate constraints on the model parameters. The identifiability constraints used in this work are summarised in Table 30 in the Appendix [4, 32].
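
The invariance in the LC example can be checked numerically. The sketch below assumes the usual LC form \(\log m_{x,t} = \alpha _x + \beta _x \kappa _t\); the parameter values and the helper `lc_log_rates` are hypothetical and chosen only for illustration.

```python
def lc_log_rates(alpha, beta, kappa):
    """Log central mortality rates under LC: alpha[x] + beta[x] * kappa[t]."""
    return [[a + b * k for k in kappa] for a, b in zip(alpha, beta)]


# Hypothetical parameter values for three ages and four calendar years.
alpha = [-5.0, -4.0, -3.0]
beta = [0.2, 0.3, 0.5]
kappa = [3.0, 1.0, -1.0, -3.0]

a = 2.5  # arbitrary shift applied to the period index
alpha_shifted = [al - a * b for al, b in zip(alpha, beta)]
kappa_shifted = [k + a for k in kappa]

original = lc_log_rates(alpha, beta, kappa)
shifted = lc_log_rates(alpha_shifted, beta, kappa_shifted)

# Largest absolute difference between the two fitted surfaces.
max_gap = max(
    abs(u - v)
    for row_u, row_v in zip(original, shifted)
    for u, v in zip(row_u, row_v)
)
```

Here `max_gap` is zero up to floating-point rounding although the two parameter sets differ, which is why a constraint is needed to pin down the shift a. One common choice in the literature, which may differ from the constraints used in this work, is to require the \(\kappa _t\) to sum to zero.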

Further, the five models containing an explicit cohort component (i.e. CBD6, CBD7, CBD8, RH, Plat) involve extra complexity in terms of the estimation of the cohort parameters. The youngest and oldest cohorts in the data have too few observations for the parameters to be estimated reliably. Including these cohorts without adjustment would then yield estimates with very high levels of uncertainty. We address this problem by constraining the estimated cohort components of the four youngest cohorts (i.e. born from 1957 to 1960) to be identical; and apply the same constraint to the four oldest cohorts (i.e. born from 1870 to 1873).
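
The sparsity at the edges of the cohort range follows directly from the data window. The sketch below counts how many (age, year) cells fall in each year of birth \(t-x\) for ages 50–95 and calendar years 1965–2010; the variable names are illustrative.

```python
from collections import Counter

ages = range(50, 96)        # ages 50 to 95 inclusive
years = range(1965, 2011)   # calendar years 1965 to 2010 inclusive

# Number of data cells observed for each year of birth t - x.
obs_per_cohort = Counter(t - x for x in ages for t in years)

oldest = min(obs_per_cohort)    # earliest year of birth in the data
youngest = max(obs_per_cohort)  # latest year of birth in the data

# Cell counts for the four youngest and the four oldest cohorts.
youngest_counts = [obs_per_cohort[c] for c in range(youngest - 3, youngest + 1)]
oldest_counts = [obs_per_cohort[c] for c in range(oldest, oldest + 4)]
```

The cohorts born in 1870 and 1960 each contribute a single cell, and the four cohorts at either end contribute at most four cells each, compared with 46 cells for cohorts in the middle of the range.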

On fitting the models, the period and cohort components are projected and combined to obtain forecasts of the age-specific mortality rates. The period components are forecast using a multivariate random walk model with drift, and the cohort components are forecast using ARIMA models [3, 53]. In order to fit an ARIMA model, one must first choose the differencing order (d), the autoregressive order (p), and the moving average order (q). For each value of \(d \in \{0,1\}\), the autoregressive and moving average orders were optimised (over \(p \in \{0,1,2,3\}\) and \(q \in \{0,1,2,3\}\)) using the Akaike Information Criterion (AIC) with a correction for small sample size. More details on model selection are provided below.
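
To make the period-index projection concrete, the following sketch gives the central forecast of a random walk with drift for a single period index. It is a univariate simplification of the multivariate model described above, and the function name and input values are hypothetical.

```python
def rw_drift_forecast(kappa, horizon):
    """Central forecast of a univariate random walk with drift.

    The drift is estimated as the mean of the first differences, which
    equals (last value - first value) / (number of steps).
    """
    if len(kappa) < 2:
        raise ValueError("need at least two fitted values")
    drift = (kappa[-1] - kappa[0]) / (len(kappa) - 1)
    # Each step ahead adds one more unit of drift to the last fitted value.
    return [kappa[-1] + h * drift for h in range(1, horizon + 1)]


# Hypothetical fitted period index over four calendar years.
kappa_fitted = [10.0, 8.0, 7.0, 4.0]
kappa_forecast = rw_drift_forecast(kappa_fitted, horizon=3)
```

The central forecast is a straight line continuing the average historical slope, which is why this family of forecasts can struggle when the recent trend departs from the long-run one.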

Although the P-splines model does not require constraints or contain explicit cohort components, care is needed regarding the underlying smoothing methodology. Indeed, the performance of this method is driven by a number of parameters, the most influential ones being the spacing of the knots, the penalty function and the smoothing parameters. For the P-splines model, we use cubic B-splines with a 5-year knot spacing in age and in time, with a second order difference penalty function [11]. Following Djeundje and Currie [16], we have used the adjusted version of the Bayesian Information Criterion (BIC—see below) that takes into account over-dispersion when selecting the optimal values of the smoothing parameters.

One attractive feature of P-splines is that the penalty function allows us to fit the model and forecast simultaneously. The choice of penalty function has a significant impact on the direction of the forecast. In practice, however, the second order difference penalty is known to yield forecasts that fit well with the observed data provided that care is taken to avoid under-smoothing [16].

3.2.2 Model comparison and selection

On calibrating the nine mortality projection models (for each country–gender combination), we select three models for each population, in order to provide scope for analysing the sensitivity of the results to model choice and to reduce the impact of model error.

In general, the selection of models that aim at forecasting future trends involves a number of steps. First, a good model should fit the historic data well. Adherence of models to the data can be compared using statistical metrics such as the deviance, or by analysing plots of the residuals. However, by focusing only on this requirement, the most complex models with many parameters will tend to be preferred. In order to provide a reasonable balance between the conflicting characteristics of fidelity to the data and parsimony, we use the BIC:

$$\begin{aligned} \text {BIC} = \text {Dev} + \log (n) \times \text {ED}, \end{aligned}$$
(3)

where n represents the sample size, Dev is the deviance of the fitted model, and ED is the effective dimension of the model. Models with lower BIC tend to provide a good balance between adherence to the data and simplicity.
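
A minimal sketch of Eq. (3) under the Poisson assumption of Eq. (2) is given below. It uses the standard Poisson deviance, with fitted deaths equal to exposure times fitted rate; the function names and the toy inputs are hypothetical, and the over-dispersion adjustment used for the P-splines model is not included.

```python
import math


def poisson_deviance(deaths, fitted):
    """Poisson deviance: 2 * sum(d * log(d / mu) - (d - mu)).

    deaths -- observed death counts
    fitted -- fitted death counts (exposure times fitted mortality rate)
    """
    dev = 0.0
    for d, mu in zip(deaths, fitted):
        # The d * log(d / mu) term is taken as zero when d is zero.
        term = d * math.log(d / mu) if d > 0 else 0.0
        dev += 2.0 * (term - (d - mu))
    return dev


def bic(deviance, n, effective_dim):
    """BIC = Dev + log(n) * ED, as in Eq. (3)."""
    return deviance + math.log(n) * effective_dim


# Hypothetical observed and fitted death counts for four data cells.
deaths = [12.0, 20.0, 31.0, 0.0]
fitted = [10.0, 22.0, 30.0, 0.5]

dev = poisson_deviance(deaths, fitted)
bic_small = bic(dev, n=len(deaths), effective_dim=2)
bic_large = bic(dev, n=len(deaths), effective_dim=3)
```

For the same deviance, the model with the larger effective dimension has the larger BIC, which is how the criterion penalises complexity.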

In choosing the “best” models, we have considered the desirable properties we would require of a stochastic mortality model. Cairns et al. [4] list a set of such properties (based on the earlier work of CMI [8])—these are parsimony, transparency, ability to generate sample paths, incorporation of cohort effects, nontrivial correlation structure, goodness of fit, robustness, forecasting biologically reasonable scenarios (see also [21], for further discussion). We note that the salience of a property will depend on the particular application being considered.

In this analysis, we have found that the best models (according to the BIC criterion) can yield unreasonable forecasts: in particular, when automatic optimisation of the orders of the ARIMA is used for forecasting the cohort component. Thus, although we explore in detail standard goodness of fit indices like the deviance, BIC and residual patterns, as noted above, the following three additional criteria have also been taken into account as part of the model selection:

  • Consistency: As noted earlier, for some decades, population mortality rates have been decreasing steadily over time. Thus, models exhibiting sudden jumps or shocks in fitted or projected mortality rates, and models yielding monotonically increasing mortality forecasts over time are discarded.

  • Stability: Any model exhibiting a high level of uncertainty in any of its components is discarded.

  • Parsimony: Increasing complexity that does not improve the accuracy of the forecasts is unhelpful. Thus, if forecasts from several models are similar, the simplest model is preferred.

We have decided to use the results from more than one model in our investigation. Although selecting a single model is often the most common approach when predicting the future trend of a specific variable, there are two potential issues with such an approach. First, it represents over-confidence in thinking that the selected model is the only correct one and will produce reliable forecasts in any situation. Second, the approach is incoherent because, as new data become available, the selected model may no longer be the optimal choice. There is evidence that model averaging leads to improved forecast accuracy [28, 54, 57]. Hence, we adopt a model averaging approach. There are many versions of model averaging in the literature, and we adopt one based on arithmetic averaging on grounds of simplicity.
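
Arithmetic model averaging can be sketched as follows. The model labels and forecast values are hypothetical, and equal weights are assumed; the sketch averages yearly forecast improvement rates, although the same function applies to any quantity forecast on a common set of years.

```python
def average_forecasts(forecasts):
    """Equal-weight arithmetic average across models, year by year.

    forecasts -- dict mapping a model label to a list of yearly forecasts;
                 all lists must have the same length.
    """
    series = list(forecasts.values())
    if not series:
        raise ValueError("at least one model is required")
    if len({len(s) for s in series}) != 1:
        raise ValueError("all models must forecast the same years")
    return [sum(values) / len(values) for values in zip(*series)]


# Hypothetical yearly improvement-rate forecasts from three selected models.
forecasts = {
    "LC": [0.010, 0.012],
    "APC": [0.020, 0.018],
    "CBD7": [0.030, 0.030],
}
averaged = average_forecasts(forecasts)
```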

For each country–gender combination, three models are retained: the LC model is chosen to be one; and the other two models are chosen based on goodness of fit to the data and the above three criteria. LC serves as a benchmark and is retained as it is the most widely used model for mortality forecasting and, throughout the analyses, LC behaves well with respect to the above criteria. The other two models retained in each case vary from country-to-country as well as by gender within country (details of the three models selected for each of the 21 countries by gender are provided in the Appendix).

Lastly, to assess the extent to which the post-2010 mortality experience has departed from the projected trend, ASMRs based on each country–gender forecast are calculated for 2011–2017, along with the resulting average annual MIRs.

4 Results

4.1 Descriptive analysis

The results for the UK in terms of ASMR for ages 50–95 are shown in Fig. 1: the left-hand panel shows the annual ASMRs, while the right-hand panel shows the trend in smoothed MIRs (based on 5-year moving averages).

Fig. 1 Observed mortality rates and mortality improvement rates in the UK. Left: directly standardised mortality rates. Right: annualised mortality improvements (5-year moving averages) derived from the standardised mortality rates

Figure 1 confirms the steady fall of ASMR over time for both genders: we see the downward trend in the left-hand panel reflected in the corresponding positive values for the improvement rates in the right-hand panel. Male rates have fallen faster than those for women, narrowing the gender gap. From 2010, however, the decline in ASMRs has slowed, flattening out for both genders; and, correspondingly, the annual improvement rates have fallen, almost to zero during the last 5 years.

For all 21 countries, Table 3 shows the average yearly changes in ASMRs over the sub-periods of the study period (1965–2017) for women and men.

Table 3 Average yearly mortality improvements by country and by gender over multiple time periods

When we look across longer time-periods for each country, we can see that mortality improvement rates are neither constant nor monotonic: periods of larger improvement rates are followed by periods of slower improvements. Nevertheless, we note that, in all the countries included, both genders show a slowdown of MIRs since 2010 compared to the previous decade, except in Denmark (for both genders) and Japan and Norway (for men). However, the magnitude of the average annual MIR for 2011–2017 varies between countries: the low-high range is from \(0.32\%\) (Greece) through to \(2.31\%\) (Denmark) for women; and from \(0.51\%\) (Greece) to \(2.41\%\) (Norway) for men.

4.2 Analysis of stochastic mortality modelling results

As mentioned earlier, the nine models have been fitted to each country separately for males and females using data for ages 50–95 and for calendar years 1965–2010. For some countries, however, data are available for later years only (i.e. from 1970 for Taiwan, 1981 for Greece, 1990 for Germany) and in these cases the models have been calibrated using the data available from that starting year up to 2010. Using the criteria described in Sect. 3.2.2, we have chosen three models that provide the most satisfactory fit to the historic data for each gender-country combination.

Fig. 2 Mortality rates (log scale) from selected models fitted to UK females

Fig. 3 Mortality rates (log scale) from selected models fitted to UK males

As an example, we consider the results in more detail for the UK. Profile views from the 3 selected models are shown in Fig. 2 for women and Fig. 3 for men. In order to summarize the results, the projected age specific mortality rates (by single year of age and single year of calendar time) from 2011 onwards are used to compute the projected standardised mortality rates (PSMR) for the broad age groups 50–64, 65–79, 80–95 and for the full age range 50–95. The PSMR for each age group within a country in a particular year is calculated by applying its projected age-specific mortality rates to the corresponding age distribution in the standard reference population. From these PSMR, yearly projected mortality improvement rates are then derived and averaged. The results for UK men and women are shown in Table 4.

Table 4 Average yearly mortality improvements in the UK over the calendar time period 2011–2017 (Expected vs Observed)

Table 4 shows that, although the projected mortality improvements vary from one model to another, the average improvement experience since 2010 is lower compared to the predictions from any of the three models selected for men and women. This comment applies to each gender overall (age 50–95) as well as to all of the constituent age-subgroups shown (except for the APC model at ages 50–64 for women and the LC model at ages 80–95 for men).

Tables 7–27 in the Appendix provide the detailed results corresponding to Table 4 for each gender-country combination. Before looking at the results, we need to comment on the detailed calculation of the improvement rates. Close inspection will reveal that the observed mortality improvement rates for the time period 2011–2017 in the Appendix tables are slightly different from those shown earlier in Table 3. These differences have arisen because, when producing the tables in the Appendix, it is necessary to choose a consistent set of mortality rates for 2010 on which to base the forecasts up to 2017: we have decided to set the initial 2010 mortality rates to be equal to the average fitted rates from the selected three stochastic mortality models (the choice is rather arbitrary and we could have used the observed mortality rates as an alternative). This adjustment, which has a small effect on the results, also applies to Tables 5 and 6 and to Figs. 4–7: see below.

In order to provide a clearer picture of the results, we show in Fig. 4 a comparison of the mortality improvement experience over the time period 2011–2017 with the forecast mortality improvements for each country, based on the LC model for women. Similarly, Fig. 5 compares the mortality improvement experience over the time period 2011–2017 to the average forecast mortality improvements for each country, for women, based on the average of the selected three stochastic mortality models for each country.

Countries lying above the 45 degree line are those where the observed MIR is higher than the predicted MIR.

Fig. 4 Comparison of observed versus expected annual mortality improvement rates for females by country, 2011–2017. Note: The expected improvement rates are derived from Lee–Carter forecast of mortality rates

Fig. 5 Comparison of observed versus expected annual mortality improvement rates for females by country, 2011–2017. Note: The expected annual improvement rates are the average from the three models selected for each country

We see from Figs. 4 and 5 that, for women, in all countries except Denmark and Norway, the observed MIRs post-2010 are worse than would have been anticipated by the forecasts calibrated to the 1965–2010 data. The position of Denmark stands out, with markedly higher mortality improvements than would have been forecast. Women in Greece, Italy and Japan have experienced notably worse MIRs on average post-2010 than anticipated. Further, we note that the observed and forecast MIRs are relatively close in some countries (e.g. Belgium, Canada, Norway, Portugal and Sweden; see Figs. 4 and 5). Thus, there is only marginal evidence for a slowdown in these countries.

The corresponding comparisons for men are shown in Figs. 6 and 7 and present a more balanced picture than for women: with about half of the countries experiencing a higher improvement on average than forecast and half lower. Also, in Figs. 6 and 7, the countries are positioned slightly closer to the 45 degree line than in Figs. 4 and 5. As with women, men in Denmark and Norway stand out with substantially higher mortality improvements on average than projected, followed by Sweden, Belgium and Finland (Fig. 7). At the other end of the spectrum, Germany, Greece, Taiwan and the UK are among the countries with the lowest MIRs on average for men compared to those forecast. Overall, these results suggest that, in aggregate, women have experienced a more widespread slowdown in mortality improvements than men (contrasting Figs. 5 and 7).

Fig. 6 Comparison of observed versus expected annual mortality improvement rates for males by country, 2011–2017. Note: The expected improvement rates are derived from Lee–Carter forecast of mortality rates

Fig. 7 Comparison of observed versus expected annual mortality improvement rates for males by country, 2011–2017. Note: The expected annual improvement rates are the average from the three models selected for each country

Considering these results, two salient questions arise:

  (i) could we have anticipated the slowdown in mortality improvements by extrapolating trends from the previous decade (2000–2010), and

  (ii) has there been a greater slowdown in observed MIRs compared to the predicted MIRs for the post-2010 period?

We present results in Table 5 for the 21 countries in our investigation to answer these questions. Addressing the first question above, the first column of Table 5 shows the difference between the projected MIRs for 2011–2017 and the actual MIRs for 2000–2010. Addressing the second question, the second column shows the difference between the actual MIRs for 2011–2017 and the projected MIRs for 2011–2017. The third column is the sum of the first two columns and equals the difference between the actual MIRs for 2011–2017 and the actual MIRs for 2000–2010. We thus decompose the change in MIRs observed in 2011–2017 relative to 2000–2010 into two constituent parts: the “expected” change, i.e. the projected MIRs for 2011–2017 relative to the actual MIRs for 2000–2010; and the additional change, i.e. the actual MIRs for 2011–2017 relative to the projected MIRs for the same period.

Table 5 Comparison of projected and observed standardised mortality improvement rates (ages 50–95), by sex

In order to focus attention on marked differences in trends, we use a difference of \(\pm \,0.25\%\) p.a. MIR as a cut-off point to differentiate countries whose projections are lower or higher than observed. We consider column 1 and observe a group of countries with projected 2011–2017 MIRs lower than the observed rates in the preceding decade by more than \(0.25\%\) p.a. This group includes both men and women in 7 countries—namely Ireland, Netherlands, the USA, Denmark, Norway, Canada and Sweden; only women in Taiwan; and only men in Greece, the UK, Belgium, Italy, Australia, France, Switzerland, Austria and Finland. (For clarity these are shown in bold).

In contrast, the projected 2011–2017 MIRs exceed the observed MIRs for 2000–2010 by more than the \(0.25\%\) threshold for German and Japanese men and women; and fall within the threshold for men and women in Portugal and Spain; for men in Taiwan; and for women in the UK, Austria, Finland, Greece, France, Belgium, Australia, Switzerland and Italy.

This analysis suggests that, given the observed trend in 2000–2010, we could have predicted a slowdown in mortality improvement trends in the recent decade. It leads to the second salient question mentioned above: has the slowdown in observed mortality improvement rates been greater than forecast? If so, this suggests that the difference may be associated with events that happened post-2011, and we could speculate that austerity and/or higher than normal levels of winter deaths might be possible causes.

From column 2 of Table 5, we observe a remarkable gender difference: women in 18 (out of 21) countries, but men in only 8 countries, have experienced lower MIRs than those forecast, by more than \(0.25\%\) p.a., during 2011–2017. (For clarity, these are shown in bold.)

Hence, we can partition the total slowdown in MIRs into two components: the forecast slowdown (column 1 in Table 5) plus an additional slowdown relative to the forecasts (column 2 in Table 5). For example, men in the UK have experienced lower mortality improvement rates in 2011–2017 than in the preceding decade, with the reduction averaging about \(1.55\%\) p.a. The forecast slowdown component is about \(0.81\%\) p.a. and the additional slowdown component is about \(0.74\%\) p.a. (Table 5). This ‘double slowdown’ is a feature of the experience in several countries: for women in Ireland, Netherlands, the USA, Canada, Taiwan and the UK; and for men in Ireland, the USA, Greece, the UK and Italy.
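The arithmetic of this decomposition can be illustrated with a short sketch. The three MIR levels used below are hypothetical values, not figures from Table 5; they were chosen only so that the differences reproduce the components quoted above for UK men (about 0.81, 0.74 and 1.55 percentage points p.a.). The function names are also assumptions, as is the reading of a ‘double slowdown’ as both components falling below the \(0.25\%\) p.a. cut-off.

```python
from dataclasses import dataclass

THRESHOLD = 0.25  # cut-off in percentage points p.a., as used in the text


@dataclass
class Decomposition:
    forecast_change: float    # column 1: projected 2011-2017 minus actual 2000-2010
    additional_change: float  # column 2: actual 2011-2017 minus projected 2011-2017
    total_change: float       # column 3: sum of columns 1 and 2


def decompose(actual_prev, projected, actual_recent):
    """Split the change in MIR between two periods into forecast and additional parts."""
    col1 = projected - actual_prev
    col2 = actual_recent - projected
    return Decomposition(col1, col2, col1 + col2)


def is_double_slowdown(d, threshold=THRESHOLD):
    """Assumed reading: both components are slowdowns larger than the cut-off."""
    return d.forecast_change < -threshold and d.additional_change < -threshold


# Hypothetical MIR levels (% p.a.), chosen to reproduce the quoted differences.
example = decompose(actual_prev=2.90, projected=2.09, actual_recent=1.35)

print(f"forecast slowdown:   {-example.forecast_change:.2f}")
print(f"additional slowdown: {-example.additional_change:.2f}")
print(f"total slowdown:      {-example.total_change:.2f}")
print("double slowdown:", is_double_slowdown(example))
```

By construction, the third column always equals the actual MIR for 2011–2017 minus the actual MIR for 2000–2010, whatever the projected value is; the projection only determines how the total is split between the two components.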

We also note from column 2 that both genders in Denmark and men in Norway, Sweden and Belgium have experienced higher mortality improvement rates in 2011–2017 than the forecasts, by more than \(0.25\%\).

Table 6 Average gap between the observed and projected improvement rates over 2011–2017

As an alternative method of comparison of the results between countries, we present in Table 6 rankings based on the gap between the observed mortality improvement rates and the projected mortality improvement rates (averaged over the 3 models used), separately for men and women, for the period 2011–2017. A more detailed analysis showing the variability of the gaps and rankings across the three stochastic mortality models selected for each country is presented in Tables 28 and 29 in the Appendix.
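As a rough illustration of how such a ranking can be constructed, the sketch below computes the gap between the observed rate and the mean of three model projections and then orders the populations by that gap. The population labels and all numerical values are hypothetical, and the convention that rank 1 goes to the largest positive gap (observed improvement furthest above forecast) is an assumption made for the example.

```python
def rank_by_gap(observed, projected_by_model):
    """Rank populations by observed MIR minus the mean projected MIR across models."""
    gaps = {
        name: observed[name] - sum(projected_by_model[name]) / len(projected_by_model[name])
        for name in observed
    }
    # Largest positive gap first: observed improvement furthest above the forecast.
    ordered = sorted(gaps, key=gaps.get, reverse=True)
    return [(rank, name, gaps[name]) for rank, name in enumerate(ordered, start=1)]


# Hypothetical MIRs in % p.a. for three unnamed populations.
observed = {"A": 2.0, "B": 1.0, "C": 1.5}
projected_by_model = {
    "A": [1.5, 1.6, 1.7],  # projections from three models
    "B": [1.8, 2.0, 2.2],
    "C": [1.4, 1.5, 1.6],
}

ranking = rank_by_gap(observed, projected_by_model)
for rank, name, gap in ranking:
    print(f"{rank}. {name}: gap = {gap:+.2f}")
```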

Table 6 shows that Denmark, Norway and Sweden come top of the rankings for both genders, experiencing notably higher MIRs than the average model forecasts. The bottom three countries are Greece, Italy and Spain for women and Taiwan, Germany and the UK for men.

Figure 5 and Table 6 show that, for women, in all countries except Denmark and Norway, the average yearly mortality improvements post-2010 are worse than predicted by the models calibrated to the 1965–2010 data. The corresponding picture for men is different (Fig. 7 and Table 6).

As with women, men in Denmark and Norway stand out with substantially higher mortality improvements on average than forecast, followed by Sweden, Belgium and Finland.

5 Discussion and conclusions

This study analyses population mortality trends at ages over 50 across a group of 21 developed countries since 1965, at the level of age specific mortality rates and mortality improvement rates, to see whether we can better understand the reports of a slowdown in the upward trend in life expectancy in recent years, and particularly since 2010 [38, 50]. We have first provided a brief descriptive analysis of the historical trends in mortality rates and mortality improvement rates. We have then used stochastic mortality models in order to examine mortality improvement patterns and have contrasted the observed mortality improvement rates for the period since 2010 with the forecast improvement rates from using these stochastic mortality models.

As far as we are aware, this is the first comparative cross-country study of mortality trends that uses stochastic models both to identify historical trends and to forecast future trends, in order to examine the extent to which the post-2011 experience deviates from the long-run trend. It allows us to compare what actually happened after 2011 against the forecast trend, enabling us to separate the component of the observed slowdown attributable to long-run trends from the component attributable to factors specific to the post-2011 period, for each country and separately by gender.

We have compared the mortality improvement rate experience for each gender, for the time period 2011–2017 with the forecast improvement rates derived from the three best fitting stochastic mortality models. For women, we show that, in all countries except Denmark and Norway, the average yearly mortality improvements post 2010 have been worse than would have been anticipated by the Lee–Carter model calibrated to the 1965–2010 data. Further, we note that the observed improvements are relatively close to the forecasts in some countries (e.g. Portugal, Ireland and Canada). Thus, there is no evidence for a slowdown in these particular countries. There is a group of countries (including Greece, Italy and Japan) that have experienced worse mortality improvements for women on average than the rates that would have been anticipated from the Lee–Carter model. When we base the forecasts on the average of the three stochastic mortality models selected for each country, the results are similar (see Fig. 5) with Denmark and Norway being the only countries to show average observed mortality improvement rates for 2011–2017 that are higher than those forecast.

The anomalous position of Danish women (as illustrated by Figs. 4 and 5, for example) has been reported in the literature, in terms of a period of mortality stagnation beginning in the 1960s followed by a significant catch up in the most recent two decades. The stagnation in the historic data would lead mortality trend models to underestimate future mortality improvements. The consensus is that cohort effects (with high mortality for the 1915–1945 birth cohorts) and a high smoking prevalence in the 1960s are the main driving forces behind these trends (see, for example, [1, 34, 39]). Further, recent exercises in the stochastic modelling of clustering of mortality trends across multiple populations have identified the distinct pattern of Danish mortality (see, for example, [14, 24, 55]).

As we have seen, the corresponding picture for men is different. Overall the direction of observed mortality improvements relative to those forecast is more balanced than for women, with about half of the countries experiencing a higher improvement on average than forecast and half lower. This holds if the forecasts are based on the Lee–Carter model (Fig. 6) or on the average of the three stochastic mortality models selected for each country (Fig. 7). This suggests that, overall, it is women, rather than men, who have been experiencing the slowdown in mortality improvement.

For men, Denmark and Norway stand out with substantially higher mortality improvements on average than forecast. At the other end of the spectrum, Taiwan and Germany are among the countries with the lowest mortality improvements on average for men compared to the forecasts.

We have considered whether, given the historical trends in mortality across the different countries, we would have forecast the recent slowdown in mortality improvement rates. Based on the selected stochastic mortality models, we have calculated forecast mortality improvement rates for 2011–2017 and highlighted those which are lower than the mortality improvement rates observed in the preceding decade, 2000–2010, by more than the cut-off of \(0.25\%\) p.a.: see column 1 of Table 5. There is a gender bias, with men being affected more than women. For men in 16 countries and women in 8 countries, we would have forecast a slowdown in mortality improvement rates, with forecast 2011–2017 improvement rates lower than those observed in 2000–2010. The populations affected are men and women in Ireland, Netherlands, the USA, Denmark, Norway, Canada and Sweden; only women in Taiwan; and only men in Greece, the UK, Belgium, Italy, Australia, France, Switzerland, Austria and Finland.

This observation is consistent with several hypotheses for the slowdown in mortality improvements that would have emerged before 2010 (see [38, 50], and the references listed therein). Some hypotheses have been developed based on specific countries, although they may also be relevant for other countries. These hypotheses include:

  • Worsening trends in diabetes and obesity in many OECD countries.

  • Inequality in mortality rates among different socio-economic groups has widened such that adverse mortality trends in the more deprived groups are affecting the overall mortality trend.

  • Improvements in circulatory disease mortality are slowing in several developed countries, related to the stabilising of smoking prevalence rates and of cholesterol levels, especially among men, in addition to the effects of the worsening diabetes and obesity trends mentioned above.

  • Rising mortality rates related to dementia and Alzheimer diseases, but this potential effect needs to be considered with care because of changes in coding practices in relation to causes of death.

  • Cohort effects. Thus, in the UK, people born between 1926 and 1935, aged 65–84 between 2000 and 2010, have experienced higher mortality improvement rates than people born before or after them. Some European countries also have similar cohort effects. The survival of these cohorts could have led to a subset of frail individuals: because more have survived to higher ages than previous or subsequent cohorts, there are higher numbers who could be vulnerable to mortality shocks like flu epidemics. This effect, when combined with younger cohorts with lower mortality rates, could lead to a stalling in overall mortality improvement rates. However, this hypothesis is not supported by data from the Cognitive Function and Ageing Study which shows that the current “oldest old” are physically and mentally more robust than previous cohorts [42]. It is also not supported by the Swedish study of Horder et al. [30], comparing the frailty of 2 cohorts, born in 1911–1912 and 1930.

We have considered whether there has been a greater slowdown in mortality improvement rates in 2011–2017 than suggested by the stochastic model forecasts: see column 2 of Table 5. We observe a notable gender difference, where women in 18 countries but men in only 8 countries have experienced lower mortality improvements than projected, by more than \(0.25\%\) p.a., during 2011–2017. This observation is consistent with suggestions that austerity measures in response to the 2008 recession and excess winter deaths, such as the unusually high 2014/2015 winter deaths, have adversely affected female mortality trends. Recent work by Crawford et al. [10] examines the impact of social care cuts between 2009/2010 and 2017/2018 on the use of public hospitals in England and finds that reductions in long-term care spending have led to substantial increases in the number of emergency department visits made by patients aged over 65: this is one element in a complex chain that links austerity measures to healthcare access and potentially poorer outcomes in terms of morbidity and mortality. The austerity hypothesis could also explain the larger impact on women because, on average, more women survive to older ages than men, and widowhood tends to leave them financially vulnerable to cuts in social welfare benefits (for references see [38, 49, 50]).

Both austerity and excess winter deaths would risk exacerbating the unfavourable trends in obesity, diabetes, circulatory diseases-related deaths, dementia deaths and frailty mentioned above. Disadvantaged groups within countries may be impacted more, with disproportionate adverse effects on their mortality rates such that overall mortality improvements are stalled [49]. As women are notably affected more than men in our analyses, we suggest that austerity may have disproportionately impacted women in these countries.

A number of the Scandinavian populations, including men and women in Denmark and men in Sweden and Norway, have experienced mortality improvement rates in 2011–2017 that are greater than the average forecast by more than \(0.25\%\) p.a. These countries were less affected by austerity and were among the countries least affected by the 2014/2015 excess winter deaths [18]. So, our results are consistent with the suggestion that austerity and excess winter deaths are linked to the recent slowdown in mortality improvement rates.

The UK, Spain and Germany are among the worst performing countries when assessed by the gap between the actual mortality improvement rates experienced and those forecast in 2011–2017 (Table 6). The UK and Spain have been cited as being more affected by austerity [50] and were among the six countries worst affected by the 2014/2015 excess winter deaths within the European Union’s 15 countries [18]. So these results are consistent with the potential roles of austerity and winter deaths.

However, the experience of Germany is less consistent with the austerity and winter deaths hypotheses. It has been thought to be less affected by austerity [50] and was not one of the six countries worst hit by the 2014/2015 winter deaths. One might have expected its post-2011 mortality trends to be more aligned with the model forecasts, but the observed trends are worse than the forecasts. An explanation may be that the HMD data series for Germany begins in 1990, which coincides with the reunification of Germany; as the gap between the mortality experience of the former East and West Germany narrowed, MIRs were initially high before subsequently slowing down, an effect which would have impacted on the model forecasts [20].

Additionally, Portugal has been regarded as having been particularly impacted by austerity [50] and was the worst-hit country by the 2014/2015 winter deaths according to EU MOMO [18]. We would expect Portugal to have experienced lower mortality improvement rates than forecast, but this is not the case for Portuguese men. These findings suggest that there are forces other than austerity and the 2014/2015 winter deaths that might have influenced the recent mortality slowdown.

We note, from Table 6, that the countries with the lowest rankings include Germany, Greece, Italy, Spain and the UK. The UK has been the subject of further analysis. It has experienced relatively low mortality improvement rates up to 2018 [38], and recent comparative analyses by the ONS suggest that it has also experienced the highest levels of excess deaths in the EU [46]. The drivers behind these adverse trends need to be investigated and identified in order to prevent the UK and the other four countries falling further behind their neighbours.

We have identified several areas where our research could be extended—we specifically mention four such areas:

  • Throughout the analysis, 2010 has been set as a reference year. Although this may be appropriate for some countries such as the UK, the exact timing of the change in the overall direction of mortality improvements can vary from country to country; further analysis would help to investigate this.

  • The stochastic mortality projection models implemented in this analysis have been fitted to the data above age 50, as most of these models were originally developed to forecast the mortality experience of pensioners. A subset of these models has been shown to be suitable for modelling mortality at younger ages (see, for example, [56, 57], for consideration of five such models) and hence it would be possible to extend the analysis down to ages below age 50.

  • In this paper, we have used a wide range of mortality forecasting models to extract and forecast mortality patterns. Many variants of some of these models have been reported in the literature [15, 53]. Our work can be extended by considering some of those alternative variants. Moreover, the period components were projected in this work using a multivariate random walk, whereas ARIMA was used for the cohort components. We shall consider alternative forecasting methods in a subsequent paper.

  • Our analyses are consistent with suggestions that there are forces that could have contributed to the slowdown in mortality improvement rates before 2010 and additional forces that could have contributed after that date. More analyses of the health and socio-economic trends in each country and the differences between countries would help to clarify the potential drivers behind the historical trends and potential future trajectories: see, for example, the recent paper by Kallestrup-Lamb et al. [35] which investigates the mortality improvement trend and cause of death patterns across socio economic groups for females in Denmark over the period 1985–2012.

In conclusion, we find that part of the slowdown in the MIRs of the over-50s since 2011 would have been expected from historical trends in many countries, especially among men. However, for many countries, the slowdown was more severe than forecast, especially among women; whilst, for some countries, the MIRs were better than projected. A better understanding of the complex socio-economic and health drivers underlying these differential trends could help to inform national policies.