Abstract
A common feature in the modelling and extrapolation of the trends in mortality rates over time, based on fitted parametric structures, has tended to involve the treatment of a structured fitted main effects period component (with possibly a cohort component) as a random effects time series. In this paper, we follow the lead of Haberman and Renshaw (Insurance Math Econ 50:309–333, 2012) and other authors in modelling and forecasting mortality improvement rates over time, rather than mortality rates. In this context, we assume linear parametric structures for mortality improvement rates, and we examine the feasibility of modelling the main period effects (and possibly any cohort effects) as a random effect from the outset. We argue that this leads to a more unified approach to model fitting and extrapolation.
1 Introduction
One of the themes of the recent longevity related academic literature has been the consideration of the modelling of mortality improvement rates (MIR), rather than mortality rates (MR). One of the motivations has been a practical one, as noted by Haberman and Renshaw [6], Denuit and Trufin [4] and Hunt and Villegas [10], inter alia. Many standard life tables used by actuaries (and required by regulators) for annuity pricing or reserving are increasingly based on an assumption about the dynamics of suitably defined mortality improvement rates. Thus, Haberman and Renshaw [6] and Denuit and Trufin [4] specifically mention current actuarial practice in Austria, Belgium, Denmark, Switzerland, UK and US that uses mortality improvement rates as a building block. Further we note that, in the UK, the Continuous Mortality Investigation Bureau (CMI) has recently developed and recommended a new mortality projection model based on improvement rates: see CMI [3]. These developments indicate a need for a sound theoretical foundation for the modelling of improvement rates—and this has led to a stream of contributions to the literature. We will refer to these different contributions, as appropriate, in the main body of the paper.
A second motivation for the interest of the academic literature has been the recognition that there may be theoretical and practical advantages in modelling improvement rates. A key issue in modelling mortality dynamics is understanding the dominant downwards time trend that has manifested itself over at least the last 70 years. It is well known in time series work that there are advantages if the underlying process that generates the time trend is time-invariant. One of the common methods in time series analysis of transforming a so-called non-stationary time series into a stationary one is by detrending the series, i.e. taking first differences: see Li et al. [13], Haberman and Renshaw [6, 7], Mitchell et al. [14], Hunt and Villegas [10] and Bohk-Ewald and Rau [1]. This transformation implies that the mortality trend relates to the previous year’s mortality rates rather than the trend in a hidden mortality factor, like \(\kappa_{t}\) in the seminal model of Lee and Carter [11]. As we will show below, the definition of mortality improvement rates is linked to this transformation.
A further point noted by Hunt and Villegas [10] is that by considering mortality improvements from national data sets, inferences can be made about mortality trends in smaller subpopulations, although this requires consideration of longevity basis risk (see [8], and [19] for a further discussion).
We refer the reader to the text by Lee et al. [12] for an account of the general background to the theoretical modelling aspects underpinning this paper (Sects. 4–6), where we develop generalised linear MIR and MR models by the inclusion of random effects. Quoting from the epilogue to this text (p 360) “We [the authors] suspect that there may be many other new extensions waiting to be explored where the ideas underlying this book could be usefully exploited”. We believe that this paper provides such an extension.
All the applications presented in this paper were produced using the computer package R. Outline details are given in “Appendix II”.
Our main contribution is to show that, by attributing random effects to the period and cohort components (or just the period components) of a main effects age-period-cohort structured linear predictor from the outset, it is possible to present a comprehensive self-contained process for modelling and extrapolating mortality improvement rates, which incorporates structured dispersion and an apparent self-selecting time series. We show that this methodology also extends to the modelling and extrapolation of mortality rates provided that the predictor structure is linear. We argue that this methodological framework leads to a more unified approach to model fitting and extrapolation as a result of treating the time element as a random effect from the outset, thereby impacting both fitting and extrapolation stages.
The paper is arranged as follows. In Sect. 2, we investigate the interrelationship between two alternative measures of mortality improvement rates. In Sect. 3, we focus on a direct and an indirect method of modelling mortality improvement rates with specific reference to the linear age-period-cohort structure, and suggest the inclusion of a random effects period component. In Sects. 4 and 5, we describe a method of fitting respectively Normal-Normal and Poisson-Normal generalised linear mixed models, which incorporate fixed and random effects components, and have an additional provision for the joint modelling of a structured dispersion parameter. In Sect. 6 we indicate how a simple structured time series can also be incorporated into the fitting process. In each of Sects. 4–6, we include an investigation into the potential of the respective methodology by applying it to the recent UK male mortality experience. Sections 7 and 8 provide a discussion and some concluding comments.
2 Mortality improvement statistics
Ideally, in continuous time, age specific mortality improvement rates are quantified by the partial derivative of the log-mortality rate, either with respect to period (calendar-year) or with respect to birth-year (cohort). Given that the former approach is the most commonly used, we assume this to be the case here unless stated otherwise.
Consider a rectangular age-period Lexis plane divided into annual cells supportive of mortality data \(\left( d_{xt}, e_{xt}, \omega_{xt} \right)\): age \(x = x_{1}, x_{2}, \ldots, x_{k}\), period \(t = t_{1}, t_{2}, \ldots, t_{n}\), and birth-year \(s = t - x\), where \(d_{xt}\) is the reported number of deaths, \(e_{xt}\) is the matching central exposure to the risk of death, and \(\omega_{xt}\) are prior (0/1) weights indicating empty/non-empty data cells. We denote the central rate of mortality by \(m_{x,t}\).
Assuming central rates of mortality throughout, two superficially different mortality improvement rate statistics have been proposed in the literature and are of interest:
Statistic I: \(y_{xt} = \frac{-\Delta_{t} m_{x,t}}{\left( m_{x,t-1} + m_{x,t} \right)/2} = 2\,\frac{1 - m_{x,t}/m_{x,t-1}}{1 + m_{x,t}/m_{x,t-1}}\), and
Statistic II: \(z_{xt} = \Delta_{t} \log m_{x,t} = \log m_{x,t} - \log m_{x,t-1}\), where \(\Delta_{t}\) is the differencing operator and the statistics are computed using the estimate \(\hat{m}_{x,t} = d_{xt}/e_{xt}\); see Haberman and Renshaw [6] and Mitchell et al. [14] respectively for applications.
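As a small illustration of the two definitions above, the following sketch computes both statistics for a single age from crude central rates. The function names and the death and exposure figures are hypothetical, chosen only for illustration, and are not taken from the Human Mortality Database.

```python
import math


def central_rate(deaths, exposure):
    """Crude central mortality rate m = d / e for one (age, year) cell."""
    return deaths / exposure


def statistic_one(m_prev, m_curr):
    """Statistic I: minus the change in m, scaled by the two-year average rate."""
    return -(m_curr - m_prev) / ((m_prev + m_curr) / 2.0)


def statistic_two(m_prev, m_curr):
    """Statistic II: first difference of the log central rate."""
    return math.log(m_curr) - math.log(m_prev)


# Hypothetical cell values for two consecutive years at one age.
m_prev = central_rate(deaths=1000.0, exposure=100000.0)
m_curr = central_rate(deaths=970.0, exposure=100000.0)

y = statistic_one(m_prev, m_curr)  # positive when mortality improves
z = statistic_two(m_prev, m_curr)  # negative when mortality improves
```

With falling mortality, Statistic I is positive and Statistic II is negative, which is the sign disparity discussed in the text.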
We investigate next the precise nature of the difference between the two statistics. Firstly, we note that while both statistics are discrete representations of the partial derivative of \(\log m_{x,t}\), Statistic I is based on the actual partial derivative itself. Secondly, we recall the monotonic increasing characteristic of the log function. Then a comparison of \(-\Delta_{t} m_{x,t}\) with \(\Delta_{t} \log m_{x,t}\), which determines the respective signs of the two statistics, associates respectively positive and negative values with actual mortality improvements. In addition, exploratory scatter plots of the two statistics, using any rectangular age-period UK male mortality data array, are found to exhibit perfect negative correlation, which implies an exact connection between the two statistics.
Indeed, a mathematically tractable relationship between the two statistics exists. Correcting for the disparity in the sign of \(z_{xt}\), the relationship connecting the two statistics reads \(y_{xt} = 2\tanh \left( z_{xt}/2 \right)\), so that
$$y_{xt} = z_{xt} - \frac{z_{xt}^{3}}{12} + \frac{z_{xt}^{5}}{120} - \cdots ,$$
a convergent power series comprising odd powers only, with the implication that the absolute value of \(y_{xt}\) is less than the matching absolute value of \(z_{xt}\) and hence the greater accuracy in general. Unless stated otherwise we use Statistic I.
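The tanh relationship can be checked numerically. The sketch below, using hypothetical rates, compares Statistic I computed directly from the ratio of rates with \(2\tanh(z/2)\) evaluated at the sign-corrected Statistic II, and with the leading terms of the Taylor expansion of \(2\tanh(z/2)\) (coefficients 1, −1/12, 1/120, −17/20160).

```python
import math


def y_from_z(z):
    """Statistic I implied by the sign-corrected Statistic II value z."""
    return 2.0 * math.tanh(z / 2.0)


def y_series(z, terms=4):
    """Leading odd-power terms of the Taylor series of 2*tanh(z/2)."""
    coeffs = [1.0, -1.0 / 12.0, 1.0 / 120.0, -17.0 / 20160.0]
    return sum(c * z ** (2 * k + 1) for k, c in enumerate(coeffs[:terms]))


# Hypothetical central rates for two consecutive years.
m_prev, m_curr = 0.0100, 0.0097
ratio = m_curr / m_prev

y_direct = 2.0 * (1.0 - ratio) / (1.0 + ratio)  # Statistic I from the ratio
z_corrected = -math.log(ratio)                   # Statistic II with sign reversed
```

For annual improvements of a few percent the cubic term is already negligible, so the two statistics differ materially only for large year-on-year changes.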
2.1 An application
We utilise the UK male mortality data set, covering the period 1960–2016, ages 0–102, and comprising annual death counts and matching central exposures to the risk of death, as compiled by the Human Mortality Database [9]. By truncating the data at the upper age limit of 102, possible complications arising from the irregular nature of the lost fragments including zero entries are avoided. In addition, the number of any such fragments lost is extremely small, especially for the early calendar years.
One aspect of the computed empirical MIRs concerns the nature of any information provided in respect of patterns in the resulting age profiles. In order to investigate this feature, we have computed and displayed the n-year empirical MIR rolling averages for a range of values of n. One such set of results for the 5-year MIR rolling averages, expressed as percentages, and centred on the periods 2014(−1)2006, is depicted in the various panels of Fig. 1. (We make frequent use of the notation a(c)b to denote sequences of numbers ranging from a to b at intervals of c). In addition we have fitted and display a smooth-spline curve in each panel (using the R smooth.spline function with a parameter setting of 0.8). We note an identifiable crude pattern in each panel subject to a degree of variation between (annually) adjacent panels. We return to this issue later in Sect. 4.
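The rolling averages described above can be sketched as follows for a single age. This is a minimal illustration assuming a centred window with an odd number of years; the function name and the input series are hypothetical, and the spline smoothing step (done in R in the paper) is not reproduced.

```python
def centred_rolling_mean(values, n):
    """Centred n-point rolling mean (n odd); None where the window is incomplete."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    half = n // 2
    out = []
    for i in range(len(values)):
        if i - half < 0 or i + half >= len(values):
            out.append(None)  # not enough years on one side of the centre
        else:
            window = values[i - half:i + half + 1]
            out.append(sum(window) / n)
    return out


# Hypothetical annual MIRs (in percent) for one age over seven calendar years.
mir_percent = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
smoothed = centred_rolling_mean(mir_percent, 5)
```

Applying the function age by age and collecting the values centred on a given year gives one age profile of the kind plotted in each panel.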
3 Mortality improvement rates and random effects
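We start from the premise, implicit in Hunt and Villegas [10], that, in continuous time, the MIR is quantified by
$$- \frac{\partial }{\partial t}\log m_{x,t} = \eta_{x,t}$$
(with the negative sign ensuring that improvement is positive and deterioration negative). We focus on linear parametric predictor structures \(\eta_{x,t}\) and specifically the familiar main effects APC (age-period-cohort) structure, so that
$$\eta_{x,t} = \alpha_{x} + \iota_{t - x} + \kappa_{t} \qquad (1)$$
comprising respectively age, birth-year and period main effects, while partial integration gives
$$\log m_{x,t} = A_{x} - \alpha_{x} t - \int \iota_{t - x}\,dt - \int \kappa_{t}\,dt \qquad (2)$$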
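with the corresponding discrete version \(\tilde{\eta }\)
$$\tilde{\eta }_{x,t} = \log m_{x,t} = A_{x} - \alpha_{x} \left( t - t_{1} \right) - \sum\limits_{j = t_{1} + 1}^{t} \iota_{j - x} - \sum\limits_{j = t_{1} + 1}^{t} \kappa_{j} \qquad (3)$$
as listed in Table 1 of Hunt and Villegas [10]. Here \(\tilde{\eta }_{x,t}\) denotes the transformed linear predictor and \(A_{x} = \log m_{{x,t_{1} }}\), the initial log-mortality rates (or ‘constant’ of integration). Alternatively, Eq. (2) can be rearranged to read as
$$\log m_{x,t} = \alpha_{x} + \beta_{x} t + \iota_{t - x} + \kappa_{t} \qquad (4)$$
subject to the following transformation and redefinition of symbols
$$A_{x} \to \alpha_{x} ,\quad - \alpha_{x} \to \beta_{x} ,\quad - {\rm I}_{t - x} \to \iota_{t - x} ,\quad - {\rm K}_{t} \to \kappa_{t} ,$$
where \({\rm I}{\text{ and }}{\rm K}\) denote the respective integrals of \(\iota\) and \(\kappa .\)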
Noting the close relationship between all three structures, MIR may be modelled directly using either of the Sect. 2 statistics as Normal responses [6], here in combination with structure (1), or indirectly using either (3) [10] or (4) (e.g. Richards et al. [17]; CMI [3]) in combination with Poisson responses \(\hat{m}_{x,t} = d_{xt}/e_{xt}\) and a log-link function.
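The link between the direct and indirect formulations can be illustrated by cumulating an age-period-cohort improvement predictor into log rates. The sketch below assumes the discrete identification \(-\Delta_{t} \log m_{x,t} = \eta_{x,t}\) (that is, a Statistic II type response); the function names and all parameter values are hypothetical.

```python
import math


def apc_improvement(alpha_x, iota, kappa, x, t):
    """Linear APC improvement predictor: age effect + cohort effect + period effect."""
    return alpha_x + iota[t - x] + kappa[t]


def log_rates_for_age(A_x, alpha_x, iota, kappa, x, years):
    """Cumulate improvements from the initial log rate A_x (years[0] is t_1)."""
    log_m = {years[0]: A_x}
    for prev, t in zip(years, years[1:]):
        # Assumed discrete link: log m(x,t) = log m(x,t-1) - eta(x,t).
        log_m[t] = log_m[prev] - apc_improvement(alpha_x, iota, kappa, x, t)
    return log_m


# Hypothetical parameter values for age 60 over three calendar years.
x = 60
years = [2000, 2001, 2002]
alpha_60 = 0.02
iota = {1941: 0.001, 1942: -0.002}   # cohort effects indexed by birth-year t - x
kappa = {2001: 0.005, 2002: -0.003}  # period effects indexed by calendar year
log_m = log_rates_for_age(math.log(0.01), alpha_60, iota, kappa, x, years)
```

Differencing the resulting log rates recovers the improvement predictor, which is why the same parameters can be estimated either from improvement rates or from the rates themselves.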
We note that, subsequent to model fitting (with the possible exception of the latest UK Continuous Mortality Investigation model), the fixed effects parameters \(\kappa_{t}\) (and sometimes \(\iota_{t - x}\)) are treated as random variables to facilitate model extrapolation. Hence we investigate the effect of reformulating the modelling assumptions underpinning (1), (3) and (4) by including the random effects from the outset. To do so, we follow the approach of Lee et al. [12], which is based on hierarchical generalised linear models (HGLMs).
4 Normal linear mixed modelling and MIR
We start by modelling the linear predictor structure (1) treating both \(\kappa_{t}\) and \(\iota_{t - x}\) as random effects and using Statistic I of Sect. 2 as responses. In formulating the model matrices which follow, the Lexis plane is scanned along the age axis for each increasing time period in sequence and we attach the suffix \(i = \left( {x,t} \right)\) (subject to exchangeability as appropriate), when the need arises to refer to the individual components.
Consider the multivariate normal mixed model
$$y = X\beta + Zv + e,\qquad e \sim {\text{MVN}}\left( {0,\Sigma } \right),\qquad v \sim {\text{MVN}}\left( {0,\Lambda } \right),$$
subject to independence with \(\Sigma = \sigma_{e}^{2} I\), \(\Lambda = \sigma_{v}^{2} I\); \(\tau = \left( {\sigma_{e}^{2} ,\sigma_{v}^{2} } \right) = \left( {\phi ,\lambda } \right)\), and focus on the associated augmented linear model
\(\left( {\begin{array}{*{20}c} y \\ {\psi_{M} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} X & Z \\ O & I \\ \end{array} } \right)\left( {\begin{array}{*{20}c} \beta \\ v \\ \end{array} } \right)\), abbreviated to \(y_{a} = T\theta\), with quasi random effects responses \(\psi_{M} = {\text{E}}\left( v \right) = 0\) and augmented variance–covariance matrix
$$\Sigma_{a} = \left( {\begin{array}{*{20}c} \Sigma & O \\ O & \Lambda \\ \end{array} } \right),\qquad \Sigma = {\text{diag}}\left( {\phi_{i} } \right),\quad \Lambda = \lambda I,$$
where the suffix i is introduced to indicate optional fixed effects variable dispersion.
Specifically, the matrices are designated as follows:
X, \(N \times p\) fixed effects design matrix; \(\beta\), \(p \times 1\) matrix of fixed effects parameters \(\alpha_{x}\); Z, \(N \times q\) random effects design matrix; v, \(q \times 1\) matrix of random effects comprising \(\kappa_{t}\) and \(\iota_{t - x}\); T, \(\left( {N + q} \right) \times \left( {p + q} \right)\) augmented design matrix; \(\theta\), \(\left( {p + q} \right) \times 1\) augmented matrix of fixed parameters and random effects; \(y_{a}\), \(\left( {N + q} \right) \times 1\) matrix of augmented responses comprising either one of the MIR statistics and quasi random effects responses \(\psi_{M} = {\text{E}}\left( v \right)\); \(\Sigma_{a}\), \(\left( {N + q} \right) \times \left( {N + q} \right)\) augmented diagonal matrix of scale parameters; I and O, respective identity and zero matrices of appropriate size. The dimensions are expressed in terms of \(n_{1} = x_{k} - x_{1} + 1\) and \(n_{2} = t_{n} - t_{1} + 1\).
The three constraints \(\kappa_{{\text{min(t)}}} = \iota_{{\text{min(s)}}} = \iota_{{\text{min(s) + 1}}} = 0\) are applied, which are sufficient in number to ensure that the matrix T has full rank.
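For given dispersion values, the augmented linear model above reduces to a weighted least squares problem in \(\theta = (\beta, v)\). The sketch below illustrates only that inner step on a toy data set with constant \(\phi\) and \(\lambda\); it is not the full IWLS procedure of “Appendix I” (no dispersion updating), and the function names, the plain Gaussian elimination solver and the data are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_augmented(X, Z, y, phi, lam):
    """Solve T' S^-1 T theta = T' S^-1 y_a for given scalar phi and lam."""
    N, p, q = len(y), len(X[0]), len(Z[0])
    # Data rows of T are (X Z); the q extra rows are (O I) for psi_M = 0.
    T = [X[i] + Z[i] for i in range(N)]
    T += [[0.0] * p + [1.0 if j == k else 0.0 for k in range(q)] for j in range(q)]
    y_a = list(y) + [0.0] * q
    w = [1.0 / phi] * N + [1.0 / lam] * q  # diagonal of the inverse of Sigma_a
    m = p + q
    lhs = [[sum(w[i] * T[i][r] * T[i][c] for i in range(N + q)) for c in range(m)]
           for r in range(m)]
    rhs = [sum(w[i] * T[i][r] * y_a[i] for i in range(N + q)) for r in range(m)]
    theta = solve_linear(lhs, rhs)
    return theta[:p], theta[p:]


# Toy example: one fixed intercept and two random group effects.
X = [[1.0], [1.0], [1.0], [1.0]]
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [1.0, 1.2, 2.0, 2.2]
beta_hat, v_hat = fit_augmented(X, Z, y, phi=1.0, lam=1.0)
```

In this toy case the raw group deviations from the overall mean are ±0.5, while the estimated random effects are shrunk towards zero, which is the characteristic effect of the quasi responses \(\psi_{M} = 0\).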
The model is fitted using the iterative weighted least squares (IWLS) procedure outlined in “Appendix I”. Three sets of (Studentised) residuals
are generated where
subject to the respective constraints
We recall that the modelling assumptions imply that the mean values of all three sets of residuals are also zero.
4.1 An application
We make use of the UK male mortality data set, period 1960–2016, ages 0–102. We remark that the setting of the upper age limit (102) avoids any empty data cells for very little loss of other data above the age of 102. Alternatively, the introduction of 0/1 prior weights can be used to marginally extend the upper age limit.
Details of the fitted mixed effects model structure (1), with variable dispersion, are presented in Fig. 2, followed by some associated residual plots in Figs. 3, 4, 5. Thus the upper two panels in Fig. 2 depict the respective first and second moment fixed effects parameter estimates \(\hat{\alpha }_{x}\) and \(\hat{\phi }_{x}\), while the lower two panels depict the respective random effects components \(\hat{\iota }_{t - x}\) and \(\hat{\kappa }_{t}\).
Figure 3, the upper pair of panels in Fig. 4, together with the left hand column of panels in Fig. 5 refer to the first or primary set of residuals (5); with the centre and right hand columns of panels (Fig. 5) referring to the respective sets of cohort (centre) and period random effects residuals. For these sets of residuals we find that
We note the following features from the figures: the constant nature of the bands of residuals in Fig. 3, consistent with the successful capture of dispersion effects; the generally satisfactory distribution of positive and negative residuals across the Lexis plane, with no noteworthy gaps (Fig. 4, upper two panels); and the marginal nature of the centre column of Normal and half-Normal plots (Fig. 5) associated with the cohort random effects. Taken as a whole, the residual plots are indicative that the modelling assumptions have been largely met.
We turn next to the replacement of the empirical MIR displayed in Fig. 1 with the corresponding fitted MIR. These are reproduced in the panels of Fig. 6, with the exception of the lower right panel for 2006. In addition, we have used the same parameter setting to generate the matching smooth-spline curves, which are duplicated together with that generated for 2006 in the lower right panel. We note the sharpening focus of the patterns in each panel as a consequence. We note also the consistent shape of the resulting spline-smoother curves, which are characterised by two local extremes, subject to a slight drift in their positioning on the age axis. The two lower somewhat detached spline curves (in the lower right panel) are the product of the two most recent years, reflecting the recent decline in mortality improvement rates (a particular phenomenon which is analysed further by [5]). Almost without exception, for all panels, the readings (in terms of improvement rates) are positive.
5 Poisson-Normal generalised linear mixed modelling and MR
In this section, we model the log-link linear predictor structure (4), with cohort and period random effects, using a combination of Poisson empirical mortality rates \(y_{i} = \hat{m}_{i}\) and quasi random effects \(\psi_{M} = {\text{E}}\left( v \right) = 0\) to act as responses (dependent variables). Thus we:
Define an augmented GLM with responses \(y_{a} = \left( {y^{t} ,\psi_{M}^{t} } \right)^{t}\) such that
with potentially structured dispersion or scale parameters \(\tau\)\(= \left( {\phi ,\lambda } \right)\), respective Poisson and Normal variance functions \(V(\mu_{i} ) = \mu_{i} , \, V_{M} \left( {\psi_{Mi} } \right) = 1\), with log and identity link functions and augmented linear predictor
The matrix/vector symbols are all as defined in Sect. 4, subject to changes in the details. Thus in the case of (4) with the two random effects components
while no constraints are necessary to ensure the full rank of the matrix T.
Model fitting requires an adjusted dependent variable \(z_{a} = \left( {z^{t} ,z_{M}^{t} } \right)^{t}\) whose values
reflect the respective log and identity link functions, and a variance covariance matrix
which reflects the choice of respective variance functions: the suffix applied to \(\phi\) being indicative of fixed effects variable dispersion. Fitting then follows the IWLS procedure as laid out in “Appendix I” subject to the following changes:

1. Given \(\tau = \left( {\phi ,\lambda } \right)\), update \(\hat{\theta }\) by computing \(\Sigma_{a}^{ - 1}\), \(z_{a}\) and solving \(T^{t} \Sigma_{a}^{ - 1} T\hat{\theta } = T^{t} \Sigma_{a}^{ - 1} z_{a}\).

2. Replace \(d_{i}\) when updating the fixed effects dispersion parameter \(\phi\) with either the squared deviance or squared Pearson residuals
$$d_{i} = 2\left( {y_{i} \log \left( {\frac{{y_{i} }}{{\mu_{i} }}} \right) - \left( {y_{i} - \mu_{i} } \right)} \right),\;\;d_{i} = \frac{{\left( {y_{i} - \mu_{i} } \right)^{2} }}{{\mu_{i} }}.$$

The empirical mortality rates provide the starting values for \(\mu\). Details of the residuals are as described in Sect. 4.
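The two alternative choices of \(d_{i}\) in step 2 can be sketched as follows. The function names are hypothetical, and the handling of \(y_{i} = 0\) (using the limit \(y \log y \to 0\)) is an assumption added here for robustness rather than something stated in the text.

```python
import math


def poisson_deviance_component(y, mu):
    """Squared deviance residual d = 2 * (y * log(y / mu) - (y - mu))."""
    if y == 0:
        # Assumed convention: y * log(y / mu) tends to 0 as y tends to 0.
        return 2.0 * mu
    return 2.0 * (y * math.log(y / mu) - (y - mu))


def pearson_component(y, mu):
    """Squared Pearson residual d = (y - mu)^2 / mu."""
    return (y - mu) ** 2 / mu


# Hypothetical observed and fitted values for a single cell.
d_dev = poisson_deviance_component(6.0, 4.0)
d_pearson = pearson_component(6.0, 4.0)
```

Both components are zero when the fitted value equals the observation and positive otherwise, so either can serve as the response when updating the dispersion parameter.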
5.1 An application
We make use of the same UK male mortality data set for the period 1960–2016, ages 0–102, as above. Key details of the mixed effects structure (4) with variable dispersion by age are presented in Fig. 7: with the first moment fixed effects parameter estimates \(\hat{\alpha }_{x}\) and \(\hat{\beta }_{x}\) displayed in the two upper panels and the two random effects \(\hat{\iota }_{t - x}\) and \(\hat{\kappa }_{t}\) in the two centre panels. We note that \(\hat{\alpha }_{x}\) takes the familiar shape of a ‘static life table’ including an ‘accident hump’. We also note that the second moment fixed effects parameter \(\hat{\phi }_{x}\), which is not displayed, is not too dissimilar in shape from its counterpart depicted in Fig. 2. In anticipation of the subsequent application of a first order autoregressive integrated ARI(1,1) time series, we depict the matching first differences of the two random effects components in the lower two panels.
Certain residual plots are presented in the two lower panels of Figs. 4 and 8, with the former indicating a lack of randomness in the spatial distribution of positive and negative residuals across the Lexis plane. The residuals, when plotted against age, period and cohort year, show patterns similar to the ones displayed in Fig. 3 and have not, as a consequence, been reproduced.
The model is extrapolated periodwise by applying an ARI(1,1) time series to the two random effects components as depicted in Fig. 9, together with forecasts and including matching residual plots in the alternative panels. Note that these time series plots replicate the respective details of the two centre panels in Fig. 7, while recalling the roles played by the two lower panels in Fig. 7 in generating the ARI(1,1) forecasts.
On the basis of 1000 time series forecast simulations, we have computed the period-based life expectancies at each simulation, for certain ages, and depict the resulting 0.05, 0.5, 0.95 quantiles in Fig. 10. We have focused on period-based computations as opposed to cohort-based computations since period-based forecasts may be readily calibrated against historical trends based on the raw mortality data stretching back to 1922, which are also featured in Fig. 10. It would appear that the median forecast trends represent an overstatement when compared with the most recent trend over the past decade or so, where mortality improvement rates have decelerated (see Fig. 1). Attempts to influence this feature by shortening the modelling period, in order to focus on the more recent mortality trend data, have not proved to be successful.
6 Incorporating an AR(1) time series into the MIR and MR fitting process
When the model structures of Sect. 3 are formulated with only the main period component designated as a random effect, the possibility of simultaneously fitting a single parameter AR(1) period component time series arises. This involves the necessary adjustments to the two design matrices X and Z and to the composition of the fixed and random effects matrices \(\beta\) and v, coupled with the introduction of a linear transformation of the random effects component
$$v = L\left( \rho \right)r,$$
where \(L\left( \rho \right)\) is a \(q \times q\) matrix, with parameter \(\rho\), and has \({\text{rank}}\left( {L\left( \rho \right)} \right) = q\).
Specifically for the AR(1) process, under the reverse transformation
$$r = L\left( \rho \right)^{ - 1} v = \left( {I - K} \right)v,$$
where K has non-zero elements \(k_{t + 1,t} = \rho\), so that component-wise
$$r_{t} = v_{t} - \rho v_{t - 1} ,\quad t = t_{\min } ,t_{\min } + 1, \ldots ,t_{n} .$$
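The AR(1) transformation can be made concrete with a small numerical sketch. It assumes that \(L(\rho)\) is the lower-triangular matrix with entries \(\rho^{i-j}\) (the inverse of \(I - K\)) and that the value preceding the first period is taken as zero; both are assumptions made for illustration, and the function names are hypothetical.

```python
def ar1_transform_matrix(rho, q):
    """Lower-triangular L(rho) with entries rho**(i - j) for i >= j."""
    return [[rho ** (i - j) if i >= j else 0.0 for j in range(q)] for i in range(q)]


def mat_vec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def reverse_transform(v, rho):
    """Component-wise r_t = v_t - rho * v_{t-1}."""
    r = []
    prev = 0.0  # assumption: the value before the first period is zero
    for value in v:
        r.append(value - rho * prev)
        prev = value
    return r


# Hypothetical innovations r and the correlated period effects v = L(rho) r.
rho = 0.5
r = [1.0, -0.5, 0.25, 0.0]
v = mat_vec(ar1_transform_matrix(rho, len(r)), r)
r_back = reverse_transform(v, rho)
```

Because \(L(\rho)\) has full rank, the mapping between \(v\) and \(r\) is one-to-one, so the model can be fitted in terms of the uncorrelated \(r\) with the modified design matrix \(Z L(\rho)\).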
Comprehensive model fitting then follows by embedding the IWLS fitting process (“Appendix I”) between the following two steps:

1. Given \(\rho\) and hence \(L\left( \rho \right)\), estimate \(\left( {\beta ,r,\phi ,\lambda } \right)\) for the model \(\eta = g\left( \mu \right) = X\beta + Z^{*} r\), where \(Z^{*} = ZL\left( \rho \right)\) and g is the appropriate identity or log link function.

2. Given \(\left( {\beta ,r,\phi ,\lambda } \right)\), estimate \(\rho\) by maximising the adjusted profile likelihood. Here, with \(r \sim {\text{MVN}}\left( {O,\Lambda } \right)\), \(\Lambda = {\text{diag}}\left( \lambda \right)\) and \(v = L\left( \rho \right)r\), we maximise the profile likelihood of \(v \sim {\text{MVN}}\left( {O,L\left( \rho \right)\Lambda L\left( \rho \right)^{t} } \right)\), adjusted to \(v \sim {\text{MVN}}\left( {O,\Lambda } \right)\), as originally defined: with the parameter \(\rho\) entering the expression for the adjusted profile likelihood via the response term \(v\), which comprises the linear transformation of \(r\), involving \(\rho\).
We subsequently refer to the above comprehensive self-contained fitting and extrapolation process as Approach A. Alternatively, by setting \(L\left( \rho \right) = I\), we can retain the IWLS fitting process unchanged and separately fit the single parameter AR(1) time series, subsequently referred to as Approach B.
For (1)
with the two constraints \(\kappa_{\min } = \iota_{\min } = 0\) applied, and for (3)
with the three constraints \(\iota_{t_{1} - x_{k}} = \iota_{t_{n} - x_{1}} = \kappa_{t_{1}} = 0\) applied.
We note that in the case of (3) care is needed in the construction of matrix X to ensure that the main effects cohort parameters do not automatically take precedence over other fixed effects parameters (which appears to be the case if the R library lme4 is accessed and used in its construction).
6.1 An application
In addition to the direct modelling of MIR by the above process using model (1), we compare results obtained by the indirect modelling of MIR using model (3), which has also been formulated with a single period random effect. We again make use of the UK male mortality data set, this time with slightly reduced period (1961–2009) and age (1–89) ranges in order to facilitate comparison with certain aspects of the previously reported investigations in Haberman and Renshaw [6]. When fitting (1), we choose the MIR Statistic I of Sect. 2, while noting that the choice does not have a material effect on the ensuing results. When fitting (3), we include the term \(A_{x}\) in the design matrix as a fixed effect to be estimated, as opposed to giving it the status of a predetermined offset. For both models, we allow for the capture of structured dispersion by fixed age effects.
The fitted results obtained using (1) are depicted in Fig. 11, together with one of the resulting residual plots in Fig. 12: the patterns in the remaining residual plots being similar to their counterparts in Fig. 3 and the two upper panels in Fig. 4. Modelling is conducted using Approach A throughout this section, but we have additionally superimposed, in Fig. 11, the equivalent plots obtained using Approach B to modelling (broken lines). We note the consistency of patterns within each panel as is the case when comparing the residual plots (not reproduced here but available from the authors).
Similarly, the fitted results obtained using (3) are presented in Fig. 13 together with one of the resulting residual plots in Fig. 14: the patterns in the remaining residual plots being similar to their counterparts in Fig. 3 and the two lower panels in Fig. 4. We note that the estimated fixed effects term \(A_{x}\) depicted in the upper left panel of Fig. 13 takes on the familiar shape of a static life table (including an ‘accident hump’), and we have plotted the differences between this estimated age profile and the matching initial empirical log mortality rate profile in the lower left panel of Fig. 13 for comparison purposes. We note the pattern of narrowing differences with increasing age. Again, we have superimposed the equivalent plots obtained using Approach B to modelling (broken lines) in all but the lower left panel of Fig. 13 where two superimposed different point plotting symbols are displayed. The mutually compensating displaced patterns in the fixed effects parameter \(\alpha_{x}\) and \(\iota_{t - x}\) plots (upper right, centre left panels) are noteworthy while consistency is preserved in the remaining four panels. Of particular interest is the horizontal trend (around zero) in the respective random effects period component (lower left panel Fig. 11 and centre right panel Fig. 13), which is indicative of the choice of a single parameter AR(1) process. We believe that this is possibly a general feature which is a consequence of the mixed effects model design.
Since the patterns in the matching residual plots are essentially identical under both Approaches A and B to modelling, they have not been replicated. The supporting comparison of residual plots is largely as illustrated in Fig. 7a–d of Haberman and Renshaw [6].
6.2 Conversion of model outputs into model rates or improvement rates
The basic operation used to determine mortality rates from model (1) is that of definite integration, while that used to determine improvement rates from model (3) is that of differentiation. We have conducted each of these operations (in discrete time) to construct Figs. 15 and 16 by superimposing the respective fitted statistic (broken lines) on the corresponding empirical statistic (continuous line). Thus in Fig. 15, we have used the fitted values \(\hat{y}_{x,t}\) from model (1) (based on Statistic I of Sect. 2) to compute and depict the ‘fitted’ log mortality rates
for ages x = 40(05)75. Two cases are displayed, using (i) \(\hat{A}_{x}\) from model (3) (broken lines), and (ii) \(\log \hat{m}_{{x,t_{1} }}\), the initial empirical log rates (dotted lines), to provide the starter log rates \(\log m_{{x,t_{1} }}\). In Fig. 16, we have performed the reciprocal conversion process by using the parameter estimates from model (3) to generate the broken lines depicting ‘fitted’ improvements. This issue is discussed further in Sect. 7.
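One way to carry out the discrete conversion from fitted improvement rates to log rates is to invert the definition of Statistic I, which gives \(m_{x,t}/m_{x,t-1} = (2 - y_{xt})/(2 + y_{xt})\). The sketch below rolls a starter log rate forward in this way; it is an illustration under that assumption, with hypothetical function names and values, and is not necessarily the exact procedure used to construct Fig. 15.

```python
import math


def statistic_one(m_prev, m_curr):
    """Statistic I of Sect. 2 for two consecutive central rates."""
    return 2.0 * (m_prev - m_curr) / (m_prev + m_curr)


def roll_forward_log_rates(log_m_start, improvements):
    """Cumulate log rates from a starter value using Statistic I improvements."""
    log_m = [log_m_start]
    for y in improvements:
        # Inverting Statistic I: m_t / m_{t-1} = (2 - y) / (2 + y), with |y| < 2.
        log_m.append(log_m[-1] + math.log((2.0 - y) / (2.0 + y)))
    return log_m


# Hypothetical central rates at one age; improvements are derived from them
# and then used to rebuild the log rates from the first year.
rates = [0.0100, 0.0097, 0.0095]
improvements = [statistic_one(a, b) for a, b in zip(rates, rates[1:])]
rebuilt = roll_forward_log_rates(math.log(rates[0]), improvements)
```

Replacing the empirical improvements with fitted values, and the starter value with either an estimated or an empirical initial log rate, gives the two cases compared in the text.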
6.3 Model extrapolation
Once the models have been fitted to the data, the method by which mortality improvement rates are extrapolated and converted to rates, using the fixed effects parameter estimates \(\hat{\alpha }_{x}\) and \(\hat{\iota }_{t - x}\), and random effects time series \(\hat{\kappa }_{t} = \hat{v}_{t}\), is identical for the two models (1) and (3): only the method of estimating these effects differs.
In the application which follows, when converting from extrapolated improvements to rates, we adopt the procedure described in Sect. 2.6 of Haberman and Renshaw [6]. For extrapolation to ages outside the dataset (referred to as “topping out” by age in the literature), we follow the period based version of the procedure using a fitted hyperbolic function which is described in Sect. 2.7 of Haberman and Renshaw [7]. In order to facilitate comparison with the earlier results of Haberman and Renshaw [6], in this application, we have truncated the lower end of the age range to ages 20–89, prior to modelling.
Given the horizontal trend in the random effects period components, coupled with the design feature \(E\left( {v_{i} } \right) = 0\), we extrapolate the models by applying the single parameter \(AR\left( 1 \right)\) time series to \(\hat{\kappa }_{t}\). By this method, we have constructed Fig. 17 which depicts extrapolated life expectancy 5%, 50%, 95% quantiles, computed on a cohort basis, and plotted as the individual horizontal lines, each based on 1000 simulations. First, in the two upper panels we show the respective model evolving biennial \(t_{n} =\) 1995(02)2009 extrapolations (subject to front-end data deletions shown in descending sequence), thereby using data for 1961–1995, 1961–1997,…, 1961–2009. Second, in the two lower panels, we show the respective model static \(t_{n} =\) 2009 extrapolations having first subjected the data to systematic biennial rear-end truncations 1961(02)1975, thereby using data for the periods 1961–2009, 1963–2009,…, 1975–2009 (shown in ascending sequence).
We note the similarity of the extrapolations on comparing the different models, together with the material narrowing of the intervals in comparison with the intervals reported in Haberman and Renshaw [6], using a variety of fixed effects model structures.
As further evidence of the horizontal trend about zero in the random effects period component, we have compared the effects of including an additional intercept parameter in the AR(1) time series using Approach B. The results are presented in Table 1, in which we have tabulated the AR(1) parameter estimates, standard errors and p-values, both with and without the intercept parameter included. We note the lack of statistical significance of the intercept parameter throughout.
We now revert to the full 1–89 age range, and an illustration of the 40(05)75 age specific empirical log mortality rates with simulated 5%, 50%, 95% quantile projections using model (1) and Approach A of Sect. 6 is presented in Fig. 18. These projections, based on 2000 simulations, have been generated by sampling the error in the period component time series. As has been widely discussed in the literature (see Lee and Carter [11], for example), this approach is an approximate one and does not take into account the smaller contribution from the uncertainty in the fixed effects parameter estimates. We have adopted this approach on grounds of simplicity, with the justification that the uncertainty in the fixed effects parameter estimates would only make a marginal difference to the prediction intervals.
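The simulation step described above can be illustrated with a minimal sketch. It is written in Python rather than the authors' R, and it is not their program: the function names, the conditional least squares estimate of the AR(1) parameter and the residual degrees of freedom are all assumptions made for illustration. The sketch fits a single parameter AR(1) series (no intercept) to a fitted period component, simulates future paths by sampling only the time series error, and returns pointwise quantiles.

```python
import numpy as np

def fit_ar1_no_intercept(kappa):
    """Conditional least squares fit of kappa_t = rho * kappa_{t-1} + e_t (assumed estimator)."""
    kappa = np.asarray(kappa, dtype=float)
    lagged, current = kappa[:-1], kappa[1:]
    rho = (current @ lagged) / (lagged @ lagged)
    resid = current - rho * lagged
    # One degree of freedom is deducted for the single estimated parameter.
    sigma = np.sqrt(resid @ resid / (len(resid) - 1))
    return rho, sigma

def simulate_ar1_quantiles(kappa, horizon, n_sims=1000,
                           probs=(0.05, 0.5, 0.95), seed=0):
    """Simulate AR(1) paths beyond the last fitted value and return pointwise quantiles."""
    kappa = np.asarray(kappa, dtype=float)
    rho, sigma = fit_ar1_no_intercept(kappa)
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=(n_sims, horizon))
    paths = np.empty((n_sims, horizon))
    last = np.full(n_sims, kappa[-1])
    for k in range(horizon):
        # Only the time series error is sampled; parameter uncertainty is ignored.
        last = rho * last + shocks[:, k]
        paths[:, k] = last
    # Rows of the result correspond to the requested probabilities.
    return np.quantile(paths, probs, axis=0)
```

In keeping with the approximation discussed above, the uncertainty in the fixed effects parameter estimates and in the AR(1) parameter itself is not propagated in this sketch.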
7 Discussion
Concerning Figs. 1 and 6, the choice of n when forming the n-year MIR rolling averages is, in part, somewhat arbitrary. However, as n increases, further investigations (not included here) show that the emerging age patterns in the MIR become sharper in focus and display greater consistency. In addition, the patterns of tightly packed MIR age profiles (lower right panel Fig. 6) are found to extend backwards in time over the best part of the last half century before breaking up. This feature is broadly supported by the equivalent details of Fig. 1 which extends backwards in time much further.
The same mathematical formula connecting the two alternative MIR statistics of Sect. 2 continues to apply when the statistics are redefined from the cohort, as opposed to period perspective, to read as follows:
Statistic I: \(\tilde{y}_{xt} = \frac{ - \Delta_{s} m_{x,t} }{ \left( m_{x - 1,t - 1} + m_{x,t} \right) / 2 } = 2\frac{ \left( 1 - m_{x,t} / m_{x - 1,t - 1} \right) }{ \left( 1 + m_{x,t} / m_{x - 1,t - 1} \right) }\), and
Statistic II: \(\tilde{z}_{xt} = \Delta_{s} \log m_{x,t} = \log m_{x,t} - \log m_{x - 1,t - 1}\).
See Haberman and Renshaw [7] for application of the former approach.
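As an illustration of the two cohort-perspective statistics, the following Python sketch (not the authors' R code; the function names and the array layout, with ages in rows and calendar years in columns, are assumptions) computes both statistics from an array of central mortality rates. It also shows how a rate can be recovered from Statistic I and the preceding rate on the same cohort diagonal. Writing \(r = m_{x,t} / m_{x - 1,t - 1}\), the definitions give \(\tilde{y}_{xt} = 2\left( 1 - r \right)/\left( 1 + r \right)\) and \(\tilde{z}_{xt} = \log r\), so that \(\tilde{y}_{xt} = - 2\tanh \left( \tilde{z}_{xt} /2 \right)\) and \(r = \left( 2 - \tilde{y}_{xt} \right)/\left( 2 + \tilde{y}_{xt} \right)\).

```python
import numpy as np

def cohort_mir(m):
    """Return (Statistic I, Statistic II) along cohort diagonals.

    m[x, t] holds central mortality rates, ages in rows and years in columns
    (an assumed layout). Entry [i, j] of each result refers to cell (i + 1, j + 1).
    """
    m = np.asarray(m, dtype=float)
    prev = m[:-1, :-1]   # m_{x-1, t-1}
    curr = m[1:, 1:]     # m_{x, t}
    stat_I = (prev - curr) / ((prev + curr) / 2.0)
    stat_II = np.log(curr) - np.log(prev)
    return stat_I, stat_II

def rates_from_stat_I(prev_rate, stat_I):
    """Invert Statistic I: m_{x,t} = m_{x-1,t-1} * (2 - y) / (2 + y)."""
    return prev_rate * (2.0 - stat_I) / (2.0 + stat_I)
```

The inversion requires a starting rate on each cohort diagonal, which is the role played by the initial rates when improvements are converted back to rates.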
While noting that Schinzinger et al. [18] use a different predictor structure, which is multiplicative bilinear as opposed to our additive linear structure, both approaches assume Normal MIR random variables. Of particular relevance, we refer to the Centre Right panel of their Fig. 6.1 in which the annual age-aggregated empirical MIR, based on ages 21–100, for periods 1971–2011 are depicted for UK males (and females).
Using our slightly shorter version of the data, and the MIR Statistic II in common with Schinzinger et al. [18], the same results (males only), for the overlapping periods 1971–2009, are reproduced in the Upper Left panel of Fig. 19. In addition, a scatterplot of the time-adjacent age-aggregated MIR is depicted in the Upper Right panel, to illustrate the degree and nature of the correlation between these two measures, as discussed in Sect. 6.1 of Schinzinger et al. [18]. The equivalent results using the MIR Statistic I are depicted in the two Lower panels of Fig. 19. The comparison of matching upper and lower left panels reveals that the one is the mirror image, in the x-axis, of the other, for reasons given in Sect. 2; while the upwards trend in the pattern in the lower left panel accurately portrays the improvement.
Denuit and Trufin [4] provide a further interesting example of a HGLM with random effects period component, without classifying it as such. Thus, as a means of projecting a regulatory life table, an exponential decline model
subject to stipulated constraints is formulated, in which the mortality rates \(m_{x,t}\) are subjected to a sequence of annual random shocks \(\Lambda_{t}\). Thus
with log-link, so that
where the first term on the RHS is treated as an offset \(\log \hat{m}_{x,t  i}\), the next two fixed effects terms are estimated, and the final term \(v_{t} = \log \Lambda_{t}\) is modelled as a normal random effects variable (so that \(\Lambda_{t}\) has the lognormal distribution); with estimates subject to the stipulated constraints.
The philosophy underpinning this approach to generalised linear modelling with random effects and the associated likelihood theory, which we have followed and which is described in Lee et al. [12], is non-Bayesian.
In addition to providing the detailed methodology on which Sects. 4–6 are based, the methodology of Lee et al. [12] could be extended to incorporate smoothing, including smoothing by beta-splines.
We note that there is a long established practice of using the Poisson distribution to model numbers of deaths and hence central mortality rates. This starts historically with a static setting in the construction and graduation of life tables and moves forward to a dynamic setting as here and in the extensive literature on longevity modelling. As discussed in Sect. 1 of Hunt and Villegas [10], there would appear to be no such established practice for the direct modelling of central mortality improvement rates. Given our results and in particular the residual plots reported in the applications (and also by Haberman and Renshaw [6]), we believe there is adequate evidence for the use of the Gaussian distribution to model mortality improvement rates.
In their Sect. 3.4, Hunt and Villegas [10] have identified an important inherent difficulty with modelling improvements directly, expressed in terms of a “crude” or “fitted” estimation approach. We conjecture that this manifests itself, in the current modelling framework, as the choice between estimating \(A_{x}\) as part of the fixed effects structure in (3) or treating \(\hat{A}_{x} = \log \hat{m}_{{x,t_{1} }}\) as an offset term.
Further, this issue appears to be bound up with the difficulty of reversing the differentiation process which requires a suitable ‘constant’ of integration: a situation reminiscent of the problem of setting boundary values when integrating partial differential equations in mathematical physics. Setting aside model extrapolation, and given that the sequential differencing of rates is implied in the construction of improvement measures, the use of initial rates to reverse the process is a potential concern, and was the motivation behind the construction of Fig. 15 in Sect. 6.
When introducing a main effects cohort term into a dynamic parameterised predictor structure for modelling mortality rates (Renshaw and Haberman [16]), we have naturally assumed that the supporting Lexis plane was divided into unit squares, typically of size 1 year. We have had the benefit of a data set with sufficient granularity to match this level of detail. In so doing, we have avoided some of the problems that were experienced by Mitchell et al. [14] who have needed to use grouped data and irregular grouped data in some cases: this has led to difficulties in modelling the parameterised cohort components.
We note that the “topping-out” procedure does have a material effect on the values of the projected life expectancies depicted in Fig. 17. We note also that the choice of fixed effects parameter constraints made for the three mixed effects models is not necessarily unique but is sufficient to ensure that the matrix T has full rank.
8 Concluding comments
The comparison of MIR Statistics (Sect. 2) indicates the greater accuracy of Statistic I. However, we have found that the replacement of Statistic I by Statistic II (subject to reversal of sign) does not induce any material consequential differences in our reported results.
By attributing random effects to the period and cohort components or just the period components of a main effects age-period-cohort structured linear predictor from the outset, we have described a comprehensive self-contained process for modelling and forecasting mortality improvement rates, which incorporates structured dispersion and an apparent self-selecting time series. This process is made possible through the implementation of HGLMs with random effects. The methodology also extends to the modelling and forecasting of mortality rates provided that the predictor structure is linear, and, as such, excludes alternatives that involve bilinear decomposition. We argue that this methodological framework leads to a more unified approach to model fitting and forecasting.
For the linear predictor structures under consideration, the choice of the associated time series models, AR(1) for MIR mixed effects modelling and ARI(1,1) for MR mixed effects modelling, follows as a direct consequence of the initial modelling assumptions. The application of these time series models in this context, which are shown to be well supported by the data, may be further generalised by increasing the number of autoregressive terms when this is justified by the practical application.
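The relationship between the two time series can be sketched as follows: an ARI(1,1) series is one whose first differences follow an AR(1) series, so a forecast is obtained by forecasting the differences and accumulating them from the last observed value. The Python sketch below is illustrative only and is not the authors' ARI function. The function name, the omission of any intercept or drift term, and the conditional least squares estimator are assumptions. The forecast root mean square errors use the standard result that the h-step error variance is \(\sigma^{2} \sum_{j = 0}^{h - 1} \psi_{j}^{2}\) with \(\psi_{j} = 1 + \rho + \cdots + \rho^{j}\).

```python
import numpy as np

def ari11_forecast(series, horizon):
    """Forecast an ARI(1,1) series with no drift term (an assumption of this sketch).

    Returns (mean forecasts, forecast RMSEs, estimated AR parameter).
    """
    series = np.asarray(series, dtype=float)
    d = np.diff(series)                      # first differences, modelled as AR(1)
    lagged, current = d[:-1], d[1:]
    rho = (current @ lagged) / (lagged @ lagged)
    resid = current - rho * lagged
    sigma2 = resid @ resid / (len(resid) - 1)
    steps = np.arange(1, horizon + 1)
    d_forecast = d[-1] * rho ** steps        # AR(1) forecasts of the differences
    mean = series[-1] + np.cumsum(d_forecast)
    psi = np.cumsum(rho ** np.arange(horizon))   # psi_j = 1 + rho + ... + rho^j
    rmse = np.sqrt(sigma2 * np.cumsum(psi ** 2))
    return mean, rmse, rho
```

Adding further autoregressive terms, as mentioned above, would change the estimation step and the \(\psi\) weights but not the overall difference, forecast and accumulate structure.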
We note the usefulness of examining the patterns of the random effects residuals as these provide additional critical insight into whether the modelling assumptions are being adhered to.
We note also the usefulness of comparing the empirical and modelled age specific mortality improvement rates. For example, we note that the MIR age profile modelled patterns of Sect. 4 (Fig. 6) are broadly similar to the corresponding empirically generated MIR age profiles of Fig. 1.
There is a long established practice of using the Poisson distribution to model numbers of deaths and hence central mortality rates. Given our detailed modelling results, we believe there is adequate empirical evidence for the use of the Gaussian distribution to model mortality improvement rates.
A word of caution is needed in the use of the Sect. 5 methodology for the long term forecasting of mortality rates and life expectancies in so far as it may not be sufficiently flexible to reflect recently emerging trends in the underlying mortality rates and corresponding life expectancies for certain countries (see, for example, Case and Deaton [2] and Djeundje et al. [5]).
Availability of data and material
The empirical data are publicly available from the Human Mortality Database.
Code availability
The code is open source and written in R statistical software. The code is available upon request from the authors.
References
Bohk-Ewald C, Rau R (2017) Probabilistic mortality forecasting with varying age-specific survival improvements. Genus 73:1
Case A, Deaton A (2015) Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proc Natl Acad Sci USA 112:15078–15083
Continuous Mortality Investigation (2016) CMI Mortality Projections Model, Working Paper 90. In: Institute and Faculty of Actuaries. https://www.actuaries.org.uk/learnanddevelop/continuousmortalityinvestigation/cmiworkingpapers/mortalityprojections/cmiworkingpapers9091and93
Denuit M, Trufin J (2016) From regulatory life table to stochastic mortality projections: the exponential decline model. Insurance Math Econ 71:295–303
Djeundje VB, Haberman S, Bajekal M, Lu J (2020) An analysis of mortality trends in developed countries, focusing on the recent slowdown in mortality improvements. In: Longevity Science Panel working paper. https://www.longevitypanel.co.uk/
Haberman S, Renshaw AE (2012) Parametric mortality rate modelling and projecting. Insurance Math Econ 50:309–333
Haberman S, Renshaw AE (2013) Modelling and projecting mortality improvement rates using a cohort perspective. Insurance Math Econ 53:150–168
Haberman S, Kaishev VK, Millossovich P, Villegas AM, Baxter SD, Gaches AT, Gunnlausson S, Sison M (2014) Longevity basis risk: a methodology for assessing basis risk. Technical Report, Cass Business School and Hymans Robertson LLP. Presented to Institute and Faculty of Actuaries. https://www.actuaries.org.uk/learnanddevelop/conferencepaperarchive/2014
Human Mortality Database (2019) www.mortality.org
Hunt A, Villegas AM (2020) Mortality improvement rates: modelling, parameter uncertainty and robustness. In: ARC Centre of Excellence in Population Ageing Research Working Paper 2020/28
Lee RD, Carter L (1992) Modelling and forecasting the time series of US mortality projection models. J Am Stat Assoc 87:659–671
Lee Y, Nelder JA, Pawitan Y (2006) Generalised linear models with random effects. Chapman & Hall
Li JSH, Chan WS, Cheung SH (2011) Structural changes in the Lee-Carter mortality indexes: detection and implementations. N Am Actuarial J 15(1):13–31
Mitchell D, Brockett PL, Mendoza-Arriaga R, Muthuraman K (2013) Modelling and forecasting mortality rates. Insurance Math Econ 52:275–285
R Core Team (2017) A language and environment for statistical modelling. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Renshaw A, Haberman S (2006) A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance Math Econ 38(3):556–570
Richards SJ, Currie ID, Kleinow T, Ritchie GP (2017) A Stochastic Implementation of the APCI model for mortality projections. www.longevitas.co.uk
Schinzinger E, Denuit MM, Christiansen MC (2016) A multivariate evolutionary credibility model for mortality improvement rates. Insurance Math Econ 69:70–81
Villegas AM, Haberman S, Kaishev VK, Millossovich P (2017) A comparative study of two population models for the assessment of risk bias in longevity hedges. ASTIN Bull 47(3):631–679
Funding
Not applicable.
Ethics declarations
Conflicts of interest
We declare that we have no competing interests.
Appendices
Appendix I
Iterative weighted least squares (IWLS) model fitting procedure:

1.
Given \(\mathbf{\tau} = \left( {\phi ,\lambda } \right)\):
update \(\hat{\mathbf{\theta}}\) by computing \(\mathbf{\Sigma}_{a}^{ - 1}\) and solving \(\mathbf{T}^{t} \mathbf{\Sigma}_{a}^{ - 1} \mathbf{T}\hat{\mathbf{\theta}} = \mathbf{T}^{t} \mathbf{\Sigma}_{a}^{ - 1} \mathbf{y}_{a}\);
compute leverages \(\left( {h_{i} ,h_{Mi} } \right)\), the diagonal elements of the matrix \(\mathbf{T}\left( {\mathbf{T}^{t} \mathbf{\Sigma}_{a}^{ - 1} \mathbf{T}} \right)^{ - 1} \mathbf{T}^{t} \mathbf{\Sigma}_{a}^{ - 1}\).

2.
Given \(\theta\)
update the components of \(\hat{\tau }\) sequentially using the Gamma GLMs
| GLM characteristic | Fixed effects \(\phi\) | Random effects \(\lambda\) |
| --- | --- | --- |
| Response | \(d_{i}^{*}\) | \(d_{Mi}^{*}\) |
| Mean | \(E\left( {d_{i}^{*} } \right) = \phi\) | \(E\left( {d_{Mi}^{*} } \right) = \lambda\) |
| Variance | \({\text{var}} \left( {d_{i}^{*} } \right) = 2\phi^{2}\) | \({\text{var}} \left( {d_{Mi}^{*} } \right) = 2\lambda^{2}\) |
| Link | \(\xi = g\left( \phi \right)\) | \(\xi_{M} = g_{M} \left( \lambda \right)\) |
| Linear predictor | \(\gamma\) | \(\gamma_{M}\) |
| Deviance component | \(gamma\left( {d_{i}^{*} ,\phi } \right)\) | \(gamma\left( {d_{Mi}^{*} ,\lambda } \right)\) |
| Prior weights | \(\left( {1 - h_{i} } \right)/2\) | \(\left( {1 - h_{Mi} } \right)/2\) |
where the respective responses
the deviance components
and \(i = \left( {x,t} \right)\) identifies matching data cells and matrix components.
Starting values of the order 0.01 are allocated to the components of \(\tau = \left( {\phi ,\lambda } \right)\), while termination of the iterative process is controlled by the convergence of the three GLM deviances (and/or convergence of the absolute difference in the sequential maximum scale parameter: thus \(\mathop {\max }\limits_{i} \left| {\tau_{i,j} - \tau_{i,j - 1} } \right| < \text{tolerance}\) for iterations \(j = 1,2,3,...\) and preset tolerance). Both the canonical reciprocal link and log link have been found to work equally well. For uniform dispersion, the linear predictors \(\gamma\) and \(\gamma_{M}\) are set to be constants, while variable fixed effects dispersion by age is possible by setting the linear predictors and starting values accordingly. When this is the case, \(\phi_{i}\) replaces \(\phi\).
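The two-step procedure can be sketched in Python (not the authors' R program) for the Normal-Normal case with uniform dispersion. Several ingredients are assumptions of this sketch rather than statements taken from the text above. The deviance components are taken to be \(d_{i} = \left( y_{i} - \hat{\mu }_{i} \right)^{2}\) and \(d_{Mi} = \hat{v}_{i}^{2}\), and the responses to be \(d_{i}^{*} = d_{i} /\left( 1 - h_{i} \right)\) and \(d_{Mi}^{*} = d_{Mi} /\left( 1 - h_{Mi} \right)\), which is the usual HGLM treatment. The augmented matrices are built as \(\mathbf{T} = \left( \mathbf{X}\;\mathbf{Z};\;\mathbf{0}\;\mathbf{I} \right)\) and \(\mathbf{y}_{a} = \left( \mathbf{y};\;\mathbf{0} \right)\). The function name is hypothetical. With a constant linear predictor, each Gamma GLM with prior weights \(\left( 1 - h \right)/2\) has fitted mean equal to the weighted average of its responses, so the update reduces to \(\sum d /\sum \left( 1 - h \right)\) under either link.

```python
import numpy as np

def fit_normal_hglm(y, X, Z, phi=0.01, lam=0.01, tol=1e-8, max_iter=500):
    """IWLS sketch for y = X beta + Z v + e, with e ~ N(0, phi) and v ~ N(0, lam)."""
    N, p = X.shape
    q = Z.shape[1]
    # Augmented design matrix and response (assumed construction).
    T = np.block([[X, Z], [np.zeros((q, p)), np.eye(q)]])
    y_a = np.concatenate([y, np.zeros(q)])
    for _ in range(max_iter):
        # Step 1: given tau = (phi, lam), update theta = (beta, v).
        w = np.concatenate([np.full(N, 1.0 / phi), np.full(q, 1.0 / lam)])
        A = T.T @ (w[:, None] * T)                 # T^t Sigma_a^{-1} T
        theta = np.linalg.solve(A, T.T @ (w * y_a))
        # Leverages: diagonal of T (T^t Sigma_a^{-1} T)^{-1} T^t Sigma_a^{-1}.
        h_all = np.einsum("ij,ji->i", T, np.linalg.solve(A, T.T * w))
        h, h_M = h_all[:N], h_all[N:]
        # Step 2: given theta, update tau via the intercept-only Gamma GLMs,
        # which here have the closed form sum(d) / sum(1 - h).
        beta, v = theta[:p], theta[p:]
        d = (y - X @ beta - Z @ v) ** 2
        d_M = v ** 2
        phi_new = d.sum() / (1.0 - h).sum()
        lam_new = d_M.sum() / (1.0 - h_M).sum()
        # Stop when the largest change in the scale parameters is small.
        converged = max(abs(phi_new - phi), abs(lam_new - lam)) < tol
        phi, lam = phi_new, lam_new
        if converged:
            break
    return beta, v, phi, lam, h_all
```

Dispersion varying by age would replace the closed-form update of \(\phi\) with a Gamma GLM fitted on an age-dependent linear predictor, which is omitted from the sketch.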
Appendix II
(Outline details of the R program used in this paper).
The key sections are as follows:

1.
The settings comprise (1) scalars controlling the geometry of the data array, the extent of periodwise extrapolation, the number of simulations, the choice of MIR statistic and model type, (2) vectors controlling the choice of ages and quantiles when extrapolating.

2.
The collection of own functions includes functions for returning (1) MIR statistics, (2) age-specific periodwise life expectancies, (3) ARI(p,d) time series with forecasts and error forecasts.

3.
The set up involves (1) data scanning and (2) vector creation of different lengths and types, indexing the annual data cells by age, period and cohort.

4.
The model fit first requires the construction of matrices O, I, X, Z, T, \(y_{a}\). The R command library(lme4) aids this. For Poisson-Normal mixed modelling an additional N by 1 matrix mu, using empirical mortality rates for starter values, is required to form the adjusted dependent variable, expression (6). The matrices \(\Sigma_{a}\) and \(\theta\), together with a diagonal matrix of leverages, evolve as part of the iterative fitting procedure (“Appendix I”). Details of the age/period/cohort component effects and residuals are extracted for display.

5.
The graphics comprise (1) residual and (2) component fixed/random effects plots.

6.
In addition to returning forecasts with root mean square errors, a graphical facility has been included in the ARI own function.

7.
For preselected ages and future periods, the simulation process results in matrices of simulated (1) log mortality rates and (2) periodwise life expectancies. For MIR modelling, an additional step is applied to convert from improvement rate projections to log mortality rate projections. The number of simulations equates with the number of columns in these matrices, which readily reduce in size on taking quantiles. Graphics are included.
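One of the own functions listed above returns life expectancies from mortality rates. A minimal Python sketch of such a calculation is given here. It is not the authors' R function, and it rests on assumptions: a constant force of mortality within each year of age, equal to the central rate, and truncation at the last supplied age with no topping out. The function name is hypothetical.

```python
import numpy as np

def life_expectancy(m):
    """Truncated expectation of life from a vector of central mortality rates.

    m[k] is the rate at age x + k, for example along one calendar year
    (period basis) or along a cohort diagonal (cohort basis).
    """
    m = np.asarray(m, dtype=float)
    # Survival probability to the start of each age interval.
    cum_hazard = np.concatenate([[0.0], np.cumsum(m[:-1])])
    survival = np.exp(-cum_hazard)
    # Expected time lived within each one-year interval under constant force:
    # integral of exp(-m s) over s in [0, 1] equals (1 - exp(-m)) / m.
    years_lived = -np.expm1(-m) / m
    return float(np.sum(survival * years_lived))
```

Applying the function to each column of a matrix of simulated rates and then taking quantiles across the columns gives intervals of the kind described in the final step above.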
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Renshaw, A., Haberman, S. Modelling and forecasting mortality improvement rates with random effects. Eur. Actuar. J. 11, 381–412 (2021). https://doi.org/10.1007/s13385021002741
Keywords
 Mortality improvements
 Random effects modelling
 Hierarchical generalised linear modelling
 Age heteroscedasticity
 Mortality forecasting