Modelling and forecasting mortality improvement rates with random effects

A common feature in the modelling and extrapolation of the trends in mortality rates over time, based on fitted parametric structures, has tended to involve the treatment of a structured fitted main effects period component (with possibly a cohort component) as a random effects time series. In this paper, we follow the lead of Haberman and Renshaw (Insurance Math Econ 50:309–333, 2012) and other authors in modelling and forecasting mortality improvement rates over time, rather than mortality rates. In this context, we assume linear parametric structures for mortality improvement rates, and we examine the feasibility of modelling the main period effects (and possibly any cohort effects) as a random effect from the outset. We argue that this leads to a more unified approach to model fitting and extrapolation.


Introduction
One of the themes of the recent longevity related academic literature has been the consideration of the modelling of mortality improvement rates (MIR), rather than mortality rates (MR). One of the motivations has been a practical one, as noted by Haberman and Renshaw [6], Denuit and Trufin [4] and Hunt and Villegas [10], inter alia. Many standard life tables used by actuaries (and required by regulators) for annuity pricing or reserving are increasingly based on an assumption about the dynamics of suitably defined mortality improvement rates. Thus, Haberman and Renshaw [6] and Denuit and Trufin [4] specifically mention current actuarial practice in Austria, Belgium, Denmark, Switzerland, UK and US that uses mortality improvement rates as a building block. Further we note that, in the UK, the Continuous Mortality Investigation Bureau (CMI) has recently developed and recommended a new mortality projection model based on improvement rates: see CMI [3]. These developments indicate a need for a sound theoretical foundation for the modelling of improvement rates-and this has led to a stream of contributions to the literature. We will refer to these different contributions, as appropriate, in the main body of the paper.
A second motivation for the interest of the academic literature has been the recognition that there may be theoretical and practical advantages in modelling improvement rates. A key issue in modelling mortality dynamics is understanding the dominant downwards time trend that has manifested itself over at least the last 70 years. It is well known in time series work that there are advantages if the underlying process that generates the time trend is time-invariant. One of the common methods in time series analysis of transforming a so-called non-stationary time series into a stationary one is by de-trending the series i.e. taking first differences: see Li et al. [13], Haberman and Renshaw [6,7], Mitchell et al. [14], Hunt and Villegas [10] and Bohk-Ewald and Rau [1]. This transformation implies that the mortality trend relates to the previous year's mortality rates rather than the trend in a hidden mortality factor, like t in the seminal model of Lee and Carter [11]. As we will show below, the definition of mortality improvement rates is linked to this transformation.
A further point noted by Hunt and Villegas [10] is that by considering mortality improvements from national data sets, inferences can be made about mortality trends in smaller sub-populations, although this requires consideration of longevity basis risk (see [8], and [19] for a further discussion).
We refer the reader to the text by Lee et al. [12] for an account of the general background to the theoretical modelling aspects underpinning this paper (Sects. [4][5][6], where we develop generalised linear MIR and MR models by the inclusion of random effects. Quoting from the epilogue to this text (p 360) "We [the authors] suspect that there may be many other new extensions waiting to be explored where the ideas underlying this book could be usefully exploited". We believe that this paper provides such an extension.
All the applications presented in this paper were produced using the computer package R. Outline details are given in "Appendix II".
Our main contribution is to show that, by attributing random effects to the period and cohort components (or just the period components) of a main effects ageperiod-cohort structured linear predictor from the outset, it is possible to present a comprehensive self-contained process for modelling and extrapolating mortality improvement rates, which incorporates structured dispersion and an apparent selfselecting time series. We show that this methodology also extends to the modelling and extrapolation of mortality rates provided that the predictor structure is linear. We argue that this methodological framework leads to a more unified approach to model fitting and extrapolation as a result of treating the time element as a random effect from the outset, thereby impacting both fitting and extrapolation stages.
The paper is arranged as follows. In Sect. 2, we investigate the inter-relationship between two alternative measures of mortality improvement rates. In Sect. 3, we focus on a direct and indirect method of modelling mortality improvement rates with specific reference to the linear age-period-cohort structure, and suggest the inclusion of a random effects period component. In Sects. 4 and 5, we describe a method of fitting respectively Normal-Normal and Poisson-Normal generalised linear mixed models, which incorporate fixed and random effects components, and have an additional provision for the joint modelling of a structured dispersion parameter. In Sect. 6 we indicate how a simple structured time series can also be incorporated into the fitting process. In each of Sects. 4-6, we include an investigation into the potential of the respective methodology by applying it to the recent UK male mortality experience. Sections 7 and 8 provide a discussion and some concluding comments.

Mortality improvement statistics
Ideally, in continuous time, age specific mortality improvement rates are quantified by the partial derivative of the log-mortality rate, either with respect to period (calendar-year) or with respect to birth-year (cohort). Given that the former approach is the most commonly used, we assume this to be the case here unless stated otherwise.
Consider a rectangular age-period Lexis plane divided into annual cells supportive of mortality data. d xt , e xt , xt : age x = x 1 , x 2 , … , x k , period t = t 1 , t 2 , … , t n , and birth-year s = t − x , where d xt , reported number of deaths ; e xt , matching central exposure to the risk of death ; xt , prior (0∕1) weights to indicate empty/non -empty data cells and denote m x,t , the central rate of mortality.
Assuming central rates of mortality throughout, two superficially different mortality improvement rate statistics have been proposed in the literature and are of interest: Statistic I: , and Statistic II: z xt = Δ t log m x,t = log m x,t − log m x,t−1 where Δ t is the differencing operator and the statistics are computed using the estimate m x,t = d xt e xt ; see Haberman and Renshaw [6] and Mitchell et al. [14] respectively for applications.
We investigate next the precise nature of the difference between the two statistics. Firstly, we note that while both statistics are discrete representations of the partial derivative of log m x,t , Statistic I is based on the actual partial derivative itself. Secondly, we recall the monotonic increasing characteristic of the log function. Then a comparison of −Δ t m x,t with Δ t log m x,t , which determines the respective signs of the two statistics, associates respectively positive and negative values with actual mortality improvements. In addition, exploratory scatter plots of the two statistics, using any rectangular age-period UKmale mortality data array, are found to exhibit perfect negative correlation which implies an exact connection between the two statistics.
Indeed, a mathematical tractable relationship between the two statistics reads as follows: Correcting for the disparity in the sign of z xt the relationship connecting the two statistics reads y xt = 2 tanh z xt 2 so that a convergent power series comprising odd powers only, with the implication that the absolute value of y xt is less than the matching absolute value of z xt and hence the greater accuracy in general. Unless stated otherwise we use Statistic I.

An application
We utilise the UK male mortality data set, covering the period 1960-2016, ages 0-102, and comprising annual death counts and matching central exposures to the risk of death, as compiled by the Human Mortality Database [9]. By truncating the data at the upper age limit of 102, possible complications arising from the irregular nature of the lost fragments including zero entries are avoided. In addition, the number of any such fragments lost is extremely small, especially for the early calendar years.
One aspect of the computed empirical MIRs concerns the nature of any information provided in respect of patterns in the resulting age profiles. In order to investigate this feature, we have computed and displayed the n-year empirical MIR rolling-averages for a range of values of n. One such set of results for the 5-year MIR rolling averages, expressed as percentages, and centred on the periods 2014(−1)2006 are depicted in the various panels of Fig. 1. (We make frequent use of the notation a(c)b to denote sequences of numbers ranging from a to b at intervals of c). In addition we have fitted and display a smooth-spline curve in each panel (using the R smooth.spline function with a parameter setting of 0.8). We note an identifiable crude pattern in each panel subject to a degree of variation between (annually) adjacent panels. We return to this issue later in Sect. 4.

Mortality improvement rates and random effects
We start from the premise, implicit in Hunt and Villegas [10] that, in continuous time, the MIR is quantified by (with the negative sign ensuring that improvement is positive and deterioration negative). We focus on linear parametric predictor structures x,t and specifically the familiar main effects APC (age-period-cohort) structure, so that z xt = 2 tanh −1 y xt 2 = log 1 + y xt 2 1 − y xt 2 = y xt + 2 3 comprising respectively age, birth-year and period main effects: while partial integration gives with the ̃ corresponding discrete version as listed in Table 1 of Hunt and Villegas [10]. Here ̃x ,t denotes the transformed linear predictor and A x = log m x,t 1 , the initial log-mortality rates (or 'constant' of integration). Alternatively, Eq. (2) can be rearranged to read as Noting the close relationship between all three structures, MIR may be modelled directly using either of the Sect. 2 statistics as Normal responses [6], here in combination with structure (1), or indirectly using either (3) [10] or (4) (e.g. Richards et al. [17]; CMI [3]) in combination with Poisson responses m x,t = d xt e xt and a log-link function.
We note that, subsequent to model fitting, (with the possible exception of the latest UK Continuous Mortality Investigation model), the fixed effects parameter t (and sometimes t−x ) are treated as random variables to facilitate model extrapolation. Hence we investigate the effect of reformulating the modelling assumptions underpinning (1), (3) and (4) by including the random effects from the outset. To do so, we follow the approach of Lee et al. [12] which is based on hierarchical generalised linear models (HGLMs).

Normal linear mixed modelling and MIR
We start by modelling the linear predictor structure (1) treating both t and t−x as random effects and using Statistic I of Sect. 2 as responses. In formulating the model matrices which follow, the Lexis plane is scanned along the age axis for each increasing time period in sequence and we attach the suffix i = (x, t) (subject to exchangeability as appropriate), when the need arises to refer to the individual components.
Consider Specifically, the matrices are designated as follows: X, N × p fixed effects design matrix. , p × 1 fixed effects parameters x . Z, N × q random effects design matrix. v, q × 1 matrix of random effects comprising t and t−x . T, (N + q) × (p + q) augmented design matrix. , (p + q) × 1 augmented matrix of fixed parameters and random effects. y a , (N + q) × 1 matrix of augmented responses comprising either one of the MIR statistics and quasi random effects . Σ a , (N + q) augmented diagonal matrix of scale parameters. I and O, respective identity and zero matrices of appropriate size where in terms of The three constraints min(t) = min(s) = min(s) + 1 = 0 are applied, which are sufficient in number to ensure that the matrix T has full rank.
The model is fitted using the iterative weighted least squares (IWLS) procedure outlined in "Appendix I". Three sets of (Studentised) residuals are generated where subject to the respective constraints We recall that the modelling assumptions imply that the mean values of all three sets of residuals are also zero.

An application
We make use of the UK male mortality data set, period 1960-2016, ages 0-102. We remark that the setting of the upper age limit (102) avoids any empty data cells for very little loss of other data above the age of 102. Alternatively, the introduction of 0/1 prior weights can beused to marginally extend the upper age limit.
Details of the fitted mixed effects model structure (1), with variable dispersion, are presented in Fig. 2, followed by some associated residual plots in Figs. 3, 4, 5.
Thus the upper two panels in Fig. 2 depict the respective first and second moment fixed effects parameter estimates ̂x and ̂x , while the lower two panels depict the respective random effects components ̂t −x and ̂t. Figure 3, the upper pair of panels in Fig. 4, together with the left hand column of panels in Fig. 5 refer to the first or primary set of residuals (5); with the centre and right hand columns of panels ( Fig. 5) referring to the respective sets of cohort (centre) and period random effects residuals. For these sets of residual we find that

3
We note the following features from the figures: the constant nature of the bands of residuals in Fig. 3, consistent with the successful capture of dispersion effects; the generally satisfactory distribution of positive and negative residuals across the Lexis plane, with no noteworthy gaps (Fig. 4, upper two panels); and the marginal nature of the centre column of Normal and half-Normal plots (Fig. 5) associated with the cohort random effects. Taken as a whole, the residual plots are indicative that the modelling assumptions have been largely met.
We turn next to the replacement of the empirical MIR displayed in Fig. 1 with the corresponding fitted MIR. These are reproduced in the panels of Fig. 6, with In addition, we have used the same parameter setting to generate the matching smooth-spline curves, which are duplicated together with that generated for 2006 in the lower right panel. We note the sharpening focus of the patterns in each panel as a consequence. We note also the consistent shape of the resulting spline-smoother curves, which are characterised by two local extremes, subject to a slight drift in their positioning on the age axis. The two lower somewhat detached spline curves (in the lower right panel) are the product of the two most recent years, reflecting the recent decline in mortality improvement rates (a particular phenomenon which is analysed further by [5]. Almost without exception, for all panels, the readings (in terms of improvement rates) are positive. The matrix/vector symbols are all as defined in Sect. 4, subject to changes in the details. Thus in the case of (4) with the two random effects components  (4). Both models with cohort and period random effects, and variable age dispersion while no constraints are necessary to ensure the full rank of the matrix T.
Model fitting requires an adjusted dependent variable z a = z t , z t M t whose values reflect the respected log and identity link functions, and a variance covariance matrix which reflects the choice of respective variance functions: the suffix applied to being indicative of fixed effects variable dispersion. Fitting then follows the IWLS procedure as laid out in "Appendix I" subject to the following changes: update ̂ by computing Σ −1 a , z a and solving T t Σ −1 a T̂= T t Σ −1 a z a .
2. Replace d i when up-dating the fixed effects dispersion parameter with either the squared deviance or squared Pearson residuals while the empirical mortality rates provide the starting values for . Details of the residuals are as described in Sect. 4.

An application
We make use of the same UK male mortality data set for the period 1960-2016, ages 0-102, as above. Key details of the mixed effects structure (4) with variable dispersion by age are presented in Fig. 7: with the first moment fixed effects parameter estimates ̂x and ̂x displayed in the two upper panels and the two random effects ̂t −x and ̂t in the two centre panels. We note that ̂x takes the familiar shape of a static life-table' including an 'accident hump'. We also note that the second moment fixed effects parameter ̂x , which is not displayed, is not too dissimilar in shape from its counterpart depicted in Fig. 2. In anticipation of the subsequent application of a first order autoregressive integrated ARI(1,1) time series, we depict the matching first differences of the two random effects components in the lower two panels. Certain residual plots are presented in the two lower panels of Figs. 4 and 8, with the former indicating a lack of randomness in the spatial distribution of positive and negative residuals across the Lexis plane. The pattern of residuals when plotted against age, period and cohort year, result in similar patterns to the ones displayed in Fig. 3 and have not, as a consequence, been reproduced.
The model is extrapolated period-wise by applying an ARI(1,1) time series to the two random effects components as depicted in Fig. 9, together with forecasts and including matching residual plots in the alternative panels. Note that these time series plots replicate the respective details of the two centre panels in Fig. 7,  (4). Upper panels: fixed age effects parameters. Centre panels: random effects components. Lower panels: respective random effects first differences while recalling the roles played by the two lower panels in Fig. 7 in generating the ARI(1,1) forecasts.
On the basis of 1000 time series forecast simulations, we have computed the periodbased life expectancies at each simulation, for certain ages, and depict the resulting 0.05, 0.5, 0.95 quantiles in Fig. 10. We have focused on period-based computations as opposed to cohort-based computations since period-based forecasts may be readily calibrated against historical trends based on the raw mortality data stretching back to 1922, which are also featured in Fig. 10. It would appear that the median forecast trends represent an over statement when compared with the most recent trend over the past decade or so, where mortality improvement rates have decelerated (see Fig. 1). Attempts to influence this feature by shortening the modelling period, in order to focus on the more recent mortality trend data, have not proved to be successful.  where K has non-zero elements k t+1,t = , so that component wise.
Comprehensive model fitting then follows by embedding the IWLS fitting process ("Appendix I") between the following two steps: 1. Given and hence ( ) estimate ( , , , ) for the model = g( ) = + Z * r where * = ZL( ). and g is the appropriate identity or log link function. Empirical values with simulated 5%, 50%, 95% quantile forecasts generated using MR model (4) We subsequently refer to the above comprehensive self-contained fitting and extrapolation process as Approach A. Alternatively, by setting L( ) = I , we can retain the IWLS fitting process unchanged and separately fit the single parameter AR(1) time series, subsequently referred to as Approach B.
For (1) with the two constraints min = min = 0 applied, and for (3) with the three constraints t 1 −x k = t n −x 1 = t 1 = 0 applied. We note that in the case of (3) care is needed in the construction of matrix X to ensure that the main effects cohort parameters do not automatically take precedence over other fixed effects parameters (which appears to be the case if the R-library lme4 file is accessed and used in its construction).

An application
In addition to the direct modelling of MIR by the above process using model (1), we compare results obtained by the indirect modelling of MIR using model (3), which has also been formulated with a single period random effect. We again make use of the UKmale mortality data set, this time with slightly reduced period (1961-2009) and age (1-89) ranges in order to facilitate comparison with certain aspects of the previously reported investigations in Haberman and Renshaw [6]. When fitting (1), we choose the MIR Statistic I of Sect. 2, while noting that the choice does not have a material effect on the ensuing results. When fitting (3), we include the term A x in the design matrix as a fixed effect to be estimated, as opposed to giving it the status of a predetermined offset. For both models, we allow for the capture of structured dispersion by fixed age effects.
The fitted results obtained using (1) are depicted in Fig. 11, together with one of the resulting residual plots in Fig. 12: the patterns in the remaining residual plots being similar to their counterparts in Fig. 3 and the two upper panels in Fig. 4. Modelling is conducted using Approach A throughout this section, but we have additionally superimposed, in Fig. 11, the equivalent plots obtained using Approach B to modelling (broken lines). We note the consistency of patterns within each panel as is the case when comparing the residual plots (not reproduced here but available from the authors).
Similarly, the fitted results obtained using (3), are presented in Fig. 13 together with one of the resulting residual plots in Fig. 14: the patterns in the remaining residual plots being similar to their counterparts in Fig. 3 and the two lower panels in Fig. 4. We note that the estimated fixed effects term A x depicted in the upper left panel of Fig. 13 takes on the familiar shape of a static life-table (including an 'accident hump'), and we have plotted the differences between this estimated age profile and the matching initial empirical log mortality rate profile in the lower left panel of N = n 1 (n 2 − 1), p = n 1 , q = n 1 + 2 n 2 − 2 N = n 1 n 2 , p = n 2 + 3 n 1 − 1 , q = n 2 − 1 Fig. 13 for comparison purposes. We note the pattern of narrowing differences with increasing age. Again, we have superimposed the equivalent plots obtained using Approach B to modelling (broken lines) in all but the lower left panel of Fig. 13 where two superimposed different point plotting symbols are displayed. The mutually compensating displaced patterns in the fixed effects parameter x and t−x plots (upper right, centre left panels) are noteworthy while consistency is preserved in the remaining four panels. Of particular interest is the horizontal trend (around zero) in the respective random effects period component (lower left panel Fig. 11 and centre right panel Fig. 13), which is indicative of the choice of a single parameter AR(1) process. We believe that this is possibly a general feature which is a consequence of the mixed effects model design.
Since the patterns in the matching residual plots are essentially identical under both Approaches A and B to modelling, they have not been replicated. The supporting comparison of residual plots is largely as illustrated in Fig. 7a-d of Haberman and Renshaw [6]. Modelling and forecasting mortality improvement rates with…

Conversion of model outputs into model rates or improvement rates
The basic operation used to determine mortality rates from model (1) is that of definite integration, while that used to determine improvement rates from model (3) is that of differentiation. We have conducted each of these operation (in discrete time) to construct Figs. 15 and 16 by superimposing the respective fitted statistic (broken lines) on the corresponding empirical statistic (continuous line). Thus in Fig. 15, we have used the fitted values ŷ x,t from model (1) (based on Statistic I of Sect. 2) to compute and depicted the 'fitted' log mortality rates for ages x = 40(05)75. Two cases are displayed using (1) Â x from model (3) (brokenlines), and (2) logm x,t 1 the initial empirical log rates (dotted lines) to provide the starter log rates log m x,t 1 . In Fig. 16, we have performed the reciprocal conversion process by using the parameter estimates from models (3) to generate the brokenlines depicting 'fitted' improvements in Fig. 16. This issue is discussed further in Sect. 7.

Model extrapolation
Once the models have been fitted to the data, the method by which mortality improvement rates are extrapolated and converted to rates, using the fixed effects parameter estimates ̂x and ̂t −x , and random effects time series ̂t =v t , is identical for the two models (1) and (3): only the method of estimating these effects differs.
In the application which follows, when converting from extrapolated improvements to rates, we adopt the procedure described in Sect. 2.6 of Haberman and Renshaw [6]. For extrapolation to ages outside the dataset (referred to as "topping-out" by age in the literature), we follow the period based version of the procedure using a fitted hyperbolic function which is described in Sect. 2.7 of Haberman and Renshaw [7]. In order to facilitate comparison with the earlier results of Haberman and  Renshaw [6], in this application, we have truncated the lower end of the age range to ages 20-89, prior to modelling, Given the horizontal trend in the random effects period components, coupled with the design feature E v i = 0 , we extrapolate the models by applying the single parameter AR(1) time series to ̂t . By this method, we have constructed Fig. 17 which depicts extrapolated life expectancy 5%, 50%, 95% quantiles, computed on a cohort basis, and plotted as the individual horizontal lines, each based on 1000 simulations. First, in the two upper panels we show the respective model evolving biennial t n = 1995(02)2009 extrapolations (subject to front-end data deletions shown in descending sequence), thereby using data for 1961-1995, 1961-1997,…, 1961-2009. Second, in the two lower panels, we show the respective model static t n = 2009 extrapolations having first subjected the data to systematic biennial rear-end truncations 1961(02)1975, thereby using data for the periods 1961-2009, 1963-2009,…, 1975-2009 (shown in ascending sequence).
We note the similarity of the extrapolations on comparing the different models, together with the material narrowing of the intervals in comparison with the  [6], using a variety of fixed effects model structures.
As further evidence of the horizontal trend about zero in the random effects period component, we have compared the effects of including an additional intercept parameter in the AR(1) time series using Approach B. The results are presented in Table 1, in which we have tabulated the AR(1) parameter estimates, standard errors and p-values, both with and without the intercept parameter included: and note the lack of statistical significance of the intercept parameter throughout.
We now revert to the full 1-89 age range, and an illustration of the 40(05)75 age specific empirical log mortality rates with simulated 5%, 50%, 95% quantile projections using model (1) and Approach A of Sect. 6 is presented in Fig. 18. These projections, based on 2000 simulations, have been generated by sampling the error in the period component time series. As has been widely discussed in the literature (see Lee and Carter [11], for example), this approach is an approximate one and does not take into account the smaller contribution from the uncertainty in the fixed effects parameter estimates. We have adopted this approach  Table 1 UK male mortality experience, various periods as listed in the first column, matching the upper frame in Fig. 11, ages 20-89 Estimates, standard errors, and p-values for the AR(1) times series with an intercept parameter (second entry), columns 2-4, and without the intercept parameter, columns 5-7

Discussion
Concerning Figs. 1 and 6, the choice of n when forming the n-year MIR rollingaverages is, in part, somewhat arbitrary. However, as n increases, further investigations (not included here) show that the emerging age patterns in the MIR become sharper in focus and display greater consistency. In addition, the patterns of tightly packed MIR age profiles (lower right panel Fig. 6) are found to extend backwards in time over the best part of the last half century before breaking up. This feature is broadly supported by the equivalent details of Fig. 1 which extends backwards in time much further. The same mathematical formula connecting the two alternative MIR statistics of Sect. 2 continues to apply when the statistics are redefined from the cohort, as opposed to period perspective, to read as follows: , and Statistics II: z xt = Δ s log m x,t = log m x,t − log m x−1,t−1 . See Haberman and Renshaw [7] for application of the former approach. While noting that Schinzinger et al. [18] use a different predictor structure, which is multiplicative bilinear as opposed to our additive linear structure, both approaches assume Normal MIR random variables. Of particular relevance, we refer to the Centre Right panel of their Fig. 6.1 in which the annual age-aggregated empirical MIR, based on ages 21-100, for periods 1971-2011 are depicted for UK males (and females).
Using our slightly shorter version of the data, and the MIR Statistic II in common with Schinzinger et al. [18], the same results (males only), for the overlapping periods 1971-2009, are reproduced in the Upper Left panel of Fig. 19. In addition, a scatter-plot of the time-adjacent age-aggregated MIR is depicted in the Upper Right panel, to illustrate the degree and nature of the correlation between these two measures, as discussed in Sect. 6.1 of Schinzinger et al. [18]. The equivalent results using the MIR Statistic I are depicted in the two Lower panels of Fig. 19. The comparison of matching upper and lower left panels reveals that the one is the mirror image, in the x-axis, of the other, for reasons given in log m x,t = log m x,t−1 + log + log x + log Λ t  (1) and (3) where the first term on the RHS is treated as an offset logm x,t−i , the next two fixed effects terms are estimated, and the final term v t = log Λ t is modelled as a normal random effects variable (so that Λ t has the log-normal distribution); with estimates subject to the stipulated constraints.
The philosophy underpinning this approach to generalised linear modelling with random effects and the associated likelihood theory, which we have followed and which is described in Lee et al. [12], is non-Bayesian.
In addition to providing the detailed methodology on which Sects. 4-6 are based, the methodology of Lee et al. 12 could be extended to the incorporation of smoothing, including smoothing by beta-splines.
We note that there is a long established practice of using the Poisson distribution to model numbers of deaths and hence central mortality rates. This starts historically with a static setting in the construction and graduation of life-tables and moves forward to a dynamic setting as here and in the extensive literature on longevity modelling. As discussed in Section 1 of Hunt and Villegas [10], there would appear to be no such established practice for the direct modelling of central mortality improvement rates. Given our results and in particular the residual plots reported in the applications (and also by Haberman and Renshaw [6]), we believe there is adequate evidence for the use of the Gaussian distribution to model mortality improvement rates.
In their Sect. 3.4 of Hunt and Villegas [10] have identified an important inherent difficulty with modelling improvements directly, expressed in terms of a "crude" or "fitted" estimation approach. We conjecture that this manifests itself, in the current modelling framework, as the choice between estimating A x as part of the fixed effects structure in (3) or treating Â x = logm x,t 1 as an offset term. Further, this issue appears to be bound up with the difficulty of reversing the differentiation process which requires a suitable 'constant' of integration: a situation reminiscent of the problem of setting boundary values when integrating partial differential equations in mathematical physics. Setting aside model extrapolation, and given that the sequential differencing of rates is implied in the construction of improvements measures, the use of initial rates to reverse the process is a potential concern, and was the motivation behind the construction of Fig. 15 in Sect. 6.
When introducing a main effects cohort term into a dynamic parameterised predictor structure for modelling mortality rates (Renshaw and Haberman [16]), we have naturally assumed that the supporting Lexis plane was divided into unit squares, typically We have had the benefit of a data set with sufficient granularity to match this level of detail. In so doing, we have avoided some of the problems that were experienced by Mitchell et al. [14] who have needed to use grouped data and irregular grouped data in some cases: this has led to difficulties in modelling the parameterised cohort components.
We note that the "topping-out" procedure does have a material effect on the values of the projected life expectancies depicted in Fig. 17. We note also that the choice of fixed effects parameter constraints made for the three mixed effects models is not necessarily unique but is sufficient to ensure that the matrix T has full rank.

Concluding comments
The comparison of MIR Statistics (Sect. 2) indicates the greater accuracy of Statistic I. However, we have found that the replacement of Statistic I by Statistic II (subject to reversal of sign) does not induce any material consequential differences in our reported results.
By attributing random effects to the period and cohort components or just the period components of a main effects age-period-cohort structured linear predictor from the outset, we have described a comprehensive self-contained process for modelling and forecasting mortality improvement rates, which incorporates structured dispersion and an apparent self-selecting time series. This process is made possible through the implementation of HGLMs with random effects. The methodology also extends to the modelling and forecasting of mortality rates provided that the predictor structure is linear, and, as such, excludes alternatives that involve bilinear decomposition. We argue that this methodological framework leads to a more unified approach to model fitting and forecasting.
For the linear predictor structures under consideration, the choice of the associated time series models, AR(1) for MIR mixed effects modelling and ARI(1,1) for MR mixed effects modelling, follows as a direct consequence of the initial modelling assumptions. The application of these time series models in this context, which are shown to be well supported by the data, may be further generalised by increasing the number of autoregressive terms when this is justified by the practical application.
We note the usefulness of examining the patterns of the random effects residuals as these provide additional critical insight into whether the modelling assumptions are being adhered to.
We note also the usefulness of comparing the empirical and modelled age specific mortality improvement rates. For example, we note that the MIR age profile modelled patterns of Sect. 4 (Fig. 6) are broadly similar to the corresponding empirically generated MIR age profiles of Fig. 1.
There is a long established practice of using the Poisson distribution to model numbers of deaths and hence central mortality rates. Given our detailed modelling results, we believe there is adequate empirical evidence for the use of the Gaussian distribution to model mortality improvement rates. 7. For pre-selected ages and future periods, the simulation process results in matrices of simulated (1) log mortality rates and (2) period-wise life expectancies. For MIR modelling, an additional step is applied to convert from improvement rate projections to log mortality rate projections. The number of simulations equates with the number of columns in these matrices, which readily reduce in size on taking quantiles. Graphics are included.
Funding Not applicable.
Availability of data and material The empirical data are publicly available from the Human Mortality Database.

Code availability
The code is open source and written in R statistical software. The code is available upon request from the authors.

Compliance with ethical standards
Conflicts of interest We declare that we have no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.