Abstract
Modelling mortality and longevity risk is critical to assessing risk for insurers issuing longevity risk products. It has challenged practitioners and academics alike because of, first, the existence of common stochastic trends and, second, the unpredictability of an eventual mortality improvement in some age groups. When considering cause-of-death mortality rates, both aforementioned trends are additionally affected by the cause of death. Longevity trends are usually forecasted using a Lee-Carter model with a single stochastic time series for period improvements, or using an age-based parametric model with univariate time series for the parameters. We assess a multivariate time series model for the parameters of the Heligman-Pollard function, through Vector Error Correction Models which include the common stochastic long-run trends. The model is applied to circulatory disease deaths in the United States over a 50-year period and is shown to be an improvement over both the Lee-Carter model and the stochastic-parameter ARIMA Heligman-Pollard model.
Introduction
Longevity and mortality improvement has been driven by many factors over the last century. These factors have impacted different ages to varying extents, resulting in common improvements across age groups. As a result, longevity and mortality risk across ages is expected to include common stochastic trends. The Lee-Carter model,^{Footnote 1} and variants of it, is often used for mortality and longevity risk applications by industry practitioners and academics. This model is highly parameterized and includes a single common stochastic trend across ages fitted using a time series model. An alternative approach^{Footnote 2} is to fit a parametric model to the age structure of mortality and apply time series models to the parameters. This provides for a more parsimonious model but does not capture common stochastic trends across the age structure of mortality. Our first attempt to improve these models was to use a Vector AutoRegression (VAR) for the parameters, which captures dependence between the parameters. This was further used in an analysis of mortality trends in a number of developed countries.^{Footnote 3} However, VAR models do not include long-run relationships between the parameters that will be important in maintaining the relative levels of mortality for different ages. Our second attempt was then to use Vector Error Correction Models (VECM), which are applied in econometrics to model multivariate dynamic systems. They include time dependency between economic variables, common stochastic trends and long-run equilibrium relationships.
Modelling cause-of-death mortality rates is increasingly important in forecasting trends in aggregate mortality improvement.^{Footnote 4} Models that better capture trends and that can be applied to cause-of-death mortality rates will provide practitioners with an improved understanding of mortality risks and factors determining future mortality trends. Indeed, causes of death reflect underlying socioeconomic factors and provide important insights into future trends and volatility for longevity and mortality modelling.^{Footnote 5} Cause elimination models as well as cause-delay models have been developed^{Footnote 6} assuming independent causes of death. Projections of mortality rates by cause of death have also been considered.^{Footnote 7} The Lee-Carter model has been used for cause-of-death data.^{Footnote 8} In general, models used for mortality at a population level are not appropriate for cause-specific rates, as demonstrated by an analysis of six causes of death on Italian data.^{Footnote 9} Different causes of death have different age patterns and different time trends, and these vary by country.^{Footnote 10} New approaches are then needed to model mortality by cause of death. A flexible model that can capture different age patterns and maintain consistent patterns, while allowing trends to differ by age, is required for a cause-of-death approach.
This paper models the age structure of cause-specific mortality rates with a parametric mortality model, the Heligman-Pollard model.^{Footnote 11} The parameters for this model are readily interpreted according to their impact on different age ranges. These parameters are modelled with a VECM allowing for time dependency and long-run trends between the parameters. To demonstrate the model performance, it is estimated over the period 1950–2000 for United States circulatory system deaths. The VECM approach is shown to be an improvement over the Lee-Carter model and over approaches using an AutoRegressive Integrated Moving Average (ARIMA) process for the parameters, and allows a more realistic quantification of risk.
This paper provides practitioners with new methods for modelling mortality, in particular cause-of-death mortality rates. It demonstrates the benefits of incorporating common trends in longevity and mortality risk models to be used by insurers issuing longevity and mortality risk products. The model assessed in the paper improves on existing models, as demonstrated by its forecasting performance.
The paper begins with a summary of the theoretical background for a VECM analysis in the next section, following which the data used are described. Long-run equilibria between age-based risk factors are estimated and discussed in the section after that. An improved mortality forecasting approach using a stochastic-parameter Heligman-Pollard VECM is provided in the subsequent section. The improved accuracy of this new forecasting approach over the Lee-Carter model and ARIMA processes is then demonstrated for United States circulatory system deaths in the penultimate section. The final section concludes.
VAR and VECM, theoretical background
VAR models describe dynamic interactions between a set of variables. They capture dependence through time and between variables. This multivariate time series approach has been popular among economists for several decades and is often used to model time series of economic variables. VECM, an extension of VAR, include long-run equilibrium relationships between variables, also referred to as common stochastic trends, using the concept of cointegration.
A pth-order VAR, denoted VAR(p), explains each variable with p lags of itself and of the other variables in the model. Denoting the n variables at time t by the (n × 1) vector y _{ t }, a VAR is written as

y _{ t }=c+Φ _{1} y _{ t−1 }+Φ _{2} y _{ t−2 }+⋯+Φ _{ p } y _{ t−p }+ɛ _{ t },
where c is a (n × 1) vector of constants and Φ _{ i } is a (n × n) matrix of autoregressive coefficients for i=1, 2, …, p. The (n × 1) vector ɛ _{ t } is a vector of white noises, with

E(ɛ _{ t })=0 and E(ɛ _{ t }ɛ′_{ τ })=Ω for t=τ and 0 otherwise,
where Ω is a symmetric positive definite covariance matrix.
Estimates of the parameters of a VAR(p) and the associated asymptotic distributions assume stationarity of the process. A VAR(p) is (weakly) stationary if its first and second moments are constant over time, that is, E(y _{ t }) and E(y _{ t } y′_{ t−j }) are independent of time t, although they will usually differ with the time lag j. In this case the process has a constant mean (no trend) and its variance does not change over time. However, many time series do have a trend, so that non-stationarity must be considered as well.
If a variable x _{ t } is non-stationary and becomes stationary by taking first differences,

Δx _{ t }=x _{ t }−x _{ t−1 },
then the variable is referred to as integrated of order one, denoted I(1). If the process is integrated of order one, differencing removes the non-stationarity and a VAR(p) can then be fitted. However, differencing loses any information about long-run trends in the levels of the data. Even if the variables are non-stationary, they may move together with common stochastic trends. These common trends can be captured by a long-run equilibrium relationship. A linear combination of these variables will then exist such that the relation is stationary even if each variable is not.
Consider the n variables in vector y _{ t }, all I(1), and related by

β _{1} y _{1t }+β _{2} y _{2t }+⋯+β _{ n } y _{ nt }=0.

This relationship holds on average in the long run. At a particular point in time, there will be deviations from the equilibrium such that

β _{1} y _{1t }+β _{2} y _{2t }+⋯+β _{ n } y _{ nt }=z _{ t },
where z _{ t } is a stochastic variable representing that deviation.^{Footnote 12} If a longrun equilibrium exists, then z _{ t } will be stationary. In this case these integrated variables are referred to as cointegrated and the above relation as the cointegration relation.
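The idea that a linear combination of I(1) variables can be stationary is easy to see numerically. The following sketch (Python, on made-up simulated data rather than mortality parameters) builds two cointegrated random walks and checks that the deviation from their long-run relation stays bounded while the random walk itself wanders:

```python
import numpy as np

# Toy illustration (not the paper's data): x_t is a random walk, hence I(1),
# and y_t shares its stochastic trend, so (1, -2) is a cointegrating vector.
rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.normal(size=n))   # I(1): variance grows with time
y = 2.0 * x + rng.normal(size=n)    # moves with the common trend of x

# Deviation z_t from the long-run equilibrium y - 2x = 0:
z = y - 2.0 * x
# z is stationary even though x and y are not: its sample variance stays
# near the noise variance, while that of the random walk x is far larger.
print(round(float(np.var(z)), 2), round(float(np.var(x)), 2))
```

Any multiple of the cointegrating vector (1, −2) would give the same stationary relation, which is why normalisation constraints are needed later when β is estimated.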
The cointegration relation can be written in vector and matrix notation as

β′y _{ t }=z _{ t },

with

β=(β _{1}, β _{2}, …, β _{ n })′.
The vector β is referred to as a cointegrating vector. More than one cointegration relation may exist, and thus there might be more than one cointegrating vector, each being linearly independent of the others. For example, if there are five variables, the first two may be linked by one cointegration relation and the last three by another. In such a situation, β is a matrix with each of its columns being a cointegrating vector. Thus

β=(β _{1} β _{2} … β _{ r }),
with β _{ i } the ith cointegration relation, for i=1, 2, …, r. The stationary vector β′y _{ t } contains the r linearly independent cointegrated relations of the n variables under study.^{Footnote 13} If the columns of β represent all the linearly independent relations, that is all other cointegrated relations are a linear combination of the columns of β, then there are exactly r cointegrating relations among the elements of y _{ t } and (β _{ 1 } β _{ 2 } … β _{ r }) forms a basis of the space of cointegrating vectors.
The cointegrated relations, given by the matrix β, are not uniquely defined. Each cointegrating vector could be multiplied by a constant and the relationship would remain the same. More generally, any linear combination of the columns of β is itself a cointegrating vector. As a result, normalisation constraints are required on some values of the matrix β.
The cointegration relations are important in VAR modelling. A VAR(p) can equivalently be written as

Δy _{ t }=c+Πy _{ t−1 }+ξ _{1}Δy _{ t−1 }+⋯+ξ _{ p−1 }Δy _{ t−p+1 }+ɛ _{ t },

where

Π = −(I _{ n }−Φ _{1}−⋯−Φ _{ p }) = αβ′, a matrix of rank r;

α = a (n × r) loading matrix;

β = a (n × r) matrix containing the r vectors forming a basis of the space of cointegration;

ξ _{ i } = −(Φ _{ i+1 }+⋯+Φ _{ p }) for i=1, …, p−1,
which is known as the VECM of the cointegrated system. Each term is stationary, since the first difference of an I(1) process is stationary, as are the cointegration relations. The loading matrix α indicates which cointegrated relation has an impact on which variable and to what extent. For example, the element α _{ ij } measures the effect of the cointegrated relation j (j=1, …, r) on the variable i (i=1, …, n).
The rank of the matrix Π indicates the number of cointegrated relations among the variables of the process. Three cases are possible:

1. There is no cointegrated relation (r=0). A VAR(p−1) may be applied to the first differences of the variables.

2. All linear combinations are stationary (r=n). Thus, all the variables in the process are stationary.

3. There are r cointegrated relations (0<r<n), such that Π=αβ′. In this case, the cointegrated relations are included in the error correction term.
Johansen's approach is used to estimate the number of cointegrated relations in a process, as well as the parameters in the matrices α, β, c and ξ _{ i } for i=1, 2, …, (p−1). In summary, the following steps are used to estimate a VECM:

1. Lag order of the VAR, p: use selection criteria, such as Akaike's Information Criterion (AIC), the Hannan-Quinn criterion (HQ), the Schwarz criterion (SC) and the Final Prediction Error (FPE), to select the lag order of the VAR.

2. Unit root tests on the variables: for a process to be stationary, the characteristic polynomial of its VAR should have all its roots outside the complex unit circle.^{Footnote 14} Therefore, if this polynomial has a root equal to unity, some or all of the variables are integrated of order one and there might be cointegrated relations among them. Unit root tests, such as the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, the Augmented Dickey-Fuller (ADF) test or the Phillips-Perron (PP) test, are useful tools to check for the stationarity of the variables. KPSS tests the null hypothesis that the variable is level or trend stationary, while ADF and PP test the null hypothesis of a unit root, and thus the null hypothesis of non-stationarity.

3. If the variables are stationary, denoted I(0), a VAR(p) is suitable. If the variables are I(1), Johansen's procedure is applied in order to find the number of cointegrated relations. Two test statistics are commonly used for this: the trace test and the maximum-eigenvalue test.

4. If the variables are I(1) and there is no cointegration, a VAR(p−1) on the first differences is fitted. Otherwise, the appropriate VECM should be found.

5. Model validation: test for residual autocorrelation and non-normality.
Figure 1 provides a summary diagram of the estimation process.
Data
The data used are five-year age group cause-specific mortality rates collected from the Mortality Database administered by the WHO.^{Footnote 15} Mortality rates were determined as the number of persons of each age group and sex who die in a particular year of a specified cause, divided by the number of persons of that age group and sex alive at the beginning of the year. The mid-year population provided by the WHO database is used as an approximation for the beginning-of-the-year population. Since the population increases over time, the mid-year population is an appropriate approximation, partly accounting for migration. The data used are for diseases of the circulatory system for females in the United States over the period 1950–2005, which are known to be reliable. Circulatory diseases are recognised as the most important cause of death in developed economies, especially for ages above 30–50.^{10} They are important in modelling future aggregate mortality rates as well as capturing the most recent trends in mortality improvement. This cause represented around 40 per cent of total deaths in the United States in 1955, 56 per cent in 1970, 50 per cent in 1985 and 41 per cent in 2000.
Causes of death are defined by the International Classification of Diseases (ICD), which ensures consistency between countries. The ICD changed three times between 1950 and 2006, from ICD-7 to ICD-10, in order to take into account progress in science and technology and to refine the classification. In the United States, the ICD changed in 1968, 1979 and 1999. The raw data are therefore not directly comparable across periods. To make them comparable, comparability ratios are computed. The aim is to smooth mortality rates across the classifications: the average of the mortality rates over the last two years of a classification is required to coincide with the average of the mortality rates over the first two years of the next classification. A comparability ratio is defined as the sum of the probabilities of dying in the first two years of a new classification divided by the sum of the probabilities of dying in the last two years of the previous classification.
In order to obtain data comparable over the complete period under observation, the number of deaths in a new classification is divided by the comparability ratio linking this classification with the previous one and previous comparability ratios where appropriate. Discontinuities in the mortality rates at the junction points between two classifications have been adjusted by these comparability ratios. The data used are the adjusted mortality rates.
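The comparability-ratio adjustment can be sketched as follows; the death probabilities and death counts below are hypothetical, purely for illustration:

```python
# Hypothetical sketch of the comparability-ratio adjustment described above.
def comparability_ratio(q_new_first2, q_old_last2):
    """Sum of the probabilities of dying in the first two years of the new
    classification divided by the sum over the last two years of the old."""
    return sum(q_new_first2) / sum(q_old_last2)

def adjust_deaths(deaths_new, ratio):
    # Deaths coded under the new classification are divided by the ratio
    # so the series joins smoothly with the previous classification.
    return [d / ratio for d in deaths_new]

# Illustrative (made-up) probabilities around one ICD revision:
r = comparability_ratio([0.0042, 0.0040], [0.0040, 0.0038])
adjusted = adjust_deaths([10500, 10300], r)
```

A ratio above one means the new classification attributes more deaths to the cause, so the adjusted counts are scaled down to match the earlier coding.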
Long-run equilibrium among age-based risk factors
The Heligman-Pollard (HP) model has been used to model mortality rates in the United States over the period 1900–1985,^{Footnote 16} and the multi-exponential model for cause-specific mortality rates in the United States over the period 1960–1985.^{Footnote 17} In previous research, the resulting time series of the parameters have been modelled with ARIMA processes, and these models are used to forecast the values of the parameters and hence the complete age profile of mortality. Although univariate models are assumed for the parameters, the authors recognise the potential improvement from incorporating the covariation between the parameters.
We extend this approach, using multivariate time series allowing for time dependency and long-run trends between the parameters. The HP model is used for cause-specific mortality rates and a VECM is used to model the parameters as stochastic factors. The HP model is a concise representation of mortality by age, each parameter having a demographic meaning, and is given by

q _{ x }/p _{ x }=A^{(x+B)^{ C }}+D exp[−E(ln x−ln F)^{2}]+GH^{ x },

where q _{ x } is the probability of dying between exact ages x and x+1 and p _{ x }=1−q _{ x }.
This model is a sum of three terms: the first represents mortality rates during childhood; the second, mortality at middle ages (the accident hump); and the last, mortality rates at older ages. The parameters can be interpreted as factors impacting different age ranges. Although the HP model has nine parameters, which means the model has nine stochastic factors, these are required in order to capture the shape of the mortality curve over the whole age range. Table 1 summarises the demographic meaning of the nine parameters and the range of values they may take.
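A minimal sketch of the HP function, assuming the standard parameterisation in which the three terms are added on the odds scale q _{ x }/(1−q _{ x }); the parameter values below are illustrative choices within typical published ranges, not the estimates fitted in this paper:

```python
import math

# Sketch of the Heligman-Pollard function on the odds scale; valid for
# ages x > 0 (the ln x term in the accident hump is undefined at x = 0).
def heligman_pollard_odds(x, A, B, C, D, E, F, G, H):
    childhood = A ** ((x + B) ** C)                               # child mortality
    hump = D * math.exp(-E * (math.log(x) - math.log(F)) ** 2)    # accident hump
    senescent = G * H ** x                                        # old-age term
    return childhood + hump + senescent

def heligman_pollard_q(x, *params):
    odds = heligman_pollard_odds(x, *params)
    return odds / (1.0 + odds)     # convert odds back to the probability q_x

# Illustrative values only (A, B, C, D, E, F, G, H); F = 33.85 echoes
# the fixed value used in the text, the rest are made up for the example.
params = (5e-4, 1.0, 0.10, 1e-3, 6.0, 33.85, 5e-5, 1.095)
q40 = heligman_pollard_q(40, *params)
q80 = heligman_pollard_q(80, *params)
```

With any plausible parameter set the senescent term G·H^x dominates at high ages, so q increases from age 40 to age 80.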
The model estimation process in Figure 1 is applied to the mortality rates due to diseases of the circulatory system for females in the United States. The parameters of the HP function are first estimated by weighted least squares, the weights implying a minimisation of the relative error, as suggested by Heligman and Pollard.^{11} Because of multicollinearity, irregular changes in the parameter estimates from one year to another may occur even if the mortality rates of adjoining years are similar.^{Footnote 18} This multicollinearity makes it more difficult to fit the HP function to any given data set. Without constraints on the parameters, convergence problems occur during the nonlinear minimisation, as parameter B tends to increase to values greater than ten, even though it is expected to be smaller than one based on previous studies,^{Footnote 19} and parameter F tends to reach the upper bound of 150.
Fixing the value of some of the parameters, either a priori or estimated from the data, has been proposed as a solution,^{Footnote 20} thus reducing the number of stochastic factors in the model. Since parameter B converges to an optimum value of one for a few years without any constraints being set during the minimisation process, we fixed it at one. This assumption does not have an important effect on the fit of the HP function, as parameter B has a negligible impact at ages other than zero. Parameter F is also fixed, at its median value of 33.85. With these two parameters fixed, the nonlinear minimisation converges without difficulty to an optimum value. Only seven parameters are then required in the VECM and forecasted to give the complete age profile of mortality rates.
Lag order selection
A lag order of one was found to provide the best fit to the parameters for the mortality rates, which is consistent with the autoregressive nature of mortality. This is determined based on the AIC, HQ, SC and FPE criteria computed on the data. All four criteria have their smallest value at a lag order of one, whether a constant or a trend is included in the VAR for the parameters.
Unit root tests
The KPSS, ADF and PP tests give similar results. The only parameter clearly trend stationary according to all three tests is parameter C, at both the 1 and 10 per cent significance levels: the KPSS test does not reject the null hypothesis of stationarity at a 10 per cent significance level, while the ADF and PP tests reject the null hypothesis of non-stationarity at a 1 per cent significance level. Results are less conclusive for parameter D. The ADF and PP tests reject the null hypothesis of non-stationarity at a 5 per cent significance level, but do not reject it at a 1 per cent significance level. The KPSS test does not reject the null hypothesis of stationarity at a 1 per cent significance level but rejects it at the 5 per cent level. All three tests indicate that the five remaining parameters are non-stationary.
To allow for this, two models are fitted, one assuming only parameter C is trend stationary and the other assuming parameters C and D are trend stationary. Normality tests on the residuals of both models, as well as tests for autocorrelation among the residuals, show that the best results are obtained with parameter D modelled as non-stationary. This model is the one used. Cointegrated relations then model common stochastic trends in the six non-stationary parameters A, D, E, G, H and K.
Cointegrated relations
Table 2 presents the results for the trace test and the maximum-eigenvalue test of Johansen's procedure. These two tests assess the number of long-run equilibrium relationships among the stochastic parameters (factors) of the Heligman-Pollard model (A, D, E, G, H and K). The trace test compares the null hypothesis that there are r cointegrated relations against the alternative of n cointegrated relations, where n corresponds to the number of variables under observation and r<n. This test indicates that we do not reject the null hypothesis of two, three, four or five cointegrated relations against six cointegrated relations at a 2.5 per cent significance level.
The maximum-eigenvalue statistic tests the null hypothesis of r cointegrated relations against the hypothesis of r+1 cointegrated relations. Table 2(b) shows that the null hypothesis of two cointegrated relations is not rejected, while the null hypothesis of one cointegrated relation is rejected, indicating that two cointegrated relations exist. These two long-run equilibrium relationships for the parameters of the Heligman-Pollard model determine how changes in the age-based risk factors move relative to each other. They allow for stochastic trends in the mortality curve, while maintaining long-run relationships between ages.
Fitted VECM
Johansen's procedure is used to estimate the resulting VECM. The fitted VECM with B and F fixed is
The second term on the right-hand side of this equation gives the effect of the two cointegrating relationships between the parameters. They can be written as
where z _{1t } and z _{2t } are two stochastic variables representing the deviation from the equilibrium. These two variables are stationary. The sign associated with each parameter is of significance in these relations.^{Footnote 21} The first cointegrated relation shows that as mortality around age zero decreases, that is, parameter A decreases, then either the accident hump will decrease or impact a smaller age range, so that parameter D will decline or E will increase, or mortality for the elderly will increase, with parameter K declining, parameter G or parameter H increasing, or a combination of these impacts. This relationship reflects the historical data and the relative changes in the mortality curve across all ages. The VECM assumes these estimated long-run relationships will continue when used for forecasting. The curve changes stochastically but is constrained by the long-run equilibrium relationships between the parameters given by the cointegrating vectors.
Increases in mortality at older ages will occur in the Heligman-Pollard model if there is an increase in either or both of parameters G and H, or a decrease in parameter K. This is captured in both cointegrating vectors, as the coefficients for parameters G and H have the same sign, while the coefficient of K is of the opposite sign.
In a comparison of mortality for males and females in Switzerland and in the United States using the Heligman-Pollard model,^{Footnote 22} mortality rates at older ages are shown to be lower, but the age pattern steeper, in Switzerland than in the United States, since parameter G is smaller and parameter H higher in Switzerland. Similarly, female mortality at older ages is lower and increases at a faster pace than male mortality in both countries. This link between these two parameters is reflected in the cointegrated relations, as a decrease in parameter G may produce an increase in parameter H for the relation to stay at the equilibrium.
The relation between these parameters that is reflected in the cointegrated relations can be understood from recent changes in the leading causes of death. Deaths from diseases of the circulatory system were relatively more important in the past. More recently, cancer has become the major cause of death at middle ages, while the circulatory system remains the most important cause for ages above 75.^{10} Thus, circulatory mortality at middle ages has decreased more abruptly over the last 50 years than mortality at older ages, which is reflected through a decrease in parameter G and an increase in parameter H. The fitted mortality rates using the model are given in Figure 2 along with the original rates for comparison. The model fits the data well, reproducing the age structure and changes through time.
Model validation
The residuals of the model are tested for normality as well as for any remaining autocorrelation. The test statistics are summarised in Table 3. The Portmanteau test is a test for the overall significance of the residual autocorrelations up to lag l; the Portmanteau statistic has an approximate asymptotic Chi-square distribution for large values of l. The test has a null hypothesis of no autocorrelation among the residuals, applied here up to l=15 and l=25 lags. The statistic used is the Portmanteau statistic adjusted for small sample size.^{Footnote 23} As shown by the p-values, the null hypothesis is not rejected at a 5 per cent significance level.
Tests for normality are based on the third and fourth central moments (skewness and kurtosis) of a normal distribution.^{Footnote 24} The test statistic labelled "both" in Table 3 is a joint test of skewness and kurtosis. The three tests clearly show that the null hypothesis of normality is not rejected. The fitted model residuals for the parameters of the Heligman-Pollard model thus satisfy the assumptions of normality and no autocorrelation. The model is a good representation of the data generating process and thus captures the trends and age dependence of mortality rates for deaths due to the circulatory system in the United States.
Projections
The fitted VECM provides a good fit to the data with 33 parameters for 19 age groups over 56 years based on 1,064 observations. Parameters are used to model stochastic trends and to ensure that longrun relationships between different ages are maintained through cointegration. These are very important features of the model approach.
To illustrate the model's forecasting performance, the model is fitted to a shorter data period and projections are performed in order to compare the forecasts with the actual data. At least 50 years of observations are usually needed in order to reliably estimate a VECM. The model is fitted to data over the period 1950–2000 and then forecasted for the following 20 years. Parameter B is assumed fixed at a value of one and parameter F is assumed fixed at the median value over the period 1950–2000 of 25.42. Since actual data are only available until 2005, the first five years of forecasted mortality rates are compared with actual mortality.
To assess the performance of the VECM approach, two other models are fitted over the period 1950–2000 and used to forecast mortality rates until 2005. The first is a variant of the well-known Lee-Carter model, which has been successfully applied at the population level to United States data in the past. The second fits the Heligman-Pollard model to the mortality rates and uses univariate time series, traditional ARIMA processes, to model the parameters.
VECM
The VECM fitting process is applied over the period 1950–2000. The AIC, HQ, SC and FPE criteria confirm a lag order of one. The ADF and PP tests reject the null hypothesis of non-stationarity at a 1 per cent significance level for parameters C, D and E, while the KPSS test does not reject the null hypothesis of trend stationarity at a 10 per cent significance level only for parameter C and at a 1 per cent significance level for parameter E. Parameters C, D and E are assumed trend stationary based on support from at least two of the three tests.
Two cointegrated relations are estimated for the remaining four parameters, as shown in Table 4. These tests determine the number of long-run equilibrium relationships among the parameters of the Heligman-Pollard model (A, G, H and K).
The estimated VECM is
The cointegrating relationships are
These are similar to the cointegrated relations for the VECM fitted over the full period 1950–2005, with the coefficients of parameters G and H having the same sign, while the coefficient of K is of the opposite sign. The middle-age parameters D and E are no longer included.
Normality tests as well as tests for autocorrelation among the residuals are shown in Table 5. The normality of the residuals is not rejected at a 5 per cent significance level. However, the Portmanteau test indicates that some autocorrelation remains in the residuals. A VECM of higher order, with two or three lags, might then be better. Since the longer-period fit indicated that a higher-order VECM was not required, and since a model with as few parameters as possible is expected to have better forecasting performance, a lag order of one is used for the shorter period as well.
Figure 3 displays graphs of the parameters for the fitted model and the projected values. Figure 4 shows the observed mortality rates in 2000 (dots) along with forecasts for the VECM.
LeeCarter model
In order to assess the performance of the VECM, a variant of the Lee-Carter model,^{1} which has become a standard in mortality modelling, is used for comparison. The model decomposes the logarithm of the force of mortality into two components, one capturing the age pattern of average mortality rates and the other a common time trend with differential impacts by age. The model is written as

ln μ _{ x,t }=α _{ x }+β _{ x }κ _{ t }+ɛ _{ x,t },

with
μ _{ x,t }: force of mortality at age x and year t;

α _{ x }: mean value over time, at age x, of the logarithm of the force of mortality;

β _{ x }: deviation from the mean value α _{ x } at age x, reflecting the impact of the time trend represented by κ _{ t } on that age; the higher its absolute value, the larger the changes in mortality at that age;

κ _{ t }: trend of mortality rates over time;

ɛ _{ x,t }: random changes, including those not captured by the model; the errors are assumed to have mean zero and variance σ^{2} (homoscedasticity).
To ensure this interpretation of the parameter α _{ x }, the sum of the estimated κ _{ t } is set equal to zero,

∑ _{ t }κ _{ t }=0.

Since, if N is the number of years of observations,

∑ _{ t }ln μ _{ x,t }=Nα _{ x }+β _{ x }∑ _{ t }κ _{ t }+∑ _{ t }ɛ _{ x,t },

this leads to

α _{ x }=(1/N)∑ _{ t }ln μ _{ x,t }.

For an identifiable model, another constraint on the parameters is specified, which usually is

∑ _{ x }β _{ x }=1.
The parameters are estimated numerically using maximum likelihood estimation, assuming that the number of deaths at age x follows a Poisson distribution with mean l _{ x,t }·μ _{ x,t }, l _{ x,t } being the population of age x alive at the beginning of year t.^{Footnote 25} The parameter estimates are shown in Figure 5.
In the Lee-Carter model, a single common factor across ages determines the general level of mortality improvement over time. This improvement is modelled with a single time series (κ _{ t }) that is projected in order to forecast the complete age profile of mortality. Lee and Carter suggest the use of a simple random walk with drift, that is, an ARIMA(0, 1, 0) model. Our analysis confirms an ARIMA(0, 1, 0) process with a drift of −0.36 as the best model.^{Footnote 26}
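A common way to estimate the Lee-Carter model, as a simpler alternative to the Poisson maximum likelihood used in this paper, is a singular value decomposition of the centred log-mortality surface. The sketch below runs on simulated data with an illustrative drift, and also shows the random-walk-with-drift projection of κ _{ t }:

```python
import numpy as np

# Simulated log-mortality surface (illustrative, not the WHO data):
# 19 age groups, 51 years, with a drifting common trend kappa.
rng = np.random.default_rng(2)
ages, years = 19, 51
kappa_true = -0.36 * np.arange(years) + np.cumsum(rng.normal(0, 0.2, years))
log_m = ((-4.0 + 0.05 * np.arange(ages))[:, None]
         + 0.05 * kappa_true[None, :]
         + rng.normal(0, 0.01, (ages, years)))

# SVD estimation: alpha_x is the row mean, (beta_x, kappa_t) the first
# singular pair, rescaled so the constraint sum_x beta_x = 1 holds.
alpha = log_m.mean(axis=1)
U, s, Vt = np.linalg.svd(log_m - alpha[:, None], full_matrices=False)
beta = U[:, 0] / U[:, 0].sum()
kappa = s[0] * Vt[0] * U[:, 0].sum()

# Random walk with drift for kappa, as Lee and Carter suggest; the drift
# MLE is simply the average one-step change of kappa.
drift = (kappa[-1] - kappa[0]) / (years - 1)
forecast_kappa = kappa[-1] + drift * np.arange(1, 6)   # 5-year horizon
forecast_log_m = alpha[:, None] + beta[:, None] * forecast_kappa[None, :]
```

The rescaling of β and κ leaves their product unchanged, so the fitted surface is identical under any normalisation, which is exactly the identifiability issue the constraints above resolve.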
ARIMA processes
As another alternative for comparison, the parameters of the Heligman-Pollard model fitted over the period 1950–2000 are modelled with ARIMA processes instead of a VECM. We follow the approach applied in previous works,^{Footnote 27} that is, trends are removed from the variables by first or second differencing. According to the ADF, PP and KPSS tests, first differencing of each parameter is necessary and sufficient to ensure stationarity. ARIMA(k, 1, (k−1)) models are successively fitted for each parameter of the Heligman-Pollard model, increasing k by one, as suggested by Pandit and Wu.^{26} The F-criterion is used to decide which model is the more adequate between an ARIMA(k, 1, (k−1)) and an ARIMA((k+1), 1, k). The F-criterion checks for improvement in the residual sum of squares of the error ɛ _{ t }; it tests the assumption that some of the coefficients in a model are restricted to zero.^{26}
Once the appropriate ARIMA(k, 1, k−1) process is found, the confidence intervals of the coefficients are checked to ensure the parameters are significantly different from zero. If some intervals include zero, the F-criterion is applied to assess the adequacy of a model without the corresponding coefficients. To check the adequacy of the model found using the procedure described by Pandit and Wu, ARIMA(p, 1, q) processes for all combinations with p⩽3 and q⩽2 are fitted and compared through the F-criterion, resulting in a similar choice of ARIMA model. Finally, the residuals of the fitted model are checked to ensure they are uncorrelated. Table 6 shows the best fitting ARIMA models resulting from this procedure. None of the models shows significant residual autocorrelation according to the Portmanteau test.
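The F-criterion used in this model-selection procedure can be sketched as a nested-model F-test on the residual sums of squares. The numerical RSS values, degrees of freedom and significance level below are hypothetical inputs for illustration.

```python
from scipy.stats import f as f_dist

def f_criterion(rss_small, rss_big, extra_params, dof_big, alpha=0.05):
    """Nested-model F-test in the spirit of Pandit and Wu: does the
    larger ARIMA model reduce the residual sum of squares significantly?
    rss_small is from the restricted ARIMA(k, 1, k-1), rss_big from
    ARIMA(k+1, 1, k); the larger model adds `extra_params` coefficients
    and leaves dof_big residual degrees of freedom."""
    stat = ((rss_small - rss_big) / extra_params) / (rss_big / dof_big)
    p_value = f_dist.sf(stat, extra_params, dof_big)
    return stat, bool(p_value < alpha)   # True -> keep the larger model

# Hypothetical residual sums of squares from two nested fits
stat, keep_big = f_criterion(rss_small=12.0, rss_big=10.0,
                             extra_params=2, dof_big=40)
```

If the test rejects, the additional AR and MA coefficients are retained; otherwise the smaller ARIMA(k, 1, k−1) model is kept, mirroring the stopping rule described above.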
Model comparison
The forecasts of the alternative models can be compared using the plots in Figures 6, 7 and 8. Figure 6 shows the forecasted mortality rates from the fitted VECM (curve) compared with the actual data (dots). The forecasts from the Lee-Carter model are shown in Figure 7. Figure 8 shows the forecasted age profiles of mortality from the fitted ARIMA models.
These figures show that the Lee-Carter model tends to produce relatively poor forecasts, overestimating the decline of mortality at young ages. The Heligman-Pollard model with ARIMA processes underestimates mortality improvement, particularly at older ages. The VECM provides better forecasts and exhibits neither of these shortcomings.
The forecasting performance of the three methods is also compared by assessing the match between the actual and the forecasted mortality rates from 2001 to 2005 using the mean absolute percentage error (MAPE), the average over the 19 age groups of the absolute percentage errors between the forecasted and the observed mortality rates. The projections are also compared with a no-change forecast, which takes the mortality rates of 2000 as the forecasted rates for each of the following five years. For this comparison, the ratio of the square roots of two mean square errors (MSE), each computed over the 19 age groups, is used: the first MSE is between the forecasted and the observed mortality rates, the second between the no-change forecast and the observed rates. A value smaller than unity indicates that the model forecasts are more accurate than the base assumption of no change in mortality. Table 7 shows the model comparison results.
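The two comparison statistics described above can be sketched as follows. The mortality rates in the example are hypothetical values for a handful of age groups in a single forecast year, not the paper's data.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error over the age groups, in percent."""
    return np.mean(np.abs((forecast - actual) / actual)) * 100.0

def ratio_to_no_change(actual, forecast, base_year_rates):
    """Ratio of root mean square errors: the model forecast against the
    no-change forecast that carries the base-year rates forward.
    Values below 1 mean the model beats the no-change assumption."""
    rmse_model = np.sqrt(np.mean((forecast - actual) ** 2))
    rmse_base = np.sqrt(np.mean((base_year_rates - actual) ** 2))
    return rmse_model / rmse_base

# Hypothetical rates for four age groups in one forecast year
actual = np.array([0.001, 0.004, 0.020, 0.080])      # observed
forecast = np.array([0.0011, 0.0038, 0.019, 0.082])  # model forecast
base = np.array([0.0012, 0.0045, 0.023, 0.090])      # year-2000 rates
```

The MAPE weights all age groups equally in relative terms, while the RMSE ratio is dominated by the older ages where rates, and hence absolute errors, are largest; this difference matters for the comparison in Table 7.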
The forecasts from the VECM are closer to the actual mortality rates than those of the Lee-Carter model or of the Heligman-Pollard model with ARIMA processes. The forecast error (MAPE) of the VECM is lower and increases more slowly with time than that of the other two models, and this advantage is expected to carry over to projections over longer time horizons. The VECM's performance comes from its ability to capture relationships between the ages better than these alternative models.
The main benefit of the Heligman-Pollard model over the Lee-Carter model is that it smooths mortality rates across ages and reduces the number of parameters describing the age structure of the mortality curve. The Heligman-Pollard model is flexible enough to cover the full age range of mortality, and its parameters have interpretations in terms of their effect on mortality rates. Treating these parameters as stochastic factors allows for more general dynamics than in the Lee-Carter model, even though the number of parameters for the random factors in the model increases.
The Lee-Carter model appears to be more accurate at older ages during the early forecast years than a stochastic parameter model based on the Heligman-Pollard model. As mortality rates at older ages are higher than rates at young and middle ages, the absolute error is higher at older ages, and the statistic comparing each model with the no-change forecast uses squared errors. In Table 7, the statistic for the Lee-Carter model is therefore lower for the first three years than that of the other two models. For the later years this is reversed, with the VECM Heligman-Pollard model having the lower comparative forecast error. Over longer forecast horizons, the age pattern given by the Heligman-Pollard model produces better fits, and relationships between ages are maintained by the VECM, which takes into account long-run stochastic trends for the parameters of the model. The VECM Heligman-Pollard model is therefore expected to be more accurate for forecast periods longer than five years.
Conclusion
This paper presents a modelling and forecasting approach for mortality using VECM concepts for a parametric model of the age structure of mortality rates. Because of its ability to capture changing age structures and to model long-run trends, it has the flexibility to model different causes of death as well as aggregate mortality rates. To demonstrate its application, the model is fitted to cause-of-death mortality rates for diseases of the circulatory system in the United States over the period 1950–2000 and compared with alternative models previously proposed.
The model is a stochastic parameter Heligman-Pollard model in which the time series of the parameters are modelled with a VECM. The VECM incorporates common stochastic trends through long-run equilibrium relationships, and the model captures the age dependence of cause-specific mortality rates through dependence between the parameters. Indeed, the parameters of the Heligman-Pollard model can be interpreted as factors impacting specific age ranges: young, middle and older. This demographic meaning of the parameters allows an analysis of the long-run equilibria, or cointegration relations, in the model. The expected relationship between the parameters at older ages is shown to be reflected in these cointegration relations.
The analysis demonstrates, for circulatory disease mortality, that a forecasting model based on the Heligman-Pollard model with a VECM for the parameters improves on the well-known Lee-Carter model as well as on a Heligman-Pollard model with ARIMA processes for the parameters. As the forecast horizon lengthens, the VECM-based model gains in forecasting accuracy. The Heligman-Pollard model uses fewer parameters to describe the age structure of mortality rates than the Lee-Carter model. The stochastic model for the parameters provides an adequate representation of the randomness of the process, and the use of a VECM ensures that consistent future mortality curves are forecasted, which is not the case when simpler ARIMA time series models are used. A stochastic parameter VECM form of the Heligman-Pollard model has also been shown to perform well at a national population level based on Australian mortality data.^{Footnote 28} Finally, new theories on time series models and their applications to mortality are being developed.^{Footnote 29} Standard unit root tests are potentially unreliable in the presence of bounded time series, as they tend to over-reject the null hypothesis of a unit root, even asymptotically, and conventional cointegration tests may also be misleading for bounded series. These new theories have the potential to improve the forecasting performance of the Heligman-Pollard model with a VECM for the parameters.
Notes
In this paper, we consider variables that are integrated of order one, in which case, cointegrated relations are necessarily stationary. The general framework is given in Hamilton (1994) and Lütkepohl (2005).
The size of the coefficients is influenced by the value taken by the parameters. For example, as parameters A, D and G are small, their associated coefficient is high.
As in Lütkepohl (2005).
For a detailed description of these tests, see Lütkepohl (2005).
References
Barugola, T. and Maccheroni, C. (2007) ‘Sensitivity analysis of the Lee-Carter model fitting mortality by causes of death’, Società Italiana di Statistica, Rischio e Previsione, Atti della Riunione Intermedia, Padova, pp. 481–482.
Bell, W.R. (1997) ‘Comparing and assessing time series methods for forecasting age-specific fertility and mortality rates’, Journal of Official Statistics 13 (3): 279–303.
Cavaliere, G. (2005) ‘Limited time series with a unit root’, Econometric Theory 21: 907–945.
Delwarde, A. and Denuit, M. (2006) ‘Construction de Tables de Mortalité Périodiques et Prospectives’, Paris, France: Economica.
Gaille, S. (2010) Improving longevity and mortality risk models, PhD thesis, University of Lausanne.
Gaille, S. and Sherris, M. (2010) Age patterns and trends in mortality by cause of death and implications for modeling longevity risk, Technical report, The Social Science Research Network Electronic Paper Collection.
Gutterman, S. and Vanderhoof, I.T. (1998) ‘Forecasting changes in mortality: A search for a law of causes and effects’, North American Actuarial Journal 2 (4): 135–138.
Hamilton, J.D. (1994) Time Series Analysis, Princeton: Princeton University Press.
Hanewald, K. (2011) ‘Explaining mortality dynamics: The role of macroeconomic fluctuations and cause of death trends’, North American Actuarial Journal 15 (2): 290–314.
Heligman, L. and Pollard, J.H. (1980) ‘The age pattern of mortality’, Journal of the Institute of Actuaries 107 (434): 49–80.
Lee, R.D. and Carter, L. (1992) ‘Modeling and forecasting U.S. mortality’, Journal of the American Statistical Association 87 (419): 659–671.
Lütkepohl, H. (2005) New Introduction to Multiple Time Series Analysis, Berlin: Springer.
Manton, K.G., Patrick, C.H. and Stallard, E. (1980) ‘Mortality model based on delays in progression of chronic diseases: Alternative to cause elimination model’, Public Health Reports 95 (6): 580–588.
McNown, R. and Rogers, A. (1989) ‘Forecasting mortality: A parameterized time series approach’, Demography 26 (4): 645–660.
McNown, R. and Rogers, A. (1992) ‘Forecasting cause-specific mortality using time series methods’, International Journal of Forecasting 8 (3): 413–432.
Njenga, C.N. and Sherris, M. (2009) Longevity risk and the econometric analysis of mortality trends and volatility, Australian School of Business Research Paper No 2009ACTL08, University of New South Wales, Sydney.
Njenga, C.N. and Sherris, M. (2011) Modeling mortality with a Bayesian vector autoregression, Australian School of Business Research Paper No 2011ACTL04, University of New South Wales, Sydney.
Olshansky, J.S. (1987) ‘Simultaneous/multiple causedelay (SIMCAD): An epidemiological approach to projecting mortality’, Journal of Gerontology 42 (4): 358–365.
Pandit, S.M. and Wu, S.M. (2001) Time Series and System Analysis with Applications, Malabar, Florida: Krieger.
Tabeau, E., Ekamper, P., Huisman, C. and Bosch, A. (1999) ‘Improving overall mortality forecasts by analysing causeofdeath, period and cohort effects in trends’, European Journal of Population 15 (2): 153–183.
Tabeau, E., Van Den Bergh Jeths, A. and Heathcote, C. (2001) Forecasting Mortality in Developed Countries. Insights from a Statistical, Demographic and Epidemiological Perspective, Dordrecht: Kluwer Academic Publishers.
Wilmoth, J.R. (1993) Computational methods for fitting and extrapolating the Lee-Carter model of mortality change, Technical report, National Institute of Aging.
World Health Organization (2009) ‘WHO Mortality Database’, January 2009, www.who.int/whosis/mort/download/en/index.html.
Acknowledgements
The authors acknowledge the support of ARC Linkage Grant Project LP0883398, Managing Risk with Insurance and Superannuation as Individuals Age with industry partners PwC and APRA. Gaille acknowledges scholarship support from the Swiss National Science Foundation for the project Managing Risk as Individuals Age with Insurance and Superannuation, number PBLAP1124258.
Gaille, S., Sherris, M. Modelling Mortality with Common Stochastic LongRun Trends. Geneva Pap Risk Insur Issues Pract 36, 595–621 (2011). https://doi.org/10.1057/gpp.2011.19
Keywords
 mortality trends
 Heligman-Pollard model
 Lee-Carter model
 VECM
 causes of death
 mortality forecasts