Small population bias and sampling effects in stochastic mortality modelling
Abstract
We propose the use of parametric bootstrap methods to investigate the finite sample distribution of the maximum likelihood estimator for the parameter vector of a stochastic mortality model. Particular emphasis is placed on the effect that the size of the underlying population has on the distribution of the MLE in finite samples, and on the dependency structure of the resulting estimator: that is, the dependencies between estimators for the age, period and cohort effects in our model. In addition, we study the distribution of a likelihood ratio test statistic where we test a null hypothesis about the true parameters in our model. Finally, we apply the LRT to the cohort effects estimated from observed mortality rates for females in England and Wales and males in Scotland.
Keywords
Small population, Age effect, Period effect, Cohort effect, Bootstrap, Parameter uncertainty, Systematic parameter difference, Likelihood ratio test, Power of test
1 Introduction
Stochastic mortality models are widely used as risk management tools in the insurance and pensions industry, with the main application being the generation of plausible scenarios for future mortality rates. Many stochastic mortality models have been introduced in the last few decades. When new models were developed, the objective was mostly to improve the goodness of fit of the model to mortality data observed in relatively large populations: the Lee–Carter model and its refinements (e.g. [3, 23, 31]) were developed to provide a good fit to the mortality rates observed in the United States, England and Wales, and the population of UK male assured lives, while the Cairns–Blake–Dowd ([6]) model (CBD) was introduced for modelling the England and Wales male population at higher ages.
In contrast, actuaries often face the problem of modelling the mortality experience of much smaller populations, for example, the members of a pension scheme. Empirical research has found that mortality rates of smaller populations exhibit significantly more variability than the observed rates in larger populations. Furthermore, models that fit large countries well might not be appropriate for smaller populations; for example, [3] showed that the Lee–Carter model provides a rather poor fit to the mortality experience of smaller populations. A related issue is that empirical data for smaller populations might only be available for a relatively short period, which makes mortality projections rather uncertain. As a result, a number of recent papers have aimed to develop models specifically for smaller populations: for example, the Saint Model of [18].
A common assumption of many of the proposed models is that the observed numbers of deaths are realisations of Poisson random variables, given the underlying mortality rates. The estimation of the parameters of any such model is therefore based on samples from a Poisson distribution, and, as always in statistics, parameter uncertainty is related to the sample size. Furthermore, many results about the distribution of estimators and the corresponding confidence intervals rely on asymptotic maximum likelihood theory and therefore on large sample sizes.
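Under this assumption the log-likelihood of the death counts is a simple sum over age-year cells. The following is a minimal sketch, assuming that the Poisson mean in each cell is the central exposure E(t, x) times a central death rate m(t, x); the function name `poisson_loglik` is ours, not the authors'.

```python
import numpy as np
from scipy.special import gammaln


def poisson_loglik(deaths, exposure, rates):
    """Poisson log-likelihood of death counts D(t, x) ~ Poisson(E(t, x) * m(t, x)).

    deaths, exposure and rates are arrays of the same shape (years x ages).
    """
    deaths = np.asarray(deaths, dtype=float)
    mean = np.asarray(exposure, dtype=float) * np.asarray(rates, dtype=float)
    # log P(D = d) = d log(mean) - mean - log(d!), summed over all cells
    return float(np.sum(deaths * np.log(mean) - mean - gammaln(deaths + 1.0)))


# Two cells with hypothetical counts, exposures and rates
print(poisson_loglik([3, 5], [100.0, 200.0], [0.02, 0.03]))
```

For a single cell the MLE of the rate is D/E with variance m/E, so the information about each rate grows in proportion to the exposure; this is the link between population size and parameter uncertainty studied below.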
The increased uncertainty about estimated parameters for small populations results in high levels of uncertainty about projected mortality rates. As a consequence, future realised mortality rates will diverge from projected rates not only because of future sampling variation caused by the Poisson distribution, but also because the projections themselves are uncertain.
In the actuarial literature, simulation techniques have been proposed for dealing with uncertain parameters and projected mortality rates. For example, [24] investigated mortality uncertainty by applying a block bootstrap method to the Lee–Carter model, and [4] proposed Poisson bootstrap methods for mortality forecasting. [6] studied the parameter uncertainty of the two-factor CBD model by adopting a Bayesian approach. Czado et al. and Pedroza [12, 29] carried out the first Bayesian analysis of the Lee–Carter model using Markov Chain Monte Carlo (MCMC), with further work by [21, 22]. Reichmuth and Sarferaz [30] applied MCMC to a version of the [31] model. Cairns et al. (2011) applied MCMC to a two-population Age-Period-Cohort model by combining the Poisson likelihood for the death counts with time series likelihood functions for the latent random period and cohort effects.
However, to the best of our knowledge, bootstrap methods have not been applied systematically to investigate the impact of the size of a population on parameter and projection uncertainty. This is the focus of this paper. We first apply Poisson parametric bootstrap methods to investigate how the variation of parameter estimates and projections is affected by the size of a population. The specific mortality model that we consider is a second-generation CBD model with an added cohort effect: see Sect. 2 for details. We vary the size of the population by assigning weights to a chosen benchmark population, e.g. England and Wales males. In simulation studies we find that the size of the population has a significant effect on the variation of parameter estimates and projections.
Although we apply a weight to the benchmark population (i.e. scale it down), we ensure that the mortality rates of the constructed small populations are equal to the fitted mortality rates of the benchmark population. In such a situation, uncertainty in projected mortality rates will be reduced if information from the benchmark population parameter estimates can be used for fitting smaller populations. This raises the question of how we can test for systematic differences between the parameters driving mortality rates in a small population and a given null hypothesis about those parameters, where the null hypothesis might have been obtained from a model fitted to a much larger population. If no significant differences can be found then it seems reasonable to use elements of the large population model fit to assist in generation of scenarios for the small population. We therefore investigate the properties of a likelihood ratio (LR) test for all or some of the estimated parameters, and, in particular, consider the distribution of the test statistic based on the bootstrap simulations. This allows us to investigate the power of the LR test and the effect of varying population sizes on the rejection rates. We find that the population size has a strong effect on the probability of a type II error. This is particularly relevant for pension schemes since the acceptance of an incorrect null hypothesis might lead to inaccurate mortality assumptions. To investigate the financial consequences of the resulting misspecified model, we consider annuity prices based on different assumptions about the underlying parameters of our model.
We apply the LR test in an empirical study. The null hypothesis for that study is the estimated cohort effect for males in England and Wales. With this null hypothesis we then carry out hypothesis tests using, first, mortality data for females in England and Wales and, second, males in Scotland to check if their cohort effects are significantly different from the estimated cohort effect for males in England and Wales. We find for both populations that the estimated cohort effect is significantly different from that in the null hypothesis.
The remainder of the paper is organised as follows. Section 2 introduces the model, assumptions and the notations we apply. Section 3 discusses the process of simulation and investigates the distribution of the maximum likelihood estimates, the correlation between the estimates and how these will be affected by changing the population size. In Sect. 4, we investigate the effect of the population size on forecasting by projecting the parameters as well as the mortality rates. Section 5 introduces a likelihood ratio test for testing systematic deviations of the true parameters from a given null hypothesis. The power of the likelihood ratio test is also analysed and we then investigate how significant the impact of shifting and scaling parameters is on the fitted mortality rates and corresponding annuity prices in Sect. 6. Finally, Sects. 7 and 8 include the LRT for testing a null hypothesis about the cohort effect only, and an empirical example for this test is provided. Section 9 provides our final conclusions.
2 The model
We denote by D(t, x) the number of deaths during calendar year \(t=t_1,\ldots ,t_{n_y}\) at age \(x=x_1,\ldots ,x_{n_a}\) and by E(t, x) the corresponding central exposure to risk.

\(\kappa _t^{(i)}\) is a period effect in year \(t=t_1,\ldots ,t_{n_y}\) for each \(i = 1,2,3\),

\(\kappa =\{\kappa ^{(1)},\kappa ^{(2)},\kappa ^{(3)}\}\), where \(\kappa ^{(i)}=\{\kappa _t^{(i)}\}_{t=t_1,\ldots t_{n_y}}\) for \(i=1, 2, 3\),

\(\gamma _c^{(4)}\) is the cohort effect for the cohort born in year \(c=t-x\),

\(\gamma ^{(4)}=\{\gamma _c^{(4)}\}_{c=t_1-x_{n_a},\ldots ,t_{n_y}-x_1}\),

\(\bar{x}\) is the mean of the age range we use for our analysis, and

\({\hat{\sigma }}_x^2\) is the mean of \((x-\bar{x})^2\).
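These definitions are those of an M7-type predictor (the model is referred to as M7 in Sect. 5.1). Equations (1)–(3) are not reproduced in this excerpt, so the following sketch assumes the predictor \(\eta (t,x) = \kappa _t^{(1)} + \kappa _t^{(2)}(x-\bar{x}) + \kappa _t^{(3)}((x-\bar{x})^2-{\hat{\sigma }}_x^2) + \gamma _{t-x}^{(4)}\) and leaves the link function as an assumption (for example, a log link for central death rates). The helper `m7_predictor` is hypothetical.

```python
import numpy as np


def m7_predictor(kappa1, kappa2, kappa3, gamma, years, ages):
    """Assumed M7-type predictor eta(t, x) for consecutive, increasing integer years and ages.

    kappa1, kappa2, kappa3: period effects, one value per year t_1, ..., t_{n_y};
    gamma: cohort effects ordered from c = t_1 - x_{n_a} to c = t_{n_y} - x_1.
    Returns an array of shape (n_years, n_ages).
    """
    years = np.asarray(years)
    ages = np.asarray(ages)
    dev = ages - ages.mean()                  # x - xbar
    sigma2 = np.mean(dev ** 2)                # hat sigma_x^2: mean of (x - xbar)^2
    eta = (np.asarray(kappa1)[:, None]
           + np.asarray(kappa2)[:, None] * dev[None, :]
           + np.asarray(kappa3)[:, None] * (dev ** 2 - sigma2)[None, :])
    cohort = years[:, None] - ages[None, :]   # c = t - x
    c_min = years[0] - ages[-1]               # t_1 - x_{n_a}
    return eta + np.asarray(gamma)[cohort - c_min]


# Under an assumed log link, the central death rates would be np.exp(eta).
```

Because \((x-\bar{x})^2-{\hat{\sigma }}_x^2\) averages to zero over the age range, the quadratic term only redistributes mortality across ages within each year.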
As mentioned earlier, in this paper we are concerned with the consequences of small exposures, or population sizes, on the distribution of the maximum likelihood estimator (MLE) \({\hat{\theta }}\) of \(\theta\). To study the distribution of the MLE \({\hat{\theta }}\) we will simulate death data D(t, x) from the model in (1–3) using a given parameter vector \(\theta _0\) and different exposure sizes.
To ensure that our results are relevant for typical values of \(\theta\), we first fit our model to death and exposure data observed in England and Wales during the years 1961 to 2011 for males aged 50 to 89. This is not the only possible choice of dataset: any large population, together with any model known to fit it well, could be used for this study. We chose this dataset because we are familiar with the England and Wales data and because the selected model fits a similar dataset well in Ref. [8]. We then fix \(\theta _0\) to be equal to the estimated parameter vector \({\hat{\theta }}^{\text{ EW }}\) for these data. This is only one example of a true parameter vector \(\theta _0\); our analysis can be applied to other choices of \(\theta _0\). Mortality data for England and Wales are obtained from the Human Mortality Database.^{1} We do not exclude short cohorts from the estimation, since we are interested in how the MLE fits the short cohorts and in the impact of small population sizes on the estimates.
The different exposure sizes used to simulate data in the remainder of this paper will be relative to the exposure \(E_0(t,x)\) for a benchmark population. For reasons of practical relevance and consistency with our choice of \(\theta _0\) the benchmark population is the male population in England and Wales unless stated otherwise.
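Since the constructed populations keep the benchmark mortality rates and only scale the exposure to \(wE_0(t,x)\), simulating death scenarios amounts to drawing Poisson counts with mean \(wE_0(t,x)m_0(t,x)\). A minimal sketch, assuming \(m_0\) denotes the fitted rates implied by \(\theta _0\); the simulation scheme referred to as (6) later in the paper is not shown in this excerpt, so `simulate_deaths` is our illustration rather than the authors' code.

```python
import numpy as np


def simulate_deaths(rates0, exposure0, w, n_scenarios, seed=None):
    """Parametric Poisson bootstrap: D^w_j(t, x) ~ Poisson(w E_0(t, x) m_0(t, x)).

    rates0: fitted benchmark mortality rates (years x ages);
    exposure0: benchmark exposure E_0 of the same shape;
    w: relative population size, e.g. 1, 0.1, 0.01 or 0.001.
    Returns death counts of shape (n_scenarios, years, ages).
    """
    rng = np.random.default_rng(seed)
    mean = w * np.asarray(exposure0, dtype=float) * np.asarray(rates0, dtype=float)
    return rng.poisson(mean, size=(n_scenarios,) + mean.shape)


# Hypothetical 2 x 3 block of rates and exposures
rates0 = np.full((2, 3), 0.01)
exposure0 = np.full((2, 3), 100_000.0)
deaths = simulate_deaths(rates0, exposure0, w=0.01, n_scenarios=2000, seed=42)
crude = deaths / (0.01 * exposure0)   # crude rates D / (w E_0), one set per scenario
```

Averaged over scenarios, the crude rates recover \(m_0\) for every w; only their dispersion changes with the relative population size.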
3 Distribution of MLE in finite samples
The exposure of males in England and Wales (EW) in the dataset in year 2011 at selected ages (50, 60, 70, 80, 89)
Age x  50  60  70  80  89
Exposure (EW)  381,797  307,825  213,455  134,966  42,640
3.1 MLE
We conclude from Fig. 1 that there are no significant differences between the empirical correlation matrices obtained for different population sizes, as predicted. However, the individual components of \({\hat{\theta }}^w\) are not independent of each other, as we would expect given the model in (1–3).
We find for all population sizes considered that the empirical means of the simulated estimates fluctuate around the true parameter values \(\theta _0\) (solid line), which indicates that the MLE is approximately unbiased for all population sizes considered. However, the standard deviation of the estimator depends strongly on the size of the population: it increases markedly as the exposures get smaller, as can be seen from the width of the confidence intervals.
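The dependence of the standard deviation on population size can already be seen in the simplest possible case: a single age-year cell whose rate is estimated by \(\hat{m} = D/(wE_0)\). For a Poisson count the standard deviation of this MLE is \(\sqrt{m_0/(wE_0)}\), so each tenfold reduction of w inflates it by \(\sqrt{10} \approx 3.16\). The sketch below checks this by bootstrap; the single-cell setting and the numerical values are illustrative assumptions, not the M7 fit of the paper.

```python
import numpy as np


def bootstrap_rate_sd(m0, e0, w, n_scenarios=100_000, seed=1):
    """Empirical sd of the rate MLE m_hat = D / (w e0) when D ~ Poisson(w e0 m0)."""
    rng = np.random.default_rng(seed)
    deaths = rng.poisson(w * e0 * m0, size=n_scenarios)
    return float((deaths / (w * e0)).std())


m0, e0 = 0.01, 300_000.0   # hypothetical death rate and benchmark exposure of one cell
for w in (1, 0.1, 0.01, 0.001):
    theory = np.sqrt(m0 / (w * e0))   # Var(D) = E(D) for a Poisson count
    print(f"w={w:<6} empirical sd={bootstrap_rate_sd(m0, e0, w):.2e}  theory={theory:.2e}")
```

A roughly threefold increase per tenfold reduction of w also appears in the finite sample standard deviations of the point estimator \({\hat{\mu }}^w(i)\) reported in Sect. 4.1.3.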
4 Mortality projections
While fitting the model in (1–3) to observed mortality data only requires the estimation of the period effects \(\kappa _t = (\kappa ^{(1)}_t, \kappa ^{(2)}_t, \kappa ^{(3)}_t)'\) and the cohort effect \(\gamma ^{(4)}_c\), projecting mortality rates into the future requires a model for values of \(\kappa _t\) for \(t > t_{n_y}\), where \(t_{n_y}\) is the last year for which mortality data are available. Similarly, future values of the cohort effect \(\gamma ^{(4)}\) are also required.
The most common approach to obtain future values of \(\kappa\) and \(\gamma ^{(4)}\) is to consider these parameter vectors as observed trajectories of stochastic processes and fit a parametric time series model to each trajectory. In the following we will fit a three-dimensional random walk to \(\kappa _t\) and a stationary AR(1) model to \(\gamma ^{(4)}_c\), as in Ref. [8]. We will then discuss the estimation of the parameters of those models based on the values of \(\theta _0\) and \({\hat{\theta }}^w_j\) for different values of w. This will allow us to investigate the impact of the relative population size w on the estimators for the parameters of the \(\kappa\) and \(\gamma ^{(4)}\) processes.
For the estimation of those parameters and the projections of the period effects and the cohort effect we will consider two approaches. Firstly, we will use a frequentist approach to obtain point estimates of the process parameters, ignoring any uncertainty about those estimates. In our further analysis we will follow a Bayesian approach to incorporate parameter uncertainty into our mortality projections.
4.1 Projecting period effects
4.1.1 Point estimators
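Equations (9) and (10) are not reproduced in this excerpt. For a random walk with drift, \(\kappa _t = \kappa _{t-1} + \mu + \epsilon _t\) with \(\epsilon _t \sim N(0, V)\), natural point estimators are the sample mean and sample covariance of the increments. The sketch below assumes this choice and uses the unbiased divisor (the maximum likelihood estimator of V would divide by the number of increments instead); `rw_point_estimates` is a hypothetical name.

```python
import numpy as np


def rw_point_estimates(kappa):
    """Point estimates (mu_hat, V_hat) for kappa_t = kappa_{t-1} + mu + eps_t, eps_t ~ N(0, V).

    kappa: array of shape (n_years, 3) holding the estimated period effects.
    """
    increments = np.diff(np.asarray(kappa, dtype=float), axis=0)   # kappa_t - kappa_{t-1}
    mu_hat = increments.mean(axis=0)           # equals (kappa_last - kappa_first) / (n_years - 1)
    V_hat = np.cov(increments, rowvar=False)   # unbiased sample covariance of the increments
    return mu_hat, V_hat
```

Applied to the period effects of each bootstrap fit \({\hat{\theta }}^w_j\), this yields one drift estimate per scenario, i.e. a sample \({\hat{\mu }}^w_1, \ldots , {\hat{\mu }}^w_N\) of the kind used in Sect. 4.1.3.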
4.1.2 Bayesian estimation—parameter uncertainty
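The posterior in (12) is not reproduced in this excerpt. One standard way to draw from a posterior for a Gaussian random walk with drift, assumed here purely for illustration, uses the noninformative prior \(p(\mu ,V) \propto |V|^{-(d+1)/2}\). Under it, V given the increments is inverse-Wishart with \(n-1\) degrees of freedom and scale matrix S (the centred cross-product matrix of the n increments), and \(\mu\) given V is normal with mean equal to the average increment and covariance V/n. The paper's own prior may differ, and `rw_posterior_draws` is a hypothetical name.

```python
import numpy as np
from scipy.stats import invwishart


def rw_posterior_draws(kappa, n_draws, seed=None):
    """Posterior draws of (mu, V) for kappa_t = kappa_{t-1} + mu + eps_t, eps_t ~ N(0, V).

    Assumes the prior p(mu, V) proportional to |V|^(-(d+1)/2), so that
    V | data ~ Inv-Wishart(n - 1, S) and mu | V, data ~ N(mean increment, V / n).
    """
    rng = np.random.default_rng(seed)
    inc = np.diff(np.asarray(kappa, dtype=float), axis=0)
    n, d = inc.shape
    mu_bar = inc.mean(axis=0)
    centred = inc - mu_bar
    S = centred.T @ centred                    # centred cross-product matrix of the increments
    V = invwishart.rvs(df=n - 1, scale=S, size=n_draws, random_state=rng)
    V = np.reshape(V, (n_draws, d, d))         # scipy drops the leading axis when n_draws == 1
    mu = np.array([rng.multivariate_normal(mu_bar, V_m / n) for V_m in V])
    return mu, V
```

In a study like that of Sect. 4.1.3, such a function would be called with `n_draws=100` for each of the N bootstrap scenarios, so that the spread of the drift draws combines the Poisson noise across scenarios with the posterior uncertainty within each scenario.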
4.1.3 Empirical comparison
For our empirical study we simulate \(N = 1000\) scenarios for different values of w and plot the empirical density of the point estimator \({\hat{\mu }}^w\) in (9) based on the sample \({\hat{\mu }}^w_1, \ldots , {\hat{\mu }}^w_N\) on the left hand side of Fig. 4. To incorporate parameter uncertainty we draw a further sample of size \(M = 100\) from the posterior distribution of \({\tilde{\mu }}^w_j\) in (12) in each scenario \(j = 1, \ldots , N\). The empirical density of \({\tilde{\mu }}^w_j\) from these \(N \times M\) realisations is shown on the right hand side of Fig. 4.
By comparing the densities in the two columns of that figure, we observe that the additional parameter uncertainty increases the variance of the empirical distributions of the drift estimators. This can be explained by examining the sources of uncertainty in the drift. The variation in the point estimator \({\hat{\mu }}^w(i)\), which makes no allowance for parameter uncertainty, comes only from the Poisson noise in the numbers of deaths in the bootstrap simulations, while the variance of the Bayesian estimator \({\tilde{\mu }}^w(i)\), which allows for parameter uncertainty, also includes the posterior uncertainty (Eq. 12) given the Poisson noise.
We also find in Fig. 4 that the size of a population affects the uncertainty about the drift vector \(\mu\). The variance of the empirical finite sample distribution of both estimators, \({\hat{\mu }}\) and \({\tilde{\mu }}\), decreases significantly as the population size increases, although the difference between \(w=1\) and \(w=0.01\) is rather small, as is particularly obvious for the Bayesian estimator \({\tilde{\mu }}\) (Fig. 5).
However, for smaller values of w we find that the population size has a much more pronounced effect on the variance. For example, the range of likely values of \({\tilde{\mu }}^{0.001}\) is significantly wider than the range of values of \({\tilde{\mu }}^{0.1}\) and \({\tilde{\mu }}^1\) reflecting the uncertainty about \(\mu ^w\) that we have already observed in Fig. 2 top left. The same argument applies to the point estimators \({\hat{\mu }}\).
The finite sample standard deviation of the point estimator \({\hat{\mu }}^w(i)\) and the Bayesian estimator \({\tilde{\mu }}^w(i)\)
Estimator  \(w\)  \(i = 1\)  \(i = 2\)  \(i = 3\)
\({\hat{\mu }}^w(i)\)  1  0.0000966  0.0000071  0.00000113
\({\hat{\mu }}^w(i)\)  0.1  0.0003050  0.0000217  0.00000343
\({\hat{\mu }}^w(i)\)  0.01  0.0009777  0.0000727  0.00001068
\({\hat{\mu }}^w(i)\)  0.001  0.0028787  0.0002206  0.00003387
\({\tilde{\mu }}^w(i)\)  1  0.00369  0.000173  0.00000936
\({\tilde{\mu }}^w(i)\)  0.1  0.00396  0.000222  0.0000162
\({\tilde{\mu }}^w(i)\)  0.01  0.00620  0.000505  0.0000458
\({\tilde{\mu }}^w(i)\)  0.001  0.01689  0.001566  0.0001478
Finally, the projected parameters based on the Bayesian estimates \({\tilde{\mu }}\) and \({\tilde{V}}\) are shown in Fig. 7. As expected, the prediction intervals reflecting the uncertainty about future values of the period effects are very wide for small populations. The plots also suggest that the means of the covariances are shifted to the right, i.e. biased upwards, compared to the estimate for England and Wales. The projection variance for all the populations is much higher than the estimates, due to the additional normal randomness added in the forecasting model by simulating the sample paths for \(\kappa\) and \(\gamma ^{(4)}\). However, the left column shows that there is no obvious proportional relationship between the population size and the projection variance. By investigating the mean covariance matrices, we find that the change in \(E[V_{3,3}^w]\) between \(w=0.01\) and \(w=1\) is the largest among the three period effects, which suggests that the standard deviation of the projections for \(\kappa _t^{(1)}\) and \(\kappa _t^{(2)}\) is less sensitive than that for \(\kappa _t^{(3)}\) to changes in population size.
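Simulating projected period effects combines the two sources of randomness discussed above: a draw of the random-walk parameters \((\mu , V)\), and the normal innovations of the random walk itself. A minimal sketch with hypothetical names and values; passing M identical copies of the point estimates \(({\hat{\mu }}, {\hat{V}})\) gives projections without parameter uncertainty.

```python
import numpy as np


def project_kappa(kappa_last, mu_draws, V_draws, horizon, seed=None):
    """Simulate random-walk paths kappa_{s} = kappa_{s-1} + mu + eps_s, one path per (mu, V) draw.

    kappa_last: period effects in the last data year t_{n_y}, shape (d,);
    mu_draws: drifts, shape (M, d); V_draws: covariance matrices, shape (M, d, d).
    Returns simulated values for years t_{n_y} + 1, ..., t_{n_y} + horizon, shape (M, horizon, d).
    """
    rng = np.random.default_rng(seed)
    mu_draws = np.asarray(mu_draws, dtype=float)
    n_draws, d = mu_draws.shape
    paths = np.empty((n_draws, horizon, d))
    for m in range(n_draws):
        eps = rng.multivariate_normal(np.zeros(d), V_draws[m], size=horizon)  # innovations
        paths[m] = np.asarray(kappa_last) + np.cumsum(mu_draws[m] + eps, axis=0)
    return paths


# Without parameter uncertainty: repeat hypothetical point estimates 1000 times
mu_hat = np.array([-0.02, 0.0005, 0.00001])
V_hat = np.diag([4e-4, 1e-6, 1e-8])
paths = project_kappa(np.array([-4.0, 0.1, 0.001]), np.tile(mu_hat, (1000, 1)),
                      np.tile(V_hat, (1000, 1, 1)), horizon=20, seed=0)
lower, upper = np.percentile(paths[:, -1, :], [5, 95], axis=0)   # 90% interval after 20 years
```

Replacing the repeated point estimates by posterior draws of \((\mu , V)\) widens these intervals, which is the effect visible for small populations.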
4.2 Projecting the cohort effect
After removing cohorts with six or fewer observations from the data, we fit the AR(1) model in (13) to the remaining cohort effects. The resulting densities of the parameter estimates are shown in Fig. 6. All of the parameter estimates and the standard deviation of the error terms appear to be biased relative to the estimates for England and Wales, regardless of the size of the population. However, we find that reducing the population size greatly increases the mean bias as well as the uncertainty.
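The AR(1) specification in (13) is not reproduced in this excerpt. The sketch below assumes the intercept form \(\gamma _c = a + \phi \gamma _{c-1} + e_c\), fitted by ordinary least squares (our choice of estimator), after dropping the cohorts observed in six or fewer age-year cells; `cohort_counts` and `fit_ar1` are hypothetical helpers.

```python
import numpy as np


def cohort_counts(years, ages):
    """Birth cohorts c = t - x in the data rectangle and the number of cells observed for each."""
    cohorts = np.subtract.outer(np.asarray(years), np.asarray(ages)).ravel()
    return np.unique(cohorts, return_counts=True)   # cohorts sorted from oldest to youngest


def fit_ar1(gamma, years, ages, min_obs=7):
    """OLS fit of gamma_c = a + phi * gamma_{c-1} + e_c, ignoring cohorts with < min_obs cells.

    gamma: cohort effects ordered from c = t_1 - x_{n_a} to c = t_{n_y} - x_1.
    Returns (a, phi, sigma), where sigma is the residual standard deviation.
    """
    _, counts = cohort_counts(years, ages)
    # Short cohorts sit at both ends of the cohort range, so the retained ones stay consecutive
    g = np.asarray(gamma, dtype=float)[counts >= min_obs]
    X = np.column_stack([np.ones(g.size - 1), g[:-1]])
    coef, *_ = np.linalg.lstsq(X, g[1:], rcond=None)
    resid = g[1:] - X @ coef
    sigma = np.sqrt(resid @ resid / (resid.size - 2))
    return coef[0], coef[1], sigma


years, ages = np.arange(1961, 2012), np.arange(50, 90)
c, n = cohort_counts(years, ages)
print(len(c), int((n >= 7).sum()))   # 90 cohorts, 78 with at least seven observations
```

For the data used here (1961–2011, ages 50–89), the rule removes six cohorts at each end of the cohort range.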
4.3 Projected mortality rates
Based on the projected period and cohort effects we can now turn to the projection of mortality rates using our model in (1)–(3). Figure 8 shows the twenty-year forward projections of mortality rates at ages 65 and 85. We compare the predicted rates with and without the allowance for parameter uncertainty for all the constructed populations with the projections based on the England and Wales data. Unsurprisingly, the uncertainty about future mortality rates increases as the forecast horizon increases. The other two factors that significantly influence the projection uncertainty are age and population size.
Reducing the population size results in greater uncertainty about mortality forecasts at both ages; in particular, the uncertainty is much greater for the smaller populations (\(w=0.01, 0.001\)) at ages 65 and 85 alike. This means that there is considerable uncertainty about future mortality scenarios for a relatively small pension scheme, with significant implications for the risk management of such a scheme.
 1.
Project mortality rates for each constructed population, while fixing the parameters \(\mu\) and V of the random walk to the estimates obtained from the England and Wales data.
 2.
Project mortality rates for each constructed population, fixing only the drift \(\mu\) to the corresponding EW estimate and sampling realisations of \({\hat{V}}\) from its empirical distribution.
 3.
Project mortality rates for each constructed population, while fixing only the variance matrix V to the corresponding EW estimates and sample the drift parameter from the empirical distribution of \({\hat{\mu }}\).
 4.
Project mortality rates when both V and \(\mu\) are sampled from the empirical distributions of \({\hat{V}}\) and \({\hat{\mu }}\).
4.4 Summary
5 Likelihood ratio test for systematic parameter difference
We have seen that the size of a population has a substantial impact on the level of uncertainty about the parameters of the model in (1–3) when this model is fitted to the population's mortality data. This raises the question whether the estimated period and cohort effects in \(\theta = (\kappa _t^{(1)},\kappa _t^{(2)},\kappa _t^{(3)},\gamma _c^{(4)})\) for a small population are significantly different from those in a given, typically much larger, reference population. To address this question we apply a likelihood ratio test to test for significant deviations of estimated parameters from a given null hypothesis, using the maximum likelihood estimator \({\hat{\theta }}^w_j\) defined in (7) for simulated mortality data \(D^w_j\) as in (6). We are particularly interested in the finite sample distribution of the test statistic as compared to its asymptotic distribution. As in Sect. 3, we will use simulated death scenarios to investigate the finite sample distribution and the power of the likelihood ratio test (LRT) applied to mortality data. We will start with a short review of the LRT.
5.1 Review of likelihood ratio test
Before we start testing our null hypothesis, it is worth considering the testability of the hypothesis.^{3} In our approach the constraints in Equation (4) in Sect. 2 are part of the model and therefore the effective number of parameters that are identifiable is the total number of parameters reduced by the number of constraints. In this paper, we formulate the constraints in terms of the cohort effect \(\gamma\) since we will in particular consider the case \(\theta _{r}=\gamma\) in our empirical study. If the test is about one of the period effects we could reformulate the constraints in terms of that period effect (strictly, therefore, a different model). In that way, the constraints are always fulfilled under \(H_0\). In short, the constraints should be chosen such that the null hypothesis fulfils the constraints. In other words, we are testing the null hypothesis that the mortality experience is generated by mortality rates that follow model M7 with the constraints in Equation (4) and \(\theta _{r}=\theta _{r_0}\).
In the remainder of this section we will consider a null hypothesis about the entire parameter vector \(\theta\), setting \(s=0\). In Sect. 7 we will then consider a null hypothesis about the cohort effect \(\gamma\) only, that is, \(s>0\).
5.2 Finite sample distribution of LRT
As in Sect. 3, we choose the male population in England and Wales as our base case and set \(\theta _0 = {\hat{\theta }}^{\text{ EW }}\).
 1.
simulate \(D^w_j\) as in (6),
 2.
find the estimate \({\hat{\theta }}^w_j\) as in (7),
 3.
calculate the realisation of the LRT statistic \(\Gamma ^w_j\) as in (18) and
 4.
calculate the p-value \(P^w_j\) based on the asymptotic \(\chi ^2\)-distribution as \(P^w_j = P[X > \Gamma ^w_j]\), where X has a \(\chi ^2\)-distribution with \(\alpha\) degrees of freedom.
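Steps 3 and 4 reduce to a few lines once the maximised log-likelihoods under the unrestricted model and under \(H_0\) are available. The statistic (18) is not reproduced in this excerpt; the sketch assumes the usual form \(\Gamma = 2(\ell ({\hat{\theta }}) - \ell (\theta _0))\), and `lrt_pvalue` and `rejection_rate` are hypothetical helpers.

```python
import numpy as np
from scipy.stats import chi2


def lrt_pvalue(loglik_unrestricted, loglik_null, dof):
    """Return Gamma = 2 (l(theta_hat) - l(theta_0)) and the asymptotic p-value P[X > Gamma]."""
    gamma = 2.0 * (np.asarray(loglik_unrestricted, dtype=float)
                   - np.asarray(loglik_null, dtype=float))
    return gamma, chi2.sf(gamma, dof)   # X ~ chi^2 with `dof` degrees of freedom


def rejection_rate(p_values, level=0.05):
    """Fraction of scenarios in which H0 is rejected at the given significance level."""
    return float(np.mean(np.asarray(p_values) < level))
```

Both functions accept arrays, so the statistics for all N scenarios can be processed at once. When \(H_0\) holds and the \(\chi ^2\) approximation is adequate, the rejection rate at level 0.05 should be close to 0.05.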
5.3 Power of the likelihood ratio test
In the last section, we carried out the likelihood ratio test for the parameter difference and found that the \(\chi ^2\) approximation adequately captures the behaviour of the test statistic \(\Gamma ^{w}\) when \(H_0\) holds. We will now investigate how the population size affects the power of the LRT. In general, the power of a binary hypothesis test is the probability of correctly accepting the alternative hypothesis when it is true.^{4}

\(\theta ^{(1)}=({\hat{\kappa }}_0^{(1)}+\lambda ,{\hat{\kappa }}_0^{(2)},{\hat{\kappa }}_0^{(3)},{\hat{\gamma }}_0^{(4)})\)

\(\theta ^{(2)}=({\hat{\kappa }}_0^{(1)}, {\hat{\kappa }}_0^{(2)}+\lambda ,{\hat{\kappa }}_0^{(3)},{\hat{\gamma }}_0^{(4)})\)

\(\theta ^{(3)}=({\hat{\kappa }}_0^{(1)}, {\hat{\kappa }}_0^{(2)},{\hat{\kappa }}_0^{(3)}+\lambda ,{\hat{\gamma }}_0^{(4)})\)

\(\theta ^{(4)}=({\hat{\kappa }}_0^{(1)}, {\hat{\kappa }}_0^{(2)},{\hat{\kappa }}_0^{(3)},\lambda {\hat{\gamma }}_0^{(4)})\)
Using the simulated death counts \(D_{j}^{w,(i)}\) we obtain the MLE \({\hat{\theta }}_{j}^{w,(i)}\) as in (7). We then use the asymptotic \(\chi ^2\)-distribution to test the null hypothesis that the parameters of our model are equal to the parameters obtained from the male population in England and Wales. The p-values \(P^{w,(i)}_j = P^{w,(i)}_j(\lambda )\) are then calculated as in step 4 in the previous section, and the null hypothesis is rejected in any scenario j for which \(P^{w,(i)}_j < 0.05\); that is, the significance level of the test is 0.05.
We then investigate the sensitivity of the power with respect to the size of \(\lambda\) and the size of the population w. For each of the four cases, \(\theta ^{(1)},\ldots , \theta ^{(4)}\), we consider a set of regularly spaced values for \(\lambda\).
Unsurprisingly, the power of the LRT is increasing in \(\lambda\) for any \(\theta ^{(i)}\) and relative population size w: the further the true parameters are shifted or scaled away from the null hypothesis, the easier it is for the test to detect the shift or scaling. For the three period effects, decreasing the population size greatly reduces the ability of the LRT to detect the same shift in a single parameter. We can also compare these plots with the earlier Fig. 2, which shows distributions of parameter estimates resulting from sampling variation. For example, for \(w=0.01\) the width of the confidence interval in Fig. 2e for \(\kappa ^{(3)}_{t,w}\) is about 0.005. This is much larger than the shifts considered in the power plot in Fig. 12. The latter values are so much lower because we apply a systematic adjustment to all of the \(\kappa ^{(3)}_{t,w}\), in contrast to the random adjustments (due to sampling variation) in the former.
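Why a small systematic shift is detectable while random variation of a similar size is not can be illustrated with a toy model. This is our simplification, not the M7 test of the paper: all log-rates are shifted by a common \(\lambda\), and \(H_0: \lambda = 0\) is tested with a one-degree-of-freedom LRT. Because the MLE of \(\lambda\) depends only on the total number of deaths, evidence accumulates across all age-year cells, and the power grows with both \(\lambda\) and w. The function name and the numerical values are hypothetical.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def toy_power(m0, e0, w, lam, n_scenarios=2000, level=0.05, seed=None):
    """Rejection rate of the 1-d.f. LRT of H0: lambda = 0 when the true rates are m0 * exp(lam)."""
    rng = np.random.default_rng(seed)
    mu0 = w * np.asarray(e0, dtype=float) * np.asarray(m0, dtype=float)  # expected deaths under H0
    deaths = rng.poisson(mu0 * np.exp(lam), size=(n_scenarios,) + mu0.shape)
    total = deaths.reshape(n_scenarios, -1).sum(axis=1)   # sufficient statistic for lambda
    expected = mu0.sum()
    # Gamma = 2 [T log(T / mu) - (T - mu)]; xlogy returns 0 when T = 0
    gamma = 2.0 * (xlogy(total, total / expected) - (total - expected))
    return float(np.mean(chi2.sf(gamma, 1) < level))


m0 = np.full(40, 0.02)        # hypothetical rates for 40 cells
e0 = np.full(40, 100_000.0)   # hypothetical benchmark exposures
for w in (1, 0.1, 0.01, 0.001):
    print(w, [toy_power(m0, e0, w, lam, seed=0) for lam in (0.0, 0.01, 0.05)])
```

At \(\lambda = 0\) the rejection rate stays near the nominal 0.05 for every w, while for a given shift it falls sharply as w decreases, mirroring the pattern described for the period effects.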
6 Impact of parameter misspecification on mortality rates and annuities
Size of the shift \(\lambda ^{w,(i)}_{0.5}\) required for 50\(\%\) power when each parameter is shifted separately, for relative population sizes \(w=1, 0.1, 0.01\) (for \(i=4\), \(\lambda\) is the factor by which the cohort effect is scaled)
Parameter shifted  \(w=1\)  \(w=0.1\)  \(w=0.01\)
\(\lambda ^{w,(1)}_{0.5}\)  0.003  0.006  0.02
\(\lambda ^{w,(2)}_{0.5}\)  0.0003  0.0006  0.002
\(\lambda ^{w,(3)}_{0.5}\)  0.0000025  0.000005  0.00018
\(\lambda ^{w,(4)}_{0.5}\)  1.03  1.09  1.32

\(\theta ^{w,(1)}_{0.5}=({\hat{\kappa }}_0^{(1)}+\lambda ^{w,(1)}_{0.5},{\hat{\kappa }}_0^{(2)},{\hat{\kappa }}_0^{(3)},{\hat{\gamma }}_0^{(4)})\)

\(\theta ^{w,(2)}_{0.5}=({\hat{\kappa }}_0^{(1)}, {\hat{\kappa }}_0^{(2)}+\lambda ^{w,(2)}_{0.5},{\hat{\kappa }}_0^{(3)},{\hat{\gamma }}_0^{(4)})\)

\(\theta ^{w,(3)}_{0.5}=({\hat{\kappa }}_0^{(1)}, {\hat{\kappa }}_0^{(2)},{\hat{\kappa }}_0^{(3)}+\lambda ^{w,(3)}_{0.5},{\hat{\gamma }}_0^{(4)})\)

\(\theta ^{w,(4)}_{0.5}=({\hat{\kappa }}_0^{(1)}, {\hat{\kappa }}_0^{(2)},{\hat{\kappa }}_0^{(3)},\lambda ^{w,(4)}_{0.5}{\hat{\gamma }}_0^{(4)})\)
 A temporary annuity of £1 per annum payable annually in arrears to a life now aged 65 exactly, starting at the beginning of year 2012 with term of 25 years. Its expected present value is calculated as:
 An annuity of £1 per annum payable annually in arrears to a life now aged 55 exactly, deferred for 10 years, starting at the beginning of year 2012 with term of 25 years. Its expected present value is:
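The expected present values above are sums of discounted survival probabilities along the cohort diagonal of each projected scenario. The formulas are not reproduced in this excerpt, so the following sketch assumes that one-year death probabilities q are available for the life's future ages and calendar years (central rates m could be converted, for instance, via \(q \approx 1 - e^{-m}\)). The interest rate is a hypothetical input that the text does not specify, and `annuity_epv` is our name.

```python
import numpy as np


def annuity_epv(q_path, rate, deferral=0, term=25):
    """EPV of 1 p.a. paid annually in arrears while alive, deferred `deferral` years.

    q_path[k] is the probability of dying in policy year k + 1, i.e. at age x + k
    in calendar year 2012 + k, read off one projected scenario.
    Payments are made at the ends of years deferral + 1, ..., deferral + term.
    """
    n = deferral + term
    q = np.asarray(q_path, dtype=float)
    if q.size < n:
        raise ValueError(f"need at least {n} projected death probabilities")
    k = np.arange(1, n + 1)
    survival = np.cumprod(1.0 - q[:n])   # probability of surviving k years, k = 1, ..., n
    discount = (1.0 + rate) ** -k
    paid = k > deferral
    return float(np.sum(discount[paid] * survival[paid]))
```

The age-65 temporary annuity corresponds to `deferral=0, term=25`, and the age-55 deferred annuity to `deferral=10, term=25`, which needs 35 years of projected rates; averaging `annuity_epv` over sample paths gives the mean prices reported in the tables.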
We project the period and cohort effects in \(\theta ^{w,(i)}_{0.5}\) (\(i=1,2,3,4\)) and \({\hat{\theta }}^{EW}\) forward for 35 years as in Sect. 4, using the point estimates defined in (9) and (10) for the parameters of the random walk for the shifted period effects; that is, we do not consider uncertainty about the drift and variance matrix of the random walk. Annuity prices are calculated for each sample path, and we then calculate the average annuity price for each w with the \(i\)th parameter shifted or scaled. The results are shown in Tables 4 and 5.
The effects of shifting the period effects and scaling the cohort effect are somewhat varied. As might be expected, the impact on prices is most obvious for \(w=0.01\). For \(\kappa ^{(1)}\) the impact on both types of annuity is straightforward: the shift pushes up mortality rates at all ages and lowers prices. For \(\kappa ^{(2)}\) there is more impact on the age-65 annuity than on the age-55 deferred annuity, as the shift lowers mortality at younger ages and raises it at higher ages. For \(\kappa ^{(3)}\), too, the impact differs between ages. Finally, for \(\gamma ^{(4)}\), the impact of scaling simply depends on the sign and magnitude of the value of \(\gamma ^{(4)}\) for the cohort being priced.
The impact of shifting each parameter separately (scaling, for \(\gamma ^{(4)}\)) on the price of a 25-year temporary annuity for an individual aged 65
Parameter shifted  England and Wales  \(w=1\)  \(w=0.1\)  \(w=0.01\)
\(\kappa ^{(1)}\)  14.67466  14.66393  14.65318  14.60280
\(\kappa ^{(2)}\)  14.67466  14.66887  14.66307  14.63588
\(\kappa ^{(3)}\)  14.67466  14.67500  14.67534  14.69850
\(\gamma ^{(4)}\)  14.67466  14.66997  14.66056  14.62441
The impact of shifting each parameter separately (scaling, for \(\gamma ^{(4)}\)) on the price of a 10-year deferred 25-year temporary annuity for an individual aged 55
Parameter shifted  England and Wales  \(w=1\)  \(w=0.1\)  \(w=0.01\)
\(\kappa ^{(1)}\)  11.96545  11.95599  11.94652  11.84214
\(\kappa ^{(2)}\)  11.96545  11.96358  11.96169  11.95266
\(\kappa ^{(3)}\)  11.96545  11.96565  11.96584  11.97920
\(\gamma ^{(4)}\)  11.96545  11.96815  11.97355  11.99411
7 Likelihood ratio test for the cohort effect
The general form of the LRT reviewed in Sect. 5.1 allows us to test a null hypothesis about part of the parameter vector \(\theta\) (restricted by the specified identifiability constraints as part of the model) rather than the entire vector \(\theta = (\kappa _t^{(1)},\kappa _t^{(2)},\kappa _t^{(3)},\gamma _c^{(4)})\). Testing part of \(\theta\) is particularly relevant if mortality rates in a rather small population are modelled using estimated period or cohort effects from a larger population. Setting one or more components of \(\theta\) equal to (a function of) the corresponding parameters estimated from the large population reduces the dimension of the parameter vector that needs to be estimated from the small population, where parameter uncertainty is high, as we have seen in Sect. 3. Examples we have in mind are a pension fund that uses national mortality data to improve its mortality models, or a small country whose mortality experience is modelled based on the combined experience of other similar countries.
For practical relevance we base our simulation study on the female and male populations in England and Wales. We choose \(\gamma _0 = {\hat{\gamma }}^{EW}\), which is the estimated cohort effect from the mortality data for males in England and Wales. It is worth noting that, as \({\hat{\gamma }}^{EW}\) already satisfies the identifiability constraints, the null hypothesis \(H_0: \gamma =\gamma _0\) has no testability problems under the identifiability constraints defined in the model. To investigate finite sample properties of \(\Gamma\) we need to specify a full parameter vector \(\theta\) to simulate scenarios for the death counts. Having fixed the cohort effect \(\gamma _0\), we choose the period effects to be the estimated period effects from data for the female population in England and Wales, assuming that the cohort effect for those data is actually \(\gamma _0\). As we are mainly interested in small populations, we will consider death count scenarios for populations which have exposures equal to \(wE_0\), where \(E_0\) is here the exposure for the female population in England and Wales.
 1.
Simulate death counts \(D^w_j\) as in (6) using the parameter vector
$$\begin{aligned} {\tilde{\theta }}= ({\tilde{\theta }}_s, \theta _{r_0}) = ({\tilde{\kappa }}_t^{(1)},{\tilde{\kappa }}_t^{(2)},{\tilde{\kappa }}_t^{(3)}, \gamma _0) \end{aligned}$$
to obtain scenarios \(D^w_j\) for different values of the relative population size w. The period effects \({\tilde{\kappa }}\) are estimated from data for females with the cohort effect fixed to \(\gamma _0\). The exposure is \(wE_0\), where \(E_0\) is the exposure for the female population in England and Wales.
 2.
Find the MLE \({\tilde{\theta }}_{s,j}\) of period effects \(\kappa\) in scenario j assuming that the null hypothesis holds, as in (16).
 3.
Find the unrestricted MLE \({\hat{\theta }}_j\) as in (15).
 4.
Calculate the value of the test statistic \(\Gamma ^w_j\) in (17) in each scenario j.
 5.
Calculate the p-values \(P^w_j\) based on the asymptotic \(\chi ^2\) distribution with \(\alpha\) degrees of freedom, where \(\alpha\) is the number of parameters (cohorts) r minus the number of constraints, as in Sect. 5.2. For our data set we obtain \(\alpha = 87\).
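The stdlib-only Python sketch below runs the same simulate, refit and test loop as steps 1 to 5. It does so for a deliberately simplified, hypothetical model and not the paper's age-period-cohort model. In the toy model, the deaths for cohort c are Poisson with mean \(wE_c\exp (\kappa + \gamma _c)\), where \(\kappa\) is a single nuisance level standing in for the period effects and \(\sum _c \gamma _c = 0\) is the only constraint. Both MLEs then have closed forms (the unrestricted fit is saturated), and \(\alpha = r - 1\).

The helper names (`poisson_draw`, `lrt_statistic`, `chi2_sf_wh`, `rejection_rate`), the exposures and the shift sizes are illustrative assumptions. The p-value uses the Wilson-Hilferty normal approximation to the \(\chi ^2\) tail rather than an exact routine.

```python
import math
import random
from statistics import NormalDist


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Exact Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam].
    O(lam) per draw, which is adequate for this small illustration."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def lrt_statistic(deaths, exposure, gamma0):
    """Gamma = 2 [l(unrestricted MLE) - l(restricted MLE)] for the hypothetical
    toy model log m_c = kappa + gamma_c, sum(gamma_c) = 0, with H0: gamma = gamma0."""
    total = sum(deaths)
    if total == 0:
        return 0.0  # limit of the statistic as the restricted kappa -> -infinity
    # Step 2: restricted MLE of the nuisance level kappa with gamma fixed at gamma0.
    kappa = math.log(total / sum(e * math.exp(g) for e, g in zip(exposure, gamma0)))
    stat = 0.0
    for d, e, g in zip(deaths, exposure, gamma0):
        mu0 = e * math.exp(kappa + g)  # fitted deaths under H0
        # Step 3: the unrestricted toy model is saturated, so its fitted deaths are d.
        stat += 2.0 * ((d * math.log(d / mu0) if d > 0 else 0.0) - (d - mu0))
    return stat  # Step 4


def chi2_sf_wh(x: float, df: int) -> float:
    """Wilson-Hilferty normal approximation to P(chi^2_df > x)."""
    if x <= 0.0:
        return 1.0
    s = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - s)) / math.sqrt(s)
    return 1.0 - NormalDist().cdf(z)


def rejection_rate(w, exposure0, kappa, gamma_true, gamma0,
                   n_sims=400, level=0.05, seed=1):
    """Share of simulated scenarios in which H0: gamma = gamma0 is rejected.
    This is the empirical size if gamma_true == gamma0, otherwise the power."""
    rng = random.Random(seed)
    exposure = [w * e for e in exposure0]  # relative population size w
    df = len(gamma0) - 1                   # cohorts minus one constraint
    rejections = 0
    for _ in range(n_sims):
        # Step 1: simulate death counts with exposure w * E0.
        deaths = [poisson_draw(e * math.exp(kappa + g), rng)
                  for e, g in zip(exposure, gamma_true)]
        # Step 5: approximate asymptotic p-value, compared with the level.
        if chi2_sf_wh(lrt_statistic(deaths, exposure, gamma0), df) < level:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    r = 10
    gamma0 = [0.1 * (c - (r - 1) / 2) for c in range(r)]           # sums to zero
    shifted = [g + 0.1 * (-1) ** c for c, g in enumerate(gamma0)]  # still sums to zero
    exposure0 = [5000.0] * r
    for w in (0.1, 1.0):
        size = rejection_rate(w, exposure0, math.log(0.01), gamma0, gamma0)
        power = rejection_rate(w, exposure0, math.log(0.01), shifted, gamma0)
        print(f"w={w}: empirical size {size:.3f}, power {power:.3f}")
```

When `gamma_true` equals `gamma0`, the function estimates the size of the test. Passing a shifted cohort effect instead estimates its power, which in this toy setting falls as w decreases.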
8 Empirical examples
We apply the LRT for the cohort effect in two empirical studies.
8.1 Females vs. males in England and Wales
The first population for which we test the cohort effect is the female population in England and Wales, which we already considered in our simulation study. Our null hypothesis is therefore that the true cohort effect for the female population in England and Wales is equal to the estimated cohort effect for males in England and Wales. Note that this differs from testing the hypothesis that the male and female populations share the same (true) cohort effect, since we ignore the uncertainty about the estimated cohort effect for males.
We test this formally using the LRT with the null hypothesis that the female population has the same cohort effect as the previously estimated male cohort effect. The test statistic \(\Gamma\) is approximately 6311. This is an extremely high value for a \(\chi ^2\) distribution with 87 degrees of freedom, and it is also very high compared with the values of \(\Gamma\) observed in our simulation study (see Fig. 14). The p-value is therefore very close to zero, and we reject the null hypothesis that the cohort effect for the mortality of the female population is the same as the previously estimated cohort effect for the male population.
8.2 Male mortality in Scotland vs. England and Wales
For the LRT we again choose \(\gamma _0= {\hat{\gamma }}^{EW}\) and test the hypothesis that the true cohort effect for Scottish males is equal to \(\gamma _0\). The 99% quantile of a \(\chi ^2\) distribution with 87 degrees of freedom is approximately 121. For the test statistic we find \(\Gamma = 193.37\). We therefore reject the null hypothesis at the 1% level and conclude that the cohort effect in Scotland is significantly different from the estimated cohort effect for England and Wales. This indicates that there might be factors in the Scottish male population that produce significantly different cohort effects. However, there might also be a common cohort effect that is, for some reason, magnified in Scotland. Investigating this in detail is beyond the scope of this paper, but we speculate that such a magnified effect might result from socioeconomic differences between the two populations: for example, cohort effects might be stronger in lower socioeconomic groups.
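The critical value and p-values quoted in Sects. 8.1 and 8.2 can be checked with a few lines of code. The stdlib-only Python sketch below evaluates the \(\chi ^2\) tail as the regularized upper incomplete gamma function \(Q(\alpha /2, \Gamma /2)\). It uses the usual series expansion for small arguments and a continued fraction (modified Lentz method) otherwise.

The helper names `chi2_sf` and `chi2_ppf` are our own and are not part of the original analysis. In practice a library routine such as SciPy's `scipy.stats.chi2` would normally be used instead.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """Upper tail P(X > x) for X ~ chi^2(df), computed as the regularized
    upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower function P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 0
        while term > total * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h  # underflows to 0.0 for extreme x


def chi2_ppf(p: float, df: float, lo: float = 0.0, hi: float = 1e4) -> float:
    """Quantile of chi^2(df), found by bisection on the CDF 1 - chi2_sf."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_sf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    alpha = 87  # degrees of freedom used in Sect. 8
    print("99% critical value:", round(chi2_ppf(0.99, alpha), 2))
    print("p-value, Scottish males (Gamma = 193.37):", chi2_sf(193.37, alpha))
    print("p-value, E&W females (Gamma = 6311):", chi2_sf(6311.0, alpha))
```

The critical value should agree with the figure of about 121 given above. The p-value for the female test is so small that it underflows to zero in double precision.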
9 Conclusion
In this paper, we investigated the finite sample distribution of the maximum likelihood estimators for the parameters of a stochastic mortality model. We found that the size of a population has a significant effect on the uncertainty about the estimated parameters and mortality projections. In particular, we found that there exists a bias in the estimated covariance matrix of the random walk fitted to the period effects when the size of the underlying population is small. As a consequence, prediction intervals are rather wide for small populations even when parameter uncertainty is ignored.
To investigate whether parameters estimated from larger populations can be used to generate scenarios for smaller populations, we examined how a likelihood ratio test performs when applied to the mortality experience of a small population. We found that the finite sample distribution of the test statistic is very close to the asymptotically correct \(\chi ^2\) distribution and, therefore, that the observed rejection rates are close to the chosen significance level. However, we also found that the power of the test depends strongly on population size: the ability of the test to detect deviations from the null hypothesis is significantly reduced when the underlying population is small.
A brief investigation of annuity prices has shown that the misspecification of parameters has a limited financial impact. However, considering shifts in the parameter values that the LRT would detect with a \(50\%\) chance, we have seen that the impact of a small population size is significant for deferred annuities. A more detailed study, which is beyond the scope of this paper, is required to give a complete picture of the possible financial consequences.
In our empirical analysis we then applied the LRT and found that neither the mortality rates of the female population in England and Wales nor those of the male population in Scotland should be modelled with a cohort effect estimated from the male population in England and Wales.
In this paper, we used the traditional two-stage fitting approach. In the first stage, the period and cohort effects are estimated using the Poisson maximum likelihood method; in the second stage, a time series model is fitted to these effects. We have found that sampling variation in small population datasets has a significant impact, which can obscure the true signal in those effects and give rise to misleading forecasts. Bayesian approaches that combine the two stages into one (e.g. [9, 12, 29]) provide a way to address this problem. However, as use of the two-stage approach is widespread (perhaps because of its relative simplicity), we have here attempted the first systematic analysis of the impact of population size on parameter estimates and forecasts using the two-stage approach. In this way, users of the two-stage approach will be better informed about its limitations and about how the likelihood ratio test might be used to exploit data from larger populations.
Footnotes
 1.
Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on 16 February, 2014).
 2.
See [19] for more details about the likelihood ratio test and the asymptotic distribution of the LRT statistic.
 3.
See [32] for more details about testable hypotheses.
 4.
See [15] for more details on statistical power.
Notes
Acknowledgements
Liang Chen is in receipt of an Actuarial Research Centre PhD scholarship funded by the Faculty of Actuaries Endowment Fund and the Institute and Faculty of Actuaries. Andrew Cairns and Torsten Kleinow acknowledge financial support from the Actuarial Research Centre of the Institute and Faculty of Actuaries, and Netspar under project LMVP 2012.03.
References
 1. Andreev K (2002) Evolution of the Danish Population from 1835 to 2000, vol 9. University Press of Southern Denmark
 2. Anisimova M, Bielawski JP, Yang Z (2001) Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evolut 18:1585–1592
 3. Booth H, Hyndman RJ, Tickle L, De Jong P et al. (2006) Lee-Carter mortality forecasting: a multi-country comparison of variants and extensions. Technical report, Monash University, Department of Econometrics and Business Statistics
 4. Brouhns N, Denuit M, Van Keilegom I (2005) Bootstrapping the Poisson log-bilinear model for mortality forecasting. Scand Actuar J 3:212–224
 5. Brouhns N, Denuit M, Vermunt JK (2002) A Poisson log-bilinear regression approach to the construction of projected lifetables. Insur Math Econ 31(3):373–393
 6. Cairns AJG, Blake D, Dowd K (2006) A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. J Risk Insur 73(4):687–718
 7. Cairns AJG, Blake D, Dowd K, Coughlan GD, Epstein D, Khalaf-Allah M (2011) Mortality density forecasts: an analysis of six stochastic mortality models. Insur Math Econ 48(3):355–367
 8. Cairns AJG, Blake D, Dowd K, Coughlan GD, Epstein D, Ong A, Balevich I (2009) A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. N Am Actuar J 13(1):1–35
 9. Cairns AJG, Blake D, Dowd K, Coughlan GD, Khalaf-Allah M (2011) Bayesian stochastic mortality modelling for two populations. ASTIN Bull 41(1):29–59
 10. Cox DR, Hinkley DV (1979) Theoretical statistics. CRC Press
 11. Currie ID (2016) On fitting generalized linear and non-linear models of mortality. Scand Actuar J 4:356–383
 12. Czado C, Delwarde A, Denuit M (2005) Bayesian Poisson log-bilinear mortality projections. Insur Math Econ 36(3):260–284
 13. D’Amato V, Haberman S, Russolillo M (2009) Efficient bootstrap applied to the Poisson log-bilinear Lee-Carter model. Proceedings of the Applied Stochastic Models and Data Analysis (ASMDAG09), pp 374–377
 14. Davison AC (1997) Bootstrap methods and their application, vol 1. Cambridge University Press
 15. Ellis PD (2010) The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press
 16. Gelman A, Carlin JB, Stern HS, Rubin DB (1995) Bayesian data analysis. Chapman and Hall, London
 17. Huelsenbeck JP, Crandall KA (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics, pp 437–466
 18. Jarner SF, Kryger EM (2011) Modelling adult mortality in small populations: the SAINT model. ASTIN Bull 41(2):377–418
 19. Kendall M, Stuart A, Ord J (1987) Kendall's advanced theory of statistics. Oxford University Press
 20. Kleinow T, Richards SJ (2016) Parameter risk in time-series mortality forecasts. Working paper, Heriot-Watt University
 21. Kogure A, Kitsukawa K, Kurachi Y (2009) A Bayesian comparison of models for changing mortalities toward evaluating longevity risk in Japan. Asia-Pacific Journal of Risk and Insurance 3(2)
 22. Kogure A, Kurachi Y (2010) A Bayesian approach to pricing longevity risk based on risk-neutral predictive distributions. Insur Math Econ 46(1):162–172
 23. Lee RD, Carter LR (1992) Modeling and forecasting US mortality. Journal of the American Statistical Association 87(419):659–671
 24. Liu X, Braun WJ (2011) Investigating mortality uncertainty using the block bootstrap. Journal of Probability and Statistics 2010
 25. Mood A, Graybill F, Boes D (1963) Introduction into the theory of statistics
 26. Neyman J, Pearson ES (1992) On the problem of the most efficient tests of statistical hypotheses. Springer
 27. Nielsen B, Nielsen JP (2014) Identification and forecasting in mortality models. The Scientific World Journal
 28. O'Brien RM (2014) Estimable functions in age-period-cohort models: a unified approach. Quality & Quantity 48(1):457–474
 29. Pedroza C (2006) A Bayesian forecasting model: predicting US male mortality. Biostatistics 7(4):530–550
 30. Reichmuth W, Sarferaz S (2008) Bayesian Demographic Modeling and Forecasting: An Application to U.S. Mortality. Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät
 31. Renshaw AE, Haberman S (2003) Lee-Carter mortality forecasting with age-specific enhancement. Insur Math Econ 33(2):255–272
 32. Searle SR (1971) Linear models. Wiley & Sons, New York
 33. Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9(1):60–62
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.