A general framework for analysing the mortality experience of a large portfolio of lives: with an application to the UK universities superannuation scheme

We propose a general framework that can be used to analyse the mortality experience of a large portfolio of lives. The objective of the framework is to provide a firm evidence base to support the setting of future mortality assumptions for the portfolio as a whole or subgroup-by-subgroup. The framework is developed in tandem with an analysis of the mortality of pensioners in the Universities Superannuation Scheme (USS), the largest funded pension scheme in the UK and one with a highly educated and very homogeneous membership. The USS experience was compared with English mortality subdivided into deprivation deciles using the Index of Multiple Deprivation (IMD). USS was found to have significantly lower mortality rates than even IMD-10 (the least deprived of the English deciles), but with similar mortality improvement rates to that decile over the period 2005–2016. Higher pensions were found to predict lower mortality, but only weakly so, and only for persons who retired on the first day of a month (mostly from active service). We found that other potential covariates derived from an individual’s post/zip code (geographical region and the IMD associated with their local area) typically had no explanatory power. This lack of dependence is an important conclusion of the USS-specific analysis and contrasts with others that consider the mortality of more heterogeneous scheme memberships. Although the key findings are likely to be particular to USS, we argue that our analytical framework will be useful for other large pension schemes and life annuity providers.


3 Introduction
The actuarial and demographic literature has no shortage of analyses of both national and sub-population mortality. But it is quite rare to have the opportunity to analyse a sub-population that is quite as homogeneous as the Universities Superannuation Scheme (USS).
USS is a UK funded multi-employer pension scheme that provides mainly defined benefit pensions to university staff at almost all UK universities. The membership comprises mostly academics and senior professional services staff. The multi-employer nature of the scheme means that typical durations of active service that count towards a USS pension are high relative to many other schemes which, in combination with higher than national average salaries, lead to higher average pension amounts compared with other large schemes. The homogeneity of the scheme membership stems from the fact that most members are (effectively by definition) highly educated and have very similar jobs and working environments. This homogeneity leads to very specific conclusions in this paper about rates of mortality that could be quite different from other groups of lives.
In most developed countries, national mortality rates are easy to obtain from either national statistics agencies or the Human Mortality Database (HMD). 1 Subpopulation mortality analyses also exist in a variety of settings. In the UK, for example, sub-population analyses or data are available for: • the nations within the UK (Scotland, England, Wales and Northern Ireland: see HMD); • geographical regions within, for example, England (see, for example, [1], and later in this paper); • Index of Multiple Deprivation deciles (IMD) (see, for example, Department of Communities and Local Government, 2015 [15], and later in this paper); • pension and insurance datasets (for example, [11]).
More detailed analyses of insurance or pension portfolios can take into account portfolio-or individual-specific information, such as pension amounts [11], industry [10] or geodemographic profile [28].
For other countries, covariates such as education [23] and income or affluence [2,6,29] have been found to be highly significant predictors of mortality rates at the national level.
In most cases, covariates such as geodemographic profile, region, affluence (or conversely deprivation) associated with local area of residence, and pension amount all have significant power to predict higher or lower levels of mortality. In turn, this leads to greater accuracy in the valuation of mortality-linked pension liabilities. Although our methodological approach was developed specifically to explain the idiosyncrasies of a particular pension scheme, we believe it can be adopted as a general framework for analysing the mortality experience of other large datasets.
The structure of the paper is as follows. Section 2 introduces the USS mortality data. Section 3 describes the CBDX mortality model [16] used to smooth some of the mortality data. Section 4 provides an empirical comparison of USS versus English mortality data by IMD decile, with more formal statistical tests in Sect. 5. Section 6 discusses a first-of-the-month effect, where the day of the month on which an individual retires turns out to be important. The significance of pension amount is considered in Sect. 7. Section 8 discusses future short-and long-term mortality rate improvement assumptions for the USS in the light of both national and international comparisons. Section 9 summarises the findings for the USS from the previous sections and makes recommendations based on those findings. Section 10 concludes with lessons about how the general framework can be used to analyse the mortality experience of other large pension schemes and life annuity providers. This paper builds on an earlier preliminary study of the longevity of USS members by Coughlan [13].

Description of the data
The USS pensioner dataset consists of 98,733 individuals who were directly in receipt of a USS pension at some time during the period 2005-2016 (inclusive). 2 Pensions in payment at the end of 2016 amounted to around £1.4 billion per annum. Pensioner 'types' comprise: primary pensioners (i.e., former active or deferred members who retired in 'normal health'); ill-health pensioners (former active members who took early retirement due to ill health, normally with enhanced benefits); spouse pensioners (i.e., spouses or long-term partners of deceased primary or illhealth pensioners); ex-spouse pensioners; child pensioners (i.e., children of deceased primary pensioners while they are still in full-time education); and other dependent pensioners. This analysis focuses on primary pensioners and spouses (approximately 93% of the lives and 95% of annual pension payments).
Data for each pensioner (in addition to 'type') comprise sex, dates of birth, retirement and death, pension amount, and their residential post/zip code. Pension amounts are either as at 1 January 2017 for survivors or at the date of death revalued to 1 January 2017 in line with general USS pension increases.
The data were subject to extensive data validation tests as well as a range of diagnostics designed to identify certain idiosyncratic features of the USS data. Data validation included identification of outliers in the data which were subsequently excluded from the analysis. Specifically, 93 out of 76,895 primary pensioners were considered to have inaccurate records including: age at retirement before 50 or after 80 (primary pensioners); date of birth before 1903; and date of exit preceded date of retirement.

3
With outliers removed, exposures 3 and deaths over the period 2005-2016 are given in Table 1. For comparison, exposures for the English national population over the 12 years from 2005-2016 and ages 65-89 are approximately 45 million person years (males) and 55 million person years (females), so USS pensioners' exposures are, on average, about 0.7% (males) and 0.3% (females) of the size of the corresponding English population.
Graphical diagnostics included scatterplots of pairs of quantities, such as date of birth, date of retirement, age at retirement, age at death and pension amount. The plots helped identify outliers and various idiosyncratic features of the data. Of most interest, we found that the day of the month on which primary retirees are recorded as having retired was strongly correlated (although not necessarily causally connected) with the age at retirement and the pension amount. The importance of this feature of the data is explored further in Sects. 6 and 7.

The CBDX model
The analysis below makes use of the CBDX model of Dowd et al. [16]. 4 This results in some smoothing in both the age and time dimensions. Even though USS is one of the largest UK pension schemes, the pensioners population, at around 0.3% to 0.7% of the equivalent English population, is small enough that there will be considerable small-population noise in the crude death rates. Application of the CBDX model dampens this noise considerably by exploiting the generally stable relationship in sufficiently large populations between death rates at different ages and through time. The model for the underlying death rate m(t, x) is: for calendar year t, age x last birthday (which we abbreviate to (t, x)) and birth cohort t − x , where • (x) is a non-parametric age effect, 5 similar to that in the Lee-Carter [21] model;  [4] model which builds on earlier proposed generalisations, including those of Plat [27] and Wen et al. [31,32]. 5 Or base mortality table. 3 Short for exposures to risk, which is not the same as population.
• x is the mid-point of the age range considered; • 1 (t) and 2 (t) are period effects.
Adding a third (quadratic) age-period component turned out not to be helpful: it resulted in overfitting within each year together with an unstable fit from year to year at both high and low ages in response to randomly high and low death counts. The (t, x) death count is denoted by D(t, x) and, conditional on m(t, x), is assumed to have a Poisson distribution with mean m(t, x)E(t, x), where E(t, x) is the central exposed-to-risk-the total number of years lived by pensioners in calendar year t at age x last birthday. E(t, x) can be computed exactly for the USS dataset. In this analysis, we take a sequential approach that first estimates (x) with smoothing and then the period effects, given (x).

Comparison of USS mortality rates with English deprivation deciles
As previously mentioned, even for a pension scheme of the size of USS, there can be significant noise in both crude and fitted mortality rates due to small-population noise in the death counts, particularly at high and low ages where expected deaths are low. It is helpful, therefore, to consider the USS experience alongside larger national datasets, where we have much larger exposures and, potentially, more years. In this case, we compare the USS experience with the English national population subdivided into deprivation deciles. Deprivation decile number 1 (IMD-1) covers the most deprived areas in England, moving up to decile 10 (IMD-10) which covers the least deprived areas. Construction of the deciles is described in Department of Communities and Local Government [15] (main report plus technical report). IMDbased mortality rate data are available for the period 2001-2016 and can be obtained from the UK's Office for National Statistics (ONS).

Graphical diagnostics
The plots that follow present results for USS in comparison with data obtained from the ONS. In Figs. 1 and 2, we plot death rates for USS and for the English IMD-1 to IMD-10 deprivation deciles.
The IMD death rates are plotted in their (almost) 6 unsmoothed form. The death rates for the IMD deciles in the two figures without any smoothing show a reasonably clear ordering for both calendar year and age. We also see a pattern of inequality that is typical for a variety of countries and socio-economic measures: a high degree of inequality at ages 60 and 65 that gradually narrows at higher ages.
The IMD deciles exhibit two further features of note. First, annual improvements in mortality have been higher for the least deprived deciles (so the gap is getting wider). Second, all deciles have exhibited a slow down in recent years, consistent 1 3 with the much-discussed national slow down in mortality rate improvements (but IMD-10 is still improving at a faster rate than IMD-1; see, for example, Office for National Statistics, [25]).
The curves for USS are based on aggregated deaths and exposures for primary pensioners and spouses based on lives (thick grey line) and amounts (dot-dashed line). 7 To mitigate small-population noise due to low death counts, the plots show fitted mortality rates based on the CBDX model. Mortality by amounts is lower than by lives, as is typical for other populations, implying that wealthier USS pensioners live longer than those who are less wealthy. However, the gap between the two is relatively small compared with other, less homogeneous populations (e.g., CMI standard tables S2PMA (amounts) and S2PML (lives); [11]). We see that the USS death rates are generally quite low compared with the IMD death rates for both males and females. In particular, in the later years, the USS mortality rate curve is mostly below the least deprived English decile, IMD-10.
We next sacrifice the age-related detail in the data in order to compare USS with IMD centiles rather than deciles, ranging from centile 1 (most deprived) to centile where m i (t, x) is the crude death rate for population i, year t and age x. This clearly reveals the even stronger result that USS male mortality rates are lower than the average of the 1% least deprived areas in England, while for females they are around the level of the 1-2% least deprived.    The top row of Fig. 3 also shows how reliable the Index of Multiple Deprivation is as a predictor of high or low mortality: that is, there is a clear shape to the plot with the centile on the x-axis, with a steepening at each end.
The clear conclusions from these three figures are: • Figure 1 shows that USS members have lower mortality rates than IMD-10 across all ages for men and women; • Figure 2 shows that mortality rate improvements for USS members are similar to IMD-10; • Figure 3 shows that on average USS males over 65 years of age are in the 1% of the UK population with the lowest mortality rates, while USS females over 65 years of age are in the 1-2% of the UK population with the lowest mortality rates.

Mortality rate analysis I: which explanatory variables are significant?
In this section, we conduct a series of formal statistical tests in order to determine whether there are unique features of the USS dataset that are significantly different from the larger, more heterogeneous IMD dataset. We also look at the significance of pension amount within the USS males data, and whether geographical region or the place of residence of a member within a specific IMD decile have significant power in explaining mortality rates.

Test 1: explanatory power of the IMD group
The IMD deciles discussed earlier consist of a collection of small geographical areas called Lower Super Output Areas (LSOAs), each having its own index of multiple deprivation (IMD). 8 Figures 1 and 2 indicate clearly that, in the absence of other information, the IMD decile to which an individual belongs by virtue of their place of residence is a significant predictor of mortality. With the additional information that an individual is a member of USS, does this explanatory power change? The USS data allow us to identify the LSOA, and hence the IMD decile, to which each member belongs (see Fig. 4) if they are resident in England-which accounts for about 75% of members. Table 2 summarises the total exposures and deaths over ages 65-89 and years 2005-2016 in each IMD group. The eleventh Group, 'X', counts pensioners who have not been allocated to an English LSOA either because they live in other parts of the UK or overseas, or have an unidentified post/zip code. Unsurprisingly, exposures increase steadily from IMD-1 to IMD-10: that is, USS pensioners mainly live in less deprived areas of England.
For this part of the framework to be useful, we require a tool that maps post/zip code to a set of distinct socio-economic groups. Each group must have its own historical mortality experience and, ideally, this should be based on national population and deaths data aggregated into socio-economic groups using an appropriate statistical method. In this case study, the mapping tool combines a post-code-to-LSOA lookup table and the English IMD, which provides a deprivation measure for each LSOA.
In order to conduct hypothesis tests, we define E(i, t, x) to be the exposed to risk for USS pensioners aged x in year t allocated to IMD group i, and D(i, t, x) and m(i, t, x) to be the corresponding death counts and death rates.
Assuming deaths, D(i, t, x), follow a Poisson distribution with mean m(i, t, x)E(i, t, x), we consider the following two hypotheses: Fig. 4 The process of linking individual pensioners to an IMD decile (Test 1) or to a geographical region (Test 2) • H0: m(i, t, x) = m(t, x) for all i, t, and x (i.e., there is no difference in death rates between the 11 groups, apart from differences between ages and over time), versus where the A(i, x) are constants to be estimated that depend on the IMD group, i, and age, x, but not year, t (that is, death rates differ across all groups). 9,10 Under H0, there are 12 × 25 = 300 parameters to be estimated. Under H1, there are 12 × 25 + (11 − 1) × 25 = 550 parameters to be estimated. 11 Let l 0 and l 1 be the maximum likelihoods under H0 and H1, respectively. We use the likelihood ratio test (in the spirit of Chen et al., [8]) which has test statistic S = 2(l 1 − l 0 ) and we reject H0 at the confidence level if S exceeds the quantile of the simulated distribution of the test statistic under H0. 12 For the data summarised in Table 2, the test statistic for males of S = 233.9 (Table 3) gives a p-value of 0.835. So we accept H0 and conclude that we have no evidence that the mortality rates of USS male primary pensioners are dependent on their IMD group. For females, we have a similar result: a test statistic of S = 284.8 and a p-value of 0.727. In other words, USS pensioners living in more deprived areas of England-for example, because of high house prices in other more desirable parts of cities such as London-do not suffer from higher rates of mortality. This is a very significant finding. In the general population, the IMD group is a very strong predictor of high or low mortality, but this does not hold for USS male pensioners (see, for example, Figs. 1 and 2). A likely explanation is that the strong homogeneity of the USS membership-a very specific and highly educated occupational groupoutweighs other information that has proved to be a useful predictor of mortality rates for other more heterogeneous segments of the general population.

Test 2A: regional differences
As an alternative to grouping by IMD decile, primary pensioners were grouped according to the region in which they are resident. There are nine English regions, and pensioners not in an identifiable LSOA are, again, allocated to the 'other' Group 1 3 X. Exposures and deaths are summarised in Table 4. 13 We carried out a similar likelihood ratio test to Test 1 (on the basis of the same H0 and H1, but with different groupings). For males, the Test 2A statistic S = 228.7 (Table 5) gave a p-value of 0.467 and for females, S = 255.1 , with a p-value of 0.285. So, for both males and females, we accept H0 and conclude that we have no evidence that the mortality rates of primary pensioners are dependent on the region in which they live. 14

Test 2B: north-south differences
Although the previous test does not find evidence for differences at the level of individual regions, it is possible, given the actual versus expected deaths in Table 4, to consider the possibility of a north-south divide. To test for this, 'the north' was taken to mean regions 1-5 and X, and 'the south' to be regions 6-9. 15 Table 6 presents the results of Test 2B, and we see that for both males and females there is reasonable (although not strong) evidence to reject the null hypothesis. That is, we can conclude that there is a difference in mortality rates between the north and the south. Statistically, for females, the evidence is less strong, reflecting the smaller population size. However, empirically, for females, the actual versus expected gap is wider than for males.

Test 3: difference between USS and IMD-10
In this test, we compare USS male and female death rates separately for ages 65-89 and years 2005-2015 with IMD-10 death rates.
Death rates are denoted by m(i, t, x), where i = 1, 2 correspond to IMD-10 and USS, respectively. The only modelling assumption is that We consider the following two hypotheses: x) for all t and x (i.e., death rates for USS are the same as those for IMD-10), versus • H1: m(2, t, x) ≠ m(1, t, x) for all t and x (i.e., death rates for USS differ from those of IMD-10).
Let l 0 and l 1 be the maximum log-likelihoods under H0 and H1. The test statistic for the likelihood ratio test is S = 2(l 1 − l 0 ) and we reject H0 at the confidence level if S exceeds the quantile of the 2 distribution where, here, = 12 × 25 = 300 is the x)). 13 Compared with the regional spread of the general population, USS primary pensioners are disproportionately concentrated in London and the South East. 14 It was feasible to split Group X further and retest. As an example, we split Group X in two (Scotland and other). However, Scottish mortality was similar to Group X as a whole, the p-value increased, and H0 was again accepted. 15 Most pensioners in group X reside in Scotland, Wales and Northern Ireland: countries that are contiguous to and have similar socio-economic characteristics to the northern English regions. Hence, X is grouped in with regions 1-5. number of degrees of freedom. Table 7 shows that we reject H0 for males and accept it for females.
As a slight note of caution, we observe that the IMD-10 population also includes USS pensioners, implying that the underlying assumption that the death counts in the two populations are independent is not strictly correct. However, since the overlap between USS and IMD-10 is only a small proportion of IMD-10 (about 1.1% of IMD-10 based on Table 2), the impact of this absence of strict independence is likely to be negligible. A further implication follows from the fact that the USS data have lower mortality rates than IMD-10, namely the IMD groups themselves are not completely homogeneous.
Finally, as we noted earlier, the use of Chi-squared for the distribution of the test statistic under H0 is an approximation that works better the larger the population size. Nevertheless, the high value of the test statistic for males far outweighs any questions over the validity of the Chi-squared distribution. On the other hand, if the p-value for the test is fairly close to the critical value (typically 0.05), we should be more cognisant of the independence issue.

Test 4: age-related difference between USS and IMD-10 only
We consider next the following two hypotheses:    Table 8 shows the test results. We conclude that mortality rate improvements in the USS (ages 65-89) are indistinguishable from those of the corresponding IMD-10 data. Importantly, the IMD-10 dataset is much larger-and so has much less small-population noise-and has more years. This implies that we can reasonably forecast USS mortality rates using forecasts for IMD-10 and then applying the age adjustment, A(x).
This leads to the following two hypotheses: • H0: CBDX model for USS with 1 (2, t) = 1 (1, t) and 2 (2, t) = 2 (1, t) (i.e., mortality rate improvements for USS derived from the CBDX model are the same at those for IMD-10), versus • H1: CBDX model for USS with 1 (2, t) and 2 (2, t) to be estimated from the USS data (i.e., mortality rate improvements for USS derived from the CBDX model differ from those for IMD-10). For the data under consideration, H1 has 24 more parameters. We again use the likelihood ratio test for primary pensioners and spouses, across ages 65-89 and years 2005-2016. Table 9 shows the test results. Under the stronger assumptions of the CBDX model, we still find that improvement rates in fitted USS mortality over the age range 65-89 are not significantly different from improvement rates in fitted IMD-10 mortality.
The importance of Test 5, as a model-based variant of Test 4, lies in the potential usefulness of the CBDX model for forecasting future mortality rates.

A first-of-the-month effect
We now expand upon a remark in Sect. 2 concerning the day of the month in which a pensioner is recorded as having retired and its connection with pension amount.
For primary retirees, Fig. 5 shows the relationship between an individual's age at retirement and the date during a calendar year when that individual retired. We observe a number of striking features. For both males and females, we see concentrations of dots along specific horizontals and verticals: verticals at the key ages of 50, 55, 60, 63.5 and 65; 16 horizontals correspond to the first of a month as the date of retirement. Relatively few individuals retired both later in the month and at an uncommon age. Furthermore, we found that pension amount had a clear link to the day of the month on which an individual is recorded as having retired. Figure 6 shows the cumulative distribution of pension amounts for those who retired on the first of a month and for those who retired later in the month. The figure shows a striking difference between the two groups, with those not retiring on the first of a month receiving pensions that are approximately 40% on average of  16 Over most of the relevant data period, USS constituent universities had the following common retirement ages: the lowest age for early retirement for a scheme member not in ill health (officially known as the normal minimum pension age, introduced in 2006 and set at 50, rising to 55 in 2010, following the Finance Act 2004), the normal age of female retirement (60), the normal age of male retirement (65) and a gender-neutral normal retirement age used by some universities (63.5) that was part way between the male and female normal retirement ages. The Pensions Act 1995 (later amended by the Pensions Act 2011) gradually raised the normal retirement age for women from 60 to equal that of men (65) between 2010 and 2018, and for the normal retirement age for both men and women to rise to 66 in 2020. The Equality Act 2010 removed an upper limit on the retirement age on the grounds of age discrimination (some universities had previously included in their contract of employment a specific maximum retirement age, e.g., 67 or 75; this became illegal under the Act, although employers can still require employees to retire on the grounds of incapacity).
those who retired on the first of a month. On further investigation, first-of-the-month retirees were found to be mostly pensioners who had retired from active service, with a significant peak in retirements at the beginning of October, the day after the end of the previous academic year. These individuals typically have long service and, consequently, relatively large pensions. 17 In contrast, later-in-the-month retirees were found to be mainly previously deferred pensioners. These will typically have shorter periods of service and, consequently, smaller pensions.

Mortality rate analysis II: does pension amount matter?
We have already seen some indication that USS mortality rates might be dependent on pension amount in Figs. 1 and 2. However, it is worth mentioning that the differences between lives-and amounts-based mortality rates are rather smaller with USS than is typical of standard tables (e.g. [9]). 18,19 We have already seen in the previous section that there is a link between the day of the month when a pensioner retired, membership status pre-retirement and pension amount (Figs. 5 and 6). We now seek to investigate pension amount as an predictive variable in more depth.

Subdivision of first-of-month primary retirees
Our empirical observations based on Figs. 1, 2, 5 and 6 prompted us to divide the primary retirees as follows: • Group 1: those recorded as having retired on the first of the month (mostly retirement from active service) with further subdivision into tertiles by pension amount (revalued to 1 April 2017) -1A: revalued pension less than a threshold P 1 ; -1B: revalued pension between thresholds P 1 and P 2 ; -1C: revalued pension above P 2 .
• Group 2: those who retired between the 2nd and last day of the month (mostly deferred members).

3
The Group 1 thresholds P 1 and P 2 are set at the 1/3 and 2/3 quantiles of the empirical distribution of the pension amounts for all first-of-month primary retirees, calculated separately for males and females. Values are given in Table 10.
Various graphical diagnostics were considered to explore the impact of both the first-of-the-month effect and the pension amount. Figure 7 for males shows how the ASMR for ages 65-89 varies over time for different groups. Specifically, we can see that Group 1B is slightly below Group 1A and Group 1C is more clearly below 1B. All three lie below IMD-10. On the other hand, small-population noise in the smaller Group 2 prevents us from detecting any obvious deviation from Groups 1A to 1C. Although Group 2 USS pensions are much lower, Group 2 is also mostly deferred pensioners. It is therefore likely that they also have non-USS pensions which reduce the gap between Group 2 and Group 1.    1B and 1C). The left-hand plot shows the fitted (x) (Eq. 1) for each group under the assumption that all follow the IMD-10 improvements as in Test 5 (Sect. 5). The right-hand plot shows the difference between the (x) for each group and the (x) for Group 1. This shows more clearly the differences between the five groups, reinforcing the empirical differences observed in Fig. 7. So a key finding from these two figures is that, for males, the pension amount might be predictive of the level of mortality, but only for first-of-the-month retirees. This implies that caution must be exercised when simply using pension amount without also taking the day of retirement into account. For females, the number of lives is sufficiently small that the results are inconclusive.
Our earlier tests 1-5 point to a strong occupational effect. As a graphical diagnostic, Fig. 7 suggests that, additionally, former academics (i.e., most of those in Group 2) have just as low mortality rates as those who stay in academia until retirement. Amongst other factors, they will have the same exceptionally high levels of formal education.

Test 6: explanatory power of pension amount
In this sub-section, we carried out a statistical test-similar to Tests 1 and 2 (in Sect. 5)-of the null hypothesis that Groups 1A, 1B and 1C have the same death rates versus the alternative hypothesis that Groups 1A, 1B and 1C have significantly different death rates. Formally, The likelihood ratio test in this case had a p-value of 0.0003 (Table 11), so we reject the null hypothesis and conclude that there are sufficient differences between the three groups that they should be modelled separately.
Finally, the difference between Groups 1A and 1C was compared with the CMI experience analysis of Self Administered Pension Scheme (SAPS) data [11]. Chart 6.1 of CMI [11] offers some insight into the differences in the SAPS data for males between pension bands. Although a direct comparison is difficult, the CMI chart points to much greater differences between the various CMI-SAPS pension amounts than we see in the USS data: potentially the result of greater heterogeneity in the SAPS data which comprise a mixture of industry sectors. For females, there was no graphical evidence supporting a difference in mortality rates between Groups 1A, 1B and 1C; in part, this is due to small-population noise.

Discussion of future short-and long-term mortality rate improvement assumptions
In this section, we discuss how our findings in the previous sections can be used to help set assumptions about future mortality rate improvements for the USS. We have noted that the different IMD groups have experienced different rates of improvement over the last 15 years. However, the principle of coherencesee Li and Lee [22] and Hyndman et al. [19]-suggests that these differences are unlikely to be sustainable in the long term. In other words, it is implausible to assume that mortality rates in the different IMD groups will continue to diverge in the long run. In setting long-term improvement rates, therefore, we should not look at IMD-10 in isolation. Instead, we should focus, on additionally, (a) national mortality trends and (b) international mortality trends. For the same reason, we should assume the same long-term improvement rate for females and males, with potential differences only in the short and medium term. The framework embedded in CMI [12] requires several inputs including agedependent short-and long-term improvement rates. The rationale for having a shortterm rate allows users to assume that recent improvement rates (high or low) might continue in the short term before reverting over time to some long-term rate.
We do not present here specific recommendations for the short and long-term improvement rates. Instead, we discuss what we see as a set of relevant factors, including those specific to USS that should be considered carefully before these assumptions are set.

National mortality trends
As a starting point for this discussion, Table 12 presents historical improvement rates using the overlapping periods available for USS, IMD and English national data. We highlight IMD-10 in this table in response to our conclusions in Tests 4 and 5 that USS mortality rate improvements are indistinguishable from IMD-10. Using English mortality rates from 2001-2016 (i.e., by aggregating the 10 IMD groups) and following the approach of Cairns et al. [4], a point estimate for the drift of 1 (t) (males) in the CBDX model is − 0.025 with a 95% confidence interval of (− 0.035, − 0.016) . 1 (t) dictates changes in the level of mortality at all ages, while 2 (t) affects the slope only. Thus, the negative of the drift of 1 (t) can be interpreted as the headline improvement rate: that is, on the basis of the CBDX model and English national population data for the period 2001-2016, a point estimate for the headline improvement rate is 2.5% per annum with a 95% credibility interval of (1.6, 3.5)%. 20 More detailed credibility intervals for selected individual ages are presented in Table 13. This clearly shows that, under the assumptions of the CBDX model, higher ages are predicted to have slower improvement rates than lower ages. If a longer run of data is considered appropriate, then the credibility intervals would probably narrow slightly reflecting the greater number of observations.
In the short run, consistent with the recent past, a projection of IMD-10 mortality might have a higher rate of improvement than the national population if one takes the view that there is a certain amount of persistence in mortality rate improvements in specific subgroups. 21 Counterbalancing this, short-term mortality rate improvement assumptions might reflect the recent (since around 2011) downturn in UK mortality rate improvements. Quantifying the extent of this downturn in the USS data is very difficult because of small-population noise. As a substitute, we present improvement rates over two time periods (2001-2011 and 2011-2016) for the IMD-10 group in Table 14. 20 The higher mean of 2.5% compared with Table 12 is the result of using a longer period: 2001-2016. The associated credibility interval is for the central improvement rate only, and does not include the stochastic component of the random walk, so the actual improvement rate over a given time horizon would be more uncertain. 21 Persistence would mean that if one subgroup has experienced, for example, 1% higher improvement rates over the last few years than another subgroup, then this difference might persist in the future, at least in the short term. Consideration of the improvement rate might also take account of information from other sources, as well as international comparisons (discussed below). For example, cause-of-death analysis or a downturn in health spending might help identify the reasons for the slow down in UK mortality rate improvements. We could then assess whether these factors can be expected to persist to the extent that shortor even long-term improvement rates will be lower than those experienced in the countries that are 'world leading'-in the sense of Oeppen and Vaupel [26] and Kleinow and Vellekoop [20]-for higher-age mortality rate improvements over the last 30 years.

International mortality trends
It is prudent to consider how patterns of mortality rate improvements vary in other countries to check whether they are in line with those in England. And, if they are out of line, should we alter our assumption for England to ensure that all countries (and hence the USS) have a common long-term improvement rate?
There has been considerable discussion of the slow down since around 2011 in mortality rate improvements in the UK. In Fig. 2, we saw that although different IMD groups have experienced different rates of mortality improvement, all 10 groups have, arguably, experienced a slow down.
On the other hand, compared with other countries, the UK has experienced higher average rates of improvements over a longer timespan. In particular, a comparison with Sweden and Japan is useful. Over the last 30 years, English and Welsh male mortality has improved at the rate of 2.6% per annum (ages 65-89), compared with 2.1% in Sweden and 1.7% in Japan. However, Sweden and Japan historically, and still do, have lower rates of mortality than England and Wales at higher ages, but have also experienced more steady rates of improvement over this period.
In Fig. 9, we illustrate how England and Wales (EW) 22 males mortality has fared relative to other developed countries over ages 65-89. We see different patterns of improvement in the six countries. Canada exhibited a similar slow down to EW. But Sweden and Japan, which already had low mortality, have not exhibited a slow down in recent years. This suggests that the EW slow down could be a local and temporary phenomenon. On the other hand, Sweden and Japan might be overdue for a slowdown, e.g., if the factors affecting EW are not the result of local drivers, but apply internationally, but with different lead times. Figure 9 also shows age standardised mortality rates for the English centiles (grey lines from 2001). This allows us to see that the spread between countries is much narrower than the spread between the centiles.

Summary of findings and recommendations
In this section, we summarise the findings for the USS from the previous sections: 22 We have used the larger EW population here as a longer run of mortality data are available for the combined population.

3
• The mortality rates of both male and female USS pensioners were found to be significantly lower than those for the least deprived (and longest-lived) of the national IMD deciles (IMD-10). On average across all post-retirement ages, USS male mortality rates are lower than the average of the 1% least deprived areas in England, while for females it is around the level of the 1-2% least deprived. In short, the USS has a distinctively different base mortality table from both the English national and IMD-10 populations and this observation should be useful for the purpose of valuing pension liabilities. • Mortality rate improvements for USS males and females were found not to be significantly different from the corresponding IMD-10 improvement rates. This is very useful when it comes to forecasting, since IMD-10 has the advantage of a greater volume of data (implying less small-population noise) and a longer run of data. • Many common longevity covariates which in different datasets have significant explanatory power turn out not to do so, or to be so strong, when it comes to USS members. A key example is post code or place of residence which can be associated with both region (e.g., N.E. England, London, etc.) and an IMD decile. We found that neither an individual's IMD decile nor the region where they live has any explanatory power, although there was a significant difference between the north and south of England. The implication of this is that the very strong homogeneity of the occupational group to which USS pensioners belong (i.e., university lecturers and senior administrators) overrides any other potential covariate available in the USS database as a predictor of an individual's mortality. 23 • There were significant differences between the mortality of male pensioners by pension amount, although only for those who retired on the first of the month (mostly persons retiring directly from active service).
Based on the tests carried out in this paper, our recommendations are as follows: • The USS should use its own base mortality tables for both males and females. • The short-term future mortality rate improvement for USS pensioners should be the same as that for IMD-10 (using the full experience of IMD-10 from 2001 onwards) and this should be different from (i.e., higher than) the short-term mortality rate improvement for the English national population. • The principle of coherence suggests that the long-term improvement assumption should be compatible with related populations including IMD-10, the English national population and other similar populations internationally. • The day of the month on which the member retires and the pension amount might help to predict the level of mortality for male USS pensioners.
CMI [12] also requires, as an input, an assumption about how quickly the improvement rate should shift between the short and long-term assumptions. However, this is beyond the scope of this paper and we leave it for future research.

Conclusions
In the process of examining the mortality experience of the UK Universities Superannuation Scheme, we have developed a general framework for analysing the mortality experience of a large portfolio of lives, such as those in any large pension scheme or annuity book. The framework has the following features and requirements: • It makes extensive use of a wide variety of graphical diagnostics. These should be seen as an essential part of the process for several reasons: -The graphics help to identify distinctive characteristics of the portfolio, such as, here, the day and month of retirement (and, in particular, the first-of-themonth effect), the swings in mean age at retirement, and concentrations of early retirements in specific years or in a narrow band of years. -Knowledge of these characteristics can be used to formulate hypotheses about the data or to group the data in sensible ways. For example, we were able to relate increases in retirement or concentrations of early retirements to specific legislative measures or changes in government policy.

3
-Graphical diagnostics also help to identify errors in the data (e.g., an error in the date of birth). -Some of the diagnostics plot the same data in slightly different ways, and, although, in some sense, there is nothing specifically new, the different perspectives can reveal additional nuances that we might not have identified previously.
• The framework makes use of a stochastic mortality model, such as the CBDX model, for the purpose of smoothing the mortality data and for forecasting future mortality rates. • The framework is designed to work with a variety of larger benchmark datasets that can be helpful, again using graphical diagnostics, for determining the base mortality table and for forecasting future mortality trends both in the short and long term: -A key requirement is to find a stable relationship between the mortality rates of the portfolio of lives of interest and those of one or more of these benchmark datasets. 24 We were able to find a stable relationship between the mortality data on USS pensioners and that of English mortality for one of the Index of Multiple Deprivation (IMD) deciles. The principal benefit of this is that data for IMD deciles, and in particular IMD-10, are much more comprehensive than USS in having a greater number of years of observations and a significantly larger number of lives (resulting in substantially lower smallpopulation noise). This can be exploited to help improve predictions about future USS mortality rates. -Of particular importance is the ability to map a member of the dataset to a particular socio-economic group or region whose mortality can also be modelled independently. As the previous bullet point makes clear, we were able to do this with the USS dataset by mapping a member's post code to a Lower Super Output Area and thence to the Index of Multiple Deprivation. This index is specific to England. However, there are other geodemographic classification techniques based on the notion of 'linking people to places' that are available, such as: -the Neighborhood Socioeconomic Status (NSES) Index in the US (see, e.g., [24]), -geodemographic profiling (see, e.g., [28]); this includes commercial organisations that collate multi-source information to produce sociodemographic measures (such as Experian's Mosaic consumer classification system, see, e.g., [30]), -customised socio-economic mortality indices (e.g., the Longevity Index for England (LIFE; [7]), 25 and -spatio-temporal datasets constructed from cell phone usage that can reveal human behavioural traits related to specific socio-economic characteris-tics, such as spending patterns in specific locations, which then need to be mapped to individuals in the data set, via post/zip code (see, e.g., [18]).
• Finally, it is important to recognise that different datasets will have their own idiosyncratic features which need to be teased out using the graphical tools of the general framework and then exploited to help set the base mortality table and to improve mortality forecasts. For example, our discovery of the exceptionally strong homogeneity of USS members combined with their high longevity can be fully exploited to improve the robustness of future mortality rate projections. Similarly, our finding that the day of the month on which a member retired had a significant impact on pension amount and life expectancy. Other data sets are likely to have a more heterogeneous membership, but if the socio-economic composition can be reliably determined then it may still be possible to get a good fix on the most appropriate base mortality table and to generate reliable forecasts of future mortality trends both in the short and long term. Figure 10 shows scatterplots of retirement date versus age at retirement for the financially important groups of primary pensioners (i.e., those who have retired in normal health) and widowed spouses (and partners). In the earlier years, for primary pensioners, we see concentrations of retirements at the beginning of October of each year, the day after the end of the previous academic year, whereas, more recently, primary retirements have become more spread out. There is a trend towards later retirement, probably in response to increasing healthy life expectancy and to changes in both scheme rules (including less favourable early retirement options) and legislation which raised both the normal minimum pension age from 50 to 55 (the Finance Act 2004) and the normal retirement age for both males and females to 66 (the Pensions Acts of 1995 and 2011) and outlawed age discrimination by removing age limits on retirement (the Equality Act 2010). The plots are denser for cohorts born after around 1946, probably reflecting the expansion of the university system in the 1960s and the recruitment of large numbers of young staff at that time.

Appendix: General characteristics of the pensioners dataset
This increase is perhaps clearer in Fig. 11, where we plot age at retirement versus date of birth. For pensioners who retired before 2005, the data are censored (i.e., records only exist for those who survived to 2005) and survivors are identified by grey dots. The lower left of each plot is, therefore, mostly blank because these early cohorts retired long before 2005 and mostly also died before 2005, so they are not included in the USS pensioners dataset. The upper right part of the plot is blank, as members have not yet attained the given age by 1 January 2017. The prominent diagonals correspond to retirements at the beginning of October each year.
In Fig. 11, we can identify the following features: • greater numbers of retirees in cohorts born after around 1946; • generally heavier rates of retirement around 1982-1984, 1988-1990, 1997 and 2010 (diagonal lines labelled DOR) corresponding to periods of significant reductions in staffing through early retirement and redundancy programmes. In July 1981, as part of the Thatcher government's austerity programme to reduce the national debt, the University Grants Committee reduced funding to UK universities over the following 3 years by 17% [17], while the Education Reform Act 1988 allowed universities to dismiss academics in permanent posts on the ground of redundancy, whereas previously they could only be dismissed for 'good cause', such as an inability or unwillingness to perform their duties, or gross misconduct. The increased retirements in 1997 and 2010 were the consequence of the introduction of market-driven forces in the university system, first, as a result of the Dearing Report [14] which recommended the abolition of student maintenance grants and the introduction of student Retirement patterns for males and females split by pensioner type: date of retirement versus age at retirement. Top: primary pensioners. Bottom: widowed spouse's age at the date of commencement of the spouse's pension; the retirement date is the commencement date of the widowed spouse's pension fees, and, second, following the Browne Report [3] which recommended the removal of the £3290 p.a. cap on fees, allowing universities to raise their fees to £9000, which reduced student demand for a range of courses. Figure 12 illustrates the changing patterns of retirement in a different way by plotting the mean age at retirement for primary pensioners. For pensioners who retired before 2005, the data are left-censored: only those who survived to the start of 2005 are included in the data. This introduces a bias in the (conditional) mean age at retirement which will be lower than the true or unconditional mean (younger retirees from a given cohort are more likely to survive to 2005). This bias will gradually increase as we move towards the left of the plot. In spite of this bias, there are three noteworthy features in the data: • the mean retirement age was initially relatively high (at around 63), with the male age being only slightly higher than the female age; • there were significant falls in the mean age around 1983 and 1989 corresponding to phases of increased early retirement as part of redundancy programmes (see the remarks above on Fig. 11); • there has been a steady increase in the mean age from 1997 onwards, with the male retirement age being around 2 years higher than the female age.
By the end of the sample period, the male retirement age was back to where it was at the start of the sample period, while the female age was still a little lower. It is hard to be sure why the retirement age was so high between 1975-1980 compared with the rest of the period, but we offer the following possible explanations. USS was still a relatively young scheme so pensions had not built up to adequate levels causing people to have to work longer, and there was also a much stronger sense of having a nation-wide common retirement age (65 for men and 60 for women) and people expected to work until these ages. It is also likely that the first early retirement programme of 1982-1984 opened academics eyes to early retirement being a new possibility for those tired of their jobs. Further, USS had a big surplus in the 1980s and 1990s allowing universities to offer generous early retirement packages. The changing patterns in age at retirement are illustrated further in Figs. 13 and 14, for females and males, respectively. For retirements in years 1981-2004, the input data are subject to left censoring: that is, only those who have survived to 2005 are included. In 1981 and 1985, retirements are concentrated around ages 60 and 65. In contrast, retirements in 1982 to 1984 are more spread out with a substantial increase in the total number of retirements. There is a similar pattern (especially for males) over the period 1988-1990 (spread similar to the 1982-1984 histograms) contrasted with 1987 and 1991 (concentrated around 60 and 65). These changes in the distribution of ages at retirement are reflected in the mean age at retirement plotted in Fig. 12. More generally, they show how patterns of retirement have changed over time. Although the normal retirement age is 65 for both males and females throughout the period covered by the data (it was not raised to 66 until 2020), the peak retirement ages are still 60 (females) and 65 (males), but these are less prominent than they used to be. The shift to later retirement is clear. Figure 15 reveals that average pensions for males are nearly double those of females, with relatively little change between the two 6-year periods. Figure 15 also shows that the shape of the distribution has changed quite considerably for males from one that was clearly bimodal-reflecting the different outcomes for short stayers and long stayers-to one that is more skewed towards small pensions.  Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.

Females Retiring in 2005−2010
Pension Amount