Human life is unlimited - but short

Does the human lifespan have an impenetrable biological upper limit which ultimately will stop further increase in life lengths? This question is important for understanding aging, and for society, and has led to intense controversies. Demographic data for humans has been interpreted as showing the existence of a limit, or even as an indication of a decreasing limit, but also as evidence that a limit does not exist. This paper studies what can be inferred from data about human mortality at extreme age. We show that in western countries and Japan, after age 110, the risk of dying is constant: the probability of surviving one more year is about 47%. Hence there is no finite upper limit to the human lifespan. Still, given the present stage of biotechnology, it is unlikely that during the next 25 years anyone will live longer than 128 years in these countries. Remarkably, the data show no difference in mortality after age 110 between sexes, between ages, or between different lifestyles or genetic backgrounds. These results, and the analysis methods developed in this paper, can help test biological theories of ageing and aid confirmation of success of efforts to find a cure for ageing.

in Dong et al. (2016) is unfounded and based on inappropriate use of statistics. Additional details are given in Section 4.
A similar claim that there is a "biologic barrier limiting further life-span progression" is made in Antero-Jacquemin et al. (2015). The claim is based on analysis methods which do not distinguish between a finite and an infinite limit, which do not take truncation caused by the sampling scheme into account (see discussion below), and which sometimes exclude the most extreme life lengths from analysis. Neither Dong et al. (2016) nor Antero-Jacquemin et al. (2015) defines the concept of a limit for the human lifespan.
Earlier papers (Weon and Je, 2009; Aarssen and de Haan, 1994; Wilmoth et al., 2000) argue both for and against a limit for the human lifespan. The arguments are based on analysis of data from the Netherlands or Sweden with maximum recorded ages of 112 years or less. Extrapolations from such data to conclusions about the existence, or not, of a limit of 122.45 years or more (122.45 years being the age at death of the longest-lived documented human, Jeanne Calment) are long, and thus uncertain. In a careful analysis Gampe (2010) studies survival after age 110 using non-parametric EM estimates in a life-table setting. Her results agree with the present paper and hence, specifically, provide important confirmation that the parametric models we use are appropriate. However, our parametric approach makes a much more detailed analysis possible: in particular, it addresses the question of a limit for the human lifespan, makes predictions about future record ages possible, and makes it possible to perform tests and to obtain reasonable confidence intervals. There is some consensus that "the notion of an intractable species-specific senescent death and species-specific maximum lifespan" has been refuted (Vaupel, 2010).
To summarize, the question of whether there is a limit to the human lifespan still has not received any convincing answer. In this paper we use the best available data, the International Database on Longevity (IDL, 2016), to give the currently best possible answer to the question, and to study how factors such as gender, age, lifestyle, or genetic background change, or do not change, human mortality at extreme age. We use statistical Extreme Value Theory, which is the appropriate statistical framework for the study of extreme ages.
Unvalidated data on ages of supercentenarians is known to be completely unreliable (Poulain, 2010). The IDL database contains validated life lengths of 668 supercentenarians from 15 countries, with "data collection performed in such a way that age-ascertainment bias is avoided". This means that validation was performed with equal effort for younger and older supercentenarians. The only other data on validated lifespans of supercentenarians, the GRG (2016) list, does not have a clearly specified plan for data collection, and is expected to be age-biased.
The sampling scheme in the IDL database is somewhat different for different countries, but typically IDL contains ages of persons who died in some time interval, say 1980-1999, at age exceeding 110 years. The time intervals are different for different countries. This sampling scheme leads to both left and right truncation of life lengths. The analysis in this paper throughout takes this truncation into account; for details see Section 3.3. This is important: analysis which does not do this properly would lead to crucially different results.
Extreme value statistics provides methods for analysis of the most extreme parts of data: extreme floods, extreme winds, extreme financial events, or extreme life lengths, here excess life length after age 110. The generalized Pareto (GP) distribution plays the same role for analysis of extreme excesses as the Gaussian distribution does in other parts of statistics (Coles, 2001; Beirlant et al., 2004). We summarize the main results of the analysis in Section 2. Details on data and statistical analysis are given in Section 3, and comments on the Dong et al. (2016) paper in Section 4. Section 5 contains a concluding discussion.

The human lifespan is unbounded, but short
Our aim is to estimate the "force of mortality" at extreme age. The force of mortality, or hazard rate, is the instantaneous rate of mortality so that, e.g., (assuming the measurement unit is year) the probability of dying tomorrow is approximately 1/365 times the force of mortality today. Formally, the force of mortality equals minus the derivative of the log survival function (here again with survival measured in years). We model excess life length (life length minus 110 years) of supercentenarians with a GP distribution. The GP model includes three different cases with distinct behaviors of the force of mortality. In the first case there is a finite age at which the force of mortality tends to infinity and beyond which survival is not possible; in this case lifespans have a finite limit. In the second case the force of mortality is constant, so that life is unlimited but short (more precisely, survival after age 110 is short). In the third case the force of mortality decreases with age, and life is unlimited. In the second case excess life length has an exponential distribution. The three cases are illustrated in Fig. 1. Additionally, Fig. 1 shows the importance of taking truncation caused by the sampling scheme into account.
Figure 1: Left: GP distribution fitted to IDL data for Japan, without taking truncation into account. Life appears to have a finite limit equal to 116.0 years. Middle: Exponential distribution fitted to all IDL data with validation level A, with truncation taken into account. Life is unlimited but short. Right: GP distribution fitted to IDL data for France, without truncation taken into account. Life appears to be unlimited. However, changing estimation for Japan and France so that truncation is taken into account leads to the conclusion that, in fact, also in these countries life is unlimited but short.
Likelihood ratio tests using the assumption of a GP distribution, or assuming an exponential distribution, show that although around 10 times as many women as men live to age 110 years, differences in mortality, if any, between women and men after age 110 are too small to be detectable from the IDL data. Likelihood ratio tests also did not detect any difference in survival between south and north Europe, between western Europe and north America, or between Japan and the other countries. Wald tests did not identify any differences in mortality above age 110 for earlier and later parts of the data, and likelihood ratio tests did not reject the hypothesis that survival after age 110 has an exponential distribution. The resulting model, that mortality after 110 years of age is constant, fits the data well (Tables 2-5, Fig. 4).
Thus, our conclusion is that in western Europe, north America, and Japan excess human life length after age 110 follows an exponential distribution (human life is unlimited, but short) and that survival is the same for women and men, during earlier periods and later ones, and in countries with very different lifestyles and genetic compositions. The mean of the exponential distribution was estimated to be 1.34 years, with 95% confidence interval (1.22, 1.46). Survival after 110 in these countries can hence be described as follows: each year a coin is tossed, and if heads comes up the person will live one more year; more precisely, the estimated probability of surviving one more year is 47%, with 95% confidence interval (0.44, 0.50).
The number of persons who live longer than 110 years is rapidly increasing (Fig. 2). Still, it is a very extreme event that a woman will live 110 years: the probability is only about 2 out of 100,000. For a man this probability is ten times smaller. Given an unchanged mortality after age 110 and a continued increase in the number of supercentenarians, it is likely that the record age which will be documented in western Europe, north America or Japan during the coming 25 years will exceed 119 years but will be shorter than 128 years (Fig. 3, Section 3.2). If the record falls outside this interval, it would be an indication that survival has changed.
Jeanne Calment lived 3.18 years longer than the second longest-living person in the IDL data, which seems extreme, and could indicate that she is an "outlier". Her life length is the longest of the 566 supercentenarians with validation level A, and the probability that the largest of 566 exponential variables with mean 1.34 is larger than 12.45 (her excess life length, 122.45-110) is 5%. Thus there is some evidence that her life length is close to being an outlier or, alternatively, that mortality decreases at extremely high age, say after age 118.
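The 5% figure for Jeanne Calment can be checked directly from the exponential model. The following is a minimal sketch using only the numbers quoted above (566 level-A life lengths, mean 1.34, excess 12.45); the function name is ours and is not part of LATool.

```python
import math

def prob_max_exceeds(x, n, sigma):
    """P(largest of n independent exponential variables with mean sigma > x)."""
    # log1p and expm1 keep precision when exp(-x / sigma) is very small.
    log_cdf_single = math.log1p(-math.exp(-x / sigma))
    return -math.expm1(n * log_cdf_single)

# Excess life length of Jeanne Calment: 122.45 - 110 = 12.45 years.
p_calment = prob_max_exceeds(x=122.45 - 110.0, n=566, sigma=1.34)
print(f"P(record excess > 12.45) = {p_calment:.3f}")
```

The result is close to 0.05, in agreement with the value stated in the text.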

International database on longevity
The IDL (2016) database contains validated supercentenarian life lengths from 15 countries. The inclusion criterion typically is that the death has occurred in a specified time interval, however with different time intervals for different countries, and with somewhat more complicated criteria for USA and Japan. There are two validation levels, A and B, with level A being the most thorough validation.
In the analysis of the IDL data we only used the 566 life lengths with validation level A. Countries were grouped geographically, in such a way that each group contains 64 or more supercentenarians; see Table 1. The grouping was mandated by the fact that data from several countries included too few supercentenarians to make it meaningful to analyse these countries separately. We excluded persons who died after December 31, 1999 in the USA, and persons who died in 1996 or after August 31, 2003 in Japan; see Section 3.3 below. In the sequel we write, for brevity, "Europe&America" for the data set which contains countries from western Europe, north America, and Australia, and "World" for the entire data set. To investigate the possibility of time trends in mortality of supercentenarians the data sets were split into two parts according to year of death. Except for World, the division was made so that the second part included more persons than the first one, but with the difference between the number of persons in the parts as small as possible.
Including the 32 persons in countries validated at level B (Belgium, Finland, Norway, Sweden, Switzerland) in the analysis did not change conclusions. We also did the analyses reported below for life lengths in excess of higher age thresholds: 110.5, 111, 111.5, and 112 years. This did not change any of the conclusions, although estimates of course were somewhat different, and confidence intervals were wider for higher thresholds. The estimates and tests obtained after including validation level B data and/or using higher thresholds are given by the MATLAB toolbox LATool which we have developed and which is available as Supplementary Material. Similarly, removing the largest age, Jeanne Calment's age, from the data did not change conclusions.

Statistical analysis
The GP distribution has cumulative distribution function $G(x) = 1 - (1 + \gamma x/\sigma)_+^{-1/\gamma}$ and force of mortality function $(1 + \gamma x/\sigma)_+^{-1}/\sigma$, where the subscript $+$ signifies that the expression in parentheses should be replaced by zero if negative, and where the case $\gamma = 0$ is interpreted as the limit $\gamma \to 0$, i.e. as the exponential distribution. Here $\gamma$ is a shape parameter, termed the extreme value index, and $\sigma$ is a scale parameter. The three different cases, limited life length, unlimited but short life length, and unlimited life length, correspond to $\gamma < 0$, $\gamma = 0$ and $\gamma > 0$, respectively.
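To make the three cases concrete, the following minimal sketch evaluates the GP distribution function and force of mortality. It is an illustration only and not part of LATool; the shape values -0.2 and 0.2 are hypothetical and chosen just to display the three behaviours, while the scale 1.34 is the estimate reported in Section 2.

```python
import math

def gp_cdf(x, gamma, sigma):
    """GP cumulative distribution function for x >= 0 and sigma > 0."""
    if gamma == 0.0:
        return 1.0 - math.exp(-x / sigma)        # exponential limit
    base = max(1.0 + gamma * x / sigma, 0.0)     # the (.)_+ operation
    if base == 0.0:
        return 1.0                               # past the finite endpoint (gamma < 0)
    return 1.0 - base ** (-1.0 / gamma)

def gp_hazard(x, gamma, sigma):
    """Force of mortality (1 + gamma * x / sigma)_+^{-1} / sigma."""
    base = max(1.0 + gamma * x / sigma, 0.0)
    if base == 0.0:
        return math.inf                          # survival is no longer possible
    return 1.0 / (sigma * base)

sigma = 1.34
for gamma in (-0.2, 0.0, 0.2):                   # hypothetical shape values
    hazards = [round(gp_hazard(x, gamma, sigma), 3) for x in (0.0, 2.0, 4.0)]
    print(f"gamma = {gamma:+.1f}: force of mortality at ages 110, 112, 114 = {hazards}")
```

For a negative shape the force of mortality increases and becomes infinite at the finite endpoint $-\sigma/\gamma$; for shape zero it is constant; for a positive shape it decreases.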
The statistical analyses were done using LATool, the MATLAB toolbox which is available as Supplementary Material. The toolbox uses maximum likelihood estimation which takes the truncation caused by the sampling scheme into account, see Section 3.3. Confidence intervals were computed using asymptotic normality. The two sets of p-values in Tables 2 and 4 correspond to the two possible testing strategies: first test using the GP distribution, and then in the end test for γ = 0, i.e. for an exponential distribution, or conversely start with testing for an exponential distribution and then do the other tests assuming an exponential distribution. Both strategies lead to the same conclusions.
Under the exponential model, the probability of surviving one more year is $e^{-1/\sigma}$. Inserting the estimate and confidence limits for $\sigma$ into this formula gives the estimate and confidence interval for the yearly survival probability.
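As a quick check of this arithmetic, the sketch below inserts the estimate 1.34 and the confidence limits (1.22, 1.46) from Section 2 into the formula. The variable names are ours.

```python
import math

def yearly_survival(sigma):
    """Probability of surviving one more year when excess life is exponential with mean sigma."""
    return math.exp(-1.0 / sigma)

sigma_hat, sigma_lower, sigma_upper = 1.34, 1.22, 1.46
estimate = yearly_survival(sigma_hat)
interval = (yearly_survival(sigma_lower), yearly_survival(sigma_upper))
print(f"yearly survival: {estimate:.2f}, interval ({interval[0]:.2f}, {interval[1]:.2f})")
```

Because the survival probability is increasing in the mean, the lower and upper limits for the mean map to the lower and upper limits for the yearly survival probability.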
The maximum, or record, of $n$ independent exponential random variables with mean $\sigma$ has cumulative distribution function $(1 - e^{-x/\sigma})^n$ and probability density function $n e^{-x/\sigma}(1 - e^{-x/\sigma})^{n-1}/\sigma$. Hence, to predict the record age in some time period, both the value of $\sigma$ (estimated above) and the value of $n$, the number of supercentenarians who will die in the period, are needed.
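For completeness, the two record formulas follow from independence in one step each. Here $X_1, \dots, X_n$ denote the excess life lengths, a notation introduced only for this derivation.

```latex
\begin{align*}
P\Bigl(\max_{1 \le i \le n} X_i \le x\Bigr)
  &= \prod_{i=1}^{n} P(X_i \le x)
   = \bigl(1 - e^{-x/\sigma}\bigr)^{n}, \\
\frac{d}{dx}\bigl(1 - e^{-x/\sigma}\bigr)^{n}
  &= \frac{n}{\sigma}\, e^{-x/\sigma}\,\bigl(1 - e^{-x/\sigma}\bigr)^{n-1}.
\end{align*}
```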
To estimate the trend in the number of supercentenarians using data from different countries, the data must cover the same time period. Such data is available for supercentenarians who died in the time period 1980-1999, for the countries Italy, England and Wales, and USA. Using Poisson regression with linear link function, the expected number of supercentenarians who will die in these three countries during the next 25 years, i.e. in 2018-2042, was estimated to be 1,690 with a 95% confidence interval (1,326, 2,054).
The ratio of the number of persons aged 100-104 in north America, western Europe and Japan in the year 2000 in the Human Mortality Database (HMD, 2016) to the same number for Italy, England and Wales, and USA is 1.76. We used this ratio, 1.76, as a proxy for the relative number of deaths of supercentenarians in these countries. Multiplying the estimate and confidence interval by 1.76 gave the estimate n = 2,974, with confidence interval (2,334, 3,615), for the expected number of supercentenarians who will die in the period 2018-2042 in north America, western Europe and Japan. Assuming independence of this interval and the 95% confidence interval for σ gives a joint confidence level of 90%.
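The scaling step and the joint confidence level are simple arithmetic; a minimal sketch, using only the numbers quoted above (the dictionary keys are our own labels):

```python
ratio = 1.76  # proxy for the relative number of supercentenarian deaths

# Poisson-regression results for Italy, England and Wales, and USA, 2018-2042.
three_countries = {"estimate": 1690, "lower": 1326, "upper": 2054}

# Scale to north America, western Europe and Japan.
scaled = {name: round(value * ratio) for name, value in three_countries.items()}

# Two independent 95% intervals (for n and for sigma) hold jointly with probability 0.95 * 0.95.
joint_level = 0.95 * 0.95

print(scaled)
print(f"joint confidence level: {joint_level:.4f}")
```

The product 0.95 x 0.95 = 0.9025 is what is reported as a joint level of 90%.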
The same computations using a log link (i.e. assuming an exponential growth of the number of supercentenarians) in the Poisson regression gave the estimate n = 18,717 with confidence interval (7,429, 46,962) obtained by simulating parameter values from the limiting normal distribution. It may be noted that these predictions only depend on changes in survival up to age 110 and are not affected by future fluctuations in birth rates: all supercentenarians who will die in the interval 2018-2042 are already born. Inserting x = age − 110 and the estimates σ = 1.34, n = 2,974 into the formula for the probability density function for the record gives the blue graph in Fig. 3, left panel. Instead using the lower confidence interval bounds σ = 1.22, n = 2,334 gives a lower bound probability density function: this is the green graph in Fig. 3, left. The upper bound, and the graphs in Fig. 3, right, are computed similarly.
Using the formula for the cumulative distribution function of the record age and the linear estimate of n gives the estimate 3% for the probability that the record age is lower than 119. Instead inserting the lower bounds of the confidence intervals into the formula gives the upper bound estimate 23% for this probability. Similar computations, using the exponential growth estimate of n, give the estimate 3% for the probability that the record age exceeds 128 years, and the upper bound 19% for this probability. This shows that it is likely that the record age will fall in the interval 119-128 years. These results are obtained assuming that present mortality and growth of the number of supercentenarians remain unchanged.
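These four probabilities can be reproduced from the record distribution function. The sketch below is an illustration rather than the LATool computation. It assumes that the upper bound for exceeding 128 years is obtained from the upper confidence limits σ = 1.46 and n = 46,962; the text describes this step only as "similar computations".

```python
import math

def record_cdf(age, n, sigma, threshold=110.0):
    """P(record age <= age) for n independent exponential excess lifetimes with mean sigma."""
    x = age - threshold
    return math.exp(n * math.log1p(-math.exp(-x / sigma)))

# Linear growth of the number of supercentenarians: record below 119 years.
p_below_119 = record_cdf(119.0, n=2974, sigma=1.34)
p_below_119_bound = record_cdf(119.0, n=2334, sigma=1.22)    # lower confidence limits

# Exponential growth: record above 128 years.
p_above_128 = 1.0 - record_cdf(128.0, n=18717, sigma=1.34)
p_above_128_bound = 1.0 - record_cdf(128.0, n=46962, sigma=1.46)  # assumed: upper limits

for label, p in [("P(record < 119)", p_below_119),
                 ("upper bound", p_below_119_bound),
                 ("P(record > 128)", p_above_128),
                 ("upper bound", p_above_128_bound)]:
    print(f"{label}: {p:.2f}")
```

Rounded to two decimals these agree with the 3%, 23%, 3% and 19% quoted in the text.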
There are only 10 persons in the World data set who lived longer than 115 years, and hence little data to confirm an exponential distribution beyond this age. Still these 10 data points are in agreement with an exponential distribution (Fig. 4). In view of predictions of very rapid growth of the number of centenarians (Thatcher, 2010), the results based on the exponential regression for n may capture the most likely development.
In the computations above we, for simplicity of exposition, used the expected rather than the actual, random, number of supercentenarians. However, assuming a Poisson distribution with mean $n$ for the number $N$ of supercentenarians who die in 2018-2042, the cumulative distribution function of the record age is $E[(1 - e^{-x/\sigma})^N] = \exp(-n e^{-x/\sigma})$. It can be seen by numerical computation that for the parameter values used above the difference between this expression and $(1 - e^{-x/\sigma})^n$ is less than one unit in the second decimal. The same holds for the densities obtained by differentiation of the two formulas.
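The numerical comparison can be sketched as follows. For a Poisson number of deaths with mean n, the record distribution function is exp(−n e^{−x/σ}), which follows from the Poisson probability generating function. The grid of excess ages and the function names are our own choices for illustration.

```python
import math

def record_cdf_fixed(x, n, sigma):
    """Record cdf when exactly n supercentenarians die in the period."""
    return math.exp(n * math.log1p(-math.exp(-x / sigma)))

def record_cdf_poisson(x, n, sigma):
    """Record cdf when the number of deaths is Poisson with mean n."""
    return math.exp(-n * math.exp(-x / sigma))

# Parameter pairs (n, sigma) used in the text.
settings = [(2974, 1.34), (2334, 1.22), (18717, 1.34), (46962, 1.46)]

# Excess ages from 0.01 to 30 years in steps of 0.01 (an arbitrary grid).
grid = [0.01 * k for k in range(1, 3001)]

max_gap = max(
    abs(record_cdf_fixed(x, n, sigma) - record_cdf_poisson(x, n, sigma))
    for n, sigma in settings
    for x in grid
)
print(f"largest difference between the two cdfs: {max_gap:.1e}")
```

On this grid the largest difference is far below 0.01, consistent with the statement that the two expressions agree to within one unit in the second decimal.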

The IDL sampling scheme: left and right truncation
The two exponential qq-plots in Figure 5 are quite different, and illustrate that the sampling scheme has to be taken into account in the analysis. Specifically, the explanation of the difference between the plots is not that US mortality changed at the end of 1999, but instead that the sampling schemes for the 1980-1999 data and the 2000-2003 data were different. The 1980-1999 data consists of ages of supercentenarians who died in this interval, while the 2000-2003 data consists of supercentenarians who were alive at the end of 1999 but died before the middle of 2003. The effects of the 1980-1999 sampling scheme are discussed below. The 2000-2003 sampling scheme leads to two kinds of biases: persons who have longer lives are more likely to be alive at any given point of time, say December 31, 1999, but on the other hand, persons with very long life are not included in the data set because they have not died before the middle of 2003. These effects are different from those caused by the 1980-1999 sampling scheme, see below. Simulations confirmed that these sampling schemes can produce qq-plots like those in Fig. 5. A further illustration of the importance of taking the sampling scheme for the IDL data into account is given by Figure 1.
In principle, the bias caused by the different sampling scheme used for the USA 2000-2003 data could be taken into account, and also this part of the data could be used. However, the information given about the sampling plan for IDL supercentenarians who died in the USA after 1999 was not sufficient to make it possible for us to do this. Hence, supercentenarians who died in the USA after December 31, 1999, have been excluded from the analysis. For similar reasons, supercentenarians who died in Japan in 1996, or after August 31, 2003, have also been excluded from the analysis. Additionally, for reasons of confidentiality, for US supercentenarians only the year of death, and not the exact death date, is given. In the analysis we assumed that all US supercentenarian deaths occurred on July 2. However, conclusions didn't change if we instead set the death dates to January 1, or to December 31.
If the sampling scheme is that a cohort, all persons who are born in some time interval, is followed until all persons in it are dead, then there is no bias in the observed ages. Similarly, if a population is stable and there are no trends in the probability of becoming a supercentenarian, then recorded ages of supercentenarians who died in some time interval, say 1980-1999, are not subject to bias. However, the IDL database does not follow cohorts until they are extinct, and the probability of living until age 110 is increasing with time. This increase, if not taken into account, would lead to bias in the estimation of life length from the IDL data: small excess life lengths are less likely to be included in the beginning of the interval but more likely to be included in the end of the interval, and there are more deaths in the end. In the estimates and tests performed here, we have taken this into account by letting the likelihood contribution from a person be conditional on the time of achieving age 110 and on the death date being contained in the observation interval, as follows.
Suppose that a sample consists of all supercentenarian deaths in some country during the time interval $(b, e)$. Let $\{t_i\}$ be the times of achieving age 110 and $\{x_i\}$ be the corresponding excess ages at death. Further suppose that excess life lengths have cumulative distribution function $F$ and probability density function $f$, where in our analysis $F$ is either a GP distribution function or an exponential distribution function. Then supercentenarian $i$ is included in the sample if one of the following two conditions is satisfied: a) $t_i \le b$ and $b \le t_i + x_i < e$, or b) $t_i > b$ and $t_i + x_i < e$.
In case a) the excess life length of supercentenarian $i$ is truncated so that it is included in the sample only if $b - t_i \le x_i < e - t_i$. Hence the likelihood contribution from the excess life length of a supercentenarian in the sample whose 110th birthday was before the beginning of the sampling time interval, i.e. with $t_i \le b$, is $f(x_i) / \{F(e - t_i) - F(b - t_i)\}$. In case b), where the 110th birthday occurred after the beginning of the sampling interval, $t_i > b$, the excess life length is truncated so that $x_i < e - t_i$, and the likelihood contribution from supercentenarian $i$ is $f(x_i) / F(e - t_i)$. The full likelihood is then obtained by taking the product of the likelihood contributions for all supercentenarians and countries included in an analysis.
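The effect of conditioning on the observation window can be illustrated by a small simulation. The sketch below is not the LATool implementation. It assumes exponential excess lifetimes, a hypothetical linearly increasing rate of persons reaching age 110 between 1965 and 2000, and the window 1980-2000. It maximises the truncated likelihood, in which each density value is divided by the probability that the death falls in the window, with a simple golden-section search.

```python
import math
import random

def neg_log_lik(sigma, sample, b, e):
    """Negative truncated log-likelihood; sample holds pairs (t, x)."""
    total = 0.0
    for t, x in sample:
        lower = max(b - t, 0.0)  # b - t in case a) (t <= b), zero in case b)
        upper = e - t
        log_f = -math.log(sigma) - x / sigma
        window_prob = math.exp(-lower / sigma) - math.exp(-upper / sigma)
        total -= log_f - math.log(window_prob)
    return total

def golden_section_min(func, lo, hi, tol=1e-4):
    """Minimiser of a function assumed unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

random.seed(1)
b, e, sigma_true = 1980.0, 2000.0, 1.34   # hypothetical window, true mean
sample = []
for _ in range(5000):
    # Time of reaching age 110, with density increasing linearly up to 2000.
    t = random.triangular(1965.0, 2000.0, 2000.0)
    x = random.expovariate(1.0 / sigma_true)
    if b <= t + x < e:                    # kept only if death falls in the window
        sample.append((t, x))

sigma_hat = golden_section_min(lambda s: neg_log_lik(s, sample, b, e), 0.2, 5.0)
naive_mean = sum(x for _, x in sample) / len(sample)
print(f"kept {len(sample)} of 5000, truncated MLE {sigma_hat:.3f}, naive mean {naive_mean:.3f}")
```

In this setup the naive sample mean is expected to underestimate the true mean, because short excess lifetimes are over-represented when the number of new supercentenarians grows over the window, while the truncated likelihood corrects for this.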
Copyright © 2017 by Dmitrii Zholud, www.zholud.com
Estimation using truncated and/or censored observations, such as the truncated observations in the IDL database, is of central interest in survival analysis and in analysis of failure time data, see e.g. Andersen et al. (1992) and Kalbfleisch and Prentice (2002) for much more information and asymptotic theory. However, the particular truncation setting above does not seem to be considered explicitly in these books. Instead, the formulas are contained in Gampe (2010).
We have not corrected for selection bias in the qq-plots, and in fact, such corrections do not seem to be available in the literature.
Comments on X. Dong, B. Milholland, J. Vijg, Nature 538, 257 (2016)
The plot is based on IDL data for England & Wales, France, USA, and Japan. The data sets for the different countries cover different time periods, and as a result of this the yearly number of supercentenarian deaths in the plot varies from 0 to 42. The yearly maximum reported age at death follows the same pattern as the number of supercentenarian deaths, and in particular the decline in the maximum recorded age at death after 1995 follows the decline in the yearly number of deaths. That the yearly maximum reported age at death shows the same pattern as the yearly number of deaths is completely as expected: the maximum of many lifespans is stochastically larger than the maximum of few lifespans, and hence the maximum age is likely to be larger for years with many deaths. The right plot in Fig. 6, instead of the yearly number of deaths $n_t$, adds the mean of the maximum of $n_t$ independent exponential variables, with parameter 1.35 estimated from these countries. This does not take truncation into account, but still, again as expected, the mean and the regression lines agree well. Fig. 7 shows the same plots as in Fig. 6, left, done separately for individual countries. There is no evidence of a trend break in these plots.
Figure 6: Yearly maximum reported age at death for supercentenarians. Combined data consisting of England & Wales 1968-2006, France 1987-2003, USA 1980-2003, and Japan 1996-2005.
Thus, in conclusion, the apparent trend break in Fig. 2.a in Dong et al. (2016) is an artifact caused by inappropriate combination of data from different time periods. Fig. 2.b of Dong et al. (2016) adds the 2nd to 5th highest reported age at death to Fig. 2.a. The pattern is the same as in Fig. 2.a, and the explanation is the same: inappropriate combination of data from different time periods. Further, the statement in Dong et al. (2016) that "the annual average age at death for these supercentenarians has not increased since 1968 (Fig. 2.c)" does not imply that there is a limit. Again: records increase as more attempts to break them are made, even without a change in the underlying distribution. The conclusion "our data strongly suggest that the duration of life is limited" in Dong et al. (2016) is based on wrong and misleading analysis.
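The dependence of the yearly maximum on the yearly number of deaths can be made explicit: the mean of the maximum of n independent exponential variables with mean σ equals σ(1 + 1/2 + ... + 1/n). The following minimal sketch assumes that the parameter 1.35 quoted above is the mean of the exponential distribution and ignores truncation; the chosen numbers of deaths are illustrative values within the range 0 to 42 mentioned in the text.

```python
def expected_max(n, sigma):
    """Mean of the maximum of n independent exponentials with mean sigma: sigma * H_n."""
    return sigma * sum(1.0 / k for k in range(1, n + 1))

sigma = 1.35  # assumed to be the exponential mean estimated from these countries
for n_deaths in (1, 5, 20, 42):
    print(f"{n_deaths:2d} deaths in a year: expected maximum age "
          f"{110.0 + expected_max(n_deaths, sigma):.2f}")
```

The expected yearly maximum rises with the number of deaths even though the underlying distribution is the same in every year, which is the mechanism behind the apparent trend break.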

Discussion
The fact that the data show no difference in survival at extreme age between women and men, between persons with different lifestyles or genetic backgrounds, between different time periods, or between different ages (say, between 110-year-olds and 115-year-olds) is surprising and remarkable, and can inform the search for demographic or biological theories of aging. In particular, the fact that lifestyle factors which are important for survival at younger ages cease to play a role after age 110 is of both biological and popular interest.
Box 1 of Vaupel (2010) gives an introduction to the important explanation of mortality at high age as resulting from a mixture of subpopulations with different frailty. Presumably the composition of subpopulations of humans reaching 110 years of age differs, at least somewhat, between persons who lived earlier and who lived later, or between countries. If the frailty hypothesis is correct, mortality should then be different in at least some of the countries or in one of the time periods. However, we did not find any such differences. This weakens the case for the frailty explanation.
A fundamental hypothesis on how we age is that "the rate at which the chance of death increases with age for humans is a basic biological constant, very similar and perhaps invariant across individuals and over time", see Vaupel (2010). For mortality after 110, this would mean that the distribution of excess life length in the future should still be exponential, but with lower yearly probability of death. The methods developed in this paper provide an efficient way of checking if this indeed is the case: power calculations show that an increase of 5% or more in yearly survival in a new data set of the same size as the IDL data would be detectable. Substantial efforts to find "a cure for aging" and in "engineering better humans" with the aim of extending the length of human life, perhaps indefinitely, are underway, see e.g. Vijg and Campisi (2008), Longo et al. (2015), Häggström (2016). Success of such efforts could happen either in a dramatic and obvious way, or gradually. In the latter case the results of this paper, together with similar analyses of data to be collected in the future, will aid early confirmation of success of the efforts.
Supplementary materials: LATool, a MATLAB toolbox for life length analysis which makes it possible to obtain more detailed results and do alternative analyses. Available at www.zholud.com.