Inference on the endpoint of human lifespan and its inherent statistical difficulty

Discussion on the paper by Holger Rootzén and Dmitrii Zholud


Abstract

We offer an inference methodology for the upper endpoint of a regularly varying distribution with finite endpoint. We apply it to the IDL and GRG data sets of lifespans of super-centenarians. As in the comprehensive analysis of Rootzén and Zholud, our results underscore the effect of the data sampling scheme and censoring on the conclusions. We also quantify the statistical difficulty of distinguishing between the hypotheses of finite and infinite lifespan by providing estimates of the required sample size.


References

  • The Holy Bible: King James Version, American Edition. https://www.bible.com/bible/547/GEN.6.kjva (2004)

  • Andersen, S.L., Sebastiani, P., Dworkis, D.A., Feldman, L., Perls, T.T.: Health span approximates life span among many supercentenarians: Compression of morbidity at the approximate limit of life span. J. Gerontol. Ser. A 67A(4), 395–405 (2012)

  • Beltrán-Sánchez, H., Razak, F., Subramanian, S.V.: Going beyond the disability-based morbidity definition in the compression of morbidity framework. Glob. Health Action, 7(1) (2014)

  • Bhattacharya, S., Kallitsis, M., Stoev, S.: Trimming the Hill estimator: robustness, optimality and adaptivity. ArXiv e-prints (2017)

  • Bickel, P.J., Ritov, Y., Stoker, T.M.: Tailor-made tests for goodness of fit to semiparametric hypotheses. Ann. Statist. 34(2), 721–741 (2006)

  • Bingham, N.H., Goldie, C.M., Teugels, J.L.: Regular variation. Cambridge University Press, Cambridge (1987)

  • Davison, A.C.: ‘The life of man, solitary, poor, nasty, brutish, and short’ Discussion of the paper by Rootzén and Zholud. Extremes. In this volume (2018)

  • Dong, X., Milholland, B., Vijg, J.: Evidence of a limit to human lifespan. Nature 538, 257–259 (2016)

  • Fries, J.F.: Aging, natural death, and the compression of morbidity. N. Engl. J. Med. 303(3), 130–135 (1980)

  • IDL: International Database on Longevity (2018). http://supercentenarians.org/DataBase

  • Le Cam, L., Yang, G.L.: Asymptotics in Statistics: Some Basic Concepts. Springer Series in Statistics, 2nd edn. Springer, New York (2000)

  • Resnick, S.I.: Extreme Values, Regular Variation and Point Processes. Springer, New York (1987)

  • Rootzén, H., Zholud, D.: Human life is unlimited – but short. Extremes. Discussion paper (2018)

  • Stoev, S., Bhattacharya, S.: Matlab code accompanying the paper: ‘Inference on the endpoint of human lifespan and its inherent statistical difficulty’. http://hdl.handle.net/2027.42/142999 (2018)

  • Swartz, A.: James Fries: Healthy Aging Pioneer. Amer. J. Publ. Health 98(7), 1163–1166 (2008)

Acknowledgements

We thank Thomas Mikosch, Holger Rootzén, and Dmitrii Zholud for the opportunity to engage in a stimulating research discussion. We are also grateful to Ya’acov Ritov for enlightening discussions on Statistics, Philosophy, and The Bible. SS and SB were partially funded by the NSF grant DMS-1462368.

Author information

Corresponding author

Correspondence to Stilian A. Stoev.

Appendices

Appendix A: Proofs

Proof of Proposition 1

Observe that \(Z_{i} = {V}_{i}^{-\xi }\), where \(V_{i},\ i = 1,\dots,n\), are iid Uniform(0, 1). A version of the Rényi representation entails that

$$(V_{(n,n)}, V_{(n-1,n)},\dots,V_{(1,n)}) \overset{d}{=} \left( \frac{{\Gamma}_{1}}{{\Gamma}_{n + 1}}, \frac{{\Gamma}_{2}}{{\Gamma}_{n + 1}}, \dots, \frac{{\Gamma}_{n}}{{\Gamma}_{n + 1}}\right). $$

Using this representation and the fact that \(Z_{(i,n)} = {V}_{(n-i + 1,n)}^{-\xi }\), from Eq. 4, we obtain

$$ \{\widehat{\xi}_{k_{0},k}(n),\ 0\le k_{0}<k<n \} \overset{d}{=} \left\{ - \frac{\xi}{k-k_{0}} \sum\limits_{i=k_{0}+ 1}^{k} i \log \left( \frac{{\Gamma}_{i}}{{\Gamma}_{i + 1}}\right),\ 0\le k_{0}<k<n \right\}. $$
(14)

Note that the random variables \({\Gamma}_{i}/{\Gamma}_{i + 1},\ i = 1,\dots,n-1\), are independent. Indeed, this follows from the fact that for all \(k\),

$$\left( \frac{{\Gamma}_{1}}{{\Gamma}_{k}},\dots,\frac{{\Gamma}_{k-1}}{{\Gamma}_{k}}\right) \ \ \text{ and } \ \ {\Gamma}_{k} $$

are independent. Furthermore, \({\Gamma}_{i}/{\Gamma}_{i + 1}\) has the Beta(i, 1) distribution and hence \(W_{i} := ({\Gamma}_{i}/{\Gamma}_{i + 1})^{i}\) is Uniform(0, 1). This implies that

$$- i \log\left( \frac{{\Gamma}_{i}}{{\Gamma}_{i + 1}}\right) = - \log(W_{i}),\ i = 1,\dots,n-1, $$

are iid standard exponential, which in view of Eq. 14 yields (6).
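The passage from the Beta(i, 1) law to the standard exponential law can be checked in two lines. The following is a short worked verification using only the facts stated above; the symbol \(B\) is merely shorthand introduced here for \({\Gamma}_{i}/{\Gamma}_{i + 1}\), and it relies on the standard fact that the Beta(i, 1) distribution function is \(x^{i}\) on \((0,1)\).

$$ \mathbb{P}(W_{i} \le w) = \mathbb{P}\left( B \le w^{1/i}\right) = \left( w^{1/i}\right)^{i} = w, \quad w\in(0,1), \qquad \text{so that} \qquad \mathbb{P}\left( -\log(W_{i}) > t\right) = \mathbb{P}\left( W_{i} < e^{-t}\right) = e^{-t}, \quad t\ge 0. $$

Thus each \(-\log(W_{i})\) is standard exponential, and the independence of the ratios carries over to the \(W_{i}\)'s.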

By part (i) established above, for a fixed \(k_{0}\), we have that

$$\{ (k-k_{0})\widehat{\xi}_{k_{0},k}(n),\ k=k_{0}+ 1,\dots,n-1\} \overset{d}{=} \xi \left\{ {\Gamma}_{k-k_{0}},\ k=k_{0}+ 1,\dots,n-1\right\}. $$

Therefore, for the statistics defined in Eq. 7, we obtain

$$\begin{array}{@{}rcl@{}} \{U_{k_{0},k}(n),\ k=k_{0}+ 1,\dots,n-2 \} &=& \left\{ \left( \frac{(k-k_{0})\widehat{\xi}_{k_{0},k}(n)}{(k-k_{0}+ 1)\widehat{\xi}_{k_{0},k+1}(n)} \right)^{k-k_{0}},\ k = k_{0}+ 1,\dots,n-2 \right\}\\ &\overset{d}{=}& \left\{ \left( \frac{{\Gamma}_{k-k_{0}}}{{\Gamma}_{k-k_{0}+ 1}} \right)^{k-k_{0}},\ k=k_{0}+ 1,\dots,n-2 \right\}. \end{array} $$

As argued above, the random variables \(({\Gamma}_{i}/{\Gamma}_{i + 1})^{i},\ i = 1, 2, \dots\), are iid Uniform(0, 1), which proves (7).
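The uniformity of the \(U_{k_{0},k}(n)\) statistics can be illustrated numerically. The sketch below is not the authors' Matlab code; it assumes that the estimator has the form read off from Eq. 14, namely \(\widehat{\xi}_{k_{0},k}(n) = (k-k_{0})^{-1}{\sum}_{i=k_{0}+1}^{k} i \log (Z_{(i,n)}/Z_{(i+1,n)})\) with order statistics in decreasing order, and that \(U_{k_{0},k}(n)\) is the ratio of \((k-k_{0})\widehat{\xi}_{k_{0},k}(n)\) to \((k-k_{0}+1)\widehat{\xi}_{k_{0},k+1}(n)\) raised to the power \(k-k_{0}\), which is the form the distributional identity requires. The function name `u_statistics` and the simulation settings are hypothetical.

```python
import numpy as np

def u_statistics(z, k0=0):
    """Return U_{k0,k}(n) for k = k0+1, ..., n-2 (assumed form, see lead-in)."""
    z_desc = np.sort(np.asarray(z, dtype=float))[::-1]   # Z_(1,n) >= ... >= Z_(n,n)
    n = z_desc.size
    i = np.arange(k0 + 1, n)                             # i = k0+1, ..., n-1
    # Weighted log-spacings i * log(Z_(i,n) / Z_(i+1,n)).
    weighted = i * np.log(z_desc[i - 1] / z_desc[i])
    # s[j-1] = (k - k0) * xi_hat_{k0,k}(n) with j = k - k0.
    s = np.cumsum(weighted)
    j = np.arange(1, s.size)                             # j = 1, ..., n-k0-2
    return (s[:-1] / s[1:]) ** j

# Exact Pareto sample Z_i = V_i^(-xi), as in the proof of Proposition 1.
rng = np.random.default_rng(0)
xi = 0.5                                                 # illustrative value only
z = rng.uniform(size=2000) ** (-xi)
u = u_statistics(z)
print("mean of U:", u.mean(), " variance of U:", u.var())
```

For an exact Pareto sample the printed mean and variance should be close to 1/2 and 1/12, the moments of the Uniform(0, 1) law; systematic departures in real data are what the scatter-plots of these statistics are meant to reveal.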

Proof of Relation (8)

We have

$$\begin{array}{@{}rcl@{}} \left( Q\left( \frac{{\Gamma}_{1}}{{\Gamma}_{n + 1}}\right),\ldots, Q\left( \frac{{\Gamma}_{k}}{{\Gamma}_{n + 1}}\right) \right)&=&Q\!\left( \frac{{\Gamma}_{k + 1}}{{\Gamma}_{n + 1}}\right)\left( Q\left( \frac{{\Gamma}_{1}}{{\Gamma}_{n + 1}}\right)/Q\left( \frac{{\Gamma}_{k + 1}}{{\Gamma}_{n + 1}}\right),\ldots,\right.\\ &&\left. Q\!\left( \frac{{\Gamma}_{k}}{{\Gamma}_{n + 1}}\right)/Q\left( \frac{{\Gamma}_{k + 1}}{{\Gamma}_{n + 1}}\right) \right)\\ &=&Q\!\left( \frac{{\Gamma}_{k + 1}}{{\Gamma}_{n + 1}}\right)\left( \frac{\ell({\Gamma}_{1}/{\Gamma}_{n + 1})}{\ell({\Gamma}_{k + 1}/{\Gamma}_{n + 1})}\left( \frac{{\Gamma}_{1}}{{\Gamma}_{k + 1}}\right)^{-\xi} , \cdots,\right.\\ && \left.\frac{\ell({\Gamma}_{k}/{\Gamma}_{n + 1})}{\ell({\Gamma}_{k + 1}/{\Gamma}_{n + 1})}\left( \frac{{\Gamma}_{k}}{{\Gamma}_{k + 1}}\right)^{-\xi}\right). \end{array} $$

By the Strong Law of Large Numbers, we have \({\Gamma }_{k + 1}/{\Gamma }_{n + 1}\stackrel {a.s.}{\longrightarrow }0\) as \(n\to\infty\), and hence by the slow variation property of \(\ell\), for all fixed \(k\) and \(i = 1,\dots,k\), we have

$$\frac{\ell({\Gamma}_{i}/{\Gamma}_{n + 1})}{\ell({\Gamma}_{k + 1}/{\Gamma}_{n + 1})} = \frac{\ell(({\Gamma}_{i}/{\Gamma}_{k + 1}) ({\Gamma}_{k + 1}/{\Gamma}_{n + 1}))}{\ell({\Gamma}_{k + 1}/{\Gamma}_{n + 1})} \stackrel{a.s.}{\longrightarrow}1. $$

This yields (8).
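As a concrete illustration of the slow variation step, consider the hypothetical choice \(\ell(u) = \log(1/u)\), \(u\in(0,1)\), which is not taken from the paper and serves only as an example of a function that is slowly varying at zero. For any fixed \(c>0\),

$$ \frac{\ell(cu)}{\ell(u)} = \frac{\log(1/u) - \log(c)}{\log(1/u)} = 1 - \frac{\log(c)}{\log(1/u)} \longrightarrow 1, \quad \text{as } u\downarrow 0. $$

In the argument above, \(c\) plays the role of \({\Gamma}_{i}/{\Gamma}_{k + 1}\), which does not depend on \(n\), and \(u\) plays the role of \({\Gamma}_{k + 1}/{\Gamma}_{n + 1}\), which vanishes almost surely.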

Appendix B: The need for dithering

Table 3 shows that each of the three longevity data sets involves a fair number of identical ages. This digitization effect arises because human lifetimes are reported as an integer number of days. Indeed, the period of 12 years and 164 days (the excess lifetime of Jeanne Calment over the super-centenarian threshold of 110 years) comprises approximately \(m = 4{,}547\) days (assuming the year has 365.25 days). If one samples uniformly at random \(n = 631\) excess ages (in integer number of days) from \(\{1,\dots,m\}\), then the expected number of distinct ages in this sample is \(m\,(1 - (m-1)^{n}/m^{n}) \approx 589.23\). The non-uniform excess age distribution leads to fewer distinct values, but this ballpark computation explains the source of the seemingly odd digitization effect. The digitization effect in and of itself does not indicate issues with the sampling scheme, but it leads to many ties among the order statistics, which affect the empirical distribution of the \(U_{0,j}\)’s (Fig. 4). Before we can apply the proposed methodology, we need to fix this problem. We do so with dithering, i.e., to each lifetime \(X_{i}\) we add a random time of day at which the person departed. Formally, we consider the dithered sample \({X}_{i}^{*} := X_{i} + {\Delta }_{i},\ i = 1,\dots ,n\), where the \({\Delta}_{i}\)’s are independent and uniformly distributed in the interval \([-0.5/365.25, +0.5/365.25]\). Such dithering has virtually no effect on the distribution of the excess lifetimes, but it eliminates the large number of ties among the order statistics and corrects for the odd digitization effect on the scatter-plots of the \(U_{0,j}\)-statistics (see Fig. 4).
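The ballpark computation and the dithering step can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names `expected_distinct` and `dither` are hypothetical, and the ages are synthetic draws from the uniform model described above rather than the IDL or GRG data.

```python
import numpy as np

def expected_distinct(m, n):
    """Expected number of distinct values among n uniform draws from {1, ..., m}."""
    return m * (1.0 - ((m - 1) / m) ** n)

def dither(ages_years, rng, days_per_year=365.25):
    """Add independent Uniform[-0.5, 0.5] day offsets (expressed in years)."""
    ages_years = np.asarray(ages_years, dtype=float)
    half_day = 0.5 / days_per_year
    return ages_years + rng.uniform(-half_day, half_day, size=ages_years.shape)

m, n = 4547, 631
# About 589.23, the figure quoted in the text.
print("expected distinct ages:", round(expected_distinct(m, n), 2))

# Synthetic ages (assumption): 110 years plus a uniform integer number of days.
rng = np.random.default_rng(0)
raw = 110.0 + rng.integers(1, m + 1, size=n) / 365.25
dithered = dither(raw, rng)
print("distinct raw ages:", np.unique(raw).size)
print("distinct dithered ages:", np.unique(dithered).size)
```

Because each offset is at most half a day in absolute value, the dithered ages round back to the recorded day, while ties among them occur with probability zero.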

Table 3 Sample sizes and corresponding numbers of unique numerical values for each data set
Fig. 4 Scatter-plots of the \(U_{0,j}\) statistics (10) based on the raw (left panel) and dithered (middle panel) type A data. The right panel shows a quantile-quantile plot of the raw versus the dithered data

Cite this article

Stoev, S.A., Bhattacharya, S. Inference on the endpoint of human lifespan and its inherent statistical difficulty. Extremes 21, 391–404 (2018). https://doi.org/10.1007/s10687-018-0320-1

  • DOI: https://doi.org/10.1007/s10687-018-0320-1
