Life expectancy improvement for multiple cure distributions

In many circumstances, the increase in life expectancy when certain causes of death are eliminated is sought. These calculations are typically based on the assumption that the causes in question are simply omitted, which is equivalent to the causes being taken out of consideration, from the outset, with certainty. In this paper, we propose models whereby probability distributions for the cures of specific causes of death over time can be incorporated so as to more accurately predict the increase in life expectancy that would ensue. The theoretical results are applied to a real data set involving Diabetes and HIV-related deaths from Denver, Colorado, United States of America, between the years 1990 and 2015 inclusive.


Introduction
Modeling the increase in life expectancy when a certain decrement 1 is eliminated is an important statistic in many respects and, as noted by Beltran-Sanchez et al [3], is also an active area of demographic research. Actuaries, statisticians, demographers, engineers, sociologists, and econometricians all employ this concept when assessing the degree of influence certain causes of death (or failure, in a broader context) have on different populations (be they biological or not). The topic has been addressed in the literature from a wide array of different perspectives. Manuel et al [16] consider the issue of cause-deleted life expectancy that is modified to reflect what they call health-related quality of life factors. Somerville & Francombe [19] study the elimination of heart disease and cancer on mortality, as well as some of the accompanying consequences for life insurance. Mackenbach et al [14] investigate the gains in life expectancy after a cause of death is removed in light of the effect of related competing causes of death, and tackle several issues involved in the modeling and interpretation of these gains in life expectancy. Lai et al. [8] specifically consider the expected lifetime gains from the elimination of the human immunodeficiency virus (HIV), or HIV/AIDS. In Sect. 5 of this present article, a case study will be provided that applies the subsequent theory to a data set consisting of Diabetes and HIV-related deaths.
The concept of a cause-deleted life expectancy improvement (CDLEI), as found in Brown [5] and Adamic [1], is a statistic designed to quantify the increase in life expectancy if a certain cause of death is removed. The traditional approach to calculating the CDLEI is to remove the deaths caused by the failure in question from the life table (using an assumption or two, such as uniform distribution of decrements (UDD) and mutual independence between risks, as these random variables are not directly observable) and then to recalculate the life expectancies. One of the main drawbacks of this approach is that it is an either-or proposition: either the cause of death is allowed to remain perennially present, or it is taken out of consideration completely from the outset. Neither of these two extremes will necessarily correspond well with reality. To remedy this impasse, we will propose models that can incorporate probability distributions for the cures of the causes in question over time.
Accounting for possible cures in cause-deleted life expectancy calculations has several real-world applications. For example, if all of the life expectancies are known for each age, it is a fairly straightforward task to construct the entire life table from that information alone. Thus, if there are reasons to surmise that life expectancies will increase due to the emerging possibility of one or more cures for one or more causes of death, a new life table can be immediately created to reflect this reality. This is an especially important consideration for statistical entities that construct cohort life tables. Cohort life tables are typically built by the extrapolation of long term trends. Sometimes these trends are analyzed by cause of death. But the important thing to realize here is that these extrapolation methods will not adequately capture what we might call cure shocks, that is, significant breakthroughs in research that were not anticipated and that will result in immediate improvements in life expectancy.
Another example can be found in public health initiatives. Suppose a decision has to be made by public health officials regarding which cause of death should be granted research funding for a cure (and for how long). Although the cure of one risk, say cancer, would produce a life expectancy improvement well above a risk such as HIV/AIDS (since cancer accounts for so many more deaths), it may be the case that AIDS research has shown far more promising results in the interim than cancer research. This could very well imply that a cure for AIDS is substantially more likely than for cancer. If that were true, then a model that accounts for this would provide for a better basis of comparison between the two proposals in terms of expected lifetime gains.
It should be noted from the outset that our approach to modeling the cure of the cause itself is quite distinct from what is known as cure modeling, which is well established in the survival literature. For example, Tsodikov [20,21] espouses a parametric cure model to estimate the proportion of long-term survivors. Dickman et al [6] consider regression models pertaining to relative survival. Li & Taylor [10], Lu & Ying [13], and Lu [11] consider semi-parametric cure models, with Peng & Dear [18] proposing a nonparametric mixture model. Overviews of various cure models utilized to date can be found in Friis & Chernick [7] and Othus et al. [17]. The subdiscipline of cure fraction analysis (cf. [2,9]) also attempts to model survival based on the proportion cured as derived from an actual data set. Generally speaking, however, we may state that all of these methods involve fitting models, be they parametric or not, regression based or not, to data sets that include the actual times for which the subjects involved never experience the event of interest ( [12]). But these methods are all different from modeling the probability of a cure of the cause of death itself, which, although much more difficult to do, is necessary in order to ultimately calculate the improved CDLEI, at least in the context we are considering.
Attempting to model an entire distribution for a future cure that has not yet occurred, is, admittedly, a task that is difficult and inherently imprecise. We may address this concern from a variety of different perspectives. First we note that, despite the difficulty inherent in the task, the revised CDLEI calculation allows for the possibility of a better estimator of the true CDLEI than when it is assumed a cure is certain and applied immediately. In other words, more information can be utilized. Second, experts in the field may be privy to knowledge regarding progress made for a specific risk, how long it took other risks at a similar stage of development to realize a full cure, or even the likelihood of a patent (or some other criteria) being approved to treat the particular risk (if the research has already progressed to that stage). The superiority of the new CDLEI estimator will very much depend on the nature and the quality of the information that is available regarding a cure. Third, it is entirely possible that a complete distribution for the time until cure need not be specified very precisely anyway: rather, a range of values for the cure distribution parameters may emerge within which, or outside of which, the CDLEI is not likely to change much. Or, as in the case where a comparison needs to be made (such as in the cancer/HIV example above), it may suffice to estimate the relative cure (analogous to the concept of relative risk in survival analysis) between the decrements instead of having to completely specify two distinct distributions.
The structure of the paper is as follows. In Sect. 2, we present some preliminary theory and nomenclature that will be used in subsequent sections. In Sect. 3, we summarize the cure model for the univariate case, and in Sect. 4 we develop the cure model in the multivariate setting, including a separate treatment for the bivariate and trivariate model before moving on the the general multivariate case. In Sect. 5 we consider a case study based on a real data set involving Diabetes and HIV-related deaths from Denver, Colorado, United States of America, between the years 1990 and 2015 inclusive, and we conclude with a summary and discuss possible future work in Sect. 6. The appendix gives a formal proof for Theorem 1, the most important theoretical result of the paper.

Preliminaries and notation
Cause-deleted life calculations require the use of competing risks theory and notation. First, h (j) x (t) would represent the hazard rate for risk j at time x + t , given a current age of x. We will sometimes drop the subscript x for notational convenience, but it will always be assumed that time starts at age x. Also note the implicit assumption that only one distinct cause can be responsible for any particular death. This assumption is germane to all of the theory presented in this paper. The negation of j, denoted −j , will refer to all risks other than j. Furthermore, t p x would represent the probability of surviving all risks up to time t. The following standard relationships are given without proof: x would represent the probability of failure for some subject aged x prior to time x + t by cause j in the multiple risk forum.
The complete life expectancy (or simply the life expectancy) for an individual currently aged x quantifies the number of years the individual is expected to survive beyond x. This life expectancy will be denoted by e x , with where this integral expression can be derived using a simple recursive formula (cf. [4]).
To calculate the life expectancy when cause j is deleted, we use the well-known expression, is the "associated" survival function if cause j is eliminated as a competing risk, which can be expressed in terms of the cause-deleted hazard function: Following the notation of Brown [5], when cause j is eliminated, the CDLEI for any age x equals e (−j) x −e x . We also emphasize here that CDLEI calculations based directly on the hazard functions do not assume independence between the competing

3
Life expectancy improvement for multiple cure distributions risks. This is a considerable advantage, as, in practice, many competing risks of death will be mutually dependent. Some further notation from matrix theory is required in order to prove the results developed in subsequent sections. Let be a field and let A = [a ij ] ∈ m×n and B = [b ij ] ∈ p×q . Then the Kronecker product between A and B is denoted as A ⊗ B , and is defined as . We also propose the following Lemma, which will be used in a subsequent proof.

Cure model for the univariate case
In this section, we summarize the univariate model. Let T c be the future cure time of risk j. By cure we mean that risk j is no longer a potential candidate risk that can be responsible for failure. Note that the time until cure can itself be described using a survival function, since we would be modeling the time until some event of interest, in this case cure, occurs. Without loss of generality, we suppress the age subscript x here for notational convenience. Let F c (t) = P(T c ≤ t) where F c (t) would quantify the probability that a cure has been found for cause j by time t. With the probability distribution for the cure of cause j specified, we can define the hazard function for this process. The new hazard function, h * (t) say, is the hazard rate accounting for all causes of death when cure for cause j can be found with probability F c (t) by time t.
The new hazard will be a mixture of the hazard function without j with the hazard of all risks in toto, as follows: since F c (t) is the probability that cause j is in a cure state at the future time t. Equation (3.1) is an exact relationship, although the actual F c (t) that is chosen will depend on what assumptions are employed regarding the probability of cure. It is instructive to rearrange Eq. (3.1) as follows. Using the property that h (−j) (t) + h (j) (t) = h(t) , the new hazard function can be expressed as This is an intuitive result. The new hazard function, h * (t) , is the total hazard h(t) , minus the expected hazard for cause j reflecting the probability that cause j is no longer present at future time t.
The revised life expectancy can now be determined. For any age x and cumulative probability distribution of cure F c (t) , the following sequence of arguments engender a single equation for the revised CDLEI for any age x: It is important to note that our model can handle both increasing or decreasing rates, over time, in life expectancy. However, removal of causes of death, in and of themselves, can only increase life expectancy. For example, even if heart disease improvements are realized over time, it is still the case that removal of heart disease as a cause of death can only increase life expectancy. In this case, the gains in life expectancy will be less if there are already improvements in heart disease mortality rates.

The bivariate extension of the cure model
For the bivariate extension of the model, let F j 1 ,c (t) and F j 2 ,c (t) represent the probability that a cure has been found, by time t, for causes j 1 and j 2 respectively. We make the assumption that time of cure for the two causes are independent random variables, but not that the decrements associated with these cures are independent. With these probability functions for the cures of causes j 1 and j 2 , we can define a new hazard function as a mixture of: (1) the hazard function without both j 1 and j 2 , that is, h (−j 1 −j 2 ) (t) ; (2) the hazard function without cause j 2 only, or h (−j 2 ) (t) ; (3) the hazard without cause j 1 , or h (−j 1 ) (t) only; and (4) the hazard of all risks in aggregate, which is h(t) . Temporarily suppressing the subscript x, cure designator c, and time variation t for notational ease, by letting F j 1 ,c (t) = F j 1 and F j 2 ,c (t) = F j 2 , the new hazard function for the bivariate case is as follows: As in the univariate case, this equation can be expressed in a more intuitive form.
Life expectancy improvement for multiple cure distributions which is an intuitive result analogous to Equation (3.2). Interestingly, we can also obtain h * 2 by exploiting a direct relationship with the univariate hazard h * 1 from the previous subsection as follows: . Thus, a convex combination of F j 1 ,c (t) , as exhibited in h * 1 , and by utilizing the hazards h (−j 2 −j 1 ) and h (−j 2 ) , shows a recurring pattern of combinations of hazards and cure functions. From Equation (4.3), it is evident that h * 2 also has a convex combination in terms of F j 2 = F j 2 ,c (t) , the probability of cure for cause j 2 by time t. The pattern will be utilized in the ensuing theory.
It is instructive to express our results for the new hazard function for the univariate and bivariate cases already developed in matrix/vector form, as follows: where h 1 and h 2 are the appropriate hazard vectors for corresponding dimensions respectively. This matrix representation is favorable to the higher dimensional cure model. When we consider two cures (the bivariate model), there are 2 2 = 4 terms existing in the expression for the hazard. For higher dimensions, the number of terms in the hazard function h * i , i = 1, 2, … , increases by powers of two. Motivated by the relationships demonstrated between the hazards and cure probabilities presented thus far, we now propose a general framework for an arbitrary number of cures, n, and an accompanying proof.

The n-dimensional cure model
Let F j 1 , F j 2 , ⋯ , F j n denote the probabilities that a cure has been found for causes j 1 , j 2 , ⋯ , j n respectively by time t, and assume that the cure time random variables are all mutually independent for each and every cause of death. Let h (j 1 ) , h (j 2 ) , ⋯ , h (j n ) denote the hazard functions with causes j 1 , j 2 , ⋯ , j n respectively. For each distribution F j k , k = 1, 2, ⋯ , n , we can create diagonal matrices of order 2 as follows: For the n-dimensional cure model, there exists 2 n terms in the expansion of the hazard function, mimicking the expansion procedure previously done. By means of matrix notation, we define the hazard vector h n , which consists of hazards of individual cures and total cure. To formulate the vector of hazards, we should have n n−1 of them assigned a combination of two of the j ′ k s . For example, h (−j n −j n−1 ) , h (−j n−1 −j n−2 ) , ..., h (−j 3 −j 2 ) , and for n n−2 of them assign a combination of three of the j ′ k s such as and finally one of the kind h (−j n −j n−1 −⋯−j 1 ) and h in total. There should be 2 n terms of hazards to be found in the vector h n . For notational ease, we have chosen to use so that, the vector of hazards h n has dimension 1 × 2 n and is defined as Also, we define 2 n ×1 as a 2 n × 1 vector of ones. Using these precisions, we can obtain a general result for n-dimensional cure model.

Theorem 1
Let F j k , k = 1, 2, ..., n be the cumulative cure probabilities for n competing risks (j k ), k = 1, 2, ⋯ , n and F n represents the Kronecker product of n diagonal matrices of order 2 as in Equation (4.6) such that Utilizing the vector of hazards h n of dimension 1 × 2 n , and be a column vector of ones ′ with dimension 2 n × 1 , the new hazard function is denoted by h * n and can be expressed in the form which has 2 n terms of products of combinations of hazards h (.) and independent cumulative cure probabilities F (.) .
Note that although the hazard function for the n-dimensional cure model can assume different forms, Equation (4.9) is perhaps the most basic way to express it. Now, the recurrence relation connecting successive models of higher dimension can be expressed in the form (4.9) h * n =h n F n �

3
Life expectancy improvement for multiple cure distributions where h n−1 = (h (r 2 n−1 ) , h (r 2 n−2 ) , ⋯ , h (r 1 ) ) and h * n−1 =h n−2 F n−2 2 n−1 ×1 with F n−2 as the n − 2 Kronecker products of 2 dimensional matrices of cure probabilities. A proof of the theorem, by means of principle of mathematical induction, is provided in the appendix.
Using the property of total hazards h x = ∑ j h (j) x , we obtain a more simplified form of the combined hazard as The multivariate extension of the revised life expectancy can now be determined in a simple form based on the survival probability and combined form of hazards and cures. For any age x, and cumulative probability distribution of cures F k , k = 1, 2, ⋯ , n , with corresponding hazards h (k) , k = 1, 2, ⋯ , n , with the total hazard h , the revised CDLEI for any age can be written in the following form:

Case study
In this section we illustrate the models proposed in this paper using a real data set. The data set gives the number of deaths from birth to age 109 for the citizens of Denver, Colorado, USA, between the years 1990 and 2015 inclusive for all causes of death, those that results from Diabetes, and those that result from contracting the human immunodeficiency virus (HIV). The data were obtained from the Colorado Department of Public Health and Environment. Although the data are available for each age, in the interest of space, we present it here in decennial intervals only. The data are summarized in Table 1. Some ages have such a small number of deaths that the results were not made available for reasons of anonymity. Two noteworthy attributes pertaining to the data include: (a) Diabetes-related deaths account for approximately 1.5% of the total number of deaths, or about 1 in every 67 deaths; (b) HIV-related deaths account for approximately 0.5% of the total number of deaths, or about 1 in every 200.
To model the CDLEI (from age 0) for these data, we will employ a nonparametric approach. For this approach, we will estimate three functions: first the all-risks survival function, S(t), and then the associated survival functions of S (−Diab) (t) and S (−HIV) (t) , where Diab is shorthand for Diabetes. Since we are modeling from time (4.12) x . Specifically, the method that was used was a nonparametric generalized Kaplan-Meier estimator: where t ij represents the failure times for failures due to cause j, n ij the total number at risk just prior to each time point t ij , and d ij the number of deaths at time point t ij by cause j.
Since this paper focuses on the implementation of cure distributions to more accurately predict the increase in CDLEI, it is advantageous here to consider different forms for the cure function itself. A dominant characteristic that would distinguish different forms for the cure function would be whether or not the cure of the cause of death would increase over time, remain constant, or decrease over time. The two-parameter Weibull distribution is good choice to consider here, due to it's popularity in survival analysis, it's inherent flexibility, and it possessing a hazard function that can monotonically increase, remain constant, or monotonically decrease over time, depending on the value of the shape parameter. We therefore make the following assumptions for the two cure distributions: From this result we can see that if > 1 , then the hazard is monotonically increasing. In this case, the probability of obtaining a cure increases as time goes on and at an increasing rate. If, on the other hand, = 1 , then the hazard is constant, implying that the probability of obtaining a cure does not improve with time (other than the fact that more time has elapsed, of course). This is hopefully not surprising, since when = 1 , the Weibull distribution reduces to the exponential distribution, which has the famed memoryless property. Finally, if < 1 , then the hazard is monotonically decreasing. In this case, the probability of realizing Life expectancy improvement for multiple cure distributions a cure for a cause of death would increase with time (as it must since F(t) is a nondecreasing function) but at a decreasing rate. We consider the first scenario to be the most realistic, since as time goes on, the impact of research and technology should at least make the probability of cure more likely than at an earlier point in time. Table 2 illustrates the CDLEI results, from age 0, for some select values of the respective cure (scale) parameters 1 and 2 for values of shape parameters 1 and 2 equal to 0.5, 1, and 2. It is felicitous for comparisons to be done at age 0 rather than at any other age so that the potential benefits of a cure would apply to the largest possible cohort (however, there may be special circumstances when it might make sense to use a different age as the reference point). Many attributes of Table 2 are noteworthy, which can also be appreciated graphically by considering Fig. 1. First, note that the predicted increase in life expectancy from birth is 0.26381 years if both Diabetes and HIV were immediately eliminated as possible causes of death. Second, note that if Diabetes were immediately cured, but HIV realized no cure improvement over the lifetime duration under consideration, then the model predicts an immediate 0.17727 years of life expectancy improvement. Conversely, if HIV realized an immediate cure but there was no Diabetes cure on the horizon for the long term, a CDLEI of about 0.08603 years would be projected. Third, notice how swiftly the maximum CDLEI is reached when 1 = 2 = 2 compared to the memoryless cure functions where 1 = 2 = 1 . When 1 and 2 reach values a little above 0.06, the life expectancy improvement from age 0 essentially reaches its maximum and will not appreciably increase with larger forces of cure above this value. The opposite is true when 1 = 2 = .5 , where the maximum CDLEI is realized much more slowly. Finally, for the mixed case where 1 = .5 and 2 = 2 , the CDLEI statistics seem to more closely resemble the 1 = 2 = .5 case than the others cases, presumably owing to the fact that mortality rates for Diabetes are so much higher than for HIV (even in the face of significantly higher HIV cure probability).
To get our bearings a little, a value of 0.07 for 1 or 2 , when = 1 for example, would correspond to statements such as: • about an 80% chance of a cure within the next 23 years • about a 50% chance of a cure within the next 10 years. These statements may seem entirely plausible to experts in Diabetes and/or HIV research. If that were true, then a cure threshold is essentially realized and it would be justifiable to use the traditional cause-deleted approach of simply taking these causes of deaths out of consideration (since the maximum CDLEI is more or less achieved when = 0.07 and = 1 in both cases). If these two statements are deemed overly ambitious, then the CDLEI will very much depend on what values of 1 and

Conclusion and future work
In this paper we have proposed models that can incorporate multiple distributions for the probability that cures for specific causes of death can be realized over time, for the purpose of estimating a more accurate cause-deleted life expectancy  improvement. Despite the fact that it is difficult to model a cure event that has not yet happened, the proposed methodology is a vast improvement over the prevailing practice of simply assuming the cause in question is removed with certainty. Various factors, including stage of development in research for a cure, interim successes, and even the overall amount of scholarship that is being invested in the cure of a specific  cause of death, can all influence the postulated probability of a cure that experts in the field could realistically consider. There are several avenues of further research that flow from this present work. First, a natural extension of the theory presented in this paper would be to consider cases where there is a "partial cure". By partial cure, we mean that there are treatments available that prolong life, even though a full cure is not yet realized. A partial cure will translate into a higher life expectancy, effective immediately. As noted by Manton et al. [15], a cause-delay model provides a mechanism for incorporating the likely effects of medical innovation on survival, and this could prove useful for computing a superior CDLEI statistic. A second area to consider is the possibility of dependent cure time random variables. In practice, there can easily be dependence in the probabilities of cures being realized for certain causes of death that are naturally related: for example, the probability of a cure for breast cancer is likely significantly correlated to the probability of a cure for other types of cancers, such as leukemia. The use of copula models to account for dependent cure distributions might very well be the best approach to consider.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/. , h (r 1 ) ) . Equation (7.4) can be expanded as Next, we wish to show that Equation (7.5) has the even more simplified form with h m+1 being the vector of m + 1 hazards, ′ the 2 m+1 × 1 vector of ones and F m+1 the m + 1 Kronecker products of the F m+1 matrices of the form found in Equation (4.6). Now the m + 1 Kronecker product can be written in the following form: Applying Lemma 1 to the Kronecker product of m + 1 terms, we obtain since the two matrices are commutative. The notation 2 m represents the zero matrix of order 2 m so that the each matrix in the product is of order 2 m+1 .
Next we perform the vector multiplication of F m+1 � to obtain a part of the general statement as in Equation (7.6): where there are 2 m of F i m+1 and 2 m of 1 − F i m+1 . By performing the complete multiplication, we get the hazard function where h m+1 = (h (r 2 n ) , h (r 2 n −1 ) , ⋯ , h (r 2 ) , h (r 1 ) |h (i 2 n ) , h (i 2 n −1 ) , ⋯ , h (i 2 ) , h (i 1 ) ) , h (i 1 ) = h the complete risk, h (r 2 n ) , h (r 2 n −1 ) , ⋯ , h (r 2 ) , h (r 1 ) are newly incorporated 2 m hazards due to the additional variable which gives various combinations of mixture of hazard functions without component variables and corresponding marginals. Now this vector can be written in the form where represents a vector of zeros, h r is a representation for the collection of h (r 2 n ) , h (r 2 n −1 ) , ⋯ , h (r 2 ) , h (r 1 ) , and h i is the notation used for h (i 2 n ) , h (i 2 n −1 ) , ⋯ , h (i 2 ) , h (i 1 ) . Then, the new hazard function can be expressed as , h m+1 =(h (r 2 n ) , h (r 2 n −1 ) , ⋯ , h (r 2 ) , h (r 1 ) |0, ⋯ , 0) + (0, ⋯ , 0|h (i 2 n ) , h (i 2 n −1 ) , ⋯ , h (i 2 ) , h (i 1 ) ) =(h r | ) + ( |h i ) Life expectancy improvement for multiple cure distributions Equation (7.5) can be obtained by performing the multiplication for the partitioned matrices, and we get 2 m+1 terms in the expansion of the general form. Hence it is true for n = m + 1 and so P(m + 1) is true. By the principle of mathematical induction, P(n) is true for all n ∈ ℕ∕{0} , the set of all natural numbers. ◻