1 Introduction

The population growth rate (λ or r) analyzed in Chap. 3 is a population-level consequence of the individual-level vital rates. A similarly basic outcome, at the individual or cohort level, is longevity: the length of individual life. The most commonly encountered description of longevity is its expectation, the life expectancy. However, longevity is a random variable, differing among individuals (even when those individuals are subject to the same rates and hazards) because of the random vagaries of mortality and survival. Therefore, it is important to also consider its variance and higher moments. This chapter introduces the sensitivity analysis of longevity, which will be explored in more detail in Chaps. 5, 11, and 12.

As in Chap. 3, we will begin by reviewing a classic formula for the sensitivity of life expectancy in age-classified models. Then we will use matrix calculus to derive more general formulas for the moments of longevity, the distribution of age or stage at death, and the life disparity, applicable to age- or stage-classified populations.

2 Life Expectancy in Age-Classified Populations

Notation

It is customary to denote life expectancy by symbols like \(e_x^{\mathrm {o}}\) or e(x), but in general the symbol e plays too many roles in mathematics to be helpful for our purposes. So, when we make the transition to matrix formulations, I will use the symbol η, in various vector and scalar manifestations, to indicate longevity.

Perturbation analysis of longevity has been pursued mostly within the framework of age-classified life cycles (e.g., Canudas Romo 2003; Keyfitz 1971; Pollard 1982; Vaupel 1986; Vaupel and Canudas Romo 2003). The life expectancy at age x is given by

$$\displaystyle \begin{aligned} e(x) = \frac{1}{\ell(x)} \int_x^\infty \ell(s) ds {} \end{aligned} $$
(4.1)

where the survivorship function ℓ(x) is the probability of survival to age x.

The classical result for the sensitivity of life expectancy at birth to a change in mortality at age a is

$$\displaystyle \begin{aligned} {d e(0) \over d \mu(a)} = -\ell(a) e(a) . {} \end{aligned} $$
(4.2)

That is, the sensitivity of life expectancy at birth to a change in mortality at age a is equal to the product of the probability of survival to age a and the life expectancy at age a. In other words, e(0) is most sensitive to changes in mortality at ages to which lots of individuals survive (to experience the change in mortality) and beyond which there is lots of longevity remaining (so they can enjoy the change in mortality). The derivative is negative because increasing mortality reduces life expectancy.

The result was presented independently by Keyfitz (1971) who also referenced some earlier approaches (Wilson 1938; Irwin 1949) and by Pollard (1982). Keyfitz’s derivation was sketchy, and Pollard simply stated that the result was well-known, and gave no derivation. From a general sensitivity analysis perspective, we can derive the result using the same approach applied in Chap. 3 to population growth rate.

2.1 Derivation

Differentiating (4.1) with respect to mortality at some specified age a gives

$$\displaystyle \begin{aligned} {d e(0) \over d \mu(a)} = \int_0^\infty {d \ell(s) \over d \mu(a)} ds {} \end{aligned} $$
(4.3)

and our problem reduces to finding the derivative of ℓ(s) with respect to μ(a). To do so, introduce a parameter θ to measure the size of the perturbation at age a, and write mortality as

$$\displaystyle \begin{aligned} \mu(x,\theta) = \mu(x,0) + \theta \; \delta(x-a) {} \end{aligned} $$
(4.4)

where δ(x − a) is the Dirac delta function. The derivative with respect to μ(a) is obtained by differentiating with respect to θ and evaluating the result at θ = 0.

Write survivorship as

$$\displaystyle \begin{aligned} \ell(x,\theta) = \exp \left[ - \int_0^x \mu(z, \theta) dz \right] \end{aligned} $$
(4.5)

so that

$$\displaystyle \begin{aligned} {d \ell(x,\theta) \over d \theta} = - \ell(x,\theta) \int_0^x {d \mu(z,\theta) \over d \theta} d z \end{aligned} $$
(4.6)

From (4.4) we have

$$\displaystyle \begin{aligned} {d \mu(z,\theta) \over d \theta} = \delta(z-a) \end{aligned} $$
(4.7)

so that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {d \ell(x,\theta) \over d \theta} &\displaystyle =&\displaystyle - \ell(x,\theta) \int_0^x \delta(z-a) dz \end{array} \end{aligned} $$
(4.8)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle -\ell(x,\theta) H(x-a) \end{array} \end{aligned} $$
(4.9)

where H(⋅) is the unit step function. Substituting this into (4.3) and evaluating at θ = 0 gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} {d e(0) \over d \mu(a)} &\displaystyle =&\displaystyle - \int_0^\infty \ell(s) H(s-a) ds \end{array} \end{aligned} $$
(4.10)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle - \int_a^\infty \ell(s) ds \end{array} \end{aligned} $$
(4.11)

which, by (4.1) is equal to (4.2).
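The result (4.2) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes a hypothetical Gompertz mortality schedule (the parameter values are illustrative only), approximates the delta-function perturbation in (4.4) by a narrow bump of area θ on a single grid cell, and compares the resulting finite-difference derivative of e(0) with −ℓ(a)e(a).

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for the integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def survivorship(mu, dx):
    """Survivorship at the grid points for a hazard that is constant on each cell."""
    cumulative_hazard = np.concatenate(([0.0], np.cumsum(mu * dx)))
    return np.exp(-cumulative_hazard)

dx = 0.01
x = np.arange(0.0, 130.0 + dx, dx)       # age grid; survival is negligible by age 130
mu = 1e-4 * np.exp(0.09 * x[:-1])        # hypothetical Gompertz hazard, one value per cell

ell = survivorship(mu, dx)
e0 = trapezoid(ell, x)                   # life expectancy at birth, Eq. (4.1) with x = 0

a = 40.0                                 # age at which mortality is perturbed
j = int(round(a / dx))                   # index of the grid cell starting at age a
theta = 1e-6                             # size of the perturbation

mu_perturbed = mu.copy()
mu_perturbed[j] += theta / dx            # bump of area theta, approximating theta * delta(x - a)
e0_perturbed = trapezoid(survivorship(mu_perturbed, dx), x)

numerical = (e0_perturbed - e0) / theta  # finite-difference estimate of d e(0) / d mu(a)
analytic = -trapezoid(ell[j:], x[j:])    # -int_a^inf ell(s) ds = -ell(a) e(a), Eq. (4.2)

print(numerical, analytic)
```

The two values agree up to the discretization error of the grid, which shrinks with dx.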

3 A Markov Chain Model for the Life Cycle

Age has a special status in demography because it is continuous, linear, and permits movement in only one direction and at one rate (age increases by one unit for every unit of time). All other demographic characteristics have the potential for much greater flexibility, and the operators that describe movement and development of individuals require an equal degree of flexibility. This book is devoted to matrix formulations of these problems, which have the great advantage of permitting both age- and stage-classified models. The basic formulation, as far as longevity is concerned, is that of a finite-state absorbing Markov chain.

3.1 A Markov Chain Formulation of the Life Cycle

We describe the life cycle as an absorbing Markov chain. This approach was pioneered in demography by Feichtinger (1971) and Hoem (1969), and has been greatly extended in recent years (Caswell 2001, 2006, 2009; Horvitz and Tuljapurkar 2008; Tuljapurkar and Horvitz 2006; Steinsaltz and Evans 2004). Good sources for the basic theory of absorbing Markov chains are Kemeny and Snell (1976) and Iosifescu (1980).

These models will be explored in more detail in Chaps. 5 and 11. The sensitivity analysis of measures of variance in longevity has been developed by Van Raalte and Caswell (2013) and Engelman et al. (2014). An important extension of Markov chain models for longevity is the incorporation of “rewards” to represent the value, in some sense, of the length of life, extending methods developed for dynamic programming (Howard 1960). The rewards include the production of offspring (Caswell 2011; van Daalen and Caswell 2015, 2017), the accumulation of income and expenditures (Caswell and Kluge 2015) and healthy longevity (Caswell and Zarulli 2018). The sensitivity analysis of these important models is derived in van Daalen and Caswell (2017).

Markov chain theory distinguishes between recurrent and transient states. A recurrent state has the property that the probability of returning to that state at least once is 1. A transient state is one for which that probability is less than 1. If a Markov chain contains transient states, it will eventually leave those states and arrive in a recurrent state or class of states, where it will remain permanently. Such a chain is called absorbing. Absorbing chains are the basic model for the demography of individuals because life is inherently transient. Any individual will, with probability one, eventually leave the set of living states and be absorbed by death.

If a Markov chain consists of a single set of recurrent states that all communicate with each other, it is said to be ergodic. The transition matrix for an ergodic chain is irreducible and primitive. Ergodic Markov chains play a limited role in demographic contexts because they cannot include mortality. Chapter 11 will, however, present the sensitivity analysis of these models.

In demographic models, individuals move among a set of transient (i.e., living) states in their life cycle before they eventually reach an absorbing state (death). Transient states may represent age classes, developmental or life history stages, or states defined by health, employment, economic, or other kinds of status. In studying longevity, we are particularly interested in absorbing states representing death, or perhaps death classified by age or stage at death, or by cause of death. The analysis applies equally to other ways of leaving the life cycle (e.g., graduation in a model of educational states, discharge from treatment in a model of health states).

Number the stages in the life cycle so that the transient states are 1, …, s and the absorbing states are s + 1, …, s + a. Then the transition matrix of the Markov chain is

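$$\displaystyle \begin{aligned} {\mathbf{P}} = \left(\begin{array}{c|c} {\mathbf{U}} & {\mathbf{0}} \\ \hline {\mathbf{M}} & {\mathbf{I}} \end{array}\right) {} \end{aligned} $$
(4.12)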

Here, U is the s × s matrix of transition probabilities among the transient states. The a × s matrix M gives the probabilities of absorption in each of the absorbing states. The columns of P sum to one. I assume that the spectral radius (the dominant eigenvalue) of U is strictly less than one; a sufficient condition for this is that there is a non-zero probability of ultimate death for every stage.

Age-classified models are a special case with survival probabilities on the subdiagonal (and possibly in the last diagonal entry); e.g., for s = 3,

$$\displaystyle \begin{aligned} {\mathbf{U}} = \left(\begin{array}{ccc} 0 & 0 & 0 \\ p_1 & 0 &0 \\ 0& p_2 & p_3 \\ \end{array}\right) {} \end{aligned} $$
(4.13)

The age-specific survival probability is \(p_i = e^{- \mu _i}\), with \(\mu_i\) a mortality rate applying to age class i. The (s, s) entry of U is an age-independent survival probability for a final open-ended age class, with a remaining life expectancy of \(1/(1 - p_s)\). If \(p_s = 0\), no one survives beyond age class s. When the age-classified model is constructed from a life table, \(p_i = 1 - q_{i-1}\); that is, the survival of age class 1 is the complement of the probability of death between age 0 and 1.

The mortality matrix M gives the probabilities of transition from each of the transient states to each of the absorbing states. Figure 4.1 shows some examples of life cycle formulations that can arise, including both age and stage classification in the transient states, and absorbing states classified by age at death, grouped ages at death, stage at death, or cause of death. The resulting mortality matrices are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{Figure 4.1a} \qquad & {\mathbf{M}}& = \left(\begin{array}{cccc} 1-P_1 & 1-P_2 & 1-P_3 & 1 \end{array}\right) \end{array} \end{aligned} $$
(4.14)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{Figure 4.1b} \qquad &{\mathbf{M}}& = \left(\begin{array}{cccc} 1-P_1 & 0 & 0 & 0 \\ 0 & 1-P_2 & 0 & 0 \\ 0 & 0 & 1-P_3 & 0\\ 0&0&0& 1-P_4 \end{array}\right) {} \end{array} \end{aligned} $$
(4.15)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{Figure 4.1c} \qquad &{\mathbf{M}}& = \left(\begin{array}{cccc} 1-P_1 & 1-P_2 & 0 & 0 \\ 0 & 0 & 1-P_3 & 1-P_4 \end{array}\right) \end{array} \end{aligned} $$
(4.16)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{Figure 4.1d} \qquad & {\mathbf{M}}& = \left(\begin{array}{cccc} q_1 & 0 & 0 & 0 \\ 0 & q_2 & 0 & 0 \\ 0 & 0 & q_3 & 0\\ 0&0&0& q_4 \end{array}\right) {} \end{array} \end{aligned} $$
(4.17)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{Figure 4.1e} \qquad & {\mathbf{M}} &= \left(\begin{array}{cccc} q_1 & q_2 & q_3& q_4\\ s_1 & s_2 & s_3 & s_4 \end{array}\right) \end{array} \end{aligned} $$
(4.18)

The beauty of formulating longevity as a Markov chain is that many statistics of longevity can be written in terms of the matrices U and M and sensitivity analysis can be carried out using matrix calculus.
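As an illustration, the matrices for a small age-classified life cycle with a single absorbing state can be assembled as follows. This is a minimal sketch: the survival probabilities are hypothetical, and P is assumed to have the block arrangement implied by the text (transient states first, with U and M in the first s columns and an identity block for the absorbing states), so that its columns sum to one.

```python
import numpy as np

p = np.array([0.9, 0.8, 0.5])            # hypothetical survival probabilities p_1, p_2, p_3
s = len(p)

# Age-classified U as in Eq. (4.13): survival on the subdiagonal,
# plus an open-ended last age class in the (s, s) entry.
U = np.zeros((s, s))
U[np.arange(1, s), np.arange(s - 1)] = p[:-1]
U[-1, -1] = p[-1]

# One absorbing state ("dead"): M is a 1 x s row of death probabilities.
M = (1.0 - p).reshape(1, s)
a = M.shape[0]

# Assumed block structure of the full transition matrix.
P = np.block([[U, np.zeros((s, a))],
              [M, np.eye(a)]])

column_sums = P.sum(axis=0)
spectral_radius = np.max(np.abs(np.linalg.eigvals(U)))

print(column_sums)        # each column of P sums to one
print(spectral_radius)    # strictly less than one, so absorption is certain
```

Because U is lower triangular here, its eigenvalues are its diagonal entries, and the spectral radius is simply the survival probability of the open-ended class.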

Fig. 4.1 Life cycle graphs showing some alternative choices for structure of the absorbing state: death, age at death, stage at death, or cause of death. (a) Age-classified with one dead state. (b) Age-classified, age at death. (c) Age-classified, grouped ages at death. (d) Stage-classified, stage at death. (e) Age-classified, causes of death

3.2 Occupancy Times

Consider an individual in transient state j. Eventual absorption is certain. But before that, the individual will occupy various transient states. The number of such visits, the occupancy time, is the basic unit of longevity. Occupancy is particularly central in studies of health demography, where it quantifies the parts of a life spent in different health states. But, even without the added dimension of something like health, occupancy of transient states is the basis of longevity analysis.

Let \(\nu_{ij}\) be the number of visits to transient state i by an individual in transient state j, prior to absorption. Its expectation is given by the fundamental matrix (e.g., Kemeny and Snell 1976; Iosifescu 1980)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}} &=& \left(\begin{array}{c} E(\nu_{ij}) \end{array}\right) \end{array} \end{aligned} $$
(4.19)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &=& \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} \end{array} \end{aligned} $$
(4.20)

More details, and examples, for the higher moments and variances of occupancy times are given in Chaps. 5 and 11.
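A minimal numerical sketch of (4.20), using a hypothetical three-age-class U of the form (4.13): the fundamental matrix is computed by matrix inversion and compared with the truncated series I + U + U² + …, which sums the probabilities of occupying each state at each time step.

```python
import numpy as np

# Hypothetical age-classified U with an open-ended last class (values are illustrative).
U = np.array([[0.0, 0.0, 0.0],
              [0.9, 0.0, 0.0],
              [0.0, 0.8, 0.5]])
I = np.eye(3)

# Fundamental matrix, Eq. (4.20): entry (i, j) is the expected number of
# visits to transient state i by an individual starting in state j.
N = np.linalg.inv(I - U)

# The same quantity as a truncated power series; it converges because the
# spectral radius of U is less than one.
N_series = sum(np.linalg.matrix_power(U, t) for t in range(200))

print(N)
# [[1.   0.   0.  ]
#  [0.9  1.   0.  ]
#  [1.44 1.6  2.  ]]
```

In this example an individual in age class 1 expects to spend 0.9 time steps in age class 2 and 1.44 time steps in the open-ended class 3.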

3.3 Longevity

The longevity of an individual in state j can be equated to the total occupancy time of all transient states by that individual, prior to eventual absorption. Let \(\eta_j\) be this longevity; the expectation of \(\eta_j\) is the sum of the elements in column j of N. We define \(\boldsymbol{\eta}_1\) and \(\boldsymbol{\eta}_2\) as the vectors containing the first and second moments of longevity, respectively. Then

$$\displaystyle \begin{aligned} E(\boldsymbol{\eta})^{\mathsf{T}} = \boldsymbol{\eta}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}} {} \end{aligned} $$
(4.21)

Figure 4.2a shows the life expectancy for India in 1961 and Japan in 2006.

Fig. 4.2 Calculations for longevity of India (1961) and Japan (2006). (a) Remaining life expectancy as a function of age. (b) Standard deviation of remaining longevity as a function of age. Vertical line at age 10 indicates \(SD_{10}\), sometimes used as a measure of lifespan disparity. (c) Sensitivity of life expectancy at birth to changes in mortality at each age. (d) Sensitivity of variance in longevity at birth to changes in mortality at each age. (e) Sensitivity of life disparity \(\eta^\dagger\) to changes in mortality at each age

The vector of the second moments of longevity satisfies

$$\displaystyle \begin{aligned} \boldsymbol{\eta}_2^{\mathsf{T}} = \boldsymbol{\eta}_1^{\mathsf{T}} \left( 2 {\mathbf{N}} - {\mathbf{I}} \right) {} \end{aligned} $$
(4.22)

(Iosifescu 1980). The variance and standard deviation of longevity are thus

$$\displaystyle \begin{aligned} \begin{array}{rcl} V \left( \boldsymbol{\eta}\right) &\displaystyle =&\displaystyle \boldsymbol{\eta}_2 - \boldsymbol{\eta}_1 \circ \boldsymbol{\eta}_1 {} \end{array} \end{aligned} $$
(4.23)
$$\displaystyle \begin{aligned} \begin{array}{rcl} SD(\boldsymbol{\eta}) &\displaystyle =&\displaystyle \sqrt{V(\boldsymbol{\eta})} {} \end{array} \end{aligned} $$
(4.24)

where the square root is taken element-wise.

Note that V(η) and SD(η) are vectors; their elements give the variance or standard deviation of longevity for individuals in each stage, making it easy to examine variation in remaining longevity conditional on the starting age. This conditioning can be important; Edwards and Tuljapurkar (2005) have made a strong case that \(SD(\eta_{10})\), starting from age 10, is a good index to prevent infant and child mortality from obscuring patterns in old age longevity.

Figure 4.2b shows SD(η) for India and Japan. The standard deviation at birth, \(SD(\eta_1)\), is roughly twice as great in India as in Japan, a discrepancy that remains at \(SD(\eta_{10})\). Eventually, beyond the age of 50, SD(η) becomes greater in Japan than in India.
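The moment formulas (4.21)–(4.24) translate directly into a few lines of code. The sketch below uses a hypothetical three-age-class U (not the India or Japan life tables) and returns one value per starting age class.

```python
import numpy as np

# Hypothetical age-classified U with an open-ended last class.
U = np.array([[0.0, 0.0, 0.0],
              [0.9, 0.0, 0.0],
              [0.0, 0.8, 0.5]])
s = U.shape[0]
I = np.eye(s)
N = np.linalg.inv(I - U)

eta1 = N.sum(axis=0)             # Eq. (4.21): eta_1^T = 1^T N (column sums of N)
eta2 = (2 * N - I).T @ eta1      # Eq. (4.22): eta_2^T = eta_1^T (2N - I)
V = eta2 - eta1 * eta1           # Eq. (4.23): element-wise (Hadamard) square
SD = np.sqrt(V)                  # Eq. (4.24): element-wise square root

print(eta1)   # [3.34 2.6  2.  ]
print(V)
print(SD)
```

The last entry provides a simple check: time spent in the open-ended class is geometrically distributed with continuation probability 0.5, so its mean is 1/(1 − 0.5) = 2 and its variance is 0.5/(1 − 0.5)² = 2, matching the last elements of eta1 and V.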

3.4 Age or Stage at Death

If the model contains more than one absorbing state (as in all the cases but the first in Fig. 4.1), the eventual fate of an individual is uncertain. The probability distributions of the eventual absorbing state are given by the columns of the matrix

$$\displaystyle \begin{aligned} {\mathbf{B}} = {\mathbf{M}} {\mathbf{N}} {} \end{aligned} $$
(4.25)

where \(b_{ij}\) is the probability of eventual absorption in absorbing state i for an individual starting in transient state j (Iosifescu 1980).

Suppose that the absorbing stages are defined as the age (or stage) at death, as in Fig. 4.1b, d. Then M is given by Eq. (4.15) or (4.17) and the jth column of B is the probability distribution of age at death for an individual starting in age class j:

$$\displaystyle \begin{aligned} \boldsymbol{\psi}_j = {\mathbf{B}}(:,j) = {\mathbf{B}} {\mathbf{e}}_j . \end{aligned} $$
(4.26)
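A minimal sketch of (4.25) and (4.26) for a hypothetical three-age-class model in which the absorbing states record the age class at death, so that M is diagonal in the manner of (4.15):

```python
import numpy as np

p = np.array([0.9, 0.8, 0.5])          # hypothetical survival probabilities
U = np.array([[0.0,  0.0,  0.0],
              [p[0], 0.0,  0.0],
              [0.0,  p[1], p[2]]])
M = np.diag(1.0 - p)                   # absorbing state i = "died in age class i"

N = np.linalg.inv(np.eye(3) - U)
B = M @ N                              # Eq. (4.25)

e1 = np.array([1.0, 0.0, 0.0])
psi_1 = B @ e1                         # Eq. (4.26): age-at-death distribution from age class 1

print(B.sum(axis=0))   # every column is a probability distribution
print(psi_1)           # [0.1  0.18 0.72]
```

Each column of B sums to one because every individual is eventually absorbed in exactly one of the absorbing states.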

3.5 Life Lost and Life Disparity

When an individual dies, it loses the remaining life that it would have experienced, had it not died. This counterfactual proposition seems abstract, but we can make it concrete by asking for the expectation of that lost lifetime. An individual that dies at age x will lose, on average, an amount of life given by the life expectancy at age x. Averaging this remaining life expectancy over the distribution of age at death gives the mean life lost due to mortality. Vaupel and Canudas Romo (2003) denoted the life lost by \(e^\dagger\). Here we define the vector \(\boldsymbol{\eta}^\dagger\), whose ith entry is the expected life lost due to mortality by an individual starting in age class i; it is given by

$$\displaystyle \begin{aligned} \left( \boldsymbol{\eta}^\dagger \right)^{\mathsf{T}} = \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{B}} . {} \end{aligned} $$
(4.27)

Calculations of life lost from mortality due to specific causes of death play a central role in the calculations of disability-adjusted life years (DALYs) used in calculations of the burden of diseases (e.g., Devleesschauwer et al. 2014; GBD 2016 DALYs and HALE Collaborators 2017). See Caswell and Zarulli (2018) for the relationship between DALY calculations and Markov chain methods, and for a calculation of the variance in life lost.

The life lost \(\boldsymbol{\eta}^\dagger\) has an additional interpretation as a measure of disparity. Consider a population in which everyone dies at the same age. In such a situation, \(\boldsymbol{\eta}^\dagger = 0\), because at the age of death, there is no additional life expectancy. Thus \(\boldsymbol{\eta}^\dagger\) is a measure of “life disparity;” the larger its value, the more disparity there is among individuals in age at death (Vaupel et al. 2011).

The values of life disparity in age class 1, for Japan and India, in years, are

$$\displaystyle \begin{aligned} \eta_1^\dag = \left\{ \begin{array}{rl} 10.1 & \ \mbox{Japan} \\ 23.9 & \ \mbox{India} \end{array} \right. \end{aligned} $$
(4.28)

Just as India has a much larger variance in longevity than Japan, it also has a higher life disparity.
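The calculation in (4.27) can be sketched as follows for a hypothetical three-age-class model (the values are illustrative and unrelated to the Japan and India results). The absorbing states are ages at death, so B has one row per age class and the product in (4.27) is conformable.

```python
import numpy as np

p = np.array([0.9, 0.8, 0.5])          # hypothetical survival probabilities
U = np.array([[0.0,  0.0,  0.0],
              [p[0], 0.0,  0.0],
              [0.0,  p[1], p[2]]])
M = np.diag(1.0 - p)                   # absorbing states: age class at death

N = np.linalg.inv(np.eye(3) - U)
eta1 = N.sum(axis=0)                   # life expectancy by age class, Eq. (4.21)
B = M @ N                              # distribution of age at death, Eq. (4.25)

# Eq. (4.27): average the life expectancy at each age of death over the
# distribution of age at death, for every starting age class.
eta_dagger = eta1 @ B

print(eta_dagger[0])   # 0.1*3.34 + 0.18*2.6 + 0.72*2.0 = 2.242
```

In this discrete-time sketch, occupancy is counted in whole time steps, so the results are in units of the projection interval.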

4 Sensitivity Analysis

Our goal is to obtain expressions for the derivatives of E(η), V(η), SD(η), B, and \(\boldsymbol{\eta}^\dagger\), with respect to changes in age-specific mortality rates. The calculations and some results (contrasting the mortality schedules of Japan and India) are given here. More details are presented in Chaps. 5 and 11. Results are presented in terms of an arbitrary vector θ of parameters on which U and M depend. In the examples, θ will be the vector μ of age-specific mortality rates.

4.1 Sensitivity of the Fundamental Matrix

The fundamental matrix N appears in many of these formulas. Its sensitivity was first obtained by Caswell (2006). Suppose that U is a function of some vector θ of parameters. Then

$$\displaystyle \begin{aligned} {d \mbox{vec} \, {\mathbf{N}} \over d \boldsymbol{\theta}^{\mathsf{T}}} = \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{N}} \right) {d \mbox{vec} \, {\mathbf{U}} \over d \boldsymbol{\theta}^{\mathsf{T}}} {} \end{aligned} $$
(4.29)

(see Chap. 5).
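Equation (4.29) can be verified numerically. The sketch below, which assumes a hypothetical three-age-class U, perturbs a single entry of U, applies the Kronecker-product formula to the corresponding dvec U, and compares the result with a finite-difference approximation. The vec operator stacks columns, which corresponds to column-major ("F") ordering in NumPy.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return A.reshape(-1, order="F")

# Hypothetical age-classified U with an open-ended last class.
U = np.array([[0.0, 0.0, 0.0],
              [0.9, 0.0, 0.0],
              [0.0, 0.8, 0.5]])
I = np.eye(3)
N = np.linalg.inv(I - U)

# Direction of perturbation: a unit change in the (3, 2) entry of U, i.e., p_2.
dU = np.zeros_like(U)
dU[2, 1] = 1.0

# Eq. (4.29) applied to this perturbation: dvec N = (N^T kron N) dvec U.
predicted = np.kron(N.T, N) @ vec(dU)

# Finite-difference approximation of the same derivative.
h = 1e-7
finite_diff = (vec(np.linalg.inv(I - (U + h * dU))) - vec(N)) / h

print(np.max(np.abs(predicted - finite_diff)))
```

Reshaping `predicted` back into a 3 × 3 matrix (again in column-major order) gives N dU N, the matrix form of the same result.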

4.2 Sensitivity of Life Expectancy

The sensitivity of the vector of life expectancy as a function of age is obtained by differentiating (4.21),

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} (d {\mathbf{N}}) \end{aligned} $$
(4.30)

Applying the vec operator and Roth’s theorem (2.13) gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}} \end{array} \end{aligned} $$
(4.31)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{N}} \right) d \mbox{vec} \, {\mathbf{U}} \end{array} \end{aligned} $$
(4.32)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left( {\mathbf{N}}^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{U}} . {} \end{array} \end{aligned} $$
(4.33)

The last step uses the fact that (A ⊗ B)(C ⊗ D) = (AC ⊗ BD). Applying the chain rule and the first identification theorem gives the result

$$\displaystyle \begin{aligned} {d \boldsymbol{\eta}_1 \over d \boldsymbol{\theta}^{\mathsf{T}}} = \left( {\mathbf{N}}^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) {d \mbox{vec} \, {\mathbf{U}} \over d \boldsymbol{\theta}^{\mathsf{T}}} {} \end{aligned} $$
(4.34)

Sensitivity to mortality

If interest focuses on changes in age-specific mortality, so that θ = μ, then the sensitivity formula expands, using the chain rule, to

$$\displaystyle \begin{aligned} {d \boldsymbol{\eta}_1 \over d \boldsymbol{\mu}^{\mathsf{T}}} = \left( {\mathbf{N}}^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) {d \mbox{vec} \, {\mathbf{U}} \over d \boldsymbol{\mu}^{\mathsf{T}}} \end{aligned} $$
(4.35)

This can be evaluated in several ways, depending on how the matrix U is written as a function of mortality. One approach is used in Sect. 4.4.3, and a somewhat more widely useful approach in Sect. 4.4.4.

The results for Japan and India are shown in Fig. 4.2c. Life expectancy is more sensitive to changes in mortality in Japan than in India; the (absolute value of) sensitivity decreases almost linearly with age in Japan, and slightly less linearly in India (Fig. 4.2c). On the other hand, life expectancy is more elastic to changes in mortality in India, and less so in Japan.

4.3 Generalizing the Keyfitz-Pollard Formula

The Keyfitz-Pollard formula for the sensitivity of life expectancy to changes in mortality rate, given in Eq. (4.2), has a clear interpretation: the sensitivity to mortality at age a depends on the probability of survival to age a and the remaining life expectancy at age a. We are now in a position to generalize this to stage-classified matrix models.

First, we derive the matrix version of the Keyfitz-Pollard result, for the sensitivity of life expectancy of age class 1, which is

$$\displaystyle \begin{aligned} \begin{array}{rcl} d E \left( \eta_1 \right) &\displaystyle =&\displaystyle \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}} \end{array} \end{aligned} $$
(4.36)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{N}} \right) d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned} $$
(4.37)

Consider a population with s age classes and let \(\mu_i\) be the mortality rate and \(p_i=\exp (-\mu _i) \) the survival probability for age class i. The matrix U is given by (4.13), which can be written

$$\displaystyle \begin{aligned} {\mathbf{U}} = \displaystyle \sum_{k=1}^{s-1} \left( {\mathbf{e}}_{k+1} {\mathbf{e}}_{k}^{\mathsf{T}} \right) \; p_k \end{aligned} $$
(4.38)

where \({\mathbf{e}}_k\) is the unit vector, of length s, with a 1 in the kth position and zeros elsewhere. Differentiating U and applying the vec operator gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{U}} &\displaystyle =&\displaystyle \displaystyle - \sum_{k=1}^{s-1} \left({\mathbf{e}}_k \otimes {\mathbf{e}}_{k+1} \right) \; p_k \left( d \mu_k \right) {} \end{array} \end{aligned} $$
(4.39)

Substitute (4.39) into (4.37) and consider a perturbation of mortality at age a; the result is

$$\displaystyle \begin{aligned} {d E(\eta_1) \over d \mu_a} = - \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{N}} \right) \left({\mathbf{e}}_a \otimes {\mathbf{e}}_{a+1} \right) \; p_a . \end{aligned} $$
(4.40)

This simplifies to

$$\displaystyle \begin{aligned} \begin{array}{rcl} {d E(\eta_1) \over d \mu_a} &\displaystyle =&\displaystyle - \left( {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{N}}^{\mathsf{T}} {\mathbf{e}}_a \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}} {\mathbf{e}}_{a+1} \right) p_a \end{array} \end{aligned} $$
(4.41)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle - \underbrace{E \left( \nu_{a} \right)\, p_a}_{\mathrm{survival}} \, \underbrace{E\left( \eta_{a+1} \right)}_{\mathrm{expectancy}} \qquad \mbox{age-classified} \end{array} \end{aligned} $$
(4.42)

In an age-classified model, \(\nu_a\) is either 0 or 1 (you cannot occupy a year of age for more than 1 year); hence \(E \left ( \nu _{a} \right )\) is the probability of survival to age a. Thus we have a matrix version of the Keyfitz-Pollard result: the sensitivity of life expectancy is the probability of survival to age a times the probability of survival from a to a + 1, times the life expectancy at age a + 1.

Now apply the same approach to a stage-classified model, in which U can be written as the product of a diagonal matrix Σ with survival probabilities on the diagonal, and a stochastic matrix G giving the transition probabilities conditional on survival:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{U}} &=& {\mathbf{G}} \boldsymbol{\Sigma} \end{array} \end{aligned} $$
(4.43)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &=& {\mathbf{G}} \left(\begin{array}{ccc} p_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & p_s \end{array}\right) {} \end{array} \end{aligned} $$
(4.44)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &=& \displaystyle {\mathbf{G}} \sum_{k=1}^s \left( {\mathbf{e}}_k {\mathbf{e}}_k^{\mathsf{T}} \right) \; p_k \end{array} \end{aligned} $$
(4.45)

Differentiating and applying the vec operator gives

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{U}} = \displaystyle - \sum_{k=1}^s \left( {\mathbf{e}}_k \otimes {\mathbf{G}} {\mathbf{e}}_k \right) \; p_k \left(d \mu_k \right) \end{aligned} $$
(4.46)

Substitute this into (4.37) and focus on a change in mortality at stage a; the result is

$$\displaystyle \begin{aligned} {d E(\eta_1) \over d \mu_a} = - \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{N}} \right) \left( {\mathbf{e}}_a \otimes {\mathbf{G}} {\mathbf{e}}_a \right) \; p_a \end{aligned} $$
(4.47)

which simplifies to

$$\displaystyle \begin{aligned} \begin{array}{rcl} {d E \left( \eta_1 \right) \over d \mu_a} &\displaystyle =&\displaystyle - \left( {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{N}}^{\mathsf{T}} {\mathbf{e}}_a \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}} {\mathbf{G}} {\mathbf{e}}_{a} \right) p_a \end{array} \end{aligned} $$
(4.48)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle - E \left( \nu_{a1} \right) E \left( \boldsymbol{\eta}^{\mathsf{T}} \right) {\mathbf{G}}(:,a) p_a \end{array} \end{aligned} $$
(4.49)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle =&\displaystyle - \underbrace{E \left( \nu_{a1} \right)}_{\mathrm{occupancy}} \; \sum_{h=1}^s \underbrace{p_a g_{ha}}_{\mathrm{transitions}} \; \underbrace{E \left( \eta_h \right)}_{\mathrm{expectancy}} \qquad \mbox{stage-classified} {} \end{array} \end{aligned} $$
(4.50)

Equation (4.50) is the stage-classified version of Keyfitz-Pollard: the sensitivity of life expectancy to a change in mortality in stage a is, apart from its negative sign, the product of the expected time spent in stage a and the remaining life expectancy, calculated as an average of the life expectancy of all stages h, weighted by the probability of transition from a to h. This can be simplified further by noting that, for either age- or stage-classified populations, \({\mathbf{G}}(:,a)\, p_a = {\mathbf{U}}(:,a)\), so that a completely general expression is

$$\displaystyle \begin{aligned} {d E \left( \eta_1 \right) \over d \mu_a} = - E \left( \nu_{a1} \right) E \left( \boldsymbol{\eta}^{\mathsf{T}} \right) {\mathbf{U}}(:,a) \qquad \mbox{age- or stage-classified} \end{aligned} $$
(4.51)
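The general expression (4.51) can be checked against finite differences. The sketch below assumes a hypothetical three-stage model with U = GΣ; the mortality rates and the conditional transition matrix G are illustrative values only.

```python
import numpy as np

mu = np.array([0.3, 0.1, 0.2])           # hypothetical stage-specific mortality rates
G = np.array([[0.5, 0.0, 0.0],           # hypothetical transition probabilities,
              [0.5, 0.6, 0.1],           # conditional on survival (columns sum to one)
              [0.0, 0.4, 0.9]])
s = len(mu)

def life_expectancy(mu):
    """Return U, N, and the life expectancy vector eta_1 for mortality rates mu."""
    U = G @ np.diag(np.exp(-mu))         # Eq. (4.43), with p_i = exp(-mu_i)
    N = np.linalg.inv(np.eye(s) - U)
    return U, N, N.sum(axis=0)

U, N, eta = life_expectancy(mu)

# Eq. (4.51): d E(eta_1) / d mu_a = - E(nu_{a1}) * eta^T U(:, a),
# where E(nu_{a1}) is the (a, 1) entry of N.
analytic = np.array([-N[a, 0] * (eta @ U[:, a]) for a in range(s)])

# Forward finite differences of the life expectancy of stage 1.
h = 1e-6
numerical = np.empty(s)
for a in range(s):
    mu_h = mu.copy()
    mu_h[a] += h
    numerical[a] = (life_expectancy(mu_h)[2][0] - eta[0]) / h

print(analytic)
print(numerical)
```

All sensitivities are negative, and the largest in magnitude belong to stages that combine a long expected occupancy with a high life expectancy among the stages reached next.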

4.4 Sensitivity of the Variance of Longevity

The sensitivity of the variance in longevity is obtained by differentiating (4.23)

$$\displaystyle \begin{aligned} d V \left( \boldsymbol{\eta} \right) = d \boldsymbol{\eta}_2 - 2 \left( \boldsymbol{\eta}_1 \circ d \boldsymbol{\eta}_1 \right) \end{aligned} $$
(4.52)

and applying the vec operator (using results from Chap. 2 on the vec of the Hadamard product), to obtain

$$\displaystyle \begin{aligned} d V \left( \boldsymbol{\eta} \right) = d \boldsymbol{\eta}_2 - 2 \mathcal{D}\,(\boldsymbol{\eta}_1) d \boldsymbol{\eta}_1. {} \end{aligned} $$
(4.53)

The derivative of \(\boldsymbol{\eta}_1\) is already given by (4.33):

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}_1 = \left( {\mathbf{N}}^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{U}} . {} \end{aligned} $$
(4.54)

The derivative of \(\boldsymbol{\eta}_2\) is obtained by differentiating (4.22):

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}_2^{\mathsf{T}} = 2 \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) {\mathbf{N}} + 2 \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{N}} \right) - d \boldsymbol{\eta}_1^{\mathsf{T}} \end{aligned} $$
(4.55)

Applying the vec operator to both sides and substituting (4.29) for dvec N gives

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}_2 = \left( 2 {\mathbf{N}}^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 + 2 \left( {\mathbf{N}}^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}} \right) d \mbox{vec} \, {\mathbf{U}} {} \end{aligned} $$
(4.56)

Inserting (4.54) for \(d \boldsymbol{\eta}_1\) and (4.56) for \(d \boldsymbol{\eta}_2\) into (4.53) gives the sensitivity of the variance in remaining longevity, for any starting age or stage, to changes in U. The sensitivity of the variance to mortality is then obtained by differentiating U with respect to μ.

Derivatives of U

The derivatives of U with respect to the mortality vector μ are obtained as follows. For an age-classified model, define an age-advancement matrix

$$\displaystyle \begin{aligned} {\mathbf{L}} = \left(\begin{array}{ccc} 0 & 0 & 0\\ 1 & 0 & 0 \\ 0 & 1 & [1] \end{array}\right) \end{aligned} $$
(4.57)

(shown here for three age classes, with the optional open-ended last age class). This matrix will mask the entries of a matrix \(\mathbf{1} {\mathbf{p}}^{\mathsf{T}}\), which contains \({\mathbf{p}}^{\mathsf{T}}\) in each row, to obtain

$$\displaystyle \begin{aligned} {\mathbf{U}} = {\mathbf{L}} \circ \left( \mathbf{1} {\mathbf{p}}^{\mathsf{T}} \right) \end{aligned} $$
(4.58)

Differentiating and applying the vec operator gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{U}} &\displaystyle =&\displaystyle {\mathbf{L}} \circ \left(\rule{0in}{2ex} \mathbf{1} \left( d{\mathbf{p}}^{\mathsf{T}} \right) \right) \end{array} \end{aligned} $$
(4.59)
$$\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{U}} &\displaystyle =&\displaystyle \mathcal{D}\,(\mbox{vec} \, {\mathbf{L}} ) \left({\mathbf{I}} \otimes \mathbf{1} \right) d {\mathbf{p}} . {} \end{array} \end{aligned} $$
(4.60)

Since \({\mathbf {p}} = \exp ( - \boldsymbol {\mu } )\),

$$\displaystyle \begin{aligned} d {\mathbf{p}} = - \mathcal{D}\,({\mathbf{p}}) d \boldsymbol{\mu}, \end{aligned} $$
(4.61)

and hence

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{U}}= - \mathcal{D}\,(\mbox{vec} \, {\mathbf{L}} ) \left({\mathbf{I}} \otimes \mathbf{1} \right) \mathcal{D}\,({\mathbf{p}}) d \boldsymbol{\mu} \qquad \mbox{age-classified} {} \end{aligned} $$
(4.62)
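
A minimal sketch of (4.57), (4.58) and (4.62) in NumPy follows. The function names and the hazard values in `mu_example` are hypothetical, and vec is taken to stack columns, as is usual in matrix calculus.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a 1-D array."""
    return np.asarray(X).reshape(-1, order="F")

def age_advancement_matrix(s, open_ended=True):
    """L in (4.57): ones on the subdiagonal, optional 1 in the last diagonal entry."""
    L = np.diag(np.ones(s - 1), k=-1)
    if open_ended:
        L[-1, -1] = 1.0
    return L

def transient_matrix_age(mu, open_ended=True):
    """U = L o (1 p^T) with p = exp(-mu), as in (4.58)."""
    mu = np.asarray(mu, dtype=float)
    s = mu.size
    p = np.exp(-mu)
    return age_advancement_matrix(s, open_ended) * np.outer(np.ones(s), p)

def dvecU_dmu_age(mu, open_ended=True):
    """Jacobian d vec U / d mu^T from (4.62); shape (s^2, s)."""
    mu = np.asarray(mu, dtype=float)
    s = mu.size
    p = np.exp(-mu)
    L = age_advancement_matrix(s, open_ended)
    I_kron_1 = np.kron(np.eye(s), np.ones((s, 1)))   # (I kron 1), shape (s^2, s)
    return -np.diag(vec(L)) @ I_kron_1 @ np.diag(p)

mu_example = np.array([0.10, 0.20, 0.50])   # hypothetical age-specific hazards
J_age = dvecU_dmu_age(mu_example)
```

Column j of `J_age` is nonzero only in the position of vec U that holds the survival probability of age class j, where it equals the negative of that probability.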

For a stage-classified model, write U = GΣ, as in (4.44), in the form

$$\displaystyle \begin{aligned} {\mathbf{U}} = {\mathbf{G}} \left[ {\mathbf{I}} \circ \left( \mathbf{1} {\mathbf{p}}^{\mathsf{T}} \right) \right] \end{aligned} $$
(4.63)

Differentiating and applying the vec operator, following the strategy of (4.60), gives

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{U}} = - \left( {\mathbf{I}} \otimes {\mathbf{G}}\right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}} ) \left({\mathbf{I}} \otimes \mathbf{1} \right) \mathcal{D}\,({\mathbf{p}}) d \boldsymbol{\mu} \qquad \mbox{stage-classified} {} \end{aligned} $$
(4.64)
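
The stage-classified result (4.64) can be sketched in the same way. Here G is taken to be a matrix of transition probabilities conditional on survival; the example matrix `G_example` and the hazards `mu_stage` are hypothetical values chosen only for illustration.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a 1-D array."""
    return np.asarray(X).reshape(-1, order="F")

def transient_matrix_stage(G, mu):
    """U = G [I o (1 p^T)] = G D(p) with p = exp(-mu), as in (4.63)."""
    p = np.exp(-np.asarray(mu, dtype=float))
    return G @ np.diag(p)

def dvecU_dmu_stage(G, mu):
    """Jacobian d vec U / d mu^T from (4.64); shape (s^2, s)."""
    mu = np.asarray(mu, dtype=float)
    s = mu.size
    p = np.exp(-mu)
    I = np.eye(s)
    I_kron_1 = np.kron(I, np.ones((s, 1)))   # (I kron 1), shape (s^2, s)
    return -np.kron(I, G) @ np.diag(vec(I)) @ I_kron_1 @ np.diag(p)

# Hypothetical transitions conditional on survival (columns sum to 1)
G_example = np.array([[0.6, 0.0, 0.0],
                      [0.4, 0.7, 0.0],
                      [0.0, 0.3, 1.0]])
mu_stage = np.array([0.30, 0.15, 0.40])   # hypothetical stage-specific hazards
J_stage = dvecU_dmu_stage(G_example, mu_stage)
```

Because mortality in stage j scales the whole of column j of U, column j of `J_stage` contains the negative of that column of U in the corresponding block and zeros elsewhere.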

Substituting (4.62) and (4.64) into the expressions for \(d\boldsymbol{\eta}_1\) and \(d\boldsymbol{\eta}_2\), and substituting those into (4.53) gives the sensitivity of the variance in longevity to age- or stage-specific mortality. It is possible to carry out the substitutions and to arrive at a single (large) expression for \(dV(\boldsymbol{\eta})\); see Chap. 5.

Figure 4.2d shows the sensitivity and elasticity of the variance of longevity to changes in age-specific mortality. The variance is more sensitive to mortality changes in Japan than in India, and the sensitivities are highest at young ages. Both life tables have the property that sensitivities are positive at early ages (≈0–20 for India, ≈0–80 for Japan) and then become negative. Before the age at which the sign changes, reductions in mortality reduce the variance; after that age, reductions in mortality increase the variance. See Sect. 4.4.6 for more on this.

4.5 Sensitivity of the Distribution of Age at Death

The sensitivity of the distribution of age or stage at death is obtained by differentiating (4.25) and applying the vec operator,

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}} = \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left({\mathbf{I}} \otimes {\mathbf{M}} \right) d \mbox{vec} \, {\mathbf{N}}. \end{aligned} $$
(4.65)

We already know \(d \mbox{vec} \, {\mathbf{N}}\). To obtain \(d \mbox{vec} \, {\mathbf{M}}\), note that when the absorbing states are defined in terms of stage at death

$$\displaystyle \begin{aligned} {\mathbf{M}} = {\mathbf{I}} - \mathcal{D}\,( {\mathbf{p}}) \end{aligned} $$
(4.66)

and thus

$$\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{M}} = - \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}} ) \left( {\mathbf{I}} \otimes \mathbf{1} \right) d {\mathbf{p}} \end{aligned} $$
(4.67)

It is revealing to write the sensitivity of B to changes in mortality using the chain rule,

$$\displaystyle \begin{aligned} {d \mbox{vec} \, {\mathbf{B}} \over d \boldsymbol{\mu}^{\mathsf{T}}} = \left( {\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{I}} \right) {d \mbox{vec} \, {\mathbf{M}} \over d {\mathbf{p}}^{\mathsf{T}}} {d {\mathbf{p}} \over d \boldsymbol{\mu}^{\mathsf{T}}} + \left({\mathbf{I}} \otimes {\mathbf{M}} \right) {d \mbox{vec} \, {\mathbf{N}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{U}}} {d \mbox{vec} \, {\mathbf{U}} \over d {\mathbf{p}}^{\mathsf{T}}} {d {\mathbf{p}} \over d \boldsymbol{\mu}^{\mathsf{T}}} \end{aligned} $$
(4.68)

and to recognize how many of the pieces we have already obtained.

The distribution of stage at death for individuals starting in stage j is given by column j of B; i.e., \(\boldsymbol{\psi}_j = {\mathbf{B}}(:, j)\). The sensitivity of \(\boldsymbol{\psi}_j\) to changes in mortality is

$$\displaystyle \begin{aligned} {d \boldsymbol{\psi}_j \over d \boldsymbol{\mu}^{\mathsf{T}}} = \left({\mathbf{e}}_j^{\mathsf{T}} \otimes {\mathbf{I}} \right) {d \mbox{vec} \, {\mathbf{B}} \over d \boldsymbol{\mu}^{\mathsf{T}}} \end{aligned} $$
(4.69)

for any age or stage j of interest.
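
The chain rule (4.68) and the column selection (4.69) can be assembled as in the sketch below, written for an age-classified model with an open-ended last age class. It assumes \(d \mbox{vec} \, {\mathbf{N}} = ({\mathbf{N}}^{\mathsf{T}} \otimes {\mathbf{N}}) \, d \mbox{vec} \, {\mathbf{U}}\), which follows from \(d{\mathbf{N}} = {\mathbf{N}} (d{\mathbf{U}}) {\mathbf{N}}\), and it selects column j with the row vector \({\mathbf{e}}_j^{\mathsf{T}}\) so that the dimensions conform. Function names and hazard values are hypothetical.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a 1-D array."""
    return np.asarray(X).reshape(-1, order="F")

def age_model(mu):
    """Return p, L, U, M, N, B for an age-classified model (open-ended last class)."""
    mu = np.asarray(mu, dtype=float)
    s = mu.size
    p = np.exp(-mu)
    L = np.diag(np.ones(s - 1), k=-1)
    L[-1, -1] = 1.0
    U = L * p[None, :]                      # (4.58)
    M = np.eye(s) - np.diag(p)              # (4.66)
    N = np.linalg.inv(np.eye(s) - U)
    return p, L, U, M, N, M @ N             # B = M N

def dvecB_dmu(mu):
    """Jacobian d vec B / d mu^T assembled term by term as in (4.68)."""
    p, L, U, M, N, B = age_model(mu)
    s = p.size
    I = np.eye(s)
    I_kron_1 = np.kron(I, np.ones((s, 1)))
    dp_dmu = -np.diag(p)                        # (4.61)
    dvecM_dp = -np.diag(vec(I)) @ I_kron_1      # (4.67)
    dvecU_dp = np.diag(vec(L)) @ I_kron_1       # (4.60)
    dvecN_dvecU = np.kron(N.T, N)               # assumed: dN = N dU N
    return (np.kron(N.T, I) @ dvecM_dp @ dp_dmu
            + np.kron(I, M) @ dvecN_dvecU @ dvecU_dp @ dp_dmu)

def dpsi_dmu(mu, j):
    """Sensitivity of column j of B (0-based) to mu, as in (4.69)."""
    s = len(mu)
    e_j = np.zeros((1, s))
    e_j[0, j] = 1.0
    return np.kron(e_j, np.eye(s)) @ dvecB_dmu(mu)

mu_example = np.array([0.05, 0.10, 0.30, 0.60])   # hypothetical hazards
J_psi = dpsi_dmu(mu_example, 0)                   # age at death, starting at birth
```

Since each column of B is a probability distribution, the columns of `J_psi` sum to zero: a change in mortality redistributes deaths among ages without changing their total.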

4.6 Sensitivity of Life Disparity

To get the sensitivity of the vector \(\boldsymbol{\eta}^\dagger\), differentiate and apply the vec operator to Eq. (4.27), which gives

$$\displaystyle \begin{aligned} d \boldsymbol{\eta}^\dagger = {\mathbf{B}}^{\mathsf{T}} d \boldsymbol{\eta}_1 + \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}}. {} \end{aligned} $$
(4.70)
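
The sketch below evaluates (4.70) with respect to mortality for an age-classified model with an open-ended last age class. It assumes that Eq. (4.27) has the form \(\boldsymbol{\eta}^\dagger = {\mathbf{B}}^{\mathsf{T}} \boldsymbol{\eta}_1\), which is the form implied by (4.70), together with \(\boldsymbol{\eta}_1^{\mathsf{T}} = \mathbf{1}^{\mathsf{T}} {\mathbf{N}}\) and \(d{\mathbf{N}} = {\mathbf{N}} (d{\mathbf{U}}) {\mathbf{N}}\). Function names and hazard values are hypothetical.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a 1-D array."""
    return np.asarray(X).reshape(-1, order="F")

def disparity(mu):
    """Return eta_dagger = B^T eta1 and the intermediate quantities."""
    mu = np.asarray(mu, dtype=float)
    s = mu.size
    p = np.exp(-mu)
    L = np.diag(np.ones(s - 1), k=-1)
    L[-1, -1] = 1.0
    U = L * p[None, :]
    N = np.linalg.inv(np.eye(s) - U)
    M = np.eye(s) - np.diag(p)
    B = M @ N
    eta1 = N.T @ np.ones(s)
    return B.T @ eta1, (p, L, M, N, B, eta1)

def d_disparity_dmu(mu):
    """Jacobian d eta_dagger / d mu^T, combining (4.62), (4.65), (4.67), (4.70)."""
    _, (p, L, M, N, B, eta1) = disparity(mu)
    s = p.size
    I = np.eye(s)
    I_kron_1 = np.kron(I, np.ones((s, 1)))
    dvecU = -np.diag(vec(L)) @ I_kron_1 @ np.diag(p)     # (4.62)
    dvecM = np.diag(vec(I)) @ I_kron_1 @ np.diag(p)      # (4.67) with (4.61)
    dvecN = np.kron(N.T, N) @ dvecU                      # assumed: dN = N dU N
    dvecB = np.kron(N.T, I) @ dvecM + np.kron(I, M) @ dvecN   # (4.65)
    d_eta1 = np.kron(N.T, eta1[None, :]) @ dvecU         # from eta1^T = 1^T N
    return B.T @ d_eta1 + np.kron(I, eta1[None, :]) @ dvecB   # (4.70)

mu_example = np.array([0.05, 0.10, 0.30, 0.60])   # hypothetical hazards
J_dagger = d_disparity_dmu(mu_example)
```

Row i of `J_dagger` gives the sensitivity of life disparity for an individual starting in age class i to mortality in each age class; plotting a row against age is one way to look for the sign change discussed in the text.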

Evaluating this expression for the data on India and Japan, we see that the sensitivity of \(\boldsymbol{\eta}^\dagger\) shows a pattern similar to that of the sensitivity of \(V(\boldsymbol{\eta})\) (Fig. 4.2), confirming that these indices are measuring similar aspects of disparity in longevity.

In particular, they show the existence of a critical age, before which reductions in mortality reduce disparity and after which they have the opposite effect. Zhang and Vaupel (2009) showed that this critical age, which they describe as separating “early” from “late” deaths, is a general property of \(\boldsymbol{\eta}^\dagger\). Although the details depend on which index of disparity one uses, the existence of a critical age separating positive and negative sensitivities is also a property of other measures of variation in longevity (Van Raalte and Caswell 2013). Vaupel et al. (2011) have used the critical age to decompose historical changes in lifespan disparity into components due to early and late mortality.

5 A Time-Series LTRE Decomposition: Life Disparity

The LTRE decomposition analysis in Sect. 2.9 can be used to decompose time series such as these into their components. We apply it here to calculate the contributions, to a long trajectory of changes in \(\boldsymbol{\eta}^\dagger\), of changes in early and late mortality.

Suppose that some demographic outcome ξ(t) (dimension s × 1) is measured as a function of a parameter vector θ (dimension p × 1), at times 1, 2, …, T. The changes in ξ(t) over time result from the changes in the parameters,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta \boldsymbol{\xi}(t) &\displaystyle =&\displaystyle \boldsymbol{\xi}(t+1) - \boldsymbol{\xi}(t) \end{array} \end{aligned} $$
(4.71)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta \boldsymbol{\theta}(t) &\displaystyle =&\displaystyle \boldsymbol{\theta}(t+1) - \boldsymbol{\theta}(t) \end{array} \end{aligned} $$
(4.72)

The decomposition analysis for such sequences was introduced as a “regression LTRE” method in the context of ecotoxicology and response to environmental factors (e.g., Caswell 1996; Knight et al. 2009). The same approach was introduced independently by Horiuchi et al. (2008) to decompose differences between two conditions by imagining a continuous path from one to the other.

The analysis starts by considering the change in ξ over time,

$$\displaystyle \begin{aligned} {d \boldsymbol{\xi}(t) \over d t} = {d \boldsymbol{\xi}(t) \over d \boldsymbol{\theta}^{\mathsf{T}}(t)} {d \boldsymbol{\theta}(t) \over d t} \end{aligned} $$
(4.73)

If the time series is evaluated at discrete times t = 1, …, T, then to first order

$$\displaystyle \begin{aligned} \Delta \boldsymbol{\xi}(t) \approx {d \boldsymbol{\xi}(t) \over d \boldsymbol{\theta}^{\mathsf{T}} (t)} \Delta \boldsymbol{\theta}(t) \qquad s \times 1 \end{aligned} $$
(4.74)

The contributions to Δξ(t) are displayed separately in a contribution matrix

$$\displaystyle \begin{aligned} {\mathbf{C}}(t) = {d \boldsymbol{\xi}(t) \over d \boldsymbol{\theta}^{\mathsf{T}} (t)} \, \mathcal{D}\,[\Delta \boldsymbol{\theta}(t) ] \qquad s \times p \end{aligned} $$
(4.75)

The (i, j) entry of C(t) is the contribution of \(\Delta \theta_j(t)\) to \(\Delta \xi_i(t)\). The contributions are additive over time, so the contributions of all the changes, integrated from \(t_1\) to \(t_2\), are given by the entries of

$$\displaystyle \begin{aligned} {\mathbf{C}} \left( t_1, t_2 \right) = \sum_{t=t_1}^{t_2} {\mathbf{C}}(t) \end{aligned} $$
(4.76)
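
Equations (4.75) and (4.76) can be sketched as follows. The Jacobian is supplied by the caller and is evaluated at θ(t), as in (4.74). The example uses a hypothetical linear outcome ξ = Aθ, for which the first-order approximation is exact, so the row sums of the cumulative contribution matrix reproduce the total change in ξ. Time indices in the code are 0-based.

```python
import numpy as np

def contribution_matrices(jacobian, thetas):
    """C(t) = [d xi / d theta^T at theta(t)] D[theta(t+1) - theta(t)], eq. (4.75).

    jacobian: callable mapping a parameter vector (length p) to an s x p array.
    thetas:   array of shape (T, p), one parameter vector per time.
    Returns an array of shape (T - 1, s, p).
    """
    thetas = np.asarray(thetas, dtype=float)
    deltas = np.diff(thetas, axis=0)
    # Multiplying column j by delta_j is the same as J @ diag(delta)
    return np.stack([jacobian(th) * d[None, :]
                     for th, d in zip(thetas[:-1], deltas)])

def cumulative_contributions(C, t1, t2):
    """Sum C(t) for t = t1, ..., t2 inclusive (0-based), eq. (4.76)."""
    return C[t1:t2 + 1].sum(axis=0)

# Hypothetical linear outcome xi = A theta, so d xi / d theta^T = A everywhere
A = np.array([[1.0, -2.0, 0.5],
              [0.0,  3.0, 1.0]])
thetas = np.array([[0.30, 0.20, 0.10],
                   [0.25, 0.22, 0.10],
                   [0.20, 0.21, 0.05]])
C = contribution_matrices(lambda th: A, thetas)
C_total = cumulative_contributions(C, 0, 1)
```

For a nonlinear outcome the row sums of `C_total` only approximate the observed change, and the size of the discrepancy gives a measure of the accuracy of the decomposition.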

Suppose the dependent variable is \(\boldsymbol{\xi} = \boldsymbol{\eta}^\dagger\) and the parameter vector is θ = μ.

At each time and for each age, we aggregate the contributions from early and late mortality. Let X be an indicator matrix whose entries define whether a particular entry of C(t) is to be counted as early or late:

$$\displaystyle \begin{aligned} x_{ij} = \left\{ \begin{array}{cl} 1 & \ \theta_j\mbox{ contributes to }\Delta \xi_i \\ 0 & \ \mbox{otherwise} \end{array} \right. \end{aligned} $$
(4.77)

Then

$$\displaystyle \begin{aligned} {\mathbf{c}}(t) = \left( {\mathbf{C}}(t) \circ {\mathbf{X}} \right) \mathbf{1}\end{aligned} $$
(4.78)

is a vector giving the contributions to the change in ξ from the parameters chosen in X. Defining \({\mathbf{X}}_{\mathrm{early}}\) and \({\mathbf{X}}_{\mathrm{late}}\) gives changes at time t due to early and late mortality. The LTRE analysis is then

$$\displaystyle \begin{aligned} {\mathbf{c}}_{\mathrm{early}}(t_1, t_2) = \sum_{t=t_1}^{t_2} {\mathbf{c}}_{\mathrm{early}} (t) \end{aligned} $$
(4.79)

and similarly for \({\mathbf{c}}_{\mathrm{late}}(t_1, t_2)\).
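
The aggregation in (4.78) and (4.79) can be sketched as below. For simplicity the indicator matrices are built from a single cutoff index shared by all rows and all times; this is an assumption of the sketch, and in an application the division between early and late could differ by starting age and by time. The contribution array `C` holds hypothetical values.

```python
import numpy as np

def indicator_matrices(s, p, cutoff):
    """X_early marks parameters with index < cutoff; X_late marks the rest.

    Using one cutoff for every row is a simplifying assumption.
    """
    X_early = np.zeros((s, p))
    X_early[:, :cutoff] = 1.0
    return X_early, 1.0 - X_early

def aggregate(C_t, X):
    """c(t) = (C(t) o X) 1, eq. (4.78)."""
    return (C_t * X) @ np.ones(C_t.shape[1])

def cumulative(C, X, t1, t2):
    """Sum of c(t) for t = t1, ..., t2 inclusive (0-based), eq. (4.79)."""
    return sum(aggregate(C[t], X) for t in range(t1, t2 + 1))

# Hypothetical contribution matrices: 2 time steps, s = 3 outcomes, p = 4 parameters
C = np.arange(24, dtype=float).reshape(2, 3, 4) / 10
X_early, X_late = indicator_matrices(3, 4, cutoff=2)
c_early = cumulative(C, X_early, 0, 1)
c_late = cumulative(C, X_late, 0, 1)
```

Because every parameter is assigned to exactly one of the two groups, `c_early + c_late` equals the row sums of the summed contribution matrices, that is, the total first-order change.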

As an example, Fig. 4.3a, b shows a time series of life expectancy (increasing from about 40 to about 80 years between 1800 and 2010) and life disparity for Swedish females, based on data from the Human Mortality Database (2016). As in most developed countries, life disparity at birth dropped dramatically from 1850 to about 1950 (e.g., Edwards 2011; Vaupel et al. 2011). Declines at later ages were less dramatic, and remaining life disparity conditional on survival to age 50 has been almost flat (Engelman et al. 2014). How did changes in early and late mortality contribute to these patterns?

Fig. 4.3

(a) Historical trends in life expectancy at birth from 1800 to 2010. (b) Historical trends in life disparity (mean years of life lost due to mortality) for ages 0 and 50 years. (c) Contributions from early and late mortality improvement to the change in disparity at age 0. (d) The contributions for disparity at age 50. (Data for Swedish females, from the Human Mortality Database)

Figure 4.3c, d show the cumulative sums of the contributions \({\mathbf{c}}_{\mathrm{early}}\) and \({\mathbf{c}}_{\mathrm{late}}\), and their total, for ages 0 and 50. The decline in life disparity at birth was driven almost completely by improvements in early mortality, which completely overshadowed a small increase in disparity that was generated by improvements in late life mortality. The picture for remaining life disparity at age 50 is different: the contributions from changes in early and late life mortality almost completely cancel each other out. These patterns, looking at the details of a single time series, agree with the much more general exploration of multiple countries, using a different approach, by Vaupel et al. (2011).

The accuracy of the decomposition can be evaluated by comparing the time series calculated from the total contributions, as shown in Fig. 4.3c, d, with the observed series, as shown in Fig. 4.3b. The agreement is extremely close; the LTRE decomposition captures the end result of the historical changes from 1800 to 2010 with an error of less than 0.1%.

6 Conclusion

This chapter and Chap. 3 contain examples of different approaches to the sensitivity analysis of population growth rate and longevity, respectively. The power and flexibility of matrix calculus methods are apparent: the models are not restricted to age- or stage-classification, the absorbing states may be a single category of death or some more diverse set, the demographic outcomes are not limited to expectations, and the independent variables, the parameters that are being perturbed, can be anything of interest. The only requirement is that a chain of functional dependence can be followed: the outcome ξ depends on U, which depends on p, which depends on μ, …and so on. Mortality might depend on health status, which might depend on income level, which might depend on education, …, and so on. The sensitivity of ξ to any of these parameters is an application of the chain rule.