# Sensitivity Analysis of Discrete Markov Chains

• Hal Caswell
Open Access
Chapter
Part of the Demographic Research Monographs book series (DEMOGRAPHIC)

## Abstract

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain.

## 11.1 Introduction

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain. This chapter revisits, in a more rigorous way, some of the quantities already explored for absorbing Markov chains in earlier chapters. It will also consider ergodic Markov chains (in which no absorbing states exist), and calculate the sensitivity of the stationary distribution and of measures of the rate of convergence.

Perturbation (or sensitivity) analysis is a long-standing problem in the theory of Markov chains (Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986; Funderlic and Meyer 1986; Seneta 1988, 1993; Meyer 1994; Cho and Meyer 2000; Mitrophanov 2003, 2005; Mitrophanov et al. 2005; Kirkland et al. 2008). When Markov chains are applied as models of physical, biological, or social systems, they are often defined as functions of parameters that have substantive meaning.

## 11.2 Absorbing Chains

The transition matrix for a discrete-time absorbing chain can be written
\displaystyle \begin{aligned} {\mathbf{P}} = \left( \begin{array}{c|c} {\mathbf{U}} & \mathbf{0} \\ \hline {\mathbf{M}} & {\mathbf{I}} \end{array} \right) {} \end{aligned}
(11.1)
where U, of dimension s × s, is the transition matrix among the s transient states, and M, of dimension a × s, contains probabilities of transition from the transient states to the a absorbing states. Assume that the spectral radius of U is strictly less than 1. Because we are concerned here with absorption, but not what happens after, we ignore transitions among absorbing states; hence the identity matrix (a × a) in the lower right corner. The matrices U[θ] and M[θ] are functions of a vector of parameters. We assume that θ varies over some set in which the column sums of P are 1 and the spectral radius of U is strictly less than one.

### 11.2.1 Occupancy: Visits to Transient States

Let νij be the number of visits to transient state i, prior to absorption, by an individual starting in transient state j. The expectations of the νij are the entries of the fundamental matrix $${\mathbf {N}} = {\mathbf {N}}_1 = \left ( E(\nu _{ij}^{~}) \right )$$:
\displaystyle \begin{aligned} {\mathbf{N}} = \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} {} \end{aligned}
(11.2)
(e.g., Kemeny and Snell 1960; Iosifescu 1980). Let $${\mathbf {N}}_k = \left ( E(\nu _{ij}^k) \right )$$ be a matrix containing the kth moments about the origin of the νij. The first several of these matrices are (Iosifescu 1980, Thm. 3.1)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} - {\mathbf{U}} \right)^{-1} {} \end{array} \end{aligned}
(11.3)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.4)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_3 &\displaystyle =&\displaystyle \left( 6 {\mathbf{N}}^{2}_{\mathrm{dg}} - 6 {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{I}} \right) {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.5)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_4 &\displaystyle =&\displaystyle \left( 24 {\mathbf{N}}^{3}_{\mathrm{dg}} -36 {\mathbf{N}}^{2}_{\mathrm{dg}} +14 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) {\mathbf{N}}_1.{} \end{array} \end{aligned}
(11.6)
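These moment formulas translate directly into code. A minimal numpy sketch, using a small hypothetical transient matrix U (the numerical values are illustrative assumptions, not taken from the chapter):

```python
import numpy as np

# Hypothetical 3-state transient matrix U; column sums are less than 1,
# so every state has positive probability of eventual absorption.
U = np.array([[0.2, 0.1, 0.0],
              [0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
s = U.shape[0]
I = np.eye(s)

N1 = np.linalg.inv(I - U)                # fundamental matrix, Eq. (11.3)
Ndg = np.diag(np.diag(N1))               # diagonal part of N1

N2 = (2 * Ndg - I) @ N1                  # second moments, Eq. (11.4)
N3 = (6 * Ndg @ Ndg - 6 * Ndg + I) @ N1  # third moments, Eq. (11.5)
N4 = (24 * np.linalg.matrix_power(Ndg, 3)
      - 36 * Ndg @ Ndg + 14 * Ndg - I) @ N1   # fourth moments, Eq. (11.6)

V = N2 - N1 * N1    # elementwise: Var(nu_ij) = E(nu^2) - (E nu)^2
```

The identity N1 = I + U N1 provides a quick consistency check on the computed fundamental matrix.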

### Theorem 11.2.1

Let Nk be the matrix of kth moments of the νij, as given by (11.3), (11.4), (11.5), and (11.6). The sensitivities of Nk, for k = 1, …, 4, are
\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d {vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(11.7)
\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) \right] d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.8)
\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_3 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 6 {\mathbf{N}}^{2}_{\mathrm{dg}} - 6 {\mathbf{N}}_{\mathrm{dg}} + {\mathbf{I}} \right) + 6 \left( {\mathbf{N}}_1^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}} \otimes {\mathbf{I}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) \right] d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.9)
\displaystyle \begin{aligned} \begin{array}{rcl} d {vec} \, {\mathbf{N}}_4 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 24 {\mathbf{N}}^{3}_{\mathrm{dg}} - 36 {\mathbf{N}}^{2}_{\mathrm{dg}} + 14 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right. \\ &\displaystyle &\displaystyle \left. + \left( 24 \, {\mathbf{N}}_1^{\mathsf{T}} {\mathbf{N}}^{2}_{\mathrm{dg}} \otimes {\mathbf{I}} + 24 \, {\mathbf{N}}_1^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}} \otimes {\mathbf{N}}_{\mathrm{dg}} + 24 \, {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}^{2}_{\mathrm{dg}} - 36 \, {\mathbf{N}}_1^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}} \otimes {\mathbf{I}} - 36 \, {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} + 14 \, {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) \right] d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.10)
where
\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{N}}_{\mathrm{dg}} &\displaystyle =&\displaystyle {\mathbf{I}} \circ d {\mathbf{N}}_1 \end{array} \end{aligned}
(11.11)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mathrm{vec} \, {\mathbf{N}}_{\mathrm{dg}} &\displaystyle =&\displaystyle \mathcal{D}\,(\mathrm{vec} \, {\mathbf{I}} ) d \mathrm{vec} \, {\mathbf{N}}_1. {} \end{array} \end{aligned}
(11.12)

### Proof

The result (11.7) is derived in Caswell (2006, Section 3.1). For k > 1, and considering Nk as a function of N1 and Ndg, the total differential of Nk is
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_k = {\partial \mbox{vec} \, {\mathbf{N}}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_1} d \mbox{vec} \, {\mathbf{N}}_1 + {\partial \mbox{vec} \, {\mathbf{N}}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}} d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}. {} \end{aligned}
(11.13)
The two terms of (11.13) are the partial differentials of vec Nk, obtained by taking differentials treating only N1 or only Ndg as variable, respectively. Denote these partial differentials by $$\partial _{\mbox{ {{\mathbf {N}}_1}}}$$ and $$\partial _{\mbox{ {{\mathbf {N}}_{\mathrm {dg}}}}}$$. Differentiating N2 in (11.4) gives
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {{\mathbf{N}}_1}}} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 {\mathbf{N}}_{\mathrm{dg}} \left(d {\mathbf{N}}_1 \right) - d {\mathbf{N}}_1 \end{array} \end{aligned}
(11.14)
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {{\mathbf{N}}_{\mathrm{dg}}}}} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left( d {\mathbf{N}}_{\mathrm{dg}} \right) {\mathbf{N}}_1. \end{array} \end{aligned}
(11.15)
Applying the vec operator gives
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {{\mathbf{N}}_1}}} \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) \right] d \mbox{vec} \, {\mathbf{N}}_1 \end{array} \end{aligned}
(11.16)
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {{\mathbf{N}}_{\mathrm{dg}}}}} \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}, \end{array} \end{aligned}
(11.17)
and (11.13) becomes
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_2 = \left[ {\mathbf{I}} \otimes \left( 2 {\mathbf{N}}_{\mathrm{dg}} - {\mathbf{I}} \right) + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) \right] d \mbox{vec} \, {\mathbf{N}}_1 {} \end{aligned}
(11.18)
which is (11.8). The derivations of dvec N3 and dvec N4 follow the same sequence of steps. The details are given in Appendix A. □

The derivatives of N2, N3, and N4 can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).
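The first of these derivatives, d vec N1 = (N1ᵀ ⊗ N1) d vec U (the standard matrix-calculus differential of a matrix inverse), can be checked against finite differences. A sketch with a hypothetical 2-state U; note that vec stacks columns, so column-major flattening is used:

```python
import numpy as np

U = np.array([[0.2, 0.1],
              [0.5, 0.3]])
s = U.shape[0]
I = np.eye(s)
N1 = np.linalg.inv(I - U)

# Jacobian of vec N1 with respect to vec U: (N1' kron N1), s^2 x s^2
J = np.kron(N1.T, N1)

# Finite-difference check for a perturbation of u_21
h = 1e-7
dU = np.zeros_like(U)
dU[1, 0] = h
fd = (np.linalg.inv(I - (U + dU)) - N1).flatten(order='F') / h  # numerical
an = J @ dU.flatten(order='F') / h                              # analytical
```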

### 11.2.2 Time to Absorption

Let ηj be the time to absorption starting in transient state j and let $$\boldsymbol {\eta }_k = E \left (\begin {array}{ccc} \eta _1^k, \cdots ,\eta _s^k \end {array}\right )^{\mathsf {T}}$$. The first several of these moments are (Iosifescu 1980, Thm. 3.2)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_1^{\mathsf{T}} &\displaystyle =&\displaystyle {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1 \end{array} \end{aligned}
(11.19)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 2 {\mathbf{N}}_1 - {\mathbf{I}} \right) \end{array} \end{aligned}
(11.20)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_3^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 6 {\mathbf{N}}_1^2 - 6 {\mathbf{N}}_1 + {\mathbf{I}} \right) {} \end{array} \end{aligned}
(11.21)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_4^{\mathsf{T}} &\displaystyle =&\displaystyle \boldsymbol{\eta}_1^{\mathsf{T}} \left( 24 {\mathbf{N}}_1^3 - 36 {\mathbf{N}}_1^2 + 14 {\mathbf{N}}_1 - {\mathbf{I}} \right). {} \end{array} \end{aligned}
(11.22)
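The moment vectors of the time to absorption are equally direct to compute. A numpy sketch with a hypothetical U (illustrative values only):

```python
import numpy as np

U = np.array([[0.2, 0.1, 0.0],
              [0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
s = U.shape[0]
I = np.eye(s)
one = np.ones(s)

N1 = np.linalg.inv(I - U)
eta1 = one @ N1              # Eq. (11.19): mean time to absorption from each state
eta2 = eta1 @ (2 * N1 - I)   # Eq. (11.20): second moments

var_eta = eta2 - eta1 ** 2   # variance of time to absorption
sd_eta = np.sqrt(var_eta)    # standard deviation
```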

### Theorem 11.2.2

Let ηk be the vector of the kth moments of the ηj. The sensitivities of these moment vectors are
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_1 &\displaystyle =&\displaystyle \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.23)
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 + 2 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.24)
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_3 &\displaystyle =&\displaystyle \left( 6 ({\mathbf{N}}_1^{\mathsf{T}})^2 - 6 {\mathbf{N}}_1^{\mathsf{T}} + {\mathbf{I}} \right) d \boldsymbol{\eta}_1 + 6 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 - {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.25)
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_4 &\displaystyle =&\displaystyle \left( 24 ({\mathbf{N}}_1^{\mathsf{T}})^3 - 36 ({\mathbf{N}}_1^{\mathsf{T}})^2 + 14 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 \\ &\displaystyle &\displaystyle + \left[ 24 \left( ({\mathbf{N}}_1^{\mathsf{T}})^2 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1^2 \right) - 36 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} + {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right) + 14 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(11.26)

where d vec N1 is given by (11.7).

### Proof

The derivative of η1 is obtained (Caswell 2006) by differentiating to get $$d \boldsymbol {\eta }_1^{\mathsf {T}} = {\mathbf {1}}^{\mathsf {T}} \left ( d {\mathbf {N}}_1 \right )$$ and then applying the vec operator. For the higher moments, consider the ηk to be functions of η1 and N1, and write the total differential
\displaystyle \begin{aligned} d \boldsymbol{\eta}_k = {\partial \boldsymbol{\eta}_k \over \partial \boldsymbol{\eta}_1^{\mathsf{T}}} \; d \boldsymbol{\eta}_1 + {\partial \boldsymbol{\eta}_k \over \partial \mbox{vec} \,^{\mathsf{T}} {\mathbf{N}}_1} \; d \mbox{vec} \, {\mathbf{N}}_1. {} \end{aligned}
(11.27)
The partial differentials of η2 with respect to η1 and N1 are
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {\boldsymbol{\eta}_1}}} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) \left( 2 {\mathbf{N}}_1-{\mathbf{I}} \right) \end{array} \end{aligned}
(11.28)
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {{\mathbf{N}}_1}}} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle 2 \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{N}}_1 \right). \end{array} \end{aligned}
(11.29)
Applying the vec operator gives
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {\boldsymbol{\eta}_1}}} \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left( 2 {\mathbf{N}}_1^{\mathsf{T}} - {\mathbf{I}} \right) d \boldsymbol{\eta}_1 \end{array} \end{aligned}
(11.30)
\displaystyle \begin{aligned} \begin{array}{rcl} \partial_{\mbox{ {{\mathbf{N}}_1}}} \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle 2 \left( {\mathbf{I}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1 \end{array} \end{aligned}
(11.31)
which combine according to (11.27) to yield (11.24). The derivations of dη3 and dη4 follow the same sequence of steps; the details are shown in Appendix A. □

### 11.2.3 Number of States Visited Before Absorption

Let ξj ≥ 1 be the number of distinct transient states visited before absorption by an individual starting in transient state j, and let ξ1 = E(ξ). Then
\displaystyle \begin{aligned} \boldsymbol{\xi}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} {\mathbf{N}}_1 {} \end{aligned}
(11.32)
(Iosifescu 1980, Sect. 3.2.5), where $${\mathbf {N}}_{\mathrm {dg}}^{-1} = \left ( {\mathbf {N}}_{\mathrm {dg}} \right )^{-1}$$.
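A numpy sketch of (11.32), with a hypothetical 2-state U; each entry of ξ1 necessarily lies between 1 and s:

```python
import numpy as np

U = np.array([[0.2, 0.1],
              [0.5, 0.3]])
s = U.shape[0]
N1 = np.linalg.inv(np.eye(s) - U)
Ndg = np.diag(np.diag(N1))

# Eq. (11.32): expected number of distinct transient states visited,
# as a function of the starting state
xi1 = np.ones(s) @ np.linalg.inv(Ndg) @ N1
```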

### Theorem 11.2.3

Let ξ1 = E(ξ). The sensitivity of ξ1 is
\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = \left[ - \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \right] d {vec} \, {\mathbf{N}}_1, {} \end{aligned}
(11.33)

where d vec N1 is given by (11.7).

### Proof

Differentiating (11.32) yields
\displaystyle \begin{aligned} d \boldsymbol{\xi}_1^{\mathsf{T}} = {\mathbf{1}}^{\mathsf{T}} \left( d {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) {\mathbf{N}}_1 + {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} d {\mathbf{N}}_1. \end{aligned}
(11.34)
Applying the vec operator yields
\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}}^{-1} + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) d \mbox{vec} \, {\mathbf{N}}_1. \end{aligned}
(11.35)
Applying the differential of the matrix inverse, $$d {\mathbf{X}}^{-1} = - {\mathbf{X}}^{-1} \left( d {\mathbf{X}} \right) {\mathbf{X}}^{-1}$$, to $$d \mbox{vec} \, {\mathbf {N}}_{\mathrm {dg}}^{-1}$$ and using (11.12) for dvec Ndg gives
\displaystyle \begin{aligned} d \boldsymbol{\xi}_1 = - \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) d \mbox{vec} \, {\mathbf{N}}_1 + \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_{\mathrm{dg}}^{-1} \right) d \mbox{vec} \, {\mathbf{N}}_1 \end{aligned}
(11.36)
which simplifies to (11.33). □

### 11.2.4 Multiple Absorbing States and Probabilities of Absorption

When the chain includes a > 1 absorbing states, the entry mij of the a × s submatrix M in (11.1) is the probability of transition from transient state j to absorbing state i. The result of the competing risks of absorption is a set of probabilities $$b_{ij} = P \left [ \mbox{absorption in }i \left | \mbox{starting in }j \right . \right ]$$ for i = 1, …, a and j = 1, …, s. The matrix $${\mathbf {B}} = \left ( b_{ij} \right ) = {\mathbf {M}} {\mathbf {N}}_1$$ (Iosifescu 1980, Thm. 3.3).

### Theorem 11.2.4

LetB = MN1be the matrix of absorption probabilities. Then
\displaystyle \begin{aligned} d {vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d {vec} \, {\mathbf{M}} + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{B}} \right) d {vec} \, {\mathbf{U}}. {} \end{aligned}
(11.37)

### Proof

Differentiating B yields
\displaystyle \begin{aligned} d {\mathbf{B}} = \left( d {\mathbf{M}} \right) {\mathbf{N}}_1 + {\mathbf{M}} \left( d {\mathbf{N}}_1 \right). \end{aligned}
(11.38)
Applying the vec operator gives
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{I}} \otimes {\mathbf{M}} \right) d \mbox{vec} \, {\mathbf{N}}_1. \end{aligned}
(11.39)
Substituting (11.7) for dvec N1 and simplifying gives (11.37). □
Column j of B is the probability distribution of the eventual absorption state for an individual starting in transient state j. Usually a few of those starting states are of particular interest (e.g., states corresponding to “birth” or to the start of some process). Let B(:, j) = Bej denote column j of B, where ej is the jth unit vector of length s. Thus the derivative of B(:, j) is
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(:,j) = \left( {\mathbf{e}}_j^{\mathsf{T}} \otimes {\mathbf{I}}_s \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned}
(11.40)
where dvec B is given by (11.37). Similarly, row i of B is $${\mathbf {B}}(i,:)={\mathbf {e}}_i^{\mathsf {T}} {\mathbf {B}}$$ and
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(i,:) = \left( {\mathbf{I}}_s \otimes {\mathbf{e}}_i^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned}
(11.41)
where ei is the ith unit vector of length a.
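The matrix B and its columns are easily computed. A numpy sketch with a hypothetical chain (illustrative values), in which the columns of the full transition matrix [U; M] sum to 1:

```python
import numpy as np

# Hypothetical chain with s = 3 transient and a = 2 absorbing states
U = np.array([[0.2, 0.1, 0.0],
              [0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
M = np.array([[0.1, 0.2, 0.1],
              [0.1, 0.0, 0.2]])   # columns of [U; M] sum to 1

N1 = np.linalg.inv(np.eye(3) - U)
B = M @ N1                     # b_ij = P[absorb in state i | start in state j]

e1 = np.zeros(3); e1[0] = 1.0
col1 = B @ e1                  # B(:,1): absorption distribution from state 1
```

Because absorption is certain, each column of B is a probability distribution over the absorbing states.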

### 11.2.5 The Quasistationary Distribution

The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let w and v be the right and left eigenvectors associated with the dominant eigenvalue of U, normalized so that ∥w∥ = ∥v∥ = 1. Darroch and Seneta (1965) defined two quasistationary distributions in terms of w and v. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to
\displaystyle \begin{aligned} {\mathbf{q}}_a = {\mathbf{w}} {} \end{aligned}
(11.42)
The limiting probability distribution of the state of an individual, given that absorption has not happened and will not happen for a long time, is
\displaystyle \begin{aligned} {\mathbf{q}}_b = \frac{{\mathbf{w}} \circ {\mathbf{v}}}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} {} \end{aligned}
(11.43)
Horvitz and Tuljapurkar (2008) pointed out that the convergence to the quasistationary distribution implies that, in a stage-classified model, mortality eventually becomes independent of age.
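Both quasistationary distributions are obtained from the dominant eigenvectors of U. A numpy sketch with a hypothetical U; here both eigenvectors are rescaled to sum to 1 (a convenient normalization assumed for this sketch):

```python
import numpy as np

U = np.array([[0.2, 0.1, 0.0],
              [0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])

def dominant_eigvec(A):
    """Dominant eigenvalue and eigenvector of a primitive nonnegative matrix,
    with the eigenvector scaled to sum to 1."""
    lam, W = np.linalg.eig(A)
    k = np.argmax(lam.real)
    w = np.abs(W[:, k].real)
    return lam[k].real, w / w.sum()

lam, w = dominant_eigvec(U)      # right eigenvector
_, v = dominant_eigvec(U.T)      # left eigenvector of U

qa = w                           # Eq. (11.42)
qb = (w * v) / (w @ v)           # Eq. (11.43)
```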

### Lemma 1

Let the dominant eigenvalue of U, guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy 0 < λ < 1, and let w and v be the right and left eigenvectors corresponding to λ, scaled so that wTv = 1. Then
\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{w}} &\displaystyle =&\displaystyle \left( \lambda {\mathbf{I}}_s - {\mathbf{U}} + {\mathbf{w}} {\mathbf{v}}^{\mathsf{T}} \right)^{-1} \left[ {\mathbf{w}}^{\mathsf{T}} \otimes \left( {\mathbf{I}}_s - {\mathbf{w}} {\mathbf{v}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(11.44)
\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{v}} &\displaystyle =&\displaystyle \left( \lambda {\mathbf{I}}_s - {\mathbf{U}}^{\mathsf{T}} + {\mathbf{v}} {\mathbf{w}}^{\mathsf{T}} \right)^{-1} \left[ {\mathbf{v}}^{\mathsf{T}} \otimes \left( {\mathbf{I}}_s - {\mathbf{v}} {\mathbf{w}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{U}}^{\mathsf{T}} {} \end{array} \end{aligned}
(11.45)

### Proof

Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating v as the right eigenvector of UT. □

### Theorem 11.2.5

The derivative of the quasistationary distribution qa is given by (11.44). The derivative of the quasistationary distribution qb is
\displaystyle \begin{aligned} d {\mathbf{q}}_b = \frac{1}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \left[ \mathcal{D}\,({\mathbf{v}}) - {\mathbf{q}}_b {\mathbf{v}}^{\mathsf{T}} \right] d {\mathbf{w}} + \frac{1}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} \left[ \mathcal{D}\,({\mathbf{w}}) - {\mathbf{q}}_b {\mathbf{w}}^{\mathsf{T}} \right] d {\mathbf{v}} {} \end{aligned}
(11.46)

where dw and dv are given by (11.44) and (11.45), respectively.

### Proof

The derivative of qa follows from its definition as the scaled right eigenvector of U. For qb, differentiating (11.43) gives
\displaystyle \begin{aligned} \begin{array}{rcl} d \left( {\mathbf{w}} \circ {\mathbf{v}} \right) &\displaystyle =&\displaystyle \left( d {\mathbf{w}} \right) \circ {\mathbf{v}} + {\mathbf{w}} \circ \left( d {\mathbf{v}} \right) \end{array} \end{aligned}
(11.47)
\displaystyle \begin{aligned} \begin{array}{rcl} d \left( {\mathbf{w}}^{\mathsf{T}} {\mathbf{v}} \right) &\displaystyle =&\displaystyle {\mathbf{v}}^{\mathsf{T}} d {\mathbf{w}} + {\mathbf{w}}^{\mathsf{T}} d {\mathbf{v}}. \end{array} \end{aligned}
(11.48)
Applying the vec operator gives
\displaystyle \begin{aligned} d {\mathbf{q}}_b = \frac{\mathcal{D}\,({\mathbf{v}}) \, d {\mathbf{w}} + \mathcal{D}\,({\mathbf{w}}) \, d {\mathbf{v}}}{{\mathbf{w}}^{\mathsf{T}} {\mathbf{v}}} - \frac{{\mathbf{w}} \circ {\mathbf{v}}}{\left( {\mathbf{w}}^{\mathsf{T}} {\mathbf{v}} \right)^2} \left( {\mathbf{v}}^{\mathsf{T}} d {\mathbf{w}} + {\mathbf{w}}^{\mathsf{T}} d {\mathbf{v}} \right) \end{aligned}
(11.49)
which simplifies to give (11.46). □

## 11.3 Life Lost Due to Mortality

The approach here makes it easy to compute the sensitivity of a variety of dependent variables calculated from the Markov chain. As an example of this flexibility, consider a recently developed demographic index, the number of years of life lost due to mortality (Vaupel and Canudas Romo 2003).

The transient states of the chain are age classes, absorption corresponds to death, and absorbing states correspond to age at death. Let μi be the mortality rate and $$p_i=\exp (-\mu _i)$$ the survival probability at age i. The matrix U has the pi on the subdiagonal and zeros elsewhere. The matrix M has 1 − pi on the diagonal and zeros elsewhere. Let f = B(:, 1) be the distribution of age at death and η1 the vector of expected longevity as a function of age.

A death at age i represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the ith entry of η1, and the expected number of years lost over the distribution of age at death is $$\eta ^\dagger = \boldsymbol {\eta }_1^{\mathsf {T}} {\mathbf {f}}$$. This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age x, f would be a delta function at x and further life expectancy at age x would be zero; their product would give η† = 0. Declines in disparity have accompanied the increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how η† responds to changes in mortality.
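The quantity η† is a single inner product once the chain is set up. A numpy sketch for a hypothetical 5-age-class chain (the mortality schedule is an illustrative assumption; death is made certain in the last age class so that the full transition matrix is stochastic):

```python
import numpy as np

# Hypothetical age-classified chain; age at death is recorded by
# s absorbing states
mu = np.array([0.05, 0.01, 0.02, 0.10, 0.30])   # illustrative mortality rates
p = np.exp(-mu)                                 # survival probabilities
s = len(p)

U = np.zeros((s, s))
U[np.arange(1, s), np.arange(s - 1)] = p[:-1]   # survival on the subdiagonal
M = np.diag(1 - p)
M[-1, -1] = 1.0             # death is certain in the last age class

N1 = np.linalg.inv(np.eye(s) - U)
eta1 = np.ones(s) @ N1      # remaining life expectancy at each age
B = M @ N1
f = B[:, 0]                 # distribution of age at death, starting from birth
eta_dagger = eta1 @ f       # eta-dagger = eta1' f, expected years of life lost
```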

Differentiating η† gives
\displaystyle \begin{aligned} d \eta^\dagger = \left( d \boldsymbol{\eta}_1^{\mathsf{T}} \right) {\mathbf{B}} {\mathbf{e}}_1 + \boldsymbol{\eta}_1^{\mathsf{T}} \left( d {\mathbf{B}} \right) {\mathbf{e}}_1. \end{aligned}
(11.50)
Applying the vec operator gives
\displaystyle \begin{aligned} d \eta^\dagger = {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} d \boldsymbol{\eta}_1 + \left( {\mathbf{e}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}}. \end{aligned}
(11.51)
Substituting (11.23) for dη1 and (11.37) for dvec B gives
\displaystyle \begin{aligned} d \eta^\dagger = {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1 + \left( {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{B}} \right) d \mbox{vec} \, {\mathbf{U}}. {} \end{aligned}
(11.52)
Simplifying, substituting (11.7) for d vec N1, and writing derivatives in terms of the mortality vector μ gives
\displaystyle \begin{aligned} {d \eta^\dagger \over d \boldsymbol{\mu}^{\mathsf{T}}} = \left[ {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{B}}^{\mathsf{T}} \left( {\mathbf{I}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) + \left( {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{B}} \right) \right] {d \mbox{vec} \, {\mathbf{U}} \over d \boldsymbol{\mu}^{\mathsf{T}}} + \left( {\mathbf{e}}_1^{\mathsf{T}} {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) {d \mbox{vec} \, {\mathbf{M}} \over d \boldsymbol{\mu}^{\mathsf{T}}}. {} \end{aligned}
(11.53)
Because mortality rates vary over several orders of magnitude with age, it is useful to present the results as elasticities,
\displaystyle \begin{aligned} {\epsilon \eta^\dagger \over \epsilon \boldsymbol{\mu}^{\mathsf{T}}} = \frac{1}{\eta^\dagger}\; {d \eta^\dagger \over d \boldsymbol{\mu}^{\mathsf{T}}} \; \mathcal{D}\,(\boldsymbol{\mu}). \end{aligned}
(11.54)
Figure 11.1 shows these elasticities for two populations chosen to have very different life expectancies: India in 1961, with female life expectancy of 45 years and η† = 23.9 years, and Japan in 2006, with female life expectancy of 86 years and η† = 10.1 years (Human Mortality Database 2016). In both cases, elasticities are positive from birth to some age (≈50 for India, ≈85 for Japan) and negative thereafter. This implies that reductions in infant and early-life mortality would reduce η†, whereas reductions in old-age mortality would increase η†. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

## 11.4 Ergodic Chains

Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix P of dimension s × s. The stationary distribution π is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue λ1 = 1 of P. The fundamental matrix of the chain is $${\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}$$ (Kemeny and Snell 1960).

We are interested only in perturbations that preserve the column-stochasticity of P; i.e., for which P remains a stochastic matrix. Such perturbations are easily defined when the pij depend explicitly on a parameter vector θ. However, when the parameters of interest are the pij themselves, an implicit parameterization must be defined to preserve the stochastic nature of P under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.

Previous studies of perturbations of ergodic chains focus almost completely on perturbations of the stationary distribution, and are divided between those focusing on sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986) and studies focusing on perturbation bounds and condition numbers (Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties with respect to parameter perturbations, but taking advantage of the matrix calculus approach. We do not consider perturbation bounds here.

### 11.4.1 The Stationary Distribution

### Theorem 11.4.1

Let π be the stationary distribution, satisfying Pπ = π and 1Tπ = 1. The sensitivity of π is
\displaystyle \begin{aligned} d \boldsymbol{\pi} = \left[ \boldsymbol{\pi}^{\mathsf{T}} \otimes \left( {\mathbf{Z}} - \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{P}} {} \end{aligned}
(11.55)

where Z is the fundamental matrix of the chain.

### Proof

The vector π is the right eigenvector of P, scaled to sum to 1. Applying Lemma 1, and noting that λ = 1 and 1TP = 1T, gives $$d \boldsymbol {\pi } = {\mathbf {Z}} \left [ \boldsymbol {\pi }^{\mathsf {T}} \otimes \left ( {\mathbf {I}}_s - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right ) \right ] d \mbox{vec} \, {\mathbf {P}}$$. Noting that Zπ = π and simplifying the Kronecker products yields (11.55). □

Based on an analysis of eigenvector sensitivity (Meyer and Stewart 1982), Golub and Meyer (1986) derived an expression for the derivative of π with respect to a change in a single element of P, using the group generalized inverse $$\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\#$$ of I −P. Since $$\left ( {\mathbf {I}} -{\mathbf {P}} \right )^\# = {\mathbf {Z}} - \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}}$$ (Golub and Meyer 1986), expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of π using only the chain rule. If g(π) is a vector- or scalar-valued function of π, then
\displaystyle \begin{aligned} d g(\boldsymbol{\pi}) = {d g \over d \boldsymbol{\pi}^{\mathsf{T}} } \; {d \boldsymbol{\pi} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; d \mbox{vec} \, {\mathbf{P}} \end{aligned}
(11.56)
Some examples will appear in Sect. 11.5.
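Using the group-inverse form (I − P)# = Z − π1ᵀ noted above, the Jacobian of π with respect to vec P is πᵀ ⊗ (Z − π1ᵀ). A numpy sketch with a hypothetical column-stochastic P, checked against finite differences (the perturbation direction here is a single entry, so it does not preserve stochasticity; the derivative formula is still valid for arbitrary directions):

```python
import numpy as np

# Hypothetical ergodic, column-stochastic P
P = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
s = P.shape[0]

def stationary(P):
    """Dominant right eigenvector, scaled to sum to 1."""
    lam, W = np.linalg.eig(P)
    w = np.abs(W[:, np.argmax(lam.real)].real)
    return w / w.sum()

pi = stationary(P)
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

# Jacobian d pi / d vec' P = pi' kron (Z - pi 1'), an s x s^2 matrix
J = np.kron(pi.reshape(1, -1), Z - np.outer(pi, np.ones(s)))

# Finite-difference check in the direction of a single entry of P
h = 1e-7
dP = np.zeros_like(P)
dP[1, 0] = h
fd = (stationary(P + dP) - pi) / h
an = J @ dP.flatten(order='F') / h
```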

### 11.4.2 The Fundamental Matrix

The fundamental matrix $${\mathbf {Z}} = \left ( {\mathbf {I}} - {\mathbf {P}} + \boldsymbol {\pi } {\mathbf {1}}^{\mathsf {T}} \right )^{-1}$$ plays a role in ergodic chains similar to that played by N1 in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.

### Theorem 11.4.2

The sensitivity of the fundamental matrix is
\displaystyle \begin{aligned} d {vec} \, {\mathbf{Z}} = \left[ \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) - \left( \mathbf{1} \boldsymbol{\pi}^{\mathsf{T}} \right) \otimes \left( {\mathbf{Z}}^2 - \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} \right) \right] d {vec} \, {\mathbf{P}} {} \end{aligned}
(11.57)

### Proof

From the differential of the matrix inverse, $$d {\mathbf{X}}^{-1} = - {\mathbf{X}}^{-1} \left( d {\mathbf{X}} \right) {\mathbf{X}}^{-1}$$,
\displaystyle \begin{aligned} \begin{array}{rcl} d {\mathbf{Z}} &\displaystyle =&\displaystyle {\mathbf{Z}} \left( d {\mathbf{P}} - d \boldsymbol{\pi} \, {\mathbf{1}}^{\mathsf{T}} \right) {\mathbf{Z}} \end{array} \end{aligned}
(11.58)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{Z}} &\displaystyle =&\displaystyle \left( {\mathbf{Z}}^{\mathsf{T}} \otimes {\mathbf{Z}} \right) d \mbox{vec} \, {\mathbf{P}} - \left( \mathbf{1} \otimes {\mathbf{Z}} \right) d \boldsymbol{\pi}, \end{array} \end{aligned}
(11.59)
using the fact that 1TZ = 1T.
Substituting (11.55) for dπ and simplifying gives (11.57). □

### 11.4.3 The First Passage Time Matrix

Let $${\mathbf {R}} = \left ( r_{ij}^{~} \right )$$ be the matrix of mean first passage times from j to i, given by (Iosifescu 1980, Thm. 4.7)
\displaystyle \begin{aligned} {\mathbf{R}} = \mathcal{D}\,(\boldsymbol{\pi})^{-1} \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} {\mathbf{E}} \right) {} \end{aligned}
(11.60)
where E is an s × s matrix of ones and Zdg is the diagonal matrix of the diagonal entries of Z.
Again, this is the transpose of the expression obtained when P is row-stochastic.

### Theorem 11.4.3

The sensitivity of R is
\displaystyle \begin{aligned} d {vec} \, {\mathbf{R}} = - \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} {\mathbf{E}} \right)^{\mathsf{T}} \mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \mathcal{D}\,({vec} \, {\mathbf{I}}) \left( \mathbf{1}_s \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} + \left[ \left( {\mathbf{E}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \mathcal{D}\,({vec} \, {\mathbf{I}}) - \left( {\mathbf{I}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \right] d {vec} \, {\mathbf{Z}} {} \end{aligned}
(11.61)

where dπ is given by (11.55) and d vec Z is given by (11.57).

### Proof

Differentiating (11.60) gives
\displaystyle \begin{aligned} d {\mathbf{R}} = \left[ d \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right] \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} {\mathbf{E}} \right) + \mathcal{D}\,(\boldsymbol{\pi})^{-1} \left[ - d {\mathbf{Z}} + \left( d {\mathbf{Z}}_{\mathrm{dg}} \right) {\mathbf{E}} \right]. \end{aligned}
(11.62)
Applying the vec operator gives
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{R}} = \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} {\mathbf{E}} \right)^{\mathsf{T}} \otimes {\mathbf{I}} \right] d \mbox{vec} \, \mathcal{D}\,(\boldsymbol{\pi})^{-1} - \left( {\mathbf{I}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) d \mbox{vec} \, {\mathbf{Z}} + \left( {\mathbf{E}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) d \mbox{vec} \, {\mathbf{Z}}_{\mathrm{dg}}. \end{aligned}
(11.63)
Using the differential of the matrix inverse for $$d \mbox{vec} \, \mathcal {D}\,(\boldsymbol {\pi })^{-1}$$, the relation $$d \mbox{vec} \, \mathcal{D}\,(\boldsymbol{\pi}) = \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) \left( \mathbf{1}_s \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi}$$, and (11.12) applied to Z for dvec Zdg yields
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{R}} = - \left[ \left( {\mathbf{I}} - {\mathbf{Z}} + {\mathbf{Z}}_{\mathrm{dg}} {\mathbf{E}} \right)^{\mathsf{T}} \otimes {\mathbf{I}} \right] \left( \mathcal{D}\,(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) \left( \mathbf{1}_s \otimes {\mathbf{I}}_s \right) d \boldsymbol{\pi} - \left( {\mathbf{I}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) d \mbox{vec} \, {\mathbf{Z}} + \left( {\mathbf{E}} \otimes \mathcal{D}\,(\boldsymbol{\pi})^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) d \mbox{vec} \, {\mathbf{Z}} \end{aligned}
(11.64)
which simplifies to give (11.61). □

### 11.4.4 Mixing Time and the Kemeny Constant

The mixing time K of a chain is the mean time required to get from a specified state to a state chosen at random from the stationary distribution π. Remarkably, K is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is sometimes called Kemeny’s constant; it is a measure of the rate of convergence to stationarity, and is K = trace(Z) (Hunter 2006). In addition to being a quantity of interest in itself, the rate of convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).
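A numpy sketch of K = trace(Z) for a hypothetical column-stochastic P. To illustrate the independence of the starting state, the sketch also computes, for each start j, the mean time to reach a target state drawn from π, using first passage times in the column-stochastic convention r_ij = (δ_ij + z_ii − z_ij)/π_i (an assumption of this sketch):

```python
import numpy as np

P = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
s = P.shape[0]
lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

K = np.trace(Z)                  # Kemeny's constant

# First passage time matrix: r_ij = (delta_ij + z_ii - z_ij) / pi_i
E = np.ones((s, s))
R = np.diag(1 / pi) @ (np.eye(s) - Z + np.diag(np.diag(Z)) @ E)

# Mean time to reach a pi-distributed target, for each starting state;
# every entry equals K
K_by_state = pi @ R
```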

### Theorem 11.4.4

The sensitivity of K is
\displaystyle \begin{aligned} dK = \left( {vec} \, {\mathbf{I}}_s \right)^{\mathsf{T}} d {vec} \, {\mathbf{Z}}. {} \end{aligned}
(11.65)

### Proof

Differentiating K = trace(Z) gives
\displaystyle \begin{aligned} dK = {\mathbf{1}}^{\mathsf{T}} \left( {\mathbf{I}} \circ d {\mathbf{Z}} \right) \mathbf{1}. \end{aligned}
(11.66)
Applying the vec operator gives
\displaystyle \begin{aligned} dK = \left( {\mathbf{1}}^{\mathsf{T}} \otimes {\mathbf{1}}^{\mathsf{T}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}} ) d \mbox{vec} \, {\mathbf{Z}} \end{aligned}
(11.67)
which simplifies to (11.65). □

### 11.4.5 Implicit Parameters and Compensation

Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of dvec P. However, perturbation of any element, say pkj, to pkj + θkj, must be compensated for by adjustments of the other elements in column j so that the column sum remains equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use in applications: additive and proportional. Additive compensation adjusts all the elements of the column by an equal amount, distributing the perturbation θkj additively over column j. Proportional compensation distributes θkj in proportion to the values of the pij, for i ≠ k. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within P.

To develop the compensation formulae, let us start by considering a probability vector p, of dimension s × 1, with pi ≥ 0 and ∑ipi = 1. Let θi be the perturbation of pi, and write
\displaystyle \begin{aligned} {\mathbf{p}} (\boldsymbol{\theta}) = {\mathbf{p}} (0) + {\mathbf{A}} \boldsymbol{\theta} {} \end{aligned}
(11.68)
for some matrix A to be determined. If y is a function of p, then
\displaystyle \begin{aligned} dy = {d y \over d {\mathbf{p}}^{\mathsf{T}}} \; {d {\mathbf{p}} \over d \boldsymbol{\theta}^{\mathsf{T}}} \; d \boldsymbol{\theta} \end{aligned}
(11.69)
evaluated at θ = 0.

For the case of additive compensation, we write
\displaystyle \begin{aligned} \begin{array}{rcl} p_1(\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_1(0) + \theta_1 - \frac{\theta_2}{s-1} - \cdots - \frac{\theta_s}{s-1} \\ p_2 (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_2(0) -\frac{\theta_1}{s-1} + \theta_2 - \cdots - \frac{\theta_s}{s-1} \\ &\displaystyle \vdots&\displaystyle {} \\ p_s (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_s(0) -\frac{\theta_1}{s-1} -\frac{\theta_2}{s-1}- \cdots + \theta_s \end{array} \end{aligned}
(11.70)
The perturbation θ1 is added to p1 and compensated for by subtracting θ1∕(s − 1) from all other entries of p; clearly ∑ipi(θ) = 1 for any perturbation vector θ.
The system of Eqs. (11.70) can be written
\displaystyle \begin{aligned} {\mathbf{p}}(\boldsymbol{\theta}) = {\mathbf{p}}(0) + \left( {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} \right) \boldsymbol{\theta}. \end{aligned}
(11.71)
Defining E to be a matrix of ones, the matrix C can be written as the Toeplitz matrix C = E − I, with zeros on the diagonal and ones elsewhere. Thus the matrix A in (11.68) is
\displaystyle \begin{aligned} {\mathbf{A}} = {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} {} \end{aligned}
(11.72)

### Proportional compensation

For proportional compensation, assume that pi < 1 for all i. The vector p(θ) is
\displaystyle \begin{aligned} \begin{array}{rcl} p_1(\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_1(0) + \theta_1 - \frac{p_1 \theta_2}{1-p_2} - \cdots - \frac{p_1 \theta_s}{1-p_s} \\ p_2 (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_2(0) -\frac{p_2 \theta_1}{1-p_1} + \theta_2 - \cdots - \frac{p_2 \theta_s}{1-p_s} \\ &\displaystyle \vdots&\displaystyle {} \\ p_s (\boldsymbol{\theta}) &\displaystyle =&\displaystyle p_s(0) -\frac{p_s \theta_1}{1-p_1} -\frac{p_s \theta_2}{1-p_2}- \cdots + \theta_s \end{array} \end{aligned}
(11.73)
The perturbation θ1 is added to p1 and compensated for by subtracting θ1pi∕(1 − p1) from the ith entry of p. Again, ∑ipi(θ) = 1 for any perturbation vector θ.
Equation (11.73) can be written
\displaystyle \begin{aligned} {\mathbf{p}}(\boldsymbol{\theta}) = {\mathbf{p}}(0) + \left[ {\mathbf{I}} - \mathcal{D}\,({\mathbf{p}}) \; {\mathbf{C}} \; \mathcal{D}\,(\mathbf{1} - {\mathbf{p}})^{-1} \right] \boldsymbol{\theta} \end{aligned}
(11.74)
so that the matrix A in (11.68) is
\displaystyle \begin{aligned} {\mathbf{A}} = {\mathbf{I}} - \mathcal{D}\,({\mathbf{p}}) \; {\mathbf{C}} \; \mathcal{D}\,(\mathbf{1} - {\mathbf{p}})^{-1} {} \end{aligned}
(11.75)
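Both compensation matrices have zero column sums, so any perturbation θ leaves the probability vector on the simplex. A numpy sketch with a hypothetical p and θ (illustrative values):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])    # hypothetical probability vector
s = len(p)
C = np.ones((s, s)) - np.eye(s)  # C = E - I

A_add = np.eye(s) - C / (s - 1)                             # Eq. (11.72)
A_prop = np.eye(s) - np.diag(p) @ C @ np.diag(1 / (1 - p))  # Eq. (11.75)

theta = np.array([0.02, -0.01, 0.03])   # arbitrary perturbation
p_add = p + A_add @ theta               # additive compensation
p_prop = p + A_prop @ theta             # proportional compensation
```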

### The transition matrix

We have derived compensation formulae for a single probability vector p. Now consider perturbation of a probability matrix P, each column of which is a probability vector. Define a perturbation matrix Θ where θij is the perturbation of pij. Perturbations of column j are to be compensated by a matrix Aj, so that
\displaystyle \begin{aligned} {\mathbf{P}}(:,j)(\boldsymbol{\Theta}) = {\mathbf{P}}(:,j)(0) + {\mathbf{A}}_j \boldsymbol{\Theta}(:,j) \qquad j = 1, \ldots, s {} \end{aligned}
(11.76)
where Ai compensates for the changes in column i of P. Applying the vec operator to (11.76) gives
\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{vec} \, {\mathbf{P}}(\boldsymbol{\Theta}) &=& \mbox{vec} \, {\mathbf{P}}(0) + \left(\begin{array}{ccc} {\mathbf{A}}_1 & & \\ & \ddots & \\ && {\mathbf{A}}_s \end{array}\right) \mbox{vec} \, \boldsymbol{\Theta} \end{array} \end{aligned}
(11.77)
\displaystyle \begin{aligned} \begin{array}{rcl} &=& \mbox{vec} \, {\mathbf{P}}(0) + \sum_{i=1}^s \left( {\mathbf{E}}_{ii} \otimes {\mathbf{A}}_i \right) \mbox{vec} \, \boldsymbol{\Theta}. {} \end{array} \end{aligned}
(11.78)
The terms in the summation in (11.78) are recognizable as the vec of the product AiΘEii; thus
\displaystyle \begin{aligned} {\mathbf{P}}(\boldsymbol{\Theta}) = {\mathbf{P}}(0) + \sum_{i=1}^s {\mathbf{A}}_i \boldsymbol{\Theta} {\mathbf{E}}_{ii} {} \end{aligned}
(11.79)
where Eii is a matrix with a 1 in the (i, i) entry and zeros elsewhere.

### Theorem 11.4.5

Let P be a column-stochastic s × s transition matrix. Let Θ be a matrix of perturbations, where θij is applied to pij, and the other entries of Θ compensate for the perturbation. Let C = E − I. If compensation is additive, then
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{P}}(\boldsymbol{\Theta}) &\displaystyle =&\displaystyle {\mathbf{P}}(0) + \left( {\mathbf{I}} - \frac{1}{s-1} {\mathbf{C}} \right) \boldsymbol{\Theta} {} \end{array} \end{aligned}
(11.80)
\displaystyle \begin{aligned} \begin{array}{rcl} {d {vec} \, {\mathbf{P}} \over d {vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} &\displaystyle =&\displaystyle \left[ {\mathbf{I}}_{s^2} - \frac{1}{s-1} \left( {\mathbf{I}}_s \otimes {\mathbf{C}} \right) \right]. {} \end{array} \end{aligned}
(11.81)
If compensation is proportional, then
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{P}}(\boldsymbol{\Theta}) &\displaystyle =&\displaystyle {\mathbf{P}}(0) + \sum_{i=1}^s \left\{ {\mathbf{I}} - \mathcal{D}\, \left[ {\mathbf{P}}(:,i) \right] \; {\mathbf{C}}\; \mathcal{D}\, \left[ \mathbf{1} - {\mathbf{P}}(:,i) \right]^{-1} \right\} \boldsymbol{\Theta} {\mathbf{E}}_{ii} \qquad {} \end{array} \end{aligned}
(11.82)
\displaystyle \begin{aligned} \begin{array}{rcl} {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} &\displaystyle =&\displaystyle \sum_{i=1}^s {\mathbf{E}}_{ii} \otimes \left\{ {\mathbf{I}} - \mathcal{D}\, \left[ {\mathbf{P}}(:,i) \right] \; {\mathbf{C}}\; \mathcal{D}\, \left[ \mathbf{1} - {\mathbf{P}}(:,i) \right]^{-1} \right\}. {} \end{array} \end{aligned}
(11.83)

Proof

P(Θ) is given by (11.79). If compensation is additive, Ai is given by (11.72) for all i. Substituting into (11.79) gives (11.80). Differentiating (11.80) and applying the vec operator gives (11.81).

If compensation is proportional, substituting (11.75) for Ai in (11.79) gives (11.82). Differentiating yields
\displaystyle \begin{aligned} d {\mathbf{P}} = \sum_{i=1}^s \left( d \boldsymbol{\Theta} \right) {\mathbf{E}}_{ii} - \sum_{i=1}^s \mathcal{D}\,[ {\mathbf{P}}(:,i) ] \; {\mathbf{C}} \; \mathcal{D}\,[ \mathbf{1} - {\mathbf{P}}(:,i) ]^{-1} (d \boldsymbol{\Theta}) {\mathbf{E}}_{ii}. \end{aligned}
(11.84)
Using the vec operator gives (11.83). □
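Theorem 11.4.5 can be checked numerically. The sketch below, using a hypothetical 4-state chain and an arbitrary perturbation matrix (both illustrative, not from the text), applies the additive compensation (11.80) and the proportional compensation (11.82) and confirms that every column of the perturbed matrix still sums to 1.

```python
import numpy as np

# Hypothetical example: P, Theta, and the chain size are illustrative choices.
rng = np.random.default_rng(0)
s = 4
P = rng.random((s, s))
P /= P.sum(axis=0)                 # make P column-stochastic

Theta = 0.01 * rng.standard_normal((s, s))
C = np.ones((s, s)) - np.eye(s)    # C = E - I

# Additive compensation, Eq. (11.80): P(Theta) = P(0) + (I - C/(s-1)) Theta
P_add = P + (np.eye(s) - C / (s - 1)) @ Theta

# Proportional compensation, Eq. (11.82), applied column by column
P_prop = P.copy()
for i in range(s):
    Ai = np.eye(s) - np.diag(P[:, i]) @ C @ np.diag(1.0 / (1.0 - P[:, i]))
    P_prop[:, i] += Ai @ Theta[:, i]

# In both cases the perturbed matrix remains column-stochastic.
print(np.allclose(P_add.sum(axis=0), 1.0))   # True
print(np.allclose(P_prop.sum(axis=0), 1.0))  # True
```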
Perturbations of P subject to compensation are given by perturbations of Θ. Thus for any function y(P) we can write
\displaystyle \begin{aligned} \left. {d y \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} = {d y \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} \end{aligned}
(11.85)
where $$d \mbox{vec} \, {\mathbf{P}} / d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}$$ is given (for additive and proportional compensation) by Theorem 11.4.5. The slight notational complexity is worthwhile for clarifying how to use Theorem 11.4.5 in practice.
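As a minimal sketch of the chain rule (11.85), take the hypothetical dependent variable y(P) = trace(P) (the summed persistence probabilities; this choice is an illustration, not from the text) under additive compensation. Its unconstrained gradient is vec(I)ᵀ, and multiplying by the Jacobian (11.81) gives the compensated sensitivity.

```python
import numpy as np

s = 4
C = np.ones((s, s)) - np.eye(s)           # C = E - I
# Jacobian d vec P / d vec^T Theta for additive compensation, Eq. (11.81)
J = np.eye(s * s) - np.kron(np.eye(s), C) / (s - 1)

# Hypothetical dependent variable: y(P) = trace(P), so dy/dvec^T P = vec(I)^T
dy_dvecP = np.eye(s).flatten(order="F")   # vec stacks columns

# Compensated sensitivity, Eq. (11.85)
dy_comp = dy_dvecP @ J

# Analytically, dy/d theta_ij = 1 on the diagonal and -1/(s-1) off it:
# raising p_ii by theta spreads -theta/(s-1) over the rest of column i.
expected = np.where(np.eye(s, dtype=bool), 1.0, -1.0 / (s - 1)).flatten(order="F")
print(np.allclose(dy_comp, expected))     # True
```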

## 11.5 Species Succession in a Marine Community

Markov chains are used by ecologists as models of species replacement (succession) in ecological communities (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton 2010). In these models, the state of a point on a landscape is given by the species occupying that point. The entry pij of P is the probability that species j is replaced by species i between t and t + 1. If a community consists of a large number of points independently subject to the transition probabilities in P, the stationary distribution π will give the relative frequencies of species in the community at equilibrium.

Hill et al. (2004) used a Markov chain to describe a community of encrusting organisms occupying rock surfaces at 30–35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an additional state (“bare rock”) for unoccupied substrate. The matrix P was estimated from longitudinal data (Hill et al. 2002, 2004) and is given, along with a list of species names, in Appendix B. We will use the results of this chapter to analyze the sensitivity of species diversity and the Kemeny constant to the processes of colonization and replacement that determine P.

### 11.5.1 Biotic Diversity

The stationary distribution π, with the species numbered in order of decreasing abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The two dominant species are an encrusting sponge (called Hymedesmia) and a bryozoan (Crisia).
The entropy of this stationary distribution, $$H(\boldsymbol {\pi }) = -\boldsymbol {\pi }^{\mathsf {T}} (\log \boldsymbol {\pi })$$, where the logarithm is applied elementwise, is used as an index of biodiversity; it is maximal when all species are equally abundant and goes to 0 in a community dominated by a single species. The sensitivity of H is
\displaystyle \begin{aligned} d H = - \left( \log \; \boldsymbol{\pi}^{\mathsf{T}} + {\mathbf{1}}^{\mathsf{T}} \right) d \boldsymbol{\pi} {} \end{aligned}
(11.86)
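The entropy gradient in (11.86) can be verified against a finite difference; the probability vector and step size below are illustrative.

```python
import numpy as np

# Sketch: check the entropy sensitivity (11.86) numerically.
pi = np.array([0.4, 0.3, 0.2, 0.1])
H = lambda p: -p @ np.log(p)              # H(pi) = -pi^T log(pi)

grad = -(np.log(pi) + 1.0)                # dH/dpi^T from Eq. (11.86)

# Forward finite differences, one entry of pi at a time
h = 1e-7
fd = np.array([(H(pi + h * e) - H(pi)) / h for e in np.eye(len(pi))])
print(np.allclose(grad, fd, atol=1e-5))   # True
```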
Most ecologists, however, would not include bare substrate in a measure of biodiversity, so we define instead a “biotic diversity” $$H_b(\boldsymbol {\pi }) = H \left ( \boldsymbol {\pi }_b \right )$$ where
\displaystyle \begin{aligned} \boldsymbol{\pi}_b = \frac{{\mathbf{G}} \boldsymbol{\pi}}{\|{\mathbf{G}} \boldsymbol{\pi} \|}. {} \end{aligned}
(11.87)
The matrix G, of dimension 14 × 15, is a 0–1 matrix that selects rows 1–14 of π. Because π is positive, ∥Gπ∥ = 1TGπ. Differentiating πb gives
\displaystyle \begin{aligned} d \boldsymbol{\pi}_b = \left( \frac{{\mathbf{G}}}{{\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi}} - \frac{{\mathbf{G}} \boldsymbol{\pi} {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}}}{\left( {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi} \right)^2} \right) d \boldsymbol{\pi} \end{aligned}
(11.88)
which simplifies to
\displaystyle \begin{aligned} d \boldsymbol{\pi}_b = \left( \frac{{\mathbf{G}} - \boldsymbol{\pi}_b {\mathbf{1}}^{\mathsf{T}} {\mathbf{G}}}{{\mathbf{1}}^{\mathsf{T}} {\mathbf{G}} \boldsymbol{\pi}} \right) d \boldsymbol{\pi} {} \end{aligned}
(11.89)
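The Jacobian (11.89) can likewise be checked numerically. The sketch below uses a hypothetical 5-state chain whose last state plays the role of bare rock, with G selecting the first four states; all numbers are illustrative.

```python
import numpy as np

# Hypothetical example: last state is "bare rock", G selects states 1..b.
s, b = 5, 4
pi = np.array([0.35, 0.25, 0.2, 0.1, 0.1])
G = np.eye(b, s)                          # 0-1 selection matrix

# Biotic distribution, Eq. (11.87): pi_b = G pi / (1^T G pi)
pib = lambda p: (G @ p) / (np.ones(b) @ G @ p)

# Jacobian from Eq. (11.89)
denom = np.ones(b) @ G @ pi
J = (G - np.outer(pib(pi), np.ones(b) @ G)) / denom

# Finite-difference check, one column of the Jacobian per entry of pi
h = 1e-7
fd = np.column_stack([(pib(pi + h * e) - pib(pi)) / h for e in np.eye(s)])
print(np.allclose(J, fd, atol=1e-5))      # True
```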

This model contains no explicit parameters; perturbations of the transition probabilities themselves are of interest and a compensation pattern is needed. Because the relative magnitudes of the entries in a column of P reflect the relative abilities of species to capture or to hold space, proportional compensation is appropriate in this case because it preserves these relative abilities.

The sensitivity and elasticity of the biotic diversity Hb to changes in the matrix P, subject to proportional compensation, are
\displaystyle \begin{aligned} \begin{array}{rcl} \left. {d H_b \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} &\displaystyle =&\displaystyle \underbrace{{d H_b \over d \boldsymbol{\pi}_b^{\mathsf{T}}}}_1 \; \underbrace{{d \boldsymbol{\pi}_b \over d \boldsymbol{\pi}^{\mathsf{T}}}}_2 \; \underbrace{{d \boldsymbol{\pi} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}}}_3 \; \underbrace{{d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}}}_4 {} \end{array} \end{aligned}
(11.90)
\displaystyle \begin{aligned} \begin{array}{rcl} \left. {\epsilon H_b \over \epsilon \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} &\displaystyle =&\displaystyle \frac{1}{H_b} \; {d H_b \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; \mathcal{D}\,(\mbox{vec} \, {\mathbf{P}}) {} \end{array} \end{aligned}
(11.91)
Term 1 on the right hand side of (11.90) is the derivative of Hb with respect to πb, and is given by (11.86). Term 2 is the derivative of the biotic diversity vector πb with respect to the stationary distribution π, given by (11.89). Term 3 is the derivative of the stationary distribution π with respect to the transition matrix P, given by (11.55). Finally, Term 4 is the derivative of the matrix P taking into account the compensation structure, given by (11.83).
The sensitivity and elasticity vectors (11.90) and (11.91) are of dimension 1 × s² = 1 × 225. To reduce the number of independent perturbations, we consider subsets of the pij: disturbance (in which a species is replaced by bare rock), colonization of unoccupied space, replacement of one species by another, and persistence of a species in its location, where
\displaystyle \begin{aligned} \begin{array}{rcl} P[\mbox{disturbance of sp. }i ] &\displaystyle =&\displaystyle p_{si} {} \\ {} P[\mbox{colonization by sp. }i ] &\displaystyle =&\displaystyle p_{is} \\ {} P[\mbox{persistence of sp. }i ] &\displaystyle =&\displaystyle p_{ii} \\ {} P[\mbox{replacement of sp. }i ] &\displaystyle =&\displaystyle \sum_{k \neq i,s} p_{ki} \\ {} P[\mbox{replacement by sp. }i ] &\displaystyle =&\displaystyle \sum_{j \neq i, s} p_{ij}. {} \end{array} \end{aligned}
Extracting the corresponding elements of $${\epsilon H_b \over \epsilon \mbox{vec} \,^{\mathsf {T}} {\mathbf {P}}}$$ gives the elasticities to these classes of probabilities. Figure 11.3 shows that the dominant species (1 and 2) have impacts that are larger than, and opposite in sign to, those of the remaining species. Biodiversity would be enhanced by increasing the disturbance of, or the replacement of, species 1 and 2, and reduced by increasing the rates of colonization by, persistence of, or replacement by species 1 and 2.
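The aggregation into these five classes can be sketched as follows. Here `E` is a placeholder elasticity matrix (row = replacing state, column = replaced state, bare rock last), not the estimated Gulf of Maine data; only the indexing pattern follows the definitions above.

```python
import numpy as np

# Placeholder elasticity matrix; in practice this would be the 15 x 15
# reshape of the elasticity vector (11.91).
s = 15
rng = np.random.default_rng(1)
E = rng.standard_normal((s, s))
bare = s - 1                              # bare-rock state is last

classes = {}
for i in range(s - 1):                    # species only; bare rock excluded
    others = [k for k in range(s) if k not in (i, bare)]
    classes[i] = {
        "disturbance": E[bare, i],            # p_si: species i -> bare rock
        "colonization": E[i, bare],           # p_is: bare rock -> species i
        "persistence": E[i, i],               # p_ii
        "replacement_of": E[others, i].sum(), # sum_{k != i,s} p_ki
        "replacement_by": E[i, others].sum(), # sum_{j != i,s} p_ij
    }
```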

### 11.5.2 The Kemeny Constant and Ecological Mixing

Ecologists have used several measures of the rate of convergence of communities modelled by Markov chains, including the damping ratio and Dobrushin’s coefficient of ergodicity (Hill et al. 2004). The Kemeny constant K is an interesting addition to this list; it gives the expected time to get from any initial state to a state selected at random from the stationary distribution (Hunter 2006). Once that state is reached, the behavior of the chain is indistinguishable from that of the stationary process.

The sensitivity of K, subject to compensation, is
\displaystyle \begin{aligned} \left. {d K \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \right|{}_{\mathrm{comp}} = {d K \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{Z}}} \; {d \mbox{vec} \, {\mathbf{Z}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}} \; {d \mbox{vec} \, {\mathbf{P}} \over d \mbox{vec} \,^{\mathsf{T}} \boldsymbol{\Theta}} \end{aligned}
(11.92)
where the three terms on the right hand side are given by (11.65), (11.57), and (11.83), respectively.
Figure 11.4 shows the sensitivities $$dK / d \mbox{vec} \,^{\mathsf{T}} {\mathbf{P}}$$, subject to proportional compensation, and aggregated as in Fig. 11.3. Unlike the case with Hb, the two dominant species do not stand out from the others. Increases in the rates of replacement will speed up convergence, and increases in persistence will slow convergence. The disturbance of, colonization by, persistence of, and replacement of species 6 (the sea anemone Urticina crassicornis) have particularly large impacts on K. Examination of row 6 and column 6 of P (Appendix B) shows that U. crassicornis has the highest probability of persistence (p66 = 0.86), and one of the lowest rates of disturbance, in the community. While it is far from dominant (Fig. 11.2), it has a major impact on the rate of mixing.
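For a column-stochastic chain, the Kemeny constant can be computed as the trace of the inverse of I − P + π1ᵀ, which equals Hunter's (2006) eigenvalue form K = 1 + Σ_{i≥2} 1/(1 − λ_i). The sketch below checks the two forms agree on a hypothetical random chain; the matrix is illustrative, not the Gulf of Maine data.

```python
import numpy as np

# Hypothetical column-stochastic chain
rng = np.random.default_rng(2)
s = 5
P = rng.random((s, s))
P /= P.sum(axis=0)

# Stationary distribution: right eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P)
idx = np.argmax(np.real(w))
pi = np.real(V[:, idx])
pi /= pi.sum()

# Trace form: K = tr[(I - P + pi 1^T)^{-1}]
K_trace = np.trace(np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s))))

# Eigenvalue form (Hunter 2006): K = 1 + sum over subdominant eigenvalues
K_eig = np.real(1.0 + np.sum(1.0 / (1.0 - np.delete(w, idx))))

print(np.allclose(K_trace, K_eig))   # True
```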

## 11.6 Discussion

Given that many properties of finite state Markov chains can be expressed as simple matrix expressions, matrix calculus is an attractive approach to finding the sensitivity and elasticity to parameter perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains, and to dependent variables other than the stationary distribution. The perturbation of ergodic chains has often been studied using generalized inverses, following the influential studies of Meyer (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 1986). Matrix calculus provides a complementary approach; the sensitivity of the stationary distribution π obtained here agrees with the result obtained by Golub and Meyer (1986) using the group generalized inverse.

The examples shown here are typical of cases where absorbing or ergodic Markov chains are used in population biology and ecology. In each example, the dependent variables of interest are functions several steps removed from the chain itself. The ease with which one can differentiate such functions is a particularly attractive property of the matrix calculus approach.

## References

1. Caswell, H. 2001. Matrix Population Models: Construction, Analysis, and Interpretation. 2nd edition. Sinauer Associates, Sunderland, MA.
2. Caswell, H. 2006. Applications of Markov chains in demography. Pages 319–334 in MAM2006: Markov Anniversary Meeting. Boson Books, Raleigh, North Carolina.
3. Caswell, H. 2008. Perturbation analysis of nonlinear matrix population models. Demographic Research 18:59–116.
4. Caswell, H. 2009. Stage, age and individual stochasticity in demography. Oikos 118:1763–1782.
5. Caswell, H. 2011. Perturbation analysis of continuous-time absorbing Markov chains. Numerical Linear Algebra with Applications 18:901–917.
6. Cho, G. E., and C. D. Meyer. 2000. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra and its Applications 335:137–150.
7. Conlisk, J. 1985. Comparative statics for Markov chains. Journal of Economic Dynamics and Control 9:139–151.
8. Darroch, J. N., and E. Seneta. 1965. On quasi-stationary distributions in absorbing discrete-time finite Markov chains. Journal of Applied Probability 2:88–100.
9. Edwards, R. D., and S. Tuljapurkar. 2005. Inequality in life spans and a new perspective on mortality convergence across industrialized countries. Population and Development Review 31:645–674.
10. Funderlic, R. E., and C. D. Meyer, Jr. 1986. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra and its Applications 76:1–17.
11. Golub, G. H., and C. D. Meyer, Jr. 1986. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM Journal on Algebraic and Discrete Methods 7:273–281.
12. Grinstead, C. M., and J. L. Snell. 2003. Introduction to Probability. Second edition. American Mathematical Society.
13. Hill, M. F., J. D. Witman, and H. Caswell. 2002. Spatio-temporal variation in Markov chain models of subtidal community succession. Ecology Letters 5:665–675.
14. Hill, M. F., J. D. Witman, and H. Caswell. 2004. Markov chain analysis of succession in a rocky subtidal community. The American Naturalist 164:E46–E61.
15. Horn, H. S. 1975. Markovian properties of forest succession. Pages 196–211 in M. L. Cody and J. M. Diamond, editors. Ecology and Evolution of Communities. Harvard University Press, Cambridge, MA.
16. Horvitz, C. C., and S. Tuljapurkar. 2008. Stage dynamics, period survival, and mortality plateaus. American Naturalist 172:203–215.
17. Human Mortality Database. 2016. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). URL www.mortality.org.
18. Hunter, J. J. 2005. Stationary distributions and mean first passage times of perturbed Markov chains. Linear Algebra and its Applications 410:217–243.
19. Hunter, J. J. 2006. Mixing times with applications to perturbed Markov chains. Linear Algebra and its Applications 417:108–123.
20. Iosifescu, M. 1980. Finite Markov Processes and Their Applications. Wiley, New York, New York.
21. Kemeny, J. G. 1981. Generalization of a fundamental matrix. Linear Algebra and its Applications 38:193–206.
22. Kemeny, J. G., and J. L. Snell. 1960. Finite Markov Chains. Van Nostrand, Princeton, New Jersey.
23. Kirkland, S. 2003. Conditioning properties of the stationary distribution for a Markov chain. Electronic Journal of Linear Algebra 10:1–15.
24. Kirkland, S. J., M. M. Neumann, and N.-S. Sze. 2008. On optimal condition numbers for Markov chains. Numerische Mathematik 110:521–537.
25. Meyer, C. D. 1975. The role of the group generalized inverse in the theory of finite Markov chains. SIAM Review 17:443–464.
26. Meyer, C. D. 1994. Sensitivity of the stationary distribution of a Markov chain. SIAM Journal of Matrix Analysis and Applications 15:715–728.
27. Meyer, C. D., and G. W. Stewart. 1982. Derivatives and perturbations of eigenvectors. SIAM Journal of Numerical Analysis 25:679–691.
28. Mitrophanov, A. Y. 2003. Stability and exponential convergence of continuous-time Markov chains. Journal of Applied Probability 40:970–979.
29. Mitrophanov, A. Y. 2005. Sensitivity and convergence of uniformly ergodic Markov chains. Journal of Applied Probability 42:1003–1014.
30. Mitrophanov, A. Y., A. Lomsadze, and M. Borodovsky. 2005. Sensitivity of hidden Markov models. Journal of Applied Probability 42:632–642.
31. Nelis, L. C., and J. T. Wootton. 2010. Treatment-based Markov chain models clarify mechanisms of invasion in an invaded grassland community. Proceedings B, The Royal Society of London 277:539.
32. Schweitzer, P. J. 1968. Perturbation theory and finite Markov chains. Journal of Applied Probability 5:401–413.
33. Seneta, E. 1988. Perturbation of the stationary distribution measured by ergodicity coefficients. Advances in Applied Probability 20:228–230.
34. Seneta, E. 1993. Sensitivity of finite Markov chains under perturbation. Statistics and Probability Letters 17:163–168.
35. Vaupel, J. W., and V. Canudas Romo. 2003. Decomposing change in life expectancy: a bouquet of formulas in honor of Nathan Keyfitz’s 90th birthday. Demography 40:201–216.
36. Wilmoth, J. R., and S. Horiuchi. 1999. Rectangularization revisited: variability of age at death within human populations. Demography 36:475–495.
37. Zhang, Z., and J. W. Vaupel. 2009. The age separating early deaths from late deaths. Demographic Research 20:721–730.

© The Author(s) 2019

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Authors and Affiliations

• Hal Caswell
1. Biodiversity & Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands