Sensitivity Analysis of Discrete Markov Chains

As we have seen repeatedly, Markov chains are often used as mathematical models of demographic (as well as other natural) phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain.


Absorbing Chains
The transition matrix for a discrete-time absorbing chain can be written in block form as

P = [ U 0 ]
    [ M I ]        (11.1)

where U, of dimension s × s, is the transition matrix among the s transient states, and M, of dimension a × s, contains the probabilities of transition from the transient states to the a absorbing states. Assume that the spectral radius of U is strictly less than 1. Because we are concerned here with absorption, but not with what happens afterwards, we ignore transitions among absorbing states; hence the identity matrix (a × a) in the lower right corner. The matrices U(θ) and M(θ) are functions of a vector θ of parameters. We assume that θ varies over some set on which the column sums of P are 1 and the spectral radius of U is strictly less than one.
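As a concrete numerical sketch of this block structure (the numbers below are hypothetical, chosen only for illustration), one can assemble P from U and M, verify its column-stochasticity, and compute the fundamental matrix N1 = (I − U)^{-1} used throughout this section:

```python
import numpy as np

# Hypothetical absorbing chain: s = 2 transient states, a = 1 absorbing state.
U = np.array([[0.5, 0.2],
              [0.3, 0.6]])    # transitions among transient states
M = np.array([[0.2, 0.2]])    # transient -> absorbing transition probabilities

s, a = U.shape[0], M.shape[0]
P = np.block([[U, np.zeros((s, a))],
              [M, np.eye(a)]])          # block form of (11.1)

assert np.allclose(P.sum(axis=0), 1.0)        # column sums are 1
assert max(abs(np.linalg.eigvals(U))) < 1     # spectral radius of U < 1

# Fundamental matrix: the (i, j) entry is the expected number of visits to
# transient state i before absorption, starting from transient state j.
N1 = np.linalg.inv(np.eye(s) - U)
```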
Here N1 = (I − U)^{-1} is the fundamental matrix of the chain, and N2, N3, and N4 are the matrices of higher moments of the numbers of visits. The derivatives of N2, N3, and N4 can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).

Time to Absorption
Let η_j be the time to absorption starting in transient state j, and let η_k denote the vector whose jth entry is the kth moment E(η_j^k). The first several of these moments are given by Iosifescu (1980, Thm. 3.2); in particular, η_1 = N1^T 1_s and η_2 = (2 N1^T − I_s) η_1.

Theorem 11.2.2 Let η_k be the vector of the kth moments of the η_j. The sensitivities of these moment vectors are given by (11.23)-(11.26), where dvec N1 is given by (11.7).
Proof The derivative of η_1 is obtained (Caswell 2006) by differentiating η_1^T = 1^T N1 to get dη_1^T = 1^T (dN1), and then applying the vec operator. For the higher moments, consider η_k to be a function of η_1 and N1 and write the total differential (11.27). The partial differentials of η_2 with respect to η_1 and N1, after application of the vec operator, combine according to (11.27) to yield (11.24). The derivations of dη_3 and dη_4 follow the same sequence of steps; the details are shown in Appendix A.
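The formulas η_1 = N1^T 1 and η_2 = (2 N1^T − I) η_1 can be checked on a chain whose absorption time is known exactly: with a single transient state and survival probability p, the time to absorption is geometric, with mean 1/(1 − p) and second moment (1 + p)/(1 − p)². The scalar example below is my own, not from the text:

```python
import numpy as np

p = 0.7
U = np.array([[p]])     # one transient state; absorption with prob. 1 - p each step
s = U.shape[0]
N1 = np.linalg.inv(np.eye(s) - U)

eta1 = N1.T @ np.ones(s)                 # first moments of time to absorption
eta2 = (2 * N1.T - np.eye(s)) @ eta1     # second moments

assert np.allclose(eta1, 1 / (1 - p))             # mean of a geometric variable
assert np.allclose(eta2, (1 + p) / (1 - p) ** 2)  # second moment of a geometric
```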

Multiple Absorbing States and Probabilities of Absorption
When the chain includes a > 1 absorbing states, the entry m_ij of the a × s submatrix M in (11.1) is the probability of transition from transient state j to absorbing state i. The result of the competing risks of absorption is a set of probabilities b_ij = P[absorption in i | starting in j] for i = 1, ..., a and j = 1, ..., s. The matrix B = (b_ij) satisfies B = M N1 (Iosifescu 1980, Thm. 3.3). Differentiating gives dB = (dM) N1 + M (dN1); applying the vec operator, substituting (11.7) for dvec N1, and simplifying gives (11.37).
Column j of B is the probability distribution of the eventual absorption state for an individual starting in transient state j. Usually a few of those starting states are of particular interest (e.g., states corresponding to "birth" or to the start of some process under study). The probabilities of absorption in state i are given by row i of B, which can be extracted as e_i^T B, where e_i is the ith unit vector of length a.
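A short sketch of these absorption probabilities (hypothetical numbers): because 1^T M = 1^T (I − U) when P is column-stochastic, the columns of B = M N1 are automatically probability distributions.

```python
import numpy as np

# Two transient and two absorbing states (hypothetical numbers).
U = np.array([[0.4, 0.1],
              [0.2, 0.5]])
M = np.array([[0.3, 0.1],     # probabilities of moving to absorbing state 1
              [0.1, 0.3]])    # probabilities of moving to absorbing state 2
s = U.shape[0]

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1                    # b_ij = P(absorb in i | start in j)

assert np.allclose(B.sum(axis=0), 1.0)   # each column is a distribution

# Row i of B, the probabilities of absorption in state i, is e_i^T B:
e1 = np.eye(M.shape[0])[0]
row1 = e1 @ B
```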

The Quasistationary Distribution
The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let w and v be the right and left eigenvectors associated with the dominant eigenvalue of U, scaled so that 1^T w = 1^T v = 1. Darroch and Seneta (1965) defined two quasistationary distributions in terms of w and v. The limiting probability distribution of the state of an individual, conditional on absorption not yet having happened, is

q_a = w        (11.42)

The limiting probability distribution of the state of an individual, conditional on absorption not happening for a long time to come, is

q_b = (w ∘ v) / (v^T w)        (11.43)

where ∘ denotes the Hadamard, or element-by-element, product. Horvitz and Tuljapurkar (2008) pointed out that the convergence to the quasistationary distribution implies that, in a stage-classified model, mortality eventually becomes independent of age.
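Numerically, q_a can be recovered either from the dominant right eigenvector of U or as the limit of the state distribution conditional on non-absorption; a sketch with hypothetical numbers:

```python
import numpy as np

U = np.array([[0.5, 0.2],
              [0.3, 0.6]])    # hypothetical transient transition matrix

# Dominant right (w) and left (v) eigenvectors of U, each scaled to sum to 1.
lam_r, W = np.linalg.eig(U)
w = np.abs(W[:, np.argmax(lam_r.real)].real); w /= w.sum()
lam_l, V = np.linalg.eig(U.T)
v = np.abs(V[:, np.argmax(lam_l.real)].real); v /= v.sum()

q_a = w                       # first quasistationary distribution (11.42)
q_b = (w * v) / (w @ v)       # second quasistationary distribution (11.43)

# q_a is the limit of the state distribution conditional on non-absorption:
x = np.array([1.0, 0.0])
for _ in range(200):
    x = U @ x
    x /= x.sum()
assert np.allclose(x, q_a, atol=1e-10)
```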
Lemma 1 Let the dominant eigenvalue λ of U, guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy 0 < λ < 1, and let w and v be the right and left eigenvectors corresponding to λ, scaled so that w^T v = 1. Then the derivatives dw and dv are given by (11.44) and (11.45).

Proof Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating v as the right eigenvector of U^T.
Theorem 11.2.5 The derivative of the quasistationary distribution q_a is given by (11.44). The derivative of the quasistationary distribution q_b is given by (11.46), where dw and dv are given by (11.44) and (11.45), respectively.
Proof The derivative of q_a follows from its definition as the scaled right eigenvector of U. For q_b, differentiating (11.43) and applying the vec operator gives an expression that simplifies to (11.46).

Life Lost Due to Mortality
The approach here makes it easy to compute the sensitivity of a variety of dependent variables calculated from the Markov chain. As an example of this flexibility, consider a recently developed demographic index, the number of years of life lost due to mortality (Vaupel and Canudas Romo 2003). The transient states of the chain are age classes, absorption corresponds to death, and the absorbing states correspond to ages at death. Let μ_i be the mortality rate and p_i = exp(−μ_i) the survival probability at age i. The matrix U has the p_i on the subdiagonal and zeros elsewhere. The matrix M has 1 − p_i on the diagonal and zeros elsewhere. Let f = B(:, 1) be the distribution of age at death and η_1 the vector of expected longevity as a function of age.
A death at age i represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the ith entry of η_1, and the expected number of years lost over the distribution of age at death is η† = η_1^T f. This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age x, f would be a delta function at x, and further life expectancy at age x would be zero; their product would give η† = 0. Declines in disparity have accompanied the increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how η† responds to changes in mortality.
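A minimal sketch of the computation of η† (hypothetical mortality rates; the final age class is closed with a self-loop so that the chain remains stochastic, which is an assumption of this sketch rather than part of the text's model):

```python
import numpy as np

mu = np.array([0.05, 0.02, 0.10, 0.30, 1.00])   # hypothetical mortality rates
p = np.exp(-mu)                                  # survival probabilities
s = len(mu)

U = np.zeros((s, s))
U[np.arange(1, s), np.arange(s - 1)] = p[:-1]    # p_i on the subdiagonal
U[-1, -1] = p[-1]            # sketch assumption: last age class loops on itself
M = np.diag(1 - p)           # death at age i -> absorbing state i

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1
f = B[:, 0]                  # distribution of age at death, from age class 1
eta1 = N1.T @ np.ones(s)     # expected remaining longevity at each age
eta_dagger = eta1 @ f        # expected years of life lost due to mortality

assert np.isclose(f.sum(), 1.0)
```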
Differentiating η† gives (11.50); applying the vec operator gives (11.51). Substituting (11.23) for dη_1 and (11.37) for dvec B, simplifying, and writing the derivatives in terms of μ gives the sensitivity of η† to the age-specific mortality rates. Because mortality rates vary over several orders of magnitude with age, it is useful to present the results as elasticities. Applied to life tables for India and for Japan, the elasticities are positive from birth to some age (≈50 for India, ≈85 for Japan) and negative thereafter. This implies that reductions in infant and early-life mortality would reduce η†, whereas reductions in old-age mortality would increase η†. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

Ergodic Chains
Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix P of dimension s × s. The stationary distribution π is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue λ_1 = 1 of P. The fundamental matrix of the chain is Z = (I − P + π 1^T)^{-1} (Kemeny and Snell 1960).
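A sketch of these two objects for a hypothetical 3-state chain; the identities Pπ = π, Zπ = π, and 1^T Z = 1^T used in the proofs can all be confirmed numerically:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])   # hypothetical column-stochastic matrix
s = P.shape[0]

lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real)
pi /= pi.sum()                              # stationary distribution

Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))  # fundamental matrix

assert np.allclose(P @ pi, pi)                   # stationarity
assert np.allclose(Z @ pi, pi)                   # Z pi = pi
assert np.allclose(np.ones(s) @ Z, np.ones(s))   # 1^T Z = 1^T
```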
We are interested only in perturbations that preserve the column-stochasticity of P; i.e., for which P remains a stochastic matrix. Such perturbations are easily defined when the p_ij depend explicitly on a parameter vector θ. However, when the parameters of interest are the p_ij themselves, an implicit parameterization must be defined to preserve the stochastic nature of P under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.
Previous studies of perturbations of ergodic chains focus almost completely on perturbations of the stationary distribution, and are divided between those treating sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986) and those focusing on perturbation bounds and condition numbers (Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties with respect to parameter perturbations, but we take advantage of the matrix calculus approach. We do not consider perturbation bounds here.

The Stationary Distribution
Theorem 11.4.1 Let π be the stationary distribution, satisfying Pπ = π and 1^T π = 1. For perturbations that preserve the column-stochasticity of P, the sensitivity of π is

dπ = (π^T ⊗ Z) dvec P        (11.55)

where Z is the fundamental matrix of the chain.
Proof The vector π is the right eigenvector of P, scaled to sum to 1. Because the perturbations preserve the column sums of P, the dominant eigenvalue remains λ = 1 and 1^T P = 1^T. Differentiating Pπ = π and the normalization 1^T π = 1 gives (I − P + π 1^T) dπ = (dP) π, so that dπ = Z (dP) π. Applying the vec operator and simplifying the Kronecker products yields (11.55).
Based on an analysis of eigenvector sensitivity (Meyer and Stewart 1982), Golub and Meyer (1986) derived an expression for the derivative of π with respect to a change in a single element of P, using the group generalized inverse (I − P)^# of I − P. Since (I − P)^# = Z − π 1^T (Golub and Meyer 1986), and since stochasticity-preserving perturbations satisfy 1^T dP = 0, expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of π using only the chain rule. If g(π) is a vector- or scalar-valued function of π, then dg/dvec^T P = (dg/dπ^T)(dπ/dvec^T P). Some examples will appear in Sect. 11.5.
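The content of the theorem amounts to dπ = Z (dP) π, which follows from differentiating Pπ = π under a stochasticity-preserving perturbation (this compact form is a reconstruction from the proof, and the finite-difference check below is my own, using a parameterized 2-state chain):

```python
import numpy as np

def stationary(P):
    lam, W = np.linalg.eig(P)
    pi = np.abs(W[:, np.argmax(lam.real)].real)
    return pi / pi.sum()

def make_P(a, b):                     # column-stochastic for 0 < a, b < 1
    return np.array([[1 - a, b],
                     [a, 1 - b]])

a, b, h = 0.3, 0.5, 1e-7
P = make_P(a, b)
s = P.shape[0]
pi = stationary(P)
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

dP = (make_P(a + h, b) - P) / h       # derivative of P with respect to a
analytic = Z @ dP @ pi                # matrix calculus sensitivity of pi
numeric = (stationary(make_P(a + h, b)) - pi) / h   # finite difference

assert np.allclose(analytic, numeric, atol=1e-5)
```

For this chain π = (b, a)^T/(a + b), so the derivative with respect to a is (−b, b)^T/(a + b)², which the computation reproduces.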

The Fundamental Matrix
The fundamental matrix Z = (I − P + π 1^T)^{-1} plays a role in ergodic chains similar to that played by N1 in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.

Theorem 11.4.2 The sensitivity of the fundamental matrix is

dvec Z = (Z^T ⊗ Z) [ I_{s²} − (1_s π^T) ⊗ Z ] dvec P        (11.57)

Proof Differentiating Z Z^{-1} = I gives dZ = Z (dP − dπ 1^T) Z. Applying the vec operator gives dvec Z = (Z^T ⊗ Z) [dvec P − (1_s ⊗ I_s) dπ]. Substituting (11.55) for dπ and simplifying gives (11.57).
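The differential dZ = Z (dP − dπ 1^T) Z follows from the definition Z = (I − P + π 1^T)^{-1}; this relation (a reconstruction, checked here by finite differences on a parameterized 2-state chain) can be verified numerically:

```python
import numpy as np

def stationary(P):
    lam, W = np.linalg.eig(P)
    pi = np.abs(W[:, np.argmax(lam.real)].real)
    return pi / pi.sum()

def fundamental(P):
    s = P.shape[0]
    pi = stationary(P)
    return np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

def make_P(a, b):
    return np.array([[1 - a, b],
                     [a, 1 - b]])

a, b, h = 0.3, 0.5, 1e-7
P = make_P(a, b)
s = P.shape[0]
pi, Z = stationary(P), fundamental(P)

dP = (make_P(a + h, b) - P) / h
dpi = Z @ dP @ pi                                     # sensitivity of pi
analytic = Z @ (dP - np.outer(dpi, np.ones(s))) @ Z   # dZ = Z (dP - dpi 1^T) Z
numeric = (fundamental(make_P(a + h, b)) - Z) / h

assert np.allclose(analytic, numeric, atol=1e-4)
```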

The First Passage Time Matrix
Let R = (r_ij) be the matrix of mean first passage times from state j to state i, given by

R = D (I − Z + Z_dg E)

where D is the diagonal matrix with entries 1/π_i, Z_dg is the matrix of the diagonal elements of Z with zeros elsewhere, and E is a matrix of ones (Iosifescu 1980, Thm. 4.7). Again, this is the transpose of the expression obtained when P is row-stochastic.
Theorem 11.4.3 The sensitivity of R is obtained by differentiating this expression; it is written in terms of dπ, given by (11.55), and dvec Z, given by (11.57).
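The closed form for R can be checked against a direct computation of first passage times. Here R = D (I − Z + Z_dg E), with D = diag(1/π_i), is my transposed reconstruction of the Kemeny-Snell/Iosifescu formula for column-stochastic P, compared with solving the first-passage equations τ_j = 1 + Σ_{k≠i} p_kj τ_k for each target state i:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])   # hypothetical column-stochastic matrix
s = P.shape[0]
lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real); pi /= pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

Zdg = np.diag(np.diag(Z))
E = np.ones((s, s))
R = np.diag(1 / pi) @ (np.eye(s) - Z + Zdg @ E)   # r_ij = mean passage time j -> i

# Direct check: for target i, solve tau_j = 1 + sum_{k != i} p_kj tau_k.
for i in range(s):
    A = np.delete(np.delete(P, i, axis=0), i, axis=1)   # chain with state i removed
    tau = np.linalg.solve(np.eye(s - 1) - A.T, np.ones(s - 1))
    others = [j for j in range(s) if j != i]
    assert np.allclose(R[i, others], tau)

assert np.allclose(np.diag(R), 1 / pi)   # diagonal entries are mean return times
```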

Mixing Time and the Kemeny Constant
The mixing time K of a chain is the mean time required to get from a specified state to a state chosen at random from the stationary distribution π. Remarkably, K is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is sometimes called Kemeny's constant. It is a measure of the rate of convergence to stationarity, and is given by K = trace(Z) (Hunter 2006). The sensitivity of K follows directly: because K = trace(Z), dK = (vec I_s)^T dvec Z (Theorem 11.4.4), with dvec Z given by (11.57). In addition to being a quantity of interest in itself, the rate of convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).

Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of dvec P. However, a perturbation of any element, say of p_kj to p_kj + θ_kj, must be compensated for by adjustments of the other elements in column j so that the column sum remains equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use in applications: additive and proportional. Additive compensation adjusts all the other elements of the column by an equal amount, distributing the perturbation θ_kj additively over column j. Proportional compensation distributes θ_kj in proportion to the values of the p_ij, for i ≠ k. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within P.
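As a numerical illustration of the Kemeny constant (hypothetical 3-state chain; the first-passage matrix R below uses my reconstruction of the column-stochastic Kemeny-Snell formula), K = trace(Z) equals the π-weighted mean first passage time from every starting state, and satisfies the spectral identity trace(Z) = 1 + Σ_{i≥2} 1/(1 − λ_i):

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
s = P.shape[0]
lam, W = np.linalg.eig(P)
pi = np.abs(W[:, np.argmax(lam.real)].real); pi /= pi.sum()
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))

K = np.trace(Z)                       # Kemeny's constant

# Mean first passage times (reconstructed formula); K_j = sum_i pi_i r_ij
R = np.diag(1 / pi) @ (np.eye(s) - Z + np.diag(np.diag(Z)) @ np.ones((s, s)))
assert np.allclose(pi @ R, K)         # the same value from every starting state

# Spectral identity for trace(Z); this particular chain has a real spectrum.
subdominant = sorted(lam.real)[:-1]
assert np.isclose(K, 1 + sum(1 / (1 - x) for x in subdominant))
```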

Implicit Parameters and Compensation
To develop the compensation formulae, let us start by considering a probability vector p, of dimension s × 1, with p_i ≥ 0 and Σ_i p_i = 1. Let θ_i be the perturbation of p_i, and write

p(θ) = p(0) + Aθ        (11.68)

for some matrix A to be determined. If y is a function of p, then

dy = (dy/dp^T)(dp/dθ^T) dθ        (11.69)

evaluated at θ = 0.
Additive compensation For the case of additive compensation, we write the system (11.70), in which the perturbation θ_1 is added to p_1 and is compensated for by subtracting θ_1/(s − 1) from each of the other entries of p, and similarly for the other θ_i; clearly Σ_i p_i(θ) = 1 for any perturbation vector θ. Defining E to be a matrix of ones, and C = E − I (zeros on the diagonal, ones elsewhere), the system (11.70) can be written so that the matrix A in (11.68) is

A = I − (1/(s − 1)) C        (11.72)

Proportional compensation For proportional compensation, assume that p_i < 1 for all i. The perturbation θ_1 is added to p_1 and is compensated for by subtracting θ_1 p_i/(1 − p_1) from the ith entry of p, for i ≠ 1, and similarly for the other θ_j. Again, Σ_i p_i(θ) = 1 for any perturbation vector θ. Equation (11.73) can be written so that the matrix A in (11.68) has entries

a_ij = δ_ij − (1 − δ_ij) p_i / (1 − p_j)        (11.75)

The transition matrix We have derived compensation formulae for a single probability vector p. Now consider perturbation of a probability matrix P, each column of which is a probability vector. Define a perturbation matrix Θ, where θ_ij is the perturbation of p_ij. Perturbations of column j are to be compensated by a matrix A_j, so that

P(Θ) = P(0) + ( A_1 Θ(:, 1) · · · A_s Θ(:, s) )        (11.76)

where A_i compensates for the changes in column i of P. Applying the vec operator to (11.76) expresses vec P(Θ) as vec P(0) plus a sum (11.78) whose terms are recognizable as the vec of the products A_i Θ E_ii; thus

vec P(Θ) = vec P(0) + Σ_i (E_ii ⊗ A_i) vec Θ        (11.79)

where E_ii is a matrix with a 1 in the (i, i) entry and zeros elsewhere.
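A sketch of the two compensation matrices for a single probability column (hypothetical p and θ); each A has zero column sums, so p(θ) = p + Aθ keeps Σ_i p_i(θ) = 1:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])      # hypothetical probability vector
s = len(p)
C = np.ones((s, s)) - np.eye(s)    # C = E - I

A_add = np.eye(s) - C / (s - 1)    # additive compensation

# Proportional compensation: a_ij = delta_ij - (1 - delta_ij) p_i / (1 - p_j)
A_prop = np.eye(s)
for j in range(s):
    for i in range(s):
        if i != j:
            A_prop[i, j] = -p[i] / (1 - p[j])

theta = np.array([0.01, -0.005, 0.0])    # an arbitrary small perturbation
for A in (A_add, A_prop):
    assert np.allclose(np.ones(s) @ A, 0.0)        # zero column sums...
    assert np.isclose((p + A @ theta).sum(), 1.0)  # ...so the total is preserved
```

The off-diagonal entries of A_prop vanish wherever p_i = 0, which is how proportional compensation preserves the pattern of zeros noted in the text.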
Theorem 11.4.5 Let P be a column-stochastic s × s transition matrix. Let Θ be a matrix of perturbations, where θ_ij is applied to p_ij and the other entries of Θ compensate for the perturbation. Let C = E − I. If compensation is additive, then P(Θ) is given by (11.80) and

dvec P / dvec^T Θ = I_s ⊗ (I_s − (1/(s − 1)) C)        (11.81)

If compensation is proportional, then P(Θ) is given by (11.82) and

dvec P / dvec^T Θ = Σ_j (E_jj ⊗ A_j)        (11.83)

with A_j given by (11.75) evaluated at column j of P.

Proof P(Θ) is given by (11.79). If compensation is additive, A_i is given by (11.72) for all i; substituting into (11.79) gives (11.80), and differentiating (11.80) and applying the vec operator gives (11.81). If compensation is proportional, substituting (11.75) for A_i in (11.79) gives (11.82); differentiating yields (11.84), and applying the vec operator gives (11.83).
Perturbations of P subject to compensation are thus expressed through perturbations of Θ. For any function y(P) we can write

(dy/dvec^T P)_comp = (dy/dvec^T P)(dvec P/dvec^T Θ)        (11.85)

where dvec P/dvec^T Θ is given (for additive and proportional compensation) by Theorem 11.4.5. The slight notational complexity is worthwhile for clarifying how to use Theorem 11.4.5 in practice.

Species Succession in a Marine Community
Markov chains are used by ecologists as models of species replacement (succession) in ecological communities (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton 2010). In these models, the state of a point on a landscape is given by the species occupying that point. The entry p_ij of P is the probability that species j is replaced by species i between t and t + 1. If a community consists of a large number of points independently subject to the transition probabilities in P, the stationary distribution π gives the relative frequencies of species in the community at equilibrium. Hill et al. (2004) used a Markov chain to describe a community of encrusting organisms occupying rock surfaces at 30-35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an additional state ("bare rock") for unoccupied substrate. The matrix P was estimated from longitudinal data (Hill et al. 2002, 2004) and is given, along with a list of species names, in Appendix B. We will use the results of this chapter to analyze the sensitivity of species diversity and of the Kemeny constant to the processes of colonization and replacement that determine P.

Biotic Diversity
The stationary distribution π, with the species numbered in order of decreasing abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The two dominant species are an encrusting sponge (Hymedesmia) and a bryozoan (Crisia).
The entropy of this stationary distribution, H(π) = −π^T (log π), where the logarithm is applied elementwise, is used as an index of biodiversity; it is maximal when all species are equally abundant and goes to 0 in a community dominated by a single species. The sensitivity of H is

dH = −(log π + 1)^T dπ        (11.86)

Most ecologists, however, would not include bare substrate in a measure of biodiversity, so we define instead a "biotic diversity"

H_b = H(π_b),  where π_b = Gπ / (1^T Gπ)        (11.87)

The matrix G, of dimension 14 × 15, is a 0-1 matrix that selects rows 1-14 of π. Because π is positive, the normalization constant 1^T Gπ is positive. Differentiating π_b gives

dπ_b = (1 / (1^T Gπ)) (I − π_b 1^T) G dπ        (11.89)

This model contains no explicit parameters; perturbations of the transition probabilities themselves are of interest, so a compensation pattern is needed. Because the relative magnitudes of the entries in a column of P reflect the relative abilities of species to capture or to hold space, proportional compensation is appropriate in this case, because it preserves these relative abilities.
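A sketch of H and H_b for a hypothetical 4-state community (3 species plus bare rock as the last state); the entropy gradient −(log π + 1) is the standard elementwise derivative of −Σ π_i log π_i, checked here by finite differences:

```python
import numpy as np

pi = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical; state 4 is bare rock

def H(q):                             # entropy, log applied elementwise
    return -q @ np.log(q)

G = np.eye(4)[:3]                     # 0-1 matrix selecting the biotic states
gp = G @ pi
pi_b = gp / gp.sum()                  # renormalized biotic distribution
H_full, H_biotic = H(pi), H(pi_b)

assert np.isclose(pi_b.sum(), 1.0)
assert H_biotic <= np.log(3) + 1e-12  # entropy is maximal for a uniform pi_b

# Finite-difference check of the gradient dH/dpi_i = -(log pi_i + 1):
h = 1e-7
e0 = np.zeros(4); e0[0] = h
fd = (H(pi + e0) - H(pi)) / h
assert np.isclose(fd, -(np.log(pi[0]) + 1), atol=1e-5)
```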
The sensitivity and elasticity of the biotic diversity H_b to changes in the matrix P, subject to proportional compensation, are given by (11.90) and (11.91), each a product of four terms. Term 1 on the right-hand side of (11.90) is the derivative of H_b with respect to π_b, given by (11.86). Term 2 is the derivative of the biotic diversity vector π_b with respect to the full stationary distribution π, given by (11.89). Term 3 is the derivative of π with respect to the transition matrix P, given by (11.55). Finally, Term 4 is the derivative of the matrix P taking into account the compensation structure in (11.83). The sensitivity and elasticity vectors (11.90) and (11.91) are of dimension 1 × s² = 1 × 225. To reduce the number of independent perturbations, we consider subsets of the p_ij: disturbance (in which a species is replaced by bare rock), colonization of unoccupied space, replacement of one species by another, and persistence of a species in its location. Extracting the corresponding elements of dH_b/dvec^T P gives the elasticities to these classes of probabilities. Figure 11.3 shows that the dominant species (1 and 2) have impacts that are larger than, and opposite in sign to, those of the remaining species. Biodiversity would be enhanced by increasing the disturbance of, or the replacement of, species 1 and 2, and reduced by increasing the rates of colonization by, persistence of, or replacement by species 1 and 2.

The Kemeny Constant and Ecological Mixing
Ecologists have used several measures of the rate of convergence of communities modelled by Markov chains, including the damping ratio and Dobrushin's coefficient of ergodicity (Hill et al. 2004). The Kemeny constant K is an interesting addition to this list; it gives the expected time to get from any initial state to a state selected at random from the stationary distribution (Hunter 2006). Once that state is reached, the behavior of the chain and the stationary process are indistinguishable.

[Fig. 11.2 The stationary distribution of the chain of Hill et al. (2004). States 1-14 correspond to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the identity of the species and the transition matrix, see Appendix B.]
The sensitivity of K, subject to compensation, is the chain rule product

dK/dvec^T Θ = (dK/dvec^T Z)(dvec Z/dvec^T P)(dvec P/dvec^T Θ)        (11.92)

where the three terms on the right-hand side are given by (11.65), (11.57), and (11.83), respectively. Figure 11.4 shows the sensitivities dK/dvec^T P, subject to proportional compensation, aggregated as in Fig. 11.3. Unlike the case with H_b, the two dominant species do not stand out from the others. Increases in the rates of replacement will speed up convergence, and increases in persistence will slow convergence. The disturbance of, colonization by, persistence of, and replacement of species 6 (a sea anemone, Urticina crassicornis) have particularly large impacts on K. Examination of row 6 and column 6 of P (Appendix B) shows that U. crassicornis has the highest probability of persistence (p_66 = 0.86), and one of the lowest rates of disturbance, in the community. While it is far from dominant (Fig. 11.2), it has a major impact on the rate of mixing.

Discussion
Given that many properties of finite-state Markov chains can be expressed as simple matrix expressions, matrix calculus is an attractive approach to finding their sensitivity and elasticity to parameter perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains, and to dependent variables other than the stationary distribution. The perturbation of ergodic chains has often been studied using generalized inverses, since the influential studies of Meyer (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 1986). Matrix calculus provides a complementary approach; the sensitivity of the stationary distribution π obtained here agrees with the result obtained by Golub and Meyer (1986) using the group generalized inverse.
The examples shown here are typical of cases where absorbing or ergodic Markov chains are used in population biology and ecology. In each example, the dependent variables of interest are functions several steps removed from the chain itself. The ease with which one can differentiate such functions is a particularly attractive property of the matrix calculus approach.

A Appendix A: Proofs
Theorems 11.2.1 and 11.2.2 give the sensitivities of the moments of the number of visits to transient states and of the time to absorption, respectively. These results are obtained by applying matrix calculus to the expressions for the moments. Proofs are given in the text for the first two moments; the proofs for the others follow the same steps but introduce no new concepts, and so are presented here.

A.1 Derivatives of the Moments of Occupancy Times
To continue the proof of Theorem 11.2.1, take partial differentials of N3 in (11.5) with respect to N1 and N_dg. Applying the vec operator to each term and using Roth's theorem gives (11.95) and (11.96); substituting (11.95) and (11.96) into (11.13) gives (11.9). Similarly, taking partial differentials of N4 in (11.6) and applying the vec operator yields (11.99) and (11.100); substituting these into (11.13) gives (11.10).

A.2 Derivatives of the Moments of Time to Absorption
To continue the proof of Theorem 11.2.2, take partial differentials of η_3 in (11.21) with respect to η_1 and N1. Applying the vec operator yields expressions that combine to yield (11.25); the same sequence of steps applied to η_4 yields (11.26).
B Appendix B: The Transition Matrix

[The transition matrix for the marine benthic community (Hill et al. 2004), together with the identities of the species, appeared here as a table.]

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.