# Sensitivity Analysis of Continuous Markov Chains

• Hal Caswell
Open Access
Chapter
Part of the Demographic Research Monographs book series (DEMOGRAPHIC)

## Abstract

When Markov chains are used as mathematical models of natural or social phenomena, the transition intensities or probabilities are usually defined in terms of parameters that are relevant to the scientific question at hand. Sensitivity analysis of such models is important because it quantifies the dependence of the model behavior on the parameters.

## 12.1 Introduction

When Markov chains are used as mathematical models of natural or social phenomena, the transition intensities or probabilities are usually defined in terms of parameters that are relevant to the scientific question at hand. Sensitivity analysis of such models is important because it quantifies the dependence of the model behavior on the parameters. This chapter presents sensitivity results for finite-state, continuous-time absorbing Markov chains, paralleling the approach for discrete-time chains in Chap. . In absorbing chains, interest focuses on behavior prior to absorption (time spent in transient states and time to absorption) and on the probabilities of absorption in each absorbing state. Here we will derive formulae for the sensitivity and the elasticity (i.e., proportional sensitivity) of the moments of the time to absorption, the time spent in each transient state, and the number of visits to each transient state.

The most basic difference between discrete-time and continuous-time Markov chains is that the former are defined by transition probabilities, while the latter are defined by transition rates. This leads to differences in the structure of the matrices, but there is a nice parallelism in the results.

Perturbation analysis of Markov chains has a long history (Schweitzer 1968; Meyer 1975). Most of the literature, however, is devoted to discrete-time chains, and most of that focuses on ergodic chains and the perturbation analysis of the stationary distribution; e.g. Funderlic and Meyer (1986), Golub and Meyer (1986), Hunter (2005), Cho and Meyer (2000), and Seneta (1993). Much less attention has been paid to continuous-time chains. Perturbation expansions have been developed for the stationary distribution of ergodic continuous-time chains, with application to queueing models (Altman et al. 2004), and sensitivity results and perturbation bounds presented for transient solutions (Ramesh and Trivedi 1993; Mitrophanov 2004). The operations research literature contains many studies of the sensitivity of performance measures calculated over realizations of a continuous-time ergodic Markov chain; e.g., Cao (1989), Glasserman (1992), and Cao et al. (1996). The results to be presented here complement and extend the existing literature on perturbation analysis of Markov chains, by focusing on the statistical properties of the solutions of absorbing continuous-time chains, by introducing the use of matrix calculus, and (as a consequence of that technique) extending the range of parameters whose effects can be evaluated.

### 12.1.1 Absorbing Markov Chains

I consider a finite-state, homogeneous, continuous-time Markov chain with intensity matrix Q, where qij is the rate of transition from state j to state i. The intensity matrix satisfies qij ≥ 0 for i ≠ j and $$q_{jj} = -\sum _{i \ne j} q_{ij}$$, so that each column sums to zero. Note that Q is written in column-to-row orientation, and operates on column vectors. An absorbing chain contains at least one absorbing class of states. Numbering the states so that the transient states appear before the absorbing states leads to the intensity matrix
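\displaystyle \begin{aligned} {\mathbf{Q}} = \left(\begin{array}{c|c} {\mathbf{U}} & {\mathbf{0}} \\ \hline {\mathbf{M}} & {\mathbf{0}} \end{array}\right). {} \end{aligned}
(12.1)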
The matrix U contains rates of transitions among the transient states, and M contains the rates of transition from transient to absorbing states.
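As a concrete illustration of this block structure, the following sketch assembles an intensity matrix from a transient block and an absorbing block and checks its defining properties. The rates are hypothetical values chosen only for illustration (s = 3 transient and a = 2 absorbing states); they do not come from any model in this chapter.

```python
import numpy as np

# Hypothetical rates for s = 3 transient and a = 2 absorbing states.
# Column-to-row orientation: entry (i, j) is the rate from state j to state i.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])
M = np.array([[0.1, 0.2, 0.3],
              [0.1, 0.1, 0.1]])

s, a = U.shape[0], M.shape[0]

# Assemble Q with the transient states first; absorbing states have no outflow,
# so their columns are zero.
Q = np.block([[U, np.zeros((s, a))],
              [M, np.zeros((a, a))]])

# Defining properties of an intensity matrix in this orientation.
col_sums = Q.sum(axis=0)                                     # should all be zero
off_diag_ok = np.all(Q[~np.eye(s + a, dtype=bool)] >= 0)     # non-negative rates
```

The same hypothetical U and M are reused in the later sketches so that results can be compared across sections.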

I assume that U and M are differentiable functions of a vector θ of parameters, and that Q[θ] remains an intensity matrix for sufficiently small perturbations of θ. This includes as a special case the situation where the elements of θ are simply some or all of the qij, i ≠ j. The goal of the perturbation analysis is to obtain the derivatives of properties of the chain with respect to θ.

## 12.2 Occupancy Time in Transient States

Let s be the number of transient states, and νij be the time spent in transient state i by an individual starting in transient state j. Define $${\mathbf {N}}_k = E \left ( \nu _{ij}^k \right )$$ as the matrix whose entries are the kth moments, and $${\mathbf {N}}_{\mathrm {dg}} = \left ( {\mathbf {N}}_1 \right )_{\mathrm {dg}}$$. The matrix N1 of expectations is the fundamental matrix of the chain. The first several moments of occupancy times are given by the entries of the matrices
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_1 &\displaystyle =&\displaystyle - {\mathbf{U}}^{-1} {} \end{array} \end{aligned}
(12.2)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_1 {} \end{array} \end{aligned}
(12.3)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_3 &\displaystyle =&\displaystyle 6 {\mathbf{N}}_{\mathrm{dg}}^2 {\mathbf{N}}_1 {} \end{array} \end{aligned}
(12.4)
\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{N}}_4 &\displaystyle =&\displaystyle 24 {\mathbf{N}}_{\mathrm{dg}}^3 {\mathbf{N}}_1 {} \end{array} \end{aligned}
(12.5)
and, in general, by
\displaystyle \begin{aligned} {\mathbf{N}}_k = k {\mathbf{N}}_{\mathrm{dg}} {\mathbf{N}}_{k-1} \qquad k\ge 2 {} \end{aligned}
(12.6)
(Iosifescu 1980, Thm. 8.7).
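The recursion (12.6) is straightforward to evaluate numerically. The sketch below, using a hypothetical 3-state transient block (not taken from the chapter), computes the first four moment matrices and lets the recursion be compared with the closed forms (12.3), (12.4), and (12.5). The function name `occupancy_moments` is an illustrative choice.

```python
import numpy as np

def occupancy_moments(N1, kmax):
    """Return [N_1, ..., N_kmax] using N_k = k N_dg N_{k-1}, as in (12.6)."""
    Ndg = np.diag(np.diag(N1))          # diagonal part of the fundamental matrix
    moments = [N1]
    for k in range(2, kmax + 1):
        moments.append(k * Ndg @ moments[-1])
    return moments

# Hypothetical transient block (s = 3), column-to-row orientation.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])

N1 = -np.linalg.inv(U)                  # fundamental matrix, (12.2)
Ndg = np.diag(np.diag(N1))
N = occupancy_moments(N1, 4)            # N[0] is N_1, N[3] is N_4
```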
The differentials of the moments (12.2), (12.3), (12.4), and (12.5) are
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{N}}_1 &\displaystyle =&\displaystyle \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.7)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{N}}_2 &\displaystyle =&\displaystyle 2 \left\{ \rule{0in}{2.2ex} \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) + \left( {\mathbf{I}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right) \right\} \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.8)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{N}}_3 &\displaystyle =&\displaystyle 6 \left\{ \rule{0in}{2.2ex} 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}} ) + \left( {\mathbf{I}} \otimes {\mathbf{N}}_{\mathrm{dg}}^2 \right) \right\} \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d \mbox{vec} \, {\mathbf{U}} {}\\ \end{array} \end{aligned}
(12.9)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, {\mathbf{N}}_4 &\displaystyle =&\displaystyle 24\left\{ \rule{0in}{2.2ex} 3 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}}^2 \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) + \left( {\mathbf{I}} \otimes {\mathbf{N}}_{\mathrm{dg}}^3 \right) \right\} \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_1 \right) d \mbox{vec} \, {\mathbf{U}}\\ {} \end{array} \end{aligned}
(12.10)
where I = Is throughout. A recursive relation for all the moments is
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_k = k \left( {\mathbf{N}}_{k-1}^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}) d \mbox{vec} \, {\mathbf{N}}_1 + k \left( {\mathbf{I}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right) d \mbox{vec} \, {\mathbf{N}}_{k-1} \qquad k \ge 2. {}\end{aligned}
(12.11)
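Formulas of this kind are easy to check against finite differences. The following sketch does so for (12.7) and (12.8), assuming a hypothetical 3-state transient block and treating every entry of U as a free parameter. The helper names `vec` and `numerical_jacobian` are illustrative; `vec` stacks columns, which is the convention the Kronecker-product identities require.

```python
import numpy as np

def vec(X):
    """Stack the columns of X (column-major), matching the vec operator."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(f, X, h=1e-6):
    """Central-difference approximation to d vec f(X) / d vec^T X."""
    cols = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = h
        E = E.reshape(X.shape, order="F")       # perturb entry k of vec X
        cols.append(vec(f(X + E) - f(X - E)) / (2 * h))
    return np.column_stack(cols)

def N1_of(U):
    return -np.linalg.inv(U)                     # (12.2)

def N2_of(U):
    N1 = N1_of(U)
    return 2 * np.diag(np.diag(N1)) @ N1         # (12.3)

# Hypothetical transient block (s = 3), column-to-row orientation.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])
s = U.shape[0]
I = np.eye(s)
N1 = N1_of(U)
Ndg = np.diag(np.diag(N1))
D_vecI = np.diag(vec(I))                         # D(vec I)

J1 = np.kron(N1.T, N1)                                        # (12.7)
J2 = 2 * (np.kron(N1.T, I) @ D_vecI + np.kron(I, Ndg)) @ J1   # (12.8)

# Largest absolute discrepancy between analytical and numerical Jacobians.
err1 = np.max(np.abs(J1 - numerical_jacobian(N1_of, U)))
err2 = np.max(np.abs(J2 - numerical_jacobian(N2_of, U)))
```

In an application, these s² × s² matrices would be multiplied on the right by the derivative of vec U with respect to the parameter vector, which depends on how the rates are parameterized.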
The variance, standard deviation, and coefficient of variation of the νij are important in applications; they are
\displaystyle \begin{aligned} \begin{array}{rcl} V \left( \nu_{ij} \right) &\displaystyle =&\displaystyle {\mathbf{N}}_2 - {\mathbf{N}}_1 \circ {\mathbf{N}}_1 {} \end{array} \end{aligned}
(12.12)
\displaystyle \begin{aligned} \begin{array}{rcl} SD \left( \nu_{ij} \right) &\displaystyle =&\displaystyle \sqrt{V \left( \nu_{ij} \right)} {} \end{array} \end{aligned}
(12.13)
\displaystyle \begin{aligned} \begin{array}{rcl} CV \left( \nu_{ij} \right) &\displaystyle =&\displaystyle \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{N}}_1 \right)^{-1} \mbox{vec} \, SD \left( \nu_{ij} \right) {}\vspace{-3pt} \end{array} \end{aligned}
(12.14)
where the square root is taken elementwise. Their derivatives are
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, V &\displaystyle =&\displaystyle 2 \left[ \rule{0in}{2.1ex} \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) \mathcal{D}\,( \mbox{vec} \, {\mathbf{I}} ) + \left( {\mathbf{I}} \otimes {\mathbf{N}}_{\mathrm{dg}} \right) - \mathcal{D}\,(\mbox{vec} \, {\mathbf{N}}_1 ) \right] d \mbox{vec} \, {\mathbf{N}}_1 {}\\ \end{array} \end{aligned}
(12.15)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, SD &\displaystyle =&\displaystyle \frac{1}{2} \mathcal{D}\, \left[ \rule{0in}{2.1ex} \mbox{vec} \, SD \left( \nu_{ij} \right) \right]^{-1} d \mbox{vec} \, V {} \end{array} \end{aligned}
(12.16)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, CV &\displaystyle =&\displaystyle \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{N}}_1 \right)^{-1} d \mbox{vec} \, SD \\ &\displaystyle &\displaystyle - \left[ \left( \mbox{vec} \, SD \right)^{\mathsf{T}} \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{N}}_1 \right)^{-1} \otimes \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{N}}_1 \right)^{-1} \right] \\ &\displaystyle &\displaystyle \times \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{I}}_{s^2} \right) \left( {\mathbf{1}}_{s^2} \otimes {\mathbf{I}}_{s^2} \right) d \mbox{vec} \, {\mathbf{N}}_1 {} \end{array} \end{aligned}
(12.17)
(suppressing the arguments of V , SD and CV ). Because N1 usually contains zeros, $$\mathcal {D}\,(\mbox{vec} \, {\mathbf {N}}_1)^{-1}$$ must be restricted to the non-zero entries; the coefficient of variation is undefined if the mean is zero.
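The restriction to non-zero entries can be handled with a boolean mask. The sketch below computes the variance, standard deviation, and coefficient of variation of the occupancy times for a hypothetical 3-state transient block in which state 3 cannot reach states 1 or 2, so that N1 contains zeros. The threshold used to decide which means count as zero is an assumption of this example.

```python
import numpy as np

# Hypothetical transient block (s = 3); states 1 and 2 are unreachable from 3.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])

N1 = -np.linalg.inv(U)                      # (12.2)
Ndg = np.diag(np.diag(N1))
N2 = 2 * Ndg @ N1                           # (12.3)

V = N2 - N1 * N1                            # (12.12); "*" is the Hadamard product
SD = np.sqrt(np.clip(V, 0.0, None))         # (12.13); clip guards against round-off

reachable = N1 > 1e-12                      # entries with a non-zero mean
CV = np.full_like(N1, np.nan)               # left undefined where the mean is zero
CV[reachable] = SD[reachable] / N1[reachable]   # (12.14) on non-zero entries only
```

Entries of `CV` left as `nan` mark state pairs for which the coefficient of variation is undefined.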

### Derivation

The fundamental matrix is $${\mathbf {N}}_1 = -{\mathbf {U}}^{-1}$$. Applying () yields (12.7). The derivatives of the higher moments are obtained by differentiating N2 through N4 in (12.3), (12.4), and (12.5). For example, the differential of N4 is
\displaystyle \begin{aligned} d {\mathbf{N}}_4 = 24 \left\{ \rule{0in}{2.1ex} 3 {\mathbf{N}}_{\mathrm{dg}}^2 \left( d {\mathbf{N}}_{\mathrm{dg}} \right) {\mathbf{N}}_1 + {\mathbf{N}}_{\mathrm{dg}}^3 \left( d {\mathbf{N}}_1 \right) \right\}, \end{aligned}
(12.18)
using the fact that Ndg commutes with itself and dNdg. Applying the vec operator gives
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_4 = 24 \left\{ \rule{0in}{2.1ex} 3 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{N}}_{\mathrm{dg}}^2 \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}} + \left( {\mathbf{I}}_s \otimes {\mathbf{N}}_{\mathrm{dg}}^3 \right) d \mbox{vec} \, {\mathbf{N}}_1 \right\}. \end{aligned}
(12.19)
Substituting () for dvec Ndg and (12.7) for dvec N1 gives (12.10). Results (12.8) and (12.9) are obtained in similar fashion.
Differentiating the recurrence relationship (12.6) gives
\displaystyle \begin{aligned} d {\mathbf{N}}_k = k \left( d {\mathbf{N}}_{\mathrm{dg}} \right) {\mathbf{N}}_{k-1} + k {\mathbf{N}}_{\mathrm{dg}} \left( d {\mathbf{N}}_{k-1} \right). \end{aligned}
(12.20)
Apply the vec operator,
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{N}}_k = k \left( {\mathbf{N}}_{k-1}^{\mathsf{T}} \otimes {\mathbf{I}}_s \right) d \mbox{vec} \, {\mathbf{N}}_{\mathrm{dg}} + k \left( {\mathbf{I}}_s \otimes {\mathbf{N}}_{\mathrm{dg}} \right) d \mbox{vec} \, {\mathbf{N}}_{k-1}, \end{aligned}
(12.21)
and substitute () for dvec Ndg to obtain (12.11).
The derivative of V  in (12.15) comes from differentiating (12.12),
\displaystyle \begin{aligned} d V = d {\mathbf{N}}_2 - 2 {\mathbf{N}}_1 \circ d {\mathbf{N}}_1, \end{aligned}
(12.22)
applying the vec operator,
\displaystyle \begin{aligned} d \mbox{vec} \, V = d \mbox{vec} \, {\mathbf{N}}_2 - 2 \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{N}}_1 \right) d \mbox{vec} \, {\mathbf{N}}_1, \end{aligned}
(12.23)
and then using (12.7) and (12.8). The derivative of $$SD \left ( \nu _{ij} \right )$$ in (12.16) follows from (). The derivative of $$CV \left ( \nu _{ij}\right )$$ in (12.17) is obtained using (), with x = vec SD and y = vec N1.

## 12.3 Longevity: Time to Absorption

Let ηj be the time to absorption for an individual currently in transient state j. The vectors of the kth moments of the time to absorption, ηk, satisfy
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_1^{\mathsf{T}} &\displaystyle =&\displaystyle {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1 {} \end{array} \end{aligned}
(12.24)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_2^{\mathsf{T}} &\displaystyle =&\displaystyle (2) {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1^2 {} \end{array} \end{aligned}
(12.25)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_3^{\mathsf{T}} &\displaystyle =&\displaystyle (6) {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1^3 {} \end{array} \end{aligned}
(12.26)
\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{\eta}_4^{\mathsf{T}} &\displaystyle =&\displaystyle (24) {\mathbf{1}}^{\mathsf{T}} {\mathbf{N}}_1^4 {} \end{array} \end{aligned}
(12.27)
and in general
\displaystyle \begin{aligned} \boldsymbol{\eta}_k^{\mathsf{T}} = k \boldsymbol{\eta}_{k-1}^{\mathsf{T}} {\mathbf{N}}_1 \qquad k \ge 2 {} \end{aligned}
(12.28)
(Iosifescu 1980, Thm. 8.6)
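The moments of the time to absorption follow from the fundamental matrix by the recursion (12.28). A minimal sketch, assuming a hypothetical 3-state transient block (not a model from this chapter) and an illustrative function name:

```python
import numpy as np

def absorption_time_moments(N1, kmax):
    """Return [eta_1, ..., eta_kmax] via eta_k^T = k eta_{k-1}^T N1, (12.28)."""
    eta = [np.ones(N1.shape[0]) @ N1]        # eta_1^T = 1^T N1, (12.24)
    for k in range(2, kmax + 1):
        eta.append(k * eta[-1] @ N1)
    return eta

# Hypothetical transient block (s = 3), column-to-row orientation.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])

N1 = -np.linalg.inv(U)
eta = absorption_time_moments(N1, 4)         # eta[0] is eta_1, eta[3] is eta_4

V = eta[1] - eta[0] ** 2                     # variance of the time to absorption
SD = np.sqrt(V)                              # standard deviation
CV = SD / eta[0]                             # standard deviation divided by the mean
```

In this example state 3 leaves only by absorption, at total rate 0.4, so its time to absorption is exponential with mean 2.5 and coefficient of variation 1.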
The variance, standard deviation, and coefficient of variation of the time to absorption are
\displaystyle \begin{aligned} \begin{array}{rcl} V(\boldsymbol{\eta}) &\displaystyle =&\displaystyle \boldsymbol{\eta}_2 - \boldsymbol{\eta}_1 \circ \boldsymbol{\eta}_1 {} \end{array} \end{aligned}
(12.29)
\displaystyle \begin{aligned} \begin{array}{rcl} SD \left( \boldsymbol{\eta} \right) &\displaystyle =&\displaystyle \sqrt{V \left( \boldsymbol{\eta} \right)} {} \end{array} \end{aligned}
(12.30)
\displaystyle \begin{aligned} \begin{array}{rcl} CV \left( \boldsymbol{\eta} \right) &\displaystyle =&\displaystyle \mathcal{D}\, \left( \boldsymbol{\eta}_1 \right)^{-1} SD \left( \boldsymbol{\eta} \right) {} \end{array} \end{aligned}
(12.31)
with the square root taken elementwise.
The derivatives of the moments in (12.24), (12.25), (12.26), and (12.27) are given by
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_1 &\displaystyle =&\displaystyle \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.32)
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_2 &\displaystyle =&\displaystyle \left\{ 2 \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right] + 2 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right) \right\} d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.33)
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_3 &\displaystyle =&\displaystyle \left\{ \rule{0in}{3ex} 6 \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^3 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right] + 6 \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right] \right. \\ &\displaystyle &\displaystyle + \left. 3 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_2^{\mathsf{T}} {\mathbf{N}}_1 \right) \rule{0in}{3ex} \right\} d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.34)
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_4 &\displaystyle =&\displaystyle \left\{ 24 \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^4 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right] + 24 \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^3 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right] \right. \\ &\displaystyle &\displaystyle ~~\left. + 12 \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 \otimes \boldsymbol{\eta}_2^{\mathsf{T}} {\mathbf{N}}_1 \right] + 4 \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_3^{\mathsf{T}} {\mathbf{N}}_1 \right) \right\} d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.35)
and, recursively,
\displaystyle \begin{aligned} d \boldsymbol{\eta}_k = k {\mathbf{N}}_1^{\mathsf{T}} d \boldsymbol{\eta}_{k-1} + k \left( {\mathbf{I}}_s \otimes \boldsymbol{\eta}_{k-1}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1. {} \end{aligned}
(12.36)
The derivatives of the variance, standard deviation, and coefficient of variation of the time to absorption are (suppressing the arguments)
\displaystyle \begin{aligned} \begin{array}{rcl} d V &\displaystyle =&\displaystyle 2 \left\{ \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right] + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} {\mathbf{N}}_1 \right) - \mathcal{D}\, \left( \boldsymbol{\eta}_1 \right) \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes \boldsymbol{\eta}_1^{\mathsf{T}} \right) \right\} d \mbox{vec} \, {\mathbf{U}} {}\\ \end{array} \end{aligned}
(12.37)
\displaystyle \begin{aligned} \begin{array}{rcl} d SD &\displaystyle =&\displaystyle \frac{1}{2} \mathcal{D}\, \left( SD \right)^{-1} d V {} \end{array} \end{aligned}
(12.38)
\displaystyle \begin{aligned} \begin{array}{rcl} d CV &\displaystyle =&\displaystyle \mathcal{D}\, \left( \boldsymbol{\eta}_1 \right)^{-1} d SD - \left[ SD^{\mathsf{T}} \mathcal{D}\, \left( \boldsymbol{\eta}_1 \right)^{-1} \otimes \mathcal{D}\, \left( \boldsymbol{\eta}_1 \right)^{-1} \right] \\ &\displaystyle &\displaystyle \times \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) \left( {\mathbf{1}}_s \otimes {\mathbf{I}}_s \right) d \boldsymbol{\eta}_1. {} \end{array} \end{aligned}
(12.39)
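As a numerical check on (12.32), (12.33), (12.37), and (12.38), the sketch below compares them with central finite differences. It assumes a hypothetical 3-state transient block and treats every entry of U as a free parameter; `vec` and `numerical_jacobian` are illustrative helper names.

```python
import numpy as np

def vec(X):
    """Stack the columns of X (column-major), matching the vec operator."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(f, X, h=1e-6):
    """Central-difference approximation to d vec f(X) / d vec^T X."""
    cols = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = h
        E = E.reshape(X.shape, order="F")
        cols.append(vec(f(X + E) - f(X - E)) / (2 * h))
    return np.column_stack(cols)

def eta1_of(U):
    return -(np.ones(U.shape[0]) @ np.linalg.inv(U))     # (12.24)

def var_of(U):
    N1 = -np.linalg.inv(U)
    e1 = np.ones(U.shape[0]) @ N1
    e2 = 2 * e1 @ N1                                      # (12.25)
    return e2 - e1 ** 2                                   # (12.29)

# Hypothetical transient block (s = 3), column-to-row orientation.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])
s = U.shape[0]
N1 = -np.linalg.inv(U)
eta1 = np.ones(s) @ N1
row = eta1[None, :]                                       # eta_1^T as a 1 x s array

d_eta1 = np.kron(N1.T, row)                                          # (12.32)
d_eta2 = 2 * (np.kron(N1.T @ N1.T, row) + np.kron(N1.T, row @ N1))   # (12.33)
dV = d_eta2 - 2 * np.diag(eta1) @ d_eta1                             # (12.37)
SD = np.sqrt(var_of(U))
dSD = 0.5 * np.diag(1.0 / SD) @ dV                                   # (12.38)

err_eta1 = np.max(np.abs(d_eta1 - numerical_jacobian(eta1_of, U)))
err_V = np.max(np.abs(dV - numerical_jacobian(var_of, U)))
err_SD = np.max(np.abs(dSD - numerical_jacobian(lambda X: np.sqrt(var_of(X)), U)))
```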

### Derivation

Differentiating (12.24) for the expected time to absorption gives
\displaystyle \begin{aligned} d \boldsymbol{\eta}_1^{\mathsf{T}} = {\mathbf{1}}_s^{\mathsf{T}} d {\mathbf{N}}_1.\end{aligned}
(12.40)
Applying the vec operator, substituting (12.7) for dvec N1, and simplifying gives (12.32). The derivatives of the higher moments are obtained in the same way; e.g., for η4,
\displaystyle \begin{aligned} d \boldsymbol{\eta}_4^{\mathsf{T}} = (24) {\mathbf{1}}_s^{\mathsf{T}} \left[ \rule{0in}{2.2ex} \left( d {\mathbf{N}}_1 \right) {\mathbf{N}}_1^3 + {\mathbf{N}}_1 \left( d {\mathbf{N}}_1 \right) {\mathbf{N}}_1^2 + {\mathbf{N}}_1^2 \left( d {\mathbf{N}}_1 \right) {\mathbf{N}}_1 + {\mathbf{N}}_1^3 \left( d {\mathbf{N}}_1 \right) \right].\end{aligned}
(12.41)
Applying the vec operator yields
\displaystyle \begin{aligned} \begin{array}{rcl} d \boldsymbol{\eta}_4 &\displaystyle =&\displaystyle 24 \left\{ \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^3 \otimes {\mathbf{1}}_s^{\mathsf{T}} \right] + \left[ \left( {\mathbf{N}}_1^{\mathsf{T}} \right)^2 \otimes {\mathbf{1}}_s^{\mathsf{T}} {\mathbf{N}}_1 \right] + \left[ {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{1}}_s^{\mathsf{T}} {\mathbf{N}}_1^2 \right]\right.\\ &\displaystyle &\displaystyle \left.+ \left[ {\mathbf{I}}_s \otimes {\mathbf{1}}_s^{\mathsf{T}} {\mathbf{N}}_1^3 \right] \right\} d \mbox{vec} \, {\mathbf{N}}_1.\vspace{-3pt} \end{array} \end{aligned}
(12.42)
Substituting (12.7) for dvec N1 and simplifying using Eqs. (12.24), (12.25), and (12.26) gives (12.35). The derivatives of the second and third moments, (12.33) and (12.34), are obtained in similar fashion.
The recursive formula (12.36) is obtained by differentiating (12.28)
\displaystyle \begin{aligned} d \boldsymbol{\eta}_k^{\mathsf{T}} = k \left( d \boldsymbol{\eta}_{k-1}^{\mathsf{T}} \right) {\mathbf{N}}_1 + k \boldsymbol{\eta}_{k-1}^{\mathsf{T}} d {\mathbf{N}}_1.\end{aligned}
(12.43)
Apply the vec operator,
\displaystyle \begin{aligned} d \boldsymbol{\eta}_k = k {\mathbf{N}}_1^{\mathsf{T}} d \boldsymbol{\eta}_{k-1} + k \left( {\mathbf{I}}_s \otimes \boldsymbol{\eta}_{k-1}^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{N}}_1, \end{aligned}
(12.44)
substitute (12.7) for dvec N1, and simplify, to obtain (12.36).
Differentiating (12.29) for the variance yields
\displaystyle \begin{aligned} d V = d \boldsymbol{\eta}_2 - 2 \boldsymbol{\eta}_1 \circ d \boldsymbol{\eta}_1. \end{aligned}
(12.45)
Applying the vec operator gives
\displaystyle \begin{aligned} d V = d \boldsymbol{\eta}_2 - 2 \mathcal{D}\, \left( \boldsymbol{\eta}_1 \right) d \boldsymbol{\eta}_1. \end{aligned}
(12.46)
Substituting (12.32) for dη1 and (12.33) for dη2 gives the result (12.37). The derivatives of the standard deviation, in (12.38), and the coefficient of variation, in (12.39), are obtained by differentiating (12.30) and (12.31) and applying () and ().

## 12.4 Multiple Absorbing States and Probabilities of Absorption

Consider a chain that includes a > 1 absorbing states. The entry mij of the a × s submatrix M in (12.1) is the rate of transition from transient state j to absorbing state i. The probabilities of absorption are defined as
\displaystyle \begin{aligned} b_{ij} = P \left[ \mbox{absorption in }i \left| \mbox{starting in }j \right. \right]. \end{aligned}
(12.47)
The a × s matrix $${\mathbf {B}} = \left (\begin {array}{c} b_{ij} \end {array}\right )$$ is
\displaystyle \begin{aligned} {\mathbf{B}} = {\mathbf{M}} {\mathbf{N}}_1 {} \end{aligned}
(12.48)
(Iosifescu 1980, Section 8.5.6). Column j of B is the probability distribution of the eventual absorption state for an individual starting in transient state j. Usually a few starting states are of particular interest (e.g., states corresponding to “birth”). Let B(:, j) = Bej denote column j of B, where ej is the jth unit vector of length s. Then
\displaystyle \begin{aligned} d {\mathbf{B}}(:,j) = \left( {\mathbf{e}}_j^{\mathsf{T}} \otimes {\mathbf{I}}_a \right) d \mbox{vec} \, {\mathbf{B}}. {} \end{aligned}
(12.49)
Similarly, row i of B is $${\mathbf {B}}(i,:)={\mathbf {e}}_i^{\mathsf {T}} {\mathbf {B}}$$ and
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}}(i,:) = \left( {\mathbf{I}}_s \otimes {\mathbf{e}}_i^{\mathsf{T}} \right) d \mbox{vec} \, {\mathbf{B}} {} \end{aligned}
(12.50)
where ei is the ith unit vector of length a. The derivative of B in (12.49) and (12.50) is
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{N}}_1^{\mathsf{T}} \otimes {\mathbf{B}} \right) d \mbox{vec} \, {\mathbf{U}}. {} \end{aligned}
(12.51)
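A small numerical sketch of (12.48), (12.50), and (12.51), assuming hypothetical rates for s = 3 transient and a = 2 absorbing states. Here U and M are perturbed independently, as the two terms of (12.51) require; in a real model both would be functions of the same parameter vector. The helper names are illustrative.

```python
import numpy as np

def vec(X):
    """Stack the columns of X (column-major), matching the vec operator."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(f, X, h=1e-6):
    """Central-difference approximation to d vec f(X) / d vec^T X."""
    cols = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = h
        E = E.reshape(X.shape, order="F")
        cols.append(vec(f(X + E) - f(X - E)) / (2 * h))
    return np.column_stack(cols)

# Hypothetical rates; each column of [U; M] sums to zero.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])
M = np.array([[0.1, 0.2, 0.3],
              [0.1, 0.1, 0.1]])
s, a = U.shape[0], M.shape[0]

N1 = -np.linalg.inv(U)
B = M @ N1                                   # absorption probabilities, (12.48)

J_M = np.kron(N1.T, np.eye(a))               # coefficient of d vec M in (12.51)
J_U = np.kron(N1.T, B)                       # coefficient of d vec U in (12.51)

err_M = np.max(np.abs(J_M - numerical_jacobian(lambda X: X @ N1, M)))
err_U = np.max(np.abs(J_U - numerical_jacobian(lambda X: -M @ np.linalg.inv(X), U)))

# Sensitivity of row i of B (absorption in state i, for every starting state),
# following (12.50).
i = 0
e_i = np.eye(a)[[i], :]                      # e_i^T as a 1 x a array
row_selector = np.kron(np.eye(s), e_i)       # s x (s * a)
J_row_U = row_selector @ J_U                 # s x s^2
```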

### Derivations

Differentiating (12.48) yields
\displaystyle \begin{aligned} d {\mathbf{B}} = \left( d {\mathbf{M}} \right) {\mathbf{N}}_{1} + {\mathbf{M}} \left( d {\mathbf{N}}_{1} \right). \end{aligned}
(12.52)
Applying the vec operator and simplifying gives
\displaystyle \begin{aligned} d \mbox{vec} \, {\mathbf{B}} = \left( {\mathbf{N}}_{1}^{\mathsf{T}} \otimes {\mathbf{I}} \right) d \mbox{vec} \, {\mathbf{M}} + \left( {\mathbf{I}} \otimes {\mathbf{M}} \right) d \mbox{vec} \, {\mathbf{N}}_{1} \end{aligned}
(12.53)
Substituting (12.7) for dvec N1 and simplifying gives (12.51).

## 12.5 The Embedded Chain: Discrete Transitions Within a Continuous Process

If a continuous-time chain is observed only at the moments when it changes state, the result is a discrete-time process called the embedded Markov chain, or the jump chain, associated with Q (Iosifescu 1980, Section 8.3.2). The transition matrix of this embedded chain can be written
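\displaystyle \begin{aligned} \widehat{{\mathbf{P}}} = \left(\begin{array}{c|c} \widehat{{\mathbf{U}}} & {\mathbf{0}} \\ \hline \widehat{{\mathbf{M}}} & {\mathbf{I}} \end{array}\right) {} \end{aligned}
(12.54)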
where
\displaystyle \begin{aligned} \begin{array}{rcl} \widehat{{\mathbf{U}}} &\displaystyle =&\displaystyle {\mathbf{I}}_s - {\mathbf{U}} {\mathbf{U}}_{\mathrm{dg}}^{-1} {} \end{array} \end{aligned}
(12.55)
\displaystyle \begin{aligned} \begin{array}{rcl} \widehat{{\mathbf{M}}} &\displaystyle =&\displaystyle -{\mathbf{M}} {\mathbf{U}}_{\mathrm{dg}}^{-1}. {} \end{array} \end{aligned}
(12.56)
The embedded chain provides information on the number of visits to each transient state, rather than the time spent in each transient state. The expected numbers of such visits are given by the fundamental matrix
\displaystyle \begin{aligned} \widehat{{\mathbf{N}}}_1 = \left( {\mathbf{I}} - \widehat{{\mathbf{U}}} \right)^{-1}. {} \end{aligned}
(12.57)
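The construction in (12.55), (12.56), and (12.57) can be sketched as follows, assuming hypothetical rates for s = 3 transient and a = 2 absorbing states (not a model from this chapter):

```python
import numpy as np

# Hypothetical rates; each column of [U; M] sums to zero.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])
M = np.array([[0.1, 0.2, 0.3],
              [0.1, 0.1, 0.1]])
s = U.shape[0]

Udg = np.diag(np.diag(U))                    # diagonal part of U
Udg_inv = np.diag(1.0 / np.diag(U))          # its inverse

U_hat = np.eye(s) - U @ Udg_inv              # jump probabilities among transient states, (12.55)
M_hat = -M @ Udg_inv                         # jump probabilities into absorbing states, (12.56)
N1_hat = np.linalg.inv(np.eye(s) - U_hat)    # expected numbers of visits, (12.57)

# Each column of the embedded chain is a probability distribution over the next state.
col_sums = np.vstack([U_hat, M_hat]).sum(axis=0)
```

Dividing each column by the total rate of leaving the corresponding state converts rates into jump probabilities, which is why the diagonal of `U_hat` is zero and its columns, together with those of `M_hat`, sum to one.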
The sensitivity analysis of the embedded chain follows directly from the discrete-time results in previous chapters (Chaps. and ).
In particular, the differential of $$\widehat {{\mathbf {N}}}_1$$ is (Caswell 2006)
\displaystyle \begin{aligned} d \mbox{vec} \, \widehat{{\mathbf{N}}}_1 = \left( \widehat{{\mathbf{N}}}_1^{\mathsf{T}} \otimes \widehat{{\mathbf{N}}}_1 \right) d \mbox{vec} \, \widehat{{\mathbf{U}}}. \end{aligned}
(12.58)
However, this derivative is unlikely to be the sensitivity we are looking for. The continuous-time chain is likely to be parameterized in terms of the rate matrices U and M, rather than the probability matrices $$\widehat {{\mathbf {U}}}$$ and $$\widehat {{\mathbf {M}}}$$. To express the perturbation analysis of $$\widehat {{\mathbf {P}}}$$ in terms of the parameters of Q requires the derivatives of the embedded chain with respect to the continuous chain; i.e.,
\displaystyle \begin{aligned} {d \mbox{vec} \, \widehat{{\mathbf{U}}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{U}}} \quad \mbox{and} \quad {d \mbox{vec} \, \widehat{{\mathbf{M}}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{M}}}. \end{aligned}
These derivatives are
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, \widehat{{\mathbf{U}}} &\displaystyle =&\displaystyle \left[ - \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{I}}_s \right) + \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{U}} {\mathbf{U}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) \right] d \mbox{vec} \, {\mathbf{U}} {} \end{array} \end{aligned}
(12.59)
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, \widehat{{\mathbf{M}}} &\displaystyle =&\displaystyle - \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{I}}_a \right) d \mbox{vec} \, {\mathbf{M}} \\ &\displaystyle &\displaystyle + \left({\mathbf{I}}_s \otimes {\mathbf{M}} \right) \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{U}}_{\mathrm{dg}}^{-1} \right) \times \mathcal{D}\, \left( \mbox{vec} \, {\mathbf{I}}_s \right) d \mbox{vec} \, {\mathbf{U}}. {} \end{array} \end{aligned}
(12.60)
Using (12.58) and (12.59), one can write
\displaystyle \begin{aligned} {d \mbox{vec} \, \widehat{{\mathbf{N}}}_1 \over d \boldsymbol{\theta}^{\mathsf{T}}} = \left( \widehat{{\mathbf{N}}}_1^{\mathsf{T}} \otimes \widehat{{\mathbf{N}}}_1 \right) {d \mbox{vec} \, \widehat{{\mathbf{U}}} \over d \mbox{vec} \,^{\mathsf{T}} {\mathbf{U}}} \; {d \mbox{vec} \, {\mathbf{U}} \over d \boldsymbol{\theta}^{\mathsf{T}}}. \end{aligned}
(12.61)
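The derivatives (12.59) and (12.60), and their composition with (12.58), can be checked against finite differences. The sketch below assumes hypothetical rates for s = 3 transient and a = 2 absorbing states and stops at derivatives with respect to vec U and vec M; the final factor in (12.61), the derivative of vec U with respect to θ, depends on the parameterization and is not included. Helper names are illustrative.

```python
import numpy as np

def vec(X):
    """Stack the columns of X (column-major), matching the vec operator."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(f, X, h=1e-6):
    """Central-difference approximation to d vec f(X) / d vec^T X."""
    cols = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = h
        E = E.reshape(X.shape, order="F")
        cols.append(vec(f(X + E) - f(X - E)) / (2 * h))
    return np.column_stack(cols)

def U_hat_of(U):
    return np.eye(U.shape[0]) - U @ np.diag(1.0 / np.diag(U))    # (12.55)

def M_hat_of(M, U):
    return -M @ np.diag(1.0 / np.diag(U))                         # (12.56)

def N1_hat_of(U):
    return np.linalg.inv(np.eye(U.shape[0]) - U_hat_of(U))        # (12.57)

# Hypothetical rates; each column of [U; M] sums to zero.
U = np.array([[-0.5,  0.1,  0.0],
              [ 0.3, -0.6,  0.0],
              [ 0.0,  0.2, -0.4]])
M = np.array([[0.1, 0.2, 0.3],
              [0.1, 0.1, 0.1]])
s, a = U.shape[0], M.shape[0]
I_s, I_a = np.eye(s), np.eye(a)
Udg_inv = np.diag(1.0 / np.diag(U))
D_vecI = np.diag(vec(I_s))
N1_hat = N1_hat_of(U)

J_Uhat = -np.kron(Udg_inv, I_s) + np.kron(Udg_inv, U @ Udg_inv) @ D_vecI   # (12.59)
J_Mhat_M = -np.kron(Udg_inv, I_a)                                    # d vec M term of (12.60)
J_Mhat_U = np.kron(I_s, M) @ np.kron(Udg_inv, Udg_inv) @ D_vecI      # d vec U term of (12.60)
J_N1hat = np.kron(N1_hat.T, N1_hat) @ J_Uhat                         # (12.58) composed with (12.59)

errs = [
    np.max(np.abs(J_Uhat - numerical_jacobian(U_hat_of, U))),
    np.max(np.abs(J_Mhat_M - numerical_jacobian(lambda X: M_hat_of(X, U), M))),
    np.max(np.abs(J_Mhat_U - numerical_jacobian(lambda X: M_hat_of(M, X), U))),
    np.max(np.abs(J_N1hat - numerical_jacobian(N1_hat_of, U))),
]
```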

### Derivation

Differentiate $$\widehat {{\mathbf {U}}}$$ in (12.55),
\displaystyle \begin{aligned} d \widehat{{\mathbf{U}}} = - \left( d {\mathbf{U}} \right) {\mathbf{U}}_{\mathrm{dg}}^{-1} - {\mathbf{U}} \left( d {\mathbf{U}}_{\mathrm{dg}}^{-1} \right), \end{aligned}
(12.62)
apply the vec operator, and use () and () for $$d \mbox{vec} \, {\mathbf {U}}_{\mathrm {dg}}^{-1}$$. The result is
\displaystyle \begin{aligned} \begin{array}{rcl} d \mbox{vec} \, \widehat{{\mathbf{U}}} &\displaystyle =&\displaystyle - \left[ \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \right)^{\mathsf{T}} \otimes {\mathbf{I}}_s \right] d \mbox{vec} \, {\mathbf{U}} - \left( {\mathbf{I}}_s \otimes {\mathbf{U}} \right) d \mbox{vec} \, {\mathbf{U}}_{\mathrm{dg}}^{-1} \\ {} &\displaystyle =&\displaystyle - \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{I}}_s \right) d \mbox{vec} \, {\mathbf{U}} + \left( {\mathbf{I}}_s \otimes {\mathbf{U}} \right) \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{U}}_{\mathrm{dg}}^{-1} \right) \mathcal{D}\,(\mbox{vec} \, {\mathbf{I}}_s) d \mbox{vec} \, {\mathbf{U}} \end{array} \end{aligned}
which simplifies to give (12.59). Similarly, differentiating $$\widehat {{\mathbf {M}}}$$ in (12.56) and applying the vec operator gives
\displaystyle \begin{aligned} d \mbox{vec} \, \widehat{{\mathbf{M}}} = - \left( {\mathbf{U}}_{\mathrm{dg}}^{-1} \otimes {\mathbf{I}}_a \right) d \mbox{vec} \, {\mathbf{M}} - \left( {\mathbf{I}}_s \otimes {\mathbf{M}} \right) d \mbox{vec} \, {\mathbf{U}}_{\mathrm{dg}}^{-1}. \end{aligned}
(12.63)
Using () and () for $$d \mbox{vec} \, {\mathbf {U}}_{\mathrm {dg}}^{-1}$$ and simplifying gives (12.60).

## 12.6 An Example: A Model of Disease Progression

An important area of application of continuous-time Markov chains is the modelling of transitions among disease states. In this context, the time to absorption is longevity, and the time spent in various transient states has implications for the quality of life during the disease. Fix and Neyman (1951) introduced the idea and proposed a 4-state model for cancer, with two transient states (under treatment or not) and two absorbing states (death from cancer or from other causes). Kay (1986) proposed a model with k disease states and an absorbing state representing death. There is now a large literature on such models and their estimation. Recently, studies have proliferated that use Markov chain models of disease progression to explore the cost-effectiveness of screening and treatment procedures (e.g., Kuo et al. 1999; Chen et al. 1999; Wu et al. 2006; Sonnenberg and Beck 1993).

Sensitivity analysis reveals how these demographic properties respond to changes in parameters. As an example, I consider a model for the progression of colorectal cancer (CRC) that was developed to study the cost-effectiveness of a new CRC screening technique based on DNA testing of stool samples (Wu et al. 2006). The model includes 7 transient states (normal, small and large adenoma, early and late preclinical CRC, and early and late clinical CRC) and 2 absorbing states (death from CRC and death from other causes); see Fig. 12.1. Parameters were estimated from the literature and from clinical studies in Taiwan.
This model, which describes the so-called natural history of the disease, was embedded in a larger decision model to compare the cost-effectiveness of screening strategies. The intensity matrix (12.1) corresponding to Fig. 12.1 is
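\displaystyle \begin{aligned} {\mathbf{Q}} = \left(\begin{array}{ccccccc|cc} -(\lambda_1+\mu) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \lambda_1 & -(\lambda_2+\mu) & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_2 & -(\lambda_3+\mu) & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \lambda_3 & -(\lambda_4+\lambda_5+\mu) & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \lambda_4 & -(\lambda_6+\mu) & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \lambda_5 & 0 & -(\lambda_7+\mu) & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \lambda_6 & 0 & -(\lambda_8+\mu) & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 & \lambda_7 & \lambda_8 & 0 & 0 \\ \mu & \mu & \mu & \mu & \mu & \mu & \mu & 0 & 0 \end{array}\right). {} \end{aligned}
(12.64)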
The λi are transition rates; μ is the mortality rate from other causes of death. The incidence rate of small adenoma (λ1) and the mortality rate due to other causes of death (μ) are age-dependent. Here I have analyzed values for age 70, based on figures in Wu et al. (2006). This leads to the parameter vector (all rates are per year):
\displaystyle \begin{aligned} \boldsymbol{\theta} = \left(\begin{array}{c} \lambda_1 \\ \vdots\\ \lambda_8 \\ \mu \end{array}\right) = \left(\begin{array}{r} 1.52\times 10^{-2}\\ 3.46\times 10^{-2}\\ 2.15\times 10^{-2}\\ 3.70\times 10^{-1}\\ 2.38\times 10^{-1}\\ 4.85\times 10^{-1}\\ 3.02\times 10^{-2}\\ 2.10\times 10^{-1}\\ 2.20\times 10^{-2} \end{array}\right). \end{aligned}
(12.65)

### 12.6.1 Sensitivity Results

The fundamental matrix (12.2) is
\displaystyle \begin{aligned} {\mathbf{N}}_1 = \left(\begin{array}{rrrrrrr} 26.9& 0& 0& 0& 0& 0& 0 \\ 7.2&17.7& 0& 0& 0& 0& 0 \\ 5.7&14.0&23.0& 0& 0& 0& 0 \\ 0.2& 0.5& 0.8& 1.6& 0& 0& 0 \\ 0.1& 0.4& 0.6& 1.2& 2.0& 0& 0 \\ 0.9& 2.2& 3.6& 7.2& 0&19.2& 0 \\ 0.3& 0.7& 1.2 & 2.4& 4.1& 0.00& 4.3 \end{array}\right). \end{aligned}
(12.66)
Thus, given these rates, a 70-year-old individual in the normal state would expect to spend 27 years in stage 1, and only 0.9 and 0.3 years in stages 6 and 7 (early and late clinical CRC). Individuals in more advanced stages can expect to spend progressively longer periods in stages 6 and 7 (compare across rows 6 and 7 of N1).
The standard deviations (12.13) of the times spent in the transient states are
\displaystyle \begin{aligned} SD \left(\nu_{ij} \right) = \left(\begin{array}{rrrrrrr} 26.9& 0& 0& 0& 0& 0& 0 \\ 14.2&17.7& 0& 0& 0& 0& 0 \\ 15.2&21.2&23.0& 0& 0& 0& 0 \\ 0.8& 1.1& 1.4& 1.6& 0& 0& 0 \\ 0.7& 1.1& 1.4& 1.8& 2.0& 0& 0 \\ 5.8& 8.9&11.2&15.0& 0&19.2& 0 \\ 1.6& 2.4& 3.0& 3.9& 4.3& 0& 4.3 \end{array}\right). \end{aligned}
(12.67)
Clearly, considerable variation can be expected in the times spent in the various states; the standard deviation equals or exceeds the mean in every case.
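The standard deviations can be sketched numerically in the same way. The snippet uses the standard second-moment identity for occupancy times in a continuous-time absorbing chain, E(ν_ij²) = 2 n_ii n_ij, which is assumed here to coincide with (12.13). As before, the transition structure in `MOVES` and `CRC_DEATH` is inferred from the numbers in this section rather than copied from (12.64), and `build_U` is an illustrative helper.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; column j holds rates out of state j."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

N1 = np.linalg.inv(-build_U(theta))

# Second moments: row i of N1 is scaled by its own diagonal entry n_ii.
second = 2.0 * np.diag(N1)[:, None] * N1
var = second - N1**2
SD = np.sqrt(np.maximum(var, 0.0))  # guard against tiny negative rounding error

print(np.round(SD, 1))
print(bool(np.all(SD >= N1 - 1e-9)))  # SD is at least the mean, entry by entry
```

On the diagonal the identity gives a variance of n_ii², so the standard deviation equals the mean there, as in (12.67).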
For the sensitivity analysis of the time spent in transient states, focus on the fate of a normal (state 1) individual. The expected times spent in each state by such an individual are given by N1(:, 1). From (12.7) and the corresponding elasticity formula, the sensitivity and elasticity of N1(:, 1) are
\displaystyle \begin{aligned} \begin{array}{rcl} {d {\mathbf{N}}_1(:,1) \over d \boldsymbol{\theta}^{\mathsf{T}}} &=& \left(\begin{array}{rrrrrrrrr} -722.6& 0& 0 & 0 & 0 & 0 & 0 & 0& -722.6 \\ 280.9& -127.5& 0 & 0 & 0 & 0 & 0 & 0& -321.6 \\ 223.4& 64.5& -132.0 & 0 & 0 & 0 & 0 & 0& -387.8 \\ 7.6& 2.2& 4.6 & -0.3 & -0.3 & 0 & 0 & 0& -13.5 \\ 5.6& 1.6& 3.4 & 0.2 & -0.2 & -0.3 & 0 & 0& -10.2 \\ 34.8& 10.0& 21.0 & -1.4 & 2.3 & 0 & -17.1 & 0& -79.0 \\ 11.6& 3.4& 7.0 & 0.3 & -0.5 & 0 & 0 & -1.3& -22.5 \end{array}\right) \\ {} {\epsilon {\mathbf{N}}_1(:,1) \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}} &=& \left(\begin{array}{rrrrrrrrr} -0.4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.6 \\ 0.6 & -0.6 & 0 & 0 & 0 & 0 & 0 & 0 & -1.0 \\ 0.6 & 0.4 & -0.5 & 0 & 0 & 0 & 0 & 0 & -1.5 \\ 0.6 & 0.4 & 0.5 & -0.6 & -0.4 & 0 & 0 & 0 & -1.5\\ 0.6 & 0.4 & 0.5 & 0.4 & -0.4 & -1.0 & 0 & 0 & -1.5 \\ 0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & -0.6 & 0 & -1.9 \\ 0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0.0 & 0 & -0.9 & -1.7 \end{array}\right). {} \end{array} \end{aligned}
(12.68)

These elasticities imply that a 1% increase in λ1 will (to first order) cause about a 0.4% decrease in the mean time spent in the normal state and a 0.6% increase in the mean time spent in each other state. A 1% increase in λ4 (the rate of transition between early and late preclinical CRC) creates a 0.6% decrease in the time spent in stages 4 and 6 (the early CRC stages) and a 0.4% increase in the time spent in stages 5 and 7 (the late CRC stages). An increase in the mortality rate μ due to other causes of death reduces the time spent in any of the transient states.
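A minimal sketch of how such sensitivities and elasticities might be computed follows. It relies on the differential of the matrix inverse, dN1 = N1 (dU) N1 for N1 = (−U)⁻¹, and on the fact that U is linear in θ, so that ∂U/∂θ_k is obtained by evaluating the builder at a unit vector. The transition structure (`MOVES`, `CRC_DEATH`) is an assumption inferred from the results of this section, and `build_U` is an illustrative helper rather than the chapter's own code.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; linear in th."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

N1 = np.linalg.inv(-build_U(theta))
p = theta.size

# Column k holds d N1(:, 1) / d theta_k.
sens = np.zeros((7, p))
for k, e in enumerate(np.eye(p)):
    dU = build_U(e)                  # partial derivative of U w.r.t. theta_k
    sens[:, k] = (N1 @ dU @ N1)[:, 0]

# Elasticity: proportional change in N1(i, 1) per proportional change in theta_k.
elas = sens * theta[None, :] / N1[:, [0]]

print(np.round(sens, 1))
print(np.round(elas, 1))
```

Because N1 is homogeneous of degree −1 in the rates (multiplying every rate by a constant only rescales time), each row of the elasticity matrix should sum to −1, which gives a quick consistency check on (12.68).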

The elasticity of the variance in the time spent in the transient states by an individual in state 1 is
\displaystyle \begin{aligned} {\epsilon V(\nu_{i1}) \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}}= \left(\begin{array}{rrrrrrrrr} -0.8& 0& 0& 0& 0& 0& 0& 0& -1.2 \\ 0.4& -1.2& 0& 0& 0& 0& 0& 0& -1.2 \\ 0.5& 0.3& -1.0& 0& 0& 0& 0& 0& -1.8 \\ 0.5& 0.4& 0.5& -1.2& -0.8& 0& 0& 0& -1.5 \\ 0.6& 0.4& 0.5& 0.4& -0.4& -1.9& 0& 0& -1.6 \\ 0.6& 0.4& 0.5& -0.6& 0.6& 0& -1.2& 0& -2.3 \\ 0.6& 0.4& 0.5& 0.4& -0.4& 0.0& 0& -1.8& -1.7 \end{array}\right). {} \end{aligned}
(12.69)
The sign pattern is the same as that of the elasticities of the mean times in (12.68), so we conclude that any parameter change that increases the mean time spent in a transient state will also increase the variance in that time. The elasticities of the variance are comparable to those of the mean (cf. (12.68) and (12.69)), showing that the mean and the variance respond with roughly equal proportional changes.
Longevity is measured by the time to absorption, and is a primary concern in analyses of screening or treatment protocols. The vectors of the mean, standard deviation, and coefficient of variation of longevity are
\displaystyle \begin{aligned} \boldsymbol{\eta}_1 = \left(\begin{array}{r} 41.4 \\ 35.5 \\ 29.1 \\ 12.4 \\ 6.1 \\ 19.2 \\ 4.3 \end{array}\right) \quad SD(\boldsymbol{\eta}) = \left(\begin{array}{r} 37.4 \\ 30.3 \\ 25.8 \\ 14.1 \\ 4.7 \\ 19.2 \\ 4.3 \end{array}\right) \quad CV(\boldsymbol{\eta}) = \left(\begin{array}{r} 0.9 \\ 0.9 \\ 0.9 \\ 1.1 \\ 0.8 \\ 1.0 \\ 1.0 \end{array}\right). \end{aligned}
(12.70)
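The longevity statistics in (12.70) can be approximated with the following sketch. It uses the standard moment formulas for the time to absorption, with mean 1ᵀN1 and second moment 2·1ᵀN1², which are assumed here to match the chapter's expressions. The transition structure (`MOVES`, `CRC_DEATH`) is inferred from the numerical results of this section, and `build_U` is an illustrative helper.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; column j holds rates out of state j."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

N1 = np.linalg.inv(-build_U(theta))
ones = np.ones(7)

eta1 = ones @ N1                 # mean time to absorption, by starting state
eta2 = 2.0 * ones @ N1 @ N1      # second moment of time to absorption
sd = np.sqrt(eta2 - eta1**2)
cv = sd / eta1

print(np.round(eta1, 1))
print(np.round(sd, 1))
print(np.round(cv, 1))
```

States 6 and 7 are left only by death under the assumed structure, so their times to absorption are exponential and their coefficients of variation equal 1.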
The sensitivity and elasticity of expected longevity (life expectancy) with respect to θ are
\displaystyle \begin{aligned} \begin{array}{rcl} {d \boldsymbol{\eta}_1 \over d \boldsymbol{\theta}^{\mathsf{T}}} &=& \left(\begin{array}{rrrrrrrrr} -158.7& -45.8& -96.0& -1.2& 1.3& -0.2& -17.1& -1.3&-1557.2 \\ 0&-112.2&-234.9& -3.0& 3.2& -0.6& -41.9& -3.2&-1089.1 \\ 0& 0&-384.2& -5.0& 5.3& -1.0& -68.6& -5.2&-756.5 \\ 0& 0& 0&-10.0& 10.7& -2.1&-138.8&-10.4&-176.0 \\ 0& 0& 0& 0& 0& -3.5& 0&-17.8& -29.8 \\ 0& 0& 0& 0& 0& 0&-367.0& 0&-367.0 \\ 0& 0& 0& 0& 0& 0& 0&-18.6& -18.6 \end{array}\right) {} \\ {} {\epsilon \boldsymbol{\eta}_1 \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}} &=& \left(\begin{array}{rrrrrrrrr} -0.06& -0.04& -0.05& -0.01 & 0.01& -0.00& -0.01& -0.01& -0.83 \\ 0& -0.11& -0.14& -0.03 & 0.02& -0.01& -0.04& -0.02& -0.68 \\ 0& 0& -0.28& -0.06 & 0.04& -0.02& -0.07& -0.04& -0.57 \\ 0& 0& 0& -0.30 & 0.21& -0.08& -0.34& -0.18& -0.31 \\ 0& 0& 0& 0 & 0& -0.28& 0& -0.61& -0.11 \\ 0& 0& 0& 0 & 0& 0& -0.58& 0& -0.42 \\ 0& 0& 0& 0 & 0& 0& 0& -0.91& -0.09 \end{array}\right). {} \end{array} \end{aligned}
(12.71)

Almost all the nonzero elements are negative, because increasing any of the rates leading towards clinical CRC reduces life expectancy, as does increasing the mortality rate due to other causes of death. The exceptions are the sensitivities and elasticities of η1 to λ5 (in column 5 of these matrices), which are positive because λ5 delays the onset of clinical CRC (cf. Fig. 12.1).

The elasticities of E(η1), the life expectancy of a normal individual, to a change in θ, appear in the first row of (12.71). The largest of these (except for the last column, representing mortality from other causes of death) are to changes in λ1, λ2, and λ3, the rates of transition from normal to small adenoma, small to large adenoma, and large adenoma to preclinical CRC. The rates λ2 and λ3 have large effects on E(η2), and λ3 has a large effect on E(η3). These transitions are targets of screening and early treatment; this analysis quantifies the effect that such interventions could have.
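The life-expectancy sensitivities can be sketched by summing the columns of N1 (∂U/∂θ_k) N1, since the mean time to absorption is 1ᵀN1. This is an illustration under assumptions: the transition structure (`MOVES`, `CRC_DEATH`) is inferred from the numerical results of this section, and `build_U` is a hypothetical helper.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; linear in th."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

N1 = np.linalg.inv(-build_U(theta))
eta1 = N1.sum(axis=0)            # life expectancy by starting state
p = theta.size

# Row j, column k: derivative of life expectancy from state j w.r.t. theta_k.
d_eta = np.zeros((7, p))
for k, e in enumerate(np.eye(p)):
    d_eta[:, k] = (N1 @ build_U(e) @ N1).sum(axis=0)

elas = d_eta * theta[None, :] / eta1[:, None]

print(np.round(d_eta, 1))
print(np.round(elas, 2))
```

Each row of the elasticity matrix should sum to −1, because a proportional increase in all rates shortens every expected time by the same proportion. In the first row of (12.71), most of that total is carried by the elasticity to μ.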

The sensitivity and elasticity of the standard deviation of longevity are
\displaystyle \begin{aligned}\renewcommand\theequation{\thechapter.\arabic{equation}} {d SD \left(\boldsymbol{\eta} \right) \over d \boldsymbol{\theta}^{\mathsf{T}}} = \left(\begin{array}{rrrrrrrrr} -0.27&-0.07&-0.16&-0.00& 0.00&-0.00&-0.03&-0.00&-1.19 \\ 0&-0.13&-0.31&-0.00& 0.00&-0.00&-0.06&-0.00&-0.76 \\ 0& 0&-0.43&-0.00& 0.00&-0.00&-0.09&-0.00&-0.61 \\ 0& 0& 0&-0.01& 0.01& 0.00 &-0.27& 0.00&-0.27 \\ 0& 0& 0& 0& 0&-0& 0.00&-0.02&-0.02 \\ 0& 0& 0& 0& 0& 0&-0.37& 0&-0.37 \\ 0& 0& 0& 0& 0& 0& 0&-0.02&-0.02 \end{array}\right) \times 10^3 \end{aligned}
(12.72)
and
\displaystyle \begin{aligned}\renewcommand\theequation{\thechapter.\arabic{equation}} {\epsilon SD \left(\boldsymbol{\eta} \right) \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \left(\begin{array}{rrrrrrrrr} -0.11&-0.06&-0.09&-0.02& 0.01&-0.00&-0.02&-0.01&-0.70 \\ 0&-0.15&-0.22&-0.04& 0.03&-0.00&-0.06&-0.01&-0.55 \\ 0& 0&-0.36&-0.05& 0.05&-0.00&-0.11&-0.01&-0.52 \\ 0& 0& 0&-0.23& 0.23& 0.01&-0.58& 0.00&-0.43 \\ 0& 0& 0& 0& 0&-0.16& 0.00&-0.75&-0.09 \\ 0& 0& 0& 0& 0& 0&-0.58& 0&-0.42 \\ 0& 0& 0& 0& 0& 0& 0&-0.91&-0.09 \end{array}\right). \end{aligned}
(12.73)
These have the same sign pattern as the sensitivity of η1, indicating that any increase in life expectancy will be accompanied by an increase in the variance of longevity. The coefficient of variation takes this joint change into account; from (12.39),
\displaystyle \begin{aligned}\renewcommand\theequation{\thechapter.\arabic{equation}} {\epsilon CV \left( \boldsymbol{\eta} \right) \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \left(\begin{array}{rrrrrrrrr} 0.04& 0.02& 0.03& 0.00&-0.00&-0.00& 0.01&-0.00&-0.31 \\ 0&-0.00& 0.02&-0.01& 0.00&-0.01& 0.01&-0.01&-0.38 \\ 0& 0&-0.01&-0.03& 0.01&-0.02& 0.01&-0.04&-0.21 \\ 0.00& 0.00& 0.00&-0.00&-0.07&-0.08& 0.32&-0.14& 0.19 \\ 0& 0& 0& 0.00& 0.00&-0.30& 0.00&-0.27&-0.09 \\ 0& 0& 0& 0& 0& 0.00& 0.00& 0.00& 0.00 \\ 0& 0& 0& 0& 0& 0& 0& 0.00& 0.00 \end{array}\right). \end{aligned}
(12.74)

Most of these elasticities are small, suggesting that the mean and standard deviation respond roughly proportionally, so that the CV does not change much.

The matrix B in (12.48), giving the ultimate probability of death from CRC (row 1) or other causes of death (row 2) is
\displaystyle \begin{aligned} {\mathbf{B}} = \left(\begin{array}{rrrrrrr} 0.1 & 0.2 & 0.4 & 0.7 & 0.9 & 0.6 & 0.9 \\ 0.9 & 0.8 & 0.6 & 0.3 & 0.1 & 0.4 & 0.1 \end{array}\right). \end{aligned}
(12.75)
Focusing on the probability of death due to CRC, the sensitivity and elasticity, from (12.50), are
\displaystyle \begin{aligned} \begin{array}{rcl} {d \mbox{vec} \, {\mathbf{B}}(1,:) \over d \boldsymbol{\theta}^{\mathsf{T}}} &=& \left(\begin{array}{rrrrrrrrr} 3.5& 1.0& 2.1& 0.0& -0.0& 0.0& 0.4& 0.0& -7.1 \\ 0& 2.5& 5.2& 0.1& -0.1& 0.0& 0.9& 0.1& -11.5 \\ 0& 0& 8.4& 0.1& -0.1& 0.0& 1.5& 0.1& -12.5 \\ 0& 0& 0& 0.2& -0.2& 0.1& 3.0& 0.2& -8.5 \\ 0& 0& 0& 0& 0& 0.1& 0.00& 0.4& -5.4 \\ 0& 0& 0& 0& 0& 0& 8.1& 0& -11.1 \\ 0& 0& 0& 0& 0& 0& 0& 0.4& -3.9 \end{array}\right) \\ {\epsilon \mbox{vec} \, {\mathbf{B}}(1,:) \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}} &=& \left(\begin{array}{rrrrrrrrr} 0.6& 0.4& 0.5& 0.1& -0.1& 0.0& 0.1& 0.1& -1.7 \\ 0& 0.4& 0.5& 0.1& -0.1& 0.0& 0.1& 0.1& -1.2 \\ 0& 0& 0.5& 0.1& -0.1& 0.0& 0.1& 0.1& -0.8 \\ 0& 0& 0& 0.1& -0.1& 0.0& 0.1& 0.0& -0.3 \\ 0& 0& 0& 0& 0& 0.0& 0& 0.1& -0.1 \\ 0& 0& 0& 0& 0& 0& 0.4& 0& -0.4 \\ 0& 0& 0& 0& 0& 0& 0& 0.1& -0.1 \end{array}\right). \end{array} \end{aligned}
The probability of death from CRC could be reduced by increasing the mortality rate due to other causes (last column), although this is not an attractive treatment option. A more useful interpretation of the last column is as an indication of the increase in death from CRC that would result from reducing other causes of death.

For normal individuals, the risk of death from CRC is most elastic (apart from the elasticity to μ in the last column) to changes in λ1, λ2, and λ3 (row 1). Each row of the elasticity matrix, whose sum gives the effect of a proportional change in all rates, sums to zero because a change of time scale does not affect the probability of absorption.
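The absorption probabilities and their elasticities can be sketched as follows, taking B = M N1, where M holds the rates of absorption from each transient state, and applying the product rule dB = (dM) N1 + M N1 (dU) N1. This is an illustration under assumptions: the transition structure (`MOVES`, `CRC_DEATH`) is inferred from the numerical results of this section, and `build_U` and `build_M` are hypothetical helpers, both linear in θ.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; linear in th."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

def build_M(th):
    """Absorption rates: row 0 = death from CRC, row 1 = other causes."""
    M = np.zeros((2, 7))
    for frm, k in CRC_DEATH:
        M[0, frm] += th[k]
    M[1, :] = th[8]
    return M

U, M = build_U(theta), build_M(theta)
N1 = np.linalg.inv(-U)
B = M @ N1                        # B[a, j]: probability of absorption in a from j

p = theta.size
dB_crc = np.zeros((7, p))         # sensitivity of B[0, :] to each parameter
for k, e in enumerate(np.eye(p)):
    dB = build_M(e) @ N1 + M @ N1 @ build_U(e) @ N1
    dB_crc[:, k] = dB[0, :]

elas = dB_crc * theta[None, :] / B[0, :][:, None]

print(np.round(B, 1))
print(np.round(elas.sum(axis=1), 10))  # each row sums to zero
```

The zero row sums reflect the time-scale argument in the text: B is unchanged when all rates are multiplied by the same constant.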

### 12.6.2 Sensitivity of the Embedded Chain

The transition matrix $$\widehat {{\mathbf {P}}}$$ in (12.76) for the embedded chain is
(12.76)
The fundamental matrix $$\widehat {{\mathbf {N}}}_1$$ from (12.57) is
\displaystyle \begin{aligned} \widehat{{\mathbf{N}}}_1 = \left(\begin{array}{rrrrrrr} 1.0 & 0 & 0 & 0& 0 & 0& 0 \\ 0.4 & 1.0 & 0 & 0& 0 & 0& 0 \\ 0.2 & 0.6 & 1.0 & 0& 0 & 0& 0 \\ 0.1 & 0.3 & 0.5 & 1.0& 0 & 0& 0 \\ 0.1 & 0.2 & 0.3 & 0.6& 1.0 & 0& 0 \\ 0.1 & 0.1 & 0.2 & 0.4& 0 & 1.0& 0 \\ 0.1 & 0.2 & 0.3 & 0.6& 1.0 & 0& 1.0 \end{array}\right). \end{aligned}
(12.77)
In this continuous-time chain, states cannot be re-entered (cf. Fig. 12.1). Because a state can be visited at most once, the mean number of visits is also the probability of ever entering the state. Thus the probabilities that a normal individual will ever suffer early or late clinical CRC are $$\widehat {{\mathbf {N}}}_1 (6,1)=0.1$$ and $$\widehat {{\mathbf {N}}}_1(7,1) = 0.07$$, respectively. These probabilities increase for individuals in successively later stages; for an individual with large adenoma the probabilities are $$\widehat {{\mathbf {N}}}_1(6,3)=0.2$$ and $$\widehat {{\mathbf {N}}}_1(7,3)=0.3$$, respectively.
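A minimal sketch of the embedded (jump) chain follows. Each column of U is divided by the total exit rate of its state to give jump probabilities, and the fundamental matrix of the resulting discrete-time chain is then inverted. The transition structure (`MOVES`, `CRC_DEATH`) is an assumption inferred from the numerical results of this section, and `build_U` is an illustrative helper.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; column j holds rates out of state j."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

U = build_U(theta)
exit_rates = -np.diag(U)              # total rate of leaving each transient state

# Jump probabilities among transient states: column j divided by its exit rate.
U_hat = U / exit_rates[None, :]
np.fill_diagonal(U_hat, 0.0)

# Expected number of visits to state i, starting from state j.
N_hat = np.linalg.inv(np.eye(7) - U_hat)
print(np.round(N_hat, 2))

# Cross-check: mean time in state i = visits to i times mean sojourn in i.
N1 = np.linalg.inv(-U)
print(np.allclose(N_hat, exit_rates[:, None] * N1))
```

The values should agree with (12.77) up to rounding. The final line checks the link between the two fundamental matrices, which holds for any continuous-time absorbing chain and does not depend on the assumed structure.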
Focusing sensitivity analysis on individuals in the normal state (state 1), the sensitivities and elasticities of the number of visits are
\displaystyle \begin{aligned} {d \widehat{{\mathbf{N}}}_1(:,1) \over d \boldsymbol{\theta}^{\mathsf{T}}} = \left(\begin{array}{rrrrrrrrr} 0 & 0 & 0& 0& 0 & 0 & 0 & 0& 0 \\ 15.9 & 0 & 0& 0& 0 & 0 & 0 & 0&-11.0 \\ 9.7 & 2.8 & 0& 0& 0 & 0 & 0 & 0&-11.1 \\ 4.8 & 1.4 & 2.9& 0& 0 & 0 & 0 & 0& -8.3 \\ 2.8 & 0.8 & 1.7& 0.1& -0.1 & 0 & 0 & 0& -5.0 \\ 1.8 & 0.5 & 1.1& -0.1& 0.1 & 0 & 0 & 0& -3.2 \\ 2.7 & 0.8 & 1.6& 0.1& -0.1 & 0.0 & 0 & 0& -4.9 \end{array}\right) \end{aligned}
(12.78)
and
\displaystyle \begin{aligned} {\epsilon \widehat{{\mathbf{N}}}_1(:,1) \over \epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \left(\begin{array}{rrrrrrrrr} 0 & 0 & 0& 0& 0 & 0 & 0 & 0& 0 \\ 0.6 & 0 & 0& 0& 0 & 0 & 0 & 0& -0.6 \\ 0.6 & 0.4 & 0& 0& 0 & 0 & 0 & 0& -1.0 \\ 0.6 & 0.4 & 0.5& 0& 0 & 0 & 0 & 0& -1.5 \\ 0.6 & 0.4 & 0.5& 0.4& -0.4 & 0 & 0 & 0& -1.5 \\ 0.6 & 0.4 & 0.5& -0.6& 0.6 & 0 & 0 & 0& -1.5 \\ 0.6 & 0.4 & 0.5& 0.41& -0.4 & 0.04 & 0 & 0& -1.5 \end{array}\right). \end{aligned}
(12.79)
The sensitivities and elasticities of the probability of contracting clinical CRC are given by the last two rows. These probabilities are highly elastic to λ1, λ2 and λ3. The elasticities to μ indicate that every 1% reduction in mortality due to other causes will cause about a 1.5% increase in the probability of experiencing clinical CRC.
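These sensitivities can also be approximated numerically. The sketch below substitutes central finite differences for the chapter's matrix-calculus formulas, as an independent check rather than a reimplementation. The transition structure (`MOVES`, `CRC_DEATH`) is an assumption inferred from the numerical results of this section, and the names `build_U` and `visits_from_state1` are hypothetical.

```python
import numpy as np

# Rates per year from (12.65): lambda_1..lambda_8 followed by mu.
theta = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])
# ASSUMED structure: (to, from, index into theta) and (from, index into theta).
MOVES = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3), (5, 3, 4), (6, 4, 5)]
CRC_DEATH = [(5, 6), (6, 7)]

def build_U(th):
    """Transient block of the intensity matrix; column j holds rates out of state j."""
    U = np.zeros((7, 7))
    for to, frm, k in MOVES:
        U[to, frm] += th[k]
        U[frm, frm] -= th[k]
    for frm, k in CRC_DEATH:
        U[frm, frm] -= th[k]
    return U - th[8] * np.eye(7)

def visits_from_state1(th):
    """First column of the embedded-chain fundamental matrix."""
    U = build_U(th)
    U_hat = U / (-np.diag(U))[None, :]
    np.fill_diagonal(U_hat, 0.0)
    return np.linalg.inv(np.eye(7) - U_hat)[:, 0]

base = visits_from_state1(theta)
sens = np.zeros((7, theta.size))
for k in range(theta.size):
    step = np.zeros(theta.size)
    step[k] = 1e-6 * theta[k]         # small relative perturbation of theta_k
    diff = visits_from_state1(theta + step) - visits_from_state1(theta - step)
    sens[:, k] = diff / (2.0 * step[k])

elas = sens * theta[None, :] / base[:, None]
print(np.round(sens, 1))
print(np.round(elas, 2))
```

Visit probabilities depend only on ratios of rates, so each row of the elasticity matrix should sum to zero up to finite-difference error.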

## 12.7 Discussion

The results of this chapter have been presented in terms of differentials of, or derivatives with respect to, a general vector θ of parameters. The nature of these parameters and their relation to Q, U, or M can be very general. At its simplest, θ could consist of some subset of the elements of Q. This is the case in the CRC example (Sect. 12.6), in which the parameters are the transition rates λi and the mortality rate μ. More generally, the transition rates might themselves be written as functions of other variables. For example, in Van Den Hout and Matthews (2009a,b) the rates are written as $$q_{ij}=\exp \left ( \boldsymbol {\beta }_{ij}^{\mathsf {T}} {\mathbf {z}} \right )$$, i ≠ j, where z is a vector of covariates (e.g., age, medical care) and βij is a vector of coefficients to be estimated. The results presented here can be applied directly to such cases, and indeed to even more complicated functional dependencies, using the chain rule. Thus, focusing on parametric dependence is not only scientifically valuable (these are, after all, the relationships of interest in applications of Markov chains) but also extremely general.
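The chain-rule argument can be illustrated on a toy illness-death model with log-linear rates. Everything in this sketch is hypothetical: the two transient states (healthy, ill), the covariate vector `z`, and the coefficient values in `beta` are invented for illustration and are not taken from Van Den Hout and Matthews. Because ∂q_r/∂β_rs = q_r z_s, the sensitivity of life expectancy to a coefficient is the sensitivity to the rate multiplied by q_r z_s.

```python
import numpy as np

z = np.array([1.0, 0.5])             # hypothetical covariates (intercept, scaled age)
beta = np.array([[-3.0, 0.4],        # hypothetical coefficients: healthy -> ill
                 [-4.0, 0.8],        #                            healthy -> dead
                 [-2.0, 0.6]])       #                            ill -> dead

def build_U(q):
    """Transient block for states (healthy, ill); linear in the rates q."""
    return np.array([[-(q[0] + q[1]), 0.0],
                     [q[0], -q[2]]])

def life_expectancy(q):
    """Mean time to death starting from the healthy state."""
    return np.linalg.inv(-build_U(q)).sum(axis=0)[0]

q = np.exp(beta @ z)                 # q_r = exp(beta_r' z)
N = np.linalg.inv(-build_U(q))

# Sensitivity of life expectancy to each rate: column sums of N (dU/dq_r) N.
d_eta_dq = np.array([(N @ build_U(e) @ N).sum(axis=0)[0] for e in np.eye(3)])

# Chain rule: d eta / d beta_rs = (d eta / d q_r) * q_r * z_s.
d_eta_dbeta = (d_eta_dq * q)[:, None] * z[None, :]

print(np.round(d_eta_dbeta, 3))
```

The same pattern extends to the CRC model: only the Jacobian of the rates with respect to the underlying coefficients needs to be supplied.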

Epidemic models are often written as continuous-time Markov chains, specified in terms of rates of movement among infection states. Gómez-Corral and López-García (2018) extended the methods of this chapter to a model in which individuals are classified by two state variables (a level-dependent quasi-birth-death process). The model may be considered a continuous-time analog of the age×stage models of Chap. (Caswell 2012; Caswell and Salguero-Gómez 2013; Caswell et al. 2018). Their approach takes advantage of the block structure of the intensity matrix for such processes. They have also applied the approach to receptor-ligand complexes within cells (López-García et al. 2018). As far removed from demography as molecules may seem, the concepts of i-state transitions, of inferring population behavior from individual trajectories, and of sensitivity analysis still apply. That’s a good thing.

## Footnotes

1. This calculation holds the mortality rate fixed at its value at age 70; in reality it increases with age. Wu et al. (2006) included age variation by providing values of λ1 (the rate of progression from normal to small adenoma) specific to 5-year intervals from 50 to 70 years of age; all other parameters were age-invariant.

## References

1. Altman, E., K. Avrachenkov, and R. Núñez-Queija. 2004. Perturbation analysis for denumerable Markov chains with application to queueing models. Advances in Applied Probability 36:839–853.
2. Cao, X. 1989. Estimates of performance sensitivity of a stochastic system. IEEE Transactions on Information Theory 35:1058–1068.
3. Cao, X., X. Yuan, and L. Qiu. 1996. A single sample path-based performance sensitivity formula for Markov chains. IEEE Transactions on Automatic Control 41:1814–1817.
4. Caswell, H. 2006. Applications of Markov chains in demography. Pages 319–334 in MAM2006: Markov Anniversary Meeting. Boson Books, Raleigh, North Carolina.
5. Caswell, H. 2012. Matrix models and sensitivity analysis of populations classified by age and stage: a vec-permutation matrix approach. Theoretical Ecology 5:403–417.
6. Caswell, H., C. de Vries, N. Hartemink, G. Roth, and S. F. van Daalen. 2018. Age×stage-classified demographic analysis: a comprehensive approach. Ecological Monographs 88:560–584.
7. Caswell, H., and R. Salguero-Gómez. 2013. Age, stage and senescence in plants. Journal of Ecology 101:585–595.
8. Chen, T.-H., M.-F. Yen, S.-S. Lai, K. S-L, W. C-Y, W. J-M, T. C. Prevost, and D. S. W. 1999. Evaluation of a selective screening for colorectal carcinoma: the Taiwan Multicenter Cancer Screening (TAMCAS) Project. Cancer 86:1116–1128.
9. Cho, G. E., and C. D. Meyer. 2000. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra and its Applications 335:137–150.
10. Fix, E., and J. A. Neyman. 1951. A simple stochastic model of recovery, relapse, death and loss of patients. Human Biology 23:205–241.
11. Funderlic, R. E., and C. D. Meyer, Jr. 1986. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra and its Applications 76:1–17.
12. Glasserman, P. 1992. Derivative estimates from simulation of continuous-time Markov chains. Operations Research 40:292–308.
13. Golub, G. H., and C. D. Meyer, Jr. 1986. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM Journal on Algebraic and Discrete Methods 7:273–281.
14. Gómez-Corral, A., and M. López-García. 2018. Perturbation analysis in finite LD-QBD processes and applications to epidemic models. Numerical Linear Algebra with Applications page e2160.
15. Hunter, J. J. 2005. Stationary distributions and mean first passage times of perturbed Markov chains. Linear Algebra and its Applications 410:217–243.
16. Iosifescu, M. 1980. Finite Markov Processes and Their Applications. Wiley, New York, New York.
17. Kay, R. A. 1986. Markov model for analysing cancer markers and disease states in survival studies. Biometrics 42:855–865.
18. Kuo, H. S., H. J. Chang, P. Chou, L. Teng, and T. H. H. Chan. 1999. A Markov chain model to assess the efficacy of screening for non-insulin dependent diabetes mellitus (NIDDM). International Journal of Epidemiology 28:233–240.
19. López-García, M., M. Nowicka, C. Bendtsen, G. Lythe, S. Ponnambalam, and C. Molina-París. 2018. Quantifying the phosphorylation timescales of receptor–ligand complexes: a Markovian matrix-analytic approach. Open Biology 8:180126.
20. Meyer, C. D. 1975. The role of the group generalized inverse in the theory of finite Markov chains. SIAM Review 17:443–464.
21. Mitrophanov, A. Y. 2004. The spectral gap and perturbation bounds for reversible continuous-time Markov chains. Journal of Applied Probability 41:1219–1222.
22. Ramesh, A. V., and K. Trivedi. 1993. On the sensitivity of transient solutions of Markov models. Pages 122–134 in Proceedings of the 1993 ACM SIGMETRICS Conference on measurement and modeling of computer systems.
23. Schweitzer, P. J. 1968. Perturbation theory and finite Markov chains. Journal of Applied Probability 5:401–413.
24. Seneta, E. 1993. Sensitivity of finite Markov chains under perturbation. Statistics and Probability Letters 17:163–168.
25. Sonnenberg, F. A., and R. Beck. 1993. Markov models in medical decision making: a practical guide. Medical Decision Making 13:322–338.
26. Van Den Hout, A., and F. E. Matthews. 2009a. Estimating dementia-free life expectancy for Parkinson’s patients using Bayesian inference and microsimulation. Biostatistics 10:729–743.
27. Van Den Hout, A., and F. E. Matthews. 2009b. A piecewise-constant Markov model and the effects of study design on the estimation of life expectancies in health and ill health. Statistical Methods in Medical Research 18:145–162.
28. Wu, G.-M., Y.-M. Wang, M.-F. Yen, J.-M. Wong, H.-C. Lai, J. Warwick, and C. TH-H. 2006. Cost-effectiveness analysis of colorectal cancer screening with stool DNA testing in intermediate-incidence countries. BMC Cancer 6:136.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Authors and Affiliations

• Hal Caswell (1)
1. Biodiversity & Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands