The Jump Start Power Method: A New Approach for Computing the Ergodic Projector of a Finite Markov Chain
Abstract
This article presents a new numerical method for approximately computing the ergodic projector of a finite Markov chain. Our approach requires neither structural information on the chain, such as the identification of ergodic classes and transient states, nor qualitative information, such as whether the chain is nearly decomposable. The theoretical deduction of the new method is corroborated by an extensive numerical study.
Keywords
Power method · Numerical evaluation · Markov multichains · Transient states · Nearly decomposable
1 Introduction
The main advantages of the power method (PM) are that it is easy to implement and that it requires no further information on P. In addition, PM can be efficiently implemented for large sparse matrices, which is the main reason why PM is used for the acclaimed Google PageRank algorithm introduced by Brin and Page [7]; for more detail see [4, 9, 20]. PM has two main versions. In the vector-updating version of PM, one computes \( \mu P^n \) for a given vector \( \mu \). Vector updating applies in case a given Markov chain \({\hat{P}} \) with known stationary distribution \( \pi _{ {\hat{P}} } \) is updated (due to a change in the underlying hyperlink structure of the network) to a new Markov chain P. Then, computing \( \pi _{{\hat{P}} } P^n \) converges faster towards \( \pi _{P}\). An advantage of vector updating is that it only requires vector-matrix multiplications. The downside of this approach is that one cannot change the initial vector without a complete recalculation. The matrix-updating version directly computes \( P^n\) in order to approximate \( \varPi _P\). The advantage of matrix-updating PM is that by squaring a matrix power \( P^n \), i.e., going from \( P^n \) to \( ( P^n )^2 = P^{2 n }\), high powers of P can be computed relatively easily. Indeed, computing \( P^n \) only requires \( \log _2 ( n ) \) matrix multiplications. Moreover, applying different initial vectors to \( \varPi _P\) allows one to model different initial distributions, which is of particular interest in the case of multichains; see the subsequent section for details. The downside is that even with the \( \log _2 ( n ) \) advantage, matrix updating may require a significant number of matrix multiplications, and as the power increases these matrices are no longer sparse. In case P is periodic, neither the vector-updating nor the matrix-updating version converges unless a convex combination of the original P with the identity matrix is used, which comes at the expense of reduced convergence speed.
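The two PM versions above can be sketched in a few lines of NumPy (a minimal illustration on a hypothetical 3-state chain, not one of the paper's instances; function names are ours):

```python
import numpy as np

# Hypothetical small ergodic chain (not one of the paper's instances).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

def pm_vector(mu, P, n):
    """Vector-updating PM: n vector-matrix products, sparse-friendly,
    but tied to the single initial vector mu."""
    for _ in range(n):
        mu = mu @ P
    return mu

def pm_matrix(P, m):
    """Matrix-updating PM via repeated squaring: m matrix-matrix products
    yield P^(2^m), i.e. log2(n) multiplications for the power n = 2^m."""
    Q = P.copy()
    for _ in range(m):
        Q = Q @ Q
    return Q

mu0 = np.array([1.0, 0.0, 0.0])
pi_approx = pm_vector(mu0, P, 1024)   # mu0 @ P^1024
Pi_approx = pm_matrix(P, 10)          # P^1024 with only 10 multiplications
```

Here `pi_approx` coincides with the first row of `Pi_approx`; with the matrix version, any other initial distribution can afterwards be applied to `Pi_approx` without recomputation, which is exactly the multichain advantage mentioned above.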
In this paper, we mainly focus on matrix-updating PM, from now on simply referred to as PM. Iterative methods such as PM converge slowly in case the subdominant eigenvalue of P is close to 1, see [13, 15]. This typically happens if either the P-chain jumps only with small probability from the transient states to (one of) the ergodic class(es) or if P is nearly decomposable. Roughly speaking, an irreducible chain P is called nearly decomposable if the state space can be divided into classes so that interactions between states within a class are relatively frequent compared to interactions between the classes (a formal definition will be provided later in the text). It can be shown that an irreducible Markov chain without transient states is nearly decomposable if and only if the subdominant eigenvalue is close to 1, see [11]. A famous example of a nearly decomposable Markov chain is the so-called Courtois matrix, an \( 8 \times 8 \) transition matrix for which PM requires \( n \approx 69{,}000\) in order to provide an approximation of \( \varPi _P \) that is correct in the first 6 digits, [25]. In case the ergodic classes and the transient states are known, one may compute the ergodic projector directly by first computing the equilibrium distribution for each ergodic class and then the long-term behavior of the transient states, see [5, 17] and the detailed discussion in Sect. 2. For a comprehensive overview of numerical methods for computing the ergodic projector of a finite Markov chain, we refer to [25].
Our research on Markov chains is stimulated by the growing interest in the analysis of social networks (where the Markov chain is used to model relationships among social agents, see [22]) and by the analysis of the world wide web, where, based on the (bored) random-surfer concept, the Markov chain models the probability of randomly going from one page to another, [9, 20]. A key feature of these networks is that they are large and that neither their structure (transient states, ergodic classes) nor their balancedness (nearly decomposable or not) is known a priori. Other examples of this type of complex network include telecommunication networks, cognitive and semantic networks, and biological networks.
Letting \( \alpha < 1 / \gamma ( P ) \), the result put forward in (2) implies that the Markov kernel \(H_\alpha ( P )\) is geometrically ergodic with transient phase \( r=1 \) and rate \( \alpha \gamma (P) \). Put differently, the transformation \( P \mapsto H_\alpha ( P ) \) provides a jump start for PM, as the desired contraction property is immediately effective. Moreover, we will show that iterating the transformation yields a geometric reduction in the geometric rate, so that, for example, \( H_\alpha ( H_ \alpha ( P ) ) \) has a rate that is proportional to \( \alpha ^2 \). The above theoretical results lead to a new numerical approach for approximately computing \( \varPi _P \), called the jump start power method.
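As a concrete sketch: if one takes the modified resolvent in the standard discounted form \(H_\alpha (P) = \alpha (I - (1-\alpha )P)^{-1} = \alpha \sum _{n \ge 0} (1-\alpha )^n P^n\) (an assumption on our part, though it is consistent with the discounted-visits description in Remark 1), it can be computed with a single linear solve, and already for small \(\alpha \) the result is close to the ergodic projector:

```python
import numpy as np

def modified_resolvent(P, alpha):
    """Sketch of the modified resolvent, ASSUMING the discounted form
    H_alpha(P) = alpha * (I - (1-alpha) P)^(-1), i.e. the geometric series
    alpha * sum_{n>=0} (1-alpha)^n P^n, which converges since
    ||(1-alpha) P|| < 1 for alpha in (0,1)."""
    S = P.shape[0]
    return alpha * np.linalg.inv(np.eye(S) - (1.0 - alpha) * P)

# Hypothetical 2-state chain; its stationary distribution is (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
H = modified_resolvent(P, 1e-6)   # already close to the ergodic projector
```

One solve replaces many matrix powers: the "jump start" is that no transient phase needs to be burned off first.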

The error of approximating \( \varPi _P\) by powers of the modified resolvent \( ( H_\alpha ( P ) )^k \) is of order \( ( \alpha \gamma (P))^k \). We use this fact to introduce the jump start power method (JSPM), which enjoys the robustness of PM but overcomes its numerical deficiencies. JSPM works well for multichains, nearly decomposable chains, and chains that jump with small probability from the transient states to (one of) the ergodic class(es).

An adapted version of JSPM is developed for large-scale Markov chains; it utilizes the structure of the Markov chain and takes only ‘one jump’ towards \(\varPi _P\), i.e., \(k=1\) together with a carefully chosen \(\alpha \in (0,1)\).

An extensive numerical study is provided that corroborates the form of the analytical bound for the decay of the error and illustrates the numerical advantages of JSPM.
2 A Brief Review of Markov Chains
In case the Markov chain has only one closed irreducible set of states, also called ergodic class, and a (possibly empty) set of transient states, it is called a Markov unichain (in short: unichain). For unichains it holds that the chain will eventually be trapped in the (unique) ergodic class, independent of the initial state. The unique distribution to which a unichain converges is described by the stationary distribution of P, denoted by \(\pi ^\top _P\), which can be found by solving \(\pi ^\top _P P = \pi ^\top _P\). Since the stationary distribution is independent of the initial state, all rows of \(\varPi _P\) equal \(\pi ^\top _P\) in case P describes a Markov unichain.
In case there are multiple ergodic classes, the stationary distribution fails to be unique. Indeed, any row of \(\varPi _P\) is a stationary distribution of the Markov chain. More specifically, denote the ith row of \(\varPi _P\) by \(\varPi _P(i,\bullet )\); then it holds that \(\varPi _P(i,\bullet )\) is a probability distribution which satisfies \(\varPi _P(i,\bullet ) P=\varPi _P(i,\bullet )\). This implies that any convex combination of the rows is also a stationary distribution of the Markov chain, i.e., for \((\gamma _i)_{i \in \mathbb {S}}: \sum _{i = 1}^S \gamma _i = 1\) and \(\gamma _i\ge 0 \), for all \( i \in \mathbb {S}\), it holds that \(\sum _{i = 1}^S \gamma _i \varPi _P(i,\bullet )\) is a probability distribution which is invariant with respect to P. When an initial distribution \(\mu ^\top \) is considered, this convex combination is fixed (and given by \( \mu ^\top \)), meaning that there exists a unique stationary distribution for the chain started in \( \mu ^\top \) (describing the long-run behavior of the chain started in \( \mu ^\top \)); more formally, \(\mu ^\top \varPi _{P}\) is the unique stationary distribution satisfying \((\mu ^\top \varPi _P) P = (\mu ^\top \varPi _P)\) when starting in \(\mu ^\top \). Literature concerning Markov multichains includes Markov decision processes [23], series expansions of Markov chains [2, 5, 6], and singular perturbation analysis [1, 12], where the underlying multichain structure is often known beforehand.
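The multichain behavior described above can be checked numerically. The following sketch uses a hypothetical 4-state chain with two ergodic classes and one transient state (our own example, not from the paper), and crudely approximates \(\varPi _P\) by a high matrix power:

```python
import numpy as np

# Hypothetical multichain: ergodic classes {0,1} and {2}, transient state 3.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.2, 0.2, 0.3, 0.3]])

Pi = np.linalg.matrix_power(P, 200)  # crude approximation of the ergodic projector

# Every row of Pi is a stationary distribution, but rows differ per class.
mu = np.array([0.25, 0.25, 0.25, 0.25])  # some initial distribution
pi_mu = mu @ Pi                          # long-run distribution started from mu
```

Rows 0 and 1 of `Pi` coincide (same ergodic class) while row 2 differs, and `pi_mu` is the unique stationary distribution associated with the start `mu`, i.e., it again satisfies `pi_mu @ P == pi_mu`.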
A Markov chain may belong to all of the above types simultaneously. For example, a multichain with transient states may have an ergodic class that by itself constitutes a nearly decomposable chain. Below we illustrate this by means of a simple Markov chain.
Example 1
Markov chain P is evidently a multichain with ergodic classes \( \{ 1 ,2 \} \) and \( \{3 \} \). State 4 is transient. If p and q are small, then the submatrix describing the transitions within the ergodic class \( \{ 1 , 2 \} \) becomes nearly decomposable. Similarly, when \(\sum _{i=1}^3 r_i\) is small, state 4 is only weakly connected to states \(\{1,2,3\}\).
3 Bounding the Approximation Error
Remark 1
The (i, j)th element of \(G_\alpha ( N , P^q ) \) gives the scaled \((1-\alpha )\)-discounted expected number of visits of the Markov chain with transition matrix \(P^q\) to the jth state in the first \(N + 1\) discrete time steps (including the state i at time zero) when starting in state i. Intuitively, the discounting ensures that the weights of visits after many discrete time steps of the Markov chain with transition matrix \(P^q\) become smaller and smaller, ensuring existence of \(H_\alpha ( P^q ) \) since \(\Vert (1-\alpha )P^q\Vert <1\) for \(\alpha \in (0,1)\).
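The truncated discounted sum just described can be sketched directly. We scale by \(\alpha / (1-(1-\alpha )^{N+1})\) so that every row is a probability distribution; this factor matches the denominator appearing in Lemma 1, although the paper's exact normalization is an assumption on our part:

```python
import numpy as np

def G(alpha, N, Q):
    """Sketch of G_alpha(N, Q) for Q = P^q: the (1-alpha)-discounted expected
    number of visits over times 0..N, scaled by alpha / (1 - (1-alpha)^(N+1))
    so that each row sums to 1 (assumed normalization)."""
    S = Q.shape[0]
    acc = np.zeros((S, S))
    Qn = np.eye(S)   # Q^0: the visit to the start state at time zero
    for n in range(N + 1):
        acc += (1.0 - alpha) ** n * Qn
        Qn = Qn @ Q
    return acc * alpha / (1.0 - (1.0 - alpha) ** (N + 1))
```

For \(N = 0\) this reduces to the identity matrix, and as \(N \rightarrow \infty \) the truncated sum tends to \(H_\alpha ( P^q )\), which exists precisely because \(\Vert (1-\alpha ) P^q \Vert < 1\).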
Lemma 1
Proof
 1. \(N \le \phi (P,q) - 1\) (geometric ergodicity does not apply):$$\begin{aligned} \Vert \varPi _P - ( H_\alpha ( N, P^q ) )^k\Vert \le \left( \sup _{n=0,1,\ldots ,N} \left\Vert \varPi _P - P^{q(n+1)} \right\Vert \right) ^k \end{aligned}$$(12)
 2. \(N \ge \phi (P,q)\) (geometric ergodicity applies): since$$\begin{aligned} 1-(1-\alpha )^{\phi (P,q)} \le \alpha \phi (P,q),\quad \text{ for } \alpha \in (0,1) \text{ and } \phi (P,q) = 0,1,\ldots , \end{aligned}$$it holds that$$\begin{aligned} \Vert \varPi _P - ( H_\alpha ( N, P^q ) )^k\Vert \le \left( \frac{\alpha \gamma (P,q)}{1-(1-\alpha )^{N+1}}\right) ^k, \end{aligned}$$where \(\gamma (P,q)\) is a finite constant defined in (3).
Remark 2
The following theorem summarizes some properties of \(H_\alpha (P^q)\).
Theorem 1
Proof
Inequality (13) follows directly from Lemma 1 by letting \(N\rightarrow \infty \). The first two equalities from (14) follow from Inequality (13) and the third equality from (1). \(\square \)
Remark 3
Theorem 3 in the “Appendix” shows that the results put forward in Lemma 1 and Theorem 1 apply to periodic Markov chains with period d for \( q=1 \) when \(\gamma (P,1)\) is replaced by \({{\bar{\gamma }}}(P,d)\) defined in the “Appendix”.
The result put forward in Theorem 1 shows that for \( \alpha < 1 / \gamma ( P , q )\) it holds that the modified resolvent \( H_\alpha ( P^q ) \) is geometrically ergodic with rate \( \alpha \gamma ( P , q ) \), transient phase \( r =1 \), and ergodic projector \( \varPi _P\).
As our numerical study in the second part of the paper shows, the modified resolvent is potentially more efficient than PM, which, together with the fact that it directly applies to multichains, makes it an attractive alternative to PM. In the following, \(H_\alpha (P)\) is illustrated for Example 1.
Example 2
In the following example, we discuss the convergence of \(H_\alpha (P)\) in case of a nearly decomposable Markov chain.
Example 3
Theorem 2
Proof
Corollary 1
Remark 4
The results put forward in Theorem 2 and Corollary 1 apply for the case \( q=1 \) also to periodic Markov chains with period d when \(\gamma (P, 1)\) is replaced by \({{\bar{\gamma }}}(P,d)\). For details see the “Appendix”.
Theorem 2 shows that repeated application of the modified resolvent yields a geometric improvement of the rate of geometric ergodicity. Example 4 illustrates Theorem 2.
Example 4
Unfortunately, as \( \gamma ( P , q ) \) is not available, it is neither clear what a good initial choice for \( \alpha \) is, nor when to terminate \((H_\alpha (P))^k\) or the repeated application of \( H_\alpha ( P ; n ) \). In the following, we will address these two issues in more detail.
Remark 5
4 The Jump Start Power Method
In the previous section, we have shown that going from P to the modified resolvent \( H_\alpha ( P) \) potentially yields a geometrically ergodic Markov chain with no transient phase (i.e., \( r =1 \)). In this section we show how this result can be made fruitful for numerical computations. In particular, Sect. 4.1 illustrates the modified resolvent theory through numerical experiments; Sect. 4.2 develops a practical method that exploits the developed theory by introducing the jump start power method (JSPM) and provides numerical results. Lastly, Sect. 4.3 discusses and numerically illustrates the use of JSPM in case of large (sparse) systems.
4.1 Motivating Numerical Experiments
Table 1 Instances used for numerical experiments
Tr. Matrix  Description  S  Ergodic structure  \(p^\star \) 

\(P_1\)  Courtois matrix, [25].  8  ([8], 0)  3e−5 
\(P_2\)  Kleinberg’s network, [18], with parameters \(p=5\), \(q=2\) and \(\beta =3\).  225  ([225], 0)  1e−2 
\(P_3\)  Kleinberg’s network, [18], with parameters \(p=3\), \(q=6\) and \(\beta =1\).  900  ([900], 0)  1e−2 
\(P_4\)  Block diagonal transition matrix of \(P_2\) and \(P_3\) with weak connection between \(P_2\) and \(P_3\). I.e., two random nodes from \(P_2\) and \(P_3\), resp., are connected with probability 1e−4.  1125  ([1125], 0)  1e−4 
\(P_5\)  Block diagonal transition matrix of \(P_2\) and \(P_3\) with 60 transient states.  1185  ([225, 900], 60)  8.4e−6 
\(P_6\)  Block diagonal transition matrix of \(P_1\) and \(P_2\) with 20 transient states.  253  ([8, 225], 20)  3e−5 
\(P_7\)  RENGA protein-protein interaction (PPI) network, [10], with parameters \(\beta _1 = 0.4\) and \(\beta _2 = 0.965\).  700  ([202, 181, 67, 98, 152],0)  1.7e−3 
\(P_8\)  Preferential attachment network, [3], with parameter \(d = 5\).  700  ([700],0)  4.7e−5 
\(P_9\)  Lock and key PPI network, [21], with parameters \(\beta _1 = 0.4\) and \(\beta _2 = 0.965\).  700  ([700],0)  1.7e−4 
In particular, for the Courtois matrix \(P_1\), in order to approximately achieve a norm error of \(7.92\cdot 10^{-7}\) a power of \(2^{16}\) is needed, while the same norm error is obtained via the modified resolvent with \(\alpha \approx 10^{-10}\). For \(P_4\) the modified resolvent with \(\alpha \approx 10^{-11.18}\) leads to the same norm error (of approximately \(1.63 \cdot 10^{-5}\)) as PM with power 20655175 (\(\approx 2^{24.3}\)). As for computation times, experiments showed that PM \((P_4)^k\) with power \(k=20655175\) takes on average 73.12 seconds in a sparse matrix setting, whereas the modified resolvent \(H_{\alpha =10^{-11.18}}(P_4)\) takes on average 2.68 seconds, i.e., a difference of a factor 27.28 on average (the experiments were performed in MATLAB R2011b on a 64-bit Windows desktop PC with an Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz processor).
Length of Transient Phase for \( H_\alpha (P)\) Figure 2 illustrates the effect that powers of \(H_\alpha (P)\) have on the norm error for the Courtois matrix. Different values for \(\alpha \) are considered, and for each \(\alpha \) the exponential decay location is determined, thereby identifying the length of the transient phase. Note that \(H_{\alpha =1}(P)\) equals P. A heuristic approach is used to find the exponential decay location: for each \(\alpha \) an exponential function is repeatedly fitted to the data until the coefficient of determination \(R^2\) is close enough to 1 (where \( R^2 =1 \) represents a perfect fit). After each fit that leads to an insufficient coefficient of determination, the dataset is reduced by increasing the value of the first considered power n, and the fitting repeats. The found exponential decay locations (i.e., the smallest power in the dataset that led to an \(R^2\) sufficiently close to 1) are denoted by the large dots and labeled, where the labels correspond to the fitted functions given under the graph, together with the \(R^2\) in parentheses behind each function.
This phenomenon has been theoretically shown in the previous section. It is worth noting that for \( \alpha \le 10^{-3}\) there is no transient phase, i.e., \( r =1 \) in these cases. Furthermore, from the fitted functions it follows that smaller \(\alpha \) values lead to stronger norm error reduction for increasing powers.
It shows that relatively large values for \(\alpha \) already lead to small norm errors after a few iterations. In particular, the fitted relation between the norm error of \(H_{\alpha =0.01}(P;n)\) and n is approximately \(4901 e^{-4.6n}\), whereas that of \((H_{\alpha =0.01}(P))^n\) and n is approximately \(e^{-0.0198n}\) (see also Fig. 2), showing that an increase in the number of iterations of the nested modified resolvent is far more effective than an increase in the power of the modified resolvent for the same \(\alpha \). This illustrates the sharper bound found for the nested modified resolvent in comparison with powers of the modified resolvent.
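The fitting heuristic described for Fig. 2 can be sketched as a log-linear least-squares loop that drops leading powers until the fit is good enough (function and variable names are ours, not the paper's):

```python
import numpy as np

def decay_location(ns, errors, r2_min=0.999):
    """Heuristic sketch: fit err ~ c * exp(b * n) by linear least squares on
    log(err); if R^2 < r2_min, drop the earliest power and refit. Returns the
    first power where pure exponential decay starts, plus the fitted (c, b).
    Assumes strictly positive error values."""
    for start in range(len(ns) - 2):          # keep at least 3 points
        x = np.asarray(ns[start:], dtype=float)
        y = np.log(np.asarray(errors[start:], dtype=float))
        b, logc = np.polyfit(x, y, 1)         # slope b, intercept log(c)
        resid = y - (b * x + logc)
        r2 = 1.0 - resid.var() / y.var()      # coefficient of determination
        if r2 >= r2_min:
            return ns[start], np.exp(logc), b
    return ns[-2], None, None                 # no acceptable fit found
```

On synthetic data that is exactly exponential from some power onward, the returned location marks the start of the pure-decay regime, mimicking the large labeled dots in Fig. 2.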
4.2 Jump Start Power Method (JSPM)
In this section, we will develop a powermethod like algorithm based on the theory established in the previous section. To that end we discuss how to choose \(\alpha \) and we provide a stopping rule for the algorithm. Our recommendations are based on numerical experiments and balance avoiding numerical issues with achieving good numerical approximations.
Based on the above results, we advise using the modified resolvent in a PM framework with a carefully chosen \(\alpha \). When to terminate the power iterations is a delicate matter. A natural stopping rule is to terminate the algorithm when the improvement of an extra iteration becomes insignificant. More specifically, in order to find a power k such that \( \Vert \varPi _P - P^k \Vert \le \varepsilon \), one may terminate PM if \(\Vert P^k P - P^k \Vert < \varepsilon \), for \(\varepsilon > 0\) small. Unfortunately, this stopping rule may stop the algorithm too early, as is illustrated in Example 5 below.
Example 5
 (1) Choose \( \alpha = \min \{ \alpha _{\max },\; p^\star /N \} \) and select numerical precision \( \varepsilon \).
 (2) Initialize \(k = 1\) and calculate \(H_\alpha (P)\).
 (3) Set \(k=k+1\).
 (4) If$$\begin{aligned} \Vert (H_\alpha (P))^{k-1} H_\alpha (P) - (H_\alpha (P))^{k-1} \Vert \ge \varepsilon , \end{aligned}$$go to step 3. Otherwise go to step 5.
 (5) Return \((H_\alpha (P))^{k}\).
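Steps (1)-(5) can be sketched as follows, again assuming the discounted-resolvent form of \(H_\alpha (P)\) and using the maximum absolute entry as the norm; the paper's exact norm and the constant N are not fully specified in this excerpt, so both are assumptions here:

```python
import numpy as np

def jspm(P, p_star, N=100, alpha_max=1e-4, eps=1e-8):
    """Sketch of JSPM steps (1)-(5), ASSUMING
    H_alpha(P) = alpha * (I - (1-alpha) P)^(-1) and the max-abs-entry norm.
    p_star is the smallest positive transition probability; N is a tuning
    constant (100 would mirror Parameter Setting 1 of Table 2)."""
    S = P.shape[0]
    alpha = min(alpha_max, p_star / N)                          # step (1)
    H = alpha * np.linalg.inv(np.eye(S) - (1.0 - alpha) * P)    # step (2), k = 1
    Hk = H
    # Steps (3)-(4): keep raising the power while an extra iteration
    # still changes the approximation by at least eps.
    while np.max(np.abs(Hk @ H - Hk)) >= eps:
        Hk = Hk @ H
    return Hk                                                    # step (5)
```

On a small unichain, a handful of iterations already reaches an approximation of \(\varPi _P\) well below the tolerance, matching the iteration counts reported in Table 3.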
In Table 3 some numerical results for JSPM are shown. For an overview of the instances see Table 1. Two parameter choices for \( \alpha \) and \( \varepsilon \) are considered, see Table 2. Parameter Setting 1 aims at a higher accuracy of the approximation (i.e., a small value for \(\varepsilon \)), which is numerically possible by choosing \(\alpha \) not too small. Parameter Setting 2 focuses more on quick convergence of the algorithm, i.e., a larger value for \( \varepsilon \) compared to Setting 1 and a small \(\alpha \).
Table 2 Parameter settings used for JSPM in numerical experiments
\(\alpha \)  \(\varepsilon \)  Aim  

Parameter setting 1:  \(\min \{ 10^{-4}, \;p^\star /100\}\)  \(10^{-8}\)  High accuracy 
Parameter setting 2:  \(\min \{ 10^{-8}, \;p^\star /S\}\)  \(10^{-6}\)  Fast computation 
Table 3 Results of JSPM for the two parameter settings given in Table 2
Parameter setting 1  Parameter setting 2  

Tr. Matrix  Norm error  # Iterations  Norm error  # Iterations 
\(P_1\)  2.4461e−010  4  4.8780e−009  3 
\(P_2\)  4.3119e−012  4  1.8567e−008  2 
\(P_3\)  1.1869e−012  4  1.0155e−008  2 
\(P_4\)  1.6029e−008  42  1.3002e−008  5 
\(P_5\)  4.3520e−009  3  1.9519e−008  2 
\(P_6\)  5.3403e−010  4  2.5750e−008  3 
\(P_7\)  4.7504e−010  8  5.9758e−009  3 
\(P_8\)  2.4177e−010  3  1.6035e−008  2 
\(P_9\)  4.6657e−010  3  5.9593e−008  2 
4.3 JSPM for Large Markov Chains
This final subsection discusses JSPM for large Markov chains. A common feature of large chains is that the transition matrix P is sparse but the ergodic projector \(\varPi _P\) is not, due to connectivity [22]. This leads to numerical issues in approximating \(\varPi _P\). In particular for JSPM: when the approximation \( (H_\alpha (P) )^k \) approaches \(\varPi _P\) as k increases, iterations become computationally more expensive and a memory burden emerges due to the loss of sparsity.
Therefore, in case of large instances, our advice based on numerical experiments is to choose \(\alpha \) sufficiently small and return \(H_\alpha (P)\) as approximation, i.e., to apply JSPM with \( k =1 \). In addition, instead of calculating \(H_\alpha (P)\) as a whole, we recommend calculating a concentrated version of \(H_\alpha (P)\), denoted by \(H^c_\alpha (P)\), whose computation utilizes the structural properties of \( \varPi _P\), such as the fact that all rows corresponding to ergodic states from the same ergodic class are identical. In particular, when row i of \(H_\alpha (P)\), denoted by \(H_\alpha (P)(i,\bullet )\), is calculated, it can be decided on the basis of this approximation whether i is ergodic or transient by inspecting the value of \(H_\alpha (P)(i,i)\). Indeed, invoking the diagonal criterion, see Sect. 2, state i is ergodic if and only if \(H_\alpha (P)(i,i)\) is significantly larger than 0. In case i is identified as ergodic, all indices corresponding to (significantly) positive entries of \(H_\alpha (P)(i,\bullet )\) are identified as belonging to the same ergodic class. Vector \(H_\alpha (P)(i,\bullet )\) is saved in \(H^c_\alpha (P)\) as approximation for the rows of this particular ergodic class, and all indices from this ergodic class need no further consideration. In case i is identified as transient, \(H_\alpha (P)(i,\bullet )\) is saved in \(H^c_\alpha (P)\) as approximation for the ith row of \(\varPi _P\). We will refer to this procedure as the adapted JSPM version for large instances.
 (1) Choose \(\iota > 0\).
 (2) Initialize \(I=0\) and \(C = \emptyset \).
 (3) If \(\mathbb {S} \setminus C \ne \emptyset \):
  (3.1) Select \(i\in \mathbb {S} \setminus C\).
  (3.2) Calculate \(H_\alpha (P)(i,\bullet )\).
 Otherwise go to step 6.
 (4) If \(H_\alpha (P)(i,i) > \iota \):
  (4.1) State i is identified as ergodic, set \(I = I + 1\).
  (4.2) \(E_I = \{ j \; : \; H_\alpha (P)(i,j) > \iota \}\).
  (4.3) \(C = C \cup E_I\).
 Otherwise i is identified as transient, set \(C = C \cup \{ i \}\).
 (5) Save \(H_\alpha (P)(i,\bullet )\) in \(H_\alpha ^c(P)\) and go to step 3.
 (6) Return \(H_\alpha ^c(P)\).
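The adapted procedure above can be sketched as follows. A single row of \(H_\alpha (P)\) requires only one linear solve (again assuming the discounted-resolvent form \(H_\alpha (P) = \alpha (I-(1-\alpha )P)^{-1}\)), which is what makes the row-by-row approach attractive; for truly large sparse P one would replace the dense solve by a sparse solver:

```python
import numpy as np

def adapted_jspm(P, alpha, iota):
    """Sketch of the adapted JSPM steps (1)-(6). Row i of H_alpha(P) is
    obtained by solving (I - (1-alpha) P)^T x = alpha * e_i, assuming the
    discounted-resolvent form; a production version would use scipy.sparse."""
    S = P.shape[0]
    A = (np.eye(S) - (1.0 - alpha) * P).T
    Hc = {}              # concentrated H: one saved row per class / transient state
    covered = set()      # the set C of already-handled states
    classes = []         # identified ergodic classes E_1, E_2, ...
    for i in range(S):   # step (3): pick a state not yet covered
        if i in covered:
            continue
        e = np.zeros(S)
        e[i] = alpha
        row = np.linalg.solve(A, e)          # step (3.2): H_alpha(P)(i, .)
        if row[i] > iota:                    # step (4): diagonal criterion
            E = {j for j in range(S) if row[j] > iota}
            classes.append(E)                # steps (4.1)-(4.2)
            covered |= E                     # step (4.3)
        else:                                # i is transient
            covered.add(i)
        Hc[i] = row                          # step (5): save representative row
    return Hc, classes                       # step (6)
```

On a block-structured multichain, the procedure stores one row per ergodic class plus one row per transient state, i.e., far fewer than S rows, which is the memory saving behind \(H^c_\alpha (P)\).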
Table 4 Large instances used for numerical experiments
Tr. Matrix  Description  S  Ergodic structure  \(p^\star \) 

\(P_{10}\)  Preferential attachment network, [3], with parameter \(d = 2\).  10,000  ([10,000],0)  1.1e−3 
\(P_{11}\)  Kleinberg’s network, [18], with parameters \(p=1\), \(q=1\) and \(\beta =1.5\).  15,625  ([15,625],0)  4.6e−2 
\(P_{12}\)  Block diagonal transition matrix of \(P_{10}\) and \(P_{11}\) with weak connection between \(P_{10}\) and \(P_{11}\). I.e., two random nodes from \(P_{10}\) and \(P_{11}\), resp., are connected with probability 1.5e−9.  25,625  ([25,625],0)  1.5e−9 
\(P_{13}\)  Block diagonal transition matrix of \(P_{10}\) and \(P_{11}\) with 30 transient states.  25,655  ([10,000, 15,625], 30)  1.9e−6 
For instances \(P_{10}\), \(P_{11}\), \(P_{12}\) and \(P_{13}\) from Table 4 the adapted JSPM is applied, where we have chosen \(\alpha = \min \{ 10^{-10},\; (p^\star )^2 \}\) and \(\iota = (1/S)^2\). The philosophy behind the choice of \(\alpha \) is similar to Parameter Setting 2 in the previous section, i.e., a small \(\alpha \) is chosen such that one iteration is most likely sufficient. Our experience for real-life networks is that \(\iota = (1/S)^2\) ensures a correct distinction between transient and ergodic states. In Table 5 the norm errors and computation times in seconds (s) of the experiments can be found.
From the results it follows that high accuracy is achieved in a relatively small amount of time; e.g., a unique row of \(H_\alpha (P)\) in case of 25,625 states is calculated with MATLAB R2011b in 1.37 seconds on a 64-bit Windows desktop PC with an Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz processor, with norm error \(3.524 \cdot 10^{-8}\). To put the results into context, for instance \( P_{12} \) it takes on average 2.49 seconds to calculate \(\mu ^\top (P_{12})^4\) via two sparse matrix multiplications (to obtain \((P_{12})^4\)), with norm error \(\Vert \mu ^\top \varPi _{P_{12}} - \mu ^\top ( P_{12} )^4 \Vert = 0.1794\), where \(\mu ^\top \) equals the first row of an appropriately sized identity matrix. It becomes even more counterproductive if we calculate \(\mu ^\top ( P_{12}) ^8\), which takes on average 593.75 seconds and leads to norm error \(\Vert \mu ^\top \varPi _{P_{12}} - \mu ^\top ( P_{12} )^8 \Vert = 0.079\). The significant increase in computation time is due to loss of sparsity. In this case, the vector-updating version of PM might be more efficient. Indeed, performing 7 sparse vector-matrix multiplications to obtain \(\mu ^\top (P_{12})^8 = P_{12}(1,\bullet )(P_{12})^7\) requires only 0.0109 seconds. However, evaluating \(\mu ^\top (P_{12})^{10000}\) in this way, leading to norm error \(\Vert \mu ^\top \varPi _{P_{12}} - \mu ^\top ( P_{12} )^{10000} \Vert = 0.0048\), already requires 15.63 seconds. Clearly, this demonstrates the potential of the adapted JSPM in evaluating the ergodic projector. Similar observations can be expected for the other large instances.
A way to (most likely) improve the accuracy of the adapted JSPM without significantly increasing computation time is to calculate \(H^c_\alpha (P^q)\), for \(q > 1\). The intuition is that for relatively small q, \(P^q\) may not affect the sparsity too much (the increase in computation time is limited) but may increase the accuracy (which is likely according to the theory). Note that although it is common, it is not necessary that a larger q increases accuracy; the theory only provides upper bounds for the norm error. Example 6 provides an instance for which a larger q does not increase the accuracy of \(H_\alpha (P)\).
Example 6
Table 5 Results for adapted JSPM, with \(\alpha = \min \{ 10^{-10},\; (p^\star )^2 \}\) and \(\iota = (1/S)^2\), in case of large instances
\(\Vert \varPi _{P_i}H^c_\alpha (P_i)\Vert \)  Computation time (in s) \(H^c_\alpha (P_i)\)  

\(i = 10\)  2.0553e−010  0.24 
\(i = 11\)  1.6724e−010  1.14 
\(i = 12\)  3.5243e−008  1.37 
\(i = 13\)  6.1897e−006  45.80 
5 Conclusion
This paper introduces JSPM, a generalization of PM. JSPM is a highly accurate approximation method for the ergodic projector of a general finite Markov chain, including periodic Markov multichains. Convergence analysis and numerical experiments show that it provides a viable generalization of PM. Especially in case of large-scale Markov chains, JSPM works well and can deal with nearly decomposable chains without running into numerical instabilities.
Further research includes extending the techniques used for analyzing JSPM to the deviation matrix and achieving higher accuracy via numerical ingenuity.
Footnotes
 1.
Note that \((1+\alpha N)(1-\alpha )^N \le (1+\alpha )^N(1-\alpha )^N = (1-\alpha ^2)^N < 1\) for \(\alpha \in (0,1)\) and \(N\in \mathbb {N}\).
References
 1. Altman, E., Avrachenkov, K.E., Núñez-Queija, R.: Perturbation analysis for denumerable Markov chains with application to queueing models. Adv. Appl. Probab. 36, 839–853 (2004)
 2. Avrachenkov, K.E., Haviv, M.: The first Laurent series coefficients for singularly perturbed stochastic matrices. Linear Algebra Its Appl. 386, 242–259 (2004)
 3. Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999). https://doi.org/10.1126/science.286.5439.509
 4. Berkhin, P.: A survey on PageRank computing. Internet Math. 2(1), 73–120 (2005). https://doi.org/10.1080/15427951.2005.10129098
 5. Berkhout, J., Heidergott, B.F.: A series expansion approach to risk analysis of an inventory system with sourcing. In: Proceedings of the 12th International Workshop on Discrete Event Systems (WODES 2014), vol. 12, pp. 510–515 (2014). http://www.ifacpapersonline.net/Detailed/65199.html
 6. Berkhout, J., Heidergott, B.F.: Efficient algorithm for computing the ergodic projector of Markov multichains. Procedia Comput. Sci. 51, 1818–1827 (2015). https://doi.org/10.1016/j.procs.2015.05.403
 7. Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
 8. Fox, B.L., Landi, D.M.: Scientific applications: an algorithm for identifying the ergodic subchains and transient states of a stochastic matrix. Commun. ACM 11(9), 619–621 (1968). https://doi.org/10.1145/364063.364082
 9. Franceschet, M.: PageRank: standing on the shoulders of giants. Commun. ACM 54(6), 92–101 (2011)
 10. Grindrod, P.: Range-dependent random graphs and their application to modeling large small-world proteome datasets. Phys. Rev. E 66(6), 066702 (2002). https://doi.org/10.1103/PhysRevE.66.066702
 11. Hartfiel, D.J., Meyer, C.D.: On the structure of stochastic matrices with a subdominant eigenvalue near 1. Linear Algebra Its Appl. 272(1–3), 172–193 (1998)
 12. Hassin, R., Haviv, M.: Mean passage times and nearly uncoupled Markov chains. SIAM J. Discrete Math. 5(3), 386–397 (1992)
 13. Haviv, M., Rothblum, U.: Bounds on distances between eigenvectors. Linear Algebra Its Appl. 15, 101–118 (1984)
 14. Heidergott, B.F., Hordijk, A., van Uitert, M.: Series expansions for finite-state Markov chains. Probab. Eng. Inf. Sci. 21(3), 381–400 (2007). https://doi.org/10.1017/S0269964807000034
 15. Ipsen, I.C.F., Meyer, C.D.: Uniform stability of Markov chains. SIAM J. Matrix Anal. Appl. 15, 1061–1074 (1994)
 16. Kartashov, N.V.: Strong Stable Markov Chains. TBIMC Scientific Publishers, Kiev (1996)
 17. Kemeny, J.G., Snell, J.L.: Finite Markov Chains: With a New Appendix "Generalization of a Fundamental Matrix". Springer, New York (1976)
 18. Kleinberg, J.M.: Navigation in a small world. Nature 406(6798), 845 (2000). https://doi.org/10.1038/35022643
 19. Kontoyiannis, I., Meyn, S.P.: Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13(1), 304–362 (2003)
 20. Langville, A.N., Meyer, C.D.: Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton (2011)
 21. Morrison, J.L., Breitling, R., Higham, D.J., Gilbert, D.R.: A lock-and-key model for protein-protein interactions. Bioinformatics 22(16), 2012–2019 (2006). https://doi.org/10.1093/bioinformatics/btl338
 22. Newman, M.: Networks: An Introduction. Oxford University Press, Oxford (2010)
 23. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
 24. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (2014)
 25. Stewart, W.J.: Introduction to the Numerical Solution of Markov Chains. Princeton University Press, Princeton (1994)
 26. Taylor, A., Higham, D.J.: CONTEST. ACM Trans. Math. Softw. 35(4), 1–17 (2009). https://doi.org/10.1145/1462173.1462175
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.