Abstract
Consider a Markov chain with finite state space and suppose you wish to change time, replacing the integer step index n with a random counting process N(t). What happens to the mixing time of the Markov chain? We present a partial answer for a particular case of interest, in which N(t) is a counting renewal process with power-law distributed inter-arrival times of index \(\beta \). We then focus on \(\beta \in (0,1)\), leading to infinite expectation for the inter-arrival times, and further study the situation in which the inter-arrival times follow the Mittag-Leffler distribution of order \(\beta \).
1 Introduction
1.1 Motivation
The original motivation for this paper stems from a 2011 paper [8] where a probabilistic theory for dynamic networks was presented. In particular, given a fixed set of vertices, an embedded Markov chain was considered on the space of all possible graphs connecting the vertices. This discrete-time chain was then transformed into a continuous-time chain by means of a simple time change with a counting process. In a subsequent paper [3], we explicitly solved one of the models presented in [8]. This model is equivalent to an \(\alpha \)-delayed version of the Ehrenfest urn chain and the time change is the fractional Poisson process [4] of renewal type [6]. At that time, we initiated a discussion on how this time change for a discrete-time discrete-space Markov chain affects mixing times and the convergence rate to equilibrium. Below we collect results on this point in the interesting case in which the inter-arrival times between two consecutive transitions of the embedded chain have a power-law distribution with index \(\beta \), also covering the case in which \(\beta \in (0,1)\) meaning that the expected value of the waiting times is infinite. In the latter case, under an appropriate choice of the distribution of inter-arrival times, it is possible to show that the forward Kolmogorov equations can be replaced by a fractional version with Caputo derivative of index \(\beta \) (see e.g. [3] for details in the case mentioned above and [7] for a general theory) when the initial time of the process is a renewal point.
The starting point of our discussion is that the continuous-time probabilities \(p_{i,j} (t)\) of being in state j at time t, given that the process was in state i at time 0, converge to the same equilibrium distribution as in the case of the embedded chain. Then, in Theorem 1, we prove an upper bound for the mixing time of the continuous-time chain based on the mixing time of the embedded chain and, in Theorem 2, we specialize the result to the case in which inter-arrival times follow the Mittag-Leffler distribution, where a sharper upper bound is available. We believe these bounds can be useful for applied scientists simulating these processes, for instance to estimate how far from equilibrium their simulations are.
1.2 Preliminaries
Let \(T_1, T_2, \ldots \) be a sequence of independent positive random variables, with the meaning of inter-event times or waiting times (with common law \(\nu \)), and define the partial sums \(S_n = T_1 + T_2 + \cdots + T_n\), \(n \ge 1\).
The sequence \(S_1, S_2, \ldots \) denotes the event times at which the state of the Markov chain X(t) attempts to change.
The embedded Markov chain is a discrete-time chain \(X_{n}, n\ge 0\), with state space \(\mathcal {S}\). We assume an initial distribution \(\mu ^{(0)}\), i.e. \(\mathbb {P}\{X_0 =i\} =\mu ^{(0)}_i\), and the chain evolves according to a discrete transition kernel \(q: \mathcal {S}\times \mathcal {S}\rightarrow [0,1]\). As usual, since \(\mathcal {S}\) is finite, the transition kernel may be encoded as a transition matrix \(Q = (q_{i,j})_{1 \le i,j \le |\mathcal {S}|}\). We will assume the chain \(X_n\) is irreducible for convenience of exposition; otherwise, all theorems below can be applied to each irreducible component separately. Moreover, we shall also assume that the chain \(X_n\) is aperiodic. Again, this is a technical point when discussing the convergence to equilibrium, as in the irreducible aperiodic case the n-step distributions converge to the unique invariant measure of the discrete chain.
We couple the embedded chain \(X_n\) with the process X(t) via the counting process \(N_\nu (t) = \max \{ n \ge 0: S_n \le t \}\) (with the convention \(S_0 = 0\)), which gives the number of events from time 0 up to a finite time horizon t. Then we have \(X(t) = X_{N_\nu (t)}\), i.e. the state of the process at time t is the same as that of the embedded chain after the last event before time t occurred.
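The time change above is straightforward to simulate. The sketch below (our own illustrative Python, with a hypothetical two-state transition matrix and exponential waiting times chosen only for concreteness) alternates waiting-time draws and embedded-chain steps:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_semi_markov(Q, mu0, waiting_time_sampler, t_max):
    """Generate a path of X(t) = X_{N(t)}: the embedded chain with
    transition matrix Q observed at renewal epochs S_1 < S_2 < ...
    built from i.i.d. waiting times."""
    state = rng.choice(len(mu0), p=mu0)             # X_0 ~ mu0
    path = [(0.0, state)]
    t = 0.0
    while True:
        t += waiting_time_sampler()                 # next renewal epoch S_{n+1}
        if t > t_max:
            return path                             # X(s) is constant on [S_n, S_{n+1})
        state = rng.choice(Q.shape[1], p=Q[state])  # one embedded step
        path.append((t, state))

# illustrative two-state chain; exponential waiting times recover a Markov chain
Q = np.array([[0.5, 0.5],
              [0.2, 0.8]])
path = simulate_semi_markov(Q, [1.0, 0.0], lambda: rng.exponential(1.0), t_max=10.0)
```

Replacing the exponential sampler with a heavy-tailed one changes nothing in the code, only the law of the epochs \(S_n\).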
All information about X(t) is encoded in the pairs \(\{(X_n, T_n)\}_{n \ge 1}\), which form a discrete-time Markov renewal process: since the waiting times are independent of the chain, \(\mathbb {P}\{X_{n+1} = j, \, T_{n+1} \le t \mid X_n = i\} = q_{i,j} F_\nu (t)\).
\(X(\cdot )\) is then a semi-Markov process subordinated to \(N_\nu (t)\), where we use “subordination” with the meaning of “time change” with an abuse of language. Under the assumption that \(\mu ^{(0)} = \delta _{i}\) (deterministic starting point), the temporal evolution of its transition probabilities satisfies the forward equations
Above we introduced \(p_{i,j}(t) = \mathbb {P}\{X(t) = j \mid X(0) = i\}\), the tail (complementary cumulative distribution function) \(\overline{F}_\nu (t) = 1- F_\nu (t)\), and \(f_\nu (t)\), the Radon-Nikodym derivative of \(\nu \) with respect to the Lebesgue measure (the probability density function, if appropriate smoothness conditions are satisfied). These equations are proved by conditioning on the time of the last event before time t, and it is implicitly assumed that \(t=0\) is a renewal point.
A conditioning argument on the values of \(N_{\nu }(t)\) gives
\(p_{i,j}(t) = \sum _{n \ge 0} \mathbb {P}\{N_\nu (t) = n\}\, q^{(n)}_{i,j}, \qquad (1.6)\)
where \(q^{(n)}_{i,j}\) are the n-step transition probabilities of the embedded discrete Markov chain, namely the entries of the n-th power of the transition matrix \(Q = (q_{i,j})_{1 \le i, j \le |\mathcal {S}|}\).
From the ergodic theorem we have that, for any i, j, \(\displaystyle \lim _{n \rightarrow \infty } q^{(n)}_{i,j} = \pi _{j}\), where \(\pi \) is the unique invariant distribution of the embedded chain.
This is sufficient to argue the following lemma.
Lemma 1
Consider the transition probabilities given by (1.6) and assume that \(\displaystyle \lim _{n \rightarrow \infty } q^{(n)}_{i,j} = \pi _{j}\). Then \(\displaystyle \lim _{t \rightarrow \infty } p_{i,j}(t) = \pi _{j}\).
Proof
Let N be large enough so that, for a given \(\varepsilon >0\), we have \(|q^{(n)}_{i,j} - \pi _j| < \varepsilon \) for all \(n > N\).
Then, substituting in (1.6) we have
As \(t \rightarrow \infty \), the first probability tends to 0, while \(\mathbb {P}\{N_\nu (t) > N\} \rightarrow 1\). Finally, let \(\varepsilon \rightarrow 0\) to obtain
The lower bound follows in a similar manner so we omit details. \(\square \)
This straightforward convergence result is the starting point of our discussion. For discrete Markov chains, there is a substantial body of literature (see [5] and references therein) with quantitative estimates on the convergence; this information is encapsulated in the mixing times of the chain, defined through the total variation distance between the two measures.
1.3 Total variation distance and mixing times for discrete chains
Let \({\mathcal {F}}\) denote the \(\sigma \)-algebra of events of a space \(\Omega \) and \(\mu , \nu \) two probability measures on this space. Then the total variation distance between the two measures is defined as
\(\Vert \mu - \nu \Vert _{TV} = \sup _{A \in {\mathcal {F}}} |\mu (A) - \nu (A)|, \qquad (1.7)\)
and one can show that for countable spaces
\(\Vert \mu - \nu \Vert _{TV} = \frac{1}{2} \sum _{x \in \Omega } |\mu (x) - \nu (x)|. \qquad (1.8)\)
Moreover, the total variation distance between two measures can be given in terms of a different variational formula (coupling):
\(\Vert \mu - \nu \Vert _{TV} = \inf \big \{ \mathbb {P}\{X \ne Y\}: (X, Y) \text { a coupling of } \mu \text { and } \nu \big \}. \qquad (1.9)\)
Both formulas have merit, as (1.7) can be used for a lower bound, while (1.9) for upper bounds on mixing times.
For any \(\varepsilon >0\), we define the mixing time \(T_{\varepsilon }\) of a finite state, aperiodic, irreducible Markov chain to be
\(T_{\varepsilon } = \min \big \{ n \ge 0: \max _{i} \Vert q^{(n)}_{i, \cdot } - \pi _{\cdot } \Vert _{TV} \le \varepsilon \big \}. \qquad (1.10)\)
The fact that \(\Vert q^{(n)}_{i,\cdot } - \pi _\cdot \Vert \) is non-increasing in n means that for all \(N > T_\varepsilon \) we have \(\Vert q^{(N)}_{i, \cdot } - \pi _{\cdot } \Vert \le \varepsilon \), and that \(T_{\varepsilon }\) is non-decreasing as \(\varepsilon \rightarrow 0\). Loosely speaking, for a given tolerance \(\varepsilon \), the mixing time tells us how long it takes the chain to start behaving as if it is near equilibrium.
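For small chains the mixing time of the embedded chain can be computed by brute force: iterate powers of \(Q\) and stop when every row is within \(\varepsilon \) of \(\pi \) in total variation. The matrix below is an illustrative toy example of our own, not one from the text:

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance: half the l1-norm on countable spaces."""
    return 0.5 * np.abs(np.asarray(mu, float) - np.asarray(nu, float)).sum()

def embedded_mixing_time(Q, pi, eps, n_max=10_000):
    """Smallest n with max_i || q^{(n)}_{i,.} - pi ||_TV <= eps."""
    Qn = np.eye(Q.shape[0])
    for n in range(1, n_max + 1):
        Qn = Qn @ Q
        if max(tv_distance(row, pi) for row in Qn) <= eps:
            return n
    raise RuntimeError("chain did not mix within n_max steps")

Q = np.array([[0.5, 0.5],
              [0.2, 0.8]])
pi = np.array([2 / 7, 5 / 7])             # solves pi Q = pi
print(embedded_mixing_time(Q, pi, 1e-3))  # 6 for this chain
```

Here the subdominant eigenvalue of \(Q\) is 0.3, so the worst-row distance is \((5/7)\,0.3^n\) and the threshold \(10^{-3}\) is first met at \(n = 6\).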
Equation (1.9) can be used to obtain an upper bound for the mixing times in the following way. First we construct a coupling between two Markov chains, where \(X_0 \sim \delta _i\) and \(Y_0 \sim \pi \). Both chains evolve independently according to the transition matrix Q, until they meet at some state x, after which the chains jump to the same location together, again according to Q. The marginals of the pair chain \((X_n, Y_n)\) are still those of two Markov chains, so this description is indeed the description of a coupling between the two.
At the instant when the two independent chains meet, the pair Markov chain \((X_n, Y_n)\) hits the diagonal set \(D = \{ (x,x): x \in \mathcal {S}\}\). Let the hitting time of this set be \(\tau _D = \min \{ n \ge 0: (X_n, Y_n) \in D \}\).
Then, using this coupling between the chains and (1.9), one can obtain \(\Vert q^{(n)}_{i, \cdot } - \pi _{\cdot } \Vert _{TV} \le \mathbb {P}\{ \tau _D > n \}\).
At this point the general theory of Markov chains can assist with uniform estimates on the hitting time, irrespective of the initial measure. These can be obtained by using the fact that the two chains evolve independently from one another until they meet at time \(\tau _{D}\), and we have
where \(c_1, c_2\) are uniform constants. In particular this gives the bound
Using only Q one can derive upper bounds for \(\ell ^*_D\), so we treat that as a computable constant. Now, if, overall, the upper bound is less than \(\varepsilon |\mathcal {S}|^{-1}\) for some \(n_\varepsilon \) then (1.10) implies that \(T_{\varepsilon } \le n_\varepsilon \). Forcing the upper bound in the display above to be less than \(\varepsilon |\mathcal {S}|^{-1}\) we have that
which in turn gives that there exists a function \(f(\mathcal {S}, Q)\) such that \(T_{\varepsilon } \le f(\mathcal {S}, Q) \log (\varepsilon ^{-1})\), which shows how the mixing time depends on the order of \(\varepsilon \).
For a lower bound, the most basic method involves counting; it relies on the idea that if the possible locations of a chain after n jumps do not cover a substantial proportion of the state space, we cannot be close to mixing. Then one can get \(T_{\varepsilon } \ge c(Q) \log \big ( (1-\varepsilon ) |\mathcal {S}| \big )\).
The constant c(Q) only depends on the transition matrix. Note that the lower bound above is not necessarily close to the upper bound, and as \(\varepsilon \rightarrow 0\) it gets weaker. The \(\varepsilon \)-order of this bound agrees with that of the upper bound when \(|{\mathcal {S}}| \sim \varepsilon ^{-1}\). Many further methods exist for lower bounds, but they are usually model-dependent. We briefly mention that a suitable \(L^2\) theory exists for reversible, aperiodic, irreducible Markov chains, so that bounds on \(T_{\varepsilon }\) from below are of the same order as the upper bounds,
\(\Big ( \frac{1}{\gamma ^*} - 1 \Big ) \log \frac{1}{2 \varepsilon } \le T_{\varepsilon } \le \frac{1}{\gamma ^*} \log \frac{1}{\varepsilon \, \pi _{\min }},\)
where \(\gamma ^*\) is the spectral gap of Q (the difference between 1 and the second largest eigenvalue \(\lambda _2\)) and \(\pi _{\min } = \min _x \pi (x)\).
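The spectral gap is directly computable for small chains. A minimal sketch, with an illustrative two-state matrix of our own (any two-state chain is reversible, so the \(L^2\) bounds above apply):

```python
import numpy as np

def spectral_gap(Q):
    """gamma* = 1 - |lambda_2|, with lambda_2 the second largest
    eigenvalue of Q in modulus."""
    moduli = np.sort(np.abs(np.linalg.eigvals(Q)))[::-1]
    return 1.0 - moduli[1]

Q = np.array([[0.5, 0.5],
              [0.2, 0.8]])
gap = spectral_gap(Q)       # eigenvalues are 1 and 0.3, so the gap is 0.7
t_rel = 1.0 / gap           # relaxation time, which controls both bounds
```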
2 Results
In this short paper, we will bound mixing times for continuous semi-Markov processes with heavy tails for the distribution of inter-event times. Using Lemma 1 we have that convergence occurs (albeit more slowly than for Markov chains). The global time change performed on the chain will be reflected in the bounds for the mixing times, which we obtain in terms of the mixing times of the embedded discrete chain.
At this point, we want to impose some conditions on the distribution of the inter-event times we are looking at. In particular:
Assumption
We assume there are two uniform constants \(c_1\) and \(c_2\), a \(t_0 > 0\) and a \(\beta >0\) such that
\(c_1 t^{-\beta } \le \overline{F}_\nu (t) \le c_2 t^{-\beta }, \quad \text {for all } t \ge t_0. \qquad (2.1)\)
Note that we are not assuming that any moments exist for the inter-event distribution, as \(\beta \) can lie in (0, 1). In the case where moments exist, the results become sharper.
For any \(\varepsilon >0\) we define the mixing time for the continuous semi-Markov process to be
\(T^{\mathrm{cont}}_{\varepsilon } = \inf \big \{ t \ge 0: \max _i \Vert p_{i, \cdot }(t) - \pi _{\cdot } \Vert _{TV} \le \varepsilon \big \}.\)
By Lemma 1 we know the \(p_{i, \cdot }\) converge to \(\pi \) so the above object is finite and well defined.
2.1 Motivating examples
Example 1
(Diagonalisable transition matrix) In this example, we assume that Q is the transition matrix of an irreducible, aperiodic Markov chain and, in particular, that it is diagonalisable. Let \(\pi \) denote the unique invariant distribution of the Markov chain and recall that \(\pi \) is a left 1-eigenvector for the matrix Q and the vector \(\mathbf{1} = (1, \ldots , 1)\) is a right 1-eigenvector. Since Q is diagonalisable, there exists a matrix L so that \(LQL^{-1} = D\), and without loss of generality we may assume that \(d_{11} = 1\) and that \(\ell _{1j} = \pi _j\). Furthermore, by the Perron-Frobenius theorem, the 1-eigenspace has dimension 1; therefore the first column of \(L^{-1} = ({\tilde{\ell }}_{ij})\) is a right 1-eigenvector of Q and satisfies \({\tilde{\ell }}_{i1} = 1\).
Then \(Q^n = L^{-1}D^nL\) and a coordinate-by-coordinate computation gives \(q^{(n)}_{i,j} = \pi _j + \sum _{k=2}^{|\mathcal {S}|} \lambda _k^n \, {\tilde{\ell }}_{ik} \, \ell _{kj}.\)
The eigenvalues \(\lambda _k\) remaining in the sum all have \(|\lambda _k| < 1\), with the sum vanishing as n grows and the n-step transitions converging to the invariant distribution.
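The spectral decomposition above can be verified numerically; below, \(Q\) is an illustrative matrix of our own and the eigenvector matrix returned by NumPy plays the role of \(L^{-1}\):

```python
import numpy as np

Q = np.array([[0.5, 0.5],
              [0.2, 0.8]])
lam, R = np.linalg.eig(Q)           # columns of R are right eigenvectors
Dn = np.diag(lam ** 10)
Qn = R @ Dn @ np.linalg.inv(R)      # Q^10 through the spectral form
assert np.allclose(Qn, np.linalg.matrix_power(Q, 10))
# the rows of Q^n approach pi as the subdominant eigenvalue term decays
```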
Substituting the last relationship back in (1.6), we have \(p_{i,j}(t) = \pi _j + \sum _{k=2}^{|\mathcal {S}|} {\tilde{\ell }}_{ik} \, \ell _{kj} \, P_{N_\nu (t)}(\lambda _k),\) where \(P_{N_\nu (t)}(\lambda ) = \mathbb {E}\big [ \lambda ^{N_\nu (t)} \big ]\) is the probability generating function of \(N_\nu (t)\).
In particular, the convergence to equilibrium for a finite state space process only depends on the tails of the probability generating function of \(N_\nu (t)\). Then, since \(N_\nu \) is an increasing process and \(|\lambda _k| < 1\), we may bound
Therefore the total variation distance as a function of time only depends on the tails of the probability generating function.
In fact, the following rough estimate can be performed, keeping in mind that \(|\lambda _2| <1\). Let K be such that \(|\lambda _2|^K < \varepsilon /2\); then
\(P_{N_\nu (t)}(|\lambda _2|) \le |\lambda _2|^{K} + \mathbb {P}\{ N_\nu (t) \le K \}.\)
From Lemma 2 below, the second term above decays like \(K^{1+\beta }t^{-\beta }\), so t can be chosen large enough to make this quantity arbitrarily small.
Example 2
(Mittag-Leffler waiting times) This example is taken from [2]. When the waiting times \(T_i\) are Mittag-Leffler with parameter \(\beta \), we have that \(P_{N_\nu (t)}(\lambda ) = E_{\beta }((\lambda -1)t^\beta )\), where \(E_{\beta }\) is the Mittag-Leffler function with parameter \(\beta \in (0,1]\). For large t we have that
and therefore
\(\Vert p_{i, \cdot }(t) - \pi _{\cdot } \Vert _{TV} \le C_{\lambda , \beta , N}\, t^{-\beta }.\)
The total variation distance becomes less than \(\varepsilon > 0\) when \(t \ge (C_{\lambda , \beta , N}/\varepsilon )^{1/\beta }\).
We compute an explicit value for \(C_{\lambda , \beta , N}\) later, in the proof of Theorem 2.
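For simulations, Mittag-Leffler waiting times can be sampled exactly by a known inversion formula (used, e.g., in Monte Carlo schemes for the fractional Poisson process). The sketch below is our own implementation of that transformation, not code from the references:

```python
import numpy as np

rng = np.random.default_rng(1)

def mittag_leffler_sample(beta, size):
    """Sample Mittag-Leffler(beta) waiting times, beta in (0, 1], via
    T = -ln(u) * (sin(beta*pi)/tan(beta*pi*v) - cos(beta*pi))**(1/beta)
    with u, v independent uniforms; beta = 1 recovers exponentials."""
    u = rng.random(size)
    v = rng.random(size)
    bracket = np.sin(beta * np.pi) / np.tan(beta * np.pi * v) - np.cos(beta * np.pi)
    return (-np.log(u)) * bracket ** (1.0 / beta)

samples = mittag_leffler_sample(0.7, size=10_000)
# heavy tail: for beta < 1 the empirical mean keeps growing with the sample size
```

The bracket equals \(\sin (\beta \pi (1-v)) / \sin (\beta \pi v)\), which is strictly positive for \(v \in (0,1)\), so the samples are positive as required.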
We are now ready to state the main theorem.
Theorem 1
Assume (2.1). Let \(\varepsilon > 0\) and let \(T^\mathrm{{emb}}_{\varepsilon /2}\), the \(\varepsilon /2\)-mixing time of the embedded chain, be given by (1.10). Then for any \(\beta > 0 \) we can find an explicit constant \(C_1\) so that
\(T^{\mathrm{cont}}_{\varepsilon } \le C_1 \, \varepsilon ^{-1/\beta } \big ( T^{\mathrm{emb}}_{\varepsilon /2} \big )^{(1+\beta )/\beta }.\)
This theorem is quite general as it makes no further assumptions on the background chain. Moreover, as is often the case for discrete Markov chains, a lot of the sophisticated estimates on mixing times are model dependent, so a theorem like Theorem 1 can utilise those bounds directly.
In the case where the inter-event times are Mittag-Leffler distributed we can make the upper bound sharper.
Theorem 2
Let X(t) be a finite space semi-Markov process for which the inter-event times are Mittag-Leffler\((\beta )\) distributed. Then,
In Figure 1 we see a simulation of the fractional Ehrenfest chain for times before and at the upper bound of the mixing time in Theorem 2.
A natural question arises about lower bounds for \(T^\mathrm{{cont}}_{\varepsilon }\). These are more challenging to obtain for the total variation distance directly. However, by defining a different distance between the measures we can also obtain lower bounds. Let
Note that
and therefore if the expected value (2.5) is less than \(\varepsilon \) then the total variation distance is small. In particular this already gives \(T^{\mathrm{cont}}_{\varepsilon } \le \widetilde{T}^{\mathrm{cont}}_{\varepsilon }\), where \(\widetilde{T}^{\mathrm{cont}}_{\varepsilon }\) is the mixing time defined through the distance (2.5).
Using definition (2.5), we can however find bounds for \(\widetilde{T}^\mathrm{{cont}}_\varepsilon \).
Theorem 3
Assume (2.1). Let \(\delta > 0\) and let \(T^\mathrm{{emb}}_{\delta }\), the \(\delta \)-mixing time of the embedded chain, be given by (1.10). Then for any \(\beta > 0\) and any \(\alpha \in (0,1)\) we can find explicit constants \(C_1 < C_2\) so that
We are now ready to present the proofs in the next section.
3 Mixing times and equilibrium
Lemma 2
Under assumption (2.1), let \(K \in \mathbb {N}\) and let \(t > (t_0 \vee (2c_2)^{1/\beta })K\). Then, there exists a uniform positive constant \(C_0\) so that
Proof
The assumptions of the lemma guarantee that all functions below are well defined, all constants arising from Taylor’s theorem do not depend on t and the error of Taylor’s theorem is small. When \(t > (t_0 \vee (2c_2)^{1/\beta })K\) we have
For a lower bound we can write
The lemma follows from (3.2) and (3.3). The last inequality on the right side of (3.1) comes directly from the assumption. \(\square \)
Proof of Theorem 1
It suffices to prove that for arbitrary \(M < L\) the total variation distance between the transition probabilities and the equilibrium distribution is bounded above as follows:
\(\Vert p_{i, \cdot }(t) - \pi _{\cdot } \Vert _{TV} \le \mathbb {P}\{ N_\nu (t) < M \} + \max _{M \le n \le L} \Vert q^{(n)}_{i, \cdot } - \pi _{\cdot } \Vert _{TV} + \mathbb {P}\{ N_\nu (t) > L \}. \qquad (3.4)\)
Assume for the moment that (3.4) holds and set \(M = T^{\text {emb}}_{\varepsilon /2}\). By the definition of \(T^{\text {emb}}_{\varepsilon /2}\), the middle term on the right-hand side of (3.4) is bounded above by \(\varepsilon /2\). Then let \(L \rightarrow \infty \) so that the third term vanishes.
The left-hand side is then bounded by \(\varepsilon \), and therefore the continuous process is \(\varepsilon \)-mixed, if \(\mathbb {P}\{ N_\nu (t) < T^{\text {emb}}_{\varepsilon /2} \} \le \varepsilon /2\). By Lemma 2 this happens whenever
and therefore
The theorem is proven when we establish (3.4). To this end,
\(\square \)
Proof of Theorem 2
When the counting process \(N_{\beta }(t)\) has Mittag-Leffler(\(\beta \)) inter-event times, we have
where \(B(\cdot ,\cdot )\) is the beta function.
As in Example 2,
Here, \(M_{N_\beta (t)}(s)\) is the moment generating function of the counting process \(N_{\beta }(t)\), while \(E_{\beta }\) is the Mittag-Leffler function with parameter \(\beta \). \(\ell ^*_D\) is defined in equation (1.11).
We will extract mixing time asymptotics by forcing
The equality between these two quantities is a beautiful fact about the Mittag-Leffler function. The derivation of the moment generating function can be found in the book [1] and in [4].
One way to bound the moment generating function from above is
The constant \(\theta \) is to be determined so that each term above is bounded by \(\varepsilon /2\). For the first term we will use the Paley-Zygmund inequality. For any \(\theta \in [0,1]\) we have
The function \(\frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1\) is monotonically decreasing in \(\beta \) and takes values in (0, 1). Therefore there is a unique \(\theta ^*(\beta ) \in (0,1)\) so that \((1-\theta ^*(\beta ))^{-2}\big (\frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1\big )= 1\). For this particular value of \(\theta ^*\) we bound
In particular, we obtain
This is a much better bound for the probability than the one established in Lemma 2. Impose that the upper bound in (3.9) is less than \(\varepsilon /2\) to obtain that
Similarly, set
Combine (3.10) and (3.11) in (3.8), which in turn bounds (3.7), to conclude the relation
as required. \(\square \)
Proof of Theorem 3
Using definition (2.5), we first find a lower bound for \(\widetilde{T}^\mathrm{{cont}}_\varepsilon \). We have that for any positive M,
If we set \(M = \frac{1}{2}T^{\text {emb}}_{\varepsilon ^\alpha }\), we have
and therefore it suffices to have \(\mathbb {P}\{ N_\nu (t) < T^{\text {emb}}_{\varepsilon ^\alpha }/2 \} > \varepsilon ^{1-\alpha }\) in order for the two measures not to be close in the distance (2.5). This is enough to guarantee
At this point we need to separate two cases, depending on the assumption of Lemma 2. If \(\beta < 1\), then the assumption of the lemma requires
in order to use (3.1), while we must also have
so that the lower bound in Lemma 2 is non-negative. Then
In order for both inequalities (3.14) and (3.15) to be satisfied, we need (modulo the constants)
which is true as \(\alpha <1\). Therefore in the case \(\beta <1\)
Now suppose that \(\beta \ge 1\). Then for the estimate in Lemma 2 to be meaningful (i.e. the lower bound is strictly greater than 0), we need that \(t^{\beta }> T^{\text {emb}}_{\varepsilon ^{\alpha }}\). This is guaranteed by the assumption of Lemma 2 and we obtain
Now for the upper bound in the theorem, we can repeat the arguments of Theorem 1. We have
and therefore bound (3.4) and all subsequent arguments work for this distance as well. \(\square \)
References
Baleanu, D., Diethelm, K., Scalas, E., Trujillo, J.J.: Fractional Calculus: Models and Numerical Methods. World Scientific, Singapore (2016)
de Nigris, S., Hastir, A., Lambiotte, R.: Burstiness and fractional diffusion on complex networks. Eur. Phys. J. B 89, Art. 114 (2016)
Georgiou, N., Kiss, I.Z., Scalas, E.: Solvable non-Markovian dynamic network. Phys. Rev. E 92, Art. 042801 (2015)
Laskin, N.: Fractional Poisson process. Commun. Nonlinear Sci. Numer. Simul. 8, 201–213 (2003)
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2009)
Mainardi, F., Gorenflo, R., Scalas, E.: A fractional generalization of the Poisson process. Vietnam J. Math. 32(SI), 53–64 (2004)
Meerschaert, M.M., Toaldo, B.: Relaxation patterns and semi-Markov dynamics. Stoch. Process. Their Appl. 129(8), 2850–2879 (2019)
Raberto, M., Rapallo, F., Scalas, E.: Semi-Markov graph dynamics. PLoS ONE 6(8), Art. e23370 (2011)
Acknowledgements
Both authors were partially supported by the Dr Perry James (Jim) Browne Research Center at the Department of Mathematics, University of Sussex.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Georgiou, N., Scalas, E. Bounds for mixing times for finite semi-Markov processes with heavy-tail jump distribution. Fract Calc Appl Anal 25, 229–243 (2022). https://doi.org/10.1007/s13540-021-00010-2