Bounds for mixing times for finite semi-Markov processes with heavy-tail jump distribution

Consider a Markov chain with finite state space and suppose you wish to change time, replacing the integer step index n with a random counting process N(t). What happens to the mixing time of the Markov chain? We present a partial reply in a particular case of interest, in which N(t) is a counting renewal process with power-law distributed inter-arrival times of index $\beta$. We then focus on $\beta \in (0,1)$, leading to infinite expectation for the inter-arrival times, and further study the situation in which the inter-arrival times follow the Mittag-Leffler distribution of order $\beta$.


Motivation
The original motivation for this paper stems from a 2011 paper [8] where a probabilistic theory for dynamic networks was presented. In particular, given a fixed set of vertices, an embedded Markov chain was considered on the space of all possible graphs connecting the vertices. This discrete-time chain was then transformed into a continuous-time chain by means of a simple time change with a counting process. In a subsequent paper [3], we explicitly solved one of the models presented in [8]. This model is equivalent to an α-delayed version of the Ehrenfest urn chain and the time change is the fractional Poisson process [4] of renewal type [6]. At that time, we initiated a discussion on how this time change for a discrete-time discrete-space Markov chain affects mixing times and the convergence rate to equilibrium. Below we collect results on this point in the interesting case in which the inter-arrival times between two consecutive transitions of the embedded chain have a power-law distribution with index β, also covering the case in which β ∈ (0, 1) meaning that the expected value of the waiting times is infinite. In the latter case, under an appropriate choice of the distribution of inter-arrival times, it is possible to show that the forward Kolmogorov equations can be replaced by a fractional version with Caputo derivative of index β (see e.g. [3] for details in the case mentioned above and [7] for a general theory) when the initial time of the process is a renewal point.
The starting point of our discussion is that the continuous-time probabilities $p_{i,j}(t)$ of being in state $j$ at time $t$, given that the process was in state $i$ at time 0, converge to the same equilibrium distribution as in the case of the embedded chain. Then, in Theorems 1 and 3, we prove upper and lower bounds for the mixing time of the continuous-time chain based on the mixing time of the embedded chain and, in Theorem 2, we specialize the result to the case in which inter-arrival times follow the Mittag-Leffler distribution, where a sharper upper bound is available. We believe these bounds can be useful for applied scientists simulating these processes, for instance to estimate how far from equilibrium their simulations are.

Preliminaries
Let $T_1, T_2, \dots$ be a sequence of independent positive random variables with the meaning of inter-event times or waiting times (with common law $\nu$) and define the partial sums
$$S_n = \sum_{k=1}^{n} T_k. \qquad (1.1)$$
The sequence $S_1, S_2, \dots$ denotes the event times at which the state of the process $X(t)$ attempts to change. The embedded Markov chain is a discrete-time chain $X_n$, $n \ge 1$, with state space $S$. Initially we assume an initial distribution $\mu^{(0)}$, i.e. $P\{X_0 = i\} = \mu^{(0)}_i$, and the chain evolves according to a discrete transition kernel $q : S \times S \to [0,1]$. As usual, since $S$ is finite, the transition kernel may be encoded as a transition matrix $Q = (q_{i,j})_{1 \le i,j \le |S|}$. For convenience of exposition we assume that the chain $X_n$ is irreducible; otherwise, all theorems below apply to each irreducible component separately. Moreover, we shall also assume that the chain $X_n$ is aperiodic. Again, this is a technical point when discussing the convergence to equilibrium, as in the irreducible aperiodic case the $n$-step distributions converge to the unique invariant measure of the discrete chain.
We couple the embedded chain $X_n$ with the process $X(t)$ via the counting process
$$N_\nu(t) = \max\{n \in \mathbb{N} : S_n \le t\} \qquad (1.2)$$
that gives the number of events from time 0 up to a finite time horizon $t$. Then we have
$$X(t) = X_{N_\nu(t)},$$
i.e. the state of the process at time $t$ is the same as that of the embedded chain after the last event before time $t$ occurred. All the information about $X(t)$ is encoded in the pairs $\{(X_n, T_n)\}_{n \ge 1}$, which form a discrete-time Markov renewal process satisfying
$$P\{X_{n+1} = j,\; T_{n+1} \le t \mid X_n = i, \dots\} = q_{i,j}\, F_\nu(t).$$
$X(t)$ is then a semi-Markov process subordinated to $N_\nu(t)$, where we use "subordination" with the meaning of "time change", with an abuse of language. Under the assumption that $\mu^{(0)} = \delta_i$ (deterministic starting point), the temporal evolution of its transition probabilities satisfies the forward equations. Here $p_{i,j}(t) = P\{X(t) = j \mid X(0) = i\}$, $\overline{F}_\nu(t) = 1 - F_\nu(t)$ is the tail (complementary cumulative distribution function) and $f_\nu(t)$ is the Radon-Nikodym derivative of $\nu$ with respect to Lebesgue measure (the probability density function, if appropriate smoothness conditions are satisfied). These equations are proved by conditioning on the time of the last event before time $t$, and it is implicitly assumed that $t = 0$ is a renewal point. A conditioning argument on the values of $N_\nu(t)$ gives
$$p_{i,j}(t) = \sum_{n \ge 0} q^{(n)}_{i,j}\, P\{N_\nu(t) = n\}, \qquad (1.6)$$
where $q^{(n)}_{i,j}$ are the $n$-step transition probabilities of the embedded discrete Markov chain, namely the entries of the $n$-th power of the transition matrix $Q = (q_{i,j})_{1 \le i,j \le |S|}$.
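Representation (1.6) can be checked numerically. The sketch below (plain Python; the two-state transition matrix and the exponential waiting times are hypothetical choices of ours, made only for illustration) estimates $p_{0,0}(t)$ in two ways: by simulating $X(t) = X_{N_\nu(t)}$ directly, and by combining exact matrix powers of $Q$ with a simulated law of $N_\nu(t)$.

```python
import math
import random

def sample(p, rng):
    """Draw an index from the probability vector p."""
    u, acc = rng.random(), 0.0
    for j, pj in enumerate(p):
        acc += pj
        if u < acc:
            return j
    return len(p) - 1

def simulate_X(t, i, Q, wait, rng):
    """State of the time-changed process X(t) = X_{N(t)}, started at i."""
    state, s = i, wait(rng)
    while s <= t:                      # an event fires at time s: the chain moves
        state = sample(Q[state], rng)
        s += wait(rng)
    return state

def nstep_row(Q, i, n):
    """Row i of Q^n, i.e. the n-step transition probabilities q^{(n)}_{i,.}."""
    row = [1.0 if j == i else 0.0 for j in range(len(Q))]
    for _ in range(n):
        row = [sum(row[k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q))]
    return row

Q = [[0.9, 0.1], [0.2, 0.8]]                      # hypothetical embedded chain
wait = lambda rng: -math.log(1.0 - rng.random())  # Exp(1) inter-event times
t, trials = 5.0, 20000

rng = random.Random(1)
lhs = sum(simulate_X(t, 0, Q, wait, rng) == 0 for _ in range(trials)) / trials

# Right-hand side of (1.6): sum_n q^{(n)}_{0,0} P{N(t) = n}, with the law of
# N(t) estimated from an independent stream of waiting times.
rng2 = random.Random(2)
counts = {}
for _ in range(trials):
    n, s = 0, wait(rng2)
    while s <= t:
        n, s = n + 1, s + wait(rng2)
    counts[n] = counts.get(n, 0) + 1
rhs = sum(c * nstep_row(Q, 0, n)[0] for n, c in counts.items()) / trials
```

The two estimates agree up to Monte Carlo error; swapping in a heavy-tailed waiting law only changes the `wait` function.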
From the ergodic theorem we have that, for any $i, j$,
$$\lim_{n \to \infty} q^{(n)}_{i,j} = \pi_j,$$
where $\pi$ is the unique invariant distribution of the embedded chain. This is sufficient to argue the following lemma.

Lemma 1
Consider the transition probabilities given by (1.6) and assume that $\lim_{n \to \infty} q^{(n)}_{i,j} = \pi_j$. Then
$$\lim_{t \to \infty} p_{i,j}(t) = \pi_j.$$

Proof Let $N$ be large enough so that, for a given $\varepsilon > 0$, we have for all $n > N$
$$|q^{(n)}_{i,j} - \pi_j| < \varepsilon.$$
Then, substituting in (1.6), we have
$$p_{i,j}(t) \le P\{N_\nu(t) \le N\} + (\pi_j + \varepsilon)\, P\{N_\nu(t) > N\}.$$
Then, as $t \to \infty$, the first probability tends to 0, while the second tends to 1, so that $\limsup_{t \to \infty} p_{i,j}(t) \le \pi_j + \varepsilon$. The lower bound follows in a similar manner, so we omit the details. Since $\varepsilon > 0$ is arbitrary, the lemma follows. $\square$
This straightforward convergence result is the starting point of this discussion. For discrete Markov chains, there is a substantial body of literature (see [5] and the references therein) examining quantitative estimates on this convergence; the information is encapsulated in the mixing times of the chain, defined through the total variation distance between the two measures.

Total variation distance and mixing times for discrete chains
Let $\mathcal{F}$ denote the $\sigma$-algebra of events of a space and $\mu, \nu$ two probability measures on this space. The total variation distance between the two measures is defined as
$$\|\mu - \nu\|_{TV} = \sup_{A \in \mathcal{F}} |\mu(A) - \nu(A)|, \qquad (1.7)$$
and one can show that for countable spaces
$$\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x} |\mu(x) - \nu(x)|. \qquad (1.8)$$
Moreover, the total variation distance between two measures can be given in terms of a different variational formula (coupling):
$$\|\mu - \nu\|_{TV} = \inf\big\{P\{X \ne Y\} : (X, Y) \text{ is a coupling of } \mu \text{ and } \nu\big\}. \qquad (1.9)$$
Both formulas have merit, as (1.7) can be used for lower bounds, while (1.9) can be used for upper bounds on mixing times. For any $\varepsilon > 0$, we define the mixing time $T_\varepsilon$ of a finite state, aperiodic, irreducible Markov chain to be
$$T_\varepsilon = \min\big\{n \ge 0 : \max_i \|q^{(n)}_{i,\cdot} - \pi\|_{TV} \le \varepsilon\big\}. \qquad (1.10)$$
Note that $\max_i \|q^{(n)}_{i,\cdot} - \pi\|_{TV}$ is non-increasing in $n$, and that $T_\varepsilon$ is non-decreasing as $\varepsilon \to 0$. Loosely speaking, for a given tolerance $\varepsilon$, the mixing time tells us how long it takes the chain to start behaving as if it is near equilibrium.
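Definitions (1.8) and (1.10) translate directly into code. The sketch below computes the mixing time of a toy two-state chain by brute force over matrix powers (the chain is a hypothetical example of ours, not one from the paper).

```python
def tv(mu, nu):
    """Total variation distance via (1.8): half the l1 distance."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

def mixing_time(Q, pi, eps, n_max=100000):
    """Smallest n with max_i || q^{(n)}_{i,.} - pi ||_TV <= eps, as in (1.10)."""
    S = range(len(Q))
    P = [row[:] for row in Q]            # P holds Q^n, starting from n = 1
    for n in range(1, n_max + 1):
        if max(tv(P[i], pi) for i in S) <= eps:
            return n
        P = [[sum(P[i][k] * Q[k][j] for k in S) for j in S] for i in S]
    raise RuntimeError("chain did not mix within n_max steps")

Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical two-state chain
pi = [2 / 3, 1 / 3]            # its invariant distribution: pi Q = pi
T = mixing_time(Q, pi, 0.01)
```

For this chain the worst-case distance after $n$ steps is exactly $(2/3)\,0.7^n$ (the second eigenvalue is $0.7$), so the brute-force answer can be checked by hand.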
Equation (1.9) can be used to obtain an upper bound for the mixing times in the following way. First we construct a coupling between two Markov chains, where $X_0 \sim \delta_i$, $Y_0 \sim \pi$. Both chains evolve according to the transition matrix $Q$ independently, until they meet at some state $x$, after which the chains jump to the same locations together, again according to $Q$. The marginals of the pair chain $(X_n, Y_n)$ are still those of two Markov chains, so this is indeed the description of a coupling between the two.
At the instant when the two independent chains meet, the pair Markov chain $(X_n, Y_n)$ hits the diagonal set
$$D = \{(x, x) : x \in S\}.$$
Let the hitting time of this set be
$$\tau_D = \min\{n \ge 0 : (X_n, Y_n) \in D\}.$$
Then, using this coupling between the chains and (1.9), one can obtain
$$\|q^{(n)}_{i,\cdot} - \pi\|_{TV} \le P\{\tau_D > n\}.$$
At this point the general theory of Markov chains can assist with uniform estimates on the hitting time, irrespective of the initial measure. This can be obtained by using the fact that the two chains act independently from one another - until they meet at time $\tau_D$ - and we have
$$P\{\tau_D > n\} \le c_1 e^{-c_2 n},$$
where $c_1, c_2$ are uniform constants. In particular this gives the bound
$$\|q^{(n)}_{i,\cdot} - \pi\|_{TV} \le c_1 e^{-n/{*_D}}. \qquad (1.11)$$
Using only $Q$ one can derive upper bounds for ${*_D}$, so we treat that as a computable constant. Now, if, overall, the upper bound is less than $\varepsilon |S|^{-1}$ for some $n_\varepsilon$, then (1.10) implies that $T_\varepsilon \le n_\varepsilon$. Forcing the upper bound in the display above to be less than $\varepsilon |S|^{-1}$ we have that
$$n \ge {*_D}\, \log\frac{c_1 |S|}{\varepsilon},$$
which in turn gives that there exists a function $f(S, Q)$ such that
$$T_\varepsilon \le f(S, Q)\, \big(1 + \log \varepsilon^{-1}\big),$$
which shows us how the mixing time depends on the order of $\varepsilon$. For a lower bound, the most basic method involves counting; it relies on the idea that if the possible locations of a chain after $n$ jumps do not cover a substantial proportion of the state space, we cannot be close to mixing. Then one can get
$$T_\varepsilon \ge c(Q)\, \log\big((1 - \varepsilon)|S|\big).$$
The constant $c(Q)$ only depends on the transition matrix. Note that the lower bound above is not necessarily close to the upper bound, and as $\varepsilon \to 0$ it gets weaker. The $\varepsilon$-order of this agrees with the upper bound when $|S| \sim \varepsilon^{-1}$. Many further methods exist for lower bounds, but they are usually model-dependent. We briefly mention that a suitable $L^2$ theory exists for reversible, aperiodic, irreducible Markov chains, so that bounds on $T_\varepsilon$ from below are of the same order as the upper bounds,
$$T_\varepsilon \ge \Big(\frac{1}{\gamma^*} - 1\Big) \log\frac{1}{2\varepsilon},$$
where $\gamma^*$ is the spectral gap of $Q$ (the difference between 1 and the second largest eigenvalue $\lambda_2$).
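The independent coupling described above is easy to simulate. The sketch below estimates $P\{\tau_D > n\}$ by Monte Carlo on a toy chain (our own example) and compares it with the exact total variation distance at step $n$, which the coupling inequality says it must dominate.

```python
import random

def sample(p, rng):
    """Draw an index from the probability vector p."""
    u, acc = rng.random(), 0.0
    for j, pj in enumerate(p):
        acc += pj
        if u < acc:
            return j
    return len(p) - 1

def tail_of_tau(Q, pi, i, n, trials, seed=0):
    """Monte Carlo estimate of P{tau_D > n}: X starts at i, Y starts from pi,
    and both move independently according to Q until they first meet."""
    rng = random.Random(seed)
    not_met = 0
    for _ in range(trials):
        x, y, k = i, sample(pi, rng), 0
        while x != y and k < n:
            x, y, k = sample(Q[x], rng), sample(Q[y], rng), k + 1
        not_met += (x != y)
    return not_met / trials

Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical two-state chain
pi = [2 / 3, 1 / 3]
n = 5
row = [0.0, 1.0]               # exact q^{(n)}_{1,.} by repeated multiplication
for _ in range(n):
    row = [sum(row[k] * Q[k][j] for k in range(2)) for j in range(2)]
tv_n = 0.5 * sum(abs(r - p) for r, p in zip(row, pi))
p_est = tail_of_tau(Q, pi, 1, n, 20000)   # coupling bound: tv_n <= P{tau_D > n}
```

The estimated tail sits above the exact distance, as (1.9) requires; the gap is the price paid for using the lazy independent coupling.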

Results
In this short paper, we will bound mixing times for continuous semi-Markov processes with heavy tails for the distribution of inter-event times. By Lemma 1 we know that convergence to equilibrium still occurs (albeit more slowly than for Markov chains). The global time change we performed on the chain will be reflected in the bounds for the mixing times, as we obtain them in terms of the mixing times of the embedded discrete chain. At this point, we want to impose some conditions on the distribution of the inter-event times we are looking at. In particular:

Assumption We assume there are two uniform constants $c_1$ and $c_2$, a $t_0 > 0$ and a $\beta > 0$ such that
$$c_1 t^{-\beta} \le \overline{F}_\nu(t) \le c_2 t^{-\beta} \qquad \text{for all } t \ge t_0. \qquad (2.1)$$

Note that we are not assuming that any moments exist for the inter-event distributions, as $\beta$ can be in $(0, 1)$. In the case where moments exist, the results sharpen.
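A concrete family satisfying (2.1), with $c_1 = c_2 = t_0^\beta$, is the Pareto law $\overline F_\nu(t) = (t/t_0)^{-\beta}$ for $t \ge t_0$, which is sampled by inverse transform. The sketch below is our own illustration, not a choice made in the paper.

```python
import random

def pareto_wait(u, beta, t0=1.0):
    """Inverse-CDF sample: for U uniform on (0, 1], t0 * U**(-1/beta) has
    survival function (t / t0)**(-beta) for t >= t0, so (2.1) holds with
    c1 = c2 = t0**beta.  For beta in (0, 1) the expected value is infinite."""
    return t0 * u ** (-1.0 / beta)

rng = random.Random(0)
beta = 0.5
waits = [pareto_wait(1.0 - rng.random(), beta) for _ in range(10000)]
frac_above = sum(w > 100.0 for w in waits) / len(waits)  # exact tail: 100**(-0.5) = 0.1
```

The empirical tail at level 100 matches the exact value $100^{-\beta}$ up to Monte Carlo error, while the sample mean never stabilises, reflecting the infinite expectation.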
For any $\varepsilon > 0$ we define the mixing time for the continuous semi-Markov chain to be
$$T^{\mathrm{cont}}_\varepsilon = \inf\big\{t \ge 0 : \max_i \|p_{i,\cdot}(t) - \pi\|_{TV} \le \varepsilon\big\}. \qquad (2.2)$$
By Lemma 1 we know that the $p_{i,\cdot}$ converge to $\pi$, so the above object is finite and well defined.

Motivating examples
Example 1 (Diagonalizable transition matrix) In this example, we make the assumption that $Q$ is the transition matrix of an irreducible, aperiodic Markov chain and, in particular, that it is diagonalisable. Let $\pi$ denote the unique invariant distribution of the Markov chain and recall that $\pi$ is a left 1-eigenvector for the matrix $Q$ and the vector $\mathbf{1} = (1, \dots, 1)$ is a right 1-eigenvector. Since $Q$ is diagonalisable, there exists a matrix $L$ so that $L Q L^{-1} = D$, and without loss of generality we may assume that $d_{1,1} = 1$ and that $\ell_{1,j} = \pi_j$. Furthermore, by the Perron-Frobenius theorem the 1-eigenspace has dimension 1, and therefore the first column of $L^{-1} = (\tilde\ell_{i,j})$ is a right 1-eigenvector of $Q$ and therefore satisfies $\tilde\ell_{i,1} = 1$. Then $Q^n = L^{-1} D^n L$ and a coordinate-by-coordinate computation gives
$$q^{(n)}_{i,j} = \pi_j + \sum_{k \ge 2} \tilde\ell_{i,k}\, \lambda_k^n\, \ell_{k,j}.$$
The eigenvalues $\lambda_k$ remaining in the sum all have $|\lambda_k| < 1$, with the sum vanishing as $n$ grows and the $n$-step transitions converging to the invariant distribution. Substituting the last relation back in (1.6), we have
$$p_{i,j}(t) = \pi_j + \sum_{k \ge 2} \tilde\ell_{i,k}\, \ell_{k,j}\, \mathbb{E}\big[\lambda_k^{N_\nu(t)}\big] = \pi_j + \sum_{k \ge 2} \tilde\ell_{i,k}\, \ell_{k,j}\, P_{N_\nu(t)}(\lambda_k),$$
where $P_{N_\nu(t)}$ denotes the probability generating function of $N_\nu(t)$. In particular, the convergence to equilibrium for a finite state space process only depends on the tails of the probability generating function of $N_\nu(t)$. Then, since $N_\nu$ is an increasing process and $|\lambda_k| \le |\lambda_2| < 1$, we may bound
$$|p_{i,j}(t) - \pi_j| \le C\, P_{N_\nu(t)}(|\lambda_2|).$$
Therefore the total variation distance as a function of time only depends on the tails of the probability generating function. In fact, the following rough estimate can be performed, keeping in mind that $|\lambda_2| < 1$. Let $K$ be such that $|\lambda_2|^K < \varepsilon/2$; then
$$\mathbb{E}\big[|\lambda_2|^{N_\nu(t)}\big] \le |\lambda_2|^K\, P\{N_\nu(t) > K\} + P\{N_\nu(t) \le K\} \le \frac{\varepsilon}{2} + P\{N_\nu(t) \le K\}.$$
From Lemma 2 below, the second term above decays like $K^{1+\beta} t^{-\beta}$, and we may modulate $t$ in order to make this quantity arbitrarily small.

Example 2 (Mittag-Leffler waiting times) This example is taken from [2]. When the waiting times $T_i$ are Mittag-Leffler with parameter $\beta$, we have that
$$P_{N_\nu(t)}(\lambda) = E_\beta\big((\lambda - 1)\, t^\beta\big). \qquad (2.4)$$
The total variation distance becomes less than $\varepsilon > 0$ when $t \ge (C_{\lambda,\beta,N}/\varepsilon)^{1/\beta}$. We compute an explicit value for $C_{\lambda,\beta,N}$ later, in the proof of Theorem 2.
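For simulation purposes, a Mittag-Leffler($\beta$) waiting time can be sampled by a classical transformation of two uniforms; the formula below is the one used in the Monte Carlo CTRW literature (Fulger, Scalas and Germano) and is an assumption of this sketch, not a result of the paper. The survival function $E_\beta(-t^\beta)$, evaluated through its power series, provides a numerical check.

```python
import math
import random

def ml_wait(beta, rng):
    """One Mittag-Leffler(beta) waiting time via the two-uniform transformation
    tau = -ln(u) * (sin(b)/tan(b*v) - cos(b))**(1/beta), with b = beta*pi
    (transformation formula assumed from the CTRW simulation literature)."""
    u, v = 1.0 - rng.random(), 1.0 - rng.random()
    b = beta * math.pi
    return -math.log(u) * (math.sin(b) / math.tan(b * v) - math.cos(b)) ** (1.0 / beta)

def ml_survival(beta, t, terms=80):
    """P{T > t} = E_beta(-t**beta), via the series E_beta(x) = sum_k x^k / Gamma(beta*k + 1)."""
    x = -(t ** beta)
    return sum(x ** k / math.gamma(beta * k + 1) for k in range(terms))

rng = random.Random(7)
beta, t, n = 0.6, 1.0, 40000
emp = sum(ml_wait(beta, rng) > t for _ in range(n)) / n  # compare with ml_survival(beta, t)
```

As $\beta \to 1$ the bracket in `ml_wait` tends to 1 and the sampler reduces to an exponential, recovering the classical Poisson time change.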
We are now ready to state the main theorem.
Theorem 1 Assume (2.1). Let $\varepsilon > 0$ and let $T^{\mathrm{emb}}_{\varepsilon/2}$, the $\varepsilon/2$-mixing time for the embedded chain, be given by (1.10). Then for any $\beta > 0$ we can find an explicit constant $C_1$ so that
$$T^{\mathrm{cont}}_\varepsilon \le C_1 \left(\frac{\big(T^{\mathrm{emb}}_{\varepsilon/2}\big)^{1+\beta}}{\varepsilon}\right)^{1/\beta}.$$

This theorem is quite general, as it makes no further assumptions on the background chain. Moreover, as is often the case for discrete Markov chains, a lot of the sophisticated estimates on mixing times are model-dependent, so a theorem like Theorem 1 can utilise those bounds directly.
In the case where the inter-event times are Mittag-Leffler distributed we can make the upper bound sharper.

Theorem 2 Let X (t) be a finite space semi-Markov process for which the inter-event times are Mittag-Leffler(β) distributed. Then,
In Figure 1 we see a simulation of the fractional Ehrenfest chain for times before and at the upper bound of the mixing time in Theorem 2.
A natural question arises about lower bounds for $T^{\mathrm{cont}}_\varepsilon$. These are more challenging to obtain for the total variation distance directly. However, by defining a different distance between the measures we can also obtain lower bounds. Let
$$\widetilde T_\varepsilon = \inf\big\{t \ge 0 : \max_i \mathbb{E}\big[\|q^{(N_\nu(s))}_{i,\cdot} - \pi\|_{TV}\big] < \varepsilon \text{ for all } s > t\big\}. \qquad (2.5)$$
Note that, by convexity of the total variation norm,
$$\|p_{i,\cdot}(t) - \pi\|_{TV} = \big\|\mathbb{E}\big[q^{(N_\nu(t))}_{i,\cdot}\big] - \pi\big\|_{TV} \le \mathbb{E}\big[\|q^{(N_\nu(t))}_{i,\cdot} - \pi\|_{TV}\big],$$
and therefore if the expected value appearing in (2.5) is less than $\varepsilon$ then the total variation distance is small. In particular, this already gives
$$T^{\mathrm{cont}}_\varepsilon \le \widetilde T_\varepsilon.$$
Using definition (2.5), we can however find two-sided bounds.

Theorem 3 Assume (2.1). Let $\delta > 0$ and let $T^{\mathrm{emb}}_\delta$, the $\delta$-mixing time for the embedded chain, be given by (1.10). Then for any $\beta > 0$ and any $\alpha \in (0, 1)$ we can find explicit constants $C_1 < C_2$ bounding $\widetilde T_\varepsilon$ from below and above in terms of $T^{\mathrm{emb}}_\delta$, $\varepsilon$, $\alpha$ and $\beta$.

We are now ready to present the proofs in the next section.

Mixing times and equilibrium
Lemma 2 Under assumption (2.1), let $K \in \mathbb{N}$ and let $t > (t_0 \vee (2c_2)^{1/\beta})K$. Then there exists a uniform positive constant $C_0$ so that
$$P\{N_\nu(t) \le K\} \le C_0\, K\, \overline F_\nu(t/K) \le C_0\, c_2\, K^{1+\beta}\, t^{-\beta}. \qquad (3.1)$$

Proof The assumptions of the lemma guarantee that all the functions below are well defined, that all constants arising from Taylor's theorem do not depend on $t$, and that the error in Taylor's theorem is small. When $t > (t_0 \vee (2c_2)^{1/\beta})K$ we have the upper bound (3.2), while for a lower bound we can write (3.3). The lemma follows from (3.2) and (3.3). The last inequality on the right side of (3.1) comes directly from assumption (2.1).
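Lemma 2 can be probed numerically: for Pareto waiting times with tail $t^{-\beta}$ and fixed $K$, the probability $P\{N_\nu(t) \le K\}$ should decay like $t^{-\beta}$ in $t$. The Monte Carlo sketch below (parameter choices are hypothetical, ours) compares the estimate at two widely separated times.

```python
import random

def p_few_events(t, K, beta, trials, seed):
    """Monte Carlo estimate of P{N(t) <= K} with Pareto(beta) waiting times
    (tail t^{-beta}, t0 = 1), stopping each path after K + 1 events."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n, s = 0, (1.0 - rng.random()) ** (-1.0 / beta)
        while s <= t and n <= K:
            n, s = n + 1, s + (1.0 - rng.random()) ** (-1.0 / beta)
        hits += (n <= K)
    return hits / trials

beta, K = 0.5, 3
p1 = p_few_events(40.0, K, beta, 20000, seed=1)
p2 = p_few_events(640.0, K, beta, 20000, seed=2)
# multiplying t by 16 should divide the probability by roughly 16**beta = 4
```

The observed ratio is close to, though not exactly, $16^{-\beta}$: at moderate times the pre-asymptotic corrections of Lemma 2 are still visible.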

Proof of Theorem 1
It suffices to prove that for arbitrary $M < L$ the total variation distance between the transition probabilities and the equilibrium distribution is bounded above as follows:
$$\|p_{i,\cdot}(t) - \pi\|_{TV} \le P\{N_\nu(t) < M\} + \max_{M \le n \le L} \|q^{(n)}_{i,\cdot} - \pi\|_{TV} + P\{N_\nu(t) > L\}. \qquad (3.4)$$
Assume for the moment that (3.4) holds and set $M = T^{\mathrm{emb}}_{\varepsilon/2}$. By the definition of $T^{\mathrm{emb}}_{\varepsilon/2}$, the middle term on the right-hand side of (3.4) is bounded above by $\varepsilon/2$. Then let $L \to \infty$ so that the third term vanishes.
The left-hand side is then bounded by $\varepsilon$ - and therefore the continuous process is $\varepsilon$-mixed - if $P\{N_\nu(t) < T^{\mathrm{emb}}_{\varepsilon/2}\} \le \varepsilon/2$. By Lemma 2, this happens whenever
$$t \ge \left(\frac{2 C_0\, \big(T^{\mathrm{emb}}_{\varepsilon/2}\big)^{1+\beta}}{\varepsilon}\right)^{1/\beta}.$$
The theorem is proven once we establish (3.4). To this end, split the sum in (1.6) according to whether $N_\nu(t) < M$, $M \le N_\nu(t) \le L$ or $N_\nu(t) > L$, and bound the total variation distance by 1 on the first and third events and by the maximum over $M \le n \le L$ on the middle one. $\square$

Proof of Theorem 2 When the counting process $N_\beta(t)$ has Mittag-Leffler($\beta$) inter-event times, its first two moments are
$$\mathbb{E}[N_\beta(t)] = \frac{t^\beta}{\Gamma(1+\beta)}, \qquad \mathbb{E}\big[N_\beta(t)^2\big] = \frac{t^\beta}{\Gamma(1+\beta)} + \frac{\beta B(\beta, 1/2)}{2^{2\beta-1}}\, \frac{t^{2\beta}}{\Gamma(1+\beta)^2},$$
where $B(\cdot,\cdot)$ is the beta function. As in Example 2,
$$\|p_{i,\cdot}(t) - \pi\|_{TV} \le c_1\, M_{N_\beta(t)}(-1/{*_D}) = c_1\, E_\beta\big((e^{-1/{*_D}} - 1)\, t^\beta\big).$$
Here, $M_{N_\beta(t)}(s)$ is the moment generating function of the counting process $N_\beta(t)$, while $E_\beta$ is the Mittag-Leffler function with parameter $\beta$, and ${*_D}$ is defined in equation (1.11).
We will extrapolate mixing time asymptotics by forcing the right-hand side above to be at most $\varepsilon$. The equality between these two quantities is a beautiful fact of the Mittag-Leffler function. The derivation of the moment generating function can be found in the book [1] and in [4]. One way to bound the moment generating function from above is
$$M_{N_\beta(t)}(-1/{*_D}) \le P\big\{N_\beta(t) < \theta\, \mathbb{E}[N_\beta(t)]\big\} + e^{-\theta\, \mathbb{E}[N_\beta(t)]/{*_D}}.$$
The constant $\theta$ is to be determined so that each term above is bounded by $\varepsilon/2$. For the first term we will use the Paley-Zygmund inequality: for any $\theta \in [0, 1]$ we have
$$P\big\{N_\beta(t) \ge \theta\, \mathbb{E}[N_\beta(t)]\big\} \ge (1-\theta)^2\, \frac{\big(\mathbb{E}[N_\beta(t)]\big)^2}{\mathbb{E}\big[N_\beta(t)^2\big]}.$$
The function $\frac{\beta B(\beta,1/2)}{2^{2\beta-1}} - 1$ is monotonically decreasing in $\beta$ and takes values in $(0, 1)$; therefore there is a unique $\theta^*(\beta)$ in $(0, 1)$ so that $(1-\theta^*(\beta))^{-2}\big(\frac{\beta B(\beta,1/2)}{2^{2\beta-1}} - 1\big) = 1$. For this particular value $\theta^*$ we bound $P\big\{N_\beta(t) \ge \theta^*(\beta)\, t^\beta/\Gamma(1+\beta)\big\}$ from below. In particular, we obtain a much improved bound for the probability than the one established in Lemma 2. Imposing that the upper bound in (3.9) is less than $\varepsilon/2$, we obtain the statement, as required. $\square$
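The defining equation $(1-\theta^*)^{-2}\big(\frac{\beta B(\beta,1/2)}{2^{2\beta-1}} - 1\big) = 1$ gives the closed form $\theta^*(\beta) = 1 - \sqrt{f(\beta)}$, writing $f(\beta)$ for the decreasing function in the proof (the name $f$ is ours). A short numeric sketch:

```python
import math

def f(beta):
    """f(beta) = beta * B(beta, 1/2) / 2**(2*beta - 1) - 1, which decreases
    from 1 to 0 as beta runs over (0, 1); the name f is ours."""
    B = math.gamma(beta) * math.gamma(0.5) / math.gamma(beta + 0.5)  # beta function
    return beta * B / 2.0 ** (2.0 * beta - 1.0) - 1.0

def theta_star(beta):
    """Unique theta in (0, 1) with (1 - theta)**(-2) * f(beta) = 1."""
    return 1.0 - math.sqrt(f(beta))
```

For instance $f(1/2) = \pi/2 - 1$, so $\theta^*(1/2) = 1 - \sqrt{\pi/2 - 1} \approx 0.244$; the value of $\theta^*$ grows as $\beta$ increases, reflecting the better concentration of $N_\beta(t)$ for larger $\beta$.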

Proof of Theorem 3
Using definition (2.5), we can find a lower bound for $T^{\mathrm{cont}}_\varepsilon$. We have that, for any positive $M$,
$$\mathbb{E}\big[\|q^{(N_\nu(t))}_{i,\cdot} - \pi\|_{TV}\big] \ge \mathbb{E}\big[\|q^{(N_\nu(t))}_{i,\cdot} - \pi\|_{TV}\, \mathbf{1}\{N_\nu(t) < M\}\big] \ge \min_{n < M} \|q^{(n)}_{i,\cdot} - \pi\|_{TV}\; P\{N_\nu(t) < M\},$$
and therefore it suffices to have $P\{N_\nu(t) < T^{\mathrm{emb}}_{\varepsilon^\alpha/2}\} > \varepsilon^{1-\alpha}$ in order for the two measures not to be close in the distance (2.5). This is enough to guarantee the claimed lower bound. $\square$