1 Introduction

1.1 Motivation

The original motivation for this paper stems from a 2011 paper [8] where a probabilistic theory for dynamic networks was presented. In particular, given a fixed set of vertices, an embedded Markov chain was considered on the space of all possible graphs connecting the vertices. This discrete-time chain was then transformed into a continuous-time chain by means of a simple time change with a counting process. In a subsequent paper [3], we explicitly solved one of the models presented in [8]. This model is equivalent to an \(\alpha \)-delayed version of the Ehrenfest urn chain and the time change is the fractional Poisson process [4] of renewal type [6]. At that time, we initiated a discussion on how this time change for a discrete-time discrete-space Markov chain affects mixing times and the convergence rate to equilibrium. Below we collect results on this point in the interesting case in which the inter-arrival times between two consecutive transitions of the embedded chain have a power-law distribution with index \(\beta \), also covering the case \(\beta \in (0,1)\), in which the expected value of the waiting times is infinite. In the latter case, under an appropriate choice of the distribution of inter-arrival times, it is possible to show that the forward Kolmogorov equations can be replaced by a fractional version with Caputo derivative of index \(\beta \) (see e.g. [3] for details in the case mentioned above and [7] for a general theory) when the initial time of the process is a renewal point.

The starting point of our discussion is that the continuous-time probabilities \(p_{i,j} (t)\) of being in state j at time t, given that the process was in state i at time 0, converge to the same equilibrium distribution as for the embedded chain. Then, in Theorems 1 and 3, we prove upper and lower bounds for the mixing time of the continuous-time chain in terms of the mixing time of the embedded chain and, in Theorem 2, we specialize the result to the case in which inter-arrival times follow the Mittag-Leffler distribution, where a sharper upper bound is available. We believe these bounds can be useful for applied scientists simulating these processes, for instance to estimate how far from equilibrium their simulations are.

1.2 Preliminaries

Let \(T_1, T_2, \ldots \) be a sequence of independent positive random variables with the meaning of inter-event times or waiting times (with common law \(\nu \)) and define the partial sum

$$\begin{aligned} S_n = \sum _{k=1}^n T_k, \; \; n\ge 1. \end{aligned}$$
(1.1)

The sequence \(S_1, S_2, \ldots \) denotes the event times at which the state of the process X(t) attempts to change.

The embedded Markov chain is a discrete-time chain \(X_{n}, n\ge 0\), with state space \(\mathcal {S}\). We assume an initial distribution \(\mu ^{(0)}\), i.e.  \(\mathbb {P}\{X_0 =i\} =\mu ^{(0)}_i\), and the chain evolves according to a discrete transition kernel \(q: \mathcal {S}\times \mathcal {S}\rightarrow [0,1]\). As usual, since \(\mathcal {S}\) is finite, the transition kernel may be encoded as a transition matrix \(Q = (q_{i,j})_{1 \le i,j \le |\mathcal {S}|}\). For convenience of exposition we assume that the chain \(X_n\) is irreducible; otherwise, all theorems below can be applied to each irreducible component separately. Moreover, we shall also assume that the chain \(X_n\) is aperiodic. Again, this is a technical point when discussing the convergence to equilibrium, as in the irreducible aperiodic case the n-step transition probabilities converge to the unique invariant distribution of the discrete chain.

We couple the embedded chain \(X_n\) with the process X(t) via the counting process

$$\begin{aligned} N_\nu (t) = \max \{ n \in \mathbb {N}: S_n \le t \} \end{aligned}$$
(1.2)

that gives the number of events from time 0 up to a finite time horizon t. Then we have

$$\begin{aligned} X(t) = X_{N_{\nu }(t)} = \sum _{n \ge 0} X_n 1\!\!1\{ S_n \le t < S_{n+1} \}, \end{aligned}$$
(1.3)

i.e. the state of the process at time t is the same as that of the embedded chain after the last event before time t occurred (with the convention \(S_0 = 0\), so that \(X(t) = X_0\) before the first event).
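For concreteness, the time change (1.3) is straightforward to simulate: draw the embedded chain and its i.i.d. waiting times, then read off X(t) as the state after the last event before t. The following is a minimal sketch in Python; the two-state transition matrix and the exponential waiting-time law are placeholders, not the models discussed later.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_path(Q, nu_sampler, x0, horizon):
    """Simulate the event times S_n and embedded states X_n up to `horizon`.

    Q          : transition matrix of the embedded chain (rows sum to 1)
    nu_sampler : function returning one waiting time T ~ nu
    x0         : initial state (a renewal point is assumed at t = 0)."""
    times, states = [0.0], [x0]
    while times[-1] <= horizon:
        times.append(times[-1] + nu_sampler())              # S_{n+1} = S_n + T_{n+1}
        states.append(rng.choice(len(Q), p=Q[states[-1]]))  # X_{n+1} ~ q_{X_n, .}
    return np.array(times), np.array(states)

def evaluate(times, states, t):
    """X(t): the state after the last event S_n <= t, cf. (1.3)."""
    return states[np.searchsorted(times, t, side="right") - 1]

# placeholder example: two states, exponential(1) waiting times
Q = np.array([[0.3, 0.7], [0.4, 0.6]])
S, X = sample_path(Q, lambda: rng.exponential(1.0), x0=0, horizon=50.0)
print(evaluate(S, X, 10.0))
```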

All information about X(t) is encoded in the pairs \(\{(X_n, T_n)\}_{n \ge 1}\), which form a discrete-time Markov renewal process satisfying

$$\begin{aligned}&\mathbb {P}\{ X_{n+1} = j, T_{n+1} \le t | (X_0, S_0), \ldots , (X_n =i, S_n)\} \nonumber \\&\quad = \mathbb {P}\{ X_{n+1} = j, T_{n+1}\le t | X_n = i \}. \end{aligned}$$
(1.4)

\(X(\cdot )\) is then a semi-Markov process subordinated to \(N_\nu (t)\), where, with an abuse of language, we use “subordination” in the sense of “time change”. Under the assumption that \(\mu ^{(0)} = \delta _{i}\) (deterministic starting point), the temporal evolution of its transition probabilities satisfies the forward equations

$$\begin{aligned} p_{i,j}(t) = {\overline{F}}_\nu (t)\delta _{ij} + \sum _{\ell \in \mathcal {S}} q_{\ell , j} \int _0^t p_{i,\ell }(u)f_{\nu }(t-u)\,du. \end{aligned}$$
(1.5)

Above we introduced \(p_{i,j}(t) = \mathbb {P}\{X(t) = j | X(0) = i\}\), the tail (complementary cumulative distribution function) \(\overline{F}_\nu (t) = 1- F_\nu (t)\), and \(f_\nu (t)\), the Radon-Nikodym derivative of \(\nu \) with respect to the Lebesgue measure (the probability density function if appropriate smoothness conditions are satisfied). These equations are proved by conditioning on the time of the last event before time t; it is implicitly assumed that \(t=0\) is a renewal point.

A conditioning argument on the values of \(N_{\nu }(t)\) gives

$$\begin{aligned} p_{i,j}(t) = {\overline{F}}_\nu (t)\delta _{ij} + \sum _{n=1}^\infty q^{(n)}_{i,j} \mathbb {P}\{N_\nu (t) = n\}, \end{aligned}$$
(1.6)

where \(q^{(n)}_{i,j}\) are the n-step transitions of the embedded discrete Markov chain, namely the entries of the n-th power of the transition matrix \(Q = (q_{i,j})_{1 \le i, j \le |\mathcal {S}|}\).
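When the law of \(N_\nu (t)\) is available in closed form, (1.6) can be evaluated directly by truncating the series. A minimal sketch, assuming exponential(1) inter-event times so that \(N_\nu (t)\) is Poisson(t); the truncation level is a placeholder.

```python
import numpy as np

def p_t(Q, t, n_max=200):
    """Transition matrix (p_{i,j}(t)) from (1.6), assuming exponential(1)
    waiting times, so that N_nu(t) ~ Poisson(t) and the n = 0 term equals
    Fbar_nu(t) * I = e^{-t} I."""
    P, Qn, pmf = np.zeros_like(Q, dtype=float), np.eye(len(Q)), np.exp(-t)
    for n in range(n_max + 1):
        P += pmf * Qn           # add q^{(n)}_{i,j} P{N_nu(t) = n}
        Qn = Qn @ Q
        pmf *= t / (n + 1)      # Poisson(t) pmf, updated recursively
    return P

Q = np.array([[0.3, 0.7], [0.4, 0.6]])
print(p_t(Q, t=5.0))            # rows approach the invariant distribution of Q
```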

From the ergodic theorem we have that for any i, j

$$\begin{aligned} \lim _{n \rightarrow \infty } q^{(n)}_{i,j} = \pi _{j} > 0. \end{aligned}$$

This suffices to prove the following lemma.

Lemma 1

Consider the transition probabilities given by (1.6) and assume that \(\displaystyle \lim _{n \rightarrow \infty } q^{(n)}_{i,j} = \pi _{j}\). Then

$$\begin{aligned} \lim _{t \rightarrow \infty } p_{i,j}(t) = \pi _j. \end{aligned}$$

Proof

Let N be large enough so that, for a given \(\varepsilon >0\), we have for all \(n > N\)

$$\begin{aligned} |q^{(n)}_{i,j} - \pi _{j}| < \varepsilon . \end{aligned}$$

Then, substituting in (1.6) we have

$$\begin{aligned} p_{i,j}(t)&= {\overline{F}}_\nu (t)\delta _{ij} + \sum _{n=1}^\infty q^{(n)}_{i,j} \mathbb {P}\{N_\nu (t) = n\} \\&\le {\overline{F}}_\nu (t)\delta _{ij} + \sum _{n=1}^N q^{(n)}_{i,j} \mathbb {P}\{N_\nu (t) = n\} + \sum _{n=N+1}^\infty (\pi _j +\varepsilon )\mathbb {P}\{N_\nu (t) = n\}\\&\le \mathbb {P}\{N_\nu (t) \le N\} + (\pi _j +\varepsilon )\mathbb {P}\{N_\nu (t) > N\}. \end{aligned}$$

Then, as \(t \rightarrow \infty \), the first probability tends to 0, while \(\mathbb {P}\{N_\nu (t) > N\} \rightarrow 1\). Letting \(\varepsilon \rightarrow 0\) we obtain

$$\begin{aligned} \limsup _{t \rightarrow \infty } p_{i,j}(t) \le \pi _j. \end{aligned}$$

The lower bound follows in a similar manner so we omit details. \(\square \)

This straightforward convergence result is the starting point of our discussion. For discrete Markov chains there is a substantial body of literature (see [5] and references therein) on quantitative estimates of the convergence; this information is encapsulated in the mixing times of the chain, defined via the total variation distance between the two measures.

1.3 Total variation distance and mixing times for discrete chains

Let \({\mathcal {F}}\) denote the \(\sigma \)-algebra of events of a space \(\Omega \) and \(\mu , \nu \) two probability measures on this space. Then the total variation distance between two measures is defined as

$$\begin{aligned} \Vert \mu - \nu \Vert = \sup _{A \in {\mathcal {F}}}|\mu (A) - \nu (A)| \in [0,1] \end{aligned}$$
(1.7)

and one can show that for countable spaces

$$\begin{aligned} \Vert \mu - \nu \Vert = \frac{1}{2}\sum _{x} |\mu (x) - \nu (x)|. \end{aligned}$$
(1.8)

Moreover, the total variation distance between two measures can be given in terms of a different variational formula (coupling):

$$\begin{aligned} \Vert \mu - \nu \Vert = \inf \{ \mathbb {P}\{ X \ne Y \}: (X, Y) \text { is a coupling of } \mu \text { and } \nu \}. \end{aligned}$$
(1.9)

Both formulas have merit, as (1.7) can be used for a lower bound, while (1.9) for upper bounds on mixing times.
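On a finite space, (1.8) is a one-line computation; a minimal sketch in Python (the two distributions are placeholders):

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance on a finite space via (1.8):
    half the L1 distance between the probability vectors."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

print(tv_distance([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))   # 0.1
```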

For any \(\varepsilon >0\), we define the mixing time \(T_{\varepsilon }\) of a finite state, aperiodic, irreducible Markov chain to be

$$\begin{aligned} T_\varepsilon = \inf \left\{ n: \sum _{ i \in {\mathcal {S}}} \Vert q^{(n)}_{i, \cdot } - \pi _\cdot \Vert \le \varepsilon \right\} . \end{aligned}$$
(1.10)

The fact that \(\Vert q^{(n)}_{i,\cdot } - \pi _\cdot \Vert \) is non-increasing in n means that for all \(N > T_\varepsilon \) we have \(\Vert q^{(N)}_{i, \cdot } - \pi _{\cdot } \Vert \le \varepsilon \), and that \(T_{\varepsilon }\) is non-decreasing as \(\varepsilon \rightarrow 0\). Loosely speaking, for a given tolerance \(\varepsilon \), the mixing time tells us how long it takes the chain to start behaving as if it were near equilibrium.
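For a small chain, definition (1.10) can be evaluated directly by iterating powers of Q; a minimal sketch (the chain is a placeholder):

```python
import numpy as np

def mixing_time(Q, pi, eps, n_max=10_000):
    """Smallest n with sum_i || q^{(n)}_{i, .} - pi ||_TV <= eps, cf. (1.10)."""
    Qn = np.eye(len(Q))
    for n in range(1, n_max + 1):
        Qn = Qn @ Q
        if sum(0.5 * np.abs(Qn[i] - pi).sum() for i in range(len(Q))) <= eps:
            return n
    raise RuntimeError("not mixed within n_max steps")

Q = np.array([[0.3, 0.7], [0.4, 0.6]])
pi = np.array([4 / 11, 7 / 11])          # invariant distribution: pi Q = pi
print(mixing_time(Q, pi, eps=1e-3))
```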

Equation (1.9) can be used to obtain an upper bound for the mixing times in the following way. First we construct a coupling of two copies of the chain, where \(X_0 \sim \delta _i\) and \(Y_0 \sim \pi \). Both chains evolve independently according to the transition matrix Q until they meet at some state x, after which they jump together to the same location, again according to Q. The marginals of the pair chain \((X_n, Y_n)\) are still those of the two Markov chains, so this is indeed a coupling of the two.

At the instant where the two independent chains meet, the pair Markov chain \((X_n, Y_n)\) hits the set

$$\begin{aligned} D = \{ (x,x) : x \in {\mathcal {S}} \}. \end{aligned}$$

Let the hitting time of this set be

$$\begin{aligned} \tau _D = \inf \{ n: (X_n, Y_n) \in D \}. \end{aligned}$$

Then, using this coupling between the chains and (1.9) one can obtain

$$\begin{aligned} \Vert q^{(n)}_{i,\cdot } - \pi \Vert \le \mathbb {P} \{ X_n \ne Y_n \}=\mathbb {P}_{\delta _i\otimes \pi }\{ \tau _D > n\}. \end{aligned}$$

At this point the general theory of Markov chains can assist with uniform estimates on the hitting time, irrespective of the initial measure. These can be obtained by using the fact that the two chains move independently of one another until they meet at time \(\tau _{D}\), and we have

$$\begin{aligned} \sup _{\mu } \mathbb {P}_\mu ( \tau _D > n) \le c_1 e^{-c_2 n/ \ell ^*_D}, \quad \ell ^*_D = \max _i \mathbb {E}_{\delta _i}(\tau _D), \end{aligned}$$
(1.11)

where \(c_1, c_2\) are uniform constants. In particular this gives the bound

$$\begin{aligned} \Vert q^{(n)}_{i,\cdot } - \pi \Vert \le c_1 e^{-c_2 n/ \ell ^*_D}. \end{aligned}$$

Using only Q one can derive upper bounds for \(\ell ^*_D\), so we treat it as a computable constant. Now, if the overall upper bound is less than \(\varepsilon |\mathcal {S}|^{-1}\) for some \(n_\varepsilon \), then (1.10) implies that \(T_{\varepsilon } \le n_\varepsilon \). Forcing the upper bound in the display above to be less than \(\varepsilon |\mathcal {S}|^{-1}\) we obtain

$$\begin{aligned} n_\varepsilon > C \ell _{D}^*(-\log \varepsilon + \log |\mathcal {S}|), \end{aligned}$$

which in turn gives that there exists a function \(f(\mathcal {S}, Q)\) such that

$$\begin{aligned} T_\varepsilon \le f({\mathcal {S}}, Q)| \log \varepsilon |, \end{aligned}$$
(1.12)

which shows us how the mixing time depends on the order of \(\varepsilon \).

For a lower bound, the most basic method involves counting; it relies on the idea that if the possible locations of a chain after n jumps do not cover a substantial proportion of the state space, we cannot be close to mixing. Then one can get

$$\begin{aligned} T_\varepsilon > \frac{\log ( |{\mathcal {S}}|(1 - \varepsilon ))}{\log c(Q)}\,. \end{aligned}$$
(1.13)

The constant c(Q) only depends on the transition matrix. Note that the lower bound above is not necessarily close to the upper bound, and it gets weaker as \(\varepsilon \rightarrow 0\). Its \(\varepsilon \)-order agrees with that of the upper bound when \(|{\mathcal {S}}| \sim \varepsilon ^{-1}\). Many further methods exist for lower bounds, but they are usually model-dependent. We briefly mention that a suitable \(L^2\) theory exists for reversible, aperiodic, irreducible Markov chains, so that bounds on \(T_{\varepsilon }\) from below are of the same order as the upper bounds,

$$\begin{aligned} ((\gamma ^* )^{-1}- 1) | \log 2\varepsilon |< T_{\varepsilon } < (\gamma ^* )^{-1} c_{Q} |\log \varepsilon |, \end{aligned}$$

where \(\gamma ^*\) is the spectral gap of Q (the difference between 1 and the second largest eigenvalue \(\lambda _2\)).
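For a concrete reversible chain, the spectral gap and the resulting lower bound of the display above are easy to compute numerically; a minimal sketch (the lazy walk below is a placeholder, and the constant \(c_Q\) of the upper bound is not computed):

```python
import numpy as np

# placeholder reversible chain: lazy random walk on a path with 3 vertices
Q = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

eigvals = np.sort(np.linalg.eigvals(Q).real)[::-1]   # reversible => real spectrum
gap = 1.0 - eigvals[1]                               # spectral gap gamma* = 1 - lambda_2
eps = 0.01
lower = (1.0 / gap - 1.0) * abs(np.log(2 * eps))     # lower bound of the display above
print("spectral gap:", gap, " lower bound on T_eps:", lower)
```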

2 Results

In this short paper we bound mixing times for continuous-time semi-Markov processes whose inter-event time distributions have heavy tails. By Lemma 1, convergence to equilibrium still occurs (albeit more slowly than for Markov chains). The global time change we performed on the chain is reflected in the bounds for the mixing times, which we obtain in terms of the mixing times of the embedded discrete chain.

At this point, we want to impose some conditions on the distribution of the inter-event times we are looking at. In particular:

Assumption

We assume there are two uniform constants \(c_1\) and \(c_2\), a \(t_0 > 0\) and a \(\beta >0\) such that

$$\begin{aligned} \frac{c_1}{t^{\beta }}\le \mathbb {P}\{ T> t \} \le \frac{c_2}{t^{\beta }}, \quad \text { for all }\ \ t > t_0. \end{aligned}$$
(2.1)

Note that we are not even assuming that the inter-event times have a finite mean, as \(\beta \) can be in (0, 1). In the case where moments exist, the results sharpen.
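A standard family satisfying (2.1) is the Pareto law \(\mathbb {P}\{T > t\} = (t_0/t)^{\beta }\) for \(t > t_0\), for which one may take \(c_1 = c_2 = t_0^{\beta }\). A minimal sampling sketch by inverse transform (the parameters are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

def pareto_waiting_time(beta, t0=1.0):
    """T with tail P{T > t} = (t0 / t)^beta for t > t0, so that (2.1) holds
    with c1 = c2 = t0^beta.  Inverse transform: T = t0 * U^{-1/beta}."""
    return t0 * (1.0 - rng.random()) ** (-1.0 / beta)

samples = np.array([pareto_waiting_time(beta=0.7) for _ in range(10_000)])
print(np.mean(samples > 10.0), 10.0 ** -0.7)   # empirical vs. exact tail at t = 10
```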

For any \(\varepsilon >0\) we define the mixing time for the continuous semi-Markov chain to be

$$\begin{aligned} T_{\varepsilon }^{\text {cont}} = \inf \Big \{ t: \sum _{ i \in {\mathcal {S}}} \Vert p_{i, \cdot }(t) - \pi _\cdot \Vert \le \varepsilon \Big \}. \end{aligned}$$
(2.2)

By Lemma 1 we know the \(p_{i, \cdot }\) converge to \(\pi \) so the above object is finite and well defined.

2.1 Motivating examples

Example 1

(Diagonalizable transition matrix) In this example we assume that Q is the transition matrix of an irreducible, aperiodic Markov chain and, in particular, that it is diagonalizable. Let \(\pi \) denote the unique invariant distribution of the Markov chain and recall that \(\pi \) is a left 1-eigenvector of the matrix Q and the vector \(\mathbf{1} = (1, \ldots , 1)\) is a right 1-eigenvector. Since Q is diagonalizable, there exists a matrix L so that \(LQL^{-1} = D\), and without loss of generality we may assume that \(d_{11} = 1\) and that \(\ell _{1j} = \pi _j\). Furthermore, by the Perron-Frobenius theorem the 1-eigenspace has dimension 1, and therefore the first column of \(L^{-1} = ({\tilde{\ell }}_{ij})\) is a right 1-eigenvector of Q and satisfies \({\tilde{\ell }}_{i1} = 1\).

Then \(Q^n = L^{-1}D^nL\) and, writing \(N = |\mathcal {S}|\), a coordinate-by-coordinate computation gives

$$\begin{aligned} q^{(n)}_{ij} =\sum _{k=1}^N {{\tilde{\ell }}}_{ik} \lambda ^{n}_{k}\ell _{kj} = \pi _j + \sum _{k \ne 1} {{\tilde{\ell }}}_{ik} \lambda ^{n}_{k}\ell _{kj}. \end{aligned}$$

The eigenvalues \(\lambda _k\) remaining in the sum all have \(|\lambda _k| < 1\), with the sum vanishing as n grows and the n-step transitions converging to the invariant distribution.

Substituting the last relationship back in (1.6), we have

$$\begin{aligned} p_{i,j}(t)&= {\overline{F}}_\nu (t)\delta _{ij} + \sum _{n=1}^\infty q^{(n)}_{i,j} \mathbb {P}\{N_\nu (t) = n\}\\&= {\overline{F}}_\nu (t)\delta _{ij} + \sum _{n=1}^\infty \left( \pi _j + \sum _{k = 2}^N {{\tilde{\ell }}}_{ik} \lambda ^{n}_{k}\ell _{kj} \right) \mathbb {P}\{N_\nu (t) = n\}\\&= \pi _j (1 - \mathbb {P}\{ N_\nu (t) = 0\}) + {\overline{F}}_\nu (t)\delta _{ij} + \sum _{k=2}^N \sum _{n=1}^\infty {{\tilde{\ell }}}_{ik} \lambda ^{n}_{k} \mathbb {P}\{N_\nu (t) = n\}\ell _{kj}\\&= \pi _j + (\delta _{ij} - \pi _j) {\overline{F}}_\nu (t) + \sum _{k=2}^N {{\tilde{\ell }}}_{ik} \sum _{n=1}^\infty \left( \lambda ^{n}_{k} \mathbb {P}\{N_\nu (t) = n\} \right) \ell _{kj}\\&= \pi _j + \sum _{k=2}^N {{\tilde{\ell }}}_{ik} \sum _{n=0}^\infty \left( \lambda ^{n}_{k} \mathbb {P}\{N_\nu (t) = n\} \right) \ell _{kj}\\&= \pi _j + \sum _{k=2}^N {{\tilde{\ell }}}_{ik} \mathbb {E}\Big (\lambda _k^{N_\nu (t)}\Big )\ell _{kj}\\&=\pi _j + \sum _{k=2}^N {{\tilde{\ell }}}_{ik} P_{N_\nu (t)}(\lambda _k)\ell _{kj}. \end{aligned}$$

In particular, the convergence to equilibrium for a finite state space process only depends on the tails of the probability generating function of \(N_\nu (t)\). Then, since \(N_\nu \) is an increasing process and \(|\lambda _k| \le |\lambda _2| < 1\) for all \(k \ge 2\), we may bound

$$\begin{aligned} \sup _j | p_{i,j}(t) - \pi _j| \le C_N P_{N_\nu (t)}(|\lambda _2|). \end{aligned}$$
(2.3)

Therefore the total variation distance as a function of time only depends on the tails of the probability generating function.

In fact, the following rough estimate can be performed, keeping in mind that \(|\lambda _2| <1\). Let K be such that \(|\lambda _2|^K < \varepsilon /2\). Then

$$\begin{aligned} P_{N_\nu (t)}(|\lambda _2|)&= \mathbb {E}(|\lambda _2|^{N_\nu (t)}\mathbf{1}\{N_\nu (t) > K\}) + \mathbb {E}(|\lambda _2|^{N_\nu (t)}\mathbf{1}\{N_\nu (t) \le K\})\\&\le |\lambda _2|^K + \mathbb {P}\{ N_{\nu }(t) \le K\} \le \varepsilon /2 + \mathbb {P}\{ N_{\nu }(t) \le K\} . \end{aligned}$$

From Lemma 2 below, the second term above decays like \(K^{1+\beta }t^{-\beta }\), so one can tune t in order to make this quantity arbitrarily small.
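The representation \(p_{i,j}(t) = \pi _j + \sum _{k\ge 2}{\tilde{\ell }}_{ik}P_{N_\nu (t)}(\lambda _k)\ell _{kj}\) derived above is easy to check numerically. A minimal sketch, assuming exponential(1) inter-event times so that the probability generating function of \(N_\nu (t)\) is \(P(s) = e^{t(s-1)}\); the two-state chain is a placeholder.

```python
import numpy as np

Q = np.array([[0.3, 0.7], [0.4, 0.6]])
t = 5.0

lam, R = np.linalg.eig(Q)      # columns of R are right eigenvectors, so R = L^{-1}
L = np.linalg.inv(R)           # rows of L are left eigenvectors: L Q L^{-1} = diag(lam)

# p_{i,j}(t) = sum_k R_{ik} E[lam_k^{N(t)}] L_{kj}; the eigenvalue 1 contributes pi_j.
# For a Poisson(t) count, E[s^{N(t)}] = exp(t (s - 1)).
pgf = np.exp(t * (lam - 1.0))
P_t = (R * pgf) @ L            # = R diag(pgf) L
print(P_t.real)                # rows are already close to the invariant distribution
```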

Example 2

(Mittag-Leffler waiting times) This example is taken from [2]. When the waiting times \(T_i\) are Mittag-Leffler with parameter \(\beta \), we have that \(P_{N_\nu (t)}(\lambda )\) \(= E_{\beta }((\lambda -1)t^\beta )\), where \(E_{\beta }\) is the Mittag-Leffler function with parameter \(\beta \in (0,1]\). For large t values we have that

$$\begin{aligned} E_{\beta }((\lambda -1)t^\beta ) \sim C_{\lambda , \beta } t^{-\beta }, \end{aligned}$$

and therefore

$$\begin{aligned} \sup _j | p_{i,j}(t) - \pi _j| \le C_{\lambda _2, \beta } N t^{-\beta }. \end{aligned}$$
(2.4)

The total variation distance becomes less than \(\varepsilon > 0\), when

$$\begin{aligned} t > \left( \frac{C_{\lambda _2, \beta }N}{\varepsilon }\right) ^{1/\beta }. \end{aligned}$$

We compute an explicit value for \(C_{\lambda _2, \beta }\) later, in the proof of Theorem 2.

We are now ready to state the main theorem.

Theorem 1

Assume (2.1). Let \(\varepsilon > 0\) and let \(T^\mathrm{{emb}}_{\varepsilon /2}\), the \(\varepsilon /2\)-mixing time of the embedded chain, be given by (1.10). Then for any \(\beta > 0 \) we can find an explicit constant \(C_1\) so that

$$\begin{aligned} T^\mathrm{{cont}}_{\varepsilon } < C_1 \varepsilon ^{-1/\beta } (T^\mathrm{{emb}}_{\varepsilon /2})^{1+1/\beta }. \end{aligned}$$

This theorem is quite general as it makes no further assumptions on the background chain. Moreover, as is often the case for discrete Markov chains, a lot of the sophisticated estimates on mixing times are model dependent, so a theorem like Theorem 1 can utilise those bounds directly.

In the case where the inter-event times are Mittag-Leffler distributed we can make the upper bound sharper.

Theorem 2

Let X(t) be a finite space semi-Markov process for which the inter-event times are Mittag-Leffler\((\beta )\) distributed. Then,

$$\begin{aligned} T^\mathrm{{cont}}_{\varepsilon } < C_2 \varepsilon ^{-1/\beta } (T^\mathrm{{emb}}_{\varepsilon /2})^{1/\beta }\,. \end{aligned}$$

In Figure 1 we see a simulation of the fractional Ehrenfest chain for times before and at the upper bound of the mixing time in Theorem 2.

Fig. 1 Mittag-Leffler Ehrenfest chain at (left panel) and before (right panel) the bound for \(T^\mathrm{{cont}}_{\varepsilon }\) as given in Theorem 2. The abscissae represent states and the ordinates are their empirical frequencies (blue stars). The expected distribution is binomial and is represented by the red stars
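A sketch of how such a simulation might be produced. The waiting-time sampler below is a commonly used two-uniform formula for Mittag-Leffler(\(\beta \)) variates (its exact form is taken here as an assumption rather than derived), and the embedded chain is an \(\alpha \)-delayed Ehrenfest urn whose equilibrium is Binomial(n, 1/2); all numerical parameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def ml_waiting_time(beta, scale=1.0):
    """Mittag-Leffler(beta) waiting time via a commonly used two-uniform
    formula (assumed here, not derived); reduces to exponential(scale) at beta = 1."""
    u, v = 1.0 - rng.random(), 1.0 - rng.random()
    return -scale * np.log(u) * (np.sin(beta * np.pi) / np.tan(beta * np.pi * v)
                                 - np.cos(beta * np.pi)) ** (1.0 / beta)

def ehrenfest_step(x, n_balls, alpha=0.5):
    """alpha-delayed Ehrenfest urn: stay put with probability alpha, otherwise
    a uniformly chosen ball switches urns."""
    if rng.random() < alpha:
        return x
    return x - 1 if rng.random() < x / n_balls else x + 1

beta, n_balls, horizon = 0.8, 20, 1_000.0
t, x = 0.0, n_balls // 2
while True:
    w = ml_waiting_time(beta)
    if t + w > horizon:
        break
    t, x = t + w, ehrenfest_step(x, n_balls)
# over many independent runs, the empirical law of x approaches Binomial(n_balls, 1/2)
print("state at the time horizon:", x)
```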

A natural question arises about lower bounds for \(T^\mathrm{{cont}}_{\varepsilon }\). These are more challenging to obtain for the total variation distance directly. However, by defining a different distance between the measures we can also obtain lower bounds. Let

$$\begin{aligned} \widetilde{T}^\mathrm{{cont}}_\varepsilon = \inf \{ t: \max _{i} \mathbb {E}\Vert q^{(N_s)}_{i,\cdot } - \pi \Vert < \varepsilon , \,\, \text { for all } s> t \}. \end{aligned}$$
(2.5)

Note that

$$\begin{aligned} \Vert p_{i, \cdot }(t) - \pi \Vert = \Vert \mathbb {E}(q^{(N_t)}_{i, \cdot }) - \pi \Vert \le \mathbb {E}\Vert q^{(N_t)}_{i, \cdot }- \pi \Vert , \end{aligned}$$

and therefore if the expected value in (2.5) is less than \(\varepsilon \) then the total variation distance is small. In particular this already gives

$$\begin{aligned} \widetilde{T}^\mathrm{{cont}}_\varepsilon \ge {T}^\mathrm{{cont}}_\varepsilon . \end{aligned}$$

Using definition (2.5), we can however find bounds for \(\widetilde{T}^\mathrm{{cont}}_\varepsilon \).

Theorem 3

Assume (2.1). Let \(\varepsilon > 0\) and let the mixing times \(T^\mathrm{{emb}}_{\cdot }\) of the embedded chain be given by (1.10). Then for any \(\beta > 0\), and any \(\alpha \in (0,1)\), we can find explicit constants \(C_1 < C_2\) so that

$$\begin{aligned} C_1 \varepsilon ^{(\alpha -1)/\beta } (T^\mathrm{{emb}}_{\varepsilon ^{\alpha }})^{1/\beta }< \widetilde{T}^\mathrm{{cont}}_{\varepsilon } < C_2 \varepsilon ^{-1/\beta } (T^\mathrm{{emb}}_{\varepsilon /2})^{1+1/\beta }. \end{aligned}$$

We are now ready to present the proofs in the next section.

3 Mixing times and equilibrium

Lemma 2

Under assumption (2.1), and writing \(N_\beta (t)\) for the corresponding counting process \(N_\nu (t)\), let \(K \in \mathbb {N}\) and let \(t > (t_0 \vee (2c_2)^{1/\beta })K\). Then, there exists a uniform positive constant \(C_0\) so that

$$\begin{aligned} \frac{c_1 \,K}{t^\beta } - \frac{C_0\,K^{2}}{t^{2\beta }} \le \mathbb {P}\{N_\beta (t) < K \} \le \frac{c_2K^{1+\beta }}{t^\beta } + \frac{C_0\,K^{1+2\beta }}{t^{2\beta }}. \end{aligned}$$
(3.1)

Proof

The assumptions of the lemma guarantee that all functions below are well defined, all constants arising from Taylor’s theorem do not depend on t and the error of Taylor’s theorem is small. When \(t > (t_0 \vee (2c_2)^{1/\beta })K\) we have

$$\begin{aligned} 1 - \mathbb {P}\{N_\beta (t)< K \}&= 1 - \mathbb {P}\left\{ t< \sum ^{K}_{j=1} T_j \right\} \ge 1- \mathbb {P}\left\{ t < K \max _{1 \le j\le K } T_j \right\} \nonumber \\&= 1- \mathbb {P}\left\{ \max _{1 \le j\le K} T_j > \frac{t}{K} \right\} = \left( \mathbb {P}\left\{ T_1 \le \frac{t}{K} \right\} \right) ^{K} \nonumber \\&\ge \exp \left\{ K\log \left( 1 - c_2 \frac{K^\beta }{t^\beta } \right) \right\} \nonumber \\&\ge \exp \left\{ -Kc_2 \frac{K^\beta }{t^\beta } - KC_{\text {up}}c^2_2 \frac{K^{2\beta }}{t^{2\beta }} \right\} , \, \text {for a uniform }C_{\text {up}}, \nonumber \\&\ge 1 - c_2 \frac{K^{1+\beta }}{t^\beta } - C_{\text {up}}c^2_2 \frac{K^{1+2\beta }}{t^{2\beta }} . \end{aligned}$$
(3.2)

For a lower bound we can write

$$\begin{aligned} \mathbb {P}\{N_\beta (t)< K \}&= \mathbb {P}\left\{ t< \sum ^{K}_{j=1} T_j \right\} \ge \mathbb {P}\left\{ t< \max _{1 \le j\le K } T_j \right\} \nonumber \\&= 1 - \left( \mathbb {P}\left\{ T_1 < t \right\} \right) ^{K} \ge 1- \left( 1 - \frac{c_1}{t^{\beta }}\right) ^K \nonumber \\&\ge 1 - \exp \left\{ -Kc_1 \frac{1}{t^\beta }\right\} \nonumber \\&\ge Kc_1 \frac{1}{t^\beta } - C_{{\text {low}}}\left( \frac{K c_1}{t^\beta }\right) ^2, \ \text { for a uniform constant } C_{\text {low}}. \end{aligned}$$
(3.3)

In the last two inequalities of (3.3) we used \(1-x \le e^{-x}\) and \(1 - e^{-y} \ge y - C_{\text {low}}\, y^{2}\), respectively. The lemma follows from (3.2) and (3.3), taking \(C_0 = \max \{C_{\text {up}}c_2^2, C_{\text {low}}c_1^2\}\). \(\square \)
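The bound (3.1) is easy to sanity-check by Monte Carlo for a concrete law satisfying (2.1). A minimal sketch with Pareto waiting times (so that \(c_1 = c_2 = t_0^\beta = 1\)); only the first-order terms of (3.1) are compared, since \(C_0\) is not computed explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, K, t = 0.7, 5, 200.0      # t is well above (t_0 v (2 c_2)^{1/beta}) K

# {N(t) < K} = {T_1 + ... + T_K > t}, with Pareto(beta) waiting times (t_0 = 1)
runs = 100_000
sums = ((1.0 - rng.random((runs, K))) ** (-1.0 / beta)).sum(axis=1)
prob = (sums > t).mean()

print("Monte Carlo P{N(t) < K}            :", prob)
print("first-order terms of (3.1), low/up :", K / t**beta, K ** (1 + beta) / t**beta)
```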

Proof of Theorem 1

It suffices to prove that for arbitrary \(M< L\) the total variation distance between the transition probabilities and the equilibrium distribution is bounded above, according to the following

$$\begin{aligned} \Vert p_{i,\cdot }(t) - \pi \Vert \le \mathbb {P}\{ N_t < M\} + \Vert q^{(M)}_{i,\cdot } - \pi \Vert + \Vert q_{i,\cdot }^{(L)} - \pi \Vert \mathbb {P}\{ N_t > L\}. \end{aligned}$$
(3.4)

Assume for the moment that (3.4) holds and set \(M = T^{\text {emb}}_{\varepsilon /2}\). By the definition of \(T^{\text {emb}}_{\varepsilon /2}\), the middle term on the right-hand side of (3.4) is bounded above by \(\varepsilon /2\). Then let \(L \rightarrow \infty \) so that the third term vanishes.

The left-hand side is then bounded by \(\varepsilon \) (and therefore the continuous process is \(\varepsilon \)-mixed) whenever \(\mathbb {P}\{ N_t < T^{\text {emb}}_{\varepsilon /2} \} \le \varepsilon /2\). By Lemma 2 this happens whenever

$$\begin{aligned} t > \left( \frac{2(c_2 + C_0)}{\varepsilon }\right) ^{1/\beta }\left( T^{\text {emb}}_{\varepsilon /2}\right) ^{1+1/\beta } \vee (t_0 \vee (2c_2)^{1/\beta })T^{\text {emb}}_{\varepsilon /2}, \end{aligned}$$

and therefore

$$\begin{aligned} T^{\text {cont}}_{\varepsilon } < C_1 \varepsilon ^{-1/\beta } (T^{\text {emb}}_{\varepsilon /2})^{1+1/\beta }. \end{aligned}$$
(3.5)

The theorem is proven when we establish (3.4). To this end,

$$\begin{aligned}&2\Vert p_{i,\cdot }(t) - \pi \Vert = \sum _{j \in {\mathcal {S}}}|p_{i,j}(t) - \pi _{j}| \\&\quad = \sum _{j \in {\mathcal {S}}}\left| \sum _{n=0}^{\infty }(q^{(n)}_{i,j} - \pi _{j}) \mathbb {P}\{ N_t = n\}\right| \le \sum _{j \in {\mathcal {S}}}\sum _{n=0}^{\infty }|q^{(n)}_{i,j} - \pi _{j}| \mathbb {P}\{ N_t = n\}\\&\quad \le 2\mathbb {P}\{ N_t< M\} + \sum _{j \in {\mathcal {S}}}\sum _{n=M}^{L}|q^{(n)}_{i,j} - \pi _{j}| \mathbb {P}\{ N_t = n\} + 2\Vert q^{(L)}_{i,\cdot } - \pi \Vert \mathbb {P}\{ N_t> L\}\\&\quad \le 2\mathbb {P}\{ N_t < M\} +2 \Vert q^{(M)}_{i,\cdot } - \pi \Vert \mathbb {P}\{ M \le N_t \le L\} + 2\Vert q^{(L)}_{i, \cdot } - \pi \Vert \mathbb {P}\{ N_t > L\}. \end{aligned}$$

\(\square \)

Proof of Theorem 2

When the counting process \(N_{\beta }(t)\) has Mittag-Leffler(\(\beta \)) inter-event times, we have

$$\begin{aligned} {\bar{n}}_t = \mathbb {E}(N_\beta (t)) = \frac{t^\beta }{\Gamma (1+\beta )}, \quad \mathbb {E}(N^2_\beta (t)) = {\bar{n}}_t + ({\bar{n}}_t)^2 \left[ \frac{\beta B(\beta ,1/2)}{2^{2 \beta -1}} -1 \right] , \end{aligned}$$
(3.6)

where \(B(\cdot ,\cdot )\) is the beta function.

As in Example 2,

$$\begin{aligned} p_{i,j}(t)&= {\overline{F}}_\nu (t)\delta _{ij} + \sum _{n=1}^\infty q^{(n)}_{i,j} \mathbb {P}\{N_\nu (t) = n\} = \sum _{n=0}^\infty q^{(n)}_{i,j} \mathbb {P}\{N_\nu (t) = n\} \nonumber \\&\le \sum _{n=0}^\infty (\pi _j + c_1e^{-n/e\ell ^*_D}) \mathbb {P}\{N_\nu (t) = n\} \nonumber \\&=\pi _j + c_1 M_{N_\nu (t)}(-1/e\ell ^*_D) =\pi _j + c_1 E_{\beta }((e^{-1/e\ell ^*_D}-1)t^\beta ) . \end{aligned}$$
(3.7)

Here, \(M_{N_\beta (t)}(s)\) is the moment generating function of the counting process \(N_{\beta }(t)\), while \(E_{\beta }\) is the Mittag-Leffler function with parameter \(\beta \). \(\ell ^*_D\) is defined in equation (1.11).

We extract the mixing time asymptotics by forcing

$$\begin{aligned} c_1E_{\beta }((e^{-1/e\ell ^*_D}-1)t^\beta ) = c_1 M_{N_\beta (t)}(-1/e\ell ^*_D) < \varepsilon . \end{aligned}$$

The equality between these two quantities is a beautiful fact about the Mittag-Leffler function. The derivation of the moment generating function can be found in the book [1] and in [4].

One way to bound the moment generating function from above is

$$\begin{aligned} M_{N_\nu (t)}(-1/e\ell ^*_D) \le \mathbb {P}\left\{ N_{\beta }(t) \le \theta \frac{t^{\beta }}{\Gamma (1+\beta )} \right\} + e^{- \theta t^{\beta }/e\Gamma (1+\beta )\ell ^*_D}. \end{aligned}$$
(3.8)

The constant \(\theta \) is to be determined so that each term above is bounded by \(\varepsilon /2\). For the first term we will use the Paley-Zygmund inequality. For any \(\theta \in [0,1]\) we have

$$\begin{aligned} \mathbb {P}\left\{ N_{\beta }(t) \ge \theta \frac{t^{\beta }}{\Gamma (1+\beta )} \right\}&\ge \frac{(1 - \theta )^2 \big (\mathbb {E}N_\beta (t)\big )^2}{\text {Var}(N_\beta (t)) + (1 - \theta )^2\mathbb {E}(N^2_\beta (t))}\\&\ge \frac{ (1 - \theta )^2 {\bar{n}}_t^2}{((1-\theta )^2 + 1){\bar{n}}_t + {\bar{n}}_t^2 \left( \frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1 \right) }\\&=\frac{1}{ (1 - \theta )^{-2} \left( \frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1 \right) + (1 + (1 - \theta )^{-2}){\bar{n}}_t^{-1}}. \end{aligned}$$

Here the second inequality uses (3.6) together with the fact that \((1-\theta )^2\big (\frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1\big )\le 1\). The function \(\frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1\) is monotonically decreasing in \(\beta \) and takes values in (0, 1). Therefore there is a unique \(\theta ^*(\beta )\) in (0, 1) so that \((1-\theta ^*(\beta ))^{-2}(\frac{\beta B(\beta , 1/2)}{2^{2\beta -1}} -1)= 1\). For this particular value of \(\theta ^*\) we bound

$$\begin{aligned} \mathbb {P}\Big \{ N_{\beta }(t)&\ge \frac{\theta ^*({\beta })t^{\beta }}{\Gamma (1+\beta )} \Big \} \\&\ge 1 - \frac{(1 + (1 - \theta ^*(\beta ))^{-2})}{\bar{n_t}} = 1 - \frac{(1 + (1 - \theta ^*(\beta ))^{-2})\Gamma (1+\beta )}{t^{\beta }}. \end{aligned}$$

In particular, we obtain

$$\begin{aligned} \mathbb {P}\left\{ N_{\beta }(t) \le \frac{\theta ^*(\beta )t^{\beta }}{\Gamma (1+\beta )} \right\} \le \frac{(1 + (1 - \theta ^*(\beta ))^{-2})\Gamma (1+\beta )}{t^{\beta }} = \frac{C_\beta }{t^{\beta }}. \end{aligned}$$
(3.9)

This is a much better bound on the probability than the one established in Lemma 2. Imposing that the upper bound in (3.9) is less than \(\varepsilon /2\) we obtain

$$\begin{aligned} t > \left( \frac{2C_\beta }{\varepsilon }\right) ^{1/\beta }. \end{aligned}$$
(3.10)

Similarly, set

$$\begin{aligned} e^{- \theta ^*(\beta ) t^{\beta }/e\Gamma (1+\beta )\ell ^*_D} < \varepsilon /2 \Longleftrightarrow t > \left( \frac{e\Gamma (1+\beta )}{\theta ^*(\beta )}\ell _D^*\log \frac{2}{\varepsilon }\right) ^{1/\beta }. \end{aligned}$$
(3.11)

Combining (3.10) and (3.11) in (3.8), which in turn bounds (3.7), we conclude the relation

$$\begin{aligned} T^\mathrm{{cont}}_\varepsilon \le \left( \max \left\{ C_\beta , \frac{e\Gamma (1+\beta )}{\theta ^*(\beta )}\ell _D^*\right\} \frac{2c_1}{\varepsilon }\right) ^{1/\beta }, \end{aligned}$$
(3.12)

as required. \(\square \)
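For a concrete value of \(\beta \), the constants appearing in the proof above are explicit. A minimal sketch computing \(\theta ^*(\beta )\), \(C_\beta \) from (3.9) and the right-hand side of (3.12); the values of \(\ell ^*_D\) and \(c_1\) are placeholders that would come from the embedded chain.

```python
import math

def theorem2_constants(beta):
    """theta*(beta) solving (1 - theta)^2 = beta B(beta, 1/2) / 2^(2 beta - 1) - 1,
    and C_beta from (3.9)."""
    B = math.gamma(beta) * math.gamma(0.5) / math.gamma(beta + 0.5)  # B(beta, 1/2)
    a = beta * B / 2 ** (2 * beta - 1) - 1                           # lies in (0, 1)
    theta_star = 1.0 - math.sqrt(a)
    C_beta = (1.0 + 1.0 / a) * math.gamma(1.0 + beta)
    return theta_star, C_beta

beta, eps = 0.8, 0.01
ell_star, c1 = 10.0, 1.0                     # placeholders from the embedded chain
theta_star, C_beta = theorem2_constants(beta)
bound = (max(C_beta, math.e * math.gamma(1 + beta) * ell_star / theta_star)
         * 2 * c1 / eps) ** (1 / beta)       # right-hand side of (3.12)
print(theta_star, C_beta, bound)
```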

Proof of Theorem 3

Using definition (2.5), we first find a lower bound for \(\widetilde{T}^\mathrm{{cont}}_\varepsilon \). We have that, for any positive M,

$$\begin{aligned} \mathbb {E}\Vert q^{(N_t)}_{i, \cdot }- \pi \Vert \ge \mathbb {E}(\Vert q^{(N_t)}_{i, \cdot }- \pi \Vert \mathbf{1}\{ N_t< M \}) \ge \Vert q^{(M)}_{i, \cdot }- \pi \Vert \mathbb {P}\{ N_t < M \}. \end{aligned}$$
(3.13)

If we set \(M = \frac{1}{2}T^{\text {emb}}_{\varepsilon ^\alpha }\), we have

$$\begin{aligned} \mathbb {E}\Vert q^{(N_t)}_{i, \cdot }- \pi \Vert \ge \varepsilon ^{\alpha }\mathbb {P}\Big \{ N_t < \frac{1}{2}T^{\text {emb}}_{\varepsilon ^\alpha } \Big \}, \end{aligned}$$

and therefore it suffices to have \(\mathbb {P}\{ N_t < T^{\text {emb}}_{\varepsilon ^\alpha }/2 \} > \varepsilon ^{1-\alpha }\) in order for the two measures not to be close in the distance (2.5). This is enough to guarantee

$$\begin{aligned} \widetilde{T}^{\text {cont}}_{\varepsilon } \ge \sup \Big \{ t: \mathbb {P}\Big \{ N_{t} < \frac{1}{2}T^{\text {emb}}_{\varepsilon ^{\alpha }}\Big \} \ge \varepsilon ^{1-\alpha }\Big \}. \end{aligned}$$

At this point we need to separate two cases, depending on the assumption of Lemma 2. If \(\beta < 1\), then the assumption of the lemma requires

$$\begin{aligned} t > C(t_0, c_2, \beta ) T^{\text {emb}}_{\varepsilon ^{\alpha }} \end{aligned}$$

in order to use (3.1), while we must also have

$$\begin{aligned} t^{\beta } > C(t_0, c_2, \beta ) T^{\text {emb}}_{\varepsilon ^{\alpha }}, \end{aligned}$$
(3.14)

so that the lower bound in Lemma 2 is non-negative. Then

$$\begin{aligned} \varepsilon ^{1-\alpha }< \tilde{C}(t_0, c_2, \beta ) \frac{T^{\text {emb}}_{\varepsilon ^{\alpha }}}{t^\beta } \Longleftrightarrow t < \tilde{C}(t_0, c_2, \beta ) \varepsilon ^{(\alpha -1)/\beta }(T^{\text {emb}}_{\varepsilon ^{\alpha }})^{1/\beta }. \end{aligned}$$
(3.15)

In order for both inequalities (3.14) and (3.15) to be satisfied, we need (modulo the constants)

$$\begin{aligned} 1 < \varepsilon ^{(\alpha -1)/\beta } \end{aligned}$$

which is true as \(\alpha <1\). Therefore in the case \(\beta <1\)

$$\begin{aligned} \widetilde{T}^{\text {cont}}_{\varepsilon } > C_1 \varepsilon ^{(\alpha -1)/\beta } (T^{\text {emb}}_{\varepsilon ^{\alpha }})^{1/\beta }. \end{aligned}$$

Now suppose that \(\beta \ge 1\). Then for the estimate in Lemma 2 to be meaningful (i.e. the lower bound is strictly greater than 0), we need that \(t^{\beta }> T^{\text {emb}}_{\varepsilon ^{\alpha }}\). This is guaranteed by the assumption of Lemma 2 and we obtain

$$\begin{aligned} \widetilde{T}^{\text {cont}}_{\varepsilon } {\mathop {>}\limits ^{\text {Lemma } 2}} C_1 \varepsilon ^{(\alpha -1)/\beta } (T^{\text {emb}}_{\varepsilon ^{\alpha }})^{1/\beta }. \end{aligned}$$

Now for the upper bound in the theorem, we can repeat the arguments of Theorem 1. We have

$$\begin{aligned} \mathbb {E}\Vert q^{(N_t)}_{i, \cdot } - \pi \Vert = \sum _{n=0}^{\infty }\Vert q^{(n)}_{i, \cdot } - \pi \Vert \mathbb {P}\{N_t = n\}, \end{aligned}$$

and therefore bound (3.4) and all subsequent arguments work for this distance as well. \(\square \)