Mixing Times and Hitting Times for General Markov Processes

The hitting and mixing times are two fundamental quantities associated with Markov chains. Peres and Sousi [PS2015] and Oliveira [Oli2012] show that the mixing times and "worst-case" hitting times of reversible Markov chains on finite state spaces are equal up to a universal multiplicative constant. We use tools from nonstandard analysis to extend this result to reversible Markov chains on general state spaces that satisfy the strong Feller property. Finally, we show that this asymptotic equivalence can be used to find bounds on the mixing times of a large class of Markov chains used in MCMC, such as typical Gibbs samplers and Metropolis-Hastings chains, even though they usually do not satisfy the strong Feller property.

1. Introduction. Two of the most-studied quantities in the Markov chain literature are the mixing time and hitting times associated with a chain. In [35,34], the authors showed that these quantities are equal up to universal constants for reversible discrete-time Markov processes with finite state space. In this paper, we use nonstandard analysis to extend this result to discrete-time Markov processes on σ-compact state spaces that satisfy the strong Feller property (see Assumption 1). As in the context of [35,34], it is generally easier to get upper bounds on hitting times, and it is generally easier to get lower bounds on mixing times. These results let us estimate whichever is more convenient.
Recall that the mixing time measures the number of steps a Markov chain must take to (approximately) forget its initial condition. This quantity is fundamental in computer science and computational statistics, where it measures the efficiency of algorithms based on Markov chains; it is also important in the statistical physics literature, where it provides a way to qualitatively describe the behaviour of a material (see e.g. overviews [28,18,32,3,30]). The hitting time measures the number of steps a Markov chain must take before entering a set for the first time. This quantity is not always directly relevant for applications, but it is usually easier to compute or estimate and many tools have been developed for estimating hitting times and relating them to other quantities of interest (see e.g. the role of hitting time calculations in the theory of metastability [11]).
1.1. Nonstandard Analysis. In this paper, we extend known results about Markov processes with a finite state space to those with a continuum state space. Our arguments are based on nonstandard analysis, which allows construction of a single object, a hyperfinite probability space, that satisfies all the first-order logic properties of a finite probability space but can simultaneously be viewed via the Loeb construction ([29]) as a measure-theoretic probability space. This construction often allows one to make discrete arguments about the hyperfinite probability space, and then use the Loeb construction to express the results in measure-theoretic terms.
In order to do this, one has to establish appropriate notions of liftings (hyperfinite processes that sit "above" the measure-theoretic objects of interest) and pushdowns (projections of hyperfinite objects to the measure-theoretic objects). These liftings and pushdowns form a "dictionary" that must be chosen specifically to represent the type of probabilistic process of interest. Dictionaries for Lebesgue integration, Brownian motion and Itô integration were given in [5] and [6], for stochastic differential equations in [27], and for Markov chains in [19]. One of the main contributions of this paper is an expansion of the dictionary for Markov chains in [19]. This expansion lets us translate the proofs of existing discrete results to obtain several new results, and we anticipate it being useful for the translation of further Markov chain results in the future. Most algorithms used in MCMC do not satisfy the strong Feller condition, and so our main result, Theorem 2.2, does not apply directly. In Section 7, we explain how our main result can still be applied to popular MCMC chains.
We note that most chains used in MCMC are geometrically ergodic but do not have finite mixing times. Our main results can still be applied in this situation, and this is the subject of a companion paper.

1.2. Equivalence and Sensitivity.
There are many different ways to measure the time it takes for a Markov chain to "get random." The present paper belongs to the large literature, started in [2,4], devoted to understanding how much different measures of this time can disagree.
These "equivalence" results are closely related to the problem of studying the sensitivity of Markov chains to qualitatively-small changes (see e.g. [1,24]) and to the study of perturbations of Markov chains (see e.g. [31,25,36,38,9,33]). While perturbations have been studied on very general state spaces, to our knowledge all research related to sensitivity has focused on Markov chains on discrete state spaces.
Finally, the relationship between hitting and mixing times has been refined since [35,34]; see e.g. [10].

1.3. Overview of the Paper. In Section 2, we give basic definitions and inequalities related to mixing and hitting times. We also state the main results.
In Section 3 and Section 4, we introduce hyperfinite representations for probability spaces and discrete-time Markov processes developed in [19]. Namely, we show that, for every discrete-time Markov process satisfying appropriate conditions, there exists a corresponding hyperfinite Markov process.
In Section 5, we show that the mixing times and hitting times of a discrete-time Markov process on a compact state space can be approximated by the mixing times and hitting times of its corresponding hyperfinite Markov process. This leads to a proof in Section 5.3 that mixing times and hitting times are asymptotically equivalent for discrete-time Markov processes on a compact state space satisfying Assumption 1. We extend to σ-compact spaces in Section 6.
Finally, in Section 7 and Section B we show how to apply our results to some popular chains from statistics.
Various elementary proofs and lemmas are deferred to the appendices.

2. Preliminaries and Main Results. We fix a σ-compact metric state space X endowed with Borel σ-algebra B[X] and let {P_x(·)}_{x∈X} denote the transition kernel of a Markov process with unique stationary measure π. Throughout the paper, we include 0 in N. For x ∈ X, t ∈ N and A ∈ B[X], we write P^{(t)}_x(A) or P^{(t)}(x, A) for the transition probability from x to A in t steps. We write P_x(A) and P(x, A) as abbreviations for P^{(1)}_x(A) and P^{(1)}(x, A), respectively. Recall that {P_x(·)}_{x∈X} is said to be reversible if

∫_A P(x, B) π(dx) = ∫_B P(x, A) π(dx)

for all A, B ∈ B[X]. For probability measures µ and ν on (X, B[X]), we write

‖µ(·) − ν(·)‖ = sup_{A∈B[X]} |µ(A) − ν(A)|,   (2.2)

the usual total variation distance between µ and ν. Our main result will require the following continuity condition on {P_x(·)}_{x∈X}.

Assumption 1 (DSF). The transition kernel {P_x(·)}_{x∈X} satisfies the strong Feller property if for every x ∈ X and every ε > 0 there exists δ > 0 such that

(∀y ∈ X)(d(x, y) < δ ⟹ ‖P_x(·) − P_y(·)‖ < ε).
For t ∈ N, let d(t) = sup_{x∈X} ‖P^{(t)}_x(·) − π(·)‖, and for ε ∈ R_{>0} define the mixing time t_m(ε) = min{t ∈ N : d(t) ≤ ε}. It is clear that d(t) is a non-increasing function. The lazy kernel {P_L(x, ·)}_{x∈X} associated with a transition kernel {P(x, ·)}_{x∈X} is given by

P_L(x, A) = (1/2) δ_x(A) + (1/2) P(x, A)

for every x ∈ X and every A ∈ B[X], where δ_x denotes the point mass at x. For ε ∈ R_{>0}, we denote the mixing time of the lazy chain by t_L(ε). For notational convenience, we will simply write t_L and t_m when ε = 1/4. We now denote by {X_t}_{t∈N} a Markov chain with transition kernel {P_x(·)}_{x∈X} and arbitrary starting point X_0 = x_0 ∈ X. Recall that the hitting time of a set A ∈ B[X] for this Markov chain is defined to be

τ_A = min{t ∈ N : X_t ∈ A}.

We now introduce the maximum hitting time of large sets.
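These quantities all have direct finite analogues. The sketch below, a toy example of ours (the 3-state chain and all function names are illustrative, not from the paper), computes the total variation distance, d(t), the mixing time, and an expected hitting time for a small reversible chain:

```python
import numpy as np

# Toy reversible chain: a lazy random walk on the path 0-1-2.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])   # stationary: pi @ P == pi

def tv(mu, nu):
    # Total variation distance sup_A |mu(A) - nu(A)| = (1/2) sum |mu - nu|.
    return 0.5 * np.abs(mu - nu).sum()

def d(t):
    # d(t) = sup_x ||P_x^{(t)}(.) - pi(.)||
    Pt = np.linalg.matrix_power(P, t)
    return max(tv(Pt[x], pi) for x in range(len(pi)))

def mixing_time(eps=0.25):
    # t_m(eps) = min{t : d(t) <= eps}
    t = 0
    while d(t) > eps:
        t += 1
    return t

def expected_hitting_time(A, x):
    # E_x[tau_A], obtained by solving (I - Q) h = 1 on the complement of A.
    rest = [s for s in range(P.shape[0]) if s not in A]
    if x in A:
        return 0.0
    Q = P[np.ix_(rest, rest)]
    h = np.linalg.solve(np.eye(len(rest)) - Q, np.ones(len(rest)))
    return h[rest.index(x)]
```

For this chain the one-step distributions are already within total variation 1/4 of π, and the expected time to reach state 2 from state 0 solves a small linear system.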
imsart-aap ver. 2014/10/16 file: Mixing_Extended_Version.tex date: April 5, 2019

Definition 3. Let α ∈ R_{>0}. The maximum hitting time with respect to α is

t_H(α) = max_{x∈X, A∈B[X]: π(A)≥α} E_x[τ_A],

where E is the expectation of a measure in the product space which generates the underlying Markov process and its subscript refers to the starting point X_0.
We now quote the main results from [35,34]:

Theorem 2.1 ([35,34]). For every 0 < α < 1/2, there exist universal positive constants c′_α, c_α such that, for every reversible Markov chain on a finite state space, c′_α t_H(α) ≤ t_L ≤ c_α t_H(α).

Throughout the paper, we denote by M the collection of discrete-time reversible transition kernels with a stationary distribution on a σ-compact metric state space satisfying Assumption 1. Note that transition kernels on finite state spaces belong to M. The main result of this paper generalizes Theorem 2.1 to M:

Theorem 2.2. For every 0 < α < 1/2, there exist universal positive constants d′_α, d_α such that, for every transition kernel in M, d′_α t_H(α) ≤ t_L ≤ d_α t_H(α).

The first inequality in Theorem 2.2 is straightforward and well-known (see e.g. Lemma 10). The second is more difficult. The compact version of Theorem 2.2 is proved in Theorem 5.5 and the general version is proved in Theorem 6.1.
2.1. Equivalent Forms of Mixing Times and Hitting Times. In this section, we define several quantities that are asymptotically equivalent to the mixing times and the maximum hitting times defined in the previous section. These equivalent forms play important roles throughout the entire paper, since they are easier to work with for general Markov processes. First, let

d̄(t) = sup_{x,y∈X} ‖P^{(t)}_x(·) − P^{(t)}_y(·)‖.

We recall two important results on d̄(t):

Lemma 1 ([28, Lemma 4.11]). For every t ∈ N, we have d(t) ≤ d̄(t) ≤ 2d(t).
Lemma 2 ([28, Lemma 4.12]). The function d̄ is sub-multiplicative. That is, for s, t ∈ N, d̄(s + t) ≤ d̄(s) d̄(t).

For every ε ∈ R_{>0}, define the standardized mixing time to be t̄_m(ε) = min{t ∈ N : d̄(t) ≤ ε}. Similarly, we define t̄_L(ε) for the lazy chain. For convenience, we write t̄_m and t̄_L when ε = 1/4. The following well-known equivalence between mixing times and standardized mixing times follows immediately from Lemmas 1 and 2:

Lemma 3. t_m ≤ t̄_m ≤ 2t_m, and likewise t_L ≤ t̄_L ≤ 2t_L.

Next, we define the large hitting time:

Definition 4. Let α ∈ R_{>0}. The large hitting time with respect to α is where P is a measure in the product space which generates the underlying Markov process and its subscript gives the starting point of the Markov process.
Unsurprisingly, the maximum hitting time is asymptotically equivalent to the large hitting time:

Lemma 4. For every α ∈ R_{>0} and every Markov process, we have

The following is an immediate consequence of Lemma 4 and Theorem 2.1.
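For a finite chain, both d(t) and d̄(t) are cheap to compute directly, and the inequalities in Lemmas 1 and 2 can be checked numerically. The chain below is an illustrative choice of ours, not one taken from the paper:

```python
import numpy as np

# Illustrative reversible 3-state chain with stationary distribution pi.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

def tv(mu, nu):
    # Total variation distance between two finite distributions.
    return 0.5 * np.abs(mu - nu).sum()

def d(t):
    # d(t) = sup_x ||P_x^{(t)}(.) - pi(.)||
    Pt = np.linalg.matrix_power(P, t)
    return max(tv(row, pi) for row in Pt)

def d_bar(t):
    # d_bar(t) = sup_{x,y} ||P_x^{(t)}(.) - P_y^{(t)}(.)||
    Pt = np.linalg.matrix_power(P, t)
    return max(tv(Pt[x], Pt[y]) for x in range(3) for y in range(3))
```

Numerically one can verify d(t) ≤ d̄(t) ≤ 2d(t) (Lemma 1) and d̄(s+t) ≤ d̄(s)d̄(t) (Lemma 2) for this chain.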
Then there exist universal positive constants c′_α, c_α so that for every finite reversible Markov process

2.2. Notation from Nonstandard Analysis. In this paper, we use nonstandard analysis, a powerful machinery derived from mathematical logic, as our main toolkit. For readers who are not familiar with nonstandard analysis, [19,20] provide reviews tailored to statisticians and probabilists, and [8,16,40] provide thorough introductions.
We briefly introduce the setting and notation from nonstandard analysis. We use * to denote the nonstandard extension map taking elements, sets, functions, relations, etc., to their nonstandard counterparts. In particular, *R and *N denote the nonstandard extensions of the reals and natural numbers, respectively. An element r ∈ *R is infinite if |r| > n for every n ∈ N and is finite otherwise. An element r ∈ *R with r > 0 is infinitesimal if r^{−1} is infinite. For r, s ∈ *R, we use the notation r ≈ s as shorthand for the statement "|r − s| is infinitesimal," and similarly we use r ≳ s as shorthand for the statement "either r ≥ s or r ≈ s." Given a topological space (X, T), the monad of a point x ∈ X is the set ⋂_{U∈T: x∈U} *U. An element x ∈ *X is near-standard if it is in the monad of some y ∈ X. We say y is the standard part of x and write y = st(x); note that such y is unique. We use NS(*X) to denote the collection of near-standard elements of *X and we say NS(*X) is the near-standard part of *X. The standard part map st is a function from NS(*X) to X, taking near-standard elements to their standard parts. In both cases, the notation elides the underlying space X and the topology T, because the space and topology will always be clear from context. For a metric space (X, d), two elements x, y ∈ *X are infinitely close if *d(x, y) ≈ 0. An element x ∈ *X is near-standard if and only if it is infinitely close to some y ∈ X. An element x ∈ *X is finite if *d(x, y) is finite for some y ∈ X and is infinite otherwise. Let X be a topological space endowed with Borel σ-algebra B[X] and let M(X) denote the collection of all finitely additive probability measures on (X, B[X]). An internal probability measure µ on (*X, *B[X]) is an element of *M(X). Namely, an internal probability measure µ on (*X, *B[X]) is an internal function from *B[X] to *[0, 1] such that:
1. µ(∅) = 0;
2. µ(*X) = 1; and
3. µ is hyperfinitely additive.
The Loeb space of the internal probability space (*X, *B[X], µ) is a countably additive probability space (*X, σ(*B[X]), µ̄), where µ̄(A) = st(µ(A)) for every A ∈ *B[X] and µ̄ is the countably additive extension to σ(*B[X]). Every standard model is closely connected to its nonstandard extension via the transfer principle, which asserts that a first-order statement is true in the standard model if and only if it is true in the nonstandard model. Finally, given a cardinal number κ, a nonstandard model is called κ-saturated if the following condition holds: if F is a family of internal sets with cardinality less than κ and with the finite intersection property, then the total intersection of F is non-empty. In this paper, we assume our nonstandard model is as saturated as we need (see e.g. [8, Thm. 1.7.3] for the existence of κ-saturated nonstandard models for any uncountable cardinal κ).
3. Hyperfinite Representation of Probability Spaces. In this section, we give an overview of the hyperfinite representation of probability spaces developed in [19]. All the proofs can be found in [19, Section 6]. We use similar notation to [19] and [20]. The following theorem gives a nonstandard characterization of compact topological spaces.

Theorem 3.1. A topological space X is compact if and only if every element of *X is near-standard, i.e. NS(*X) = *X.

In what follows, we use the common notation d(x, A) = inf{d(x, y) : y ∈ A} for every x ∈ X and every A ⊂ X.
We now introduce the concept of hyperfinite representation of a Heine-Borel metric space X. The intuition is to take a "large enough" portion of * X containing X and then partition it into hyperfinitely many pieces of *Borel sets with infinitesimal radius. We then pick one "representative" from each piece to form a hyperfinite set. The formal definition is given below.
Definition 5. Let (X, d) be a metric space satisfying the Heine-Borel condition. Let δ ∈ *R_+ be an infinitesimal and r be an infinite hyperreal number. A (δ, r)-hyperfinite representation of X is a tuple (S, {B(s)}_{s∈S}) such that
1. S is a hyperfinite subset of *X.
2. {B(s)}_{s∈S} is a collection of mutually disjoint *Borel subsets of *X.
3. s ∈ B(s) for every s ∈ S.
4. The diameter of B(s) is no greater than δ for every s ∈ S.
5. For any x ∈ NS(*X), *d(x, *X \ ⋃_{s∈S} B(s)) > r.
6. There exist a_0 ∈ X and some infinite r_0 such that ⋃_{s∈S} B(s) = Ū(a_0, r_0), the *closure of the *open ball U(a_0, r_0).
The set S is called the base set of the hyperfinite representation. For every x ∈ ⋃_{s∈S} B(s), we use s_x to denote the unique element in S such that x ∈ B(s_x).
If X is compact, we have NS(*X) = *X by Theorem 3.1. In this case, we can pick S such that ⋃_{s∈S} B(s) = *X, and hence the second parameter of a (δ, r)-hyperfinite representation is redundant. Thus, we shall simply work with a δ-hyperfinite representation in the case where X is compact. The set Ū(a_0, r_0) can be seen as the *closure of the nonstandard open ball U(a_0, r_0). As X satisfies the Heine-Borel condition, by the transfer principle, Ū(a_0, r_0) is a *compact set. That is, it satisfies all the first-order logic properties of a compact set.
The next theorem shows that hyperfinite representations always exist. Although the statement appears to be slightly stronger than [19, Thm. 6.6], its proof is almost identical to the proof of [19, Thm. 6.6] hence is omitted.
Theorem 3.2. Let X be a metric space satisfying the Heine-Borel condition. Then, for every positive infinitesimal δ and every positive infinite r, there exists a (δ, r)-hyperfinite representation (S^r_δ, {B(s)}_{s∈S^r_δ}) of *X such that X ⊂ S^r_δ.
Suppose X is a Heine-Borel metric space endowed with Borel σ-algebra B[X]. Let P be a probability measure on (X, B[X]). Let S be the base set of a (δ, r)-hyperfinite representation of X for some positive infinitesimal δ and some positive infinite number r. The next theorem shows that we can define an internal probability measure on (S, I(S)) that gives a "nice" approximation of P.

Theorem 3.3. Let (X, B[X], P) be a Borel probability space where X is a metric space satisfying the Heine-Borel condition, and let (*X, *B[X], *P) be its nonstandard extension. For every positive infinitesimal δ, every positive infinite r and every (δ, r)-hyperfinite representation (S, {B(s)}_{s∈S}) of *X, define an internal probability measure P′ on (S, I(S)) by letting

P′({s}) = *P(B(s)) / *P(⋃_{a∈S} B(a))

for every s ∈ S.

4. Hyperfinite Representation of General Markov Processes.
Let {P_x}_{x∈X} be the transition kernel of a discrete-time Markov process with state space X. We assume that X is a metric space satisfying the Heine-Borel condition throughout the rest of the paper until otherwise mentioned. The transition probability can be viewed as a function g : X × N × B[X] → [0, 1] with g(x, t, A) = P^{(t)}_x(A) for every x ∈ X, t ∈ N and A ∈ B[X]. We will use g(x, t, A) and P^{(t)}_x(A) interchangeably. For any x ∈ X and A ∈ B[X], let P^{(0)}_x(A) = δ_x(A). We will construct a hyperfinite object to represent the Markov process {X_t}_{t∈N} associated with the transition kernel g. We fix a set T = {1, 2, . . . , K} for some infinite K ∈ *N throughout the paper. A hyperfinite Markov process is defined analogously to finite Markov processes. Namely, a hyperfinite Markov process is characterized by the following four ingredients: a hyperfinite state space S, a hyperfinite time line T, a hyperfinite set {v_i}_{i∈S} of initial probabilities, and a hyperfinite set {p_ij}_{i,j∈S} of transition probabilities. The following theorem shows that it is always possible to construct a hyperfinite Markov process with these parameters.

Theorem 4.1. Fix a hyperfinite state space S, a time line T, a hyperfinite set {v_i}_{i∈S} and a hyperfinite set {p_ij}_{i,j∈S} that satisfy the immediately-preceding conditions. Then there exists an internal probability triple (Ω, A, P) with an internal stochastic process {X_t}_{t∈T} defined on (Ω, A, P) such that

P(X_0 = i_0, X_1 = i_1, . . . , X_t = i_t) = v_{i_0} p_{i_0 i_1} · · · p_{i_{t−1} i_t}

for all t ∈ T and i_0, . . . , i_t ∈ S.
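In the finite analogue, the four ingredients determine all path probabilities through the same product formula. The two-state numbers below are our own illustration, not data from the paper:

```python
import numpy as np

# Finite analogue of the four ingredients: a state space S, a time line
# (implicit in the path length), an initial distribution v, and a
# transition matrix p.
S = [0, 1]
v = np.array([0.5, 0.5])          # v_i = P(X_0 = i)
p = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # p_ij = P(X_{t+1} = j | X_t = i)

def path_probability(path):
    # P(X_0 = i_0, ..., X_t = i_t) = v_{i_0} p_{i_0 i_1} ... p_{i_{t-1} i_t}
    prob = v[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= p[a, b]
    return prob
```

Summing over all paths of a fixed length recovers total probability one, which is the consistency that makes the product formula a legitimate definition of a process.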
As in [19], we will construct a hyperfinite Markov process {X ′ t } t∈T which is a "nice" representation of {X t } t∈N . Due to the similarities between finite objects and hyperfinite objects, {X ′ t } t∈T inherits many key properties from finite Markov processes. {X ′ t } t∈T will play an essential role throughout the paper.
Pick any positive infinitesimal δ and any positive infinite number r. Let (S, {B(s)} s∈S ) be a (δ, r)-hyperfinite representation of * X. Let us recall some key properties of (S, {B(s)} s∈S ).
1. s ∈ B(s) for every s ∈ S. 2. For every s ∈ S, the diameter of B(s) is no greater than δ.
We shall fix such a pair (S, {B(s)}_{s∈S}) for the rest of the paper. When X is non-compact, ⋃_{s∈S} B(s) ≠ *X. Hence, we need to truncate *g to obtain an internal probability measure on ⋃_{s∈S} B(s).
We now define a hyperfinite Markov process {X′_t}_{t∈T} on S by specifying its internal transition probabilities {G_{kj}}_{k,j∈S}. For any internal set A ⊂ S and any i ∈ S, let G_i(A) = Σ_{j∈A} G_{ij}. Then G_i(·) defines an internal probability measure on S for every i ∈ S, and the same holds for the internal t-step kernels for every t ∈ N.
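A finite-resolution analogue of this construction replaces *X by a grid of cells with representatives at their centers, and sets the transition weight from representative i to cell j equal to the kernel mass of that cell, renormalized over the truncated window. The Gaussian random-walk kernel and all grid parameters below are our own illustrative choices:

```python
import numpy as np
from math import erf, sqrt

# Grid of n cells on [0, 1]; representatives at cell centers (the finite
# analogue of the base set S, with B(s) the cell around each center).
n = 50
edges = np.linspace(0.0, 1.0, n + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def normal_cdf(z, mu, sigma):
    return 0.5 * (1.0 + erf((z - mu) / (sigma * sqrt(2.0))))

# g(x, 1, .) ~ Normal(x, sigma^2).  G[i, j] is the mass the kernel started
# at center i assigns to cell j, renormalized over the truncated window
# [0, 1] (the analogue of truncating *g to the union of the B(s)).
sigma = 0.1
G = np.zeros((n, n))
for i, x in enumerate(centers):
    masses = np.array([normal_cdf(edges[j + 1], x, sigma)
                       - normal_cdf(edges[j], x, sigma) for j in range(n)])
    G[i] = masses / masses.sum()
```

Each row of G is a probability vector, so each G_i(·) is a genuine (finite) probability measure on the grid, mirroring the internal measures above.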
We now quote the following two key results from [19].
These theorems show that the transition probabilities of {X_t}_{t∈N} agree with the Loeb extension of the internal transition probabilities of {X′_t}_{t∈T}.

4.1. Hyperfinite Representation of the Lazy Chain. For discrete-time Markov processes, one considers a lazy version of the original Markov process to avoid periodicity and near-periodicity issues. Let g be the transition kernel of a discrete-time Markov process. We denote by g_L the lazy kernel of g, given by the formula

g_L(x, 1, A) = (1/2) δ_x(A) + (1/2) g(x, 1, A).

Note that g_L generally does not satisfy Assumption 1 even if g does. Suppose g satisfies Assumption 1 and let G be a hyperfinite representation of g. In addition, let {X′_t}_{t∈T} be a hyperfinite Markov process associated with the internal transition kernel G. Both G and {X′_t}_{t∈T} will be fixed until the applications in Section 7.
The lazy chain of {X′_t}_{t∈T} is defined to be the hyperfinite Markov process whose transition probabilities hold the current state with probability 1/2 and otherwise move according to G. Before proving the main result of this section, we quote the following useful lemma.

Lemma 7 ([19, Lemma 7.24]). Let P_1 and P_2 be two internal probability measures on a hyperfinite set S. Then

‖P_1(·) − P_2(·)‖ = (1/2) Σ_{s∈S} |P_1({s}) − P_2({s})|,

where ‖P_1(·) − P_2(·)‖ = *sup_{A∈I(S)} |P_1(A) − P_2(A)| and the *sup is taken over all internal sets.
We now prove the following representation theorem, which is similar in spirit to Theorem 4.2.

Proof. We prove the result by induction on t ∈ N. Let t = 1. Pick any x ∈ NS(*X) and any A ∈ *B[X]; by Lemma 5 and Theorem 4.2,
Suppose the theorem holds for t = n. We now show that the theorem holds for t = n + 1. By the transfer of the Markov property, we have *g_L(x, n + 1, ⋃_{a∈A∩S} B(a)) (4.9), where the last ≈ follows from the fact that *g_L(x, 1, ⋃_{s∈S} B(s)) = 1. By the induction hypothesis, the corresponding approximation holds for *g_L(y, n, ⋃_{a∈A∩S} B(a)) for every y ∈ ⋃_{s∈S} B(s). Thus, we have (4.16). We must now calculate the second term. By Lemma 5, we have *g(x, 1, B(s_x)) ≈ *g(s_x, 1, B(s_x)). By Definition 6, we know that *g(s_x, 1, B(s_x)) ≈ G_{s_x s_x}.
We will now show the desired approximation by considering two cases. In the case *g(x, 1, ⋃_{s≠s_x} B(s)) ≈ 0, by Assumption 1, we have *g(s_x, 1, ⋃_{s≠s_x} B(s)) ≈ 0. This allows us to define P_1 and P_2; both P_1 and P_2 are internal probability measures on S. By Assumption 1, the same conclusion holds in our second case as well, so this equality always holds.
The following well-known nonstandard representation theorem is due to Robert Anderson.
Note that every Heine-Borel space is σ-compact. We also recall that the hyperfinite state space S of {X′_t}_{t∈T} contains X as a subset. We now present the following hyperfinite representation theorem for lazy chains. The proof is very similar to the proof of Theorem 4.3 and hence is omitted.
Theorem 4.5. Suppose that the transition kernel of {X_t}_{t∈N} satisfies Assumption 1. Then for every x ∈ X, every t ∈ N and every E ∈ B[X], we have

4.2. Hyperfinite Representation of the Stationary Distribution. Let π be a stationary distribution of {P_x(·)}_{x∈X}. We construct an analogous object in the hyperfinite representation {G_i(·)}_{i∈S}, called the "weakly stationary distribution".
Definition 7. Let Π be an internal probability measure on (S, I(S)). We say Π is a weakly stationary distribution for {G_i(·)}_{i∈S} if there exists an infinite t_0 ∈ T such that for any t ≤ t_0 and any A ∈ I(S) we have

In [19], the authors show that weakly stationary distributions exist for hyperfinite representations of general state space continuous-time Markov processes under moderate regularity conditions. In this section, we show that weakly stationary distributions exist for Markov processes with transition kernel satisfying Assumption 1. We start by giving an explicit construction of a weakly stationary distribution from the standard stationary distribution.
It is straightforward to verify from Definition 8 that π′(A) ≈ *π(⋃_{a∈A} B(a)) for every A ∈ I(S). The following theorem relates π′ and π. We now show that π′ is a weakly stationary distribution for {G_i(·)}_{i∈S}.
Proof. Pick an internal set A ⊂ S and some t ∈ N. By the transfer principle, we have

π′(A) ≈ *π(⋃_{a∈A} B(a)) = ∫_{*X} *g(x, t, ⋃_{a∈A} B(a)) *π(dx).
Note that we have for every S 1 , S 2 ∈ I(S) and every t ∈ N. We now show that {G i (·)} i∈S is "infinitesimally close" to a *reversible process.
For every i, j ∈ S with π′({i}) ≠ 0, define

It is straightforward to verify that H_i(·) defines an internal probability measure on (S, I(S)) for every i ∈ S. We now show that the hyperfinite Markov process with internal transition kernel {H_i(·)}_{i∈S} is *reversible.
Proof. We start by showing that π′ is a *stationary distribution of {H_i(·)}_{i∈S}. Hence π′ is a *stationary distribution of {H_i(·)}_{i∈S}. We now show that the hyperfinite Markov process with internal transition kernel {H_i(·)}_{i∈S} is *reversible with respect to its *stationary distribution π′. For i ∈ S \ S_0, we have for every j ∈ S. Thus, for every i, j ∈ S, we have (4.52). As our choices of s and A are arbitrary, we have max_{s∈S} ‖G_s(·) − H_s(·)‖ ≈ 0. Suppose that the theorem holds for t = n. We now establish the result for t = n + 1. Pick s ∈ S and A ∈ I(S). By Lemma 7, we have

Throughout the paper, we shall denote the hyperfinite Markov process on S with internal transition matrix {H_ij}_{i,j∈S} by {Z_t}_{t∈T}. As the total variation distance between {G_i(·)}_{i∈S} and {H_i(·)}_{i∈S} is infinitesimal, it is not surprising that {H_i(·)}_{i∈S} can be used as a hyperfinite representation of {G_i(·)}_{i∈S}. The following two theorems follow easily from Theorems 4.2, 4.3 and 4.8, hence their proofs are omitted. Define the lazy transition kernel {I_ij}_{i,j∈S} associated with {H_ij}_{i,j∈S} to be the collection of internal transition probabilities with I_ii = 1/2 + (1/2)H_ii and I_ij = (1/2)H_ij for i ≠ j. For every i ∈ S and A ∈ I(S) we then have I_i(A) = Σ_{j∈A} I_ij. The following result shows that the total variation distance between the lazy chain of {G_i(·)}_{i∈S} and the lazy chain of {H_i(·)}_{i∈S} is infinitesimal. Assume that the theorem holds for t = n. We now prove the case t = n + 1. Pick s ∈ S and A ∈ I(S). By Lemma 7, we have
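The reversibilization step has a simple finite counterpart. One standard way to produce, from a transition matrix G with stationary distribution π, a matrix H that is reversible with respect to π is the additive reversibilization H_ij = (π_i G_ij + π_j G_ji)/(2π_i); we use it here purely to illustrate the role {H_i(·)}_{i∈S} plays, without claiming it is the exact formula used in [19]:

```python
import numpy as np

def reversibilize(G, pi):
    # Additive reversibilization: H_ij = (pi_i G_ij + pi_j G_ji) / (2 pi_i).
    # If pi is stationary for G, then H is stochastic, keeps pi stationary,
    # and satisfies detailed balance pi_i H_ij = pi_j H_ji.
    N = pi[:, None] * G          # N_ij = pi_i G_ij
    return (N + N.T) / (2.0 * pi[:, None])

# A doubly stochastic (hence pi = uniform) but non-reversible example.
pi = np.ones(3) / 3.0
G = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
H = reversibilize(G, pi)
```

The resulting H is a stochastic matrix in detailed balance with π, the finite-state property whose internal version is proved for {H_i(·)}_{i∈S} above.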

5. Mixing Times and Hitting Times with Their Nonstandard Counterparts. In this section, we develop nonstandard notions of mixing and hitting times for hyperfinite Markov processes, and we show that the nonstandard notions and standard notions agree with each other. We assume that the underlying state space X is compact for some theorems in this section. Recall that X is compact if and only if *X = NS(*X).
The following result is an immediate consequence of Lemma 5.1.
The result then follows from Lemma 5.1.
for all n ∈ N and all A_0, A_1, . . . , A_n ∈ B[X]. We write P_x(·) for the probability of an event conditional on X_0 = x. Let {H_ij}_{i,j∈S} and µ denote the internal transition matrix and the initial internal distribution of the *reversible hyperfinite Markov process {Z_t}_{t∈T} defined in Section 4.3, respectively. By Theorem 4.1, we have:

Theorem 5.2. There exists a hyperfinite probability space (Ω, I(Ω), Q) such that

Q(Z_0 = i_0, Z_1 = i_1, . . . , Z_t = i_t) = µ({i_0}) H_{i_0 i_1} · · · H_{i_{t−1} i_t}

for all t ∈ T and i_0, i_1, . . . , i_t ∈ S.
We write Q_s(·) for the internal probability of an internal event conditional on Z_0 = s. Similarly, the first internal hitting time τ′_A of an internal set A ⊂ S is defined to be min{t ∈ T : Z_t ∈ A}. It is easy to verify that Q_s(τ′_A = 1) = H_s(A) for every s ∈ S and A ∈ I(S). For t > 1, s ∈ S and A ∈ I(S), we have an analogous recursive formula. In order to apply nonstandard extensions and the transfer principle more easily, we define P : X × B[X] × N → [0, 1] by P(x, A, t) = g(x, t, A), and Q : S × I(S) × T → *[0, 1] by Q(s, A, t) = Q_s(Z_t ∈ A).

Theorem 5.3. Suppose {g(x, 1, ·)}_{x∈X} satisfies Assumption 1. Moreover, assume that the state space X is compact. For every x ∈ *X, every A ∈ I(S) and every t ∈ N, we have *P(x, ⋃_{a∈A} B(a), t) ≈ Q(s_x, A, t), where s_x is the unique element in S with x ∈ B(s_x).
Proof. For t = 1, by Assumption 1 and Theorem 4.9, we have *P(x, ⋃_{a∈A} B(a), 1) ≈ Q(s_x, A, 1) for every x ∈ *X and every A ∈ I(S).
Fix n ∈ N and suppose we have *P(x, ⋃_{a∈A} B(a), t) ≈ Q(s_x, A, t) for every x ∈ *X, every A ∈ I(S) and every t ≤ n. We now prove the case t = n + 1. By the induction hypothesis, we have *P(x, ⋃_{a∈A} B(a), n) ≈ Q(s_x, A, n). Hence, we have *P(x, ⋃_{a∈A} B(a), n + 1) ≈ Q(s_x, A, n + 1), completing the proof.
The following result shows that the large hitting time of the standard Markov process defined in Definition 4 is bounded from below by the large hitting time of its hyperfinite representation.

5.3. Mixing Times and Hitting Times on Compact Sets.
In this section, we use techniques developed in previous sections to prove Theorem 2.2 for reversible Markov processes with compact state spaces. The following lemma is well-known (for completeness, a proof can be found in Appendix A.5):

Lemma 10. Let 0 < α < 1/2. Let D denote the collection of discrete-time transition kernels with a stationary distribution on a σ-compact metric state space. Then there exists a universal constant d′_α such that, for every {g(x, 1, ·)}_{x∈X} ∈ D, we have d′_α t_H(α) ≤ t_L.

We now prove our main result, Theorem 2.2, in the special case that the underlying state space is compact:

Theorem 5.5. Let C denote the collection of discrete-time reversible transition kernels with compact state space satisfying Assumption 1. Then there exist universal constants d_α, d′_α such that, for every {g(x, 1, ·)}_{x∈X} ∈ C, we have d′_α t_H(α) ≤ t_L ≤ d_α t_H(α).

Proof. Suppose t_H(α) is infinite. By Lemma 10, we know that t_L is infinite. Thus, the result follows immediately in this case.
Suppose t_H(α) is finite. Let c_α be the constant given in Theorem 2.1. Let {I_i(·)}_{i∈S} be the internal transition probability matrix defined after Theorem 4.10. Since {I_i(·)}_{i∈S} is the lazy version of the *reversible kernel {H_i(·)}_{i∈S}, it is a *reversible process with *stationary distribution π′. Let T_L = *min{t ∈ T : *sup_{i∈S} ‖I^{(t)}_i(·) − π′(·)‖ ≤ 1/4}, where Q(s, A, k) is defined in Section 5.2.
6. Mixing Times and Hitting Times on σ-Compact Sets. We fix notation as in Section 5.3, but relax the assumption that (X, d) is a compact metric space to the assumption that (X, d) is a σ-compact metric space. As before, all σ-algebras should be taken to be the usual Borel σ-algebra.
We recall the definition of the trace of a Markov chain: Definition 9. Let g be the transition kernel of a Markov chain on state space X with stationary measure π and Borel σ-field B[X]. Let S ∈ B[X] have measure π(S) > 0.
Fix x ∈ X and let {X_t}_{t≥0} be a Markov chain with transition kernel g and starting point X_0 = x. Then define the sequence {η_i}_{i∈N} by setting

η_0 = min{t ≥ 0 : X_t ∈ S}   (6.1)

and recursively setting

η_{i+1} = min{t > η_i : X_t ∈ S}.

We then define the trace of g on S to be the Markov chain {X_{η_i}}_{i∈N}, with transition kernel denoted g^{(S)}.

Theorem 1. Suppose that the original transition kernel g has stationary distribution π. For S ∈ B[X] with π(S) > 0, the normalization of π to the set S is the stationary distribution of the trace transition kernel g^{(S)}. Moreover, if g is ergodic and reversible with respect to the stationary distribution π, then g^{(S)} is reversible with respect to the normalization of π to the set S.

A simple coupling argument, expanded in Appendix A.3, gives:

Lemma 11. Let g be a transition kernel with stationary measure π that satisfies Assumption 1, and let S ∈ B[X] be a set with measure π(S) > 0. Then the trace g^{(S)} of g on S also satisfies Assumption 1.
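For a finite chain the trace kernel can be computed in closed form: partitioning the transition matrix over S and its complement as [[A, B], [C, D]], a first-return decomposition gives g^{(S)} = A + B(I − D)^{-1}C. The birth-death example below is our own; it checks that the stationary distribution of the trace is π normalized to S, as the result above asserts:

```python
import numpy as np

def trace_kernel(P, S):
    # Trace of the chain on S: A + B (I - D)^{-1} C, where P is
    # partitioned over S and its complement Sc.
    n = P.shape[0]
    Sc = [i for i in range(n) if i not in S]
    A = P[np.ix_(S, S)]
    B = P[np.ix_(S, Sc)]
    C = P[np.ix_(Sc, S)]
    D = P[np.ix_(Sc, Sc)]
    return A + B @ np.linalg.solve(np.eye(len(Sc)) - D, C)

# Birth-death chain on {0,1,2,3}; stationary pi is proportional to (1,2,2,1).
P = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.00, 0.00, 0.50, 0.50]])
K = trace_kernel(P, [0, 1])
```

The matrix (I − D)^{-1} sums over all excursions outside S before the chain returns, which is exactly the "watch the chain only while it is in S" description of the trace.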
For the rest of the section, let K(X) denote the collection of all compact subsets of X that are also in B[X]. The next result shows that the standardized mixing time of the original Markov chain is bounded by the supremum over standardized mixing times of associated trace chains.

Lemma 12. Let g be the transition kernel of a Markov chain on state space X with stationary measure π. For S ∈ B[X] with π(S) > 0, denote by t^{(S)}_m the standardized mixing time with respect to g^{(S)}. Then t_m ≤ sup_{S∈K(X): π(S)>0} t^{(S)}_m.

Proof. By the definition of t_m, for all ε > 0 there exist some particular points x, y ∈ X and a set A ∈ B[X] such that

|g(x, t, A) − g(y, t, A)| > 0.25 + ε   (6.6)

for t = t_m − 1. Next, note that {g(x, n, ·), g(y, n, ·)}_{n=0}^{t_m} is a finite collection of measures, and in particular it is tight. Therefore, there exists a compact set S such that max_{0≤n≤t_m} max{g(x, n, X \ S), g(y, n, X \ S)} ≤ ε/(100 t_m) and x, y ∈ S. Combining this with Inequality (6.6), the transition probabilities of g^{(S)} up to time t_m differ from those of g by at most ε/2, so the analogue of (6.6) holds for g^{(S)}. Thus, the mixing time of g^{(S)} is also at least t_m, and the claimed bound follows.

By the same coupling, we have:

Lemma 13. Let g be the transition kernel of a Markov chain on state space X with stationary measure π. For S ∈ B[X] with π(S) > 0, denote by τ^{(S)}_g(α) the large hitting time with respect to g^{(S)}. Then

We can now prove Theorem 2.2, the main result of this section:

Proof. By Lemma 10, there exists a universal constant a′_α > 0 such that, for every {g(x, 1, ·)}_{x∈X} ∈ M, we have a′_α t_H(α) ≤ t_L. Recall that C is the collection of discrete-time reversible transition kernels with compact state space satisfying Assumption 1. By Theorem 5.5, there exists a universal constant d_α > 0 such that, for every {g(x, 1, ·)}_{x∈X} ∈ C, the mixing time of the lazy chain is bounded by d_α times the maximal hitting time. For every {g(x, 1, ·)}_{x∈X} ∈ M, by Lemma 3, we have t̄_L ≤ 2t_L. By Lemmas 13, 12, 11 and Theorem 5.5, we have
7. Statistical Applications and Extensions. In this section, we give results that allow us to apply our main result, Theorem 2.2, to obtain useful bounds for various Markov chains that do not satisfy its main assumptions. Our main motivation is the study of Markov chain Monte Carlo (MCMC) algorithms. MCMC is ubiquitous in statistical computation, and in this context small mixing times correspond to efficient algorithms (see e.g. [12] for an overview of MCMC, [22] for applications, and [30] for analyses). Very few algorithms used for MCMC satisfy the strong Feller condition, Assumption 1.
We begin by showing in Section 7.1 that our results apply without change to the Metropolis-Hastings algorithm, one of the most popular algorithms in computational statistics. In Section 7.2, we introduce a relaxation of the strong Feller condition and then show that this relaxed property is satisfied by many other MCMC chains. Appendix B contains further applications.

7.1. Strong Feller Functions of Metropolis-Hastings Chains. We begin with the following definition of a large class of Metropolis-Hastings chains:
Definition 10 (Metropolis-Hastings Chain). Fix a distribution π with continuous density ρ supported on R^d. Also fix a reversible kernel {q(x, 1, ·)}_{x∈R^d} on R^d with stationary measure ν. For every x ∈ R^d, assume that q(x, 1, ·) has continuous density q_x and that ν has continuous density φ. Define the acceptance function by the formula
β(x, y) = min(1, ρ(y) q_y(x) / (ρ(x) q_x(y))).   (7.1)
Finally, define g to be the transition kernel given by the formula
g(x, 1, A) = ∫_A β(x, y) q_x(y) dy + δ_x(A) ∫_{R^d} (1 − β(x, y)) q_x(y) dy,   (7.2)
where δ_x denotes the point mass at x. For a transition kernel of this form, define the constant γ = inf_{x∈R^d} g(x, 1, {x}^c).

Remark 4. It is well-known that, under these conditions, g will be reversible with stationary measure π (see e.g. [13]).
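As a concrete illustration of Definition 10, one step of a Metropolis-Hastings kernel can be sketched as follows. This is a generic sketch, not the paper's construction: the names log_rho, propose and log_q are illustrative, and the toy target and proposal at the bottom are assumptions for the example only.

```python
import numpy as np

def mh_step(x, log_rho, propose, log_q, rng):
    """One step of a Metropolis-Hastings kernel.

    log_rho(x): log target density; propose(x, rng): draw y ~ q_x;
    log_q(x, y): log q_x(y).  All names are illustrative.
    """
    y = propose(x, rng)
    # Acceptance function beta(x, y) = min(1, rho(y) q_y(x) / (rho(x) q_x(y)))  (7.1)
    log_beta = min(0.0, log_rho(y) + log_q(y, x) - log_rho(x) - log_q(x, y))
    if np.log(rng.uniform()) < log_beta:
        return y   # accept the proposal
    return x       # reject: the chain holds at x, contributing to g(x, 1, {x})

# Toy example: random-walk proposal targeting a standard normal on R.
rng = np.random.default_rng(0)
log_rho = lambda x: -0.5 * x**2
propose = lambda x, rng: x + rng.normal(scale=0.5)
log_q = lambda x, y: -0.5 * ((y - x) / 0.5) ** 2   # symmetric, cancels in beta

x, xs = 3.0, []
for _ in range(5000):
    x = mh_step(x, log_rho, propose, log_q, rng)
    xs.append(x)
```

Because the proposal is symmetric here, the q-terms cancel and β reduces to the classical Metropolis ratio; the holding probability g(x, 1, {x}) is exactly the rejection probability, which is why γ above is a natural quantity for these chains.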
Let g be a Metropolis-Hastings kernel of the form given in Definition 10, and let {X_t}_{t∈N} ∼ g. Then define inductively η_0 = 0 and
η_{t+1} = min{s > η_t : X_s ≠ X_{η_t}},   Y_t = X_{η_t}.
The process {(Y_t, η_t)}_{t∈N} is a Markov chain. We denote by g′ its transition kernel, and by π′ its stationary measure on X × N. We remark that it is easy to reconstruct {X_t}_{t∈N} from {(Y_t, η_t)}_{t∈N}. For this section only, denote by t′_m, t′_L and t′_H(α) the mixing time, lazy mixing time and maximum hitting time of g′.
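The construction of {(Y_t, η_t)} from {X_t}, and the remark that {X_t} is easily reconstructed from it, can be sketched at the level of finite trajectories. The helper names are illustrative, and the sketch assumes the cumulative reading of the entrance times η_t:

```python
def jump_chain(path):
    """Build (Y_t, eta_t) from a trajectory {X_t}: eta_t are the successive
    times at which the chain moves, and Y_t = X_{eta_t} are the distinct
    positions visited."""
    etas, ys = [0], [path[0]]
    for s in range(1, len(path)):
        if path[s] != path[etas[-1]]:
            etas.append(s)
            ys.append(path[s])
    return ys, etas

def reconstruct(ys, etas, length):
    """Invert the transformation: rebuild {X_t} by holding each Y_t for
    eta_{t+1} - eta_t steps, as remarked in the text."""
    xs = []
    for y, start, stop in zip(ys, etas, etas[1:] + [length]):
        xs.extend([y] * (stop - start))
    return xs

path = [0, 0, 1, 1, 1, 2, 0, 0]
ys, etas = jump_chain(path)   # ys = [0, 1, 2, 0], etas = [0, 2, 5, 6]
assert reconstruct(ys, etas, len(path)) == path
```

For a Metropolis-Hastings chain, the holding times η_{t+1} − η_t are geometric given Y_t, which is what makes the augmented chain tractable in the proof of Theorem 7.1 below.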
We then have:

Theorem 7.1. Let B be the collection of transition kernels of the form given in Equation (7.2) with finite mixing time, and for which q_x(y) is jointly continuous in x, y. Then for all 0 < α < 1/2, there exists a universal constant 0 < c_α < ∞ so that, for every g ∈ B and every δ > 0,
Proof. Since g is of the form (7.2), it is straightforward to see that g′ satisfies the strong Feller condition. Thus, one can apply Theorem 2.2 to show that, for any 0 < α < 1/2, t′_L ∼ t′_H(α), where (as in Theorem 2.2) the implied constant depends on α.
Next, we must relate t′_H to t_H. For x ∈ X, let λ(x) = g(x, 1, {x}^c). For λ ∈ (0, 1), denote by L_λ the law of the geometric random variable with success probability λ, and let P_λ denote its associated probability mass function. Fix A′ ⊂ X × N with π′(A′) ≥ α, and define the associated "core" set A ⊂ X by Equation (7.9). Since π′(A′) ≥ α, we must have π(A) ≥ δα. For chains {X_t}_{t∈N} and {(Y_t, η_t)}_{t∈N} coupled as in Equation (7.5), define the hitting times of A and of A′, respectively. By the definition of the "core" set in Equation (7.9), the hitting time of A′ is controlled by that of A for all starting points x ∈ X; under our coupling, the reverse comparison also holds for all starting points x ∈ X. Combining these two inequalities with Equation (7.7) completes the proof.

7.2. Almost-Strong Feller Chains. We don't know a general way to extend the trick in Section 7.1. Fortunately for us, in the context of MCMC, the user does not usually care about the mixing time of a specific Markov chain; it is enough to estimate the mixing time of some Markov chain that is both fast and easy to implement. We give the mathematical results first, then explain their relevance to MCMC in Section 7.2.3.

7.2.1. Generic Bounds. Let {g(x, 1, ·)}_{x∈X} be the transition kernel of a Markov process. For every k ∈ N, denote by g^(k) the transition kernel
g^(k)(x, t, A) = g(x, kt, A)   (7.16)
for every x ∈ X, t ∈ N and A ∈ B[X]. We call {g^(k)(x, 1, ·)}_{x∈X} the k-skeleton of {g(x, 1, ·)}_{x∈X}. We will use the superscript (k) to extend our notation for the kernel g to the kernel g^(k). For example, for every ε > 0, we use t_m^(k)(ε) to denote the standardized mixing time of g^(k). We observe some simple relationships between g and g^(k), with details in Appendix A.4 for completeness:

Lemma 14. For all ε > 0 and all k ∈ N,
t_m^(k)(ε) ≤ ⌈t_m(ε)/k⌉.

Lemma 15. For all α > 0 there exists a constant 0 < C_α < ∞ so that for all k ∈ N,
t_H^(k)(α) ≤ C_α (1 + t_H(α)/k).

Next, we give a definition that relaxes the strong Feller condition in a quantitatively useful way. We first make a small remark on three operations on kernels that we've defined: the trace of a kernel on a set, the k-skeleton of a kernel, and the "lazy" version of a kernel. As shown in Lemma 23, the "trace" and "lazy" transformations commute: the trace of the lazy chain is equal to the lazy version of the trace chain. However, the k-skeleton and "lazy" transformations do not generally commute. As such, we occasionally use parentheses in the following notation to emphasize the order in which these transformations occur, with subscripts taking precedence. For example, g_L^(k) is the k-skeleton of the chain g_L, while (g^(k))_L is the lazy version of g^(k).
This last chain is important, and so we introduce the shorthand G for it. We also define T_m, T_L, and T_H to be the mixing time, lazy mixing time and maximum hitting time of G.
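The two path-level transformations being composed here, observing a trajectory only every k steps and holding each state a Geometric(1/2) number of steps, can be sketched as follows. The names are illustrative; the paper defines both operations at the level of kernels:

```python
import numpy as np

def skeleton(path, k):
    """k-skeleton at the level of sample paths: keep times 0, k, 2k, ...,
    mirroring g^(k)(x, t, A) = g(x, kt, A) in Equation (7.16)."""
    return path[::k]

def lazify(path, rng):
    """Lazy chain at the level of sample paths: hold each state an
    independent Geometric(1/2) (mean 2) number of steps.  Sketch only."""
    out = []
    for x in path:
        out.extend([x] * int(rng.geometric(0.5)))
    return out

rng = np.random.default_rng(0)
path = list(range(12))
# The two transformations do not commute in general:
a = skeleton(lazify(path, rng), 3)   # lazy first, then 3-skeleton
b = lazify(skeleton(path, 3), rng)   # 3-skeleton first, then lazy
```

The first composition can land between skeleton times of the original chain, while the second only ever visits states of the 3-skeleton, which is one way to see why the order of the operations matters.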
Definition 11 ((k, C)-almost Strong Feller). For k, C ∈ N, we say that a kernel {g(x, 1, ·)}_{x∈X} is (k, C)-almost strong Feller if there exist kernels {G_1(x, 1, ·), G_2(x, 1, ·)}_{x∈X} so that the following are satisfied:
1. G_1 is reversible and satisfies the strong Feller condition, and
2. For some p ≤ 1/(C t_L), we can write g_L^(k) = (1 − p) G_1 + p G_2.
For the rest of the paper, we let E(k, C) be the collection of (k, C)-almost strong Feller transition kernels on a σ-compact metric state space X.

Remark 5. Any strong Feller chain is (1, C)-almost strong Feller for all C ≥ 0. Our condition is inspired by the famous asymptotically strong Feller condition of [23].
To lessen notation in the rest of this section, we use "x ≲ y" as shorthand for the longer phrase "there exists a universal constant D such that x ≤ Dy," and "x ∼ y" for "x ≲ y and y ≲ x." We use the "prime" superscript to denote quantities related to chains drawn from G_1. For example, we denote by t′_m the mixing time of G_1 and by t′_L the mixing time of its associated lazy chain. We then have the main result of this section, which shows that T_m is bounded from above by t′_H:

Theorem 7.2. There exists a universal constant C_0 such that, for every 0 < α < 0.5, there exists a universal constant d_α such that for all C > C_0, all k ∈ N and all {g(x, 1, ·)}_{x∈X} ∈ E(k, C), we have
T_m ≤ d_α t′_H(α).

Proof. Pick 0 < α < 1/2. Fix generic constants k, C ∈ N and a transition kernel g ∈ E(k, C), and let G_1, G_2, and p be the associated kernels and constant as in Definition 11. By Lemma 14, the mixing time of g_L^(k) is at most ⌈t_L/k⌉. Applying Lemma 21 and Inequality (7.21), there exists a constant C_0 > 0 so that for all C > C_0, all k ∈ N and g ∈ E(k, C), the corresponding bound holds for G_1 as well. We restrict ourselves to C > C_0 for the remainder of the proof.
Since the transition kernel {G_1(x, 1, ·)}_{x∈X} satisfies the strong Feller condition, Theorem 2.2 gives
t′_L ∼ t′_H(α).   (7.27)
Combining this with Inequality (7.25) completes the proof.

7.2.2. Gibbs Samplers. We will show that Theorem 7.2 can be used to obtain nontrivial mixing bounds related to the following class of Gibbs samplers:

Definition 12 (Gibbs Sampler). Fix a distribution π with continuous density ρ > 0 on R^d. For x ∈ R^d, i ∈ {1, 2, . . . , d} and z ∈ R, define the i'th conditional density of ρ,
ρ_{x,i}(z) ∝ ρ(x_1, . . . , x_{i−1}, z, x_{i+1}, . . . , x_d).
Let F_{x,i} be the CDF of ρ_{x,i}. We then define a Markov chain as follows.
Fix a starting point X_0 = x. Let i_t ∼ Unif({1, 2, . . . , d}) and U_t ∼ Unif([0, 1]) be two i.i.d. sequences. We iteratively define X_{t+1} by the equation
X_{t+1}[i_t] = F^{−1}_{X_t, i_t}(U_t),   X_{t+1}[j] = X_t[j] for j ≠ i_t.   (7.29)
We define the transition kernel g by setting
g(x, t, A) = P[X_t ∈ A | X_0 = x],   (7.30)
where P is a product measure that generates this Markov process. Equation (7.29) is the usual "forward mapping" representation of a "random-scan" Gibbs sampler. Note that, since ρ is continuous and nonzero everywhere, F^{−1}_{x,i}(u) always contains exactly one element for x ∈ R^d, i ∈ {1, 2, . . . , d} and u ∈ [0, 1].
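The forward-mapping recursion (7.29) can be sketched directly in code. The toy target below (a product of independent Exp(1) coordinates, so each conditional CDF is z ↦ 1 − e^{−z}) is an assumption chosen so that the inverse conditional CDF has a closed form; the name inv_cdf is illustrative:

```python
import numpy as np

def gibbs_step(x, inv_cdf, rng):
    """One forward-mapping step of the random-scan Gibbs sampler: pick a
    coordinate i_t uniformly, then set that coordinate to F^{-1}_{x,i_t}(U_t)
    for U_t ~ Unif([0, 1]).  inv_cdf(x, i, u) returns F^{-1}_{x,i}(u)."""
    i = rng.integers(len(x))
    u = rng.uniform()
    y = x.copy()
    y[i] = inv_cdf(x, i, u)
    return y

# Toy target: independent Exp(1) coordinates, so the i'th conditional CDF is
# F(z) = 1 - exp(-z) regardless of the other coordinates.
inv_cdf = lambda x, i, u: -np.log1p(-u)

rng = np.random.default_rng(0)
x = np.full(3, 5.0)
samples = []
for t in range(20000):
    x = gibbs_step(x, inv_cdf, rng)
    if t >= 1000:
        samples.append(x.copy())
samples = np.asarray(samples)
```

Note that the map (7.29) is continuous in x whenever ρ is continuous and positive, which is the property used later when checking that the chain G_1 built from these updates is strong Feller.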
Under the same setting as (7.30), we define the associated "conditional" update kernels {g^(i)}_{1≤i≤d} by their one-step transition probabilities
g^(i)(x, 1, A) = P[X_1 ∈ A | X_0 = x, i_0 = i].
The MCMC literature has many variants of the Gibbs sampler, but we focus on this popular simple case. Before stating our main result, we recall that any sequence of transition kernels g_1, . . . , g_k on the same space has a product kernel, which we denote ∏_{j=1}^{k} g_j. Informally, this product is obtained by "proposing from these kernels in order"; see e.g. Theorem 5.17 of [26] for a formal justification of the notation. Our main result is:

Lemma 16. Let A be the collection of transition kernels of the form given in Definition 12 that also have finite mixing time. Then, for all 0 < C < ∞, there exists a universal constant K_C so that every {g(x, 1, ·)}_{x∈X} ∈ A is (k, C)-almost strong Feller for any k ≥ K_C d log(t_L).

Remark 6. The condition ρ(θ) > 0 for all θ ∈ R^d is only used as a simple sufficient condition for the chain G_1 defined in the proof to satisfy the strong Feller condition. In many other situations, this can be checked directly.
Proof. Throughout this proof, we fix g ∈ A and use notation from Definition 12 freely. We begin by bounding the mixing time from below. For x ∈ R^d, define
H(x) = {y ∈ R^d : y_i = x_i for some i ∈ {1, 2, . . . , d}},
the collection of vectors that share at least one entry with x. Since π has a density ρ, we have for any x ∈ R^d that
π(H(x)) = 0.   (7.33)
Thus, for any x ∈ R^d and t ∈ N, a chain started at x cannot be close to stationarity before leaving H(x), which gives the lower bound (7.34). By the representation for the lazy chain in Lemma 3, the analogous bound (7.35) holds for the lazy chain. By the well-known "coupon collector" bound (see the main theorem of [21]), there exists some d_0 ∈ N such that Inequality (7.36) holds for all d ≥ d_0. Putting together Inequalities (7.34) to (7.36), this implies that there exists some universal constant 0 < c_1 < ∞ so that for all d ∈ N and all g ∈ A on R^d,
t_m, t_L ≥ c_1 d log(d).   (7.37)
Denote by k ∈ N a constant that will be fixed later in the proof. Let E be the event that some index j ∈ {1, 2, . . . , d} fails to appear among i_1, . . . , i_k (7.38), and define the kernels G_1, G_2 by conditioning the first k steps of the chain on the complement of E and on E, respectively. In the notation of Definition 11, the constant p associated with this choice of k, G_1, G_2 is
p = P[E].   (7.39)
We observe that, for any fixed j ∈ {1, 2, . . . , d} and s ∈ N,
P[j ∉ {i_1, . . . , i_s}] = (1 − 1/d)^s.   (7.40)
On the other hand, by Hoeffding's inequality, a matching concentration bound holds for all k ≥ 4. Combining these two bounds gives Inequality (7.42), and noting k, d ≥ 1, this bound simplifies further. To satisfy Inequalities (7.21) and (7.20), we just need our choice of k to ensure that p ≤ 1/(C t_L). Inspecting these inequalities, there exists a universal constant K so that this inequality is satisfied as long as
k > K d log(max(d, t_L)).   (7.44)
On the other hand, by Inequality (7.37), there exists a universal constant A so that
t_L ≥ A d log(max(d, t_L)).   (7.45)
Inspecting these final two bounds, we see that for all k sufficiently large compared to d log(t_L), this choice of k, p and G_1 satisfies (7.21) and (7.20).
Next, we must check that G_1 is reversible. To see this, we begin by noting that E depends only on the sequence {i_t}_{t∈N} of "index" variables in our forward-mapping representation. We then check that, even after conditioning on E, these index variables have a certain exchangeability-like property.
For m ∈ N and any sequence J ∈ R^n with n ≥ m + 1, define the "reversal" function R_m. Then, for {X_t}_{t∈N} ∼ g drawn according to the forward-mapping representation, and for any x ∈ X, the law of the index sequence conditioned on E is invariant under R_m. Recalling that g^(i) is π-reversible for every i, we see that the resulting mixture of product kernels is itself π-reversible (see e.g. the introduction of [14] for a careful presentation of the additive reversibilization and a general argument as to why it is reversible). Thus, G_1 is π-reversible.
Finally, the fact that G_1 satisfies the strong Feller condition follows immediately from the fact that the function (7.29) that gives the forward-mapping representation of g is continuous.
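The "coupon collector" event E driving the choice of k in the proof above (the event that some coordinate index is missed among k uniform draws) can be estimated numerically. A Monte Carlo sketch with illustrative names:

```python
import numpy as np

def prob_missed_coordinate(d, k, trials, rng):
    """Monte Carlo estimate of p = P[E], where E is the event that some index
    j in {1, ..., d} never appears among i_1, ..., i_k ~ Unif({1, ..., d})."""
    missed = 0
    for _ in range(trials):
        draws = rng.integers(d, size=k)
        if len(np.unique(draws)) < d:   # some coordinate was never updated
            missed += 1
    return missed / trials

rng = np.random.default_rng(0)
# Coupon collector: k of order d log(d) suffices to make p small.
p_small_k = prob_missed_coordinate(5, 5, 2000, rng)    # k = d: p is large
p_big_k = prob_missed_coordinate(5, 40, 2000, rng)     # k >> d log d: p tiny
```

The union bound P[E] ≤ d (1 − 1/d)^k from (7.40) already explains the numbers: for d = 5 and k = 40 it gives P[E] ≤ 5 (4/5)^40 ≈ 7 × 10^{-4}.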
We conclude:

Theorem 7.3. Let A be the collection of transition kernels of the form given in Definition 12 that also have finite mixing time and satisfy ρ(θ) > 0 for all θ ∈ R^d. Then there exists a universal constant 0 < K < ∞ such that, for all 0 < α < 0.5, there exist constants 0 < c_α < ∞, 0 < c′_α < ∞ so that for all {g(x, 1, ·)}_{x∈X} ∈ A,

Proof. The first inequality follows immediately from Theorem 7.2 and Lemma 16. The second follows from Lemma 15 and Lemma 3.

7.2.3. Using Theorems 7.3 and B.1. We note that there are two obvious obstacles to using our main results, Theorems 7.3 and B.1, for practical problems:
1. Both refer to the transition kernel G rather than the original kernel g of interest.
2. Both require some choice of k, which in turn requires some a-priori bound on the mixing time of g_L.
The first is not a practical problem, as it is straightforward to sample from G:
1. Sample a Markov chain {X_t}_{t∈N} ∼ g.
2. Transform the sequence {X_0, X_1, X_2, . . .} by repeating each element a number of times given by an independent geometric random variable with mean 2, resulting in the sequence {X_0, X′_1, X′_2, . . .}.
3. Take the k-skeleton of this sequence, {Y_0, Y_1, Y_2, . . .} ≡ {X′_0, X′_k, X′_{2k}, . . .}.
4. As in step (2), transform the sequence {Y_0, Y_1, Y_2, . . .} by repeating each element a number of times given by an independent geometric random variable with mean 2, resulting in the sequence {Y_0, Y′_1, Y′_2, . . .} ∼ G.
The second point is slightly more subtle. In practice, it is often possible to find a weak upper bound on a mixing time, even if practical upper bounds are much harder. For a typical example, the paper [15] finds a very generic upper bound on the mixing time of a family of Markov chains that includes many Gibbs samplers. For many well-studied target distributions on R^d, such as the uniform distribution on the simplex, box or ball, the upper bounds in [15] are roughly of the form t_m ≲ e^{c_1 d}. On the other hand, after many years of careful study, the true mixing times of many of these Markov chains were shown to be polynomial in d (see e.g. [39]).
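The four-step recipe above can be sketched as a single path-level transformation. This is an illustrative sketch (the paper works at the level of kernels), with "geometric with mean 2" implemented as success probability 1/2:

```python
import numpy as np

def geometric_repeat(path, rng):
    """Steps 2 and 4 of the recipe: repeat each element a Geometric(1/2)
    (mean 2) number of times, which simulates the lazy version of the chain."""
    out = []
    for x in path:
        out.extend([x] * int(rng.geometric(0.5)))
    return out

def sample_G(path, k, rng):
    """Transform a trajectory {X_t} ~ g into a trajectory ~ G by applying the
    steps in order: lazify, take the k-skeleton, lazify again."""
    lazy = geometric_repeat(path, rng)   # {X'_t}, distributed as g_L
    skel = lazy[::k]                     # {Y_t}, the k-skeleton of the lazy path
    return geometric_repeat(skel, rng)   # {Y'_t}, distributed as G

rng = np.random.default_rng(0)
g_path = list(range(100))                # placeholder trajectory standing in for g
G_path = sample_G(g_path, k=5, rng=rng)
```

Each transformation only re-indexes or repeats states, so sampling from G costs no extra evaluations of the underlying kernel g beyond the original trajectory.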

In this situation, it was not too difficult to show an exponential bound on the mixing time, but it was quite hard to show a polynomial bound. This is exactly the situation in which Theorems 7.3 and B.1 are useful: if one can show even a weak a-priori bound on t_L, this yields a valid choice of k.

APPENDIX A: SOME ELEMENTARY BOUNDS

In this section, we prove a few useful bounds on mixing times and hitting times. Results in this section are used throughout the paper.
Conversely, by a regeneration argument, the corresponding bound holds for all k ∈ N. Thus, for every x ∈ X and every A ∈ B[X] with π(A) ≥ α, the desired inequality follows. On the other hand, the same calculations show that τ_g(α) is infinite if and only if t_H(α) is infinite.

A.2. Mixing and Hitting Times of Perturbed Chains.
Lemma 17 (Submultiplicative Bounds on Hitting Times). Let {g(x, 1, ·)}_{x∈X} be a transition kernel on a σ-compact metric state space X with stationary distribution π. Fix a constant α > 0 and a set A ∈ B[X] with π(A) > α. Then for all k ∈ N and all x ∈ X,
P[τ_A > k τ_g(α) | X_0 = x] ≤ 2^{−k},
where P is a product measure that generates {g(x, 1, ·)}_{x∈X}.

Proof. Pick a constant α > 0. By the definition of τ_g(α), we have for each ℓ ∈ N,
P[τ_A > (ℓ + 1) τ_g(α) | τ_A > ℓ τ_g(α), X_0 = x] ≤ 1/2.
Iterating this bound over 0 ≤ ℓ < k, we have the desired result.
Lemma 18 (Comparison of Lazy Hitting Times). Let {g(x, 1, ·)}_{x∈X} be a transition kernel on a σ-compact metric state space X with stationary distribution π. Let {g_L(x, 1, ·)}_{x∈X} be its associated lazy transition kernel as defined in Definition 2. For every 0 < α < 1/2, let t_H(α) be the maximum hitting time for the kernel {g(x, 1, ·)}_{x∈X}, and let ℓ_H(α) be the maximum hitting time for the kernel g_L. Then there exists a universal constant 0 < C < ∞, not depending on α, so that
t_H(α) ≤ ℓ_H(α) ≤ C t_H(α).

Proof. Fix A ∈ B[X] with π(A) ≥ α and a starting point x ∈ X, let {X_t}_{t∈N} ∼ g with X_0 = x, and let L be the random function from Equation (A.27). Then the chain {Y_t}_{t∈N} given by the formula Y_t = X_{L(t)} satisfies Y_0 = x and {Y_t}_{t∈N} ∼ g_L. Let τ′_A be the first hitting time of A for the chain {Y_t}. We observe that
τ′_A = min{t : Y_t ∈ A} = min{t : X_{L(t)} ∈ A} = min{t : L(t) = τ_A}.
Setting the notation
L_inv(t) = min{s : L(s) = t},   (A.10)
this implies τ′_A = L_inv(τ_A). This immediately implies τ′_A ≥ τ_A, which in turn gives the first inequality in the statement of the lemma. On the other hand, by Markov's inequality, we can bound the tail of L_inv. Let P be a product measure that generates {g(x, 1, ·)}_{x∈X}. By Lemma 17, we obtain a geometric tail bound for τ_A. Let τ′_g(α) denote the large hitting time for {g_L(x, 1, ·)}_{x∈X}. Then we have τ′_g(α) ≤ 150 τ_g(α). By Lemma 4, there exists a universal constant C, not depending on α, such that ℓ_H(α) ≤ C t_H(α).
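The time change L and its inverse L_inv used in this proof can be simulated directly. A minimal sketch with illustrative names, assuming each lazy step advances the underlying chain with probability 1/2:

```python
import numpy as np

def lazy_time_change(T, rng):
    """Sample the random time change L: at each lazy step, L increases by one
    with probability 1/2 (a real move of the underlying chain) and stays put
    otherwise, so that Y_t = X_{L(t)} is distributed as the lazy chain.
    Returns L(0), L(1), ..., stopping once L reaches T."""
    L = [0]
    while L[-1] < T:
        L.append(L[-1] + int(rng.integers(2)))
    return L

def L_inv(L, t):
    """L_inv(t) = min{s : L(s) = t}  (Equation (A.10))."""
    return next(s for s, v in enumerate(L) if v == t)

rng = np.random.default_rng(0)
L = lazy_time_change(10, rng)
# The lazy chain first hits A at time L_inv(tau_A), which is at least tau_A
# since L(s) <= s for every s:
assert all(L_inv(L, t) >= t for t in range(11))
```

Since L(s) ≤ s deterministically, τ′_A = L_inv(τ_A) ≥ τ_A is immediate, while the upper bound on ℓ_H comes from the concentration of L_inv(t) around 2t.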
We quote the following lemma from [7].
Lemma 20 (Comparison of Lazy Mixing Times). Let {g(x, 1, ·)}_{x∈X} be a transition kernel on a σ-compact metric state space X with stationary distribution π. Let {g_L(x, 1, ·)}_{x∈X} be its associated lazy transition kernel as defined in Definition 2. For every 0 < ε < 1/2, let t_m(ε) be the mixing time for the kernel {g(x, 1, ·)}_{x∈X}, and let t_L(ε) be the mixing time for the lazy kernel g_L. Then there exists a constant 0 < C_ε < ∞, depending only on ε, so that
t_L(ε) ≤ C_ε t_m(ε).   (A.14)

Remark 7. In contrast to Lemma 18, the reverse inequality is not true; t_L may be much smaller than t_m.

and similarly for g′. Next, set t_0 = 4 t_m. We have
‖π − π′‖ = ‖π(·) − g′(π′, t_0, ·)‖ ≤ ‖π(·) − g(π′, t_0, ·)‖ + ‖g(π′, t_0, ·) − g′(π′, t_0, ·)‖.
Thus, ‖π − π′‖ can be bounded by the two terms above. We now prove our main bound. For all x ∈ X, we have a chain of inequalities in which the inequality in the second-last line is Inequality (A.19), the first inequality in the final line comes from assumption (A.17), and the second inequality in that line follows from 1 and 2. This shows that t′_m ≤ 4 t_m, proving the first half of the inequality in the statement of the lemma.
However, we note that the inequality t′_m ≤ 4 t_m, combined with (A.15), shows that the analogous bound holds with the roles of g and g′ swapped. This is exactly the weakened assumption (A.17) for the pair (g′, g), and so we conclude that t_m ≤ 4 t′_m as well.
Since A, x were arbitrary, we have τ′_g(α) ≤ 3 τ_g(α). By Lemma 4, the corresponding comparison of the maximal hitting times follows.

A.3. Properties of the Trace. We check that the "trace chain of the lazy chain" and the "lazy chain of the trace chain" are the same. To help, we introduce some notation. Define T_L to be the map that takes a kernel g to the lazy version g_L defined in Definition 2. For a set S, define T^(S) to be the map that takes a kernel g to the trace g^(S) of g on S defined in Definition 9. We then have:

Lemma 23. For any kernel g and any set S, the maps commute: T_L(T^(S)(g)) = T^(S)(T_L(g)).

Proof. We will actually prove a slightly stronger statement by constructing Markov chains from the relevant kernels on the same probability space. Fix x ∈ X and let {X_t}_{t∈N} ∼ g with starting point X_0 = x. We recall that the "trace" process is defined in Definition 9 by transforming the time-coordinate of {X_t}_{t∈N} according to the sequence of "entrance times" {η_i}_{i∈N}. In Remark 3, we pointed out that the "lazy" kernel from Definition 2 can be defined in terms of a similar transformation of the time-coordinate, using a sequence {ζ_i}_{i∈N} of i.i.d. geometric random variables with mean 2. Since both operations transform the time-coordinate alone, they can be applied in either order with the same result.

We also show that the trace inherits the strong Feller property:

Proof of Lemma 11. Consider two starting points x, y ∈ S. We will denote by {X_t}_{t∈N}, {Y_t}_{t∈N} two Markov chains with transition kernel g and starting points X_0 = x, Y_0 = y. It is straightforward to check that there exists a coupling of these two chains so that
P[X_1 ≠ Y_1] ≤ ‖g(x, 1, ·) − g(y, 1, ·)‖   (A.32)
and also that X_t = Y_t for all t after the first meeting time. We assume the chains are coupled in this way. Next, define the random times
η_x = min{t > 0 : X_t ∈ S},   η_y = min{t > 0 : Y_t ∈ S}.
Since g satisfies the strong Feller condition, this immediately implies that g^(S) does as well.

A.4. Mixing and Hitting Times of Skeleton Chains.
Proof of Lemma 14. Pick ε > 0 and k ∈ N. Recall that the function
t → sup_{x,y∈X} ‖g(x, t, ·) − g(y, t, ·)‖   (A.40)
is a non-increasing function of t ∈ N. Pick t_1 ∈ N such that t_1 ≥ (1/k) t_m(ε). Then
sup_{x,y∈X} ‖g^(k)(x, t_1, ·) − g^(k)(y, t_1, ·)‖ = sup_{x,y∈X} ‖g(x, k t_1, ·) − g(y, k t_1, ·)‖ ≤ ε,
since k t_1 ≥ t_m(ε). This gives t_m^(k)(ε) ≤ ⌈t_m(ε)/k⌉.

Lemma 25. Let A be the collection of transition kernels of the form given in Equation (7.2) with finite mixing time, and for which q_x(y) is uniformly continuous jointly in x, y. Then, for all 0 < C < ∞, there exists a universal constant K_C so that all {g(x, 1, ·)}_{x∈X} ∈ A are (k, C)-almost strong Feller for all k ≥ K_C min(γ^{−1}, t_L) log(t_L).
Proof. We mimic the proof of Lemma 16. Throughout this proof, we denote by g a generic element of A and use notation from Definition 10 freely. Fix δ > 0 and let x be a point satisfying g(x, 1, {x}^c) ≤ γ + δ. Then for all t ∈ N, we have
‖g(x, t, ·) − π‖ ≥ g(x, t, {x}) ≥ (1 − γ − δ)^t.
By the representation for the lazy chain in Lemma 3, the analogous lower bound holds for g_L. Since we assume that t_m < ∞, this implies we must have γ > 0. Next, denote by k ∈ N a constant to be fixed later. Let {X_t}_{t∈N} ∼ g, and let L be the (random) function from Equation (A.27), so that {X_{L(t)}}_{t∈N} ∼ g_L.
For this choice, define the event appearing in (B.1). Comparing this to Inequality (B.2), we conclude (by the same argument as at the end of the proof of Lemma 16) that there exists some particular choice K = K_C ∈ N depending only on C so that p ≤ 1/(C t_L) for all k ≥ K_C min(γ^{−1}, t_L) log(t_L). Next, we observe that G_1 is reversible. From its definition, we see that G_2 is the kernel of the "trivial" Markov chain that never moves (that is, G_2(x, 1, A) is just the indicator function of the set {x ∈ A}). Thus, G_2 is reversible for any measure, and in particular it is π-reversible. Since g_L^(k) is π-reversible and G_2 is π-reversible, this implies that G_1 is also π-reversible.
Finally, recall our assumption that q_x(y) is jointly uniformly continuous in its arguments and that ρ is continuous. This implies that the collection {g*(x, ·)}_{x∈X} of sub-probability measures given by the formula
g*(x, A) = g(x, 1, A \ {x})   (B.6)
satisfies the strong Feller condition, and thus that G_1 satisfies it as well. Hence, we have the desired result.