Non-reversible Metropolis-Hastings

The classical Metropolis-Hastings (MH) algorithm can be extended to generate non-reversible Markov chains. This is achieved by modifying the acceptance probability, using the notion of a vorticity matrix. Results from the literature on asymptotic variance, large deviations theory and mixing time are mentioned and, in the case of a large deviations result, adapted, to explain how non-reversible Markov chains have favorable properties in these respects. We provide an application of non-reversible Metropolis-Hastings (NRMH) in a continuous setting by developing the necessary theory and applying it, as first examples, to Gaussian distributions in three and nine dimensions. The empirical autocorrelation and estimated asymptotic variance for NRMH applied to these examples show significant improvement compared to MH with identical stepsize.


Introduction
The Metropolis-Hastings algorithm [MRR+53, Has70] is a Markov chain Monte Carlo method of profound importance to many fields of mathematics such as Bayesian inference and statistical mechanics [DSC98,Dia08,LPW09]. The applicability of Metropolis-Hastings to a particular computational problem depends on the mixing properties of the Markov chain that is generated by the algorithm. The chains generated by the classical Metropolis-Hastings algorithm are reversible, or, in other words, satisfy detailed balance; in fact, this reversibility is essential in showing that the resulting chains have the right invariant probability distribution.
However, non-reversible Markov chains can have better mixing properties. This has been shown experimentally in special cases [ST10,TCV11], theoretically in special cases [DHN00,Nea04], and also in general [SGS10]. In the latter paper it is shown that the asymptotic variance of a non-reversible chain is never larger than the asymptotic variance of the 'corresponding' reversible chain. There exist two basic approaches to constructing non-reversible chains from reversible chains: one can 'lift' the Markov chain to a larger state space [DHN00,Nea04,TCV11], or one can introduce non-reversibility without altering the state space [SGS10]. Other noteworthy publications on non-reversible Markov chains are [Wil99,GM00]. In this paper we propose a general method of the second type, for which non-reversible Metropolis-Hastings is a suitable name; there is no need to augment the state space. We restrict our attention to finite Markov chains; non-reversible Markov chains on general spaces will not be discussed in this paper, but it may be noted that the main result of this paper, discussed in Section 2, generalizes in a straightforward way to Markov chains on general state spaces.
In this paper Metropolis-Hastings is extended to 'non-reversible Metropolis-Hastings'. The main idea of this paper is discussed in Section 2. It is shown how Metropolis-Hastings, and in particular its acceptance probability, can be adjusted so that the resulting chain has a specified 'vorticity' and will therefore be non-reversible. Any Markov chain satisfying a symmetric structure condition can be constructed by non-reversible Metropolis-Hastings, which establishes the universality of the algorithm. In Section 3 it is shown that non-reversible Metropolis-Hastings can be seen as 'adding' non-reversibility to a reversible Markov chain, and therefore, by application of a result of [SGS10], has improved performance with respect to asymptotic variance, compared to classical Metropolis-Hastings. For the convenience of the reader two practical approaches to applying non-reversible Metropolis-Hastings are discussed in Section 4, along with some results that are aimed at understanding what a vorticity matrix is and how it can be constructed. For the particular example of a multi-modal distribution on the n-cycle, we illustrate the application and efficiency of non-reversible Metropolis-Hastings in Section 5. Finally, conclusions and directions of further research are discussed in Section 6.

Metropolis-Hastings generalized to non-reversible chains
As a preliminary to non-reversible Metropolis-Hastings, we require the notion of vorticity matrix, which is introduced in Section 2.1. The classical Metropolis-Hastings algorithm, discussed in Section 2.2, is extended using the notion of vorticity matrix to a non-reversible version in Section 2.3.

Vorticity matrices
Let P = P(x, y) denote a matrix of transition probabilities of a Markov chain on a finite state space S = {1, . . . , n}. A distribution on S is a vector with positive elements in R^n, and is not necessarily normalized, i.e. it is not necessarily the case that Σ_{x∈S} π(x) = 1. If π is a distribution such that Σ_{x∈S} π(x) = 1, then we call π a probability distribution. We will always assume that π(x) > 0 for all x ∈ S. A (probability) distribution π on S is said to be an invariant (probability) distribution of P if π^T P = π^T, i.e. Σ_{x∈S} π(x)P(x, y) = π(y) for all y ∈ S. A distribution π on S is said to satisfy the detailed balance condition with respect to P if diag(π)P = P^T diag(π), i.e. π(x)P(x, y) = P(y, x)π(y) for all x, y ∈ S. If there exists a distribution π on S that satisfies detailed balance with respect to P, then P is said to be reversible. As is well known, and straightforward to check, if π satisfies detailed balance with respect to P, then π is invariant for P.
To investigate possible generalizations of the detailed balance condition, let us look at the matrix Γ defined, for some distribution π, by
Γ(x, y) := π(x)P(x, y) − π(y)P(y, x), x, y ∈ S, (1)
or in matrix notation Γ = diag(π)P − P^T diag(π).
The following simple results are worth observing.
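Lemma 2.1. Let P be the transition matrix of a Markov chain on S and let π be a distribution on S. Let Γ be defined by (1). Then
(i) Γ is skew-symmetric, i.e. Γ = −Γ^T;
(ii) Γ = 0 if and only if π satisfies detailed balance with respect to P;
(iii) Γ𝟙 = 0 if and only if π is invariant for P, where 𝟙 denotes the vector in R^n with all entries equal to one.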
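Proof. (i) and (ii) are immediate. As for (iii), the x-th row sum of Γ is
Σ_{y∈S} Γ(x, y) = π(x) Σ_{y∈S} P(x, y) − Σ_{y∈S} π(y)P(y, x) = π(x) − Σ_{y∈S} π(y)P(y, x),
which is zero for all x if and only if π is invariant for P.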
In light of Lemma 2.1, a matrix Γ ∈ R^{n×n} which is skew-symmetric and satisfies Γ𝟙 = 0 (i.e. all row sums of Γ are zero, 𝟙 being the all-ones vector) is called a vorticity matrix. If Γ is related by (1) to a Markov chain P with invariant distribution π, it is called the vorticity of P and π. The vorticity is non-zero precisely when detailed balance fails for P and π, so it can be regarded as a quantitative description of non-reversibility. It will be a key ingredient in the construction of a non-reversible version of Metropolis-Hastings.
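As a small numerical illustration of definition (1), the following Python sketch computes the vorticity of a chain and checks the two defining properties of a vorticity matrix. The function names `vorticity` and `is_vorticity_matrix`, the zero-based state labels and the three-state example chain are assumptions introduced here for illustration only.

```python
def vorticity(pi, P):
    """Return Gamma(x, y) = pi(x) P(x, y) - pi(y) P(y, x) as a list of rows."""
    n = len(pi)
    return [[pi[x] * P[x][y] - pi[y] * P[y][x] for y in range(n)] for x in range(n)]


def is_vorticity_matrix(G, tol=1e-12):
    """Check skew-symmetry and zero row sums, up to a numerical tolerance."""
    n = len(G)
    skew = all(abs(G[x][y] + G[y][x]) <= tol for x in range(n) for y in range(n))
    zero_rows = all(abs(sum(row)) <= tol for row in G)
    return skew and zero_rows


# Hypothetical example: a biased walk on a 3-cycle.
# Its column sums are 1, so the (unnormalized) uniform distribution is invariant.
pi = [1.0, 1.0, 1.0]
P = [[0.0, 0.7, 0.3],
     [0.3, 0.0, 0.7],
     [0.7, 0.3, 0.0]]
Gamma = vorticity(pi, P)
```

Here `Gamma` is non-zero, reflecting that the walk prefers one direction around the cycle, while a chain satisfying detailed balance would give the zero matrix.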
Remark 2.2 (Bibliographical note). The application of matrices with the properties of Lemma 2.1 to Markov chains was previously considered in [SGS10].

Classical reversible Metropolis-Hastings
In the Metropolis-Hastings algorithm a reversible Markov chain P_0 with a given invariant distribution π is constructed. We will assume, mainly for simplicity, throughout this paper that π(x) > 0 for all x ∈ S. As an ingredient for the construction of P_0, a Markov chain Q is used, with the property that
Q(y, x) = 0 whenever Q(x, y) = 0, x, y ∈ S. (2)
In other words, whenever a transition from x to y has positive probability, the reverse transition also has positive probability. Condition (2) can be seen as a symmetric structure condition. The Hastings ratio R_0(x, y) is defined as
R_0(x, y) := π(y)Q(y, x) / (π(x)Q(x, y)), for all x, y ∈ S for which π(x)Q(x, y) ≠ 0. (3)
The value of R_0 at other values of x, y has no significance. With this definition of R_0, acceptance probabilities are defined as
A_0(x, y) := min(1, R_0(x, y)), (4)
and transition probabilities P_0(x, y) are defined by
P_0(x, y) := Q(x, y)A_0(x, y) for x ≠ y (with P_0(x, y) := 0 if Q(x, y) = 0), and P_0(x, x) := 1 − Σ_{y≠x} P_0(x, y). (5)
It is a straightforward exercise to show that the chain P_0 has π as its invariant distribution. An important step is the observation that R_0(x, y) ≤ 1 if and only if R_0(y, x) ≥ 1, which will be a recurring phenomenon in the sequel.

Generalization to non-reversible Metropolis-Hastings
We will now discuss how this framework can be extended to construct Markov chains that are, in general, non-reversible. Let Γ ∈ R^{n×n} be a vorticity matrix, i.e. Γ is skew-symmetric and satisfies Γ𝟙 = 0, and let Q be the transition matrix of a Markov chain satisfying (2). Again, π : S → (0, ∞) is some distribution that is not necessarily normalized, but that has only positive entries. We define for x, y ∈ S the non-reversible Hastings ratio as
R_Γ(x, y) := (Γ(x, y) + π(y)Q(y, x)) / (π(x)Q(x, y)), if π(x)Q(x, y) ≠ 0, (6)
and let, as before, the acceptance probabilities A_Γ be
A_Γ(x, y) := min(1, R_Γ(x, y)). (7)
Entries of Γ can be negative. In order to avoid the situation that A_Γ becomes negative, we will explicitly constrain the vorticity matrix Γ to satisfy
Γ(x, y) ≥ −π(y)Q(y, x) for all x, y ∈ S. (8)
In particular, by the symmetric structure condition (2), Γ should have zeroes wherever Q has zeroes. As with Metropolis-Hastings, the transition probabilities P_Γ(x, y) are defined by
P_Γ(x, y) := Q(x, y)A_Γ(x, y) for x ≠ y (with P_Γ(x, y) := 0 if Q(x, y) = 0), and P_Γ(x, x) := 1 − Σ_{y≠x} P_Γ(x, y). (9)
Indeed P_Γ is a matrix of transition probabilities. For Γ = 0, A_Γ and therefore P_Γ reduce to A_0 and P_0, so that the chosen notation is consistent.
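The construction of P_Γ from π, Q and Γ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `nrmh_matrix`, the zero-based state labels, the numerical tolerance and the three-state example are all introduced here and do not come from the text.

```python
def nrmh_matrix(pi, Q, Gamma, tol=1e-12):
    """Transition matrix of non-reversible Metropolis-Hastings on a finite state space."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y or Q[x][y] == 0.0:
                continue  # no move proposed from x to y
            # Constraint (8): the numerator of the Hastings ratio must be non-negative.
            if Gamma[x][y] < -pi[y] * Q[y][x] - tol:
                raise ValueError("Gamma violates Gamma(x, y) >= -pi(y) Q(y, x)")
            ratio = (Gamma[x][y] + pi[y] * Q[y][x]) / (pi[x] * Q[x][y])
            P[x][y] = Q[x][y] * min(1.0, max(0.0, ratio))
        # The diagonal entry absorbs the remaining probability mass.
        P[x][x] = 1.0 - sum(P[x])
    return P


# Hypothetical example: unnormalized target on three states, symmetric proposal,
# and a vorticity matrix supported on the 3-cycle 0 -> 1 -> 2 -> 0.
pi = [1.0, 2.0, 3.0]
Q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
c = 0.2
Gamma = [[0.0, c, -c],
         [-c, 0.0, c],
         [c, -c, 0.0]]
P = nrmh_matrix(pi, Q, Gamma)
```

In this example one can check by hand that Σ_x π(x)P(x, y) = π(y) for each y and that π(x)P(x, y) − π(y)P(y, x) = Γ(x, y), in line with the result stated in this section. Passing the zero matrix for `Gamma` gives the classical Metropolis-Hastings chain.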
In order to check that the proposed Markov chain has π as its invariant distribution, we need to verify that Γ, π and P_Γ are related through (1). As a crucial step, we employ the following lemma, in analogy with Metropolis-Hastings.
Lemma 2.3. Let Γ be a vorticity matrix, Q a matrix of transition probabilities satisfying (2), and π a distribution that is nowhere zero, such that (8) holds. Let R_Γ be as above. Then R_Γ(y, x) ≥ 1 whenever R_Γ(x, y) ≤ 1, for any x, y ∈ S for which Q(x, y) ≠ 0.
Using the previous lemma, it is now straightforward to show that Γ is the vorticity matrix of (P_Γ, π).
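For completeness, the two steps can be written out as follows. This is a sketch of one possible argument, using only the definitions of R_Γ, A_Γ and P_Γ given above together with skew-symmetry of Γ and positivity of π(x)Q(x, y) and π(y)Q(y, x); it is not claimed to be the proof given in the original text.

```latex
% Step 1 (Lemma 2.3): for x \neq y with Q(x,y) \neq 0,
\begin{align*}
R_\Gamma(x,y) \le 1
  &\iff \Gamma(x,y) + \pi(y)Q(y,x) \le \pi(x)Q(x,y) \\
  &\iff \pi(y)Q(y,x) \le \Gamma(y,x) + \pi(x)Q(x,y)
     && \text{since } \Gamma(y,x) = -\Gamma(x,y) \\
  &\iff R_\Gamma(y,x) \ge 1 .
\end{align*}
% Step 2: if R_\Gamma(x,y) \le 1, then A_\Gamma(x,y) = R_\Gamma(x,y)
% and A_\Gamma(y,x) = 1, so that
\begin{align*}
\pi(x)P_\Gamma(x,y) - \pi(y)P_\Gamma(y,x)
  &= \pi(x)Q(x,y)\,R_\Gamma(x,y) - \pi(y)Q(y,x) \\
  &= \Gamma(x,y) + \pi(y)Q(y,x) - \pi(y)Q(y,x)
   = \Gamma(x,y) .
\end{align*}
% The case R_\Gamma(x,y) \ge 1 follows by exchanging the roles of x and y
% and using skew-symmetry once more.
```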
Finally, since by assumption Γ𝟙 = 0, Lemma 2.1 (iii) gives that π is invariant for P_Γ. We have obtained our main result.
Theorem 2.5. Let Q be a Markov chain, Γ a vorticity matrix, and π a distribution on S, such that (2) and (8) are satisfied. Let P_Γ be defined through (6), (7) and (9). Then P_Γ has π as invariant distribution and Γ as its vorticity matrix.
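In a sampling context one would typically not form P_Γ explicitly but simulate the chain step by step: propose y from the row Q(x, ·) and accept with probability A_Γ(x, y). The Python sketch below illustrates this under stated assumptions; the names `nrmh_step` and `run_chain`, the zero-based state labels, the fixed seed and the example data are hypothetical and serve only as an illustration.

```python
import random


def nrmh_step(x, pi, Q, Gamma, rng):
    """One transition from state x: propose from Q(x, .), accept with probability A_Gamma."""
    n = len(pi)
    y = rng.choices(range(n), weights=Q[x])[0]
    if y == x:
        return x  # a proposed self-transition leaves the state unchanged
    ratio = (Gamma[x][y] + pi[y] * Q[y][x]) / (pi[x] * Q[x][y])
    return y if rng.random() < min(1.0, ratio) else x


def run_chain(x0, steps, pi, Q, Gamma, seed=0):
    """Simulate the chain and return the empirical state frequencies."""
    rng = random.Random(seed)
    counts = [0] * len(pi)
    x = x0
    for _ in range(steps):
        x = nrmh_step(x, pi, Q, Gamma, rng)
        counts[x] += 1
    return [count / steps for count in counts]


# Hypothetical example data satisfying (2) and (8).
pi = [1.0, 2.0, 3.0]
Q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
c = 0.2
Gamma = [[0.0, c, -c],
         [-c, 0.0, c],
         [c, -c, 0.0]]
frequencies = run_chain(0, 100_000, pi, Q, Gamma)
```

For a long run the empirical frequencies should be close to the normalized target π(x)/Σ_y π(y), here (1/6, 1/3, 1/2). Note that only ratios involving π enter, so the normalizing constant is never needed.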
The following observation serves to indicate the generality of this approach. It asserts that every Markov chain satisfying (2) may be built, in a trivial way, by the described procedure.
Proposition 2.6. Let P be a Markov chain with invariant distribution π and corresponding vorticity Γ, satisfying (2). Then, for Q = P and P_Γ as defined through (6), (7), and (9), we have P_Γ = P.
Proof. It suffices to note that, by Q = P and (1), R_Γ(x, y) = 1 and hence A_Γ(x, y) = 1 for all x, y ∈ S, x ≠ y, with P(x, y) ≠ 0.
Remark 2.7. If, for some pair (x, y) ∈ S × S, (8) holds with equality, then the transition probability P_Γ(x, y) = 0 even when Q(x, y) ≠ 0. Therefore irreducibility of Q does not imply irreducibility of P_Γ, unless we impose the stronger condition
Γ(x, y) > −π(y)Q(y, x) for all x, y ∈ S with Q(x, y) ≠ 0. (10)
The application of these results in practice will be discussed in Section 4. First we discuss a theoretical advantage of non-reversible Metropolis-Hastings.

Non-reversible chains constructed from reversible chains improve asymptotic variance
In this section a second method of constructing non-reversible chains is discussed (Section 3.1). This conceptually simple approach consists of adding vorticity to a reversible chain that already has the desired invariant distribution. In Section 3.2 it is explained how non-reversible Metropolis-Hastings is essentially equivalent to the construction of Section 3.1. The advantage of the second approach is that it allows, using a result of [SGS10], a detailed analysis of the asymptotic variance, which is a performance measure of the MCMC method. Because of its importance, their analysis is repeated in some more detail in Section 3.3. In particular, it is shown that adding vorticity to a reversible chain never increases the asymptotic variance, and strictly decreases it for some functions. Translating this result back to non-reversible Metropolis-Hastings gives us a way of incorporating non-reversibility in Metropolis-Hastings with guaranteed improved performance; this observation will play a key role in 'Approach (ii)' of Section 4.

Adding vorticity to a reversible chain
Suppose K is a reversible chain with respect to the distribution π and let Γ be a given vorticity matrix. Define
P := K + ½ diag(π)^{-1} Γ, i.e. P(x, y) = K(x, y) + Γ(x, y)/(2π(x)). (11)
Then Γ is the vorticity matrix corresponding to π and P. Indeed,
diag(π)P − P^T diag(π) = diag(π)K + ½Γ − K^T diag(π) − ½Γ^T = Γ,
since K satisfies detailed balance and Γ is skew-symmetric. In order for (11) to have non-negative entries, and to preserve the irreducibility structure of K, we require that Γ(x, y) = 0 whenever K(x, y) = 0 and that
π(x)K(x, y) + ½Γ(x, y) > 0 for all x, y ∈ S with x ≠ y and K(x, y) ≠ 0, (12)
which by symmetry of diag(π)K is equivalent to the constraint Γ(x, y) > −2π(y)K(y, x) for all such x, y.
This condition should be compared to (10). Since Γ𝟙 = 0, the rows of P sum to one, so that P is a stochastic matrix, and π is invariant with respect to P. We see that in the construction (11) we have more freedom in choosing Γ, as the bound for Γ is a factor two larger. Essentially, what we have done is to construct diag(π)P from its symmetric part diag(π)K and its anti-symmetric part ½Γ.
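The construction of this section, adding the term Γ(x, y)/(2π(x)) to a reversible chain K, can be checked numerically. The Python sketch below is an illustration only; the function name `add_vorticity`, the zero-based state labels and the three-state example are assumptions made here. The matrix `K` is the classical Metropolis-Hastings chain for the unnormalized target (1, 2, 3) with a symmetric proposal.

```python
def add_vorticity(pi, K, Gamma):
    """Return P = K + (1/2) diag(pi)^{-1} Gamma, refusing negative entries."""
    n = len(pi)
    P = [[K[x][y] + Gamma[x][y] / (2.0 * pi[x]) for y in range(n)] for x in range(n)]
    if min(min(row) for row in P) < 0.0:
        raise ValueError("Gamma is too large: P has a negative entry")
    return P


# Hypothetical example: K is reversible with respect to pi,
# e.g. pi(0) K(0, 1) = 0.5 = pi(1) K(1, 0).
pi = [1.0, 2.0, 3.0]
K = [[0.0, 0.5, 0.5],
     [0.25, 0.25, 0.5],
     [1.0 / 6.0, 1.0 / 3.0, 0.5]]
c = 0.4
Gamma = [[0.0, c, -c],
         [-c, 0.0, c],
         [c, -c, 0.0]]
P = add_vorticity(pi, K, Gamma)
```

With the symmetric proposal used to build this `K`, constraint (8) of Section 2.3 would only allow c ≤ 0.5, whereas here any c up to 1 keeps P non-negative, which is the factor two mentioned above.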
Remark 3.1. This construction cannot be extended in a straightforward way to the case where the invariant distribution of the reversible chain K is not proportional to π.
Remark 3.2. This construction was observed earlier in [SGS10].

Equivalence with non-reversible Metropolis-Hastings
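Let H be an irreducible Markov chain satisfying the symmetric structure condition (2). Let π be a given distribution that is nowhere zero. Let K be the (reversible) Metropolis-Hastings chain defined by H and π, i.e.
K(x, y) := H(x, y) min(1, π(y)H(y, x) / (π(x)H(x, y))) if H(x, y) ≠ 0, and K(x, y) := 0 otherwise, (13)
for x, y ∈ S, x ≠ y, and with remaining values K(x, x) as usual determined by the requirement that K defines a Markov chain. Let Γ be a vorticity matrix, and define
P := K + ½ diag(π)^{-1} Γ. (14)
Lemma 3.3. Let P be as constructed above. The following are equivalent:
(i) P defines a valid transition matrix;
(ii) The following requirement on Γ, π and H is satisfied:
Γ(x, y) ≥ −2 min(π(x)H(x, y), π(y)H(y, x)) for all x, y ∈ S, x ≠ y. (15)
If (i) and (ii) hold, then P has invariant distribution π and vorticity matrix Γ.
Proof. If (i) holds, it is clear that π is invariant and Γ is its vorticity matrix (by reversibility of K).
(i) ⇒ (ii): By (13), π(x)K(x, y) = min(π(x)H(x, y), π(y)H(y, x)) for x ≠ y. If the entries of P are non-negative, then 0 ≤ π(x)P(x, y) = min(π(x)H(x, y), π(y)H(y, x)) + ½Γ(x, y), which is (15).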
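(ii) ⇒ (i): The non-negativity of the off-diagonal entries of P follows directly by reversal of the above argument, and P(x, x) = K(x, x) ≥ 0 because Γ(x, x) = 0. Furthermore P𝟙 = K𝟙 + ½ diag(π)^{-1} Γ𝟙 = 𝟙, so that the rows of P sum to one.
Proposition 3.4. Let Q be irreducible, and suppose Q satisfies the symmetric structure condition (2). Furthermore suppose that (8) holds together with the non-negativity condition
H(x, y) := Q(x, y) − Γ(x, y)/(2π(x)) ≥ 0 for all x, y ∈ S.
Let P be as constructed by (13), (14) from H, π and Γ. Then P coincides with P_Γ as constructed in Section 2.3 by (6), (7), and (9).
Proof. We compute, for x ≠ y with Q(x, y) ≠ 0,
π(x)P(x, y) = min(π(x)H(x, y), π(y)H(y, x)) + ½Γ(x, y) = min(π(x)Q(x, y), π(y)Q(y, x) + Γ(x, y)) = π(x)Q(x, y) min(1, R_Γ(x, y)) = π(x)P_Γ(x, y),
where the second equality uses π(x)H(x, y) = π(x)Q(x, y) − ½Γ(x, y) and π(y)H(y, x) = π(y)Q(y, x) + ½Γ(x, y). Both P and P_Γ have zeroes on the off-diagonal entries for which Q has zeroes. Since both P and P_Γ represent transition probabilities, this also fixes their diagonal elements, which concludes the proof.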
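The equivalence can be checked numerically on a small example. The Python sketch below assumes that the proposal chain Q and the chain H are related by H(x, y) = Q(x, y) − Γ(x, y)/(2π(x)), as stated in Remark 4.2; the function names `mh_matrix` and `nrmh_matrix`, the zero-based state labels and the example data are hypothetical.

```python
def mh_matrix(pi, H):
    """Classical Metropolis-Hastings chain K built from proposal H and target pi."""
    n = len(pi)
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and H[x][y] > 0.0:
                K[x][y] = min(pi[x] * H[x][y], pi[y] * H[y][x]) / pi[x]
        K[x][x] = 1.0 - sum(K[x])
    return K


def nrmh_matrix(pi, Q, Gamma):
    """Non-reversible Metropolis-Hastings chain built from proposal Q, target pi and Gamma."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and Q[x][y] > 0.0:
                # pi(x) Q(x, y) min(1, R) = min(pi(x) Q(x, y), Gamma(x, y) + pi(y) Q(y, x))
                P[x][y] = min(pi[x] * Q[x][y], Gamma[x][y] + pi[y] * Q[y][x]) / pi[x]
        P[x][x] = 1.0 - sum(P[x])
    return P


n = 3
pi = [1.0, 2.0, 3.0]
H = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
c = 0.4
Gamma = [[0.0, c, -c], [-c, 0.0, c], [c, -c, 0.0]]

# Route 1: reversible Metropolis-Hastings chain plus vorticity.
K = mh_matrix(pi, H)
P_sum = [[K[x][y] + Gamma[x][y] / (2.0 * pi[x]) for y in range(n)] for x in range(n)]

# Route 2: non-reversible Metropolis-Hastings with proposal Q = H + Gamma / (2 pi).
Q = [[H[x][y] + Gamma[x][y] / (2.0 * pi[x]) for y in range(n)] for x in range(n)]
P_nrmh = nrmh_matrix(pi, Q, Gamma)

max_diff = max(abs(P_sum[x][y] - P_nrmh[x][y]) for x in range(n) for y in range(n))
```

In this example the two routes agree up to rounding error, so `max_diff` is of the order of machine precision.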
Remark 3.5. Conversely, the construction of P as a sum of a reversible matrix K and a vorticity component ½ diag(π)^{-1} Γ can be considered to be a special case of non-reversible Metropolis-Hastings (as discussed in Section 2.3). This is an immediate consequence of Proposition 2.6.

Analysis of asymptotic variance of non-reversible chains
The main result of this section is based on [SGS10]. In Appendix A a self-contained version of the proof is provided.
Let K be a transition matrix of a reversible Markov chain with invariant probability distribution µ, and let Γ be a vorticity matrix. Let P = K + ½ diag(µ)^{-1} Γ and suppose P is the transition matrix of an irreducible Markov chain. As observed before, P has µ as its invariant probability distribution. Let f : S → R and write f̄ = f − µ(f), where µ(f) := Σ_{x∈S} f(x)µ(x). We consider the partial sums of f̄(X^x_t), where the chain (X^x_t) has starting position X^x_0 = x ∈ S. By the ergodic theorem,
(1/T) Σ_{t=1}^{T} f(X^x_t) → µ(f) almost surely as T → ∞,
for any x ∈ S. We will use the notation
(f, g)_µ := Σ_{x∈S} f(x)g(x)µ(x)
for the inner product, weighted by µ, of functions f, g : S → R. We write E for the Dirichlet form corresponding to P, a quadratic form on such functions. The following result is known as the Central Limit Theorem for Markov chains [MT93, Theorem 17.4.4].
Proposition 3.6. Suppose P is irreducible. Let g be a solution of the Poisson equation for P and f, i.e. (I − P)g = f − µ(f), and let σ²_f := E(g, g). Then, for any x ∈ S,
(1/√T) Σ_{t=1}^{T} f̄(X^x_t) converges in distribution to N(0, σ²_f) as T → ∞.
We call σ²_f the asymptotic variance of f with respect to P. Let σ²_{Γ,f} denote the asymptotic variance of f with respect to P, and σ²_{0,f} the asymptotic variance of f with respect to K. The following theorem may be found in Appendix A as Corollary A.5.
Theorem 3.7. Suppose Γ is a vorticity matrix that is not identically zero, and such that P as defined above is a valid transition matrix. Then for all f : S → R, we have σ²_{Γ,f} ≤ σ²_{0,f}, and there exists an f : S → R such that σ²_{Γ,f} < σ²_{0,f}.
In Section 4.2 ('Approach (ii)') we discuss the application of this result to the construction of non-reversible Metropolis-Hastings schemes with a guaranteed improvement of asymptotic variance.
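The comparison of asymptotic variances can be illustrated numerically by solving the Poisson equation on a small state space. The Python sketch below makes several assumptions that are not taken from the text: it evaluates the textbook expression 2(f̄, g)_µ − (f̄, f̄)_µ for the limiting variance (this differs from (f̄, g)_µ only by a positive factor and a term independent of Γ, so the ordering of chains is unaffected), it obtains the mean-zero solution g by solving (I − P + 𝟙µ^T)g = f̄, and the names `solve` and `asymptotic_variance` as well as the example data are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def asymptotic_variance(mu, P, f):
    """Evaluate 2 (fbar, g)_mu - (fbar, fbar)_mu with g the mean-zero Poisson solution."""
    n = len(mu)
    mean = sum(mu[x] * f[x] for x in range(n))
    fbar = [f[x] - mean for x in range(n)]
    # (I - P + 1 mu^T) g = fbar forces mu(g) = 0 and (I - P) g = fbar.
    A = [[(1.0 if x == y else 0.0) - P[x][y] + mu[y] for y in range(n)] for x in range(n)]
    g = solve(A, fbar)
    inner_fg = sum(mu[x] * fbar[x] * g[x] for x in range(n))
    inner_ff = sum(mu[x] * fbar[x] ** 2 for x in range(n))
    return 2.0 * inner_fg - inner_ff


n = 3
mu = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 2.0]           # normalized target
K = [[0.0, 0.5, 0.5],                             # reversible with respect to mu
     [0.25, 0.25, 0.5],
     [1.0 / 6.0, 1.0 / 3.0, 0.5]]
c = 0.1
Gamma = [[0.0, c, -c], [-c, 0.0, c], [c, -c, 0.0]]
P = [[K[x][y] + Gamma[x][y] / (2.0 * mu[x]) for y in range(n)] for x in range(n)]

f = [1.0, 0.0, 0.0]                               # indicator of state 0
var_reversible = asymptotic_variance(mu, K, f)
var_nonreversible = asymptotic_variance(mu, P, f)
```

For this example the non-reversible chain gives the smaller value, consistent with Theorem 3.7.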

User's guide to non-reversible Metropolis-Hastings
In this section we discuss the application of the theory of the previous sections to sampling problems. In a typical problem we are given a target distribution π : S → [0, ∞) that is not necessarily normalized. To apply non-reversible Metropolis-Hastings, we have to combine two elements:
• an irreducible Markov chain on S with transition matrix Q satisfying the symmetric structure condition (2);
• a vorticity matrix Γ that is consistent with Q, in the sense that (10) is satisfied.¹
There are two different approaches to finding a compatible combination of Q and Γ:
(i) Start with a non-reversible irreducible proposal chain Q, of which we know an invariant distribution (which does not need to be normalized).
(ii) Start with any proposal chain H and vorticity matrix Γ_0 and combine these in a theoretically 'optimal' way.
The two approaches will be discussed in Sections 4.1 and 4.2, respectively. Approach (ii) may be considered the most favourable approach. This is because this second approach uses the analysis of Section 3.3 to have a guaranteed improvement of asymptotic variance. Also, in a numerical example, discussed in Section 5, approach (ii) seems to outperform approach (i) in many cases (but not all!) with respect to mixing time and spectral gap.
As an important tool required for method (ii), but essentially for both methods, it is useful to understand, construct and manipulate vorticity matrices. Therefore we discuss the structure of the set of vorticity matrices in more detail in Section 4.3.
1 The reader is reminded that (8) guarantees that the resulting chain P constructed by the non-reversible Metropolis-Hastings scheme is a proper transition matrix, and the slightly stronger condition (10) ensures that P inherits the irreducibility of Q.
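In Approach (i) the proposal chain Q has a known invariant distribution ρ, and hence a vorticity Γ_Q := diag(ρ)Q − Q^T diag(ρ); as vorticity matrix for non-reversible Metropolis-Hastings we take Γ = αΓ_Q for some α ≥ 0. Since Γ_Q(x, y) ≥ −ρ(y)Q(y, x), we see that a sufficient condition in order for (8) to hold is
0 ≤ α ≤ min_{y∈S} π(y)/ρ(y).
We may obtain a better bound by using the ratio
θ := max { ρ(y)Q(y, x) / (ρ(x)Q(x, y)) : x, y ∈ S, Q(x, y) ≠ 0 } ≥ 1,
where equality holds only in case (Q, ρ) is reversible. Then
Γ_Q(x, y) = ρ(x)Q(x, y) − ρ(y)Q(y, x) ≥ −(1 − 1/θ) ρ(y)Q(y, x),
so that a sufficient condition for (8) to hold is
0 ≤ α ≤ (θ/(θ − 1)) min_{y∈S} π(y)/ρ(y)
for θ > 1. The case where θ = 1 corresponds to reversibility of (Q, ρ); in this case Γ_Q = 0 so that the choice of α is of no significance.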
Remark 4.1. If the proposal chain Q is reversible, then by definition ρ(x)Q(x, y) = ρ(y)Q(y, x) for some distribution ρ that is everywhere non-zero. Choosing Γ to be the vorticity of the proposal chain would provide a reversible Metropolis-Hastings chain, since Γ = 0 in this case.

Approach (ii): Guaranteed improvement of asymptotic variance
Corollary A.5 provides us with a practical approach to the construction of non-reversible Metropolis-Hastings schemes which are guaranteed to improve asymptotic variance.
Again we wish to sample from some distribution π (not necessarily normalized). Suppose we have at our disposal an irreducible chain with transition matrix H, satisfying the symmetric structure condition (2), and a vorticity matrix Γ_0. Suppose furthermore that (15) is satisfied for Γ = ηΓ_0, for some η ∈ R, η ≠ 0. In particular, it is required that Γ(x, y) = 0 whenever H(x, y) = 0 (or equivalently, by symmetric structure, H(y, x) = 0). In words: there is no vorticity possible along transitions that have zero probability under H. Then Q, defined by
Q(x, y) := H(x, y) + Γ(x, y)/(2π(x)), x, y ∈ S, (19)
is a valid transition matrix. We will take Q and Γ = ηΓ_0 as our proposal chain and proposal vorticity for non-reversible Metropolis-Hastings, respectively. By Proposition 3.4 the resulting transition matrix P_Γ coincides with the chain obtained by adding the skew-symmetric part Γ(x, y)/(2π(x)) to the reversible Metropolis-Hastings chain with matrix K(x, y), as given in (13). By Corollary A.5, this construction is guaranteed to have improved asymptotic variance relative to reversible Metropolis-Hastings with proposal chain H. Furthermore (23) tells us that, in order to have the best asymptotic variance, we should choose |η| as large as possible. The maximal value of η is determined by (15), i.e.
η_max = sup {η > 0 : η|Γ_0(x, y)| ≤ 2 min(π(x)H(x, y), π(y)H(y, x)) for all x, y ∈ S}. (20)
Remark 4.2. Of course the two approaches may lead to identical results, depending on the choice of initial data. Throughout the discussion in this and the previous sections, the use of the symbols Q, π and Γ is consistent. In Approach (i) we start with Q, and from this the matrix H of Approach (ii) may be computed as H(x, y) = Q(x, y) − Γ(x, y)/(2π(x)). Conversely, Q as defined in Approach (ii) from H, π, η and Γ_0, by (19), will have some invariant distribution ρ.
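The quantity η_max in (20) is straightforward to compute on a finite state space. The Python sketch below illustrates this together with the proposal chain Q = H + Γ/(2π) for a vorticity Γ = c_2 η_max Γ_0; the names `eta_max` and `proposal_from`, the zero-based state labels, the factor 0.9 and the example data are assumptions made for illustration.

```python
def eta_max(pi, H, Gamma0):
    """Largest eta with eta |Gamma0(x, y)| <= 2 min(pi(x) H(x, y), pi(y) H(y, x))."""
    n = len(pi)
    best = float("inf")
    for x in range(n):
        for y in range(n):
            if Gamma0[x][y] != 0.0:
                cap = 2.0 * min(pi[x] * H[x][y], pi[y] * H[y][x])
                best = min(best, cap / abs(Gamma0[x][y]))
    return best


def proposal_from(pi, H, Gamma):
    """Proposal chain Q(x, y) = H(x, y) + Gamma(x, y) / (2 pi(x))."""
    n = len(pi)
    return [[H[x][y] + Gamma[x][y] / (2.0 * pi[x]) for y in range(n)] for x in range(n)]


# Hypothetical example on three states.
pi = [1.0, 2.0, 3.0]
H = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
Gamma0 = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]

eta = eta_max(pi, H, Gamma0)
c2 = 0.9                                  # stay strictly inside the admissible range
Gamma = [[c2 * eta * g for g in row] for row in Gamma0]
Q = proposal_from(pi, H, Gamma)
```

Taking η equal to η_max exactly can set an entry of Q to zero and thereby break the symmetric structure condition (2), which is one reason to scale by a factor slightly below one in this sketch.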

Remarks on vorticity matrices
Identifying a suitable vorticity matrix Γ corresponding to a given proposal chain may in general pose a challenge. In this section we collect some results of independent interest that improve our understanding of vorticity matrices, and mention simple ways of constructing them. The results of Section 4.3 are not used in the remainder of this paper.
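One elementary way to produce a vorticity matrix, in the spirit of the cycle representation discussed in this section, is to place weight +1 along the directed edges of a cycle and −1 against them. The Python sketch below illustrates this; the function name `cycle_vorticity`, the zero-based vertex labels and the example cycle are assumptions made for the example.

```python
def cycle_vorticity(n, cycle):
    """Matrix with +1 along each directed edge of the cycle and -1 against it.

    `cycle` is a list of at least three distinct vertices in {0, ..., n-1};
    the last vertex is joined back to the first.
    """
    if len(cycle) < 3 or len(set(cycle)) != len(cycle):
        raise ValueError("need at least three distinct vertices")
    G = [[0.0] * n for _ in range(n)]
    k = len(cycle)
    for i in range(k):
        a, b = cycle[i], cycle[(i + 1) % k]
        G[a][b] += 1.0
        G[b][a] -= 1.0
    return G


# Hypothetical example: the cycle 0 -> 2 -> 3 -> 0 inside a five-vertex graph.
G = cycle_vorticity(5, [0, 2, 3])
```

Each vertex on the cycle has exactly one outgoing and one incoming cycle edge, so every row sums to zero. Since the set of vorticity matrices is a linear space, linear combinations of such cycle matrices are again vorticity matrices.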
A cycle in a graph G = (V, E) with n vertices may be represented by a matrix Γ having the following properties:  This is made precise by the following Proposition.

Proposition 4.3. Let G be a graph over n vertices and let Γ satisfy the conditions (i)-(iv) above. Then there exists a cycle, i.e. a non-empty sub-graph
there exists a non-empty cycle C = (V_C, E_C), then there exists a matrix Γ satisfying (i)-(iv), and
Proof. "⇒": Since Γ ≠ 0 and Γ is skew-symmetric, there exists a pair (i_0, i_1), i_1 ≠ i_0, such that Γ(i_0, i_1) > 0. Since Γ is skew-symmetric, Γ(i_1, i_0) < 0, and because rows sum to zero, there must be a positive element on row i_1. Suppose this is at position (i_1, i_2). Again i_2 ≠ i_1. We may repeat this procedure until we encounter a node i_k that we already obtained. If this vertex is i_0, we are done. If this vertex is i_l = i_k for some 0 < l < k − 1 (note l = k is impossible by skew-symmetry), we obtain a cycle {x_{i_l}, x_{i_{l+1}}, . . . , x_k} with the required properties by removing vertices i_0, . . . , i_{l−1}.
"⇐": Let Γ(i, j) = 1 and Γ(j, i) = −1 whenever there is an edge from i to j in the directed cycle x_0, x_1, . . . , x_k, and Γ(i, j) = 0 otherwise. For any i, Σ_{j=1}^{n} Γ(i, j) = ♯{edges out of i} − ♯{edges into i} = 0, so that (ii) is satisfied. The other conditions (i), (iii), (iv) are clearly satisfied.
Remark 4.5 (Construction from two Markov chains). Let P, Q be two Markov chains, both with invariant distribution π and vorticity matrices Γ_P and Γ_Q, respectively. Then R := PQ defines the transition matrix of a Markov chain that also has π as its invariant distribution. We compute
Γ_R = diag(π)PQ − Q^T P^T diag(π) = diag(π)[P, Q] + Γ_Q P + Q^T Γ_P,
where [·, ·] denotes the usual matrix commutator, [A, B] = AB − BA.
In case P and Q are both reversible, then Γ_P = Γ_Q = 0, and Γ_R may be simplified to Γ_R = diag(π)[P, Q].

In this section we apply the theoretical observations of the previous sections to the example of a multi-modal distribution on the n-cycle. The experimental set-up is discussed in Section 5.1 and the corresponding results in Section 5.2. A concise discussion of these results may be found in Section 5.3.

Experimental set-up
Let S = {1, . . . , n} with n > 2 odd (to avoid issues of periodicity). As a 'goal' distribution we consider the family of distributions π given by (21). Here M = 0, 1, 2, . . . is a parameter indicating the number of modes in the distribution, and β ≥ 0 is the 'inverse temperature' or steepness of the distribution. For n = 49, plots of π for some values of β and M are provided in Figure 1. We should determine which proposal chain Q and which vorticity matrix Γ we shall use in the non-reversible Metropolis-Hastings scheme. Together, π, Γ and Q completely determine the chain P by the construction described in Section 2.3. In order to determine suitable Q and Γ we will apply the two approaches discussed in Section 4.
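As reversible proposal chain H we take the symmetric nearest-neighbour walk on the cycle,
H(i, j) := ½ if j ≡ i ± 1 (mod n), and H(i, j) := 0 otherwise.
The linear space of compatible vorticity matrices is one dimensional and spanned by the matrix
Γ_0(i, j) := ½ if j ≡ i + 1 (mod n), Γ_0(i, j) := −½ if j ≡ i − 1 (mod n), and Γ_0(i, j) := 0 otherwise.
We now consider the two approaches discussed in Section 4.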
• Approach (i): As proposal chain Q for non-reversible Metropolis-Hastings we take Q = H + γΓ_0, so Γ_Q = γΓ_0. In words, the chain Q moves with probability ½(1 + γ) from node i to i + 1 for i < n, and from n back to 1, and with probability ½(1 − γ) in the reverse direction. Clearly Q has the uniform distribution as its invariant distribution, and the chain is reversible if and only if γ = 0. We see that γ is a 'measure of non-reversibility' of Q, or equivalently of the vorticity of Q. In order for the 'symmetric structure' requirement (2) to hold, we require γ < 1.
As discussed in Section 4.1 we use Q as proposal chain and Γ = αΓ_Q as vorticity matrix for non-reversible Metropolis-Hastings, with α = c_1 α_max for 0 < γ < 1 and 0 ≤ c_1 < 1 as motivated by Section 4.1, and where α_max is to be determined. If γ = 0, the value of α has no significance since the base chain Q is reversible and thus Γ_Q = 0.
We further take Γ = c_2 η_max Γ_0 as vorticity matrix in the non-reversible Metropolis-Hastings algorithm. The case c_2 = 0 corresponds to the case in which both the proposal chain and the resulting non-reversible Metropolis-Hastings chain are reversible.
The conditions 0 ≤ γ < 1 and 0 ≤ c_1 < 1 (for Approach (i)) and 0 ≤ c_2 ≤ 1 (for Approach (ii)) ensure that the resulting chain P is well-defined (in particular, that condition (8) is satisfied) and that it is irreducible.
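The proposal chain of Approach (i) can be written down directly from its verbal description. In the Python sketch below the entries ±½ of Γ_0 are inferred from the relation Q = H + γΓ_0 and the stated jump probabilities ½(1 ± γ); the function name `cycle_chains` and the use of states 0, . . . , n − 1 instead of 1, . . . , n are assumptions made for the example.

```python
def cycle_chains(n, gamma):
    """Return (H, Gamma0, Q) for the walk on the n-cycle with bias parameter gamma."""
    if n < 3 or not 0.0 <= gamma < 1.0:
        raise ValueError("need n >= 3 and 0 <= gamma < 1")
    H = [[0.0] * n for _ in range(n)]
    G0 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n            # successor of i on the cycle
        H[i][j] = H[j][i] = 0.5    # symmetric nearest-neighbour walk
        G0[i][j] = 0.5             # vorticity along the cycle ...
        G0[j][i] = -0.5            # ... and against it
    Q = [[H[i][j] + gamma * G0[i][j] for j in range(n)] for i in range(n)]
    return H, G0, Q


# Small hypothetical instance; the experiments in the text use n = 49.
H, G0, Q = cycle_chains(7, 0.3)
```

Every column of `Q` sums to one, which confirms that the uniform distribution is invariant for Q.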
To measure the performance of the different chains P that we generate using this scheme we consider two closely related quantities:
• The spectral gap σ := 1 − |λ_2|. Here the eigenvalues of P, λ_1, . . . , λ_n, are listed in order of descending absolute value and with multiplicities according to algebraic multiplicity. A large spectral gap indicates fast mixing;
• The ε-mixing time t_mix = t_mix(ε) := inf{t ∈ N : max_{x∈S} ||P^t(x, ·) − µ||_TV < ε}. Here µ(x) = π(x) / Σ_{y∈S} π(y) is the normalization of π and ||µ − ν||_TV := ½ Σ_{y∈S} |µ(y) − ν(y)| is the total variation distance of the discrete probability distributions µ and ν. The choice of ε is not of fundamental importance; we let ε = ¼. See [LPW09, Section 4.5].
We fix n = 49, choosing an odd value of n to avoid problems of periodicity for the case γ = 0. The choices of c_1, γ (Approach (i)) and c_2 (Approach (ii)) influence spectral gap and mixing time. In our experiments we have optimized over discrete values of these parameters. We make further observations on this dependence in Section 6 below.
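For small state spaces the ε-mixing time can be computed by brute force from powers of P. The Python sketch below does this with plain lists; the function name `mixing_time`, the cut-off `t_max` and the two-state example are assumptions made for illustration, and the search starts at t = 1.

```python
def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t >= 1 with max_x ||P^t(x, .) - mu||_TV < eps, or None if not reached."""
    n = len(pi)
    total = sum(pi)
    mu = [p / total for p in pi]          # normalization of pi
    Pt = [row[:] for row in P]            # current power P^t, starting at t = 1
    for t in range(1, t_max + 1):
        worst = max(0.5 * sum(abs(Pt[x][y] - mu[y]) for y in range(n)) for x in range(n))
        if worst < eps:
            return t
        Pt = [[sum(Pt[x][z] * P[z][y] for z in range(n)) for y in range(n)]
              for x in range(n)]
    return None


# Hypothetical two-state example: the worst-case distance after t steps is 0.5 * 0.8**t.
lazy = [[0.9, 0.1],
        [0.1, 0.9]]
t_mix = mixing_time(lazy, [1.0, 1.0])
```

Each multiplication costs on the order of n³ operations, which is unproblematic for n = 49 but would call for a linear algebra library on much larger state spaces.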

Experimental results
A natural question is for what shapes of invariant distributions π we can expect the largest speedup in sampling. To test this numerically, we varied parameters β and M in (21). For each value of (β, M) we optimized c_1, γ and c_2 over a discrete set of values with respect to (a) spectral gap and (b) mixing time. The results of the two approaches are plotted in Figure 2, and for some values of β and M the results are displayed in Table 1.
In Figure 3 the dependence of the spectral gap and the mixing time on n is displayed for different values of β, and M = 2.
Remark 5.1. The obtained optimal location of (c, γ) for the case of a flat distribution (β = 0 and/or M = 0), as given in Table 1, (c⋆, γ⋆) = (0.22, 0.11), is a true minimizer, in the sense that the values of t_mix(c, γ) and σ(c, γ) as functions of c and γ are certainly not flat.

Discussion of the experimental results
The following observations can be made from the numerical experiments. These are listed more or less in order of their importance.
• For a flat distribution (M = 0), a significant gain (of approximately factor 4) in both spectral gap and mixing time is already obtained. With just one mode (M = 1), there is some improvement, but relatively small. In this case, ordinary Metropolis-Hastings seems adequate. The largest improvement is found for two modes (M = 2). In this regime, at steepness β = 4, dramatic gains in mixing time and spectral gap are obtained. For a larger number of modes (M > 2), there is significant improvement using non-reversible Metropolis-Hastings, although with decreasing effectiveness as a function of M (Figure 2).
• For a large range of values of β (0 ≤ β ≤ 3), the multi-modal cases have better mixing times with non-reversible Metropolis-Hastings than the uni-modal chains have with reversible Metropolis-Hastings (Figure 2 (b), (d), (f)).
• The absolute difference in spectral gap is largest for moderately steep distributions (β = 4).
• Overall, Approaches (i) and (ii) have very similar performance, with in most cases an improved performance of Approach (ii). Only for values of β ≪ 1, Approach (i) seems to outperform Approach (ii) (Figure 2).
• For Approaches (i) and (ii), the optimal values of c_1 and c_2 (respectively) are as close to 1 as possible (within the discrete set of choices) for values of β ≥ 2. For β < 2 lower values of c_1 and c_2 are optimal.
• To maximize spectral gap and/or minimize mixing time, the optimal value of γ in Approach (i) depends on the problem parameters β and M. The optimal value of γ seems to be attained at values strictly less than 1, which are increasing in β.

Discussion
The theoretical discussion of Section 3 and the numerical experiment of Section 5 illustrate a great gain by employing non-reversible Metropolis-Hastings. This encourages one to apply these results to other settings. The practical application of non-reversible Metropolis-Hastings depends on the identification of suitable vorticity matrices Γ that are compatible with proposal chains Q, and establishing these in practical examples provides a promising direction of research. We are currently working on the application of non-reversible Metropolis-Hastings to spin systems. The challenge there is to identify suitable vorticity structures that are compatible with a proposal chain that performs single bit flips, and to investigate which of these structures provide the best MCMC performance. There is much that can be said, and is still being researched, about this example. Therefore we choose to postpone discussion of this example to a second paper.
As another application of non-reversible Metropolis-Hastings one can think of continuous spaces. It is straightforward to transfer the principle of non-reversible Metropolis-Hastings to general measurable spaces. As an application, one can think of the following. Suppose we wish to sample from a given distribution by means of some Langevin equation. In practical sampling algorithms, time is discretized, which introduces errors into the sampling method [RT96]. This should be corrected by performing a Metropolis-Hastings accept/reject step, but this destroys the possibly advantageous non-reversible nature of the Markov diffusion process. Here non-reversible Metropolis-Hastings may well be advantageous. This promising idea will be the subject of future research.
Theoretical results on convergence to equilibrium of reversible chains are numerous. For non-reversible chains this topic is still a great challenge. One point in this context that deserves attention is the following. One can in principle apply the theory of Dirichlet forms as discussed e.g. in [Fil91] and reduce the problem of estimating the mixing time of a non-reversible chain P to estimating the mixing time of the reversible chain P P̃, where P̃ is the time reversal of P. However, it is our experience that the obtained estimates do not capture the improvements that can be obtained by employing non-reversible chains. As far as we are aware, it remains a challenging open problem to provide, even for simple examples, estimates on mixing times that capture the difference between reversible and non-reversible chains.

A Proof of Theorem 3.7
In this section we provide the proof of Theorem 3.7, based on the analysis in [SGS10]. Let L²(µ) denote the linear space of functions h : S → R and let L²_0(µ) denote the linear subspace of functions h : S → R satisfying µ(h) = 0, where both spaces are endowed with the inner product (·, ·)_µ as defined above. For h : S → R, write h̄ = h − µ(h) ∈ L²_0(µ) and var(h) := Σ_{x∈S} µ(x)(h(x) − µ(h))² = (h̄, h̄)_µ. The following lemma collects a few facts about the solution g of the Poisson equation, (I − P)g = f̄, and E(g, g). We will denote linear operators on L²_0(µ) by calligraphic script, e.g. A, G, to distinguish these from matrices, denoted by Roman script, e.g. K, P. This distinction will be of importance when considering inverses and adjoints. The adjoint operator of A in L²_0(µ) will be denoted by A⋆.
Lemma A.1. Suppose P is irreducible. Then (i) The linear mapping A : h → (I − P )h is invertible as a mapping from L 2 0 (µ) into itself.
(ii) For f : S → R, let g := Gf , so that g is a solution of the Poisson equation for P and f . Then Proof. (i) (I − P ) : L 2 0 (S) → L 2 0 (S). By irreducibility of P , the only functions h : S → R satisfying (I − P )h = 0 are the constant functions, so that I − P is injective and therefore invertible.
(ii) g ∈ L 2 0 (µ) and P g = g − f . Therefore From Lemma A.1 (iii), we see that the effect of the choice of P is, for fixed distribution µ, fully explained by the L 2 0 (µ)-self-adjoint part of G. The following lemma of [Mat92] will be helpful, as observed by [SGS10].
For A as defined in A.1, we have for the skew-adjoint and self-adjoint parts S and H, The adjoint of S in L²_0(µ) is then As an immediate corollary of Lemma A.2, we have the following result. We can now describe the main result of [SGS10] regarding asymptotic variance of non-reversible Markov chains. allows the matrix representation (I − K + 𝟙µ^T)^{-1}, and S and S⋆ have matrix representations ½ diag(µ)^{-1} Γ and ½ diag(µ)^{-1} Γ^T, respectively, resulting in (23) on L²_0(µ). Since Γ𝟙 = 0, it follows that (23) holds on L²(µ) = L²_0(µ) ⊕ span{𝟙} if and only if it holds on L²_0(µ). From the above argument we see immediately that, if for some h : S → R, then there exists an f : S → R so that σ²_{1,f} < σ²_{2,f}.
Corollary A.5. Suppose Γ_1 is a vorticity matrix that is not identically zero, and such that P_i as defined in Theorem A.4 is a valid transition matrix. Suppose furthermore that Γ_2 = 0. Then for all f : S → R, we have σ²_{1,f} ≤ σ²_{2,f}, and there exists an f such that σ²_{1,f} < σ²_{2,f}.