Metastability for general dynamics with rare transitions: escape time and critical configurations

Metastability is a physical phenomenon ubiquitous in first order phase transitions. A fruitful mathematical way to approach this phenomenon is the study of Markov chains with rare transitions. For Metropolis chains associated with Statistical Mechanics systems, this phenomenon has been described in an elegant way in terms of the energy landscape associated to the Hamiltonian of the system. In this paper, we provide a similar description in the general rare transitions setup. Besides their theoretical content, we believe that our results are a useful tool to approach metastability for non-Metropolis systems such as Probabilistic Cellular Automata.


Introduction
In this paper we are interested in the phenomenon of metastability for systems evolving according to transformations satisfying the thermodynamic law for small changes of the thermodynamical parameters. Metastability is a physical phenomenon ubiquitous in first order phase transitions. It is typically observed when a system is set up in a state which is not the most thermodynamically favored one and suddenly switches to the stable phase as a result of abrupt perturbations.
Although metastable states have been deeply studied from the physical point of view, fully rigorous mathematical theories based on a probabilistic approach have been developed only in the last three decades. We refer to [11] for a complete recent bibliography. Let us just stress that the three main points of interest in the study of metastability are the description of: (i) the first hitting time at which a Markov chain starting from the metastable state hits the stable one; (ii) the critical configurations that the system has to pass to reach the stable states; (iii) the tube of typical trajectories that the system follows on its transition to the stable state. These notions are central quantities of interest in many studies on metastability, which focus on proving convergence results in physically relevant limits, the most typical ones being the zero temperature limit and the infinite volume regime. In this paper, we focus on the finite volume and zero temperature limit setup.
The first mathematically rigorous results were obtained via the pathwise approach, which has been first developed in the framework of special models and then fully understood in the context of the Metropolis dynamics [7,22,24]. In this framework, the properties of the first hitting time to the stable states are deduced via large deviation estimates on properly chosen tubes of trajectories. A different point of view, the potential theoretical approach, has been proposed in [6] and is based on capacity-like estimates. We mention that a more recent approach has also been developed in [3,4].
Here we adopt the pathwise point of view and generalize the theory to the setup of general Freidlin-Wentzell Markov chains, also called Markov chains with rare transitions. For Metropolis chains associated to Statistical Mechanics systems and reversible with respect to the associated Gibbs measure, the metastability phenomenon can be described in an elegant and physically satisfactory way via the energy landscape associated with the Hamiltonian of the system [22,24]. In particular the time needed by the system to hit the stable state can be expressed in terms of the height of the most convenient path (that is the path with minimal energetic cost) that the system has to follow on its way along the energy landscape to the stable state. Moreover, the state of the system at the top of such a path is a gate configuration in the sense that, in the low temperature regime, the system necessarily has to go through it before hitting the stable state. This description is very satisfactory from the physical point of view since both the typical time that the system spends in the metastable state before switching to the stable one and the mechanism that produces this escape can be quantified purely through the energy landscape. Let us mention that a simplified pathwise approach was proposed in [19], where the authors disentangled the study of the first hitting time from the study of the set of critical configurations and of the tube of the typical trajectories.
In this paper we show that a similar physically remarkable description can be given in the general rare transitions (Freidlin-Wentzell) framework, when the invariant measure of the system is a priori not Gibbsian. In this setup the pathwise study of metastability has been approached with a different scheme in [23], where the physically relevant quantities describing the metastable state are computed via a renormalization procedure. Here we show that the strategy developed in [19] can be extended to this setup at the cost of a higher complexity of techniques. The way we proceed is to redefine the height of a path in terms of the exponential weight of the transition probabilities and of a function, the virtual energy, associated to the low temperature behavior of the invariant measure. In other words we reduce the pathwise study of metastability in the general rare transition case to the solution of a variational problem within the landscape induced by this notion of path height, using as a main tool the general cycle theory developed in [8,9]. We stress that, unlike the Metropolis case, this procedure cannot be carried out solely from the detailed analysis of the set of optimal paths, and that a finer description of the cycle landscape is needed to perform the analysis.
Besides their theoretical content, the main motivation of our results has been to provide a useful tool to approach metastability for a well known class of non-Metropolis systems, namely the Probabilistic Cellular Automata [15,14]. Indeed, in this case, it is possible to write the virtual energy in a rather simple way and then solve the difficult variational problems in the induced landscape [12,13,10].
The technical difficulties that we had to overcome are rather evident: giving a satisfactory mathematical description of metastability in a context where no Hamiltonian is available is a priori rather challenging. We overcame this difficulty using two key ideas.
First idea. In the seminal papers on the pathwise approach to metastability [22,24] results were proved via detailed probability estimates on suitably chosen tubes of trajectories. A simpler approach has been pointed out in [19], where, still in the framework of the Metropolis dynamics, the authors have shown that the main ingredient necessary to achieve the pathwise description of metastability is the classification of all the states of the system in a sequence of decreasing (for the inclusion) subsets of the state space, whose elements have increasing stability, in the sense that starting from any one of them the height that has to be bypassed to reach a lower energy level becomes increasingly higher. Moreover, the authors use in a crucial way a recurrence property stating that starting from any state, the process reaches one of these stability level sets within a time controlled exponentially by the stability level of the set itself. This is the point of view we also adopt in the present work.
Second idea. One of the key tools in the pathwise study of metastability is the notion of cycle. In the context of general Markov chains, a cycle can be thought of as a subset of the configuration states enjoying the following property: starting from anywhere within the cycle, with high probability the process visits all the states within the cycle before exiting the set itself. In the study of the metastable behavior of Metropolis chains a more physical definition of the notion of cycle was used: a cycle is a set of configurations such that starting from any of them any other can be reached by a path within the set with maximal energy height smaller than the minimal one necessary for the process to exit the set. In this paper, following [8], we use the fact that by defining the height of a path in terms of the virtual energy and of the exponential cost of transition, the two different approaches to cycles can be proven to be equivalent.
The paper is organized as follows. In Section 2 we describe our setup and state the main results. Section 3 is devoted to the discussion of the theory of cycles. In Section 4 we prove our main results. In Appendix A, we develop a condition under which the virtual energy is explicitly computable, and in Appendix B, we make a quick recap about the virtual energy.

Model and main results
In this section we introduce a general setup and state our main results on the metastable behavior of such a system. Then we describe in detail this behavior in terms of the virtual energy, which in this setup is the analogue of the Hamiltonian for Metropolis chains.

The Freidlin-Wentzell setup
In this paper we will deal with a finite state space Markov chain with rare transitions. We consider an arbitrary finite state space X .
Definition 2.1. A family of time homogeneous Markov chains (X n ) n∈N on X with transition probabilities p β indexed by a positive parameter β is said to "satisfy the Freidlin-Wentzell condition with respect to the rate function ∆" or "to have rare transitions with rate function ∆", where ∆ : X × X → R + ∪ {∞}, if and only if
$$\lim_{\beta\to\infty} -\frac{1}{\beta} \log p_\beta(x, y) = \Delta(x, y) \qquad (1)$$
for any x, y ∈ X .
The particular case where ∆(x, y) is infinite should be understood as the fact that, at low temperature, there is no transition possible between states x and y. In many papers, a connectivity matrix is introduced, that is a matrix whose non zero terms correspond to allowed jumps, see for instance [24][Condition R, Chapter 6].
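We also note that condition (1) is usually written explicitly; namely, for any γ > 0, there exists β 0 > 0 such that
$$e^{-\beta[\Delta(x, y) + \gamma]} \le p_\beta(x, y) \le e^{-\beta[\Delta(x, y) - \gamma]}$$
for any β > β 0 and any x, y ∈ X . See for instance [24][Condition FW, Chapter 6] where the parameter γ is assumed to be a function of β vanishing for β → ∞, so that in particular the Freidlin-Wentzell setup covers this case.
Remark 2.2. Two important examples of dynamics satisfying Definition 2.1 are the following.
1. The Metropolis algorithm with energy function U : X → R (see for instance [20]). It is the particular case where
$$\Delta(x, y) = \begin{cases} (U(y) - U(x))^{+} & \text{if } q(x, y) > 0, \\ +\infty & \text{otherwise,} \end{cases}$$
for any (x, y) ∈ X × X , where q is an irreducible Markov matrix X × X → [0, 1] which does not depend on β. We stress that the Metropolis algorithm itself is a general framework which has as stationary measure the Gibbs measure of models issued from Statistical Mechanics (see examples later).
2. Weak reversible dynamics with respect to the potential U : X → R or dynamics induced by the potential U : X → R. This is the case where the rate function ∆ is such that for any (x, y) ∈ X × X
$$U(x) + \Delta(x, y) = U(y) + \Delta(y, x) \qquad (4)$$
with the convention that +∞ + r = +∞ for any r ∈ R.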
Even if the Metropolis dynamics is an example of a potential induced dynamics, these models form a broader class in which other important examples are Probabilistic Cellular Automata, see [18,13,10] and the following Remark 2.5.
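The two examples of Remark 2.2 can be checked mechanically on a small state space. The following is a minimal sketch, assuming a symmetric connectivity structure; the three-state chain, the potential values and the helper names `metropolis_rate` and `is_weakly_reversible` are hypothetical and serve only as an illustration.

```python
import math

INF = math.inf

def metropolis_rate(U, edges):
    """Delta(x, y) = (U(y) - U(x))^+ if q(x, y) > 0, and +inf otherwise.

    `edges` lists the unordered pairs {x, y} with q(x, y) > 0 (q symmetric here).
    """
    delta = {(x, y): INF for x in U for y in U if x != y}
    for x, y in edges:
        delta[(x, y)] = max(U[y] - U[x], 0.0)
        delta[(y, x)] = max(U[x] - U[y], 0.0)
    return delta

def is_weakly_reversible(U, delta, tol=1e-12):
    """Check U(x) + Delta(x, y) = U(y) + Delta(y, x), with +inf + r = +inf."""
    for (x, y), d in delta.items():
        lhs, rhs = U[x] + d, U[y] + delta[(y, x)]
        # Two infinite sides are considered equal; one infinite side is a violation.
        if lhs != rhs and not abs(lhs - rhs) <= tol:
            return False
    return True

# Hypothetical three-state example: a - b - c, with b acting as a barrier.
U = {"a": 1.0, "b": 3.0, "c": 0.0}
delta = metropolis_rate(U, [("a", "b"), ("b", "c")])
reversible = is_weakly_reversible(U, delta)

# Raising a single rate breaks the balance between a and b.
perturbed = dict(delta)
perturbed[("a", "b")] = 5.0
still_reversible = is_weakly_reversible(U, perturbed)
```

In this toy example the Metropolis rate function satisfies the weak reversibility relation, while the perturbed one does not, which is consistent with the statement that Metropolis dynamics are a special case of potential induced dynamics.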
From now on, we will always consider the general case of a family of homogeneous Markov chains satisfying the condition in Definition 2.1.

Virtual energy
A fundamental notion for the physical approach of the problem of metastability in the setup of rare transitions chains is the notion of virtual energy, whose definition is based on the following result.
Proposition 2.3. Consider a family of Markov chains satisfying the Freidlin-Wentzell condition in Definition 2.1. For β large enough, each Markov chain is irreducible and its invariant probability distribution µ β is such that for any x ∈ X , the limit
$$\lim_{\beta\to\infty} -\frac{1}{\beta} \log \mu_\beta(x)$$
exists and is a nonnegative finite real number.
Definition 2.4. The limiting function
$$H(x) := \lim_{\beta\to\infty} -\frac{1}{\beta} \log \mu_\beta(x) \qquad (5)$$
for x ∈ X , is called virtual energy.
The proof of Proposition 2.3 relies on some deep combinatorial results which are tailored to the Freidlin-Wentzell context. In general, the virtual energy has an exact expression as a function of the transition rates ∆ (see, for instance, [8][Proposition 4.1], or Appendix B at the end of the present work). Unfortunately, in the most general setup, this expression, which involves a certain family of graphs, is intractable for all practical purposes when one is interested in studying particular models.
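For very small state spaces the graph expression can nevertheless be evaluated by brute force. The sketch below assumes the classical Freidlin-Wentzell graph representation, namely that H(x) is the minimal total cost of a graph in which every state other than x has exactly one outgoing arrow and every state is led to x, minus the minimum of this quantity over all x. This is our reading of the formula alluded to above, not a transcription of it; the function name and the example are hypothetical, and the chain is assumed irreducible.

```python
import math
from itertools import product

def virtual_energy(states, delta):
    """Brute-force graph formula for H, normalised so that min H = 0.

    For each root x we enumerate all maps g from the other states to `states`
    and keep those for which following g from any state eventually reaches x.
    """
    def reaches_root(g, start, root):
        seen = set()
        s = start
        while s != root and s not in seen:
            seen.add(s)
            s = g[s]
        return s == root

    def best_cost(root):
        others = [s for s in states if s != root]
        best = math.inf
        for targets in product(states, repeat=len(others)):
            g = dict(zip(others, targets))
            if all(reaches_root(g, s, root) for s in others):
                # Missing entries of delta are forbidden jumps (infinite cost).
                cost = sum(delta.get(arrow, math.inf) for arrow in g.items())
                best = min(best, cost)
        return best

    cost = {x: best_cost(x) for x in states}
    lowest = min(cost.values())
    return {x: cost[x] - lowest for x in states}

# Hypothetical Metropolis chain a - b - c with U(a) = 1, U(b) = 3, U(c) = 0.
delta = {("a", "b"): 2.0, ("b", "a"): 0.0, ("b", "c"): 0.0, ("c", "b"): 3.0}
H = virtual_energy(["a", "b", "c"], delta)
```

The enumeration grows like |X|^(|X|-1) per root, which illustrates why the general formula is of little practical use beyond toy examples.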
Remark 2.5. In the special case of Probabilistic Cellular Automata, [10,13], the authors deal with models involving a potential G β (x) depending on β and satisfying the balance condition
$$p_\beta(x, y)\, e^{-G_\beta(x)} = p_\beta(y, x)\, e^{-G_\beta(y)}$$
for every positive β. To bypass the technical difficulties inherent to these models, which stem for a large part from the intricate dependence on β of p β (·) and G β (·), the authors computed directly the expressions of the rate function ∆(·) in (1) and of the virtual energy (5). In this way, they obtained a weak reversible dynamics (see (4)). It thus became easier to study the metastable behavior of these models, using solely the limit expressions obtained. We refer to Appendix A for a more general context in which these techniques still apply and we mention that our hope is that this generalization should cover some other relevant cases in which only the transition rates are explicitly computable.
Finally, we stress that in the particular cases of Remark 2.2, the virtual energy, up to an additive constant, is precisely the potential which induces the dynamics, that is
$$H(x) = U(x) - \min_{z \in \mathcal{X}} U(z)$$
for any x ∈ X .

General definitions
In the present and in the following sections, we introduce some standard notions, which are natural generalizations of the analogous quantities in the reversible setup, see [19] or [24].
A real valued function f : R + → R + is super exponentially small (SES for short) if and only if
$$\lim_{\beta\to\infty} \frac{1}{\beta} \log f(\beta) = -\infty.$$
For x ∈ X , we let X x t be the chain started at x. For a nonempty set A ⊂ X and x ∈ X , we introduce the first hitting time τ x A to the set A, which is the random variable
$$\tau^x_A := \inf\{t \ge 0 : X^x_t \in A\}.$$
A path is a sequence ω = (ω 1 , . . . , ω n ) such that ∆(ω i , ω i+1 ) < ∞ for i = 1, . . . , n − 1. For a path ω = (ω 1 , . . . , ω n ), we define |ω| = n its length. For x, y ∈ X a path ω : x → y joining x to y is a path ω = (ω 1 , . . . , ω n ) such that ω 1 = x and ω n = y. For any x, y ∈ X we write Ω x,y for the set of paths joining x to y. For A, B ⊂ X nonempty sets, we write Ω A,B for the set of paths joining a point in A to a point in B.
A set A ⊂ X with |A| > 1 is connected if and only if for all x, y ∈ A, there exists a path ω ∈ Ω x,y such that for any i ≤ |ω|, ω i ∈ A. By convention, we say that every singleton is connected.
For a nonempty set A, we define its external boundary ∂A := {y ∈ X \ A, there exists x ∈ A such that ∆(x, y) < ∞} and we write
$$H(A) := \min_{x \in A} H(x).$$
The bottom F(A) of A is the set of global minima of H on A, that is
$$F(A) := \{x \in A : H(x) = H(A)\}.$$
The set X s := F(X ) is called the set of stable points or the set of ground states of the virtual energy.

Communication height
A key notion in studying metastability is the one of the cost that the chain has to pay to follow a path. In the case of Metropolis dynamics this quantity is the highest energy level reached along a path. Such a notion has to be modified when general rare transitions dynamics are considered [25,10]. We thus define the height or elevation Φ(ω) of a path ω = (ω 1 , . . . , ω n ) by setting
$$\Phi(\omega) := \max_{i = 1, \dots, |\omega| - 1} \big[ H(\omega_i) + \Delta(\omega_i, \omega_{i+1}) \big].$$
The communication height Φ(x, y) between two states x, y ∈ X is the quantity
$$\Phi(x, y) := \min_{\omega \in \Omega_{x,y}} \Phi(\omega).$$
Given two nonempty sets A, B ⊂ X , we define
$$\Phi(A, B) := \min_{\omega \in \Omega_{A,B}} \Phi(\omega) = \min_{x \in A,\, y \in B} \Phi(x, y). \qquad (9)$$
For A, B nonempty subsets of X , we define Ω opt A,B as the set of optimal paths joining A to B, that is the set of paths joining a point in A to a point in B and realizing the min-max Φ(A, B) defined in (9).
For rare transitions dynamics induced by a potential (see Remark 2.2) it is easy to see that the communication height between two states is symmetric. A non-trivial result due to A. Trouvé [25] states that this is the case even in the general setup adopted in this paper.
Proposition 2.7. For any x, y ∈ X , Φ(x, y) = Φ(y, x).
Corollary 2.8. For any x, y ∈ X such that ∆(x, y) < ∞, H(x) + ∆(x, y) ≥ max{H(x), H(y)}.
This corollary is quite interesting and its meaning is illustrated in Figure 1. Indeed, in the case of a dynamics induced by a potential, the jump between two states can be thought of as in the left part of the figure: the chain can jump in both directions and the height reached in both cases is the same. This is not true anymore in general under the sole assumptions of Definition 2.1 (see the illustration on the right in the same figure). Provided the chain can perform the jump from x to y, that is ∆(x, y) < ∞, it is not ensured that the reverse jump is allowed. Moreover, even in such a case, the heights which are attained during the two jumps are in general different. Nevertheless, the important Corollary 2.8 states that the virtual energies of the two states x and y are both smaller than the heights attained by performing any of the two jumps.
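The min-max problem defining the communication height is a bottleneck shortest path problem, which can be solved by a Dijkstra-type search in which the cost of a path is the maximum, rather than the sum, of its step costs H(ω i ) + ∆(ω i , ω i+1 ). The following is a minimal sketch under that observation; the function name and the three-state Metropolis example (with H = U − min U) are hypothetical.

```python
import heapq
import math

def communication_height(H, delta, x, y):
    """Phi(x, y) for x != y: min over paths of max_i [H(w_i) + Delta(w_i, w_{i+1})]."""
    best = {x: -math.inf}
    heap = [(-math.inf, x)]
    while heap:
        h, u = heapq.heappop(heap)
        if u == y:
            return h
        if h > best.get(u, math.inf):
            continue  # stale heap entry
        for (a, b), d in delta.items():
            if a != u or d == math.inf:
                continue  # not an allowed jump out of u
            cand = max(h, H[u] + d)
            if cand < best.get(b, math.inf):
                best[b] = cand
                heapq.heappush(heap, (cand, b))
    return math.inf  # y cannot be reached from x

# Hypothetical Metropolis chain a - b - c with virtual energy H = U - min U.
H = {"a": 1.0, "b": 3.0, "c": 0.0}
delta = {("a", "b"): 2.0, ("b", "a"): 0.0, ("b", "c"): 0.0, ("c", "b"): 3.0}
phi_ac = communication_height(H, delta, "a", "c")
phi_ca = communication_height(H, delta, "c", "a")
```

In this example both directions give the same value, the energy of the barrier state b, as expected for a dynamics induced by a potential.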

Metastable states
The main purpose of this article is to define the notion of metastable states for a general rare transition dynamics and to prove estimates on the hitting time to the set of stable states for the dynamics started at a metastable state.
To perform this, we need to introduce the notion of stability level of a state x ∈ X . First define
$$I_x := \{y \in \mathcal{X} : H(y) < H(x)\} \qquad (10)$$
which may be empty in general. Then we define the stability level of any state x ∈ X by
$$V_x := \Phi(x, I_x) - H(x)$$
and we set V x = ∞ in the case where I x is empty. We also let
$$V_m := \max_{x \in \mathcal{X} \setminus \mathcal{X}^s} V_x$$
be the maximal stability level. Metastable states should be thought of as the set of states where the dynamics is typically going to spend a lot of time before reaching in a drastic way the set of stable states X s . Following [19] we define the set of metastable states X m as
$$\mathcal{X}^m := \{x \in \mathcal{X} : V_x = V_m\}$$
and in the sequel, see Section 2.7, we will state some results explaining why X m meets the requirements that one would heuristically expect from the set of metastable states. For example, we prove that the maximal stability level V m is precisely the quantity controlling the typical time that the system needs to escape from the metastable state. More generally, for any a > 0, we define the metastable set of level a > 0 as follows
$$\mathcal{X}_a := \{x \in \mathcal{X} : V_x > a\}. \qquad (14)$$
The structure of the sets X a 's is depicted in Figure 2. It is immediate to realize that X a ⊂ X a ′ for a ≥ a ′ . Moreover, it is worth noting that X V m = X s .
Figure 2: Illustration of the structure of the sets X a 's (see definition (14)) with 0 < a < V m .
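The definitions of this section can be made concrete on a toy landscape. The sketch below assumes the stability level is V x = Φ(x, I x ) − H(x) and the level sets are X a = {x : V x > a}; it computes all communication heights by a Floyd-Warshall pass in the (min, max) algebra. The five-state Metropolis chain p - q - m - r - g and all function names are hypothetical.

```python
import math
from itertools import product

def all_pairs_height(H, delta):
    """Phi(x, y) for all x != y, by a Floyd-Warshall pass in the (min, max) algebra."""
    states = list(H)
    phi = {(x, y): H[x] + delta.get((x, y), math.inf)
           for x, y in product(states, repeat=2) if x != y}
    for k, x, y in product(states, repeat=3):  # k varies slowest, as required
        if len({k, x, y}) == 3:
            phi[(x, y)] = min(phi[(x, y)], max(phi[(x, k)], phi[(k, y)]))
    return phi

def stability_levels(H, phi):
    """V_x = Phi(x, I_x) - H(x), with V_x = +inf when I_x is empty."""
    V = {}
    for x in H:
        lower = [y for y in H if H[y] < H[x]]  # the set I_x
        V[x] = min(phi[(x, y)] for y in lower) - H[x] if lower else math.inf
    return V

# Hypothetical Metropolis landscape on the line p - q - m - r - g.
U = {"p": 2.0, "q": 2.5, "m": 1.0, "r": 4.0, "g": 0.0}
chain = ["p", "q", "m", "r", "g"]
delta = {}
for u, v in zip(chain, chain[1:]):
    delta[(u, v)] = max(U[v] - U[u], 0.0)
    delta[(v, u)] = max(U[u] - U[v], 0.0)
H = {x: U[x] - min(U.values()) for x in U}

V = stability_levels(H, all_pairs_height(H, delta))
V_max = max(v for v in V.values() if v != math.inf)   # maximal stability level
metastable = {x for x, v in V.items() if v == V_max}  # the set X^m

def level_set(a):
    """The metastable set of level a."""
    return {x for x, v in V.items() if v > a}
```

Here the deep well m is the unique metastable state, the shallow well p survives only in level sets with a below 0.5, and the level set at a equal to the maximal stability level reduces to the ground state g.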

Cycles, saddles, and gates
We stress that one of our main results (see Theorem 2.16 below) describes a family of sets which have to be crossed with large probability in the low temperature limit.
To introduce these sets, we define as in [19] the notion of saddle points and of gates. We stress that, unlike the Metropolis dynamics, these notions cannot be defined at the level of paths only. Let us discuss this point a bit since this is a major difference between the setups.
The following definition was introduced in [19], and we recall it for expository purposes only. We stress that we cannot adapt it straightforwardly to our setup, as is discussed below.
It would be natural to generalize the definition (see [19]) of the set of minimal saddles between two states x, y ∈ X in the context of Metropolis dynamics as
{z ∈ X : there exists ω ∈ Ω opt x,y and i ≤ |ω| such that ω i = z and H(ω i−1 ) + ∆(ω i−1 , ω i ) = Φ(x, y)}.
In the Freidlin-Wentzell setup, this precise definition does not make sense at the level of typical behavior of trajectories. For example, there might be an optimal path ω joining x to y and a minimal gate W such that ω i ∈ W (and hence H(ω i−1 ) + ∆(ω i−1 , ω i ) = Φ(x, y)) and such that nevertheless the point ω i does not play any particular role for the dynamics. Indeed, there might be a path with cost strictly lower than Φ(x, y) joining ω i−1 to ω i which will be favoured by the dynamics in the low temperature limit.
The path-level definition is thus specific to the Metropolis setup; indeed, an energy level has to correspond to a point in that setup, whereas in the Freidlin-Wentzell setup, this correspondence is no longer valid.
Nevertheless, we stress that we can generalize the notion of gates and of minimal gates in our setup, at the cost of higher complexity of definitions. To perform this, we need to introduce the key notions of cycle and of principal boundary of a set. The notion of cycle will be discussed in detail in Section 3.

Definition 2.9 ([8] Definition 4.2).
A nonempty set C ⊂ X is a cycle if it is either a singleton or for any x, y ∈ C such that x ≠ y,
$$\lim_{\beta\to\infty} -\frac{1}{\beta} \log P_\beta\Big[ X^x_{\tau^x_{(C \setminus \{y\})^c}} \neq y \Big] > 0.$$
In words, a nonempty set C ⊂ X is a cycle if it is either a singleton or for any x, y ∈ C such that x ≠ y, the probability starting from x to leave C without visiting y is exponentially small. We will denote by C(X ) the set of cycles. The set C(X ) has a tree structure, that is:
Proposition 2.10 ([8] Proposition 4.4). For any pair of cycles C, C ′ such that C ∩ C ′ ≠ ∅, either C ⊂ C ′ or C ′ ⊂ C.
Next we introduce the important notion of principal boundary of an arbitrary subset of the state space X .

Proposition 2.11 ([8] Proposition 4.2).
For any D ⊂ X and any x ∈ D, the following limits exist and are finite:
$$\Gamma_D(x) := \lim_{\beta\to\infty} \frac{1}{\beta} \log E_\beta\big[\tau^x_{D^c}\big] \qquad (15)$$
and, for any y ∈ X \ D,
$$\Delta_D(x, y) := \lim_{\beta\to\infty} -\frac{1}{\beta} \log P_\beta\big[X^x_{\tau^x_{D^c}} = y\big]. \qquad (16)$$
We stress that the limits appearing in the right hand side of (15) and (16) have explicit expressions which, as in Definition 2.4 for the virtual energy, seem to be intractable for practical purposes at least in the field of statistical mechanics.
The meaning of the two functions introduced in Proposition 2.11 is rather transparent: (15) provides an exponential control on the typical time needed to escape from a general domain D starting from a state x in its interior and Γ D (x) is the mass of such an exponential. On the other hand, (16) provides an exponential bound to the probability to escape from D, starting at x, through the site y ∈ X \ D. Hence, we can think of ∆ D (x, y) as a measure of the cost that has to be paid to exit from D through y. Now, we remark that, due to the fact that the state space X is finite, for any domain D ⊂ X and for any x ∈ D there exists at least a point y ∈ X \ D such that ∆ D (x, y) = 0. Thus, we can introduce the concept of principal boundary of a set D ⊂ X
$$B(D) := \{y \in \mathcal{X} \setminus D : \Delta_D(x, y) = 0 \text{ for some } x \in D\}.$$
We are finally ready to describe in a rigorous way the notion of gates which will be used to state one of our main results, Theorem 2.16.
Definition 2.12. Let x, y ∈ X . Let C x,y be the minimal cycle containing both x and y and let M x,y = {C i , i ≤ n 0 } be its decomposition into maximal strict subcycles. Both these notions are well defined by Proposition 2.10. We define the set of saddles between x and y (which is denoted by S(x, y)) by
$$S(x, y) := \bigcup_{C \in M_{x,y}} B(C).$$
We stress that the set S(x, y) is related in a very intricate way to the energy landscape of the dynamics.
From now on, we can proceed by analogy with the definitions of the Metropolis case (see [19]). Given x, y ∈ X , we say that W ⊂ X is a gate for the pair (x, y) if W ⊂ S(x, y) and every path in Ω opt x,y intersects W , that is
$$\omega \cap W \neq \emptyset \quad \text{for every } \omega \in \Omega^{opt}_{x,y}.$$
We also introduce W(x, y) as being the collection of all the gates for the pair (x, y). A gate W ∈ W(x, y) is minimal if no proper subset of W is a gate.
In the metastability literature, the following set is also standard:
$$G(x, y) := \bigcup_{W \in \mathcal{W}(x, y),\ W \text{ minimal}} W,$$
namely, G(x, y) is the set of saddles between x and y belonging to a minimal gate in W(x, y).
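Once the set of saddles and the optimal paths are known, gates and minimal gates can be enumerated by brute force on small examples. The sketch below assumes that a gate is minimal when no proper subset of it is a gate; the saddles w1, ..., w4 and the three optimal paths are hypothetical and are not taken from any figure of this paper.

```python
from itertools import combinations

def gates(saddles, optimal_paths):
    """All nonempty subsets of `saddles` that intersect every optimal path."""
    s = sorted(saddles)
    return [frozenset(c)
            for r in range(1, len(s) + 1)
            for c in combinations(s, r)
            if all(set(c) & set(path) for path in optimal_paths)]

def minimal_gates(all_gates):
    """Gates having no proper subset that is itself a gate."""
    return [w for w in all_gates if not any(v < w for v in all_gates)]

# Hypothetical example: three optimal paths from x to y through four saddles.
saddles = {"w1", "w2", "w3", "w4"}
optimal_paths = [
    ("x", "w1", "y"),
    ("x", "w2", "w3", "y"),
    ("x", "w2", "w4", "y"),
]
all_gates = gates(saddles, optimal_paths)
min_gates = minimal_gates(all_gates)
G = frozenset().union(*min_gates)  # saddles belonging to some minimal gate
```

In this example every gate must contain w1, and then either w2 or both w3 and w4, so there are two minimal gates and every saddle belongs to one of them.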

Main results
In this section we collect our results about the behavior of the system started at a metastable state. These results justify a posteriori why the abstract notion of metastable set X m fits with the heuristic idea of metastable behavior.
Figure 3: Set of saddles S(x, y) = {w 1 , . . . , w 6 }. The optimal paths in Ω opt x,y are represented by the five black lines. The minimal gates are {w 1 , w 2 , w 4 , w 6 } and {w 1 , w 2 , w 5 , w 6 }. Any other subset of S(x, y) obtained by adding some of the missing saddles to one of the two minimal gates is a gate.
The first two results state that the escape time, that is the typical time needed by the dynamics started at a metastable state to reach the set of stable states, is exponentially large in the parameter β. Moreover, they ensure that the mass of such an exponential is given by the maximal stability level; the first result is a convergence in probability, whereas the second ensures convergence in mean.
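Theorem 2.13. For any x ∈ X m , for any ε > 0 there exist β 0 < ∞ and K > 0 such that, for all β ≥ β 0 ,
$$P_\beta\big[\tau^x_{\mathcal{X}^s} < e^{\beta(V_m - \varepsilon)}\big] < e^{-\beta K},$$
and the function β ↦ P β [τ x X s > e β(V m +ε) ] is SES.
Theorem 2.14. For any x ∈ X m , the following convergence holds:
$$\lim_{\beta\to\infty} \frac{1}{\beta} \log E_\beta\big[\tau^x_{\mathcal{X}^s}\big] = V_m.$$
Theorem 2.15. Assume the existence of a recurrent state x 0 for the dynamics, namely, assume that there exists x 0 ∈ X such that
(i) late escape from the state x 0 :
$$T_\beta := \inf\big\{n \ge 1 : P_\beta\big[\tau^{x_0}_{\mathcal{X}^s} \le n\big] \ge 1 - e^{-1}\big\} \to \infty \quad \text{as } \beta \to \infty;$$
(ii) fast recurrence to x 0 : there exist two functions δ β , T ′ β : [0, +∞] → R such that
$$\lim_{\beta\to\infty} \frac{T'_\beta}{T_\beta} = 0, \qquad \lim_{\beta\to\infty} \delta_\beta = 0,$$
and
$$P_\beta\big[\tau^x_{\{x_0\} \cup \mathcal{X}^s} > T'_\beta\big] \le \delta_\beta$$
for any x ∈ X and β large enough.
Then, the following holds:
1. the random variable τ x 0 X s /T β converges in law to an exponential variable with mean one;
2. the mean hitting time and T β are asymptotically equivalent, that is
$$\lim_{\beta\to\infty} \frac{E_\beta\big[\tau^{x_0}_{\mathcal{X}^s}\big]}{T_\beta} = 1;$$
3. the random variable τ x 0 X s /E β [τ x 0 X s ] converges in law to an exponential variable with mean one.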
We stress that such exponential behaviors are not new in the literature; for the Metropolis case, we refer of course to [19, Theorem 4.15], and we refer to [1,2] for the generic reversible case. In an irreversible setup, results appeared only much more recently; let us mention [5] and [21]. In the case where the cardinality of the state space X diverges, more precise results than those described in Theorem 2.15 were obtained in [16] and [17].
Our result is different from the ones we mention here, since we are able to give the explicit value of the expected value of the escape time as a function of the transition rates of the family of Markov chains.
The above results are related to the properties of the escape time; the following one gives in particular some information about the trajectory that the dynamics started at a metastable state follows with high probability on its way towards the stable state.
Theorem 2.16. For any pair x, y ∈ X we consider the set of gates W(x, y) introduced in Section 2.6 and the corresponding set of minimal gates. For any minimal gate W ∈ W(x, y), there exists c > 0 such that, for β sufficiently large,
$$P_\beta\big[\tau^x_W > \tau^x_y\big] \le e^{-\beta c}.$$
The typical example of application of this result is to consider x ∈ X m , y ∈ X s , and W ∈ W(x, y); Theorem 2.16 ensures that, with high probability, on its escape from the metastable state x, the dynamics has to visit the gate W before hitting the stable state y. This is strong information about the way in which the dynamics performs its escape from a metastable state.
We stress that our main tool to prove Theorem 2.16 is the description in great detail of the set of typical trajectories of the dynamics of the transitions from x to y, which is the tube of typical trajectories K x,y (see [24, Chapter 6], and in particular Part 6.7, Theorems 6.31 and 6.33 where an analogous description has been performed in the particular case of the Metropolis dynamics). Recall the notations C x,y (the minimal cycle containing x and y) and M x,y (the decomposition into maximal strict subcycles of C x,y ) of Definition 2.12. The set K x,y is a subset of Ω opt x,y which can be described as follows:
1. as soon as the dynamics enters an element C ∈ M x,y , it exits C through its principal boundary B(C). This implies in particular the fact that the dynamics stays within the cycle C x,y during its transition from x to y, as we will show later (see in particular Remark 3.20);
2. as soon as the dynamics enters the unique element C(y) of M x,y containing y, it hits y before leaving C(y) for the first time.
We then state the following proposition about the tube K x,y :
Proposition 2.17. For any x, y ∈ X , as β → ∞, the set K x,y has probability exponentially close to 1, that is, for any ε > 0, there exists β 0 such that for any β ≥ β 0 :
We stress that in concrete models, such a detailed description of the exit tube relies on an exhaustive analysis of the energy landscape which is unlikely to be performed in general. Nevertheless, for the particular case of PCAs, this analysis can be greatly simplified.
Remark 2.18. For reversible PCAs, the analysis of the phenomenon of metastability was performed in [13] by studying the transition from the metastable state (the − phase) to the stable state (the + phase in this specific model) using a particular case of Proposition 2.17. Indeed, the decomposition into maximal cycles of C (−),(+) was reduced to two cycles only, and the one containing the (−) state was referred to as the subcritical phase. One of the main tasks was then to identify the set of saddles, which in this case was reduced to the principal boundary of the subcritical phase.
Our approach shows in which way this technique should be extended in the more general case of several maximal cycles involved in the maximal decomposition of the cycle C x,y . A practical way to perform this would be to use Definition 2.12 to identify recursively the set of saddles.

Further results on the typical behavior of trajectories
In this section we collect some results on the set of typical trajectories in the large β limit.
The first result of this section is a large deviation estimate on the hitting time to the metastable set X a at level a > 0. The structure of the sets X a 's is depicted in Figure 2. Given a > 0, since states outside X a have stability level smaller than a, it is rather natural to expect that, starting from such a set, the system will typically need a time smaller than exp{βa} to reach X a . This recurrence result is the content of the following proposition.
Proposition 2.19. For any a > 0 and any ε > 0, the function
$$\beta \mapsto \sup_{x \in \mathcal{X}} P_\beta\big[\tau^x_{\mathcal{X}_a} > e^{\beta(a + \varepsilon)}\big]$$
is SES.
Remark 2.20. Proposition 2.19 allows one to disentangle the study of the first hitting time of the stable state from the results on the tube of typical trajectories performed in great detail both in [23] and in [9]. This remarkable fact relies on Proposition 3.21, which guarantees the existence of downhill cycle paths to exit from any given set. In the Metropolis setup, this has been performed in [19] (see Theorem 3.1 and Lemma 2.28).
The following result is important in the theory of metastability and, in the context of Metropolis dynamics, is often referred to as the reversibility lemma. In that framework it is simply stated as a bound on the probability of reaching a configuration with energy larger than the one of the starting point in a time exponentially large in the energy difference between the final and the initial point. In our general setup it is of interest to state a more detailed result on the whole tube of trajectories overcoming this height level fixed a priori.
To make this result quantitative, given any x ∈ X and h, ε > 0, for any integer n ≥ 1, we consider the tube of trajectories which is the collection of trajectories started at x whose height at step n is at least equal to the value H(x) + h.
In words, the set E x,h (ε) is the set of trajectories started at x and which reach the height H(x) + h at a time at most equal to ⌊exp (β(h − ε))⌋.

Cycle theory in the Freidlin-Wentzell setup
In this section we summarize some well known facts about the theory of cycles, which can be seen as a handy tool to study the phenomenon of metastability in the Freidlin-Wentzell setup. Indeed, in [22] the authors developed a peculiar approach to cycle theory in the framework of the Metropolis dynamics, see also [24]. This approach was generalized in [10] in order to discuss the problem of metastability in the case of reversible Probabilistic Cellular Automata. In the present setup however we need the more general theory of cycles developed in [8]. We showed in [11] that these two approaches actually coincide in the particular case of the Metropolis dynamics.
We recall in this section some results developed in [8], which will turn out to be the building blocks of our approach.

An alternative definition of cycles
The definition of the notion of cycle given in Section 2.6 is based on a property of the chain started at a site within the cycle itself. The point of view developed in [22, Definition 3.1] for the Metropolis case and generalized in [10] in the framework of reversible Probabilistic Cellular Automata is a priori rather different. The authors introduced the notion of energy-cycle, which is defined through the height level reached by paths contained within the energy-cycle.
Even if Definitions 2.9 and 3.1 were introduced independently and in quite different contexts, it turns out that they actually coincide. More precisely, we will prove the following result (see the proof after Proposition 3.9):

Proposition 3.2. A nonempty set A ⊂ X is a cycle if and only if it is an energy-cycle.
After proving Proposition 3.2, we will no longer distinguish the notions of cycle and of energy-cycle.
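The height-based characterization lends itself to a direct numerical check on small examples. The sketch below assumes that the energy-cycle condition reads: A is a singleton, or the largest communication height between two states of A is strictly smaller than Φ(A, X \ A), as the informal description given in the Introduction suggests. The five-state Metropolis landscape and the function names are hypothetical.

```python
import math
from itertools import product

def all_pairs_height(H, delta):
    """Phi(x, y) for all x != y, by a Floyd-Warshall pass in the (min, max) algebra."""
    states = list(H)
    phi = {(x, y): H[x] + delta.get((x, y), math.inf)
           for x, y in product(states, repeat=2) if x != y}
    for k, x, y in product(states, repeat=3):  # k varies slowest, as required
        if len({k, x, y}) == 3:
            phi[(x, y)] = min(phi[(x, y)], max(phi[(x, k)], phi[(k, y)]))
    return phi

def is_energy_cycle(A, H, phi):
    """Assumed condition: singleton, or max_{x,y in A} Phi(x, y) < Phi(A, X \\ A)."""
    A = set(A)
    if len(A) == 1:
        return True
    inside = max(phi[(x, y)] for x in A for y in A if x != y)
    outside = min((phi[(x, z)] for x in A for z in H if z not in A),
                  default=math.inf)  # no exit when A is the whole state space
    return inside < outside

# Hypothetical Metropolis landscape on the line p - q - m - r - g.
U = {"p": 2.0, "q": 2.5, "m": 1.0, "r": 4.0, "g": 0.0}
chain = ["p", "q", "m", "r", "g"]
delta = {}
for u, v in zip(chain, chain[1:]):
    delta[(u, v)] = max(U[v] - U[u], 0.0)
    delta[(v, u)] = max(U[u] - U[v], 0.0)
H = {x: U[x] - min(U.values()) for x in U}
phi = all_pairs_height(H, delta)

left_well = is_energy_cycle({"p", "q", "m"}, H, phi)
not_a_cycle = is_energy_cycle({"q", "m"}, H, phi)
```

In this example the whole left well {p, q, m} satisfies the condition, since its internal heights (2.5) lie below the exit height (4), whereas {q, m} does not, because leaving it towards p costs no more than moving inside it.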

Depth of a cycle
Here we introduce the key notion of depth of a cycle.
In the particular case where D is a cycle, a relevant property is the fact that, in the large β limit, on an exponential scale, neither τ x D c nor X x τ D c depends on the starting point x ∈ D. More precisely, we can formulate the following strengthening of Proposition 2.11.
The quantity Γ(C) is the depth of the cycle C.

Cycle properties in terms of path heights
In the framework of the study of metastability, cycles have been defined in terms of the height attained by paths in their interior [22] (see also the generalization given in [10]). In this section we prove the equivalence between these two approaches.
Next we recall the following result, which links the minimal height of an exit path to the quantities we introduced previously. The subsequent natural question is about the height that a path can reach within a cycle. We thus borrow from [8] the following result.
Proposition 3.6. For any cycle C ∈ C(X ), x ∈ C, and y ∈ X \ C, there exists a path ω = (ω 1 , . . . , ω n ) ∈ Ω x,y such that ω i ∈ C for i = 1, . . . , n − 1 and Φ(ω) = H(C) + Γ(C) + ∆ C (y). (30)
For any x, y ∈ C, there is a path ω = (ω 1 , . . . , ω n ) ∈ Ω x,y such that ω i ∈ C for i = 1, . . . , n and Φ(ω) ≤ H(C) + sup{Γ(C̃) : C̃ ∈ C(X ), C̃ ⊂ C, C̃ ≠ C} < H(C) + Γ(C).
We stress that the right hand side term of (30) is infinite unless y ∈ ∂C.
In an informal way, the first part of Proposition 3.6, together with Proposition 3.5, states that there exists a path ω contained in C except for its endpoint and joining any given x ∈ C to any given point y ∈ ∂C whose cost is equal to the minimal cost one has to pay to exit at y starting from x. Furthermore, the second part can be rephrased by saying that one can join two arbitrary points x and y within C by paying an amount which is strictly less than the minimal amount the process has to pay to exit from C; indeed, using Remark 3.4, the right hand side of (31) can be bounded from above by H(C) + Γ(C).
We stress that this last property ensures the existence of at least one path contained in the cycle connecting the two states and of height smaller than the one that is necessary to exit from the cycle itself. But in general, there could exist other paths in the cycle, connecting the same states, with height larger than H(C) + Γ(C). This is a major difference with the Metropolis case, where every path contained in a cycle has height smaller than the one necessary to exit the cycle itself. From this point of view, the weak reversible case is closer to the general Freidlin-Wentzell setup than to the Metropolis one.
Another important property is the characterization of the depth of a cycle in terms of the maximal height that has to be reached by the trajectory to exit from a cycle.
We state now a result in which we give a different interpretation of the depth of a cycle in terms of the minimal height necessary to exit the cycle.
Proof. Since any path connecting C to X \ C has at least one direct jump from a state in C to a state outside of C, we have that Now, recalling that the principal boundary B(C) is nonempty, by Proposition 3.5 we have To get the opposite bound we pick x̄ ∈ C and ȳ ∈ X \ C such that ȳ ∈ B(C). Then, by the first part of Proposition 3.6 there exists a path ω ∈ Ω x̄,ȳ such that Φ(ω) = H(C) + Γ(C). Hence, we have that Φ(x̄, ȳ) ≤ Φ(ω) = H(C) + Γ(C). Finally, which completes the proof.
We are now ready to discuss the equivalence between the probabilistic [8] and energy [22] approaches to cycle theory. For any λ ∈ R, consider the equivalence relation R λ . For any λ ∈ R the equivalence classes in X /R λ are either singletons {x} such that H(x) ≥ λ or cycles C ∈ C(X ) such that max{H(C̃) + Γ(C̃) : C̃ ∈ C(X ), C̃ ⊂ C, C̃ ≠ C} < λ ≤ H(C) + Γ(C). (33) Thus we have
The results we have listed above allow us to finally prove the equivalence between the probabilistic [8] and energy approaches [22,24,10] to cycle theory, that is Proposition 3.2.
Proof of Proposition 3.2. The case where A is a singleton is trivial. We assume that A is not a singleton and prove the two implications.
First assume A satisfies (28), then A is an equivalence class in X /R Φ(A,X \A) . Thus, by Proposition 3.9, it follows that A is a cycle.
Conversely, assume that A is a cycle. By (34), there exists λ such that A is an equivalence class of X /R λ . Moreover, by (33) we have that where in the last step we made use of Proposition 3.8.
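To see how the classes of X /R λ vary with the threshold, here is a small sketch under two assumptions that are ours: the relation R λ identifies x and y whenever x = y or Φ(x, y) < λ, and the jump heights are symmetric. In that case the classes are the connected components of the graph that keeps only the jumps of height below λ, which a union-find structure computes directly. The name `threshold_classes` and the dictionary `jump_height` are hypothetical.

```python
def threshold_classes(states, jump_height, lam):
    """Classes of the relation 'x = y or Phi(x, y) < lam' for symmetric
    jump heights (hypothetical input format, see the lead-in)."""
    parent = {x: x for x in states}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every jump whose height is below the threshold.
    for (x, y), height in jump_height.items():
        if height < lam:
            parent[find(x)] = find(y)

    classes = {}
    for x in states:
        classes.setdefault(find(x), set()).add(x)
    return sorted(classes.values(), key=sorted)

# Toy landscape on a line a - b - c - d with symmetric jump heights.
states = ["a", "b", "c", "d"]
jump_height = {}
for (x, y), height in {("a", "b"): 3, ("b", "c"): 3, ("c", "d"): 5}.items():
    jump_height[(x, y)] = jump_height[(y, x)] = height

for lam in (3, 4, 6):
    print(lam, threshold_classes(states, jump_height, lam))
```

Raising λ can only merge classes, so the partitions obtained for increasing thresholds are nested, in line with the tree structure of the set of cycles.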
We stress that the following properties are trivial in the Metropolis and in the weak reversible setups mentioned in Remark 2.2, whereas in the general Freidlin-Wentzell setup, they are consequences of the non-trivial properties discussed previously in this section (see also [11]).
For example item 1 in the following proposition states that the principal boundary of a non-trivial cycle is the collection of the sites outside the cycle that can be reached from the interior via a single jump at height equal to the minimal height that has to be bypassed to exit from the cycle. This is precisely the notion of principal boundary adopted in [10,13] in the context of reversible Probabilistic Cellular Automata. Note also that such a notion is an obvious generalization of the idea of set of minima of the Hamiltonian of the boundary of a cycle used in the context of Metropolis systems.
Proof. Item 1. This result is an immediate consequence of Propositions 3.8 and 3.5.
Item 3. Pick x ∈ F(C). Since I x ⊂ X \ C, we have that Φ(x, I x ) ≥ Φ(C, X \ C). Since H(x) = H(C), this entails The item finally follows from Proposition 3.8 and definition (11).

Exit times of cycles
The main reason for which the notion of cycles has been introduced in the literature is that one has good control on their exit times in the large deviation regime. We summarize these properties in the following proposition.
Proposition 3.11. For any cycle C ∈ C(X ), x ∈ C, and any ε > 0, we have that is SES; 2. the following inequality holds for any δ > 0:

for any
4. for any y ∈ ∂C
This result is a refinement of Proposition 2.11 in the sense that the control on the exit times and exit locations in (38) holds independently of the starting point of the process inside the cycle.
The results of Proposition 3.11 are proven in [8]. More precisely, item 1 is the content of the first part of [8,Proposition 4.19]. Item 2 is [8,Proposition 4.20]. Item 3 is nothing but the property defining the cycles, see Definition 2.9 above. Item 4 follows immediately by Propositions 2.11, 3.3, and 3.5.
By combining Proposition 3.5 and equations (35) and (38) we can deduce in a trivial way the following useful corollary.
Corollary 3.12. For any cycle C ∈ C(X ), ε > 0, x ∈ C, and y ∈ B(C), we have that
We discuss an interesting consequence of Proposition 2.21. For a given cycle C, starting from the bottom of C, the probability of reaching an energy level higher than the minimal cost necessary to exit C before exiting C is exponentially small in β. In an informal way, this means that at the level of the typical behavior of trajectories, at least for trajectories started from F(C), the classical notion of cycle for the Metropolis dynamics (which is defined in terms of energies only, see for example [24, Chapter 6]) and the one of energy cycles are close even in the Freidlin-Wentzell setup. More precisely we state the following proposition.
Proposition 3.13. For any C ∈ C(X ), any ε > 0 and for β large enough:
Let us remark that we expect Proposition 3.13 to hold as well starting from anywhere within C, but the proof of this result should be more involved.

Downhill or via typical jumps connected systems of cycles
Beside the estimate on the typical time needed to exit from a cycle, an important property is the one stated in (38) which implies that when the chain exits a cycle it will pass typically through the principal boundary. This leads us to introduce the collections of pairwise disjoint cycles such that it is possible to go from any of them to any other by always performing exits through the principal boundaries. To make this idea precise we introduce the following notion of oriented connection.
Definition 3.14. Given two disjoint cycles C, C ′ ∈ C(X ), we say that C is downhill connected or connected via typical jumps (vtj) to C ′ if and only if B(C) ∩ C ′ ≠ ∅.
The fact that we introduced two names for the same notion deserves a comment: in [19] downhill connection is introduced in the framework of the Metropolis dynamics. In our opinion its natural extension to the general rare transition setup is the typical jumps connection defined in [8, Proposition 4.10]. This is the reason for the double name. Nevertheless, in the sequel we will always use the second one, which appears to be more appropriate in our setup, and we will use the abbreviation vtj.
A vtj-connected path of cycles is a pairwise disjoint sequence of cycles C 1 , . . . , C n ∈ C(X ) such that C i is vtj-connected to C i+1 for all i = 1, . . . , n − 1. A vtj-connected system of cycles is a pairwise disjoint collection of cycles {C 1 , . . . , C n } ⊂ C(X ) such that for any 1 ≤ i < i ′ ≤ n there exists i 1 , . . . , i m ∈ {1, . . . , n} such that i 1 = i, i m = i ′ , and C i 1 , . . . , C im is a vtj-connected path of cycles.
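As an illustration, the following sketch tests whether a family of pairwise disjoint cycles forms a vtj-connected system once the oriented connections are known. It assumes that the connections are stored in a dictionary `vtj` mapping each cycle label i to the set of labels i ′ such that B(C i ) ∩ C i ′ ≠ ∅. It also reads the definition in the order-independent sense, requiring a vtj-connected path from every cycle to every other one. Both the data format and this reading are assumptions made for the example.

```python
from collections import deque

def is_vtj_connected_system(vtj):
    """vtj[i] is the set of labels i' such that cycle i is vtj-connected
    to cycle i' (hypothetical input format). Returns True when every
    cycle can reach every other cycle through oriented vtj connections."""
    labels = set(vtj)
    for start in labels:
        seen = {start}
        queue = deque([start])
        # Breadth-first search along oriented vtj connections.
        while queue:
            i = queue.popleft()
            for j in vtj[i]:
                if j in labels and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != labels:
            return False
    return True

print(is_vtj_connected_system({1: {2}, 2: {3}, 3: {1}}))  # True
print(is_vtj_connected_system({1: {2}, 2: set()}))        # False
```

Reachability is enough here: whenever one cycle can be reached from another, a path without repeated cycles can be extracted, which matches the requirement that a vtj-connected path consists of pairwise disjoint cycles.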
We define an isolated vtj-connected system of cycles to be a vtj-connected system of cycles {C 1 , . . . , C n } ⊂ C(X ) such that
Via typical jumps connected systems satisfy the following important property: the height that has to be reached to exit from any of the cycles within the system is the same. Moreover, if the system is isolated, then the union of the cycles in the system is a cycle. More precisely we state the following two propositions.
Proof. Consider C i and C j , 1 ≤ i < j ≤ n. By definition of a vtj-connected system, there exists a path of cycles consisting of vtj-connected elements joining C i to C j , that is where all the indexes k j , for j ≤ i m , belong to [1, . . . , n]. Now, given k ∈ {1, . . . , m − 1} consider x ∈ C i k and y ∈ B(C i k ) ∩ C i k+1 . By Proposition 3.2 and item 1 in Proposition 3.10 we have that Φ(x, y) for any k = 1, . . . , m − 1.
Iterating this inequality along the cycle path C i 1 , C i 2 , . . . , C i m−1 , C im , we get that Φ(C i , X \ C i ) ≥ Φ(C j , X \ C j ), and by symmetry we get Since i and j were chosen arbitrarily in our vtj-connected system, we are done.
Proposition 3.16. Let {C 1 , . . . , C n } be a vtj-connected system of cycles. Assume that the system is isolated (recall the definition given above). Then C 1 ∪ · · · ∪ C n is a cycle.
Consider x, x ′ ∈ C and let i, i ′ ∈ {1, . . . , n} such that x ∈ C i and x ′ ∈ C i ′ . If i = i ′ , then by Proposition 3.2 we have that Φ(x, x ′ ) < λ. If, on the other hand, i ≠ i ′ , by definition of vtj-connected system there exists i 1 , . . . , i m such that C i k is vtj-connected to C i k+1 for any k = 1, . . . , m − 1. Thus, by using Proposition 3.2 and item 1 of Proposition 3.10, we can prove that Φ(x, x ′ ) = λ. In conclusion, we have proven that Φ(x, x ′ ) ≤ λ for any x, x ′ ∈ C.
Finally, since the system is isolated we have that Φ(C i , X \ C) > λ for any i = 1, . . . , n and hence, Φ(C, X \ C) > Φ(x, x ′ ) for any x, x ′ ∈ C. Thus, by Proposition 3.2, we have that C is a cycle.

Partitioning a domain into maximal cycles
In the proof of our main results a fundamental tool will be the partitioning of a set into maximal subcycles. By maximal we mean that given such a partition into cycles, the union of any of them is either the whole set or is not a cycle.
More precisely, consider D ⊂ X nonempty. A partition into cycles of D is a partition {C i , i ∈ I} of D, where I is a finite set of indexes, such that C i ∈ C(X ) for any i ∈ I.
Definition 3.17. A partition into maximal cycles of the nonempty set D ⊂ X is a partition {C i , i ∈ I} of strict subcycles of D such that the union of a number strictly smaller than |I| of the cycles C i 's is not a cycle.
The existence of such a partition is ensured by Proposition 2.10 and by the fact that singletons are themselves cycles. In Section 3.7 we describe a constructive way to get such a partition for any set D.
In the case where D is itself a cycle, this partition into maximal cycles is reduced to the set D. In such a case, we can nevertheless decompose it into maximal strict subcycles.
Proposition 3.18. Consider a non trivial cycle C ∈ C(X ) (in particular |C| ≥ 2), and its decomposition into maximal strict subcycles C = C 1 ∪ · · · ∪ C n 0 where C j are disjoint elements of C(X ), n 0 ≥ 2. The existence of such a decomposition is ensured by the tree structure of Proposition 2.10.
The collection {C 1 , . . . , C n 0 } is an isolated vtj-connected system of cycles. Finally, from Propositions 3.15 and 3.8 it follows that for any i, j ≤ n 0 .
Remark 3.19. We stress that the original Proposition 4.10 in [8] is actually much more exhaustive than the version presented here. In particular, it allows one to construct the set of cycles C(X ) in a recursive way by computing at the same time the quantities Γ(C) and ∆ C (y) (for y ∈ ∂C) for any element C ∈ C(X ), but this version will be enough for our purposes. We refer to [8] for more details.
Remark 3.20. For x, y ∈ X , from Proposition 3.18 and from Definition 2.12, one trivially gets the inclusion S(x, y) ⊂ C x,y .
A useful property of a partition of a domain into maximal cycles is contained in the following proposition.
Proposition 3.21. Consider a partition {C i , i ∈ I} into maximal cycles of a nonempty set D ⊂ X . Let J ⊂ I such that {C j , j ∈ J} is a vtj-connected system of cycles. Then this system is not isolated, namely, there exists j ∈ J such that
Proof. The proposition follows immediately by the maximality assumption on the partition of D and by Proposition 3.16.
As a consequence of the above property we show that any state in a nonempty domain can be connected to the exterior of the domain by means of a vtj-connected cycle path made of cycles belonging to the domain itself. This will be a crucial point in the proof of Proposition 2.19.
Proof. If D is a cycle the statement is trivial. Assume D is not a cycle and consider {C i , i ∈ I} a partition of D into maximal cycles. Note that |I| ≥ 2. Now, we partition {C i , i ∈ I} into its maximal vtj-connected components {C (j) k , k ∈ I (j) }, for j belonging to some set of indexes J. More precisely, we have the following:

each collection {C (j) k , k ∈ I (j) } is a vtj-connected system of cycles;
k ′ for any j, j ′ ∈ J such that j ≠ j ′ , any k ∈ I (j) , and k ′ ∈ I (j ′ ) ;
for any is not a vtj-connected system of cycles.
By the property 1 above and by Proposition 3.21, if the union of the principal boundary of the cycles of one of those components does not intersect the exterior of D, then it necessarily intersects one of the cycles of one of the other components. Otherwise stated, for any j ∈ J k∈I (j) Now, consider x ∈ D and j 0 ∈ J such that x ∈ ∪ k∈I (j 0 ) C (j 0 ) k . We construct a sequence of indexes j 0 , j 1 , · · · ∈ J by using recursively the following rule k ′ = ∅ and let j r+1 = j until the if condition above is not fulfilled. Note that all the indexes j 0 , j 1 , . . . are pairwise not equal, namely, the algorithm above does not construct loops of maximal vtj-connected components. Indeed, if there were r and r ′ such that j r = j r ′ then the union of the maximal vtj-connected components corresponding to the indexes j r , j r+1 , . . . , j r ′ would be a vtj-connected system of cycles and this is absurd by definition of maximal connected component (see property 4 above).
Thus, since the number of maximal vtj-connected components in which the set {C i , i ∈ I} is partitioned is finite, the recursive application of the above rule produces a finite sequence of indexes j 0 , j 1 , . . . , j rx with r x ≥ 0 such that k , k ∈ I (jr) } for r = 0, . . . , r x we construct a vtj-connected cycle path C 1 , . . . , C n ⊂ D such that C 1 is the cycle containing x and belonging to the component {C (j 0 ) k , k ∈ I (j 0 ) } and C n is one of the cycles in the component {C

Example of partition into maximal cycles
It is interesting to discuss a constructive way to exhibit a partition into maximal cycles of a given D ⊂ X . For this reason we now describe a method inherited from the Metropolis setup in [19]. For D ⊂ X nonempty and x ∈ D, we consider
R D (x) := {x} ∪ {y ∈ X : Φ(x, y) < Φ(x, X \ D)}, (44)
namely, R D (x) is the union of {x} and of the points in X which can be reached by means of paths starting from x with height smaller than the height that it is necessary to reach to exit from D starting from x.
Proposition 3.23. Let D ⊂ X be nonempty and x ∈ D. Then: 1. R D (x) ⊂ D; 2. the set R D (x) is a cycle; 3. for any x ′ ∈ R D (x), R D (x ′ ) = R D (x).
Proof. The first item is clear by the definition of communication heights. Indeed, by contradiction, assume that there exists y ∈ R D (x) ∩ (X \ D), then Φ(x, y) satisfies simultaneously Φ(x, y) < Φ(x, X \ D) and Φ(x, y) ≥ Φ(x, X \ D), which is absurd.
Second item. We consider u, v ∈ R D (x) and we show that Φ(u, v) < Φ(x, X \ D). As a consequence, we will get that R D (x) is a maximal connected subset of X satisfying that the maximum internal communication cost is strictly smaller than the given threshold Φ(x, X \ D), and, by Proposition 3.9, these sets are cycles.
To prove the opposite inequality, consider ω ∈ Ω opt x ′ ,x . From Proposition 2.7, we get that x,X \D and note that Φ(ω ′ ) = Φ(x, X \ D). Then, the path ω ′′ ∈ Ω x ′ ,X \D obtained by concatenating ω and ω ′ satisfies (45) is thus completed.
Now we come back to the proof of the third item. We consider x ′ ∈ R D (x) and proceed by double inclusion. We first show that R D (x ′ ) ⊂ R D (x). Pick y ∈ R D (x ′ ): from the definition of R D (x ′ ) and (45), we get that Φ(x ′ , y) < Φ(x ′ , X \ D) = Φ(x, X \ D). Now we consider ω ∈ Ω opt x,x ′ , and by a concatenation argument similar to the one we already used twice, we get that
On the other hand, the inclusion R D (x) ⊂ R D (x ′ ) proceeds in the same vein. Consider y ∈ R D (x) so that Φ(x, y) < Φ(x, X \ D). Pick a path ω ∈ Ω opt x ′ ,x . Using again the symmetry Moreover, a concatenation argument shows that where we have also used that y ∈ R D (x). Finally, from (45), we deduce Φ(x ′ , y) < Φ(x ′ , X \ D), which implies y ∈ R D (x ′ ).
The main motivation for introducing the sets (44) is the fact that they provide in a constructive way a partition of a given set into maximal subcycles. The existence of such a partition is ensured by the structure of the set of cycles, see Proposition 2.10, but we point out that this way of obtaining the maximal subcycles of a given set D seems to be new in the context of the irreversible dynamics. Before stating precisely this result, for D ⊂ X , we set R D := {R D (x), x ∈ D}.
Proposition 3.24. Let D ⊂ X nonempty, then R D is a partition into maximal cycles of D.
Proof. In view of definition (44) and Proposition 3.23, the only not obvious point of this result is the one concerning maximality. Note that the maximality condition on cycles can be stated equivalently as follows: any cycle C ∈ C(X ) such that there exists R ∈ R D verifying R ⊂ C and R ≠ C satisfies C ∩ (X \ D) ≠ ∅. Now, assume that C ∈ C(X ) is a cycle strictly containing R D (x) for some x ∈ D. We will show that necessarily C ∩ (X \ D) ≠ ∅.
By definition of R D (x), C contains a point v ∉ R D (x), that is Φ(x, v) ≥ Φ(x, X \ D). As both x and v are elements of C, recalling Proposition 3.2, we get that On the other hand, we can choose y ∈ X \ D such that there exists ω ∈ Ω x,y satisfying Φ(ω) = Φ(x, X \ D). Then the above bound implies that Φ(C, X \ C) > Φ(ω) and in particular y ∈ C. Since y ∈ X \ D, the result follows.
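The construction of this subsection can be tried out on a toy landscape. The sketch below computes the sets R D (x) as the union of {x} and of the points y with Φ(x, y) < Φ(x, X \ D), and collects the distinct ones. It assumes that single-jump heights are given in a dictionary `jump_height` and that they are symmetric. The input format, the function names and the example are illustrative choices of ours.

```python
import heapq
import math

def heights_from(x, jump_height):
    """Minimax (bottleneck) variant of Dijkstra: for each reachable y,
    the smallest possible maximal jump height along a path from x to y."""
    best = {x: -math.inf}
    heap = [(-math.inf, x)]
    while heap:
        h, u = heapq.heappop(heap)
        if h > best[u]:
            continue  # stale heap entry
        for (s, v), jump in jump_height.items():
            if s == u and max(h, jump) < best.get(v, math.inf):
                best[v] = max(h, jump)
                heapq.heappush(heap, (best[v], v))
    return best

def R(D, x, states, jump_height):
    """The set R_D(x): x together with the points reachable from x below
    the height needed to leave D starting from x."""
    phi = heights_from(x, jump_height)
    exit_height = min((phi.get(y, math.inf) for y in states if y not in D),
                      default=math.inf)
    below = {y for y in states
             if y != x and phi.get(y, math.inf) < exit_height}
    return frozenset({x} | below)

def partition_into_maximal_cycles(D, states, jump_height):
    """Collect the distinct sets R_D(x) for x in D."""
    return {R(D, x, states, jump_height) for x in D}

# Toy landscape on a line a - b - c - d - e with symmetric jump heights.
states = ["a", "b", "c", "d", "e"]
jump_height = {}
edges = {("a", "b"): 1, ("b", "c"): 4, ("c", "d"): 2, ("d", "e"): 3}
for (x, y), height in edges.items():
    jump_height[(x, y)] = jump_height[(y, x)] = height

print(partition_into_maximal_cycles({"a", "b", "c", "d"}, states, jump_height))
```

In this example the domain {a, b, c, d} splits into {a, b} and {c, d}: the barrier of height 4 between b and c is higher than the height 3 needed to leave the domain from c, so the two blocks cannot be merged into a larger cycle contained in the domain.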

Proof of main results
In this section we prove the results stated in Sections 2.7 and 2.8. We follow the scheme of [19], but the proofs are a bit different. The proofs of Theorems 2.14 and 2.15 are quite similar to the analogous ones in [19], nevertheless we chose to include them for the sake of completeness.
Proof of Theorem 2.13. Proof of (18). Let C be the set of states y ∈ X such that Φ(x, y) < V m + H(x). By Proposition 3.2 the set C is a cycle and, by construction, x ∈ F(C) and Φ(C, X \ C) = V m . Hence, by Proposition 3.8 it follows that Γ(C) = V m − H(x). Finally, since X s ∩ C = ∅ implies τ x X s ≥ τ x ∂C , we have that (18) follows by item 2 in Proposition 3.11.
Proof of (19). As we have already remarked at the end of Section 2.5, see also Figure 2, X V m = X s . Hence, (19) is an immediate consequence of Proposition 2.19.
Before proving Theorem 2.14 we first state and prove the following preliminary integrability result.
Lemma 4.1. Given any real δ > 0 and any state x ∈ X , the family of random variables Y x β = τ x X s e −β(V m +δ) is uniformly integrable, more precisely, for any n ≥ 1 for β large enough.
Proof. For any n ≥ 1, by making use of the Markov property, we directly get Recalling that X V m = X s (see the end of Section 2.5) and making use of Proposition 2.19, we get that the above quantity is bounded from above by 2 −n as soon as β is large enough.
Proof of Theorem 2.14. Fix x ∈ X m and δ > 0. Combining the convergence to zero in probability of the random variables Y β = τ x X s e −(V m +δ)β , which has been shown in Theorem 2.13, and their uniform integrability stated in Lemma 4.1, we get that the family of random variables Y β converges to 0 in L 1 . Hence, for β large enough, On the other hand, by making use of Markov's inequality we get the following bound: Using once again Theorem 2.13, we obtain that there exists K > 0 such that as soon as β is large enough. Theorem 2.14 finally follows from bounds (48) and (49).
Proof of Theorem 2.15. We first prove item 1. Let x 0 be the recurrent state of Theorem 2.15 and recall (21)-(23). We consider s, t > 0 and let τ x 0 * (t) = inf{n ≥ tT β , X n ∈ {x 0 , X s }} be the first hitting time to the set {x 0 , X s } after time tT β for the chain X n started at x 0 .
Then we decompose: Using the Markov property and the fact that {τ x 0 X s > τ x 0 * (t)} ⊂ {X τ x 0 * (t) = x 0 }, we directly get: Combining monotonicity and the fast recurrence property (23), by the decomposition (50) we deduce Here and later, we made use of the following obvious monotonicity property: where T is any random variable. We bound the same quantity from above in a similar fashion. Namely, using (50) once again: Consider β large enough so that T ′ β ≤ T β . For any given integer k ≥ 1, combining (52) and monotonicity, we get: Given the definition of T β (see (21)), there exists r ∈ (0, 1) such that δ β + P β [τ x 0 X s > T β ] ≤ r as soon as β is large enough. As a consequence, for β large enough, the following inequality holds: and this implies the tightness of the family τ x 0 X s /T β . Combining the upper bound (52) and the lower bound (51), we deduce that the limit in law X of any subsequence τ x 0 X s /T β β k satisfies the relation: for any t, s ≥ 0 which are continuity points for the distribution of τ x 0 X s . Since the set of such points is dense in R and a distribution function is always right continuous, (54) is valid for every s, t ≥ 0. This implies that P β (X > t) = e −at with a ∈ (0, ∞]. It is clear that the case a = ∞ is excluded from the definition of T β , since it would imply that X is almost surely equal to zero, which is in contradiction with the fact that By the Portmanteau theorem, we get that and combining (55) and (56), we conclude that a = 1.
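The exponential limit can be observed numerically on the simplest possible toy model, which is not the chain studied in this paper: an escape time τ that is geometric with success probability p, so that P[τ > n] = (1 − p)^n. In the sketch below the time scale is taken to be the smallest n with P[τ > n] ≤ e^{−1}. This choice is an assumption standing in for the definition (21) of T β , and all function names are hypothetical.

```python
import math

def tail(p, n):
    """P[tau > n] for a geometric escape time with success probability p."""
    return (1.0 - p) ** n

def time_scale(p):
    """Smallest integer n with P[tau > n] <= 1/e (illustrative stand-in
    for the time scale T_beta)."""
    return math.ceil(-1.0 / math.log1p(-p))

def rescaled_tail(p, t):
    """P[tau / T > t], where T = time_scale(p)."""
    return tail(p, math.floor(t * time_scale(p)))

p = math.exp(-10.0)  # plays the role of exp(-beta * Gamma)
for t in (0.5, 1.0, 2.0):
    print(t, rescaled_tail(p, t), math.exp(-t))
```

For small p the rescaled tail is close to e^{−t}, and it approximately satisfies the factorization P[X > t + s] = P[X > t] P[X > s] that characterizes the exponential law in the argument above.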
As for item 2, combining the dominated convergence theorem and the uniform summability (53), we can write which entails the convergence (24). Item 3 directly follows from items 1 and 2 of the current theorem, which concludes the proof.
Now, given x, y ∈ X , we consider a minimal gate W ⊂ W(x, y) as in Definition 2.12 and we turn to the proofs of Theorem 2.16 and of Proposition 2.17.
To prove both these results, we first construct in a more formal way the tube of typical trajectories K x,y introduced in Section 2.7; we stress that this task is performed by making extensive use of the notions developed in the previous parts. Then we show that K x,y is indeed typical in the low temperature regime, that is, we show Proposition 2.17. Our task to prove Theorem 2.16 will then be reduced to showing the inclusion K x,y ⊂ {τ x W < τ x y }. To give an explicit description of the set K x,y , we first need to introduce some typical events and recurrent notations.
We introduce the positive quantity and let ε ∈ (0, δ 0 /2). We define the cycle Of course, the cycle C coincides with the cycle C x,y of Definition 2.12 and we define it in this way for technical purposes only.
Note that any path in Ω opt x,y is contained in C. Also, we already noted (and this is actually the major technical difference with the analogous result of [19]) that there might be paths contained in C, joining x to y and which do not belong to Ω opt x,y . We introduce the decomposition M = {C j , j ≤ n 0 } of C into maximal strict subcycles of C. The decomposition M is an isolated vtj-connected system of cycles (see Proposition 3.18). Then we discuss some geometrical properties of the decomposition M.
We first note that it is clear that x and y are not contained in the same element of M. Indeed, if they were contained in a common element C̃ ∈ M, we would have Φ(x, y) < H(C̃) + Γ(C̃) and in particular, from the definition of C, this would imply C ⊂ C̃, which is absurd from the non triviality of the decomposition M. Thus we can denote by C(x) and C(y) the two (distinct) elements of M containing respectively the states x and y. More generally, for any u ∈ C, we define C(u) as being the element of M containing u.
To define K x,y , we shall start to restrict the set of trajectories to the set of trajectories Ω x,y ∩ {τ x y < τ x X \C }, for which the events we are going to introduce are well defined. More precisely, for a given trajectory of the canonical process ω ∈ Ω x,y ∩ {τ x y < τ x X \C }, we first define θ x 0 := 0, C x 0 = C(x) and for j ≥ 1: and C x j = C(ω θ x j ) is the element of M containing ω θ x j . This construction goes on as long as j ≤ j x,y , where we consider j x,y := inf{j ≥ 1, C x j = C(y)}. More generally, for any u ∈ C, we introduce the similar quantities (θ u j ) j , (C u j ) j , with notations which are self explanatory.
Then we introduce the event which is the event that the process hits y after entering C(y) before leaving C(y) for the first time.
For 0 ≤ i ≤ n 0 and u ∈ C i , we introduce the event where (ω u k ) k≥0 denotes a trajectory of the canonical process starting from u. Finally we can define the set K x,y , the tube of trajectories of the dynamics on its transition between x and y: We refer to Section 2.7 for an informal definition of K x,y .
Proof of Proposition 2.17.
We prove that as soon as β is large enough. Our proof first relies on the fact that, given δ > 0, for β large: which follows from the finiteness of X and Corollary 3.12. Then we will use the fact that for any ε ′ > 0, as soon as β is large enough: which we show at the end of the proof of Proposition 2.17. Let us note that in [9], the authors showed a result related to ours, in the sense that they provide the precise cost on a large deviation scale of not following a path contained in K x,y ∩Ω opt x,y on the transition from x to y. For our sake such a level of precision is not needed. On the other hand, we had to deal with the (easy) problem of giving an upper bound on the random variable j x,y , which was overcome in [9] by the notion of pruning tree.
We show how to deduce Proposition 2.17 by combining (58) and (59). To lighten the notation, we introduce the conditional probability in the next sequence of inequalities. Of course, since y ∈ C and y ∈ C(y), applying the strong Markov property at time τ x C(y) and Definition 2.9 we immediately get that, for any ε ′ > 0, as soon as β is large enough: It follows from this inequality that similar inequalities to (58) and (59) also hold for the probability P̃ β instead of P β , and we will still refer to these slightly modified versions of (58) and (59) as (58) and (59) in the following.
Denoting by ε ′ a (small) positive constant which may change from line to line, we then get: where we used (60), (59) and the strong Markov property. Now from (58), we get and considering δ > ε, the statement of Proposition 2.17 follows. Now we are left with the proof of (59). Since M is an isolated vtj-connected system of cycles, we deduce that as soon as β is large enough. Indeed, there exists a vtj-connected path of cycles (C̃ u 1 , . . . , C̃ u m ) of length m (with m ≤ n 0 ) joining C(u) to C(y). For any u ∈ C, applying the strong Markov property at the time of first entrance into C̃ u 1 and proceeding iteratively, we get: where in the third inequality we used Corollary 3.12 and the definition of vtj-connectedness. Since the last term does not depend on u, we get (61). Making use recursively of the strong Markov property at times θ x ke εβ /n 0 , k = 1, . . . , n 0 , of the trivial bound n 0 ≤ |X | and of (61), we get: (63) and (59) then follows by choosing ε ′ ∈ (0, ε/|X | 2 ). This concludes the proof of Proposition 2.17.
Then we note that considering Proposition 2.17, for Theorem 2.16 to hold, it is enough to show the inclusions Ω opt Indeed, this implies in particular the trivial bound and Proposition 2.17 provides the requested lower bound on this last quantity. We remark that the inclusions of (65) are strict in general. The first inclusion follows immediately from the fact that an optimal path in Ω opt x,y exits from an element of M through its principal boundary. Also, it is clear that some paths in the set K x,y might not be optimal, and hence that it might be strict in general.
The second inclusion of (65) is not straightforward and we stress that it relies crucially on Proposition 3.6. Let us detail it.
Consider first the case ω ∈ K x,y ∩ Ω opt x,y . Since ω ∈ Ω opt x,y , by definition of a gate (see Section 2.6), it follows immediately that ω ∩ W ≠ ∅.
Consider now an element ω ∈ K x,y \ Ω opt x,y , that is ω is an element of K x,y such that Φ(ω) > Φ(x, y).
To show the second inclusion of (65), the strategy is the following: we consider the sequence of points (u 1 , . . . , u j ) which are the successive points where ω intersects the union of the principal boundaries B(C), C ∈ M. The sequence (u 1 , . . . , u j ) is nonempty from the construction of K x,y and from the fact that C(x) ≠ C(y). We are going to construct stepwise a path ω̃ ∈ K x,y ∩ Ω opt x,y such that From the definition of a gate and from the fact that ω̃ is optimal, we deduce that ω̃ ∩ W ≠ ∅. From this it follows that ω ∩ W = ω̃ ∩ W ≠ ∅, which indeed implies the second inclusion of (65).
To construct the path ω̃, we proceed in a recursive way; more precisely, we construct a sequence of paths (ω (k) ) k≥0 ∈ K x,y which becomes stationary for k large enough. We initialize our recursion by setting ω (0) := ω. Then, as long as the path ω (k) is not optimal, we proceed in the following way: consider and C k the element of M containing ω (k) i k . Then we distinguish two cases:
• In the case where ω (k) i k +1 ∈ B(C k ), we make use of (30) in Proposition 3.6 and of (64) to get that there exists a path ω ′ ∈ Ω ω (k) and for any j ≤ |ω ′ | − 1, ω ′ j ∈ C k . We define the concatenated path Note that ω (k+1) ∈ K x,y and that u ∈ ω (k+1) . Then we continue the recursive construction.
• In the case where ω (k) y) and such that ω ′ is entirely contained in C k . Then we define the path ω (k+1) as in (67), and we note that in this case also ω (k+1) ∈ K x,y and u ∈ ω (k+1) .
It is clear from the construction that the sequence of paths (ω (k) ) k≥0 is stationary after a number of steps at most |ω|, and that the final path ω̃ obtained at the end of the recursion is an element of K x,y ∩ Ω opt x,y satisfying (66). Hence the second inclusion in (65) follows, and thus Theorem 2.16 is proved.
We now turn to the proof of Proposition 2.19. We first note that, in the spirit of [19], we need a downhill cycle path (see the definition in Section 3.5) connecting any given point x ∈ X \ X a , for a > 0, to X a . We recall that the notion of downhill cycle path given in [19] and [22], even if quite peculiar to the Metropolis dynamics setup, finds its natural extension to the general rare transition setup in [23] and in [9] through the notion of "via typical jumps" connection.
Proof of Proposition 2.19. Let $a > 0$. We assume that $\mathcal X_a$ is a proper subset of $\mathcal X$; otherwise there is nothing to prove. We consider $x \in \mathcal X \setminus \mathcal X_a$ and note that, by Proposition 3.22, there exists a vtj-connected cycle path $C_1, \dots, C_l \subset \mathcal X \setminus \mathcal X_a$ such that $x \in C_1$ and $B(C_l) \cap \mathcal X_a \neq \emptyset$.
Since none of the cycles $C_1, \dots, C_l$ can contain points of $\mathcal X_a$, for any $i = 1, \dots, l$ and any $z \in F(C_i)$ the stability level $V_z$ (recall definition (11)) of $z$ satisfies $V_z \le a$, and hence, from item 3 in Proposition 3.10, we have $\Gamma(C_i) \le a$ for any $i = 1, \dots, l$.
Then, from item 1 in Proposition 3.11, for any cycle $C_i$ of the vtj-connected path, for any $z \in C_i$, and for any $\varepsilon > 0$, the function is SES. We consider $y \in B(C_l) \cap \mathcal X_a$ and, for each $2 \le i \le l$, we consider $y_i \in B(C_{i-1}) \cap C_i$. We define $y_1 = x$ and $y_{l+1} = y$, and we consider the set of paths consisting of the paths constructed by the concatenation of any $l$-tuple of paths $\omega^1, \omega^2, \dots, \omega^l$ satisfying the following conditions:
1. for any $i = 1, \dots, l$, the length of the path $\omega^i$ satisfies $|\omega^i| \le e^{\beta(a+\varepsilon/4)}$;
2. for any $i = 1, \dots, l$, the path $\omega^i$ joins $y_i$ to $y_{i+1}$, that is, $\omega^i \in \Omega_{y_i, y_{i+1}}$ (recall the notation introduced in Section 2.3);
3. $\omega^i_j \in C_i$ for any $i = 1, \dots, l$ and for any $j = 1, \dots, |\omega^i| - 1$.
The existence of such a family of paths is ensured by Propositions 3.2, 3.5, 3.6, and 3.8. For brevity, in the sequel we denote by $E$ the set of trajectories defined in (68). We stress that condition 1 restricts the set $E$ to paths which spend a time less than $e^{\beta(a+\varepsilon/4)}$ in any cycle $C_i$, $i \le l$.

Now, we write
where in the last step we have used the bound above on the length of the trajectories in $E$. Then we use the Markov property to get that Combining this inequality and (39) implies that, for any $\varepsilon' > 0$, $\mathbb P_\beta\big(\tau^x_{\mathcal X_a} \le e^{\beta(a+\varepsilon/2)}\big) \ge e^{-\beta\varepsilon' l} \ge e^{-\beta\varepsilon' |\mathcal X|}$ as soon as $\beta$ is large enough.
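Since the last term in the right hand side of the bound above does not depend on $x \in \mathcal X_a$, we get that $\inf_{x \in \mathcal X_a} \mathbb P_\beta\big(\tau^x_{\mathcal X_a} \le e^{\beta(a+\varepsilon/2)}\big) \ge e^{-\beta\varepsilon' |\mathcal X|}$.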
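A uniform lower bound of this kind is usually turned into a super-exponentially small tail by restarting the chain on consecutive time windows. The following sketch spells out that step. It assumes that Proposition 2.19 asserts that $\beta \mapsto \sup_{x} \mathbb P_\beta(\tau^x_{\mathcal X_a} > e^{\beta(a+\varepsilon)})$ is SES, and that the lower bound above holds uniformly over all starting points $x \in \mathcal X$ (it is trivial when $x \in \mathcal X_a$). The symbols $T$, $N$, and $p$ are introduced here for illustration only and do not appear in the original text.

```latex
% Assumed notation: T = window length, N = number of windows,
% p = uniform probability of hitting X_a within one window.
\begin{align*}
T &:= e^{\beta(a+\varepsilon/2)}, \qquad
N := \lfloor e^{\beta\varepsilon/2} \rfloor, \qquad
p := e^{-\beta\varepsilon'|\mathcal X|}, \\
% Markov property applied at times T, 2T, ..., (N-1)T:
\sup_{x \in \mathcal X} \mathbb P_\beta\big(\tau^x_{\mathcal X_a} > NT\big)
  &\le (1-p)^N \le e^{-pN}, \\
% Since NT <= e^{\beta(a+\varepsilon)} and N >= e^{\beta\varepsilon/2}/2 for large beta:
\sup_{x \in \mathcal X} \mathbb P_\beta\big(\tau^x_{\mathcal X_a} > e^{\beta(a+\varepsilon)}\big)
  &\le \exp\Big(-\tfrac12\, e^{\beta(\varepsilon/2-\varepsilon'|\mathcal X|)}\Big).
\end{align*}
```

Choosing $\varepsilon' < \varepsilon/(2|\mathcal X|)$ makes the exponent $\varepsilon/2 - \varepsilon'|\mathcal X|$ strictly positive, so that $\frac{1}{\beta}\log$ of the right hand side tends to $-\infty$ as $\beta \to \infty$.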

A Computing differences of virtual energy
In this appendix, we describe an abstract framework in which the virtual energy has a priori no explicit expression, but where we can construct it stepwise starting from a reference point acting as a point of null potential. We consider a Freidlin-Wentzell dynamics satisfying Definition 2.1 and such that, for every $x, y \in \mathcal X$, $\Delta(x, y) < \infty$ if and only if $\Delta(y, x) < \infty$.
Moreover, we assume that the dynamics satisfies the additional condition (where we recall that µ β is the invariant measure).
Of course, the convergence (74) amounts to requiring the existence of a potential, which is equal to the virtual energy up to a constant (see (4) and Proposition 2.6). Now we fix an arbitrary state $\bar x \in \mathcal X$ and we define the Hamiltonian-like quantity For any $x \in \mathcal X$, $x \neq \bar x$, by irreducibility, there exists a path $\omega \in \Omega_{\bar x, x}$ such that $|\omega| \le |\mathcal X|$. Given such a path, we define the quantity and we set $W_\omega(\bar x) := 0$.
Proposition A.2. Given $x \in \mathcal X$ with $x \neq \bar x$, the quantity $W_\omega(x)$ defined by (76) does not depend on the particular choice of the path $\omega \in \Omega_{\bar x, x}$, and hence it defines a function $W : \mathcal X \to \mathbb R$. The function $W(\cdot) - \min_{\mathcal X} W$ coincides with the virtual energy $H$.
In general, the virtual energy might have an expression too involved for practical purposes. Equation (76) provides a constructive way to compute explicitly H step by step just from the knowledge of the rates of the dynamics.
Proof. For any $x, y \in \mathcal X$ with $x \neq y$, we consider $\omega, \omega' \in \Omega_{x,y}$ and show that Indeed, using telescoping sums in the right hand side above, we can assume that all the $\omega'_i$'s are distinct (and in particular $|\omega'| \le |\mathcal X|$).
Now we divide both sides by β and we let β → ∞ to deduce (77).
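The stepwise construction described in this appendix can be illustrated numerically. The sketch below is not the authors' formula: the display (76) is not reproduced here, so the code assumes that the increment of $W$ along one step $x \to y$ of the path is $\Delta(x,y) - \Delta(y,x)$, which is what a detailed-balance-type relation between the costs and the potential would give. The function names `potential_from_costs` and `virtual_energy`, and the three-state example, are hypothetical.

```python
from collections import deque


def potential_from_costs(delta, ref):
    """Build W stepwise from the reference state `ref`.

    `delta` maps ordered pairs (x, y) to the finite cost Delta(x, y).
    ASSUMPTION (not taken from the paper): one step x -> y changes W by
    Delta(x, y) - Delta(y, x).
    """
    neighbours = {}
    for (x, y) in delta:
        if (y, x) not in delta:
            raise ValueError("Delta(x, y) finite but Delta(y, x) infinite")
        neighbours.setdefault(x, []).append(y)

    W = {ref: 0}                      # the reference point has null potential
    queue = deque([ref])
    while queue:                      # breadth-first exploration from ref
        x = queue.popleft()
        for y in neighbours.get(x, []):
            candidate = W[x] + delta[(x, y)] - delta[(y, x)]
            if y not in W:
                W[y] = candidate
                queue.append(y)
            elif abs(W[y] - candidate) > 1e-9:
                # Two paths gave different values: no potential exists.
                raise ValueError("W depends on the chosen path")
    return W


def virtual_energy(delta, ref):
    """Shift W so that its minimum is zero, as in W - min W."""
    W = potential_from_costs(delta, ref)
    lowest = min(W.values())
    return {x: w - lowest for x, w in W.items()}


# Hypothetical Metropolis-like example on the chain a - b - c with
# energies a: 0, b: 2, c: 1 and costs Delta(x, y) = max(H(y) - H(x), 0).
example_delta = {
    ("a", "b"): 2, ("b", "a"): 0,
    ("b", "c"): 0, ("c", "b"): 1,
}
example_energy = virtual_energy(example_delta, ref="b")
```

In this toy example the reconstruction started from `b` recovers the energies used to build the costs, whatever reference state is chosen. The path-independence check mirrors the first claim of Proposition A.2: if two paths yield different values, the exploration raises an error instead of returning a function.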

B Explicit expression of the virtual energy
As noted in Section 2.2, the virtual energy H(x), for x ∈ X , has an explicit expression in terms of a specific graph construction. The same holds for the functions Γ D (x) and ∆ D (x, y), with D ⊂ X , x ∈ D, and y ∈ X \ D, introduced in Proposition 2.11. These explicit expressions were not necessary for our purposes, but for the sake of completeness, we choose to summarize these formulas in this appendix. We use the notations of [8], but since we do not want to develop the full theory here, we try to keep it as minimal as possible.
Definition B.1. Given $A \subset \mathcal X$ nonempty, let $G(A)$ be the set of oriented graphs $g \subset \mathcal X \times \mathcal X$ verifying the following properties:
- for any $x \in \mathcal X \setminus A$, there exists a unique $y \in \mathcal X$ such that $(x, y) \in g$ (namely, for any point in $\mathcal X \setminus A$, there exists a unique arrow of the graph $g$ exiting from that point);
- for any edge $(x, y) \in g$, $x \in \mathcal X \setminus A$ (no arrow of the graph $g$ exits from $A$);
- for any $x \in \mathcal X$, $n \in \mathbb N$, and $(x, x_1), (x_1, x_2), \dots, (x_{n-1}, x_n) \in g$, one has $x \neq x_i$ for $i = 1, \dots, n$ (the graph $g$ is without loops).
Since X is finite, from this definition it follows that for x ∈ X \ A, there exists a sequence of arrows connecting x to A. We borrow (and adapt to our notation) a beautiful description of the set G(A) from [23, below Definition 3.1]: G(A) is a forest of trees with roots in A and with branches given by arrows directed towards the root.
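For a small state space, the set $G(A)$ can be enumerated by brute force, which makes the forest picture concrete. The sketch below is only an illustration of Definition B.1. It uses the fact, noted above, that when every point outside $A$ has exactly one outgoing arrow, the absence of loops amounts to every such point reaching $A$ by following arrows. The names `enumerate_G` and `reaches_A` are hypothetical.

```python
from itertools import product


def reaches_A(x, succ, A):
    """Follow the unique outgoing arrows from x; True if A is reached."""
    seen = set()
    while x not in A:
        if x in seen:          # we came back to a visited point: a loop
            return False
        seen.add(x)
        x = succ[x]
    return True


def enumerate_G(X, A):
    """Return all graphs of G(A) as frozensets of arrows (x, y).

    Each point outside A gets exactly one outgoing arrow, points of A
    get none, and graphs containing a loop are discarded.
    """
    X = list(X)
    A = set(A)
    free = [x for x in X if x not in A]
    graphs = []
    for targets in product(X, repeat=len(free)):
        succ = dict(zip(free, targets))
        if all(reaches_A(x, succ, A) for x in free):
            graphs.append(frozenset(succ.items()))
    return graphs


# Three states with a single root: the graphs are the three trees
# {1->3, 2->3}, {1->2, 2->3} and {1->3, 2->1}.
forests = enumerate_G([1, 2, 3], {3})
```

The enumeration visits $|\mathcal X|^{|\mathcal X \setminus A|}$ candidate graphs, so it is meant for checking small examples by hand, not for actual computations.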
Definition B.2. Given $A \subset \mathcal X$ nonempty, $x \in \mathcal X \setminus A$, and $y \in A$, let $G_{x,y}(A)$ be the collection of graphs $g \in G(A)$ such that there exist $n \in \mathbb N$ and $x_1, \dots, x_n \in \mathcal X$ such that $(x, x_1), (x_1, x_2), \dots, (x_n, y) \in g$.
In words, G x,y (A) is the set of graphs in G(A) connecting the point x to the point y.
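Membership in $G_{x,y}(A)$ can be tested by following the arrows from $x$ until a point of $A$ is hit. The sketch below assumes that the input graph already belongs to $G(A)$, so that the walk terminates, and it also counts a direct arrow $(x, y)$ as a connection, which is an assumption about how the case without intermediate points is meant to be read. The names `root_of` and `in_G_xy` are hypothetical.

```python
def root_of(g, x, A):
    """Return the point of A reached from x by following arrows of g.

    ASSUMPTION: g is an element of G(A), i.e. every point outside A has
    exactly one outgoing arrow and g has no loops; otherwise the walk
    below may fail or never stop.
    """
    succ = dict(g)             # arrows (x, y) stored as x -> y
    while x not in A:
        x = succ[x]
    return x


def in_G_xy(g, x, y, A):
    """True if the graph g of G(A) connects x to the root y in A."""
    return root_of(g, x, A) == y


# Forest on {1, 2, 3, 4} with roots A = {2, 4}: arrows 1 -> 2 and 3 -> 4.
example_graph = {(1, 2), (3, 4)}
example_roots = {2, 4}
```

In this example the point 1 is connected to the root 2 but not to the root 4, so the graph belongs to $G_{1,2}(A)$ and not to $G_{1,4}(A)$.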