Effect of energy degeneracy on the transition time for a series of metastable states: application to Probabilistic Cellular Automata

We consider the problem of metastability for stochastic reversible dynamics with exponentially small transition probabilities. We generalize previous results in several directions. We give an estimate of the spectral gap of the transition matrix and of the mixing time of the associated dynamics in terms of the maximal stability level. These model-independent results hold in particular for a large class of Probabilistic Cellular Automata (PCA), which we then focus on. We consider the PCA in a finite volume, at small and fixed magnetic field, and in the limit of vanishing temperature. This model is peculiar because of the presence of three metastable states, two of which are degenerate with respect to their energy. We identify rigorously the metastable states by giving explicit upper bounds on the stability level of every other configuration. We rely on these estimates to prove a recurrence property of the dynamics, which is a cornerstone of the pathwise approach to metastability. Further, we also identify the metastable states according to the potential-theoretic approach to metastability, and this allows us to give precise asymptotics for the expected transition time from any such metastable state to the stable state.


Introduction
Metastability is a phenomenon that occurs when a physical system is close to a first order phase transition. Classical examples include super-saturated vapors and ferromagnetic materials in a hysteresis loop [51]. The phenomenon occurs, for certain ranges of the thermodynamical parameters, when a system is trapped for a long time in a state different from the stable state, the so-called metastable state. While the system is trapped, it behaves as if it were in equilibrium, until at a certain time it makes a sudden transition from the metastable state to the stable state. Metastability arises in several physical situations, and this has led to the formulation of numerous models of metastable behavior. However, in each case three issues are typically investigated. The first is the study of the transition time from any metastable state to the set of stable states. The transition is triggered by fluctuations of the dynamics, but large fluctuations are very unlikely, so the system typically remains stuck in the metastable state for an exponentially long time. The second issue is the identification of certain configurations, the so-called critical configurations, that trigger the transition: the system fluctuates in a neighborhood of the metastable state until, during the last excursion, it visits the set of critical configurations, after which it relaxes to equilibrium. The third and last issue is the study of the typical path that the system follows during the transition from the metastable state to the stable state, the so-called tube of typical trajectories. This last issue is especially interesting from the physics point of view.
The goal of this paper is twofold. First we consider general dynamics with exponentially small transition probabilities and we give an estimate of the mixing time and of the spectral gap of the transition matrix in terms of the maximal stability level. Second, we focus on a specific Probabilistic Cellular Automata in a finite volume, at small and fixed magnetic field, in the limit of vanishing temperature and we prove some results describing the metastable behaviour of the system.
Let us now discuss the two goals in detail, starting with a comparison between our estimates for the mixing time and the spectral gap and the literature on the topic. Similar estimates of the spectral gap have been proved for the model of simulated annealing in [38], where the authors use Sobolev inequalities to study the simulated annealing algorithm and show that this approach gives detailed information about the rate at which the process approaches its ground state. Thanks to this result, the mixing time is estimated for Metropolis dynamics. Our model-independent theorems generalize the result in [45, Proposition 3.24] to reversible dynamics with exponentially small transition probabilities in finite volume. The analysis of the spectral gap between the zero eigenvalue and the next-smallest eigenvalue of the generator is very interesting for Markov processes, since it controls the convergence to equilibrium. In [10] the authors focus on the connection between metastability and spectral theory for the so-called generic Markov chains under a non-degeneracy assumption; in particular, they use spectral information to derive sharp estimates on the transition times. We refer also to [7, Chapters 8 and 16], where the authors collect the previous results on the study of metastability through spectral data. In particular, they show that the spectrum of the generator decomposes into a cluster of very small real eigenvalues that is separated by a gap from the rest of the spectrum. In order to study our PCA, we extend their estimates of the spectral gap to the case of metastable states that are degenerate in energy. The states σ and η are degenerate metastable states if they have the same energy and the energy barrier between them is smaller than the energy barrier between a metastable state and the stable state (see Condition 2.4 for a precise formulation and [7, Chapter 16.5, point 3] for a discussion).
To suit our purposes, we express these estimates as functions of the virtual energy instead of the Hamiltonian function, see Equation (2.5) for the specific definition and [14], [21].
Regarding the expected transition time, in [25] the authors consider a series of two metastable states with decreasing energy in the framework of reversible finite-state-space Markov chains with exponentially small transition probabilities. Under certain assumptions, they not only find the (exponential) order of magnitude of the transition time from the first metastable state to the stable state, but also give an addition rule to compute the prefactor. We generalize their results on the mean transition time and their addition rule to a setting with several degenerate metastable states, see Section 2.4 for details.
The second goal concerns a particular Probabilistic Cellular Automaton. Cellular Automata (CA) are discrete-time dynamical systems on a spatially extended discrete space and are used in a wide range of applications, for example to model natural and social phenomena. Probabilistic Cellular Automata (PCA) are the stochastic version of Cellular Automata, where the updating rules are random, i.e., the configurations are chosen according to probability distributions determined by the neighborhood of each site. Mathematically, we consider PCA with parallel (synchronous) dynamics, i.e., systems of finite-state Markov chains whose distribution at time n depends only on the states in a neighboring set at time n − 1. PCA are characterized by a matrix of transition probabilities from any configuration σ to any other configuration η defined as a product of local transition probabilities, p(σ, η) := ∏_{i∈Λ} p_{i,σ}(η(i)), for σ, η ∈ X, where Λ ⊂ Z^2 is a finite box with periodic boundary conditions and X = {−1, +1}^Λ is the set of all configurations. Here we consider a specific PCA in the class introduced by Derrida [30], where the local transition probability p_{i,σ}(a) is a certain function of the sum of neighboring spins S_σ(·) (2.30) and the external magnetic field h. We obtain our PCA by summing only over the nearest-neighbor sites, see (3.1) and Figure 3.1. When the sum is carried out over a symmetric set, the resulting dynamics is reversible with respect to a suitable Gibbs-like measure µ defined via a translation-invariant multi-body potential, see (2.28). This measure depends on a parameter β which can be thought of as the inverse temperature of the system. For small values of the temperature, the PCA is likely to be found in the local minima of the Hamiltonian associated to µ. The metastable behavior of this model has been investigated on heuristic and numerical grounds in [6].
A key quantity in the study of metastability is the energy barrier from one of the metastable states to the stable state. This is the minimum, over all paths connecting the metastable state to the stable state, of the maximal transition energy along the path, minus the energy of the starting configuration (see (2.8)-(2.9)). Intuitively, the energy barrier from σ to η is the energy that the system must overcome to reach η starting from σ. For our choice of parameters, our PCA has one stable state +1 and, peculiarly, three metastable states, which we identify rigorously as {−1, c_e, c_o}. To prove this, we construct for each configuration σ ∉ {−1, c_e, c_o, +1} a path starting from σ and ending in a lower-energy state, such that the maximal energy along the path is lower than the energy barrier from −1 to +1. This leads to an explicit upper bound V* for the stability level of every configuration outside {−1, c_e, c_o, +1}, in Lemma 3.1, which we will refer to as our main technical tool. We rely on this estimate to prove two recurrence properties. The first is that, starting from any configuration, the system reaches the set {−1, c_e, c_o, +1} within a time e^{βV*} with probability exponentially close to one. The second is that, starting from any configuration, the system reaches +1 within a time e^{βΓ_PCA}; to prove it, we combine our main tool with the computation of the energy barrier Γ_PCA in [19]. We remark that c_e and c_o are two degenerate metastable states, since they have the same energy and the energy barrier between them is zero. Hence, we will use the shorthand c = {c_e, c_o}.
In order to find sharp estimates of the transition time from −1 to +1, we extend in Section 2.4, and then verify, the three model-dependent conditions given in [25]. These are, respectively, our main technical tool, the property that starting from −1 the system visits the chessboard c before reaching +1 with high probability [19], and the computation of the constants k_1 and k_2 in [24]. In fact, the sharp estimates on the transition time which we give here were already stated in [24], but the proof there missed some key steps, which we provide here. First, our Lemma 3.1 was assumed to hold without proof, and the generalizations given in Theorems 2.8, 2.9, 2.10, 2.11 and 2.12 were not carried out explicitly. To prove these statements, we use the model-independent theorems discussed earlier together with model-dependent inputs such as the energy barrier.
Regarding the model-dependent results, [19] focuses on the transition from the metastable states to the stable state. In particular, the authors describe the tube of typical trajectories and they also estimate the transition time. To do this, they analyze the geometrical conditions for the shrinking or the growing of a cluster. Furthermore, they characterize the local minima of the energy and the so-called traps for the PCA dynamics. Building on this, we construct a specific path from any cluster to the stable state that the system follows with probability tending to one. Our estimates of the stability levels in Lemma 3.1 are based on these characterizations.
The authors in [23] consider a reversible PCA model with self-interactions. In particular, they prove the recurrence of the dynamics to the set {−1, +1} and that −1 is the unique metastable state. They estimate the transition time in probability, in L^1 and in law. Moreover, they characterize the critical droplet that is visited by the system with probability tending to one during its excursion from the metastable to the stable state. Furthermore, in [44] the authors prove sharp estimates for the expected transition time by computing the prefactor explicitly.
State of the art. A first mathematical description of metastability [51] was inspired by Gibbsian equilibrium statistical mechanics and was based on the computation of expected values with respect to restricted equilibrium states. The first dynamical approach, known as the pathwise approach, was initiated in [13] and developed in [48,49,54], see also [50]. This approach derives large deviation estimates of the first hitting time and of the tube of typical trajectories. It is based on the notions of cycles and cycle paths, and it hinges on a detailed knowledge of the energy landscape. Independently, similar results based on a graphical definition of cycles were derived in [15,14] and applied to reversible Metropolis dynamics and to simulated annealing in [16,55]. The pathwise approach was further developed in [40,20,21] to disentangle the study of the transition time from that of the typical trajectories. This method was applied in [1,18,26,37,28,36,39,43,46,47,50] for Metropolis dynamics and in [19,23,22] for parallel dynamics.
The potential-theoretical approach is based on the study of the hitting time through the use of the Dirichlet form and spectral properties of the transition matrix. One of the advantages of this method is that it provides an estimate of the expected value of the transition time including the prefactor, by exploiting a detailed knowledge of the critical configurations, see [11,7]. This method was applied in [2,12,25,8,29] for Metropolis dynamics and in [44] for parallel dynamics.
Outline. The paper is organized as follows. In Section 2 we define the general setup and present the main model-independent results, with some applications to concrete models. In Section 3 we describe the reversible PCA model that we consider and present the main model-dependent results. In Section 4 we carry out the proofs of the model-independent results, and in Section 5 the proofs of the model-dependent results. Finally, in Appendix A we recall some results and give explicit computations that are used in the paper, and in Appendix B we prove the theorems stated in Section 2.4.

General setup and definitions
Let X be a finite set, which we refer to as the state space, and let ∆ : X × X → R_+ ∪ {∞} be a function, which we call the rate function. ∆ is said to be irreducible if for every x, y ∈ X there exists a path ω = (ω_1, ..., ω_n) ∈ X^n with ω_1 = x, ω_n = y and ∆(ω_i, ω_{i+1}) < ∞ for every 1 ≤ i ≤ n − 1, where n is a positive integer. A family of time-homogeneous Markov chains (X_n)_{n∈N} on X with transition probabilities P_β indexed by a positive parameter β is said to have rare transitions with rate function ∆ when

lim_{β→∞} −(1/β) log P_β(x, y) = ∆(x, y)   (2.1)

for any x, y ∈ X. Intuitively, ∆(x, y) = +∞ should be understood as the fact that, when β is large, there is no possible transition between the states x and y. We also note that condition (2.1) is sometimes written more explicitly as [21, Equation (2.2)]: for any γ > 0 there exists β_0 > 0 such that for any β > β_0 and any x, y ∈ X,

e^{−β∆(x,y)−βγ} ≤ P_β(x, y) ≤ e^{−β∆(x,y)+βγ},   (2.2)

where the parameter γ is a function of β that vanishes for β → ∞. Because of this, we also refer to the function ∆(x, y) as the energy cost of the transition from x to y. We assume that the Markov chain (X_n)_n satisfies the detailed balance property

P_β(x, y) e^{−βG(x)} = P_β(y, x) e^{−βG(y)}   (2.3)

for any x, y ∈ X, where G : X → R is the so-called Hamiltonian function. Equivalently, the Markov chain is reversible with respect to the Gibbs measure

µ_β(x) := e^{−βG(x)} / Σ_{y∈X} e^{−βG(y)}.   (2.4)
The virtual energy is defined, for any x ∈ X, as

H(x) := lim_{β→∞} −(1/β) log µ_β(x).   (2.5)

Definition (2.5) is well-posed, since for large β the Markov chain (X_n)_n is irreducible and its invariant probability distribution µ in (2.4) is such that for any x ∈ X the limit lim_{β→∞} −(1/β) log µ(x) exists and is a positive real number [21, Prop. 2.1]. Taking the limit β → ∞ in (2.3) yields

H(x) + ∆(x, y) = H(y) + ∆(y, x).   (2.6)

This motivates the following definition of transition energy:

H(x, y) := H(x) + ∆(x, y),   (2.7)

where x, y are configurations in X. The definition of transition energy is needed to define the height along a path ω in the general setting. Indeed, there may not exist a configuration whose energy is equal to the energy of the maximum along the path. The transition energy between two configurations is the sum of the virtual energy of the first configuration and the energy cost of the transition between the two configurations. This is unlike the Metropolis dynamics case [45], where the transition energy between two configurations is the virtual energy of some state along the path between the two.
Let ω = (ω_1, ..., ω_n) be a finite sequence of configurations. We call ω a path with starting configuration ω_1 and final configuration ω_n, and we denote its length by |ω| = n. We define the height along ω as

Φ_ω := max_{1≤i≤n−1} H(ω_i, ω_{i+1}).   (2.8)

Let x, y ∈ X be two configurations. The communication height between x and y is defined as

Φ(x, y) := min_{ω∈Θ(x,y)} Φ_ω,   (2.9)

where Θ(x, y) is the set of all paths ω starting from x and ending in y. Similarly, we define the communication height between two sets A, B ⊂ X as Φ(A, B) := min_{x∈A, y∈B} Φ(x, y). The first hitting time of A ⊂ X for the chain started at x ∈ X is defined as τ_A^x := min{n > 0 : X_n ∈ A}, with X_0 = x. Whenever possible we shall drop the superscript denoting the starting point from the notation.
For any x ∈ X, let I_x be the set of configurations with energy strictly lower than H(x), i.e., I_x := {y ∈ X : H(y) < H(x)}. The stability level V_x of x is the energy barrier that, starting from x, must be overcome to reach the set I_x, i.e., V_x := Φ(x, I_x) − H(x). If I_x is empty, then we let V_x = ∞. We denote by X^s the set of global minima of the energy, and we refer to these as ground states. The metastable states are those states that attain the maximal stability level

Γ_m := max_{x∈X \ X^s} V_x < ∞,   (2.13)

that is,

X^m := {x ∈ X \ X^s : V_x = Γ_m}.   (2.14)

Since the metastable states are defined in terms of their stability level, a crucial role in our proofs is played by the set of all configurations with stability level strictly greater than V, that is, X_V := {x ∈ X : V_x > V}. We frame the problem of metastability as the identification of the metastable states and the computation of the transition times from the metastable states to the stable configurations. In summary, from the mathematical point of view, the metastability phenomenon for a given system is described in terms of X^s, Γ_m and X^m. We now define formally the energy barrier Γ as

Γ := Φ(y^m, y^s) − H(y^m),

where y^m ∈ X^m and y^s ∈ X^s. Note that Γ does not depend on the specific choice of y^m, y^s. The energy barrier is the minimum energy necessary to trigger the nucleation. The energy Γ turns out to be equal to Γ_m under specific assumptions [20, Theorem 2.4].
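For intuition, the min-max quantities Φ and V_x can be computed explicitly on small landscapes. The sketch below is ours and not part of the paper's formal development; the function names and the toy landscape used in the comments are invented for illustration. It computes communication heights with a minimax variant of Dijkstra's algorithm, using the transition energies H(u, v) = H(u) + ∆(u, v):

```python
import heapq

def communication_height(H, edges, x, y):
    """Phi(x, y): minimum over paths from x to y of the maximal transition
    energy H(u) + Delta(u, v) along the path (minimax Dijkstra).
    H: dict state -> virtual energy; edges: dict (u, v) -> Delta(u, v)."""
    adj = {}
    for (u, v), d in edges.items():
        adj.setdefault(u, []).append((v, d))
    best = {x: H[x]}               # lowest achievable path height so far
    heap = [(H[x], x)]
    while heap:
        phi, u = heapq.heappop(heap)
        if u == y:
            return phi
        if phi > best.get(u, float("inf")):
            continue               # stale heap entry
        for v, d in adj.get(u, []):
            new_phi = max(phi, H[u] + d)   # height of the extended path
            if new_phi < best.get(v, float("inf")):
                best[v] = new_phi
                heapq.heappush(heap, (new_phi, v))
    return float("inf")

def stability_level(H, edges, x):
    """V_x = Phi(x, I_x) - H(x), with I_x the states strictly below H(x);
    returns infinity when I_x is empty (ground states)."""
    lower = [y for y in H if H[y] < H[x]]
    if not lower:
        return float("inf")
    return min(communication_height(H, edges, x, y) for y in lower) - H[x]
```

On a reversible landscape, i.e., one whose H and ∆ satisfy (2.6), feeding the two dictionaries to `stability_level` for every state immediately yields Γ_m and X^m by taking the maximum over the non-ground states.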
A different notion of metastable states is given in [10], within the framework of the potential-theoretic approach. The Dirichlet form associated with our reversible Markov chain is the functional

E_β(f) := (1/2) Σ_{x,y∈X} µ_β(x) P_β(x, y) [f(x) − f(y)]^2,

where f : X → R is a function. Given two nonempty disjoint sets Y, Z ⊂ X, the capacity of the pair Y and Z is defined as

cap_β(Y, Z) := min_{f : f|_Y = 1, f|_Z = 0} E_β(f).   (2.19)

Note that the capacity is a symmetric function of the sets Y and Z. It can be proven that the right-hand side of (2.19) has a unique minimizer, called the equilibrium potential of the pair Y and Z. There is a nice interpretation of the equilibrium potential in terms of hitting times. For any x ∈ X, we denote by P_x(·) and E_x[·] respectively the probability and the expectation along the trajectories of the process started at x. Then, it can be proven that the equilibrium potential of the pair Y and Z is equal to the function h_{Y,Z} defined by h_{Y,Z}(x) := P_x(τ_Y < τ_Z) for x ∉ Y ∪ Z, h_{Y,Z}(x) := 1 for x ∈ Y and h_{Y,Z}(x) := 0 for x ∈ Z, where τ_Y and τ_Z are, respectively, the first hitting times of Y and Z for the chain started at x. It can also be proven that, for any Y ⊂ X and z ∈ X \ Y,

cap_β(z, Y) = µ_β(z) P_z(τ_Y < τ_z).   (2.21)

A set M ⊂ X is called metastable in this framework if

lim_{β→∞} [ max_{x∉M} µ_β(x)/cap_β(x, M) ] / [ min_{y∈M} µ_β(y)/cap_β(y, M \ {y}) ] = 0.   (2.22)

In order to avoid confusion, we will denote the states that satisfy (2.22) as p.t.a.-metastable. The physical meaning of the above definition can be understood once one remarks that the quantity µ_β(x)/cap_β(x, y), for any x, y ∈ X, is strictly related to the communication cost between the states x and y, see Proposition B.5 for details. Thus, condition (2.22) ensures that the communication cost between any state outside M and M itself is smaller than the communication cost between any two states in M.
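To make the Dirichlet form, the equilibrium potential and the capacity concrete, consider a one-dimensional birth-death chain, where all three objects have explicit expressions. The sketch below is our illustration (the function names and the example chain are invented); it uses the classical fact that, with conductances c(k) = µ(k)P(k, k+1), the minimizer in (2.19) for Y = {0}, Z = {N} decreases linearly in the resistances 1/c(k), and that plugging it into the Dirichlet form returns the capacity 1/Σ_k 1/c(k):

```python
def equilibrium_potential(mu, p_right):
    """h(x) = P_x(tau_0 < tau_N) for a birth-death chain on {0, ..., N}:
    the minimizer of the Dirichlet form with h(0) = 1 and h(N) = 0,
    computed from the resistances r(k) = 1 / (mu[k] * p_right[k])."""
    N = len(mu) - 1
    r = [1.0 / (mu[k] * p_right[k]) for k in range(N)]
    R = sum(r)
    h, out = 1.0, []
    for k in range(N + 1):
        out.append(h)
        if k < N:
            h -= r[k] / R      # drop across edge (k, k+1) proportional to r(k)
    return out

def dirichlet_form(mu, p_right, f):
    """E(f) = sum over edges (k, k+1) of c(k) * (f(k+1) - f(k))**2,
    with conductances c(k) = mu[k] * p_right[k]."""
    return sum(mu[k] * p_right[k] * (f[k + 1] - f[k]) ** 2
               for k in range(len(mu) - 1))
```

Evaluating `dirichlet_form` at the output of `equilibrium_potential` reproduces cap_β({0}, {N}) exactly, which mirrors the variational characterization (2.19) in this simplest reversible setting.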

Main model-independent results
The following theorems give estimates of the mixing time and the spectral gap in the general setting.
Theorem 2.2. Let (P_β(x, y))_{x,y∈X} be the transition matrix of a Markov chain. Assume there exists at least one stable state s satisfying assumption (2.23). Then, for any 0 < ε < 1, the mixing time satisfies the asymptotics (2.24).

Theorem 2.3. Let (P_β(x, y))_{x,y∈X} be a reversible transition matrix. Let ρ_β = 1 − a_β^{(2)} be the spectral gap, where a_β^{(2)} is the second eigenvalue of the transition matrix, so that 1 = a_β^{(1)} > a_β^{(2)} ≥ .... Then there exist two constants 0 < c_1 < c_2 < ∞, independent of β, such that the bounds (2.25) hold for every β > 0.
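The exponential scaling of the spectral gap can be checked by hand on the smallest nontrivial example. For a two-state stochastic matrix [[1−a, a], [b, 1−b]] the eigenvalues are 1 and 1 − a − b, so the gap is exactly a + b; with rare transitions a = e^{−βV} and b = e^{−β(V+∆H)} (the form forced by reversibility (2.6)) one sees −(1/β) log ρ_β → V, which is the maximal stability level Γ_m of this landscape. The snippet is our illustration, not part of the proof:

```python
import math

def spectral_gap_two_state(beta, V, dH):
    """Exact spectral gap of a two-state reversible chain with a metastable
    state x at energy dH > 0 and a stable state s at energy 0:
    Delta(x, s) = V and, by reversibility, Delta(s, x) = V + dH."""
    a = math.exp(-beta * V)          # P_beta(x, s)
    b = math.exp(-beta * (V + dH))   # P_beta(s, x)
    return a + b                     # gap of [[1-a, a], [b, 1-b]]

# the exponential rate of the gap approaches Gamma_m = V as beta grows
rates = [-math.log(spectral_gap_two_state(beta, 2.0, 1.0)) / beta
         for beta in (5.0, 10.0, 20.0)]
```

Here the rate equals V − log(1 + e^{−β ∆H})/β, so it increases monotonically to V, in agreement with the two-sided bounds of Theorem 2.3.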

Results for some concrete models
In this section we show that several well-known models in statistical mechanics satisfy the assumption (2.23) of Theorem 2.2. In particular we are able to get precise asymptotics for the mixing time of these models. Throughout this section we denote by Λ a finite subset of Z 2 , by X the configuration space and by s a stable state.
Metropolis algorithm. The Hamiltonian function for this model coincides with the virtual energy. The transition probabilities are given by the Metropolis rule

P_β(x, y) := q(x, y) e^{−β max{H(y)−H(x), 0}}   for x ≠ y,

where q is a symmetric and irreducible connectivity matrix. In this case the assumption (2.23) is shown to hold in [21, Prop. 3.24]. Note that Kawasaki dynamics is a type of Metropolis dynamics, so it falls into this case.
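As a reference point for the comparison with the PCA below, here is a minimal sketch (ours; the helper names are invented) of a single Metropolis move with acceptance probability e^{−β[H(y)−H(x)]_+}:

```python
import math
import random

def metropolis_step(x, H, neighbors, beta, rng=random):
    """One Metropolis move: propose a uniformly chosen neighbor y of x
    and accept it with probability exp(-beta * max(H(y) - H(x), 0))."""
    y = rng.choice(neighbors(x))
    dH = H(y) - H(x)
    if dH <= 0 or rng.random() < math.exp(-beta * dH):
        return y       # downhill moves are always accepted
    return x           # uphill move rejected
```

Downhill moves are always accepted, while for large β uphill moves are exponentially suppressed, which is exactly the rare-transition regime (2.1) with ∆(x, y) = [H(y) − H(x)]_+.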
Reversible PCA model for Spin Systems. For this model, the Hamiltonian function is given by (2.28) and the virtual energy is obtained from (2.5), where K(i − j) ≠ 0 only for j ∈ U_i, a neighborhood of i. Different choices of K(·) and U_i yield different PCA. It can be shown that, if U_i is symmetric, then the Markov chain is reversible. The transition probabilities are given by p(σ, η) := ∏_{i∈Λ} p_{i,σ}(η(i)), where, for i ∈ Λ and σ ∈ X, p_{i,σ}(·) is the probability measure on {−1, +1} defined as

p_{i,σ}(a) := 1 / (1 + exp(−2βa(S_σ(i) + h))),

with a ∈ {−1, +1} and S_σ(i) := Σ_{j∈U_i} K(i − j)σ(j). To verify assumption (2.23) we bound the relevant probabilities using the inequality (1 + x)^α ≤ 1 + αx with α ∈ (0, 1). In this model the unique stable state is s = +1, so the limit in (2.23) vanishes, where in the last equality we used that h ≥ 0 and that |U_i| is the same for all i ∈ Λ.
Irreversible PCA model. The Hamiltonian function and the transition probabilities of the irreversible PCA model are defined analogously. Note that the subset X \ X^s is not empty, since G is not constant. We then compute the limit in (2.23); the last term goes to zero since N is finite. Since in this model s = +1, the conclusion follows.

Series of metastable states
The structure of the energy landscape that we analyze for our reversible PCA model in Section 3.1 is such that the system has three metastable states: one metastable state that is non-degenerate in energy and two degenerate metastable states. Moreover, the system started at the metastable state with higher energy must necessarily visit the second one before relaxing to the stable state. In this section we generalize the results in [25, Sections 2.5, 2.6] to this degenerate context. In particular, we prove the addition rule for the exit times from the metastable states.
Condition 2.4. We assume that the energy landscape (X, Q, H, ∆) is such that there exist four or more states x_0, x_1^1, x_1^2, ..., x_1^n and x_2 with the properties stated below. Recalling the definition of the set of ground states X^s, we immediately have x_0 ∈ X^s. Moreover, from the definition (2.13) of the maximal stability level it follows (see [20, Theorem 2.3]) that the communication cost from x_2 to x_0 is equal to the communication cost from x_1^r to x_0 for every r = 1, ..., n, that is,

Φ(x_2, x_0) − H(x_2) = Φ(x_1^r, x_0) − H(x_1^r).   (2.41)

Note that, since x_2 is a metastable state, its stability level cannot be lower than Γ_m; the two bounds finally imply that Φ(x_2, x_0) − H(x_2) = Γ_m. Note also that the communication cost from x_0 to x_2 and that from x_1^r to x_2 are larger than Γ_m, i.e.,

Φ(x_0, x_2) − H(x_0) > Γ_m   and   Φ(x_1^r, x_2) − H(x_1^r) > Γ_m.   (2.42)

Indeed, recalling the reversibility property (2.6) and using (2.41) and Condition 2.4 in the last two steps, one proves the second of the two inequalities in (2.42); the first can be proved similarly. When the system is started at x_2, with high probability it will visit x_1^r before x_0, for every r = 1, ..., n. For this reason we shall assume the following condition.

Condition 2.5. Condition 2.4 is satisfied and

lim_{β→∞} P_{x_2}(τ_{x_1^r} < τ_{x_0}) = 1   for every r = 1, ..., n.

We remark that Condition 2.5 is in fact a condition on the equilibrium potential h_{x_0, x_1^r} evaluated at x_2, for every r = 1, ..., n.
One of the important goals of this paper is to prove an addition rule for the mean hitting time of +1 starting at −1, using Theorem 2.12 for the expectation of the transition time τ_{x_0} for the chain started at x_2. Such an expectation, hence, will be of order e^{βΓ_m}, with the prefactor given in (2.52).
We can thus formulate the further assumptions that we shall need in the sequel.
Condition 2.6. Condition 2.4 is satisfied and there exist two positive constants k_1, k_2 < ∞ such that, where o(1) denotes a function tending to zero in the limit β → ∞.

Condition 2.7. Condition 2.4 is satisfied and there exist n positive constants c_1, c_2, ..., c_n < ∞ such that, where o(1) denotes a function tending to zero in the limit β → ∞.
Theorem 2.8. Assume Condition 2.4 is satisfied. Then for every r = 1, ..., n we have

We remark that Theorem 2.12 gives an addition formula for the mean hitting time of x_0 starting at x_2. Neglecting terms of order o(1), such a mean time can be written as the sum of the mean hitting time of the subset {x_1^1, ..., x_1^n, x_0} starting at x_2 and of the mean hitting time of x_0 starting from any state in {x_1^1, ..., x_1^n}. It is very interesting to note that in this decomposition no role is played by the mean hitting time of {x_1^1, ..., x_1^n} starting at x_2.

The model
We consider the reversible PCA model for Spin Systems introduced by Derrida in [30], see also [19]. In the second example of Section 2.3 we considered a general PCA; from now on we restrict ourselves to a specific nearest-neighbor interaction, see Figure 3.1. Consider the two-dimensional torus Λ := Λ_L^2 = {0, ..., L − 1}^2, with L even, endowed with the Euclidean metric.
Figure 3.1: In black are highlighted the sites j such that K(i − j) ≠ 0 in the reversible PCA model for spin systems.
To each site i ∈ Λ we associate a variable σ(i) ∈ {−1, +1}, and we interpret σ(i) = +1 (respectively σ(i) = −1) as indicating that the spin at site i points upwards (respectively downwards), so that Λ_L^2 carries an interacting particle system characterized by its spins. Let X := {−1, +1}^Λ be the configuration space and let β := 1/T > 0, where T is thought of as the temperature. Let h ∈ (0, 1) be a parameter representing the external ferromagnetic field. We do not consider the case h > 1, because in that case there is no metastable behavior. The dynamics of the system is modelled as a Markov chain (σ_n)_{n∈N} on X with the transition matrix defined in (2.30), (2.31). In the rest of the paper we will choose

K(i − j) := 1 if |i − j| = 1, and K(i − j) := 0 otherwise,   (3.1)

so that S_σ(i) = Σ_{j : |i−j|=1} σ(j) is the sum of the four nearest-neighbor spins of i. Note that the transition probability p_{i,σ}(·) for the spin σ(i) given in (2.32) depends only on the values of the adjacent spins. The system evolves in discrete time steps: at each step all the spins are updated simultaneously according to the probability distribution (2.32). Intuitively, the value of each spin tends to align with the local effective field S_σ(i) + h, where S_σ(i) represents a ferromagnetic interaction among the spins.
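The parallel update rule just described can be sketched in a few lines. The implementation below is our illustration (the helper names are invented); it uses the standard Derrida form of the local rule, p_{i,σ}(a) = 1/(1 + e^{−2βa(S_σ(i)+h)}), with S_σ(i) the sum of the four nearest-neighbour spins on the torus:

```python
import math
import random

def pca_step(sigma, beta, h, rng):
    """One synchronous step of the nearest-neighbour PCA on an L x L torus.
    Every spin is resampled independently: the new value at site (i, j) is
    +1 with probability p = 1 / (1 + exp(-2*beta*(S + h)))."""
    L = len(sigma)
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            S = (sigma[(i - 1) % L][j] + sigma[(i + 1) % L][j]
                 + sigma[i][(j - 1) % L] + sigma[i][(j + 1) % L])
            x = 2.0 * beta * (S + h)
            # numerically stable logistic, avoids overflow for large beta
            if x >= 0:
                p_plus = 1.0 / (1.0 + math.exp(-x))
            else:
                p_plus = math.exp(x) / (1.0 + math.exp(x))
            new[i][j] = 1 if rng.random() < p_plus else -1
    return new
```

At large β the update is essentially deterministic: the homogeneous configuration +1 is fixed, while a chessboard is mapped to the opposite chessboard in one step, which is the dynamical reason why the energy cost of the transition between the two chessboards vanishes.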
The Markov chain (σ_n)_n satisfies the detailed balance property (2.3), where G(·) in (2.28) is the Hamiltonian function. Equivalently, the Markov chain is reversible with respect to the Gibbs measure (2.4), which implies that the measure µ is stationary. Finally, given σ, η ∈ X, we define the energy cost of the transition from σ to η for our specific PCA as

∆(σ, η) := lim_{β→∞} −(1/β) log p(σ, η) = Σ_{i∈Λ : η(i)(S_σ(i)+h) < 0} 2|S_σ(i) + h|.   (3.2)

Note that ∆(σ, η) ≥ 0 and that, perhaps surprisingly, ∆(σ, η) is not necessarily equal to ∆(η, σ). We also note that condition (3.2) is sometimes written more explicitly as in (2.2). The last equality in (3.2) is obtained from the explicit form of the local probabilities p_{i,σ} (for more details, see Appendix A). Let us fix the notation of some important states as follows:
• +1 is the configuration such that +1(i) = +1 for every i ∈ Λ;
• −1 is the configuration such that −1(i) = −1 for every i ∈ Λ;
• c_e and c_o are the configurations such that c_e(i) = (−1)^{i_1+i_2} and c_o(i) = (−1)^{i_1+i_2+1} for every i = (i_1, i_2) ∈ Λ. These configurations are called chessboard configurations.
Next we define the virtual energy as the limit of the Hamiltonian G for β → ∞. We distinguish two cases. • Case h > 0. In this case +1 is the unique ground state. From now on we assume h > 0, fixed and small. Under periodic boundary conditions, the energies of the relevant configurations are, respectively,
• H(+1) = −L^2(4 + 2h),
• H(−1) = −L^2(4 − 2h),
• H(c_e) = H(c_o) = −4L^2.
Since H(c_e) = H(c_o) and ∆(c_e, c_o) = ∆(c_o, c_e) = 0, from now on we will denote either element of the set {c_e, c_o} by c; this is an example of a stable pair (see Definition 5.1). Therefore, H(−1) > H(c) > H(+1) for 0 < h < 1. Our first goal is to show that {−1, c} is the set of metastable states and that +1 is the global minimum (or ground state).
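These energies can be cross-checked numerically. The helper below is ours and assumes the standard explicit form of the virtual energy for this nearest-neighbour PCA, H(σ) = −h Σ_i σ(i) − Σ_i |S_σ(i) + h|, which is our reading of the literature and not a formula stated in this section:

```python
def virtual_energy(sigma, h):
    """Assumed explicit form of the virtual energy for the nearest-neighbour
    PCA on an L x L torus: H(sigma) = -h*sum_i sigma(i) - sum_i |S_sigma(i)+h|."""
    L = len(sigma)
    total = 0.0
    for i in range(L):
        for j in range(L):
            S = (sigma[(i - 1) % L][j] + sigma[(i + 1) % L][j]
                 + sigma[i][(j - 1) % L] + sigma[i][(j + 1) % L])
            total += -h * sigma[i][j] - abs(S + h)
    return total
```

Under this assumption one finds H(+1) = −L^2(4 + 2h), H(−1) = −L^2(4 − 2h) and H(c_e) = H(c_o) = −4L^2, so that the ordering H(−1) > H(c) > H(+1) holds for every 0 < h < 1.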

Main model-dependent results
In the setup introduced in [40], the minimal description of the metastability phenomenon is given in terms of X^s, X^m and Γ_m, so we concentrate our attention on these. In particular, we determine the metastable and stable states and we show that the maximal stability level Γ_m is equal to the energy barrier Γ_PCA, defined as [19, (3.29)]

Γ_PCA := −2hλ^2 + 2λ(4 + h) − 2h,

where λ is the critical length computed in [19, (3.24)] and defined as λ := [2/h] + 1, where [·] denotes the integer part. Assuming that the system is prepared in the state σ_0 = −1, with probability tending to one as β → ∞ the system visits the chessboard c before relaxing to the stable state +1. Moreover, by [19, Theorem 3.11, Theorem 3.13], along the tube of paths from −1 to c the system visits a certain set of configurations called critical droplets from −1 to c. These are all those configurations that have a single chessboard droplet of a specific size in a sea of minuses. Along the tube of paths from c to +1, instead, the system visits another set of configurations, also called critical droplets (from c to +1), which are all those configurations that have a single plus droplet of a specific size in a chessboard. The droplet size, in both cases, is the so-called critical length λ. We then say that a rectangle is supercritical (resp. subcritical) if its side is greater (resp. smaller) than λ. Formally, the chessboard droplet is a supercritical rectangle with a one-by-one protuberance attached to one of the two longest sides and with a plus spin in the protuberance. Note that starting from different initial configurations yields different kinds of droplets.
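For concreteness, the critical length and the energy barrier can be evaluated numerically. The expression for Γ_PCA below is the one recalled in Section 5, while the formula λ = [2/h] + 1 is our reading of [19, (3.24)] and should be checked against that reference:

```python
import math

def critical_length(h):
    """Critical droplet side lambda = floor(2 / h) + 1 (assumed form of
    [19, (3.24)]); h is the external magnetic field, 0 < h < 1."""
    return math.floor(2.0 / h) + 1

def gamma_pca(h):
    """Energy barrier Gamma_PCA = -2*h*lam^2 + 2*lam*(4 + h) - 2*h,
    cf. [19, (3.29)] as recalled in Section 5."""
    lam = critical_length(h)
    return -2.0 * h * lam ** 2 + 2.0 * lam * (4.0 + h) - 2.0 * h
```

For h = 0.3 this gives λ = 7 and Γ_PCA = 30.2; as h decreases the critical droplet grows and the barrier, and hence the transition time scale e^{βΓ_PCA}, increases.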
We are finally ready to present our model-dependent results. In Lemma 3.1 we show that all states different from +1, −1, c have a strictly lower stability level than Γ PCA . Using this lemma and [19, Lemma 3.4, Lemma 4.1], we show that Γ PCA = Γ m , allowing us to conclude in Theorem 3.2 that the only metastable states are indeed −1 and c.
are SES.
Equation (3.7) in the next theorem already appeared in [24, Theorem 3.1], however the proof there was incomplete. Thanks to the previous theorems we are able to prove it rigorously here. The second part of the next theorem is an application of Theorem 2.2 to the reversible PCA model by Derrida.
The first term (1/k_1) e^{βΓ_PCA} represents the contribution of the mean hitting time of c for the chain started at −1.

Proof of model-independent results
Before we prove Theorem 2.2, let us recall some important definitions. Let (X_n)_n be a Markov chain. A nonempty set C ⊂ X is a cycle if it is either a singleton or if, for any x ∈ C, the probability for the process starting from x to leave C without first visiting all the other elements of C is exponentially small. We denote by C(X) the set of cycles of X. Proposition 3.10 in [21] establishes the equivalence between cycle and energy-cycle and allows us to use interchangeably the approach in [38,16,15] and the pathwise approaches [19,45,21,40,48,49,50] that use the energy-cycle. Next we define the collection of maximal cycles.
To prove the reverse inequality Γ_m ≥ Γ(X \ {s}), we consider R_D(x), the union of {x} and of the points in X which can be reached by paths starting from x with height smaller than the height necessary to escape from D ⊂ X starting from x [21, (3.58)]. We partition X into the set of local minima X_0 (i.e., X_V with V = 0) and its complement, as in (4.7), and we analyze the two terms on the right-hand side separately.
and we refer to the previous case, since x ∈ X 0 \ {s}.
where F = {(x, x) | x ∈ X }, Γ(X × X \ F ) = max C∈M(X ×X \F ) Γ(C), and M(X × X \ F ) = {C ∈ C(X × X ) | C maximal cycle by inclusion under the constraint C ⊆ X × X \ F }. Through the equivalence of the two definitions of cycles, given by [21, Prop. 3.10], the critical depth H 2 is equal to Γ(X \ {s}). This quantity is well defined because its value is independent of the choice of s [14, Theorem 5.1]. Now we consider two independent Markov chains, X t and Y t , on the same energy landscape and with the same inverse temperature β. We define the two-dimensional Markov chain {(X t , Y t )} on X × X with transition probabilities P ⊗2 β given by the product of the single-chain transition probabilities. So, using [14, Theorem 5.1] and the assumption (2.23), the proof is concluded.
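The product-chain construction can be checked on a toy example. The sketch below uses a hypothetical three-state lazy Metropolis chain (the laziness only serves to make all eigenvalues nonnegative): the transition matrix of the pair (X t , Y t ) is the Kronecker product P ⊗ P, and its spectral gap coincides with that of P, since the eigenvalues of P ⊗ P are the pairwise products of those of P.

```python
import numpy as np

def lazy_metropolis(H, beta):
    """Lazy nearest-neighbour Metropolis chain on a 1-d energy profile H."""
    n = len(H)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                P[i, j] = 0.25 * min(1.0, np.exp(-beta * (H[j] - H[i])))
        P[i, i] = 1.0 - P[i].sum()          # diagonal >= 1/2, so eigenvalues >= 0
    return P

P = lazy_metropolis([0.0, 1.0, -1.0], beta=1.0)
P2 = np.kron(P, P)                          # transition matrix of the pair (X_t, Y_t)

eig = np.sort(np.linalg.eigvals(P).real)[::-1]
eig2 = np.sort(np.linalg.eigvals(P2).real)[::-1]
gap, gap2 = 1.0 - eig[1], 1.0 - eig2[1]     # the two spectral gaps agree
```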
Before proving the bounds (2.25), we recall Definition 2.18 and define the generator of a Markov process. Here, γ denotes a function of β that vanishes as β → ∞.
On the other hand, where the last inequality is obtained by (4.20) and by our assumption H(y 0 ) = 0. We conclude that where C is a constant and γ = γ * 1 + 2γ 2 .

Lemma 4.8. There exists a constant C > 0 such that, for all β ≥ 0, where γ is a function of β that vanishes as β → ∞.

Proof of model-dependent results
In Section 5.1 we prove the main model-dependent results except for Lemma 3.1, which we postpone to Section 5.2.

Proof of Theorem 3.2 (Identification of metastable states). In [19] the authors computed the value of Γ to be Γ PCA = −2hλ^2 + 2λ(4 + h) − 2h. There, it was also proven that

Proof of
By a similar reasoning, with a = Γ m and X Γm = X s , we get
For any σ ∈ X there exists a unique configuration η ∈ X such that the transition σ → η happens with high probability as β → ∞, that is, p(σ, η) → 1 as β → ∞. So let η and σ be two configurations in X such that η = T σ, where

Definition 5.2. Let σ, η ∈ X be two different configurations. We say that σ and η form a stable pair if and only if η = T σ and T η = σ. Moreover, we say that σ ∈ X is a trap if either σ is a stable configuration or the pair (σ, T σ) is a stable pair. We denote by T ⊂ X the collection of all traps.
We define two further maps that will be useful later on. For any given j ∈ Λ, T F j (σ) coincides with T (σ) except at the site j, where the spin value σ(j) is kept. Formally, For any given j ∈ Λ, T C j (σ) coincides with T (σ) except at the site j, where the spin value is −σ(j). Formally, The two maps are similar to T (σ), the only difference being that T F j (σ) fixes the value of the spin at j and T C j (σ) changes the value of the spin at j. We say that x, y ∈ Λ are nearest neighbors if and only if the lattice distance d between x and y is one, i.e., d(x, y) = 1. We denote by R l,m ⊆ Λ the rectangle with sides l and m, 2 ≤ l ≤ m, and we call two rectangles R l,m and R l',m' non-interacting if any of the following conditions holds: • d(R l,m , R l',m' ) ≥ 3, if σ R l,m = c e R l,m and σ R l',m' = c e R l',m' ; • d(R l,m , R l',m' ) ≥ 3, if σ R l,m = +1 R l,m and σ R l',m' = +1 R l',m' ; • d(R l,m , R l',m' ) = 1, if σ R l,m = c o R l,m and σ R l',m' = c e R l',m' ; • d(R l,m , R l',m' ) = 1, if σ R l,m = c R l,m , σ R l',m' = +1 R l',m' and the sides on the interface are of the same length.
Whenever two rectangles are not non-interacting, we call them interacting.
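The map T and its variants T F j , T C j can be sketched concretely. The parallel update rule used below is an assumption, not taken from the text: in the β → ∞ limit each spin aligns with the sign of its local field, the sum of its four nearest-neighbour spins on the torus plus the magnetic field h. With this rule, −1 and +1 are fixed points of T (stable configurations), while the two chessboards c e and c o are exchanged by T and thus form a stable pair, so all three are traps in the sense of Definition 5.2.

```python
import numpy as np

def T(sigma, h=0.3):
    """Parallel zero-temperature update: each spin takes the sign of its local
    field (four nearest neighbours on the torus, plus h). Assumed form of the
    map T with p(sigma, T sigma) -> 1."""
    field = (np.roll(sigma, 1, 0) + np.roll(sigma, -1, 0)
             + np.roll(sigma, 1, 1) + np.roll(sigma, -1, 1) + h)
    return np.where(field > 0, 1, -1)

def T_F(sigma, j, h=0.3):
    """T_F^j: apply T everywhere except at site j, whose spin is kept fixed."""
    eta = T(sigma, h)
    eta[j] = sigma[j]
    return eta

def T_C(sigma, j, h=0.3):
    """T_C^j: apply T everywhere except at site j, whose spin is flipped."""
    eta = T(sigma, h)
    eta[j] = -sigma[j]
    return eta

L = 6                                        # even side, so the chessboard fits the torus
minus = -np.ones((L, L), dtype=int)
plus = np.ones((L, L), dtype=int)
x, y = np.meshgrid(range(L), range(L), indexing="ij")
chess = np.where((x + y) % 2 == 0, 1, -1)    # even chessboard c_e
```

One checks that T(minus) = minus and T(plus) = plus, while T(chess) = −chess, so iterating T on a chessboard alternates between the two parities.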
Proof of Lemma 3.1. We begin by giving a rough sketch of the proof. Without loss of generality, we consider only configurations in U := X 0 \ {−1, c, +1}, since the configurations in X \ X 0 have stability level zero. Indeed, if σ ∈ X \ X 0 , we construct the path ω = (σ, T (σ)), so that T (σ) ∈ I σ and V σ = 0, where I σ was defined in (2.12). We will partition X 0 \ {−1, c, +1} into several subsets A, B, D, E and for each of these we will construct a path ω ∈ Θ(σ, I σ ∩ X 0 ). Denote by σ Λ' the restriction of a configuration σ to Λ' ⊆ Λ. We will find an explicit upper bound V * σ on the transition energy along ω, taken as the maximum over k = 1, . . . , |ω| − 1 of the energy costs of the single steps, and show that V * σ < Γ PCA . This means that all configurations in X 0 \ {−1, c, +1} have a lower stability level than Γ PCA . We now proceed with the detailed proof. We partition the set X 0 \ {−1, c, +1} into four subsets as follows. For each set A, B, D, E, we first describe it in words and then give its formal definition. We define the set A to be the set of configurations consisting of a single rectangle containing either c or +1, and surrounded by either c or −1, see Figure 5. More precisely, • A 1 is the collection of configurations such that ∃! R l,m ⊂ Λ with l < λ, σ R l,m = c R l,m and σ Λ\R l,m = −1 Λ\R l,m ; • A 2 is the collection of configurations such that ∃! R l,m ⊂ Λ with l ≥ λ, σ R l,m = c R l,m and σ Λ\R l,m = −1 Λ\R l,m ; • A 3 is the collection of configurations such that ∃! R l,m ⊂ Λ with l < λ, σ R l,m = +1 R l,m and σ Λ\R l,m = c Λ\R l,m ; • A 4 is the collection of configurations such that ∃! R l,m ⊂ Λ with l ≥ λ, σ R l,m = +1 R l,m and σ Λ\R l,m = c Λ\R l,m ; • A 5 is the collection of configurations such that ∃! R l,m ⊂ Λ with l < λ, σ R l,m = +1 R l,m and σ Λ\R l,m = −1 Λ\R l,m ; • A 6 is the collection of configurations such that ∃! R l,m ⊂ Λ with l ≥ λ, σ R l,m = +1 R l,m and σ Λ\R l,m = −1 Λ\R l,m . Configurations in the set B consist of a single chessboard rectangle which may contain an island of +1, surrounded by −1, see Figure 6.
More precisely, • B 3 is the collection of configurations such that ∃! R l,m with l < λ, σ R l,m = +1 R l,m and ∃! R l',m' ⊃ R l,m with l' ≥ λ such that σ R l',m' \R l,m = c R l',m' \R l,m and σ Λ\R l',m' = −1 Λ\R l',m' . The set D contains all configurations with more than one rectangle, see Figure 7. More precisely, • D 1 is the collection of configurations such that there exist subcritical non-interacting rectangles R := (R l,m ) l,m such that σ Λ\R = −1 Λ\R and any rectangle of chessboard may contain one or more non-interacting rectangles of pluses; • D 2 is the collection of configurations such that there exist non-interacting rectangles R := (R l,m ) l,m where at least one of them is supercritical and such that σ Λ\R = −1 Λ\R . Moreover, any rectangle of chessboard may contain one or more non-interacting rectangles of pluses; • D 3 is the collection of configurations consisting of interacting rectangles R := (R l,m ) l,m with l < λ and such that any rectangle of chessboard may contain one or more non-interacting rectangles of pluses; • D 4 is the collection of configurations consisting of non-interacting rectangles R := (R l,m ) l,m with l < λ such that σ R l,m = +1 R l,m and σ Λ\R = c Λ\R ; • D 5 is the collection of configurations consisting of rectangles R := (R l,m ) l,m where at least one has l ≥ λ and such that σ R l,m = +1 R l,m and σ Λ\R = c Λ\R . The set E contains all possible strips, that is, rectangles winding around the torus, see Figure 8.
• E 1 is the collection of configurations containing strips of c of width one surrounded by −1, and possibly rectangles of +1 and c; • E 2 is the collection of configurations containing strips of +1 of width one surrounded by c, and possibly rectangles of +1; • E 3 is the collection of configurations containing strips of +1 of width one surrounded by −1, and possibly rectangles of +1 and c; • E 4 is the collection of configurations containing pairs of adjacent strips of c and −1. For at least one of these pairs, both strips have width greater than one. Furthermore, there may be rectangles of c and +1 surrounded by −1, and rectangles of +1 surrounded by c; • E 5 is the collection of configurations containing pairs of adjacent strips of c and +1. For at least one of these pairs, both strips have width greater than one. Furthermore, there may be rectangles of +1 surrounded by c; • E 6 is the collection of configurations containing pairs of adjacent strips of +1 and −1. For at least one of these pairs, both strips have width greater than one. Furthermore, there may be rectangles of c and +1 surrounded by −1; • E 7 is the collection of configurations containing strips of c, −1 and +1, at least one of which has width greater than one, and possibly rectangles of c and +1 in −1, and possibly rectangles of +1 in c. We begin by considering the set A. Consider first the set A 1 .
Case A 1 . For any configuration σ ∈ A 1 we construct a path that begins in σ and ends in a configuration in A 1 ∪ {−1} with lower energy than σ, i.e., ω ∈ Θ(σ, I σ ∩ (A 1 ∪ {−1})). We now fix σ ≡ ω 1 ∈ A 1 and we begin by defining ω 2 . If there is a minus corner in σ R l,m , say in j 1 , then σ(j 1 ) is kept fixed and all other spins in the rectangle switch sign, i.e., ω 2 := T F j1 (ω 1 ). On the other hand, if there is no minus corner in σ R l,m , then we call the next configuration in the path ω' 1 and we define it as ω' 1 := T (ω 1 ), i.e., all the spins in the rectangle switch sign. After this step, ω' 1 has a minus corner, so we can proceed as above and define ω 2 := T F j1 (ω' 1 ). Note that in ω 2 there are two minus corners in the rectangle that are nearest neighbors of j 1 . For the next step, keep fixed the minus corner that is contained in a side of length l, say in j 2 , and define ω 3 := T F j2 (ω 2 ). By iterating this procedure l − 2 times, a full slice of the droplet is erased and we obtain the configuration η ≡ ω l such that η R l,m−1 = c and η Λ\R l,m−1 = −1. In order to determine where the maximum of the transition energy is attained, we rewrite the energy difference for k = 1, . . . , l − 1 as in (5.13), with the convention that a sum over an empty set is equal to zero. From the reversibility property of the dynamics, (5.14) follows, and since ∆(ω k+1 , ω k ) = 0 for k = 1, . . . , l − 2, we obtain (5.15) for the path ω. Since V * σ depends only on the length l, we find V * A1 = max σ∈A1 V * σ by taking the maximum over l. Since l < λ, we have (5.16). Finally, let us check that ω l ∈ I σ ∩ (A 1 ∪ {−1}). Using (5.14), (5.16) and [19, Tab. 1], we get an explicit expression for H(ω l ) − H(ω 1 ). The rectangle R l,m is subcritical if and only if l < 2/h, and so H(ω l ) < H(ω 1 ), which concludes the proof for A 1 .
Case A 2 . For any configuration σ ∈ A 2 we construct a path that begins in σ and ends in a configuration in A 2 ∪ {c} with lower energy than σ, i.e., ω ∈ Θ(σ, I σ ∩ (A 2 ∪ {c})). We now fix σ ≡ ω 1 ∈ A 2 and we begin by defining ω 2 . We call j ∈ R l,m a site in one of the sides of length l such that σ(j) = +1. Furthermore, we call j 1 ∈ Λ \ R l,m the nearest neighbor of j such that (necessarily) σ(j 1 ) = −1, and we define ω 2 := T C j1 (ω 1 ), i.e., σ(j 1 ) switches sign and the signs of all other sites in σ Λ\R l,m remain fixed. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until a new slice is filled with chessboard. We obtain the configuration η such that η R l,m+1 = c and η Λ\R l,m+1 = −1. Note that at the first step of the dynamics either one or two nearest neighbors of j 1 in the external side of the rectangle switch sign when T is applied. Analogously, at each subsequent application of T , either one or two further sites in the external side of the rectangle switch sign. Therefore, the maximum number of iterations of the map T is l − 1. In order to determine where the maximum of the transition energy is attained, we rewrite the energy difference as in (5.13). Using (5.14) and since ∆(ω k , ω k+1 ) = 0 for k = 2, . . . , l − 1, we obtain (5.20) for the path ω. Finally, let us check that ω l ∈ I σ ∩ (A 2 ∪ {c}). Using (5.14), (5.21) and [19, Tab. 1], we get an explicit expression for H(ω l ) − H(ω 1 ). The rectangle R l,m is supercritical if and only if l > 2/h, and so H(ω l ) < H(ω 1 ), which concludes the proof for A 2 .
Case A 3 . For any configuration σ ∈ A 3 we construct a path that begins in σ and ends in a configuration in A 3 ∪ {c} with lower energy than σ, i.e., ω ∈ Θ(σ, I σ ∩ (A 3 ∪ {c})). We now fix σ ≡ ω 1 ∈ A 3 and we begin by defining ω 2 . If in σ R l,m there is a plus corner surrounded by two minuses, say in j 1 , then σ(j 1 ) switches sign and the signs of all other spins in the rectangle remain fixed, i.e., ω 2 := T C j1 (ω 1 ). On the other hand, if in σ R l,m there are no plus corners surrounded by minuses, then we call the next configuration in the path ω' 1 and we define it as ω' 1 := T (ω 1 ), i.e., all the spins in σ Λ\R l,m switch sign. After this step, ω' 1 has a plus corner surrounded by two minuses, so we can proceed as above and define ω 2 := T C j1 (ω' 1 ). Note that in ω 2 there are two plus corners in the rectangle that are nearest neighbors of j 1 . For the next step, the plus corner, say in j 2 , that is contained in a side of length l switches sign, i.e., ω 3 := T C j2 (ω 2 ). By iterating this step l − 2 times, a full slice of the droplet is erased and we obtain the configuration η ≡ ω l such that η R l,m−1 = +1 and η Λ\R l,m−1 = c. In order to determine where the maximum of the transition energy is attained, we rewrite the energy difference as in (5.13). Using (5.14), we obtain the same result as in (5.15). Hence, (5.24) holds. Since V * σ depends only on the length l, we find V * A3 = max σ∈A3 V * σ by taking the maximum over l. Since l < λ, we have V * A3 < 2(2 − h). (5.25) Finally, let us check that ω l ∈ I σ ∩ (A 3 ∪ {c}). Using (5.14), (5.24) and [19, Tab. 1], we get an explicit expression for H(ω l ) − H(ω 1 ). The rectangle R l,m is subcritical if and only if l < 2/h, and so H(ω l ) < H(ω 1 ), which concludes the proof for A 3 .
Case A 4 . For any configuration σ ∈ A 4 we construct a path that begins in σ and ends in a configuration in A 4 ∪ {+1} with lower energy than σ, i.e., ω ∈ Θ(σ, I σ ∩ (A 4 ∪ {+1})). We now fix σ ≡ ω 1 ∈ A 4 and we begin by defining ω 2 . Pick any site j ∈ R l,m in one of the sides of length l such that its nearest neighbor j 1 ∈ Λ \ R l,m satisfies σ(j 1 ) = +1. We define ω 2 := T F j1 (ω 1 ), i.e., σ(j 1 ) is kept fixed and all the spins in σ Λ\R l,m switch sign. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until a new slice is filled with +1. We obtain the configuration η such that η R l,m+1 = +1 and η Λ\R l,m+1 = c. Note that at the first step of the dynamics either one or two nearest neighbors of j 1 in the external side of the rectangle switch sign when T is applied. Analogously, at each subsequent application of T , either one or two further sites in the external side of the rectangle switch sign. Therefore, the maximum number of iterations of the map T is l − 1. In order to determine where the maximum of the transition energy is attained, we rewrite the energy difference as in (5.13). Using (5.14), we obtain the same result as in (5.20). Hence, (5.28) holds. Since V * σ is the same for all configurations in A 4 , V * A4 = max σ∈A4 V * σ = 2(2 − h). Finally, let us check that ω l ∈ I σ ∩ (A 4 ∪ {+1}). Using (5.14), (5.28) and [19, Tab. 1], we get H(ω 1 ) + 2(2 − h) = H(ω l ) + 2h(l − 1). (5.29) The rectangle R l,m is supercritical if and only if l > 2/h, and so H(ω l ) < H(ω 1 ), which concludes the proof for A 4 .
Case A 5 . For any configuration σ ∈ A 5 we construct a path that begins in σ and ends in a configuration in D 1 with lower energy than σ, i.e., ω ∈ Θ(σ, I σ ∩ D 1 ). We now fix σ ≡ ω 1 ∈ A 5 and we begin by defining ω 2 . We call j 1 a corner in R l,m such that (necessarily) σ(j 1 ) = +1, and we define ω 2 := T C j1 (ω 1 ), i.e., σ(j 1 ) switches sign and the signs of all other spins in the rectangle remain fixed. Note that in ω 2 there are two plus corners in the rectangle that are nearest neighbors of j 1 . For the next step, the plus corner, say in j 2 , that is contained in a side of length l switches sign, i.e., ω 3 := T C j2 (ω 2 ). After this, the spin of the nearest neighbor of j 2 along the same side of R l,m and different from j 1 , say in j 3 , switches sign, i.e., ω 4 := T C j3 (ω 3 ). By iterating this step l − 3 times, a full slice of the droplet is erased and we obtain the configuration ω l ≡ η such that η R l,m−1 = +1, η R l,1 = c and η Λ\R l,m = −1. The configuration η is a configuration in D 1 . In order to determine where the maximum of the transition energy is attained, we rewrite the energy difference as in (5.13). Using (5.14), we obtain the same result as in (5.15). Hence, (5.31) holds. Since V * σ depends only on the length l, we find V * A5 = max σ∈A5 V * σ by taking the maximum over l. Since l < λ, we have V * A5 < 2(2 − h). Finally, let us check that ω l ∈ I σ ∩ D 1 . Using (5.14), (5.31) and [19, Tab. 1], we get an explicit expression for H(ω l ) − H(ω 1 ). The rectangle R l,m is subcritical if and only if l < 2/h, and so H(ω l ) < H(ω 1 ), which concludes the proof for A 5 .
Case A 6 . For any configuration σ ∈ A 6 we construct a path that begins in σ and ends in a configuration in D 2 with lower energy than σ, i.e., ω ∈ Θ(σ, I σ ∩ D 2 ). We now fix σ ≡ ω 1 ∈ A 6 and we begin by defining ω 2 . We call j ∈ R l,m a site in a side of R l,m , and note that (necessarily) σ(j) = +1. Without loss of generality, we choose a side of length l. Furthermore, we call j 1 ∈ Λ \ R l,m the nearest neighbor of j contained in the external side with length l, so that (necessarily) σ(j 1 ) = −1. We define ω 2 := T C j1 (ω 1 ), i.e., σ(j 1 ) switches sign and the signs of all other spins in σ Λ\R l,m remain fixed. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until a new slice is filled with c, so we obtain the configuration η such that η R l,m = +1, η R l,1 = c and η Λ\R l,m+1 = −1. Note that at the first step of the dynamics either one or two nearest neighbors of j 1 in the external side of the rectangle switch sign when T is applied. Analogously, at each subsequent application of T , either one or two further sites in the external side of the rectangle switch sign. Therefore, the maximum number of iterations of the map T is l − 1. The configuration η is a configuration in D 2 . In order to determine where the maximum of the transition energy is attained, we rewrite the energy difference as in (5.13). Using (5.14), we obtain the same result as in (5.20). Hence, (5.35) holds. Since V * σ is the same for all configurations in A 6 , V * A6 = max σ∈A6 V * σ = 2(2 − h). Finally, let us check that ω l ∈ I σ ∩ D 2 . Using (5.14), (5.35) and [19, Tab. 1], we get an explicit expression for H(ω l ) − H(ω 1 ). The rectangle R l,m is supercritical if and only if l > 2/h, and so H(ω l ) < H(ω 1 ), which concludes the proof for A 6 . This concludes the analysis of the set A. Next we consider the set B.
Case B 1 . For every configuration in B 1 , both rectangles are subcritical. Following a path that changes a slice of +1 into a slice of c, analogously as was done for A 3 , we get a configuration in I σ .

Case B 2 . For every configuration in B 2 , both rectangles are supercritical. Following a path that adds a slice of c, analogously as was done for A 2 , we get a configuration in I σ .

Case B 3 . For every configuration in B 3 , the external rectangle is supercritical and the internal rectangle is subcritical. Following a path that adds a slice of c, analogously as was done for A 2 , we get a configuration in I σ . Next we consider the set D.
Case D 1 . For every configuration σ in D 1 , all rectangles are subcritical and non-interacting. If σ contains at least one rectangle of +1 surrounded by c, we take our path to be the path that cuts a slice of +1, analogously as was done for A 3 . We get a configuration in I σ ∩ D 1 . Otherwise, if σ contains at least one rectangle of +1 surrounded by −1, we take our path to be the path that changes a slice of +1 into a slice of c, analogously as was done for A 5 . We get a configuration in I σ ∩ D 3 . Finally, we consider all remaining configurations, namely chessboard rectangles in a sea of minuses. We take our path to be the path that cuts a slice of c, analogous to the one described in A 1 . We get a configuration in I σ ∩ (D 1 ∪ A 1 ).

Case D 2 . For every configuration σ in D 2 , there exists at least one supercritical rectangle. If this is a chessboard rectangle, then we take the path that makes the rectangle grow a slice of c, analogously as was done for A 2 . We get a configuration in I σ ∩ (A 3 ∪ A 4 ∪ D 2 ∪ D 4 ∪ D 5 ∪ E 4 ∪ {c}).
Otherwise, if this supercritical rectangle contains +1, we take the path that makes the rectangle grow a slice of c, analogously as was done for A 6 . We get a configuration in I σ .

Case D 3 . For every configuration σ in D 3 , all rectangles are subcritical.
If σ contains at least one rectangle of +1 surrounded by c, we take our path to be the path that cuts a slice of +1, analogously as was done for A 3 . We get a configuration in I σ ∩ D 3 . Otherwise, if σ contains at least one rectangle of +1 at lattice distance one from a rectangle of c, we take the path that changes a slice of +1 into a slice of c along the interface between the two rectangles, analogously as was done for A 3 . We get a configuration in I σ . In the remaining cases, σ contains at least two rectangles of different chessboard parity at lattice distance one. We take our path to be a path that cuts a slice of c, analogously as was done for A 1 . We get a configuration in I σ .

Case D 4 . For every configuration σ in D 4 , all rectangles of +1 surrounded by c are subcritical and non-interacting. We take our path to be a path that cuts a slice of +1, analogously as was done for A 3 . We get a configuration in I σ ∩ (D 4 ∪ A 3 ).

Case D 5 . For every configuration σ in D 5 , there exists at least one supercritical rectangle of +1 surrounded by c. We consider this rectangle and we take the path that makes the rectangle grow a slice of +1, analogously as was done for A 4 . We get a configuration in I σ . This concludes the analysis of the set D. Next we consider the set E.

Case E 1 . A configuration σ ≡ ω 1 in E 1 has at least one strip of c of width one. Pick a site j in the strip such that σ(j) = −1 and define ω 2 = T F j (ω 1 ), i.e., σ(j) is kept fixed. The energy difference is H(ω 2 ) − H(ω 1 ) = 2h [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ .

Case E 2 . A configuration σ ≡ ω 1 in E 2 contains at least one strip of +1 of width one. Let σ(j) be a plus in the strip surrounded by one or two minuses. We define ω 2 = T C j (ω 1 ), i.e., σ(j) switches sign. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ .

Case E 3 .
A configuration σ ≡ ω 1 in E 3 has at least one strip of +1 of width one. If in σ there is a strip of +1 surrounded by two chessboards with the same parity, then pick a plus σ(j) in the strip and define ω 2 = T C j (ω 1 ), i.e., σ(j) switches sign. The energy difference is H(ω 2 ) − H(ω 1 ) = 2h [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ ∩ (E 1 ∪ E 7 ). Instead, if in σ there is a strip of +1 surrounded by two chessboards with different parity, then pick a plus σ(j) in a chessboard at lattice distance one from the strip and define ω 2 = T F j (ω 1 ), i.e., σ(j) is kept fixed. The energy difference is H(ω 2 ) − H(ω 1 ) = 2(2 − h) [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ ∩ E 5 .

Case E 4 . We consider a configuration σ ≡ ω 1 in E 4 and pick a plus on the interface between c and −1, and call j the site of this plus. We call j 1 the nearest neighbor of j in −1 and we define ω 2 = T C j1 (ω 1 ), i.e., σ(j 1 ) switches sign. The energy difference is H(ω 2 ) − H(ω 1 ) = 2(2 − h) [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ .

Case E 5 . We consider a configuration σ ≡ ω 1 in E 5 and pick a plus in c on the interface between c and +1, and call j the site of this plus. We define ω 2 = T F j (ω 1 ), i.e., σ(j) is kept fixed. The energy difference is H(ω 2 ) − H(ω 1 ) = 2(2 − h) [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ .

Case E 6 . We consider a configuration σ ≡ ω 1 in E 6 and pick a minus on the interface between −1 and +1, and call j the site of this minus. We define ω 2 = T C j (ω 1 ), i.e., σ(j) switches sign. The energy difference is H(ω 2 ) − H(ω 1 ) = 2(2 − h) [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ ∩ E 7 .
Case E 7 . If the configuration σ ≡ ω 1 in E 7 contains a strip of −1 adjacent to a strip of +1, both of width greater than one, then we pick a minus on the interface between −1 and +1 and we take a path analogous to the one constructed for E 6 . We get a configuration in I σ . Otherwise, σ contains a strip of c adjacent to a strip of −1, both of width greater than one. Then, we pick a plus, say in j, in the strip of c. We call j 1 the nearest neighbor of j in −1 and we define ω 2 = T C j1 (ω 1 ), i.e., σ(j 1 ) switches sign. The energy difference is H(ω 2 ) − H(ω 1 ) = 2(2 − h) [19, Tab. 1]. We define ω 3 := T (ω 2 ), ω 4 := T (ω 3 ) = T 2 (ω 2 ) and so on until we obtain a configuration in I σ ∩ (E 7 ∪ E 5 ). To conclude the proof, we compare the value V * = max{V * A , V * B , V * D , V * E } = 2(2 − h) with Γ PCA , and we get V * < Γ PCA , which completes the proof.

A Appendix
In this Appendix, we recall some results and give explicit computations that are used in the paper. Equation (3.2) is obtained as follows, where the finite non-empty sets X and Q ⊂ X × X and the maps H : X → R, ∆ : Q → R + are called respectively state space, connectivity relation, energy, and energy cost, and for any σ, η ∈ X there exist an integer n ≥ 2 and ω 1 , ..., ω n ∈ X such that ω 1 = σ, ω n = η, and (ω i , ω i+1 ) ∈ Q for any i = 1, ..., n − 1. An energy landscape (X , Q, H, ∆) is called reversible if and only if Q is symmetric, that is, if (σ, η) ∈ Q then (η, σ) ∈ Q, and H(σ) + ∆(σ, η) = ∆(η, σ) + H(η) for all (σ, η) ∈ Q. Let X s be the set of stable states and assume X \ X s ≠ ∅. If there exist A ⊂ X \ X s and a ∈ R + such that then Γ m = a and X m = A. i) For any σ ∈ S +1 \ {+1}, the pair (σ, T σ) is not a stable pair.

B Appendix
In this section we prove the theorems given in Section 2.4. Before giving the proof of Theorem 2.9, we state two useful lemmas. In the first of the two lemmas we collect two bounds on the energy cost to go from any state x ≠ x r 1 to x r 1 or to x 0 , for r = 1, ..., n. The second lemma is similar. x ∈ X m , which is in contradiction with Condition 2.4. Next we turn to the proof of the second inequality and we distinguish two cases. If H(x) < H(x r 1 ), then we have that x ∈ I x r 1 . By (2.3) and by (2.13), we get Φ(x r 1 , x) ≥ Φ(x r 1 , I x r 1 ) = Γ m + H(x r 1 ), which proves the inequality. If H(x) = H(x r 1 ), then let us define the set We will show that x ∈ C. Since H(x) = H(x r 1 ), the identity I x = I x r 1 follows. Furthermore, since x r 1 ∈ X m , we have C ∩ I x r 1 = ∅; otherwise, by (2.3), x would be a metastable state, in contradiction with Condition 2.4. Hence, since x ∈ C, we have that This proves the inequality for every r = 1, ..., n. Here the last equality follows from the definition of the Gibbs measure and H(x r 1 ) = H(x q 1 ) for every r, q = 1, ..., n. In order to give an upper bound, we first use the boundary conditions in (2.20), where we have used that the configuration space is finite. Equation (2.47) finally follows recalling nµ β (x r 1 ) = µ β ({x 1 1 , ..., x n 1 }) and by (B.9) and (B.12). Next we prove Equation (2.48). Recalling (2.21) above, we rewrite the expected value in terms of the capacity as for every r = 1, ..., n. (B.13) Considering the contribution of every x r 1 in the sum and observing that h x r 1 ,x0 (x r 1 ) = 1 and h x r 1 ,x0 (x q 1 ) = 1 + o(1) for every q = 1, ..., n, we get the following lower bound: (B.14) where the last equality follows from the definition of the Gibbs measure and H(x r 1 ) = H(x q 1 ) for every q = 1, ..., n.
In order to give an upper bound, we first use the boundary conditions in (2.20) to rewrite (B.13) as follows: Next we bound µ β (x) as µ β (x) ≤ µ β (x r 1 ) exp(−βδ), where δ := min{H(x) − H(x r 1 ) : x ∈ X , H(x) > H(x r 1 )} > 0, for any x ∈ X such that H(x) > H(x r 1 ). Recalling that h x r 1 ,x0 (x r 1 ) = 1 and h x r 1 ,x0 (x q 1 ) = 1 + o(1) for every q = 1, ..., n with q ≠ r, we get which implies where we have used that the configuration space is finite and H(x r 1 ) = H(x q 1 ) for every q = 1, ..., n.

Proof of Theorem 2.10 and Theorem 2.11. The two theorems follow immediately by exploiting Condition 2.6 and applying Theorem 2.9.
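The capacity identity used in (B.13) can be verified numerically on a small reversible chain. The sketch below uses a hypothetical four-state Metropolis chain: for a singleton starting set, E x [τ x0 ] equals the sum Σ z µ β (z) h x,x0 (z) divided by the capacity cap(x, x 0 ), where h x,x0 is the equilibrium potential; we compare this with the mean hitting time obtained directly from the linear system (I − Q)u = 1.

```python
import numpy as np

def metropolis_chain(H, beta):
    """Nearest-neighbour Metropolis chain, reversible w.r.t. mu ~ exp(-beta*H)."""
    n = len(H)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                P[i, j] = 0.5 * min(1.0, np.exp(-beta * (H[j] - H[i])))
        P[i, i] = 1.0 - P[i].sum()
    return P

def hitting_time_direct(P, x, y):
    """E_x[tau_y] from the linear system (I - Q) u = 1 on the states != y."""
    keep = [i for i in range(len(P)) if i != y]
    Q = P[np.ix_(keep, keep)]
    u = np.linalg.solve(np.eye(len(keep)) - Q, np.ones(len(keep)))
    return u[keep.index(x)]

def hitting_time_via_capacity(P, mu, x, y):
    """E_x[tau_y] = sum_z mu(z) h(z) / cap(x, y), with h the equilibrium
    potential (h(x) = 1, h(y) = 0, harmonic elsewhere) and cap its Dirichlet form."""
    n = len(P)
    A = np.eye(n) - P
    A[x] = 0.0; A[x, x] = 1.0                # boundary condition h(x) = 1
    A[y] = 0.0; A[y, y] = 1.0                # boundary condition h(y) = 0
    b = np.zeros(n); b[x] = 1.0
    h = np.linalg.solve(A, b)
    cap = 0.5 * sum(mu[z] * P[z, w] * (h[z] - h[w]) ** 2
                    for z in range(n) for w in range(n))
    return float(mu @ h) / cap

H, beta = np.array([0.0, 1.0, -1.0, 0.5]), 2.0
P = metropolis_chain(H, beta)
mu = np.exp(-beta * H); mu /= mu.sum()
t_direct = hitting_time_direct(P, 0, 2)
t_cap = hitting_time_via_capacity(P, mu, 0, 2)
```

The two computations agree to machine precision; the identity is exact here because the starting set {x} is a singleton, which is precisely the situation of (B.13).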
Proof. Given Y, Z ⊂ X such that Y ∩ Z = ∅ and x ∈ X \ (Y ∪ Z), a renewal argument and the strong Markov property yield Therefore Recalling (2.21), we can rewrite the ratio as a ratio of capacities: Hence, we get Equation (B.18).
Proposition B.5. [8, Lemma 3.1.1] Consider the Markov chain defined in Section 2.1. For every pair of nonempty disjoint sets Y, Z ⊂ X there exist constants 0 < C 1 < C 2 < ∞ such that for all β large enough.