Mixing time of random walk on dynamical random cluster

We study the mixing time of a random walker who moves inside a dynamical random cluster model on the d-dimensional torus of side-length n. In this model, edges switch at rate \mu between open and closed, following a Glauber dynamics for the random cluster model with parameters p,q. At the same time, the walker jumps at rate 1 as a simple random walk on the torus, but is only allowed to traverse open edges. We show that for small enough p the mixing time of the random walker is of order n^2/\mu. In our proof we construct a non-Markovian coupling through a multi-scale analysis of the environment, which we believe could be more widely applicable.


Introduction
We study the mixing time of a random walk on a dynamical random cluster model in T^d_n, the d-dimensional torus of side-length n. In this model, each edge of T^d_n can be in either of two states: open or closed. At time 0, we take the state of the edges to be distributed according to the random cluster measure with parameters p ∈ (0, 1) and q > 0. That is, for any subset of edges ω ⊂ E(T^d_n), with E(T^d_n) denoting the set of edges of the torus, the probability that the set of open edges at time 0 is equal to ω is

υ(ω) = (1/Z) p^{|ω|} (1 − p)^{|E(T^d_n)| − |ω|} q^{κ(ω)},     (1.1)

where κ(ω) is the number of connected components of the graph with vertex set T^d_n and edge set ω, and Z = Z(d, p, q) > 0 is just a normalizing constant so that the above is a probability measure. Instead of representing the state of the edges by the set ω of open edges, we will often represent it by an element η ∈ {0, 1}^{E(T^d_n)}, with η(e) = 0 meaning that the edge e ∈ E(T^d_n) is closed and η(e) = 1 meaning that e is open. Thus, given η, we have ω = {e ∈ E(T^d_n) : η(e) = 1}.

The edges then evolve according to a Glauber dynamics for the random cluster measure: each edge has an independent Poisson clock of rate µ and, when the clock of an edge e rings, the state of e is refreshed, becoming open with probability

p, if e is not a cut-edge, and p/(p + (1 − p)q), if e is a cut-edge,     (1.2)

where an edge e is called a cut-edge if modifying the state of e (while keeping the state of the other edges unaltered) causes a change in the number of connected components in the configuration. Note that whether an edge e is a cut-edge for a configuration η is, in fact, independent of η(e). We let η_t ∈ {0, 1}^{E(T^d_n)} denote the configuration that gives the state of the edges at time t.
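The cut-edge inspection behind the two update probabilities (p for a non-cut-edge, p/(p + (1 − p)q) for a cut-edge) is just a connectivity search among the other open edges. As an illustration only, not the authors' code, here is a minimal sketch in dimension d = 1, on the cycle Z_n, where edge e joins vertices e and e + 1 (mod n); `is_cut_edge` and `heat_bath_open_prob` are hypothetical helper names.

```python
from collections import deque

def is_cut_edge(e, omega, n):
    """On the cycle Z_n (edge e joins vertices e and e+1 mod n), e is a
    cut-edge iff its endpoints are NOT joined by a path of other open
    edges; note the answer does not depend on the state of e itself."""
    u, v = e, (e + 1) % n
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return False  # endpoints connected off e: not a cut-edge
        # the two edges incident to x are x (to x+1) and x-1 (to x-1)
        for f, y in ((x, (x + 1) % n), ((x - 1) % n, (x - 1) % n)):
            if f != e and f in omega and y not in seen:
                seen.add(y)
                queue.append(y)
    return True

def heat_bath_open_prob(e, omega, n, p, q):
    """Probability that e becomes open when its clock rings: p if e is
    not a cut-edge, and p / (p + (1 - p) q) if it is."""
    return p / (p + (1 - p) * q) if is_cut_edge(e, omega, n) else p
```

For instance, on Z_4 with open edges {0, 1, 2}, edge 3 is not a cut-edge (its endpoints are joined through the rest of the cycle), so it reopens with probability p.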
On top of this dynamic environment we place a random walker which starts from the origin of T d n and has a Poisson clock of rate 1. When the clock of the walker rings, the walker chooses an edge uniformly at random from the set of edges that are adjacent to its current location, regardless of their states. If the chosen edge is open at that time, then the walker traverses the edge, otherwise the walker stays put. We denote by X t ∈ T d n the position of the walker at time t, and let {M t } t≥0 = {X t , η t } t≥0 , denote the full system composed of the walker {X t } t≥0 and the environment {η t } t≥0 . We note that {M t } t≥0 and {η t } t≥0 are Markov chains, while {X t } t≥0 is not.
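The walker's jump rule can be sketched in the same toy setting (d = 1, the cycle Z_n); this is an illustrative snippet under our reading of the rule, not part of the authors' arguments:

```python
import random

def walker_step(x, eta, n, rng):
    """One ring of the walker's rate-1 clock: choose one of the edges
    adjacent to x uniformly at random, REGARDLESS of its state, and
    traverse it only if it is open (eta[e] == 1); otherwise stay put.
    Edge e joins vertices e and e+1 (mod n)."""
    e = rng.choice([x, (x - 1) % n])  # the two edges incident to x
    if eta[e] == 0:
        return x                      # chosen edge is closed: stay put
    return (x + 1) % n if e == x else (x - 1) % n
```

Choosing among all adjacent edges, open or not, is what makes the uniform measure on the vertices stationary for the walker.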
One can check (for example, by reversibility) that if π denotes the uniform probability measure on T^d_n, then π × υ is the unique stationary distribution of {M_t}_t, where υ is the random cluster measure above. Let T_mix denote the mixing time of the full system, starting from the worst-case initial state. In other words, given x ∈ T^d_n and ξ ∈ {0, 1}^{E(T^d_n)}, let T^{x,ξ}_mix be the smallest t such that, starting from M_0 = (x, ξ), the total variation distance between the distribution of M_t and π × υ is smaller than a given constant, which for concreteness we take to be 1/4. Then T_mix = max_{x,ξ} T^{x,ξ}_mix. Our main result establishes that the mixing time is of order n^2/µ for all small enough p. We remark that p and q are considered to be constants independent of n, while µ may depend on n; in particular, a natural case in the context of dynamic networks is that µ → 0 as n → ∞.

Theorem 1.1. Given any q > 0 and any dimension d ≥ 1, there exists p_0 > 0 so that for all p ∈ (0, p_0) there exists C_1 = C_1(d, p, q) > 0 for which T_mix ≤ C_1 n^2/µ, for all µ = µ(n) > 0 and all n ≥ 1.
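For concreteness, the total variation distance appearing in the definition of T_mix can be computed as follows (an elementary illustration with made-up numbers, not tied to the model's actual distributions):

```python
def tv_distance(mu, nu):
    """Total variation distance between two probability distributions on
    a finite set, given as dicts mapping state -> probability."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(s, 0.0) - nu.get(s, 0.0)) for s in states)

# A walker started deterministically at the origin versus the uniform
# measure on a 4-point torus: the distance is 3/4, above the 1/4
# threshold, so such a chain is not yet mixed.
pi = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
delta0 = {0: 1.0}
assert tv_distance(delta0, pi) == 0.75
```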
The proof of Theorem 1.1 goes via the construction of a non-Markovian coupling using a multi-scale analysis of the environment. We believe this idea can be applied more widely to analyze the mixing time of random walks on particle systems, and we regard it as one of our main contributions. Another main contribution of our work is to initiate the analysis of the mixing time of a random walk in a dynamic environment where edge updates are not independent of one another; see the related works in Section 1.2. We will employ a multi-scale analysis precisely to control the evolution of the environment. We give a thorough description of the main ideas of the proof in Section 1.4; first, in Section 1.3, we need to introduce an auxiliary process.

Lower bounds on the mixing time
We also derive matching lower bounds on the mixing time. We start by stating a straightforward generalization of the lower bound from [11].
We consider a larger class of models, which we refer to as continuous-time random walks on general dynamical percolation, where the word general indicates that the percolation process need not be independent across edges. Let {X_t, η_t}_{t≥0} be a continuous-time Markov chain where the walker X_t jumps at rate 1 and can only traverse open edges of T^d_n, and the environment {η_t}_t is a Markov chain on {0, 1}^{E(T^d_n)} whose edges refresh their states at rate µ, independently of the walker. As usual, µ may depend on n. Let π be the uniform distribution on T^d_n and let ν be the stationary distribution of the Markov chain {η_t}_t.
We recall some fundamental definitions. The spectral gap γ of a reversible Markov chain is defined as

γ = inf { E(f, f) / Var(f) : f from the state space to R with Var(f) ≠ 0 },

where the variance Var(f) is with respect to the stationary distribution of the chain and E(f, f) is the so-called Dirichlet form. The relaxation time of the said Markov chain is defined as t_rel = 1/γ.

Given a time interval I ⊂ R_+, we say that an edge is I-open if it is open at some time during I. Then, for any vertex x ∈ T^d_n and any time interval I ⊂ R_+, we let C_x(I) denote the connected component of I-open edges containing x. Finally, given a subset S ⊂ T^d_n, let diam(S) = max_{x,y∈S} ‖x − y‖_1 be the diameter of S, where ‖x − y‖_1 is the L^1 distance (or, equivalently, the length of a shortest path) between x and y in T^d_n. We require the following two assumptions on the process {X_t, η_t}_{t≥0}:

π × ν is the stationary distribution of {X_t, η_t}_{t≥0},     (1.3)

and

∃ δ > 0 and C_2 > 0 such that for any x ∈ T^d_n we have E_ν[D^2_{x,δ}] ≤ C_2,     (1.4)

where D_{x,δ} = diam(C_x([0, δ])) and E_ν denotes the expectation with respect to the stationary measure of the environment. The assumption in (1.3) just says that the stationary measure of the walker is uniform, while (1.4) gives that the environment is strictly subcritical.
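The quantity D_{x,δ} = diam(C_x([0, δ])) in (1.4) can be made concrete: an edge is I-open if one of its open time intervals meets I, and one then measures the graph diameter of the component of x among I-open edges. A toy d = 1 sketch, for illustration only (`open_times`, mapping each edge to its list of open intervals, is a hypothetical representation of the environment's trajectory):

```python
def i_open(intervals, I):
    """An edge is I-open if one of its open intervals meets I = (a, b)."""
    a, b = I
    return any(s < b and t > a for (s, t) in intervals)

def cluster_diameter(x, open_times, n, I):
    """diam(C_x(I)) on the cycle Z_n: explore the component of x through
    I-open edges, then take the graph (= L1) diameter of that vertex set.
    Edge e joins vertices e and e+1 (mod n)."""
    comp, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for e, w in ((v, (v + 1) % n), ((v - 1) % n, (v - 1) % n)):
            if i_open(open_times.get(e, []), I) and w not in comp:
                comp.add(w)
                stack.append(w)
    dist = lambda u, v: min((u - v) % n, (v - u) % n)
    return max(dist(u, v) for u in comp for v in comp)
```

For example, on Z_6 with edge 0 open during (0, 1) and edge 1 open during (2, 3), the component of vertex 0 over I = (0, 0.5) has diameter 1, while over I = (0, 2.5) both edges count as I-open and the diameter grows to 2.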
Theorem 1.2. Let {X_t, η_t}_{t≥0} be a random walk on general dynamical percolation satisfying (1.3) and (1.4) above. Then, there exists a constant C_3 > 0 depending only on d such that the relaxation time of {X_t, η_t}_{t≥0} is at least C_3 δ n^2/C_2.

A natural setting is when the environment starts from its stationary distribution. For this, let υ_t stand for the distribution of (X_t, η_t) where the walker starts from the origin and the environment starts from stationarity (that is, η_0 is distributed as ν). Then, ‖υ_t − π × ν‖_TV is the total variation distance between υ_t and the stationary measure of {X_t, η_t}_t.
Theorem 1.3. There exists a constant C_5 > 0 depending only on d such that, for any t ≥ δ, the distance ‖υ_t − π × ν‖_TV is bounded from below in terms of C_5, n, t and E_{υ_t}[D^2_{X_t,δ}], where E_{υ_t} stands for the expectation with respect to υ_t.
The proofs of Theorems 1.2 and 1.3 are identical to the ones for random walk on dynamical percolation from [11]. For the sake of completeness, we add the proofs in Section 8.
We want to apply the above theorems to derive lower bounds on the mixing time of a random walk in the dynamical random cluster. It is clear that (1.3) holds in this case. We will show in Section 9 that (1.4) also holds, obtaining the corollary below. For any q, let p_c(q) be the critical probability for the appearance of an infinite cluster in the random cluster model on Z^d.

Corollary 1.4. If {X_t, η_t}_{t≥0} is a random walker in the dynamical random cluster model, then for any q ≥ 1 and any p < p_c(q), there exists a constant c = c(d, q, p) > 0 such that the relaxation time of the full system and the mixing time starting from a stationary environment are both at least cn^2/µ. If q < 1, then the same conclusion holds for all small enough p.
The proof of the lower bound is much simpler than that of the upper bound, allowing us to derive it in the whole subcritical regime when q ≥ 1. In fact, when q ≥ 1, the proof follows by using a sprinkling lemma to compare two random cluster configurations with densities p < p′ (Lemma 9.1), together with the exponential decay of cluster sizes in the subcritical regime. When q < 1, exponential decay of cluster sizes is only known for small enough p, preventing us from establishing (1.4) in the whole subcritical regime.
We expect the upper bound of order n^2/µ to hold in the whole subcritical regime as well; however, our proof technique requires the percolation process to be a small perturbation of subcritical independent percolation, in a sense that we explain further in Remark 1.5, after introducing the ⋆-process.

Related works
We will restrict our discussion to works dealing with the mixing time of random walks on dynamic environments, as otherwise there is simply a plethora of works. We also remark that, if the environment is allowed to evolve in an arbitrary fashion (for example, by taking any sequence of graphs on a fixed vertex set), then several problems may arise. For example, there may not be a stationary distribution for the walker. Moreover, even if there is a stationary distribution, the distribution of the walker may not converge to stationarity, or the total variation distance to stationarity may not be monotone in time.
Random walk on dynamical percolation on T^d_n. This model is equivalent to the model described above restricted to q = 1. This special case is already quite challenging, but some results have been obtained recently. First note that, when q = 1, the two probabilities in (1.2) become equal, and when an edge updates, it does so independently of the other edges, becoming open with probability p or closed with probability 1 − p. Though in this case edges evolve independently of one another, there are strong dependences between the location of the walker and the state of the edges (especially if µ → 0 as n → ∞, since edges then update very slowly in comparison with the jump rate of the walker).
The random walk on dynamical percolation model was introduced by Peres, Stauffer and Steif [11], where it is shown that, in the whole subcritical regime, the mixing time is of order n^2/µ. We remark that in [11] both upper and lower bounds of order n^2/µ were derived for T_mix. Recall that n^2 is the order of the mixing time of a simple random walk on the static torus (that is, where all edges are open at all times). So, in subcritical dynamical percolation, the walker is delayed by a factor of 1/µ, which is the expected time a single edge takes to refresh.
Later, Peres, Sousi and Steif [10] analyzed the supercritical regime and showed that, for p large enough, the mixing time is at most (log n)^a (n^2 + 1/µ) for some constant a > 0. Their upper bound is not believed to be tight: one expects that, in the whole supercritical regime, the mixing time is of order n^2 + 1/µ. This remains an interesting open problem. Their proof makes strong use of isoperimetric properties of the infinite cluster of supercritical percolation, which are only known for q = 1. With regard to the critical regime, the only known result is that the mixing time is at most of order n^2/µ, which is the mixing time in the subcritical regime [7]. It is not inconceivable that the mixing time in the critical case is in fact of smaller order than n^2/µ.

Random walk on dynamical percolation on other graphs. Sousi and Thomas [14] studied the case where the torus is replaced by the complete graph. This is a simpler case due to the lack of an underlying geometry, but one for which a more detailed analysis can be carried out. They established the order of the mixing time in that case, as well as the occurrence of a cut-off phenomenon. We remark that if the walker is at some vertex v and we know that an edge incident to v is updating to open, but we refrain from observing which of the edges incident to v is updating, then the other endpoint of this edge is uniform among all vertices (other than v). So, by traversing this edge (call it e), after one additional step the walker can find itself in a location that is essentially uniformly distributed, hence very close to stationarity. Though suggestive, this is not enough to establish the mixing time, as one still needs to control that the walker "forgets" that e is now open (that is, the walker may be close to stationarity, but the full system is not). Still, this illustrates the kind of simplification that the lack of an underlying geometry brings.
The last work we mention for the random walk on dynamical percolation model is a recent result by Hermon and Sousi [7]. They developed a comparison principle and showed that, for any graph G, the so-called spectral profile mixing time for the random walk on dynamical percolation on G is at most 1/µ times the spectral profile mixing time of simple random walk on (the static graph) G.
In all the above results, it was crucial that when q = 1 edges update independently of one another. The main objective of our work is to develop a technique that can go beyond the dynamical percolation case and which can deal with environments whose edge updates may depend on one another, including the case of unbounded dependences such as in the dynamical random cluster.
Other models. We end this section by mentioning two lines of work. In the first one, Avena et al. [1,2] studied a different dynamic on the environment, where instead of dynamical percolation one has a dynamic configuration model. This model has some intuitive similarities with the dynamical percolation on the complete graph, in the sense that it also lacks an underlying geometry. They studied the mixing time and the occurrence of a cut-off phenomenon in this setting, but restricted to a random walker that is non-backtracking. This helps the walker to move away from its current location, strongly reducing dependences between the walker and the environment.
Finally, the second line of work we mention is that of [12,13]. They considered the case of a discrete-time random walk on a graph with a fixed set of vertices, but which evolves over time by means of an arbitrary sequence of graphs on that vertex set. The goal of their work is quite different from ours; for example, they want to understand which conditions on the sequence of graphs one can impose to guarantee that the mixing time is polynomial. They also derive results for the hitting time and cover time. We refer to [12,13] and references therein for a list of known results on dynamic graphs that go beyond the mixing time. We also refer the reader to [4] for results on a model similar to random walks on dynamical percolation on the complete graph.

The ⋆-process: retaining some randomness
Before giving the ideas of our proof, we need to describe a different representation of the full system, which is inspired by [11]. Recall that each edge has a Poisson clock of rate µ associated to it, which gives the times at which the edge is updated. At each update of an edge, we can decide whether the edge becomes open or closed by sampling an independent random variable U with uniform distribution on (0, 1), and then making the edge open if and only if either the edge is not a cut-edge and U < p, or the edge is a cut-edge and U < p/(p + (1 − p)q). Now, let

p_min = min{p, p/(p + (1 − p)q)}  and  p_max = max{p, p/(p + (1 − p)q)}.

Note that if U turns out to be in the interval (0, p_min) ∪ (p_max, 1), the outcome of the update (i.e., whether the edge becomes open or closed) is determined regardless of whether or not the edge is a cut-edge. In other words, the update is oblivious to the current configuration, and we will refer to such updates as ⋆-updates. We then let p_⋆ = p_min + 1 − p_max ∈ (0, 1] be the probability that a given update is a ⋆-update.
We now define the update of an edge e in two stages. First, we sample an independent random variable U_⋆, uniform on (0, 1), so that if U_⋆ < p_⋆ then the update is a ⋆-update, and otherwise it is not. Next, we use a second random variable U to determine whether e updates to open or closed. More precisely, in the case of a ⋆-update, we make e open if U < p_min/p_⋆, and otherwise e becomes closed. In the case of a non-⋆-update, we need to inspect the current configuration to see whether e is a cut-edge or not. In particular, we need to perform what we call an exploration of e, which means that we perform a local search from the endpoints of e that traverses only open edges, in order to determine the open clusters of the endpoints of e. Hence, each update of e will be represented by a tuple (s, U_⋆, U), where s > 0 is the time at which the update occurs, U_⋆ ∈ (0, 1) is the variable used to decide whether the update is a ⋆-update, and U ∈ (0, 1) is the random variable governing whether the edge is updated to open or closed.
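The two-stage update can be summarized in a short sketch (an illustration under our reading of the construction; `sample_update` is a hypothetical name). As the computation makes visible, the outcome of a non-⋆-update is in fact determined by the cut status, since the original uniform variable is then conditioned to lie in (p_min, p_max):

```python
def sample_update(U_star, U, p, q, is_cut):
    """Two-stage edge update: U_star decides whether this is a *-update
    (probability p_star); U then decides the new state.  p_min and p_max
    are the two heat-bath opening probabilities in increasing order."""
    pc = p / (p + (1 - p) * q)          # opening prob. for a cut-edge
    p_min, p_max = min(p, pc), max(p, pc)
    p_star = p_min + 1 - p_max          # p_star = 1 when q = 1
    if U_star < p_star:
        # *-update: oblivious to the configuration
        return 'open' if U < p_min / p_star else 'closed'
    # non-*-update (possible only when q != 1): the original uniform is
    # conditioned to lie in (p_min, p_max), so compare U with the
    # conditional probability of landing below the relevant threshold
    threshold = pc if is_cut else p
    return 'open' if U < (threshold - p_min) / (p_max - p_min) else 'closed'
```

For instance, with p = 0.2 and q = 2, a non-⋆-update opens the edge exactly when it is not a cut-edge, while a ⋆-update opens it with probability p_min/p_⋆ regardless of the configuration.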
We use this to introduce another Markov process, which we denote by {M^⋆_t}_{t≥0} = {X^⋆_t, η^⋆_t}_{t≥0} and refer to as the ⋆-process. This process will retain more randomness than {M_t}_{t≥0}, and its state space will be the set of pairs (v, ζ) with v ∈ T^d_n and ζ ∈ {0, ⋆, 1}^{E(T^d_n)} such that ζ(e) ∈ {0, 1} for each edge e adjacent to v.
So an edge will be allowed to be in an additional state, called ⋆, which means that in its last update the edge underwent a ⋆-update. However, we do not allow edges adjacent to the walker to be in state ⋆.
The ⋆-process evolves as follows. If the Poisson clock of an edge e rings, we look at the variable U_⋆ associated with this update and determine whether the update is a ⋆-update. If the update is a ⋆-update and e is not currently adjacent to the walker, then we set the state of e to ⋆. If e is adjacent to the walker, then we look at the variable U associated with this update and determine whether e becomes open or closed. Finally, if the update is not a ⋆-update, then we perform an exploration of e as mentioned above. The difference is that, in such an exploration, we may run into edges that are in state ⋆. For each such edge, we immediately sample whether that edge is open or closed by using the random variable U associated with its last update. We proceed in this way until the exploration ends and we have fully determined the cluster of each endpoint of e. At this moment, we know whether or not e is a cut-edge, and we can use the random variable U associated with the update of e to determine whether e is to be made open or closed. There is still one final case to be described: when it is the clock of the walker that rings. Suppose this happens and the walker jumps from a vertex v to a vertex w. Then, if there are edges adjacent to w in state ⋆, we sample the states of such edges (using the random variables U associated with their last updates) and switch them to open or closed, appropriately.
Note that, conditioned on the position of the walker and on the state 0, 1 or ⋆ of each edge, we gain no information concerning whether the edges in state ⋆ are open or closed. In particular, each such edge is open with probability p_min/p_⋆ (which is the probability that the random variable U associated with its last update is smaller than p_min/p_⋆). Therefore, we do not need to keep track of the variables U related to the last ⋆-update of each edge, since we can sample U whenever needed, independently of the whole trajectory of the process. The ⋆-process is thus a Markov process.
Remark 1.5. When q = 1, we have p_min = p_max, and so p_⋆ = 1: all updates are ⋆-updates, as in this case the random cluster model reduces to dynamical percolation. If q ≠ 1, then as p → 0 we have that p_max − p_min → 0 and so p_⋆ → 1. Therefore, for any fixed q and all small enough p, the dynamical random cluster model can be viewed as a small perturbation of dynamical percolation. We also obtain that edges in state ⋆ are open with probability p_min/p_⋆ < p_c, so they form a subcritical percolation process as well. These are the properties that play an essential role in the constructions of the multi-scale analysis and the coupling used to establish the upper bound on the mixing time (Theorem 1.1).
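The quantities in Remark 1.5 are easy to tabulate numerically; the sketch below (illustrative only) checks that p_⋆ = 1 when q = 1, that p_⋆ increases toward 1 as p → 0 for fixed q ≠ 1, and that in a q > 1 example the opening probability p_min/p_⋆ of ⋆-edges even stays below p itself:

```python
def star_parameters(p, q):
    """p_min, p_max are the two heat-bath opening probabilities; p_star
    is the probability that an update is a *-update."""
    pc = p / (p + (1 - p) * q)
    p_min, p_max = min(p, pc), max(p, pc)
    return p_min, p_max, p_min + 1 - p_max

# q = 1: the dynamics is dynamical percolation and p_star = 1.
assert abs(star_parameters(0.3, 1.0)[2] - 1.0) < 1e-12

# Fixed q = 2: p_star increases to 1 as p -> 0, and *-edges are open
# with probability p_min / p_star, here below p (so subcritical
# whenever p is).
last = 0.0
for p in (0.1, 0.01, 0.001):
    p_min, p_max, p_star = star_parameters(p, 2.0)
    assert p_star > last and p_min / p_star < p
    last = p_star
```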

Proof overview
We will only give an overview of the upper bound, which is our main result and by far the most involved proof. We start by recalling the proof in [11] for the subcritical regime when q = 1. There they also define the ⋆-process (which they denote by M̃_t). Recall that, when q = 1, we have p_⋆ = 1, so all updates are ⋆-updates. With this, they define a stopping time τ_0 as

the first time at which all edges adjacent to the walker are closed, and all remaining edges are in state ⋆. (1.5)

Then, one can define a sequence of times τ_1, τ_2, . . . so that τ_i is the first time after τ_{i−1} + C/µ, for some fixed constant C > 0, at which the event in (1.5) happens. These are regeneration times in the sense that the evolution of the full system from τ_i does not depend on what happened before τ_i. Once the full system is at a regeneration time τ_i, with positive probability the following sequence of events happens within time τ_i + C/µ: an edge e adjacent to the walker opens, the walker traverses e, the edge e closes again, and the edges adjacent to the other endpoint of e (i.e., opposite to the location of the walker) refresh before the edges adjacent to the walker refresh.
When these events occur, the walker does nothing more than a jump to a uniformly random neighbor, and immediately gets back to a regeneration time (so τ_{i+1} = τ_i + C/µ); such a regeneration time is then called a simple random walk regeneration since, at the end, what the walker did was just one step of a simple random walk on T^d_n. The proof in [11] then goes by showing that the differences τ_{i+1} − τ_i are of order 1/µ. Therefore, after a time of order n^2/µ, the walker has undergone an order of n^2 regeneration times, a positive fraction of which are simple random walk regenerations. So it is possible to couple the full system with another copy of the full system so that, whenever the walker does a simple random walk regeneration, we employ one of the standard couplings of simple random walks. On the other hand, if the regeneration time is not a simple random walk regeneration, we couple the motion of the two walkers from one regeneration time to the next identically, so that the distance between the walkers does not change. Since coupling two simple random walks on T^d_n takes an order of n^2 steps, performing an order of n^2 simple random walk regenerations is enough to couple the two processes, which translates into a mixing time of order n^2/µ.
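One standard coupling of simple random walks alluded to above can be sketched as follows (a textbook coupling on the cycle Z_n, shown only to make the n^2 heuristic concrete; it is not claimed to be the specific coupling used in the paper): until the walkers meet, a fair coin decides which walker takes a ±1 step, so their difference is itself a simple random walk and hits 0 in O(n^2) expected steps; afterwards the walkers move in unison.

```python
import random

def couple_walks(x, y, n, rng, max_steps=10**6):
    """Run the coupling until the walkers meet; return the meeting step.
    While unmatched, exactly one (coin-chosen) walker moves by a uniform
    +/-1 step, so x - y (mod n) performs a simple random walk on Z_n."""
    for step in range(max_steps):
        if x == y:
            return step
        s = rng.choice((-1, 1))
        if rng.random() < 0.5:
            x = (x + s) % n
        else:
            y = (y + s) % n
    return None  # did not meet within max_steps

t = couple_walks(0, 5, 10, random.Random(0))
assert t is not None and t >= 5  # at least 5 steps to close distance 5
```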
If we try to mimic the steps above in the case q ≠ 1, we immediately run into the issue that the event in (1.5) now occurs very rarely. In fact, since non-⋆-updates occur with positive probability, we will typically have a positive density of non-⋆ edges. Therefore, it would take an exponentially long time to reach a regeneration time as in (1.5), rendering this strategy useless.
We will devise a different strategy. We will, as before, construct a coupling between two copies of the full system, where we see the edges "from the point of view of the walker": whenever the edge X_t + e updates at time t, where X_t is the position of the walker in the first copy, then in the second copy we do the same update to the edge X′_t + e, where X′_t is the location of the walker in the second copy. Note that to establish the mixing time of the full system we need to couple both the environments and the walkers. For simplicity, we concentrate our discussion here on the coupling of the walkers (which is the most delicate bit), and assume for now that somehow we managed to couple the two environments: that is, the two copies are coupled modulo a translation of the walkers. Note that, from this moment, if we were to employ the identity coupling (that is, the second copy mimics all the edge updates and jumps of the walker from the first copy), the environments would remain coupled (from the point of view of the walkers) but the distance between the walkers would not change, thereby not allowing the walkers to couple.
Our idea is to observe the environment "a little" and, whenever the environment looks "favorable enough", attempt a coupling that could bring the walkers closer together, namely a standard coupling of simple random walks. We will refer to such moments as simple random walk moments, as an allusion to the simple random walk regenerations described above, but with the fundamental difference that they will not be regeneration times. On the other hand, when the environment is not favorable enough, doing a simple random walk moment is too risky, so instead we resort to the identity coupling as a means of keeping the distance between the walkers unchanged and not spoiling the work done during the favorable regions of the environments.
But what does it mean for the environment to look favorable enough? In short, it will mean that the event (1.5) occurs locally. That is, at such times, all edges adjacent to the walkers will be closed and all edges in a small region around the walkers will be in state ⋆ (for example, all edges inside a ball of radius 3 around the walkers, excluding the edges adjacent to the walkers). At such a time, with positive probability, the sequence of events described above for the simple random walk regeneration occurs, and therefore we could attempt to perform one of the standard couplings of simple random walks. However, there are two important caveats.
The first caveat is that if we succeed in doing a simple random walk moment with a coupling of simple random walks, then the distance between the walkers will change. This means that the translation mapping the location of one walker to the location of the other will change, and this map is what we use to match the edges of the first copy to the edges of the second copy when we view the edges from the point of view of the walkers. As a consequence, the environments will immediately decouple. Of course, if we only had ⋆-edges (besides the ones adjacent to the walkers, as in the case q = 1), then the environments would not decouple: despite the change in the translation map, we would still match ⋆-edges in the first copy to ⋆-edges in the second copy, so we could easily keep the environments coupled. But, since q ≠ 1 implies a density of non-⋆ edges, the environments will necessarily decouple. Moreover, if we decided to just wait for the environments to recouple completely, this would take a time of order (log n)/µ, which is just too long: it would lead to an upper bound on the mixing time of order n^2 (log n)/µ. So we will not recouple the environments completely, but will work with partially coupled environments.
The second caveat is that a simple random walk moment only occurs with positive probability, so it may well not take place. What could happen in this case? If the environments were completely coupled, then we would be guaranteed that we can perform the identity coupling and keep the distance between the walkers unchanged. But we have just seen that the environments will typically not be fully coupled. Yet, if we knew that the environments are coupled in a neighborhood around the walkers and that the walkers will not exit this neighborhood, then the identity coupling is still doable. That will be our strategy, but to implement it we will require a more delicate definition of what a favorable enough environment means.
We will use a multi-scale analysis to control the environment. This will reveal future information regarding the environment; that is, we will observe some information about the environment from time 0 up to some time t, and only then decide how to couple the walkers from time 0 onwards. Therefore, this construction leads to a non-Markovian coupling.
A good picture to have in mind is that the environment is a process in space-time, where some regions are classified as favorable and others as unfavorable. We observe these regions from time 0 to time t, and then start observing the walkers, which are paths in space-time growing from time 0. Whenever we see that the walkers are passing through a favorable part of the environment, where favorability will also imply that the walkers do not move outside some neighborhood around their current locations, we will try to do a simple random walk moment. If successful, the distance between the walkers may change and the environments may decouple, but, still using the (yet-to-be-defined) properties of favorability, we will be able to recouple the environments within a neighborhood around the walkers. If, instead, the simple random walk moment is not successful, then the walkers may move more than just one step of a simple random walk, but favorability will also imply that the walkers do not move too far away; in particular, they will remain within a region where we know the environments were coupled. This will translate into a successful application of the identity coupling.
On the other hand, if we see that the walkers are approaching an unfavorable region of the environment, then we will want to do the identity coupling, but we will need to start preparing beforehand. The problem is that such an unfavorable region could be of an arbitrarily large scale, and the larger its size, the earlier we need to start preparing for it. So when we see that, in space-time, the path of the walker is getting dangerously near an unfavorable region, we stop doing simple random walk moments even if at a smaller scale around the walkers the environment looks favorable. By switching off the simple random walk moments, we only apply the identity coupling until the walkers reach the unfavorable region or are again far enough from any unfavorable region. We can show that such identity couplings will succeed and, since the translation map from one walker to the other will not change during this period, it will give enough time for the environments to couple in a region around the walkers that is as large as needed to contain the scale of the unfavorable region that the walkers are approaching. Then, with the environments properly coupled, if the walkers do enter the unfavorable region, they can move as wildly as the environment there allows, because we can perform the identity coupling throughout the unfavorable region. So the walkers survive the traversal of the unfavorable region without changing their distance.
Then one can imagine that the proof ends by showing that n^2 instances of a simple random walk moment are enough to guarantee that we can couple the walkers. This is only partially true. The fact is that, as mentioned above, we need to observe future information to carry out this coupling strategy. But in order to establish that the mixing time is at most t, we need to show that with a large enough probability the two copies of the full system are coupled at time t without revealing any information that goes beyond time t. So our strategy to finalize the proof is to choose an appropriate time t′ ∈ (0, t), reveal the information up to time t, and do the coupling described above up to time t′, showing that within time t′ we have carried out an order of n^2 simple random walk moments and that we have coupled the walkers at time t′ (the environments may, and typically will, be uncoupled except for a small region around the walkers). The whole analysis will be split into three phases, and the above will be carried out in the first two phases. We will be able to show that these first two phases succeed with positive probability.
Next, the goal is to try to do the identity coupling from time t′ to t, in a similar manner as we were doing when approaching an unfavorable region. In this final stretch, the identity coupling can only fail due to information that we have not observed, because we are limited to observing the environment up to time t. We will show that, with positive probability, the identity coupling will indeed succeed from t′ to t, leading to a coupling of the full system at time t. This is the content of the third phase. If any of the three phases fails, then we just restart from scratch. We only need to repeat the phases a constant number of times to guarantee that the whole coupling succeeds with probability at least 3/4.
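The final amplification step is elementary arithmetic: if one round of the three phases succeeds with probability at least c, then repeating the rounds (treated as independent restarts for this back-of-the-envelope illustration; the value c = 0.1 below is made up) drives the failure probability below 1/4 after a constant number of attempts:

```python
import math

def repetitions_needed(c, target_fail=0.25):
    """Smallest k with (1 - c)^k <= target_fail: k independent restarts,
    each succeeding with probability at least c, all fail together with
    probability at most (1 - c)^k."""
    return math.ceil(math.log(target_fail) / math.log(1 - c))

assert repetitions_needed(0.5) == 2   # (1/2)^2 = 1/4
assert repetitions_needed(0.1) == 14  # 0.9^13 > 1/4 >= 0.9^14
```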

Organization of the paper
In Section 2.1 we introduce the multi-scale analysis that will allow us to control the favorable regions of the environment. In Section 3 we give a more thorough overview of the three phases of the proof of the upper bound, which will also motivate the constructions from the tessellation of Section 2.1. In Sections 4, 5 and 6 we carry out the three phases of the coupling, with the second phase in Section 5 being the most delicate part, where the non-Markovian coupling is developed. In Section 7 we put all phases together to complete the proof of the upper bound (Theorem 1.1). In Section 8 we establish the general lower bounds from Theorems 1.2 and 1.3; the proofs are essentially the same as in [11], and this section is added for the sake of completeness. Finally, in Section 9 we apply these theorems to derive the lower bounds on the mixing time and relaxation time of random walks on the dynamical random cluster model (Corollary 1.4).

Multi-scale setup
We start by defining a multi-scale tessellation of T^d_n, which consists of partitioning T^d_n into boxes and defining the events that boxes are good or bad. These events will then be used to define the favorable parts of the environment.

Tessellation
Let ℓ be the side length of cores at scale 1, whose value is given in (2.1), and let m be a sufficiently large integer. For each k ≥ 1 we tessellate T^d_n into cubes of side length ℓ_k, with ℓ_k as defined in (2.2). The cubes will be indexed by integer vectors i ∈ Z^d and denoted S^core_k(i). This yields a tiling of T^d_n with a hierarchy, as each cube of scale k is contained inside a unique cube of scale k+1. For simplicity we will assume that ℓ_k divides n for all scales k we will consider. Moreover, for any subset V of the vertices of T^d_n, we denote by E(V) the set of all edges incident only to vertices in V.
Now we define a multi-scale tessellation of time. At scale 1, we tessellate R into intervals of length t_1 = √ℓ/µ and then, for higher scales, we define t_k accordingly. We index the time intervals by τ ∈ Z and denote them by T^core_k(τ). Now for any i ∈ Z^d, k ≥ 1, and τ ∈ Z, we define the core of the space-time k-box R^core_k(i, τ) as the product of S^core_k(i) and T^core_k(τ). For any subset A ⊂ Z^d, we let ∂A denote its inner boundary. Then, in space-time, we define the spatial boundary ∂_s R^core_k(i, τ) accordingly. For the time dimension, we define two time boundaries: the boundary ∂^+_t corresponding to the largest unit of time in the box, and the boundary ∂^-_t corresponding to the smallest unit of time in the box.

Figure 1: On the left, a space-time box R_{k+1}(i, τ) represented by a blue square, its core highlighted in yellow, its partition into cores of scale k+1 represented by solid black lines, and its partition into cores of scale k in dashed lines. The horizontal axis represents space and the vertical axis represents time. On the right, a space-time box of scale 1 (highlighted in blue), and space-time cores of scale 1 in dashed lines.
For k ≥ 2, each box R^core_k(i, τ) will be the central part of a larger box R_k(i, τ). In words, R_k(i, τ) is composed of a cube in space of side length 3ℓ_k and a time interval of length 3t_k, and it has R^core_k(i, τ) as its central part (see Figure 1). For scale k = 1, we will need a small intersection in the time dimension; for this, we adjust the time intervals of the boxes R_1(i, τ), τ ∈ Z, accordingly. We then define the space and time boundaries of R_k(i, τ) for each k, i, τ analogously to (2.3) and (2.4).
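The core/box geometry above can be sketched in code. The concrete scale values below (ℓ_k, t_k) and the symmetric placement of the core are assumptions for illustration only; the paper fixes the actual scales in (2.1)–(2.2). What the sketch captures is the stated relation: a core of side ℓ_k and time length t_k sits centrally inside a box of side 3ℓ_k and time length 3t_k.

```python
def core(i, tau, ell_k, t_k):
    # spatial/time extent of the core R^core_k(i, tau): a cube of side ell_k
    # and a time interval of length t_k (1-d spatial coordinate for simplicity)
    return (i * ell_k, (i + 1) * ell_k), (tau * t_k, (tau + 1) * t_k)

def box(i, tau, ell_k, t_k):
    # the enclosing box R_k(i, tau): side 3*ell_k, time length 3*t_k,
    # with the core as its central part (symmetric placement is assumed here)
    return ((i - 1) * ell_k, (i + 2) * ell_k), ((tau - 1) * t_k, (tau + 2) * t_k)

(c_lo, c_hi), (ct_lo, ct_hi) = core(4, 7, 10, 5)
(b_lo, b_hi), (bt_lo, bt_hi) = box(4, 7, 10, 5)
assert b_lo <= c_lo and c_hi <= b_hi      # core is centrally contained in the box
assert bt_lo <= ct_lo and ct_hi <= bt_hi
```

Note how boxes of consecutive indices overlap in space-time while the cores tile it; this is the hierarchy illustrated in Figure 1.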
Finally, we denote by S^inn_1(i) the inner part of S_1(i), which is obtained by removing all the vertices within distance (γ/6) log² ℓ from the boundary of S_1(i) (γ is a constant that will be clarified later).

Good boxes at scale 1

Definition 2.1. We say that an event A is restricted to a region R ⊂ V(T^d_n) and a time interval [s_0, s_1] if A is measurable with respect to the σ-algebra generated by the updates of the edges of E(R) from time s_0 to s_1, together with the random variables U, U′ associated to such updates. If we denote I = [s, s′], then we employ the shorter notation C_x(I). Below we split the time interval of a box into two sets of sub-intervals, and then introduce the definition of good boxes.
We define two other tessellations into disjoint time intervals. The first one consists of intervals of length log² ℓ / µ, denoted T̃_1(j), j ∈ Z_+. Moreover, we fix a constant γ = γ(p, q, d) > 0 such that p_min/p_⋄ + γ < p_c, where p_c is the critical probability for independent bond percolation on Z^d, and introduce a tessellation of time into intervals of length γ/µ, denoted T̂_1(j), j ∈ Z_+. We assume throughout this paper that log² ℓ / µ divides t_1 and that γ/µ divides log² ℓ / µ, so that T̂_1 is a finer tessellation than T̃_1, which in turn is a finer tessellation than T^core_1.

Remark 2.3.
Note that the larger p is (that is, the closer p_min/p_⋄ is to p_c), the smaller we need to take γ. However, as we will need to take p small enough in several places of the proof, we will set γ first (for example, it is enough to take γ = 1/100). Then we make p small enough so that the condition on γ is satisfied.
The definition of good boxes will be done in steps. First we define some fundamental events that we will require from good boxes.
(G_2) For each j ∈ Z_+ and each spatial box S_1(i), let G_2(i, j) be the event that, for any e ∈ E(S_1(i)), during T̃_1(j) the edge e never gets a ⋄-update that makes it open.
(G_3) For each j ∈ Z_+ and each spatial box S_1(i), let G_3(i, j) be the event that, for each e ∈ E(S^core_1(i)), the number of ⋄-updates on edge e during T̃_1(j) is at least ½ p_⋄ log² ℓ (for the values of ℓ and p we will consider, this is always at least 1).
(G_4) For any j ∈ Z_+, take the unique τ such that T̂_1(j) ⊂ T^core_1(τ). For any site x of the torus, if we regard all edges as closed at time τt_1 and we only consider ⋄-updates, disregarding all non-⋄-updates, then let G_4(x, j) be the event that the connected component of x during T̂_1(j) has size at most log² ℓ. We also let j(τ) be the value such that T̂_1(j(τ)) starts at the initial time of R_1(i, τ); note that both T̂_1(j(τ)) and T̂_1(j(τ + 1)) are contained in T_1(τ).
The event that a box R_1(i, τ) is good will be composed of four events, which we denote by G_1(i, τ), G_2(i, τ), G_3(i, τ) and G_4(i, τ). The first event regards only non-⋄-updates: G_1(i, τ) is simply the event that no edge of E(S_1(i)) receives a non-⋄-update during T_1(τ).
Lemma 2.5. Let R_1(i, τ) be any box of scale 1. There exist constants c, c′ > 0 so that for all small enough p we obtain P(G^c_{12}(i, τ)) ≤ c ℓ^{d+1/2} p_min and P(G^c_{34}(i, τ)) ≤ exp(−c′ log² ℓ).

Proof. We start with the event G_{34}(i, τ). For a given edge, an update that is not a ⋄-update occurs at rate (1 − p_⋄)µ, a ⋄-update occurs at rate p_⋄ µ, and a ⋄-update that opens an edge occurs at rate p_⋄ (p_min/p_⋄) µ = p_min µ. Moreover, there are at most 2d 3^d ℓ^d edges in E(S_1(i)). For G_3, note that for a given edge e ∈ E(S_1(i)) and a given j, the number of ⋄-updates on e during T̃_1(j) is a Poisson random variable of mean p_⋄ log² ℓ. Therefore, using a standard Chernoff bound for Poisson random variables and the union bound over the edges in S_1(i) and over the values of j, we obtain a constant c_1 > 0 bounding the probability that G_3 fails by a term of order exp(−c_1 p_⋄ log² ℓ). Note that as p decreases to 0 we have that p_⋄ increases to 1 and ℓ increases to ∞, so for all small enough p this is at most exp(−c_2 log² ℓ) for some constant c_2. Regarding G_4(i, τ), for any j ≥ 0 with T̂_1(j) ⊂ T^core_1(τ) \ T̂_1(j(τ + 1)) and any edge e, note that the probability that e is open at the beginning of the interval T̂_1(j), given that we only consider ⋄-updates and regard all edges as closed at time τt_1, is at most p_min/p_⋄, since this is the probability that the last ⋄-update of e (if there was any) made e open. Thus, using that 1 − exp(−p_⋄ γ) is the probability that e has a ⋄-update during T̂_1(j), we obtain that the probability that e is T̂_1(j)-open under the above assumptions is at most p_min/p_⋄ + 1 − exp(−p_⋄ γ) ≤ p_min/p_⋄ + γ < p_c. In other words, the set of T̂_1(j)-open edges (under the above assumptions) forms a subcritical percolation configuration. Therefore, using the exponential decay of cluster sizes for subcritical percolation, together with the union bound over all sites and values of j, we obtain a constant c_3 bounding the probability that G_4 fails by exp(−c_3 log² ℓ). We note that c_3 increases as p decreases. So the bound on P(G^c_{34}(i, τ)) follows by taking p small enough.
Now we turn to G_{12}(i, τ). From the above considerations we obtain a bound in which the term of the form 1 − exp(·) comes from G_1 and G_2, while the last term comes from a computation as in the case of G_3 above. Here √ℓ is the term that dominates inside the parenthesis as p is made small enough (hence, as ℓ is made large enough). So we can take p small enough so that p_min log² ℓ + 2 exp(−c_1 p_⋄ log² ℓ) ≤ 2 p_min log² ℓ ≤ 2 p_min √ℓ.
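A standard form of the Chernoff bound for Poisson lower tails, which suffices for the G_3 estimate, is P(Poisson(λ) ≤ a) ≤ e^{−λ}(eλ/a)^a for a < λ; with a = λ/2 this equals exp(−λ(1 − ln 2)/2), i.e. exponential decay in the mean. The check below uses illustrative values of λ, not the paper's p_⋄ log² ℓ.

```python
import math

def poisson_lower_tail(lam, a):
    # exact P(Poisson(lam) <= a), computed by summing the pmf
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(a + 1))

def chernoff_bound(lam, a):
    # Chernoff bound P(Poisson(lam) <= a) <= e^{-lam} * (e*lam/a)^a, valid for a < lam
    return math.exp(-lam) * (math.e * lam / a) ** a

for lam in (20, 40, 80):
    exact = poisson_lower_tail(lam, lam // 2)
    bound = chernoff_bound(lam, lam // 2)
    # with a = lam/2 the bound is exp(-lam*(1 - ln 2)/2), so it decays in lam
    assert exact <= bound <= math.exp(-lam * (1 - math.log(2)) / 2 * 0.99)
```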
We need one more step to define good boxes of scale 1. Using a standard result for percolation with bounded dependencies [8], we will replace G_{34}(i, τ) by a collection of i.i.d. Bernoulli random variables.
Lemma 2.6. Let {W(i, τ)} be a collection of i.i.d. Bernoulli random variables of parameter 1 − exp(−C_6 log² ℓ). Then, for all small enough p, the family {1_{G_{34}(i, τ)}} stochastically dominates {W(i, τ)}.
Proof. First note that if we fix i, then {G_{34}(i, τ)}_τ forms a collection of independent random variables as τ varies. Note that this holds even for the events G_4 since, for x ∈ S^inn_1(i), the event that the component of x has size at most log² ℓ is measurable with respect to the edges in E(S_1(i)). Given i, the number of i′ such that S_1(i) ∩ S_1(i′) ≠ ∅ is strictly smaller than ∆ = 5^d. Then, from Lemma 2.5 we obtain that by taking p small enough the marginal probability w = P(G^c_{34}(i, τ)) can be made smaller than (∆−1)^{∆−1}/∆^∆, and so we can apply [8, Theorem 1.3] to deduce that the family G_{34}(·, ·) stochastically dominates a collection of i.i.d. Bernoulli random variables whose parameter can be expressed in terms of w^{1/∆}. Applying Lemma 2.5 for this value of w completes the proof. Now we are ready to define good boxes at scale 1.
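For intuition on the condition w < (∆−1)^{∆−1}/∆^∆ from [8, Theorem 1.3]: this threshold equals (1/∆)(1 − 1/∆)^{∆−1}, which is of order 1/(e∆). A quick numerical check, with ∆ = 5^d for a few illustrative dimensions:

```python
import math

def lss_threshold(delta):
    # (delta-1)^(delta-1) / delta^delta = (1/delta) * (1 - 1/delta)^(delta-1)
    return (1.0 / delta) * (1.0 - 1.0 / delta) ** (delta - 1)

for d in (1, 2, 3):
    delta = 5 ** d
    thr = lss_threshold(delta)
    # (1 - 1/n)^(n-1) decreases to 1/e, so the threshold sits between
    # 1/(e*delta) and 1/delta
    assert 1.0 / (math.e * delta) < thr < 1.0 / delta
```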
For convenience, we assume that for τ < 0 the event G(i, τ) holds for all i ∈ Z^d. We also couple the events G(·, ·) with the i.i.d. family from Lemma 2.6; see (2.9). The following lemma bounds the probability that a box is bad, and follows directly from the previous lemmas.
Lemma 2.8. Let R_1(i, τ) be any box of scale 1. There exists a constant C_7 > 0 so that for all small enough p we obtain P(G^c(i, τ)) ≤ C_7 ℓ^{d+1/2} p_min.

Proof. This follows from Lemmas 2.5 and 2.6.

Remark 2.9. Note that the event {R_1(i, τ) is good} is restricted to the cube S_1(i) and the interval T_1(τ). In fact, that is why in G_4 we assume that all edges are closed at time τt_1, the initial time of the core T^core_1(τ); the fact that all edges are closed at time τt_1 is implied by G_2, but by explicitly adding the assumption in G_4 we make G_2(i, τ) and G_4(i, τ) independent of each other. Note also that the decision of whether a box is good is completely independent of the walker; it depends only on the updates of the dynamical random cluster process.
Recall that X_t denotes the position of the random walker at time t. In the lemma below we show that if the walker happens to be inside a good box, then it cannot move very quickly. This will give us better control on where the walker can be.
Lemma 2.10. Let t ≥ 0 be any given time and suppose that (X_t, t) ∈ R^core_1(i, τ) for a good box R_1(i, τ). In particular, if τ > 0 then the stated displacement bound holds, where the last inequality holds for all p small enough (thus, ℓ large enough), and where ‖x − y‖_1 denotes the L^1 distance in the torus between the positions x, y ∈ T^d_n; in particular, ‖x − y‖_1 does not depend on whether edges are open or closed.
Proof. This is a direct consequence of the event G_4 from the definition of good boxes, and of the fact that good boxes have no non-⋄-updates. There is just one caveat, concerning the interval T̂_1(j(τ + 1)), which is the only time interval of the type T̂_1(·) inside T_1(τ) to which G_4 does not apply. However, for such a j, we know that the connected components inside T̂_1(j − 1) are of size at most log² ℓ, and G_2(i, τ) implies that during T̂_1(j) no edge of S_1(i) gets an update that makes it open. Therefore, the size of the connected components can only decrease during T̂_1(j), and the result follows.

Good boxes at larger scales
In this section we define the concept of good and bad boxes at scales larger than 1, but first we define a slightly relaxed notion of intersection of boxes.
Definition 2.11. Since boxes are defined by semi-open intervals, we will consider boxes that are barely non-intersecting as intersecting; that is, we consider two boxes R_k(i, τ) and R_k(i′, τ′) as non-intersecting if and only if their closures are disjoint. The event {R_k(i, τ) is bad} is strictly restricted to the cube S_k(i) and the time interval T_k(τ). Moreover, by translation invariance, this event has the same probability for any pair (i, τ), (i′, τ′) and any scale k.

Definition 2.14. Define ρ_k as the probability that a k-box R_k(i, τ) is bad. As noted in Remark 2.13, this probability does not depend on (i, τ).
Recall that m is the variable that appears in the definition of ℓ_k in (2.2).

Lemma 2.15. For any m > 0, by setting p small enough we obtain that ρ_k decays in k.

Proof. We prove the lemma in a slightly stronger version: we prove that we can set values c_k, satisfying c_k ≥ 1/2 for all k, so that the corresponding bound holds. We prove this by induction. For k = 1 the statement is trivially satisfied by setting c_1 = 1. Assume the statement is true up to scale k. By the definition of bad boxes we have, for all k ≥ 1, a recursive bound on ρ_{k+1} in terms of ρ_k², provided ρ_1 is small enough with respect to m. Given m, ρ_1 can be made small enough by setting p small enough, as in Lemma 2.8. Iterating the recursion proves the lemma.
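The mechanism behind this induction is the standard one for multi-scale arguments: a scale-(k+1) box is bad only if it contains two well-separated bad k-boxes, which yields a relation of the schematic form ρ_{k+1} ≤ B ρ_k² and hence doubly exponential decay once Bρ_1 < 1. The constant B below is an illustrative stand-in, not the paper's combinatorial factor (which depends on m and d).

```python
def iterate(rho1, B, kmax):
    # iterate the schematic recursion rho_{k+1} = B * rho_k**2
    rhos = [rho1]
    for _ in range(kmax - 1):
        rhos.append(B * rhos[-1] ** 2)
    return rhos

B, rho1 = 50.0, 1e-4          # illustrative values with B * rho1 < 1
rhos = iterate(rho1, B, 6)
for k, rho in enumerate(rhos, start=1):
    # closed form: B * rho_k = (B * rho_1)^(2^(k-1)), doubly exponential decay
    assert abs(B * rho - (B * rho1) ** (2 ** (k - 1))) <= 1e-12 * max(B * rho, 1)
```

Writing B ρ_{k+1} = (B ρ_k)² makes the doubly exponential decay explicit; the actual proof tracks the exponents through the constants c_k.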

Enlargement of boxes
As we discussed in the proof overview, whenever the walkers are in a favorable region of the environment, we will try to use a simple random walk coupling to bring the walkers together. However, when the walkers are in an unfavorable region of the environment, which essentially means that the walkers are approaching a bad box (at some scale), then we will have to refrain from doing this simple random walk coupling, and will just do a naïve identity coupling in order to let the environments couple around the walkers before they can reach the bad box. Here we define two types of enlargements of boxes; it is when the walker enters the enlargement of a bad box that we will need to stop doing the simple random walk coupling.
We also denote the corresponding time intervals by T^enl1_k(τ). Note that R^enl1_k(i, τ) is a parallelogram of spatial side length 9ℓ_k and time length 9t_k for k ≥ 2, and 7t_1 + t̃_1 for k = 1.
Remark 2.17. The 1-enlargement is a (d + 1)-dimensional parallelogram centered at R_k(i, τ), defined so as to obtain the following property. Let R_k(i, τ) be a bad box whose whole 1-enlargement R^enl1_k(i, τ) is contained inside a good box of scale k + 1. Then we know that the only boxes of scale k inside the (k + 1)-box that can be bad are those intersecting R_k(i, τ).

Figure 2: In black a box R_k(i, τ), with its 1-enlargement in blue and its 2-enlargement in purple. The figure is not to scale and illustrates the case k ≥ 2; recall that the time intervals are defined differently at scale 1.
The 2-enlargement extends the 1-enlargement by at least one layer of cores. We define this layer of cores as the set of cores of scale k intersecting R^enl2_k(i, τ) \ R^enl1_k(i, τ), and we also denote the corresponding time intervals by T^enl2_k(τ). Note that the 2-enlargement is a larger parallelogram containing the 1-enlargement. We will require a different type of boundary for the second enlargement.

Feasible Paths
In this subsection we introduce the concept of feasible paths. For any graph G, a path P is a càdlàg function of time taking values in the vertices of G such that, for any s ∈ R_+, if we take s′ to be the smallest value larger than s such that P(s′) ≠ P(s), then P(s′) ∈ N_G(P(s)), the neighborhood of P(s) in G.
Note that a path, as defined above, does not consider whether edges are open or closed and is thus allowed to jump across closed edges. The same is true in the definition below. Recall the definition of the time intervals T̂_1(j) from Definition 2.2 and the inner part S^inn_1(i) of box i from (2.7).

Definition 2.20 (Feasible path). A path P is said to be feasible if for any times s, s′ with s, s′ ∈ T̂_1(j) ⊂ T^core_1(τ) for some j and τ > 0, and such that P(s) ∈ S^inn_1(i), we have ‖P(s) − P(s′)‖_1 ≤ log² ℓ. Intuitively, a feasible path can move at most log² ℓ during any interval T̂_1(j).

We will refer to a path that leaves the box R_k(i, τ) from the time boundary as a path P such that (P(s), s) ∈ R^core_k(i, τ) for some s ≥ 0 and, if s′ > s is the smallest value such that (P(s′), s′) ∉ R_k(i, τ), then P(s′) ∈ S_k(i). In other words, it is a path that exits R_k(i, τ) through ∂^+_t R_k(i, τ). In the following two lemmas we prove that a feasible path always leaves good boxes from the time boundary. Recall the definition of C_x(I), the connected component of x during a time interval I, from the paragraph preceding Definition 2.2.
We define the spatial core of a box, R^s-core_k(i, τ), as the product of S^core_k(i) with the full time interval of the box.

Proof. For any (v, s) ∈ R^s-core_1(i, τ) and any (v′, s′) ∈ ∂_s R_1(i, τ) we have ‖v − v′‖_1 ≥ ℓ. Assume first that s ≥ t_1. Then the subinterval T̂_1(j) containing s satisfies the conditions in the definition of feasible paths. For all small enough p we have (3t_1 µ / γ) log² ℓ = (3√ℓ / γ) log² ℓ ≤ ℓ / log³ ℓ, recalling that γ is set before we take p small enough, as in Remark 2.3.

The next lemma is the analogue of Lemma 2.21 for higher scales.
Lemma 2.22. For all m large enough and all p small enough with respect to m, the following holds. Let k ≥ 2 and let (i, τ) be such that R_k(i, τ) is a good box. Let P be a feasible path such that (P(s), s) ∈ R^s-core_k(i, τ) for some s. Assume that either s ≥ t_1 or |C_{P(s)}(s)| ≤ √ℓ log² ℓ. Then P leaves R_k(i, τ) from the time boundary. Moreover, while the path is inside the box, from time s up to time (τ + 2)t_k, the path must be within distance ℓ_k/9 from P(s).
Proof. For any (v, s) ∈ R^s-core_k(i, τ) and any (v′, s′) ∈ ∂_s R_k(i, τ), we have ‖v − v′‖_1 ≥ ℓ_k as well as |s′ − s| ≤ 3t_k. We do a proof by induction on k. The case k = 1 is a direct consequence of Lemma 2.21. We will actually assume a slightly stronger induction hypothesis. Take c_1 = 1/100, c_2 = 1/10 and, for j ≥ 2, set c_{j+1} = c_j + 11/(m j²). We take m large enough so that c_j ≤ 1/9 for all j ≥ 1. Now, for a scale k, assume that if P(s) ∈ R^s-core_k(i, τ) for some (i, τ) and some s as in the statement of the lemma, then sup_{s′ ∈ [s, (τ+2)t_k]} ‖P(s) − P(s′)‖_1 ≤ c_k ℓ_k. We want to prove the above for scale k + 1.
We split into two cases, starting with k ≥ 2 (thus, k + 1 ≥ 3). Let P be a feasible path such that (P(s), s) ∈ R^s-core_{k+1}(i, τ), where R_{k+1}(i, τ) is a good box. Thus there is no pair of non-intersecting bad boxes of scale k inside R_{k+1}(i, τ). By Remark 2.17, if R_{k+1}(i, τ) contains at least one bad box R_k(i′, τ′), then all bad boxes contained in R_{k+1}(i, τ) are contained in R^enl1_k(i′, τ′). Inside R^enl1_k(i′, τ′) a feasible path has no restriction on how quickly it can move, and it could potentially traverse S^enl1_k(i′) instantaneously. The remaining boxes of scale k that are in R_{k+1}(i, τ) are good, and by the inductive hypothesis the maximum displacement of the path inside each of them is bounded above by c_k ℓ_k, so for k ≥ 2 the claimed bound follows. The term 12 + 2t_{k+1}/(2t_k) accounts for the following boxes. Each time the path P finds itself at the starting time of a k-core, it spends at least time 2t_k inside the corresponding k-box, and after that amount of time finds itself at the starting time of another k-core. This gives at most 2t_{k+1}/(2t_k) k-cores to which we can apply the induction hypothesis. There are situations, however, in which we cannot guarantee that the path P is at the starting time of a k-core. The first such situation is the very first box. We can still apply the induction hypothesis in such cases, since the hypothesis requires only that the path is inside the spatial core, regardless of whether it is at the starting time of a core, but this can give rise to at most 12 additional boxes in the counting: the first box, the boxes right before and after R^enl1_k(i′, τ′), and the 9 time intervals contained in R^enl1_k(i′, τ′). Now we turn to the last case, k = 1 (that is, k + 1 = 2). We proceed in the same way as before, taking care of the fact that boxes at scale 1 have a different length in the time dimension. We have sup_{s′ ∈ [s, (τ+2)t_2]} ‖P(s) − P(s′)‖_1 ≤ 9ℓ_1 + (12 + 2t_2/t_1) c_1 ℓ_1.

The proof is concluded by taking m large enough to guarantee that c_{k+1} ≤ 1/100 + 1/10 + (20/m) Σ_{i=2}^{k} 1/i² < 1/9 for all k.
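The constants in this induction are easy to check numerically. The choice m = 1000 below is only an illustrative instance of "m large enough"; the recursion is the one from the proof, c_1 = 1/100, c_2 = 1/10 and c_{j+1} = c_j + 11/(m j²) for j ≥ 2.

```python
def c_values(m, kmax):
    # c_1 = 1/100, c_2 = 1/10, and c_{j+1} = c_j + 11/(m*j**2) for j >= 2
    cs = [1 / 100, 1 / 10]
    for j in range(2, kmax):
        cs.append(cs[-1] + 11 / (m * j ** 2))
    return cs

cs = c_values(m=1000, kmax=10_000)
assert all(c <= 1 / 9 for c in cs)        # every scale stays below 1/9
# the limit is 1/10 + (11/m) * (pi^2/6 - 1), bounded via the Basel sum
assert cs[-1] < 1 / 10 + (11 / 1000) * (3.1416 ** 2 / 6 - 1)
```

Since Σ_{j≥2} 1/j² = π²/6 − 1 ≈ 0.645, the limit of the c_j is 1/10 + (11/m)(π²/6 − 1), which falls below 1/9 as soon as m ≥ 639.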
Next we prove that if a feasible path enters the 2-enlargement of a box R_k(i, τ) from its spatial boundary, and all the k-boxes inside R^enl2_k(i, τ) \ R^enl1_k(i, τ) are good, then the path remains far from the box R_k(i, τ). For this lemma, recall the definition of the 2-enlargement boundary ∂^enl2_s R^enl2_k(i, τ) of R_k(i, τ) from Definition 2.19.
Lemma 2.23. Let P be a feasible path such that (P(s), s) ∈ ∂^enl2_s R^enl2_k(i, τ).

Proof. Assume without loss of generality that during [s, s′] the path never visits a box R_k(·, ·) which is not contained in R^enl2_k(i, τ); otherwise we can carry out the proof separately for each portion of the path that only traverses boxes R_k(·, ·) contained in R^enl2_k(i, τ). Since the box containing (P(s), s) is a good box, letting s″ = sup{T_k(τ″)} for the corresponding time index τ″, we have that ‖P(s) − P(s″)‖_1 ≤ ℓ_k/9 by Lemma 2.22. If s″ < s′, we can iterate the above argument, obtaining that ‖P(s) − P(s′)‖_1 ≤ 24ℓ_k/9, where 24 accounts for the largest number of iterations. Since boxes have length 2t_k in the time dimension, it would be enough to replace 24 by 12 for k ≥ 2, but we use the larger bound 24 to accommodate also the case k = 1, for which the length of a box in the time dimension is smaller. Now, since 24ℓ_k/9 is smaller than 13ℓ_k, which is the distance between P(s) and the spatial boundary of S^enl1_k(i) enlarged by all boxes R_k(·, ·) that intersect it, the path can only traverse good k-boxes while inside R^enl2_k(i, τ). In addition, the same argument applies to any v ∈ S^enl1_k(i).

Great Boxes
We will need a stronger notion for boxes of scale 1, which we will call great boxes.
to be the set of k-great boxes.
Later we will see that the trajectory of the walker must be a feasible path. The next lemma will be used to say that if the walker traverses a k-box that is good together with a large neighborhood of good k-boxes, then the walker necessarily traverses enough k-great boxes. Such great boxes will be the places where we will attempt a simple random walk coupling later.
Lemma 2.25. Let P be a feasible path such that (P(τt_k), τt_k) ∈ ∂^-_t R^core_k(i, τ) for some (i, τ). Assume that either τ > 0 or |C_{P(τt_k)}(τt_k)| ≤ √ℓ log² ℓ. Then there exists C_8 > 0 such that, letting r = C_8 t_k/t_1, we can find times s_1 < · · · < s_r and distinct space-time indices (i_1, τ_1), . . . , (i_r, τ_r) such that the following all hold: • For all j, (P(s_j), s_j) ∈ ∂^-_t R^core_1(i_j, τ_j) and P exits R_1(i_j, τ_j) from the time boundary.
Proof. First, note that if a box R_1(ĩ, τ̃) is (k−1)-great and is contained in R_k(i, τ), then it is also k-great. We will prove the statement of the lemma replacing C_8 with c_k, some function of k. The lemma then follows by showing that there is a universal value C_8 such that 0 < C_8 ≤ c_k for all k. We do a proof by induction on k. The case k = 1 is trivially verified by choosing c_1 = 1, because in this case r = c_1 = 1 and we take (i_1, τ_1) = (i, τ). Now, for k ≥ 2, assume the lemma is true up to scale k − 1 and consider a feasible path that at time τt_k is inside ∂^-_t R^core_k(i, τ), such that every box of scale k whose 2-enlargement intersects R_k(i, τ) is good. By Remark 2.17, the bad boxes of scale k − 1 inside R_k(i, τ) (if there are any) are all contained in R^enl1_{k−1}(i′, τ′) for some i′, τ′. We then regard all boxes of scale 1 which are in at least one of the 2-enlargements of the boxes contained in R^enl1_{k−1}(i′, τ′) as potentially not (k−1)-great. By Lemma 2.22 we know that P cannot exit R_k(i, τ) before its time boundary; in words, the path stays for time at least 2t_k in the box S_k(i).

Since the path starts from ∂^-_t R^core_{k−1}(i′, τ′) for some i′ and τ′, in this (k−1)-box we can apply the inductive hypothesis, so after time 2t_{k−1} the path has gone through at least c_{k−1} t_{k−1}/t_1 distinct (k−1)-great boxes. Since this path remains inside R_k(i, τ), we immediately obtain that such boxes are all k-great boxes. When the path reaches ∂^+_t R_{k−1}(i′, τ′), it is on ∂^-_t R^core_{k−1}(i′ + j, τ′ + 2) ⊂ R_k(i, τ) for some j ∈ Z^d, and from here we can reapply the inductive hypothesis. So it remains to count how many times we can iterate this procedure before 2t_k amount of time has passed.
To do this, we first count how much time the path can spend inside the 2-enlargement of a bad (k−1)-box. It suffices to count how much time is spanned by the boxes whose 2-enlargements intersect R^enl1_{k−1}(i′, τ′), which is |T^enl1_{k−1}(·)| + 2|T^enl2_{k−1}(·)| = 57t_{k−1}. This bounds from below the number of times the above procedure can be iterated. From the inductive hypothesis, the path will then traverse at least the claimed number of (k−1)-great boxes, by setting c_k = (1 − 57/(2m(k−1)²)) c_{k−1}. These boxes are k-great by the properties of R_k(i, τ). The lemma is then concluded by setting C_8 to be a positive lower bound on the c_k.
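The recursion c_k = (1 − 57/(2m(k−1)²)) c_{k−1} admits such a uniform positive lower bound because Σ_k 1/k² converges, so the infinite product stays bounded away from 0. A numerical check with an illustrative value of m (any m with 2m > 57 makes every factor positive):

```python
def great_box_constants(m, kmax):
    # c_1 = 1, and c_k = (1 - 57 / (2*m*(k-1)**2)) * c_{k-1} for k >= 2
    cs = [1.0]
    for k in range(2, kmax + 1):
        cs.append((1 - 57 / (2 * m * (k - 1) ** 2)) * cs[-1])
    return cs

cs = great_box_constants(m=200, kmax=10_000)
assert all(c > 0 for c in cs)
assert cs[-1] > 0.5   # the infinite product stays bounded away from zero
```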

Overview of the Proof
In this section we give a high-level description of the proof. Consider two processes {M_t}_{t≥0} = {(X_t, η_t)}_{t≥0} and {M′_t}_{t≥0} = {(X′_t, η′_t)}_{t≥0} with starting states M_0, M′_0 ∈ Ω. We will construct a coupling of the two processes so that, for some time T = ∆_1 + ∆_2 + ∆_3 of order n²/µ, the two configurations agree with positive probability. Since the full process can be recovered by sampling independently the edges of the remaining status, we will obtain our result.
The coupling will consist of three different phases, which we describe at a high level below. The coupling of each phase will have a small, albeit positive, probability of failing. If the coupling of a phase fails, we declare the whole three-phase procedure to have failed, let the two processes evolve arbitrarily until time T, and restart everything again from phase 1. The detailed analysis of each phase will be given in Sections 4, 5 and 6. Then, in Section 7, we put all phases together and complete the proof of Theorem 1.1.

First phase: the local coupling
During the first phase we let the two processes evolve independently, and wait for the first time the graphs of the two processes agree on a ball of radius 2ℓ around the walkers; that is, we wait for a time t such that the configurations agree (after translation) for all edges e ∈ E(B^∞_{2ℓ}(0)), where B^∞_r(x) is the set of vertices inside the L^∞ ball of radius r around x. We will show in Lemma 4.1 that this will happen within time ∆_1 with large enough probability, where ∆_1 has order log² n / µ. This is the shortest of the three phases. If the first phase does not end within time ∆_1, we declare the whole three-phase procedure to have failed. This phase is handled in Section 4.

Second phase: the non-Markovian coupling of the walkers
This is the most involved phase. After the first phase has been completed successfully, the graphs of the two processes are the same on a ball of radius 2ℓ around the walkers. Then, in the second phase, we wish to couple the motion of the walkers. We use the tessellation to decide when to couple the walkers identically (so that they jump in the same way) and when to perform a better coupling aimed at decreasing the distance between the walkers.
Intuitively, whenever the walkers are passing through a "bad" region of the environment (which in our case will be the 2-enlargement of a bad box), we will just do the identity coupling to make sure the distance between the walkers does not increase. In fact, we will only be able to do the identity coupling because we use the annulus between the 2-enlargement of the bad box and the bad box itself (which is composed of good boxes) to give time for the graphs around the walkers to become coupled in both configurations, allowing the identity coupling to be carried out. If instead the two walkers are in a great box, then we try to do a better coupling, which we shall refer to as a simple random walk moment.
More precisely, translate time so that the second phase starts at time 0. Then we create the multi-scale tessellation described in Section 2.1 up to time ∆_2 + ∆_3, where ∆_2 and ∆_3 are of order n²/µ. We will fix a largest scale k_max and will look at how many times the walkers enter k_max-great boxes.
When the walkers are in great boxes, Lemma 5.14 will give that the environment is favorable enough so that, with positive probability, the displacement of the walkers will have the same distribution as that of a simple random walk on T^d_n (i.e., one where all edges are open). Phase 2 ends at time ∆_2, when we check whether the walkers are coupled and the graphs are coupled on a ball of radius 2ℓ around the walkers. Lemma 2.25 says that the walkers will cross an order of n² great boxes during [0, ∆_2] and, therefore, by time ∆_2 the walkers are expected to have performed an order of n² simple random walk steps. Since two simple random walks on T^d_n can be coupled in a way that they coalesce after a time of order n², we can ensure that with high probability phase 2 ends successfully. The details are carried out in Section 5.
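The last step uses the classical fact that two simple random walks on the torus can be coupled so as to coalesce in order n² steps: move one walker at a time, so that the difference itself performs a simple random walk and coalescence is a hitting time of 0. A minimal one-dimensional, discrete-time sketch (not the paper's continuous-time walk):

```python
import random

def coalescence_time(n, x0, y0, rng):
    # at each step move exactly one of the two walkers by +-1 on Z_n;
    # the difference x - y then performs a simple random walk on Z_n,
    # and the walkers coalesce when it hits 0
    x, y, steps = x0, y0, 0
    while x != y:
        step = rng.choice((-1, 1))
        if rng.random() < 0.5:
            x = (x + step) % n
        else:
            y = (y + step) % n
        steps += 1
    return steps

rng = random.Random(0)
times = [coalescence_time(50, 0, 25, rng) for _ in range(20)]
assert all(t > 0 for t in times)   # coalescence happens in finite time, a.s.
```

For a walk started at distance k on Z_n, the expected hitting time of 0 is k(n − k), which is of order n²; in d dimensions one couples coordinate by coordinate.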

Third phase: the coupling of the graphs
The third phase starts at time ∆_2; as before, we translate time so that the third phase starts at time 0. At the beginning of the third phase the walkers are coupled, and the graphs are coupled as well on a ball of radius 2ℓ around them. The idea of this phase is to keep performing identity coupling until the graphs couple together everywhere. We will show that this simple idea works.
There is one tricky issue. During the second phase, we needed to construct the tessellation all the way to time ∆_2 + ∆_3, while the second phase ends already at time ∆_2. The reason for this is that, in order to know whether we can perform a simple random walk moment, we need to observe a little bit of future information about the environment. Therefore, as we performed the second phase, we observed some information from the updates after the end of phase 2.
So the goal of the third phase is simply to let time pass, meanwhile doing identity coupling, until we get to a point where no information regarding future times has been observed. With this, during the third phase we aim to keep the walkers coupled at all times, while we finish coupling the graphs before time ∆_2 + ∆_3.
We do not use any further information from the tessellation beyond what we already observed for phase 2. The delicate point is that, in order to apply identity coupling of the walkers, as we explained in the second phase, we have to ensure that the graphs around the walkers are coupled. How large a region we require to be coupled depends on the environment of good and bad boxes that is ahead of the walker, but now we cannot observe anything beyond what we have already observed in phase 2; otherwise we would keep observing future information.
As hinted above, we just proceed with the identity coupling "blindly". That is, we perform identity coupling up to time ∆ 2 + ∆ 3 assuming that any information that we have not yet observed is "good", and simply "hope for the best". It will turn out that this procedure succeeds with large probability leaving the two processes completely coupled (both the graphs and the walkers) by time ∆ 2 + ∆ 3 . The details of this phase are given in Section 6.

What if a phase fails?
If any of the three phases does not end successfully, we let the two processes run independently (modulo what has already been observed) until the end of the third phase. This is needed because we might have observed some information about the environment up to that time. After that, we repeat the procedure from phase 1. Since the three phases succeed with positive probability, we only need to repeat the whole procedure a constant number of times. The end of the proof of the upper bound is given in Section 7.

During the first phase we let the processes M_t = (X_t, η_t) and M′_t = (X′_t, η′_t) evolve independently. Let Ψ_t : V → V be the translation that maps X_t to X′_t; we will abuse notation and use the same Ψ_t to denote the corresponding translation map E → E on the edges. For any i ∈ Z_+ ∪ {∞}, we write B^i_r(v) for the set of edges in the ball of radius r around v according to the L^i norm; we omit i from the superscript whenever i = 1. Define the event that the edges in an L^∞ ball of radius 2ℓ around the walkers agree (under Ψ_t) at time t, except for the ones adjacent to the walkers, which are closed; recall that ℓ is the size of the core of boxes of scale 1, whose value is given in (2.1). Let τ_B denote the first time at which this event holds; note that τ_B is a stopping time. Define ∆_1 = C_9 log² n / µ, for some constant C_9(p) > 0, and define the event F_1, which we shall take as the event that phase 1 succeeds. This event is a bit more restrictive than the one announced in the previous section, but this will be convenient for us in the next phase.
We then run phase 1 until τ B or ∆ 1 , whichever occurs first. If it turns out that F 1 does not occur, phase 1 is stopped at time ∆ 1 and we declare the whole procedure to have failed at time ∆ 1 . In this case, we do not proceed to the second phase; we regard ∆ 1 as the failing time of the procedure and, as we will explain more thoroughly in Section 7, we restart phase 1 from time ∆ 1 .
The following lemma (Lemma 4.1) establishes that the first phase succeeds with good probability. Before proving it, we need to establish a simple result on percolation; we then prove Lemma 4.1 in Section 4.2.

Percolation on cylinders and open upwards paths
Let G = (V, E) be a finite graph whose maximum degree is d max ; in our case, it would be enough to take G to be the d-dimensional torus T d n of side length n, where nearest neighbors are defined according to the L ∞ norm. We consider the discrete cylinder V cyl = V × Z + and define a site percolation process on V cyl with parameter ε ∈ (0, 1). In other words, we declare each site of V cyl to be open with probability ε, independently of one another; a site that is not open is said to be closed. For two vertices x, y ∈ V , we write x ∼ y to denote that the graph distance between x and y is at most 1 in G; thus, for example, x ∼ x for all x ∈ V . An open upwards path in V cyl is a sequence of sites (i 0 , τ 0 ), (i 1 , τ 1 ), (i 2 , τ 2 ), . . . , (i r , τ r ) such that i j ∈ V , τ j ∈ Z + , i j ∼ i j+1 and the following holds for all j: if (i j , τ j ) is open, then τ j+1 = τ j + 1; otherwise, τ j+1 − τ j ∈ {0, 1}. In other words, the path is compelled to move "upwards" in the cylinder whenever it visits open sites.
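The defining condition can be checked mechanically; the following sketch is only an illustration of the definition above, with function names (`is_open_upwards_path`, `adjacent`, `is_open`) of our own choosing.

```python
def is_open_upwards_path(path, is_open, adjacent):
    """Check the open-upwards-path condition on a cylinder V x Z_+:
    consecutive sites must be ~-adjacent in G; at an open site the path
    must move up exactly one level, at a closed site it moves up 0 or 1."""
    for (i, t), (j, u) in zip(path, path[1:]):
        if not adjacent(i, j):
            return False
        if is_open((i, t)):
            if u != t + 1:  # an open site forces an upward step
                return False
        elif u - t not in (0, 1):
            return False
    return True
```

For instance, on the cycle Z/5Z with x ∼ y whenever their graph distance is at most 1, a path may stall for one step at a closed site, but stalling at an open site violates the condition.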
Note that an open upwards path is allowed to visit a vertex more than once. We say that an open upwards path (i 0 , τ 0 ), (i 1 , τ 1 ), (i 2 , τ 2 ), . . . , (i r , τ r ) traverses m levels if τ r − τ 0 = m. When ε is close to 1, an open upwards path cannot visit too many closed sites. This is quantified in the next lemma. Before proving the above result, we need the following estimate on the number of subgraphs of G that contain a given vertex. Proof. This proof is quite standard and a version for the lattice can be found in [6, Proof of Theorem 4.20]; we include a proof here for the sake of completeness. Let A r,s be the number of induced connected subgraphs of V containing v and having r vertices and s boundary vertices, where a boundary vertex is a vertex that does not belong to the subgraph but has a neighbor that does. Hence, A r = Σ s A r,s . Note that for any ε ∈ (0, 1), if we perform site percolation on V with parameter ε, we obtain Σ r,s A r,s ε r (1 − ε) s = 1.
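As a concrete check of this counting, one can enumerate by brute force the induced connected subgraphs containing a fixed vertex of a small graph. The sketch below (the names `connected` and `animal_counts` are ours) computes the numbers A r,s for a 4-cycle and verifies that the sum Σ r,s A r,s ε r (1 − ε) s does not exceed 1, which is the bound the argument needs.

```python
from itertools import combinations

def connected(vertices, adj):
    """Is the induced subgraph on `vertices` connected? (DFS check)"""
    vs = set(vertices)
    stack, seen = [next(iter(vs))], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for w in adj[u] if w in vs)
    return seen == vs

def animal_counts(v, adj):
    """A[(r, s)] = number of induced connected subgraphs containing v
    with r vertices and s boundary vertices."""
    A = {}
    for r in range(1, len(adj) + 1):
        for sub in combinations(adj, r):
            if v in sub and connected(sub, adj):
                boundary = {w for u in sub for w in adj[u]} - set(sub)
                A[(r, len(boundary))] = A.get((r, len(boundary)), 0) + 1
    return A

# toy example: the 4-cycle, rooted at vertex 0
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
A = animal_counts(0, cycle)
eps = 0.7
assert sum(c * eps**r * (1 - eps)**s for (r, s), c in A.items()) <= 1.0
```

The events "the open cluster of v equals a given subgraph" are disjoint, which is what makes the weighted sum over all subgraphs at most 1.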
Proof of Lemma 4.3. Let m be an integer and, for convenience, set τ 0 = 0. Consider an open upwards path (i 0 , τ 0 ), (i 1 , τ 1 ), (i 2 , τ 2 ), . . . , (i r , τ r ) such that τ r − τ 0 = m; that is, the path traverses m levels. Let s 1 be the number of distinct closed sites visited by the path before it traverses 1 level and, for j ≥ 2, let s j be the number of distinct closed sites visited by the path after having traversed j − 1 levels and before traversing j levels. So Σ m j=1 s j is the total number of distinct closed sites visited by the path. Note that the sites counted in each s j must be of the form (·, j − 1) and must form a connected set with respect to the relation ∼ over G. Using Lemma 4.4, given s 1 , s 2 , . . . , s m , the number of possible ways to pick the set of distinct sites within (i 0 , τ 0 ), (i 1 , τ 1 ), . . . is where c is the constant from Lemma 4.4, and the term s j in the product counts the number of sites at level j that can be selected to be the last site visited by the path before going to level j + 1. Then, (d max + 1) m accounts for the number of ways to choose the first site at level j given the last site at level j − 1; this amounts to at most d max + 1 choices per level. If we fix Σ m j=1 s j = S, the number of ways to select the s j is the binomial coefficient (S+m−1 choose S). Finally, given all sites in the path with s j as defined above, the probability that this path is an open upwards path is at most Π m j=1 (1 − ε) s j , since each site counted in the s j must be closed. Therefore, the expected number of open upwards paths that traverse m levels and visit at least αm closed sites is at most where we used that, given c, there exists a constant c' such that ze cz ≤ c' e c'z for all z. It is enough to use the trivial bound (S+m−1 choose S) ≤ 2 S+m−1 in the above expression to obtain the upper bound, with the inequality holding whenever ε is close enough to 1 so that 2c' e c' (1 − ε) ≤ 1/2.
Then the lemma holds by setting ε further closer to 1 so that 4(d max + 1) times this quantity is at most 1. By using a result of Liggett, Schonmann and Stacey [8], the above result can be extended to percolation on T d n × Z + with bounded dependencies. Proof. For any ε̃ ∈ (0, 1), provided ε is large enough, we can apply Liggett, Schonmann and Stacey [8, Theorem 0.0] to obtain that the dependent site percolation process stochastically dominates an independent site percolation process of parameter ε̃. The lemma then follows by applying Lemma 4.3 to this independent site percolation process.

Proof of Lemma 4.1
Now we are in a position to establish that the first phase succeeds with good probability.
Proof of Lemma 4.1. Let τ B be the first time t ≥ t 1 such that X t and X' t are both isolated, meaning that all edges adjacent to them are closed. We will show that, with the required probability, τ B occurs before time ∆ 1 . For each of the processes M t and M' t we create a tessellation of T d n × [0, ∆ 1 ] into boxes of scale 1, using the values for ℓ and t 1 from Section 2.1. The event that a given box is good is defined as in Definition 2.7. We let M t and M' t evolve independently of one another until a stopping time st 1 at which X st 1 and X' st 1 are both in good boxes, with s ≥ 1. Note that good 1-boxes form a dependent site percolation process on T d n × Z + , so that we can apply Lemma 4.5. Let (i 0 , 1) and (i' 0 , 1) be the boxes visited by X t and X' t at time t = t 1 . Now, since random walks must traverse a feasible path, and since feasible paths leave good boxes from the time boundary (cf. Lemma 2.21), we obtain that from (i 0 , 1) and (i' 0 , 1) the random walks X t and X' t must each traverse an open upwards path. Therefore, the probability that up to level m = ∆ 1 /t 1 − 1 the walks X t and X' t each visited more than m/3 bad 1-boxes is at most 2e −cm , provided p is small enough (which makes the probability that a 1-box is good large enough). Outside this event, there must exist m/3 instances of time s ≤ m at which X st 1 and X' st 1 are both in good 1-boxes. When this happens, at time st 1 + t 1 both X st 1 +t 1 and X' st 1 +t 1 are isolated in a vertex (i.e., all edges adjacent to them are closed). Therefore, we obtain the claimed bound, where the term n 2d accounts for the number of choices for i 0 and i' 0 . Now let F be the σ-algebra generated by M t and M' t during t ∈ [0, τ B ]. We want to establish a lower bound on the probability of B τ B +t 1 given F. Since X t and X' t are isolated at t = τ B , it is enough to compute the probability that all edges inside a L ∞ ball of radius 2ℓ around the walkers do a ⊥-update but no non-⊥-update, and that the edges adjacent to the walkers neither open nor do a non-⊥-update during [τ B , τ B + t 1 ].
This probability is composed of three terms: the first term is the probability that no edge in the L ∞ ball of radius 2ℓ around the walkers does a non-⊥-update, the second term is the probability that those edges do a ⊥-update, and the last term is the probability that the edges adjacent to the walkers do not open. Recall that ∆ 1 = C 9 log 2 n /µ, t 1 = log 2 ℓ/µ and t 1 = √ ℓ/µ, where ℓ is just a large enough constant that is set before letting p be small enough. Now we show that we can make the above smaller than δ. We start with the term 2 (4ℓ) d exp(−t 1 µp ⊥ ), which can be made, say, smaller than δ/3. We will do this by adjusting ℓ only, but this term also involves p through p ⊥ . However, note that p ⊥ goes to 1 as p goes to 0. So, since t 1 µ is of order log 2 ℓ, we can choose ℓ large enough so that 2 (4ℓ) d exp(−t 1 µp ⊥ ) ≤ δ/3 for all p such that p ⊥ ≥ 1/2. After fixing ℓ, we can take p close enough to 0, which makes p min go to 0 and p ⊥ go to 1, so that the remaining term is also at most δ/3. Finally, after fixing ℓ and p, we can take n large enough so that 2n 2d e −cm ≤ δ/3, since m = ∆ 1 /t 1 − 1 is of order log 2 n as a function of n. This concludes the first phase.

The Second Phase: Non-Markovian Coupling
To describe the coupling during the second phase we will use the full multi-scale space-time tessellation described in Section 2.1. For simplicity, we translate time and space so that this phase starts at time 0 and X 0 is at the origin. Hence, X' 0 can be arbitrary, and η 0 and η' 0 can be any configurations for which the event B 0 from (4.1) holds.

Largest scale
We begin by creating the multi-scale space-time tessellation of T d n × [0, ∆ 2 ] with largest scale k max = log 2 log n.
We consider a positive constant C 10 (p) > 0, to be chosen later so that t kmax divides C 10 n 2 /µ, and define ∆ 2 accordingly in (5.2). The following lemma shows that with large probability there are no bad boxes of scale k max or larger. This will allow us to restrict our analysis to boxes of scale at most k max . We will consider all the boxes contained in the tessellation of T d n × [0, ∆ 3 ], which in particular include all the boxes intersecting the tessellation of T d n × [0, ∆ 2 ].
Using the value of k max and the fact that ρ 1 can be made arbitrarily small by taking p small, we conclude the proof.

The coupling
Recall the map Ψ t introduced in Section 4, which maps X t into X' t . In order to define the coupling of the two processes, we will use a different map Φ t . The idea is that our new map will be equal to Ψ t in good parts of the environment, but when the walker enters the enlargement of a bad box, we will stop changing Φ t and keep it "frozen" until the walkers exit the enlargements of all bad boxes. The point is that in the enlargement of bad boxes we want to couple the graphs in a large region around the walkers, so that if the walkers enter a bad box, they do so with their graphs coupled within the box. We stop updating Φ t because every time Φ t changes, many edges uncouple.
More precisely, given a time t, denote by s t = sup{s ≤ t : (X s , s) is inside a k max -great box} the last time before t at which the walker is in a k max -great box. We then consider the new map Φ t defined as Φ t = Ψ s t .
We will show that this change of map does not actually create any problems; in fact, we will show that Φ t ≡ Ψ t for all t, because in the way we construct the coupling, when the walkers are in the enlargement of a bad box we will succeed in applying identity coupling, hence the translation map remains constant. So the introduction of Φ t here is a formalism needed for the coupling procedure to be well defined: our application of identity coupling later on will be successful, which in turn implies that Φ t ≡ Ψ t .
As soon as the second phase begins, we check whether the box R 1 (i, 0) such that (X 0 , 0) ∈ R core 1 (i, 0) is k max -great (the reason we do this will be clarified later; see Remark 5.9). If that is the case, then we can begin the coupling procedure of the second phase. The coupling is composed of two parts: the coupling of the graphs (that is, the coupling of η t and η' t ) and the coupling of the walkers.

Coupling of the graphs
We let the process {η t } t≥0 evolve. Denote by C v (t) (resp., C' v (t)) the cluster that contains vertex v at time t in the process η t (resp., η' t ). When an update (s, U ⊥ , U ) occurs at an edge e in η s , we update the process η' s as follows.
• If the update is a ⊥-update, we refrain from looking at U and instead simply set η s (e) = ⊥ and η' s (Φ s (e)) = ⊥.
• If the update is not a ⊥-update, we must check in both configurations η s and η' s whether e is a cut-edge or not. We do this by looking at the connected components of the endpoints v 1 , v 2 of the edge e. If an edge e' is such that η s (e' ) = ⊥ and e' is incident to a vertex in C v 1 (s) ∪ C v 2 (s), we sample its current status, open or closed, according to its last update. Note that this last update is itself a tuple (s, U ⊥ , U ), so this step boils down to checking the value of U . If η' s (Φ s (e' )) = ⊥, we set η' s (Φ s (e' )) = η s (e' ) as well. We continue this procedure until the components of v 1 and v 2 have been fully explored in η s , and proceed analogously for the process η' s until the components of Φ s (v 1 ) and Φ s (v 2 ) have been fully explored. A potential disagreement η s (e) ≠ η' s (Φ s (e)) can happen only if, by revealing the components of v 1 , v 2 , Φ s (v 1 ) and Φ s (v 2 ), we find that e is a cut-edge in η s but Φ s (e) is not a cut-edge in η' s , or vice-versa.
In this way, edges whose status is ⊥ can always be kept coupled, whereas non-⊥-updates cause the status of other edges to be revealed, potentially creating disagreements between the two configurations.
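The cut-edge check described above is, at its core, an exploration of the clusters of the edge's endpoints. As a minimal stand-alone sketch (ignoring the deferred bookkeeping and the second configuration; all names here are ours), a single-edge heat-bath update for the random cluster model opens a non-cut-edge with probability p and a cut-edge with probability p/(p + q(1 − p)), the conditional probabilities of the measure defined in the introduction:

```python
def same_component(u, v, open_edges, adj, skip_edge):
    """Explore the open cluster of u, ignoring skip_edge, to decide
    whether u and v are connected (i.e. skip_edge is NOT a cut-edge)."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj[x]:
            e = frozenset((x, y))
            if e != skip_edge and e in open_edges and y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def heat_bath_update(e, open_edges, adj, p, q, U):
    """Refresh edge e given a uniform variable U in [0, 1)."""
    u, v = tuple(e)
    if same_component(u, v, open_edges, adj, e):
        prob = p                      # not a cut-edge
    else:
        prob = p / (p + q * (1 - p))  # cut-edge: opening merges two clusters
    if U < prob:
        open_edges.add(e)
    else:
        open_edges.discard(e)
```

For q = 1 the two probabilities coincide and the dynamics reduces to independent dynamical percolation, which is why no cluster exploration is needed in that case.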
Remark 5.2 (Momentary change of coupling). At some times we will carry out a different coupling of the environment. This will be done by simply introducing another map Φ̃ between the environments; the coupling of the graphs will then proceed as described above with Φ t replaced by Φ̃, until we specify that Φ t is again the map to be used.

Coupling of the walkers
During this discussion the reader should refer to Figure 3. Our goal is to define a coupling that can bring the walkers together. For this we will use the multi-scale tessellation. The coupling of the walkers will be composed of two different couplings. When the walker X s enters the core of a great box R core 1 (i, τ ), we will try to take advantage of the nice environment that a great box provides to perform a coupling that we refer to as a simple random walk moment. This coupling aims to change the distance between the walkers, so that eventually the walkers may find themselves at the same site.

Figure 3: In red, the bad boxes; in blue, their enlargement; in black, the tessellation and the walker's trajectory. In bad boxes there is no control over the displacement of the walker, whereas in good boxes the walker always leaves the box from its time boundary. Whenever the walker enters the enlargement of a bad box we start doing identity coupling; otherwise, in great boxes, if a SRWM occurs we do a simple random walk coupling, and if not we keep doing identity coupling.
On the other hand, whenever X s is not in a great box, we do not have good enough control on the environment around the walker to do a simple random walk moment. In such cases, we will just resort to a simple identity coupling that keeps the distance between the walkers unchanged. An identity coupling can only be performed if the environments around the walkers are the same. For this, we define the corresponding event in (5.3). If B s holds for all s ∈ (s 1 , s 2 ), then in this time interval the walkers can perform the same jumps and not change their relative distance; in other words, identity coupling is successful. Indeed, if the environment around the walkers is the same (in fact, we only need the environments to agree on a ball of radius 1 around the walkers), by doing identity coupling the walkers are able to perform the same jumps.
So the proof is now split into three steps. Since Φ t does not change when the walker enters the 2-enlargement of a bad box, we will show in Section 5.3 that while Φ t does not change the graph couples. Next, we show that identity coupling can be successfully implemented when the walker enters the 2-enlargement of a bad box (i.e., when the walker is not in a great box). This is carried out in Section 5.4. Then, in Section 5.5, we deal with the simple random walk moments.

Coupling of the graphs with Φ t unchanged
Given I ⊂ Z d and k ≥ 1, let S k (I) = ∪ i∈I S k (i) and S x k (I) = ∪ i∈I S x k (i) for any x ∈ {core, enl1, enl2}. Recall the value m in the definition of ℓ k in (2.2), and recall t k from (2.6). Then, for k ≥ 2, we make the analogous definitions at scale k. We start this section by showing that the graph gets coupled in regions of good boxes if Φ t does not change.
Lemma 5.3 (Graphs couple in good boxes). Let m be large enough, and then let ℓ be large enough with respect to m. Let R k (i, τ ) be a good box, and let s 1 be any time instant such that [s 1 , s 1 + 2t k ] ⊂ T k (τ ). If Φ t does not change during [s 1 , s 1 + 2t k ], then there exists t ∈ [s 1 , s 1 + 2t k ] such that η t (e) = η' t (Φ t (e)) for all e ∈ S k (i).
Proof. If k = 1, the proof follows since each edge of S k (i) receives only ⊥-updates and gets updated at least once during [s 1 , s 1 + 2t k ]. For k ≥ 2, we assume that the statement of the lemma holds up to scale k − 1. Let s 2 = max T k (τ ). Let τ' be the first time index such that τ' t k−1 ∈ [s 1 , s 2 ] and all boxes R k−1 (·, τ' ) ⊂ R k (i, τ ) are good. Let I be the set of indices of all (k − 1)-boxes that are inside S k (i). Then, by induction, by time τ' t k−1 + 2t k−1 we obtain that S k−1 (I) = S k (i) has been coupled.
Now it remains to show that τ' t k−1 + 2t k−1 ≤ s 1 + 2t k . Note that since R k (i, τ ) is a good box, there exist ı̂, τ̂ such that all (k − 1)-bad boxes contained in R k (i, τ ) are contained in R enl1 k−1 ( ı̂, τ̂ ). Since the amount of time spanned by the enlargement at scale k − 1 is 9t k−1 , we obtain that τ' t k−1 ≤ s 1 + 9t k−1 + t k−1 , where the last t k−1 accounts for the possibility that s 1 is not a multiple of t k−1 . Hence, using the notation a + = max {a, 1} for consistency with the case k = 2, and noting that t 1 ≤ 6t 1 (k−2) 2 + m provided ℓ is made large enough once m has been fixed, we obtain the claim. Recall the definition of S inn 1 (i) from (2.7). For k ≥ 2, define S inn k (i) analogously. For a set of indices I, we write S inn k (I) = ∪ j∈I S inn k (j).
Note that, by taking m large enough, we have S core k (I) ⊂ S inn k (I) ⊂ S k (I). We start with a simple result about the connected component of a vertex.
Lemma 5.4. Let m ≥ 2 and let ℓ be large enough with respect to m. Let I ⊂ Z d be a set of indices, k ≥ 1 a scale and τ ≥ 1 a time index such that R k (i, τ ) is a good box for all i ∈ I. Then, for any v ∈ S inn k (I) and any t ∈ T k (τ ), the connected component of v is contained in B ∞ 5ℓ k /m (v). Proof. For k = 1, the result follows from the fact that components have size at most log 2 ℓ in good 1-boxes when τ ≥ 1, and ℓ is large enough so that log 2 ℓ ≤ 5ℓ/m. For k ≥ 2, let (i' , τ' ) be such that v ∈ S core k−1 (i' ) and t ∈ T k−1 (τ' ) ⊂ T k (τ ); there could be more than one choice for τ' , and it is irrelevant which one we pick. If the connected component of v is contained in S enl1 k−1 (i' ), then, since v ∈ S inn k (I), the result follows by applying the induction hypothesis at scale k − 1 with set of indices {i' }. Otherwise, by Remark 2.17, the lemma holds in this case as well. In the final case, when the connected component of v is not contained in S enl1 k−1 (i' ), it may sound contradictory but we can get an even smaller bound for the component of v. The reason is that there must exist i'' such that v is in the same component as a vertex u with u ∈ S core k−1 (i'' ) and such that the corresponding box is good. Thus, by induction we obtain that the connected component of v is inside B ∞ 5ℓ k−1 /m (u); since this is contained in B ∞ 5ℓ k /m (v) for all k as long as m ≥ 2, the proof is completed.
With the help of the above lemma, we can show that the graph cannot uncouple in regions surrounded by good boxes.
Lemma 5.5 (Graphs remain coupled if Φ t does not change). Let m be large, and let ℓ be large enough with respect to m. Let R k (i, τ ) be a good box, and let s 1 < s 2 with s 1 , s 2 ∈ T k (τ ) and s 1 ≤ (τ + 1)t k . If Φ t does not change during t ∈ [s 1 , s 2 ] and η s 1 (e) = η' s 1 (Φ s 1 (e)) for all e ∈ E(S k (i)), then η t (e) = η' t (Φ t (e)) for all e ∈ E(S inn k (i)) and all t ∈ [s 1 , s 2 ].
Proof. For k = 1 the lemma is obvious, since any e ∈ E(S k (i)) only receives ⊥-updates during T k (τ ); therefore, η t (e) = η' t (Φ t (e)) for all t ∈ [s 1 , s 2 ]. For k ≥ 2, assume the lemma holds up to scale k − 1. Let τ 1 = min T . Note that one of two cases occurs, where the latter happens when s 1 is near the starting time of T k (τ ). Because s 1 cannot be near the ending time of T k (τ ), due to the condition s 1 ≤ (τ + 1)t k , we obtain that T is not empty. We will first show (5.8). To see this, let r 1 = sup T k−1 (τ 1 ) and note that r 1 ≥ s 1 + t k−1 because of (5.7). Now, induction gives that S inn k−1 (j) remains coupled up to time r 1 . We would like to reapply the induction hypothesis on the box S k−1 (j) in the next time step, but for this we need S k−1 (j) to be coupled, not only S inn k−1 (j). Thus, we first apply Lemma 5.3 from time r 1 − 2t k−1 to obtain that there exists a time r' 1 ∈ [r 1 − 2t k−1 , r 1 ] at which the whole of S k−1 (j) is coupled. Let τ 2 be such that r' 1 ∈ T core k−1 (τ 2 ), and note that τ 2 ≥ τ 1 + 1. Thus, we repeat the induction hypothesis and the application of Lemma 5.3 to obtain a sequence of τ ι , r ι and r' ι until a certain value r' ι ∈ [s 2 − 2t k−1 , s 2 ]. At that time, the induction hypothesis gives that S inn k−1 (j) is coupled at time s 2 , establishing (5.8).
Now we turn to establishing the lemma. If R k (i, τ ) has no bad (k − 1)-box intersecting the time interval [s 1 , s 2 ], then (5.8) and the fact that (k − 1)-boxes overlap give the result. Otherwise, let J be a suitable set of (k − 1)-box indices, and note that R k−1 (j, τ' ) is good for all j ∈ J and τ' ∈ T . Therefore, (5.8) gives that S inn k−1 (j) is coupled during [s 1 , s 2 ] for all j ∈ J . The remainder of the proof is split into two cases. First assume that S enl1 k−1 (i' ) is separated from infinity by J , which means that any path from S enl1 k−1 (i' ) to the outside of S k (i) must enter S core k−1 (j) for some j ∈ J . In fact, letting J' ⊂ J be a suitable subset, we get that the path must enter S core k−1 (j) for some j ∈ J' . Besides, Lemma 5.4 gives that for all v ∈ S inn k−1 (J' ) and all s ∈ T enl1 k−1 (τ' ) the connected component of v is contained in B ∞ 5ℓ k−1 /m (v). Therefore, all connected components intersecting S enl1 k−1 (i' ) must be contained in ∪ v∈S enl1 k−1 (i' ) B ∞ 5ℓ k−1 /m (v), which is a spatial region contained in the interior of S inn k (J ). Therefore, since S inn k−1 (J) ⊃ S inn k−1 (J' ) remains coupled throughout [s 1 , s 2 ] by (5.8), non-⊥-updates inside S enl1 k−1 (i' ) cannot uncouple the graph. Turning to the second case, we assume that S enl1 k−1 (i' ) is not separated from infinity by J . This means that S enl1 k−1 (i' ) is so close to the boundary of S k (i) that it does not intersect S inn k (i). More formally, for any v ∈ S enl1 k−1 (i' ) we have that B ∞ 10ℓ k−1 (v) cannot be contained in S k (i). But this implies that for any i'' with S core k−1 (i'' ) ⊂ S enl1 k−1 (i' ) we have that S enl2 k−1 (i'' ) ⊂ S inn k (i). Therefore, applying (5.8) to the boxes in J already gives that S inn k (i) is coupled during [s 1 , s 2 ].

Identity coupling
We prove that, by doing identity coupling, as long as the particle X t is at a point (v, t) ∈ T d n × R + in space-time that is part of a good 1-box R 1 (·, ·), it is always possible to keep the distance between X t and X' t constant. Recall the event B t from (5.3), the event B t from (4.1), and the definition of the spatial core of a box in (2.10). We will need a weaker version of B t , which we define as B t = ∀e ∈ E(B ∞ ℓ/3 (X t )) , η t (e) = η' t (Φ t (e)) . (5.9) Recall that in the second phase we assume that R 1 (0, 0) is a k max -great box and B 0 holds; we do not restate these conditions in the lemmas. The lemma below is a composition of the previous lemma for the case when the walker traverses a sequence of good 1-boxes. We assume that the stronger event B t holds at the start time, to be able to guarantee that B t holds during the entire time interval covered by the lemma. Proof. Let R 1 (i, τ ) be the box in whose core the walker is at time s 1 . Since R 1 (i, τ ) is a good box, Lemma 5.6 gives that identity coupling works up to the end of T 1 (τ ), and Lemma 5.3 gives that S 1 (i) couples at some time during [s 1 , s 1 + 2t 1 ]. Moreover, applying Lemma 5.6 again to the box in whose core the walker is when exiting R 1 (i, τ ), we see that identity coupling continues to succeed. Repeating this argument over and over establishes the lemma.
Now we analyze what happens in the neighborhood of a bad 1-box, supposing that the walker enters the 2-enlargement of that box. Two things can happen: either the walker enters the 2-enlargement of the box from the space boundary ∂ s R enl2 1 (·, ·), or it enters from the time boundary ∂ − t R enl2 1 (·, ·). Let s c be the time of this entrance and let s e be the time the walker exits R enl2 k (i, τ ) after s c ; we take the convention that s e = τ + k if X t ∈ S enl2 k (i) for all t ∈ [s c , τ + k ], so that η t (e) = η' t (Φ t (e)) for all e ∈ E(S inn k (J)) and all t ∈ [s c , s e ] with t ≥ min T enl1 k (τ ). (5.13) The proof uses induction on k, so we treat the case k = 1 separately.
Proof of Lemma 5.8 for k = 1. We start with the case s c > τ − 1 , meaning that the walker entered the 2-enlargement of the bad box from ∂ s R enl2 1 (i, τ ). We need to establish (5.11) and (5.12) in this case. We establish (5.12) by showing that the walker never gets closer than 12ℓ to S enl1 1 (i). To see this, from (5.10) we have that R 2 (i' , τ' ) contains the 2-enlargement of R 1 (i, τ ), and R 2 (i' , τ' ) is a good box. Moreover, Remark 2.17 gives that the 1-enlargement of R 1 (i, τ ) contains all bad 1-boxes inside R 2 (i' , τ' ), and Lemma 2.23 gives that the distance between the walker and S enl1 1 (i) is at least 12ℓ, establishing (5.12). To establish (5.11), note that the walker only traverses good boxes during [s c , s e ], so (5.11) follows from Lemma 5.7. Now we consider the case s c = τ − 1 , and need to establish (5.11) and (5.13). The idea in this case is to use the time interval between s c and min T enl1 1 (τ ), which is large enough for the graphs to couple. In fact, applying Lemma 5.3 to the box R 2 (i' , τ' ) from time s c , we obtain a time s ∈ [s c , s c + 2t 2 ] at which S 2 (i' ) is coupled. From this time onwards, Lemma 5.5 gives that S inn 2 (i' ) ⊃ S enl2 1 (i) remains coupled up to time s e . From Lemma 2.22 we know that the walker does not leave S inn 2 (i' ) during [s c , s e ]. So if identity coupling succeeds up to time s c + 2t 2 , then it succeeds up to time s e . Moreover, note that 2t 2 = 12t 2 /m = 12t 1 is smaller than the distance between s c and min T enl1 1 (τ ), which is 15t 1 . So S inn 2 (i' ) couples before the walker can enter R enl1 1 (i, τ ), and (5.13) is established. It remains to show that the coupling succeeds and B t holds for all t ∈ [s c , s c + 2t 2 ], completing the proof of (5.11). For this, we only need to note that during this time interval the walker only traverses good 1-boxes, so (5.11) follows from Lemma 5.7.
Proof of Lemma 5.8 for k ≥ 2. We have already established the case k = 1; now we proceed via induction, assuming all claims of the lemma are proved up to scale k − 1. Let R k (i, τ ) be a bad box and let R k+1 (i' , τ' ), as in the statement of the lemma, be a good box. All bad boxes in R k+1 (i' , τ' ) are contained in R enl1 k (i, τ ). We first prove the case s c > τ − k , which requires establishing (5.11) and (5.12). In this case we use the same argument as in the case k = 1; that is, (5.12) follows from Lemma 2.23. To show that identity coupling can be performed and B t holds, notice that if at time s c the walker is inside a bad box R k'' (i'' , τ'' ) for some k'' < k, then, since R 1 (0, 0) is k max -great, at some previous time the walker was at the boundary of R enl2 k'' (i'' , τ'' ). If there is more than one tuple (k'' , i'' , τ'' ) satisfying the property above, we take the one with the largest k'' (breaking ties arbitrarily if there is still more than one such tuple). Since the walker must have entered the 2-enlargement R enl2 k'' (i'' , τ'' ) at some earlier time, we obtain by induction that, while traversing the bad box R k'' (i'' , τ'' ), identity coupling is successful and B t holds up to the end of T k'' (τ'' ), since (5.13) implies B t . Therefore, when the walker leaves R k'' (i'' , τ'' ), we can apply the induction hypothesis again if the walker is inside another bad box. It remains to check that identity coupling can be performed while the walker passes through space-time locations that belong to good boxes at all scales, in particular while the walker passes through good 1-boxes. But since B t holds at those times, identity coupling succeeds by Lemma 5.7, concluding the proof of (5.11).
We now prove the case s c = τ − k , which requires establishing (5.11) and (5.13). Assume that s e ≥ min T enl1 k (τ ); otherwise (5.11) follows from the same argument as above and (5.13) is irrelevant. We can use the same argument as for k = 1; i.e., we show that the time interval between s c and min T enl1 k (τ ) is large enough for the graphs to couple. By Lemma 5.3 we obtain a time s ∈ [s c , s c + 2t k+1 ] at which S k+1 (i' ) is coupled and, by Lemma 5.5, S inn k+1 (i' ) ⊃ S enl2 k (i) remains coupled until s e . Since Lemma 2.22 gives that the walker does not leave S inn k+1 (i' ) during [s c , s e ], if identity coupling succeeds up to time s c + 2t k+1 , then it succeeds up to time s e . Besides, 2t k+1 = 12 t k+1 k 2 m = 12t k is smaller than the distance between s c and min T enl1 k (τ ), which is 15t k . So S inn k+1 (i' ) couples before the walker can enter R enl1 k (i, τ ), and (5.13) is established. To establish (5.11), we need to show that B t holds for all t ∈ [s c , s c + 2t k+1 ]; but during this time the walker only traverses good 1-boxes, so (5.11) follows from Lemma 5.7.
Remark 5.9. The 2-enlargement of a bad box is chosen so that, whenever the walker crosses it, by doing identity coupling the two processes have time to couple the environment before the walker crosses the bad box. For this exact reason we want the first box whose core the walker is in at the beginning of the second phase to be k max -great: this way we know that the walker does not start inside the enlargement of a bad box, meaning that if the walker encounters a bad box during the second phase, it must first traverse its enlargement.

Simple random walk moment
Now we handle the case when the walker traverses great boxes, during which we do not perform identity coupling but try a different coupling. This coupling will be based on what we call a simple random walk moment (SRWM), which is a condition on the evolution of the environment that makes the walker perform a simple random walk step.
Definition 5.10 (Simple random walk moment). Let R 1 (i, τ ) be a great box such that (X τ t 1 , τ t 1 ) ∈ R core 1 (i, τ ). We consider three consecutive intervals I 1 , I 2 , I 3 whose lengths are such that I 1 begins at time τ t 1 = min T core 1 (τ ); note that τ t 1 + Σ 3 j=1 |I j | < (τ + 2)t 1 = max T core 1 (τ ). Let v ∈ S 1 (i) be the position of the walker X τ t 1 ; note that since R 1 (i, τ ) is a good box, all edges adjacent to v at time τ t 1 are closed. All the events below consider only ⊥-updates during I 1 ∪ I 2 ∪ I 3 , ignoring all non-⊥-updates. Then, a simple random walk moment (SRWM) is said to occur in R 1 (i, τ ) if the following events happen consecutively. See Figure 4 for an illustrative realization of a simple random walk moment. Define I SRWM (i,τ ) to be the indicator for the event that SRWM occurs in R 1 (i, τ ). (5.14) Remark 5.11. Given v, the position of the walker at time τ t 1 , the event SRWM depends only on the updates in E(S 1 (i)) during the time interval I 1 ∪ I 2 ∪ I 3 . In particular, it does not depend on the jumps of the walkers during I 1 ∪ I 2 ∪ I 3 , and does not depend on non-⊥-updates that could occur during I 1 ∪ I 2 ∪ I 3 .
Note that from time τ t 1 to time τ t 1 + |I 1 ∪ I 2 ∪ I 3 | the walker essentially performs a simple random walk step, since the edge e adjacent to v that is chosen to open during I 1 is a uniformly random edge. Now assume that the walker enters R core 1 (i, τ ) with R 1 (i, τ ) being a k max -great box; i.e., (X τ t 1 , τ t 1 ) ∈ R core 1 (i, τ ). We define the coupling we employ in this situation. Definition 5.12 (Coupling on great boxes). At time τ t 1 both walkers are trapped at some vertices v = X τ t 1 ∈ S core 1 (i) and v' = Φ τ t 1 (v). Then we perform the following steps.
1. Sample whether a simple random walk moment occurs in R 1 (i, τ ). If not, sample the updates of the edges in S 1 (i) during I 1 ∪ I 2 ∪ I 3 from the distribution conditioned on I SRWM (i,τ ) = 0, apply the coupling of the graphs from Section 5.2.1 and apply identity coupling for the walkers. Identity coupling succeeds since the graphs are coupled inside S 1 (i), and we obtain that Φ t does not change during I 1 ∪ I 2 ∪ I 3 . This concludes the coupling when I SRWM (i,τ ) = 0.
2. If I SRWM (i,τ ) = 1, choose a coordinate j ∈ {1, 2, . . . , d} and a sign s ∈ {−1, +1} uniformly at random. If v and v' agree in that coordinate, let e = (v, v + se j ) and e' = (v' , v' + se j ) be the edges chosen to open during I 1 in the configurations η and η' , respectively, where e 1 , e 2 , . . . , e d stands for the standard basis of Z d . In this case, during I 2 , we let the walkers perform the same jumps across e and e' (i.e., we perform identity coupling), and note that Φ t maps e into e' during this time. Then we couple the graphs using Φ t , as described in Section 5.2.1, until the end of I 3 . In this case, the map Φ t does not change during I 1 ∪ I 2 ∪ I 3 .
3. If I_{SRWM(i,τ)} = 1 and v and v′ do not agree in the jth coordinate, we set e = (v, v + se_j) and e′ = (v′, v′ − se_j). This is the most delicate case, as we will need to change the coupling of the graphs from the time e opens to the end of I_1 ∪ I_2 ∪ I_3. For this, we will use the map Φ̃, which maps v to v′ and is a translation map in all coordinates but the jth one, where it is a reflection map. In particular, Φ̃ maps e onto e′. Then the graphs will be coupled as in Remark 5.2; that is, the graphs are coupled as in Section 5.2.1 but using the map Φ̃ instead of Φ_t. Note that any update to e translates to an update of e′, so they open at the same time and close at the same time. Let ζ be the time that e and e′ open for the first time during I_1. Then, they remain open during [ζ, ζ + C_{11}/µ] since I_{SRWM(i,τ)} = 1. We couple the positions of the walkers at time ζ + C_{11}/µ as follows. Let δ be the probability that X_{ζ+C_{11}/µ} = v and 1 − δ be the probability that X_{ζ+C_{11}/µ} = u, where u is the other endpoint of e. Then we make X′_{ζ+C_{11}/µ} = Φ̃(X_{ζ+C_{11}/µ}) with probability min{δ, 1 − δ}; otherwise, we sample the two positions according to their conditional distributions. Then, we let the graphs and the walkers evolve up to the end of the interval I_1 ∪ I_2 ∪ I_3, coupling the jumps of the walkers so that they jump at the same times after time ζ + C_{11}/µ; note that the walkers do not move after e and e′ close for the first time after ζ + C_{11}/µ.

Now, let s = τt_1 + |I_1 ∪ I_2 ∪ I_3| be the end time of the simple random walk moment. Note that if SRWM occurs then ‖X_s − X′_s‖_1 may differ from ‖X_{τt_1} − X′_{τt_1}‖_1, and as a result the translation map Φ_s may differ from Φ_{τt_1} as well. So it could be the case that an edge that was coupled before the simple random walk moment (in the sense that its state agreed with the state of its image under Φ_{τt_1}) may get uncoupled because the map Φ changes. On the other hand, after I_1 all the edges in the box receive a -update.
So at the end of the SRWM, all edges in S_1(i) are coupled, with the only exception being the edges adjacent to the walkers, which are closed. So the configurations are coupled locally; in particular, B_s holds. Moreover, as R_1(i, τ) is great (so it is also good), the walkers will stay in S_1(i) for the whole time interval T_1(τ). In other words, we obtain that the edges in S_1(i), where the simple random walk moment occurred, are coupled after the simple random walk moment ends.
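The case analysis of Definition 5.12 (translated edges when the walkers agree in the chosen coordinate, reflected edges when they do not) can be illustrated by a toy sketch; the function and variable names below are ours, and this is an illustration of the edge choice only, not of the full coupling:

```python
import random

def srwm_edge_choice(v, v_prime, d, rng):
    # Toy version of cases 2-3 of the coupling on great boxes.  Draw a
    # uniform coordinate j and sign s; if the walkers agree in coordinate j
    # the two opened edges are translates of one another, otherwise the
    # second edge is the reflection of the first in the j-th coordinate.
    j = rng.randrange(d)
    s = rng.choice([-1, 1])

    def shift(u, t):
        # move vertex u by t in coordinate j
        return tuple(c + t * (k == j) for k, c in enumerate(u))

    e = (v, shift(v, s))
    if v[j] == v_prime[j]:
        return e, (v_prime, shift(v_prime, s)), "translation"
    return e, (v_prime, shift(v_prime, -s)), "reflection"

rng = random.Random(0)
e, e_prime, case = srwm_edge_choice((0, 0), (3, 0), 2, rng)
```

In the reflection case the jth increments of the two edges are opposite, which is what lets the jth coordinates of the walkers perform reflected random walk steps.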
More formally, we will implement this by assigning a "hidden" random variable to each 1-box, which tells whether the box will undergo a SRWM should the walker pass there. We will not attempt the above coupling at each great box the walker enters, since we do need a bit of time separation between two simple random walk moments because of the overlap of the boxes. But whenever we decide to attempt a simple random walk moment inside a great box the walker is in, the hidden random variable will tell whether SRWM occurs. The main point is that we can obtain a lower bound on P(I_{SRWM(i,τ)} = 1) that is uniform over the location of the walker at time τt_1. Because of this uniform bound, we can couple the outcome of the hidden variable with the evolution of the processes M_t and M′_t so that the simple random walk moment takes place, regardless of the location of the walker within the box. The hidden variable is just a Bernoulli random variable of parameter C_12 p^{(6d−1)/(6d)}, which is the bound we derive in Lemma 5.14 below, so the event of successfully performing a SRWM stochastically dominates the hidden variable. Whenever we decide to look at the hidden variable of a box, we perform the coupling described above. Otherwise, we just perform the identity coupling.
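The stochastic-domination coupling of the hidden variable with the SRWM event can be sketched concretely: one shared uniform drives both, so success of the hidden Bernoulli variable forces the event. A minimal Python illustration (function and parameter names are ours, and the numerical parameters are purely illustrative):

```python
import random

def coupled_hidden_and_event(event_prob, lower_bound, rng):
    # One shared uniform drives both the hidden Bernoulli(lower_bound)
    # variable and the target event of probability event_prob.  Since
    # lower_bound <= event_prob, {hidden = 1} is contained in
    # {event occurs}: the event stochastically dominates the hidden variable.
    assert 0.0 <= lower_bound <= event_prob <= 1.0
    u = rng.random()
    return u < lower_bound, u < event_prob

rng = random.Random(0)
samples = [coupled_hidden_and_event(0.3, 0.1, rng) for _ in range(100_000)]
# whenever the hidden variable succeeds, the event occurs
assert all(event for hidden, event in samples if hidden)
```

The same construction works when `event_prob` varies with the walker's location, as long as it always stays above `lower_bound`; this is exactly the role of the uniform bound of Lemma 5.14.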
Before establishing a bound on P(I_{SRWM(i,τ)} = 1), we need to show that the environments recouple locally after a SRWM.
Proof. R_1(i, τ) is k_max-great, and in particular 1-great. Thus, every 1-box R_1(j, τ) such that S_1(j) ∩ S^enl2_1(i) ≠ ∅ is good. After time s = τt_1 + |I_1 ∪ I_2 ∪ I_3|, the edges in S_1(i) are coupled and we start performing the identity coupling of the walkers. The coupling is successful, so Φ_t does not change from that moment onwards, and all edges in S_1(j) with S_1(j) ∩ S^enl2_1(i) ≠ ∅ receive a -update and couple.

Now we bound the probability of a SRWM. Recall from Definition 2.7 that the event that a box R_1(i, τ) is good is based on the events G_12(i, τ) and G_34(i, τ). Define J to be the set of all tuples (i, τ) such that R_1(i, τ) is a box of the tessellation of the second phase. Let Σ = {0, 1}^{2J} be the set of all possible assignments of occurrence or non-occurrence to the events G_12(i, τ) and G_34(i, τ). Then for each σ ∈ Σ and each (i, τ), the values σ_12(i, τ) and σ_34(i, τ) will be used to specify whether the events G_12(i, τ) and G_34(i, τ) occur, respectively. In this way, given σ ∈ Σ, we abuse notation and denote by σ the event that the realizations of G_12(i, τ) and G_34(i, τ) match the values of σ_12(i, τ) and σ_34(i, τ) for each (i, τ) ∈ J, and write P(· | σ) for the corresponding conditional probability. Note that once we condition on some σ ∈ Σ, which boxes of all scales are good or bad is a deterministic function of σ. Let F_t be the σ-algebra generated by the trajectory of the walker X_s and the value of the map Φ_s, s ∈ [0, t], and all the updates of the graph up to time t. Let Σ_{i,τ} ⊂ Σ be the set of assignments σ for which R_1(i, τ) is a k_max-great box.
Lemma 5.14. Let (i, τ) be such that R_1(i, τ) is a k_max-great box. There exist p_0 > 0 and C_12 > 0 such that for all p < p_0, all σ ∈ Σ_{i,τ}, and all F ∈ F_{τt_1} for which P(σ ∩ F) > 0, the probability of performing a simple random walk moment in R_1(i, τ) satisfies P(I_{SRWM(i,τ)} = 1 | σ ∩ F) ≥ C_12 p^{1 − 1/(6d)}.

Proof. Start with the following simplification of σ. Recall the definition of j(τ) from (2.8), so that j(τ) and j(τ+1) are the first and last intervals of the type T_1(·) inside T_1(τ). Recall that the σ_34(·, ·) correspond to the events G_34(·, ·), which are coupled with i.i.d. events; since the events G_34(·, ·) are independent of the events G_12(·, ·) by Lemma 2.4, the same holds for the coupled i.i.d. events. Moreover, for any i′, the event G_34(i′, τ) is independent of F_{τt_1}, since G_34(i′, τ) only considers updates on the edges during the interval T^core_1(τ) \ T_1(τ+1); and since for any fixed i′ the events G_34(i′, ·) are independent over disjoint time intervals, G_34(i′, τ) is indeed independent of F_{τt_1}. So now we collect in J_34 all tuples from J on which I_{SRWM(i,τ)} depends through σ_34(·, ·), and let S_34 denote the intersection of the corresponding events. We will not need to split the σ_12(·, ·) into two groups, since those events are already independent of I_{SRWM(i,τ)}. For any σ ∈ Σ, denote by S = S(σ) the event corresponding to the coordinates of σ outside J_34. Note that T^core_1(τ) \ T_1(τ+1) ⊃ I_1 ∪ I_2 ∪ I_3, so I_{SRWM(i,τ)} does not depend on F ∩ S given the position of the walker at time τt_1. Letting S̄_1(i) = ⋃_{u ∈ S^core_1(i)} B^∞_{log²}(u), which contains the places where the walker can be at time τt_1, we decompose P(I_{SRWM(i,τ)} = 1 | σ ∩ F) over the possible positions of the walker; this decomposition is (5.16). We start with the first term in (5.16); that is, we derive a lower bound on P(I_{SRWM(i,τ)} = 1 | X_{τt_1} = v). Since SRWM is composed of the events E_1, E_2 and E_3, which are independent of one another since they involve disjoint time intervals, we will derive a lower bound for each of them. For the event E_1, we will require that an edge adjacent to v (call it e) opens during the first half of I_1, so that e has time to remain open for time C_{11}/µ during I_1.
Recall that I_1 has length t_1/2, so its first half has length t_1/4; moreover, the rate at which an edge opens due to a -update is at least µ p_min, and the rate at which an edge closes due to a -update is at least µ(1 − p_max). We obtain a product bound for P(E_1 | X_{τt_1} = v) in which the first term corresponds to an edge adjacent to the walker (call it e) opening during the first half of I_1, the second term is the probability that all 2d − 1 edges adjacent to e are closed at that time, the third term is the probability that none of the 4d − 2 edges adjacent to e opens until the end of I_1, and the fourth term is the probability that e remains open for at least time C_{11}/µ. Recalling that t_1 = √ℓ/µ and that ℓ = p^{−1/(3d)}, the first term equals 1 − e^{−d p_min √ℓ/2}. Using that p ≤ 1 in the second term, p_min ∈ [p/(1+q), p] in the first and third terms, and p_max ≥ 0 in the fourth term, and then making p small enough so that p_min ≤ p_max ≤ 1/2 and e^{−d (p/(1+q)) √ℓ/2} ≤ 1 − d p √ℓ/(4(1+q)), we obtain a lower bound of order p√ℓ. Now note that p√ℓ = p^{1 − 1/(6d)} goes to 0 as p → 0. Thus, we can take p small enough so that the bound (5.17) holds. Event E_1 is the main one governing the probability that SRWM occurs, since it involves the opening of an edge, which has small probability. For E_2 and E_3 we will just derive simple bounds that do not go to 0 as p → 0. Recall that I_2 and I_3 are time intervals of length 1/µ, so

P(E_2 | X_{τt_1} = v) = (1 − e^{−µ(1−p_max)·(1/µ)}) e^{−µ p_min·(1/µ)} e^{−(4d−2)µ p_min·(1/µ)},

where the first term is the probability that e has a -update to close, the second term is the probability that e does not get a -update to open, and the final term is the probability that none of the 4d − 2 edges adjacent to e receives a -update to open. Recall that p_min and p_max both go to 0 as p → 0, so we obtain the constant lower bound (5.18). Regarding E_3, we obtain the bound (5.19), where the inequality follows for all small enough p since 1 − p_max → 1 and p_min → 0 as p → 0.
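As a sanity check on the displayed bound for P(E_2 | X_{τt_1} = v), note that the factors of µ cancel against the interval length 1/µ, so the bound depends only on p_min, p_max and d. The following sketch (our own helper name) evaluates the product and confirms that it tends to the constant 1 − e^{−1} as p_min, p_max → 0, i.e. as p → 0:

```python
import math

def e2_lower_bound(p_min, p_max, d):
    # Product of the three factors bounding P(E_2 | X = v):
    # e closes during I_2 (length 1/mu, closing rate mu(1 - p_max)),
    # e does not reopen (opening rate mu * p_min), and none of the
    # 4d - 2 adjacent edges opens.  The rates mu cancel against the
    # interval length 1/mu.
    return (1 - math.exp(-(1 - p_max))) \
        * math.exp(-p_min) \
        * math.exp(-(4 * d - 2) * p_min)

# as p -> 0 we have p_min, p_max -> 0, and the bound tends to 1 - 1/e,
# a constant that does not vanish with p (roughly 0.632)
limit = e2_lower_bound(0.0, 0.0, 2)
```

This matches the claim in the proof that E_2 (and similarly E_3) only contributes a constant factor, while the p-dependence of the SRWM probability comes entirely from E_1.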
Putting (5.17), (5.18) and (5.19) together, we have a constant c = c(d, q) so that for all small enough p we obtain

P(I_{SRWM(i,τ)} = 1 | X_{τt_1} = v) ≥ c p^{1 − 1/(6d)}.

Plugging the bound above into (5.16), it remains to bound the probability of S_34 from below. As we explained in the beginning of the proof, S_34 is independent of F_{τt_1} and of S. Moreover, S_34 is composed of an intersection of independent events G_34(·, τ), since σ ∈ Σ_{i,τ} so that R_1(i, τ) is a k_max-great box. Therefore P(S_34) is bounded from below, where the last inequality follows from Lemma 2.6. Since as p → 0 the second term in (5.16) is much smaller than the first one, the lemma follows.

Concluding the second phase
Recall that for simplicity we are assuming that (X_0, 0) = (0, 0), and recall the value of ∆_2 from (5.2). Denote by Id : V → V the identity map; then we define F_2 = {Φ_{∆_2} = Id} ∩ B_{∆_2}. If F_2 is verified, the second phase is successful and the third phase can start; otherwise, we let the two processes evolve independently until the end of phase 3, and only then restart the coupling from phase 1.
Lemma 5.15. Assume F_1 is verified at time 0. For any δ > 0 and for all p small enough, there exist C_10 = C_10(d, p, δ) > 0 in the definition of ∆_2 and n_0 < ∞ such that for all n > n_0, P(F_2) ≥ 1 − δ.
Proof. Let P be any feasible path and consider

Υ^P_1 = inf{τ > 0 : (P(τt_1), τt_1) ∈ R^core_1(i, τ) where R_1(i, τ) is a k_max-great box},
Υ^P_j = inf{τ > Υ^P_{j−1} : (P(τt_1), τt_1) ∈ R^core_1(i, τ) where R_1(i, τ) is a k_max-great box}, for j ≥ 2.

For any feasible path P we let κ_P be the largest value such that Υ^P_{κ_P} ≤ ∆_2/t_1 − 2. Recall that Σ represents the set of all possible realizations of occurrences and non-occurrences of the events G_12(·, ·) and G_34(·, ·), so that the good and bad boxes at all scales are deterministic functions of σ. Let F(σ) be the set of all feasible paths for a given σ. Given the uniform bound from Lemma 5.14, we let Y_1, Y_2, . . . be a sequence of i.i.d. Bernoulli random variables of parameter C_12 p^{1 − 1/(6d)}, where Y_j tells whether the jth SRWM will succeed when we try to perform it during the coupling. Let ζ = C_8 ∆_2/(4 t_1), where C_8 is from Lemma 2.25 and C_10 is from the definition of ∆_2 in (5.2). Define the events E_1 = {∑_{j=1}^{ζ} Y_j ≥ c_0 n²}, with c_0 to be chosen later, and E_2 = {κ_P ≥ ζ for all feasible paths P ∈ F(σ)}.
In this stage we want to couple the positions of the walkers. From Lemma 5.8, by doing the identity coupling whenever the walkers are not in a great box, their relative distance does not change. Their relative distance changes only when they are in a great box and a simple random walk moment is successfully performed. Let E_coup = {Φ_{∆_2} = Id} ∩ B_{∆_2}. Hence

P(F^c_2) ≤ P((0, 0) is not k_max-great) + P(E_1 ∩ E_2 ∩ E^c_coup) + P(E^c_1) + P(E^c_2).
We start by bounding the first term. Notice that {(0, 0) is not k_max-great} does not depend on the configuration at time 0. Moreover, at time 0 the walkers are stuck at a vertex, so X_s has to leave R_1(0, 0) through the time boundary if R_1(0, 0) is a good box. Using Lemmas 2.8 and 2.15 to bound ρ_j, we obtain a bound in which c_d is a constant that counts the number of boxes whose 2-enlargement intersects R_1(0, 0), and where the last inequality follows for all p small enough. Next we bound P(E_1 ∩ E_2 ∩ E^c_coup). Under E_1 ∩ E_2, we know we performed at least c_0 n² simple random walk moments. So P(E_1 ∩ E_2 ∩ E^c_coup) can be bounded by the probability that two random walkers performing SRW on T^d_n are not coupled after c_0 n² steps. Taking c_0 = c_0(d, δ) large enough, we obtain that they have coupled with probability at least 1 − δ/4. Next, we bound P(E^c_2). From Lemma 5.1, with probability at least 1 − ρ_1^{2^{k_max − 3}}, all k_max-boxes in the tessellation are good. Thus, Lemma 2.25 gives that while traversing the first good k_max-box any feasible path will traverse at least C_8 t_{k_max}/t_1 k_max-great 1-boxes. After the feasible path exits the first k_max-great box, it enters another one, and we obtain again another set of k_max-great 1-boxes. The total number of times we can iterate this procedure up to reaching time ∆_2 is ∆_2/(2 t_{k_max}) − 1 ≥ ∆_2/(4 t_{k_max}). Therefore, any feasible path must traverse at least C_8 (t_{k_max}/t_1) · ∆_2/(4 t_{k_max}) = ζ many k_max-great 1-boxes, and the desired bound on P(E^c_2) follows by simply taking n large enough.
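The step "two random walkers performing SRW on T^d_n couple after order n² steps" can be illustrated by a one-dimensional toy simulation; this is an illustration only, not the coupling used in the proof, and the function name and parameters are ours:

```python
import random

def coupling_time_on_cycle(n, x0, y0, rng):
    # Toy 1-d analogue of coupling two simple random walks on the torus:
    # at each step a fair coin picks which walker jumps, and the chosen
    # walker moves +-1 uniformly.  Each walker is then a (lazy) SRW on the
    # n-cycle, the difference walk is a +-1 SRW on Z_n, and the walkers
    # meet after O(n^2) steps in expectation.
    x, y, steps = x0 % n, y0 % n, 0
    while x != y:
        step = rng.choice([-1, 1])
        if rng.random() < 0.5:
            x = (x + step) % n
        else:
            y = (y + step) % n
        steps += 1
    return steps

rng = random.Random(1)
times = [coupling_time_on_cycle(16, 0, 8, rng) for _ in range(200)]
avg_time = sum(times) / len(times)  # close to 8 * (16 - 8) = 64 on average
```

For two antipodal starting points on the n-cycle the expected meeting time of the difference walk is k(n − k) = (n/2)², consistent with the n² scaling invoked in the proof.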
Finally, we bound P(E^c_1). This is a simple Chernoff bound for the sum of independent Bernoulli random variables with P(Y_j = 1) = C_12 p^{1 − 1/(6d)}. The expected value of ∑_{j=1}^{ζ} Y_j is ζ C_12 p^{1 − 1/(6d)}; we take C_10 large enough so that this expectation is larger than 2 c_0 n², which gives a constant c so that P(E^c_1) ≤ e^{−c n²} ≤ δ/4, where the last inequality follows by taking n large.
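The Chernoff step can be made explicit as follows (a generic statement; the identification of the parameters with ζ, C_12 and c_0 above is ours): for i.i.d. Bernoulli(b) variables Y_1, . . . , Y_m with mb ≥ 2 c_0 n²,

```latex
\mathbb{P}\Big(\sum_{j=1}^{m} Y_j \le c_0 n^2\Big)
  \;\le\; \mathbb{P}\Big(\sum_{j=1}^{m} Y_j \le \tfrac{1}{2}\,mb\Big)
  \;\le\; \exp\!\big(-\tfrac{mb}{8}\big)
  \;\le\; \exp\!\big(-\tfrac{c_0 n^2}{4}\big),
```

where the middle inequality is the multiplicative Chernoff lower-tail bound P(X ≤ (1 − δ)E[X]) ≤ exp(−δ²E[X]/2) with δ = 1/2.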
To conclude the second phase, once the walkers are coupled after a SRWM, we just perform the identity coupling up to time ∆_2. If t is a time at which a SRWM ended, Lemma 5.13 gives that B_t holds. So we succeed in performing the identity coupling up to time ∆_2 by Lemmas 5.6, 5.7 and 5.8.

Third Phase
The third phase starts at time ∆_2, at which time the walkers are coupled and B_{∆_2} holds. During the third phase we let M′_t mimic the evolution of M_t by doing the identity coupling on both the motion of the walkers and the updates of the edges. We then check whether the processes are fully coupled by time ∆_3 = ∆_2 + n²/µ. Define

F_3 = {X_{∆_3} = X′_{∆_3} and η_{∆_3}(e) = η′_{∆_3}(e) for all e ∈ E(T^d_n)}. (6.1)

If F_3 is not verified, we restart the coupling at time ∆_3 from phase 1.
Lemma 6.1. For any δ > 0, if p is small enough and n is large enough, we obtain P(F_3) ≥ 1 − δ.

Proof. Recall that the boxes contained in [0, ∆_3] have been sampled as good or bad during the second phase. By Lemmas 5.6, 5.7 and 5.8, the identity coupling is successful provided we cannot enter a bad box without first entering its 2-enlargement. Therefore, for the walkers to get uncoupled during [∆_2, ∆_3], it must happen that the walkers entered a bad box of some scale k whose 2-enlargement intersects [∆_2, ∆_3] and which was not observed during the second phase because it is not contained in [0, ∆_3]. We now count the number of such boxes.
We start by deriving bounds on k and on t_k, the size of the boxes of scale k, for which the above can happen. When k ≥ k_max, we can choose n large enough so that, for any fixed m, the following holds. Recall that k_max = log_2 log n from (5.1). Then, µ t_{k_max} ≤ √k