Higher order corrections for anisotropic bootstrap percolation

We study the critical probability for the metastable phase transition of the two-dimensional anisotropic bootstrap percolation model with $(1,2)$-neighbourhood and threshold $r = 3$. The first order asymptotics for the critical probability were recently determined by the first and second authors. Here we determine the following sharp second and third order asymptotics:
$$\begin{aligned} p_c\big( [L]^2, \mathcal{N}_{(1,2)}, 3 \big) \;=\; & \frac{(\log \log L)^2}{12\log L} - \frac{\log \log L \, \log \log \log L}{3\log L} \\ & + \frac{\left( \log \frac{9}{2} + 1 \pm o(1) \right) \log \log L}{6\log L}. \end{aligned}$$
We note that the second and third order terms are so large that the first order asymptotics fail to approximate $p_c$ even for lattices of size well beyond $10^{10^{1000}}$.


Motivation and statement of the main result
Bootstrap percolation is a general name for the dynamics of monotone, two-state cellular automata on a graph G. Since their invention by Chalupa et al. [20], bootstrap percolation models with different rules and on different graphs have been applied in various contexts, and the mathematical properties of bootstrap percolation are an active area of research at the intersection of probability theory and combinatorics. See for instance [1,2,4,5,8,23,33] and the references therein.
Motivated by applications to statistical (solid-state) physics such as the Glauber dynamics of the Ising model [26,36] and kinetically constrained spin models [17], the underlying graph is often taken to be a d-dimensional lattice, and the initial state is usually chosen randomly.
Although some progress has recently been made in the study of very general cellular automata on lattices [12,14,25], attention so far has mainly focused on obtaining a very precise understanding of the metastable transition for specific simple models [4,8,9,18,23,30,33].
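In this paper we will provide the most detailed description so far for such a model; namely, the so-called anisotropic bootstrap percolation model, defined as follows. First, given a finite set $\mathcal{N} \subset \mathbb{Z}^d \setminus \{0\}$ (the neighbourhood) and an integer $r$ (the threshold), define the bootstrap operator
$$\mathcal{B}(S) := S \cup \big\{ v \in \mathbb{Z}^d : |(v + \mathcal{N}) \cap S| \ge r \big\} \qquad (1.1)$$
for every set $S \subset \mathbb{Z}^d$. That is, viewing $S$ as the set of "infected" sites, every site $v$ that has at least $r$ infected "neighbours" in $v + \mathcal{N}$ becomes infected by the application of $\mathcal{B}$. For $t \in \mathbb{N}$, let $\mathcal{B}^{(t)}(S) := \mathcal{B}(\mathcal{B}^{(t-1)}(S))$ with $\mathcal{B}^{(0)}(S) := S$, and let $\langle S \rangle := \lim_{t \to \infty} \mathcal{B}^{(t)}(S)$ denote the set of eventually infected sites. For $\Lambda \subset \mathbb{Z}^d$, if $\Lambda \subset \langle S \cap \Lambda \rangle$ then we say that $S$ percolates on $\Lambda$. Writing $\mathbb{P}_p$ to indicate that the elements of $S$ are chosen independently at random with probability $p$, we define the critical probability on $\Lambda$ to be
$$p_c(\Lambda, \mathcal{N}, r) := \inf\big\{ p > 0 : \mathbb{P}_p(S \text{ percolates on } \Lambda) \ge 1/2 \big\}. \qquad (1.2)$$
We remark that since we will usually expect the probability of percolation to undergo a sharp transition around $p_c$, the choice of the constant $1/2$ in the definition (1.2) is not significant.
The anisotropic bootstrap percolation model is a specific two-dimensional process in the family described above. To be precise, set $d = 2$, $r = 3$ and
$$\mathcal{N}_{(1,2)} := \big\{ (-2,0), (-1,0), (0,-1), (0,1), (1,0), (2,0) \big\}.$$
Our main result is the following.
Theorem 1.1¹
$$p_c\big( [L]^2, \mathcal{N}_{(1,2)}, 3 \big) = \frac{(\log \log L)^2}{12\log L} - \frac{\log \log L \, \log \log \log L}{3\log L} + \frac{\left( \log \frac{9}{2} + 1 \pm o(1) \right) \log \log L}{6\log L}. \qquad (1.3)$$
¹ Throughout this paper we will use the standard Landau order notation: either for all $x$ sufficiently large or sufficiently small, depending on the context, $f(x) = O(g(x))$ if $|f(x)| \le C |g(x)|$ for some constant $C > 0$, $f(x) = \Omega(g(x))$ if $g(x) = O(f(x))$, $f(x) = \Theta(g(x))$ if $f(x) = O(g(x))$ and $f(x) = \Omega(g(x))$, and $f(x) = o(g(x))$ if $f(x)/g(x) \to 0$.
To put this theorem in context, let us recall some of the previous results obtained for bootstrap processes in two dimensions. The archetypal example of a bootstrap percolation model is the "two-neighbour model", that is, the process with neighbourhood $\mathcal{N}_{(1,1)} := \{ (-1,0), (0,-1), (0,1), (1,0) \}$ and $r = 2$. The strongest known bounds are due to Gravner, Holroyd, and Morris [28,30,37], who, building on work of Aizenman and Lebowitz [4] and Holroyd [33], proved that
$$p_c\big( [L]^2, \mathcal{N}_{(1,1)}, 2 \big) = \frac{\pi^2}{18 \log L} - \Theta\left( \frac{1}{(\log L)^{3/2}} \right). \qquad (1.4)$$
The anisotropic model was first studied by Gravner and Griffeath [27] in 1996. In 2007, the second and third authors [41] determined the correct order of magnitude of $p_c$. More recently, the first and second authors [23] proved that the anisotropic model exhibits a sharp threshold by determining the first term in (1.3).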
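The definitions above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: it assumes that the $(1,2)$-neighbourhood consists of the two nearest sites on each side horizontally and the nearest site above and below, that $r = 3$, and that the dynamics are restricted to a finite box. The names `N12`, `bootstrap_step`, `closure` and `percolates` are illustrative choices, not notation from the paper.

```python
import random

# Assumed (1,2)-neighbourhood: the two nearest sites on each side horizontally,
# and the nearest site above and below.
N12 = [(-2, 0), (-1, 0), (1, 0), (2, 0), (0, -1), (0, 1)]


def bootstrap_step(infected, box, neighbourhood=N12, r=3):
    """Apply the bootstrap operator once; only sites of `box` may become infected."""
    newly = {
        (x, y)
        for (x, y) in box
        if (x, y) not in infected
        and sum((x + dx, y + dy) in infected for dx, dy in neighbourhood) >= r
    }
    return infected | newly


def closure(initial, box, neighbourhood=N12, r=3):
    """Iterate the operator until nothing changes; return the eventually infected set."""
    infected = set(initial)
    while True:
        nxt = bootstrap_step(infected, box, neighbourhood, r)
        if nxt == infected:
            return infected
        infected = nxt


def percolates(L, p, rng):
    """Sample a p-random subset of the L x L box and test whether it fills the box."""
    box = [(x, y) for x in range(L) for y in range(L)]
    initial = {v for v in box if rng.random() < p}
    return set(box) <= closure(initial, box)
```

Averaging `percolates(L, p, random.Random(seed))` over many seeds gives a Monte Carlo estimate of the percolation probability at density `p`; the value of `p` at which this estimate crosses 1/2 is a numerical proxy for the critical probability on the box.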
The "Duarte model" is another anisotropic model that has been studied extensively [13,22,38]. The Duarte model has neighbourhood $\mathcal{N}_{\mathrm{Duarte}} = \{ (-1,0), (0,-1), (0,1) \}$ and $r = 2$. The sharpest known bounds here are due to Bollobás et al. [13]:
$$p_c\big( [L]^2, \mathcal{N}_{\mathrm{Duarte}}, 2 \big) = \left( \frac{1}{8} + o(1) \right) \frac{(\log \log L)^2}{\log L}.$$
Although the Duarte model has the same first order asymptotics for $p_c$ as the anisotropic model (up to the constant), the behaviour is very different. In particular, the Duarte model has a "drift" to the right: clusters grow only vertically and to the right. This asymmetry has severe consequences for the analysis of the model (especially for the shape of critical droplets).
The "$r$-neighbour model" in $d$ dimensions generalises the standard (two-neighbour) model described above. In this model, a vertex of $\mathbb{Z}^d$ is infected by the process as soon as it acquires at least $r$ already-infected nearest neighbours. Building on work of Aizenman and Lebowitz [4], Schonmann [40], Cerf and Cirillo [18], Cerf and Manzo [19], Holroyd [33] and Balogh et al. [9,10], the following sharp threshold result for all non-trivial pairs $(d, r)$ was obtained by Balogh et al. [8]: for every $d \ge r \ge 2$, there exists an (explicit) constant $\lambda(d, r) > 0$ such that
$$p_c\big( [L]^d, \mathcal{N}^d_{(1,\dots,1)}, r \big) = \left( \frac{\lambda(d,r) + o(1)}{\log_{(r-1)} L} \right)^{d-r+1}.$$
(Here, and throughout the paper, $\log_{(k)}$ denotes a $k$-times iterated logarithm.) Finally, we remark that much weaker bounds (differing by a large constant factor) have recently been obtained for an extremely general class of two-dimensional models by Bollobás et al. [12], see Sect. 1.3, below. Moreover, stronger bounds (differing by a factor of $1 + o(1)$) were proved for a certain subclass of these models (including the two-neighbour model, but not the anisotropic model) by Duminil-Copin and Holroyd [25].
Although various other specific models have been studied (see e.g. [15,16,34]), in each case the authors fell very far short of determining the second term.

The bootstrap percolation paradox
In [33], Holroyd determined sharp first order bounds on $p_c$ for the standard model for the first time, and observed that they were very far removed from numerical estimates: $\pi^2/18 \approx 0.55$, while the same constant was numerically determined to be $0.245 \pm 0.015$ on the basis of simulations of lattices up to $L = 28800$ [3]. This phenomenon became known in the literature as the bootstrap percolation paradox, see e.g. [2,21,28,31].
An attempt to explain this phenomenon goes as follows: if the convergence of p c to its first-order asymptotic value is extremely slow, while for any fixed L the transition around p c is very sharp, then it may appear that p c converges to a fixed value long before it actually does.
This indeed appears to be the case. The first rigorisation of the "extremely slow convergence" part of this argument appears in [28], for a model related to bootstrap percolation. Theorem 1.1 gives another unambiguous illustration of extremely slow convergence for a bootstrap percolation model: the second term in (1.3) is actually larger than the first while $4 \log \log \log L > \log \log L$, which holds for all $L$ in the range $66 < L < 10^{2390}$. Moreover, the second term does not become negligible (smaller than 1% of the first term, say) until $L > 10^{10^{1403}}$. On relatively small lattices, even the third term makes a significant contribution to $p_c$: it is larger than the first term when $L < 10^{60}$ and larger than the second term when $L < 10^{13}$.
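The comparison of the terms of (1.3) can be explored numerically. The sketch below is only an illustration: it takes $\log_{10} L$ as input (so that very large lattices do not overflow) and returns the ratios of the second and third terms to the first, which simplify to $4 \log\log\log L / \log\log L$ and $2(\log\frac{9}{2} + 1)/\log\log L$ respectively. The function name `term_ratios` is hypothetical.

```python
import math


def term_ratios(log10_L):
    """Ratios (second/first, third/first) of the terms of (1.3) at L = 10**log10_L.

    Requires L > e so that log log L is positive.
    """
    log_L = log10_L * math.log(10)
    ll = math.log(log_L)           # log log L
    lll = math.log(ll)             # log log log L
    c = math.log(4.5) + 1          # the constant log(9/2) + 1
    second_over_first = 4 * lll / ll
    third_over_first = 2 * c / ll
    return second_over_first, third_over_first
```

For example, `term_ratios(100)` evaluates the ratios for a lattice of side $10^{100}$, where the second term still exceeds the first.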
The "sharp transition" part of the argument has also been made rigorous: for the standard model, an application of the Friedgut-Kalai sharp-threshold theorem [7] tells us that the "ε-window of the transition" 2 is So the ε-window is much smaller than the second order asymptotics in (1.4).
For the anisotropic model a similar analysis [11] yields that the ε-window satisfies which is again much smaller than the second and third order asymptotics in Theorem 1.1. So our analysis supports the above explanation of the bootstrap percolation paradox.

Universality
Recently, a very general family of bootstrap-type processes was introduced and studied by Bollobás et al. [14]. To define this family, let $\mathcal{U} = \{X_1, \dots, X_m\}$ be a finite collection of finite subsets of $\mathbb{Z}^d \setminus \{0\}$, and define the corresponding bootstrap operator by setting
$$\mathcal{B}(S) := S \cup \big\{ v \in \mathbb{Z}^d : v + X \subset S \text{ for some } X \in \mathcal{U} \big\}$$
for every set $S \subset \mathbb{Z}^d$. It is not hard to see that all of the bootstrap processes described above can be encoded by such an 'update family' $\mathcal{U}$, and in fact this definition is substantially more general. The key discovery of [14] was that in two dimensions the class of such monotone cellular automata can be elegantly partitioned³ into three classes, each with completely different behaviour. More precisely, for every two-dimensional update family $\mathcal{U}$, one of the following holds:
• $\mathcal{U}$ is "supercritical" and has polynomial critical probability.
• $\mathcal{U}$ is "critical" and has poly-logarithmic critical probability.
• $\mathcal{U}$ is "subcritical" and has critical probability bounded away from zero.
We emphasise that the first two statements were proved in [14], but the third was proved slightly later, by Balister et al. [6]. Note that the critical class includes the two-neighbour, anisotropic and Duarte models (as well as many others, of course). For this class a much more precise result was recently obtained by Bollobás et al. [12]. In order to state this result, let us first (informally) define a two-dimensional update family to be "balanced" if its growth is asymptotically two-dimensional⁴ (like that of the two-neighbour model), and "unbalanced" if its growth is asymptotically one-dimensional (like that of the anisotropic and Duarte models). The following theorem was proved in [12].
Theorem 1.2 Let $\mathcal{U}$ be a critical two-dimensional bootstrap percolation update family. There exists $\alpha = \alpha(\mathcal{U}) \in \mathbb{N}$ such that the following holds:⁵
(a) If $\mathcal{U}$ is balanced, then $p_c\big( \mathbb{Z}_L^2, \mathcal{U} \big) = \Theta\left( \frac{1}{\log L} \right)^{1/\alpha}$.
(b) If $\mathcal{U}$ is unbalanced, then $p_c\big( \mathbb{Z}_L^2, \mathcal{U} \big) = \Theta\left( \frac{(\log \log L)^2}{\log L} \right)^{1/\alpha}$.
Theorem 1.2 thus justifies our view of the anisotropic model as a canonical example of an unbalanced model.

Internally filling a critical droplet
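As usual in (critical) bootstrap percolation, the key step in the proof of Theorem 1.1 will be to obtain very precise bounds on the probability that a "critical droplet" $R$ is internally filled⁶ (IF), i.e., that $R \subset \langle S \cap R \rangle$. We will prove the following bounds:
Theorem 1.3 Let $p > 0$ and $x, y \in \mathbb{N}$ be such that $1/p^2 \le x \le 1/p^5$ and $\frac{1}{3p} \log \frac{1}{p} \le y \le \frac{1}{p} \log \frac{1}{p}$, and let $R$ be an $x \times y$ rectangle. Then
$$\mathbb{P}_p(R \text{ is IF}) = \exp\left( -\frac{1}{6p} \Big( \log \frac{1}{p} \Big)^2 + \Big( \frac{1}{3} \log \frac{8}{3e} \pm o(1) \Big) \frac{1}{p} \log \frac{1}{p} \right).$$
The alert reader may have noticed the following surprising fact: we obtain the first three terms of $p_c([L]^2, \mathcal{N}_{(1,2)}, 3)$ in Theorem 1.1, despite only determining the first two terms of $\log \mathbb{P}_p(R \text{ is IF})$ in Theorem 1.3. We will show how to formally deduce Theorem 1.1 from Theorem 1.3 in Sect. 7, but let us begin by giving a brief outline of the argument.
⁵ Here $\mathbb{Z}_L^2$ denotes the discrete two-dimensional $L \times L$ torus, and $p_c(\mathbb{Z}_L^2, \mathcal{U})$ is defined as in (1.2). We consider the torus since in general undesirable complications may arise due to boundary effects or strongly asymmetrical growth.
⁶ This notion is often referred to as "internally spanned" (especially in the older literature).
To slightly simplify the calculations, let us write $C := \frac{1}{3} \log \frac{8}{3e}$. We claim (and will later prove) that $p_c = p_c([L]^2, \mathcal{N}_{(1,2)}, 3)$ is essentially equal to the value of $p$ for which the expected number of internally filled critical droplets in $[L]^2$ is equal to 1 (the idea being that a critical droplet with size as in Theorem 1.3 will keep growing indefinitely with probability very close to one). Up to lower order terms, we therefore have
$$2 \log L \approx \frac{1}{6 p_c} \Big( \log \frac{1}{p_c} \Big)^2 - \frac{C}{p_c} \log \frac{1}{p_c},$$
and hence
$$p_c \approx \frac{1}{12 \log L} \Big( \log \frac{1}{p_c} \Big)^2 - \frac{C}{2 \log L} \log \frac{1}{p_c}.$$
Iterating the right-hand side gives
$$p_c \approx \frac{1}{12 \log L} \Big( \log \log L - 2 \log \log \frac{1}{p_c} + \log 12 \Big)^2 - \frac{C}{2 \log L} \log \log L.$$
Upon using the approximation $\log \log (1/p_c) \approx \log \log \log L$ and multiplying out, this reduces to
$$p_c \approx \frac{(\log \log L)^2}{12 \log L} - \frac{\log \log L \, \log \log \log L}{3 \log L} + \frac{(\log 12 - 3C) \log \log L}{6 \log L},$$
which is what we hope to prove, since $\log 12 - 3C = \log \frac{9}{2} + 1$. Thus we obtain three terms for the price of two.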
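The iteration described above can also be carried out numerically. The following sketch rests on stated assumptions: it takes the two-term form of $\log \mathbb{P}_p(R \text{ is IF})$ with constants $\frac{1}{6}$ and $\frac{1}{3}\log\frac{8}{3e}$, ignores all error terms, and solves the resulting heuristic equation $2 \log L = \frac{1}{6p}(\log\frac{1}{p})^2 - \frac{C}{p}\log\frac{1}{p}$ by fixed-point iteration. It is an illustration of the heuristic, not a computation of $p_c$; the function names are hypothetical.

```python
import math

# Assumed second-order constant C = (1/3) log(8 / (3e)).
C = math.log(8 / (3 * math.e)) / 3


def heuristic_pc(log_L, iterations=200):
    """Fixed point of p = (log(1/p)^2 - 6 C log(1/p)) / (12 log L)."""
    p = 1.0 / log_L  # crude starting guess
    for _ in range(iterations):
        ell = math.log(1 / p)
        p = (ell**2 - 6 * C * ell) / (12 * log_L)
    return p


def one_and_three_term(log_L):
    """First-order and third-order approximations from the expansion of p_c."""
    ll = math.log(log_L)
    lll = math.log(ll)
    first = ll**2 / (12 * log_L)
    second = ll * lll / (3 * log_L)
    third = (math.log(4.5) + 1) * ll / (6 * log_L)
    return first, first - second + third
```

Comparing `heuristic_pc(log_L)` with the two values returned by `one_and_three_term(log_L)` for a large value of `log_L` shows how much of the gap between the first-order term and the solution of the heuristic equation is closed by the second and third terms.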

A generalisation of the anisotropic model
One natural way to generalise the anisotropic model is to consider, for each $b > a \ge 1$, the neighbourhood
$$\mathcal{N}_{(a,b)} := \big\{ (0, \pm 1), \dots, (0, \pm a) \big\} \cup \big\{ (\pm 1, 0), \dots, (\pm b, 0) \big\}.$$
It follows from Theorem 1.2 that⁷ The arguments developed in [23] can be applied to prove that the leading order behaviour of $p_c$ for the $(1, b)$-model is⁸
⁷ The value of $\alpha$ follows from [12, Definition 1.2]. Furthermore, if $r \le b$ then the model is supercritical, so $p_c(\mathbb{Z}_L^2, \mathcal{N}_{(a,b)}, r) \le L^{-c}$ for some $c > 0$, and if $r > a + b$ then the model is subcritical, so $p_c(\mathbb{Z}_L^2, \mathcal{N}_{(a,b)}, r) > c$ for some $c > 0$.
⁸ This is contrary to the claim in [23, Sect. 1]. See also [24], the erratum to [23].
Combining the techniques of [23] with those introduced in this paper, it is possible to prove the following stronger bounds: Note that in the case $b = 2$ this reduces to Theorem 1.1. We remark that Theorem 1.4 follows from a corresponding generalisation of Theorem 1.3, with the constants $\frac{1}{6}$ and $\frac{1}{3} \log \frac{8}{3e}$ replaced by respectively. We will not prove Theorem 1.4, since the proof is conceptually the same as that in the case $b = 2$, but requires several straightforward but lengthy calculations that might obscure the key ideas of the proof. It is, however, not too hard to see where the numerical factors come from: a droplet grows horizontally in the $(1, b)$-model as long as the $b + 1$ consecutive columns to its left and/or right contain at least one infected site. And it grows vertically as long as there are $b$ sites in a "growth configuration" somewhere above and/or below. There are

Comparison with simulations
One might be tempted to hope that the third-order approximation of $p_c$ in Theorem 1.1 is reasonably good already for lattices that a computer might be able to handle. Simulations indicate that this is not the case. Indeed, for lattices with $L \le 10{,}000$, the third-order approximation is even farther from the simulated values than the first-order approximation (and recall that the second-order approximation is negative here). We believe that this should not be surprising, because it is not at all obvious that the fourth order term should be significantly smaller: careful inspection of our proof suggests that the $o\big( \frac{1}{p} \log \frac{1}{p} \big)$ term in Theorem 1.3 is at most $O\big( \frac{1}{p} \log \log \frac{1}{p} \big)$. Although we do not prove this, we have no reason to believe that a correction term of that order does not exist. Even if we suppose that the third order correction in Theorem 1.3 can be sharply bounded by $C_3 / p$, say, so that the bound on $\mathbb{P}_p(R \text{ is internally filled})$ for critical droplets would carry this third term instead, then a computation like the one in Sect. 1.4 above suggests that this would yield
$$\begin{aligned} p_c\big( [L]^2, \mathcal{N}_{(1,2)}, 3 \big) = {} & \frac{(\log \log L)^2}{12 \log L} - \frac{(\log \log L)(\log \log \log L)}{3 \log L} + \frac{\left( \log \frac{9}{2} + 1 \right) \log \log L}{6 \log L} \\ & + \frac{(\log \log \log L)^2}{3 \log L} - \frac{\left( \log \frac{9}{2} + 1 \right) \log \log \log L}{3 \log L} + \frac{C_3 + \log \frac{1}{12} \left( 2 + \log \frac{27}{16} \right) \pm o(1)}{12 \log L}, \end{aligned}$$
so the fourth, fifth, and sixth order terms of $p_c$ would also be comparable to the first for moderately sized lattices. Moreover, because of the extremely slow decay of these correction terms (e.g. $(\log \log 10^{10})^2 \approx 10$), it might be too optimistic to expect that one would be able to determine $C_3$ by fitting to the simulated values of $p_c$, if indeed $C_3$ exists.
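To get a feeling for the sizes involved on lattices that can be simulated, the three successive approximations from Theorem 1.1 can simply be evaluated. The sketch below drops the $o(1)$ term and is meant only as a numerical illustration; the function name `pc_approximations` is hypothetical.

```python
import math


def pc_approximations(L):
    """First-, second- and third-order approximations of p_c from (1.3), without o(1).

    Requires L > e so that log log L is positive.
    """
    log_L = math.log(L)
    ll = math.log(log_L)            # log log L
    lll = math.log(ll)              # log log log L
    t1 = ll**2 / (12 * log_L)
    t2 = ll * lll / (3 * log_L)
    t3 = (math.log(4.5) + 1) * ll / (6 * log_L)
    return t1, t1 - t2, t1 - t2 + t3
```

Calling `pc_approximations(10_000)` shows that the second-order approximation is negative at this lattice size, while the third term is larger than either of the first two.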

Comparison with the two-neighbor model
Comparing Theorem 1.1 with the analogous result for the two-neighbor model, (1.4), it may seem remarkable how much sharper the former is than the latter. We believe the following heuristic discussion goes some way towards explaining this difference. Both approximations of $p_c$ are proved using essentially the same critical droplet heuristic described above. Once a critical droplet has formed, the entire lattice will easily fill up. But filling a droplet-sized area is exponentially unlikely: it is essentially a large deviations event. The theory of large deviations tells us that if a rare event occurs, it will occur in the most probable way that it can. For filling a droplet, this means that one should find an optimal "growth trajectory": a sequence of dimensions from which a very small infected area (a "seed") steadily grows to fill up the entire droplet. For the anisotropic model, in [23], the first and second authors determined this trajectory to be close to $x = \frac{e^{3py}}{3p}$, where $x$ and $y$ denote the horizontal and vertical dimensions of the seed as it grows. This approximation was enough to yield the first term of $p_c$. In the current paper we establish tighter bounds on the optimal trajectory around $x = \frac{e^{3py}}{3p}$, allowing us to give the sharper estimate for the probability of filling a droplet in Theorem 1.3. As we showed in Sect. 1.4 above, this correction is enough to obtain the first three terms of $p_c$ for the anisotropic model.
For the two-neighbor model, however, finding this optimal growth trajectory is not at all the challenge: by symmetry it is trivially x = y. The correction to p c that Gravner, Holroyd, and Morris determined in [28,30,37], is instead due to the much smaller entropic effect of random fluctuations around this trajectory (see also the introduction of [29] for a more detailed explanation of this effect). We believe that such fluctuations also influence p c for the anisotropic model, but that their effect will be much smaller than the improvements that can still be made in controlling the precise shape of the optimal growth trajectory.

About the proofs
The proof of Theorem 1.1 uses a rigorisation of the iterative determination of p c in Sect. 1.4 above, combined with Theorem 1.3 and the classical argument of Aizenman and Lebowitz [4].
The lower bound in Theorem 1.3 is a refinement of the computation in [23].
Most of the work of this paper goes into the proof of the upper bound of Theorem 1.3. Like many recent entries in the bootstrap percolation literature, our proof centers around the "hierarchies" argument of Holroyd [33]. In particular, we sharpen the argument of [23] by incorporating the idea of "good" and "bad" hierarchies from [30], and by using very precise bounds on horizontal and vertical growth of infected rectangular regions.
The main new contributions of this paper (besides the iterative determination of p c ) can be found in Sects. 3 and 6.
In Sect. 3, we introduce the notion of spanning time (Definition 3.3), which characterises to a large extent the structure of configurations of vertical growth. We show that if the spanning time is 0, then such structures have a simple description in terms of paths of infected sites, whereas if the spanning time is not 0, then this description can still be given in terms of paths, but these paths now also involve more complex arrangements of infected sites. We call such arrangements infectors (Definition 3.7), and show that they are sufficiently rare that their contribution does not dominate the probability of vertical growth.
In Sect. 6 we generalise the variational principle of Holroyd [33] to a more general class of growth trajectories. This part of the proof is intended to be more widely applicable than the current anisotropic case, and is set up to allow for precise estimates.

Notation and definitions
Oftentimes, the quantities that we calculate will only depend on the size of R, and be invariant with respect to the position of R. In such cases, when there is no possible confusion, we will write R with x(R) = x and y(R) = y as ] ∩ Z}, and use similar notation for columns.
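We say that a rectangle $R = [a,b] \times [c,d]$ is horizontally traversable by $S$ if $R \subset \langle (S \cap R) \cup ([a-2, a-1] \times [c,d]) \rangle$, where $\langle \cdot \rangle$ denotes the set of sites that are eventually infected under iteration of $\mathcal{B}$. That is, $R$ is horizontally traversable if the rectangle becomes infected when the two columns to its left are completely infected. Under $\mathbb{P}_p$, this event is equiprobable to the mirrored event in which the two columns to the right of $R$ are completely infected, and more importantly, it is equivalent to the event that $R$ does not contain three or more consecutive columns without any infected sites and the rightmost column contains an infected site. We say that $R$ is up-traversable by $S$ if $R \subset \langle (S \cap R) \cup ([a,b] \times \{c-1\}) \rangle$. That is, $R$ becomes entirely infected when all sites in the row directly below $R$ are infected. Similarly, we say that $R$ is down-traversable by $S$ if $R \subset \langle (S \cap R) \cup ([a,b] \times \{d+1\}) \rangle$. Again, under $\mathbb{P}_p$ up- and down-traversability are equiprobable, so we will only discuss up-traversability. If $S$ is a random site percolation, then we simply say that $R$ is horizontally-, up-, or down-traversable. Given rectangles $R \subset R'$ we write $R \Rightarrow R'$ for the event that the dynamics restricted to $R'$ eventually infect all sites of $R'$ if all sites in $R$ are infected, i.e., for the event that $R' = \langle (S \cap R') \cup R \rangle$.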
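The equivalence between horizontal traversability and the condition on empty columns can be checked by brute force on small rectangles. The sketch below is a hedged illustration: it assumes the $(1,2)$-neighbourhood with $r = 3$, places the rectangle at columns $1, \dots, x$ and rows $1, \dots, y$ with the two fully infected columns at positions $-1$ and $0$, and lets only sites of the rectangle become infected. All names are hypothetical.

```python
from itertools import product

# Assumed (1,2)-neighbourhood with threshold r = 3.
N12 = [(-2, 0), (-1, 0), (1, 0), (2, 0), (0, -1), (0, 1)]


def closure(infected, allowed, r=3):
    """Iterate the bootstrap rule; only sites in `allowed` may become infected."""
    infected = set(infected)
    changed = True
    while changed:
        changed = False
        for (i, j) in allowed:
            if (i, j) not in infected:
                count = sum((i + dx, j + dy) in infected for dx, dy in N12)
                if count >= r:
                    infected.add((i, j))
                    changed = True
    return infected


def horizontally_traversable(S, x, y):
    """Does the x-by-y rectangle fill up once the two columns to its left are infected?"""
    R = [(i, j) for i in range(1, x + 1) for j in range(1, y + 1)]
    left = {(i, j) for i in (-1, 0) for j in range(1, y + 1)}
    return set(R) <= closure(set(S) | left, R)


def column_criterion(S, x, y):
    """No three consecutive empty columns, and the rightmost column is non-empty."""
    occupied = [any((i, j) in S for j in range(1, y + 1)) for i in range(1, x + 1)]
    if not occupied[-1]:
        return False
    run = 0
    for occ in occupied:
        run = 0 if occ else run + 1
        if run >= 3:
            return False
    return True
```

Enumerating every subset `S` of a small rectangle with `itertools.product` and comparing the two predicates gives an exhaustive check of the equivalence for that rectangle size.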
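We will frequently make use of two standard correlation inequalities. The first is the Fortuin-Kasteleyn-Ginibre inequality (FKG-inequality), which states that for increasing events $A$ and $B$, $\mathbb{P}_p(A \cap B) \ge \mathbb{P}_p(A) \, \mathbb{P}_p(B)$. The second is the van den Berg-Kesten inequality (BK-inequality), which states that for increasing events $A$ and $B$, $\mathbb{P}_p(A \circ B) \le \mathbb{P}_p(A) \, \mathbb{P}_p(B)$, where $A \circ B$ denotes the event that $A$ and $B$ occur disjointly (see [32, Chapter 2] for a more in-depth discussion).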
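The FKG-inequality can be verified exactly on a toy example by enumerating all configurations. The sketch below is an illustration only: the two increasing events on four sites are arbitrary choices made for this example, and the function names are hypothetical. It does not treat the BK-inequality.

```python
from itertools import product


def prob(event, n, p):
    """Exact probability of `event` under the product measure with density p on {0,1}^n."""
    total = 0.0
    for omega in product((0, 1), repeat=n):
        if event(omega):
            k = sum(omega)
            total += p**k * (1 - p) ** (n - k)
    return total


def event_a(omega):
    """Increasing event: site 0 or site 1 is infected."""
    return omega[0] == 1 or omega[1] == 1


def event_b(omega):
    """Increasing event: at least two of sites 1, 2, 3 are infected."""
    return omega[1] + omega[2] + omega[3] >= 2


def fkg_gap(p, n=4):
    """P(A and B) - P(A) P(B); the FKG-inequality says this is non-negative."""
    both = prob(lambda w: event_a(w) and event_b(w), n, p)
    return both - prob(event_a, n, p) * prob(event_b, n, p)
```

Evaluating `fkg_gap(p)` for several densities `p` shows the positive correlation of the two events; the gap vanishes only in the degenerate cases `p = 0` and `p = 1`.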

The structure of this paper
In Sect. 2 we state two key bounds, Lemmas 2.2 and 2.3, giving primarily lower bounds on the probabilities of horizontal and vertical growth of an infected rectangular region, and we use them to prove the lower bound of Theorem 1.3. In Sect. 3 we prove a complementary upper bound on the vertical growth of infected rectangles, Lemma 3.1. In Sect. 4 we prove Lemma 4.1, which combines the upper bounds on horizontal and vertical growth from Lemmas 2.2 and 3.1. This lemma is crucial for the upper bound of Theorem 1.3. We prove the upper bound of Theorem 1.3 in Sect. 5, subject to a variational principle, Lemma 5.9, that we prove in Sect. 6. Finally, in Sect. 7 we use Theorem 1.3 to prove Theorem 1.1.
Proposition 2.1 Let p > 0 and 1 Note that the upper bound on y is different from the bound in Theorem 1.3. For the proof it suffices to show that there exists a subset of configurations that has the desired probability. We choose a subset of configurations that follow a typical "growth trajectory": configurations that contain a small area that is locally densely infected (a seed). We bound the probability that such a seed will grow a bit (which is likely), and then a lot more (which is exponentially unlikely), until the infected region reaches a size where the growth is again very likely, because the boundary of the infected region is large and the dynamics depend only on the existence of infected sites on the boundary, not on their number.
To prove this proposition we will need bounds on the probability that a rectangle becomes infected in the presence of a large infected cluster on its boundary. We state two lemmas that achieve this, which are improvements upon [23, Lemmas 2.1 and 2.2].

Lemma 2.2 For any rectangle [x] × [y]
, Moreover, f ( p, y) satisfies the following bounds: (a) when p → 0 and py → ∞, Proof From [23, Lemma 2.1] 9 we know that When u is close to 1, X = e −(1−u) 3 is an approximate solution for the positive root, since So, as p → 0 and py → ∞, This establishes (a) and (b) simply follows.
To prove (c), recall Rouché's Theorem (see e.g. [39,Theorem 10.43]), which states that if two functions g(z) and h(z) are holomorphic on a bounded region U ⊂ C with continuous boundary ∂U and satisfy |g(z) − h(z)| < |g(z)| for all z ∈ ∂U , then g and h have an equal number of roots on U . Applying Rouché's Theorem with h(z) = a 0 + a 1 z + a 2 z 2 + a 3 z 3 and g(z) = a 0 , it follows that the moduli of the roots of h(z) are all bounded from below by |a 0 |/(|a 0 | + max{|a 1 |, |a 2 |, |a 3 |}). Applying this bound to (2.1) we find that when u > 0 is sufficiently small, where the second inequality is due to a series expansion around u = 0. (We remark that an explicit computation gives α(u) u − 3u 2 for all u > 0, but without relying on a computer this may take several pages to verify.) Since we assumed (1 − p) y → 1 we thus have where we used 1 2 py 1 − (1 − p) y py for p sufficiently small.

Lemma 2.3 (a) If p 2 x is sufficiently small, then we have, for any rectangle [x] × [y]
, Proof We say that a rectangle is North-traversable (N-trav) if the intersection of every row with R contains a site (a, b) such that ((a, b) + N (1,2) )\{(a, b − 1)} contains at least two infected sites. Observe that North-traversability implies up-traversability, so We can similarly define South-traversability by requiring that the intersection of every row with R contains a site (a, b) such that ((a, b) + N (1,2) )\{(a, b + 1)} contains at least two infected sites. South-traversability implies down-traversability. Again, from a probabilistic point of view North-and South-traversability are equivalent, so we will henceforth only discuss North-traversability.
is North-traversable then for each of the y rows there must exist an infected pair of sites u and v and a site z in the row such that u, v ∈ z + N (1,2) . By the FKG inequality we thus have the lower bound For the proof of (a) we apply Janson's inequality [35]. The expected number of infected pairs immediately above an infected rectangle of width x is at least μ = (8x − 16) p 2 . To see this, consider that up to translations there are 8 possible pairs of infected sites above the rectangle that can infect the whole row, see Fig. 2 above. The variance is 10 μ, so the probability that some pair is infected is at least For the proof of (b) we use a cruder approximation: that (a, b) is the leftmost site of an infected pair as in Fig. 2. These pairs all have width at most 5, so the probability that a row of length x does not have an infected pair can be bounded from above by The claim follows.
Proof of Proposition 2.1 We start by constructing a seed. Let $r := \frac{2}{p} \log \log \frac{1}{p}$ and infect sites $(1, 2i)$ and $(2, 2i + 1)$ for $2i \le r$. The probability that a rectangle [2] The growth of the seed to a rectangle of arbitrary size can be divided into three stages:
Stage 1. By Lemma 2.2(a) the probability of finding a seed of size $r$ that will grow to size $[e^{3rp}/(3p)] \times [r]$ is about the same as the probability of just finding the seed, i.e.,
Stage 2. Next we bound the probability that the infected rectangle grows to size where $m := \frac{1}{3p} \log \frac{1}{p}$. This is the bottleneck for the growth dynamics. We bound (2.3) by considering the growth in many small steps. In each such step, the rectangle will either infect an entire row above or below it, or it will infect an entire row to the left or right of it (with the help of infected sites on the boundary of the rectangle). Because vertical growth is less probable than horizontal growth, we will consider sequences where the rectangle grows by one vertical step, from height $\ell$ to $\ell + 1$, followed by horizontal growth that infects many columns successively, with the rectangle growing from width $x_\ell$ to $x_{\ell+1}$ where $x_\ell := \frac{e^{3p\ell}}{3p}$. That this choice is close to optimal can be seen in Sect. 6 below, where a variational principle for the upper bound of Theorem 1.3 is derived.
¹⁰ For two positive sequences $a_n$ and $b_n$ we write $a_n \gg b_n$ when $a_n / b_n \to \infty$ and $a_n \ll b_n$ when $a_n / b_n \to 0$.
Having divided the growth into steps, we can bound (2.3) from below using the FKG-inequality: We bound these three products separately. It follows from Lemma 2.2(a) that the horizontal growth from width $x_\ell$ to $x_{\ell+1}$ occurs with probability approximately $1/e$, i.e., When $\ell \le m - r$, then $p^2 x_\ell \ll \log^{-2} \frac{1}{p}$, so we can apply Lemma 2.3(a) to bound Therefore we can bound the second product in (2.4) from below by Using Lemma 2.3(b) we can similarly bound the third product (over $m - r + 1 \le \ell \le m$) from below by Multiplying the bounds (2.6), (2.7), and (2.8), and using that $m = \frac{1}{3p} \log \frac{1}{p}$, we get Now consider the case where it grows vertically, this time to height $3m$. This also occurs with probability at least $e^{-O(1/p)}$. As the infected region gets larger, the probability that it keeps growing converges to 1. The result is that (2.10) holds for any rectangle $R$ that is large enough, as long as the dimensions of $R$ are sufficiently balanced (which is guaranteed by the assumptions on $x$ and $y$). Now, by the FKG-inequality, we can multiply the bounds from the three stages (i.e., (2.2), (2.9), and (2.10)) to complete the proof of Proposition 2.1.

An upper bound on the probability of up-traversability
The following bound is crucial for the proof of the upper bound of Theorem 1.3. Recall from (1.1) the definition of the bootstrap operator B, and recall that , and that we write P p to indicate that the elements of S are chosen independently at random with probability p. Lemma 3.1 Let 1 k p −2/5 and let R be a rectangle with dimensions (x, y) such that y < x. Then, for p sufficiently small, We will apply this lemma with 1 p y 1 p log 6 1 p x and k = log 2 1 p . Note that in this case the upper bound given by the lemma is not much larger than the lower bound given by Lemma 2.3. In particular, for these choices of x, y and k, the bound given by the lemma is of the form We begin the proof of Lemma 3.1 with the following simple but important definition: let us say that a pair of sites P is a spanning pair for the row Then there must be a pair of already-infected sites in N (1,2)  We now make another important definition. In words, the spanning time τ is the first time t such that B (t) (A(S)) spans all rows of R. Since R is up-traversable by A(S), it follows by Lemma 3.2 that τ must be finite. However, we emphasise that it is possible that τ > 0, see Fig. 3 for some examples.
The central idea in the proof of Lemma 3.1 is to consider the cases $\tau = 0$ and $\tau > 0$ separately. When $\tau = 0$, the structure is significantly simpler than when $\tau > 0$, which allows for a very sharp estimate. When $\tau > 0$ more complex structures are possible, but more infected sites are required, and this allows us to use a less precise analysis.
Fig. 3 Five configurations (the red and black sites) that do not have a spanning pair for the row above the dark grey row at time $t = 0$, but that create a spanning pair (the red and blue sites) at some time $t > 0$ by iteration with $\mathcal{B}$. The light grey sites indicate which sites must become infected to create the spanning pair. Note that in each case these sets have minimal cardinality (i.e., if we remove any black site, then iteration of $\mathcal{B}$ will not create the spanning pair)

The case τ = 0
Given a rectangle R, let F 0 (R) and F + (R) denote the families of all minimal sets A ⊂ R such that R is up-traversable by A and τ (R, A) = 0 and τ (R, A) > 0, respectively. Let us write U 0 (R) and U + (R) for the upsets generated by F 0 (R) and F + (R), respectively, i.e., the collections of subsets of R that contain a set A ∈ F 0 (R) or A ∈ F + (R), respectively.
The following lemma gives a precise estimate of the probability that a rectangle is up-traversable and $\tau = 0$.
Lemma 3.4 Let $R$ be a rectangle with dimensions $(x, y)$, and let $p \in (0, 1)$. Then
We will prove Lemma 3.4 using the first moment method. To be precise, we will show that the expected number of members of F 0 (R) that are contained in S is at most the right-hand side of (3.2). This will follow easily from the following lemma. Lemma 3.5 Let R be a rectangle with dimensions (x, y), and let p ∈ (0, 1). Then To count the sets in F 0 (R), we will need to understand their structure. We will show that each set A ∈ F 0 (R) can be partitioned into "paths" as follows: Proof Since A is a minimal subset of R such that R is up-traversable by A, and τ (R, A) = 0, it follows from Definition 3.3 that A contains a spanning pair for each row of R, and hence (by minimality of A) it follows that A consists exactly of a union of spanning pairs (one pair for each row) and no other sites. Let these pairs be P 1 , . . . , P y , and define a graph on [y] by placing an edge between i and j if P i ∩ P j is non-empty. The sets A 1 , . . . , A r are simply (the elements of A corresponding to) the components of this graph.
Let the components of the graph be C 1 , . . . , C r , and note first that each component is a path, since a spanning pair for row

Proof of Lemma 3.5 To count the sets A ∈ F 0 (R), let us first fix |A|, and the sizes of the sets A 1 , . . . , A r given by Lemma 3.6. Recall that r = |A| − y and that A = A 1 ∪ · · · ∪ A r is a partition, and note that |A j | ≥ 2 for each j ∈ {1, . . . , r }, since A j is a union of spanning pairs. It follows that there are exactly $\binom{y-1}{r-1}$ ways to choose the sequence (|A 1 |, . . . , |A r |), where we order the sets A j so that if i < j then the top row of A i is no higher than the bottom row of A j . (Note that this is possible because each A i is a union of spanning pairs for some set of consecutive rows of R.) Now, we claim that there are at most

Proof of Lemma 3.4 Define a random variable X to be the number of sets A ∈ F 0 (R) that are entirely infected at time zero, i.e., that are contained in our p-random set S. By Markov's inequality and Lemma 3.5, we have

The case τ > 0
In this section we analyse the event S ∩ R ∈ U + (R). If R is up-traversable by S, then let A again denote a subset of S of minimal cardinality such that R is up-traversable by A. By Lemma 3.2 above we know that if R is up-traversable by A, then there must exist a time t at which there is a spanning pair in B (t) (A) for each row in R.
The following definition isolates the sites that are responsible for the creation of such spanning pairs. Note that spanning pairs are infectors, but that many other configurations are possible: see Fig. 3 for a few examples.

Proof Let A be a subset of S such that R is up-traversable by A and such that A has minimal cardinality with this property. By Lemma 3.2, the event that R is up-traversable by A is equivalent to the event that there exists a spanning pair for each row of R after some finite number of iterations of A by the bootstrap operator B. This means that for each row A contains at least one infector.

Note that it is a priori possible that the infectors in M(A , R) overlap partially or that an infector for some row is contained in an infector for a different row.

Recall that for any set Q ⊂ Z^2 we write x(Q) and y(Q) for the horizontal and vertical dimensions of that set. We split the event {S ∩ R ∈ U + (R)} according to whether there exists an infector M with x(M) ≥ 6k^2 or not.

Lemma 3.9 (Wide infectors) [1, k] with k ≥ 3 such that k ≤ p^{-1}, and x such that k^5 ≤ x ≤ p^{-2}, then
Moreover, M j is the minimal set responsible for the creation of the spanning pair in row j, so it must be the case that M j does not have a gap of more than three consecutive columns. There are at most xk possible positions for the root of the infector. We thus bound (3.3) for the range of x and our choice of k from above by

Lemma 3.10 (Small infectors) There exist no infectors that are not a single spanning pair that intersect precisely one row, and there exist precisely two infectors that are not a single spanning pair that intersect precisely two rows, up to translations. The cardinality of these infectors is 4, and they span both rows they intersect.
Proof Let M j be the infector for some row j. Write v for an element of the spanning pair for row j that becomes infected due to the bootstrap dynamics on M j . (It is easy to see that only one element of a spanning pair can arise after time t = 0, but we do not use this fact.) Suppose t is the first time such that B (t) (M j ) contains a spanning pair. Because M j is not a spanning pair, t ≥ 1. Since v becomes infected at time t, it must be the case that |N (1,2) (v) ∩ B (t−1) (M j )| ≥ 3. Any configuration of three sites in N (1,2) (v) contains a spanning pair for the row that v is in, so v cannot be in row j. By the definition of spanning pairs, (3.1), a site can either span the row that it is in, or the row below it, so v is in row j + 1. We conclude that there are no infectors that are not a spanning pair that intersect precisely one row. By the same argument, if t ≥ 2, then M j must contain a site in row j + 2, so infectors that intersect precisely two rows must have t = 1.
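The bootstrap dynamics that this proof reasons about can be made concrete with a small simulation. The sketch below assumes that the (1, 2)-neighbourhood of a site consists of the two nearest sites on either side in the same row together with the sites directly above and below, and that a healthy site becomes infected once at least r = 3 of these neighbours are infected. The function names and the coordinate convention (x, y) are illustrative choices, not notation from the paper.

```python
# Assumed offsets of the (1, 2)-neighbourhood: two sites to either side in
# the same row, and one site above and one below.
NEIGHBOURHOOD = [(-2, 0), (-1, 0), (1, 0), (2, 0), (0, 1), (0, -1)]
THRESHOLD = 3  # threshold r = 3


def step(infected):
    """Apply one round of the bootstrap rule to a finite set of sites."""
    # Only healthy sites adjacent to an infected site can become infected.
    candidates = {
        (x + dx, y + dy) for (x, y) in infected for (dx, dy) in NEIGHBOURHOOD
    } - infected
    new = set(infected)
    for (x, y) in candidates:
        count = sum((x + dx, y + dy) in infected for (dx, dy) in NEIGHBOURHOOD)
        if count >= THRESHOLD:
            new.add((x, y))
    return new


def closure(initial):
    """Iterate the bootstrap rule until no new site becomes infected."""
    current = set(initial)
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


# A fully infected row of width 6, with and without extra sites above it.
row = {(x, 0) for x in range(6)}
two_rows = {(x, y) for x in range(6) for y in (0, 1)}
```

Under these assumptions a single infected row is stable, one extra site above it is not enough to grow, and two adjacent sites above it infect the whole segment of the next row: `closure(row | {(2, 1), (3, 1)})` equals `two_rows`.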
To analyse P p (S ∩ R ∈ U + (R)) we again divide A into the maximal number of disjoint, "causally independent" pieces, to which we may apply the BK-inequality. We have seen that when τ = 0 these pieces can be described as paths. When τ > 0 this is still the case, but now the path structure can be found at the level of the infectors. We partition A as follows: let r be the largest integer such that there exist sets B 1 , . . . , B r that partition A (i.e., B i ∩ B j = ∅ for all i = j and A = r i=1 B i ) and such that there exist r pairs of integers {(a i , b i )} r i=1 such that Suppose that there exists an such that M ∩ B i = ∅ and M ∩ B j = ∅ for some i = j. Without loss of generality, we can further assume that a i b i . Since M is the minimal set to create a spanning pair for row , and that M ∩ B i is a strict subset of M (since the latter intersects B j , which is disjoint from B i by assumption), we deduce that M ∩ B i cannot contain a spanning pair for row . By Lemma 3.2, this means that [1, x] It then follows that occurs. This gives a contradiction, since by construction the sets B 1 , . . . , B r are the maximal partition of A with this property, so such a j does not exist. So we conclude that if M j ⊂ B i but M j = B i and M j M j for some j < j, then The proof is identical to that of (c), mutatis mutandis.
For all k, ℓ, m, x ∈ N, let E ℓ+1,ℓ+m denote the event that a configuration of infected sites S has the following properties:

, ℓ + m] is up-traversable by
A cannot be divided into two or more disjointly occurring pieces, i.e., A = B 1 in the construction described above.
Lemma 3.12 For k ≥ 3, ℓ + m ≤ k and all p ∈ [0, 1],

Proof There is at least one infected site in row ℓ + 1, and it can be at x positions. By Lemma 3.11, the event E ℓ+1,ℓ+m implies that A is the union of infectors that are not disjoint. Since, moreover, none of the infectors are wider than 6k^2 − 1, for each of the rows ℓ + 2, . . . , ℓ + m we then need to have at least one infected site in the line-segment [−6k^2 − 3, 6k^2 + 3] directly above the infected site of the row below it. Finally, row ℓ + m must also be spanned, and by Lemma 3.10 its spanning pair must already be present at time t = 0, so there must be another infected site in that row, in one of the four positions that can create a spanning pair for row ℓ + m. We thus bound

The following lemma states the key inequality for the induction:

Lemma 3.13 For k ≥ 2,

A. Let B 1 , . . . , B r be the subdivision of A described above. Let u ∈ A and v ∈ A \A be such that {u, v} form a spanning pair for the row i, while A does not contain a spanning pair for row i. At least one such pair must exist since V + 1,k occurs. Let j be such that M i ⊆ B j (we can find such a B j by Lemma 3.11(a)). Suppose that B j spans exactly the rows ℓ + 1, . . . , ℓ + m (i.e., a j = ℓ + 1 and b j = ℓ + m). Then, by the construction of B 1 , . . . , B r and E ℓ+1,ℓ+m we know that occurs for S. Applying the BK-inequality and summing over ℓ and m gives the asserted inequality. The sum over m starts at 2 because by Lemma 3.10, B j must span at least two rows.

The proof of Lemma 3.1
To begin, assume that 3k^2/p ≤ x ≤ 1/p^2. We start by proving Lemma 3.1 for the cases where y ≤ k. More precisely, we will prove that holds for k ≤ p^{-1}. We use induction. The inductive hypothesis is that (3.4) holds for k′ ≤ k − 1 and k^5 ≤ x ≤ p^{-2}. To initialise the induction we observe that when k = 1 there exist four spanning pairs up to translations that intersect one row, so P p (V 1,1 ) ≤ 4p^2 x < e(8p^2 x + 8p). When k = 2 we use Lemma 3.10 to bound which, combined with Lemma 3.4 yields that when p is sufficiently small. When 3 ≤ k ≤ p^{-1}, by (3.5), Lemmas 3.4, 3.9, 3.12, and 3.13, and the induction hypothesis (3.5), when p is sufficiently small, where the second term on the right-hand side is due to Lemma 3.9, and the third and fourth correspond to the m = 2 and m ≥ 3 terms in Lemma 3.13. It is not difficult to show that When 3k^2/p ≤ x we have 12pk^2 + 7p ≤ (1/2)(8p^2 x + 8p), so this implies that Inspecting (3.6), it follows that the desired bound (3.4) holds if the following two inequalities hold for p sufficiently small: The first inequality holds because k ≤ x. It is easy to verify that the second inequality holds when k^5 ≤ x ≤ p^{-2}. Substituting the above inequalities into (3.6) proves the claim of Lemma 3.1 for y ≤ k.
Now we consider R = [1, x] × [1, y] for y such that k < y < x (still assuming that 3k^2/p ≤ x ≤ 1/p^2). We cover R with ⌈y/k⌉ rectangles of height k. If y is not divisible by k the covering "overshoots": it includes at most k − 1 rows that are not in R. If R is up-traversable, and if the overshoot contains a connected upward path, then all these rectangles are also up-traversable. The probability that there is a connected path in the overshoot is at least p^k. It thus follows by the BK-inequality that where these bounds again hold for p sufficiently small. This completes the proof of Lemma 3.1.

The probability of simultaneous horizontal and vertical growth
The lemma below states an upper bound on the probability of an infected rectangle growing both vertically and horizontally, i.e., an upper bound on P p (R ⇒ R′) for certain R ⊂ R′. For two rectangles R ⊂ R′ with dimensions (x, y) and (x + s, y + t), let

Observe that ψ and φ are both positive, decreasing, and convex functions (where they are not zero).

Lemma 4.1 Let R ⊂ R′ be rectangles with dimensions (x, y) and (x + s, y + t) respectively. Assume that t ≤ (1/p) log^{-4}(1/p). Then, for p sufficiently small,

The proof uses a strategy similar to that of [23, Proof of Proposition 3.3]. Roughly speaking, this strategy entails that we "decorrelate" the horizontal and vertical growth events needed for {R ⇒ R′}.
Proof If y + t 4 p log log 1 p and x + s > 1/ p 2 , then we use the trivial bound P p (R ⇒ R ) 1, corresponding to U p (R, R ) = 0, as required.
If y + t 4 p log log 1 p and x + s 1/ p 2 , then we apply Lemma 3.1 (with k = ξ ), again giving the required bound.
Therefore, we assume henceforth that y + t > 4 p log log 1 p and x + s 1 p 2 . To start, suppose that (1 − δ ξ )tψ(x + s) > δ ξ sφ(y + t), which corresponds to the vertical growth component t being disproportionately large compared to the horizontal growth component s. Then, we can simply ignore the horizontal growth and apply Lemma 3.1 to bound and we are done. Therefore, let us henceforth also assume that We identify five (intersecting) regions within the area R \R: the North, South, West, and East regions R n , R s , R w , and R e , and the corner region H : , such that a 1 c 1 < c 2 a 2 and b 1 d 1 < d 2 b 2 , we define the sets We split (4.6) We start by bounding the first term in (4.6). Let F := {Y s/(2ξ)}. We use Lemma 3.1 with k = ξ = log 2 1 p to bound Let R n denote the set of all sets of n subrectangles of R e ∪ R w with heights y + t, total width n, and such that each pair of rectangles in a set r ∈ R n are separated by at least one column. I.e., for r = {r i } ∈ R n define the following two events: that is, E 2 (r) is the event that r is the partition into the least number of rectangles of total width n that does not intersect M H , and that there is no partition of total width greater than n that also does not intersect M H . Observe that Thus, where we used that the sum may be restricted to m s/(2ξ) by the conditioning on F. Now note that the events E and F can be verified by inspecting only M H , which, on the event E 2 (r) is contained in H \r, while E 1 (r) by definition only depends on the sites in r, so that conditionally on E 2 (r) the event E 1 (r) is independent of E and F. We may thus write Observe that for any fixed r the event E 1 (r) is increasing. Indeed, adding more sites to S can either make horizontal traversal occur when it did not before, or else, have no effect. We claim that the event E 2 (r), on the other hand, is the intersection of three decreasing events, and hence itself a decreasing event. 
To see this, observe that the first event in (4.8) is decreasing because adding more sites to S cannot decrease the total width of M H , since it is the union of all infectors intersecting H (not only those of minimal cardinality for a given row). The second event in (4.8) is likewise decreasing, because increasing M H cannot decrease the minimal number of rectangles of a partition that does not intersect M H , unless it also decreases the total width of that partition. The third event is decreasing because increasing M H cannot decrease its total width. Therefore, we may apply the FKG-inequality to obtain and we may thus further bound the right-hand side of (4.9) by Uniformly for any fixed r ∈ R s−m with m s/(2ξ), by Lemma 2.2, where the final inequality follows from Lemma 2.2(b) when p is sufficiently small. Inserting this bound in (4.9), we proceed by using that the events E 2 (r) are mutually disjoint for all r to bound (4.10) Combining (4.7) and (4.10) we bound the first term in (4.6) by p −ξ exp(−U p (R, R )). Now we bound the second term in (4.6). If Y > s/(2ξ) then at least s/(2ξ) out of s columns are non-empty. The probability that a column is non-empty is 1 − (1 − p) t 2 pt (when p is sufficiently small). Therefore, P(Y > s/(2ξ)) P(Bin(s, 2 pt) > s/(2ξ)). We use Chernoff's bound that P(Bin(n, p) > q) e −q when q > np to estimate P p (Y > s/(2ξ)) exp(−s/(2ξ)) (here we used that t 1 p log −4 1 p ). Observe that since ξ = log 2 1 p , δ ξ = 1 − ξ −1 , and φ(y + t) > log −12 1 p by our assumption that y + t > 4 p log log 1 p , we have Now recall our assumption (4.5) that (1 − δ ξ )tψ(x + s) δ ξ sφ(y + t). Applying this inequality twice, it follows that We thus have P p (Y > s/(2ξ)) exp(−U p (R, R )), as required. Applying the bounds for the two cases to (4.6) completes the proof (using the crude upper bound p −ξ + 1 2 p −ξ for p sufficiently small).
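The Chernoff-type step used for the second term in (4.6) can be checked numerically on a small instance. The sketch below uses the standard bound P(Bin(n, p) ≥ q) ≤ (e·np/q)^q for q > np, which is at most e^{−q} as soon as q ≥ e²·np; this stronger hypothesis is an assumption made here for the illustration, and the parameter values are hypothetical rather than taken from the proof.

```python
import math


def binomial_upper_tail(n, p, q):
    """Exact value of P(Bin(n, p) > q)."""
    start = math.floor(q) + 1
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(start, n + 1)
    )


def chernoff_bound(n, p, q):
    """Standard bound (e*n*p/q)**q on P(Bin(n, p) >= q), valid for q > n*p."""
    return (math.e * n * p / q) ** q


# Hypothetical parameters with q >= e^2 * n * p (here n * p = 2, q = 15).
n, p, q = 200, 0.01, 15
tail = binomial_upper_tail(n, p, q)
bound = chernoff_bound(n, p, q)
```

With these values the exact tail lies below the Chernoff bound, which in turn lies below `math.exp(-q)`, matching the shape of the estimate P(Bin(s, 2pt) > s/(2ξ)) ≤ exp(−s/(2ξ)) used above.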

Notation and definitions
Before we proceed with the proof, we must introduce some more notation and a few definitions. Our proof uses hierarchies. The notion of hierarchies is due to Holroyd [33], and has been common to much of the bootstrap percolation literature since. Here we use a definition of a hierarchy that is similar to the one in [23]:

(a) Hierarchy, seed, normal vertex, and splitter: A hierarchy H is a rooted tree with out-degrees at most three and with each vertex v labeled by a non-empty rectangle R v such that R v contains all the rectangles that label the descendants of v. If the number of descendants of a vertex is 0, we call the vertex a seed. If the vertex has one descendant, we call it a normal vertex, and we write u → v to indicate that u is a normal vertex with (unique) descendant v. If the vertex has two or more descendants, we call it a splitter vertex. We write N (H) for the number of vertices in the tree H.
(b) Precision: A hierarchy of precision Z (with Z ≥ 1) is a hierarchy that satisfies the following conditions:
(1) If w is a seed, then x(R w ) ≥ 2 and y(R w ) < 2Z , while if u is a normal vertex or a splitter, then y(R u ) ≥ 2Z .

(1) For each seed w, R w = R w ∩ S (i.e., R w is internally filled by S).
(2) For each normal u and every v such that u → v, R u = (R v ∪ S) ∩ R u (i.e., the event {R v ⇒ R u } occurs on S).
(d) Goodness: Similar to [30], we say that a seed w is large if Z/3 ≤ y(R w ) ≤ Z. We call a hierarchy good if it has at most log^{11}(1/p) large seeds, and we call it bad otherwise.
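The vertex types and the notion of goodness can be sketched as a small data structure. This is only an illustration under stated assumptions: a vertex stores just the dimensions of its rectangle (so the containment requirement on labels is not checked), "descendants" is read as direct children, a seed is taken to be large when Z/3 ≤ y(R w ) ≤ Z, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Vertex:
    width: int   # x(R_v)
    height: int  # y(R_v)
    children: list = field(default_factory=list)

    def kind(self):
        """Classify the vertex by its number of children."""
        if not self.children:
            return "seed"
        if len(self.children) == 1:
            return "normal"
        return "splitter"


def vertices(root):
    """Yield every vertex of the tree rooted at root."""
    yield root
    for child in root.children:
        yield from vertices(child)


def count_large_seeds(root, Z):
    """Count seeds whose height lies in [Z/3, Z] (assumed reading)."""
    return sum(
        1 for v in vertices(root)
        if v.kind() == "seed" and Z / 3 <= v.height <= Z
    )


def is_good(root, Z, p):
    """A hierarchy is good if it has at most log(1/p)**11 large seeds."""
    return count_large_seeds(root, Z) <= math.log(1 / p) ** 11


# Hypothetical hierarchy: a splitter root above a normal vertex and a seed.
seed_a = Vertex(2, 4)
seed_b = Vertex(3, 1)
normal = Vertex(5, 20, [seed_a])
root = Vertex(10, 20, [normal, seed_b])
```

With Z = 9 this toy hierarchy has exactly one large seed (`seed_a`), so it is good for any small p.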

Outline of the proof of Proposition 5.1
In this section we give the proof of Proposition 5.1 subject to Lemma 5.9 below. We prove Lemma 5.9 in Sect. 6.

The proof of this lemma is the same as the proof of [23, Proposition 3.8] so we do not repeat it here. (But note that it does not matter that our definition of hierarchies uses "internally filled" rather than "k-occurs".)

Throughout this paper, let

In accordance with the hypothesis of Proposition 5.1 we restrict ourselves to hierarchies with root label R p of dimensions (x, y) such that

For the sake of simplicity we often suppress subscripts Z p and R p . We bound the good and bad hierarchies separately:

We bound the second term with the following lemma:

Lemma 5.4 As p tends to 0 we have
where for the third inequality we used the assumption on x(R s ) and that y(R s ) < 2Z = 2 p log −8 1 p . Proposition 5.1 thus holds for hierarchies with wide seeds. Let us therefore assume from here on that x(R s ) 1 3 p log 12 1 p for all seeds. By the BK-inequality we have (we ignore here the contributions from splitter vertices).
The following lemma is used to determine a bound for the product of the seeds: whereR n = 1, x(R u 1 ) + · · · + x(R u n ) × 1, y(R u 1 ) + · · · + y(R u n ) .
Proof For any p > 0 and any rectangle R with dimensions (x, y) with min{x, y} 2 and any a 2, b 1, we have Any seed of a hierarchy must have dimensions at least (2, 1) by definition, so an iterated application of (5.5) completes the proof.
Recall the definition of U p (R, R ) in (4.3) above. We use Lemmas 4.1 and 5.6 to bound the first product on the right-hand side of (5.4): To bound the second product of (5.4), we use the following lemma: Lemma 5.7 Let p > 0. Let N splitter denote the number of splitter vertices of the hierarchy H. Then there exists an integerN =N (H) 1 and a sequence of nested rectanglesR 0 ⊂ · · · ⊂RN with the following properties: •R 0 =R N seed (withR N seed as defined in Lemma 5.6 above), •RN has dimensions larger than R, The proof of this lemma goes by induction, using Lemma 4.1, and it is essentially the same as the proof of [23, Lemma 3.11], so we omit it here.
We use Lemma 5.7 to determine that there exist rectanglesR 1 ⊂ · · · ⊂RN satisfying the conditions of the lemma such that v →w Using Lemmas 4.1 and 5.7 and writing (R n ) N n=0 for the concatenation of the sequences (R n ) N seed n=0 and (R n )N n=0 , i.e., (R n ) N n=0 := (R 0 ,R 1 , . . . ,R N seed ,R 1 , . . . , RN ) with N := N seed +N , we bound To bound the first factor in (5.6) we use the following lemma: Assume throughout this section that f (x) and g(y) are positive, non-increasing, convex, Riemann-integrable functions. Let R + = (0, ∞) and for a = (a 1 , a 2 ) ∈ R 2 For a, b ∈ R 2 + with a b and any path γ from a to b, define and To start, an elementary lemma: The proof is easy (see [33,Sect. 6]). Let Note that since f (x) and g(y) are assumed to be convex decreasing functions, f,g describes a simple curve in For sets A, B ⊆ R 2 + we say that A lies Northwest of B and we write A B if for any a ∈ A and any b ∈ B that satisfy a 1 + a 2 = b 1 + b 2 we have a 2 b 2 . Lemma 6.2 If γ 1 and γ 2 are paths from a to b, and we have either γ 1 γ 2 f,g or f,g γ 2 γ 1 , then w f,g (γ 1 ) w f,g (γ 2 ).
By Green's Theorem in the plane we have Now, since γ 1 γ 2 f,g we have H f,g , and since moreover f and g are convex decreasing functions, we have g (y) − f (x) 0 for all (x, y) ∈ H . It follows that w f,g (γ 1 ) − w f,g (γ 2 ) 0.
By the same reasoning we have w f,g (γ 1 ) − w f,g (γ 2 ) 0 when f,g γ 2 γ 1 . Using Lemma 6.3 we split the minimising integral from u to v into three parts: By Lemma 6.4, c 1 , a 2 ) → c).
Moreover, by Lemma 6.3, By Lemma 6.2 and the fact that either And finally, since v → d γ → b is a path from v to b, Combining the above inequalities we obtain Now we consider case (b), that γ ∩ ψ,φ = ∅. Let f be the first point on γ such that f 1 = u 1 , and let g be the first point on γ such that g 1 = v 1 . See Fig. 6b. We divide the integral along γ into three parts: By Lemma 6.4, Since γ ∩ ψ,φ = ∅, either f 2 u 2 and g 2 v 2 or f 2 u 2 and g 2 v 2 , so that either It thus follows by Lemmas 6.2 and 6.3 that Combining the above inequalities we obtain φ (v, b). (6.6) From (6.5) and (6.6) we conclude that the lower bound w ψ,φ (γ ) W ψ,φ (a, u) + W ψ,φ (u, v) + W ψ,φ (v, b) holds uniformly for any path γ , and therefore, it also holds for the infimum, completing the proof.
The following lemma now states the crucial bound: (Observe that the first term in the first integral in (6.9) thus gives a complementary bound to (2.7), while the second term is complementary to (2.6).) It follows that Substituting (6.8) and (6.10) into (6.7) completes the proof.
Recall the definition of U p (R, R ) from (4.3) and recall that δ ξ = 1 − log −2 1 p . It follows from Lemma 6.1 that (x N , y N )) , and it follows from Lemma 6.6 that the right-hand side is bounded from below by We start with the upper bound. From [23] we know that if L e p −1+ε for any ε > 0, then P p ([L] 2 is IF) = o(1), so we assume that L e p −1+ε . Let m = p −5 . The probability that [L] 2 is internally filled is bounded from below by the probability that where the second inequality holds for p sufficiently small. Taking minus the logarithm on both sides and applying the above bound and Proposition 2.1 again we obtain the inequality such that m/3 x m and n/3 y n. It is a straightforward consequence of the proof of [23,Lemma 3.7] that if [L] 2 is internally filled, then there must exist a rectangle R ∈ R such that {R ⇒ [L] 2 } occurs. The number of rectangles in R is bounded by mnL 2 , so by Proposition 5.1,