1 Introduction

1.1 Motivation and statement of the main result

Bootstrap percolation is a general name for the dynamics of monotone, two-state cellular automata on a graph G. Since their introduction by Chalupa et al. [20], bootstrap percolation models with different rules and on different graphs have been applied in various contexts, and the mathematical properties of bootstrap percolation are an active area of research at the intersection of probability theory and combinatorics. See for instance [1, 2, 4, 5, 8, 23, 33] and the references therein.

Motivated by applications to statistical (solid-state) physics such as the Glauber dynamics of the Ising model [26, 36] and kinetically constrained spin models [17], the underlying graph is often taken to be a d-dimensional lattice, and the initial state is usually chosen randomly.

Although some progress has recently been made in the study of very general cellular automata on lattices [12, 14, 25], attention so far has mainly focused on obtaining a very precise understanding of the metastable transition for specific simple models [4, 8, 9, 18, 23, 30, 33].

In this paper we will provide the most detailed description so far for such a model; namely, the so-called anisotropic bootstrap percolation model, defined as follows. First, given a finite set \({\mathcal {N}}\subset {\mathbb {Z}}^d {\setminus } \{0\}\) (the neighbourhood) and an integer r (the threshold), define the bootstrap operator

$$\begin{aligned} {\mathcal {B}}({\mathcal {S}}) \, {:=} \, {\mathcal {S}}\cup \big \{ v \in {\mathbb {Z}}^d \,:\, | (v + {\mathcal {N}}) \cap {\mathcal {S}}| \geqslant r \big \} \end{aligned}$$
(1.1)

for every set \({\mathcal {S}}\subset {\mathbb {Z}}^d\). That is, viewing \({\mathcal {S}}\) as the set of “infected” sites, every site v that has at least r infected “neighbours” in \(v+{\mathcal {N}}\) becomes infected by the application of \({\mathcal {B}}\). For \(t \in {\mathbb {N}}\) let \({\mathcal {B}}^{(t)} ({\mathcal {S}}) = {\mathcal {B}}({\mathcal {B}}^{(t-1)}({\mathcal {S}}))\), where \({\mathcal {B}}^{(0)}({\mathcal {S}}) = {\mathcal {S}}\), and let \(\langle {\mathcal {S}}\rangle = \lim _{t \rightarrow \infty } {\mathcal {B}}^{(t)}({\mathcal {S}})\) denote the set of eventually infected sites. For each \(p \in (0,1)\), let \({\mathbb {P}}_p\) denote the probability measure under which the elements of the initial set \({\mathcal {S}}\subset {\mathbb {Z}}^d\) are chosen independently at random with probability p, and for each set \(\Lambda \subset {\mathbb {Z}}^d\), define the critical probability on \(\Lambda \) to be
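To make the definitions concrete, here is a minimal sketch (ours, not from the paper) of the bootstrap operator \({\mathcal {B}}\) iterated to its fixed point \(\langle {\mathcal {S}}\rangle \) on a finite box; the function name `bootstrap_closure` and the toy example are our own.

```python
def bootstrap_closure(S, width, height, N, r):
    """Iterate the bootstrap operator B on the box [0,width) x [0,height)
    until nothing changes, returning the eventually infected set <S>.
    (Sites are updated within one sweep; by monotonicity the fixed point
    is the same as for simultaneous updates.)"""
    infected = set(S)
    changed = True
    while changed:
        changed = False
        for v in [(x, y) for x in range(width) for y in range(height)]:
            if v not in infected and sum((v[0] + dx, v[1] + dy) in infected
                                         for dx, dy in N) >= r:
                infected.add(v)
                changed = True
    return infected

# The anisotropic (1,2)-neighbourhood with threshold r = 3:
N12 = [(-2, 0), (-1, 0), (0, -1), (0, 1), (1, 0), (2, 0)]
# Two fully infected columns plus one extra site infect a third column,
# but the fully empty fourth column is never reached:
seed = {(x, y) for x in (0, 1) for y in range(3)} | {(2, 1)}
closure = bootstrap_closure(seed, 4, 3, N12, 3)
```
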

$$\begin{aligned} p_c\big ( \Lambda , {\mathcal {N}}, r \big ) \, {:=} \, \inf \big \{ p > 0 \,:\, {\mathbb {P}}_p\big ( \Lambda \subset \langle {\mathcal {S}}\cap \Lambda \rangle \big ) \geqslant 1/2 \big \}. \end{aligned}$$
(1.2)

If \(\Lambda \subset \langle {\mathcal {S}}\cap \Lambda \rangle \) then we say that \({\mathcal {S}}\) percolates on \(\Lambda \). We remark that since we will usually expect the probability of percolation to undergo a sharp transition around \(p_c\), the choice of the constant 1 / 2 in the definition (1.2) is not significant.

The anisotropic bootstrap percolation model is a specific two-dimensional process in the family described above. To be precise, set \(d = 2\) and

$$\begin{aligned} {\mathcal {N}}_{\scriptscriptstyle (1,2)}\; {:=} \; \big \{ (-\,2,0), (-\,1,0), (0,-\,1), (0,1), (1,0), (2,0) \big \} \end{aligned}$$

or, graphically (with 0 marking the origin, which is not itself part of the neighbourhood),

$$\begin{aligned} \begin{array}{ccccc} &{} &{} \circ &{} &{} \\ \circ &{} \circ &{} 0 &{} \circ &{} \circ \\ &{} &{} \circ &{} &{} \end{array} \end{aligned}$$

and set \(r = 3\). The set \({\mathcal {N}}_{\scriptscriptstyle (1,2)}\) is sometimes called the “(1, 2)-neighbourhood” of the origin. See Fig. 1 for an illustration of the behaviour of the anisotropic model.

Fig. 1

On the left: a final configuration of the anisotropic model on \([40]^2\). Note that not all stable shapes are rectangles. On the right: a final configuration on \([200]^2\) with \(p=0.085\), where the color of each site represents the time it became infected. Blue sites became infected first, red sites last (color figure online)

The main result of this paper is the following theorem:

Theorem 1.1

The critical probability of the anisotropic bootstrap percolation model satisfies

$$\begin{aligned} p_c\big ( [L]^2, {\mathcal {N}}_{\scriptscriptstyle (1,2)}, 3 \big ) \, = \, \frac{\log \log L}{12 \log L} \Big ( \log \log L - 4 \log \log \log L + 2 \log \frac{9\mathrm {e}}{2} \pm o(1) \Big ). \end{aligned}$$
(1.3)

To put this theorem in context, let us recall some of the previous results obtained for bootstrap processes in two dimensions. The archetypal example of a bootstrap percolation model is the “two-neighbour model”, that is, the process with neighbourhood

$$\begin{aligned} {\mathcal {N}}_{\scriptscriptstyle (1,1)}\, {:=} \, \big \{ (-\,1,0), (0,-\,1), (0,1), (1,0) \big \} \end{aligned}$$

and \(r = 2\). The strongest known bounds are due to Gravner, Holroyd, and Morris [28, 30, 37], who, building on work of Aizenman and Lebowitz [4] and Holroyd [33], proved that

$$\begin{aligned} p_c\big ( [L]^2, {\mathcal {N}}_{\scriptscriptstyle (1,1)}, 2 \big ) \, = \, \frac{\pi ^2}{18 \log L} - \Theta \bigg ( \frac{1}{(\log L)^{3/2}} \bigg ). \end{aligned}$$
(1.4)

The anisotropic model was first studied by Gravner and Griffeath [27] in 1996. In 2007, the second and third authors [41] determined the correct order of magnitude of \(p_c\). More recently, the first and second authors [23] proved that the anisotropic model exhibits a sharp threshold by determining the first term in (1.3).

The “Duarte model” is another anisotropic model that has been studied extensively [13, 22, 38]. The Duarte model has neighbourhood

$$\begin{aligned} {\mathcal {N}}_{\text {Duarte}}\, = \, \big \{ (-\,1,0), (0,-\,1), (0,1) \big \} \end{aligned}$$

and \(r = 2\). The sharpest known bounds here are due to Bollobás et al. [13]:

$$\begin{aligned} p_c\big ({\mathbb {Z}}_L^2, {\mathcal {N}}_{\text {Duarte}},2 \big ) \, = \, \frac{(\log \log L)^2}{8\log L}(1 \pm o(1)). \end{aligned}$$

Although the Duarte model has the same first order asymptotics for \(p_c\) as the anisotropic model (up to the constant), the behaviour is very different. In particular, the Duarte model has a “drift” to the right: clusters grow only vertically and to the right. This asymmetry has severe consequences for the analysis of the model (especially for the shape of critical droplets).

The “r-neighbour model” in d dimensions generalises the standard (two-neighbour) model described above. In this model, a vertex of \({\mathbb {Z}}^d\) is infected by the process as soon as it acquires at least r already-infected nearest neighbours. Building on work of Aizenman and Lebowitz [4], Schonmann [40], Cerf and Cirillo [18], Cerf and Manzo [19], Holroyd [33] and Balogh et al. [9, 10], the following sharp threshold result for all non-trivial pairs (d, r) was obtained by Balogh et al. [8]: for every \(d \geqslant r \geqslant 2\), there exists an (explicit) constant \(\lambda (d,r) > 0\) such that

$$\begin{aligned} p_c\big ( [L]^d, {\mathcal {N}}_{\scriptscriptstyle (1,\ldots ,1)}, r \big ) \, = \, \bigg ( \frac{\lambda (d,r) \pm o(1)}{\log _{(r-1)} L} \bigg )^{d-r+1}. \end{aligned}$$

(Here, and throughout the paper, \(\log _{\scriptscriptstyle (k)}\) denotes a k-times iterated logarithm.)

Finally, we remark that much weaker bounds (differing by a large constant factor) have recently been obtained for an extremely general class of two-dimensional models by Bollobás et al. [12]; see Sect. 1.3 below. Moreover, stronger bounds (differing by a factor of \(1 + o(1)\)) were proved for a certain subclass of these models (including the two-neighbour model, but not the anisotropic model) by Duminil-Copin and Holroyd [25].

Although various other specific models have been studied (see e.g. [15, 16, 34]), in each case the known bounds fall far short of determining the second term.

1.2 The bootstrap percolation paradox

In [33], Holroyd was the first to determine sharp first-order bounds on \(p_c\) for the standard model, and observed that they are very far removed from numerical estimates: \(\pi ^2/18 \approx 0.55\), while the same constant was numerically determined to be \(0.245 \pm 0.015\) on the basis of simulations of lattices up to \(L = 28800\) [3]. This phenomenon became known in the literature as the bootstrap percolation paradox, see e.g. [2, 21, 28, 31].

An attempt to explain this phenomenon goes as follows: if the convergence of \(p_c\) to its first-order asymptotic value is extremely slow, while for any fixed L the transition around \(p_c\) is very sharp, then it may appear that \(p_c\) converges to a fixed value long before it actually does.

This indeed appears to be the case. The first rigorisation of the “extremely slow convergence” part of this argument appears in [28], for a model related to bootstrap percolation. Theorem 1.1 gives another unambiguous illustration of extremely slow convergence for a bootstrap percolation model: the second term in (1.3) is actually larger than the first whenever

$$\begin{aligned} 4 \log \log \log L > \log \log L, \end{aligned}$$

which holds for all L in the range \(66< L < 10^{2390}\). Moreover, the second term does not become negligible (smaller than \(1\%\) of the first term, say) until \(L > 10^{10^{1403}}\). On relatively small lattices, even the third term makes a significant contribution to \(p_c\): it is larger than the first term when \(L < 10^{60}\) and larger than the second term when \(L < 10^{13}\).
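These magnitude comparisons are easy to check numerically. The sketch below is our own; it works with \(\log L\) directly (so that astronomically large L never has to be represented) and compares the sizes of the first two terms of (1.3) on either side of the crossover.

```python
import math

def first_two_terms(logL):
    """Magnitudes of the first two terms of (1.3), as functions of log L
    (natural logarithms throughout)."""
    ll = math.log(logL)        # log log L
    lll = math.log(ll)         # log log log L
    prefactor = ll / (12 * logL)
    return prefactor * ll, prefactor * 4 * lll

# Inside the range 66 < L < 10^2390 the second term exceeds the first:
t1, t2 = first_two_terms(100 * math.log(10))     # L = 10^100
# ...while for L = 10^3000 the first term dominates again:
s1, s2 = first_two_terms(3000 * math.log(10))
```
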

The “sharp transition” part of the argument has also been made rigorous: for the standard model, an application of the Friedgut–Kalai sharp-threshold theorem [7] tells us that the “\(\varepsilon \)-window of the transition” is

$$\begin{aligned} p_{1 - \varepsilon }([L]^2,{\mathcal {N}}_{\scriptscriptstyle (1,1)},2) -p_{\varepsilon }([L]^2,{\mathcal {N}}_{\scriptscriptstyle (1,1)},2) = O\left( \frac{\log \log L}{ {\log }^2 L}\right) . \end{aligned}$$

So the \(\varepsilon \)-window is much smaller than the second order asymptotics in (1.4).

For the anisotropic model a similar analysis [11] yields that the \(\varepsilon \)-window satisfies

$$\begin{aligned} p_{1- \varepsilon }([L]^2,{\mathcal {N}}_{\scriptscriptstyle (1,2)},3)-p_{\varepsilon }([L]^2,{\mathcal {N}}_{\scriptscriptstyle (1,2)},3) = O\left( \frac{\log ^3 \log L}{ \log ^2 L } \right) , \end{aligned}$$

which is again much smaller than the second and third order asymptotics in Theorem 1.1. So our analysis supports the above explanation of the bootstrap percolation paradox.

1.3 Universality

Recently, a very general family of bootstrap-type processes was introduced and studied by Bollobás et al. [14]. To define this family, let \({\mathcal {U}}= \{X_1,\ldots ,X_m\}\) be a finite collection of finite subsets of \({\mathbb {Z}}^d {\setminus } \{0\}\), and define the corresponding bootstrap operator by setting

$$\begin{aligned} {\mathcal {B}}_{\scriptscriptstyle {\mathcal {U}}}({\mathcal {S}}) \, = \, {\mathcal {S}}\cup \big \{ v \in {\mathbb {Z}}^d \,:\, v + X \subset {\mathcal {S}}\;\text { for some } X \in {\mathcal {U}}\big \} \end{aligned}$$

for every set \({\mathcal {S}}\subset {\mathbb {Z}}^d\). It is not hard to see that all of the bootstrap processes described above can be encoded by such an ‘update family’ \({\mathcal {U}}\), and in fact this definition is substantially more general. The key discovery of [14] was that in two dimensions the class of such monotone cellular automata can be elegantly partitioned into three classes, each with completely different behaviour. More precisely, for every two-dimensional update family \({\mathcal {U}}\), one of the following holds:

  • \({\mathcal {U}}\) is “supercritical” and has polynomial critical probability.

  • \({\mathcal {U}}\) is “critical” and has poly-logarithmic critical probability.

  • \({\mathcal {U}}\) is “subcritical” and has critical probability bounded away from zero.

We emphasise that the first two statements were proved in [14], but the third was proved slightly later, by Balister et al. [6]. Note that the critical class includes the two-neighbour, anisotropic and Duarte models (as well as many others, of course). For this class a much more precise result was recently obtained by Bollobás et al. [12]. In order to state this result, let us first (informally) define a two-dimensional update family to be “balanced” if its growth is asymptotically two-dimensional (like that of the two-neighbour model), and “unbalanced” if its growth is asymptotically one-dimensional (like that of the anisotropic and Duarte models). The following theorem was proved in [12].

Theorem 1.2

Let \({\mathcal {U}}\) be a critical two-dimensional bootstrap percolation update family. There exists \(\alpha = \alpha ({\mathcal {U}}) \in {\mathbb {N}}\) such that the following holds:

  (a)

    If \({\mathcal {U}}\) is balanced, then

    $$\begin{aligned} p_c\big ( {\mathbb {Z}}_L^2,{\mathcal {U}}\big ) = \Theta \bigg ( \frac{1}{(\log L)^{1/\alpha }} \bigg ). \end{aligned}$$
  (b)

    If \({\mathcal {U}}\) is unbalanced, then

    $$\begin{aligned} p_c\big ( {\mathbb {Z}}_L^2,{\mathcal {U}}\big ) = \Theta \bigg ( \frac{(\log \log L)^2}{(\log L)^{1/\alpha }} \bigg ). \end{aligned}$$

Theorem 1.2 thus justifies our view of the anisotropic model as a canonical example of an unbalanced model.

1.4 Internally filling a critical droplet

As usual in (critical) bootstrap percolation, the key step in the proof of Theorem 1.1 will be to obtain very precise bounds on the probability that a “critical droplet” R is internally filled (IF), i.e., that \(R \subset \langle {\mathcal {S}}\cap R \rangle \). We will prove the following bounds:

Theorem 1.3

Let \(p > 0\) and \(x,y \in {\mathbb {N}}\) be such that \(1/p^2 \leqslant x \leqslant 1/p^5\) and \(\frac{1}{3p} \log \frac{1}{p} \leqslant y \leqslant \frac{1}{p} \log \frac{1}{p}\), and let R be an \(x \times y\) rectangle. Then

$$\begin{aligned} {\mathbb {P}}_p\big ( R \text { is internally filled} \big ) \, = \, \exp \left( -\frac{1}{6p} \left( \log \frac{1}{p} \right) ^2 + \left( \frac{1}{3} \log \frac{8}{3\mathrm {e}} \pm o(1) \right) \frac{1}{p} \log \frac{1}{p} \right) . \end{aligned}$$

The alert reader may have noticed the following surprising fact: we obtain the first three terms of \(p_c( [L]^2, {\mathcal {N}}_{\scriptscriptstyle (1,2)}, 3 )\) in Theorem 1.1, despite only determining the first two terms of \(\log {\mathbb {P}}_p( R \text { is IF})\) in Theorem 1.3. We will show how to formally deduce Theorem 1.1 from Theorem 1.3 in Sect. 7, but let us begin by giving a brief outline of the argument.

To slightly simplify the calculations, let us write

$$\begin{aligned} C_1 := \frac{1}{12} \qquad \text { and }\qquad C_2 := \frac{1}{6} \log \frac{8}{3 \mathrm {e}}. \end{aligned}$$

We claim (and will later prove) that \(p_c = p_c( [L]^2, {\mathcal {N}}_{\scriptscriptstyle (1,2)}, 3 )\) is essentially equal to the value of p for which the expected number of internally filled critical droplets in \([L]^2\) is equal to 1 (the idea being that a critical droplet with size as in Theorem 1.3 will keep growing indefinitely with probability very close to one). We therefore have

$$\begin{aligned} L^2 \, \approx \, \exp \left( \frac{2C_1}{p_c} \left( \log \frac{1}{p_c} \right) ^2 - \frac{2C_2}{p_c} \log \frac{1}{p_c} \right) , \end{aligned}$$

and hence

$$\begin{aligned} p_c \, \approx \, \frac{C_1}{\log L} \left( \log \frac{1}{p_c} \right) ^2 - \frac{C_2}{\log L} \log \frac{1}{p_c}. \end{aligned}$$

Iterating the right-hand side gives

$$\begin{aligned} p_c \,\approx & {} \, \frac{C_1}{\log L} \left( \log \log L - 2 \log \log \frac{1}{p_c} - \log C_1 \right) ^2 \\&- \frac{C_2}{\log L} \left( \log \log L - 2 \log \log \frac{1}{p_c} \right) . \end{aligned}$$

Upon using the approximation \(\log \log (1/p_c) \approx \log \log \log L\) and multiplying out, this reduces to

$$\begin{aligned} p_c \, \approx \, \frac{C_1(\log \log L)^2}{\log L} - \frac{4C_1\log \log L \log \log \log L}{\log L} - \big ( C_2 + 2C_1\log C_1 \big ) \frac{\log \log L}{\log L}, \end{aligned}$$

which is what we hope to prove. Thus we obtain three terms for the price of two.
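The iteration above is also easy to carry out numerically. The sketch below is ours (the scale \(\log L = 10^6\) is an arbitrary choice, made so that L itself never has to be represented); it solves the fixed-point equation for \(p_c\) by iteration and checks that the three-term expansion tracks the fixed point better than the leading term alone.

```python
import math

LOG_L = 1e6                                 # arbitrary huge-L scale: log L
C1 = 1.0 / 12.0
C2 = math.log(8.0 / (3.0 * math.e)) / 6.0

def step(p):
    """One application of p <- C1 log^2(1/p)/log L - C2 log(1/p)/log L."""
    x = math.log(1.0 / p)
    return C1 * x * x / LOG_L - C2 * x / LOG_L

p = 1e-5
for _ in range(100):                        # iterate to (numerical) fixed point
    p = step(p)

ll = math.log(LOG_L)                        # log log L
lll = math.log(ll)                          # log log log L
one_term = C1 * ll ** 2 / LOG_L
three_term = (ll / (12 * LOG_L)) * (ll - 4 * lll + 2 * math.log(9 * math.e / 2))
```

The remaining discrepancy between `three_term` and the fixed point comes from the \(o(1)\) corrections dropped in the expansion.
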

1.5 A generalisation of the anisotropic model

One natural way to generalise the anisotropic model is to consider, for each \(b > a \geqslant 1\), the neighbourhood

$$\begin{aligned} {\mathcal {N}}_{\scriptscriptstyle (a,b)}\; = \; \left( \big \{ (0,y) \in {\mathbb {Z}}^2 : -a \leqslant y \leqslant a \big \} \cup \big \{ (x,0) \in {\mathbb {Z}}^2 : -b \leqslant x \leqslant b \big \}\right) {\setminus } \{ (0,0) \}. \end{aligned}$$

It follows from Theorem 1.2 that

$$\begin{aligned} p_c\big ( {\mathbb {Z}}_L^2, {\mathcal {N}}_{\scriptscriptstyle (a,b)}, r \big ) \, = \, \Theta \bigg ( \frac{(\log \log L)^2}{(\log L)^{1/\alpha }} \bigg ), \end{aligned}$$

where \(\alpha = r - b\), for each \(b +1 \leqslant r \leqslant a + b\). The arguments developed in [23] can be applied to prove that the leading order behaviour of \(p_c\) for the (1, b)-model is

$$\begin{aligned} p_c\big ( [L]^2, {\mathcal {N}}_{\scriptscriptstyle (1,b)}, b+1 \big ) \, = \, \bigg ( \frac{(b-1)^2}{4(b+1)} \pm o(1) \bigg ) \frac{(\log \log L)^2}{\log L}. \end{aligned}$$

Combining the techniques of [23] with those introduced in this paper, it is possible to prove the following stronger bounds:

Theorem 1.4

Given \(b \geqslant 2\), set

$$\begin{aligned} C(b) = \frac{2}{b-1} \log \left( \frac{{2b \atopwithdelims ()b} - \frac{2b-1}{b+1} {2b-2 \atopwithdelims ()b} -1 + {2b \atopwithdelims ()b-1} - {2b-2 \atopwithdelims ()b-3}}{(b+1) \mathrm {e}}\right) + 2 \log \left( \frac{(b-1)^2}{4 (b+1)}\right) . \end{aligned}$$

Then

$$\begin{aligned}&p_c\big ( [L]^2, {\mathcal {N}}_{\scriptscriptstyle (1,b)}, b+1 \big ) \nonumber \\&\quad = \frac{(b-1)^2}{4(b+1)} \frac{\log \log L}{\log L} \Big ( \log \log L - 4 \log \log \log L - C(b) \pm o(1) \Big ). \end{aligned}$$
(1.5)

Note that in the case \(b = 2\) this reduces to Theorem 1.1. We remark that Theorem 1.4 follows from a corresponding generalisation of Theorem 1.3, with the constants \(\frac{1}{6}\) and \(\frac{1}{3} \log \frac{8}{3\mathrm {e}}\) replaced by

$$\begin{aligned} \frac{(b-1)^2}{2(b+1)} \qquad \text { and } \qquad \frac{b-1}{b+1} \log \left( \frac{{2b \atopwithdelims ()b} - \frac{2b-1}{b+1} {2b-2 \atopwithdelims ()b} -1 + {2b \atopwithdelims ()b-1} - {2b-2 \atopwithdelims ()b-3}}{(b+1)\mathrm {e}}\right) , \end{aligned}$$

respectively.
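As a sanity check (ours, not from the paper), one can verify numerically that the constant \(C(b)\) of Theorem 1.4 reduces at \(b=2\) to the constant of Theorem 1.1, i.e. that \(C(2) = -2\log \frac{9\mathrm {e}}{2}\) (the sign flips because \(C(b)\) enters (1.5) with a minus while \(2\log \frac{9\mathrm {e}}{2}\) enters (1.3) with a plus). The count (1.6), which equals 8 for \(b=2\), is passed in explicitly.

```python
import math

def C_const(b, K):
    """C(b) from Theorem 1.4, with K the number of growth configurations (1.6)."""
    return (2.0 / (b - 1)) * math.log(K / ((b + 1) * math.e)) \
        + 2.0 * math.log((b - 1) ** 2 / (4.0 * (b + 1)))

# For b = 2, formula (1.6) gives K = 8, and C(2) = -2 log(9e/2):
value = C_const(2, 8)
```
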

We will not prove Theorem 1.4, since the proof is conceptually the same as that in the case \(b = 2\), but requires several straightforward but lengthy calculations that might obscure the key ideas of the proof. It is, however, not too hard to see where the numerical factors come from:

A droplet grows horizontally in the (1, b)-model as long as every group of \(b+1\) consecutive columns to its left and/or right contains at least one infected site. And it grows vertically as long as there are b sites in a “growth configuration” somewhere above and/or below. There are

$$\begin{aligned} {2b \atopwithdelims ()b} - \frac{2b-1}{b+1} {2b-2 \atopwithdelims ()b} -1 + {2b \atopwithdelims ()b-1} - {2b-2 \atopwithdelims ()b-3} \end{aligned}$$
(1.6)

such configurations. Indeed, there are \({2b \atopwithdelims ()b}\) different ways of finding b infected sites inside \({\mathcal {N}}_{\scriptscriptstyle (1,b)} {\setminus } (\{(0,-1) ,(0,1)\})\). Of these, \(\sum _{i=2}^b {2b-i \atopwithdelims ()b} +1 = \frac{2b-1}{b+1} {2b-2 \atopwithdelims ()b} +1\) are right-shifts of another configuration (e.g. for \({\mathcal {N}}_{\scriptscriptstyle (1,2)}\) the choices \(\bullet \circ 0 \bullet \circ \) and \(\circ \bullet 0 \circ \bullet \) count as a single growth configuration), so their contribution must be subtracted. If (0, 1) is occupied, there are \({2b \atopwithdelims ()b-1}\) ways of placing the other \(b-1\) sites in \({\mathcal {N}}_{\scriptscriptstyle (1,b)} {\setminus } (\{(0,-1) ,(0,1)\})\). None of these are shift invariant, but some of them cannot grow to fill the entire row. Indeed, when \(b \geqslant 3\), configurations where \((-b,0), (0,1),\) and (b, 0) are infected do not cause the row to fill up. Therefore, we must subtract \({2b-2 \atopwithdelims ()b-3}\). This explains (1.6). See Fig. 2 for growth configurations of the case \(b = 2\). Finally, it takes \(b-1\) more infected sites for a rectangle to grow a row than it does to grow a column, which explains the remaining factors \(b-1\) in (1.5).
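The counting argument above is easy to verify by computer. The following sketch (ours) evaluates (1.6), using a binomial coefficient that vanishes outside \(0 \leqslant k \leqslant n\) so that the term \(\binom{2b-2}{b-3}\) correctly contributes 0 for \(b = 2\).

```python
from math import comb

def growth_configs(b):
    """Evaluate formula (1.6): the number of vertical growth configurations
    in the (1,b)-model, up to shifts."""
    def C(n, k):  # binomial coefficient, zero outside 0 <= k <= n
        return comb(n, k) if 0 <= k <= n else 0
    return (C(2 * b, b) - (2 * b - 1) * C(2 * b - 2, b) // (b + 1) - 1
            + C(2 * b, b - 1) - C(2 * b - 2, b - 3))
```

For \(b=2\) this returns 8, matching the 8 pairs of Fig. 2; the identity \(\sum _{i=2}^b \binom{2b-i}{b} = \frac{2b-1}{b+1}\binom{2b-2}{b}\) used above can be checked the same way.
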

Fig. 2

On the left: the three relative positions of an infected site that can cause horizontal growth of the grey rectangle. On the right: the 8 pairs of sites (up to shifts) that can cause vertical growth of the grey rectangle

1.6 Comparison with simulations

One might be tempted to hope that the third-order approximation of \(p_c\) in Theorem 1.1 is reasonably good already for lattices that a computer might be able to handle. Simulations indicate that this is not the case. Indeed, for lattices with \(L \leqslant 10{,}000\), the third-order approximation is even farther from the simulated values than the first-order approximation (and recall that the second-order approximation is negative here). We believe that this should not be surprising, because it is not at all obvious that the fourth-order term should be significantly smaller: careful inspection of our proof suggests that the \(o(\frac{1}{p} \log \frac{1}{p})\) term in Theorem 1.3 is at most \(O(\frac{1}{p} \log \log \frac{1}{p})\). Although we do not prove this, we have no reason to believe that a correction term of that order does not exist. Even if we suppose that the third-order correction in Theorem 1.3 can be sharply bounded by \(C_3 /p\), say, so that we would have the bound

$$\begin{aligned} {\mathbb {P}}_p\big ( R \text { is internally filled} \big ) \, {\mathop {=}\limits ^{?}} \, \exp \left( -\frac{2C_1}{p} \left( \log \frac{1}{p} \right) ^2 + \frac{2 C_2}{p} \log \frac{1}{p} + \frac{C_3 \pm o(1)}{p} \right) , \end{aligned}$$

for critical droplets instead, then a computation like the one in Sect. 1.4 above suggests that this would yield

$$\begin{aligned} p_c([L]^2,{\mathcal {N}}_{\scriptscriptstyle (1,2)},3)&{\mathop {=}\limits ^{?}} \frac{\left( \log \log L\right) ^2}{12\log L} \, - \, \frac{(\log \log L )( \log \log \log L)}{ 3\log L} \\&\quad + \frac{\left( \log \frac{9}{2}+1\right) \log \log L}{6\log L} \\&\quad + \frac{(\log \log \log L)^2}{3 \log L} - \frac{(\log \frac{9}{2} +1) \log \log \log L}{3\log L}\\&\quad + \frac{C_3 + \log \frac{1}{12} \left( 2 + \log \frac{27}{16}\right) \pm o(1)}{12 \log L}, \end{aligned}$$

so the fourth, fifth, and sixth order terms of \(p_c\) would also be comparable to the first for moderately sized lattices. Moreover, because of the extremely slow decay of these correction terms (e.g. \((\log \log 10^{10})^2 \approx 10\)), it might be too optimistic to expect that one would be able to determine \(C_3\) by fitting to the simulated values of \(p_c\), if indeed \(C_3\) exists.

1.7 Comparison with the two-neighbour model

Comparing Theorem 1.1 with the analogous result for the two-neighbour model, (1.4), it may seem remarkable how much sharper the former is than the latter. We believe the following heuristic discussion goes some way towards explaining this difference.

Both approximations of \(p_c\) are proved using essentially the same critical droplet heuristic described above. Once a critical droplet has formed, the entire lattice will easily fill up. But filling a droplet-sized area is exponentially unlikely: it is essentially a large deviations event. The theory of large deviations tells us that if a rare event occurs, it will occur in the most probable way that it can. For filling a droplet, this means that one should find an optimal “growth trajectory”: a sequence of dimensions through which a very small infected area (a “seed”) steadily grows to fill up the entire droplet. For the anisotropic model, in [23], the first and second authors determined this trajectory to be close to \(x = \frac{\mathrm {e}^{3py}}{3p}\), where x and y denote the horizontal and vertical dimensions of the seed as it grows. This approximation was enough to yield the first term of \(p_c\). In the current paper we establish tighter bounds on the optimal trajectory around \(x = \frac{\mathrm {e}^{3py}}{3p}\), allowing us to give the sharper estimate for the probability of filling a droplet in Theorem 1.3. As we showed in Sect. 1.4 above, this correction is enough to obtain the first three terms of \(p_c\) for the anisotropic model.

For the two-neighbour model, however, finding this optimal growth trajectory is not at all the challenge: by symmetry it is trivially \(x=y\). The correction to \(p_c\) that Gravner, Holroyd, and Morris determined in [28, 30, 37] is instead due to the much smaller entropic effect of random fluctuations around this trajectory (see also the introduction of [29] for a more detailed explanation of this effect). We believe that such fluctuations also influence \(p_c\) for the anisotropic model, but that their effect will be much smaller than the improvements that can still be made in controlling the precise shape of the optimal growth trajectory.

1.8 About the proofs

The proof of Theorem 1.1 uses a rigorisation of the iterative determination of \(p_c\) in Sect. 1.4 above, combined with Theorem 1.3 and the classical argument of Aizenman and Lebowitz [4].

The lower bound in Theorem 1.3 is a refinement of the computation in [23].

Most of the work of this paper goes into the proof of the upper bound of Theorem 1.3. Like many recent entries in the bootstrap percolation literature, our proof centres on the “hierarchies” argument of Holroyd [33]. In particular, we sharpen the argument of [23] by incorporating the idea of “good” and “bad” hierarchies from [30], and by using very precise bounds on the horizontal and vertical growth of infected rectangular regions.

The main new contributions of this paper (besides the iterative determination of \(p_c\)) can be found in Sects. 3 and 6.

In Sect. 3, we introduce the notion of spanning time (Definition 3.3), which characterises to a large extent the structure of configurations of vertical growth. We show that if the spanning time is 0, then such structures have a simple description in terms of paths of infected sites, whereas if the spanning time is not 0, then this description can still be given in terms of paths, but these paths now also involve more complex arrangements of infected sites. We call such arrangements infectors (Definition 3.7), and show that they are sufficiently rare that their contribution does not dominate the probability of vertical growth.

In Sect. 6 we generalise the variational principle of Holroyd [33] to a more general class of growth trajectories. This part of the proof is intended to be more widely applicable than the current anisotropic case, and is set up to allow for precise estimates.

1.9 Notation and definitions

A rectangle \([a,b] \times [c,d]\) is the set of sites in \({\mathbb {Z}}^2\) contained in the Euclidean rectangle \([a,b] \times [c,d]\). For a finite set \({\mathcal {Q}}\subset {\mathbb {Z}}^2\), we denote its dimensions by \(({\mathbf {x}}({\mathcal {Q}}), {\mathbf {y}}({\mathcal {Q}}))\), where \({\mathbf {x}}({\mathcal {Q}}) = \max \{a_1 - b_1 +1 \, : \, (a_1,a_2), (b_1, b_2) \in {\mathcal {Q}}\}\), and similarly, \({\mathbf {y}}({\mathcal {Q}}) = \max \{a_2 - b_2 +1\, : \, (a_1,a_2), (b_1, b_2) \in {\mathcal {Q}}\}\). So in particular, a rectangle \(R = [a,b] \times [c,d]\) has dimensions \(({\mathbf {x}}(R), {\mathbf {y}}(R)) = (|[a,b]\, \cap \, {\mathbb {Z}}|, |[c,d] \,\cap \,{\mathbb {Z}}|)\). Oftentimes, the quantities that we calculate will only depend on the size of R, and be invariant with respect to the position of R. In such cases, when there is no possible confusion, we will write R with \({\mathbf {x}}(R)=x\) and \({\mathbf {y}}(R)=y\) as \([x] \times [y]\). A row of R is a set \(\{(m,n) \in R \,:\, n=n_0\}\) for some fixed \(n_0\). A column is similarly defined as a set \(\{(m,n) \in R \,:\, m =m_0\}\). We sometimes write \([a,b] \times \{c\}\) for the row \(\{ (m,c) \in {\mathbb {Z}}^2 \, :\, m \in [a,b] \cap {\mathbb {Z}}\}\), and use similar notation for columns.
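In code (an illustrative sketch of ours), the dimensions of a finite set of sites are simply the side lengths of its bounding box:

```python
def dims(Q):
    """Return (x(Q), y(Q)): the dimensions of a finite set Q of sites in Z^2."""
    xs = [a1 for (a1, a2) in Q]
    ys = [a2 for (a1, a2) in Q]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

# The rectangle [2,6] x [3,4] has dimensions (5, 2):
R = {(m, n) for m in range(2, 7) for n in range(3, 5)}
```
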

We say that a rectangle \(R = [a,b] \times [c,d]\) is horizontally traversable (hor-trav) by a configuration \({\mathcal {S}}\) if

$$\begin{aligned} R \subset \langle (R \cap {\mathcal {S}}) \cup ([a-2,a-1] \times [c,d])\rangle . \end{aligned}$$

That is, R is horizontally traversable if the rectangle becomes infected when the two columns to its left are completely infected. Under \({\mathbb {P}}_p\), this event has the same probability as the event that \(R \subset \langle (R \cap {\mathcal {S}}) \cup ([b+1, b+2] \times [c,d])\rangle \), and more importantly, it is equivalent to the event that R does not contain three or more consecutive columns without any infected sites and the rightmost column contains an infected site.
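This combinatorial criterion is straightforward to encode. The sketch below is ours; each column of R is represented by a boolean recording whether it contains an infected site.

```python
def hor_traversable(cols):
    """Criterion stated above: no run of three or more consecutive columns
    without an infected site, and the rightmost column has an infected site."""
    if not cols or not cols[-1]:
        return False
    empty_run = 0
    for has_infected in cols:
        empty_run = 0 if has_infected else empty_run + 1
        if empty_run >= 3:
            return False
    return True
```
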

We say that R is up-traversable (up-trav) by \({\mathcal {S}}\) if

$$\begin{aligned} R \subset \langle (R \cap {\mathcal {S}}) \cup ([a,b] \times \{c-1\})\rangle . \end{aligned}$$

That is, R becomes entirely infected when all sites in the row directly below R are infected. Similarly, we say that R is down-traversable by \({\mathcal {S}}\) if \(R \subset \langle (R \cap {\mathcal {S}}) \cup ([a,b] \times \{d+1\})\rangle \). Again, under \({\mathbb {P}}_p\) up and down traversability are equiprobable, so we will only discuss up-traversability. If \({\mathcal {S}}\) is a random site percolation, then we simply say that R is horizontally- or up- or down-traversable.

Given rectangles \(R \subset R'\) we write \(\left\{ R \Rightarrow R' \right\} \) for the event that the dynamics restricted to \(R'\) eventually infect all sites of \(R'\) if all sites in R are infected, i.e., for the event that \(R' = \langle ({\mathcal {S}}\cap R') \cup R\rangle \).

We will frequently make use of two standard correlation inequalities: The first is the Fortuin–Kasteleyn–Ginibre inequality (FKG-inequality), which states that for increasing events A and B, \({\mathbb {P}}_p(A \cap B) \geqslant {\mathbb {P}}_p(A) {\mathbb {P}}_p(B)\). The second is the van den Berg–Kesten inequality (BK-inequality), which states that for increasing events A and B, \({\mathbb {P}}_p(A \circ B) \leqslant {\mathbb {P}}_p(A) {\mathbb {P}}_p(B)\), where \(A \circ B\) means that A and B occur disjointly (see [32, Chapter 2] for a more in-depth discussion).

1.10 The structure of this paper

In Sect. 2 we state two key bounds, Lemmas 2.2 and 2.3, giving primarily lower bounds on the probabilities of horizontal and vertical growth of an infected rectangular region, and we use them to prove the lower bound of Theorem 1.3. In Sect. 3 we prove a complementary upper bound on the vertical growth of infected rectangles, Lemma 3.1. In Sect. 4 we prove Lemma 4.1, which combines the upper bounds on horizontal and vertical growth from Lemmas 2.2 and 3.1. This lemma is crucial for the upper bound of Theorem 1.3. We prove the upper bound of Theorem 1.3 in Sect. 5, subject to a variational principle, Lemma 5.9, that we prove in Sect. 6. Finally, in Sect. 7 we use Theorem 1.3 to prove Theorem 1.1.

2 The lower bound of Theorem 1.3

Recall that \(C_1 = \frac{1}{12}\) and \(C_2 = \frac{1}{6} \log \frac{8}{3 \mathrm {e}}\).

Proposition 2.1

Let \(p>0\), and let \(\frac{1}{p^2} \leqslant x \leqslant \frac{1}{p^5}\) and \(\frac{1}{3p} \log \frac{1}{p} \leqslant y \leqslant \frac{1}{p^5}\). Then

$$\begin{aligned} {\mathbb {P}}_p([x] \times [y] \text { is IF}) \; \geqslant \; \exp \left( -\frac{2C_1}{p} \log ^2 \frac{1}{p} + \left( 2C_2 - o(1) \right) \frac{1}{p} \log \frac{1}{p} \right) . \end{aligned}$$

Note that the upper bound on y is different from the bound in Theorem 1.3.

For the proof it suffices to exhibit a subset of initial configurations whose probability is at least the claimed bound. We choose configurations that follow a typical “growth trajectory”: configurations that contain a small area that is locally densely infected (a seed). We bound the probability that such a seed will grow a bit (which is likely), and then a lot more (which is exponentially unlikely), until the infected region reaches a size where the growth is again very likely, because the boundary of the infected region is large and the dynamics depend only on the existence of infected sites on the boundary, not on their number.

To prove this proposition we will need bounds on the probability that a rectangle becomes infected in the presence of a large infected cluster on its boundary. We state two lemmas that achieve this, which are improvements upon [23, Lemmas 2.1 and 2.2].

Lemma 2.2

For any rectangle \([x] \times [y]\),

$$\begin{aligned} \mathrm {e}^{ -x f(p,y)} \; \leqslant \; {\mathbb {P}}_p\left( [x] \times [y] \text { is hor-trav} \right) \; \leqslant \; \mathrm {e}^{-(x-2)f(p,y)}, \end{aligned}$$

where \(f(p,y) \,{:=}\, -\log ( \alpha (1-(1-p)^y))\) and where \(\alpha (u)\) is the positive root of the polynomial

$$\begin{aligned} X^3-uX^2-u(1-u)X-u(1-u)^2. \end{aligned}$$
(2.1)

Moreover, \(f(p,y)\) satisfies the following bounds:

  1. (a)

    when \(p\rightarrow 0\) and \(py\rightarrow \infty \),

    $$\begin{aligned} f(p,y)=\mathrm {e}^{-3py}+\Theta (\mathrm {e}^{-4py}), \end{aligned}$$
  2. (b)

    when \(y \geqslant \frac{2}{p} \log \log \frac{1}{p}\),

    $$\begin{aligned} f(p,y)=\mathrm {e}^{-3py}\left( 1+\Theta \left( \log ^{-2} (1/p)\right) \right) , \end{aligned}$$
  3. (c)

    when \(p \rightarrow 0\), \(y \rightarrow \infty \), and \((1-p)^y \rightarrow 1\),

    $$\begin{aligned} f(p,y) \geqslant \tfrac{1}{2} p y - 3 p^2 y^2. \end{aligned}$$

Proof

From [23, Lemma 2.1] we know that

$$\begin{aligned} \alpha \left( 1-(1-p)^y \right) ^{x} \;\leqslant \; {\mathbb {P}}_p\left( [x] \times [y] \text { is hor-trav} \right) \; \leqslant \; \alpha \left( 1-(1-p)^y \right) ^{x-2}. \end{aligned}$$

When u is close to 1, \(X = \mathrm {e}^{-(1-u)^3}\) is an approximate solution for the positive root, since

$$\begin{aligned} \mathrm {e}^{-3(1-u)^3} -u\mathrm {e}^{-2(1-u)^3} -u(1-u) \mathrm {e}^{-(1-u)^3} -u(1-u)^2 = \Theta ((1-u)^4 ). \end{aligned}$$

So, as \(p \rightarrow 0\) and \(py \rightarrow \infty \),

$$\begin{aligned} -\log \alpha (1-(1-p)^y) = (1-p)^{3y} + \Theta ((1-p)^{4y}) = \mathrm {e}^{-3py} + \Theta (\mathrm {e}^{-4py}). \end{aligned}$$

This establishes (a), and (b) follows immediately.

To prove (c), recall Rouché’s Theorem (see e.g. [39, Theorem 10.43]), which states that if two functions g(z) and h(z) are holomorphic on a bounded region \(U \subset {\mathbb {C}}\) with continuous boundary \(\partial U\) and satisfy \(|g(z) - h(z)| < |g(z)|\) for all \(z \in \partial U\), then g and h have an equal number of roots on U. Applying Rouché’s Theorem with \(h(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3\) and \(g(z) = a_0\), it follows that the moduli of the roots of h(z) are all bounded from below by \(|a_0| /(|a_0| + \max \{|a_1|, |a_2|, |a_3|\})\). Applying this bound to (2.1) we find that when \(u > 0\) is sufficiently small,

$$\begin{aligned} \alpha (u) \geqslant \frac{u(1-u)^2}{u(1-u)^2+1} \geqslant u -3 u^2, \end{aligned}$$

where the second inequality follows from a series expansion around \(u=0\). (We remark that an explicit computation gives \(\alpha (u) \geqslant u -3u^2\) for all \(u >0\), but without relying on a computer this may take several pages to verify.) Since we assumed \((1-p)^y \rightarrow 1\), the argument \(u = 1-(1-p)^y\) tends to 0, so \(\alpha (u) \rightarrow 0\) and hence \(f(p,y) = -\log \alpha (u) \geqslant \alpha (u)\). We thus have

$$\begin{aligned} f(p,y) \geqslant (1-(1-p)^y) - 3(1-(1-p)^y)^2 \geqslant \tfrac{1}{2} py - 3 p^2 y^2, \end{aligned}$$

where we used \(\frac{1}{2} py \leqslant 1-(1-p)^y \leqslant py\), which holds since \(py \rightarrow 0\). \(\square \)
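The approximations in this proof are easy to sanity-check numerically. The following sketch (an illustration, not part of the proof) computes the positive root \(\alpha (u)\) of (2.1) by bisection and checks both the asymptotics \(-\log \alpha (u) \approx (1-u)^3\) for u near 1 and the bound \(\alpha (u) \geqslant u - 3u^2\) for small u.

```python
import math

# Sketch: compute alpha(u), the unique positive root of
#   X^3 - u X^2 - u(1-u) X - u(1-u)^2,
# by bisection.  For u in (0,1) the polynomial has exactly one positive
# root (one sign change), with f(0) < 0 < f(2), so bisection applies.
def alpha(u, tol=1e-12):
    f = lambda X: X**3 - u * X**2 - u * (1 - u) * X - u * (1 - u)**2
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

For u = 0.99 one finds that \(-\log \alpha (u)\) agrees with \((1-u)^3\) to within a few percent, and for u = 0.01 the root comfortably exceeds \(u - 3u^2\).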

Lemma 2.3

  1. (a)

    If \(p^2x\) is sufficiently small, then we have, for any rectangle \([x] \times [y]\),

    $$\begin{aligned} {\mathbb {P}}_p\left( [x] \times [y] \text { is up-trav} \right) \; \geqslant \; \exp \Big ( y \log (8p^2x)\big (1+O(p^2x + p)\big ) \Big ). \end{aligned}$$
  2. (b)

    As long as \( \frac{8 p^2 x}{5} \leqslant 1\) we have

    $$\begin{aligned} {\mathbb {P}}_p([x] \times [y] \text { is up-trav}) \; \geqslant \; \left( \frac{8 p^2 x}{5 \mathrm {e}} \right) ^y. \end{aligned}$$

Proof

We say that a rectangle R is North-traversable (N-trav) if the intersection of every row with R contains a site \((a,b)\) such that \(((a,b) + {\mathcal {N}}_{\scriptscriptstyle (1,2)}){\setminus } \{(a, b-1)\}\) contains at least two infected sites. Observe that North-traversability implies up-traversability, so

$$\begin{aligned} {\mathbb {P}}_p([x] \times [y] \text { is up-trav}) \geqslant {\mathbb {P}}_p([x] \times [y] \text { is N-trav}). \end{aligned}$$

We can similarly define South-traversability by requiring that the intersection of every row with R contains a site \((a,b)\) such that \(((a,b) + {\mathcal {N}}_{\scriptscriptstyle (1,2)}){\setminus } \{(a, b+1)\}\) contains at least two infected sites. South-traversability implies down-traversability. From a probabilistic point of view North- and South-traversability are equivalent, so we will henceforth only discuss North-traversability.

If \([x] \times [y]\) is North-traversable then for each of the y rows there must exist an infected pair of sites u and v and a site z in the row such that \(u,v \in z+ {\mathcal {N}}_{\scriptscriptstyle (1,2)}\). By the FKG inequality we thus have the lower bound

$$\begin{aligned} {\mathbb {P}}_p([x] \times [y] \text { is N-trav}) \geqslant {\mathbb {P}}_p(\exists \hbox { an infected pair for a row of length}\ x)^y. \end{aligned}$$

For the proof of (a) we apply Janson’s inequality [35]. The expected number of infected pairs immediately above an infected rectangle of width x is at least \(\mu ~=~(8x - 16)p^2\). To see this, consider that up to translations there are 8 possible pairs of infected sites above the rectangle that can infect the whole row, see Fig. 2 above. The correlation term \(\Delta \) in Janson’s inequality satisfies \(\Delta = O(p^3x) \ll \mu \), so the probability that some pair is infected is at least

$$\begin{aligned} 1 - \exp \left( -\mu + \Delta /2 \right) \; \geqslant \; \left( 8p^2 x-O(p^3x + p^4x^2) \right) , \end{aligned}$$

using the inequality \(1 - \mathrm {e}^{-t} \geqslant t - t^2\) for \(t \geqslant 0\).

For the proof of (b) we use a cruder approximation: For \((a,b) \in [x] \times [y]\) let \(A_{(a,b)}\) be the event that \((a,b)\) is the leftmost site of an infected pair as in Fig. 2. These pairs all have width at most 5, so the probability that a row of length x does not have an infected pair can be bounded from above by

$$\begin{aligned} (1-8p^2)^{\lfloor x /5 \rfloor } \leqslant \exp \left( -\frac{8 p^2 x}{5}\right) \leqslant 1-\frac{8 p^2 x}{5 \mathrm {e}} \end{aligned}$$

when \( \frac{8 p^2 x}{5} \leqslant 1\). The claim follows. \(\square \)
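The count of eight pair shapes can itself be recovered by brute force. The sketch below (illustration only; it again assumes the rule \({\mathcal {N}}_{\scriptscriptstyle (1,2)} = \{(0,\pm 1),(\pm 1,0),(\pm 2,0)\}\) with threshold 3) enumerates all pairs inside \(z + ({\mathcal {N}}_{\scriptscriptstyle (1,2)} {\setminus } \{(0,-1)\})\), keeps those that infect the whole row when the row below is fully infected, and counts the distinct shapes up to translation.

```python
from itertools import combinations

# Sketch: recover the "8 possible pairs" of Fig. 2 by simulation,
# assuming the anisotropic rule N_(1,2) with threshold 3.
N12 = [(0, 1), (0, -1), (1, 0), (-1, 0), (2, 0), (-2, 0)]
OFFSETS = [(0, 1), (1, 0), (-1, 0), (2, 0), (-2, 0)]  # N_(1,2) \ {(0,-1)}

def closure(initial, domain):
    infected, changed = set(initial) & domain, True
    while changed:
        changed = False
        for v in domain - infected:
            if sum((v[0] + dx, v[1] + dy) in infected for dx, dy in N12) >= 3:
                infected.add(v)
                changed = True
    return infected

L, z = 20, (10, 1)
row_below = {(a, 0) for a in range(L)}   # fully infected row underneath
row = {(a, 1) for a in range(L)}         # the row to be spanned
domain = {(a, b) for a in range(-3, L + 3) for b in range(4)}

shapes = set()
for o1, o2 in combinations(OFFSETS, 2):
    pair = {(z[0] + o1[0], z[1] + o1[1]), (z[0] + o2[0], z[1] + o2[1])}
    if row <= closure(row_below | pair, domain):  # the pair spans the row
        u, v = sorted(pair)
        shapes.add((v[0] - u[0], v[1] - u[1]))    # shape up to translation
```

Up to translation the candidate pairs reduce to eight shapes, matching the factor 8 in \(\mu \).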

Proof of Proposition 2.1

We start by constructing a seed. Let \(r\, {:=} \,\lfloor \frac{2}{p} \log \log \frac{1}{p} \rfloor \) and infect sites (1, 2i) and \((2,2i+1)\) for \(2i\le r\). The probability that a rectangle \([2] \times [r]\) is a seed is \(p^r\). Note that the infected sites internally fill \([2]\times [r]\).

The growth of the seed to a rectangle of arbitrary size can be divided into three stages:

Stage 1. By Lemma 2.2(a) the probability of finding a seed of size r that will grow to size \(\left[ \mathrm {e}^{3 rp}/(3p)\right] \times [r]\) is about the same as the probability of just finding the seed, i.e.,

$$\begin{aligned} p^{r} \cdot \exp \left( - \frac{\mathrm {e}^{3rp}}{3p}\cdot \left( \mathrm {e}^{-3 r p}+O\left( \mathrm {e}^{-4 r p}\right) \right) \right) \geqslant p^r \mathrm {e}^{-O(1/p)}. \end{aligned}$$
(2.2)

Stage 2. Next we bound the probability that the infected rectangle grows to size

$$\begin{aligned} R\, {:=} \,\left[ \frac{1}{3 p^2} \right] \times \left[ \frac{1}{3p} \log \frac{1}{p} \right] , \end{aligned}$$

that is, we want to bound

$$\begin{aligned} {\mathbb {P}}_p\left( \left[ \frac{\mathrm {e}^{3rp}}{3p}\right] \times \left[ r \right] \Rightarrow \left[ \frac{\mathrm {e}^{3mp}}{3p}\right] \times \left[ m \right] \right) , \end{aligned}$$
(2.3)

where \(m\,{:=} \,\frac{1}{3p} \log \frac{1}{p}\). This is the bottleneck for the growth dynamics. We bound (2.3) by considering the growth in many small steps. In each such step, the rectangle will either infect an entire row above or below it, or it will infect an entire column to the left or right of it (with the help of infected sites on the boundary of the rectangle). Because vertical growth is less probable than horizontal growth, we will consider sequences where the rectangle grows by one vertical step, from height \(\ell \) to \(\ell +1\), followed by horizontal growth that infects many columns successively, with the rectangle growing from width \(x_\ell \) to \(x_{\ell +1}\), where \( x_\ell \,{:=}\, \frac{\mathrm {e}^{3 \ell p}}{3p}\). That this choice is close to optimal can be seen in Sect. 6 below, where a variational principle for the upper bound of Theorem 1.3 is derived.

Having divided the growth into steps, we can bound (2.3) from below using the FKG-inequality:

$$\begin{aligned}&{\mathbb {P}}_p\Biggl (\left[ \frac{\mathrm {e}^{3rp}}{3p}\right] \times \left[ r \right] \Rightarrow \left[ \frac{\mathrm {e}^{3mp}}{3p}\right] \times \left[ m \right] \Biggr ) \nonumber \\&\quad \geqslant \prod _{\ell =r}^{m} {\mathbb {P}}_p\left( \left[ x_\ell \right] \times \left[ \ell \right] \Rightarrow \left[ x_{\ell +1}\right] \times \left[ \ell \right] \right) \nonumber \\&\qquad \times \prod _{\ell =r}^{m-r} {\mathbb {P}}_p\left( \left[ x_\ell \right] \times \left[ \ell \right] \Rightarrow \left[ x_\ell \right] \times \left[ \ell +1\right] \right) \nonumber \\&\qquad \times \prod _{\ell =m-r+1}^{m} {\mathbb {P}}_p\left( \left[ x_\ell \right] \times \left[ \ell \right] \Rightarrow \left[ x_{\ell }\right] \times \left[ \ell +1\right] \right) . \end{aligned}$$
(2.4)

We bound these three products separately.

It follows from Lemma 2.2(a) that the horizontal growth from width \(x_\ell \) to \(x_{\ell +1}\) occurs with probability approximately \(1/\mathrm {e}\), i.e.,

$$\begin{aligned} {\mathbb {P}}_p\big ( [x_\ell ] \times [\ell ]&\Rightarrow [x_{\ell +1}] \times [\ell ]\big ) \nonumber \\&\geqslant \exp \Big (-\frac{1}{3p} \big ( \mathrm {e}^{ 3 (\ell +1)p } - \mathrm {e}^{3 \ell p } \big ) \mathrm {e}^{- 3 \ell p }\big (1+O\big (\log ^{-4/3} (1/p)\big )\big )\Big ) \nonumber \\&\geqslant \mathrm {e}^{-1 - o(1)}. \end{aligned}$$
(2.5)

Therefore,

$$\begin{aligned} \prod _{\ell =r}^{m} {\mathbb {P}}_p\big ( [x_\ell ] \times [\ell ] \Rightarrow [x_{\ell +1}] \times [\ell ]\big ) \geqslant \mathrm {e}^{-m (1+o(1))}. \end{aligned}$$
(2.6)

When \(\ell \leqslant m-r\), then \(p^2 x_\ell \leqslant \log ^{-2} \frac{1}{p}\), so we can apply Lemma 2.3(a) to bound

$$\begin{aligned} {\mathbb {P}}_p([x_\ell ] \times [\ell ] \Rightarrow [x_\ell ] \times [\ell +1]) \geqslant 8 p^2 x_\ell \, \mathrm {e}^{O\left( \log ^{-4/3} \frac{1}{p}\right) }. \end{aligned}$$

Therefore we can bound the second product in (2.4) from below by

$$\begin{aligned}&\prod _{\ell =r}^{m-r} 8 p^2 x_\ell \mathrm {e}^{O\left( \log ^{-4/3} \frac{1}{p}\right) } \nonumber \\&\quad \geqslant \left( \frac{8 p}{3}\right) ^{m-2r} \exp \left( 3p \sum _{\ell =r}^{m-r} \ell \right) \mathrm {e}^{(m-2r)O\left( \log ^{-4/3} \frac{1}{p}\right) }\nonumber \\&\quad = \left( \frac{8 p}{3}\right) ^{m-2r} \exp \left( \frac{3p}{2} \big ((m-r) (m-r+1) - (r-1)r) \big ) \right) \mathrm {e}^{o\left( m \right) }\nonumber \\&\quad = p^{-r} \left( \frac{8 p}{3}\right) ^{m-r} \exp \left( \frac{3p}{2} \left( m^2 - 2mr + m\right) \right) \mathrm {e}^{o\left( m\right) }\nonumber \\&\quad = p^{-r} \left( \frac{8 p }{3} \right) ^{m-r} \exp \left( \frac{3 p}{2} (m^2 - 2mr)\right) \mathrm {e}^{o(m)}. \end{aligned}$$
(2.7)

Using Lemma 2.3(b) we can similarly bound the third product from below by

$$\begin{aligned} \prod _{\ell =m-r+1}^{m} \frac{8 p^2 x_\ell }{5 \mathrm {e}}&= \left( \frac{8p}{15 \mathrm {e}}\right) ^r \exp \left( \sum _{\ell = m-r+1}^m 3 \ell p \right) \nonumber \\&\geqslant \left( \frac{8 p}{3}\right) ^{r} \exp \left( \frac{3p}{2} \big (m(m+1) - (m-r)(m-r+1) \big )\right) \left( \frac{1}{5\mathrm {e}}\right) ^{r}\nonumber \\&= \left( \frac{8p}{3}\right) ^{r} \exp \left( \frac{3p}{2} 2mr\right) \mathrm {e}^{o(m)}. \end{aligned}$$
(2.8)

Multiplying the bounds (2.6), (2.7), and (2.8), and using that \(m = \frac{1}{3p} \log \frac{1}{p}\), we get

$$\begin{aligned}&{\mathbb {P}}_p\left( \left[ \frac{\mathrm {e}^{3rp}}{3p}\right] \times \left[ r \right] \Rightarrow \left[ \frac{\mathrm {e}^{3mp}}{3p}\right] \times \left[ m \right] \right) \nonumber \\&\quad \geqslant p^{-r} \left( \frac{8 p}{3}\right) ^m \exp \left( \frac{3 p}{2} m^2 -m\right) \mathrm {e}^{o(m)}\nonumber \\&\quad = p^{-r} \exp \left( \frac{3 p}{2} m^2 - m \log \frac{1}{p} + m \log \frac{8}{3} -m \right) \mathrm {e}^{o(m)}\nonumber \\&\quad = p^{-r} \exp \left( - \frac{1}{6p} \log ^2 \frac{1}{p} + (1- o(1))\frac{1}{3p} \log \frac{8}{3\mathrm {e}} \log \frac{1}{p} \right) . \end{aligned}$$
(2.9)
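The algebra in the final two lines of (2.9) is an exact identity once \(m = \frac{1}{3p} \log \frac{1}{p}\) is substituted (the \(\mathrm {e}^{o(m)}\) factors aside). A quick numerical sketch confirming this (illustration only, with an arbitrary sample value of p):

```python
import math

# Check (illustration only): with m = log(1/p)/(3p),
#   (3p/2) m^2 - m log(1/p) + m log(8/3) - m
# equals
#   -(1/(6p)) log^2(1/p) + (1/(3p)) log(8/(3e)) log(1/p).
p = 1e-4
m = math.log(1 / p) / (3 * p)
lhs = 1.5 * p * m * m - m * math.log(1 / p) + m * math.log(8 / 3) - m
rhs = (-(1 / (6 * p)) * math.log(1 / p)**2
       + (1 / (3 * p)) * math.log(8 / (3 * math.e)) * math.log(1 / p))
assert abs(lhs - rhs) < 1e-6 * abs(rhs)
```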

Stage 3. The infected region can grow from \(\left[ \frac{1}{3p^2}\right] \times [m]\) to the full rectangle \([x] \times [y]\) with good probability. Indeed, we claim that

$$\begin{aligned} {\mathbb {P}}_p\left( \left[ \frac{1}{3p^2} \right] \times [m] \Rightarrow [x] \times [y] \right) \geqslant \mathrm {e}^{-O\left( 1/p \right) }. \end{aligned}$$
(2.10)

This bound is proved in [23, proof of Proposition 2.4]. We do not repeat the proof here, but let us indicate how it is established: first, the cluster grows horizontally to width \(1/p^2\). By Lemma 2.2(b) we have

$$\begin{aligned} {\mathbb {P}}_p\left( \left[ \frac{1}{3p^2} \right] \times [m] \Rightarrow \left[ \frac{1}{p^2}\right] \times [m] \right) \geqslant \exp \left( - \frac{2}{3 p^2} \cdot p (1 +o(1))\right) = \mathrm {e}^{- O\left( 1/p\right) }. \end{aligned}$$

Then the region grows vertically, this time to height 3m; this also occurs with probability at least \(\mathrm {e}^{-O(1/p)}\). As the infected region gets larger, the probability that it keeps growing converges to 1. It follows that (2.10) holds whenever the target rectangle is large enough and its dimensions are sufficiently balanced (which is guaranteed by the assumptions on x and y).

Now, by the FKG-inequality, we can multiply the bounds from the three stages (i.e., (2.2), (2.9), and (2.10)) to complete the proof of Proposition 2.1.\(\square \)

3 An upper bound on the probability of up-traversability

The following bound is crucial for the proof of the upper bound of Theorem 1.3. Recall from (1.1) the definition of the bootstrap operator \({\mathcal {B}}\), and recall that \({\mathcal {B}}^{(t)}({\mathcal {S}})\) is the t-th iterate of \({\mathcal {B}}\) with initial set \({\mathcal {S}}\), and that \(\langle {\mathcal {S}}\rangle = \lim _{t \rightarrow \infty } {\mathcal {B}}^{(t)}({\mathcal {S}})\). Recall that a rectangle \(R = [1,x] \times [1,y]\) is said to be up-traversable by a set \({\mathcal {S}}\) if \(R \subset \langle ({\mathcal {S}}\cap R) \cup ([1,x] \times \{0\}) \rangle \), and that we write \({\mathbb {P}}_p\) to indicate that the elements of \({\mathcal {S}}\) are chosen independently at random with probability p.

Lemma 3.1

Let \(1 \leqslant k \ll p^{-2/5}\) and let R be a rectangle with dimensions \((x,y)\) such that \(y < x\). Then, for p sufficiently small,

$$\begin{aligned} {\mathbb {P}}_p\big ( R \text { is up-traversable} \big ) \leqslant {\left\{ \begin{array}{ll} p^{-k} \mathrm {e}^{y/k} (24 p k^2 + 8p)^y &{} \text { if } \quad x < \frac{3 k^2}{p},\\ p^{-k} \mathrm {e}^{y/k} \big ( 8p^2 x + 8p \big )^{y} &{} \text { if } \quad \frac{3 k^2}{p} \leqslant x \leqslant \frac{1}{p^2}. \end{array}\right. } \end{aligned}$$

We will apply this lemma with \(\frac{1}{p} \ll y \ll \frac{1}{p} \log ^6 \frac{1}{p} \leqslant x\) and \(k = \log ^2 \frac{1}{p}\). Note that in this case the upper bound given by the lemma is not much larger than the lower bound given by Lemma 2.3. In particular, for these choices of x, y, and k, the bound given by the lemma is of the form \(\big ( ( 8 + o(1)) p^2 x \big )^y\).

We begin the proof of Lemma 3.1 with the following simple but important definition: let us say that a pair of sites \({\mathcal {P}}\) is a spanning pair for the row \([a,b] \times \{c\}\) if

$$\begin{aligned}{}[a,b] \times \{c\} \subset \langle {\mathcal {P}}\cup [a,b] \times \{c-1\}\rangle . \end{aligned}$$
(3.1)

That is, \({\mathcal {P}}\) is a spanning pair for \([a,b] \times \{c\}\) if the row becomes infected when \({\mathcal {P}}\) and the row below it are infected. Note that for each spanning pair \({\mathcal {P}}= \{u,v\}\) there exists \(z \in [a,b] \times \{c\}\) such that \(u,v \in z + ({\mathcal {N}}_{\scriptscriptstyle (1,2)}{\setminus } \{(0,-1)\})\), and thus that any spanning pair is a translate of one of the eight pairs on the right-hand side of Fig. 2.

Lemma 3.2

Let R be a rectangle with \({\mathbf {x}}(R) \geqslant 2\) and \({\mathbf {y}}(R) \geqslant 1\), and let \({\mathcal {S}}\subset R\). Then R is up-traversable by \({\mathcal {S}}\) if and only if \(\langle {\mathcal {S}}\rangle \) contains a spanning pair for every row of R.

Proof

Suppose that \(R = [a,b] \times [c,d]\) with \(b-a \geqslant 1\) and \(d-c \geqslant 0\). It is easy to see that if \(\langle {\mathcal {S}}\rangle \) contains a spanning pair for every row of R, then R is up-traversable by \({\mathcal {S}}\): if \(\langle {\mathcal {S}}\rangle \) contains a spanning pair for the bottom row of R, then the whole row becomes infected, i.e., \([a,b] \times \{c\} \subset \langle {\mathcal {S}}\cup [a,b] \times \{c-1\}\rangle \). And given that the bottom row is infected, the row above the bottom row must also become infected, since \(\langle {\mathcal {S}}\rangle \) also contains a spanning pair for it, i.e., \([a,b] \times \{c+1\} \subset \langle {\mathcal {S}}\cup [a,b] \times \{c\}\rangle \). This argument can be repeated for all rows.

It will therefore suffice to prove the converse. To that end, suppose that \(\langle {\mathcal {S}}\rangle \) does not contain a spanning pair for some row of R, and let \(j \in [c,d]\) be the smallest value such that \(\langle {\mathcal {S}}\rangle \) does not contain a spanning pair for the row \([a,b] \times \{j\}\). We claim that the set

$$\begin{aligned} \big ( \langle {\mathcal {S}}\cup [a,b] \times \{j-1\} \rangle {\setminus } \langle {\mathcal {S}}\rangle \big ) \cap ([a,b] \times \{j\}) \end{aligned}$$

is empty. Indeed, suppose that for some \(t \geqslant 1\) there exists a site v such that

$$\begin{aligned} v \in {\mathcal {B}}^{(t)}\big ({\mathcal {S}}\cup ([a,b] \times \{j-1\})\big ) \cap ([a,b] \times \{j\}), \end{aligned}$$

then there must be a pair of infected sites in \((v + {\mathcal {N}}_{\scriptscriptstyle (1,2)}) \cap ([a,b] \times [j,j+1])\) at time \(t-1\). Taking t minimal, this pair lies in \(\langle {\mathcal {S}}\rangle \), and thus is a spanning pair for the row \([a,b] \times \{j\}\), a contradiction. Now, since \(\langle {\mathcal {S}}\rangle \) does not contain a spanning pair for \([a,b] \times \{j\}\), this implies that \(R \nsubseteq \langle {\mathcal {S}}\cup ([a,b] \times \{c-1\}) \rangle \), as required.\(\square \)

We now make another important definition.

Definition 3.3

For each rectangle R such that \({\mathbf {x}}(R) \geqslant 2\) and \({\mathbf {y}}(R) \geqslant 1\), and each set \({\mathcal {S}}\subset R\) such that R is up-traversable by \({\mathcal {S}}\), let \({\mathcal {A}}({\mathcal {S}}) \subset {\mathcal {S}}\) be a minimum-size subset such that R is up-traversable by \({\mathcal {A}}({\mathcal {S}})\). (If more than one such subset exists, then choose one according to some arbitrary rule.) Define the spanning time

$$\begin{aligned} \tau = \tau (R, {\mathcal {S}}) := \min \big \{ t \geqslant 0 \, : \, {\mathcal {B}}^{(t)}({\mathcal {A}}({\mathcal {S}})) \hbox { contains a spanning pair for each row of } R \big \}. \end{aligned}$$

In words, the spanning time \(\tau \) is the first time t such that \({\mathcal {B}}^{(t)}({\mathcal {A}}({\mathcal {S}}))\) spans all rows of R. Since R is up-traversable by \({\mathcal {A}}({\mathcal {S}})\), it follows by Lemma 3.2 that \(\tau \) must be finite. However, we emphasise that it is possible that \(\tau > 0\), see Fig. 3 for some examples.

Fig. 3
figure 3

Five configurations (the red and black sites) that do not have a spanning pair for the row above the dark grey row at time \(t=0\), but that create a spanning pair (the red and blue sites) at some time \(t >0\) by iteration with \({\mathcal {B}}\). The light grey sites indicate which sites must become infected to create the spanning pair. Note that in each case these sets have minimal cardinality (i.e., if we remove any black site, then iteration of \({\mathcal {B}}\) will not create the spanning pair) (color figure online)

The central idea in the proof of Lemma 3.1 is to consider the cases \(\tau = 0\) and \(\tau > 0\) separately. When \(\tau = 0\), the structure is significantly simpler than when \(\tau > 0\), which allows for a very sharp estimate. When \(\tau > 0\) more complex structures are possible, but more infected sites are required, and this allows us to use a less precise analysis.

3.1 The case \(\tau = 0\)

Given a rectangle R, let \({\mathcal {F}}_0(R)\) and \({\mathcal {F}}_+(R)\) denote the families of all minimal sets \({\mathcal {A}}\subset R\) such that R is up-traversable by \({\mathcal {A}}\) and \(\tau (R,{\mathcal {A}}) = 0\) and \(\tau (R,{\mathcal {A}})>0\), respectively. Let us write \({\mathcal {U}}_0(R)\) and \({\mathcal {U}}_+(R)\) for the upsets generated by \({\mathcal {F}}_0(R)\) and \({\mathcal {F}}_+(R)\), respectively, i.e., the collections of subsets of R that contain a set \({\mathcal {A}}\in {\mathcal {F}}_0(R)\) or \({\mathcal {A}}\in {\mathcal {F}}_+(R)\), respectively.

The following lemma gives a precise estimate of the probability that a rectangle is up-traversable and \(\tau =0\).

Lemma 3.4

Let R be a rectangle with dimensions \((x,y)\), and let \(p \in (0,1)\). Then

$$\begin{aligned} {\mathbb {P}}_p\big ( {\mathcal {S}}\cap R \in {\mathcal {U}}_0(R) \big ) \leqslant (8p^2 x + 8p)^y. \end{aligned}$$
(3.2)

We will prove Lemma 3.4 using the first moment method. To be precise, we will show that the expected number of members of \({\mathcal {F}}_0(R)\) that are contained in \({\mathcal {S}}\) is at most the right-hand side of (3.2). This will follow easily from the following lemma.

Lemma 3.5

Let R be a rectangle with dimensions \((x,y)\), and let \(p \in (0,1)\). Then

$$\begin{aligned} |{\mathcal {F}}_0(R)| \, \leqslant \, \sum _{r=1}^y 8^y {y - 1 \atopwithdelims ()r - 1} x^r. \end{aligned}$$

To count the sets in \({\mathcal {F}}_0(R)\), we will need to understand their structure. We will show that each set \({\mathcal {A}}\in {\mathcal {F}}_0(R)\) can be partitioned into “paths” as follows:

Lemma 3.6

Let R be a rectangle with dimensions \((x,y)\), and let \({\mathcal {A}}\in {\mathcal {F}}_0(R)\). Then there exists a partition \({\mathcal {A}}= A_1 \cup \cdots \cup A_r\), where \(r = |{\mathcal {A}}| - y\), with the following property: For each \(j \in [r]\), there exists an ordering \((u_1,\ldots ,u_{|A_j|})\) of the elements of \(A_j\) such that

$$\begin{aligned} u_i - u_{i-1} \in \big \{ (\pm 2,1), (\pm 1,1) \big \}, \end{aligned}$$

for each \(2 \leqslant i < |A_j|\), and

$$\begin{aligned} u_{|A_j|} - u_{|A_j|-1} \in \big \{ (\pm 4,0), (\pm 3,0), (\pm 2,0), (\pm 1,0), (\pm 2,1), (\pm 1,1) \big \}. \end{aligned}$$

See Fig. 4 for an illustration.

Fig. 4
figure 4

On the left: an up-traversable rectangle. On the right: a minimal set \({\mathcal {A}}\). Note that \({\mathcal {A}}\) is sufficient for up-traversability, that \({\mathcal {A}}\) spans every row (so \(\tau =0\)), and that \({\mathcal {A}}\) consists of 8 paths (so \(r=8\))

Proof

Since \({\mathcal {A}}\) is a minimal subset of R such that R is up-traversable by \({\mathcal {A}}\), and \(\tau (R,{\mathcal {A}}) = 0\), it follows from Definition 3.3 that \({\mathcal {A}}\) contains a spanning pair for each row of R, and hence (by minimality of \({\mathcal {A}}\)) it follows that \({\mathcal {A}}\) consists exactly of a union of spanning pairs (one pair for each row) and no other sites. Let these pairs be \({\mathcal {P}}_1,\ldots ,{\mathcal {P}}_y\), and define a graph on [y] by placing an edge between i and j if \({\mathcal {P}}_i \cap {\mathcal {P}}_j\) is non-empty. The sets \(A_1,\ldots ,A_r\) are simply (the elements of \({\mathcal {A}}\) corresponding to) the components of this graph.

Let the components of the graph be \(C_1,\ldots ,C_r\), and note first that each component is a path, since a spanning pair for row \([a,b] \times \{c\}\) is contained in \([a,b] \times [c,c+1]\). Moreover, it follows immediately from this simple fact that if \({\mathcal {P}}_i \cap {\mathcal {P}}_j\) is non-empty then \({\mathcal {P}}_i\) and \({\mathcal {P}}_j\) must be spanning pairs for adjacent rows (say, \([a,b] \times \{c\}\) and \([a,b] \times \{c+1\}\)), and that their common element must lie in \([a,b] \times \{c+1\}\).

Now, consider a component \(C_\ell = \{ i_1,\ldots , i_s \}\), set \(A_\ell = \bigcup _{j = 1}^s {\mathcal {P}}_{i_j}\), and note that \(|A_\ell | = s + 1\). Let \(A_\ell = \{u_1,\ldots ,u_{s+1}\}\), and assume (without loss of generality) that \({\mathcal {P}}_{i_j} = \{u_j,u_{j+1}\}\) for each \(j \in \{1,\dots , s\}\). It now follows from the comments above, and the definition of a spanning pair in (3.1), that

$$\begin{aligned} u_i - u_{i-1} \in \big \{ (\pm 2,1), (\pm 1,1) \big \}, \end{aligned}$$

for each \(2 \leqslant i \leqslant s\), and that

$$\begin{aligned} u_{s+1} - u_s \in \big \{ (\pm 4,0), (\pm 3,0), (\pm 2,0), (\pm 1,0), (\pm 2,1), (\pm 1,1) \big \}, \end{aligned}$$

as claimed. Finally, note that \(|{\mathcal {A}}| = y + r\), since \(|A_\ell | = |C_\ell | + 1\) for each \(\ell \in \{1,\dots , r\}\).

\(\square \)

Proof of Lemma 3.5

To count the sets \({\mathcal {A}}\in {\mathcal {F}}_0(R)\), let us first fix \(|{\mathcal {A}}|\), and the sizes of the sets \(A_1,\ldots ,A_r\) given by Lemma 3.6. Recall that \(r = |{\mathcal {A}}| - y\) and that \({\mathcal {A}}= A_1 \cup \cdots \cup A_r\) is a partition, and note that \(|A_j| \geqslant 2\) for each \(j \in \{1,\dots , r\}\), since \(A_j\) is a union of spanning pairs. It follows that there are exactly

$$\begin{aligned} {y - 1 \atopwithdelims ()r - 1} \end{aligned}$$

ways to choose the sequence \((|A_1|,\ldots ,|A_r|)\), where we order the sets \(A_j\) so that if \(i < j\) then the top row of \(A_i\) is no higher than the bottom row of \(A_j\). (Note that this is possible because each \(A_i\) is a union of spanning pairs for some set of consecutive rows of R.) Now, we claim that there are at most

$$\begin{aligned} x \cdot 8^{|A_j| - 1} \end{aligned}$$

ways of choosing the elements of \(A_j\), given \(A_{j-1}\) and \(|A_j|\). Indeed, given \(A_{j-1}\) we can deduce which is the bottom row of \(A_j\), and we have at most x choices for the left-most element \(u_1\) of \(A_j\) in that row. If \(|A_j| = 2\) then (given \(u_1\)) there are exactly 8 choices for the other element \(u_2\), since \(u_2 - u_1 \in \big \{ (4,0), (3,0), (2,0), (1,0), (\pm 2,1), (\pm 1,1) \big \}\). On the other hand, if \(|A_j| \geqslant 3\), then there are at most \(4^{|A_j| - 2} \cdot 12 \leqslant 8^{|A_j| - 1}\) choices for the remaining elements of \(A_j\) (given \(u_1\)), by Lemma 3.6, as required.

Now, multiplying together the (conditional) number of choices for each set \(A_j\), it follows that

$$\begin{aligned} |{\mathcal {F}}_0(R)| \, \leqslant \, \sum _{r=1}^y \sum _{|A_1|,\ldots ,|A_r|} \prod _{j = 1}^r \big ( x \cdot 8^{|A_j| - 1} \big ) \, \leqslant \, \sum _{r=1}^y 8^y {y - 1 \atopwithdelims ()r - 1} x^r, \end{aligned}$$

as claimed, since \(\sum _{j = 1}^r (|A_j| - 1) = y\). \(\square \)

Lemma 3.4 now follows by Markov’s inequality:

Proof of Lemma 3.4

Define a random variable X to be the number of sets \({\mathcal {A}}\in {\mathcal {F}}_0(R)\) that are entirely infected at time zero, i.e., that are contained in our p-random set \({\mathcal {S}}\). By Markov’s inequality and Lemma 3.5, we have

$$\begin{aligned} {\mathbb {P}}_p\big ( {\mathcal {S}}\cap R \in {\mathcal {U}}_0(R) \big )&\leqslant {\mathbb {E}}_p[X] \leqslant \sum _{r=1}^y 8^y {y - 1 \atopwithdelims ()r - 1} x^r p^{y+r} \\&= \sum _{r=1}^y {y-1 \atopwithdelims ()r-1} (8 p^2 x)^r (8 p)^{y-r} = \frac{p x}{1+px} (8p^2 x + 8p)^y \leqslant (8p^2 x + 8p)^y \end{aligned}$$

as required. \(\square \)
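The binomial identity used in the last display is elementary: \(\sum _{r=1}^{y} \binom{y-1}{r-1} a^r b^{y-r} = \frac{a}{a+b}(a+b)^y\) with \(a = 8p^2x\) and \(b = 8p\). A quick numerical check (sketch only, with arbitrary illustrative values):

```python
import math

# Check (illustrative values): sum_{r=1}^{y} C(y-1, r-1) a^r b^{y-r}
# equals (a/(a+b)) (a+b)^y, with a = 8 p^2 x and b = 8 p.
p, x, y = 0.01, 50.0, 12
a, b = 8 * p * p * x, 8 * p
s = sum(math.comb(y - 1, r - 1) * a**r * b**(y - r) for r in range(1, y + 1))
assert abs(s - (a / (a + b)) * (a + b)**y) < 1e-15
```

Note that \(a/(a+b) = px/(1+px)\), the prefactor appearing in the display.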

3.2 The case \(\tau > 0\)

In this section we analyse the event \({\mathcal {S}}\cap R \in {\mathcal {U}}_+(R)\). If R is up-traversable by \({\mathcal {S}}\), then let \({\mathcal {A}}\) again denote a subset of \({\mathcal {S}}\) of minimal cardinality such that R is up-traversable by \({\mathcal {A}}\). By Lemma 3.2 above we know that if R is up-traversable by \({\mathcal {A}}\), then there must exist a time t at which there is a spanning pair in \({\mathcal {B}}^{(t)}({\mathcal {A}})\) for each row in R. The following definition isolates the sites that are responsible for the creation of such spanning pairs.

Definition 3.7

Given \({\mathcal {S}}\) and a row \(\ell \), we say that \({\mathcal {M}}\subset {\mathcal {S}}\) is an infector of the row \(\ell \) if

  • there exists a \(t \geqslant 0\) such that \({\mathcal {B}}^{(t)}({\mathcal {M}})\) contains a spanning pair for the row \(\ell \), and

  • there does not exist a proper subset \({\mathcal {M}}' \subsetneq {\mathcal {M}}\) and a \(t' \geqslant 0\) such that \({\mathcal {B}}^{(t')}({\mathcal {M}}')\) contains a spanning pair for the row \(\ell \).

We call the bottom-most left-most site in \({\mathcal {M}}\) the root of \({\mathcal {M}}\). Given \({\mathcal {S}}\) we write \({\mathbb {M}}({\mathcal {S}}, R)\) for the set of all infectors contained in \({\mathcal {S}}\) for a row of R.

Note that spanning pairs are infectors, but that many other configurations are possible: see Fig. 3 for a few examples.

Lemma 3.8

(A property of the union of infectors) Suppose \(R = [1,x] \times [1,y]\) is up-traversable by \({\mathcal {S}}\) and that \({\mathcal {A}}\) is a subset of \({\mathcal {S}}\) of minimal cardinality with the same property. For each \(\ell \in \{1,\dots ,y\}\) there exists an infector \({\mathcal {M}}_\ell \) of row \(\ell \) in \({\mathcal {S}}\) such that

$$\begin{aligned} \bigcup _{\ell =1}^{y} {\mathcal {M}}_\ell = {\mathcal {A}}. \end{aligned}$$

Proof

Let \({\mathcal {A}}'\) be a subset of \({\mathcal {S}}\) such that R is up-traversable by \({\mathcal {A}}'\) and such that \({\mathcal {A}}'\) is a set with minimal cardinality for this property. By Lemma 3.2, the event that R is up-traversable by \({\mathcal {A}}'\) is equivalent to the event that there exists a spanning pair for each row of R after some finite number of iterations of \({\mathcal {A}}'\) by the bootstrap operator \({\mathcal {B}}\). This means that for each row \({\mathcal {A}}'\) contains at least one infector. Note that it is a priori possible that the infectors in \({\mathbb {M}}({\mathcal {A}}', R)\) overlap partially or that an infector for some row \(\ell \) is contained in an infector for a row \(\ell ' \ne \ell \). Write \(({\mathcal {M}}^{(i)})_{i=1}^{|{\mathbb {M}}({\mathcal {A}}',R)|}\) for some (arbitrary) ordered list of the infectors, and, for \(1 \leqslant s \leqslant |{\mathbb {M}}({\mathcal {A}}', R)|\) write

$$\begin{aligned} {\mathbb {M}}^\flat (s)\, {:=} \,\bigcup _{i=1}^{s-1} {\mathcal {M}}^{(i)} \cup \bigcup _{i=s+1}^{|{\mathbb {M}}({\mathcal {A}}',R)|} {\mathcal {M}}^{(i)} \end{aligned}$$

for the union of the sites of all the infectors except those of \({\mathcal {M}}^{(s)}\). Now suppose that there exist \(1 \leqslant s < t \leqslant |{\mathbb {M}}({\mathcal {A}}', R)|\) such that \({\mathcal {M}}^{(s)}, {\mathcal {M}}^{(t)}\) are both infectors of the same row \(\ell \) and suppose that \({\mathcal {M}}^{(s)} {\setminus } {\mathbb {M}}^\flat (s) \ne \varnothing \) and \({\mathcal {M}}^{(t)} {\setminus } {\mathbb {M}}^\flat (t) \ne \varnothing \). Then, since \({\mathcal {M}}^{(s)}\) is an infector for row \(\ell \) and the sites in \({\mathcal {M}}^{(t)} {\setminus } {\mathbb {M}}^\flat (t)\) are not needed to create a spanning pair for any other row, R is also up-traversable by the set \({\mathcal {A}}' {\setminus } ({\mathcal {M}}^{(t)} {\setminus } {\mathbb {M}}^\flat (t))\), whose cardinality is strictly smaller than that of \({\mathcal {A}}'\). This gives a contradiction. Hence, for each row \(\ell \) there exists at most one infector \({\mathcal {M}}^{(s)}\) with the property that \({\mathcal {M}}^{(s)} {\setminus } {\mathbb {M}}^\flat (s) \ne \varnothing \). Taking their union we obtain \({\mathcal {A}}\) (i.e., \({\mathcal {A}}= {\mathcal {A}}'\)). \(\square \)

Recall that for any set \({\mathcal {Q}}\subset {\mathbb {Z}}^2\) we write \({\mathbf {x}}({\mathcal {Q}})\) and \({\mathbf {y}}({\mathcal {Q}})\) for the horizontal and vertical dimensions of that set. We split the event \(\{{\mathcal {S}}\cap R \in {\mathcal {U}}_+(R)\}\) according to whether there exists an infector \({\mathcal {M}}_\ell \) with \({\mathbf {x}}({\mathcal {M}}_\ell ) \geqslant 6k^2\) or not.

Lemma 3.9

(Wide infectors) Let \(R = [1,x] \times [1,k]\) with \(k \geqslant 3\) such that \( k \ll p^{-1}\), and let x be such that \( k^5 \ll x \leqslant p^{-2}\). Then

$$\begin{aligned} {\mathbb {P}}_p\Big ({\mathcal {S}}\cap R \in {\mathcal {U}}_+ (R), \max _{\ell =1}^k {\mathbf {x}}({\mathcal {M}}_\ell ) \geqslant 6k^2 \Big ) = o((8p^2x + 8p)^k). \end{aligned}$$
(3.3)

Proof

Write \({\mathcal {M}}_{j}\) for the first infector such that \({\mathbf {x}}({\mathcal {M}}_{j}) \geqslant 6 k^2\). Since \({\mathcal {M}}_{j} \subset [1,x] \times [1,k]\), we have \({\mathbf {y}}({\mathcal {M}}_{j}) \leqslant k\). Moreover, \({\mathcal {M}}_{j}\) is the minimal set responsible for the creation of the spanning pair in row j, so \({\mathcal {M}}_{j}\) cannot have a gap of more than three consecutive empty columns. There are at most xk possible positions for the root of the infector. We thus bound the probability in (3.3), for this range of x and our choice of k, from above by

$$\begin{aligned} xk (1-(1-p)^{3k})^{2k^2} \leqslant xk (3pk)^{2k^2} \ll (3 p k)^{6k - 3} \ll \left( 800 p^3 k^3 \right) ^{k} \ll (8p^2x + 8p)^{k}. \end{aligned}$$

\(\square \)

Lemma 3.10

(Small infectors) Up to translation, every infector that intersects precisely one row is a single spanning pair, and there exist precisely two infectors that are not a single spanning pair and that intersect precisely two rows. These two infectors have cardinality 4, and they span both rows they intersect.

Proof

Let \({\mathcal {M}}_j\) be the infector for some row j. Write v for an element of the spanning pair for row j that becomes infected due to the bootstrap dynamics on \({\mathcal {M}}_j\). (It is easy to see that only one element of a spanning pair can arise after time \(t=0\), but we do not use this fact.) Suppose t is the first time such that \({\mathcal {B}}^{(t)}({\mathcal {M}}_j)\) contains a spanning pair. Because \({\mathcal {M}}_j\) is not a spanning pair, \(t \geqslant 1\). Since v becomes infected at time t, it must be the case that \(|{\mathcal {N}}_{\scriptscriptstyle (1,2)}(v) \cap {\mathcal {B}}^{(t-1)} ({\mathcal {M}}_j)| \geqslant 3\). Any configuration of three sites in \({\mathcal {N}}_{\scriptscriptstyle (1,2)}(v)\) contains a spanning pair for the row that v is in, so v cannot be in row j. By the definition of spanning pairs, (3.1), a site can either span the row that it is in, or the row below it, so v is in row \(j+1\). We conclude that there are no infectors that are not a spanning pair that intersect precisely one row.

By the same argument, if \(t \geqslant 2\), then \({\mathcal {M}}_j\) must contain a site in row \(j+2\), so only infectors that intersect two rows can have \(t=1\).

One can easily verify that the only infectors with \(t =1\) that intersect two rows are translations of the configurations \(\{(0,0), (0,1), (3,1), (4,1)\}\) and \(\{(0,0), (0,1), (-3,1), (-4,1)\}\) (see the configuration in the bottom-left corner of Fig. 3). These infectors both have cardinality 4, and span both rows they intersect. \(\square \)
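This final verification can be reproduced mechanically. The sketch below assumes the anisotropic neighbourhood \({\mathcal {N}} = \{(\pm 1,0), (\pm 2,0), (0,\pm 1)\}\) with threshold \(r = 3\) (these model parameters are not restated in this section, so they are an assumption here), and computes the bootstrap closure of the first of the two configurations:

```python
# Bootstrap closure of the infector {(0,0), (0,1), (3,1), (4,1)}.
# Assumed model parameters (not restated in this section): the anisotropic
# neighbourhood N = {(+-1, 0), (+-2, 0), (0, +-1)} with threshold r = 3.

N = [(1, 0), (-1, 0), (2, 0), (-2, 0), (0, 1), (0, -1)]
R_THRESHOLD = 3

def bootstrap_step(infected):
    """One application of the operator B from (1.1)."""
    candidates = {(v[0] + dx, v[1] + dy) for v in infected for dx, dy in N}
    newly = {v for v in candidates - infected
             if sum((v[0] + dx, v[1] + dy) in infected for dx, dy in N)
             >= R_THRESHOLD}
    return infected | newly

def closure(infected):
    """Iterate B to a fixed point, i.e. compute the closure <S>."""
    while True:
        grown = bootstrap_step(infected)
        if grown == infected:
            return infected
        infected = grown

M = {(0, 0), (0, 1), (3, 1), (4, 1)}
print(sorted(closure(M)))
# → [(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
```

The mirrored configuration \(\{(0,0), (0,1), (-3,1), (-4,1)\}\) behaves symmetrically, since the assumed neighbourhood is invariant under \(x \mapsto -x\).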

To analyse \({\mathbb {P}}_p({\mathcal {S}}\cap R \in {\mathcal {U}}_+(R))\) we again divide \({\mathcal {A}}\) into the maximal number of disjoint, “causally independent” pieces, to which we may apply the BK-inequality. We have seen that when \(\tau =0\) these pieces can be described as paths. When \(\tau >0\) this is still the case, but now the path structure can be found at the level of the infectors. We partition \({\mathcal {A}}\) as follows: let r be the largest integer such that there exist sets \(B_1, \dots , B_r\) that partition \({\mathcal {A}}\) (i.e., \(B_i \cap B_j = \varnothing \) for all \(i \ne j\) and \({\mathcal {A}}= \bigcup _{i=1}^r B_i\)) and such that there exist r pairs of integers \(\{(a_i, b_i)\}_{i=1}^r\) such that

  • \(1 = a_1 \leqslant b_1 \leqslant a_2 \leqslant b_2 \leqslant \cdots \leqslant a_r \leqslant b_r =k\), and

  • the event

    $$\begin{aligned} \{[1,x] \times [a_1, b_1] \text { is up-trav by }B_1\} \circ \cdots \circ \{[1,x] \times [a_r, b_r] \text { is up-trav by } B_r\} \end{aligned}$$

    occurs.

Lemma 3.11

(Path structure of \(B_1, \dots , B_r\)) Let \(R = [1,x] \times [1,y]\) and suppose that R is up-traversable by \({\mathcal {S}}\). Let \({\mathcal {A}}\) be the subset of \({\mathcal {S}}\) with minimal cardinality such that R is up-traversable by \({\mathcal {A}}\). Let \(B_1, \dots , B_r\) be the division of \({\mathcal {A}}\) into disjointly occurring pieces described above. Then the following hold:

  1. (a)

    For any row \(\ell \in \{1, \dots , y\}\) there exists a unique \(i \in \{1,\dots , r\}\) such that \({\mathcal {M}}_\ell \subseteq B_i\).

  2. (b)

    If \(B_i\) spans rows \(\ell , \dots , \ell +m\), then \(B_i = \cup _{j=\ell }^{\ell +m} {\mathcal {M}}_j\).

  3. (c)

    If \({\mathcal {M}}_j \subseteq B_i\) and \(j < b_i\), then at least one of the following holds: \({\mathcal {M}}_j = B_i\); or there exists a \(j' < j\) such that \({\mathcal {M}}_j \subset {\mathcal {M}}_{j'} \subseteq B_i\); or \({\mathcal {M}}_j \cap {\mathcal {M}}_{j+1} \ne \varnothing \).

  4. (d)

    If \({\mathcal {M}}_j \subseteq B_i\) and \(j = b_i\), then at least one of the following holds: \({\mathcal {M}}_j = B_i\); or there exists a \(j' < j\) such that \({\mathcal {M}}_j \subset {\mathcal {M}}_{j'} \subseteq B_i\); or \({\mathcal {M}}_{j-1} \cap {\mathcal {M}}_{j} \ne \varnothing \).

Proof

  1. (a)

    By construction, \({\mathcal {A}}= \cup _{i=1}^r B_i\), and \(B_i \circ B_j\) occurs if \(i \ne j\). By Lemma 3.8, \({\mathcal {A}}= \cup _{\ell =1}^k {\mathcal {M}}_\ell \). Suppose that there exists an \(\ell \) such that \({\mathcal {M}}_\ell \cap B_i\ne \varnothing \) and \({\mathcal {M}}_\ell \cap B_j\ne \varnothing \) for some \(i\ne j\). Without loss of generality, we can further assume that \(a_i\leqslant \ell \leqslant b_i\). Since \({\mathcal {M}}_\ell \) is the minimal set creating a spanning pair for row \(\ell \), and \({\mathcal {M}}_\ell \cap B_i\) is a strict subset of \({\mathcal {M}}_\ell \) (since the latter intersects \(B_j\), which is disjoint from \(B_i\) by assumption), we deduce that \(\langle {\mathcal {M}}_\ell \cap B_i\rangle \) cannot contain a spanning pair for row \(\ell \). By Lemma 3.2, this means that \([1,x]\times [a_i,b_i]\) is not up-traversable by \(B_i\), which is a contradiction.

  2. (b)

    By Lemma 3.8, \({\mathcal {A}}= \cup _{i=1}^k {\mathcal {M}}_i\). Combined with (a) this gives (b).

  3. (c)

    Suppose that \(B_i\) spans rows \(\ell , \dots , \ell +m\) and suppose that there exists a \(j < b_i\) such that neither \({\mathcal {M}}_j = B_i\) nor \({\mathcal {M}}_j \subset {\mathcal {M}}_{j'}\) for some \(j' < j\), and such that \({\mathcal {M}}_{j} \cap {\mathcal {M}}_{j+1} = \varnothing \). Then we can partition

    $$\begin{aligned} B_i = \left( \bigcup _{s=\ell }^j {\mathcal {M}}_{s} \right) \sqcup \left( \bigcup _{t=j+1}^{\ell +m} {\mathcal {M}}_{t} \right) \,{=:}\, B_{i,1} \, \sqcup \, B_{i,2}. \end{aligned}$$

    It then follows that

    $$\begin{aligned} \{[1,x] \times [\ell , j] \text { is up-trav by }B_{i,1}\} \circ \{[1,x] \times [j+1, \ell +m] \text { is up-trav by } B_{i,2}\} \end{aligned}$$

    occurs. This gives a contradiction, since by construction the sets \(B_1, \dots , B_r\) form the maximal partition of \({\mathcal {A}}\) with this property, so such a j does not exist. We conclude that if \({\mathcal {M}}_j \subset B_i\) but \({\mathcal {M}}_j \ne B_i\) and \({\mathcal {M}}_j \nsubseteq {\mathcal {M}}_{j'}\) for all \(j'<j\), then \({\mathcal {M}}_{j} \cap {\mathcal {M}}_{j+1} \ne \varnothing \).

  4. (d)

    The proof is identical to that of (c), mutatis mutandis.

\(\square \)

For all \(k,\ell ,m,x \in {\mathbb {N}}\), let \({\mathcal {E}}_{\ell +1, \ell +m}\) denote the event that a configuration of infected sites \({\mathcal {S}}\) has the following properties:

  • \({\mathcal {S}}\cap ([1,x] \times [\ell +1, \ell + m]) \in {\mathcal {U}}_+([1,x] \times [\ell +1, \ell + m])\),

  • the minimal subset \({\mathcal {A}}\) of \({\mathcal {S}}\) such that \([1,x] \times [\ell +1, \ell + m]\) is up-traversable by \({\mathcal {A}}\) cannot be divided into two or more disjointly occurring pieces, i.e., \({\mathcal {A}}= B_1\) in the construction described above.

  • \(\max _{j=\ell +1}^{\ell +m} {\mathbf {x}}({\mathcal {M}}_j) < 6k^2\).

Lemma 3.12

For \(k \geqslant 3\), \(\ell + m \leqslant k\) and all \(p \in [0,1]\),

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {E}}_{\ell +1, \ell +m}) \leqslant 4p^2 x (12pk^2 + 7p)^{m-1}. \end{aligned}$$

Proof

There is at least one infected site in row \(\ell +1\), and there are at most x possible positions for it.

By Lemma 3.11, the event \({\mathcal {E}}_{\ell +1, \ell +m}\) implies that \({\mathcal {A}}\) is the union of infectors that are not disjoint. Since, moreover, none of the infectors is wider than \(6k^2-1\), for each of the rows \(\ell +2,\dots , \ell +m\) we then need to have at least one infected site in the line-segment \([-6k^2-3, 6k^2+3]\) directly above the infected site of the row below it. Finally, row \(\ell +m\) must also be spanned, and by Lemma 3.10 its spanning pair must already be present at time \(t=0\), so there must be another infected site in that row, in one of the four positions that can create a spanning pair for row \(\ell +m\). We thus bound

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {E}}_{\ell +1, \ell +m}) \leqslant px \cdot (p (12k^2+7))^{m-1} 4p. \end{aligned}$$

\(\square \)

Write

$$\begin{aligned} {\mathcal {V}}_{a,b}\, {:=} \,\big \{[1,x] \times [a,b] \text { is up-traversable}\big \} \end{aligned}$$

and

$$\begin{aligned} {\mathcal {V}}^+_{a,b}\, {:=} \,\big \{{\mathcal {S}}\cap ([1,x] \times [a,b]) \in {\mathcal {U}}_+([1,x] \times [a,b])\big \} \cap \big \{ \max _{j=a}^b {\mathbf {x}}({\mathcal {M}}_j) < 6 k^2 \big \}. \end{aligned}$$

The following lemma states the key inequality for the induction:

Lemma 3.13

For \(k \geqslant 2\),

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {V}}^+_{1,k}) \leqslant \sum _{m=2}^k \sum _{\ell =0}^{k-m} {\mathbb {P}}_p({\mathcal {V}}_{1,\ell }) {\mathbb {P}}_p({\mathcal {E}}_{\ell +1, \ell +m}) {\mathbb {P}}_p({\mathcal {V}}_{\ell + m + 1,k}). \end{aligned}$$

Proof

Since \({\mathcal {V}}^+_{1,k}\) occurs, \([1,x] \times [1,k]\) is up-traversable. Let \({\mathcal {A}}\) be the minimal subset of \({\mathcal {S}}\) such that \([1,x] \times [1,k]\) is up-traversable with respect to \({\mathcal {A}}\). Let \(B_1, \dots , B_r\) be the subdivision of \({\mathcal {A}}\) described above. Let \(u \in {\mathcal {A}}\) and \(v \in \langle {\mathcal {A}}\rangle {\setminus } {\mathcal {A}}\) be such that \(\{u,v\}\) forms a spanning pair for row i, while \({\mathcal {A}}\) does not contain a spanning pair for row i. At least one such pair must exist since \({\mathcal {V}}^+_{1,k}\) occurs. Let j be such that \({\mathcal {M}}_i \subseteq B_j\) (such a \(B_j\) exists by Lemma 3.11(a)). Suppose that \(B_j\) spans exactly the rows \(\ell +1, \dots , \ell +m\) (i.e., \(a_j = \ell +1\) and \(b_j = \ell +m\)). Then, by the construction of \(B_1, \dots , B_r\) and the definition of \({\mathcal {E}}_{\ell +1, \ell +m}\) we know that

$$\begin{aligned} {\mathcal {V}}_{1,\ell } \circ {\mathcal {E}}_{\ell +1, \ell +m} \circ {\mathcal {V}}_{\ell +m+1,k} \end{aligned}$$

occurs for \({\mathcal {S}}\). Applying the BK-inequality and summing over \(\ell \) and m gives the asserted inequality. The sum over m starts at 2 because, by Lemma 3.10, \(B_j\) must span at least two rows.\(\square \)

3.3 The proof of Lemma 3.1

To begin, assume that \(\frac{3 k^2}{p} \leqslant x \leqslant \frac{1}{p^2}\). We start by proving Lemma 3.1 for the cases where \(y \leqslant k\). More precisely, we will prove that

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {V}}_{1,k}) \leqslant \mathrm {e}(8p^2 x +8 p)^k, \end{aligned}$$
(3.4)

holds for \(k \ll p^{-1}\). We use induction. The inductive hypothesis is that (3.4) holds for \(k' \leqslant k-1\) and \( k^5 \ll x \leqslant p^{-2}\). To initialise the induction we observe that when \(k=1\) there exist, up to translation, four spanning pairs that intersect one row, so \({\mathbb {P}}_p({\mathcal {V}}_{1,1}) \leqslant 4p^2 x < \mathrm {e}(8p^2x + 8p)\). When \(k=2\) we use Lemma 3.10 to bound

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {V}}^+_{1,2}) \leqslant 2 x p^4, \end{aligned}$$
(3.5)

which, combined with Lemma 3.4 yields that

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {V}}_{1,2}) \leqslant (8p^2x + 8p)^2 + 4xp^4 \leqslant \mathrm {e}(8p^2 x +8 p)^2 \end{aligned}$$

when p is sufficiently small.

When \(3 \leqslant k \ll p^{-1}\), by (3.5), Lemmas 3.4, 3.9, 3.12, and 3.13, and the induction hypothesis (3.4), when p is sufficiently small,

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {V}}_{1,k})&\leqslant (8p^2x + 8p)^k + \frac{\mathrm {e}-1}{3} (8p^2 x + 8p)^k + 2 x p^4 (k-1) \mathrm {e}^2 (8 p^2 x + 8 p)^{k-2}\nonumber \\&\quad + 4\mathrm {e}^2 x p^2 k \sum _{m=3}^k (8p^2 x + 8 p)^{k-m+1} (12p k^2 +7p)^{m-1}, \end{aligned}$$
(3.6)

where the second term on the right-hand side is due to Lemma 3.9, and the third and fourth correspond to the \(m=2\) and \(m\geqslant 3\) terms in Lemma 3.13.

It is not difficult to show that

$$\begin{aligned} \sum _{m=3}^k a^{k-m+1} b^{m-1} = \frac{b^2 a^{k-1} - a b^k}{a-b}. \end{aligned}$$

When \(\frac{3 k^2}{p} \leqslant x\) we have \(12 p k^2 + 7p \leqslant \tfrac{1}{2} (8 p^2 x + 8p)\), so this implies that

$$\begin{aligned} 4\mathrm {e}^2 x p^2 k \sum _{m=3}^k (8p^2 x + 8 p)^{k-m+1} (12p k^2 + 7p)^{m-1} \leqslant 8 \mathrm {e}^2 x p^2 k (12 p k^2 + 7p)^2 (8 p^2 x + 8p)^{k-2}. \end{aligned}$$
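The geometric-sum identity used in this step is elementary; as a sanity check, it can be verified with exact rational arithmetic (the values of a, b, and the range of k below are arbitrary test values, not model quantities):

```python
# Exact-arithmetic check of the identity
#   sum_{m=3}^{k} a^{k-m+1} b^{m-1} = (b^2 a^{k-1} - a b^k) / (a - b),
# for arbitrary rational test values of a and b.
from fractions import Fraction

def lhs(a, b, k):
    # Direct evaluation of the sum over m = 3, ..., k.
    return sum(a ** (k - m + 1) * b ** (m - 1) for m in range(3, k + 1))

def rhs(a, b, k):
    # Closed form of the geometric sum.
    return (b ** 2 * a ** (k - 1) - a * b ** k) / (a - b)

a, b = Fraction(7, 3), Fraction(2, 5)
assert all(lhs(a, b, k) == rhs(a, b, k) for k in range(3, 12))
print("identity verified for k = 3, ..., 11")
```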

Inspecting (3.6), it follows that the desired bound (3.4) holds if the following two inequalities hold for p sufficiently small:

$$\begin{aligned} 2 \mathrm {e}^2 x p^4(k-1)&< \frac{\mathrm {e}-1}{3}(8 p^2 x + 8 p)^2,\\ 8 \mathrm {e}^2 x p^2 k(12pk^2 + 7p)^2&< \frac{\mathrm {e}-1}{3} (8p^2 x + 8 p)^2. \end{aligned}$$

The first inequality holds because \(k \ll x\). It is easy to verify that the second inequality holds when \(k^5 \ll x \leqslant p^{-2}\). Substituting the above inequalities into (3.6) proves the claim of Lemma 3.1 for \(y \leqslant k\).

Now we consider \(R = [1,x] \times [1,y]\) for y such that \(k< y < x\) (still assuming that \(\frac{3 k^2}{p} \leqslant x \leqslant \frac{1}{p^2}\)). We cover R with \(\lceil y/k \rceil \) rectangles of height k. If y is not divisible by k the covering “overshoots”: it includes at most \(k-1\) rows that are not in R. If R is up-traversable, and if the overshoot contains a connected upward path, then all these rectangles are also up-traversable. The probability that there is a connected path in the overshoot is at least \(p^k\). It thus follows by the BK-inequality that

$$\begin{aligned} {\mathbb {P}}_p\left( R \text { is up-trav}\right) \leqslant p^{-k} \prod _{n=1}^{\lceil y /k \rceil } {\mathbb {P}}_p({\mathcal {V}}_{(n-1)k+1, nk}) \leqslant p^{-k} \mathrm {e}^{y/k} (8p^2 x + 8 p)^{y}, \end{aligned}$$
(3.7)

where these bounds again hold for p sufficiently small. This completes the proof of Lemma 3.1 for the case \(\frac{3 k^2}{p} \leqslant x \leqslant \frac{1}{p^2}\).

The case \(x < \frac{3 k^2}{p}\) is now easy. Note that if \([1,x] \times [1,y]\) is up-traversable by \({\mathcal {S}}\), then \([1,x+a] \times [1,y]\) is also up-traversable by \({\mathcal {S}}\) for any \(a \geqslant 1\) (i.e., up-traversability is a monotone increasing event in the width of the rectangle). Hence, \({\mathbb {P}}_p([1,x] \times [1,y] \text { is up-trav})\) is a monotone increasing function of x. The bound thus follows by choosing \(x = \frac{3k^2}{p}\) and applying the bound for the case \(\frac{3 k^2}{p} \leqslant x \leqslant \frac{1}{p^2}\). \(\square \)

4 The probability of simultaneous horizontal and vertical growth

The lemma below states an upper bound on the probability of an infected rectangle growing both vertically and horizontally, i.e., an upper bound on \({\mathbb {P}}_p(R \Rightarrow R')\) for certain \(R \subset R'\).

Let

$$\begin{aligned} \xi \,{:=} \,\left\lceil \log ^2 \frac{1}{p}\right\rceil , \qquad \text { and } \qquad \delta _\xi \, {:=} \,1 - 2/\xi . \end{aligned}$$

Recalling Lemma 3.1 and the bound on \(f(p,y)\) in Lemma 2.2 above, let \(\psi \) be the function defined in (4.1), and let \(\phi \) be the function defined in (4.2).

For two rectangles \(R\subset R'\) with dimensions \((x,y)\) and \((x+s,y+t)\), let

$$\begin{aligned} U^p(R,R'):= \delta _\xi \,(t\psi (x+s) + s \phi (y+t)). \end{aligned}$$
(4.3)

Observe that \(\psi \) and \(\phi \) are both positive, decreasing, and convex functions (where they are not zero).

Lemma 4.1

Let \(R \subset R'\), with dimensions \((x,y)\) and \((x+s,y+t)\) respectively. Assume that \(t \leqslant \frac{1}{p} \log ^{-4} \frac{1}{p}\). Then, for p sufficiently small,

$$\begin{aligned} {\mathbb {P}}_p(R \Rightarrow R') \leqslant 2p^{-\xi } \exp \left( -U^p (R,R') \right) . \end{aligned}$$

The proof uses a strategy similar to that of [23, Proof of Proposition 3.3]. Roughly speaking, this strategy entails that we “decorrelate” the horizontal and vertical growth events needed for \(\{R \Rightarrow R'\}\).

Proof

If \(y+t \leqslant \frac{4}{p} \log \log \frac{1}{p}\) and \(x+s > 1/p^2\), then we use the trivial bound \({\mathbb {P}}_p(R \Rightarrow R') \leqslant 1,\) corresponding to \(U^p(R,R')=0\), as required.

If \(y+t \leqslant \frac{4}{p} \log \log \frac{1}{p}\) and \(x+s \leqslant 1/p^2\), then we apply Lemma 3.1 (with \(k = \xi \)), again giving the required bound.

Therefore, we assume henceforth that \(y+t > \frac{4}{p} \log \log \frac{1}{p}\) and \(x+s \leqslant \frac{1}{p^2}\).

To start, suppose that \((1-\delta _\xi ) t \psi (x+s) > \delta _\xi s \phi (y+t)\), which corresponds to the vertical growth component t being disproportionately large compared to the horizontal growth component s. Then, we can simply ignore the horizontal growth and apply Lemma 3.1 to bound

$$\begin{aligned} {\mathbb {P}}_p(R \Rightarrow R')&\leqslant p^{-\xi } \exp (-t \psi (x+s)) \nonumber \\&\leqslant p^{-\xi } \exp (-\delta _\xi t \psi (x+s) - \delta _\xi s \phi (y+t))\nonumber \\&= p^{-\xi } \exp (-U^p(R,R')), \end{aligned}$$
(4.4)

and we are done. Therefore, let us henceforth also assume that

$$\begin{aligned} (1-\delta _\xi ) t \psi (x+s) \leqslant \delta _\xi s \phi (y+t). \end{aligned}$$
(4.5)
Fig. 5

The rectangles \(R_w\) and \(R_e\) are hatched red and the rectangles \(R_n\) and \(R_s\) are hatched blue. The regions where these four rectangles overlap, collectively called H, are cross hatched purple (color figure online)

We identify five (intersecting) regions within the area \(R' {\setminus } R\): the North, South, West, and East regions \(R_n\), \(R_s\), \(R_w\), and \(R_e\), and the corner region H: for \(R'=[a_1,a_2]\times [b_1,b_2]\) and \(R=[c_1,c_2]\times [d_1,d_2]\), such that \(a_1 \leqslant c_1 < c_2 \leqslant a_2\) and \(b_1 \leqslant d_1 < d_2 \leqslant b_2\), we define the sets

$$\begin{aligned} R_{w}\,&{:=}\,[a_1,c_1-1]\times [b_1,b_2]\quad \text {and}\quad R_{e}\,{:=}\,[c_2+1,a_2]\times [b_1,b_2],\\ R_{n}\,&{:=}\,[a_1,a_2]\times [d_2+1,b_2]\quad \text {and} \quad R_{s}\,{:=}\,[a_1,a_2]\times [b_1,d_1-1],\\ H \,&{:=}\,R' {\setminus } \{(x,y):x\in [c_1,c_2]\text { or }y\in [d_1,d_2]\}, \end{aligned}$$

see Fig. 5. Observe that

$$\begin{aligned} \{R \Rightarrow R'\} \subset \{R_n \text { is up-trav}\} \cap \{ R_s \text { is down-trav}\} \cap \{R_w \text { and } R_e \text { are hor-trav}\}. \end{aligned}$$
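As a concreteness check on these definitions, the four side regions cover \(R' {\setminus } R\), and H is exactly the overlap of the horizontal and vertical strips. A small enumeration with arbitrary test coordinates (the variable names mirror the notation above):

```python
# Check, for sample coordinates, that R_w, R_e, R_n, R_s cover R' \ R and
# that H is the overlap of the horizontal and vertical strips.
# The coordinate values are arbitrary test values.
a1, a2, b1, b2 = 0, 12, 0, 9       # R' = [a1, a2] x [b1, b2]
c1, c2, d1, d2 = 4, 8, 3, 6        # R  = [c1, c2] x [d1, d2]

def box(x1, x2, y1, y2):
    return {(x, y) for x in range(x1, x2 + 1) for y in range(y1, y2 + 1)}

Rp = box(a1, a2, b1, b2)
R = box(c1, c2, d1, d2)
Rw = box(a1, c1 - 1, b1, b2)
Re = box(c2 + 1, a2, b1, b2)
Rn = box(a1, a2, d2 + 1, b2)
Rs = box(a1, a2, b1, d1 - 1)
H = {(x, y) for (x, y) in Rp if not (c1 <= x <= c2 or d1 <= y <= d2)}

assert Rw | Re | Rn | Rs == Rp - R          # the four regions cover R' \ R
assert H == (Rw | Re) & (Rn | Rs)           # H = the four corner overlaps
```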

Let

$$\begin{aligned} E\, {:=} \,\{R_n \text { is up-trav}\} \cap \{ R_s \text { is down-trav}\}. \end{aligned}$$

Recall from Definition 3.7 above that we write \({\mathbb {M}}({\mathcal {S}},R_n)\) and \({\mathbb {M}}({\mathcal {S}},R_s)\) for the sets of infectors of \(R_n\) and \(R_s\) (the latter being a set of infectors suitably defined for down-traversability). By Lemma 3.8, we are able to determine whether E occurs by inspecting only sites in \({\mathbb {M}}({\mathcal {S}},R_n)\) and \({\mathbb {M}}({\mathcal {S}},R_s)\). So the event that \(R_w\) and \(R_e\) are horizontally traversable depends on E only through the information about the intersection of these sets with H, the region where the rectangles overlap. Define \({\mathbb {M}}^\flat _H({\mathcal {S}})\) as the set of all sites in \({\mathcal {S}}\cap H\) contained in either \({\mathbb {M}}({\mathcal {S}},R_n)\) or \({\mathbb {M}}({\mathcal {S}},R_s)\). Let Y denote the number of columns in H that contain at least one infected site in \({\mathbb {M}}^\flat _H({\mathcal {S}})\). We split

$$\begin{aligned} {\mathbb {P}}_p(R \Rightarrow R') \leqslant {\mathbb {P}}_p\big (\{R \Rightarrow R'\} \cap \{Y \leqslant s/(2\xi )\} \big ) + {\mathbb {P}}_p(Y > s/(2 \xi )). \end{aligned}$$
(4.6)

We start by bounding the first term in (4.6). Let \(F : = \{Y \leqslant s/(2\xi )\}\). We use Lemma 3.1 with \(k = \xi = \lceil \log ^2 \frac{1}{p} \rceil \) to bound

$$\begin{aligned} {\mathbb {P}}_p( \{R \Rightarrow R'\} \cap F)&\leqslant {\mathbb {P}}_p(R_e \text { and }R_w \text { are hor-trav} \mid E \cap F ) {\mathbb {P}}_p(E )\nonumber \\&\leqslant {\mathbb {P}}_p(R_e \text { and }R_w \text { are hor-trav} \mid E \cap F ) p^{-\xi } \mathrm {e}^{-t \psi (x+s) }.\quad \quad \end{aligned}$$
(4.7)

Let \({\mathfrak {R}}_n\) denote the set of all sets of subrectangles of \(R_e \cup R_w\) with heights \(y+t\) and total width n, such that each pair of rectangles in a set \({\mathfrak {r}}\in {\mathfrak {R}}_n\) is separated by at least one column. That is, for \({\mathfrak {r}}= \{{\mathfrak {r}}_i\}_{i=1}^{N({\mathfrak {r}})} \in {\mathfrak {R}}_n\) we have that \({\mathfrak {r}}\) is a collection of \(N({\mathfrak {r}})\) strictly disjoint subrectangles with \(\sum _{i=1}^{N({\mathfrak {r}})} {\mathbf {x}}({\mathfrak {r}}_i) = n\) and \({\mathbf {y}}({\mathfrak {r}}_i) = y+t\) for all \(1 \leqslant i \leqslant N({\mathfrak {r}})\). For any \({\mathfrak {r}}= \{{\mathfrak {r}}_i\}_{i=1}^{N({\mathfrak {r}})} \in {\mathfrak {R}}_n\) define the following two events:

$$\begin{aligned} E_1({\mathfrak {r}})\, {:=} \,\left\{ \forall 1 \leqslant i \leqslant N({\mathfrak {r}}) : {\mathfrak {r}}_i \text { is horizontally traversable} \right\} \end{aligned}$$

and

$$\begin{aligned} E_2({\mathfrak {r}}) \,&{:=} \,\bigg \{{\mathbb {M}}_H^\flat \cap \bigcup _{i=1}^{N({\mathfrak {r}})} {\mathfrak {r}}_i = \varnothing \bigg \} \nonumber \\&\cap \bigg \{ \not \exists {\mathfrak {r}}' \in {\mathfrak {R}}_{n} : N({\mathfrak {r}}') < N({\mathfrak {r}}) \text { and } {\mathbb {M}}_H^\flat \cap \bigcup _{i=1}^{N({\mathfrak {r}}')} {\mathfrak {r}}'_i = \varnothing \bigg \} \nonumber \\&\cap \bigg \{\not \exists {\mathfrak {r}}'' \in \cup _{n'' > n} {\mathfrak {R}}_{n''} : {\mathbb {M}}_H^\flat \cap \bigcup _{i=1}^{N({\mathfrak {r}}'')} {\mathfrak {r}}''_i = \varnothing \bigg \}, \end{aligned}$$
(4.8)

that is, \(E_2({\mathfrak {r}})\) is the event that \({\mathfrak {r}}\) is the partition into the least number of rectangles of total width n that does not intersect \({\mathbb {M}}_H^\flat \), and that there is no partition of total width greater than n that also does not intersect \({\mathbb {M}}_H^\flat \). Observe that

$$\begin{aligned} \{R_e, R_w \text { are hor-trav}\} \subseteq \bigsqcup _{m=0}^s \bigsqcup _{{\mathfrak {r}}\in {\mathfrak {R}}_{s-m}} (E_1({\mathfrak {r}}) \cap E_2({\mathfrak {r}})). \end{aligned}$$

Thus,

$$\begin{aligned}&{\mathbb {P}}_p(R_e, R_w \text { are hor-trav} \mid E \cap F) \nonumber \\&\quad \leqslant \sum _{m=0}^{s /(2\xi )} \sum _{{\mathfrak {r}}\in {\mathfrak {R}}_{s-m}} {\mathbb {P}}_p(E_1({\mathfrak {r}}) \mid E_2({\mathfrak {r}}) \cap E \cap F) {\mathbb {P}}_p(E_2({\mathfrak {r}}) \mid E \cap F), \end{aligned}$$
(4.9)

where we used that the sum may be restricted to \(m \leqslant s/(2\xi )\) by the conditioning on F. Now note that the events E and F can be verified by inspecting only \({\mathbb {M}}^\flat _H\), which, on the event \(E_2({\mathfrak {r}})\) is contained in \(H {\setminus } {\mathfrak {r}}\), while \(E_1({\mathfrak {r}})\) by definition only depends on the sites in \({\mathfrak {r}}\), so that conditionally on \(E_2({\mathfrak {r}})\) the event \(E_1({\mathfrak {r}})\) is independent of E and F. We may thus write

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_p(E_1({\mathfrak {r}}) \mid E_2({\mathfrak {r}}) \cap E \cap F)&= \frac{{\mathbb {P}}_p(E_1({\mathfrak {r}}) \cap E \cap F \mid E_2({\mathfrak {r}}))}{{\mathbb {P}}_p(E \cap F \mid E_2({\mathfrak {r}}))} \\&= \frac{{\mathbb {P}}_p(E_1({\mathfrak {r}}) \mid E_2({\mathfrak {r}})) {\mathbb {P}}_p(E \cap F \mid E_2({\mathfrak {r}}))}{{\mathbb {P}}_p(E \cap F \mid E_2({\mathfrak {r}}))} \\&= {\mathbb {P}}_p(E_1({\mathfrak {r}}) \mid E_2({\mathfrak {r}})). \end{aligned} \end{aligned}$$

Observe that for any fixed \({\mathfrak {r}}\) the event \(E_1({\mathfrak {r}})\) is increasing. Indeed, adding more sites to \({\mathcal {S}}\) can either make horizontal traversal occur when it did not before, or else, have no effect. We claim that the event \(E_2({\mathfrak {r}})\), on the other hand, is the intersection of three decreasing events, and hence itself a decreasing event. To see this, observe that the first event in (4.8) is decreasing because adding more sites to \({\mathcal {S}}\) cannot decrease the total width of \({\mathbb {M}}_H^\flat \), since it is the union of all infectors intersecting H (not only those of minimal cardinality for a given row). The second event in (4.8) is likewise decreasing, because increasing \({\mathbb {M}}_H^\flat \) cannot decrease the minimal number of rectangles of a partition that does not intersect \({\mathbb {M}}_H^\flat \), unless it also decreases the total width of that partition. The third event is decreasing because increasing \({\mathbb {M}}_H^\flat \) cannot decrease its total width. Therefore, we may apply the FKG-inequality to obtain

$$\begin{aligned} {\mathbb {P}}_p(E_1({\mathfrak {r}}) \mid E_2({\mathfrak {r}})) \leqslant {\mathbb {P}}_p(E_1({\mathfrak {r}})), \end{aligned}$$

and we may thus further bound the right-hand side of (4.9) by

$$\begin{aligned} \sum _{m=1}^{s /(2\xi )} \sum _{{\mathfrak {r}}\in {\mathfrak {R}}_{s-m}} {\mathbb {P}}_p(E_1({\mathfrak {r}})) {\mathbb {P}}_p(E_2({\mathfrak {r}}) \mid E \cap F). \end{aligned}$$

Uniformly for any fixed \({\mathfrak {r}}\in {\mathfrak {R}}_{s-m}\) with \(m \leqslant s/(2\xi )\), by Lemma 2.2,

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_p(E_1({\mathfrak {r}}))&= \prod _{i=1}^{N({\mathfrak {r}})} {\mathbb {P}}_p({\mathfrak {r}}_i \text { is hor-trav}) \leqslant \exp \left( -\sum _{i=1}^{N({\mathfrak {r}})} ({\mathbf {x}}({\mathfrak {r}}_i) -2)f(p,y+t)\right) \\&\leqslant \exp \left( -s(1-1/\xi ) f(p,y+t) \right) \\&\leqslant \exp \left( - \delta _\xi s \phi (y+t) \right) , \end{aligned} \end{aligned}$$

where the final inequality follows from Lemma 2.2(b) when p is sufficiently small. Inserting this bound in (4.9), we proceed by using that the events \(E_2({\mathfrak {r}})\) are mutually disjoint for all \({\mathfrak {r}}\) to bound

$$\begin{aligned}&{\mathbb {P}}_p(R_e, R_w \text { are hor-trav} \mid E \cap F) \nonumber \\&\quad \leqslant \exp \left( - \delta _\xi s \phi (y+t)\right) \sum _{m=1}^{s /(2\xi )} \sum _{{\mathfrak {r}}\in {\mathfrak {R}}_{s-m}} {\mathbb {P}}_p(E_2({\mathfrak {r}}) \mid E \cap F)\nonumber \\&\quad \leqslant \exp \left( - \delta _\xi s \phi (y+t)\right) . \end{aligned}$$
(4.10)

Combining (4.7) and (4.10) we bound the first term in (4.6) by \( p^{-\xi } \exp (-U^p(R,R'))\).

Now we bound the second term in (4.6). If \(Y > s/(2\xi )\) then at least \(s/(2 \xi )\) out of s columns are non-empty. The probability that a column is non-empty is \(1-(1-p)^t \leqslant 2pt\) (when p is sufficiently small). Therefore, \({\mathbb {P}}_p(Y > s/(2\xi )) \leqslant {\mathbb {P}}(\mathrm {Bin}(s, 2pt) > s/(2\xi ))\). We use the Chernoff bound \({\mathbb {P}}(\mathrm {Bin}(n, \mu ) > q) \leqslant \mathrm {e}^{-q}\), which is valid when \(q \geqslant \mathrm {e}^2 n \mu \), to estimate

$$\begin{aligned} {\mathbb {P}}_p(Y > s/(2\xi )) \leqslant \exp (-s/(2\xi )) \end{aligned}$$

(here we used that \(t \leqslant \frac{1}{p} \log ^{-4} \frac{1}{p}\)). Observe that since \(\xi = \lceil \log ^2 \frac{1}{p} \rceil \), \(\delta _\xi = 1- 2\xi ^{-1}\), and \(\phi (y+t) < \log ^{-12} \frac{1}{p}\) by our assumption that \(y+t > \frac{4}{p} \log \log \frac{1}{p}\), we have

$$\begin{aligned} \exp (-s/(2 \xi )) \leqslant \exp \left( -\frac{\delta _\xi }{1-\delta _\xi } s \phi (y+t)\right) . \end{aligned}$$

Now recall our assumption (4.5), that \((1-\delta _\xi ) t \psi (x+s) \leqslant \delta _\xi s \phi (y+t)\), which rearranges to \(t \psi (x+s) \leqslant \frac{\delta _\xi }{1-\delta _\xi } s \phi (y+t)\). Applying this inequality, it follows that

$$\begin{aligned} \frac{\delta _\xi }{1-\delta _\xi } s \phi (y+t) = \delta _\xi s \phi (y+t) + \delta _\xi \cdot \frac{\delta _\xi }{1-\delta _\xi } s \phi (y+t) \geqslant \delta _\xi s \phi (y+t) + \delta _\xi t \psi (x+s). \end{aligned}$$

We thus have \({\mathbb {P}}_p(Y > s/(2\xi )) \leqslant \exp (-U^p(R,R'))\), as required.
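The Chernoff-type tail bound \({\mathbb {P}}(\mathrm {Bin}(n,\mu ) > q) \leqslant \mathrm {e}^{-q}\), used above in the regime where q greatly exceeds the mean \(n\mu \), can be sanity-checked against the exact binomial tail; the parameter values below are arbitrary illustrations:

```python
# Sanity check of the tail bound P(Bin(n, mu) > q) <= e^{-q} in the regime
# where q greatly exceeds the mean n*mu. The exact tail is summed directly
# via math.comb; the parameter values are arbitrary test values.
import math

def binom_tail(n, mu, q):
    """Exact P(Bin(n, mu) > q)."""
    return sum(math.comb(n, i) * mu ** i * (1 - mu) ** (n - i)
               for i in range(q + 1, n + 1))

n, mu, q = 1000, 0.001, 10      # mean n*mu = 1, threshold q = 10 >= e^2 * mean
tail = binom_tail(n, mu, q)
assert tail <= math.exp(-q)
print(f"P(Bin({n}, {mu}) > {q}) = {tail:.3g} <= e^-{q} = {math.exp(-q):.3g}")
```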

Applying the bounds for the two cases to (4.6) completes the proof (using the crude upper bound \(p^{-\xi } + 1 \leqslant 2 p^{- \xi }\) for p sufficiently small). \(\square \)

5 The upper bound of Theorem 1.3

Proposition 5.1

Let \(p > 0\) and \(\frac{1}{3p} \log \frac{1}{p} \leqslant y \leqslant \frac{1}{p} \log \frac{1}{p}\) and \(\frac{1}{p^2} \leqslant x \leqslant \frac{1}{p^5}\). Then

$$\begin{aligned} {\mathbb {P}}_p([x] \times [y] \text { is IF}) \leqslant \exp \left( - \frac{2C_1}{p} \log ^2 \frac{1}{p} + (2 C_2 + o(1)) \frac{1}{p} \log \frac{1}{p} \right) . \end{aligned}$$

5.1 Notation and definitions

Before we proceed with the proof, we must introduce some more notation and a few definitions. Our proof uses hierarchies. The notion of hierarchies is due to Holroyd [33], and has since become common in much of the bootstrap percolation literature. Here we use a definition of a hierarchy that is similar to the one in [23]:

Definition 5.2

(Hierarchies).

  1. (a)

    Hierarchy, seed, normal vertex, and splitter: A hierarchy \({\mathcal {H}}\) is a rooted tree with out-degrees at most three and with each vertex v labeled by a non-empty rectangle \(R_v\) such that \(R_v\) contains all the rectangles that label the descendants of v. If the number of descendants of a vertex is 0, we call the vertex a seed. If the vertex has one descendant, we call it a normal vertex, and we write \(u\mapsto v\) to indicate that u is a normal vertex with (unique) descendant v. If the vertex has two or more descendants, we call it a splitter. We write \(N({\mathcal {H}})\) for the number of vertices in the tree \({\mathcal {H}}\).

  2. (b)

    Precision: A hierarchy of precision Z (with \(Z \geqslant 1\)) is a hierarchy that satisfies the following conditions:

    1. (1)

      If w is a seed, then \({\mathbf {x}}(R_w) \geqslant 2\) and \({\mathbf {y}}(R_w)<2Z\), while if u is a normal vertex or a splitter, then \({\mathbf {y}}(R_u)\geqslant 2Z\).

    2. (2)

      If u is a normal vertex with descendant v, then \({\mathbf {y}}(R_u)-{\mathbf {y}}(R_v) \leqslant 2Z\).

    3. (3)

      If u is a normal vertex with descendant v and v is either a seed or a normal vertex, then \({\mathbf {y}}(R_u)-{\mathbf {y}}(R_v) >Z\).

    4. (4)

      If u is a splitter with descendants \(v_1,\dots ,v_i\) and \(i \in \{2,3\}\), then there exists \(j\in \{1,\dots ,i\}\) such that \({\mathbf {y}}(R_{u})-{\mathbf {y}}(R_{v_j})> Z.\)

  3. (c)

    Presence: Given a set of infected sites \({\mathcal {S}}\) we say that a hierarchy \({\mathcal {H}}\) is present in \({\mathcal {S}}\) if all of the following events occur disjointly:

    1. (1)

      For each seed w, \(R_w = \langle R_w \cap {\mathcal {S}}\rangle \) (i.e., \(R_w\) is internally filled by \({\mathcal {S}}\)).

    2. (2)

      For each normal u and every v such that \(u \mapsto v\), \(R_u = \langle (R_v \cup {\mathcal {S}}) \cap R_u \rangle \) (i.e., the event \(\{R_v \Rightarrow R_u\}\) occurs on \({\mathcal {S}}\)).

  4. (d)

    Goodness: Following [30], we say that a seed w is large if \(Z/3 \leqslant {\mathbf {y}}(R_w) \leqslant Z\). We call a hierarchy good if it has at most \(\log ^{11} \frac{1}{p}\) large seeds, and we call it bad otherwise.
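The vertex classification and the goodness condition of Definition 5.2 can be sketched in code as follows. This is a minimal illustration; the class and function names are ours, and the out-degree restriction is not enforced.

```python
import math

class Node:
    """A vertex of a hierarchy, labelled by the dimensions of its rectangle R_v."""
    def __init__(self, dims, children=()):
        self.dims = dims                # (x(R_v), y(R_v))
        self.children = list(children)

def kind(v):
    """Seed / normal / splitter classification of Definition 5.2(a)."""
    k = len(v.children)
    return "seed" if k == 0 else ("normal" if k == 1 else "splitter")

def large_seeds(root, Z):
    """Count seeds w with Z/3 <= y(R_w) <= Z, the 'large' seeds of part (d)."""
    stack, count = [root], 0
    while stack:
        v = stack.pop()
        stack.extend(v.children)
        if kind(v) == "seed" and Z / 3 <= v.dims[1] <= Z:
            count += 1
    return count

def is_good(root, p):
    """Good: at most log^11(1/p) large seeds, here with Z = (1/p) log^-8(1/p)."""
    Z = (1 / p) * math.log(1 / p) ** (-8)
    return large_seeds(root, Z) <= math.log(1 / p) ** 11
```

A tree with a unique child is classified as normal, a leaf as a seed, and a vertex with two or three children as a splitter, exactly as in part (a).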

5.2 Outline of the proof of Proposition 5.1

In this section we give the proof of Proposition 5.1 subject to Lemma  5.9 below. We prove Lemma 5.9 in Sect. 6.

Let \({\mathcal {H}}_{Z,R}\) denote a hierarchy with root R and precision Z. Let \({\mathbb {H}}_{Z,R}\) denote the set of all \({\mathcal {H}}_{Z,R}\). Likewise, let \({\mathbb {H}}_{Z,R}^{\scriptscriptstyle \mathrm {good}}\) and \({\mathbb {H}}_{Z,R}^{\scriptscriptstyle \mathrm {bad}}\) denote the subsets of good and bad hierarchies in \({\mathbb {H}}_{Z,R}\). Lastly, given a set of hierarchies \({\mathbb {H}}\) and a rectangle R, define the event

$$\begin{aligned} {\mathcal {X}}(R; {\mathbb {H}})\, {:=} \,\{{\mathcal {S}}\in \{0,1\}^{{\mathbb {Z}}^2} \, : \, \exists {\mathcal {H}}\in {\mathbb {H}} \text { such that } {\mathcal {H}}\text { is present in }{\mathcal {S}}\cap R\}. \end{aligned}$$

Lemma 5.3

Let R be a rectangle with \({\mathbf {x}}(R) \geqslant 2\) and let \(Z \geqslant 3\). If R is internally filled, then there exists a hierarchy \({\mathcal {H}}_{Z,R} \in {\mathbb {H}}_{Z,R}\) that is present, i.e., \({\mathcal {X}}(R; {\mathbb {H}}_{Z,R})\) occurs.

The proof of this lemma is the same as the proof of [23, Proposition 3.8] so we do not repeat it here. (But note that it does not matter that our definition of hierarchies uses “internally filled” rather than “k-occurs”.)
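For concreteness, the bootstrap operator (1.1), its closure \(\langle \cdot \rangle \), and the notion of a rectangle being internally filled can be sketched as follows. This is a minimal illustration with the neighbourhood and threshold as parameters; in the example we use the isotropic two-neighbour rule rather than the paper's anisotropic rule.

```python
def closure(infected, width, height, nbhd, r):
    """Iterate the bootstrap operator B of (1.1), restricted to the rectangle
    [0, width) x [0, height), until no new site becomes infected."""
    infected = {(x, y) for (x, y) in infected
                if 0 <= x < width and 0 <= y < height}
    changed = True
    while changed:
        changed = False
        for x in range(width):
            for y in range(height):
                if (x, y) in infected:
                    continue
                if sum((x + dx, y + dy) in infected for dx, dy in nbhd) >= r:
                    infected.add((x, y))
                    changed = True
    return infected

def internally_filled(infected, width, height, nbhd, r):
    """A rectangle R is internally filled (IF) by S iff <R cap S> = R."""
    full = {(x, y) for x in range(width) for y in range(height)}
    return closure(infected, width, height, nbhd, r) == full

# Isotropic two-neighbour rule, for illustration only (the paper's model
# uses an anisotropic neighbourhood instead):
NBHD, R = [(1, 0), (-1, 0), (0, 1), (0, -1)], 2
diagonal = {(i, i) for i in range(4)}
assert internally_filled(diagonal, 4, 4, NBHD, R)      # a diagonal fills the square
assert not internally_filled({(0, 0)}, 3, 3, NBHD, R)  # a single site does not
```

The closure loop is the brute-force fixed point \(\langle {\mathcal {S}}\rangle = \lim _t {\mathcal {B}}^{(t)}({\mathcal {S}})\), which always terminates on a finite rectangle.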

Throughout this paper, let

$$\begin{aligned} Z_p\, {:=}\,\frac{1}{p} \log ^{-8} \frac{1}{p}. \end{aligned}$$

In accordance with the hypothesis of Proposition 5.1, we restrict ourselves to hierarchies whose root label \(R_p\) has dimensions \((x,y)\) such that

$$\begin{aligned} \frac{1}{p^2} \leqslant x \leqslant \frac{1}{p^5} \qquad \text { and } \qquad \frac{1}{3p} \log \frac{1}{p} \leqslant y \leqslant \frac{1}{p} \log \frac{1}{p}. \end{aligned}$$

For the sake of simplicity we often suppress subscripts \(Z_p\) and \(R_p\).

We bound the good and bad hierarchies separately:

$$\begin{aligned} {\mathbb {P}}_p(R_p \text { is IF}) \leqslant {\mathbb {P}}_p({\mathcal {X}}(R_p; {\mathbb {H}}^{\scriptscriptstyle \mathrm {good}})) + {\mathbb {P}}_p({\mathcal {X}}(R_p; {\mathbb {H}}^{\scriptscriptstyle \mathrm {bad}})). \end{aligned}$$
(5.1)

We bound the second term with the following lemma:

Lemma 5.4

As p tends to 0 we have

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {X}}(R_p; {\mathbb {H}}^{\scriptscriptstyle \mathrm {bad}}) ) \leqslant \exp \left( - \Omega \left( \frac{1}{p} \log ^3 \frac{1}{p} \right) \right) . \end{aligned}$$

Proof

We claim that if R is a large seed, i.e., \(Z_p/3 \leqslant {\mathbf {y}}(R) \leqslant Z_p\), then

$$\begin{aligned} {\mathbb {P}}_p(R \text { is IF}) \leqslant \exp \left( -\Omega \left( \frac{1}{p} \log ^{-8} \frac{1}{p}\right) \right) . \end{aligned}$$

To see that this is indeed the case we consider the cases \(x \geqslant 1/p\) and \(x < 1/p\) separately. For the case \(x \geqslant 1/p\), the bound follows from Lemma 2.2(c):

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_p(R \text { is IF})&\leqslant {\mathbb {P}}_p(R \text { is hor-trav}) \leqslant \exp \left( -(x-2)\left( \tfrac{1}{2} p y - 3 p^2 y^2 \right) \right) \\&\leqslant \exp \left( -\left( \frac{1}{p}-2\right) \left( \tfrac{1}{2} p \cdot \frac{1}{3p} \log ^{-8} \frac{1}{p} - 3 p^2 \cdot \frac{1}{p^2} \log ^{-16} \frac{1}{p}\right) \right) \\&\leqslant \exp \left( -\Omega \left( \frac{1}{p} \log ^{-8} \frac{1}{p} \right) \right) . \end{aligned} \end{aligned}$$

For the case \(x < 1/p\), the bound follows from Lemma 3.1 with \(k=2\) and p sufficiently small:

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_p(R \text { is IF})&\leqslant {\mathbb {P}}_p(R \text { is up-trav}) \leqslant p^{-2} \mathrm {e}^{y/2} (104 p)^y\\&= \exp \left( 2 \log \frac{1}{p} + \frac{y}{2} + y \log (104) - y \log \frac{1}{p} \right) \\&= \exp \left( - (1+o(1)) y \log \frac{1}{p} \right) = \exp \left( -\Omega \left( \frac{1}{p} \log ^{-8} \frac{1}{p} \right) \right) . \end{aligned} \end{aligned}$$

Now consider the event \({\mathcal {X}}(R_p; {\mathbb {H}}^{\scriptscriptstyle \mathrm {bad}})\). This event implies that there exists a bad hierarchy \({\mathcal {H}}\) that is present, which by definition means that more than \(\log ^{11} \frac{1}{p}\) rectangles with \({\mathbf {y}}\)-dimension between \(Z_p/3\) and \(Z_p\) are internally filled disjointly. Since \(R_p\) contains at most \(\frac{1}{p^6} \log \frac{1}{p}\) sites, the probability of this event is smaller than

$$\begin{aligned} \left( \frac{1}{p^6} \log \frac{1}{p} \cdot \mathrm {e}^{-\frac{c}{p} \log ^{-8} \frac{1}{p} }\right) ^{ \log ^{11} \frac{1}{p}} \leqslant \exp \left( - \Omega \left( \frac{1}{p} \log ^{3} \frac{1}{p}\right) \right) . \end{aligned}$$

\(\square \)

We bound the first term of (5.1) as follows:

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {X}}(R_p; {\mathbb {H}}^{\scriptscriptstyle \mathrm {good}})) \leqslant |{\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}| \max _{{\mathcal {H}}\in {\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}} {\mathbb {P}}_p({\mathcal {H}}\text { is present}). \end{aligned}$$
(5.2)

Now we apply the following lemma.

Lemma 5.5

The number of good hierarchies satisfies

$$\begin{aligned} |{\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}| \leqslant \mathrm {e}^{O(1/p)}. \end{aligned}$$

Proof

We start by observing that any good hierarchy \({\mathcal {H}}\in {\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}\) has root label \(R_p\) with \({\mathbf {x}}(R_p) \leqslant \frac{1}{p^5}\) and \({\mathbf {y}}(R_p) \leqslant \frac{1}{p} \log \frac{1}{p}\), and precision \(Z_p = \frac{1}{p} \log ^{-8} \frac{1}{p}\), so its height \(h({\mathcal {H}})\) is bounded from above by

$$\begin{aligned} h({\mathcal {H}}) \leqslant \frac{{\mathbf {y}}(R_p)}{Z_p} \leqslant \log ^9 \frac{1}{p}. \end{aligned}$$

Moreover, since there are at most \(\log ^{11} \frac{1}{p}\) large seeds in a good hierarchy, the number of vertices \(N_{\mathcal {H}}\) in the hierarchy \({\mathcal {H}}\) obeys

$$\begin{aligned} N_{\mathcal {H}}\leqslant \log ^{11} \frac{1}{p} \cdot \log ^9 \frac{1}{p} = \log ^{20} \frac{1}{p}. \end{aligned}$$
(5.3)

Each vertex of a hierarchy has 0, 1, 2 or 3 descendants, so there are at most \(4^{\log ^{20} \frac{1}{p}}\) unlabelled trees corresponding to the good hierarchies. Finally, since each vertex of a hierarchy is labelled by a sub-rectangle of \(R_p\) with \({\mathbf {x}}(R_p) \leqslant p^{-5}\), the number of choices for each label is bounded from above by

$$\begin{aligned} {\mathbf {x}}(R_p)^2 \cdot {\mathbf {y}}(R_p)^2 \leqslant p^{-10} \cdot \frac{1}{p^2} \log ^2 \frac{1}{p} \leqslant p^{-13}, \end{aligned}$$

so

$$\begin{aligned} |{\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}| \leqslant \left( 4p^{-13}\right) ^{\log ^{20} \frac{1}{p}} = \mathrm {e}^{13 \log ^{21} \frac{1}{p} + \log 4 \log ^{20} \frac{1}{p}} \leqslant \mathrm {e}^{O(1/p)}. \end{aligned}$$

\(\square \)
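The counting step above, that each of the at most \(N\) vertices has one of four possible child-counts and hence at most \(4^N\) ordered trees arise, can be verified exactly for small \(N\). A minimal sketch (the function name t is ours; t(n) counts ordered rooted trees with out-degree at most three on exactly n vertices):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t(n):
    """Ordered rooted trees with exactly n vertices, each with at most 3 children."""
    if n == 1:
        return 1
    total = t(n - 1)               # root with a single subtree on n-1 vertices
    for a in range(1, n - 1):      # two subtrees, of sizes a and n-1-a
        total += t(a) * t(n - 1 - a)
    for a in range(1, n - 2):      # three subtrees, of sizes a, b, n-1-a-b
        for b in range(1, n - 1 - a):
            total += t(a) * t(b) * t(n - 1 - a - b)
    return total

# At most 4^N trees on at most N vertices, as used in the proof of Lemma 5.5:
for N in range(1, 15):
    assert sum(t(k) for k in range(1, N + 1)) <= 4 ** N
```

The bound \(4^N\) is crude (it simply encodes the tree by the sequence of child-counts in depth-first order), but that is all the proof needs.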

By Lemma 5.5 and (5.2), it therefore suffices to bound the probability that a given good hierarchy is present, uniformly over all good hierarchies. That is, it remains to show that

$$\begin{aligned} \max _{{\mathcal {H}}\in {\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}} {\mathbb {P}}_p({\mathcal {H}}\text { is present}) \leqslant \exp \left( - \frac{2C_1}{p} \log ^2 \frac{1}{p} + (2 C_2 + o(1)) \frac{1}{p} \log \frac{1}{p} \right) . \end{aligned}$$

Before we proceed, let us deal with a small technical issue: the possibility of “wide” seeds (Lemma 5.9 below does not work in their presence). Observe that if the hierarchy \({\mathcal {H}}\) that maximises the probability of being present contains a seed with label \(R_s\) such that \({\mathbf {x}}(R_s) > \frac{1}{3p} \log ^{12} \frac{1}{p}\) (i.e., the seed is extremely wide), then the probability that \({\mathcal {H}}\) is present is bounded by the probability that \(R_s\) is horizontally traversable, which, by Lemma 2.2(c), can be bounded as follows:

$$\begin{aligned} {\mathbb {P}}_p(R_s \text { is IF})\leqslant & {} {\mathbb {P}}_p(R_s \text { is hor-trav})\\\leqslant & {} \exp \left( -({\mathbf {x}}(R_s)-2) \left( \tfrac{1}{2} p {\mathbf {y}}(R_s) - 3 p^2 {\mathbf {y}}(R_s)^2\right) \right) \\\leqslant & {} \exp \left( -\left( \frac{1}{p} \log ^{12} \frac{1}{p} -2\right) \left( \tfrac{1}{2} \log ^{-8} \frac{1}{p} - 3 \log ^{-16} \frac{1}{p} \right) \right) \\= & {} \exp \left( -\Omega \left( \frac{1}{p} \log ^3 \frac{1}{p} \right) \right) , \end{aligned}$$

where for the third inequality we used the assumption on \({\mathbf {x}}(R_s)\) and that \({\mathbf {y}}(R_s) < 2Z = \frac{2}{p} \log ^{-8} \frac{1}{p}\). Proposition 5.1 thus holds for hierarchies with wide seeds. Let us therefore assume from here on that \({\mathbf {x}}(R_s) \leqslant \frac{1}{3p} \log ^{12} \frac{1}{p}\) for all seeds.

By the BK-inequality we have

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {H}}\text { is present}) \leqslant \prod _{u \text { seed}} {\mathbb {P}}_p(R_u \text { is IF}) \prod _{v \mapsto w } {\mathbb {P}}_p\left( R_w \Rightarrow R_v \right) \end{aligned}$$
(5.4)

(we ignore here the contributions from splitter vertices).

The following lemma is used to bound the product over the seeds:

Lemma 5.6

Given a hierarchy \({\mathcal {H}}_{Z,R}\), let \(N_{\mathrm{seed}}\) denote the number of seeds of the hierarchy \({\mathcal {H}}_{Z,R}\), and let \(u_1,\dots ,u_{N_{\mathrm{seed}}}\) be an arbitrary ordering of the seeds of \({\mathcal {H}}_{Z,R}\). Then

$$\begin{aligned} \prod _{u\text { seed}}{\mathbb {P}}_p(R_u \text { is IF}) \le \prod _{n=1}^{N_{\mathrm{seed}}}{\mathbb {P}}_p\left( {\tilde{R}}_{n-1} \Rightarrow {\tilde{R}}_{n}\right) \end{aligned}$$

where \({\tilde{R}}_0 \, {:=} \, [1] \times [1]\) and, for \(1 \leqslant n \leqslant N_{\mathrm{seed}}\),

$$\begin{aligned} {\tilde{R}}_n= \left[ 1,{\mathbf {x}}(R_{u_1})+\dots +{\mathbf {x}}(R_{u_n}) \right] \times \left[ 1,{\mathbf {y}}(R_{u_1})+\dots +{\mathbf {y}}(R_{u_n}) \right] . \end{aligned}$$

Proof

For any \(p>0\), any rectangle \(R'\) with dimensions \((x,y)\) such that \(\min \{x,y\} \geqslant 2\), and any \(a \geqslant 2\), \(b \geqslant 1\), we have

$$\begin{aligned} {\mathbb {P}}_p(R' \text { is IF}) \leqslant {\mathbb {P}}_p([1,a] \times [1,b] \Rightarrow [1,a+x] \times [1,b+y]). \end{aligned}$$
(5.5)

Indeed, if the rectangle \([a+1,a+x] \times [b+1, b+y]\) is internally filled, then the event \(\{[1,a] \times [1,b] \Rightarrow [1,a+x] \times [1, b+y]\}\) occurs. An application of the FKG-inequality thus gives (5.5).

Any seed of a hierarchy must have dimensions at least (2, 1) by definition, and if a rectangle R is internally filled then the event \(\{[1] \times [1] \Rightarrow R\}\) occurs, so an iterated application of (5.5) completes the proof.\(\square \)

Recall the definition of \(U^p(R,R')\) in (4.3) above. We use Lemmas  4.1 and 5.6 to bound the first product on the right-hand side of (5.4):

$$\begin{aligned} \begin{aligned} \prod _{u \text { seed}} {\mathbb {P}}_p(R_u \text { is IF})&\leqslant \prod _{n=1}^{N_{\mathrm{seed}}}{\mathbb {P}}_p({\tilde{R}}_{n-1} \Rightarrow {\tilde{R}}_{n})\\&\leqslant 2p^{-\xi N_{\mathrm{seed}} } \exp \left( -\sum _{n=0}^{N_{\mathrm{seed}}-1} U^p ({{\tilde{R}}}_{n} , {{\tilde{R}}}_{n+1})\right) , \end{aligned} \end{aligned}$$

where \({{\tilde{R}}}_0 = [1] \times [1]\).

To bound the second product of (5.4), we use the following lemma:

Lemma 5.7

Let \(p>0\). Let \(N_{\mathrm{splitter}}\) denote the number of splitters of the hierarchy \({\mathcal {H}}\). Then there exists an integer \({\hat{N}} = {\hat{N}}({\mathcal {H}}) \geqslant 1\) and a sequence of nested rectangles \({\hat{R}}_0\subset \cdots \subset {\hat{R}}_{{\hat{N}}}\) with the following properties:

  • \({\hat{R}}_0 = {{\tilde{R}}}_{N_{\mathrm{seed}}}\) (with \({{\tilde{R}}}_{N_{\mathrm{seed}}}\) as defined in Lemma 5.6 above),

  • \({\hat{R}}_{{\hat{N}}}\) has dimensions at least as large as those of R,

  • \({\mathbf {y}}({\hat{R}}_{n+1})-{\mathbf {y}}({\hat{R}}_n)\leqslant \frac{1}{p} \log ^{-8} \frac{1}{p}\) for every \(0\leqslant n\leqslant {\hat{N}}-1\),

  • for p sufficiently small,

    $$\begin{aligned} \prod _{v\mapsto w}{\mathbb {P}}_p(R_w \Rightarrow R_v)\leqslant 2p^{-\xi N_{\mathrm{splitter}}}\prod _{n=0}^{{\hat{N}}-1}\exp \left( -U^p({\hat{R}}_n, {\hat{R}}_{n+1})\right) . \end{aligned}$$

The proof of this lemma goes by induction, using Lemma 4.1, and it is essentially the same as the proof of [23, Lemma 3.11], so we omit it here.

By Lemma 5.7 there thus exist rectangles \({\hat{R}}_0\subset \cdots \subset {\hat{R}}_{{\hat{N}}}\) satisfying the conditions of the lemma such that

$$\begin{aligned} \prod _{v\mapsto w}{\mathbb {P}}_p(R_w \Rightarrow R_v) \leqslant 2p^{-\xi N_{\mathrm{splitter}}}\prod _{n=0}^{{\hat{N}}-1}\exp \left( -U^p({\hat{R}}_n,{\hat{R}}_{n+1})\right) . \end{aligned}$$

Using Lemmas 4.1 and 5.7 and writing \((R_n)_{n=0}^{N}\) for the concatenation of the sequences \(({{\tilde{R}}}_n)_{n=0}^{N_{\mathrm{seed}}}\) and \(({\hat{R}}_n)_{n=0}^{{\hat{N}}}\), i.e.,

$$\begin{aligned} (R_n)_{n=0}^N\, {:=} \,({{\tilde{R}}}_0, {{\tilde{R}}}_1, \dots , {{\tilde{R}}}_{N_{\mathrm{seed}}}, {\hat{R}}_1, \dots , {\hat{R}}_{{\hat{N}}}) \end{aligned}$$

with \(N\, {:=} \,N_{\mathrm{seed}} + {\hat{N}}\), we bound

$$\begin{aligned} {\mathbb {P}}_p({\mathcal {H}}\text { is present}) \leqslant 4p^{-\xi (N_{\mathrm{seed}} + N_{\mathrm{splitter}})}\, \exp \left( -\sum _{n=0}^{N -1} U^p(R_n, R_{n+1}) \right) . \end{aligned}$$
(5.6)

To bound the first factor in (5.6) we use the following lemma:

Lemma 5.8

Any good hierarchy satisfies

$$\begin{aligned} 4p^{-\xi (N_{\mathrm{seed}} + N_{\mathrm{splitter}})} \leqslant \mathrm {e}^{O(1/p)}. \end{aligned}$$

Proof

By (5.3) there are at most \(\log ^{20} \frac{1}{p}\) vertices in a good hierarchy, and \(\xi = \lceil \log ^2 \frac{1}{p} \rceil \), so for any \({\mathcal {H}}\in {\mathbb {H}}^{\scriptscriptstyle \mathrm {good}}\),

$$\begin{aligned} 4p^{-\xi (N_{\mathrm{seed}} + N_{\mathrm{splitter}})} \leqslant 4p^{-\lceil \log \frac{1}{p} \rceil ^{22}} = 4\left( \mathrm {e}^{\log \frac{1}{p}}\right) ^{ \lceil \log \frac{1}{p} \rceil ^{22}} \leqslant \mathrm {e}^{O(1/p)}. \end{aligned}$$

\(\square \)

The final ingredient of the proof is the following lemma:

Lemma 5.9

Let \({\mathcal {R}}_N = \{R_n\}_{n=0}^N\) be a sequence of increasing, nested rectangles, where \(R_n\) has dimensions \((x_n, y_n)\), such that

  • \(R_{0} = [1]\times [1]\),

  • \(y_1 \leqslant \frac{2}{p} \log ^{-8} \frac{1}{p}\) and \(\frac{1}{3p} \log \frac{1}{p} \leqslant y_N \leqslant \frac{1}{p} \log \frac{1}{p}\),

  • \(x_1 \leqslant \frac{1}{3p} \log ^{12} \frac{1}{p}\) and \(x_N \geqslant \frac{1}{p^{2}}\).

Then

$$\begin{aligned} \exp \left( -\sum _{n=0}^{N-1} U^p(R_n, R_{n+1})\right) \leqslant \exp \left( - \frac{2C_1}{p} \log ^2 \frac{1}{p} + (2 C_2 + o(1)) \frac{1}{p} \log \frac{1}{p} \right) . \end{aligned}$$

The proof involves a longer computation, so we defer it to Sect. 6.

Proof of Proposition 5.1 subject to Lemma 5.9

We combine the above lemmas and the bounds derived in the discussion: by (5.1), (5.2) and (5.6), together with Lemmas 5.4, 5.5, 5.8 and 5.9,

$$\begin{aligned} {\mathbb {P}}_p(R_p \text { is IF})&\leqslant \mathrm {e}^{O(1/p)} \exp \left( - \frac{2C_1}{p} \log ^2 \frac{1}{p} + (2 C_2 + o(1)) \frac{1}{p} \log \frac{1}{p} \right) + \mathrm {e}^{-\Omega \left( \frac{1}{p} \log ^3 \frac{1}{p}\right) }\\&\leqslant \exp \left( - \frac{2C_1}{p} \log ^2 \frac{1}{p} + (2 C_2 + o(1)) \frac{1}{p} \log \frac{1}{p} \right) , \end{aligned}$$

as claimed. \(\square \)

It remains to prove Lemma 5.9. We do this in the upcoming section.

6 Variational principles: proof of Lemma 5.9

To prove Lemma 5.9 we will start by setting up some variational principles, similar to [33, Sect. 6]. We start with a few general lemmas.

Assume throughout this section that f(x) and g(y) are positive, non-increasing, convex, Riemann-integrable functions. Let \({\mathbb {R}}_+ = (0,\infty )\) and for \({\underline{a}}=(a_1,a_2) \in {\mathbb {R}}_+^2\) and \({\underline{b}} = (b_1, b_2) \in {\mathbb {R}}_+^2\), write \({\underline{a}} \leqslant {\underline{b}}\) if \(a_1 \leqslant b_1\) and \(a_2 \leqslant b_2\). For \({\underline{a}}, {\underline{b}} \in {\mathbb {R}}^2_{+}\) with \({\underline{a}} \leqslant {\underline{b}}\) and any path \(\gamma \) from \({\underline{a}}\) to \({\underline{b}}\), define

$$\begin{aligned} w_{f,g}(\gamma ) \,{:=} \,\int \limits _\gamma f(x) \mathrm {d}y + g(y) \mathrm {d}x, \end{aligned}$$
(6.1)

and

$$\begin{aligned} W_{f,g}({\underline{a}}, {\underline{b}}) \,{:=} \,\inf _{\gamma \,:\, {\underline{a}} \rightarrow {\underline{b}}} \int \limits _\gamma f(x) \mathrm {d}y + g(y) \mathrm {d}x. \end{aligned}$$
(6.2)

To start, an elementary lemma:

Lemma 6.1

If \({\underline{a}} \leqslant {\underline{b}} \leqslant {\underline{c}}\), then \(W_{f,g} ({\underline{a}}, {\underline{b}}) + W_{f,g}({\underline{b}}, {\underline{c}}) \geqslant W_{f,g} ({\underline{a}}, {\underline{c}})\).

The proof is easy (see [33, Sect. 6]).

Let

$$\begin{aligned} \Delta _{f,g}\, {:=} \,\left\{ (x,y) \in {\mathbb {R}}_+^2 \, : \, f'(x) \ne 0, g'(y) \ne 0, f'(x) = g'(y)\right\} . \end{aligned}$$
(6.3)

Note that since f(x) and g(y) are assumed to be convex decreasing functions, \(\Delta _{f,g}\) describes a simple curve in \( [a,b] \times [c,d] \subset {\mathbb {R}}_+^2\) if \(f'(x) \ne 0\) for all \(x \in [a,b]\) and \(g'(y) \ne 0\) for all \(y \in [c,d]\).

For sets \(A, B \subseteq {\mathbb {R}}_+^2\) we say that A lies Northwest of B and we write \(A \succcurlyeq B\) if for any \({\underline{a}} \in A\) and any \({\underline{b}} \in B\) that satisfy \(a_1 + a_2 = b_1 + b_2\) we have \(a_2 \geqslant b_2\).
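For finite point sets, the Northwest relation can be made concrete as follows (a minimal sketch; the function name northwest_of is ours):

```python
def northwest_of(A, B, tol=1e-12):
    """A lies Northwest of B: whenever a in A and b in B lie on the same
    antidiagonal (a1 + a2 == b1 + b2), a sits weakly higher (a2 >= b2)."""
    return all(a[1] >= b[1] - tol
               for a in A for b in B
               if abs(a[0] + a[1] - b[0] - b[1]) < tol)

# Points on distinct antidiagonals impose no constraint; comparable pairs must
# have the first set weakly higher:
assert northwest_of([(0, 2), (1, 3)], [(1, 1), (2, 2)])
assert not northwest_of([(1, 1)], [(0, 2)])
```

Note that the relation only constrains pairs on a common antidiagonal, so two sets can each fail to lie Northwest of the other.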

Lemma 6.2

If \(\gamma _1\) and \(\gamma _2\) are paths from \({\underline{a}}\) to \({\underline{b}}\), and we have either \(\gamma _1 \succcurlyeq \gamma _2 \succcurlyeq \Delta _{f,g}\) or \(\Delta _{f,g} \succcurlyeq \gamma _2 \succcurlyeq \gamma _1\), then \(w_{f,g} (\gamma _1) \geqslant w_{f,g}(\gamma _2)\).

Proof

To start, assume that \(\gamma _1 \succcurlyeq \gamma _2 \succcurlyeq \Delta _{f,g}\). Let H be the region between \(\gamma _1\) and \(\gamma _2\):

$$\begin{aligned} H \,{:=} \,\{{\underline{u}} \, : \, {\underline{a}} \leqslant {\underline{u}} \leqslant {\underline{b}} \text { and } \gamma _1 \succcurlyeq \{ {\underline{u}} \} \succcurlyeq \gamma _2\}. \end{aligned}$$

By Green’s Theorem in the plane we have

$$\begin{aligned} w_{f,g}(\gamma _1) - w_{f,g}(\gamma _2) = \iint \limits _{H} \left( g'(y) - f'(x) \right) \mathrm {d}x \mathrm {d}y. \end{aligned}$$

Now, since \(\gamma _1 \succcurlyeq \gamma _2 \succcurlyeq \Delta _{f,g}\) we have \(H \succcurlyeq \Delta _{f,g}\), and since moreover f and g are convex decreasing functions, we have \(g'(y) - f'(x) \geqslant 0\) for all \((x,y) \in H\). It follows that \(w_{f,g}(\gamma _1) - w_{f,g}(\gamma _2) \geqslant 0\).

By the same reasoning we have \(w_{f,g}(\gamma _1) - w_{f,g}(\gamma _2) \geqslant 0\) when \(\Delta _{f,g} \succcurlyeq \gamma _2 \succcurlyeq \gamma _1\).

\(\square \)

Lemma 6.3

For \({\underline{a}}, {\underline{b}} \in \Delta _{f,g}\) with \({\underline{a}} \leqslant {\underline{b}}\), let \(\gamma _0 \,{:=}\,\Delta _{f,g} \cap ([a_1, b_1] \times [a_2, b_2])\). Then \(W_{f,g}({\underline{a}}, {\underline{b}}) = w_{f,g}(\gamma _0)\).

Proof

Suppose, for contradiction, that some path \(\gamma _1 \ne \gamma _0\) is a minimiser of \(W_{f,g}({\underline{a}}, {\underline{b}})\) but \(\gamma _0\) is not. Then \(\gamma _1\) must intersect \(\gamma _0\) in at least two points (counting \({\underline{a}}\) and \({\underline{b}}\) as intersection points as well). So we can find a set of disjoint curves \(\{\eta _i^1\}\) with \(\eta _i^1 \subset \gamma _1\) and a set of disjoint curves \(\{\eta _i^0\}\) with \(\eta _i^0 \subset \gamma _0\) such that \(\gamma _1 {\setminus } \cup _i \eta _i^1 = \gamma _0 {\setminus } \cup _i \eta _i^0\), such that \(\eta _i^1 \succcurlyeq \eta _i^0 \succcurlyeq \Delta _{f,g}\) or \(\Delta _{f,g} \succcurlyeq \eta _i^0 \succcurlyeq \eta _i^1\) for each i, and such that \(\eta _i^0\) and \(\eta _i^1\) have the same end-points. By Lemma 6.2, replacing the curve \(\eta _i^1\) by \(\eta _i^0\) in \(\gamma _1\) does not increase the value of the line integral. Repeating this procedure for each i, we end up replacing the minimiser \(\gamma _1\) by \(\gamma _0\) without increasing the value of the integral, contradicting the assumption that \(\gamma _0\) is not a minimiser. \(\square \)

Given a set of points \(\{{\underline{a}}_{\scriptscriptstyle (i)}\}\), with \({\underline{a}}_{\scriptscriptstyle (i)} \in {\mathbb {R}}_+^2\), we write \({\underline{a}}_{\scriptscriptstyle (1)} \rightarrow {\underline{a}}_{\scriptscriptstyle (2)} \rightarrow \cdots \rightarrow {\underline{a}}_{\scriptscriptstyle (n)}\) for the path that linearly interpolates between successive points \({\underline{a}}_{\scriptscriptstyle (i)}\) and \({\underline{a}}_{\scriptscriptstyle (i+1)}\). Given a path \(\gamma \) and two points \({\underline{a}}, {\underline{b}} \in \gamma \), we write \({\underline{a}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}\) for the part of \(\gamma \) between \({\underline{a}}\) and \({\underline{b}}\).

Lemma 6.4

If \(g(y)=c\) for some constant \(c \geqslant 0\) and f(x) is a positive, monotone decreasing function, then for \({\underline{a}} \leqslant {\underline{b}}\),

$$\begin{aligned} W_{f,g}({\underline{a}}, {\underline{b}}) = c (b_1-a_1) + f(b_1) (b_2 -a_2), \end{aligned}$$

and the path from \({\underline{a}}\) to \({\underline{b}}\) that minimises \(W_{f,g}({\underline{a}}, {\underline{b}})\) is \({\underline{a}} \rightarrow (b_1, a_2) \rightarrow {\underline{b}}\).

Proof

This follows directly from the definition of \(W_{f,g}\) and the assumptions on f and g.\(\square \)
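Lemma 6.4 can be sanity-checked numerically with a small evaluator for the line integral (6.1) over piecewise-linear paths. A minimal sketch, with toy choices of f, g and endpoints (all names here are ours):

```python
def w(path, f, g, steps=2000):
    """Evaluate int_gamma f(x) dy + g(y) dx over a piecewise-linear path,
    given as a list of points, by the midpoint rule on each segment."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        for i in range(steps):
            s = (i + 0.5) / steps
            x, y = x0 + s * (x1 - x0), y0 + s * (y1 - y0)
            total += (f(x) * (y1 - y0) + g(y) * (x1 - x0)) / steps
    return total

f = lambda x: 1.0 / x   # positive, decreasing, convex
g = lambda y: 0.5       # constant c, as in Lemma 6.4
a, b = (1.0, 1.0), (4.0, 3.0)

right_then_up = [a, (b[0], a[1]), b]   # the minimising path of Lemma 6.4
up_then_right = [a, (a[0], b[1]), b]

# W_{f,g}(a, b) = c (b1 - a1) + f(b1)(b2 - a2) = 0.5 * 3 + 0.25 * 2 = 2.0:
assert abs(w(right_then_up, f, g) - 2.0) < 1e-6
assert w(up_then_right, f, g) > w(right_then_up, f, g)
```

Moving right before moving up pays the vertical cost at the smallest value of f, which is why the elbow path through \((b_1, a_2)\) is optimal when g is constant.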

Recall the definitions of \(\psi (x)\) and \(\phi (y)\) from (4.1) and (4.2), and the definition of \(\Delta _{f,g}\) in (6.3). A direct computation shows that \(\psi '(x) = \phi '(y)\) is solved by

$$\begin{aligned} x(y) = \frac{e^{3py}}{3p} \qquad \text { and } \qquad y(x) = \frac{1}{3p} \log (3px) \end{aligned}$$

when both \(\psi (x) \ne 0\) and \(\phi (y) \ne 0\). Observe that

$$\begin{aligned} \begin{aligned} y\left( \frac{3 \xi ^2}{p}\right)&= \frac{1}{3p} \log (9 \xi ^2) < \frac{4}{p} \log \log \frac{1}{p},\\ y\left( \frac{1}{p^2}\right)&= \frac{1}{3p} \log \frac{3}{p},\\ x\left( \frac{4}{p} \log \log \frac{1}{p}\right)&= \frac{1}{3p} \log ^{12} \frac{1}{p}. \end{aligned} \end{aligned}$$

We can thus write

$$\begin{aligned} \begin{aligned} \Delta _{\psi ,\phi }&= \left\{ (x,y) \in {\mathbb {R}}_+^2 \, : \, \psi '(x) \ne 0, \phi '(y) \ne 0, \psi '(x) = \phi '(y) \right\} \\&= \left\{ \left( \frac{\mathrm {e}^{3py}}{3 p}, y\right) \, : \, y \in \left( \frac{4}{p} \log \log \frac{1}{p}, \frac{1}{3p} \log \frac{3}{p} \right] \right\} . \end{aligned} \end{aligned}$$

The leftmost and rightmost points of \(\Delta _{\psi ,\phi }\) are given by

$$\begin{aligned} {\underline{u}} = \left( \frac{1}{3 p} \log ^{12} \frac{1}{p}, \frac{4}{p} \log \log \frac{1}{p} \right) \quad \text { and } \quad {\underline{v}} = \left( \frac{1}{p^2}, \frac{1}{3p} \log \frac{3}{p}\right) . \end{aligned}$$
(6.4)
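The computations leading to (6.4) are exact identities and can be sanity-checked at any concrete value of p; the functions \(\psi \) and \(\phi \) themselves are not needed for this. A minimal sketch (names are ours):

```python
import math

p = 1e-3                 # a concrete small density; any p in (0, 1) works
L = math.log(1 / p)      # log(1/p)
LL = math.log(L)         # log log (1/p)

def x_of_y(y):           # x(y) = e^{3py} / (3p)
    return math.exp(3 * p * y) / (3 * p)

def y_of_x(x):           # y(x) = (1/(3p)) log(3px)
    return math.log(3 * p * x) / (3 * p)

# x(y) and y(x) are inverse to each other wherever both are defined:
assert abs(y_of_x(x_of_y(123.0)) - 123.0) < 1e-6

# The endpoints u and v of Delta_{psi,phi} as in (6.4):
u = (L ** 12 / (3 * p), 4 * LL / p)
v = (1 / p ** 2, math.log(3 / p) / (3 * p))
assert abs(x_of_y(u[1]) - u[0]) < 1e-6 * u[0]  # x(4/p loglog 1/p) = (1/3p) log^12 (1/p)
assert abs(y_of_x(v[0]) - v[1]) < 1e-6 * v[1]  # y(1/p^2) = (1/3p) log(3/p)
```

In particular \(x(\frac{4}{p} \log \log \frac{1}{p}) = \mathrm {e}^{12 \log \log \frac{1}{p}}/(3p) = \frac{1}{3p} \log ^{12} \frac{1}{p}\) and \(y(p^{-2}) = \frac{1}{3p} \log \frac{3}{p}\), matching (6.4).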

Lemma 6.5

Let \({\underline{u}} \) and \({\underline{v}}\) be as in (6.4), and let \({\underline{a}}\) be such that \({\underline{a}} \leqslant {\underline{u}}\), and let \({\underline{b}}\) be such that \({\underline{v}} \leqslant {\underline{b}}\). Then

$$\begin{aligned} W_{\psi ,\phi }({\underline{a}}, {\underline{b}}) = W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + W_{\psi ,\phi }({\underline{u}}, {\underline{v}}) + W_{\psi , \phi }({\underline{v}}, {\underline{b}}). \end{aligned}$$

Proof

By Lemma 6.1, the right-hand side is an upper bound on \(W_{\psi ,\phi }({\underline{a}}, {\underline{b}})\). It remains to prove that it is also a lower bound.

Since \(\psi \) and \(\phi \) are decreasing, positive, continuous functions, any path that minimises \(W_{\psi ,\phi }({\underline{a}}, {\underline{b}})\) must be a coordinate-wise increasing path. Fix a coordinate-wise increasing path \(\gamma \subset {\mathbb {R}}_+^2\) from \({\underline{a}}\) to \({\underline{b}}\). Then either

  1. (a)

    \(\gamma \cap \Delta _{\psi ,\phi } \ne \varnothing ,\) or

  2. (b)

    \(\gamma \cap \Delta _{\psi ,\phi } = \varnothing .\)

Fig. 6: The partitioning of \(\gamma \) used in Lemma 6.5 for the case \(\gamma \cap \Delta _{\psi ,\phi } \ne \varnothing \) in (a), and for the case \(\gamma \cap \Delta _{\psi ,\phi } = \varnothing \) in (b)

Consider first case (a). Write \({\underline{c}} \in \gamma \) for the first point in \(\gamma \) such that either \(c_1 = u_1\) or \(c_2 = u_2\) and \({\underline{d}} \in \gamma \) for the first point in \(\gamma \) such that either \(d_1 = v_1\) or \(d_2 = v_2\). Write \({\underline{l}}\) and \({\underline{r}}\) for the first and last point along \(\gamma \) where \(\gamma \) and \(\Delta _{\psi , \phi }\) intersect. Since \(\gamma \) is coordinate-wise increasing we have \({\underline{a}} \leqslant {\underline{c}} \leqslant {\underline{l}} \leqslant {\underline{r}} \leqslant {\underline{d}} \leqslant {\underline{b}}\). See Fig. 6a. We split the integral along \(\gamma \) into five parts:

$$\begin{aligned} w_{\psi ,\phi }(\gamma )= & {} w_{\psi ,\phi }({\underline{a}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{c}}) + w_{\psi ,\phi }({\underline{c}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{l}}) + w_{\psi ,\phi }({\underline{l}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{r}}) + w_{\psi ,\phi }({\underline{r}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{d}})\\&+ w_{\psi ,\phi }({\underline{d}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}). \end{aligned}$$

Using Lemma 6.3 we split the minimising integral from \({\underline{u}}\) to \({\underline{v}}\) into three parts:

$$\begin{aligned} W_{\psi , \phi }({\underline{u}}, {\underline{v}}) = w_{\psi ,\phi } \big ({\underline{u}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{l}} \big ) + w_{\psi ,\phi }\big ({\underline{l}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{r}}\big ) + w_{\psi ,\phi }\big ({\underline{r}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{v}}\big ). \end{aligned}$$

By Lemma 6.4,

$$\begin{aligned} w_{\psi , \phi }({\underline{a}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{c}}) \geqslant w_{\psi , \phi }({\underline{a}} \rightarrow (c_1, a_2) \rightarrow {\underline{c}}). \end{aligned}$$

By Lemma 6.2 and the fact that either

$$\begin{aligned} {\underline{c}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{l}} \succcurlyeq {\underline{c}} \rightarrow {\underline{u}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{l}} \succcurlyeq \Delta _{\psi ,\phi } \qquad \text { or } \qquad \Delta _{\psi ,\phi } \succcurlyeq {\underline{c}} \rightarrow {\underline{u}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{l}} \succcurlyeq {\underline{c}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{l}}, \end{aligned}$$

we have

$$\begin{aligned} w_{\psi , \phi }({\underline{c}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{l}}) \geqslant w_{\psi , \phi }({\underline{c}} \rightarrow {\underline{u}}) + w_{\psi , \phi }\big ({\underline{u}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{l}}\big ). \end{aligned}$$

By Lemma 6.1,

$$\begin{aligned} w_{\psi , \phi }({\underline{a}} \rightarrow (c_1, a_2) \rightarrow {\underline{c}}) + w_{\psi , \phi }({\underline{c}} \rightarrow {\underline{u}})= & {} w_{\psi , \phi }({\underline{a}} \rightarrow (c_1, a_2) \rightarrow {\underline{c}} \rightarrow {\underline{u}}) \\\geqslant & {} W_{\psi ,\phi }({\underline{a}}, {\underline{u}}). \end{aligned}$$

Moreover, by Lemma 6.3,

$$\begin{aligned} w_{\psi , \phi }({\underline{l}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{r}}) \geqslant w_{\psi , \phi }\big ({\underline{l}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{r}}\big ). \end{aligned}$$

By Lemma 6.2 and the fact that either

$$\begin{aligned} {\underline{r}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{d}} \succcurlyeq {\underline{r}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{v}} \rightarrow {\underline{d}} \succcurlyeq \Delta _{\psi ,\phi } \qquad \text { or } \qquad \Delta _{\psi ,\phi } \succcurlyeq {\underline{r}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}}{\underline{v}} \rightarrow {\underline{d}} \succcurlyeq {\underline{r}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{d}}, \end{aligned}$$

we have

$$\begin{aligned} w_{\psi , \phi }({\underline{r}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{d}}) \geqslant w_{\psi , \phi }\big ({\underline{r}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{v}} \big ) + w_{\psi , \phi }({\underline{v}} \rightarrow {\underline{d}}). \end{aligned}$$

And finally, since \({\underline{v}} \rightarrow {\underline{d}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}\) is a path from \({\underline{v}}\) to \({\underline{b}}\),

$$\begin{aligned} W_{\psi , \phi }({\underline{v}}, {\underline{b}}) \leqslant w_{\psi , \phi }({\underline{v}} \rightarrow {\underline{d}}) + w_{\psi , \phi }({\underline{d}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}). \end{aligned}$$

Combining the above inequalities we obtain

$$\begin{aligned} w_{\psi ,\phi }(\gamma )\geqslant & {} W_{\psi , \phi }({\underline{a}} , {\underline{u}}) + w_{\psi , \phi }\big ({\underline{u}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{l}}\big ) + w_{\psi , \phi }\big ({\underline{l}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{r}}\big )\nonumber \\&+ w_{\psi , \phi }\big ({\underline{r}} {\mathop {\longrightarrow }\limits ^{\Delta _{\psi ,\phi }}} {\underline{v}}\big ) + w_{\psi , \phi }({\underline{v}} \rightarrow {\underline{d}}) + W_{\psi , \phi }({\underline{v}}, {\underline{b}}) - w_{\psi , \phi }({\underline{v}} \rightarrow {\underline{d}})\nonumber \\\geqslant & {} W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + W_{\psi ,\phi }({\underline{u}}, {\underline{v}}) + W_{\psi ,\phi }({\underline{v}}, {\underline{b}}). \end{aligned}$$
(6.5)

Now we consider case (b), that \(\gamma \cap \Delta _{\psi ,\phi } = \varnothing \). Let \({\underline{f}}\) be the first point on \(\gamma \) such that \(f_1 = u_1\), and let \({\underline{g}}\) be the first point on \(\gamma \) such that \(g_1 = v_1\). See Fig. 6b. We divide the integral along \(\gamma \) into three parts:

$$\begin{aligned} w_{\psi ,\phi }(\gamma ) = w_{\psi , \phi }({\underline{a}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{f}})+ w_{\psi , \phi }({\underline{f}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{g}}) + w_{\psi , \phi }({\underline{g}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}). \end{aligned}$$

By Lemma 6.4,

$$\begin{aligned} \begin{aligned} w_{\psi ,\phi }({\underline{a}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{f}})&\geqslant w_{\psi ,\phi }({\underline{a}} \rightarrow (f_1,a_2) \rightarrow {\underline{f}}) \\&= w_{\psi ,\phi }({\underline{a}} \rightarrow (u_1, a_2 ) \rightarrow {\underline{u}}) + w_{\psi ,\phi }({\underline{u}} \rightarrow {\underline{f}})\\&= W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + w_{\psi ,\phi }({\underline{u}} \rightarrow {\underline{f}}). \end{aligned} \end{aligned}$$

Since \(\gamma \cap \Delta _{\psi ,\phi } = \varnothing \), either \(f_2 \geqslant u_2\) and \(g_2 \geqslant v_2\) or \(f_2 \leqslant u_2\) and \(g_2 \leqslant v_2\), so that either

$$\begin{aligned} {\underline{u}} \rightarrow {\underline{f}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{g}} \rightarrow {\underline{v}} \succcurlyeq \Delta _{\psi ,\phi } \qquad \text { or } \qquad \Delta _{\psi ,\phi } \succcurlyeq {\underline{u}} \rightarrow {\underline{f}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{g}} \rightarrow {\underline{v}} . \end{aligned}$$

It thus follows by Lemmas 6.2 and 6.3 that

$$\begin{aligned} w_{\psi ,\phi }({\underline{f}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{g}}) \geqslant W_{\psi ,\phi }({\underline{u}}, {\underline{v}}) - w_{\psi ,\phi }({\underline{u}} \rightarrow {\underline{f}}) - w_{\psi , \phi }({\underline{g}} \rightarrow {\underline{v}}). \end{aligned}$$

Finally, since \({\underline{v}} \rightarrow {\underline{g}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}\) is a path from \({\underline{v}}\) to \({\underline{b}}\),

$$\begin{aligned} W_{\psi ,\phi }({\underline{v}}, {\underline{b}}) \leqslant w_{\psi ,\phi }({\underline{v}} \rightarrow {\underline{g}}) + w_{\psi ,\phi }({\underline{g}} {\mathop {\rightarrow }\limits ^{\gamma }} {\underline{b}}). \end{aligned}$$

Combining the above inequalities we obtain

$$\begin{aligned} w_{\psi ,\phi }(\gamma )\geqslant & {} W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + w_{\psi ,\phi }({\underline{u}} \rightarrow {\underline{f}}) + W_{\psi ,\phi }({\underline{u}}, {\underline{v}}) - w_{\psi ,\phi }({\underline{u}} \rightarrow {\underline{f}})\nonumber \\&\quad - w_{\psi , \phi }({\underline{g}} \rightarrow {\underline{v}}) + W_{\psi ,\phi }({\underline{v}}, {\underline{b}}) - w_{\psi ,\phi }({\underline{v}} \rightarrow {\underline{g}})\nonumber \\= & {} W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + W_{\psi ,\phi }({\underline{u}}, {\underline{v}}) + W_{\psi ,\phi }({\underline{v}}, {\underline{b}}). \end{aligned}$$
(6.6)

From (6.5) and (6.6) we conclude that the lower bound \(w_{\psi ,\phi }(\gamma ) \geqslant W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + W_{\psi ,\phi }({\underline{u}}, {\underline{v}}) + W_{\psi ,\phi }({\underline{v}}, {\underline{b}})\) holds uniformly over all paths \(\gamma \); taking the infimum over \(\gamma \) therefore yields the same bound for \(W_{\psi ,\phi }({\underline{a}}, {\underline{b}})\), completing the proof. \(\square \)

The following lemma now states the crucial bound:

Lemma 6.6

Let \({\underline{u}}\) and \({\underline{v}}\) be as in (6.4), and let \({\underline{a}} \leqslant {\underline{u}}\), \(a_2 = o(1/p)\), and \({\underline{b}} \geqslant {\underline{v}}\). Then

$$\begin{aligned} W_{\psi ,\phi }\left( {\underline{a}}, {\underline{b}} \right) \geqslant \frac{1}{6p} \log ^2 \frac{1}{p} - (1+o(1))\frac{1}{3p} \log \left( \frac{8}{3 \mathrm {e}}\right) \log \frac{1}{p}. \end{aligned}$$

Proof

Since we assumed \({\underline{a}} \leqslant {\underline{u}}\) and \({\underline{b}} \geqslant {\underline{v}}\), Lemma 6.5 gives

$$\begin{aligned} W_{\psi ,\phi }({\underline{a}}, {\underline{b}}) \geqslant W_{\psi ,\phi }({\underline{a}}, {\underline{u}}) + W_{\psi ,\phi }({\underline{u}}, {\underline{v}}). \end{aligned}$$
(6.7)

Recall that we have set \(\xi = \lceil \log ^2 \frac{1}{p} \rceil \) and \(\delta _\xi = 1-2/\xi \). We use Lemma 6.4, that \(a_1 \leqslant u_1\), and that \(a_2 = o(1/p)\) to bound

$$\begin{aligned} W_{\psi ,\phi }({\underline{a}}, {\underline{u}})\geqslant & {} - \left( \log (8p^2 u_1 + 8p) +1/\xi \right) \left( u_2- a_2\right) \nonumber \\= & {} \frac{4}{p} \log \frac{1}{p} \log \log \frac{1}{p} - o\left( \frac{1}{p} \log \frac{1}{p}\right) . \end{aligned}$$
(6.8)

Now we bound \(W_{\psi ,\phi }({\underline{u}}, {\underline{v}})\). It follows by Lemma 6.3 that

$$\begin{aligned} W_{\psi ,\phi }\left( {\underline{u}}, {\underline{v}}\right)&= \int \limits _{\frac{4}{p}\log \log \frac{1}{p}}^{\frac{1}{3p} \log \frac{3}{p}} \big ( \psi (x(y)) + \phi ( y)\, x'(y) \big ) \, \mathrm {d}y \nonumber \\&\geqslant \int \limits _{\frac{4}{p}\log \log \frac{1}{p}}^{\frac{1}{3p} \log \frac{1}{p}} \left( - \log \left( \frac{8p}{3} \mathrm {e}^{3py} \right) - 1/\xi + \mathrm {e}^{-3py} \cdot \mathrm {e}^{3py}\right) \mathrm {d}y\nonumber \\&\geqslant \int \limits _{\frac{4}{p}\log \log \frac{1}{p}}^{\frac{1}{3p} \log \frac{1}{p}}\left( 1 - 3/\xi - \log \left( \frac{8p}{3}\right) - 3py \right) \mathrm {d}y. \end{aligned}$$
(6.9)

The integral on the right-hand side evaluates to

$$\begin{aligned}&\left( 1 - 3/\xi - \log \frac{8p}{3} \right) \Big [ \, y \, \Big ]_{\frac{4}{p}\log \log \frac{1}{p}}^{\frac{1}{3p} \log \frac{1}{p}} - \left[ \frac{3 p y^2}{2} \right] _{\frac{4}{p}\log \log \frac{1}{p}}^{\frac{1}{3p} \log \frac{1}{p}}\\&\quad = \frac{1}{6p} \log ^2 \frac{1}{p} - \frac{4}{p} \log \frac{1}{p} \log \log \frac{1}{p} - (1+o(1))\frac{1}{3p} \log \frac{8}{3 \mathrm {e}} \log \frac{1}{p}. \end{aligned}$$
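
For the reader's convenience, the leading behaviour can be checked directly: write \(y_1 = \frac{4}{p}\log \log \frac{1}{p}\) and \(y_2 = \frac{1}{3p} \log \frac{1}{p}\) for the limits of integration, and use \(-\log \frac{8p}{3} = \log \frac{1}{p} - \log \frac{8}{3}\). The two dominant contributions are

$$\begin{aligned} \Big ( \log \frac{1}{p} \Big ) y_2 - \frac{3p}{2}\, y_2^2 = \frac{1}{3p} \log ^2 \frac{1}{p} - \frac{1}{6p} \log ^2 \frac{1}{p} = \frac{1}{6p} \log ^2 \frac{1}{p}, \end{aligned}$$

while \(-\big (\log \frac{1}{p}\big ) y_1 = -\frac{4}{p} \log \frac{1}{p} \log \log \frac{1}{p}\) and \(\big (1 - \log \frac{8}{3}\big ) y_2 = -\frac{1}{3p} \log \frac{8}{3 \mathrm {e}} \log \frac{1}{p}\); all remaining terms are \(O\big (\frac{1}{p} (\log \log \frac{1}{p})^2\big ) = o\big (\frac{1}{p} \log \frac{1}{p}\big )\).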

(Observe that the first term in the first integral in (6.9) thus gives a complementary bound to (2.7), while the second term is complementary to (2.6).) It follows that

$$\begin{aligned} W_{\psi ,\phi }\left( {\underline{u}}, {\underline{v}} \right)\geqslant & {} \frac{1}{6p} \log ^2 \frac{1}{p} - \frac{4}{p} \log \frac{1}{p} \log \log \frac{1}{p} \nonumber \\&- (1+o(1))\frac{1}{3p} \log \frac{8}{3\mathrm {e}} \log \frac{1}{p} . \end{aligned}$$
(6.10)

Substituting (6.8) and (6.10) into (6.7) completes the proof. \(\square \)

Proof of Lemma 5.9

Given an increasing sequence of rectangles \((R_n)_{n=0}^N\), let \((x_n, y_n) \in {\mathbb {R}}_+^2\) denote the dimensions of \(R_n\). Construct the path \(\gamma \subset {\mathbb {R}}_+^2\) by linearly interpolating between successive points \((x_n, y_n)\), i.e.,

$$\begin{aligned} \gamma = (x_0, y_0) \rightarrow (x_1, y_1) \rightarrow \cdots \rightarrow (x_N, y_N). \end{aligned}$$

Recall the definition of \(U^p(R,R')\) from (4.3) and recall that \(\xi = \lceil \log ^2 \frac{1}{p} \rceil \) and \(\delta _\xi = 1 - 2/\xi \). It follows from Lemma 6.1 that

$$\begin{aligned} \sum _{n=0}^{N-1} U^p (R_n, R_{n+1}) \geqslant \delta _\xi W_{\psi , \phi }\left( (x_0, y_0), (x_{N}, y_{N})\right) , \end{aligned}$$

and it follows from Lemma 6.6 that the right-hand side is bounded from below by

$$\begin{aligned} \frac{1}{6p} \log ^2 \frac{1}{p} - (1+o(1))\frac{1}{3p} \log \left( \frac{8}{3 \mathrm {e}}\right) \log \frac{1}{p}, \end{aligned}$$

completing the proof. \(\square \)

7 The critical probability: proof of Theorem 1.1

We start with the upper bound. From [23] we know that if \(L \ll \mathrm {e}^{p^{-1+\varepsilon }}\) for any \(\varepsilon >0\), then \({\mathbb {P}}_p\left( [L]^2 \text { is IF}\right) = o(1)\), so we assume that \(L \geqslant \mathrm {e}^{p^{-1+\varepsilon }}\). Let \(m=p^{-5}\). The probability that \([L]^2\) is internally filled is bounded from below by the probability that \([L]^2\) contains exactly one internally filled translate of \([m]^2\), and that \(\{[m]^2 \Rightarrow [L]^2\}\) occurs. Indeed, let \(R_{\scriptscriptstyle (i,j)} = [(i-1)m+1, im] \times [(j-1)m+1, jm]\), and let

$$\begin{aligned} A_{(i,j)} \,{:=} \,\{R_{\scriptscriptstyle (i,j)} \text { is IF}\} \cap \{\forall (i',j') \ne (i,j): R_{\scriptscriptstyle (i',j')} \text { is not IF}\} \cap \{R_{\scriptscriptstyle (i,j)} \Rightarrow [L]^2\}. \end{aligned}$$

Then

$$\begin{aligned} \{[L]^2 \text { is IF}\} \supseteq \bigsqcup _{i,j = 1}^{\lfloor L / m\rfloor } A_{\scriptscriptstyle (i,j)}, \end{aligned}$$

so that

$$\begin{aligned} {\mathbb {P}}_p([L]^2 \text { is IF})\geqslant & {} \sum _{i,j=1}^{\lfloor L/m\rfloor } \bigg ( {\mathbb {P}}_p(R_{\scriptscriptstyle (i,j)}\text { is IF}) - \big (1-{\mathbb {P}}_p(R_{\scriptscriptstyle (i,j)} \Rightarrow [L]^2)\big ) \nonumber \\&- \sum _{\begin{array}{c} i',j'=1:\\ (i',j') \ne (i,j) \end{array}}^{\lfloor L / m\rfloor } {\mathbb {P}}_p(R_{\scriptscriptstyle (i,j)} \text { and } R_{\scriptscriptstyle (i',j')} \text { are IF})\bigg ). \end{aligned}$$
(7.1)

To bound \({\mathbb {P}}_p(R_{\scriptscriptstyle (i,j)} \Rightarrow [L]^2)\), observe that if every horizontal and vertical line segment of length \(p^{-5}\) intersecting \([L]^2\) contains a pair of adjacent infected sites, then it must be the case that \(\{R_{\scriptscriptstyle (i,j)} \Rightarrow [L]^2\}\) occurs. We can bound the probability of this event from below by

$$\begin{aligned} (1-(1-p^2)^{\frac{m}{2}})^{4 L^2} = \exp \big (-4 L^2 \mathrm {e}^{-\frac{1}{2 p^{3}}(1+o(1))}\big ) \geqslant \exp ( -c \mathrm {e}^{-c' p^{-3}}), \end{aligned}$$

for some \(c, c'>0\) and p sufficiently small. Here we used that \((1-p^2)^{\frac{m}{2}} = \exp \big ( \frac{m}{2} \log (1-p^2) \big ) = \mathrm {e}^{-\frac{1}{2p^{3}}(1+o(1))}\) since \(m = p^{-5}\), and that \(1-x \geqslant \mathrm {e}^{-2x}\) for \(x \in [0, \frac{1}{2}]\). Note that by Proposition 2.1, the right-hand side is \(1-o({\mathbb {P}}_p([m]^2 \text { is IF}))\). Inserting this bound into (7.1) and summing over the indices, we obtain

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_p([L]^2 \text { is IF})&\geqslant \frac{L^2}{m^2} \bigg ({\mathbb {P}}_p([m]^2 \text { is IF}) - \big (1-\exp ( -c \mathrm {e}^{-c' p^{-3}})\big ) - \frac{L^2}{m^2} {\mathbb {P}}_p([m]^2 \text { is IF})^2 \bigg )\\&\geqslant \frac{L^2}{2 m^2} {\mathbb {P}}_p([m]^2 \text { is IF})\Big (1 - \frac{2 L^2}{m^2} {\mathbb {P}}_p([m]^2 \text { is IF})\Big ), \end{aligned} \end{aligned}$$

where the second inequality holds for p sufficiently small. Taking minus the logarithm on both sides and applying the above bound and Proposition 2.1 again we obtain the inequality

$$\begin{aligned}&-\log ({\mathbb {P}}_p([L]^2 \text { is IF}))\\&\quad \leqslant 2C_1 \frac{1}{p} \log ^2 \frac{1}{p} - 2C_2 \frac{1}{p} \log \frac{1}{p} - 2 \log L + 10 \log \frac{1}{p} + \log 2 \\&\qquad - \log \bigg (1- \frac{2L^2}{p^{10}} \exp \Big (-2C_1 \frac{1}{p} \log ^2 \frac{1}{p} + (2C_2-o(1)) \frac{1}{p} \log \frac{1}{p}\Big )\bigg ). \end{aligned}$$

Observe that if the right-hand side tends to 0 from above when we let \(p \rightarrow 0\), then \({\mathbb {P}}_p\left( [L]^2 \text { is IF}\right) \rightarrow 1\). To make the right-hand side vanish we will fix L to be equal to \(\Lambda \), where

$$\begin{aligned} \Lambda = \Lambda (p) \,{:=} \,\exp \left( \frac{C_1}{p} \log ^2 \frac{1}{p} - \frac{C_2 -\eta _u}{p} \log \frac{1}{p} \right) , \end{aligned}$$

where \(\eta _u \,{:=} \, p (10 + \log 2 \log ^{-1} \frac{1}{p} ) = o(1)\). Indeed, with this choice all the leading order terms cancel, and we obtain

$$\begin{aligned} -\log ({\mathbb {P}}_p([L]^2 \text { is IF})) \leqslant -\log \bigg (1- 4 \exp \Big (-o\Big (\frac{1}{p} \log \frac{1}{p} \Big )\Big )\bigg ) \rightarrow 0 \text { as } p \rightarrow 0, \end{aligned}$$

as desired.

Now we invert \(\Lambda (p)\) to find \(p_u = p_u (L)\), an asymptotically minimal sequence in p such that

$$\begin{aligned} {\mathbb {P}}_{p_u(L)} ([L]^2 \text { is IF}) \rightarrow 1 \qquad \text { as } \qquad L \rightarrow \infty . \end{aligned}$$

We can express \(p_u\) and \(1/p_u\) in terms of \(\Lambda \):

$$\begin{aligned} p_u = \frac{C_1\log ^2 \frac{1}{p_u} - (C_2-\eta _{u}) \log \frac{1}{p_u}}{ \log \Lambda }, \end{aligned}$$
(7.2)

and

$$\begin{aligned} \frac{1}{p_u} = \frac{ \log \Lambda }{C_1 \log ^2 \frac{1}{p_u} - (C_2- \eta _{u}) \log \frac{1}{p_u}}. \end{aligned}$$
(7.3)

Substituting (7.3) into (7.2), we get

$$\begin{aligned} p_u= & {} \frac{1}{\log \Lambda } \Biggl (C_1 \left( \left( \log \log \Lambda \right) - \log \left( C_1 \log ^2 \frac{1}{p_u} - (C_2-\eta _{u}) \log \frac{1}{p_u} \right) \right) ^2 \\&- (C_2-\eta _{u}) \left( \left( \log \log \Lambda \right) - \log \left( C_1 \log ^2 \frac{1}{p_u} - (C_2-\eta _{u}) \log \frac{1}{p_u} \right) \right) \Biggr ). \end{aligned}$$
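
Here, for small \(p_u\), the inner logarithm admits the routine expansion

$$\begin{aligned} \log \left( C_1 \log ^2 \frac{1}{p_u} - (C_2-\eta _{u}) \log \frac{1}{p_u} \right) = \log C_1 + 2 \log \log \frac{1}{p_u} + o(1), \end{aligned}$$

and \(\log \log \frac{1}{p_u} = (1+o(1)) \log \log \log \Lambda \).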

For sufficiently large \(\Lambda \) (and hence small \(p_u\)) and any fixed \(\delta >0\), the following inequalities hold:

$$\begin{aligned} \frac{1}{p_u} \leqslant \log \Lambda \leqslant \frac{1}{p_u^{1 + \delta }}, \qquad \text { and } \qquad \log \frac{1}{p_u} \leqslant \log \log \Lambda \leqslant (1+ \delta ) \log \frac{1}{p_u}. \end{aligned}$$
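
Both pairs of inequalities follow from (7.3): taking logarithms once more there gives

$$\begin{aligned} \log \log \Lambda = \log \frac{1}{p_u} + \log \left( C_1 \log ^2 \frac{1}{p_u} - (C_2-\eta _{u}) \log \frac{1}{p_u} \right) = (1+o(1)) \log \frac{1}{p_u}. \end{aligned}$$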

Whence we obtain the asymptotic formula

$$\begin{aligned} p_u= & {} \frac{C_1 (\log \log \Lambda )^2}{\log \Lambda } - \frac{4 C_1 \log \log \Lambda \log \log \log \Lambda }{\log \Lambda } \\&+ \frac{\left( C_2 + 2 C_1 \log C_1 + \delta \right) \log \log \Lambda }{\log \Lambda }, \end{aligned}$$

for \(\delta >0\), giving the upper bound in Theorem 1.1.

Now we prove the lower bound. Again let \(m=p^{-5}\), let \(n = \frac{1}{p} \log \frac{1}{p}\), and let \(L \gg m,n\). Let \({\mathcal {R}}\) denote the set of all rectangles \(R \subset [L]^2\) with dimensions \((x, y)\) such that \(m/3 \leqslant x \leqslant m\) and \(n/3 \leqslant y \leqslant n\). It is a straightforward consequence of the proof of [23, Lemma 3.7] that if \([L]^2\) is internally filled, then there must exist a rectangle \(R \in {\mathcal {R}}\) such that \(\{R \Rightarrow [L]^2\}\) occurs. The number of rectangles in \({\mathcal {R}}\) is bounded by \( mn L^2\), so by Proposition 5.1,

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_p\left( [L]^2 \text { is IF}\right)&= {\mathbb {P}}_p\Big (\bigcup _{R \in {\mathcal {R}}} \left\{ R \text { is IF} \right\} \cap \left\{ R \Rightarrow [L]^2 \right\} \Big )\\&\leqslant m n L^2 \exp \Big (-2C_1\frac{1}{p} \log ^2 \frac{1}{p} + (2C_2 + \zeta )\frac{1}{p} \log \frac{1}{p} \Big ) \end{aligned} \end{aligned}$$

for any \(\zeta > 0\). Taking the logarithm of both sides gives

$$\begin{aligned} \log {\mathbb {P}}_p\left( [L]^2 \text { is IF}\right) \leqslant 2 \log L - 2C_1 \frac{1}{p} \log ^2 \frac{1}{p} + 2(C_2 + \eta _\ell ) \frac{1}{p}\log \frac{1}{p}, \end{aligned}$$

where \(\eta _\ell \, {:=} \,p \log (mn) + \zeta \). Observe that if the upper bound tends to \(- \infty \) as \(p \rightarrow 0\), then \({\mathbb {P}}_p\left( [L]^2 \text { is IF}\right) \rightarrow 0\). To minimise the right-hand side, we will fix L to be equal to \(\lambda \), where

$$\begin{aligned} \lambda = \lambda (p) \,{:=} \,\exp \left( \frac{C_1}{p} \log ^2 \frac{1}{p} - \frac{C_2 +\eta _\ell }{p} \log \frac{1}{p} \right) . \end{aligned}$$

Again, we invert \(\lambda (p)\), now to find \(p_\ell = p_\ell (L)\), an asymptotically maximal sequence in p such that

$$\begin{aligned} {\mathbb {P}}_{p_\ell (L)} ([L]^2 \text { is IF}) \rightarrow 0 \qquad \text { as } \qquad L \rightarrow \infty . \end{aligned}$$

Using the same steps as we used in the proof of the upper bound, we can now determine that \(p_\ell \) satisfies

$$\begin{aligned} p_\ell= & {} \frac{C_1 (\log \log \lambda )^2}{\log \lambda } - \frac{4 C_1\log \log \lambda \log \log \log \lambda }{ \log \lambda } \\&+ \frac{\left( C_2 + 2 C_1 \log C_1 - \delta \right) \log \log \lambda }{\log \lambda } \end{aligned}$$

for some \(\delta >0\) that can be chosen arbitrarily small but depends on \(\zeta \). This concludes the proof of Theorem 1.1. \(\square \)