1 Introduction

In this paper, we apply the viewpoints of stochastic topology and topological and geometric data analysis to a discrete geometric model from probability theory: the d-dimensional Eden cell growth model (EGM). The 2-dimensional EGM was first introduced and simulated by Murray Eden [17, 18] as a model for the growth of colonies of non-motile bacteria on flat surfaces [19]. It is defined on \(\mathbb {R}^2\), using the regular square tessellation of the plane, as follows. Start at time one with one square tile at the origin. At each time step, add a new square tile selected uniformly from among all tiles adjacent to the structure but not yet contained in it (this set of tiles is called the site perimeter). This process produces a shape that is well approximated by a convex set but has interesting geometry at the boundary—see Fig. 2. Here we study the natural higher-dimensional generalization of the EGM to the regular cubical lattice in \(\mathbb {R}^d\).

In the probability literature, the EGM is studied as an example of first-passage percolation [3, Chap. 6], a process which models the spread of a fluid or an infection in a nonhomogeneous medium. This literature mainly focuses on the large-scale structure and statistics of this process, about which a fair amount is known; one of the most important results is the Cox–Durrett shape theorem [12], which shows that under mild assumptions, growth is generally ball-like, rather than fractal as one might initially expect. That is, over time, the shape of the resulting structure looks more and more like a rescaling of a certain convex set which depends only on the model parameters, and so, in the case of the Eden model, only on the dimension.

The shape theorem restricts all nondeterministic behavior to a collar near the boundary of this convex set which is vanishingly small compared to the whole structure, but whose thickness measured in tiles tends to infinity. As far as we know, not much previous attention has been dedicated to the local geometry in this region for any first-passage percolation model. This local geometry naturally includes the topology: although holes of arbitrarily large size can (and, with probability 1, will at some point) appear at the boundary of the Eden model, with high probability those holes and other nontrivial cycles get smaller and smaller in comparison to the overall shape. Moreover, most of the nontrivial topology occurs at the smallest scales; that is, most of the homology is generated by very small cycles. In short, by exploring the topology of the Eden model, we quantify small-scale perturbations of the boundary.

Fig. 1

(a) A simulation of the two-dimensional Eden model with 100,000 tiles shown in light gray and (b) the lichen Phlyctis argena. Many natural processes result in “Eden-like” growth.

Fig. 2

Simulation of the 2D and 3D Eden growth models up to time 10,000 and 30,000 respectively. We zoom in to a portion of the boundary of each model to showcase some local topology generated by one-dimensional holes. The 3D example is available for online interactive exploration at the webpage https://skfb.ly/6SnT9

Stochastic growth models have been applied to study the temporal and spatial dynamics of a wide range of processes including the growth of bacterial cell colonies [41] and tumors [42] in biology, the spread of diseases in epidemiology [36], gelation and crystallization in materials science and physics [20], and urban growth [40] in the social sciences. The Eden model is an example of such a model that is simple enough to study analytically yet complex enough to capture important scaling behavior. Its surface is a prototypical model for the growth of interfaces and rough surfaces [5, 21, 31]. For a modified version of the Eden model on a flat substrate, which restricts the growth of the model to a half-infinite rectangle, this surface is believed to fall into the Kardar–Parisi–Zhang universality class, a large class of random interfaces characterized by the scaling behavior of the height function [23]; see also [10, 21]. Other systems believed to fall into this class include ballistic deposition and anisotropy-corrected versions of the Eden model in \(\mathbb {Z}^d\) [2], while statistics consistent with it have been observed in experiments of paper wetting [27] and turbulent liquid crystals [39].

The EGM itself has specifically been proposed as a model for wound regeneration [1] and the growth of bacterial colonies [41]. Similar systems with additional parameters or modified rules for the addition (or subtraction) of tiles have been proposed to model a wide variety of phenomena, such as the magnetic Eden model for aggregations of particles with a fixed spin in a medium [4, 8, 25]; cellular automata [14]; tumor growth [42]; and urban growth [26, 40], among others.

In applications of stochastic growth models, the geometry of the perimeter strongly influences the interaction with the ambient environment. For example, in materials science the roughness and porosity are important in a wide variety of contexts, and in marine biology the shape of a coral colony is related to resource acquisition [28]. This interaction might depend on the local topology of the structure. For example, for a three-dimensional aggregation, the two-dimensional homology corresponds to voids that do not have a connection with the external space; cells on the surface of these voids do not have the same access to external resources. The one-dimensional homology concentrated on the surface of an aggregation provides a measure of the complexity of that surface; the cells forming a 1-dimensional homology class could be thought of as a filter in the sense that the medium can flow through them.

2 Main Results

Our main results concern the rate of growth of the i-dimensional homology groups of the Eden growth model. Let A(t) be the d-dimensional Eden model at time t, for \(d\ge 2\), and let \(\beta _i(t)\) denote the rank of the i-dimensional homology (the ith Betti number) of A(t). Roughly speaking, \(\beta _i(t)\) measures the number of “i-dimensional holes” in A(t). For example, if \(d=3\), \(\beta _1(t)\) gives the number of tubes through A(t) (for a solid donut, this is one) and \(\beta _2(t)\) gives the number of voids of A(t), or bounded components of the complement of A(t) (for a sphere, this is one). See Sect. 3.1 for a technical definition.

The first result relates the growth of \(\beta _i(t)\) with that of the site perimeter of A(t), the set of tiles adjacent to but not contained in A(t). Write \(P_d(t)\) for the volume (number of d-dimensional faces) of the site perimeter. Also, recall that a statement is true “with high probability” (alternatively “asymptotically almost surely”) with respect to t if, for all \(\epsilon >0\), it occurs with probability greater than \(1-\epsilon \) for all sufficiently large t.

Theorem 2.1

For each d and \(1 \le i \le d-1\), there are constants \(c=c(d,i)>0\) and \(\hat{c}=\hat{c}(d,i)\) such that

$$\begin{aligned} ct^{(d-1)/d}&\le \beta _i(t) \le \hat{c}P_d(t),&\qquad i&\le d-2, \end{aligned}$$
(1)
$$\begin{aligned} cP_d(t)&\le \beta _i(t) \le P_d(t),&\qquad i&= d-1, \end{aligned}$$
(2)

with high probability as \(t\rightarrow \infty \).

Below, we show that \(\hat{c}(d,i)\le 2^{d-i}\left( {\begin{array}{c}d\\ i\end{array}}\right) \), but based on computational experiments we suspect that \(\hat{c}(d,i)\) is decreasing in i, and in particular always bounded above by 1, since \(\hat{c}(d,d-1)\le 1\). See Fig. 6. Equation (2) tells us that the rank of the top-dimensional homology, the number of “voids”, scales with the volume of the perimeter.

Heuristics used in the physics literature [29] suggest that the volume \(P_d(t)\) of the site perimeter of the d-dimensional EGM scales as \(t^{(d-1)/d}\). This has been proven to be true “on average” and “most of the time” by Damron et al. [13], but the stronger conjecture that it is true with high probability is wide open and presents significant difficulties. Assuming this conjecture, our theorem shows that the ranks of all homology groups scale with the volume of the perimeter, up to a constant factor. This makes sense on an intuitive level, as any connected local configuration, including those that create topology locally, should occur with some non-zero probability anywhere on the boundary. Note that one of the main results of the same paper by Damron et al. is a proof that \(P_d(t)\) differs from \(t^{(d-1)/d}\) by a factor of at most \((\log t)^C\) for some constant C, assuming another longstanding conjecture called the uniform curvature condition [13, 34]. As such, the above theorem yields an analogous conditional result for the Betti numbers.

The lower bound of Theorem 2.1 is a corollary of a more general result (Theorem 4.1 below): every local configuration (in a cube of side length R) of filled and empty tiles occurs, with high probability, at least \(c(R,d)t^{(d-1)/d}\) times at the boundary of the model at time t. Thus, for example, cycles of arbitrarily large size, while they are rarer the bigger they are, still occur arbitrarily many times as t increases.

The results of our computational experiments (Sect. 6) suggest a stronger conjecture about the growth rate:

Conjecture 2.2

There exists a \(C_{i,d}>0\) such that

$$\begin{aligned} \frac{\beta _{i}(t)}{t^{({d-1})/{d}}}\rightarrow C_{i,d} \end{aligned}$$
(3)

almost surely as \(t\rightarrow \infty \).

The constants suggested by our experiments are \(C_{1,2}\approx 1.1\) and \(C_{1,3}\approx 0.419\). While we conducted experiments for higher-dimensional homology and higher-dimensional Eden models, we do not have sufficient evidence to provide reasonable guesses for the other constants.

We have also investigated how the rank of the homology can change in one step, proving another theorem:

Theorem 2.3

If \(\beta _i(t)\) is the ith Betti number of the d-dimensional EGM stochastic process at time t, then for all t

$$\begin{aligned} -2^{d-1-i}{d-1 \atopwithdelims ()i} \le \beta _i(t) - \beta _i(t-1) \le 2^{d-i}{d-1 \atopwithdelims ()i-1}, \end{aligned}$$
(4)

and all the values, including the extremal values, are attained with positive probability for each \(t \ge 3 \cdot 5^{d-1}\).
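As a quick illustration of how the bounds in (4) behave in low dimensions, the following sketch simply evaluates the two expressions. The function name `betti_jump_bounds` is ours and is introduced only for illustration.

```python
from math import comb


def betti_jump_bounds(d: int, i: int) -> tuple[int, int]:
    """Evaluate the lower and upper bounds in (4) on beta_i(t) - beta_i(t-1)."""
    if not 1 <= i <= d - 1:
        raise ValueError("the bounds are stated for 1 <= i <= d - 1")
    lower = -(2 ** (d - 1 - i)) * comb(d - 1, i)
    upper = 2 ** (d - i) * comb(d - 1, i - 1)
    return lower, upper


# Tabulate the bounds for the dimensions most accessible to simulation.
for d in range(2, 5):
    for i in range(1, d):
        print(f"d={d}, i={i}: {betti_jump_bounds(d, i)}")
```

For instance, in the plane a single new tile can fill at most one hole and create at most two.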

We can strengthen the previous result to find a probabilistic bound on the number of times that the change in the Betti number equals a given value. For \(\delta \in \mathbb {Z}\), let \(X(\delta ,t,i)\) denote the number of times \(s\le t\) such that \(\beta _i(s) - \beta _i(s-1)=\delta \).

Theorem 2.4

There exists a constant \(c=c(d)>0\) depending only on d such that for all integers

$$\begin{aligned} \delta \in \left[ -2^{d-1-i}{d-1 \atopwithdelims ()i}, 2^{d-i}{d-1 \atopwithdelims ()i-1}\right] , \end{aligned}$$

\(\mathbb {P}(X(\delta ,t,i)\ge ct)\rightarrow 1\) as \(t\rightarrow \infty \).

Assuming the conjecture on the growth of the perimeter stated above, we can make the probabilistic statement of this theorem more uniform in t:

Theorem 2.5

Assume that there is a \(C(d)>0\) such that \(P_d(t)\le C(d)t^{(d-1)/d}\) with high probability. Then, for \(t\gg 0\) and for each \(-2^{d-1-i}{d-1 \atopwithdelims ()i} \le \ell \le 2^{d-i}{d-1 \atopwithdelims ()i-1}\),

$$\begin{aligned} \mathbb {P}({\beta _i(t) - \beta _i(t-1)=\ell })\ge c(d,\ell ) \end{aligned}$$

for some constant \(c(d,\ell )>0\).

Note that this differs from the previous theorem in that it provides a lower bound on the probability of a change occurring at a specific timestep, rather than a bound on the time-averaged number of occurrences; this would, for example, rule out the naive hypothesis that certain changes can only occur at even timesteps. We find that it suffices to set \(c(d,\ell )\sim 1/{\exp (\exp (d))}\) in both these results, and we believe that this is close to optimal for the rarest cases. Thus even in the 4-dimensional EGM, one cannot expect every possibility to show up in the course of a reasonable-length simulation, as we indeed see in our computational experiments. Proofs of Theorems 2.3, 2.4, and 2.5 are included in Sect. 5.

Again, our computational experiments suggest stronger regularity properties for the distribution of these jumps:

Conjecture 2.6

For every \(-2^{d-1-i}{d-1 \atopwithdelims ()i} \le \ell \le 2^{d-i}{d-1 \atopwithdelims ()i-1}\),

$$\begin{aligned} \mathbb {P}({\beta _i(t) - \beta _i(t-1)=\ell }) \end{aligned}$$

converges to a positive constant \(c(d,\ell )\) as \(t \rightarrow \infty \).

In Sect. 6, we present the results of our computational experiments for the Eden model. First, we consider the rates of growth of the perimeter (Sect. 6.1) and the Betti numbers (Sect. 6.2), and compare the behavior of \(\beta _i(t)\) for different values of i. Next, we apply persistent homology in Sect. 6.3 to study the amount of time between when an i-dimensional hole first appears in the Eden model and when it is killed by the addition of tiles. Finally, in Sect. 6.4 we consider the distributions of the volumes and shapes of the \((d-1)\)-dimensional holes in the Eden model, and how these holes divide as time progresses. The software and data developed in the course of this research are publicly available on GitHub [30].

3 Definitions and Preliminaries

To formally define the Eden model and its homology, we think of the regular cubical tiling as endowing \(\mathbb {R}^d\) with the structure of an infinite cubical complex whose vertices are \(\mathbb {Z}^d\) and whose d-cells are translates of \([0,1]^d\). We call this cubical complex \({{\,\textrm{CW}\,}}(\mathbb {Z}^d)\). A d-dimensional polycube is a union of d-cells of this structure (a pure d-dimensional subcomplex) which is strongly connected, that is, its interior is connected; in other words, the interiors of any two d-cells are connected via a path which is disjoint from the \((d-2)\)-skeleton (cf. the definition of a pseudomanifold). In the combinatorics literature, these are known as polyominoes in two dimensions and polycubes in higher dimensions.

Given a polycube A, its i-skeleton \(A^i\) is the union of all j-cells in A for \(j\le i\), forming a filtration

$$\begin{aligned} A^0 \subset A^1 \subset \ldots \subset A^d=A. \end{aligned}$$

The site perimeter of a polycube A is the set of d-cells of \({{\,\textrm{CW}\,}}(\mathbb {Z}^d)\) that are not in A but have \((d-1)\)-cells in common with A; in other words, d-cells Q such that \(A \cup Q\) is again a polycube. This contrasts with the boundary of the polycube, which is a \((d-1)\)-dimensional complex defined using the usual topological notion \(\partial A=\overline{A} \cap \overline{A^c}\).

Fig. 3

Two polyominoes with the site perimeter highlighted. Each has one 1-dimensional hole, i.e., \(\beta _1 = 1\).

The Eden cell growth model is a stochastic process which produces a polycube A(t). It starts at time 1 with one d-cube at the origin, and at each time step, \(A(t+1)=A(t)\cup Q_{t+1}\) where \(Q_{t+1}\) is a d-cube chosen uniformly at random from the site perimeter.
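The definition above translates directly into a short simulation. The following is a minimal sketch, not the software referred to in this paper: d-cubes are represented by the integer coordinates of their lower corners, and the names `neighbors` and `eden` are hypothetical.

```python
import random


def neighbors(cell):
    """Yield the 2d cells that share a (d-1)-face with `cell`."""
    for axis in range(len(cell)):
        for step in (-1, 1):
            yield cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]


def eden(d, t, seed=None):
    """Return (A, P): the set of cells of A(t) and its site perimeter."""
    rng = random.Random(seed)
    origin = (0,) * d
    filled = {origin}
    perimeter = list(neighbors(origin))      # list allows O(1) uniform sampling
    position = {c: k for k, c in enumerate(perimeter)}
    while len(filled) < t:
        k = rng.randrange(len(perimeter))    # uniform choice on the site perimeter
        cell = perimeter[k]
        last = perimeter.pop()               # swap-remove the chosen cell
        if last != cell:
            perimeter[k] = last
            position[last] = k
        del position[cell]
        filled.add(cell)
        for nb in neighbors(cell):           # newly exposed cells join the perimeter
            if nb not in filled and nb not in position:
                position[nb] = len(perimeter)
                perimeter.append(nb)
    return filled, set(perimeter)


A, P = eden(d=2, t=1000, seed=0)
print(len(A), len(P))
```

Keeping the perimeter in a list with a position index makes each step take time proportional to d rather than to the size of the perimeter.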

The Eden model is often equivalently defined with the cubes replaced by vertices of the lattice \(\mathbb {Z}^d\), thought of as a graph with neighboring vertices linked along each axial direction. At each time step, a single unfilled vertex along the site perimeter is filled. In this formulation, the site perimeter consists of unfilled vertices which share an edge with a filled vertex, and the boundary consists of edges between filled and unfilled vertices (the boundary of the set of filled vertices in the sense of graphs). Our definition in terms of cubes is needed to define the homology of the Eden model; we take note of this equivalent formulation because it is the usual way of formalizing first-passage percolation, as we describe below.

3.1 Homology

The homology groups of a space are a sequence of abelian groups representing the “i-dimensional holes” of the complex. For example, a solid donut has a single 1-dimensional hole, while a 2-sphere has a single 2-dimensional void; these correspond to the ranks of the homology groups \(H_1\) and \(H_2\), respectively. A “0-dimensional hole” is a disconnection, and the rank of \(H_0\) is the number of connected components of the space. Homology groups of cubical complexes are most easily defined combinatorially, but are topological invariants. The reader is referred to an algebraic topology textbook such as [22] for more information.

In this paper we use homology with coefficients in the field \(\mathbb F_2=\{0,1\}\); we suppress this in our notation. Given a cubical complex A, let \(C_i(A)\) be the vector space of i-chains, that is, formal \(\mathbb F_2\)-linear combinations of i-cells. The boundary homomorphism \(\partial _i:C_i(A) \rightarrow C_{i-1}(A)\) sends each cell to the formal sum of the \((i-1)\)-cells on its boundary. Then the ith homology is the vector space of i-cycles, which have zero boundary, modulo the i-dimensional boundaries of \((i+1)\)-chains:

$$\begin{aligned} H_i(A)=\ker (\partial _i)/\partial _{i+1}(C_{i+1}(A)). \end{aligned}$$

The ith Betti number \(\beta _i(A)\) is the dimension of \(H_i(A)\). Thus \(\beta _0(A)\) is the number of connected components—always 1 for a polycube. Moreover, for a d-dimensional polycube, \(\beta _i(A)=0\) for all \(i \ge d\). This is obvious for \(i>d\) and true for all subsets of \(\mathbb {R}^d\) in the case \(i=d\). This leaves the cases \(1 \le i \le d-1\) as the interesting ones to measure for the Eden model.
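To make the combinatorial definition concrete, the following sketch computes the Betti numbers of a two-dimensional cubical complex over \(\mathbb F_2\) from the ranks of the boundary maps. It is an illustration under our own conventions rather than the method used in our experiments: squares are indexed by their lower-left corners, chains are stored as integer bitmasks, and the names `rank_f2` and `betti_numbers_2d` are hypothetical.

```python
def rank_f2(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks (XOR elimination)."""
    pivots = {}  # leading bit -> reduced vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def square_edges(x, y):
    """The four edges of the unit square with lower-left corner (x, y)."""
    return [("h", x, y), ("h", x, y + 1), ("v", x, y), ("v", x + 1, y)]


def edge_vertices(kind, x, y):
    """The two endpoints of a horizontal ('h') or vertical ('v') unit edge."""
    end = (x + 1, y) if kind == "h" else (x, y + 1)
    return [(x, y), end]


def betti_numbers_2d(squares):
    """Return (beta_0, beta_1, beta_2) of the union of the given closed squares."""
    squares = sorted(set(squares))
    edges = sorted({e for s in squares for e in square_edges(*s)})
    vertices = sorted({v for e in edges for v in edge_vertices(*e)})
    e_id = {e: k for k, e in enumerate(edges)}
    v_id = {v: k for k, v in enumerate(vertices)}
    # Each boundary is the formal F_2-sum of the faces of the cell.
    d1 = [sum(1 << v_id[v] for v in edge_vertices(*e)) for e in edges]
    d2 = [sum(1 << e_id[e] for e in square_edges(*s)) for s in squares]
    r1, r2 = rank_f2(d1), rank_f2(d2)
    # beta_i = dim ker(boundary_i) - rank(boundary_{i+1})
    return (len(vertices) - r1, len(edges) - r1 - r2, len(squares) - r2)


ring = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
print(betti_numbers_2d(ring))
```

The example is a ring of eight squares around a single missing tile, which has one connected component and one 1-dimensional hole.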

3.2 First-Passage Percolation and the Eden Model

First-passage percolation (FPP) is a well-studied family of stochastic processes on the lattice \(\mathbb {Z}^d\), thought of as a graph; see [3] for an extensive survey. Here we describe how the Eden model can be thought of as a special case of FPP, which will be useful in several of our proofs.

We first define two types of stochastic processes. In bond FPP, the lattice is given by a graph metric with edge lengths pulled i.i.d. from some probability distribution, and the process of interest is the growth of the t-ball around the origin in this metric. Site FPP is similar but a bit harder to define; here every vertex of the graph (called a site) is assigned an i.i.d. number called a passage time. The passage time of a site p governs the time from when a site adjacent to p first gets “infected” to when p gets infected. We again start with the origin infected at time 0 and study the set of infected sites at time t.

Now consider site FPP where the passage times are distributed exponentially with mean 1. The exponential distribution is important because it is “memoryless” in the sense that

$$\begin{aligned} P(X>t+s \mid X>s)=P(X>t). \end{aligned}$$

Thus, conditioning on the event that the ball at time t is a polycube A, the additional time required to add a specific adjacent site is again exponential with mean 1, and is independent from when other adjacent sites are added and from the passage times of non-adjacent unfilled sites. In particular, every site in the perimeter has the same probability of being infected next. But this is exactly how the Eden model works, except that in the Eden model the time to add the next tile is fixed. Consequently, the Eden model can be thought of as a (variable) time rescaling of this FPP model. This was first observed by Richardson [37].
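The correspondence can be illustrated by simulating site FPP directly. In the sketch below (an assumption-laden illustration; the name `site_fpp` is hypothetical), each site receives an exponential passage time with mean 1 at the moment one of its neighbors is first infected, and a priority queue processes infections in order of time. Forgetting the times and keeping only the order of infection gives a growth sequence of the kind described above.

```python
import heapq
import random


def site_fpp(d, n_sites, seed=None):
    """Run site FPP with Exp(1) passage times until n_sites sites are infected.

    Returns a list of (infection_time, site) in order of infection.
    """
    rng = random.Random(seed)
    origin = (0,) * d
    seen = {origin}           # infected sites, plus perimeter sites with a running clock
    clocks = []               # min-heap of (infection time, site)
    order = [(0.0, origin)]

    def expose(site, now):
        """Start the clock of every neighbor that becomes adjacent for the first time."""
        for axis in range(d):
            for step in (-1, 1):
                nb = site[:axis] + (site[axis] + step,) + site[axis + 1:]
                if nb not in seen:
                    seen.add(nb)
                    heapq.heappush(clocks, (now + rng.expovariate(1.0), nb))

    expose(origin, 0.0)
    while len(order) < n_sites:
        time, site = heapq.heappop(clocks)   # next site to be infected
        order.append((time, site))
        expose(site, time)
    return order


order = site_fpp(d=2, n_sites=500, seed=0)
print(order[-1][0])  # FPP time at which the 500th site is infected
```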

3.3 Variations on the Model

Our results are stable with respect to certain variations on the setup described above. First, instead of uniformly selecting a tile in the site perimeter, one could uniformly select a face on the boundary and add the adjacent tile along the face. In other words, the probability that an element of the site perimeter is selected is weighted by the number of connections between it and the polycube at time t. This can be modeled using first-passage percolation like the usual Eden model, but using bond FPP rather than site FPP. All of our proofs can easily be modified to produce analogous results for this model.

Another potential variation relates to how the topology of the Eden model is defined; rather than connecting cubes that touch at corners, one could consider two cubes to be connected only if they share a face. The advantage of this idea is that this aligns with the notion of adjacency used in defining growth. There are several ways of formalizing this idea. One is to consider the interior of the cubical complex constructed above. Alternatively, one can build a new cubical tessellation by placing grid points at the centers of cubes of the polycube; the intersection of this tessellation with our polycube is a deformation retract of its interior, and this gives a combinatorial characterization. The proof of Theorem 2.1 works without modification with this redefinition. One can also get an analogue of Theorems 2.3, 2.4, and 2.5, though with different constants: to understand the effect of adding a cube one has to work with the geometry of its dual cross polytope, rather than of the cube itself. In the end, though, this variation simply switches the role of the Eden ball and its complement: Alexander duality tells us that a sufficiently nice domain gives the same topological information as the closure of its complement, and we can obtain such a domain either by slightly thickening the Eden ball or by slightly thickening its complement.

Finally, our results can easily be extended to other regular tessellations of \(\mathbb {R}^d\) besides the cubical one. In fact, much of what we say seems to depend only on the large-scale geometry of the contractible cell complex \({{\,\textrm{CW}\,}}(\mathbb {Z}^d)\). One direction for further research would be to understand similar models on tessellations of hyperbolic space, nilpotent Lie groups, other symmetric spaces, \(CAT (0)\) cube complexes, and other contractible spaces on which a group acts geometrically. For what little is known about first-passage percolation on spaces of interest in geometric group theory, see [6].

3.4 Combinatorics of Cubes and Polycubes

The following is easy to see:

Lemma 3.1

The number of i-dimensional faces of the d-dimensional cube is \(2^{d-i}{d \atopwithdelims ()i}\).
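One way to see this is to describe a face of \([0,1]^d\) by fixing each coordinate to 0 or 1 or leaving it free; the i-dimensional faces are those with exactly i free coordinates. The following sketch checks the count by brute-force enumeration (the name `count_faces` is ours).

```python
from itertools import product
from math import comb


def count_faces(d, i):
    """Count the i-dimensional faces of [0,1]^d by enumeration.

    A face is encoded as a d-tuple with entries 0, 1 (fixed coordinate)
    or "*" (free coordinate); its dimension is the number of free entries.
    """
    return sum(1 for face in product((0, 1, "*"), repeat=d) if face.count("*") == i)


for d in range(1, 6):
    for i in range(d + 1):
        assert count_faces(d, i) == 2 ** (d - i) * comb(d, i)
print(count_faces(3, 0), count_faces(3, 1), count_faces(3, 2))
```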

In the proof of Theorem 2.1 we require the following combinatorial fact about polycubes in general. Here \(\textrm{vol}_d\) represents the d-dimensional volume of a cubical complex, which is equal to the number of d-dimensional faces.

Lemma 3.2

Let A be any polycube in \(\mathbb {R}^d\). Then for some \(1\le i\le d\), the projection of A to the ith coordinate hyperplane (denoted \(\pi _i(A)\)) has

$$\begin{aligned} \textrm{vol}_{d-1}(\pi _i(A)) \ge \textrm{vol}_d(A)^{(d-1)/d}. \end{aligned}$$

Proof

The isoperimetric inequality for polycubes [7], attained by cubes, is

$$\begin{aligned} \textrm{vol}_{d-1}(\partial A) \ge 2d\textrm{vol}_d(A)^{(d-1)/d}. \end{aligned}$$

Suppose first that A is convex, that is, the intersection of A with any line parallel to any coordinate axis is connected. (Note that a convex polycube is not a convex set!) This is equivalent to saying that every \((d-1)\)-cube in \(\partial A\) is visible from infinitely far in some coordinate direction. In that case,

$$\begin{aligned} \textrm{vol}_{d-1}(\partial A)=\sum _{i=1}^d 2\textrm{vol}_{d-1}(\pi _i(A)), \end{aligned}$$

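and combining this with the isoperimetric inequality gives \(\sum _{i=1}^d \textrm{vol}_{d-1}(\pi _i(A)) \ge d\,\textrm{vol}_d(A)^{(d-1)/d}\). Hence at least one of the d projections has volume at least \(\textrm{vol}_d(A)^{(d-1)/d}\), which completes the proof in this case.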

Now take a general polycube A. We will construct a convex polycube \(A_d\) with the following properties:

(i) \(\textrm{vol}_d(A_d)=\textrm{vol}_d(A)\).

(ii) For each i, \(\textrm{vol}_{d-1}(\pi _i(A_d)) \le \textrm{vol}_{d-1}(\pi _i(A))\).

This comparison proves the lemma for A. We construct \(A_d\) by “lining up” the columns of A in each coordinate direction. That is, let \(A_0=A\). Once we have built \(A_{i-1}\), we make it into \(A_i\) by turning on gravity in the ith direction and “shaking”, that is, letting all the cubes fall down to some hyperplane below the polycube. Clearly condition (i) holds. We need to show that (ii) holds and that \(A_d\) is column convex. We show both of these by analyzing each shake, that is, each transition from \(A_{j-1}\) to \(A_j\).

During the ith shake, the polycube becomes column convex in the ith coordinate direction, that is, its intersection with any line in that direction is connected. It remains to show that during subsequent shakes, \(j>i\), this convexity is preserved. Since what happens to a cube depends only on its column, we look at the intersection of the polycube with each plane in the ij-direction. If we start with connected columns lined up on one side, then the jth shake sorts those columns by height, without changing their convexity.

Finally, we show that each jth shake does not increase the volume of \(\pi _i(A)\). Certainly if \(i=j\) the projection doesn’t change. Otherwise we again look at the intersection with each plane in the ij-direction. After the shake, what we see from the ith coordinate direction is the height of the largest column. Previously, every cube in that column was either visible or obstructed by something, so the volume of the projection can only decrease. \(\square \)
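The shaking construction used in this proof is easy to experiment with. The following is a small sketch under our own conventions (cells are tuples of integers, the floor of each shake is the hyperplane where the relevant coordinate is 0, and the names `shake`, `compress` and `projection_volume` are hypothetical); it checks properties (i) and (ii) and the conclusion of the lemma on a U-shaped polyomino.

```python
from collections import Counter


def shake(cells, axis):
    """Let every column parallel to `axis` fall onto the hyperplane x_axis = 0."""
    heights = Counter(c[:axis] + c[axis + 1:] for c in cells)
    return {
        col[:axis] + (h,) + col[axis:]
        for col, n in heights.items()
        for h in range(n)
    }


def compress(cells):
    """Shake once in each coordinate direction, in order."""
    d = len(next(iter(cells)))
    for axis in range(d):
        cells = shake(cells, axis)
    return cells


def projection_volume(cells, axis):
    """Number of (d-1)-cells in the projection along the given axis."""
    return len({c[:axis] + c[axis + 1:] for c in cells})


u_shape = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1), (2, 2)}
packed = compress(u_shape)
d = 2
assert len(packed) == len(u_shape)                                   # property (i)
for axis in range(d):
    assert projection_volume(packed, axis) <= projection_volume(u_shape, axis)  # (ii)
# Conclusion of the lemma, compared in integers to avoid rounding issues.
best = max(projection_volume(u_shape, axis) for axis in range(d))
assert best ** d >= len(u_shape) ** (d - 1)
print(sorted(packed))
```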

4 Proof of Theorem 2.1

We start with the (easy) upper bound. Write A(t) for the polycube at time t. Applying the Mayer–Vietoris sequence to \(A(t)\cup \overline{A(t)^c}=\mathbb {R}^d\), we see that

$$\begin{aligned} H_i(\partial A(t)) \cong H_i(A(t)) \oplus H_i(\overline{A(t)^c}). \end{aligned}$$

The rank of the left side is bounded by the number of i-cells in the boundary, giving the bound \(\beta _i(t) \le 2^{d-i}{d \atopwithdelims ()i}P_d(t)\) since \(2^{d-i}{d \atopwithdelims ()i}\) is the number of i-cells in a d-cube. In the case \(i=d-1\), we can get a stronger bound since \(\beta _{d-1}(t)\) is the number of voids in A(t), in other words, the number of bounded connected components of its complement. Since every connected component of the complement must include a cell of the site perimeter, \(\beta _{d-1}(t)\le P_d(t)\).
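The bound for \(i=d-1\) can be checked numerically on small examples. The sketch below counts voids by flood-filling the complement inside a padded bounding box and compares the count with the size of the site perimeter. It assumes \(d\ge 2\), represents cells by integer tuples, and uses the hypothetical names `face_neighbors`, `site_perimeter` and `count_voids`. Complement cells are joined only across shared \((d-1)\)-faces, because the filled cubes are closed.

```python
from collections import deque
from itertools import product


def face_neighbors(cell):
    """Yield the 2d cells sharing a (d-1)-face with `cell`."""
    for axis in range(len(cell)):
        for step in (-1, 1):
            yield cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]


def site_perimeter(cells):
    """Cells outside the polycube that share a (d-1)-face with it."""
    cells = set(cells)
    return {nb for c in cells for nb in face_neighbors(c)} - cells


def count_voids(cells):
    """Number of bounded components of the complement (assumes d >= 2)."""
    cells = set(cells)
    d = len(next(iter(cells)))
    ranges = [
        range(min(c[a] for c in cells) - 1, max(c[a] for c in cells) + 2)
        for a in range(d)
    ]
    unseen = {c for c in product(*ranges) if c not in cells}
    components = 0
    while unseen:
        queue = deque([unseen.pop()])
        components += 1
        while queue:
            current = queue.popleft()
            for nb in face_neighbors(current):
                if nb in unseen:
                    unseen.remove(nb)
                    queue.append(nb)
    return components - 1  # the padding layer lies in the unbounded component


shell = set(product(range(3), repeat=3)) - {(1, 1, 1)}
print(count_voids(shell), len(site_perimeter(shell)))
```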

We now prove the lower bound. Here is the basic outline. Given a time t, we find \(\Omega (t^{(d-1)/d})\) disjoint empty boxes of side length R at the perimeter of a somewhat earlier stage \(A(t_0)\). Then we show that once we reach time t, at least a constant proportion of these boxes end up containing a structure which adds one to the ith Betti number.

The boxes are obtained as follows. By Lemma 3.2, the projection of \(A(t_0)\) in some coordinate direction has volume at least \(t_0^{(d-1)/d}\). Thus (thinking of that direction as “up”) we can drop \(\Omega (t_0^{(d-1)/d})\) boxes from overhead so that they land in different places on top of the polycube \(A(t_0)\). We formalize this in proving the following more general result.

Theorem 4.1

Let S be any d-dimensional polycube which is contained in the cube \([0,R]^d\) and includes the entire base of that cube (i.e., \([0,R]^{d-1}\times [0,1]\)). There is a constant \(c=c(R,d)>0\) such that S occurs (perhaps in rotated form) as the intersection of A(t) with at least \(ct^{(d-1)/d}\) different cubes of width R, with high probability as \(t \rightarrow \infty \).

Before proving Theorem 4.1, we use it to finish the proof of Theorem 2.1. Let \(1 \le i \le d-1\) and set

$$\begin{aligned} S=([0,5]^{d-1} \times [0,1]) \cup ([2,3]^{d-i-1} \times [1,4]^{i+1}) \setminus [2,3]^d \subset [0,5]^d. \end{aligned}$$

That is, S is the base together with a “handle” homotopy equivalent to \(S^i\). The theorem guarantees \(ct^{(d-1)/d}\) copies of S whose intersection with the remainder of A(t) is contained in the base. Thus A(t) is the union of two pieces: all the copies of S on one side, and the rest of A(t) together with the bases of the copies of S on the other; the intersection is a disjoint union of contractible components, one for each copy of S. Thus by the Mayer–Vietoris theorem, \(\beta _i(t) \ge ct^{(d-1)/d}\).
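For concreteness, the cells of S can be listed explicitly. The following sketch (with the hypothetical name `handle_polycube`, and cells indexed by their lower corners in \(\{0,\ldots ,4\}^d\)) builds the base and the handle and checks the resulting number of cells.

```python
from itertools import product


def handle_polycube(d, i):
    """Unit cells of S inside [0,5]^d, for 1 <= i <= d - 1."""
    if not 1 <= i <= d - 1:
        raise ValueError("need 1 <= i <= d - 1")
    # Base: [0,5]^(d-1) x [0,1].
    base = {c + (0,) for c in product(range(5), repeat=d - 1)}
    # Handle: [2,3]^(d-i-1) x [1,4]^(i+1) with the central cube [2,3]^d removed.
    handle = {
        (2,) * (d - i - 1) + tail
        for tail in product(range(1, 4), repeat=i + 1)
    } - {(2,) * d}
    return base | handle


for d, i in [(2, 1), (3, 1), (3, 2)]:
    cells = handle_polycube(d, i)
    assert len(cells) == 5 ** (d - 1) + 3 ** (i + 1) - 1
    print(d, i, len(cells))
```

In the plane with \(i=1\), the handle is a ring of eight tiles around one empty tile, sitting on a base of five tiles.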

4.1 Proof of Theorem 4.1

To prove Theorem 4.1 we will use the reformulation of the Eden model in terms of first-passage percolation, as described in Sect. 3.2. We now keep track of time in the FPP model, which we indicate by r to contrast with t for Eden time and to suggest that it is roughly the radius of the polycube; the notation A(r) and \(P_d(r)\) indicates the Eden model in FPP time and the volume of its site perimeter for the rest of the section. We also write \(|A(r)|\) for the volume of A(r), i.e., t. Finally we define the passage time from r to be the passage time of a site if it is not in the site perimeter of A(r), and the time from r to infection if it is. The memorylessness of the exponential distribution implies that, given \(A(r)=A\), the passage times from r to sites not in A are i.i.d. exponential, with no difference between sites in and outside the site perimeter.

Our approach is to find at least \(c(d)|A(r-2)|^{(d-1)/d}\) copies of S in A(r) with high probability. Thus, to prove the theorem, we also need to know that \(|A(r)|\le C(d)|A(r-2)|\). This follows from the Cox–Durrett shape theorem [12], which shows in particular that there is a constant \(V_0\) such that for every \(\varepsilon >0\), with high probability

$$\begin{aligned} (1-\varepsilon )V_0r^d<|A(r)|<(1+\varepsilon )V_0r^d. \end{aligned}$$

However, in the interest of keeping the overall argument elementary we also provide the following much cruder estimate. Since this estimate is stated in terms of the site perimeter, it is also useful for our later argument about \(\beta _{d-1}(t)\).

Lemma 4.2

With high probability as \(r \rightarrow \infty \),

$$\begin{aligned} |A(r)|\le |A(r-2)|+CP_d(r-2)\le (1+2dC)|A(r-2)|, \end{aligned}$$

where \(C=C(d)\) is a constant. In particular, \(P_d(r) \le (1+2dC)P_d(r-2)\).

Proof

We will show that there is an \(\varepsilon =\varepsilon (d)>0\) such that with high probability, \(|A(r+\varepsilon )|\le |A(r)|+P_d(r)\), and in particular \(P_d(r+\varepsilon ) \le 2P_d(r)\). This will imply the lemma with \(C=2^{\lceil 2/\varepsilon \rceil }\).

Let \(\varepsilon \) be such that \(\mathbb {P}(\rho _p<\varepsilon )=1/(2d+1)\), where \(\rho _p\) is the passage time from r at any site \(p \notin A(r)\). Consider a rooted infinite \((2d-1)\)-ary tree equipped with passage times on the nodes distributed via the same exponential distribution. The expected size E of the maximal subtree containing the root (if nonempty) whose nodes all have passage times \(<\varepsilon \) satisfies the recurrence relation

$$\begin{aligned} E=\frac{1+(2d-1)E}{2d+1}; \end{aligned}$$

thus \(E=1/2\). This bounds the expected size of the subtree reached in time \(\varepsilon \).
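To see where the recurrence comes from, write \(q=1/(2d+1)\) for the probability that a given node has passage time below \(\varepsilon \) (the symbol q is introduced here only for illustration). The subtree is empty unless the root qualifies, and in that case it consists of the root together with an independent copy of the same random subtree below each of its \(2d-1\) children. Hence

$$\begin{aligned} E=q\bigl (1+(2d-1)E\bigr ) \;\Longrightarrow \; (2d+1)E=1+(2d-1)E \;\Longrightarrow \; 2E=1 \;\Longrightarrow \; E=\tfrac{1}{2}. \end{aligned}$$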

Now we show that \(|A(r+\varepsilon )|-|A(r)|\) is bounded above by the total size of all these subtrees for a collection of \(P_d(r)\) independent such trees. We associate the roots of the trees to the cubes of the site perimeter of A(r), and then map each tree to \(\mathbb {Z}^d\) via a graph homomorphism by thinking of paths in the tree as corresponding to reduced words on d letters and their inverses, with one letter missing from the initial position corresponding to a neighbor of the root site which is in A(r).

We give a coupling between the weight distribution on the collection of trees and the passage times from r for sites outside A(r), in which each site is coupled to some node which maps to it. Namely, the nodes in the site perimeter are associated to the corresponding tree. Then we couple each subsequent site to a neighbor of the node coupled to the neighboring site reached at the earliest time. Thus the coupling between the probability spaces depends on the values pulled from preceding distributions; this doesn’t affect any probabilities since all that changes is which i.i.d. exponentially distributed weight corresponds to a given site.

In the end, every site in \(A(r+\varepsilon )\) that is not already in A(r) is coupled to a node which is reached at time \(<\varepsilon \). Since the trees are independent and the expected number of nodes reached in each tree within time \(\varepsilon \) is at most \(1/2<1\), the law of large numbers shows that with high probability, \(|A(r+\varepsilon )|<|A(r)|+P_d(r)\). \(\square \)

By Lemma 3.2, the projection of \(A(r-2)\) in some coordinate direction (without loss of generality, the \(x_d\) direction) has volume \(\ge |A(r-2)|^{(d-1)/d}\). In particular, if we partition the plane \(x_d=0\) into coordinate cubes of side length R, some number \(N\ge R^{-(d-1)}|A(r-2)|^{(d-1)/d}\) of those cubes intersect this projection. For each such cube \(K_j\), let \(h(K_j)\) be the maximal \(x_d\)-coordinate of a point of \(A(r-2)\) whose dth projection lies in \(K_j\). Thus the d-dimensional cube \(\tilde{K}_j=K_j\times [h(K_j),h(K_j)+R]\) touches, but does not intersect \(A(r-2)\). We finish by showing:

Lemma 4.3

There is a \(c(R)>0\) such that with high probability, for at least \(c(R)N\) values of j, \(1\le j\le N\), we have \(\tilde{K}_j\cap A(r)=S+y_j\), where \(y_j\) is defined so that \(\tilde{K}_j=[0,R]^d+y_j\).

Proof

For a site \(p \notin A(r-2)\), let \(\rho _p\) denote its passage time from \(r-2\). As outlined above, the \(\rho _p\) are i.i.d. for all points outside \(A(r-2)\). Let \(X_j\) be the event that for all \(p \in \tilde{K}_j\),

$$\begin{aligned} \rho _p&\le R^{-d} \quad \text {if }p \in S+y_j,\\ \rho _p&\ge 3 \quad \quad \;\,\, \text {if }p \notin S+y_j. \end{aligned}$$

Clearly, the \(X_j\) are i.i.d. and each \(X_j\) occurs with positive probability. Therefore there is a constant \(c(R)>0\) such that with high probability at least \(c(R)N\) of the \(X_j\) occur.

Now notice that if \(X_j\) occurs, then for some p in the base of \(\tilde{K}_j\), \(A(r-2+R^{-d})\) contains p. Every point in \(S+y_j\) is connected to that point by a path through \(S+y_j\) of length certainly \(\le R^d-1\). Therefore, for \(r-1<s<r+1\), A(s) contains all the points of \(S+y_j\) and none of the points of \(\tilde{K}_j\setminus (S+y_j)\). This proves the lemma. \(\square \)

4.2 Proof of the Lower Bound for Top-Dimensional Holes

Finally, we prove the stronger lower bound \(\beta _{d-1}(t)\ge cP_d(t)\). We will show the following:

Lemma 4.4

There is a \(c>0\) such that with high probability, there are \(\ge \) \(cP_d(r-2)\) voids of volume 1 in A(r).

Since by Lemma 4.2, \(P_d(r) \le C(d)P_d(r-2)\), this suffices.

Proof

For \(\sigma \in \mathbb Z^d\), let \(\Psi _\sigma \) be the set of sites in the intersection of \(\sigma +3\mathbb {Z}^d\) with the site perimeter of \(A(r-2)\). We can choose \(\sigma \) so that \(\Psi _\sigma \) contains at least \(1/3^d\) of the site perimeter. Given a site \(p\in \Psi _\sigma \), let \(X_p\) be the event that the passage time from \(r-2\) is \(>2\) for p and \(\le 1/2\) for all sites that share a \((d-2)\)-face with p. The \(X_p\) are i.i.d. and each \(X_p\) occurs with positive probability. Therefore there is a constant \(c>0\) such that with high probability, at least \(c\cdot 3^d|\Psi _\sigma |\) of the \(X_p\) occur. It is easy to see that if \(X_p\) occurs, then A(r) contains all the neighbors of p, but not p. \(\square \)

5 Proof of Theorems 2.3, 2.4, and 2.5

We now endeavor to understand the possible changes in \(\beta _i\) at a single timestep. Let A(t) be the polycube at time t, and Q be the tile added at time \(t+1\). Then by excision and the long exact sequence of a pair, we have

$$\begin{aligned} H_i(A(t+1),A(t)) \cong H_i(Q,Q \cap A(t)) \cong \tilde{H}_{i-1}(Q \cap A(t)), \end{aligned}$$

where \(\tilde{H}_i\) indicates reduced homology. The long exact sequence of the pair \((A(t+1), A(t))\) then indicates that

$$\begin{aligned} -\max {{\,\textrm{rank}\,}}H_i(Q \cap A(t)) \le \beta _i(t+1)-\beta _i(t) \le \max {{\,\textrm{rank}\,}}H_{i-1}(Q \cap A(t)), \end{aligned}$$

where the maximum is taken over possible subcomplexes of the d-dimensional cube which could be \(Q \cap A(t)\).

We now compute this maximal rank. Notice that \(Q \cap A(t)\) has to include at least one \((d-1)\)-dimensional face in order for us to be able to add the tile Q. Without loss of generality, we assume this is the base of the cube. Since adding i-cells can only increase it and adding \((i+1)\)-cells can only decrease it, \({{\,\textrm{rank}\,}}H_i(Q \cap A(t))\) is maximized when A(t) includes the entire i-skeleton of the cube but no \((i+1)\)-cells outside the base. Write \(Q_r=[0,1]^r \subset \mathbb {R}^d\) equipped with the standard cell structure. Then it is enough to compute

$$\begin{aligned} {{\,\textrm{rank}\,}}\tilde{H}_i(Q_{d-1} \cup Q_d^{(i)}) \end{aligned}$$

where \( Q_d^{(i)}\) is the i-skeleton of \(Q_d\). Notice that \(Q_{d-1} \cup \bigl (Q_{d-1}^{(i)} \times [0,1]\bigr )\) is contractible and obtained by adding \((i+1)\)-cells to \(Q_{d-1} \cup Q_d^{(i)}\); the number of these \((i+1)\)-cells is the same as the number \(J(d-1,i)\) of i-cells in the \((d-1)\)-cube. Therefore

$$\begin{aligned} {{\,\textrm{rank}\,}}\tilde{H}_i(Q_{d-1} \cup Q_d^{(i)})=J(d-1,i)=2^{d-1-i} \binom{d-1}{i}. \end{aligned}$$

This demonstrates equation (4).
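As a sanity check on the face count \(J(d-1,i)\), the following Python sketch compares the closed formula with a direct enumeration of the faces of a cube. The function names and the encoding of a face as a tuple over 0, 1, and "*" are illustrative choices, not part of the paper's implementation.

```python
from itertools import product
from math import comb


def J(n: int, i: int) -> int:
    """Closed formula for the number of i-dimensional faces of the n-cube."""
    return 2 ** (n - i) * comb(n, i)


def count_faces_brute_force(n: int, i: int) -> int:
    """Count the i-faces of [0,1]^n by listing every face.

    A face is encoded by choosing, for each coordinate, whether it is fixed
    at 0, fixed at 1, or free ("*"). The dimension of the face is the number
    of free coordinates.
    """
    return sum(
        1 for face in product((0, 1, "*"), repeat=n) if face.count("*") == i
    )


if __name__ == "__main__":
    # J(d-1, i) for d = 3: vertices, edges, and the 2-face of a square.
    print([J(2, i) for i in range(3)])
```

For \(d=3\) and \(i=0\) this gives \(J(2,0)=4\), the number of vertices of a square.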

It remains to show that every change in \(\beta _i\) within this range is attained by some configuration. The Eden model produces any polycube with positive probability, so it is enough to demonstrate that:

Lemma 5.1

(a) For each \(1 \le i \le d-1\) and each \(0 \le k \le J(d-1,i)\), there is a polycube in which adding a tile decreases \(\beta _i\) by k and increases \(\beta _{i+1}\) by \(J(d-1,i)-k\).

(b) For each \(1 \le k \le J(d-1,0)=2^{d-1}\), there is a configuration in which adding a tile increases \(\beta _1\) by k.

Proof

Given subcomplexes \(R \subseteq S \subseteq Q_d^{(d-1)}\), we will construct a set \(A_{R,S}\) of tiles in the \(5 \times \cdots \times 5\) grid centered at \(Q_d\) that is homotopy equivalent to S and intersects \(Q_d\) in R. A tile \(Q'\) adjacent to \(Q_d\) is included if and only if \(Q'\cap Q_d\) is contained in R. The tiles in the boundary of the \(5\times \cdots \times 5\) grid are included according to the following criterion. The planes containing the \((d-1)\)-faces of \(Q_d\) partition \(\mathbb {R}^d\) into \(3^d\) regions. The intersection of the closure of such a region with \(Q_d\) consists of exactly one face of \(Q_d\). A boundary tile not in the top or bottom layer is included if and only if the region containing it intersects \(Q_d\) in a face of S.

Fig. 4 Adding the central cube to this configuration increases \(\beta _1\) by 4; this is the construction given in the proof of Lemma 5.1 (b) altered slightly for visibility

\(A_{R,S}\) is homotopy equivalent to S, and \(A_{R,S} \cap Q_d=R\); thus using the Mayer–Vietoris theorem one sees that

$$\begin{aligned} H_i(A_{R,S} \cup Q_d) \cong H_i(S,R). \end{aligned}$$

Then to fulfill (a) we use \(A_{R,S}\) with \(R=Q_{d-1}\cup Q_d^{(i)}\) and \(R\subseteq S\subseteq Q_{d-1}\cup Q_{d-1}^{(i)}\times [0,1]\), with S containing k of the extra \((i+1)\)-dimensional faces. To fulfill (b) we use \(A_{R,S}\) with R comprising \(Q_{d-1}\) and k vertices of the upper face of \(Q_d\), and with S adding in the vertical edges connecting those vertices to the base. In both cases, adding in the center tile changes the topology as desired. \(\square \)

Now we show that with high probability, each such jump happens at least ct times between time 0 and t for some \(c=c(d)>0\). In particular, we show that a constant percentage of the time, the tile added at step s is locally configured as in Lemma 5.1, and that local configuration is attached to the rest of the polycube only by the base; hence by the Mayer–Vietoris theorem, the change in the overall Betti numbers is the same as the change in the local Betti numbers.

For this we use the FPP formulation of the Eden model; in fact our proof works in a wide range of FPP models.

Theorem 5.2

Consider a site FPP model in \(\mathbb {Z}^d\) whose probability distribution on passage times is not supported away from both 0 and \(\infty \) (in other words, there are pairs of times in the support whose ratio is arbitrarily large). We denote the polycube at time r in this model by the random variable A(r). Then there is some \(c(R)>0\) depending on the distribution such that the following holds. Let K be a (strict) sub-polycube of \([0,R]^d\) which contains the entire boundary, and mark a tile \(x_0\) inside \([0,R]^d\) and outside but adjacent to K. We say that \(x\in P_K\) if the tile x is added to the polycube at a time \(r_x \le r\), and

$$\begin{aligned} A(r_x)\cap ([0,R]^d+(x-x_0))=(K\cup \{x_0\})+(x-x_0). \end{aligned}$$

Then with high probability \(|P_K|\ge c(R)|A(r)|\).

Applying this to the configurations in Lemma 5.1, framed inside a filled shell with extra white space added so that only the base of the interior configuration touches the shell, we get our desired statement. The theorem holds, mutatis mutandis, for bond percolation models.

Proof

We show that for each \(\sigma \in \mathbb {Z}^d\), with high probability a constant proportion of the tiles in \(A(r) \cap (R\mathbb {Z}^d+\sigma )\) are in \(P_K\). Since for some \(\sigma \),

$$\begin{aligned} |A(r) \cap (R\mathbb {Z}^d+\sigma )|\ge \frac{|A(r)|}{R^d} \end{aligned}$$

this is sufficient.

Now we look at the disjoint R-cubes around each site \(x \in A(r) \cap (R\mathbb {Z}^d+\sigma )\). Write \(Q_x=[0,R]^d+(x-x_0)\) and \(K_x=K+(x-x_0)\). Let \(\rho _y\) denote the passage time of a site y, and let \(X_x\) be the event that for all \(y \in Q_x\),

$$\begin{aligned} \rho _y&\le (R+2)^{-d} \qquad \text {if }y \in K_x,\\ 1<\rho _y&<2 \qquad \qquad \qquad \quad \text {if }y=x,\\ \rho _y&\ge 3 \qquad \qquad \qquad \quad \text {otherwise.} \end{aligned}$$

These times may be scaled based on the passage time distribution to make sure that the probability that \(\rho _y\) lands in each range is nonzero. Since R is arbitrary, here we are using the condition on the support. Clearly all the \(X_x\) are i.i.d. and each occurs with positive probability. Therefore there is a constant \(c(R)>0\) such that with high probability at least \(c(R)|A(r)\cap (R\mathbb {Z}^d+\sigma )|\) of them occur.

Now if \(X_x\) occurs, and assuming \(Q_x\) does not include the origin, let \(r_x\) be the time at which x enters the polycube. Then the path connecting the origin to x has to go through the outermost layer of \(Q_x\), so sites in that layer enter the polycube earlier. Once one point in \(K_x\) is in the polycube, the rest must join it after time \(<1\). The first point adjacent to x joins at some time in \((r_x-2,r_x-1)\). One sees therefore that all sites in \(K_x\) must enter the polycube at times in \((r_x-3,r_x)\), and that all sites in \(Q_x\setminus (K_x\cup \{x\})\) must enter after x does. Thus every x for which \(X_x\) occurs is in \(P_K\).

\(\square \)

Finally, we prove Theorem 2.5, which states that under the assumption that there is a \(C(d)>0\) such that \(P_d(t)\le C(d)t^{(d-1)/d}\) with high probability, the probability of each specific change in \(\beta _i\) occurring at time t is asymptotically bounded away from zero, again with high probability.

By Theorem 4.1 the perimeter contains \(\ge c(d)t^{(d-1)/d}\) sites at time t whose neighborhood looks like a given one of the configurations from Lemma 5.1 with high probability. Therefore, whenever \(P_d(t) \le C(d)t^{(d-1)/d}\), the probability that the next tile is added in the center of such a configuration is at least c(d)/C(d).

6 Computational Experiments and Open Problems

Theorem 2.1 shows a rigorous asymptotic bound for the Betti numbers of the Eden growth model in d dimensions. However, many finer questions about the associated geometry and topology remain open. In this section, we investigate several of these questions via computational experiments for the Eden model in dimensions 2 through 5, giving evidence for Conjectures 2.2 and 2.6 as stated in Sect. 2 and suggesting further conjectures.

The Eden Growth Model was implemented in Python, together with an algorithm that tracks the behavior of the \((d-1)\)-dimensional homology at each timestep. We find a basis for \(H_{d-1}(A(t))\) via Alexander duality by identifying the bounded components of the complement and tracking how they change over time. This implementation allows us to study fine questions about the distribution of shapes and area of the holes in the EGM in Sect. 6.4. In Sect. 6.1, we also compute the proportion of the site perimeter contained in the unbounded component of the complement (the outer perimeter) for clusters of sizes 1 and 2 million for the EGM in dimension two and clusters of size 1.5 million for the EGM in dimensions 2 through 5. The data analyzed in Tables 2 and 3, and Figs. 9 and 10 comes from a single set of ten two-dimensional clusters of size 1 million. This dataset has been made publicly available at the GitHub repository [30].

The algorithm described in the previous paragraph cannot easily be modified to measure the local geometry associated with the lower-dimensional homology (see Footnote 3). Instead, we use the Perseus software package [32, 33] to compute the Betti numbers and persistent homology in all dimensions. These computations are discussed in Sects. 6.2 and 6.3. Unsurprisingly, this was slower than our other computations, and we include data from a single cluster of one million tiles for the two-dimensional EGM, and data from single clusters of size five hundred thousand for the EGM in dimensions three, four, and five.
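The growth process itself is simple to simulate. The following is a minimal Python sketch of the d-dimensional EGM as defined in the introduction; it is not the authors' implementation, and the function names and the list-plus-dictionary bookkeeping for the site perimeter are assumptions made here for illustration.

```python
import random


def neighbors(site):
    """Yield the 2d lattice neighbors of a site in Z^d (tiles sharing a (d-1)-face)."""
    for axis in range(len(site)):
        for step in (-1, 1):
            yield site[:axis] + (site[axis] + step,) + site[axis + 1:]


def eden_growth(num_tiles, d=2, seed=None):
    """Grow an Eden cluster with num_tiles tiles in Z^d.

    Returns (cluster, perimeter): the set of occupied sites and the site
    perimeter at the final time.
    """
    rng = random.Random(seed)
    origin = (0,) * d
    cluster = {origin}
    # The perimeter is stored as a list (for uniform sampling) together with
    # a dict site -> position in the list (for constant-time removal).
    perim_list = list(neighbors(origin))
    perim_index = {s: k for k, s in enumerate(perim_list)}
    while len(cluster) < num_tiles:
        k = rng.randrange(len(perim_list))
        site = perim_list[k]
        # Swap-remove: move the last entry into slot k unless k was last.
        last = perim_list.pop()
        if k < len(perim_list):
            perim_list[k] = last
            perim_index[last] = k
        del perim_index[site]
        cluster.add(site)
        # Newly exposed neighbors join the site perimeter.
        for nb in neighbors(site):
            if nb not in cluster and nb not in perim_index:
                perim_index[nb] = len(perim_list)
                perim_list.append(nb)
    return cluster, set(perim_list)


if __name__ == "__main__":
    cluster, perimeter = eden_growth(10_000, d=2, seed=0)
    print(len(cluster), len(perimeter))
```

Sampling uniformly from a list keeps each step at constant expected cost, which matters for clusters with around a million tiles.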

6.1 Total, Inner, and Outer Perimeter

In applications of stochastic growth models (e.g. to modeling a bacterial cell colony), the interaction with the medium takes place along the site perimeter. These interactions may be qualitatively different for sites in the outer perimeter, the intersection of the site perimeter and the unbounded component of the complement (denoted \({\text {OutP}}_d(t)\)), and for sites in the inner perimeter, the remaining sites of the site perimeter, which are contained in the holes of the EGM. By Alexander duality, top-dimensional holes can be thought of as capsules whose contents cannot interact with the outside medium. In what follows, we analyze the total, inner and outer site perimeter of simulations of the Eden model in dimensions 2 through 5.

Recall that the site perimeter of a d-dimensional polycube A is the set of d-cells that are not in A but that have \((d-1)\)-cells in common with A.
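The split into outer and inner perimeter can be computed with a flood fill of the complement. The Python sketch below is illustrative only: it assumes \(d\ge 2\), represents a polycube as a set of integer tuples, and treats two complementary tiles as connected when they share a \((d-1)\)-face, which is the connectivity of the complement of a union of closed cubes. The function names are hypothetical.

```python
from collections import deque


def neighbors(site):
    """Yield the 2d lattice neighbors of a site in Z^d."""
    for axis in range(len(site)):
        for step in (-1, 1):
            yield site[:axis] + (site[axis] + step,) + site[axis + 1:]


def perimeters(cluster):
    """Return (site_perimeter, outer_perimeter, inner_perimeter) of a finite polycube."""
    d = len(next(iter(cluster)))
    site_perimeter = {
        nb for s in cluster for nb in neighbors(s) if nb not in cluster
    }
    # Bounding box padded by one layer; the padding layer contains no
    # cluster sites and is connected when d >= 2.
    lo = [min(s[a] for s in cluster) - 1 for a in range(d)]
    hi = [max(s[a] for s in cluster) + 1 for a in range(d)]

    def in_box(s):
        return all(lo[a] <= s[a] <= hi[a] for a in range(d))

    # Flood fill the complement from a corner of the padded box; this finds
    # the part of the unbounded component that lies inside the box.
    start = tuple(lo)
    outside = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nb in neighbors(s):
            if in_box(nb) and nb not in cluster and nb not in outside:
                outside.add(nb)
                queue.append(nb)
    outer = site_perimeter & outside
    inner = site_perimeter - outer
    return site_perimeter, outer, inner


if __name__ == "__main__":
    # A 3 x 3 square with its center removed: one hole of area 1.
    ring = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
    site, outer, inner = perimeters(ring)
    print(len(site), len(outer), len(inner))
```

In this sketch the ratio \({\text {OutP}}_d(t)/P_d(t)\) is `len(outer) / len(site)`.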

Table 1 shows the volumes of the outer and site perimeters for examples of the Eden model composed of 1.5 million sites in dimensions 2 through 5. For \(d\ge 3\) we observed that \({\text {OutP}}_d(t)/P_d(t)\) was strictly decreasing for observations taken at evenly spaced intervals of \(10^5\) timesteps. This was not the case for \(d=2\). As such, we do not think that the ratio \({\text {OutP}}_d(t)/P_d(t)\) has stabilized at time \(t\le 1.5\times 10^6\) for \(d=3\), 4, or 5. This is unsurprising given that the diameter of the clusters, in any sense, is proportional to \(t^{1/d}\). Thus it would require considerably more computational power to collect enough data to make reasonable conjectures about the limiting value of \({\text {OutP}}_d(t)/P_d(t)\) for \(d\ge 3\). Nevertheless, we conjecture a law of large numbers for this site perimeter ratio:

Table 1 The total, inner, and outer perimeter and the diameter of one simulation in each dimension between 2 and 5 up to time 1.5 million

Conjecture 6.1

For each \(d>1\), there is a number \({\text {per}}_{d}>0\) such that

$$\begin{aligned} \frac{{\text {OutP}}_d(t)}{P_d(t)} \rightarrow {\text {per}}_{d} \end{aligned}$$
(5)

with high probability as \(t\rightarrow \infty \). Moreover, \(\lim _{d \rightarrow \infty } {\text {per}}_{d} = 1\) with \({\text {per}}_{2} \in [0.77,0.80]\).

6.2 Betti Numbers

In this section, we examine the asymptotics of the Betti numbers as well as the change in each Betti number at a single timestep. As mentioned before, the computations of the Betti numbers contained in this section were performed using the Perseus software package [32, 33]. Data for the two-dimensional Eden model comes from a single cluster of size one million, and data for dimensions three, four, and five are from single clusters of size 500,000.

Fig. 5 Frequency with which each change in the Betti number in one timestep occurs in the (a) two-dimensional, (b) three-dimensional, (c) four-dimensional, and (d) five-dimensional Eden models. The data is averaged over \(t=0\) to \(t=1{,}000{,}000\) for the two-dimensional Eden model and from \(t=0\) to \(t=500{,}000\) for the other three cases. This data provides strong evidence for Conjecture 2.6

Figure 5 shows the frequencies of the event that a Betti number changes by a given amount in a single timestep. The frequencies of each event appear to converge quite quickly, providing strong evidence for Conjecture 2.6. Unsurprisingly, small jumps are much more frequent than large jumps. This is related to the closeness of the frequencies of \(\beta _i\) increasing by one and decreasing by one in Figs. 6(a)–6(c) (except for the case \(i=3\) in the latter figure): the total Betti number grows more slowly than the number of timesteps, so the number of positive changes balances out the number of negative changes, with an error term growing more slowly than t (at a rate between \(t^{(d-1)/d}\) and \(P_d(t)\), by Theorem 2.1). We expect this behavior to also occur for \(\beta _3\) in the four-dimensional case and for the Betti numbers in the five-dimensional case, at larger values of t than pictured in Figs. 6(c) and 6(d). We provide more evidence below that statistics for this case have not yet stabilized. On the other hand, this heuristic does not explain the alignment in the frequency of events where \(\beta _1\) changes by \(+k\) and \(-k\) in Figs. 6(b) and 6(c).

Fig. 6 The evolution of the Betti numbers and the total site perimeter \(P_d(t)\) over time in the (a) two-dimensional, (b) three-dimensional, (c) four-dimensional, and (d) five-dimensional Eden models. Power laws were fitted in MATLAB. Recall that the site perimeter consists of all cells that are not included in the Eden model but share a \((d-1)\)-face with it

The evolution of the Betti numbers over time is shown in Fig. 6, together with the perimeter. If \(P_d(t)\sim t^{(d-1)/d}\) as conjectured, Theorem 2.1 would imply that \(\beta _i(t)\) also scales as \(t^{(d-1)/d}\). To test this, we fitted power laws to the Betti curves in MATLAB. Estimated exponents are relatively close to their conjectured values for \(\beta _1\) for the Eden model in dimensions 2 through 5. Notably, \(\beta _3\) in the four-dimensional Eden model and \(\beta _3\) and \(\beta _4\) in the five-dimensional Eden model are growing much faster than expected, at a rate exceeding that of the volume. We take this as further evidence that statistics have not stabilized in this case.
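For readers who want to reproduce this kind of fit without MATLAB, here is a minimal Python sketch. It fits \(y\approx a\,t^{b}\) by ordinary least squares on log-log axes; this is one common choice and an assumption on our part, since the text does not specify which fitting method was used. The function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(ts, ys):
    """Fit y ~ a * t**b by linear least squares on (log t, log y).

    Points with non-positive t or y are skipped because their logarithm
    is undefined. Returns the pair (a, b).
    """
    pts = [(math.log(t), math.log(y)) for t, y in zip(ts, ys) if t > 0 and y > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    b = sxy / sxx                        # slope on log-log axes = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept gives the prefactor
    return a, b


if __name__ == "__main__":
    # Synthetic curve with the conjectured exponent (d-1)/d for d = 3.
    ts = range(1, 1001)
    ys = [3.0 * t ** (2 / 3) for t in ts]
    print(fit_power_law(ts, ys))
```

A log-log regression weights early times more heavily than a direct nonlinear fit would, so exponents obtained this way need not match the MATLAB values exactly.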

Another interesting trend in three and four dimensions is that the \(\beta _i\) for small i starts out larger at the beginning and is overtaken by \(\beta _j\) for large j as time goes on. Recall from Conjecture 2.2 that \(C_{i,d}\) is the conjectured limit of \(\beta _{i}(t)/t^{(d-1)/d}\) as \(t\rightarrow \infty \). This data suggests a further conjecture.

Conjecture 6.2

For \(0<i<j<d\), \(C_{i,d}<C_{j,d}\).

As we will see in the next section using persistent homology, a heuristic explanation for this behavior is that while higher-dimensional homology classes form less frequently than lower-dimensional ones, they last for much longer.

6.3 Persistent Homology

When \(\beta _i(t)\) changes, one would like to associate this with a specific geometric feature of A(t) (an “i-dimensional hole”) that forms or disappears at time t. In general, it is impossible to single out a specific such feature, as this requires a choice of basis for the i-dimensional homology and there are many reasonable choices (though the situation is clearer in codimension one, as we will see in the next section). However, there is a well-defined pairing between the events where an i-dimensional homology class is born and \(\beta _i\) increases and the events where an i-dimensional homology class dies and \(\beta _i\) decreases. This can be found using persistent homology.

Persistent homology [16] tracks the birth and death of homology generators over time. More precisely, if \(X_1\hookrightarrow X_2\hookrightarrow \ldots \hookrightarrow X_n\) is a filtration of topological spaces (that is, a sequence of topological spaces where each is a subset of the next), the i-dimensional persistent homology intervals \(PH_i(\mathcal {X})\) are the unique set of half-open intervals \(\{[\textbf{b}_l,\textbf{d}_l)\}\) with endpoints in \(\{1,\dots ,n\}\) so that

$$\begin{aligned} {{\,\textrm{rank}\,}}{(H_i(X_j)\rightarrow H_i(X_k))}=\#\,\{I\in PH_i(\mathcal {X}):[j,k]\subset I\}. \end{aligned}$$

Compatible bases can be chosen for the homology groups \(H_i(X_j)\) so that an interval \([\textbf{b},\textbf{d})\) corresponds to a homology basis element that is born in \(H_i(X_{\textbf{b}})\), is mapped forward to basis elements in \(H_i(X_j)\) for \(\textbf{b}<j<\textbf{d}\), and dies in \(H_i(X_{\textbf{d}})\). Note that the choice of basis elements is not unique. For a more in-depth introduction to persistent homology that describes further algebraic structure see, for example, [9, 15].
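The rank formula above is easy to evaluate once the intervals are known. The Python sketch below is illustrative: intervals are given as pairs `(b, d)` standing for \([\textbf{b},\textbf{d})\), and, as a convention assumed here, a class that never dies is given death time infinity. The function names are hypothetical.

```python
import math


def persistent_rank(intervals, j, k):
    """Rank of H_i(X_j) -> H_i(X_k), read off from the intervals [b, d).

    The closed interval [j, k] is contained in the half-open interval
    [b, d) exactly when b <= j and k < d.
    """
    if j > k:
        raise ValueError("need j <= k")
    return sum(1 for b, d in intervals if b <= j and k < d)


def betti_number(intervals, j):
    """The i-th Betti number of X_j is the rank of the identity map on H_i(X_j)."""
    return persistent_rank(intervals, j, j)


if __name__ == "__main__":
    # Three classes: born at 1 and dying at 4, born at 2 and dying at 3,
    # and born at 3 and still alive at the end of the filtration.
    intervals = [(1, 4), (2, 3), (3, math.inf)]
    print([betti_number(intervals, j) for j in range(1, 5)])
    print(persistent_rank(intervals, 2, 3))
```

Taking \(j=k\) recovers the Betti number at a single time, so the intervals determine the Betti curves in Fig. 6.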

Here, we compute the persistent homology of the natural filtration of the Eden growth model through time \(A(1)\hookrightarrow A(2)\hookrightarrow \ldots \hookrightarrow A(t-1) \hookrightarrow A(t)\). This allows us to measure how long a homology class persists after it is born.

We first give a heuristic estimate for the expected persistence. First, note that the persistence in first-passage percolation time of an element of \(H_{d-1}(A(t))\) corresponding to a hole with one tile is exponentially distributed with mean 1. We claim the expectation scales as \(t^{(d-1)/d}\) in Eden time. To compute the expectation in Eden time, we need to estimate the expected difference in Eden time, i.e., volume, from \(A(r)_{FPP }\) to \(A(r+s)_{FPP }\), using this notation for the polycube at FPP time r and \(r+s\), respectively. For \(s\gg \sqrt{r}\), known convergence estimates for the shape theorem imply that this scales as \((r+s)^d-r^d\). For smaller s, we use a heuristic. We assume that as u goes from r to 2r, \(\mathbb {E}(|{A(u+s)_{FPP }}|-|{A(u)_{FPP }}|)\) changes at most by a multiplicative constant independent of r. By splitting the interval [r, 2r] into smaller intervals and using the consequence of the shape theorem above,

$$\begin{aligned} \sum _{n=0}^{\lceil r/s \rceil } \mathbb {E}(|{A(r+(n+1)s)_{FPP }}|-|{A(r+ns)_{FPP }}|)\sim (2r)^d-r^d. \end{aligned}$$

Dividing out and using our assumption, we get \(\mathbb {E}(|{A(r+s)_{FPP }}|-|{A(r)_{FPP }}|)\sim sr^{d-1}\). Integrating over s with respect to the exponential distribution to get the expected Eden time, we see that the expected persistence of a hole with one tile scales as \(t^{(d-1)/d}\), where \(t\sim r^d\) is the Eden time, similar to the expected perimeter. One might guess that the persistence of larger holes and holes of other dimensions follows a similar law; this is also suggested by our data.
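Spelled out, and treating the heuristic estimate as an equality up to constants, the final integration uses only that an exponential random variable with mean 1 has density \(e^{-s}\) and first moment 1:

$$\begin{aligned} \int _0^\infty s\,r^{d-1}\,e^{-s}\,ds=r^{d-1}\int _0^\infty s\,e^{-s}\,ds=r^{d-1}=(r^d)^{(d-1)/d}\sim t^{(d-1)/d}. \end{aligned}$$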

Fig. 7 Combined persistence diagrams for the (a) two-dimensional, (b) three-dimensional, (c) four-dimensional, and (d) five-dimensional Eden models. The dropoff on the right of the figures is due to finite size effects. The plot lines show the average \(\text {death}-\text {birth}\) in an interval around the \(birth \) value. We used the Perseus software package [32, 33] to compute persistent homology

The persistent homology data for the Eden model in dimensions 2–5 is shown in Fig. 7. While persistent homology is usually plotted in a scatter plot of birth versus death, we plot the birth versus the persistence to see how the expected persistence of a homology class changes over time. The scatter of points shows all intervals seen in the simulation, and the solid lines give an estimate of the average persistence of an interval with the given birth time. Note that the drop-off in the distribution of the deaths to the right of the plot is an artifact of the finite size of the simulation. In all cases, higher-dimensional homology classes persist longer on average than lower-dimensional ones. This is unsurprising, as a homology class in \(H_{d-1}\) can be killed only by adding specific tiles, but there are more ways to kill lower-dimensional classes. On the other hand, there are more intervals for each dimension below \(d-1\) (for example, for the four-dimensional EGM there are \(3.4\times 10^4\), \(8.7\times 10^4\), and \(2.2\times 10^4\) intervals in dimensions 1, 2, and 3, respectively). These two trends explain the behavior we observed in Fig. 6, where the higher-dimensional Betti curves start below the lower-dimensional ones and then overtake them as time goes on: while fewer high-dimensional classes are born, they last much longer. Furthermore, most of the curves appear linear and parallel with \(P_d(\textbf{b})\) for a wide range, suggesting they follow a power law with the same exponent as \(P_d(\textbf{b})\).

Fig. 8 Empirical distributions of the normalized persistence for the persistent homology of the Eden model in (a) three dimensions and (b) four dimensions. See the text for more details

The data in Fig. 7 suggests that the expected persistence of an interval born at time \(\textbf{b}\) scales as the perimeter, which is believed to scale as \(\textbf{b}^{(d-1)/d}\). One might also suspect that the distribution of the normalized quantity \((\textbf{d}-\textbf{b})/\textbf{b}^{(d-1)/d}\) converges as the birth time is taken to \(\infty \). The empirical distribution of this normalized persistence is shown for Eden models in three and four dimensions in Fig. 8. The figure includes data for intervals with birth times between \(t=1\times 10^5\) and \(t=2\times 10^5\); the upper cutoff was chosen so only a small percentage of intervals born before that time persisted beyond \(t=5\times 10^5\). (Computing the same histograms in a disjoint time interval results in a similar distribution.) Notably, the normalized persistence for \({{\,\textrm{PH}\,}}_2\) has a substantially longer tail than that of \({{\,\textrm{PH}\,}}_1\) for the three-dimensional Eden model, and both \({{\,\textrm{PH}\,}}_3\) and \({{\,\textrm{PH}\,}}_2\) have long tails for the four-dimensional model. For the latter case, it is somewhat surprising that the distribution for \({{\,\textrm{PH}\,}}_2\) is more similar to that for \({{\,\textrm{PH}\,}}_3\) than that for \({{\,\textrm{PH}\,}}_1\).
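The normalized persistence used in Fig. 8 can be computed directly from a list of intervals. The following Python sketch is illustrative; the function name, the representation of intervals as `(birth, death)` pairs, and the inclusive birth window are assumptions made here.

```python
def normalized_persistence(intervals, dim, birth_window):
    """Return (death - birth) / birth**((dim-1)/dim) for selected intervals.

    intervals    : iterable of (birth, death) pairs in Eden time
    dim          : dimension d of the Eden model
    birth_window : (lo, hi); only intervals with lo <= birth <= hi are kept
    """
    lo, hi = birth_window
    exponent = (dim - 1) / dim
    return [
        (death - birth) / birth ** exponent
        for birth, death in intervals
        if lo <= birth <= hi
    ]


if __name__ == "__main__":
    # Toy intervals; real input would come from a persistent homology run.
    toy = [(100_000, 104_000), (150_000, 150_200), (300_000, 301_000)]
    print(normalized_persistence(toy, dim=3, birth_window=(1e5, 2e5)))
```

Restricting to a birth window, as in the text, limits the bias from intervals that are still alive when the simulation ends.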

6.4 Local Geometry of Holes

In this section, we explore random variables defined in terms of the geometry associated to the \((d-1)\)-dimensional homology of the EGM in d dimensions. These variables are: the areas, shapes, and evolution of the top-dimensional holes.

6.4.1 Areas and Shapes

Betti numbers allow us to count the number of holes of each dimension. Alas, they tell us nothing about the geometry associated with these holes. As mentioned before, it is not easy to measure the geometric properties associated with homology in dimensions 1 through \(d-2\) as one cannot uniquely define representative cycles. Fortunately, for top-dimensional holes, we can use Alexander duality to associate generators of \(H_{d-1}(A(t))\) with components of the complement of A(t). In what follows, we present statistics concerning the area and shapes of top-dimensional holes in simulations of the EGM, largely focusing on dimension 2.

Fig. 9 Normalized histogram of the areas of holes in the 2D Eden model, both at time 1 million and over all times (measured at the birth of the corresponding persistent homology interval). The latter includes data both from holes created from the outer perimeter and those resulting from the division of an existing hole. Data was averaged over ten simulations of the 2D EGM up to time 1 million. Here we show only data for holes up to area 10. More detailed statistical information is contained in Table 2

Table 2 Table showing numerical data corresponding to Fig. 9

Figure 9 shows a histogram of the areas of holes in the two-dimensional EGM with respect to two distributions: the areas of the holes at the time they were born, taken over all time (in orange with diagonal lines), and the areas of the holes present at time 1 million (in blue with spots). Unsurprisingly, the areas of holes at the time they were born are slightly larger than those in the snapshot at time one million. The frequency of holes of a given area appears to decrease somewhat sub-exponentially as a function of area, although the relationship is less clear for the smaller sample. Table 2 shows the corresponding numerical data. Data was taken from ten simulations of the two-dimensional EGM consisting of 1 million tiles.

Before studying the shapes of the holes in the two-dimensional EGM, we need to establish some conventions about how to count polycubes of a given area. Two shapes are instances of the same fixed polycube if they are congruent after translation and of the same free polycube if they are congruent after rotations, reflections, and translations. For example, there are 19 fixed polycubes of area four, and five free polycubes of area four. All free polycubes of areas three and four are depicted in Table 3, with the corresponding number of fixed polycubes in the first row.
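The counts of fixed and free polycubes in dimension two can be checked by brute force. The Python sketch below enumerates planar polycubes (polyominoes) by growing them one tile at a time; the function names are illustrative and the approach is only practical for small areas.

```python
def normalize(cells):
    """Translate a shape so that its minimal x and y coordinates are 0."""
    cells = list(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def fixed_polyominoes(n):
    """All shapes with n tiles, identified only up to translation."""
    shapes = {normalize([(0, 0)])}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if nb not in shape:
                        grown.add(normalize(shape | {nb}))
        shapes = grown
    return shapes


def symmetries(shape):
    """The images of a shape under the 8 rotations and reflections of the square."""
    images = []
    cells = list(shape)
    for _ in range(4):
        cells = [(-y, x) for x, y in cells]                    # rotate by 90 degrees
        images.append(normalize(cells))
        images.append(normalize([(-x, y) for x, y in cells]))  # mirror image
    return images


def free_polyominoes(n):
    """One canonical representative per class under rotation, reflection, translation."""
    return {min(symmetries(s), key=sorted) for s in fixed_polyominoes(n)}


if __name__ == "__main__":
    for area in (3, 4):
        print(area, len(fixed_polyominoes(area)), len(free_polyominoes(area)))
```

The multiplicity of a free polycube, that is, the number of fixed polycubes it corresponds to, is `len(set(symmetries(shape)))`.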

Table 3 The proportion of holes of areas three and four which take the shape of each free polycube

In Table 3, we show the proportion of holes in the two-dimensional EGM that take the shape of each free polycube of area three or four. The data was taken from ten different runs of the Eden model through time 1 million. In this sample, we observed an average of 147,306.5 holes of all sizes with a sample standard deviation of 152.4, of which an average of 6,113.5 holes had area four and 12,026.6 had area three at the moment of their birth, with sample standard deviations of 97.1 and 54.8 respectively. At time 1 million, we observed an average of 1,231.5 holes of all sizes with a sample standard deviation of 34.2, of which an average of 17.7 holes had area four and 47.6 holes had area three, with sample standard deviations of 6.2 and 7.4 respectively (in Table 2 these statistics are presented as frequencies). Note that the most common birth shape of area four is the roundest (the square) when controlling for multiplicity, and the least common is the longest. However, at time 1 million, the T-shape just edges out the square. The difference between these frequencies is likely related to properties of the “reverse process” we describe in the next section.

We have also recorded the extremal volumes of holes in dimensions 2 through 5. In ten simulations of the two-dimensional Eden model up to time 1 million, the largest hole created had an average area of 48.7 with a standard deviation of 10.564. One of these largest holes is depicted in Fig. 10, together with the largest hole created in a simulation of the 3D EGM up to time 1 million. In Table 4, we record the volume of the largest top-dimensional hole created in a single simulation through time 1.5 million in each dimension from 2 to 5.

Fig. 10 A “cast” of the largest top-dimensional holes. The polycube on the left has area 48 and the polycube on the right has volume 49. They were obtained from a simulation of the EGM up to time one million in two and three dimensions respectively. The polycube is available for interactive exploration at https://skfb.ly/6SnzN

Table 4 Volumes of the largest top-dimensional holes created in an EGM simulation up to time 1.5 million in each of dimensions 2 through 5

6.4.2 Splitting Trees

After a hole forms from the outer perimeter, it may split a number of times before disappearing. This behavior is captured by a splitting tree [38], which tracks the times that division occurs and the resulting polycubes. Note that these splitting times correspond to births of intervals in the \((d-1)\)-dimensional persistent homology. In Fig. 11, we show the splitting tree of the two-dimensional hole depicted in Fig. 10. We do not perform an in-depth analysis of this data, but we propose that the “reverse process” that produces this splitting tree is of interest. More precisely, P(t) evolves by the reverse process with initial condition P(0) if P(t) is determined from \(P(t-1)\) by uniformly removing one of the tiles adjacent to the perimeter. This is equivalent to applying the Eden growth process to the complement of P(0).
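One step of the reverse process can be sketched as follows in Python, for the two-dimensional case. This is an illustration under our reading of the definition above: a tile of P is taken to be "adjacent to the perimeter" when it shares an edge with a tile outside P, which matches the description as Eden growth on the complement. All function names are hypothetical.

```python
import random


def face_neighbors(tile):
    """The four tiles sharing an edge with the given tile."""
    x, y = tile
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def boundary_tiles(hole):
    """Tiles of the hole with at least one edge-neighbor outside the hole."""
    return {t for t in hole if any(nb not in hole for nb in face_neighbors(t))}


def reverse_step(hole, rng):
    """Remove one uniformly chosen boundary tile; return (new_hole, removed_tile)."""
    candidates = sorted(boundary_tiles(hole))  # sorted for reproducibility
    removed = rng.choice(candidates)
    return hole - {removed}, removed


def components(tiles):
    """Edge-connected components of a set of tiles; more than one means a split."""
    remaining = set(tiles)
    parts = []
    while remaining:
        stack = [remaining.pop()]
        part = set(stack)
        while stack:
            t = stack.pop()
            for nb in face_neighbors(t):
                if nb in remaining:
                    remaining.remove(nb)
                    part.add(nb)
                    stack.append(nb)
        parts.append(part)
    return parts


if __name__ == "__main__":
    rng = random.Random(0)
    hole = {(x, 0) for x in range(5)}  # a 5 x 1 hole
    while hole:
        hole, removed = reverse_step(hole, rng)
        print(removed, [sorted(part) for part in components(hole)])
```

Recording the times at which `components` returns more pieces than before gives the branch points of a splitting tree.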

Fig. 11 Splitting tree of the two-dimensional hole depicted in Fig. 10. Its birth time is \(t_b=586{,}942\) and its death time is \(t_d = 618{,}185\)