# Two Applications of Random Spanning Forests

## Abstract

We use random spanning forests to find, for any Markov process on a finite set of size *n* and any positive integer \(m \le n\), a probability law on the subsets of size *m* such that the mean hitting time of a random target that is drawn from this law does not depend on the starting point of the process. We use the same random forests to give probabilistic insights into the proof of an algebraic result due to Micchelli and Willoughby and used by Fill and by Miclo to study absorption times and convergence to equilibrium of reversible Markov chains. We also introduce a related coalescence and fragmentation process that leads to a number of open questions.

### Keywords

Finite networks · Spectral analysis · Spanning forests · Determinantal processes · Random sets · Hitting times · Coalescence and fragmentation · Local equilibria

### Mathematics Subject Classification (2010)

Primary: 05C81 · 60J20 · 15A15; Secondary: 15A18 · 05C85

## 1 Well-Distributed Points, Local Equilibria and Coupled Spanning Forests

### 1.1 Well-Distributed Points and Random Spanning Forests

Let \(X = (X(t) : t \ge 0)\) be an irreducible continuous-time Markov process on a finite set \(\mathcal{X}\) with size \(|\mathcal{X}| = n\). It is known, see, for example, Lemma 10.8 in [9], that if \(R \in \mathcal{X}\) is chosen according to the equilibrium measure \(\mu \) of the process, then the mean value of the hitting time \({{\mathbb {E}}}[E_x[T_R]]\)—where \(E_x[\cdot ]\) stands for the mean value according to the law of the process started at *x* and \({{\mathbb {E}}}[\cdot ]\) stands for the mean value according to the law of *R*—does not depend on the starting point \(x \in \mathcal{X}\). More generally, if a random subset \(R \subset \mathcal{X}\) of any (possibly random) size has such a property, then we say that the law of *R* provides *well-distributed points*. One of our motivations for building such random sets was to find appropriate subsampling points for signal processing on arbitrary networks, in connection with intertwining equations and metastability studies (cf. [2]). In this paper, we build such a law on the subsets of any given size \(m \le n\). This is a trivial problem for \(m = n\), and for \(m = 1\), this property actually characterizes the law of *R*: in this case, the singleton *R* has to be chosen according to the equilibrium law.
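The case \(m = 1\) can be checked by hand on a small example. The following Python sketch is an illustration only: the four-state jump rates `W` and all function names are hypothetical choices, not taken from the paper. It computes mean hitting times exactly with rational arithmetic and compares a target drawn from the equilibrium measure with a target drawn uniformly.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def generator(w):
    """Generator matrix: off-diagonal entries w(x, y), diagonal entries -w(x)."""
    n = len(w)
    return [[w[x][y] if x != y else -sum(w[x][z] for z in range(n) if z != x)
             for y in range(n)] for x in range(n)]


def equilibrium(L):
    """Solve mu L = 0 with sum(mu) = 1 (one balance equation replaced by the normalisation)."""
    n = len(L)
    a = [[L[x][y] for x in range(n)] for y in range(n)]  # transpose of L
    a[-1] = [Fraction(1)] * n
    return solve(a, [Fraction(0)] * (n - 1) + [Fraction(1)])


def mean_hitting_times(L, y):
    """Return [E_x[T_y] for every x]: solve (L h)(x) = -1 for x != y, with h(y) = 0."""
    n = len(L)
    others = [x for x in range(n) if x != y]
    h = solve([[L[x][z] for z in others] for x in others], [Fraction(-1)] * len(others))
    times = [Fraction(0)] * n
    for x, value in zip(others, h):
        times[x] = value
    return times


def mean_time_to_random_target(L, law):
    """Return [sum_y law(y) E_x[T_y] for every starting point x]."""
    n = len(L)
    times = [mean_hitting_times(L, y) for y in range(n)]
    return [sum(law[y] * times[y][x] for y in range(n)) for x in range(n)]


# Hypothetical irreducible, non-reversible jump rates on four states.
W = [[Fraction(v) for v in row] for row in
     [[0, 1, 2, 0], [3, 0, 0, 1], [0, 2, 0, 1], [1, 0, 4, 0]]]

if __name__ == "__main__":
    L = generator(W)
    mu = equilibrium(L)
    print("equilibrium target:", mean_time_to_random_target(L, mu))
    print("uniform target:   ", mean_time_to_random_target(L, [Fraction(1, 4)] * 4))
```

On this example the first list is constant in the starting point, while the second is not, in line with the characterization of the equilibrium law for \(m = 1\).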

We denote by *w*(*x*, *y*) the jump rate of *X* from *x* to *y* in \(\mathcal{X}\) and by \(\mathcal{G} = (\mathcal{X}, w)\) the weighted and oriented graph for which

(*x*, *y*) is an edge of \(\phi \). The root set \(\rho (\phi )\) of the forest \(\phi \) is the set of points \(x \in \mathcal{X}\) for which there is no edge (*x*, *y*) in \(\phi \); the connected components of \(\phi \) are trees, each of them having edges that are oriented towards its own root. We call \(\mathcal{F}\) the set of all rooted spanning forests, we see each forest \(\phi \) in \(\mathcal{F}\) as a subset of \(\mathcal{E}\), and we associate with it the weight

*n* degenerate trees reduced to simple roots and \(w(\emptyset ) = 1\). We can now define our random forests: for each \(q > 0\), the random spanning forest \(\Phi _q\) is a random variable in \(\mathcal{F}\) with law

*L* given by

*L*, which computed in *q* is

*X* is irreducible and ordering the eigenvalues by non-decreasing real part, \(\lambda _0\) is the only zero eigenvalue, we have \(a_0 = 0\) and we can set \(a_{n + 1} = 0\).

### Theorem 1

We prove this theorem in Sect. 3, in which we also compute, in both cases, as a consequence of it and as needed in [2], mean return times to \(\rho (\Phi _q)\) from a uniformly chosen point in \(\rho (\Phi _q)\). In doing so, we will see that the problem of finding a distribution that provides exactly *m* well-distributed points has infinitely many solutions as soon as \(2 \le m \le n - 2\) and Theorem 1 simply provides one of them. The only cases when the convex set of solutions reduces to a singleton are the known case \(m = 1\), the easy case \(m = n -1\) and the trivial one \(m = n\).

### 1.2 Local Equilibria and Random Forests in the Reversible Case

*L* with its matrix representation with diagonal coefficients \(-w(x)\), where

*L* is reversible with respect to \(\mu \) and we write

*a priori* signed measures \(\nu _k\), with \(k < l\), by

### Micchelli and Willoughby’s Theorem

If *L* is reversible, then \(\nu _k\) is a non-negative measure for all non-negative \(k < l\) and any probability measure \(\nu \) on \(\mathcal{X} {\setminus } \mathcal{B}\).

*local equilibria*.

The previous probabilistic interpretation makes sense only once the non-negativity of the \(\nu _k\)s is guaranteed by Micchelli and Willoughby’s theorem, which is crucial in Fill’s and Miclo’s analysis. The fully algebraic proof by Micchelli and Willoughby describes the \(\nu _k\)s in terms of some divided differences and uses Cauchy’s interlacement theorem in an inductive argument to conclude positivity. We will show in Sect. 4, on the one hand, that computing the probability of certain events related to our random forests \(\Phi _q\) leads naturally to the divided difference representation of the \(\nu _k\)s, when one has in mind their local equilibria interpretation. This will be done by using Wilson’s algorithm, which gives an alternative description of our random forests (see Sect. 2). On the other hand, the original description of our random forests will lead to the key formula of the inductive step: from the random forest point of view, this algebraic formula is nothing but a straightforward connection between the previous event probabilities. Section 4 contains the full derivation of Micchelli and Willoughby’s theorem.

### 1.3 Coupling Random Forests, Coalescence and Fragmentation

In dealing with practical sampling issues in the next section, we will couple all the \(\Phi _q\)s together in such a way that we will obtain the following side result.

### Theorem 2

With each spanning forest \(\phi \), we can associate a partition \(\mathcal{P}(\phi )\) of \(\mathcal{X}\), for which *x* and *y* in \(\mathcal{X}\) belong to the same class when they are in the same tree. We will see in Sect. 2.3 that the coupling \(t \mapsto \Phi _{1/t} = F(\ln (1 + \alpha t))\) is then associated with a fragmentation and coalescence process, for which coalescence is strongly predominant, and at each jump time, one component of the partition is fragmented into pieces that possibly coalesce with the other components. This coupling will lead to a number of open questions: (1) Is it possible to use this process to sample efficiently \(\Phi _q\) with a prescribed number of roots? (2) Can we use it to estimate the spectrum of *L*? (3) How to characterize the law of the associated partition process? (See Sect. 2.3 for more details.)

## 2 Preliminary Remarks and Sampling Issues

### 2.1 Wilson Algorithm, Partition Function and the Root Process

*r*, which can be sampled with Wilson’s algorithm (cf. [14]). For \(q > 0\), \(\Phi _q = \Phi _{q, \emptyset }\) itself is also a special case of the usual random spanning tree on an extended weighted graph \(\bar{\mathcal{G}} = (\bar{\mathcal{X}}, {\bar{w}})\) obtained by addition of an extra point *r* to \(\mathcal{X}\) (to form \(\bar{\mathcal{X}} = \mathcal{X} \cup \{r\}\)) and by setting \({\bar{w}}(x, r) = q\) and \({\bar{w}}(r, x) = 0\) for all *x* in \(\mathcal{X}\). Indeed, to get \(\Phi _q\) from the usual random spanning tree on \(\bar{\mathcal{X}}\), with the root in *r*, one only needs to remove all the edges going from \(\mathcal{X}\) to *r*. Following Propp and Wilson (cf. [13]), we can then use Wilson’s algorithm to sample \(\Phi _{q, \mathcal{B}}\) for \(q > 0\) or \(\mathcal{B} \ne \emptyset \):

- a. start from \(\mathcal{B}_0 = \mathcal{B}\) and \(\phi _0 = \emptyset \), choose *x* in \(\mathcal{X} {\setminus } \mathcal{B}_0\) and set \(i = 0\);
- b. run the Markov process starting at *x* up to time \(T_q \wedge T_{\mathcal{B}_i}\) with \(T_q\) an independent exponential random variable with parameter *q* (so that \(T_q = +\infty \) if \(q = 0\)) and \(T_{\mathcal{B}_i}\) the hitting time of \(\mathcal{B}_i\);
- c. with

  $$\begin{aligned} \Gamma ^x_{q, \mathcal{B}_i} = (x_0, x_1, \ldots , x_k) \in \{x\} \times \bigl (\mathcal{X} {\setminus } (\mathcal{B}_i \cup \{x\})\bigr )^{k - 1} \times \bigl (\mathcal{X} {\setminus } \{x\}\bigr ) \end{aligned}$$

  the loop-erased trajectory obtained from \(X : [0, T_q \wedge T_{\mathcal{B}_i}] \rightarrow \mathcal{X}\), set \(\mathcal{B}_{i + 1} = \mathcal{B}_i \cup \{x_0, x_1, \ldots , x_k\}\) and \(\phi _{i + 1} = \phi _i \cup \{(x_0, x_1), (x_1, x_2), \ldots , (x_{k - 1}, x_k)\}\) (so that \(\phi _{i + 1} = \phi _i\) if \(k = 0\));
- d. if \(\mathcal{B}_{i + 1} \ne \mathcal{X}\), choose *x* in \(\mathcal{X} {\setminus } \mathcal{B}_{i + 1}\) and repeat b–d with \(i + 1\) in place of *i*, and, if \(\mathcal{B}_{i + 1} = \mathcal{X}\), set \(\Phi _{q, \mathcal{B}} = \phi _{i + 1}\).

*x* does not matter.
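Steps a–d above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sample_forest`, `roots_of` and `RATES` are hypothetical, the continuous-time process is replaced by its jump chain (from a site *x* with total jump rate `total`, the walk is killed before its next jump with probability `q / (q + total)`, which is how an independent exponential time of parameter *q* acts on the jump chain), and loop erasure is performed by keeping only the last exit recorded at each visited site.

```python
import random


def sample_forest(rates, q, roots=(), rng=random):
    """Sample a rooted spanning forest with Wilson's algorithm.

    rates[x][y] is the jump rate w(x, y), q >= 0 is the killing rate and
    `roots` plays the role of the prescribed root set B (q > 0 or B non-empty
    is needed for termination). Returns a dict mapping each site to its
    successor in the forest, with None for the roots.
    """
    successor = {x: None for x in roots}  # step a: B_0 = B and phi_0 is empty
    for start in rates:
        # Step b: run the walk from `start` until it is killed or hits B_i.
        x = start
        last_exit = {}
        while x not in successor:
            out = rates[x]
            total = sum(out.values())
            if rng.random() * (q + total) < q:
                last_exit[x] = None  # killed here: x becomes a root
                break
            y = rng.choices(list(out), weights=list(out.values()))[0]
            last_exit[x] = y  # overwriting older exits erases the loops
            x = y
        # Step c: add the loop-erased trajectory to the forest.
        x = start
        while x not in successor:
            successor[x] = last_exit[x]
            if last_exit[x] is None:
                break
            x = last_exit[x]
    return successor


def roots_of(successor):
    """Root set of the forest: the sites without outgoing edge."""
    return {x for x, y in successor.items() if y is None}


# Hypothetical irreducible jump rates on five sites.
RATES = {0: {1: 1.0, 2: 0.5}, 1: {2: 1.0, 0: 0.2}, 2: {3: 1.0},
         3: {0: 1.0, 4: 2.0}, 4: {0: 1.0}}

if __name__ == "__main__":
    forest = sample_forest(RATES, q=0.5, rng=random.Random(0))
    print("edges:", {x: y for x, y in forest.items() if y is not None})
    print("roots:", roots_of(forest))
```

The sketch visits the starting points in the iteration order of `rates`; any other order could be used.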

There are at least two ways to prove that this algorithm indeed samples \(\Phi _{q, \mathcal{B}}\) with the desired law, whatever the way in which the starting points *x* are chosen. One can, on the one hand, follow Wilson’s original proof in [14], which makes use of the so-called Diaconis–Fulton stack representation of Markov chains (see Sect. 2.3). One can, on the other hand, follow Marchal who first computes in [10] the law of the loop-erased trajectory \(\Gamma ^x_{q, \mathcal{B}}\) obtained from the random trajectory \(X : [0, T_q \wedge T_{\mathcal{B}}] \rightarrow \mathcal{X}\) started at \(x \in \mathcal{X} {\setminus } \mathcal{B}\) and stopped in \(\mathcal{B}\) or at an exponential time \(T_q\) if \(T_q\) is smaller than the hitting time \(T_\mathcal{B}\). One has indeed:

### Theorem [Marchal]

*j* in \(J_0\), and \(C_j\), with *j* in \(J_+\), such that the \(B_j\)s follow Bernoulli laws and the \(C_j\)s follow convolutions of conjugated “complex Bernoulli laws”:

*j* in \(J_+\). This is equivalent to

### Proposition 2.1

### Proof

*j* in \(J_+\) there is \(j'\) in \(J_-\) such that \(\lambda _{j', \mathcal{B}} = {\bar{\lambda }}_{j, \mathcal{B}}\) and \(p_{j'} = {\bar{p}}_j\), the proof is complete. \(\square \)

The fact that \(\Phi _q\) is the usual random spanning tree on an extended graph implies, through the (non-reversible) transfer current theorem (cf. [4] and [5]), that \(\Phi _{q, \mathcal{B}} \subset \mathcal{E}\) is a determinantal process and so is \(\rho (\Phi _{q, \mathcal{B}}) \subset \mathcal{X}\). (In the reversible case at least, the fact that the law of \(|\rho (\Phi _{q, \mathcal{B}})|\) is a convolution of Bernoulli laws is also a consequence of this determinantality property.) Let us give a direct and short proof of the fact that \(\rho (\Phi _{q, \mathcal{B}})\) is a determinantal process associated with a remarkable kernel.

### Proposition 2.2

### Proof

*D* in a \(2 \times 2\) block matrix

*S* is the sub-Markovian generator of the trace on \(\mathcal{A}\) of the original process killed in \(\mathcal{B}\) and at rate *q* outside \(\mathcal{B}\), while \(S^{-1}\) is the associated Green’s kernel).

*L* to conclude this proof. The trajectory of *X* can be built by updating at each time of a Poisson process of intensity \(\alpha \), defined in Eq. (2), the current position \(x \in \mathcal{X}\) to \(y \in \mathcal{X}\) with probability

*x* and *y* in \(\mathcal{X} {\setminus } \mathcal{B}\)

*m* trees, each of them spanning one of the \(\mathcal{X}_i\)s. For each \(i \le m\), we denote by \(L_i\) the generator of *X* *restricted to* \(\mathcal{X}_i\), which is defined by

*X*) when *X* is reversible. If \([\mathcal{X}_1, \ldots , \mathcal{X}_m]\) is an admissible partition of \(\mathcal{X}\), that is if \(\mathcal{P}(\Phi _q) = [\mathcal{X}_1, \ldots , \mathcal{X}_m]\) with nonzero probability, then, denoting by \(\mathcal{T}_i\) the set of spanning *trees* of \(\mathcal{X}_i\) and by \(\rho (\tau _i)\) the root \(x_i \in \mathcal{X}_i\) of \(\tau _i \in \mathcal{T}_i\), we can compute for any \((x_1, \ldots , x_m)\) in \(\mathcal{X}_1\times \cdots \times \mathcal{X}_m\)

### Proposition 2.3

See Fig. 1 for an illustration with the two-dimensional nearest-neighbour random walk in a Brownian sheet potential, which is easy to sample and gives rise to a rich and anisotropic energy landscape.

### 2.2 Sampling Approximately *m* Roots

*m* roots, with an error of order \(\sqrt{m}\) at most. By Proposition 2.1, it suffices to choose *q* solution of

*q*, is large enough, since \(\mathrm{Var}\bigl (|\rho (\Phi _q)|\bigr ) / {{\mathbb {E}}}^2\bigl [|\rho (\Phi _q)|\bigr ] \le 2 / {{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ]\). We then propose the following algorithm to sample \(\Phi _q\) with \(m \pm 2\sqrt{m}\) roots.

- a. Start from any \(q_0 > 0\), for example \(q_0 = \alpha = \max _{x \in \mathcal{X}} w(x)\), and set \(i = 0\).
- b. Sample \(\Phi _{q_i}\) with Wilson’s algorithm.
- c. If \(|\rho (\Phi _{q_i})| \not \in \bigl [m - 2 \sqrt{m}, m + 2 \sqrt{m}\bigr ]\), set \(q_{i + 1} = m q_i / |\rho (\Phi _{q_i})|\) and repeat b–c with \(i + 1\) instead of *i*; if \(|\rho (\Phi _{q_i})| \in \bigl [m - 2 \sqrt{m}, m + 2 \sqrt{m}\bigr ]\), then return \(\Phi _{q_i}\).
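The updating rule of step c can be sketched as follows. This is a minimal illustration under stated assumptions: `tune_q` only needs a callable `count_roots(q)` returning the number of roots of a fresh sample of \(\Phi _q\), which in practice would call Wilson’s algorithm. To keep the example self-contained, the stand-in `bernoulli_root_count` instead draws the root number as a sum of independent Bernoulli variables with parameters \(q / (q + \lambda _j)\) for a made-up real spectrum; both this stand-in and the spectrum are assumptions of the sketch, and all names are hypothetical.

```python
import math
import random


def tune_q(count_roots, m, q0, max_iter=1000):
    """Iterate q <- m q / |rho(Phi_q)| until the root number is within 2 sqrt(m) of m.

    Returns the accepted q, the accepted root number and the number of samples drawn.
    """
    q = q0
    half_width = 2 * math.sqrt(m)
    for iteration in range(1, max_iter + 1):
        k = count_roots(q)  # step b: one fresh sample
        if abs(k - m) <= half_width:  # step c: accept
            return q, k, iteration
        q = m * q / k  # step c: update q and try again
    raise RuntimeError("no sample with the requested root number was found")


def bernoulli_root_count(spectrum, rng):
    """Stand-in sampler: root number drawn as a sum of Bernoulli(q / (q + lam))."""
    def count_roots(q):
        return sum(rng.random() * (q + lam) < q for lam in spectrum)
    return count_roots


if __name__ == "__main__":
    spectrum = [float(j) for j in range(1000)]  # made-up spectrum, 0 included
    sampler = bernoulli_root_count(spectrum, random.Random(1))
    print(tune_q(sampler, m=50, q0=max(spectrum)))
```

Because the zero eigenvalue contributes a Bernoulli variable of parameter 1, the stand-in always returns at least one root, so the division in the update is safe.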

*f* is a contraction in a neighbourhood of \(q^*\) only, let us show that *g* is indeed a global contraction. For all \(\gamma \in {{\mathbb {R}}}\), it holds

*j* in \(J_+\), we have

*L* that would be needed to give uniform bounds on \(|g'|\) and the \(\theta _k\)s) the approximation error for \(\gamma ^*\) is of the same order as \(\epsilon _{k - 1}\), itself of order \(1/\sqrt{m}\) at most, and we get

*m* roots for \(\Phi _q\) within an error of order \(\sqrt{m}\).

### 2.3 Coupled Forests

*m* roots. This typically requires order \(\sqrt{m}\) extra iterations at most, and this is what we have done to obtain the exactly 50 roots of Fig. 1. Starting with \(q = q_0\) larger than the solution \(q^*\) of Eq. (10), it generally takes much more time to reach exactly *m* roots than to decrease *q* down to a good approximation of \(q^*\) according to the updating procedure \(q \leftarrow q \times m / |\rho (\Phi _q)|\). For example, starting from \(q = 4\) for the Metropolis random walk in Brownian sheet potential of Fig. 1, we got 361,782 roots at the first iteration, 51 roots and \(q = 5.26 \times 10^{-6}\) at the tenth iteration, and we needed 55 extra iterations to get exactly 50 roots with \(q = 4.92 \times 10^{-6}\), getting in the meantime root numbers oscillating between 43 and 59 for *q* between \(3.96 \times 10^{-6}\) and \(6.07 \times 10^{-6}\). While decreasing *q*, we produce a number of forests with a larger root number than desired, and, sampling for large *q* being less time-consuming than sampling for small *q*, the total running time of the iterations to decrease *q* to the correct order is essentially of the same order as the running time of one iteration for *q* of this correct order. This suggests that if we could continuously decrease *q* in such a way that \(\Phi _q\) would cross all the manifolds

*q* algorithm,” building in this way the coupling of Theorem 2. But this is not sufficient to improve our sampling algorithm for a prescribed root number.

In this section, we prove Theorem 2, characterize the associated root process and describe the associated coalescence and fragmentation process, which leads to further open questions. This coupling is the natural extension of Wilson’s algorithm based on Diaconis and Fulton’s stack representation of random walk (cf. [6]) as used by Wilson and Propp in [14] and [13].

*Stack representations* Assume that an infinite list or collection of arrows is attached to each site of the graph, each arrow pointing towards one of its neighbours. Assume in addition that these arrows are distributed according to the probability kernel *P* of the discrete-time skeleton of *X* which is defined by Eqs. (8)–(9). Assume in other words that these arrows are independently distributed at each level of the stacks and that an arrow pointing towards the neighbour *y* of a given site *x* appears with probability *P*(*x*, *y*), considering in this context *x* itself as one of its neighbours. Imagine finally that each list of arrows attached to any site is piled down in such a way that it makes sense to talk of an infinite stack with an arrow on the top of this stack. By using this representation, one can generate the Markov process as follows: at each jump time of a Poisson process with intensity \(\alpha \), our walker steps to the neighbour pointed to by the arrow at the top of the stack where it was sitting, and the top arrow is erased from this stack.

*r* in each stack. Such a pointer should independently appear with probability \(q/(q + \alpha )\) at each level in the different stacks. One way to introduce it is by generating independent uniform random variables *U* together with each original arrow in the stacks. We can then replace the latter by a pointer to the absorbing state whenever \(U < q / (q + \alpha )\). A possible description of Wilson’s algorithm is then the following.

- a. Start with a particle on each site. Both particles and sites will be declared either *active* or *frozen*. At the beginning, all sites and particles are declared to be active.
- b. Choose an arbitrary particle among all the active ones and look at the arrow at the top of the stack it is seated on. Call *x* the site where the particle is seated.
  - If the arrow is the pointer to *r*, declare the particle to be frozen and site *x* as well.
  - If the arrow points towards another site \(y \ne x\), remove the particle and keep the arrow. We say that this arrow is *uncovered*.
  - If the arrow points to *x* itself, remove the arrow.
- c. Once again, choose an arbitrary particle among all the active ones, look at the arrow on the top of the stack it is seated on, and call *x* the site where the particle is seated.
  - If the arrow points to *r*, the particle is declared to be frozen, and so are declared *x* and all the sites eventually leading to *x* by following uncovered top pile arrow paths.
  - If the arrow points to a frozen site, remove the chosen particle at *x*, keep the (now uncovered) arrow, and freeze the site *x* as well as any site eventually leading to *x* by following uncovered top pile arrow paths.
  - If the arrow points to an active site, then there are two possibilities. By following from this site the uncovered arrows at the top of the stacks, we either reach a different active particle or run in a loop back to *x*. In the former case, remove the chosen particle from site *x* and keep the discovered arrow. In the latter case, erase all the arrows along the loop and put an active particle on each site of the loop. Note that this last case includes the possibility for the discovered arrow of pointing to *x* itself, in which case we just have to remove the discovered arrow.
- d. Repeat the previous step up to exhaustion of the active particles.

*the same spanning forest of uncovered arrows, with a frozen particle at each root, is obtained*. In particular, by choosing at each step the last encountered active particle, or the same as in the previous step when we just erased a loop, we perform a simple loop-erased random walk up to freezing.
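The stacks with pointers to *r* used in this description can be sketched as follows. This is a minimal illustration, with hypothetical names (`make_stack`, `read_level`, `KERNEL`) and a finite stack depth in place of the infinite stacks of the text. Each level stores a uniform variable *U* together with an arrow drawn from \(P(x, \cdot )\), and the level is read as a pointer to *r* whenever \(U < q / (q + \alpha )\). Reading the same stacks with two different values of *q* shows the monotonicity on which the coupling relies: lowering *q* can only turn pointers to *r* back into arrows.

```python
import random

R = "r"  # label of the absorbing state


def make_stack(x, kernel, depth, rng):
    """Draw `depth` levels (U, arrow) for site x, the arrows following P(x, .)."""
    targets = list(kernel[x])
    weights = [kernel[x][y] for y in targets]
    return [(rng.random(), rng.choices(targets, weights=weights)[0])
            for _ in range(depth)]


def read_level(level, q, alpha):
    """Entry seen with parameter q: a pointer to r if U < q / (q + alpha), else the arrow."""
    u, arrow = level
    return R if u < q / (q + alpha) else arrow


# Hypothetical skeleton kernel P on three sites (a site may point to itself).
KERNEL = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.25, 2: 0.5}, 2: {1: 1.0}}

if __name__ == "__main__":
    rng = random.Random(2)
    alpha = 2.0
    stack = make_stack(1, KERNEL, depth=10, rng=rng)
    for q in (4.0, 1.0, 0.25):
        print(q, [read_level(level, q, alpha) for level in stack])
```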

### Proof of Theorem 2

*q* by the previously described algorithm and the same uniform variables *U* can be used for each *q*, this provides a global coupling for all the \(\Phi _q\). We first note that this coupling allows us to sample \(\Phi _{q_2}\) from a sampled \(\Phi _{q_1}\) for \(q_2 < q_1\). Indeed, by running this algorithm for sampling \(\Phi _{q_2}\), one can reach at some point the spanning forest of uncovered arrows \(\Phi _{q_1}\) with this difference that the frozen particles of the final configuration obtained with parameter \(q_1\) can be still active at this intermediate step of the algorithm run with \(q_2\): it suffices to choose the sequence of active particles in the same way with both parameters, and this is possible since each pointer to *r* in the stacks with parameter \(q_2\) is associated with a pointer to *r* at the same level in the stacks with parameter \(q_1\). Thus, to sample \(\Phi _{q_2}\) from a sampled \(\Phi _{q_1}\), we just have to replace some frozen particles in \(\rho (\Phi _{q_1})\) and continue the algorithm with parameter \(q_2\). To decide which particle has to be unfrozen we can proceed as follows. With probability

*x* of \(\rho (\Phi _{q_1})\) is declared active and we set at the top of the pile in *x* an arrow that points towards *y* with probability \(P(x, y) = w(x, y) / \alpha \).

*m* roots at time *t*, and the next jump time *T* when it will “wake up” is such that the random variable

*m* independent uniform variables on \(\bigl [0, q / (q + \alpha )\bigr ) = \bigl [0, 1 / (1 + \alpha t)\bigr )\). Since

*V* has the same law as \(U^{1 / m} / (1 + \alpha t)\) with *U* uniform on [0, 1). Using Eq. (13), we can then sample the next jump time *T* by solving

*m* independent exponential random variables of rate 1.

Our Markov process \((F(s) \in \mathcal{F} : s \ge 0)\) is then built in the following way. We associate *m* independent exponential random clocks of rate 1 with the *m* roots of *F*(*s*) at time *s*. At the first ring time \(S \ge s\) at some root *x*, we define *F*(*S*) by declaring active the particle at *x*, putting an arrow to *y* with probability \(P(x, y) = w(x, y) / \alpha \) and restarting our algorithm with parameter \(q = 1 / T = \alpha / (e^S - 1)\). \(\square \)
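For reference, the time change used in this construction can be written out explicitly. The relations below only restate \(s = \ln (1 + \alpha t)\) and \(q = 1 / t\) from the text, with no additional assumption:

$$\begin{aligned} s = \ln (1 + \alpha t) \iff t = \frac{e^s - 1}{\alpha } \iff q = \frac{1}{t} = \frac{\alpha }{e^s - 1}, \qquad \frac{q}{q + \alpha } = \frac{1}{1 + \alpha t} = e^{-s}. \end{aligned}$$

In particular, a stack level with uniform variable *U* carries a pointer to *r* exactly when \(U < e^{-s}\), that is when \(s < -\ln U\), and \(-\ln U\) is an exponential random variable of rate 1. This is one way to see why rate 1 exponential clocks appear in the time scale *s*.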

*A determinantal formula for the associated root process.* Proposition 2.2, from which we recall the definition of the probability kernel \(K_{q, \mathcal{B}}\), can be extended to characterize the law of the coupled root process \(t \mapsto \rho (\Phi _{1 / t})\).

### Proposition 2.4

### Proof

*cannot* sample \(\Phi _{q_2}\) conditioned on \(\bigl \{\mathcal{A}_1 \subset \rho (\Phi _1)\bigr \}\) by keeping “frozen” each site in \(\mathcal{A}_1\) with probability *p* defined by Eq. (12), calling \(\mathcal{B}\) the set of the remaining frozen sites and sampling \(\Phi _{q_2, \mathcal{B}}\) with this random \(\mathcal{B} \subset \mathcal{X}\), so that the root set would be a determinantal process with kernel \(K_{q_2, \mathcal{B}}\). The waking up procedure we defined after Eq. (12) indeed introduces a bias in the distribution at the top of the pile for the unfrozen sites: top pile arrows cannot be replaced by pointers to *r*. To recover a determinantal process with random kernel \(K_{q_2, \mathcal{B}}\) for the conditional root process, the random set \(\mathcal{B}\) has to be built by keeping frozen each site in \(\mathcal{A}_1\) with a smaller probability \(p'\) solving

*r* with probability \(q_2 / (q_2 + \alpha )\), and this equation ensures that we recover the correct biased probability. Solving it, we get \(p' = q_2/q_1 = t_1/t_2\) and Eq. (15).

When *k* is larger than 1, the formula is simply obtained by keeping frozen each site *x* in \(\bigcup _{i \le k} \mathcal{A}_i\) with a probability that depends on the largest *i* such that \(x \in \mathcal{A}_i\). This is the reason why we introduced the sets \(\mathcal{A}'_i\): \(i^*\) is the largest *i* such that \(x \in \mathcal{A}_i\) if and only if \(x \in \mathcal{A}'_{i^*}\). \(\square \)

*Fragmentation, coalescence and open questions.* At each jump time \(S = S_{k + 1}\) of *F* and in the proof of Theorem 2, there is only one root *x* to “wake up,” which means that there is only one piece of the associated partition into *m* pieces at the previous jump time \(S_k\) that can be fragmented into different trees, the other pieces of the previous partition remaining contained in different pieces of the new partition at time \(S_{k + 1}\). At time \(S_{k + 1}\) we can have both fragmentation, produced by the loop-erasure procedure, and coalescence: the trees covering the possibly fragmented piece can be eventually grafted to the other \(m - 1\) non-fragmented frozen trees, when their associated loop-erased random walk freezes by running into these frozen trees.

*k* the number of sites in the tree that is rooted at *x*: this happens when this tree is completely fragmented and no coalescence occurs. Coalescence can decrease the number of pieces by 1 at most: when each tree of the possibly fragmented piece is eventually grafted to the other pieces. But coalescence strongly dominates the process: as \(q = 1 / t\) decreases, so does \({{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ]\), with limited fluctuations, as a consequence of Proposition 2.1 (cf. Figs. 2, 3, 4). And the fact that when \(|\rho (\Phi _q)|\) decreases, it does so by one unit at most, implies that the process \(t \mapsto \Phi _{1 / t}\) crosses all the manifolds \(\mathcal{F}_m\) defined by Eq. (11).

*m* that produces well-distributed points. We then get to our first open question:

- Q1: Is there a way to use the process \(t \mapsto \Phi _{1 / t}\) to sample the measure \({{\mathbb {P}}}\bigl (\Phi _q \in \cdot \bigm | |\rho (\Phi _q)| = m\bigr )\)?

- Q2: Is there a way to use the process \(t \mapsto \Phi _{1 / t}\) to estimate in an efficient way the spectrum of \(-L\), or its higher part at least?

*rooted* since a special vertex, the root, is associated with each piece of the partition.) Figures 3 and 4 and Wilson’s algorithm show that as \(q = 1 / t\) decreases, the partition process naturally tends to break the space into larger and larger valleys in which the process is trapped on time scale \(t = 1 / q\) (note that the difference of \(12 = 27 - 15\) between the extreme values of \(s = \ln (1 + \alpha t)\) in the right picture of Fig. 4 corresponds to a ratio of order \(1.6 \times 10^5\) between the associated times *t*). But, while we could characterize in Proposition 2.4 the law of the associated root process, we are far from obtaining a similar result for the rooted partition.

- Q3: Which characterization can be given of the rooted partition associated with \(t \mapsto \Phi _{1 / t}\)?

*q* and an easier question would be that of characterizing the law of the forest process itself. Even though Fig. 2 echoes Figure 5 of [3], which illustrates a coalescence process that is also associated with random spanning forests, the two processes are quite different and we do not know the scaling limit of our process, even for a fixed value of *q*. The process considered in [3] is a pure coalescence process, while fragmentation is also involved in our case; at a fixed time *t*, the tree number in that case of the uniformly cut uniform spanning tree follows a binomial distribution, while the tree number of our process is distributed as a sum of Bernoulli random variables with non-homogeneous parameters; and, even conditioned on a same tree number, if the weights of the associated partitions share the same product of unrooted spanning tree number for each piece of the partition, the extra entropic factor depends in that case on these pieces’ boundaries, while in our case it is simply given by the product of their size.

## 3 Hitting Times

### 3.1 Forest Formulas for Hitting Distributions, Green’s Kernels and Mean Hitting Time

In order to prove Theorem 1, we first use Wilson’s algorithm to give forest representations of hitting distributions, Green’s kernels and mean hitting times. At least two of these formulas, Formula (16) and Formula (17), already appeared in the work of Freidlin and Wentzell (see Lemma 3.2 and Lemma 3.3 in [8]).

*x* as first starting point for the loop-erased random walk, we first note that for all \(y \in \mathcal{B}\), it holds

### Lemma 3.1

### Proof of Lemma 3.1:

*X* (that is, the Markov chain \({\hat{X}}\) with transition kernel *P* defined by Eqs. (8)–(9)) and we call \({\hat{G}}_\mathcal{B}\) the Green’s kernel of \({\hat{X}}\) stopped in \(\mathcal{B}\). Let us denote by \({\hat{T}}_z\), \({\hat{T}}_z^+\) and \({\hat{T}}_\mathcal{B}\) the hitting time of *z*, the return time to *z* and the hitting time of \(\mathcal{B}\) for the Markov chain \({\hat{X}}\). Since \({\hat{G}}_\mathcal{B}(x, z) = P_x\bigl ({\hat{T}}_z< {\hat{T}}_\mathcal{B}\bigr ){\hat{G}}_\mathcal{B}(z, z) = P_x\bigl ({\hat{T}}_z < {\hat{T}}_\mathcal{B}\bigr ) / P_z\bigl ({\hat{T}}_z^+ > {\hat{T}}_\mathcal{B}\bigr )\), it holds

(*z*, *y*) the only edge in \(\phi '\) that is issued from *z*, then we have \(\rho (\phi ) = \mathcal{B} \cup \{z\}\), \(\rho (\tau _y(\phi )) \ne z\), \(w(\phi ') = w(\phi )w(z, y)\) and we recognize \(\sum _{\phi '} w(\phi ') {{\mathbb {1}}}_{\{\rho (\phi ') = \mathcal{B}\}} = Z_\mathcal{B}(0)\) in the last denominator. \(\square \)

### 3.2 Well-Distributed Roots

### Proof of Theorem 1

*R*. The reason why we use this heavy double \(^+\) notation is that we will also consider the maybe less natural but often more useful *randomized* or *skeleton return time* \(T_R^+\), which is defined as follows. Assuming that *X* is built by updating its current position at each time of a Poisson process of intensity \(\alpha \) according to the probability kernel *P* defined by Eqs. (8)–(9), the skeleton return time is

*w*(*x*) defined by Eq. (1). Like in the previous proof, we write \(S_q\) for \(S_{q, \emptyset }\) defined in Eq. (6), and we stress that its law depends on the spectrum of *L* only.

### Proposition 3.2

### Proof

*x* belongs to \(\mathcal{B}\), it holds \(h_\mathcal{B}(x) = 0\) and, when \(x \not \in \mathcal{B}\),

*h* is constant on \(\mathcal{X}\), so that \(\bigl (Ph\bigr )(x) = h(x)\) for all *x* in \(\mathcal{X}\), which implies, together with the previous equality,

*w*(*x*)

*m* and summing on \(x \in \mathcal{X}\), the first part of the proposition. The last two equalities simply follow from Proposition 2.1.

*h* is harmonic on \(\mathcal{X}\) if and only if it is constant, the previous arguments actually show that any distribution \(\nu \) on the subsets \(\mathcal{B}\) of \(\mathcal{X}\) provides well-distributed points if and only if it satisfies Eq. (20), which is actually a list of \(n = |\mathcal{X}|\) equations the \(\nu (\mathcal{B})\)s have to satisfy, together with the positivity constraints \(\nu (\mathcal{B}) \ge 0\) for all \(\mathcal{B} \subset \mathcal{X}\) and the additional equation \(\sum _\mathcal{B} \nu (\mathcal{B}) = 1\). If we restrict ourselves to distributions supported by sets of a fixed size *m*, these are \(n + 1\) linear equations for \(\binom{n}{m}\) unknown variables. Since \(\binom{n}{m} > n + 1\) for \(2 \le m \le n -2\) and Theorem 1 provides a solution \(\nu \) with positive mass \(\nu (\mathcal{B}) > 0\) for each subset \(\mathcal{B}\) of size *m*, this shows that there are infinitely many solutions in this case. When \(m = 1\) or \(m = n - 1\), we have more equations than variables. If \(m = 1\), then solving Eq. (20) is straightforward and we have a unique solution. If \(m = n - 1\), then it is more convenient to solve directly the equation set

*t*, to see that the solution is unique.
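The counting behind this discussion is easy to verify numerically. The short Python check below is an illustration only (the function name and the range of *n* explored are arbitrary choices): it compares the number of subsets of size *m*, that is the number of unknowns, with the \(n + 1\) linear equations.

```python
from math import comb


def unknowns_and_equations(n, m):
    """Number of unknowns nu(B) with |B| = m, and number of linear equations."""
    return comb(n, m), n + 1


if __name__ == "__main__":
    for n in range(4, 9):
        row = [unknowns_and_equations(n, m)[0] for m in range(1, n)]
        print(f"n = {n}: equations = {n + 1}, unknowns for m = 1..n-1: {row}")
```

For every *n* explored, the unknowns outnumber the equations exactly when \(2 \le m \le n - 2\), while for \(m = 1\) and \(m = n - 1\) there are *n* unknowns for \(n + 1\) equations.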

## 4 Re-reading Micchelli and Willoughby’s Proof

*x* in \(\mathcal{X} {\setminus } \mathcal{B}\). (In the general case, \(\nu \) is a convex combination of such Dirac masses and we just need to prove the theorem in this special case of a generic Dirac mass.) For \(0 \le k < l\), we have

*x* to sample \(\Phi _{q, \mathcal{B}}\) and by comparing the successive decay times with the exponential random time \(T_q\), we get for any *y* in \(\mathcal{X} {\setminus } \mathcal{B}\), using the notation of Proposition 2.2 and the notation recalled before Eq. (16),

*q* and multiply by \(Z_\mathcal{B}(q) = \det \bigl (q -[L]_{\mathcal{X} {\setminus } \mathcal{B}}\bigr )\) both sides of this equation to have a simpler polynomial right-hand side. With \(W_\mathcal{B}(q)\) the matrix defined by

### Definition 4.1

For any function *f* defined on \({{\mathbb {R}}}\) and with values in a real vector space, if \(x_0\), \(x_1, \ldots , x_{l - 1}\) are distinct real numbers, the divided differences \(f[x_0]\), \(f[x_0, x_1], \ldots , f[x_0, \ldots , x_{l - 1}]\) are the coefficients of the unique polynomial *Q* of degree less than *l* such that

\[ Q(x) = f[x_0] + f[x_0, x_1](x - x_0) + \cdots + f[x_0, \ldots , x_{l - 1}](x - x_0) \cdots (x - x_{l - 2}) \]

for all *x* in \({{\mathbb {R}}}\) and \(Q(x_k) = f(x_k)\) for \(k < l\).
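As a small illustration of Definition 4.1, the following Python sketch computes divided differences with the standard recursive table and evaluates the interpolating polynomial in Newton form, \(Q(x) = \sum _{k < l} f[x_0, \ldots , x_k] \prod _{j < k} (x - x_j)\). The helper names `divided_differences` and `newton_eval` are ours, and only scalar-valued functions are handled, in exact rational arithmetic. For values in a real vector space the same recursion would apply componentwise.

```python
from fractions import Fraction

def divided_differences(f, xs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,x_{l-1}]] for distinct nodes xs."""
    xs = [Fraction(x) for x in xs]
    if len(set(xs)) != len(xs):
        raise ValueError("nodes must be distinct")
    column = [Fraction(f(x)) for x in xs]  # order-0 differences f[x_k]
    coeffs = [column[0]]
    for order in range(1, len(xs)):
        # After this step, column[i] equals f[x_i, ..., x_{i+order}].
        column = [(column[i + 1] - column[i]) / (xs[i + order] - xs[i])
                  for i in range(len(xs) - order)]
        coeffs.append(column[0])
    return coeffs

def newton_eval(coeffs, xs, x):
    """Evaluate Q(x) = sum_k coeffs[k] * prod_{j<k} (x - xs[j])."""
    x = Fraction(x)
    total, basis = Fraction(0), Fraction(1)
    for c, node in zip(coeffs, xs):
        total += c * basis
        basis *= x - Fraction(node)
    return total

nodes = [0, 1, 3]
cube = lambda x: x ** 3
coeffs = divided_differences(cube, nodes)
```

With the nodes 0, 1, 3 and \(f(x) = x^3\), the divided differences are 0, 1, 4, so that \(Q(x) = x + 4x(x - 1)\), which agrees with *f* at the three nodes.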

### Remark

*x* in \(\mathcal{X}\)

*consequence* of the theorem by Micchelli and Willoughby (the local equilibrium interpretation of each \(\nu _k\) makes sense only once its non-negativity is established), but our goal is to *prove* this theorem. This is what we are ready to do now.

### 4.1 Checking Eq. (28)

### 4.2 A Combinatorial Identity

The key point of the proof lies in the following lemma.

### Lemma 4.2

### Proof

(*x*, *y*) belongs to \(\phi \), and \(\phi '' = \phi {\setminus } \{(x, z), (z', y)\}\) if *x* is connected in \(\phi \) to *y* through \((x, z) \in \phi \) and \((z', y) \in \phi \), possibly with \(z = z'\). Since \(|\rho (\phi ')| = |\rho (\phi )| + 1\) and \(|\rho (\phi '')| = |\rho (\phi )| + 2\), Eq. (30) follows from Eqs. (31)–(33). Equation (29) is given by Eq. (31) with *y* in place of *x*. \(\square \)

### 4.3 Conclusion with Cauchy Interlacement Theorem

We will use the following lemma from [11], for which we give an alternative proof.

### Lemma [Micchelli and Willoughby]

Let \(f : x \in {{\mathbb {R}}} \mapsto \prod _{j < l} (x - \alpha _j) \in {{\mathbb {R}}}\) be a polynomial of degree *l* with *l* distinct zeros \(\alpha _0> \alpha _1> \cdots > \alpha _{l - 1}\). Let \(\beta _0> \beta _1> \cdots > \beta _{L-1}\) be \(L \ge l\) real numbers such that \(\beta _j \ge \alpha _j\) for each \(j < l\). Then, for any \(k < L\), \(f[\beta _0, \beta _1, \ldots , \beta _k] \ge 0\).

### Proof

We prove the lemma by induction on \(r = l - k\). First, since *f* is a polynomial of degree *l* with leading coefficient equal to 1, Definition 4.1 gives \(f[\beta _0, \ldots , \beta _k] = 0\) if \(k > l\) (that is, \(r < 0\)) and \(f[\beta _0, \ldots , \beta _k] = 1\) if \(k = l\) (that is, \(r = 0\)), so the claim is established for \(r \le 0\).
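The statement of the lemma and the base case of the induction can be checked on a small example. The sketch below is only a numerical illustration under our own choice of \(\alpha \)s and \(\beta \)s, not part of the proof. It builds the monic polynomial with zeros \(\alpha _j\), verifies the hypotheses on the \(\beta _j\), and computes \(f[\beta _0, \ldots , \beta _k]\) for every *k* for which the nodes are available, using the usual recursion for divided differences. All function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def divided_difference(f, nodes):
    """Divided difference f[x_0, ..., x_k] for distinct nodes, by the standard recursion."""
    if len(nodes) == 1:
        return Fraction(f(nodes[0]))
    upper = divided_difference(f, nodes[1:])
    lower = divided_difference(f, nodes[:-1])
    return (upper - lower) / (Fraction(nodes[-1]) - Fraction(nodes[0]))

def monic_polynomial(roots):
    """Return x -> prod_j (x - roots[j])."""
    return lambda x: prod(Fraction(x) - r for r in roots)

def satisfies_hypotheses(alphas, betas):
    """Strictly decreasing sequences, L >= l, and beta_j >= alpha_j for j < l."""
    decreasing = lambda seq: all(a > b for a, b in zip(seq, seq[1:]))
    return (decreasing(alphas) and decreasing(betas)
            and len(betas) >= len(alphas)
            and all(b >= a for a, b in zip(alphas, betas)))

alphas = [4, 2, 1]          # zeros of f, so l = 3 (illustrative choice)
betas = [5, 3, 2, 0, -1]    # L = 5 nodes with beta_j >= alpha_j for j < 3
f = monic_polynomial(alphas)
values = [divided_difference(f, betas[:k + 1]) for k in range(len(betas))]
```

On this example the successive divided differences are 12, 7, 3, 1, 0: they are all non-negative, the value for \(k = l = 3\) is 1, and the value for \(k = 4 > l\) is 0, in agreement with the base case above.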

## Notes

### Acknowledgements

Part of this work was done during two long visits by A.G. to Leiden, supported by the ERC Advanced Grant 267356-VARIS of Frank den Hollander. L.A. was supported by NWO Gravitation Grant 024.002.003-NETWORKS.

### References

- 1. Anantharam, V., Tsoucas, P.: A proof of the Markov chain tree theorem. Stat. Probab. Lett. **8**(2), 189–192 (1989)
- 2. Avena, L., Castell, F., Gaudillière, A., Mélot, C.: Approximate and exact solutions of intertwining equations through random spanning forests (2017). arXiv:1702.05992
- 3. Benoist, S., Dumaz, L., Werner, W.: Near critical spanning forests and renormalization (2015). arXiv:1503.08093
- 4. Burton, R., Pemantle, R.: Local characteristics, entropy and limit theorems for spanning trees and domino tilings via transfer-impedances. Ann. Probab. **21**, 1329–1371 (1993)
- 5. Chang, Y.: Contribution à l’étude des lacets Markoviens. Thèse de doctorat de l’université Paris Sud - Paris XI, tel-00846462 (2013)
- 6. Diaconis, P., Fulton, W.: A growth model, a game, an algebra, Lagrange inversion, and characteristic classes. Rend. Semin. Mat. Univ. Pol. Torino **49**(1), 95–119 (1991)
- 7. Fill, J.: On hitting times and fastest strong stationary times for skip-free and more general chains. J. Theor. Probab. **22**(3), 558–586 (2009)
- 8. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems, Grundlehren der Mathematischen Wissenschaften 260, 2nd edn. Springer, New York (1998)
- 9. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2008)
- 10. Marchal, P.: Loop-erased random walks, spanning trees and Hamiltonian cycles. Electron. Commun. Probab. **5**, 39–50 (2000)
- 11. Micchelli, C.A., Willoughby, R.A.: On functions which preserve the class of Stieltjes matrices. Linear Algebra Appl. **23**, 141–156 (1979)
- 12. Miclo, L.: On absorption times and Dirichlet eigenvalues. ESAIM Probab. Stat. **14**, 117–150 (2010)
- 13. Propp, J., Wilson, D.: How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. J. Algorithms **27**, 170–217 (1998)
- 14. Wilson, D.: Generating random spanning trees more quickly than the cover time. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pp. 296–303 (1996)
- 15. Whittaker, E.T., Robinson, G.: The Calculus of Observations; A Treatise on Numerical Mathematics. Blackie and Son Limited, London (1924)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.