Two Applications of Random Spanning Forests

We use random spanning forests to find, for any Markov process on a finite set of size n and any positive integer $$m \le n$$m≤n, a probability law on the subsets of size m such that the mean hitting time of a random target that is drawn from this law does not depend on the starting point of the process. We use the same random forests to give probabilistic insights into the proof of an algebraic result due to Micchelli and Willoughby and used by Fill and by Miclo to study absorption times and convergence to equilibrium of reversible Markov chains. We also introduce a related coalescence and fragmentation process that leads to a number of open questions.


Well-Distributed Points and Random Spanning Forests
Let X = (X (t) : t ≥ 0) be an irreducible continuous-time Markov process on a finite set X with size |X | = n. It is known, see, for example, Lemma 10.8 in [9], that if R ∈ X is chosen according to the equilibrium measure μ of the process, then the mean value of the hitting time E[E x [T R ]]-where E x [·] stands for the mean value according to the law of the process started at x and E[·] stands for the mean value according to the law of R-does not depend on the starting point x ∈ X . More generally, if a random subset R ⊂ X of any (possibly random) size has such a property, then we say that the law of R provides well-distributed points. One of our motivations for building such random sets was to find appropriate subsampling points for signal processing on arbitrary networks, in connection with intertwining equations and metastability studies (cf. [2]). In this paper, we build such a law on the subsets of any given size m ≤ n. This is a trivial problem for m = n, and for m = 1, this property actually characterizes the law of R: in this case, the singleton R has to be chosen according to the equilibrium law.
To solve this problem in the general case, we use random rooted spanning forests, a standard variation-introduced, for example, in [13]-on the well-known uniform spanning tree theme. Let us first denote by w(x, y) the jump rate of X from x to y in X and by G = (X , w) the weighted and oriented graph for which E = {(x, y) ∈ X × X : x = y and w(x, y) > 0} is the edge set and w(e) = w(x, y) is the weight of e = (x, y) in E. A rooted spanning forest φ is a subgraph of G without cycle, with X as set of vertices and such that, for each x ∈ X , there is at most one y ∈ X such that (x, y) is an edge of φ. The root set ρ(φ) of the forest φ is the set of points x ∈ X for which there is no edge (x, y) in φ; the connected component of φ are trees, each of them having edges that are oriented towards its own root. We call F the set of all rooted spanning forests, we see each forest φ in F as a subset of E, and we associate with it the weight In particular, ∅ ∈ F is the spanning forest made of n degenerate trees reduced to simple roots and w(∅) = 1. We can now define our random forests: for each q > 0, the random spanning forest q is a random variable in F with law where the normalizing partition function is We can include the case q = +∞ in our definition by setting ∞ = ∅ ∈ F in a deterministic way. It turns out that both the law of ρ( q ) and the law of ρ( q ) conditioned on the event |ρ( q )| = m , for any 1 ≤ m ≤ n, provide well-distributed points. And we can compute the common value of the mean hitting time in both cases in terms of the eigenvalues of the generator L given by To this end, let us denote by λ 0 , λ 1 , . . . , λ n−1 the eigenvalues of −L and (a k : 0 ≤ k ≤ n) the coefficients of the characteristic polynomial of L, which computed in q is In this formula and all along the paper, we identify scalars with the appropriate multiples of the identity matrix. Recalling that X is irreducible and ordering the eigenvalue by non-decreasing real part, λ 0 is the only one zero eigenvalue, we have a 0 = 0 and we can set a n+1 = 0.

Theorem 1 For all x ∈ X and all positive integer m ≤ n it holds
We prove this theorem in Sect. 3, in which we also compute, in both cases, as a consequence of it and as needed in [2], mean return times to ρ( q ) from a uniformly chosen point in ρ( q ). In doing so, we will see that the problem of finding a distribution that provides exactly m well-distributed points has infinitely many solutions as soon as 2 ≤ m ≤ n − 2 and Theorem 1 simply provides one of them. The only cases when the convex set of solutions reduces to a singleton are the known case m = 1, the easy case m = n − 1 and the trivial one m = n.

Local Equilibria and Random Forests in the Reversible Case
For B ⊂ X we identify the signed measures on X \B with the row vectors ν = ν(x) : x ∈ X \B . For A ⊂ X and any matrix M = M(x, y) : x, y ∈ X , we write [M] A for the submatrix We identify L with its matrix representation with diagonal coefficients −w(x), where and we set The sub-Markovian generator [L] X \B is associated with the process killed in "the boundary" B. We will assume in this section that L is reversible with respect to μ and we write for the ordered eigenvalues of −[L] X \B , with l = |X \B|. We can then inductively define from any probability measure ν = ν l−1 on X \B the a priori signed measures To avoid ambiguity, we set ν k−1 = 0 if λ k,B = 0. The following result is due to Micchelli and Willoughby (see Theorem 3.2 in [11]).

Micchelli and Willoughby's Theorem
If L is reversible, then ν k is a non-negative measure for all non-negative k < l and any probability measure ν on X \B.
Since Eq. (3) can also be written (for k > 0 or B = ∅, so that λ k,B > 0) as and we can identify each ν k with a probability measure on the quotient space X /B for the equivalence relation ∼ B such that x ∼ B y if and only if x = y or x, y ∈ B: we simply put the missing mass on B. Equation (3) has then the following probabilistic interpretation. Starting from ν k , the system decays into ν k−1 after an exponential time of parameter λ k,B , and, more precisely, starting from ν k (·|B), the system remains in this state for an exponential time of parameter λ k,B before decaying into ν k−1 (·|B) or reaching B. This is rigorously established by Fill and Miclo (see [7] and [12]) to control convergence to equilibrium and absorption times of reversible processes, and this is the reason why the ν k s can be described as local equilibria.
The previous probabilistic interpretation makes sense only once the non-negativity of the ν k s is guaranteed by Micchelli and Willoughby's theorem, which is crucial in Fill's and Miclo's analysis. The fully algebraic proof by Micchelli and Willoughby describes the ν k s in terms of some divided differences and uses Cauchy's interlacement theorem in an inductive argument to conclude to positivity. We will show in Sect. 4, on the one hand, that computing the probability of certain events related to our random forests q leads naturally to the divided difference representation of the ν k s, when one has in mind their local equilibria interpretation. This will be done by using Wilson's algorithm, which gives an alternative description of our random forests (see Sect. 2).
On the other hand, our random forest original description will lead to the key formula of the inductive step: from the random forest point of view, this algebraic formula is nothing but a straightforward connection between the previous event probabilities. Section 4 contains the full derivation of Micchelli and Willoughby's theorem.

Coupling Random Forests, Coalescence and Fragmentation
In dealing with practical sampling issues in the next section, we will couple all the q s together in such a way that we will obtain the following side result. Theorem 2 There exists a (non-homogeneous) continuous-time Markov process (F(s) ∈ F : s ≥ 0) that couples together all our random forests q for q > 0 as follows: for all s ≥ 0 and φ ∈ F, it holds with t = 1/q, s = ln(1 + αt) and α as in Eq. (2).
With each spanning forest φ, we can associate a partition P(φ) of X , for which x and y in X belong to the same class when they are in the same tree. We will see in Sect. 2.3 that the coupling t → 1/t = F(ln(1 + αt)) is then associated with a fragmentation and coalescence process, for which coalescence is strongly predominant, and at each jump time, one component of the partition is fragmented into pieces that possibly coalesce with the other components. This coupling will lead to a number of open questions: (1) Is it possible to use this process to sample efficiently q with a prescribed number of roots? (2) Can we use it to estimate the spectrum of L? (3) How to characterize the law of the associated partition process? (See Sect. 2.3 for more details.)

Wilson Algorithm, Partition Function and the Root Process
Let us first slightly extend our notion of random forests. For any B ⊂ X , we denote by q,B a random variable in F with the law of q conditioned on the event B ⊂ ρ( q ) . We then have, for any φ in F, This law is non-degenerate even for q = 0, provided that B is non-empty. And if B is a singleton {r }, then 0,{r } is the usual random spanning tree with a prescribed root r , which can be sampled with Wilson's algorithm (cf. [14]). For q > 0, q = q,∅ itself is also a special case of the usual random spanning tree on an extended weighted graph G = (X ,w) obtained by addition of an extra point r to X -to formX = X ∪{r }-and by settingw(x, r ) = q andw(r, x) = 0 for all x in X . Indeed, to get q from the usual random spanning tree onX , with the root in r , one only needs to remove all the edges going from X to r . Following Propp and Wilson (cf. [13]), we can then use Wilson's algorithm to sample q,B for q > 0 or B = ∅: a. start from B 0 = B and φ 0 = ∅, choose x in X \B 0 and set i = 0; b. run the Markov process starting at x up to time T q ∧ T B i with T q an independent exponential random variable with parameter q (so that T q = +∞ if q = 0) and This algorithm is not only a practical algorithm to sample q , but also a powerful tool to analyse its law, one of its main strength points being the fact that the order of the chosen starting points x does not matter. There are at least two ways to prove that this algorithm indeed samples q,B with the desired law, whatever the way in which the starting points x are chosen. One can, on the one hand, follow Wilson's original proof in [14], which makes use of the so-called Diaconis-Fulton stack representation of Markov chains (see Sect. 2.3). One can, on the other hand, follow Marchal who first computes in [10] the law of the loop-erased trajectory x q,B obtained from the random trajectory X : [0, T q ∧ T B ] → X started at x ∈ X \B and stopped in B or at an exponential time T q if T q is smaller than the hitting time T B . One has indeed: From this result, one can compute the law of q,B defined by the previous algorithm to find, after telescopic cancellation, and ensured by the fact that the eigenvalues have non-negative real part. We eventually set and |ρ( q,B )| has the same law as S q,B for q > 0 or B = ∅.
Proof Equation (5) allows to identify Z B by summing on φ in F. The identity in law is obtained by identifying monomials with equal degree in Eq. (7) and by dividing each identity by Z B (q) to get, for any k ≤ l = |X \B|, Since for each j in J + there is j in J − such that λ j ,B =λ j,B and p j =p j , the proof is complete.
The fact that q is the usual random spanning tree on an extended graph implies, through the (non-reversible) transfer current theorem (cf. [4] and [5]), that q,B ⊂ E is a determinantal process and so is ρ( q,B ) ⊂ X . (In the reversible case at least, the fact that the law of |ρ( q,B )| is a convolution of Bernoulli laws is also a consequence of this determinantality property.) Let us give a direct and short proof of the fact that ρ( q,B ) is a determinantal process associated with a remarkable kernel. B ] A\B and we can assume without loss of generality that A ⊂ X \B. By sampling q,B with Wilson's algorithm and choosing as first starting points for the loop-erased random walks the elements of A, we have by Marchal's formula and after telescopic cancellation

Proposition 2.2 For any
The last but one equality, sometimes referred as Jacobi's equality, is obtained from standard manipulations of the Schur complement This formula gives det M = det S det D and identifies S −1 as one block of The previous equality was obtained by taking M = [q − L] X \B and D = [q − L] (X \B)\A (so that S is the sub-Markovian generator of the trace on A of the original process killed in B and at rate q outside B, while S −1 is the associated Green's kernel).
Let us express K q,B in terms of L to conclude this proof. The trajectory of X can be built by updating at each time of a Poisson process of intensity α, defined in Eq. (2), the current position x ∈ X to y ∈ X with probability with the convention Since the probability of reaching the time T q between two successive updating is q/(q + α), we get for any x and y in X \B

And for any
We conclude this section by observing that the Markov chain tree theorem (see, for example, [1]) allows to compute the root distribution when conditioning on P( q ), the partition of X that is associated with q . We write P( q ) = [X 1 , . . . , X m ] if q is made of m trees, each of them spanning one of the X i s. For each i ≤ m, we denote by L i the generator of X restricted to X i , which is defined by and by X i this restricted process. Since, by construction, the root of the spanning tree of X i is reachable by X i from any point in X i , this process admits only one invariant measure μ i , which is equal to μ(·|X i ) (recall that μ is the invariant measure of X ) when X is reversible. If [X 1 , . . . , X m ] is an admissible partition of X , that is if P( q ) = [X 1 , . . . , X m ] with nonzero probability, then, denoting by T i the set of spanning trees of X i and by ρ(τ i ) the root x i ∈ X i of τ i ∈ T i , we can compute for any The Markov chain tree theorem gives then See Fig. 1 for an illustration with the two-dimensional nearest-neighbour random walk in a Brownian sheet potential, which is easy to sample and gives rise to a rich and anisotropic energy landscape.

Sampling Approximately m Roots
While Wilson's algorithm provides a practical way to sample q , we do not have such an algorithm for q conditioned on |ρ( q )| = m . In this section, we explain how to get q with approximately m roots, with an error of order √ m at most. By Proposition 2.1, it suffices to choose q solution of Indeed, But we do not want to solve Eq. (10) analytically, and we do not want to compute the eigenvalues λ j . One way to find an approximate value of the solution q * of Eq. (10) is to use, on the one hand, the fact that q * is the only one stable attractor of the recursive sequence defined by on the other hand, the fact that |ρ( q )| and E |ρ( q )| = j<n q/(q + λ j ) are typically of the same order, at least when E |ρ( q )| , i.e. q, is large enough, since We then propose the following algorithm to sample q with m ± 2 √ m roots.
a. Start from any q 0 > 0, for example To see that this algorithm rapidly produces the desired result, it is convenient to write γ = ln q and introduce the global contraction While f is a contraction in a neighbourhood of q * only, let us show that g is indeed a global contraction. For all γ ∈ R, it holds Then, g (γ ) being the difference between two non-negative terms that are strictly smaller than 1, we have |g (γ )| < 1, for all γ in R. Now, writing for k ≥ 0 there are some non-negative θ k < 1 such that, with γ * = ln q * = g(γ * ), and, by induction on k > 0, Since Chebyshev's inequality gives for all δ > 0 after "a few iterations" (we cannot be more precise in absence of extra information on the spectrum of L that would be needed to give uniform bounds on |g | and the θ k s) the approximation error for γ * is of the same order as k−1 -itself of order 1/ √ m at most-and we get m roots for q within an error of order √ m.

Coupled Forests
Instead of stopping the iterations of the previous algorithm when reaching a forest with m ± 2 √ m roots, one can proceed up to reaching exactly m roots. This typically requires order √ m extra iterations at most, and this is what we have done to obtain the exactly 50 roots of Fig. 1. Starting with q = q 0 larger than the solution q * of Eq. (10), it takes generally much more time to reach exactly m roots than to decrease q down to a good approximation of q * according to the updating procedure q ← q × m/|ρ( q )|. For example, starting from q = 4 for the Metropolis random walk in Brownian sheet potential of Fig. 1, we got 361,782 roots at the first iteration, 51 roots and q = 5.26 × 10 −6 at the tenth iteration, and we needed 55 extra iterations to get exactly 50 roots with q = 4.92 × 10 −6 , getting in the mean time root numbers oscillating between 43 and 59 for q between 3.96 × 10 −6 and 6.07 × 10 −6 . While decreasing q, we produce a number of forests with a larger root number than desired, and, sampling for large q being less time-consuming than sampling for small q, the total running time of the iterations to decrease q to the correct order is essentially of the same order as the running time of one iteration for q of this correct order. This suggests that if we could continuously decrease q in such a way that q would cross all the manifolds then we might be able to find a more efficient algorithm to sample q with a prescribed root number. It turns out that we are able to implement such a "continuously decreasing q algorithm," building in this way the coupling of Theorem 2. But this is not sufficient to improve our sampling algorithm for a prescribed root number.
In this section, we prove Theorem 2, characterize the associated root process and describe the associated coalescence and fragmentation process, which leads to further open questions. This coupling is the natural extension of Wilson's algorithm based on Diaconis and Fulton's stack representation of random walk (cf. [6]) as used by Wilson and Propp in [14] and [13]. Stack representations Assume that an infinite list or collection or arrows is attached to each site of the graph, each arrow pointing towards one of its neighbour. Assume in addition that these arrows are distributed according to the probability kernel P of the discrete-time skeleton of X which is defined by Eqs. (8)- (9). Assume in other words that these arrows are independently distributed at each level of the stacks and that an arrow pointing towards the neighbour y of a given site x appears with probability P(x, y), considering in this context x itself as one of its neighbours. Imagine finally that each list of arrows attached to any site is piled down in such a way that it makes sense to talk of an infinite stack with an arrow on the top of this stack. By using this representation, one can generate the Markov process as follows: at each jump time of a Poisson process with intensity α, our walker steps to the neighbour pointed by the arrow at the top of the stack where it was sitting, and the top arrow is erased from this stack.
To describe Wilson's algorithm for sampling q , one has to introduce a further ingredient: pointers to an absorbing state r in each stack. Such a pointer should independently appear with probability q/(q + α) at each level in the different stacks. One way to introduce it is by generating independent uniform random variables U together with each original arrow in the stacks. We can then replace the latter by a pointer to the absorbing state whenever U < q/(q + α). A possible description of Wilson's algorithm is then the following.
a. Start with a particle on each site. Both particles and sites will be declared either active or frozen. At the beginning, all sites and particles are declared to be active. b. Choose an arbitrary particle among all the active ones and look at the arrow at the top of the stack it is seated on. Call x the site where the particle is seated.
-If the arrow is the pointer to r , declare the particle to be frozen and site x as well.
-If the arrow points towards another site y = x, remove the particle and keep the arrow. We say that this arrow is uncovered. -If the arrow points to x itself, remove the arrow. c. Once again, choose an arbitrary particle among all the active ones, look at the arrow on the top of the stack it is seated on, and call x the site where the particle is seated.
-If the arrow points to r , the particle is declared to be frozen, and so are declared x and all the sites eventually leading to x by following uncovered top pile arrow paths. -If the arrow points to a frozen site, remove the chosen particle at x, keep the (now uncovered) arrow, and freeze the site x as well as any site eventually leading to x by following uncovered top pile arrow paths. -If the arrow points to an active site, then there are two possibilities. By following from this site the uncovered arrows at the top of the stacks, we either reach a different active particle or run in a loop back to x. In the former case, remove the chosen particle from site x and keep the discovered arrow. In the latter case, erase all the arrows along the loop and put an active particle on each site of the loop. Note that this last case includes the possibility for the discovered arrow of pointing to x itself, in which case we just have to remove the discovered arrow. d. Repeat the previous step up to exhaustion of the active particles.
The crucial observation, which is due to Propp and Wilson, is that whatever the choice of active particles all along the algorithm, at the end of the day the same arrows are erased and the same spanning forest of uncovered arrows, with a frozen particle at each root, is obtained. In particular, by choosing at each step the last encountered active particle, or the same as in the previous step when we just erased a loop, we perform a simple loop-erased random walk up to freezing.
Proof of Theorem 2. Since q is sampled for any q by the previously described algorithm and the same uniform variables U can be used for each q, this provides a global coupling for all the q . We first note that this coupling allows to sample q 2 from a sampled q 1 for q 2 < q 1 . Indeed, by running this algorithm for sampling q 2 , one can reach at some point the spanning forest of uncovered arrows q 1 with this difference that the frozen particles of the final configuration obtained with parameter q 1 can be still active at this intermediate step of the algorithm run with q 2 : it suffices to choose the sequence of active particles in the same way with both parameters, and this is possible since each pointer to r in the stacks with parameter q 2 is associated with a pointer to r at the same level in the stacks with parameter q 1 . Thus, to sample q 2 from a sampled q 1 , we just have to replace some frozen particles in ρ( q 1 ) and continue the algorithm with parameter q 2 . To decide which particle has to be unfrozen we can proceed as follows. With probability each particle in ρ( q 1 ), independently from each other, is kept frozen. With probability 1 − p a particle in a site x of ρ( q 1 ) is declared active and we set at the top of the pile in x an arrow that points towards y with probability P(x, y) = w(x, y)/α. When q = 1/t continuously decreases, we obtain a right-continuous process t → 1/t , for which we can practically sample not only the "finite dimensional distributions"-i.e. the law of ( 1/t 1 , . . . , 1/t k ) for any choice of t 1 < · · · < t kbut the whole trajectories ( 1/t : t ≤ t * ) too, for any finite t * . Indeed, at each time t = 1/q, the next frozen particle to become active is uniformly distributed among the m roots at time t, and the next jump time T when it will "wake up" is such that the random variable V = 1/T α + 1/T = 1 1 + αT (13) has the law of the maximum of m independent uniform variables on 0, q/(q + α) = 0, 1/(1 + αt) . Since V has the same law as U 1/m /(1 + αt) with U uniform on [0, 1). Using Eq. (13), we can then sample the next jump time T by solving Setting s = ln(1 + αt) and S = ln(1 + αT ), the random variable S − s has the same law as the minimum of m independent exponential random variables of rate 1. Our Markov process (F(s) ∈ F : s ≥ 0) is then built in the following way. We associate m independent exponential random clocks of rate 1 with the m roots of F(s) at time s. At the first ring time S ≥ s at some root x, we define F(S) by declaring active the particle at x, putting an arrow to y with probability P(x, y) = w(x, y)/α and restarting our algorithm with parameter q = 1/T = α/(e S − 1).
A determinantal formula for the associated root process. Proposition 2.2, from which we recall the definition of the probability kernel K q,B , can be extended to characterize the law of the coupled root process t → ρ( 1/t ).

Proposition 2.4
For all 0 < t 1 < · · · < t k < t k+1 = 1/q k+1 and all A 1 , . . . , A k , A k+1 contained in X , it holds Proof Let us first consider the case k = 1, so that A 1 = A 1 . As far as 1/t is concerned for t > t 1 , conditioning on A 1 ⊂ ρ( 1 ) is nothing but a conditioning on the value of the uniform random variables at the top of the stacks in A 1 . With q 1 = 1/t 1 , we cannot sample q 2 conditioned on A 1 ⊂ ρ( 1 ) by keeping "frozen" each site in A 1 with probability p defined by Eq. (12), calling B the set of the remaining frozen sites and sampling q 2 ,B with this random B ⊂ X , so that the root set would be a determinantal process with kernel K q 2 ,B . The walking up procedure we defined after Eq. (12) indeed introduces a bias in the distribution at the top of the pile for the unfrozen sites: top pile arrows cannot be replaced by pointers to r . To recover a determinantal process with random kernel K q 2 ,B for the conditional root process, the random set B has to be built by keeping frozen each site in A 1 with a smaller probability p solving Top pile arrows of unfrozen sites can then still be replaced by pointers to r with probability q 2 /(q 2 + α), and this equation makes that we recover the correct biased probability. Solving it, we get p = q 2 /q 1 = t 1 /t 2 and Eq. (15). When k is larger than 1, the formula is simply obtained by keeping frozen each site x in i≤k A k with a probability that depends on the largest i such that x ∈ A i . This is the reason why we introduced the sets Fragmentation, coalescence and open questions. At each jump time S = S k+1 of F and in the proof of Theorem 2, there is only one root x to "wake up," which means that there is only one piece of the associated partition into m pieces at the previous jump time S k that can be fragmented into different trees, the other pieces of the previous partition remaining contained in different pieces of the new partition at time S k+1 . At time S k+1 we can have both fragmentation, produced by the loop-erasure procedure, and coalescence: the trees covering the possibly fragmented piece can be eventually grafted to the other m − 1 non-fragmented frozen trees, when their associated looperased random walk freezes by running into these frozen trees.
Fragmentation can increase the total number of pieces up to m + k − 1, with k the number of sites in the tree that is rooted at x: this happens when this tree is completely fragmented and no coalescence occurs. Coalescence can decrease the number of pieces by 1 at most: when each tree of the possibly fragmented piece is eventually grafted to the other pieces. But coalescence strongly dominates the process: as q = 1/t decreases, so does E |ρ( q )| , with limited fluctuations, as a consequence of Proposition 2.1 (cf. Figs. 2, 3, 4). And the fact that when |ρ( q )| decreases, it does so by one unit at most, implies that the process t → 1/t crosses all the manifolds F m defined by Eq. (11).
It is then easy to sample 1/T m with T m the first time when t → 1/t reaches F m . Unfortunately, 1/T m has not the same law as q conditioned on |ρ( q )| = m . One has indeed a counterexample already for n = 2 and m = 1. The random set ρ( T m ) is, generally, not even well distributed: for m = n − 1, there is only one distribution on the subsets of size m that produces well-distributed points. We then get to our first open question Q1: Is there a way to use the process t → 1/t to sample the measure P q ∈ · |ρ( q )| = m ?
One can also use this process to estimate t ≥ 0 → j 1/(1+tλ j ) since this sum is the expected value of |ρ( 1/t )|, which presents limited fluctuations only (see Fig. 4). This leads to our second open question Q2: Is there a way to use the process t → 1/t to estimate in an efficient way the spectrum of −L, or its higher part at least?
Our third open question concerns the law of the "rooted partition" associated with the forest process t → 1/t . (We call it rooted since a special vertex, the root, is associated with each piece of the partition.) Figures 3 and 4 and Wilson's algorithm show that as q = 1/t decreases, the partition process naturally tends to break the space into larger and larger valleys in which the process is trapped on time scale t = 1/q (note that the difference of 12 = 27−15 between the extreme values of s = ln(1+αt) in the right picture of Fig. 4 corresponds to a ratio of order 1.6 × 10 5 between the associated times t). But, while we could characterize in Proposition 2.4 the law of the associated root process, we are far from obtaining a similar result for the rooted partition.
Q3: Which characterization can be given of the rooted partition associated with t → 1/t ? We actually know very little beyond Proposition 2.3 on this partition for a fixed value of q and an easier question would be that of characterizing the law of the forest process itself. Even though Fig. 2 echoes Figure 5 of [3], which illustrates a coalescence process that is also associated with random spanning forests, the two processes are quite different and we do not know the scaling limit of our process, even for a fixed value of q. The process considered in [3] is a pure coalescence process, while fragmentation is also involved in our case; at a fixed time t, the tree number in that case of the uniformly cut uniform spanning tree follows a binomial distribution, while the tree number of our process is distributed as a sum of Bernoulli random variables with nonhomogeneous parameters; and, even conditioned on a same tree number, if the weights of the associated partitions share the same product of unrooted spanning tree number for each piece of the partition, the extra entropic factor depends in that case of these pieces' boundaries, while in our case it is simply given by the product of their size.

Forest Formulas for Hitting Distributions, Green's Kernels and Mean Hitting Time
In order to prove Theorem 1, we first use Wilson's algorithm to give forest representations of hitting distributions, Green's kernels and mean hitting times. Two at least of these formulas, Formula (16) and Formula (17), already appeared in the work of Freidlin and Wentzell (see Lemma 3.2 and Lemma 3.3 in [8]). We recall that T B stands for the hitting time of B ⊂ X , we denote by τ x (φ) the unique maximal tree in φ ∈ F that covers x ∈ X , and we recall that ρ(τ x (φ)) ∈ ρ(φ) is the root of τ x (φ) and recall Eq. (4). By considering Wilson's algorithm for sampling 0,B and choosing x as first starting point for the loop-erased random walk, we first note that for all y ∈ B, it holds We then get the following forest representation of the Green's kernel Lemma 3.1 For any B ⊂ X , x ∈ X and z / ∈ B, it holds We finally get Freidlin and Wentzell's forest representation for mean hitting times: Proof of Lemma 3.1: We introduce once again the discrete skeleton of X -that is the Markov chainX with transition kernel P defined by Eqs. (8)-(9)-and we callĜ B the Green's kernel ofX stopped in B. Let us denote byT z ,T + z andT B the hitting time of z, the return time to z and the hitting time of B for the Markov chainX . Sincê Then, since P y T z < T B = P y X T B∪{z} = z for z / ∈ B, it holds, using Formula (16), .
If we associate with each forest φ such that ρ(φ ) = B the forest φ = φ \{(z, y)} with (z, y) the only edge in φ that is issued from z, then we have

Well-Distributed Roots
Proof of Theorem 1. We first note that for any B ⊂ X , it holds [recall Eq. (4)] and, for 0 < m ≤ n, Then, by using Formula (17), By using again Eq. (18) and Proposition 2.1 with S q = S q,∅ , it follows We conclude this section by computing the mean return time to R = ρ( q ) starting from a uniformly distributed point in R. The reason why we use this heavy double + notation is that we will also consider the maybe less natural but often more useful randomized or skeleton return time T + R , which is defined as follows. Assuming that X is built by updating its current position at each time of a Poisson process of intensity α according to the probability kernel P defined by Eqs. (8)-(9), the skeleton return time is with τ 1 the first updating time in the Poisson process. One always has T R ≤ T + R ≤ T ++ R and, for any x ∈ R, it holds so that with w(x) defined by Eq. (1). Like in the previous proof, we write S q for S q,∅ defined in Eq. (6), and we stress that its law depends on the spectrum of L only.

Proposition 3.2 For all m ≤ n and all q > 0, it holds
where U (ρ( q )) stands for a random point uniformly distributed in ρ( q ).
Proof We work in discrete time, and we use the same notation as in the proof of Lemma 3.1. For any B ⊂ X and x ∈ X , we set When x belongs to B, it holds h B (x) = 0 and, when x / ∈ B, Setting we also have, when x ∈ B, Let us denote by ν any probability measure on the subsets of X that produces well-distributed points. By setting h(x) = B⊂X ν(B)h B (x) for all x ∈ X , we then get Since ν produces well-distributed point, the function h is constant on X , so that Ph (x) = h(x) for all x in X , which implies, together with the previous equality, Since by Theorem 1 we can choose for ν the distribution of ρ( q ) conditioned on |ρ( q )| = m , this proves, together with Eq. (19), after dividing by αm or w(x)m and summing on x ∈ X , the first part of the proposition. The last two equalities simply follow from Proposition 2.1.
x ∈ X , with the additional unknown variable t, to see that the solution is unique.

Re-reading Micchelli and Willoughby's Proof
Before following in Sects. 4.1-4.3 the three main steps of Micchelli and Willoughby's proof, we give some heuristic on the divided difference representation of the ν k = ν x k in the case ν = δ x for any x in X \B. (In the general case, ν is a convex combination of such Dirac masses and we just need to prove the theorem in this special case of a generic Dirac mass.) For 0 ≤ k < l, we have and, if B = ∅, the same formula gives ν x −1 = 0 by Hamilton-Cayley theorem, while, if B = ∅, λ 0,B = 0. Having in mind that each ν k should be interpreted as a local equilibrium from which the system decays into ν k−1 at rate λ k , this means that the system leaves the state ν 0 only to be absorbed in B when B = ∅ (the probability measure on the quotient set X /B that is associated with ν −1 is fully concentrated on B), while ν 0 is the non-decaying global equilibrium μ when B = ∅. Now, by considering Wilson's algorithm with a first loop-erased random walk started from x to sample q,B and by comparing the successive decay times with the exponential random time T q , we get for any y in X \B, using the notation of Proposition 2.2 and the notation recalled before Eq. (16), Let us divide by q and multiply by Z B (q) = det q − [L] X \B both sides of this equation to have a simpler polynomial right-hand side. With W B (q) the matrix defined by or equivalently, since we have seen in the proof of Proposition 2.2 that and suggests a divided difference representation for the ν k s according to the following definition.
for all x in R and Q(x k ) = f (x k ) for k < l.
Remark These divided differences f [x 0 , x 1 , . . . , x k ] are given by Lagrange interpolation formula which shows that divided differences are permutation invariant, and one can also compute them inductively with Newton interpolation formulas which explain the terminology. See, for example, Chapter II of [15] for more details.
To prove that our ν x k s are non-negative measures, we can assume without loss of generality that the eigenvalues λ k,B are all distinct (we can modify them slightly and use the continuity of the ν x k s). Equation (24) suggests in this case that for all k < l and x in X It is worth noting at this point that Eq. (28) would be a consequence of the theorem by Micchelli and Willoughby (the local equilibrium interpretation of each ν k makes sense only once its non-negativity is established), but our goal is to prove this theorem. This is what we are ready to do now.

Checking Eq. (28)
We simply check that the right-hand side and the left-hand side of Eq. (28) act in the same way on each left-eigenvector μ r of [L] X \B , with μ r [L] X \B = −λ r,B μ r . Recall that W B (q) is equivalently defined by Eq. (22) or Eq. (23). By using the latter and Formula (25), it holds, for each r < l and k < l, Since each numerator in the right-hand side is equal to 0 unless j = r , which can happen only if r ≤ k, we get

A Combinatorial Identity
The key point of the proof lies in the following lemma.
The theorem is eventually proven by showing by induction on l = |X \B| the stronger statement (recall Formula (27)): for all ξ 0 > ξ 1 > · · · > ξ L−1 with L ≥ l and such that ξ j ≥ −λ j,B for j < l. The claim is obvious for l = 1, and we distinguish to cases for l > 1. If x = y, we do not need the inductive hypothesis: it follows from Formula (29) that by Cauchy interlacement theorem-this is where the reversibility hypothesis mattersξ j ≥ λ j,B implies ξ j ≥ λ j,B∪{x} and the non-negativity follows from the lemma. If x = y, the claim follows in the same way from Formula (30) and the inductive hypothesis.