Two Applications of Random Spanning Forests
Abstract
We use random spanning forests to find, for any Markov process on a finite set of size n and any positive integer \(m \le n\), a probability law on the subsets of size m such that the mean hitting time of a random target that is drawn from this law does not depend on the starting point of the process. We use the same random forests to give probabilistic insights into the proof of an algebraic result due to Micchelli and Willoughby and used by Fill and by Miclo to study absorption times and convergence to equilibrium of reversible Markov chains. We also introduce a related coalescence and fragmentation process that leads to a number of open questions.
Keywords
Finite networks · Spectral analysis · Spanning forests · Determinantal processes · Random sets · Hitting times · Coalescence and fragmentation · Local equilibria
Mathematics Subject Classification (2010)
Primary: 05C81 · 60J20 · 15A15; Secondary: 15A18 · 05C85
1 Well-Distributed Points, Local Equilibria and Coupled Spanning Forests
1.1 Well-Distributed Points and Random Spanning Forests
Let \(X = (X(t) : t \ge 0)\) be an irreducible continuous-time Markov process on a finite set \(\mathcal{X}\) of size \(|\mathcal{X}| = n\). It is known (see, for example, Lemma 10.8 in [9]) that if \(R \in \mathcal{X}\) is chosen according to the equilibrium measure \(\mu \) of the process, then the mean value of the hitting time \({{\mathbb {E}}}[E_x[T_R]]\)—where \(E_x[\cdot ]\) stands for the mean value according to the law of the process started at x and \({{\mathbb {E}}}[\cdot ]\) stands for the mean value according to the law of R—does not depend on the starting point \(x \in \mathcal{X}\). More generally, if a random subset \(R \subset \mathcal{X}\) of any (possibly random) size has this property, then we say that the law of R provides well-distributed points. One of our motivations for building such random sets was to find appropriate subsampling points for signal processing on arbitrary networks, in connection with intertwining equations and metastability studies (cf. [2]). In this paper, we build such a law on the subsets of any given size \(m \le n\). This is a trivial problem for \(m = n\), and for \(m = 1\), this property actually characterizes the law of R: in this case, the singleton R has to be chosen according to the equilibrium law.
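The \(m = 1\) case can be checked numerically. The sketch below works with the discrete-time analogue (the "random target lemma") on a hypothetical irreducible 4-state transition matrix P: it computes every mean hitting time by solving a linear system and verifies that averaging over a target drawn from the equilibrium measure erases the dependence on the starting point.

```python
import numpy as np

# Hypothetical irreducible 4-state discrete-time kernel (rows sum to 1).
P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.4, 0.0, 0.4, 0.2],
              [0.3, 0.3, 0.0, 0.4],
              [0.2, 0.3, 0.5, 0.0]])
n = P.shape[0]

# Equilibrium measure: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

def hitting_times(P, y):
    """E_x[T_y] for every x (zero at x = y): solve (I - P_{-y}) h = 1."""
    n = P.shape[0]
    idx = [i for i in range(n) if i != y]
    h = np.linalg.solve(np.eye(n - 1) - P[np.ix_(idx, idx)], np.ones(n - 1))
    out = np.zeros(n)
    out[idx] = h
    return out

H = np.column_stack([hitting_times(P, y) for y in range(n)])  # H[x, y] = E_x[T_y]
target_avg = H @ pi  # sum_y pi(y) E_x[T_y]: one value per starting point x
# All entries of target_avg coincide: the mean hitting time of a
# pi-distributed target does not depend on the starting point x.
```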
Theorem 1
We prove this theorem in Sect. 3, in which we also compute, in both cases, as a consequence of it and as needed in [2], mean return times to \(\rho (\Phi _q)\) from a uniformly chosen point in \(\rho (\Phi _q)\). In doing so, we will see that the problem of finding a distribution that provides exactly m well-distributed points has infinitely many solutions as soon as \(2 \le m \le n - 2\), and Theorem 1 simply provides one of them. The only cases when the convex set of solutions reduces to a singleton are the known case \(m = 1\), the easy case \(m = n - 1\) and the trivial one \(m = n\).
1.2 Local Equilibria and Random Forests in the Reversible Case
Micchelli and Willoughby’s Theorem
If L is reversible, then \(\nu _k\) is a nonnegative measure for all nonnegative \(k < l\) and any probability measure \(\nu \) on \(\mathcal{X} {\setminus } \mathcal{B}\).
The previous probabilistic interpretation makes sense only once the nonnegativity of the \(\nu _k\)s is guaranteed by Micchelli and Willoughby’s theorem, which is crucial in Fill’s and Miclo’s analysis. The fully algebraic proof by Micchelli and Willoughby describes the \(\nu _k\)s in terms of some divided differences and uses Cauchy’s interlacement theorem in an inductive argument to conclude nonnegativity. We will show in Sect. 4, on the one hand, that computing the probability of certain events related to our random forests \(\Phi _q\) leads naturally to the divided difference representation of the \(\nu _k\)s, when one has in mind their local equilibria interpretation. This will be done by using Wilson’s algorithm, which gives an alternative description of our random forests (see Sect. 2). On the other hand, the original description of our random forests will lead to the key formula of the inductive step: from the random forest point of view, this algebraic formula is nothing but a straightforward connection between the previous event probabilities. Section 4 contains the full derivation of Micchelli and Willoughby’s theorem.
1.3 Coupling Random Forests, Coalescence and Fragmentation
In dealing with practical sampling issues in the next section, we will couple all the \(\Phi _q\)s together in such a way that we will obtain the following side result.
Theorem 2
With each spanning forest \(\phi \), we can associate a partition \(\mathcal{P}(\phi )\) of \(\mathcal{X}\), for which x and y in \(\mathcal{X}\) belong to the same class when they are in the same tree. We will see in Sect. 2.3 that the coupling \(t \mapsto \Phi _{1/t} = F(\ln (1 + \alpha t))\) is then associated with a fragmentation and coalescence process, in which coalescence is strongly predominant: at each jump time, one component of the partition is fragmented into pieces that possibly coalesce with the other components. This coupling will lead to a number of open questions: (1) Is it possible to use this process to efficiently sample \(\Phi _q\) with a prescribed number of roots? (2) Can we use it to estimate the spectrum of L? (3) How can the law of the associated partition process be characterized? (See Sect. 2.3 for more details.)
2 Preliminary Remarks and Sampling Issues
2.1 Wilson's Algorithm, Partition Function and the Root Process
 a.
start from \(\mathcal{B}_0 = \mathcal{B}\) and \(\phi _0 = \emptyset \), choose x in \(\mathcal{X} {\setminus } \mathcal{B}_0\) and set \(i = 0\);
 b.
run the Markov process starting at x up to time \(T_q \wedge T_{\mathcal{B}_i}\) with \(T_q\) an independent exponential random variable with parameter q (so that \(T_q = +\infty \) if \(q = 0\)) and \(T_{\mathcal{B}_i}\) the hitting time of \(\mathcal{B}_i\);
 c.
with the loop-erased trajectory $$\begin{aligned} \Gamma ^x_{q, \mathcal{B}_i} = (x_0, x_1, \ldots , x_k) \in \{x\} \times \bigl (\mathcal{X} {\setminus } (\mathcal{B}_i \cup \{x\})\bigr )^{k - 1} \times \bigl (\mathcal{X} {\setminus } \{x\}\bigr ) \end{aligned}$$ obtained from \(X : [0, T_q \wedge T_{\mathcal{B}_i}] \rightarrow \mathcal{X}\), set \(\mathcal{B}_{i + 1} = \mathcal{B}_i \cup \{x_0, x_1, \ldots , x_k\}\) and \(\phi _{i + 1} = \phi _i \cup \{(x_0, x_1), (x_1, x_2), \ldots , (x_{k - 1}, x_k)\}\) (so that \(\phi _{i + 1} = \phi _i\) if \(k = 0\));
 d.
if \(\mathcal{B}_{i + 1} \ne \mathcal{X}\), choose x in \(\mathcal{X} {\setminus } \mathcal{B}_{i + 1}\) and repeat b–d with \(i + 1\) in place of i, and, if \(\mathcal{B}_{i + 1} = \mathcal{X}\), set \(\Phi _{q, \mathcal{B}} = \phi _{i + 1}\).
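Steps a–d above can be sketched in a few lines of Python. The names below (`rates[x][y]` for the jump rate w(x, y), `sample_forest` for the sampler) are hypothetical, and the killing at rate q is implemented by stopping the walk at each visited site with probability \(q/(q + w(x))\); this is a minimal sketch for \(\mathcal{B} = \emptyset \), not the authors' implementation.

```python
import random

def sample_forest(rates, q, rng=None):
    """Wilson's algorithm with killing rate q > 0: sample a random spanning
    forest Phi_q of the weighted graph rates[x][y] = w(x, y).  Returns parent
    pointers: parent[x] is the parent of x, and None if x is a root."""
    rng = rng or random.Random(0)
    parent, in_forest = {}, set()
    for start in rates:
        if start in in_forest:
            continue
        path = [start]                      # current loop-erased trajectory
        while True:
            x = path[-1]
            total = sum(rates[x].values())
            if rng.random() < q / (q + total):
                parent[x] = None            # killed: x becomes a new root
                break
            r = rng.random() * total        # jump proportionally to w(x, .)
            for y, w in rates[x].items():
                r -= w
                if r <= 0:
                    break
            if y in in_forest:
                path.append(y)              # hit the forest: trajectory done
                break
            if y in path:
                path = path[:path.index(y) + 1]  # erase the loop just closed
            else:
                path.append(y)
        for a, b in zip(path, path[1:]):    # freeze the loop-erased path
            parent[a] = b
        in_forest.update(path)
    return parent

rates = {0: {1: 1.0, 2: 2.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 2.0, 1: 1.0}}
forest = sample_forest(rates, q=1.0)        # parent map; roots map to None
```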
There are at least two ways to prove that this algorithm indeed samples \(\Phi _{q, \mathcal{B}}\) with the desired law, whatever the way in which the starting points x are chosen. One can, on the one hand, follow Wilson’s original proof in [14], which makes use of the so-called Diaconis–Fulton stack representation of Markov chains (see Sect. 2.3). One can, on the other hand, follow Marchal, who first computes in [10] the law of the loop-erased trajectory \(\Gamma ^x_{q, \mathcal{B}}\) obtained from the random trajectory \(X : [0, T_q \wedge T_{\mathcal{B}}] \rightarrow \mathcal{X}\) started at \(x \in \mathcal{X} {\setminus } \mathcal{B}\) and stopped in \(\mathcal{B}\) or at an exponential time \(T_q\) if \(T_q\) is smaller than the hitting time \(T_\mathcal{B}\). One has indeed:
Theorem [Marchal]
Proposition 2.1
Proof
The fact that \(\Phi _q\) is the usual random spanning tree on an extended graph implies, through the (nonreversible) transfer current theorem (cf. [4] and [5]), that \(\Phi _{q, \mathcal{B}} \subset \mathcal{E}\) is a determinantal process and so is \(\rho (\Phi _{q, \mathcal{B}}) \subset \mathcal{X}\). (In the reversible case at least, the fact that the law of \(\rho (\Phi _{q, \mathcal{B}})\) is a convolution of Bernoulli laws is also a consequence of this determinantality property.) Let us give a direct and short proof of the fact that \(\rho (\Phi _{q, \mathcal{B}})\) is a determinantal process associated with a remarkable kernel.
Proposition 2.2
Proof
Proposition 2.3
See Fig. 1 for an illustration with the two-dimensional nearest-neighbour random walk in a Brownian sheet potential, which is easy to sample and gives rise to a rich and anisotropic energy landscape.
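The determinantal structure of the root process can be probed numerically. The sketch below assumes the resolvent form \(K_q = q(q\,\mathrm{Id} - L)^{-1}\) for the kernel when \(\mathcal{B} = \emptyset \) (an assumption made for illustration; Proposition 2.2 gives the exact kernel) and checks that its trace, the expected number of roots, matches the spectral expression \(\sum _i q/(q + \lambda _i)\) and interpolates between 1 (a spanning tree, as \(q \rightarrow 0\)) and n (every point a root, as \(q \rightarrow \infty \)).

```python
import numpy as np

# Toy generator L: off-diagonal rates w(x, y), rows summing to zero.
W = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
L = W - np.diag(W.sum(axis=1))

def root_kernel(L, q):
    """Assumed resolvent kernel K_q = q (q Id - L)^{-1} of the root process."""
    return q * np.linalg.inv(q * np.eye(L.shape[0]) - L)

q = 1.5
mean_roots = np.trace(root_kernel(L, q))
# Spectral form of the same quantity: sum_i q / (q + lambda_i), with
# lambda_i the eigenvalues of -L (one of them is 0 by irreducibility).
spectral = np.sum(q / (q + np.linalg.eigvals(-L))).real
```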
2.2 Sampling Approximately m Roots
 a.
Start from any \(q_0 > 0\), for example \(q_0 = \alpha = \max _{x \in \mathcal{X}} w(x)\), and set \(i = 0\).
 b.
Sample \(\Phi _{q_i}\) with Wilson’s algorithm.
 c.
If \(|\rho (\Phi _{q_i})| \notin \bigl [m - 2 \sqrt{m}, m + 2 \sqrt{m}\bigr ]\), set \(q_{i + 1} = m q_i / |\rho (\Phi _{q_i})|\) and repeat b–c with \(i + 1\) instead of i; if \(|\rho (\Phi _{q_i})| \in \bigl [m - 2 \sqrt{m}, m + 2 \sqrt{m}\bigr ]\), then return \(\Phi _{q_i}\).
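The update rule of steps a–c can be sketched as follows. Here `root_count(q)` stands for one run of Wilson's algorithm returning the number of roots of a fresh sample of \(\Phi _q\); the deterministic surrogate below, built on a hypothetical toy spectrum via \({{\mathbb {E}}}|\rho (\Phi _q)| = \sum _i q/(q + \lambda _i)\), only serves to exercise the loop.

```python
import math

def tune_q(root_count, m, q0, max_iter=100):
    """Iterate q <- m q / |rho(Phi_q)| until the number of roots of a fresh
    sample lands in [m - 2 sqrt(m), m + 2 sqrt(m)]."""
    q = q0
    for _ in range(max_iter):
        k = root_count(q)
        if abs(k - m) <= 2 * math.sqrt(m):
            return q, k                 # accepted killing rate and root count
        q = m * q / k                   # rescale q towards m roots
    raise RuntimeError("root count did not stabilize")

lam = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]   # hypothetical spectrum of -L
expected_roots = lambda q: max(1, round(sum(q / (q + l) for l in lam)))
q, k = tune_q(expected_roots, m=3, q0=100.0)
```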
2.3 Coupled Forests
In this section, we prove Theorem 2, characterize the associated root process and describe the associated coalescence and fragmentation process, which leads to further open questions. This coupling is the natural extension of Wilson’s algorithm based on Diaconis and Fulton’s stack representation of random walk (cf. [6]) as used by Wilson and Propp in [14] and [13].
Stack representations Assume that an infinite stack of arrows is attached to each site of the graph, each arrow pointing towards one of its neighbours. Assume in addition that these arrows are distributed according to the probability kernel P of the discrete-time skeleton of X, which is defined by Eqs. (8)–(9). Assume in other words that the arrows are independently distributed at each level of the stacks and that an arrow pointing towards the neighbour y of a given site x appears with probability P(x, y), considering in this context x itself as one of its neighbours. Imagine finally that each stack of arrows is piled down, so that it makes sense to speak of the arrow on top of the stack. By using this representation, one can generate the Markov process as follows: at each jump time of a Poisson process with intensity \(\alpha \), our walker steps to the neighbour pointed to by the arrow at the top of the stack where it was sitting, and this top arrow is erased from the stack.
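A lazy implementation of these stacks, drawing each arrow only when it is first needed (which is distributionally equivalent to pre-piled infinite lists), can be sketched as below; the names `make_stacks` and `run_from_stacks` are hypothetical.

```python
import random

def make_stacks(P, rng):
    """Lazy arrow stacks: stacks[x]() pops the next arrow at site x, drawn
    independently with probability P[x][y] (a site may point to itself)."""
    def popper(x):
        sites, probs = zip(*P[x].items())
        return lambda: rng.choices(sites, probs)[0]
    return {x: popper(x) for x in P}

def run_from_stacks(stacks, x0, steps):
    """Simulate the discrete-time skeleton: at each step, follow and consume
    the arrow on top of the current site's stack."""
    traj = [x0]
    for _ in range(steps):
        traj.append(stacks[traj[-1]]())
    return traj

P = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}   # toy skeleton kernel
path = run_from_stacks(make_stacks(P, random.Random(7)), 0, 10)
```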
 a.
Start with a particle on each site. Both particles and sites will be declared either active or frozen. At the beginning, all sites and particles are declared to be active.
 b.
Choose an arbitrary particle among all the active ones and look at the arrow at the top of the stack it is seated on. Call x the site where the particle is seated.

If the arrow points to r, declare the particle, and the site x, to be frozen.

If the arrow points towards another site \(y \ne x\), remove the particle and keep the arrow. We say that this arrow is uncovered.

If the arrow points to x itself, remove the arrow.

 c.
Once again, choose an arbitrary particle among all the active ones, look at the arrow on the top of the stack it is seated on, and call x the site where the particle is seated.

If the arrow points to r, the particle is declared to be frozen, and so are x and all the sites eventually leading to x by following paths of uncovered top-of-stack arrows.

If the arrow points to a frozen site, remove the chosen particle at x, keep the (now uncovered) arrow, and freeze the site x as well as any site eventually leading to x by following paths of uncovered top-of-stack arrows.

If the arrow points to an active site, then there are two possibilities. By following from this site the uncovered arrows at the top of the stacks, we either reach a different active particle or run in a loop back to x. In the former case, remove the chosen particle from site x and keep the uncovered arrow. In the latter case, erase all the arrows along the loop and put an active particle on each site of the loop. Note that this last case includes the possibility that the uncovered arrow points to x itself, in which case we just have to remove this arrow.

 d.
Repeat the previous step up to exhaustion of the active particles.
Proof of Theorem 2
Our Markov process \((F(s) \in \mathcal{F} : s \ge 0)\) is then built in the following way. We associate m independent exponential random clocks of rate 1 with the m roots of F(s) at time s. At the first ring time \(S \ge s\) at some root x, we define F(S) by declaring active the particle at x, putting an arrow to y with probability \(P(x, y) = w(x, y) / \alpha \) and restarting our algorithm with parameter \(q = 1 / T = \alpha / (e^S  1)\). \(\square \)
A determinantal formula for the associated root process. Proposition 2.2, from which we recall the definition of the probability kernel \(K_{q, \mathcal{B}}\), can be extended to characterize the law of the coupled root process \(t \mapsto \rho (\Phi _{1 / t})\).
Proposition 2.4
Proof
When k is larger than 1, the formula is simply obtained by keeping frozen each site x in \(\bigcup _{i \le k} \mathcal{A}_i\) with a probability that depends on the largest i such that \(x \in \mathcal{A}_i\). This is the reason why we introduced the sets \(\mathcal{A}'_i\): \(i^*\) is the largest i such that \(x \in \mathcal{A}_i\) if and only if \(x \in \mathcal{A}'_{i^*}\). \(\square \)
Fragmentation, coalescence and open questions. As seen in the proof of Theorem 2, at each jump time \(S = S_{k + 1}\) of F there is only one root x to “wake up,” which means that there is only one piece of the associated partition into m pieces at the previous jump time \(S_k\) that can be fragmented into different trees; the other pieces of the previous partition remain contained in different pieces of the new partition at time \(S_{k + 1}\). At time \(S_{k + 1}\), we can have both fragmentation, produced by the loop-erasure procedure, and coalescence: the trees covering the possibly fragmented piece can eventually be grafted to the other \(m - 1\) non-fragmented frozen trees, when their associated loop-erased random walks freeze by running into these frozen trees.
 Q1:

Is there a way to use the process \(t \mapsto \Phi _{1 / t}\) to sample the measure \({{\mathbb {P}}}\bigl (\Phi _q \in \cdot \bigm  \rho (\Phi _q) = m\bigr )\)?
 Q2:

Is there a way to use the process \(t \mapsto \Phi _{1 / t}\) to estimate in an efficient way the spectrum of \(L\), or its higher part at least?
 Q3:

Which characterization can be given of the rooted partition associated with \(t \mapsto \Phi _{1 / t}\)?
3 Hitting Times
3.1 Forest Formulas for Hitting Distributions, Green’s Kernels and Mean Hitting Time
In order to prove Theorem 1, we first use Wilson’s algorithm to give forest representations of hitting distributions, Green’s kernels and mean hitting times. At least two of these formulas, Formula (16) and Formula (17), already appeared in the work of Freidlin and Wentzell (see Lemmas 3.2 and 3.3 in [8]).
Lemma 3.1
Proof of Lemma 3.1
3.2 Well-Distributed Roots
Proof of Theorem 1
Proposition 3.2
Proof
4 Rereading Micchelli and Willoughby’s Proof
Definition 4.1
Remark
4.1 Checking Eq. (28)
4.2 A Combinatorial Identity
The key point of the proof lies in the following lemma.
Lemma 4.2
Proof
4.3 Conclusion with Cauchy's Interlacement Theorem
We will use the following lemma from [11], for which we give an alternative proof.
Lemma [Micchelli and Willoughby]
Let \(f : x \in {{\mathbb {R}}} \mapsto \prod _{j < l} (x - \alpha _j) \in {{\mathbb {R}}}\) be a polynomial of degree l with l distinct zeros \(\alpha _0> \alpha _1> \cdots > \alpha _{l - 1}\). Let \(\beta _0> \beta _1> \cdots > \beta _{L - 1}\) be \(L \ge l\) real numbers such that \(\beta _j \ge \alpha _j\) for each \(j < l\). Then, for any \(k < L\), \(f[\beta _0, \beta _1, \ldots , \beta _k] \ge 0\).
Proof
We prove the lemma by induction on \(r = l - k\). First, since f is a monic polynomial of degree l, Definition 4.1 gives \(f[\beta _0, \ldots , \beta _k] = 0\) if \(k > l\)—that is, \(r < 0\)—and \(f[\beta _0, \ldots , \beta _k] = 1\) if \(k = l\)—that is, \(r = 0\)—so the claim is established for \(r \le 0\).
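The base case, and the lemma itself, are easy to probe numerically with the standard recursive definition of divided differences (assumed here to match Definition 4.1); the zeros and the \(\beta _j\)s below are hypothetical values chosen to satisfy the interlacing hypothesis.

```python
def divided_difference(f, pts):
    """Newton divided difference f[pts[0], ..., pts[-1]] (distinct points)."""
    if len(pts) == 1:
        return f(pts[0])
    return (divided_difference(f, pts[1:]) - divided_difference(f, pts[:-1])) \
        / (pts[-1] - pts[0])

# f(x) = prod_{j<l} (x - alpha_j) with decreasing zeros 5 > 3 > 1, and
# decreasing beta_j with beta_j >= alpha_j for j < l: the lemma asserts
# that every divided difference f[beta_0, ..., beta_k] is nonnegative.
f = lambda x: (x - 5.0) * (x - 3.0) * (x - 1.0)
betas = [6.0, 4.0, 2.0, 0.5, 0.1]
dds = [divided_difference(f, betas[:k + 1]) for k in range(len(betas))]
```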
Notes
Acknowledgements
Part of this work was done during two long visits by A.G. to Leiden, supported by the ERC Advanced Grant 267356VARIS of Frank den Hollander. L.A. was supported by NWO Gravitation Grant 024.002.003NETWORKS.
References
 1. Anantharam, V., Tsoucas, P.: A proof of the Markov chain tree theorem. Stat. Probab. Lett. 8(2), 189–192 (1989)
 2. Avena, L., Castell, F., Gaudillière, A., Mélot, C.: Approximate and exact solutions of intertwining equations through random spanning forests (2017). arXiv:1702.05992
 3. Benoist, S., Dumaz, L., Werner, W.: Near critical spanning forests and renormalization (2015). arXiv:1503.08093
 4. Burton, R., Pemantle, R.: Local characteristics, entropy and limit theorems for spanning trees and domino tilings via transfer-impedances. Ann. Probab. 21, 1329–1371 (1993)
 5. Chang, Y.: Contribution à l’étude des lacets markoviens. Thèse de doctorat de l’université Paris Sud—Paris XI, tel-00846462 (2013)
 6. Diaconis, P., Fulton, W.: A growth model, a game, an algebra, Lagrange inversion, and characteristic classes. Rend. Semin. Mat. Univ. Pol. Torino 49(1), 95–119 (1991)
 7. Fill, J.: On hitting times and fastest strong stationary times for skip-free and more general chains. J. Theor. Probab. 22(3), 558–586 (2009)
 8. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems, Grundlehren der Mathematischen Wissenschaften 260, 2nd edn. Springer, New York (1998)
 9. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2008)
 10. Marchal, P.: Loop-erased random walks, spanning trees and Hamiltonian cycles. Electron. Commun. Probab. 5, 39–50 (2000)
 11. Micchelli, C.A., Willoughby, R.A.: On functions which preserve the class of Stieltjes matrices. Linear Algebra Appl. 23, 141–156 (1979)
 12. Miclo, L.: On absorption times and Dirichlet eigenvalues. ESAIM Probab. Stat. 14, 117–150 (2010)
 13. Propp, J., Wilson, D.: How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. J. Algorithms 27, 170–217 (1998)
 14. Wilson, D.: Generating random spanning trees more quickly than the cover time. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pp. 296–303 (1996)
 15. Whittaker, E.T., Robinson, G.: The Calculus of Observations; A Treatise on Numerical Mathematics. Blackie and Son Limited, London (1924)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.