1 Well-Distributed Points, Local Equilibria and Coupled Spanning Forests

1.1 Well-Distributed Points and Random Spanning Forests

Let \(X = (X(t) : t \ge 0)\) be an irreducible continuous-time Markov process on a finite set \(\mathcal{X}\) with size \(|\mathcal{X}| = n\). It is known, see, for example, Lemma 10.8 in [9], that if \(R \in \mathcal{X}\) is chosen according to the equilibrium measure \(\mu \) of the process, then the mean value of the hitting time \({{\mathbb {E}}}[E_x[T_R]]\)—where \(E_x[\cdot ]\) stands for the mean value according to the law of the process started at x and \({{\mathbb {E}}}[\cdot ]\) stands for the mean value according to the law of R—does not depend on the starting point \(x \in \mathcal{X}\). More generally, if a random subset \(R \subset \mathcal{X}\) of any (possibly random) size has such a property, then we say that the law of R provides well-distributed points. One of our motivations for building such random sets was to find appropriate subsampling points for signal processing on arbitrary networks, in connection with intertwining equations and metastability studies (cf. [2]). In this paper, we build such a law on the subsets of any given size \(m \le n\). This is a trivial problem for \(m = n\), and for \(m = 1\), this property actually characterizes the law of R: in this case, the singleton R has to be chosen according to the equilibrium law.

To solve this problem in the general case, we use random rooted spanning forests, a standard variation (introduced, for example, in [13]) on the well-known uniform spanning tree theme. Let us first denote by \(w(x, y)\) the jump rate of X from x to y in \(\mathcal{X}\) and by \(\mathcal{G} = (\mathcal{X}, w)\) the weighted and oriented graph for which

$$\begin{aligned} \mathcal{E} = \left\{ (x, y) \in \mathcal{X} \times \mathcal{X} : x \ne y\hbox { and }w(x, y) > 0 \right\} \end{aligned}$$

is the edge set and \(w(e) = w(x, y)\) is the weight of \(e = (x, y)\) in \(\mathcal{E}\). A rooted spanning forest \(\phi \) is a subgraph of \(\mathcal{G}\) without cycles, with \(\mathcal{X}\) as its set of vertices and such that, for each \(x \in \mathcal{X}\), there is at most one \(y \in \mathcal{X}\) such that \((x, y)\) is an edge of \(\phi \). The root set \(\rho (\phi )\) of the forest \(\phi \) is the set of points \(x \in \mathcal{X}\) for which there is no edge \((x, y)\) in \(\phi \); the connected components of \(\phi \) are trees, each of them having edges that are oriented towards its own root. We call \(\mathcal{F}\) the set of all rooted spanning forests, we see each forest \(\phi \) in \(\mathcal{F}\) as a subset of \(\mathcal{E}\), and we associate with it the weight

$$\begin{aligned} w(\phi ) = \prod _{e \in \phi } w(e). \end{aligned}$$

In particular, \(\emptyset \in \mathcal{F}\) is the spanning forest made of n degenerate trees reduced to simple roots and \(w(\emptyset ) = 1\). We can now define our random forests: for each \(q > 0\), the random spanning forest \(\Phi _q\) is a random variable in \(\mathcal{F}\) with law

$$\begin{aligned} {{\mathbb {P}}}\bigl (\Phi _q = \phi \bigr ) = {w(\phi ) q^{|\rho (\phi )|} \over Z(q)}, \qquad \phi \in \mathcal{F}, \end{aligned}$$

where the normalizing partition function is

$$\begin{aligned} Z(q) = \sum _{\phi \in \mathcal{F}} w(\phi ) q^{|\rho (\phi )|}. \end{aligned}$$

We can include the case \(q = +\infty \) in our definition by setting \(\Phi _\infty = \emptyset \in \mathcal{F}\) in a deterministic way.
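
As a minimal illustration (the two-state chain and the rate names a and b are introduced here only for this example), take \(\mathcal{X} = \{1, 2\}\) with \(w(1, 2) = a > 0\) and \(w(2, 1) = b > 0\). There are exactly three rooted spanning forests: \(\emptyset \) with root set \(\{1, 2\}\) and weight 1, \(\{(1, 2)\}\) with root set \(\{2\}\) and weight a, and \(\{(2, 1)\}\) with root set \(\{1\}\) and weight b. Hence

$$\begin{aligned} Z(q) = q^2 + (a + b) q, \qquad {{\mathbb {P}}}\bigl (\Phi _q = \emptyset \bigr ) = {q \over q + a + b}, \qquad {{\mathbb {P}}}\bigl (\Phi _q = \{(1, 2)\}\bigr ) = {a \over q + a + b}, \qquad {{\mathbb {P}}}\bigl (\Phi _q = \{(2, 1)\}\bigr ) = {b \over q + a + b}. \end{aligned}$$

As q grows, the empty forest dominates, in agreement with the convention \(\Phi _\infty = \emptyset \), while for small q the forest is a single tree with high probability.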

It turns out that both the law of \(\rho (\Phi _q)\) and the law of \(\rho (\Phi _q)\) conditioned on the event \(\bigl \{|\rho (\Phi _q)| = m\bigr \}\), for any \(1 \le m \le n\), provide well-distributed points. And we can compute the common value of the mean hitting time in both cases in terms of the eigenvalues of the generator L given by

$$\begin{aligned} (Lf)(x) = \sum _{y \in \mathcal{X}} w(x, y)\bigl [f(y) - f(x)\bigr ], \qquad f:\mathcal{X} \rightarrow {{\mathbb {C}}}, \qquad x \in \mathcal{X}. \end{aligned}$$

To this end, let us denote by \(\lambda _0\), \(\lambda _1, \ldots , \lambda _{n - 1}\) the eigenvalues of \(-L\) and by \((a_k : 0 \le k \le n)\) the coefficients of the characteristic polynomial of L, which, evaluated at q, is

$$\begin{aligned} \det (q - L) = \prod _{j < n} (q + \lambda _j) = \sum _{k \le n} a_k q^k. \end{aligned}$$

In this formula and all along the paper, we identify scalars with the appropriate multiples of the identity matrix. Recalling that X is irreducible and ordering the eigenvalues by non-decreasing real part, \(\lambda _0\) is the only zero eigenvalue, so that \(a_0 = 0\), and we can set \(a_{n + 1} = 0\).

Theorem 1

For all \(x \in \mathcal{X}\) and every positive integer \(m \le n\), it holds

$$\begin{aligned} {{\mathbb {E}}}\Bigl [E_x\bigl [T_{\rho (\Phi _q)}\bigr ]\Bigr ] = {1 \over q} \left( 1 - \prod _{j > 0} {\lambda _j \over q + \lambda _j} \right) \quad \hbox {and}\quad {{\mathbb {E}}}\Bigl [E_x\bigl [T_{\rho (\Phi _q)}\bigr ] \Bigm | |\rho (\Phi _q)| = m\Bigr ] = {a_{m + 1} \over a_m}. \end{aligned}$$

We prove this theorem in Sect. 3, in which we also compute, in both cases, as a consequence of it and as needed in [2], mean return times to \(\rho (\Phi _q)\) from a uniformly chosen point in \(\rho (\Phi _q)\). In doing so, we will see that the problem of finding a distribution that provides exactly m well-distributed points has infinitely many solutions as soon as \(2 \le m \le n - 2\) and Theorem 1 simply provides one of them. The only cases when the convex set of solutions reduces to a singleton are the known case \(m = 1\), the easy case \(m = n -1\) and the trivial one \(m = n\).
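
Theorem 1 can be checked by hand in the two-state case; this is only an illustrative sketch, and the rate names a and b are not part of the original text. Take \(\mathcal{X} = \{1, 2\}\), \(w(1, 2) = a > 0\) and \(w(2, 1) = b > 0\), so that \(\lambda _0 = 0\), \(\lambda _1 = a + b\) and \(\det (q - L) = q^2 + (a + b) q\), i.e. \(a_1 = a + b\), \(a_2 = 1\) and \(a_3 = 0\). The three forests \(\emptyset \), \(\{(1, 2)\}\) and \(\{(2, 1)\}\) have respective weights \(q^2\), aq and bq in \(Z(q)\). Starting from \(x = 1\), the hitting time of the root set is nonzero only when the root set is \(\{2\}\), in which case its mean is 1/a. Therefore

$$\begin{aligned} {{\mathbb {E}}}\Bigl [E_1\bigl [T_{\rho (\Phi _q)}\bigr ]\Bigr ]&= {a \over q + a + b} \cdot {1 \over a} = {1 \over q + a + b} = {1 \over q} \left( 1 - {\lambda _1 \over q + \lambda _1} \right) ,\\ {{\mathbb {E}}}\Bigl [E_1\bigl [T_{\rho (\Phi _q)}\bigr ] \Bigm | |\rho (\Phi _q)| = 1\Bigr ]&= {a \over a + b} \cdot {1 \over a} = {1 \over a + b} = {a_2 \over a_1}. \end{aligned}$$

The same computation from \(x = 2\), with b in place of a, gives the same two values, and for \(m = 2\) both sides of the second identity vanish. Conditioned on \(|\rho (\Phi _q)| = 1\), the root is 1 with probability \(b / (a + b)\), which is the equilibrium measure of this chain, as expected for \(m = 1\).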

1.2 Local Equilibria and Random Forests in the Reversible Case

For \(\mathcal{B} \subset \mathcal{X}\) we identify the signed measures on \(\mathcal{X} {\setminus } \mathcal{B}\) with the row vectors \(\nu = \bigl (\nu (x) : x \in \mathcal{X} {\setminus } \mathcal{B}\bigr )\). For \(\mathcal{A} \subset \mathcal{X}\) and any matrix \(M = \bigl (M(x, y) : x, y \in \mathcal{X}\bigr )\), we write \([M]_\mathcal{A}\) for the submatrix

$$\begin{aligned}{}[M]_\mathcal{A} = \bigl (M(x, y) : x, y \in \mathcal{A}\bigr ). \end{aligned}$$

We identify L with its matrix representation with diagonal coefficients \(-w(x)\), where

$$\begin{aligned} w(x) = \sum _{y \ne x} w(x, y), \qquad x \in \mathcal{X}, \end{aligned}$$
(1)

and we set

$$\begin{aligned} \alpha = \max _{x \in \mathcal{X}} w(x). \end{aligned}$$
(2)

The sub-Markovian generator \([L]_{\mathcal{X} {\setminus } \mathcal{B}}\) is associated with the process killed in “the boundary” \(\mathcal{B}\). We will assume in this section that L is reversible with respect to \(\mu \) and we write

$$\begin{aligned} \lambda _{0, \mathcal{B}} < \lambda _{1, \mathcal{B}} \le \cdots \le \lambda _{l - 1, \mathcal{B}} \end{aligned}$$

for the ordered eigenvalues of \(-[L]_{\mathcal{X} {\setminus } \mathcal{B}}\), with \(l = |\mathcal{X} {\setminus } \mathcal{B}|\). We can then inductively define from any probability measure \(\nu = \nu _{l - 1}\) on \(\mathcal{X} {\setminus } \mathcal{B}\) the a priori signed measures \(\nu _k\), with \(k < l\), by

$$\begin{aligned} \nu _k [L]_{\mathcal{X} {\setminus } \mathcal{B}} = \lambda _{k, \mathcal{B}}\bigl [\nu _{k - 1} - \nu _k\bigr ]. \end{aligned}$$
(3)

To avoid ambiguity, we set \(\nu _{k - 1} = 0\) if \(\lambda _{k, \mathcal{B}} = 0\). The following result is due to Micchelli and Willoughby (see Theorem 3.2 in [11]).

Micchelli and Willoughby’s Theorem

If L is reversible, then \(\nu _k\) is a non-negative measure for all non-negative \(k < l\) and any probability measure \(\nu \) on \(\mathcal{X} {\setminus } \mathcal{B}\).

Since Eq. (3) can also be written (for \(k > 0\) or \(\mathcal{B} \ne \emptyset \), so that \(\lambda _{k, \mathcal{B}} > 0\)) as

$$\begin{aligned} \nu _{k - 1} = \nu _k {[L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{k, \mathcal{B}} \over \lambda _{k, \mathcal{B}}}, \end{aligned}$$

we have

$$\begin{aligned} \sum _{x \in \mathcal{X} {\setminus } \mathcal{B}} \nu _{k - 1}(x) \le \sum _{x \in \mathcal{X} {\setminus } \mathcal{B}} \nu _k(x) \le \cdots \le \sum _{x \in \mathcal{X} {\setminus } \mathcal{B}} \nu (x) = 1 \end{aligned}$$

and we can identify each \(\nu _k\) with a probability measure on the quotient space \(\mathcal{X} / \mathcal{B}\) for the equivalence relation \(\sim _\mathcal{B}\) such that \(x \sim _\mathcal{B} y\) if and only if \(x = y\) or \(x, y \in \mathcal{B}\): we simply put the missing mass on \(\mathcal{B}\). Equation (3) then has the following probabilistic interpretation. Starting from \(\nu _k\), the system decays into \(\nu _{k - 1}\) after an exponential time of parameter \(\lambda _{k, \mathcal{B}}\), and, more precisely, starting from \(\nu _k(\cdot | \mathcal{X} {\setminus } \mathcal{B})\), the system remains in this state for an exponential time of parameter \(\lambda _{k, \mathcal{B}}\) before decaying into \(\nu _{k - 1}(\cdot | \mathcal{X} {\setminus } \mathcal{B})\) or reaching \(\mathcal{B}\). This was rigorously established by Fill and Miclo (see [7] and [12]) to control convergence to equilibrium and absorption times of reversible processes, and this is the reason why the \(\nu _k\)s can be described as local equilibria.
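
A small worked example may help fix ideas; the three-state chain below is chosen only for illustration and does not appear in the original text. Let \(\mathcal{X} = \{1, 2, 3\}\) with all six jump rates equal to 1 (a chain that is reversible with respect to the uniform measure) and \(\mathcal{B} = \{3\}\), so that \(l = 2\) and

$$\begin{aligned}{}[L]_{\mathcal{X} {\setminus } \mathcal{B}} = \begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}, \qquad \lambda _{0, \mathcal{B}} = 1, \qquad \lambda _{1, \mathcal{B}} = 3. \end{aligned}$$

Starting from \(\nu = \nu _1 = (1, 0)\), the relation \(\nu _{k - 1} = \nu _k \bigl ([L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{k, \mathcal{B}}\bigr ) / \lambda _{k, \mathcal{B}}\) gives

$$\begin{aligned} \nu _0 = {1 \over 3} (1, 0) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \left( {1 \over 3}, {1 \over 3}\right) , \qquad \nu _{-1} = \left( {1 \over 3}, {1 \over 3}\right) \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} = (0, 0). \end{aligned}$$

Both \(\nu _1\) and \(\nu _0\) are non-negative, in agreement with Micchelli and Willoughby’s theorem, the total mass decreases from 1 to 2/3 to 0, and \(\nu _0\) is proportional to the left eigenvector (1/2, 1/2) of \([L]_{\mathcal{X} {\setminus } \mathcal{B}}\) associated with \(-\lambda _{0, \mathcal{B}}\).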

The previous probabilistic interpretation makes sense only once the non-negativity of the \(\nu _k\)s is guaranteed by Micchelli and Willoughby’s theorem, which is crucial in Fill’s and Miclo’s analysis. The fully algebraic proof by Micchelli and Willoughby describes the \(\nu _k\)s in terms of some divided differences and uses Cauchy’s interlacement theorem in an inductive argument to conclude positivity. We will show in Sect. 4, on the one hand, that computing the probability of certain events related to our random forests \(\Phi _q\) leads naturally to the divided difference representation of the \(\nu _k\)s, when one has in mind their local equilibria interpretation. This will be done by using Wilson’s algorithm, which gives an alternative description of our random forests (see Sect. 2). On the other hand, the original description of our random forests will lead to the key formula of the inductive step: from the random forest point of view, this algebraic formula is nothing but a straightforward connection between the previous event probabilities. Section 4 contains the full derivation of Micchelli and Willoughby’s theorem.

1.3 Coupling Random Forests, Coalescence and Fragmentation

In dealing with practical sampling issues in the next section, we will couple all the \(\Phi _q\)s together in such a way that we will obtain the following side result.

Theorem 2

There exists a (non-homogeneous) continuous-time Markov process \((F(s) \in \mathcal{F}: s\ge 0)\) that couples together all our random forests \(\Phi _q\) for \(q > 0\) as follows: for all \(s \ge 0\) and \(\phi \in \mathcal{F}\), it holds

$$\begin{aligned} {{\mathbb {P}}}(F(s) = \phi ) = {{\mathbb {P}}}(\Phi _{1 / t} = \phi ) = {{\mathbb {P}}}(\Phi _q = \phi ) \end{aligned}$$

with \(t = 1/q\), \(s = \ln (1 + \alpha t)\) and \(\alpha \) as in Eq. (2).

With each spanning forest \(\phi \), we can associate a partition \(\mathcal{P}(\phi )\) of \(\mathcal{X}\), for which x and y in \(\mathcal{X}\) belong to the same class when they are in the same tree. We will see in Sect. 2.3 that the coupling \(t \mapsto \Phi _{1/t} = F(\ln (1 + \alpha t))\) is then associated with a fragmentation and coalescence process, for which coalescence is strongly predominant, and at each jump time, one component of the partition is fragmented into pieces that possibly coalesce with the other components. This coupling will lead to a number of open questions: (1) Is it possible to use this process to sample efficiently \(\Phi _q\) with a prescribed number of roots? (2) Can we use it to estimate the spectrum of L? (3) How can we characterize the law of the associated partition process? (See Sect. 2.3 for more details.)

2 Preliminary Remarks and Sampling Issues

2.1 Wilson Algorithm, Partition Function and the Root Process

Let us first slightly extend our notion of random forests. For any \(\mathcal{B} \subset \mathcal{X}\), we denote by \(\Phi _{q, \mathcal{B}}\) a random variable in \(\mathcal{F}\) with the law of \(\Phi _q\) conditioned on the event \(\bigl \{\mathcal{B} \subset \rho (\Phi _q)\bigr \}\). We then have, for any \(\phi \) in \(\mathcal{F}\),

$$\begin{aligned} {{\mathbb {P}}}\bigl (\Phi _{q, \mathcal{B}} = \phi \bigr ) = {w(\phi ) q^{|\rho (\phi )| - |\mathcal{B}|} \over Z_\mathcal{B}(q)} {{\mathbb {1}}}_{\{\mathcal{B} \subset \rho (\phi )\}} \end{aligned}$$

with

$$\begin{aligned} Z_\mathcal{B}(q) = \sum _{\phi : \rho (\phi ) \supset \mathcal{B}} w(\phi ) q^{|\rho (\phi )| - |\mathcal{B}|}. \end{aligned}$$
(4)

This law is non-degenerate even for \(q = 0\), provided that \(\mathcal{B}\) is non-empty. And if \(\mathcal{B}\) is a singleton \(\{r\}\), then \(\Phi _{0, \{r\}}\) is the usual random spanning tree with a prescribed root r, which can be sampled with Wilson’s algorithm (cf. [14]). For \(q > 0\), \(\Phi _q = \Phi _{q, \emptyset }\) itself is also a special case of the usual random spanning tree on an extended weighted graph \(\bar{\mathcal{G}} = (\bar{\mathcal{X}}, {\bar{w}})\) obtained by addition of an extra point r to \(\mathcal{X}\)—to form \(\bar{\mathcal{X}} = \mathcal{X} \cup \{r\}\)—and by setting \({\bar{w}}(x, r) = q\) and \({\bar{w}}(r, x) = 0\) for all x in \(\mathcal{X}\). Indeed, to get \(\Phi _q\) from the usual random spanning tree on \(\bar{\mathcal{X}}\), with the root in r, one only needs to remove all the edges going from \(\mathcal{X}\) to r. Following Propp and Wilson (cf. [13]), we can then use Wilson’s algorithm to sample \(\Phi _{q, \mathcal{B}}\) for \(q > 0\) or \(\mathcal{B} \ne \emptyset \):

  a. start from \(\mathcal{B}_0 = \mathcal{B}\) and \(\phi _0 = \emptyset \), choose x in \(\mathcal{X} {\setminus } \mathcal{B}_0\) and set \(i = 0\);

  b. run the Markov process starting at x up to time \(T_q \wedge T_{\mathcal{B}_i}\) with \(T_q\) an independent exponential random variable with parameter q (so that \(T_q = +\infty \) if \(q = 0\)) and \(T_{\mathcal{B}_i}\) the hitting time of \(\mathcal{B}_i\);

  c. with

    $$\begin{aligned} \Gamma ^x_{q, \mathcal{B}_i} = (x_0, x_1, \ldots , x_k) \in \{x\} \times \bigl (\mathcal{X} {\setminus } (\mathcal{B}_i \cup \{x\})\bigr )^{k - 1} \times \bigl (\mathcal{X} {\setminus } \{x\}\bigr ) \end{aligned}$$

    the loop-erased trajectory obtained from \(X : [0, T_q \wedge T_{\mathcal{B}_i}] \rightarrow \mathcal{X}\), set \(\mathcal{B}_{i + 1} = \mathcal{B}_i \cup \{x_0, x_1, \ldots , x_k\}\) and \(\phi _{i + 1} = \phi _i \cup \{(x_0, x_1), (x_1, x_2), \ldots , (x_{k - 1}, x_k)\}\) (so that \(\phi _{i + 1} = \phi _i\) if \(k = 0\));

  d. if \(\mathcal{B}_{i + 1} \ne \mathcal{X}\), choose x in \(\mathcal{X} {\setminus } \mathcal{B}_{i + 1}\) and repeat b–d with \(i + 1\) in place of i, and, if \(\mathcal{B}_{i + 1} = \mathcal{X}\), set \(\Phi _{q, \mathcal{B}} = \phi _{i + 1}\).
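
The steps a to d can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_forest` and the dictionary encoding `rates[x][y]` of the jump rates are hypothetical, and the continuous-time walk is replaced by its jump chain, killed at each visit of x with probability \(q / (q + w(x))\). This replacement is harmless here because the loop-erased trajectory only depends on the sequence of visited states.

```python
import random


def sample_forest(rates, q, boundary=(), rng=None):
    """Sample a rooted spanning forest with Wilson's algorithm (steps a-d).

    rates[x][y] is the jump rate w(x, y) > 0 for y != x (hypothetical encoding).
    Requires q > 0 or a non-empty boundary, and an irreducible chain.
    Returns (parent, roots): parent[x] = y for every edge (x, y) of the forest.
    """
    rng = rng or random.Random()
    roots = set(boundary)
    covered = set(boundary)  # the current set B_i
    parent = {}
    for start in rates:  # the order of the starting points does not matter
        if start in covered:
            continue
        # Walk until killed (time T_q) or until hitting B_i. Recording only the
        # last exit from each visited state performs the loop erasure.
        last_exit = {}
        x = start
        while x not in covered:
            total = q + sum(rates[x].values())
            u = rng.random() * total
            if u < q:
                last_exit[x] = None  # killed here: x becomes a new root
                break
            u -= q
            for y, w in rates[x].items():
                if u < w:
                    break
                u -= w
            last_exit[x] = y
            x = y
        # Retrace the loop-erased path from the start and add it to the forest.
        x = start
        while x not in covered:
            covered.add(x)
            y = last_exit[x]
            if y is None:
                roots.add(x)
                break
            parent[x] = y
            x = y
    return parent, roots
```

Calling `sample_forest(rates, q)` returns a sample of \(\Phi _q\), while `sample_forest(rates, 0.0, boundary={r})` returns a spanning tree rooted at r.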

This algorithm is not only a practical way to sample \(\Phi _q\), but also a powerful tool to analyse its law, one of its main strengths being the fact that the order of the chosen starting points x does not matter.

There are at least two ways to prove that this algorithm indeed samples \(\Phi _{q, \mathcal{B}}\) with the desired law, whatever the way in which the starting points x are chosen. One can, on the one hand, follow Wilson’s original proof in [14], which makes use of the so-called Diaconis–Fulton stack representation of Markov chains (see Sect. 2.3). One can, on the other hand, follow Marchal who first computes in [10] the law of the loop-erased trajectory \(\Gamma ^x_{q, \mathcal{B}}\) obtained from the random trajectory \(X : [0, T_q \wedge T_{\mathcal{B}}] \rightarrow \mathcal{X}\) started at \(x \in \mathcal{X} {\setminus } \mathcal{B}\) and stopped in \(\mathcal{B}\) or at an exponential time \(T_q\) if \(T_q\) is smaller than the hitting time \(T_\mathcal{B}\). One has indeed:

Theorem [Marchal]

For any self-avoiding path \((x_0, x_1, \ldots , x_k) \in \mathcal{X}^{k + 1}\) such that \(x_0 = x \in \mathcal{X} {\setminus } \mathcal{B}\), it holds

$$\begin{aligned} {{\mathbb {P}}}\bigl (\Gamma ^x_{q, \mathcal{B}} = (x_0, x_1, \ldots , x_k)\bigr ) = {\left\{ \begin{array}{ll} \displaystyle \prod _{j< k} w(x_j, x_{j + 1}) {\det [q - L]_{\mathcal{X} {\setminus } (\mathcal{B} \cup \{x_0, \ldots , x_{k - 1}\})} \over \det [q - L]_{\mathcal{X} {\setminus } \mathcal{B}}} &{} \quad \hbox {if }x_k \in \mathcal{B},\\ \displaystyle q \prod _{j < k} w(x_j, x_{j + 1}) {\det [q - L]_{\mathcal{X} {\setminus } (\mathcal{B} \cup \{x_0, \ldots , x_k\})} \over \det [q - L]_{\mathcal{X} {\setminus } \mathcal{B}}} &{}\quad \hbox {if }x_k \not \in \mathcal{B}. \end{array}\right. } \end{aligned}$$

From this result, one can compute the law of \(\Phi _{q, \mathcal{B}}\) defined by the previous algorithm to find, after telescopic cancellation,

$$\begin{aligned} {{\mathbb {P}}}\bigl (\Phi _{q, \mathcal{B}} = \phi \bigr ) = {w(\phi ) q^{|\rho (\phi )| - |\mathcal{B}|} \over \det [q - L]_{\mathcal{X} {\setminus } \mathcal{B}}} {{\mathbb {1}}}_{\{\mathcal{B} \subset \rho (\phi )\}}, \qquad \phi \in \mathcal{F}. \end{aligned}$$
(5)

We can then identify the law of \(|\rho (\Phi _{q, \mathcal{B}})|\) in terms of the eigenvalues \(\lambda _{0, \mathcal{B}}\), \(\lambda _{1, \mathcal{B}},\ldots , \lambda _{l - 1, \mathcal{B}}\), with \(l = |\mathcal{X} {\setminus } \mathcal{B}|\), of \(-[L]_{\mathcal{X} {\setminus } \mathcal{B}}\). Let us write to this end

$$\begin{aligned} J = \{0, 1, 2, \ldots , l - 1\} = J_0 \cup J_+ \cup J_- \end{aligned}$$

with

$$\begin{aligned} J_0= & {} \bigl \{j \in J : \lambda _{j, \mathcal{B}} \in {{\mathbb {R}}}\bigr \},\qquad J_+ = \bigl \{j \in J : \mathrm{Im}(\lambda _{j, \mathcal{B}}) > 0\big \},\\ J_-= & {} \bigl \{j \in J : \mathrm{Im}(\lambda _{j, \mathcal{B}}) < 0\big \}, \end{aligned}$$

let us set

$$\begin{aligned} p_j = {q \over q + \lambda _{j, \mathcal{B}}},\quad j \in J, \end{aligned}$$

and define independent random variables \(B_j\), with j in \(J_0\), and \(C_j\), with j in \(J_+\), such that the \(B_j\)s follow Bernoulli laws and the \(C_j\)s follow convolutions of conjugated “complex Bernoulli laws”:

$$\begin{aligned} {{\mathbb {P}}}(B_j = 1)= & {} p_j, \quad {{\mathbb {P}}}(B_j = 0) = 1 - p_j, \quad j \in J_0\\ {{\mathbb {P}}}(C_j = 2)= & {} p_j{\bar{p}}_j, \quad {{\mathbb {P}}}(C_j = 1) = p_j(1 - {\bar{p}}_j) + {\bar{p}}_j(1 - p_j),\\ {{\mathbb {P}}}(C_j = 0)= & {} (1 - p_j)(1 - {\bar{p}}_j), \quad \mathrm{i.e.}\\ {{\mathbb {P}}}(C_j = 2)= & {} |p_j|^2, \quad {{\mathbb {P}}}(C_j = 1) = 2 \mathrm{Re}(p_j) - 2|p_j|^2,\\ {{\mathbb {P}}}(C_j = 0)= & {} 1 - 2\mathrm{Re}(p_j) + |p_j|^2, \quad j \in J_+. \end{aligned}$$

Note that the previous equations indeed define probability laws for the \(C_j\)s as soon as \(2\mathrm{Re}(p_j) \ge 2|p_j|^2\) for j in \(J_+\). This is equivalent to

$$\begin{aligned} {q \over q + \lambda _{j, \mathcal{B}}} + {q \over q + {\bar{\lambda }}_{j, \mathcal{B}}} \ge {2q^2 \over |q + \lambda _{j, \mathcal{B}}|^2} \quad \Leftrightarrow \quad 2q^2 + 2q\mathrm{Re}(\lambda _{j, \mathcal{B}}) \ge 2q^2 \end{aligned}$$

and ensured by the fact that the eigenvalues have non-negative real part. We eventually set

$$\begin{aligned} S_{q, \mathcal{B}} = |\mathcal{B}| + \sum _{j \in J_0} B_j + \sum _{j \in J_+} C_j. \end{aligned}$$
(6)

Proposition 2.1

\(Z_\mathcal{B}\) is the characteristic polynomial of \([L]_{\mathcal{X} {\setminus } \mathcal{B}}\), i.e.

$$\begin{aligned} \sum _{\phi : \rho (\phi ) \supset \mathcal{B}} w(\phi ) q^{|\rho (\phi )| - |\mathcal{B}|} = \det (q - [L]_{\mathcal{X} {\setminus } \mathcal{B}}) = \prod _{j \in J}(q + \lambda _{j, \mathcal{B}}), \quad q \in {{\mathbb {R}}}, \end{aligned}$$
(7)

and \(|\rho (\Phi _{q, \mathcal{B}})|\) has the same law as \(S_{q, \mathcal{B}}\) for \(q > 0\) or \(\mathcal{B} \ne \emptyset \).

Proof

Equation (5) allows us to identify \(Z_\mathcal{B}\) by summing on \(\phi \) in \(\mathcal{F}\). The identity in law is obtained by identifying monomials with equal degree in Eq. (7) and by dividing each identity by \(Z_\mathcal{B}(q)\) to get, for any \(k \le l = |\mathcal{X} {\setminus } \mathcal{B}|\),

$$\begin{aligned} {{\mathbb {P}}}\bigl (|\rho (\Phi _{q, \mathcal{B}})| = |\mathcal{B}| + k\bigr ) = \sum _{\begin{array}{c} \scriptstyle I \subset J: \\ \scriptstyle |I| = k \end{array}} \prod _{i \in I} {q \over q + \lambda _{i, \mathcal{B}}} \prod _{i \in J {\setminus } I} {\lambda _{i, \mathcal{B}} \over q + \lambda _{i, \mathcal{B}}} = \sum _{\begin{array}{c} \scriptstyle I \subset J: \\ \scriptstyle |I| = k \end{array}} \prod _{i \in I} p_i \prod _{i \in J {\setminus } I} \bigl (1 - p_i\bigr ). \end{aligned}$$

Since for each j in \(J_+\) there is \(j'\) in \(J_-\) such that \(\lambda _{j', \mathcal{B}} = {\bar{\lambda }}_{j, \mathcal{B}}\) and \(p_{j'} = {\bar{p}}_j\), the proof is complete. \(\square \)
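
Equation (7) with \(\mathcal{B} = \emptyset \) can be checked by brute force on a very small graph. The sketch below is only an illustration under stated assumptions: the helper names (`forests`, `root_count_polynomial`, `char_poly_at`) and the dictionary encoding of the rates are hypothetical, and the enumeration is exponential in n, so it is meant for sanity checks only.

```python
from fractions import Fraction
from itertools import product


def det(m):
    """Determinant by cofactor expansion (fine for the tiny matrices used here)."""
    if not m:
        return Fraction(1)
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def leads_to_root(x, parent):
    """Follow the edges from x; without cycle, a root is reached in at most len(parent) steps."""
    for _ in range(len(parent) + 1):
        if x not in parent:
            return True
        x = parent[x]
    return False


def forests(rates):
    """Yield each rooted spanning forest as a dict x -> y (roots have no entry)."""
    vertices = list(rates)
    options = [[None] + list(rates[x]) for x in vertices]
    for pick in product(*options):
        parent = {x: y for x, y in zip(vertices, pick) if y is not None}
        if all(leads_to_root(x, parent) for x in vertices):
            yield parent


def root_count_polynomial(rates):
    """Coefficients c[k] = sum of w(phi) over the forests phi with exactly k roots."""
    n = len(rates)
    coeffs = [Fraction(0)] * (n + 1)
    for parent in forests(rates):
        weight = Fraction(1)
        for x, y in parent.items():
            weight *= rates[x][y]
        coeffs[n - len(parent)] += weight
    return coeffs


def char_poly_at(rates, q):
    """det(q - L) for the generator L built from the rates."""
    vertices = list(rates)
    matrix = [
        [q + sum(rates[x].values()) if x == y else -Fraction(rates[x].get(y, 0))
         for y in vertices]
        for x in vertices
    ]
    return det(matrix)
```

For the three-point example `rates = {0: {1: 1, 2: 2}, 1: {0: 3}, 2: {1: 5}}`, the enumeration gives \(Z(q) = q^3 + 11 q^2 + 36 q\), which coincides with \(\det (q - L)\); dividing each monomial by \(Z(q)\) then gives the law of the number of roots.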

The fact that \(\Phi _q\) is the usual random spanning tree on an extended graph implies, through the (non-reversible) transfer current theorem (cf. [4] and [5]), that \(\Phi _{q, \mathcal{B}} \subset \mathcal{E}\) is a determinantal process and so is \(\rho (\Phi _{q, \mathcal{B}}) \subset \mathcal{X}\). (In the reversible case at least, the fact that the law of \(|\rho (\Phi _{q, \mathcal{B}})|\) is a convolution of Bernoulli laws is also a consequence of this determinantality property.) Let us give a direct and short proof of the fact that \(\rho (\Phi _{q, \mathcal{B}})\) is a determinantal process associated with a remarkable kernel.

Proposition 2.2

For any \(\mathcal{A} \subset \mathcal{X}\)

$$\begin{aligned} {{\mathbb {P}}}\bigl (\mathcal{A} \subset \rho (\Phi _{q, \mathcal{B}})\bigr ) = \det [K_{q, \mathcal{B}}]_\mathcal{A} \end{aligned}$$

with

$$\begin{aligned} K_{q, \mathcal{B}}(x, y) = P_x\bigl (X(T_q \wedge T_\mathcal{B}) = y\bigr ), \quad x, y \in \mathcal{X}. \end{aligned}$$

Proof

Since for any \(x \in \mathcal{B}\) we have \(K_{q, \mathcal{B}} (x, \cdot ) = \delta _x\), it holds \(\det [K_{q, \mathcal{B}}]_\mathcal{A} = \det [K_{q, \mathcal{B}}]_{\mathcal{A} {\setminus } \mathcal{B}}\) and we can assume without loss of generality that \(\mathcal{A} \subset \mathcal{X} {\setminus } \mathcal{B}\). By sampling \(\Phi _{q, \mathcal{B}}\) with Wilson’s algorithm and choosing as first starting points for the loop-erased random walks the elements of \(\mathcal{A}\), we have by Marchal’s formula and after telescopic cancellation

$$\begin{aligned} {{\mathbb {P}}}\bigl (\mathcal{A} \subset \rho (\Phi _{q, \mathcal{B}})\bigr )= & {} q^{|\mathcal{A}|} {\det [q - L]_{(\mathcal{X} {\setminus } \mathcal{B}) {\setminus } \mathcal{A}} \over \det [q - L]_{\mathcal{X} {\setminus } \mathcal{B}}} = q^{|\mathcal{A}|} \det \bigl [[q - L]_{\mathcal{X} {\setminus } \mathcal{B}}^{-1}\bigr ]_\mathcal{A}\\= & {} \det \bigl [q [q - L]_{\mathcal{X} {\setminus } \mathcal{B}}^{-1}\bigr ]_\mathcal{A}. \end{aligned}$$

The last but one equality, sometimes referred to as Jacobi’s equality, is obtained from standard manipulations of the Schur complement

$$\begin{aligned} S = A-BD^{-1}C \end{aligned}$$

of the block D in a \(2 \times 2\) block matrix

$$\begin{aligned} M = \begin{pmatrix} A &{}\quad B \\ C &{}\quad D \end{pmatrix} = \begin{pmatrix} {{\mathbb {1}}} &{}\quad BD^{-1} \\ 0 &{}\quad {{\mathbb {1}}} \end{pmatrix} \begin{pmatrix} S &{}\quad 0 \\ 0 &{}\quad D \end{pmatrix} \begin{pmatrix} {{\mathbb {1}}} &{}\quad 0 \\ D^{-1}C &{}\quad {{\mathbb {1}}} \end{pmatrix}. \end{aligned}$$

This formula gives \(\det M = \det S \det D\) and identifies \(S^{-1}\) as one block of

$$\begin{aligned} M^{-1} = \begin{pmatrix} S^{-1} &{}\quad -S^{-1}BD^{-1} \\ -D^{-1}CS^{-1} &{}\quad D^{-1} + D^{-1}CS^{-1}BD^{-1} \end{pmatrix}, \end{aligned}$$

the determinant of which is \(\det S^{-1} = \det D / \det M\). The previous equality was obtained by taking \(M = [q - L]_{\mathcal{X} {\setminus } \mathcal{B}}\) and \(D = [q - L]_{(\mathcal{X} {\setminus } \mathcal{B}) {\setminus } \mathcal{A}}\) (so that S is the sub-Markovian generator of the trace on \(\mathcal{A}\) of the original process killed in \(\mathcal{B}\) and at rate q outside \(\mathcal{B}\), while \(S^{-1}\) is the associated Green’s kernel).

Let us express \(K_{q, \mathcal{B}}\) in terms of L to conclude this proof. The trajectory of X can be built by updating at each time of a Poisson process of intensity \(\alpha \), defined in Eq. (2), the current position \(x \in \mathcal{X}\) to \(y \in \mathcal{X}\) with probability

$$\begin{aligned} P(x, y) = w(x, y)/\alpha , \end{aligned}$$
(8)

with the convention

$$\begin{aligned} w(x, x) = \alpha - w(x). \end{aligned}$$
(9)

Since the probability of reaching the time \(T_q\) between two successive updates is \(q/(q + \alpha )\), we get for any x and y in \(\mathcal{X} {\setminus } \mathcal{B}\)

$$\begin{aligned} P_x\bigl (X(T_q \wedge T_\mathcal{B}) = y\bigr )&= \sum _{k \ge 0} {q \over q + \alpha }\left( 1 - {q \over q + \alpha }\right) ^k [P]_{\mathcal{X} {\setminus } \mathcal{B}}^k(x, y)\\&= {q \over q + \alpha } \left( 1 - {\alpha \over q + \alpha }[P]_{\mathcal{X} \setminus \mathcal{B}}\right) ^{-1}(x, y)\\&= q \bigl (q - \alpha [P - 1]_{\mathcal{X} \setminus \mathcal{B}}\bigr )^{-1}(x, y) = q [q - L]_{\mathcal{X} \setminus \mathcal{B}}^{-1}(x, y). \end{aligned}$$

And for any \(\mathcal{A} \subset \mathcal{X} {\setminus } \mathcal{B}\) it holds \([K_{q, \mathcal{B}}]_\mathcal{A} = \bigl [q[q - L]_{\mathcal{X} {\setminus } \mathcal{B}}^{-1}\bigr ]_\mathcal{A}\). \(\square \)
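
For a concrete instance (an illustrative two-state chain whose rate names a and b are not part of the original text), take \(\mathcal{X} = \{1, 2\}\), \(w(1, 2) = a > 0\), \(w(2, 1) = b > 0\) and \(\mathcal{B} = \emptyset \). Then

$$\begin{aligned} q - L = \begin{pmatrix} q + a & -a \\ -b & q + b \end{pmatrix}, \qquad K_{q, \emptyset } = q (q - L)^{-1} = {1 \over q + a + b} \begin{pmatrix} q + b & a \\ b & q + a \end{pmatrix}. \end{aligned}$$

The rows of \(K_{q, \emptyset }\) sum to one, as they must for a probability kernel. The diagonal entry \(K_{q, \emptyset }(1, 1) = (q + b) / (q + a + b)\) is the total probability \((q^2 + bq) / (q^2 + (a + b) q)\) of the two forests \(\emptyset \) and \(\{(2, 1)\}\) in which 1 is a root, and \(\det K_{q, \emptyset } = q / (q + a + b)\) is the probability \(q^2 / (q^2 + (a + b) q)\) of the empty forest, the only one having both points as roots.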

Fig. 1

A sample of \(\mathcal{P}(\Phi _q)\) and \(\rho (\Phi _q)\) with 50 roots on the \(987 \times 610\) rectangular grid and for the Metropolis random walk at inverse temperature \(\beta = .06\) in a Brownian sheet potential V, i.e. such that nearest-neighbours rates are given by \(w(x, y) = \exp \{-\beta [V(y) - V(x)]_+\}\) with V being the grid restriction of a Brownian sheet with 0 value on the north and west sides of the box. This random walk is reversible with respect to \(\exp \{-\beta V\}\). The cyan lines separate neighbouring trees, the roots are at the centre of the red diamonds, and blue levels depend on the potential: the darker the blue, the lower the potential (Color figure online)

We conclude this section by observing that the Markov chain tree theorem (see, for example, [1]) allows us to compute the root distribution when conditioning on \(\mathcal{P}(\Phi _q)\), the partition of \(\mathcal{X}\) that is associated with \(\Phi _q\). We write \(\mathcal{P}(\Phi _q) = [\mathcal{X}_1, \ldots , \mathcal{X}_m]\) if \(\Phi _q\) is made of m trees, each of them spanning one of the \(\mathcal{X}_i\)s. For each \(i \le m\), we denote by \(L_i\) the generator of X restricted to \(\mathcal{X}_i\), which is defined by

$$\begin{aligned} (L_i f)(x) = \sum _{y \in \mathcal{X}_i} w(x, y)\bigl [f(y) - f(x)\bigr ], \quad x \in \mathcal{X}_i, \quad f: \mathcal{X}_i \rightarrow {{\mathbb {C}}}, \end{aligned}$$

and by \(X_i\) this restricted process. Since, by construction, the root of the spanning tree of \(\mathcal{X}_i\) is reachable by \(X_i\) from any point in \(\mathcal{X}_i\), this process admits only one invariant measure \(\mu _i\), which is equal to \(\mu (\cdot |\mathcal{X}_i)\) (recall that \(\mu \) is the invariant measure of X) when X is reversible. If \([\mathcal{X}_1, \ldots , \mathcal{X}_m]\) is an admissible partition of \(\mathcal{X}\), that is if \(\mathcal{P}(\Phi _q) = [\mathcal{X}_1, \ldots , \mathcal{X}_m]\) with nonzero probability, then, denoting by \(\mathcal{T}_i\) the set of spanning trees of \(\mathcal{X}_i\) and by \(\rho (\tau _i)\) the root \(x_i \in \mathcal{X}_i\) of \(\tau _i \in \mathcal{T}_i\), we can compute for any \((x_1, \ldots , x_m)\) in \(\mathcal{X}_1\times \cdots \times \mathcal{X}_m\)

$$\begin{aligned}&{{\mathbb {P}}}\Bigl (\rho (\Phi _q) = \{x_1, \ldots , x_m\} \Bigm | \mathcal{P}(\Phi _q) = [\mathcal{X}_1, \ldots , \mathcal{X}_m]\Bigr )\\&\quad = {q^m \sum _{\tau _1 \in \mathcal{T}_1} \cdots \sum _{\tau _m \in \mathcal{T}_m} \prod _{i = 1}^m w(\tau _i) {{\mathbb {1}}}_{\{\rho (\tau _i) = x_i\}} \over q^m \sum _{\tau _1 \in \mathcal{T}_1} \cdots \sum _{\tau _m \in \mathcal{T}_m} \prod _{i = 1}^m w(\tau _i)} \\&\quad = \prod _{i = 1}^m {\sum _{\tau _i \in \mathcal{T}_i} w(\tau _i) {{\mathbb {1}}}_{\{\rho (\tau _i) = x_i\}} \over \sum _{\tau _i \in \mathcal{T}_i} w(\tau _i)}. \end{aligned}$$

The Markov chain tree theorem then gives

Proposition 2.3

For any admissible partition \([\mathcal{X}_1, \ldots , \mathcal{X}_m]\) and any \((x_1, \ldots , x_m)\) in \(\mathcal{X}_1 \times \cdots \times \mathcal{X}_m\), it holds

$$\begin{aligned} {{\mathbb {P}}}\Bigl (\rho (\Phi _q) = \{x_1, \ldots , x_m\} \Bigm | \mathcal{P}(\Phi _q) = [\mathcal{X}_1, \ldots , \mathcal{X}_m]\Bigr ) = \prod _{i = 1}^m \mu _i(x_i). \end{aligned}$$

See Fig. 1 for an illustration with the two-dimensional nearest-neighbour random walk in a Brownian sheet potential, which is easy to sample and gives rise to a rich and anisotropic energy landscape.

2.2 Sampling Approximately m Roots

While Wilson’s algorithm provides a practical way to sample \(\Phi _q\), we do not have such an algorithm for \(\Phi _q\) conditioned on \(\bigl \{|\rho (\Phi _q)| = m\bigr \}\). In this section, we explain how to get \(\Phi _q\) with approximately m roots, with an error of order \(\sqrt{m}\) at most. By Proposition 2.1, it suffices to choose q as the solution of

$$\begin{aligned} \sum _{j < n} {q \over q + \lambda _j} = m. \end{aligned}$$
(10)

Indeed,

$$\begin{aligned} {{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ] = \sum _{j \in J_0} {{\mathbb {E}}}[B_j] + \sum _{j \in J_+} {{\mathbb {E}}}[C_j] = \sum _{j \in J} {q \over q + \lambda _j}, \end{aligned}$$

while

$$\begin{aligned} \mathrm{Var}(B_j)= & {} p_j(1 - p_j) \le p_j = {{\mathbb {E}}}[B_j], \quad j \in J_0 \\ \mathrm{Var}(C_j)= & {} 2\mathrm{Re}(p_j) + 2|p_j|^2 - 4\mathrm{Re}(p_j)^2 \le 4\mathrm{Re}(p_j) = 2{{\mathbb {E}}}[C_j], \quad j \in J_+, \end{aligned}$$

so that \(\mathrm{Var}\bigl (|\rho (\Phi _q)|\bigr ) \le 2 {{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ]\). But we do not want to solve Eq. (10) analytically, and we do not want to compute the eigenvalues \(\lambda _j\). One way to find an approximate value of the solution \(q^*\) of Eq. (10) is to use, on the one hand, the fact that \(q^*\) is the unique stable attractor of the recursive sequence defined by \(q_{k + 1} = f(q_k)\) with

$$\begin{aligned} f : q > 0 \mapsto q \times {m \over \sum _{j< n} {q \over q + \lambda _j}} = {m \over \sum _{j < n} {1 \over q + \lambda _j}}, \end{aligned}$$

on the other hand, the fact that \(|\rho (\Phi _q)|\) and \({{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ] = \sum _{j < n} q / (q + \lambda _j)\) are typically of the same order, at least when \({{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ]\), i.e. q, is large enough, since \(\mathrm{Var}\bigl (|\rho (\Phi _q)|\bigr ) / {{\mathbb {E}}}^2\bigl [|\rho (\Phi _q)|\bigr ] \le 2 / {{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ]\). We then propose the following algorithm to sample \(\Phi _q\) with \(m \pm 2\sqrt{m}\) roots.

  a. Start from any \(q_0 > 0\), for example \(q_0 = \alpha = \max _{x \in \mathcal{X}} w(x)\), and set \(i = 0\).

  b. Sample \(\Phi _{q_i}\) with Wilson’s algorithm.

  c. If \(|\rho (\Phi _{q_i})| \not \in \bigl [m - 2 \sqrt{m}, m + 2 \sqrt{m}\bigr ]\), set \(q_{i + 1} = m q_i / |\rho (\Phi _{q_i})|\) and repeat b–c with \(i + 1\) instead of i; if \(|\rho (\Phi _{q_i})| \in \bigl [m - 2 \sqrt{m}, m + 2 \sqrt{m}\bigr ]\), return \(\Phi _{q_i}\).
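
The loop a to c can be sketched in Python as follows. This is a hedged sketch, not the authors' code: `tune_q` and `bernoulli_root_count` are hypothetical names, and the forest sampler is abstracted as a callable returning the number of roots. To keep the example self-contained, the stand-in sampler draws the number of roots from the sum of independent Bernoulli variables of Proposition 2.1, which assumes that the eigenvalues are real and known; in practice one would plug in a Wilson sampler and return the forest itself.

```python
import math
import random


def tune_q(sample_root_count, m, q0, max_iter=1000):
    """Steps a-c: rescale q until the sampled number of roots is within 2*sqrt(m) of m.

    sample_root_count(q) must return |rho(Phi_q)| for a fresh sample of Phi_q
    (for instance the size of the root set returned by a Wilson sampler).
    Returns the accepted value of q and the corresponding number of roots.
    """
    q = q0
    for _ in range(max_iter):
        count = sample_root_count(q)
        if abs(count - m) <= 2 * math.sqrt(m):
            return q, count
        q = m * q / count  # step c: q_{i+1} = m q_i / |rho(Phi_{q_i})|
    raise RuntimeError("no acceptable sample within max_iter iterations")


def bernoulli_root_count(eigenvalues, rng):
    """Stand-in sampler (assumption: real eigenvalues of -L, including 0).

    By Proposition 2.1, |rho(Phi_q)| is then distributed as a sum of
    independent Bernoulli(q / (q + lambda_j)) random variables.
    """
    def sample(q):
        return sum(rng.random() < q / (q + lam) for lam in eigenvalues)
    return sample
```

Since a forest always has at least one root, the division in step c is well defined; in the stand-in sampler this corresponds to the zero eigenvalue, whose Bernoulli variable is always equal to 1.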

To see that this algorithm rapidly produces the desired result, it is convenient to write \(\gamma = \ln q\) and introduce the global contraction

$$\begin{aligned} g : \gamma \in {{\mathbb {R}}} \mapsto \ln f(e^\gamma ). \end{aligned}$$

While f is a contraction in a neighbourhood of \(q^*\) only, let us show that g is indeed a global contraction. For all \(\gamma \in {{\mathbb {R}}}\), it holds

$$\begin{aligned} g'(\gamma ) = {\sum _{j< n} \left( {q \over q + \lambda _j}\right) ^2 \over \sum _{j < n} {q \over q + \lambda _j}} = {\sum _{j \in J_0} \left( {q \over q + \lambda _j}\right) ^2 + 2 \sum _{j \in J_+} \left[ \mathrm{Re}^2\left( {q \over q + \lambda _j}\right) - \mathrm{Im}^2\left( {q \over q + \lambda _j}\right) \right] \over \sum _{j \in J_0} {q \over q + \lambda _j} + 2 \sum _{j \in J_+} \mathrm{Re}\left( {q \over q + \lambda _j}\right) }. \end{aligned}$$

With \(\alpha _j = \mathrm{Re}(\lambda _j) > 0\) and \(\beta _j = \mathrm{Im}(\lambda _j) > 0\) for j in \(J_+\), we have

$$\begin{aligned} 0< \mathrm{Re}\left( {q \over q + \lambda _j}\right) = {q (q + \alpha _j) \over (q + \alpha _j)^2 + \beta _j^2} < 1 \qquad \mathrm{and}\qquad \mathrm{Im}\left( {q \over q + \lambda _j}\right) = {- q \beta _j \over (q + \alpha _j)^2 + \beta _j^2}, \end{aligned}$$

so that

$$\begin{aligned} \mathrm{Re}^2\left( {q \over q + \lambda _j}\right) \vee \mathrm{Im}^2\left( {q \over q + \lambda _j}\right) < \mathrm{Re}\left( {q \over q + \lambda _j}\right) , \end{aligned}$$

since \(0< \mathrm{Re}(q / (q + \lambda _j)) < 1\) and

$$\begin{aligned} {\mathrm{Im}^2\left( {q \over q + \lambda _j}\right) \over \mathrm{Re}\left( {q \over q + \lambda _j}\right) } = {q \beta _j^2 \over (q + \alpha _j)((q + \alpha _j)^2 + \beta _j^2)} < 1. \end{aligned}$$

Then, \(g'(\gamma )\) being the difference between two non-negative terms that are strictly smaller than 1, we have \(|g'(\gamma )| < 1\), for all \(\gamma \) in \({{\mathbb {R}}}\). Now, writing for \(k \ge 0\)
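The contraction property can be checked numerically on a case where the spectrum is known in closed form. The sketch below assumes, purely for illustration, the walk that jumps at rate 1 from x to x + 1 on a cycle of n sites, for which the eigenvalues of \(-L\) are \(1 - e^{2 i \pi k / n}\); the function names are hypothetical. It iterates f to approximate \(q^*\) and evaluates \(g'\) on a grid of values of \(\gamma \).

```python
import cmath
import math

def directed_cycle_spectrum(n):
    """Eigenvalues of -L for the rate-1 walk x -> x + 1 (mod n)."""
    return [1 - cmath.exp(2j * math.pi * k / n) for k in range(n)]

def expected_roots(q, lam):
    """sum_j q / (q + lambda_j); imaginary parts cancel over conjugate pairs."""
    return sum(q / (q + l) for l in lam).real

def f(q, m, lam):
    """One step of the recursion q_{k+1} = f(q_k)."""
    return m * q / expected_roots(q, lam)

def g_prime(gamma, lam):
    """Derivative of g(gamma) = ln f(exp(gamma)), from the formula above."""
    q = math.exp(gamma)
    numerator = sum((q / (q + l)) ** 2 for l in lam).real
    return numerator / expected_roots(q, lam)

lam = directed_cycle_spectrum(12)
m = 4
q_star = 1.0
for _ in range(200):
    q_star = f(q_star, m, lam)

slopes = [g_prime(0.5 * k, lam) for k in range(-20, 21)]
print(q_star, expected_roots(q_star, lam), max(abs(s) for s in slopes))
```

On this example the slopes stay strictly inside \((-1, 1)\) but approach 1 for very small and very large q, which is consistent with the absence of a uniform bound on \(|g'|\).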

$$\begin{aligned} \epsilon _k = \bigl |\ln (1 + \delta _k)\bigr | = \left| \ln \left( {|\rho (\Phi _{q_k})| \over {{\mathbb {E}}}\bigl [|\rho (\Phi _{q_k})|\bigr ]} \right) \right| \quad \mathrm{and}\quad \gamma _{k + 1} = \ln q_{k + 1} = g(\gamma _k) - \ln (1 + \delta _k), \end{aligned}$$

there are some non-negative \(\theta _k < 1\) such that, with \(\gamma ^* = \ln q^* = g(\gamma ^*)\),

$$\begin{aligned} \bigl |\gamma _{k + 1} - \gamma ^*\bigr | \le \theta _k \bigl |\gamma _k - \gamma ^*\bigr | + \epsilon _k, \end{aligned}$$

and, by induction on \(k > 0\),

$$\begin{aligned} \bigl |\gamma _k - \gamma ^*\bigr |\le & {} \theta _{k - 1}\cdots \theta _0 \bigl |\gamma _0 - \gamma ^*\bigr | + \theta _{k - 1} \cdots \theta _1\epsilon _0 + \theta _{k - 1} \cdots \theta _2\epsilon _1\\&+ \cdots + \theta _{k - 1}\epsilon _{k-2} + \epsilon _{k-1}. \end{aligned}$$

Since Chebyshev’s inequality gives for all \(\delta > 0\)

$$\begin{aligned} {{\mathbb {P}}}\bigl (|\delta _k| > \delta \bigr ) \le {\mathrm{Var}\bigl (|\rho (\Phi _{q_k})|\bigr ) \over \delta ^2 {{\mathbb {E}}}^2\bigl [|\rho (\Phi _{q_k})|\bigr ]} \le {2 \over \delta ^2 {{\mathbb {E}}}\bigl [|\rho (\Phi _{q_k})|\bigr ]}, \end{aligned}$$

after “a few iterations” (we cannot be more precise in the absence of extra information on the spectrum of L, which would be needed to give uniform bounds on \(|g'|\) and the \(\theta _k\)) the approximation error for \(\gamma ^*\) is of the same order as \(\epsilon _{k - 1}\), itself of order \(1/\sqrt{m}\) at most, and we get m roots for \(\Phi _q\) within an error of order \(\sqrt{m}\).

2.3 Coupled Forests

Instead of stopping the iterations of the previous algorithm when reaching a forest with \(m \pm 2\sqrt{m}\) roots, one can proceed up to reaching exactly m roots. This typically requires order \(\sqrt{m}\) extra iterations at most, and this is what we have done to obtain exactly 50 roots in Fig. 1. Starting with \(q = q_0\) larger than the solution \(q^*\) of Eq. (10), it generally takes much more time to reach exactly m roots than to decrease q down to a good approximation of \(q^*\) according to the updating procedure \(q \leftarrow q \times m / |\rho (\Phi _q)|\). For example, starting from \(q = 4\) for the Metropolis random walk in Brownian sheet potential of Fig. 1, we got 361,782 roots at the first iteration, 51 roots and \(q = 5.26 \times 10^{-6}\) at the tenth iteration, and we needed 55 extra iterations to get exactly 50 roots with \(q = 4.92 \times 10^{-6}\), getting in the meantime root numbers oscillating between 43 and 59 for q between \(3.96 \times 10^{-6}\) and \(6.07 \times 10^{-6}\). While decreasing q, we produce a number of forests with a larger root number than desired, and, sampling for large q being less time-consuming than sampling for small q, the total running time of the iterations to decrease q to the correct order is essentially of the same order as the running time of one iteration for q of this correct order. This suggests that if we could continuously decrease q in such a way that \(\Phi _q\) would cross all the manifolds

$$\begin{aligned} \mathcal{F}_m = \bigl \{\phi \in \mathcal{F} : |\rho (\phi )| = m\bigr \}, \qquad m \le n, \end{aligned}$$
(11)

then we might be able to find a more efficient algorithm to sample \(\Phi _q\) with a prescribed root number. It turns out that we are able to implement such a “continuously decreasing q algorithm,” building in this way the coupling of Theorem 2. But this is not sufficient to improve our sampling algorithm for a prescribed root number.

In this section, we prove Theorem 2, characterize the associated root process and describe the associated coalescence and fragmentation process, which leads to further open questions. This coupling is the natural extension of Wilson’s algorithm based on Diaconis and Fulton’s stack representation of random walk (cf. [6]) as used by Wilson and Propp in [14] and [13].

Stack representations. Assume that an infinite list or collection of arrows is attached to each site of the graph, each arrow pointing towards one of its neighbours. Assume in addition that these arrows are distributed according to the probability kernel P of the discrete-time skeleton of X, which is defined by Eqs. (8)–(9). Assume in other words that these arrows are independently distributed at each level of the stacks and that an arrow pointing towards the neighbour y of a given site x appears with probability \(P(x, y)\), considering in this context x itself as one of its neighbours. Imagine finally that each list of arrows attached to any site is piled down in such a way that it makes sense to talk of an infinite stack with an arrow on the top of this stack. By using this representation, one can generate the Markov process as follows: at each jump time of a Poisson process with intensity \(\alpha \), our walker steps to the neighbour pointed to by the arrow at the top of the stack where it was sitting, and the top arrow is erased from this stack.

To describe Wilson’s algorithm for sampling \(\Phi _q\), one has to introduce a further ingredient: pointers to an absorbing state r in each stack. Such a pointer should independently appear with probability \(q/(q + \alpha )\) at each level in the different stacks. One way to introduce it is by generating independent uniform random variables U together with each original arrow in the stacks. We can then replace the latter by a pointer to the absorbing state whenever \(U < q / (q + \alpha )\). A possible description of Wilson’s algorithm is then the following.

  a. Start with a particle on each site. Both particles and sites will be declared either active or frozen. At the beginning, all sites and particles are declared to be active.

  b. Choose an arbitrary particle among all the active ones and look at the arrow at the top of the stack it is seated on. Call x the site where the particle is seated.

    • If the arrow is the pointer to r, declare the particle to be frozen and site x as well.

    • If the arrow points towards another site \(y \ne x\), remove the particle and keep the arrow. We say that this arrow is uncovered.

    • If the arrow points to x itself, remove the arrow.

  c. Once again, choose an arbitrary particle among all the active ones, look at the arrow on the top of the stack it is seated on, and call x the site where the particle is seated.

    • If the arrow points to r, the particle is declared to be frozen, and so are declared x and all the sites eventually leading to x by following uncovered top pile arrow paths.

    • If the arrow points to a frozen site, remove the chosen particle at x, keep the (now uncovered) arrow, and freeze the site x as well as any site eventually leading to x by following uncovered top pile arrow paths.

    • If the arrow points to an active site, then there are two possibilities. By following from this site the uncovered arrows at the top of the stacks, we either reach a different active particle or run in a loop back to x. In the former case, remove the chosen particle from site x and keep the discovered arrow. In the latter case, erase all the arrows along the loop and put an active particle on each site of the loop. Note that this last case includes the possibility for the discovered arrow of pointing to x itself, in which case we just have to remove the discovered arrow.

  d. Repeat the previous step up to exhaustion of the active particles.

The crucial observation, which is due to Propp and Wilson, is that whatever the choice of active particles all along the algorithm, at the end of the day the same arrows are erased and the same spanning forest of uncovered arrows, with a frozen particle at each root, is obtained. In particular, by choosing at each step the last encountered active particle, or the same as in the previous step when we just erased a loop, we perform a simple loop-erased random walk up to freezing.

Proof of Theorem 2

Since \(\Phi _q\) is sampled for any q by the previously described algorithm and the same uniform variables U can be used for each q, this provides a global coupling for all the \(\Phi _q\). We first note that this coupling allows one to sample \(\Phi _{q_2}\) from a sampled \(\Phi _{q_1}\) for \(q_2 < q_1\). Indeed, by running this algorithm for sampling \(\Phi _{q_2}\), one can reach at some point the spanning forest of uncovered arrows \(\Phi _{q_1}\), with the difference that the frozen particles of the final configuration obtained with parameter \(q_1\) can be still active at this intermediate step of the algorithm run with \(q_2\): it suffices to choose the sequence of active particles in the same way with both parameters, and this is possible since each pointer to r in the stacks with parameter \(q_2\) is associated with a pointer to r at the same level in the stacks with parameter \(q_1\). Thus, to sample \(\Phi _{q_2}\) from a sampled \(\Phi _{q_1}\), we just have to unfreeze some of the frozen particles in \(\rho (\Phi _{q_1})\) and continue the algorithm with parameter \(q_2\). To decide which particles have to be unfrozen we can proceed as follows. With probability

$$\begin{aligned} p = {{\mathbb {P}}}\biggl (U< {q_2 \over q_2 + \alpha } \biggm | U < {q_1 \over q_1 + \alpha }\biggr ) = {q_2(q_1 + \alpha ) \over q_1(q_2 + \alpha )} \end{aligned}$$
(12)

each particle in \(\rho (\Phi _{q_1})\), independently from each other, is kept frozen. With probability \(1 - p\) a particle in a site x of \(\rho (\Phi _{q_1})\) is declared active and we set at the top of the pile in x an arrow that points towards y with probability \(P(x, y) = w(x, y) / \alpha \).

When \(q = 1 / t\) continuously decreases, we obtain a right-continuous process \(t \mapsto \Phi _{1 / t}\), for which we can practically sample not only the “finite dimensional distributions”—i.e. the law of \((\Phi _{1 / t_1}, \ldots , \Phi _{1 / t_k})\) for any choice of \(t_1< \cdots < t_k\)—but the whole trajectories \((\Phi _{1 / t} : t \le t^*)\) too, for any finite \(t^*\). Indeed, at each time \(t = 1 / q\), the next frozen particle to become active is uniformly distributed among the m roots at time t, and the next jump time T when it will “wake up” is such that the random variable

$$\begin{aligned} V = {1 / T \over \alpha + 1 / T} = {1 \over 1 + \alpha T} \end{aligned}$$
(13)

has the law of the maximum of m independent uniform variables on \(\bigl [0, q / (q + \alpha )\bigr ) = \bigl [0, 1 / (1 + \alpha t)\bigr )\). Since

$$\begin{aligned} {{\mathbb {P}}}(V< v) = \bigl (v (1 + \alpha t)\bigr )^m, \qquad v < {1 \over 1 + \alpha t}, \end{aligned}$$

V has the same law as \(U^{1 / m} / (1 + \alpha t)\) with U uniform on [0, 1). Using Eq. (13), we can then sample the next jump time T by solving

$$\begin{aligned} {1 + \alpha T \over 1 + \alpha t} = U^{- 1 / m}. \end{aligned}$$
(14)

Setting \(s = \ln (1 + \alpha t)\) and \(S = \ln (1 + \alpha T)\), the random variable \(S - s\) has the same law as the minimum of m independent exponential random variables of rate 1.
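Equation (14) gives a direct way to draw the next wake-up time. The following sketch, with a hypothetical function name `next_wake_time` and arbitrary illustrative values of t, m and \(\alpha \), samples T by inversion and compares the empirical mean of \(S - s\) with \(1 / m\), the mean of the minimum of m independent exponential variables of rate 1.

```python
import math
import random

def next_wake_time(t, m, alpha, rng):
    """Draw T >= t from Eq. (14): (1 + alpha T) / (1 + alpha t) = U ** (-1 / m)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so that u is never 0
    return ((1.0 + alpha * t) * u ** (-1.0 / m) - 1.0) / alpha

rng = random.Random(1)
t, m, alpha = 3.0, 5, 2.0  # illustrative values only
s = math.log(1.0 + alpha * t)
gaps = [math.log(1.0 + alpha * next_wake_time(t, m, alpha, rng)) - s
        for _ in range(20_000)]
mean_gap = sum(gaps) / len(gaps)
print(mean_gap, 1.0 / m)
```

Working in the variable s rather than t makes the waiting times memoryless, which is what allows the description with independent exponential clocks.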

Our Markov process \((F(s) \in \mathcal{F} : s \ge 0)\) is then built in the following way. We associate m independent exponential random clocks of rate 1 with the m roots of F(s) at time s. At the first ring time \(S \ge s\) at some root x, we define F(S) by declaring active the particle at x, putting an arrow to y with probability \(P(x, y) = w(x, y) / \alpha \) and restarting our algorithm with parameter \(q = 1 / T = \alpha / (e^S - 1)\). \(\square \)

A determinantal formula for the associated root process. Proposition 2.2, from which we recall the definition of the probability kernel \(K_{q, \mathcal{B}}\), can be extended to characterize the law of the coupled root process \(t \mapsto \rho (\Phi _{1 / t})\).

Proposition 2.4

For all \(0< t_1< \cdots< t_k < t_{k + 1} = 1 / q_{k + 1}\) and all \(\mathcal{A}_1, \ldots , \mathcal{A}_k\), \(\mathcal{A}_{k + 1}\) contained in \(\mathcal{X}\), it holds

$$\begin{aligned}&{{\mathbb {P}}}\bigl (\mathcal{A}_{k + 1} \subset \rho (\Phi _{1 / t_{k + 1}}) \bigm | \mathcal{A}_{k} \subset \rho (\Phi _{1 / t_{k}}), \ldots , \mathcal{A}_{1} \subset \rho (\Phi _{1 / t_{1}})\bigr )\nonumber \\&\quad = \sum _{\mathcal{B}_k \subset \mathcal{A}'_k} \sum _{\mathcal{B}_{k - 1} \subset \mathcal{A}'_{k - 1}} \cdots \sum _{\mathcal{B}_{1} \subset \mathcal{A}'_{1}} \prod _{i = 1}^k \biggl ({t_i \over t_{k + 1}}\biggr )^{|\mathcal{B}_i|} \biggl (1 - {t_i \over t_{k + 1}}\biggr )^{|\mathcal{A}'_i {\setminus } \mathcal{B}_i|} \det \bigl [K_{q_{k + 1}, \mathcal{B}}\bigr ]_{\mathcal{A}_{k + 1}} \nonumber \\&\qquad \hbox {with} \quad \mathcal{A}'_k = \mathcal{A}_k, \quad \mathcal{A}'_{k - 1} = \mathcal{A}_{k - 1} {\setminus } \mathcal{A}_k, \quad \cdots \quad \mathcal{A}'_1 = \mathcal{A}_{1} {\setminus } (\mathcal{A}_k \cup \mathcal{A}_{k - 1} \cup \cdots \cup \mathcal{A}_2)\nonumber \\&\qquad \hbox {and} \quad \mathcal{B} = \bigcup _{i = 1}^k \mathcal{B}_i. \end{aligned}$$
(15)

Proof

Let us first consider the case \(k = 1\), so that \(\mathcal{A}'_1 = \mathcal{A}_1\). As far as \(\Phi _{1 / t}\) is concerned for \(t > t_1\), conditioning on \(\bigl \{\mathcal{A}_1 \subset \rho (\Phi _{1 / t_1})\bigr \}\) is nothing but a conditioning on the value of the uniform random variables at the top of the stacks in \(\mathcal{A}_1\). With \(q_1 = 1 / t_1\) and \(q_2 = 1 / t_2\), we cannot sample \(\Phi _{q_2}\) conditioned on \(\bigl \{\mathcal{A}_1 \subset \rho (\Phi _{q_1})\bigr \}\) by keeping “frozen” each site in \(\mathcal{A}_1\) with probability p defined by Eq. (12), calling \(\mathcal{B}\) the set of the remaining frozen sites and sampling \(\Phi _{q_2, \mathcal{B}}\) with this random \(\mathcal{B} \subset \mathcal{X}\), so that the root set would be a determinantal process with kernel \(K_{q_2, \mathcal{B}}\). The waking up procedure we defined after Eq. (12) indeed introduces a bias in the distribution at the top of the pile for the unfrozen sites: top pile arrows cannot be replaced by pointers to r. To recover a determinantal process with random kernel \(K_{q_2, \mathcal{B}}\) for the conditional root process, the random set \(\mathcal{B}\) has to be built by keeping frozen each site in \(\mathcal{A}_1\) with a smaller probability \(p'\) solving

$$\begin{aligned} p = p' + (1 - p') {q_2 \over q_2 + \alpha }. \end{aligned}$$

Top pile arrows of unfrozen sites can then still be replaced by pointers to r with probability \(q_2 / (q_2 + \alpha )\), and this equation ensures that we recover the correct biased probability. Solving it, we get \(p' = q_2/q_1 = t_1/t_2\) and Eq. (15).
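As a check of this last computation, the equation for \(p'\) can be solved explicitly by inserting the value of p given in Eq. (12); the following worked verification uses only the quantities defined above:

$$\begin{aligned} p' = {p - {q_2 \over q_2 + \alpha } \over 1 - {q_2 \over q_2 + \alpha }} = {q_2 + \alpha \over \alpha } \left( {q_2 (q_1 + \alpha ) \over q_1 (q_2 + \alpha )} - {q_2 \over q_2 + \alpha }\right) = {q_2 (q_1 + \alpha ) - q_1 q_2 \over \alpha q_1} = {q_2 \over q_1}. \end{aligned}$$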

When k is larger than 1, the formula is simply obtained by keeping frozen each site x in \(\bigcup _{i \le k} \mathcal{A}_i\) with a probability that depends on the largest i such that \(x \in \mathcal{A}_i\). This is the reason why we introduced the sets \(\mathcal{A}'_i\): \(i^*\) is the largest i such that \(x \in \mathcal{A}_i\) if and only if \(x \in \mathcal{A}'_{i^*}\). \(\square \)

Fragmentation, coalescence and open questions. At each jump time \(S = S_{k + 1}\) of F and in the proof of Theorem 2, there is only one root x to “wake up,” which means that there is only one piece of the associated partition into m pieces at the previous jump time \(S_k\) that can be fragmented into different trees, the other pieces of the previous partition remaining contained in different pieces of the new partition at time \(S_{k + 1}\). At time \(S_{k + 1}\) we can have both fragmentation, produced by the loop-erasure procedure, and coalescence: the trees covering the possibly fragmented piece can be eventually grafted to the other \(m - 1\) non-fragmented frozen trees, when their associated loop-erased random walk freezes by running into these frozen trees.

Fragmentation can increase the total number of pieces up to \(m + k - 1\), with k the number of sites in the tree that is rooted at x: this happens when this tree is completely fragmented and no coalescence occurs. Coalescence can decrease the number of pieces by 1 at most: when each tree of the possibly fragmented piece is eventually grafted to the other pieces. But coalescence strongly dominates the process: as \(q = 1 / t\) decreases, so does \({{\mathbb {E}}}\bigl [|\rho (\Phi _q)|\bigr ]\), with limited fluctuations, as a consequence of Proposition 2.1 (cf. Figs. 2, 3, 4). And the fact that when \(|\rho (\Phi _q)|\) decreases, it does so by one unit at most, implies that the process \(t \mapsto \Phi _{1 / t}\) crosses all the manifolds \(\mathcal{F}_m\) defined by Eq. (11).

Fig. 2

Snapshots at times \(s = \ln (1 + \alpha t)\) with \(t = 1 / q\) equal to 0, 0.5, 2, 8, 32, 128, 512, ..., 524,288 of the coalescence and fragmentation process \(s \mapsto F(s)\) for the simple random walk on the torus with uniform nearest-neighbour rates \(w(x, y) = 1\). Roots are red, non-root leaves are cyan, and other vertices are blue, different shades of blue being used for different trees (Color figure online)

Fig. 3

Snapshots at times \(s = \ln (1 + \alpha t)\) with \(t = 1 / q\) equal to 0, 0.5, 2, 8, 32, 128, 512, ..., 524,288 of the coalescence and fragmentation process \(s \mapsto F(s)\) on the square grid for the random walk in a Brownian sheet potential with inverse temperature \(\beta = 0.16\). The colour conventions are the same as in Fig. 2 (Color figure online)

It is then easy to sample \(\Phi _{1 / T_m}\) with \(T_m\) the first time when \(t \mapsto \Phi _{1 / t}\) reaches \(\mathcal{F}_m\). Unfortunately, \(\Phi _{1 / T_m}\) does not have the same law as \(\Phi _q\) conditioned on \(\bigl \{|\rho (\Phi _q)| = m\bigr \}\). One has indeed a counterexample already for \(n = 2\) and \(m = 1\). The random set \(\rho (\Phi _{1 / T_m})\) is, generally, not even well distributed: for \(m = n - 1\), there is only one distribution on the subsets of size m that produces well-distributed points. We then get to our first open question:

Q1:

Is there a way to use the process \(t \mapsto \Phi _{1 / t}\) to sample the measure \({{\mathbb {P}}}\bigl (\Phi _q \in \cdot \bigm | |\rho (\Phi _q)| = m\bigr )\)?

One can also use this process to estimate \(t \ge 0 \mapsto \sum _j 1 / (1 + t\lambda _j)\) since this sum is the expected value of \(|\rho (\Phi _{1 / t})|\), which presents limited fluctuations only (see Fig. 4). This leads to our second open question:

Q2:

Is there a way to use the process \(t \mapsto \Phi _{1 / t}\) to estimate in an efficient way the spectrum of \(-L\), or its higher part at least?

Fig. 4
The left picture shows the tree number as a function of time for the coalescence and fragmentation processes \(s \mapsto F(s)\) of Figs. 2 and 3 in semi-logarithmic scale. The right picture shows the same quantity in natural scale from time 15 to 27 for the coalescence and fragmentation process of Fig. 3 only

Our third open question concerns the law of the “rooted partition” associated with the forest process \(t \mapsto \Phi _{1 / t}\). (We call it rooted since a special vertex, the root, is associated with each piece of the partition.) Figures 3 and 4 and Wilson’s algorithm show that as \(q = 1 / t\) decreases, the partition process naturally tends to break the space into larger and larger valleys in which the process is trapped on time scale \(t = 1 / q\) (note that the difference of \(12 = 27 - 15\) between the extreme values of \(s = \ln (1 + \alpha t)\) in the right picture of Fig. 4 corresponds to a ratio of order \(1.6 \times 10^5\) between the associated times t). But, while we could characterize in Proposition 2.4 the law of the associated root process, we are far from obtaining a similar result for the rooted partition.

Q3:

Which characterization can be given of the rooted partition associated with \(t \mapsto \Phi _{1 / t}\)?

We actually know very little beyond Proposition 2.3 on this partition for a fixed value of q and an easier question would be that of characterizing the law of the forest process itself. Even though Fig. 2 echoes Figure 5 of [3], which illustrates a coalescence process that is also associated with random spanning forests, the two processes are quite different and we do not know the scaling limit of our process, even for a fixed value of q. The process considered in [3] is a pure coalescence process, while fragmentation is also involved in our case; at a fixed time t, the tree number in that case of the uniformly cut uniform spanning tree follows a binomial distribution, while the tree number of our process is distributed as a sum of Bernoulli random variables with non-homogeneous parameters; and, even conditioned on the same tree number, if the weights of the associated partitions share the same product of unrooted spanning tree numbers for each piece of the partition, the extra entropic factor depends in that case on these pieces’ boundaries, while in our case it is simply given by the product of their sizes.

3 Hitting Times

3.1 Forest Formulas for Hitting Distributions, Green’s Kernels and Mean Hitting Time

In order to prove Theorem 1, we first use Wilson’s algorithm to give forest representations of hitting distributions, Green’s kernels and mean hitting times. At least two of these formulas, Formula (16) and Formula (17), already appeared in the work of Freidlin and Wentzell (see Lemma 3.2 and Lemma 3.3 in [8]).

We recall that \(T_\mathcal{B}\) stands for the hitting time of \(\mathcal{B} \subset \mathcal{X}\), we denote by \(\tau _x(\phi )\) the unique maximal tree in \(\phi \in \mathcal{F}\) that covers \(x \in \mathcal{X}\), and we recall that \(\rho (\tau _x(\phi )) \in \rho (\phi )\) is the root of \(\tau _x(\phi )\) and recall Eq. (4). By considering Wilson’s algorithm for sampling \(\Phi _{0, \mathcal{B}}\) and choosing x as first starting point for the loop-erased random walk, we first note that for all \(y \in \mathcal{B}\), it holds

$$\begin{aligned} P_x\bigl (X(T_\mathcal{B}) = y\bigr ) = {1 \over Z_\mathcal{B}(0)} \sum _{\begin{array}{c} \scriptstyle \phi : \rho (\phi ) = \mathcal{B}, \\ \scriptstyle \rho (\tau _x(\phi )) = y \end{array}} w(\phi ). \end{aligned}$$
(16)

We then get the following forest representation of the Green’s kernel

$$\begin{aligned} G_\mathcal{B}(x, z) = E_x\Biggl [\int _0^{T_\mathcal{B}}{{\mathbb {1}}}_{\{X(t) = z\}}\,dt\Biggr ], \qquad x, z \in \mathcal{X}. \end{aligned}$$

Lemma 3.1

For any \(\mathcal{B} \subset \mathcal{X}\), \(x \in \mathcal{X}\) and \(z \not \in \mathcal{B}\), it holds

$$\begin{aligned} G_\mathcal{B}(x, z) = {1 \over Z_\mathcal{B}(0)} \sum _{\begin{array}{c} \scriptstyle \phi : \rho (\phi ) = \mathcal{B} \cup \{z\}, \\ \scriptstyle \rho (\tau _x(\phi )) = z \end{array}} w(\phi ). \end{aligned}$$

We finally get Freidlin and Wentzell’s forest representation for mean hitting times:

$$\begin{aligned} E_x\bigl [T_\mathcal{B}\bigr ] = \sum _{z \not \in \mathcal{B}} G_\mathcal{B}(x, z) = {1 \over Z_\mathcal{B}(0)} \sum _{z \not \in \mathcal{B}} \sum _{\begin{array}{c} \scriptstyle \phi : \rho (\phi ) = \mathcal{B} \cup \{z\} \\ \scriptstyle \rho (\tau _x(\phi )) = z \end{array}} w(\phi ). \end{aligned}$$
(17)

Proof of Lemma 3.1:

We introduce once again the discrete skeleton of X—that is the Markov chain \({\hat{X}}\) with transition kernel P defined by Eqs. (8)–(9)—and we call \({\hat{G}}_\mathcal{B}\) the Green’s kernel of \({\hat{X}}\) stopped in \(\mathcal{B}\). Let us denote by \({\hat{T}}_z\), \({\hat{T}}_z^+\) and \({\hat{T}}_\mathcal{B}\) the hitting time of z, the return time to z and the hitting time of \(\mathcal{B}\) for the Markov chain \({\hat{X}}\). Since \({\hat{G}}_\mathcal{B}(x, z) = P_x\bigl ({\hat{T}}_z< {\hat{T}}_\mathcal{B}\bigr ){\hat{G}}_\mathcal{B}(z, z) = P_x\bigl ({\hat{T}}_z < {\hat{T}}_\mathcal{B}\bigr ) / P_z\bigl ({\hat{T}}_z^+ > {\hat{T}}_\mathcal{B}\bigr )\), it holds

$$\begin{aligned} G_\mathcal{B}(x, z)&= {1 \over \alpha } {\hat{G}}_\mathcal{B}(x, z) = {P_x\bigl ({\hat{T}}_z< {\hat{T}}_\mathcal{B}\bigr ) \over \alpha P_z\bigl ({\hat{T}}_z^+> {\hat{T}}_\mathcal{B}\bigr )} = {P_x\bigl ({\hat{T}}_z< {\hat{T}}_\mathcal{B}\bigr ) \over \alpha \sum _{y \ne z} P(z, y) P_y\bigl ({\hat{T}}_z > {\hat{T}}_\mathcal{B}\bigr )} \\&= {P_x\bigl (T_z< T_\mathcal{B}\bigr ) \over \sum _{y \ne z} w(z, y) \bigl [1 - P_y\bigl (T_z < T_\mathcal{B}\bigr )\bigr ]}. \end{aligned}$$

Then, since \(P_y\bigl (T_z < T_\mathcal{B}\bigr ) = P_y\Bigl (X\bigl (T_{\mathcal{B} \cup \{z\}}\bigr ) = z\Bigr )\) for \(z \not \in \mathcal{B}\), it holds, using Formula (16),

$$\begin{aligned} G_\mathcal{B}(x, z)&= { P_x\Bigl (X\bigl (T_{\mathcal{B} \cup \{z\}}\bigr ) = z\Bigr ) \over \sum _{y \ne z} \Bigl [1 - P_y\Bigl (X\bigl (T_{\mathcal{B} \cup \{z\}}\bigr ) = z\Bigr )\Bigr ] w(z, y)}\\&= { {1 \over Z_{\mathcal{B} \cup \{z\}}(0)} \sum _\phi w(\phi ) {{\mathbb {1}}}_{\{\rho (\phi ) = \mathcal{B} \cup \{z\}, \rho (\tau _x(\phi )) = z\}} \over \sum _{y \ne z} {1 \over Z_{\mathcal{B} \cup \{z\}}(0)} \sum _\phi w(\phi ) {{\mathbb {1}}}_{\{\rho (\phi ) = \mathcal{B} \cup \{z\}, \rho (\tau _y(\phi )) \ne z\}} w(z, y)}\\&= { \sum _\phi w(\phi ) {{\mathbb {1}}}_{\{\rho (\phi ) = \mathcal{B} \cup \{z\}, \rho (\tau _x(\phi )) = z\}} \over \sum _\phi w(\phi ) {{\mathbb {1}}}_{\{\rho (\phi ) = \mathcal{B} \cup \{z\}\}} \sum _{y \ne z} {{\mathbb {1}}}_{\{\rho (\tau _y(\phi )) \ne z\}} w(z, y) }. \end{aligned}$$

If we associate with each forest \(\phi '\) such that \(\rho (\phi ') = \mathcal{B}\) the forest \(\phi = \phi '{\setminus } \{(z, y)\}\) with \((z, y)\) the only edge in \(\phi '\) that is issued from z, then we have \(\rho (\phi ) = \mathcal{B} \cup \{z\}\), \(\rho (\tau _y(\phi )) \ne z\), \(w(\phi ') = w(\phi )w(z, y)\) and we recognize \(\sum _{\phi '} w(\phi ') {{\mathbb {1}}}_{\{\rho (\phi ') = \mathcal{B}\}} = Z_\mathcal{B}(0)\) in the last denominator. \(\square \)
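Formula (17) can be verified by brute force on a very small graph. The sketch below is an illustration under stated assumptions: the four-site rate table `w` is arbitrary, the function names are hypothetical, forests are enumerated by letting each site either be a root or choose one outgoing edge, and the reference value of \(E_x[T_\mathcal{B}]\) is obtained by Jacobi iteration on the equations \(w(y) h(y) = 1 + \sum _z w(y, z) h(z)\) for \(y \not \in \mathcal{B}\).

```python
from itertools import product

def root_of(succ, x):
    """Follow arrows from x; return the root reached, or None on a cycle."""
    seen = set()
    while succ[x] is not None:
        if x in seen:
            return None
        seen.add(x)
        x = succ[x]
    return x

def forests(w):
    """Yield (succ, weight) over all rooted spanning forests of the graph w."""
    sites = sorted(w)
    for pick in product(*[[None] + sorted(w[x]) for x in sites]):
        succ = dict(zip(sites, pick))
        if all(root_of(succ, x) is not None for x in sites):
            weight = 1.0
            for x, y in succ.items():
                if y is not None:
                    weight *= w[x][y]
            yield succ, weight

def hitting_time_forest(w, B, x):
    """Right-hand side of Formula (17)."""
    B = frozenset(B)
    z_b = numerator = 0.0
    for succ, weight in forests(w):
        roots = frozenset(v for v in succ if succ[v] is None)
        if roots == B:
            z_b += weight
        elif B < roots and len(roots) == len(B) + 1 and root_of(succ, x) not in B:
            numerator += weight  # roots = B + {z} and x sits in the tree of z
    return numerator / z_b

def hitting_time_direct(w, B, x, sweeps=5000):
    """Jacobi iteration for h(y) = E_y[T_B], with h = 0 on B."""
    h = {y: 0.0 for y in w}
    for _ in range(sweeps):
        h = {y: 0.0 if y in B else
             (1.0 + sum(r * h[v] for v, r in w[y].items())) / sum(w[y].values())
             for y in w}
    return h[x]

# Arbitrary irreducible, non-reversible example on four sites.
w = {0: {1: 1.0, 2: 0.5}, 1: {0: 2.0, 3: 1.0},
     2: {0: 0.3, 3: 1.5}, 3: {1: 0.7, 2: 1.2}}
for x in sorted(w):
    print(x, hitting_time_forest(w, {0}, x), hitting_time_direct(w, {0}, x))
```

The enumeration grows exponentially with the number of sites, so this is only meant as a sanity check of the formula, not as a way to compute hitting times.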

3.2 Well-Distributed Roots

Proof of Theorem 1

We first note that for any \(\mathcal{B} \subset \mathcal{X}\), it holds [recall Eq. (4)]

$$\begin{aligned} {{\mathbb {P}}}\bigl (\rho (\Phi _q) = \mathcal{B}\bigr ) = {Z_\mathcal{B}(0) q^{|\mathcal{B}|} \over Z(q)} \end{aligned}$$

and, for \(0 < m \le n\),

$$\begin{aligned} {{\mathbb {P}}}\bigl (|\rho (\Phi _q)| = m\bigr ) = {a_m q^m \over Z(q)}. \end{aligned}$$
(18)

Then, by using Formula (17),

$$\begin{aligned} {{\mathbb {E}}}\Bigl [E_x\bigl [T_{\rho (\Phi _q)}\bigr ] \Big | |\rho (\Phi _q)| = m\Bigr ]&= \sum _{\mathcal{B} \subset \mathcal{X}: |\mathcal{B}| = m} {Z_\mathcal{B}(0) q^m \over a_m q^m} {1 \over Z_\mathcal{B}(0)} \sum _{z \not \in \mathcal{B}} \sum _{\begin{array}{c} \scriptstyle \phi : \rho (\phi ) = \mathcal{B} \cup \{z\} \\ \scriptstyle \rho (\tau _x(\phi )) = z \end{array}} w(\phi ) \\&= {1 \over a_m q^m}\sum _{\phi : |\rho (\phi )| = m + 1} {w(\phi ) q^{|\rho (\phi )|} \over q} \sum _{z \in \rho (\phi )} {{\mathbb {1}}}_{\{\rho (\tau _x(\phi )) = z\}} \\&= {a_{m + 1} q^{m + 1} \over a_m q^{m + 1}} = {a_{m + 1} \over a_m}. \end{aligned}$$

By using again Eq. (18) and Proposition 2.1 with \(S_q = S_{q, \emptyset }\), it follows

$$\begin{aligned} {{\mathbb {E}}}\Bigl [E_x\bigl [T_{\rho (\Phi _q)}\bigr ]\Bigr ]= & {} \sum _{m> 0} {a_{m + 1} q^m \over Z(q)} = {\sum _{k \ge 2} a_k q^k \over q Z(q)} = {{{\mathbb {P}}}(S_q \ge 2) \over q} = {1 - {{\mathbb {P}}}(S_q = 1) \over q} \\= & {} {1 \over q}\left( 1 - \prod _{j > 0} {\lambda _j \over q + \lambda _j} \right) . \end{aligned}$$

\(\square \)
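The first identity of this proof, namely that the conditional mean hitting time equals \(a_{m + 1} / a_m\) whatever the starting point x, can also be checked by enumeration on a small example. The sketch below makes the same illustrative assumptions as any brute-force check: an arbitrary four-site rate table, hypothetical function names, and mean hitting times computed independently of the forest formulas by Jacobi iteration. It uses the fact, read off from the first display of the proof, that conditionally on \(|\rho (\Phi _q)| = m\) the root set equals \(\mathcal{B}\) with probability \(Z_\mathcal{B}(0) / a_m\).

```python
from itertools import product

def is_forest(succ):
    """True when following arrows from any site ends at a root (no cycle)."""
    for start in succ:
        x, steps = start, 0
        while succ[x] is not None:
            x, steps = succ[x], steps + 1
            if steps > len(succ):
                return False
    return True

def root_set_weights(w):
    """Map each root set B to Z_B(0), the total weight of forests rooted at B."""
    sites = sorted(w)
    z = {}
    for pick in product(*[[None] + sorted(w[x]) for x in sites]):
        succ = dict(zip(sites, pick))
        if is_forest(succ):
            weight = 1.0
            for x, y in succ.items():
                if y is not None:
                    weight *= w[x][y]
            roots = frozenset(x for x in sites if succ[x] is None)
            z[roots] = z.get(roots, 0.0) + weight
    return z

def mean_hitting_times(w, B, sweeps=2000):
    """Jacobi iteration for h(y) = E_y[T_B], with h = 0 on B."""
    h = {y: 0.0 for y in w}
    for _ in range(sweeps):
        h = {y: 0.0 if y in B else
             (1.0 + sum(r * h[v] for v, r in w[y].items())) / sum(w[y].values())
             for y in w}
    return h

# Arbitrary irreducible, non-reversible example on four sites.
w = {0: {1: 1.0, 2: 0.5}, 1: {0: 2.0, 3: 1.0},
     2: {0: 0.3, 3: 1.5}, 3: {1: 0.7, 2: 1.2}}
n = len(w)
z = root_set_weights(w)
a = [sum(v for B, v in z.items() if len(B) == m) for m in range(n + 1)]
h = {B: mean_hitting_times(w, B) for B in z}

table = {}
for m in range(1, n):
    for x in w:
        # Average of E_x[T_B] over root sets B of size m, weighted by Z_B(0) / a_m.
        table[m, x] = sum(v / a[m] * h[B][x] for B, v in z.items() if len(B) == m)
    print(m, a[m + 1] / a[m], [table[m, x] for x in sorted(w)])
```

Each printed row lists \(a_{m + 1} / a_m\) followed by the conditional mean hitting times from the four starting points, which should coincide up to numerical error.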

We conclude this section by computing the mean return time

$$\begin{aligned} T_R^{++} = \inf \bigl \{t \ge 0\; :\; \exists s > 0,\; X(s) \ne X(0),\; s \le t,\; X(t) \in R\bigr \} \end{aligned}$$

to \(R = \rho (\Phi _q)\) starting from a uniformly distributed point in R. The reason why we use this heavy double \(^+\) notation is that we will also consider the maybe less natural but often more useful randomized or skeleton return time \(T_R^+\), which is defined as follows. Assuming that X is built by updating its current position at each time of a Poisson process of intensity \(\alpha \) according to the probability kernel P defined by Eqs. (8)–(9), the skeleton return time is

$$\begin{aligned} T_R^+ = \inf \bigl \{t \ge \tau _1 : X(t) \in R\bigr \}, \end{aligned}$$

with \(\tau _1\) the first updating time in the Poisson process. One always has \(T_R \le T_R^+ \le T_R^{++}\) and, for any \(x \in R\), it holds

$$\begin{aligned} E_x\bigl [T_R^{++}\bigr ] = E_x\bigl [T_R^+\bigr ] + P_x\bigl (X(\tau _1) = x\bigr )E_x\bigl [T_R^{++}\bigr ], \end{aligned}$$

so that

$$\begin{aligned} E_x\bigl [T_R^{++}\bigr ] = {\alpha \over w(x)}E_x\bigl [T_R^+\bigr ], \end{aligned}$$

with w(x) defined by Eq. (1). As in the previous proof, we write \(S_q\) for \(S_{q, \emptyset }\) defined in Eq. (6), and we stress that its law depends on the spectrum of L only.

Proposition 3.2

For all \(m \le n\) and all \(q > 0\), it holds

$$\begin{aligned} {{\mathbb {E}}}\biggl [E_{U(\rho (\Phi _q))}\Bigl [T^+_{\rho (\Phi _q)}\Bigr ] \biggm | |\rho (\Phi _q)| = m \biggr ] = {n / m \over \alpha } \end{aligned}$$

and

$$\begin{aligned} {{\mathbb {E}}}\biggl [E_{U(\rho (\Phi _q))}\Bigl [T^{++}_{\rho (\Phi _q)}\Bigr ] \biggm | |\rho (\Phi _q)| = m \biggr ] = \left( {1 \over n} \sum _{x \in \mathcal{X}} {1 \over w(x)}\right) {n \over m} , \end{aligned}$$

so that

$$\begin{aligned} {{\mathbb {E}}}\biggl [E_{U(\rho (\Phi _q))}\Bigl [T^+_{\rho (\Phi _q)}\Bigr ]\biggr ]= & {} {1 \over \alpha } {{\mathbb {E}}}\biggl [{n \over S_q}\biggr ] \qquad \hbox {and}\qquad {{\mathbb {E}}}\biggl [E_{U(\rho (\Phi _q))}\Bigl [T^{++}_{\rho (\Phi _q)}\Bigr ]\biggr ]\\= & {} \left( {1 \over n} \sum _{x \in \mathcal{X}} {1 \over w(x)}\right) {{\mathbb {E}}}\biggl [{n \over S_q}\biggr ], \end{aligned}$$

where \(U(\rho (\Phi _q))\) stands for a random point uniformly distributed in \(\rho (\Phi _q)\).

Proof

We work in discrete time, and we use the same notation as in the proof of Lemma 3.1. For any \(\mathcal{B} \subset \mathcal{X}\) and \(x \in \mathcal{X}\), we set

$$\begin{aligned} h_\mathcal{B}(x) = E_x\bigl [{\hat{T}}_\mathcal{B}\bigr ] = \alpha E_x[T_\mathcal{B}]. \end{aligned}$$

When x belongs to \(\mathcal{B}\), it holds \(h_\mathcal{B}(x) = 0\) and, when \(x \not \in \mathcal{B}\),

$$\begin{aligned} h_\mathcal{B}(x)= & {} \sum _{y \in \mathcal{X}} P(x, y) E_x\bigl [{\hat{T}}_\mathcal{B} \bigm | {\hat{X}}(1) = y\bigr ] = \sum _{y \in \mathcal{X}} P(x, y) \bigl (1 + h_\mathcal{B}(y)\bigr )\\= & {} 1 + \bigl (Ph_\mathcal{B}\bigr )(x). \end{aligned}$$

Setting

$$\begin{aligned} g_\mathcal{B}(x)= {{\mathbb {1}}}_{\{x \in \mathcal{B}\}} E_x\bigl [{\hat{T}}_\mathcal{B}^+\bigr ] = {{\mathbb {1}}}_{\{x \in \mathcal{B}\}} \alpha E_x\bigl [T_\mathcal{B}^+\bigr ] = {{\mathbb {1}}}_{\{x \in \mathcal{B}\}} w(x) E_x\bigl [T_\mathcal{B}^{++}\bigr ], \end{aligned}$$
(19)

we also have, when \(x \in \mathcal{B}\),

$$\begin{aligned} g_\mathcal{B}(x) = 1 + \bigl (Ph_\mathcal{B}\bigr )(x). \end{aligned}$$

Let us denote by \(\nu \) any probability measure on the subsets of \(\mathcal{X}\) that produces well-distributed points. By setting \(h(x) = \sum _{\mathcal{B} \subset \mathcal{X}} \nu (\mathcal{B}) h_\mathcal{B}(x)\) for all \(x \in \mathcal{X}\), we then get

$$\begin{aligned} \bigl (Ph\bigr )(x)&= \sum _{\mathcal{B} \subset \mathcal{X}} \nu (\mathcal{B}) \bigl (Ph_\mathcal{B}\bigr )(x) = \sum _{\mathcal{B} \not \ni x} \nu (\mathcal{B}) \bigl (h_\mathcal{B} - 1\bigr )(x) + \sum _{\mathcal{B} \ni x} \nu (\mathcal{B}) \bigl (g_\mathcal{B} - 1\bigr )(x) \\&= - \sum _{\mathcal{B} \subset \mathcal{X}} \nu (\mathcal{B}) + \sum _{\mathcal{B} \not \ni x} \nu (\mathcal{B}) h_\mathcal{B}(x) + \sum _{\mathcal{B} \ni x} \nu (\mathcal{B}) g_\mathcal{B}(x) \\&= - 1 + h(x) + \sum _{\mathcal{B} \subset \mathcal{X}} \nu (\mathcal{B}) g_\mathcal{B}(x), \end{aligned}$$

where we used \(\sum _\mathcal{B} \nu (\mathcal{B}) = 1\), \(h_\mathcal{B}(x) = 0\) if \(x \in \mathcal{B}\), and \(g_\mathcal{B}(x) = 0\) if \(x \not \in \mathcal{B}\). Since \(\nu \) produces well-distributed points, the function h is constant on \(\mathcal{X}\), so that \(\bigl (Ph\bigr )(x) = h(x)\) for all x in \(\mathcal{X}\), which implies, together with the previous equality,

$$\begin{aligned} \sum _{\mathcal{B} \subset \mathcal{X}} \nu (\mathcal{B}) g_\mathcal{B}(x) = 1, \qquad x \in \mathcal{X}. \end{aligned}$$
(20)

Since by Theorem 1 we can choose for \(\nu \) the distribution of \(\rho (\Phi _q)\) conditioned on \(\bigl \{|\rho (\Phi _q)| = m\bigr \}\), this proves, together with Eq. (19), after dividing by \(\alpha m\) or w(x)m and summing over \(x \in \mathcal{X}\), the first part of the proposition. The last two equalities simply follow from Proposition 2.1.
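Equation (20) and the well-distributed property can be checked numerically on a small example. The sketch below rests on assumptions that are not stated in this section: the uniformization \(P = 1 + L/\alpha \) with \(\alpha = \max _x w(x)\), a forest law proportional to \(w(\phi ) q^{|\rho (\phi )|}\) (as Eq. (22) suggests), and the matrix-forest theorem, which then gives \(\nu (\mathcal{B}) \propto \det \bigl (-[L]_{\mathcal{X} {\setminus } \mathcal{B}}\bigr )\) for the conditioned root set. All function names are hypothetical, and the code requires \(m < n\).

```python
import itertools
import numpy as np

def random_generator(n, seed=0):
    """Generator L with positive off-diagonal rates and zero row sums."""
    rng = np.random.default_rng(seed)
    rates = rng.uniform(0.5, 1.5, size=(n, n))
    np.fill_diagonal(rates, 0.0)
    return rates - np.diag(rates.sum(axis=1))

def complement(n, B):
    return [x for x in range(n) if x not in B]

def discrete_hitting_times(L, B, alpha):
    """h_B(x) = alpha * E_x[T_B]: zero on B, and (-L) E[T_B] = 1 outside B."""
    n = L.shape[0]
    out = complement(n, B)
    h = np.zeros(n)
    h[out] = alpha * np.linalg.solve(-L[np.ix_(out, out)], np.ones(len(out)))
    return h

def root_law(L, m):
    """Assumed conditional law of the root set: nu(B) proportional to
    det(-L restricted to the complement of B) (matrix-forest theorem)."""
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), m))
    weights = np.array([
        np.linalg.det(-L[np.ix_(complement(n, B), complement(n, B))])
        for B in subsets
    ])
    return subsets, weights / weights.sum()

def check_eq_20(n=5, m=2):
    L = random_generator(n)
    alpha = np.max(-np.diag(L))          # assumption: alpha = max_x w(x)
    P = np.eye(n) + L / alpha            # assumption: uniformized chain
    subsets, nu = root_law(L, m)
    h_mean = np.zeros(n)                 # h(x) = sum_B nu(B) h_B(x)
    g_mean = np.zeros(n)                 # left-hand side of Eq. (20)
    for B, p in zip(subsets, nu):
        h = discrete_hitting_times(L, B, alpha)
        h_mean += p * h
        in_B = list(B)
        g_mean[in_B] += p * (1.0 + (P @ h)[in_B])   # g_B = 1 + P h_B on B
    return h_mean, g_mean, alpha

h_mean, g_mean, alpha = check_eq_20(5, 2)
print(np.allclose(h_mean, h_mean[0]), np.allclose(g_mean, 1.0))
```

Under these assumptions, `h_mean` should be constant and `g_mean` identically 1; summing `g_mean` over x and dividing by \(\alpha m\) then recovers the value \(n / (m \alpha )\) of the first identity of the proposition.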

Since a function h is harmonic on \(\mathcal{X}\) if and only if it is constant, the previous arguments actually show that any distribution \(\nu \) on the subsets \(\mathcal{B}\) of \(\mathcal{X}\) provides well-distributed points if and only if it satisfies Eq. (20), which is actually a list of \(n = |\mathcal{X}|\) equations the \(\nu (\mathcal{B})\)s have to satisfy, together with the positivity constraints \(\nu (\mathcal{B}) \ge 0\) for all \(\mathcal{B} \subset \mathcal{X}\) and the additional equation \(\sum _\mathcal{B} \nu (\mathcal{B}) = 1\). If we restrict ourselves to distributions supported by sets of a fixed size m, these are \(n + 1\) linear equations for \({n \atopwithdelims ()m}\) unknown variables. Since \({n \atopwithdelims ()m} > n + 1\) for \(2 \le m \le n -2\) and Theorem 1 provides a solution \(\nu \) with positive mass \(\nu (\mathcal{B}) > 0\) for each subset \(\mathcal{B}\) of size m, this shows that there are infinitely many solutions in this case. When \(m = 1\) or \(m = n - 1\), we have more equations than variables. If \(m = 1\), then solving Eq. (20) is straightforward and we have a unique solution. If \(m = n - 1\), then it is more convenient to solve directly the equation set

$$\begin{aligned} \sum _{|\mathcal{B}| = n - 1} \nu (\mathcal{B}) = 1, \qquad \sum _{|\mathcal{B}| = n - 1} \nu (\mathcal{B}) h_\mathcal{B}(x) = t, \qquad x \in \mathcal{X}, \end{aligned}$$

with the additional unknown variable t, to see that the solution is unique.
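As a worked instance of the case \(m = n - 1\), here is a sketch that assumes w(x) in Eq. (1) denotes the total jump rate out of x and that \(P(x, x) = 1 - w(x)/\alpha \), in line with the relation between \(E_x[T_R^{++}]\) and \(E_x[T_R^+]\) used before Proposition 3.2. If \(\mathcal{B} = \mathcal{X} {\setminus } \{z\}\), the discrete-time chain started at z hits \(\mathcal{B}\) as soon as it leaves z, so that \(h_\mathcal{B}\) vanishes outside z and \(h_\mathcal{B}(z) = \alpha / w(z)\). The equation set then reduces to

$$\begin{aligned} \nu (\mathcal{X} {\setminus } \{z\}) {\alpha \over w(z)} = t, \quad z \in \mathcal{X}, \qquad \sum _{z \in \mathcal{X}} \nu (\mathcal{X} {\setminus } \{z\}) = 1, \end{aligned}$$

whose only solution is

$$\begin{aligned} t = {\alpha \over \sum _{y \in \mathcal{X}} w(y)}, \qquad \nu (\mathcal{X} {\setminus } \{z\}) = {w(z) \over \sum _{y \in \mathcal{X}} w(y)}, \qquad z \in \mathcal{X}. \end{aligned}$$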

4 Re-reading Micchelli and Willoughby’s Proof

Before following in Sects. 4.1–4.3 the three main steps of Micchelli and Willoughby’s proof, we give some heuristics on the divided difference representation of the \(\nu _k = \nu _k^x\) in the case \(\nu = \delta _x\) for any x in \(\mathcal{X} {\setminus } \mathcal{B}\). (In the general case, \(\nu \) is a convex combination of such Dirac masses and we just need to prove the theorem in this special case of a generic Dirac mass.) For \(0 \le k < l\), we have

$$\begin{aligned} \nu _k^x = \delta _x \prod _{j > k} {[L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{j, \mathcal{B}} \over \lambda _{j, \mathcal{B}}}, \end{aligned}$$

and, if \(\mathcal{B} \ne \emptyset \), the same formula gives \(\nu ^x_{-1} = 0\) by Hamilton–Cayley theorem, while, if \(\mathcal{B} = \emptyset \), \(\lambda _{0, \mathcal{B}} = 0\). Having in mind that each \(\nu _k\) should be interpreted as a local equilibrium from which the system decays into \(\nu _{k - 1}\) at rate \(\lambda _k\), this means that the system leaves the state \(\nu _0\) only to be absorbed in \(\mathcal{B}\) when \(\mathcal{B} \ne \emptyset \) (the probability measure on the quotient set \(\mathcal{X} / \mathcal{B}\) that is associated with \(\nu _{-1}\) is fully concentrated on \(\mathcal{B}\)), while \(\nu _0\) is the non-decaying global equilibrium \(\mu \) when \(\mathcal{B} = \emptyset \). Now, by considering Wilson’s algorithm with a first loop-erased random walk started from x to sample \(\Phi _{q, \mathcal{B}}\) and by comparing the successive decay times with the exponential random time \(T_q\), we get for any y in \(\mathcal{X} {\setminus } \mathcal{B}\), using the notation of Proposition 2.2 and the notation recalled before Eq. (16),

$$\begin{aligned} \begin{aligned}&{{\mathbb {P}}}\bigl (\rho (\tau _x(\Phi _q)) = y\bigr ) = P_x\bigl (X(T_q \wedge T_\mathcal{B}) = y\bigr ) = K_{q, \mathcal{B}}(x, y)\\&\quad = {q \over q + \lambda _{l - 1, \mathcal{B}}} \nu ^x_{l - 1}(y) + {\lambda _{l - 1, \mathcal{B}} \over q + \lambda _{l - 1, \mathcal{B}}} {q \over q + \lambda _{l - 2, \mathcal{B}}} \nu ^x_{l - 2}(y) \\&\qquad + \cdots + {\lambda _{l - 1, \mathcal{B}} \over q + \lambda _{l - 1, \mathcal{B}}} \cdots {\lambda _{2, \mathcal{B}} \over q + \lambda _{2, \mathcal{B}}} {q \over q + \lambda _{1, \mathcal{B}}} \nu ^x_1(y) \\&\qquad + {\lambda _{l - 1, \mathcal{B}} \over q + \lambda _{l - 1, \mathcal{B}}} \cdots {\lambda _{1, \mathcal{B}} \over q + \lambda _{1, \mathcal{B}}} {q \over q + \lambda _{0, \mathcal{B}}} \nu ^x_0(y). \end{aligned} \end{aligned}$$
(21)
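The decomposition in Eq. (21) can be tested numerically. The following is a minimal sketch, assuming a symmetric (hence reversible) restricted generator `A` standing for \([L]_{\mathcal{X} {\setminus } \mathcal{B}}\) with \(\mathcal{B} \ne \emptyset \) and distinct eigenvalues; the example matrix and all names are hypothetical. It compares \(K_{q, \mathcal{B}} = q [q - L]^{-1}_{\mathcal{X} {\setminus } \mathcal{B}}\) with the mixture of the \(\nu _k^x\) built from the product formula given above.

```python
import numpy as np

def local_equilibria(A, lam):
    """Matrices whose row x is nu_k^x = delta_x prod_{j>k} (A + lam_j) / lam_j."""
    l = A.shape[0]
    nus = []
    for k in range(l):
        M = np.eye(l)
        for j in range(k + 1, l):
            M = M @ (A + lam[j] * np.eye(l)) / lam[j]
        nus.append(M)
    return nus

def mixture(A, q):
    """Right-hand side of Eq. (21), as a matrix indexed by (x, y)."""
    l = A.shape[0]
    lam = np.sort(np.linalg.eigvalsh(-A))     # lam_0 < ... < lam_{l-1}
    nus = local_equilibria(A, lam)
    total = np.zeros((l, l))
    for k in range(l):
        # Probability that the decays l-1, ..., k+1 all occur before T_q.
        decayed = np.prod(lam[k + 1:] / (q + lam[k + 1:]))
        total += decayed * q / (q + lam[k]) * nus[k]
    return total

# Hypothetical example: unit-rate walk on the path {0, 1, 2, 3}, B = {3}.
A = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -2.0]])
q = 0.7
K = q * np.linalg.inv(q * np.eye(3) - A)
print(np.allclose(mixture(A, q), K))
```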

Let us divide by q and multiply by \(Z_\mathcal{B}(q) = \det \bigl (q -[L]_{\mathcal{X} {\setminus } \mathcal{B}}\bigr )\) both sides of this equation to have a simpler polynomial right-hand side. With \(W_\mathcal{B}(q)\) the matrix defined by

$$\begin{aligned} W_\mathcal{B}(q)(x, y) = {Z_\mathcal{B}(q) \over q} {{\mathbb {P}}}\bigl (\rho (\tau _x(\Phi _q)) = y\bigr ) = {1 \over q} \sum _{\begin{array}{c} \scriptstyle \phi : \rho (\tau _x(\phi )) = y, \\ \scriptstyle \rho (\phi ) \supset \mathcal{B} \end{array}} w(\phi ) q^{|\rho (\phi ) {\setminus } \mathcal{B}|}, \quad x, y \in \mathcal{X} {\setminus } \mathcal{B}, \end{aligned}$$
(22)

or equivalently, since we have seen in the proof of Proposition 2.2 that \(K_{q, \mathcal{B}} = q [q - L]^{-1}_{\mathcal{X} {\setminus } \mathcal{B}}\),

$$\begin{aligned} W_\mathcal{B}(q) = Z_\mathcal{B}(q) [q - L]^{-1}_{\mathcal{X} {\setminus } \mathcal{B}}, \end{aligned}$$
(23)

Equation (21) now reads

$$\begin{aligned}&W_\mathcal{B}(q)(x, y) \nonumber \\&\quad = (q + \lambda _{0, \mathcal{B}}) \cdots (q + \lambda _{l - 2, \mathcal{B}}) \, \nu ^x_{l - 1}(y) + (q + \lambda _{0, \mathcal{B}}) \cdots (q + \lambda _{l - 3, \mathcal{B}}) \, \lambda _{l - 1, \mathcal{B}} \, \nu ^x_{l - 2}(y)\nonumber \\&\qquad + \cdots + (q + \lambda _{0, \mathcal{B}}) \, \lambda _{l - 1, \mathcal{B}} \lambda _{l - 2, \mathcal{B}} \cdots \lambda _{2, \mathcal{B}} \, \nu ^x_1(y) + \lambda _{l - 1, \mathcal{B}} \cdots \lambda _{1, \mathcal{B}} \, \nu ^x_0(y). \end{aligned}$$
(24)

This suggests a divided difference representation for the \(\nu _k\)s according to the following definition.

Definition 4.1

For any function f defined on \({{\mathbb {R}}}\) and with values in a real vector space, if \(x_0\), \(x_1, \ldots , x_{l - 1}\) are distinct real numbers, the divided differences \(f[x_0]\), \(f[x_0, x_1], \ldots , f[x_0, \ldots , x_{l - 1}]\) are the coefficients of the unique polynomial Q of degree less than l such that

$$\begin{aligned} Q(x)= & {} f[x_0] + f[x_0, x_1] (x - x_0) + f[x_0, x_1, x_2] (x - x_0) (x - x_1) + \cdots \\&+ f[x_0, \ldots , x_{l - 1}] (x - x_0) \cdots (x - x_{l - 2}) \end{aligned}$$

for all x in \({{\mathbb {R}}}\) and \(Q(x_k) = f(x_k)\) for \(k < l\).

Remark

These divided differences \(f[x_0, x_1, \ldots , x_k]\) are given by the Lagrange interpolation formula

$$\begin{aligned} f[x_0, x_1, \ldots , x_k] = \sum _{j = 0}^k {f(x_j) \over \prod _{i \le k, i \ne j }(x_j - x_i)}, \qquad k < l, \end{aligned}$$
(25)

which shows that divided differences are permutation invariant, and one can also compute them inductively with Newton interpolation formulas

$$\begin{aligned} \begin{aligned} f[x_0]&= f(x_0), \\ f[x_0, \ldots , x_k]&= {f[x_1, \ldots , x_k] - f[x_0, \ldots , x_{k - 1}] \over x_k - x_0} , \end{aligned} \end{aligned}$$
(26)

which explain the terminology. See, for example, Chapter II of [15] for more details.
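Formulas (25) and (26) translate directly into code. The sketch below uses hypothetical function names and exact rational arithmetic; it evaluates the same divided difference both ways and checks permutation invariance on the example \(f(x) = x^3\) with nodes 0, 1, 3.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def dd_lagrange(f, xs):
    """Formula (25): sum over j of f(x_j) / prod_{i != j} (x_j - x_i)."""
    return sum(
        f(xj) / prod(xj - xi for i, xi in enumerate(xs) if i != j)
        for j, xj in enumerate(xs)
    )

def dd_newton(f, xs):
    """Formula (26): recursion on the number of nodes."""
    if len(xs) == 1:
        return f(xs[0])
    return (dd_newton(f, xs[1:]) - dd_newton(f, xs[:-1])) / (xs[-1] - xs[0])

def cube(x):
    return x ** 3

nodes = [Fraction(0), Fraction(1), Fraction(3)]
print(dd_lagrange(cube, nodes), dd_newton(cube, nodes))   # both equal 4
print({dd_newton(cube, list(p)) for p in permutations(nodes)})
```

For a monic cubic the second-order divided difference is the sum of the nodes minus the sum of the zeros, which gives \(0 + 1 + 3 = 4\) here.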

To prove that our \(\nu _k^x\)s are non-negative measures, we can assume without loss of generality that the eigenvalues \(\lambda _{k, \mathcal{B}}\) are all distinct (we can modify them slightly and use the continuity of the \(\nu _k^x\)s). Equation (24) suggests in this case that for all \(k < l\) and x in \(\mathcal{X}\)

$$\begin{aligned} W_\mathcal{B}[-\lambda _{0, \mathcal{B}}, \ldots , -\lambda _{k, \mathcal{B}}](x, \cdot ) = \lambda _{l - 1, \mathcal{B}} \cdots \lambda _{k + 1, \mathcal{B}} \, \nu _k^x = \delta _x \prod _{j > k} \bigl ([L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{j, \mathcal{B}}\bigr ), \end{aligned}$$

i.e.

$$\begin{aligned} \nu _k^x = {W_\mathcal{B}[-\lambda _{0, \mathcal{B}}, \ldots , -\lambda _{k, \mathcal{B}}] \over \lambda _{l - 1, \mathcal{B}} \cdots \lambda _{k + 1, \mathcal{B}}}(x, \cdot ), \end{aligned}$$
(27)

or, equivalently,

$$\begin{aligned} W_\mathcal{B}[-\lambda _{0, \mathcal{B}}, \ldots , -\lambda _{k, \mathcal{B}}] = \prod _{j > k} \bigl ([L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{j, \mathcal{B}}\bigr ). \end{aligned}$$
(28)

It is worth noting at this point that Eq. (28) would be a consequence of the theorem by Micchelli and Willoughby (the local equilibrium interpretation of each \(\nu _k\) makes sense only once its non-negativity is established), but our goal is to prove this theorem. This is what we are ready to do now.

4.1 Checking Eq. (28)

We simply check that the right-hand side and the left-hand side of Eq. (28) act in the same way on each left-eigenvector \(\mu _r\) of \([L]_{\mathcal{X} {\setminus } \mathcal{B}}\), with \(\mu _r [L]_{\mathcal{X} {\setminus } \mathcal{B}}= - \lambda _{r, \mathcal{B}} \mu _r\). Recall that \(W_\mathcal{B}(q)\) is equivalently defined by Eq. (22) or Eq. (23). By using the latter and Formula (25), it holds, for each \(r < l\) and \(k < l\),

$$\begin{aligned} \mu _r W_\mathcal{B}[-\lambda _{0, \mathcal{B}} , \ldots , -\lambda _{k, \mathcal{B}}]= & {} \sum _{j = 0}^k {\mu _r W_\mathcal{B}(- \lambda _{j, \mathcal{B}}) \over \prod _{i \le k, i \ne j }(-\lambda _{j, \mathcal{B}} + \lambda _{i, \mathcal{B}})} \\= & {} \sum _{j = 0}^k {\prod _{i < l, i \ne r}(-\lambda _{j, \mathcal{B}} + \lambda _{i, \mathcal{B}}) \over \prod _{i \le k, i \ne j }(-\lambda _{j, \mathcal{B}} + \lambda _{i, \mathcal{B}})} \mu _r . \end{aligned}$$

Since each numerator in the right-hand side is equal to 0 unless \(j = r\), which can happen only if \(r \le k\), we get

$$\begin{aligned} \begin{aligned} \mu _r W_\mathcal{B}[-\lambda _{0, \mathcal{B}}, \ldots , -\lambda _{k, \mathcal{B}}]&= {{\mathbb {1}}}_{\{r \le k\}} {\prod _{i < l, i \ne r}(-\lambda _{r, \mathcal{B}} + \lambda _{i, \mathcal{B}}) \over \prod _{i \le k, i \ne r }(-\lambda _{r, \mathcal{B}} + \lambda _{i, \mathcal{B}})} \mu _r\\&= {{\mathbb {1}}}_{\{r \le k\}} \left( \prod _{i> k}(-\lambda _{r, \mathcal{B}} + \lambda _{i, \mathcal{B}})\right) \mu _r \\&= \left( \prod _{i> k}(-\lambda _{r, \mathcal{B}} + \lambda _{i, \mathcal{B}})\right) \mu _r = \mu _r \prod _{j > k}\bigl ([L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{j, \mathcal{B}}\bigr ). \end{aligned} \end{aligned}$$
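Equation (28) can also be confirmed numerically on a small example. The sketch below makes some assumptions: the symmetric matrix `A` is a hypothetical stand-in for \([L]_{\mathcal{X} {\setminus } \mathcal{B}}\), and \(W_\mathcal{B}(q)\) is computed as the adjugate of \(q - [L]_{\mathcal{X} {\setminus } \mathcal{B}}\), which agrees with Eq. (23) wherever the inverse exists and remains defined at \(q = -\lambda _{j, \mathcal{B}}\).

```python
import numpy as np

def adjugate(M):
    """Transpose of the cofactor matrix, so that adjugate(M) @ M = det(M) * I."""
    l = M.shape[0]
    adj = np.empty((l, l))
    for i in range(l):
        for j in range(l):
            minor = np.delete(np.delete(M, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

def W(A, q):
    """W_B(q) as a polynomial in q (adjugate form of Eq. (23))."""
    return adjugate(q * np.eye(A.shape[0]) - A)

def divided_difference(A, nodes):
    """Matrix-valued divided difference of W via Formula (25)."""
    total = np.zeros_like(A)
    for j, xj in enumerate(nodes):
        denom = np.prod([xj - xi for i, xi in enumerate(nodes) if i != j])
        total += W(A, xj) / denom
    return total

def product_side(A, lam, k):
    """Right-hand side of Eq. (28): prod_{j>k} (A + lam_j)."""
    out = np.eye(A.shape[0])
    for j in range(k + 1, len(lam)):
        out = out @ (A + lam[j] * np.eye(A.shape[0]))
    return out

# Hypothetical example: unit-rate walk on the path {0, 1, 2, 3}, B = {3}.
A = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -2.0]])
lam = np.sort(np.linalg.eigvalsh(-A))
for k in range(3):
    print(k, np.allclose(divided_difference(A, -lam[:k + 1]), product_side(A, lam, k)))
```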

4.2 A Combinatorial Identity

The key point of the proof lies in the following lemma.

Lemma 4.2

For any \(x \ne y\) in \(\mathcal{X} {\setminus } \mathcal{B}\), it holds

$$\begin{aligned} W_\mathcal{B}(q)(x, x) = Z_{\mathcal{B} \cup \{x\}}(q) \end{aligned}$$
(29)

and

$$\begin{aligned} W_\mathcal{B}(q)(x, y)= & {} w(x, y)Z_{\mathcal{B} \cup \{x, y\}}(q)\nonumber \\&+ \sum _{z, z' \in \mathcal{X} {\setminus } \left( \mathcal{B} \cup \{x, y\}\right) } w(x, z) W_{\mathcal{B} \cup \{x, y\}}(q)(z, z') w(z', y). \end{aligned}$$
(30)

Proof

Equation (22) can be rewritten as

$$\begin{aligned} W_\mathcal{B}(q)(x, y) = \sum _{\begin{array}{c} \scriptstyle \phi : \rho (\tau _x(\phi )) = y, \\ \scriptstyle \rho (\phi ) \supset \mathcal{B} \end{array}} w(\phi ) q^{|\rho (\phi )| - 1 - |\mathcal{B}|}, \end{aligned}$$
(31)

for \(x \ne y\) we have

$$\begin{aligned} Z_{\mathcal{B} \cup \{x, y\}}(q) = \sum _{\phi ': \rho (\phi ') \supset \mathcal{B} \cup \{x, y\}} w(\phi ') q^{|\rho (\phi ')| - 2 - |\mathcal{B}|} , \end{aligned}$$
(32)

and it holds

$$\begin{aligned} W_{\mathcal{B} \cup \{x, y\}}(q)(z, z')= & {} {1 \over q} \sum _{\begin{array}{c} \scriptstyle \phi '': \rho (\tau _z(\phi '')) = z', \\ \scriptstyle \rho (\phi '') \supset {\mathcal{B} \cup \{x, y\}} \end{array}} w(\phi '') q^{|\rho (\phi '')| - 2 - |\mathcal{B}|}\nonumber \\= & {} \sum _{\begin{array}{c} \scriptstyle \phi '': \rho (\tau _z(\phi '')) = z', \\ \scriptstyle \rho (\phi '') \supset {\mathcal{B} \cup \{x, y\}} \end{array}} w(\phi '') q^{|\rho (\phi '')| - 3 - |\mathcal{B}|}. \end{aligned}$$
(33)

Now we can define, for each \(\phi \) appearing in Eq. (31), \(\phi ' = \phi {\setminus } \{(x, y)\}\) if (x, y) belongs to \(\phi \), and \(\phi '' = \phi {\setminus } \{(x, z), (z', y)\}\) if x is connected in \(\phi \) to y through \((x, z) \in \phi \) and \((z', y) \in \phi \), possibly with \(z = z'\). Since \(|\rho (\phi ')| = |\rho (\phi )| + 1\) and \(|\rho (\phi '')| = |\rho (\phi )| + 2\), Eq. (30) follows from Eqs. (31)–(33). Equation (29) is given by Eq. (31) with \(y = x\): the forests \(\phi \) with \(\rho (\tau _x(\phi )) = x\) are those for which x is a root, and the sum is then the analogue of Eq. (32) for \(\mathcal{B} \cup \{x\}\). \(\square \)
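Both identities of Lemma 4.2 are also plain linear-algebra statements (a principal-minor formula and a Schur complement), so they can be checked numerically. In the sketch below the generator, the set \(\mathcal{B}\), the points x, y and the value of q are all hypothetical choices, and \(W_\mathcal{B}(q)\) is computed from Eq. (23) at a point \(q > 0\) where the inverse exists.

```python
import numpy as np

def restrict(L, B):
    keep = [x for x in range(L.shape[0]) if x not in B]
    return keep, L[np.ix_(keep, keep)]

def Z(L, B, q):
    """Z_B(q) = det(q - [L] restricted to the complement of B)."""
    keep, A = restrict(L, B)
    return np.linalg.det(q * np.eye(len(keep)) - A)

def W(L, B, q):
    """Eq. (23): W_B(q) = Z_B(q) [q - L]^{-1}, with the list of retained states."""
    keep, A = restrict(L, B)
    M = q * np.eye(len(keep)) - A
    return keep, np.linalg.det(M) * np.linalg.inv(M)

def lemma_4_2_residuals(L, B, q, x, y):
    keep, WB = W(L, B, q)
    ix, iy = keep.index(x), keep.index(y)
    res29 = WB[ix, ix] - Z(L, B | {x}, q)
    inner, Wxy = W(L, B | {x, y}, q)
    # sum over z, z' of w(x, z) W_{B + {x, y}}(q)(z, z') w(z', y)
    through = L[x, inner] @ Wxy @ L[inner, y]
    res30 = WB[ix, iy] - (L[x, y] * Z(L, B | {x, y}, q) + through)
    return res29, res30

# Hypothetical example: random rates on 5 states, B = {4}, x = 0, y = 2.
rng = np.random.default_rng(1)
rates = rng.uniform(0.5, 1.5, size=(5, 5))
np.fill_diagonal(rates, 0.0)
L = rates - np.diag(rates.sum(axis=1))
res29, res30 = lemma_4_2_residuals(L, {4}, 0.8, 0, 2)
print(abs(res29) < 1e-9, abs(res30) < 1e-9)
```

Note that this check does not use reversibility, which matches the purely combinatorial nature of the lemma.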

4.3 Conclusion with Cauchy Interlacement Theorem

We will use the following lemma from [11], for which we give an alternative proof.

Lemma [Micchelli and Willoughby]

Let \(f : x \in {{\mathbb {R}}} \mapsto \prod _{j < l} (x - \alpha _j) \in {{\mathbb {R}}}\) be a polynomial of degree l with l distinct zeros \(\alpha _0> \alpha _1> \cdots > \alpha _{l - 1}\). Let \(\beta _0> \beta _1> \cdots > \beta _{L-1}\) be \(L \ge l\) real numbers such that \(\beta _j \ge \alpha _j\) for each \(j < l\). Then, for any \(k < L\), \(f[\beta _0, \beta _1, \ldots , \beta _k] \ge 0\).

Proof

We prove the lemma by induction on \(r = l - k\). First, since f is a polynomial of degree l with leading coefficient equal to 1, Definition 4.1 gives \(f[\beta _0, \ldots , \beta _k] = 0\) if \(k > l\) (that is, \(r < 0\)) and \(f[\beta _0, \ldots , \beta _k] = 1\) if \(k = l\) (that is, \(r = 0\)), so that the claim is established for \(r \le 0\).

Now, for \(r > 0\), we first show that \(f[\beta _0, \alpha _1, \ldots , \alpha _k] \ge 0\). In the case \(\beta _0 = \alpha _0\), this is obvious by Formula (25), which gives \(f[\alpha _0, \ldots , \alpha _k] = 0\), while, if \(\beta _0 > \alpha _0\), we have by permutation invariance, Formula (26), and the induction hypothesis,

$$\begin{aligned} f[\beta _0, \alpha _1, \ldots , \alpha _k]&= (\beta _0 - \alpha _0) {f[\beta _0, \alpha _1, \ldots , \alpha _k] - f[\alpha _0, \alpha _1, \ldots , \alpha _k] \over \beta _0 - \alpha _0} \\&= (\beta _0 - \alpha _0) {f[\beta _0, \alpha _1, \ldots , \alpha _k] - f[\alpha _1, \ldots , \alpha _k, \alpha _0] \over \beta _0 - \alpha _0} \\&= (\beta _0 - \alpha _0) f[\beta _0, \alpha _1, \ldots , \alpha _k, \alpha _0] = (\beta _0 - \alpha _0) f[\beta _0, \alpha _0, \ldots , \alpha _k]\\&\ge 0. \end{aligned}$$

We then show that \(f[\beta _0, \beta _1, \alpha _2, \ldots , \alpha _k] \ge 0\). In the case \(\beta _1 = \alpha _1\), this is what we have just proved, while, if \(\beta _1 > \alpha _1\), we have in the same way

$$\begin{aligned} f[\beta _0, \beta _1, \alpha _2, \ldots , \alpha _k]&= f[\beta _0, \alpha _1, \ldots , \alpha _k] + (\beta _1 - \alpha _1)\\&\quad {f[\beta _0, \beta _1, \alpha _2, \ldots , \alpha _k] - f[\beta _0, \alpha _1, \alpha _2, \ldots , \alpha _k] \over \beta _1 - \alpha _1} \\&= f[\beta _0, \alpha _1, \ldots , \alpha _k] + (\beta _1 - \alpha _1) f[\beta _0, \beta _1, \alpha _1, \ldots , \alpha _k] \ge 0. \end{aligned}$$

Proceeding similarly we eventually get to \(f[\beta _0, \beta _1, \ldots , \beta _k] \ge 0\).
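A small exact computation illustrates the lemma. The zeros \(\alpha _j\) and the nodes \(\beta _j\) below are hypothetical values chosen to satisfy the hypotheses with \(l = 3\) and \(L = 4\); the divided differences are computed with the Newton recursion (26) in rational arithmetic.

```python
from fractions import Fraction
from math import prod

def divided_difference(f, xs):
    """Newton recursion of Formula (26)."""
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:]) - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

# Hypothetical data: alpha_0 > alpha_1 > alpha_2 and beta_j >= alpha_j for j < 3.
alphas = [Fraction(3), Fraction(1), Fraction(0)]
betas = [Fraction(4), Fraction(2), Fraction(1, 2), Fraction(-1)]

def f(x):
    return prod(x - a for a in alphas)

values = [divided_difference(f, betas[:k + 1]) for k in range(len(betas))]
print(values)   # 12, 7, 5/2, 1
```

The last value equals 1 because f is monic of degree 3 and four nodes are used, which is the case \(r = 0\) of the proof.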

The theorem is finally proved by showing, by induction on \(l = |\mathcal{X} {\setminus } \mathcal{B}|\), the stronger statement (recall Formula (27)):

$$\begin{aligned} W_\mathcal{B}[\xi _0, \ldots , \xi _k](x, y) \ge 0, \qquad k < L, \qquad x, y \in \mathcal{X} {\setminus } \mathcal{B}, \end{aligned}$$

for all \(\xi _0> \xi _1> \cdots > \xi _{L - 1}\) with \(L \ge l\) and such that \(\xi _j \ge -\lambda _{j, \mathcal{B}}\) for \(j < l\). The claim is obvious for \(l = 1\), and we distinguish two cases for \(l > 1\). If \(x = y\), we do not need the inductive hypothesis: it follows from Formula (29) that

$$\begin{aligned} W_\mathcal{B}[\xi _0, \ldots , \xi _k](x, x) = Z_{\mathcal{B} \cup \{x\}}[\xi _0, \ldots , \xi _k], \end{aligned}$$

and, by the Cauchy interlacement theorem (this is where the reversibility hypothesis matters), \(\xi _j \ge -\lambda _{j, \mathcal{B}}\) implies \(\xi _j \ge -\lambda _{j, \mathcal{B} \cup \{x\}}\), so that the non-negativity follows from the lemma. If \(x \ne y\), the claim follows in the same way from Formula (30) and the inductive hypothesis.
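The two ingredients of this last step, interlacing of the spectra and non-negativity of the products \(\prod _{j > k} \bigl ([L]_{\mathcal{X} {\setminus } \mathcal{B}} + \lambda _{j, \mathcal{B}}\bigr )\), can be observed numerically. The sketch below assumes symmetric rates, so that the process is reversible with respect to the uniform measure; the rates, the set \(\mathcal{B}\) and the point x are hypothetical choices.

```python
import numpy as np

def decay_rates(L, B):
    """Sorted eigenvalues lam_{0,B} <= lam_{1,B} <= ... of -[L] outside B."""
    keep = [x for x in range(L.shape[0]) if x not in B]
    return np.sort(np.linalg.eigvalsh(-L[np.ix_(keep, keep)]))

# Hypothetical reversible example: symmetric random rates on 5 states.
rng = np.random.default_rng(2)
rates = rng.uniform(0.5, 1.5, size=(5, 5))
rates = (rates + rates.T) / 2
np.fill_diagonal(rates, 0.0)
L = rates - np.diag(rates.sum(axis=1))

B = {4}
lam_B = decay_rates(L, B)
lam_Bx = decay_rates(L, B | {0})

# Interlacing: lam_{j,B} <= lam_{j,B + {x}} <= lam_{j+1,B}.
tol = 1e-12
interlaced = all(
    lam_B[j] - tol <= lam_Bx[j] <= lam_B[j + 1] + tol
    for j in range(len(lam_Bx))
)

# Entrywise non-negativity of prod_{j>k} ([L] + lam_{j,B}) for each k.
A = L[:4, :4]
nonnegative = True
for k in range(4):
    M = np.eye(4)
    for j in range(k + 1, 4):
        M = M @ (A + lam_B[j] * np.eye(4))
    nonnegative = nonnegative and bool(M.min() >= -1e-9)

print(interlaced, nonnegative)
```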