1 Introduction

Let T be a \((d_1,d_2)\)-biregular tree with \(d_1,d_2\ge 3\). Denote by \({\text {Aut}}(T)\) the group of automorphisms acting without edge inversion. Let G be a non-compact, closed subgroup of \({\text {Aut}}(T)\) acting transitively on the boundary of the tree \(\partial T\). Let \(\Gamma \le G\) be a lattice and \(X=G/\Gamma \).

This parallels the classical setting of homogeneous dynamics, where one studies the actions of certain subgroups on a quotient of a linear algebraic group by a lattice. These two worlds intersect, for example, when \(G={\text {SL}}_2(k)\), where k is a non-archimedean local field, in which case G naturally acts on the associated Bruhat–Tits tree. However, our geometric setting also comprises many groups \(G\le {\text {Aut}}(T)\), including \({\text {Aut}}(T)\) itself, that are not linear [12].

We first focus on the homogeneous space \(X=G/\Gamma \), where \(\Gamma \) is a geometrically finite lattice. The dynamics of the discrete geodesic flow on X was considered by Paulin in [41], and is related, among other things, to the theory of continued fractions in non-archimedean local fields. We recall that when G is linear, by works of Raghunathan and Lubotzky [35, 43], any lattice therein is geometrically finite.

In our geometric setup, the role of Ad-unipotent subgroups in classical homogeneous dynamics is played by the horospherical subgroups \(G_\eta ^0\) of G, for \(\eta \in \partial T\). In the earlier work [13], the authors classified Borel probability measures on \(G/\Gamma \) invariant under the \(G_\eta ^0\)-action for a large class of groups G and general lattices \(\Gamma \), establishing an analogue of Dani’s result in [15]. Moreover, it was shown that when \(\Gamma \) is geometrically finite, \(G_\eta ^0\)-orbits are either compact or dense, as in the classical result of Hedlund [30] on the horocycle flow on finite volume hyperbolic surfaces.

1.1 Non-escape of mass

The horospherical group \(G_\eta ^0\) is amenable and one can easily construct Følner sequences therein: let \(a \in G\) be a hyperbolic element that has \(\eta \) as its attracting fixed point on \(\partial T\) and let M be the compact subgroup of \(G^0_\eta \) that fixes pointwise the translation axis of a in T. Then for any M-invariant compact subset O with non-empty interior in \(G^0_\eta \), the sequence \((O_t:=a^tOa^{-t})_{t \in \mathbb {N}}\) constitutes a Følner sequence in \(G^0_\eta \) (see e.g. [13, Lemma 2.10]). In the sequel, we shall refer to such sequences \(O_t\) as good Følner sequences. Følner sequences allow one to average along larger and larger pieces of the orbits. For \(x\in X\), we define \(\nu _{x,t} = m_{O_t} * \delta _x\), where \(m_{O_t}\) is the normalized restriction of the Haar measure \(m_{G^0_\eta }\) to \(O_t\); in other words for \(f \in C_c(X)\),

$$\begin{aligned} \int _X f(y) d\nu _{x,t}(y) = \int _{O_t} f(ux) dm_{O_t}(u). \end{aligned}$$

The probability measures \(\nu _{x,t}\) are called the orbital measures.

In general, one can obtain qualitative information on the statistical behaviour of typical points \(x \in X\). This can be done using the Howe–Moore property, established in our setting in [11], and the amenable ergodic theorem [34]. Our topological result in [13] says, however, that all points \(x \in X\) that do not lie in a compact \(G^0_\eta \)-orbit have dense orbits. Therefore, the immediate question arises whether every dense orbit equidistributes to the Haar measure on \(G/\Gamma \). A first possible obstruction to this is the escape of mass phenomenon. Our first result states that this does not happen when \(\Gamma \) is a geometrically finite lattice.

Theorem A

(Non-escape of mass) Let T be a \((d_1,d_2)\)-biregular tree, with \(d_1,d_2\ge 3\), and G a non-compact, closed subgroup of \({\text {Aut}}(T)\) acting transitively on \(\partial T\). Let \(\Gamma \) be a geometrically finite lattice in G, \(\eta \in \partial T\) and \(O_t\) a good Følner sequence in \(G^0_\eta \). Then, for every \(\varepsilon >0\), there exists a compact set \(K=K(\varepsilon ) \subset X\) such that for every \(x \in X\) not contained in a compact \(G^0_\eta \)-orbit, there exists a positive integer \(N=N(x,\varepsilon )\) with the property that for every \(t \ge N\), we have

$$\begin{aligned} \nu _{x,t}(K) > 1-\varepsilon . \end{aligned}$$
(1.1)

The above is known as non-escape of mass. In the context of one-parameter unipotent flows on quotients of real Lie groups, it is due to Dani and Margulis [14, 16]. Our result also applies to the linear setting; we now describe this special case. Let k be a non-archimedean local field and H be the group of k-points of a connected semisimple linear algebraic k-group \(\mathbb {H}\) of k-rank one. Let \(\mathbb {A}\) be a maximal k-split torus in \(\mathbb {H}\), \(\mathbb {Z}\) its centralizer, \(\mathbb {U}\) a maximal unipotent subgroup, \(\mathbb {P}\) the normalizer of \(\mathbb {U}\), and let A, Z, U, P be the respective groups of k-points. The group H acts by automorphisms [8] (see also [35, page 411]) on its Bruhat–Tits building, which is a biregular tree T. If \(\mathbb {H}\) is simply connected, then H embeds as a closed subgroup of \({\text {Aut}}(T)\). In general, H might act with edge inversions, and in this case we shall replace it with an index two subgroup that acts without edge inversion. Moreover, let K be a good maximal compact subgroup of H. The group K is the stabilizer of a vertex of T, \(P=ZU\) is the stabilizer of an end \(\eta \in \partial T\) and we have the Iwasawa decomposition \(H=KP\) (see [8, § 4] or [7, § 8.2.1]). Finally, let M be the compact subgroup \(K \cap Z\) of H. In our geometric setting, we have \(H^0_\eta =MU\) and the following result is an immediate consequence of the previous theorem:

Corollary 1.1

Let H and its subgroups M and U be as above. Let \(\Lambda \) be a lattice in H and \(O_t\) be a good Følner sequence in MU. Then, for every \(\varepsilon >0\), there exists a compact set \(K=K(\varepsilon ) \subset X\) such that for every \(x \in X\) not contained in a compact MU-orbit, there exists a positive integer \(N=N(x,\varepsilon )\) with the property that for every \(t \ge N\), we have \( \nu _{x,t}(K) > 1-\varepsilon \).

This corollary is only relevant for fields k with \({\text {char}}k \ne 0\). Indeed, in the characteristic zero case, by a result of Tamagawa [54] (also observed in [47]), every lattice in H is uniform. We also remark that the version of the previous corollary for U (instead of MU) holds as well. Finally, we note that a related result which would imply the previous corollary was mentioned in [29, page 467].

An immediate general consequence of Theorem A is

Corollary 1.2

For every \(x \in X\), every weak-\(*\) limit of the sequence \(\nu _{x,t}\) as \(t\rightarrow \infty \) is a \(G_\eta ^0\)-invariant probability measure on X.

In the proof of Theorem A, exploiting the underlying geometric setting, we translate the problem of understanding the distribution of \(G^0_\eta \)-orbits in \(G/\Gamma \) to the language of Markov chains, where it appears as a problem of controlling the distributions of a Markov chain with changing starting distributions. We then rely on two main ingredients. The first is a qualitative description of the behaviour of the discrete geodesic flow on \(G/\Gamma \), as studied in [13]. This allows us to understand the behaviour of the starting distributions of the Markov chain. The second ingredient is, naturally, a set of Markov chain theoretical tools. The proof is then carried out by combining the two ingredients.

1.2 Equidistribution of orbits

For example, when \(G={\text {Aut}}(T)\) and for \(x \in X\) lying in a compact \(G^0_\eta \)-orbit, by standard arguments, all weak-\(*\) limits of \(\nu _{x,t}\) are \(G_\eta ^0\)-invariant probability measures supported on the homogeneous orbit. This orbit supports a unique \(G^0_\eta \)-invariant measure and, hence, \(\nu _{x,t}\) equidistribute to the homogeneous measure supported on the orbit closure.

Under the additional topological simplicity assumption on G, our second result yields a complete qualitative description of statistical behaviour of every \(x \in X\) not contained in a compact \(G^0_\eta \)-orbit and for such \(x \in X\), it identifies the limit of \(\nu _{x,t}\) as the Haar measure:

Theorem B

(Equidistribution) Let T be a \((d_1,d_2)\)-biregular tree, with \(d_1,d_2\ge 3\), and G a non-compact, closed, topologically simple subgroup of \({\text {Aut}}(T)\) acting transitively on \(\partial T\). Let \(\Gamma \) be a geometrically finite lattice in G and \(O_t\) be a good Følner sequence in \(G^0_\eta \). Assume \(x\in X\) does not belong to a compact \(G^0_\eta \)-orbit. Then, the orbital measures \(\nu _{x,t}\) equidistribute to the normalized Haar measure \(m_X\) as \(t\rightarrow \infty \), in other words, for every \(f \in C_c(X)\), we have

$$\begin{aligned} \int _{O_{t}} f(ux) dm_{O_{t}}(u) \underset{t \rightarrow \infty }{\longrightarrow } \int _X f(y) dm_X(y). \end{aligned}$$

The previous theorem has the following immediate consequence on the statistical behaviour of \(G^0_\eta \)-orbits. Let L be a closed subgroup of G. A probability measure \(\mu \) on X is called L-homogeneous if it is the unique L-invariant probability measure on a closed L-orbit. It is said to be homogeneous if it is L-homogeneous for some closed subgroup \(L<G\). A point \(x \in X\) is called generic for \(G^0_\eta \) (see [47, Definition 1]) if for some (equivalently for any) good Følner sequence \(O_t\), the sequence \(\nu _{x,t}\) of orbital measures equidistributes to a homogeneous measure.

Corollary 1.3

Keep the hypotheses of Theorem B. Any \(x\in X\) is generic for \(G^0_\eta \).

In the context of unipotent flows on \({\text {SL}}_2({\mathbb {R}})/\Gamma \), this result goes back to Dani–Smillie [17]. Since then, Ratner [46, 48], Shah [51] and others have obtained very general results in Lie groups or algebraic groups over local fields of characteristic zero, but even in the case of a semisimple linear group G of rank one over a local field of positive characteristic, e.g. \({\text {SL}}_2(k)\) with \(k=\mathbb {F}_q((X^{-1}))\), this result does not appear in the literature. However, we remark that for arithmetic quotients of linear groups, one may deduce such an equidistribution result by combining the work of Mohammadi [38] and the result mentioned by Ghosh in [29, page 467]. In the linear setting, the previous results have the following immediate consequence:

Corollary 1.4

Keep the hypotheses and the notation of Corollary 1.1. The statement of Corollary 1.3 holds when \(G^0_\eta \) is replaced with the subgroup MU of H.

For example, for \(H={\text {SL}}_2(\mathbb {F}_q((X^{-1})))\), one can take \(\Gamma \) to be the non-uniform lattice \({\text {SL}}_2(\mathbb {F}_q[X])\) and the groups M and U to be

$$\begin{aligned} M=\begin{pmatrix} \mathbb {F}_q[[X^{-1}]]^* &{} 0 \\ 0 &{} \mathbb {F}_q[[X^{-1}]]^* \end{pmatrix},\, \; \; \; U = \begin{pmatrix} 1 &{} \mathbb {F}_q((X^{-1})) \\ 0 &{} 1 \end{pmatrix}. \end{aligned}$$

We remark that for uniform lattices, one can use Margulis’ orbit thickening argument to show that the U-action is uniquely ergodic (see Mohammadi [38], Ellis–Perrizo [22] or [13, Lemma 6.3]). It is also worth noting that for non-uniform quotients, using our geometric approach, one can show the version of the previous corollary for the U-action (instead of MU). Finally, we mention the work of Vatsal [56] in which the equidistribution results of Ratner [47, 48] for unipotent dynamics in the p-adic case were applicable with a geometric approach similar to ours (see [56, Page 9-10]).

The proof of Theorem B uses Theorem A, the classification of \(G_\eta ^0\)-orbits given in [13], and the Howe–Moore property established in [11].

1.3 New non-linear homogeneous dynamical phenomena

So far, the results obtained in Theorems A and B for geometrically finite lattices parallel the more classical results in linear homogeneous dynamics. However, the family of tree lattices is very rich and, as opposed to the linear setting, there exist many non-geometrically finite lattices. These exhibit wilder behaviors than their linear counterparts giving rise to several interesting phenomena that do not appear in the classical setting. Various aspects of these differences, as well as analogies, were studied by many, including Serre [50], Tits [55], Bass–Kulkarni [2], Burger–Mozes [10, 11], Lubotzky [35], Bass–Lubotzky [3], Paulin [42], Bekka–Lubotzky [5] etc. The following results add a new dynamical aspect to these non-linear phenomena showing that horospherical orbits on quotients by non-geometrically finite lattices can exhibit escape of mass, which does not occur in homogeneous dynamics in the linear setting.

Theorem C

(Escape of mass) For any \(q\ge 2\), there exist a lattice \(\Gamma \) in \(G={\text {Aut}}(T_{2q+2})\) and \(\eta \in \partial T_{2q+2}\) such that for the trivial coset \(x=e\Gamma \in X\), any compact \(K\subset X\) and any good Følner sequence \((O_t)_{t \in \mathbb {N}}\) in \(G^0_\eta \), we have

$$\begin{aligned} \lim _{t\rightarrow \infty } \nu _{x,t}(K) = 0. \end{aligned}$$

Recall that in the setting of unipotent dynamics on linear homogeneous spaces, by now classical results of Ratner [45, 46, 48], Mozes, Shah [39, 51] and others show that the orbital averages along unipotent group actions always converge towards an invariant probability measure. The following result contrasts the classical situation by giving an example where we see not only an escape of mass phenomenon, but also a failure of convergence of the orbital averages along Følner sequences.

Theorem D

(Escape of mass and equidistribution) There exists a non-uniform lattice \(\Gamma <{\text {Aut}}(T_6)\) with the property that for any \(\eta \in \partial T\) there exist points \(x \in X={\text {Aut}}(T_6)/\Gamma \) such that for any good Følner sequence \((O_t)_{t\in \mathbb {N}}\) in \(G^0_\eta \), the set of accumulation points of the sequence of orbital averages \(\nu _{x,t}\) contains the zero measure and \(m_X\).

The proof of this theorem is carried out in Sect. 5 and consists of several parts. In fact, it yields an uncountable number of non-isomorphic such lattices in \({\text {Aut}}(T_6)\). The construction of these lattices has a similar flavor as the constructions of Bass–Lubotzky in [3] to show that there are lattices of arbitrarily small covolumes in \({\text {Aut}}(T)\). Once the candidate lattices are constructed, the escape of mass phenomenon is proven by exploiting further the aforementioned connection between the Markov chain theory and distributions of horospherical orbits. This step uses the relatively finer ingredient of subgaussian concentration estimates for geometrically ergodic Markov chains (see e.g. Dedecker-Gouëzel [18]). Finally, the proofs of the uniqueness of the \(G^0_\eta \)-invariant probability measure and the equidistribution along some orbital averages rely, among others, on the mixing of the discrete geodesic flow and the positive recurrence of the associated Markov chain.

1.4 Equidistribution of spheres

To describe the general problem that we study here, consider a morphism of graphs \(\pi : T \rightarrow Q\), where T is a biregular tree. For a vertex \(\tilde{v} \in VT\), let \(S(\tilde{v},n)\) be the set of vertices of T at distance n from \(\tilde{v}\). Let \(\rho _n\) be the uniform distribution on \(S(\tilde{v},n)\). We are interested in the distributions \(\pi _* \rho _n\) on VQ: do they have a limiting distribution and, if yes, can one identify it? Questions about equidistribution of spheres are well-studied in many homogeneous quotients: Euclidean spheres in \({\mathbb {R}}^d/{\mathbb {Z}}^d\) in [44] or hyperbolic spheres in quotients of hyperbolic space \(\mathbb {H}^d/\Gamma \), where \(\Gamma \) is a lattice in \({\text {SO}}(d,1)\) (see [6, Theorem 3.3], [21, 25, 44, 52] for more general results with applications to various counting problems). In the following result, we answer such a question for the natural quotient Q of the tree associated to the \(\Gamma \)-action, where \(\Gamma \) is a general lattice in \({\text {Aut}}(T)\).

Theorem E

(Equidistribution of spheres in quotients by tree lattices) Let T be a biregular tree, \(\Gamma \le {\text {Aut}}(T)\) a tree lattice. Denote by \(Q=\Gamma \backslash T\).

  1.

    (Non-escape of mass) For any \(\epsilon >0\), there exists a finite subset \(K\subset VQ\), such that for all \(n \in \mathbb {N}\) we have

    $$\begin{aligned} \pi _* \rho _n(K) \ge 1-\epsilon . \end{aligned}$$
  2.

    (Limiting distribution) There exists an integer p, and limiting probability distributions \(\mu _0, ..., \mu _{p-1}\) on VQ such that for all \(v\in VT\) and for all \(0\le j <p\) we have

    $$\begin{aligned} \pi _* \rho _{pn+j} \rightarrow \mu _j, \quad \text { as } n\rightarrow \infty . \end{aligned}$$
  3.

    (Exponential convergence) If, in addition, \(\Gamma \) is geometrically finite, we can take \(p=2\) and there exists \(r>1\) such that

    $$\begin{aligned} \Vert \pi _* \rho _{2n+j} - \mu _j \Vert = o(r^{-n}), \end{aligned}$$

    where \(\Vert .\Vert \) denotes the total variation norm.

In the geometrically finite case (3), the measures \((\mu _j)_{j=0,1}\) coincide with the projections of the Haar measure \(m_X\) by the natural map \({\text {proj}}: {\text {Aut}}(T)/\Gamma \rightarrow VQ\) for two different base points. The exponential rate of convergence 1/r in this result can be made effective, using an effective version of the geometric ergodic theorem for Markov chains as in [4].

The proof of the previous result relies on the tools we develop to prove Theorem A. Indeed, the Markov chain that we construct to track the statistical behaviour of horospherical averages easily allows one to understand the spherical averages provided one proves a (positive) geometric recurrence property for (non-) geometrically finite lattice quotients. This is carried out in Sect. 6. To draw an analogy, the overall proof can be seen to parallel, in a considerably simpler fashion, the deduction of [24, Theorem 4.4] from Theorem 4.1 in that work.

Remark 1.5

(Diophantine exponent vs. speed of equidistribution) In fact, in the geometrically finite case, using the geometric recurrence of the associated Markov chain (Lemma 6.7), one can show the version of the equidistribution in Theorem B on the quotient VQ additionally with a speed as in (3) above. The equidistribution itself directly follows by projecting the measures \(m_{O_t}\) and \(m_X\) in Theorem B by the map \({\text {proj}}\). The speed of equidistribution depends on a geometric diophantine exponent (see e.g. [53, (1.6)] and [26, 42]) of the boundary point \(g^{-1}\eta \) where \(x=g\Gamma \). From this perspective, Theorem E (3) can also be seen as a particular case based on the fact of hyperbolic geometry that large circles are well-approximated by horocycles [24, p.116] (see also Remark 6.8).

1.5 Counting lattice points

Another classical question closely related to the equidistribution of spheres is the problem of counting lattice points. To describe the general problem, consider a lattice \(\Gamma \) (or more generally a discrete subgroup) in some locally compact topological group endowed with a non-negative functional \(\Vert \cdot \Vert \). One is interested in describing the asymptotics of

$$\begin{aligned} N(R) = | \{ \gamma \in \Gamma : \Vert \gamma \Vert \le R \} |. \end{aligned}$$

This problem goes back to Gauss, who was interested in the case \({\mathbb {Z}}^d \le {\mathbb {R}}^d\) with the Euclidean norm as the functional \(\Vert \cdot \Vert \). This particular problem is known as the Gauss circle problem and the sharp error rates are still unknown. For \(\Gamma \le {\text {SL}}_2({\mathbb {R}})\), one can take \(\Vert \cdot \Vert \) to be the operator norm induced by the Euclidean norm on \(\mathbb {R}^2\), in which case, we have \(\Vert g\Vert = \exp (\frac{1}{2} d_{\mathbb {H}^2}(g.i, i))\). This was already studied by Delsarte [19] in the 1940s, who obtained the first non-Euclidean counting results. In the same setting, the lattice point counting problem is also closely related to the counting of closed geodesics on hyperbolic surfaces. For an extensive historical survey and overview of the methods used, we refer to [28], where the authors also develop spectral techniques to study the lattice point counting problem in great generality.
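To make the counting function N(R) concrete in the simplest classical case, the following Python sketch counts the points of \({\mathbb {Z}}^2\) in a Euclidean disc of integer radius R. It only illustrates the definition above; the function name `gauss_circle_count` is ours, and the restriction to dimension two and to integer radii is an assumption made for brevity.

```python
from math import isqrt, pi


def gauss_circle_count(R):
    """N(R) = #{(a, b) in Z^2 : a^2 + b^2 <= R^2} for an integer R >= 0."""
    total = 0
    for a in range(-R, R + 1):
        # For fixed a, the admissible b satisfy |b| <= floor(sqrt(R^2 - a^2)).
        total += 2 * isqrt(R * R - a * a) + 1
    return total


if __name__ == "__main__":
    for R in (1, 10, 100, 1000):
        n = gauss_circle_count(R)
        # The main term is the area pi * R^2; the ratio should approach 1.
        print(R, n, n / (pi * R * R))
```

The main term here is the area of the disc; the open question mentioned above concerns the size of the difference between N(R) and this area.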

Coming back to our setting, in analogy with the real hyperbolic case, it is natural to consider the functional \(\Vert g\Vert = d(g\tilde{o},\tilde{o})\), where \(\tilde{o}\in VT\) is some basepoint and d the graph distance on the tree. Clearly, for a discrete \(\Gamma \), N(R) is finite and non-decreasing in R. The following result describes the growth asymptotics of N(R) with exponential error term for a geometrically finite lattice \(\Gamma \):

Theorem F

Let T be a biregular tree, \(\Gamma \le {\text {Aut}}(T)\) a geometrically finite tree lattice. Let m be a Haar measure on \({\text {Aut}}(T)\) and \(m_X\) the induced finite measure on \(X={\text {Aut}}(T)/\Gamma \). Fix a basepoint \(\tilde{o}\in VT\) and for \(R \in \mathbb {N}\), let

$$\begin{aligned} N(R) = | \{ \gamma \in \Gamma : d(\gamma \tilde{o}, \tilde{o}) \le R \} |. \end{aligned}$$

Denote by \(B_T(R)\) the cardinality of the set of vertices at an even distance from \(\tilde{o}\) that is at most R. Then, there exists \(c \in (0,1)\) such that

$$\begin{aligned} \left| \frac{N(2R)}{B_T(2R)} - \frac{m(G_{\tilde{o}})}{m_X(X)} \right| = o(c^{2R}). \end{aligned}$$

We stress that, unlike before, we do not normalize the measure \(m_X\) to be a probability measure. We also remark that the main term \(\frac{m(G_{\tilde{o}})}{m_X(X)}\) can alternatively be expressed as \((\sum _{v \in V'Q} \frac{1}{|\Gamma \cap G_{\tilde{v}}|} )^{-1}\), where for every vertex v of \(Q=\Gamma {\setminus } T\), \(\tilde{v} \in VT\) denotes a lift of v, \(G_{\tilde{v}}\) is the maximal compact subgroup of \({\text {Aut}}(T)\) fixing \(\tilde{v}\) and \(V'Q\) denotes the set of vertices of Q at even distance from \(\pi (\tilde{o})\). Finally, we note that \({\text {Aut}}(T)\) acts without edge inversion and this implies that for every \(g \in {\text {Aut}}(T)\) and \(\tilde{v} \in VT\), \(d(g\tilde{v},\tilde{v}) \in \mathbb {N}\) is an even number. This is the reason why, in the previous statement, we only consider vertices at even distance from each other.
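The normalizing quantity \(B_T(R)\) in Theorem F is elementary to compute. The following Python sketch does so for a biregular tree by summing the sizes of the spheres of even radius around the basepoint. The function names and the parameters `d_center` (the degree of \(\tilde{o}\)) and `d_other` (the degree of its neighbours) are ours, introduced only for this illustration.

```python
def sphere_size(d_center, d_other, n):
    """Number of vertices at distance n from a vertex of degree d_center
    in a biregular tree whose two degrees are d_center and d_other."""
    if n == 0:
        return 1
    size = d_center
    for k in range(1, n):
        # Vertices at odd distance from the centre have degree d_other,
        # those at even distance have degree d_center; each contributes
        # (degree - 1) vertices to the next sphere.
        degree = d_other if k % 2 == 1 else d_center
        size *= degree - 1
    return size


def even_ball_size(d_center, d_other, R):
    """B_T(R): number of vertices at even distance at most R from the basepoint."""
    return sum(sphere_size(d_center, d_other, n) for n in range(0, R + 1, 2))


if __name__ == "__main__":
    # 3-regular tree: spheres of radius 0, 2, 4 have sizes 1, 6, 24.
    print(even_ball_size(3, 3, 4))
```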

We remark that this theorem also follows from the main result of Kwon in [32] and from the work of Roblin [49, Chapitre 4, Corollaire 2]. Our proof relies on our previous result on the equidistribution of spheres (Theorem E) and is a relatively straightforward consequence thereof. An exponential error rate \(c \in (0,1)\) can also be effectively calculated.

The article is organized as follows. We recall some preliminary material mostly on lattices in groups acting on trees and set our notation in Sect. 2. In Sect. 3, we associate a natural Markov chain to an edge-indexed graph, study its properties and use these to prove Theorem A for geometrically finite lattices. In Sect. 4, we prove Theorem B. Theorems C and D are proven in Sect. 5. In Sect. 6, we introduce an auxiliary Markov chain and use this to study the edge-indexed graph associated to a general lattice and prove Theorems E and F.

2 Preliminaries

2.1 Basic notation

We denote by T a \((d_1,d_2)\)-biregular tree, with \(d_1,d_2\ge 3\), by VT its set of vertices and by ET its set of edges. All edges are directed and \(\partial _0, \partial _1 : ET \rightarrow VT\) are, respectively, the initial and the terminal vertex maps. An (ordered) pair of edges \(e_1,e_2\) is called consecutive if \(\partial _1(e_1)=\partial _0(e_2)\). A sequence of consecutive edges \(e_1,...,e_n\) is called a path of length n. We also refer to it as a path between \(\partial _0(e_1)\) and \(\partial _1(e_n)\). The distance \(d(\cdot ,\cdot )\) between two vertices of the graph is defined as the minimal length of a path between these vertices.

We denote by \({\text {Aut}}(T)\) the group of tree automorphisms acting without edge inversion, i.e. the group of automorphisms g such that \(d(gv,v)\equiv 0 \pmod 2\) for one (equivalently every) vertex \(v \in VT\). When \(d_1=d_2\), this is an index two subgroup of the full group of automorphisms. Endowed with the topology of pointwise convergence, it is a locally compact, second countable group. In this article G always stands for a non-compact, closed subgroup of \({\text {Aut}}(T)\) that acts transitively on the boundary \(\partial T\) of T.

Throughout the rest of the article, we fix a basepoint \(\tilde{o}\in VT\) and a distinguished end \(\eta \in \partial T\), and denote by \((y_0, y_1, y_2 ,...)\) the vertices of the infinite path converging to \(\eta \), where \(y_0=\tilde{o}\).

For a subset \(S\subset T\), and a subgroup \(H<{\text {Aut}}(T)\), \(H_S\) denotes the pointwise stabilizer of S in H. Given \(\eta \in \partial T\), we define

$$\begin{aligned} G_\eta ^0 : =\{ g\in G ~|~ \exists N , \forall n\ge N \quad g(y_n)=y_n \}. \end{aligned}$$

The group \(G_\eta ^0\) is called the horospherical subgroup (see [13, Section 2] for more details on horospherical subgroups). It is a closed and amenable subgroup of G and as mentioned in the introduction, one can construct many good Følner sequences in \(G^0_\eta \). The following sequence of compact open subgroups of \(G_\eta ^0\) yields a good and tempered Følner sequence that is particularly convenient for our geometric approach. For \(t \in \mathbb {N}\), we set

$$\begin{aligned} F_t := \{ g\in G_\eta ^0 ~|~ g(y_t)=y_t \}. \end{aligned}$$

In fact, as we shall see, thanks to the structure of good Følner sequences, it will be sufficient to prove our results only for the sequence \(F_t\). Denote by \(m_G\) and \(m_{G_\eta ^0}\) the Haar measures on G and \(G_\eta ^0\), respectively. By \(m_{F_t}\) we denote the Haar probability measure on \(F_t\) which clearly coincides with the normalized restriction of \(m_{G^0_\eta }\) to \(F_t\).

2.2 Lattices and their associated edge-indexed graphs

It is well-known that a subgroup \(\Gamma \le G\) is discrete if and only if all vertex stabilizers \(\Gamma _v\) for \(v\in VT\) are finite. A discrete subgroup \(\Gamma \le G\) is called a lattice if \(X=G/\Gamma \) admits a G-invariant Borel probability measure, in which case we denote this measure by \(m_X\). By our standing assumption of boundary transitivity of G, the quotient graph \(G\backslash T\) has two vertices. Indeed by [10, Lemma 3.1.1], G acts two-transitively on \(\partial T\) which in turn implies that G has precisely two orbits on VT. Moreover, since G acts without edge inversions, it acts transitively on the set of vertices of even distance. In this case, \(\Gamma \) is a lattice in G if and only if it is a lattice in \({\text {Aut}}(T)\). Therefore, all lattices we will consider are tree lattices, i.e. lattices in \({\text {Aut}}(T)\). For convenience, we will often call them lattices without specifying the ambient group. We refer to [3] for more details on tree lattices and edge-indexed graphs.

Given a discrete subgroup \(\Gamma \), there is a useful construction [2] of a graph Q and map \({\text {ind}}: EQ \rightarrow \mathbb {N}\) as follows: the graph Q is the quotient graph \(\Gamma \backslash T\), which is well-defined, since \(\Gamma \) acts without edge inversion. Denote by \(\pi :T\rightarrow Q\) the projection map. The index map \({\text {ind}}:EQ \rightarrow {\mathbb {N}}\) is given by \( {\text {ind}}(e) = [ \Gamma _{\partial _0(\tilde{e})} :\Gamma _{\tilde{e}} ]\), where \(\tilde{e}\in ET\) is any edge with \(\pi (\tilde{e})=e\). This clearly does not depend on the choice of the lift \(\tilde{e}\). The pair \((Q,{\text {ind}})\) is called the edge-indexed graph associated to \(\Gamma <{\text {Aut}}(T)\).

For \(v\in VQ\), we define \(\deg (v)\) to be the valency of any of its lifts \(\tilde{v}\). By definition of the map \({\text {ind}}\),

$$\begin{aligned} \deg (v)=\sum _{e\in EQ : \partial _0(e)=v}{\text {ind}}(e). \end{aligned}$$
(2.1)

Further, let \(\Delta : EQ \rightarrow {\mathbb {R}}\) be given by \(\Delta (e)=\frac{{\text {ind}}(\overline{e})}{{\text {ind}}(e)}\) and for \(u,v\in VQ\) define

$$\begin{aligned} N_u (v) = \Delta (e_1)...\Delta (e_n), \end{aligned}$$
(2.2)

where \((e_1,...,e_n)\) is a path from u to v. For an edge-indexed graph \((Q,{\text {ind}})\) associated with a discrete subgroup \(\Gamma \), the value of \(N_u(v)\) does not depend on the choice of the path. Fixing a basepoint \(o\in VQ\) (for convenience, we use \(o=\pi (\tilde{o})\)), the discrete subgroup \(\Gamma \) is a lattice in G (see [3, § 1.1.5]) if and only if

$$\begin{aligned} {\text {vol}}_o(Q,{\text {ind}}):= \sum _{v\in VQ} N_o(v)^{-1} < \infty , \end{aligned}$$
(2.3)

where d(., .) denotes the graph distance on Q. We shall refer to this quantity as the volume of the edge-indexed graph \((Q,{\text {ind}})\) based at o. We also remark that changing the base point from o to \(o'\) has the effect of multiplying the previous sum by the rational number \(N_o(o')\), since \(N_{o'}(v)=N_{o'}(o)N_o(v)\) and \(N_{o'}(o)=N_o(o')^{-1}\) by (2.2); it therefore does not affect its finiteness.
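To illustrate (2.2) and (2.3), the following Python sketch computes partial sums of the volume for an edge-indexed graph whose underlying graph is a ray \(x_0, x_1, x_2, \ldots \) based at \(o=x_0\). The indices are an assumption chosen for the example: \({\text {ind}}(x_0\rightarrow x_1)=q+1\), \({\text {ind}}(x_i\rightarrow x_{i+1})=1\) for \(i\ge 1\) and \({\text {ind}}(x_{i+1}\rightarrow x_i)=q\), which is the pattern of a Nagao ray (Sect. 2.3) in a \((q+1)\)-regular tree. The function names are ours.

```python
from fractions import Fraction


def nagao_indices(q, length):
    """Indices along the ray x_0, ..., x_length (illustrative assumption).

    up[i] = ind(x_i -> x_{i+1}) and down[i] = ind(x_{i+1} -> x_i).
    """
    up = ([q + 1] + [1] * (length - 1))[:length]
    down = [q] * length
    return up, down


def partial_volume(q, length):
    """Sum of N_o(v)^{-1} over v = x_0, ..., x_length with o = x_0, as in (2.3)."""
    up, down = nagao_indices(q, length)
    n_o = Fraction(1)      # N_o(o) = 1 (empty path)
    total = Fraction(1)
    for i in range(length):
        # Delta(e) = ind(reversed e) / ind(e) for e = (x_i -> x_{i+1}), see (2.2).
        n_o *= Fraction(down[i], up[i])
        total += 1 / n_o
    return total


if __name__ == "__main__":
    for length in (1, 5, 20):
        print(length, partial_volume(2, length))
```

Under these assumptions \(N_o(x_n)=q^n/(q+1)\) for \(n \ge 1\), so the partial sums increase to \(1+(q+1)/(q-1)=2q/(q-1)\) and the volume is finite.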

Conversely, one can define an abstract edge-indexed graph \((Q,{\text {ind}})\) as a tuple consisting of a graph Q and map \({\text {ind}}: EQ \rightarrow \mathbb {N}\). Under natural assumptions on the associated maps \(\Delta \) and N as above, there exists a discrete subgroup \(\Gamma \) whose associated edge-indexed graph coincides with \((Q,{\text {ind}})\) and the function N is proportional to \(v \mapsto |\Gamma _{\tilde{v}}|\), where \(\tilde{v}\) is any lift of v (see [3, page 23] or [2]).

For a discrete group \(\Gamma \le G\), we define the projection map \({\text {proj}}: G / \Gamma \rightarrow VQ\) by \({\text {proj}}(g \Gamma ):= \pi (g^{-1} \tilde{o})= \Gamma g^{-1}\tilde{o}\). The map \({\text {proj}}\) is clearly continuous and has compact fibers in \(G/\Gamma \): for each \(v \in VQ\) and \(g\in G\) such that \({\text {proj}}(g \Gamma )=v\), we have \({\text {proj}}^{-1}(v)= G_{\tilde{o}} g \Gamma \). Moreover, the measure of each fiber is

$$\begin{aligned} m_X(G_{\tilde{o}}g\Gamma ) = m_X(g^{-1}G_{\tilde{o}}g\Gamma )= m_X (G_{g^{-1}\tilde{o}}\Gamma ) = m_X(G_{\tilde{o}}\Gamma )\frac{|\Gamma _{\tilde{o}}|}{|\Gamma _{g^{-1}\tilde{o}}|} \end{aligned}$$

In other words, using the definition (2.2) of the map \(N_o\), we have

$$\begin{aligned} {\text {proj}}_*m_X(v) =\frac{1}{N_o(v)} {\text {proj}}_*m_X(o). \end{aligned}$$
(2.4)

2.3 Geometrically finite lattices

Following [3, 50], we define a Nagao ray to be an edge-indexed graph \((Q,{\text {ind}})\) whose underlying graph Q is an infinite ray and whose map \({\text {ind}}\) takes the value 1 on all edges directed towards infinity, except the edge emanating from the vertex o at the origin. All edges e directed away from infinity are indexed by \(\deg (\partial _0(e))-1\), in accordance with (2.1). Here, an edge \(e \in EQ\) is said to be directed towards infinity if \(d(\partial _1(e),o)>d(\partial _0(e),o)\), and directed away from infinity otherwise. See Fig. 1 for an example of a Nagao ray in a \((q_1+1,q_2+1)\)-biregular tree. An open Nagao ray is obtained by removing the origin vertex from a Nagao ray.

Fig. 1. Nagao ray, when T is \((q_1+1,q_2+1)\)-biregular. By convention, for an edge e, the index \({\text {ind}}(e)\) is written next to the vertex \(\partial _0(e)\)

Following Paulin [42], a tree lattice \(\Gamma \) is called geometrically finite if its associated edge-indexed graph \((Q,{\text {ind}})\) contains a finite subgraph F whose set theoretic complement in Q is a disjoint union of finitely many open Nagao rays. The finite part of \((Q,{\text {ind}})\) is the smallest non-empty finite subgraph F with this property. When T is a \((q+1)\)-regular tree, a tree lattice \(\Gamma \) is said to be of Nagao type if the associated edge-indexed graph is a Nagao ray (see [3, Chapter 10]). Fig. 2 illustrates the corresponding edge-indexed graph. Another example of a geometrically finite lattice, where T is a (3, 10)-biregular tree, is given in Fig. 3.

Fig. 2
figure 2

Edge-indexed graph of a lattice of Nagao type. The index of \((x_0,x_1)\) is determined by (2.1). The finite part consists of the single vertex \(x_0\)

Fig. 3
figure 3

Edge-indexed graph associated with a geometrically finite lattice in \({\text {Aut}}(T)\), where T is a (3, 10)-biregular tree. The finite part contains the solid nodes and the edges between them

When \(\Gamma \) is geometrically finite, we have a very useful characterization of compact \(G_\eta ^0\)-orbits in \(G/\Gamma \) (see [13, Lemma 6.2] or [42, Proposition 3.1]).

Proposition 2.1

Let \(\Gamma \le G\) be a geometrically finite lattice. Let \(g\in G\) be such that the \(G_\eta ^0\)-orbit of \(g\Gamma \) is not compact in \(G/\Gamma \). Let F denote the finite part of \(Q=\Gamma \backslash T\). Then \(\pi (g^{-1}y_t)\) belongs to F for infinitely many values of t; in particular, \(t-d(\pi (g^{-1}y_t),F )\) is monotone non-decreasing and unbounded.

2.4 Markov chains

We recall some terminology and basic facts of the theory of Markov chains and set our notation. For more details, we refer the reader to [20, 33, 37].

Let S be a countable set, and \(P: S \times S \rightarrow [0,1]\) a Markov kernel, i.e. \(\sum _{y \in S} P(x,y)=1\) for every \(x \in S\). By (standard) abuse of notation, we shall also denote the associated Markov operator and its dual by P: for a function f on S, \(Pf(x)=\sum _{y}f(y)P(x,y)\). For a measure \(\mu \) on S, \(\mu P(\cdot )=\sum _{x} \mu (x)P(x,\cdot )\). For \(n \in \mathbb {N}\), \(P^n\) denotes the \(n^{th}\)-convolution power of P. For \(s\in S\), we denote by \(\delta _s\) the probability measure supported on \(\{s\}\): for \(s_1,s_2\in S\), \(P^n(s_1,s_2):=\delta _{s_1}P^n(s_2)\).

The Markov kernel P is called irreducible if for every \(s,t \in S\), there exists \(n \in \mathbb {N}\) with \(P^n(s,t)>0\). The period of an irreducible Markov kernel P is defined as \(\gcd \{n \in \mathbb {N}\, |\, P^n(s,s)>0\}\) for some (or equivalently all) \(s \in S\). If the period is 1, the Markov chain is called aperiodic. Denoting the period by p, there exists a partition \(\Omega _0,\ldots ,\Omega _{p-1}\) of the state space S into cyclic classes \(\Omega _i\) such that for every \(s \in \Omega _i\), \(P(s,\Omega _{i+1})=1\) \((i \mod p)\). If P is irreducible and has period p, then \(P^p\) restricted to each cyclic class is irreducible and aperiodic. In a standard manner [20, Section 3.1], a Markov kernel yields a canonical Markov chain on the state space S. Therefore, we shall equivalently speak of a Markov chain being irreducible, aperiodic etc.
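The definition of the period can be tested on small examples. The following Python sketch is only an illustration, not part of the argument: the function name `period`, the truncation parameter `max_n` and the two toy kernels are our own choices. It computes the gcd of the return times up to a finite horizon, which in general is only a multiple of the true period but is exact for the examples shown.

```python
from math import gcd

def period(P, s, max_n=50):
    """gcd of all n <= max_n with P^n(s, s) > 0, for a finite kernel P (nested lists).

    Truncating at max_n is an assumption of this sketch: the result is a
    multiple of the true period and agrees with it once max_n is large enough.
    """
    size = len(P)
    row = [1.0 if j == s else 0.0 for j in range(size)]  # delta_s
    g = 0
    for n in range(1, max_n + 1):
        # row becomes delta_s P^n
        row = [sum(row[i] * P[i][j] for i in range(size)) for j in range(size)]
        if row[s] > 0:
            g = gcd(g, n)
    return g

# Simple random walk on a 4-cycle: returns to a state happen only at even times.
P_CYCLE = [[0, .5, 0, .5], [.5, 0, .5, 0], [0, .5, 0, .5], [.5, 0, .5, 0]]
# Lazy version: stay put with probability 1/2, which makes the chain aperiodic.
P_LAZY = [[0.5 if i == j else P_CYCLE[i][j] / 2 for j in range(4)] for i in range(4)]

period_cycle = period(P_CYCLE, 0)
period_lazy = period(P_LAZY, 0)
```

For the walk on the 4-cycle the two cyclic classes are the even and the odd states.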

A non-negative measure \(\mu \) on S is said to be stationary for the Markov kernel P if \(\mu P=\mu \). An irreducible Markov kernel P is called positive recurrent if it admits a stationary probability measure, in which case this measure is unique. If, moreover, P has period p then \(\mu =\frac{1}{p}\sum _{i=0}^{p-1} \mu _{\Omega _i}\) is a stationary measure of P, where \(\mu _{\Omega _i}\) is the unique stationary probability measure of \(P^p\) restricted to \(\Omega _i\). We also have \(\mu _{\Omega _i} P=\mu _{\Omega _{i+1}}\) \((i \mod p)\).

For an irreducible aperiodic positive recurrent Markov chain and any initial distribution \(\mu \), \(\mu P^n \) converges to the stationary probability measure as \(n\rightarrow \infty \). In case of an irreducible Markov chain that is not positive recurrent, \(\mu P^n\) converges to 0, regardless of the period.
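As a sanity check of the convergence statement, here is a minimal Python sketch on a finite toy chain. The kernel `P_LAZY3`, the helper names and the number of steps are illustrative assumptions of ours. The total variation norm is taken in the convention where two probability measures are at distance at most 2.

```python
def apply_kernel(mu, P):
    """Return mu P, where mu is a row vector and P a finite Markov kernel (nested lists)."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]

def total_variation(m1, m2):
    """Total variation norm, at most 2 for two probability measures."""
    return sum(abs(a - b) for a, b in zip(m1, m2))

def distance_to_stationary(P, mu, stationary, n_steps):
    """Return the list of ||mu P^n - stationary|| for n = 0, ..., n_steps."""
    out = []
    for _ in range(n_steps + 1):
        out.append(total_variation(mu, stationary))
        mu = apply_kernel(mu, P)
    return out

# Lazy random walk on a 3-cycle: irreducible, aperiodic, with uniform stationary measure.
P_LAZY3 = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
UNIFORM3 = [1 / 3] * 3

# Start from the point mass at state 0 and track the distance to the uniform measure.
distances = distance_to_stationary(P_LAZY3, [1.0, 0.0, 0.0], UNIFORM3, 30)
```

In this toy case the distances decay geometrically. For a chain of period 2, the same experiment would be run with \(P^2\) on a single cyclic class.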

3 Non-escape of mass

The aim of this section is to prove Theorem A. We start by associating a Markov chain with a tree lattice \(\Gamma \), study its properties and eventually link the Markov chain to the study of orbital measures in \(G/\Gamma \) of horospherical subgroups. If \(\Gamma \) is a uniform lattice, there is nothing to prove in Theorem A, so throughout the proof, \(\Gamma \) is assumed to be non-uniform.

3.1 The Markov chain

Let \(\Gamma \) be a tree lattice and \((Q,{\text {ind}})\) be the corresponding edge-indexed graph. Define the Markov chain \(M_n\) with state space EQ and transition probabilities given by

$$\begin{aligned} P(e_1,e_2) = {\left\{ \begin{array}{ll} 0 &{}\quad \text {if } \partial _1(e_1) \ne \partial _0(e_2), \\ \frac{{\text {ind}}(e_2)-1}{\deg (\partial _1(e_1))-1} &{}\quad \text {if } e_2=\overline{e_1}, \\ \frac{{\text {ind}}(e_2)}{\deg (\partial _1(e_1))-1} &{}\quad \text {otherwise. } \end{array}\right. } \end{aligned}$$

Note that by (2.1) the transition probabilities sum to 1, so that P is a Markov kernel. As the subsequent proofs will show, we are naturally led to the study of the Markov chain \(M_n\), which can simply be seen as the image under the quotient map \(\pi \) of the simple random walk on the set of edges of the tree T. We later learned that this Markov chain was considered by Burger and Mozes [9] in the study of the notion of divergence groups in \({\text {Aut}}(T)\) and by Kwon [32] in the study of mixing properties of the discrete geodesic flow.

Let us illustrate the structure of this Markov chain, as well as our subsequent use of it, in a simple but important situation, namely when \(\Gamma \) is a lattice of Nagao type.

Example 3.1

Let \(\Gamma \) be a lattice of Nagao type in \(G\le {\text {Aut}}(T)\), where T is a \((q+1)\)-regular tree (see Fig. 2 for the corresponding edge-indexed graph). In this case, the above construction of the Markov chain gives rise to a state space and transition probabilities as illustrated in Fig. 4.

Fig. 4
figure 4

Transition probabilities of \(M_n\) when \(\Gamma \) is a lattice of Nagao type (for the labeling of edges, see Fig. 2)

Consider a random trajectory of this Markov chain on its state space as depicted in the previous figure. The key phenomenon for us in this example is that once the trajectory turns toward the finite part (here, this corresponds to the edges facing left or up), it must deterministically walk all the way toward the finite part without a chance to turn around. This feature entails very strong recurrence properties which will allow us to control hitting times and, eventually, deduce convergence of the Markov chain to the stationary measure (up to issues of periodicity) even with a moving starting point. The latter property is crucial for Theorem A.
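To make Example 3.1 concrete, the following Python sketch evaluates the kernel P on the Nagao ray of a \((q+1)\)-regular tree. It is only an illustration under stated assumptions: vertices \(x_i\) are encoded by integers i, the value q = 3 is an arbitrary choice, the index of the edge leaving \(x_0\) is taken to be \(q+1\) (we assume (2.1) says that the indices of the edges emanating from a vertex sum to its degree), and the names `out_edges`, `ind` and `transition_row` are ours.

```python
from fractions import Fraction

Q_PARAM = 3  # the tree is (q+1)-regular with q = 3 (illustrative choice)

def out_edges(v):
    """Directed edges of the Nagao ray emanating from x_v, as (source, target) pairs."""
    return [(v, v + 1)] if v == 0 else [(v, v + 1), (v, v - 1)]

def ind(e, q=Q_PARAM):
    """Index of the edge e = (s, t) on the Nagao ray."""
    s, t = e
    if s == 0:
        return q + 1            # only edge at the origin (assumed from (2.1))
    return 1 if t == s + 1 else q   # 1 toward infinity, deg - 1 = q away from it

def transition_row(e1, q=Q_PARAM):
    """Row P(e1, .) of the kernel, as a dict {e2: probability}."""
    s, t = e1
    deg = q + 1                 # every vertex of the tree has degree q + 1
    row = {}
    for e2 in out_edges(t):     # only edges with origin equal to the endpoint of e1
        weight = ind(e2, q) - 1 if e2 == (t, s) else ind(e2, q)
        row[e2] = Fraction(weight, deg - 1)
    return row

row_toward_cusp = transition_row((1, 2))
row_toward_finite_part = transition_row((2, 1))
row_into_origin = transition_row((1, 0))
```

In this sketch the row of an edge oriented toward the finite part is a point mass, which is the deterministic behaviour described above, while an edge oriented toward the cusp continues outward with probability \(1/q\).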

3.1.1 Basic properties

Lemma 3.2

Let \(\Gamma \) be a tree lattice. The associated Markov chain \(M_n\) is irreducible.

Proof

Since the graph Q is connected, it is sufficient to show that for any two edges \(e,f\in EQ\), such that \(\partial _1 e=\partial _0 f\), we have \(P^n(e,f)>0\) for some \(n\ge 1\). If \(f\ne \overline{e}\), this holds for \(n=1\) by definition of P.

When \(f= \overline{e}\), we claim that there exist \(n\ge 0\) and a finite non-backtracking path of edges \((e=e_0,e_1, \ldots , e_n)\) (i.e. \(\partial _1 e_i = \partial _0 e_{i+1}\) and \(\partial _0 e_i \ne \partial _1 e_{i+1}\) for all \(0\le i<n\)) such that \({\text {ind}}(\overline{e}_n)>1\). Note that for all i, the transition probabilities \(P(e_i,e_{i+1}), P(\overline{e_{i+1}},\overline{e_i}), P(e_n,\overline{e}_n)\) are all positive, implying \(P^{2n+1}(e,\overline{e})>0\).

We show the existence of a path as above by contradiction. Suppose for any non-backtracking finite path starting at e we have \({\text {ind}}(\overline{e}_n)=1\). Such a path cannot end at a leaf, since then \({\text {ind}}(\overline{e_n})=\deg (\partial _0 \overline{e_n})=\deg (\partial _1 e_n)>2\) by (2.1). Hence, we can extend it to produce an infinite non-backtracking path with \({\text {ind}}(\overline{e_i})=1\) for all \(i\in {\mathbb {N}}\). In particular, \(N_{\partial _0(e)}(e_i) \le 1\) for all i, which contradicts the finiteness of the volume in (2.3). \(\square \)

In the case of geometrically finite lattices, we will prove positive recurrence of the associated Markov chain \(M_n\) using Foster’s drift criterion. Positive recurrence of \(M_n\) in the setting of general tree lattices, which is required in the proof of Theorems C and E, is shown in Proposition 6.4 with a slightly more elaborate proof.

Assume \(\Gamma \) is a geometrically finite tree lattice, \((Q,{\text {ind}})\) its associated edge-indexed graph and F the finite part of Q. For \(e\in EQ\), we use the notation \(|e|:=d(\partial _1(e),F)\) to indicate the distance between an edge and the finite part F. For \(e\notin F\), we say that e is oriented toward the finite part if \(d(\partial _1(e), F) < d(\partial _0(e), F)\), and oriented toward the cusp otherwise.

Lemma 3.3

Let \(\Gamma \) be a geometrically finite lattice. Then, the associated Markov chain \(M_n\) is positive recurrent.

Proof

For \(d_1,d_2 \ge 4\), one easily verifies that for any \(e\in EQ\), setting

$$\begin{aligned} V(e) = (3/2)^{|e|} \end{aligned}$$

and letting P be the Markov operator corresponding to \(M_n\), we have \(PV(e) <\infty \) for all \(e\in F\) and

$$\begin{aligned} PV(e) \le V(e) -1/8 , \quad \text { for all } e\in EQ {\setminus } F. \end{aligned}$$

In the case \(d_1=d_2=3\), a slightly different function V (which also works for the previous case) does the job:

Let

$$\begin{aligned} V(e)={\left\{ \begin{array}{ll} 0 &{}\quad \text {if } e\in F, \\ |e| &{}\quad \text {if } e \text { is oriented toward the finite part,} \\ 100(3/2)^{|e|} &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

We have

$$\begin{aligned} PV(e) \le V(e) -1 , \quad \text { for all } e\in EQ {\setminus } F. \end{aligned}$$

By Lemma 3.2, \(M_n\) is irreducible. Hence, by Foster’s drift criterion [37, Chapter 13], \(M_n\) is positive recurrent. \(\square \)

A simple combinatorial observation allows us to show that when \(\Gamma \) is geometrically finite, the period of the Markov chain \(M_n\) is two. This is expressed in the following lemma:

Lemma 3.4

If there exist two edges \(e,f\in EQ\) such that \(e\ne \overline{f}\), \(\partial _1(f)=\partial _0(e)\) and \({\text {ind}}(e),{\text {ind}}(f)>1\), then the period of \(M_n\) is 2.

When \(\Gamma \) is geometrically finite, one can simply take e, f to be two consecutive edges in a Nagao ray oriented toward the finite part so that the lemma applies.

Proof

Let \(m-1\) be the length of a path from e to \(\bar{e}\) along edges with positive transition probabilities. Since \({\text {ind}}(e)>1\), we have \(P(\overline{e},e)>0\), hence there is a loop of length m with positive transition probabilities along all edges. On the other hand, after the previous loop, one can follow the path from e to \(\overline{e}\), continue to \(\overline{f}\), then to f and finally back to e. This is a loop of length \(m+2\). Hence, the period divides both m and \(m+2\), which forces it to be 1 or 2. Finally, since the action of \(\Gamma \) on T preserves a partition of the vertices into two sets (thanks to the assumption that \({\text {Aut}}(T)\) acts without edge inversion), the period cannot be 1, proving the claim. \(\square \)

3.1.2 Hitting time of the finite part

Let F be the finite part of the graph Q. For the Markov chain \(M_n\), we denote by \(\tau \) the first hitting time of F i.e. \(\tau =\min \{n\in \mathbb {N} \, | \, \partial _1(M_n) \in F \}\). By positive recurrence, \(\tau \) is finite almost surely. To deal with periodicity, define \(\tau ':= \min \{ n\in {\mathbb {N}}~|~ n \ge \tau , 2|n \}\).

We start with a lemma that controls the probabilities of long hitting times of the finite part.

Lemma 3.5

Assume \(\Gamma \) is geometrically finite. Let \(\tau \) be the hitting time defined above. Then for any \(e\in EQ\)

$$\begin{aligned} \mathbb {P}_e ( \tau = i ) \le {\left\{ \begin{array}{ll} q^{\lceil \frac{|e|-i}{2} \rceil }&{}\quad \text {if } i\ge |e|,\\ 0 &{}\quad \text {otherwise,} \end{array}\right. } \end{aligned}$$

where \(q+1=\min \{d_1 ,d_2\} \). The same bound is true for \(\tau '\) up to a multiplicative constant \(C=C(d_1,d_2)\).

Proof

Clearly, the random walk starting at e can never hit F in fewer than |e| steps. Similarly, when \(|e|=0\), the claim is obvious. When \(|e|>0\), by definition of geometric finiteness, \(\partial _1(e)\) belongs to some Nagao ray. Because of the structure of Nagao rays (see Example 3.1), a Markov trajectory starting at an edge oriented toward the finite part F must necessarily take its next step toward F. This can only change once the trajectory visits F. Hence, if e is oriented toward the finite part, we deduce that \(\mathbb {P}_e(\tau =|e|)=1\), matching the upper bound in the statement.

On the other hand, if e is oriented toward the cusp, in order to avoid visiting F in the first \(i-1\) steps, the walk must take at least \(\lfloor \frac{i- \vert e\vert }{2} \rfloor \) steps toward the cusp, each with probability at most \(q^{-1}\). This gives the bound in the lemma. \(\square \)
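The bound of Lemma 3.5 can be checked by hand in the Nagao case. The Python sketch below computes the exact law of \(\tau \) on the Nagao ray of a \((q+1)\)-regular tree, with finite part \(F=\{x_0\}\), and compares it with the bound. The encoding of edges as integer pairs, the transition rule read off from Example 3.1, and the names `step`, `hitting_time_law` and `lemma_bound` are assumptions of this illustration.

```python
from fractions import Fraction
from math import ceil

def step(dist, q):
    """One step of the chain on the Nagao ray; states are directed edges (s, t)."""
    new = {}
    for (s, t), p in dist.items():
        if t == s + 1:    # oriented toward the cusp: continue w.p. 1/q, else turn back
            moves = [((t, t + 1), Fraction(1, q)), ((t, s), Fraction(q - 1, q))]
        elif t == 0:      # arrived at x_0: the only edge available is (x_0, x_1)
            moves = [((0, 1), Fraction(1))]
        else:             # oriented toward the finite part: forced step toward x_0
            moves = [((t, t - 1), Fraction(1))]
        for e, w in moves:
            new[e] = new.get(e, 0) + p * w
    return new

def hitting_time_law(e, q, horizon):
    """Return {i: P_e(tau = i)} for i <= horizon, tau = first n with endpoint of M_n at x_0."""
    law, alive = {}, {e: Fraction(1)}
    for n in range(horizon + 1):
        hit = sum(p for (s, t), p in alive.items() if t == 0)
        if hit:
            law[n] = hit
        # keep only trajectories that have not yet reached x_0, then move one step
        alive = step({st: p for st, p in alive.items() if st[1] != 0}, q)
    return law

def lemma_bound(e_norm, i, q):
    """Right-hand side of Lemma 3.5: q^ceil((|e| - i)/2) if i >= |e|, and 0 otherwise."""
    return Fraction(q) ** ceil((e_norm - i) / 2) if i >= e_norm else Fraction(0)

# Edge (x_2, x_3) is oriented toward the cusp and has |e| = 3; we take q = 3.
law_cusp = hitting_time_law((2, 3), 3, 9)
# Edge (x_3, x_2) is oriented toward the finite part and has |e| = 2.
law_finite = hitting_time_law((3, 2), 3, 10)
```

In this example \(\tau \) takes only the values \(|e|+2j\), with probability \(q^{-j}(q-1)/q\), which is consistent with the stated bound \(q^{-j}\).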

3.1.3 Convergence of the Markov chain with varying initial distribution

As before, let P be the Markov operator corresponding to \(M_n\). Let \(\Omega _{0},\Omega _{1} \subseteq EQ\) be its cyclic classes and for \(j=0,1\), denote by \(\mu _{\Omega _j}\) the unique \(P^2\) stationary probability measure on \(\Omega _j\).

The next lemma describes the convergence of the Markov chain with moving initial distributions. The condition on the initial distributions will be clear later on, as this convergence will play a crucial role in the proof of Theorem A.

Lemma 3.6

Let \(\Omega \) be a cyclic class of P and \(e(t) \subset \Omega \) be a sequence of edges in the same cyclic class, such that \(t-|e(t)| \rightarrow \infty \). Let n(t) be such that \(|t-2n(t)|\) is constant so that \(\delta _{e(t)}P^{2n(t)}\) is supported in \(\Omega \). Then,

$$\begin{aligned} \Vert \delta _{e(t)} P^{2n(t)} - \mu _\Omega \Vert \longrightarrow 0, \end{aligned}$$

where \(\Vert \cdot \Vert \) denotes the total variation norm (see e.g. [37, § D.1.2]).

In the proof, we control the distributions with non-constant starting points e(t) by studying the behaviour of the Markov chain conditioned on the hitting time of the finite part. This, together with the precise control on the hitting time as provided by Lemma 3.5, allows us to prove the required convergence.

Proof

By conditioning the Markov chain on the hitting time \(\tau '\) (as defined in Sect. 3.1.2), we have

$$\begin{aligned} \Vert \delta _{e(t)}P^{2n(t)}-\mu _\Omega \Vert \le \sum _{\overset{i=0}{2|i}}^\infty \mathbb {P}_{e(t)}(\tau '=i) \Vert \mathbb {P}_{e(t)}(\delta _{e(t)}P^{2n(t)} \in \cdot | \tau '=i)-\mu _\Omega \Vert . \end{aligned}$$
(3.1)

Here, for every \(i \in 2\mathbb {N}\) with \(\mathbb {P}_{e(t)}(\tau '=i)>0\), \(\mathbb {P}_{e(t)}(\delta _{e(t)}P^{2n(t)} \in \cdot | \tau '=i)\) denotes the probability measure on EQ given by

$$\begin{aligned} e \mapsto \frac{\mathbb {P}_{e(t)}(M_{2n(t)}=e \, \, \text {and} \, \, \tau '=i)}{\mathbb {P}_{e(t)}(\tau '=i)}. \end{aligned}$$

It follows from the strong Markov property that for \(2n(t)\ge i\), we have

$$\begin{aligned} \Vert \mathbb {P}_{e(t)}(\delta _{e(t)}P^{2n(t)} \in \cdot | \tau '=i)- \mu _\Omega \Vert \le \max _{e\in B(F,2)} \Vert \delta _e P^{2n(t)-i} - \mu _\Omega \Vert , \end{aligned}$$
(3.2)

where B(F, 2) is the set of all edges at distance \(\le 2 \) from F.

For \(i \in 2\mathbb {N}\), denote

$$\begin{aligned} A_i := \max _{e\in B(F,2)} \Vert \delta _e P^{i} - \mu _\Omega \Vert \qquad \text {and} \qquad B_i^{(t)}:= \mathbb {P}_{e(t)}(\tau '=i) \end{aligned}$$

For odd \(i \in \mathbb {N}\), we set \(A_i=B_i^{(t)}=0\). We have the following:

  1. \(\Vert m_1-m_2\Vert \le 2\) for any probability measures \(m_1,m_2\) on EQ.
  2. For each t, \(\sum _i B_i^{(t)}=1\).
  3. \(B_i^{(t)} = 0\) for \(i< |e(t)|-2\).
  4. \(B_i^{(t)} \le Cq^{\frac{|e(t)|-i}{2}}\) for \(i\ge |e(t)|\).
  5. \(A_i\rightarrow 0\) as \(i\rightarrow \infty \).

Indeed, (1) and (2) are trivial, and (3) and (4) are proved in Lemma 3.5. Finally, (5) holds since \(P^2\) restricted to \(\Omega \) is positive recurrent (Proposition 6.4), irreducible (Lemma 3.2) and aperiodic (Lemma 3.4).

With this notation, splitting the right-hand side of (3.1) into three sums, we get that the left-hand side of (3.1) is bounded above by

$$\begin{aligned} \sum _{i< |e(t)|-2} A_{2n(t)-i} B_i^{(t)} + \sum _{|e(t)|-2 \le i\le 2n(t)} A_{2n(t)-i} B_i^{(t)}+ \sum _{i>2n(t)} 2 B_i^{(t)}, \end{aligned}$$
(3.3)

where we used (3.2) for the first two sums, and (1) for the third. We need to show that the above tends to 0 as \(t\rightarrow \infty \).

By (3), the first sum is identically 0 and as \(t \rightarrow \infty \), the third sum tends to 0 by (4) and the fact that \(2n(t)-|e(t)|\) tends to \(\infty \).

We focus on the middle sum of (3.3), which after denoting \(N_t=2n(t)-|e(t)|\), we rewrite as follows

$$\begin{aligned} \sum _{i=|e(t)|-2}^{2n(t)} A_{2n(t)-i} B_i^{(t)}= & {} \sum _{i=0}^{2n(t)-|e(t)|+2} A_i B_{2n(t)-i}^{(t)} = \sum _{i=0}^{N_t+2} A_i B_{N_t+|e(t)|-i}^{(t)}\nonumber \\\le & {} \sum _{i=0}^{\lceil N_t/2 \rceil } A_i B_{N_t+|e(t)|-i}^{(t)} + \sum _{i= \lceil N_t/2 \rceil }^{N_t+2} A_i B_{N_t+|e(t)|-i}^{(t)} \end{aligned}$$
(3.4)

By (2), the second sum in (3.4) is bounded from above by \( \displaystyle \sup _{i \ge N_t/2} \{ A_i\}\). As \(t\rightarrow \infty \), \(N_t\) goes to \(\infty \) so that using (5), \(\displaystyle \sup _{i \ge N_t/2} \{ A_i\}\) converges to 0 showing that the second sum in (3.4) converges to 0.

On the other hand, by (1), the first sum in (3.4) is bounded above by

$$\begin{aligned} 2 \sum _{i=0}^{\lceil N_t/2 \rceil } B_{N_t+|e(t)|-i}^{(t)} \le 2 \sum _{i=\lfloor N_t/2 \rfloor +|e(t)|}^{N_t+|e(t)|} B_i^{(t)}, \end{aligned}$$

which converges to 0 as \(t\rightarrow \infty \) by (4). This concludes the proof. \(\square \)

3.2 Proof of Theorem A

We now link the Markov chain to the study of orbital measures of horospherical orbits and use the properties of \(M_n\) to prove Theorem A. Before starting the proof, we remark that it suffices to prove the result for the Følner sequence \(F_t\). Indeed, let O be an M-invariant compact subset with non-empty interior in \(G^0_\eta \), let \(a \in G\) be a hyperbolic element with attracting fixed point \(\eta \) and of (minimal) translation distance 2, and let \(O_{t}=a^t O a^{-t}\) be the associated good Følner sequence. It follows by compactness of \(F_0\) and O that for some \(n_0 \in \mathbb {N}\) and every \(t \in \mathbb {N}\), we have

$$\begin{aligned} O_{t-n_0}\subseteq F_{2t} \subseteq O_{t+n_0}. \end{aligned}$$
(3.5)

As a consequence, there exists \(c \in (0,1)\) such that for every \(t \in \mathbb {N}\), the sequence \(F_{2t}=a^t F_0 a^{-t}\) satisfies

$$\begin{aligned} 0<c \le \frac{m_{G^0_\eta }(F_{2t})}{m_{G^0_\eta }(O_{t+n_0})} \le \frac{m_{G^0_\eta }(F_{2t})}{m_{G^0_\eta }(O_{t-n_0})} \le \frac{1}{c} < \infty \end{aligned}$$
(3.6)

One easily sees from these inequalities that the orbital measures \(\nu _{x,t}\) associated to \(F_{t}\) have non-escape of mass if and only if those associated to \(O_t\) have it.

3.2.1 Reduction to measures on the tree

For the rest of the section we fix \(x = g\Gamma \in X\) with non-compact \(G^0_\eta \)-orbit.

Recall that for \(t \in \mathbb {N}\), \(\nu _{x,t}\) denotes the probability measure on the orbit \(F_t x\) obtained by pushforward of the Haar probability measure on \(F_t\) under the orbit map \(u\mapsto ux\) for \(u\in F_t\).

Denote by \(\sigma _t\) the uniform probability measure on the finite set \(g^{-1}F_t \tilde{o} \subset VT\).

The following observation is the first step in reducing the proof of recurrence of horospherical orbits to studying recurrence properties of the Markov chain \(M_n\) introduced earlier.

Lemma 3.7

For every \(t \in \mathbb {N}^*\), we have

$$\begin{aligned} {\text {proj}}_* \nu _{x,t} = \pi _* \sigma _t . \end{aligned}$$
(3.7)

Proof

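Recall that \(x\in G/\Gamma \) is fixed and \(g\in G\) is such that \(x=g\Gamma \). Consider the map \(f: G^0_\eta \rightarrow T\) given by \(f(u)=g^{-1}u^{-1}\tilde{o}\). Denote by \(O: u \mapsto ug\Gamma \) the orbit map. Then the maps \({\text {proj}}\circ O\) and \(\pi \circ f\) from \(G^0_\eta \) to VQ clearly coincide:

$$\begin{aligned} {\text {proj}}(O(u)) = {\text {proj}}(ug\Gamma ) = \pi (g^{-1}u^{-1}\tilde{o}) = \pi (f(u)) \quad \text {for all } u \in G^0_\eta . \end{aligned}$$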

By definition, \(O_* m_{F_t}=\nu _{x,t}\) and hence it is enough to see that \(f_* m_{F_t}=\sigma _t\). This is readily verified and we are done. \(\square \)

3.2.2 Further reduction to shadows and the Markov chain

Above, we related the orbital measures \(\nu _{x,t}\) to the measures \(\sigma _t\), which are distributions on VT. The next lemmas will link \(\sigma _t\) to the distributions of the Markov chain.

For \(v \in VT\) and \(n \in \mathbb {N}\), denote by S(v, n) the set of vertices of T at distance \(n \ge 0\) from v. For w a neighbor of v, let \(S_w(v,n)\) be the subset of S(v, n) consisting of vertices \(z \in VT\) such that \(d(z,w)<d(z,v)\). Thinking of v as a light source at the center of the sphere, we call \(S_w(v,n)\) the shadow of w (see Fig. 5 for illustration). Denote by \(\lambda _{(v,w),n}\) the uniform probability measure on the shadow \(S_w(v,n)\).

Fig. 5
figure 5

Shadow of w: all nodes are vertices on the sphere S(v, 3), the shadow \(S_w(v,3)\) consists of solid nodes
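Spheres and shadows are easy to enumerate explicitly. The Python sketch below does so in a biregular tree rooted at v. It relies on a coding that is our own assumption: a vertex at distance n from v is recorded as the tuple of child indices along the geodesic from v, and the neighbours of v are numbered by their first index. The function names and the (3, 10) example are illustrative.

```python
from itertools import product

def sphere(n, d_root, d_other):
    """Vertices at distance n from the root v of a (d_root, d_other)-biregular tree.

    The root has degree d_root; a vertex at depth k has degree d_other if k is
    odd and d_root if k is even, hence deg - 1 children when k >= 1.
    """
    if n == 0:
        return [()]
    ranges = [range(d_root)]                      # choice of a neighbour of v
    for k in range(1, n):
        deg = d_other if k % 2 == 1 else d_root
        ranges.append(range(deg - 1))             # choice of a child at depth k
    return list(product(*ranges))

def shadow(w, n, d_root, d_other):
    """Shadow S_w(v, n): vertices of the sphere whose geodesic from v passes through w."""
    if n == 0:
        return []
    return [z for z in sphere(n, d_root, d_other) if z[0] == w]

# Sphere of radius 3 around a vertex of degree 3 in a (3, 10)-biregular tree.
S = sphere(3, 3, 10)
shadows = [shadow(w, 3, 3, 10) for w in range(3)]
```

In this coding the shadows of the neighbours of v partition the sphere into pieces of equal cardinality.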

Lemma 3.8

Let G be a non-compact closed subgroup of \({\text {Aut}}(T)\) that acts transitively on \(\partial T\). For any \(t \in \mathbb {N}^*\), we have

$$\begin{aligned} \sigma _t= \frac{1}{\deg (g^{-1}y_t)-1} \sum _{i_t=1}^{\deg (g^{-1}y_t)-1} \lambda _{(g^{-1}y_t,\tilde{v}_{i_t}),t}=\lambda _{(g^{-1}y_{t+1},g^{-1}y_t),t+1} , \end{aligned}$$

where \(\{\tilde{v}_{i_t}\}\) is the collection of vertices in T neighboring \(g^{-1}y_t\) except \(g^{-1}y_{t+1}\).

Proof

The last equality directly results from the definition of the probability measure \(\lambda _{(v,w),t}\), therefore we focus on the first equality. Since all the shadows involved have the same cardinality, using the definitions of \(\sigma _t\) and \(\lambda _{(v,w),t}\), the equality will follow if we show

$$\begin{aligned} g^{-1}F_t \tilde{o} = \bigsqcup _{i_t=1}^{\deg (g^{-1}y_t)-1} S_{\tilde{v}_{i_t}}(g^{-1}y_t,t). \end{aligned}$$

In other words, the set \(g^{-1}F_t\tilde{o}\) is the set of vertices on the sphere of radius t around \(g^{-1}y_t\) except the shadow of \(g^{-1}y_{t+1}\).

Since g acts by isometry, it is enough to show this for \(g={\text {id}}\). We clearly have

$$\begin{aligned} F_t \tilde{o} \quad \subset \bigsqcup _{\begin{array}{c} (y_t,w)\in ET \\ w\ne y_{t+1} \end{array}} S_{w}(y_t,t). \end{aligned}$$

To show the other inclusion, let \(\xi _1,\xi _2 \in \partial T\) be such that \((\xi _i, \eta ) \cap [y_0, \eta ) \supset [y_t, \eta )\) for \(i=1,2\). It clearly suffices to show that there exists a sequence \(h_n \in F_t\) with \(h_n \xi _1 \rightarrow \xi _2\) as \(n \rightarrow \infty \). To see this, note that since G is non-compact, closed and transitive on \(\partial T\), by [10, Lemma 3.1.1] it acts doubly transitively on \(\partial T\). Furthermore, since it is non-compact, it contains a hyperbolic element a that, thanks to double transitivity, we can suppose to have attracting point \(\eta \) and repelling point \(\xi _1\) on \(\partial T\). Similarly, up to conjugating a, let b be a hyperbolic element with attracting fixed point \(\eta \) and repelling fixed point \(\xi _2\). The sequence \(h_n=b^{-n}a^n\) does the job and this concludes the proof. \(\square \)

Let

$$\begin{aligned} D:= \frac{1}{\deg (\tilde{o})} \sum _{(\tilde{o},\tilde{w})\in ET} \delta _{(o,\pi (\tilde{w}))}. \end{aligned}$$
(3.8)

where \(\deg (\tilde{o})\) denotes the valency of the vertex \(\tilde{o}\). Denote by \(\rho _n\) the uniform measure on the sphere \(S(\tilde{o},n)\). In the following lemma, we realize the probability measures \(\pi _*\lambda _{(v,w),n}\) and \(\pi _* \rho _n\) as the \(n^{th}\)-step distribution of our Markov chain with appropriate initial distributions. The fact that such a relation exists is not surprising, as the Markov chain \(M_n\) is obtained as a quotient of the simple random walk on the edges of the tree T.

Lemma 3.9

Let \(\tilde{v} \in VT\) and \(\tilde{w} \in VT\) be a neighbour of \(\tilde{v}\). Denote \(\pi (\tilde{v})=v\), \(\pi (\tilde{w})=w\). Then for any \(n\ge 0\)

$$\begin{aligned} \pi _*\lambda _{(\tilde{v},\tilde{w}),n+1} = \partial _1{}_*(\delta _{(v,w)}P^n). \end{aligned}$$
(3.9)

and

$$\begin{aligned} \pi _* \rho _{n+1} = \partial _{1*} (D P^n). \end{aligned}$$
(3.10)

Proof

Define a Markov chain \(L_n\) with state space ET and transition probabilities

$$\begin{aligned} l(e_1,e_2)= {\left\{ \begin{array}{ll} \frac{1}{\deg (\partial _1(e_1))-1} &{}\quad \text {if } \partial _1(e_1)=\partial _0(e_2) \text { and } e_2\ne \overline{e_1},\\ 0 &{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$

This chain describes the unbiased non-backtracking random walk on the (directed) edges of the tree. For all \(n\ge 0\), one clearly has

$$\begin{aligned} \lambda _{(\tilde{v},\tilde{w}),n+1} = \partial _{1*} (\delta _{(\tilde{v},\tilde{w})} L_n). \end{aligned}$$

On the other hand, observe that \(\pi _* \delta _{(\tilde{v},\tilde{w})} L_n = \delta _{(v,w)}M_n\), and since \(\pi \) commutes with \(\partial _1\), (3.9) follows.

To see the second claim, note that by construction of the Markov chain \(L_n\), we have

$$\begin{aligned} \rho _{n+1}= \frac{1}{\deg (\tilde{o})}\sum _{(\tilde{o},\tilde{w})\in ET} \delta _{(\tilde{o},\tilde{w})}L_n. \end{aligned}$$

Applying \(\pi _*\) to both sides yields (3.10). \(\square \)
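The identity \(\lambda _{(\tilde{v},\tilde{w}),n+1} = \partial _{1*} (\delta _{(\tilde{v},\tilde{w})} L_n)\) used in this proof can be verified directly on small cases. The following Python sketch computes the exact law of the endpoint of the non-backtracking walk started on an edge \((\tilde{v},\tilde{w})\). The tuple coding of vertices (child indices along the geodesic from \(\tilde{v}\)) and the function names are assumptions of this illustration.

```python
from fractions import Fraction

def children(z, d_root, d_other):
    """Neighbours of z other than its parent, in a biregular tree rooted at v = ()."""
    depth = len(z)
    deg = d_root if depth % 2 == 0 else d_other
    count = deg if depth == 0 else deg - 1   # the root has no parent to exclude
    return [z + (c,) for c in range(count)]

def nonbacktracking_endpoint_law(w, n, d_root, d_other):
    """Law of the endpoint after n non-backtracking steps started on the edge (v, w).

    Here w is a neighbour of the root, coded as a 1-tuple such as (0,). Each
    step moves to one of the deg - 1 admissible neighbours with equal probability.
    """
    law = {w: Fraction(1)}
    for _ in range(n):
        new = {}
        for z, p in law.items():
            kids = children(z, d_root, d_other)
            for k in kids:
                new[k] = new.get(k, 0) + p / len(kids)
        law = new
    return law

# (3, 10)-biregular tree, root of degree 3, two steps from the edge (v, w) with w = (0,).
law = nonbacktracking_endpoint_law((0,), 2, 3, 10)
```

In this example the resulting law charges exactly the vertices of the shadow \(S_{\tilde{w}}(\tilde{v},3)\), each with the same mass.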

Remark 3.10

We remark here that the statements of Lemmas 3.7, 3.8 and 3.9 hold more generally for any lattice \(\Gamma \) of \({\text {Aut}}(T)\). Indeed, the proofs do not make use of the particular structure of a geometrically finite lattice.

Proposition 3.11

Let \(x=g\Gamma \) with non-compact \(G_\eta ^0\)-orbit. Then the set of weak-\(*\) limit points of \( \pi _* \sigma _t\) is \(\{\partial _{1*} \mu _{\Omega _0},\partial _{1*} \mu _{\Omega _1}\}\), where \(\Omega _0,\Omega _1\) are the two cyclic classes of P and for \(i=0,1\), \(\mu _{\Omega _i}\) is the unique \(P^2\)-stationary probability measure on \(\Omega _i\) as before.

Remark 3.12

In this proposition, the measures \(\pi _*\sigma _t\) depend on the point \(x=g\Gamma \), but the set of limit points of \(\pi _*\sigma _t\) does not.

Proof

Combining Lemmas 3.8 and 3.9 and denoting \(v_{i_t}:=\pi (\tilde{v}_{i_t})\), we have for any \(t \in \mathbb {N}^*\)

$$\begin{aligned} \pi _* \sigma _t =\frac{1}{\deg (g^{-1}y_t)-1}\sum _{i_t=1}^{\deg (g^{-1}y_t)-1} \partial _1 {}_*(\delta _{(\pi (g^{-1}y_t),v_{i_t})}P^{t-1}). \end{aligned}$$
(3.11)

For a fixed \(t \in \mathbb {N}\), the edges \((\pi (g^{-1}y_t),v_{i_t})\) belong to the same cyclic class, which we denote by \(\Omega _{j(t)}\). Up to passing to a subsequence (i.e. considering even or odd t’s), which we also denote by t, we may assume that j(t) is constant. For each t, choose one vertex \(v_t \in \{ v_{i_t} \}\) and denote the edge

$$\begin{aligned} e(t)= (\pi (g^{-1}y_t),v_t). \end{aligned}$$
(3.12)

Up to passing to a further subsequence of t’s, we may suppose that \(\delta _{e(t)}P^{t-1}\) is supported in a single cyclic class. Therefore, for some \(r \in \{0,1\}\), every t in this sequence can be written as \(t-1=2n(t)+r\), where \(n(t) \in \mathbb {N}\). Thus we can write

$$\begin{aligned} \delta _{e(t)}P^{t-1}=\delta _{e(t)}P^{2n(t)}P^r. \end{aligned}$$
(3.13)

Now, since by Proposition 2.1, we have \(t-|e(t)|\rightarrow \infty \), Lemma 3.6 applies and we deduce that \(\Vert \delta _{e(t)}P^{2n(t)} -\mu _{\Omega _j}\Vert \rightarrow 0\) as \(t \rightarrow \infty \) for some \(j \in \{0,1\}\). Therefore, we have

$$\begin{aligned} \Vert \delta _{e(t)}P^{t-1}-\mu _{\Omega _{i}}\Vert \rightarrow 0 \text { as } t \rightarrow \infty . \end{aligned}$$

where \(i=j+r \; (\text {mod}\, 2)\). This finishes the proof. \(\square \)

3.2.3 Proof of Theorem A

With the notation of this section, we want to show that for any \(\epsilon >0\), there exists a compact set \(K\subset G/\Gamma \) such that for every \(x\in X\) with non-compact \(G^0_\eta \)-orbit, there exists \(N \in \mathbb {N}\) such that for all \(t>N\)

$$\begin{aligned} \nu _{x,t}(K) \ge 1-\epsilon . \end{aligned}$$
(3.14)

Let \(L\subset EQ\) be a finite set such that \(\mu _{\Omega _j}(L)>1-\epsilon \) for \(j\in \{0,1\}\). The set \(K={\text {proj}}^{-1}(\partial _1(L))\) is compact, since the map \({\text {proj}}: G/ \Gamma \rightarrow VQ\) has compact fibers. Using Lemma 3.7 and Proposition 3.11, we have for any \(x\in X\) with non-compact \(G^0_\eta \)-orbit

$$\begin{aligned} \liminf _{t \rightarrow \infty } \nu _{x,t}(K) = \liminf _{t \rightarrow \infty } \pi _* \sigma _t(\partial _1(L)) \ge \min \{\mu _{\Omega _0}(L),\mu _{\Omega _1}(L)\} > 1-\epsilon , \end{aligned}$$

and the claim follows.

4 Equidistribution

This section is devoted to the proof of Theorem B which we deduce from Theorem A and our previous work [13].

Fix a hyperbolic element \(a \in G\) of translation length 2 with attracting fixed point \(\eta \). Denote by \(\eta _- \in \partial T\) the repelling point of a and set \(M=G^0_\eta \cap G^0_{\eta _-}\). Let O be an M-invariant compact subset of \(G_\eta ^0\) with non-empty interior. Let \(O_t=a^t O a^{-t}\) be the associated good Følner sequence for \(G_\eta ^0\). As before, for \(x \in X\), denote by \(\nu _{x,t}\) the orbital measure \( m_{O_t}* \delta _x\).

Let \(x\in X\) be such that the \(G^0_\eta \)-orbit of x is not compact. By Theorem A, up to passing to a subsequence, we can suppose that

$$\begin{aligned} \nu _{x,t} \longrightarrow m \end{aligned}$$
(4.1)

for the weak-\(*\) topology and where m is a Borel probability measure on X. Furthermore, since \(O_t\) is a Følner sequence, m is \(G^0_\eta \)-invariant. We need to show that \(m=m_X\).

Recall that by [13, Theorem 1.6], there exist countably many closed \(G^0_\eta \)-orbits in X. These are all compact and for each cusp of \(\Gamma \), there exists precisely one discrete one-parameter family of compact orbits. Denote by \(k\in \mathbb {N}\) the number of cusps of \(\Gamma \) and let \(C_{i,j}\) be the collection of compact \(G^0_\eta \)-orbits, where \(i=1,\ldots , k\) and \(j \in \mathbb {Z}\). By the same result, we have \(a C_{i,j}=C_{i,j+1}\) and \(a^{-\ell }C_{i,j}\) escapes to infinity as \(\ell \rightarrow \infty \) in the sense that for any compact set K, we have \(K \cap a^{-\ell }C_{i,j}= \emptyset \) for every \(\ell \) large enough (see e.g. proof of [13, Lemma 6.2]).

We first prove that \(m(C_{i,j})=0\) for every \(i=1,\ldots ,k\) and \(j \in \mathbb {Z}\). For a contradiction, suppose \(m(C_{i_0,j_0})>0\) for some \(i_0,j_0\). Denote \(\frac{1}{2}m(C_{i_0,j_0})=:\epsilon >0\) and let \(K=K(\epsilon )\) be the compact subset of X given by Theorem A. It follows by the latter result that we have

$$\begin{aligned} m(K) \ge 1-\epsilon . \end{aligned}$$
(4.2)

Choose an \(\ell \in \mathbb {N}\) large enough so that \(a^{-\ell }C_{i_0,j_0} \cap K =\emptyset \). Since \(a^{-\ell }x\) does not lie on a compact \(G^0_\eta \)-orbit either, using Theorem A, by passing to a further subsequence in (4.1), we can suppose that \(\nu _{a^{-\ell }x,t}\) also converges to a \(G^0_\eta \)-invariant probability measure that we denote by \(m^{a^{-\ell }}\). As in (4.2), by Theorem A, we have \(m^{a^{-\ell }}(K)\ge 1-\epsilon .\)

Using the relation \(a^{-\ell }O_t a^{\ell }=O_{t-\ell }\), one verifies by a simple calculation that we have \(m^{a^{-\ell }}=a^{-\ell }_*m\). Using this, we deduce

$$\begin{aligned} 2\epsilon = m(C_{i_{0},j_0})=m^{a^{-\ell }}(a^{-\ell }C_{i_{0},j_0}) \le m^{a^{-\ell }}(X {\setminus } K) \le \frac{1}{2}m(C_{i_0,j_0})=\epsilon , \end{aligned}$$

a contradiction. Therefore, \(m(C_{i,j})=0\) for all \(i=1,\ldots ,k\) and \(j \in \mathbb {Z}\).

We mention that at this point, one could conclude the proof by appealing to the classification of ergodic \(G_\eta ^0\)-invariant Borel probability measures [13, Theorem 1.1]. However, that result has extra hypotheses on G, namely Tits independence property and a certain transitivity condition. On the other hand, for a geometrically finite lattice \(\Gamma \), it is possible to give a similar classification of ergodic \(G_\eta ^0\)-invariant Borel probability measures on \(G/\Gamma \) for a more general group G as in Theorem B. We single this out in the next proposition which is essentially contained in [13].

Proposition 4.1

Let T be a \((d_1,d_2)\)-biregular tree, with \(d_1,d_2\ge 3\), and G a non-compact, closed and topologically simple subgroup of \({\text {Aut}}(T)\) acting transitively on \(\partial T\). Let \(\Gamma \) be a geometrically finite lattice in G and \(\eta \in \partial T\). Then, any \(G^0_\eta \)-invariant and ergodic Borel probability measure on \(X=G/\Gamma \) is either \(G^0_\eta \)-homogeneous and compactly supported, or it is the Haar measure \(m_X\).

To finish the proof of Theorem B, consider an ergodic decomposition of the \(G^0_\eta \)-invariant probability measure m. Since there are countably many closed \(G^0_\eta \)-orbits and each of them has zero measure with respect to m, the same holds for almost every ergodic component of m. Therefore by Proposition 4.1 almost every ergodic component of m is the Haar measure \(m_X\), hence \(m=m_X\) completing the proof of Theorem B. \(\square \)

Proof of Proposition 4.1

We use the same notation introduced in the beginning of the proof of Theorem B, namely a is a hyperbolic element with attracting fixed point \(\eta \in \partial T\), and the group M and the good Følner sequence \(O_t\) are as defined there. Let \(m_0\) be a \(G^0_\eta \)-invariant and ergodic probability measure on X. If \(m_0\) gives positive mass to a compact \(G^0_\eta \)-orbit, then by ergodicity, it must be the homogeneous measure supported on that orbit. So let us suppose that \(m_0\) gives zero mass to each compact \(G^0_\eta \)-orbit. By the pointwise ergodic theorem for amenable groups [34, Theorem 1.2], there exists a point \(y \in X\) that is generic with respect to \(m_0\) and the tempered Følner sequence \(O_t\) (see [13, § 2.3]). Since the countably many compact \(G^0_\eta \)-orbits have zero \(m_0\)-measure, we can choose y outside of them, so that by [13, Theorem 1.6], the \(G^0_\eta \)-orbit of y is dense in X. Then, by [13, Lemma 6.2], there exist a compact set \(K'\) in X and a sequence of integers \(n_k \rightarrow \infty \) such that \(a^{-n_k}y \in K'\) for every \(k \in \mathbb {N}\). For a function \(\theta \in C_c(X)\), denote \(\hat{\theta }(z):=\int \theta (mz)dm_M(m)\) where \(m_M\) is the Haar probability measure on M. The function \(\hat{\theta }\) is clearly M-invariant. Since G is closed, transitive and topologically simple, it has the Howe–Moore property [11, Proposition 4.2] and in particular the action of the hyperbolic element a on X is mixing. Therefore we can apply [13, Lemma 6.3], where we can take \(O^+\) to be \(O_{t_0}\) for some \(t_0 \in \mathbb {Z}\) small enough, and deduce that for every \(\theta \in C_c(X)\), we have

$$\begin{aligned} \frac{1}{m_{G^0_\eta }(O_{t_0+n_k})} \int _{O_{t_0+n_k}} \hat{\theta }(uy) dm_{G^0_\eta }(u) \underset{k \rightarrow \infty }{\longrightarrow } \int \hat{\theta }(z) dm_X(z). \end{aligned}$$
(4.3)

On the other hand by choice of \(y \in X\), the left-hand-side above also converges to \(\int \hat{\theta }(z) dm_0(z)\). It follows

$$\begin{aligned} \int \int \theta (mz) dm_M(m) dm_0(z)=\int \int \theta (mz) dm_M(m) dm_X(z). \end{aligned}$$

But since \(m_0\) and \(m_X\) are \(G^0_\eta \)-invariant, by Fubini’s theorem, it follows that

$$\begin{aligned} \int \theta (z) dm_0(z)=\int \theta (z) dm_X(z) \end{aligned}$$

in other words, \(m_0=m_X\) as required. \(\square \)

5 Escape of mass phenomenon

This section contains the proofs of Theorems C and D.

We start by proving an escape of mass result that implies Theorem C. Regarding the construction of a lattice \(\Gamma <{\text {Aut}}(T)\) that figures in the following result, we note that by [3, § 4.11, Example 1], for every \(q \ge 2\), there exists a lattice \(\Gamma \le G = {\text {Aut}}(T_{2q+2})\) whose associated edge-indexed graph is as in Fig. 6. Clearly, this \(\Gamma \) is not geometrically finite.

Fig. 6
An edge-indexed graph \((Q,{\text {ind}})\) of a non-geometrically finite lattice \(\Gamma \le {\text {Aut}}(T_{2q+2})\)

Theorem 5.1

Let \(\Gamma \) be a tree lattice with associated edge-indexed graph \((Q, {\text {ind}})\) as in Fig. 6. Let \(x=e\Gamma \in X = G/\Gamma \) be the trivial coset. Let \(\xi \in \partial T\) be the end corresponding to the sequence \(\{\tilde{x}_i\}_{i \in {\mathbb {N}}}\) for some lifts of \(x_i\) and \(G_\xi ^0\) be the corresponding horospherical subgroup. Then for any compact \(K\subset X\)

$$\begin{aligned} \lim _{t\rightarrow \infty } \nu _{x,t}(K) = 0 . \end{aligned}$$

Proof

By (3.6), it clearly suffices to prove the statement for the orbital measures \(\nu _{x,t}\) associated to the Følner sequence \(F_t\). Let \(\tilde{o}\) be a lift of the left-most vertex \(x_0\) to \(T=T_{2q+2}\). By Lemma 3.7 and the fact that \({\text {proj}}\) has compact fibers, it is enough to show that if \(\sigma _t\) is the uniform measure on \(F_t \tilde{o}\subset VT\), then for any \(x_l\in VQ\), we have \(\pi _*\sigma _t (x_l) \rightarrow 0\).

The set \(F_t\tilde{o}\) can be identified with the set of all non-backtracking paths in T of length t that start at \(\tilde{x}_{t}\) and do not contain \(\tilde{x}_{t+1}\). A path from \(\tilde{x}_t\) to a vertex \(y\in F_t\tilde{o}\) with \(\pi (y)=x_l\) projects to a path in Q between \(x_t\) and \(x_l\). Note that the projection of such a path to Q can only contain \(x_0\) as an endpoint. These observations will allow us to bound the number of such paths.

Without loss of generality, assume that t is even. For l an even non-negative integer, we claim that the number of vertices in \(F_t\tilde{o}\) that project to \(x_l\) is bounded above by

$$\begin{aligned} \binom{t}{l/2} \cdot (2q)^{t-l/2} \cdot 2^{l/2}. \end{aligned}$$

Indeed, any such path from \(x_t\) to \(x_l\) must take \(t-l/2\) steps to the left and l/2 steps to the right in Fig. 6. The binomial coefficient counts the number of ways to choose when to take the steps to the right. Since the projection of the paths that we consider can only contain \(x_0\) as an endpoint, for any choice of such a path, each edge taken to the right has at most 2 lifts to T, while each edge taken to the left has at most 2q lifts. Since \(F_t\tilde{o}\) has \((2q+1)^t\) elements, it follows that for any even \(l\ge 0\)

$$\begin{aligned} \pi _* \sigma _t(x_l) \le \left( \frac{2q}{2q+1}\right) ^t g(t) \underset{t \rightarrow \infty }{\longrightarrow } 0, \end{aligned}$$

where g(t) is a polynomial in t of degree \(\le l/2\). \(\square \)
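
The decay in the last display can be checked numerically. The following is a minimal sketch, not part of the original argument: it evaluates the path-count bound exactly and divides by \((2q+1)^t\), the number of non-backtracking paths of length t that avoid one prescribed first step in \(T_{2q+2}\). The function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from math import comb


def path_count_bound(t: int, l: int, q: int) -> int:
    """Bound C(t, l/2) * (2q)^(t - l/2) * 2^(l/2) on the vertices of F_t o above x_l."""
    if t % 2 or l % 2 or not 0 <= l <= 2 * t:
        raise ValueError("t and l must be even with 0 <= l <= 2t")
    right_steps = l // 2
    return comb(t, right_steps) * (2 * q) ** (t - right_steps) * 2 ** right_steps


def mass_bound(t: int, l: int, q: int) -> Fraction:
    """Resulting bound on pi_* sigma_t(x_l): the count divided by (2q+1)^t."""
    return Fraction(path_count_bound(t, l, q), (2 * q + 1) ** t)


if __name__ == "__main__":
    # The bound equals (2q/(2q+1))^t times a polynomial in t, so it tends to 0.
    for t in (20, 100, 200):
        print(t, float(mass_bound(t, l=4, q=2)))
```

Exact rational arithmetic is used so that the comparison between different values of t is not affected by floating-point underflow.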

The rest of this section is devoted to the proof of Theorem D. Its proof consists of four parts. In the first part, we construct an uncountable family of lattices \(\Gamma _\alpha \) in \({\text {Aut}}(T_6)\). In the second part, thanks to an auxiliary Markov chain that we introduce, we obtain subgaussian concentration estimates on the Markov chain associated with the lattice \(\Gamma _\alpha \) (see Sect. 3.1). In the third part, we show that the space \({\text {Aut}}(T_6)/\Gamma _\alpha \) contains points x whose horospherical orbital averages exhibit escape of mass along some subsequence. In the fourth part, we show that along some other subsequences these averages equidistribute to the Haar measure.

Proof of Theorem D

First part: Construction of \(\Gamma _\alpha \). For each \(\alpha \in (1,2)\), we will construct an edge-indexed graph \((Q_\alpha ,{\text {ind}})\) of finite volume, which will yield a lattice \(\Gamma _\alpha \le {\text {Aut}}(T_6)\). First, the underlying graph is a ray, with vertices \(\{x_i\}_{i=0}^\infty \) and edges \(e_i = (x_i,x_{i+1})\).

Let \(\alpha \in (1,2)\) and for \(i \ge 1\), let \(n_i = \lfloor \alpha ^i \rfloor \). We divide the vertices \((x_i)_{i \ge 1}\) of the ray into two types: \(x_j\) is black if \(j=i+n_1+ \cdots +n_i\) for some \(i \ge 1\) and white otherwise. In other words, there are blocks of \(n_i\) consecutive white vertices that are separated by single appearances of black vertices. A white vertex \(x_j\) is said to belong to the i-th block if

$$\begin{aligned} i+n_1 + \cdots + n_i< j < i +1 + n_1 + \cdots +n_{i+1}. \end{aligned}$$

We say that an edge e belongs to the i-th block if both \(\partial _0 e\) and \(\partial _1 e\) do.

We define the index map on \(EQ_\alpha \): for \(i>0\), let \({\text {ind}}(e_i)=2\) and \({\text {ind}}(\bar{e}_{i-1})=4\) if \(x_i\) is black and \({\text {ind}}(e_i)={\text {ind}}(\bar{e}_{i-1})=3\) if \(x_i\) is white. Set \({\text {ind}}(e_0)=6\). See Fig. 7 for illustration.

Fig. 7
Edge-indexed graph \((Q_\alpha , {\text {ind}})\). Second and third blocks are marked

One easily checks that \((Q_\alpha ,{\text {ind}})\) has bounded denominators [3, page 23] and, hence, has a faithful finite grouping which yields a discrete subgroup \(\Gamma _\alpha \) in \({\text {Aut}}(T_{6})\). Moreover, \({\text {vol}}(Q_\alpha ) \le 1+ 4 \sum _{i \ge 1} (n_i +1)\frac{1}{2^{i-1}} < \infty \) (see (2.3)), thus \(\Gamma _\alpha \) is a lattice.
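
The construction and the volume estimate can be made concrete as follows. This is a sketch under an assumption: formula (2.3) is not reproduced here, so the code uses the usual convention for edge-indexed graphs, in which vertex weights satisfy \(w(x_{j})/w(x_{j-1})={\text {ind}}(e_{j-1})/{\text {ind}}(\bar{e}_{j-1})\), normalized by \(w(x_0)=1\). All function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def block_lengths(alpha: float, blocks: int) -> list[int]:
    """n_i = floor(alpha^i) for i = 1..blocks."""
    return [floor(alpha ** i) for i in range(1, blocks + 1)]


def black_positions(n: list[int]) -> set[int]:
    """Indices j = i + n_1 + ... + n_i of the black vertices, i = 1..len(n)."""
    positions, total = set(), 0
    for i, n_i in enumerate(n, start=1):
        total += n_i
        positions.add(i + total)
    return positions


def indices(j: int, black: set[int]) -> tuple[int, int]:
    """Return (ind(e_j), ind(bar e_{j-1})) at the vertex x_j, for j >= 1."""
    return (2, 4) if j in black else (3, 3)


def truncated_volume(alpha: float, blocks: int) -> Fraction:
    """Sum of the (assumed) vertex weights up to the last black vertex considered."""
    black = black_positions(block_lengths(alpha, blocks))
    weight = Fraction(1)   # assumed normalization w(x_0) = 1
    total = weight
    ind_up = 6             # ind(e_0)
    for j in range(1, max(black) + 1):
        up, down = indices(j, black)
        assert up + down == 6          # every vertex has degree 6 in T_6
        weight *= Fraction(ind_up, down)
        total += weight
        ind_up = up
    return total
```

With these conventions each passage through a black vertex halves the weight, which is the source of the factor \(2^{-(i-1)}\) in the bound on \({\text {vol}}(Q_\alpha )\).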

Second part: An auxiliary chain and subgaussian estimates. Consider the edge-indexed graph \(Q_\alpha \) and the Markov chain \(M_n\) on EQ as in Sect. 3.1. For an edge \(e_j\) that belongs to some block i, the transition probabilities of \(M_n\) are given by

$$\begin{aligned} P(e_j,e_{j+1})=P(\bar{e}_j,\bar{e}_{j-1})=\frac{3}{5}, \qquad P(e_j,\bar{e}_j)=P(\bar{e}_j,e_j)=\frac{2}{5}. \end{aligned}$$
(5.1)

In view of the reductions in Sect. 3, we are interested in understanding the distribution of \(\partial _0 {}_*\delta _{e_j} P^m\). For this, consider an auxiliary Markov kernel \(\bar{P}\) on a state space consisting of two elements \(\{e,\bar{e}\}\) with transition probabilities

$$\begin{aligned} \bar{P}(e,e)=\bar{P}(\bar{e},\bar{e})=\frac{3}{5}, \qquad \bar{P}(e,\bar{e})=\bar{P}(\bar{e},e)=\frac{2}{5}. \end{aligned}$$
(5.2)

This auxiliary Markov chain records the behavior of \(M_n\) along the edges within a block: it retains only the probabilities of turning around or of continuing further in the same direction (see e.g. Fig. 4).

Denote by \(V_n\) the Markov chain associated to the kernel \(\bar{P}\). Given a word \(\mathrm {u} \in \{e,\bar{e}\}^n\) of length \(n \ge 2\), for \( s \in \{e,\bar{e}\}^2\) denote by \(N_s(\mathrm {u})\) the number of occurrences of the word s as a subword of \(\mathrm {u}\). For \(n \ge 2\), define the function \(f_n\) on \(\{ e,\bar{e} \}^n\) by \(f_n(\mathrm {u})=N_{ee}(\mathrm {u})+N_{e\bar{e}}(\mathrm {u})-N_{\bar{e}e}(\mathrm {u})-N_{\bar{e}\bar{e}}(\mathrm {u})\). Denote by \(Y_n\) the integer valued random variable \(f_n(V_0,V_1,\ldots ,V_{n-1})\).

Now, it is readily observed that for every \(i \in \mathbb {N}\) large enough so that \(n_i\ge 16\) and every integer \(j,m \ge 2\) with

$$\begin{aligned} \begin{aligned} i+\sum _{k=1}^{i-1}n_k + \frac{n_i}{4} \le j \le i+\sum _{k=1}^{i}n_k - \frac{n_i}{4} \qquad \text {and} \qquad m \le \frac{n_i}{8}, \end{aligned} \end{aligned}$$
(5.3)

we have

$$\begin{aligned} \partial _0 {}_*\delta _{e_j}P^m = \partial _0 {}_*e_{j+Y_m} \quad \text {in distribution.} \end{aligned}$$
(5.4)

Indeed, the inequalities (5.3) make sure that the starting edge \(e_j\) in the i-th block is \(n_i/4\) away from the boundary of the i-th block, so application of \(P^m\) keeps the support of the distribution within i-th block and, therefore, transition probabilities at each step are given by (5.1). The relation with the auxiliary chain (5.2) is straightforward.

We now wish to use subgaussian concentration inequalities for the Markov chain \(V_m\) (e.g. as discussed in [18]). To this end, we note that being an aperiodic irreducible Markov chain with finite state space \(\{e,\overline{e}\}\), \(V_m\) is geometrically ergodic and the state space is a small set in the sense of [18, Definition 0.1]. Moreover, one checks by a simple calculation that \(|\mathbb {E}_{s}[Y_m]|\) is bounded by a constant C independently of \(s \in \{e,\bar{e}\}\) and \(m \in \mathbb {N}\). Finally, each function \(f_m\) is clearly separately bounded in the sense of [18, (0.1)]. Now, it follows from [18, Theorem 0.2, (0.7)] that there exists a constant \(c'_0>0\) such that for all \(m \in \mathbb {N}\), \(s \in \{e,\overline{e}\}\) and \(r \in \mathbb {N}\), we have

$$\begin{aligned} \mathbb {P}_{s}(|Y_m| \le C+r) \ge 1-2 e^{-c'_0\frac{r^2}{m}}. \end{aligned}$$
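
The two ingredients above, boundedness of \(\mathbb {E}_s[Y_m]\) and the decay of the tails of \(Y_m\), can be examined on the exact law of \(Y_m\). The sketch below is illustrative only and does not replace the inequality of [18]. It uses that each transition out of e contributes \(+1\) to \(f_m\) and each transition out of \(\bar{e}\) contributes \(-1\), so that, starting from e, the mean is \(\sum _{k=0}^{m-2}(1/5)^k<5/4\). The names are hypothetical.

```python
from fractions import Fraction

STAY, FLIP = Fraction(3, 5), Fraction(2, 5)  # kernel (5.2)


def distribution_of_Y(m: int, start: str) -> dict[int, Fraction]:
    """Exact law of Y_m = f_m(V_0, ..., V_{m-1}) for the two-state chain, V_0 = start."""
    law = {(start, 0): Fraction(1)}
    for _ in range(m - 1):                       # m letters give m - 1 two-letter subwords
        nxt: dict[tuple[str, int], Fraction] = {}
        for (state, y), p in law.items():
            step = 1 if state == "e" else -1     # sign is decided by the first letter
            other = "ebar" if state == "e" else "e"
            for new_state, prob in ((state, STAY), (other, FLIP)):
                key = (new_state, y + step)
                nxt[key] = nxt.get(key, Fraction(0)) + p * prob
        law = nxt
    out: dict[int, Fraction] = {}
    for (_, y), p in law.items():
        out[y] = out.get(y, Fraction(0)) + p
    return out


def mean(law: dict[int, Fraction]) -> Fraction:
    return sum((y * p for y, p in law.items()), Fraction(0))


def tail(law: dict[int, Fraction], threshold: int) -> Fraction:
    """P(|Y_m| > threshold)."""
    return sum((p for y, p in law.items() if abs(y) > threshold), Fraction(0))
```

For instance, `mean(distribution_of_Y(50, "e"))` stays below 5/4, and `tail` can be compared with \(2e^{-c r^2/m}\) for trial values of the constant.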

Now using the relation (5.4) and slightly decreasing the constant \(c_0'\) to \(c_0\) (depending only on \(C>0\)), we deduce that for every \(i \in \mathbb {N}\) large enough, \(j,m \in \mathbb {N}\) as in (5.3) and \(r \le m\), we have

$$\begin{aligned} \mathbb {P}_{e_{j}}(M_m \in \{e_{j-r},\ldots ,e_{j+r},\bar{e}_{j-r},\ldots ,\bar{e}_{j+r}\}) >1- 2 e^{-c_0\frac{r^2}{m}}. \end{aligned}$$
(5.5)

It follows immediately that if an initial distribution \(\mu \) is supported on a set S of edges \(\{e_j\}\) which satisfy (5.3) for some i large enough, then for m as in (5.3) and \(r\le m\) we have

$$\begin{aligned} \mathbb {P}_{\mu }(M_m \text { is in }r\text {-neighborhood of }S) >1- 2 e^{-c_0\frac{r^2}{m}}. \end{aligned}$$
(5.6)

Third part: Showing the escape of mass. Now we construct points \(x = g\Gamma _\alpha \in G/\Gamma _\alpha \) which, under the horospherical group action, exhibit the dynamical behavior described in the statement of Theorem D. For an edge \(e\in EQ_\alpha \), denote by \(|e| = d(e,x_0)\) its graph distance to \(x_0\) in \(Q_\alpha \). In particular, \(|e_j|=j\).

Choose a sequence of edges \((e(t))_{t \ge 0}\) in \(EQ_\alpha \) such that, for some \(\beta \in (\frac{2}{3},1)\), we have

  1. \(e(0)=e_0\).

  2. \(\partial _0 e(t+1)= \partial _1 e(t)\).

  3. \(\liminf _{t\rightarrow \infty } \frac{t^\beta }{|e(t)|} = 0. \)

  4. \(|e(t)|=0\) for infinitely many \(t\in {\mathbb {N}}\).

The path e(t) as above comes back infinitely often to \(x_0\), but makes some longer and longer visits towards the cusp. Now, choose a lift of e(t) in \(T_6\), starting at some basepoint \(\tilde{o} \in VT\), which is the lift of \(x_0\). Let \(\tilde{o}=y_0, y_1, y_2,...\) be consecutive vertices converging to \(\eta \in \partial T\). Let \(g\in {\text {Aut}}(T_6)\) be an automorphism such that \(g^{-1}\) maps the edge \((y_i,y_{i+1})\) to the lift of the edge e(i), and \(x=g\Gamma _\alpha \).
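
One possible choice of such a path e(t) is sketched below. It is only an illustration and not the construction intended in the text: the k-th excursion climbs to height \(4^k\) (an arbitrary choice) and then returns to \(x_0\). Edges are encoded as pairs `(j, up)`, with `up=True` for \(e_j\) and `up=False` for \(\bar{e}_j\), and \(|e|\) is taken to be the distance from the initial vertex of e to \(x_0\), an assumption consistent with \(|e_j|=j\).

```python
from typing import Iterator

Edge = tuple[int, bool]  # (j, up): up=True encodes e_j, up=False encodes bar e_j


def excursion_path(num_excursions: int) -> Iterator[Edge]:
    """Yield e(0), e(1), ...: excursion k climbs to height 4**k and returns to x_0."""
    for k in range(1, num_excursions + 1):
        height = 4 ** k
        for j in range(height):             # e_0, e_1, ..., e_{height-1}
            yield (j, True)
        for j in reversed(range(height)):   # bar e_{height-1}, ..., bar e_0
            yield (j, False)


def source(edge: Edge) -> int:
    """Index of the initial vertex of the edge."""
    j, up = edge
    return j if up else j + 1


def target(edge: Edge) -> int:
    """Index of the terminal vertex of the edge."""
    j, up = edge
    return j + 1 if up else j


def length(edge: Edge) -> int:
    """|e|, taken here as the distance from the initial vertex to x_0 (assumption)."""
    return source(edge)


def peak_ratios(num_excursions: int, beta: float) -> list[float]:
    """Values of t**beta / |e(t)| at the turning point of each excursion."""
    path = list(excursion_path(num_excursions))
    ratios = []
    for t in range(len(path) - 1):
        if path[t][1] and not path[t + 1][1]:   # last upward edge of an excursion
            ratios.append(t ** beta / length(path[t]))
    return ratios
```

Since the k-th peak is reached at a time of order \(4^k\), the ratios decay like \(4^{-k(1-\beta )}\), which gives property (3), while each excursion starts again at \(e_0\), which gives property (4). Turning around is compatible with lifting to a non-backtracking path in \(T_6\) because every index involved is at least 2.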

Let \(i_k\) and \(t_{i_k}\) be increasing sequences in \({\mathbb {N}}\), such that \(e(t_{i_k})\) is an edge, pointing toward the cusp, that is exactly in the middle of the \(i_k\)-th block, namely

$$\begin{aligned} d_{i_k}:= |e(t_{i_k})| = i_k + n_1 + \cdots +n_{i_k-1} + \left\lfloor \frac{n_{i_k}}{2} \right\rfloor , \end{aligned}$$

and \(t_{i_k}^\beta < c_1 |e(t_{i_k})|\) for some \(c_1>0\). Such infinite subsequences exist by property (3) in the choice of e(t). In the notation above, \(e(t_{i_k})=e_{d_{i_k}}\).

We shall now show that the sequence of measures given by the orbital averages \(\nu _{x,t_{i_k}}\) converges weakly to 0 as \(k\rightarrow \infty \).

By Lemma 3.7 and (3.11), it suffices to show that the sequence \(\delta _{e(t_{i_k})}P^{t_{i_k}}\) of distributions of the Markov chain \(M_n\) converges weakly to zero. To do this, we would like to apply (5.5) to show that after \(t_{i_k}\) iterations most of the mass of the Markov chain stays in the \(i_k\)-th block, which moves to the cusp in \(Q_\alpha \) as \(k \rightarrow \infty \). However, the constraints (5.3) are not satisfied since \(t_{i_k}\ge d_{i_k} > n_{i_k}/8\). Recall that \(n_{i_k} = \lfloor \alpha ^{i_k} \rfloor \), \(d_{i_k} \sim c_2\alpha ^{i_k}\) and thus \(t_{i_k} \le c_3 \left( \alpha ^{i_k}\right) ^{1/\beta } \) for some positive constants \(c_2,c_3\).

To overcome this problem, we apply (5.6) several times for a small number of allowed iterations, each time dismissing an exponentially small proportion of trajectories that move more than a distance \(r_k\) to be chosen below.

Let \(m_k =\lfloor \frac{n_{i_k}}{8} \rfloor \sim \frac{\alpha ^{i_k}}{8}\). The number of times we wish to apply (5.6) is bounded above by

$$\begin{aligned} N_k =\lceil 8c_3 \alpha ^{i_k(1/\beta -1)} \rceil \ge \frac{t_{i_k} }{m_k} \end{aligned}$$

Set \(r_k = \lfloor \frac{n_{i_k}/4}{N_k} \rfloor \sim (32c_3)^{-1} \alpha ^{i_k(2-1/\beta )}\). The Markov property and the choices of \(m_k\) and \(r_k\) allow us to repeatedly apply (5.6) \(N_k\) times (with \(m=m_k\) and \(r=r_k\)), each time conditioning on trajectories that do not move more than \(r_k\) in each \(m_k\)-iterate. We get that the proportion of trajectories starting at \(e(t_{i_k})\) that move at most \(N_k \cdot r_k \le n_{i_k}/4\) (in particular, do not leave the \(i_k\)-th block) is at least

$$\begin{aligned} \left( 1-2e^{-c_4\alpha ^{i_k(3-2/\beta )}} \right) ^{8c_3 \alpha ^{i_k(1/\beta -1)}}, \end{aligned}$$

for some constant \(c_4>0\). The above tends to 1 as \(k\rightarrow \infty \), implying the escape of mass for \(\nu _{x,t}\)’s when the underlying Følner sequence is \(F_t\). By (3.6), this also implies the escape of mass for \(\nu _{x,t}\)’s associated to any good Følner sequence.
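
The interplay of the exponents can be illustrated numerically. The sketch below evaluates the lower bound \((1-2e^{-c_0 r_k^2/m_k})^{N_k}\) for sample values of the parameters. The constants \(c_0=c_3=1\) and the values \(\alpha =1.5\), \(\beta =0.8\) are arbitrary assumptions made for illustration, \(r_k\) is rounded down so that \(N_k r_k\le n_{i_k}/4\), and the function name is hypothetical.

```python
import math


def escape_lower_bound(i: int, alpha: float = 1.5, beta: float = 0.8,
                       c0: float = 1.0, c3: float = 1.0) -> float:
    """Lower bound on the proportion of trajectories staying in the i-th block."""
    n = math.floor(alpha ** i)                                # block length n_i
    m = n // 8                                                # steps per application of (5.6)
    N = math.ceil(8 * c3 * alpha ** (i * (1 / beta - 1)))     # number of applications
    r = (n // 4) // N                                         # allowed displacement per application
    if m == 0 or r == 0:
        return 0.0                                            # block too short: no information
    per_step = 1 - 2 * math.exp(-c0 * r * r / m)
    return max(per_step, 0.0) ** N


if __name__ == "__main__":
    for i in (20, 40, 60):
        print(i, escape_lower_bound(i))
```

The ratio \(r_k^2/m_k\) grows like \(\alpha ^{i_k(3-2/\beta )}\), which is where the condition \(\beta >\frac{2}{3}\) enters: for small i the bound is vacuous, while for large i it is close to 1.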

Fourth part: Equidistribution. Recall that by our choice of \(x \in X\), there exists an increasing sequence \(t_k \in \mathbb {N}\) such that \(|e(t_k)|=0\) for every \(k \in \mathbb {N}\). The equidistribution statement follows from the following technical but more general result. This completes the proof of Theorem D. \(\square \)

Proposition 5.2

Let G be a non-compact, closed, topologically simple subgroup of \({\text {Aut}}(T)\) that acts transitively on \(\partial T\). Let \(\Gamma \) be a lattice in G, \(\eta \in \partial T\) and \(O_t\) a good Følner sequence in \(G^0_\eta \). Let \(g \in G\) and denote by \((\hat{e}(t))_{t \in \mathbb {N}}\) a sequence of consecutive edges in T on a geodesic segment towards \(g^{-1} \eta \). Assume that there exist a finite subset F of EQ and an increasing subsequence \(t_k\) such that \(\pi (\hat{e}(t_k))=:e(t_k) \in F\) for every \(k \in \mathbb {N}\). Then for \(x=g \Gamma \in X\), the orbital measures \(\nu _{x,t_k}\) converge towards the Haar measure \(m_X\) on X.

Proof

As before, fix a distinguished vertex \(\tilde{o}\) in T with respect to which the map \({\text {proj}}: G/\Gamma \rightarrow VQ\) given by \({\text {proj}}(h\Gamma )=\pi (h^{-1} \tilde{o})\) is defined. Let \(g \in G\) be as in the statement. Since the Markov chain \(M_n\) associated with the lattice \(\Gamma \) is positive recurrent (Proposition 6.4), it follows by (3.6) and the correspondence established in Lemmas 3.7, 3.8 and 3.9 (see also Remark 3.10) that \({\text {proj}}_*\nu _{x,t_k}\) converges to a probability measure \(\bar{m}\) on EQ. Since the map \({\text {proj}}\) is proper, this implies that the sequence \(\nu _{x,t_k}\) is tight so that any subsequence of \((\nu _{x,t_k})_{k \in \mathbb {N}}\) has a limit point and any limit point is a probability measure on X. Let m be such a limit point along a subsequence that we also denote by \(t_k\). Since \(\nu _{x,t_k}\)’s are orbital measures associated to a Følner sequence in \(G^0_\eta \), the limit probability measure m is \(G^0_\eta \)-invariant.

Now, fix a hyperbolic element \(a \in G\) with attracting point \(\eta \in \partial T\) and such that the translation axis of a contains \(\tilde{o}\). Let \(n_k=\lfloor \frac{t_k}{\tau (a)} \rfloor \) where \(\tau (a) \in \mathbb {N}\) denotes the translation length of a, so that \(|\tau (a^{n_k})-t_k|\) is bounded. For every \(k \in \mathbb {N}\), we have \({\text {proj}}(a^{-n_k}g\Gamma )=\pi (g^{-1}a^{n_k} \tilde{o})=\pi ((g^{-1}a^{n_k} g)g^{-1}\tilde{o})\). As \(n \in \mathbb {N}\) varies, \((g^{-1}a^n g)g^{-1}\tilde{o}\) describes vertices on the geodesic ray between \(g^{-1} \tilde{o}\) and \(g^{-1} \eta \). Therefore it follows by the hypothesis \(e(t_k) \in F\) that for some larger finite set \(F' \subset EQ\), we have \({\text {proj}}(a^{-n_k}g\Gamma ) \in F'\) for every \(k \in \mathbb {N}\). Since the map \({\text {proj}}\) has compact fibres, this entails that there exists a compact set \(K \subset G/\Gamma \) such that

$$\begin{aligned} a^{-n_k}g \Gamma \in K, \end{aligned}$$
(5.7)

for every \(k \in \mathbb {N}\). Furthermore, since such a group G as in the statement enjoys the Howe–Moore property [11] (see also [36]), the action of a on \((X,m_X)\) is mixing so that we are in a position to apply [13, Lemma 6.3] as in the proof of Proposition 4.1. Now repeating the same argument as in the end of the proof of Proposition 4.1 (i.e. (4.3) and thereafter), one deduces that \(m=m_X\) and this proves the proposition.

6 Limiting distributions of spheres in quotient graphs and lattice point counting

This section is devoted to the proofs of Theorems E and F. Recall from Sect. 3.1 the irreducible Markov chain \(M_n\) associated with a tree lattice \(\Gamma \). We proved in Lemma 3.3 that it is positive recurrent when \(\Gamma \) is a geometrically finite lattice. However, in Theorem E general lattices are considered. Here we will prove that, more generally, \(M_n\) is positive recurrent for all lattices. In order to do this we introduce another Markov chain that will serve as a tool to analyse further the chain \(M_n\).

6.1 Auxiliary chain

Consider the Markov chain \(\hat{M}_n\) on the state space VQ, given by the transition kernel \(\hat{P}(v,w)=\frac{{\text {ind}}((v,w))}{\deg (v)}\) if \((v,w) \in EQ\) and 0 otherwise. Since the graph Q is connected, \(\hat{M}_n\) is irreducible.

Recall that \(\tilde{o}\in VT\) is a fixed basepoint. Let \(o=\pi (\tilde{o})\in VQ\). Consider the positive function \(\mu :VQ \rightarrow {\mathbb {R}}_+\), given by \(\mu (w)=\deg (w)N_o(w)^{-1} \), where \(N_o(.)\) is as defined in Sect. 2.2.

Lemma 6.1

The positive function \(\mu \) defines a finite stationary measure on VQ for \(\hat{M}_n\). In particular, the Markov chain \(\hat{M}_n\) is positive recurrent.

Proof

It suffices to check that \(\mu \) has finite \(l_1\)-norm and is reversible, i.e. satisfies \(\mu (w_1)\hat{P}(w_1,w_2)=\mu (w_2)\hat{P}(w_2,w_1)\) for all \(w_1,w_2 \in VQ\). For reversibility, it is enough to consider pairs of neighbors \(w_1,w_2 \in VQ\), and for such pairs we have

$$\begin{aligned} \frac{\hat{P}(w_1,w_2)}{\hat{P}(w_2,w_1)}=\frac{{\text {ind}}(w_1,w_2)}{{\text {ind}}(w_2,w_1)}\frac{\deg (w_2)}{\deg (w_1)}=\frac{N_o(w_1)}{N_o(w_2)}\frac{\deg (w_2)}{\deg (w_1)}=\frac{\mu (w_2)}{\mu (w_1)}. \end{aligned}$$

This shows that \(\mu \) is a reversible measure on VQ. The fact that \(\mu \) has finite \(l_1\)-norm is a direct consequence of the volume formula (2.3):

$$\begin{aligned} \sum _{w \in VQ} \mu (w)= \sum _w \deg (w)N_o(w)^{-1} \le \max \{d_1,d_2\} \sum _w N_o(w)^{-1} < \infty . \end{aligned}$$

\(\square \)
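
As an illustration of the reversibility computation, the sketch below checks detailed balance on the first six vertices of the ray \(Q_\alpha \) of Sect. 5 with \(\alpha =1.5\) (so \(x_2\) and \(x_5\) are black). It assumes the normalization \(N_o(x_0)=1\) together with the relation \(N_o(w_1)/N_o(w_2)={\text {ind}}(w_1,w_2)/{\text {ind}}(w_2,w_1)\) used in the proof. The names are hypothetical.

```python
from fractions import Fraction

DEG = 6
UP = [6, 3, 2, 3, 3, 2]     # ind(e_j) for j = 0..5
DOWN = [0, 3, 4, 3, 3, 4]   # ind(bar e_{j-1}) for j = 1..5; entry 0 is unused


def stationary_weights() -> list[Fraction]:
    """mu(x_j) = deg(x_j) / N_o(x_j), with N_o(x_0) = 1 (assumed normalization)."""
    n_o = [Fraction(1)]
    for j in range(len(UP) - 1):
        # N_o(x_{j+1}) / N_o(x_j) = ind(bar e_j) / ind(e_j)
        n_o.append(n_o[-1] * Fraction(DOWN[j + 1], UP[j]))
    return [Fraction(DEG) / value for value in n_o]


def p_hat(j: int, k: int) -> Fraction:
    """Transition kernel of the auxiliary chain between x_j and x_k on the ray."""
    if k == j + 1:
        return Fraction(UP[j], DEG)
    if k == j - 1:
        return Fraction(DOWN[j], DEG)
    return Fraction(0)


def is_reversible() -> bool:
    """Check mu(x_j) P(x_j, x_{j+1}) = mu(x_{j+1}) P(x_{j+1}, x_j) on the truncation."""
    mu = stationary_weights()
    return all(mu[j] * p_hat(j, j + 1) == mu[j + 1] * p_hat(j + 1, j)
               for j in range(len(mu) - 1))
```

Reversibility implies stationarity at every vertex whose neighbors all lie in the truncation, which can be checked in the same way.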

Remark 6.2

Recall from Sect. 2.2 that G is transitive on the set of vertices of VT at even distance from \(\tilde{o}\). The image of \({\text {proj}}: G/\Gamma \rightarrow \Gamma {\setminus } T\) is the set of vertices at even distance from o. Moreover, from (2.4) it is clear that \({\text {proj}}_*m_X\) is proportional to the restriction of \(\mu \) to the image of \({\text {proj}}\). Hence, the measure \(\mu \) can be thought of as the projection of Haar measure on \(G/\Gamma \).

6.2 Positive recurrence of the Markov chain \(M_n\)

First, we wish to relate the Markov chains \(\hat{M}_n\) and \(M_n\). Denote by \(R_n\) the \(n^{\text {th}}\)-step of the nearest-neighbor simple random walk on the vertices of the tree T and by \(\delta _{\tilde{o}} (R_n)\) its distribution when the initial vertex is \(\tilde{o} \in VT\), i.e. a.s. \(R_0=\tilde{o}\). Note that since T is biregular, the restriction of \(\delta _{\tilde{o}}(R_n)\) to the spheres \(S(\tilde{o},m)\), for \(m \le n\), is a multiple of the uniform measure on \(S(\tilde{o},m)\) which is denoted by \(\rho _m\) as before. Let \(D_{\tilde{o}}\) be the distribution on EQ given by

$$\begin{aligned} D_{\tilde{o}}:= \frac{1}{\deg (\tilde{o})} \sum _{(\tilde{o},\tilde{w})\in ET} \delta _{(o,\pi (\tilde{w}))}. \end{aligned}$$
(6.1)

Lemma 6.3

For any \(n\ge 1\) we have

$$\begin{aligned} \begin{aligned} \delta _o \hat{P}^n&=\pi _*\delta _{\tilde{o}}(R_n)\\&=\mathbb {P}_{\tilde{o}}(R_n=\tilde{o})\delta _o+ \sum _{k=1}^n \mathbb {P}_{\tilde{o}}(d(R_n,\tilde{o})=k) \partial _{1*}(D_{\tilde{o}} P^{k-1}). \end{aligned} \end{aligned}$$

Proof

Let R be the transition kernel for simple random walk on the tree. For the first equality, one simply notes that \(\hat{P}(\pi (\tilde{x}),\pi (\tilde{y}))=R(\tilde{x},\pi ^{-1}(\pi (\tilde{y})))\).

The second equality follows from the fact that the distribution of the \(n^{\text {th}}\) step of the nearest-neighbor simple random walk on the tree is given by

$$\begin{aligned} \delta _{\tilde{o}}(R_n)= \mathbb {P}_{\tilde{o}}(R_n=\tilde{o}) \delta _{\tilde{o}} + \sum _{k=1}^n \mathbb {P}_{\tilde{o}}(d(R_n,\tilde{o})=k) \rho _k. \end{aligned}$$
(6.2)

The statement follows after applying \(\pi _*\) and Lemma 3.9. \(\square \)

In other words, the distribution of the chain \(\hat{M}_n\) starting from \(v \in VQ\) is given by weighted average of distributions given by \(M_k\) with \(k\le n\). We will use this relation to deduce the positive recurrence of \(M_n\) from the positive recurrence of \(\hat{M}_n\).

Proposition 6.4

The Markov chain \(M_n\) is positive recurrent.

Proof

By Kingman’s subadditive ergodic theorem, there exists \(r\in {\mathbb {R}}\) such that \(\frac{1}{k} d(\tilde{o},R_k ) \longrightarrow r\), \(\mathbb {P}_{\tilde{o}}\)-almost surely, hence also in measure, as \(k\rightarrow \infty \) (the value r is called the drift of the random walk \(R_k\)). Since \( \max \{d_1,d_2\} \ge 3\), it is easily seen that \(r>0\). Let \(\varepsilon >0\). Then for all \(k \in \mathbb {N}\) large enough, we have

$$\begin{aligned} \mathbb {P}_{\tilde{o}}\left( | d(\tilde{o},R_k)-kr|>k\varepsilon \right) < \varepsilon \end{aligned}$$
(6.3)

By positive recurrence of the auxiliary chain \(\hat{M}_n\) (Lemma 6.1), there exists a finite subset \(K_1\) of VQ such that for every n large enough, \(\mathbb {P}_o (\hat{M}_n \in K_1)>1-\varepsilon \).

In view of Lemma 6.3 and (6.3), we deduce that there exists a sequence \(n_k \in \mathbb {N}\) with \(|n_k-kr|\le k\varepsilon \) such that for every k large enough

$$\begin{aligned} \mathbb {P}(M_{n_k} \in \partial _1^{-1}K_1)>1-2\varepsilon , \end{aligned}$$
(6.4)

which implies that the irreducible chain \(M_n\) is positive recurrent. \(\square \)
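
The positivity of the drift and the concentration estimate (6.3) can be illustrated in the simplest case. The sketch below assumes a d-regular tree (\(d_1=d_2=d\)), where the distance to the origin is a birth-and-death chain that moves away with probability \((d-1)/d\) and has drift \((d-2)/d\); the biregular case is not treated here. The function names are hypothetical.

```python
def distance_law(d: int, n: int) -> list[float]:
    """Law of d(o, R_n) for simple random walk on the d-regular tree, as a list indexed by distance."""
    law = [0.0] * (n + 2)
    law[0] = 1.0
    for _ in range(n):
        nxt = [0.0] * (n + 2)
        nxt[1] += law[0]                          # from the origin every step moves away
        for k in range(1, n + 1):
            if law[k]:
                nxt[k + 1] += law[k] * (d - 1) / d    # d - 1 neighbors are further away
                nxt[k - 1] += law[k] / d              # one neighbor is closer
        law = nxt
    return law


def drift(d: int) -> float:
    """Drift of the walk on the d-regular tree (regular case only)."""
    return (d - 2) / d


def deviation_probability(law: list[float], n: int, r: float, eps: float) -> float:
    """P(|d(o, R_n) - n r| > n eps), the quantity appearing in (6.3)."""
    return sum(p for k, p in enumerate(law) if abs(k - n * r) > n * eps)
```

For example, with `law = distance_law(3, 1000)`, the value `deviation_probability(law, 1000, drift(3), 0.1)` is already far below 0.1.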

An alternative and more conceptual proof of Proposition 6.4 was kindly suggested to us by an anonymous referee. We discuss it in the following remark. As our proof above, it relies on the fact that the Markov chain \(M_n\) can be seen as a quotient of the simple random walk on ET, the set of edges of the tree T.

Remark 6.5

(Alternative proof of Proposition 6.4) Let \(\tilde{P}\) be the Markov operator associated to the simple random walk on ET. Considering two successive edges x and y in ET, we have \(ET=Gx \cup Gy \simeq G/G_x \cup G/G_y\), where \(G_x\) and \(G_y\) denote the respective stabilizers and \(G={\text {Aut}}(T)\). Using this and the fact that G is unimodular [1, Proposition 6], one sees that ET carries a \(\tilde{P}\)-stationary and G-invariant measure \(\tilde{\nu }\). The restriction of \(\tilde{\nu }\) to Gx (respectively Gy) corresponds to the G-invariant measure on \(G/G_x\) (respectively \(G/G_y\)). On the other hand, the Markov operator P of the Markov chain \(M_n\) on \(EQ \simeq \Gamma {\setminus } ET\) can be seen as the restriction of \(\tilde{P}\) to \(\Gamma \)-invariant functions on ET and the associated quotient measure \(\nu \) of \(\tilde{\nu }\) gives a P-stationary measure on EQ. But since \(\Gamma <G\) is a lattice and \(\nu \) is given by the quotient measure on \(\Gamma {\setminus } G/G_x \cup \Gamma {\setminus } G/G_y\), we have that \(\nu \) is finite, as required.

6.3 Proof of Theorem E

Here we prove parts 1. and 2. of Theorem E. Its third part about exponential equidistribution will be proven in Sect. 6.4.

Given \(\tilde{v} \in VT\), let \(D_{\tilde{v}}\) be the distribution on EQ defined as in (6.1). By Lemma 3.10, \(\pi _* \rho _n =\partial _{1*}(D_{\tilde{v}}P^{n-1})\). Hence, by Proposition 6.4, there is no escape of mass for the sequence \(\pi _*\rho _n\). This proves (1) of Theorem E.

If the irreducible and positive recurrent Markov chain \(M_n\) has period \(p \in \mathbb {N}\), then the sequence of distributions \(D_{\tilde{v}}P^n\) has finitely many limit points \(\{\mu _j\}_{j=0}^{p'-1}\), corresponding to all possible convex combinations with coefficients \(1/\deg (\tilde{v})\) of the unique stationary probability measures of \(M_n\) on each one of its cyclic classes (corresponding to the classes of Dirac measures constituting \(D_{\tilde{v}}\)). This implies the convergence along subsequences \(pn+j\) and hence (2) of Theorem E. \(\square \)

6.4 Exponential equidistribution of spheres in quotients by geometrically finite lattices

Previously, we established positive recurrence of \(M_n\), which is sufficient to prove the existence of limiting distributions of spheres in quotients of trees by action of tree lattices. However, in some cases, our Markov chain possesses a stronger property, namely that of geometric ergodicity. In these situations, the speed of convergence to the limiting distribution can be shown to be exponential and the exponential rate can even be made effective.

We begin by stating a version of Geometric Ergodic Theorem for Markov chains. Out of the equivalent definitions of geometric ergodicity, we conveniently choose one that uses the (Foster–Lyapunov) drift criteria. We then prove geometric ergodicity of the Markov chain \(M_n\) associated to geometrically finite tree lattices and discuss the application for exponential equidistribution of spheres. We refer the reader to [20, 37] for more on geometric ergodicity.

Let \(M_n\) be an irreducible, aperiodic and positive recurrent Markov chain on a countable state space S with the stationary probability measure \(\mu \). Denote by P the corresponding Markov operator. We call \(M_n\) geometrically ergodic if there exists \(r>1\) such that for all \(x\in S\), we have

$$\begin{aligned} \sum _{n \ge 0} r^n \Vert \delta _x P^n - \mu \Vert < \infty . \end{aligned}$$
(6.5)

where \(\Vert \cdot \Vert \) denotes the total variation norm. In particular, for a geometrically ergodic chain \(M_n\), we have \(\Vert \delta _x P^n - \mu \Vert = o(r^{-n})\) for every \(x \in S\).
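
As a toy instance of definition (6.5), consider the two-state kernel \(\bar{P}\) of (5.2), whose stationary measure is uniform. The sketch below computes \(\Vert \delta _x \bar{P}^n-\mu \Vert \) exactly, taking the total variation norm to be the \(\ell ^1\)-norm of the difference (an assumption on the normalization). The names are hypothetical.

```python
from fractions import Fraction

P_BAR = {
    "e": {"e": Fraction(3, 5), "ebar": Fraction(2, 5)},
    "ebar": {"e": Fraction(2, 5), "ebar": Fraction(3, 5)},
}
STATIONARY = {"e": Fraction(1, 2), "ebar": Fraction(1, 2)}


def step(law: dict[str, Fraction]) -> dict[str, Fraction]:
    """One application of the kernel to a row vector: law -> law P."""
    return {t: sum(law[s] * P_BAR[s][t] for s in P_BAR) for t in P_BAR}


def tv_norm(n: int, start: str = "e") -> Fraction:
    """l^1 distance between delta_start P^n and the stationary measure."""
    law = {s: Fraction(int(s == start)) for s in P_BAR}
    for _ in range(n):
        law = step(law)
    return sum(abs(law[s] - STATIONARY[s]) for s in P_BAR)
```

Here the distance equals \((1/5)^n\), so the series in (6.5) converges for every \(1<r<5\).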

Theorem 6.6

(Geometric Ergodic Theorem) Let \(M_n\) be an irreducible aperiodic Markov chain on a countable state space S. Assume that there exist a finite set \(K \subset S\), \(b\in {\mathbb {R}}, \beta < 1\) and a function \(V\ge 1\), which is finite at some \(x_0\in S\) satisfying the drift criteria:

$$\begin{aligned} PV(x) \le \beta V(x) + b \mathbb {1}_K(x), \quad \text { for any } x\in S. \end{aligned}$$
(6.6)

Then \(M_n\) is geometrically ergodic.

Let us remark that the rate r can be made explicit in terms of \(\beta , K\); see [4] for the treatment of the constant r. Finally, the aperiodicity hypothesis is only required in order to have a simple expression as in (6.5); if the Markov chain is not aperiodic, we shall still speak of geometric ergodicity if its restrictions to its cyclic classes are geometrically ergodic.

Lemma 6.7

Let T be a \((d_1,d_2)\)-biregular tree with \(d_1,d_2\ge 3\) and \(\Gamma \) a geometrically finite tree lattice. Then the associated Markov chain \(M_n\) is geometrically ergodic.

Proof

Let F be the compact part of Q. For convenience of notation, we will assume \(d_1 = d_2\). Let \(q=d_1+1\).

Recall that for \(e\in EQ\), \(|e|=d(\partial _1(e), F)\). The edge e is said to point toward the finite part if \(d(\partial _1(e),F)<d(\partial _0(e),F)\).

We define the function \(V:EQ \rightarrow [1,\infty )\) by

$$\begin{aligned} V(e) = {\left\{ \begin{array}{ll} 1 &{} \quad \text {if } e\in EF, \\ q^{0.1 |e|} &{}\quad \text {if } e \notin EF, \text {and points toward the finite part} \\ \frac{1}{2^{|e|}}q^{0.9 |e|} &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

We claim that V satisfies the drift criteria (6.6) with \(\beta = q^{-0.1}\) and \(b=q^5\).

Recall that we have positive transition probabilities only among neighboring edges in EQ. For \(e\in EQ {\setminus } EF\), the edge e belongs to a Nagao ray. If e is oriented toward the finite part and \(|e|>5\), \(PV(e) = V(f)\), where \(|f| = |e|-1\). Hence,

$$\begin{aligned} PV(e) = q^{0.1 (|e|-1)} \le q^{-0.1} V(e). \end{aligned}$$

If e is oriented toward the cusp, the transition probabilities are \(1/q\) to jump one step further away from EF to an edge pointing toward the cusp and \((q-1)/q\) to get one step closer and point toward the finite part (see Example 3.1). In other words, for each such edge e with \(|e|>5\) we have

$$\begin{aligned} \begin{aligned} PV(e)&= \frac{1}{q} \cdot \frac{ q^{0.9(|e|+1)}}{2^{|e|+1}} + \frac{q-1}{q} \cdot q^{0.1(|e|-1)} \\&\le \frac{1}{2} \cdot q^{-0.1} \cdot \frac{q^{0.9|e|}}{2^{|e|}} + q^{0.1(|e|-1)} \\&\le q^{-0.1} \cdot \frac{q^{0.9|e|}}{2^{|e|}} = q^{-0.1} V(e). \end{aligned} \end{aligned}$$

The last inequality holds since for any \(q\ge 4\) and \(|e|>5\)

$$\begin{aligned} q^{0.1(|e|-1)} \le \frac{1}{2} \cdot q^{-0.1} \cdot \frac{q^{0.9|e|}}{2^{|e|}}. \end{aligned}$$

The lemma follows by letting K be the finite set of edges with \(|e|\le 5\) (this also contains EF). \(\square \)
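The two cases of the drift inequality in the proof above can be checked numerically. The following is a minimal sketch that uses only the formulas for V and PV stated in the proof, with n standing for \(|e|\); the function names, the tested ranges of q and n, and the floating-point tolerance are assumptions made for this illustration.

```python
# Numerical sanity check of PV(e) <= q**-0.1 * V(e) on a Nagao ray,
# based only on the formulas from the proof of Lemma 6.7 (n stands for |e|).


def v_toward_finite(q, n):
    """V(e) for an edge outside EF that points toward the finite part."""
    return q ** (0.1 * n)


def v_toward_cusp(q, n):
    """V(e) for an edge outside EF that points toward the cusp."""
    return q ** (0.9 * n) / 2 ** n


def pv_toward_finite(q, n):
    # Deterministic move to an edge f with |f| = n - 1.
    return v_toward_finite(q, n - 1)


def pv_toward_cusp(q, n):
    # Probability 1/q: one step further out; (q-1)/q: turn back toward the finite part.
    return v_toward_cusp(q, n + 1) / q + (q - 1) / q * v_toward_finite(q, n - 1)


def drift_holds(q, n_min=6, n_max=100, rel_tol=1e-12):
    """Check both cases of the drift inequality for n_min <= n <= n_max."""
    beta = q ** -0.1
    slack = 1 + rel_tol  # absorbs rounding in the case where equality holds
    for n in range(n_min, n_max + 1):
        if pv_toward_finite(q, n) > beta * v_toward_finite(q, n) * slack:
            return False
        if pv_toward_cusp(q, n) > beta * v_toward_cusp(q, n) * slack:
            return False
    return True


def first_valid_length(q):
    """Smallest n with q**(0.1(n-1)) <= (1/2) * q**-0.1 * q**(0.9n) / 2**n."""
    n = 1
    while q ** (0.1 * (n - 1)) > 0.5 * q ** -0.1 * v_toward_cusp(q, n):
        n += 1
    return n
```

For instance, `drift_holds(4)` tests the smallest admissible value \(q=4\), and `first_valid_length(q)` indicates how much room the threshold \(|e|>5\) leaves in the elementary inequality used in the last step.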

Finally, the description of the limit measures \(\mu _j\) in the paragraph following the statement of Theorem E follows from the proof in Sect. 6.3 and from Lemma 3.4, which says that the period of the Markov chain \(M_n\) is always two, so that the Dirac masses constituting each distribution \(D_{\tilde{v}}\) all belong to a single cyclic class.

Remark 6.8

In the context of homogeneous dynamics, inequalities of type (6.6) are often referred to as Margulis inequalities. They were first used in the work of Eskin–Margulis–Mozes [24] and Eskin–Margulis [23]. After we completed the first version of this article, Katz [31] used linear representations to prove Margulis inequalities for horospherical averages on lattice quotients of real semisimple groups, and thereby established quantitative non-divergence of these averages (as in Lemma 6.7). Combining this with a spectral gap, he also deduced an equidistribution result (as Theorem B but) with a rate depending, among other things, on certain diophantine parameters of the starting point \(x \in G/\Gamma \) (cf. Remark 1.5). For \({\text {PSL}}_2(\mathbb {R})\)-quotients, more precise estimates were obtained earlier by Flaminio–Forni [27] and Strömbergsson [53], exploiting, among other things, the (unitary) representation theory of \({\text {PSL}}_2(\mathbb {R})\).

Remark 6.9

We remark that the family of lattices for which the associated Markov chain is geometrically ergodic and, consequently, for which part (3) of Theorem E holds, contains many non-geometrically finite lattices. For example, the lattice associated with the edge-indexed graph from Fig. 6 is of this kind, with a Foster–Lyapunov function V similar to the one in the proof of Lemma 6.7.

6.5 Proof of Theorem F

Let \(\Gamma \) be a geometrically finite lattice in \({\text {Aut}}(T)=:G\), denote by m a Haar measure on G, and let \(m_X\) be the G-invariant finite measure on \(G/\Gamma \) induced by the choice of a Borel fundamental domain in G. Denote by \(S_T(R)\) the cardinality of the sphere of radius R around \(\tilde{o}\) in T, and let \(o= \pi (\tilde{o})\). As before, \(\pi \) is the natural projection \(VT \rightarrow VQ\) and \(\rho _n\) denotes the normalized probability measure on the sphere of radius n on VQ with center o. Recall that G has precisely two orbits on VT, each consisting of vertices at pairwise even distance, so that \(2 | d(\gamma \tilde{o},\tilde{o})\) for every \(\gamma \in \Gamma \). For every \(R \in \mathbb {N}\), we have

$$\begin{aligned} N(2R) = \sum _{n\le R} S_T(2n) \cdot \pi _* \rho _{2n}(o) \cdot | \Gamma \cap G_{\tilde{o}}| . \end{aligned}$$
(6.7)

Thanks to (3) of Theorem E (see also the paragraph following that theorem), for some constant \(r>1\), we have

$$\begin{aligned} |\pi _*\rho _{2n}(o)-\frac{1}{m_X(X)} {\text {proj}}_*m_X(o)|=o(r^{-n}). \end{aligned}$$
(6.8)

On the other hand, in (6.8), the term \({\text {proj}}_*m_X(o)\) can be rewritten as:

$$\begin{aligned} {\text {proj}}_*m_X(o)=m_X({\text {proj}}^{-1}(o))=m_X(G_{\tilde{o}}\Gamma )=\frac{1}{|G_{\tilde{o}} \cap \Gamma |} m(G_{\tilde{o}}). \end{aligned}$$
(6.9)

Plugging (6.9) and (6.8) in (6.7) yields the desired statement.
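The sum in (6.7) is dominated by its last terms because the sphere cardinalities \(S_T(2n)\) grow geometrically. The sketch below computes these cardinalities for a base vertex of degree \(d_1\) and bounds the relative effect of an error of size \(r^{-n}\) in the n-th weight. The function names, the choice of a base vertex of degree \(d_1\), the normalization of the error with constant 1, and the value of r used in any call are assumptions for illustration; they are not the constants of Theorem E.

```python
def sphere_size(d1, d2, k):
    """Number of vertices at distance k from a base vertex of degree d1
    in the (d1, d2)-biregular tree."""
    if k == 0:
        return 1
    size = d1
    for dist in range(1, k):
        # Vertices at odd distance have degree d2, at even distance degree d1;
        # exactly one neighbor of each lies closer to the base vertex.
        size *= (d2 - 1) if dist % 2 == 1 else (d1 - 1)
    return size


def even_ball_size(d1, d2, R):
    """Sum of S_T(2n) over 0 <= n <= R."""
    return sum(sphere_size(d1, d2, 2 * n) for n in range(R + 1))


def relative_error_bound(d1, d2, r, R):
    """Upper bound for
        |sum_n S_T(2n) (c + eps_n) - c * sum_n S_T(2n)| / sum_n S_T(2n)
    when |eps_n| <= r**-n; r is a hypothetical rate, not the paper's."""
    weighted = sum(sphere_size(d1, d2, 2 * n) * r ** -n for n in range(R + 1))
    return weighted / even_ball_size(d1, d2, R)
```

Evaluating `relative_error_bound(3, 3, 2, R)` for increasing R shows the bound shrinking, which is the mechanism by which an exponentially small error in (6.8) leaves the main term of the count unaffected.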

To obtain the alternative expression of the main term \(\frac{m(G_{\tilde{o}})}{m_X(X)}\) given after the statement of Theorem F, observe first that, by unimodularity of \({\text {Aut}}(T)\), for any two vertices \(\tilde{v},\tilde{w} \in VT\) with \(2 | d(\tilde{v},\tilde{w})\), we have \(m(G_{\tilde{v}})=m(G_{\tilde{w}})\). Now fixing a lift \(\tilde{v}\) for every \(v \in VQ\) with \(2 | d(o,v)\) and an element \(g_v\) such that \(g_v \tilde{v}= \tilde{o}\), we have

$$\begin{aligned} m_X(X)= & {} \sum _{\underset{2| d(v,o)}{v \in VQ}} m_X({\text {proj}}^{-1}(v))=\sum _{\underset{2| d(v,o)}{v \in VQ}}m_X(G_{\tilde{o}}g_v \Gamma )=\sum _{\underset{2| d(v,o)}{v \in VQ}}m_X(G_{\tilde{v}}\Gamma )\nonumber \\= & {} \sum _{\underset{2| d(v,o)}{v \in VQ}} \frac{1}{|\Gamma \cap G_{\tilde{v}}|}m(G_{\tilde{v}})=m(G_{\tilde{o}})\sum _{\underset{2| d(v,o)}{v \in VQ}} \frac{1}{|\Gamma \cap G_{\tilde{v}}|} \end{aligned}$$
(6.10)

and get that the main term \(\frac{m(G_{\tilde{o}})}{m_X(X)}\) is equal to \(\left( \sum _{\underset{2| d(v,o)}{v \in VQ}} \frac{1}{|\Gamma \cap G_{\tilde{v}}|}\right) ^{-1}\).
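As a small worked illustration of this last expression, the following sketch evaluates \(\left( \sum _v 1/|\Gamma \cap G_{\tilde{v}}|\right) ^{-1}\) exactly for a finite list of stabilizer orders. The function name and the sample orders are hypothetical and do not come from any lattice discussed in this paper. When VQ is infinite, a finite list only gives a partial sum, so the returned value is an upper bound for the main term.

```python
from fractions import Fraction


def main_term(stabilizer_orders):
    """Return (sum over v of 1/|Gamma_v|)^(-1) as an exact fraction.

    stabilizer_orders: finite list of the orders |Gamma ∩ G_v| for the
    vertices v of VQ at even distance from o (hypothetical input data).
    """
    total = sum(Fraction(1, order) for order in stabilizer_orders)
    return 1 / total


# Hypothetical orders growing geometrically along a ray: 2, 6, 18, ...
sample_orders = [2 * 3 ** k for k in range(10)]
sample_value = main_term(sample_orders)
```

With geometrically growing orders, as in `sample_orders`, the partial sums converge quickly, so a short list already approximates the reciprocal of the full series well.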