Statistics in conjugacy classes in free groups

In this paper, we establish statistical results for a convex co-compact action of a free group on a CAT($-1$) space where we restrict to a non-trivial conjugacy class in the group. In particular, we obtain a central limit theorem where the variance is twice the variance that appears when we do not make this restriction.


Introduction and results
Let $\Gamma$ be a free group on $p \ge 2$ generators acting convex co-compactly on a CAT($-1$) space $(X, d)$ (i.e. the quotient of the intersection of $X$ and the convex hull of the limit set of $\Gamma$ is compact). There has been considerable work on understanding the statistics of such an action. For example, the following result (a particular case of the Švarc-Milnor lemma) is well known. Fix a free generating set $\mathcal{A} = \{a_1, \ldots, a_p\}$ and let $|\cdot|$ denote word length on $\Gamma$ with respect to $\mathcal{A}$. Then, for an arbitrary base point $o \in X$, there exist constants $C_1, C_2 > 0$ such that
$$C_1 |x| \le d(o, xo) \le C_2 |x| \tag{1.1}$$
for all $x \in \Gamma$. Thus $|x|$ and $d(o, xo)$ are comparable quantities and it is natural to ask if more precise estimates hold, at least typically or on average. One such result is the following. Write $\Gamma_n := \{x \in \Gamma : |x| = n\}$. Then the averages
$$\frac{1}{\#\Gamma_n} \sum_{x \in \Gamma_n} \frac{d(o, xo)}{n} \tag{1.2}$$
converge to some $\lambda > 0$, as $n \to \infty$ [14], [15], where the positivity follows immediately from the lower bound in (1.1). (See Remark 1.3(i) below for a further discussion.) Furthermore, subject to a mild non-degeneracy condition, namely that the set $\{d(o, xo) - \lambda|x| : x \in \Gamma\}$ is not bounded, the distribution of $(d(o, xo) - \lambda n)/\sqrt{n}$ with respect to the normalised counting measure on $\Gamma_n$ converges to a normal distribution $N(0, \sigma^2)$, as $n \to \infty$, for some finite $\sigma^2 > 0$.
In this paper, we shall consider the corresponding questions when we restrict our group elements to a non-trivial conjugacy class. Let $C$ be a non-trivial conjugacy class in $\Gamma$ and let $k = \min\{|x| : x \in C\}$. Let $C_n = \{x \in C : |x| = n\}$ and note that $C_n$ is non-empty if and only if $n = k + 2m$, $m \in \mathbb{Z}^+$. Subject to an additional condition, we also have a central limit theorem.
Theorem 1.2. Suppose that the set $\{d(o, xo) - \lambda|x| : x \in \Gamma\}$ is not bounded. Then the distribution of $(d(o, xo) - \lambda(k + 2m))/\sqrt{k + 2m}$ with respect to the normalised counting measure on $C_{k+2m}$ converges to a normal distribution $N(0, 2\sigma^2)$, as $m \to \infty$.
A noteworthy feature of this result is that the variance is twice the variance that appears in the unrestricted case. Theorems 1.1 and 1.2 will follow from more general results proved below.
Remark 1.3. (i) The convergence of the averages in (1.2) follows from Proposition 8 of [14]. The results in [14] are proved for co-compact groups of isometries of real hyperbolic space but go over to co-compact groups of isometries of CAT($-1$) spaces by the arguments of [15]. Some explanation may be in order here. The paper [15] is written in the context of compact manifolds (possibly with boundary) with variable negative curvature. In our situation, $X$ corresponds to the universal cover of $M$ and $\Gamma$ to the fundamental group, acting as isometries on $X$. Given a point $p \in M$ and a non-identity element $x \in \Gamma$ (thought of as $\pi_1(M, p)$), the number $l(x)$ is defined to be the length of the shortest geodesic arc from $p$ to itself in the homotopy class determined by $x$. This can be reinterpreted as the number $d(o, xo)$, where $o$ is a lift of $p$ to $X$, returning us to our original setting. Although the results of [15] are stated for manifolds of negative curvature, the arguments used there, in particular the key Lemma 1, only require that $X$ be a CAT($-1$) space. A consequence of this lemma is that $d(o, xo)$ can be written as the Birkhoff sum of a Hölder continuous function on an associated subshift of finite type (Proposition 3 of [15]); this shows that $d(o, xo)$ satisfies the assumption (A1) in the next section. (Of course, the assumption (A2) below is trivially satisfied.) The existence of a limit in (1.2) continues to hold if $\Gamma$ is a word hyperbolic group, following an observation of Calegari and Fujiwara [1] that uses a result of Coornaert [3].
(ii) The number $\lambda > 0$ may also be characterised in the following way.
Let $\Sigma$ be the space of infinite reduced words on $\mathcal{A} \cup \mathcal{A}^{-1}$ and let $\mu_0$ be the measure of maximal entropy for the shift map $\sigma : \Sigma \to \Sigma$; these objects are defined in section 2. Then,
Then $\ell(x)$ and $\|x\|$ are positive and depend only on the conjugacy class of $x$, so we may write $\ell(C)$ and $\|C\|$. Furthermore, $\ell(C)$ is the length of the closed geodesic on the quotient $\Gamma \backslash X$ in the free homotopy class determined by $C$. If the set $S = \{d(o, xo) - \lambda|x| : x \in \Gamma\}$ were bounded then we would have $\ell(C) = \lambda \|C\|$ for all non-trivial conjugacy classes $C$. In particular, the length spectrum of $\Gamma \backslash X$, i.e. the set of lengths of closed geodesics, would be contained in the set $\lambda\mathbb{Z}$. However, it is known that the length spectrum is not contained in a discrete subgroup of the reals when $X$ is the real hyperbolic space $\mathbb{H}^k$, $k \ge 2$, or when $X$ is a simply connected surface of pinched variable negative curvature [4], so the hypothesis holds in these cases. More generally, though the hypothesis may fail in particular cases, it will typically hold. For example, if $X$ is a metric tree with quotient metric graph $\Gamma \backslash X$ then, to ensure the hypothesis is satisfied, one only requires that $\Gamma \backslash X$ has two closed paths whose lengths have irrational ratio.
(v) The above results still hold if d(o, xo) is replaced by a Hölder length function L(x) as defined in [7].
We end the introduction by outlining the contents of the paper. In section 2 we discuss the relationship between free groups and subshifts of finite type and state more general versions of Theorems 1.1 and 1.2. In section 3 we introduce the transfer operators that we use for our analysis and discuss some of their properties. In section 4 we introduce a generating function $\eta_C(z, s)$ related to the conjugacy class $C$, where $z$ and $s$ are complex variables. In the geometric setting considered above, this generating function takes the form
$$\eta_C(z, s) = \sum_{m=0}^{\infty} z^{k+2m} \sum_{x \in C_{k+2m}} e^{s\,d(o, xo)}.$$
In particular, the variable $z$ is associated to the word length and the variable $s$ to the geometric length (or to a more general weighting below). This generating function is perhaps the main innovation of the paper, though its analysis is inspired by work on a somewhat similar function in [9]. This allows us to prove our first main result. We conclude the paper in section 5 by proving a central limit theorem over a non-trivial conjugacy class. The results in this paper form part of the first author's PhD thesis at the University of Warwick.

Free groups and subshifts
As above, let $\Gamma$ be a free group with free generating set $\mathcal{A} = \{a_1, \ldots, a_p\}$, $p \ge 2$.
Every non-identity element $x \in \Gamma$ has a unique representation as a reduced word $x = x_0 x_1 \cdots x_{n-1}$ and we define the word length $|x|$ of $x$ by $|x| = n$. We associate to the identity element the empty word and set $|1| = 0$. Let $\Gamma_n = \{x \in \Gamma : |x| = n\}$.
Let $C$ be a non-trivial conjugacy class in $\Gamma$ and let $k = \inf\{|x| : x \in C\} > 0$. The set of elements with shortest word length in the conjugacy class is precisely the set of elements with cyclically reduced word representations. In fact, if $g = g_1 \cdots g_k \in C$ is cyclically reduced then all cyclically reduced words in $C$ are given by cyclic permutations of the letters in $g_1 \cdots g_k$. Let $C_n = \{x \in C : |x| = n\}$ and note that $C_n$ is non-empty if and only if $n = k + 2m$ for some $m \in \mathbb{Z}^+$. If $x \in C_{k+2m}$ then its reduced word representation is of the form
$$w_m^{-1} \cdots w_1^{-1} g_1 \cdots g_k w_1 \cdots w_m,$$
for some cyclically reduced $g = g_1 \cdots g_k \in C_k$ and $w = w_1 \cdots w_m \in \Gamma_m$ with $w_1 \neq g_1, g_k^{-1}$. Hence it is convenient to introduce the notation $\Gamma_m(g) = \{w \in \Gamma_m : w_1 \neq g_1, g_k^{-1}\}$. A simple calculation shows that the number of elements in $C_{k+2m}$ is given by
$$\#C_{k+2m} = \#C_k \, (2p-2)(2p-1)^{m-1}, \quad m \ge 1.$$
We associate to the free group $\Gamma$ a dynamical system called a subshift of finite type. This subshift of finite type is formed from the space of infinite reduced words (with the obvious definition) adjoined to the elements of $\Gamma$, together with the dynamics given by the action of the shift map. It will be convenient to describe this space by means of a transition matrix. Define a $2p \times 2p$ matrix $A$, with rows and columns indexed by $\mathcal{A} \cup \mathcal{A}^{-1}$ (where $\mathcal{A}$ is the free generating set), by $A(s, t) = 0$ if $t = s^{-1}$ and $A(s, t) = 1$ otherwise. We then define
$$\Sigma = \{(x_n)_{n=0}^{\infty} \in (\mathcal{A} \cup \mathcal{A}^{-1})^{\mathbb{Z}^+} : A(x_n, x_{n+1}) = 1 \text{ for all } n \in \mathbb{Z}^+\}$$
and the shift map $\sigma : \Sigma \to \Sigma$ by $(\sigma x)_n = x_{n+1}$. We give $\mathcal{A} \cup \mathcal{A}^{-1}$ the discrete topology, $(\mathcal{A} \cup \mathcal{A}^{-1})^{\mathbb{Z}^+}$ the product topology and $\Sigma$ the subspace topology; then $\sigma$ is continuous. Since the matrix $A$ is aperiodic (i.e. there exists $n \ge 1$ such that for each pair of indices $(s, t)$, $A^n(s, t) > 0$), $\sigma : \Sigma \to \Sigma$ is mixing (i.e. for every pair of non-empty open sets $U, V \subset \Sigma$ there is an $N \in \mathbb{Z}^+$ such that $\sigma^{-n}(U) \cap V \neq \emptyset$ for all $n \ge N$).
We augment $\Sigma$ by defining $\Sigma^* = \Sigma \cup \Gamma$, where the elements of $\Gamma$ are identified with finite reduced words in the obvious way. The shift map naturally extends to a map $\sigma : \Sigma^* \to \Sigma^*$, where, for the finite reduced word $x_0 x_1 \cdots x_{n-1} \in \Gamma$, we set $\sigma(x_0 x_1 \cdots x_{n-1}) = x_1 \cdots x_{n-1}$, and for the empty word $\sigma 1 = 1$.
It is sometimes useful to think of an element of Γ as an infinite sequence ending in an infinite string of 1s.
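The description of $C_{k+2m}$ above can be checked by brute force for small word lengths. The following is a minimal sketch, assuming $p = 2$ and an ad hoc encoding in which the letters `a`, `b` are the generators and `A`, `B` their inverses; the function names are hypothetical and chosen only for this illustration. It uses the fact recalled above that two reduced words are conjugate exactly when their cyclic reductions are cyclic permutations of one another, and compares the counts with $\#C_k(2p-2)(2p-1)^{m-1}$, which is what one obtains by counting the admissible words $w \in \Gamma_m(g)$ letter by letter.

```python
LETTERS = "aAbB"  # assumed encoding: p = 2, upper case is the inverse letter


def inverse(letter):
    return letter.swapcase()


def reduced_words(n):
    """All reduced words of length n, i.e. the set Gamma_n."""
    words = [""]
    for _ in range(n):
        # A letter may be appended unless it cancels the last letter.
        words = [w + c for w in words for c in LETTERS
                 if not w or w[-1] != inverse(c)]
    return words


def conjugacy_key(word):
    """Least rotation of the cyclic reduction: a conjugacy invariant."""
    while len(word) >= 2 and word[0] == inverse(word[-1]):
        word = word[1:-1]
    return min(word[i:] + word[:i] for i in range(len(word))) if word else ""


def class_size(g, n):
    """Number of elements of word length n in the conjugacy class of g."""
    key = conjugacy_key(g)
    return sum(1 for w in reduced_words(n) if conjugacy_key(w) == key)


def predicted_size(g, m):
    """#C_k (2p-2)(2p-1)^(m-1) for a cyclically reduced word g of length k."""
    p = len(LETTERS) // 2
    shortest = len({g[i:] + g[:i] for i in range(len(g))})  # this is #C_k
    return shortest if m == 0 else shortest * (2 * p - 2) * (2 * p - 1) ** (m - 1)


if __name__ == "__main__":
    for m in range(4):
        print(m, class_size("ab", 2 + 2 * m), predicted_size("ab", m))
```

The enumeration grows like $(2p-1)^n$, so it is only practical for short words; its purpose is to make the parity condition $n = k + 2m$ and the exponential growth rate visible.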
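We endow $\Sigma^*$ with the following metric, consistent with the topology on $\Sigma$. Fix $0 < \theta < 1$. For $x \neq y$, set $d_\theta(x, y) = \theta^{n(x, y)}$, where $n(x, y) = \min\{n \in \mathbb{Z}^+ : x_n \neq y_n\}$, and set $d_\theta(x, x) = 0$; here, for a finite word $x = x_0 \cdots x_{m-1} \in \Gamma_m$,
we take $x_n = 1$ (the empty symbol) for each $n \ge m$. Then $\sigma : \Sigma^* \to \Sigma^*$ is continuous and $\Gamma$ is a dense subset of $\Sigma^*$.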
We will write $\mathcal{M}$ for the set of $\sigma$-invariant Borel probability measures on $\Sigma$. For $\nu \in \mathcal{M}$, we write $h(\nu)$ for its entropy. We define the pressure of a continuous function $f : \Sigma \to \mathbb{R}$ by
$$P(f) = \sup_{\nu \in \mathcal{M}} \left( h(\nu) + \int f \, d\nu \right).$$
If $f$ is Hölder continuous then the supremum is attained at a unique $\mu_f \in \mathcal{M}$, called the equilibrium state of $f$. (If $f : \Sigma^* \to \mathbb{R}$ then we write $P(f) := P(f|_\Sigma)$.) The equilibrium state of zero, $\mu_0$, is also called the measure of maximal entropy and $P(0)$ is equal to the topological entropy $h$ of $\sigma : \Sigma \to \Sigma$. It is easy to calculate that $h = \log(2p - 1)$ (the logarithm of the largest eigenvalue of $A$) and that $\mu_0$ is characterised by
$$\mu_0([x_0, x_1, \ldots, x_{n-1}]) = \frac{1}{2p(2p-1)^{n-1}},$$
where $[x_0, x_1, \ldots, x_{n-1}]$ denotes the set of sequences beginning with the reduced word $x_0 x_1 \cdots x_{n-1}$, $n \ge 1$. (Technically, this defines $\mu_0$ as a measure on $\Sigma^*$ with support equal to $\Sigma$.)
Two Hölder continuous functions $f, g : \Sigma^* \to \mathbb{R}$ are cohomologous if there exists a continuous function $u : \Sigma^* \to \mathbb{R}$ such that $f = g + u \circ \sigma - u$. Two Hölder continuous functions have the same equilibrium state if and only if they differ by the sum of a coboundary and a constant. A function $f : \Sigma^* \to \mathbb{R}$ is locally constant if there exists $n \ge 1$ such that for all pairs $x, y \in \Sigma^*$ with $x_k = y_k$ for $0 \le k \le n$, $f(x) = f(y)$. Locally constant functions are automatically Hölder continuous for any choice of Hölder exponent. For a function $f : \Sigma^* \to \mathbb{R}$ we denote by $f^n(x)$ the Birkhoff sum
$$f^n(x) = f(x) + f(\sigma x) + \cdots + f(\sigma^{n-1} x).$$
We have the following result [12], [18].
Furthermore, $\sigma_f^2 = 0$ if and only if $f$ is cohomologous to a constant. For convenience, in the work that follows we shall interchangeably refer to elements $x \in \Gamma$ and the associated element of the sequence space $x \in \Sigma^*$. We now state the technical result from which Theorem 1.1 follows. We consider functions $F : \Gamma \to \mathbb{R}$ which satisfy the following two assumptions.
(A1) There exists a Hölder continuous function $f : \Sigma^* \to \mathbb{R}$ such that $F(x) = f^n(x)$ for each $x \in \Gamma_n$ with $n \ge 0$, and
(A2) $F(x) = F(x^{-1})$.
We will prove the following.
Theorem 2.2. Suppose that $F : \Gamma \to \mathbb{R}$ satisfies assumptions (A1) and (A2). There exists $\overline{F} \in \mathbb{R}$ such that
$$\lim_{m \to \infty} \frac{1}{\#C_{k+2m}} \sum_{x \in C_{k+2m}} \frac{F(x)}{k + 2m} = \overline{F}.$$
We remark that, without the restriction to a conjugacy class, the analogous result holds subject only to (A1). This follows from the analysis in [14] or from a large deviations argument following the ideas of Kifer [10] as employed in [13]. We also establish a central limit theorem for the group elements in $\Gamma$ restricted to a non-trivial conjugacy class. In addition to assumptions (A1) and (A2), we require a third assumption.
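(A3) $F(\cdot) - \overline{F}|\cdot|$ is unbounded as a function from $\Gamma$ to $\mathbb{R}$.
Lemma 2.3. Suppose that $F$ satisfies (A1). Then $F$ satisfies (A3) if and only if $f|_\Sigma$ is not cohomologous to a constant.
Proof. For simplicity, we will write $f|_\Sigma = f$. If $F(\cdot) - \overline{F}|\cdot|$ is bounded then $\{f^n(x) - n \int f \, d\mu_0 : x \in \Gamma_n, n \ge 1\}$ is a bounded set. Since $f$ is Hölder continuous, this implies that $\{f^n(x) - n \int f \, d\mu_0 : x \in \Sigma, n \ge 1\}$ is also bounded. In particular, $(f^n - n \int f \, d\mu_0)^2 / n$ converges uniformly to zero and it is easy to deduce that $\sigma_f^2 = 0$. Therefore, by Proposition 2.1, $f$ is cohomologous to a constant.
On the other hand, if $f$ is cohomologous to a constant then, again by Hölder continuity, $\{F(x) - \overline{F}|x| : x \in \Gamma\} = \{f^n(x) - n \int f \, d\mu_0 : x \in \Gamma_n, n \ge 1\}$ is bounded.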
It is a well-known result that if $f : \Sigma \to \mathbb{R}$ is not cohomologous to a constant then the process $f \circ \sigma^n$, $n \ge 1$, satisfies a central limit theorem with respect to $\mu_0$ with variance $\sigma_f^2$, i.e., that $(f^n - n \int f \, d\mu_0)/\sqrt{n}$ converges in distribution to a normal random variable with mean zero and variance $\sigma_f^2 > 0$ or, explicitly, that for $a \in \mathbb{R}$,
$$\lim_{n \to \infty} \mu_0 \left\{ x \in \Sigma : \frac{f^n(x) - n \int f \, d\mu_0}{\sqrt{n}} \le a \right\} = \frac{1}{\sqrt{2\pi}\,\sigma_f} \int_{-\infty}^{a} e^{-u^2/2\sigma_f^2} \, du$$
[2]. Furthermore, analogues of this hold for the periodic points of $\sigma : \Sigma \to \Sigma$ [2] and, by adapting the proof, for pre-images of a given point. This gives a central limit theorem for $F$ over $\Gamma_n$ (without the assumption (A2)). Particular cases of this have appeared in articles by Rivin [17] for homomorphisms, and Horsham and Sharp [7] (see also [6]) for quasimorphisms. Calegari and Fujiwara [1] prove a central limit theorem for quasimorphisms on Gromov hyperbolic groups, but have more restrictions on the regularity of the quasimorphism. Restricting to a non-trivial conjugacy class, we have the following theorem.
Theorem 2.4. Suppose that $F : \Gamma \to \mathbb{R}$ satisfies assumptions (A1), (A2) and (A3). Then the sequence of distribution functions of $(F(x) - (k + 2m)\overline{F})/\sqrt{k + 2m}$, with respect to the normalised counting measure on $C_{k+2m}$, converges to the distribution function of a normal random variable with mean $0$ and positive variance $2\sigma_f^2$, as $m \to \infty$.
We note that the limiting distribution function is independent of the choice of non-trivial conjugacy class. Further, it is interesting that the variance in Theorem 2.4 is twice the variance obtained when we do not restrict elements $x \in \Gamma$ to a non-trivial conjugacy class.
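The doubling of the variance can be seen exactly in a simple special case. The sketch below is an illustration under assumptions that are not taken from the paper: $p = 2$, and $F(x)$ is the sum of hypothetical letter weights $1, 1, 2, 2$ for $a, a^{-1}, b, b^{-1}$, so that (A1) holds with a locally constant $f$ and (A2) holds because inverse letters share a weight. For the class of $g = a$ (so $k = 1$), every element of $C_{k+2m}$ is $w^{-1} a w$ with $w \in \Gamma_m(g)$, hence $F(w^{-1}aw) = F(a) + 2F(w)$ and the variance over $C_{k+2m}$ is four times that of $F$ over words of half the length. The code computes both variances exactly by dynamic programming over the last letter.

```python
from fractions import Fraction

# Assumed toy model: letters a, a^-1, b, b^-1 encoded as 0, 1, 2, 3.
# The weights are hypothetical; inverse letters share a weight so that (A2) holds.
WEIGHTS = [1, 1, 2, 2]


def inverse(letter):
    return letter ^ 1


def variance(n, first_letters):
    """Exact variance of F over reduced words of length n whose first
    letter lies in first_letters (uniform counting measure)."""
    # For each possible last letter: (count, sum of F, sum of F^2).
    state = [(0, 0, 0)] * len(WEIGHTS)
    for c in first_letters:
        w = WEIGHTS[c]
        state[c] = (1, w, w * w)
    for _ in range(n - 1):
        new_state = []
        for c, w in enumerate(WEIGHTS):
            cnt = s1 = s2 = 0
            for prev, (pc, p1, p2) in enumerate(state):
                if prev == inverse(c):
                    continue  # appending c would cancel the previous letter
                cnt += pc
                s1 += p1 + w * pc
                s2 += p2 + 2 * w * p1 + w * w * pc
            new_state.append((cnt, s1, s2))
        state = new_state
    cnt = sum(t[0] for t in state)
    mean = Fraction(sum(t[1] for t in state), cnt)
    return Fraction(sum(t[2] for t in state), cnt) - mean * mean


def variance_ratio(m):
    """Normalised variance over C_{1+2m} (class of a) divided by the
    normalised variance over Gamma_{1+2m}."""
    n = 1 + 2 * m
    unrestricted = variance(n, range(4)) / n
    # w in Gamma_m(g) must not start with a or a^-1; Var(F(a) + 2F(w)) = 4 Var(F(w)).
    restricted = 4 * variance(m, [2, 3]) / n
    return restricted / unrestricted


if __name__ == "__main__":
    for m in (10, 50, 200):
        print(m, float(variance_ratio(m)))
```

In this toy model the ratio approaches $2$ as $m$ grows, in line with the statement of Theorem 2.4; the general case requires the transfer operator analysis of the later sections, since $F(w^{-1}gw)$ is then only approximately $F(g) + 2F(w)$.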
Proof of Theorems 1.1 and 1.2. As in the introduction, let the free group $\Gamma$ act convex co-compactly on a CAT($-1$) space $(X, d)$. Then it was shown in [15] that $F(x) := d(o, xo)$ satisfies (A1). (In fact, the result in [15] is stated when $X$ is a simply connected manifold with bounded negative curvatures but the proof only requires the CAT($-1$) property.) Assumption (A2) is clearly satisfied. Therefore, Theorem 1.1 follows from Theorem 2.2. Furthermore, the additional assumption on $d(o, xo)$ in Theorem 1.2 matches (A3) and so Theorem 1.2 also follows.

Transfer operators
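In this section we recall results from the theory of transfer operators that will be used to deduce Theorem 2.2 and Theorem 2.4. Let $F_\theta(\Sigma, \mathbb{C})$ denote the space of $d_\theta$-Lipschitz functions $f : \Sigma \to \mathbb{C}$. This is a Banach space with respect to the norm
$$\|f\|_\theta = \|f\|_\infty + |f|_\theta, \qquad |f|_\theta = \sup_{x \neq y} \frac{|f(x) - f(y)|}{d_\theta(x, y)}.$$
Any Hölder continuous function becomes Lipschitz by changing the choice of $\theta$ (i.e. if $f$ has Hölder exponent $\alpha$ with respect to $d_\theta$ then $f \in F_{\theta^\alpha}(\Sigma, \mathbb{C})$), so there is no loss of generality in restricting to these spaces. Given $g \in F_\theta(\Sigma, \mathbb{C})$, the transfer operator $L_g : F_\theta(\Sigma, \mathbb{C}) \to F_\theta(\Sigma, \mathbb{C})$ is defined pointwise by
$$L_g \omega(x) = \sum_{\sigma y = x} e^{g(y)} \omega(y).$$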
Proposition 3.1 (Ruelle-Perron-Frobenius Theorem). Suppose that $g \in F_\theta(\Sigma, \mathbb{C})$ is real-valued. Then $L_g : F_\theta(\Sigma, \mathbb{C}) \to F_\theta(\Sigma, \mathbb{C})$ has a simple eigenvalue equal to $e^{P(g)}$, an associated strictly positive eigenfunction $\psi$ and an eigenmeasure $\nu$ (i.e. $L_g \psi = e^{P(g)} \psi$ and $L_g^* \nu = e^{P(g)} \nu$), normalised so that $\nu$ is a probability measure and $\int \psi \, d\nu = 1$. Furthermore, the rest of the spectrum of $L_g$ is contained in a disk of radius strictly smaller than $e^{P(g)}$.
The equilibrium state $\mu_g$ is given by $d\mu_g = \psi \, d\nu$. We say that $g$ is normalised if $L_g 1 = 1$ (which in particular implies $P(g) = 0$). If we replace $g$ by $g' = g - P(g) + u - u \circ \sigma$, where $u = \log \psi$, then $g'$ is normalised and $g$ and $g'$ have the same equilibrium state.
Suppose that $f, g \in F_\theta(\Sigma, \mathbb{C})$ are real-valued functions. We consider small perturbations of the operator $L_g$ of the form $L_{g+sf}$ for values of $s \in \mathbb{C}$ in a neighbourhood of the origin. Since $e^{P(g)}$ is a simple isolated eigenvalue of $L_g$, this eigenvalue persists for $s$ close to the origin, so that the operator $L_{g+sf}$ has a simple eigenvalue $\beta(s)$ and corresponding eigenfunction $\psi_s$ that vary analytically with $s$ and satisfy $\beta(0) = e^{P(g)}$ and $\psi_0 = \psi$ [8]. Furthermore, by the upper semicontinuity of the spectral radius, there exists $\varepsilon > 0$ such that, for $s$ close to the origin, the remainder of the spectrum of $L_{g+sf}$ lies in a disk of radius $e^{P(g) - \varepsilon}$. We extend the definition of pressure by setting $e^{P(g+sf)} = \beta(s)$.
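For a locally constant $f$ the perturbed eigenvalue can be computed from a finite matrix, which gives a concrete check of the relations $P(0) = h$, $P'(0) = \int f \, d\mu_0$ and $P''(0) = \sigma_f^2$ used later in the paper. The sketch below rests on assumptions made only for illustration: $p = 2$, $g = 0$, and $f(x)$ equal to a hypothetical weight ($1, 1, 2, 2$ for $a, a^{-1}, b, b^{-1}$) of the first letter $x_0$. On functions of the first letter the operator $L_{sf}$ acts as the matrix $M(i, j) = A(j, i) e^{s\,\mathrm{weight}(j)}$, whose dominant eigenvalue is found by power iteration; derivatives are approximated by central differences. For this example a direct calculation gives $\int f \, d\mu_0 = 3/2$ and $\sigma_f^2 = 1/8$.

```python
import math

# Assumed toy model: letters a, a^-1, b, b^-1 encoded as 0, 1, 2, 3,
# with hypothetical weights defining f(x) = weight(x_0).
WEIGHTS = [1.0, 1.0, 2.0, 2.0]


def pressure(s, iterations=200):
    """P(sf): log of the dominant eigenvalue of the transfer operator
    restricted to functions of the first letter (real s only)."""
    n = len(WEIGHTS)
    v = [1.0] * n
    growth = 1.0
    for _ in range(iterations):
        # (L v)(i) sums over letters j that may precede i, i.e. j != i^-1.
        new_v = [sum(math.exp(s * WEIGHTS[j]) * v[j]
                     for j in range(n) if j != (i ^ 1))
                 for i in range(n)]
        total = sum(new_v)
        growth = total / sum(v)
        v = [x / total for x in new_v]
    return math.log(growth)


def pressure_derivatives(eps=1e-3):
    """Central-difference estimates of P'(0) and P''(0)."""
    p_minus, p_zero, p_plus = pressure(-eps), pressure(0.0), pressure(eps)
    first = (p_plus - p_minus) / (2 * eps)
    second = (p_plus - 2 * p_zero + p_minus) / (eps * eps)
    return first, second


if __name__ == "__main__":
    print(pressure(0.0), math.log(3))
    print(pressure_derivatives())
```

Power iteration is used here only because the dominant eigenvalue is simple and well separated from the rest of the spectrum, which is exactly the situation described by Proposition 3.1.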
We find it useful to consider $\sigma : \Sigma^* \to \Sigma^*$ as a subshift of finite type and will use the previous notation and concepts introduced for $\Sigma$ in this setting. We modify the definition of the transfer operator $L_{sf} : F_\theta(\Sigma^*, \mathbb{C}) \to F_\theta(\Sigma^*, \mathbb{C})$ as follows:
$$L_{sf} \omega(x) = \sum_{\substack{\sigma y = x \\ y \neq 1}} e^{sf(y)} \omega(y).$$
Here $1$ denotes the identity element in $\Gamma$, considered as an infinite word $(1, 1, \ldots)$. We note that the transfer operator we use differs from the usual definition by excluding the preimage $y = 1$ from the summation over the set $\{y \in \Sigma^* : \sigma y = x\}$; however, the definition of this transfer operator agrees with our previous definition for each $x \neq 1$. Following Lemma 2 of [14], $L_{sf} : F_\theta(\Sigma^*, \mathbb{C}) \to F_\theta(\Sigma^*, \mathbb{C})$ has the same isolated eigenvalues as $L_{sf} : F_\theta(\Sigma \cup \{1\}, \mathbb{C}) \to F_\theta(\Sigma \cup \{1\}, \mathbb{C})$. Since the modified definition of $L_{sf}$ excludes the eigenvalue $e^{sf(1)}$ associated to the eigenfunction $\chi_{\{1\}}$ (the indicator function of the set $\{1\}$), $L_{sf} : F_\theta(\Sigma^*, \mathbb{C}) \to F_\theta(\Sigma^*, \mathbb{C})$ therefore has the same isolated eigenvalues as $L_{sf} : F_\theta(\Sigma, \mathbb{C}) \to F_\theta(\Sigma, \mathbb{C})$. Furthermore, again by Lemma 2 of [14], $L_{sf} : F_\theta(\Sigma^*, \mathbb{C}) \to F_\theta(\Sigma^*, \mathbb{C})$ is quasi-compact with essential spectral radius at most $\theta e^{P(\mathrm{Re}(s) f)}$, and so it suffices to consider the spectral theory of $L_{sf}$ on $F_\theta(\Sigma, \mathbb{C})$.

Proof of Theorem 2.2
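In this section, we will prove Theorem 2.2. We introduce a generating function $\eta_C(z, s)$ of two complex variables given by
$$\eta_C(z, s) = \sum_{m=0}^{\infty} z^{k+2m} \sum_{x \in C_{k+2m}} e^{sF(x)}$$
(wherever the series converges). We prove the theorem by studying the asymptotic behaviour, as $m \to \infty$, of the coefficient of $z^{k+2m}$ in the power series
$$\frac{\partial}{\partial s} \eta_C(z, s) \Big|_{s=0} = \sum_{m=0}^{\infty} z^{k+2m} \sum_{x \in C_{k+2m}} F(x).$$
We will find the following bound useful in the proof of Theorem 2.2.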
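Lemma 4.1. Suppose that $f \in F_\theta(\Sigma^*, \mathbb{C})$, $g \in C_k$ and $w \in \Gamma_m(g)$. Then there exists a constant $K > 0$, independent of $m$, such that
$$\left| f^{k+2m}(w^{-1} g w) - f^m(w^{-1}) - f^k(g) - f^m(w) \right| \le K.$$
Writing $\kappa_w$ for the difference inside the modulus, we obtain the approximation
$$e^{s f^{k+2m}(w^{-1} g w)} = e^{s(f^m(w^{-1}) + f^k(g) + f^m(w))} \left( 1 + s\kappa_w + \xi_w(s) \right),$$
where $\kappa_w$ is uniformly bounded for $w \in \Gamma$ (by Lemma 4.1) and $\xi_w(s) = s^2 \zeta_w(s)$, with $\zeta_w(s)$ an entire function. By this approximation and assumption (A2), we have
$$\eta_C(z, s) = \sum_{m=0}^{\infty} z^{k+2m} \sum_{g \in C_k} e^{s f^k(g)} \sum_{w \in \Gamma_m(g)} e^{2s f^m(w)} + \delta(z, s),$$
where $\delta(z, s)$ collects the terms involving $\kappa_w$ and $\xi_w$. Let $\chi_g : \Sigma^* \to \mathbb{R}$ be the locally constant function given by $\chi_g(x) = 0$ if $x_0 \in \{g_1, g_k^{-1}\}$ and $\chi_g(x) = 1$ otherwise. We introduce the function $\chi_g$ in order to write $\eta_C(z, s)$ in terms of the transfer operator. We have
$$\eta_C(z, s) = \sum_{m=0}^{\infty} \sum_{g \in C_k} e^{s f^k(g)} z^{k+2m} (L_{2sf}^m \chi_g)(1) + \delta(z, s).$$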
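Thus the power series $\sum_{m=0}^{\infty} z^{k+2m} \sum_{x \in C_{k+2m}} F(x)$ can be written in terms of the transfer operator, since it is the derivative $\frac{\partial}{\partial s} \eta_C(z, s) \big|_{s=0}$. We analyse the growth of the coefficients of the power series in the following sequence of lemmas. The coefficient in the next lemma grows with the same order.
Proof. Since, for each $w \in \Gamma$, $\xi_w'(0) = 0$, the derivative $\frac{\partial}{\partial s} \delta(z, s) \big|_{s=0}$ involves only the terms $\kappa_w$. For each $w \in \Gamma$ we have $|\kappa_w| \le K$. Thus the coefficient of $z^{k+2m}$ is bounded in modulus by
$$\sum_{g \in C_k} \sum_{w \in \Gamma_m(g)} K = K \, \#C_{k+2m},$$
from which the lemma follows.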
We decompose the transfer operator $L_{sf}$ using the projection $R_s$ onto the eigenspace associated to the eigenvalue $e^{P(sf)}$ and the operator $Q_s = L_{sf} - e^{P(sf)} R_s$. For $s \in \mathbb{C}$ in a neighbourhood of $s = 0$, the operators $R_s$ and $Q_s$ are analytic. We use this operator decomposition to obtain the estimates in the next two lemmas.
Proof. Suppose that $s \in \mathbb{C}$ satisfies $|s| < \delta_1$. Then, as discussed in section 3, if $\delta_1$ is sufficiently small each perturbed operator $L_{2sf}$ has a simple maximal eigenvalue $e^{P(2sf)}$. Moreover, for $|s| < \delta_1$, there exists $\varepsilon_1(\delta_1) > 0$ such that
$$\limsup_{m \to \infty} \| Q_{2s}^m \|^{1/m} \le e^{h - \varepsilon_1}.$$
We consider the analyticity of the corresponding power series. Suppose that we fix $z \in \mathbb{C}$ such that $|z| < e^{-h + \varepsilon_1}$; then the series converges for each $s \in \mathbb{C}$ with $|s| < \delta_1$. Meanwhile, given $s \in \mathbb{C}$ such that $|s| < \delta_1$, the series converges for each $z \in \mathbb{C}$ with $|z| < e^{-h + \varepsilon_1}$. Thus, by Hartogs' theorem (Theorem 1.2.5, [11]), the series converges to an analytic function in the polydisk $\{s \in \mathbb{C} : |s| < \delta_1\} \times \{z \in \mathbb{C} : |z| < e^{-h + \varepsilon_1}\}$. Thus the power series is analytic for $|z| < e^{-h + \varepsilon_1}$ and so we estimate the coefficients of the power series by $O(e^{m(h - \varepsilon)})$ with $0 < \varepsilon < \varepsilon_1$.
There is one power series left to study.
from which the result follows.
Combining the above lemmas, we find that the coefficient of $z^{k+2m}$ in $\frac{\partial}{\partial s} \eta_C(z, s) \big|_{s=0}$ satisfies the estimate
$$\sum_{g \in C_k} 2m e^{mh} P'(0) R_0 \chi_g(1) + O(e^{mh}).$$
Returning to Theorem 2.2 we now have Thus we have .
If we substitute f : Σ * → R given by f (x) = 1 for each x ∈ Σ * into the preceding limit we obtain Hence we have the desired result,

Proof of Theorem 2.4
In this section we will prove Theorem 2.4. By Lévy's Continuity Theorem (cf. Theorem 2, Chapter XV §3, [5]), the theorem will follow if we show that the characteristic functions converge pointwise to $e^{-\sigma_f^2 t^2}$, the characteristic function of the normal distribution with mean zero and variance $2\sigma_f^2$. Suppose that $F$ satisfies (A1), (A2) and (A3). By replacing $F$ with $F - \overline{F}|\cdot|$ (which still satisfies the three assumptions) or, equivalently, $f$ with $f - \int f \, d\mu_0$, we may assume without loss of generality that $\int f \, d\mu_0 = 0$. This reduction does not change the variance. We may then write the characteristic function as
$$\varphi_m(t) = \frac{1}{\#C_{k+2m}} \sum_{x \in C_{k+2m}} e^{itF(x)/\sqrt{k+2m}}.$$
We recall the approximation obtained from Lemma 4.1, in which $\kappa_w$ is uniformly bounded for $w \in \Gamma$ and $\xi_w(s)$ is an entire function such that $\xi_w(0) = 0$. Using this approximation, we write $\varphi_m(t)$ as the sum of a leading term and an error term $\rho_m(t)$, where $\tau = 2it/\sqrt{k + 2m}$. Since the bound on $\kappa_w$ is uniform and $\xi_w(0) = 0$, we find that $\rho_m(t) \to 0$ as $m \to \infty$. We rewrite the leading term using the transfer operator as
$$\frac{1}{\#C_{k+2m}} \sum_{g \in C_k} e^{\tau f^k(g)/2} L_{\tau f}^m \chi_g(1).$$
For sufficiently large $m$, the simple maximal eigenvalue $e^{P(\tau f)}$ of the perturbed operator $L_{\tau f}$ persists and plays a crucial role in determining the limit of $\varphi_m(t)$ as $m \to \infty$. Before we establish the limit, we first analyse the pressure function and establish a preliminary limit for $e^{m(P(\tau f) - h)}$ as $m \to \infty$.
Recall that the pressure function $P(sf)$ (defined as the principal branch of the logarithm of $e^{P(sf)}$) is analytic in a neighbourhood of $s = 0$ and that $P'(0) = \int f \, d\mu_0 = 0$. By analyticity we can choose $\delta > 0$ such that if $|s| < \delta$ then $P(2sf) = h + 2\sigma_f^2 s^2 + s^3 \vartheta(s)$, for some function $\vartheta(s)$ that is analytic in a neighbourhood of $s = 0$. For sufficiently large $m$, with $\tau = 2it/\sqrt{k + 2m}$ as before (so that $s = \tau/2$), we have
$$P(\tau f) - h = -\frac{2\sigma_f^2 t^2}{k + 2m} + O\left( (k + 2m)^{-3/2} \right),$$
so that $e^{m(P(\tau f) - h)} \to e^{-\sigma_f^2 t^2}$ as $m \to \infty$, from which the next proposition and corollary follow.
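The preliminary limit for $e^{m(P(\tau f) - h)}$ can be observed numerically in a toy model. The sketch below is an illustration under assumptions not taken from the paper: $p = 2$, $k = 1$, and $f(x)$ depends only on the first letter through hypothetical centred weights $-\tfrac12, -\tfrac12, \tfrac12, \tfrac12$ (so that $\int f \, d\mu_0 = 0$), for which a hand computation gives $\sigma_f^2 = 1/8$. The eigenvalue $\beta(\tau) = e^{P(\tau f)}$ is obtained by power iteration with complex arithmetic, which converges here because for small $|\tau|$ the perturbed eigenvalue remains dominant.

```python
import cmath
import math

# Assumed toy model: letters a, a^-1, b, b^-1 encoded as 0, 1, 2, 3;
# hypothetical letter weights 1, 1, 2, 2 minus their mean 3/2.
CENTRED = [-0.5, -0.5, 0.5, 0.5]
SIGMA_SQUARED = 0.125  # value of sigma_f^2 computed by hand for this model


def dominant_eigenvalue(s, iterations=200):
    """beta(s): dominant eigenvalue of the transfer operator L_{sf}
    restricted to functions of the first letter (complex s allowed)."""
    n = len(CENTRED)
    v = [1.0 + 0.0j] * n
    beta = 0.0
    for _ in range(iterations):
        # Sum over letters j that may precede i, i.e. j != i^-1.
        new_v = [sum(cmath.exp(s * CENTRED[j]) * v[j]
                     for j in range(n) if j != (i ^ 1))
                 for i in range(n)]
        total = sum(new_v)
        beta = total / sum(v)
        v = [x / total for x in new_v]
    return beta


def pressure_factor(t, m, k=1):
    """exp(m (P(tau f) - h)) with tau = 2it / sqrt(k + 2m)."""
    tau = 2j * t / math.sqrt(k + 2 * m)
    return (dominant_eigenvalue(tau) / dominant_eigenvalue(0.0)) ** m


if __name__ == "__main__":
    for m in (10, 100, 10000):
        print(m, abs(pressure_factor(1.0, m)), math.exp(-SIGMA_SQUARED))
```

As $m$ increases the computed factor approaches $e^{-\sigma_f^2 t^2}$, the characteristic function of a normal distribution with variance $2\sigma_f^2$.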
We use the notation $\beta(\tau) = e^{P(\tau f)}$ and $\beta(0) = e^h$ in the proof of Proposition 5.3.
Proposition 5.3. The limit of $\varphi_m(t)$ as $m \to \infty$ is $e^{-\sigma_f^2 t^2}$.
Proof. Written in terms of the transfer operator and a null sequence $(\rho_m(t))_{m=0}^{\infty}$, the function $\varphi_m(t)$ is equal to
$$\frac{1}{\#C_{k+2m}} \sum_{g \in C_k} e^{\tau f^k(g)/2} L_{\tau f}^m \chi_g(1) + \rho_m(t).$$
We recall the decomposition of the transfer operator into $L_{sf} = \beta(s) R_s + Q_s$. For sufficiently large $m$, the leading term is given by
$$\frac{\beta(\tau)^m}{\#C_{k+2m}} \sum_{g \in C_k} e^{\tau f^k(g)/2} R_\tau \chi_g(1) + \frac{1}{\#C_{k+2m}} \sum_{g \in C_k} e^{\tau f^k(g)/2} Q_\tau^m \chi_g(1).$$
We now turn our attention to the asymptotics for the term
$$\frac{\beta(\tau)^m}{\#C_{k+2m}} \sum_{g \in C_k} e^{\tau f^k(g)/2} R_\tau \chi_g(1).$$
We recall the limit
$$\lim_{m \to \infty} \frac{\#C_{k+2m}}{\beta(0)^m} = \sum_{g \in C_k} R_0 \chi_g(1)$$
and so, together with the above approximation, we find that the limit of $\varphi_m(t)$ as $m \to \infty$ is given by
$$\lim_{m \to \infty} \left( \frac{\beta(\tau)}{\beta(0)} \right)^m = \lim_{m \to \infty} e^{m(P(\tau f) - h)} = e^{-\sigma_f^2 t^2},$$
which is the desired result.