1 Introduction

We consider several asymptotic enumeration and analytic problems for sparse random regular graphs and their adjacency matrices. A graph is called regular if every vertex has the same degree; a sparse regular graph is typically one for which the degree \(d\) is either constant or of a far smaller order than the number of vertices \(n\). A classical model is the uniform distribution over all \(d\)-regular graphs on \(n\) labeled vertices; a thorough survey on properties of the uniform model can be found in [49].

Our model of choice is the more recent permutation model: Consider \(d\) iid uniformly random permutations \(\{ \pi _1, \ldots , \pi _d \}\) of the \(n\) vertices labeled \(\{1,2,\ldots ,n\}\). A graph can be constructed by adding one edge between each pair \((i, \pi _j(i))\); thus every vertex \(i\) has edges to \(\pi _j(i)\) and \(\pi ^{-1}_j(i)\) for every permutation \(\pi _j\), for a total degree of \(2d\). As the reader will note, this allows multiple edges and self-loops, with each self-loop contributing two to the degree of its vertex. However, one can still ask the usual enumeration questions about this graph, e.g., the distribution of the number of cycles.
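To make the construction concrete, here is a minimal illustrative sketch in Python (the function names are ours and vertices are indexed from \(0\)) that samples a graph from the permutation model and checks the degrees.

```python
import random
from collections import Counter

def sample_permutation_model(n, d, rng=random):
    """Return the edge list of a 2d-regular multigraph on {0, ..., n-1} sampled
    from the permutation model: one edge (i, pi_j(i)) per vertex i and per
    permutation pi_j; self-loops and multiple edges are allowed."""
    edges = []
    for _ in range(d):
        pi = list(range(n))
        rng.shuffle(pi)                      # a uniformly random permutation
        edges.extend((i, pi[i]) for i in range(n))
    return edges

def degrees(n, edges):
    """Degree sequence, with each self-loop contributing two to its vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1                          # if u == v this adds 2, as required
    return [deg[i] for i in range(n)]

edges = sample_permutation_model(n=10, d=3)
assert all(deg == 6 for deg in degrees(10, edges))   # every vertex has degree 2d = 6
```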

Another way to represent this graph is by its adjacency matrix, which is an \(n\times n\) matrix whose \((i,j)\)th entry is the number of edges between \(i\) and \(j\), with self-loops counted twice. This random matrix can now be studied in its own right; for example, one can ask about the distribution of its eigenvalues. Note that—trivially—the top eigenvalue is \(2d\); the distribution of the rest of the eigenvalues is an interesting question. For the uniform model of random regular graphs (or Erdős-Rényi graphs) such questions have been studied since the pioneering work [37]. Among the more recent articles, see [22, 46], and [18]. We refer the reader to [18] for a more exhaustive review of the vast related literature.

Our results touch on both aspects. We consider two separate scenarios, either when \(d\) is independent of \(n\), or when \(d\) grows slowly with \(n\). We will assume throughout that \(d\ge 2\); the reason for this is that the \(d=1\) case has been dealt with (in a larger context) by [4].

The paper is divided into three thematically separate but mathematically dependent parts.

(i)

    Section 3: Joint asymptotic distribution of a growing sequence of short cycles. It is well known in the classical models of random regular graphs that the number of cycles of length \(k\), where \(k\) is small (typically logarithmic in \(n\)), is approximately Poisson. See [10] or [49] for an account of older results, or [39] for the best result in this direction. In Theorem 11, we prove this fact for the permutation model, using Stein’s method along with ideas from [34] to estimate the total variation distance between a vector of the number of cycles of lengths \(1\) to \(r\) and a corresponding vector of independent Poisson random variables. This theorem holds for nearly the same regime of \(r,\,d\), and \(n\) as in [39, Theorem 1], and unlike that theorem gives an explicit error bound on the approximation. This bound is essential to our analysis of eigenvalue statistics in Sect. 5. The mean number of cycles is somewhat interesting. When \(d\) is fixed, for the uniform model of random \(2d\)-regular graphs, the limiting mean of the number of short cycles of length \(k\) is \((2d-1)^k/2k\). For the permutation model, the limiting mean is the slightly different quantity \(a(d,k)/2k\), where

$$\begin{aligned} a(d,k)=\left\{ \begin{array}{l@{\quad }l} (2d-1)^{k} - 1 + 2d,&\text{when } k \text{ is even},\\ (2d-1)^{k} +1,&\text{when } k \text{ is odd}. \end{array}\right. \end{aligned}$$

    See also [32, Theorem 4.1], in which the authors consider a different model of random regular graph and find that the limiting mean number of cycles of length \(k\) differs slightly from both of these.

    Next we consider the number of short non-backtracking walks on the graph; a non-backtracking walk is a closed walk that never follows an edge and immediately retraces that same edge backwards. We actually consider cyclically non-backtracking walks (CNBWs), whose definition will be given in Sect. 3.2. Non-backtracking walks are important in both theory and practice as can be seen from the articles [1] and [25]. We consider the entire vector of cyclically non-backtracking walks of lengths \(1\) to \(r_n\), where \(r_n\) is the “boundary length” of short walks/cycles, and is growing to infinity with \(n\). In Theorem 21, we assume that \(d\) is independent of \(n\). We prove that the vector of CNBWs, as a random sequence in a weighted \(\ell ^2\) space, converges weakly to a limiting random sequence whose finite-dimensional distributions are linear sums of independent Poisson random variables.

    When \(d\) grows slowly with \(n\) (slower than any fixed power of \(n\), which is the same regime studied in [18]), a corresponding result is proved in Theorem 22. Here, we center the vector of CNBW for each \(n\). The resulting random sequence converges weakly to an infinite sequence of independent, centered normal random variables with unequal (\(\sigma _k^2=2k\)) variances.

(ii)

    Section 4: An estimate of  \(C\sqrt{2d-1}\) for the second largest (in absolute value) eigenvalue for any \((d,n)\). The spectral gap of the permutation model, for fixed \(d\), has been intensely studied recently in [25] for the resolution of the Alon conjecture. This conjecture states that the second largest eigenvalue of ‘most random regular graphs’ of degree \(2d\) is less than \(2\sqrt{2d-1} + \epsilon \); the assumption is that \(d\) is kept fixed while \(n\) grows to infinity. This important conjecture implies that ‘most’ sparse random regular graphs are nearly Ramanujan (see [35]). Friedman’s work builds on earlier work [11, 20], and [24]. Although [25] and related works consider the permutation model, for fixed \(d\), their results also apply to other models due to various contiguity results; see [49, Section 4] and [26].

To develop the precise second eigenvalue control that we require in Sect. 5, we have followed a line of reasoning that originates with Kahn and Szemerédi [21]. This approach has been used recently to great effect by [8, 22], and [36], to name a few. With this technique we are able to show that the second largest eigenvalue is bounded by \(40000\sqrt{2d-1}\) with probability at least \(1- Cn^{-1}\) for some universal constant \(C\) (see Theorem 24). We have not attempted to find an optimal constant, and instead we focus on extracting the \(d\) and \(n\) dependence in the bound. Both [8] and [36] provide examples of how the Kahn–Szemerédi argument can be used to control the second eigenvalue when \(d\) grows with \(n\). In [8], the authors work in the configuration model to obtain the \(O(\sqrt{d})\) bound for \(d = O(\sqrt{n}),\) essentially the largest \(d\) for which the configuration model represents the uniform \(d\)-regular graph well enough to prove eigenvalue concentration. In [36], the authors study the spectra of random covers. The permutation model is an example of such a cover, where the base graph is a single point with \(d\) self-loops. Using the Kahn–Szemerédi machinery, they are able to show an \(O(\sqrt{d} \log d)\) bound with \(d(n) = \text{ poly}(n)\) growth. The adaptations to the original Kahn–Szemerédi argument made in [36], especially the usage of Freedman’s martingale inequality, are similar to the ones made here. However, as we do not need to consider the geometry of the base graph, we are able to push this argument to prove a non-asymptotic bound of the correct order.

(iii)

    Section 5: Limiting distribution of linear eigenvalue statistics of the rescaled adjacency matrix. Our final section is in the spirit of Random Matrix Theory (RMT). Let \(A_n\) denote the adjacency matrix of a random regular graph on \(n\) vertices. By linear statistics of the spectrum we mean random variables of the type \(\sum _{i=1}^n f(\lambda _i)\), where \(\lambda _1 \ge \cdots \ge \lambda _n\) are the \(n\) eigenvalues of the symmetric matrix \((2d-1)^{-1/2}A_n\). We do this rescaling of \(A_n\) irrespective of whether \(d\) is fixed or growing so as to keep all but the first eigenvalue bounded with high probability.

    The limiting distribution of linear eigenvalue statistics for various RMT models such as the classical invariant ensembles or the Wigner/Wishart matrices has been (and continues to be) widely studied. For the sake of space, we give here only a brief (and therefore incomplete) list of methods and papers which study the subject. For a more in-depth review, we refer the reader to [2].

    The first, and still one of the most widely used methods of approach is the method of moments, introduced in [48], used in [29] and perfected in [43] for Wigner matrices (it also works for Wishart); this method is also used here in conjunction with other tools. Explicit moment calculations alongside Toeplitz determinants have also been used in determining the linear statistics of circular ensembles [17, 27, 44].

    Other methods include the Stieltjes transform method (also known as the method of resolvents), which was employed with much success in a series of papers of which we mention [12] and [33]; the (quite analytical) method of potentials, which works on a different class of random matrices including the Gaussian Wigner ones [28]; stochastic calculus [13]; and free probability [30]. Finally, a completely different set of techniques were explored in [15].

    Recently and notably, for a single permutation matrix, such a study has been approached in [47] and completed in [4]; our results share several features with the latter paper.

    A noteworthy aspect in all these is that when the function \(f\) is smooth enough (usually analytic), the variance of the random variables \(\sum _{i=1}^n f(\lambda _i)\) typically remains bounded. This is attributed to eigenvalue repulsion; see [5, Section 21.2.2] for further discussion. Even more interestingly, there is no process convergence of the cumulative distribution function. This can be guessed from the fact that when the function \(f\) is rough (e.g., the characteristic function of an interval), the variance of the linear statistics grows slowly with \(n\) (as seen for example in [16] and [42]). One major difference our models have with the classical ensembles is that our matrices are sparse; their sparsity affects the behavior of the limit.

    In Theorems 35 and 39 we prove limiting distributions of linear eigenvalue statistics. For fixed \(d\), the functions we cover are those that are analytically continuable to a large enough ellipse containing a compact interval of spectral support. When \(d\) grows we need functions that are slightly more smooth. Let \((T_k)_{k\in \mathbb N }\) be the Chebyshev polynomials of the first kind on a certain compact interval; since they constitute a basis for \(\mathbf{L}^2\) functions, any such function admits a decomposition in a Fourier-like series expressed in terms of the Chebyshev polynomials. The required smoothness is characterized in terms of how quickly the truncated series converges in the supremum norm to the actual function on the given interval. In Theorem 35, we consider \(d\) to be fixed. The limiting distribution of the linear eigenvalue statistics is a non-Gaussian infinitely divisible distribution. This is consistent with the results in [4]. Theorem 39 proves a Gaussian limit in the case of a slowly growing \(d\) after we have appropriately centered the random variables. This transition is expected. In [18] the authors consider the uniform model of random regular graphs and show that when \(d\) is growing slowly, the spectrum of the adjacency matrix starts resembling that of a real symmetric Wigner matrix. Similar techniques, coupled with estimates proved in this paper, could be used to extend such results to the present model.

    The proofs in this section follow easily from the results in parts (i) and (ii) above. As in [18], the proofs display interesting combinatorial interpretations of analytic quantities common in RMT.

2 A weak convergence set-up

The following weak convergence set-up will be used to prove the limit theorems in the later text. Let \(\underline{\omega }:=(\omega _m)_{m\in \mathbb N }\) be a sequence of positive weights that decay to zero at a suitable rate as \(m\) tends to infinity. Let \(\mathbf{L}^2(\underline{\omega })\) denote the space of sequences \((x_m)_{m \in \mathbb N }\) that are square-summable with respect to the weights \(\underline{\omega }\), i.e., \(\sum _{m=1}^{\infty } x_m^2 \omega _m < \infty \). Our underlying complete separable metric space will be \(X=(\mathbf{L}^2(\underline{\omega }), \left||\cdot \right||)\), where \(\left||\cdot \right||\) denotes the corresponding weighted norm, \(\left||x \right||^2=\sum _{m=1}^{\infty } x_m^2 \omega _m\).

Remark 1

Although we have chosen to work with \(\mathbf{L}^2\) for simplicity, any \(\mathbf{L}^p\) space would have worked as well.

Let us denote the space of probability measures on the Borel \(\sigma \)-algebra of \(X\) by \(\mathbb P (X)\). We will skip mentioning the Borel \(\sigma \)-algebra and refer to a member of \(\mathbb P (X)\) as a probability measure on \(X\). We equip \(\mathbb P (X)\) with the Prokhorov metric for weak convergence; for the standard results on weak convergence we use below, please consult Chapter 3 in [19]. Let \(\rho \) denote the Prokhorov metric on \(\mathbb P (X) \times \mathbb P (X)\) as given in [19, eqn. (1.1) on page 96].

Lemma 2

The metric space \((\mathbb P (X), \rho )\) is a complete separable metric space.

Proof

The claim follows from [19, Thm. 1.7, p. 101] since \(X\) is a complete separable metric space. \(\square \)

To prove tightness of subsets of \(\mathbb P (X)\) we will use the following class of compact subsets of \(\mathbf{L}^2(\underline{\omega })\).

Lemma 3

(The infinite cube) Let \((a_m)_{m\in \mathbb N } \in \mathbf{L}^2(\underline{\omega })\) be such that \(a_m \ge 0\) for every \(m\). Then the set

$$\begin{aligned} \left\{ (b_m)_{m\in \mathbb N }\in \mathbf{L}^2(\underline{\omega }):\quad 0\le \left|b_m \right| \le a_m\quad \text{for all}\quad m \in \mathbb N \right\} \end{aligned}$$

is compact in \((\mathbf{L}^2(\underline{\omega }), \left||\cdot \right||)\).

Proof

First observe that the cube is compact in the product topology by Tychonoff’s theorem. Norm convergence to the limit points now follows by the Dominated Convergence Theorem. \(\square \)

We now explore some consequences of relative compactness.

Lemma 4

Suppose \(\{X_n\}\) and \(X\) are random sequences taking values in \(\mathbf{L}^2(\underline{\omega })\) such that \(X_n\) converges in law to \(X\). Then, for any \(b \in \mathbf{L}^2(\underline{\omega })\), the random variables \(\left\langle b, X_n \right\rangle \) converge in law to \(\left\langle b,X \right\rangle \).

Proof

This is a corollary of the usual Continuous Mapping Theorem. \(\square \)

Our final lemma shows that finite-dimensional distributions characterize a probability measure on the Borel \(\sigma \)-algebra on \(X\).

Lemma 5

Let \(x\) be a typical element in \(X\). Let \(P\) and \(Q\) be two probability measures on \(X\). Suppose for any finite collection of indices \((i_1, \ldots , i_k)\), the law of the random vector \((x_{i_1}, \ldots , x_{ i_k})\) is the same under both \(P\) and \(Q\). Then \(P=Q\) on the entire Borel \(\sigma \)-algebra.

Proof

Our claim will follow once we show that \(P\) and \(Q\) give identical mass to every basic open neighborhood determined by the norm. For any \(x\in X\), the function \(y \mapsto \left||y-x \right||\) is measurable with respect to the \(\sigma \)-algebra generated by the coordinate projections, so every open ball lies in that \(\sigma \)-algebra; since \(X\) is separable, every open set is a countable union of such balls, and hence the Borel \(\sigma \)-algebra coincides with the coordinate \(\sigma \)-algebra. Under our assumption, every finite-dimensional distribution is identical under \(P\) and \(Q\); hence \(P\) and \(Q\) agree on the coordinate \(\sigma \)-algebra, which is the entire Borel \(\sigma \)-algebra. This proves our claim. \(\square \)

3 Some results on Poisson approximation

3.1 Cycles in random regular graphs

Let \(G_n\) be the \(2d\)-regular graph on \(n\) vertices sampled from \(\mathcal G _{n,2d}\), the permutation model of random regular graphs. The graph \(G_n\) is generated from the uniform random permutations \(\pi _1,\ldots ,\pi _d\) as described in the introduction. Assume that the vertices of \(G_n\) are labeled by \(\{1,\ldots ,n\}\), and let \(C^{(n)}_k\) denote the number of (simple) cycles of length \(k\) in \(G_n\).

We start by giving the limiting distribution of \(C^{(n)}_k\) as \(n\rightarrow \infty \). Suppose that \(w=w_1\cdots w_k\) is a word on the letters \(\pi _1,\ldots ,\pi _d\) and \(\pi _1^{-1},\ldots ,\pi _d^{-1}\). We call \(w\) cyclically reduced if \(w_1\ne w_k^{-1}\) and \(w_i\ne w_{i+1}^{-1}\) for \(1\le i < k\). Let \(a(d,k)\) denote the number of cyclically reduced words of length \(k\) on this alphabet.

Proposition 6

As \(n\rightarrow \infty \) while \(k\) and \(d\) are kept fixed,

$$\begin{aligned} C^{(n)}_k\,\mathop {\longrightarrow }\limits ^{\mathcal{L }}\,\mathrm{Poi}\left(\frac{a(d,k)}{2k}\right). \end{aligned}$$

We will actually give a stronger version of this result in Theorem 8, but we include this proposition nevertheless because it has a more elementary proof, and because in proving it we will develop some lemmas that will come in handy later. We also note the following exact expression for \(a(d,k)\),

$$\begin{aligned} a(d,2k) = (2d-1)^{2k} -1+2d,\quad \text{ and}\quad a(d,2k+1)= (2d-1)^{2k+1}+1, \end{aligned}$$
(1)

whose proof we provide in the Appendix (see Lemma 41).
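For small parameters, (1) can be checked by brute force; the following illustrative Python snippet (helper names are ours) enumerates all words and tests the cyclic reduction condition directly.

```python
from itertools import product

def count_cyclically_reduced(d, k):
    """Enumerate words of length k over pi_1..pi_d and their inverses
    (a letter is encoded as +j or -j) and count the cyclically reduced ones."""
    letters = [j for j in range(1, d + 1)] + [-j for j in range(1, d + 1)]
    return sum(1 for w in product(letters, repeat=k)
               if all(w[i] != -w[(i + 1) % k] for i in range(k)))  # includes the w_1 vs w_k check

def a_formula(d, k):
    """The closed form (1): (2d-1)^k - 1 + 2d for even k, (2d-1)^k + 1 for odd k."""
    return (2 * d - 1) ** k + (2 * d - 1 if k % 2 == 0 else 1)

assert all(count_cyclically_reduced(d, k) == a_formula(d, k)
           for d in (2, 3) for k in range(1, 7))
```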

Our argument heavily uses the concepts of [34], but we will try to make our proof self-contained. Let \(\mathcal W \) be the set of cyclically reduced words of length \(k\) on letters \(\pi _1,\ldots ,\pi _d\) and \(\pi _1^{-1},\ldots ,\pi _d^{-1}\). For \(w\in \mathcal W \), we define a closed trail with word \(w\) to be an object of the form

$$\begin{aligned} s_0\xrightarrow {\ w_1\ }s_1\xrightarrow {\ w_2\ }s_2\xrightarrow {\ w_3\ }\cdots \xrightarrow {\ w_{k-1}\ }s_{k-1}\xrightarrow {\ w_k\ }s_0, \end{aligned}$$

with \(s_i\in \{1,\ldots ,n\}\). In Sect. 3.1, we will consider only the case where \(s_0,\ldots ,s_{k-1}\) are distinct, though we will drop this assumption in Sect. 3.2. We say that the trail appears in \(G_n\) if \(w_1(s_0)=s_1,\,w_2(s_1)=s_2\), and so on. In other words, we are considering \(G_n\) as a directed graph with edges labeled by the permutations that gave rise to them, and we are asking if it contains the trail as a subgraph. We note that a trail (with distinct vertices) can only appear in \(G_n\) if its word is cyclically reduced.

To give an idea of the method we will use, we demonstrate how to calculate \(\lim _{n\rightarrow \infty }\mathbf{E}[C_k^{(n)}]\). Suppose we have a trail with word \(w\). Let \(e_w^i\) be the number of times \(\pi _i\) or \(\pi _i^{-1}\) appears in \(w\). It is straightforward to see that the trail appears in \(G_n\) with probability \(\prod _{i=1}^d 1/[n]_{e_w^i}\), where

$$\begin{aligned}{}[x]_j=x(x-1)\cdots (x-j+1) \end{aligned}$$

is the falling factorial or Pochhammer symbol.

For every word in \(\mathcal W \), there are \([n]_k\) trails with that word. The total number of trails of length \(k\) contained in \(G_n\) is \(2k\) times the number of cycles, so

$$\begin{aligned} 2k\mathbf{E}[C_k^{(n)}]=\sum _{w\in \mathcal W }[n]_k\prod _{i=1}^d\frac{1}{[n]_{e_w^i}}. \end{aligned}$$
(2)

Each summand converges to 1 as \(n\rightarrow \infty \), giving \(\mathbf{E}[C_k^{(n)}]\rightarrow a(d,k)/2k\), consistent with Proposition 6.
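Formula (2) can also be evaluated exactly for moderate parameters; the illustrative Python sketch below (helper names are ours) implements it verbatim.

```python
from collections import Counter
from itertools import product
from math import prod

def falling(n, j):
    """Falling factorial [n]_j = n (n-1) ... (n-j+1)."""
    return prod(n - t for t in range(j))

def mean_cycles(n, d, k):
    """Exact E[C_k^{(n)}] from (2): sum over cyclically reduced words w of
    [n]_k / prod_i [n]_{e_w^i}, divided by 2k.  Requires n >= k."""
    letters = [j for j in range(1, d + 1)] + [-j for j in range(1, d + 1)]
    total = 0.0
    for w in product(letters, repeat=k):
        if all(w[i] != -w[(i + 1) % k] for i in range(k)):   # cyclically reduced
            e = Counter(abs(x) for x in w)                    # e_w^i = uses of pi_i or pi_i^{-1}
            total += falling(n, k) / prod(falling(n, e[i]) for i in e)
    return total / (2 * k)

print(mean_cycles(200, 2, 3))   # about 4.63, approaching the limit a(2,3)/(2*3) = 28/6 = 4.67
```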

To prove Proposition 6, we will need to count more complicated objects than in the above example, and we will need some machinery from [34]. Suppose we have the following list of \(r\) trails with associated words \(w^1,\ldots ,w^r\):

$$\begin{aligned} s^j_0\xrightarrow {\ w^j_1\ }s^j_1\xrightarrow {\ w^j_2\ }\cdots \xrightarrow {\ w^j_k\ }s^j_k=s^j_0,\qquad 1\le j\le r, \end{aligned}$$
(3)

with \(s_i^j\in \{1,\ldots ,n\}\). Though we take the vertices \(s^j_0,\ldots ,s^j_{k-1}\) of each trail to be distinct, vertices from different trails may coincide (see Fig. 1 for an example).

Suppose we have another list of \(r\) trails, \((u_i^j,\ 0\le i\le k, 1\le j\le r)\) with the same words \(w^1,\ldots ,w^r\). We say that these two lists are of the same category if \(s_i^j=s_{i^{\prime }}^{j^{\prime }}\iff u_i^j=u_{i^{\prime }}^{j^{\prime }}\). Roughly speaking, this means that the trails in the two lists overlap each other in the same way. The probability that some list of trails appears in \(G_n\) depends only on its category.

We can represent each category as a directed, edge-labeled graph depicting the overlap of the trails. This is more complicated to explain than to do, and we encourage the reader to simply look at the example in Fig. 1, or at Figure 7 in [34]. Given the list of trails \((s_i^j)\), we define this graph as follows. First, reconsider the variables \(s_i^j\) simply as abstract labels rather than elements of \(\{1,\ldots , n\}\), and partition these labels by placing any two of them in a block together if (considered as integers again) they are equal. The graph has these blocks as its vertices. It includes an edge labeled \(\pi _i\) from one block to another if the trails include a step labeled \(\pi _i\) or \(\pi _i^{-1}\) from any vertex in the first block to any vertex in the second; this edge should be directed according to whether the step was labeled \(\pi _i\) or \(\pi _i^{-1}\).
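The construction is mechanical; as an illustration, the Python sketch below records exactly the data just described (the encoding of a step as a triple \((u,\ell ,v)\), with \(\ell =j\) for a \(\pi _j\)-step and \(\ell =-j\) for a \(\pi _j^{-1}\)-step, is ours).

```python
def category_graph(trails):
    """Directed, edge-labeled graph of the category of a list of trails.
    Each trail is a list of steps (u, label, v); since two abstract labels are
    blocked together exactly when they are equal as integers, the blocks are
    simply the distinct integers appearing in the trails."""
    edges = set()
    for trail in trails:
        for (u, label, v) in trail:
            if label > 0:
                edges.add((u, label, v))    # a pi_l step from the block of u to the block of v
            else:
                edges.add((v, -label, u))   # a pi_l^{-1} step is the reverse edge, labeled pi_l
    vertices = {u for (u, _, v) in edges} | {v for (u, _, v) in edges}
    return vertices, edges
```

The number of blocks and the number of edges carrying each label are precisely the quantities \(v\) and \(e_i\) that enter the expectation formula (4) below.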

Fig. 1

A list of two trails, and the graph associated with its category. Since \(s_2^1=s_3^2=3\), the vertices \(s_2^1\) and \(s_3^2\) are blocked together in the graph, and since \(s_1^1=s_2^2=1\), the vertices \(s_1^1\) and \(s_2^2\) are blocked together

Suppose that \(\Gamma \) is the graph of a category of a list of trails, and define \(X_\Gamma ^{(n)}\) to be the number of tuples of trails of category \(\Gamma \) found in \(G_n\). If \(\Gamma \) is the graph of a category of a list of a single trail with word \(w\in \mathcal W \), we write \(X_w^{(n)}\) for \(X_\Gamma ^{(n)}\). Note that such graphs have a simple form demonstrated in Fig. 2.

Fig. 2

The graph \(\Gamma \) associated with a single trail with word \(\pi _2\pi _1^{-1}\pi _2\pi _1 \pi _2\pi _3^{-1}\)

Lemma 7

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbf{E}[X_\Gamma ^{(n)}]={\left\{ \begin{array}{ll} 1&\text{if } \Gamma \text{ has the same number of vertices as edges},\\ 0&\text{otherwise}. \end{array}\right.} \end{aligned}$$

To demonstrate the connection to the calculation we performed in (2), observe that

$$\begin{aligned} 2kC_k^{(n)}=\sum _{w\in \mathcal W }X_w^{(n)}, \end{aligned}$$

and by our lemma the expectation of this converges to \(a(d,k)\) as \(n\rightarrow \infty \).

Proof of Lemma 7

This is essentially the same calculation as in (2). Let \(e\) and \(v\) be the number of edges and vertices, respectively, of the graph \(\Gamma \). Let \(e_i\) be the number of edges in \(\Gamma \) labeled by \(\pi _i\).

There are \([n]_v\) different lists of trails of category \(\Gamma \), corresponding to the number of ways to assign distinct labels from \(\{1,\ldots , n\}\) to the vertices of \(\Gamma \). Since each of these lists appears in \(G_n\) with probability \(\prod _{i=1}^d1/[n]_{e_i}\),

$$\begin{aligned} \mathbf{E}[X_\Gamma ^{(n)}] = [n]_v\prod _{i=1}^d\frac{1}{[n]_{e_i}}. \end{aligned}$$
(4)

As \(n\rightarrow \infty \), this converges to 0 if \(e>v\) and to 1 if \(e=v\). If \(\Gamma \) is the graph of a category of a list of trails, then every vertex has degree at least 2, so it never happens that \(e<v\), which completes the lemma. We note for later use that this remains true even when we drop the requirement that all vertices of a trail be distinct, so long as the word of each trail is cyclically reduced. \(\square \)

Proof of Proposition 6

We will use the moment method. Fix a positive integer \(r\). The main idea of the proof is to interpret \(\big (C_{k}^{(n)}\big )^r\) as the number of \(r\)-tuples of cycles of length \(k\) in \(G_n\). As there are \(2k\) closed trails for every cycle of length \(k\), we can also think of it as \((2k)^{-r}\) times the number of \(r\)-tuples of closed trails of length \(k\) in \(G_n\).

Let \(\mathcal G \) be the set of graphs of categories of lists of \(r\) trails of length \(k\). The above interpretation implies that

$$\begin{aligned} \big (C_{k}^{(n)}\big )^r = \frac{1}{(2k)^r}\sum _{\Gamma \in \mathcal G }X_{\Gamma }^{(n)}. \end{aligned}$$
(5)

By Lemma 7, we can compute \(\lim _{n\rightarrow \infty }\mathbf{E}\big (C_{k}^{(n)}\big )^r\) by counting the number of graphs in \(\mathcal G \) with the same number of edges as vertices. Let \(\mathcal G ^{\prime }\subset \mathcal G \) be the set of such graphs.

Let \(\Gamma \in \mathcal G ^{\prime }\), and consider some list of \(r\) trails of category \(\Gamma \). Since \(\Gamma \) has as many edges as vertices, it consists of disjoint cycles. This implies that for any two trails in the list, either the trails are wholly identified in \(\Gamma \), or they are disjoint. These identifications of the \(r\) different trails give a partition of \(r\) objects.

Fig. 3

A graph formed from three trails of length \(6\), all identified with each other. There are \(a(d,6)\) choices for the edge-labels \(w\). There are six choices for which element \(s^2_a\) will be identified with \(s^1_0\), and two choices for how to orient the trail \(s^2\) when identifying it with \(s^1\). There are also six choices for which element \(s^3_b\) will be identified with \(s^1_0\), along with another two choices for its orientation. All together, there are \(a(d,6)(2\cdot 6)^2\) elements of \(\mathcal G ^{\prime }\) corresponding to the partition of three elements into one part

Given some partition of the \(r\) objects into \(m\) parts, we will count the graphs in \(\mathcal G ^{\prime }\) whose trails are identified according to the partition (see Fig. 3 for an example). Consider some part consisting of \(p\) trails. The trails form a cycle in \(\Gamma \); we need to count the number of different ways to label the edges and vertices. There are \(a(d,k)\) different ways to label the edges. Each trail in the part can have its vertices identified with those of the first trail in \(2k\) different ways, for a total of \((2k)^{p-1}\) choices. Thus the number of choices for this part is \(a(d,k)(2k)^{p-1}\). Doing this for every part in the given partition, we have a total of \(a(d,k)^m(2k)^{r-m}\). Recalling that the number of partitions of \(r\) objects into \(m\) parts is given by the Stirling number of the second kind \(S(r,m)\),

$$\begin{aligned} |\mathcal G ^{\prime }| = \sum _{m=1}^r S(r,m)a(d,k)^m(2k)^{r-m}. \end{aligned}$$

By (5) and Lemma 7,

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbf{E}\big (C_{k}^{(n)}\big )^r=\sum _{m=1}^r S(r,m) \left(\frac{a(d,k)}{2k}\right)^m. \end{aligned}$$

It is well known that this is the \(r\)th moment of the \(\mathrm{Poi}(a(d,k)/2k)\) distribution (see for example [40]), and that this distribution is determined by its moments, thus proving the proposition. \(\square \)
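The moment identity invoked here, \(\mathbf{E}X^r=\sum _{m=1}^r S(r,m)\lambda ^m\) for \(X\sim \mathrm{Poi}(\lambda )\), is easy to confirm numerically; the short Python check below (helper names are ours) does so for one choice of \(\lambda \) and \(r\).

```python
from math import comb, exp, factorial

def stirling2(r, m):
    """Stirling number of the second kind S(r, m)."""
    return sum((-1) ** (m - j) * comb(m, j) * j ** r for j in range(m + 1)) // factorial(m)

def poisson_moment(lam, r, terms=60):
    """r-th moment of Poi(lam), summing j^r * P[Poi(lam) = j] term by term."""
    total, p = 0.0, exp(-lam)            # p = P[Poi(lam) = j], starting at j = 0
    for j in range(terms):
        total += j ** r * p
        p *= lam / (j + 1)
    return total

lam, r = 2.5, 6
moment_via_stirling = sum(stirling2(r, m) * lam ** m for m in range(1, r + 1))
assert abs(poisson_moment(lam, r) - moment_via_stirling) < 1e-6
```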

This proposition tells us the limiting distribution of \(C_k^{(n)}\) as \(n\rightarrow \infty \), with \(d\) and \(k\) fixed, but tells us nothing if \(d\) and \(k\) grow with \(n\). The following theorem addresses this, and gives us a quantitative bound on the rate of convergence. We will assume throughout that \(d\ge 2\); we use this assumption only to simplify some of our asymptotic quantities, but as far better results for the \(d=1\) case are already known (see [3]), we see no reason to complicate things. For clarity, we state this and future results with an explicit constant rather than big-O notation, but it is the order, not the constant, that interests us. Recall that the total variation distance between two probability measures is the largest possible difference between the probabilities that they assign to the same event.

Theorem 8

There is a constant \(C_0\) such that for any \(n,\,k\), and \(d\ge 2\), the total variation distance between the law of \(C_k^{(n)}\) and \(\mathrm{Poi}(a(d,k)/2k)\) is bounded by \(C_0k(2d-1)^k/n\).

Proof

We will prove this using Stein’s method; good introductions to Stein’s method for the Poisson distribution can be found in [6, 14], and especially [9], which focuses on the technique of size-biased coupling that we will employ. We give here the basic set-up. Let \(\mathbb Z _+\) denote the nonnegative integers. For any \(A\subset \mathbb Z _+\), let \(g=g_{\lambda ,A}\) be the function on \(\mathbb Z _+\) satisfying

$$\begin{aligned} \lambda g(j+1)-jg(j)=\mathbf{1}_{j\in A}-\mathrm{Poi}(\lambda )\{A\} \end{aligned}$$

with \(g(0)=0\), where \(\mathrm{Poi}(\lambda )\{A\}\) denotes the measure of \(A\) under the \(\mathrm{Poi}(\lambda )\) distribution. This function \(g\) is called the solution to the Stein equation. For any nonnegative integer-valued random variable \(X\),

$$\begin{aligned} \mathbf{P}[X\in A]-\mathrm{Poi}(\lambda )\{A\}=\mathbf{E}[\lambda g(X+1)-Xg(X)]. \end{aligned}$$
(6)

Bounding the right hand side of this equation over all choices of \(g\) thus bounds the total variation distance between the law of \(X\) and the \(\mathrm{Poi}(\lambda )\) distribution. The following estimates on \(g\) are standard (see [9, Lemma 1.1.1], for example):

$$\begin{aligned} \left||g \right||_{\infty }\le \min (1,\lambda ^{-1/2}),\quad \Delta g\le \min (1,\lambda ^{-1}), \end{aligned}$$
(7)

where \(\Delta g=\sup _{j}|g(j+1)-g(j)|\).
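Although we never need \(g_{\lambda ,A}\) explicitly, it is determined by a simple forward recursion; the illustrative Python sketch below (names are ours, and the truncation level is kept small because the recursion is numerically unstable for large arguments) also displays the bounds (7) for one concrete \(\lambda \) and \(A\).

```python
from math import exp, sqrt

def stein_solution(lam, A, jmax=20):
    """Solve lam*g(j+1) - j*g(j) = 1_{j in A} - Poi(lam){A} with g(0) = 0,
    for j = 0, ..., jmax - 1.  A is a set of nonnegative integers."""
    pmf, poi_A = exp(-lam), 0.0
    for j in range(jmax):                 # accumulate Poi(lam){A} term by term
        if j in A:
            poi_A += pmf
        pmf *= lam / (j + 1)
    g = [0.0] * (jmax + 1)
    for j in range(jmax):
        g[j + 1] = ((1.0 if j in A else 0.0) - poi_A + j * g[j]) / lam
    return g

lam, A = 3.0, {0, 2, 5, 7}
g = stein_solution(lam, A)
sup_g = max(abs(x) for x in g)
delta_g = max(abs(g[j + 1] - g[j]) for j in range(len(g) - 1))
print(sup_g, "<=", min(1, 1 / sqrt(lam)))   # the first bound in (7)
print(delta_g, "<=", min(1, 1 / lam))       # the second bound in (7)
```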

Let \(\mathcal C \) be the set of closed trails of length \(k\) on \(n\) vertices, with two trails identified if one is a cyclic or inverted cyclic shift of the other. Elements of \(\mathcal C \) are essentially cycles in the complete graph on \(n\) vertices, with edges labeled by \(\pi _1,\ldots ,\pi _d\) and \(\pi _1^{-1},\ldots ,\pi _d^{-1}\). We note that \(|\mathcal C |=[n]_ka(d,k)/2k\).

For \(t\in \mathcal C \), let \(F_t=\mathbf{1}_{(t\,\mathrm{occurs \ in}\,G_n)}\). Let \(\lambda =a(d,k)/2k\). We abbreviate \(C_k^{(n)}\) to \(C\), and we note that \(C=\sum _{t\in \mathcal C }F_t\). We can evaluate the right hand side of (6) as

$$\begin{aligned} \mathbf{E}[\lambda g(C+1)-C g(C)] =\sum _{s\in \mathcal C }\left(\frac{1}{[n]_k}\mathbf{E}[g(C+1)]-\mathbf{E}[F_sg(C)]\right) \end{aligned}$$

Let \(p_t=\mathbf{E}[F_t]\). We note that \(F_sg(C)=F_sg\big (\sum _{t\ne s}F_t+1\big )\), and that

$$\begin{aligned} \mathbf{E}\left[F_sg\left(\sum _{t\ne s}F_t+1\right)\right] =p_s\mathbf{E}\left.\left[g\left(\sum _{t\ne s}F_t+1\right)\ \right|\ F_s=1\right]\!. \end{aligned}$$

In Lemma 10, we will construct for each \(s\in \mathcal C \) a random variable \(Y_s\) on the same probability space as \(C\) that has the distribution of \(\sum _{t\ne s}F_t\) conditioned on \(F_s=1\). Then we evaluate

$$\begin{aligned} |\mathbf{E}[\lambda g(C+1)-C g(C)]|&= \left| \sum _{s\in \mathcal C }\left(\frac{1}{[n]_k}\mathbf{E}[g(C+1)]-p_s\mathbf{E}[g(Y_s+1)]\right) \right|\\&\le \sum _{s\in \mathcal C }\frac{1}{[n]_k}\mathbf{E}\big |g(C+1)-g(Y_s+1)\big |\\&+\sum _{s\in \mathcal C }\left|\frac{1}{[n]_k}-p_s\right|\mathbf{E}\big |g(Y_s+1)\big |. \end{aligned}$$

We bound these terms as follows:

$$\begin{aligned} |g(C+1)-g(Y_s+1)|\le \Delta g |C-Y_s| \end{aligned}$$

and

$$\begin{aligned} \left|\frac{1}{[n]_k}-p_s\right|\le \left|\frac{1}{[n]_k}-\frac{1}{n^k} \right|\le \frac{k^2}{2n[n]_k}. \end{aligned}$$

This last bound makes use of the inequality \([n]_k\ge n^k(1-k^2/2n)\). Applying these bounds gives

$$\begin{aligned} |\mathbf{E}[\lambda g(C+1)-C g(C)]|&\le \sum _{s\in \mathcal C }\frac{\Delta g}{[n]_k}\mathbf{E}|C-Y_s| +|\mathcal C |\frac{k^2}{2n[n]_k}\left||g \right||_{\infty }\nonumber \\&\le \frac{\Delta g}{[n]_k}\sum _{s\in \mathcal C }\mathbf{E}|C-Y_s| +O\left(\frac{k^{3/2}(2d-1)^{k/2}}{n}\right). \end{aligned}$$
(8)

To get a good bound on this, we just need to demonstrate how to construct \(Y_s\) so that \(\mathbf{E}|C-Y_s|\) is small. We sketch our method as follows: Fix \(s\in \mathcal C \), and let \(G_n^{\prime }\) be a random graph on \(n\) vertices distributed as \(G_n\) conditioned to contain the cycle \(s\). We will couple \(G_n^{\prime }\) with \(G_n\) in a natural way, and then prove in Lemma 9 that \(G_n\) and \(G_n^{\prime }\) differ only slightly. We then define \(Y_s\) in terms of \(G_n^{\prime }\), and we establish in Lemma 10 that \(\mathbf{E}|C-Y_s|\) is small. Finally, we finish the proof of Theorem 8 by using these results to bound the right side of (8).

We start by constructing \(G_n^{\prime }\). Fix some \(s\in \mathcal C \). The basic idea is to modify the permutations \(\pi _1,\ldots ,\pi _d\) to get random permutations \(\pi _1^{\prime },\ldots ,\pi _d^{\prime }\), which we will then use to create a \(2d\)-regular graph \(G_n^{\prime }\) in the usual way. Before we give our construction of \(\pi _1^{\prime },\ldots ,\pi _d^{\prime }\), we consider what distributions they should have. Suppose for example that \(d=3\) and \(s\) is

$$\begin{aligned} 1\xrightarrow {\ \pi _3\ }2\xrightarrow {\ \pi _1^{-1}\ }3\xrightarrow {\ \pi _3\ }4\xrightarrow {\ \pi _1\ }1. \end{aligned}$$

To force \(G_n^{\prime }\) to contain \(s,\,\pi _1^{\prime }\) should be a uniform random permutation conditioned to make \(\pi _1^{\prime }(4)=1\) and \(\pi _1^{\prime }(3)=2,\,\pi _2^{\prime }\) a uniform random permutation with no conditioning, and \(\pi _3^{\prime }\) a uniform random permutation conditioned to make \(\pi _3^{\prime }(1)=2\) and \(\pi _3^{\prime }(3)=4\).

We now describe the construction of \(\pi _1^{\prime },\ldots ,\pi _d^{\prime }\). Suppose \(s\) has the form

$$\begin{aligned} s_0\xrightarrow {\ w_1\ }s_1\xrightarrow {\ w_2\ }\cdots \xrightarrow {\ w_{k-1}\ }s_{k-1}\xrightarrow {\ w_k\ }s_0. \end{aligned}$$
(9)

(The element \(s\) is actually an equivalence class of the \(2k\) different cyclic and inverted cyclic shifts of the above trail, but we will continue to represent it as above.) Let \(1\le l\le d\), and suppose that the edge-labels \(\pi _l\) and \(\pi _l^{-1}\) appear \(M\) times in the cycle \(s\), and let \((a_m,b_m)\) for \(1\le m\le M\) be these edges. If \((a_m,b_m)\) is labeled \(\pi _l\), then \(a_m\) is the tail and \(b_m\) the head of the edge; if it is labeled \(\pi _l^{-1}\), then \(a_m\) is the head and \(b_m\) the tail. We must construct \(\pi _l^{\prime }\) to have the uniform distribution conditioned on \(\pi _l^{\prime }(a_m)=b_m\) for \((a_m,b_m),\ 1\le m\le M\).

We define a sequence of random transpositions by the following algorithm: Let \(\tau _1\) swap \(\pi _l(a_1)\) and \(b_1\). Let \(\tau _2\) swap \(\tau _1\pi _l(a_2)\) and \(b_2\), and so on. We then define \(\pi _l^{\prime }=\tau _M\cdots \tau _1\pi _l\). This permutation satisfies \(\pi _l^{\prime }(a_m)=b_m\) for \(1\le m\le M\), and it is distributed uniformly, subject to the given constraints, which is easily proven by induction on each swap. This completes our construction of \(\pi _1^{\prime },\ldots ,\pi _d^{\prime }\).
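The swap algorithm is easy to implement; the following illustrative Python sketch (the function name is ours, and vertices are indexed from \(0\)) carries it out and checks the constraints.

```python
import random

def condition_permutation(pi, constraints):
    """Apply the sequence of swaps described above: given a permutation pi
    (as a list with pi[a] = image of a) and constraints [(a_1, b_1), ..., (a_M, b_M)],
    the m-th transposition swaps the current image of a_m with b_m.  If pi is
    uniform, the result is uniform conditioned on pi'(a_m) = b_m for all m."""
    pi = pi[:]                                 # work on a copy
    pos = {v: a for a, v in enumerate(pi)}     # pos[v] = current preimage of v
    for a, b in constraints:
        u, j = pi[a], pos[b]                   # u = pi(a); j = pi^{-1}(b)
        pi[a], pi[j] = b, u                    # transpose the values u and b
        pos[b], pos[u] = a, j
    return pi

rng = random.Random(0)
pi = list(range(8)); rng.shuffle(pi)
pi_prime = condition_permutation(pi, [(3, 0), (2, 1)])   # force pi'(3) = 0 and pi'(2) = 1
assert pi_prime[3] == 0 and pi_prime[2] == 1
assert sorted(pi_prime) == list(range(8))                # still a permutation
```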

We now define \(G_n^{\prime }\) to be the random graph on \(n\) vertices with edges \((i,\pi _j^{\prime }(i))\) for every \(1\le i\le n\) and \(1\le j\le d\). It is evident that \(G_n^{\prime }\) is defined on the same probability space as \(G_n\) and is distributed as \(G_n\) conditioned on containing \(s\). The key fact is that \(G_n^{\prime }\) is nearly identical to \(G_n\):

Lemma 9

Suppose there is an edge \(i\xrightarrow {\ \pi _l\ } j\) contained in \(G_n\) but not in \(G_n^{\prime }\). Then the trail \(s\) contains either an edge of the form \(i\xrightarrow {\ \pi _l\ } v\) with \(v\ne j\), or of the form \(u\xrightarrow {\ \pi _l\ } j\) with \(u\ne i\).

Proof

Suppose \(\pi _l(i)=j\), but \(\pi _l^{\prime }(i)\ne j\). Then \(j\) must have been swapped when making \(\pi ^{\prime }_l\), which can happen only if \(\pi _l(a_m)=j\) or \(b_m=j\) for some \(m\). In the first case, \(a_m=i\) and \(s\) contains the edge \(i\xrightarrow {\ \pi _l\ } b_m\) with \(b_m\ne j\), and in the second \(s\) contains the edge \(a_m\xrightarrow {\ \pi _l\ } j\) with \(a_m\ne i\). \(\square \)

If \(s\) contains an edge of the form \(i\xrightarrow {\ \pi _l\ } v\) with \(v\ne j\) or of the form \(u\xrightarrow {\ \pi _l\ } j\) with \(u\ne i\), then \(G_n^{\prime }\) cannot possibly contain the edge \(i\xrightarrow {\ \pi _l\ } j\) while still containing \(s\). The preceding lemma then says that we have coupled \(G_n\) and \(G_n^{\prime }\) as best we can, in the following sense: \(G_n^{\prime }\) keeps as many edges of \(G_n\) as it can, given that it contains \(s\).

For \(t\in \mathcal C \), let \(F^{\prime }_t=\mathbf{1}_{(G_n^{\prime }\,\mathrm{contains}\,t)}\). Define \(Y_s\) by \(Y_s=\sum _{t\ne s}F_t^{\prime }\). Since \(G_n^{\prime }\) is distributed as \(G_n\) conditioned to contain \(s\), the random variable \(Y_s\) is distributed as \(\sum _{t\ne s}F_t\) conditioned on \(F_s=1\). We now proceed to bound \(\mathbf{E}|C-Y_s|\), adding in the minor technical condition that \(k<n^{1/6}\).

Lemma 10

There exists an absolute constant \(C_1\) with the following property. For any \(s\in \mathcal C \) and \(Y_s\) defined above, and for all \(n,\,k\), and \(d\ge 2\) satisfying \(k<n^{1/6}\),

$$\begin{aligned} \mathbf{E}|C-Y_s|\le \frac{C_1k(2d-1)^k}{n}, \end{aligned}$$
(10)

Proof

We start by partitioning the cycles of \(\mathcal C \) according to how many edges they share with \(s\). Define \(\mathcal C _{-1}\) as all elements in \(\mathcal C \) that contain an edge \(s_i\xrightarrow {\ w_{i+1}\ } v\) with \(v\ne s_{i+1}\) or an edge \(v\xrightarrow {\ w_{i+1}\ } s_{i+1}\) with \(v\ne s_i\), where \(w_{i+1}\) denotes the label of the edge of \(s\) from \(s_i\) to \(s_{i+1}\) (indices taken modulo \(k\)). For \(0\le j<k\), define \(\mathcal C _j\) as all elements of \(\mathcal C \setminus \mathcal C _{-1}\) that share exactly \(j\) edges with \(s\).

The sets \(\mathcal C _{-1},\ldots , \mathcal C _{k-1}\) include every element of \(\mathcal C \) except for \(s\). Loosely, this classifies elements of \(\mathcal C \) according to their likelihood of appearing in \(G_n^{\prime }\) compared to in \(G_n\): trails in \(\mathcal C _{-1}\) never appear in \(G_n^{\prime }\); trails in \(\mathcal C _0\) appear in \(G_n^{\prime }\) with nearly the same probability as in \(G_n\); and the trails in \(\mathcal C _i\) for \(i\ge 1\) appear in \(G_n^{\prime }\) considerably more often than in \(G_n\).

This classification of elements of \(\mathcal C \) works nicely with our coupling. Suppose \(t\in \mathcal C _i\) for \(i\ge 0\). Lemma 9 shows that if \(t\) appears in \(G_n\), it must also appear in \(G_n^{\prime }\). That is, \(F_t^{\prime }\ge F_t\) for all \(t\in \mathcal C _i\) for \(i\ge 0\). On the other hand, \(F_t^{\prime }=0\) for all \(t\in \mathcal C _{-1}\). Using this,

$$\begin{aligned} \mathbf{E}|C-Y_s|&= \mathbf{E}\left|F_s+\sum _{t\in \mathcal C _{-1}}(F_t-F^{\prime }_t) +\sum _{t\in \mathcal C _0}(F_t-F^{\prime }_t)+\sum _{i=1}^{k-1}\sum _{t\in \mathcal C _i} (F_t-F^{\prime }_t)\right|\nonumber \\&\le p_s+\mathbf{E}\left|\sum _{t\in \mathcal C _{-1}}(F_t-F^{\prime }_t)\right| +\mathbf{E}\left|\sum _{t\in \mathcal C _0}(F_t-F^{\prime }_t)\right| +\mathbf{E}\left|\sum _{i=1}^{k-1}\sum _{t\in \mathcal C _i}(F_t-F^{\prime }_t)\right|\nonumber \\&= p_s+\sum _{t\in \mathcal C _{-1}}\mathbf{E}[F_t]+\sum _{t\in \mathcal C _0} \mathbf{E}[F_t^{\prime }-F_t]+\sum _{i=1}^{k-1}\sum _{t\in \mathcal C _i}\mathbf{E}[F^{\prime }_t-F_t]\nonumber \\&\le p_s+\sum _{t\in \mathcal C _{-1}}p_t+\sum _{t\in \mathcal C _0} (p_t^{\prime }-p_t)+\sum _{i=1}^{k-1}\sum _{t\in \mathcal C _i}p^{\prime }_t, \end{aligned}$$
(11)

with \(p^{\prime }_t=\mathbf{E}[F^{\prime }_t]\).

The rest of the proof is an analysis of \(|\mathcal C _i|\) and of \(p_t^{\prime }\). We start by considering the first sum. For any edge \(s_i\xrightarrow {\ w_{i+1}\ } v\) with \(v\ne s_{i+1}\) or \(v\xrightarrow {\ w_{i+1}\ } s_{i+1}\) with \(v\ne s_i\), there are no more than \([n-2]_{k-2}(2d-1)^{k-1}\) trails containing that edge (identifying cyclic and inverted cyclic shifts). This gives the bound

$$\begin{aligned} |\mathcal C _{-1}|\le 2k(n-2)[n-2]_{k-2}(2d-1)^{k-1}. \end{aligned}$$

Applying \(p_t\le 1/[n]_k\),

$$\begin{aligned} \sum _{t\in \mathcal C _{-1}}p_t =O\left(\frac{k(2d-1)^{k-1}}{n}\right). \end{aligned}$$

For the next sum, we note that with \(e_t^i\) denoting the number of times \(\pi _i\) and \(\pi _i^{-1}\) appear in the word of \(t\), for any \(t\in \mathcal C _0\),

$$\begin{aligned} p_t=\prod _{i=1}^d\frac{1}{[n]_{e_t^i}},\quad p^{\prime }_t=\prod _{i=1}^d\frac{1}{[n-e_s^i]_{e_t^i}}. \end{aligned}$$

Thus we have \(p^{\prime }_t\le 1/[n-k]_k\) and \(p_t\ge 1/n^k\). Using the bound \(|\mathcal C _0|\le |\mathcal C |=a(d,k)[n]_k/2k\), we have

$$\begin{aligned} \sum _{t\in \mathcal C _0} (p_t^{\prime }-p_t)&\le \frac{a(d,k)[n]_k}{2k}\left(\frac{1}{[n-k]_k}- \frac{1}{n^k}\right)\\&= \frac{a(d,k)}{2k}\left(\left(\frac{n}{n-k}\right)^k \left(1+O \left(\frac{k^2}{n}\right)\right)- \left(1+O \left(\frac{k^2}{n}\right)\right) \right)\\&= \frac{a(d,k)}{2k}\left(\left(1+\frac{k}{n-k}\right)^k \left(1+O \left(\frac{k^2}{n}\right)\right)- \left(1+O \left(\frac{k^2}{n}\right)\right) \right)\\&= O \left(\frac{k(2d-1)^k}{n}\right). \end{aligned}$$

The last and most involved calculation is to bound \(|\mathcal C _i|\). Fix some choice of \(i\) edges of \(s\). We start by counting the number of cycles in \(\mathcal C _i\) that share exactly these edges with \(s\). We illustrate this process in Fig. 4. Call the graph consisting of these edges \(H\), and suppose that \(H\) has \(p\) components. Since it is a forest, \(H\) has \(i+p\) vertices.

Fig. 4

Assembling an element \(t\in \mathcal C _i\) that overlaps with \(s\) at a given subgraph \(H\)

Let \(A_1,\ldots , A_p\) be the components of \(H\). We can assemble any \(t\in \mathcal C _i\) that overlaps with \(s\) in \(H\) by stringing together these components in some order, with other edges in between. Each component can appear in \(t\) in one of two orientations. Since we consider \(t\) only up to cyclic shift and inverted cyclic shift, we can assume without loss of generality that \(t\) begins with component \(A_1\) with a fixed orientation. This leaves \((p-1)!2^{p-1}\) choices for the order and orientation of \(A_2,\ldots ,A_p\) in \(t\).

Imagine now the components laid out in a line, with gaps between them, and count the number of ways to fill the gaps. Each of the \(p\) gaps must contain at least one edge, and the total number of edges in all the gaps is \(k-i\). Thus the total number of possible gap sizes is the number of compositions of \(k-i\) into \(p\) parts, or \(\genfrac(){0.0pt}{}{k-i-1}{p-1}\).
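This composition count is the standard stars-and-bars formula; for completeness, a brute-force Python check (the helper name is ours):

```python
from itertools import product
from math import comb

def count_compositions(total, parts):
    """Ordered tuples of `parts` positive integers summing to `total`."""
    return sum(1 for c in product(range(1, total + 1), repeat=parts) if sum(c) == total)

# compositions of k - i into p parts = C(k-i-1, p-1)
assert all(count_compositions(t, p) == comb(t - 1, p - 1)
           for t in range(1, 7) for p in range(1, t + 1))
```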

Now that we have chosen the number of edges to appear in each gap, we choose the edges themselves. We can do this by giving an ordered list of \(k-p-i\) vertices to go in the gaps, along with a label and an orientation for each of the \(k-i\) edges this gives. There are \([n-p-i]_{k-p-i}\) ways to choose the vertices. We can give each new edge any orientation and label subject to the constraint that the word of \(t\) must be reduced. This means we have at most \(2d-1\) choices for the orientation and label of each new edge, for a total of at most \((2d-1)^{k-i}\).

All together, there are at most \((p-1)!2^{p-1}\genfrac(){0.0pt}{}{k-i-1}{p-1}[n-p-i]_{k-p-i}(2d-1)^{k-i}\) elements of \(\mathcal C _i\) that overlap with the cycle \(s\) at the subgraph \(H\). We now calculate the number of different ways to choose a subgraph \(H\) of \(s\) with \(i\) edges and \(p\) components. Suppose \(s\) is given as in (9). We first choose a vertex \(s_j\). Then, we can specify which edges to include in \(H\) by giving a sequence \(a_1,b_1,\ldots ,a_p,b_p\) instructing us to include the first \(a_1\) edges after \(s_j\) in \(H\), then to exclude the next \(b_1\), then to include the next \(a_2\), and so on. Any sequence for which \(a_i\) and \(b_i\) are positive integers, \(a_1+\cdots + a_p=i\), and \(b_1+\cdots +b_p=k-i\) gives us a valid choice of \(i\) edges of \(s\) making up \(p\) components. This counts each subgraph \(H\) a total of \(p\) times, since we could begin with any component of \(H\). Hence the number of subgraphs \(H\) with \(i\) edges and \(p\) components is \((k/p)\genfrac(){0.0pt}{}{i-1}{p-1}\genfrac(){0.0pt}{}{k-i-1}{p-1}\). This gives us the bound

$$\begin{aligned} |\mathcal C _i|\le \sum _{p=1}^{i\wedge (k-i)} (k/p)\genfrac(){0.0pt}{}{i-1}{p-1}\genfrac(){0.0pt}{}{k-i-1}{p-1}^2 (p-1)!\,2^{p-1}[n-p-i]_{k-p-i}(2d-1)^{k-i}. \end{aligned}$$

We apply the bounds \(\genfrac(){0.0pt}{}{i-1}{p-1}\le k^{p-1}/(p-1)!\) and \(\genfrac(){0.0pt}{}{k-i-1}{p-1}\le (e(k-i-1)/(p-1))^{p-1}\) to get

$$\begin{aligned} |\mathcal C _i| \le k(2d-1)^{k-i}[n-1-i]_{k-1-i}\left(1+ \sum _{p=2}^{i\wedge (k-i)}\frac{1}{p} \left(\frac{2e^2k^3}{(p-1)^2}\right)^{p-1} \frac{1}{[n-1-i]_{p-1}}\right). \end{aligned}$$

Since \(k<n^{1/6}\), the sum in the above equation is bounded by an absolute constant. Using the bound \(p_t^{\prime }\le 1/[n-k]_{k-i}\) for \(t\in \mathcal C _i\), we have

$$\begin{aligned} \sum _{t\in \mathcal C _i}p_t^{\prime }= O\left( \frac{k(2d-1)^{k-i}}{n} \right) \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^{k-1}\sum _{t\in \mathcal C _i}p_t^{\prime }= O\left( \frac{k(2d-1)^{k-1}}{n} \right). \end{aligned}$$

These estimates, along with \(p_s\le 1/[n]_k\), complete the proof. \(\square \)

All that remains now is to apply this lemma to finish the proof of Theorem 8. First, consider the case where \(k\ge n^{1/6}\). Then \(k(2d-1)^k/n>1\) for sufficiently large values of \(n\) (regardless of \(d\)), in which case the theorem is trivially satisfied. By choosing \(C_0\) large enough, it holds for all \(n\) with \(k\ge n^{1/6}\).

When \(k<n^{1/6}\), we apply Lemma 10 and (7) to (8) to get

$$\begin{aligned} |\mathbf{E}[\lambda g(C+1)-C g(C)]|&= \frac{\Delta g}{[n]_k}|\mathcal C|O \left(\frac{k(2d-1)^k}{n}\right) +O \left(\frac{k^{3/2}(2d-1)^{k/2}}{n}\right)\\&= O \left(\frac{k(2d-1)^{k}}{n}\right)+ O \left(\frac{k^{3/2}(2d-1)^{k/2}}{n}\right) \end{aligned}$$

The first term is larger than the second for all but finitely many pairs \((k,d)\) with \(d\ge 2\). Hence there exists \(C_0\) large enough that for all \(n,\,k\), and \(d\ge 2\),

$$\begin{aligned} |\mathbf{E}[\lambda g(C+1)-C g(C)]| \le \frac{C_0k(2d-1)^k}{n}. \end{aligned}$$

\(\square \)

We will need a multivariate version of this theorem as well. Define \((C_{k}^{(\infty )};\ k\ge 1)\) to be independent Poisson random variables, with \(C_{k}^{(\infty )}\) having mean \(a(d,k)/2k\). Let \(d_{TV}(X,Y)\) denote the total variation distance between the laws of \(X\) and \(Y\).

Theorem 11

There is a constant \(C_2\) such that for all \(n,\,r\), and \(d\ge 2\),

$$\begin{aligned} d_{TV}\left(\big (C_1^{(n)}, \ldots ,C_r^{(n)}\big ),\ \big (C_{1}^{(\infty )},\ldots ,C_{r}^{(\infty )}\big )\right)\le \frac{C_2(2d-1)^{2r}}{n}. \end{aligned}$$

Our proof will be very similar to the single variable case above, except that we use Stein’s method for Poisson process approximation (see [9, Section 10.3]). Let \(\lambda _k=a(d,k)/2k\), and let \(e_i\in \mathbb Z _+^r\) be the vector with \(i\)th entry one and all other entries zero. Define the operator \(\mathcal A \) by

$$\begin{aligned} \mathcal A h(x) = \sum _{k=1}^r\lambda _k\big (h(x+e_k)-h(x)\big ) +\sum _{k=1}^rx_k\big (h(x-e_k)-h(x)\big ) \end{aligned}$$

for any \(h:\mathbb Z _+^r\rightarrow \mathbb R \) and \(x\in \mathbb Z _+^r\). We now describe the function that plays a role analogous to \(g\) in the single variable case.

Lemma 12

For any set \(A\subset \mathbb Z _+^r\), there is a function \(h:\mathbb Z _+^r\rightarrow \mathbb R \) such that

$$\begin{aligned} \mathcal A h(x)=1_{x\in A}- \mathbf{P}\big [\big (C_{1}^{(\infty )},\ldots ,C_{r}^{(\infty )}\big )\in A\big ]. \end{aligned}$$

This function \(h\) has the following properties:

$$\begin{aligned}&\sup _{\begin{matrix} x\in \mathbb Z _+^r\\ 1\le k\le r \end{matrix}}|h(x+e_k)-h(x)|\le 1, \end{aligned}$$
(12)
$$\begin{aligned}&\sup _{\begin{matrix} x\in \mathbb Z _+^r\\ 1\le j,k\le r \end{matrix}}|h(x+e_j+e_k) - h(x+e_j) +h(x)- h(x+e_k)|\le 1. \end{aligned}$$
(13)

 

Proof

This follows from Proposition 10.1.2 and Lemma 10.1.3 in [9] as applied to a point process on a space with \(r\) elements. \(\square \)

Our goal is thus to bound \(\mathbf{E}\big [\mathcal A h\big (C_1^{(n)}, \ldots ,C_r^{(n)}\big )\big ]\) for any function \(h\) as in Lemma 12. We will abbreviate this vector to \(\mathbf{C}=(C_1^{(n)}, \ldots ,C_r^{(n)})\). The set of equivalence classes of closed trails of length \(k\), which we previously denoted \(\mathcal C \), will now be denoted \(\mathcal C ^k\). We compute

$$\begin{aligned}&\mathbf{E}[\mathcal A h(\mathbf{C})] =\sum _{k=1}^r\sum _{s\in \mathcal C ^k}\left(\frac{1}{[n]_k}\mathbf{E}[h(\mathbf{C}+e_k)- h(\mathbf{C})] +\mathbf{E}\big [F_s\big (h(\mathbf{C}-e_k)-h(\mathbf{C})\big )\big ]\right)\\&\quad =\sum _{k=1}^r\sum _{s\in \mathcal C ^k}\left(\frac{1}{[n]_k}\mathbf{E}[h(\mathbf{C}+e_k)- h(\mathbf{C})] +p_s\mathbf{E}\big [h(\mathbf{C}-e_k)-h(\mathbf{C})\ \big |\ F_s=1\big ]\right). \end{aligned}$$

For every \(s\in \mathcal C ^k\), we will construct on the same probability space as \(\mathbf{C}\) a random variable \(\mathbf{Y}_s\) such that

$$\begin{aligned} \mathbf{Y}_s \,\mathop {=}\limits ^\mathcal{L }\,\left.\left(C_1^{(n)},\ldots ,\ C_{k-1}^{(n)},\ \sum _{\begin{matrix} t\in \mathcal C ^k\\ t\ne s \end{matrix}}F_t,\ C_{k+1}^{(n)},\ldots ,\ C_r^{(n)}\right)\ \right|\ F_s=1. \end{aligned}$$
(14)

Then

$$\begin{aligned} \big |\mathbf{E}[\mathcal A h(\mathbf{C})]\big |&= \left|\sum _{k=1}^r\sum _{s\in \mathcal C ^k}\left(\frac{1}{[n]_k}\mathbf{E}[h(\mathbf{C}+e_k)-h(\mathbf{C})] +p_s\mathbf{E}[h(\mathbf{Y}_s)-h(\mathbf{Y}_s+e_k)]\right)\right|\\&\le \sum _{k=1}^r \sum _{s\in \mathcal C ^k}\frac{1}{[n]_k}\mathbf{E}\big |h(\mathbf{C}+e_k)-h(\mathbf{C}) + h(\mathbf{Y}_s)-h(\mathbf{Y}_s+e_k)\big | \\&\quad + \sum _{k=1}^r\sum _{s\in \mathcal C ^k}\left|\frac{1}{[n]_k}-p_s\right| \mathbf{E}\big |h(\mathbf{Y}_s)-h(\mathbf{Y}_s+e_k)\big |. \end{aligned}$$

By (12) and (13), respectively,

$$\begin{aligned} \big |h(\mathbf{Y}_s)-h(\mathbf{Y}_s+e_k)\big |&\le 1,\\ \big |h(\mathbf{C}+e_k)-h(\mathbf{C}) + h(\mathbf{Y}_s)-h(\mathbf{Y}_s+e_k)\big |&\le \left||\mathbf{C}-\mathbf{Y}_s \right||_1. \end{aligned}$$

Hence

$$\begin{aligned} \big |\mathbf{E}[\mathcal A h(\mathbf{C})]\big |&\le \sum _{k=1}^r \sum _{s\in \mathcal C ^k}\frac{1}{[n]_k}\mathbf{E}\left||\mathbf{C}-\mathbf{Y}_s \right||_1 +\sum _{k=1}^r\sum _{s\in \mathcal C ^k}\left|\frac{1}{[n]_k}-p_s\right|\\&\le \sum _{k=1}^r \sum _{s\in \mathcal C ^k}\frac{1}{[n]_k}\mathbf{E}\left||\mathbf{C}-\mathbf{Y}_s \right||_1 +\sum _{k=1}^r|\mathcal C ^k|\frac{k^2}{2n[n]_k}\\&= \sum _{k=1}^r \sum _{s\in \mathcal C ^k}\frac{1}{[n]_k}\mathbf{E}\left||\mathbf{C}-\mathbf{Y}_s \right||_1 +O\left(\frac{r(2d-1)^r}{n}\right). \end{aligned}$$

Theorem 11 then follows from the following lemma:

Lemma 13

There exists an absolute constant \(C_3\) with the following property. For any \(1\le k\le r\) and \(s\in \mathcal C ^k\), let \(\mathbf{Y}_s\) be distributed as in (14). There is a coupling of \(\mathbf{C}\) and \(\mathbf{Y}_s\) such that for all \(n,\,k\), and \(d\ge 2\) satisfying \(k<n^{1/6}\),

$$\begin{aligned} \mathbf{E}\left||\mathbf{C}-\mathbf{Y}_s \right||_1\le \frac{C_3r(2d-1)^r}{n} \end{aligned}$$
(15)

Proof

This proof is nearly identical to that of Lemma 10. We construct as before the graph \(G_n^{\prime }\) and the random variables \(F_t^{\prime }\) for \(t\in \mathcal C ^i,\,1\le i\le r\). Then \(\mathbf{Y}_s\) can be defined in the natural way as

$$\begin{aligned} \mathbf{Y}_s = \left(\sum _{t\in \mathcal C ^1}F^{\prime }_t,\, \ldots ,\, \sum _{t\in \mathcal C ^{k-1}}F^{\prime }_t,\, \sum _{\begin{matrix} t\in \mathcal C ^k\\ t\ne s \end{matrix}}F^{\prime }_t,\, \sum _{t\in \mathcal C ^{k+1}}F^{\prime }_t,\, \ldots , \sum _{t\in \mathcal C ^{r}}F^{\prime }_t\right). \end{aligned}$$

We define \(\mathcal C ^i_{-1},\ldots ,\mathcal C ^i_{(i-1)\wedge k}\) as before, and it remains true that \(F^{\prime }_t\ge F_t\) if \(t\in \mathcal C ^i_j\) for \(j\ge 0\), and \(F^{\prime }_t=0\) if \(t\in \mathcal C ^i_{-1}\). Doing the calculation just as in (11),

$$\begin{aligned} \mathbf{E}\left||\mathbf{C}-\mathbf{Y}_s \right||_1 \le \sum _{i=1}^r \left( \sum _{t\in \mathcal C _{-1}^i}p_t + \sum _{t\in \mathcal C _0^i}(p_t^{\prime }-p_t)+ \sum _{j=1}^{(i-1)\wedge k}\sum _{t\in \mathcal C _j^i}p_t^{\prime } \right) + p_s. \end{aligned}$$

Nearly identical calculations as in Lemma 10 show that

$$\begin{aligned} \sum _{t\in \mathcal C _{-1}^i}p_t&= O\left(\frac{k(2d-1)^{i-1}}{n}\right),\\ \sum _{t\in \mathcal C _0^i}(p_t^{\prime }-p_t)&= O\left(\frac{i(2d-1)^i}{n}\right),\\ \sum _{t\in \mathcal C _j^i}p_t^{\prime }&= O\left(\frac{k(2d-1)^{i-j}}{n}\right), \end{aligned}$$

which completes the proof. \(\square \)

 

3.2 Non-backtracking walks in random regular graphs

We now seek to transfer our results on cycles to closed non-backtracking walks. Note that we consider \(G_n\) as an undirected graph when we discuss walks on it. A closed non-backtracking walk is one that begins and ends at the same vertex and never follows an edge and then immediately retraces that same edge backwards. Let \(\text{ NBW}_{k}^{(n)}\) denote the number of closed non-backtracking walks of length \(k\) on \(G_n\).

If the last step of a closed non-backtracking walk is anything other than the reverse of the first step, we say that the walk is cyclically non-backtracking (Fig. 5). Cyclically non-backtracking walks on \(G_n\) are exactly the closed non-backtracking walks whose words are cyclically reduced. Cyclically non-backtracking walks are easier to analyze than plain non-backtracking walks because every cyclic and inverted cyclic shift of a cyclically non-backtracking walk remains cyclically non-backtracking. Let \(\mathrm{CNBW}_{k}^{(n)}\) denote the number of closed cyclically non-backtracking walks of length \(k\) on \(G_n\).

These notions sometimes go by different names. In [25], non-backtracking walks are called irreducible, and \(\text{ NBW}_{k}^{(n)}\) is called \(\text{ IrredTr}_k(G)\). Cyclically non-backtracking walks are called strongly irreducible, and \(\mathrm{CNBW}_{k}^{(n)}\) is called \(\text{ SIT}_k(G)\).

Fig. 5

The walk \(1\rightarrow 2\rightarrow 3\rightarrow 4\rightarrow 5\rightarrow 2\rightarrow 1\) is non-backtracking, but not cyclically non-backtracking. Note that such walks have a “lollipop” shape

Recall that \((C_{k}^{(\infty )};\ k\ge 1)\) are independent Poisson random variables, with \(C_{k}^{(\infty )}\) having mean \(a(d,k)/2k\). Define

$$\begin{aligned} \mathrm{CNBW}_{k}^{(\infty )} = \sum _{j|k} 2jC_{j}^{(\infty )}. \end{aligned}$$
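Unravelling the divisor sum, \(\mathbf{E}\big [\mathrm{CNBW}_{k}^{(\infty )}\big ]=\sum _{j|k}a(d,j)\); the following illustrative Python lines (names are ours) compute this limiting mean.

```python
def a(d, k):
    """Number of cyclically reduced words of length k, from (1)."""
    return (2 * d - 1) ** k + (2 * d - 1 if k % 2 == 0 else 1)

def mean_cnbw_limit(d, k):
    """E[CNBW_k^infty]: each divisor j of k contributes 2j * a(d,j)/(2j) = a(d,j)."""
    return sum(a(d, j) for j in range(1, k + 1) if k % j == 0)

print(mean_cnbw_limit(2, 6))   # divisors 1, 2, 3, 6 contribute a(2,1) + a(2,2) + a(2,3) + a(2,6)
```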

For any cycle in \(G_n\) of length \(j|k\), we obtain \(2j\) cyclically non-backtracking walks of length \(k\) by choosing a starting point and direction and then walking around the cycle repeatedly. We start by decomposing \(\mathrm{CNBW}_{k}^{(n)}\) into these walks plus the remaining “bad” walks that are not repeated cycles. We denote the number of bad walks by \(B_{k}^{(n)}\), giving us

$$\begin{aligned} \mathrm{CNBW}_{k}^{(n)}= \sum _{j|k}2jC_{j}^{(n)} + B_k^{(n)}. \end{aligned}$$
(16)

The results of Sect. 3.1 give us a good understanding of \(C_{k}^{(n)}\). Our goal now is to analyze \(B_k^{(n)}\). Specifically, we will show that in the right asymptotic regime, it is likely to be zero, implying that \(\mathrm{CNBW}_{k}^{(n)}\) will converge to \(\mathrm{CNBW}_{k}^{(\infty )}\). We start with a more precise version of Lemma 7.

Lemma 14

With the setup of Lemma 7, suppose that \(\Gamma \) has \(k\) vertices and \(e\) edges, with \(e>k\). Then for all \(n>e\),

$$\begin{aligned} \mathbf{E}\big [X_{\Gamma }^{(n)}\big ]\le \frac{1}{[n-k]_{e-k}} \end{aligned}$$

Proof

This is apparent from (4). \(\square \)

Proposition 15

For all \(n\ge 2k\),

$$\begin{aligned} \mathbf{E}\big [B_k^{(n)}\big ] \le \sum _{i=1}^{k-1}\frac{a(d,k)k^{2i+2}}{[n-k]_i}. \end{aligned}$$

Proof

Any closed cyclically non-backtracking walk can be thought of as a trail, with repeated vertices in the trail now allowed. Such a walk is counted by \(B_k^{(n)}\) if and only if the graph of its category has more edges than vertices. Let \(\mathcal G _d\) consist of all graphs of categories of a closed trail of length \(k\) that have more edges than vertices. Then

$$\begin{aligned} B_k^{(n)} = \sum _{\Gamma \in \mathcal G _d}X_\Gamma ^{(n)}, \end{aligned}$$

using the notation of Sect. 3.1. To use Lemma 14, we classify the graphs in \(\mathcal G _d\) according to how many more edges than vertices they contain:

$$\begin{aligned} \mathbf{E}\big [B_k^{(n)}\big ] \le \sum _{i=1}^{\infty } \big |\{\Gamma \in \mathcal G _d:\Gamma \text{ has exactly } i \text{ more edges than vertices}\}\big |\frac{1}{[n-k]_i}. \end{aligned}$$

A graph in \(\mathcal G _d\) has at most \(k\) edges, so the terms with \(i\ge k\) in this sum are zero. By Lemma 18 in [34], for each word \(w\in \mathcal W \), the number of graphs in \(\mathcal G _d\) with word \(w\) and with \(i\) more edges than vertices is at most \(k^{2i+2}\), completing the proof. \(\square \)

It is worth noting that this proposition fails if the word “cyclically” is removed from the definition of \(B_k^{(n)}\). The problem is that walks that are non-backtracking but not cyclically non-backtracking can have as many vertices as edges.

Corollary 16

There is an absolute constant \(C_5\) such that for all \(n,\,r\), and \(d\ge 2\),

$$\begin{aligned} \mathbf{P}[B_{k}^{(n)} > 0\ \text{ for some }\ k\le r]\le \frac{C_5r^4(2d-1)^{r}}{n}. \end{aligned}$$

Proof

Bounding the expression from Proposition 15 by a geometric series,

$$\begin{aligned} \mathbf{E}\big [B_{r}^{(n)}\big ] \le \frac{a(d,r)r^4}{n-r} \frac{n-2r}{n-2r-r^2}. \end{aligned}$$

If \(r\ge n^{1/4}\), then \(r^4(2d-1)^r/n>1\), and the corollary is trivially true for any \(C_5\ge 1\). Thus we may assume that \(r<n^{1/4}\). In this case, the expression \((n-2r)/(n-2r-r^2)\) is bounded by an absolute constant. This and (1) imply that for some constant \(C_4\),

$$\begin{aligned} \mathbf{E}\big [B_{r}^{(n)}\big ] \le \frac{C_4r^4(2d-1)^{r}}{n}. \end{aligned}$$

Since \(B_{k}^{(n)}\) is integer-valued,

$$\begin{aligned} \mathbf{P}[B_{k}^{(n)} > 0\,\text{ for} \text{ some}\,k\le r]&\le \sum _{k=1}^{r}\mathbf{P}[B_{k}^{(n)} > 0]\le \sum _{k=1}^{r}\mathbf{E}[B_{k}^{(n)}]\\&\le \sum _{k=1}^{r}\frac{C_4k^4(2d-1)^k}{n} \le \frac{C_5r^4(2d-1)^{r}}{n} \end{aligned}$$

for some choice of the constant \(C_5\). \(\square \)

The following fact follows directly from the definition of total variation distance, and we omit its proof.

Lemma 17

Let \(X\) and \(Y\) be random variables on a metric space \(S\), and let \(T\) be any metric space. For any measurable \(f:S\rightarrow T\),

$$\begin{aligned} d_{TV}(f(X),f(Y))\le d_{TV}(X,Y). \end{aligned}$$

It is now straightforward to give a result on non-backtracking walks analogous to Theorem 11.

Proposition 18

There is a constant \(C_6\) such that for all \(n,\,r\), and \(d\ge 2\),

$$\begin{aligned} d_{TV}\left( \big ( \mathrm{CNBW}_{k}^{(n)};\ 1\le k\le r \big ), \big ( \mathrm{CNBW}_{k}^{(\infty )};\ 1\le k\le r \big )\right) \le \frac{C_6(2d-1)^{2r}}{n}. \end{aligned}$$

Proof

We start by recalling the decomposition of \(\mathrm{CNBW}_{k}^{(n)}\) into good and bad walks given in (16). Let \(G_{k}^{(n)}=\sum _{j|k}2jC_{j}^{(n)}\), so that \(\mathrm{CNBW}_{k}^{(n)}=G_{k}^{(n)}+B_{k}^{(n)}\). By Lemma 17 and Theorem 11,

$$\begin{aligned}&d_{TV} \left( \big ( G_{k}^{(n)};\ 1\le k\le r \big ), \big ( \mathrm{CNBW}_{k}^{(\infty )};\ 1\le k\le r \big ) \right)\nonumber \\&\quad \le d_{TV} \left(\big (C_{k}^{(n)};\ 1\le k\le r\big ), \big (C_{k}^{(\infty )};\ 1\le k\le r\big )\right)\nonumber \\&\quad \le \frac{C_2(2d-1)^{2r}}{n}. \end{aligned}$$
(17)

Then for any \(A\subset \mathbb Z _+^r\),

$$\begin{aligned}&\mathbf{P} \left[\big ( \mathrm{CNBW}_{k}^{(n)};\ 1\le k\le r \big )\in A \right]-\mathbf{P} \left[\big ( \mathrm{CNBW}_{k}^{(\infty )};\ 1\le k\le r \big )\in A \right] \\&\quad \le \mathbf{P} \left[\big ( G_{k}^{(n)};\ 1\le k\le r \big )\in A \right]+\mathbf{P} \left[\bigcup _{k=1}^r\big \{B_{k}^{(n)}>0\big \} \right]\\&\qquad -\mathbf{P} \left[\big ( \mathrm{CNBW}_{k}^{(\infty )};\ 1\le k\le r \big )\in A \right]\le \frac{C_2(2d-1)^{2r}}{n}+\frac{C_5r^4(2d-1)^r}{n} \end{aligned}$$

by (17) and Corollary 16. Since \(d \ge 2\) and thus \(2d-1 \ge 3\), the first term dominates the second for all but at most finitely many values of \(r\), and this exceptional set of values is bounded independently of \(n\) and \(d\). Therefore there exists a constant \(C_6\) satisfying the conditions of the proposition. \(\square \)

Corollary 19

For any fixed \(r\) and \(d\ge 2\),

$$\begin{aligned} (\mathrm{CNBW}_{1}^{(n)},\ldots ,\mathrm{CNBW}_{r}^{(n)})\,\mathop {\longrightarrow }\limits ^{\mathcal{L }}\,(\mathrm{CNBW}_{1}^{(\infty )},\ldots ,\mathrm{CNBW}_{r}^{(\infty )}) \end{aligned}$$

as \(n\rightarrow \infty \).

To achieve a version of the above corollary that holds when \(d\) grows, we need to center and scale our random variables \(\mathrm{CNBW}_{k}^{(n)}\).

Proposition 20

Let \(r\) be fixed, and suppose that \(d=d(n)\rightarrow \infty \) as \(n\rightarrow \infty \), and that \((2d-1)^{2r}=o(n)\). Let \(\widetilde{\mathrm{CNBW}}_{k}^{(n)}=(2d-1)^{-k/2} (\mathrm{CNBW}_{k}^{(n)}-\mathbf{E}[\mathrm{CNBW}_{k}^{(\infty )}])\). Let \(Z_1,\ldots ,Z_r\) be independent normal random variables with \(\mathbf{E}Z_k=0\) and \(\mathbf{E}Z_k^2 = 2k\). Then as \(n\rightarrow \infty \),

$$\begin{aligned} \big (\widetilde{\mathrm{CNBW}}_{1}^{(n)}, \ldots ,\ \widetilde{\mathrm{CNBW}}_{r}^{(n)}\big )\,\mathop {\longrightarrow }\limits ^{\mathcal{L }}\,(Z_1,\ldots ,Z_r). \end{aligned}$$

Proof

Let \(X_{k}^{(n)}=(2d-1)^{-k/2} (\mathrm{CNBW}_{k}^{(\infty )}-\mathbf{E}[\mathrm{CNBW}_{k}^{(\infty )}])\). We note that \(\mathrm{CNBW}_{k}^{(\infty )}\) depends on \(d\) (and hence on \(n\)), although we have suppressed this dependence from the notation. By Proposition 18 and Lemma 17, the total variation distance between the laws of \(\big (\widetilde{\mathrm{CNBW}}_{k}^{(n)};\ 1\le k\le r\big )\) and \(\big (X_{k}^{(n)};\ 1\le k\le r\big )\) converges to zero as \(n\rightarrow \infty \). Hence it suffices to show that \(\big (X_{k}^{(n)};\ 1\le k\le r\big )\,\mathop {\longrightarrow }\limits ^{\mathcal{L }}\,(Z_1,\ldots ,Z_r)\) as \(n\rightarrow \infty \).

Let \(\lambda _k=a(d,k)/2k\) as in Theorem 11. We can write \(X_{k}^{(n)}\) as

$$\begin{aligned} X_{k}^{(n)}= 2k(2d-1)^{-k/2}\big (C_{k}^{(\infty )}- \lambda _k\big )+ (2d-1)^{-k/2}\sum _{\begin{matrix} j|k\\ j<k \end{matrix}}\big (2jC_{j}^{(\infty )} -a(d,j)\big ). \end{aligned}$$

Using (1), it is a straightforward calculation to show that as \(n\rightarrow \infty \),

$$\begin{aligned} \left(2k(2d-1)^{-k/2}\big (C_{k}^{(\infty )}- \lambda _k\big );\ 1\le k\le r\right) \,\mathop {\longrightarrow }\limits ^{\mathcal{L }}\,(Z_1,\ldots ,Z_r). \end{aligned}$$

Hence we need only show that for all \(k\le r\),

$$\begin{aligned} (2d-1)^{-k/2}\sum _{\begin{matrix} j|k\\ j<k \end{matrix}}\big (2jC_{j}^{(\infty )} - a(d,j)\big ) \,\mathop {\longrightarrow }\limits ^{pr}\,0. \end{aligned}$$

We calculate

$$\begin{aligned} \mathbf{Var} \left[(2d-1)^{-k/2}\sum _{\begin{matrix} j|k\\ j<k \end{matrix}}\left(2jC_{j}^{(\infty )} - a(d, j)\right)\right] = (2d-1)^{-k}\sum _{\begin{matrix} j|k\\ j<k \end{matrix}}2ja(d, j), \end{aligned}$$

and the statement follows by (1) and Chebyshev’s inequality. \(\square \)
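
To see the normal limit concretely, the following minimal sketch (ours; the function names are hypothetical) samples the leading term \(2k(2d-1)^{-k/2}\big (C_{k}^{(\infty )}-\lambda _k\big )\) from the decomposition used in the proof and checks that its variance approaches \(2k\) as \(d\) grows.

```python
import numpy as np

def a(d, k):
    """a(d, k) from the introduction."""
    return (2 * d - 1) ** k + (2 * d - 1 if k % 2 == 0 else 1)

def scaled_leading_term(d, k, n_samples=200_000, seed=1):
    """Sample 2k (2d-1)^{-k/2} (C_k^(infinity) - lambda_k) with lambda_k = a(d, k)/(2k);
    its variance is 2k * a(d, k) / (2d-1)^k, which tends to 2k as d -> infinity."""
    rng = np.random.default_rng(seed)
    lam = a(d, k) / (2 * k)
    c = rng.poisson(lam, size=n_samples)
    return 2 * k * (2 * d - 1) ** (-k / 2) * (c - lam)

if __name__ == "__main__":
    k = 3
    for d in (2, 5, 20):
        x = scaled_leading_term(d, k)
        print(f"d = {d:2d}: empirical variance = {x.var():.3f}   (limit 2k = {2 * k})")
```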

The remaining results in this section refer to the weak convergence set-up in Sect. 2.

Theorem 21

Suppose that \(d\) is fixed, that \(r_n\rightarrow \infty \), and that

$$\begin{aligned} (2d-1)^{2r_n} = o(n). \end{aligned}$$
(18)

Let

$$\begin{aligned} \Theta _k=\mathbf{E}\Big [\big (\mathrm{CNBW}_{k}^{(\infty )}\big )^2\Big ]=\sum _{j|k} 2j a(d, j) + \left( \sum _{j|k} a(d,j) \right)^2. \end{aligned}$$

Let \((b_k)_{k\in \mathbb N }\) be any fixed positive summable sequence. Define the weights of Sect. 2 by setting

$$\begin{aligned} \omega _k=b_k/\Theta _k, \quad k \in \mathbb N . \end{aligned}$$

Let \(P_n\) be the law of the sequence \((\mathrm{CNBW}_{1}^{(n)},\ldots ,\mathrm{CNBW}_{r_n}^{(n)},0,0, \ldots )\). Then \(\{P_n\}\), considered as a sequence in \(\mathbb P (X)\), converges weakly to the law of the random vector \(\big (\mathrm{CNBW}_{k}^{(\infty )};\, k\in \mathbb N \big )\).

Proof

We first claim that the random vector \((\mathrm{CNBW}_{k}^{(\infty )};\, k \in \mathbb N )\) almost surely lies in \(\mathbf{L}^2(\underline{\omega })\). This follows by a deliberate choice of \(\underline{\omega }\):

$$\begin{aligned} \mathbf{E}\sum _{k=1}^\infty \left(\mathrm{CNBW}_{k}^{(\infty )}\right)^2 \omega _k = \sum _{k=1}^\infty \Theta _k \omega _k = \sum _{k=1}^\infty b_k < \infty , \end{aligned}$$

which proves finiteness almost surely. The computation of \(\Theta _k\) is straightforward.

By Corollary 19, we know that all subsequential weak limits of \(P_n\) have the same finite-dimensional distributions as \((\mathrm{CNBW}_{k}^{(\infty )}; k\in \mathbb N )\), and by Lemma 5, they are in fact identical to the law of \((\mathrm{CNBW}_{k}^{(\infty )}; k\in \mathbb N )\). Thus it suffices to show that \(\{P_1,P_2,\ldots \}\) is tight. To do this we will apply Lemma 3 by choosing a suitable infinite cube.

In other words, we must show that given any \(\epsilon >0\), there exists an element \(\underline{a}=(a_m)_{m\in \mathbb N } \in \mathbf{L}^2(\underline{\omega })\) such that

$$\begin{aligned} \sup _n\mathbf{P} \left[\cup _{k=1}^{r_n} \left\{ \mathrm{CNBW}_{k}^{(n)}> a_k \right\} \right]<\epsilon . \end{aligned}$$
(19)

In fact, our choice of \(\underline{a}\) is

$$\begin{aligned} a_k = (\alpha +2) \mathbf{E}\left( \mathrm{CNBW}_{k}^{(\infty )}\right)= (\alpha + 2) \sum _{j|k} a(d, j), \end{aligned}$$

for some positive \(\alpha \) determined by \(\epsilon \). Note that, by an obvious calculation, \(\underline{a}\in \mathbf{L}^2(\underline{\omega })\).

By Proposition 18, for any \(\eta >0\),

$$\begin{aligned} \mathbf{P} \left[\cup _{k=1}^{r_n} \left\{ \mathrm{CNBW}_{k}^{(n)}> a_k \right\} \right] \le \mathbf{P} \left[ \cup _{k=1}^{r_n}\left\{ \mathrm{CNBW}_{k}^{(\infty )}> a_k \right\} \right]+\eta \end{aligned}$$
(20)

for all sufficiently large \(n\). Now, we apply the union bound

$$\begin{aligned} \sup _n \mathbf{P} \left[\cup _{k=1}^{r_n} \left\{ \mathrm{CNBW}_{k}^{(\infty )}> a_k \right\} \right] \le \sum _{k=1}^\infty \mathbf{P} \left[ \mathrm{CNBW}_{k}^{(\infty )} > a_k\right] \end{aligned}$$
(21)

and bound the right side by a simple large deviation estimate.

We start with the decomposition

$$\begin{aligned} \mathrm{CNBW}_{k}^{(\infty )} = \sum _{j|k} 2j C_{j}^{(\infty )}, \end{aligned}$$
(22)

where \(\{C_{j}^{(\infty )}\}\) are independent Poisson random variables with mean \(a(d, j)/2j\). Thus, for any \(\lambda >0\), the exponential moments are easy to derive:

$$\begin{aligned} \mathbf{E}\left( e^{\lambda \mathrm{CNBW}_{k}^{(\infty )}} \right)&= \prod _{j|k} E\left( e^{\lambda 2j C_{j}^{(\infty )} } \right)= \prod _{j|k} \exp \left\{ \frac{a(d, j)}{2j}\left( e^{2\lambda j} - 1 \right) \right\} \\&= \exp \left[ \sum _{j|k} a(d, j)\frac{e^{2\lambda j }-1}{2j} \right]. \end{aligned}$$

Hence, by Markov’s inequality, we get

$$\begin{aligned} \mathbf{P} \left( \mathrm{CNBW}_{k}^{(\infty )} > a_k \right)&\le e^{-\lambda a_k} \mathbf{E} \left( e^{\lambda \mathrm{CNBW}_{k}^{(\infty )}} \right)\\&\le \exp \left[ \sum _{j|k} a(d,j) \left( \frac{e^{2\lambda j } -1 }{2j} - (\alpha +2)\lambda \right) \right]. \end{aligned}$$

We now fix \(\lambda = \log 2/ (2k)\). For \(j\le k\) we then have \(2\lambda j\le \log 2\), and since \((e^x-1)/x\) is increasing in \(x>0\), it follows that \((e^{2\lambda j}-1)/(2j)\le \lambda /\log 2<2\lambda \); that is,

$$\begin{aligned} \frac{e^{2\lambda j}-1}{2j} < 2\lambda , \quad \text{ for} \text{ all}\,j\le k. \end{aligned}$$

Hence,

$$\begin{aligned} \mathbf{P} \left( \mathrm{CNBW}_{k}^{(\infty )} > a_k \right) \le \exp \left[ -\frac{\alpha \log 2}{2k} \sum _{j|k} a(d, j) \right] \le 2^{ -\alpha (2d-1)^{k}/2k}. \end{aligned}$$

The above expression is clearly summable in \(k\), and thus from (21) we get

$$\begin{aligned} \sup _n \mathbf{P} \left[\cup _{k=1}^{r_n} \left\{ \mathrm{CNBW}_{k}^{(\infty )}> a_k \right\} \right] \le \sum _{k=1}^\infty 2^{ -\alpha (2d-1)^{k}/2k}. \end{aligned}$$

The right side can be made as small as we want by choosing a large enough \(\alpha \). This is enough to establish (19). \(\square \)

We now prove a corresponding theorem when \(d\) is growing with \(n\). Let \(\mu _k(d)\) denote \(\mathbf{E}\big [\mathrm{CNBW}_{k}^{(\infty )}\big ]\) emphasizing its dependence on \(d\). We define

$$\begin{aligned} \widetilde{N}_{k}^{(n)} = (2d-1)^{-k/2}\big (\mathrm{CNBW}_{k}^{(n)}-\mu _k(d)\big ). \end{aligned}$$
(23)

Theorem 22

Suppose that \(d=d(n)\rightarrow \infty \) and \(r_n\rightarrow \infty \) as \(n\rightarrow \infty \). Suppose that

$$\begin{aligned} (2d-1)^{2r_n} = o(n). \end{aligned}$$

We define the weights \(\underline{\omega }\) by setting \(\omega _k=b_k/(k^2\log k)\), where \((b_k)_{k\in \mathbb N }\) is any fixed positive summable sequence. Let \(P_n\) be the law of the sequence \((\widetilde{N}_{1}^{(n)},\ldots ,\widetilde{N}_{r_n}^{(n)},0,0, \ldots )\). Let \(Z_1,Z_2,\ldots \) be independent normal random variables with \(\mathbf{E}Z_k=0\) and \(\mathbf{E}Z_k^2=2k\). Then \(P_n\), considered as an element of \(\mathbb P (X)\), converges weakly to the law of the random vector \((Z_k; k\in \mathbb N )\).

To proceed with the proof we will need a lemma on measure concentration. We will use a modified logarithmic Sobolev inequality that can be found in the Berlin notes of Ledoux [31]. For the convenience of the reader we reproduce (a slight modification of) the statement of Theorem 5.5 in [31, page 71] for a product measure. Note that although the statement of Theorem 5.5 is written for an iid product measure, its proof goes through even when the coordinate laws are different (but independent). In fact, the crucial step is the tensorization of entropy ([31, Proposition 2.2]), which holds in general.

Lemma 23

For \(n\in \mathbb N \), let \(\mu _1, \mu _2, \ldots , \mu _n\) be \(n\) probability measures on \(\mathbb N \). For functions \(f\) on \(\mathbb N \), define \(Df(x)= f(x+1) - f(x)\) to be the discrete derivative. Define the entropy of \(f\) under \(\mu _i\) by

$$\begin{aligned} \text{ Ent}_{\mu _i}(f)= \mathbf{E}_{\mu _i}\left( f \log f \right) - \mathbf{E}_{\mu _i} (f) \log \mathbf{E}_{\mu _i}\left( f \right)\!. \end{aligned}$$

Assume that there exist two positive constants \(c\) and \(d\) such that, for every \(\lambda >0\) and every \(f\) on \(\mathbb N \) with \(\sup _x\left|Df \right| \le \lambda \), one has

$$\begin{aligned} \text{ Ent}_{\mu _i}\left( e^f\right) \le c e^{d\lambda } \mathbf{E}_{\mu _i}\left( \left|D f \right|^2 e^f \right). \end{aligned}$$

Let \(\mu \) denote the product measure of the \(\mu _i\)’s. Let \(F\) be a function on \(\mathbb N ^n\) such that for every \(x\in \mathbb N ^n\),

$$\begin{aligned} \sum _{i=1}^n \left|F(x+e_i) - F(x) \right|^2 \le \alpha ^2,\quad \text{ and}\quad \max _{1\le i \le n} \left|F(x+e_i) - F(x) \right| \le \beta . \end{aligned}$$

Then \(\mathbf{E}_\mu (\left|F \right|) < \infty \) and, for every \(r\ge 0\),

$$\begin{aligned} \mu \left( F \ge \mathbf{E}_\mu (F) + r \right) \le \exp \left( -\frac{r}{2d\beta } \log \left(1 + \frac{\beta dr}{4 c\alpha ^2} \right) \right). \end{aligned}$$

 

Proof of Theorem 22

The proof is similar in spirit to the proof of Theorem 21. As in that proof, the limiting measure is supported on \(\mathbf{L}^2(\underline{\omega })\). By Proposition 20 and Lemma 5, we need only show that the family \(\{P_1,P_2,\ldots \}\) is tight. As in Theorem 21, we need to choose a suitable infinite cube.

Choose \(\epsilon >0\). Define

$$\begin{aligned} a_k = \alpha k \sqrt{\log k}, \end{aligned}$$

for some positive \(\alpha > 1\) depending on \(\epsilon \). Then \(\underline{a} \in \mathbf{L}^2(\underline{\omega })\).

We need to show that, for a suitable choice of \(\alpha \),

$$\begin{aligned} \sup _n\mathbf{P}\left[\cup _{k=1}^{r_n}\left\{ \left|\widetilde{N}_{k}^{(n)} \right| > a_k \right\} \right]<\epsilon . \end{aligned}$$

By Lemma 17 and Proposition 18, for any \(\eta >0\),

$$\begin{aligned} \mathbf{P}\left[\cup _{k=1}^{r_n}\left\{ \left|\widetilde{N}_{k}^{(n)} \right| > a_k \right\} \right]\!<\!\mathbf{P}\left[\cup _{k=1}^{r_n}\left\{ \left|\mathrm{CNBW}_{k}^{(\infty )} \!-\! \mu _k(d) \right| > a_k (2d-1)^{k/2} \right\} \right]\!+\!\eta \nonumber \\ \end{aligned}$$
(24)

for all sufficiently large \(n\).

Note as before that \(\mathrm{CNBW}_{k}^{(\infty )}\) depends on \(d\) (and hence on \(n\)).

Proceeding as before, we need to estimate

$$\begin{aligned} \mathbf{P}\left( \left|\mathrm{CNBW}_{k}^{(\infty )} - \mu _k(d) \right| > a_k (2d-1)^{k/2} \right) \end{aligned}$$

for our choice of \(a_k\).

Let \(\mathrm{Poi}(\theta )\) denote as before the Poisson law with mean \(\theta \). We will denote expectation with respect to \(\mathrm{Poi}(\theta )\) by \(\mathbf{E}_{\pi _{\theta }}\). As shown in Corollary 5.3 in [31, page 69], \(\mathrm{Poi}(\theta )\) satisfies the following modified logarithmic Sobolev inequality: for any \(f\) on \(\mathbb N \) with strictly positive values

$$\begin{aligned} \text{ Ent}_{\pi _{\theta }}(f) \le \theta \mathbf{E}_{\pi _\theta } \left( \frac{1}{f} \left|Df \right|^2 \right). \end{aligned}$$
(25)

Here \(\text{ Ent}_{\pi _\theta }(f)\) refers to the entropy of \(f\) under \(\mathrm{Poi}(\theta )\).

Let now \(f\) on \(\mathbb N \) satisfy \(\sup _{x} \left|Df(x) \right| \le \lambda \). By eqn. (5.16) in [31, page 70], (25) implies that \(\mathrm{Poi}(\theta )\) satisfies the inequality

$$\begin{aligned} \text{ Ent}_{\pi _\theta }\left( e^f\right) \le C e^{2\lambda } \mathbf{E}_{\pi _\theta }\left( \left|Df \right|^2 e^f \right), \quad \text{ for} \text{ any}\,C\ge \theta . \end{aligned}$$
(26)

Now fix some \(k\in \mathbb N \) and consider the product measure of the random vector \((C_{j}^{(\infty )},\; j|k )\). Each coordinate satisfies inequality (26) and one can take the common constant \(C\) to be \(a(d,k)/2k\).

We apply Lemma 23 to the function \(F(\underline{x}) = \sum _{j|k} 2j x_j\). It is straightforward to see that one can take \(\alpha ^2= 4k^3\) and \(\beta =2k\) in Lemma 23 (this \(\alpha \) is the constant from Lemma 23, not the one in the definition of \(a_k\)). Thus, we get the following tail estimate for any \(r >0\):

$$\begin{aligned} \mathbf{P}\left( F > \mathbf{E}(F) + r \right) \le \exp \left( -\frac{r}{8k}\log \left( 1 + \frac{4k r}{4 C 4k^3 } \right) \right). \end{aligned}$$

Replacing \(F\) by \(-F\) we obtain a two-sided bound

$$\begin{aligned} \mathbf{P}\left( \left|F - \mathbf{E}(F) \right| > r \right) \le 2\exp \left( -\frac{r}{8k}\log \left( 1 + \frac{4k r}{16 C k^3 } \right) \right). \end{aligned}$$

Hence we have shown that for any \(r >0\), the following estimate holds

$$\begin{aligned} \mathbf{P}\left( \left|\mathrm{CNBW}_{k}^{(\infty )} - \mu _k(d) \right| > r \right)&\le 2\exp \left( -\frac{r}{8k}\log \left( 1 + \frac{8k^2 r}{16 a(d,k) k^3 } \right) \right)\\&= 2\exp \left( -\frac{r}{8k}\log \left( 1 + \frac{r}{ 2a(d,k) k} \right) \right). \end{aligned}$$

Recall from (1) that \(a(d,k)\sim (2d-1)^k\). Therefore

$$\begin{aligned}&\mathbf{P}\left( \left|\mathrm{CNBW}_{k}^{(\infty )} - \mu _k(d) \right| > a_k (2d-1)^{k/2} \right)\\&\quad \le 2\exp \left( -\frac{a_k (2d-1)^{k/2}}{8k}\log \left( 1 + \frac{a_k}{ 2(2d-1)^{k/2} k} \right) \right). \end{aligned}$$

Now, \(\log (1+x) \ge x/2\) for all \(0\le x\le 1\). Using this simple bound we get that for all \((k,d)\) such that \(\alpha \sqrt{\log k} \le 2(2d-1)^{k/2}\), we have

$$\begin{aligned} \mathbf{P}\left( \left|\mathrm{CNBW}_{k}^{(\infty )} - \mu _k(d) \right| > a_k (2d-1)^{k/2} \right)&\le 2 \exp \left( -\frac{a_k^2}{32 k^2} \right)\\&\le 2\exp \left( -\frac{\alpha ^2 k^2 \log k}{32 k^2}\right)=2 k^{-\alpha ^2/32}. \end{aligned}$$

The right side is summable whenever \(\alpha ^2 > 32\). The rest of the proof follows just as in Theorem 21. \(\square \)

 

4 Spectral concentration

The problem of estimating the spectral gap of a \(d\)-regular graph has been approached primarily in two ways, the method of moments and the counting method of Kahn and Szemerédi, presented in [21]. The method of moments has been developed in the work of Broder and Shamir [11] and very extensively by Friedman [24, 25]. In his work, Friedman, relying on \(d\) being fixed independently of \(n\), developed extremely fine control over the magnitude of the second eigenvalue. On the other hand, in [21], Kahn and Szemerédi only show that the second largest eigenvalue has magnitude \(O(\sqrt{d}).\) While weaker than Friedman’s bound, their techniques readily extend to the case where \(d\) is allowed to grow as a function of \(n\); this observation has been made informally by others, and communicated to us by Vu and Friedman. Here we will formalize it, and present the Kahn–Szemerédi argument in the context of growing \(d\) to demonstrate the method’s validity, as well as to gain some control over the constants in the bound.

Specifically, we will prove

Theorem 24

For any \(m > 0,\) there is a constant \(C=C(m)\) and universal constants \(K\) and \(c\) so that

$$\begin{aligned} \mathbf{P}\left[ \exists i \ne 1 ~:~ |\lambda _i| \ge C \sqrt{d} \right] \le n^{-m} + K\exp (-cn). \end{aligned}$$

Further, the constant \(C\) may be taken to be \(36000 + 2400m.\)

In what follows, let \(M\) be the adjacency matrix for the \(2d\)-regular graph \(G_n\). Recall that this matrix can be realized by sampling independently and uniformly \(d\) permutation matrices \(A_1, A_2, \ldots , A_d\) and defining

$$\begin{aligned} M = A_1 + A_1^t + A_2 + A_2^t + \cdots + A_d + A_d^t. \end{aligned}$$
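
For readers who wish to experiment, the following sketch (an added illustration, not part of the proof; the function name is ours) builds \(M\) exactly as above from \(d\) iid uniform permutations and computes \(\max \{\lambda _2,|\lambda _n|\}\). For reference it also prints \(2\sqrt{2d-1}\), the Alon–Boppana/Ramanujan scale for \(2d\)-regular graphs; Theorem 24 only asserts a bound of order \(\sqrt{d}\).

```python
import numpy as np

def permutation_model_adjacency(n, d, seed=2):
    """M = sum_i (A_i + A_i^t) for d iid uniform permutations; a fixed point of a
    permutation becomes a self-loop and contributes 2 to the diagonal."""
    rng = np.random.default_rng(seed)
    M = np.zeros((n, n))
    for _ in range(d):
        p = rng.permutation(n)
        M[np.arange(n), p] += 1.0   # A_i
        M[p, np.arange(n)] += 1.0   # A_i^t
    return M

if __name__ == "__main__":
    n, d = 1000, 4
    M = permutation_model_adjacency(n, d)
    eigs = np.linalg.eigvalsh(M)             # sorted ascending; the top eigenvalue is 2d
    second = max(eigs[-2], abs(eigs[0]))     # max(lambda_2, |lambda_n|)
    print("lambda_1 (should be 2d)   :", round(eigs[-1], 6))
    print("max(lambda_2, |lambda_n|) :", round(second, 3))
    print("reference 2*sqrt(2d-1)    :", round(2 * np.sqrt(2 * d - 1), 3))
```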

The starting point is the variational characterization of the eigenvalues \(\lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _n\) of \(M\), which states that

$$\begin{aligned} \max \{ \lambda _2, |\lambda _n| \} = \sup _{\begin{matrix} w \perp \mathbf{1}\\ \Vert w\Vert =1 \end{matrix}} \left|w^t M w\right|\!. \end{aligned}$$

Additional flexibility is provided by replacing this symmetric version of the Rayleigh quotient by the asymmetric version,

$$\begin{aligned} \sup _{\begin{matrix} w,v \perp \mathbf{1}\\ \Vert v\Vert =\Vert w\Vert =1 \end{matrix}} |v^t M w|. \end{aligned}$$

The random variables \(v^tMw\), for fixed \(w\) and \(v\), are substantially more tractable than the supremum. To be able to work with these random variables instead of the supremum, we will pass to a finite set of vectors which approximate the sphere \(\mathcal S = \{w \perp \mathbf{1}~:~ \Vert w\Vert =1\}.\) More specifically, we will only consider those \(w\) and \(v\) lying on the subset of the lattice \(\mathcal T \) defined as

$$\begin{aligned} \mathcal T := \left\{ \frac{\delta z }{\sqrt{n}} ~:~ z \in \mathbb Z ^n, \Vert z\Vert ^2 \le \frac{n}{\delta ^2}, z \perp \mathbf{1}\right\} , \end{aligned}$$

for a fixed \(\delta > 0.\)

Vectors from \(\mathcal T \) approximate vectors from \(\mathcal S \) in the sense that every \(v \in (1-\delta )\mathcal S \) is a convex combination of points in \(\mathcal T .\) (See Lemma 2.3 of [22].) Thus

$$\begin{aligned} \frac{1}{(1-\delta )^2} \sup _{\begin{matrix} w,v \perp \mathbf{1}\\ \Vert v\Vert =\Vert w\Vert =1 \end{matrix}} \left|[1-\delta ]v^t M [1-\delta ]w\right| \le \frac{1}{(1-\delta )^2} \sup _{x,y \in \mathcal T } \left|x^tMy\right|\!. \end{aligned}$$

Furthermore, by a volume argument, it is possible to bound the cardinality of \(\mathcal T \) as

$$\begin{aligned} \left|\mathcal T \right| \left(\frac{\delta }{\sqrt{n}}\right)^n \le \text{ Vol}\left[ x \in \mathbb R ^n ~:~ \Vert x\Vert \le 1+\frac{\delta }{2} \right] = \frac{(1+\frac{\delta }{2})^n \sqrt{\pi }^n }{\Gamma (\frac{n}{2} + 1)}. \end{aligned}$$

Employing Stirling’s approximation, this shows

$$\begin{aligned} \left|\mathcal T \right| \le C \left[\frac{(1+\frac{\delta }{2})\sqrt{2e\pi }}{\delta } \right]^n \end{aligned}$$

for some universal constant \(C\).

The breakthrough of Kahn and Szemerédi was to realize that \(x^t M y\) can be controlled by virtue of a split into two types of terms. If \(x^tMy\) is written as a sum

$$\begin{aligned} x^t M y = \sum _{\begin{matrix} (u,v) \\ |x_uy_v| < \frac{\sqrt{d}}{n} \end{matrix}} x_uM_{uv}y_v + \sum _{\begin{matrix} (u,v) \\ |x_uy_v| \ge \frac{\sqrt{d}}{n} \end{matrix}} x_uM_{uv}y_v, \end{aligned}$$

then the contribution of the first sum turns out to be very nearly its mean because of the Lipschitz dependence of the sum on the edges of the graph. The contribution of the second sum turns out to never be too large for a very different reason: the number of edges between any two sets in the graph is on the same order as its mean. Following Feige and Ofek, for a fixed pair of vectors \((x,y) \in \mathcal T ^2,\) define the light couples \(\mathcal{L } = \mathcal{L }(x,y)\) to be all those ordered pairs \((u,v)\) so that \(|x_uy_v| \le \frac{\sqrt{d}}{n},\) and let the heavy couples \(\mathcal H \) be all those pairs that are not light.
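
The split itself is easy to compute. The following sketch (ours; light_heavy_split is a hypothetical helper, not from [21] or [22]) evaluates the light and heavy contributions to \(x^tMy\) for a pair of unit vectors orthogonal to \(\mathbf{1}\).

```python
import numpy as np

def light_heavy_split(M, x, y, d):
    """Split x^t M y into the contribution of light couples (|x_u y_v| < sqrt(d)/n)
    and of heavy couples (all remaining ordered pairs)."""
    n = len(x)
    outer = np.outer(x, y)
    light = np.abs(outer) < np.sqrt(d) / n
    prod = outer * M
    return float(prod[light].sum()), float(prod[~light].sum())

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, d = 1000, 4
    M = np.zeros((n, n))
    for _ in range(d):                       # same permutation-model construction as above
        p = rng.permutation(n)
        M[np.arange(n), p] += 1.0
        M[p, np.arange(n)] += 1.0
    x = rng.standard_normal(n); x -= x.mean(); x /= np.linalg.norm(x)   # x perp 1, ||x|| = 1
    y = rng.standard_normal(n); y -= y.mean(); y /= np.linalg.norm(y)
    light, heavy = light_heavy_split(M, x, y, d)
    print("light:", round(light, 3), "  heavy:", round(heavy, 3), "  sqrt(d):", round(np.sqrt(d), 3))
```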

4.1 Controlling the contribution of the light couples

Part of the advantage of having selected only the light couples is that their expected contribution is of the “correct” order, as the lemma below shows.

Lemma 25

$$\begin{aligned} \left| \mathbf{E}\sum _{ (u,v) \in \mathcal{L }} x_uM_{uv}y_v \right| \le 2\sqrt{d}. \end{aligned}$$

Proof

By symmetry, \(\mathbf{E}M_{uv}\) is simply equal to \(\frac{2d}{n},\) so that

$$\begin{aligned} \mathbf{E}\sum _{ (u,v) \in \mathcal{L }} x_uM_{uv}y_v = \frac{2d}{n}\sum _{ (u,v) \in \mathcal{L }} x_uy_v. \end{aligned}$$

Because the entries of \(x\) and the entries of \(y\) each sum to \(0\), the sum over light couples is equal in magnitude to the sum over heavy couples. Thus, it suffices to estimate

$$\begin{aligned} \left| \sum _{ (u,v) \in \mathcal H } x_uy_v \right|&\le \sum _{ (u,v) \in \mathcal H } \left| x_uy_v \right| = \sum _{ (u,v) \in \mathcal H } \frac{x_u^2y_v^2}{\left|x_uy_v\right|}\\&\le \frac{n}{\sqrt{d}}\sum _{ (u,v) \in \mathcal H }{x_u^2y_v^2},\quad \text{ by} \text{ the} \text{ defining} \text{ property} \text{ of} \text{ heavy} \text{ couples,}\\&\le \frac{n}{\sqrt{d}}. \end{aligned}$$

In the last step we recall that both \(\Vert x\Vert , \Vert y\Vert \le 1.\) \(\square \)

To show that not only the expectation, but the sum itself is of the correct order, we must prove a concentration estimate for this sum. For technical reasons, it is helpful if we deal with sums over fewer terms. To this end, define

$$\begin{aligned} A = A_1 + A_2 + \cdots + A_d. \end{aligned}$$

In terms of \(A\) it is enough to insist that for every \(x,y \in \mathcal T \)

$$\begin{aligned} \left|\sum _{ (u,v) \in \mathcal{L }} x_uA_{uv}y_v\right| \le t\sqrt{d} \end{aligned}$$

for then by symmetry,

$$\begin{aligned} \left|\sum _{ (u,v) \in \mathcal{L }} x_uM_{uv}y_v\right| \le 2t\sqrt{d}, \end{aligned}$$

for all \(x,y \in \mathcal T .\) As a further simplification, we will not prove a tail estimate for the whole quantity \(\sum _{ (u,v) \in \mathcal{L }} x_uA_{uv}y_v\); instead, fix an arbitrary collection \(U\) of vertices of size at most \(\lceil \frac{n}{2} \rceil .\) Having fixed this collection, we will show a tail estimate for \(\sum _{ (u,v) \in \mathcal{L } \cap U \times [n] } x_uA_{uv}y_v.\) This truncation is made to simplify a variance estimate (see (28)), and it might be possible to avoid it entirely.

Theorem 26

For every \(x,y \in \mathcal T \), and every \(U \subset [n]\) with \(|U| \le \lceil \frac{n}{2}\rceil ,\)

$$\begin{aligned} \mathbf{P}\left[ \left|\sum _{ (u,v) \in \mathcal{L } \cap U \times [n]} x_uA_{uv}y_v - \mathbf{E}x_uA_{uv}y_v \right| > t\sqrt{d} \right] \le C_0\exp \left( -\frac{nt^2}{C_1 + C_2t} \right) \end{aligned}$$

for some universal constants \(C_0,\,C_1\) and \(C_2.\) These constants can be taken as \(2,\,64\), and \(8/3\) respectively.

Proof

Let \({\tilde{\mathcal{L }}}\) be \(\mathcal{L } \cap U \times [n].\) We will estimate tail probabilities for \(\sum \nolimits _{ (u,v) \in {\tilde{\mathcal{L }}} } x_uA_{uv}y_v.\)

The main tool needed to establish this result is Freedman’s martingale inequality [23]. Let \(X_1, X_2, \ldots \) be martingale increments. Write \(\fancyscript{F}_k\) for the natural filtration induced by these increments, and define \(V_k = \mathbf{E}\left[ X_k^2 ~|~ \fancyscript{F}_{k-1} \right].\) If \(S_n\) is the partial sum \(S_n = \sum _{i=1}^n X_i\) (with \(S_0 = 0\)) and \(T_n\) is the sum \(T_n = \sum _{i=1}^n V_i\) (with \(T_0 = 0\)), then by analogy with the continuous case, one expects \(S_n\) to be a Brownian motion at time \(T_n\) (a discretization of the bracket process). The analogy requires, however, that the increments have some a priori bound. Namely, if \(|X_k| \le R,\)

$$\begin{aligned} \mathbf{P}\left[ \exists ~n \le \tau \,\text{ so} \text{ that}\,S_n \ge a\,\text{ and}\,T_n \le b \right] \le 2\exp \left( -\frac{a^2/2}{\frac{Ra}{3} + b} \right). \end{aligned}$$

Remark 27

The constants quoted here are slightly better than the constants that appear in Freedman’s original paper. This statement of the theorem follows from Proposition 2.1 of [23] and the calculus lemma

$$\begin{aligned} (1+u) \log ( 1 + u) - u \ge \frac{u^2/2}{1+u/3}, \end{aligned}$$

for \(u \ge 0.\)
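
One way to verify this calculus lemma (a short argument added here for completeness; it is not taken from [23]) is to set \(\varphi (u) = \left(1+\frac{u}{3}\right)\big [(1+u) \log ( 1 + u) - u\big ]-\frac{u^2}{2}\) and note that

$$\begin{aligned} \varphi (0)=\varphi ^{\prime }(0)=0 \quad \text{ and}\quad \varphi ^{\prime \prime }(u) = \frac{2}{3}\left[ \log (1+u) - \frac{u}{1+u}\right] \ge 0 \quad \text{ for}\,u \ge 0, \end{aligned}$$

so that \(\varphi \ge 0\) on \([0,\infty )\), which is equivalent to the stated inequality.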

Reorder and relabel the vertices of \(U\) as \(1,2,\ldots , r,\) with \(r \le \lceil \frac{n}{2} \rceil ,\) so that \(|x_j|\) decreases in \(j.\) Order the pairs \((i,j) \in [d] \times \{1,2,\ldots ,r\}\) lexicographically, and enumerate \(\pi _i(j)\) in this order as \(f_1,f_2, \ldots , f_{rd}.\) Define a filtration of \(\sigma \)-algebras \(\{\fancyscript{F}_{k}\}_{k=1}^{rd}\) by revealing these pieces of information one at a time, i.e. \(\fancyscript{F}_k = \fancyscript{F}_{k-1} \vee \sigma (f_k).\) According to this filtration, let

$$\begin{aligned} S_k = \mathbf{E}\left[ \sum _{ (u,v) \in \tilde{\mathcal{L }}} x_uA_{uv}y_v \bigg \vert \fancyscript{F}_k \right] \end{aligned}$$

define a martingale and let \(X_k = X_{(i,j)}\) be the associated martingale increments.

The desired deviation bound can now be cast in terms of \(S_k\) as

$$\begin{aligned}&\mathbf{P}\left[ \left|\sum _{ {\tilde{\mathcal{L }}} } x_uA_{uv}y_v - \mathbf{E}x_uA_{uv}y_v\right| \ge t \right]\\&\quad \le \mathbf{P}\left[ \exists ~k \le rd\,\text{ so} \text{ that}\,|S_k -S_0| = |S_k| \ge t\,\text{ and}\,T_k \le b \right]\\&\quad \le 2\exp \left( \frac{ -t^2/2}{(\frac{R t}{3} + b)}\right), \end{aligned}$$

provided that \(b\) satisfies

$$\begin{aligned} \sum _{k=1}^{rd} \mathbf{E}\left[ X_k^2 ~\big \vert ~ \fancyscript{F}_{k-1}\right] \le b. \end{aligned}$$

This reduces the problem to finding suitable \(R\) and \(b.\) The starting point for finding any such bound is simplifying the expression for the martingale increments \(X_{(i,k)}.\) To this end, let \(\pi \) be a fixed permutation of \([n],\) and define \(\Pi _k\) to be the collection of all permutations that agree with \(\pi \) in the first k entries, i.e.

$$\begin{aligned} \Pi _k = \{ \sigma ~:~ \sigma (i) = \pi (i)\ \text{ for }\ i = 1, 2, \ldots , k \}. \end{aligned}$$

Further let \(T : \Pi _{k-1} \rightarrow \Pi _{k}\) be the map which maps a permutation to its nearest neighbor in \(\Pi _{k},\) in the sense of transposition distance, i.e.

$$\begin{aligned} T[\sigma ](i) = \left\{ \begin{array}{l@{\quad }l} \pi (k)&i = k \\ \sigma (k)&i = \sigma ^{-1}(\pi (k)) \\ \sigma (i)&\text{ else} \end{array}.\right. \end{aligned}$$

Note that this map is the identity upon restriction to \(\Pi _k.\) Let \(L_{[u,v]}\) be the characteristic function for \((u,v) \in \tilde{\mathcal{L }}.\) In terms of this notation, it is possible to express \(X_{(i,k)}\) as

$$\begin{aligned} X_{(i,k)} = \frac{1}{|\Pi _{k-1}|} \sum _{ \tau \in \Pi _{k-1}} \sum _{u \in U} x_u L_{[u,T[\tau ](u)]} y_{T[\tau ](u)} - x_u L_{[u,\tau (u)]} y_{\tau (u)}, \end{aligned}$$

where \(\pi = \sigma _i,\) and the contributions of the other \(\sigma _j\) all cancel. As \(\tau (u) = T[\tau ](u)\) except for when \(u=k\) or \(u = \tau ^{-1}( \pi (k)),\) this simplifies to

$$\begin{aligned} X_{(i,k)}&= \frac{1}{|\Pi _{k-1}|} \sum _{ \tau \in \Pi _{k-1}} \left(x_u L_{[u,\pi (k)]} y_{\pi (k)} - x_u L_{[u,\tau (k)]}y_{\tau (k)} \right. \\&\left. +\, x_{\tau ^{-1}(\pi (k))} L_{[\tau ^{-1}(\pi (k)),\tau (k)]}y_{\tau (k)} - x_{\tau ^{-1}(\pi (k))} L_{[\tau ^{-1}(\pi (k)),\pi (k)]}y_{\pi (k)} \right)\!. \end{aligned}$$

This can be recast probabilistically. Define two random variables \(v\) and \(u\) as

$$\begin{aligned} v&\sim&\mathrm Unif \left\{ [n] \setminus \pi [k] \right\} \!,\\ u&\sim&\mathrm Unif \left\{ [n] \setminus [k] \right\} \!, \end{aligned}$$

(where \([n] = \{1,2,\ldots , n\}\)) so that

$$\begin{aligned} \frac{n-k+1}{n-k}X_k&= \mathbf{E}\big [x_k L_{[k,v]} y_v - x_k L_{[k,\pi (k)]} y_{\pi (k)} + x_u L_{[u,\pi (k)]} y_{\pi (k)} \nonumber \\&- x_u L_{[u,v]} y_v \big \vert \fancyscript{F}_k \big ]. \end{aligned}$$
(27)

Terms for which \(\pi (k) = \tau (k)\) again cancel, and so we have disregarded these terms from the right hand side. It is also for this reason that the small correction appears in front of \(X_k.\) From here it is possible to immediately deduce a sufficient a priori bound on \(X_k,\) as each term in this expectation is at most \(\frac{\sqrt{d}}{n},\) so that

$$\begin{aligned} |X_k| \le 4\frac{\sqrt{d}}{n}. \end{aligned}$$

The conditional variance \(\mathbf{E}\left[X_k^2 ~\big \vert ~ \fancyscript{F}_{k-1}\right]\) is not much more complicated. Effectively, we take \(\pi (k)\) to be uniformly distributed over \([n] \setminus \pi [k-1]\) and bound \(\mathbf{E}\left[X_k^2 ~\big \vert ~ \fancyscript{F}_{k-1}\right]\) by

$$\begin{aligned} \mathbf{E}\left[X_k^2 ~\big \vert ~ \fancyscript{F}_{k-1}\right]&\le 4\mathbf{E}\left[ x_k^2(L_{[k,v]}y_v)^2 +x_k^2(L_{[k,\pi (k)]}y_{\pi (k)})^2 +x_u^2(L_{[u,\pi (k)]}y_{\pi (k)})^2\right.\\&\left.+x_u^2(L_{[u,v]}y_v)^2 ~\big \vert ~ \fancyscript{F}_{k-1} \right]. \end{aligned}$$

Since the \(|x_i|\) are ordered to be decreasing, \(x_u^2 \le x_k^2\). Further, by bounding all the \(L_{[a,b]}\) terms by \(1,\) and using that \(v\) is marginally distributed as \(\text{ Unif} \left\{ [n] \setminus \pi [k-1] \right\} \), this bound becomes

$$\begin{aligned} \mathbf{E}\left[X_k^2 ~\big \vert ~ \fancyscript{F}_{k-1}\right] \le 16\mathbf{E}\left[ x_k^2y_v^2 ~\big \vert ~ \fancyscript{F}_{k-1} \right]. \end{aligned}$$

Upon explicit calculation, we see that

$$\begin{aligned} \mathbf{E}\left[ y_v^2 ~\big \vert ~ \fancyscript{F}_{k-1} \right] = \frac{1}{n - k} \sum _{ [n] \setminus \pi [k-1] } y_v^2 \le \frac{1}{n-k}, \end{aligned}$$

where it has been used that \(\Vert y\Vert \le 1.\) Combining the above with (27), we see that

$$\begin{aligned} \mathbf{E}\left[X_k^2 ~\big \vert ~ \fancyscript{F}_{k-1}\right] \le \left[\frac{n-k}{n-k+1}\right]^2\frac{16x_k^2}{n-k} \le \frac{32x_k^2}{n} \end{aligned}$$
(28)

where it has been used that \(k \le r \le \lceil \frac{n}{2}\rceil .\) Summing over all martingale increments,

$$\begin{aligned} \sum _{i=1}^d \sum _{k=1}^{r} \frac{32x_k^2}{n} \le \frac{32d}{n}. \end{aligned}$$

Thus the Freedman martingale bound becomes

$$\begin{aligned} \mathbf{P}\left[ \left|\sum _{ {\tilde{\mathcal{L }}} } x_uA_{uv}y_v - \mathbf{E}x_uA_{uv}y_v\right| > t\sqrt{d} \right] \le 2\exp \left( \frac{ -nt^2}{64 + 8t/3}\right). \end{aligned}$$

\(\square \)

Let \(\mathcal{L }_{\mathrm{left}}\) be the set of vertices that appear in the first coordinate of some light couple, and choose \(U \subseteq \mathcal{L }_{\mathrm{left}}\) arbitrarily so that \(\left|U\right| = \lceil {\left| \mathcal{L }_\mathrm{left}\right|}/{2} \rceil .\) It follows then that, if \(U_1 := U\), and \(U_2 := \mathcal{L }_\mathrm{left} \setminus U_1\),

$$\begin{aligned}&\mathbf{P}\left[ \left|\sum _{ (u,v) \in \mathcal{L }} x_uA_{uv}y_v - \mathbf{E}x_u A_{uv} y_v \right| > t\sqrt{d} \right]\\&\quad \le 2\mathbf{P}\left[ \max _{i=1,2} \left|\sum _{ (u,v) \in \mathcal{L } \cap U_i \times [n] } x_uA_{uv}y_v - \mathbf{E}x_u A_{uv} y_v \right| > \frac{t}{2}\sqrt{d} \right]. \end{aligned}$$

From this point, it is possible to estimate

$$\begin{aligned} \mathbf{P}\left[ \exists ~x,y \in \mathcal T ~:~ \left|\sum _{\mathcal{L }} x_u M_{uv} y_v\right| > 2(2t+1)\sqrt{d}\right] \end{aligned}$$

by

$$\begin{aligned} \mathbf{P}\left[ \exists ~x,y \in \mathcal T ~:~ \left|\sum _{\mathcal{L } \cap U \times [n]} x_u[A_{uv} -\mathbf{E}A_{uv}]y_v\right| > t\sqrt{d}\right] \end{aligned}$$

Applying the union bound and Theorem 26, we see now that

$$\begin{aligned}&\mathbf{P}\left[ \exists ~x,y \in \mathcal T ~:~ \left|\sum _{\mathcal{L }} x_u M_{uv} y_v\right| > 2(2t+1)\sqrt{d}\right] \\&\quad \le C \left[\frac{(2+\delta )\sqrt{2e \pi }}{2\delta } \right]^{2n} \exp \left(\frac{-nt^2}{64 + 8t/3} \right), \end{aligned}$$

so that taking \(e - 2 \ge \delta \ge \frac{1}{2}\) and \(t = 27,\) it is seen that this probability decays exponentially fast, and we have proven

Theorem 28

There are universal constants \(C\) and \(K\) sufficiently large and \(c > 0\) so that, for \( e - 2 \ge \delta \ge \frac{1}{2}\) and except with probability at most

$$\begin{aligned} K\exp \left( -cn \right)\!, \end{aligned}$$

there is no pair of vectors \(x,y \in \mathcal T \) having

$$\begin{aligned} \left| \sum _{(u,v) \in \mathcal{L }} x_u M_{uv}y_v \right| \ge C\sqrt{d}. \end{aligned}$$

It is possible to take \(C =110.\)

 

4.2 Controlling the contribution of the heavy couples

 

Lemma 29

(Discrepancy) For any two vertex sets \(A\) and \(B\), let \(e(A,B)\) denote the number of directed edges from \(A\) to \(B\), that is, the number of pairs \((i,a)\) with \(1 \le i \le d,\,a \in A\), and \(\pi _i(a) \in B.\) Let \(\mu (A,B) = |A||B|\frac{d}{n}.\) For every \(m>0,\) there are constants \(c_1 \ge e\) and \(c_2\) so that, except with probability \(n^{-m}\), for every pair of vertex sets \(A\) and \(B\) at least one of the following properties holds

  1. 1.

    either \( \frac{e(A,B)}{\mu (A,B)} \le c_1~, \)

  2. 2.

    or \( e(A,B)\log \frac{e(A,B)}{\mu (A,B)} \le c_2( |A| \vee |B|)\log \frac{n}{|A| \vee |B|} \)

It is possible to take \(c_1 =e^4\) and \(c_2 = 2e^2(6+m).\)

To prove this lemma, we rely on a standard type of large deviation inequality shown below, which mirrors the large deviation inequalities available for sums of i.i.d. indicators.

Lemma 30

For any \(k \ge e,\)

$$\begin{aligned} \mathbf{P}\left[ e(A,B) \ge k \mu (A,B) \right] \le \exp ( - k[ \log k -2] \mu ). \end{aligned}$$

Proof

Let \(e_\pi (A,B)\) denote the number of \(a \in A\) so that \(\pi (a) \in B.\) It is possible to bound

$$\begin{aligned} \mathbf{P}\left[ e_\pi (A,B) = t \right] \le \frac{[a]_t[b]_t}{t![n]_t}, \end{aligned}$$

where we recall that \([a]_t = a (a-1) \cdots (a-t+1)\) is the falling factorial or Pochhammer symbol. Using the fact that \([n]_t \ge e^{-t}n^t,\) this may be bounded as

$$\begin{aligned} \mathbf{P}\left[ e_\pi (A,B) = t \right] \le \frac{a^tb^te^{t}}{t!n^t}, \end{aligned}$$

so that the Laplace transform of \(e_\pi (A,B)\) can be estimated as

$$\begin{aligned} \mathbf{E}\left[ \exp ( \lambda e_\pi (A,B)) \right] \le \sum _{t=0}^\infty e^{\lambda t} \frac{a^tb^te^{t}}{t!n^t} = \exp \left[ \frac{abe^{1+\lambda }}{n}\right]. \end{aligned}$$

Thus by Markov’s inequality, we have

$$\begin{aligned} \mathbf{P}\left[ e(A,B) \ge k \mu (A,B) \right]&\le \frac{\mathbf{E}\left[ \exp \left(\lambda \sum _{i=1}^d e_{\sigma _i}(A,B) \right)\right]}{e^{-k\lambda \mu }}\\&\le \exp \left[ \mu e^{1+\lambda } - k\lambda \mu \right], \end{aligned}$$

where \(\lambda >0\) is any positive number and \(\mu = \mu (A,B).\) Taking \(1+\lambda = \log k,\) valid for \(k > e,\) it follows that

$$\begin{aligned} \mathbf{P}\left[ e(A,B) \ge k \mu (A,B) \right] \le \exp \left[ -k(\log k - 2)\mu \right]\!, \end{aligned}$$

for \(k \ge e.\) \(\square \)
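
As a small numerical illustration of the quantities appearing in Lemmas 29 and 30 (our own sketch; the helper name e_AB is hypothetical), one can sample \(d\) permutations and compare \(e(A,B)\) with \(\mu (A,B)\) for a random pair of vertex sets.

```python
import numpy as np

def e_AB(perms, A, B):
    """e(A,B): the number of directed edges pi_i(a) = b with a in A and b in B."""
    A = np.asarray(A)
    return int(sum(np.isin(p[A], B).sum() for p in perms))

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, d = 5000, 3
    perms = [rng.permutation(n) for _ in range(d)]
    A = rng.choice(n, size=100, replace=False)
    B = rng.choice(n, size=200, replace=False)
    mu = len(A) * len(B) * d / n
    print("e(A,B) =", e_AB(perms, A, B), "   mu(A,B) =", mu)
```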

Armed with Lemma 30, we can proceed with the proof of Lemma 29.  

Proof of Lemma 29

If either of \(|A|\) or \(|B|\) is greater than \(\frac{n}{e},\) then \(e(A,B) \le (|A| \wedge |B|)d,\) so that

$$\begin{aligned} \frac{e(A,B)}{\mu (A,B)} \le \frac{nd(|A| \wedge |B|)}{|A||B|d} = \frac{n}{|A| \vee |B|} \le e. \end{aligned}$$

Thus, it suffices to deal with the case that both \(|A|\) and \(|B|\) are less than \(\frac{n}{e}.\) In what follows, we will think of \(a\) and \(b\) as being the sizes of \(A\) and \(B\) in preparation to use a union bound. Let \(k = k(a,b,n)\) be defined as \(k = \max \{k^{*}, \frac{1}{e} \}\), where \(k^{*}\) satisfies

$$\begin{aligned} k^*\log k^* = \frac{(6+m)(a \vee b)n}{abd} \log \frac{n}{a \vee b}, \end{aligned}$$

When \(a \vee b \le \frac{n}{e},\) it follows that

$$\begin{aligned} (6 + m)(a \vee b) \log \frac{n}{a \vee b} \ge 2a\log \frac{n}{a} +2b\log \frac{n}{b} +(2 + m)(a \vee b) \log \frac{n}{a \vee b}, \end{aligned}$$

where we have used the monotonicity of \(x\log \frac{n}{x}\) on \([1,\frac{n}{e}]\); thus

$$\begin{aligned} (6 + m)(a \vee b) \log \frac{n}{a \vee b} \ge a\left(1 + \log \frac{n}{a}\right) +b\left(1 + \log \frac{n}{b}\right) +(2 + m)\log n. \end{aligned}$$

Exponentiating,

$$\begin{aligned} \exp \left[ k \log k \frac{abd}{n} \right] \ge \left(\frac{en}{a}\right)^a \left(\frac{en}{b}\right)^b n^{2+m}, \end{aligned}$$

if \(k \ge \frac{1}{e}.\) It follows that

$$\begin{aligned}&\mathbf{P}\left[\exists A,B~\text{ with}~|A|=a,~|B|=b,~\text{ so} \text{ that}~e(A,B) \ge e^2k(a,b)\mu (A,B)\right] \\&\quad \le {n \atopwithdelims ()a}{n \atopwithdelims ()b}\exp ( -e^2k[ \log k] \mu ) \le n^{-2-m}. \end{aligned}$$

Moreover, applying this bound to all \(a\) and \(b,\) it follows that

$$\begin{aligned} e(A,B) \le e^2 k(|A|,|B|) \mu (A,B), \end{aligned}$$

except with probability smaller than \(n^{-m}.\) If for two sets \(A\) and \(B,\,k=\frac{1}{e},\) then

$$\begin{aligned} e(A,B) \le e \mu (A,B), \end{aligned}$$

and we are in the first case of the discrepancy property, for \(c_1 \ge e.\) Otherwise,

$$\begin{aligned} e(A,B) \log k \le e^2 k\log k \mu (A,B) = e^2(6+m)(a \vee b) \log \frac{n}{a \vee b}, \end{aligned}$$

and noting that \(k \ge \frac{e(A,B)}{e^2\mu (A,B)},\) it follows that

$$\begin{aligned} \frac{1}{2} e(A,B) \log \frac{e(A,B)}{\mu (A,B)} \le e(A,B) \log \frac{e(A,B)}{e^2 \mu (A,B)} \le e^2(6+m)(a \vee b) \log \frac{n}{a \vee b}, \end{aligned}$$

when \(\frac{e(A,B)}{\mu (A,B)} \ge e^4.\) If this is not the case, then we are again in the first case of the discrepancy property, taking \(c_1 \ge e^4.\) Taking \(c_1 = e^4,\) it follows that we may take \(c_2 = 2e^2(6+m).\) \(\square \)

The discrepancy property implies that there are no dense subgraphs, and thus the contribution of the heavy couples is not too large.

Lemma 31

If the discrepancy property holds, with associated constants \(c_1\) and \(c_2\), then

$$\begin{aligned} \sum _{ \{u,v\} \in \mathcal H } \left|x_uA_{u,v}y_v\right| \le C\sqrt{d}, \end{aligned}$$

for some constant \(C\) depending on \(c_1,c_2,\) and \(\delta .\)

Proof

The method of proof here is essentially identical to Kahn and Szemerédi or Feige and Ofek (see [21] or [22]). We provide a proof of this lemma for completeness as well as to establish the constants involved. We will partition the summands into blocks where each term \(x_u\) or \(y_v\) has approximately the same magnitude. Thus let \(\gamma _i = 2^i \delta ,\) and put

$$\begin{aligned} A_i&= \left\{ u~\big \vert ~\frac{\gamma _{i-1}}{\sqrt{n}} \le |x_u| < \frac{\gamma _{i}}{\sqrt{n}} \right\} \!, \quad 1\! \le i \le \log \lceil \sqrt{n} \rceil .\\ B_i&= \left\{ u~ \big \vert ~ \frac{\gamma _{i-1}}{\sqrt{n}} \le |y_u| < \frac{\gamma _{i}}{\sqrt{n}} \right\} \!,\quad 1\! \le i \le \log \lceil \sqrt{n} \rceil . \end{aligned}$$

Let \(\hat{\mathcal{H }}\) denote those pairs \((i,j)\) so that \(\gamma _i\gamma _j \ge \sqrt{d}.\) The contribution of the absolute sum can, in these terms, be bounded by

$$\begin{aligned} \sum _{ (u,v) \in \mathcal H } \left|x_uA_{u,v}y_v\right| \le \sum _{ (i,j) \in \hat{\mathcal{H }}} \frac{\gamma _i\gamma _j}{n} e(A_i,B_j). \end{aligned}$$

Let \(\lambda _{i,j} = \frac{e(A_i,B_j)}{\mu (A_i,B_j)}\) denote the discrepancy, which can be controlled using Lemma 29. In terms of this quantity, the bound becomes

$$\begin{aligned} \sum _{ (u,v) \in \mathcal H } \left|x_uA_{u,v}y_v\right| \le \sum _{ (i,j) \in \hat{\mathcal{H }}} \frac{\gamma _i\gamma _j}{n} \lambda _{i,j} |A_i||B_j|\frac{d}{n}. \end{aligned}$$

In this form, the magnitudes of each of the quantities are somewhat opaque. Consider the sum \(\sum _{i} |A_i| \frac{\gamma _i^2}{n};\) it is at most \(4\Vert x\Vert ^2\le 4.\) In particular, it is of constant order. Thus let \(\alpha _i =|A_i| \frac{\gamma _i^2}{n}\) and \(\beta _j = |B_j| \frac{\gamma _j^2}{n}.\) This allows the bound to be rewritten as

$$\begin{aligned} d \sum _{ (i,j) \in \hat{\mathcal{H }}} \frac{\gamma _i^2|A_i|}{n}\frac{\gamma _j^2|B_j|}{n} \frac{\lambda _{i,j}}{\gamma _i\gamma _j} = \frac{d}{\sqrt{d}} \sum _{ (i,j) \in \hat{\mathcal{H }}} \alpha _i \beta _j \frac{\lambda _{i,j}\sqrt{d}}{\gamma _i\gamma _j}. \end{aligned}$$

This exposes the quantity \(\sigma _{i,j} = \frac{\lambda _{i,j}\sqrt{d}}{\gamma _i\gamma _j}\) as having some special importance. In effect, we will show that either for fixed \(i,\,\sum _{j} \sigma _{i,j} \beta _j\) has constant order, or for fixed \(j,\sum _{i} \sigma _{i,j}\alpha _i\) has constant order.

In what follows, we will bound the contribution of the summands where \(|A_i| \ge |B_j|.\) By symmetry, the contribution of the other summands will have the same bound. The heavy couples will now be partitioned into \(6\) classes \(\{ \hat{\mathcal{H }}_i\}_{i=1}^6\) where their contribution is bounded in a different way. Let \(\hat{\mathcal{H }}_i \subseteq \hat{\mathcal{H }}\) be those pairs \((i,j)\) which satisfy the \(i^{th}\) property from the following list but none of the prior properties:

  1. 1.

    \(\sigma _{i,j} \le c_1.\)

  2. 2.

    \(\lambda _{i,j} \le c_1.\)

  3. 3.

    \(\gamma _j > \frac{1}{4} \sqrt{d}\gamma _i.\)

  4. 4.

    \(\log \lambda _{i,j} > \frac{1}{4}\left[2\log \gamma _i + \log \frac{1}{\alpha _i} \right].\)

  5. 5.

    \(2\log \gamma _i \ge \log \frac{1}{\alpha _i}.\)

  6. 6.

    \(2\log \gamma _i < \log \frac{1}{\alpha _i}.\)

The last properties are better understood when the second case of the discrepancy property is expressed in present notation. In its original form, it states

$$\begin{aligned} e(A_i,B_j) \log \lambda _{i,j} \le c_2 |A_i| \log \frac{n}{|A_i|}. \end{aligned}$$

Substituting \(\gamma _i^2/\alpha _i\) for \(n/|A_i|\) and multiplying both sides of this inequality by \(\frac{\gamma _j }{|A_i|\gamma _i \sqrt{d}\, \log \lambda _{i,j}}\) produces the equivalent form

$$\begin{aligned} \sigma _{i,j}\beta _j \le c_2 \frac{\gamma _j}{\sqrt{d}\gamma _i}\frac{\left[2\log \gamma _i + \log \frac{1}{\alpha _i} \right]}{\log \lambda _{i,j}}. \end{aligned}$$

Thus, the last \(3\) cases cover each of the possible dominant \(\log \) terms in this bound.

4.2.1 Bounding the contribution of \(\hat{\mathcal{H }}_1\) and \(\hat{\mathcal{H }}_2\)

In either of these situations, we have a bound on \(\sigma _{i,j}.\) In particular, either \(\sigma _{i,j} \le c_1\) directly, or all the discrepancies \(\lambda _{i,j}\) are uniformly bounded by \(c_1.\) As

$$\begin{aligned} \sigma _{i,j} = \frac{\lambda _{i,j}\sqrt{d}}{\gamma _i \gamma _j}, \end{aligned}$$

and \(\gamma _i \gamma _j \ge \sqrt{d},\)

$$\begin{aligned} \sigma _{i,j} \le c_1 \end{aligned}$$

for both cases.

4.2.2 Bounding the contribution of \(\hat{\mathcal{H }}_3\)

For these terms, we fix j. In this case, the magnitudes of the entries corresponding to \(j\) of \(y_v\) dominate those of the entries corresponding to \(i\) of \(x_u.\) However, by regularity \(e(A_i,B_j) \le |B_j| d,\) so that the discrepancy \(\lambda _{i,j}\) is at most \(\frac{ n }{|A_i|} = \frac{ \gamma _i^2}{\alpha _i}.\)

$$\begin{aligned} \sum _{ i~:~(i,j) \in \hat{\mathcal{H _3}}} \alpha _i \sigma _{i,j} = \sum _{ i~:~(i,j) \in \hat{\mathcal{H _3}}} \alpha _i \frac{\lambda _{i,j}\sqrt{d}}{\gamma _i\gamma _j} \le \sum _{ i~:~(i,j) \in \hat{\mathcal{H _3}}} \frac{\gamma _i\sqrt{d}}{\gamma _j} \le 8, \end{aligned}$$

where in the last step it has been used that the summands \(\gamma _i\sqrt{d}/\gamma _j\) form a geometric series whose largest term is less than \(4\) (since \(\gamma _j > \frac{1}{4}\sqrt{d}\,\gamma _i\) for pairs in this class).

4.2.3 Bounding the contribution of \(\hat{\mathcal{H }}_4\)

For these terms, we fix i. We are not in case \((2),\) and it follows that the second case of the discrepancy property holds. In present notation

$$\begin{aligned} \sigma _{i,j}\beta _j \le c_2 \frac{\sqrt{d}\gamma _j}{d\gamma _i}\frac{\left[2\log \gamma _i + \log \frac{1}{\alpha _i} \right]}{\log \lambda _{i,j}} \le \frac{4c_2\gamma _j}{\gamma _i \sqrt{d}}, \end{aligned}$$

where the hypothesis has been used. As we are not in case \((3)\), the sum of these terms is bounded as

$$\begin{aligned} \sum _{ j~:~(i,j) \in \hat{\mathcal{H _4}}} \beta _j \sigma _{i,j} \le 2c_2, \end{aligned}$$

where it has been used that the sum above has a geometric dominator with leading term at most \(\frac{1}{4}\gamma _i\sqrt{d}.\)

4.2.4 Bounding the contribution of \(\hat{\mathcal{H }}_5\)

For these terms, we fix i. Again, the second case of the discrepancy property holds. Now, in addition,

$$\begin{aligned} \log \lambda _{i,j} \le \frac{1}{4}\left[2\log \gamma _i + \log \frac{1}{\alpha _i} \right] \le \log \gamma _i, \end{aligned}$$

i.e. that \(\lambda _{i,j} \le \gamma _i.\) Furthermore, we are not in case \((1)\) so \( c_1 \le \sigma _{i,j} = \frac{\lambda _{i,j}\sqrt{d}}{\gamma _i \gamma _j} \le \frac{\sqrt{d}}{\gamma _j}.\) Thus the second discrepancy bound becomes

$$\begin{aligned} \sigma _{i,j}\beta _j \le c_2 \frac{\sqrt{d}\gamma _j}{d\gamma _i}\frac{\left[2\log \gamma _i + \log \frac{1}{\alpha _i} \right]}{\log \lambda _{i,j}} \le c_2 \frac{\gamma _j 4\log \gamma _i}{\sqrt{d}\gamma _i\log c_1} \le \frac{4c_2}{c_1} \frac{\gamma _j}{\sqrt{d}}, \end{aligned}$$

where it has been used that \(\gamma _i \ge \lambda _{i,j} \ge c_1 \ge e\), and that \(\log x / x\) is monotonically decreasing for \(x > e.\) Thus,

$$\begin{aligned} \sum _{ j~:~(i,j) \in \hat{\mathcal{H _5}}} \beta _j \sigma _{i,j} \le \sum _{ j~:~(i,j) \in \hat{\mathcal{H _5}}} \frac{4c_2}{c_1} \frac{\gamma _j}{\sqrt{d}} \le \frac{8c_2}{c_1^2}, \end{aligned}$$

where it has been used that the second sum above is geometric with largest term \(\sqrt{d}/c_1.\)

4.2.5 Bounding the contribution of \(\hat{\mathcal{H }}_6\)

For these terms, we fix j. The second case of the discrepancy property holds and in addition,

$$\begin{aligned} \log \lambda _{i,j} \le \frac{1}{4}\left[2\log \gamma _i + \log \frac{1}{\alpha _i} \right] \le \frac{1}{2} \log \frac{1}{\alpha _i}. \end{aligned}$$

This implies that \(\sigma \) satisfies the asymmetric bound \(\sigma _{i,j} \le \frac{1}{\alpha _i}\frac{\sqrt{d}}{\gamma _i\gamma _j}.\) Thus,

$$\begin{aligned} \sum _{ i~:~(i,j) \in \hat{\mathcal{H _6}}} \alpha _i \sigma _{i,j} \le \sum _{ i~:~(i,j) \in \hat{\mathcal{H _6}}}\frac{\sqrt{d}}{\gamma _i\gamma _j} \le 2, \end{aligned}$$

where it has been used that the sum above is geometric with largest term at most \(1\) (which follows as \(\gamma _i\gamma _j \ge \sqrt{d}\)).

4.2.6 Assembling the bound

We must sum the contributions of each of the classes of couples. Recall that we must double the contribution here because we have only considered couples where \(|A_i| \ge |B_j|.\) In each of the cases outlined above, it only remains to sum over the \(\alpha _i\) or \(\beta _j\) in each bound. Doing so contributes a factor of \(4\) to each bound, so that the constant can be given by

$$\begin{aligned} 2\left[ 16c_1+32+8c_2+ \frac{32c_2}{c_1^2} +8 \right] \end{aligned}$$

\(\square \)  

4.3 Finalizing the proof of Theorem 24

Proof

We will take \(\delta = \frac{1}{2}.\) With \(m\) given, it follows that the discrepancy property (Lemma 29) holds with probability at least \(1 - n^{-m},\) with constants \(c_1 = e^4\) and \(c_2 = 2e^2(6+m).\) Therefore, by Lemma 31, for any two \(x, y \in \mathcal T ,\) the contribution of the heavy couples to \(x^tMy\), which is at most twice their contribution to \(x^tAy\) (the discrepancy bounds hold simultaneously for all \(x\) and \(y\)), is at most

$$\begin{aligned} 4\left[ 16c_1+32+8c_2+ \frac{32c_2}{c_1^2} +8 \right] \sqrt{d} \le (8854+ 585m) \sqrt{d}. \end{aligned}$$

By Theorem 28, with probability at least \(1- K\exp (-cn)\) for some universal constants \(K > 0\) and \(c >0,\) the contribution of the light couples is never more than \(110 \sqrt{d}\). Hence

$$\begin{aligned} \sup _{x,y \in \mathcal T } |x^t M y| \le (8964 + 585m) \sqrt{d}, \end{aligned}$$

except with probability at most \(n^{-m} + K\exp (-cn).\) Finally, this implies that \(\lambda _2 \vee |\lambda _n| \le 4(8964 + 585m) \sqrt{d}\), except with probability at most \(n^{-m} + K\exp (-cn).\) \(\square \)
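
The constant accounting above is easy to check numerically. The following snippet (added for convenience; it is not part of the proof) evaluates \(4\left[ 16c_1+32+8c_2+ \frac{32c_2}{c_1^2} +8 \right]\) with \(c_1=e^4\) and \(c_2=2e^2(6+m)\), and compares it with the stated bound \(8854+585m\).

```python
import math

def heavy_constant(m):
    """Evaluate 4 * [16 c1 + 32 + 8 c2 + 32 c2 / c1^2 + 8] with c1 = e^4, c2 = 2 e^2 (6+m)."""
    c1 = math.e ** 4
    c2 = 2 * math.e ** 2 * (6 + m)
    return 4 * (16 * c1 + 32 + 8 * c2 + 32 * c2 / c1 ** 2 + 8)

if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(f"m = {m:2d}:  4*[...] = {heavy_constant(m):8.1f}   stated bound = {8854 + 585 * m}")
```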

 

5 Linear statistics of eigenvalues

We now connect Sect. 3.2 to linear eigenvalue statistics of the adjacency matrix of \(G_n\). Let \(\{T_n(x)\}_{n \in \mathbb N }\) be the Chebyshev polynomials of the first kind on the interval \([-1,1]\). We define a set of polynomials

$$\begin{aligned} \Gamma _0(x)&= 1, \end{aligned}$$
(29)
$$\begin{aligned} \Gamma _{2k}(x)&= 2 T_{2k}\left(\frac{x}{2} \right) + \frac{2d-2}{(2d-1)^k}~,~\forall ~k \ge 1,\end{aligned}$$
(30)
$$\begin{aligned} \Gamma _{2k+1}(x)&= 2 T_{2k+1}\left(\frac{x}{2}\right)\!, ~\forall ~k \ge 0. \end{aligned}$$
(31)

We note that much of the following proposition can be found in Lemma 10.4 of [25].

Proposition 32

Let \(A_n\) be the adjacency matrix of \(G_n\), and let \(\lambda _1\ge \cdots \ge \lambda _n\) be the eigenvalues of \((2d-1)^{-1/2}A_n\). Then

$$\begin{aligned} N_{k}^{(n)}:=\sum _{i=1}^n\Gamma _k(\lambda _i)&=(2d-1)^{-k/2}\mathrm{CNBW}_{k}^{(n)}. \end{aligned}$$

Proof

To show the above, we will first use the Chebyshev polynomials of the second kind on \([-1,1]\), namely, \(\{U_n\}_{n \in \mathbb N }\).

Let

$$\begin{aligned} p_k(x)=U_k\left(\frac{x}{2}\right) - \frac{1}{2d-1}U_{k-2}\left(\frac{x}{2}\right)\!. \end{aligned}$$
(32)

It is known [1, eqn. 12] that \((2d-1)^{-k/2}\text{ NBW}_{k}^{(n)}=\sum _{i=1}^np_k(\lambda _i)\). We thus proceed by relating \(\mathrm{CNBW}_{k}^{(n)}\) to \(\text{ NBW}_{k}^{(n)}\).

A closed non-backtracking walk of length \(k\) is either cyclically non-backtracking or can be obtained from a closed non-backtracking walk of length \(k-2\) by “adding a tail,” i.e., adding a new step to the beginning of the walk and its reverse to the end. For any closed cyclically non-backtracking walk of length \(k-2\), we can add a tail in \(2d-2\) ways. For any closed non-backtracking walk of length \(k-2\) that is not cyclically non-backtracking, we can add a tail in \(2d-1\) ways. Hence for \(k\ge 3\),

$$\begin{aligned} \text{ NBW}_{k}^{(n)}&= \mathrm{CNBW}_{k}^{(n)}+(2d-2)\mathrm{CNBW}_{k-2}^{(n)}+(2d-1) \left(\text{ NBW}_{k-2}^{(n)}-\mathrm{CNBW}_{k-2}^{(n)}\right)\\&= \mathrm{CNBW}_{k}^{(n)}+(2d-1)\text{ NBW}_{k-2}^{(n)}-\mathrm{CNBW}_{k-2}^{(n)}. \end{aligned}$$

Applying this relation iteratively and noting that \(\mathrm{CNBW}_{k}^{(n)}=\text{ NBW}_{k}^{(n)}\) for \(k=1,2\), we have

$$\begin{aligned} \mathrm{CNBW}_{k}^{(n)} = \text{ NBW}_{k}^{(n)} - (2d-2) \left(\text{ NBW}_{k-2}^{(n)}+\text{ NBW}_{k-4}^{(n)}+ \cdots +\text{ NBW}_{a}^{(n)}\right) \end{aligned}$$

with \(a=2\) if \(k\) is even and \(a=1\) if \(k\) is odd. Observe now that

$$\begin{aligned} \Gamma _{2k}(x)&= p_{2k}(x)-(2d-2)\left(\frac{p_{2k-2}(x)}{2d-1} +\frac{p_{2k-4}(x)}{(2d-1)^2}+\cdots +\frac{p_{2}(x)}{(2d-1)^{k-1}}\right), \end{aligned}$$

and

$$\begin{aligned} \Gamma _{2k-1}(x) = p_{2k-1}(x)-(2d-2)\left(\frac{p_{2k-3}(x)}{2d-1} +\frac{p_{2k-5}(x)}{(2d-1)^2}+\cdots + \frac{p_{1}(x)}{(2d-1)^{k-1}}\right)\!. \end{aligned}$$

A quick calculation shows now that

$$\begin{aligned} \Gamma _{2k}(x)&= U_{2k} \left( \frac{x}{2} \right) - U_{2k-2} \left(\frac{x}{2} \right) + \frac{2d-2}{(2d-1)^{k}}~, \text{ while}\\ \Gamma _{2k+1}(x)&= U_{2k+1} \left( \frac{x}{2} \right) - U_{2k-1} \left( \frac{x}{2} \right)\!, \end{aligned}$$

and the rest follows from the fact that \(T_{k}(x) = \frac{1}{2} \left(U_{k}(x)- U_{k-2}(x) \right)\). \(\square \)
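
The telescoping identity at the heart of this proof is easy to check numerically. The sketch below (ours; it assumes nothing beyond (30) and (32)) evaluates \(\Gamma _{2k}\) both from its definition and from the telescoped sum of the \(p_j\), using \(T_n(\cos \theta )=\cos n\theta \) and \(U_n(\cos \theta )=\sin ((n+1)\theta )/\sin \theta \).

```python
import numpy as np

def U(k, t):
    """Chebyshev polynomial of the second kind, U_k(t), for t strictly inside (-1, 1)."""
    theta = np.arccos(np.clip(t, -1.0, 1.0))
    return np.sin((k + 1) * theta) / np.sin(theta)

def p(k, x, d):
    """p_k(x) = U_k(x/2) - U_{k-2}(x/2)/(2d-1), as in (32); note U_{-1} = 0 and U_0 = 1."""
    return U(k, x / 2) - U(k - 2, x / 2) / (2 * d - 1)

def Gamma_even_def(k, x, d):
    """Gamma_{2k}(x) from (30): 2 T_{2k}(x/2) + (2d-2)/(2d-1)^k."""
    theta = np.arccos(np.clip(x / 2, -1.0, 1.0))
    return 2 * np.cos(2 * k * theta) + (2 * d - 2) / (2 * d - 1) ** k

def Gamma_even_telescoped(k, x, d):
    """p_{2k}(x) - (2d-2)[p_{2k-2}(x)/(2d-1) + ... + p_2(x)/(2d-1)^{k-1}], as in the proof."""
    s = p(2 * k, x, d)
    for m in range(1, k):
        s = s - (2 * d - 2) * p(2 * m, x, d) / (2 * d - 1) ** (k - m)
    return s

if __name__ == "__main__":
    d, k = 3, 4
    xs = np.linspace(-1.9, 1.9, 9)   # stay strictly inside (-2, 2) so that sin(theta) != 0
    diff = np.max(np.abs(Gamma_even_def(k, xs, d) - Gamma_even_telescoped(k, xs, d)))
    print("max |definition - telescoped| =", diff)   # of the order of machine precision
```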

The weak convergence of the sequence \((\mathrm{CNBW}_{k}^{(n)},\; 1\le k \le r_n)\) in Theorem 21 allows us to establish limiting laws for a general class of linear functions of eigenvalues. First we will make some canonical choices of parameters \(\{r_n\}\). Define

$$\begin{aligned} r_n = \frac{\beta \log n}{\log (2d-1)}, \quad \text{ for} \text{ some}~\beta < 1/2. \end{aligned}$$
(33)

Note that \(2r_n \log (2d-1) = 2\beta \log n\), which shows (18), even when \(d\) grows with \(n\).

We now need another definition. Let \(h\) be a function on \(\mathbb R \) such that

$$\begin{aligned} h(r_n) \ge \log (2d-1), \quad \text{ for} \text{ all} \text{ large} \text{ enough}~n. \end{aligned}$$
(34)

This definition is not so important when \(d\) is fixed, since a constant \(h(x)\equiv \log (2d-1)\) for all \(x\in \mathbb R \) is a good choice. However, when \(d\) grows with \(n\), an appropriate choice needs to be made. For example when \(2d-1= (\log n)^\gamma \) for some \(\gamma >0\), one may take

$$\begin{aligned} h(x) = C\log x,\quad \text{ for} \text{ some} \text{ large} \text{ enough} \text{ positive} \text{ constant}~C. \end{aligned}$$
(35)

For our next result, we will use some theorems from Approximation Theory. Recall that every function \(f\) on \([-1, 1]\) which is square-integrable with respect to the arc-sine law has a series expansion with respect to the Chebyshev polynomials of the first kind. Good references for approximation theory and the Chebyshev polynomials are the book [38] and the (yet unpublished) book [45].

Recall the polynomials \(\Gamma _k(x)\) defined in (29)–(31); if a function has a series expansion in terms of the Chebyshev polynomials of the first kind, \(T_k(x)\), on \([-1,1]\), then it has a series expansion in terms of \(\Gamma _k(x)\) on \([-2, 2]\).

We recall the definition of a Bernstein ellipse of radius \(\rho \).

Definition 33

Let \(\rho >1\), and let \(\mathcal E _B(\rho )\) be the image of the circle of radius \(\rho \), centered at the origin, under the map \(f(z) = \frac{z+ z^{-1}}{2}\). We call \(\mathcal E _B(\rho )\) the Bernstein ellipse of radius \(\rho \). The ellipse has foci at \(\pm 1\), and the sum of the major semiaxis and the minor semiaxis is exactly \(\rho \).
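As a quick numerical illustration of this definition (the radius \(\rho = 1.7\) below is an arbitrary choice), one can map a circle under \(z\mapsto (z+z^{-1})/2\) and confirm that the resulting ellipse has foci \(\pm 1\) and semiaxes \((\rho \pm \rho ^{-1})/2\), which indeed sum to \(\rho \):

```python
import numpy as np

rho = 1.7                                     # any radius > 1 (arbitrary choice)
theta = np.linspace(0.0, 2.0 * np.pi, 4001)
z = rho * np.exp(1j * theta)                  # circle of radius rho
w = (z + 1.0 / z) / 2.0                       # its image: the Bernstein ellipse E_B(rho)

semi_major = w.real.max()                     # equals (rho + 1/rho)/2
semi_minor = w.imag.max()                     # equals (rho - 1/rho)/2

assert np.isclose(semi_major, (rho + 1.0 / rho) / 2)
assert np.isclose(semi_minor, (rho - 1.0 / rho) / 2)
assert np.isclose(semi_major + semi_minor, rho)                  # semiaxes sum to rho
assert np.isclose(np.sqrt(semi_major**2 - semi_minor**2), 1.0)   # foci at +/- 1
print("semiaxes:", semi_major, semi_minor)
```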

To prove our main result for \(d\) fixed, we first need a lemma.

Lemma 34

Suppose that \(d\ge 2\) is fixed. Let \(f\) be a function defined on \(\mathbb C \) which is analytic inside a Bernstein ellipse of radius \(2\rho \), where \(\rho = (2d-1)^{\alpha }\), for some \(\alpha >2\), and such that \(|f(z)| < M\) inside this ellipse.

Let \(f(x) = \sum _{i=0}^{\infty } c_i \Gamma _i(x)\) for \(x\) on \([-2,2]\) (the existence, as well as uniform convergence of the series on \([-2, 2]\), is guaranteed by the fact that \(f\) is analytic on \([-2,2]\)).

Then the following things are true:

  1. (i)

    The expansion of \(f(x)\) in terms of \(\Gamma _i(x)\) actually converges uniformly on \([-2-\epsilon , 2+ \epsilon ]\) for some small enough \(\epsilon >0\).

  2. (ii)

    The aforementioned series expansion also converges pointwise on \([2, \frac{2d}{\sqrt{2d-1}}]\).

  3. (iii)

    If \(f_k :=\sum _{i=0}^k c_i \Gamma _i\) is the \(k\)th truncation of this (modified) Chebyshev series for \(f\), then, for a small enough \(\epsilon \),

    $$\begin{aligned} \sup _{0\le \left| x \right| \le 2+\epsilon } \left|f(x) - f_k(x) \right| \le M^{\prime } \left(2d-1\right)^{-\alpha ^{\prime } k}, \end{aligned}$$

    where \(2<\alpha ^{\prime }<\alpha \), and \(M^{\prime }\) is a constant independent of \(k\).

  4. (iv)

    For all \(k \in \mathbb N \), let \(b_k = \frac{1}{(2d-1)^k}\), and let \(\omega _k\) be the sequence of weights described in Theorem 21. Then the sequence of coefficients \(\{c_k\}_{k \in \mathbb N }\) satisfies

    $$\begin{aligned} \left( \frac{c_k}{(2d-1)^{k/2} \omega _k} \right)_{k\in \mathbb N } \in \mathbf{L}^2(\underline{\omega }). \end{aligned}$$

Proof

We will prove the facts (i) through (iv) in succession.

Facts (i) and (ii) will use a particular expression for \(T_n(x)\) outside \([-1,1]\), namely,

$$\begin{aligned} T_n(x) = \frac{(x-\sqrt{x^2-1})^n + (x + \sqrt{x^2-1})^n}{2}. \end{aligned}$$
(36)
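As a quick sanity check of (36), entirely separate from the proof, one can compare the closed form against a direct evaluation of \(T_n\); the degrees and evaluation points below are arbitrary.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def T_closed(n, x):
    # closed form (36); valid for real x with |x| >= 1
    s = np.sqrt(x**2 - 1.0)
    return ((x - s)**n + (x + s)**n) / 2.0

for n in (1, 2, 5, 11):
    for x in (1.0, 1.3, 2.0, 3.7):
        direct = C.chebval(x, [0.0] * n + [1.0])    # T_n(x) evaluated directly
        assert np.isclose(T_closed(n, x), direct)
print("closed form (36) matches the direct evaluation of T_n")
```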

For Fact (i), it is easy to see from (36) that, for \(\epsilon \) small enough and for all \(x\) in \([-2-\epsilon , 2+ \epsilon ]\),

$$\begin{aligned} |\Gamma _k(x)| \le C (1+ 3 \sqrt{\epsilon })^k~, \end{aligned}$$

where \(C\) is some constant independent of \(k\).

By Theorem 8.1 in [45], which first appeared in Section 61 of [7], it follows that

$$\begin{aligned} |c_k| \le M^{\prime } (2d-1)^{-\alpha k}~, \end{aligned}$$
(37)

for some constant \(M^{\prime }\) which may depend on \(M\) and \(d\), but not on \(k\).

Note that \(1+ 3\sqrt{\epsilon } < (2d-1)^{\alpha }\), for any \(d\ge 2,\,\alpha >2\), and \(\epsilon \) small enough.

Consequently, the series \(\sum _{k=0}^{\infty } c_k \Gamma _k(x)\) is absolutely convergent on \([-2-\epsilon , 2+ \epsilon ]\), and hence the expansion of \(f\) into this modified Chebyshev series is valid (and absolutely convergent) on \([-2-\epsilon , 2+ \epsilon ]\). This proves Fact (i).

Similarly, we now look on the interval \([2, \frac{2d}{\sqrt{2d-1}}]\), and note that on that interval the expression for \(T_n(x/2)\) will be bounded from above by

$$\begin{aligned} |T_n(x/2)| < \frac{1+ (2d-1)^{n/2}}{2}~; \end{aligned}$$

indeed, this happens because \(x/2 - \sqrt{x^2/4-1}\) is decreasing on this interval (and at most \(1\), attained at \(x=2\)), while \(x/2+\sqrt{x^2/4-1}\) is increasing (and at most \(\sqrt{2d-1}\), attained at \(x=2d/\sqrt{2d-1}\)), so its \(n\)th power is at most \((2d-1)^{n/2}\).

From here it follows once again that

$$\begin{aligned} |\Gamma _n(x)| \le 2 (2d-1)^{n/2}~, \end{aligned}$$

on \([2, \frac{2d}{\sqrt{2d-1}}]\), and thus the series \(\sum _{k=0}^{\infty } c_k \Gamma _k(x)\) is absolutely convergent on this interval as well. That the sum equals \(f\) there follows by analytic continuation: the series converges locally uniformly on a complex neighborhood of the interval, so it defines an analytic function which agrees with \(f\) on \([-2,2]\) and hence also on \([2, \frac{2d}{\sqrt{2d-1}}]\). This proves Fact (ii).

Fact (iii) is an immediate consequence of (37), combined with the bound on \(|\Gamma _k(x)|\) from the proof of Fact (i), by taking \(\epsilon \) small enough relative to \(d\) and \(\alpha \).

Fact (iv) follows easily from the definitions of \(\omega _k,\,\Theta _k\) (given in Theorem 21), and from (37). \(\square \)
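The coefficient bound (37) is an instance of the classical Bernstein bound: a function analytic and bounded inside a Bernstein ellipse of radius \(\rho \) has Chebyshev coefficients decaying like \(\rho ^{-k}\). The sketch below illustrates this mechanism numerically in the standard \(T_k\) basis on \([-1,1]\); the test function \(f(x) = 1/(3-x)\) and all parameters are arbitrary illustrative choices (its largest admissible ellipse has \(\rho = 3 + \sqrt{8}\)), and the rescaling to \([-2,2]\) used in the lemma does not change the picture.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Test function: analytic except at x = 3, hence analytic and bounded inside every
# Bernstein ellipse of radius rho < 3 + sqrt(8).
f = lambda x: 1.0 / (3.0 - x)
rho = 3.0 + np.sqrt(8.0)

# Approximate the Chebyshev coefficients of f on [-1, 1] by a high-degree least-squares fit
# at points clustered like the Chebyshev (arc-sine) distribution.
x = np.cos(np.linspace(0.0, np.pi, 4001))
coeffs = C.chebfit(x, f(x), deg=40)

for k in range(1, 13):
    # Bernstein-type bound: |c_k| <= const * rho^{-k}, so the second column should stay bounded.
    print(k, abs(coeffs[k]), abs(coeffs[k]) * rho**k)
```

The printed ratios \(|c_k|\,\rho ^k\) remain bounded (they hover around \(2/\sqrt{8}\)), which is exactly the geometric decay that (37) asserts.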

We can now present our main result for the case when \(d\) is fixed.

Theorem 35

Assume the same conditions on \(f\) and notations as in Lemma 34. Then the random variable \(\sum _{i=1}^n f(\lambda _i) - nc_0\) converges in law to the infinitely divisible random variable

$$\begin{aligned} Y_f:=\sum _{k=1}^\infty \frac{c_k}{(2d-1)^{k/2}} \mathrm{CNBW}_{k}^{(\infty )}~. \end{aligned}$$

Remark 36

There is a good explanation of why we must subtract \(nc_0\) in the statement of the above theorem. Consider the Kesten-McKay density, normalized to have support \([-2,2]\):

$$\begin{aligned} \rho _{2d}(x) = \frac{2d(2d-1)\sqrt{4-x^2}}{2\pi (4d^2-(2d-1)x^2)}. \end{aligned}$$

It is proved in [37] that in the uniform model of random \(d\)-regular graphs, the random variable \(n^{-1}\sum _{i=1}^nf(\lambda _i)\) converges in probability to \(\int _{-2}^2 f(x)\rho _d(x)dx\). This also holds for the present model; one can prove it by applying the contiguity results of [26], or by using the above theorem to compute that \(\lim _{n\rightarrow \infty }n^{-1}\sum _{i=1}^n \lambda _i^k\) is the \(k\)th moment of the Kesten-McKay law.

If \(\sum _{i=1}^n f(\lambda _i)\) converges in distribution (without subtracting the constant), then \(n^{-1}\sum _{i=1}^n f(\lambda _i)\) converges to zero in probability. Thus such a function \(f\) must be orthogonal to one in the \(\mathbf{L}^2\) space of the Kesten-McKay law. It has been shown in [41, Example 5.3] that the polynomials \((p_k)\), defined in (32), along with the constant polynomial \(p_0\equiv 1\) constitute an orthogonal basis for the \(\mathbf{L}^2\) space. The polynomials \((\Gamma _k)\), being linear combinations of \((p_k,\; k\ge 1)\), are therefore orthogonal to one in that \(\mathbf{L}^2\) space. Hence for any \(f\) of Theorem 35, the function \(f - c_0\) is orthogonal to the Kesten-McKay law.
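This orthogonality is easy to confirm numerically. Using the closed form \(\Gamma _k(x) = 2T_k(x/2) + (2d-2)/(2d-1)^{k/2}\) for even \(k\) and \(\Gamma _k(x) = 2T_k(x/2)\) for odd \(k\) (which follows from the \(U_k\) identities above and \(2T_k = U_k - U_{k-2}\)), the sketch below checks that \(\int _{-2}^2 \Gamma _k(x)\rho _{2d}(x)\,dx = 0\) for small \(k\); the value of \(d\) and the range of \(k\) are arbitrary, and SciPy is assumed to be available.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.integrate import quad

def kesten_mckay(x, d):
    # Kesten-McKay density normalized to have support [-2, 2]
    return 2 * d * (2 * d - 1) * np.sqrt(4 - x**2) / (2 * np.pi * (4 * d**2 - (2 * d - 1) * x**2))

def Gamma(k, x, d):
    # Gamma_k(x) = 2 T_k(x/2) + (2d-2)/(2d-1)^{k/2} for even k, and 2 T_k(x/2) for odd k
    const = (2 * d - 2) / (2 * d - 1) ** (k / 2.0) if k % 2 == 0 else 0.0
    return 2.0 * C.chebval(x / 2.0, [0.0] * k + [1.0]) + const

d = 3
print(quad(lambda x: kesten_mckay(x, d), -2, 2)[0])     # total mass: should be 1
for k in range(1, 7):
    val, _ = quad(lambda x: Gamma(k, x, d) * kesten_mckay(x, d), -2, 2)
    print(k, val)                                       # each value should be numerically ~ 0
```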

Proof

Armed with the results of Lemma 34, the proof is simple.

We first claim that

$$\begin{aligned} Y_{f}^{(n)}:=\sum _{k=1}^{r_n} c_k N_{k}^{(n)}= \sum _{k=1}^{r_n} \frac{c_k}{(2d-1)^{k/2} \omega _k} \mathrm{CNBW}_{k}^{(n)} \omega _k \end{aligned}$$

converges in law to \(Y_f\) as \(n\) tends to infinity. This follows from Theorem 21 and Lemma 4 once we show that the sequence

$$\begin{aligned} \left( \frac{c_k}{(2d-1)^{k/2} \omega _k} \right)_{k\in \mathbb N } \in \mathbf{L}^2(\underline{\omega }). \end{aligned}$$

This is precisely Fact (iv) from Lemma 34.

The result will now follow from Slutsky’s theorem once we show that, for any \(\delta >0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathbf{P}\left( \left|\sum _{i=1}^n f(\lambda _i) - nc_0- Y_{f}^{(n)} \right| > \delta \right) = 0. \end{aligned}$$
(38)

The proof of (38) has two components. Fix \(\alpha ^{\prime }\in (2,\alpha )\) as in Fact (iii) of Lemma 34, and choose the parameter \(\beta \) in (33) so that \(\alpha ^{\prime }\beta > 1\); since \(\alpha ^{\prime }>2\), such a \(\beta \) can be taken smaller than \(1/2\), as (33) requires. We start by noting

$$\begin{aligned} nc_0+ Y_{f}^{(n)}= nc_0+\sum _{i=1}^n \sum _{k=1}^{r_n} c_k \Gamma _k(\lambda _i)= \sum _{i=1}^n f_{r_n}(\lambda _i)= f_{r_n}(\lambda _1) + \sum _{i=2}^{n} f_{r_n}(\lambda _i), \end{aligned}$$

since \(\Gamma _0\equiv 1\) and \(f_{r_n}=\sum _{k=0}^{r_n}c_k\Gamma _k\).

Recall that the top eigenvalue of \(A_n\) is exactly \(2d\), irrespective of \(n\), so after dividing \(A_n\) by \(\sqrt{2d-1}\) the top eigenvalue is \(\lambda _1 = \frac{2d}{\sqrt{2d-1}}\). By Fact (ii) from Lemma 34, \(f_{r_n}(\frac{2d}{\sqrt{2d-1}})\) converges, as a deterministic sequence, to \(f(\frac{2d}{\sqrt{2d-1}})\). Choose a large enough \(n_1\) such that

$$\begin{aligned} \left|f_{r_n}\left(\frac{2d}{\sqrt{2d-1}}\right)- f \left(\frac{2d}{\sqrt{2d-1}}\right) \right| < \delta /4, \quad \text{ for} \text{ all}\,n \ge n_1. \end{aligned}$$

On the other hand, if we define the event

$$\begin{aligned} \mathcal A _n := \left\{ \left|\lambda _i \right|\le 2 + \epsilon ,\ \text{for all}\ i>1\right\} \!, \end{aligned}$$

Theorem 1.1 in [25] shows that \(\mathbf{P}\left( \mathcal A _n\right) \ge 1- cn^{-\tau }\), for some positive constants \(c\) and \(\tau \). On this event, Fact (iii) from Lemma 34, together with (33), implies that

$$\begin{aligned} \sum _{i=2}^{n}\left|f(\lambda _i) - f_{r_n}(\lambda _i) \right| \le (n-1) M^{\prime } (2d-1)^{-\alpha ^{\prime } r_n} \le M^{\prime } n\exp \left(-\alpha ^{\prime }\beta \log n\right)= M^{\prime } n^{1-\alpha ^{\prime }\beta }=o(1). \end{aligned}$$

Choose a large enough \(n_2\) such that the above number is less than \(\delta /4\).

Thus, for all \(n \ge \max (n_1, n_2)\), we have

$$\begin{aligned} \mathbf{P}\left( \left|\sum _{i=1}^n f(\lambda _i) - nc_0- Y_{f}^{(n)} \right| > \delta \right) \le \mathbf{P}\left(\mathcal A _n^c\right) \le cn^{-\tau }=o(1). \end{aligned}$$

This completes the proof. \(\square \)

 

Remark 37

We now take a moment to demonstrate how to compute the limiting distribution of \(\sum _{j=1}^n\Gamma _k(\lambda _j)\) when \(d=1\) using the results of [4], and we show that it is consistent with our own results. (Though in this paper we focus on \(d \ge 2\), our techniques apply for \(d=1\), too, and prove nearly the same result as Theorem 35.) Let \(M_n\) be a uniform random \(n\times n\) permutation matrix with eigenvalues \(e^{2\pi i \varphi _1},\ldots ,e^{2\pi i\varphi _n}\) on the unit circle. Let \(A_n=M_n+M_n^T\) with eigenvalues \(\lambda _1,\ldots ,\lambda _n\), which satisfy \(\lambda _j=2\cos (2\pi \varphi _j)\). We define \(f(x)=\Gamma _k(2\cos (2\pi x))=2\cos (2\pi k x)+c_k\), where \(c_k=0\) when \(k\) is odd and \(c_k=(2d-2)/(2d-1)^{k/2}\) when \(k\) is even. Then \(\sum _{j=1}^n\Gamma _k(\lambda _j)=\sum _{j=1}^nf(\varphi _j)\).

Theorem 1.1 of [4] gives the characteristic function of the limiting distribution \(\mu _f\) of \(\sum _{j=1}^nf(\varphi _j)-\mathbf{E}\sum _{j=1}^nf(\varphi _j)\) as

$$\begin{aligned} \hat{\mu }_f(t)&=\exp \left(\int (e^{itx}-1-itx)dM_f(x)\right) \end{aligned}$$

with \(M_f\) given by

$$\begin{aligned} M_f&= \sum _{j=1}^{\infty }\frac{1}{j}\delta _{jR_j(f)},\\ R_j(f)&= \frac{1}{j}\sum _{h=0}^{j-1}f\left(\frac{h}{j}\right)-\int \limits _0^1 f(x)dx. \end{aligned}$$

It is straightforward to calculate that

$$\begin{aligned} R_j(f)=\left\{ \begin{array}{l@{\quad }l} 2&\text{ if}\,j|k,\\ 0&\text{ otherwise}. \end{array}\right. \end{aligned}$$

Thus we find

$$\begin{aligned} \hat{\mu }_f(t)=\exp \left(\sum _{j|k}\left(\frac{1}{j}\left(e^{2itj}-1\right)-2it\right)\right), \end{aligned}$$

which is the characteristic function of \(\mathrm{CNBW}_{k}^{(\infty )}-\mathbf{E}\big [\mathrm{CNBW}_{k}^{(\infty )}\big ]\) for \(d=1\) (note that \(a(d,k)=2\) in this case).
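The computation of \(R_j(f)\) above is a one-line numerical check (recall that for \(d=1\) the constant \(c_k\) vanishes, so \(f(x)=2\cos (2\pi k x)\); the choice \(k=6\) below is arbitrary):

```python
import numpy as np

def R(j, f, grid=200001):
    # R_j(f) = (1/j) * sum_{h=0}^{j-1} f(h/j) - int_0^1 f(x) dx  (midpoint rule for the integral)
    x = (np.arange(grid) + 0.5) / grid
    return np.mean([f(h / j) for h in range(j)]) - np.mean(f(x))

k = 6
f = lambda x: 2.0 * np.cos(2.0 * np.pi * k * x)   # Gamma_k(2 cos(2 pi x)) for d = 1
for j in range(1, 13):
    print(j, round(R(j, f), 6))                   # 2.0 when j divides k, 0.0 otherwise
```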

Finally, we now consider the case of growing degree \(d = d_n\) and the relationship between \(d_n\) and \(r_n\), as given in the statement of Theorem 22 and in (33). Although we have chosen not to use the notation \(d_n\) elsewhere in the paper, we will use it here to emphasize that the pair \((d_n, r_n)\) varies with \(n\). For our results to apply, we will need both \(d_n\) and \(r_n\) to grow to \(\infty \).

We first remove the dependence on \(d_n\) from our orthogonal polynomial basis, making the basis elements scaled Chebyshev polynomials. Define

$$\begin{aligned} \Phi _0(x)&= 1,\end{aligned}$$
(39)
$$\begin{aligned} \Phi _{k}(x)&= 2 T_k \left( \frac{x}{2} \right)\!, \quad k \ge 1. \end{aligned}$$
(40)

If \(A_n\) is the adjacency matrix of \(G_n\) and \(\lambda _1\ge \cdots \ge \lambda _n\) are the eigenvalues of \((2d_n-1)^{-1/2}A_n\) and \(k \ge 1\), then

$$\begin{aligned} \sum _{i=1}^n\Phi _k(\lambda _i) =\left\{ \begin{array}{l@{\quad }l} (2d_n-1)^{-k/2}\left(\mathrm{CNBW}_{k}^{(n)} - (2d_n-2)n\right)&\text{ if}~k~\text{ is} \text{ even},\\ (2d_n-1)^{-k/2}\mathrm{CNBW}_{k}^{(n)}&\text{ if}~k~\text{ is} \text{ odd}. \end{array}\right. \end{aligned}$$
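This identity is deterministic: it holds for the adjacency matrix of any \(2d\)-regular graph, since both sides ultimately count cyclically non-backtracking closed walks of length \(k\). For a simple \(2d\)-regular graph, \(\mathrm{CNBW}_{k}^{(n)}\) can be computed directly as \(\mathrm{tr}(B^k)\), where \(B\) is the non-backtracking (directed-edge) matrix; this standard fact is not proved here. The sketch below side-steps the self-loop and multi-edge bookkeeping of the permutation model by working with a simple \(4\)-regular circulant graph (the graph, \(n=20\), and the range of \(k\) are arbitrary illustrative choices):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Simple 4-regular circulant graph on n vertices: i ~ i+1, i+2 (mod n); here 2d = 4, so d = 2.
n, d = 20, 2
A = np.zeros((n, n))
for i in range(n):
    for s in (1, 2):
        A[i, (i + s) % n] = A[(i + s) % n, i] = 1.0

# Non-backtracking (directed-edge) matrix B: entry ((u,v),(v,w)) = 1 iff A[v,w] = 1 and w != u.
edges = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
idx = {e: i for i, e in enumerate(edges)}
B = np.zeros((len(edges), len(edges)))
for (u, v) in edges:
    for w in range(n):
        if A[v, w] and w != u:
            B[idx[(u, v)], idx[(v, w)]] = 1.0

mu = np.linalg.eigvalsh(A) / np.sqrt(2 * d - 1)        # eigenvalues of (2d-1)^{-1/2} A

def Phi(k, x):
    # Phi_k(x) = 2 T_k(x/2); chebval evaluates T_k at any real argument
    return 2.0 * C.chebval(x / 2.0, [0.0] * k + [1.0])

Bk = np.eye(len(edges))
for k in range(1, 9):
    Bk = Bk @ B
    cnbw = np.trace(Bk)                                 # CNBW_k for a simple graph
    lhs = np.sum(Phi(k, mu))
    centering = (2 * d - 2) * n if k % 2 == 0 else 0
    rhs = (2 * d - 1) ** (-k / 2.0) * (cnbw - centering)
    print(k, round(lhs, 8), round(rhs, 8))              # the two columns should agree
```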

Note from (23) that

$$\begin{aligned} \widetilde{N}_{k}^{(n)} = \left\{ \begin{array}{l@{\quad }l} \sum \nolimits _{i=1}^n\Phi _k(\lambda _i) -(2d_n-1)^{-k/2}\big (\mu _k(d_n)-(2d_n-2)n\big )&\text{ if}~k~\text{ is} \text{ even}\\ \sum \nolimits _{i=1}^n\Phi _k(\lambda _i) -(2d_n-1)^{-k/2}\mu _k(d_n)&\text{ if}~k~\text{ is} \text{ odd} \end{array}\right. \end{aligned}$$

Our final result is very similar in spirit to Theorem 35, and we will need an analogue of Lemma 34 to make it work.

Lemma 38

Suppose now that \(d_n,\,r_n\) are growing with \(n\) and governed by (33). Consider the polynomials \(\Phi _k\) as in (39). Let \(f\) be an entire function on \(\mathbb C \). Let \(a>1\) be a fixed real number. Then

  1. (i)

    \(f\) admits an absolutely convergent (modified) Chebyshev series expansion

    $$\begin{aligned} f(x) = \sum _{i=0}^{\infty } c_i \Phi _i(x) \end{aligned}$$

    on \([-a, a]\);

  2. (ii)

    for some choice of weights \(\underline{\omega }=(b_k/k^2\log k)_{k\in \mathbb N }\) from Theorem 22, the sequence of coefficients \((c_k)_{k\in \mathbb N }\) satisfies

    $$\begin{aligned} \left( \frac{c_k}{\omega _k} \right)_{k\in \mathbb N }&\in&\mathbf{L}^2\left( \underline{\omega }\right)\!. \end{aligned}$$
    (41)

Proof

Both Facts (i) and (ii) follow in the same way as the proofs of Facts (i) and (ii) from Lemma 34, noting that, since \(f\) is entire, it is sufficient to choose a Bernstein ellipse of radius large enough. This will provide a fast-enough decaying geometric bound on the coefficients, to compensate for the bounds on the growth of the \(T_n(x)\) as given by (43), on the fixed interval \([-a,a]\).

We detail the proof of Fact (ii) a bit more, since it is only slightly more complex. Choose for example \(b_k = \frac{1}{2^k}\); since \(f\) is entire, we may choose the Bernstein ellipse of radius \(3a\), on which \(f\) is bounded by some given \(B\); as in the proof of Lemma 34, this implies that the coefficients \(c_n\) are bounded by

$$\begin{aligned} |c_n| \le B^{\prime } (3a)^{-n}~, \end{aligned}$$
(42)

for some \(B^{\prime }\) independent of \(n\).

As before, thanks to the expression (36), we can bound the growth of the modified Chebyshev polynomials on \([-a, a]\) by

$$\begin{aligned} \max _{x \in [-a, a]} |T_n(x/2)| \le B^{\prime \prime } a^n, \end{aligned}$$
(43)

for some \(B^{\prime \prime }\) independent of \(n\).

With these choices for \(\omega \) and \((b_k)_{k \in \mathbb N }\), (41) follows now from (42) and (43). \(\square \)

We can now give our main result for the case when \(d_n\) and \(r_n\) both grow. The essential difference from before is in the centering and in assumption (ii) below which stresses the dependence on the growth rate of the degree sequence.

Theorem 39

Assume the same setup as in Lemma 38, with the following additional constraints on the entire function \(f\):

  1. (i)

    Let \(C:=C(1)\) be chosen according to Theorem 24. Let \(f_k := \sum _{i=0}^k c_i \Phi _i\) denote the \(k\)th truncation of this series on \([-C, C]\). Then

    $$\begin{aligned} \sup _{0\le \left|x \right| \le C} \left|f(x) - f_k(x) \right| \le M \exp \left( -\alpha k h(k) \right)\!, \quad \text{ for} \text{ some}~\alpha >2~\text{ and}~M >0, \end{aligned}$$

    where \(h\) has been defined in (34).

  2. (ii)

    Recall the definition of the sequence \((r_n)\) from (33), with \(\beta \) now chosen so that \(\alpha \beta > 1\) (such a \(\beta < 1/2\) exists because \(\alpha > 2\)). Then \(f\) and its sequence of truncations, \(f_{r_n}\), satisfy

    $$\begin{aligned} \lim _{n\rightarrow \infty } \left|f_{r_n}\left(2d_n (2d_n-1)^{-1/2} \right) - f\left(2d_n(2d_n-1)^{-1/2}\right) \right|=0. \end{aligned}$$

Define now the array of constants

$$\begin{aligned} m^f_k(n):= \sum _{i=1}^k \frac{c_i}{(2d_n-1)^{i/2}}\left( \mu _i(d_n) - \mathbf{1}_{(i~\mathrm{is even})}(2d_n-2)n \right)\!. \end{aligned}$$

If conditions (i) and (ii) above are satisfied, the sequence of random variables

$$\begin{aligned} \left(\sum _{i=1}^n f(\lambda _i) - nc_0 - m^f_{r_n}(n)\right)_{n\in \mathbb N } \end{aligned}$$

converges in law to a normal random variable with mean zero and variance \(\sigma _f^2 = \sum _{k=1}^\infty 2k c_k^2\).

Remark 40

Note the significance of the term \(h(k)\): since \(h\) is typically logarithmic, as in (35), assumption (i) demands somewhat more than mere analyticity of \(f\). Similarly, assumption (ii) asks for convergence of the sequence of truncations evaluated at points diverging to \(\infty \); this is a kind of “diagonal” convergence, which is not automatically satisfied even for entire functions.

Proof

The proof is almost identical to the proof of Theorem 35 and we only highlight the slight differences. As before, define

$$\begin{aligned} Y_{n}^{(f)}:= \sum _{k=1}^{r_n} c_k \widetilde{N}_{k}^{(n)} = \sum _{i=1}^n \left(\sum _{k=1}^{r_n} c_k \Phi _k(\lambda _i)\right) - m^f_{r_n}(n), \end{aligned}$$

so that \(nc_0 + m^f_{r_n}(n) + Y_{n}^{(f)} = \sum _{i=1}^n f_{r_n}(\lambda _i)\), since \(\Phi _0\equiv 1\).

To prove that \(Y_{n}^{(f)}\) converges in law to \(N(0, \sigma _f^2)\), we use Fact (ii) from Lemma 38; the convergence then follows from Theorem 22. It remains to show that

$$\begin{aligned} \left|\sum _{i=1}^{n}f(\lambda _i) - nc_0 - m^f_{r_n}(n) - Y_{n}^{(f)} \right| = \left|\sum _{i=1}^{n}\left(f(\lambda _i) - f_{r_n}(\lambda _i)\right) \right| \end{aligned}$$

converges to zero in probability. The term corresponding to \(\lambda _1\) is handled by assumption (ii), while the remaining terms are controlled by assumption (i) and Theorem 24, exactly as in the proof of Theorem 35. \(\square \)