
The spectral norm of random inner-product kernel matrices

Probability Theory and Related Fields

Abstract

We study an “inner-product kernel” random matrix model, whose empirical spectral distribution was shown by Xiuyuan Cheng and Amit Singer to converge to a deterministic measure in the large n and p limit. We provide an interpretation of this limit measure as the additive free convolution of a semicircle law and a Marchenko–Pastur law. By comparing the tracial moments of this matrix to an additive deformation of a Wigner matrix, we establish that for odd kernel functions, the spectral norm of this matrix converges almost surely to the edge of the limiting spectrum. Our study is motivated by the analysis of a covariance thresholding procedure for the statistical detection and estimation of sparse principal components, and our results characterize the limit of the largest eigenvalue of the thresholded sample covariance matrix in the null setting.


Fig. 1
Fig. 2
Fig. 3
Fig. 4


References

  1. Amini, A.A., Wainwright, M.J.: High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Stat. 37(5B), 2877–2921 (2009)

  2. Bai, Z.D., Yin, Y.Q.: Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 21(3), 1275–1294 (1993)

  3. Baik, J., Ben Arous, G., Péché, S.: Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33(5), 1643–1697 (2005)

  4. Baik, J., Silverstein, J.W.: Eigenvalues of large sample covariance matrices of spiked population models. J. Multivar. Anal. 97(6), 1382–1408 (2006)

  5. Berthet, Q., Rigollet, P.: Complexity theoretic lower bounds for sparse principal component detection. In: Conference on Learning Theory, pp. 1046–1066 (2013)

  6. Berthet, Q., Rigollet, P.: Optimal detection of sparse principal components in high dimension. Ann. Stat. 41(4), 1780–1815 (2013)

  7. Biane, P.: On the free convolution with a semi-circular distribution. Indiana Univ. Math. J. 46(3), 705–718 (1997)

  8. Bickel, P.J., Levina, E.: Covariance regularization by thresholding. Ann. Stat. 36(6), 2577–2604 (2008)

  9. Birnbaum, A., Johnstone, I.M., Nadler, B., Paul, D.: Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Stat. 41(3), 1055–1084 (2013)

  10. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)

  11. Cai, T.T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106(494), 672–684 (2011)

  12. Cai, T.T., Ma, Z., Wu, Y.: Sparse PCA: optimal rates and adaptive estimation. Ann. Stat. 41(6), 3074–3110 (2013)

  13. Cai, T.T., Ma, Z., Wu, Y.: Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Relat. Fields 161(3–4), 781–815 (2015)

  14. Cai, T.T., Zhou, H.H.: Minimax estimation of large covariance matrices under \(\ell _1\)-norm. Stat. Sin. 22(4), 1319–1349 (2012)

  15. Cai, T.T., Zhou, H.H.: Optimal rates of convergence for sparse covariance matrix estimation. Ann. Stat. 40(5), 2389–2420 (2012)

  16. Capitaine, M., Donati-Martin, C., Féral, D., Février, M.: Free convolution with a semicircular distribution and eigenvalues of spiked deformations of Wigner matrices. Electron. J. Probab. 16(64), 1750–1792 (2011)

  17. Capitaine, M., Péché, S.: Fluctuations at the edges of the spectrum of the full rank deformed GUE. Probab. Theory Relat. Fields 165(1), 117–161 (2016)

  18. Carleson, L.: On Bernstein’s approximation problem. Proc. Am. Math. Soc. 2(6), 953–961 (1951)

  19. Chafaï, D., Tikhomirov, K.: On the convergence of the extremal eigenvalues of empirical covariance matrices with dependence. arXiv preprint arXiv:1509.02231 (2015)

  20. Cheng, X., Singer, A.: The spectrum of random inner-product kernel matrices. Random Matrices Theory Appl. 2(4), 1350010-1–1350010-47 (2013)

  21. d’Aspremont, A., El Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007)

  22. Deshpande, Y., Montanari, A.: Sparse PCA via covariance thresholding. J. Mach. Learn. Res. 17(141), 1–41 (2016)

  23. Do, Y., Vu, V.: The spectrum of random kernel matrices: universality results for rough and varying kernels. Random Matrices Theory Appl. 2(3), 1350005-1–1350005-29 (2013)

  24. El Karoui, N.: Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat. 36(6), 2717–2756 (2008)

  25. El Karoui, N.: The spectrum of kernel random matrices. Ann. Stat. 38(1), 1–50 (2010)

  26. Erdős, L., Schlein, B., Yau, H.T., Yin, J.: The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat. 48(1), 1–46 (2012)

  27. Fuk, D.K., Nagaev, S.V.: Probability inequalities for sums of independent random variables. Theory Probab. Appl. 16(4), 643–660 (1971)

  28. Füredi, Z., Komlós, J.: The eigenvalues of random symmetric matrices. Combinatorica 1(3), 233–241 (1981)

  29. Geman, S.: A limit theorem for the norm of random matrices. Ann. Probab. 8(2), 252–261 (1980)

  30. Götze, F., Tikhomirov, A.: Rate of convergence in probability to the Marchenko–Pastur law. Bernoulli 10(3), 503–548 (2004)

  31. Johnstone, I.M., Lu, A.Y.: Sparse principal components analysis. Unpublished manuscript (2004)

  32. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)

  33. Jolliffe, I.T., Trendafilov, N.T., Uddin, M.: A modified principal component technique based on the lasso. J. Comput. Graph. Stat. 12(3), 531–547 (2003)

  34. Kasiviswanathan, S.P., Rudelson, M.: Spectral norm of random kernel matrices with applications to privacy. arXiv preprint arXiv:1504.05880 (2015)

  35. Koltchinskii, V., Giné, E.: Random matrix approximation of spectra of integral operators. Bernoulli 6(1), 113–167 (2000)

  36. Krauthgamer, R., Nadler, B., Vilenchik, D.: Do semidefinite relaxations solve sparse PCA up to the information limit? Ann. Stat. 43(3), 1300–1322 (2015)

  37. Latała, R.: Some estimates of norms of random matrices. Proc. Am. Math. Soc. 133(5), 1273–1282 (2005)

  38. Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28(5), 1302–1338 (2000)

  39. Lee, J.O., Schnelli, K.: Edge universality for deformed Wigner matrices. Rev. Math. Phys. 27(8), 1550018 (2015)

  40. Lei, J., Vu, V.Q.: Sparsistency and agnostic inference in sparse PCA. Ann. Stat. 43(1), 299–322 (2015)

  41. Ma, Z.: Sparse principal component analysis and iterative thresholding. Ann. Stat. 41(2), 772–801 (2013)

  42. Male, C.: The norm of polynomials in large random and deterministic matrices. Probab. Theory Relat. Fields 154(3–4), 477–532 (2012)

  43. Nadler, B.: Finite sample approximation results for principal component analysis: a matrix perturbation approach. Ann. Stat. 36(6), 2791–2817 (2008)

  44. Onatski, A., Moreira, M.J., Hallin, M.: Asymptotic power of sphericity tests for high-dimensional data. Ann. Stat. 41(3), 1204–1231 (2013)

  45. Paul, D.: Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Stat. Sin. 17(4), 1617–1642 (2007)

  46. Pillai, N.S., Yin, J.: Universality of covariance matrices. Ann. Appl. Probab. 24(3), 935–1001 (2014)

  47. Rothman, A.J., Levina, E., Zhu, J.: Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 104(485), 177–186 (2009)

  48. Schölkopf, B., Smola, A., Müller, K.R.: Kernel principal component analysis. In: International Conference on Artificial Neural Networks, pp. 583–588. Springer (1997)

  49. Shcherbina, T.: On universality of local edge regime for the deformed Gaussian unitary ensemble. J. Stat. Phys. 143(3), 455–481 (2011)

  50. Shen, H., Huang, J.Z.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99(6), 1015–1034 (2008)

  51. Silverstein, J.W., Bai, Z.: On the empirical distribution of eigenvalues of a class of large dimensional random matrices. J. Multivar. Anal. 54(2), 175–192 (1995)

  52. Speicher, R.: Multiplicative functions on the lattice of non-crossing partitions and free convolution. Math. Ann. 298(1), 611–628 (1994)

  53. Srivastava, N., Vershynin, R.: Covariance estimation for distributions with \(2+\varepsilon \) moments. Ann. Probab. 41(5), 3081–3111 (2013)

  54. Szegő, G.: Orthogonal Polynomials. American Mathematical Society, Providence (1939)

  55. Tao, T.: Topics in Random Matrix Theory. American Mathematical Society, Providence (2012)

  56. Tao, T., Vu, V.: Random covariance matrices: universality of local statistics of eigenvalues. Ann. Probab. 40(3), 1285–1315 (2012)

  57. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y., Kutyniok, G. (eds.) Compressed Sensing, pp. 210–268. Cambridge University Press, Cambridge (2012)

  58. Voiculescu, D.: Addition of certain non-commuting random variables. J. Funct. Anal. 66(3), 323–346 (1986)

  59. Voiculescu, D.: Limit laws for random matrices and free products. Invent. Math. 104(1), 201–220 (1991)

  60. Vu, V.Q., Cho, J., Lei, J., Rohe, K.: Fantope projection and selection: a near-optimal convex relaxation of sparse PCA. In: Advances in Neural Information Processing Systems, pp. 2670–2678 (2013)

  61. Vu, V.Q., Lei, J.: Minimax rates of estimation for sparse PCA in high dimensions. AISTATS 15, 1278–1286 (2012)

  62. Witten, D.M., Tibshirani, R., Hastie, T.: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3), 515–534 (2009)

  63. Yin, Y.Q., Bai, Z.D., Krishnaiah, P.R.: On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Relat. Fields 78(4), 509–521 (1988)

  64. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006)


Author information

Correspondence to Zhou Fan.

Additional information

Zhou Fan is supported by a Hertz Foundation Fellowship and an NDSEG Fellowship (DoD, Air Force Office of Scientific Research, 32 CFR 168a). Andrea Montanari is partially supported by NSF Grants CCF-1319979 and DMS-1106627 and the AFOSR Grant FA9550-13-1-0036.

Appendices

Combinatorial results

This appendix contains the proofs of Lemmas 5.5, 5.7, and 5.13 used in Sect. 5, as well as the proof of Proposition 5.17 and the explicit construction of the map \(\varphi \) in that proposition.

1.1 Proof of Lemmas 5.5, 5.7, and 5.13

Proof

(Lemma 5.13) Let \(I=\{1,\ldots ,p\}\) and \(J=\{1,\ldots ,n\}\), and consider an undirected graph G on the vertex set \(I \sqcup J\) (the disjoint union of I and J, with \(n+p\) elements, treating the elements of I and the elements of J as distinct). Let G have an edge between \(i,i' \in I\) if there are three consecutive vertices (p, n, p) of the l-graph with the labels i, \(\emptyset \), \(i'\) or \(i'\), \(\emptyset \), i. Let G have an edge between \(i \in I\) and \(j \in J\) if there are two consecutive vertices of the l-graph such that the p-vertex has label i and the n-vertex has label j. The number of vertices of G incident to at least one edge is \(\tilde{m}\). Since the l-graph is a cycle, any two consecutive p-vertices are joined in G either directly (through an n-vertex with empty label) or through the label of the n-vertex between them, so the subgraph of G on these \(\tilde{m}\) vertices is connected and hence has at least \(\tilde{m}-1\) edges. An edge in G between \(i,i' \in I\) corresponds to at least two consecutive pairs of p-vertices in the l-graph having an n-vertex with empty label in between, by condition (3) of Definition 5.12; as there are \(l-\tilde{k}\) n-vertices with empty label, the number of such edges is at most \(\frac{l-\tilde{k}}{2}\). Similarly, an edge in G between \(i \in I\) and \(j \in J\) corresponds to at least two pairs of consecutive n- and p-vertices of the l-graph such that the n-vertex has non-empty label, by condition (2) of Definition 5.12; as there are \(2\tilde{k}\) such pairs, the number of such edges is at most \(\frac{2\tilde{k}}{2}=\tilde{k}\). Then \(\tilde{m}-1 \le \frac{l-\tilde{k}}{2}+\tilde{k}=\frac{l+\tilde{k}}{2}\). \(\square \)
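The construction of G in this proof can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: a simple-labeled l-graph is encoded by the cyclic list `p_labels` of its p-labels and the list `n_labels`, where `n_labels[s]` is the label of the n-vertex between `p_labels[s]` and `p_labels[(s+1) % l]` and `None` stands for the empty label \(\emptyset \); \(\tilde{k}\) is taken to be the number of n-vertices with non-empty label, and the function names are hypothetical. The code builds the edges of G and checks \(\tilde{m}-1 \le \frac{l+\tilde{k}}{2}\); it does not verify the conditions of Definition 5.12, so the inputs used with it should be chosen by hand to satisfy them.

```python
from fractions import Fraction


def auxiliary_graph(p_labels, n_labels):
    """Edge set of the graph G from the proof of Lemma 5.13 (illustrative encoding).

    Vertices of G are tagged ('p', i) or ('n', j) so that I and J stay disjoint.
    n_labels[s] sits between p_labels[s] and p_labels[(s + 1) % l]; None is the empty label.
    """
    l = len(p_labels)
    edges = set()
    for s in range(l):
        left, right, j = p_labels[s], p_labels[(s + 1) % l], n_labels[s]
        if j is None:
            # Pattern (i, empty, i'): an edge between two elements of I.
            edges.add(frozenset({("p", left), ("p", right)}))
        else:
            # A p-vertex next to an n-vertex labeled j: edges between I and J.
            edges.add(frozenset({("p", left), ("n", j)}))
            edges.add(frozenset({("p", right), ("n", j)}))
    return edges


def lemma_5_13_check(p_labels, n_labels):
    """Return (m_tilde, k_tilde, bound, holds) with bound = (l + k_tilde) / 2."""
    edges = auxiliary_graph(p_labels, n_labels)
    m_tilde = len(set().union(*edges))  # vertices of G incident to at least one edge
    k_tilde = sum(j is not None for j in n_labels)  # n-vertices with non-empty label
    bound = Fraction(len(p_labels) + k_tilde, 2)
    return m_tilde, k_tilde, bound, m_tilde - 1 <= bound
```

On the 2-graph with p-labels 1, 2 and both n-labels equal to 5, and on the 4-graph with p-labels 1, 2, 3, 2 and n-labels \(\emptyset \), 7, 7, \(\emptyset \), the bound holds with equality.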

Turning now to multi-labelings, for each \(j \in \{1,2,3,\ldots \}\) and a given multi-labeling, let us denote throughout

$$\begin{aligned} N_j:=\text {number of appearances of } j \text { as an } n\text {-label}. \end{aligned}$$

Then the following two lemmas hold:

Lemma A.1

In any multi-labeling of an l-graph, each j that appears as an n-label has \(N_j \ge 2\).

Proof

Suppose that an n-label j appears only once. The two p-vertices preceding and following that n-vertex must have distinct labels, say \(i_1\) and \(i_2\), by condition (1) of Definition 5.4. Then exactly one edge in the l-graph has p-vertex endpoint labeled \(i_1\) and n-vertex endpoint having label j (and similarly for \(i_2\) and j), contradicting condition (3) of Definition 5.4. \(\square \)

Lemma A.2

Suppose a multi-labeling of an l-graph has at most \(\frac{l}{2}\) distinct p-labels. If this multi-labeling has excess \(\varDelta \), then

$$\begin{aligned} \sum _{j:N_j \ge 3} N_j \le 6\varDelta -6. \end{aligned}$$

Consequently, the number of n-vertices having any label j for which \(N_j \ge 3\) is also at most \(6\varDelta -6\).

Proof

Observe that if m total distinct p-labels and n-labels appear in the labeling, and at most \(\frac{l}{2}\) of these are p-labels, then the labeling has at least \(m-\frac{l}{2}\) distinct n-labels. Let \(c=|\{j:N_j=2\}|\). Every other distinct n-label has \(N_j \ge 3\) by Lemma A.1, so \(2c+3\left( m-\frac{l}{2}-c\right) \le \sum _{s=1}^l d_s\) (where \(d_1,\ldots ,d_l\) are the numbers of n-labels on the l n-vertices), and hence \(c \ge 3m-\frac{3l}{2}-\sum _{s=1}^l d_s\). Then the n-labels in \(\{j:N_j=2\}\) account for at least \(6m-3l-2\sum _{s=1}^l d_s\) of the \(\sum _{s=1}^l d_s\) total n-label appearances, implying that at most \(3l+3\sum _{s=1}^l d_s-6m=6\varDelta -6\) appearances remain; these are exactly the appearances of labels j with \(N_j \ge 3\). This establishes the first claim, and the second follows directly from the first. \(\square \)
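For reference, the bookkeeping in this proof can be written as a single chain. Here \(D=\sum _{s=1}^l d_s\) is shorthand introduced only for this display, we use that every n-label appearance belongs to a label with \(N_j=2\) or \(N_j \ge 3\) (Lemma A.1), and the last identity is equivalent to \(\varDelta =\frac{l+D}{2}+1-m\):

```latex
\begin{aligned}
2c + 3\Bigl(m-\tfrac{l}{2}-c\Bigr) \le D
  &\;\Longrightarrow\; c \ge 3m-\tfrac{3l}{2}-D,\\
\sum_{j:N_j\ge 3} N_j \;=\; D - 2c
  &\le D-\bigl(6m-3l-2D\bigr) = 3l+3D-6m
   = 6\Bigl(\tfrac{l+D}{2}+1-m\Bigr)-6 = 6\varDelta-6.
\end{aligned}
```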

We will prove many subsequent claims regarding multi-labelings by induction on l. The following two lemmas describe the base case of the induction and the basic inductive step.

Lemma A.3

Suppose \(l=2\) or \(l=3\). Then for any multi-labeling of the l-graph, all l p-labels are distinct, and all l n-vertices have the same tuple of n-labels, up to reordering.

Proof

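That all l p-labels are distinct is a consequence of condition (1) of Definition 5.4, since for \(l \le 3\) any two p-vertices of the l-graph are consecutive. Then by conditions (2) and (3) of Definition 5.4, the n-vertices immediately preceding and following each p-vertex must have the same tuple of n-labels, up to reordering; going around the cycle, all l n-vertices therefore share one tuple. \(\square \)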

Lemma A.4

In a multi-labeling of an l-graph with \(l \ge 4\), suppose a p-vertex V is such that its p-label appears on no other p-vertices. Let the n-vertex preceding V be U, the p-vertex preceding U be T, the n-vertex following V be W, and the p-vertex following W be X.

  1. If T and X have different p-labels, then the graph obtained by deleting V and W and connecting U to X is an \((l-1)\)-graph with a valid multi-labeling.

  2. If T and X have the same p-label, then the graph obtained by deleting U, V, W, and X and connecting T to the n-vertex after X is an \((l-2)\)-graph with a valid multi-labeling.

Proof

First consider case (1). As T and X have distinct p-labels, it remains true that no two consecutive p-vertices in the \((l-1)\)-graph have the same p-label, so condition (1) of Definition 5.4 holds. Condition (2) of Definition 5.4 clearly still holds as well. If V has p-label i and W has n-labels \((j_1,\ldots ,j_d)\), then U has n-labels \((j_1,\ldots ,j_d)\) as well, up to reordering, by conditions (2) and (3) of Definition 5.4 and the fact that V is the only p-vertex with label i. Then in the \((l-1)\)-graph obtained by deleting V and W, the number of edges with p-vertex endpoint labeled i and n-vertex endpoint having label \(j_s\) for any \(s=1,\ldots ,d\) is zero, and the number of edges with p-vertex endpoint labeled \(i'\) and n-vertex endpoint having label \(j'\) is the same as in the original l-graph for all other pairs \((i',j')\). Thus condition (3) of Definition 5.4 still holds as well, so the \((l-1)\)-graph has a valid multi-labeling.

Now consider case (2). X and the p-vertex after X must have different p-labels in the original l-graph, by condition (1) of Definition 5.4. As T and X have the same p-label, this implies T and the p-vertex after X must have different p-labels, so condition (1) of Definition 5.4 still holds in the \((l-2)\)-graph. Condition (2) of Definition 5.4 clearly still holds in the \((l-2)\)-graph as well. Suppose V has p-label \(i_1\), T and X have p-label \(i_2\), and W has n-labels \((j_1,\ldots ,j_d)\). As in case (1), U must also have n-labels \((j_1,\ldots ,j_d)\) up to reordering. Then in the \((l-2)\)-graph obtained by deleting U, V, W, and X, the number of edges with p-vertex endpoint labeled \(i_1\) and n-vertex endpoint having label \(j_s\) for any \(s=1,\ldots ,d\) is zero, the number of edges with p-vertex endpoint labeled \(i_2\) and n-vertex endpoint having label \(j_s\) for any \(s=1,\ldots ,d\) is two less than in the original l-graph, and the number of edges with p-vertex endpoint labeled \(i'\) and n-vertex endpoint having label \(j'\) is the same as in the original l-graph for all other pairs \((i',j')\). Hence condition (3) of Definition 5.4 still holds as well, so the \((l-2)\)-graph has a valid multi-labeling. \(\square \)
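The two reductions of Lemma A.4 are mechanical operations on the cycle and can be sketched as below. The encoding is an assumption for illustration (`n_labels[s]` is the tuple of n-labels on the n-vertex between `p_labels[s]` and `p_labels[(s+1) % l]`), `reduce_at` is a hypothetical name, and the code checks only the hypotheses \(l \ge 4\) and uniqueness of the p-label of V, not the conditions of Definition 5.4.

```python
def reduce_at(p_labels, n_labels, v):
    """Apply the reduction of Lemma A.4 at the p-vertex V with index v.

    Encoding (illustrative assumption): n_labels[s] is the tuple of n-labels on the
    n-vertex between p_labels[s] and p_labels[(s + 1) % l].
    Returns the (p_labels, n_labels) lists of the reduced graph.
    """
    l = len(p_labels)
    assert l >= 4, "Lemma A.4 requires l >= 4"
    assert p_labels.count(p_labels[v]) == 1, "the p-label of V must be unique"
    # Rotate so that T = P[0], U = N[0], V = P[1], W = N[1], X = P[2].
    shift = (v - 1) % l
    P = p_labels[shift:] + p_labels[:shift]
    N = n_labels[shift:] + n_labels[:shift]
    if P[0] != P[2]:
        # Case (1): delete V and W; U now joins T to X, giving an (l-1)-graph.
        return P[:1] + P[2:], N[:1] + N[2:]
    # Case (2): delete U, V, W, X; T now joins the n-vertex after X, giving an (l-2)-graph.
    return P[:1] + P[3:], N[2:]
```

For example, with p-labels 1, 2, 3, 4 and every n-vertex labeled `("a",)`, reducing at the vertex labeled 2 falls into case (1) and returns a 3-graph; with p-labels 1, 2, 3, 2 and n-labels a, b, b, a, reducing at the vertex labeled 3 falls into case (2) and returns the 2-graph with p-labels 2, 1 and both n-labels a.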

Proof

(Lemma 5.5) We induct on l. For \(l=2\), a multi-labeling must have \(d_1=d_2\) and \(m=d_1+2\), and for \(l=3\), a multi-labeling must have \(d_1=d_2=d_3\) and \(m=d_1+3\), by Lemma A.3. The result is then easily verified in these two cases.
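The base cases can be checked mechanically. The sketch below assumes the excess is \(\varDelta =\frac{l+\sum _s d_s}{2}+1-m\) (the form forced by the identity \(3l+3\sum _s d_s-6m=6\varDelta -6\) in the proof of Lemma A.2), evaluates it exactly with `fractions.Fraction` for the labelings allowed by Lemma A.3, and confirms \(\varDelta \ge 0\); the helper name `excess` is hypothetical.

```python
from fractions import Fraction


def excess(l, d, m):
    """Excess Delta = (l + sum(d)) / 2 + 1 - m, computed exactly (assumed form)."""
    assert len(d) == l
    return Fraction(l + sum(d), 2) + 1 - m


# Lemma A.3: l = 2 forces d_1 = d_2 and m = d_1 + 2; l = 3 forces d_1 = d_2 = d_3, m = d_1 + 3.
for d1 in range(1, 20):
    delta2 = excess(2, [d1, d1], d1 + 2)
    delta3 = excess(3, [d1, d1, d1], d1 + 3)
    assert delta2 == 0                    # l = 2: the excess is exactly zero
    assert delta3 == Fraction(d1 - 1, 2)  # l = 3: the excess is (d_1 - 1)/2
    assert min(delta2, delta3) >= 0       # so Lemma 5.5 holds in both base cases
```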

Suppose by induction that the result holds for \(l-2\) and \(l-1\), and consider a multi-labeling of an l-graph with \(l \ge 4\). If each distinct p-label appears at least twice, then there are at most \(\frac{l}{2}\) distinct p-labels. Lemma A.1 implies there are at most \(\frac{\sum _{s=1}^l d_s}{2}\) distinct n-labels, so \(m \le \frac{l+\sum _{s=1}^l d_s}{2}\), establishing the result.

Thus, suppose that some p-vertex V has a label that appears exactly once, and let T, U, W, and X be as in Lemma A.4. If T and X have different p-labels, follow procedure (1) in Lemma A.4 to obtain a multi-labeling of an \((l-1)\)-graph. This multi-labeling now has \(m-1\) total distinct p-labels and n-labels (the p-label of V is lost, while the n-labels of W still appear on U), and so the induction hypothesis implies \(m-1 \le \frac{l-1+\sum _{s=1}^l d_s-d}{2}+1\), where d is the number of n-labels of the deleted n-vertex W. Hence, since \(d \ge 1\), \(m \le \frac{l+\sum _{s=1}^l d_s}{2}-\frac{d+1}{2}+2 \le \frac{l+\sum _{s=1}^l d_s}{2}+1\).

If T and X have the same p-label, follow procedure (2) of Lemma A.4 to obtain a multi-labeling of an \((l-2)\)-graph. This multi-labeling has between \(m-d-1\) and \(m-1\) (inclusive) total distinct p-labels and n-labels, where d is the number of n-labels of the deleted n-vertex W. The induction hypothesis implies \(m-d-1 \le \frac{l-2+\sum _{s=1}^l d_s-2d}{2}+1\), so \(m \le \frac{l+\sum _{s=1}^l d_s}{2}+1\). This completes the induction in both cases, establishing the desired result. \(\square \)
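Written out with the shorthand \(D=\sum _{s=1}^l d_s\) (introduced only for this display), the arithmetic in the two cases of the inductive step is:

```latex
\begin{aligned}
\text{case (1):}\quad m-1 &\le \frac{l-1+D-d}{2}+1
 \;\Longrightarrow\; m \le \frac{l+D}{2}+1-\frac{d-1}{2} \le \frac{l+D}{2}+1
 \qquad (d\ge 1),\\
\text{case (2):}\quad m-d-1 &\le \frac{l-2+D-2d}{2}+1 = \frac{l+D}{2}-d
 \;\Longrightarrow\; m \le \frac{l+D}{2}+1.
\end{aligned}
```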

Proof

(Lemma 5.7) We induct on l. For \(l=2\) or 3, we must have \(b_{ij}=0\) or 2 for all \((i,j)\) by Lemma A.3, and \(\varDelta \ge 0\) by Lemma 5.5, so the result holds.

Suppose the result holds for \(l-2\) and \(l-1\), and consider a multi-labeling of an l-graph with \(l \ge 4\). If each distinct p-label appears at least twice, then there are at most \(\frac{l}{2}\) distinct p-labels, so Lemma A.2 applies. For any j with \(N_j=2\), we have \(b_{ij}=2\) or \(b_{ij}=0\) for all i, by conditions (1) and (3) of Definition 5.4. For any j with \(N_j \ge 3\), we apply the bound \(\sum _{i:b_{ij}>2} b_{ij} \le 2N_j\). Then \(\sum _{i,j:b_{ij}>2} b_{ij} \le 2(6\varDelta -6) \le 12\varDelta \) by Lemma A.2.

Now suppose that some p-vertex V has a p-label appearing exactly once. Consider the \((l-1)\)-graph or \((l-2)\)-graph obtained by Lemma A.4. In the case of the \((l-1)\)-graph, it is easily verified that \(\sum _{i,j:b_{ij}>2} b_{ij}\) is the same as in the original l-graph, so the induction hypothesis implies \(\sum _{i,j:b_{ij}>2} b_{ij} \le 12\left( \frac{l-1+\sum _{s=1}^l d_s-d}{2}+1-(m-1)\right) \le 12\varDelta \), where \(d \ge 1\) is the number of n-labels on the deleted n-vertex W.

In the case of the \((l-2)\)-graph, suppose the deleted n-vertex W (and U) has d n-labels, of which \(d'\) also appear on an n-vertex different from W and U. If j does not appear on W or U, then clearly \(b_{ij}\) is the same in the \((l-2)\)-graph and the original l-graph for all i. If j is one of the \(d-d'\) n-label values appearing only on W and U, then \(b_{ij}=0\) or 2 in both the \((l-2)\)-graph and the original l-graph for all i. If j is one of the other \(d'\) n-label values appearing on W and U, then in deleting U, V, W, and X, we may have reduced \(b_{ij}\) by 2 for at most two distinct values of i (corresponding to the p-labels of V and X). This implies that \(\sum _{i:b_{ij}>2} b_{ij}\) reduces by at most 8 for this j, with the maximal reduction occurring if \(b_{ij}=4\) for both of these values of i in the original l-graph. Then by the induction hypothesis, \(\sum _{i,j:b_{ij}>2} b_{ij}-8d' \le 12\left( \frac{l-2+\sum _{s=1}^l d_s-2d}{2}+1-(m-1-(d-d'))\right) \), as the \((l-2)\)-graph has \(m-1-(d-d')\) total distinct n- and p-labels. Then \(\sum _{i,j:b_{ij}>2} b_{ij} \le 12\left( \frac{l+\sum _{s=1}^l d_s}{2}+1-m-d'\right) +8d' \le 12\varDelta \), so the result holds in this case as well, completing the induction. \(\square \)
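With the shorthand \(D=\sum _{s=1}^l d_s\) and the form \(\varDelta =\frac{l+D}{2}+1-m\) of the excess (implied by the identity \(3l+3D-6m=6\varDelta -6\) in the proof of Lemma A.2), the two induction bounds in this proof simplify as follows:

```latex
\begin{aligned}
(l-1)\text{-graph:}\quad
 & 12\Bigl(\frac{l-1+D-d}{2}+1-(m-1)\Bigr) = 12\Bigl(\varDelta-\frac{d-1}{2}\Bigr) \le 12\varDelta,\\
(l-2)\text{-graph:}\quad
 & 12\Bigl(\frac{l-2+D-2d}{2}+1-\bigl(m-1-(d-d')\bigr)\Bigr)+8d'
   = 12(\varDelta-d')+8d' = 12\varDelta-4d' \le 12\varDelta.
\end{aligned}
```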

1.2 Construction of the map \(\varphi \)

Definition A.5

In an l-graph with a multi-labeling, an n-vertex is single if it has only one n-label. It is a good single if it is single and if its n-label j appears only on single n-vertices. Otherwise, it is a bad single.

Definition A.6

In an l-graph with a multi-labeling, a pair \((V,V')\) of distinct (not necessarily consecutive) n-vertices is a good pair if the following conditions hold:

  1. V and \(V'\) have the same tuple of n-labels, up to reordering,

  2. V and \(V'\) are not single, and

  3. \(N_j=2\) for each j appearing as an n-label on V and \(V'\) (i.e., this label j appears on no other n-vertices).

If an n-vertex V is not single and not part of any good pair, then V is a bad non-single.

Thus, every n-vertex is either a good single, a bad single, a bad non-single, or part of a good pair. Conditions (1) and (3) of Definition 5.4 require that, if \((V,V')\) is a good pair, then the two (distinct) p-labels of the p-vertices preceding and following V are the same as those of the p-vertices preceding and following \(V'\) (but not necessarily in the same order).
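The four-way classification above depends only on the n-label tuples, so it can be computed directly. The sketch below takes the tuples in cyclic order, assumes each n-vertex carries distinct n-labels, and does not check that the input comes from a valid multi-labeling; `classify_n_vertices` is a hypothetical name. It uses the observation that, when \(N_j=2\) for every label on V, at most one other n-vertex can share the tuple of V, so V is in a good pair exactly when such a vertex exists.

```python
from collections import Counter


def classify_n_vertices(n_labels):
    """Classify each n-vertex as in Definitions A.5 and A.6 (illustrative sketch).

    n_labels[s] is the tuple of (distinct) n-labels on the s-th n-vertex. Returns one of
    'good single', 'bad single', 'bad non-single', or 'good pair' per n-vertex.
    """
    counts = Counter(j for labels in n_labels for j in labels)  # N_j
    # Labels that appear on at least one non-single n-vertex.
    on_non_single = {j for labels in n_labels if len(labels) > 1 for j in labels}
    # Sorted tuples of non-single vertices all of whose labels have N_j == 2.
    candidates = Counter(
        tuple(sorted(labels))
        for labels in n_labels
        if len(labels) > 1 and all(counts[j] == 2 for j in labels)
    )
    result = []
    for labels in n_labels:
        if len(labels) == 1:
            # Good single iff its label never appears on a non-single n-vertex.
            result.append("bad single" if labels[0] in on_non_single else "good single")
        elif candidates[tuple(sorted(labels))] == 2:
            # Exactly one other vertex carries the same labels, each with N_j == 2.
            result.append("good pair")
        else:
            result.append("bad non-single")
    return result
```

For instance, on the tuples (1), (1), (2, 3), (3, 2), (4), (4, 5), (5, 7), (7), the first two vertices are good singles, the next two form a good pair, (4) and (7) are bad singles, and (4, 5) and (5, 7) are bad non-singles.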

Definition A.7

Suppose \((V,V')\) is a good pair of n-vertices. Let the p-vertices preceding and following V be U and W, respectively, and let the p-vertices preceding and following \(V'\) be \(U'\) and \(W'\), respectively. Then the good pair \((V,V')\) is proper if U has the same label as \(W'\) and \(U'\) has the same label as W, and it is improper if U has the same label as \(U'\) and W has the same label as \(W'\).

Definition A.8

The label-simplifying map is the map from \((p,n)\)-multi-labelings of an l-graph to \((p,n+1)\)-simple-labelings of an l-graph, defined by the following procedure:

  1. While there exists an improper good pair of n-vertices \((V,V')\), iterate the following: let W be the p-vertex following V and \(W'\) be the p-vertex following \(V'\), and reverse the sequence of vertices starting at W and ending at \(W'\) (together with their labels).

  2. For each n-vertex in a good pair, relabel it with the empty label.

  3. For each n-vertex that is a bad single or a bad non-single, relabel it with the single label \(n+1\).
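Step (1) is a segment reversal on the cycle. A minimal sketch follows, assuming the l-graph is stored as an alternating list of `("p", label)` and `("n", labels)` entries that has been rotated so that V comes before \(V'\) and the block from W to \(W'\) does not wrap around the end of the list; the helper names are hypothetical. The second function checks properness in the sense of Definition A.7.

```python
def reverse_improper_pair(seq, a, b):
    """One iteration of step (1) of Definition A.8 (illustrative sketch).

    seq alternates ('p', label) and ('n', labels) entries around the l-graph cycle;
    a < b are the indices of V and V'. Assumes W' = seq[b + 1] does not wrap around.
    Returns the new sequence and the new index of V'.
    """
    assert seq[a][0] == seq[b][0] == "n" and 0 < a < b < len(seq) - 1
    w, w_prime = a + 1, b + 1
    # Reverse the block W, ..., W' together with its labels.
    new_seq = seq[:w] + seq[w:w_prime + 1][::-1] + seq[w_prime + 1:]
    return new_seq, a + 2  # V' now sits right after W', two positions after V


def is_proper(seq, a, b):
    """Definition A.7: label(U) == label(W') and label(U') == label(W)."""
    L = len(seq)
    u, w = seq[a - 1][1], seq[(a + 1) % L][1]
    u_prime, w_prime = seq[b - 1][1], seq[(b + 1) % L][1]
    return u == w_prime and u_prime == w
```

On the 4-graph with p-labels 1, 2, 1, 2 and n-labels (5, 6), (7,), (5, 6), (7,), the two n-vertices labeled (5, 6) form an improper good pair; one reversal moves \(V'\) to two positions after V and makes the pair proper.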

Remark A.9

In the case where there are multiple improper good pairs in step (1) of this procedure, it will not be important for our later arguments in which order the pairs \((V,V')\) are selected and which vertex we choose as V and which as \(V'\). For concreteness, we may always select \(\{V,V'\}\) to be the improper good pair whose sorted n-label-tuple is smallest lexicographically, and we may take V to come before \(V'\) in the l-graph cycle.

Lemma A.10

The following are true for the label-simplifying map in Definition A.8:

  1. Step (1) of the procedure in Definition A.8 always terminates in a valid \((p,n)\)-multi-labeling with no improper good pairs.

  2. The image of any \((p,n)\)-multi-labeling under the map is a valid \((p,n+1)\)-simple-labeling.

  3. If two multi-labelings are equivalent, then their image simple-labelings are also equivalent.

Proof

Clearly each reversal in step (1) of the procedure preserves condition (2) of Definition 5.4 as well as the number of good pairs and n-labels of each good pair. As W and \(W'\) have the same p-label because \((V,V')\) is improper, it also preserves conditions (1) and (3) of Definition 5.4, so the resulting labeling is still a valid \((p,n)\)-multi-labeling. Each time this reversal is performed, V and \(V'\) become consecutive n-vertices in the l-graph, and the pair \((V,V')\) becomes a proper good pair. As V and \(V'\) are consecutive, they must remain consecutive under each subsequent reversal, so their properness is preserved. Hence the procedure must terminate after a number of iterations at most the total number of good pairs in the multi-labeling, and the final multi-labeling is such that all good pairs are proper. This establishes (1).

To prove (2), note that the image labeling has either one n-label or the empty label for each n-vertex. Condition (1) of Definition 5.12 holds for the image labeling by condition (1) of Definition 5.4, as the p-labels are preserved. As all good pairs in the multi-labeling obtained after applying step (1) of the procedure are proper, and step (2) of the procedure maps their labels to the empty label, condition (3) of Definition 5.12 holds for the image labeling. Finally, note that if j is an n-label appearing on good single vertices in the multi-labeling, then condition (2) of Definition 5.12 holds in the image labeling for this j and all p-labels i by condition (3) in Definition 5.4. For the new n-label \(n+1\) created in step (3) of the map, note that for each \(i \in \{1,2,3,\ldots \}\) there must be an even number of edges in the l-graph with p-endpoint labeled i. Of these, there must be an even number with n-endpoint j for any good single label j, by the above argument, and there must also be an even number with n-endpoint belonging to a good pair since these edges must come in pairs. Hence the number of remaining edges adjacent to any p-vertex with label i must also be even. These are precisely the edges with p-endpoint labeled i and n-endpoint labeled \(n+1\) in the image labeling, so condition (2) of Definition 5.12 holds for the new n-label \(n+1\) and all p-labels i as well. Hence the image labeling is a valid \((p,n+1)\)-simple-labeling, establishing (2).

(3) is evident, as equivalent multi-labelings have the same proper and improper good pairs of n-vertices and the same good single n-vertices. \(\square \)

Definition A.11

Let \(\mathcal {C}\) and \(\tilde{\mathcal {C}}\) be the set of all multi-labeling equivalence classes and simple-labeling equivalence classes, respectively, of an l-graph. For \(\mathcal {L} \in \mathcal {C}\) and any multi-labeling in \(\mathcal {L}\), let \(\tilde{\mathcal {L}} \in \tilde{\mathcal {C}}\) contain its image simple-labeling under the label-simplifying map of Definition A.8, and define \(\varphi :\mathcal {C} \rightarrow \tilde{\mathcal {C}}\) by \(\varphi (\mathcal {L})=\tilde{\mathcal {L}}\).

1.3 Verification of Proposition 5.17, properties (1) and (2)

For the map \(\varphi \) of Definition A.11, property (1) of Proposition 5.17 is evident as the p-labels are preserved. We verify property (2) by bounding the number of bad non-single n-vertices.

For each pair \(i,i' \in \{1,2,3,\ldots \}\) with \(i<i'\), and for a given multi-labeling, let us denote

$$\begin{aligned} P_{i,i'}&:=\text {number of appearances of } i,i' \text { as the } p\text {-labels of two consecutive }\\&\quad p\text {-vertices (in some order)}. \end{aligned}$$
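The quantity \(P_{i,i'}\), and the left-hand side of the bound in Lemma A.12 below, depend only on the cyclic sequence of p-labels. A minimal sketch, with hypothetical function names and the p-labels given as a list in cyclic order:

```python
from collections import Counter


def consecutive_pair_counts(p_labels):
    """P_{i,i'}: number of cyclically consecutive p-vertex pairs with labels {i, i'}, i < i'."""
    l = len(p_labels)
    return Counter(
        tuple(sorted((p_labels[s], p_labels[(s + 1) % l]))) for s in range(l)
    )


def heavy_pair_mass(p_labels):
    """Sum of P_{i,i'} over the pairs i < i' with P_{i,i'} >= 3."""
    return sum(c for c in consecutive_pair_counts(p_labels).values() if c >= 3)
```

Since each of the l consecutive pairs of p-vertices is counted exactly once, the values \(P_{i,i'}\) always sum to l.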

Lemma A.12

Suppose a multi-labeling of an l-graph has excess \(\varDelta \). Then

$$\begin{aligned} \sum _{i<i':P_{i,i'} \ge 3} P_{i,i'} \le 42\varDelta . \end{aligned}$$

Proof

We induct on l. For \(l=2\) and 3, \(P_{i,i'} \le 2\) for all pairs \(i<i'\), so the left-hand side is zero, and \(\varDelta \ge 0\) by Lemma 5.5, so the result holds.

Suppose by induction that the result holds for \(l-2\) and \(l-1\), and consider a multi-labeling of an l-graph with \(l \ge 4\). First suppose each distinct p-label appears at least twice, so there are at most \(\frac{l}{2}\) distinct p-labels. If an n-label j is such that \(N_j=2\), then the pairs of p-vertices before and after the two n-vertices with label j must have the same pairs of p-labels, by conditions (1) and (3) of Definition 5.4. Thus the number of pairs \(i<i'\) with \(P_{i,i'}=1\) is at most the number of n-vertices for which \(N_j \ge 3\) for all of its n-labels j. This is at most \(6\varDelta \) by Lemma A.2. On the other hand, the number of distinct p-labels is at most one more than the number of distinct pairs of consecutive p-labels. (This is easily seen by considering the undirected graph with vertices \(\{1,\ldots ,p\}\) having an edge between \(i,i'\) if and only if some consecutive pair of p-vertices have labels i and \(i'\), and noting that this graph is connected.) Lemma A.1 implies there are at most \(\frac{\sum _{s=1}^l d_s}{2}\) distinct n-labels, and hence at least \(m-\frac{\sum _{s=1}^l d_s}{2}-1=\frac{l}{2}-\varDelta \) distinct pairs \(i<i'\) of consecutive p-labels. At least \(\frac{l}{2}-7\varDelta \) of these have \(P_{i,i'} \ge 2\). If c of these have \(P_{i,i'}=2\), then \(2c+3\left( \frac{l}{2}-7\varDelta -c\right) \le l\), so \(c \ge \frac{l}{2}-21\varDelta \). These account for at least \(l-42\varDelta \) pairs of consecutive p-vertices, implying that at most \(42\varDelta \) pairs of consecutive p-vertices remain. This establishes the result in this case.
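The final counting step in this case can be displayed as follows. Here c is the number of pairs with \(P_{i,i'}=2\) among the at least \(\frac{l}{2}-7\varDelta \) pairs with \(P_{i,i'} \ge 2\), and the \(P_{i,i'}\) sum to l over all pairs, since each of the l consecutive pairs of p-vertices is counted once:

```latex
\begin{aligned}
2c + 3\Bigl(\tfrac{l}{2}-7\varDelta-c\Bigr) \le l
 &\;\Longrightarrow\; c \ge \tfrac{l}{2}-21\varDelta,\\
\sum_{i<i':P_{i,i'}\ge 3} P_{i,i'} \;\le\; l-2c
 &\le l-\bigl(l-42\varDelta\bigr) = 42\varDelta.
\end{aligned}
```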

Now suppose that there is some p-vertex V whose p-label appears only once. Consider the \((l-1)\)-graph or \((l-2)\)-graph obtained by Lemma A.4. It is easily verified that \(\sum _{i<i':P_{i,i'} \ge 3} P_{i,i'}\) is the same in this graph as in the original l-graph, because if \(P_{i,i'} \ge 3\) in the original l-graph, then neither i nor \(i'\) can be the p-label of V. On the other hand, our proof of Lemma 5.5 verified that this \((l-1)\)-graph or \((l-2)\)-graph has excess at most that of the original l-graph, so the desired result follows from the induction hypothesis. \(\square \)
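The connectivity argument used in the first case of this proof can be checked on examples. The sketch below (hypothetical helper names; it ignores n-labels entirely) builds the graph whose vertices are the distinct p-labels, joining i and \(i'\) whenever they label two consecutive p-vertices, and confirms on random cyclic label sequences that the graph is connected and hence has at most one more vertex than it has edges.

```python
import random


def p_label_graph(p_labels):
    """Adjacency sets on the distinct p-labels: i ~ i2 (i != i2) iff some
    two consecutive p-vertices around the cycle carry labels i and i2."""
    l = len(p_labels)
    adj = {i: set() for i in p_labels}
    for s in range(l):
        i, i2 = p_labels[s], p_labels[(s + 1) % l]
        if i != i2:
            adj[i].add(i2)
            adj[i2].add(i)
    return adj


def is_connected(adj):
    """Depth-first search from an arbitrary vertex."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(adj)


def distinct_pairs(adj):
    """Number of distinct pairs i < i' of consecutive p-labels (edges)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


rng = random.Random(0)
for _ in range(1000):
    labels = [rng.randint(1, 6) for _ in range(rng.randint(2, 12))]
    adj = p_label_graph(labels)
    # The traversal visits every label, so the graph is connected, and a
    # connected graph on V vertices has at least V - 1 edges.
    assert is_connected(adj)
    assert len(adj) <= distinct_pairs(adj) + 1
```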

The next lemma bounds the number of bad non-single n-vertices; that is, it shows that in any multi-labeling with small excess \(\varDelta \), most of the non-single n-vertices must belong to a good pair.

Lemma A.13

Suppose a multi-labeling of an l-graph has excess \(\varDelta \) and k single n-vertices. Then there are at least \(\frac{l-k}{2}-48\varDelta \) good pairs of n-vertices.

Proof

Let m be the number of distinct n- and p-labels and let \(d_1,\ldots ,d_l\) be the numbers of n-labels on the l n-vertices. We induct on l. If \(l=2\), then Lemma A.3 implies \(d_1=d_2\), \(m=d_1+2\), and \(\varDelta =0\). If \(d_1=d_2=1\), then \(k=2\) and there are no good pairs, and if \(d_1=d_2 \ge 2\), then \(k=0\) and there is one good pair. Hence the result holds. If \(l=3\), then Lemma A.3 implies \(d_1=d_2=d_3\), \(m=d_1+3\), and \(\varDelta =\frac{d_1-1}{2}\). If \(d_1=d_2=d_3=1\), then \(k=3\), \(\varDelta =0\), and there are no good pairs. If \(d_1=d_2=d_3 \ge 2\), then \(k=0\), \(\varDelta \ge \frac{1}{2}\), and there are still no good pairs, while the claimed bound \(\frac{3}{2}-48\varDelta \) is negative. In either case, the result also holds.
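The excess used in these computations is \(\varDelta =\frac{l+\sum _{s=1}^l d_s}{2}+1-m\), as can be read off from the displayed identities in the cases below. A minimal sketch (hypothetical function name) that evaluates it with exact rational arithmetic and confirms the base-case values \(\varDelta =0\) for \(l=2\) and \(\varDelta =\frac{d_1-1}{2}\) for \(l=3\):

```python
from fractions import Fraction


def excess(l, d, m):
    """Excess Delta = (l + d_1 + ... + d_l)/2 + 1 - m of a multi-labeling of an
    l-graph, where d[s] is the number of n-labels on the s-th n-vertex and m is
    the number of distinct p- and n-labels."""
    if len(d) != l:
        raise ValueError("need one n-label count per n-vertex")
    return Fraction(l + sum(d), 2) + 1 - m


# Base cases of the induction in the proof of Lemma A.13.
for d1 in range(1, 6):
    assert excess(2, [d1, d1], d1 + 2) == 0                        # l = 2
    assert excess(3, [d1, d1, d1], d1 + 3) == Fraction(d1 - 1, 2)  # l = 3
```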

Consider \(l \ge 4\), and assume by induction that the result holds for \(l-2\) and \(l-1\). First suppose each distinct p-label appears at least twice, so there are at most \(\frac{l}{2}\) distinct p-labels. By Lemma A.2 there are at most \(6\varDelta \) n-vertices with some n-label j such that \(N_j \ge 3\), so there are at least \(l-k-6\varDelta \) non-single n-vertices for which each of its n-labels j has \(N_j=2\). Let V be one such n-vertex. We consider three cases:

Case 1: V has two n-labels \(j_1\) and \(j_2\) that appear on two different other n-vertices \(W_1\) and \(W_2\). Then Definition 5.4 implies that the three pairs of consecutive p-vertices around V, \(W_1\), and \(W_2\) must have the same pair of p-labels. By Lemma A.12, there are at most \(42\varDelta \) such n-vertices V.

Case 2: All n-labels of V appear on a single other n-vertex \(W_1\), but \(W_1\) has some additional n-label j not appearing on V. Then either all such additional n-labels j have \(N_j \ge 3\), or there is some such j with \(N_j=2\). In the former case, the number of such vertices \(W_1\) is at most \(6\varDelta \) by Lemma A.2. As V is the unique n-vertex sharing an n-label j with \(W_1\) for which \(N_j=2\), this implies the number of such vertices V is also at most \(6\varDelta \). In the latter case, j appears on a vertex \(W_2\) distinct from V and \(W_1\). Then the three pairs of p-vertices around V, \(W_1\), and \(W_2\) must have the same pair of p-labels, and by Lemma A.12 the number of such vertices V is at most \(42\varDelta \). Hence the number of n-vertices V belonging to this case is at most \(48\varDelta \).

Case 3: V forms a good pair with some other vertex \(V'\). Removing the at most \(42\varDelta \) and \(48\varDelta \) vertices of Cases 1 and 2 from the at least \(l-k-6\varDelta \) candidates leaves at least \(l-k-96\varDelta \) such vertices V, hence at least \(\frac{l-k}{2}-48\varDelta \) good pairs, and the result holds.

Now suppose there is some p-vertex V whose p-label appears only once. Let TUWX be as in Lemma A.4, and recall that U and W have the same n-labels up to reordering. Consider four cases:

Case 1: T and X have different p-labels, and U and W are single. Lemma A.4 yields an \((l-1)\)-graph with \(k-1\) single n-vertices, \(\sum _{s=1}^l d_s-1\) total n-labels, and \(m-1\) total distinct p- and n-labels. By the induction hypothesis, this \((l-1)\)-graph has at least

$$\begin{aligned} \tfrac{(l-1)-(k-1)}{2}-48\left( \tfrac{(l-1)+\left( \sum _{s=1}^l d_s-1\right) }{2}+1-(m-1) \right) =\tfrac{l-k}{2}-48\varDelta \end{aligned}$$

good pairs, which are also good pairs in the l-graph.

Case 2: T and X have different p-labels, and U and W each have \(d \ge 2\) n-labels. Lemma A.4 yields an \((l-1)\)-graph with k single n-vertices, \(\sum _{s=1}^l d_s-d\) total n-labels, and \(m-1\) distinct p- and n-labels. By the induction hypothesis, this \((l-1)\)-graph has at least

$$\begin{aligned} \tfrac{(l-1)-k}{2}-48\left( \tfrac{(l-1)+\left( \sum _{s=1}^l d_s-d\right) }{2}+1-(m-1) \right) >\tfrac{l-k}{2}-48\varDelta +1 \end{aligned}$$

good pairs. It can have at most one more good pair than the original l-graph (which occurs if W has a tuple of n-labels appearing on exactly three different n-vertices in the l-graph).

Case 3: T and X have the same p-label, and U and W are single. Lemma A.4 yields an \((l-2)\)-graph with \(k-2\) single n-vertices, \(\sum _{s=1}^l d_s-2\) total n-labels, and either \(m-2\) distinct p- and n-labels if U and W have an n-label appearing only those two times, or \(m-1\) distinct p- and n-labels otherwise. Supposing the former, this \((l-2)\)-graph has at least

$$\begin{aligned} \tfrac{(l-2)-(k-2)}{2}-48\left( \tfrac{(l-2)+\left( \sum _{s=1}^l d_s-2\right) }{2}+1-(m-2) \right) =\tfrac{l-k}{2}-48\varDelta \end{aligned}$$

good pairs, and it has the same number of good pairs as the original l-graph. Supposing the latter, this \((l-2)\)-graph has at least

$$\begin{aligned} \tfrac{(l-2)-(k-2)}{2}-48\left( \tfrac{(l-2)+\left( \sum _{s=1}^l d_s-2\right) }{2}+1-(m-1) \right) >\tfrac{l-k}{2}-48\varDelta +1 \end{aligned}$$

good pairs, and it can have at most one more good pair than the original l-graph (which occurs if the \((l-2)\)-graph has a good pair containing the n-label of the removed vertices U and W).

Case 4: T and X have the same p-label, and U and W each have \(d \ge 2\) n-labels. Lemma A.4 yields an \((l-2)\)-graph with k single n-vertices, \(\sum _{s=1}^l d_s-2d\) total n-labels, and between \(m-d-1\) and \(m-1\) (inclusive) distinct p- and n-labels. If it has exactly \(m-d-1\) distinct p- and n-labels, then we must have removed a good pair, and the \((l-2)\)-graph has at least

$$\begin{aligned} \tfrac{(l-2)-k}{2}-48\left( \tfrac{(l-2)+\left( \sum _{s=1}^l d_s-2d\right) }{2}+1- (m-d-1)\right) =\tfrac{l-k}{2}-48\varDelta -1 \end{aligned}$$

good pairs; adding back the removed good pair gives at least \(\tfrac{l-k}{2}-48\varDelta \) good pairs in the l-graph. If, instead, the \((l-2)\)-graph has \(m-c-1\) distinct p- and n-labels for \(0 \le c < d\), then U and W cannot be a good pair in the original l-graph as they have \(d-c\) n-labels j for which \(N_j \ge 3\), and the \((l-2)\)-graph can have at most \(d-c\) more good pairs than the l-graph, one for each such j. The \((l-2)\)-graph has at least

$$\begin{aligned} \tfrac{(l-2)-k}{2}-48\left( \tfrac{(l-2)+\left( \sum _{s=1}^l d_s-2d\right) }{2}+1 -(m-c-1)\right) >\tfrac{l-k}{2}-48\varDelta +d-c \end{aligned}$$

good pairs. In all cases, we establish that the l-graph has at least \(\tfrac{l-k}{2}-48\varDelta \) good pairs, completing the induction. \(\square \)
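The four cases rest on exact identities and strict inequalities between the induction bound for the reduced graph and the target bound \(\frac{l-k}{2}-48\varDelta \). The following sketch checks that arithmetic with exact fractions, using \(\varDelta =\frac{l+\sum _s d_s}{2}+1-m\). It only verifies the algebra of the displayed formulas (which holds for arbitrary integer inputs), not the combinatorial claims; the function names are hypothetical.

```python
import random
from fractions import Fraction


def excess_from_totals(l, sum_d, m):
    """Delta = (l + sum_d)/2 + 1 - m, with sum_d the total number of n-labels."""
    return Fraction(l + sum_d, 2) + 1 - m


def claimed_good_pairs(l, k, sum_d, m):
    """Lower bound of Lemma A.13: (l - k)/2 - 48 * Delta."""
    return Fraction(l - k, 2) - 48 * excess_from_totals(l, sum_d, m)


def check_lemma_a13_cases(l, k, sum_d, m, d, c):
    """Compare the bound for each reduced graph with the target for the l-graph."""
    t = claimed_good_pairs(l, k, sum_d, m)
    g = claimed_good_pairs
    return (
        g(l - 1, k - 1, sum_d - 1, m - 1) == t                 # Case 1
        and g(l - 1, k, sum_d - d, m - 1) > t + 1              # Case 2
        and g(l - 2, k - 2, sum_d - 2, m - 2) == t             # Case 3, m - 2 labels
        and g(l - 2, k - 2, sum_d - 2, m - 1) > t + 1          # Case 3, m - 1 labels
        and g(l - 2, k, sum_d - 2 * d, m - d - 1) == t - 1     # Case 4, m - d - 1 labels
        and g(l - 2, k, sum_d - 2 * d, m - c - 1) > t + d - c  # Case 4, 0 <= c < d
    )


rng = random.Random(1)
for _ in range(2000):
    D = rng.randint(2, 6)
    d = rng.randint(2, D)
    c = rng.randint(0, d - 1)
    l = rng.randint(4, 40)
    assert check_lemma_a13_cases(l, rng.randint(0, l), rng.randint(l, D * l),
                                 rng.randint(1, 3 * l), d, c)
```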

Proof

(Proposition 5.17, property (2)) Let \(\mathcal {L} \in \mathcal {C}\) be any multi-labeling equivalence class. Let \(\varphi (\mathcal {L})\) have \(\tilde{k}\) n-vertices with non-empty label. This means \(\mathcal {L}\) has \(\tilde{k}\) n-vertices that do not belong to a good pair. These vertices have at least \(\tilde{k}\) total n-labels in \(\mathcal {L}\), implying that there are at most \(\sum _{s=1}^l d_s-\tilde{k}\) total n-labels on the good pair vertices. These good pair vertices account for at most \(\frac{\sum _{s=1}^l d_s-\tilde{k}}{2}\) distinct n-labels in \(\mathcal {L}\), and these are mapped to the empty label under the label-simplifying map. Furthermore, by Lemma A.13, there are at most \(96\varDelta (\mathcal {L})\) bad non-single n-vertices, and these have at most \(96D\varDelta (\mathcal {L})\) additional distinct n-labels that are mapped to the new n-label \(n+1\). Any bad single n-vertex has an n-label that is the same as one of these \(96D\varDelta (\mathcal {L})\) distinct n-labels (otherwise it is a good single by definition), and the n-label of any good single n-vertex is preserved under the label-simplifying map. Hence, if m is the number of total distinct p- and n-labels in \(\mathcal {L}\) and \(\tilde{m}\) is the number of total distinct p-labels and non-empty n-labels in \(\varphi (\mathcal {L})\), then \(\tilde{m} \ge m-\frac{\sum _{s=1}^l d_s-\tilde{k}}{2}-96D\varDelta (\mathcal {L})\), so \(\tilde{\varDelta }(\varphi (\mathcal {L})) =\frac{l+\tilde{k}}{2}+1-\tilde{m} \le (96D+1)\varDelta (\mathcal {L})\). Hence property (2) holds. \(\square \)

1.4 Verification of Proposition 5.17, property (3)

Recall that we order the vertices of an l-graph according to a cyclic traversal starting from an (arbitrary) p-vertex.

Definition A.14

The canonical simple labeling in a simple labeling equivalence class \(\tilde{\mathcal {L}}\) is the one in which, for each i, the ith new p-vertex label to appear in the cyclic traversal is i, and, for each j, the jth new non-empty n-vertex label to appear is j.

The canonical multi-labeling in a multi-labeling equivalence class \(\mathcal {L}\) is the one in which the ith new p-vertex label is i and the jth new n-vertex label is j, with the new n-vertex labels in the label-tuple of each n-vertex appearing in sorted order.

Each \(\tilde{\mathcal {L}}\) has a unique canonical simple labeling, which is an \((l,l)\)-simple labeling, and each \(\mathcal {L}\) has a unique canonical multi-labeling, which is an \((l,Dl)\)-multi-labeling.
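For the simple-labeling case, canonicalization is relabeling by order of first appearance along the traversal, with p-labels and non-empty n-labels numbered independently. A minimal sketch, assuming a hypothetical representation of the traversal as a list of (kind, label) pairs with None for an empty n-label; it does not handle the label-tuples of multi-labelings.

```python
def canonical_simple_labeling(vertices):
    """Canonicalize a simple labeling given as a cyclic traversal.

    vertices: list of (kind, label) pairs in traversal order, starting at a
    p-vertex, with kind in {"p", "n"}; None marks an empty n-label.
    The i-th new p-label becomes i and the j-th new non-empty n-label
    becomes j; p-labels and n-labels are numbered independently.
    """
    tables = {"p": {}, "n": {}}
    out = []
    for kind, label in vertices:
        if label is None:  # empty n-label: nothing to rename
            out.append((kind, None))
            continue
        table = tables[kind]
        if label not in table:
            table[label] = len(table) + 1
        out.append((kind, table[label]))
    return out


walk = [("p", 7), ("n", None), ("p", 3), ("n", 5), ("p", 7), ("n", 5)]
print(canonical_simple_labeling(walk))
# [('p', 1), ('n', None), ('p', 2), ('n', 1), ('p', 1), ('n', 1)]
```

Any two labelings that differ only by renaming labels are sent to the same output, and the output uses p-labels in \(\{1,\ldots ,l\}\), consistent with the \((l,l)\) range stated above.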

For each \(\tilde{\mathcal {L}} \in \tilde{\mathcal {C}}\) and \(\varDelta _0 \ge 0\), property (3) of Proposition 5.17 is a bound on a certain weighted cardinality of the set

$$\begin{aligned} \mathcal {S}(\varDelta _0,\tilde{\mathcal {L}}):=\varphi ^{-1}(\tilde{\mathcal {L}}) \cap \{\mathcal {L}:\varDelta (\mathcal {L})=\varDelta _0\}. \end{aligned}$$

We describe a series of non-determined steps by which the mapping \(\varphi \) may be “inverted” to obtain the canonical multi-labeling L of any \(\mathcal {L} \in \varphi ^{-1}(\tilde{\mathcal {L}})\), given \(\tilde{\mathcal {L}}\):

  1. Choose a non-empty n-label value appearing in \(\tilde{\mathcal {L}}\) to be “\(\hbox {n}+1\)”, or assume there is no such label. (The n-vertices with empty label will be the good pairs, and the remaining n-vertices with label different from “\(\hbox {n}+1\)” will be the good singles.)

  2. Choose a subset S of n-vertices with label “\(\hbox {n}+1\)” to be the bad non-singles in L. (The remaining n-vertices with label “\(\hbox {n}+1\)” will be the bad singles.)

  3. For each n-vertex in S, choose the size of its n-label tuple in L to be between 2 and D (inclusive), and pick n-labels from \(\{1,\ldots ,Dl\}\) for that tuple.

  4. For each n-vertex with label “\(\hbox {n}+1\)” not in S, pick a single value in \(\{1,\ldots ,Dl\}\) for its n-label in L.

  5. For all n-vertices with empty label in \(\tilde{\mathcal {L}}\), pair them up into good pairs for L.

  6. For each good pair, choose the size of its n-label tuple in L to be between 2 and D (inclusive), and choose a permutation of the second n-label tuple of the pair that matches the first.

  7. Let \(\mathcal {G}\) be the set of good pairs \((V,V')\) that are consecutive n-vertices in the l-graph and such that the p-label (in \(\tilde{\mathcal {L}}\)) of the p-vertex between them appears at least twice. Choose an ordered subset of \(\mathcal {G}\). For each \((V,V')\) in this subset, if W is the p-vertex between V and \(V'\), choose some other p-vertex \(W'\) having the same p-label as W, and reverse the sequence of vertices from W to \(W'\) or from \(W'\) to W.

  8. Choose p-labels for L such that the resulting labeling is canonical and two p-vertices have the same label if and only if they do in \(\tilde{\mathcal {L}}\). Choose the remaining n-labels for L (corresponding to the good pairs and good singles) such that the resulting labeling is canonical, the properties of Definitions A.5 and A.6 are satisfied, and two good single vertices have the same n-label if and only if they do in \(\tilde{\mathcal {L}}\).

These steps are non-determined in the sense that each step may be performed in multiple ways, yielding many possible output multi-labelings L. They “invert” \(\varphi \) in the following sense:

Lemma A.15

For any \(\mathcal {L} \in \varphi ^{-1}(\mathcal {\tilde{L}})\), the canonical multi-labeling L of \(\mathcal {L}\) is a possible output of the above procedure.

Proof

Let \(L^*\) denote the \((l,Dl)\)-multi-labeling obtained by applying step (1) of the label-simplifying map in Definition A.8 to L. (It is an \((l,Dl)\)-multi-labeling by Lemma A.10.)

L may be obtained by the above procedure as follows: Perform steps (1) and (2) to correctly partition the n-vertices into the good pair, good single, bad single, and bad non-single n-vertices of \(L^*\). Perform steps (3) and (4) to recover the n-labels in \(L^*\) of the bad single and bad non-single n-vertices. Perform steps (5) and (6) to correctly identify the good pairs of \(L^*\) and the permutation that maps the label-tuple of the second vertex to that of the first vertex in each pair. Perform step (7) to invert the reversals that mapped L to \(L^*\) (in the reverse of the order in which they were applied in the label-simplifying map): This is possible because each reversal in step (1) of the label-simplifying map causes an additional good pair \((V,V')\) of n-vertices to become consecutive in the l-graph, with the p-vertex between them having p-label appearing at least twice, and these three vertices remain consecutive after each subsequent reversal. Finally, perform step (8) to recover the p-labels and the good single and good pair n-labels of L, which is possible because (by assumption) L is a valid canonical multi-labeling. \(\square \)

To obtain the desired weighted cardinality bound for \(\mathcal {S}(\varDelta _0,\tilde{\mathcal {L}})\), we bound the number of ways each of the above 8 steps may be performed such that the final output L is the canonical multi-labeling for some \(\mathcal {L} \in \mathcal {S}(\varDelta _0,\tilde{\mathcal {L}})\). The bounds for all but steps (4) and (7) follow from our preceding combinatorial estimates. The following simple lemma will yield a bound for step (7):

Lemma A.16

Suppose a multi-labeling of an l-graph has excess \(\varDelta \). Then there are at most \(2\varDelta \) good pairs of n-vertices such that the two vertices in the pair are consecutive in the l-graph cycle and the p-label of the p-vertex between them appears at least twice in the labeling.

Proof

Call a p-vertex “sandwiched” if it is between two consecutive n-vertices that form a good pair. Let i be a p-label appearing on a total of \(b \ge 2\) p-vertices, of which \(c \ge 1\) are sandwiched. If \(b>c\), then change the c appearances of i on the sandwiched p-vertices to c new p-labels not yet appearing in the labeling. Otherwise if \(b=c\) (so \(c \ge 2\)), then change \(c-1\) appearances of i on the sandwiched p-vertices to \(c-1\) new p-labels not yet appearing in the labeling. Do this for every such i. Note that changing the p-label of any sandwiched p-vertex does not violate any of the conditions of Definition 5.4, so the resulting labeling is still a valid multi-labeling. If x is the number of good pairs originally satisfying the condition of the lemma, then, since \(c-1 \ge \frac{c}{2}\) whenever \(c \ge 2\), we have added at least \(\frac{x}{2}\) new p-labels to the labeling. Hence Lemma 5.5 implies \(m+\frac{x}{2} \le \frac{l+\sum _{s=1}^l d_s}{2}+1\), so \(x \le 2\varDelta \). \(\square \)
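The counting step in this proof is that each p-label i contributes at least \(\frac{c}{2}\) new labels, so that summing over i gives at least \(\frac{x}{2}\). A small sketch of that case analysis (the function name is hypothetical) checks it exhaustively over a range of b and c:

```python
from fractions import Fraction


def new_labels_added(b, c):
    """New p-labels introduced for one p-label i appearing on b >= 2 p-vertices,
    c >= 1 of which are sandwiched (the relabeling in the proof of Lemma A.16)."""
    if not (b >= 2 and 1 <= c <= b):
        raise ValueError("need b >= 2 and 1 <= c <= b")
    return c if b > c else c - 1  # keep one copy of i when every copy is sandwiched


# Each label contributes at least c/2 new labels, so summing over labels gives
# at least x/2 new p-labels in total.
for b in range(2, 30):
    for c in range(1, b + 1):
        assert new_labels_added(b, c) >= Fraction(c, 2)
```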

The remaining challenge is to bound the number of ways of performing step (4). This bound is not straightforward because the number of bad singles is not necessarily small when \(\varDelta \) is small. We instead show that the number of bad singles that we may “freely label” is small:

Definition A.17

In a multi-labeling of an l-graph, \(i \in \{1,2,3,\ldots \}\) is a connector if it appears as a p-label and, among all n-vertices that are adjacent to any p-vertex with label i, exactly two are bad singles and none are bad non-singles; in this case, we say that these two bad singles are connected (by i). A sequence of bad singles \(W_1,\ldots ,W_a\) is a connected cycle if \(W_1\) is connected to \(W_2\), \(W_2\) is connected to \(W_3\), and so on, and \(W_a\) is connected to \(W_1\).

Note that “connector” refers to a label i, not to any specific p-vertex having i as its label, and two “connected” bad singles are adjacent to p-vertices having the connector label i, but these p-vertices may be distinct in the l-graph. Each bad single n-vertex may be connected to at most two other bad single n-vertices (where the connectors are the p-labels of its two adjacent p-vertices), and hence this notion of connectedness partitions the set of bad single n-vertices into connected components that are either individual vertices, linear chains, or cycles.
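Since each bad single is connected to at most two others, the components of this relation can be classified by a simple count: a component with as many connections as vertices is a cycle, and otherwise it is a single vertex or a linear chain. A minimal sketch, assuming a hypothetical input format in which the connections are given as a list of pairs of distinct bad-single identifiers, one pair per connector (a repeated pair forms a cycle of length two):

```python
from collections import defaultdict


def classify_components(vertices, connections):
    """Split bad singles into components of the 'connected' relation.

    vertices: iterable of vertex ids; connections: list of pairs (u, v) of
    distinct vertices, one per connector.  Every vertex must occur in at most
    two pairs.  Returns counts of isolated vertices, chains and cycles.
    """
    adj = defaultdict(list)
    for u, v in connections:
        adj[u].append(v)
        adj[v].append(u)
    assert all(len(nbrs) <= 2 for nbrs in adj.values())
    counts = {"isolated": 0, "chain": 0, "cycle": 0}
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # Half the degree sum counts the connections inside the component.
        edges = sum(len(adj[v]) for v in comp) // 2
        if len(comp) == 1:
            counts["isolated"] += 1
        elif edges == len(comp):
            counts["cycle"] += 1
        else:
            counts["chain"] += 1
    return counts
```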

Motivation for this definition comes from the observation that if two bad single n-vertices are connected, then they must have the same n-label, as follows from condition (3) of Definition 5.4 and the fact that n-labels appearing on good singles and good pairs must be distinct from those appearing on the remaining n-vertices.

Lemma A.18

Suppose a multi-labeling of an l-graph has excess \(\varDelta \) and k single n-vertices, of which \(k'\) are good single and \(k-k'\) are bad single. Then at least \(k-k'-(288D+2)\varDelta \) distinct p-labels are connectors, and there are at most \((192D+1)\varDelta \) connected cycles of bad single n-vertices.

Proof

Suppose the multi-labeling is a \((p,n)\)-multi-labeling. Construct an undirected multi-graph G with vertex set \(\{1,\ldots ,p\}\), where each edge of G has one label in \(\{1,\ldots ,n\}\), as follows: For each n-vertex V in the l-graph and each n-label j of V, if V is preceded and followed by p-vertices with labels \(i_1\) and \(i_2\), then add an edge \(i_1 \sim i_2\) in G with label j. (Thus G has \(\sum _{s=1}^l d_s\) total edges.) Condition (3) of Definition 5.4 implies that, for any j, each vertex of G has even degree in the sub-graph consisting of only edges with label j.

We will sequentially remove edges of G corresponding to good pairs and good singles, until only edges corresponding to bad singles and bad non-singles remain. At any stage of this removal process, let us call a vertex of G “active” if there is at least one edge still adjacent to that vertex. Let us define a “component” as the set of active vertices that may be reached by traversing the remaining edges of G from a particular active vertex. (Hence a component of G is a connected component, in the standard sense, that contains at least two vertices.) We will track the quantity

$$\begin{aligned} M=\#\{\text {active vertices}\}+\#\{\text {distinct edge labels}\}- \#\{\text {components}\}. \end{aligned}$$

Initially, G has m active vertices plus distinct edge labels (where m is the number of distinct n- and p-labels of the l-graph), and one component, so \(M=m-1\). Let us remove the edges of G corresponding to good pairs. If an n-vertex of a good pair has d n-labels, then the good pair corresponds to 2d edges between a single pair of vertices in G whose edge labels do not appear elsewhere in G. Removing these 2d edges removes d distinct edge labels, and if this also changes the connectivity structure of G, then either \(\#\{\text {components}\}\) increases by 1, \(\#\{\text {active vertices}\}\) decreases by 1, or \(\#\{\text {components}\}\) decreases by 1 and \(\#\{\text {active vertices}\}\) decreases by 2. In all cases, M decreases by at most \(d+1\). Then after removing all edges of G corresponding to good pairs, \(M \ge m-1-\left( \frac{\sum _{s=1}^l d_s-k}{2}\right) -\left( \frac{l-k}{2}\right) =k-\varDelta \), as there are at most \(\frac{\sum _{s=1}^l d_s-k}{2}\) distinct n-labels for the good pairs and at most \(\frac{l-k}{2}\) good pairs.

Let us now remove the edges of G corresponding to good singles. Let j be an n-label of a good single, and consider removing the edges of G with label j one at a time. As each vertex of G has even degree in the subgraph of edges with label j, when the first such edge is removed, the number of components and active vertices cannot change. Subsequently, the removal of each additional edge might increase \(\#\{\text {components}\}-\#\{\text {active vertices}\}\) by 1 upon considering the same three cases as above. When the last such edge is removed, there are no longer any edges with label j by the definition of a good single, so \(\#\{\text {distinct edge labels}\}\) decreases by 1. Hence removing all edges with label j decreases M by at most the number of such edges, and \(M \ge k-k'-\varDelta \) after removing the edges corresponding to all \(k'\) good singles.

Call the resulting graph \(G'\). Every vertex of \(G'\) still has even degree in the subgraph of edges with label j, for any j. In particular, every active vertex of \(G'\) has degree at least two. By Definition A.17, \(i \in \{1,\ldots ,p\}\) is a connector if and only if i has degree exactly two in \(G'\), in which case the edges incident to i in \(G'\) must have the same label j, and the n-vertices with label j in the l-graph are the bad singles connected by i. A connected cycle of bad singles corresponds to the edges of a cycle of (necessarily distinct) vertices in \(G'\) with degree exactly two.

The number of distinct edge labels in \(G'\) equals the number of distinct n-labels in the l-graph appearing on bad non-singles (as any n-label appearing on a bad single also appears on some bad non-single). By Lemma A.13 this is at most \(96D\varDelta \). Hence \(\#\{\text {active vertices}\}-\#\{\text {components}\} \ge k-k'-(96D+1)\varDelta \) for \(G'\). The number of total edges in \(G'\) is at most \(k-k'+96D\varDelta \), with \(k-k'\) of them corresponding to bad singles and at most \(96D\varDelta \) corresponding to bad non-singles. Then the total vertex degree of \(G'\) is at most \(2(k-k'+96D\varDelta )\). As each active vertex in \(G'\) has degree at least two, this implies \(\#\{\text {active vertices}\} \le k-k'+96D\varDelta \). Then \(\#\{\text {components}\} \le (192D+1)\varDelta \), so there are at most \((192D+1)\varDelta \) connected cycles of bad singles. Furthermore, suppose there are x connectors (i.e. active vertices with degree exactly two). Every other active vertex has even degree greater than two, hence degree at least four, and there are at least \(k-k'-(96D+1)\varDelta \) active vertices, so \(2x+4(k-k'-(96D+1)\varDelta -x) \le 2(k-k'+96D\varDelta )\), and therefore \(x \ge k-k'-(288D+2)\varDelta \). \(\square \)
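The last two bounds of this proof are linear arithmetic in \(K=k-k'\), D and \(\varDelta \). The sketch below (hypothetical function name) recomputes them from the three inputs used in the proof and checks that they equal \((192D+1)\varDelta \) and \(K-(288D+2)\varDelta \) exactly:

```python
from fractions import Fraction


def lemma_a18_bounds(K, D, delta):
    """Arithmetic at the end of the proof of Lemma A.18, with K = k - k'.

    Inputs from the proof: active - components >= K - (96D+1)*delta,
    at most K + 96D*delta edges, every active vertex has degree >= 2,
    and a non-connector active vertex has degree >= 4.
    """
    active_minus_comp_min = K - (96 * D + 1) * delta
    edges_max = K + 96 * D * delta
    active_max = edges_max  # total degree <= 2E and each active degree >= 2
    components_max = active_max - active_minus_comp_min
    # 2x + 4(A - x) <= 2E with A >= active_minus_comp_min gives x >= 2A - E.
    connectors_min = 2 * active_minus_comp_min - edges_max
    return components_max, connectors_min


for K in range(0, 25):
    for D in range(1, 5):
        for delta in (Fraction(j, 2) for j in range(0, 8)):
            comps, conns = lemma_a18_bounds(K, D, delta)
            assert comps == (192 * D + 1) * delta
            assert conns == K - (288 * D + 2) * delta
```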

Proof

(Proposition 5.17, property (3)) Let C denote a positive constant that may depend on D and that may change from instance to instance. Fix \(\varDelta _0 \ge 0\) and \(\tilde{\mathcal {L}}\). We upper bound the number of ways in which steps (1)–(8) of the inversion procedure may be performed, such that the resulting multi-labeling L is canonical for some \(\mathcal {L} \in \mathcal {S}(\varDelta _0,\tilde{\mathcal {L}})\):

There are at most \(l+1\) ways of performing step (1).

By Lemma A.13, to yield L with excess \(\varDelta _0\), there can be at most \(C\varDelta _0\) bad non-single n-vertices, and hence we must take \(|S| \le C\varDelta _0\) in step (2).

To perform step (3), for each vertex in S, we may first choose the number of n-labels d between 2 and D, and then there are at most \((Dl)^d\) ways of choosing the n-labels for that vertex.

For step (4), suppose \(k'\) good single and \(k-k'\) bad single n-vertices were identified in steps (1) and (2). By Lemma A.18, there are at least \(k-k'-C\varDelta _0\) connectors, and any two connected bad single n-vertices must be given the same n-label. (The p-labels of \(\tilde{\mathcal {L}}\) are known and are preserved in L, so after steps (1) and (2) we know which labels are connectors and which bad singles must be connected in L.) Going through the connectors one by one, each successive connector constrains the n-label of one more bad single n-vertex, unless that connector closes a connected cycle. But as there are at most \(C\varDelta _0\) connected cycles by Lemma A.18, the number of bad single n-vertices that we can freely label is at most \(C\varDelta _0\). Then there are at most \((Dl)^{C\varDelta _0}\) ways to perform step (4).
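The connector-by-connector argument for step (4) is a union-find computation. In the sketch below (hypothetical function name and input format: bad singles numbered from 0, one pair per connector), each connector merges the classes of the two bad singles it connects, unless they are already in the same class, which happens exactly when that connector closes a connected cycle. The number of freely labeled bad singles is the final number of classes, namely (bad singles) minus (connectors) plus (closed cycles).

```python
def free_label_count(num_bad_singles, connections):
    """Number of bad singles whose n-label can be chosen freely in step (4).

    Bad singles are 0..num_bad_singles-1; connections lists one pair (u, v)
    per connector.  Connected bad singles share an n-label, so only one label
    per component is free.  Returns (free labels, number of closed cycles).
    """
    parent = list(range(num_bad_singles))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components, closed_cycles = num_bad_singles, 0
    for u, v in connections:
        ru, rv = find(u), find(v)
        if ru == rv:
            closed_cycles += 1   # this connector closes a connected cycle
        else:
            parent[ru] = rv      # this connector fixes one more n-label
            components -= 1
    return components, closed_cycles


free, cycles = free_label_count(8, [(0, 1), (1, 2), (3, 4), (4, 5), (5, 3), (6, 7), (6, 7)])
assert free == 8 - 7 + cycles  # bad singles - connectors + closed cycles
```

Each free bad single then has at most Dl choices of n-label, which is where the factor \((Dl)^{C\varDelta _0}\) comes from.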

For step (5), recall that the pairs of p-vertices surrounding the two n-vertices of a good pair must have the same pair of p-labels. By Lemma A.12, for all but at most \(C\varDelta _0\) of the n-vertices with empty label, this pairing is uniquely determined, so there are at most \((C\varDelta _0)^{C\varDelta _0}\) ways of performing step (5).

For step (6), there are \((l-\tilde{k}(\tilde{\mathcal {L}}))/2\) good pairs, and for each pair we may choose the number of n-labels d between 2 and D and then one of d! permutations.

Lemma A.16 shows that \(|\mathcal {G}| \le 2\varDelta _0\) for step (7). For each element that we add to the ordered subset of \(\mathcal {G}\), there are at most \(2\varDelta _0\) choices for this element and at most 2l ways of choosing \(W'\) and which half of the cycle to reverse, or we may choose to not add any more elements. We make such a choice at most \(2\varDelta _0\) times, so there are at most \((4\varDelta _0 l+1)^{2\varDelta _0}\) ways of performing step (7).

Finally, there is at most one way to perform step (8), as the labels on the good single and good pair n-vertices are distinct from those on the bad single and bad non-single n-vertices, and each new n-label and p-label has a unique choice to make L canonical.

We may incorporate the product \(\prod _{s=1}^l |a_{d_s}(\mathcal {L})|/ (d_s(\mathcal {L})!)^{1/2}\) on the left side of (16) into the cardinality count by noting that this product contributes \(|a_d|/(d!)^{1/2}\) for each vertex in S having d n-labels, \(a_d^2/d!\) for each good pair having d n-labels per vertex of the pair, and \(|a_1|\) for each of the \(\tilde{k}(\tilde{\mathcal {L}})-|S|\) single vertices in L. Combining the above bounds then yields

$$\begin{aligned}&\mathop {\sum _{\mathcal {L} \in \varphi ^{-1}(\tilde{\mathcal {L}})}} _{\varDelta (\mathcal {L})=\varDelta _0} \prod _{s=1}^l \frac{|a_{d_s}(\mathcal {L})|}{(d_s(\mathcal {L})!)^{1/2}}\\&\quad \le (l+1)\sum _S \left( \sum _{d=2}^D (Dl)^d\frac{|a_d|}{(d!)^{1/2}} \right) ^{|S|}(Dl)^{C\varDelta _0}(C\varDelta _0)^{C\varDelta _0}\\&\quad \quad \left( \sum _{d=2}^D d!\frac{a_d^2}{d!} \right) ^{\frac{l-\tilde{k}(\tilde{\mathcal {L}})}{2}}(4\varDelta _0l+1)^{2\varDelta _0} |a_1|^{\tilde{k}(\tilde{\mathcal {L}})-|S|}\\&\quad \le (l+1)(Cl)^{C\varDelta _0}|a|^{\tilde{k}(\tilde{\mathcal {L}})} (\nu -a^2)^{\frac{l-\tilde{k}(\tilde{\mathcal {L}})}{2}}\sum _S |a|^{-|S|} \left( \sum _{d=2}^D (Dl)^d\frac{|a_d|}{(d!)^{1/2}}\right) ^{|S|}, \end{aligned}$$

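where \(\sum _S\) denotes the sum over all possible sets S selected by step (2), and the second inequality applies \(\varDelta _0 \le Cl\) and \(\sum _{d=2}^D a_d^2=\nu -a^2\). Since \(|S| \le C\varDelta _0\), applying the Cauchy–Schwarz inequality to the sum over d, and then using \(\sum _{d=2}^D \frac{a_d^2}{d!} \le \nu \) and \(\frac{\sqrt{\nu }}{|a|} \ge 1\), gives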

$$\begin{aligned} |a|^{-|S|}\left( \sum _{d=2}^D (Dl)^d \frac{|a_d|}{(d!)^{1/2}}\right) ^{|S|}&\le (Cl)^{C\varDelta _0} |a|^{-|S|}\left( \sum _{d=2}^D \frac{a_d^2}{d!}\right) ^{\frac{|S|}{2}}\\&\le (Cl)^{C\varDelta _0} \left( \frac{\sqrt{\nu }}{|a|}\right) ^{C\varDelta _0}. \end{aligned}$$

The sum is over at most \(l^{C\varDelta _0}\) possible sets S, so this verifies condition (3) of the proposition upon noting that \((Cl)^{C\varDelta _0} \le l^{C_3+C_4\varDelta _0}\) for some constants \(C_3,C_4>0\) and all \(l \ge 2\). \(\square \)
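The Cauchy–Schwarz step above bounds the sum over d by \((Dl)^D\sqrt{D-1}\,\bigl(\sum _{d=2}^D a_d^2/d!\bigr)^{1/2}\); raising this to the power \(|S| \le C\varDelta _0\) turns the constant \((Dl)^D\sqrt{D-1}\) into a factor of the form \((Cl)^{C\varDelta _0}\). A numerical spot-check of the one-vertex inequality on random coefficients (the function name and the random test ranges are illustrative assumptions):

```python
import math
import random


def cauchy_schwarz_step(a, D, l):
    """Return both sides of
        sum_{d=2}^D (Dl)^d |a_d| / sqrt(d!)  <=  (Dl)^D sqrt(D-1) sqrt(sum_{d=2}^D a_d^2 / d!),
    where a maps d -> a_d.  The first step bounds (Dl)^d by (Dl)^D (as Dl >= 1);
    the second is Cauchy-Schwarz over the D - 1 terms d = 2, ..., D."""
    terms = range(2, D + 1)
    lhs = sum((D * l) ** d * abs(a[d]) / math.sqrt(math.factorial(d)) for d in terms)
    rhs = (D * l) ** D * math.sqrt(D - 1) * math.sqrt(
        sum(a[d] ** 2 / math.factorial(d) for d in terms))
    return lhs, rhs


rng = random.Random(2)
for _ in range(500):
    D, l = rng.randint(2, 7), rng.randint(1, 20)
    a = {d: rng.uniform(-3.0, 3.0) for d in range(2, D + 1)}
    lhs, rhs = cauchy_schwarz_step(a, D, l)
    assert lhs <= rhs * (1 + 1e-12)
```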

Moment bound for a deformed GUE matrix

In this appendix, we prove Proposition 5.11. Recall Definition 5.10 of M, W, V, and Z, which implicitly depend on \(\tilde{p}\) and \(\tilde{n}\). Throughout this section, we will use p and n in place of \(\tilde{p}\) and \(\tilde{n}\).

Lemma B.1

Suppose \(n,p \rightarrow \infty \) with \(p/n \rightarrow \gamma \). Then \(\Vert M\Vert \rightarrow \Vert \mu _{a,\nu ,\gamma }\Vert \) almost surely.

Proof

Recall that \(M=\sqrt{\frac{\gamma (\nu -a^2)}{p}}W+\frac{a}{n}V\), where \(V=ZZ^T-D\) and \(D={\text {diag}}(\Vert Z_i\Vert _2^2)\). The empirical spectral distribution of \(\frac{1}{n}ZZ^T\) converges weakly almost surely to \(\mu _{\mathrm {MP},\gamma }\). By a chi-squared tail bound, a union bound over \(i=1,\ldots ,p\), and the Borel–Cantelli lemma, \(\Vert \frac{1}{n}D-{\text {Id}}\Vert \rightarrow 0\) almost surely, so the empirical spectral distribution of \(\frac{a}{n}V\) converges weakly almost surely to \(a(\mu _{\mathrm {MP},\gamma }-1)\). Furthermore, the maximal distance between an eigenvalue of \(\frac{a}{n}V\) and the support of \(a(\mu _{\mathrm {MP},\gamma }-1)\) converges to 0 almost surely by the results of [2, 63].

Let \(V=O\varLambda O^T\) where O is the real orthogonal matrix that diagonalizes V. Then the spectrum of M is the same as that of \(\sqrt{\frac{\gamma (\nu -a^2)}{p}}O^TWO +\frac{a}{n}\varLambda \), and \(O^TWO\) is still distributed as the GUE. Conditional on V, the above arguments and Proposition 8.1 of [16] imply \(\Vert \sqrt{\frac{\gamma (\nu -a^2)}{p}}O^TWO+\frac{a}{n}\varLambda \Vert \rightarrow \Vert \mu _{a,\nu ,\gamma }\Vert \) almost surely. As this convergence holds almost surely in V, it holds unconditionally as well. \(\square \)
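A quick pure-Python Monte Carlo illustration of the chi-squared step in this proof: for Z with i.i.d. standard Gaussian entries, \(\Vert \frac{1}{n}D-{\text {Id}}\Vert =\max _i |\Vert Z_i\Vert _2^2/n-1|\) shrinks as n grows. The ratio \(p/n=1/4\), the sample sizes, and the seed are arbitrary choices for illustration; this does not check the spectral-norm limit itself.

```python
import random


def max_norm_deviation(p, n, seed=0):
    """max_i | ||Z_i||_2^2 / n - 1 | for a p x n matrix Z with i.i.d. N(0,1) entries.

    This is the operator norm of D/n - Id, where D = diag(||Z_i||_2^2).
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(p):
        row_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        worst = max(worst, abs(row_sq / n - 1.0))
    return worst


for n in (250, 1000, 2000):
    print(n, round(max_norm_deviation(n // 4, n), 3))
```

The printed deviations should decrease roughly like \(\sqrt{\log p/n}\), in line with the union bound over rows.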

Lemma B.2

Suppose \(n,p \rightarrow \infty \) with \(p/n \rightarrow \gamma \), let \(l:=l(n)\) be such that \(l(n)/n \rightarrow 0\), and let \(\mathcal {B}_n\) be any event. Then there exist positive constants \(C:=C_{a,\nu ,\gamma }\) and \(c:=c_{a,\nu ,\gamma }\) such that \(\mathbb {E}[\Vert M\Vert ^l\mathbb {1}\{\mathcal {B}_n\}] \le C^l\mathbb {P}[\mathcal {B}_n]+e^{-cn}\) for all large n.

Proof

Note

$$\begin{aligned} \Vert M\Vert \le \sqrt{\frac{\gamma (\nu -a^2)}{p}}\Vert W\Vert +\frac{|a|}{n}\Vert ZZ^T\Vert +\frac{|a|}{n}\max _{1 \le i \le p} \Vert Z_i\Vert _2^2. \end{aligned}$$

Applying standard tail bounds (e.g. Corollary 2.3.5 of [55], Corollary 5.35 of [57], and Lemma 1 of [38]), there exist constants \(C,\varepsilon >0\) depending on \(a,\nu ,\gamma \) such that, for all \(t \ge C\) and sufficiently large n, \(\mathbb {P}[\Vert M\Vert >t] \le e^{-\varepsilon tn}\). Then we may write

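$$\begin{aligned} \mathbb {E}\left[ \Vert M\Vert ^l\mathbb {1}\{\mathcal {B}_n\}\right]&=\mathbb {E}\left[ \Vert M\Vert ^l\mathbb {1}\{\mathcal {B}_n\}\mathbb {1}\{\Vert M\Vert \le C\}\right] +\mathbb {E}\left[ \Vert M\Vert ^l\mathbb {1}\{\mathcal {B}_n\}\mathbb {1}\{\Vert M\Vert>C\}\right] \\&\le C^l\mathbb {P}[\mathcal {B}_n]+C^l\mathbb {P}[\Vert M\Vert >C]+\int _{C^l}^\infty \mathbb {P}\left[ \Vert M\Vert ^l>t\right] dt\\&=C^l\mathbb {P}[\mathcal {B}_n]+C^l\mathbb {P}[\Vert M\Vert >C]+\int _C^\infty \mathbb {P}[\Vert M\Vert >s]\cdot ls^{l-1}ds\\&\le C^l\mathbb {P}[\mathcal {B}_n]+C^le^{-\varepsilon Cn}+l\int _C^\infty e^{-\varepsilon sn+(l-1)\log s}ds\\&\le C^l\mathbb {P}[\mathcal {B}_n]+C^le^{-\varepsilon Cn}+l\int _C^\infty e^{-(\varepsilon n-l)s}ds\\&=C^l\mathbb {P}[\mathcal {B}_n]+C^le^{-\varepsilon Cn}+\frac{l}{\varepsilon n-l}e^{-(\varepsilon n-l)C} \end{aligned}$$

for all large n. Here the second line uses \(\mathbb {E}[Y\mathbb {1}\{Y>c\}]=c\,\mathbb {P}[Y>c]+\int _c^\infty \mathbb {P}[Y>t]dt\) for \(Y=\Vert M\Vert ^l \ge 0\) and \(c=C^l\), and the fifth line uses \((l-1)\log s \le ls\) for \(s>0\). As \(l=o(n)\), both \(C^le^{-\varepsilon Cn}\) and \(\frac{l}{\varepsilon n-l}e^{-(\varepsilon n-l)C}\) are at most \(e^{-3C\varepsilon n/4}\) for all large n, and the result follows upon setting \(c=C\varepsilon /2\). \(\square \)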
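The proof of Lemma B.2 rests on the tail-integral (layer-cake) formula \(\mathbb {E}[Y\mathbb {1}\{Y>c\}]=c\,\mathbb {P}[Y>c]+\int _c^\infty \mathbb {P}[Y>t]dt\) for a nonnegative random variable Y, combined with the change of variables \(t=s^l\). The sketch below checks both numerically with a hypothetical stand-in \(X \sim \mathrm {Exp}(1)\) in place of \(\Vert M\Vert \), with \(l=3\) and \(C=1\), where \(\mathbb {E}[X^3\mathbb {1}\{X>1\}]=16/e\) exactly (dropping the \(c\,\mathbb {P}[Y>c]\) term would give \(15/e\) instead).

```python
import math


def simpson(f, a, b, panels=20000):
    """Composite Simpson's rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def truncated_moment_two_ways(l=3, C=1.0, upper=60.0):
    """For X ~ Exp(1) (density and tail both e^{-s}), compute E[X^l 1{X > C}]:
    directly, and as C^l P[X > C] + int_C^inf P[X > s] l s^(l-1) ds.
    The integrals are truncated at `upper`, where the integrands are negligible."""
    direct = simpson(lambda s: s ** l * math.exp(-s), C, upper)
    layer_cake = C ** l * math.exp(-C) + simpson(
        lambda s: math.exp(-s) * l * s ** (l - 1), C, upper)
    return direct, layer_cake


direct, layer_cake = truncated_moment_two_ways()
print(direct, layer_cake, 16 / math.e)  # the three values agree closely
```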

Lemma B.3

Suppose \(n,p \rightarrow \infty \) with \(p/n \rightarrow \gamma \). Then \(\mathbb {E}[\Vert M\Vert ] \rightarrow \Vert \mu _{a,\nu ,\gamma }\Vert \).

Proof

Lemma B.1 and Fatou’s lemma imply \(\liminf \mathbb {E}[\Vert M\Vert ] \ge \Vert \mu _{a,\nu ,\gamma }\Vert \). For any \(\varepsilon >0\), let \(\mathcal {B}_n=\left\{ \Vert M\Vert >\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon \right\} \). Then

$$\begin{aligned} \mathbb {E}[\Vert M\Vert ]=\mathbb {E}[\Vert M\Vert \mathbb {1}\{\mathcal {B}_n^C\}] +\mathbb {E}[\Vert M\Vert \mathbb {1}\{\mathcal {B}_n\}] \le \Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon +\mathbb {E}[\Vert M\Vert \mathbb {1}\{\mathcal {B}_n\}]. \end{aligned}$$

Lemma B.1 implies that \(\mathbb {P}[\mathcal {B}_n] \rightarrow 0\), so Lemma B.2 (with \(l=1\)) implies that \(\mathbb {E}[\Vert M\Vert \mathbb {1}\{\mathcal {B}_n\}] \rightarrow 0\) as well. Then \(\mathbb {E}[\Vert M\Vert ] \le \Vert \mu _{a,\nu ,\gamma }\Vert +2\varepsilon \) for all large n, and the result follows by taking \(\varepsilon \rightarrow 0\). \(\square \)

Lemma B.4

Suppose \(F:\mathbb {R}^d \rightarrow \mathbb {R}\) is L-Lipschitz on a set \(G \subseteq \mathbb {R}^d\), i.e. \(|F(x)-F(y)| \le L\Vert x-y\Vert _2\) for all \(x,y \in G\). Let \(\xi \sim N(0,I_d)\). Then there exists a function \(\tilde{F}:\mathbb {R}^d \rightarrow \mathbb {R}\) such that \(\tilde{F}(x)=F(x)\) for all \(x \in G\), \(|\tilde{F}(x)-\tilde{F}(y)| \le L\Vert x-y\Vert _2\) for all \(x,y \in \mathbb {R}^d\), and, for all \(\varDelta >0\),

$$\begin{aligned} \mathbb {P}[F(\xi )-\mathbb {E}F(\xi ) \ge \varDelta +|\mathbb {E}F(\xi )-\mathbb {E}\tilde{F}(\xi )| \text { and } \xi \in G] \le e^{-\frac{\varDelta ^2}{2L^2}}. \end{aligned}$$

Proof

Let \(\tilde{F}(x)=\inf _{x' \in G} (F(x')+L\Vert x-x'\Vert _2)\). If \(x \in G\), then \(F(x) \le F(x')+L\Vert x-x'\Vert _2\) for all \(x' \in G\), with equality at \(x'=x\), so \(\tilde{F}(x)=F(x)\). Also, for any \(x,y \in \mathbb {R}^d\) and \(\varepsilon >0\), there exists \(x' \in G\) such that \(\tilde{F}(x) \ge F(x')+L\Vert x-x'\Vert _2-\varepsilon \). By definition, \(\tilde{F}(y) \le F(x')+L\Vert y-x'\Vert _2\), so \(\tilde{F}(y)-\tilde{F}(x) \le L\Vert y-x'\Vert _2-L\Vert x-x'\Vert _2+\varepsilon \le L\Vert x-y\Vert _2+\varepsilon \). Similarly, \(\tilde{F}(x)-\tilde{F}(y) \le L\Vert x-y\Vert _2+\varepsilon \). This holds for all \(\varepsilon >0\), so \(|\tilde{F}(x)-\tilde{F}(y)| \le L\Vert x-y\Vert _2\). Finally, applying Gaussian concentration of measure to the L-Lipschitz function \(\tilde{F}\),

$$\begin{aligned}&\mathbb {P}[F(\xi )-\mathbb {E}F(\xi ) \ge \varDelta +|\mathbb {E}F(\xi )-\mathbb {E}\tilde{F}(\xi )| \text { and } \xi \in G]\\&\quad = \mathbb {P}[\tilde{F}(\xi ) \ge \varDelta +|\mathbb {E}F(\xi )-\mathbb {E}\tilde{F}(\xi )|+\mathbb {E}F(\xi ) \text { and } \xi \in G]\\&\quad \le \mathbb {P}[\tilde{F}(\xi ) \ge \varDelta +\mathbb {E}\tilde{F}(\xi )] \le e^{-\frac{\varDelta ^2}{2L^2}}. \end{aligned}$$

\(\square \)
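The extension used in this proof is the McShane (infimal-convolution) extension. The minimal numerical sketch below assumes a finite set G, so that the infimum becomes a minimum. It checks the two properties the proof establishes: \(\tilde{F}=F\) on G, and \(\tilde{F}\) is L-Lipschitz on all of \(\mathbb {R}^d\). The test function, the random points and the helper name `mcshane_extension` are illustrative choices, not part of the lemma.

```python
import numpy as np

def mcshane_extension(G, F_vals, L):
    """Return x -> min over rows x' of G of F(x') + L * ||x - x'||_2 (finite G)."""
    def F_tilde(x):
        return float(np.min(F_vals + L * np.linalg.norm(G - x, axis=1)))
    return F_tilde

rng = np.random.default_rng(0)
d = 3
G = rng.normal(size=(40, d))                       # finite stand-in for the set G
F_vals = np.sin(G[:, 0]) + 0.5 * np.cos(G[:, 1])   # values of F on G

# Smallest L for which F is L-Lipschitz on G.
diffs = np.abs(F_vals[:, None] - F_vals[None, :])
dists = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=2)
off_diag = dists > 0
L = float(np.max(diffs[off_diag] / dists[off_diag]))

F_tilde = mcshane_extension(G, F_vals, L)

# Property 1: the extension agrees with F on G.
agree = all(abs(F_tilde(G[i]) - F_vals[i]) < 1e-12 for i in range(len(G)))

# Property 2: the extension is L-Lipschitz on R^d (checked on random pairs).
X = rng.normal(scale=3.0, size=(200, d))
Y = rng.normal(scale=3.0, size=(200, d))
lipschitz_ok = all(
    abs(F_tilde(x) - F_tilde(y)) <= L * np.linalg.norm(x - y) + 1e-9
    for x, y in zip(X, Y)
)
```

The Lipschitz property holds because \(\tilde{F}\) is a pointwise minimum of the L-Lipschitz functions \(x \mapsto F(x')+L\Vert x-x'\Vert _2\). The same reasoning carries over to infinite G with an infimum.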

Lemma B.5

Suppose \(n,p \rightarrow \infty \) with \(p/n \rightarrow \gamma \), and let \(\varepsilon >0\). Then there exist \(c:=c_{a,\nu ,\gamma }>0\) and \(N:=N_{a,\nu ,\gamma ,\varepsilon }>0\) and a set \(G:=G_{n,p} \subset \mathbb {R}^{p \times n}\) with \(\mathbb {P}[Z \in G] \ge 1-2e^{-\frac{n}{2}}\), such that for all \(t>\varepsilon \) and \(n>N\),

$$\begin{aligned} \mathbb {P}[\Vert M\Vert \ge \Vert \mu _{a,\nu ,\gamma }\Vert +t \text { and } Z \in G] \le e^{-cnt^2}. \end{aligned}$$

Proof

Recall \(M=\sqrt{\frac{\gamma (\nu -a^2)}{p}}W+\frac{a}{n}(ZZ^T -{\text {diag}}(\Vert Z_i\Vert _2^2))\). Denote

$$\begin{aligned} \mathcal {W}=\big ((w_{ii})_{1 \le i \le p},(\sqrt{2} {\text {Re}}w_{ij},\sqrt{2} {\text {Im}}w_{ij})_{1 \le i<j \le p}\big ) \in \mathbb {R}^{p^2}, \end{aligned}$$

so that the entries of \(\mathcal {W}\) and Z are IID \(\mathcal {N}(0,1)\). Define \(f:\mathbb {R}^{p^2+np} \rightarrow \mathbb {R}\) and \(f_v:\mathbb {R}^{p^2+np} \rightarrow \mathbb {R}\) for \(v \in \mathbb {C}^p\) by \(f(\mathcal {W},Z)=\Vert M\Vert \) and \(f_v(\mathcal {W},Z)=v^*Mv\), so that \(f(\mathcal {W},Z)=\sup _{v \in \mathbb {C}^p:\Vert v\Vert _2=1} |f_v(\mathcal {W},Z)|\) since M is Hermitian. Denoting by \(Z_i\) the ith row of Z, elementary calculations give

$$\begin{aligned}&\frac{\partial f_v(\mathcal {W},Z)}{\partial w_{ii}} =\sqrt{\frac{\gamma (\nu -a^2)}{p}}|v_i|^2,\quad \frac{\partial f_v(\mathcal {W},Z)}{\partial (\sqrt{2}{\text {Re}}w_{ij})} =\sqrt{\frac{2\gamma (\nu -a^2)}{p}}{\text {Re}}(\overline{v_i}v_j),\\&\frac{\partial f_v(\mathcal {W},Z)}{\partial (\sqrt{2}{\text {Im}}w_{ij})} =-\sqrt{\frac{2\gamma (\nu -a^2)}{p}}{\text {Im}}(\overline{v_i}v_j),\quad \nabla _{Z_i} f_v(\mathcal {W},Z)=\frac{2a}{n}\mathop {\sum _{j=1}^p}_{j \ne i} {\text {Re}}(\overline{v_i}v_j)Z_j. \end{aligned}$$

Then, for any \(v \in \mathbb {C}^p\) such that \(\Vert v\Vert _2=1\),

$$\begin{aligned}&\Vert \nabla f_v(\mathcal {W},Z)\Vert _2^2\\&\quad =\frac{\gamma (\nu -a^2)}{p}\left( \sum _{i=1}^p |v_i|^4+2\sum _{1 \le i<j \le p} |\overline{v_i}v_j|^2\right) +\frac{4a^2}{n^2}\sum _{i=1}^p \left\| \mathop {\sum _{j=1}^p}_{j \ne i} {\text {Re}}(\overline{v_i}v_j)Z_j\right\| _2^2\\&\quad \le \frac{\gamma (\nu -a^2)}{p}\left( \sum _{i=1}^p |v_i|^2\right) ^2 +\frac{4a^2}{n^2} \sum _{i=1}^p |v_i|^2 \Vert Z\Vert ^2\Vert v\Vert _2^2\\&\quad =\frac{\gamma (\nu -a^2)}{p}+\frac{4a^2\Vert Z\Vert ^2}{n^2}. \end{aligned}$$
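The partial derivatives and the bound on \(\Vert \nabla f_v\Vert _2^2\) above can be checked by finite differences on a small instance. The following Python sketch assumes real Z, a Hermitian W, a unit vector \(v \in \mathbb {C}^p\), and arbitrary illustrative values of \(p,n,a,\nu \) with \(\nu >a^2\); the helper names are hypothetical. Because \(f_v\) is linear in W and linear in each single entry of Z (the diagonal of \(ZZ^T\) is removed), central differences are exact up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 8
a, nu = 0.7, 1.0                        # illustrative values with nu > a^2 (assumed)
gamma = p / n
s = np.sqrt(gamma * (nu - a**2) / p)    # coefficient of W in M

def build_M(W, Z):
    """M = s W + (a/n) (Z Z^T - diag(||Z_i||_2^2))."""
    ZZt = Z @ Z.T
    return s * W + (a / n) * (ZZt - np.diag(np.diag(ZZt)))

def f_v(W, Z, v):
    """f_v = v^* M v, which is real because M is Hermitian."""
    return float(np.real(np.conj(v) @ build_M(W, Z) @ v))

# Any Hermitian W works for a gradient check; Z is real and v is a unit vector in C^p.
A = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
W = (A + A.conj().T) / 2
Z = rng.normal(size=(p, n))
v = rng.normal(size=p) + 1j * rng.normal(size=p)
v /= np.linalg.norm(v)
cross = np.outer(np.conj(v), v)         # cross[i, j] = conj(v_i) v_j

def grad_Z_formula(Z, v):
    """Row i equals (2a/n) sum_{j != i} Re(conj(v_i) v_j) Z_j."""
    R = np.real(np.outer(np.conj(v), v))
    np.fill_diagonal(R, 0.0)
    return (2 * a / n) * R @ Z

def grad_Z_numeric(W, Z, v, h=1e-5):
    """Central differences of f_v in each entry of Z."""
    grad = np.zeros_like(Z)
    for i in range(p):
        for k in range(n):
            E = np.zeros_like(Z)
            E[i, k] = h
            grad[i, k] = (f_v(W, Z + E, v) - f_v(W, Z - E, v)) / (2 * h)
    return grad

def dW_numeric(i, j, part, h=1e-5):
    """Derivative along sqrt(2) Re w_ij (part="re") or sqrt(2) Im w_ij (part="im"), i < j."""
    step = h / np.sqrt(2)               # moving sqrt(2) Re w_ij by h moves Re w_ij by h / sqrt(2)
    E = np.zeros((p, p), dtype=complex)
    E[i, j] = step if part == "re" else 1j * step
    E[j, i] = np.conj(E[i, j])          # keep W + E Hermitian
    return (f_v(W + E, Z, v) - f_v(W - E, Z, v)) / (2 * h)

# Analytic gradient with respect to the real coordinates of the vector script-W.
iu = np.triu_indices(p, k=1)
grad_W = np.concatenate([
    s * np.abs(v) ** 2,
    np.sqrt(2) * s * np.real(cross[iu]),
    -np.sqrt(2) * s * np.imag(cross[iu]),
])
grad_sq = np.sum(grad_W**2) + np.sum(grad_Z_formula(Z, v) ** 2)
bound = gamma * (nu - a**2) / p + 4 * a**2 * np.linalg.norm(Z, 2) ** 2 / n**2
```

Here `np.linalg.norm(Z, 2)` is the largest singular value, i.e. the operator norm \(\Vert Z\Vert \). The \(\mathcal {W}\)-part of the squared gradient equals \(\gamma (\nu -a^2)/p\) exactly when \(\Vert v\Vert _2=1\), as in the display.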

Take \(G=\{Z \in \mathbb {R}^{p \times n}:\Vert Z\Vert \le 2\sqrt{n}+\sqrt{p}\}\). Then by Corollary 5.35 of [57], \(\mathbb {P}[Z \notin G] \le 2e^{-\frac{n}{2}}\). Since G is a ball in operator norm, \(\mathbb {R}^{p^2} \times G\) is convex, so the above gradient bound and the mean value theorem imply that \(f_v(\mathcal {W},Z)\) is L-Lipschitz on \(\mathbb {R}^{p^2} \times G\) for \(L=\big (\frac{\gamma (\nu -a^2)}{p}+\frac{4a^2(2\sqrt{n}+\sqrt{p})^2}{n^2}\big )^{1/2}=O(n^{-1/2})\). Then

$$\begin{aligned}&f(\mathcal {W},Z)-f(\mathcal {W}',Z')\\&\quad \le \sup _{v \in \mathbb {C}^p:\Vert v\Vert _2=1} \big (|f_v(\mathcal {W},Z)|-|f_v(\mathcal {W}',Z')|\big )\\&\quad \le \sup _{v \in \mathbb {C}^p:\Vert v\Vert _2=1} \big |f_v(\mathcal {W},Z)-f_v(\mathcal {W}',Z') \big | \le L\Vert (\mathcal {W},Z)-(\mathcal {W}',Z')\Vert _2 \end{aligned}$$

for all \(\mathcal {W},\mathcal {W}' \in \mathbb {R}^{p^2}\) and \(Z,Z' \in G\). Exchanging the roles of \((\mathcal {W},Z)\) and \((\mathcal {W}',Z')\), f is also L-Lipschitz on \(\mathbb {R}^{p^2} \times G\).
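Evaluating the gradient bound at \(\Vert Z\Vert =2\sqrt{n}+\sqrt{p}\), the radius of G, gives the explicit constant \(L=\big (\frac{\gamma (\nu -a^2)}{p}+\frac{4a^2(2\sqrt{n}+\sqrt{p})^2}{n^2}\big )^{1/2}\). The Python sketch below uses illustrative parameter values (assumptions, not from the text). It shows that when \(p=\gamma n\) exactly, \(\sqrt{n}L=\big (\nu -a^2+4a^2(2+\sqrt{\gamma })^2\big )^{1/2}\) for every n, so \(L=O(n^{-1/2})\). It also runs a small Monte Carlo experiment in which the event \(Z \notin G\), of probability at most \(2e^{-n/2}\), is not observed.

```python
import numpy as np

a, nu, gamma = 0.7, 1.0, 0.5   # illustrative values (assumed), with nu > a^2

def lipschitz_constant(n, p):
    """L^2 = gamma (nu - a^2) / p + 4 a^2 r^2 / n^2, where r = 2 sqrt(n) + sqrt(p) is the radius of G."""
    r = 2 * np.sqrt(n) + np.sqrt(p)
    return np.sqrt(gamma * (nu - a**2) / p + 4 * a**2 * r**2 / n**2)

# With p = gamma * n exactly, sqrt(n) * L does not depend on n.
limit = np.sqrt(nu - a**2 + 4 * a**2 * (2 + np.sqrt(gamma)) ** 2)
scaled = {n: np.sqrt(n) * lipschitz_constant(n, int(gamma * n)) for n in (100, 1000, 10000)}

# Monte Carlo: how often does a p x n standard Gaussian matrix fall outside G?
rng = np.random.default_rng(2)
n, p = 200, 100
r = 2 * np.sqrt(n) + np.sqrt(p)
trials = 200
exceed = sum(int(np.linalg.norm(rng.normal(size=(p, n)), 2) > r) for _ in range(trials))
```

For these sizes \(\Vert Z\Vert \) concentrates near \(\sqrt{n}+\sqrt{p}\approx 24.1\), well inside the radius \(2\sqrt{n}+\sqrt{p}\approx 38.3\). The extra \(\sqrt{n}\) in the radius is what yields the \(2e^{-n/2}\) bound.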

Let \(\tilde{f}:\mathbb {R}^{p^2+np} \rightarrow \mathbb {R}\) be the L-Lipschitz extension of f on \(\mathbb {R}^{p^2} \times G\) given by Lemma B.4. Note that

$$\begin{aligned} |\mathbb {E}f(\mathcal {W},Z)-\mathbb {E}\tilde{f}(\mathcal {W},Z)|&=|\mathbb {E}[(f(\mathcal {W},Z)-\tilde{f}(\mathcal {W},Z))\mathbb {1}\{Z \notin G\}]|\\&\le \mathbb {E}|f(\mathcal {W},Z)\mathbb {1}\{Z \notin G\}| +\mathbb {E}|\tilde{f}(\mathcal {W},Z)\mathbb {1}\{Z \notin G\}|. \end{aligned}$$

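Lemma B.2 (with \(l=1\)) implies \(\mathbb {E}|f(\mathcal {W},Z)\mathbb {1}\{Z \notin G\}|=\mathbb {E}[\Vert M\Vert \mathbb {1}\{Z \notin G\}]=o(1)\). As \(\tilde{f}\) is L-Lipschitz and agrees with f on \(\mathbb {R}^{p^2} \times G\), which contains \((0,0)\), and as \(f(0,0)=0\) (M vanishes when \(\mathcal {W}=0\) and \(Z=0\)),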

$$\begin{aligned} |\tilde{f}(\mathcal {W},Z)| \le |\tilde{f}(0,0)|+L\Vert (\mathcal {W},Z)\Vert _2 =|f(0,0)|+L\Vert (\mathcal {W},Z)\Vert _2=L\Vert (\mathcal {W},Z)\Vert _2. \end{aligned}$$

Let

$$\begin{aligned} \mathcal {A}_n=\left\{ \left\| (\mathcal {W},Z)\right\| _2 \le \sqrt{2(p^2+np)} \right\} . \end{aligned}$$

As \(\Vert (\mathcal {W},Z)\Vert _2^2\) is chi-squared distributed with \(p^2+np\) degrees of freedom, the standard bound \(\mathbb {P}[\Vert (\mathcal {W},Z)\Vert _2^2-(p^2+np) \ge 2\sqrt{(p^2+np)x}+2x] \le e^{-x}\), applied with \(x=t/8\) and using \(2\sqrt{(p^2+np)t/8}+t/4 \le t\) for \(t \ge p^2+np\), gives \(\mathbb {P}\left[ \left\| (\mathcal {W},Z)\right\| _2^2 \ge p^2+np+t\right] \le e^{-\frac{t}{8}}\) for all \(t \ge p^2+np\). Since \(\mathcal {A}_n^C=\{\Vert (\mathcal {W},Z)\Vert _2^2>2(p^2+np)\}\),

$$\begin{aligned} \mathbb {E}\left[ \left\| (\mathcal {W},Z)\right\| _2^2 \mathbb {1}\left\{ \mathcal {A}_n^C\right\} \right]&=2(p^2+np)\,\mathbb {P}\left[ \mathcal {A}_n^C\right] +\int _{p^2+np}^\infty \mathbb {P}\left[ \left\| (\mathcal {W},Z)\right\| _2^2 > p^2+np+t\right] dt\\&\le 2(p^2+np)e^{-\frac{p^2+np}{8}}+\int _{p^2+np}^\infty e^{-\frac{t}{8}}dt\\&=\big (2(p^2+np)+8\big )e^{-\frac{p^2+np}{8}}. \end{aligned}$$

This implies

$$\begin{aligned}&\mathbb {E}|\tilde{f}(\mathcal {W},Z)\mathbb {1}\{Z \notin G\}|\\&\le \mathbb {E}[|\tilde{f}(\mathcal {W},Z)|\mathbb {1}\{Z \notin G\}\mathbb {1}\{\mathcal {A}_n\}] +\mathbb {E}[|\tilde{f}(\mathcal {W},Z)|\mathbb {1}\{Z \notin G\}\mathbb {1}\{\mathcal {A}_n^C\}]\\&\le L\sqrt{2(p^2+np)}\mathbb {P}[Z \notin G] +L\mathbb {E}\left[ \left\| (\mathcal {W},Z)\right\| _2^2\mathbb {1}\{\mathcal {A}_n^C\}\right] ^{1/2} =o(1). \end{aligned}$$

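Then \(|\mathbb {E}f(\mathcal {W},Z)-\mathbb {E}\tilde{f}(\mathcal {W},Z)|=o(1)\). Also, \(\mathbb {E}\Vert M_{n,p}\Vert \rightarrow \Vert \mu _{a,\nu ,\gamma }\Vert \) by Lemma B.3, so \(\mathbb {E}\Vert M_{n,p}\Vert +|\mathbb {E}f(\mathcal {W},Z)-\mathbb {E}\tilde{f}(\mathcal {W},Z)| \le \Vert \mu _{a,\nu ,\gamma }\Vert +\frac{\varepsilon }{2}\) for all \(n>N_{a,\nu ,\gamma ,\varepsilon }\). Applying Lemma B.4 with \(\varDelta =t-\frac{\varepsilon }{2}>0\), we obtain, for all \(t>\varepsilon \) and all \(n>N_{a,\nu ,\gamma ,\varepsilon }\) (with N independent of t),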

$$\begin{aligned}&\mathbb {P}[\Vert M_{n,p}\Vert \ge \Vert \mu _{a,\nu ,\gamma }\Vert +t \text { and } Z \in G]\\&\quad \le \mathbb {P}\left[ \Vert M_{n,p}\Vert -\mathbb {E}\Vert M_{n,p}\Vert \ge t-\tfrac{\varepsilon }{2}+|\mathbb {E}f(\mathcal {W},Z) -\mathbb {E}\tilde{f}(\mathcal {W},Z)| \text { and } Z \in G \right] \\&\quad \le e^{-\frac{(t-\varepsilon /2)^2}{2L^2}}\le e^{-\frac{t^2}{8L^2}}. \end{aligned}$$

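The last inequality uses \(t-\frac{\varepsilon }{2} \ge \frac{t}{2}\) for \(t>\varepsilon \). Since \(L=O(n^{-1/2})\), there is \(c>0\) depending only on \(a,\nu ,\gamma \) such that \(\frac{1}{8L^2} \ge cn\) for all large n, and the result follows. \(\square \)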

Proof

(Proposition 5.11) Let \(c>0\) and \(G \subset \mathbb {R}^{p \times n}\) be as in Lemma B.5. Then, for any \(\varepsilon >0\),

$$\begin{aligned}&\mathbb {E}[\Vert M\Vert ^l\mathbb {1}\{Z \in G\}]\\&=\int _0^\infty \mathbb {P}\left[ \Vert M\Vert ^l \ge t \text { and } Z \in G \right] dt\\&\le (\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon )^l+\int _{(\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon )^l}^\infty \mathbb {P}\left[ \Vert M\Vert ^l \ge t \text { and } Z \in G \right] dt\\&=(\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon )^l+\int _{\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon }^\infty \mathbb {P}[\Vert M\Vert \ge s \text { and } Z \in G] \cdot ls^{l-1}ds\\&\le (\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon )^l+l\int _\varepsilon ^\infty e^{-cns^2} (\Vert \mu _{a,\nu ,\gamma }\Vert +s)^{l-1}ds \end{aligned}$$

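for all sufficiently large n, where the last step substitutes \(s \mapsto \Vert \mu _{a,\nu ,\gamma }\Vert +s\) and applies Lemma B.5. Using \((\Vert \mu _{a,\nu ,\gamma }\Vert +s)^{l-1} \le e^{l(\Vert \mu _{a,\nu ,\gamma }\Vert +s)}\) and completing the square,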

$$\begin{aligned} l\int _\varepsilon ^\infty e^{-cns^2}(\Vert \mu _{a,\nu ,\gamma }\Vert +s)^{l-1}ds&\le l\int _\varepsilon ^\infty e^{-cns^2+l(\Vert \mu _{a,\nu ,\gamma }\Vert +s)}ds\\&=le^{l\Vert \mu _{a,\nu ,\gamma }\Vert +\frac{l^2}{4cn}}\int _\varepsilon ^\infty e^{-cn\left( s-\frac{l}{2cn}\right) ^2}ds\\&=\frac{le^{l\Vert \mu _{a,\nu ,\gamma }\Vert +\frac{l^2}{4cn}}}{\sqrt{2cn}} \int _{\sqrt{2cn}\left( \varepsilon -\frac{l}{2cn}\right) }^\infty e^{-\frac{t^2}{2}}dt\\&\sim \frac{le^{l\Vert \mu _{a,\nu ,\gamma }\Vert +\frac{l^2}{4cn}}}{2cn\left( \varepsilon -\frac{l}{2cn}\right) } e^{-cn\left( \varepsilon -\frac{l}{2cn}\right) ^2} \rightarrow 0 \end{aligned}$$

for \(l=O(\log n)\), so \(\mathbb {E}[\Vert M\Vert ^l\mathbb {1}\{Z \in G\}] \le (\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon )^l+o(1)\). On the other hand, \(\mathbb {P}[Z \notin G] \le 2e^{-\frac{n}{2}}\) by Lemma B.5, so Lemma B.2 implies \(\mathbb {E}[\Vert M\Vert ^l\mathbb {1}\{Z \notin G\}]=o(1)\) for \(l=O(\log n)\). Hence \(\mathbb {E}[\Vert M\Vert ^l] \le (\Vert \mu _{a,\nu ,\gamma }\Vert +\varepsilon )^l+o(1)\), and taking \(\varepsilon \rightarrow 0\) concludes the proof. \(\square \)
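The Gaussian-integral estimate in this proof can be sanity-checked numerically. The Python sketch below compares a Simpson-rule value of \(l\int _\varepsilon ^\infty e^{-cns^2}(\Vert \mu _{a,\nu ,\gamma }\Vert +s)^{l-1}ds\) with the completed-square expression \(\frac{l}{\sqrt{2cn}}e^{l\Vert \mu _{a,\nu ,\gamma }\Vert +\frac{l^2}{4cn}}\int _{\sqrt{2cn}(\varepsilon -\frac{l}{2cn})}^\infty e^{-\frac{t^2}{2}}dt\) for \(l=\lceil \log n\rceil \). The Gaussian tail is written as \(\sqrt{\pi /2}\,\mathrm {erfc}(x/\sqrt{2})\). The values of c, \(\Vert \mu _{a,\nu ,\gamma }\Vert \) and \(\varepsilon \) are hypothetical placeholders. The integral is truncated at \(s=\varepsilon +1\), beyond which the integrand is negligible for these values.

```python
import math
import numpy as np

c, m, eps = 0.5, 2.0, 0.5   # hypothetical values of c, ||mu||, epsilon

def tail_integral(n, l, upper=1.0, N=100_000):
    """Composite Simpson value of l * int_eps^{eps+upper} exp(-c n s^2) (m + s)^(l-1) ds (N even)."""
    s = np.linspace(eps, eps + upper, N + 1)
    y = np.exp(-c * n * s**2) * (m + s) ** (l - 1)
    h = upper / N
    return float(l * h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum()))

def gaussian_closed_form(n, l):
    """l e^{l m + l^2/(4cn)} / sqrt(2cn) * int_x^inf e^{-t^2/2} dt with x = sqrt(2cn)(eps - l/(2cn))."""
    x = math.sqrt(2 * c * n) * (eps - l / (2 * c * n))
    gauss_tail = math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))
    return l * math.exp(l * m + l**2 / (4 * c * n)) / math.sqrt(2 * c * n) * gauss_tail

results = []
for n in (200, 400, 800, 1600):
    l = math.ceil(math.log(n))      # l = O(log n)
    results.append((n, l, tail_integral(n, l), gaussian_closed_form(n, l)))
```

The closed form dominates the integral because \(x \le e^x\) was used to replace \((m+s)^{l-1}\). Both quantities fall off roughly like \(e^{-cn\varepsilon ^2}\) once n is large, which reflects the \(\rightarrow 0\) claim for \(l=O(\log n)\).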


Cite this article

Fan, Z., Montanari, A. The spectral norm of random inner-product kernel matrices. Probab. Theory Relat. Fields 173, 27–85 (2019). https://doi.org/10.1007/s00440-018-0830-4
