Abstract
These notes survey and explore an emerging method, which we call the low-degree method, for understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity—the second moment of the low-degree likelihood ratio—gives insight into how much computational time is required to solve a given hypothesis testing problem, which can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions. These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
Notes
- 1.
We will only consider this so-called strong version of distinguishability, where the probability of success must tend to 1 as n →∞, as opposed to the weak version where this probability need only exceed \(\frac {1}{2}\) by a constant. For high-dimensional problems, the strong version typically coincides with important notions of estimating the planted signal (see Sect. 4.2.6), whereas the weak version is often trivial.
- 2.
For instance, and this is what will be relevant in the examples we consider later, any pair of non-degenerate multivariate Gaussian distributions satisfies this assumption.
- 3.
It is important to note that, from the point of view of statistics, we are restricting our attention to the special case of deciding between two “simple” hypotheses, where each hypothesis consists of the dataset being drawn from a specific distribution. Optimal testing is more subtle for “composite” hypotheses in parametric families of probability distributions, a more typical setting in practice. The mathematical difficulties of this extended setting are discussed thoroughly in [75].
- 4.
For readers not familiar with the Radon–Nikodym derivative: if \(\mathbb {P}\), \(\mathbb {Q}\) are discrete distributions then \(L(\boldsymbol Y) = \mathbb {P}(\boldsymbol Y)/\mathbb {Q}(\boldsymbol Y)\); if \(\mathbb {P}\), \(\mathbb {Q}\) are continuous distributions with density functions p, q (respectively) then \(L(\boldsymbol Y) = p(\boldsymbol Y)/q(\boldsymbol Y)\).
- 5.
For a more precise definition of \(L^2(\mathbb {Q}_n)\) (in particular including issues around functions differing on sets of measure zero) see a standard reference on real analysis such as [100].
- 6.
To clarify, orthogonal projection is with respect to the inner product induced by \(\mathbb {Q}_n\) (see Definition 7).
- 7.
Two techniques from this calculation are elements of the “replica method” from statistical physics: (1) writing a power of an expectation as an expectation over independent “replicas” and (2) changing the order of expectations and evaluating the moment-generating function. The interested reader may see [82] for an early reference, or [21, 79] for two recent presentations.
- 8.
We will not actually use the definition of the univariate Hermite polynomials (although we will use certain properties that they satisfy as needed), but the definition is included for completeness in Appendix “Hermite Polynomials”.
- 9.
This model is equivalent to the more standard model in which the noise is symmetric with respect to permutations of the indices; see Appendix “Equivalence of Symmetric and Asymmetric Noise Models”.
- 10.
Concretely, one may take \(A_p = \frac {1}{\sqrt {2}} p^{-p/4-1/2}\) and \(B_p = \sqrt {2} e^{p/2} p^{-p/4}\).
- 11.
Some of these results only apply to minor variants of the spiked tensor problem, but we do not expect this difference to be important.
- 12.
Gaussian Orthogonal Ensemble (GOE): W is a symmetric n × n matrix with entries \(W_{ii} \sim \mathscr {N}(0,2/n)\) and \(W_{ij} = W_{ji} \sim \mathscr {N}(0,1/n)\), independently.
- 13.
In the sparse Rademacher prior, each entry of x is nonzero with probability ρ (independently), and the nonzero entries are drawn uniformly from \(\{\pm 1/\sqrt {\rho }\}\).
- 14.
More specifically, \(\|L_n^{\le D}\|^2 - 1\) is the variance of a certain pseudo-expectation value generated by pseudo-calibration, whose actual value in a valid pseudo-expectation must be exactly 1. It appears to be impossible to “correct” this part of the pseudo-expectation if the variance is diverging with n.
- 15.
Here, “best” is in the sense of strongly distinguishing \(\mathbb {P}_n\) and \(\mathbb {Q}_n\) throughout the largest possible regime of model parameters.
- 16.
In [47], it is shown that for a fairly general class of average-case hypothesis testing problems, if SoS succeeds in some range of parameters then there is a low-degree spectral method whose maximum positive eigenvalue succeeds (in a somewhat weaker range of parameters). However, the resulting matrix could a priori have an arbitrarily large (in magnitude) negative eigenvalue, which would prevent the spectral method from running in polynomial time. For this same reason, it seems difficult to establish a formal connection between SoS and the LDLR via spectral methods.
- 17.
Indeed, coordinate degree need not be phrased in terms of polynomials: one may equivalently consider the linear subspace of \(L^2(\mathbb {Q}_n)\) spanned by all functions depending on at most D coordinates at a time.
- 18.
Non-trivial estimation of a signal \(\boldsymbol x \in \mathbb {R}^n\) means having an estimator \(\hat {\boldsymbol x}\) achieving \(|\langle \hat {\boldsymbol x}, \boldsymbol x \rangle |/(\|\hat {\boldsymbol x}\| \cdot \|\boldsymbol x\|) \ge \varepsilon \) with high probability, for some constant ε > 0.
References
A. Auffinger, G. Ben Arous, J. Černý, Random matrices and complexity of spin glasses. Commun. Pure Appl. Math. 66(2), 165–201 (2013)
D. Achlioptas, A. Coja-Oghlan, Algorithmic barriers from phase transitions, in 2008 49th Annual IEEE Symposium on Foundations of Computer Science (IEEE, Piscataway, 2008), pp. 793–802
A. Anandkumar, Y. Deng, R. Ge, H. Mobahi, Homotopy analysis for tensor PCA (2016). arXiv preprint arXiv:1610.09322
N. Alon, M. Krivelevich, B. Sudakov, Finding a large hidden clique in a random graph. Random Struct. Algorithms 13(3–4), 457–466 (1998)
A.A. Amini, M.J. Wainwright, High-dimensional analysis of semidefinite relaxations for sparse principal components, in 2008 IEEE International Symposium on Information Theory (IEEE, Piscataway, 2008), pp. 2454–2458
N. Alon, R. Yuster, U. Zwick, Color-coding. J. ACM 42(4), 844–856 (1995)
M. Brennan, G. Bresler, Optimal average-case reductions to sparse PCA: from weak assumptions to strong hardness (2019). arXiv preprint arXiv:1902.07380
M. Brennan, G. Bresler, W. Huleihel, Reducibility and computational lower bounds for problems with planted sparse structure (2018). arXiv preprint arXiv:1806.07508
J. Baik, G. Ben Arous, S. Péché, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33(5), 1643–1697 (2005)
J. Barbier, M. Dia, N. Macris, F. Krzakala, T. Lesieur, L. Zdeborová, Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula, in Proceedings of the 30th International Conference on Neural Information Processing Systems (Curran Associates, 2016), pp. 424–432
V.V.S.P. Bhattiprolu, M. Ghosh, V. Guruswami, E. Lee, M. Tulsiani, Multiplicative approximations for polynomial optimization over the unit sphere. Electron. Colloq. Comput. Complexity 23, 185 (2016)
G. Ben Arous, R. Gheissari, A. Jagannath, Algorithmic thresholds for tensor PCA (2018). arXiv preprint arXiv:1808.00921
V. Bhattiprolu, V. Guruswami, E. Lee, Sum-of-squares certificates for maxima of random tensors on the sphere (2016). arXiv preprint arXiv:1605.00903
F. Benaych-Georges, R. Rao Nadakuditi, The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227(1), 494–521 (2011)
B. Barak, S. Hopkins, J. Kelner, P.K. Kothari, A. Moitra, A. Potechin, A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM J. Comput. 48(2), 687–735 (2019)
A. Blum, A. Kalai, H. Wasserman, Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM 50(4), 506–519 (2003)
A.S. Bandeira, D. Kunisky, A.S. Wein, Computational hardness of certifying bounds on constrained PCA problems (2019). arXiv preprint arXiv:1902.07324
C. Bordenave, M. Lelarge, L. Massoulié, Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs, in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (IEEE, Piscataway, 2015), pp. 1347–1357
J. Banks, C. Moore, J. Neeman, P. Netrapalli, Information-theoretic thresholds for community detection in sparse networks, in Conference on Learning Theory (2016), pp. 383–416
J. Banks, C. Moore, R. Vershynin, N. Verzelen, J. Xu, Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. IEEE Trans. Inform. Theory 64(7), 4872–4894 (2018)
A.S. Bandeira, A. Perry, A.S. Wein, Notes on computational-to-statistical gaps: predictions using statistical physics (2018). arXiv preprint arXiv:1803.11132
Q. Berthet, P. Rigollet, Computational lower bounds for sparse PCA (2013). arXiv preprint arXiv:1304.0828
B. Barak, D. Steurer, Proofs, beliefs, and algorithms through the lens of sum-of-squares. Course Notes (2016). http://www.sumofsquares.org/public/index.html
W.-K. Chen, D. Gamarnik, D. Panchenko, M. Rahman, Suboptimality of local algorithms for a class of max-cut problems. Ann. Probab. 47(3), 1587–1618 (2019)
Y. Deshpande, E. Abbe, A. Montanari, Asymptotic mutual information for the two-groups stochastic block model (2015). arXiv preprint arXiv:1507.08685
M. Dyer, A. Frieze, M. Jerrum, On counting independent sets in sparse graphs. SIAM J. Comput. 31(5), 1527–1541 (2002)
I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart, Robust estimators in high-dimensions without the computational intractability. SIAM J. Comput. 48(2), 742–864 (2019)
A. Decelle, F. Krzakala, C. Moore, L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84(6), 066106 (2011)
A. Decelle, F. Krzakala, C. Moore, L. Zdeborová, Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett. 107(6), 065701 (2011)
I. Diakonikolas, D.M. Kane, A. Stewart, Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures, in 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) (IEEE, Piscataway, 2017), pp. 73–84
Y. Ding, D. Kunisky, A.S. Wein, A.S. Bandeira, Subexponential-time algorithms for sparse PCA (2019). arXiv preprint
Y. Deshpande, A. Montanari, Sparse PCA via covariance thresholding, in Advances in Neural Information Processing Systems (2014), pp. 334–342
Y. Deshpande, A. Montanari, Finding hidden cliques of size \(\sqrt{N/e}\) in nearly linear time. Found. Comput. Math. 15(4), 1069–1128 (2015)
Y. Deshpande, A. Montanari, Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems, in Conference on Learning Theory (2015), pp. 523–562
D.L. Donoho, A. Maleki, A. Montanari, Message-passing algorithms for compressed sensing. Proc. Nat. Acad. Sci. 106(45), 18914–18919 (2009)
A. El Alaoui, F. Krzakala, Estimation in the spiked Wigner model: a short proof of the replica formula, in 2018 IEEE International Symposium on Information Theory (ISIT) (IEEE, Piscataway, 2018), pp. 1874–1878
A. El Alaoui, F. Krzakala, M.I. Jordan, Finite size corrections and likelihood ratio fluctuations in the spiked Wigner model (2017). arXiv preprint arXiv:1710.02903
A. El Alaoui, F. Krzakala, M.I. Jordan, Fundamental limits of detection in the spiked Wigner model (2018). arXiv preprint arXiv:1806.09588
V. Feldman, E. Grigorescu, L. Reyzin, S.S. Vempala, Y. Xiao, Statistical algorithms and a lower bound for detecting planted cliques. J. ACM 64(2), 8 (2017)
U. Feige, J. Kilian, Heuristics for semirandom graph problems. J. Comput. Syst. Sci. 63(4), 639–671 (2001)
D. Féral, S. Péché, The largest eigenvalue of rank one deformation of large Wigner matrices. Commun. Math. Phys. 272(1), 185–228 (2007)
V. Feldman, W. Perkins, S. Vempala, On the complexity of random satisfiability problems with planted solutions. SIAM J. Comput. 47(4), 1294–1338 (2018)
D. Grigoriev, Linear lower bound on degrees of Positivstellensatz calculus proofs for the parity. Theor. Comput. Sci. 259(1–2), 613–622 (2001)
D. Gamarnik, M. Sudan, Limits of local algorithms over sparse random graphs, in Proceedings of the 5th Conference on Innovations in Theoretical Computer Science (ACM, New York, 2014), pp. 369–376
D. Gamarnik, I. Zadik, Sparse high-dimensional linear regression: algorithmic barriers and a local search algorithm (2017). arXiv preprint arXiv:1711.04952
D. Gamarnik, I. Zadik, The landscape of the planted clique problem: dense subgraphs and the overlap gap property (2019). arXiv preprint arXiv:1904.07174
S.B. Hopkins, P.K. Kothari, A. Potechin, P. Raghavendra, T. Schramm, D. Steurer, The power of sum-of-squares for detecting hidden structures, in 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) (IEEE, Piscataway, 2017), pp. 720–731
S. Hopkins, Statistical Inference and the Sum of Squares Method. PhD thesis, Cornell University, August 2018
S.B. Hopkins, D. Steurer, Bayesian estimation from few samples: community detection and related problems (2017). arXiv preprint arXiv:1710.00264
S.B. Hopkins, J. Shi, D. Steurer, Tensor principal component analysis via sum-of-square proofs, in Conference on Learning Theory (2015), pp. 956–1006
S.B. Hopkins, T. Schramm, J. Shi, D. Steurer, Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors, in Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing (ACM, New York, 2016), pp. 178–191
B. Hajek, Y. Wu, J. Xu, Computational lower bounds for community detection on random graphs, in Conference on Learning Theory (2015), pp. 899–928
S. Janson, Gaussian Hilbert Spaces, vol. 129 (Cambridge University Press, Cambridge, 1997)
M. Jerrum, Large cliques elude the Metropolis process. Random Struct. Algorithms 3(4), 347–359 (1992)
I.M. Johnstone, A.Y. Lu, Sparse principal components analysis. Unpublished Manuscript (2004)
I.M. Johnstone, A.Y. Lu, On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)
A. Jagannath, P. Lopatto, L. Miolane, Statistical thresholds for tensor PCA (2018). arXiv preprint arXiv:1812.03403
M. Kearns, Efficient noise-tolerant learning from statistical queries. J. ACM 45(6), 983–1006 (1998)
F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, P. Zhang, Spectral redemption in clustering sparse networks. Proc. Nat. Acad. Sci. 110(52), 20935–20940 (2013)
P.K. Kothari, R. Mori, R. O’Donnell, D. Witmer, Sum of squares lower bounds for refuting any CSP, in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (ACM, New York, 2017), pp. 132–145
F. Krzakała, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová, Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. Nat. Acad. Sci. 104(25), 10318–10323 (2007)
R. Krauthgamer, B. Nadler, D. Vilenchik, Do semidefinite relaxations solve sparse PCA up to the information limit? Ann. Stat. 43(3), 1300–1322 (2015)
A.R. Klivans, A.A. Sherstov, Unconditional lower bounds for learning intersections of halfspaces. Mach. Learn. 69(2–3), 97–114 (2007)
L. Kučera, Expected complexity of graph partitioning problems. Discrete Appl. Math. 57(2–3), 193–212 (1995)
R. Kannan, S. Vempala, Beyond spectral: Tight bounds for planted Gaussians (2016). arXiv preprint arXiv:1608.03643
F. Krzakala, J. Xu, L. Zdeborová, Mutual information in rank-one matrix estimation, in 2016 IEEE Information Theory Workshop (ITW) (IEEE, Piscataway, 2016), pp. 71–75
J.B. Lasserre, Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11(3), 796–817 (2001)
L. Le Cam, Asymptotic Methods in Statistical Decision Theory (Springer, Berlin, 2012)
L. Le Cam, Locally asymptotically normal families of distributions. Univ. California Publ. Stat. 3, 37–98 (1960)
T. Lesieur, F. Krzakala, L. Zdeborová, MMSE of probabilistic low-rank matrix estimation: universality with respect to the output channel, in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) (IEEE, Piscataway, 2015), pp. 680–687
T. Lesieur, F. Krzakala, L. Zdeborová, Phase transitions in sparse PCA, in 2015 IEEE International Symposium on Information Theory (ISIT) (IEEE, Piscataway, 2015), pp. 1635–1639
A.K. Lenstra, H.W. Lenstra, L. Lovász, Factoring polynomials with rational coefficients. Math. Ann. 261(4), 515–534 (1982)
M. Lelarge, L. Miolane, Fundamental limits of symmetric low-rank matrix estimation. Probab. Theory Related Fields 173(3–4), 859–929 (2019)
T. Lesieur, L. Miolane, M. Lelarge, F. Krzakala, L. Zdeborová, Statistical and computational phase transitions in spiked tensor estimation, in 2017 IEEE International Symposium on Information Theory (ISIT) (IEEE, Piscataway, 2017), pp. 511–515
E.L. Lehmann, J.P. Romano, Testing Statistical Hypotheses (Springer, Berlin, 2006)
L. Massoulié, Community detection thresholds and the weak Ramanujan property, in Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (ACM, New York, 2014), pp. 694–703
L. Miolane, Phase transitions in spiked matrix estimation: information-theoretic analysis (2018). arXiv preprint arXiv:1806.04343
S.S. Mannelli, F. Krzakala, P. Urbani, L. Zdeborová, Passed & spurious: descent algorithms and local minima in spiked matrix-tensor models, in International Conference on Machine Learning (2019), pp. 4333–4342
M. Mezard, A. Montanari, Information, Physics, and Computation (Oxford University Press, Oxford, 2009)
E. Mossel, J. Neeman, A. Sly, Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields 162(3–4), 431–461 (2015)
E. Mossel, J. Neeman, A. Sly, A proof of the block model threshold conjecture. Combinatorica 38(3), 665–708 (2018)
M. Mézard, G. Parisi, M. Virasoro, Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, vol. 9 (World Scientific Publishing Company, Singapore, 1987)
R. Meka, A. Potechin, A. Wigderson, Sum-of-squares lower bounds for planted clique, in Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing (ACM, New York, 2015), pp. 87–96
A. Montanari, D. Reichman, O. Zeitouni, On the limitation of spectral methods: from the Gaussian hidden clique problem to rank-one perturbations of Gaussian tensors, in Advances in Neural Information Processing Systems (2015), pp. 217–225
L. Massoulié, L. Stephan, D. Towsley, Planting trees in graphs, and finding them back (2018). arXiv preprint arXiv:1811.01800
T. Ma, A. Wigderson, Sum-of-squares lower bounds for sparse PCA, in Advances in Neural Information Processing Systems (2015), pp. 1612–1620
J. Neyman, E.S. Pearson, IX. On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. R. Soc. Lond. Ser. A, Containing Papers of a Mathematical or Physical Character 231(694–706), 289–337 (1933)
R. O’Donnell, Analysis of Boolean Functions (Cambridge University Press, Cambridge, 2014)
P.A. Parrilo, Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000
A. Perry, A.S. Wein, A.S. Bandeira, Statistical limits of spiked tensor models (2016). arXiv preprint arXiv:1612.07728
A. Perry, A.S. Wein, A.S. Bandeira, A. Moitra, Optimality and sub-optimality of PCA I: spiked random matrix models. Ann. Stat. 46(5), 2416–2451 (2018)
P. Rigollet, J.-C. Hütter, High-dimensional statistics. Lecture Notes, 2018
E. Richard, A. Montanari, A statistical model for tensor PCA, in Advances in Neural Information Processing Systems (2014), pp. 2897–2905
P. Raghavendra, S. Rao, T. Schramm, Strongly refuting random CSPs below the spectral threshold, in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (ACM, New York, 2017), pp. 121–131
P. Raghavendra, T. Schramm, D. Steurer, High-dimensional estimation via sum-of-squares proofs (2018). arXiv preprint arXiv:1807.11419
R.W. Robinson, N.C. Wormald, Almost all cubic graphs are Hamiltonian. Random Struct. Algorithms 3(2), 117–125 (1992)
R.W. Robinson, N.C. Wormald, Almost all regular graphs are Hamiltonian. Random Struct. Algorithms 5(2), 363–374 (1994)
G. Schoenebeck, Linear level Lasserre lower bounds for certain k-CSPs, in 2008 49th Annual IEEE Symposium on Foundations of Computer Science (IEEE, Piscataway, 2008), pp. 593–602
A. Saade, F. Krzakala, L. Zdeborová, Spectral clustering of graphs with the Bethe Hessian, in Advances in Neural Information Processing Systems (2014), pp. 406–414
E.M. Stein, R. Shakarchi, Real Analysis: Measure Theory, Integration, and Hilbert Spaces (Princeton University Press, Princeton, 2009)
G. Szegö, Orthogonal Polynomials, vol. 23 (American Mathematical Society, 1939)
T. Wang, Q. Berthet, Y. Plan, Average-case hardness of RIP certification, in Advances in Neural Information Processing Systems (2016), pp. 3819–3827
T. Wang, Q. Berthet, R.J. Samworth, Statistical and computational trade-offs in estimation of sparse principal components. Ann. Stat. 44(5), 1896–1930 (2016)
A.S. Wein, A. El Alaoui, C. Moore, The Kikuchi hierarchy and tensor PCA (2019). arXiv preprint arXiv:1904.03858
I. Zadik, D. Gamarnik, High dimensional linear regression using lattice basis reduction, in Advances in Neural Information Processing Systems (2018), pp. 1842–1852
L. Zdeborová, F. Krzakala, Statistical physics of inference: thresholds and algorithms. Adv. Phys. 65(5), 453–552 (2016)
Acknowledgements
We thank the participants of a working group on the subject of these notes, organized by the authors at the Courant Institute of Mathematical Sciences during the spring of 2019. We also thank Samuel B. Hopkins, Philippe Rigollet, and David Steurer for helpful discussions.
DK was partially supported by NSF grants DMS-1712730 and DMS-1719545. ASW was partially supported by NSF grant DMS-1712730 and by the Simons Collaboration on Algorithms and Geometry. ASB was partially supported by NSF grants DMS-1712730 and DMS-1719545, and by a grant from the Sloan Foundation.
Appendices
Appendix 1: Omitted Proofs
Neyman–Pearson Lemma
We include here, for completeness, a proof of the classical Neyman–Pearson lemma [87].
Proof of Lemma 1
Note first that a test f is completely determined by its rejection region, \(R_f = \{\boldsymbol Y: f(\boldsymbol Y) = \mathbb {P}\}\). We may rewrite the power of f as

$$\beta(f) = \mathbb{P}(R_f) = \mathop{\mathbb{E}}_{\boldsymbol Y \sim \mathbb{Q}}\big[L(\boldsymbol Y)\, \mathbb{1}\{\boldsymbol Y \in R_f\}\big].$$

On the other hand, our assumption on α(f) is equivalent to

$$\mathbb{Q}(R_f) \le \mathbb{Q}(\{L > \eta\}).$$

Thus, we are interested in solving the optimization

$$\text{maximize}\ \ \mathop{\mathbb{E}}_{\boldsymbol Y \sim \mathbb{Q}}\big[L(\boldsymbol Y)\, \mathbb{1}\{\boldsymbol Y \in R\}\big] \quad \text{subject to} \quad \mathbb{Q}(R) \le \mathbb{Q}(\{L > \eta\}).$$

From this form, let us write \(R^\star = \{L > \eta\}\) for the rejection region of the likelihood ratio test; then the difference of powers is

$$\mathop{\mathbb{E}}_{\boldsymbol Y \sim \mathbb{Q}}\big[L\,\big(\mathbb{1}\{\boldsymbol Y \in R^\star\} - \mathbb{1}\{\boldsymbol Y \in R_f\}\big)\big] \ge \eta\, \big(\mathbb{Q}(R^\star) - \mathbb{Q}(R_f)\big) \ge 0,$$

where the first inequality holds pointwise (on \(R^\star \setminus R_f\) we have \(L > \eta\), while on \(R_f \setminus R^\star\) we have \(L \le \eta\)), completing the proof.
Equivalence of Symmetric and Asymmetric Noise Models
For technical convenience, in the main text we worked with an asymmetric version of the spiked Wigner model (see Sect. 3.2), \(\boldsymbol Y = \lambda \boldsymbol x \boldsymbol x^\top + \boldsymbol Z\) where Z has i.i.d. \(\mathscr {N}(0,1)\) entries. A more standard model is to instead observe \(\widetilde {\boldsymbol Y} = \frac {1}{2}(\boldsymbol Y + \boldsymbol Y^\top ) = \lambda \boldsymbol x \boldsymbol x^\top + \boldsymbol W\), where W is symmetric with \(\mathscr {N}(0,1)\) diagonal entries and \(\mathscr {N}(0,1/2)\) off-diagonal entries, all independent. These two models are equivalent, in the sense that if we are given a sample from one then we can produce a sample from the other. Clearly, if we are given Y, we can symmetrize it to form \(\widetilde {\boldsymbol Y}\). Conversely, if we are given \(\widetilde {\boldsymbol Y}\), we can draw an independent matrix G with i.i.d. \(\mathscr {N}(0,1)\) entries, and compute \(\widetilde {\boldsymbol Y} + \frac {1}{2}(\boldsymbol G - \boldsymbol G^\top )\); one can check that the resulting matrix has the same distribution as Y (we are adding back the “skew-symmetric part” that is present in Y but not in \(\widetilde {\boldsymbol Y}\)).
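To make the reduction concrete, here is a minimal NumPy sketch of both directions (our illustration, not part of the original text; the Rademacher spike and the values of n and λ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 500, 1.5
x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)  # an arbitrary unit-norm spike

# Asymmetric model: Y = lam * x x^T + Z with Z i.i.d. N(0,1).
Z = rng.standard_normal((n, n))
Y = lam * np.outer(x, x) + Z

# Direction 1: symmetrize to get the standard symmetric observation.
Y_tilde = (Y + Y.T) / 2  # noise: N(0,1) on the diagonal, N(0,1/2) off-diagonal

# Direction 2: re-add an independent skew-symmetric part to recover
# a sample distributed as Y from the asymmetric model.
G = rng.standard_normal((n, n))
Y_recovered = Y_tilde + (G - G.T) / 2

# Empirical check of the noise variances in the symmetric model.
W = Y_tilde - lam * np.outer(x, x)
print(np.var(np.diag(W)), np.var(W[np.triu_indices(n, k=1)]))  # ~1.0, ~0.5
```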
In the spiked tensor model (see Sect. 3.1), our asymmetric noise model is similarly equivalent to the standard symmetric model defined in [93] (in which the noise tensor Z is averaged over all permutations of indices). Since we can treat each entry of the symmetric tensor separately, it is sufficient to show the following one-dimensional fact: for unknown \(x \in \mathbb {R}\), k samples of the form \(y_i = x + \mathscr {N}(0,1)\) are equivalent to one sample of the form \(\tilde y = x + \mathscr {N}(0,1/k)\). Given \(\{y_i\}\), we can sample \(\tilde y\) by averaging: \(\tilde y = \frac {1}{k}\sum _{i=1}^k y_i\). For the converse, fix unit vectors \(\boldsymbol a_1, \dots , \boldsymbol a_k\) at the corners of a simplex in \(\mathbb {R}^{k-1}\); these satisfy \(\langle \boldsymbol a_i,\boldsymbol a_j \rangle = -\frac {1}{k-1}\) for all \(i \ne j\). Given \(\tilde y\), draw \(\boldsymbol u \sim \mathscr {N}(0,{\boldsymbol I}_{k-1})\) and let \(y_i = \tilde y + \sqrt {1-1/k} \,\langle \boldsymbol a_i,\boldsymbol u \rangle \); one can check that these have the correct distribution.
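The covariance structure behind this equivalence is easy to verify; the following sketch (our addition; realizing the simplex by centering the standard basis is one convenient choice) checks it:

```python
import numpy as np

k = 5

# Unit vectors a_1, ..., a_k at the corners of a simplex, realized inside R^k
# by centering the standard basis vectors and normalizing; they span a
# (k-1)-dimensional subspace and satisfy <a_i, a_j> = -1/(k-1) for i != j.
A = np.eye(k) - np.ones((k, k)) / k
A /= np.linalg.norm(A, axis=1, keepdims=True)
print(np.round(A @ A.T, 6))  # 1 on the diagonal, -1/(k-1) off it

# Given y_tilde = x + N(0,1/k), the values y_i = y_tilde + sqrt(1-1/k) <a_i, u>
# have covariance matrix (1/k) + (1-1/k) <a_i, a_j>, i.e. variance 1 and
# covariance 0: they are distributed as k independent samples x + N(0,1).
cov = np.full((k, k), 1 / k) + (1 - 1 / k) * (A @ A.T)
print(np.round(cov, 6))  # the identity matrix
```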
Low-Degree Analysis of Spiked Wigner Above the PCA Threshold
Proof of Theorem 6
We follow the proof of Theorem 2(ii) in Sect. 3.1.2. For any choice of d ≤ D, using the standard bound \(\binom {2d}{d} \ge 4^d/(2\sqrt {d})\),
Since \(\hat \lambda > 1\), this diverges as n →∞ provided we choose d ≤ D with d = ω(1) and d = o(n).
Appendix 2: Omitted Probability Theory Background
Hermite Polynomials
Here we give definitions and basic facts regarding the Hermite polynomials (see, e.g., [101] for further details), which are orthogonal polynomials with respect to the standard Gaussian measure.
Definition 15
The univariate Hermite polynomials are the sequence of polynomials \(h_k(x) \in \mathbb {R}[x]\) for k ≥ 0 defined by the recursion

$$h_0(x) = 1, \qquad h_1(x) = x, \qquad h_{k+1}(x) = x\, h_k(x) - k\, h_{k-1}(x) \quad (k \ge 1).$$
The normalized univariate Hermite polynomials are \(\widehat {h}_k(x) = h_k(x) / \sqrt {k!}\).
The following is the key property of the Hermite polynomials, which allows functions in \(L^2(\mathscr {N}(0, 1))\) to be expanded in terms of them.
Proposition 10
The normalized univariate Hermite polynomials form a complete orthonormal system of polynomials for \(L^2(\mathscr {N}(0, 1))\).
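As a numerical sanity check of this orthonormality claim (the sketch below is our addition; it relies only on NumPy's probabilists' Hermite utilities, and the choice of 50 quadrature nodes is arbitrary):

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial import hermite_e as He  # probabilists' Hermite basis

def h(k, y):
    """Evaluate the (unnormalized) Hermite polynomial h_k = He_k at y."""
    return He.hermeval(y, [0.0] * k + [1.0])

# Gauss quadrature against the weight exp(-x^2/2); dividing by sqrt(2*pi)
# turns quadrature sums into expectations under N(0,1). Exact for degree < 100.
nodes, weights = He.hermegauss(50)
for j in range(6):
    for k in range(6):
        inner = (weights @ (h(j, nodes) * h(k, nodes))) / sqrt(2 * pi)
        inner /= sqrt(factorial(j) * factorial(k))  # normalize by sqrt(j! k!)
        assert abs(inner - (1.0 if j == k else 0.0)) < 1e-8
print("normalized Hermite polynomials are orthonormal in L^2(N(0,1))")
```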
The following are the multivariate generalizations of the above definition that we used throughout the main text.
Definition 16
The N-variate Hermite polynomials are the polynomials \(H_{\boldsymbol \alpha }(\boldsymbol y) = \prod _{i=1}^{N} h_{\alpha _i}(y_i)\) for \(\boldsymbol \alpha \in \mathbb {N}^N\). The normalized N-variate Hermite polynomials are the polynomials \(\widehat {H}_{\boldsymbol \alpha }(\boldsymbol y) = \prod _{i=1}^{N} \widehat {h}_{\alpha _i}(y_i)\) for \(\boldsymbol \alpha \in \mathbb {N}^N\).
Again, the following is the key property justifying expansions in terms of these polynomials.
Proposition 11
The normalized N-variate Hermite polynomials form a complete orthonormal system of (multivariate) polynomials for \(L^2(\mathscr {N}(\boldsymbol 0, \boldsymbol I_N))\).
For the sake of completeness, we also provide proofs below of the three identities concerning univariate Hermite polynomials that we used in Sect. 2.3 to derive the norm of the LDLR under the additive Gaussian noise model. It is more convenient to prove these in a different order than they were presented in Sect. 2.3, since one identity is especially useful for proving the others.
Proof of Proposition 8, Integration by Parts
Recall that we are assuming a function \(f: \mathbb {R} \to \mathbb {R}\) is k times continuously differentiable and f and its derivatives are \(O(\exp(|x|^\alpha))\) for α ∈ (0, 2), and we want to show the identity

$$\mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f(y)\, h_k(y)\right] = \mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f^{(k)}(y)\right].$$
We proceed by induction. Since \(h_0(y) = 1\), the case k = 0 follows immediately. We also verify by hand the case k = 1, with \(h_1(y) = y\):

$$\mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f(y)\, y\right] = \frac{1}{\sqrt{2\pi}} \int f(y)\, y\, e^{-y^2/2}\, dy = \frac{1}{\sqrt{2\pi}} \int f'(y)\, e^{-y^2/2}\, dy = \mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f'(y)\right],$$

where we have used ordinary integration by parts together with \(y\, e^{-y^2/2} = -\frac{d}{dy} e^{-y^2/2}\) (the boundary terms vanish by the growth assumption on f).
Now, suppose the identity holds for all degrees smaller than some k ≥ 2, and expand the degree k case according to the recursion:

$$\mathop{\mathbb{E}}\left[f(y)\, h_k(y)\right] = \mathop{\mathbb{E}}\left[f(y)\, y\, h_{k-1}(y)\right] - (k-1) \mathop{\mathbb{E}}\left[f(y)\, h_{k-2}(y)\right] = \mathop{\mathbb{E}}\left[\frac{d}{dy}\big(f(y)\, h_{k-1}(y)\big)\right] - (k-1) \mathop{\mathbb{E}}\left[f(y)\, h_{k-2}(y)\right].$$

Since \(h_{k-1}'(y) = (k-1)\, h_{k-2}(y)\) (a further consequence of the recursion), expanding the derivative by the product rule cancels the last term, leaving

$$\mathop{\mathbb{E}}\left[f(y)\, h_k(y)\right] = \mathop{\mathbb{E}}\left[f'(y)\, h_{k-1}(y)\right] = \mathop{\mathbb{E}}\left[f^{(k)}(y)\right],$$

where we have used the degree 1 and then the degree k − 1 hypotheses.
Proof of Proposition 7, Translation Identity
Recall that we want to show, for all k ≥ 0 and \(\mu \in \mathbb {R}\), that

$$\mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[h_k(y + \mu)\right] = \mu^k.$$
We proceed by induction on k. Since \(h_0(y) = 1\), the case k = 0 is immediate. Now, suppose the identity holds for degree k − 1, and expand the degree k case according to the recursion:

$$\mathop{\mathbb{E}}\left[h_k(y+\mu)\right] = \mathop{\mathbb{E}}\left[(y+\mu)\, h_{k-1}(y+\mu)\right] - (k-1) \mathop{\mathbb{E}}\left[h_{k-2}(y+\mu)\right],$$

which may be simplified by the Gaussian integration by parts (using \(\mathbb{E}[y\, g(y)] = \mathbb{E}[g'(y)]\) and \(h_{k-1}' = (k-1)\, h_{k-2}\)) to

$$\mathop{\mathbb{E}}\left[h_k(y+\mu)\right] = \mu \mathop{\mathbb{E}}\left[h_{k-1}(y+\mu)\right] + (k-1) \mathop{\mathbb{E}}\left[h_{k-2}(y+\mu)\right] - (k-1) \mathop{\mathbb{E}}\left[h_{k-2}(y+\mu)\right] = \mu \mathop{\mathbb{E}}\left[h_{k-1}(y+\mu)\right],$$

and the result follows by the inductive hypothesis.
Proof of Proposition 9, Generating Function
Recall that we want to show the series identity, for any \(x, y \in \mathbb {R}\),

$$\sum_{k=0}^{\infty} \frac{x^k}{k!}\, h_k(y) = \exp\left(xy - \frac{x^2}{2}\right).$$

For any fixed x, the left-hand side belongs to \(L^2(\mathscr {N}(0, 1))\) in the variable y. Thus this is merely a claim about the Hermite coefficients of this function, which may be computed by taking inner products. Namely, let us write \(f_x(y) = \exp(xy - x^2/2)\); then, using Gaussian integration by parts and the fact that \(\frac{d}{dy} f_x(y) = x\, f_x(y)\),

$$\langle f_x, \widehat{h}_k \rangle = \mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f_x(y)\, \widehat{h}_k(y)\right] = \frac{1}{\sqrt{k!}} \mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f_x^{(k)}(y)\right] = \frac{x^k}{\sqrt{k!}} \mathop{\mathbb{E}}_{y \sim \mathscr{N}(0,1)}\left[f_x(y)\right].$$

A simple calculation shows that \(\mathbb {E}_{y \sim \mathscr {N}(0, 1)}[f_x(y)] = 1\) (this is an evaluation of the Gaussian moment-generating function that we have mentioned in the main text), and then by the Hermite expansion

$$f_x(y) = \sum_{k=0}^{\infty} \langle f_x, \widehat{h}_k \rangle\, \widehat{h}_k(y) = \sum_{k=0}^{\infty} \frac{x^k}{\sqrt{k!}}\, \widehat{h}_k(y) = \sum_{k=0}^{\infty} \frac{x^k}{k!}\, h_k(y),$$

giving the result.
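These three identities can also be confirmed numerically. The following sketch is our addition, with arbitrary test values μ = 0.8 and x = 0.3:

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial import hermite_e as He

h = lambda k, y: He.hermeval(y, [0.0] * k + [1.0])  # h_k = He_k
nodes, weights = He.hermegauss(60)
E = lambda vals: (weights @ vals) / sqrt(2 * pi)  # expectation under N(0,1)

mu, x = 0.8, 0.3
for k in range(6):
    # Translation identity (Proposition 7): E[h_k(y + mu)] = mu^k.
    assert abs(E(h(k, nodes + mu)) - mu**k) < 1e-8
    # Integration by parts (Proposition 8) with f(y) = exp(x*y), for which
    # f^{(k)} = x^k f and E[f] = exp(x^2/2).
    assert abs(E(np.exp(x * nodes) * h(k, nodes)) - x**k * np.exp(x**2 / 2)) < 1e-8

# Generating function (Proposition 9): sum_k (x^k / k!) h_k(y) = exp(xy - x^2/2).
y = np.linspace(-2.0, 2.0, 7)
series = sum(x**k / factorial(k) * h(k, y) for k in range(40))
assert np.allclose(series, np.exp(x * y - x**2 / 2))
print("Propositions 7-9 verified numerically")
```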
Subgaussian Random Variables
Many of our rigorous arguments rely on the concept of subgaussianity, which we now define. See, e.g., [92] for more details.
Definition 17
For \(\sigma^2 > 0\), we say that a real-valued random variable π is \(\sigma^2\)-subgaussian if \(\mathbb {E}[\pi ] = 0\) and, for all \(t \in \mathbb {R}\), the moment-generating function \(M(t) = \mathbb {E}[\exp (t \pi )]\) of π exists and is bounded by \(M(t) \le \exp (\sigma ^2 t^2 / 2)\).
Here \(\sigma^2\) is called the variance proxy, which is not necessarily equal to the variance of π (although it can be shown that \(\sigma^2 \ge \operatorname{Var}[\pi]\)). The name subgaussian refers to the fact that \(\exp (\sigma ^2 t^2 / 2)\) is the moment-generating function of \(\mathscr {N}(0,\sigma ^2)\).
The following are some examples of (laws of) subgaussian random variables. Clearly, \(\mathscr {N}(0,\sigma ^2)\) is \(\sigma^2\)-subgaussian. By Hoeffding’s lemma, any mean-zero distribution supported on an interval [a, b] is \((b-a)^2/4\)-subgaussian. In particular, the Rademacher distribution Unif({±1}) is 1-subgaussian. Note also that the sum of n independent \(\sigma^2\)-subgaussian random variables is \(\sigma^2 n\)-subgaussian.
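As a concrete check (our addition), the two moment-generating function bounds just mentioned can be verified deterministically on a grid of t values:

```python
import numpy as np

t = np.linspace(-5, 5, 1001)

# Rademacher Unif({+1, -1}): M(t) = cosh(t); 1-subgaussianity is the
# classical inequality cosh(t) <= exp(t^2 / 2).
assert np.all(np.cosh(t) <= np.exp(t**2 / 2))

# Centered Unif[-2, 2] (so b - a = 4): M(t) = sinh(2t) / (2t), and
# Hoeffding's lemma promises the variance proxy (b - a)^2 / 4 = 4.
with np.errstate(divide="ignore", invalid="ignore"):
    mgf = np.where(t == 0, 1.0, np.sinh(2 * t) / (2 * t))
assert np.all(mgf <= np.exp(4 * t**2 / 2))
print("both moment-generating function bounds hold")
```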
Subgaussian random variables admit the following bound on their absolute moments; see Lemmas 1.3 and 1.4 of [92].
Proposition 12
If π is \(\sigma^2\)-subgaussian then

$$\mathbb{E}\left[|\pi|^k\right] \le \left(2\sigma^2\right)^{k/2} k\, \Gamma(k/2)$$
for every integer k ≥ 1.
Here Γ(⋅) denotes the gamma function which, recall, is defined for all positive real numbers and satisfies Γ(k) = (k − 1)! when k is a positive integer. We will need the following property of the gamma function.
Proposition 13
For all x > 0 and a > 0,

$$\Gamma(x + a) \le \Gamma(x)\, (x + a)^a.$$
Proof
This follows from two standard properties of the gamma function. The first is that (similarly to the factorial) Γ(x + 1)∕Γ(x) = x for all x > 0. The second is Gautschi’s inequality, which states that Γ(x + s)∕Γ(x) < (x + s)s for all x > 0 and s ∈ (0, 1).
In the context of the spiked Wigner model (Sect. 3.2), we now prove that subgaussian spike priors admit a local Chernoff bound (Definition 14).
Proposition 14
Suppose π is \(\sigma^2\)-subgaussian (for some constant \(\sigma^2 > 0\)) with \(\mathbb {E}[\pi ] = 0\) and \(\mathbb {E}[\pi ^2] = 1\). Let \((\mathscr {X}_n)\) be the spike prior that draws each entry of x i.i.d. from π (where π does not depend on n). Then \((\mathscr {X}_n)\) admits a local Chernoff bound.
Proof
Since π is subgaussian, π 2 is subexponential, which implies \(\mathbb {E}[\exp (t \pi ^2)] < \infty \) for all |t|≤ s for some s > 0 (see e.g., Lemma 1.12 of [92]).
Let \(\pi, \pi'\) be independent copies of π, and set \(\varPi = \pi \pi'\). The moment-generating function of Π is

$$M(t) = \mathbb{E}\left[\exp(t\, \pi \pi')\right] = \mathop{\mathbb{E}}_{\pi'}\left[\mathop{\mathbb{E}}_{\pi}\left[\exp\big((t \pi')\, \pi\big)\right]\right] \le \mathop{\mathbb{E}}_{\pi'}\left[\exp\left(\tfrac{1}{2} \sigma^2 t^2 (\pi')^2\right)\right] < \infty$$
provided \(\frac {1}{2}\sigma ^2 t^2 < s\), i.e. \(|t| < \sqrt {2s/\sigma ^2}\). Thus M(t) exists in an open interval containing t = 0, which implies \(M'(0) = \mathbb {E}[\varPi ] = 0\) and \(M''(0) = \mathbb {E}[\varPi ^2] = 1\) (this is the defining property of the moment-generating function: its derivatives at zero are the moments).
Let η > 0 and \(f(t) = \exp\left(\frac{t^2}{2(1-\eta)}\right)\). Since M(0) = 1, M′(0) = 0, M″(0) = 1 and, as one may check, \(f(0) = 1, f'(0) = 0, f''(0) = \frac {1}{1-\eta } > 1\), there exists δ > 0 such that, for all t ∈ [−δ, δ], M(t) exists and M(t) ≤ f(t).
We then apply the standard Chernoff bound argument to \(\langle \boldsymbol x^1,\boldsymbol x^2 \rangle = \sum _{i=1}^n \varPi _i\) where \(\varPi_1, \dots, \varPi_n\) are i.i.d. copies of Π. For any α > 0,

$$\Pr\left\{\langle \boldsymbol x^1,\boldsymbol x^2 \rangle \ge t\right\} \le e^{-\alpha t}\, \mathbb{E}\left[\exp\left(\alpha \sum_{i=1}^n \varPi_i\right)\right] = e^{-\alpha t}\, M(\alpha)^n \le e^{-\alpha t}\, f(\alpha)^n = \exp\left(-\alpha t + \frac{n \alpha^2}{2(1-\eta)}\right).$$

Taking α = (1 − η)t∕n,

$$\Pr\left\{\langle \boldsymbol x^1,\boldsymbol x^2 \rangle \ge t\right\} \le \exp\left(-\frac{(1-\eta)t^2}{n} + \frac{(1-\eta)t^2}{2n}\right) = \exp\left(-\frac{(1-\eta)t^2}{2n}\right),$$

as desired. This holds provided α ≤ δ, i.e. t ≤ δn∕(1 − η). A symmetric argument with − Π in place of Π holds for the other tail, \(\Pr \left \{\langle \boldsymbol x^1,\boldsymbol x^2 \rangle \le -t\right \}\).
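For intuition, here is a small simulation (our addition) for the Rademacher prior, where Hoeffding's inequality gives the tail bound \(\exp(-t^2/(2n))\), matching the η → 0 form of the bound above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 100_000
x1 = rng.integers(0, 2, size=(trials, n), dtype=np.int8) * 2 - 1  # Rademacher
x2 = rng.integers(0, 2, size=(trials, n), dtype=np.int8) * 2 - 1
overlap = (x1 * x2).sum(axis=1, dtype=np.int64)  # <x^1, x^2>, a sum of n products

for t in (10, 20, 30):
    emp = np.mean(overlap >= t)
    print(f"t={t}: empirical {emp:.2e} <= exp(-t^2/(2n)) = {np.exp(-t**2/(2*n)):.2e}")
```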
Hypercontractivity
The following hypercontractivity result states that the moments of low-degree polynomials of i.i.d. random variables must behave somewhat reasonably. The Rademacher version is the Bonami lemma from [88], and the Gaussian version appears in [53] (see Theorem 5.10 and Remark 5.11 of [53]). We refer the reader to [88] for a general discussion of hypercontractivity.
Proposition 15 (Bonami Lemma)
Let \(\boldsymbol x = (x_1, \dots, x_n)\) have either i.i.d. \(\mathscr {N}(0,1)\) or i.i.d. Rademacher (uniform ± 1) entries, and let \(f: \mathbb {R}^n \to \mathbb {R}\) be a polynomial of degree k. Then

$$\mathbb{E}\left[f(\boldsymbol x)^4\right] \le 9^k\, \mathbb{E}\left[f(\boldsymbol x)^2\right]^2.$$
We will combine this with the following standard second moment method.
Proposition 16 (Paley-Zygmund Inequality)
If Z ≥ 0 is a random variable with finite variance, and 0 ≤ θ ≤ 1, then

$$\Pr\left\{Z > \theta\, \mathbb{E}[Z]\right\} \ge (1-\theta)^2\, \frac{\mathbb{E}[Z]^2}{\mathbb{E}[Z^2]}.$$
By combining Propositions 16 and 15, we immediately have the following.
Corollary 2
Let \(\boldsymbol x = (x_1, \dots, x_n)\) have either i.i.d. \(\mathscr {N}(0,1)\) or i.i.d. Rademacher (uniform ± 1) entries, and let \(f: \mathbb {R}^n \to \mathbb {R}\) be a polynomial of degree k. Then, for 0 ≤ θ ≤ 1,

$$\Pr\left\{f(\boldsymbol x)^2 > \theta\, \mathbb{E}\left[f(\boldsymbol x)^2\right]\right\} \ge (1-\theta)^2\, 9^{-k}.$$
Remark 5
One rough interpretation of Corollary 2 is that if f is degree k, then \(\mathbb {E}[f(x)^2]\) cannot be dominated by an event of probability smaller than roughly \(3^{-2k}\).
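To illustrate (our addition; the degree-2 Gaussian polynomial below is an arbitrary example), one can check both bounds by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, theta = 20, 2, 0.5
x = rng.standard_normal((200_000, n))  # rows are i.i.d. N(0, I_n) samples

# An arbitrary degree-2 polynomial f(x) = sum_{i<j} c_ij x_i x_j.
C = np.triu(rng.standard_normal((n, n)), k=1)
f = ((x @ C) * x).sum(axis=1)

m2 = np.mean(f**2)
print("E[f^4]/E[f^2]^2 =", np.mean(f**4) / m2**2, " (Bonami bound: 9^k =", 9**k, ")")
print("Pr[f^2 > theta E f^2] =", np.mean(f**2 > theta * m2),
      " (Corollary 2 lower bound:", (1 - theta) ** 2 / 9**k, ")")
```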