
The overlap gap property in principal submatrix recovery

Abstract

We study support recovery for a \(k \times k\) principal submatrix with elevated mean \(\lambda /N\), hidden in an \(N\times N\) symmetric mean-zero Gaussian matrix. Here \(\lambda >0\) is a universal constant, and we assume \(k = N \rho \) for some constant \(\rho \in (0,1)\). We establish that there exists a constant \(C>0\) such that the MLE recovers a constant proportion of the hidden submatrix if \(\lambda \ge C \sqrt{\frac{1}{\rho } \log \frac{1}{\rho }}\), while such recovery is information-theoretically impossible if \(\lambda = o( \sqrt{\frac{1}{\rho } \log \frac{1}{\rho }} )\). The MLE is computationally intractable in general, and in fact, for \(\rho >0\) sufficiently small, this problem is conjectured to exhibit a statistical-computational gap. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some \(\varepsilon >0\) and \(\sqrt{\frac{1}{\rho } \log \frac{1}{\rho } } \ll \lambda \ll \frac{1}{\rho ^{1/2 + \varepsilon }}\), the problem exhibits a variant of the Overlap-Gap-Property (OGP). As a direct consequence, we establish that a family of local MCMC-based algorithms does not achieve optimal recovery. Finally, we establish that for \(\lambda > 1/\rho \), a simple spectral method recovers a constant proportion of the hidden submatrix.



Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

  1. Abbe, E.: Community detection and stochastic block models: recent developments. J. Mach. Learn. Res. 18(1), 6446–6531 (2017)


  2. Achlioptas, D., Coja-Oghlan, A., Ricci-Tersenghi, F.: On the solution-space geometry of random constraint satisfaction problems. Random Struct. Algorithms 38(3), 251–268 (2011)


  3. Addario-Berry, L., Maillard, P.: The algorithmic hardness threshold for continuous random energy models. Math. Stat. Learn. 2(1), 77–101 (2020)


  4. Aizenman, M., Sims, R., Starr, S.L.: Extended variational principle for the Sherrington–Kirkpatrick spin-glass model. Phys. Rev. B 68(21), 214403 (2003)


  5. Alon, N., Krivelevich, M., Sudakov, B.: Finding a large hidden clique in a random graph. Random Struct. Algorithms 13(3–4), 457–466 (1998)


  6. Amini, A.A., Wainwright, M.J.: High-dimensional analysis of semidefinite relaxations for sparse principal components. In: 2008 IEEE International Symposium on Information Theory, pp. 2454–2458. IEEE (2008)

  7. Arguin, L.-P.: Spin glass computations and Ruelle’s probability cascades. J. Stat. Phys. 126(4–5), 951–976 (2007)

  8. Auffinger, A., Chen, W.-K.: Parisi formula for the ground state energy in the mixed \(p\)-spin model. Ann. Probab. 45(6B), 4617–4631 (2017)


  9. Auffinger, A., Chen, W.-K., Zeng, Q.: The SK model is full-step replica symmetry breaking at zero temperature. Commun. Pure Appl. Math. 73, 921–943 (2020)


  10. Baffioni, F., Rosati, F.: Some exact results on the ultrametric overlap distribution in mean field spin glass models (I). Eur. Phys. J. B Condens. Matter Complex Syst. 17(3), 439–447 (2000)


  11. Baik, J., Ben Arous, G., Péché, S.: Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33(5), 1643–1697 (2005)


  12. Balakrishnan, S., Kolar, M., Rinaldo, A., Singh, A., Wasserman, L.: Statistical and computational tradeoffs in biclustering. In: NeurIPS 2011 Workshop on Computational Trade-Offs in Statistical Learning, vol. 4 (2011)

  13. Banks, J., Moore, C., Vershynin, R., Verzelen, N., Xu, J.: Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. IEEE Trans. Inf. Theory 64(7), 4872–4894 (2018)


  14. Barak, B., Hopkins, S., Kelner, J., Kothari, P.K., Moitra, A., Potechin, A.: A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM J. Comput. 48(2), 687–735 (2019)


  15. Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T., Zdeborová, L.: Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pp. 424–432 (2016)

  16. Barbier, J., Macris, N., Rush, C.: All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation. In: Advances in Neural Information Processing Systems, pp. 14915–14926 (2020)

  17. Barra, A., Genovese, G., Guerra, F.: Equilibrium statistical mechanics of bipartite spin systems. J. Phys. A: Math. Theor. 44(24), 245002 (2011)


  18. Ben Arous, G., Gheissari, R., Jagannath, A.: Algorithmic thresholds for tensor PCA. Ann. Probab. 48(4), 2052–2087 (2020)


  19. Ben Arous, G., Jagannath, A.: Spectral gap estimates in mean field spin glasses. Commun. Math. Phys. 361(1), 1–52 (2018)


  20. Ben Arous, G., Wein, A.S., Zadik, I.: Free energy wells and overlap gap property in sparse PCA. In: Conference on Learning Theory, pp. 479–482. PMLR (2020)

  21. Benaych-Georges, F., Nadakuditi, R.R.: The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227(1), 494–521 (2011)


  22. Berthet, Q., Rigollet, P.: Complexity theoretic lower bounds for sparse principal component detection. In: Conference on Learning Theory, pp. 1046–1066 (2013)

  23. Bhamidi, S., Dey, P.S., Nobel, A.B.: Energy landscape for large average submatrix detection problems in Gaussian random matrices. Probab. Theory Relat. Fields 168(3–4), 919–983 (2017)


  24. Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities. A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)


  25. Brennan, M., Bresler, G., Huleihel, W.: Reducibility and computational lower bounds for problems with planted sparse structure. In: Proceedings of the 31st Conference On Learning Theory. PMLR, vol. 75, pp. 48–166 (2018)

  26. Brennan, M., Bresler, G., Huleihel, W.: Universality of computational lower bounds for submatrix detection. In: Proceedings of the Thirty-Second Conference on Learning Theory. PMLR, vol. 99, pp. 417–468 (2019)

  27. Butucea, C., Ingster, Y.I.: Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19(5B), 2652–2688 (2013)


  28. Butucea, C., Ingster, Y.I., Suslina, I.A.: Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM Probab. Stat. 19, 115–134 (2015)


  29. Cai, T.T., Liang, T., Rakhlin, A.: Computational and statistical boundaries for submatrix localization in a large noisy matrix. Ann. Stat. 45(4), 1403–1430 (2017)


  30. Chandrasekaran, V., Jordan, M.I.: Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. 110(13), E1181–E1190 (2013)


  31. Chen, W.-K., Gamarnik, D., Panchenko, D., Rahman, M.: Suboptimality of local algorithms for a class of max-cut problems. Ann. Probab. 47(3), 1587–1618 (2019)


  32. Chen, Y., Xu, J.: Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. J. Mach. Learn. Res. 17(1), 882–938 (2016)


  33. Coja-Oghlan, A., Haqshenas, A., Hetterich, S.: Walksat stalls well below satisfiability. SIAM J. Discrete Math. 31(2), 1160–1173 (2017)


  34. Cover, T.M., Thomas, J.A.: Elements of Information Theory, vol. 68, pp. 69–73. Wiley, New York (1991)


  35. de Bruijn, N.G., Erdös, P.: Some linear and some quadratic recursion formulas. II. Nederl. Akad. Wetensch. Proc. Ser. A. Indagationes Math. 55, 152–163 (1952)


  36. Deshpande, Y., Montanari, A.: Information-theoretically optimal sparse PCA. In: 2014 IEEE International Symposium on Information Theory, pp. 2197–2201. IEEE (2014)

  37. Deshpande, Y., Montanari, A.: Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems. In: Conference on Learning Theory, pp. 523–562 (2015)

  38. Ding, Y., Kunisky, D., Wein, A.S., Bandeira, A.S.: Sparse high-dimensional linear regression: estimating squared error and a phase transition. Ann. Stat. (to appear)

  39. Feldman, V., Grigorescu, E., Reyzin, L., Vempala, S.S., Xiao, Y.: Statistical algorithms and a lower bound for detecting planted cliques. J. ACM (JACM) 64(2), 8 (2017)


  40. Gamarnik, D., Li, Q.: Finding a large submatrix of a Gaussian random matrix. Ann. Stat. 46(6A), 2511–2561 (2018)


  41. Gamarnik, D., Sudan, M.: Performance of sequential local algorithms for the random NAE-K-SAT problem. SIAM J. Comput. 46(2), 590–619 (2017)


  42. Gamarnik, D., Zadik, I.: High dimensional regression with binary coefficients: estimating squared error and a phase transition. In: Conference on Learning Theory, pp. 948–953 (2017)

  43. Gamarnik, D., Zadik, I.: Sparse high-dimensional linear regression. Algorithmic barriers and a local search algorithm. arXiv:1711.04952 (2017)

  44. Gamarnik, D., Zadik, I.: The landscape of the planted clique problem: dense subgraphs and the overlap gap property. arXiv:1904.07174 (2019)

  45. Gao, C., Ma, Z., Zhou, H.H.: Sparse CCA: adaptive estimation and computational barriers. Ann. Stat. 45(5), 2074–2101 (2017)


  46. Hopkins, S.B., Kothari, P., Potechin, A.H., Raghavendra, P., Schramm, T.: On the integrality gap of degree-4 sum of squares for planted clique. ACM Trans. Algorithms (TALG) 14(3), 28 (2018)


  47. Hopkins, S.B., Schramm, T., Shi, J., Steurer, D.: Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In: Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pp. 178–191. ACM (2016)

  48. Jagannath, A.: Approximate ultrametricity for random measures and applications to spin glasses. Commun. Pure Appl. Math. 70(4), 611–664 (2017)


  49. Jagannath, A., Ko, J., Sen, S.: Max \(\kappa \)-cut and the inhomogeneous Potts spin glass. Ann. Appl. Probab. 28(3), 1536–1572 (2018)


  50. Jagannath, A., Lopatto, P., Miolane, L.: Statistical thresholds for tensor PCA. Ann. Appl. Probab. 30(4), 1910–1933 (2020)


  51. Jagannath, A., Sen, S.: On the unbalanced cut problem and the generalized Sherrington-Kirkpatrick model. Ann. Inst. Henri Poincaré D 8(1), 35–88 (2020)

  52. Jagannath, A., Tobasco, I.: A dynamic programming approach to the Parisi functional. Proc. Am. Math. Soc. 144(7), 3135–3150 (2016)


  53. Jagannath, A., Tobasco, I.: Low temperature asymptotics of spherical mean field spin glasses. Commun. Math. Phys. 352(3), 979–1017 (2017)


  54. Jagannath, A., Tobasco, I.: Some properties of the phase diagram for mixed p-spin glasses. Probab. Theory Relat. Fields 167(3–4), 615–672 (2017)


  55. Kolar, M., Balakrishnan, S., Rinaldo, A., Singh, A.: Minimax localization of structural information in large noisy matrices. In: Advances in Neural Information Processing Systems, pp. 909–917 (2011)

  56. Krzakala, F., Xu, J., Zdeborová, L.: Mutual information in rank-one matrix estimation. In: 2016 IEEE Information Theory Workshop (ITW), pp. 71–75. IEEE (2016)

  57. Lelarge, M., Miolane, L.: Fundamental limits of symmetric low-rank matrix estimation. Probab. Theory Relat. Fields 173(3–4), 859–929 (2019)


  58. Lesieur, T., Krzakala, F., Zdeborová, L.: Phase transitions in sparse PCA. In: 2015 IEEE International Symposium on Information Theory (ISIT), pp. 1635–1639. IEEE (2015)

  59. Lesieur, T., Krzakala, F., Zdeborová, L.: Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications. J. Stat. Mech. Theory Exp. 2017(7), 073403 (2017)


  60. Ma, T., Wigderson, A.: Sum-of-squares lower bounds for sparse PCA. In: Advances in Neural Information Processing Systems, pp. 1612–1620 (2015)

  61. Ma, Z., Wu, Y.: Computational barriers in minimax submatrix detection. Ann. Stat. 43(3), 1089–1116 (2015)


  62. Meka, R., Potechin, A., Wigderson, A.: Sum-of-squares lower bounds for planted clique. In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 87–96. ACM (2015)

  63. Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, vol. 9. World Scientific Publishing Company (1987)

  64. Mézard, M., Mora, T., Zecchina, R.: Clustering of solutions in the random satisfiability problem. Phys. Rev. Lett. 94(19), 197205 (2005)


  65. Montanari, A.: Finding one community in a sparse graph. J. Stat. Phys. 161(2), 273–299 (2015)


  66. Montanari, A.: Optimization of the Sherrington–Kirkpatrick Hamiltonian. SIAM J. Comput. (2021). https://doi.org/10.1137/20M132016X

  67. Montanari, A., Reichman, D., Zeitouni, O.: On the limitation of spectral methods: from the Gaussian hidden clique problem to rank-one perturbations of Gaussian tensors. In: Advances in Neural Information Processing Systems, pp. 217–225 (2015)

  68. Moore, C.: The computer science and physics of community detection: landscapes, phase transitions, and hardness. arXiv:1702.00467 (2017)

  69. Panchenko, D.: The Parisi ultrametricity conjecture. Ann. Math. (2) 177(1), 383–393 (2013)


  70. Panchenko, D.: The Sherrington–Kirkpatrick model. Springer, Berlin (2013)


  71. Panchenko, D.: The Parisi formula for mixed \(p\)-spin models. Ann. Probab. 42(3), 946–958 (2014)


  72. Panchenko, D.: The free energy in a multi-species Sherrington–Kirkpatrick model. Ann. Probab. 43(6), 3494–3513 (2015)


  73. Panchenko, D.: Free energy in the mixed \( p \)-spin models with vector spins. Ann. Probab. 46(2), 865–896 (2018)


  74. Panchenko, D.: Free energy in the Potts spin glass. Ann. Probab. 46(2), 829–864 (2018)


  75. Rahman, M., Virag, B.: Local algorithms for independent sets are half-optimal. Ann. Probab. 45(3), 1543–1577 (2017)


  76. Richard, E., Montanari, A.: A statistical model for tensor PCA. In: Advances in Neural Information Processing Systems, pp. 2897–2905 (2014)

  77. Rossman, B.: Average-case complexity of detecting cliques. Ph.D. thesis, Massachusetts Institute of Technology (2010)

  78. Schramm, T., Wein, A.S.: Computational barriers to estimation from low-degree polynomials. arXiv:2008.02269 (2020)

  79. Shabalin, A.A., Weigman, V.J., Perou, C.M., Nobel, A.B.: Finding large average submatrices in high dimensional data. Ann. Appl. Stat. 3(3), 985–1012 (2009)


  80. Steele, J.M.: Probability Theory and Combinatorial Optimization, volume 69 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1997)


  81. Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes. Classics in Mathematics. Springer, Berlin (2006) (Reprint of the 1997 edition)

  82. Subag, E.: Following the ground-states of full-RSB spherical spin glasses. arXiv:1812.04588 (2018)

  83. Talagrand, M.: Mean Field Models for Spin Glasses. Volume II: Advanced Replica-Symmetry and Low Temperature, volume 55 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics. Springer, Heidelberg (2011)

  84. Wein, A.S., El Alaoui, A., Moore, C.: The Kikuchi hierarchy and tensor PCA. arXiv:1904.03858 (2019)

  85. Wu, Y., Xu, J.: Statistical problems with planted structures: information-theoretical and computational limits. In: Information-Theoretic Methods in Data Science, p. 383 (2021)


Acknowledgements

The authors thank an anonymous referee for pointing out a substantial improvement to Theorem 1.2, as well as for several constructive comments that have improved the exposition of this paper. SS thanks Yash Deshpande for introducing him to the problem. DG gratefully acknowledges the support of ONR Grant N00014-17-1-2790. AJ gratefully acknowledges NSERC [RGPIN-2020-04597, DGECR-2020-00199] and the partial support of NSF Grant NSF OISE-1604232. This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information


Corresponding author

Correspondence to Aukosh Jagannath.


Appendices

Ruelle probability cascades

For the convenience of the reader, we briefly review here basic properties of Ruelle probability cascades (RPCs), sometimes called Derrida–Ruelle probability cascades, used throughout this paper.

1.1 Construction and basic properties

Let us begin by recalling the construction of RPCs and some basic properties. See, e.g., [70] or [48, Sec. 3.3].

Fix \(r\ge 1\) and let \(\mathcal {A}_{r}\) be as in Sect. 6.1. We label the vertices of this tree as \(\mathcal {A}_{r}=\mathbb {N}^{0}\cup \mathbb {N}^{1}\cup \cdots \cup \mathbb {N}^{r}\), where a vertex at depth \(k\) has label \(\alpha =(\alpha ^{1},\ldots ,\alpha ^{k})\), which corresponds to the root-to-vertex path \(\emptyset \rightarrow \alpha ^{1}\rightarrow (\alpha ^{1},\alpha ^{2})\rightarrow \cdots \rightarrow (\alpha ^{1},\ldots ,\alpha ^{k})\). As above, we denote this path by \(p(\alpha )\). Denote the depth of a vertex \(\alpha \) by \(|\alpha |\) and let \(\partial \mathcal {A}_{r}\) denote the leaves of \(\mathcal {A}_{r}\).

For \(r\ge 1\) and a fixed sequence \(0=\mu _{-1}<\mu _{0}<\cdots <\mu _{r}=1\), we construct the corresponding RPC as follows. Let \(m_{\theta }(dx)=\theta x^{-\theta -1}dx\). To each non-leaf vertex \(\alpha \in \mathcal {A}_{r}\backslash \partial \mathcal {A}_{r}\), we assign an independent copy of the Poisson point process \(\mathrm{PPP}(m_{\mu _{|\alpha |}}(dx))\), with atoms arranged in decreasing order, and we assign to each child of \(\alpha \) the atom of the corresponding rank. This yields a collection \((u_{\alpha })_{\alpha \in \mathcal {A}_{r}}\) of random variables. Let \(w_{\alpha }=\prod _{\gamma \in p(\alpha )}u_{\gamma }\), and finally consider the normalized collection \((v_{\alpha })_{\alpha \in \mathcal {A}_{r}}\) given by

$$\begin{aligned} v_{\alpha }=\frac{w_{\alpha }}{\sum _{|\beta |=|\alpha |}w_{\beta }}. \end{aligned}$$

The Ruelle probability cascade with parameters \((\mu _{k})_{k=-1}^{r}\) is the stochastic process \((v_{\alpha })_{\alpha \in \partial \mathcal {A}_{r}}\).
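A concrete special case may help orient the reader. When \(r=1\) the construction involves a single Poisson point process, and the normalized weights recover the Poisson–Dirichlet distribution familiar from the REM and the Sherrington–Kirkpatrick model; the sketch below records this standard reduction.

```latex
% Special case r = 1, with parameters 0 = \mu_{-1} < \mu_0 < \mu_1 = 1.
% The children of the root carry the atoms (u_n)_{n \ge 1} of
% \mathrm{PPP}(\mu_0 x^{-1-\mu_0}\,dx), arranged in decreasing order, and
\begin{aligned}
  v_{n} = \frac{u_{n}}{\sum_{m \ge 1} u_{m}}, \qquad n \ge 1,
\end{aligned}
% so that (v_n)_{n \ge 1} is distributed as the Poisson--Dirichlet
% distribution PD(\mu_0, 0).
```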

It will also be helpful to note the following. Let \(\mu \in \mathscr {M}_1([0,\rho ])\) have finite support and consider the overlap distribution \(\mathcal {R}(\mu )\) defined as in (6.21). The next lemma is an elementary consequence of the definition; for a proof see, e.g., [70, Eq. 2.82].

Lemma A.1

Let \(\mu \in \mathscr {M}_1([0,\rho ])\) be of finite support and consider \(\pi \) as defined in (6.21). Then \(\mathbb {E}\pi (R_{12} = q) = \mu (\{q\})\).

1.2 Calculating expectations and Parisi PDEs

Let us now recall the following well-known result connecting Ruelle probability cascades to Parisi-type PDEs. (Recall again that we may view \(\mathbb {N}^{r}=\partial \mathcal {A}_{r}\).) Results of this type appear in different notations throughout the spin glass literature and are sometimes referred to as consequences of the Bolthausen–Sznitman invariance of RPCs. The following results are taken from [7].

Theorem A.2

(Theorem 6 from [7]) Fix \(r\ge 1,T>0\), and sequences

$$\begin{aligned} 0&=q_{0}<q_{1}<\ldots<q_{r}=T\\ 0&=\mu _{-1}<\mu _{0}<\ldots <\mu _{r}=1. \end{aligned}$$

Let \(\psi \in C^{1}([0,T])\) be non-negative increasing and let \((g_{\psi }(\alpha ))_{\alpha \in \mathbb {N}^{r}}\) denote the centered Gaussian process with covariance

$$\begin{aligned} \mathbb {E}g_{\psi }(\alpha )g_{\psi }(\beta )=\psi (q_{|\alpha \wedge \beta |}). \end{aligned}$$

Finally, let \((v_{\alpha })_{\alpha \in \mathbb {N}^{r}}\) be a Ruelle probability cascade with parameters \((\mu _{k})\). Then we have the following.

  (1)

    For any smooth f of at most linear growth we have that

    $$\begin{aligned} \mathbb {E}\log \sum _{\alpha \in \mathbb {N}^{r}}v_{\alpha }\exp [f(g_{\psi }(\alpha ))]=\phi _{\mu }(0,0), \end{aligned}$$

    where \(\phi \) is the unique solution to

    $$\begin{aligned} \partial _{t}\phi +\frac{\psi '}{2}\left( \Delta \phi +\mu [0,t](\partial _{x}\phi )^{2}\right)&=0\\ \phi (T,x)&=f(x), \end{aligned}$$

    and \(\mu \in \mathscr {M}_{1}([0,T])\) is given by \(\mu (\{q_{k}\})=\mu _{k}-\mu _{k-1}\).

  (2)

    If \((g_{\psi }^{i})_{i=1}^{M}\) are iid copies of \(g_{\psi }\) and \((f_{i})\) are of at most linear growth, then

    $$\begin{aligned} \mathbb {E}\log \sum _{\alpha }v_{\alpha }\exp \Big (\sum _{i}f_{i}(g_{\psi }^{i}(\alpha )) \Big )=\sum _{i}\mathbb {E}\log \sum _{\alpha }v_{\alpha }\exp f_{i}(g_{\psi }(\alpha )). \end{aligned}$$

We note here the following corollary, which has appeared more or less verbatim in many papers and follows by applying item (1) above with \(f(x)=x\) and the Cole–Hopf transformation.

Corollary A.3

(Proposition 7 from [7]) We have that

$$\begin{aligned} \mathbb {E}\log \sum _{\alpha \in \mathbb {N}^{r}}v_{\alpha }\exp [g_{\psi }(\alpha )]=\frac{1}{2}\int _{0}^{T}\psi '(s)\mu ([0,s])ds. \end{aligned}$$

The simplicity of the formula in this case follows from noting that the heat equation with initial data \(e^x\) is exactly solvable.
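Alternatively, one can verify the formula without the Cole–Hopf step: with terminal data \(f(x)=x\), a linear-in-space ansatz solves the PDE of Theorem A.2 exactly. We record this short check for the reader's convenience.

```latex
% Ansatz \phi(t,x) = x + c(t): then \partial_x\phi \equiv 1 and
% \Delta\phi \equiv 0, so the PDE of Theorem A.2 reduces to the ODE
%   c'(t) + \tfrac{1}{2}\psi'(t)\,\mu([0,t]) = 0, \qquad c(T) = 0,
% whence
\begin{aligned}
  \phi(t,x) = x + \frac{1}{2}\int_{t}^{T}\psi'(s)\,\mu([0,s])\,ds,
  \qquad
  \phi(0,0) = \frac{1}{2}\int_{0}^{T}\psi'(s)\,\mu([0,s])\,ds,
\end{aligned}
% which is exactly the right-hand side of Corollary A.3.
```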

Strict convexity

To prove strict convexity of \(\mathcal {P}\), let us introduce the following notation. For the sake of clarity, we make the dependence of \(u_{\nu }^{i}\) on \(\Lambda _{i}\) explicit by writing \(u_{\nu ,\Lambda _{i}}(t,x)=u_{\nu }^{i}(t,x)\). Furthermore, as (2.1) is invariant under a spatial translation, we see that

$$\begin{aligned} u_{\nu ,\Lambda }(t,x)=u_{\nu _{0},0}(t,x+\Lambda +\nu (\{\rho \})), \end{aligned}$$

where \(\nu _{0}=\nu -\nu (\{\rho \})\delta _{\rho }\). It will also be helpful to recall the dynamic programming principle for \(u_{\nu ,\Lambda _i}(t,x)\) from [51, Lemma 3.5].

Lemma

For any \((\nu ,\lambda )\in \mathcal {A}\times \mathbb {R}\) of the form \(d\nu =mdt+c \delta _{\rho }\), we have that for any \(t<t'\le \rho \),

$$\begin{aligned} u_{\nu ,\lambda }(t,x)=\sup _{\alpha \in \mathcal {B}_{t'}}\mathbb {E}\Big [ u_{\nu ,\lambda }(t',X_{t'}^{\alpha })-\int _{t}^{t'}m(s)\alpha _{s}^{2}ds \Big ], \end{aligned}$$

where \(X^{\alpha }\) solves

$$\begin{aligned} {\left\{ \begin{array}{ll} dX_{s}=2m(s)\alpha _{s}ds+\sqrt{2}dW_{s}\\ X_{t}=x, \end{array}\right. } \end{aligned}$$

\(W_{s}\) is a standard Brownian motion and \(\mathcal {B}_{t'}\) is the space of bounded stochastic processes that are progressively measurable with respect to the filtration of \((W_{s})_{s\le t'}\). Furthermore, any optimal control \(\alpha _{s}^{*}\) satisfies

$$\begin{aligned} m(s)\alpha _{s}^{*}=m(s)\partial _{x}u_{\nu ,\lambda }(s,X_{s})\quad a.s. \end{aligned}$$
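Although Eq. (2.1) is not reproduced here, the optimality condition above can be read off formally from the Hamilton–Jacobi–Bellman equation associated with this control problem; assuming the drift \(2m(s)\alpha _{s}\), linear in the control, the following heuristic computation recovers \(\alpha ^{*}=\partial _{x}u\).

```latex
% Formal HJB equation for the value function u: the generator of
% dX_s = 2 m(s)\alpha_s\,ds + \sqrt{2}\,dW_s contributes
% 2 m(s)\alpha\,\partial_x u + \partial_{xx} u, so
\begin{aligned}
  \partial_t u + \partial_{xx} u
  + \sup_{\alpha}\big( 2m(s)\,\alpha\,\partial_x u - m(s)\,\alpha^2 \big) = 0,
\end{aligned}
% and the (concave) supremum is attained at \alpha^* = \partial_x u,
% with optimal value m(s)\,(\partial_x u)^2.
```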

The proof will begin with the following observation.

Lemma B.1

For any \(\nu \in \mathcal {A}_{0}\), and any \(t\in [0,\rho ),\) \(u_{\nu ,0}(t,x)\) is strictly convex in x.

Proof

Fix \(x_{0},x_{1}\in \mathbb {R}\) distinct, \(\theta \in (0,1)\), and let \(x_{\theta }=\theta x_{0}+(1-\theta )x_{0}\). Let \((X_{s}^{\theta })\) denote the optimal trajectory corresponding to \(u_{\nu ,0}(t,x_{\theta })\), and similarly let \(\alpha _{s}^{\theta }=\partial _{x}u_{\nu ,0}(s,X_{s}^{\theta })\) denote a corresponding optimal control.

Observe that if we let \(G_s=\int _{0}^{s}\sqrt{2}dW_{s_1}+\int _{0}^{s}2m(s_1)\alpha _{s_1}^{\theta }ds_1\), then \(X_{s}^{\theta }=G_s+x_{\theta }\). We first claim that the law of \(G_\rho \) charges every interval \((a,b)\subseteq \mathbb {R}\). As it is possible that \(\int _0^\rho m^2(s) ds=\infty \), Novikov’s condition does not apply, so we cannot apply Girsanov’s theorem directly to \(G_\rho \). We circumvent this by a localization argument as follows.

Fix \(0<s<\rho \). Since \(\sup _{s}|\alpha _{s}^{\theta }|\le 1\) by (2.8), the drift \(b_t = 2m(t)\alpha _t^\theta \) satisfies \(\sup _{0\le t \le s} | b_{t}| \le C(s)\) for some non-random \(C(s)>0\). By Girsanov’s theorem [81, Lemma 6.4.1], there is a tilt of the law of \(G_s\) under which \(G_s\) is Gaussian; as the tilted measure is equivalent to the original one, \(\mathbb {P}[G_{s} \in \mathcal {I}] >0\) for any interval \(\mathcal {I}\). Now, fix an interval \(\mathcal {I}= (a,b)\).

Note that, for any \(s'\in (s,\rho )\),

$$\begin{aligned} G_\rho = G_{s'} + \int _{s'}^{\rho } \sqrt{2} dW_{s_1} + \int _{s'}^{\rho } 2m \alpha ^{\theta }_{s_1} ds_1, \end{aligned}$$

and thus

$$\begin{aligned} |G_\rho - G_{s'}| \le 2 \int _{s'}^{\rho } m ds_1 + \Big | \int _{s'}^{\rho } \sqrt{2} dW_{s_1} \Big |. \end{aligned}$$

Further, \(\int _{s'}^{\rho } \sqrt{2} dW_{s_1} \sim \mathcal {N}(0, 2(\rho - s'))\). Fix \(C'>0\), and let \((c,d) \subset \mathcal {I}\) such that

$$\begin{aligned} \min \{c-a, b-d\} > 2 \int _{s'}^{\rho } m ds_1 + C' \sqrt{2(\rho - s')}. \end{aligned}$$

Such an interval always exists once \(\rho -s'\) is sufficiently small. This implies

$$\begin{aligned} \mathbb {P}[G_{\rho } \in \mathcal {I}] \ge \mathbb {P} \Big [G_{s'} \in (c,d), |G_{\rho }-G_{s'}| \le 2 \int _{s'}^{\rho } m ds_1 + C' \sqrt{2(\rho - s')} \Big ]>0. \end{aligned}$$

This, in turn, establishes that for any interval \(\mathcal {I}\), \(\mathbb {P}[G_{\rho } \in \mathcal {I}] >0\).

Let \(Y=G_{\rho }+x_{1}\) and \(Z=G_{\rho }+x_{0}\). Since \(X_{\rho }^{\theta }=G_{\rho }+x_{\theta }=\theta Z+(1-\theta )Y\), we have that

$$\begin{aligned} u_{\nu ,0}(t,x_{\theta })&=\mathbb {E}\left[ \left( X_{\rho }^{\theta }\right) _{+}-\int _{t}^{\rho }m(s)\left( \alpha _{s}^{\theta }\right) ^{2}ds\right] \\&<\theta \mathbb {E}\left[ Z_{+}-\int _{t}^{\rho }m(s)\left( \alpha _{s}^{\theta }\right) ^{2}ds\right] +(1-\theta )\mathbb {E}\left[ Y_{+}-\int _{t}^{\rho }m(s)\left( \alpha _{s}^{\theta }\right) ^{2}ds\right] \\&\le \theta u_{\nu ,0}(t,x_{0})+(1-\theta )u_{\nu ,0}(t,x_{1}), \end{aligned}$$

where in the second line we use that if \(a<0<b\) then \(\left( \theta a+(1-\theta )b\right) _{+}<\theta a_{+}+(1-\theta )b_{+}\) and that

$$\begin{aligned} \mathbb {P}(YZ<0)= \mathbb {P}((x_{0}+G_{\rho })(x_{1}+G_{\rho })<0)>0. \end{aligned}$$

\(\square \)

Lemma B.2

The functional \(\mathcal {P}\) is strictly convex on \(\mathcal {A}_{0}\times \mathbb {R}\).

Remark B.3

It will be easy to see from the proof that it is also convex on \(\mathcal {A}\times \mathbb {R}\). Strict convexity, however, fails on this larger domain due to the invariance of the functional under the map \((mdt+c\delta _\rho ,\Lambda _1,\Lambda _2)\mapsto (mdt,\Lambda _1+2c,\Lambda _2+2c)\).

Proof

This proof follows the approach of [52, Theorem 20]. Fix \((\nu _{1},\lambda _{1}),(\nu _{2},\lambda _{2})\in \mathcal {A}_{0}\times \mathbb {R}\) and \(\theta \in (0,1)\), let

$$\begin{aligned} (\nu _{\theta },\lambda _{\theta })=\theta (\nu _{1},\lambda _{1})+(1-\theta )(\nu _{2},\lambda _{2}), \end{aligned}$$

and write \(\nu _{1}=m_{1}dt\), \(\nu _{2}=m_{2}dt\), \(\nu _{\theta }=m_{\theta }dt\) with \(m_{\theta }=\theta m_{1}+(1-\theta )m_{2}\).

Let \(X_{s}^{\theta }\) denote the optimal trajectory for the stochastic control problem for \(u_{\nu _{\theta },\lambda _{\theta }}\) and let \(\alpha _{s}^{\theta }=\partial _{x}u_{\nu _{\theta },\lambda _{\theta }}(s,X_{s}^{\theta })\) denote the corresponding control. Finally, let \(Y^{\theta },Z^{\theta }\) solve the SDEs

$$\begin{aligned} dY^{\theta }&=2m_{1}(s)\alpha _{s}^{\theta }ds+\sqrt{2}dW_{s}\\ dZ^{\theta }&=2m_{2}(s)\alpha _{s}^{\theta }ds+\sqrt{2}dW_{s} \end{aligned}$$

with \(Y_{0}=Z_{0}=0.\)

Now, fix some \(0<t<\rho \). Then, by the dynamic programming principle,

$$\begin{aligned} u_{\nu _{\theta },\lambda _{\theta }}(0,0)&=\mathbb {E}\Big [ u_{\nu _{\theta },\lambda _{\theta }}(t,X_{t}^{\theta })-\int _{0}^{t}m_{\theta }(s)\left( \alpha _{s}^{\theta }\right) ^{2}ds \Big ]. \end{aligned}$$

Since Eq. (2.1) is invariant under translations of space, we see that \(u_{\nu ,\lambda }(t,x)=u_{\nu ,0}(t,x+\lambda )\) for any \((\nu ,\lambda )\). Thus we may rewrite the above as

$$\begin{aligned} u_{\nu _{\theta },\lambda _{\theta }}(0,0)&=\mathbb {E}u_{\nu _{\theta },0}(t,X_{t}^{\theta }+\lambda _{\theta })-\int _{0}^{t}m_{\theta }(s)\left( \alpha _{s}^{\theta }\right) ^{2}ds\\&\le \theta \mathbb {E}\Big (u_{\nu _{\theta },0}(t,Y_{t}+\lambda _{1})-\int _{0}^{t}m_{1}(s)(\alpha _{s}^{\theta })^{2}ds\Big )\\&\quad +(1-\theta )\mathbb {E}\Big (u_{\nu _{\theta },0}(t,Z_{t}+\lambda _{2})-\int _{0}^{t}m_{2}(s)(\alpha _{s}^{\theta })^{2}ds\Big )\\&\le \theta u_{\nu _{1},\lambda _{1}}(0,0)+(1-\theta )u_{\nu _{2},\lambda _{2}}(0,0), \end{aligned}$$

where in the first inequality we used the convexity of \(x\mapsto u_{\nu _{\theta },0}(t,x)\). Note that in fact the first inequality is strict, provided

$$\begin{aligned} \mathbb {P}(\left( Y_{t}+\lambda _{1}\right) \ne (Z_{t}+\lambda _{2}))>0. \end{aligned}$$

In particular, it suffices to show that \(Var(Y_{t}-Z_{t})+|\lambda _{1}-\lambda _{2}|>0\). Thus if \(\lambda _{1}\ne \lambda _{2}\) we are done. If they are equal, then we know that \(m_{1}\ne m_{2}\). In this case, by right continuity and monotonicity, there must be some \(s<\tau <\rho \) such that \(m_{1}(t')\ne m_{2}(t')\) for \(t'\in [s,\tau ]\) (that we can take \(\tau <\rho \) follows from the fact that if \(\nu _{1}\ne \nu _{2}\), then \(m_{1}\) and \(m_{2}\) must differ on a set of positive Lebesgue measure). In particular, we choose \(t=\tau \) from now on.

Note that by Ito’s lemma, our choice of \(\alpha _{s}^{\theta }\) is a martingale, with

$$\begin{aligned} \alpha _{s}^{\theta }-\alpha _{0}^{\theta }=\int _0^{s}\sqrt{2}\partial _{xx}u_{\nu ,0}(s_1,X_{s_1}^{\theta })dW_{s_1}. \end{aligned}$$

Thus, by Ito’s isometry, if we let \(\Delta _{s}=2(m_{1}(s)-m_{2}(s))\), then

$$\begin{aligned} Var(Y_{t}-Z_{t})&=\mathbb {E}\Big (\int _{0}^{t}\Delta _{s}(\alpha _{s}^{\theta }-\alpha _{0}^{\theta }) ds\Big )^{2} =\int \int _{[0,t]^{2}}\Delta _{s_1}\Delta _{s_2}K(s_1,s_2)ds_1ds_2, \end{aligned}$$

where

$$\begin{aligned} K(s,t) =&\mathbb {E}(\alpha _{s}-\alpha _{0})(\alpha _{t}-\alpha _{0}) =\int _{0}^{t\wedge s}2\mathbb {E}\left( \partial _{xx}u_{\nu ,0}(s_1,X_{s_1}^{\theta })\right) ^{2}ds_1\\ =&p(t\wedge s)=p(t)\wedge p(s), \end{aligned}$$

where \(p(t)=\int _{0}^{t}2\mathbb {E}\left( \partial _{xx}u_{\nu ,0}(s,X_{s}^{\theta })\right) ^{2}ds\). Notice that since \(t<\rho \), we have that \(\Delta \in L^{2}([0,t])\). Thus, to show positivity of the variance, it suffices to show that \(K\) is positive definite.

Since \(u\) is \(C^{2}([0,t+\epsilon ]\times \mathbb {R})\) for some \(\epsilon >0\) small enough, and \(u\) is strictly convex, we have that \(\partial _{xx}u(t,x)>0\) for Lebesgue-a.e. \(x\). Thus \(p(t)\) is strictly increasing, so the kernel \(K\) corresponds to a monotone time change of a Brownian motion and is therefore positive definite. \(\square \)
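For completeness, we record the standard computation behind this last step. A kernel of the form \(K(s,t)=p(s)\wedge p(t)\) with \(p\) nondecreasing is the covariance of the time-changed Brownian motion \(B_{p(t)}\), so that

```latex
% For B a standard Brownian motion, \mathbb{E}[B_a B_b] = a \wedge b, hence
\begin{aligned}
  \iint_{[0,t]^{2}} \Delta_{s_1}\Delta_{s_2}\,
    \big(p(s_1)\wedge p(s_2)\big)\,ds_1\,ds_2
  = \mathbb{E}\Big( \int_{0}^{t} \Delta_{s}\, B_{p(s)}\,ds \Big)^{2} \ge 0,
\end{aligned}
% with equality only if \int_0^t \Delta_s B_{p(s)}\,ds = 0 a.s., which is
% ruled out when p is strictly increasing and \Delta \ne 0 in L^2([0,t]).
```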


About this article


Cite this article

Gamarnik, D., Jagannath, A. & Sen, S. The overlap gap property in principal submatrix recovery. Probab. Theory Relat. Fields 181, 757–814 (2021). https://doi.org/10.1007/s00440-021-01089-7


Keywords

  • Submatrix recovery
  • Overlap gap property
  • Spin glasses

Mathematics Subject Classification

  • Primary 68Q87
  • 60C05
  • Secondary 82B44
  • 68Q25
  • 62H25