Abstract
In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or “free energy”) in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward compared to the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. We then generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new light on the reasons for the validity of replica formulas in Bayesian inference.
Notes
Since the first version of this manuscript, the method has been successfully applied to many other problems including non-symmetric matrix and tensor factorization [36], generalized linear models and learning [37], models of deep neural networks [38, 39], random linear estimation with structured matrices [40] and even problems defined by sparse graphical models such as the censored block model [41].
In the present formulation one can also interpret the succession of Gaussian mean-fields in each step as a Wiener process. For this reason we initially called this new approach “the stochastic interpolation method”. The interpretation in terms of a Wiener process is in fact not really needed, and here we choose a more pedestrian path, but we believe this is an aspect of the method that may be of further interest (especially for diluted systems) and we briefly discuss it in “Appendix E”.
We abusively use the notation \(dxP_0(x)\) even though \(P_0\) is not necessarily absolutely continuous.
For all other models considered in this paper we directly write the explicit expression of the free energy, but the derivation is always similar.
This identity has been abusively called “Nishimori identity” in the statistical physics literature. One should however note that it is a simple consequence of Bayes formula (see e.g. Appendix B of [18]). The “true” Nishimori identity [52] concerns models with one extra feature, namely a gauge symmetry which allows one to eliminate the input signal, so that the expectation over \({\mathbf {S}}\) in (47) can be dropped (see e.g. [20]).
Here we use Lemma 2, but a weaker form of concentration is enough for this argument: namely, it suffices to control the following type of “thermal” fluctuation, \({\mathbb {E}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^2\rangle _{k,t,\epsilon } - \langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{k,t,\epsilon }^2]\). Moreover it is not necessary to allow for an \(\epsilon \)-dependence in the \(m_k\)’s.
References
Talagrand, M.: Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models, vol. 46. Springer, Berlin (2003)
Talagrand, M.: Mean Field Models for Spin Glasses. Volume I: Basic Examples. Springer, Berlin (2011)
Talagrand, M.: Mean Field Models for Spin Glasses. Volume II: Advanced Replica-Symmetry and Low Temperature. Springer, Berlin (2011)
Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer Monographs in Mathematics. Springer, Berlin (2013)
Mézard, M., Parisi, G., Virasoro, M.-A.: Spin Glass Theory and Beyond. World Scientific Publishing Co. Inc, Singapore (1990)
Guerra, F.: Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys. 233, 1–12 (2003)
Guerra, F., Toninelli, F.: Quadratic replica coupling in the Sherrington–Kirkpatrick mean field spin glass model. J. Math. Phys. 43, 3704–3716 (2002)
Talagrand, M.: The Parisi formula. Ann. Math. 163, 221–263 (2006)
Parisi, G.: A sequence of approximate solutions to the S-K model for spin glasses. J. Phys. A 13, L115 (1980)
Sherrington, D., Kirkpatrick, S.: Solvable model of a spin glass. Phys. Rev. Lett. 35(26), 1792–1796 (1975)
Montanari, A.: Tight bounds for LDPC and LDGM codes under MAP decoding. IEEE Trans. Inf. Theory 51, 3221–3246 (2005)
Macris, N.: Griffith–Kelly–Sherman correlation inequalities: a useful tool in the theory of error correcting codes. IEEE Trans. Inf. Theory 53(2), 664–683 (2007)
Macris, N.: Sharp bounds on generalized exit functions. IEEE Trans. Inf. Theory 53, 2365–2375 (2007)
Kudekar, S., Macris, N.: Sharp bounds for optimal decoding of low-density parity-check codes. IEEE Trans. Inf. Theory 55(10), 4635–4650 (2009)
Korada, S.B., Macris, N.: On the capacity of a code division multiple access system. In: Proceedings of Allerton Conference on Communication, Control, and Computing, Monticello, IL, pp. 959–966 (Sept 2007)
Korada, S.B., Macris, N.: Tight bounds on the capacity of binary input random CDMA systems. IEEE Trans. Inf. Theory 56(11), 5590–5613 (2010)
Barbier, J., Dia, M., Macris, N., Krzakala, F.: The mutual information in random linear estimation. In: The 54th Annual Allerton Conference on Communication, Control, and Computing (Sept 2016)
Barbier, J., Macris, N., Dia, M., Krzakala, F.: Mutual information and optimality of approximate message-passing in random linear estimation. arXiv preprint arXiv:1701.05823
Barbier, J., Macris, N.: I-MMSE relations in random linear estimation and a sub-extensive interpolation method. arXiv:1704.04158 (April 2017)
Korada, S.B., Macris, N.: Exact solution of the gauge symmetric p-spin glass model on a complete graph. J. Stat. Phys. 136(2), 205–230 (2009)
Krzakala, F., Xu, J., Zdeborová, L.: Mutual information in rank-one matrix estimation. In: 2016 IEEE Information Theory Workshop (ITW), pp. 71–75 (Sept 2016)
Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T., Zdeborová, L.: Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. Adv. Neural Inf. Process. Syst. (NIPS) 29, 424–432 (2016)
Franz, S., Leone, M.: Replica bounds for optimization problems and diluted spin systems. J. Stat. Phys. 111, 535–564 (2003)
Franz, S., Leone, M., Toninelli, F.: Replica bounds for diluted non-Poissonian spin systems. J. Phys. A Math. Gen. 36, 535–564 (2003)
Panchenko, D., Talagrand, M.: Bounds for diluted mean-field spin glass models. Probab. Theory Relat. Fields 130(8), 319–336 (2004)
Hassani, H., Macris, N., Urbanke, R.: Threshold saturation in spatially coupled constraint satisfaction problems. J. Stat. Phys. 150, 807–850 (2013)
Mézard, M., Montanari, A.: Information, Physics and Computation. Oxford Press, Oxford (2009)
Giurgiu, A., Macris, N., Urbanke, R.: Spatial coupling as a proof technique and three applications. IEEE Trans. Inf. Theory 62(10), 5281–5295 (2016)
Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. In: 2016 IEEE International Symposium on Information Theory (ISIT) (July 2016)
Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. arXiv:1607.02524 (2016)
Lesieur, T., Miolane, L., Lelarge, M., Krzakala, F., Zdeborová, L.: Statistical and computational phase transitions in spiked tensor estimation. In: 2017 IEEE International Symposium on Information Theory, ISIT 2017, Aachen, Germany, June 25–30, 2017, pp. 511–515 (2017)
Lelarge, M., Miolane, L.: Fundamental limits of symmetric low-rank matrix estimation. Probab. Theory Relat. Fields (2018). https://doi.org/10.1007/s00440-018-0845-x
Miolane, L.: Fundamental limits of low-rank matrix estimation: the non-symmetric case. ArXiv e-prints (Feb 2017)
Coja-Oghlan, A., Krzakala, F., Perkins, W., Zdeborová, L.: Information-theoretic thresholds from the cavity method. arXiv:1611.00814v3 (2016)
Aizenman, M., Sims, R., Starr, S.L.: Extended variational principle for the Sherrington–Kirkpatrick spin-glass model. Phys. Rev. B 68, 214403 (2003)
Barbier, J., Macris, N., Miolane, L.: The layered structure of tensor estimation and its mutual information. In: 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (Sept 2017)
Barbier, J., Krzakala, F., Macris, N., Miolane, L., Zdeborová, L.: Phase transitions, optimal errors and optimality of message-passing in generalized linear models. arXiv preprint arXiv:1708.03395 (2017)
Gabrié, M., Manoel, A., Luneau, C., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: Entropy and mutual information in models of deep neural networks. In: Advances in Neural Information Processing Systems (NIPS), Montréal, CA (2018)
Aubin, B., Maillard, A., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: The committee machine: Computational to statistical gaps in learning a two-layers neural network. In: Advances in Neural Information Processing Systems (NIPS), Montréal, CA (2018)
Barbier, J., Macris, N., Maillard, A., Krzakala, F.: The mutual information in random linear estimation beyond i.i.d. matrices. In: IEEE International Symposium on Information Theory (ISIT) (2018)
Barbier, J., Chan, C.-L., Macris, N.: Adaptive path interpolation for sparse systems: application to a simple censored block model. In: IEEE International Symposium on Information Theory (ISIT) (2018)
Pastur, L., Shcherbina, M.: The absence of the selfaverageness of the order parameter in the Sherrington–Kirkpatrick model. J. Stat. Phys. 62(1/2), 1–19 (1991)
Pastur, L., Shcherbina, M., Tirozzi, B.: The replica symmetric solution without replica trick for the Hopfield model. J. Stat. Phys. 74, 1161–1183 (1994)
Shcherbina, M.: On the replica symmetric solution for the Sherrington-Kirkpatrick model. Helvetica Physica Acta 70, 838–853 (1997)
Nishimori, H.: Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford (2001)
Iba, Y.: The Nishimori line and Bayesian statistics. J. Phys. A Math. General 32(21), 3875 (1999)
Mezard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009)
Lesieur, T., Krzakala, F., Zdeborová, L.: MMSE of probabilistic low-rank matrix estimation: universality with respect to the output channel. In: Annual Allerton Conference (2015)
Deshpande, Y., Abbe, E., Montanari, A.: Asymptotic mutual information for the binary stochastic block model. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 185–189 (July 2016)
Guerra, F., Toninelli, F.L.: The thermodynamic limit in mean field spin glass models. Commun. Math. Phys. 230(1), 71–79 (2002)
Giurgiu, A., Macris, N., Urbanke, R.: How to prove the Maxwell conjecture via spatial coupling: a proof of concept. In: 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 458–462 (July 2012)
Nishimori, H.: Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford (2001)
Lesieur, T., Krzakala, F., Zdeborová, L.: Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications. J. Stat. Mech. Theory Exp. 2017(7), 073403 (2017)
Guerra, F., Toninelli, F.: The infinite volume limit in generalised mean field disordered models. Markov Proc. Rel. Fields 9(2), 195–207 (2003)
McDiarmid, C.: On the method of bounded differences. Surv. Comb. 141(1), 148–188 (1989)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)
Guo, D., Wu, Y., Shitz, S.S., Verdú, S.: Estimation in Gaussian noise: properties of the minimum mean-square error. IEEE Trans. Inf. Theory 57(4), 2371–2385 (2011)
Acknowledgements
Jean Barbier acknowledges funding by the Swiss National Science Foundation Grant No. 200021-156672. We thank Thibault Lesieur for providing us with the expression of the RS potential for tensor estimation. We also acknowledge helpful discussions with Olivier Lévêque and Léo Miolane on the stochastic calculus interpretation and continuous version of “Appendix E”.
Appendices
Linking the perturbed and plain free energies
The purpose of this appendix is to prove Lemma 1. We first note that differentiating the function \(\epsilon \mapsto f_{k=1,t=0;\epsilon }\) defined in (24) yields
By a Gaussian integration by parts the last term becomes
By an application of the identity (47) we have \({\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon } S_i] = {\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2]\). Therefore we find
Now by convexity and (47) we have \({\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2] \le {\mathbb {E}}[\langle X_i^2\rangle _{1, 0;\epsilon }] = {\mathbb {E}}[S^2]\). Therefore
and the first inequality of the Lemma follows from an application of the mean value theorem.
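Combining the two bounds above, the \(\epsilon \)-derivative of the free energy is controlled uniformly in n. In hedged form (the explicit constant C depends on the normalization conventions of (24) and is not reproduced here),
$$\big|\partial_\epsilon f_{k=1,t=0;\epsilon}\big| \le C\,{\mathbb {E}}_{P_0}[S^2] \qquad\Longrightarrow\qquad \big|f_{k=1,t=0;\epsilon} - f_{k=1,t=0;0}\big| \le C\,{\mathbb {E}}_{P_0}[S^2]\,\epsilon ,$$
the implication being precisely the mean value theorem step mentioned above.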
The second inequality follows from the Lipschitz continuity of the free energy \(f_{k=K,t=1;\epsilon }\) of the decoupled scalar system. We refer to [57] for the proof of this standard fact.
Proof of Lemma 3
The proof of this lemma uses another interpolation:
where \({\mathbf {X}},{\mathbf {X}}', {\mathbf {X}}''\), etc., are i.i.d. replicas distributed according to (22). Computations similar to those in Sect. 2.7 lead to
where we define
Finally from (182) and Cauchy–Schwarz, one obtains
The last equality is true as long as the prior \(P_0\) has bounded first four moments. We prove this claim now. Let us start by studying \({\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2 \rangle _{k,s;\epsilon }]\). Using Cauchy–Schwarz for the inequality and (47) for the subsequent equality,
where the last equality is valid for \(P_0\) with bounded second and fourth moments. For \({\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }]\) we proceed similarly, decoupling the expectations using Cauchy–Schwarz and then using (47) so that only terms depending on the signal \({\mathbf {s}}\) appear. One finds that under the same conditions on the moments of \(P_0\) we have \({\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }] = {{\mathcal {O}}}(n^2)\). Combined with (185) this leads to the last equality of (184) and ends the proof.
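To illustrate the first claim concretely, here is a hedged sketch assuming the overlap is normalized as \(q_{{\mathbf {X}},{\mathbf {S}}} = \frac{1}{n}\sum_{i=1}^n X_i S_i\) (an assumption about the notation of the main text). Cauchy–Schwarz, Jensen's inequality and (47) give
$${\mathbb {E}}\big[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2\rangle_{k,s;\epsilon}\big] = \frac{1}{n^2}\sum_{i,j=1}^n {\mathbb {E}}\big[S_i S_j \langle X_i X_j\rangle_{k,s;\epsilon}\big] \le \frac{1}{n^2}\sum_{i,j=1}^n {\mathbb {E}}\big[S_i^2 S_j^2\big]^{1/2}\,{\mathbb {E}}\big[\langle X_i X_j\rangle_{k,s;\epsilon}^2\big]^{1/2} \le \frac{1}{n^2}\sum_{i,j=1}^n {\mathbb {E}}\big[S_i^2 S_j^2\big] \le \max\big({\mathbb {E}}_{P_0}[S^4],\, {\mathbb {E}}_{P_0}[S^2]^2\big),$$
where the middle inequality uses \(\langle X_i X_j\rangle^2 \le \langle X_i^2 X_j^2\rangle \) and then (47) to replace \({\mathbb {E}}[\langle X_i^2 X_j^2\rangle_{k,s;\epsilon}]\) by \({\mathbb {E}}[S_i^2 S_j^2]\); the right hand side is \({{\mathcal {O}}}(1)\) whenever \(P_0\) has bounded second and fourth moments.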
Alternative argument for the lower bound
We present an alternative and useful, albeit not completely rigorous, argument to obtain the lower bound (46). With enough work the argument can be made rigorous. Note that defining
the identity (45) is equivalent to
Setting \(b_n=2a_n\), taking a sequence \(a_n\rightarrow 0\) slowly enough as \(n\rightarrow +\infty \), using Lemma 1 and (28), we obtain
Simple algebra starting from \(\partial _{m_k}{\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ) = 0\) implies, under the assumption that the extrema are attained at interior points of \({\mathbb {R}}^{K_n}_+\) (this is the point that would have to be worked out to make the argument rigorous), that the minimizer of \({\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta )\) satisfies
The right hand side is independent of k, thus the minimizer is \(m_k=m_*\) for \(k=1, \ldots , K_n\) where
Thus
From (188) we get
which is the inequality (46).
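In hedged summary (the displayed equations referenced above are not reproduced here), the upshot of the k-independence of the minimizer is that the multi-dimensional optimization collapses to a scalar one,
$$\inf_{\{m_k\}_{k=1}^{K_n} \in {\mathbb {R}}^{K_n}_+} {\widetilde{f}}_{\mathrm{RS}}\big(\{m_k\}_{k=1}^{K_n};\Delta\big) \;=\; \inf_{m \ge 0} f_{\mathrm{RS}}(m;\Delta),$$
assuming, as the notation suggests, that \({\widetilde{f}}_{\mathrm{RS}}\) evaluated on a constant sequence \(m_k \equiv m\) reduces to \(f_{\mathrm{RS}}(m;\Delta)\); this is the collapse that, combined with (188), yields (46).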
A consequence of Bayes rule
The purpose of this appendix is to prove the identity (47). Recall that the Gibbs bracket \(\langle - \rangle _{k,t;\epsilon }\) is the average with respect to the posterior \(P_{k,t;\epsilon }({\mathbf {x}}\vert \varvec{\theta })\) where \(\varvec{\theta } :=\{{\mathbf {s}}, \{{\mathbf {z}}^{(k)}, \widetilde{{\mathbf {z}}}^{(k)}\}_{k=1}^K, {\widehat{{\mathbf {z}}}}\}\). Using Bayes law we have:
It remains to notice that
where the Gibbs bracket on the right hand side is an average with respect to the product measure of two posteriors \(P_{k,t;\epsilon }({\mathbf {x}}\vert \varvec{\theta }) P_{k,t;\epsilon }({\mathbf {x}}'\vert \varvec{\theta })\).
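For reference, the resulting identity is the standard Bayes-optimal “Nishimori” identity. In a hedged generic form (the precise statement (47) in the main text may involve additional replicas or a particular choice of test function), it reads: for any suitably integrable function g,
$${\mathbb {E}}\big[\langle g({\mathbf {X}},{\mathbf {X}}')\rangle_{k,t;\epsilon}\big] = {\mathbb {E}}\big[\langle g({\mathbf {X}},{\mathbf {S}})\rangle_{k,t;\epsilon}\big],$$
which expresses that, conditionally on the observations \(\varvec{\theta }\), the signal \({\mathbf {S}}\) and an independent posterior sample \({\mathbf {X}}'\) have the same law.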
A stochastic calculus interpretation
We note that the proofs do not require any upper limit on K. This suggests that it is possible to formulate the adaptive interpolation method entirely in a continuum language. Here we informally show this for the simplest problem, namely symmetric rank-one matrix factorisation, and plan to come back to a rigorous treatment of the continuum formulation in future work.
It is helpful to first write down explicitly the (k, t)-interpolating Hamiltonian (14) (leaving out the perturbation in (21) which is irrelevant for the argument here)
and to define the step-wise function \(m(u) = m_{k^\prime }\) for \(k'/K \le u< (k^\prime +1)/K\), \(k^\prime = 1, \dots , K\).
Let us first look at the terms that do not involve Gaussian noise and become simple Riemann integrals. We have for the contribution coming from (195) and (197),
Similarly, we have for the terms coming from (196) and (198),
Now we treat the more interesting contributions involving the Gaussian noise. Let B(u) be the Wiener process defined by \(B(0)=0\), \({\mathbb {E}}[B(u)] =0\), \({\mathbb {E}}[B(u)B(v)] = \min (u,v)\) for \(u, v \in {\mathbb {R}}_+\). We introduce independent copies \(B_{ij}(u)\), \(i, j = 1,\dots , n\) and consider the sum of increments (also written as an Ito integral)
Since the increments are independent and \({\mathbb {E}}[(B(u) - B(v))^2] = |u - v|\), this is a Gaussian random variable with zero mean and variance \((K + 1 - k - t)/K\). It is therefore equal in distribution to
and the contribution of the (random) Gaussian noise in (195) and (197) becomes
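To make the variance computation explicit, here is a hedged reconstruction (the lower integration limit is inferred from the variance stated above; the \(1/K\) offset with respect to \(1-\tau \) is immaterial in the limit \(K\rightarrow \infty \)):
$$\int_{(k-1+t)/K}^{1} dB_{ij}(u) = B_{ij}(1) - B_{ij}\Big(\frac{k-1+t}{K}\Big) \sim \mathcal {N}\Big(0,\, \frac{K+1-k-t}{K}\Big) \stackrel{d}{=} \sqrt{\frac{K+1-k-t}{K}}\, Z_{ij}, \qquad Z_{ij}\sim \mathcal {N}(0, 1).$$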
To represent the contributions of (196), (198) we introduce independent copies of the Wiener process \(\widetilde{B}_i(u)\), \(i=1, \dots , n\) and form the Ito integral
which has the same variance as
Indeed
Therefore the contribution of (196) and (198) can be represented as
Finally, collecting (199), (200), (203), (207), setting \(\tau :=(t + k)/K\) and \(K \rightarrow \infty \), we obtain a continuous form of the random (k, t)-interpolating Hamiltonian,
where m(u) is an arbitrary trial function and \(\mathbf {B}\) denotes the collection of all Wiener processes. Note that \(\int _\tau ^1 dB_{ij}(u) = B_{ij}(1) - B_{ij}(\tau )\), which is distributed as \(\sqrt{1-\tau }Z_{ij}\) for \(Z_{ij}\sim \mathcal {N}(0, 1)\), and \(\int _0^\tau \sqrt{m(u)}d{\widetilde{B}}_i(u)\) is distributed as \(\sqrt{\int _0^\tau m(u)\, du}\,{\widetilde{Z}}_i\) for \({\widetilde{Z}}_i\sim \mathcal {N}(0, 1)\). Therefore (208) is equal in distribution to
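For completeness, the distributional identity for the second integral is the Itô isometry for a deterministic integrand (a standard fact, restated here in the present notation):
$${\mathbb {E}}\Big[\Big(\int_0^\tau \sqrt{m(u)}\, d{\widetilde{B}}_i(u)\Big)^2\Big] = \int_0^\tau m(u)\, du,$$
so this centered Gaussian random variable has the same law as \(\sqrt{\int_0^\tau m(u)\,du}\,{\widetilde{Z}}_i\) with \({\widetilde{Z}}_i\sim \mathcal {N}(0,1)\).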
Clearly, the usual Guerra–Toninelli interpolation appears as the special case where one chooses a constant trial function \(m(u)=m\). When we go from (208) to (209) we eliminate the Wiener process completely; however, we believe it is useful to keep in mind the point of view expressed by (208), which may turn out to be important for more complicated problems.
Starting from (208) or (209) it is possible to evaluate the free energy change along the interpolation path. We define the free energy
For \(\tau = 0\) we recover the original Hamiltonian \(\mathcal {H}_{k=1, t=0}\) (see (26)) and \(f(0) = f\) given in (7). For \(\tau = 1\), setting \(\int _0^1 du \,m(u) = m_{\mathrm{mf}}\), we recover the mean-field Hamiltonian \(\mathcal {H}_{k=K, t=1}\) (see (31)) and \(f(1) = f_{\mathrm{den}}(\Sigma (\int _0^1 du\, m(u)); \Delta )\). Then, proceeding similarly to Sect. 2.7, one finds the identity
where \(\langle -\rangle _\tau \) is the Gibbs average w.r.t. (208).
Of course this immediately gives the upper bound in Proposition 1. The matching lower bound is obtained by the same ideas as used in the discrete version. We briefly review them informally in the continuous language. One first introduces the \(\epsilon \)-perturbation term (21) and proves a concentration property for the overlap analogous to Lemma 2. Starting with the continuous version of the interpolating Hamiltonian, the proof of the free energy concentration is essentially identical to (even simpler than) the one in Sect. 7, which implies the overlap concentration through the arguments of Sect. 5, which are unchanged. Then, the square in the remainder term is approximately equal to \(({\mathbb {E}}_{{\mathbf {S}}, \mathbf {B}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{\tau , \epsilon }] - m(\tau ))^2\) and we make it vanish by choosing
This continuous setting thus allows one to avoid proving Lemma 3, and then easily yields the lower bound in Proposition 1. One must still check that (212) has a solution. The right hand side is a function \(G_{n, \epsilon }(\tau ; \int _0^\tau du\, m(u))\), so setting \(x(\tau ) = \int _0^\tau du\, m(u)\), \(dx/d\tau = m(\tau )\), we recognize that (212) is a first-order differential equation with initial condition \(x(0)=0\). The existence of a unique global solution on \(\tau \in [0,1]\) is then proved using the Cauchy–Lipschitz theorem. Moreover, this solution is differentiable and monotone increasing with respect to \(\epsilon \). This last step of the analysis replaces Lemma 4.
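As a purely illustrative aside (not part of the original argument), the construction of the adaptive path through this ordinary differential equation can be mimicked numerically by a simple Euler scheme. In the sketch below, G is a hypothetical placeholder for the map \((\tau , x)\mapsto G_{n,\epsilon }(\tau ; x)\); it is not the actual expression from the main text, which involves a Gibbs average of the overlap.

```python
import numpy as np

def solve_adaptive_path(G, K=1000):
    """Euler scheme for x'(tau) = G(tau, x(tau)), x(0) = 0, on [0, 1].

    Returns the grid tau_k, the cumulative path x(tau_k), and the
    piecewise-constant trial function m(tau_k) = dx/dtau.
    """
    taus = np.linspace(0.0, 1.0, K + 1)
    x = np.zeros(K + 1)
    m = np.zeros(K)
    for k in range(K):
        m[k] = G(taus[k], x[k])      # m(tau_k) = G(tau_k; x(tau_k))
        x[k + 1] = x[k] + m[k] / K   # Euler step: x(tau_{k+1}) = x(tau_k) + m(tau_k)/K
    return taus, x, m

# Toy usage with a made-up bounded Lipschitz map G (illustration only).
if __name__ == "__main__":
    G = lambda tau, x: 1.0 / (1.0 + x)
    taus, x, m = solve_adaptive_path(G)
    print("x(1) =", x[-1], "  m(1-) =", m[-1])
```

For a map that is Lipschitz in its second argument, uniformly in \(\tau \), the Cauchy–Lipschitz theorem guarantees that such a scheme converges to the unique solution as the step size 1/K vanishes, mirroring the role of the discretization \(m_1, \dots , m_K\) used in the main text.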
Cite this article
Barbier, J., Macris, N. The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. Probab. Theory Relat. Fields 174, 1133–1185 (2019). https://doi.org/10.1007/s00440-018-0879-0