
The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference


Abstract

In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or “free energy”) in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward compared to the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. Then we generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new light on the reasons for the validity of replica formulas in Bayesian inference.


Notes

  1. Since the first version of this manuscript, the method has been successfully applied to many other problems including non-symmetric matrix and tensor factorization [36], generalized linear models and learning [37], models of deep neural networks [38, 39], random linear estimation with structured matrices [40] and even problems defined by sparse graphical models such as the censored block model [41].

  2. In the present formulation one can also interpret the succession of Gaussian mean-fields in each step as a Wiener process. For this reason we initially called this new approach “the stochastic interpolation method”. The interpretation in terms of a Wiener process is in fact not really needed, and here we choose a more pedestrian path, but we believe this is an aspect of the method that may be of further interest (especially for diluted systems) and we briefly discuss it in “Appendix E”.

  3. We abusively use the notation \(dxP_0(x)\) even though \(P_0\) is not necessarily absolutely continuous.

  4. For all other models considered in this paper we directly write the explicit expression of the free energy, but the derivation is always similar.

  5. This identity has been abusively called “Nishimori identity” in the statistical physics literature. One should however note that it is a simple consequence of the Bayes formula (see e.g. Appendix B of [18]). The “true” Nishimori identity [52] concerns models with one extra feature, namely a gauge symmetry which allows one to eliminate the input signal, and the expectation over \({\mathbf {S}}\) in (47) can therefore be dropped (see e.g. [20]).

  6. Here we use Lemma 2, but a weaker form of concentration is enough for this argument: it suffices to control the following type of “thermal” fluctuation, \({\mathbb {E}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^2\rangle _{k,t;\epsilon } - \langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{k,t;\epsilon }^2]\). Moreover, it is not necessary to allow for an \(\epsilon \)-dependence in the \(m_k\)’s.

References

  1. Talagrand, M.: Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models, vol. 46. Springer, Berlin (2003)


  2. Talagrand, M.: Mean Field Models for Spin Glasses. Volume I: Basic Examples. Springer, Berlin (2011)


  3. Talagrand, M.: Mean Field Models for Spin Glasses. Volume II: Advanced Replica-Symmetry and Low Temperature. Springer, Berlin (2011)


  4. Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer Monographs in Mathematics. Springer, Berlin (2013)


  5. Mézard, M., Parisi, G., Virasoro, M.-A.: Spin Glass Theory and Beyond. World Scientific Publishing Co. Inc, Singapore (1990)


  6. Guerra, F.: Replica broken bounds in the mean field spin glass model. Commun. Math. Phys. 233, 1–12 (2003)


  7. Guerra, F., Toninelli, F.: Quadratic replica coupling in the Sherrington–Kirkpatrick mean field spin glass model. J. Math. Phys. 43, 3704–3716 (2002)


  8. Talagrand, M.: The Parisi formula. Ann. Math. 163, 221–263 (2006)


  9. Parisi, G.: A sequence of approximate solutions to the S-K model for spin glasses. J. Phys. A 13, L115 (1980)


  10. Sherrington, D., Kirkpatrick, S.: Solvable model of a spin glass. Phys. Rev. Lett. 35(26), 1792–1796 (1975)


  11. Montanari, A.: Tight bounds for LDPC and LDGM codes under MAP decoding. IEEE Trans. Inf. Theory 51, 3221–3246 (2005)


  12. Macris, N.: Griffith–Kelly–Sherman correlation inequalities: a useful tool in the theory of error correcting codes. IEEE Trans. Inf. Theory 53(2), 664–683 (2007)


  13. Macris, N.: Sharp bounds on generalized exit functions. IEEE Trans. Inf. Theory 53, 2365–2375 (2007)


  14. Kudekar, S., Macris, N.: Sharp bounds for optimal decoding of low-density parity-check codes. IEEE Trans. Inf. Theory 55(10), 4635–4650 (2009)


  15. Korada, S.B., Macris, N.: On the capacity of a code division multiple access system. In: Proceedings of Allerton Conference on Communication, Control, and Computing, Monticello, IL, pp. 959–966 (Sept 2007)

  16. Korada, S.B., Macris, N.: Tight bounds on the capacity of binary input random CDMA systems. IEEE Trans. Inf. Theory 56(11), 5590–5613 (2010)


  17. Barbier, J., Dia, M., Macris, N., Krzakala, F.: The mutual information in random linear estimation. In: The 54th Annual Allerton Conference on Communication, Control, and Computing (Sept 2016)

  18. Barbier, J., Macris, N., Dia, M., Krzakala, F.: Mutual information and optimality of approximate message-passing in random linear estimation. arXiv preprint arXiv:1701.05823

  19. Barbier, J., Macris, N.: I-MMSE relations in random linear estimation and a sub-extensive interpolation method. arXiv:1704.04158 (April 2017)

  20. Korada, S.B., Macris, N.: Exact solution of the gauge symmetric p-spin glass model on a complete graph. J. Stat. Phys. 136(2), 205–230 (2009)


  21. Krzakala, F., Xu, J., Zdeborová, L.: Mutual information in rank-one matrix estimation. In: 2016 IEEE Information Theory Workshop (ITW), pp. 71–75 (Sept 2016)

  22. Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T., Zdeborová, L.: Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. Adv. Neural Inf. Process. Syst. (NIPS) 29, 424–432 (2016)


  23. Franz, S., Leone, M.: Replica bounds for optimization problems and diluted spin systems. J. Stat. Phys. 111, 535–564 (2003)


  24. Franz, S., Leone, M., Toninelli, F.: Replica bounds for diluted non-Poissonian spin systems. J. Phys. A Math. Gen. 36, 535–564 (2003)


  25. Panchenko, D., Talagrand, M.: Bounds for diluted mean-field spin glass models. Probab. Theory Relat. Fields 130(8), 319–336 (2004)


  26. Hassani, H., Macris, N., Urbanke, R.: Threshold saturation in spatially coupled constraint satisfaction problems. J. Stat. Phys. 150, 807–850 (2013)


  27. Mézard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009)


  28. Giurgiu, A., Macris, N., Urbanke, R.: Spatial coupling as a proof technique and three applications. IEEE Trans. Inf. Theory 62(10), 5281–5295 (2016)


  29. Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. In: 2016 IEEE International Symposium on Information Theory (ISIT) (July 2016)

  30. Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. arXiv:1607.02524 (2016)

  31. Lesieur, T., Miolane, L., Lelarge, M., Krzakala, F., Zdeborová, L.: Statistical and computational phase transitions in spiked tensor estimation. In: 2017 IEEE International Symposium on Information Theory, ISIT 2017, Aachen, Germany, June 25–30, 2017, pp. 511–515 (2017)

  32. Lelarge, M., Miolane, L.: Fundamental limits of symmetric low-rank matrix estimation. Probab. Theory Relat. Fields (2018). https://doi.org/10.1007/s00440-018-0845-x

  33. Miolane, L.: Fundamental limits of low-rank matrix estimation: the non-symmetric case. ArXiv e-prints (Feb 2017)

  34. Coja-Oghlan, A., Krzakala, F., Perkins, W., Zdeborová, L.: Information-theoretic thresholds from the cavity method. arXiv:1611.00814v3 (2016)

  35. Aizenman, M., Sims, R., Starr, S.L.: Extended variational principle for the Sherrington–Kirkpatrick spin-glass model. Phys. Rev. B 68, 214403 (2003)


  36. Barbier, J., Macris, N., Miolane, L.: The layered structure of tensor estimation and its mutual information. In: 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (Sept 2017)

  37. Barbier, J., Krzakala, F., Macris, N., Miolane, L., Zdeborová, L.: Phase transitions, optimal errors and optimality of message-passing in generalized linear models. arXiv preprint arXiv:1708.03395 (2017)

  38. Gabrié, M., Manoel, A., Luneau, C., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: Entropy and mutual information in models of deep neural networks. In: Advances in Neural Information Processing Systems (NIPS), Montréal, CA (2018)

  39. Aubin, B., Maillard, A., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: The committee machine: Computational to statistical gaps in learning a two-layers neural network. In: Advances in Neural Information Processing Systems (NIPS), Montréal, CA (2018)

  40. Barbier, J., Macris, N., Maillard, A., Krzakala, F.: The mutual information in random linear estimation beyond i.i.d. matrices. In: IEEE International Symposium on Information Theory (ISIT) (2018)

  41. Barbier, J., Chan, C.-L., Macris, N.: Adaptive path interpolation for sparse systems: application to a simple censored block model. In: IEEE International Symposium on Information Theory (ISIT) (2018)

  42. Pastur, L., Shcherbina, M.: The absence of the self-averageness of the order parameter in the Sherrington–Kirkpatrick model. J. Stat. Phys. 62(1/2), 1–19 (1991)


  43. Pastur, L., Shcherbina, M., Tirozzi, B.: The replica symmetric solution without replica trick for the Hopfield model. J. Stat. Phys. 74, 1161–1183 (1994)


  44. Shcherbina, M.: On the replica symmetric solution for the Sherrington-Kirkpatrick model. Helvetica Physica Acta 70, 838–853 (1997)


  45. Nishimori, H.: Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford (2001)


  46. Iba, Y.: The Nishimori line and Bayesian statistics. J. Phys. A Math. General 32(21), 3875 (1999)


  47. Mezard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009)


  48. Lesieur, T., Krzakala, F., Zdeborová, L.: MMSE of probabilistic low-rank matrix estimation: universality with respect to the output channel. In: Annual Allerton Conference (2015)

  49. Deshpande, Y., Abbe, E., Montanari, A.: Asymptotic mutual information for the binary stochastic block model. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 185–189 (July 2016)

  50. Guerra, F., Toninelli, F.L.: The thermodynamic limit in mean field spin glass models. Commun. Math. Phys. 230(1), 71–79 (2002)


  51. Giurgiu, A., Macris, N., Urbanke, R.: How to prove the maxwell conjecture via spatial coupling: a proof of concept. In: 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 458–462 (July 2012)

  52. Nishimori, H.: Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford (2001)


  53. Lesieur, T., Krzakala, F., Zdeborová, L.: Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications. J. Stat. Mech. Theory Exp. 2017(7), 073403 (2017)


  54. Guerra, F., Toninelli, F.: The infinite volume limit in generalised mean field disordered models. Markov Proc. Rel. Fields 9(2), 195–207 (2003)


  55. McDiarmid, C.: On the method of bounded differences. Surv. Comb. 141(1), 148–188 (1989)


  56. Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)

  57. Guo, D., Wu, Y., Shitz, S.S., Verdú, S.: Estimation in Gaussian noise: properties of the minimum mean-square error. IEEE Trans. Inf. Theory 57(4), 2371–2385 (2011)



Acknowledgements

Jean Barbier acknowledges funding by the Swiss National Science Foundation Grant No. 200021-156672. We thank Thibault Lesieur for providing us the expression of the RS potential for tensor estimation. We also acknowledge helpful discussions with Olivier Lévêque and Léo Miolane on the stochastic calculus interpretation and continuous version of “Appendix E”.

Author information


Correspondence to Jean Barbier.

Appendices

Appendix A: Linking the perturbed and plain free energies

The purpose of this appendix is to prove Lemma 1. We first note that differentiating the function \(\epsilon \mapsto f_{k=1,t=0;\epsilon }\) defined in (24) gives

$$\begin{aligned} \frac{d f_{1, 0;\epsilon }}{d\epsilon } = \frac{1}{n}\sum _{i=1}^n {\mathbb {E}}\left[ \frac{1}{2}\langle X_i^2\rangle _{1, 0;\epsilon } - \langle X_i \rangle _{1, 0;\epsilon } S_i - \frac{1}{2\sqrt{\epsilon }}\langle X_i\rangle _{1, 0;\epsilon } {\hat{Z}}_i \right] . \end{aligned}$$
(177)

By a Gaussian integration by parts the last term becomes

$$\begin{aligned} -\frac{1}{2\sqrt{\epsilon }}{\mathbb {E}}[\langle X_i\rangle _{1, 0; \epsilon } {\hat{Z}}_i ] = -\frac{1}{2\sqrt{\epsilon }}{\mathbb {E}}\left[ \frac{\partial }{\partial {\hat{Z}}_i}\langle X_i\rangle _{1, 0;\epsilon }\right] = -\frac{1}{2}{\mathbb {E}}\left[ \langle X_i^2\rangle _{1, 0;\epsilon } - \langle X_i\rangle _{1, 0;\epsilon }^2\right] . \end{aligned}$$
(178)
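For the reader's convenience, the Gaussian integration by parts formula used here is: for \({\hat{Z}}\sim \mathcal {N}(0,1)\) and a differentiable function g of moderate growth,

$$\begin{aligned} {\mathbb {E}}[{\hat{Z}}\, g({\hat{Z}})] = {\mathbb {E}}[g'({\hat{Z}})], \end{aligned}$$

applied here to \(g({\hat{Z}}_i) = \langle X_i\rangle _{1, 0;\epsilon }\), whose derivative with respect to \({\hat{Z}}_i\) equals \(\sqrt{\epsilon }\,(\langle X_i^2\rangle _{1, 0;\epsilon } - \langle X_i\rangle _{1, 0;\epsilon }^2)\).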

By an application of the identity (47) we have \({\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon } S_i] = {\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2]\). Therefore we find

$$\begin{aligned} \frac{d f_{1, 0;\epsilon }}{d\epsilon } = - \frac{1}{2n}\sum _{i=1}^n {\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2]. \end{aligned}$$
(179)

Now by convexity and (47) we have \({\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2] \le {\mathbb {E}}[\langle X_i^2\rangle _{1, 0;\epsilon }] = {\mathbb {E}}[S^2]\). Therefore

$$\begin{aligned} \Big \vert \frac{d f_{1, 0;\epsilon }}{d\epsilon } \Big \vert \le \frac{{\mathbb {E}}[S^2]}{2} \end{aligned}$$
(180)

and the first inequality of the Lemma follows from an application of the mean value theorem.
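Explicitly, by the mean value theorem there exists \(\epsilon ^* \in (0, \epsilon )\) such that

$$\begin{aligned} \big \vert f_{1, 0;\epsilon } - f_{1, 0;0}\big \vert = \epsilon \, \Big \vert \frac{d f_{1, 0;\epsilon '}}{d\epsilon '}\Big \vert _{\epsilon '=\epsilon ^*} \le \frac{\epsilon \,{\mathbb {E}}[S^2]}{2}. \end{aligned}$$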

The second inequality follows from the Lipschitz continuity of the free energy \(f_{k=K,t=1;\epsilon }\) of the decoupled scalar system. We refer to [57] for the proof of this standard fact.

Appendix B: Proof of Lemma 3

The proof of this lemma uses another interpolation:

$$\begin{aligned}&{\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,t;\epsilon }]- {\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,0;\epsilon }] = \int _0^t ds \frac{d\,{\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,s;\epsilon }]}{ds} \nonumber \\&\quad = \int _0^t ds {\mathbb {E}}\left[ \big \langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\big \rangle _{k,s;\epsilon }\Big \langle \frac{d{{\mathcal {H}}}_{k,s;\epsilon }({\mathbf {X}};\varvec{\Theta })}{ds}\Big \rangle _{k,s;\epsilon } - \Big \langle q_{{\mathbf {X}},{\mathbf {S}}}^{} \frac{d{{\mathcal {H}}}_{k,s;\epsilon }({\mathbf {X}};\varvec{\Theta })}{ds}\Big \rangle _{k,s;\epsilon }\right] , \nonumber \\&\quad = \int _0^t ds {\mathbb {E}}\left[ \Big \langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\Big (\frac{d{{\mathcal {H}}}_{k,s;\epsilon }({\mathbf {X}}';\varvec{\Theta })}{ds} -\frac{d{{\mathcal {H}}}_{k,s;\epsilon }({\mathbf {X}};\varvec{\Theta })}{ds}\Big )\Big \rangle _{k,s;\epsilon }\right] , \end{aligned}$$
(181)

where \({\mathbf {X}},{\mathbf {X}}', {\mathbf {X}}''\), etc., are i.i.d. replicas distributed according to (22). Computations similar to those in Sect. 2.7 lead to

$$\begin{aligned} {\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,t;\epsilon }] - {\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,0;\epsilon }] = \frac{1}{K} \int _0^t ds {\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}(g({\mathbf {X}}', {\mathbf {X}}'';{\mathbf {S}}) - g({\mathbf {X}},{\mathbf {X}}';{\mathbf {S}}))\rangle _{k,s;\epsilon }] \end{aligned}$$
(182)

where we define

$$\begin{aligned} g({\mathbf {x}}, {\mathbf {x}}';{\mathbf {s}}):=\frac{m_{k}}{\Delta }\sum _{i=1}^n\Big (\frac{x_ix_i'}{2} - x_is_i \Big ) - \frac{1}{\Delta }\sum _{i\le j=1}^n\Big (\frac{x_ix_j x_i'x_j'}{2n} - \frac{x_ix_js_is_j}{n} \Big ). \end{aligned}$$
(183)

Finally from (182) and Cauchy–Schwarz, one obtains

$$\begin{aligned} \big |{\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,t;\epsilon }] - {\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^{}\rangle _{k,0;\epsilon }]\big |&= {{\mathcal {O}}}\Big (\frac{1}{K} \sqrt{{\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2 \rangle _{k,s;\epsilon }] {\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }]}\Big )\nonumber \\&= {{\mathcal {O}}}\Big (\frac{n}{K}\Big ). \end{aligned}$$
(184)

The last equality holds as long as the first four moments of the prior \(P_0\) are bounded. We prove this claim now. Let us start by studying \({\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2 \rangle _{k,s;\epsilon }]\). Using Cauchy–Schwarz for the inequality and (47) for the subsequent equality,

$$\begin{aligned} {\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2\rangle _{k,s;\epsilon }]&= \frac{1}{n^2}\sum _{i,j=1}^n{\mathbb {E}}[\langle X_iX_jS_iS_j \rangle _{k,s;\epsilon }] \nonumber \\&\le \frac{1}{n^2}\sum _{i,j=1}^n \sqrt{{\mathbb {E}}[\langle X_i^2X_j^2 \rangle _{k,s;\epsilon }]{\mathbb {E}}[S_i^2S_j^2 ] } = \frac{1}{n^2}\sum _{i,j=1}^n {\mathbb {E}}[S_i^2S_j^2]={{\mathcal {O}}}(1), \end{aligned}$$
(185)

where the last equality is valid for \(P_0\) with bounded second and fourth moments. For \({\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }]\) we proceed similarly, decoupling the expectations using Cauchy–Schwarz and then using (47) so that only terms depending on the signal appear. One finds that under the same conditions on the moments of \(P_0\) we have \({\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }] = {{\mathcal {O}}}(n^2)\). Combined with (185), this leads to the last equality of (184) and ends the proof.
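For completeness, here is the rough counting behind the \({{\mathcal {O}}}(n^2)\) estimate (a sketch under the same moment assumptions): using \((a-b)^2 \le 2a^2 + 2b^2\) on the two sums in (183),

$$\begin{aligned} {\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }] \le \frac{2m_{k}^2}{\Delta ^2}\, {\mathbb {E}}\Big [\Big \langle \Big (\sum _{i=1}^n\Big (\frac{X_iX_i'}{2} - X_iS_i\Big )\Big )^2\Big \rangle _{k,s;\epsilon }\Big ] + \frac{2}{\Delta ^2}\, {\mathbb {E}}\Big [\Big \langle \Big (\sum _{i\le j=1}^n\Big (\frac{X_iX_jX_i'X_j'}{2n} - \frac{X_iX_jS_iS_j}{n}\Big )\Big )^2\Big \rangle _{k,s;\epsilon }\Big ], \end{aligned}$$

where the first expectation involves \(n^2\) terms of order one, while the second involves \({{\mathcal {O}}}(n^4)\) terms each carrying a \(1/n^2\) prefactor and of order one, every term being controlled by Cauchy–Schwarz, (47) and the moment assumptions on \(P_0\).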

Appendix C: Alternative argument for the lower bound

We present an alternative argument, useful albeit not completely rigorous, to obtain the lower bound (46). With enough work the argument can be made rigorous. Note that, defining

$$\begin{aligned} {\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^K;\Delta ) :=\frac{1}{4\Delta K}\sum _{k=1}^K m_{k}^2+f_{\mathrm{den}}\big (\Sigma (m_{\mathrm{mf}}^{(K)};\Delta )\big ) =f_{\mathrm{RS}}(m_{\mathrm{mf}}^{(K)};\Delta ) + \frac{V(\{m_k\})}{4\Delta }, \end{aligned}$$
(186)

the identity (45) is equivalent to

$$\begin{aligned} \int _{a_n}^{b_n}d\epsilon \, f_{1,0;\epsilon }&=\int _{a_n}^{b_n}d\epsilon \,\left\{ (f_{K_n,1;\epsilon } - f_{K_n,1;0}) + {\widetilde{f}}_{\mathrm{RS}}(\{m_k^{(n)}\}_{k=1}^{K_n};\Delta )\right\} + \mathcal {O}(a_n^{-2}n^{-\alpha }) \nonumber \\&\ge \int _{a_n}^{b_n}d\epsilon \,(f_{K_n,1;\epsilon } - f_{K_n,1;0}) \nonumber \\&\quad + \min _{\{m_k\ge 0\}_{k=1}^{K_n}} {\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ) + \mathcal {O}(a_n^{-2}n^{-\alpha }). \end{aligned}$$
(187)

Setting \(b_n=2a_n\), taking a sequence \(a_n\rightarrow 0\) slowly enough as \(n\rightarrow +\infty \), and using Lemma 1 and (28), we obtain

$$\begin{aligned} \liminf _{n\rightarrow +\infty }f_n \ge \min _{\{m_k\ge 0\}_{k=1}^{K_n}}{\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ). \end{aligned}$$
(188)

Simple algebra starting from \(\partial _{m_k}{\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ) = 0\) implies, under the assumption that the extrema are attained at interior points of \({\mathbb {R}}^{K_n}_+\) (this is the point one would have to work out in order to make the argument rigorous), that the minimizer of \({\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta )\) satisfies

$$\begin{aligned} m_k = -2\,\partial _{\Sigma ^{-2}} f_{\mathrm{den}}(\Sigma )|_{\Sigma (m_{\mathrm{mf}}^{(K_n)};\Delta )}, \qquad k=1, \ldots , K_n. \end{aligned}$$
(189)
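In more detail (a brief sketch, writing \(m_{\mathrm{mf}}^{(K_n)} = K_n^{-1}\sum _{k'=1}^{K_n} m_{k'}\) and using \(\Sigma (m;\Delta )^{-2} = m/\Delta \), consistently with (190)): by the chain rule,

$$\begin{aligned} 0 = \partial _{m_k}{\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ) = \frac{m_{k}}{2\Delta K_n} + \frac{1}{\Delta K_n}\, \partial _{\Sigma ^{-2}} f_{\mathrm{den}}(\Sigma )\big \vert _{\Sigma (m_{\mathrm{mf}}^{(K_n)};\Delta )}, \end{aligned}$$

which rearranges into (189).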

The right hand side is independent of k, thus the minimizer is \(m_k=m_*\) for \(k=1, \ldots , K_n\) where

$$\begin{aligned} m_* = -2\, \partial _{\Sigma ^{-2}} f_{\mathrm{den}}(\Sigma )|_{\Sigma = \sqrt{\frac{\Delta }{m_*}}}. \end{aligned}$$
(190)

Thus

$$\begin{aligned} \min _{\{m_k\ge 0\}_{k=1}^{K_n}} {\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ) = f_{\mathrm{RS}}( m_*; \Delta ) \ge \min _{m\ge 0} f_{\mathrm{RS}}(m;\Delta ). \end{aligned}$$
(191)

From (188) we get

$$\begin{aligned} \liminf _{n\rightarrow +\infty }f_n&\ge \min _{m \ge 0 } f_{\mathrm{RS}}(m;\Delta ) \end{aligned}$$
(192)

which is the inequality (46).

Appendix D: A consequence of Bayes rule

The purpose of this appendix is to prove the identity (47). Recall that the Gibbs bracket \(\langle - \rangle _{k,t;\epsilon }\) is the average with respect to the posterior \(P_{k,t;\epsilon }({\mathbf {x}}\vert \varvec{\theta })\) where \(\varvec{\theta } :=\{{\mathbf {s}}, \{{\mathbf {z}}^{(k)}, \widetilde{{\mathbf {z}}}^{(k)}\}_{k=1}^K, {\widehat{{\mathbf {z}}}}\}\). Using Bayes law we have:

$$\begin{aligned} {\mathbb {E}}_{\varvec{\Theta }}[\langle g({\mathbf {X}}, {\mathbf {S}})\rangle _{k,t;\epsilon }] = {\mathbb {E}}_{{\mathbf {S}}} {\mathbb {E}}_{\varvec{\Theta }\vert {\mathbf {S}}} [\langle g({\mathbf {X}}, {\mathbf {S}})\rangle _{k,t;\epsilon }] = {\mathbb {E}}_{\varvec{\Theta }} {\mathbb {E}}_{{\mathbf {S}}\vert \varvec{\Theta }} [\langle g({\mathbf {X}}, {\mathbf {S}})\rangle _{k, t,\epsilon }]. \end{aligned}$$
(193)

It remains to notice that

$$\begin{aligned} {\mathbb {E}}_{\varvec{\Theta }}{\mathbb {E}}_{{\mathbf {S}}\vert \varvec{\Theta }} [\langle g({\mathbf {X}}, {\mathbf {S}})\rangle _{k,t;\epsilon }] = {\mathbb {E}}_{\varvec{\Theta }}[\langle g({\mathbf {X}}, {\mathbf {X}}')\rangle _{k,t;\epsilon }] \end{aligned}$$
(194)

where the Gibbs bracket on the right hand side is an average with respect to the product measure of two posteriors \(P_{k,t;\epsilon }({\mathbf {x}}\vert \varvec{\theta }) P_{k,t;\epsilon }({\mathbf {x}}'\vert \varvec{\theta })\).
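As a simple illustration, taking \(g({\mathbf {x}}, {\mathbf {s}}) = x_i s_i\) in (193)–(194) gives the identity used repeatedly in the proofs (for instance in “Appendix A”):

$$\begin{aligned} {\mathbb {E}}[\langle X_i\rangle _{k,t;\epsilon } S_i] = {\mathbb {E}}[\langle X_i\rangle _{k,t;\epsilon } \langle X_i'\rangle _{k,t;\epsilon }] = {\mathbb {E}}[\langle X_i\rangle _{k,t;\epsilon }^2], \end{aligned}$$

since under the product measure the two replicas \({\mathbf {X}}\) and \({\mathbf {X}}'\) are conditionally independent and identically distributed given \(\varvec{\Theta }\).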

Appendix E: A stochastic calculus interpretation

We note that the proofs do not require any upper limit on K. This suggests that it is possible to formulate the adaptive interpolation method entirely in a continuum language. Here we informally show this for the simplest problem, namely symmetric rank-one matrix factorisation, and plan to come back to a rigorous treatment of the continuum formulation in future work.

It is helpful to first write down explicitly the (k, t)-interpolating Hamiltonian (14) (leaving out the perturbation in (21), which is irrelevant for the argument here)

$$\begin{aligned} \mathcal {H}_{k,t}({\mathbf {x}};\varvec{\theta }) =&\frac{1}{K\Delta }\sum _{k^\prime = k+1}^K \sum _{i\le j=1}^n \Big ( \frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n} - \sqrt{\frac{K\Delta }{n}} x_i x_j z_{ij}^{(k^\prime )}\Big ) \end{aligned}$$
(195)
$$\begin{aligned}&+ \frac{1}{K\Delta }\sum _{k^\prime =1}^{k-1} m_{k^\prime } \sum _{i=1}^n \Big ( \frac{x_i^2}{2} - x_i s_i - \sqrt{\frac{K\Delta }{m_{k^\prime }}} x_i {\widetilde{z}}_i^{(k^\prime )}\Big ) \end{aligned}$$
(196)
$$\begin{aligned}&+ \frac{1-t}{K\Delta } \sum _{i\le j=1}^n \Big ( \frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n} - \sqrt{\frac{K\Delta }{(1-t)n}} x_i x_j z_{ij}^{(k)} \Big ) \end{aligned}$$
(197)
$$\begin{aligned}&+ \frac{t\,m_k}{K\Delta }\sum _{i=1}^n \Big ( \frac{x_i^2}{2} - x_i s_i - \sqrt{\frac{K\Delta }{t \,m_{k}}} x_i {\widetilde{z}}_i^{(k)}\Big ), \end{aligned}$$
(198)

and to define the piecewise constant function \(m(u) = m_{k^\prime }\) for \(k^\prime /K \le u< (k^\prime +1)/K\), \(k^\prime = 1, \dots , K\).

Let us first look at the terms that do not involve Gaussian noise and become simple Riemann integrals. We have for the contribution coming from (195) and (197),

$$\begin{aligned}&\frac{1}{\Delta }\sum _{i\le j=1}^n\left\{ \frac{1}{K}\sum _{k^\prime = k+1}^K \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big ) + \frac{1-t}{K} \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big ) \right\} \nonumber \\&\quad = ~\frac{1}{\Delta }\sum _{i\le j=1}^n\left\{ \int _{\frac{k+1}{K}}^{\frac{K+1}{K}}du\, \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big ) + \int _{\frac{k+t}{K}}^{\frac{k+1}{K}} du \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big ) \right\} \nonumber \\&\quad = ~\frac{1}{\Delta }\sum _{i\le j=1}^n \int _{\frac{k+t}{K}}^{\frac{K+1}{K}} du\, \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big ). \end{aligned}$$
(199)

Similarly, we have for the terms coming from (196) and (198),

$$\begin{aligned}&\frac{1}{\Delta }\sum _{i=1}^n\left\{ \frac{1}{K}\sum _{k^\prime =1}^{k-1} m_{k^\prime }\Big (\frac{x_i^2}{2} - x_i s_i\Big ) + \frac{t\,m_k}{K} \Big (\frac{x_i^2}{2} - x_i s_i \Big ) \right\} \nonumber \\&\quad =\, \frac{1}{\Delta }\sum _{i=1}^n\left\{ \int _{\frac{1}{K}}^{\frac{k}{K}} du\, m(u) \Big (\frac{x_i^2}{2} - x_i s_i\Big ) + \int _{\frac{k}{K}}^{\frac{k+t}{K}} du\,m(u) \Big (\frac{x_i^2}{2} - x_i s_i \Big ) \right\} \nonumber \\&\quad =\, \frac{1}{\Delta }\sum _{i=1}^n\left\{ \int _{\frac{1}{K}}^{\frac{k+t}{K}} du\, m(u) \Big (\frac{x_i^2}{2} - x_i s_i\Big ) \right\} . \end{aligned}$$
(200)

Now we treat the more interesting contributions involving the Gaussian noise. Let B(u) be the Wiener process defined by \(B(0)=0\), \({\mathbb {E}}[B(u)] =0\), \({\mathbb {E}}[B(u)B(v)] = \min (u,v)\) for \(u, v \in {\mathbb {R}}_+\). We introduce independent copies \(B_{ij}(u)\), \(i, j = 1,\dots , n\), and consider the sum of increments (also written as an Itô integral)

$$\begin{aligned}&\left\{ B_{ij}\Big (\frac{k+1}{K}\Big ) - B_{ij}\Big (\frac{k+t}{K}\Big ) \right\} + \sum _{k^\prime =k+1}^K \left\{ B_{ij}\Big (\frac{k^\prime +1}{K}\Big ) - B_{ij}\Big (\frac{k^\prime }{K}\Big ) \right\} \nonumber \\&\quad = \int _{\frac{k+t}{K}}^{\frac{K+1}{K}} dB_{ij}(u). \end{aligned}$$
(201)

Since the increments are independent and \({\mathbb {E}}[(B(u) - B(v))^2] = |u - v|\), this is a Gaussian random variable with zero mean and variance \((K + 1 - k - t)/K\). It is therefore equal in distribution to

$$\begin{aligned} \frac{1}{\sqrt{K}}\sum _{k^\prime = k+1}^K Z_{ij}^{(k^\prime )} + \sqrt{\frac{1-t}{K}} Z_{ij}^{(k)}, \end{aligned}$$
(202)

and the contribution of the (random) Gaussian noise in (195) and (197) becomes

$$\begin{aligned} \frac{1}{\sqrt{\Delta n}}\sum _{i\le j=1}^n x_i x_j \left\{ \frac{1}{\sqrt{K}}\sum _{k^\prime = k+1}^K Z_{ij}^{(k^\prime )} + \sqrt{\frac{1-t}{K}} Z_{ij}^{(k)} \right\} = \frac{1}{\sqrt{\Delta n}}\sum _{i\le j=1}^n \int _{\frac{k+t}{K}}^{\frac{K+1}{K}} dB_{ij}(u) x_i x_j. \end{aligned}$$
(203)

To represent the contributions of (196) and (198) we introduce independent copies \(\widetilde{B}_i(u)\), \(i=1, \dots , n\), of the Wiener process and form the Itô integral

$$\begin{aligned}&\sum _{k^\prime =1}^{k-1} \sqrt{m_{k^\prime }} \left\{ {\widetilde{B}}_i\Big (\frac{k^\prime +1}{K}\Big ) - {\widetilde{B}}_i\Big (\frac{k^\prime }{K}\Big ) \right\} + \sqrt{m_k} \left\{ {\widetilde{B}}_i\Big (\frac{k +t}{K}\Big ) - {\widetilde{B}}_i\Big (\frac{k}{K}\Big ) \right\} \nonumber \\&\quad = \int _{\frac{1}{K}}^{\frac{k+t}{K}} \sqrt{m(u)} d{\widetilde{B}}_i(u) \end{aligned}$$
(204)

which has the same variance as

$$\begin{aligned} \frac{1}{\sqrt{K}} \sum _{k^\prime =1}^{k-1} \sqrt{m_{k^\prime }} \, {\widetilde{Z}}_i^{(k^\prime )} + \sqrt{\frac{t \,m_k}{K}} \,{\widetilde{Z}}_i ^{(k)}. \end{aligned}$$
(205)

Indeed

$$\begin{aligned} \frac{1}{K}\sum _{k^\prime =1}^{k-1} m_{k^\prime } + \frac{t \,m_k}{K} = \sum _{k^\prime =1}^{k-1} m_{k^\prime } \Big (\frac{k^\prime +1}{K} - \frac{k^\prime }{K}\Big ) + m_k \Big (\frac{k+t}{K} - \frac{k}{K}\Big ) = \int _{\frac{1}{K}}^{\frac{k+t}{K}} du\, m(u). \end{aligned}$$
(206)

Therefore the contribution of (196) and (198) can be represented as

$$\begin{aligned} \frac{1}{\sqrt{\Delta }} \sum _{i=1}^n x_i \int _{\frac{1}{K}}^{\frac{k+t}{K}} \sqrt{m(u)} d{\widetilde{B}}_i(u). \end{aligned}$$
(207)

Finally, collecting (199), (200), (203), (207), setting \(\tau :=(t + k)/K\) and letting \(K \rightarrow \infty \), we obtain a continuous form of the random (k, t)-interpolating Hamiltonian,

$$\begin{aligned} \mathcal {H}_\tau ({\mathbf {x}}; {\mathbf {s}}, \mathbf {B}) = \,&\frac{1}{\Delta } \sum _{i\le j =1}^n \int _\tau ^1 \left\{ \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big ) du - \sqrt{\frac{\Delta }{n}} x_i x_j dB_{ij}(u) \right\} \nonumber \\&+ \frac{1}{\Delta } \sum _{i=1}^n \int _0^\tau \left\{ \Big (\frac{x_i^2}{2} - x_i s_i\Big ) m(u) du - \sqrt{\Delta m(u)} x_i d{\widetilde{B}}_i(u)\right\} \end{aligned}$$
(208)

where m(u) is an arbitrary trial function and \(\mathbf {B}\) denotes the collection of all Wiener processes. Note that \(\int _\tau ^1 dB_{ij}(u) = B_{ij}(1) - B_{ij}(\tau )\), which is distributed as \(\sqrt{1-\tau }Z_{ij}\) for \(Z_{ij}\sim \mathcal {N}(0, 1)\), and \(\int _0^\tau \sqrt{m(u)}d{\widetilde{B}}_i(u)\) is distributed as \(\sqrt{\int _0^\tau m(u)\, du}\,{\widetilde{Z}}_i\) for \({\widetilde{Z}}_i\sim \mathcal {N}(0, 1)\). Therefore (208) is equal in distribution to

$$\begin{aligned}&\frac{1}{\Delta } \sum _{i\le j =1}^n \left\{ \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big )(1-\tau ) - x_i x_j Z_{ij}\sqrt{\frac{\Delta (1-\tau )}{n}} \right\} \nonumber \\&\quad + ~ \frac{1}{\Delta } \sum _{i=1}^n \left\{ \Big (\frac{x_i^2}{2} - x_i s_i\Big ) \int _0^\tau m(u) du - x_i {\widetilde{Z}}_i \sqrt{\Delta \int _0^\tau m(u)du} \right\} . \end{aligned}$$
(209)

Clearly, the usual Guerra–Toninelli interpolation appears as the special case where one chooses a constant trial function \(m(u)=m\). When we go from (208) to (209) we completely eliminate the Wiener processes; however, we believe it is useful to keep in mind the point of view expressed by (208), which may turn out to be important for more complicated problems.
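Concretely, for a constant trial function \(m(u)=m\) the Hamiltonian (209) reduces to

$$\begin{aligned}&\frac{1}{\Delta } \sum _{i\le j =1}^n \left\{ \Big (\frac{x_i^2 x_j^2}{2n} - \frac{x_ix_js_is_j}{n}\Big )(1-\tau ) - x_i x_j Z_{ij}\sqrt{\frac{\Delta (1-\tau )}{n}} \right\} + \frac{1}{\Delta } \sum _{i=1}^n \left\{ \Big (\frac{x_i^2}{2} - x_i s_i\Big ) \tau m - x_i {\widetilde{Z}}_i \sqrt{\Delta \tau m} \right\} , \end{aligned}$$

i.e. a single linear interpolation in \(\tau \in [0,1]\) between the original model and the decoupled scalar model with fixed mean-field parameter m.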

Starting from (208) or (209) it is possible to evaluate the free energy change along the interpolation path. We define the free energy

$$\begin{aligned} f(\tau ) = - \frac{1}{n} {\mathbb {E}}_{{\mathbf {S}}, \mathbf {B}}\left[ \ln {\mathbb {E}}_{\mathbf {X}}\left[ e^{-\mathcal {H}_\tau ({\mathbf {X}}; {\mathbf {S}}, \mathbf {B})}\right] \right] . \end{aligned}$$
(210)

For \(\tau = 0\) we recover the original Hamiltonian \(\mathcal {H}_{k=1, t=0}\) (see (26)) and \(f(0) = f\) given in (7). For \(\tau = 1\), setting \(\int _0^1 du \,m(u) = m_{\mathrm{mf}}\), we recover the mean-field Hamiltonian \(\mathcal {H}_{k=K, t=1}\) (see (31)) and \(f(1) = f_{\mathrm{den}}\big (\Sigma \big (\int _0^1 du\, m(u);\Delta \big )\big )\). Then, proceeding similarly to Sect. 2.7, one finds the identity

$$\begin{aligned} f =&f_{\mathrm{RS}}\Big (\int _0^1d\tau \, m(\tau ); \Delta \Big ) + \frac{1}{4\Delta }\left\{ \int _0^1d\tau \, m(\tau )^2 - \Big (\int _0^1 d\tau \, m(\tau )\Big )^2\right\} \nonumber \\&- \frac{1}{4\Delta }\int _0^1 d\tau \, {\mathbb {E}}_{{\mathbf {S}}, \mathbf {B}}\left[ \big \langle (q_{{\mathbf {X}}, {\mathbf {S}}}^{} - m(\tau ))^2\big \rangle _\tau \right] + {{\mathcal {O}}}(n^{-1}) \end{aligned}$$
(211)

where \(\langle -\rangle _\tau \) is the Gibbs average w.r.t (208).
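For instance, choosing the constant trial function \(m(\tau )=m\) in (211) makes the variance term in braces vanish, and since the remainder term enters with a negative sign we get, for every fixed \(m \ge 0\),

$$\begin{aligned} f \le f_{\mathrm{RS}}(m;\Delta ) + {{\mathcal {O}}}(n^{-1}). \end{aligned}$$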

Optimizing over \(m \ge 0\), this immediately gives the upper bound in Proposition 1. The matching lower bound is obtained by the same ideas used in the discrete version; we briefly review them informally in the continuous language. One first introduces the \(\epsilon \)-perturbation term (21) and proves a concentration property for the overlap analogous to Lemma 2. Starting from the continuous version of the interpolating Hamiltonian, the proof of the free energy concentration is essentially identical to (in fact simpler than) that of Sect. 7, and it implies the overlap concentration through the arguments of Sect. 5, which are unchanged. Then the square in the remainder term is approximately equal to \(({\mathbb {E}}_{{\mathbf {S}}, \mathbf {B}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{\tau , \epsilon }] - m(\tau ))^2\) and we make it vanish by choosing

$$\begin{aligned} m(\tau ) = {\mathbb {E}}_{{\mathbf {S}}, \mathbf {B}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{\tau , \epsilon }]. \end{aligned}$$
(212)

This continuous setting thus allows one to avoid proving Lemma 3, and then easily yields the lower bound in Proposition 1. One must still check that (212) has a solution. The right hand side is a function \(G_{n, \epsilon }(\tau ; \int _0^\tau du\, m(u))\), so setting \(x(\tau ) = \int _0^\tau du\, m(u)\), with \(dx/d\tau = m(\tau )\), we recognize that (212) is a first-order differential equation with initial condition \(x(0)=0\). The existence of a unique global solution on \(\tau \in [0,1]\) is then proved using the Cauchy–Lipschitz theorem. Moreover, this solution is differentiable and monotone increasing with respect to \(\epsilon \). This last step of the analysis replaces Lemma 4.
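As a purely illustrative numerical sketch of this last step (the right hand side G below is a hypothetical smooth, bounded and Lipschitz stand-in, not the actual \(G_{n, \epsilon }\) of the problem), the differential equation for \(x(\tau )\) can be integrated with any standard ODE solver and the trial function \(m(\tau ) = dx/d\tau \) read off from the solution:

```python
# Illustrative sketch only: G is a hypothetical stand-in for
# E_{S,B}[<q_{X,S}>_{tau,eps}] viewed as a function of (tau, x(tau)).
import numpy as np
from scipy.integrate import solve_ivp

def G(tau, x, eps=0.1):
    # Hypothetical smooth, bounded, Lipschitz right-hand side.
    return 0.5 * (1.0 + np.tanh(x + eps))

# Solve dx/dtau = G(tau, x), x(0) = 0, on [0, 1]; Cauchy-Lipschitz
# guarantees a unique global solution since G is bounded and Lipschitz in x.
sol = solve_ivp(lambda t, y: [G(t, y[0])], t_span=(0.0, 1.0), y0=[0.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

taus = np.linspace(0.0, 1.0, 11)
x = sol.sol(taus)[0]     # x(tau) = \int_0^tau m(u) du
m = G(taus, x)           # recovered trial function m(tau) = x'(tau)
print(np.round(m, 4))
```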


Cite this article

Barbier, J., Macris, N. The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. Probab. Theory Relat. Fields 174, 1133–1185 (2019). https://doi.org/10.1007/s00440-018-0879-0
