Abstract
In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or “free energy”) in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward compared to the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. We then generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new light on the reasons for the validity of replica formulas in Bayesian inference.
Notes
Since the first version of this manuscript, the method has been successfully applied to many other problems including non-symmetric matrix and tensor factorization [36], generalized linear models and learning [37], models of deep neural networks [38, 39], random linear estimation with structured matrices [40] and even problems defined by sparse graphical models such as the censored block model [41].
In the present formulation one can also interpret the succession of Gaussian mean-fields in each step as a Wiener process. For this reason we initially called this new approach “the stochastic interpolation method”. The interpretation in terms of a Wiener process is in fact not really needed, and here we choose a more pedestrian path, but we believe this is an aspect of the method that may be of further interest (especially for diluted systems) and we briefly discuss it in “Appendix E”.
We abusively use the notation \(dxP_0(x)\) even though \(P_0\) is not necessarily absolutely continuous.
For all other models considered in this paper we directly write the explicit expression of the free energy, but the derivation is always similar.
This identity has been abusively called “Nishimori identity” in the statistical physics literature. One should however note that it is a simple consequence of Bayes formula (see e.g. Appendix B of [18]). The “true” Nishimori identity [52] concerns models with one extra feature, namely a gauge symmetry which allows one to eliminate the input signal, so that the expectation over \({\mathbf {S}}\) in (47) can be dropped (see e.g. [20]).
Here we use Lemma 2, but a weaker form of concentration is enough for this argument: namely, it suffices to control the following type of “thermal” fluctuation, \({\mathbb {E}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^2\rangle _{k,t,\epsilon } - \langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{k,t,\epsilon }^2]\). Moreover it is not necessary to allow for an \(\epsilon \)-dependence in the \(m_k\)’s.
References
Talagrand, M.: Spin Glasses: A Challenge for Mathematicians: Cavity and Mean Field Models, vol. 46. Springer, Berlin (2003)
Talagrand, M.: Mean Field Models for Spin Glasses. Volume I: Basic Examples. Springer, Berlin (2011)
Talagrand, M.: Mean Field Models for Spin Glasses. Volume II: Advanced Replica-Symmetry and Low Temperature. Springer, Berlin (2011)
Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer Monographs in Mathematics. Springer, Berlin (2013)
Mézard, M., Parisi, G., Virasoro, M.-A.: Spin Glass Theory and Beyond. World Scientific Publishing Co. Inc, Singapore (1990)
Guerra, F.: Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys. 233, 1–12 (2003)
Guerra, F., Toninelli, F.: Quadratic replica coupling in the Sherrington–Kirkpatrick mean field spin glass model. J. Math. Phys. 43, 3704–3716 (2002)
Talagrand, M.: The Parisi formula. Ann. Math. 163, 221–263 (2006)
Parisi, G.: A sequence of approximate solutions to the S-K model for spin glasses. J. Phys. A 13, L115 (1980)
Sherrington, D., Kirkpatrick, S.: Solvable model of a spin glass. Phys. Rev. Lett. 35(26), 1792–1796 (1975)
Montanari, A.: Tight bounds for LDPC and LDGM codes under MAP decoding. IEEE Trans. Inf. Theory 51, 3221–3246 (2005)
Macris, N.: Griffith–Kelly–Sherman correlation inequalities: a useful tool in the theory of error correcting codes. IEEE Trans. Inf. Theory 53(2), 664–683 (2007)
Macris, N.: Sharp bounds on generalized exit functions. IEEE Trans. Inf. Theory 53, 2365–2375 (2007)
Kudekar, S., Macris, N.: Sharp bounds for optimal decoding of low-density parity-check codes. IEEE Trans. Inf. Theory 55(10), 4635–4650 (2009)
Korada, S.B., Macris, N.: On the capacity of a code division multiple access system. In: Proceedings of Allerton Conference on Communication, Control, and Computing, Monticello, IL, pp. 959–966 (Sept 2007)
Korada, S.B., Macris, N.: Tight bounds on the capacity of binary input random CDMA systems. IEEE Trans. Inf. Theory 56(11), 5590–5613 (2010)
Barbier, J., Dia, M., Macris, N., Krzakala, F.: The mutual information in random linear estimation. In: The 54th Annual Allerton Conference on Communication, Control, and Computing (Sept 2016)
Barbier, J., Macris, N., Dia, M., Krzakala, F.: Mutual information and optimality of approximate message-passing in random linear estimation. arXiv preprint arXiv:1701.05823
Barbier, J., Macris, N.: I-MMSE relations in random linear estimation and a sub-extensive interpolation method. arXiv:1704.04158 (April 2017)
Korada, S.B., Macris, N.: Exact solution of the gauge symmetric p-spin glass model on a complete graph. J. Stat. Phys. 136(2), 205–230 (2009)
Krzakala, F., Xu, J., Zdeborová, L.: Mutual information in rank-one matrix estimation. In: 2016 IEEE Information Theory Workshop (ITW), pp. 71–75 (Sept 2016)
Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T., Zdeborová, L.: Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. Adv. Neural Inf. Process. Syst. (NIPS) 29, 424–432 (2016)
Franz, S., Leone, M.: Replica bounds for optimization problems and diluted spin systems. J. Stat. Phys. 111, 535–564 (2003)
Franz, S., Leone, M., Toninelli, F.: Replica bounds for diluted non-Poissonian spin systems. J. Phys. A Math. Gen. 36, 535–564 (2003)
Panchenko, D., Talagrand, M.: Bounds for diluted mean-field spin glass models. Probab. Theory Relat. Fields 130(8), 319–336 (2004)
Hassani, H., Macris, N., Urbanke, R.: Threshold saturation in spatially coupled constraint satisfaction problems. J. Stat. Phys. 150, 807–850 (2013)
Mézard, M., Montanari, A.: Information, Physics and Computation. Oxford Press, Oxford (2009)
Giurgiu, A., Macris, N., Urbanke, R.: Spatial coupling as a proof technique and three applications. IEEE Trans. Inf. Theory 62(10), 5281–5295 (2016)
Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. In: 2016 IEEE International Symposium on Information Theory (ISIT) (July 2016)
Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. arXiv:1607.02524 (2016)
Lesieur, T., Miolane, L., Lelarge, M., Krzakala, F., Zdeborová, L.: Statistical and computational phase transitions in spiked tensor estimation. In: 2017 IEEE International Symposium on Information Theory, ISIT 2017, Aachen, Germany, June 25–30, 2017, pp. 511–515 (2017)
Lelarge, M., Miolane, L.: Fundamental limits of symmetric low-rank matrix estimation. Probab. Theory Relat. Fields (2018). https://doi.org/10.1007/s00440-018-0845-x
Miolane, L.: Fundamental limits of low-rank matrix estimation: the non-symmetric case. ArXiv e-prints (Feb 2017)
Coja-Oghlan, A., Krzakala, F., Perkins, W., Zdeborová, L.: Information-theoretic thresholds from the cavity method. arXiv:1611.00814v3 (2016)
Aizenman, M., Sims, R., Starr, S.L.: Extended variational principle for the Sherrington–Kirkpatrick spin-glass model. Phys. Rev. B 68, 214403 (2003)
Barbier, J., Macris, N., Miolane, L.: The layered structure of tensor estimation and its mutual information. In: 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (Sept 2017)
Barbier, J., Krzakala, F., Macris, N., Miolane, L., Zdeborová, L.: Phase transitions, optimal errors and optimality of message-passing in generalized linear models. arXiv preprint arXiv:1708.03395 (2017)
Gabrié, M., Manoel, A., Luneau, C., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: Entropy and mutual information in models of deep neural networks. In: Advances in Neural Information Processing Systems (NIPS), Montréal, CA (2018)
Aubin, B., Maillard, A., Barbier, J., Macris, N., Krzakala, F., Zdeborová, L.: The committee machine: Computational to statistical gaps in learning a two-layers neural network. In: Advances in Neural Information Processing Systems (NIPS), Montréal, CA (2018)
Barbier, J., Macris, N., Maillard, A., Krzakala, F.: The mutual information in random linear estimation beyond i.i.d. matrices. In: IEEE International Symposium on Information Theory (ISIT) (2018)
Barbier, J., Chan, C.-L., Macris, N.: Adaptive path interpolation for sparse systems: application to a simple censored block model. In: IEEE International Symposium on Information Theory (ISIT) (2018)
Pastur, L., Shcherbina, M.: The absence of the selfaverageness of the order parameter in the Sherrington–Kirkpatrick model. J. Stat. Phys. 62(1/2), 1–19 (1991)
Pastur, L., Shcherbina, M., Tirozzi, B.: The replica symmetric solution without replica trick for the Hopfield model. J. Stat. Phys. 74, 1161–1183 (1994)
Shcherbina, M.: On the replica symmetric solution for the Sherrington-Kirkpatrick model. Helvetica Physica Acta 70, 838–853 (1997)
Nishimori, H.: Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford (2001)
Iba, Y.: The Nishimori line and Bayesian statistics. J. Phys. A Math. General 32(21), 3875 (1999)
Mezard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009)
Lesieur, T., Krzakala, F., Zdeborová, L.: MMSE of probabilistic low-rank matrix estimation: universality with respect to the output channel. In: Annual Allerton Conference (2015)
Deshpande, Y., Abbe, E., Montanari, A.: Asymptotic mutual information for the binary stochastic block model. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 185–189 (July 2016)
Guerra, F., Toninelli, F.L.: The thermodynamic limit in mean field spin glass models. Commun. Math. Phys. 230(1), 71–79 (2002)
Giurgiu, A., Macris, N., Urbanke, R.: How to prove the Maxwell conjecture via spatial coupling: a proof of concept. In: 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 458–462 (July 2012)
Nishimori, H.: Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford (2001)
Lesieur, T., Krzakala, F., Zdeborová, L.: Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications. J. Stat. Mech. Theory Exp. 2017(7), 073403 (2017)
Guerra, F., Toninelli, F.: The infinite volume limit in generalised mean field disordered models. Markov Proc. Rel. Fields 9(2), 195–207 (2003)
McDiarmid, C.: On the method of bounded differences. Surv. Comb. 141(1), 148–188 (1989)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)
Guo, D., Wu, Y., Shitz, S.S., Verdú, S.: Estimation in Gaussian noise: properties of the minimum mean-square error. IEEE Trans. Inf. Theory 57(4), 2371–2385 (2011)
Acknowledgements
Jean Barbier acknowledges funding by the Swiss National Science Foundation Grant No. 200021-156672. We thank Thibault Lesieur for providing us with the expression of the RS potential for tensor estimation. We also acknowledge helpful discussions with Olivier Lévêque and Léo Miolane on the stochastic calculus interpretation and continuous version of “Appendix E”.
Appendices
Linking the perturbed and plain free energies
The purpose of this appendix is to prove Lemma 1. We first note that differentiating the function \(\epsilon \mapsto f_{k=1,t=0;\epsilon }\) defined in (24) yields
By a Gaussian integration by parts the last term becomes
By an application of the identity (47) we have \({\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon } S_i] = {\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2]\). Therefore we find
Now by convexity and (47) we have \({\mathbb {E}}[\langle X_i\rangle _{1, 0;\epsilon }^2] \le {\mathbb {E}}[\langle X_i^2\rangle _{1, 0;\epsilon }] = {\mathbb {E}}[S^2]\). Therefore
and the first inequality of the Lemma follows from an application of the mean value theorem.
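Combining the two bounds above, the \(\epsilon \)-derivative of the free energy is controlled uniformly in n. In hedged form (the explicit constant C depends on the normalization conventions of (24) and is not reproduced here),
$$\big|\partial_\epsilon f_{k=1,t=0;\epsilon}\big| \le C\,{\mathbb {E}}_{P_0}[S^2] \qquad\Longrightarrow\qquad \big|f_{k=1,t=0;\epsilon} - f_{k=1,t=0;0}\big| \le C\,{\mathbb {E}}_{P_0}[S^2]\,\epsilon ,$$
the implication being precisely the mean value theorem step mentioned above.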
The second inequality follows from the Lipschitz continuity of the free energy \(f_{k=K,t=1;\epsilon }\) of the decoupled scalar system. We refer to [57] for the proof of this standard fact.
Proof of Lemma 3
The proof of this lemma uses another interpolation:
where \({\mathbf {X}},{\mathbf {X}}', {\mathbf {X}}''\), etc., are i.i.d. replicas distributed according to (22). Computations similar to those in Sect. 2.7 lead to
where we define
Finally from (182) and Cauchy–Schwarz, one obtains
The last equality is true as long as the prior \(P_0\) has bounded first four moments. We prove this claim now. Let us start by studying \({\mathbb {E}}[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2 \rangle _{k,s;\epsilon }]\). Using Cauchy–Schwarz for the inequality and (47) for the subsequent equality,
where the last equality is valid for \(P_0\) with bounded second and fourth moments. For \({\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }]\) we proceed similarly, decoupling the expectations using Cauchy–Schwarz and then using (47) so that only terms depending on the signal \({\mathbf {s}}\) appear. One finds that under the same conditions on the moments of \(P_0\) we have \({\mathbb {E}}[\langle g({\mathbf {X}}, {\mathbf {X}}';{\mathbf {S}})^2 \rangle _{k,s;\epsilon }] = {{\mathcal {O}}}(n^2)\). Combined with (185) this leads to the last equality of (184) and ends the proof.
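To illustrate the first claim concretely, here is a hedged sketch assuming the overlap is normalized as \(q_{{\mathbf {X}},{\mathbf {S}}} = \frac{1}{n}\sum_{i=1}^n X_i S_i\) (an assumption about the notation of the main text). Cauchy–Schwarz, Jensen's inequality and (47) give
$${\mathbb {E}}\big[\langle q_{{\mathbf {X}},{\mathbf {S}}}^2\rangle_{k,s;\epsilon}\big] = \frac{1}{n^2}\sum_{i,j=1}^n {\mathbb {E}}\big[S_i S_j \langle X_i X_j\rangle_{k,s;\epsilon}\big] \le \frac{1}{n^2}\sum_{i,j=1}^n {\mathbb {E}}\big[S_i^2 S_j^2\big]^{1/2}\,{\mathbb {E}}\big[\langle X_i X_j\rangle_{k,s;\epsilon}^2\big]^{1/2} \le \frac{1}{n^2}\sum_{i,j=1}^n {\mathbb {E}}\big[S_i^2 S_j^2\big] \le \max\big({\mathbb {E}}_{P_0}[S^4],\, {\mathbb {E}}_{P_0}[S^2]^2\big),$$
where the middle inequality uses \(\langle X_i X_j\rangle^2 \le \langle X_i^2 X_j^2\rangle \) and then (47) to replace \({\mathbb {E}}[\langle X_i^2 X_j^2\rangle_{k,s;\epsilon}]\) by \({\mathbb {E}}[S_i^2 S_j^2]\); the right hand side is \({{\mathcal {O}}}(1)\) whenever \(P_0\) has bounded second and fourth moments.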
Alternative argument for the lower bound
We present an alternative and useful, albeit not completely rigorous, argument to obtain the lower bound (46). With enough work the argument can be made rigorous. Note that defining
the identity (45) is equivalent to
Setting \(b_n=2a_n\), taking a sequence \(a_n\rightarrow 0\) slowly enough as \(n\rightarrow +\infty \), using Lemma 1 and (28), we obtain
Simple algebra starting from \(\partial _{m_k}{\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta ) = 0\) implies, under the assumption that the extrema are attained at interior points of \({\mathbb {R}}^{K_n}_+\) (this is the point that would have to be worked out to make the argument rigorous), that the minimizer of \({\widetilde{f}}_{\mathrm{RS}}(\{m_k\}_{k=1}^{K_n};\Delta )\) satisfies
The right hand side is independent of k, thus the minimizer is \(m_k=m_*\) for \(k=1, \ldots , K_n\) where
Thus
From (188) we get
which is the inequality (46).
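In hedged summary (the displayed equations referenced above are not reproduced here), the upshot of the k-independence of the minimizer is that the multi-dimensional optimization collapses to a scalar one,
$$\inf_{\{m_k\}_{k=1}^{K_n} \in {\mathbb {R}}^{K_n}_+} {\widetilde{f}}_{\mathrm{RS}}\big(\{m_k\}_{k=1}^{K_n};\Delta\big) \;=\; \inf_{m \ge 0} f_{\mathrm{RS}}(m;\Delta),$$
assuming, as the notation suggests, that \({\widetilde{f}}_{\mathrm{RS}}\) evaluated on a constant sequence \(m_k \equiv m\) reduces to \(f_{\mathrm{RS}}(m;\Delta)\); this is the collapse that, combined with (188), yields (46).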
A consequence of Bayes rule
The purpose of this appendix is to prove the identity (47). Recall that the Gibbs bracket \(\langle - \rangle _{k,t;\epsilon }\) is the average with respect to the posterior \(P_{k,t;\epsilon }({\mathbf {x}}\vert \varvec{\theta })\) where \(\varvec{\theta } :=\{{\mathbf {s}}, \{{\mathbf {z}}^{(k)}, \widetilde{{\mathbf {z}}}^{(k)}\}_{k=1}^K, {\widehat{{\mathbf {z}}}}\}\). Using Bayes law we have:
It remains to notice that
where the Gibbs bracket on the right hand side is an average with respect to the product measure of two posteriors \(P_{k,t;\epsilon }({\mathbf {x}}\vert \varvec{\theta }) P_{k,t;\epsilon }({\mathbf {x}}'\vert \varvec{\theta })\).
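For reference, the resulting identity is the standard Bayes-optimal “Nishimori” identity. In a hedged generic form (the precise statement (47) in the main text may involve additional replicas or a particular choice of test function), it reads: for any suitably integrable function g,
$${\mathbb {E}}\big[\langle g({\mathbf {X}},{\mathbf {X}}')\rangle_{k,t;\epsilon}\big] = {\mathbb {E}}\big[\langle g({\mathbf {X}},{\mathbf {S}})\rangle_{k,t;\epsilon}\big],$$
which expresses that, conditionally on the observations \(\varvec{\theta }\), the signal \({\mathbf {S}}\) and an independent posterior sample \({\mathbf {X}}'\) have the same law.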
A stochastic calculus interpretation
We note that the proofs do not require any upper limit on K. This suggests that it is possible to formulate the adaptive interpolation method entirely in a continuum language. Here we informally show this for the simplest problem, namely symmetric rank-one matrix factorisation, and plan to come back to a rigorous treatment of the continuum formulation in future work.
It is helpful to first write down explicitly the (k, t)-interpolating Hamiltonian (14) (leaving out the perturbation in (21) which is irrelevant for the argument here)
and to define the step-wise function \(m(u) = m_{k^\prime }\) for \(k'/K \le u< (k^\prime +1)/K\), \(k^\prime = 1, \dots , K\).
Let us first look at the terms that do not involve Gaussian noise and become simple Riemann integrals. We have for the contribution coming from (195) and (197),
Similarly, we have for the terms coming from (196) and (198),
Now we treat the more interesting contributions involving the Gaussian noise. Let B(u) be the Wiener process defined by \(B(0)=0\), \({\mathbb {E}}[B(u)] =0\), \({\mathbb {E}}[B(u)B(v)] = \min (u,v)\) for \(u, v \in {\mathbb {R}}_+\). We introduce independent copies \(B_{ij}(u)\), \(i, j = 1,\dots , n\) and consider the sum of increments (also written as an Ito integral)
Since the increments are independent and \({\mathbb {E}}[(B(u) - B(v))^2] = |u - v|\), this is a Gaussian random variable with zero mean and variance \((K + 1 - k - t)/K\). It is therefore equal in distribution to
and the contribution of the (random) Gaussian noise in (195) and (197) becomes
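To make the variance computation explicit, here is a hedged reconstruction (the lower integration limit is inferred from the variance stated above; the \(1/K\) offset with respect to \(1-\tau \) is immaterial in the limit \(K\rightarrow \infty \)):
$$\int_{(k-1+t)/K}^{1} dB_{ij}(u) = B_{ij}(1) - B_{ij}\Big(\frac{k-1+t}{K}\Big) \sim \mathcal {N}\Big(0,\, \frac{K+1-k-t}{K}\Big) \stackrel{d}{=} \sqrt{\frac{K+1-k-t}{K}}\, Z_{ij}, \qquad Z_{ij}\sim \mathcal {N}(0, 1).$$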
To represent the contributions of (196), (198) we introduce independent copies of the Wiener process \(\widetilde{B}_i(u)\), \(i=1, \dots , n\) and form the Ito integral
which has the same variance as
Indeed
Therefore the contribution of (196) and (198) can be represented as
Finally, collecting (199), (200), (203), (207), setting \(\tau :=(t + k)/K\) and \(K \rightarrow \infty \), we obtain a continuous form of the random (k, t)-interpolating Hamiltonian,
where m(u) is an arbitrary trial function and \(\mathbf {B}\) denotes the collection of all Wiener processes. Note that \(\int _\tau ^1 dB_{ij}(u) = B_{ij}(1) - B_{ij}(\tau )\), which is distributed as \(\sqrt{1-\tau }Z_{ij}\) for \(Z_{ij}\sim \mathcal {N}(0, 1)\), and \(\int _0^\tau \sqrt{m(u)}d{\widetilde{B}}_i(u)\) is distributed as \(\sqrt{\int _0^\tau m(u)\, du}\,{\widetilde{Z}}_i\) for \({\widetilde{Z}}_i\sim \mathcal {N}(0, 1)\). Therefore (208) is equal in distribution to
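For completeness, the distributional identity for the second integral is the Itô isometry for a deterministic integrand (a standard fact, restated here in the present notation):
$${\mathbb {E}}\Big[\Big(\int_0^\tau \sqrt{m(u)}\, d{\widetilde{B}}_i(u)\Big)^2\Big] = \int_0^\tau m(u)\, du,$$
so this centered Gaussian random variable has the same law as \(\sqrt{\int_0^\tau m(u)\,du}\,{\widetilde{Z}}_i\) with \({\widetilde{Z}}_i\sim \mathcal {N}(0,1)\).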
Clearly, the usual Guerra–Toninelli interpolation appears as the special case where one chooses a constant trial function \(m(u)=m\). When we go from (208) to (209) we eliminate the Wiener process completely; however, we believe it is useful to keep in mind the point of view expressed by (208), which may turn out to be important for more complicated problems.
Starting from (208) or (209) it is possible to evaluate the free energy change along the interpolation path. We define the free energy
For \(\tau = 0\) we recover the original Hamiltonian \(\mathcal {H}_{k=1, t=0}\) (see (26)) and \(f(0) = f\) given in (7). For \(\tau = 1\), setting \(\int _0^1 du \,m(u) = m_{\mathrm{mf}}\), we recover the mean-field Hamiltonian \(\mathcal {H}_{k=K, t=1}\) (see (31)) and \(f(1) = f_{\mathrm{den}}(\Sigma (\int _0^1 du\, m(u)); \Delta )\). Then, proceeding similarly to Sect. 2.7, one finds the identity
where \(\langle -\rangle _\tau \) is the Gibbs average w.r.t. (208).
Of course this immediately gives the upper bound in Proposition 1. The matching lower bound is obtained by the same ideas as used in the discrete version. We briefly review them informally in the continuous language. One first introduces the \(\epsilon \)-perturbation term (21) and proves a concentration property for the overlap analogous to Lemma 2. Starting with the continuous version of the interpolating Hamiltonian, the proof of the free energy concentration is essentially identical to (even simpler than) the one in Sect. 7, which implies the overlap concentration through the arguments of Sect. 5, which are unchanged. Then, the square in the remainder term is approximately equal to \(({\mathbb {E}}_{{\mathbf {S}}, \mathbf {B}}[\langle q_{{\mathbf {X}}, {\mathbf {S}}}^{}\rangle _{\tau , \epsilon }] - m(\tau ))^2\) and we make it vanish by choosing
This continuous setting thus allows one to avoid proving Lemma 3, and then easily yields the lower bound in Proposition 1. One must still check that (212) has a solution. The right hand side is a function \(G_{n, \epsilon }(\tau ; \int _0^\tau du\, m(u))\), so setting \(x(\tau ) = \int _0^\tau du\, m(u)\), \(dx/d\tau = m(\tau )\), we recognize that (212) is a first-order differential equation with initial condition \(x(0)=0\). The existence of a unique global solution on \(\tau \in [0,1]\) is then proved using the Cauchy–Lipschitz theorem. Moreover, this solution is differentiable and monotone increasing with respect to \(\epsilon \). This last step of the analysis replaces Lemma 4.
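As a purely illustrative aside (not part of the original argument), the construction of the adaptive path through this ordinary differential equation can be mimicked numerically by a simple Euler scheme. In the sketch below, G is a hypothetical placeholder for the map \((\tau , x)\mapsto G_{n,\epsilon }(\tau ; x)\); it is not the actual expression from the main text, which involves a Gibbs average of the overlap.

```python
import numpy as np

def solve_adaptive_path(G, K=1000):
    """Euler scheme for x'(tau) = G(tau, x(tau)), x(0) = 0, on [0, 1].

    Returns the grid tau_k, the cumulative path x(tau_k), and the
    piecewise-constant trial function m(tau_k) = dx/dtau.
    """
    taus = np.linspace(0.0, 1.0, K + 1)
    x = np.zeros(K + 1)
    m = np.zeros(K)
    for k in range(K):
        m[k] = G(taus[k], x[k])      # m(tau_k) = G(tau_k; x(tau_k))
        x[k + 1] = x[k] + m[k] / K   # Euler step: x(tau_{k+1}) = x(tau_k) + m(tau_k)/K
    return taus, x, m

# Toy usage with a made-up bounded Lipschitz map G (illustration only).
if __name__ == "__main__":
    G = lambda tau, x: 1.0 / (1.0 + x)
    taus, x, m = solve_adaptive_path(G)
    print("x(1) =", x[-1], "  m(1-) =", m[-1])
```

For a map that is Lipschitz in its second argument, uniformly in \(\tau \), the Cauchy–Lipschitz theorem guarantees that such a scheme converges to the unique solution as the step size 1/K vanishes, mirroring the role of the discretization \(m_1, \dots , m_K\) used in the main text.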
Cite this article
Barbier, J., Macris, N. The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. Probab. Theory Relat. Fields 174, 1133–1185 (2019). https://doi.org/10.1007/s00440-018-0879-0