Journal of Statistical Physics, Volume 164, Issue 3, pp 531–574

# Cycle-Based Cluster Variational Method for Direct and Inverse Inference


## Abstract

Large-scale inference problems of practical interest can often be addressed with the help of Markov random fields (MRFs). In principle this requires solving two related problems: the first is to learn offline the parameters of the MRF from empirical data (the inverse problem); the second (the direct problem) is to set up the inference algorithm so that it is as precise, robust and efficient as possible. In this work we address both the direct and the inverse problem with mean-field methods of statistical physics, going beyond the Bethe approximation and the associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be handled in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified so as to avoid feedback loops as much as possible, by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, in which a belief propagation algorithm is run alternately at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem to be solved per region. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular, in the random Ising context we propose two complementary methods, based respectively on fixed-point equations and on a one-parameter log-likelihood minimization. Numerical experiments confirm the effectiveness of this approach for both the direct and the inverse MRF inference problems. Heterogeneous problems of size up to $$10^5$$ are handled in reasonable computational time, notably with better convergence properties than ordinary belief propagation.
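The construction above starts from a cycle basis of the interaction graph. As a hypothetical illustration (this code is not from the paper), the sketch below extracts a *fundamental* cycle basis from a BFS spanning tree: each non-tree edge closes exactly one independent cycle, giving the required $$m - n + 1$$ basis cycles for a connected graph. The paper's method selects a *minimal* cycle basis (shortest total cycle length, computable in polynomial time with Horton's algorithm), which this simpler sketch does not attempt.

```python
from collections import deque

def fundamental_cycle_basis(n, edges):
    """Cycle basis of a connected undirected graph on nodes 0..n-1.

    Each edge left out of a BFS spanning tree closes exactly one
    cycle with the tree; those m - n + 1 cycles form a basis.
    """
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # BFS spanning tree rooted at node 0
    parent, queue = {0: None}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}

    def path_to_root(x):
        path = [x]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    basis = []
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue  # tree edge: closes no independent cycle
        pu, pv = path_to_root(u), path_to_root(v)
        # drop the common tail below the lowest common ancestor
        while len(pu) > 1 and len(pv) > 1 and pu[-2] == pv[-2]:
            pu.pop()
            pv.pop()
        basis.append(pu + pv[:-1][::-1])  # u .. lca .. v, closed by (v, u)
    return basis
```

On a square with one diagonal (5 edges, 4 nodes), this returns the expected 2 independent cycles, here two triangles. In the two-level scheme of the paper, each such basis cycle would define one region on which the inner belief propagation pass runs.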
