Statistics and Computing

, Volume 14, Issue 2, pp 143–166 | Cite as

Tree consistency and bounds on the performance of the max-product algorithm and its generalizations

  • Martin Wainwright
  • Tommi Jaakkola
  • Alan Willsky


Finding the maximum a posteriori (MAP) assignment of a discrete-state distribution specified by a graphical model requires solving an integer program. The max-product algorithm, also known as the max-plus or min-sum algorithm, is an iterative method for (approximately) solving such a problem on graphs with cycles. We provide a novel perspective on the algorithm, which is based on the idea of reparameterizing the distribution in terms of so-called pseudo-max-marginals on nodes and edges of the graph. This viewpoint provides conceptual insight into the max-product algorithm in application to graphs with cycles. First, we prove the existence of max-product fixed points for positive distributions on arbitrary graphs. Next, we show that the approximate max-marginals computed by max-product are guaranteed to be consistent, in a suitable sense to be defined, over every tree of the graph. We then turn to characterizing the nature of the approximation to the MAP assignment computed by max-product. We generalize previous work by showing that for any graph, the max-product assignment satisfies a particular optimality condition with respect to any subgraph containing at most one cycle per connected component. We use this optimality condition to derive upper bounds on the difference between the log probability of the true MAP assignment, and the log probability of a max-product assignment. Finally, we consider extensions of the max-product algorithm that operate over higher-order cliques, and show how our reparameterization analysis extends in a natural manner.

MAP estimation integer programming max-product min-sum max-plus graphical models max-marginals inference iterative decoding Viterbi algorithm dynamic programming 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. Aji S.M., Horn G., McEliece R.J., and Xu M. 1998. Iterative min-sum decoding of tail-biting codes. In: Proc. IEEE Information Theory Workshop, Killarney Ireland, pp. 68–69.Google Scholar
  2. Aji S.M. and McEliece R.J. 2000. The generalized distributive law. IEEE Trans. Info. Theory 46: 325–343.Google Scholar
  3. Barahona F. 1982. On the computational computational complexity of the Ising model. Journal of Physics A: Mathematical and General 15: 3241–3253.Google Scholar
  4. Benedetto S., Montorsi G., Divsalar D., and Pollara F. 1996. Soft-output decoding algorithms in iterative decoding of turbo codes. Technical report, Jet Propulsion Laboratory.Google Scholar
  5. Berge C. 1989. Hypergraphs. North-Holland Publishing Company, Amsterdam.Google Scholar
  6. Bertsekas D.P. 1995. Dynamic Programming and Stochastic Control, vol. 1. Athena Scientific, Belmont, MA.Google Scholar
  7. Besag J. 1986. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B 48(3): 259–279.Google Scholar
  8. Bodlaender H. 1993. A tourist guide through treewidth. Acta Cybernetica 11: 1–21.Google Scholar
  9. Bollobás B. 1998. Modern Graph Theory. Springer-Verlag, New York.Google Scholar
  10. Boykov Y., Veksler O., and Zabih R. 2001. Fast approximate energy minimization via graph cuts. IEEE. Trans. PAMI 23: 1222–1238.Google Scholar
  11. Brémaud P. 1991. Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Springer.Google Scholar
  12. Censor Y. and Zenios S.A. 1988. Parallel Optimization: Theory, Algorithms, and Applications. Numerical Mathematics and Scientific Computation. Oxford University Press.Google Scholar
  13. Cowell R.G., Dawid A.P., Lauritzen S.L., and Spiegelhalter D.J. 1999. Probabilistic Networks and Expert Systems. Statistics for Engineering and Information Science. Springer-Verlag.Google Scholar
  14. Dawid A.P. 1992. Applications of a general propagation algorithm for probabilistic expert systems. Statistics and Computing 2: 25–36.Google Scholar
  15. Forney G.D., Jr. 1973. The Viterbi algorithm. Proc. IEEE 61: 268–277.Google Scholar
  16. Forney G.D., Jr., Kschischang F.R., Marcus B., and Tuncel S. 2001a. Iterative decoding of tail-biting trellises and connections with symbolic dynamics. In: Codes, Systems and Graphical Models. Springer, pp. 239–264.Google Scholar
  17. Forney G.D., Jr., Koetter R., Kschischang F.R., and Reznick A. 2001b. On the effective weights of pseudocodewords for codes defined on graphs with cycles. In: Codes, Systems and Graphical Models. Springer, pp. 101–112.Google Scholar
  18. Freeman W.T. and Pasztor E.C. 1999. Learning to estimate scenes from images. In: NIPS, vol. 11.Google Scholar
  19. Freeman W.T. and Weiss Y. 2001. On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Trans. Info. Theory 47: 736–744.Google Scholar
  20. Frey B.J. and Koetter R. 2000. Exact inference using the attenuated maxproduct algorithm. In: Advanced Mean Field Methods: Theory and Practice. MIT Press.Google Scholar
  21. Frey B.J., Koetter R., and Vardy A. 2001. Signal-space characterization of iterative decoding. IEEE Trans. Info. Theory. 47: 766–781.Google Scholar
  22. Fridman A. 2002.Topology-corrected belief revision in the ising model. Presented at Snowbird Learning workshop, Snowbird, UT.Google Scholar
  23. Gallager R.G. 1963. Low-Density Parity Check Codes. MIT Press, Cambridge, MA.Google Scholar
  24. Geman S. and Geman D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Pat. Anal. Mach. Intell. 6: 721–741.Google Scholar
  25. Goemans M.X. and Williamson D.P. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM 42: 1115–1145.Google Scholar
  26. Grötschel M., Lovász L., and Schrijver A. 1993. Geometric algorithms and combinatorial optimization. Springer-Verlag, Berlin, Germany.Google Scholar
  27. Horn G. 1999. Iterative decoding and pseudocodewords. PhD thesis, California Institute of Technology.Google Scholar
  28. Jordan M. (Ed.) 1999. Learning in graphical models. MIT Press, Cambridge, MA.Google Scholar
  29. Kleinberg J. and Tardos E. 1999. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In: IEEE Symp. Found. Comp. Science, pp. 14–24.Google Scholar
  30. Lauritzen S.L. 1996. Graphical models. Oxford University Press, Oxford.Google Scholar
  31. McEliece R.J. and Yildirim M. 2002. Belief propagation on partially ordered sets. In: Gilliam D. and Rosenthal J. (Eds.), Mathematical Theory of Systems and Networks. Institute for Mathematics and its Applications.Google Scholar
  32. Nilsson D. 1998. An efficient algorithm for finding the M most probable configurations in a probabilistic expert system. Statistics and Computing 8: 159–173.Google Scholar
  33. Ortega J.M. and Rheinboldt W.C. 2000. Iterative solution of nonlinear equations in several variables. Classics in Applied Mathematics. SIAM, New York.Google Scholar
  34. Pakzad P. and Anantharam V. 2002. Iterative algorithms and free energy minimization. In: Conference on Information Sciences and Systems.Google Scholar
  35. Pearl J. 1988. Probabilistic reasoning in intelligent systems. Morgan Kaufman, San Mateo.Google Scholar
  36. Stanley R.P. 1997. Enumerative combinatorics, vol. 1. Cambridge University Press, Cambridge, UK.Google Scholar
  37. Verdú S. and Poor H.V. 1987. Abstract dynamic programming models under commutativity conditions. SIAM J. Control and Optimization 25(4): 990–1006.Google Scholar
  38. Viterbi A. 1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. Info. Theory IT-13: 260–269.Google Scholar
  39. Wainwright M.J., Jaakkola T.S., and Willsky A.S. 2003. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Trans. Info. Theory 49(5): 1120–1146.Google Scholar
  40. Wainwright M.J. 2002. Stochastic processes on graphs with cycles: Geometric and variational approaches. PhD thesis, MIT.Google Scholar
  41. Weiss. Y. 2000. Correctness of local probability propagation in graphical models with loops. Neural Computation 12: 1–41.Google Scholar
  42. Wiberg N. 1996. Codes and decoding on general graphs. PhD thesis, University of Linkoping, Sweden.Google Scholar
  43. Yedidia J.S., Freeman W.T., and Weiss Y. 2001. Generalized belief propagation. In: NIPS 13, MIT Press, pp. 689–695.Google Scholar

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Martin Wainwright
    • 1
  • Tommi Jaakkola
    • 2
  • Alan Willsky
    • 2
  1. 1.Electrical Engineering & Computer ScienceUC BerkeleyBerkeley
  2. 2.Electrical Engineering & Computer ScienceMITCambridge

Personalised recommendations