Graphical Models and Message-Passing Algorithms: Some Introductory Lectures

Mathematical Foundations of Complex Networked Information Systems

Part of the book series: Lecture Notes in Mathematics (LNMCIME, volume 2141)

Abstract

Graphical models provide a framework for describing statistical dependencies in (possibly large) collections of random variables. At their core lie various correspondences between the conditional independence properties of a random vector, and the structure of an underlying graph used to represent its distribution. They have been used and studied within many sub-disciplines of statistics, applied mathematics, electrical engineering and computer science, including statistical machine learning and artificial intelligence, communication and information theory, statistical physics, network control theory, computational biology, statistical signal processing, natural language processing and computer vision among others.

Notes

  1. Recall that the message \(M_{s\rightarrow t}\) is a vector of \(\vert \mathcal{X}_{t}\vert \) numbers, one for each value \(x_{t} \in \mathcal{X}_{t}\).

Acknowledgements

This work was partially supported by NSF grants CCF-0545862, DMS-0605165 and CCF-0635372, and AFOSR grant 09NL184.

Author information

Corresponding author

Correspondence to Martin J. Wainwright.

Appendix: Triangulation and Equivalent Graph-Theoretic Properties

In this Appendix, we prove Theorem 3 as part of a more general discussion of triangulation and related graph-theoretic properties. Having already defined the notions of triangulation and junction tree, let us now define the closely related notions of decomposable and recursively simplicial graphs. The following notion serves to formalize the “divide-and-conquer” nature of efficient algorithms:

Definition 9

A graph \(\mathcal{G} = (\mathcal{V},\mathcal{E})\) is decomposable if either it is complete, or its vertex set \(\mathcal{V}\) can be split into the disjoint union of three sets \(A \cup B \cup S\) such that (a) \(A\) and \(B\) are non-empty; (b) the set \(S\) separates \(A\) and \(B\) in \(\mathcal{G}\), and is complete (i.e., \((s,t) \in \mathcal{E}\) for all \(s,t \in S\)); and (c) \(A \cup S\) and \(B \cup S\) are also decomposable.

Recall from our discussion of the elimination algorithm in Sect. 3.1 that when a vertex is removed from the graph, the algorithm always connects together all of its neighbors, thereby creating additional edges in the reduced graph. The following property characterizes when there is an elimination ordering such that no edges are added by the elimination algorithm.
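The edge-adding behavior described above can be illustrated with a minimal Python sketch. The function name `eliminate`, the adjacency-dictionary representation, and the two example graphs are assumptions made for illustration; they are not taken from the chapter.

```python
from itertools import combinations


def eliminate(adj, order):
    """Eliminate vertices in the given order; return the fill-in edges added.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    fill = []
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        # Connect together all remaining neighbors of the removed vertex.
        for u, w in combinations(sorted(nbrs), 2):
            if w not in g[u]:
                g[u].add(w)
                g[w].add(u)
                fill.append((u, w))
    return fill


# Hypothetical examples: a chordless 4-cycle, and the same cycle with chord (2, 4).
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chordal = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 4}, 4: {1, 2, 3}}

fill_cycle = eliminate(cycle4, [1, 2, 3, 4])
fill_good = eliminate(chordal, [1, 3, 2, 4])
fill_bad = eliminate(chordal, [2, 1, 3, 4])
```

In this sketch the 4-cycle forces a fill-in edge under the ordering shown, whereas the graph with the chord admits an ordering that adds no edges, although a poorly chosen ordering can still add some.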

Definition 10

A vertex is simplicial if its neighbors form a complete subgraph. A non-empty graph is recursively simplicial if it contains a simplicial vertex \(s\), and when \(s\) is removed, any graph that remains is recursively simplicial.
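Definition 10 can be checked mechanically on small graphs. The following Python sketch uses hypothetical names (`is_simplicial`, `simplicial_order`) and an adjacency-dictionary representation. It removes the first simplicial vertex it finds at each step; this greedy choice is an assumption justified by Theorem 4, since the property is inherited by vertex-induced subgraphs, so any simplicial vertex may be removed first.

```python
def is_simplicial(adj, v):
    """True if the neighbors of v are pairwise adjacent."""
    nbrs = adj[v]
    return all(u in adj[w] for u in nbrs for w in nbrs if u != w)


def simplicial_order(adj):
    """Return an order in which simplicial vertices can be removed one by one,
    or None if the graph is not recursively simplicial."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while g:
        # Greedy choice: take the smallest simplicial vertex, if any.
        s = next((v for v in sorted(g) if is_simplicial(g, v)), None)
        if s is None:
            return None
        for u in g.pop(s):
            g[u].discard(s)
        order.append(s)
    return order


# Hypothetical examples: a chordless 4-cycle, and the same cycle with chord (2, 4).
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chordal = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 4}, 4: {1, 2, 3}}

order_chordal = simplicial_order(chordal)
order_cycle = simplicial_order(cycle4)
```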

It should be intuitively clear that these four properties—namely, triangulated, decomposable, recursively simplicial, and having a junction tree—are related. We now show that all four properties are actually equivalent:

Theorem 4

The following properties of an undirected graph \(\mathcal{G}\) are all equivalent:

Property (T): \(\mathcal{G}\) is triangulated.

Property (D): \(\mathcal{G}\) is decomposable.

Property (R): \(\mathcal{G}\) is recursively simplicial.

Property (J): \(\mathcal{G}\) has a junction tree.

We prove the sequence of implications \((T) \Rightarrow (D) \Rightarrow (R) \Rightarrow (J) \Rightarrow (T)\).

(T) \(\Rightarrow\) (D):

We proceed via induction on the graph size \(N\). The claim is trivial for \(N = 1\), so let us assume it for all graphs with \(N\) vertices, and prove that it also holds for any graph \(\mathcal{G}\) with \(N + 1\) vertices. If \(\mathcal{G}\) is complete, then it is certainly decomposable. Moreover, if \(\mathcal{G}\) has more than one connected component, each of which is complete, then it is also decomposable. Otherwise, we may assume that at least one connected component of \(\mathcal{G}\) is not complete. (Without loss of generality in the argument to follow, we assume that \(\mathcal{G}\) has a single connected component which is not complete.) Since \(\mathcal{G}\) is not complete, it contains two non-adjacent vertices \(a,b\). Let \(S\) be a minimal set that separates \(a\) and \(b\); the set \(S\) must be non-empty since \(\mathcal{G}\) has a single connected component. Define \(A\) as the set of all vertices connected to a in \(\mathcal{V}\setminus S\), and set \(B:= \mathcal{V}\setminus (A \cup S)\). Clearly, \(S\) separates \(A\) from \(B\) in \(\mathcal{G}\).

Now we need to show that \(S\) is complete. If \(\vert S\vert = 1\), the claim is trivial. Otherwise, for any two distinct vertices \(s,t \in S\), there exist paths \((s,a_{1},\ldots,a_{i},t)\) and \((s,b_{1},\ldots,b_{j},t)\) where \(a_{k} \in A\), \(b_{k} \in B\) and \(i,j \geq 1\). (This claim relies on the minimality of \(S\): if there did not exist a path from \(a\) to \(s\), then vertex \(s\) could be removed from \(S\). Similar reasoning applies to establish a path from \(a\) to \(t\), and also the paths involving \(B\).)

We claim that \(s\) and \(t\) are joined by an edge. If not, take the path from \(s\) to \(t\) through \(A\) with minimal length, and similarly for \(B\). This pair of paths forms a cycle of length at least four, which must have a chord. The chord cannot join two vertices of the same path, since this would contradict the minimality of that path. It cannot be between vertices in \(A\) and \(B\), since \(S\) separates these two sets. The only remaining possibility is the edge \((s,t)\). Therefore, \(s\) and \(t\) are joined, and \(S\) is complete.

Finally, we need to show that \(A \cup S\) and \(B \cup S\) are also decomposable. But they must be triangulated, since otherwise \(\mathcal{G}\) would not be triangulated, and they have cardinality strictly smaller than \(N + 1\), so the result follows by induction.
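The construction of \(A\), \(B\) and \(S\) used in this argument can be sketched in Python. The helper names (`component`, `decompose`, `is_complete`) and the greedy way of shrinking a separator to a minimal one are illustrative assumptions, not the chapter's procedure; the sketch also assumes that \(a\) and \(b\) are non-adjacent, so that \(\mathcal{V}\setminus \{a,b\}\) is a valid starting separator.

```python
def component(adj, start, removed):
    """Vertices reachable from start without entering the set removed."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in removed and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def decompose(adj, a, b):
    """Return (A, B, S), where S is an inclusion-minimal set separating a from b."""
    S = set(adj) - {a, b}
    for v in sorted(S):
        # Drop v whenever the smaller set still separates a from b.
        if b not in component(adj, a, S - {v}):
            S.discard(v)
    A = component(adj, a, S)   # all vertices connected to a in V \ S
    B = set(adj) - A - S       # everything else
    return A, B, S


def is_complete(adj, S):
    """True if every pair of distinct vertices in S is joined by an edge."""
    return all(u in adj[v] for u in S for v in S if u != v)


# Hypothetical examples: a chordless 4-cycle, and the same cycle with chord (2, 4).
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chordal = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 4}, 4: {1, 2, 3}}

A, B, S = decompose(chordal, 1, 3)
_, _, S_cycle = decompose(cycle4, 1, 3)
```

A single pass suffices for minimality here because separation is monotone: a vertex that cannot be dropped from the current set cannot be dropped from any smaller set either. In the triangulated example the minimal separator is complete, whereas in the chordless 4-cycle it is not.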

(D) \(\Rightarrow\) (R):

Proof by induction on graph size \(N\). Trivial for \(N = 1\). To complete the induction step, we require the following lemma:

Lemma 6

Every decomposable graph with at least two vertices has at least two simplicial vertices. If the graph is not complete, these vertices can be chosen to be non-adjacent.

Proof

Proof by induction on graph size \(N\). Trivial for \(N = 2\). Consider a decomposable graph with \(N + 1\) vertices. If the graph is complete, all vertices are simplicial. Otherwise, decompose the graph into disjoint sets \(A\), \(B\) and \(S\). The subgraphs \(A \cup S\) and \(B \cup S\) are also decomposable, and hence by the induction hypothesis we have two simplicial vertices in \(A \cup S\). If \(A \cup S\) is not complete, these can be chosen to be non-adjacent. Given that \(S\) is complete, one of the vertices can be taken in \(A\). Otherwise, if \(A \cup S\) is complete, choose any node in \(A\). Proceed in a symmetric fashion for \(B\). The simplicial vertices thus chosen will not be connected, since \(S\) separates \(A\) and \(B\). □

Thus, given a decomposable graph, we can find some simplicial vertex \(s\) to remove. We need to show that the remaining graph is also decomposable, so as to apply the induction hypothesis. In particular, we prove that \(\mathcal{G}\) decomposable implies that any vertex-induced subgraph \(\mathcal{G}[U]\) is also decomposable. We prove this by induction on \(\vert U\vert\). Trivial for \(\vert U\vert = 1\). Trivially true if \(\mathcal{G}\) is complete; otherwise, break into \(A \cup S \cup B\). Removing a node \(s\) from \(S\) leaves \(S\setminus \{s\}\) complete, and \(A \cup S\) and \(B \cup S\) decomposable by the induction hypothesis. Removing a node from \(A\) does not change \(B \cup S\), and either leaves \(A\) empty (in which case the remainder \(B \cup S\) is decomposable), or leaves \(A \cup S\) decomposable by induction.

(R) \(\Rightarrow\) (J):

Proof by induction on graph size \(N\). Trivial for \(N = 1\). Let s be a simplicial vertex, and consider subgraph \(\mathcal{G}'\) obtained by removing s. By induction, \(\mathcal{G}'\) has a junction tree \(\mathcal{T}'\), which we will extend to a junction tree \(\mathcal{T}\) for \(\mathcal{G}\). Let C′ be a maximal clique in \(\mathcal{T}'\) that contains all the neighbors of s; this must exist since s is simplicial. If C′ is precisely the neighbors of s, then we can add s to C′ so as to obtain \(\mathcal{T}\), which is a junction tree for \(\mathcal{G}\).

If not (i.e., if \(C'\) contains the neighbors of \(s\) as a proper subset), then we can add a new clique \(C\), consisting of \(s\) and its neighbors, to \(\mathcal{T}'\), with an edge to \(C'\). Since \(s\) is in no other clique of the resulting tree \(\mathcal{T}\) and \(C\setminus \{s\}\) is a subset of \(C'\), the tree \(\mathcal{T}\) is a junction tree for \(\mathcal{G}\).
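The inductive construction in this step translates fairly directly into code. The following Python sketch is one possible rendering under stated assumptions: the function name `junction_tree`, the representation of the tree as a list of cliques plus index pairs, and the five-vertex example graph are all hypothetical, and the input is assumed to be a non-empty graph.

```python
def junction_tree(adj):
    """Build (cliques, tree_edges) by removing simplicial vertices and then
    adding them back in reverse order; return None if no simplicial vertex
    can be found at some stage."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    removed = []  # pairs (vertex, its neighbors at the time of removal)
    while len(g) > 1:
        s = next((v for v in sorted(g)
                  if all(u in g[w] for u in g[v] for w in g[v] if u != w)),
                 None)
        if s is None:
            return None  # not recursively simplicial
        nbrs = g.pop(s)
        for u in nbrs:
            g[u].discard(s)
        removed.append((s, nbrs))

    cliques = [set(g)]  # base case: a single vertex forms a single clique
    edges = []          # tree edges as pairs of indices into cliques
    for s, nbrs in reversed(removed):
        # Find a clique C' containing all neighbors of s.
        i = next(k for k, c in enumerate(cliques) if nbrs <= c)
        if cliques[i] == nbrs:
            cliques[i].add(s)            # C' is exactly the neighbors: grow it
        else:
            cliques.append(nbrs | {s})   # otherwise attach a new clique to C'
            edges.append((i, len(cliques) - 1))
    return cliques, edges


# Hypothetical example: triangles {1,2,3} and {2,3,4} with a pendant edge (4, 5).
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
cliques, edges = junction_tree(graph)

cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
no_tree = junction_tree(cycle4)
```

For the example graph, the sketch produces a chain of three cliques, while the chordless 4-cycle yields `None` because it has no simplicial vertex.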

(J) \(\Rightarrow\) (T):

Proof by induction on number of vertices M in junction tree. For M = 1, \(\mathcal{G}\) is complete and hence triangulated. Consider a junction tree \(\mathcal{T}\) with M + 1 vertices. For a fixed leaf C of \(\mathcal{T}\), let C′ be the unique neighbor of C in \(\mathcal{T}\), and let \(\mathcal{T}'\) be the tree that remains when C is removed.

Step 1:

If \(C \subseteq C'\), then \(\mathcal{T}'\) is a junction tree for \(\mathcal{G}\), and the result follows by induction.

Step 2:

If \(C \cap C' \subset C\) (in a strict sense), then consider the subgraph \(\mathcal{G}'\) formed by removing the non-empty set \(R:= C\setminus C'\) from \(\mathcal{V}\). We claim that it is chordal. First, observe that \(R\) has an empty intersection with every clique in \(\mathcal{T}'\) (using the junction tree property). (That is, say \(R \cap D\neq \emptyset\) for some clique node \(D\) in \(\mathcal{T}'\). Then there exists \(s \in C \cap D\), but \(s\notin C'\), which violates running intersection, since \(C'\) lies on the path between \(C\) and \(D\).) It follows that \(\mathcal{T}'\) is a junction tree for \(\mathcal{G}'\), and so \(\mathcal{G}'\) is chordal (by applying the induction hypothesis).

Step 3:

Now we claim that \(\mathcal{G}\) is chordal. Any cycle of length at least four that is entirely contained in \(\mathcal{G}'\) has a chord by induction. If the cycle is entirely within the complete subgraph \(\mathcal{G}[C]\), it also has a chord. Any other cycle must intersect \(R\), \(C \cap C'\) and \(\mathcal{V}\setminus C\). In particular, it must cross \(C \cap C'\) twice, and since this set is complete, it has a chord.
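The running intersection property invoked in Step 2 can be verified directly for a candidate clique tree. The Python sketch below uses a hypothetical function name (`is_junction_tree`) and assumes that `edges` already forms a tree over the clique indices; it does not check that the cliques are maximal. It relies on the standard fact that running intersection is equivalent to requiring, for every vertex, that the cliques containing it form a connected subtree, which in a tree holds exactly when the number of such cliques exceeds the number of tree edges between them by one.

```python
def is_junction_tree(adj, cliques, edges):
    """Check a candidate clique tree for the graph adj.

    cliques is a list of vertex sets; edges is a list of index pairs that is
    assumed (not verified) to form a tree on the clique indices.
    """
    # (1) Every clique node must be a complete subgraph of the graph.
    for c in cliques:
        if any(u not in adj[v] for u in c for v in c if u != v):
            return False
    # (2) Every vertex and every edge must be covered by some clique.
    for v, nbrs in adj.items():
        if not any(v in c for c in cliques):
            return False
        for u in nbrs:
            if not any(u in c and v in c for c in cliques):
                return False
    # (3) Running intersection: the cliques containing v induce a subtree.
    for v in adj:
        nodes = sum(v in c for c in cliques)
        links = sum(v in cliques[i] and v in cliques[j] for i, j in edges)
        if nodes - links != 1:
            return False
    return True


# Hypothetical example: triangles {1,2,3} and {2,3,4} with a pendant edge (4, 5).
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
cliques = [{4, 5}, {2, 3, 4}, {1, 2, 3}]

ok = is_junction_tree(graph, cliques, [(0, 1), (1, 2)])
broken = is_junction_tree(graph, cliques, [(0, 2), (1, 2)])
```

In the second call, vertex 4 lies in cliques 0 and 1 but not in clique 2, which sits between them in the tree, so running intersection fails.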

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wainwright, M.J. (2015). Graphical Models and Message-Passing Algorithms: Some Introductory Lectures. In: Fagnani, F., Fosson, S., Ravazzi, C. (eds) Mathematical Foundations of Complex Networked Information Systems. Lecture Notes in Mathematics, vol 2141. Springer, Cham. https://doi.org/10.1007/978-3-319-16967-5_3
