Graphical Models and Message-Passing Algorithms: Some Introductory Lectures

Wainwright, Martin J.

doi:10.1007/978-3-319-16967-5_3

Martin J. Wainwright

Part of the book series: Lecture Notes in Mathematics (LNMCIME, volume 2141)


Abstract

Graphical models provide a framework for describing statistical dependencies in (possibly large) collections of random variables. At their core lie various correspondences between the conditional independence properties of a random vector and the structure of an underlying graph used to represent its distribution. They have been used and studied within many sub-disciplines of statistics, applied mathematics, electrical engineering and computer science, including statistical machine learning and artificial intelligence, communication and information theory, statistical physics, network control theory, computational biology, statistical signal processing, natural language processing and computer vision, among others.


Notes

1.
Recall that the message \(M_{s\rightarrow t}\) is a vector of \(\vert \mathcal{X}_{t}\vert \) numbers, one for each value \(x_{t} \in \mathcal{X}_{t}\).
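To illustrate this dimension count, here is a minimal pure-Python sketch of a single sum-product message on a pairwise model. It assumes the standard pairwise update \(M_{s\rightarrow t}(x_t) \propto \sum_{x_s} \psi_s(x_s)\,\psi_{st}(x_s,x_t) \prod_{u \in N(s)\setminus\{t\}} M_{u\rightarrow s}(x_s)\); the function name `message`, its argument layout and the numerical potentials are hypothetical choices made only for illustration.

```python
def message(psi_s, psi_st, incoming):
    """Sum-product message M_{s->t}, returned as a list indexed by x_t.

    psi_s    -- node potential at s, a list of length |X_s|
    psi_st   -- edge potential as a nested list, psi_st[x_s][x_t]
    incoming -- messages M_{u->s} (lists of length |X_s|) from neighbours u of s other than t
    """
    n_s, n_t = len(psi_st), len(psi_st[0])
    # Pointwise product of the node potential and all incoming messages (a function of x_s).
    prod = list(psi_s)
    for m in incoming:
        prod = [p * mi for p, mi in zip(prod, m)]
    # Summing out x_s leaves one number per value of x_t, i.e. |X_t| numbers.
    out = [sum(prod[xs] * psi_st[xs][xt] for xs in range(n_s)) for xt in range(n_t)]
    total = sum(out)
    return [v / total for v in out]  # optional normalization; it only rescales the vector


# Hypothetical example with |X_s| = 2 and |X_t| = 3.
m = message([1.0, 2.0], [[1.0, 0.5, 0.2], [0.2, 0.5, 1.0]], incoming=[[0.5, 0.5]])
print(len(m))  # 3
```

The output length is set by the second axis of the edge potential, which is why the message lives on \(\mathcal{X}_{t}\) rather than \(\mathcal{X}_{s}\).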


Acknowledgements

This work was partially supported by NSF grants CCF-0545862, DMS-0605165 and CCF-0635372, and AFOSR grant 09NL184.

Author information

Authors and Affiliations

Department of Statistics, UC Berkeley, Berkeley, CA, 94720, USA
Martin J. Wainwright


Corresponding author

Correspondence to Martin J. Wainwright.

Editor information

Editors and Affiliations

DISMA, Politecnico di Torino, Torino, Italy
Fabio Fagnani
DET, Politecnico di Torino, Torino, Italy
Sophie M. Fosson
DET, Politecnico di Torino, Torino, Italy
Chiara Ravazzi

Appendix: Triangulation and Equivalent Graph-Theoretic Properties

In this Appendix, we prove Theorem 3 as part of a more general discussion of triangulation and related graph-theoretic properties. Having already defined triangulated graphs and junction trees, let us now define the closely related notions of decomposable and recursively simplicial graphs. The following notion serves to formalize the “divide-and-conquer” nature of efficient algorithms:

Definition 9

A graph \(\mathcal{G} = (\mathcal{V},\mathcal{E})\) is decomposable if either it is complete, or its vertex set \(\mathcal{V}\) can be split into the disjoint union of three sets \(A \cup B \cup S\) such that (a) \(A\) and \(B\) are non-empty; (b) the set \(S\) separates \(A\) and \(B\) in \(\mathcal{G}\), and is complete (i.e., \((s,t) \in \mathcal{E}\) for all \(s,t \in S\)); and (c) \(A \cup S\) and \(B \cup S\) are also decomposable.
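To make Definition 9 concrete, the following sketch tests decomposability by direct recursion on the definition. It is an exponential-time brute-force search meant only for checking small examples; the adjacency-dict representation and the function names are our own illustrative choices. It uses the observation that \(S\) separates \(A\) from \(B\), with \(A \cup B = \mathcal{V}\setminus S\), exactly when \(A\) and \(B\) are unions of connected components of the subgraph induced by \(\mathcal{V}\setminus S\).

```python
from functools import lru_cache
from itertools import combinations


def is_decomposable(adj):
    """Brute-force test of Definition 9; exponential, for small graphs only.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    def complete(vs):
        return all(v in adj[u] for u, v in combinations(vs, 2))

    def components(vs):
        # Connected components of the subgraph induced by vs.
        vs, comps = set(vs), []
        while vs:
            stack, comp = [vs.pop()], set()
            while stack:
                u = stack.pop()
                comp.add(u)
                for w in adj[u] & vs:
                    vs.discard(w)
                    stack.append(w)
            comps.append(comp)
        return comps

    @lru_cache(maxsize=None)
    def dec(vs):
        if complete(vs):
            return True
        for k in range(len(vs) - 1):              # leave at least two vertices outside S
            for S in combinations(sorted(vs), k):
                S = frozenset(S)
                if not complete(S):
                    continue
                comps = components(vs - S)
                # Group the components into non-empty A and B; S then separates A from B.
                for r in range(1, len(comps)):
                    for chosen in combinations(range(len(comps)), r):
                        A = frozenset().union(*(comps[i] for i in chosen))
                        B = vs - S - A
                        if dec(A | S) and dec(B | S):
                            return True
        return False

    return dec(frozenset(adj))


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
chorded = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}  # 4-cycle plus chord (0, 2)
print(is_decomposable(cycle4), is_decomposable(chorded))  # False True
```

Memoizing on the vertex set keeps repeated subproblems cheap, but the search over candidate sets \(S\) remains exponential, so this is a checking tool rather than an algorithm one would use in practice.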

Recall from our discussion of the elimination algorithm in Sect. 3.1 that when a vertex is removed from the graph, the algorithm connects all of its remaining neighbors to one another, which may create additional edges in the reduced graph. The following property characterizes when there is an elimination ordering such that no edges are added by the elimination algorithm.
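The elimination step just described can be sketched as follows; the function records the edges added (often called fill edges) for a given elimination ordering. The name `eliminate` and the adjacency-dict representation are illustrative assumptions. On the 4-cycle, eliminating vertex 0 first forces the new edge \((1,3)\), whereas on the 4-cycle with chord \((0,2)\) the ordering 1, 3, 0, 2 adds nothing.

```python
from itertools import combinations


def eliminate(adj, order):
    """Run vertex elimination in the given order and return the set of added edges.

    adj maps each vertex to its set of neighbours; it is copied, not modified.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    fill = set()
    for v in order:
        nbrs = g.pop(v)              # neighbours of v in the current reduced graph
        for u in nbrs:
            g[u].discard(v)
        # Connect all remaining neighbours of v; any edge not already present is new.
        for a, b in combinations(sorted(nbrs), 2):
            if b not in g[a]:
                g[a].add(b)
                g[b].add(a)
                fill.add((a, b))
    return fill


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
chorded = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(eliminate(cycle4, [0, 1, 2, 3]))   # {(1, 3)}
print(eliminate(chorded, [1, 3, 0, 2]))  # set()
```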

Definition 10

A vertex is simplicial if its neighbors form a complete subgraph. A non-empty graph is recursively simplicial if it contains a simplicial vertex \(s\) such that, when \(s\) is removed, the remaining graph (if non-empty) is recursively simplicial.
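Definition 10 suggests a simple test: repeatedly remove a simplicial vertex and see whether the graph empties out. The sketch below does this greedily, removing the smallest-labelled simplicial vertex at each step. The greedy choice is an assumption we justify from the text rather than from the definition itself: by Theorem 4, recursively simplicial graphs are decomposable, and the proof of (D) \(\Rightarrow\) (R) below shows that removing any vertex keeps a decomposable graph decomposable. Function names are hypothetical.

```python
from itertools import combinations


def is_simplicial(g, v):
    """True if the neighbours of v form a complete subgraph of g."""
    return all(b in g[a] for a, b in combinations(g[v], 2))


def simplicial_ordering(adj):
    """Repeatedly remove a simplicial vertex; return the removal order, or None.

    Greedy: removes the smallest-labelled simplicial vertex at each step.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while g:
        v = next((u for u in sorted(g) if is_simplicial(g, u)), None)
        if v is None:
            return None              # no simplicial vertex: not recursively simplicial
        for u in g.pop(v):
            g[u].discard(v)
        order.append(v)
    return order


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
chorded = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(simplicial_ordering(chorded))  # [1, 0, 2, 3]
print(simplicial_ordering(cycle4))   # None
```

A removal order produced this way is exactly an elimination ordering that adds no edges.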

It should be intuitively clear that these four properties (namely, being triangulated, decomposable, recursively simplicial, and having a junction tree) are related. We now show that all four properties are in fact equivalent:

Theorem 4

The following properties of an undirected graph \(\mathcal{G}\) are all equivalent:

Property (T): \(\mathcal{G}\) is triangulated.
Property (D): \(\mathcal{G}\) is decomposable.
Property (R): \(\mathcal{G}\) is recursively simplicial.
Property (J): \(\mathcal{G}\) has a junction tree.

We prove the sequence of implications \((T) \Rightarrow (D) \Rightarrow (R) \Rightarrow (J) \Rightarrow (T)\).

(T) \(\Rightarrow \) (D):

We proceed by induction on the graph size \(N\). The claim is trivial for \(N = 1\), so let us assume it for all graphs with at most \(N\) vertices, and prove that it also holds for any triangulated graph \(\mathcal{G}\) with \(N + 1\) vertices. If \(\mathcal{G}\) is complete, then it is certainly decomposable. Moreover, if \(\mathcal{G}\) has more than one connected component, each of which is complete, then it is also decomposable. Otherwise, we may assume that at least one connected component of \(\mathcal{G}\) is not complete. (Without loss of generality in the argument to follow, we assume that \(\mathcal{G}\) has a single connected component which is not complete.) Since this component is not complete, it contains two non-adjacent vertices \(a,b\). Let \(S\) be a minimal set that separates \(a\) and \(b\); the set \(S\) must be non-empty since \(a\) and \(b\) lie in the same connected component. Define \(A\) as the set of all vertices connected to \(a\) by a path in \(\mathcal{V}\setminus S\) (including \(a\) itself), and set \(B:= \mathcal{V}\setminus (A \cup S)\). By construction \(b \in B\), and \(S\) separates \(A\) from \(B\) in \(\mathcal{G}\).

Now we need to show that \(S\) is complete. If \(\vert S\vert = 1\), the claim is trivial. Otherwise, for any two distinct vertices \(s,t \in S\), there exist paths \((s,a_{1},\ldots,a_{i},t)\) and \((s,b_{1},\ldots,b_{j},t)\) where \(a_{k} \in A\), \(b_{k} \in B\) and \(i,j \geq 1\). (This claim relies on the minimality of \(S\): if \(s\) had no neighbor in \(A\), then \(s\) could be removed from \(S\) and the remaining set would still separate \(a\) and \(b\). Similar reasoning gives a neighbor of \(t\) in \(A\), and since \(A\) is connected, these two neighbors are joined by a path within \(A\). The paths through \(B\) are obtained in the same way, using the connected component of \(b\) in \(\mathcal{V}\setminus S\).)

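We claim that \(s\) and \(t\) are joined by an edge. Take a path from \(s\) to \(t\) through \(A\) of minimal length, and similarly for \(B\). This pair of paths forms a cycle of length at least four, which must have a chord since \(\mathcal{G}\) is triangulated. No chord can join a vertex in \(A\) to a vertex in \(B\), since \(S\) separates these two sets. Any other chord with an endpoint among the interior vertices \(a_{k}\) would join two vertices of the path through \(A\) and yield a shorter path from \(s\) to \(t\) through \(A\), contradicting minimality; the same holds for the interior vertices \(b_{k}\). The only remaining possibility is the edge \((s,t)\), so \(s\) and \(t\) are joined, and \(S\) is complete.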

Finally, we need to show that \(A \cup S\) and \(B \cup S\) are also decomposable. The subgraphs they induce must be triangulated, since any chordless cycle in an induced subgraph of \(\mathcal{G}\) is also a chordless cycle in \(\mathcal{G}\). Moreover, they have cardinality strictly smaller than \(N + 1\), because \(A\) and \(B\) are both non-empty, so the result follows by induction.
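The key step of this argument, that a minimal separator in a triangulated graph is complete, can be checked on small examples. The sketch below finds a minimum-cardinality separator by brute force (a minimum separator is in particular inclusion-minimal, which is all the proof uses); the helper names and example graphs are our own. On the 4-cycle, which is not triangulated, the separator returned is not complete.

```python
from itertools import combinations


def separates(adj, S, a, b):
    """True if every path from a to b passes through the vertex set S."""
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in S and w not in seen:
                seen.add(w)
                stack.append(w)
    return b not in seen


def minimal_separator(adj, a, b):
    """Smallest vertex set separating non-adjacent a and b (so also inclusion-minimal)."""
    others = sorted(set(adj) - {a, b})
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            if separates(adj, set(S), a, b):
                return set(S)
    return None  # only reached if a and b are adjacent


def is_complete(adj, S):
    return all(v in adj[u] for u, v in combinations(S, 2))


# Two triangles sharing the edge (1, 2), plus a pendant vertex 4: triangulated.
chordal = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
S = minimal_separator(chordal, 0, 3)
print(sorted(S), is_complete(chordal, S))  # [1, 2] True
```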

(D) \(\Rightarrow \) (R):

Proof by induction on graph size \(N\). Trivial for \(N = 1\). To complete the induction step, we require the following lemma:

Lemma 6

Every decomposable graph with at least two vertices has at least two simplicial vertices. If the graph is not complete, these vertices can be chosen to be non-adjacent.

Proof

Proof by induction on graph size \(N\). Trivial for \(N = 2\). Consider a decomposable graph with \(N + 1\) vertices. If the graph is complete, all vertices are simplicial. Otherwise, decompose the graph into disjoint sets \(A\), \(B\) and \(S\) as in Definition 9. The subgraphs induced by \(A \cup S\) and \(B \cup S\) are also decomposable and have fewer vertices. If \(A \cup S\) is not complete, the induction hypothesis yields two non-adjacent simplicial vertices of \(A \cup S\); since \(S\) is complete, at least one of them lies in \(A\). Otherwise, if \(A \cup S\) is complete, choose any vertex in \(A\). In either case the chosen vertex is simplicial in the whole graph, since \(S\) separates \(A\) from \(B\) and so all of its neighbors lie in \(A \cup S\). Proceed in a symmetric fashion for \(B\). The simplicial vertices thus chosen are not adjacent, since \(S\) separates \(A\) and \(B\). □
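Lemma 6 can be illustrated on a small non-complete example of our own: two triangles sharing an edge, plus a pendant vertex, which is triangulated and hence decomposable by Theorem 4. The sketch simply lists the simplicial vertices and picks a non-adjacent pair; it is a direct check, not a construction that follows the inductive proof, and the function names are hypothetical.

```python
from itertools import combinations


def simplicial_vertices(adj):
    """Vertices whose neighbours form a complete subgraph."""
    return [v for v in sorted(adj)
            if all(b in adj[a] for a, b in combinations(adj[v], 2))]


def nonadjacent_simplicial_pair(adj):
    """Return two non-adjacent simplicial vertices, or None if there is no such pair."""
    simp = simplicial_vertices(adj)
    for u, v in combinations(simp, 2):
        if v not in adj[u]:
            return u, v
    return None


chordal = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
print(simplicial_vertices(chordal))          # [0, 4]
print(nonadjacent_simplicial_pair(chordal))  # (0, 4)
```

For the 4-cycle, which is not decomposable, there are no simplicial vertices at all, so the function returns None.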

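Thus, given a decomposable graph, we can find some simplicial vertex \(s\) to remove. We need to show that the remaining graph is also decomposable, so as to apply the induction hypothesis. More generally, we prove that if \(\mathcal{G}\) is decomposable, then any vertex-induced subgraph \(\mathcal{G}[U]\) is also decomposable. Since \(\mathcal{G}[U]\) is obtained by removing vertices one at a time, it suffices to show that removing a single vertex \(v\) preserves decomposability, which we prove by induction on the graph size. The claim is trivial if \(\mathcal{G}\) is complete (which covers the case of a single vertex), since removing a vertex leaves a complete graph; otherwise, split \(\mathcal{V}\) into \(A \cup S \cup B\) as in Definition 9. Removing a vertex \(v \in S\) leaves \(S\setminus \{v\}\) complete and still separating \(A\) from \(B\), while \((A \cup S)\setminus \{v\}\) and \((B \cup S)\setminus \{v\}\) are decomposable by the induction hypothesis. Removing a vertex \(v \in A\) does not change \(B \cup S\), and either leaves \(A\) empty (in which case the remainder \(B \cup S\) is decomposable), or leaves \((A \cup S)\setminus \{v\}\) decomposable by induction. The case \(v \in B\) is symmetric.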

(R) \(\Rightarrow \) (J):

Proof by induction on graph size \(N\). Trivial for \(N = 1\). Let \(s\) be a simplicial vertex whose removal leaves a recursively simplicial graph (as guaranteed by Definition 10), and let \(\mathcal{G}'\) be the subgraph obtained by removing \(s\). By induction, \(\mathcal{G}'\) has a junction tree \(\mathcal{T}'\), which we will extend to a junction tree \(\mathcal{T}\) for \(\mathcal{G}\). Let \(C'\) be a maximal clique in \(\mathcal{T}'\) that contains all the neighbors of \(s\); such a clique must exist since \(s\) is simplicial, so its neighbors form a clique in \(\mathcal{G}'\). If \(C'\) consists precisely of the neighbors of \(s\), then we can add \(s\) to \(C'\) so as to obtain \(\mathcal{T}\), which is a junction tree for \(\mathcal{G}\).

If not (i.e., if \(C'\) contains the neighbors of \(s\) as a proper subset), then we can add a new clique \(C\), consisting of \(s\) and its neighbors, to \(\mathcal{T}'\), with an edge to \(C'\). Since \(s\) belongs to no other clique of the resulting tree \(\mathcal{T}\), and \(C\setminus \{s\}\) is a subset of \(C'\), the running intersection property is preserved, so \(\mathcal{T}\) is a junction tree for \(\mathcal{G}\).
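The inductive construction above can be run directly: processing the vertices in the reverse of a simplicial removal order rebuilds the graph one vertex at a time, and each step either enlarges \(C'\) or attaches a new clique to it. The sketch below is our own illustration of that construction; the function names are hypothetical, and the running-intersection check is included only to verify the output on the example.

```python
from itertools import combinations


def junction_tree_from_ordering(adj, order):
    """Follow the (R) => (J) construction, adding vertices in reverse removal order.

    order must be a simplicial removal order: each vertex is simplicial in the
    graph that remains when it is removed. Returns (cliques, edges), where
    cliques is a list of vertex sets and edges are pairs of clique indices.
    """
    cliques, edges, later = [], [], set()
    for s in reversed(order):
        nbrs = adj[s] & later  # neighbours of s in the graph G' left after removing s
        assert all(b in adj[a] for a, b in combinations(nbrs, 2)), "not simplicial"
        # A clique node C' containing all neighbours of s (None only for the first vertex).
        host = next((i for i, C in enumerate(cliques) if nbrs <= C), None)
        if host is not None and cliques[host] == nbrs:
            cliques[host].add(s)          # C' is exactly N(s): add s to it
        else:
            cliques.append(nbrs | {s})    # new clique {s} plus N(s), attached to C'
            if host is not None:
                edges.append((host, len(cliques) - 1))
        later.add(s)
    return cliques, edges


def has_running_intersection(cliques, edges):
    """Check that the clique nodes containing each vertex form a connected subtree."""
    tree = {i: set() for i in range(len(cliques))}
    for i, j in edges:
        tree[i].add(j)
        tree[j].add(i)
    for v in set().union(*cliques):
        nodes = {i for i, C in enumerate(cliques) if v in C}
        start = min(nodes)
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in tree[i] & nodes:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


chordal = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques, edges = junction_tree_from_ordering(chordal, [0, 1, 2, 3, 4])
print([sorted(C) for C in cliques], edges)  # [[3, 4], [1, 2, 3], [0, 1, 2]] [(0, 1), (1, 2)]
print(has_running_intersection(cliques, edges))  # True
```

The resulting clique nodes are the maximal cliques of the example graph, joined in a chain.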

(J) \(\Rightarrow \) (T):

Proof by induction on the number \(M\) of clique nodes in the junction tree. For \(M = 1\), \(\mathcal{G}\) is complete and hence triangulated. Consider a junction tree \(\mathcal{T}\) with \(M + 1\) nodes. For a fixed leaf \(C\) of \(\mathcal{T}\), let \(C'\) be the unique neighbor of \(C\) in \(\mathcal{T}\), and let \(\mathcal{T}'\) be the tree that remains when \(C\) is removed.

Step 1: If \(C \subseteq C'\), then \(\mathcal{T}'\) is a junction tree for \(\mathcal{G}\), and the result follows by induction.
Step 2: If \(C \cap C' \subset C\) (in a strict sense), then consider the subgraph \(\mathcal{G}'\) formed by removing the non-empty set \(R:= C\setminus C'\) from \(\mathcal{V}\). We claim that it is chordal. First, observe that \(R\) has an empty intersection with every clique in \(\mathcal{T}'\), by the junction tree property. (Indeed, suppose that \(R \cap D\neq \emptyset\) for some clique node \(D\) in \(\mathcal{T}'\). Then there exists \(s \in C \cap D\) with \(s\notin C'\); since the path from \(C\) to \(D\) in \(\mathcal{T}\) passes through \(C'\), this violates the running intersection property.) It follows that \(\mathcal{T}'\) is a junction tree for \(\mathcal{G}'\), and so \(\mathcal{G}'\) is chordal by the induction hypothesis.
Step 3: We now claim that \(\mathcal{G}\) is chordal. Any cycle of length at least four lying entirely in \(\mathcal{G}'\) has a chord, by Step 2, and any such cycle lying entirely within the complete subgraph \(\mathcal{G}[C]\) also has a chord. Any other cycle must intersect \(R\), \(C \cap C'\) and \(\mathcal{V}\setminus C\). Since the vertices of \(R\) belong to no clique other than \(C\), all of their neighbors lie in \(C\), so \(C \cap C'\) separates \(R\) from \(\mathcal{V}\setminus C\). In particular, the cycle must cross \(C \cap C'\) twice, at two vertices that are not consecutive on the cycle; since this set is complete, the edge joining them is a chord.
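As a sanity check on this argument, one can form the graph whose edges are exactly the pairs of vertices lying together in some clique node (for a junction tree whose nodes are the maximal cliques of \(\mathcal{G}\), this is \(\mathcal{G}\) itself), then search exhaustively for a chordless cycle, i.e. an induced cycle of length at least four. The exhaustive search below is our own illustration, exponential in the number of vertices; it relies on the fact that an induced subgraph is a cycle exactly when it is connected and every vertex in it has degree two.

```python
from itertools import combinations


def graph_from_cliques(cliques):
    """Graph whose edges are all pairs of vertices lying together in some clique node."""
    adj = {v: set() for C in cliques for v in C}
    for C in cliques:
        for a, b in combinations(C, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def chordless_cycle(adj):
    """Brute force: return the vertex set of an induced cycle of length >= 4, or None."""
    vs = sorted(adj)
    for k in range(4, len(vs) + 1):
        for U in combinations(vs, k):
            U = set(U)
            # The induced subgraph on U is a cycle iff it is 2-regular and connected.
            if any(len(adj[u] & U) != 2 for u in U):
                continue
            start = next(iter(U))
            seen, stack = {start}, [start]
            while stack:
                u = stack.pop()
                for w in adj[u] & U:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if seen == U:
                return U
    return None


cliques = [{0, 1, 2}, {1, 2, 3}, {3, 4}]   # clique nodes of a junction tree
print(chordless_cycle(graph_from_cliques(cliques)))  # None
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(sorted(chordless_cycle(cycle4)))               # [0, 1, 2, 3]
```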


About this chapter

Cite this chapter

Wainwright, M.J. (2015). Graphical Models and Message-Passing Algorithms: Some Introductory Lectures. In: Fagnani, F., Fosson, S., Ravazzi, C. (eds) Mathematical Foundations of Complex Networked Information Systems. Lecture Notes in Mathematics, vol 2141. Springer, Cham. https://doi.org/10.1007/978-3-319-16967-5_3


DOI: https://doi.org/10.1007/978-3-319-16967-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16966-8
Online ISBN: 978-3-319-16967-5
eBook Packages: Mathematics and Statistics (R0)
