Abstract
Graphical models provide a framework for describing statistical dependencies in (possibly large) collections of random variables. At their core lie various correspondences between the conditional independence properties of a random vector and the structure of an underlying graph used to represent its distribution. They have been used and studied within many sub-disciplines of statistics, applied mathematics, electrical engineering and computer science, including statistical machine learning and artificial intelligence, communication and information theory, statistical physics, network control theory, computational biology, statistical signal processing, natural language processing and computer vision, among others.
Notes
- 1.
Recall that the message \(M_{s\rightarrow t}\) is a vector of \(\vert \mathcal{X}_{t}\vert \) numbers, one for each value \(x_{t} \in \mathcal{X}_{t}\).
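To make the footnote concrete, the following minimal sketch computes one message as a Python list with \(\vert \mathcal{X}_{t}\vert\) entries. It assumes a pairwise model with node potentials \(\psi_s\) and edge potentials \(\psi_{st}\), and the usual unnormalized sum-product update \(M_{s\rightarrow t}(x_t) = \sum_{x_s} \psi_s(x_s)\,\psi_{st}(x_s,x_t) \prod_{u} M_{u\rightarrow s}(x_s)\), with the product over the neighbors \(u \neq t\) of \(s\). The function name `message` and the list-based potential format are hypothetical choices for illustration.

```python
def message(psi_s, psi_st, incoming):
    """Unnormalized sum-product message M_{s->t}, one entry per value x_t.

    psi_s:    node potential at s, a list indexed by x_s (assumed format)
    psi_st:   edge potential, psi_st[x_s][x_t] (assumed format)
    incoming: messages M_{u->s} from the neighbors u != t of s,
              each a list indexed by x_s
    """
    num_t = len(psi_st[0])  # |X_t|
    out = []
    for x_t in range(num_t):
        total = 0.0
        for x_s, phi in enumerate(psi_s):  # sum over x_s
            term = phi * psi_st[x_s][x_t]
            for m in incoming:
                term *= m[x_s]  # multiply in the messages arriving at s
            total += term
        out.append(total)
    return out


# Binary x_s, ternary x_t: the message has three entries, indexed by x_t.
print(message([1.0, 2.0], [[1, 0, 1], [0, 1, 1]], [[0.5, 0.5]]))  # [0.5, 1.0, 1.5]
```

The loop over `x_t` on the outside makes explicit that the result lives on \(\mathcal{X}_t\), while \(x_s\) is summed out.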
Acknowledgements
This work was partially supported by NSF grants CCF-0545862, DMS-0605165 and CCF-0635372, and AFOSR grant 09NL184.
Appendix: Triangulation and Equivalent Graph-Theoretic Properties
In this Appendix, we prove Theorem 3 as part of a more general discussion of triangulation and related graph-theoretic properties. Having already defined triangulated graphs and junction trees, let us now define the closely related notions of decomposable and recursively simplicial graphs. The first of these formalizes the "divide-and-conquer" nature of efficient algorithms:
Definition 9
A graph \(\mathcal{G} = (\mathcal{V},\mathcal{E})\) is decomposable if either it is complete, or its vertex set \(\mathcal{V}\) can be split into the disjoint union of three sets \(A \cup B \cup S\) such that (a) \(A\) and \(B\) are non-empty; (b) the set \(S\) separates \(A\) and \(B\) in \(\mathcal{G}\), and is complete (i.e., \((s,t) \in \mathcal{E}\) for all distinct \(s,t \in S\)); and (c) the induced subgraphs \(\mathcal{G}[A \cup S]\) and \(\mathcal{G}[B \cup S]\) are also decomposable.
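Conditions (a) and (b) of Definition 9 are easy to test for a proposed split. The sketch below assumes the graph is stored as a Python dict mapping each vertex to the set of its neighbors, a representation chosen here for illustration; the helper names are hypothetical. Condition (c) is recursive and is not checked: one would apply the same test, or the completeness base case, to the induced subgraphs on \(A \cup S\) and \(B \cup S\).

```python
from collections import deque


def separates(adj, S, A, B):
    """True if every path from A to B in the graph passes through S."""
    seen, queue = set(A), deque(A)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in S and w not in seen:  # never step into S
                seen.add(w)
                queue.append(w)
    return not (seen & B)


def is_complete(adj, S):
    """True if every pair of distinct vertices in S is joined by an edge."""
    return all(t in adj[s] for s in S for t in S if s != t)


def is_decomposition_step(adj, A, S, B):
    """Check conditions (a) and (b) of Definition 9 for V = A ∪ S ∪ B."""
    A, S, B = set(A), set(S), set(B)
    partition = (not (A & S or A & B or S & B)) and (A | S | B == set(adj))
    return (partition and bool(A) and bool(B)
            and is_complete(adj, S) and separates(adj, S, A, B))


path = {1: {2}, 2: {1, 3}, 3: {2}}                     # 1 - 2 - 3
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}   # chordless 4-cycle
print(is_decomposition_step(path, {1}, {2}, {3}))       # True
print(is_decomposition_step(cycle, {1}, {2, 4}, {3}))   # False: S is not complete
```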
Recall from our discussion of the elimination algorithm in Sect. 3.1 that when a vertex is removed from the graph, the algorithm always connects together all of its neighbors, which may create additional edges in the reduced graph. The following property characterizes when there is an elimination ordering under which the elimination algorithm adds no edges.
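The edge-adding behavior of elimination can be sketched as follows, again assuming an adjacency-dict representation and a hypothetical function name. Given an elimination ordering, the function removes the vertices in turn, connects the remaining neighbors of each removed vertex, and returns the edges that had to be added; the ordering matters, as the path example shows.

```python
def eliminate(adj, order):
    """Eliminate vertices in the given order; return the edges added."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # copy, leave adj intact
    added = []
    for v in order:
        nbrs = sorted(g[v])  # sorted only to make the output deterministic
        # Connect every pair of remaining neighbors of v.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    added.append((a, b))
        # Remove v together with its incident edges.
        for u in nbrs:
            g[u].discard(v)
        del g[v]
    return added


cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(eliminate(cycle, [1, 2, 3, 4]))  # [(2, 4)]
path = {1: {2}, 2: {1, 3}, 3: {2}}
print(eliminate(path, [1, 2, 3]))      # []
print(eliminate(path, [2, 1, 3]))      # [(1, 3)]
```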
Definition 10
A vertex is simplicial if its neighbors form a complete subgraph. A non-empty graph is recursively simplicial if it contains a simplicial vertex \(s\) such that, when \(s\) is removed, any graph that remains is recursively simplicial.
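Definition 10 suggests a direct test: repeatedly delete a simplicial vertex until the graph is empty. The sketch below (hypothetical names, adjacency-dict representation assumed) deletes the first simplicial vertex it finds rather than searching over all choices. That greedy choice is justified by results proved below: by Theorem 4 and the fact, shown in the proof of (D) \(\Rightarrow\) (R), that vertex-induced subgraphs of decomposable graphs are decomposable, deleting a vertex from a recursively simplicial graph leaves a recursively simplicial graph.

```python
from itertools import combinations


def is_simplicial(g, v):
    """True if the neighbors of v are pairwise adjacent."""
    return all(b in g[a] for a, b in combinations(g[v], 2))


def simplicial_ordering(adj):
    """Repeatedly delete a simplicial vertex.

    Returns the deletion order if the graph is recursively simplicial,
    and None if at some stage no simplicial vertex exists.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while g:
        v = next((u for u in g if is_simplicial(g, u)), None)
        if v is None:
            return None
        order.append(v)
        for u in g.pop(v):  # remove v and its incident edges
            g[u].discard(v)
    return order


chorded = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 4}, 4: {1, 2, 3}}  # 4-cycle plus chord (2, 4)
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(simplicial_ordering(chorded))  # [1, 2, 3, 4]
print(simplicial_ordering(cycle))    # None
```

Since each deleted vertex is simplicial at the moment it is removed, its neighbors are already pairwise adjacent, so eliminating vertices in the returned order adds no edges.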
It should be intuitively clear that these four properties (namely, being triangulated, decomposable, recursively simplicial, and having a junction tree) are related. We now show that all four properties are in fact equivalent:
Theorem 4
The following properties of an undirected graph \(\mathcal{G}\) are all equivalent:
- Property (T): \(\mathcal{G}\) is triangulated.
- Property (D): \(\mathcal{G}\) is decomposable.
- Property (R): \(\mathcal{G}\) is recursively simplicial.
- Property (J): \(\mathcal{G}\) has a junction tree.
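For experimenting with Theorem 4 on small examples, Property (T) can be checked by brute force. The sketch below (hypothetical name, adjacency-dict representation assumed) searches for a chordless cycle of length at least four, using the fact that the vertex set of such a cycle induces exactly that cycle, i.e., a connected subgraph in which every vertex has two neighbors. Its running time is exponential in the number of vertices, so it is only meant for tiny graphs.

```python
from itertools import combinations


def is_triangulated(adj):
    """True if no set of four or more vertices induces a chordless cycle."""
    vertices = list(adj)
    for k in range(4, len(vertices) + 1):
        for subset in combinations(vertices, k):
            sub = set(subset)
            # Every vertex of an induced cycle has exactly two neighbors in it.
            if any(len(adj[v] & sub) != 2 for v in sub):
                continue
            # A 2-regular induced subgraph is a single cycle iff it is connected.
            seen, stack = {subset[0]}, [subset[0]]
            while stack:
                for w in adj[stack.pop()] & sub:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if seen == sub:
                return False
    return True


cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chorded = {1: {2, 4}, 2: {1, 3, 4}, 3: {2, 4}, 4: {1, 2, 3}}
print(is_triangulated(cycle), is_triangulated(chorded))  # False True
```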
We prove the sequence of implications \((T) \Rightarrow (D) \Rightarrow (R) \Rightarrow (J) \Rightarrow (T)\).
(T) \(\Rightarrow\) (D):
We proceed via induction on the graph size \(N\). The claim is trivial for \(N = 1\), so let us assume it for all graphs with at most \(N\) vertices, and prove that it also holds for any graph \(\mathcal{G}\) with \(N + 1\) vertices. If \(\mathcal{G}\) is complete, then it is certainly decomposable. Moreover, if \(\mathcal{G}\) has more than one connected component, each of which is complete, then it is also decomposable. Otherwise, we may assume that at least one connected component of \(\mathcal{G}\) is not complete. (Without loss of generality in the argument to follow, we assume that \(\mathcal{G}\) has a single connected component, which is not complete.) Since \(\mathcal{G}\) is not complete, it contains two non-adjacent vertices \(a,b\). Let \(S\) be a minimal set that separates \(a\) and \(b\); the set \(S\) must be non-empty since \(\mathcal{G}\) is connected. Define \(A\) as the set of all vertices joined to \(a\) by a path in \(\mathcal{V}\setminus S\) (including \(a\) itself), and set \(B:= \mathcal{V}\setminus (A \cup S)\). Both sets are non-empty, since \(a \in A\) and \(b \in B\), and clearly \(S\) separates \(A\) from \(B\) in \(\mathcal{G}\).
Now we need to show that \(S\) is complete. If \(\vert S\vert = 1\), the claim is trivial. Otherwise, for any two distinct vertices \(s,t \in S\), there exist paths \((s,a_{1},\ldots,a_{i},t)\) and \((s,b_{1},\ldots,b_{j},t)\) where \(a_{k} \in A\), \(b_{k} \in B\) and \(i, j \geq 1\). (This claim relies on the minimality of \(S\): if \(s\) had no neighbor in \(A\), then \(S\setminus \{s\}\) would still separate \(a\) and \(b\), so vertex \(s\) could be removed from \(S\). The same reasoning shows that \(t\) has a neighbor in \(A\), and since \(A\) is connected, these neighbors are joined by a path within \(A\). Similar reasoning, applied to the connected component of \(b\) in \(\mathcal{V}\setminus S\), which is contained in \(B\), yields the path through \(B\).)
We claim that \(s\) and \(t\) are joined by an edge. If not, take the path from \(s\) to \(t\) through \(A\) with minimal length, and similarly for \(B\). This pair of paths forms a cycle of length \(i + j + 2 \geq 4\), which must have a chord. By assumption the chord is not \((s,t)\). It cannot join two vertices of the same path, since this would yield a shorter path through \(A\) (or through \(B\)) and contradict minimality. It cannot join a vertex of \(A\) to a vertex of \(B\), since \(S\) separates these two sets. This contradiction shows that \(s\) and \(t\) are adjacent, and hence \(S\) is complete.
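Finally, we need to show that \(\mathcal{G}[A \cup S]\) and \(\mathcal{G}[B \cup S]\) are also decomposable. Both are triangulated, since a chordless cycle in an induced subgraph of \(\mathcal{G}\) would also be a chordless cycle in \(\mathcal{G}\). Moreover, since \(A\) and \(B\) are both non-empty, each of these subgraphs has at most \(N\) vertices, so the result follows by the induction hypothesis.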
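The construction in the proof of (T) \(\Rightarrow\) (D) can be sketched in code. The text does not say how to find a minimal separator; as an assumption, this sketch starts from the neighborhood of \(a\), which separates \(a\) from any non-adjacent \(b\), and deletes vertices one at a time while separation is preserved. Because enlarging a separator by vertices other than \(a\) and \(b\) keeps it a separator, one pass of this kind ends at an inclusion-minimal separator. Function names and the adjacency-dict representation are hypothetical; for a triangulated input, the argument above guarantees that the returned \(S\) is complete.

```python
from collections import deque


def reachable(adj, start, blocked):
    """Vertices reachable from start by paths that avoid the set blocked."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def decompose(adj, a, b):
    """Return (A, S, B) built as in the proof, for non-adjacent a and b."""
    assert a != b and b not in adj[a], "a and b must be non-adjacent"
    S = set(adj[a])  # the neighborhood of a separates a from b
    for s in sorted(S):  # iterate over a sorted copy, so S may shrink
        if b not in reachable(adj, a, S - {s}):
            S.discard(s)  # S \ {s} still separates a and b
    A = reachable(adj, a, S)  # vertices joined to a outside S
    B = set(adj) - A - S
    return A, S, B


# Two triangles {1, 2, 3} and {2, 3, 4} sharing the edge (2, 3).
g = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
A, S, B = decompose(g, 1, 4)
print(A, S, B)  # {1} {2, 3} {4}
```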
(D) \(\Rightarrow\) (R):
Proof by induction on graph size \(N\). Trivial for \(N = 1\). To complete the induction step, we require the following lemma:
Lemma 6
Every decomposable graph with at least two vertices has at least two simplicial vertices. If the graph is not complete, these vertices can be chosen to be non-adjacent.
Proof
Proof by induction on graph size \(N\). Trivial for \(N = 2\). Consider a decomposable graph with \(N + 1\) vertices. If the graph is complete, all vertices are simplicial. Otherwise, decompose the graph into disjoint sets \(A\), \(B\) and \(S\). The subgraphs \(\mathcal{G}[A \cup S]\) and \(\mathcal{G}[B \cup S]\) are also decomposable, and each has at most \(N\) vertices. If \(A \cup S\) is not complete, then by the induction hypothesis it has two non-adjacent simplicial vertices; given that \(S\) is complete, at most one of them lies in \(S\), so one of them can be taken in \(A\). Otherwise, if \(A \cup S\) is complete, choose any vertex in \(A\). In either case, the chosen vertex is also simplicial in \(\mathcal{G}\), because \(S\) separates \(A\) from \(B\), so all of its neighbors lie in \(A \cup S\). Proceed in a symmetric fashion for \(B\). The two simplicial vertices thus chosen are not adjacent, since \(S\) separates \(A\) and \(B\). □
Thus, given a decomposable graph, we can find some simplicial vertex \(s\) to remove. We need to show that the remaining graph is also decomposable, so as to apply the induction hypothesis. More generally, we prove that if \(\mathcal{G}\) is decomposable, then so is any vertex-induced subgraph \(\mathcal{G}[U]\). Since vertices can be removed one at a time, it suffices to show that deleting a single vertex \(v\) preserves decomposability, which we prove by induction on the number of vertices. Every induced subgraph of a complete graph is complete, which settles the base case and the case where \(\mathcal{G}\) is complete; otherwise, break \(\mathcal{V}\) into \(A \cup S \cup B\). If \(v \in S\), then \(S\setminus \{v\}\) remains complete and still separates \(A\) and \(B\), while \((A \cup S)\setminus \{v\}\) and \((B \cup S)\setminus \{v\}\) induce decomposable graphs by the induction hypothesis. If \(v \in A\), then \(B \cup S\) is unchanged, and either \(A\setminus \{v\}\) is empty (in which case the remainder \(\mathcal{G}[B \cup S]\) is decomposable), or \((A\setminus \{v\}) \cup S\) induces a decomposable graph by induction. The case \(v \in B\) is symmetric.
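Lemma 6 can be checked on small decomposable graphs. The sketch below (hypothetical names, adjacency-dict representation assumed) lists the simplicial vertices and returns a non-adjacent pair of them, if one exists; for a decomposable graph that is not complete, Lemma 6 guarantees that such a pair is found.

```python
from itertools import combinations


def simplicial_vertices(adj):
    """Vertices whose neighbors are pairwise adjacent."""
    return [v for v in adj
            if all(b in adj[a] for a, b in combinations(adj[v], 2))]


def nonadjacent_simplicial_pair(adj):
    """Return two non-adjacent simplicial vertices, or None if none exist."""
    for u, v in combinations(simplicial_vertices(adj), 2):
        if v not in adj[u]:
            return u, v
    return None


# A triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 2.
g = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
print(simplicial_vertices(g))          # [1, 3, 4]
print(nonadjacent_simplicial_pair(g))  # (1, 4)
```

For a complete graph every vertex is simplicial but all of them are adjacent, so the function returns `None`, consistent with the lemma's restriction to non-complete graphs.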
(R) \(\Rightarrow\) (J):
Proof by induction on graph size \(N\). Trivial for \(N = 1\). Let \(s\) be a simplicial vertex whose removal leaves a recursively simplicial graph (such a vertex exists by Definition 10), and consider the subgraph \(\mathcal{G}'\) obtained by removing \(s\). By induction, \(\mathcal{G}'\) has a junction tree \(\mathcal{T}'\), which we will extend to a junction tree \(\mathcal{T}\) for \(\mathcal{G}\). Let \(C'\) be a maximal clique in \(\mathcal{T}'\) that contains all the neighbors of \(s\); this must exist since \(s\) is simplicial, so its neighbors form a complete subgraph of \(\mathcal{G}'\). If \(C'\) is precisely the set of neighbors of \(s\), then we can add \(s\) to \(C'\) so as to obtain \(\mathcal{T}\), which is a junction tree for \(\mathcal{G}\).
If not (i.e., if \(C'\) contains the neighbors of \(s\) as a proper subset), then we can add to \(\mathcal{T}'\) a new clique \(C\) consisting of \(s\) and its neighbors, with an edge to \(C'\). Since \(s\) is in no other clique of \(\mathcal{T}\) and \(C\setminus \{s\}\) is a subset of \(C'\), the resulting tree \(\mathcal{T}\) is a junction tree for \(\mathcal{G}\).
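The induction in (R) \(\Rightarrow\) (J) translates into a simple construction. The sketch below (hypothetical names, adjacency-dict representation assumed) takes a simplicial deletion order, such as one obtained by repeatedly removing simplicial vertices, and re-inserts the vertices in reverse order. Each re-inserted vertex \(s\) either enlarges the clique node that equals its set of neighbors, or forms a new clique node attached to a node containing that set, mirroring the two cases of the proof. Clique nodes are stored as a list of frozensets and tree edges as index pairs, a representation chosen for illustration.

```python
def junction_tree(adj, order):
    """Build a junction tree by reversing a simplicial deletion order.

    order[0] is the vertex deleted first; each order[k] must be simplicial
    in the subgraph induced by order[k:].
    Returns (nodes, edges): cliques as frozensets, tree edges as index pairs.
    """
    nodes, edges, present = [], [], set()
    for s in reversed(order):
        nbrs = frozenset(adj[s] & present)  # neighbors already re-inserted
        host = next((i for i, c in enumerate(nodes) if nbrs <= c), None)
        if host is not None and nodes[host] == nbrs:
            nodes[host] = nbrs | {s}  # C' is exactly the neighborhood: add s
        else:
            nodes.append(nbrs | {s})  # new clique made of s and its neighbors
            if host is not None:
                edges.append((host, len(nodes) - 1))  # attach it to C'
        present.add(s)
    return nodes, edges


# Two triangles {1, 2, 3} and {2, 3, 4} sharing the edge (2, 3);
# deleting 1, 2, 3, 4 in turn removes a simplicial vertex each time.
g = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
nodes, edges = junction_tree(g, [1, 2, 3, 4])
print(nodes, edges)  # clique nodes {2, 3, 4} and {1, 2, 3}, joined by one edge
```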
(J) \(\Rightarrow\) (T):
Proof by induction on the number \(M\) of nodes in the junction tree. For \(M = 1\), \(\mathcal{G}\) is complete and hence triangulated. Consider a junction tree \(\mathcal{T}\) with \(M + 1\) nodes. For a fixed leaf \(C\) of \(\mathcal{T}\), let \(C'\) be the unique neighbor of \(C\) in \(\mathcal{T}\), and let \(\mathcal{T}'\) be the tree that remains when \(C\) is removed.
- Step 1: If \(C \subseteq C'\), then \(\mathcal{T}'\) is a junction tree for \(\mathcal{G}\), and the result follows by induction.
- Step 2: If \(C \cap C' \subsetneq C\), then consider the subgraph \(\mathcal{G}'\) formed by removing the non-empty set \(R:= C\setminus C'\) from \(\mathcal{V}\). We claim that it is chordal. First, observe that \(R\) has an empty intersection with every clique in \(\mathcal{T}'\), by the running intersection property. (That is, suppose \(R \cap D\neq \emptyset\) for some clique node \(D\) in \(\mathcal{T}'\). Then there exists \(s \in C \cap D\) with \(s\notin C'\), which violates running intersection, since the path from \(C\) to \(D\) in \(\mathcal{T}\) passes through \(C'\).) It follows that \(\mathcal{T}'\) is a junction tree for \(\mathcal{G}'\), and so \(\mathcal{G}'\) is chordal by the induction hypothesis.
- Step 3: We now claim that \(\mathcal{G}\) is chordal, i.e., that every cycle of length at least four has a chord. Any such cycle entirely contained in \(\mathcal{G}'\) has a chord by Step 2. If the cycle lies entirely within the complete subgraph \(\mathcal{G}[C]\), it also has a chord. Any other cycle must contain a vertex of \(R\) and a vertex of \(\mathcal{V}\setminus C\). Since the vertices of \(R\) appear only in the clique \(C\), all of their neighbors lie in \(C\), so \(C \cap C'\) separates \(R\) from \(\mathcal{V}\setminus C\). Hence each of the two arcs of the cycle between these two vertices passes through \(C \cap C'\), giving two vertices of \(C \cap C'\) that are not consecutive on the cycle. Since \(C \cap C'\) is complete, they are joined by an edge, which is a chord.
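Step 2 hinges on the running intersection property. The sketch below (hypothetical name; cliques given as a list of sets and tree edges as index pairs, a representation chosen for illustration) checks that, for every vertex, the tree nodes containing it form a connected subtree. It assumes the edges already form a tree and does not check that every edge of \(\mathcal{G}\) is covered by some clique.

```python
def has_running_intersection(nodes, edges):
    """True if, for each vertex, the nodes containing it form a subtree.

    nodes: list of cliques (sets); edges: index pairs forming a tree.
    """
    nbr = {i: set() for i in range(len(nodes))}
    for i, j in edges:
        nbr[i].add(j)
        nbr[j].add(i)
    for v in set().union(*nodes):
        holders = {i for i, c in enumerate(nodes) if v in c}
        start = min(holders)
        seen, stack = {start}, [start]
        while stack:  # search only among the nodes that contain v
            for j in nbr[stack.pop()] & holders:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holders:
            return False
    return True


good = [{1, 2}, {2, 3}, {3, 4}]  # chain {1,2} - {2,3} - {3,4}
bad = [{1, 2}, {3, 4}, {2, 5}]   # vertex 2 skips the middle node
chain = [(0, 1), (1, 2)]
print(has_running_intersection(good, chain))  # True
print(has_running_intersection(bad, chain))   # False
```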
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Wainwright, M.J. (2015). Graphical Models and Message-Passing Algorithms: Some Introductory Lectures. In: Fagnani, F., Fosson, S., Ravazzi, C. (eds) Mathematical Foundations of Complex Networked Information Systems. Lecture Notes in Mathematics, vol 2141. Springer, Cham. https://doi.org/10.1007/978-3-319-16967-5_3
DOI: https://doi.org/10.1007/978-3-319-16967-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16966-8
Online ISBN: 978-3-319-16967-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)