Abstract
We demonstrate that certain large-clique graph triangulations can be useful for reducing computational requirements when making queries on mixed stochastic/deterministic graphical models. This runs counter to the conventional wisdom that triangulations minimizing clique size are always most desirable for computing queries on graphical models. Many of these large-clique triangulations are non-minimal and are thus unattainable via the popular elimination algorithm. We introduce ancestral pairs as the basis for novel triangulation heuristics and prove that, when searching for state-space-optimal triangulations in such graphs, only the addition of edges between ancestral pairs needs to be considered. Empirical results on random and real-world graphs are given. We also present an algorithm, with correctness proof, for determining whether a triangulation can be obtained via elimination, and we show that the decision problem associated with finding state-space-optimal triangulations in this mixed setting is NP-complete.
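To make the abstract's reference to "the popular elimination algorithm" concrete, the following is a minimal sketch of standard elimination-based triangulation: vertices are eliminated in a chosen order, and each vertex's remaining neighbors are connected into a clique, producing fill-in edges. The graph representation and function name are illustrative assumptions, not the paper's implementation; the paper's point is that some useful non-minimal triangulations can never be produced this way, regardless of the order chosen.

```python
def triangulate_by_elimination(neighbors, order):
    """Return the fill-in edges produced by eliminating vertices in `order`.

    neighbors: dict mapping each vertex to the set of its adjacent vertices
    order: list of all vertices, giving the elimination order
    """
    adj = {v: set(ns) for v, ns in neighbors.items()}  # working copy
    fill = set()
    for v in order:
        nbrs = list(adj[v])
        # Connect all pairs of v's remaining neighbors (make them a clique).
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill.add(frozenset((a, b)))
        # Remove v from the remaining graph.
        for u in nbrs:
            adj[u].discard(v)
        del adj[v]
    return fill

# A 4-cycle a-b-c-d needs one chord to be triangulated;
# eliminating 'a' first adds the fill-in edge {b, d}.
cycle = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
print(triangulate_by_elimination(cycle, ['a', 'b', 'c', 'd']))
```

Adding the original edges plus the returned fill-in edges yields a chordal (triangulated) graph, but every triangulation reachable this way is constrained by the clique-completion step, which is why elimination alone cannot reach the non-minimal triangulations the paper studies.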
Editor: Lise Getoor.
Bartels, C.D., Bilmes, J.A. Creating non-minimal triangulations for use in inference in mixed stochastic/deterministic graphical models. Mach Learn 84, 249–289 (2011). https://doi.org/10.1007/s10994-010-5233-4