A “Microscopic” Study of Minimum Entropy Search in Learning Decomposable Markov Networks
 Y. Xiang,
 S.K.M. Wong,
 N. Cercone
Abstract
Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Though entropy and related scoring metrics have been widely used, their “microscopic” properties and asymptotic behavior in a search have not been analyzed. We present such a “microscopic” study of a minimum entropy search algorithm, and show that it learns an I-map of the domain model when the data size is large.
Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain together with a multi-link search strategy would provide an effective way to uncover many domain models.
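The limitation of single-link search can be illustrated with a small sketch. The example below is not taken from the paper: it assumes a three-variable parity distribution (X2 = X0 XOR X1) as one distribution that defeats link-at-a-time search, and it uses the standard identity that the entropy of a decomposable model equals the sum of its clique entropies minus the sum of its separator entropies. The function names `marginal`, `entropy`, and `dmn_entropy` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product
from math import log2


def marginal(joint, keep):
    """Marginalize a joint {assignment tuple: probability} onto the indices in `keep`."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def dmn_entropy(joint, cliques, separators):
    """Entropy of the decomposable model with the given cliques and separators:
    sum of clique entropies minus sum of separator entropies."""
    return (sum(entropy(marginal(joint, c)) for c in cliques)
            - sum(entropy(marginal(joint, s)) for s in separators))


# Assumed toy domain: X0 and X1 are fair independent bits, X2 = X0 XOR X1.
joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

# Empty graph: every variable is its own clique.
h_empty = dmn_entropy(joint, [(0,), (1,), (2,)], [])

# Every graph with exactly one link: one two-variable clique plus a singleton.
h_single = [
    dmn_entropy(joint, [(0, 1), (2,)], []),
    dmn_entropy(joint, [(0, 2), (1,)], []),
    dmn_entropy(joint, [(1, 2), (0,)], []),
]

# A two-link chain 0 - 1 - 2: cliques {0,1} and {1,2}, separator {1}.
h_chain = dmn_entropy(joint, [(0, 1), (1, 2)], [(1,)])

# Complete graph: a single clique over all three variables.
h_complete = dmn_entropy(joint, [(0, 1, 2)], [])

print("empty:", h_empty)
print("single links:", h_single)
print("chain:", h_chain)
print("complete:", h_complete)
```

In this toy domain no single link lowers the entropy score relative to the empty graph, and neither does the two-link chain; only the complete graph does. A search that accepts a link only when the score improves would therefore stop at the empty graph, which is the kind of behavior that motivates a multi-link strategy.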
 Journal

Machine Learning
Volume 26, Issue 1, pp 65-92
 Cover Date
 1997-01-01
 DOI
 10.1023/A:1007324100110
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Kluwer Academic Publishers
 Keywords

 inductive learning
 reasoning under uncertainty
 knowledge acquisition
 Markov networks
 probabilistic networks
 Authors

 Y. Xiang ^{(1)}
 S.K.M. Wong ^{(1)}
 N. Cercone ^{(1)}
 Author Affiliations

 1. Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2