Machine Learning, Volume 26, Issue 1, pp 65–92

A “Microscopic” Study of Minimum Entropy Search in Learning Decomposable Markov Networks

  • Y. Xiang
  • S.K.M. Wong
  • N. Cercone

Abstract

Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Although entropy and related scoring metrics have been widely used, their “microscopic” properties and asymptotic behavior during a search have not been analyzed. We present such a “microscopic” study of a minimum entropy search algorithm, and show that it learns an I-map of the domain model when the data size is large.
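
For context, the cross entropy referred to above is the standard Kullback-Leibler measure. A minimal statement of the definition follows, writing P for the domain distribution and \hat{P} for the distribution encoded by a candidate network (our notation, not taken from the abstract):

    % Kullback-Leibler cross entropy between the domain distribution P and
    % the distribution \hat{P} represented by a candidate network.
    K(P, \hat{P}) \;=\; \sum_{x} P(x)\,\log \frac{P(x)}{\hat{P}(x)} \;\geq\; 0,
    \qquad K(P, \hat{P}) = 0 \ \text{iff}\ P = \hat{P}.

For a decomposable model whose clique marginals agree with P, a standard identity gives K(P, \hat{P}) = H(\hat{P}) - H(P), so minimizing cross entropy over candidate structures amounts to minimizing the entropy of the learned model; this is the sense in which such a search is a minimum entropy search.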

Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain together with a multi-link search strategy would provide an effective way to uncover many domain models.
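
To make the single-link limitation concrete, below is a minimal sketch (our illustration, not an experiment from the paper) of the kind of domain model that defeats one-link-at-a-time search: a parity distribution over three binary variables in which every pair is marginally independent, so every candidate link scores zero entropy gain, even though the variables are jointly dependent.

    from itertools import product
    from math import log2

    # Joint distribution over (x0, x1, x2): x0 and x1 are independent fair
    # bits and x2 = x0 XOR x1.
    p = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

    def marginal(axes):
        # Marginal distribution over the chosen coordinate positions, e.g. (0, 2).
        m = {}
        for outcome, pr in p.items():
            key = tuple(outcome[a] for a in axes)
            m[key] = m.get(key, 0.0) + pr
        return m

    def entropy(dist):
        # Shannon entropy in bits.
        return -sum(pr * log2(pr) for pr in dist.values() if pr > 0)

    # Pairwise mutual information I(Xa; Xb) = H(Xa) + H(Xb) - H(Xa, Xb).
    # Every pair scores 0, so a single-link search has nothing to add.
    for a, b in ((0, 1), (0, 2), (1, 2)):
        mi = (entropy(marginal((a,))) + entropy(marginal((b,)))
              - entropy(marginal((a, b))))
        print(f"I(X{a};X{b}) = {mi:.4f}")  # 0.0000 for all three pairs

    # Yet the joint entropy is 2 bits rather than the 3 bits of three
    # independent fair bits, so the variables are jointly dependent. Only a
    # step that adds several links at once can capture this model.
    print(f"H(X0,X1,X2) = {entropy(p):.4f}")  # 2.0000

A multi-link step that forms the clique {X0, X1, X2} at once would see the full 1-bit entropy reduction that no single link can deliver, which is why prior domain knowledge combined with multi-link search helps here.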

Keywords: inductive learning · reasoning under uncertainty · knowledge acquisition · Markov networks · probabilistic networks

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Y. Xiang (1)
  • S.K.M. Wong (1)
  • N. Cercone (1)

  1. Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada
