Machine Learning, Volume 47, Issue 1, pp 63–89

Learning Recursive Bayesian Multinets for Data Clustering by Means of Constructive Induction

  • Jose M. Peña
  • Jose A. Lozano
  • Pedro Larrañaga

Abstract

This paper introduces and evaluates a new class of knowledge model, the recursive Bayesian multinet (RBMN), which encodes the joint probability distribution of a given database. RBMNs extend Bayesian networks (BNs) as well as partitional clustering systems. Briefly, an RBMN is a decision tree with component BNs at the leaves. An RBMN is learnt using a greedy, heuristic approach akin to that used by many supervised decision tree learners, but with the component BNs at the leaves learnt by means of constructive induction. A key idea is to treat expected data as real data. This allows us to complete the database and to take advantage of a closed form for the marginal likelihood of the expected complete data that factorizes into separate marginal likelihoods for each family (a node and its parents). Our approach is evaluated on synthetic and real-world databases.
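To make the family-wise factorization concrete, the following is a minimal sketch (not the authors' implementation) of how, once expected data are treated as real data, the marginal likelihood of the completed database decomposes into one closed-form term per family, in the style of the Cooper–Herskovits metric. It assumes discrete variables and uniform Dirichlet priors; the function names and the alpha hyperparameter are illustrative only.

# Sketch under the assumptions stated above; all names are illustrative.
from math import lgamma
from itertools import product

def log_family_score(expected_counts, child_states, parent_cardinalities, alpha=1.0):
    """Log marginal likelihood contribution of one family (a node and its parents).

    expected_counts: dict mapping (parent_config, child_state) -> expected count
    child_states: number of states of the child variable
    parent_cardinalities: number of states of each parent variable
    """
    score = 0.0
    for j in product(*(range(c) for c in parent_cardinalities)):
        n_ij = sum(expected_counts.get((j, k), 0.0) for k in range(child_states))
        # One Gamma-ratio term for the parent configuration as a whole ...
        score += lgamma(child_states * alpha) - lgamma(child_states * alpha + n_ij)
        # ... and one term per child state within that configuration.
        for k in range(child_states):
            n_ijk = expected_counts.get((j, k), 0.0)
            score += lgamma(alpha + n_ijk) - lgamma(alpha)
    return score

def log_network_score(families):
    """Score of a candidate BN: the sum over its families."""
    return sum(log_family_score(*family) for family in families)

Because each family is scored independently, a greedy search over decision tree splits and leaf BN structures only needs to recompute the terms of the families it actually changes.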

Keywords: data clustering, Bayesian networks, Bayesian multinets, constructive induction, EM algorithm, BC+EM method


Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Jose M. Peña (1)
  • Jose A. Lozano (1)
  • Pedro Larrañaga (1)

  1. Department of Computer Science and Artificial Intelligence, University of the Basque Country, Donostia-San Sebastián, Spain
