Mining Bayesian Network Structure for Large Sets of Variables

  • Mieczysław A. Kłopotek
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2366)


A well-known problem with Bayesian networks (BN) is the practical limit on the number of variables for which a network can be learned in reasonable time: even the complexity of the simplest tree-like BN learning algorithms is prohibitive for large variable sets. This paper presents a novel algorithm that overcomes this limitation for the tree-like class of Bayesian networks. The new algorithm's space consumption grows linearly with the number of variables n, while its execution time is proportional to n ln(n), outperforming any previously known algorithm. This opens new perspectives for constructing Bayesian networks from data containing tens of thousands of variables or more, e.g. in automatic text categorization.
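For context, the classical baseline for tree-like BN learning is the Chow–Liu procedure: compute empirical mutual information for every variable pair, then take a maximum-weight spanning tree over those weights. The quadratic number of pairs is exactly the bottleneck the paper's O(n ln(n)) method avoids. The sketch below is an illustrative implementation of this standard baseline under my own naming, not the paper's algorithm.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    # Empirical mutual information between columns i and j of the dataset
    # (data is a list of equal-length tuples of discrete values).
    n = len(data)
    cij = Counter((row[i], row[j]) for row in data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    mi = 0.0
    for (a, b), c in cij.items():
        p_ab = c / n
        # p_ab * log(p_ab / (p_a * p_b)), rewritten with raw counts.
        mi += p_ab * math.log(c * n / (ci[a] * cj[b]))
    return mi

def chow_liu_tree(data, n_vars):
    # Maximum-weight spanning tree over pairwise mutual information,
    # built with Kruskal's algorithm and path-compressed union-find.
    # Enumerating all O(n^2) pairs is the cost the paper improves on.
    edges = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On a toy dataset where variable 0 duplicates variable 1, the highest-weight edge joins those two columns, so the learned tree always contains the edge (0, 1).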







Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Mieczysław A. Kłopotek
  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
  2. Institute of Computer Science, University of Podlasie, Siedlce, Poland
