Learning Bayesian Networks Using Feature Selection

  • Gregory M. Provan
  • Moninder Singh
Part of the Lecture Notes in Statistics book series (LNS, volume 112)


This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features that maximizes predictive accuracy prior to the network learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks that are computationally simpler to evaluate and display predictive accuracy comparable to that of Bayesian networks which model all attributes.
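The selection-before-learning step described above can be sketched as a wrapper-style greedy forward search over attributes. The following is a minimal illustration only, not the authors' algorithm: the leave-one-out majority-vote scorer and the toy dataset are placeholder assumptions standing in for whatever accuracy estimate precedes structure learning.

```python
# Hedged sketch: greedy forward feature selection driven by estimated
# predictive accuracy, run before any Bayesian network is learned.
# The classifier here (leave-one-out majority vote over exactly-matching
# rows) is an illustrative stand-in, not the paper's method.
from collections import Counter

def accuracy(data, labels, features):
    """Leave-one-out accuracy of a majority-vote classifier
    restricted to the attribute indices in `features`."""
    correct = 0
    for i, (row, y) in enumerate(zip(data, labels)):
        # vote among all other rows that agree on the selected attributes
        votes = Counter(
            labels[j] for j, other in enumerate(data)
            if j != i and all(other[f] == row[f] for f in features)
        )
        if votes and votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(data)

def forward_select(data, labels, n_features):
    """Greedily add the attribute that most improves accuracy;
    stop when no addition helps."""
    selected, remaining, best_acc = [], set(range(n_features)), 0.0
    while remaining:
        f, acc = max(
            ((f, accuracy(data, labels, selected + [f])) for f in remaining),
            key=lambda t: t[1],
        )
        if acc <= best_acc:
            break  # no remaining attribute improves accuracy
        selected.append(f)
        remaining.discard(f)
        best_acc = acc
    return selected

# Toy data: attribute 0 determines the class, attribute 1 is noise.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
labels = [0, 0, 1, 1, 0, 1]
print(forward_select(data, labels, n_features=2))  # → [0]
```

The network-learning phase would then be run only on the selected attributes, which is what yields the smaller, computationally simpler networks the abstract describes.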


Keywords: Feature Selection, Bayesian Network, Predictive Accuracy, Feature Selection Algorithm, Attribute Selection




  1. [Aha94]
    Aha, D.W. and R.L. Bankert (1994). Feature selection for case-based classification of cloud types. In AAAI Workshop on Case-based Reasoning, 106–112, Seattle, WA. AAAI Press.
  2. [Almuallim91]
    Almuallim, H. and T.G. Dietterich (1991). Learning with Many Irrelevant Features. In Proc. Conf. of the AAAI, 547–552. AAAI Press.
  3. [Andersen89]
    Andersen, S.K., K.G. Olesen, F.V. Jensen and F. Jensen (1989). HUGIN: a Shell for Building Bayesian Belief Universes for Expert Systems. Proc. IJCAI, 1080–1085.
  4. [Buntine92]
    Buntine, W.L. and T. Niblett (1992). A further comparison of splitting rules for decision-tree induction. Machine Learning, 7.
  5. [Cardie93]
    Cardie, C. (1993). Using Decision Trees to Improve Case-based Learning. In Proc. Machine Learning, 25–32. Morgan Kaufmann.
  6. [Caruana94]
    Caruana, R. and D. Freitag (1994). Greedy attribute selection. In W. Cohen and H. Hirsch, editors, Proc. Machine Learning, 28–36. Morgan Kaufmann.
  7. [Cheeseman94]
    Cheeseman, P. and W. Oldford, editors (1994). Selecting Models from Data: AI and Statistics IV. Springer-Verlag.
  8. [Cooper92]
    Cooper, G.F. and E. Herskovits (1992). A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning, 9, 309–347.
  9. [Dawid92]
    Dawid, A.P. (1992). Prequential Analysis, Stochastic Complexity and Bayesian Inference. In J.M. Bernardo, J. Berger, A. Dawid, and A. Smith, editors, Bayesian Statistics 4, 109–125. Oxford Science Publications.
  10. [Devijver82]
    Devijver, P. and J. Kittler (1982). Pattern Recognition: A Statistical Approach. Prentice-Hall.
  11. [Herskovits90]
    Herskovits, E. and G.F. Cooper (1990). KUTATO: An Entropy-Driven System for Construction of Probabilistic Expert Systems from Databases. In Proc. Conf. Uncertainty in Artificial Intelligence, 54–62.
  12. [John94]
    John, G., R. Kohavi, and K. Pfleger (1994). Irrelevant features and the subset selection problem. In Proc. Machine Learning, 121–129. Morgan Kaufmann.
  13. [Kira92a]
    Kira, K. and L. Rendell (1992). A practical approach to feature selection. In Proc. Machine Learning, 249–256, Aberdeen, Scotland. Morgan Kaufmann.
  14. [Kira92b]
    Kira, K. and L. Rendell (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. In Proc. AAAI, 129–134. AAAI Press.
  15. [Kononenko94]
    Kononenko, I. (1994). Estimating attributes: Analysis and extension of RELIEF. In Proc. European Conf. on Machine Learning, 171–182.
  16. [Langley94a]
    Langley, P. (1994). Selection of relevant features in machine learning. In R. Greiner, editor, Proc. AAAI Fall Symposium on Relevance. AAAI Press.
  17. [Langley94b]
    Langley, P. and S. Sage (1994). Induction of selective Bayesian classifiers. In Proc. Conf. on Uncertainty in AI, 399–406. Morgan Kaufmann.
  18. [Madigan93]
    Madigan, D., A. Raftery, J. York, J. Bradshaw, and R. Almond (1993). Strategies for Graphical Model Selection. In Proc. International Workshop on AI and Statistics, 331–336.
  19. [Marill63]
    Marill, T. and D. Green (1963). On the effectiveness of receptors in recognition systems. IEEE Trans. on Information Theory, 9: 11–17.
  20. [Murphy92]
    Murphy, P.M. and D.W. Aha (1992). UCI Repository of Machine Learning Databases. Dept. of Information and Computer Science, Univ. of California, Irvine.
  21. [Narendra77]
    Narendra, P.M. and K. Fukunaga (1977). A branch and bound algorithm for feature subset selection. IEEE Trans. on Computers, C-26(9): 917–922.
  22. [Siedlecki88]
    Siedlecki, W. and J. Sklansky (1988). On automatic feature selection. Intl. J. of Pattern Recognition and Artificial Intelligence, 2(2): 197–220.
  23. [Singh95]
    Singh, M. and M. Valtorta (1995). Construction of Bayesian Network Structures from Data: a Brief Survey and an Efficient Algorithm. Int. Journal of Approximate Reasoning, 12, 111–131.
  24. [Xu89]
    Xu, L., P. Yan, and T. Chang (1989). Best-first strategy for feature selection. In Proc. Ninth International Conf. on Pattern Recognition, 706–708. IEEE Computer Society Press.

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Gregory M. Provan (1)
  • Moninder Singh (2)
  1. Institute for Decision Systems Research, Los Altos, USA
  2. Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
