Bayesian Network Classifiers
Published: November 1997

  • Nir Friedman¹,
  • Dan Geiger² &
  • Moises Goldszmidt³

Machine Learning volume 29, pages 131–163 (1997)


Abstract

Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
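
The TAN construction the abstract alludes to is simple enough to sketch. The following is an illustrative reconstruction, not the authors' code: it estimates the conditional mutual information I(Xi; Xj | C) between every pair of discrete features given the class, then keeps a maximum-weight spanning tree over the features (here via Prim's algorithm). The input conventions (integer-coded numpy arrays) and all function names are our own assumptions.

```python
# Illustrative sketch of TAN structure learning (not the paper's code).
# Assumes X is an (n, d) integer-coded numpy array of discrete features
# and y is an (n,) integer-coded class vector.
from collections import defaultdict

import numpy as np


def cond_mutual_info(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C)."""
    n = len(c)
    joint = defaultdict(int)
    for a, b, k in zip(xi, xj, c):
        joint[(a, b, k)] += 1
    # Marginal counts derived from the joint counts.
    p_ik, p_jk, p_k = defaultdict(int), defaultdict(int), defaultdict(int)
    for (a, b, k), cnt in joint.items():
        p_ik[(a, k)] += cnt
        p_jk[(b, k)] += cnt
        p_k[k] += cnt
    mi = 0.0
    for (a, b, k), cnt in joint.items():
        # p(a,b,k) * log[ p(a,b,k) p(k) / (p(a,k) p(b,k)) ]
        mi += (cnt / n) * np.log((cnt / n) * (p_k[k] / n)
                                 / ((p_ik[(a, k)] / n) * (p_jk[(b, k)] / n)))
    return mi


def tan_tree(X, y):
    """Maximum-weight spanning tree over the d features (Prim's algorithm),
    with edges weighted by I(Xi; Xj | C)."""
    d = X.shape[1]
    w = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            w[i, j] = w[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        # Pick the heaviest edge joining the tree to a new feature.
        i, j = max(((i, j) for i in in_tree
                    for j in range(d) if j not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

Directing each tree edge away from an arbitrary root and adding the class node as a parent of every feature yields the augmented structure; its parameters are then plain empirical conditional frequencies, which is why, as the abstract notes, no search is involved.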



Author information

Authors and Affiliations

  1. Computer Science Division, University of California, 387 Soda Hall, Berkeley, CA, 94720

    Nir Friedman

  2. Computer Science Department, Technion, Haifa, Israel, 32000

    Dan Geiger

  3. SRI International, 333 Ravenswood Ave., Menlo Park, CA, 94025

    Moises Goldszmidt


About this article

Cite this article

Friedman, N., Geiger, D. & Goldszmidt, M. Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997). https://doi.org/10.1023/A:1007465528199


  • Issue Date: November 1997

  • DOI: https://doi.org/10.1023/A:1007465528199


Keywords

  • Bayesian networks
  • classification