Searching for Dependencies in Bayesian Classifiers

  • Michael J. Pazzani
Chapter
Part of the Lecture Notes in Statistics book series (LNS, volume 112)

Abstract

Naive Bayesian classifiers, which assume that attributes are independent given the class, perform remarkably well on some data sets but poorly on others. We explore ways to improve the Bayesian classifier by searching for dependencies among attributes. We propose and evaluate two algorithms for detecting such dependencies and show that the backward sequential elimination and joining algorithm provides the greatest improvement over the naive Bayesian classifier. The domains on which the most improvement occurs are those on which the naive Bayesian classifier is significantly less accurate than a decision tree learner. This suggests that the attributes in some common databases are not independent conditioned on the class, and that the violations of the independence assumption that affect the classifier's accuracy can be detected from training data.
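The core idea behind joining attributes can be illustrated with a minimal sketch: a categorical naive Bayes classifier, plus a helper that replaces two dependent attributes with a single Cartesian-product attribute, as the abstract describes. This is not the paper's implementation; the function names, data layout, and use of Laplace smoothing are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_nb(examples, labels):
    """Train a naive Bayes model on vectors of categorical attributes."""
    class_counts = Counter(labels)
    # cond[(class, attribute_index)] counts attribute values per class
    cond = defaultdict(Counter)
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            cond[(y, i)][v] += 1
    return class_counts, cond

def predict_nb(model, x):
    """Pick the class maximizing P(class) * prod_i P(x_i | class)."""
    class_counts, cond = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for y, cy in class_counts.items():
        p = cy / total
        for i, v in enumerate(x):
            # Laplace smoothing over the observed values of attribute i
            vocab = len({u for c in class_counts for u in cond[(c, i)]})
            p *= (cond[(y, i)][v] + 1) / (cy + vocab)
        if p > best_p:
            best, best_p = y, p
    return best

def join_attributes(examples, i, j):
    """Replace attributes i and j with one Cartesian-product attribute."""
    out = []
    for x in examples:
        rest = [v for k, v in enumerate(x) if k not in (i, j)]
        out.append(rest + [(x[i], x[j])])
    return out
```

On an XOR-style concept (class = a XOR b), the two attributes are individually uninformative, so plain naive Bayes cannot separate the classes; after `join_attributes` merges them, the joined value determines the class and the classifier predicts correctly. This mirrors the way a joining step can repair a violated independence assumption.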

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Michael J. Pazzani
    1. Department of Information and Computer Science, University of California, Irvine, Irvine, USA
