Learning accurate and concise naïve Bayes classifiers from attribute value taxonomies and data

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Many application domains need learning algorithms that can effectively exploit attribute value taxonomies (AVT), that is, hierarchical groupings of attribute values, to learn compact, comprehensible and accurate classifiers from data, including data that are partially specified. This paper describes AVT-NBL, a natural generalization of the naïve Bayes learner (NBL), for learning classifiers from AVT and data. Our experimental results show that AVT-NBL is able to generate classifiers that are substantially more compact and more accurate than those produced by NBL on a broad range of data sets with different percentages of partially specified values. We also show that AVT-NBL is more efficient in its use of training data: AVT-NBL produces classifiers that outperform those produced by NBL using substantially fewer training examples.
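
To make the terms concrete, the following is a minimal sketch of an attribute value taxonomy and of counting class-conditional statistics at a chosen level of abstraction (a "cut" through the taxonomy). It is an illustration under stated assumptions, not the AVT-NBL procedure itself: the beverage taxonomy, the names `PARENT`, `abstract_to_cut` and `class_conditional_counts`, and the choice to simply skip values that are specified only above the cut are all hypothetical simplifications.

```python
from collections import Counter

# Hypothetical taxonomy for a single "beverage" attribute, stored as
# child -> parent. The root ("beverage") has no parent entry.
PARENT = {
    "cola": "soda", "lemonade": "soda",
    "espresso": "coffee", "latte": "coffee",
    "soda": "beverage", "coffee": "beverage",
}


def ancestors_or_self(value):
    """Yield the value itself and then every node above it, up to the root."""
    while value is not None:
        yield value
        value = PARENT.get(value)


def abstract_to_cut(value, cut):
    """Return the cut node that covers `value`, or None when the value is
    specified only above the cut (too coarse to assign to one cut node)."""
    for node in ancestors_or_self(value):
        if node in cut:
            return node
    return None


def class_conditional_counts(examples, cut):
    """Count (cut node, class label) pairs over (value, label) examples.

    Assumption of this sketch: examples whose value lies above the cut
    are skipped rather than distributed among the nodes below them.
    """
    counts = Counter()
    for value, label in examples:
        node = abstract_to_cut(value, cut)
        if node is not None:
            counts[(node, label)] += 1
    return counts


# Partially specified data: "soda" and "beverage" are non-leaf values.
examples = [
    ("cola", "yes"), ("lemonade", "yes"), ("soda", "no"),
    ("espresso", "no"), ("beverage", "yes"),
]
counts = class_conditional_counts(examples, cut={"soda", "coffee"})
```

With the cut `{"soda", "coffee"}`, four leaf values collapse into two abstract values, which is the sense in which a taxonomy can yield a more compact table of naïve Bayes parameters; the example labelled only as `"beverage"` is left out of the counts by this sketch.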

Author information

Corresponding author

Correspondence to J. Zhang.

Additional information

This paper is an extended version of a paper published in the 4th IEEE International Conference on Data Mining, 2004.

Jun Zhang is currently a PhD candidate in computer science at Iowa State University, USA. His research interests include machine learning, data mining, ontology-driven learning, computational biology and bioinformatics, evolutionary computation and neural networks. From 1993 to 2000, he was a lecturer in computer engineering at the University of Science and Technology of China. Jun Zhang received an MS degree in computer engineering from the University of Science and Technology of China in 1993 and a BS in computer science from Hefei University of Technology, China, in 1990.

Dae-Ki Kang is a PhD student in computer science at Iowa State University. His research interests include ontology learning, relational learning, and security informatics. Prior to joining Iowa State, he worked at a Bay Area startup company and at the Electronics and Telecommunications Research Institute in South Korea. He received a master's degree in computer science from Sogang University in 1994 and a bachelor of engineering (BE) degree in computer science and engineering from Hanyang University in Ansan in 1992.

Adrian Silvescu is a PhD candidate in computer science at Iowa State University. His research interests include machine learning, artificial intelligence, bioinformatics and complex adaptive systems. He received an MS degree in theoretical computer science from the University of Bucharest, Romania, in 1997, and a BS in computer science from the University of Bucharest in 1996.

Vasant Honavar received a BE in electronics engineering from Bangalore University, India, an MS in electrical and computer engineering from Drexel University and an MS and a PhD in computer science from the University of Wisconsin, Madison. He founded (in 1990) and has been the director of the Artificial Intelligence Research Laboratory at Iowa State University (ISU), where he is currently a professor of computer science and of bioinformatics and computational biology. He directs the Computational Intelligence, Learning & Discovery Program, which he founded in 2004. Honavar's research and teaching interests include artificial intelligence, machine learning, bioinformatics, computational molecular biology, intelligent agents and multiagent systems, collaborative information systems, semantic web, environmental informatics, security informatics, social informatics, neural computation, systems biology, data mining, knowledge discovery and visualization. Honavar has published over 150 research articles in refereed journals, conferences and books and has coedited 6 books. Honavar is a coeditor-in-chief of the Journal of Cognitive Systems Research and a member of the Editorial Board of the Machine Learning Journal and the International Journal of Computer and Information Security. Prof. Honavar is a member of the Association for Computing Machinery (ACM), American Association for Artificial Intelligence (AAAI), Institute of Electrical and Electronics Engineers (IEEE), International Society for Computational Biology (ISCB), the New York Academy of Sciences, the American Association for the Advancement of Science (AAAS) and the American Medical Informatics Association (AMIA).

About this article

Cite this article

Zhang, J., Kang, DK., Silvescu, A. et al. Learning accurate and concise naïve Bayes classifiers from attribute value taxonomies and data. Knowl Inf Syst 9, 157–179 (2006). https://doi.org/10.1007/s10115-005-0211-z

  • DOI: https://doi.org/10.1007/s10115-005-0211-z
