Learning Tree Augmented Naive Bayes for Ranking

  • Liangxiao Jiang
  • Harry Zhang
  • Zhihua Cai
  • Jiang Su
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3453)

Abstract

Naive Bayes has been widely used in data mining as a simple and effective classification algorithm. Because its conditional independence assumption rarely holds, numerous algorithms have been proposed to improve naive Bayes. Among them, tree augmented naive Bayes (TAN) [3] achieves a significant improvement in classification accuracy while maintaining efficiency and model simplicity. In many real-world data mining applications, however, an accurate ranking is more desirable than a classification. It is therefore natural to ask whether TAN also achieves a significant improvement in ranking, measured by AUC (the area under the Receiver Operating Characteristic curve) [8,1]. Unfortunately, our experiments show that TAN performs even worse than naive Bayes in ranking. In response, we present a novel learning algorithm, called forest augmented naive Bayes (FAN), obtained by modifying the traditional TAN learning algorithm. We experimentally test our algorithm on all 36 data sets recommended by Weka [12] and compare it to naive Bayes, SBC [6], TAN [3], and C4.4 [10] in terms of AUC. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate rankings. Our work provides an effective and efficient data mining algorithm for applications in which an accurate ranking is required.
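AUC is the evaluation measure used throughout. For a two-class problem it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting one half), which can be computed from ranks via the Wilcoxon/Mann-Whitney statistic discussed by Bradley [1]. The following is a minimal sketch, assuming binary labels (1 = positive) and scores that are a classifier's estimated positive-class probabilities; the function name `auc` is illustrative only and is not taken from the paper. Multi-class data sets would need a generalisation such as that of Hand and Till [4], which this sketch does not cover.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary AUC via the rank-sum (Mann-Whitney U) formulation.

    scores: estimated probability of the positive class for each example.
    labels: 1 for a positive example, 0 for a negative one.
    Tied scores receive the average of the ranks they span, so a tie
    between a positive and a negative example contributes one half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Sort example indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - (smallest possible such sum);
    # dividing by the number of positive/negative pairs gives the AUC.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Because AUC depends only on the ordering of the scores, two classifiers with identical accuracy can still differ in AUC, which is why the abstract treats ranking as a separate criterion from classification.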

Keywords

data mining and knowledge discovery; learning algorithms; Bayesian networks; decision trees

References

  1. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159 (1997)
  2. Cohen, W.W., Schapire, R.E., Singer, Y.: Learning to order things. Journal of Artificial Intelligence Research 10, 243–270 (1997)
  3. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 103–130 (1997)
  4. Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186 (2001)
  5. Keogh, E., Pazzani, M.: Learning augmented Bayesian classifiers. In: Proceedings of the Seventh International Workshop on AI and Statistics. Ft. Lauderdale (1999)
  6. Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pp. 399–406 (1994)
  7. Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. Dept. of ICS, University of California, Irvine (1997), http://www.ics.uci.edu/~mlearn/MLRepository.html
  8. Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: comparison under imprecise class and cost distribution. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 43–48. AAAI Press, Menlo Park (1997)
  9. Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453. Morgan Kaufmann, San Francisco (1998)
  10. Provost, F.J., Domingos, P.: Tree Induction for Probability-Based Ranking. Machine Learning 52(3), 199–215 (2003)
  11. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
  12.
  13. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
  14. Ling, C.X., Zhang, H.: Toward Bayesian classifiers with accurate probabilities. In: Proceedings of the Sixth Pacific-Asia Conference on KDD, pp. 123–134. Springer, Heidelberg (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Liangxiao Jiang (1)
  • Harry Zhang (2)
  • Zhihua Cai (1)
  • Jiang Su (2)
  1. Department of Computer Science, China University of Geosciences, Wuhan, China
  2. Faculty of Computer Science, University of New Brunswick, Fredericton, Canada