Multinomial Naive Bayes for Text Categorization Revisited

  • Ashraf M. Kibriya
  • Eibe Frank
  • Bernhard Pfahringer
  • Geoffrey Holmes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3339)


This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it compares standard multinomial naive Bayes to the recently proposed transformed weight-normalized complement naive Bayes classifier (TWCNB) [1], and shows that some of the modifications included in TWCNB may not be necessary to achieve optimum performance on some datasets. However, it does show that TFIDF conversion and document length normalization are important. It also shows that support vector machines can, in fact, sometimes very significantly outperform both methods. Finally, it shows how the performance of multinomial naive Bayes can be improved using locally weighted learning. However, the overall conclusion of our paper is that support vector machines are still the method of choice if the aim is to maximize accuracy.
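As a rough illustration of the pipeline the abstract refers to, the sketch below applies a TF transform, IDF weighting, and document length normalization before fitting a multinomial naive Bayes model. The abstract does not give formulas, so the specific choices here are assumptions: log(1 + count) for TF, log(N / df) for IDF, L2 normalization, and Laplace smoothing with alpha = 1. The function names and the toy data are hypothetical. It is not the authors' implementation, and it does not cover TWCNB, support vector machines, or locally weighted learning.

```python
import math
from collections import Counter, defaultdict


def fit_idf(token_lists):
    """Inverse document frequency log(N / df), estimated on training documents."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def transform(tokens, idf):
    """Apply log(1 + count), multiply by IDF, then scale to unit L2 length.

    Terms that never occurred in the training documents are dropped.
    """
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: math.log(1 + c) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


def train_mnb(vectors, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes on (possibly fractional) term weights."""
    weight_sums = defaultdict(lambda: defaultdict(float))
    class_counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            weight_sums[label][term] += w
    model = {}
    for label, n_c in class_counts.items():
        total = sum(weight_sums[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_c / len(labels))
        log_lik = {t: math.log((weight_sums[label][t] + alpha) / total)
                   for t in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict(model, vec):
    """Return the class with the highest log prior plus weighted log likelihood."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(model, key=score)


# Hypothetical toy corpus, for illustration only.
train = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda attached", "ham"),
    ("project meeting tomorrow", "ham"),
]
token_lists = [text.split() for text, _ in train]
labels = [label for _, label in train]

idf = fit_idf(token_lists)
vectors = [transform(tokens, idf) for tokens in token_lists]
model = train_mnb(vectors, labels, vocab=set(idf))

print(predict(model, transform("cheap pills".split(), idf)))
```

Length normalization is applied after the TF and IDF steps so that every document contributes a vector of the same norm, which keeps long documents from dominating the per-class term totals.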




  1. Rennie, J.D.M., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of naive Bayes text classifiers. In: Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, pp. 616–623. AAAI Press, Menlo Park (2003)
  2. McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text classification. Technical report, American Association for Artificial Intelligence Workshop on Learning for Text Categorization (1998)
  3. Eyheramendy, S., Lewis, D.D., Madigan, D.: On the naive Bayes model for text categorization. In: Ninth International Workshop on Artificial Intelligence and Statistics, pp. 3–6 (2003)
  4. Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Proceedings of the Tenth European Conference on Machine Learning, pp. 137–142. Springer, Heidelberg (1998)
  5. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. In: Proceedings of the Seventh International Conference on Information and Knowledge Management, pp. 148–155. ACM Press, New York (1998)
  6. Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42–49. ACM Press, New York (1999), doi:10.1145/312624.312647
  7. Zhang, T., Oles, F.J.: Text categorization based on regularized linear classification methods. Information Retrieval 4, 5–31 (2001)
  8. Rennie, J.: Personal communication regarding WCNB (2004)
  9. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods—Support Vector Learning, MIT Press, Cambridge (1998)
  10. Witten, I., Frank, E.: Data Mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco (1999)
  11. Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola, A., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.) Advances in Large Margin Classifiers, MIT Press, Cambridge (1999)
  12. Atkeson, C.G., Moore, A.W., Schaal, S.: Locally weighted learning. Artificial Intelligence Review 11, 11–73 (1997)
  13. Frank, E., Hall, M., Pfahringer, B.: Locally weighted naive Bayes. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Ashraf M. Kibriya (1)
  • Eibe Frank (1)
  • Bernhard Pfahringer (1)
  • Geoffrey Holmes (1)
  1. Department of Computer Science, University of Waikato, Hamilton, New Zealand
