Combining Modifications to Multinomial Naive Bayes for Text Classification

  • Antti Puurula
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7675)


Multinomial Naive Bayes (MNB) is a preferred classifier for many text classification tasks because of its simplicity and trivial scaling to large-scale tasks. In terms of classification accuracy, however, it lags behind modern discriminative classifiers because of its strong data assumptions. This paper explores the optimized combination of popular modifications to generative models in the context of MNB text classification. To optimize the classifier metaparameters that these modifications introduce, we explore direct search optimization using random search algorithms. We evaluate 7 basic modifications and 4 search algorithms across 5 publicly available datasets, and compare against similarly optimized multiclass Support Vector Machine (SVM) classifiers. The optimized modifications yield over 20% mean reduction in classification errors compared to baseline MNB models, reducing the gap between SVM and MNB mean performance by over 60%. Some of the individual modifications are shown to have substantial and statistically significant effects, while differences between the random search algorithms are smaller and not statistically significant. The evaluated modifications are potentially applicable to many applications of generative text modeling, where similar performance gains can be achieved.
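To make the idea of tuning an MNB metaparameter by random search concrete, the following is a minimal sketch. It is not the paper's method. The abstract does not list the 7 modifications or the 4 search algorithms, so the sketch assumes a single metaparameter (an additive smoothing constant, here called `alpha`), plain log-uniform random sampling, and a toy dataset invented for illustration. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha):
    """Estimate log priors and additively smoothed log word probabilities per class."""
    vocab = {w for d in docs for w in d}
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(class_counts[y] / len(docs))
        log_probs = {w: math.log((word_counts[y][w] + alpha) / total) for w in vocab}
        model[y] = (log_prior, log_probs)
    return model


def predict(model, doc):
    """Return the class with the highest log posterior; out-of-vocabulary words are skipped."""
    def score(y):
        log_prior, log_probs = model[y]
        return log_prior + sum(log_probs[w] for w in doc if w in log_probs)
    return max(sorted(model), key=score)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the label."""
    return sum(predict(model, d) == y for d, y in zip(docs, labels)) / len(docs)


def random_search(train, dev, n_iter=20, seed=0):
    """Sample alpha log-uniformly from [1e-3, 10] and keep the best value on the dev set."""
    rng = random.Random(seed)
    best_alpha, best_acc = None, -1.0
    for _ in range(n_iter):
        alpha = 10 ** rng.uniform(-3, 1)
        acc = accuracy(train_mnb(train[0], train[1], alpha), dev[0], dev[1])
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc


# Toy data, invented purely for illustration (assumption, not from the paper).
train = (
    [["cheap", "pills", "buy"], ["buy", "cheap", "now"],
     ["meeting", "agenda", "notes"], ["project", "meeting", "today"]],
    ["spam", "spam", "ham", "ham"],
)
dev = ([["cheap", "buy"], ["meeting", "notes"]], ["spam", "ham"])

best_alpha, best_acc = random_search(train, dev)
print(f"best alpha={best_alpha:.4f}, dev accuracy={best_acc:.2f}")
```

Selecting the metaparameter on held-out development data rather than on the training set is the design choice that matters here: the search treats the classifier as a black box and only needs an accuracy score, which is why direct search methods apply without gradients.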





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Antti Puurula (1)
  1. Department of Computer Science, The University of Waikato, Hamilton, New Zealand
