Bayesian Multinomial Naïve Bayes Classifier to Text Classification

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 448)

Abstract

Text classification is the task of assigning predefined classes to free-text documents, and it can provide conceptual views of document collections. The multinomial naïve Bayes (NB) classifier is one variant of the NB classifier and is often used as a baseline in text classification. However, the multinomial NB classifier is not fully Bayesian. This study proposes a Bayesian version of the NB classifier. Experimental results on the 20 Newsgroups dataset show that the Bayesian multinomial NB classifier, with suitable Dirichlet hyper-parameters, performs similarly to the multinomial NB classifier.
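The abstract does not spell out the formulation, so the following is only a minimal sketch of the contrast it draws, not the authors' method. It assumes symmetric Dirichlet priors with hypothetical hyper-parameters `alpha` (word probabilities) and `beta` (class probabilities). `plugin_score` is the usual multinomial NB with smoothed point estimates, while `bayes_score` integrates the word probabilities out, which gives a Dirichlet-multinomial predictive. All function names and the toy data are illustrative.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def log_prior(c, class_docs, beta):
    """Log posterior-mean class probability under a symmetric Dirichlet(beta) prior."""
    total = sum(class_docs.values())
    return math.log((class_docs[c] + beta) / (total + beta * len(class_docs)))


def plugin_score(tokens, c, model, alpha=1.0, beta=1.0):
    """Multinomial NB: plug in smoothed point estimates of the word probabilities."""
    class_docs, word_counts, vocab = model
    denom = sum(word_counts[c].values()) + alpha * len(vocab)
    score = log_prior(c, class_docs, beta)
    for w in tokens:
        if w in vocab:  # out-of-vocabulary words are ignored (an assumption)
            score += math.log((word_counts[c][w] + alpha) / denom)
    return score


def bayes_score(tokens, c, model, alpha=1.0, beta=1.0):
    """Integrate the word probabilities out (Dirichlet-multinomial predictive).

    The multinomial coefficient is dropped because it is the same for every class.
    """
    class_docs, word_counts, vocab = model
    counts = Counter(w for w in tokens if w in vocab)
    n = sum(counts.values())
    a_total = sum(word_counts[c].values()) + alpha * len(vocab)
    score = log_prior(c, class_docs, beta)
    score += math.lgamma(a_total) - math.lgamma(a_total + n)
    for w, k in counts.items():
        a_w = word_counts[c][w] + alpha
        score += math.lgamma(a_w + k) - math.lgamma(a_w)
    return score


def predict(tokens, model, score_fn, **hyper):
    """Return the class with the highest score under the chosen scoring function."""
    return max(model[0], key=lambda c: score_fn(tokens, c, model, **hyper))


# Toy corpus (made up for illustration only).
docs = [
    ["ball", "game", "team"],
    ["game", "score", "team"],
    ["vote", "election", "party"],
    ["party", "vote", "law"],
]
labels = ["sport", "sport", "politics", "politics"]
model = train(docs, labels)

print(predict(["team", "game", "game"], model, plugin_score))
print(predict(["team", "game", "game"], model, bayes_score))
```

For a document containing a single in-vocabulary word the two scores coincide; they differ once a document has more than one token, because the integrated-out version lets each observed word update the predictive probability of the next.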

Keywords

Text classification, Multinomial naïve Bayes classifier, Fully Bayesian

Notes

Acknowledgments

We gratefully acknowledge financial support from the National Science Foundation of China (ID: 71403255) and the Key Technologies R&D Program of the Chinese 12th Five-Year Plan (2011–2015) (ID: 2015BAH25F01). We also thank the anonymous reviewers for their valuable comments.

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Institute of Scientific and Technical Information of China, Beijing, People’s Republic of China
