The Naive Bayes Classifier in Opinion Mining: In Search of the Best Feature Set

  • Liviu P. Dinu
  • Iulia Iuga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7181)


This paper examines how naive Bayes classifiers perform in opinion mining applications. The first question is which feature sets to choose when training such a classifier in order to obtain the best results when classifying objects (in this case, texts). The second question is whether combining the results of naive Bayes classifiers trained on different feature sets improves the final results. Two databases of negative and positive movie reviews were used to train and test the classifiers.
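The two questions can be made concrete with a small sketch. The abstract does not say which feature sets or which combination rule the authors used, so everything here is an assumption chosen for illustration: unigram and bigram features, a multinomial naive Bayes model with add-one smoothing, a combination that sums the per-class log posteriors of the individual classifiers, and four invented toy reviews in place of the movie review databases.

```python
import math
from collections import Counter


def unigrams(text):
    # Hypothetical feature set 1: lowercased whitespace tokens.
    return text.lower().split()


def bigrams(text):
    # Hypothetical feature set 2: adjacent word pairs.
    words = text.lower().split()
    return [a + " " + b for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over one feature extractor."""

    def __init__(self, extract):
        self.extract = extract

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.counts[label].update(self.extract(text))
        self.vocab = set()
        for counter in self.counts.values():
            self.vocab.update(counter)
        return self

    def log_posteriors(self, text):
        # Features never seen in training are ignored.
        feats = [f for f in self.extract(text) if f in self.vocab]
        raw = {}
        for label in self.labels:
            denom = sum(self.counts[label].values()) + len(self.vocab)
            raw[label] = math.log(self.prior[label] / self.n_docs) + sum(
                math.log((self.counts[label][f] + 1) / denom) for f in feats
            )
        # Normalise with log-sum-exp so classifiers are on a comparable scale.
        top = max(raw.values())
        log_z = top + math.log(sum(math.exp(v - top) for v in raw.values()))
        return {label: v - log_z for label, v in raw.items()}

    def predict(self, text):
        scores = self.log_posteriors(text)
        return max(scores, key=scores.get)


def combined_predict(models, text):
    # Assumed combination rule: sum the log posteriors of all classifiers.
    totals = Counter()
    for model in models:
        for label, score in model.log_posteriors(text).items():
            totals[label] += score
    return max(sorted(totals), key=totals.get)


# Invented toy data, not taken from the paper's corpora.
train_texts = [
    "a great and moving film",
    "great acting and a good plot",
    "a dull and boring film",
    "boring plot and bad acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

models = [
    NaiveBayes(unigrams).fit(train_texts, train_labels),
    NaiveBayes(bigrams).fit(train_texts, train_labels),
]

for review in ["great film", "boring acting"]:
    print(review, "->", combined_predict(models, review))
```

Comparing each model's `predict` with `combined_predict` on held-out reviews is one way to check whether a given feature set, or the combination, helps on a particular corpus.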


Frequent Word; Opinion Mining; Sentiment Analysis; Informative Feature; Computational Linguistics
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Liviu P. Dinu (1)
  • Iulia Iuga (1)
  1. Faculty of Mathematics and Computer Science, Center for Computational Linguistics, University of Bucharest, Bucharest, Romania
