Fast and Accurate Sentiment Classification Using an Enhanced Naive Bayes Model

  • Vivek Narayanan
  • Ishan Arora
  • Arjun Bhatia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8206)


We explore several methods for improving the accuracy of a Naive Bayes classifier for sentiment analysis. We observe that combining effective negation handling, word n-grams, and feature selection by mutual information yields a significant improvement in accuracy. This implies that a fast, highly accurate sentiment classifier can be built from a simple Naive Bayes model whose training and testing times are linear. We achieved an accuracy of 88.80% on the popular IMDB movie reviews dataset. The proposed method can be generalized to a number of text categorization problems to improve speed and accuracy.
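As a rough illustration of how the three ingredients named in the abstract could fit together, here is a minimal Python sketch. It is not the authors' implementation: the negation cue list, the `not_` prefix convention, the toy training sentences, and every function and parameter name (`mark_negation`, `features`, `mutual_information`, `NaiveBayes`, `top_k`) are assumptions made for this example.

```python
import math
import re
from collections import Counter

NEGATIONS = {"not", "no", "never", "cannot"}  # assumed cue list
PUNCT = set(".,;:!?")

def tokenize(text):
    # Lowercase; keep words (with apostrophes) and clause-ending punctuation.
    return re.findall(r"[a-z']+|[.,;:!?]", text.lower())

def mark_negation(tokens):
    # Prefix tokens after a negation cue with "not_" until the next punctuation.
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCT:
            negated = False
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
        else:
            out.append("not_" + tok if negated else tok)
    return out

def features(text, n=2):
    # Unigrams plus word n-grams up to length n, after negation marking.
    toks = mark_negation(tokenize(text))
    feats = list(toks)
    for k in range(2, n + 1):
        feats += [" ".join(toks[i:i + k]) for i in range(len(toks) - k + 1)]
    return feats

def mutual_information(doc_sets, labels):
    # MI (in bits) between feature presence/absence and the class label.
    n = len(doc_sets)
    class_count = Counter(labels)
    feat_count, feat_class = Counter(), Counter()
    for feats, y in zip(doc_sets, labels):
        for f in feats:
            feat_count[f] += 1
            feat_class[f, y] += 1
    scores = {}
    for f, nf in feat_count.items():
        mi = 0.0
        for c, nc in class_count.items():
            present = feat_class[f, c]
            for joint, marginal in ((present, nf), (nc - present, n - nf)):
                if joint > 0:
                    mi += joint / n * math.log2(joint * n / (marginal * nc))
        scores[f] = mi
    return scores

class NaiveBayes:
    def fit(self, docs, labels, top_k=1000):
        # Keep only the top_k features ranked by mutual information.
        mi = mutual_information([set(d) for d in docs], labels)
        self.vocab = set(sorted(mi, key=mi.get, reverse=True)[:top_k])
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(f for f in d if f in self.vocab)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        v = len(self.vocab)
        def score(c):
            # Log prior plus Laplace-smoothed log likelihoods.
            return self.prior[c] + sum(
                math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
                for f in feats if f in self.vocab)
        return max(self.classes, key=score)

# Toy data, invented for this sketch.
train = [
    ("a great and moving film", "pos"),
    ("great cast, great story", "pos"),
    ("not a bad movie at all", "pos"),
    ("a boring and bad film", "neg"),
    ("not great, not moving", "neg"),
    ("the story was bad", "neg"),
]
docs = [features(text) for text, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(docs, labels)
print(model.predict(features("it was not great")))
```

Training is a single counting pass over the documents and prediction is a single pass over a document's features, which is the sense in which such a model is linear in the size of its input. On a real corpus, `top_k` would be tuned on held-out data.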


Keywords: Sentiment classification, negation handling, mutual information, feature selection, n-grams





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vivek Narayanan (1)
  • Ishan Arora (1)
  • Arjun Bhatia (1)
  1. Department of Electronics Engineering, Indian Institute of Technology (BHU), Varanasi, India
