Optimal Feature Selection for Sentiment Analysis

  • Basant Agarwal
  • Namita Mittal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)

Abstract

Sentiment Analysis (SA) research has grown rapidly in recent years. Sentiment analysis covers methods that automatically process text and extract the opinions that users express in it. In this paper, unigrams and bigrams are extracted from the text, and composite features are created from them. Part-of-Speech (POS) based features, namely adjectives and adverbs, are also extracted. The Information Gain (IG) and Minimum Redundancy Maximum Relevancy (mRMR) feature selection methods are used to select prominent features. The effect of the various feature sets on sentiment classification is then investigated using machine learning methods on four standard datasets: a movie review dataset and three product review datasets (books, DVDs, and electronics). Experimental results show that composite features created from the prominent unigram and bigram features perform better than the other features for sentiment classification. mRMR is a better feature selection method than IG for sentiment classification. The Boolean Multinomial Naïve Bayes (BMNB) algorithm performs better than the Support Vector Machine (SVM) classifier for sentiment analysis in terms of both accuracy and execution time.
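
As an illustration of the feature extraction and IG-based selection mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, Boolean (presence/absence) unigram and bigram features, and a tiny made-up corpus; the function names `ngram_features`, `entropy`, and `information_gain` are hypothetical. mRMR, which additionally penalizes redundancy among the selected features, is not covered here.

```python
import math
from collections import Counter


def ngram_features(text):
    """Return the set of unigrams and bigrams present in a text (Boolean presence)."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels):
    """IG(f) = H(class) - H(class | f present or absent), for every feature f."""
    n = len(docs)
    class_counts = Counter(labels)
    classes = sorted(class_counts)
    base = entropy([class_counts[c] for c in classes])

    # For each feature, count the labels of the documents that contain it.
    present = {}
    for feats, label in zip(docs, labels):
        for f in feats:
            present.setdefault(f, Counter())[label] += 1

    scores = {}
    for f, with_f in present.items():
        n_with = sum(with_f.values())
        without_f = [class_counts[c] - with_f[c] for c in classes]
        conditional = (n_with / n) * entropy([with_f[c] for c in classes])
        conditional += ((n - n_with) / n) * entropy(without_f)
        scores[f] = base - conditional
    return scores


# Toy corpus, invented purely for illustration.
reviews = ["great movie", "great acting", "boring movie", "boring plot"]
labels = ["pos", "pos", "neg", "neg"]

docs = [ngram_features(r) for r in reviews]
scores = information_gain(docs, labels)

# Keep the k highest-scoring features (ties broken alphabetically).
k = 2
top_k = sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

On this toy corpus, "great" and "boring" separate the two classes perfectly and receive the highest score, while "movie" appears equally often in both classes and scores zero.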

Keywords

Sentiment Analysis; feature selection methods; machine learning; Information Gain; Minimum Redundancy Maximum Relevancy (mRMR); composite features

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Basant Agarwal (1)
  • Namita Mittal (1)
  1. Malaviya National Institute of Technology, Jaipur, India