Optimal Feature Selection for Sentiment Analysis

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7817)

Abstract

Research on Sentiment Analysis (SA) has grown rapidly in recent years. Sentiment analysis deals with methods that automatically process text content and extract the opinions of users. In this paper, unigrams and bi-grams are extracted from the text, and composite features are created from them. Part-of-Speech (POS) based features, namely adjectives and adverbs, are also extracted. The Information Gain (IG) and Minimum Redundancy Maximum Relevancy (mRMR) feature selection methods are used to extract prominent features. The effect of different categories of features on sentiment classification is then investigated using machine learning methods on four standard datasets: a movie review dataset and three product review datasets (books, DVDs and electronics). Experimental results show that composite features created from the prominent unigram and bi-gram features perform better than other features for sentiment classification. mRMR is a better feature selection method than IG for sentiment classification. The Boolean Multinomial Naïve Bayes (BMNB) algorithm performs better than the Support Vector Machine (SVM) classifier for sentiment analysis in terms of accuracy and execution time.
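To make the feature pipeline described in the abstract concrete, the following is a minimal sketch of Boolean unigram and bi-gram extraction followed by Information Gain ranking. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the function names (`boolean_features`, `information_gain`, `top_k_by_ig`) and the toy reviews are all hypothetical. mRMR, which additionally penalizes redundancy among the selected features, is not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def boolean_features(text):
    """Boolean (presence-only) unigram and bi-gram features of one document.

    Assumption: naive lowercase/whitespace tokenization, for illustration only.
    """
    tokens = text.lower().split()
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, feature):
    """IG of a Boolean feature: H(class) - H(class | feature present/absent)."""
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    conditional = 0.0
    for part in (present, absent):
        if part:
            conditional += len(part) / n * entropy(list(Counter(part).values()))
    return base - conditional


def top_k_by_ig(docs, labels, k):
    """Rank all features by IG (ties broken alphabetically) and keep the top k."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary,
                    key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


# Hypothetical toy corpus, not taken from the paper's datasets.
reviews = [
    ("great movie", "pos"),
    ("great acting", "pos"),
    ("boring movie", "neg"),
    ("boring plot", "neg"),
]
docs = [boolean_features(text) for text, _ in reviews]
labels = [label for _, label in reviews]

print(top_k_by_ig(docs, labels, 2))
```

In this toy corpus the words "great" and "boring" separate the two classes perfectly, while "movie" occurs equally often in both and therefore carries no information about the class. A composite feature set in the sense of the abstract could then be formed by taking the union of the top-ranked unigrams and the top-ranked bi-grams.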


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agarwal, B., Mittal, N. (2013). Optimal Feature Selection for Sentiment Analysis. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37256-8_2

  • DOI: https://doi.org/10.1007/978-3-642-37256-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37255-1

  • Online ISBN: 978-3-642-37256-8

  • eBook Packages: Computer Science, Computer Science (R0)
