Term Co-occurrence Based Feature Selection for Sentiment Classification

  • Conference paper
Smart and Innovative Trends in Next Generation Computing Technologies (NGCT 2017)

Abstract

In this paper, a feature selection strategy for sentiment classification is explored and compared with other significant feature selection strategies found in contemporary literature. The proposed feature selection models use the statistical measures of t-score and z-score. The objective of the paper is to explore and evaluate the scope of statistical measures for identifying optimal features, and the significance of those features for classifying opinions with different classifiers. Performance analysis is carried out on datasets of diverse kinds, namely movie reviews, product reviews and tweets, and the experiments cover both the proposed feature selection strategies and other strategies found in the literature. The experimental results indicate that the optimal features selected using t-score and z-score are robust and outperform those of the other feature selection strategies. To assess the significance of the proposed feature selection models, classification is carried out using three classifiers: SVM, NB and AdaBoost. The classification accuracy obtained with the features selected by the proposed models is much higher than the accuracy obtained with the features selected by other contemporary models. Among the three classifiers used to assess classification accuracy, AdaBoost outperforms both SVM and NB.
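As an illustration of the general idea of t-score based feature selection, the following is a minimal sketch only. The abstract does not give the authors' exact formulation, so everything here is an assumption: the choice of Welch's two-sample t statistic on per-document term counts, the function names `welch_t_score` and `select_features`, the toy corpus and the threshold value are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_score(pos_counts, neg_counts):
    """Welch's t statistic for one term's counts in two classes.

    Assumption: this is a generic two-sample t statistic, not
    necessarily the score defined in the paper.
    """
    n1, n2 = len(pos_counts), len(neg_counts)
    m1, m2 = mean(pos_counts), mean(neg_counts)
    v1, v2 = variance(pos_counts), variance(neg_counts)  # sample variances
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0.0:
        # No variation in either class: treat the term as uninformative.
        return 0.0
    return (m1 - m2) / se


def select_features(docs, labels, threshold):
    """Keep terms whose absolute t-score reaches the threshold.

    docs: list of token lists; labels: 1 = positive, 0 = negative.
    Each class needs at least two documents for the sample variance.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    selected = {}
    for term in vocab:
        pos = [doc.count(term) for doc, y in zip(docs, labels) if y == 1]
        neg = [doc.count(term) for doc, y in zip(docs, labels) if y == 0]
        t = welch_t_score(pos, neg)
        if abs(t) >= threshold:
            selected[term] = t
    return selected


# Hypothetical toy corpus, for illustration only.
docs = [
    "good great film".split(),
    "good plot good cast".split(),
    "great fun film".split(),
    "bad boring film".split(),
    "bad plot bad cast".split(),
    "boring bad film".split(),
]
labels = [1, 1, 1, 0, 0, 0]

features = select_features(docs, labels, threshold=1.5)
print(sorted(features))
```

On this toy corpus, terms that occur equally often in both classes (such as "film" or "plot") receive a score of zero and are dropped, while class-specific terms are kept. The selected terms could then be used as the feature set for a classifier such as SVM, NB or AdaBoost.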

Acknowledgements

The authors are thankful to the University Grants Commission, New Delhi for supporting this research at School of Computer Sciences, North Maharashtra University, Jalgaon under the Special Assistance Programme (SAP) at the level of DRS-II.

Author information

Corresponding author

Correspondence to Sudarshan S. Sonawane.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sonawane, S.S., Kolhe, S.R. (2018). Term Co-occurrence Based Feature Selection for Sentiment Classification. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_31

  • DOI: https://doi.org/10.1007/978-981-10-8657-1_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8656-4

  • Online ISBN: 978-981-10-8657-1

  • eBook Packages: Computer Science (R0)