Term Co-occurrence Based Feature Selection for Sentiment Classification

Sonawane, Sudarshan S.; Kolhe, Satish R.

doi:10.1007/978-981-10-8657-1_31

Part of the book series: Communications in Computer and Information Science (CCIS, volume 827)

Included in the following conference series:

International Conference on Next Generation Computing Technologies

Abstract

This paper explores a feature selection strategy for sentiment classification and compares it with other significant feature selection strategies from the contemporary literature. The proposed feature selection models use the statistical measures t-score and z-score. SVM, NB and AdaBoost classifiers are used for classification and compared. The objective is to explore and evaluate the scope of statistical measures for identifying optimal features, and the significance of those features for classifying opinions with different classifiers. Performance is analysed on datasets of diverse kinds (movie reviews, product reviews and tweets), with experiments run on both the proposed feature selection strategies and other strategies found in the literature. The experimental results show that the features selected using t-score and z-score are robust and outperform those chosen by the other feature selection strategies. To assess the significance of the proposed feature selection models, classification is carried out with the three classifiers SVM, NB and AdaBoost. The classification accuracy on the features obtained by the proposed models is much higher than the accuracy on the features selected by other contemporary models. Among the three classifiers, AdaBoost outperforms both SVM and NB.
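The abstract does not state the exact scoring formula, so the following is only a minimal sketch of what t-score based feature selection could look like. It assumes a two-sample Welch t-statistic computed over per-document term counts in the positive and negative classes; the function names `welch_t`, `term_t_scores` and `select_top_k` are hypothetical, and the term co-occurrence component named in the title is not modelled here.

```python
import math
from collections import Counter


def welch_t(a, b):
    """Two-sample Welch t-statistic for two lists of numbers (each of length >= 2)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    diff = mean_a - mean_b
    if std_err == 0.0:
        # No within-class variation: return 0 for equal means,
        # otherwise a signed infinity.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / std_err


def term_t_scores(pos_docs, neg_docs):
    """Map each term to its t-score; documents are lists of tokens."""
    pos_counts = [Counter(doc) for doc in pos_docs]
    neg_counts = [Counter(doc) for doc in neg_docs]
    vocab = {term for doc in pos_docs + neg_docs for term in doc}
    return {
        term: welch_t([c[term] for c in pos_counts],
                      [c[term] for c in neg_counts])
        for term in vocab
    }


def select_top_k(scores, k):
    """Keep the k terms with the largest |t|; ties are broken alphabetically."""
    return sorted(scores, key=lambda term: (-abs(scores[term]), term))[:k]


# Toy data, invented purely for illustration.
pos_docs = [["good", "great", "movie"],
            ["good", "plot", "movie"],
            ["great", "acting", "good", "good"]]
neg_docs = [["bad", "movie", "boring"],
            ["bad", "plot", "bad"],
            ["boring", "acting", "movie"]]

scores = term_t_scores(pos_docs, neg_docs)
print(select_top_k(scores, 3))
```

Terms that occur about equally often in both classes, such as "movie" in the toy data, score near zero and are dropped, while class-specific terms receive a large absolute score. A z-score variant would follow the same pattern with a different test statistic, and the selected terms would then serve as the feature set passed to a classifier such as SVM, NB or AdaBoost.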

Acknowledgements

The authors are thankful to the University Grants Commission, New Delhi for supporting this research at School of Computer Sciences, North Maharashtra University, Jalgaon under the Special Assistance Programme (SAP) at the level of DRS-II.

Author information

Authors and Affiliations

Department of Computer Engineering, Shri Gulabrao Deokar College of Engineering, Jalgaon, India
Sudarshan S. Sonawane
School of Computer Sciences, North Maharashtra University, Jalgaon, India
Satish R. Kolhe

Corresponding author

Correspondence to Sudarshan S. Sonawane.

Editor information

Editors and Affiliations

Indian Institute of Technology Patna, Patna, Bihar, India
Pushpak Bhattacharyya
University of Petroleum and Energy Studies, Dehradun, India
Hanumat G. Sastry
University of Petroleum and Energy Studies, Dehradun, India
Venkatadri Marriboyina
University of Petroleum and Energy Studies, Dehradun, India
Rashmi Sharma

About this paper

Cite this paper

Sonawane, S.S., Kolhe, S.R. (2018). Term Co-occurrence Based Feature Selection for Sentiment Classification. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_31

DOI: https://doi.org/10.1007/978-981-10-8657-1_31
Published: 09 June 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8656-4
Online ISBN: 978-981-10-8657-1
eBook Packages: Computer Science, Computer Science (R0)
