Abstract
This paper explores a feature selection strategy for sentiment classification and compares it with other significant feature selection strategies from the contemporary literature. The proposed feature selection models use the statistical measures of t-score and z-score. The objective is to explore and evaluate the scope of statistical measures for identifying optimal features and their significance in classifying opinions using different classifiers. Performance analysis was carried out on varied datasets, including movie reviews, product reviews, and tweets, for both the proposed feature selection strategies and other strategies found in the literature. The experimental results indicate that the optimal features selected using t-score and z-score are robust and outperform the other feature selection strategies. To assess the significance of the proposed feature selection models, classification was carried out using three classifiers: SVM, NB, and AdaBoost. The classification accuracy obtained with the features selected by the proposed models is much higher than that obtained with the features selected by other contemporary models. Among the three classifiers, AdaBoost outperformed SVM and NB.
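As an illustration of the general idea of score-based feature selection, the sketch below ranks term features by a two-sample (Welch) t-score between positive and negative documents and keeps the top k. The abstract does not give the paper's exact formulation, so the Welch form of the statistic, the function names `t_scores` and `select_top_k`, and the toy term-count matrix are all assumptions made for illustration, not the authors' method.

```python
import math
from statistics import mean, variance

def t_scores(X, y):
    """Welch t-score per feature column, comparing rows labelled 1 against rows labelled 0.

    Assumption: each class has at least two documents, since the sample
    variance is undefined otherwise.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        diff = mean(a) - mean(b)
        # Standard error of the difference between the two class means.
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0:
            # No within-class spread: either no separation or perfect separation.
            scores.append(0.0 if diff == 0 else math.copysign(math.inf, diff))
        else:
            scores.append(diff / se)
    return scores

def select_top_k(scores, k):
    """Return the indices of the k features with the largest absolute score."""
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: rows are documents, columns are term counts.
X = [
    [3, 1, 0],
    [2, 1, 1],
    [4, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = positive review, 0 = negative review

scores = t_scores(X, y)
selected = select_top_k(scores, 2)
```

Ranking by absolute value treats terms that indicate either class as equally useful. A z-score variant would follow the same pattern with a different standardisation; the selected columns would then be passed to a classifier such as SVM, NB, or AdaBoost.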
Acknowledgements
The authors are thankful to the University Grants Commission, New Delhi for supporting this research at School of Computer Sciences, North Maharashtra University, Jalgaon under the Special Assistance Programme (SAP) at the level of DRS-II.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sonawane, S.S., Kolhe, S.R. (2018). Term Co-occurrence Based Feature Selection for Sentiment Classification. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_31
DOI: https://doi.org/10.1007/978-981-10-8657-1_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8656-4
Online ISBN: 978-981-10-8657-1
eBook Packages: Computer Science (R0)