Efficient feature selection techniques for sentiment analysis

Madasu, Avinash; Elango, Sivasankar

doi:10.1007/s11042-019-08409-z

Published: 14 December 2019

Volume 79, pages 6313–6335, (2020)

Multimedia Tools and Applications

Abstract

Sentiment analysis is a field of study that focuses on identifying opinions expressed in text and classifying them into positive, negative and neutral polarities. Feature selection is a crucial step in machine learning. In this paper, we study the performance of different feature selection techniques for sentiment analysis. Term Frequency-Inverse Document Frequency (TF-IDF) is used as the feature extraction technique to create the feature vocabulary. We experiment with various Feature Selection (FS) techniques to select the best set of features from this vocabulary. Different machine learning classifiers, namely Logistic Regression (LR), Support Vector Machines (SVM), Decision Tree (DT) and Naive Bayes (NB), are trained on the selected features. The ensemble techniques Bagging and Random Subspace are applied to these classifiers to enhance performance on sentiment analysis. We show that the best FS techniques, when combined with ensemble methods, achieve remarkable results on sentiment analysis. We also compare the performance of FS methods trained using Bagging and Random Subspace with that of varied neural network architectures. We show that FS techniques trained using ensemble classifiers outperform neural networks while requiring significantly less training time and fewer parameters, thereby eliminating the need for extensive hyper-parameter tuning.
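The first two stages described in the abstract (TF-IDF feature extraction followed by filter-based feature selection) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy documents, the function names tfidf, chi2_scores and select_k_best, the smoothed IDF variant, and the use of the chi-square statistic as the filter. The abstract does not say which FS techniques the paper evaluates, so this is not the authors' implementation. The classifier and ensemble stages are left out.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix) of TF-IDF weights for tokenized documents.

    Uses relative term frequency and a smoothed IDF (an assumed variant).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, matrix


def chi2_scores(matrix, labels):
    """Chi-square score per feature.

    Observed value: summed feature weight within each class.
    Expected value: total feature weight times the class prior.
    """
    classes = sorted(set(labels))
    n_docs = len(labels)
    scores = []
    for j in range(len(matrix[0])):
        total = sum(row[j] for row in matrix)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, y in zip(matrix, labels) if y == c)
            expected = total * labels.count(c) / n_docs
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(vocab, scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: label 1 = positive, 0 = negative.
docs = [
    "great movie loved it".split(),
    "loved the great acting".split(),
    "terrible movie hated it".split(),
    "hated the terrible acting".split(),
]
labels = [1, 1, 0, 0]

vocab, matrix = tfidf(docs)
scores = chi2_scores(matrix, labels)
selected = select_k_best(vocab, scores, k=4)
print(selected)
```

On this toy corpus the terms that occur in only one class score highest, while terms shared evenly by both classes score zero. In a fuller pipeline the reduced feature matrix would then be passed to the classifiers and ensemble methods named in the abstract.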

Notes

Code will be available at the repository https://github.com/avinashsai/MTAP
Train/test splits can be found at https://github.com/avinashsai/Cross-domain-sentiment-analysis/tree/master/Dataset/Actualdata
http://nlp.stanford.edu/data/glove.840B.300d.zip

Author information

Authors and Affiliations

Samsung R and D Institute India, Bengaluru, Bagmane Constellation Business Park, Outer Ring Road, Doddanekundi Circle, Marathahalli Post, Bengaluru, Karnataka, 560037, India
Avinash Madasu
Department of Computer Science, National Institute of Technology, Tiruchirappalli, Tanjore Main Road, National Highway 67, Near BHEL Trichy, Tiruchirappalli, Tamil Nadu, 620015, India
Sivasankar Elango

Corresponding author

Correspondence to Avinash Madasu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Madasu, A., Elango, S. Efficient feature selection techniques for sentiment analysis. Multimed Tools Appl 79, 6313–6335 (2020). https://doi.org/10.1007/s11042-019-08409-z

Received: 06 December 2018
Revised: 01 October 2019
Accepted: 01 November 2019
Published: 14 December 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11042-019-08409-z
