Abstract
This paper presents a study of Twitter sentiment analysis in which tweets are gathered and the sentiment behind each tweet is evaluated using various machine learning techniques. It presents an empirical evaluation of correlation based feature selection for sentiment classification on Twitter data. The data are extracted from Twitter in real time, and text preprocessing and feature extraction are applied to the textual data. Correlation based attribute selection methods are then used, and machine learning classifiers (SVM, Naïve Bayes, Random Forest, Meta classifier, SGD, Logistic Regression) are compared on various performance parameters to show which classifier gives better results. The results show that when STWV and attribute selection methods are used together in the same setup, the classifiers achieve accuracy between 78% and 88%, with roughly 0.88 true positive rate and 0.15 false positive rate, which is far better than when no attribute selection method is used.
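To make the idea of correlation based attribute ranking concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a whitespace tokenizer, raw word counts as features, and a simple univariate ranking by absolute Pearson correlation with the class label; this is a simplification of correlation based subset selection, which also penalizes redundancy among the selected features. The toy tweets, the labels, and every function name are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real tweet preprocessing)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and a document-by-term count matrix."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({tok for c in counts for tok in c})
    matrix = [[c[tok] for tok in vocab] for c in counts]
    return vocab, matrix


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(docs, labels, top_k):
    """Rank terms by |correlation with the label| and keep the top_k."""
    vocab, matrix = bag_of_words(docs)
    scored = []
    for j, tok in enumerate(vocab):
        column = [row[j] for row in matrix]
        scored.append((abs(pearson(column, labels)), tok))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(tok, score) for score, tok in scored[:top_k]]


# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
tweets = ["love this phone", "hate this phone", "love the camera", "hate the battery"]
labels = [1, 0, 1, 0]
top = rank_by_correlation(tweets, labels, top_k=2)
print(top)
```

On this toy data, terms such as "phone" and "this" appear equally often in both classes and receive a score of zero, so a classifier trained only on the top ranked terms would see a much smaller feature space.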
Keywords
- Twitter analysis
- Sentiment analysis
- Correlation based feature selection
- Machine learning techniques
Acknowledgements
This publication is an outcome of the R&D work undertaken under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation, with the cooperation of GGSIP University.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Sharma, S., Jain, A. (2020). An Empirical Evaluation of Correlation Based Feature Selection for Tweet Sentiment Classification. In: Gunjan, V., Senatore, S., Kumar, A., Gao, XZ., Merugu, S. (eds) Advances in Cybernetics, Cognition, and Machine Learning for Communication Technologies. Lecture Notes in Electrical Engineering, vol 643. Springer, Singapore. https://doi.org/10.1007/978-981-15-3125-5_22
DOI: https://doi.org/10.1007/978-981-15-3125-5_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3124-8
Online ISBN: 978-981-15-3125-5
eBook Packages: Intelligent Technologies and Robotics (R0)