An Empirical Evaluation of Correlation Based Feature Selection for Tweet Sentiment Classification

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 643)


This paper presents a study of Twitter sentiment analysis in which tweets are gathered and the sentiment behind each tweet is evaluated using various machine learning techniques. It presents an empirical evaluation of correlation-based feature selection for sentiment classification on Twitter data. The data are extracted from Twitter in real time, and text preprocessing and feature extraction are applied to the textual data. Correlation-based attribute selection methods are used, and machine learning classifiers (SVM, Naïve Bayes, Random Forest, Meta classifier, SGD, Logistic Regression) are compared on several performance measures to show which classifier gives better results. The results show that when STWV and attribute selection methods are used together in the same setup, the classifiers achieve accuracy between 78% and 88%, with a true positive rate of about 0.88 and a false positive rate of about 0.15, which is far better than when no attribute selection method is used.
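As a rough illustration of the idea summarised above, the following minimal sketch ranks bag-of-words features by the absolute Pearson correlation between word presence and the class label, and computes the true positive and false positive rates from a set of predictions. It is an assumption-laden toy, not the authors' setup: the tweets, labels, and function names are hypothetical, and the ranking is univariate only, whereas correlation-based subset selection as described by Hall also penalises redundancy between features.

```python
import math


def tokenize(text):
    """Lowercase and split on whitespace (stand-in for real tweet preprocessing)."""
    return text.lower().split()


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(tweets, labels):
    """Rank vocabulary words by |correlation| between word presence and the label."""
    docs = [set(tokenize(t)) for t in tweets]
    vocab = sorted(set().union(*docs))
    scores = {}
    for word in vocab:
        presence = [1 if word in doc else 0 for doc in docs]
        scores[word] = abs(pearson(presence, labels))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked, scores


def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
TWEETS = [
    "good phone love it",
    "bad phone hate it",
    "love this good day",
    "hate this bad day",
]
LABELS = [1, 0, 1, 0]

if __name__ == "__main__":
    ranked, scores = rank_features(TWEETS, LABELS)
    top_k = ranked[:4]  # keep only the most label-correlated words
    print("selected features:", top_k)
    print("TPR, FPR:", rates([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this toy corpus the sentiment-bearing words correlate perfectly with the label while topic words such as "phone" score zero, which is the kind of separation a correlation filter is meant to exploit before the classifiers are trained.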


Keywords: Twitter analysis; Sentiment analysis; Correlation based feature selection; Machine learning techniques



This publication is an outcome of the R&D work undertaken under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation, with the cooperation of GGSIP University.



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. USICT, Guru Gobind Singh Indraprastha University, Delhi, India
