
An Empirical Evaluation of Correlation Based Feature Selection for Tweet Sentiment Classification

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 643)

Abstract

This paper presents a study of Twitter sentiment analysis in which tweets are collected and the sentiment behind each tweet is evaluated using various machine learning techniques. It presents an empirical evaluation of correlation based feature selection for sentiment classification on Twitter data. The data is extracted from Twitter in real time, and text preprocessing and feature extraction are applied to the textual data. Correlation based attribute selection methods are used, and machine learning classifiers (SVM, Naïve Bayes, Random Forest, Meta classifier, SGD, Logistic Regression) are compared on several performance metrics to determine which classifier performs best. The results show that when STWV is combined with attribute selection methods in the same setup, the classifiers achieve accuracy between 78% and 88%, with a true positive rate of about 0.88 and a false positive rate of about 0.15, which is far better than when no attribute selection method is used.
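The abstract names correlation based attribute selection but does not give an implementation. The sketch below illustrates the general idea behind Hall's correlation-based feature selection (CFS): a subset of k features is scored by the merit k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), which rewards features correlated with the class and penalises features correlated with each other. It is a simplified, assumption-laden illustration, not the authors' setup: Hall's formulation uses symmetrical uncertainty and a best-first search, whereas this sketch substitutes Pearson correlation on binary features and a greedy forward search. The helper names (`binary_bow`, `pearson`, `cfs_merit`, `cfs_forward_select`) are hypothetical, `binary_bow` is only a stand-in for the STWV feature extraction step (STWV most likely refers to Weka's StringToWordVector filter), and the four tweets are toy data, not the paper's dataset.

```python
import math
from itertools import combinations


def binary_bow(docs):
    """Hypothetical stand-in for STWV: one binary column per lower-cased token."""
    token_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*token_sets))
    return {w: [1 if w in toks else 0 for toks in token_sets] for w in vocab}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, columns, labels):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels):
    """Greedy forward search (a simplification of best-first search):
    repeatedly add the feature that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        scores = {f: cfs_merit(selected + [f], columns, labels) for f in remaining}
        f_best = max(scores, key=scores.get)  # ties resolve to the first feature
        if scores[f_best] <= best:
            break
        best = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best


if __name__ == "__main__":
    # Toy tweets (1 = positive, 0 = negative); not data from the paper.
    tweets = ["love this phone", "great love it", "hate this phone", "awful hate it"]
    labels = [1, 1, 0, 0]
    sel, merit = cfs_forward_select(binary_bow(tweets), labels)
    print(sel, merit)  # → ['hate'] 1.0
```

In this toy case "hate" and "love" each correlate perfectly with the label, but once "hate" is selected, adding "love" does not raise the merit because the two features are perfectly (negatively) correlated with each other. This redundancy penalty is what distinguishes subset-based CFS from ranking individual features by their class correlation alone.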

Keywords

  • Twitter analysis
  • Sentiment analysis
  • Correlation based feature selection
  • Machine learning techniques


Acknowledgements

This publication is an outcome of R&D work undertaken in a project under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation with the cooperation of GGSIP University.

Author information

Corresponding author

Correspondence to Sanur Sharma.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Sharma, S., Jain, A. (2020). An Empirical Evaluation of Correlation Based Feature Selection for Tweet Sentiment Classification. In: Gunjan, V., Senatore, S., Kumar, A., Gao, XZ., Merugu, S. (eds) Advances in Cybernetics, Cognition, and Machine Learning for Communication Technologies. Lecture Notes in Electrical Engineering, vol 643. Springer, Singapore. https://doi.org/10.1007/978-981-15-3125-5_22
