Classifying Political Tweets Using Naïve Bayes and Support Vector Machines

  • Ahmed Al Hamoud
  • Ali Alwehaibi
  • Kaushik Roy
  • Marwan Bikdash
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10868)

Abstract

Twitter, one of the most popular microblogging platforms, contains a huge amount of meaningful information and can be used for opinion mining and sentiment analysis; its data comprises the text communication of more than 330 million monthly active users. This research effort applies machine learning techniques to determine whether the content of a tweet is political or apolitical. Preprocessing involves cleaning the text to obtain meaningful information and accurate opinions. Bag-of-Words (BOW), Term Frequency (TF), and Term Frequency-Inverse Document Frequency (TF-IDF) were used to extract features from the Twitter data. The Chi-Square technique was then used to select the salient features from the high-dimensional feature set. Finally, Support Vector Machines (SVMs) and Naïve Bayes (NB) were applied to classify the tweets. The results suggest that SVM with BOW provides the highest accuracy and F-measure.
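
The paper does not reproduce its implementation here, but the pipeline the abstract describes maps naturally onto standard tooling. The following is a minimal sketch, assuming Python with scikit-learn; the toy tweets, labels, and the number of selected features (k) are hypothetical, and this is an illustration of the general technique rather than the authors' code.

# Illustrative sketch (not the authors' implementation): BOW or TF-IDF features,
# Chi-Square feature selection, and Naive Bayes / linear SVM classifiers for
# labelling tweets as political (1) or apolitical (0).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the study used a labelled Twitter corpus.
tweets = [
    "the senate vote on the new bill is scheduled for tomorrow",
    "just had the best coffee of my life",
    "the candidate drew a huge crowd at the rally",
    "watching the game with friends tonight",
]
labels = [1, 0, 1, 0]  # 1 = political, 0 = apolitical

def build_pipeline(vectorizer, classifier, k=5):
    """Feature extraction -> Chi-Square selection of the k best terms -> classifier."""
    return Pipeline([
        ("features", vectorizer),
        ("select", SelectKBest(chi2, k=k)),
        ("clf", classifier),
    ])

# SVM with Bag-of-Words (raw term counts) -- the configuration reported as best.
svm_bow = build_pipeline(CountVectorizer(stop_words="english"), LinearSVC())
svm_bow.fit(tweets, labels)

# Naive Bayes with TF-IDF weighting, for comparison.
nb_tfidf = build_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
nb_tfidf.fit(tweets, labels)

# Classify an unseen tweet with each model.
print(svm_bow.predict(["congress debates the new tax policy"]))
print(nb_tfidf.predict(["congress debates the new tax policy"]))

In a full study, the same comparison would be run over the complete labelled corpus, with the classifiers evaluated on held-out data using accuracy and F-measure.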

Keywords

Sentiment analysis · Natural language processing · Opinion mining · Feature selection

Notes

Acknowledgements

This research is based upon work supported by the Science & Technology Center: Bio/Computational Evolution in Action Consortium (BEACON).


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Ahmed Al Hamoud (1)
  • Ali Alwehaibi (1)
  • Kaushik Roy (2)
  • Marwan Bikdash (1)
  1. Department of Computational Science and Engineering, North Carolina Agricultural and Technical State University, Greensboro, USA
  2. Department of Computer Science, North Carolina Agricultural and Technical State University, Greensboro, USA
