A Comparison of Supervised Machine Learning Approaches for Categorized Tweets

  • M. Vadivukarassi
  • N. Puviarasan
  • P. Aruna
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 26)

Abstract

On Twitter, users share their opinions in short messages called tweets. In this paper, machine learning techniques are implemented for tweet categorization. These techniques comprise different classification models, each with its own performance measures. The proposed work identifies a simple and workable approach for predicting the category of a tweet and investigates the machine learning techniques on real-time Twitter data. Three techniques are used: the Naïve Bayes classifier, LinearSVC, and MultinomialNB. Before a classifier is applied, the tweets are pre-processed with a simple method, term frequency-inverse document frequency (TF-IDF), which produces weight scores that serve as the feature vector for tweet classification. This feature extraction method is used to identify the most frequent words in the tweets. The dataset, referred to as the proposed corpus, consists of English tweets collected for each topic from the Twitter streaming API. The performance of tweet classification is measured in terms of accuracy. The results show that MultinomialNB performed best experimentally among the three techniques, achieving 79% accuracy.
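
As a rough illustration of the pipeline the abstract describes (TF-IDF weights used as the feature vector, followed by a multinomial Naïve Bayes classifier), the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the abstract does not state which library or settings were used, the smoothed IDF formula is one common variant chosen here as an assumption, and the four training tweets, the two category names, and all function and class names (`tokenize`, `fit_idf`, `tfidf_vector`, `MultinomialNaiveBayes`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (real tweets need more cleaning)."""
    return text.lower().split()


def fit_idf(token_lists):
    """Smoothed IDF per term: ln((1 + n_docs) / (1 + doc_freq)) + 1 (assumed variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector {term: count * idf}; terms unseen in training are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over weighted term vectors with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, vectors, labels):
        self.vocab = {t for vec in vectors for t in vec}
        self.weights = defaultdict(lambda: defaultdict(float))  # label -> term -> weight
        self.totals = defaultdict(float)                        # label -> summed weight
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(k / len(labels)) for c, k in label_counts.items()}
        for vec, label in zip(vectors, labels):
            for term, w in vec.items():
                self.weights[label][term] += w
                self.totals[label] += w
        return self

    def predict(self, vec):
        def score(label):
            # Log prior plus weighted log-likelihood of each term under this label.
            denom = self.totals[label] + self.alpha * len(self.vocab)
            return self.log_prior[label] + sum(
                w * math.log((self.weights[label][t] + self.alpha) / denom)
                for t, w in vec.items()
            )
        return max(self.log_prior, key=score)


# Hypothetical toy corpus standing in for topic-labelled English tweets.
train = [
    ("the team won the football match", "sports"),
    ("great goal in the final match", "sports"),
    ("new phone released with better camera", "technology"),
    ("software update improves phone battery", "technology"),
]
train_tokens = [tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]

idf = fit_idf(train_tokens)
model = MultinomialNaiveBayes().fit(
    [tfidf_vector(tokens, idf) for tokens in train_tokens], train_labels
)


def classify(text):
    """Predict the category of a new tweet with the fitted IDF table and model."""
    return model.predict(tfidf_vector(tokenize(text), idf))


print(classify("football final tonight"))
print(classify("phone camera update"))
```

Accuracy, the measure reported in the abstract, would then be the fraction of held-out tweets for which `classify` returns the true category. A linear support vector classifier could be trained on the same TF-IDF vectors for comparison.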

Keywords

LinearSVC, Machine learning, MultinomialNB, Naïve Bayes, Twitter, Tweet categorization

Notes

Acknowledgements

This research work was supported by the Department of Computer Science and Engineering, Annamalai University, and funded by the University Grants Commission (UGC) of the Government of India.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Annamalai University, Chidambaram, India
  2. Department of Computer and Information Science, Annamalai University, Chidambaram, India
