Harnessing Twitter for Automatic Sentiment Identification Using Machine Learning Techniques

  • Amiya Kumar Dash
  • Jitendra Kumar Rout
  • Sanjay Kumar Jena
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 44)

Abstract

User-generated content on Twitter is an ample source for gathering individuals’ opinions. Because of the huge number of tweets in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing tweets from corpora, which requires knowledge of sentiment-bearing words. Many computational techniques, models and algorithms exist for identifying sentiment in unstructured text. Most of them rely on machine learning techniques that use a bag-of-words (BoW) representation as their basis. In this paper, we apply three machine learning algorithms (Naive Bayes (NB), Maximum Entropy (ME) and Support Vector Machines (SVM)) to sentiment identification of tweets in order to study the effectiveness of various feature combinations. Our experiments demonstrate that NB with Laplace smoothing using unigram and Part-of-Speech (POS) features, and SVM with unigram features, are effective in classifying the tweets.
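
As a rough illustration of one configuration named in the abstract, the following is a minimal sketch of a multinomial Naive Bayes classifier over unigram bag-of-words counts with Laplace (add-one) smoothing. It is not the authors' implementation: the whitespace tokenizer, the function names `train_nb` and `predict_nb`, the `alpha` parameter and the four toy tweets are all assumptions made for this example, and POS features are left out for brevity.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace split: a deliberately simple unigram tokenizer (assumption).
    return text.lower().split()


def train_nb(docs, labels):
    """Collect class counts, per-class unigram counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log-posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        # Smoothed denominator: class token total plus alpha per vocabulary word.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (not taken from the paper).
train_docs = [
    "i love this phone",
    "what a great day",
    "i hate this traffic",
    "what a terrible day",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train_docs, train_labels)
```

Calling `predict_nb(model, "great phone")` scores each class by its log prior plus the smoothed log-likelihood of each known unigram. With `alpha=1.0`, a word that never occurred in a class still receives a small non-zero probability, which is the purpose of Laplace smoothing.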

Keywords

Bag-of-words (BoW), Machine learning algorithms, Laplace smoothing, Part-of-Speech (POS)

Copyright information

© Springer India 2016

Authors and Affiliations

  • Amiya Kumar Dash (1)
  • Jitendra Kumar Rout (1)
  • Sanjay Kumar Jena (1)
  1. Department of Computer Science & Engineering, National Institute of Technology Rourkela, Rourkela, India
