Deep Learning for Twitter Sentiment Analysis: The Effect of Pre-trained Word Embedding

Machine Learning Paradigms

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 18)

Abstract

Twitter is the most popular microblogging platform, with millions of users exchanging a huge volume of text messages, called “tweets”, every day. This has produced an enormous source of unstructured data, i.e. Big Data. Companies and organizations can analyze such Big Data to extract customers' perspectives on their products or services and to monitor marketing trends. Automatically understanding the opinions behind user-generated content, referred to as “Big Data Analytics”, is therefore of great concern. Deep learning can make the discriminative tasks of Big Data Analytics easier and improve their performance. Deep learning is a branch of machine learning that relies on artificial neural networks with multiple layers, and it has been used extensively to address Big Data challenges such as semantic indexing, data tagging and immediate information retrieval. When applied to text, deep learning requires its input to be represented as word embeddings, i.e. as real-valued vectors in a high-dimensional space. However, word embedding models need large corpora to be trained and to produce reliable word vectors. For this reason, a number of pre-trained word embeddings are freely available to leverage. In effect, these are vocabularies of words paired with their corresponding n-dimensional vectors, released by different research teams. In this work, we analyze a huge number of tweets, taken as Big Data, and classify their polarity using a deep learning approach with four notable pre-trained word vectors, namely Google’s Word2Vec, Stanford’s Crawl GloVe, Stanford’s Twitter GloVe, and Facebook’s FastText. One major conclusion is that tweet classification using deep learning outperforms the baseline machine learning algorithms. Regarding the pre-trained word embeddings, FastText provides more consistent results across datasets, while Twitter GloVe achieves very good accuracy despite its lower dimensionality.
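The abstract describes pre-trained embeddings as words paired with n-dimensional vectors that feed a deep learning classifier. As a rough illustration of that input step only (not the authors' pipeline, whose tokenization, network architecture and handling of unknown words are not described here), the sketch below parses vectors stored one per line as `word v1 ... vn`, a layout assumed to match the GloVe text releases, builds an embedding matrix for a tweet vocabulary, and turns a tweet into a fixed-length index sequence that a network's embedding layer could consume. The whitespace tokenizer, the all-zero rows for padding and for words missing from the pre-trained set, and every function name are hypothetical choices; the toy vectors are made up, and Google's Word2Vec release, which is distributed in a binary format, would need a different loader.

```python
import io
from typing import Dict, List, TextIO


def load_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word vectors stored one per line as 'word v1 v2 ... vn'.

    Assumption: a plain-text layout like the GloVe .txt files. A first line
    holding only two integers (the 'count dim' header of fastText .vec files)
    is skipped.
    """
    vectors: Dict[str, List[float]] = {}
    for i, line in enumerate(stream):
        parts = line.split()
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # fastText-style header: vocabulary size and dimension
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_vocab(tweets: List[str]) -> Dict[str, int]:
    """Map each lower-cased whitespace token to an index; 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for tweet in tweets:
        for token in tweet.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def embedding_matrix(vocab: Dict[str, int],
                     vectors: Dict[str, List[float]],
                     dim: int) -> List[List[float]]:
    """Row i holds the pre-trained vector of the token with index i.

    Row 0 (padding) and tokens absent from the pre-trained set stay all-zero
    (a hypothetical choice; random initialisation is another common option).
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for token, idx in vocab.items():
        vec = vectors.get(token)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix


def encode(tweet: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a tweet to exactly max_len indices (truncate, then zero-pad)."""
    ids = [vocab.get(t, 0) for t in tweet.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy usage with made-up 3-dimensional vectors in fastText-style layout.
toy_file = io.StringIO("2 3\ngood 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
vectors = load_vectors(toy_file)
vocab = build_vocab(["Good movie", "bad day"])  # good=1, movie=2, bad=3, day=4
matrix = embedding_matrix(vocab, vectors, dim=3)
print(encode("good day", vocab, max_len=4))  # prints [1, 4, 0, 0]
```

Keeping padding at index 0 with an all-zero row lets tweets of different lengths share one matrix. Comparing embeddings such as those named above would then mostly mean pointing the loader at a different file and changing `dim`.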

Author information

Corresponding author

Correspondence to Maria Virvou.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Krouska, A., Troussas, C., Virvou, M. (2020). Deep Learning for Twitter Sentiment Analysis: The Effect of Pre-trained Word Embedding. In: Tsihrintzis, G., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-030-49724-8_5