Deep Learning for Twitter Sentiment Analysis: The Effect of Pre-trained Word Embedding

Krouska, Akrivi; Troussas, Christos; Virvou, Maria

doi:10.1007/978-3-030-49724-8_5

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 18)

Abstract

Twitter is the most popular microblogging platform, with millions of users exchanging a huge volume of text messages, called “tweets”, every day. This has produced an enormous source of unstructured data, i.e. Big Data. Companies and organizations can analyze such Big Data to extract customers' perspectives on their products or services and to monitor marketing trends. Automatically understanding the opinions behind user-generated content, a task referred to as “Big Data Analytics”, is of great concern. Deep learning can make the discriminative tasks of Big Data Analytics easier and improve their performance. Deep learning is a branch of machine learning based on artificial neural networks with multiple layers, and it has been used extensively to address Big Data challenges such as semantic indexing, data tagging and immediate information retrieval. Deep learning requires its input to be represented as word embeddings, i.e. each word as a real-valued vector in a high-dimensional space. However, word embedding models need large corpora for training in order to produce reliable word vectors. For this reason, a number of pre-trained word embeddings are freely available to leverage. In effect, these are words and their corresponding n-dimensional word vectors, produced by different research teams. In this work, we analyze a large number of tweets, treated as Big Data, and classify their polarity using a deep learning approach with four notable pre-trained word vectors, namely Google’s Word2Vec, Stanford’s Crawl GloVe, Stanford’s Twitter GloVe, and Facebook’s FastText. One major conclusion is that tweet classification using deep learning outperforms the baseline machine learning algorithms. With regard to pre-trained word embeddings, FastText provides more consistent results across datasets, while Twitter GloVe obtains very good accuracy rates despite its lower dimensionality.
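
To illustrate what "words and their corresponding n-dimensional word vectors" means in practice, the following minimal Python sketch looks up pre-trained vectors for a dataset vocabulary and turns a tweet into the index sequence an embedding layer would consume. It is not the chapter's implementation: the three-dimensional toy vectors, the vocabulary, and the helper names `load_vectors`, `build_embedding_matrix` and `encode` are all assumptions made for illustration. Real pre-trained files have far more words and dimensions.

```python
import io

# Toy stand-in for a pre-trained embedding file in the plain-text
# "word v1 v2 ... vn" layout (values are made up for illustration).
TOY_VECTORS = """\
good 0.9 0.1 0.0
bad -0.8 0.2 0.1
movie 0.0 0.7 0.3
"""


def load_vectors(handle):
    """Parse 'word v1 ... vn' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector of the word with index i; row 0 is padding.

    Words missing from the pre-trained table keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix


def encode(tweet, vocab, max_len):
    """Map a tweet to a fixed-length list of word indices (0 = padding/unknown)."""
    ids = [vocab.get(tok, 0) for tok in tweet.lower().split()]
    return (ids + [0] * max_len)[:max_len]


vectors = load_vectors(io.StringIO(TOY_VECTORS))
vocab = {"good": 1, "movie": 2, "awful": 3}  # hypothetical dataset vocabulary
matrix = build_embedding_matrix(vocab, vectors, dim=3)
ids = encode("Good movie", vocab, max_len=4)
```

In this sketch, "awful" has no pre-trained vector and keeps a zero row. How such out-of-vocabulary words are handled is a design choice that differs between embedding models, and the zero-row fallback here is only one simple option.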

Author information

Authors and Affiliations

Software Engineering Laboratory, Department of Informatics, University of Piraeus, Piraeus, Greece
Akrivi Krouska, Christos Troussas & Maria Virvou

Corresponding author

Correspondence to Maria Virvou.

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Piraeus, Greece
George A. Tsihrintzis
University of Technology Sydney, NSW, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Krouska, A., Troussas, C., Virvou, M. (2020). Deep Learning for Twitter Sentiment Analysis: The Effect of Pre-trained Word Embedding. In: Tsihrintzis, G., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-030-49724-8_5

DOI: https://doi.org/10.1007/978-3-030-49724-8_5
Published: 24 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49723-1
Online ISBN: 978-3-030-49724-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
