Abstract
Twitter is the most popular microblogging platform, with millions of users exchanging a huge volume of text messages, called “tweets”, every day. This has produced an enormous source of unstructured data, i.e. Big Data. Companies and organizations can analyze such data to extract customers' perspectives on their products or services and to monitor marketing trends. Automatically understanding the opinions behind user-generated content, a task within “Big Data Analytics”, is of great concern. Deep learning can make the discriminative tasks of Big Data Analytics easier and improve their performance. Deep learning is a branch of machine learning that refers to artificial neural networks with multiple layers, and it has been used extensively to address Big Data challenges such as semantic indexing, data tagging and immediate information retrieval. Deep learning requires its input to be represented as word embeddings, i.e. as real-valued vectors in a high-dimensional space. However, word embedding models need large corpora for training in order to produce reliable word vectors. For this reason, a number of pre-trained word embeddings are freely available. In effect, these are words paired with their corresponding n-dimensional word vectors, produced by different research teams. In this work, we analyze large numbers of tweets, treated as Big Data, and classify their polarity using a deep learning approach with four notable pre-trained word vectors, namely Google’s Word2Vec, Stanford’s Common Crawl GloVe, Stanford’s Twitter GloVe, and Facebook’s FastText. One major conclusion is that tweet classification using deep learning outperforms the baseline machine learning algorithms. With regard to pre-trained word embeddings, FastText provides more consistent results across datasets, while Twitter GloVe obtains very good accuracy rates despite its lower dimensionality.
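To make the notion of a pre-trained word embedding concrete, the following minimal sketch shows how a tweet might be turned into a fixed-size matrix of word vectors before being fed to a neural network. It is an illustration under stated assumptions, not the chapter's implementation: the table `TOY_VECTORS`, its 4-dimensional values, the function `tweet_to_matrix`, and the whitespace tokenization are all hypothetical. Real pre-trained tables such as Word2Vec, GloVe or FastText hold far more words with much higher-dimensional vectors.

```python
from typing import Dict, List

# Hypothetical 4-dimensional vectors standing in for a real pre-trained table.
# The values are invented for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "great": [0.8, 0.1, 0.3, 0.5],
    "phone": [0.2, 0.7, 0.1, 0.4],
    "terrible": [-0.7, 0.2, 0.1, -0.5],
}


def tweet_to_matrix(
    tweet: str, vectors: Dict[str, List[float]], dim: int, max_len: int
) -> List[List[float]]:
    """Map a tweet to a max_len x dim matrix of word vectors.

    Assumptions of this sketch: lowercase whitespace tokenization, a zero
    vector for out-of-vocabulary words, and zero padding at the end.
    """
    # Truncate tweets that are longer than max_len tokens.
    tokens = tweet.lower().split()[:max_len]
    # Look up each token; unknown words fall back to a zero vector.
    rows = [list(vectors.get(tok, [0.0] * dim)) for tok in tokens]
    # Pad short tweets so every tweet yields the same matrix shape.
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


matrix = tweet_to_matrix("Great phone lol", TOY_VECTORS, dim=4, max_len=5)
```

Here `matrix` has five rows of four values each: the first two rows are the vectors for "great" and "phone", the third is a zero vector because "lol" is not in the toy table, and the last two are padding. Swapping one pre-trained table for another only changes the lookup table and `dim`, which is why different embeddings can be compared with the same classifier.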
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Krouska, A., Troussas, C., Virvou, M. (2020). Deep Learning for Twitter Sentiment Analysis: The Effect of Pre-trained Word Embedding. In: Tsihrintzis, G., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-030-49724-8_5
DOI: https://doi.org/10.1007/978-3-030-49724-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49723-1
Online ISBN: 978-3-030-49724-8
eBook Packages: Mathematics and Statistics (R0)