Sentiment Analysis of Arabic and English Tweets

Elhadad, Mohamed K.; Li, Kin Fun; Gebali, Fayez

doi:10.1007/978-3-030-15035-8_32

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 927)

Included in the following conference series:

Workshops of the International Conference on Advanced Information Networking and Applications

Abstract

Data posted daily on social media sites, in many different languages, continues to grow rapidly, and automated classification has become one of the most important tasks for handling, managing, and organizing this large volume of text. Among the many social media sites, Twitter is one of the most popular and widely used: users communicate with each other, share their opinions, and express their emotions (sentiments) in short posts of up to 140 characters. Accordingly, many companies and organizations analyze these sentiments to evaluate users’ thoughts and determine their polarity from the content of the text. Natural language processing techniques, statistics, and machine learning algorithms are used to identify and extract the sentiment of the text, and in practice many data mining techniques and algorithms are applied to find patterns and correlations in the data. This paper proposes an efficient approach to handling Tweets in both Arabic and English, with different processing techniques applied. The approach uses the Vector Space Model (VSM) to represent text documents and Tweets, and Term Frequency-Inverse Document Frequency (TF-IDF) term weighting to generate the feature vectors for classification. The proposed approach has been evaluated in several experiments on five datasets with different classifiers: decision trees, Naive Bayes, kNN, logistic regression, perceptron, and multilayer perceptron. The experimental results show the effectiveness of the proposed approach when its classification results are compared with the published work in [1,2,3].
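
The pipeline outlined in the abstract (vector space representation, TF-IDF weighting, then a classifier) can be illustrated with a minimal sketch. The abstract does not give the exact TF-IDF formula, tokenization, or classifier settings, so everything below is an assumption made for illustration: whitespace tokenization, one common smoothed IDF variant, cosine similarity, a kNN vote with k = 3, and a handful of invented English example tweets. All function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. A real pipeline would also
    # handle stop words, URLs, mentions, and Arabic-specific normalization.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency (one common variant, assumed here):
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    # Sparse vector-space representation: term -> tf * idf, scaled to unit
    # length. Terms unseen during fitting are ignored.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, train_labels, idf, k=3):
    # Majority vote among the k training tweets most similar to the query.
    query = tfidf_vector(text, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data, for illustration only (not from the paper's datasets).
train_texts = [
    "great flight and friendly crew",
    "love the service great experience",
    "friendly staff love this airline",
    "terrible delay and rude crew",
    "lost my bag terrible service",
    "rude staff worst airline",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

idf = fit_idf(train_texts)
train_vecs = [tfidf_vector(t, idf) for t in train_texts]

print(knn_predict("great friendly service", train_vecs, train_labels, idf))
print(knn_predict("terrible rude crew", train_vecs, train_labels, idf))
```

Any of the other classifiers named in the abstract could be trained on the same TF-IDF vectors in place of the kNN vote; kNN is used here only because it fits in a few lines without external libraries.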

References

1. Rane, A., Anand, K.: Sentiment classification system of Twitter data for US airline service analysis. In: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), pp. 769–773. IEEE, Tokyo (2018)
2. Nabil, M., Aly, M., Atiya, A.: ASTD: Arabic sentiment tweets dataset. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2515–2519. Association for Computational Linguistics, Lisbon (2015)
3. Baly, R., Badaro, G., El-Khoury, G., Moukalled, R., Aoun, R., Hajj, H., El-Hajj, W., Habash, N., Shaban, K.: A characterization study of Arabic Twitter data with a benchmarking for state-of-the-art opinion mining models. In: Proceedings of the Third Arabic Natural Language Processing Workshop (WANLP), pp. 110–118. Association for Computational Linguistics, Valencia (2017)
4. Heikal, M., Torki, M., El-Makky, N.: Sentiment analysis of Arabic Tweets using deep learning. Procedia Comput. Sci. 142, 114–122 (2018)
5. Goel, A., Gautam, J., Kumar, S.: Real time sentiment analysis of Tweets using Naive Bayes. In: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT). IEEE (2016)
6. Holzinger, A., Stocker, C., Ofner, B., Prohaska, G., Brabenetz, A., Hofmann-Wellenhof, R.: Combining HCI, natural language processing, and knowledge discovery: potential of IBM Content Analytics as an assistive technology in the biomedical field, pp. 13–24. Springer, Heidelberg (2013)
7. Ristoski, P., Paulheim, H.: Semantic web in data mining and knowledge discovery: a comprehensive survey. Web Semant. Sci. Serv. Agents World Wide Web 36, 1–22 (2016)
8. Mukherjee, S., Shaw, R., Haldar, N., Changdar, S.: A survey of data mining applications and techniques. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 6(5), 4663–4666 (2015)
9. Ravi, K., Ravi, V.: A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl. Based Syst. 89, 14–46 (2015)
10. Neethu, M.S., Rajasree, R.: Sentiment analysis in Twitter using machine learning techniques. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1–5. IEEE, Tiruchengode (2013)
11. Jindal, R., Malhotra, R., Jain, A.: Techniques for text classification: literature review and current trends. Webology 12(2), 1–28 (2015)
12. Yang, P., Chen, Y.: A survey on sentiment analysis by using machine learning methods. In: 2017 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), pp. 117–121. IEEE, Chengdu (2017)
13. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86. Association for Computational Linguistics, ACM, Philadelphia (2002)
14. Das, S., Behera, R.K., Rath, S.K.: Real-time sentiment analysis of Twitter streaming data for stock prediction. Procedia Comput. Sci. 132, 956–964 (2018)
15. Kim, S.-B., Han, K.-S., Rim, H.-C., Myaeng, S.H.: Some effective techniques for Naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 18(11), 1457–1466 (2006)
16. Niu, Z., Yin, Z., Kong, X.: Sentiment classification for microblog by machine learning. In: 2012 Fourth International Conference on Computational and Information Sciences, pp. 286–289. IEEE, Chongqing (2012)
17. Barbosa, L., Feng, J.: Robust sentiment detection on Twitter from biased and noisy data. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING 2010, pp. 36–44. ACM, Beijing (2010)
18. Celikyilmaz, A., Hakkani-Tür, D., Feng, J.: Probabilistic model-based sentiment analysis of Twitter messages. In: 2010 IEEE Spoken Language Technology Workshop, pp. 79–84. IEEE, Berkeley (2011)
19. Guellil, I., Adeel, A., Azouaou, F., Hachani, A.E., Hussain, A.: Arabizi sentiment analysis based on transliteration and automatic corpus annotation. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 335–341. Association for Computational Linguistics, Brussels (2018)
20. Bhavitha, B.K., Rodrigues, A.P., Chiplunkar, N.J.: Comparative study of machine learning techniques in sentimental analysis. In: 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 216–221. IEEE, Coimbatore (2017)
21. Dashtipour, K., Poria, S., Hussain, A., Cambria, E., Hawalah, A.Y., Gelbukh, A., Zhou, Q.: Multilingual sentiment analysis: state of the art and independent comparison of techniques. Cogn. Comput. 8(4), 757–771 (2016)
22. Balahur, A., Turchi, M.: Multilingual sentiment analysis using machine translation? In: Proceedings of the 3rd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pp. 52–60. Association for Computational Linguistics, Jeju (2012)
23. Sedding, J., Kazakov, D.: WordNet-based text document clustering. In: Proceedings of the 3rd Workshop on Robust Methods in Analysis of Natural Language Data, pp. 104–113. Association for Computational Linguistics, ACM, Geneva (2004)
24. Peng, X., Choi, B.: Document classifications based on word semantic hierarchies. Artif. Intell. Appl. 5, 362–367 (2005)
25. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993). Special issue on using large corpora: II (Association for Computational Linguistics, MIT Press Cambridge)
26. Maamouri, M., Bies, A., Krouna, S., Gaddeche, F., Bouziri, B.: Arabic tree banking morphological analysis and POS annotation, Ver. 3.8. Linguistic Data Consortium, Univ. Pennsylvania, Pennsylvania (2009)
27. Albared, M., Omar, N., Ab Aziz, M.J., Nazri, M.Z.A.: Automatic part of speech tagging for Arabic: an experiment using Bigram hidden Markov model. In: International Conference on Rough Sets and Knowledge Technology, pp. 361–370. Springer, Heidelberg (2010)
28. Ranks NL: Stopword Lists. ranks.nl (n.d.). https://www.ranks.nl/stopwords. Accessed 19 Jan 2019
29. Wen, C.Y.J.: Text categorization based on a similarity approach. In: Proceedings of International Conference on Intelligent System and Knowledge Engineering, pp. 1–5. Atlantis Press, China (2007)
30. Wu, H.C., Luk, R.W.P., Wong, K.F., Kwok, L.: Interpreting TF-IDF term weights as making relevance decisions. ACM Trans. Inf. Syst. 26(3), 13 (2008)
31. Schütze, H., Manning, C.D., Raghavan, P.: Introduction to Information Retrieval, vol. 39. Cambridge University Press, Cambridge (2008)
32. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
33. ElSahar, H., El-Beltagy, S.R.: Building large Arabic multi-domain resources for sentiment analysis. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing, CICLing 2015. LNCS, pp. 23–34. Springer (2015)
34. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision, p. 12. CS224N Project Report, Stanford (2009)
35. Twitter-Airline-Sentiment Dataset: taken from the standard Kaggle, vol. 2. Kaggle (2017)
36. Purvank: Uber-rider-reviews-dataset, taken from the standard Kaggle. Kaggle (2018)
37. Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45(4), 427–437 (2009)

Author information

Authors and Affiliations

University of Victoria, Victoria, Canada
Mohamed K. Elhadad, Kin Fun Li & Fayez Gebali

Corresponding author

Correspondence to Kin Fun Li.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Department of Advanced Sciences, Hosei University, Koganei-Shi, Tokyo, Japan
Makoto Takizawa
Department of Computer Science, Technical University of Catalonia, Barcelona, Barcelona, Spain
Fatos Xhafa
Faculty of Business Administration, Rissho University, Tokyo, Japan
Tomoya Enokido

Cite this paper

Elhadad, M.K., Li, K.F., Gebali, F. (2019). Sentiment Analysis of Arabic and English Tweets. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2019. Advances in Intelligent Systems and Computing, vol 927. Springer, Cham. https://doi.org/10.1007/978-3-030-15035-8_32

DOI: https://doi.org/10.1007/978-3-030-15035-8_32
Published: 15 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15034-1
Online ISBN: 978-3-030-15035-8
eBook Packages: Intelligent Technologies and Robotics (R0)
