Abstract
The importance of building sentiment analysis tools for Arabic social media has been recognized over the past couple of years, especially given the rapid increase in the number of Arabic social media users. One of the main difficulties in tackling this problem is that social media text is mostly colloquial, with many dialects in use across platforms. In this paper, we present a set of features that were integrated with a machine learning based sentiment analysis model and applied to Egyptian, Saudi, Levantine, and MSA Arabic social media datasets. Many of the proposed features were derived using an Arabic sentiment lexicon. The model also uses emoticon-based features, as well as features of the input text itself, such as the number of segments within the text, the length of the text, and whether the text ends with a question mark. We show that the presented features increased accuracy on six of the seven benchmark datasets we experimented with. Since the developed model outperforms all existing Arabic sentiment analysis systems that have publicly available datasets, we can state that this model represents the state of the art in Arabic sentiment analysis.
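The kinds of features listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `extract_features`, the toy lexicon, the emoticon sets, and the segment delimiters are all assumptions made for illustration, and the paper's actual lexicon and feature definitions are not reproduced here.

```python
import re

# Hypothetical toy lexicon (term -> polarity); stands in for a real Arabic sentiment lexicon.
TOY_LEXICON = {"جميل": 1, "رائع": 1, "سيء": -1}

# Assumed emoticon sets, chosen only for illustration.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}

# Assumed segment delimiters: Latin and Arabic punctuation plus newlines.
SEGMENT_DELIMITERS = r"[.!?؟،,;؛\n]+"


def extract_features(text, lexicon=TOY_LEXICON):
    """Return a dict of lexicon, emoticon, and text-level features for one post."""
    tokens = text.split()
    # A segment is any non-empty span between delimiters.
    segments = [s for s in re.split(SEGMENT_DELIMITERS, text) if s.strip()]
    return {
        # Lexicon-based counts (exact token match only; no stemming or phrase matching).
        "num_positive_terms": sum(1 for t in tokens if lexicon.get(t, 0) > 0),
        "num_negative_terms": sum(1 for t in tokens if lexicon.get(t, 0) < 0),
        # Emoticon-based flags.
        "has_positive_emoticon": any(e in text for e in POSITIVE_EMOTICONS),
        "has_negative_emoticon": any(e in text for e in NEGATIVE_EMOTICONS),
        # Text-level features named in the abstract.
        "num_segments": len(segments),
        "length_in_tokens": len(tokens),
        "ends_with_question_mark": text.rstrip().endswith(("?", "؟")),
    }


if __name__ == "__main__":
    print(extract_features("سيء جدا، لماذا؟"))
```

A dictionary like this could be turned into a feature vector and passed to any supervised classifier. Which classifier and which feature encoding to use are left open here.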
Acknowledgements
The authors would like to thank Amira Shoukry, Dr. Ahmed Rafea, and Dr. Kareem Darwish for kindly sharing their datasets.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
El-Beltagy, S.R., Khalil, T., Halaby, A., Hammad, M. (2018). Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_24
DOI: https://doi.org/10.1007/978-3-319-75487-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science (R0)