
Deep analysis of an Arabic sentiment classification system based on lexical resource expansion and custom approaches building

International Journal of Speech Technology

Abstract

Sentiment analysis aims to extract emotions from a broad set of data. This paper studies the impact of lexical resource enrichment on Arabic sentiment analysis. Because Arabic lexical resources for sentiment analysis are scarce, we first build new resources using several lexicon construction methods. The first method is manual and consists of extracting sentiment-bearing words from a selected dataset; the second is semi-automatic and is based on translating an English lexicon into Arabic, followed by a manual check. Both methods generate terms in word form. In addition to these resources, the paper enriches an existing resource that contains terms related to four specific domains by creating its lemmatized equivalent. By following these methods, we created lexicons with different morphologies to enrich the existing Arabic resources. These resources are then used to develop a polarity classifier. The paper explains the steps followed to construct the different lexical resources, defines the pre-processing levels, and gives statistics for each lexicon. We then present the classification approaches used to determine the polarity of new data. To analyze the results in depth with respect to the extracted features, we opt for unsupervised and supervised approaches whose internal architecture and process can be clearly inspected. The experiments are based on varying the features and on applying a feature selection approach to keep the most pertinent features and reduce the size of the feature vector. Moreover, we perform an in-depth analysis of the feature vectors and of the nature of the corpus, and we explain the main causes of improvement and degradation in the results. The results of the tests show the relevance of each component of the system.
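To make the idea of a lexicon-driven polarity classifier concrete, the following is a minimal Python sketch of the general unsupervised technique: sum the polarities of the lexicon terms found in a text and take the sign of the total. The toy lexicon, the whitespace tokenization, and the names `TOY_LEXICON`, `polarity`, and `classify` are assumptions made for illustration; they are not the resources, pre-processing levels, or classifier described in the paper.

```python
from typing import Dict, Iterable

# Hypothetical toy lexicon (an assumption, not the paper's resource):
# +1 marks a positive term, -1 a negative term.
TOY_LEXICON: Dict[str, int] = {
    "جميل": 1,   # "beautiful"
    "رائع": 1,   # "wonderful"
    "ممل": -1,   # "boring"
    "سيء": -1,   # "bad"
}


def polarity(tokens: Iterable[str], lexicon: Dict[str, int] = TOY_LEXICON) -> str:
    """Sum the polarity of every token found in the lexicon and return a label."""
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify(text: str) -> str:
    """Classify raw text using naive whitespace tokenization (an assumption)."""
    return polarity(text.split())
```

In this sketch a text with no lexicon match is labeled neutral, which shows in a simplified way why lexicon coverage affects the outcome: terms that are absent from the lexicon contribute nothing to the score.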

Author information

Corresponding author

Correspondence to Ibtissam Touahri.

About this article

Cite this article

Touahri, I., Mazroui, A. Deep analysis of an Arabic sentiment classification system based on lexical resource expansion and custom approaches building. Int J Speech Technol 24, 109–126 (2021). https://doi.org/10.1007/s10772-020-09758-z

  • DOI: https://doi.org/10.1007/s10772-020-09758-z
