Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

The importance of building sentiment analysis tools for Arabic social media has been recognized over the past couple of years, especially with the rapid increase in the number of Arabic social media users. One of the main difficulties in tackling this problem is that text within social media is mostly colloquial, with many dialects in use across platforms. In this paper, we present a set of features that were integrated into a machine-learning-based sentiment analysis model and applied to Egyptian, Saudi, Levantine, and MSA Arabic social media datasets. Many of the proposed features were derived using an Arabic sentiment lexicon. The model also uses emoticon-based features, as well as features of the input text itself, such as the number of segments within the text, the length of the text, and whether the text ends with a question mark. We show that the presented features increased accuracy on six of the seven benchmark datasets we experimented with. Since the developed model outperforms all existing Arabic sentiment analysis systems that have publicly available datasets, we can state that this model represents the state of the art in Arabic sentiment analysis.
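
The feature families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny term and emoticon sets, the function name `extract_features`, the feature names, and the definition of a segment as a run of text between punctuation marks are all assumptions made for illustration.

```python
import re

# Hypothetical toy resources for illustration only; they are NOT the lexicon
# or emoticon lists used in the paper.
POSITIVE_TERMS = {"رائع", "جميل"}
NEGATIVE_TERMS = {"سيء", "فاشل"}
POSITIVE_EMOTICONS = (":)", ":D", "<3")
NEGATIVE_EMOTICONS = (":(", ":'(")

# Assumption: a segment is a run of text between punctuation marks
# (Latin or Arabic) or line breaks.
SEGMENT_BREAK = re.compile(r"[.!?؟،,;؛\n]+")


def extract_features(text):
    """Return lexicon, emoticon, and text-shape features for one post."""
    # Word tokens without attached punctuation, used for lexicon lookups.
    words = re.findall(r"\w+", text)
    # Non-empty pieces of text between segment-breaking punctuation.
    segments = [s for s in SEGMENT_BREAK.split(text) if s.strip()]
    return {
        "positive_terms": sum(w in POSITIVE_TERMS for w in words),
        "negative_terms": sum(w in NEGATIVE_TERMS for w in words),
        "positive_emoticons": sum(text.count(e) for e in POSITIVE_EMOTICONS),
        "negative_emoticons": sum(text.count(e) for e in NEGATIVE_EMOTICONS),
        "num_segments": len(segments),
        # Text length measured in whitespace-separated tokens.
        "num_tokens": len(text.split()),
        # Accept both the Latin and the Arabic question mark.
        "ends_with_question": text.rstrip().endswith(("?", "؟")),
    }
```

A feature dictionary of this kind would typically be combined with bag-of-words features and passed to a supervised classifier; which classifier and which weighting the paper uses is not stated in the abstract.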

Notes

  1. The system was the best performer in SemEval 2013 and SemEval 2014 with respect to the message level polarity detection task [9, 10].

  2. The version provided to us by the authors had 4820 tweets.

References

  1. Abdulla, N.A., Ahmed, N.A., Shehab, M.A., Al-Ayyoub, M.: Arabic sentiment analysis: lexicon-based and corpus-based. In: 2013 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), pp. 1–6. IEEE, Amman (2013)

  2. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE: rapid prototyping for complex data mining tasks. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 935–940 (2006)

  3. Shoukry, A., Rafea, A.: Preprocessing Egyptian dialect tweets for sentiment mining. In: Proceedings of 4th Workshop on Computational Approaches to Arabic Script-Based Languages, San Diego, California, USA, pp. 47–56 (2012)

  4. Witten, I.H., Frank, E., Trigg, L., Hall, M., Holmes, G., Cunningham, S.J.: Weka: practical machine learning tools and techniques with Java implementations. In: Seminar, vol. 99, pp. 192–196 (1999)

  5. El-Beltagy, S.R., Rafea, A.: An accuracy enhanced light stemmer for Arabic text. ACM Trans. Speech Lang. Process. 7, 2–23 (2011)

  6. Salamah, J.B., Elkhlifi, A.: Microblogging opinion mining approach for Kuwaiti dialect. In: International Conference on Computing Technology and Information Management (ICCTIM 2014), pp. 388–396 (2014)

  7. Duwairi, R., Marji, R., Sha’ban, N., Rushaidat, S.: Sentiment analysis in Arabic tweets. In: 2014 5th International Conference Information and Communication Systems (ICICS), pp. 1–6. IEEE, Irbid (2014)

  8. Salameh, M., Mohammad, S., Kiritchenko, S.: Sentiment after translation: a case-study on Arabic social media posts. In: Proceedings of 2015 Conference of the North American Chapter of Association for Computational Linguistics: Human Language Technologies, pp. 767–777. Association for Computational Linguistics, Denver (2015)

  9. Mohammad, S.M., Kiritchenko, S., Zhu, X.: NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: Proceedings of 7th International Workshop on Semantic Evaluation Exercises (SemEval-2013), Atlanta, Georgia, USA (2013)

  10. Kiritchenko, S., Zhu, X., Mohammad, S.: Sentiment analysis of short informal texts. J. Artif. Intell. Res. 50, 723–762 (2014)

  11. Mourad, A., Darwish, K.: Subjectivity and sentiment analysis of modern standard Arabic and Arabic microblogs. In: Proceedings of 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 55–64 (2013)

  12. Refaee, E., Rieser, V.: Subjectivity and sentiment analysis of Arabic Twitter feeds with limited resources. In: Proceedings of Workshop on Free/Open-Source Arabic corpora and corpora processing tools, Reykjavik, Iceland, pp. 16–21 (2014)

  13. Shoukry, A., Rafea, A.: A hybrid approach for sentiment classification of Egyptian dialect tweets. In: 1st International Conference on Arabic Computational Linguistics (ACLing), Cairo, Egypt, pp. 78–85 (2015)

  14. Khalil, T., Halaby, A., Hammad, M.H., El-Beltagy, S.R.: Which configuration works best? An experimental study on supervised Arabic Twitter sentiment analysis. In: Proceedings of 1st Conference on Arabic Computational Linguistics (ACLing 2015), Co-located with CICLing 2015, Cairo, Egypt, pp. 86–93 (2015)

  15. Rennie, J.D.M., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of Naive Bayes text classifiers. In: Proceedings of 20th International Conference on Machine Learning (ICML 2003), USA, vol. 20, pp. 616–623 (2003)

  16. Frank, E., Hall, M., Holmes, G., Kirkby, R., Pfahringer, B., Witten, I.H., Trigg, L.: WEKA: a machine learning workbench for data mining. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 1305–1314. Springer, Boston (2005). https://doi.org/10.1007/978-0-387-09823-4_66

  17. ElSahar, H., El-Beltagy, S.R.: Building large Arabic multi-domain resources for sentiment analysis. In: Gelbukh, A. (ed.) CICLing 2015. LNCS, vol. 9042, pp. 23–34. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18117-2_2

  18. El-Beltagy, S.R.: NileULex: a phrase and word level sentiment lexicon for Egyptian and modern standard Arabic. In: Proceedings of LREC 2016, Portorož, Slovenia (2016, to appear)

  19. El-Beltagy, S.R., Ali, A.: Open issues in the sentiment analysis of Arabic social media: a case study. In: Proceedings of 9th International Conference on Innovations and Information Technology (IIT 2013), Al Ain, UAE (2013)

  20. Zayed, O., El-Beltagy, S.R.: Named entity recognition of persons’ names in Arabic tweets. In: Proceedings of Recent Advances in Natural Language Processing (RANLP 2015), Hissar, Bulgaria (2015)

  21. El-Beltagy, S.R., Rafea, A.: LemaLight: a dictionary based Arabic lemmatizer and stemmer (2016)

  22. El-Beltagy, S.R., Rafea, A.: A corpus based approach for the automatic creation of Arabic broken plural dictionaries. In: Gelbukh, A. (ed.) CICLing 2013. LNCS, vol. 7816, pp. 89–97. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37247-6_8

  23. Refaee, E., Rieser, V.: An Arabic Twitter Corpus for subjectivity and sentiment analysis. In: Proceedings of 9th Edition of Language Resources and Evaluation Conference (LREC 2014), Iceland (2014)

  24. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: AAAI/ICML-1998 Workshop on Learning for Text Categorization, pp. 41–48 (1998)

  25. Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–39 (2011)

  26. El-Beltagy, S.R.: NileTMRG: deriving prior polarities for Arabic sentiment terms. In: Proceedings of SemEval 2016, San Diego, California (2014, submitted)

Acknowledgements

The authors would like to thank Amira Shoukry, Dr. Ahmed Rafea, and Dr. Kareem Darwish for kindly sharing their datasets.

Author information

Corresponding author

Correspondence to Samhaa R. El-Beltagy.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

El-Beltagy, S.R., Khalil, T., Halaby, A., Hammad, M. (2018). Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_24

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
