
SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

  • Imane Guellil
  • Ahsan Adeel
  • Faical Azouaou
  • Amir Hussain
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10989)

Abstract

Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline that establishes what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for the Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two scripts widely used on Arabic social media: Arabic and Arabizi (Arabic written with Latin characters and digits). The proposed approach automatically constructs a sentiment corpus containing 8000 messages (4000 in Arabic script and 4000 in Arabizi). The achieved F1-score is up to 72% and 78% on the Arabic and Arabizi test sets, respectively. Ongoing work aims to integrate a transliteration process for Arabizi messages to further improve these results.
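The abstract describes labelling messages automatically from a sentiment lexicon instead of annotating them by hand. The sketch below shows one common form of lexicon-based annotation; it is not the SentiALG procedure, whose exact scoring rule is not given in this abstract. It assumes whitespace tokenization, polarity scores of +1 or -1 per lexicon entry, a simple sum of the scores of matched tokens, and that messages scoring 0 are left out of the corpus. The lexicon `TOY_LEXICON` and the helpers `score_message`, `annotate` and `build_corpus` are hypothetical names introduced for illustration, and the lexicon entries are illustrative examples rather than entries from the authors' lexicon.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (illustrative only, NOT the lexicon built in the paper).
# Each token maps to a polarity score: +1 positive, -1 negative.
# Both scripts discussed in the paper (Arabizi and Arabic) can share one lexicon.
TOY_LEXICON: Dict[str, int] = {
    "mlih": 1,     # Arabizi, "good"
    "zwin": 1,     # Arabizi, "beautiful"
    "khayeb": -1,  # Arabizi, "bad"
    "مليح": 1,     # Arabic script, "good"
    "خايب": -1,    # Arabic script, "bad"
}


def score_message(message: str, lexicon: Dict[str, int]) -> int:
    """Sum the polarity of every whitespace-separated token found in the lexicon."""
    # lower() normalizes Arabizi case; it leaves Arabic script unchanged.
    return sum(lexicon.get(token, 0) for token in message.lower().split())


def annotate(message: str, lexicon: Dict[str, int]) -> Optional[str]:
    """Return 'positive' or 'negative', or None when the summed score is 0."""
    score = score_message(message, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no lexicon word, or positive and negative words cancel out


def build_corpus(messages: List[str], lexicon: Dict[str, int]) -> List[Tuple[str, str]]:
    """Keep only the messages the lexicon can label, paired with their label."""
    corpus: List[Tuple[str, str]] = []
    for message in messages:
        label = annotate(message, lexicon)
        if label is not None:
            corpus.append((message, label))
    return corpus
```

For instance, `build_corpus(["film mlih bezzaf", "khayeb", "rani hna"], TOY_LEXICON)` keeps the first message as positive and the second as negative, and discards the third because it contains no lexicon word. Dropping zero-score messages trades coverage for label precision, which suits a two-class corpus where every retained message must carry a clear polarity.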

Keywords

Arabic sentiment analysis; Algerian dialect; Sentiment lexicon; Sentiment corpus; Sentiment classification

Notes

Acknowledgment

Amir Hussain and Ahsan Adeel were supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant No. EP/M026981/1.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Imane Guellil (1, 2), corresponding author
  • Ahsan Adeel (3)
  • Faical Azouaou (2)
  • Amir Hussain (3)
  1. Ecole Superieure des Sciences Appliquées d’Alger (ESSA-alger), Alger, Algeria
  2. Laboratoire des Méthodes de Conception des Systèmes (LMCS), Ecole Nationale Supérieure d’Informatique, Oued-Smar, Algeria
  3. Institute of Computing Science and Mathematics, School of Natural Sciences, University of Stirling, Stirling, UK
