The Role of Transliteration in the Process of Arabizi Translation/Sentiment Analysis

  • Imane Guellil
  • Faical Azouaou
  • Fodil Benali
  • Ala Eddine Hachani
  • Marcelo Mendoza
Part of the Studies in Computational Intelligence book series (SCI, volume 874)


Arabizi is a form of written Arabic that relies on Latin letters, numerals and punctuation rather than Arabic letters. Most work in the literature concentrates on Arabic and neglects Arabizi. For automatic translation and sentiment analysis, some approaches handle Arabizi like any other language, while others add a transliteration phase that converts Arabizi into Arabic script. In this context, the main purpose of this study is to determine whether Arabizi transliteration improves automatic translation and sentiment analysis results. We introduce a rule-based automatic transliteration system and apply it to a collection of messages before proceeding to the machine translation and sentiment analysis tasks. To evaluate the effect of transliteration on these tasks, we also present the construction of a set of lexical resources: a manually constructed parallel corpus between Arabizi and Modern Standard Arabic (MSA), a sentiment lexicon built automatically and revised manually, and an annotated sentiment corpus constructed automatically from the sentiment lexicon. We also apply a set of algorithms and models dedicated to machine translation and sentiment analysis, including a number of shallow and deep classifiers as well as different embedding-based models for feature extraction. The experimental results show a consistent improvement after applying transliteration, reaching a BLEU score of up to 13.06 for automatic translation and an F1-score of up to 92% for sentiment classification. These results indicate that transliteration is a key factor in handling Arabizi.
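To make the idea of a rule-based transliteration phase concrete, the following is a minimal sketch in Python. It is not the system described in this chapter: the mapping table `RULES` and the function `transliterate` are hypothetical names, and the handful of letter and numeral correspondences are common Arabizi conventions chosen only for illustration. The sketch applies greedy longest-match replacement, so that a digraph such as "kh" is tried before the single letter "k". A real system would also have to handle vowels, dialectal variation and ambiguous candidates.

```python
# Toy rule table (an assumption for illustration, NOT the authors' rules).
# Keys are Arabizi substrings, values are Arabic-script replacements.
RULES = {
    "kh": "\u062e",  # khaa
    "ch": "\u0634",  # sheen (common in Maghrebi Arabizi)
    "gh": "\u063a",  # ghayn
    "3": "\u0639",   # ayn
    "7": "\u062d",   # haa
    "9": "\u0642",   # qaf
    "a": "\u0627",   # alif
    "b": "\u0628",
    "d": "\u062f",
    "k": "\u0643",
    "l": "\u0644",
    "m": "\u0645",
    "n": "\u0646",
    "r": "\u0631",
    "s": "\u0633",
    "t": "\u062a",
}

# Longest key length, so multi-character rules are tried first.
MAX_RULE_LEN = max(len(key) for key in RULES)


def transliterate(word: str) -> str:
    """Greedy longest-match transliteration of one Arabizi token."""
    word = word.lower()
    output = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(MAX_RULE_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                output.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: copy the character unchanged.
            output.append(word[i])
            i += 1
    return "".join(output)


if __name__ == "__main__":
    for token in ["slam", "khbr", "7b"]:
        print(token, "->", transliterate(token))
```

Because one Arabizi spelling can correspond to several Arabic words, a deterministic table like this one is only a starting point; producing several candidates and ranking them is one common way to extend it.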


Arabizi translation, Sentiment analysis, Machine translation



Mr. Mendoza acknowledges funding support from the Millennium Institute for Foundational Research on Data and from the project BASAL FB0821. The funder played no role in the design of this study.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Imane Guellil (1)
  • Faical Azouaou (1)
  • Fodil Benali (2)
  • Ala Eddine Hachani (3)
  • Marcelo Mendoza (4)
  1. Laboratoire des Méthodes de Conception des Systèmes, Ecole nationale Supérieure d’Informatique, Oued-Smar, Algiers, Algeria
  2. L’université de Versailles Saint Quentin, Versailles, France
  3. Sarl Services Web et Promotion (SWP), Algiers, Algeria
  4. Universidad Técnica Federico Santa María, Santiago, Chile
