The Role of Transliteration in the Process of Arabizi Translation/Sentiment Analysis

Recent Advances in NLP: The Case of Arabic Language

Part of the book series: Studies in Computational Intelligence (SCI, volume 874)

Abstract

Arabizi is a form of written Arabic that relies on Latin letters, numerals and punctuation rather than Arabic letters. Most work in the literature concentrates on Arabic written in Arabic script and neglects Arabizi. For automatic translation and sentiment analysis, some approaches handle Arabizi like any other language, while others add a transliteration phase that converts Arabizi into Arabic script. In this context, the main purpose of this study is to determine whether transliterating Arabizi improves automatic translation and sentiment analysis results. We introduce a rule-based automatic transliteration system and apply it to a collection of messages before the machine translation and sentiment analysis tasks. To evaluate the effect of transliteration on these tasks, we also present the construction of a set of lexical resources: a manually constructed parallel corpus between Arabizi and Modern Standard Arabic (MSA), a sentiment lexicon built automatically and revised manually, and an annotated sentiment corpus constructed automatically from the sentiment lexicon. We then apply a set of algorithms and models dedicated to machine translation and sentiment analysis, including several shallow and deep classifiers as well as different embedding-based models for feature extraction. The experimental results show a consistent improvement after transliteration, with a BLEU score of up to 13.06 for automatic translation and an F1-score of up to 92% for sentiment classification. These results support the conclusion that transliteration is a key factor in handling Arabizi.
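To make the idea of rule-based transliteration concrete, here is a minimal Python sketch. It is not the system described in this chapter: the rule tables `DIGRAPHS` and `SINGLES` and the function `transliterate` are hypothetical, and the mappings are a small, simplified subset of common Arabic chat alphabet conventions, assumed here purely for illustration.

```python
# Hypothetical, heavily simplified rule tables. These are NOT the rules of
# the chapter's system; they only illustrate greedy, rule-based mapping.
DIGRAPHS = {"kh": "خ", "gh": "غ", "sh": "ش", "ch": "ش", "th": "ث"}
SINGLES = {
    "2": "ء", "3": "ع", "5": "خ", "7": "ح",
    "a": "ا", "b": "ب", "t": "ت", "j": "ج", "d": "د", "r": "ر",
    "z": "ز", "s": "س", "f": "ف", "q": "ق", "k": "ك", "l": "ل",
    "m": "م", "n": "ن", "h": "ه", "w": "و", "y": "ي",
    "i": "ي", "o": "و", "u": "و",
}


def transliterate(text: str) -> str:
    """Map Arabizi to Arabic script, trying two-character rules first."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # A digraph rule consumes two Latin characters at once.
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        # Characters without a rule (spaces, punctuation, unmapped
        # letters) are copied through unchanged.
        out.append(SINGLES.get(text[i], text[i]))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("3lach"))
```

With these assumed rules, `3lach` becomes علاش. The sketch writes every Latin vowel as a long vowel and cannot tell a digraph such as `th` from the two separate letters, which illustrates the kind of ambiguity a practical rule set would have to resolve.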


Notes

  1. Code switching: the presence of different languages and dialects in the same message.

  2. https://en.wikipedia.org/wiki/Arabic_chat_alphabet.

  3. http://restfb.com/.

  4. http://twitter4j.org/en/index.html.

  5. https://www.facebook.com/MustafaHosny/.

  6. ooredooqatar.

  7. https://www.facebook.com/EnnaharTv/?_ga=2.64384426.442009333.1538003328-860660030.1538003328.

  8. https://www.socialbakers.com/statistics/facebook/pages/total/algeria/.

  9. https://glosbe.com/en/arq/excellent.

  10. https://github.com/fbougares/TSAC.

  11. https://radimrehurek.com/gensim/models/word2vec.html.

  12. https://radimrehurek.com/gensim/models/doc2vec.html.

  13. https://keras.io/layers/embeddings/.

Acknowledgements

Mr. Mendoza acknowledges funding support from the Millennium Institute for Foundational Research on Data and from the project BASAL FB0821. The funder played no role in the design of this study.

Author information

Corresponding author

Correspondence to Imane Guellil.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Guellil, I., Azouaou, F., Benali, F., Hachani, A.E., Mendoza, M. (2020). The Role of Transliteration in the Process of Arabizi Translation/Sentiment Analysis. In: Abd Elaziz, M., Al-qaness, M., Ewees, A., Dahou, A. (eds) Recent Advances in NLP: The Case of Arabic Language. Studies in Computational Intelligence, vol 874. Springer, Cham. https://doi.org/10.1007/978-3-030-34614-0_6
