Arabic Computational Linguistics: Potential, Pitfalls and Challenges

Natural Language Processing in Artificial Intelligence — NLPinAI 2021

Part of the book series: Studies in Computational Intelligence (SCI, volume 999)

Abstract

Arabic computational linguistics, though still relatively new, is gaining pace rapidly. While the development of tools for computational linguistics in many languages has come a very long way, and progress has been achieved in creating tools for Arabic, Arabic computational linguistics is still in need of much attention. It is not obvious that tools developed for, say, English will need only minor modifications before they can be applied to Arabic. Computational tools developed for English rely heavily on the enormous work achieved in English linguistics in general, and in corpus linguistics in particular. If Arabic computational linguistics is to achieve its potential, it needs to mirror the hard work done in other languages. Researchers in Arabic computational linguistics should also fully understand the nature of the data they are working with. The present article is not a review of the field, but rather a discussion of the potential, pitfalls, and challenges of Arabic computational linguistics. We discuss what research in this field can contribute to linguistic and pedagogical research on Arabic. We also discuss issues related to defining what ‘Arabic (language)’ is from a linguistic point of view, the nature of the Arabic script, transcription and transliteration, and, finally, corpus building.

Author information

Correspondence to Elie Wardini.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wardini, E. (2022). Arabic Computational Linguistics: Potential, Pitfalls and Challenges. In: Loukanova, R. (eds) Natural Language Processing in Artificial Intelligence — NLPinAI 2021. Studies in Computational Intelligence, vol 999. Springer, Cham. https://doi.org/10.1007/978-3-030-90138-7_4
