Abstract
Arabic computational linguistics, though still relatively new, is gaining pace rapidly. While the development of tools for computational linguistics in many languages has come a very long way, and progress has been made in creating tools for Arabic, Arabic computational linguistics is still in need of much attention. It is not obvious that tools developed for, say, English will need only minor modifications before they can be applied to Arabic. Computational tools developed for English rely heavily on the enormous body of work in English linguistics in general, and in corpus linguistics in particular. If Arabic computational linguistics is to achieve its potential, it needs to mirror the hard work done for other languages. Researchers in Arabic computational linguistics should also fully understand the nature of the data they are working with. The present article is not a review of the field, but rather a discussion of the potential, pitfalls, and challenges of Arabic computational linguistics. We will discuss what research in this field can contribute to linguistic and pedagogical research on Arabic. We will also discuss issues related to defining what ‘Arabic (language)’ is from a linguistic point of view, the nature of the Arabic script, transcription and transliteration, and, finally, corpus building.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wardini, E. (2022). Arabic Computational Linguistics: Potential, Pitfalls and Challenges. In: Loukanova, R. (eds) Natural Language Processing in Artificial Intelligence — NLPinAI 2021. Studies in Computational Intelligence, vol 999. Springer, Cham. https://doi.org/10.1007/978-3-030-90138-7_4
DOI: https://doi.org/10.1007/978-3-030-90138-7_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90137-0
Online ISBN: 978-3-030-90138-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)