Language resources for Maghrebi Arabic dialects’ NLP: a survey

  • Survey
  • Language Resources and Evaluation

Abstract

Diglossia is one of the main characteristics of the Arabic language. In Arab countries, three forms of Arabic co-exist: Classical Arabic (CA), which is mainly used in the Quran and in several classical literary texts; Modern Standard Arabic (MSA), which descends from CA and is used as the official language; and various regional colloquial varieties of Arabic that are usually referred to as Arabic dialects (AD). Deemed to be among the low-resource languages, these dialects have aroused increased interest in the NLP community in recent years. Indeed, the various Arabic dialects are increasingly used on the social web and may be transcribed in both the Arabic and the Latin script. The latter form is known as Arabizi and seems to be more frequently used for some of them. NLP for AD raises many challenges and requires the availability of large and appropriate language resources. In this study, we focus, in particular, on the Maghrebi Arabic dialects (MADs). We propose a thorough review of the language resources (LRs) that have been generated by the various works carried out on MAD language processing. A survey of the LRs dedicated to MAD NLP that are currently available online is also compiled and discussed. The LRs investigated in this work are essentially data resources such as primary and annotated corpora, lexica, dictionaries, ontologies, etc.

Notes

  1. http://www.elra.info/en/about/what-language-resource/.

  2. LORELEI (Low Resource Language for Emergent Incidents) is a project funded by the US government. Its goal is to develop HLT for low-resource languages, in support of missions related to emergent incidents.

  3. According to https://data.worldbank.org, consulted in May 2019.

  4. Given the vowel systems and the influence of the Semitic substratum in the East and Berber in the West, the various sets of dialects are grouped into two main classes also based on a geographical distinction (Embarki 2008): (1) Oriental Arabic dialects named Mashriqi Arabic, which can be sub-classified into Arabian Peninsula, Mesopotamian, Egyptian and Levantine dialects; (2) Western Arabic dialects named Maghrebi Arabic, which can be sub-classified into Mauritanian, Moroccan, Algerian, Tunisian and Libyan dialects.

  5. This survey examines works published up to May 2019.

  6. According to http://countrymeters.info consulted in May 2019.

  7. Berber represents the languages originally spoken by the native populations of the Maghreb prior to their adoption of Arabic.

  8. In the Dictionary of Meanings “معجم المعاني عربي عربي”: http://www.almaany.com.

  9. The project website: http://resources.camel-lab.com/.

  10. Queries performed on February 14, 2018.

  11. Returns resources in all associated individual languages in addition to any resource in the base macrolanguage.

References

  • Abainia, K. (2019). DZDC12: A new multipurpose parallel Algerian Arabizi–French code-switched corpus. Language Resources and Evaluation. https://doi.org/10.1007/s10579-019-09454-8.

  • Abidi, K., Menacer, M. A., & Smaili, K. (2017). Calyou: A comparable spoken Algerian corpus harvested from YouTube. In Proceedings of the 18th annual conference of the international speech communication association (Interspeech). Stockholm.

  • Abidi, K., & Smaili, K. (2017). An empirical study of the Algerian dialect of Social network. In Proceedings of international conference on natural language, signal and speech processing. Casablanca, Morocco.

  • Abidi, K., & Smaili, K. (2018). An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Adouane, W., & Dobnik, S. (2017). Identification of Languages in Algerian Arabic Multilingual Documents. In Proceedings of the third Arabic natural language processing workshop (pp. 1–8). Valencia, Spain.

  • Adouane, W., Semmar, N., & Johansson, R. (2016a). Romanized berber and romanized arabic automatic language identification using machine learning. In Proceedings of the 3rd workshop on NLP for similar languages, varieties and dialects. Osaka, Japan.

  • Adouane, W., Semmar, N., Johansson, R., & Bobicev, V. (2016b). Automatic detection of arabicized berber and arabic varieties. In Proceedings of the third workshop on NLP for similar languages, varieties and dialects (pp. 63–72). Osaka, Japan.

  • Alhammi, H. A., & Alfard, R. A. (2018). Building a twitter social media network corpus for libyan dialect. International Journal of Computer Electrical Engineering, 10, 1.

  • Ali, A., Dehak, N., Cardinal, P., Khurana, S., Yella, S. H., Glass, J., Bell, P., & Renals, S. (2016). Automatic dialect detection in arabic broadcast speech. In Proceedings of interspeech-2016 (pp. 2934–2938). San Francisco, US.

  • Ali, A., Mubarak, H., & Vogel, S. (2014). Advances in dialectal arabic speech recognition: A study using twitter to improve Egyptian ASR. In Proceedings of the 11th international workshop on spoken language translation (IWSLT 2014). Lake Tahoe, USA.

  • Al-Kabi, M., Al-Ayyoub, M., Alsmadi, I., & Wahsheh, H. (2016). A prototype for a standard arabic sentiment analysis corpus. The International Arab Journal of Information Technology, 13(1), 163–169.

  • Almeman, K., & Lee, M. G. (2012). Towards developing a multi-dialect morphological analyzer for Arabic. In Proceedings of the 4th international conference on Arabic language processing. Rabat, Morocco.

  • Almeman, K., & Lee, M. (2013). Automatic building of arabic multi dialect text corpora by bootstrapping dialect words. In Proceedings of the 1st international conference on communications, signal processing, and their applications. Sharjah, United Arab Emirates.

  • Alsarsour, I., Mohamed, E., Suwaileh, R., & Elsayed, T. (2018). DART: A large dataset of dialectal Arabic tweets. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Al-Shargi, F., Kaplan, A., Eskander, R., Habash, N., & Rambow, O. (2016). Morphologically annotated corpora and morphological analyzers for Moroccan and Sanaani Yemeni Arabic. In Proceedings of the 10th language resources and evaluation conference. Portoroz, Slovenia.

  • Alshutayri, A., & Atwell, E. (2017). Exploring twitter as a source of an arabic dialect corpus. International Journal of Computational Linguistics, 8, 2.

  • Alshutayri, A., & Atwell, E. (2018a). Arabic dialects annotation using an online game. In Proceedings of the 2nd international conference on natural language and speech processing. Algiers, Algeria.

  • Alshutayri, A., & Atwell, E. (2018b). Creating an Arabic dialect text corpus by exploring twitter, facebook, and online newspapers. In Proceedings of the 3rd workshop on open-source Arabic corpora and processing tools. Miyazaki, Japan.

  • Altamimi, M., Alruwaili, O., & Teahan, W. J. (2018). BTAC: A twitter corpus for Arabic dialect identification. In Proceedings of the 6th conference on computer-mediated communication (CMC) and social media corpora (CMC-corpora 2018). Antwerp, Belgium.

  • Amazouz, D., Adda-Decker, M., & Lamel, L. (2017). Addressing code-switching in French/Algerian Arabic speech. In Proceedings of INTERSPEECH 2017. Stockholm, Sweden.

  • Amazouz, D., Adda-Decker, M., & Lamel, L. (2018). The French-Algerian code-switching triggered audio corpus (FACST). In Proceedings of 11th international conference on language resources and evaluation LREC 2018 (pp. 1468–1473). Miyazaki, Japan.

  • Ameur, H., Jamoussi, S., & Ben Hamadou, A. (2016). Exploiting emoticons to generate emotional dictionaries from Facebook pages. Intelligent Decision Technologies, Springer, 2016, 39–49.

  • Aridhi, C., Achour, H., Souissi, E., & Younes, J. (2017). Word-level identification of romanized tunisian dialect. In Proceedings of the 22nd international conference on natural language & information systems (pp. 170–175). Liège, Belgium.

  • Assiri, A., Emam, A., & Aldossari, H. (2015). Arabic sentiment analysis: A survey. IJACSA, 6, 12.

  • Azouaou, F., & Guellil, I. (2017). ALG/FR: A step by step construction of a lexicon between Algerian Dialect and French. In Proceedings of the 31st Pacific Asia conference on language, information and computation, PACLIC 31. Cebu, Philippines.

  • Barkat, M. (1999). Identification of Arabic dialects and experimental determination of distinctive cues. In Proceedings of the 14th international congress of phonetic sciences. San Francisco, US.

  • Barkat, M., Hamdi, R., & Pellegrino, F. (2004). De la caractérisation linguistique à l’identification automatique des dialectes arabes. In Proceedings of the MIDL Workshop. Paris, France.

  • Barkat, M., & Vasilescu, I. (2001). From perceptual designs to linguistic typology and automatic language identification: Overview and perspectives. In Proceedings of Eurospeech, 7th European conference on speech communication and technology. Aalborg, Denmark.

  • Barkat, M., Vasilescu, I., & Pellegrino, F. (2003). Stratégies perceptuelles et identification automatique des langues. Revue Parole, 25, 26.

  • Belgacem, M. (2009). Construction d’un corpus robuste de différents dialectes arabes. Actes des 8emes Rencontres Jeunes Chercheurs en Parole, 33.

  • Ben Moussa, N. K., & Alimi, A. M. (2015). Construction d’un Wordnet standard pour l’Arabe tunisien. In Proceedings of Colloque pour les Étudiants Chercheurs en Traitement Automatique du Langage naturel et ses applications. Sousse, Tunisia.

  • Ben Moussa, N. K., Soussou, H., & Alimi, A. M. (2016). Intelligent Tunisian Arabic morphological analyzer. In Proceedings of the 2016 IEEE/ACS 13th international conference of computer systems and applications (AICCSA). Agadir, Morocco.

  • Ben Moussa, N. K., Soussou, H., & Alimi, A. M. (2014). Building a standardized Wordnet in the ISO LMF for aeb language. In Proceedings of the 7th Global Wordnet Conference (GWC 2014), association for computational linguistics (pp. 71–77). Tartu, Estonia.

  • Ben Moussa, N. K., Soussou, H., & Alimi, A. M. (2015). Tunisian Arabic aebWordnet: Current state and future extensions. In Proceedings of the first international conference on Arabic computational linguistics. Cairo, Egypt.

  • Ben Moussa, N. K., Soussou, H., & Alimi, A. (2019). Tunisian arabic chat alphabet transliteration using probabilistic finite state transducers. The International Arab Journal of Information Technology, 16, 2.

  • Besacier, L., Barnard, E., Karpov, A., & Schultz, T. (2013). Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 56, 85–100.

  • Bezoui, M., Beni Hssane, A., & Elmoutaouakkil, A. (2019). Speech recognition of moroccan dialect using hidden Markov models. In Proceedings of international symposium on machine learning and big data analytics for cybersecurity and privacy (MLBDACP). Leuven, Belgium.

  • Bouamor, H., Habash, N., & Oflazer, K. (2014). A multidialectal parallel corpus of Arabic. In Proceedings of the ninth international conference on language resources and evaluation. Reykjavik, Iceland.

  • Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., Abdulrahim, D., Obeid, O., Khalifa, S., Eryani, F., Erdmann, A., & Oflazer, K. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Bouchlaghem, R., Elkhlifi, A., & Faiz, R. (2014). Tunisian dialect Wordnet creation and enrichment using web resources and other Wordnets. In Proceedings of the EMNLP 2014 Workshop on Arabic natural language processing (pp. 104–113). Doha, Qatar.

  • Bougrine, S., Cherroun, H., & Ziadi, D. (2015). Prosody-based Spoken Algerian Arabic dialect identification. In Proceedings of the international conference on natural language and speech processing. Algiers, Algeria.

  • Bougrine, S., Cherroun, H., Ziadi, D., Lakhdari, A., & Chorana, A. (2016). Toward a rich Arabic speech parallel corpus for algerian sub-dialects. In Proceedings of the 2nd workshop on Arabic corpora and processing tools 2016 theme: Social Media. Portorož, Slovenia.

  • Bougrine, S., Chorana, A., Lakhdari, A., & Cherroun, H. (2017). Toward a web-based speech corpus for Algerian Arabic dialectal varieties. In Proceedings of the 3rd Arabic natural language processing workshop (WANLP) (pp. 138–146). Valencia, Spain.

  • Boujelbane, R., Khemakhem, M. E., Béchet, F., & Belguith, L. H. (2015). De l’arabe standard vers l’arabe dialectal: Projection de corpus et ressources linguistiques en vue du traitement automatique de l’oral dans les médias tunisiens. Revue TAL, 55, 2.

  • Boujelbane, R., Khemekhem, M. E., & Belguith, L. H. (2013b). Mapping rules for building a Tunisian Dialect Lexicon and generating corpora. In Proceedings of the international joint conference on natural language processing. Nagoya, Japan.

  • Boujelbane, R., Khemekhem, M. E., BenAyed, S., & Belguith, L. H. (2013a). Building Bilingual Lexicon to Create Dialect Tunisian Corpora and Adapt Language Model. In Proceedings of the 2nd workshop on hybrid approaches to translation, ACL 2013. Sofia, Bulgaria.

  • Boujelbane, R., Mallek, M., Khemakhem, M. E., & Belguith, L. H. (2014). Fine-grained POS Tagging of Spoken Tunisian Dialect Corpora. In Proceedings of the 19th international conference on application of natural language to information systems (pp. 59–62). Montpellier, France.

  • Boujelbane, R., Zribi, I., Kharroubi, S., & Khemakhem, M. E. (2016). An automatic process for Tunisian Arabic orthography normalization. In Proceedings of the 10th international conference on natural language processing (HrTAL2016). Dubrovnik, Croatia.

  • Callan, J., Hoy, M., Yoo, C., & Zhao, L. (2009). The ClueWeb09 Dataset, 2009. Presentation Nov. 19, 2009 at NIST TREC. Slides online at boston.lti.cs.cmu.edu/classes/11-742/S10-TREC/TREC-Nov19-09.pdf.

  • Cotterell, R., & Callison-Burch, C. (2014). A multi-dialect, multi-genre corpus of informal written Arabic. In Proceedings of the 9th international conference on language resources and evaluation. Reykjavik, Iceland.

  • Cotterell, R., Renduchintala, A., Saphra, N., & Callison-Burch, C. (2014). An Algerian Arabic-French code-switched corpus. In Proceedings of the workshop on free/open-source arabic corpora and corpora processing tools workshop programme (p. 34). Reykjavik, Iceland.

  • Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Dimitrov, M., Dowman, M., et al. (2009). Developing language processing components with GATE Version 5 (a User Guide). Sheffield: The University of Sheffield.

  • Darwish, K., Abdelali, A., Mubarak, H., Samih, Y., & Attia, M. (2018b). Diacritization of Moroccan and Tunisian Arabic dialects: A CRF approach. In Proceedings of the 3rd workshop on open-source Arabic corpora and processing tools. Miyazaki, Japan.

  • Darwish, K., Mubarak, H., Abdelali, A., Eldesouki, M., Samih, Y., Alharbi, R., Attia, M., Magdy, W., & Kallmeyer, L. (2018a). Multi-dialect Arabic POS tagging: A CRF approach. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Diab, M., Habash, N., Rambow, O., Altantawy, M., & Benajiba, Y. (2010). COLABA: Arabic dialect annotation and processing. In Proceedings of the LREC workshop on semitic language processing (pp. 66–74). Malta.

  • Djellab, M., Amrouche, A., Bouridane, A., & Mehallegue, N. (2017). Algerian modern colloquial Arabic speech corpus (AMCASC): Regional accents recognition within complex socio-linguistic environments. Language Resources and Evaluation, 51(3), 613–641.

  • Duong, L. (2017). Natural language processing for resource-poor languages. Ph.D. thesis, the University of Melbourne. Melbourne, Australia.

  • Eldesouki, M., Samih, Y., Abdelali, A., Attia, M., Mubarak, H., Darwish, K., & Kallmeyer, L. (2017). Arabic multi-dialect segmentation: bi-LSTM-CRF vs. SVM. CoRR, abs/1708.05891.

  • El-Haj, M., Kruschwitz, U., & Fox, C. (2014). Creating language resources for under-resourced languages: Methodologies, and experiments with Arabic. Language Resources and Evaluation, 46(3), 549–580.

  • El-Haj, M., Rayson, P., & Aboelezz, M. (2018). Arabic dialect identification in the context of bivalency and code-switching. In Proceedings of the 11th edition of the language resources and evaluation conference (pp. 3622–3627). Miyazaki, Japan.

  • Elimam, A. (2004). Le maghribi, alias ed-darija, langue consensuelle du Maghreb. éd. Dar El Gharb. Alger.

  • Elimam, A. (2009). Du Punique au Maghribi Trajectoires d’une langue sémito-méditerranéenne. Synergies Tunisie no 1, 25–38.

  • Elimam, A. (2012). Le maghribi, vernaculaire majoritaire à l’épreuve de la minoration. Oran: ENSET.

  • Elkateb, S., Black, B., Vossen, P., Farwell, D., Pease, A., & Fellbaum, C. (2006). Arabic WordNet and the challenges of Arabic. In Proceedings of the challenge of Arabic for NLP/MT conference (pp. 15–24). London, UK.

  • Elkhlifi, A., Bouchlaghem, R., & Rhazi, A. (2014). Opinion extraction in Moroccan Dialect Texts. In Proceedings of the 5th international conference on arabic language processing. Oujda, Morocco.

  • Baly, R., El-Khoury, G., Moukalled, R., Aoun, R., Hajj, H., Shaban, K. B., & El-Hajj, W. (2017). Comparative evaluation of sentiment analysis methods across Arabic dialects. In Proceedings of the 3rd international conference on arabic computational linguistics, ACLing 2017. Dubai, United Arab Emirates.

  • El Abdouli, A., Hassouni, L., & Anoun, H. (2019). A distributed approach for mining Moroccan Hashtags using Twitter Platform. In Proceedings of the 2nd international conference on networking, information systems & security. Rabat, Morocco.

  • Elmarakshy, R., & Ismail, M. A. (2015). Compiling a dialectal Arabic lexicon Using Latent Topic models. In Proceedings of the 1st international conference on advanced intelligent system and informatics (AISI2015). Beni Suef, Egypt.

  • Embarki, M. (2008). Les dialectes arabes modernes: état et nouvelles perspectives pour la classification géo-sociologique. Arabica, 5(6), 583–604.

  • Eskander, R., & Habash, N. (2013). Automatic correction and extension of morphological annotations. In Proceedings of the 7th linguistic annotation workshop & interoperability with discourse (pp. 1–10). Sofia, Bulgaria.

  • Fishman, A. J. (1999). Handbook of language and ethnic identity. New York: Oxford University Press.

  • Graff, D., & Maamouri, M. (2012). Developing LMF-XML bilingual dictionaries for colloquial Arabic dialects. In Proceedings of the 8th international conference on language resources and evaluation (pp. 269–274). Istanbul, Turkey.

  • Graja, M., Jaoua, M., & Belguith, L. H. (2010). Lexical study of a spoken dialogue corpus in Tunisian dialect. In Proceedings of the international arab conference on information technology (ACIT’2010). Benghazi, Libya.

  • Graja, M., Jaoua, M., & Belguith, L. H. (2011a). Building ontologies to understand spoken Tunisian dialect. International Journal of Computer Science, Engineering and Applications, 1, 4.

  • Graja, M., Jaoua, M., & Belguith, L. H. (2011b). Towards understanding Spoken Tunisian dialect. In Proceedings of the 18th international conference (ICONIP 2011). Shanghai, China.

  • Graja, M., Jaoua, M., & Belguith, L. H. (2013). Discriminative framework for spoken Tunisian dialect understanding. In Proceedings of the first international conference on statistical language and speech processing (SLSP 2013). Tarragona, Spain.

  • Graja, M., Jaoua, M., & Belguith, L. H. (2015). Statistical framework with knowledge base integration for robust speech understanding of the Tunisian dialect. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(12).

  • Guellil, I., Adeel, A., Azouaou, F., & Hussain, A. (2018b). Sentialg: Automated corpus annotation for Algerian sentiment analysis. In Proceedings of the international conference on brain inspired cognitive systems (pp. 557–567).

  • Guellil, I., Adeel, A., Azouaou, F., Hachani, A. E., & Hussain, A. (2018c). Arabizi sentiment analysis based on transliteration and automatic corpus annotation. In Proceedings of the 9th workshop on computational approaches to subjectivity, sentiment and social media analysis (pp. 335–341). Brussels, Belgium.

  • Guellil, I., & Azouaou, F. (2016a). Arabic Dialect Identification with an Unsupervised Learning (Based on a Lexicon). Application Case: ALGERIAN Dialect. In Proceedings of the 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES) (pp. 724–731).

  • Guellil, I., & Azouaou, F. (2016b). ASDA: Analyseur Syntaxique du Dialecte Algérien dans un but d’analyse sémantique. In Proceedings of Conférence Nationale d’Intelligence Artificielle. Clermont-Ferrand, France.

  • Guellil, I., & Azouaou, F. (2017). Bilingual Lexicon for Algerian Arabic Dialect Treatment in Social Media. In Proceedings of WiNLP: Women & underrepresented minorities in natural language processing (co-located with ACL 2017). Vancouver, Canada.

  • Guellil, I., Azouaou, F., Abbas, M., & Sadat, F. (2017a). Arabizi transliteration of Algerian Arabic dialect into Modern Standard Arabic. In Proceedings of the first workshop on social media and user generated content machine translation (co-located with EAMT 2017). Prague, Czech Republic.

  • Guellil, I., Azouaou, F., Benali, F., Hachani, A. E., & Saadane, H. (2018a). Approche Hybride pour la translitération de l’arabizi algérien: une étude préliminaire. In Proceedings of the 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN). Rennes, France.

  • Guellil, I., Azouaou, F., Saâdane, H., & Semmar, N. (2017b). Une approche fondée sur les lexiques d’analyse de sentiments du dialecte algérien. La revue internationale Traitement Automatique des Langues (TAL) (pp. 41–65).

  • Rahab, H., Zitouni, A., & Djoudi, M. (2017). ARAACOM: ARAbic algerian corpus for opinion mining. In Proceedings of the 3rd international conference of computing for engineering and sciences. Istanbul, Turkey.

  • Habash, N., Diab, M., & Rambow, O. (2012). Conventional orthography for dialectal Arabic. In Proceedings of the 8th international conference on language resources and evaluation. Istanbul, Turkey.

  • Habash, N., Eryani, F., Khalifa, S., Rambow, O., Abdulrahim, D., Erdmann, A., Faraj, R., Zaghouani, W., Bouamor, H., Zalmout, N., Hassan, S., Al shargi, F., Alkhereyf, S., Abdulkareem, B., Eskander, R., Salameh, M., & Saddiki, H. (2018). Unified guidelines and resources for Arabic dialect orthography. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Hamdi, A., Boujelbane, R., Habash, N., & Nasr, A. (2013a). Un Système de Traduction de Verbes entre Arabe Standard et Arabe Dialectal par Analyse Morphologique Profonde. In Proceedings of TALN 2013. Nantes, France.

  • Hamdi, A., Boujelbane, R., Habash, N., & Nasr, A. (2013b). The Effects of factorizing root and pattern mapping in bidirectional Tunisian–standard Arabic machine translation. In Proceedings of MT Summit 2013. Nice, France.

  • Hamdi, A., Gala, N., & Nasr, A. (2014). Automatically building a Tunisian Lexicon for Deverbal Nouns. In Proceedings of the first workshop on applying NLP tools to similar languages, Varieties and Dialects (pp. 95–102). Dublin, Ireland.

  • Hamdi, A., Nasr, A., Habash, N., & Gala, N. (2015). POS-tagging of Tunisian dialect using standard arabic resources and tools. In Proceedings of the second workshop on arabic natural language processing (pp. 59–68). Beijing, China.

  • Harrat, H., Abbas, M., Meftouh, K., & Smaili, K. (2013). Diacritics restoration for Arabic dialect texts. In Proceedings of interspeech-2013. Lyon, France.

  • Harrat, S., Meftouh, K., & Smaili, K. (2017a). Creating Parallel Arabic dialect corpus: Pitfalls to avoid. In Proceedings of the 18th international conference on computational linguistics and intelligent text processing (CICLING). Budapest, Hungary.

  • Harrat, S., Meftouh, K., & Smaïli, K. (2017b). Machine translation for Arabic dialects (survey). Information processing and management.

  • Harrat, S., Meftouh, K., & Smaïli, K. (2017c). Maghrebi Arabic dialect processing: An overview. In Proceedings of the international conference on natural language, signal and speech processing. Casablanca, Morocco.

  • Harrat, S., Meftouh, K., Abbas, M., Hidouci, K. W., & Smaili, K. (2016). An algerian dialect: Study and resources. International Journal of Advanced Computer Science and Applications, 7, 3.

  • Harrat, S., Meftouh, K., Abbas, M., Jamoussi, S., Saad, M., & Smaili, K. (2015). Cross-dialectal Arabic processing. In Proceedings of the 16th international conference on computational linguistics and intelligent text processing. Cairo, Egypt.

  • Harrat, S., Meftouh, K., Abbas, M., & Smaïli, K. (2014). Building resources for algerian arabic dialects. Corpus (sentences), 4000, 2415.

  • Harrell, R. S. (1963). A dictionary of Moroccan Arabic: Moroccan-English. Georgetown University Press.

  • Harrell, R. S., & Bergman, E. M. (2004). A dictionary of Moroccan Arabic: Moroccan-English/English-Moroccan. Georgetown Classics in Arabic Languages and Linguistics series.

  • Hassine, M., Boussaid, L., & Messaoud, H. (2016). Maghrebian dialect recognition based on support vector machines and neural network classifiers. International Journal of Speech Technology, 19(4), 987–995.

  • Hassine, M., Boussaid, L., & Messaoud, H. (2018). Tunisian Dialect Recognition Based on Hybrid Techniques. The International Arab Journal of Information Technology, 15, 1.

  • Iskra, D. J., Siemund, R., Borno, J., Moreno, A., Emam, O., Choukri, K., Gedge, O., Tropf, H., Nogueiras, A., Zitouni, I., Tsopanoglou, A., & Fakotakis, N. (2004). Orientel-telephony databases across northern Africa and the middle east. In Proceedings of the 4th international conference on language resources and evaluation. Lisbon, Portugal.

  • Karoui, J., Graja, M., Boudabous, M. M., & Belguith, L. H. (2013a). Domain ontology construction from a Tunisian spoken dialogue corpus. In Proceedings of the international conference on web and information technologies (ICWIT 2013). Hammamet, Tunisia.

  • Karoui, J., Graja, M., Boudabous, M. M., & Belguith, L. H. (2013b). Semi-automatic domain ontology construction from spoken corpus in Tunisian dialect: Railway request information. International Journal of Recent Contributions from Engineering, Science & IT, 1(1), 35–38.

  • Lachachi, N.-E., & Adla, A. (2015). GMM-Based Maghreb dialect identification system. Journal of Information Processing Systems, 11(1), 22–38.

  • Lachachi, N., & Adla, A. (2016a). Identification Automatique des Dialectes du Maghreb. Revue Maghrébine des Langues (RML10), 85–101.

  • Lachachi, N., & Adla, A. (2016b). Two approaches-based L2-SVMs reduced to MEB problems for dialect identification. International Journal of Computational Vision and Robotics.

  • Lichouri, M., Abbas, M., Freihat, A. A., & Megtouf, D. E. H. (2018). Word-level vs sentence-level language identification: Application to Algerian and arabic dialects. Procedia Computer Science, 142, 246–253.

  • Maamouri, M., Bies, A., Buckwalter, T., & Mekki, W. (2004). The Penn Arabic Treebank: Building a large-scale annotated Arabic Corpus. In Proceedings of NEMLAR conference on Arabic language resources and tools. Cairo, Egypt.

  • Masmoudi, A., Bougares, F., Khmekhem, M. E., Estève, Y., & Belguith, L. H. (2017). Automatic speech recognition system for Tunisian dialect. Language Resources and Evaluation, 52(1), 249–267.

  • Masmoudi, A., Habash, N., Khemakhem, M. E., & Belguith, L. H. (2015). Arabic transliteration of romanized Tunisian dialect text: A preliminary investigation. In Proceedings of the 16th international conference on intelligent text processing and computational linguistics. Cairo, Egypt.

  • Masmoudi, A., Khemakhem, M. E., Estève, Y., Bougares, F., Dabbar, S., & Belguith, L. H. (2014a). Phonétisation automatique du dialecte tunisien. In Proceedings of JEP 2014. Le Mans, France.

  • Masmoudi, A., Khemakhem, M. E., Estève, Y., Belguith, L. H., & Habash, N. (2014b). A corpus and phonetic dictionary for Tunisian Arabic speech recognition. In Proceedings of the 9th edition of the language resources and evaluation conference. Reykjavik, Iceland.

  • Masmoudi, A., Estève, Y., Khmekhem, M. E., Bougares, F., & Belguith, L. H. (2014c). Phonetic Tool for the Tunisian Arabic. In Proceedings of the 4th international workshop on spoken language technologies for under-resourced languages. St. Petersburg, Russia.

  • Mataoui, M., Zelmati, O., & Boumechache, M. (2016). A proposed lexicon-based sentiment analysis approach for the vernacular Algerian Arabic. Research in Computing Science, 110, 55–70.

  • McNeil, K. (2012). Tunisian Arabic Morphological Parser. Ling-420.

  • McNeil, K. (2015). Tunisian Arabic corpus: A written corpus of an “unwritten” language. Vienna: International Symposium on Tunisian and Libyan Arabic Dialects, University of Vienna.

  • McNeil, K., & Faiza, M. (2011). Tunisian Arabic Corpus: Creating a written corpus of an “unwritten” language. In Proceedings of the Workshop on Arabic Corpus Linguistics. Lancaster University, UK.

  • Mdhaffar, S., Bougares, F., Estève, Y., & Belguith, L. H. (2017). Sentiment analysis of Tunisian Dialect: Linguistic Resources and Experiments. In Proceedings of the 3rd Arabic natural language processing workshop (pp. 55–61). Valencia, Spain.

  • Meftouh, K., Bouchemal, N., & Smaïli, K. (2012). A study of a non-resourced language: An Algerian dialect. In Proceedings of the 3rd international workshop on spoken languages technologies for under-resourced languages (pp. 125–132). Cape Town, South Africa.

  • Meftouh, K., Harrat, K., Jamoussi, S., Abbas, M., & Smaili, K. (2015). Machine Translation Experiments on PADIC: A parallel arabic dialect corpus. In Proceedings of the 29th Pacific Asia conference on language, information and computation. Shanghai, China.

  • Meftouh, K., Harrat, S., & Smaïli, K. (2018). PADIC: Extension and new experiments. In Proceedings of the 7th international conference on advanced technologies. Antalya, Turkey.

  • Mekki, A., Zribi, I., Khemakhem, M. E., & Belguith, L. H. (2017). Syntactic Analysis of the Tunisian Arabic. In Proceedings of the international workshop on language processing and knowledge management. Sfax, Tunisia.

  • Mohand, T. (1999). Substrat et convergences: Le berbère et l’arabe nord-africain. Estudios de Dialectología Norteafricana y Andalusí, 4, 99–119.

  • Mourtada, R., & Salem, F. (2014). Citizen engagement and public services in the Arab World: The potential of social media. Arab Social Media Report series, 6th edition.

  • Mrini, K., & Bond, F. (2017). Building the Moroccan Darija Wordnet (MDW) using bilingual resources. In Proceedings of the international conference on natural language, signal and speech processing (ICNLSSP). Casablanca, Morocco.

  • Mubarak, H. (2018). Dial2MSA: A Tweets Corpus for Converting Dialectal Arabic to Modern Standard Arabic. In Proceedings of the 3rd Workshop on Open-Source Arabic Corpora and Processing Tools. Miyazaki, Japan.

  • Mubarak, H., & Darwish, K. (2014). Using Twitter to Collect a Multi-Dialectal Corpus of Arabic. In Proceedings of the EMNLP 2014 workshop on Arabic natural language processing (pp. 1–7). Doha, Qatar.

  • Mzoughi, I. (2015). Intégration des emprunts lexicaux au français en arabe dialectal tunisien. Linguistique: Université de Cergy Pontoise.

  • Neifar, W., Bahou, Y., Graja, M., & Jaoua, M. (2014). Implementation of a symbolic method for the Tunisian Dialect understanding. In Proceedings of the 5th international conference on Arabic language processing (CITALA 2014). Oujda, Morocco.

  • Novotney, S., Schwartz, R., & Khudanpur, S. (2016). Getting more from automatic transcripts for semi-supervised language modeling. Computer Speech & Language, 36, 93–109.

  • Oussous, A., Lahcen, A. A., & Belfkih, S. (2018). Improving sentiment analysis of Moroccan tweets using ensemble learning. In Proceedings of the 3rd international conference on big data, cloud and applications (pp. 91–104). Kenitra, Morocco.

  • Pellegrino, F., & Barkat, M. (1999). Investigating dialectal differences via vowel system modeling: Application to Arabic. In Proceedings of the 14th international congress of phonetic sciences. San Francisco, USA.

  • Pereira, C. (2005). Arabe maghrébin. In Proceedings of Actes du Colloque International Langues d’Europe et de la Méditerranée LEM. Nice, France.

  • Pereira, C. (2011). Arabic in the North African Region. In Stefan Weninger (Ed.), in collaboration with Geoffrey Khan, Michael P. Streck and Janet C. E. Watson, Semitic Languages, 944–959.

  • Rahab, H., Zitouni, A., & Djoudi, M. (2019). SANA: Sentiment analysis on newspapers comments in Algeria. Journal of King Saud University - Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2019.04.012.

  • Rosner, M. (2009). Electronic language resources for Maltese. In B. Comrie, R. Fabri, E. Hume, M. Mifsud, & M. Vanhove (Eds.), Introducing Maltese linguistics. John Benjamins Publishing, 113, 251–276.

  • Saadane, H., Guidere, M., & Fluhr, C. (2013). La reconnaissance automatique des dialectes arabes à l’écrit. In Proceedings of colloque international «Quelle place pour la langue arabe aujourd’hui» (pp. 18–20).

  • Saadane, H., & Habash, N. (2015). A conventional orthography for Algerian Arabic. In Proceedings of the second workshop on Arabic natural language processing (pp. 69–79). Beijing, China.

  • Saadane, H., Nouvel, D., Seffih, H., & Fluhr, C. (2017). Une approche linguistique pour la détection des dialectes arabes. Actes de TALN 2017, 2: Articles courts.

  • Saadane, H., Seffih, H., Fluhr, C., Choukri, K., & Semmar, N. (2018). Automatic identification of Maghreb Dialects using a dictionary-based approach. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Sadat, F., Kazemi, F., & Farzindar, A. (2014a). Automatic identification of Arabic dialects in social media. In Proceedings of the first international workshop on Social media retrieval and analysis (pp. 35–40).

  • Sadat, F., Kazemi, F., & Farzindar, A. (2014b). Automatic identification of Arabic language varieties and dialects in social media. In Proceedings of the second workshop on natural language processing for social media (pp. 22–27). Dublin, Ireland.

  • Sadat, F., Mallek, F., Sellami, R., Boudabous, M. M., & Farzindar, A. (2014c). Collaboratively constructed linguistic resources for language variants and their exploitation in NLP applications: The case of Tunisian Arabic and the social media. In Proceedings of the workshop on lexical and grammatical resources for language processing (p. 102). Dublin, Ireland.

  • Salama, A., Bouamor, H., Mohit, B., & Oflazer, K. (2014). YouDACC: The Youtube dialectal Arabic commentary Corpus. In Proceedings of the 9th International conference on language resources and evaluation (pp. 1246–1251). Reykjavik, Iceland.

  • Salem, F. (2017). Social media and the internet of things towards data-driven policymaking in the Arab world: Potential, limits and concerns. The Arab Social Media Report, 7, 462.

  • Samih, Y., Eldesouki, M., Attia, M., Darwish, K., Abdelali, A., Mubarak, H., & Kallmeyer, L. (2017). Learning from relatives: Unified dialectal Arabic segmentation. In Proceedings of the 21st conference on computational natural language learning (pp. 432–441). Vancouver, Canada.

  • Samih, Y., Maharjan, S., Attia, M., Kallmeyer, L., & Solorio, T. (2016). Multilingual code-switching Identification via LSTM recurrent neural networks. In Proceedings of the second workshop on computational approaches to code switching (pp. 50–59). Austin, USA.

  • Samih, Y., & Maier, W. (2016a). An Arabic-Moroccan Darija Code-Switched Corpus. In Proceedings of the 10th edition of the language resources and evaluation conference. Portorož, Slovenia.

  • Samih, Y., & Maier, W. (2016b). Detecting Code-switching in Moroccan Arabic social media. In Proceedings of SocialNLP @ IJCAI-2016. New York, USA.

  • Sayadi, K., Liwicki, M., Ingold, R., & Bui, M. (2016). Tunisian dialect and modern standard Arabic dataset for sentiment analysis: Tunisian election context. In Proceedings of the 17th international conference on intelligent text processing and Arabic computational linguistics. Konya, Turkey.

  • Sayahi, H. (2014). Diglossia and language contact: Language variation and change in North Africa. Cambridge: Cambridge University Press.

  • Shoufan, A., & Alameri, S. (2015). Natural language processing for dialectical Arabic: A survey. In Proceedings of the second workshop on Arabic natural language processing. Beijing, China.

  • Soumeur, A., Mokdadi, M., Guessoum, A., & Daoud, A. (2018). Sentiment analysis of users on social networks: Overcoming the challenge of the loose usages of the Algerian dialect. Procedia computer science, 142, 26–37.

  • Suwaileh, R., Kutlu, M., Fathima, N., Elsayed, T., & Lease, M. (2016). ArabicWeb16: A new crawl for today’s Arabic web. In Proceedings of the 39th annual international ACM SIGIR conference on research and development in information retrieval: SIGIR’16 (pp. 673–676). Pisa, Italy.

  • Tachicart, R., & Bouzoubaa, K. (2014). A hybrid approach to translate Moroccan Arabic dialect. In Proceedings of the 9th international conference on intelligent systems (SITA’14). Rabat, Morocco.

  • Tachicart, R., Bouzoubaa, K., & Jaafar, H. (2014). Building a Moroccan dialect electronic dictionary (MDED). In Proceedings of the 5th international conference on Arabic language processing (CITALA). Oujda, Morocco.

  • Tachicart, R., Bouzoubaa, K., Lhoussain, A. S., & Jaafar, H. (2017). Automatic identification of Moroccan Colloquial Arabic. In Proceedings of the 6th international conference on Arabic language processing. Fez, Morocco.

  • Takezawa, T., Kikui, G., Mizushima, M., & Sumita, E. (2007). Multilingual spoken language corpus development for communication research. Computational Linguistics and Chinese Language Processing, 12(3), 303–324.

  • Terbeh, N., Maraoui, M., & Zrigui, M. (2018). Arabic dialect identification based on probabilistic-phonetic modeling. Computación y Sistemas, 22(3), 863–870.

  • Torjmen, R., & Haddar, K. (2018a). Morphological analyzer for the Tunisian dialect. In Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) International workshop on temporal, spatial, and spatio-temporal data mining, 11107, 180–187.

  • Torjmen, R., & Haddar, K. (2018b). Construction of morphological grammars for the Tunisian dialect. In Formalizing natural languages with NooJ 2018 and its natural language processing applications, proceedings of the 12th international conference, NooJ 2018. Palermo, Italy.

  • Tratz, S., Briesch, D., Laoudi, J., Voss, C., & Holland, V. M. (2014). Language and dialect identification in social media analysis. In Proceedings of SPIE sensing technology + applications. Baltimore, USA.

  • Turki, H., Adel, I., Daouda, T., & Régragui, N. (2016). A conventional orthography for Maghrebi Arabic. In Proceedings of the 10th edition of the language resources and evaluation conference. Portorož, Slovenia.

  • Versteegh, K. (1997). The Arabic language (p. 277). Columbia: Columbia University Press-Foreign Language Study.

  • Voss, C., Tratz, S., Laoudi, J., & Briesch, D. (2014). Finding romanized Arabic dialect in code-mixed tweets. In Proceedings of the 9th international conference on language resources and evaluation. Reykjavik, Iceland.

  • Witt, A., Heid, U., Sasaki, F., & Sérasset, G. (2009). Multilingual language resources and interoperability. Kluwer Academic Publishers, The Netherlands.

  • Wray, S., & Ali, A. (2015). Crowdsource a little to label a lot: Labeling a speech corpus of dialectal Arabic. In Proceedings of Interspeech-2015. Dresden, Germany.

  • Younes, J., Achour, H., & Souissi, E. (2015). Constructing linguistic resources for the Tunisian dialect using textual user-generated contents on the social web. In Proceedings of the 1st international workshop on natural language processing for informal text (NLPIT 2015) in conjunction with the international conference on web engineering (ICWE 2015). Rotterdam, The Netherlands.

  • Younes, J., & Souissi, E. (2014). A quantitative view of Tunisian dialect electronic writing. In Proceedings of the 5th international conference on Arabic language processing (pp. 63–72). Oujda, Morocco.

  • Younes, J., Souissi, E., & Achour, H. (2016). A hidden Markov model for automatic transliteration of romanized Tunisian Dialect. In Proceedings of the 2nd international conference on Arabic computational linguistics. Konya, Turkey.

  • Younes, J., Souissi, E., Achour, H., & Ferchichi, A. (2018). A sequence-to-sequence based approach for the double transliteration of Tunisian dialect. Procedia Computer Science, 142, 238–245.

  • Zaghouani, W., & Charfi, A. (2018). Arap-Tweet: A large multi-dialect twitter corpus for gender, age and language variety identification. In Proceedings of the 11th edition of the language resources and evaluation conference. Miyazaki, Japan.

  • Zaidan, O. F., & Callison-Burch, C. (2014). Arabic dialect identification. International Journal of Computational Linguistics (IJCL), 40(1), 171–202.

  • Zarra, T., Chiheb, R., Moumen, R., Faizi, R., & ElAfia, A. (2017). Topic and sentiment model applied to the colloquial Arabic: A case study of Maghrebi Arabic. In Proceedings of the 2017 international conference on smart digital environment (pp. 174–181). Rabat, Morocco.

  • Zbib, R., Malchiodi, K., Devlin, J., Stallard, D., Matsoukas, S., Schwartz, R., Makhoul, J., Zaidan, O. F., & Callison-Burch, C. (2012). Machine translation of Arabic dialects. In Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 49–59). Montreal, Canada.

  • Zribi, I., Boujelbane, R., Masmoudi, A., Khemakhem, M. E., Belguith, L. H., & Habash, N. (2014). A conventional orthography for Tunisian Arabic. In Proceedings of the 9th edition of the language resources and evaluation conference. Reykjavik, Iceland.

  • Zribi, I., Khemakhem, M. E., & Belguith, L. H. (2013a). Morphological analysis of Tunisian Dialect. In Proceedings of the international joint conference on natural language processing (IJCNLP 2013). Nagoya, Japan.

  • Zribi, I., Graja, M., Khemakhem, M. E., Jaoua, M., & Belguith, L. H. (2013b). Orthographic transcription for Spoken Tunisian Arabic. In Proceedings of the 14th international conference on intelligent text processing and computational linguistics (pp. 153–163). Samos, Greece.

  • Zribi, I., Kammoun, I., Khemakhem, M. E., Belguith, L. H., & Blache, P. (2016). Sentence boundary detection for transcribed Tunisian Arabic. In Proceedings of the 13th conference on natural language processing (KONVENS 2016). Varanasi, India.

  • Zribi, I., Khemakhem, M. E., Belguith, L. H., & Blache, P. (2015). Spoken Tunisian Arabic Corpus “STAC”: Transcription and annotation. Research in Computing Science, 90, 123.

  • Zribi, I., Khemakhem, M. E., Belguith, L. H., & Blache, P. (2017). Morphological Disambiguation of Tunisian Dialect. Journal of King Saud University, 29, 147–155.

Author information

Corresponding author

Correspondence to Jihene Younes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Younes, J., Souissi, E., Achour, H. et al. Language resources for Maghrebi Arabic dialects’ NLP: a survey. Lang Resources & Evaluation 54, 1079–1142 (2020). https://doi.org/10.1007/s10579-020-09490-9

  • DOI: https://doi.org/10.1007/s10579-020-09490-9
