A new Quranic Corpus rich in morphosyntactical information

Zeroual, Imad; Lakhouaja, Abdelhak

doi:10.1007/s10772-016-9335-7

Published: 16 February 2016

Volume 19, pages 339–346 (2016)

International Journal of Speech Technology

Abstract

Few annotated Arabic corpora are widely available, which motivated us to contribute to the enrichment of Arabic corpus resources. To that end, we decided to start with correct and carefully selected texts, and the Quranic Arabic text is the best place to begin such an effort. Furthermore, annotated linguistic resources such as a Quranic corpus are important for researchers working in all fields of Arabic natural language processing. To the best of our knowledge, the only available Quranic Arabic corpora are those from the University of Leeds, the University of Jordan and the University of Haifa. Unfortunately, these corpora have several problems and do not contain enough grammatical and syntactical information. To build a new corpus of the Quran, we used a semi-automatic technique, which consists of applying the morphosyntactic analyzer of standard Arabic words “AlKhalil Morpho Sys”, followed by manual treatment. As a result of this work, we have built a new Quranic Corpus rich in morphosyntactical information.

Acknowledgments

We would like to thank the Arabic Language Processing team in Oujda, especially Pr. Mazroui Azzeddine for his useful and relevant remarks. Also, we would like to thank Pr. Boudlal Abderrahim and Belahbib Rachid for their helpful information about the Arabic morphological and syntactical rules.

Author information

Authors and Affiliations

Computer Sciences Laboratory Faculty of Sciences, Mohammed First University, Oujda, Morocco
Imad Zeroual & Abdelhak Lakhouaja

Corresponding author

Correspondence to Imad Zeroual.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed consent

The authors declare that this study does not involve human participation.

Research involving human participants and/or animals

The authors declare that this research does not involve human subjects and/or animals.

About this article

Cite this article

Zeroual, I., Lakhouaja, A. A new Quranic Corpus rich in morphosyntactical information. Int J Speech Technol 19, 339–346 (2016). https://doi.org/10.1007/s10772-016-9335-7

Received: 27 February 2015
Accepted: 17 January 2016
Published: 16 February 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10772-016-9335-7
