In this paper, we developed a model for the automatic extraction of synonyms, which is used to construct our Quranic Arabic WordNet (QAWN) from traditional Arabic dictionaries. We rely on three resources. First, the Boundary Annotated Quran Corpus, which contains the Quran words together with their part-of-speech tags, roots and other related information. Second, lexicon resources, which were used to collect a set of derived words for Quranic words. Third, traditional Arabic dictionaries, which were used to extract the meanings of words, distinguishing their different senses. The objective of this work is to link Quranic words of similar meaning in order to generate synonym sets (synsets). To accomplish this, we represented the words in a vector space model weighted by term frequency and inverse document frequency (TF-IDF), and then computed cosine similarities between Quranic words based on the textual definitions extracted from the traditional Arabic dictionaries. Words of highest similarity were grouped together to form a synset. Our QAWN consists of 6918 synsets that were constructed from about 8400 unique word senses, with an average of 5 senses per word. In our experimental evaluation, the average recall of the baseline system was 7.01 %, whereas the average recall with QAWN was 34.13 %, an improvement in the recall of semantic search for Quran concepts of about 27 percentage points.
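The similarity step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the English toy glosses, the function names (`tfidf_vectors`, `cosine`, `build_synsets`), the whitespace tokenizer, the similarity threshold of 0.2, and the rule of linking each word to its single most similar neighbour are all assumptions made for illustration. The abstract states only that TF-IDF vectors and cosine similarity over dictionary definitions were used and that the most similar words were grouped.

```python
import math
from collections import Counter


def tfidf_vectors(definitions):
    """Build a sparse TF-IDF vector for each word from its gloss tokens."""
    # Assumption: naive lowercase whitespace tokenization stands in for
    # whatever Arabic preprocessing the real system applies.
    tokenized = {w: text.lower().split() for w, text in definitions.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for word, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[word] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_synsets(definitions, threshold=0.2):
    """Link each word to its most similar neighbour and return the groups.

    Assumption: the threshold and the connected-components grouping are
    illustrative choices, not taken from the paper.
    """
    vectors = tfidf_vectors(definitions)
    words = list(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for w in words:
        best, best_sim = None, threshold
        for other in words:
            if other == w:
                continue
            sim = cosine(vectors[w], vectors[other])
            if sim > best_sim:
                best, best_sim = other, sim
        if best is not None:
            parent[find(w)] = find(best)

    groups = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


# Hypothetical toy glosses, in English for readability.
toy_definitions = {
    "fear": "feeling of dread and alarm",
    "dread": "feeling of alarm and terror",
    "garden": "land planted with trees and fruit",
    "orchard": "land planted with fruit trees",
}

if __name__ == "__main__":
    for synset in build_synsets(toy_definitions):
        print(sorted(synset))
```

In this sketch, words whose glosses share rare terms end up in the same group, while a term that occurs in most glosses (such as "and" in the toy data) receives a low IDF weight and contributes little to the similarity.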
Cite this article
AlMaayah, M., Sawalha, M. & Abushariah, M.A.M. Towards an automatic extraction of synonyms for Quranic Arabic WordNet. Int J Speech Technol 19, 177–189 (2016). https://doi.org/10.1007/s10772-015-9301-9