Towards an automatic extraction of synonyms for Quranic Arabic WordNet

International Journal of Speech Technology

Abstract

In this paper, we develop an automatic synonym extraction model, which we use to construct our Quranic Arabic WordNet (QAWN) from traditional Arabic dictionaries. The work relies on three resources: first, the Boundary Annotated Quran Corpus, which contains Quran words together with their part of speech, root and other related information; second, lexicon resources, which were used to collect a set of derived words for Quranic words; and third, traditional Arabic dictionaries, which were used to extract the meanings of words while distinguishing their different senses. The objective is to link Quranic words of similar meaning in order to generate synonym sets (synsets). To accomplish this, we used term frequency and inverse document frequency (TF-IDF) weighting in a vector space model, and then computed cosine similarities between Quranic words based on the textual definitions extracted from the traditional Arabic dictionaries. The words with the highest similarity were grouped together to form a synset. QAWN consists of 6918 synsets constructed from about 8400 unique word senses, with an average of 5 senses per word. In our experimental evaluation, the average recall of the baseline system was 7.01 %, whereas the average recall with QAWN was 34.13 %, an improvement of about 27 percentage points in the recall of semantic search for Quran concepts.
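The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, a plain tf × log(N/df) weighting, and a nearest-neighbour pairing rule; the abstract does not specify the exact weighting or grouping criterion, so these are illustrative choices rather than the authors' implementation. The English glosses and the names `definitions`, `tfidf_vectors`, `cosine` and `nearest_neighbour` are hypothetical stand-ins for the Arabic dictionary definitions.

```python
import math
from collections import Counter

# Hypothetical toy glosses standing in for dictionary definitions.
definitions = {
    "came": "to arrive at a place after moving toward it",
    "arrived": "to arrive at a place or reach a destination",
    "feared": "to feel dread or be afraid of harm",
    "dreaded": "to feel great dread or be afraid of something",
}

def tfidf_vectors(docs):
    """Map each headword to a sparse {term: tf-idf weight} vector."""
    # Assumption: whitespace tokenization; real Arabic text would need
    # morphological processing first.
    tokenized = {word: text.lower().split() for word, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many definitions each term occurs.
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for word, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[word] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbour(word, vectors):
    """Return the other headword whose definition is most similar."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

vectors = tfidf_vectors(definitions)
for word in vectors:
    print(word, "->", nearest_neighbour(word, vectors))
```

On this toy data, each headword is paired with the one whose gloss shares the most distinctive terms, while a term that occurs in every definition (here "to") receives zero weight and does not contribute to the similarity.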


Notes

  1. http://www.almaany.com/.

  2. https://ar.wikipedia.org/wiki/%D8%AC%D8%B2%D8%A1_%D8%B9%D9%85


Author information

Corresponding author

Correspondence to Majdi Sawalha.


Cite this article

AlMaayah, M., Sawalha, M. & Abushariah, M.A.M. Towards an automatic extraction of synonyms for Quranic Arabic WordNet. Int J Speech Technol 19, 177–189 (2016). https://doi.org/10.1007/s10772-015-9301-9


  • DOI: https://doi.org/10.1007/s10772-015-9301-9
