Semantic indexing of Arabic texts for information retrieval system

  • Special Issue Article
International Journal of Speech Technology

Abstract

In the context of information retrieval systems (IRS) that use ontologies to index documents and queries, this paper proposes and evaluates the contribution of such an approach applied to Arabic texts. To this end, we indexed a corpus of Arabic texts using Arabic WordNet, disambiguating words with the Lesk algorithm. The results of our experiment allowed us to deduce the contribution of this approach to IRS for Arabic texts.
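The disambiguation step can be illustrated with a minimal sketch of the simplified Lesk idea: choose the sense whose gloss shares the most words with the context of the ambiguous word. This is not the authors' implementation. The function name, the sense identifiers, and the toy English glosses are all hypothetical stand-ins; a real system would read Arabic glosses from Arabic WordNet.

```python
def simplified_lesk(word, context, sense_inventory, stopwords=frozenset()):
    """Return the sense id whose gloss overlaps most with the context.

    sense_inventory maps a word to {sense_id: gloss}; it is a toy,
    hypothetical stand-in for a lexical resource such as Arabic WordNet.
    Returns None when the word has no known senses.
    """
    target = word.lower()
    # Context words, lower-cased, without stopwords and the target itself.
    context_words = {
        w.lower() for w in context
        if w.lower() not in stopwords and w.lower() != target
    }
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sense_inventory.get(word, {}).items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context_words & gloss_words)
        # Keep the first sense with the strictly largest overlap.
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Toy inventory (assumed data, for illustration only).
INVENTORY = {
    "bank": {
        "bank.finance": "institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or lake",
    }
}

sense_a = simplified_lesk("bank", "he deposited money at the bank".split(), INVENTORY)
sense_b = simplified_lesk("bank", "they fished from the bank of the river".split(), INVENTORY)
```

In this toy run, `sense_a` is selected through the shared word "money" and `sense_b` through "river". Ties are resolved in favour of the first sense listed, which is a design choice of this sketch rather than something stated in the source.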

Notes

  1. CF: concept frequency; IDF: inverse document frequency.

  2. A graph whose nodes are the indexing concepts and whose arcs represent latent contextual relationships between concepts.

  3. http://muchmore.dfki.de/.

  4. Medical Subject Headings (MeSH).

  5. http://www.nlm.nih.gov/research/umls/.

  6. http://www.clefcampaign.org/.

  7. http://www.nlm.nih.gov/mesh/.

  8. http://trec.nist.gov/data/t9_filtering.html.

  9. A graphical word is any sequence of characters delimited by two spaces.

  10. This time was obtained on a Dell Core i5 machine with 4 GB of RAM and a 500 GB hard drive.
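The CF and IDF quantities named in note 1 can be combined into a concept weight by analogy with TF-IDF. The sketch below assumes the common form CF × log(N / DF), where N is the number of documents and DF the number of documents containing the concept; the exact formula used in the article is not given here, so this is an assumption. The function name and the concept identifiers are hypothetical.

```python
import math
from collections import Counter


def cf_idf(docs):
    """Weight each concept of each document by CF * log(N / DF).

    docs is a list of documents, each given as a list of concept ids
    (for example, synset ids produced after disambiguation).
    Returns one {concept: weight} dictionary per document.
    """
    n_docs = len(docs)
    # DF: number of documents in which each concept occurs at least once.
    df = Counter()
    for concepts in docs:
        df.update(set(concepts))
    weights = []
    for concepts in docs:
        cf = Counter(concepts)  # CF: occurrences of the concept in this document
        weights.append({c: cf[c] * math.log(n_docs / df[c]) for c in cf})
    return weights


# Two toy documents described by hypothetical concept ids.
toy_docs = [["c1", "c1", "c2"], ["c2", "c3"]]
toy_weights = cf_idf(toy_docs)
```

A concept that occurs in every document, such as `c2` here, receives a weight of zero under this form, which reflects that it does not help distinguish one document from another.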

Author information

Corresponding author

Correspondence to Mohammed Alaeddine Abderrahim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Cite this article

Abderrahim, M.A., Dib, M., Abderrahim, M.EA. et al. Semantic indexing of Arabic texts for information retrieval system. Int J Speech Technol 19, 229–236 (2016). https://doi.org/10.1007/s10772-015-9307-3

  • DOI: https://doi.org/10.1007/s10772-015-9307-3
