Towards a new possibilistic query translation tool for cross-language information retrieval

Multimedia Tools and Applications

Abstract

Approaches to query translation in Cross-Language Information Retrieval (CLIR) have frequently relied on dictionaries, which suffer from translation ambiguity. Moreover, word-by-word query translation is not sufficient. In this paper, we propose, evaluate and compare a new possibilistic approach for query translation in order to improve on previous dictionary-based ones. This approach uses a probability-to-possibility transformation as a means of introducing further tolerance into the query translation process. Firstly, we identify noun phrases (NPs) in the source query and translate them as units using translation patterns and a language model. Secondly, source query terms that are not included in any selected NP are translated word-by-word using our new possibilistic approach to single-word translation. In doing so, we take into account all query words and their translations when choosing the suitable translation of a given word. We start from the idea that suitable translations of query terms tend to co-occur in target-language documents, unlike unsuitable ones. Finally, to increase the coverage of the bilingual dictionary, additional words and their translations are automatically generated from a parallel bilingual corpus. We tested our approach using the French-English parallel text corpus Europarl and the CLEF-2003 French-English CLIR test collection. The reported experiments assess the performance of the probability-to-possibility transformation-based approach in comparison with the probabilistic one and with some state-of-the-art CLIR tools.
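
The abstract does not state which probability-to-possibility transformation is used. As a minimal sketch, assuming the classical Dubois-Prade rule (the possibility degree of an outcome is the sum of all probabilities that do not exceed its own), the idea can be illustrated as follows. The function name `prob_to_poss`, the French word and the probability values are hypothetical.

```python
def prob_to_poss(probs):
    """Map a probability distribution to a possibility distribution.

    Assumed rule (Dubois-Prade): pi(x) = sum of p(y) over all y with p(y) <= p(x).
    The most probable outcome always receives possibility 1.
    """
    return {
        x: sum(q for q in probs.values() if q <= p)
        for x, p in probs.items()
    }


# Hypothetical probabilities of three English translations of French "avocat".
p = {"lawyer": 0.5, "avocado": 0.375, "advocate": 0.125}
pi = prob_to_poss(p)
print(pi)
```

The resulting possibility degrees are never lower than the original probabilities, which is one way to read the "further tolerance" mentioned in the abstract: less likely translations are penalized less severely than under the raw probabilities.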

Notes

  1. http://www.illc.uva.nl/EuroWordNet/

  2. http://nlp.stanford.edu/software/lex-parser.shtml

  3. http://www.statmt.org/moses/

  4. http://www.statmt.org/moses/giza/GIZA++.html

  5. http://berkeleyaligner.sourceforge.net/

  6. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

  7. SPORT (in French): Système POssibiliste de tRaduction de requêTes.

  8. http://terrier.org/

  9. https://code.google.com/p/java-google-translate-text-to-speech/

  10. http://classifier4j.sourceforge.net/

  11. SPORSER (in French): Système POssibiliste de Reformulation SEmantique de Requêtes.

  12. http://prefuse.org/

  13. http://www.clef-campaign.org/

  14. http://www.statmt.org/europarl/

  15. http://www.reverso.net/spell-checker/english-spelling-grammar/

  16. https://translate.google.fr/?hl=fr

References

  1. Abdelali A, Cowie JR, Farwell D, Ogden WC (2004) Uclir: a multilingual information retrieval tool. In Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial 8(22):103–110

  2. Adriani M (2000) Using statistical term similarity for sense disambiguation in cross-language information retrieval. Inf Retr 2(1):67–78

  3. Adriani M, van Rijsbergen CJ (1999) Term similarity-based query expansion for cross language information retrieval. In: Abiteboul S and Vercoustre AM (eds) Proceedings of the 3rd European conference on research and advanced Technology for Digital Libraries. Springer, LNCS 1696, p 311–322

  4. Ayed R, Bounhas I, Elayeb B, Evrard F, Bellamine Ben Saoud N (2012a) Arabic morphological analysis and disambiguation using a possibilistic classifier. In: intelligent computing theories and applications - 8th international conference. Springer-Verlag Berlin Heidelberg, LNAI 7390, p 274–279

  5. Ayed R, Bounhas I, Elayeb B, Evrard F, Bellamine Ben Saoud N (2012b) A possibilistic approach for the automatic morphological disambiguation of Arabic texts. In: Proceedings of the 13th ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing, pp 187–194

  6. Ayed R, Bounhas I, Elayeb B, Bellamine Ben Saoud N, Evrard F (2014a) Evaluation d’une approche possibiliste pour la désambiguïsation des textes arabes. In: TALN-2014. Actes de la conférence Traitement Automatique des Langues. Marseille, France, pp 316–327

  7. Ayed R, Bounhas I, Elayeb B, Bellamine Ben Saoud N, Evrard F (2014b) Improving arabic texts morphological disambiguation using possibilistic classifier. In: Proceedings of the 19th International Conference on Application of Natural Language to Information Systems. Springer International Publishing Switzerland, LNCS 8455, Montpellier, France, p 138–147

  8. Ballesteros L, Croft WB (1996) Dictionary methods for cross-lingual information retrieval. In: Thomas H and Wagner RR (eds) Proceedings of the 7th international DEXA conference on database and expert systems applications. Springer-Verlag Berlin Heidelberg, LNCS 1134, p 791–801

  9. Ballesteros L, Croft WB (1997) Phrasal translation and query expansion techniques for cross-language information retrieval. In: Proceedings of the 20th International Conference on Research and Development in Information Retrieval, p 84–91

  10. Ballesteros L, Croft WB (1998) Resolving ambiguity for cross-language retrieval. In: Proceedings of the 21st international conference on Research and Development in information retrieval. Melbourne, Australia, pp 64–71

  11. Ben Amor N, Mellouli K, Benferhat S, Dubois D, Prade H (2002) A theoretical framework for possibilistic independence in a weakly ordered setting. Int J Uncertainty, Fuzziness Knowledge Based Syst 10:117–155

  12. Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2011) A possibilistic approach for semantic query expansion. In: Proceedings of the 4th international conference on internet technologies and applications. Wrexham, Wales, pp 308–316

  13. Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2012) A possibilistic approach for automatic word sense disambiguation. In: Proceedings of the 24th conference on computational linguistics and speech processing. Chung-Li, Taiwan, China, pp 261–275

  14. Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2014a) Improving query expansion by automatic query disambiguation in intelligent information retrieval. In: Proceedings of the 6th international conference on agents and artificial intelligence. SciTePress, Angers, Loire Valley, France, pp 153–160

  15. Ben Khiroun O, Ayed R, Elayeb B, Bounhas I, Bellamine Ben Saoud N, Evrard F (2014b) Towards a new standard arabic test collection for mono- and cross-language information retrieval. In: Proceedings of the 19th International Conference on Application of Natural Language to Information Systems. Springer International Publishing Switzerland, LNCS 8455, Montpellier, France, p 168–171

  16. Ben Romdhane W, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud, N (2013) A Possibilistic query translation approach for cross-language information retrieval. In: De-Shuang Huang et al. (eds) Proceedings of the 9th International Conference on Intelligent Computing. Springer-Verlag Berlin Heidelberg, LNCS 7996, Nanning, China, p 73–82

  17. Bian GW, Teng SY (2008) Integrating query translation and text classification in a cross-language patent access system. In: Proceedings of NTCIR-7 Workshop Meeting, p 16–19

  18. Bounhas I, Elayeb B, Evrard F, Slimani Y (2011a) ArabOnto: experimenting a new distributional approach for building Arabic ontological resources. Int J Metadata Semant Ontol 6(2):81–95

  19. Bounhas I, Elayeb B, Evrard F, Slimani Y (2011b) Organizing contextual knowledge for arabic text disambiguation and terminology extraction. Knowl Organ 38(6):473–490

  20. Bounhas M, Mellouli K, Prade H, Serrurier M (2013) Possibilistic Classifiers for numerical data. Soft Comput 17(5):733–751

  21. Bounhas M, Ghasemi MH, Prade H, Serrurier M, Mellouli K (2014a) Naïve possibilistic classifiers for imprecise or uncertain numerical data. Fuzzy Sets Syst 239:137–156

  22. Bounhas I, Lahbib W, Elayeb B (2014b) Arabic domain terminology extraction: A literature review. In: R. Meersman et al. (eds) Proceedings of The 13th International Conference on Ontologies, DataBases, and Applications of Semantics. Springer-Verlag Berlin Heidelberg, LNCS 8841, Amantea, Italy, p 792–799

  23. Bounhas I, Lahbib W, Elayeb B (2014c) Extraction de terminologies en langue Arabe: un état de l’art. In: Proceedings of Cinquième Journées Francophones sur les Ontologies (JFO). Hammamet, Tunisia, p 271–282

  24. Bounhas I, Ayed R, Elayeb B, Evrard F, Bellamine Ben Saoud N (2015a) Experimenting a discriminative possibilistic classifier with reweighting model for Arabic morphological disambiguation. Comput Speech Lang 33(2015):67–87

  25. Bounhas I, Ayed R, Elayeb B, Bellamine Ben Saoud N (2015b) A hybrid possibilistic approach for Arabic full morphological disambiguation. Data Knowl Eng 100(2015):240–254

  26. Capstick J, Diagne AK, Erbach G, Uszkoreit H, Leisenberg A, Leisenberg M (2000) A system for supporting cross-lingual information retrieval. Inf Process Manag 36(2):275–289

  27. Chinnakotla MK, Ranadive S, Bhattacharyya P, Damani OP (2007) Hindi and marathi to english cross language information retrieval at CLEF 2007. In: working notes for CLEF 2007 workshop, Budapest, Hungary. http://ceur-ws.org/Vol-1173/CLEF2007wn-adhoc-KumarChinnakotlaEt2007.pdf

  28. Church K, Gale W, Hanks P, Hindle D (1991) Using statistics in lexical analysis. Lexical Acquisition: exploiting on-line resources to build a Lexicon. Lawrence Erlbaum Associates, Hillsdale, pp 115–164

  29. Civanlar MR, Trussell HJ (1986) Constructing membership functions using statistical data. Fuzzy Sets Syst 18:1–13

  30. Clough P, Stevenson M (2004) Cross-language information retrieval using EuroWordNet and word sense disambiguation. In: McDonald S and Tait J I (eds), Proceedings of the European Conference in Information Retrieval. Springer-Verlag, Heidelberg, LNCS 2997, p 327–337

  31. Daille B (1994) Approche mixte pour l’extraction de terminologie : statistique lexicale et filtres linguistiques. Ph.D. Thesis, University of Paris 7 (In French)

  32. Davis MW, Ogden WC (1997) Free resources and advanced alignment for cross-language text retrieval. In: Proceedings of The sixth Text REtrieval Conference (TREC-1997), p 385–395

  33. Delgado M, Moral S (1987) On the concept of possibility-probability consistency. Fuzzy Sets Syst 21(3):311–318

  34. Dubois D (2006) Possibility theory and statistical reasoning. Comput Stat Data Anal 51(1):47–69

  35. Dubois D, Prade H (1985) Unfair coins and necessity measures: towards a possibilistic interpretation of histograms. Fuzzy Sets Syst 10(1):15–20

  36. Dubois D, Prade H (eds) (1987) Théorie des Possibilités: Application à la Représentation des Connaissances en Informatique. Edition Masson, Paris

  37. Dubois D, Prade H (1992) When upper probabilities are possibility measures. Fuzzy Sets Syst 49:65–74

  38. Dubois D, Prade H (1993) Fuzzy sets and probability: misunderstandings, bridges and gaps. In: Proceedings of the Second IEEE Conference on Fuzzy Systems, pp 1059–1068

  39. Dubois D, Prade H (eds) (1994) Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York

  40. Dubois D, Prade H (1998) Possibility theory: qualitative and quantitative aspects. In: Gabbay DM, Smets Ph (eds) Quantified representation of uncertainty and imprecision, Handbook of Defeasible Reasoning and Uncertainty Management Systems, vol 1. Kluwer Academic Publishers, Netherlands, pp 169–226

  41. Dubois D, Prade H (2000) An overview of ordinal and numerical approaches to causal diagnostic problem solving, abductive reasoning and learning. In: Gabbay DM, Kruse R (eds) Handbooks of defeasible reasoning and uncertainty management systems, DRUMS Handbooks, vol 4, pp 231–280

  42. Dubois D, Prade H (2006) Représentations formelles de l’incertain et de l’imprécis. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Concepts et méthodes pour l’aide à la décision - outils de modélisation, vol 2, pp 99–137

  43. Dubois D, Prade H (2009) Formal representations of uncertainty. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Decision-making process. ISTE, London, UK & Wiley, Hoboken, NJ, USA, pp 85–156

  44. Dubois D, Prade H, Sandri S (1993) On possibility/probability transformation. In: Proceedings of the 4th IFSA Conference, p 103–112

  45. Dubois D, Foulloy L, Mauris G, Prade H (2004) Probability-possibility transformations, triangular fuzzy sets and probabilistic inequalities. Reliab Comput 10(4):273–297

  46. Dunning T (1994) Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19:61–74

  47. Elayeb B (2009) SARIPOD: système multi-agent de Recherche Intelligente Possibiliste des Documents Web. Ph.D. Thesis, The National Polytechnic Institute of Toulouse, France

  48. Elayeb B, Bounhas I (2016) Arabic cross-language information retrieval: a review. ACM Transactions on Asian and Low-Resource Language Information Processing 15(3), 44 pages

  49. Elayeb B, Evrard F, Zaghdoud M, Ben Ahmed M (2009) Towards an intelligent possibilistic web information retrieval using multiagent system. Interactive Technology and Smart Education, Special issue: New Learning Support Systems 6(1):40–59

  50. Elayeb B, Bounhas I, Ben Khiroun O, Evrard F, Bellamine Ben Saoud N (2011) Towards a possibilistic information retrieval system using semantic query expansion. Int J Intell Inf Technol 7(4):1–25

  51. Elayeb B, Bounhas I, Ben Khiroun O, Evrard F, Bellamine Ben Saoud N (2015a) A comparative study between possibilistic and probabilistic approaches for monolingual word sense disambiguation. Knowl Inf Syst 44(1):91–126

  52. Elayeb B, Bounhas I, Ben Khiroun O, Bellamine Ben Saoud N (2015b) Combining semantic query disambiguation and expansion to improve intelligent information retrieval. In: Duval B, van den Herik J, Loiseau S, Filipe J (eds) ICAART2014 revised selected papers, LNAI, vol 8946, pp 280–295

  53. Farag A, Nürnberger A (2012) Literature review of interactive cross-language information retrieval tools. Int Arab J Inform Technol 9(5):479–486

  54. Farag S, Nürnberger A (2013) Translation ambiguity resolution using interactive contextual information. In: Przepiorkowski A et al. (eds) Computational linguistics. Springer-Verlag Berlin Heidelberg, SCI 458, p 219–240

  55. Gao J, Nie JY, Xun E, Zhang J, Zhou M, Huang C (2001) Improving query translation for cross-language information retrieval using statistical models. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and Development in information retrieval. New Orleans, Louisiana, USA, pp 96–104

  56. Gao J, Nie JY, Xun E, Zhou M (2006) Statistical query translation models for cross-language information retrieval. ACM Trans Asian Lang Inform Process 5(4):323–359

  57. Haouari B, Ben Amor N, Elouedi Z, Mellouli K (2009) Naïve possibilistic network classifiers. Fuzzy Sets Syst 160(22):3224–3238

  58. Hedlund T, Airio E, Keskustalo H, Lehtokangas R, Pirkola A, Järvelin K (2004) Dictionary-based cross-language information retrieval: learning experiences from CLEF 2000-2002. Inf Retr 7(1–2):99–119

  59. Heer J, Card S, Landay J (2005) Prefuse: a toolkit for interactive information visualization. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp 421–430

  60. Hiemstra D, Jong F (1999) Disambiguation strategies for cross-language information retrieval. In: Proceedings of the 3rd European Conference on Research and Advanced Technology for Digital Libraries, p 274–293

  61. Hull DA (1993) Using statistical testing in the evaluation of retrieval experiments. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p 329–338

  62. Hull DA (1997) Using structured queries for disambiguation in cross-language information retrieval. In: Hull D, Oard D (eds) Proceedings of the AAAI symposium on cross-language text and speech retrieval. AAAI Press, Menlo Park, pp 84–98

  63. Hull DA, Grefenstette G (1996) Querying across languages: a dictionary-based approach to multilingual information retrieval. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p 46–57

  64. Jaynes ET, Bretthorst GL (eds) (2003) Probability theory the logic of science. Cambridge University Press, Cambridge

  65. Kadri Y (2008) Recherche d’Information Translinguistique sur les Documents en Arabe. Ph.D. Thesis, Faculty of Higher Studies, Montréal University, Canada

  66. Kadri Y, Nie JY (2004) Traduction des requêtes pour la recherche d’information translinguistique Anglais-Arabe. In: Proceedings of Conférence sur le Traitement Automatique des Langues Naturelles, pp 291–296

  67. Kadri Y, Nie JY (2006a) Effective stemming for Arabic information retrieval. In: Proceedings of The British Computer Society, The challenge of Arabic for NLP/MT Conference, pp 68–74

  68. Kadri Y, Nie JY (2006b) Improving query translation with confidence estimation for cross language information retrieval. In: Proceedings of the Conference on Information and Knowledge Management, pp 818–819

  69. Kadri Y, Nie JY (2007) Combining resources with confidence measures for cross language information retrieval. In: Proceedings of the Ph.D. Workshop on Information and Knowledge Management, pp 131–138

  70. Kadri Y, Nie JY (2008) A comparative study for query translation using linear combination and confidence measure. In: Proceedings of The Third International Joint Conference on Natural Language Processing, pp 181–188

  71. Khemakhem A, Gargouri B, Ben Hamadou A (2013) Collaborative Enrichment of Electronic Dictionaries Standardized-LMF. In: Métais E, Meziane F, Saraee M et al. (eds) Natural Language Processing and Information Systems - 18th International Conference on Applications of Natural Language to Information Systems. Springer, LNCS 7934, Salford, UK, p 328–336

  72. Klir GJ (1990) A principle of uncertainty and information invariance. Int J Gen Syst 17(2–3):249–275

  73. Koehn Ph (2005) Europarl: A Parallel Corpus for Statistical Machine Translation. In: Proceedings of the 10th Machine Translation Summit, p 79–86

  74. Koeling R, McCarthy D, Carroll J (2005) Domain-specific sense distributions and predominant sense acquisition. In: Proceedings of the HLT/EMNLP 2005 - human language technology conference and conference on empirical methods in natural language processing. The Association for Computational Linguistics, Vancouver, British Columbia, Canada, pp 419–426

  75. Kwok KL (2000) Exploiting a Chinese-English bilingual wordlist for English-Chinese cross-language information retrieval. In: Proceedings of the 5th international workshop on information retrieval with Asian languages. ACM Press, Hong Kong, pp 173–179

  76. Lahbib W, Bounhas I, Elayeb B, Evrard F, Slimani Y (2013) An hybrid approach for arabic semantic relation extraction. In: Proceedings of the 26th international FLAIRS conference. AAAI press, St. Pete Beach, Florida, USA, pp 315–320

  77. Lahbib W, Bounhas I, Elayeb B (2014) Arabic-english domain terminology extraction from aligned corpora. In: Meersman R et al. (eds) Proceedings of The 13th International Conference on Ontologies, DataBases, and Applications of Semantics. Springer-Verlag Berlin Heidelberg, LNCS 8841, Amantea, Italy, p 745–759

  78. Lefever E, Hoste V (2010) SemEval-2010 task 3: cross-lingual word sense disambiguation. In: Proceedings of the 5th international workshop on semantic evaluation (SemEval-2010). Uppsala, Sweden, pp 15–20

  79. Lefever E, Hoste V (2013) SemEval-2013 task 10: cross-lingual word sense disambiguation. In: Second joint conference on lexical and computational semantics (*SEM), volume 2: seventh international workshop on semantic evaluation (SemEval 2013). Atlanta, Georgia, pp 158–166

  80. Levow G-A, Oard DW, Resnik P (2005) Dictionary-based techniques for cross-language information retrieval. Inf Process Manag 41(3):523–547

  81. Lopez-Ostenero F, Gonzalo J, Penas A, Verdejo F (2003) Interactive cross-language searching: phrases are better than terms of query formulation and refinement. In: Peters C, Braschler M, Gonzalo J (eds) Proceedings of CLEF 2002, Springer-Verlag, Heidelberg, LNCS, vol 2785, pp 416–429

  82. Luca EWD, Hauke S, Nurnberger A, Schlechtweg S (2006) MultiLexExplorer – combining multilingual web search with multilingual lexical resources. In: Proceedings of Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation of Robust Spoken Dialogue Systems part of ECAI 2006, p 17–21

  83. Maisonnasse L (2008) Les supports de vocabulaires pour les systèmes de recherche d’information orientés précision : application aux graphes pour la recherche d’information médicale. Ph.D. Thesis, Joseph Fourier University, Grenoble I, France

  84. Nie JY (1998) TREC-7 CLIR using a probabilistic translation model. In: Proceedings of the 7th Text REtrieval Conference (TREC), p 482–488

  85. Nie JY (1999) CLIR using a probabilistic translation model based on web documents. In: Proceedings of the 8th Text REtrieval Conference (TREC). http://trec.nist.gov/pubs/trec8/papers/TREC8-Nie.pdf

  86. Oard DW, He D, Wang J (2008) User-assisted query translation for interactive cross-language information retrieval. Inf Process Manag 44(1):181–211

  87. Ogden WC, Davis MW (2000) Improving cross-language text retrieval with human interactions. In: proceedings of the 33rd Hawaii international conference on system sciences. Washington, DC, USA, p 3044

  88. Papineni K, Roukos S, Ward T, Zhu W J (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual meeting of the Association for Computational Linguistics. p 311–318

  89. Petrelli D, Beaulieu M, Sanderson M, Demetriou G, Herring P, Hansen P (2004) Observing users, designing clarity: a case study on the user-centered design of a cross-language information retrieval system. J Am Soc Inf Sci Technol 55(10):923–934

  90. Shinnou H, Sasaki M (2003) Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the em algorithm. Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003:41–48

  91. Smadja F, Mckeown KR, Hatzivassiloglou V (1996) Translating collocations for bilingual lexicons: a statistical approach. Comput Linguist 22:1–38

  92. Soudani N, Bounhas I, Elayeb B, Slimani Y (2014a) Toward an Arabic ontology for Arabic word sense disambiguation based on normalized dictionaries. In: Meersman R et al. (eds) Proceedings of the 13th International Conference on Ontologies, DataBases, and Applications of Semantics. Springer-Verlag Berlin Heidelberg, LNCS 8842, Amantea, Italy, p 655–658

  93. Soudani N, Bounhas I, Elayeb B, Slimani Y (2014b) An LMF-based normalization approach of Arabic Islamic dictionaries for arabic word sense disambiguation: application on hadith. In: Proceedings of the 2nd International Conference on Islamic Applications in Computer Science and Technologies. Amman, Jordan

  94. Soudani N, Bounhas I, Elayeb B, Slimani Y (2014c) Generic normalization approach of Arabic dictionaries for Arabic word sense disambiguation. In: Proceedings of Cinquième Journées Francophones sur les Ontologies (JFO). Hammamet, Tunisia, pp 309–315

  95. Ture F, Boschee E (2014) Learning to translate: a query-specific combination approach for cross-lingual information retrieval. In: Proceedings of the 2014 Conference on empirical methods in natural language processing. Doha, Qatar, p 589–599

  96. Vossen P (ed) (1998) EuroWordNet: a multilingual database with lexical semantic networks. Kluwer Academic Publishers, Norwell

  97. Xu J, Weischedel R (2000) Cross-lingual information retrieval using Hidden Markov models. In: Proceedings of the 2000 Joint SIGDAT conference on empirical methods in natural language processing and very large corpora. Hong Kong, China, p 95–103

  98. Xu J, Fraser A, Makhoul J, Noamany M, Osman G (2001) UN Arabic English parallel text version 1.0 beta [CD-ROM]. Philadelphia: University of Pennsylvania, Linguistic Data Consortium

  99. Xu J, Fraser A, Weischedel R (2002) TREC 2001: cross-lingual retrieval at BBN. In: Proceedings of the 10th Text REtrieval Conference. NIST Special Publication 500–250. Gaithersburg, Maryland, p 68–77

  100. Xun E (1999) Incremental english parsing using combination of statistic and learning methods. Ph.D. Thesis, Harbin Institute of Technology, China

  101. Xun E, Zhou M, Huang C (2000) A unified statistical model for the identification of English base NP. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics. Hong Kong

  102. Yamada K (2001) Probability-possibility transformation based on evidence theory. In: Proceedings of the 9th International Fuzzy Systems Association World Congress, vol 1, pp 70–75

  103. Zadeh L (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28

  104. Zouaghi A, Merhbene L, Zrigui M (2012) Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38(4):257–269

Acknowledgements

Sincere thanks to the anonymous reviewers for their constructive comments, which significantly enhanced the quality of this manuscript during the reviewing process. The authors wish to thank Muna Shdaifat, who revised the paper and improved its English. We are also grateful to the Evaluations and Language resources Distribution Agency (ELDA), which kindly provided us with the document collections of the CLEF-2003 campaign.

Author information

Corresponding author

Correspondence to Bilel Elayeb.

Appendix: The naïve Bayesian approach for NP translation

Naïve Bayesian disambiguation translation (NBDT) approaches are inspired by the Bayes rule. In these techniques, the input variables are assumed to be independent. Despite their simplicity, NBDT approaches can often be more effective than some recognized query translation (QT) ones [55, 56]. In addition, a given NBDT method can be viewed as a Bayesian network in which the predictive source query terms are assumed to be conditionally independent given the translation of each one [11, 20, 23].

Let us consider a French NP modelled as the vector FNP = {f_1, …, f_n}, with its NP pattern FPT. Using the bilingual dictionary, we look up all the available English translations of each French term f_i in FNP. We also take advantage of all the available translation patterns EPT for FPT. Then, the best English translated phrase, ENP* = {e_1, …, e_m}, is the one that maximizes formula (23) below.

$$ {ENP}^{\ast }=\underset{ENP}{ \arg \max}\left( P\left(\left. ENP\right| FNP\right)\right)=\underset{ENP}{ \arg \max}\left( P\left(\left. FNP\right| ENP\right)\ast P(ENP)\right)=\underset{e_j}{ \arg \max}\left( P\left({e}_j\right)\ast \prod_{i=1}^n P\left(\left.{f}_i\right|{e}_j\right)\right) $$
(23)

Using the Bayes rule:

$$ P\left(\left.{e}_j\right|{f}_1,{f}_2,\dots, {f}_n\right)=\frac{P\left({e}_j\right)\ast P\left(\left.{f}_1,{f}_2,\dots, {f}_n\right|{e}_j\right)}{P\left({f}_1,{f}_2,\dots, {f}_n\right)} $$
(24)

Where: P(FNP|ENP) is the translation probability, and P(ENP) is the a priori probability of the words of the translated English NP.

When calculating the maximum posterior probability of a translation, we can ignore the normalizing factor P(f_1, f_2, …, f_n) since it does not depend on the translation e_j. We estimate the most important factor in formula (24), namely P(f_1, f_2, …, f_n | e_j), using training data. Because naïve Bayes assumes that the source query terms are conditionally independent given the translation, we can decompose the likelihood into a product of terms, as given in formula (25).

$$ P\left(\left.{f}_1,{f}_2,\dots, {f}_n\right|{e}_j\right)=\prod_{i=1}^n P\left(\left.{f}_i\right|{e}_j\right) $$
(25)
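
As a minimal sketch of formulas (23) to (25), the snippet below picks the candidate e_j that maximizes P(e_j) multiplied by the product of P(f_i | e_j). The probability tables, the example words and the function name `best_translation` are illustrative assumptions, not values or code from the paper. Unseen (f_i, e_j) pairs are given probability 0 for simplicity, that is, no smoothing is applied.

```python
from math import prod


def best_translation(french_terms, prior, likelihood):
    """Return the candidate e_j maximizing P(e_j) * prod_i P(f_i | e_j)."""

    def score(e):
        # Naive Bayes: the f_i are treated as independent given e_j.
        return prior[e] * prod(likelihood[e].get(f, 0.0) for f in french_terms)

    return max(prior, key=score)


# Hypothetical priors P(e_j) and likelihoods P(f_i | e_j).
prior = {"lawyer": 0.4, "avocado": 0.6}
likelihood = {
    "lawyer": {"avocat": 0.7, "tribunal": 0.2},
    "avocado": {"avocat": 0.8, "tribunal": 0.01},
}

alone = best_translation(["avocat"], prior, likelihood)
in_context = best_translation(["avocat", "tribunal"], prior, likelihood)
print(alone, in_context)
```

With these made-up numbers, the context word "tribunal" changes the selected translation, which is the effect the product in formula (25) is meant to capture.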

We regard an NP (FNP or ENP) as a set of terms (F or E) grouped by an NP pattern (FPT or EPT), and we suppose that the translation of terms and the translation of NP patterns are independent, as given in formula (26):

$$ \begin{array}{l} P\left(\left. FNP\right| ENP\right)= P\left(\left. F, FPT\right| E, EPT\right)\\ {}= P\left(\left. F\right| E, EPT\right)\ast P\left(\left. FPT\right| E, EPT\right)= P\left(\left. F\right| E\right)\ast P\left(\left. FPT\right| EPT\right)\end{array} $$
(26)

Substituting formula (26) into formula (23) gives:

$$ {ENP}^{\ast }=\underset{ENP}{ \arg \max}\left( P\left(\left. F\right| E\right)\ast P\left(\left. FPT\right| EPT\right)\ast P(ENP)\right) $$
(27)

Where: P(F|E) is the translation probability from English words E in ENP to French words F in FNP. Given the English pattern EPT, P(FPT|EPT) is the probability of the translation pattern FPT.
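
Formula (27) can be sketched as a selection over candidate English NPs, each scored by the product of its three factors. The candidate phrases, the dictionary keys and all probability values below are hypothetical and serve only to show the mechanics.

```python
def score(candidate):
    """P(F|E) * P(FPT|EPT) * P(ENP) for one candidate English NP."""
    return (
        candidate["p_f_given_e"]
        * candidate["p_fpt_given_ept"]
        * candidate["p_enp"]
    )


def best_enp(candidates):
    """Return the candidate with the highest score, as in formula (27)."""
    return max(candidates, key=score)


# Hypothetical candidates for a French NP with pattern "N Prep N".
candidates = [
    {"enp": "air pollution", "ept": "N N",
     "p_f_given_e": 0.5, "p_fpt_given_ept": 0.6, "p_enp": 0.01},
    {"enp": "pollution of air", "ept": "N Prep N",
     "p_f_given_e": 0.5, "p_fpt_given_ept": 0.3, "p_enp": 0.001},
]

best = best_enp(candidates)
print(best["enp"])
```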

These probabilities are estimated as follows:

$$ P\left(\left. FPT\right| EPT\right)=\frac{Occ\left( FPT, EPT\right)}{Occ(EPT)} $$
(28)

Where: Occ(EPT) is the number of occurrences of EPT in the English portion of the aligned bilingual corpus; and Occ(FPT, EPT) is the number of times FPT corresponds to EPT in the aligned sentences.
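
The relative-frequency estimate of formula (28) can be sketched as follows. For simplicity, this sketch assumes that every occurrence of an English pattern in the corpus appears in exactly one aligned pair, so that Occ(EPT) can be counted from the same list of pairs; the pattern labels and counts are made up.

```python
from collections import Counter


def estimate_pattern_probs(aligned_pairs):
    """Estimate P(FPT | EPT) = Occ(FPT, EPT) / Occ(EPT).

    aligned_pairs: list of (FPT, EPT) tuples taken from aligned sentences.
    """
    pair_counts = Counter(aligned_pairs)                   # Occ(FPT, EPT)
    ept_counts = Counter(ept for _, ept in aligned_pairs)  # Occ(EPT)
    return {
        (fpt, ept): n / ept_counts[ept]
        for (fpt, ept), n in pair_counts.items()
    }


# Hypothetical aligned pattern pairs (French pattern, English pattern).
pairs = (
    [("N Prep N", "N N")] * 3
    + [("N Adj", "N N")]
    + [("N Adj", "Adj N")] * 4
)
probs = estimate_pattern_probs(pairs)
print(probs)
```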

P(ENP) is determined by the English trigram language model as given in formula (29):

$$ P(ENP)= P\left({e}_1,\dots, {e}_m\right)=\prod_{i=1}^m P\left(\left.{e}_i\right|{e}_{i-2},{e}_{i-1}\right) $$
(29)
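
A trigram model such as the one in formula (29) can be sketched with maximum-likelihood counts. The source does not specify how the first two positions are conditioned or whether smoothing is applied; this sketch assumes two `<s>` padding symbols and no smoothing, and the tiny corpus is invented for illustration.

```python
from collections import Counter

PAD = "<s>"  # assumed start-of-phrase symbol


def train_counts(phrases):
    """Count trigrams and their two-word contexts in tokenized phrases."""
    tri, ctx = Counter(), Counter()
    for words in phrases:
        padded = [PAD, PAD] + list(words)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx


def np_probability(words, tri, ctx):
    """P(ENP) as the product of P(e_i | e_{i-2}, e_{i-1}), without smoothing."""
    padded = [PAD, PAD] + list(words)
    p = 1.0
    for i in range(2, len(padded)):
        context = tuple(padded[i - 2:i])
        if ctx[context] == 0:
            return 0.0  # unseen context: probability 0 under this sketch
        p *= tri[tuple(padded[i - 2:i + 1])] / ctx[context]
    return p


corpus = [["air", "pollution"], ["air", "quality"], ["water", "pollution"]]
tri, ctx = train_counts(corpus)
p_good = np_probability(["air", "pollution"], tri, ctx)
p_bad = np_probability(["pollution", "air"], tri, ctx)
print(p_good, p_bad)
```

In practice a smoothed model would be needed so that unseen but plausible phrases do not receive probability 0.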

Unfortunately, the probabilistic approaches involve a more complex NP identification process [100, 101]. Firstly, base NPs are selected and translated with high accuracy. Secondly, complex NPs are identified and translated with lower accuracy than the base ones. To overcome this limitation, reliable complex NPs are chosen in the second step using only a small set of syntactic patterns. For more details and discussion of the probabilistic model for query translation applied to English-Chinese CLIR, we refer the reader to [56].

Cite this article

Elayeb, B., Romdhane, W.B. & Saoud, N.B.B. Towards a new possibilistic query translation tool for cross-language information retrieval. Multimed Tools Appl 77, 2423–2465 (2018). https://doi.org/10.1007/s11042-017-4398-2

  • DOI: https://doi.org/10.1007/s11042-017-4398-2
