Abstract
Approaches to query translation in Cross-Language Information Retrieval (CLIR) have frequently relied on dictionaries, which suffer from translation ambiguity. Moreover, word-by-word query translation is not sufficient. In this paper, we propose, evaluate and compare a new possibilistic approach for query translation that aims to improve on previous dictionary-based ones. This approach uses a probability-to-possibility transformation as a means of introducing further tolerance into the query translation process. Firstly, we identify noun phrases (NPs) in the source query and translate them as units using translation patterns and a language model. Secondly, source query terms that are not included in any selected NP are translated word-by-word using our new possibilistic approach to single-word translation. To choose the suitable translation of a given word, we take into account all query words and their translations. We start from the idea that the suitable translations of query terms tend to co-occur in the target-language documents, unlike unsuitable ones. Finally, to increase the coverage of the bilingual dictionary, additional words and their translations are automatically generated from a parallel bilingual corpus. We tested our approach using the French-English parallel text corpus Europarl and the CLEF-2003 French-English CLIR test collection. The reported experiments show how the probability-to-possibility transformation-based approach performs compared to the probabilistic one and to some state-of-the-art CLIR tools.
Notes
SPORT (in French): Système POssibiliste de tRaduction de requêTes.
SPORSER (in French): Système POssibiliste de Reformulation SEmantique de Requêtes.
http://www.clef-campaign.org/
References
Abdelali A, Cowie JR, Farwell D, Ogden WC (2004) Uclir: a multilingual information retrieval tool. Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial 8(22):103–110
Adriani M (2000) Using statistical term similarity for sense disambiguation in cross-language information retrieval. Inf Retr 2(1):67–78
Adriani M, van Rijsbergen CJ (1999) Term similarity-based query expansion for cross language information retrieval. In: Abiteboul S and Vercoustre AM (eds) Proceedings of the 3rd European conference on research and advanced Technology for Digital Libraries. Springer, LNCS 1696, p 311–322
Ayed R, Bounhas I, Elayeb B, Evrard F, Bellamine Ben Saoud N (2012a) Arabic morphological analysis and disambiguation using a possibilistic classifier. In: intelligent computing theories and applications - 8th international conference. Springer-Verlag Berlin Heidelberg, LNAI 7390, p 274–279
Ayed R, Bounhas I, Elayeb B, Evrard F, Bellamine Ben Saoud N (2012b) A possibilistic approach for the automatic morphological disambiguation of Arabic texts. In: Proceedings of the 13th ACIS international conference on software engineering. Artificial Intelligence, Networking and Parallel/Distributed Computing, pp 187–194
Ayed R, Bounhas I, Elayeb B, Bellamine Ben Saoud N, Evrard F (2014a) Evaluation d’une approche possibiliste pour la désambiguïsation des textes arabes. In: TALN-2014. Actes de la conférence Traitement Automatique des Langues. Marseille, France, pp 316–327
Ayed R, Bounhas I, Elayeb B, Bellamine Ben Saoud N, Evrard F (2014b) Improving arabic texts morphological disambiguation using possibilistic classifier. In: Proceedings of the 19th International Conference on Application of Natural Language to Information Systems. Springer International Publishing Switzerland, LNCS 8455, Montpellier, France, p 138–147
Ballesteros L, Croft WB (1996) Dictionary methods for cross-lingual information retrieval. In: Thomas H and Wagner RR (eds) Proceedings of the 7th international DEXA conference on database and expert systems applications. Springer-Verlag Berlin Heidelberg, LNCS 1134, p 791–801
Ballesteros L, Croft WB (1997) Phrasal translation and query expansion techniques for cross-language information retrieval. In: Proceedings of the 20th International Conference on Research and Development in Information Retrieval, p 84–91
Ballesteros L, Croft WB (1998) Resolving ambiguity for cross-language retrieval. In: Proceedings of the 21st international conference on Research and Development in information retrieval. Melbourne, Australia, pp 64–71
Ben Amor N, Mellouli K, Benferhat S, Dubois D, Prade H (2002) A theoretical framework for possibilistic independence in a weakly ordered setting. Int J Uncertainty, Fuzziness Knowledge Based Syst 10:117–155
Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2011) A possibilistic approach for semantic query expansion. In: Proceedings of the 4th international conference on internet technologies and applications. Wrexham, Wales, pp 308–316
Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2012) A possibilistic approach for automatic word sense disambiguation. In: Proceedings of the 24th conference on computational linguistics and speech processing. Chung-Li, Taiwan, China, pp 261–275
Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2014a) Improving query expansion by automatic query disambiguation in intelligent information retrieval. In: Proceedings of the 6th international conference on agents and artificial intelligence. SciTePress, Angers, Loire Valley, France, pp 153–160
Ben Khiroun O, Ayed R, Elayeb B, Bounhas I, Bellamine Ben Saoud N, Evrard F (2014b) Towards a new standard arabic test collection for mono- and cross-language information retrieval. In: Proceedings of the 19th International Conference on Application of Natural Language to Information Systems. Springer International Publishing Switzerland, LNCS 8455, Montpellier, France, p 168–171
Ben Romdhane W, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2013) A Possibilistic query translation approach for cross-language information retrieval. In: De-Shuang Huang et al. (eds) Proceedings of the 9th International Conference on Intelligent Computing. Springer-Verlag Berlin Heidelberg, LNCS 7996, Nanning, China, p 73–82
Bian GW, Teng SY (2008) Integrating query translation and text classification in a cross-language patent access system. In: Proceedings of NTCIR-7 Workshop Meeting, p 16–19
Bounhas I, Elayeb B, Evrard F, Slimani Y (2011a) ArabOnto: experimenting a new distributional approach for building Arabic ontological resources. Int J Metadata Semant Ontol 6(2):81–95
Bounhas I, Elayeb B, Evrard F, Slimani Y (2011b) Organizing contextual knowledge for arabic text disambiguation and terminology extraction. Knowl Organ 38(6):473–490
Bounhas M, Mellouli K, Prade H, Serrurier M (2013) Possibilistic Classifiers for numerical data. Soft Comput 17(5):733–751
Bounhas M, Ghasemi MH, Prade H, Serrurier M, Mellouli K (2014a) Naïve possibilistic classifiers for imprecise or uncertain numerical data. Fuzzy Sets Syst 239:137–156
Bounhas I, Lahbib W, Elayeb B (2014b) Arabic domain terminology extraction: A literature review. In: R. Meersman et al. (eds) Proceedings of The 13th International Conference on Ontologies, DataBases, and Applications of Semantics. Springer-Verlag Berlin Heidelberg, LNCS 8841, Amantea, Italy, p 792–799
Bounhas I, Lahbib W, Elayeb B (2014c) Extraction de terminologies en langue Arabe: un état de l’art. In: Proceedings of Cinquième Journées Francophones sur les Ontologies (JFO). Hammamet, Tunisia, p 271–282
Bounhas I, Ayed R, Elayeb B, Evrard F, Bellamine Ben Saoud N (2015a) Experimenting a discriminative possibilistic classifier with reweighting model for Arabic morphological disambiguation. Comput Speech Lang 33(2015):67–87
Bounhas I, Ayed R, Elayeb B, Bellamine Ben Saoud N (2015b) A hybrid possibilistic approach for Arabic full morphological disambiguation. Data Knowl Eng 100(2015):240–254
Capstick J, Diagne AK, Erbach G, Uszkoreit H, Leisenberg A, Leisenberg M (2000) A system for supporting cross-lingual information retrieval. Inf Process Manag 36(2):275–289
Chinnakotla MK, Ranadive S, Bhattacharyya P, Damani OP (2007) Hindi and marathi to english cross language information retrieval at CLEF 2007. In: working notes for CLEF 2007 workshop, Budapest, Hungary. http://ceur-ws.org/Vol-1173/CLEF2007wn-adhoc-KumarChinnakotlaEt2007.pdf
Church K, Gale W, Hanks P, Hindle D (1991) Using statistics in lexical analysis. Lexical Acquisition: exploiting on-line resources to build a Lexicon. Lawrence Erlbaum Associates, Hillsdale, pp 115–164
Civanlar MR, Trussell HJ (1986) Constructing membership functions using statistical data. Fuzzy Sets Syst 18:1–13
Clough P, Stevenson M (2004) Cross-language information retrieval using EuroWordNet and word sense disambiguation. In: McDonald S and Tait J I (eds), Proceedings of the European Conference in Information Retrieval. Springer-Verlag, Heidelberg, LNCS 2997, p 327–337
Daille B (1994) Approche mixte pour l’extraction de terminologie : statistique lexicale et filtres linguistiques. Ph.D. Thesis, University of Paris 7 (In French)
Davis MW, Ogden WC (1997) Free resources and advanced alignment for cross-language text retrieval. In: Proceedings of The sixth Text REtrieval Conference (TREC-1997), p 385–395
Delgado M, Moral S (1987) On the concept of possibility-probability consistency. Fuzzy Sets Syst 21(3):311–318
Dubois D (2006) Possibility theory and statistical reasoning. Comput Stat Data Anal 51(1):47–69
Dubois D, Prade H (1985) Unfair coins and necessity measures: towards a possibilistic interpretation of histograms. Fuzzy Sets Syst 10(1):15–20
Dubois D, Prade H (eds) (1987) Théorie des Possibilités: Application à la Représentation des Connaissances en Informatique. Edition Masson, Paris
Dubois D, Prade H (1992) When upper probabilities are possibility measures. Fuzzy Sets Syst 49:65–74
Dubois D, Prade H (1993) Fuzzy sets and probability: misunderstandings, bridges and gaps. In: Proceedings of the Second IEEE Conference on Fuzzy Systems, pp 1059–1068
Dubois D, Prade H (eds) (1994) Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York
Dubois D, Prade H (1998) Possibility theory: qualitative and quantitative aspects. In: Gabbay DM, Smets Ph (eds) Quantified representation of uncertainty and imprecision, Handbook of Defeasible Reasoning and Uncertainty Management Systems. Kluwer Academic Publishers, Netherlands, vol 1, pp 169–226
Dubois D, Prade H (2000) An overview of ordinal and numerical approaches to causal diagnostic problem solving, abductive reasoning and learning. In: Gabbay DM, Kruse R (eds) Handbooks of defeasible reasoning and uncertainty management systems, Drums Handbooks, vol 4, pp 231–280
Dubois D, Prade H (2006) Représentations formelles de l’incertain et de l’imprécis, Concepts et méthodes pour l’aide à la décision - outils de modélisation. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds), vol 2, pp 99–137
Dubois D, Prade H (2009) Formal representations of uncertainty. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Decision-making process. ISTE & Hoboken, Wiley, London, UK, NJ, USA, pp 85–156
Dubois D, Prade H, Sandri S (1993) On possibility/probability transformation. In: Proceedings of the 4th IFSA Conference, p 103–112
Dubois D, Foulloy L, Mauris G, Prade H (2004) Probability-possibility transformations, triangular fuzzy sets and probabilistic inequalities. Reliab Comput 10(4):273–297
Dunning T (1994) Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19:61–74
Elayeb B (2009) SARIPOD: système multi-agent de Recherche Intelligente Possibiliste des Documents Web. Ph.D. Thesis, The National Polytechnic Institute of Toulouse, France
Elayeb B, Bounhas I (2016) Arabic cross-language information retrieval: a review. ACM Transaction on Asian and Low-Resource Language Information Processing 15(3) 44 pages
Elayeb B, Evrard F, Zaghdoud M, Ben Ahmed M (2009) Towards an intelligent possibilistic web information retrieval using multiagent system. Interactive Technology and Smart Education, Special issue: New Learning Support Systems 6(1):40–59
Elayeb B, Bounhas I, Ben Khiroun O, Evrard F, Bellamine Ben Saoud N (2011) Towards a possibilistic information retrieval system using semantic query expansion. Int J Intell Inf Technol 7(4):1–25
Elayeb B, Bounhas I, Ben Khiroun O, Evrard F, Bellamine Ben Saoud N (2015a) A comparative study between possibilistic and probabilistic approaches for monolingual word sense disambiguation. Knowl Inf Syst 44(1):91–126
Elayeb B, Bounhas I, Ben Khiroun O, Bellamine Ben Saoud N (2015b) Combining semantic query disambiguation and expansion to improve intelligent information retrieval. In: Duval B, van den Herik J, Loiseau S, Filipe J (eds) ICAART2014 revised selected papers, LNAI, vol 8946, pp 280–295
Farag A, Nürnberger A (2012) Literature review of interactive cross-language information retrieval tools. Int Arab J Inform Technol 9(5):479–486
Farag S, Nürnberger A (2013) Translation ambiguity resolution using interactive contextual information. In: Przepiorkowski A et al. (eds) Computational linguistics. Springer-Verlag Berlin Heidelberg, SCI 458, p 219–240
Gao J, Nie JY, Xun E, Zhang J, Zhou M, Huang C (2001) Improving query translation for cross-language information retrieval using statistical models. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and Development in information retrieval. New Orleans, Louisiana, USA, pp 96–104
Gao J, Nie JY, Xun E, Zhou M (2006) Statistical query translation models for cross-language information retrieval. ACM Trans Asian Lang Inform Process 5(4):323–359
Haouari B, Ben Amor N, Elouedi Z, Mellouli K (2009) Naïve possibilistic network classifiers. Fuzzy Sets Syst 160(22):3224–3238
Hedlund T, Airio E, Keskustalo H, Lehtokangas R, Pirkola A, Järvelin K (2004) Dictionary-based cross-language information retrieval: learning experiences from CLEF 2000-2002. Inf Retr 7(1–2):99–119
Heer J, Card S, Landay J (2005) Prefuse: a toolkit for interactive information visualization. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp 421–430
Hiemstra D, Jong F (1999) Disambiguation strategies for cross-language information retrieval. In: Proceedings of the 3rd European Conference on Research and Advanced Technology for Digital Libraries, p 274–293
Hull DA (1993) Using statistical testing in the evaluation of retrieval experiments. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p 329–338
Hull DA (1997) Using structured queries for disambiguation in cross-language information retrieval. In: Hull D, Oard D (eds) Proceedings of the AAAI symposium on cross-language text and speech retrieval. AAAI Press, Menlo Park, pp 84–98
Hull DA, Grefenstette G (1996) Querying across languages: a dictionary-based approach to multilingual information retrieval. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p 46–57
Jaynes ET, Bretthorst GL (eds) (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge
Kadri Y (2008) Recherche d’Information Translinguistique sur les Documents en Arabe. Ph.D. Thesis, Faculty of Higher Studies, Montréal University, Canada
Kadri Y, Nie JY (2004) Traduction des requêtes pour la recherche d’information translinguistique Anglais-Arabe. In: Proceedings of Conférence sur le Traitement Automatique des Langues Naturelles, pp 291–296
Kadri Y, Nie JY (2006a) Effective stemming for Arabic information retrieval. In: Proceeding of The British Computer Society, The challenge of Arabic for NLP/MT Conference, pp 68–74
Kadri Y, Nie JY (2006b) Improving query translation with confidence estimation for cross language information retrieval. In: Proceedings of the Conference on Information and Knowledge Management, pp 818–819
Kadri Y, Nie JY (2007) Combining resources with confidence measures for cross language information retrieval. In: Proceeding of the Ph.D. Workshop on Information and Knowledge Management, pp 131–138
Kadri Y, Nie JY (2008) A comparative study for query translation using linear combination and confidence measure. In: Proceedings of The Third International Joint Conference on Natural Language Processing, pp 181–188
Khemakhem A, Gargouri B, Ben Hamadou A (2013) Collaborative Enrichment of Electronic Dictionaries Standardized-LMF. In: Métais E, Meziane F, Saraee M et al. (eds) Natural Language Processing and Information Systems - 18th International Conference on Applications of Natural Language to Information Systems. Springer, LNCS 7934, Salford, UK, p 328–336
Klir GJ (1990) A principle of uncertainty and information invariance. Int J Gen Syst 17(23):249–275
Koehn Ph (2005) Europarl: A Parallel Corpus for Statistical Machine Translation. In: Proceedings of the 10th Machine Translation Summit, p 79–86
Koeling R, McCarthy D, Carroll J (2005) Domain-specific sense distributions and predominant sense acquisition. In: Proceedings of the HLT/EMNLP 2005 - human language technology conference and conference on empirical methods in natural language processing. The Association for Computational Linguistics, Vancouver, British Columbia, Canada, pp 419–426
Kwok KL (2000) Exploiting a Chinese-English bilingual wordlist for English-Chinese cross-language information retrieval. In: Proceedings of the 5th international workshop on information retrieval with Asian languages. ACM Press, Hong Kong, pp 173–179
Lahbib W, Bounhas I, Elayeb B, Evrard F, Slimani Y (2013) An hybrid approach for arabic semantic relation extraction. In: Proceedings of the 26th international FLAIRS conference. AAAI press, St. Pete Beach, Florida, USA, pp 315–320
Lahbib W, Bounhas I, Elayeb B (2014) Arabic-english domain terminology extraction from aligned corpora. In: Meersman R et al. (eds) Proceedings of The 13th International Conference on Ontologies, DataBases, and Applications of Semantics. Springer-Verlag Berlin Heidelberg, LNCS 8841, Amantea, Italy, p 745–759
Lefever E, Hoste V (2010) SemEval-2010 task 3: cross-lingual word sense disambiguation. In: Proceedings of the 5th international workshop on semantic evaluation (SemEval-2010). Uppsala, Sweden, pp 15–20
Lefever E, Hoste V (2013) SemEval-2013 task 10: cross-lingual word sense disambiguation. In: Second joint conference on lexical and computational semantics (*SEM), volume 2: seventh international workshop on semantic evaluation (SemEval 2013). Atlanta, Georgia, pp 158–166
Levow G-A, Oard DW, Resnik P (2005) Dictionary-based techniques for cross-language information retrieval. Inf Process Manag 41(3):523–547
Lopez-Ostenero F, Gonzalo J, Penas A, Verdejo F (2003) Interactive cross-language searching: phrases are better than terms of query formulation and refinement. In: Peters C, Braschler M, Gonzalo J (eds) Proceedings of CLEF 2002, Springer-Verlag, Heidelberg, LNCS, vol 2785, pp 416–429
Luca EWD, Hauke S, Nurnberger A, Schlechtweg S (2006) MultiLexExplorer – combining multilingual web search with multilingual lexical resources. In: Proceedings of Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation of Robust Spoken Dialogue Systems part of ECAI 2006, p 17–21
Maisonnasse L (2008) Les supports de vocabulaires pour les systèmes de recherche d’information orientés précision : application aux graphes pour la recherche d’information médicale. Ph.D. Thesis, Joseph Fourier University, Grenoble I, France
Nie JY (1998) TREC-7 CLIR using a probabilistic translation model. In: Proceedings of the 7th Text REtrieval Conference (TREC), p 482–488
Nie JY (1999) CLIR using a probabilistic translation model based on web documents. In: Proceedings of the 8th Text REtrieval Conference (TREC). http://trec.nist.gov/pubs/trec8/papers/TREC8-Nie.pdf
Oard DW, He D, Wang J (2008) User-assisted query translation for interactive cross-language information retrieval. Inf Process Manag 44(1):181–211
Ogden WC, Davis MW (2000) Improving cross-language text retrieval with human interactions. In: proceedings of the 33rd Hawaii international conference on system sciences. Washington, DC, USA, p 3044
Papineni K, Roukos S, Ward T, Zhu W J (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual meeting of the Association for Computational Linguistics. p 311–318
Petrelli D, Beaulieu M, Sanderson M, Demetriou G, Herring P, Hansen P (2004) Observing users, designing clarity: a case study on the user-centered design of a cross-language information retrieval system. J Am Soc Inf Sci Technol 55(10):923–934
Shinnou H, Sasaki M (2003) Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the em algorithm. Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003:41–48
Smadja F, Mckeown KR, Hatzivassiloglou V (1996) Translating collocations for bilingual lexicons: a statistical approach. Comput Linguist 22:1–38
Soudani N, Bounhas I, Elayeb B, Slimani Y (2014a) Toward an Arabic ontology for Arabic word sense disambiguation based on normalized dictionaries. In: Meersman R et al. (eds) Proceedings of the 13th International Conference on Ontologies, DataBases, and Applications of Semantics. Springer-Verlag Berlin Heidelberg, LNCS 8842, Amantea, Italy, p 655–658
Soudani N, Bounhas I, Elayeb B, Slimani Y (2014b) An LMF-based normalization approach of Arabic Islamic dictionaries for arabic word sense disambiguation: application on hadith. In: Proceedings of the 2nd International Conference on Islamic Applications in Computer Science and Technologies. Amman, Jordan
Soudani N, Bounhas I, Elayeb B, Slimani Y (2014c) Generic normalization approach of Arabic dictionaries for Arabic word sense disambiguation. In: Proceedings of Cinquième Journées Francophones sur les Ontologies (JFO). Hammamet, Tunisia, pp 309–315
Ture F, Boschee E (2014) Learning to translate: a query-specific combination approach for cross-lingual information retrieval. In: Proceedings of the 2014 Conference on empirical methods in natural language processing. Doha, Qatar, p 589–599
Vossen P (ed) (1998) EuroWordNet: a multilingual database with lexical semantic networks. Kluwer Academic Publishers, Norwell
Xu J, Weischedel R (2000) Cross-lingual information retrieval using Hidden Markov models. In: Proceedings of the 2000 Joint SIGDAT conference on empirical methods in natural language processing and very large corpora. Hong Kong, China, p 95–103
Xu J, Fraser A, Makhoul J, Noamany M, Osman G (2001) UN Arabic English parallel text version 1.0 beta [CD-ROM]. Philadelphia: University of Pennsylvania, Linguistic Data Consortium
Xu J, Fraser A, Weischedel R (2002) TREC 2001: cross-lingual retrieval at BBN. In: Proceedings of the 10th Text REtrieval Conference. NIST Special Publication 500–250. Gaithersburg, Maryland, p 68–77
Xun E (1999) Incremental english parsing using combination of statistic and learning methods. Ph.D. Thesis, Harbin Institute of Technology, China
Xun E, Zhou M, Huang C (2000) A unified statistical model for the identification of English base NP. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics. Hong Kong
Yamada K (2001) Probability-possibility transformation based on evidence theory. In: Proceedings of the 9th International Fuzzy Systems Association World Congress, vol 1, pp 70–75
Zadeh L (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28
Zouaghi A, Merhbene L, Zrigui M (2012) Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38(4):257–269
Acknowledgements
Sincere thanks to the anonymous reviewers for their constructive comments, which significantly enhanced the quality of this manuscript during the reviewing process. The authors wish to thank Muna Shdaifat, who revised the paper and improved its English. We are also grateful to the Evaluations and Language resources Distribution Agency (ELDA), which kindly provided us with the document collections of the CLEF-2003 campaign.
Appendix: The naïve Bayesian approach for NP translation
Naïve Bayesian disambiguation translation (NBDT) approaches are inspired by the Bayes rule. In these techniques, the input variables are assumed to be independent. Despite their simplicity, NBDT approaches can often be more efficient than some recognized query translation (QT) ones [55, 56]. In addition, a given NBDT method can be viewed as a Bayesian network in which the predictive source query terms are assumed to be conditionally independent given the translation of each one [11, 20, 23].
Let us consider a French NP modelled as the vector FNP = {f_1, …, f_n}, with its NP pattern FPT. Using the bilingual dictionary, we look up all the available English translations of each French term f_i in FNP. We also take advantage of all the available translation patterns EPT for FPT. Then, the best English translated phrase, ENP* = {e_1, …, e_m}, is the one that maximizes formula (23).
Using the Bayes rule, this maximization is expressed in terms of P(FNP|ENP), the translation probability, and P(ENP), the a priori probability of the words of the translated English NP.
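Read together with these definitions, formulas (23) and (24) correspond to the usual maximum a posteriori formulation. A sketch in that standard notation follows; the exact layout and numbering are assumptions inferred from the surrounding text rather than a quotation:

```latex
% Assumed reconstruction: MAP selection of the English NP, then Bayes' rule.
\begin{align}
ENP^{*} &= \arg\max_{ENP} P(ENP \mid FNP) \tag{23}\\
        &= \arg\max_{ENP} \frac{P(FNP \mid ENP)\, P(ENP)}{P(FNP)} \tag{24}
\end{align}
```

The denominator P(FNP) is the same for every candidate ENP, so it does not affect which candidate attains the maximum.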
When calculating the maximum posterior probability of a translation, we can ignore the normalizing factor P(f_1, f_2, …, f_n), since it does not depend on the translation e_j. We estimate the most important factor in formula (24), namely P(f_1, f_2, …, f_n | e_j), using training data. We can decompose the likelihood into a product of terms, as given in formula (25), because naïve Bayes supposes that the conditional probabilities of source query terms are statistically independent.
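The decomposition in formula (25) amounts to replacing the joint likelihood P(f_1, …, f_n | e_j) with the product of the individual P(f_i | e_j). The following is a minimal Python sketch of that selection rule, not the authors' implementation: the probability tables, the FLOOR constant and the function names are hypothetical, and log-probabilities are used only to avoid numerical underflow.

```python
import math

# Hypothetical toy estimates of P(f_i | e_j); in practice they come from training data.
LIKELIHOOD = {
    "bank": {"banque": 0.6, "argent": 0.3, "rive": 0.01},
    "shore": {"banque": 0.05, "argent": 0.01, "rive": 0.7},
}
# Hypothetical priors P(e_j).
PRIOR = {"bank": 0.7, "shore": 0.3}
# Assumed floor probability for (f_i, e_j) pairs never seen in training.
FLOOR = 1e-6


def log_posterior(candidate, source_terms):
    """Unnormalised log-posterior: log P(e_j) + sum_i log P(f_i | e_j)."""
    score = math.log(PRIOR[candidate])
    for f in source_terms:
        score += math.log(LIKELIHOOD[candidate].get(f, FLOOR))
    return score


def best_translation(candidates, source_terms):
    """Return the candidate with the highest unnormalised posterior."""
    return max(candidates, key=lambda e: log_posterior(e, source_terms))
```

With these toy values, `best_translation(["bank", "shore"], ["banque", "argent"])` selects `"bank"`, whereas the single source term `"rive"` selects `"shore"`.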
An NP (FNP or ENP) is treated as a set of terms (F or E) grouped by an NP pattern (FPT or EPT). We suppose that the translation of terms and the translation of NP patterns are independent (formula (26)), so that P(FNP|ENP) factorizes into P(F|E), the translation probability from the English words E in ENP to the French words F in FNP, and P(FPT|EPT), the probability of the translation pattern FPT given the English pattern EPT. Substituting formula (26) into formula (23) gives formula (27).
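Written out under the independence assumption just described, formulas (26) and (27) can be sketched as follows; the symbols are those of the text, while the arrangement and numbering are inferred:

```latex
% Assumed reconstruction: terms and patterns are translated independently,
% then the factorized likelihood is substituted into the MAP criterion.
\begin{align}
P(FNP \mid ENP) &= P(F, FPT \mid E, EPT) = P(F \mid E)\, P(FPT \mid EPT) \tag{26}\\
ENP^{*} &= \arg\max_{ENP} P(F \mid E)\, P(FPT \mid EPT)\, P(ENP) \tag{27}
\end{align}
```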
These probabilities are estimated from occurrence counts (formula (28)), where Occ(EPT) is the number of occurrences of EPT in the English portion of the aligned bilingual corpus, and Occ(FPT, EPT) is the number of times FPT corresponds to EPT in the aligned sentences.
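Given these two counts, the natural relative-frequency estimate of the pattern translation probability is Occ(FPT, EPT) / Occ(EPT). The Python sketch below illustrates that estimate under a simplifying assumption that is not stated in the text: every occurrence of an English pattern is aligned with exactly one French pattern, so Occ(EPT) can be obtained from the aligned pairs themselves. The function name and the pattern strings are hypothetical.

```python
from collections import Counter


def pattern_translation_probs(aligned_patterns):
    """Estimate P(FPT | EPT) as Occ(FPT, EPT) / Occ(EPT).

    aligned_patterns: iterable of (FPT, EPT) pairs observed in aligned sentences.
    Assumption: each EPT occurrence appears in exactly one pair.
    """
    pairs = list(aligned_patterns)
    pair_counts = Counter(pairs)                    # Occ(FPT, EPT)
    ept_counts = Counter(ept for _, ept in pairs)   # Occ(EPT)
    return {
        (fpt, ept): count / ept_counts[ept]
        for (fpt, ept), count in pair_counts.items()
    }
```

For a fixed English pattern, the estimates over all French patterns aligned with it sum to one, which is a quick sanity check on any count table built this way.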
P(ENP) is determined by the English trigram language model, as given in formula (29).
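A trigram language model scores a phrase as the product of the probabilities of each word given the two preceding words. The sketch below is a toy Python illustration of that idea, not the model used in the paper: the boundary markers `<s>` and `</s>`, the add-alpha smoothing and the function names are all assumptions introduced here.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    tri, bi = Counter(), Counter()
    for words in sentences:
        # Assumed padding: two start markers and one end marker per sentence.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            bi[context] += 1
            tri[context + (padded[i],)] += 1
    return tri, bi


def trigram_prob(phrase, tri, bi, vocab_size, alpha=1.0):
    """P(phrase) as a product of add-alpha smoothed P(e_i | e_{i-2}, e_{i-1})."""
    padded = ["<s>", "<s>"] + list(phrase) + ["</s>"]
    prob = 1.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        numerator = tri[context + (padded[i],)] + alpha
        denominator = bi[context] + alpha * vocab_size
        prob *= numerator / denominator
    return prob
```

Trained on a corpus where "information retrieval" is more frequent than "retrieval information", the model assigns the former word order a higher probability, which is how the language model helps pick a fluent English NP among candidates.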
Unfortunately, the probabilistic approaches rely on a more complex NP identification process [100, 101]. Firstly, base NPs are selected and translated with high accuracy. Secondly, complex NPs are identified and translated with lower accuracy than the base ones. To overcome this limitation, reliable complex NPs are chosen in the second step using only a small set of syntactic patterns. For more details and discussion of the probabilistic model for query translation applied to English-Chinese CLIR, we refer to [56].
Cite this article
Elayeb, B., Romdhane, W.B. & Saoud, N.B.B. Towards a new possibilistic query translation tool for cross-language information retrieval. Multimed Tools Appl 77, 2423–2465 (2018). https://doi.org/10.1007/s11042-017-4398-2
DOI: https://doi.org/10.1007/s11042-017-4398-2