Abstract
The goal of rank fusion in information retrieval (IR) is to produce a single output list from multiple search results. Improving performance by combining the outputs of various IR systems is a challenging task. A central difficulty is that relevance estimation involves many non-obvious factors, which induce nonlinear interrelations in the data. The ability to model complex dependency relationships between random variables has become increasingly popular in information retrieval, and the need to explore these dependencies further for data fusion has recently been acknowledged. Copulas provide a framework to separate the dependence structure from the margins. Inspired by the theory of copulas, we propose a new unsupervised, dynamic, nonlinear rank fusion method based on a nested composition of non-algebraic function pairs. The dependence structure of the model is tailored on a per-query basis by leveraging query-document correlations. We experimented with three topic sets over CLEF corpora, fusing three and six retrieval systems, and compared our method against the CombMNZ technique and other nonlinear unsupervised strategies. The experiments show that our fusion approach improves performance under explicit conditions, providing insight into the circumstances under which linear fusion techniques perform comparably to nonlinear methods.
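The abstract names two ingredients that a short example can make concrete: the CombMNZ baseline and copula-based fusion built from nested compositions of generator and inverse-generator pairs. The Python sketch below is illustrative only and is not the authors' method. It assumes min-max score normalisation, implements CombMNZ in its standard form (sum of normalised scores multiplied by the number of systems that retrieved the document), and uses a two-level nested Gumbel copula, with generator psi(t) = exp(-t^(1/theta)) and inverse psi^(-1)(u) = (-ln u)^theta, as one hypothetical instance of a nested Archimedean dependence structure. The helper names, the grouping of runs, the parameter values, the epsilon clipping of missing documents and the Kendall's tau to theta conversion (one possible way to tailor theta per query from observed correlations) are all assumptions.

```python
import math
from collections import defaultdict

def min_max_normalize(run):
    """Min-max normalise one system's {doc_id: score} map to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:  # all scores equal: treat every document as top-scored
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}

def comb_mnz(runs):
    """CombMNZ: sum of normalised scores times the number of runs that retrieved the doc."""
    total, hits = defaultdict(float), defaultdict(int)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            total[doc] += s
            hits[doc] += 1
    return {doc: total[doc] * hits[doc] for doc in total}

def gumbel(u, theta):
    """Gumbel copula C(u) = psi(sum_i psi^{-1}(u_i)), psi(t) = exp(-t**(1/theta)), theta >= 1."""
    # psi^{-1}(x) = (-ln x)**theta; abs() avoids -0.0 at x = 1 since ln x <= 0 on (0, 1].
    t = sum(abs(math.log(x)) ** theta for x in u)
    return math.exp(-(t ** (1.0 / theta)))

def nested_gumbel(groups, theta_outer):
    """Two-level nested Gumbel copula.

    groups: list of (u_values, theta_inner). Each inner copula is evaluated first and
    the results are joined by an outer Gumbel copula. The sufficient nesting condition
    1 <= theta_outer <= theta_inner must hold for a valid copula.
    """
    if theta_outer < 1 or any(theta_outer > th for _, th in groups):
        raise ValueError("nesting condition 1 <= theta_outer <= theta_inner violated")
    return gumbel([gumbel(u, th) for u, th in groups], theta_outer)

def gumbel_theta_from_tau(tau):
    """Gumbel parameter matching Kendall's tau in [0, 1): theta = 1 / (1 - tau)."""
    return 1.0 / (1.0 - tau)

def copula_fuse(runs, grouping, thetas, theta_outer, eps=1e-6):
    """Score each document by a nested Gumbel copula of its normalised per-run scores.

    grouping: index lists partitioning the runs, e.g. [[0, 1], [2]] (hypothetical).
    Missing documents and zero scores are clipped to eps so ln(u) stays finite.
    """
    norm = [min_max_normalize(r) for r in runs]
    docs = set().union(*norm)
    fused = {}
    for doc in docs:
        u = [min(max(n.get(doc, 0.0), eps), 1.0) for n in norm]
        groups = [([u[i] for i in idx], th) for idx, th in zip(grouping, thetas)]
        fused[doc] = nested_gumbel(groups, theta_outer)
    return fused

def ranking(scores):
    """Document ids sorted by descending fused score."""
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    runs = [
        {"d1": 4.0, "d2": 2.0, "d3": 0.0},
        {"d1": 8.0, "d3": 4.0, "d4": 0.0},
        {"d2": 10.0, "d1": 6.0, "d4": 2.0},
    ]
    print(ranking(comb_mnz(runs)))
    print(ranking(copula_fuse(runs, [[0, 1], [2]], [3.0, 1.5], 1.5)))
```

On the toy three-run example, both fusion functions rank the documents d1, d2, d3, d4 under these hypothetical settings. The nested form enforces 1 <= theta_outer <= theta_inner, the standard sufficient condition for nested Gumbel copulas; a singleton group simply returns its own value, so systems can be left ungrouped.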
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Cross Language Evaluation Forum (http://www.clef-campaign.org/)
References
Arampatzis, A., & Robertson, S. (2011). Modeling score distributions in information retrieval. Information Retrieval, 14(1), 26–46. https://doi.org/10.1007/s10791-010-9145-5
Bailey, P., Moffat, A., Scholer, F., & Thomas, P. (2017). Retrieval consistency in the presence of query variations. In Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval. SIGIR ’17 (pp. 395–404). Association for Computing Machinery. https://doi.org/10.1145/3077136.3080839
Canalle, G. K., Salgado, A. C., & Loscio, B. F. (2021). A survey on data fusion: what for? in what form? what is next? Journal of Intelligent Information Systems, 57(1), 25–50. https://doi.org/10.1007/s10844-020-00627-4
Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In Proceedings of the 32nd International ACM SIGIR conference on research and development in information retrieval. SIGIR ’09 (pp. 758–759). Association for Computing Machinery. https://doi.org/10.1145/1571941.1572114
Cummins, R. (2011). Measuring the ability of score distributions to model relevance. In M. V. M. Salem, K. Shaalan, F. Oroumchian, A. Shakery, & H. Khelalfa (Eds.). Springer. https://doi.org/10.1007/978-3-642-25631-8_3
Dai, Z., & Callan, J. (2020). Context-aware term weighting for first stage passage retrieval. In Proceedings of the 43rd International ACM SIGIR conference on research and development in information retrieval. SIGIR ’20 (pp. 1533–1536). Association for Computing Machinery. https://doi.org/10.1145/3397271.3401204
Eickhoff, C., & de Vries, A. P. (2014). Modelling complex relevance spaces with copulas. In Proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 1831–1834). ACM. https://doi.org/10.1145/2661829.2661925
Eickhoff, C., de Vries, A. P., & Collins-Thompson, K. (2013). Copulas for information retrieval. In Proceedings of the 36th International ACM SIGIR conference on research and development in information retrieval (pp. 663–672). ACM. https://doi.org/10.1145/2484028.2484066
Fox, E. A., & Shaw, J. A. (1994). Combination of multiple searches. NIST Special Publication, 243.
Frank Hsu, D., & Taksa, I. (2005). Comparing rank and score combination methods for data fusion in information retrieval. Information Retrieval, 8(3), 449–480. https://doi.org/10.1007/s10791-005-6994-4
Górecki, J., Hofert, M., & Holena, M. (2016). An approach to structure determination and estimation of hierarchical Archimedean copulas and its application to Bayesian classification. Journal of Intelligent Information Systems, 46(1), 21–59. https://doi.org/10.1007/s10844-014-0350-3
Hofert, M., Maechler, M., & McNeil, A. J. (2012). Estimators for Archimedean copulas in high dimensions. arXiv preprint arXiv:1207.1708. https://doi.org/10.48550/arXiv.1207.1708
Hofert, M., & Scherer, M. (2011). CDO pricing with nested Archimedean copulas. Quantitative Finance, 11(5), 775–787. https://doi.org/10.1080/14697680903508479
Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman & Hall.
Komatsuda, T., Keyaki, A., & Miyazaki, J. (2016). A score fusion method using a mixture copula. In International conference on database and expert systems applications (Vol. 9828, pp. 216–232). Springer. https://doi.org/10.1007/978-3-319-44406-2_16
Lillis, D., Zhang, L., Toolan, F., Collier, R. W., Leonard, D., & Dunnion, J. (2010). Estimating probabilities for effective data fusion. In Proceedings of the 33rd International ACM SIGIR conference on research and development in information retrieval. SIGIR ’10 (pp. 347–354). Association for Computing Machinery. https://doi.org/10.1145/1835449.1835508
Losada, D. E., Parapar, J., & Barreiro, A. (2018). A rank fusion approach based on score distributions for prioritizing relevance assessments in information retrieval evaluation. Information Fusion, 39, 56–71. https://doi.org/10.1016/j.inffus.2017.04.001
Mallia, A., Khattab, O., Suel, T., & Tonellotto, N. (2021). Learning passage impacts for inverted indexes. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’21 (pp. 1723–1727). Association for Computing Machinery. https://doi.org/10.1145/3404835.3463030
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. USA: Cambridge University Press.
McNeil, A. J. (2008). Sampling nested Archimedean copulas. Journal of Statistical Computation and Simulation, 78(6), 567–581.
McNeil, A. J., & Nešlehová, J. (2009). Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. The Annals of Statistics, 37(5B), 3059–3097. https://doi.org/10.1214/07-AOS556
Mitra, B., & Craswell, N. (2018). An introduction to neural information retrieval. Foundations and Trends® in Information Retrieval, 13(1), 1–126. https://doi.org/10.1561/1500000061
Mourão, A., Martins, F., & Magalhães, J. (2014). Inverse square rank fusion for multimodal search. In 2014 12th international workshop on content-based multimedia indexing (CBMI) (pp. 1–6). https://doi.org/10.1109/CBMI.2014.6849825
Nelsen, R. B. (2006). An Introduction to Copulas (Springer Series in Statistics). Berlin: Springer.
Robertson, S., Kanoulas, E., & Yilmaz, E. (2013). Modelling score distributions without actual scores. In Proceedings of the 2013 Conference on the Theory of Information Retrieval. ICTIR ’13 (pp. 85–92). ACM. https://doi.org/10.1145/2499178.2499181
Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229–231.
Vogt, C. C., & Cottrell, G. W. (1999). Fusion via a linear combination of scores. Information Retrieval, 1(3), 151–173. https://doi.org/10.1023/A:1009980820262
Wu, S., & Crestani, F. (2015). A geometric framework for data fusion in information retrieval. Information Systems, 50, 20–35. https://doi.org/10.1016/j.is.2015.01.001
Wu, S., & McClean, S. (2006). Performance prediction of data fusion for information retrieval. Information Processing & Management, 42(4), 899–915. https://doi.org/10.1016/j.ipm.2005.08.004
Zhai, C., & Lafferty, J. (2001). A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’01 (pp. 334–342). Association for Computing Machinery. https://doi.org/10.1145/383952.384019
Acknowledgements
The authors acknowledge the support of CONACYT.
Funding
This research was partially supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT), México, through scholarship numbers:
∙ 296232: F.C. Fernández-Reyes.
∙ 29943: E. Morales-González.
Author information
Contributions
The following details each author’s contribution:
∙ Conceptualization of work: J. Hermosillo-Valadez, F.C. Fernández-Reyes and M. Montes-y-Gómez.
∙ Methodology: J. Hermosillo-Valadez, E. Morales-González and F.C. Fernández-Reyes.
∙ Formal analysis: J. Hermosillo-Valadez.
∙ Algorithm coding and experimental runs: E. Morales-González and F.C. Fernández-Reyes.
∙ Experimental validation: J. Hermosillo-Valadez, E. Morales-González and M. Montes-y-Gómez.
∙ Data collection and curation: F.C. Fernández-Reyes and E. Morales-González.
∙ Material and software resources: J. Hermosillo-Valadez, J. Fuentes-Pacheco and J.M. Rendón-Mancha.
∙ Writing–original draft preparation: J. Hermosillo-Valadez, E. Morales-González, F.C. Fernández-Reyes and M. Montes-y-Gómez.
∙ Writing–review and editing: J. Hermosillo-Valadez, M. Montes-y-Gómez and J.M. Rendón-Mancha.
∙ Graphics and visualization: J. Hermosillo-Valadez and E. Morales-González.
∙ Supervision: J. Hermosillo-Valadez.
Ethics declarations
Ethics approval and consent to participate
All authors made substantial contributions to the work, revised it critically for important intellectual content, approved the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Consent for Publication
All authors approved the final version of the manuscript and its submission to the Journal of Intelligent Information Systems. All authors obtained consent from the responsible authorities at the Universidad Autónoma del Estado de Morelos, where the work was carried out.
Human and Animal Ethics
Not applicable.
Competing interests
We have no conflicts of interest to disclose. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hermosillo-Valadez, J., Morales-González, E., Fernández-Reyes, F.C. et al. Exploiting hierarchical dependence structures for unsupervised rank fusion in information retrieval. J Intell Inf Syst 60, 853–876 (2023). https://doi.org/10.1007/s10844-022-00751-3
DOI: https://doi.org/10.1007/s10844-022-00751-3