Exploiting hierarchical dependence structures for unsupervised rank fusion in information retrieval

Hermosillo-Valadez, Jorge; Morales-González, Eliseo; Fernández-Reyes, Francis C.; Montes-y-Gómez, Manuel; Fuentes-Pacheco, Jorge; Rendón-Mancha, Juan M.

doi:10.1007/s10844-022-00751-3

Published: 18 October 2022

Journal of Intelligent Information Systems, volume 60, pages 853–876 (2023)

Jorge Hermosillo-Valadez¹,
Eliseo Morales-González¹,
Francis C. Fernández-Reyes¹,
Manuel Montes-y-Gómez²,
Jorge Fuentes-Pacheco¹ &
Juan M. Rendón-Mancha¹

Abstract

The goal of rank fusion in information retrieval (IR) is to deliver a single output list from multiple search results. Improving performance by combining the outputs of various IR systems is a challenging task. A central point is the fact that many non-obvious factors are involved in the estimation of relevance, inducing nonlinear interrelations between the data. The ability to model complex dependency relationships between random variables has become increasingly popular in the realm of information retrieval, and the need to further explore these dependencies for data fusion has been recently acknowledged. Copulas provide a framework to separate the dependence structure from the margins. Inspired by the theory of copulas, we propose a new unsupervised, dynamic, nonlinear, rank fusion method, based on a nested composition of non-algebraic function pairs. The dependence structure of the model is tailored by leveraging query-document correlations on a per-query basis. We experimented with three topic sets over CLEF corpora fusing 3 and 6 retrieval systems, comparing our method against the CombMNZ technique and other nonlinear unsupervised strategies. The experiments show that our fusion approach improves performance under explicit conditions, providing insight about the circumstances under which linear fusion techniques have comparable performance to nonlinear methods.
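For readers unfamiliar with the CombMNZ baseline named in the abstract, the following is a minimal sketch of that linear fusion technique, not of the authors' copula-inspired method. CombMNZ scores a document as the sum of its normalized scores across systems, multiplied by the number of systems that retrieved it. The min-max normalization, the function names, and the toy runs are assumptions made for illustration; the preprocessing actually used in the paper is not shown on this page.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as top-scored.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(runs):
    """Fuse several {doc_id: score} runs with CombMNZ.

    fused(d) = (number of runs containing d) * (sum of normalized scores of d)
    """
    total = defaultdict(float)  # sum of normalized scores per document
    hits = defaultdict(int)     # number of runs that retrieved the document
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical output of three retrieval systems for one query.
runs = [
    {"d1": 12.0, "d2": 9.0, "d3": 3.0},
    {"d2": 0.8, "d3": 0.6, "d4": 0.2},
    {"d1": 5.0, "d2": 4.0, "d4": 1.0},
]
ranking = comb_mnz(runs)
for doc, score in ranking:
    print(doc, round(score, 3))
```

In this toy input, d2 is retrieved by all three systems and therefore receives the largest multiplier, which is the behavior that makes CombMNZ favor documents on which systems agree.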

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

Cross Language Evaluation Forum (http://www.clef-campaign.org/)

Acknowledgements

The authors acknowledge the support of CONACYT.

Funding

This research was partially supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT), México, through the following scholarships:

∙ 296232: F.C. Fernández-Reyes.

∙ 29943: E. Morales-González.

Author information

Authors and Affiliations

Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Cuernavaca, 62209, Morelos, México
Jorge Hermosillo-Valadez, Eliseo Morales-González, Francis C. Fernández-Reyes, Jorge Fuentes-Pacheco & Juan M. Rendón-Mancha
Laboratorio de Tecnologías del Lenguaje, Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, Santa María Tonantzintla, 72840, Puebla, México
Manuel Montes-y-Gómez

Contributions

The following details each author’s contribution:

∙ Conceptualization of work. J. Hermosillo-Valadez, F.C. Fernández-Reyes, and M. Montes-y-Gómez.

∙ Methodology. J. Hermosillo-Valadez, E. Morales-González, and F.C. Fernández-Reyes.

∙ Formal analysis. J. Hermosillo-Valadez.

∙ Algorithm coding and experimental runs. E. Morales-González and F.C. Fernández-Reyes.

∙ Experimental validation. J. Hermosillo-Valadez, E. Morales-González, and M. Montes-y-Gómez.

∙ Data collection and curation. F.C. Fernández-Reyes and E. Morales-González.

∙ Material and software resources. J. Hermosillo-Valadez, J. Fuentes-Pacheco, and J.M. Rendón-Mancha.

∙ Writing, original draft preparation. J. Hermosillo-Valadez, E. Morales-González, F.C. Fernández-Reyes, and M. Montes-y-Gómez.

∙ Writing, review and editing. J. Hermosillo-Valadez, M. Montes-y-Gómez, and J.M. Rendón-Mancha.

∙ Graphics and visualization. J. Hermosillo-Valadez and E. Morales-González.

∙ Supervision. J. Hermosillo-Valadez.

Corresponding author

Correspondence to Jorge Hermosillo-Valadez.

Ethics declarations

Ethics approval and consent to participate

All authors made substantial contributions to the work, revised it critically for important intellectual content, approved the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Consent for Publication

All authors approved the final version of the manuscript and its submission to the Journal of Intelligent Information Systems. All authors have the consent of the responsible authorities at the Universidad Autónoma del Estado de Morelos, where the work was carried out.

Human and Animal Ethics

Not Applicable.

Competing interests

We have no conflicts of interest to disclose. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hermosillo-Valadez, J., Morales-González, E., Fernández-Reyes, F.C. et al. Exploiting hierarchical dependence structures for unsupervised rank fusion in information retrieval. J Intell Inf Syst 60, 853–876 (2023). https://doi.org/10.1007/s10844-022-00751-3

Received: 04 August 2022
Revised: 21 September 2022
Accepted: 22 September 2022
Published: 18 October 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10844-022-00751-3
