
Exploiting hierarchical dependence structures for unsupervised rank fusion in information retrieval

Journal of Intelligent Information Systems

Abstract

The goal of rank fusion in information retrieval (IR) is to deliver a single output list from multiple search results. Improving performance by combining the outputs of various IR systems is a challenging task. A central difficulty is that many non-obvious factors are involved in estimating relevance, inducing nonlinear interrelations in the data. Modeling complex dependence relationships between random variables has become increasingly popular in information retrieval, and the need to further explore these dependencies for data fusion has recently been acknowledged. Copulas provide a framework for separating the dependence structure from the marginal distributions. Inspired by the theory of copulas, we propose a new unsupervised, dynamic, nonlinear rank fusion method based on a nested composition of non-algebraic function pairs. The dependence structure of the model is tailored on a per-query basis by leveraging query-document correlations. We experimented with three topic sets over CLEF corpora, fusing three and six retrieval systems, and compared our method against the CombMNZ technique and other nonlinear unsupervised strategies. The experiments show that our fusion approach improves performance under explicit conditions, and they provide insight into the circumstances under which linear fusion techniques perform comparably to nonlinear methods.
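The abstract contrasts a linear baseline (CombMNZ) with a copula-inspired nonlinear fusion built from a nested composition of function pairs. One plausible reading, which is an assumption and not confirmed by this page, is a nested Archimedean copula, where each level pairs a generator with its inverse. The Python sketch below is illustrative only and is not the authors' method: it implements CombMNZ over min-max normalized scores and, as a stand-in for a hierarchical dependence structure, a two-level nested Gumbel copula used as a fusion operator. The Gumbel family, the min-max normalization, the grouping of systems, the θ values, the floor `eps` for unretrieved documents and all function names (`comb_mnz`, `gumbel`, `nested_gumbel`, `copula_fuse`, `ranking`) are hypothetical choices made for the example.

```python
import math
from collections import defaultdict

# A run maps document ids to retrieval scores (higher = more relevant).
Run = dict[str, float]


def min_max(run: Run) -> Run:
    """Rescale one run's scores to [0, 1] (assumed normalization step)."""
    lo, hi = min(run.values()), max(run.values())
    span = hi - lo
    return {d: (s - lo) / span if span > 0 else 1.0 for d, s in run.items()}


def comb_mnz(runs: list[Run]) -> Run:
    """CombMNZ: sum of normalized scores times the number of runs retrieving the doc."""
    total: dict[str, float] = defaultdict(float)
    hits: dict[str, int] = defaultdict(int)
    for run in map(min_max, runs):
        for d, s in run.items():
            total[d] += s
            hits[d] += 1
    return {d: total[d] * hits[d] for d in total}


def gumbel(us: list[float], theta: float) -> float:
    """Gumbel copula C(u) = psi(sum_i psi^{-1}(u_i)), with
    psi^{-1}(t) = (-ln t)^theta and psi(s) = exp(-s^(1/theta)), theta >= 1.
    theta = 1 gives the independence copula (product of the u_i)."""
    s = sum((-math.log(u)) ** theta for u in us)
    return math.exp(-(s ** (1.0 / theta)))


def nested_gumbel(us: list[float], groups: list[list[int]],
                  inner_thetas: list[float], outer_theta: float) -> float:
    """Two-level nested Archimedean copula: an outer Gumbel copula applied to
    inner Gumbel copulas, one per (hypothetical) group of systems."""
    if len(groups) != len(inner_thetas):
        raise ValueError("need one inner theta per group")
    # Sufficient nesting condition within the Gumbel family: inner >= outer >= 1.
    if outer_theta < 1 or any(t < outer_theta for t in inner_thetas):
        raise ValueError("require inner thetas >= outer theta >= 1")
    inner = [gumbel([us[i] for i in g], t) for g, t in zip(groups, inner_thetas)]
    return gumbel(inner, outer_theta)


def copula_fuse(runs: list[Run], groups: list[list[int]],
                inner_thetas: list[float], outer_theta: float,
                eps: float = 1e-6) -> Run:
    """Score each document by the nested copula at its normalized scores."""
    norm = [min_max(r) for r in runs]
    docs = set().union(*norm)
    fused = {}
    for d in docs:
        # Unretrieved documents and zero scores are floored at eps so log() stays finite.
        us = [min(max(r.get(d, 0.0), eps), 1.0) for r in norm]
        fused[d] = nested_gumbel(us, groups, inner_thetas, outer_theta)
    return fused


def ranking(scores: Run) -> list[str]:
    """Document ids sorted by fused score, best first (ties broken by id)."""
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    runs = [
        {"d1": 3.0, "d2": 1.0, "d3": 2.0},
        {"d1": 0.9, "d2": 0.5},
        {"d2": 10.0, "d3": 4.0},
    ]
    print(ranking(comb_mnz(runs)))  # ['d1', 'd2', 'd3']
    # Systems 0 and 1 form one (assumed) dependent group; system 2 stands alone.
    print(ranking(copula_fuse(runs, [[0, 1], [2]], [2.0, 1.5], 1.5)))
```

Because any copula is bounded above by the minimum of its arguments, this operator behaves like a soft "AND": a document missing from one run is pushed towards the bottom, which makes the `eps` floor a consequential assumption. Raising θ at a level strengthens the modelled dependence among the systems grouped there, while θ = 1 treats them as independent.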

Data Availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Cross Language Evaluation Forum (http://www.clef-campaign.org/)

  2. Cross Language Evaluation Forum (http://www.clef-campaign.org/)

Acknowledgements

The authors acknowledge the support of CONACYT.

Funding

This research was partially supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT), México, through scholarships with the following numbers:

∙ 296232: F.C. Fernández-Reyes.

∙ 29943: E. Morales-González.

Author information

Contributions

The following details each author’s contribution:

Conceptualization of work. J. Hermosillo-Valadez, F.C. Fernández-Reyes and M. Montes-y-Gómez.

Methodology. J. Hermosillo-Valadez, E. Morales-González and F.C. Fernández-Reyes.

Formal analysis. J. Hermosillo-Valadez.

Algorithm coding and experimental runs. E. Morales-González and F.C. Fernández-Reyes.

Experimental validation. J. Hermosillo-Valadez, E. Morales-González and M. Montes-y-Gómez.

Data collection and curation. F.C. Fernández-Reyes and E. Morales-González.

Material and software resources. J. Hermosillo-Valadez, J. Fuentes-Pacheco and J.M. Rendón-Mancha.

Writing–original draft preparation. J. Hermosillo-Valadez, E. Morales-González, F.C. Fernández-Reyes and M. Montes-y-Gómez.

Writing–review and editing. J. Hermosillo-Valadez, M. Montes-y-Gómez and J.M. Rendón-Mancha.

Graphics and visualization. J. Hermosillo-Valadez and E. Morales-González.

Supervision. J. Hermosillo-Valadez.

Corresponding author

Correspondence to Jorge Hermosillo-Valadez.

Ethics declarations

Ethics approval and consent to participate

All authors made substantial contributions to the work, revised it critically for important intellectual content, approved the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Consent for Publication

All authors have approved the final version of the manuscript and its submission to the Journal of Intelligent Information Systems. All authors have obtained consent from the responsible authorities at the Universidad Autónoma del Estado de Morelos, where the work was carried out.

Human and Animal Ethics

Not applicable.

Competing interests

We have no conflicts of interest to disclose. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hermosillo-Valadez, J., Morales-González, E., Fernández-Reyes, F.C. et al. Exploiting hierarchical dependence structures for unsupervised rank fusion in information retrieval. J Intell Inf Syst 60, 853–876 (2023). https://doi.org/10.1007/s10844-022-00751-3

  • DOI: https://doi.org/10.1007/s10844-022-00751-3
