WSRR: Weighted Rank-Relevance Sampling for Dense Text Retrieval

Hambarde, Kailash; Proença, Hugo

doi:10.1007/978-981-99-3758-5_22


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 719)

Included in the following conference series:

International Conference on Information and Communication Technology for Intelligent Systems

Abstract

As in many other domains built on the contrastive learning paradigm, negative sampling is a particularly sensitive problem when training dense text retrieval models. Existing techniques often suffer from uninformative or false negatives, which reduce the computational effectiveness of the learning phase and can even lower the probability that training converges. To address these limitations, this paper presents a new negative sampling approach for dense text retrieval, termed WRRS (Weighted Rank-Relevance Sampling). WRRS assigns sampling probabilities to negative samples based on their relevance scores and ranks, which consistently leads to improvements in retrieval performance. In this way, WRRS offers a solution to the uninformative and false negatives produced by traditional negative sampling techniques, which we see as a valuable contribution to the field. Our empirical evaluation was carried out against the AR2 baseline on two well-known datasets (NQ and MS Doc) and points to consistent improvements over state-of-the-art performance.
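The abstract does not give the weighting formula, so the following is only a minimal sketch of what "assigning probabilities to negatives based on relevance scores and ranks" could look like. The function names, the parameters `alpha` and `beta`, and the weight `exp(alpha * score) / rank ** beta` are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def rank_relevance_probs(scores, alpha=1.0, beta=1.0):
    """Turn candidate relevance scores into sampling probabilities.

    Hypothetical weighting (NOT the paper's formula): each candidate gets
    weight exp(alpha * score) / rank ** beta, where rank 1 is the candidate
    with the highest score. Weights are then normalized to sum to 1.
    """
    if not scores:
        return []
    # Rank candidates by descending score; rank 1 = highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    # Subtract the max score before exponentiating for numerical stability;
    # this rescales all weights equally and does not change the result.
    top = max(scores)
    weights = [
        math.exp(alpha * (s - top)) / (r ** beta)
        for s, r in zip(scores, ranks)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_negatives(candidates, scores, k, seed=None):
    """Draw k negatives (with replacement) using the probabilities above."""
    rng = random.Random(seed)
    probs = rank_relevance_probs(scores)
    return rng.choices(candidates, weights=probs, k=k)
```

In this sketch, higher-scored and higher-ranked candidates are drawn more often. A scheme aimed at avoiding false negatives might instead down-weight the very top of the ranking; the abstract does not say which choice WRRS makes.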

References

Karpukhin V, Oğuz B, Min S, Lewis P, Wu L, Edunov S, Chen D, Yih W-T (2020) Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906
Reimers N, Gurevych I (2019) Sentence-bert: sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084
Brickley D, Burgess M, Noy N (2019) Google dataset search: building a search engine for datasets in an open web ecosystem. In: The World Wide web conference, pp 1365–1375
Qu Y, Ding Y, Liu J, Liu K, Ren R, Zhao WX, Dong D, Wu H, Wang H (2020) RocketQA: an optimized training approach to dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2010.08191
Izacard G, Grave E (2020) Leveraging passage retrieval with generative models for open domain question answering. arXiv preprint arXiv:2007.01282
Luan Y, Eisenstein J, Toutanova K, Collins M (2021) Sparse, dense, and attentional representations for text retrieval. Trans Assoc Comput Linguistics 9:329–345
Xiong L, Xiong C, Li Y, Tang K-F, Liu J, Bennett P, Ahmed J, Overwijk A (2020) Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808
Zhang H, Gong Y, Shen Y, Lv J, Duan N, Chen W (2021) Adversarial retriever-ranker for dense text retrieval. arXiv preprint arXiv:2110.03611
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Zhan J, Mao J, Liu Y, Zhang M, Ma S (2020) Repbert: contextualized text embeddings for first-stage retrieval. arXiv preprint arXiv:2006.15498
Hong W, Zhang Z, Wang J, Zhao H (2022) Sentence-aware contrastive learning for open-domain passage retrieval. In: Proceedings of the 60th annual meeting of the association for computational linguistics, vol 1: Long Papers, pp 1062–1074
Ram O, Shachaf G, Levy O, Berant J, Globerson A (2021) Learning to retrieve passages without supervision. arXiv preprint arXiv:2112.07708
Zhou K, Zhang B, Zhao WX, Wen J-R (2022) Debiased contrastive learning of unsupervised sentence representations. arXiv preprint arXiv:2205.00656
Min S, Michael J, Hajishirzi H, Zettlemoyer L (2020) AmbigQA: answering ambiguous open-domain questions. arXiv preprint arXiv:2004.10645
Zhan J, Mao J, Liu Y, Guo J, Zhang M, Ma S (2021) Optimizing dense retrieval model training with hard negatives. In: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, pp 1503–1512
Ren R, Qu Y, Liu J, Zhao WX, She Q, Wu H, Wang H, Wen J-R (2021) Rocketqav2: a joint training method for dense passage retrieval and passage re-ranking. arXiv preprint arXiv:2110.07367
Lu Y, Liu Y, Liu J, Shi Y, Huang Z, Sun SFY, Tian H, et al (2022) Ernie-search: bridging cross-encoder with dual-encoder via self on-the-fly distillation for dense passage retrieval. arXiv preprint arXiv:2205.09153
Zhou J, Li X, Shang L, Luo L, Zhan K, Hu E, Zhang X, et al (2022) Hyperlink-induced pre-training for passage retrieval in open-domain question answering. arXiv preprint arXiv:2203.06942
Xu C, Guo D, Duan N, McAuley J (2022) Laprador: unsupervised pretrained dense retriever for zero-shot text retrieval. arXiv preprint arXiv:2203.06169
Mao K, Dou Z, Qian H (2022) Curriculum contrastive context denoising for few-shot conversational dense retrieval. In: Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pp 176–186
Hofstätter S, Lin S-C, Yang J-H, Lin J, Hanbury A (2021) Efficiently teaching an effective dense retriever with balanced topic aware sampling. In: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, pp 113–122
Lu J, Abrego GH, Ma J, Ni J, Yang Y (2021) Multi-stage training with improved negative contrast for neural passage retrieval. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 6091–6103
Zhou K, Gong Y, Liu X, Zhao WX, Shen Y, Dong A, Lu J, et al. Simans: simple ambiguous negatives sampling for dense text retrieval. arXiv preprint arXiv:2210.11773
Formal T, Lassance C, Piwowarski B, Clinchant S (2022) From distillation to hard negative sampling: making sparse neural IR models more effective. In: Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pp 2353–2359
Wang J, Zhu J, He X (2021) Cross-batch negative sampling for training two-tower recommenders. In: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, pp 1632–1636
Johnson J, Douze M, Jégou H (2019) Billion-scale similarity search with GPUs. IEEE Trans Big Data 7(3):535–547
Nguyen T, Rosenberg M, Song X, Gao J, Tiwary S, Majumder R, Deng L (2016) MS MARCO: a human generated machine reading comprehension dataset. Choice 2640:660
Kwiatkowski T, Palomaki J, Redfield O, Collins M, Parikh A, Alberti C, Epstein D, et al. (2019) Natural questions: a benchmark for question answering research. Trans Assoc Comput Linguistics 7:453–466

Acknowledgements

The authors would like to thank the AddPath project, Adaptative Designed Clinical Pathways (CENTRO-01-0247-FEDER-072640 LISBOA-01-0247-FEDER-072640). This work is funded by FCT/MCTES through national funds and co-funded by EU funds under the project UIDB/50008/2020.

Author information

Authors and Affiliations

IT: Instituto de Telecomunicações, University of Beira Interior, Covilha, Portugal
Kailash Hambarde & Hugo Proença

Authors

Kailash Hambarde
Hugo Proença

Corresponding author

Correspondence to Kailash Hambarde.

Editor information

Editors and Affiliations

Hertfordshire Business School, University of Hertfordshire, Hatfield, Hertfordshire, UK
Jyoti Choudrie
Department of AI and DS, Vishwakarma Institute of Information Technology, Pune, India
Parikshit N. Mahalle
Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
Thinagaran Perumal
Global Knowledge Research Foundation, Ahmedabad, Gujarat, India
Amit Joshi

About this paper

Cite this paper

Hambarde, K., Proença, H. (2023). WSRR: Weighted Rank-Relevance Sampling for Dense Text Retrieval. In: Choudrie, J., Mahalle, P.N., Perumal, T., Joshi, A. (eds) ICT with Intelligent Applications. ICTIS 2023. Lecture Notes in Networks and Systems, vol 719. Springer, Singapore. https://doi.org/10.1007/978-981-99-3758-5_22

DOI: https://doi.org/10.1007/978-981-99-3758-5_22
Published: 23 September 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3757-8
Online ISBN: 978-981-99-3758-5
eBook Packages: Intelligent Technologies and Robotics (R0)
