
Efficient Multi-vector Dense Retrieval with Bit Vectors

  • Conference paper
  • First Online:
Advances in Information Retrieval (ECIR 2024)

Abstract

Dense retrieval techniques employ pre-trained large language models to build high-dimensional representations of queries and passages. The relevance of a passage w.r.t. a query is then computed with efficient similarity measures over these representations. Along this line, multi-vector representations improve effectiveness at the expense of a one-order-of-magnitude increase in memory footprint and query latency, since they encode queries and documents at the token level. Recently, PLAID has tackled these problems by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, thus reducing the cost of the subsequent ranking stages. This paper proposes “Efficient Multi-Vector dense retrieval with Bit vectors” (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step over passages using optimized bit vectors. Second, it computes the centroid interaction column-wise, exploiting SIMD instructions to reduce latency. Third, EMVB leverages Product Quantization (PQ) to reduce the memory footprint of the stored vector representations while allowing for fast late interaction. Fourth, we introduce a per-document term filtering method that further improves the efficiency of the last step. Experiments on MS MARCO and LoTTE show that EMVB is up to \({2.8}{\times }\) faster and reduces the memory footprint by \({1.8}{\times }\) with no loss in retrieval accuracy compared to PLAID.
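The centroid-based pre-filtering idea from the abstract can be sketched in a few lines of NumPy. The function names, threshold value, and toy data below are illustrative assumptions, not EMVB's actual implementation, which operates on optimized bit vectors with SIMD instructions rather than NumPy boolean arrays:

```python
import numpy as np

def build_centroid_bitvector(query_centroid_scores, threshold):
    """Mark centroids whose similarity to at least one query term exceeds the threshold."""
    # query_centroid_scores: (num_query_terms, num_centroids) similarity matrix
    return (query_centroid_scores > threshold).any(axis=0)

def prefilter(doc_centroid_ids, bitvec):
    """Keep only documents with at least one token assigned to a marked centroid."""
    return [d for d, ids in enumerate(doc_centroid_ids) if bitvec[ids].any()]

# Toy example: 2 query terms, 4 centroids, 3 documents.
scores = np.array([[0.9, 0.1, 0.2, 0.05],
                   [0.1, 0.8, 0.1, 0.10]])
bitvec = build_centroid_bitvector(scores, threshold=0.5)    # centroids 0 and 1 marked
docs = [np.array([2, 3]), np.array([0, 3]), np.array([1])]  # per-doc token -> centroid ids
surviving = prefilter(docs, bitvec)                         # documents 1 and 2 survive
```

Only the surviving documents proceed to the more expensive centroid-interaction and late-interaction scoring stages, which is where the paper's speedups over PLAID originate.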


Notes

  1. The terms “document” and “passage” are used interchangeably in this paper.


Acknowledgements

This work was partially supported by the EU - NGEU, by the PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 1 “Human-centered AI” funded by the European Commission under the NextGeneration EU program, by the PNRR ECS00000017 Tuscany Health Ecosystem Spoke 6 “Precision medicine & personalized healthcare”, by the European Commission under the NextGeneration EU programme, by the Horizon Europe RIA “Extreme Food Risk Analytics” (EFRA), grant agreement n. 101093026, by the “Algorithms, Data Structures and Combinatorics for Machine Learning” (MIUR-PRIN 2017), and by the “Algorithmic Problems and Machine Learning” (MIUR-PRIN 2022).

Author information


Corresponding author

Correspondence to Cosimo Rulli.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Nardini, F.M., Rulli, C., Venturini, R. (2024). Efficient Multi-vector Dense Retrieval with Bit Vectors. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14609. Springer, Cham. https://doi.org/10.1007/978-3-031-56060-6_1


  • DOI: https://doi.org/10.1007/978-3-031-56060-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56059-0

  • Online ISBN: 978-3-031-56060-6

  • eBook Packages: Computer Science, Computer Science (R0)
