Abstract
Large-scale web search engines provide sub-second response times to interactive user queries. However, not all search traffic arises interactively: cache updates, internal testing and prototyping, generation of training data, and web mining tasks all contribute to the workload of a typical search service. If these non-interactive query components are collected together and processed as a batch, the overall execution cost of query processing can be significantly reduced. In this reproducibility study, we revisit query batching in the context of large-scale conjunctive processing over inverted indexes, considering both on-disk and in-memory index arrangements. Our exploration first verifies the results reported in the reference work [Ding et al., WSDM 2011], and then provides novel approaches to batch processing that give rise to better time–space trade-offs than have previously been achieved.
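The cost argument behind batching can be illustrated with a toy model. The sketch below is hypothetical and is not the algorithm of Ding et al. or of this paper. It assumes a small in-memory inverted index that maps each term to a sorted list of document identifiers, and it counts postings-list fetches as a stand-in for disk reads or decoding work. Queries processed one at a time fetch every term's list once per query. A batch, by contrast, fetches each distinct term only once and answers all conjunctive (AND) queries from that shared pool. The names `CountingIndex`, `conjunctive`, `process_individually`, and `process_batch` are illustrative only.

```python
from bisect import bisect_left


def contains(postings, doc):
    # Binary search for doc in a sorted postings list.
    i = bisect_left(postings, doc)
    return i < len(postings) and postings[i] == doc


def conjunctive(lists):
    # Conjunctive (AND) intersection: scan the shortest list and keep
    # only the documents that also appear in every other list.
    if not lists:
        return []
    lists = sorted(lists, key=len)
    return [d for d in lists[0] if all(contains(p, d) for p in lists[1:])]


class CountingIndex:
    """Hypothetical toy index that counts postings-list fetches."""

    def __init__(self, postings):
        self.postings = postings
        self.fetches = 0

    def fetch(self, term):
        # Each call stands in for one read/decode of a postings list.
        self.fetches += 1
        return self.postings.get(term, [])


def process_individually(index, queries):
    # Every query fetches its own copy of each term's postings list.
    return [conjunctive([index.fetch(t) for t in q]) for q in queries]


def process_batch(index, queries):
    # Fetch each distinct term once for the whole batch, then answer
    # every query from the shared pool of postings lists.
    terms = sorted({t for q in queries for t in q})
    pool = {t: index.fetch(t) for t in terms}
    return [conjunctive([pool[t] for t in q]) for q in queries]


if __name__ == "__main__":
    postings = {
        "search": [1, 3, 5, 7, 9],
        "engine": [3, 5, 8, 9],
        "batch": [2, 3, 9],
        "query": [3, 4, 9, 10],
    }
    queries = [("search", "engine"), ("search", "batch"),
               ("engine", "query", "search")]

    solo, batched = CountingIndex(postings), CountingIndex(postings)
    print(process_individually(solo, queries), solo.fetches)
    print(process_batch(batched, queries), batched.fetches)
```

On this input, both strategies return `[[3, 5, 9], [3, 9], [3, 9]]`. The one-at-a-time strategy performs seven fetches, and the batched strategy performs four, one per distinct term. Real systems add further considerations, such as query reordering, caching of intermediate intersections, and memory limits on the shared pool, which this sketch deliberately ignores.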
References
Albers, S.: New results on web caching with request reordering. In: Proceedings of SPAA, pp. 84–92 (2004)
Bajaj, P., et al.: MS MARCO: a human generated MAchine Reading COmprehension dataset. arXiv:1611.09268v3 (2018)
Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5(2), 78–101 (1966)
Benham, R., Mackenzie, J., Moffat, A., Culpepper, J.S.: Boosting search performance using query variations. ACM Trans. Inf. Syst. 37(4), 41.1–41.25 (2019)
Blandford, D., Blelloch, G.: Index compression through document reordering. In: Proceedings of DCC, pp. 342–352 (2002)
Cambazoglu, B.B., et al.: A refreshing perspective of search engine caching. In: Proceedings of WWW, pp. 181–190 (2010)
Catena, M., Tonellotto, N.: Multiple query processing via logic function factoring. In: Proceedings of SIGIR, pp. 937–940 (2019)
Chaudhuri, S., Church, K., König, A.C., Sui, L.: Heavy-tailed distributions and multi-keyword queries. In: Proceedings of SIGIR, pp. 663–670 (2007)
Cheng, C., Chung, C., Shann, J.J.: Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems. Inf. Proc. Man. 42(3), 729–750 (2006)
Choudhury, F.M., Culpepper, J.S., Bao, Z., Sellis, T.: Batch processing of top-k spatial-textual queries. ACM Trans. Spat. Alg. Syst. 3(4), 13.1–13.40 (2018)
Chowdhury, G.: An agenda for green information retrieval research. Inf. Proc. Man. 48(6), 1067–1077 (2012)
Craswell, N., Campos, D., Mitra, B., Yilmaz, E., Billerbeck, B.: ORCAS: 20 million clicked query-document pairs for analyzing search. In: Proceedings of CIKM, pp. 2983–2989 (2020)
Craswell, N., Mitra, B., Yilmaz, E., Campos, D., Lin, J.: Overview of the TREC 2021 deep learning track. In: Proceedings of TREC (2021)
Culpepper, J.S., Moffat, A.: Efficient set intersection for inverted indexing. ACM Trans. Inf. Syst. 29(1), 1.1–1.25 (2010)
Dhulipala, L., Kabiljo, I., Karrer, B., Ottaviano, G., Pupyrev, S., Shalita, A.: Compressing graphs and indexes with recursive graph bisection. In: Proceedings of KDD, pp. 1535–1544 (2016)
Ding, S., Attenberg, J., Baeza-Yates, R., Suel, T.: Batch query processing for web search engines. In: Proceedings of WSDM, pp. 137–146 (2011)
Fagni, T., Perego, R., Silvestri, F., Orlando, S.: Boosting the performance of web search engines: caching and prefetching query results by exploiting historical usage data. ACM Trans. Inf. Syst. 24(1), 51–78 (2006)
Feder, T., Motwani, R., Panigrahy, R., Zhu, A.: Web caching with request reordering. In: Proceedings of SODA, pp. 104–105 (2002)
Hwang, S.-W., Kim, S., He, Y., Elnikety, S., Choi, S.: Prediction and predictability for search query acceleration. ACM Trans. Web 10(3), 19.1–19.28 (2016)
Jonassen, S., Cambazoglu, B.B., Silvestri, F.: Prefetching query results and its impact on search engines. In: Proceedings of SIGIR, pp. 631–640 (2012)
Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Soft. Prac. Exp. 45(1), 1–29 (2015)
Lempel, R., Moran, S.: Predictive caching and prefetching of query results in search engines. In: Proceedings of WWW, pp. 19–28 (2003)
Lin, J., et al.: Supporting interoperability between open-source search engines with the common index file format. In: Proceedings of SIGIR, pp. 2149–2152 (2020)
Ma, H., Wang, B.: User-aware caching and prefetching query results in web search engines. In: Proceedings of SIGIR, pp. 1163–1164 (2012)
Ma, X., Pradeep, R., Nogueira, R., Lin, J.: Document expansions and learned sparse lexical representations for MSMARCO V1 and V2. In: Proceedings of SIGIR, pp. 3187–3197 (2022)
Mackenzie, J., Mallia, A., Petri, M., Culpepper, J.S., Suel, T.: Compressing inverted indexes with recursive graph bisection: a reproducibility study. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds.) ECIR 2019. LNCS, vol. 11437, pp. 339–352. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15712-8_22
Mackenzie, J., Petri, M., Moffat, A.: Tradeoff options for bipartite graph partitioning. IEEE Trans. Know. Data Eng. (2022, to appear)
Mallia, A., Siedlaczek, M., Mackenzie, J., Suel, T.: PISA: performant indexes and search for academia. In: Proceedings of OSIRRC at SIGIR 2019, pp. 50–56 (2019)
Marín, M., Navarro, G.: Distributed query processing using suffix arrays. In: Proceedings of SPIRE, pp. 311–325 (2003)
Petersen, C., Simonsen, J.G., Lioma, C.: Power law distributions in information retrieval. ACM Trans. Inf. Syst. 34(2), 8.1–8.37 (2016)
Scells, H., Zhuang, S., Zuccon, G.: Reduce, reuse, recycle: green information retrieval research. In: Proceedings of SIGIR, pp. 2825–2837 (2022)
Sellis, T.K.: Multiple-query optimization. ACM Trans. Data. Syst. 13(1), 23–52 (1988)
Strubell, E., Ganesh, A., McCallum, A.: Energy and policy considerations for deep learning in NLP. In: Proceedings of ACL, pp. 3645–3650 (2019)
Tolosa, G., Becchetti, L., Feuerstein, E., Marchetti-Spaccamela, A.: Performance improvements for search systems using an integrated cache of lists + intersections. Inf. Retr. 20(3), 172–198 (2017)
Tonellotto, N., Macdonald, C., Ounis, I.: Efficient query processing for scalable web search. Found. Trnd. Inf. Retr. 12(4–5), 319–500 (2018)
Yang, P., Fang, H., Lin, J.: Anserini: reproducible ranking baselines using Lucene. J. Data Inf. Qual. 10(4), 1–20 (2018)
Acknowledgement
This work was supported by the Australian Research Council’s Discovery Projects Scheme (project DP200103136) and a University of Queensland New Staff Research Grant.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mackenzie, J., Moffat, A. (2023). Index-Based Batch Query Processing Revisited. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_6
DOI: https://doi.org/10.1007/978-3-031-28241-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28240-9
Online ISBN: 978-3-031-28241-6
eBook Packages: Computer Science, Computer Science (R0)