Abstract
Large Web search engines are complex systems that process thousands of user queries per second on clusters of dedicated distributed-memory processors. Processing each query involves a number of operations that produce the answer presented to the user. The most expensive of these in running time is computing the top-k documents that best match the query. In this paper we propose a parallelization of a state-of-the-art document ranking algorithm called Block-Max WAND. We propose a two-step parallelization of the WAND algorithm that reduces inter-processor communication and running time. We also propose multi-threading tailored to Block-Max WAND to exploit multi-core parallelism within each processor. Experimental results show that the proposed parallelization significantly reduces execution time compared with current approaches used in search engines.
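For readers unfamiliar with the underlying ranking step, the following is a minimal sequential sketch of WAND-style top-k retrieval, not the parallel Block-Max WAND method proposed in the paper. It assumes non-negative, additive per-term scores and uses a single score upper bound per posting list (the list maximum) instead of the per-block maxima that Block-Max WAND stores. The function name `wand_top_k` and the posting-list layout are hypothetical, chosen only for illustration.

```python
import heapq

def wand_top_k(postings, k):
    """Sequential WAND-style top-k sketch (illustrative, not the paper's code).

    postings: dict mapping term -> list of (doc_id, score), sorted by doc_id,
              with non-negative scores (an assumption of this sketch).
    Returns up to k (doc_id, score) pairs, best score first.
    """
    lists = [p for p in postings.values() if p]
    # One upper bound per list: its maximum score. Block-Max WAND would
    # instead keep one maximum per block of postings.
    upper = [max(score for _, score in p) for p in lists]
    pos = [0] * len(lists)   # cursor into each posting list
    heap = []                # min-heap of (score, doc_id), size <= k

    while True:
        # Terms that still have postings, ordered by their current doc_id.
        active = sorted(
            (i for i in range(len(lists)) if pos[i] < len(lists[i])),
            key=lambda i: lists[i][pos[i]][0],
        )
        if not active:
            break
        threshold = heap[0][0] if len(heap) == k else float("-inf")

        # Pivot: first doc_id at which the accumulated upper bounds
        # could exceed the current k-th best score.
        acc, pivot = 0.0, None
        for i in active:
            acc += upper[i]
            if acc > threshold:
                pivot = lists[i][pos[i]][0]
                break
        if pivot is None:
            break  # no remaining document can enter the top-k

        first = active[0]
        if lists[first][pos[first]][0] == pivot:
            # All lagging cursors already sit on the pivot: score it fully.
            score = 0.0
            for i in active:
                if lists[i][pos[i]][0] == pivot:
                    score += lists[i][pos[i]][1]
                    pos[i] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot))
        else:
            # Skip documents before the pivot; they cannot beat the threshold.
            for i in active:
                if lists[i][pos[i]][0] >= pivot:
                    break
                while pos[i] < len(lists[i]) and lists[i][pos[i]][0] < pivot:
                    pos[i] += 1

    return [(doc, score) for score, doc in sorted(heap, reverse=True)]


example = {
    "web":    [(1, 1.0), (3, 2.0), (7, 0.5)],
    "search": [(3, 1.5), (5, 3.0), (7, 0.25)],
    "engine": [(2, 0.75), (5, 1.0), (9, 4.5)],
}
top2 = wand_top_k(example, 2)
```

The pruning comes from the pivot test: a document that appears only in lists whose combined upper bounds do not exceed the current k-th best score is skipped without being scored. A linear scan stands in for the skip pointers a real index would use, so the sketch shows the control flow rather than the performance.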
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rojas, O., Gil-Costa, V., Marin, M. (2013). Efficient Parallel Block-Max WAND Algorithm. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_41
DOI: https://doi.org/10.1007/978-3-642-40047-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science (R0)