Abstract
Pipelined query processing over a term-wise distributed inverted index achieves superior throughput at high query multiprogramming levels. At lower levels, however, long query latencies make this approach inefficient. In this paper we explore two types of intra-query parallelism within the pipelined approach: parallel execution of a query on different nodes and concurrent execution on the same node. According to the experimental results, our approach reaches the throughput of the state-of-the-art method at about half of the latency. In the single-query case, the observed latency improvement is up to 2.6 times.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jonassen, S., Bratsberg, S.E. (2012). Intra-query Concurrent Pipelined Processing for Distributed Full-Text Retrieval. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_35
DOI: https://doi.org/10.1007/978-3-642-28997-2_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)