Intra-query Concurrent Pipelined Processing for Distributed Full-Text Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Abstract

Pipelined query processing over a term-wise distributed inverted index has superior throughput at high query multiprogramming levels. However, because of long query latencies, this approach is inefficient at lower levels. In this paper we explore two types of intra-query parallelism within the pipelined approach: parallel execution of a query on different nodes and concurrent execution on the same node. According to the experimental results, our approach reaches the throughput of the state-of-the-art method at about half of the latency. In the single-query case, the observed latency improvement is up to 2.6 times.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jonassen, S., Bratsberg, S.E. (2012). Intra-query Concurrent Pipelined Processing for Distributed Full-Text Retrieval. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_35

  • DOI: https://doi.org/10.1007/978-3-642-28997-2_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28996-5

  • Online ISBN: 978-3-642-28997-2

  • eBook Packages: Computer Science (R0)