A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval

  • Conference paper
Web Information Systems Engineering – WISE 2010 (WISE 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6488)

Abstract

Term partitioning is an efficient way to distribute a large inverted index. Query processing over such an index follows one of two fundamentally different approaches: pipelined or non-pipelined. The pipelined approach provides higher query throughput, while the non-pipelined approach provides shorter query latency. In this work we propose a third alternative that combines non-pipelined inverted index access, a heuristic decision between pipelined and non-pipelined query execution, and an improved query routing strategy. Our results show that the method combines the advantages of both approaches, providing high throughput and short query latency: it increases throughput by up to 26% compared to the non-pipelined approach and reduces latency by up to 32% compared to the pipelined approach.
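As a rough illustration of the two baseline approaches named in the abstract, the sketch below builds a toy term-partitioned index and evaluates a query both ways. It is not the authors' implementation and does not reproduce their heuristic or routing strategy. The node count, the partitioning function `node_for`, the term-frequency scoring, and the node visiting order are all assumptions made for the example.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(term):
    # Deterministic toy partitioning function (an assumption for this sketch).
    return sum(map(ord, term)) % NUM_NODES


def build_term_partitioned_index(docs):
    """Each node stores the complete posting lists of the terms assigned to it."""
    nodes = [defaultdict(dict) for _ in range(NUM_NODES)]
    for doc_id, text in docs.items():
        for term in text.split():
            postings = nodes[node_for(term)][term]
            postings[doc_id] = postings.get(doc_id, 0) + 1  # term frequency
    return nodes


def non_pipelined(nodes, query):
    """A broker fetches every posting list and does all the scoring itself."""
    fetched = [nodes[node_for(t)].get(t, {}) for t in query.split()]
    scores = defaultdict(int)
    for postings in fetched:
        for doc_id, tf in postings.items():
            scores[doc_id] += tf
    return dict(scores)


def pipelined(nodes, query):
    """Accumulators travel from node to node; each node adds its own terms."""
    terms_by_node = defaultdict(list)
    for t in query.split():
        terms_by_node[node_for(t)].append(t)
    accumulators = {}
    for node_id in sorted(terms_by_node):  # toy route: ascending node id
        for t in terms_by_node[node_id]:
            for doc_id, tf in nodes[node_id].get(t, {}).items():
                accumulators[doc_id] = accumulators.get(doc_id, 0) + tf
    return accumulators
```

In this sketch both functions return the same scores. They differ in where the work happens: the non-pipelined version moves posting lists to one broker, while the pipelined version moves a partial result between the nodes that hold the query terms.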

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jonassen, S., Bratsberg, S.E. (2010). A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_51

  • DOI: https://doi.org/10.1007/978-3-642-17616-6_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17615-9

  • Online ISBN: 978-3-642-17616-6

  • eBook Packages: Computer Science, Computer Science (R0)
