A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval

Jonassen, Simon; Bratsberg, Svein Erik

doi:10.1007/978-3-642-17616-6_51

Simon Jonassen &
Svein Erik Bratsberg

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6488)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Term-partitioning is an efficient way to distribute a large inverted index. Query processing over such an index follows one of two fundamentally different approaches: pipelined or non-pipelined. The pipelined approach provides higher query throughput, whereas the non-pipelined approach provides shorter query latency. In this work we propose a third alternative that combines non-pipelined inverted index access, a heuristic decision between pipelined and non-pipelined query execution, and an improved query routing strategy. Our results show that the method combines the advantages of both approaches, providing high throughput and short query latency: it increases throughput by up to 26% compared to the non-pipelined approach and reduces latency by up to 32% compared to the pipelined approach.
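To make the two baseline approaches named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy term-partitioned index in which each term's whole posting list lives on one node, chosen by a CRC32 hash, and a plain term-frequency sum as the score. All function names, the partitioning rule, the scoring function and the node visiting order are illustrative assumptions. The decision heuristic and the routing strategy are not sketched, because the abstract does not specify them.

```python
import zlib
from collections import defaultdict


def node_of(term, num_nodes):
    # Assumption: terms are assigned to nodes by a deterministic hash.
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def build_term_partitioned_index(docs, num_nodes):
    # Each node maps term -> {doc_id: term frequency}; a term's whole
    # posting list is stored on exactly one node.
    nodes = [defaultdict(dict) for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings = nodes[node_of(term, num_nodes)][term]
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return nodes


def non_pipelined(nodes, query_terms):
    # A broker fetches every posting list from the node that owns it
    # and does all scoring centrally.
    scores = defaultdict(int)
    for term in query_terms:
        owner = nodes[node_of(term, len(nodes))]
        for doc_id, tf in owner.get(term, {}).items():
            scores[doc_id] += tf
    return dict(scores)


def pipelined(nodes, query_terms):
    # The query visits only the nodes that own its terms. Each node adds
    # its local contribution to the accumulators and forwards them.
    # Assumption: nodes are visited in ascending node-id order.
    route = sorted({node_of(t, len(nodes)) for t in query_terms})
    accumulators = {}
    for node_id in route:
        for term in query_terms:
            if node_of(term, len(nodes)) != node_id:
                continue
            for doc_id, tf in nodes[node_id].get(term, {}).items():
                accumulators[doc_id] = accumulators.get(doc_id, 0) + tf
    return accumulators


docs = {
    1: "index query index",
    2: "query latency",
    3: "index latency latency",
}
nodes = build_term_partitioned_index(docs, num_nodes=3)
query = ["index", "latency"]
print(non_pipelined(nodes, query))
print(pipelined(nodes, query))
```

In this sketch both strategies yield the same scores. They differ in where the work happens: the non-pipelined version concentrates the scoring on the broker, while the pipelined version spreads it across nodes at the cost of passing accumulators from node to node.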

Author information

Authors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, Sem Sælands vei 7-9, NO-7491, Trondheim, Norway
Simon Jonassen & Svein Erik Bratsberg

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Lei Chen
University of Patras, 26504, Patras, Greece
Peter Triantafillou
Polytechnic Institute of NYU, 11201, Brooklyn, NY, USA
Torsten Suel

About this paper

Cite this paper

Jonassen, S., Bratsberg, S.E. (2010). A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_51

DOI: https://doi.org/10.1007/978-3-642-17616-6_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17615-9
Online ISBN: 978-3-642-17616-6
eBook Packages: Computer Science, Computer Science (R0)
