Abstract
Term partitioning is an efficient way to distribute a large inverted index. Query processing over such an index follows one of two fundamentally different approaches: pipelined or non-pipelined. The pipelined approach provides higher query throughput, whereas the non-pipelined approach provides shorter query latency. In this work we propose a third alternative that combines non-pipelined inverted index access, a heuristic decision between pipelined and non-pipelined query execution, and an improved query routing strategy. Our results show that the method combines the advantages of both approaches, providing high throughput and short query latency: it increases throughput by up to 26% compared to the non-pipelined approach and reduces latency by up to 32% compared to the pipelined approach.
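To make the two baseline approaches concrete, the following is a minimal sketch of a term-partitioned index queried in both modes. It is not the semi-pipelined method proposed in the paper. The toy index, the integer scores, the function names and the routing order are all assumptions made for illustration, and the sketch does not model throughput or latency.

```python
from collections import defaultdict

# Toy term-partitioned index (illustrative data, not from the paper):
# each term's complete posting list lives on exactly one node.
# A posting list maps doc_id -> integer score contribution.
NODES = {
    0: {"distributed": {1: 5, 2: 3}},
    1: {"index": {1: 2, 3: 9}},
    2: {"query": {2: 4, 3: 1}},
}


def node_for(term):
    """Return the id of the node that stores the term, or None."""
    for node_id, postings in NODES.items():
        if term in postings:
            return node_id
    return None


def top_k(accumulators, k):
    """Rank documents by descending score, breaking ties by doc id."""
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def non_pipelined(terms, k=2):
    """A central broker fetches each posting list and scores on its own."""
    acc = defaultdict(int)
    transfers = 0
    for term in terms:
        node_id = node_for(term)
        if node_id is None:
            continue
        transfers += 1  # one posting list shipped to the broker
        for doc, score in NODES[node_id][term].items():
            acc[doc] += score
    return top_k(acc, k), transfers


def pipelined(terms, k=2):
    """The accumulator set travels node to node; each node adds its terms."""
    by_node = defaultdict(list)
    for term in terms:
        node_id = node_for(term)
        if node_id is not None:
            by_node[node_id].append(term)
    route = sorted(by_node)  # hypothetical routing order: ascending node id
    acc = defaultdict(int)
    for node_id in route:
        for term in by_node[node_id]:
            for doc, score in NODES[node_id][term].items():
                acc[doc] += score
    return top_k(acc, k), len(route)
```

Both functions return the same ranking for the same query; they differ only in where the scoring work happens and in what is sent over the network (posting lists to a broker versus an accumulator set passed along a route).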
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jonassen, S., Bratsberg, S.E. (2010). A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_51
DOI: https://doi.org/10.1007/978-3-642-17616-6_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17615-9
Online ISBN: 978-3-642-17616-6
eBook Packages: Computer Science (R0)