A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval

  • Simon Jonassen
  • Svein Erik Bratsberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6488)

Abstract

Term-partitioning is an efficient way to distribute a large inverted index. Query processing over such an index follows one of two fundamentally different approaches: pipelined or non-pipelined. While the pipelined approach provides higher query throughput, the non-pipelined approach provides shorter query latency. In this work we propose a third alternative, combining non-pipelined inverted index access, a heuristic decision between pipelined and non-pipelined query execution, and an improved query routing strategy. Our results show that the method combines the advantages of both approaches, providing high throughput and short query latency. Our method increases the throughput by up to 26% compared to the non-pipelined approach and reduces the latency by up to 32% compared to the pipelined approach.
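
As a rough illustration of the two baseline approaches contrasted above, the following toy sketch shows a term-partitioned index queried in a non-pipelined and a pipelined fashion. It is not the authors' implementation: the node layout, the terms, the scores, and the function names `node_of`, `non_pipelined`, and `pipelined` are all assumptions made for illustration, and the paper's decision heuristic and routing strategy are not modelled.

```python
from collections import defaultdict

# Toy term-partitioned index: each term's full posting list lives on exactly
# one node. Node ids, terms, and scores are made-up illustration data.
NODES = {
    0: {"query": {1: 0.5, 2: 0.2}},
    1: {"index": {2: 0.4, 3: 0.9}, "text": {1: 0.1}},
}


def node_of(term):
    """Return the id of the node that stores `term`, or None if unknown."""
    for node_id, postings in NODES.items():
        if term in postings:
            return node_id
    return None


def non_pipelined(terms):
    """Broker-style evaluation: posting lists are fetched from their nodes
    and all score accumulation happens in one place."""
    acc = defaultdict(float)
    for term in terms:
        node_id = node_of(term)
        if node_id is None:
            continue  # term not in the index
        for doc, score in NODES[node_id][term].items():
            acc[doc] += score
    return dict(acc)


def pipelined(terms):
    """Pipeline-style evaluation: one accumulator set is passed from node to
    node, and each node adds the contribution of the terms it stores."""
    route = sorted({n for n in map(node_of, terms) if n is not None})
    acc = {}
    for node_id in route:  # hypothetical route: ascending node id
        for term in terms:
            if term in NODES[node_id]:
                for doc, score in NODES[node_id][term].items():
                    acc[doc] = acc.get(doc, 0.0) + score
    return acc
```

Both functions compute the same document scores in this sketch; they differ only in where the work is done, which is the source of the throughput and latency trade-off the abstract describes.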

Keywords

Query Processing, Inverted Index, Pruning Method, Inverted List, Decision Heuristic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Simon Jonassen (1)
  • Svein Erik Bratsberg (1)
  1. Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway