Improving the performance of pipelined query processing with skipping—and its comparison to document-wise partitioning

Abstract

Web search engines need to provide high throughput and short query latency. Recent results show that pipelined query processing over a term-wise partitioned inverted index may have superior throughput. However, query processing latency and scalability with respect to the collection size are the main challenges associated with this method. In this paper, we evaluate the effect of inverted index skipping on the performance of pipelined query processing. Further, we introduce a novel idea of using Max-Score pruning within pipelined query processing and a new term assignment heuristic, partitioning by Max-Score. Our current results indicate a significant improvement over the state-of-the-art approach and lead to several further optimizations, which include dynamic load balancing, intra-query concurrent processing and a hybrid combination of pipelined and non-pipelined execution. Lastly, we show how the state of term-wise partitioning relates to the industry-standard document-wise partitioning. Even though there are situations in which pipelined query processing is advantageous, document-wise partitioning is still the road to follow.
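As a rough illustration of what inverted index skipping means in general, the following Python sketch splits a sorted posting list into fixed-size blocks and uses the largest document ID of each block as a skip entry. It is a generic textbook-style sketch and not the index layout evaluated in the article; the names `SkippingPostingList`, `next_geq`, `block_size` and `intersect` are hypothetical, and a real index would typically store the blocks compressed so that a skipped block is never decompressed.

```python
from bisect import bisect_left


class SkippingPostingList:
    """Sorted document IDs in fixed-size blocks with one skip entry per block.

    Hypothetical illustration only; not the structure used in the article.
    """

    def __init__(self, doc_ids, block_size=4):
        self.doc_ids = sorted(doc_ids)
        self.block_size = block_size
        # Skip entries: the largest document ID stored in each block.
        self.block_max = [
            self.doc_ids[min(i + block_size, len(self.doc_ids)) - 1]
            for i in range(0, len(self.doc_ids), block_size)
        ]
        # Counts how many blocks were actually scanned.
        self.blocks_scanned = 0

    def next_geq(self, target):
        """Return the smallest document ID >= target, or None if none exists."""
        # Binary search over the skip entries finds the first block whose
        # largest ID is >= target; all earlier blocks are skipped unread.
        b = bisect_left(self.block_max, target)
        if b == len(self.block_max):
            return None
        self.blocks_scanned += 1
        start = b * self.block_size
        for doc_id in self.doc_ids[start:start + self.block_size]:
            if doc_id >= target:
                return doc_id
        return None


def intersect(short, long):
    """Conjunctive match: keep IDs of the short list that also occur in the long one."""
    return [d for d in short.doc_ids if long.next_geq(d) == d]


long_list = SkippingPostingList(range(0, 100, 2), block_size=4)
short_list = SkippingPostingList([10, 11, 50, 98])
matches = intersect(short_list, long_list)
print(matches, long_list.blocks_scanned, len(long_list.block_max))
```

In this example the long list holds 13 blocks, but each lookup scans only one of them, which is the effect skipping is meant to achieve when a short list drives the evaluation.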

Author information

Corresponding author

Correspondence to Svein Erik Bratsberg.

About this article

Cite this article

Jonassen, S., Bratsberg, S.E. Improving the performance of pipelined query processing with skipping—and its comparison to document-wise partitioning. World Wide Web 17, 949–967 (2014). https://doi.org/10.1007/s11280-013-0260-2

Keywords

  • Information retrieval
  • Text search engines
  • Distributed index organization
  • Pipelined query processing
  • Skipping