
World Wide Web, Volume 17, Issue 5, pp 949–967

Improving the performance of pipelined query processing with skipping—and its comparison to document-wise partitioning

  • Simon Jonassen
  • Svein Erik Bratsberg
Article

Abstract

Web search engines need to provide high throughput and short query latency. Recent results show that pipelined query processing over a term-wise partitioned inverted index may have superior throughput. However, query processing latency and scalability with respect to the collection size are the main challenges associated with this method. In this paper, we evaluate the effect of inverted index skipping on the performance of pipelined query processing. Further, we introduce the novel idea of using Max-Score pruning within pipelined query processing and a new term assignment heuristic, partitioning by Max-Score. Our current results indicate a significant improvement over the state-of-the-art approach and lead to several further optimizations, which include dynamic load balancing, intra-query concurrent processing and a hybrid combination of pipelined and non-pipelined execution. Lastly, we show how the current state of term-wise partitioning compares with the industry-standard document-wise partitioning. Even though there are situations where pipelined query processing is advantageous, document-wise partitioning is still the road to follow.
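
To make the central idea of the abstract concrete, the following is a minimal Python sketch of Max-Score pruning inside pipelined query processing over a term-wise partitioned index. It is a single-process simulation under stated assumptions, not the authors' implementation: the toy index, the precomputed per-posting scores, the route (the order in which nodes, and terms within a node, are visited) and all names (`kth_best`, `process_term`, `pipelined_maxscore`, `INDEX`, `ROUTE`) are hypothetical, and a resumed binary search over an uncompressed docid array stands in for skip pointers over compressed posting blocks.

```python
import bisect
import heapq
from typing import Dict, List, Tuple

# Hypothetical posting list: (docid-sorted docids, precomputed per-posting scores).
Postings = Tuple[List[int], List[float]]


def kth_best(acc: Dict[int, float], k: int) -> float:
    """k-th largest partial score: a lower bound on the final top-k threshold."""
    if len(acc) < k:
        return 0.0
    return heapq.nlargest(k, acc.values())[-1]


def process_term(acc: Dict[int, float], postings: Postings, skip_mode: bool) -> None:
    """Add one term's contributions to the accumulator set in place."""
    docids, scores = postings
    if not skip_mode:
        # OR mode: every posting may create a new accumulator.
        for doc, score in zip(docids, scores):
            acc[doc] = acc.get(doc, 0.0) + score
        return
    # Skip mode: only existing candidates are updated. A binary search that
    # resumes from the last position stands in for following skip pointers.
    pos = 0
    for doc in sorted(acc):
        pos = bisect.bisect_left(docids, doc, pos)
        if pos == len(docids):
            break
        if docids[pos] == doc:
            acc[doc] += scores[pos]


def pipelined_maxscore(route: List[List[str]], index: Dict[str, Postings],
                       max_score: Dict[str, float], k: int) -> List[Tuple[int, float]]:
    """Simulate passing the accumulator set along `route` (one list of terms per node)."""
    terms = [t for node in route for t in node]
    # remaining[i]: upper bound on what terms i.. can still add to any document.
    remaining = [0.0] * (len(terms) + 1)
    for i in range(len(terms) - 1, -1, -1):
        remaining[i] = remaining[i + 1] + max_score[terms[i]]

    acc: Dict[int, float] = {}
    i = 0
    for node in route:  # in a real system, acc would be sent to this node over the network
        for term in node:
            threshold = kth_best(acc, k)
            # Max-Score test: an unseen document can score at most remaining[i].
            skip_mode = remaining[i] <= threshold
            if skip_mode:
                # Drop candidates that can no longer reach the threshold.
                acc = {d: s for d, s in acc.items() if s + remaining[i] >= threshold}
            process_term(acc, index[term], skip_mode)
            i += 1
    return heapq.nlargest(k, acc.items(), key=lambda item: item[1])


# Toy data (hypothetical): node 1 stores "a", node 2 stores "b" and "c".
INDEX: Dict[str, Postings] = {
    "a": ([1, 2, 3, 4], [5.0, 4.0, 3.0, 1.0]),
    "b": ([2, 3, 5], [3.0, 1.0, 2.0]),
    "c": ([1, 4, 6, 7], [0.5, 0.5, 1.0, 0.2]),
}
MAX_SCORE = {t: max(scores) for t, (_, scores) in INDEX.items()}
ROUTE = [["a"], ["b", "c"]]

print(pipelined_maxscore(ROUTE, INDEX, MAX_SCORE, k=2))  # [(2, 7.0), (1, 5.5)]
```

Once the sum of the maximum scores of the terms still to be visited can no longer exceed the current k-th best partial score, no unseen document can enter the top k, so the remaining terms only need to look up existing candidates; this is where skipping pays off. On the toy data, that switch already happens at the second term, and the result matches exhaustive evaluation.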

Keywords

Information retrieval; Text search engines; Distributed index organization; Pipelined query processing; Skipping



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Norwegian University of Science and Technology, Trondheim, Norway
