Hybrid Query Scheduling for a Replicated Search Engine

  • Ana Freire
  • Craig Macdonald
  • Nicola Tonellotto
  • Iadh Ounis
  • Fidel Cacheda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)

Abstract

Search engines use replication and distribution of large indices across many query servers to achieve efficient retrieval. Under high query load, queries can be scheduled to replicas that are expected to be idle soonest, facilitated by the use of predicted query response times. However, the overhead of making response time predictions can hinder the usefulness of query scheduling under low query load. In this paper, we propose a hybrid scheduling approach that combines the scheduling methods appropriate for both low and high load conditions, and can adapt in response to changing conditions. We deploy a simulation framework, which is prepared with actual and predicted response times for real Web search queries for one full day. Our experiments using different numbers of shards and replicas of the 50 million document ClueWeb09 corpus show that hybrid scheduling can reduce the average waiting times of one day of queries by 68% under high load conditions and by 7% under low load conditions w.r.t. traditional scheduling methods.
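The hybrid idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which low-load policy or which switching criterion the paper uses, so round-robin dispatch, a threshold on the mean number of outstanding queries per replica, the `default_estimate` value, and every name below (`Replica`, `HybridScheduler`, `predict`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Replica:
    name: str
    outstanding: int = 0            # queries assigned but not yet finished
    predicted_backlog: float = 0.0  # summed time estimates (seconds) of those queries


@dataclass
class Assignment:
    replica: Replica
    mode: str        # "round_robin" or "predictive"
    estimate: float  # time estimate charged to the replica's backlog


class HybridScheduler:
    """Hypothetical hybrid scheduler: cheap policy at low load, predictions at high load."""

    def __init__(self, replicas: List[Replica], predict: Callable[[str], float],
                 load_threshold: float = 1.0, default_estimate: float = 0.05):
        self.replicas = replicas
        self.predict = predict                  # assumed response-time predictor
        self.load_threshold = load_threshold    # assumed switching criterion
        self.default_estimate = default_estimate
        self._next = 0                          # round-robin cursor

    def mean_outstanding(self) -> float:
        return sum(r.outstanding for r in self.replicas) / len(self.replicas)

    def schedule(self, query: str) -> Assignment:
        if self.mean_outstanding() < self.load_threshold:
            # Low load: skip the predictor entirely to avoid its overhead.
            replica = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            mode, estimate = "round_robin", self.default_estimate
        else:
            # High load: pay for a prediction and pick the replica
            # expected to be idle soonest (smallest predicted backlog).
            estimate = self.predict(query)
            replica = min(self.replicas, key=lambda r: r.predicted_backlog)
            mode = "predictive"
        replica.outstanding += 1
        replica.predicted_backlog += estimate
        return Assignment(replica, mode, estimate)

    def finish(self, assignment: Assignment) -> None:
        # Call when a query completes so the replica's backlog shrinks again.
        assignment.replica.outstanding -= 1
        assignment.replica.predicted_backlog -= assignment.estimate


# Usage: a toy predictor that charges 0.1 s per query term (an assumption).
scheduler = HybridScheduler([Replica("r0"), Replica("r1")],
                            predict=lambda q: 0.1 * len(q.split()))
first = scheduler.schedule("glasgow")
second = scheduler.schedule("weather")
third = scheduler.schedule("hybrid query scheduling")
fourth = scheduler.schedule("clueweb09")
```

In this sketch the first two queries are dispatched round-robin because no replica has a backlog yet; once each replica holds one outstanding query, the scheduler switches to prediction-based placement. A real deployment would need a load signal and threshold tuned to the system, which the abstract does not specify.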

Keywords

Query Efficiency Prediction, Query Scheduling, Distributed Search Engines

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ana Freire (1)
  • Craig Macdonald (2)
  • Nicola Tonellotto (3)
  • Iadh Ounis (2)
  • Fidel Cacheda (1)
  1. University of A Coruña, A Coruña, Spain
  2. University of Glasgow, Glasgow, UK
  3. National Research Council of Italy, Pisa, Italy
