Performance Improvements for Search Systems Using an Integrated Cache of Lists+Intersections

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8799)

Abstract

Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent, or costly data used in the past. In this study we propose and evaluate a static cache that works simultaneously as a list cache and an intersection cache, offering a more efficient way of handling cache space. In addition, we propose effective strategies to select the term pairs that should populate the cache. Simulations using two datasets and a real query log reveal that the proposed approach improves overall performance in terms of total processing time, achieving savings of up to 40% in the best case.
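
To make the idea of a single cache serving both posting lists and intersections more concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that each cached term pair is stored as three disjoint parts (documents only in the first list, documents only in the second list, and documents common to both), so that either full list or the intersection can be rebuilt from one entry. The names `PairEntry` and `IntegratedCache`, the in-memory `set` representation, and the restriction of one pair per term are all assumptions made for illustration; the abstract does not specify the layout or the pair selection strategy.

```python
from typing import Dict, Iterable, Optional, Set


class PairEntry:
    """Hypothetical cache entry for a term pair, split into three disjoint parts."""

    def __init__(self, t1: str, t2: str, list1: Set[int], list2: Set[int]) -> None:
        common = list1 & list2
        self.terms = (t1, t2)
        # Documents that appear in exactly one of the two posting lists.
        self.only: Dict[str, Set[int]] = {t1: list1 - common, t2: list2 - common}
        # Documents that appear in both lists (the cached intersection).
        self.common: Set[int] = common

    def posting_list(self, term: str) -> Set[int]:
        # A full posting list is the union of its private part and the shared part.
        return self.only[term] | self.common


class IntegratedCache:
    """Static cache sketch: one entry answers both list and intersection lookups."""

    def __init__(self) -> None:
        self._by_term: Dict[str, PairEntry] = {}

    def add_pair(self, t1: str, t2: str,
                 list1: Iterable[int], list2: Iterable[int]) -> None:
        # Assumption of this sketch: a term belongs to at most one cached pair.
        if t1 == t2 or t1 in self._by_term or t2 in self._by_term:
            raise ValueError("each term may appear in only one pair")
        entry = PairEntry(t1, t2, set(list1), set(list2))
        self._by_term[t1] = entry
        self._by_term[t2] = entry

    def get_list(self, term: str) -> Optional[Set[int]]:
        entry = self._by_term.get(term)
        return entry.posting_list(term) if entry is not None else None

    def get_intersection(self, t1: str, t2: str) -> Optional[Set[int]]:
        entry = self._by_term.get(t1)
        if entry is not None and set(entry.terms) == {t1, t2}:
            return set(entry.common)
        return None  # miss: the caller would intersect the two lists itself


cache = IntegratedCache()
cache.add_pair("web", "search", [1, 2, 3, 5], [2, 3, 8])
web_list = cache.get_list("web")
both = cache.get_intersection("search", "web")
```

Under these assumptions, the shared documents are stored once rather than once per list and again in a separate intersection cache, which is one way to read the abstract's claim of handling cache space more efficiently.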

This work was partially supported by EU-IRSES project EUSACOU 247574, by EU FET project MULTIPLEX 317532 and by UBACyT Project 20020120100058 “Herramientas algorítmicas avanzadas para aplicaciones de búsqueda en Internet - Parte 2”.



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tolosa, G., Becchetti, L., Feuerstein, E., Marchetti-Spaccamela, A. (2014). Performance Improvements for Search Systems Using an Integrated Cache of Lists+Intersections. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_22

  • DOI: https://doi.org/10.1007/978-3-319-11918-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11917-5

  • Online ISBN: 978-3-319-11918-2

  • eBook Packages: Computer Science, Computer Science (R0)
