Performance Improvements for Search Systems Using an Integrated Cache of Lists+Intersections

Tolosa, Gabriel; Becchetti, Luca; Feuerstein, Esteban; Marchetti-Spaccamela, Alberto

doi:10.1007/978-3-319-11918-2_22

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8799)

Abstract

Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent, or costly data used in the past. In this study we propose and evaluate a static cache that works simultaneously as a list cache and an intersection cache, offering a more efficient way of handling cache space. In addition, we propose effective strategies to select the term pairs that should populate the cache. Simulations using two datasets and a real query log reveal that the proposed approach improves overall performance in terms of total processing time, achieving savings of up to 40% in the best case.
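
To make the idea of one structure serving both roles concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the data layout, so the three-part entry (postings exclusive to each term plus the shared postings), the names `PairEntry`, `IntegratedCache`, `add_pair`, `get_list`, and `get_intersection`, and the restriction that each term belongs to at most one cached pair are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PairEntry:
    """Hypothetical cache entry for a term pair; every part is a sorted run of doc IDs."""
    only_t1: Tuple[int, ...]  # documents containing t1 but not t2
    only_t2: Tuple[int, ...]  # documents containing t2 but not t1
    both: Tuple[int, ...]     # documents containing both terms (the intersection)


def merge_sorted(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Merge two disjoint sorted sequences into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


class IntegratedCache:
    """Static cache answering both posting-list and intersection lookups (illustrative)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], PairEntry] = {}
        self._pair_of: Dict[str, Tuple[str, str]] = {}  # term -> key of its pair

    def add_pair(self, t1: str, list1: Sequence[int],
                 t2: str, list2: Sequence[int]) -> None:
        # Assumption of this sketch: a term may appear in at most one cached pair.
        if t1 in self._pair_of or t2 in self._pair_of:
            raise ValueError("each term may belong to only one cached pair")
        s1, s2 = set(list1), set(list2)
        self._entries[(t1, t2)] = PairEntry(
            only_t1=tuple(sorted(s1 - s2)),
            only_t2=tuple(sorted(s2 - s1)),
            both=tuple(sorted(s1 & s2)),
        )
        self._pair_of[t1] = (t1, t2)
        self._pair_of[t2] = (t1, t2)

    def get_list(self, term: str) -> Optional[List[int]]:
        """Rebuild the full posting list of a term, or return None on a miss."""
        key = self._pair_of.get(term)
        if key is None:
            return None
        entry = self._entries[key]
        own = entry.only_t1 if term == key[0] else entry.only_t2
        return merge_sorted(own, entry.both)

    def get_intersection(self, t1: str, t2: str) -> Optional[List[int]]:
        """Return the stored intersection for the pair, in either term order."""
        entry = self._entries.get((t1, t2))
        if entry is None:
            entry = self._entries.get((t2, t1))
        return None if entry is None else list(entry.both)
```

Under these assumptions, an entry for lists L1 and L2 with intersection I stores |L1| + |L2| - |I| postings, whereas keeping the two lists and their intersection in separate caches would store |L1| + |L2| + |I| for the same data.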

This work was partially supported by EU-IRSES project EUSACOU 247574, by EU FET project MULTIPLEX 317532 and by UBACyT Project 20020120100058 “Herramientas algorítmicas avanzadas para aplicaciones de búsqueda en Internet - Parte 2”.

Author information

Authors and Affiliations

University of Buenos Aires, Argentina
Gabriel Tolosa & Esteban Feuerstein
National University of Luján, Argentina
Gabriel Tolosa
Sapienza University of Rome, Italy
Luca Becchetti & Alberto Marchetti-Spaccamela

Editor information

Editors and Affiliations

Instituto de Computação, Universidade Federal do Amazonas, 6200, Manaus, Brazil
Edleno Moura
King’s College London, UK
Maxime Crochemore

About this paper

Cite this paper

Tolosa, G., Becchetti, L., Feuerstein, E., Marchetti-Spaccamela, A. (2014). Performance Improvements for Search Systems Using an Integrated Cache of Lists+Intersections. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_22

DOI: https://doi.org/10.1007/978-3-319-11918-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11917-5
Online ISBN: 978-3-319-11918-2
eBook Packages: Computer Science, Computer Science (R0)
