Advertisement

Information Retrieval Journal

, Volume 20, Issue 3, pp 172–198 | Cite as

Performance improvements for search systems using an integrated cache of lists + intersections

  • Gabriel TolosaEmail author
  • Esteban Feuerstein
  • Luca Becchetti
  • Alberto Marchetti-Spaccamela
Information Retrieval Efficiency

Abstract

Modern information retrieval systems use several levels of caching to speedup computation by exploiting frequent, recent or costly data used in the past. Previous studies show that the use of caching techniques is crucial in search engines, as it helps reducing query response times and processing workloads on search servers. In this work we propose and evaluate a static cache that acts simultaneously as list and intersection cache, offering a more efficient way of handling cache space. We also use a query resolution strategy that takes advantage of the existence of this cache to reorder the query execution sequence. In addition, we propose effective strategies to select the term pairs that should populate the cache. We also represent the data in cache in both raw and compressed forms and evaluate the differences between them using different configurations of cache sizes. The results show that the proposed Integrated Cache outperforms the standard posting lists cache in most of the cases, taking advantage not only of the intersection cache but also the query resolution strategy.

Keywords

Information retrieval systems Caching Integrated cache Performance improvement 

Notes

Acknowledgements

This work was partially supported by EU-IRSES project EUSACOU 247574, by EU FET project MULTIPLEX 317532 and by UBACyT Project 20020120100058 “Herramientas algorítmicas avanzadas para aplicaciones de búsqueda en Internet—Parte 2”.

References

  1. Anh, V. N., & Moffat, A. (2005). Inverted index compression using word-aligned binary codes. Information Retrieval, 8(1), 151–166.CrossRefGoogle Scholar
  2. Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval: The concepts and technology behind search (2nd ed.). Reading, MA: Addison-Wesley Prof., Inc.Google Scholar
  3. Baeza-Yates, R., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., & Silvestri, F. (2007). The impact of caching on search engines. In Proceedings of the 30th international conference on research and development in information retrieval, SIGIR ’14.Google Scholar
  4. Barroso, L. A., Dean, J., & Hölzle, U. (2003). Web search for a planet: The google cluster architecture. IEEE Micro, 23(2), 22–28.CrossRefGoogle Scholar
  5. Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. In Proceedings of the 12th international conference on information and knowledge management, CIKM ’03. (pp 426–434). New York, NY: ACM.Google Scholar
  6. Cambazoglu, B. B., Zaragoza, H., Chapelle, O., Chen, J., Liao, C., Zheng, Z., et al. (2010). Early exit optimizations for additive machine learned ranking systems. In Proceedings of the 3rd ACM international conference on Web search and data mining, WSDM ’10.Google Scholar
  7. Cao, P., & Irani, S. (1997). Cost-aware www proxy caching algorithms. In Proceedings of the USENIX symposium on internet technologies and systems on USENIX symposium on internet technologies and systems, USITS’97 (pp. 18–18). Berkeley, CA, USA: USENIX Association.Google Scholar
  8. Catena, M., Macdonald, C., & Ounis, I. (2014). On inverted index compression for search engine efficiency. In M. de Rijke, T. Kenter, A. de Vries, C. Zhai, F. de Jong, K. Radinsky, & K. Hofmann (eds) Advances in information retrieval, lecture notes in computer science (Vol. 8416, pp. 359–371). New York: Springer.Google Scholar
  9. Culpepper, J. S., & Moffat, A. (2007). Compact set representation for information retrieval. In Proceedings of the 14th International Conference on String Processing and Information Retrieval, SPIRE’07 (pp. 137–148). Berlin: Heidelberg.Google Scholar
  10. Dean, J. (2009). Challenges in building large-scale information retrieval systems: Invited talk. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM ’09 (pp. 1–1). New York, NY, USA: ACM.Google Scholar
  11. Ding, S., Attenberg, J., Baeza-Yates, R., & Suel, T. (2011). Batch query processing for web search engines. In Proceedings of the 4th ACM international conference on Web search and data mining, WSDM ’11 (pp. 137–146). New York, NY, USA.Google Scholar
  12. Edmonds, J. (1965). Maximum matching and a polyhedron with \(0,1\) vertices. Journal of Research of the National Bureau of Standards, 69(B), 125–130.MathSciNetCrossRefzbMATHGoogle Scholar
  13. Elias, P. (2006). Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2), 194–203.MathSciNetCrossRefzbMATHGoogle Scholar
  14. Fagni, T., Perego, R., Silvestri, F., & Orlando, S. (2006). Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Transactions on Information Systems, 24(1), 51–78.CrossRefGoogle Scholar
  15. Feuerstein, E., & Tolosa, G. (2013). Analysis of cost-aware policies for intersection caching in search nodes. In Proceedings of the XXXII conference of the Chilean society of computer science, SCCC ’13 Google Scholar
  16. Feuerstein, E., & Tolosa, G. (2014). Cost-aware intersection caching and processing strategies for in-memory inverted indexes. In Proceedings of 11th workshop on large-scale and distributed systems for information retrieval, LSDS-IR ’14. New York.Google Scholar
  17. Gan, Q., & Suel, T. (2009). Improved techniques for result caching in web search engines. In Proceedings of the 18th international conference on World wide web, WWW ’09, pp. 431–440.Google Scholar
  18. Goldstein, J., Ramakrishnan, R., & Shaft, U. (1998). Compressing relations and indexes. In Proceedings of the 14th international conference on data engineering, ICDE ’98 (pp 370–379). Washington, DC, USA: IEEE Computer Society.Google Scholar
  19. Golomb, S. (1966). Run-length encodings. In IEEE transactions on information theory (Vol. 12, No. 3).Google Scholar
  20. Hirai, J., Raghavan, S., Garcia-Molina, H., & Paepcke, A. (2000). Webbase: A repository of web pages. In Proceedings of the 9th international World Wide Web conference on computer networks. Amsterdam: North-Holland Publishing Co.Google Scholar
  21. Jonassen, S., Cambazoglu, B. B., & Silvestri, F. (2012). Prefetching query results and its impact on search engines. In Proceedings of the 35th international conference on research and development in information retrieval, USA, SIGIR ’12, pp. 631–640.Google Scholar
  22. Lam, H. T., Perego, R., Quan, N. T., & Silvestri, F. (2009). Entry pairing in inverted file. In Proceedings of the 10th international conference on Web information systems engineering, WISE ’09 (pp. 511–522). Berlin: Springer.Google Scholar
  23. Levene, H. (1960). Contributions to probability and statistics: Essays in honor of harold hotelling. Redwood City: Stanford University Press.Google Scholar
  24. Long, X., & Suel, T. (2005). Three-level caching for efficient query processing in large web search engines. In Proceedings of the 14th international conference on World Wide Web, USA, WWW ’05, pp 257–266.Google Scholar
  25. Macdonald, C., Ounis, I., & Tonellotto, N. (2011). Upper-bound approximations for dynamic pruning. ACM Transactions on Information Systems, 29(4), 17:1–17:28.CrossRefGoogle Scholar
  26. Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York, NY, USA: Cambridge University Press.CrossRefzbMATHGoogle Scholar
  27. Markatos, E. (2001). On caching search engine query results. Computer Communications, 24(2), 137–143.CrossRefGoogle Scholar
  28. Melink, S., Raghavan, S., Yang, B., & Garcia-Molina, H. (2001). Building a distributed full-text index for the web. ACM Transactions on Information Systems, 19(3), 217–241.CrossRefGoogle Scholar
  29. Ozcan, R., Altingovde, I. S., & Ulusoy, O. (2011). Cost-aware strategies for query result caching in web search engines. ACM Transactions on the Web, 5(2), 9:1–9:25.CrossRefGoogle Scholar
  30. Ozcan, R., Sengor Altingovde, I., Barla Cambazoglu, B., Junqueira, F. P., & Ulusoy, O. (2012). A five-level static cache architecture for web search engines. Information Processing & Management, 48(5), 828–840.CrossRefGoogle Scholar
  31. Pass, G., Chowdhury, A., & Torgeson, C. (2006). A picture of search. In Proceedings of the 1st international conference on scalable information systems. ACM, InfoScale ’06.Google Scholar
  32. Saraiva, P. C., Silva de Moura, E., Ziviani, N., Meira, W., Fonseca, R., & Riberio-Neto, B. (2001). Rank-preserving two-level caching for scalable search engines. In Proceedings of the 24th international conference on research and development in information retrieval, USA, SIGIR ’01, pp. 51–58.Google Scholar
  33. Skobeltsyn, G., Junqueira, F., Plachouras, V., & Baeza-Yates, R. (2008). Resin: A combination of results caching and index pruning for high-performance web search engines. In Proceedings of the 31th international conference on research and development in information retrieval, SIGIR ’08 (pp. 131–138). New York, NY, USA: ACM.Google Scholar
  34. Tolosa, G., Becchetti, L., Feuerstein, E., & Marchetti-Spaccamela, A. (2014). Performance improvements for search systems using an integrated cache of lists+intersections. In Proceedings of the 21st international symposium on string processing and information retrieval, SPIRE 2014 (Vol. 8799, pp. 227–235). New York, NY, USA: Springer.Google Scholar
  35. Turtle, H., & Flood, J. (1995). Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6), 831–850.CrossRefGoogle Scholar
  36. Webber, W., & Moat, A. (2005). In search of reliable retrieval experiments. In Proceedings of the 10th Australasian document computing symposium, ADCS’05, pp. 26–33.Google Scholar
  37. Williams, H. E., & Zobel, J. (1999). Compressing integers for fast file access. The Computer Journal, 42, 193–201.CrossRefGoogle Scholar
  38. Witten, I. H., Moffat, A., & Bell, T. C. (1999). Managing gigabytes (2nd Ed.). San Francisco, CA, USA: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers Inc.Google Scholar
  39. Young, N. E. (1998). On-line file caching. In Proceedings of the 9th annual ACM-SIAM symposium on discrete algorithms, society for industrial and applied mathematics. Philadelphia, PA, USA, SODA ’98, pp 82–86.Google Scholar
  40. Zhang, J., Long, X., & Suel, T. (2008). Performance of compressed inverted list caching in search engines. In Proceedings of the 17th international conference on World Wide Web, USA, WWW ’08, pp. 387–396.Google Scholar
  41. Zobel, J., & Moffat, A. (2006). Inverted files for text search engines. ACM Computer Surveys, 38(2), Article No 6.Google Scholar
  42. Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22nd international conference on data engineering, ICDE ’06 (pp. 59–70). Washington, DC, USA: IEEE Computer Society.Google Scholar

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. 1.University of Buenos AiresBuenos AiresArgentina
  2. 2.National University of LujánBuenos AiresArgentina
  3. 3.Sapienza University of RomeRomeItaly

Personalised recommendations