
Measurement Techniques and Caching Effects

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)


Abstract

Overall query execution time consists of the time spent transferring data from disk to memory and the time spent performing actual computation. Any measurement of overall time on a given hardware configuration aggregates these two costs, which makes it hard to reproduce results and to infer which of the two costs is actually affected by a proposed modification. In this paper we show that repeated submissions of the same query provide a means to estimate the computational fraction of overall query execution time. The advantage of separate measurements is exemplified for a particular optimization that, as it turns out, reduces computational costs only. Finally, by exchanging repeated query terms with surrogates that have similar document frequency, we are able to measure the natural caching effects that arise as a consequence of term repetitions in query logs.
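The two measurement ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `run_query`, `estimate_time_split`, `replace_repeated_terms`, and the `doc_freq` mapping are hypothetical names introduced here, and the sketch assumes that the first submission starts with the relevant index data on disk and that immediate resubmissions are served from memory. Under those assumptions, the time of a repeated submission approximates the computational cost, and the remainder of the first submission is attributed to data transfer. The second function swaps every repeated query term for an unused term of similar document frequency, so that a log with and without natural term repetitions can be timed.

```python
import time
from statistics import median


def estimate_time_split(run_query, query, repeats=5):
    """Time a first submission, then `repeats` immediate resubmissions.

    Assumption: the first run pays disk transfer plus computation, while
    resubmissions find their data in memory and pay computation only.
    """
    start = time.perf_counter()
    run_query(query)
    first = time.perf_counter() - start

    repeated = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        repeated.append(time.perf_counter() - start)

    # Never report more computation than the first run took in total.
    compute = min(median(repeated), first)
    transfer = first - compute
    fraction = compute / first if first > 0 else 0.0
    return {"total": first, "compute": compute,
            "transfer": transfer, "compute_fraction": fraction}


def replace_repeated_terms(queries, doc_freq):
    """Replace each term already seen in the log with an unused surrogate.

    `queries` is a list of term lists; `doc_freq` maps every term to its
    document frequency. Surrogates are drawn from terms that never occur
    in the log, choosing the one closest in document frequency.
    """
    log_terms = {term for query in queries for term in query}
    spare = sorted((t for t in doc_freq if t not in log_terms),
                   key=lambda t: (doc_freq[t], t))
    seen = set()
    rewritten = []
    for query in queries:
        new_query = []
        for term in query:
            if term in seen and spare:
                best = min(spare,
                           key=lambda t: abs(doc_freq[t] - doc_freq[term]))
                spare.remove(best)  # each surrogate is used only once
                new_query.append(best)
            else:
                seen.add(term)
                new_query.append(term)
        rewritten.append(new_query)
    return rewritten
```

Timing the original log against the rewritten one, on the same system and from the same starting cache state, would then give an estimate of how much of the observed speed comes from term repetitions. How the starting cache state is established is left open here, since it depends on the operating system and hardware.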




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pohl, S., Moffat, A. (2009). Measurement Techniques and Caching Effects. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_68


  • DOI: https://doi.org/10.1007/978-3-642-00958-7_68

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
