Experiments on Adaptive Set Intersections for Text Retrieval Systems

  • Conference paper
Algorithm Engineering and Experimentation (ALENEX 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2153)

Abstract

In [3] we introduced an adaptive algorithm for computing the intersection of k sorted sets within a factor of at most 8k comparisons of the information-theoretic lower bound, under a model that deals with an encoding of the shortest proof of the answer. This adaptive algorithm performs better on “burstier” inputs than a straightforward worst-case optimal method. Indeed, we have shown that, subject to a reasonable measure of instance difficulty, the algorithm adapts optimally up to a constant factor. This paper explores how the algorithm behaves under actual data distributions, compared with standard algorithms. We present experiments on searching 114 megabytes of text from the World Wide Web using 5,000 actual user queries from a commercial search engine. The experiments show that the theoretically optimal adaptive algorithm is not always optimal in practice, given the distribution of WWW text data. We then study several techniques for improving the standard algorithms. These techniques combine improvements suggested by the observed distribution of the data with the theoretical results from [3]. We perform controlled experiments on these techniques to determine which ones improve performance, and arrive at an algorithm that outperforms existing algorithms in most cases.
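
To make the setting concrete, the following is a minimal Python sketch of intersecting k sorted sets with a doubling ("galloping") search, which is one common way to benefit from bursty inputs. It is not the algorithm of [3] and not the improved algorithm evaluated in the paper; the function names `gallop` and `intersect_sorted` and the round-robin order are assumptions made purely for illustration, and the inputs are assumed to be sorted lists without duplicates.

```python
from bisect import bisect_left


def gallop(seq, target, lo):
    """Return the smallest index i >= lo with seq[i] >= target (len(seq) if none).

    Assumes every element before index lo is smaller than target.
    The step doubles until it overshoots, then the last gap is binary-searched.
    """
    step = 1
    hi = lo
    while hi < len(seq) and seq[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    return bisect_left(seq, target, lo, min(hi, len(seq)))


def intersect_sorted(sets):
    """Intersect k sorted, duplicate-free lists (illustrative sketch only)."""
    k = len(sets)
    if k == 0 or any(len(s) == 0 for s in sets):
        return []
    if k == 1:
        return list(sets[0])

    pos = [0] * k            # search frontier in each set
    result = []
    candidate = sets[0][0]   # element currently being checked against all sets
    matches = 1              # number of sets known to contain the candidate
    i = 1
    while True:
        s = sets[i]
        pos[i] = gallop(s, candidate, pos[i])
        if pos[i] == len(s):
            break            # s has nothing >= candidate, so nothing more is common
        if s[pos[i]] == candidate:
            matches += 1
            if matches == k:
                result.append(candidate)
                pos[i] += 1
                if pos[i] == len(s):
                    break
                candidate = s[pos[i]]
                matches = 1
        else:
            # The candidate is absent from s; the next larger element takes over.
            candidate = s[pos[i]]
            matches = 1
        i = (i + 1) % k      # visit the sets in round-robin order
    return result


print(intersect_sorted([[1, 3, 5, 7, 9, 11], [3, 4, 5, 6, 11, 20], [0, 3, 11, 12]]))
```

When long runs of one set fall between consecutive elements of another, the doubling search skips each run in a logarithmic number of comparisons, whereas a plain linear merge would touch every element. Whether this pays off on real posting lists is exactly the kind of question the experiments address.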

References

  1. R. Baeza-Yates. Efficient Text Searching. PhD thesis, Department of Computer Science, University of Waterloo, 1989.

  2. Svante Carlsson, Christos Levcopoulos, and Ola Petersson. Sublinear merging and natural mergesort. Algorithmica, 9:629–648, 1993.

  3. Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. Adaptive set intersections, unions, and differences. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 743–752, San Francisco, California, January 2000.

  4. Vladimir Estivill-Castro and Derick Wood. A survey of adaptive sorting algorithms. ACM Computing Surveys, 24(4):441–476, December 1992.

  5. William Frakes and Ricardo Baeza-Yates. Information Retrieval. Prentice Hall, 1992.

  6. F. K. Hwang. Optimal merging of 3 elements with n elements. SIAM Journal on Computing, 9(2):298–320, 1980.

  7. F. K. Hwang and S. Lin. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal on Computing, 1(1):31–39, 1972.

  8. Michael Lesk. “Real world” searching panel at SIGIR 1997. SIGIR Forum, 32(1), Spring 1998.

  9. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Symposium on Discrete Algorithms, pages 319–327, 1990.

  10. Alistair Moffat, Ola Petersson, and Nicholas C. Wormald. A tree-based Mergesort. Acta Informatica, 35(9):775–793, August 1998.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Demaine, E.D., López-Ortiz, A., Munro, J.I. (2001). Experiments on Adaptive Set Intersections for Text Retrieval Systems. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_7

  • DOI: https://doi.org/10.1007/3-540-44808-X_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42560-1

  • Online ISBN: 978-3-540-44808-2

  • eBook Packages: Springer Book Archive
