
Optimized binary search and text retrieval

  • Session 5. Chair: Hava Siegelmann
  • Conference paper
Algorithms — ESA '95 (ESA 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 979)


Abstract

We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as disk data. Indirect binary search means that sorted access to the data is obtained through an array of pointers to the raw data. One immediate application of this algorithm is to improve the retrieval performance of disk databases that are indexed using the suffix array model (also called PAT array). We consider the cost model of magnetic and optical disks and the anticipated knowledge of the expected size of the subproblem produced by reading each disk track. This information is used to devise a modified binary searching algorithm to decrease overall retrieval costs. Both an optimal and a practical algorithm are presented, together with analytical and experimental results. For 100 megabytes of text the practical algorithm costs 60% of the standard binary search cost for the magnetic disk and 65% for the optical disk.
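The indirect binary search described in the abstract can be illustrated with a minimal Python sketch. This is plain binary search over a suffix array, not the optimized algorithm of the paper, and it ignores the disk cost model entirely. The names `build_suffix_array` and `indirect_search` and the probe counter are assumptions introduced here for illustration. Each dereference of a pointer into the raw text is counted as one probe, since that is the access whose cost varies on disk.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by suffix content.

    Fine for a small example; real indices use faster construction methods.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def indirect_search(text, suffix_array, pattern):
    """Binary search through the pointer array into the raw text.

    Returns (position, probes): a text position where `pattern` occurs
    (or -1 if it does not occur), and the number of raw-text accesses made.
    """
    lo, hi = 0, len(suffix_array) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]   # cheap: read a pointer from the index
        probes += 1                 # costly on disk: follow it into the text
        window = text[start:start + len(pattern)]
        if window == pattern:
            return start, probes
        if window < pattern:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


text = "banana"
suffix_array = build_suffix_array(text)
position, probes = indirect_search(text, suffix_array, "nan")
print(position, probes)
```

In this toy run the search follows three pointers into the text before finding the match at position 2. Standard binary search always probes the middle pointer regardless of where the referenced text lives; the paper's contribution is to choose probes with the access cost in mind.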

The authors wish to acknowledge the financial support from the Brazilian CNPq — Conselho Nacional de Desenvolvimento Científico e Tecnológico, Fondecyt Grant No. 1930765, IBM do Brasil, Programa de Cooperación Científica Chile-Brasil de Fundación Andes, and Project RITOS/CYTED. We also wish to acknowledge the fruitful suggestions from an anonymous referee.




Editor information

Paul Spirakis


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barbosa, E.F., Navarro, G., Baeza-Yates, R., Perleberg, C., Ziviani, N. (1995). Optimized binary search and text retrieval. In: Spirakis, P. (eds) Algorithms — ESA '95. ESA 1995. Lecture Notes in Computer Science, vol 979. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60313-1_152


  • DOI: https://doi.org/10.1007/3-540-60313-1_152


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60313-9

  • Online ISBN: 978-3-540-44913-3

  • eBook Packages: Springer Book Archive
