Optimized binary search and text retrieval

  • Eduardo Fernandes Barbosa
  • Gonzalo Navarro
  • Ricardo Baeza-Yates
  • Chris Perleberg
  • Nivio Ziviani
Session 5. Chair: Hava Siegelmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 979)

Abstract

We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as disk data. Indirect binary search means that sorted access to the data is obtained through an array of pointers to the raw data. One immediate application of this algorithm is to improve the retrieval performance of disk databases that are indexed using the suffix array model (also called PAT array). We consider the cost model of magnetic and optical disks and the anticipated knowledge of the expected size of the subproblem produced by reading each disk track. This information is used to devise a modified binary searching algorithm to decrease overall retrieval costs. Both an optimal and a practical algorithm are presented, together with analytical and experimental results. For 100 megabytes of text the practical algorithm costs 60% of the standard binary search cost for the magnetic disk and 65% for the optical disk.
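The idea of indirect binary search with position-dependent access cost can be illustrated with a small sketch. This is not the optimal or practical algorithm from the paper. It is a simplified illustration under stated assumptions: the text sits in memory as a Python string, the hypothetical `seek_cost` function charges the distance between the previous and the next text position as a stand-in for disk head movement, and the hypothetical `window` parameter lets each probe move to the cheapest suffix-array slot near the midpoint instead of the exact midpoint.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def seek_cost(head, pos):
    """Hypothetical cost model: distance the head moves from `head` to `pos`."""
    return abs(head - pos)


def indirect_search(text, sa, pattern, window=0):
    """Find the first slot of `sa` whose suffix is >= pattern.

    The search is indirect: slots of `sa` are compared by following the
    pointer sa[k] into the raw text. With window=0 every probe is the exact
    midpoint (standard binary search). With window > 0 each probe is the
    cheapest slot, under seek_cost, within `window` slots of the midpoint.
    Returns (slot, total_cost).
    """
    lo, hi = 0, len(sa)
    head, total = 0, 0  # assumed initial head position: start of the text
    while lo < hi:
        mid = (lo + hi) // 2
        first = max(lo, mid - window)
        last = min(hi - 1, mid + window)
        # Any probe inside [lo, hi) keeps the search correct; pick the cheapest.
        probe = min(range(first, last + 1), key=lambda k: seek_cost(head, sa[k]))
        pos = sa[probe]
        total += seek_cost(head, pos)
        head = pos
        if text[pos:pos + len(pattern)] < pattern:
            lo = probe + 1
        else:
            hi = probe
    return lo, total


text = "banana"
sa = build_suffix_array(text)
slot_std, cost_std = indirect_search(text, sa, "an")
slot_win, cost_win = indirect_search(text, sa, "an", window=2)
```

Both calls return the same slot, because any probe inside the current interval preserves the binary search invariant; only the accumulated cost can differ. An off-center probe may leave a larger subproblem, so a cheaper probe is not automatically a cheaper search, which is the trade-off the abstract describes.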

Key-words

Optimized binary search, text retrieval, PAT arrays, suffix arrays, magnetic disks, read-only optical disks, CD-ROM disks

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Eduardo Fernandes Barbosa (1)
  • Gonzalo Navarro (2)
  • Ricardo Baeza-Yates (2)
  • Chris Perleberg (2)
  • Nivio Ziviani (1)
  1. Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  2. Departamento de Ciencias de la Computación, Universidad de Chile, Santiago, Chile
