Abstract
We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as disk data. Indirect binary search means that sorted access to the data is obtained through an array of pointers to the raw data. One immediate application of this algorithm is to improve the retrieval performance of disk databases that are indexed using the suffix array model (also called the PAT array). We consider the cost model of magnetic and optical disks, together with advance knowledge of the expected size of the subproblem produced by reading each disk track. This information is used to devise a modified binary search algorithm that decreases overall retrieval costs. Both an optimal and a practical algorithm are presented, together with analytical and experimental results. For 100 megabytes of text, the practical algorithm costs 60% of the standard binary search cost on the magnetic disk and 65% on the optical disk.
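To make the term "indirect binary search" concrete, the following is a minimal sketch of the standard, uniform-cost baseline over a suffix array held in memory. It is not the optimized algorithm of the paper: it always probes the middle pointer and does not model disk tracks or seek costs. The function names and the `probes` counter are hypothetical and introduced only for illustration; on disk, each counted probe would correspond to one access to the raw text.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`,
    ordered lexicographically by suffix (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def indirect_binary_search(text, pointers, pattern):
    """Locate the first suffix whose prefix is >= `pattern`.

    `pointers` is the sorted array of suffix start positions, so every
    comparison goes through a pointer into the raw text.
    Returns (rank, probes): the index into `pointers` and the number
    of raw-text accesses made (an illustrative cost counter).
    """
    lo, hi = 0, len(pointers)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2          # standard choice: always the middle
        start = pointers[mid]         # indirection: pointer -> raw text
        probes += 1
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


def occurs(text, pointers, pattern):
    """Report whether `pattern` appears anywhere in `text`."""
    rank, _ = indirect_binary_search(text, pointers, pattern)
    return rank < len(pointers) and text.startswith(pattern, pointers[rank])


text = "banana"
pointers = build_suffix_array(text)
rank, probes = indirect_binary_search(text, pointers, "ana")
```

In this baseline every probe is treated as equally expensive. The abstract's point is that on disk the probes are not equally expensive, so a search that chooses probe positions with the access cost in mind can reduce the total retrieval cost.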
The authors wish to acknowledge the financial support from the Brazilian CNPq — Conselho Nacional de Desenvolvimento Científico e Tecnológico, Fondecyt Grant No. 1930765, IBM do Brasil, Programa de Cooperación Científica Chile-Brasil de Fundación Andes, and Project RITOS/CYTED. We also wish to acknowledge the fruitful suggestions from an anonymous referee.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbosa, E.F., Navarro, G., Baeza-Yates, R., Perleberg, C., Ziviani, N. (1995). Optimized binary search and text retrieval. In: Spirakis, P. (eds) Algorithms — ESA '95. ESA 1995. Lecture Notes in Computer Science, vol 979. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60313-1_152
DOI: https://doi.org/10.1007/3-540-60313-1_152
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60313-9
Online ISBN: 978-3-540-44913-3
eBook Packages: Springer Book Archive