Optimized binary search and text retrieval
We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as data stored on disk. Indirect binary search means that sorted access to the data is obtained through an array of pointers to the raw data. An immediate application is to improve the retrieval performance of disk databases indexed using the suffix array model (also called PAT arrays). We consider the cost models of magnetic and optical disks, together with advance knowledge of the expected size of the subproblem that results from reading each disk track. This information is used to devise a modified binary search algorithm that decreases overall retrieval costs. Both an optimal and a practical algorithm are presented, together with analytical and experimental results. For 100 megabytes of text, the practical algorithm costs 60% of the standard binary search cost on magnetic disk and 65% on optical disk.
Key words: optimized binary search, text retrieval, PAT arrays, suffix arrays, magnetic disks, read-only optical disks, CD-ROM disks
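To make the term "indirect binary search" concrete, the following is a minimal in-memory sketch of the standard (unoptimized) search over a suffix array, not the cost-optimized algorithm of the paper. The function names `build_suffix_array` and `indirect_search` and the probe counter are illustrative assumptions. Each probe dereferences a pointer into the raw text, and it is this access whose cost would vary with disk position in the setting the abstract describes.

```python
def build_suffix_array(text):
    """Naive construction: text positions sorted by the suffix starting there.
    Adequate for a small illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def indirect_search(text, sa, pattern):
    """Find the range [first, last) of entries in sa whose suffixes start
    with pattern. Returns (first, last, probes), where probes counts how
    many times the raw text was accessed through a pointer."""
    m = len(pattern)
    probes = 0

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1  # one indirect access to the raw text
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return first, lo, probes


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    first, last, probes = indirect_search(text, sa, "an")
    print("match positions:", sorted(sa[first:last]), "probes:", probes)
```

In this sketch every probe is charged the same unit cost. The setting in the abstract differs precisely in that probes have unequal costs, so the choice of probe position need not be the midpoint.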