Proximity matching using fixed-queries trees

  • Ricardo Baeza-Yates
  • Walter Cunto
  • Udi Manber
  • Sun Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 807)

Abstract

We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.
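The abstract does not spell out the structure itself, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes that every node on a given level of the tree is compared against the same fixed key (so a query needs one distance computation per level), that elements are grouped by their distance to that key, and that the distance is symmetric as well as satisfying the triangle inequality. The names `hamming`, `build`, `search`, `WORDS`, and `PIVOTS` are illustrative, and Hamming distance on equal-length strings stands in for whatever expensive distance an application would use.

```python
from typing import Callable, List, Optional, Sequence

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def build(items: Sequence[str], pivots: Sequence[str],
          dist: Callable[[str, str], int], level: int = 0, bucket: int = 2):
    """Build the tree. A leaf is a list of elements; an internal node is a
    dict mapping a distance value to the subtree of elements at that
    distance from pivots[level] (the same pivot for the whole level)."""
    if len(items) <= bucket or level == len(pivots):
        return list(items)
    groups = {}
    for item in items:
        groups.setdefault(dist(pivots[level], item), []).append(item)
    return {d: build(g, pivots, dist, level + 1, bucket) for d, g in groups.items()}

def search(node, query: str, radius: int, pivots: Sequence[str],
           dist: Callable[[str, str], int], level: int = 0,
           qdists: Optional[List[int]] = None) -> List[str]:
    """Return every stored element within `radius` of `query`."""
    if qdists is None:
        # One query-to-pivot comparison per level, shared by all nodes on it.
        qdists = [dist(p, query) for p in pivots]
    if isinstance(node, list):
        # Leaf bucket: verify the surviving candidates directly.
        return [x for x in node if dist(query, x) <= radius]
    found = []
    for d, child in node.items():
        # Triangle inequality: if dist(query, x) <= radius then
        # |dist(pivot, x) - dist(pivot, query)| <= radius, so other branches
        # cannot contain a match and are skipped.
        if abs(d - qdists[level]) <= radius:
            found.extend(search(child, query, radius, pivots, dist, level + 1, qdists))
    return found

# Hypothetical sample data, chosen only for illustration.
WORDS = ["book", "boot", "cook", "cake", "lake", "bake", "look", "loot"]
PIVOTS = ["book", "cake"]
TREE = build(WORDS, PIVOTS, hamming)
```

Calling `search(TREE, "took", 1, PIVOTS, hamming)` returns the words within Hamming distance 1 of "took". In this sketch the pivots are picked by hand; how to choose them well is a design question the sketch does not address.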

Keywords

Distance Function, Triangle Inequality, Suffix Tree, Alphabet Size, Levenshtein Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Ricardo Baeza-Yates (1)
  • Walter Cunto (2)
  • Udi Manber (3)
  • Sun Wu (4)

  1. Dpto. de Ciencias de la Computación, Universidad de Chile, Santiago, Chile
  2. IBM Consulting Group, Aptdo. 64778 & Dpto. de Computación y Tecnología de la Información, Univ. Simón Bolivar, Caracas, Venezuela
  3. Dept. of Computer Science, University of Arizona, Tucson, USA
  4. Dept. of Computer Science, National Chung-Cheng Univ., Ming-Shong, Chia-Yi, Taiwan