Probabilistic Proximity Searching Algorithms Based on Compact Partitions

  • Benjamin Bustos
  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2476)

Abstract

The main bottleneck of the research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult, whatever algorithm is used. A recent trend to break this bottleneck resorts to probabilistic algorithms, where it has been shown that one can find 99% of the elements at a fraction of the cost of the exact algorithm. These algorithms are welcome in most applications because resorting to metric space searching already involves a fuzziness in the retrieval requirements. In this paper we push further in this direction by developing probabilistic algorithms on data structures whose exact versions are the best for high dimensions. As a result, we obtain probabilistic algorithms that are better than the previous ones. We also give new insights on the problem and propose a novel view based on time-bounded searching.
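
As an illustration of the time-bounded view, the following is a minimal sketch and not the algorithm of the paper. It assumes a compact partition in which every element is assigned to its closest center and each zone stores a covering radius, and it assumes that the work bound is a budget of distance computations. The function names, the Euclidean metric, and the closest-center-first ranking of zones are hypothetical choices made only for this example.

```python
import random

def dist(a, b):
    """Euclidean distance; stands in for any metric (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def build_clusters(points, num_centers, rng):
    """Compact partition: each non-center point joins its closest center."""
    center_ids = set(rng.sample(range(len(points)), num_centers))
    zones = [{"center": points[i], "radius": 0.0, "members": []}
             for i in sorted(center_ids)]
    for i, p in enumerate(points):
        if i in center_ids:
            continue
        d, zone = min(((dist(p, z["center"]), z) for z in zones),
                      key=lambda t: t[0])
        zone["members"].append(p)
        zone["radius"] = max(zone["radius"], d)  # covering radius
    return zones

def bounded_range_search(zones, query, r, budget):
    """Range query (query, r) that stops after `budget` distance computations."""
    used, result, ranked = 0, [], []
    for z in zones:
        if used >= budget:
            return result, used
        d = dist(query, z["center"])
        used += 1
        if d <= r:
            result.append(z["center"])
        ranked.append((d, z))
    # Hypothetical ranking: visit zones with the closest center first.
    ranked.sort(key=lambda t: t[0])
    for d_center, z in ranked:
        if d_center - z["radius"] > r:
            continue  # triangle inequality: zone cannot intersect the query ball
        for p in z["members"]:
            if used >= budget:
                return result, used  # work bound reached: answer may be partial
            used += 1
            if dist(query, p) <= r:
                result.append(p)
    return result, used
```

With an unlimited budget the sketch behaves like an exact range search over the partition. With a small budget it returns only the relevant elements found so far, which is the trade of recall for bounded cost that the abstract describes.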

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Benjamin Bustos (2)
  • Gonzalo Navarro (1, 2)
  1. Center for Web Research, Universidad de Chile, Santiago, Chile
  2. Departamento de Ciencias de la Computación, Universidad de Chile, Santiago, Chile
