Probabilistic Proximity Searching Algorithms Based on Compact Partitions
The main bottleneck in metric space searching is the so-called curse of dimensionality, which makes searching some metric spaces intrinsically difficult regardless of the algorithm used. A recent trend to break this bottleneck resorts to probabilistic algorithms, where it has been shown that one can find 99% of the elements an exact search would return at a fraction of the cost of the exact algorithm. Such algorithms are acceptable in most applications because resorting to metric space searching already involves some fuzziness in the retrieval requirements. In this paper we push further in this direction by developing probabilistic algorithms on data structures whose exact versions are the best for high dimensions. As a result, we obtain probabilistic algorithms that outperform the previous ones. We also give new insights into the problem and propose a novel view based on time-bounded searching.
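The abstract does not spell out the algorithms, so the following is only a minimal sketch of what time-bounded searching over a compact partition could look like, not the authors' method. It assumes a simple partition into clusters (a center, a covering radius, and a bucket of members), counts distance computations as the "time" budget, and scans the clusters closest to the query first. The names `build_clusters`, `budgeted_range_search`, and `budget` are hypothetical and introduced here for illustration.

```python
import random


def euclidean(a, b):
    """Euclidean distance, used here as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_clusters(points, num_centers, dist, seed=0):
    """Assumed partition: pick random centers, assign each point to its
    nearest center, and record each cluster's covering radius."""
    rng = random.Random(seed)
    centers = rng.sample(points, num_centers)
    clusters = [{"center": c, "radius": 0.0, "members": []} for c in centers]
    for p in points:
        nearest = min(clusters, key=lambda cl: dist(p, cl["center"]))
        nearest["members"].append(p)
        nearest["radius"] = max(nearest["radius"], dist(p, nearest["center"]))
    return clusters


def budgeted_range_search(clusters, query, radius, dist, budget):
    """Range search that stops after at most `budget` distance computations.

    Returns (results, used). With a large enough budget the answer is exact;
    with a small one it is a subset of the exact answer.
    """
    used = 0
    ranked = []
    for cl in clusters:
        if used >= budget:
            break
        ranked.append((dist(query, cl["center"]), cl))
        used += 1
    # Heuristic ordering: clusters whose center is closer to the query first.
    ranked.sort(key=lambda pair: pair[0])

    results = []
    for d_center, cl in ranked:
        # Exact pruning via the triangle inequality: no member of this
        # cluster can lie within `radius` of the query.
        if d_center - cl["radius"] > radius:
            continue
        for p in cl["members"]:
            if used >= budget:
                return results, used  # budget exhausted, return partial answer
            used += 1
            if dist(query, p) <= radius:
                results.append(p)
    return results, used


if __name__ == "__main__":
    rng = random.Random(1)
    data = [tuple(rng.random() for _ in range(4)) for _ in range(200)]
    index = build_clusters(data, 10, euclidean)
    found, cost = budgeted_range_search(
        index, (0.5, 0.5, 0.5, 0.5), 0.4, euclidean, budget=50
    )
    print(len(found), "elements found using", cost, "distance computations")
```

In this sketch, raising `budget` trades more distance computations for a larger fraction of the exact answer, which is the kind of cost versus recall trade-off the abstract alludes to. The cluster ordering is a heuristic chosen for the example.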