On the Least Cost for Proximity Searching in Metric Spaces
Proximity searching consists of retrieving from a database those elements that are similar to a query. As the distance is usually expensive to compute, the goal is to use as few distance computations as possible to satisfy queries. Indexes use precomputed distances among database elements to speed up queries. A natural baseline is AESA, which stores all the distances among database objects; despite its quadratic space requirement, it has remained unbeaten in query performance for 20 years. In this paper we show that it is possible to improve upon AESA by using a radically different method to select promising database elements to compare against the query. Our experiments show improvements of up to 75% in document databases. We also explore the use of our method as a probabilistic algorithm that may lose relevant answers. On a database of faces where any exact algorithm must examine virtually all elements, our probabilistic version obtains 85% of the correct answers by scanning only 10% of the database.
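For readers unfamiliar with the baseline, the following is a minimal sketch of a classic AESA-style range query, written under stated assumptions. It is not the new selection method proposed in this paper. The function names `build_aesa_matrix` and `aesa_range_query` are hypothetical, and the distance is assumed to be any metric (here Euclidean distance via `math.dist`). The sketch precomputes the full distance matrix, then repeatedly compares the query against the surviving candidate with the smallest lower bound. It uses the triangle inequality, |d(q,p) − d(p,u)| ≤ d(q,u), to discard candidates that cannot lie within radius r.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_aesa_matrix(db: Sequence[T], dist: Callable[[T, T], float]) -> List[List[float]]:
    """Precompute all pairwise distances (the quadratic-space AESA index)."""
    n = len(db)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = dist(db[i], db[j])
    return m


def aesa_range_query(
    db: Sequence[T],
    matrix: List[List[float]],
    dist: Callable[[T, T], float],
    q: T,
    r: float,
) -> Tuple[List[int], int]:
    """Return (sorted indices within distance r of q, number of distance evaluations)."""
    n = len(db)
    lower = [0.0] * n          # best known lower bound on d(q, u) for each element
    alive = set(range(n))      # candidates not yet compared or discarded
    result: List[int] = []
    evals = 0
    while alive:
        # Classic AESA heuristic: compare next the candidate with the smallest lower bound.
        p = min(alive, key=lambda i: lower[i])
        alive.remove(p)
        dqp = dist(q, db[p])   # the only place a query distance is computed
        evals += 1
        if dqp <= r:
            result.append(p)
        # Tighten lower bounds using the precomputed distances d(p, u).
        for u in list(alive):
            lower[u] = max(lower[u], abs(dqp - matrix[p][u]))
            if lower[u] > r:   # u provably lies outside the query ball
                alive.discard(u)
    return sorted(result), evals
```

For example, with 2D points and `math.dist`, `aesa_range_query(db, build_aesa_matrix(db, math.dist), math.dist, (0.0, 0.0), 1.0)` returns the same answers as a linear scan while typically evaluating fewer query distances. The rule for picking the next element to compare is exactly the step that the abstract says this paper replaces with a different method.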
Keywords: Face Image, Exact Algorithm, Distance Computation, Range Query, Probabilistic Algorithm