Efficient Similarity Search in Metric Spaces with Cluster Reduction
Clustering-based methods for searching in metric spaces partition the space into a set of disjoint clusters. When solving a query, some clusters are discarded without comparing their objects with the query object, and clusters that cannot be discarded are searched exhaustively. In this paper we propose a new strategy and algorithms for clustering-based methods that avoid the exhaustive search within clusters that cannot be discarded, at the cost of some extra information in the index. This strategy progressively reduces a cluster until it can be discarded from the result. We refer to this approach as cluster reduction. We present algorithms for range and kNN search. An experimental evaluation with synthetic and real collections shows that the search cost can be reduced by approximately 13% to 25% with respect to existing methods.
Keywords: similarity search, metric spaces, cluster reduction
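To make the baseline described in the abstract concrete, the following is a minimal sketch of a conventional clustering-based range search: a cluster is discarded when the triangle inequality shows that none of its objects can lie within the query radius, and every cluster that survives is scanned exhaustively. It is not the cluster reduction algorithm proposed in the paper, whose details are not given here. The covering-radius discard criterion, the Euclidean distance, and all names (`Cluster`, `build_clusters`, `range_search`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Cluster:
    center: Point
    members: List[Point] = field(default_factory=list)  # objects other than the center
    radius: float = 0.0  # covering radius: max distance from center to a member


def build_clusters(points: List[Point], centers: List[Point],
                   dist: Distance = euclidean) -> List[Cluster]:
    """Assign each non-center object to its nearest center (hypothetical build step)."""
    clusters = [Cluster(center=c) for c in centers]
    for p in points:
        nearest = min(clusters, key=lambda c: dist(p, c.center))
        nearest.members.append(p)
        nearest.radius = max(nearest.radius, dist(p, nearest.center))
    return clusters


def range_search(clusters: List[Cluster], q: Point, r: float,
                 dist: Distance = euclidean) -> Tuple[List[Point], int]:
    """Return the objects within distance r of q and the number of distance evaluations."""
    result: List[Point] = []
    evaluations = 0
    for c in clusters:
        d_qc = dist(q, c.center)
        evaluations += 1
        if d_qc <= r:
            result.append(c.center)
        # Triangle inequality: every member p satisfies d(q, p) >= d(q, center) - radius,
        # so the whole cluster can be discarded without touching its members.
        if d_qc - c.radius > r:
            continue
        # The cluster cannot be discarded: exhaustive scan of its members.
        for p in c.members:
            evaluations += 1
            if dist(q, p) <= r:
                result.append(p)
    return result, evaluations


centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0), (10.0, 12.0)]
clusters = build_clusters(points, centers)
found, cost = range_search(clusters, (0.5, 0.0), 1.0)
```

In this toy collection the second cluster is discarded after a single distance evaluation, while the first is scanned in full. That exhaustive scan of non-discardable clusters is the cost the paper's strategy aims to reduce.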