Skip to main content

Efficient Similarity Search in Metric Spaces with Cluster Reduction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7404))

Abstract

Clustering-based methods for searching in metric spaces partition the space into a set of disjoint clusters. When solving a query, some clusters are discarded without comparing them with the query object, and clusters that can not be discarded are searched exhaustively. In this paper we propose a new strategy and algorithms for clustering-based methods that avoid the exhaustive search within clusters that can not be discarded, at the cost of some extra information in the index. This new strategy is based on progressively reducing the cluster until it can be discarded from the result. We refer to this approach as cluster reduction. We present the algorithms for range and kNN search. The results obtained in an experimental evaluation with synthetic and real collections show that the search cost can be reduced by a 13% - 25% approximately with respect to existing methods.

This work has been partially funded by “Ministerio de Ciencia y Innovación” (PGE and FEDER) refs. TIN2009-14560-C03-02, TIN2010-21246-C02-01, and ref. AP2010-6038 (FPU Program) for Alberto Ordóñez Pereira, and by “Xunta de Galicia” refs. 2010/17 (Fondos FEDER), and 10SIN028E.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   49.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Computing Surveys 33, 273–321 (2001)

    Article  Google Scholar 

  2. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity search. The metric space approach. Advances in Database Systems, vol. 32. Springer (2006)

    Google Scholar 

  3. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Transactions on Database Systems 28(4), 517–580 (2006)

    Article  Google Scholar 

  4. Kalantari, I., McDonald, G.: A data structure and an algorithm for the nearest point problem. IEEE Transactions on Software Engineering 9, 631–634 (1983)

    Article  MATH  Google Scholar 

  5. Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40, 175–179 (1991)

    Article  MATH  Google Scholar 

  6. Brin, S.: Near neighbor search in large metric spaces. In: Procs. of Conf. on Very Large Databases (VLDB 1995), pp. 574–584. Morgan Kaufmann Publishers (1995)

    Google Scholar 

  7. Dehne, F., Noltemeier, H.: Voronoi trees and clustering problems. Information Systems 12(2), 171–175 (1987)

    Article  Google Scholar 

  8. Navarro, G.: Searching in metric spaces by spatial approximation. In: Procs. of String Processing and Information Retrieval (SPIRE 1999), pp. 141–148. IEEE CS Press (1999)

    Google Scholar 

  9. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Procs. of Conf. on Very Large Databases (VLDB 1997), pp. 426–435. ACM Press (1997)

    Google Scholar 

  10. Traina Jr., C., Traina, A.J.M., Seeger, B., Faloutsos, C.: Slim-Trees: High Performance Metric Trees Minimizing Overlap between Nodes. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT 2000. LNCS, vol. 1777, pp. 51–65. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  11. Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26(9), 1363–1376 (2005)

    Article  Google Scholar 

  12. Bozkaya, T., Ozsoyoglu, M.: Distance-based indexing for high-dimensional metric spaces. In: Proc. of the ACM Conf. on Management of Data (SIGMOD 1997), pp. 357–368. ACM Press (1997)

    Google Scholar 

  13. Novak, D., Batko, M., Zezula, P.: Metric index: An efficient and scalable solution for precise and approximate similarity search. Information Systems 36(4), 721–733 (2009)

    Article  Google Scholar 

  14. Skopal, T., Pokorný, J., Snásel, V.: Pm-tree: Pivoting metric tree for similarity search in multimedia databases. In: Procs. of Advances in Database Systems (ADBIS 2004), Local Procs., pp. 803–815 (2004)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ares, L.G., Brisaboa, N.R., Ordóñez Pereira, A., Pedreira, O. (2012). Efficient Similarity Search in Metric Spaces with Cluster Reduction. In: Navarro, G., Pestov, V. (eds) Similarity Search and Applications. SISAP 2012. Lecture Notes in Computer Science, vol 7404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32153-5_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-32153-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32152-8

  • Online ISBN: 978-3-642-32153-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics