On Fast Non-metric Similarity Search by Metric Access Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3896))

Abstract

The retrieval of objects from a multimedia database employs a measure which defines a similarity score for every pair of objects. The measure should effectively follow the nature of similarity; hence, it should not be limited by the triangular inequality, which is regarded as a restriction in similarity modeling. On the other hand, the retrieval should be as efficient (or fast) as possible. The measure is thus often restricted to a metric, because then the search can be handled by metric access methods (MAMs). In this paper we propose a general method of non-metric search by MAMs. We show that the triangular inequality can be enforced for any semimetric (a reflexive, non-negative and symmetric measure), resulting in a metric that preserves the original similarity orderings (retrieval effectiveness). We propose the TriGen algorithm for turning any black-box semimetric into an (approximated) metric, using only the distance distribution in a fraction of the database. The algorithm finds a metric for which the retrieval efficiency is maximized, considering any MAM.
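The idea of enforcing the triangular inequality on a black-box semimetric can be illustrated with a minimal Python sketch. It is not the TriGen algorithm itself, only a simplified illustration under stated assumptions: the semimetric is the squared Euclidean distance, the modifier family is the power function d^w with 0 < w <= 1 (increasing, so similarity orderings are kept), and the search for w is a plain bisection over a small sample of objects. All function names (`triangle_error`, `power_modifier`, `find_exponent`) are hypothetical and do not come from the paper.

```python
import itertools
import random


def sq_euclid(a, b):
    """Squared Euclidean distance: reflexive, non-negative, symmetric,
    but it violates the triangular inequality (a semimetric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triangle_error(dist, sample, modifier=lambda d: d):
    """Fraction of sampled triplets whose (modified) distances
    break the triangular inequality."""
    bad = total = 0
    for a, b, c in itertools.combinations(sample, 3):
        x, y, z = sorted(modifier(d) for d in (dist(a, b), dist(b, c), dist(a, c)))
        total += 1
        if x + y < z - 1e-12:  # small slack for floating-point rounding
            bad += 1
    return bad / total


def power_modifier(w):
    """Assumed modifier family: d -> d**w, increasing and concave for 0 < w <= 1."""
    return lambda d: d ** w


def find_exponent(dist, sample, tolerance=0.0, steps=20):
    """Bisection for the largest w in (0, 1] whose triangle error on the
    sample does not exceed the tolerance (hypothetical helper)."""
    if triangle_error(dist, sample) <= tolerance:
        return 1.0  # the measure already behaves as a metric on the sample
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if triangle_error(dist, sample, power_modifier(mid)) <= tolerance:
            lo = mid  # still metric-like: try a less concave modifier
        else:
            hi = mid
    return lo


rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(30)]  # stand-in for a database fraction
w = find_exponent(sq_euclid, sample)
print("triangle error before:", triangle_error(sq_euclid, sample))
print("chosen exponent w:", w)
print("triangle error after:", triangle_error(sq_euclid, sample, power_modifier(w)))
```

The sketch keeps the exponent as large as the sample allows, on the assumption that a less concave modifier distorts the distance distribution less and so leaves more for a MAM to prune. Because only a sample is inspected, the result is an approximated metric: triplets outside the sample may still violate the inequality.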

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Skopal, T. (2006). On Fast Non-metric Similarity Search by Metric Access Methods. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_43

  • DOI: https://doi.org/10.1007/11687238_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
