On Fast Non-metric Similarity Search by Metric Access Methods

  • Tomáš Skopal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

The retrieval of objects from a multimedia database employs a measure that defines a similarity score for every pair of objects. The measure should effectively follow the nature of similarity; hence, it should not be limited by the triangular inequality, which is regarded as a restriction in similarity modeling. On the other hand, the retrieval should be as efficient (or fast) as possible. The measure is thus often restricted to a metric, because then the search can be handled by metric access methods (MAMs). In this paper we propose a general method of non-metric search by MAMs. We show that the triangular inequality can be enforced for any semimetric (a reflexive, non-negative and symmetric measure), resulting in a metric that preserves the original similarity orderings (and thus the retrieval effectiveness). We propose the TriGen algorithm for turning any black-box semimetric into an (approximated) metric, using only the distance distribution in a fraction of the database. The algorithm finds a metric for which the retrieval efficiency is maximized, considering any MAM.
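
The idea of enforcing the triangular inequality while preserving similarity orderings can be illustrated with a small sketch. It is not the TriGen algorithm itself. It assumes a fractional-power modifier f(x) = x^(1/(1+w)), which the abstract does not specify, and it uses squared Euclidean distance as a stand-in for a black-box semimetric. The names `squared_l2`, `fp_modifier`, `triangle_error` and `find_weight`, the grid step, and the choice of the smallest adequate weight as a proxy for retrieval efficiency are all assumptions made for illustration.

```python
import itertools
import random


def squared_l2(a, b):
    """Squared Euclidean distance: reflexive, non-negative, symmetric, but not a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fp_modifier(w):
    """Assumed modifier f(x) = x ** (1 / (1 + w)); concave and strictly increasing, identity at w = 0."""
    return lambda x: x ** (1.0 / (1.0 + w))


def triangle_error(sample, dist, modifier):
    """Fraction of sampled triplets whose modified distances break the triangular inequality."""
    bad = total = 0
    for a, b, c in itertools.combinations(sample, 3):
        x, y, z = sorted(modifier(dist(p, q)) for p, q in ((a, b), (b, c), (a, c)))
        total += 1
        if x + y < z - 1e-12:  # small slack for floating-point rounding
            bad += 1
    return bad / total


def find_weight(sample, dist, tolerance=0.0, step=0.25, max_w=16.0):
    """Smallest weight on a coarse grid whose triangle error is within the tolerance, else None."""
    w = 0.0
    while w <= max_w:
        if triangle_error(sample, dist, fp_modifier(w)) <= tolerance:
            return w
        w += step
    return None


random.seed(0)
# A small sample stands in for "a fraction of the database".
sample = [(random.random(), random.random()) for _ in range(40)]

raw_error = triangle_error(sample, squared_l2, fp_modifier(0.0))
w = find_weight(sample, squared_l2)
modified = fp_modifier(w)

# A strictly increasing modifier leaves the ranking of objects around a query unchanged.
query, others = sample[0], sample[1:]
by_raw = sorted(others, key=lambda o: squared_l2(query, o))
by_modified = sorted(others, key=lambda o: modified(squared_l2(query, o)))
same_order = by_raw == by_modified

print("triangle error of the raw semimetric:", raw_error)
print("chosen weight:", w)
print("ordering preserved:", same_order)
```

The sketch prefers the smallest weight because a more concave modifier compresses the distances together, which tends to make pruning in a MAM less effective; the abstract itself does not state how efficiency is measured.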

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tomáš Skopal: FMP, Department of Software Engineering, Charles University in Prague, Prague 1, Czech Republic