Algorithmic Exploration of Axiom Spaces for Efficient Similarity Search at Large Scale

  • Tomáš Skopal
  • Tomáš Bartoš
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7404)

Abstract

Similarity search is becoming popular in even more disciplines, such as multimedia databases, bioinformatics, social networks, to name a few. The existing indexing techniques often assume the metric space model that could be too restrictive from the domain point of view. Hence, many modern applications that involve complex similarities do not use any indexing and use just sequential search, so they are applicable only to small databases. In this paper we revisit the assumptions which persist in the mainstream research of content-based retrieval. Leaving the traditional indexing paradigms such as the metric space model, our goal is to propose alternative methods for indexing that shall lead to high-performance similarity search. We introduce the design of the algorithmic framework SIMDEX for exploration of analytical properties (axioms) useful for indexing that hold in a given complex similarity space but were not discovered so far. Consequently, the known axioms will be localized as a subset within the universe of all axioms suitable for indexing. Speaking in a hyperbole, for database research the discovery of new axioms valid in some similarity space might have an impact comparable to the discovery of new laws of physics holding in parallel universes.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Barták, R.: Constraint Models for Reasoning on Unification in Inductive Logic Programming. In: Dicheva, D., Dochev, D. (eds.) AIMSA 2010. LNCS, vol. 6304, pp. 101–110. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  2. 2.
    Beecks, C., Uysal, M.S., Seidl, T.: Signature quadratic form distance. In: Proc. ACM International Conference on Image and Video Retrieval, pp. 438–445 (2010)Google Scholar
  3. 3.
    Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Piccioli, T., Rabitti, F.: CoPhIR: a Test Collection for Content-Based Image Retrieval. CoRR, abs/0905.4627v2 (2009)Google Scholar
  4. 4.
    Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)CrossRefGoogle Scholar
  5. 5.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6 (2004)Google Scholar
  6. 6.
    Galgonek, J., Hoksza, D., Skopal, T.: SProt: sphere-based protein structure similarity algorithm. Proteome Science 9, 1–12 (2011)CrossRefGoogle Scholar
  7. 7.
    Hetland, M.L.: Ptolemaic indexing. arXiv:0911.4384 [cs.DS] (2009)Google Scholar
  8. 8.
    Hettich, S., Bay, S.D.: The UCI KDD archive (1999), http://kdd.ics.uci.edu
  9. 9.
    Howarth, P., Rüger, S.: Fractional distance measures for content-based image retrieval. In: 27th European Conference on Information Retrieval, pp. 447–456. Springer (2005)Google Scholar
  10. 10.
    Lämmel, R., Schulte, W.: Controllable Combinatorial Coverage in Grammar-Based Testing. In: Uyar, M.Ü., Duale, A.Y., Fecko, M.A. (eds.) TestCom 2006. LNCS, vol. 3964, pp. 19–38. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  11. 11.
    Lokoč, J., Hetland, M.L., Skopal, T., Beecks, C.: Ptolemaic indexing of the signature quadratic form distance. In: Proceedings of the Fourth International Conference on Similarity Search and Applications, pp. 9–16. ACM (2011)Google Scholar
  12. 12.
    Navarro, G.: Analyzing metric space indexes: What for? In: IEEE SISAP 2009, pp. 3–10 (2009)Google Scholar
  13. 13.
    Noam, Chomsky: On certain formal properties of grammars. Information and Control 2(2), 137–167 (1959)MathSciNetMATHCrossRefGoogle Scholar
  14. 14.
    Novák, J., Skopal, T., Hoksza, D., Lokoč, J.: Non-metric Similarity Search of Tandem Mass Spectra Including Posttranslational Modifications. Journal of Discrete Algorithms 13 (2012)Google Scholar
  15. 15.
    Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco (2005)Google Scholar
  16. 16.
    Skopal, T.: Unified framework for fast exact and approximate search in dissimilarity spaces. ACM Transactions on Database Systems 32(4), 1–46 (2007)CrossRefGoogle Scholar
  17. 17.
    Skopal, T., Bustos, B.: On nonmetric similarity search problems in complex domains. ACM Comput. Surv. 43, 34:1–34:50 (2011)CrossRefGoogle Scholar
  18. 18.
    Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)CrossRefGoogle Scholar
  19. 19.
    Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., Secaucus (2005)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tomáš Skopal
    • 1
  • Tomáš Bartoš
    • 1
  1. 1.SIRET Research Group, Faculty of Mathematics and Physics, Department of Software EngineeringCharles University in PragueCzech Republic

Personalised recommendations