Datenbank-Spektrum, Volume 17, Issue 2, pp 155–167

Reducing the Distance Calculations when Searching an M‑Tree

  • Steffen Guhlemann
  • Uwe Petersohn
  • Klaus Meyer-Wegener
Fachbeitrag

Abstract

Recent years have brought rising interest in efficiently searching for similar entities across a broad range of domains. Such search facilitates working with unstructured data such as genome sequences, text corpora, complex production information, or multimedia content, where queries always contain some amount of noise. In such domains the only common structure is a distance function obeying the axioms of a metric. Because usually no other structural information is available, many distances have to be computed in the course of a search. In contrast to classical database indexes, where optimization focuses on reducing the number of disk accesses (or, for in-memory databases, the number of tree traversal operations), a major cost driver in such multimedia domains is the number of distance calculations, each of which can be computationally intensive.
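
For reference, the metric axioms mentioned above can be written out as follows. The symbols \(d\), \(x\), \(y\), \(z\), \(q\), \(p\), \(o\) and \(r\) are notation assumed here for illustration; they are not taken from the abstract.

\[
\begin{aligned}
d(x,y) &\ge 0, \qquad d(x,y) = 0 \iff x = y \\
d(x,y) &= d(y,x) \\
d(x,z) &\le d(x,y) + d(y,z)
\end{aligned}
\]

Applying the triangle inequality twice yields the lower bound \(|d(q,p) - d(p,o)| \le d(q,o)\). If \(d(q,p)\) has already been computed and \(d(p,o)\) is stored in the index, an object \(o\) with \(|d(q,p) - d(p,o)| > r\) cannot lie within radius \(r\) of the query \(q\), so \(d(q,o)\) never has to be evaluated. This is the general reason why metric indexes can save distance calculations.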

A range of index structures supports similarity search in metric spaces. A very promising one is the M‑Tree, along with a number of compatible extensions (e.g. Slim-Tree, Bulk Loaded M‑Tree, multi-way insertion M‑Tree, \(M^{2}\)-Tree). The M‑Tree family uses common algorithms for \(k\)-nearest-neighbor and range search. These algorithms leave room for optimization in terms of the number of necessary distance calculations. In this paper we present new algorithms for these tasks that considerably improve the retrieval performance of all M‑Tree-compatible data structures.
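
To make the cost model concrete, the following is a minimal sketch of a range search that uses the triangle inequality to avoid distance calculations and counts how many are actually performed. It is a deliberately simplified, single-level illustration of the kind of pruning used by ball-partitioning indexes such as the M‑Tree, not the new algorithms presented in the paper. All names (`CountingMetric`, `Ball`, `build_balls`, `range_search`) and the choice of Euclidean distance are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function obeying the metric axioms would do."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class CountingMetric:
    """Wraps a metric and counts how often it is evaluated."""

    def __init__(self, metric: Metric) -> None:
        self.metric = metric
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.metric(a, b)


@dataclass
class Ball:
    """One partition: a pivot, a covering radius, and its members."""
    pivot: Point
    radius: float = 0.0
    # Each member is stored with its precomputed distance to the pivot.
    members: List[Tuple[Point, float]] = field(default_factory=list)


def build_balls(data: Sequence[Point], pivots: Sequence[Point],
                dist: Metric) -> List[Ball]:
    """Assign every object to its nearest pivot (hypothetical build step)."""
    balls = [Ball(p) for p in pivots]
    for obj in data:
        d, ball = min(((dist(b.pivot, obj), b) for b in balls),
                      key=lambda pair: pair[0])
        ball.members.append((obj, d))
        ball.radius = max(ball.radius, d)
    return balls


def range_search(balls: Sequence[Ball], query: Point, r: float,
                 dist: Metric) -> List[Point]:
    """Return all objects within distance r of query."""
    hits = []
    for ball in balls:
        d_qp = dist(query, ball.pivot)
        # The ball cannot intersect the query range: skip all its members.
        if d_qp > r + ball.radius:
            continue
        for obj, d_po in ball.members:
            # Lower bound |d(q,p) - d(p,o)| already exceeds r:
            # the object is excluded without computing d(q,o).
            if abs(d_qp - d_po) > r:
                continue
            if dist(query, obj) <= r:
                hits.append(obj)
    return hits


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
    index = build_balls(grid, [(2.0, 2.0), (2.0, 7.0), (7.0, 2.0), (7.0, 7.0)],
                        euclidean)
    counter = CountingMetric(euclidean)
    found = range_search(index, (1.0, 1.0), 1.5, counter)
    print(len(found), "hits using", counter.calls,
          "distance calculations for", len(grid), "objects")
```

The counter is applied only to the query phase, because the distances computed while building the index are paid once and reused by every later search. A sequential scan would evaluate the metric once per stored object, so the difference between `counter.calls` and the data set size is the saving that this kind of pruning provides.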

Keywords

Metric databases, Metric access methods, Index structures, Multimedia databases, Selectivity estimation, Similarity search

Copyright information

© Springer-Verlag GmbH Deutschland 2017

Authors and Affiliations

  1. Faculty of Computer Science, Institute for Artificial Intelligence, TU Dresden, Dresden, Germany
  2. Faculty of Engineering, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
