Pivot selection for metric-space indexing

  • Rui Mao
  • Peihan Zhang
  • Xingliang Li
  • Xi Liu
  • Minhua Lu (corresponding author)
Original Article

Abstract

Metric-space indexing abstracts various data types into a universal metric space and prunes data by exploiting only the triangle inequality of the distance function. Since a metric space has no coordinates, one usually first picks a number of reference points, called pivots, and treats the distances from a data point to the pivots as its coordinates. In this paper, we first survey and discuss the state of the art of pivot selection for metric-space indexing from four perspectives: importance, objective function, number of pivots, and selection algorithm. We then propose a new objective function, a new method to determine the number of pivots, and an incremental sampling framework for pivot selection. Experimental results show that the new objective function is more consistent with query performance, the new method to determine the number of pivots is more efficient, and the incremental sampling framework leads to better query performance.
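
To make the pivot idea concrete, the following is a minimal sketch of a pivot table and a range query that prunes with the triangle inequality. It is an illustration only, not the method proposed in the paper: the function names (`build_pivot_table`, `range_query`), the Euclidean distance, the synthetic 2-D data, and the random choice of three pivots are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    # Any metric works here; Euclidean distance is an assumption for the demo.
    return math.dist(a, b)


def build_pivot_table(data, pivots, dist):
    # Each data point gets "coordinates": its distances to the pivots.
    return [[dist(x, p) for p in pivots] for x in data]


def range_query(data, pivots, table, dist, q, r):
    """Return all x with dist(q, x) <= r, plus the number of
    data-to-query distances actually computed after pruning."""
    q_coords = [dist(q, p) for p in pivots]
    results = []
    computed = 0
    for x, x_coords in zip(data, table):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p.
        # If the lower bound already exceeds r, x cannot be a result.
        if any(abs(qc - xc) > r for qc, xc in zip(q_coords, x_coords)):
            continue
        computed += 1
        if dist(q, x) <= r:
            results.append(x)
    return results, computed


random.seed(0)
data = [(random.random(), random.random()) for _ in range(1000)]
# Random pivots are a placeholder; choosing them well is the paper's topic.
pivots = random.sample(data, 3)
table = build_pivot_table(data, pivots, euclidean)

query, radius = (0.5, 0.5), 0.1
hits, computed = range_query(data, pivots, table, euclidean, query, radius)
brute = [x for x in data if euclidean(query, x) <= radius]
print(len(hits), "results;", computed, "of", len(data), "distances computed")
```

The pruning test never discards a true result, so the answer matches a linear scan; what varies with the choice and number of pivots is how many distance computations are avoided.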

Keywords

Metric-space indexing; Pivot selection; Intrinsic dimension; Objective function; Range query

Notes

Acknowledgments

This work is partially supported by the following grants: China 863: 2015AA015305; NSF-China: 61170076, U1301252, 61471243; Guangdong Key Laboratory Project: 2012A061400024; NSF-Shenzhen: JCYJ20140418095735561, JCYJ20150731160834611; Shenzhen-Hong Kong Innovation Circle Project: SGLH20131010163759789. Dr. Minhua Lu is the corresponding author.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Rui Mao (1)
  • Peihan Zhang (1)
  • Xingliang Li (2)
  • Xi Liu (1)
  • Minhua Lu (3, corresponding author)

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  2. College of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  3. School of Medicine, Shenzhen University, Shenzhen, China