Extreme pivots: a pivot selection strategy for faster metric search

  • Guillermo Ruiz
  • Edgar Chavez
  • Ubaldo Ruiz
  • Eric S. Tellez
Regular Paper


This manuscript presents the extreme pivots (EP) metric index, a data structure that speeds up exact proximity searching in the metric space model. For the EP, we designed an automatic rule that selects the best pivots for a dataset under limited memory resources. The net effect is that our approach solves queries efficiently with a small memory footprint and without a prohibitive construction time. In contrast with related structures, this performance is achieved automatically: rather than tuning the index’s parameters directly, we apply optimization techniques over a model of the index. The EP’s model is studied in depth in this contribution. In practical terms, a user only needs to provide the available memory and a sample of the query distribution as parameters. The resulting index is built quickly and offers a good trade-off among memory usage, preprocessing time, and search time. We provide an extensive experimental comparison with state-of-the-art search methods. We also carefully compare the performance of metric indexes in several scenarios: first on synthetic data, to characterize performance as a function of the intrinsic dimension and the size of the database, and then on several real-world datasets, with excellent results.
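For readers unfamiliar with pivot-based indexes, the following is a minimal sketch of generic pivot filtering for exact range queries, the family of techniques the abstract refers to. It is not the EP index or its pivot selection rule: the pivots are simply passed in by the caller, and the names `euclidean`, `build_pivot_table`, and `range_search` are hypothetical, chosen only for this illustration. It relies solely on the triangle inequality, which guarantees |d(q, p) − d(u, p)| ≤ d(q, u) for any pivot p.

```python
import math


def euclidean(a, b):
    """Euclidean distance; any metric could be used instead (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(db, pivots, dist):
    """Precompute d(p, u) for every object u and every pivot p."""
    return [[dist(p, u) for p in pivots] for u in db]


def range_search(db, pivots, table, dist, q, r):
    """Return all u in db with dist(q, u) <= r, plus the number of
    distance evaluations spent at query time."""
    dq = [dist(p, q) for p in pivots]  # one evaluation per pivot
    evaluated = len(pivots)
    result = []
    for u, row in zip(db, table):
        # Triangle inequality: |d(q,p) - d(u,p)| is a lower bound of d(q,u).
        # If the bound exceeds r for any pivot, u cannot be an answer.
        if any(abs(dqp - dup) > r for dqp, dup in zip(dq, row)):
            continue
        evaluated += 1  # u survived the filter, so verify it directly
        if dist(q, u) <= r:
            result.append(u)
    return result, evaluated
```

In this sketch, memory grows with the number of pivots stored per object, while search cost depends on how many objects survive the filter. That is the trade-off among memory usage, preprocessing, and search time that the abstract describes.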


  • Nearest neighbors search
  • Pivot-based metric indexes
  • Extreme pivots




Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. CONACyT-CentroGeo Aguascalientes, Aguascalientes, Mexico
  2. CICESE, Ensenada, Mexico
  3. CONACyT-CICESE, Ensenada, Mexico
  4. CONACyT - INFOTEC Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación, Aguascalientes, Mexico
