Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study

  • Leonid Boytsov
  • Eric Nyberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11807)

Abstract

We focus on low-dimensional non-metric search, where tree-based approaches permit efficient and accurate retrieval while keeping indexing time short. These methods rely on space partitioning and require a pruning rule to avoid visiting unpromising partitions. We consider two known data-driven approaches for extending these rules to non-metric spaces: TriGen and a piece-wise linear approximation of the pruning rule. Because TriGen does not support non-symmetric distances, we propose and evaluate two adaptations of TriGen to non-symmetric similarities. We also evaluate a hybrid of TriGen and the piece-wise linear approximation of the pruning rule. We find that this hybrid approach is often more effective than either pruning rule alone. We make our software publicly available.
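
To make the role of the pruning rule concrete, here is a minimal illustrative sketch (not the authors' software) of a VP-tree whose standard metric pruning rule is stretched by two coefficients, in the spirit of a piece-wise linear pruning rule. The function names, the parameter names `alpha_inner` and `alpha_outer`, random pivot selection, and the median split are all assumptions made for this example. With both coefficients set to 1 and a metric distance, the rule reduces to the usual triangle-inequality test.

```python
import heapq
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[Point, Point], float]


@dataclass
class Node:
    pivot: Point
    radius: float = 0.0
    inner: Optional["Node"] = None  # points with dist(pivot, x) <= radius
    outer: Optional["Node"] = None  # points with dist(pivot, x) > radius


def build_vp_tree(points: Sequence[Point], dist: Dist,
                  rng: random.Random) -> Optional[Node]:
    """Recursively split points around a random pivot at the median distance."""
    if not points:
        return None
    pts = list(points)
    pivot = pts.pop(rng.randrange(len(pts)))
    if not pts:
        return Node(pivot)
    dists = [dist(pivot, x) for x in pts]
    radius = sorted(dists)[len(dists) // 2]
    inner = [x for x, d in zip(pts, dists) if d <= radius]
    outer = [x for x, d in zip(pts, dists) if d > radius]
    return Node(pivot, radius,
                build_vp_tree(inner, dist, rng),
                build_vp_tree(outer, dist, rng))


def knn_search(root: Optional[Node], query: Point, k: int, dist: Dist,
               alpha_inner: float = 1.0,
               alpha_outer: float = 1.0) -> List[Tuple[float, Point]]:
    """Return up to k (distance, point) pairs, closest first.

    alpha_inner applies when the query falls inside the pivot ball,
    alpha_outer when it falls outside. Values above 1 prune more
    aggressively (possibly missing neighbors); values below 1 prune less.
    """
    heap: List[Tuple[float, Point]] = []  # max-heap via negated distances

    def current_radius() -> float:
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = dist(node.pivot, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.pivot))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.pivot))
        if d <= node.radius:
            visit(node.inner)
            # Skip the outer partition if the stretched gap exceeds the
            # distance to the current k-th neighbor.
            if alpha_inner * (node.radius - d) <= current_radius():
                visit(node.outer)
        else:
            visit(node.outer)
            if alpha_outer * (d - node.radius) <= current_radius():
                visit(node.inner)

    visit(root)
    return sorted((-neg_d, p) for neg_d, p in heap)
```

For example, `knn_search(build_vp_tree(points, math.dist, random.Random(0)), query, 5, math.dist)` runs an exact search under the Euclidean distance. For a non-metric distance, the coefficients would have to be tuned on data, which this sketch does not attempt.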

Keywords

\(k\)-NN search, Non-metric distance, VP-tree, TriGen

Notes

Acknowledgments

This work was done while Leonid Boytsov was a PhD student at CMU. The authors gratefully acknowledge support from NSF grant #1618159.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
