Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

  • Leonid BoytsovEmail author
  • Eric Nyberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11807)


We demonstrate that a graph-based search algorithm—relying on the construction of an approximate neighborhood graph—can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although the straightforward metrization and symmetrization is usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves a way to a new line of research of designing index-specific graph-construction distance functions.


\(k\)-NN search Non-metric distance Neighborhood graph 



This work was done while Leonid Boytsov was a PhD student at CMU. Authors gratefully acknowledge the support by the NSF grant #1618159.


  1. 1.
    Arya, S., Mount, D.M.: Approximate nearest neighbor queries in fixed dimensions. In: SODA, vol. 93, pp. 271–280 (1993)Google Scholar
  2. 2.
    Aumüller, M., Bernhardsson, E., Faithfull, A.: ANN-benchmarks: a benchmarking tool for approximate nearest neighbor algorithms. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds.) SISAP 2017. LNCS, vol. 10609, pp. 34–49. Springer, Cham (2017). CrossRefGoogle Scholar
  3. 3.
    Bellet, A., Habrard, A., Sebban, M.: A survey on metric learning for feature vectors and structured data. CoRR abs/1306.6709 (2013)Google Scholar
  4. 4.
    Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)zbMATHGoogle Scholar
  5. 5.
    Boytsov, L.: Efficient and accurate non-metric k-NN search with applications to text matching. Ph.D. thesis, Carnegie Mellon University (2017)Google Scholar
  6. 6.
    Boytsov, L., Naidan, B.: Engineering efficient and effective non-metric space library. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 280–293. Springer, Heidelberg (2013). Scholar
  7. 7.
    Cayton, L.: Fast nearest neighbor retrieval for bregman divergences. In: Proceedings of the 25th International Conference on Machine Learning, pp. 112–119. ACM (2008)Google Scholar
  8. 8.
    Chávez, E., Navarro, G., Baeza-Yates, R.A., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)CrossRefGoogle Scholar
  9. 9.
    Chechik, G., Sharma, V., Shalit, U., Bengio, S.: Large scale online learning of image similarity through ranking. J. Mach. Learn. Res. 11, 1109–1135 (2010)MathSciNetzbMATHGoogle Scholar
  10. 10.
    Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of ICML 2007, pp. 209–216. ACM (2007)Google Scholar
  11. 11.
    Dong, W., Moses, C., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of WWW 2011, pp. 577–586. ACM (2011)Google Scholar
  12. 12.
    Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., Zhang, H.: Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: IJCAI/AAAI 2011, pp. 1312–1317 (2011)Google Scholar
  13. 13.
    Harwood, B., Drummond, T.: FANNG: fast approximate nearest neighbour graphs. In: Proceedings of CVPR, pp. 5713–5722 (2016)Google Scholar
  14. 14.
    Hjaltason, G.R., Samet, H.: Properties of embedding methods for similarity searching in metric spaces. IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 530–549 (2003)CrossRefGoogle Scholar
  15. 15.
    Houle, M.E., Sakuma, J.: Fast approximate similarity search in extremely high-dimensional data sets. In: ICDE 2005, pp. 619–630 (2005)Google Scholar
  16. 16.
    Itakura, F., Saito, S.: Analysis synthesis telephony based on the maximum likelihood method. In: Proceedings of the 6th International Congress on Acoustics, pp. C17–C20 (1968)Google Scholar
  17. 17.
    Jacobs, D.W., Weinshall, D., Gdalyahu, Y.: Classification with nonmetric distances: image retrieval and class representation. IEEE Trans. Pattern Anal. Mach. Intell. 22(6), 583–600 (2000)CrossRefGoogle Scholar
  18. 18.
    Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Statist. 22(1), 79–86 (1951)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)Google Scholar
  20. 20.
    Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0). CoRR abs/1610.02455 (2016)Google Scholar
  21. 21.
    Liu, E.Y., Guo, Z., Zhang, X., Jojic, V., Wang, W.: Metric learning from relative comparisons by minimizing squared residual. In: IEEE ICDM 2012, pp. 978–983. IEEE (2012)Google Scholar
  22. 22.
    Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V.: Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst. 45, 61–68 (2014)CrossRefGoogle Scholar
  23. 23.
    Malkov, Y.A., Yashunin, D.A.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. CoRR abs/1603.09320 (2016)Google Scholar
  24. 24.
    Naidan, B., Boytsov, L., Nyberg, E.: Permutation search methods are efficient, yet faster search is possible. PVLDB 8(12), 1618–1629 (2015)Google Scholar
  25. 25.
    Ponomarenko, A., Avrelin, N., Naidan, B., Boytsov, L.: Comparative analysis of data structures for approximate nearest neighbor search. In: DATA ANALYTICS 2014, The Third International Conference on Data Analytics, pp. 125–130 (2014)Google Scholar
  26. 26.
    Qi, G., Tang, J., Zha, Z., Chua, T., Zhang, H.: An efficient sparse metric learning in high-dimensional space via l\({}_{\text{1}}\)-penalized log-determinant regularization. In: ICML 2009, pp. 841–848. ACM (2009)Google Scholar
  27. 27.
    Rényi, A.: On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 547–561 (1961)Google Scholar
  28. 28.
    Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Doc. 60(5), 503–520 (2004)CrossRefGoogle Scholar
  29. 29.
    Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco (2005)zbMATHGoogle Scholar
  30. 30.
    Skopal, T., Bustos, B.: On nonmetric similarity search problems in complex domains. ACM Comput. Surv. 43(4), 34 (2011)CrossRefGoogle Scholar
  31. 31.
    Sutherland, D.J.: Scalable, flexible and active learning on distributions. Ph.D. thesis, Carnegie Mellon University (2016)Google Scholar
  32. 32.
    Tellez, E.S., Ruiz, G., Chávez, E., Graff, M.: Local search methods for fast near neighbor search. CoRR abs/1705.10351 (2017).
  33. 33.
    Toussaint, G.T.: The relative neighbourhood graph of a finite planar set. Pattern Recogn. 12(4), 261–268 (1980)MathSciNetCrossRefGoogle Scholar
  34. 34.
    Wang, J., Shen, H.T., Song, J., Ji, J.: Hashing for similarity search: a survey. CoRR abs/1408.2927 (2014)Google Scholar
  35. 35.
    Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB, vol. 98, pp. 194–205 (1998)Google Scholar
  36. 36.
    Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: NIPS 2005, pp. 1473–1480 (2005)Google Scholar
  37. 37.
    Xiong, C., Johnson, D.M., Xu, R., Corso, J.J.: Random forests for metric learning with implicit pairwise position dependence. In: KDD 2012, pp. 958–966. ACM (2012)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Carnegie Mellon UniversityPittsburghUSA

Personalised recommendations