Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

  • Elias Jääsaari
  • Ville Hyvönen
  • Teemu Roos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11440)


Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
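To make the cost of the baseline concrete, the following is a minimal sketch of the naive grid search that the abstract describes as impractically slow. It is not the proposed autotuning algorithm, whose details are not given here. The index is a simplified random projection tree forest (median split along a Gaussian random direction), and every name in it (`build_rp_tree`, `query_leaf`, `knn_approx`, `grid_search`) and both tuned parameters (number of trees, tree depth) are assumptions made for illustration.

```python
import itertools
import numpy as np


def build_rp_tree(data, indices, depth, rng):
    """Recursively split `indices` at the median of a random projection."""
    if depth == 0 or len(indices) <= 1:
        return indices  # leaf: array of point indices
    direction = rng.standard_normal(data.shape[1])
    proj = data[indices] @ direction
    split = np.median(proj)
    left, right = indices[proj <= split], indices[proj > split]
    if len(left) == 0 or len(right) == 0:
        return indices  # degenerate split: stop here
    return (direction, split,
            build_rp_tree(data, left, depth - 1, rng),
            build_rp_tree(data, right, depth - 1, rng))


def query_leaf(tree, q):
    """Route a query point down one tree and return the leaf's point indices."""
    while isinstance(tree, tuple):
        direction, split, left, right = tree
        tree = left if q @ direction <= split else right
    return tree


def knn_exact(data, q, k):
    """Brute-force k nearest neighbors, used as ground truth."""
    dist = np.linalg.norm(data - q, axis=1)
    return np.argsort(dist)[:k]


def knn_approx(trees, data, q, k):
    """Pool the candidates from every tree's leaf, then rank them exactly."""
    cand = np.unique(np.concatenate([query_leaf(t, q) for t in trees]))
    dist = np.linalg.norm(data[cand] - q, axis=1)
    return cand[np.argsort(dist)[:k]]


def grid_search(data, queries, k, tree_counts, depths, seed=0):
    """Return {(n_trees, depth): recall}, rebuilding the index for each pair."""
    rng = np.random.default_rng(seed)
    truth = [set(knn_exact(data, q, k)) for q in queries]
    all_idx = np.arange(len(data))
    results = {}
    for n_trees, depth in itertools.product(tree_counts, depths):
        # The expensive step: a full index build per parameter combination.
        trees = [build_rp_tree(data, all_idx, depth, rng)
                 for _ in range(n_trees)]
        hits = sum(len(truth[i] & set(knn_approx(trees, data, q, k)))
                   for i, q in enumerate(queries))
        results[(n_trees, depth)] = hits / (k * len(queries))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.standard_normal((2000, 32))
    queries = rng.standard_normal((20, 32))
    for params, recall in grid_search(data, queries, 10,
                                      [1, 5, 10], [4, 6, 8]).items():
        print(params, round(recall, 3))
```

The sketch rebuilds the whole forest for every parameter pair, so index-building time scales with the size of the grid. This is the overhead that motivates tuning during a single index build instead.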


Keywords: Nearest neighbor search; Approximate nearest neighbors; Randomized space-partitioning trees; Indexing methods; Autotuning



This project was supported by Business Finland (project 3662/31/2018 Advanced Machine Learning for Industrial Applications) and the Academy of Finland (project 311277 TensorML).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Elias Jääsaari (1)
  • Ville Hyvönen (2, 3), corresponding author
  • Teemu Roos (2, 3)

  1. Kvasir Ltd., Cambridge, England
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
  3. Helsinki Institute for Information Technology (HIIT), Helsinki, Finland
