Fast kNN Graph Construction with Locality Sensitive Hashing

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8189)

Abstract

The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role in graph-based learning methods. Despite its many elegant properties, brute-force kNN graph construction has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient algorithm for approximating kNN graphs with a time complexity of only $O(l(d + \log n)n)$, where d is the dimensionality and l is usually a small number. This is much faster than most existing fast methods. Specifically, we use locality sensitive hashing to divide items into small subsets of equal size, and then build one kNN graph on each subset using the brute-force method. To enhance the approximation quality, we repeat this procedure several times to generate multiple basic approximate graphs, and combine them to yield a high-quality graph. Compared with existing methods, the proposed approach is (1) much faster; (2) applicable to generic similarity measures; and (3) easy to parallelize. Finally, on three large-scale benchmark data sets, our method clearly outperforms existing fast methods.
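The divide, solve, and combine steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single random projection stands in for the locality sensitive hashing step, squared Euclidean distance stands in for a generic similarity measure, and the names `lsh_blocks`, `approx_knn_graph`, `block_size`, and `rounds` are hypothetical.

```python
import random
from heapq import nsmallest


def sq_dist(a, b):
    """Squared Euclidean distance (assumed here; any dissimilarity would do)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lsh_blocks(points, block_size, rng):
    """Divide step: order points by one random projection, cut into blocks.

    Assumption: a random projection replaces the paper's hashing scheme.
    The last block is smaller when len(points) is not a multiple of block_size.
    """
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    keys = [sum(w * x for w, x in zip(direction, p)) for p in points]
    order = sorted(range(len(points)), key=keys.__getitem__)
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]


def approx_knn_graph(points, k, block_size, rounds, seed=0):
    """Repeat the divide step `rounds` times and merge the per-block results."""
    rng = random.Random(seed)
    # best[i] maps a candidate neighbor index to its squared distance from i.
    best = [{} for _ in points]
    for _ in range(rounds):
        for block in lsh_blocks(points, block_size, rng):
            for i in block:
                cand = best[i]
                # Brute force inside the block only.
                for j in block:
                    if j != i and j not in cand:
                        cand[j] = sq_dist(points[i], points[j])
                # Combine step: keep the k closest candidates seen so far.
                best[i] = dict(nsmallest(k, cand.items(), key=lambda kv: kv[1]))
    # Neighbor lists, closest first.
    return [sorted(cand, key=cand.get) for cand in best]


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.random() for _ in range(3)] for _ in range(40)]
    graph = approx_knn_graph(pts, k=3, block_size=10, rounds=4)
    print(graph[0])
```

Each round in this sketch performs roughly n × `block_size` distance computations instead of the n² needed by the brute-force method over the whole data set, and the blocks within a round are independent of one another, so they could be processed in parallel.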

Keywords

  • graph construction
  • locality sensitive hashing
  • graph-based machine learning

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, YM., Huang, K., Geng, G., Liu, CL. (2013). Fast kNN Graph Construction with Locality Sensitive Hashing. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40991-2_42

  • DOI: https://doi.org/10.1007/978-3-642-40991-2_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40990-5

  • Online ISBN: 978-3-642-40991-2

  • eBook Packages: Computer Science, Computer Science (R0)