GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints

  • Zixin Luo
  • Tianwei Shen
  • Lei Zhou
  • Siyu Zhu
  • Runze Zhang
  • Yao Yao
  • Tian Fang
  • Long Quan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not yet demonstrated strong generalization ability on recent benchmarks of image-based 3D reconstruction. In this paper, we mitigate this limitation by proposing a novel local descriptor learning approach that integrates geometry constraints from multi-view reconstructions, which benefit the learning process in terms of data generation, data sampling and loss computation. We refer to the proposed descriptor as GeoDesc, demonstrate its superior performance on various large-scale benchmarks, and in particular show its success on challenging reconstruction tasks. Moreover, we provide guidelines for the practical integration of learned descriptors into Structure-from-Motion (SfM) pipelines, showing the good trade-off between accuracy and efficiency that GeoDesc delivers for 3D reconstruction tasks.


Keywords: Local features, Feature descriptors, Deep learning



This work is supported by T22-603/15N, Hong Kong ITC PSKL12EG02 and the Special Project of International Scientific and Technological Cooperation in Guangzhou Development District (No. 2017GH24).




Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  2. Shenzhen Zhuke Innovation Technology (Altizure), Shenzhen, China
