gvnn: Neural Network Library for Geometric Computer Vision

  • Ankur Handa
  • Michael Bloesch
  • Viorica Pătrăucean
  • Simon Stent
  • John McCormac
  • Andrew Davison
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9915)


We introduce gvnn, a neural network library for Torch aimed at bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers that implement parametric transformations commonly used on data in geometric computer vision. These layers can be inserted within a neural network in the spirit of the original spatial transformers, and support backpropagation, enabling end-to-end learning of networks that incorporate geometric domain knowledge. This opens up applications in learning invariance to 3D geometric transformations for place recognition, end-to-end visual odometry, depth estimation, and unsupervised learning through warping with a parametric transformation followed by an image reconstruction error.
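The unsupervised-warping idea in the abstract can be sketched independently of any particular library: push a sampling grid through a parametric 2D transformation (here a simple affine map, as in the original spatial transformer), bilinearly sample the source image, and score the result with a photometric reconstruction error. This is only an illustrative NumPy sketch of the concept; the function names below are not the gvnn API.

```python
import numpy as np

def affine_grid(theta, h, w):
    """Map each output pixel (in normalised [-1, 1] coords) through a 2x3 affine matrix."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (h, w, 3) homogeneous coords
    return grid @ theta.T                                  # (h, w, 2) source coords

def bilinear_sample(img, grid):
    """Inverse warping: sample img at real-valued grid coordinates."""
    h, w = img.shape
    x = (grid[..., 0] + 1) * 0.5 * (w - 1)                 # back to pixel coords
    y = (grid[..., 1] + 1) * 0.5 * (h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    wx, wy = x - x0, y - y0                                # fractional offsets
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x0 + 1]
            + wy * (1 - wx) * img[y0 + 1, x0] + wy * wx * img[y0 + 1, x0 + 1])

# With the identity transform the image is reconstructed, so the
# photometric (mean-squared) reconstruction error is essentially zero.
img = np.random.rand(8, 8)
theta_id = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
warped = bilinear_sample(img, affine_grid(theta_id, 8, 8))
err = np.mean((warped - img) ** 2)
```

Because both the grid generation and the bilinear sampling are (piecewise) differentiable in the transformation parameters `theta`, gradients of the reconstruction error can flow back to the transformation, which is what allows such layers to sit inside an end-to-end trained network.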


Keywords: Spatial transformer networks · Geometric vision · Unsupervised learning



AH and AD would like to thank Dyson Technology Ltd. for kindly funding this research work.


  1. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015)
  2. Patraucean, V., Handa, A., Cipolla, R.: Spatio-temporal video autoencoder with differentiable memory. CoRR abs/1511.06309 (2015)
  3. Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop (2011)
  4. Moodstocks: Open source implementation of spatial transformer networks (2015)
  5. Gallego, G., Yezzi, A.J.: A compact formula for the derivative of a 3-D rotation in exponential coordinates (2013)
  6. Brooks, M.J., Chojnacki, W., Baumela, L.: Determining the egomotion of an uncalibrated camera from instantaneous optical flow. JOSA A (1997)
  7. Nir, T., Bruckstein, A.M., Kimmel, R.: Over-parameterized variational optical flow. Int. J. Comput. Vis. (IJCV) 76(2), 205–216 (2008)
  8. Hornáček, M., Besse, F., Kautz, J., Fitzgibbon, A., Rother, C.: Highly overparameterized optical flow using PatchMatch belief propagation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 220–234. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10578-9_15
  9. Bleyer, M., Rhemann, C., Rother, C.: PatchMatch stereo – stereo matching with slanted support windows. In: Proceedings of the British Machine Vision Conference (BMVC) (2011)
  10. Pock, T., Zebedin, L., Bischof, H.: TGV-fusion. In: Calude, C.S., Rozenberg, G., Salomaa, A. (eds.) Rainbow of Computer Science. LNCS, vol. 6570, pp. 245–258. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19391-0_18
  11. Garg, R., BG, V.K., Reid, I.D.: Unsupervised CNN for single view depth estimation: geometry to the rescue. CoRR abs/1603.04992 (2016)
  12. Sumner, R.W., Schmid, J., Pauly, M.: Embedded deformation for shape manipulation. In: Proceedings of SIGGRAPH (2007)
  13. Zollhöfer, M., Nießner, M., Izadi, S., Rehmann, C., Zach, C., Fisher, M., Wu, C., Fitzgibbon, A., Loop, C., Theobalt, C., et al.: Real-time non-rigid reconstruction using an RGB-D camera. ACM Trans. Graph. (TOG) (2014)
  14. Newcombe, R.A., Fox, D., Seitz, S.M.: DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
  15. Johnson, J., Alahi, A., Li, F.: Perceptual losses for real-time style transfer and super-resolution. CoRR abs/1603.08155 (2016)
  16. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)
  17. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the International Conference on Computer Vision (ICCV) (2015)
  18. Black, M.J., Anandan, P.: A framework for the robust estimation of optical flow. In: Proceedings of the International Conference on Computer Vision (ICCV) (1993)
  19. Black, M., Anandan, P.: Robust dynamic motion estimation over time. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1991)
  20. Black, M.J., Sapiro, G., Marimont, D.H., Heeger, D.: Robust anisotropic diffusion. IEEE Trans. Image Process. 7, 421–432 (1998)
  21. Strobl, K.H., Hirzinger, G.: Optimal hand-eye calibration. In: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE (2006)
  22. Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. In: Proceedings of the IEEE International Conference on Computer Vision (2015)
  23. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (1981)
  24. Drummond, T., Cipolla, R.: Visual tracking and control using Lie algebras. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1999)
  25. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: ICLR (2016)
  26. Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., Furgale, P.: Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. (2014)
  27. Handa, A., Whelan, T., McDonald, J.B., Davison, A.J.: A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2014)
  28. Wu, C.: VisualSfM: a visual structure from motion system
  29. Handa, A., Pătrăucean, V., Badrinarayanan, V., Stent, S., Cipolla, R.: SceneNet: understanding real world indoor scenes with synthetic data. arXiv preprint (2015). arXiv:1511.07041
  30. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
  31. Léonard, N., Waghmare, S., Wang, Y., Kim, J.: RNN: recurrent library for Torch. CoRR abs/1511.07889 (2015)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ankur Handa (1)
  • Michael Bloesch (3)
  • Viorica Pătrăucean (2)
  • Simon Stent (2)
  • John McCormac (1)
  • Andrew Davison (1)

  1. Dyson Robotics Laboratory, Department of Computing, Imperial College London, London, UK
  2. Department of Engineering, University of Cambridge, Cambridge, UK
  3. Robotic Systems Lab, ETH Zurich, Zurich, Switzerland
