
Rigidity Preserving Image Transformations and Equivariance in Perspective

  • Conference paper
  • In: Image Analysis (SCIA 2023)

Abstract

We characterize the class of image-plane transformations that realize rigid camera motions and call these transformations ‘rigidity preserving’. It turns out that the only rigidity preserving image transformations are homographies corresponding to rotating the camera. In particular, 2D translations of pinhole images are not rigidity preserving. Hence, when using CNNs for 3D inference tasks, it can be beneficial to modify the inductive bias from equivariance w.r.t. translations to equivariance w.r.t. rotational homographies. We investigate how equivariance with respect to rotational homographies can be approximated in CNNs, and test our ideas on 6D object pose estimation. Experimentally, we improve on a competitive baseline.
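
The rotational homographies named in the abstract have a compact closed form: for a pinhole camera with intrinsic matrix K, rotating the camera by R induces the image-plane map H = K R K⁻¹ (a standard fact, see e.g. Hartley and Zisserman's Multiple View Geometry). The sketch below is our illustration rather than the authors' code; the intrinsic values and the 5-degree rotation are assumed example numbers. It builds such a rotational homography and checks on one pixel that warping by H agrees with back-projecting to a viewing ray, rotating the ray, and reprojecting:

    import numpy as np

    # Rotational homography H = K R K^{-1} for a pinhole camera.
    # The intrinsics K (focal length, principal point) and the rotation
    # angle below are assumed example values, not taken from the paper.
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    def rot_y(theta):
        """Rotation about the camera's y-axis by theta radians."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[ c,  0.0, s ],
                         [0.0, 1.0, 0.0],
                         [-s,  0.0, c ]])

    R = rot_y(np.deg2rad(5.0))
    H = K @ R @ np.linalg.inv(K)   # the induced image-plane homography

    # Sanity check on a single pixel: warping by H equals rotating the
    # viewing ray and reprojecting. A plain 2D pixel translation admits no
    # such factorization, matching the claim that it is not rigidity
    # preserving.
    x = np.array([100.0, 150.0, 1.0])   # homogeneous pixel coordinates
    ray = np.linalg.inv(K) @ x          # back-project to a viewing ray
    x_warped = H @ x
    x_reproj = K @ (R @ ray)            # rotate the ray, then reproject
    assert np.allclose(x_warped / x_warped[2], x_reproj / x_reproj[2])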

L. Brynte and G. Bökman contributed equally.


Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The authors acknowledge support from Chalmers AI Research (CHAIR) as well as the Swedish Foundation for Strategic Research. The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Chalmers Centre for Computational Science and Engineering (C3SE) partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

Author information

Correspondence to Lucas Brynte.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 2287 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Brynte, L., Bökman, G., Flinth, A., Kahl, F. (2023). Rigidity Preserving Image Transformations and Equivariance in Perspective. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham. https://doi.org/10.1007/978-3-031-31438-4_5


  • DOI: https://doi.org/10.1007/978-3-031-31438-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31437-7

  • Online ISBN: 978-3-031-31438-4

  • eBook Packages: Computer Science; Computer Science (R0)
