Abstract
We characterize the class of image plane transformations that realize rigid camera motions and call these transformations ‘rigidity preserving’. It turns out that the only rigidity preserving image transformations are homographies corresponding to rotating the camera. In particular, 2D translations of pinhole images are not rigidity preserving. Hence, when using CNNs for 3D inference tasks, it can be beneficial to modify the inductive bias from equivariance w.r.t. translations to equivariance w.r.t. rotational homographies. We investigate how equivariance with respect to rotational homographies can be approximated in CNNs, and test our ideas on 6D object pose estimation. Experimentally, we improve on a competitive baseline.
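As a concrete illustration of the claim above: a pure camera rotation R with pinhole intrinsics K induces the pixel-plane homography H = K R K⁻¹ (standard multiple-view geometry; the intrinsics below are hypothetical example values, not taken from the paper). The sketch warps pixel coordinates under such a rotational homography and shows that the induced pixel motion is not a uniform 2D translation:

```python
import numpy as np

def rotational_homography(K, R):
    """Homography induced by a pure camera rotation:
    x' ~ K R K^{-1} x, in homogeneous pixel coordinates."""
    return K @ R @ np.linalg.inv(K)

def warp_pixel(H, u, v):
    """Apply homography H to pixel (u, v) and dehomogenize."""
    x = H @ np.array([u, v, 1.0])
    return x[:2] / x[2]

# Hypothetical pinhole intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Small pan: rotation by 5 degrees about the camera's y-axis.
theta = np.deg2rad(5.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])

H = rotational_homography(K, R)

# The principal point moves horizontally by f * tan(theta) ...
center_shift = warp_pixel(H, 320.0, 240.0) - np.array([320.0, 240.0])
# ... while a corner pixel moves by a different amount:
corner_shift = warp_pixel(H, 0.0, 0.0) - np.array([0.0, 0.0])

print(center_shift, corner_shift)
```

Since `center_shift` and `corner_shift` differ, the rotational homography is not a 2D translation of the image, which is the point of contrast with the translation-equivariant inductive bias of standard CNNs.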
L. Brynte and G. Bökman—Equal contribution.
Acknowledgements
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The authors acknowledge support from Chalmers AI Research (CHAIR) as well as the Swedish Foundation for Strategic Research. The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Chalmers Centre for Computational Science and Engineering (C3SE) partially funded by the Swedish Research Council through grant agreement no. 2018-05973.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Brynte, L., Bökman, G., Flinth, A., Kahl, F. (2023). Rigidity Preserving Image Transformations and Equivariance in Perspective. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham. https://doi.org/10.1007/978-3-031-31438-4_5
DOI: https://doi.org/10.1007/978-3-031-31438-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31437-7
Online ISBN: 978-3-031-31438-4