Abstract
We characterize the class of image plane transformations that realize rigid camera motions and call these transformations ‘rigidity preserving’. It turns out that the only rigidity preserving image transformations are homographies corresponding to rotating the camera. In particular, 2D translations of pinhole images are not rigidity preserving. Hence, when using CNNs for 3D inference tasks, it can be beneficial to modify the inductive bias from equivariance w.r.t. translations to equivariance w.r.t. rotational homographies. We investigate how equivariance with respect to rotational homographies can be approximated in CNNs, and test our ideas on 6D object pose estimation. Experimentally, we improve on a competitive baseline.
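As a concrete illustration of the claim above: a pure camera rotation R with pinhole intrinsics K induces the pixel-plane homography H = K R K⁻¹ (standard multiple-view geometry; the intrinsics below are hypothetical example values, not taken from the paper). The sketch warps pixel coordinates under such a rotational homography and shows that the induced pixel motion is not a uniform 2D translation:

```python
import numpy as np

def rotational_homography(K, R):
    """Homography induced by a pure camera rotation:
    x' ~ K R K^{-1} x, in homogeneous pixel coordinates."""
    return K @ R @ np.linalg.inv(K)

def warp_pixel(H, u, v):
    """Apply homography H to pixel (u, v) and dehomogenize."""
    x = H @ np.array([u, v, 1.0])
    return x[:2] / x[2]

# Hypothetical pinhole intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Small pan: rotation by 5 degrees about the camera's y-axis.
theta = np.deg2rad(5.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])

H = rotational_homography(K, R)

# The principal point moves horizontally by f * tan(theta) ...
center_shift = warp_pixel(H, 320.0, 240.0) - np.array([320.0, 240.0])
# ... while a corner pixel moves by a different amount:
corner_shift = warp_pixel(H, 0.0, 0.0) - np.array([0.0, 0.0])

print(center_shift, corner_shift)
```

Since `center_shift` and `corner_shift` differ, the rotational homography is not a 2D translation of the image, which is the point of contrast with the translation-equivariant inductive bias of standard CNNs.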
L. Brynte and G. Bökman—Equal contribution.
Acknowledgements
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The authors acknowledge support from Chalmers AI Research (CHAIR) as well as the Swedish Foundation for Strategic Research. The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Chalmers Centre for Computational Science and Engineering (C3SE) partially funded by the Swedish Research Council through grant agreement no. 2018-05973.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Brynte, L., Bökman, G., Flinth, A., Kahl, F. (2023). Rigidity Preserving Image Transformations and Equivariance in Perspective. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham. https://doi.org/10.1007/978-3-031-31438-4_5
DOI: https://doi.org/10.1007/978-3-031-31438-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31437-7
Online ISBN: 978-3-031-31438-4