Skip to main content

CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review

Abstract

One of the main challenges in machine vision relates to the problem of obtaining robust representation of visual features that remain unaffected by geometric transformations. This challenge arises naturally in many practical machine vision tasks. For example, in mobile robot applications like simultaneous localization and mapping (SLAM) and visual tracking, object shapes change depending on their orientation in the 3D world, camera proximity, viewpoint, or perspective. In addition, natural phenomena such as occlusion, deformation, and clutter can cause geometric appearance changes of the underlying objects, leading to geometric transformations of the resulting images. Recently, deep learning techniques have proven very successful in visual recognition tasks but they typically perform poorly with small data or when deployed in environments that deviate from training conditions. While convolutional neural networks (CNNs) have inherent representation power that provides a high degree of invariance to geometric image transformations, they are unable to satisfactorily handle nontrivial transformations. In view of this limitation, several techniques have been devised to extend CNNs to handle these situations. This article reviews some of the most promising approaches to extend CNN architectures to handle nontrivial geometric transformations. Key strengths and weaknesses, as well as the application domains of the various approaches are also highlighted. The review shows that although an adequate model for generalized geometric transformations has not yet been formulated, several techniques exist for solving specific problems. Using these methods, it is possible to develop task-oriented solutions to deal with nontrivial transformations.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Availability of Data and Materials

The work does not involve primary data—we report on works that have already been published.

References

  1. Alcorn MA, Li Q, Gong Z, Wang C, Mai L, Ku WS, et al. Strike (with) a pose: neural networks are easily fooled by strange poses of familiar objects. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 4845–54.

  2. Lenc K, Vedaldi A. Understanding image representations by measuring their equivariance and equivalence. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. pp. 991–99.

  3. Fischler MA, Elschlager RA. The representation and matching of pictorial structures. IEEE Trans Comput. 1973;100(1):67–92.

    Google Scholar 

  4. Mundy JL. Object recognition in the geometric era: A retrospective. In: Toward category-level object recognition. Springer; 2006. p. 3–28.

    Google Scholar 

  5. Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, et al. The history began from alexnet: a comprehensive survey on deep learning approaches. arXiv preprint. 2018. arXiv:1803.01164.

  6. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p.p 770–78.

  7. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.

    Google Scholar 

  8. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intell Neurosci. 2018;2018:1–13. https://doi.org/10.1155/2018/7068349.

  9. Fukushima K, Miyake S. Neocognitron: a self- organizing neural network model for a mechanism of visual pattern recognition. In: Competition and cooperation in neural nets. Springer; 1982. p. 267–85.

    Google Scholar 

  10. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient- based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

    Google Scholar 

  11. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint. 2012. arXiv:1207.0580.

  12. Plagianakos V, Magoulas G, Vrahatis M. Learning rate adaptation in stochastic gradient descent. In: Advances in convex analysis and global optimization. Springer; 2001. p. 433–44.

    MATH  Google Scholar 

  13. Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. Berlin: ICML; 2010.

    Google Scholar 

  14. Scherer D, Müller A, Behnke S. Evaluation of pooling operations in convolutional architectures for object recognition. In: International conference on artificial neural networks. Springer; 2010. p. 92–101.

    Google Scholar 

  15. Moody J, Hanson S, Krogh A, Hertz JA. A simple weight decay can improve generalization. Adv Neural Inf Process Syst. 1992;4:950–57.

  16. Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol. 1962;160(1):106.

    Google Scholar 

  17. Riesenhuber M, Poggio T. Hierarchical models of object recognition in cortex. Nat Neurosci. 1999;2(11):1019–25.

    Google Scholar 

  18. Gong Y, Wang L, Guo R, Lazebnik S. Multi-scale orderless pooling of deep convolutional activation features. In: European conference on computer vision. Springer; 2014. p. 392–407.

    Google Scholar 

  19. Kheradpisheh SR, Ghodrati M, Ganjtabesh M, Masquelier T. Deep networks can resemble human feed-forward vision in invariant object recognition. Sci Rep. 2016;6:32672.

    Google Scholar 

  20. Fukushima K. Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Netw. 1988;1(2):119–30.

    Google Scholar 

  21. Xiao YP, Lai YK, Zhang FL, Li C, Gao L. A survey on deep geometry learning: from a representation perspective. Comput Vis Media. 2020;6(2):113–33.

    Google Scholar 

  22. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell. 2009;32(9):1627–45.

    Google Scholar 

  23. Müller M, Casser V, Lahoud J, Smith N, Ghanem B. Sim4cv: a photo-realistic simulator for computer vision applications. Int J Comput Vis. 2018;126(9):902–19.

    Google Scholar 

  24. Roska T, Hamori J, Labos E, Lotz K, Orzo L, Takacs J, et al. The use of CNN models in the subcortical visual pathway. IEEE Trans Circ Syst I. 1993;40(3):182–95.

    MATH  Google Scholar 

  25. Albawi S, Mohammed TA, Al-Zawi S. Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET). IEEE; 2017. pp. 1–6.

  26. Zaniolo L, Marques O. On the use of variable stride in convolutional neural networks. Multimedia Tools Appl. 2020;1–18.

  27. Murray N, Perronnin F. Generalized max pooling. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014. pp. 2473–80.

  28. Kuan K, Manek G, Lin J, Fang Y, Chandrasekhar V. Region average pooling for context-aware object detection. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE; 2017. pp. 1347–51.

  29. Khan A, Sohail A, Zahoora U, Qureshi AS. A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev. 2020;53(8):5455–516.

    Google Scholar 

  30. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.

    Google Scholar 

  31. Luo W, Li Y, Urtasun R, Zemel R. Understanding the effective receptive field in deep convolutional neural networks. Adv Neural Inf Process Syst. 2016;29:4898–906.

    Google Scholar 

  32. Araujo A, Norris W, Sim J. Computing receptive fields of convolutional neural networks. Distill. 2019;4(11):e21.

    Google Scholar 

  33. Montserrat DM, Lin Q, Allebach J, Delp EJ. Training object detection and recognition CNN models using data augmentation. Electron Imaging. 2017;2017(10):27–36.

    Google Scholar 

  34. Savalle PA, Tsogkas S, Papandreou G, Kokkinos I. Deformable part models with cnn features. In: Deformable Part Models with CNN Features. European Conference on Computer Vision, Parts and Attributes Workshop, Sep 6, 2014, Zurich, Switzerland (hal-01109290).

  35. Tang W, Yu P, Zhou J, Wu Y. Towards a unified compositional model for visual pattern modeling. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. pp. 2784–93.

  36. Kortylewski A, He J, Liu Q, Yuille AL. Compositional convolutional neural networks: a deep architecture with innate robustness to partial occlusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. pp. 8940–49.

  37. Jack D, Maire F, Shirazi S, Eriksson A. IGE- Net: Inverse graphics energy networks for human pose estimation and single-view reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 7075–84.

  38. Halder SS, Lalonde JF, Charette Rd. Physics-based rendering for improving robustness to rain. In: Proceedings of the IEEE International Conference on Computer Vision; 2019. pp. 10203–12.

  39. Clevert DA, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint. 2015. arXiv:1511.07289.

  40. Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In: Proc. icml. vol. 30; 2013. p. 3.

  41. Goodfellow I, Warde-Farley D, Mirza M, Courville A, Bengio Y. Maxout networks. In: International conference on machine learning. PMLR; 2013. pp. 1319–27.

  42. He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.

    Google Scholar 

  43. Laptev D, Savinov N, Buhmann JM, Pollefeys M. TI-POOLING: transformation-invariant pooling for feature learning in convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. pp. 289–97.

  44. Yu D, Wang H, Chen P, Wei Z. Mixed pooling for convolutional neural networks. In: International conference on rough sets and knowledge technology. Springer; 2014. p. 364–75.

    Google Scholar 

  45. Zeiler MD, Fergus R. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint. 2013. arXiv:1301.3557.

  46. Wan L, Zeiler M, Zhang S, Le Cun Y, Fergus R. Regularization of neural networks using dropconnect. In: International conference on machine learning; 2013. pp. 1058–66.

  47. Larsson G, Maire M, Shakhnarovich G. Fractalnet: Ultra-deep neural networks without residuals. arXiv preprint. 2016. arXiv:1605.07648.

  48. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint. 2015. arXiv:1502.03167.

  49. Wei Z, Zhang J, Liu L, Zhu F, Shen F, Zhou Y, et al. Building detail-sensitive semantic segmentation networks with polynomial pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 7115–23.

  50. Estrach JB, Szlam A, LeCun Y. Signal recovery from pooling representations. In: International conference on machine learning. PMLR; 2014. pp. 307–15.

  51. Ouyang W, Luo P, Zeng X, Qiu S, Tian Y, Li H, et al. Deepid-net: multi-stage and deformable deep convolutional neural networks for object detection. arXiv preprint. 2014. arXiv:1409.3505.

  52. Girshick R. Fast R-CNN object detection with Caffe. Microsoft Res. 2015.

  53. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):60.

    Google Scholar 

  54. Paulin M, Revaud J, Harchaoui Z, Perronnin F, Schmid C. Transformation pursuit for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014. pp. 3646–53.

  55. Azulay A, Weiss Y. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint. 2018. arXiv:1805.12177.

  56. Engstrom L, Tsipras D, Schmidt L, Madry A. A rotation and a translation suffice: fooling CNNs with simple transformations. arXiv preprint. 2017;1(2):3. arXiv:1712.02779

  57. Sabour S, Frosst N, Hinton GE. Dynamic routing between capsules. In: Advances in neural information processing systems; 2017. pp. 3856–66.

  58. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, et al. Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision; 2017. pp. 764–73.

  59. Jia X, De Brabandere B, Tuytelaars T, Gool LV. Dynamic filter networks. In: Advances in neural information processing systems; 2016. pp. 667–75.

  60. Tarasiuk P, Pryczek M. Geometric transformations embedded into convolutional neural networks. J Appl Comput Sci. 2016;24(3):33–48.

    Google Scholar 

  61. Cohen T, Welling M. Group equivariant convolutional networks. In: International conference on machine learning; 2016. pp. 2990–9.

  62. Dieleman S, De Fauw J, Kavukcuoglu K. Exploiting cyclic symmetry in convolutional neural networks. arXiv preprint. 2016. arXiv:1602.02660.

  63. Marcos D, Volpi M, Komodakis N, Tuia D. Rotation equivariant vector field networks. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. pp. 5048–57.

  64. Van Noord N, Postma E. Learning scale-variant and scale-invariant features for deep image classification. Pattern Recogn. 2017;61:583–92.

    Google Scholar 

  65. Ghosh R, Gupta AK. Scale steerable filters for locally scale-invariant convolutional neural networks. arXiv preprint. 2019. arXiv:1906.03861.

  66. Li J, Liang X, Shen S, Xu T, Feng J, Yan S. Scale- aware fast R-CNN for pedestrian detection. IEEE Trans Multimedia. 2017;20(4):985–96.

    Google Scholar 

  67. Marcos D, Volpi M, Tuia D. Learning rotation invariant convolutional filters for texture classification. In: 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE; 2016. pp. 2012–7.

  68. Zhou Y, Ye Q, Qiu Q, Jiao J. Oriented response networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 519–28.

  69. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014. pp. 580–7.

  70. Lin TY, Doll´ar P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. pp. 2117–25.

  71. Jeon Y, Kim J. Active convolution: learning the shape of convolution for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 4201–9.

  72. Chen LC, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint. 2017. arXiv:1706.05587.

  73. Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv preprint. 2015. arXiv:1511.07122.

  74. Hinton GE, Krizhevsky A, Wang SD. Transforming auto-encoders. In: International conference on artificial neural networks. Springer; 2011. p. 44–51.

    Google Scholar 

  75. Hinton GE, Sabour S, Frosst N. Matrix capsules with EM routing. In: International conference on learning representations; 2018.

  76. Zhao W, Ye J, Yang M, Lei Z, Zhang S, Zhao Z. Investigating capsule networks with dynamic routing for text classification. arXiv preprint. 2018. arXiv:1804.00538.

  77. Venkatraman S, Balasubramanian S, Sarma RR. Building deep, equivariant capsule networks. arXiv preprint. 2019. arXiv:1908.01300.

  78. Phaye SSR, Sikka A, Dhall A, Bathula D. Dense and diverse capsule networks: making the capsules learn better. arXiv preprint. 2018. arXiv:1805.04001.

  79. Ramasinghe S, Athuraliya C, Khan SH. A context- aware capsule network for multi-label classification. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 0–0.

  80. Zhang L, Edraki M, Qi GJ. Cappronet: Deep feature learning via orthogonal projections onto capsule subspaces. In: Advances in Neural Information Processing Systems; 2018. pp. 5814–23.

  81. Rodrıguez-Sanchez A, Dick T. Capsule Networks for Attention Under Occlusion. In: International Conference on Artificial Neural Networks. Springer; 2019. pp. 523–34.

  82. Prakash S, Gu G. Simultaneous localization and mapping with depth prediction using capsule networks for uavs. arXiv preprint. 2018. arXiv:1808.05336.

  83. Mekhalfi ML, Bejiga MB, Soresina D, Melgani F, Demir B. Capsule networks for object detection in UAV imagery. Remote Sensing. 2019;11(14):1694.

    Google Scholar 

  84. Kumar AD. Novel deep learning model for traffic sign detection using capsule networks. arXiv preprint. 2018. arXiv:1805.04424.

  85. LaLonde R, Bagci U. Capsules for object segmentation. arXiv preprint. 2018. arXiv:1804.04241.

  86. Duarte K, Rawat Y, Shah M. Videocapsulenet: a simplified network for action detection. In: Advances in Neural Information Processing Systems; 2018. pp. 7610–9.

  87. Zhao Y, Birdal T, Deng H, Tombari F. 3D point capsule networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2019. pp. 1009–18.

  88. Ahmad A, Kakillioglu B, Velipasalar S. 3D capsule networks for object classification from 3D model data. In: 2018 52nd Asilomar Conference on Signals, Systems, and Computers. IEEE; 2018. pp. 2225–9.

  89. Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks. In: Advances in neural information processing systems; 2015. pp. 2017–25.

  90. Worrall DE, Garbin SJ, Turmukhambetov D, Brostow GJ. Harmonic networks: deep translation and rotation equivariance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 5028–37.

  91. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer; 2014. p. 818–33.

    Google Scholar 

  92. Doersch C, Gupta A, Efros AA. Mid-level visual element discovery as discriminative mode seeking. In: Advances in neural information processing systems; 2013. pp. 494–502.

  93. Parizi SN, Vedaldi A, Zisserman A, Felzenszwalb P. Automatic discovery and optimization of parts for image classification. arXiv preprint. 2014. arXiv:1412.6598.

  94. Li Y, Liu L, Shen C, Van Den Hengel A. Mining mid-level visual patterns with deep CNN activations. Int J Comput Vision. 2017;121(3):344–64.

    MathSciNet  MATH  Google Scholar 

  95. Yang L, Xie X, Li P, Zhang D, Zhang L. Part-based convolutional neural network for visual recognition. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE; 2017. pp. 1772–6.

  96. Kortylewski A, Liu Q, Wang H, Zhang Z, Yuille A. Combining compositional models and deep networks for robust object classification under occlusion. In: The IEEE Winter Conference on Applications of Computer Vision; 2020. pp. 1333–41.

  97. Sun Y, Zheng L, Li Y, Yang Y, Tian Q, Wang S. Learning part-based convolutional features for person re-identification. IEEE Trans Pattern Anal Mach Intell. 2019;43(3):902–17. https://doi.org/10.1109/TPAMI.2019.2938523.

  98. Hsieh PJ, Lin YL, Chen YH, Hsu W. Egocentric activity recognition by leveraging multiple mid- level representations. In: 2016 IEEE International Conference on Multimedia and Expo (ICME). IEEE; 2016. pp. 1–6.

  99. Tang W, Yu P, Wu Y. Deeply learned compositional models for human pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 190–206.

  100. Zhang Z, Xie C, Wang J, Xie L, Yuille AL. Deepvoting: a robust and explainable deep network for semantic part detection under partial occlusion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. pp. 1372–80.

  101. Hariharan B, Arbelaez P, Girshick R, Malik J. Object instance segmentation and fine-grained localization using hypercolumns. IEEE Trans Pattern Anal Mach Intell. 2016;39(4):627–39.

    Google Scholar 

  102. Johnson J. Deep, skinny neural networks are not universal approximators. arXiv preprint. 2018. arXiv:1810.00393.

  103. Marcus G. Deep learning: a critical appraisal. arXiv preprint. 2018. arXiv:1801.00631.

  104. Shen X, Tian X, He A, Sun S, Tao D. Transform- invariant convolutional neural networks for image classification and search. In: Proceedings of the 24th ACM international conference on Multimedia; 2016. pp. 1345–54.

  105. Shu C, Chen X, Xie Q, Han H. Hierarchical Spatial Transformer Network. arXiv preprint. 2018. arXiv:1801.09467.

  106. Wang X, Shrivastava A, Gupta A. A-fast-rcnn: Hard positive generation via adversary for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. pp. 2606–15.

  107. Girdhar R, Carreira J, Doersch C, Zisserman A. Video action transformer network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 244–53.

  108. Yan X, Yang J, Yumer E, Guo Y, Lee H. Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision. In: Proceedings of the 30th International Conference on Neural Information Processing Systems; 2016. pp. 1704–12.

  109. Bhagavatula C, Zhu C, Luu K, Savvides M. Faster than real-time facial alignment: a 3D spatial transformer network approach in unconstrained poses. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. pp. 3980–89.

  110. Lin CH, Lucey S. Inverse compositional spatial transformer networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 2568–76.

  111. Freifeld O, Hauberg S, Batmanghelich K, Fisher JW. Transformations based on continuous piecewise-affine velocity fields. IEEE Trans Pattern Anal Mach Intell. 2017;39(12):2496–509.

    Google Scholar 

  112. Wei Z, Sun Y, Lin J, Liu S. Learning adaptive receptive fields for deep image parsing networks. Comput Vis Media. 2018;4(3):231–44.

    Google Scholar 

  113. Jing Y, Liu Y, Yang Y, Feng Z, Yu Y, Tao D, et al. Stroke controllable fast style transfer with adaptive receptive fields. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 238–54.

  114. Ciresan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE; 2012. pp. 3642–9.

  115. Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos. In: Advances in neural information processing systems; 2014. pp. 568–76.

  116. Ciresan D, Meier U. Multi-column deep neural networks for offline handwritten Chinese character classification. In: 2015 international joint conference on neural networks (IJCNN). IEEE; 2015. pp. 1–6.

  117. Natarajan S, Annamraju AK, Baradkar CS. Traffic sign recognition using weighted multi-convolutional neural network. IET Intel Transport Syst. 2018;12(10):1396–405.

    Google Scholar 

  118. Zhang J, Duan S, Wang L, Zou X. Multi-column spatial transformer convolution neural network for traffic sign recognition. In: International Symposium on Neural Networks. Springer; 2018. p. 593–600.

    Google Scholar 

  119. Fan C, Li Y, Wang G, Li Y. Learning transformation- invariant representations for image recognition with drop transformation networks. IEEE Access. 2018;6:73357–69.

    Google Scholar 

  120. Liu Y, Guo Y, Georgiou T, Lew MS. Fusion that matters: convolutional fusion networks for visual recognition. Multimedia Tools Appl. 2018;77(22):29407–34.

    Google Scholar 

  121. Lu X, Lin Z, Shen X, Mech R, Wang JZ. Deep multi- patch aggregation network for image style, aesthetics, and quality estimation. In: Proceedings of the IEEE International Conference on Computer Vision; 2015. pp. 990–8.

  122. Wen G, Hou Z, Li H, Li D, Jiang L, Xun E. Ensemble of deep neural networks with probability-based fusion for facial expression recognition. Cogn Comput. 2017;9(5):597–610.

    Google Scholar 

  123. Tabik S, Alvear-Sandoval RF, Ruiz MM, Sancho-Gómez JL, Figueiras-Vidal AR, Herrera F. MNIST- NET10: a heterogeneous deep networks fusion based on the degree of certainty to reach 0.1% error rate. Ensembles overview and proposal. Inf Fus. 2020;62:73–80.

    Google Scholar 

  124. Hong X, Xiong P, Ji R, Fan H. Deep fusion network for image completion. In: Proceedings of the 27th ACM International Conference on Multimedia; 2019. pp. 2033–42.

  125. Gallo I, Calefati A, Nawaz S. Multimodal classification fusion in real-world scenarios. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 5. IEEE; 2017. pp. 36–41.

  126. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint. 2014. arXiv:1409.1556.

  127. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. pp. 1–9.

  128. Xu Y, Xiao T, Zhang J, Yang K, Zhang Z. Scale- invariant convolutional neural networks. arXiv preprint. 2014. arXiv:1411.6369.

  129. Liao Z, Carneiro G. Competitive multi-scale convolution. arXiv preprint. 2015. arXiv:1511.05635.

  130. Du X, Qu X, He Y, Guo D. Single image super- resolution based on multi-scale competitive convolutional neural network. Sensors. 2018;18(3):789.

    Google Scholar 

  131. Chen X, Bin Y, Sang N, Gao C. Scale pyramid network for crowd counting. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE; 2019. pp. 1941–50.

  132. Szegedy C, Toshev A, Erhan D. Deep neural networks for object detection. In: Advances in neural information processing systems; 2013. pp. 2553–61.

  133. Iandola F, Moskewicz M, Karayev S, Girshick R, Darrell T, Keutzer K. Densenet: implementing efficient convnet descriptor pyramids. arXiv preprint. 2014. arXiv:1404.1869.

  134. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint. 2013. arXiv:1312.6229.

  135. Wu R, Yan S, Shan Y, Dang Q, Sun G. Deep image: scaling up image recognition. arXiv preprint. 2015;7(8). arXiv:1501.02876.

  136. Kong T, Yao A, Chen Y, Sun F. Hypernet: Towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. pp. 845–53.

  137. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. pp. 3431–40.

  138. Bell S, Lawrence Zitnick C, Bala K, Girshick R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. pp. 2874–83.

  139. Cai Z, Fan Q, Feris RS, Vasconcelos N. A unified multi-scale deep convolutional neural network for fast object detection. In: European conference on computer vision. Springer; 2016. p. 354–70.

    Google Scholar 

  140. Li Y, Chen Y, Wang N, Zhang Z. Scale-aware trident networks for object detection. In: Proceedings of the IEEE international conference on computer vision; 2019. pp. 6054–63.

  141. Zhang Y, Zhou D, Chen S, Gao S, Ma Y. Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. pp. 589–97.

  142. Cui J, Chen P, Li R, Liu S, Shen X, Jia J. Fast and practical neural architecture search. In: Proceedings of the IEEE International Conference on Computer Vision; 2019. pp. 6509–18.

  143. Cai H, Zhu L, Han S. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint. 2018. arXiv:1812.00332.

  144. Cheng G, Han J, Zhou P, Xu D. Learning rotation- invariant and fisher discriminative convolutional neural networks for object detection. IEEE Trans Image Process. 2018;28(1):265–78.

    MathSciNet  MATH  Google Scholar 

  145. Wu F, Hu P, Kong D. Flip-rotate-pooling convolution and split dropout on convolution neural networks for image classification. arXiv preprint. 2015. arXiv:1507.08754.

  146. Jiang R, Mei S. Polar coordinate convolutional neural network: from rotation-invariance to translation-invariance. In: 2019 IEEE International Conference on Image Processing (ICIP). IEEE; 2019. pp. 355–59.

  147. Chen J, Luo Z, Zhang Z, Huang F, Ye Z, Takiguchi T, et al. Polar transformation on image features for orientation-invariant representations. IEEE Trans Multimedia. 2018;21(2):300–13.

    Google Scholar 

  148. Kim J, Jung W, Kim H, Lee J. CyCNN: a rotation invariant CNN using polar mapping and cylindrical convolution layers. arXiv preprint. 2020. arXiv:2007.10588.

  149. Esteves C, Allen-Blanchette C, Zhou X, Daniilidis K. Polar transformer networks. arXiv preprint. 2017. arXiv:1709.01889.

  150. Henriques JF, Vedaldi A. Warped convolutions: efficient invariance to spatial transformations. In: International Conference on Machine Learning. PMLR; 2017. pp. 1461–9.

  151. Schmidt U, Roth S. Learning rotation-aware features: from invariant priors to equivariant descriptors. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2012. pp. 2050–7.

  152. Amorim M, Bortoloti F, Ciarelli PM, de Oliveira E, de Souza AF. Analysing rotation-invariance of a log-polar transformation in convolutional neural networks. In: 2018 International Joint Conference on Neural Networks (IJCNN). IEEE; 2018. pp. 1–6.

  153. Remmelzwaal LA, Mishra AK, Ellis GF. Human eye inspired log-polar pre-processing for neural networks. In: 2020 International SAUPEC/RobMech/PRASA Conference. IEEE; 2020. pp. 1–6.

  154. Freeman WT, Adelson EH, et al. The design and use of steerable filters. IEEE Trans Pattern Anal Mach Intell. 1991;13(9):891–906.

    Google Scholar 

  155. Cohen TS, Welling M. Steerable CNNs. arXiv preprint. 2016. arXiv:1612.08498.

  156. Jacobsen JH, De Brabandere B, Smeulders AW. Dynamic steerable blocks in deep residual networks. arXiv preprint. 2017. arXiv:1706.00598.

  157. Weiler M, Hamprecht FA, Storath M. Learning steerable filters for rotation equivariant CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. pp. 849–58.

  158. Luan S, Chen C, Zhang B, Han J, Liu J. Gabor convolutional networks. IEEE Trans Image Process. 2018;27(9):4357–66.

    MathSciNet  Google Scholar 

  159. Su YC, Grauman K. Making 360 video watchable in 2d: learning videography for click free viewing. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017. pp. 1368–76.

  160. Monroy R, Lutz S, Chalasani T, Smolic A. Salnet360: saliency maps for omni-directional images with CNN. Signal Process. 2018;69:26–34.

    Google Scholar 

  161. Khasanova R, Frossard P. Graph-based isometry invariant representation learning. arXiv preprint. 2017. arXiv:1703.00356.

  162. Khasanova R, Frossard P. Graph-based classification of omnidirectional images. In: Proceedings of the IEEE International Conference on Computer Vision Workshops; 2017. pp. 869–78.

  163. Cohen TS, Geiger M, Köhler J, Welling M. Spherical CNNs. arXiv preprint. 2018. arXiv:1801.10130.

  164. Zhao Q, Zhu C, Dai F, Ma Y, Jin G, Zhang Y. Distortion-aware CNNs for Spherical Images. In: IJCAI; 2018. pp. 1198–204.

  165. Zhang Z, Xu Y, Yu J, Gao S. Saliency detection in 360 videos. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 488–503.

  166. Perraudin N, Defferrard M, Kacprzak T, Sgier R. DeepSphere: efficient spherical convolutional neural network with HEALPix sampling for cosmological applications. Astronomy Comput. 2019;27:130–46.

    Google Scholar 

  167. Boomsma W, Frellsen J. Spherical convolutions and their application in molecular modelling. In: Advances in Neural Information Processing Systems; 2017. pp. 3433–43.

  168. Coors B, Paul Condurache A, Geiger A. Spherenet: learning spherical representations for detection and classification in omnidirectional images. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 518–33.

  169. Su YC, Grauman K. Learning spherical convolution for fast features from 360 imagery. In: Advances in Neural Information Processing Systems; 2017. pp. 529–39.

  170. Esteves C, Allen-Blanchette C, Makadia A, Daniilidis K. Learning so (3) equivariant representations with spherical CNNs. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 52–68.

  171. Su YC, Grauman K. Kernel transformer networks for compact spherical convolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 9442–51.

  172. Schmalstieg D, Hollerer T. Augmented reality: principles and practice. Addison-Wesley Professional; 2016.

    Google Scholar 

  173. Hirabayashi M, Kurosawa K, Yokota R, Imoto D, Hawai Y, Akiba N, et al. Flying object detection system using an omnidirectional camera. Forensic Sci Int. 2020;35:301027.

    Google Scholar 

  174. Cohen TS, Geiger M, Weiler M. A general theory of equivariant cnns on homogeneous spaces. In: Advances in Neural Information Processing Systems; 2019. pp. 9145–56.

  175. Weiler M, Cesa G. General e (2)-equivariant steerable CNNs. In: Advances in Neural Information Processing Systems; 2019. pp. 14334–45.

  176. Kondor R, Trivedi S. On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint. 2018. arXiv:1802.03690.

  177. Folland GB. A course in abstract harmonic analysis, vol. 29. CRC Press; 2016.

    MATH  Google Scholar 

  178. Tai KS, Bailis P, Valiant G. Equivariant transformer networks. arXiv preprint. 2019. arXiv:1901.11399.

  179. Lenssen JE, Fey M, Libuschewski P. Group equivariant capsule networks. In: Advances in Neural Information Processing Systems; 2018. pp. 8844–53.

  180. Romero DW, Bekkers EJ, Tomczak JM, Hoogendoorn M. Attentive group equivariant convolutional networks. arXiv preprint. 2020. arXiv:2002.03830.

  181. Worrall D, Welling M. Deep scale-spaces: equivariance over scale. In: Advances in Neural Information Processing Systems; 2019. pp. 7366–78.

  182. Marcos D, Kellenberger B, Lobry S, Tuia D. Scale equivariance in CNNs with vector fields. arXiv preprint. 2018. arXiv:1807.11783.

  183. Sosnovik I, Szmaja M, Smeulders A. Scale-equivariant steerable networks. arXiv preprint. 2019. arXiv:1910.11093.

  184. Romero DW, Bekkers EJ, Tomczak JM, Hoogendoorn M. Wavelet networks: scale equivariant learning from raw waveforms. arXiv preprint. 2020. arXiv:2006.05259.

  185. Cheng X, Qiu Q, Calderbank R, Sapiro G. RotDCF: decomposition of convolutional filters for rotation-equivariant deep networks. arXiv preprint. 2018. arXiv:1805.06846.

  186. Dieleman S, Willett KW, Dambre J. Rotation- invariant convolutional neural networks for galaxy morphology prediction. Mon Not R Astron Soc. 2015;450(2):1441–59.

    Google Scholar 

  187. Cohen TS, weiler M, Kicanaoglu B, Welling M. Gauge equivariant convolutional networks and the icosahedral CNN. In: Proceedings of the 36th International Conference on Machine Learning, 2019:97:1321–30.

  188. Worrall D, Brostow G. Cubenet: equivariance to 3D rotation and translation. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 567–84.

  189. Cohen TS, Welling M. Transformation properties of learned visual representations. arXiv preprint. 2014. arXiv:1412.7659.

  190. Smets B, Portegies J, Bekkers E, Duits R. PDE-based group equivariant convolutional neural networks. arXiv preprint. 2020. arXiv:2001.09046.

  191. Romero DW, Hoogendoorn M. Co-attentive equivariant neural networks: Focusing equivariance on transformations co-occurring in data. arXiv preprint. 2019. arXiv:1911.07849.

  192. Romero DW, Cordonnier JB. Group equivariant stand-alone self-attention for vision. arXiv preprint. 2020. arXiv:2010.00977.

  193. Finzi M, Stanton S, Izmailov P, Wilson AG. Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. arXiv preprint. 2020. arXiv:2002.12880.

  194. Bruna J, Mallat S. Invariant scattering convolution networks. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1872–86.

    Google Scholar 

  195. Bekkers EJ. B-spline CNNs on lie groups. arXiv preprint. 2019. arXiv:1909.12057.

  196. Fey M, Eric Lenssen J, Weichert F, Mu¨ller H. Splinecnn: fast geometric deep learning with continuous b-spline kernels. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. pp. 869–77.

  197. Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001;2(3):194–203.

    Google Scholar 

  198. Dey N, Chen A, Ghafurian S. Group equivariant generative adversarial networks. arXiv preprint. 2020. arXiv:2005.01683.

  199. Shen C, Wang X, Song J, Sun L, Song M. Amalgamating knowledge towards comprehensive classification. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. pp. 3068–75.

  200. Carlucci FM, D’Innocente A, Bucci S, Caputo B, Tommasi T. Domain generalization by solving jigsaw puzzles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 2229–38.

  201. Finn C, Abbeel P, Levine S. Model-agnostic meta- learning for fast adaptation of deep networks. arXiv preprint. 2017. arXiv:1703.03400.

  202. Jarvers C, Neumann H. Incorporating feedback in convolutional neural networks. In: Proceedings of the Cognitive Computational Neuroscience Conference; 2019. pp. 395–8.

  203. Marblestone AH, Wayne G, Kording KP. Toward an integration of deep learning and neuroscience. Front Comput Neurosci. 2016;10:94.

    Google Scholar 

  204. Hu T, Yang P, Zhang C, Yu G, Mu Y, Snoek CG. Attention-based multi-context guiding for few-shot semantic segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. pp 8441–8.

  205. Hutter F, Kotthoff L, Vanschoren J. Automated machine learning: methods, systems, challenges. Springer Nature; 2019.

    Google Scholar 

  206. He X, Zhao K, Chu X. AutoML: a survey of the state-of-the-art. arXiv preprint. 2019. arXiv:1908.00709.

  207. Zoph B, Le QV. Neural architecture search with reinforcement learning. arXiv preprint. 2016. arXiv:1611.01578.

  208. Peng J, Sun M, ZHANG ZX, Tan T, Yan J. Efficient neural architecture transformation search in channel- level for object detection. In: Advances in Neural Information Processing Systems; 2019. pp. 14313–22.

  209. Nekrasov V, Chen H, Shen C, Reid I. Fast neural architecture search of compact semantic segmentation models via auxiliary cells. In: Proceedings of the IEEE Conference on computer vision and pattern recognition; 2019. pp. 9126–35.

  210. Zhang Y, Qiu Z, Liu J, Yao T, Liu D, Mei T. Customizable architecture search for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 11641–50.

  211. Liu C, Chen LC, Schroff F, Adam H, Hua W, Yuille AL, et al. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2019. pp. 82–92.

  212. Elsken T, Staffler B, Metzen JH, Hutter F. Meta-learning of neural architectures for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. pp. 12365–75.

  213. Biedenkapp A, Bozkurt HF, Eimer T, Hutter F, Lindauer M. Dynamic algorithm configuration: foundation of a new meta-algorithmic framework. In: Proceedings of the Twenty-fourth European Conference on Artificial Intelligence (ECAI’20) (Jun 2020); 2020.

  214. Elsken T, Metzen JH, Hutter F. Simple and efficient architecture search for convolutional neural networks. arXiv preprint. 2017. arXiv:1711.04528.

  215. Veniat T, Denoyer L. Learning time/memory-efficient deep architectures with budgeted super networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. pp. 3492–500.

  216. Jin H, Song Q, Hu X. Auto-keras: An efficient neural architecture search system. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. pp. 1946–56.

  217. Feurer M, Klein A, Eggensperger K, Springenberg JT, Blum M, Hutter F. Auto-sklearn: efficient and robust automated machine learning. In: Automated machine learning. Cham: Springer; 2019. p. 113–34.

    Google Scholar 

Download references

Acknowledgements

The work was entirely done by the authors without the direct involvement of any third party.

Funding

No funding has been received for this work.

Author information

Authors and Affiliations

Authors

Contributions

The individual contributions of authors to this manuscript is specified below. AM: conceived the study and formulated the original problem. FM: helped in providing the initial direction of the work. Both authors jointly developed the methodology and categorized common approaches for encoding geometric transformation invariance. AM: surveyed approaches to transformation invariance in general contexts (that is, affine and arbitrary transformations). In addition, he analyzed the various methods and architectures for rotation-equivariance, surveyed new research and development trends relating to the design of deep convolutional neural network architectures. FM: principally surveyed approaches for symmetry group equivariance, as well as techniques for encoding invariance in single transformations (specifically, scale, rotation and projective transformations). Both authors jointly design the artwork for the illustrations. The authors also equally participated in drafting the manuscript and its subsequent revisions.

Corresponding author

Correspondence to Alhassan Mumuni.

Ethics declarations

Conflict of Interest

There are no competing interests or conflict of interests associated with this work, and there has not been any support or external involvement of any sort in this work.

Consent to Participate

Not applicable. No person’s data have been used in this work. Consequently, consent is not needed for its publication.

Final Comments

Both authors agree to be fully responsible for all aspects of the work, including the content and all ethical and legal issues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Mumuni, A., Mumuni, F. CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review. SN COMPUT. SCI. 2, 340 (2021). https://doi.org/10.1007/s42979-021-00735-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-021-00735-0

Keywords

  • Convolutional neural network
  • Robust computer vision
  • Invariant recognition
  • Transformation-equivariant network
  • Symmetry group transformation