Abstract
Convolutional neural networks (CNNs) have recently seen tremendous success in various computer vision tasks. However, their application to problems with high-dimensional input and output, such as high-resolution image and video segmentation or 3D medical imaging, has been limited by various factors. Primarily, in the training stage, it is necessary to store network activations for back-propagation. In these settings, the memory required to store activations can exceed what is feasible on current hardware, especially for problems in 3D. Motivated by the propagation of signals over physical networks, which is governed by the hyperbolic telegraph equation, in this work we introduce a fully conservative hyperbolic network for problems with high-dimensional input and output. We introduce a coarsening operation that enables completely reversible CNNs by using a learnable discrete wavelet transform and its inverse to both coarsen and interpolate the network state and change the number of channels. We show that fully reversible networks achieve results comparable to the state of the art in 4D time-lapse hyperspectral image segmentation and full 3D video segmentation, with a much lower memory footprint that is constant, independent of network depth. We also extend such networks to variational auto-encoders, where optimization begins from an exact recovery and the level of compression is discovered during training.
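The paper's coarsening operator is a learnable wavelet transform; as a minimal illustration of the underlying idea, the sketch below uses a fixed orthonormal Haar step, which trades spatial resolution for channels (halving height and width while quadrupling channels) and is exactly invertible and norm-preserving. The function names are ours, not the paper's, and the learnable transform in the paper generalizes the fixed filters used here.

```python
import numpy as np

def haar_coarsen(x):
    """One orthonormal 2D Haar step on x of shape (C, H, W), H and W even.
    Returns shape (4C, H/2, W/2): coarse band plus three detail bands."""
    a = x[:, 0::2, 0::2]  # top-left of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass (coarse) band
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=0)

def haar_interpolate(y):
    """Exact inverse of haar_coarsen: (4C, H, W) -> (C, 2H, 2W)."""
    c4 = y.shape[0] // 4
    ll, lh, hl, hh = y[:c4], y[c4:2*c4], y[2*c4:3*c4], y[3*c4:]
    a = (ll + lh + hl + hh) / 2.0
    b = (ll - lh + hl - hh) / 2.0
    c = (ll + lh - hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    x = np.empty((c4, 2 * ll.shape[1], 2 * ll.shape[2]))
    x[:, 0::2, 0::2] = a
    x[:, 0::2, 1::2] = b
    x[:, 1::2, 0::2] = c
    x[:, 1::2, 1::2] = d
    return x

# Round trip is exact (up to floating point) and preserves the norm,
# so no activation information is lost across resolution changes.
x = np.random.rand(3, 8, 8)
y = haar_coarsen(x)
assert y.shape == (12, 4, 4)
assert np.allclose(haar_interpolate(y), x)
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```

Because the map is bijective, activations at any resolution can be recomputed from the network output during the backward pass rather than stored, which is the source of the depth-independent memory footprint claimed in the abstract.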
Availability of data and materials
All data used in this manuscript has been cited and is publicly available. The hyperspectral example used in Sect. 5.1.1 is publicly available at [22]. The bear video used in Sect. 5.1.2 is publicly available at [47]. The MNIST dataset used in Sect. 5.2.1 is publicly available at [34]. The CelebA dataset used in Sect. 5.2.2 is publicly available at [42].
Code availability
An open source implementation of the proposed network is available on Github: https://github.com/klensink/HyperNet.
References
Ardizzone, L., Lüth, C., Kruse, J., Rother, C., Köthe, U.: Guided image generation with conditional invertible neural networks (2019)
Avendi, M., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal. 30, 108–119 (2016)
Bengio, Y.: Learning deep architectures for AI. Found. Trends® Mach. Learn. 2(1), 1–127 (2009)
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis (2018)
Bruna, J., Mallat, S.: Invariant scattering convolution networks. https://doi.org/10.48550/ARXIV.1203.1513. arXiv:1203.1513 (2012)
Caelles, S., Pumarola, A., Moreno-Noguer, F., Sanfeliu, A., Van Gool, L.: Fast video object segmentation with spatio-temporal GANs. arXiv preprint arXiv:1903.12161 (2019)
Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., Holtham, E.: Reversible architectures for arbitrarily deep residual neural networks. In: AAAI Conference on AI (2018)
Chen, T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary differential equations. CoRR arXiv:1806.07366 (2018)
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016, pp. 424–432. Springer International Publishing, Cham (2016)
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. CoRR arXiv:1606.06650 (2016)
Dai, B., Seljak, U.: Sliced iterative normalizing flows. https://doi.org/10.48550/ARXIV.2007.00674. arXiv:2007.00674 (2020)
Daubechies, I.: Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41(3), 909–996 (1988). https://doi.org/10.1002/cpa.3160410705
Dieng, A.B., Ruiz, F.J.R., Blei, D.M., Titsias, M.K.: Prescribed generative adversarial networks. https://doi.org/10.48550/ARXIV.1910.04302. arXiv:1910.04302 (2019)
Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real NVP. CoRR arXiv:1605.08803 (2016)
Etmann, C., Ke, R., Schönlieb, C.B.: iUNets: fully invertible U-Nets with learnable up- and downsampling (2020)
Fujieda, S., Takayama, K., Hachisuka, T.: Wavelet convolutional neural networks for texture classification. arXiv preprint arXiv:1707.07394 (2017)
Gholami, A., Keutzer, K., Biros, G.: ANODE: unconditionally accurate memory-efficient gradients for neural ODEs (2019)
Gomez, A.N., Ren, M., Urtasun, R., Grosse, R.B.: The reversible residual network: backpropagation without storing activations. In: Advances in Neural Information Processing Systems, pp. 2211–2221 (2017)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F.: Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 79(6), 3055–3071 (2017). https://doi.org/10.1002/mrm.26977
Hanin, B.: Universal function approximation by deep neural nets with bounded width and ReLU activations. arXiv preprint arXiv:1708.02691v3 (2017)
Hasanlou, M., Seydi, S.T.: Hyperspectral change detection: an experimental comparative study. Int. J. Remote Sens. 39(20), 7029–7083 (2018). https://doi.org/10.1080/01431161.2018.1466079
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
He, M., Li, B., Chen, H.: Multi-scale 3D deep convolutional neural network for hyperspectral image classification. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3904–3908 (2017). https://doi.org/10.1109/ICIP.2017.8297014
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. https://doi.org/10.48550/ARXIV.1706.08500. arXiv:1706.08500 (2017)
Hou, R., Chen, C., Shah, M.: An end-to-end 3D convolutional neural network for action detection and segmentation in videos. arXiv preprint arXiv:1712.01111 (2017)
Jacobsen, J., Smeulders, A.W.M., Oyallon, E.: i-RevNet: deep invertible networks. CoRR arXiv:1802.07088 (2018)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation (2017)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks (2018)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2013)
Kolouri, S., Pope, P.E., Martin, C.E., Rohde, G.K.: Sliced-Wasserstein autoencoder: an embarrassingly simple generative model. https://doi.org/10.48550/ARXIV.1804.01947. arXiv:1804.01947 (2018)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
LeCun, Y., Cortes, C.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
LeCun, Y., Cortes, C.: MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/ (2010)
Lee, H., Kwon, H.: Contextual deep CNN based hyperspectral classification. In: 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 3322–3325 (2016). https://doi.org/10.1109/IGARSS.2016.7729859
Li, J., Liang, B., Wang, Y.: A hybrid neural network for hyperspectral image classification. Remote Sens. Lett. 11(1), 96–105 (2020). https://doi.org/10.1080/2150704X.2019.1686780
Li, Y., Zhang, H., Shen, Q.: Spectral-spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 9(1), 67 (2017). https://doi.org/10.3390/rs9010067
Lin, Z., Khetan, A., Fanti, G., Oh, S.: PacGAN: the power of two samples in generative adversarial networks. https://doi.org/10.48550/ARXIV.1712.04086. arXiv:1712.04086 (2017)
Liu, P., Zhang, H., Zhang, K., Lin, L., Zuo, W.: Multi-level wavelet-CNN for image restoration. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 886–88609 (2018). https://doi.org/10.1109/CVPRW.2018.00121
Liu, S., Zhong, G., De Mello, S., Gu, J., Jampani, V., Yang, M.H., Kautz, J.: Switchable temporal propagation network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 89–104. Springer International Publishing, Cham (2018)
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of International Conference on Computer Vision (ICCV) (2015)
Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the effective receptive field in deep convolutional neural networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 4898–4906. Curran Associates Inc., Red Hook (2016)
Mallat, S.: Group invariant scattering. https://doi.org/10.48550/ARXIV.1101.2286. arXiv:1101.2286 (2011)
Oh, S.W., Lee, J., Sunkavalli, K., Kim, S.J.: Fast video object segmentation by reference-guided mask propagation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7376–7385. https://doi.org/10.1109/CVPR.2018.00770 (2018)
Oyallon, E., Belilovsky, E., Zagoruyko, S.: Scaling the scattering transform: deep hybrid networks. https://doi.org/10.48550/ARXIV.1703.08961. arXiv:1703.08961 (2017)
Perazzi, F., Pont-Tuset, J., McWilliams, B., Gool, L.V., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 724–732 (2016). https://doi.org/10.1109/CVPR.2016.85
Peters, B., Granek, J., Haber, E.: Multiresolution neural networks for tracking seismic horizons from few training images. Interpretation 7(3), SE201–SE213 (2019). https://doi.org/10.1190/INT-2018-0225.1
Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. https://doi.org/10.48550/ARXIV.1511.06434. arXiv:1511.06434 (2015)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. CoRR arXiv:1505.04597 (2015)
Ruthotto, L., Haber, E.: Deep neural networks motivated by partial differential equations. arXiv preprint arXiv:1804.04272 (2018)
Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. CoRR arXiv:1606.03498 (2016)
Seitzer, M.: pytorch-fid: FID Score for PyTorch. https://github.com/mseitzer/pytorch-fid (2020). Version 0.2.1
Shah, S., Ghosh, P., Davis, L.S., Goldstein, T.: Stacked U-Nets: a no-frills approach to natural image segmentation. CoRR arXiv:1804.10343 (2018)
Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017). https://doi.org/10.1109/TPAMI.2016.2572683
Srivastava, A., Valkov, L., Russell, C., Gutmann, M.U., Sutton, C.: VEEGAN: reducing mode collapse in GANs using implicit variational learning. https://doi.org/10.48550/ARXIV.1705.07761. arXiv:1705.07761 (2017)
Székely, G.J., Rizzo, M.L.: Testing for Equal Distributions in High Dimensions. InterStat, Durban (2004)
Tao, X., Gao, H., Wang, Y., Shen, X., Wang, J., Jia, J.: Scale-recurrent network for deep image deblurring. CoRR arXiv:1802.01770 (2018)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Deep end2end voxel2voxel prediction. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 402–409 (2016). https://doi.org/10.1109/CVPRW.2016.57
Truchetet, F., Laligant, O.: Wavelets in industrial applications: a review. In: Wavelet Applications in Industrial Processing II, vol. 5607 (2004). https://doi.org/10.1117/12.580395
Xu, Q., Xiao, Y., Wang, D., Luo, B.: CSA-MSO3DCNN: multiscale octave 3D CNN with channel and spatial attention for hyperspectral image classification. Remote Sens. 12(1), 188 (2020). https://doi.org/10.3390/rs12010188
Xue, Z.: A general generative adversarial capsule network for hyperspectral image spectral–spatial classification. Remote Sens. Lett. 11(1), 19–28 (2020). https://doi.org/10.1080/2150704X.2019.1681598
Yu, J.J., Derpanis, K.G., Brubaker, M.A.: Wavelet flow: fast training of high resolution normalizing flows (2020)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/cvpr.2017.660
Zhou, Y., Luo, Z.: A Crank–Nicolson collocation spectral method for the two-dimensional telegraph equations. J. Inequal. Appl. 2018, 137 (2018). https://doi.org/10.1186/s13660-018-1728-5
Acknowledgements
K.L. and E.H. acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Lensink, K., Peters, B. & Haber, E. Fully hyperbolic convolutional neural networks. Res Math Sci 9, 60 (2022). https://doi.org/10.1007/s40687-022-00343-1