
MgNet: A unified framework of multigrid and convolutional neural network

Abstract

We develop a unified model, known as MgNet, that simultaneously recovers some convolutional neural networks (CNNs) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). This model is based on close connections that we have observed and uncovered between the CNN and MG methodologies. For example, the pooling operation and feature extraction in CNNs correspond directly to the restriction operation and iterative smoothers in MG, respectively. As the solution space is often the dual of the data space in PDEs, the analogous concepts of feature space and data space (which are dual to each other) are introduced for CNNs. With these connections and new concepts in the unified model, the functions of the various convolution and pooling operations used in CNNs can be better understood. As a result, modified CNN models (with fewer weights and hyperparameters) are developed that exhibit competitive, and sometimes better, performance than existing CNN models on both the CIFAR-10 and CIFAR-100 data sets.
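To make the stated correspondence concrete, the following is a minimal NumPy sketch (not the authors' code) of a residual-correction iteration of the kind the abstract describes, of the form u ← u + σ(B ∗ σ(f − A ∗ u)), together with an average-pooling operator playing the role of MG restriction. The helper names `conv2d`, `mgnet_step`, and `restrict`, the ReLU nonlinearity, and the specific kernels are illustrative assumptions, not definitions from the paper.

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2-D cross-correlation (CNN-style convolution) with zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def mgnet_step(u, f, A, B):
    """One feature-extraction (smoothing) step: correct the feature u by a
    filtered, rectified residual of the data f, u <- u + relu(B * relu(f - A * u))."""
    return u + relu(conv2d(relu(f - conv2d(u, A)), B))

def restrict(x):
    """2x2 average pooling: the pooling / restriction operation that moves
    data to the next coarser grid (half resolution in each direction)."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Illustrative kernels: A as a discrete-Laplacian-like "data" operator,
# B as a small averaging "smoother"; in a trained network both are learned.
A = np.array([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
B = 0.1 * np.ones((3, 3))
f = np.ones((8, 8))          # data on the fine grid
u = np.zeros((8, 8))         # initial feature
u = mgnet_step(u, f, A, B)   # one smoothing iteration on the fine level
f_coarse = restrict(f)       # pooled data for the next (coarser) level
```

In MG terms, repeating `mgnet_step` plays the role of smoothing on one level, while `restrict` transfers the problem to the next level; in CNN terms, the same two operations are a residual-style convolutional block followed by pooling.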



Acknowledgements

The first author was supported by the Elite Program of Computational and Applied Mathematics for PhD Candidates of Peking University. The second author was supported in part by the National Science Foundation of USA (Grant No. DMS-1819157) and the US Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics Program (Grant No. DE-SC0014400). The authors thank Xiaodong Jia for his help with the numerical experiments.


Corresponding author

Correspondence to Jinchao Xu.



Cite this article

He, J., Xu, J. MgNet: A unified framework of multigrid and convolutional neural network. Sci. China Math. 62, 1331–1354 (2019). https://doi.org/10.1007/s11425-019-9547-2


Keywords

  • convolutional neural network
  • multigrid
  • unified framework
  • network architecture

MSC(2010)

  • 65D19
  • 65N55
  • 68T30