
Model complexity of deep learning: a survey

  • Survey Paper
  • Published in: Knowledge and Information Systems

Abstract

Model complexity is a fundamental problem in deep learning. In this paper, we provide a systematic overview of the latest studies on model complexity in deep learning. Model complexity in deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies in these two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss applications of deep learning model complexity, including understanding model generalization, model optimization, and model selection and design. We conclude by proposing several interesting directions for future work.
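To make the expressive-capacity side of this taxonomy concrete, the sketch below, which is our own illustration and not code from the paper, counts the linear regions a randomly initialized ReLU network induces along a line segment in input space. The architecture sizes, function names, and sampling resolution are assumptions chosen only for demonstration.

```python
# A minimal illustration (not code from the paper): estimate one common proxy
# for the expressive capacity of a ReLU network by counting the distinct
# linear regions it induces along a line segment in input space.
# All names, layer sizes, and constants below are our own assumptions.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(layer_sizes):
    """He-style random weights for a fully connected ReLU network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def activation_pattern(params, x):
    """Concatenated on/off pattern of every hidden ReLU unit for input x."""
    pattern, h = [], x
    for W, b in params[:-1]:              # all layers except the linear output
        pre = W @ h + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(pattern)

def regions_on_segment(params, x0, x1, n_points=5_000):
    """Count linear regions crossed on the segment from x0 to x1.
    Every change of the activation pattern marks entry into a new region."""
    prev = activation_pattern(params, x0)
    regions = 1
    for t in np.linspace(0.0, 1.0, n_points)[1:]:
        cur = activation_pattern(params, (1.0 - t) * x0 + t * x1)
        if not np.array_equal(cur, prev):
            regions += 1
            prev = cur
    return regions

# Two architectures with the same total number of hidden units (64):
shallow = init_mlp([2, 64, 1])           # one hidden layer
deep = init_mlp([2, 16, 16, 16, 16, 1])  # four hidden layers
x0, x1 = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
print("shallow, regions on segment:", regions_on_segment(shallow, x0, x1))
print("deep,    regions on segment:", regions_on_segment(deep, x0, x1))
```

Scanning a single segment only lower-bounds the total number of regions, and the count at random initialization need not reflect what a trained network realizes; known worst-case bounds on region counts grow exponentially with depth but only polynomially with width, which is one formal sense in which depth raises expressive capacity.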




Author information

Correspondence to Jian Pei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xia Hu’s and Jian Pei’s research is supported in part by the NSERC Discovery Grant program. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.


About this article


Cite this article

Hu, X., Chu, L., Pei, J. et al. Model complexity of deep learning: a survey. Knowl Inf Syst 63, 2585–2619 (2021). https://doi.org/10.1007/s10115-021-01605-0

