Abstract
Model complexity is a fundamental problem in deep learning. In this paper, we provide a systematic overview of the latest studies on model complexity in deep learning. The model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies in these two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss the applications of deep learning model complexity, including understanding model generalization, model optimization, and model selection and design. We conclude by proposing several interesting future directions.
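As a concrete illustration of the expressive-capacity view (this sketch is ours, not code from the survey): one common measure of a ReLU network's expressive capacity is the number of linear regions into which it partitions its input space. The snippet below lower-bounds that count along a random line through input space by counting distinct ReLU activation patterns; the architecture, random initialization, and sampling density are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random Gaussian weights and biases for a fully connected ReLU network."""
    return [(rng.standard_normal((m, n)) / np.sqrt(n),
             rng.standard_normal(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def activation_pattern(params, x):
    """On/off pattern of every hidden ReLU unit for input x.

    Two inputs share a pattern exactly when they lie in the same
    linear region of the piecewise-linear function the net computes.
    """
    pattern, h = [], x
    for W, b in params[:-1]:          # hidden layers only
        pre = W @ h + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(np.concatenate(pattern))

params = init_mlp([2, 16, 16, 1])     # 2-D input, two hidden layers
direction = rng.standard_normal(2)
ts = np.linspace(-3.0, 3.0, 5000)

# Distinct patterns along the line lower-bound the number of linear
# regions the line crosses.
patterns = {activation_pattern(params, t * direction) for t in ts}
print(f"distinct linear regions crossed along the line: {len(patterns)}")
```

Under this kind of probe, deeper or wider networks typically cross more distinct regions, matching the intuition that model framework and model size drive expressive capacity.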
Additional information
Xia Hu’s and Jian Pei’s research is supported in part by the NSERC Discovery Grant program. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Cite this article
Hu, X., Chu, L., Pei, J. et al. Model complexity of deep learning: a survey. Knowl Inf Syst 63, 2585–2619 (2021). https://doi.org/10.1007/s10115-021-01605-0