
A comprehensive survey on model compression and acceleration

Published in: Artificial Intelligence Review

Abstract

In recent years, machine learning (ML) and deep learning (DL) have shown remarkable improvements in computer vision, natural language processing, stock prediction, forecasting, and audio processing, to name a few. The models trained for these complex tasks are large, which makes them difficult to deploy on resource-constrained devices; for instance, the pre-trained VGG16 model trained on the ImageNet dataset is more than 500 MB. Resource-constrained devices such as mobile phones and Internet of Things (IoT) devices have limited memory and computation power, yet real-time applications require trained models to run on such devices. Popular convolutional neural network models have millions of parameters, which inflates the size of the trained model. Hence, it becomes essential to compress and accelerate these models before deployment on resource-constrained devices, with the least possible compromise in accuracy. Retaining the same accuracy after compression is a challenging task, and over the last few years many researchers have proposed techniques for model compression and acceleration to address it. In this paper, we present a survey of these techniques for compressing and accelerating ML and DL models, discuss the challenges of the existing techniques, and provide future research directions in the field.
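To make the storage figure above concrete, here is a minimal sketch (ours, not from the paper) that derives VGG16's float32 footprint from its parameter count and applies post-training dynamic quantization, one simple compression technique, to its fully connected layers. It assumes PyTorch and torchvision are installed; exact API names vary across versions.

    # Illustrative sketch only: reproduces the ">500 MB" VGG16 figure and
    # shows one basic compression step (8-bit dynamic quantization).
    import torch
    import torchvision.models as models

    # VGG16 has about 138 million parameters; at 4 bytes per float32 weight
    # that is roughly 138e6 * 4 / 1024^2 ~= 528 MB, matching the abstract.
    model = models.vgg16(weights=None)  # older torchvision: pretrained=False
    n_params = sum(p.numel() for p in model.parameters())
    print(f"parameters: {n_params / 1e6:.1f}M, "
          f"fp32 size: {n_params * 4 / 1024**2:.0f} MB")

    # Post-training dynamic quantization stores Linear-layer weights as
    # 8-bit integers, cutting their storage roughly 4x, typically with
    # only a small drop in accuracy.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

Quantizing only the Linear layers is deliberate here: VGG16's three fully connected layers hold roughly 90% of its parameters, so compressing them shrinks most of the model's bytes.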



Author information


Corresponding author

Correspondence to Vipul Mishra.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Choudhary, T., Mishra, V., Goswami, A. et al. A comprehensive survey on model compression and acceleration. Artif Intell Rev 53, 5113–5155 (2020). https://doi.org/10.1007/s10462-020-09816-7

