AMC: AutoML for Model Compression and Acceleration on Mobile Devices

  • Yihui He
  • Ji Lin
  • Zhijian Liu
  • Hanrui Wang
  • Li-Jia Li
  • Song HanEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11211)


Model compression is an effective technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted features and require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverages reinforcement learning to efficiently sample the design space and can improve the model compression quality. We achieved state-of-the-art model compression results in a fully automated way without any human efforts. Under 4\(\times \) FLOPs reduction, we achieved 2.7% better accuracy than the hand-crafted model compression method for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet-V1 and achieved a speedup of 1.53\(\times \) on the GPU (Titan Xp) and 1.95\(\times \) on an Android phone (Google Pixel 1), with negligible loss of accuracy.


AutoML Reinforcement learning Model compression CNN acceleration Mobile vision 



We thank Quoc Le, Yu Wang and Bill Dally for the supportive feedback. We thank Jiacong Chen for drawing the cartoon on the left of Fig. 1.

Supplementary material

474212_1_En_48_MOESM1_ESM.pdf (462 kb)
Supplementary material 1 (pdf 461 KB)


  1. 1.
    Anwar, S., Sung, W.: Compact deep convolutional neural networks with coarse pruning. arXiv preprint arXiv:1610.09639 (2016)
  2. 2.
    Ashok, A., Rhinehart, N., Beainy, F., Kitani, K.M.: N2N learning: network to network compression via policy gradient reinforcement learning. arXiv preprint arXiv:1709.06030 (2017)
  3. 3.
    Bagherinezhad, H., Rastegari, M., Farhadi, A.: LCNN: lookup-based convolutional neural network. arXiv preprint arXiv:1611.06473 (2016)
  4. 4.
    Baker, B., Gupta, O., Naik, N., Raskar, R.: Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167 (2016)
  5. 5.
    Brock, A., Lim, T., Ritchie, J.M., Weston, N.: Smash: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344 (2017)
  6. 6.
    Cai, H., Chen, T., Zhang, W., Yu, Y., Wang, J.: Reinforcement learning for architecture search by network transformation. arXiv preprint arXiv:1707.04873 (2017)
  7. 7.
    Canziani, A., Paszke, A., Culurciello, E.: An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678 (2016)
  8. 8.
    Chen, T., Goodfellow, I., Shlens, J.: Net2Net: accelerating learning via knowledge transfer. arXiv preprint arXiv:1511.05641 (2015)
  9. 9.
    Chollet, F.: Xception: deep learning with depthwise separable convolutions. arXiv preprint arXiv:1610.02357 (2016)
  10. 10.
    Courbariaux, M., Bengio, Y.: BinaryNet: training deep neural networks with weights and activations constrained to +1 or \(-\)1. arXiv preprint arXiv:1602.02830 (2016)
  11. 11.
    Denton, E.L., Zaremba, W., Bruna, J., LeCun, Y., Fergus, R.: Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in Neural Information Processing Systems, pp. 1269–1277 (2014)Google Scholar
  12. 12.
    Dong, X., Huang, J., Yang, Y., Yan, S.: More is less: a more complicated network with less inference complexity. arXiv preprint arXiv:1703.08651 (2017)
  13. 13.
    Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results.
  14. 14.
    Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)Google Scholar
  15. 15.
    Gong, Y., Liu, L., Yang, M., Bourdev, L.: Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014)
  16. 16.
  17. 17.
    Han, S., et al.: ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 75–84. ACM (2017)Google Scholar
  18. 18.
    Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M.A., Dally, W.J.: EIE: efficient inference engine on compressed deep neural network. In: Proceedings of the 43rd International Symposium on Computer Architecture, pp. 243–254. IEEE Press (2016)Google Scholar
  19. 19.
    Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015)
  20. 20.
    Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135–1143 (2015)Google Scholar
  21. 21.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  22. 22.
    He, Y., Zhang, X., Sun, J.: Channel pruning for accelerating very deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1389–1397 (2017)Google Scholar
  23. 23.
    Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  24. 24.
    Hu, H., Peng, R., Tai, Y.W., Tang, C.K.: Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250 (2016)
  25. 25.
    Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
  26. 26.
    Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 (2014)
  27. 27.
    Kim, Y.D., Park, E., Yoo, S., Choi, T., Yang, L., Shin, D.: Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint arXiv:1511.06530 (2015)
  28. 28.
    Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)Google Scholar
  29. 29.
    Lavin, A.: Fast algorithms for convolutional neural networks. arXiv preprint arXiv:1509.09308 (2015)
  30. 30.
    Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., Lempitsky, V.: Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint arXiv:1412.6553 (2014)
  31. 31.
    Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)
  32. 32.
    Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
  33. 33.
    Lin, J., Rao, Y., Lu, J.: Runtime neural pruning. In: Advances in Neural Information Processing Systems, pp. 2178–2188 (2017)Google Scholar
  34. 34.
    Luo, J.H., Wu, J., Lin, W.: ThiNet: a filter level pruning method for deep neural network compression. arXiv preprint arXiv:1707.06342 (2017)
  35. 35.
    Masana, M., van de Weijer, J., Herranz, L., Bagdanov, A.D., Alvarez, J.M.: Domain-adaptive deep network compression. In: The IEEE International Conference on Computer Vision (ICCV), October 2017Google Scholar
  36. 36.
    Mathieu, M., Henaff, M., LeCun, Y.: Fast training of convolutional networks through FFTs. arXiv preprint arXiv:1312.5851 (2013)
  37. 37.
    Miikkulainen, R., et al.: Evolving deep neural networks. arXiv preprint arXiv:1703.00548 (2017)
  38. 38.
    Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient transfer learning. CoRR, abs/1611.06440 (2016)Google Scholar
  39. 39.
    Parashar, A., et al.: SCNN: an accelerator for compressed-sparse convolutional neural networks. In: 44th International Symposium on Computer Architecture (2017)Google Scholar
  40. 40.
    Polyak, A., Wolf, L.: Channel-level acceleration of deep face representations. IEEE Access 3, 2163–2175 (2015)CrossRefGoogle Scholar
  41. 41.
    Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). Scholar
  42. 42.
    Real, E., et al.: Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041 (2017)
  43. 43.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)Google Scholar
  44. 44.
    Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. arXiv preprint arXiv:1801.04381 (2018)
  45. 45.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  46. 46.
    Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)CrossRefGoogle Scholar
  47. 47.
    Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)Google Scholar
  48. 48.
    Vasilache, N., Johnson, J., Mathieu, M., Chintala, S., Piantino, S., LeCun, Y.: Fast convolutional nets with FBFFT: a GPU performance evaluation. arXiv preprint arXiv:1412.7580 (2014)
  49. 49.
    Wang, H., Zhang, Q., Wang, Y., Hu, R.: Structured probabilistic pruning for deep convolutional neural network acceleration. arXiv preprint arXiv:1709.06994 (2017)
  50. 50.
    Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, King’s College, Cambridge (1989)Google Scholar
  51. 51.
    Xue, J., Li, J., Gong, Y.: Restructuring of deep neural network acoustic models with singular value decomposition. In: INTERSPEECH, pp. 2365–2369 (2013)Google Scholar
  52. 52.
    Yang, T.J., et al.: NetAdapt: platform-aware neural network adaptation for mobile applications. arXiv preprint arXiv:1804.03230 (2018)
  53. 53.
    Zhang, X., Zou, J., He, K., Sun, J.: Accelerating very deep convolutional networks for classification and detection. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 1943–1955 (2016)CrossRefGoogle Scholar
  54. 54.
    Zhong, Z., Yan, J., Liu, C.L.: Practical network blocks design with q-learning. arXiv preprint arXiv:1708.05552 (2017)
  55. 55.
    Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016)
  56. 56.
    Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)
  57. 57.
    Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012 (2017)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yihui He
    • 2
  • Ji Lin
    • 1
  • Zhijian Liu
    • 1
  • Hanrui Wang
    • 1
  • Li-Jia Li
    • 3
  • Song Han
    • 1
    Email author
  1. 1.Massachusetts Institute of TechnologyCambridgeUSA
  2. 2.Carnegie Mellon UniversityPittsburghUSA
  3. 3.GoogleMountain ViewUSA

Personalised recommendations