
Automatic Memory-Efficient Scheduling of CNNs

  • Luc Waeijen (email author)
  • Savvas Sioutas
  • Yifan He
  • Maurice Peemen
  • Henk Corporaal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11733)

Abstract

Accessing large external DRAM is costly and poses a challenge to the efficient evaluation of data-intensive convolutional neural networks (CNNs) on embedded devices. These external memory accesses can be minimized by exploiting data reuse in on-chip memory. Selecting the combination of code transformations that minimizes external DRAM accesses is, however, an extremely complex task. In this work, a mathematical model is presented that quickly and precisely evaluates combinations of code transformations on CNNs. An accompanying open-source tool leverages this model to perform automated design space exploration and code generation for CNNs. The correctness of the model is demonstrated by measurements on seven neural networks. Results show that the transformations selected by the tool can reduce external memory accesses by over an order of magnitude.
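
For intuition, the sketch below illustrates the kind of trade-off such a model must score. For a single convolution layer it estimates external-memory traffic under a simple output-tiling scheme and picks the tile size with the lowest traffic that still fits an assumed on-chip buffer. This is a minimal Python illustration, not the paper's model or tool: the layer dimensions, buffer size, and first-order traffic formula are assumptions chosen for the example.

    # Illustrative only: first-order DRAM-traffic estimate for one convolution
    # layer under simple output tiling. Not the model from the paper; all
    # sizes below are assumed values.
    from itertools import product

    H, W = 56, 56              # output feature-map height/width (assumed)
    K = 3                      # kernel size (assumed)
    C_IN, C_OUT = 64, 128      # input/output channels (assumed)
    BUFFER_WORDS = 128 * 1024  # on-chip buffer capacity in elements (assumed)

    def dram_traffic(tile_h, tile_w):
        """Estimate DRAM traffic (in elements) when the output map is processed
        in tile_h x tile_w tiles staged through the on-chip buffer."""
        in_tile = (tile_h + K - 1) * (tile_w + K - 1) * C_IN  # input patch per tile
        out_tile = tile_h * tile_w * C_OUT                    # outputs per tile
        weights = K * K * C_IN * C_OUT                        # full weight set
        if in_tile + out_tile + weights > BUFFER_WORDS:
            return None  # this tile size does not fit on chip
        n_tiles = -(-H // tile_h) * -(-W // tile_w)           # ceiling division
        # Inputs and weights are re-fetched once per tile; outputs written once.
        return n_tiles * (in_tile + weights) + H * W * C_OUT

    # Tiny exhaustive "design space exploration" over candidate tile sizes.
    candidates = list(product([4, 8, 14, 28, 56], repeat=2))
    scored = [(t, dram_traffic(*t)) for t in candidates]
    feasible = [(t, s) for t, s in scored if s is not None]
    best_tile, best_traffic = min(feasible, key=lambda x: x[1])
    print(f"best tile {best_tile}: {best_traffic / 1e6:.2f} M element transfers")

Even this toy model exposes the scheduling sensitivity the paper targets: very small tiles re-fetch the weights many times, while overly large tiles no longer fit on chip, so the lowest-traffic schedule lies in between and has to be searched for.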

Keywords

Memory efficient · Reuse · Scheduling · CNN

Notes

Acknowledgements

This work is supported by NWO project CPS-P3 (12695).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Luc Waeijen (1), email author
  • Savvas Sioutas (1)
  • Yifan He (1)
  • Maurice Peemen (1)
  • Henk Corporaal (1)

  1. Eindhoven University of Technology, Eindhoven, The Netherlands
