Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNs

Cecconi, Leonardo; Smets, Sander; Benini, Luca; Verhelst, Marian

doi:10.1007/978-3-319-70353-4_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10617)

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems

Abstract

Convolutional Neural Networks (CNNs) are nowadays present in many embedded solutions. One of the biggest problems in their execution is the memory bottleneck. In this work we propose an optimal double-buffering tiling strategy to reduce the memory bandwidth required to execute deep CNN architectures, and we test our model on one of the two cores of a Zynq®-7020 embedded platform. An optimal tiling strategy is found for each layer of the network, optimizing for the lowest external memory \(\rightleftharpoons \) On-Chip Memory (OCM) bandwidth. Performance tests show an improvement in total execution time of 50% with the cache disabled and 34% with the cache enabled, compared to a non-double-buffered implementation. Moreover, the external memory \(\rightleftharpoons \) OCM double-buffering bandwidth is 5x lower than with naive tiling settings. Furthermore, it is shown that the tiling settings with the highest OCM usage do not generally yield the lowest bandwidth.
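To make the idea of a per-layer tiling search concrete, the following is a minimal sketch and not the authors' model. It assumes a stride-1 convolution with "same" padding, 32-bit values, and a simplified cost model in which every tile fetches its input patch and weights and writes back its output; double buffering is modelled by requiring twice the tile footprint to fit in on-chip memory. The names `tile_cost`, `best_tiling`, and the layer dictionary keys are hypothetical, and the 256 KB budget in the usage example is only an illustrative value.

```python
from itertools import product
from math import ceil

BYTES_PER_ELEM = 4  # assumption: 32-bit activations and weights


def tile_cost(layer, th, tw, tco):
    """Return (traffic_bytes, footprint_bytes) for one output tile shape.

    Assumed cost model (illustrative only): each tile loads its input
    patch and its weights from external memory and stores its output.
    """
    H, W, C_in, C_out, K = (layer[k] for k in ("H", "W", "C_in", "C_out", "K"))
    in_tile = (th + K - 1) * (tw + K - 1) * C_in   # input patch incl. halo
    w_tile = K * K * C_in * tco                    # weights for tco output maps
    out_tile = th * tw * tco                       # output elements produced
    per_tile = (in_tile + w_tile + out_tile) * BYTES_PER_ELEM
    n_tiles = ceil(H / th) * ceil(W / tw) * ceil(C_out / tco)
    traffic = n_tiles * per_tile
    footprint = 2 * per_tile                       # two buffers per tile
    return traffic, footprint


def best_tiling(layer, ocm_bytes):
    """Exhaustively search tile shapes; return the feasible one with least traffic.

    Returns (traffic, footprint, (th, tw, tco)) or None if nothing fits.
    """
    best = None
    shapes = product(range(1, layer["H"] + 1),
                     range(1, layer["W"] + 1),
                     range(1, layer["C_out"] + 1))
    for th, tw, tco in shapes:
        traffic, footprint = tile_cost(layer, th, tw, tco)
        if footprint > ocm_bytes:
            continue  # double-buffered tile does not fit on chip
        if best is None or traffic < best[0]:
            best = (traffic, footprint, (th, tw, tco))
    return best


if __name__ == "__main__":
    # Hypothetical layer and an illustrative 256 KB on-chip budget.
    layer = {"H": 16, "W": 16, "C_in": 8, "C_out": 16, "K": 3}
    print(best_tiling(layer, 256 * 1024))
```

Because the search returns the footprint alongside the traffic, the chosen tile can be compared against the feasible tile with the largest footprint to check, under this simplified model, whether filling the on-chip memory and minimizing bandwidth coincide for a given layer.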

Acknowledgment

The work of S. Smets was supported by a Doctoral Fellowship of the Research Foundation Flanders (FWO).

Author information

Authors and Affiliations

DEI, University of Bologna, Bologna, Italy
Leonardo Cecconi & Luca Benini
ESAT-MICAS KU Leuven, Leuven, Belgium
Sander Smets & Marian Verhelst

Corresponding author

Correspondence to Sander Smets.

Editor information

Editors and Affiliations

DGA, Paris, France
Jacques Blanc-Talon
University of Antwerp, Antwerp, Belgium
Rudi Penne
Ghent University - imec, Ghent, Belgium
Wilfried Philips
CSIRO Data 61, Canberra, Aust Capital Terr, Australia
Dan Popescu
University of Antwerp, Wilrijk, Belgium
Paul Scheunders

About this paper

Cite this paper

Cecconi, L., Smets, S., Benini, L., Verhelst, M. (2017). Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNs. In: Blanc-Talon, J., Penne, R., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2017. Lecture Notes in Computer Science, vol 10617. Springer, Cham. https://doi.org/10.1007/978-3-319-70353-4_8

DOI: https://doi.org/10.1007/978-3-319-70353-4_8
Published: 23 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70352-7
Online ISBN: 978-3-319-70353-4
eBook Packages: Computer Science, Computer Science (R0)
