Abstract
Convolutional Neural Networks (CNNs) are now present in many embedded solutions. One of the biggest problems in their execution is the memory bottleneck. In this work we propose an optimal double-buffering tiling strategy to reduce the memory bandwidth required to execute deep CNN architectures, and we test our model on one of the two cores of a Zynq®-7020 embedded platform. An optimal tiling strategy is found for each layer of the network, optimizing for the lowest external memory \(\rightleftharpoons \) On-Chip Memory (OCM) bandwidth. Performance tests show a 50% improvement in total execution time with the cache disabled (34% with the cache enabled) compared to a non-double-buffered implementation. Moreover, the double-buffered external memory \(\rightleftharpoons \) OCM bandwidth is 5x lower than with naive tiling settings. Furthermore, it is shown that the tiling settings with the highest OCM usage do not generally lead to the lowest bandwidth.
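The per-layer search described above can be illustrated with a minimal sketch. It is not the authors' cost model: it assumes a stride-1 convolution, assumes that every tile reloads its input patch, weights and output from external memory, and assumes that double buffering simply doubles the on-chip footprint of one tile. The names `tile_traffic` and `best_tiling`, the 4-byte element size and the exhaustive search are all hypothetical choices made for illustration.

```python
from itertools import product
from math import ceil


def tile_traffic(H, W, C_in, C_out, K, th, tw, tn, bytes_per_elem=4):
    """Return (footprint_bytes, traffic_bytes) for one tiling choice.

    H, W   : output feature map height and width (stride 1 assumed)
    C_in   : input channels, C_out: output channels, K: kernel size
    th, tw : output tile height and width, tn: output channels per tile
    """
    in_tile = (th + K - 1) * (tw + K - 1) * C_in   # input patch with halo
    w_tile = K * K * C_in * tn                     # weights for tn filters
    out_tile = th * tw * tn                        # output tile
    per_tile = (in_tile + w_tile + out_tile) * bytes_per_elem
    n_tiles = ceil(H / th) * ceil(W / tw) * ceil(C_out / tn)
    # Assumption: double buffering keeps two tiles resident at once.
    footprint = 2 * per_tile
    # Assumption: nothing is reused between tiles, so all data is re-sent.
    traffic = n_tiles * per_tile
    return footprint, traffic


def best_tiling(H, W, C_in, C_out, K, ocm_bytes):
    """Exhaustively pick the tiling with the lowest traffic that fits in OCM.

    Returns (traffic_bytes, footprint_bytes, (th, tw, tn)) or None.
    """
    best = None
    sizes = product(range(1, H + 1), range(1, W + 1), range(1, C_out + 1))
    for th, tw, tn in sizes:
        footprint, traffic = tile_traffic(H, W, C_in, C_out, K, th, tw, tn)
        if footprint > ocm_bytes:
            continue  # tile pair does not fit in on-chip memory
        if best is None or traffic < best[0]:
            best = (traffic, footprint, (th, tw, tn))
    return best


if __name__ == "__main__":
    # Hypothetical layer; 256 KB matches the OCM size of Zynq-7000 devices.
    print(best_tiling(H=32, W=32, C_in=16, C_out=16, K=3,
                      ocm_bytes=256 * 1024))
```

Comparing the footprint of the returned tiling with `ocm_bytes` shows how much on-chip memory the lowest-traffic setting uses under this simplified model; a model with data reuse between tiles would rank the candidates differently.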
Acknowledgment
The work of S. Smets was supported by a Doctoral Fellowship of the Research Foundation Flanders (FWO).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cecconi, L., Smets, S., Benini, L., Verhelst, M. (2017). Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNs. In: Blanc-Talon, J., Penne, R., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2017. Lecture Notes in Computer Science, vol 10617. Springer, Cham. https://doi.org/10.1007/978-3-319-70353-4_8
DOI: https://doi.org/10.1007/978-3-319-70353-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70352-7
Online ISBN: 978-3-319-70353-4
eBook Packages: Computer Science, Computer Science (R0)