A Modular Software Library for Effective High Level Synthesis of Convolutional Neural Networks

  • Hector Gerardo Munoz Hernandez
  • Safdar Mahmood
  • Marcelo Brandalero
  • Michael Hübner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12083)


Convolutional Neural Networks (CNNs) are used in many valuable domains, such as object detection for autonomous cars and facial recognition for security. This breadth of applications typically imposes strict non-functional requirements on the hardware, such as resource-efficient implementations, while also demanding flexibility. In response, this work presents a C++-based software library of reusable modules for building arbitrary CNNs. The modules support High-Level Synthesis, so the resulting networks can be implemented as FPGA hardware accelerators for inference. Our work demonstrates how parametrizing and modularizing the basic building blocks of a CNN makes it easier to customize the hardware to match the software model. The library also uses low-precision parameters throughout the CNN to provide a more resource-efficient implementation.
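To make the idea of parametrized, modular building blocks more concrete, the following plain C++ sketch shows one way such a block could look. It is not the library's actual API: the names Conv2D, relu and run_demo, the template parameters, and the choice of std::int8_t inputs with a std::int32_t accumulator as a stand-in for low-precision arithmetic are all assumptions made for illustration. A real High-Level Synthesis flow would likely add vendor-specific fixed-point types and pragmas, which are omitted here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical building block: a single-channel 2D convolution (no padding,
// stride 1). All sizes and data types are compile-time template parameters,
// so loop bounds and buffer sizes are statically known.
template <typename DataT, typename AccT, std::size_t H, std::size_t W, std::size_t K>
struct Conv2D {
    static_assert(K <= H && K <= W, "kernel must fit inside the input");
    static constexpr std::size_t OH = H - K + 1;  // output height
    static constexpr std::size_t OW = W - K + 1;  // output width

    std::array<DataT, K * K> weights{};

    void forward(const std::array<DataT, H * W>& in,
                 std::array<AccT, OH * OW>& out) const {
        for (std::size_t r = 0; r < OH; ++r) {
            for (std::size_t c = 0; c < OW; ++c) {
                AccT acc = 0;  // wider accumulator avoids overflow of DataT
                for (std::size_t i = 0; i < K; ++i) {
                    for (std::size_t j = 0; j < K; ++j) {
                        acc += static_cast<AccT>(in[(r + i) * W + (c + j)]) *
                               static_cast<AccT>(weights[i * K + j]);
                    }
                }
                out[r * OW + c] = acc;
            }
        }
    }
};

// Hypothetical activation module, kept separate so layers can be recombined.
template <typename T, std::size_t N>
void relu(std::array<T, N>& data) {
    for (auto& v : data) {
        if (v < 0) v = 0;
    }
}

// Demo: 4x4 int8 input holding -8..7, 3x3 kernel of ones, followed by ReLU.
inline std::array<std::int32_t, 4> run_demo() {
    Conv2D<std::int8_t, std::int32_t, 4, 4, 3> conv;
    conv.weights.fill(1);

    std::array<std::int8_t, 16> in{};
    for (std::size_t i = 0; i < in.size(); ++i) {
        in[i] = static_cast<std::int8_t>(static_cast<int>(i) - 8);
    }

    std::array<std::int32_t, 4> out{};
    conv.forward(in, out);
    relu(out);
    for (auto v : out) assert(v >= 0);  // ReLU postcondition
    return out;
}
```

Calling run_demo() returns the four output values {0, 0, 9, 18}: the two negative window sums are clipped by the ReLU step. Changing the network shape or precision in this sketch only means changing template arguments, which is the kind of customization the abstract attributes to parametrization.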


High Level Synthesis, Modular approach, HW acceleration, Convolutional Neural Networks, Inference acceleration, Library of components, Machine learning, C library, FPGA



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Brandenburg University of Technology Cottbus - Senftenberg, Computer Engineering Group, Cottbus, Germany
