
SqueezeJet: High-Level Synthesis Accelerator Design for Deep Convolutional Neural Networks

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2018)

Abstract

Deep convolutional neural networks (DCNNs) have come to dominate pattern recognition by providing much more accurate solutions to computer vision problems such as object recognition and object detection. Most of these solutions come at a huge computational cost, requiring billions of multiply-accumulate (MAC) operations, which makes their use challenging in real-time applications running on embedded mobile hardware with tight resource and power constraints. This work presents the architecture, the high-level synthesis design, and the implementation of SqueezeJet, an FPGA accelerator for the inference phase of the SqueezeNet DCNN architecture, which is designed specifically for use in embedded systems. Results show that SqueezeJet achieves a 15.16× speed-up over the software implementation of SqueezeNet running on an embedded mobile processor, with less than a 1% drop in top-5 accuracy.
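The claim that DCNNs require billions of multiply-accumulate operations follows from how the cost of a standard convolution layer scales with its dimensions. The Python sketch below counts MACs for a single layer. The function names and the example layer dimensions are hypothetical, chosen only for illustration and not taken from SqueezeNet or SqueezeJet; bias additions are ignored.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output length along one spatial dimension of a 2D convolution."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_macs(h_in: int, w_in: int, c_in: int, c_out: int,
              kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Multiply-accumulate (MAC) count of one standard 2D convolution layer.

    Each of the h_out * w_out * c_out output values is a dot product of a
    kernel * kernel * c_in filter with an input window, i.e. that many MACs.
    Bias additions are not counted.
    """
    h_out = conv_output_size(h_in, kernel, stride, pad)
    w_out = conv_output_size(w_in, kernel, stride, pad)
    return h_out * w_out * c_out * (kernel * kernel * c_in)


if __name__ == "__main__":
    # Hypothetical layer: 56 x 56 x 64 input, 64 filters, "same" padding for 3x3.
    print(f"3x3: {conv_macs(56, 56, 64, 64, kernel=3, pad=1):,} MACs")  # 3x3: 115,605,504 MACs
    print(f"1x1: {conv_macs(56, 56, 64, 64, kernel=1):,} MACs")         # 1x1: 12,845,056 MACs
```

With these assumed dimensions, a single 3 × 3 layer already needs about 116 million MACs, while a 1 × 1 layer with the same channel counts needs nine times fewer; stacking tens of layers of this size quickly reaches the billions of operations mentioned above.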


Notes

  1. In this work, SqueezeNet refers to SqueezeNet v1.1.

  2. A pixel comprises all the channels at a specific \((x, y)\) location in the feature map volume (see Fig. 1).

  3. In this work, kernel has the same meaning as filter.

  4. In the case of the ARM Cortex-A53, we measure the power consumption of the whole Raspberry Pi 3 (RPi3) board, because there is no way to acquire power consumption measurements or estimations for the Broadcom 2837 SoC alone.

  5. https://www.intel.com/software/pcm.

  6. https://www.xilinx.com/products/technology/power/xpe.html.

  7. https://github.com/pmgysel/caffe.
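Notes 2 and 3 fix two terms used throughout the paper: a pixel is every channel at one \((x, y)\) location of a feature map volume, and "kernel" is synonymous with "filter". The Python sketch below illustrates both terms by computing one convolution output value as a sum of per-pixel dot products. It is not SqueezeJet's code: the channel-last (HWC) flat layout, the `FeatureMap` class, and `conv_output_value` are hypothetical names and assumptions chosen for illustration.

```python
from typing import List


class FeatureMap:
    """Feature map volume stored flat in channel-last (HWC) order (an assumed layout).

    Element (y, x, c) lives at index (y * width + x) * channels + c, so all
    channels of one (x, y) location, i.e. one pixel, are contiguous.
    """

    def __init__(self, height: int, width: int, channels: int, data: List[float]):
        if len(data) != height * width * channels:
            raise ValueError("data length must equal height * width * channels")
        self.height, self.width, self.channels = height, width, channels
        self.data = data

    def pixel(self, x: int, y: int) -> List[float]:
        """Return the pixel at (x, y): its values across all channels."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("(x, y) outside the feature map")
        start = (y * self.width + x) * self.channels
        return self.data[start:start + self.channels]


def conv_output_value(fmap: FeatureMap, kernel: List[List[List[float]]],
                      x: int, y: int) -> float:
    """One output value of a stride-1, unpadded convolution at output (x, y).

    The kernel (equivalently, filter) has shape K x K x C: kernel[ky][kx]
    holds the C weights applied to input pixel (x + kx, y + ky).
    """
    acc = 0.0
    for ky, row in enumerate(kernel):
        for kx, weights in enumerate(row):
            # One MAC per channel of the input pixel.
            acc += sum(w * v for w, v in zip(weights, fmap.pixel(x + kx, y + ky)))
    return acc


if __name__ == "__main__":
    fmap = FeatureMap(3, 3, 2, [float(i) for i in range(18)])
    print(fmap.pixel(1, 0))  # [2.0, 3.0]
    ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]  # 2 x 2 x 2 kernel
    print(conv_output_value(fmap, ones, 0, 0))  # 36.0
```

Storing channels innermost keeps each pixel contiguous in memory, so a kernel window can be consumed one pixel at a time; this is one common layout choice, assumed here rather than taken from the paper.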



Author information

Corresponding author: Panagiotis G. Mousouliotis.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Mousouliotis, P.G., Petrou, L.P. (2018). SqueezeJet: High-Level Synthesis Accelerator Design for Deep Convolutional Neural Networks. In: Voros, N., Huebner, M., Keramidas, G., Goehringer, D., Antonopoulos, C., Diniz, P. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2018. Lecture Notes in Computer Science, vol. 10824. Springer, Cham. https://doi.org/10.1007/978-3-319-78890-6_5


  • DOI: https://doi.org/10.1007/978-3-319-78890-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78889-0

  • Online ISBN: 978-3-319-78890-6

  • eBook Packages: Computer Science, Computer Science (R0)
