SqueezeJet: High-Level Synthesis Accelerator Design for Deep Convolutional Neural Networks

Mousouliotis, Panagiotis G.; Petrou, Loukas P.

doi:10.1007/978-3-319-78890-6_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10824)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

Deep convolutional neural networks (DCNNs) have dominated the pattern recognition scene by providing much more accurate solutions to computer vision problems such as object recognition and object detection. Most of these solutions come at a huge computational cost, requiring billions of multiply-accumulate operations, which makes their use challenging in real-time applications running on embedded mobile (resource- and power-constrained) hardware. This work presents the architecture, the high-level synthesis design, and the implementation of SqueezeJet, an FPGA accelerator for the inference phase of the SqueezeNet DCNN architecture, which is designed specifically for use in embedded systems. Results show that SqueezeJet can achieve a 15.16 times speed-up compared to the software implementation of SqueezeNet running on an embedded mobile processor, with less than 1% drop in top-5 accuracy.
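The computational cost mentioned above can be made concrete by counting the multiply-accumulate (MAC) operations of a single convolutional layer. The sketch below is a generic illustration, not code from SqueezeJet: it assumes a dense (non-grouped) convolution with square kernels and symmetric zero padding, and the function names `conv_output_size` and `conv_macs` are hypothetical.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_macs(h_in: int, w_in: int, c_in: int, c_out: int,
              kernel: int, stride: int = 1, pad: int = 0) -> int:
    """MAC count of one dense convolutional layer (assumed square kernel).

    Each of the h_out * w_out * c_out output values is a dot product of
    length kernel * kernel * c_in, i.e. that many multiply-accumulates.
    """
    h_out = conv_output_size(h_in, kernel, stride, pad)
    w_out = conv_output_size(w_in, kernel, stride, pad)
    return h_out * w_out * c_out * kernel * kernel * c_in


if __name__ == "__main__":
    # Illustrative layer: 227 x 227 x 3 input, 64 filters of 3 x 3, stride 2.
    print(conv_output_size(227, 3, stride=2))      # 113
    print(conv_macs(227, 227, 3, 64, 3, stride=2))  # 22064832
```

For this illustrative layer the output is 113 × 113 × 64, so the count is 113 · 113 · 64 · 3 · 3 · 3 = 22,064,832 MACs. Summing such per-layer counts over all layers of a deep network gives the totals the abstract refers to.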

Notes

1. In this work, SqueezeNet refers to SqueezeNet v1.1.
2. A pixel comprises all the channels at a specific \((x, \, y)\) location in the feature map volume (see Fig. 1).
3. In this work, kernel has the same meaning as filter.
4. For the ARM Cortex-A53, we measure the power consumption of the whole RPI3 (Raspberry Pi 3) board, because there is no way to acquire power consumption measurements or estimations for the Broadcom 2837 SoC alone.
5. https://www.intel.com/software/pcm.
6. https://www.xilinx.com/products/technology/power/xpe.html.
7. https://github.com/pmgysel/caffe.
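Note 2 defines a pixel as all channels at one (x, y) location of the feature map volume. Where those values sit in memory depends on the data layout, which this excerpt does not specify. The sketch below assumes a flat, row-major array and contrasts two common layouts: channels-last (HWC), where a pixel is one contiguous run of values, and channels-first (CHW), where consecutive channels of the same pixel are H·W elements apart. The names `pixel_hwc` and `pixel_chw` are hypothetical.

```python
from typing import List, Sequence


def pixel_hwc(fmap: Sequence[float], width: int, channels: int,
              x: int, y: int) -> List[float]:
    """Pixel at (x, y) in a channels-last (HWC), row-major flat array.

    All `channels` values are contiguous, starting at (y * width + x) * channels.
    """
    start = (y * width + x) * channels
    return list(fmap[start:start + channels])


def pixel_chw(fmap: Sequence[float], height: int, width: int, channels: int,
              x: int, y: int) -> List[float]:
    """Pixel at (x, y) in a channels-first (CHW), row-major flat array.

    Each channel is a separate height * width plane, so the values of one
    pixel are strided by height * width.
    """
    plane = height * width
    return [fmap[c * plane + y * width + x] for c in range(channels)]


if __name__ == "__main__":
    fmap = list(range(12))  # a 2 x 2 map with 3 channels
    print(pixel_hwc(fmap, width=2, channels=3, x=1, y=0))            # [3, 4, 5]
    print(pixel_chw(fmap, height=2, width=2, channels=3, x=1, y=0))  # [1, 5, 9]
```

With HWC, fetching one pixel is a single contiguous read, which is one reason channels-last layouts are often convenient for hardware that consumes whole pixels at a time; whether SqueezeJet uses this layout is not stated here.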

Author information

Authors and Affiliations

Division of Electronics and Computer Engineering, Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
Panagiotis G. Mousouliotis & Loukas P. Petrou

Corresponding author

Correspondence to Panagiotis G. Mousouliotis.

Editor information

Editors and Affiliations

Technological Educational Institute of Western Greece, Antirrio, Greece
Nikolaos Voros
Ruhr-Universität Bochum, Bochum, Germany
Michael Huebner
Technological Educational Institute of Western Greece, Antirrio, Greece
Georgios Keramidas
Technische Universität Dresden, Dresden, Germany
Diana Goehringer
Technological Educational Institute of Western Greece, Antirrio, Greece
Christos Antonopoulos
INESC-ID, Lisbon, Portugal
Pedro C. Diniz

About this paper

Cite this paper

Mousouliotis, P.G., Petrou, L.P. (2018). SqueezeJet: High-Level Synthesis Accelerator Design for Deep Convolutional Neural Networks. In: Voros, N., Huebner, M., Keramidas, G., Goehringer, D., Antonopoulos, C., Diniz, P. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2018. Lecture Notes in Computer Science, vol 10824. Springer, Cham. https://doi.org/10.1007/978-3-319-78890-6_5

DOI: https://doi.org/10.1007/978-3-319-78890-6_5
Published: 08 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78889-0
Online ISBN: 978-3-319-78890-6
eBook Packages: Computer Science, Computer Science (R0)
