Accelerating Inference on Binary Neural Networks with Digital RRAM Processing

  • Conference paper
Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 586)

Abstract

The need for efficient Convolutional Neural Networks (CNNs) targeting embedded systems has led to the popularization of Binary Neural Networks (BNNs), which significantly reduce execution time and memory requirements by representing operands with a single bit. Moreover, since roughly 90% of the operations executed by CNNs and BNNs are convolutions, a quest has begun for custom accelerators that optimize the convolution operation and reduce data movement, among which Resistive Random Access Memory (RRAM)-based accelerators have proven to be of particular interest. This work presents a custom Binary Dot Product Engine (BDPE) for BNNs that exploits the low-level compute capabilities enabled by RRAMs. The new engine accelerates the inference phase of BNNs by locally storing the most frequently used kernels and performing the binary convolutions using RRAM devices and optimized custom circuitry. Results show that the novel BDPE improves performance by 11.3% and energy efficiency by 7.4%, and reduces the number of memory accesses by 10.7%, at a cost of less than 0.3% additional die area.
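With one-bit operands, the core of a BNN convolution reduces to a bitwise dot product that can be computed with XNOR and population count. The sketch below illustrates that arithmetic in software; it is not the paper's BDPE circuit, and the function name and bit-packing convention are illustrative assumptions.

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bitmasks,
    where bit i = 1 encodes element i = +1 and bit i = 0 encodes -1.

    XNOR marks the positions where the operands agree, so the signed dot
    product is (#agreements) - (#disagreements) = 2 * popcount(xnor) - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # agreement mask, n bits wide
    return 2 * bin(xnor).count("1") - n

# Example: a = [+1, -1, +1, +1] -> 0b1101, b = [+1, +1, -1, +1] -> 0b1011
# Elementwise products are +1, -1, -1, +1, so the dot product is 0.
print(binary_dot(0b1101, 0b1011, 4))  # prints 0
```

This XNOR-popcount formulation is why one-bit representations cut both storage and compute: a hardware engine can evaluate many such products in parallel with simple logic instead of multipliers.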

J. Vieira—At the time of this work, João Vieira was affiliated with the University of Utah.



Acknowledgements

This work was primarily supported by grant 2016016 from the United States-Israel Binational Science Foundation.

Other supporting grants are SFRH/BD/144047/2019 from Fundação para a Ciência e a Tecnologia (FCT), Portugal; ERC Consolidator Grant COMPUSAPIEN (GA No. 725657); ERC starting grant Real-PIM-System (GA No. 757259); and EC H2020 EUROLAB4HPC2 project (GA No. 800962).

Author information

Correspondence to João Vieira.

Copyright information

© 2020 IFIP International Federation for Information Processing

About this paper

Cite this paper

Vieira, J. et al. (2020). Accelerating Inference on Binary Neural Networks with Digital RRAM Processing. In: Metzler, C., Gaillardon, PE., De Micheli, G., Silva-Cardenas, C., Reis, R. (eds) VLSI-SoC: New Technology Enabler. VLSI-SoC 2019. IFIP Advances in Information and Communication Technology, vol 586. Springer, Cham. https://doi.org/10.1007/978-3-030-53273-4_12

  • DOI: https://doi.org/10.1007/978-3-030-53273-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-53272-7

  • Online ISBN: 978-3-030-53273-4
