Abstract
The need for efficient Convolutional Neural Networks (CNNs) targeting embedded systems has led to the popularization of Binary Neural Networks (BNNs), which significantly reduce execution time and memory requirements by representing operands with a single bit. Moreover, since 90% of the operations executed by CNNs and BNNs are convolutions, a quest has started for custom accelerators that optimize the convolution operation and reduce data movement, in which Resistive Random Access Memory (RRAM)-based accelerators have proven to be of interest. This work presents a custom Binary Dot Product Engine (BDPE) for BNNs that exploits the low-level compute capabilities enabled by RRAMs. This new engine accelerates the inference phase of BNNs by locally storing the most used kernels and performing the binary convolutions using RRAM devices and optimized custom circuitry. Results show that the novel BDPE improves performance by 11.3% and energy efficiency by 7.4%, and reduces the number of memory accesses by 10.7%, at a cost of less than 0.3% additional die area.
J. Vieira—At the time of this work, João Vieira was affiliated with the University of Utah.
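For context, the binary dot product that such an engine accelerates reduces to a bitwise XNOR followed by a population count. The sketch below is a minimal software illustration of that arithmetic, not the authors' BDPE or its RRAM circuitry; the 64-bit packing, the {-1, +1} encoding convention, and the function name are illustrative assumptions, and the popcount intrinsic is GCC/Clang-specific.

#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch only: a binarized dot product between two
 * 64-element {-1, +1} vectors, each packed into one 64-bit word
 * (bit = 1 encodes +1, bit = 0 encodes -1). */
static int binary_dot_product_64(uint64_t activations, uint64_t kernel)
{
    uint64_t xnor = ~(activations ^ kernel);   /* 1 wherever the signs agree   */
    int matches = __builtin_popcountll(xnor);  /* GCC/Clang popcount intrinsic */
    return 2 * matches - 64;                   /* matches minus mismatches     */
}

int main(void)
{
    /* Identical vectors give the maximum dot product of +64. */
    printf("%d\n", binary_dot_product_64(0xF0F0F0F0F0F0F0F0ULL,
                                         0xF0F0F0F0F0F0F0F0ULL));
    return 0;
}

In the BDPE described above, this XNOR-and-popcount step is carried out directly on kernels stored in RRAM devices rather than in software, which is what removes the corresponding memory accesses from the inference loop.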
Acknowledgements
This work was primarily supported by grant 2016016 from the United States-Israel Binational Science Foundation. Additional support came from grant SFRH/BD/144047/2019 from Fundação para a Ciência e a Tecnologia (FCT), Portugal; ERC Consolidator Grant COMPUSAPIEN (GA No. 725657); ERC Starting Grant Real-PIM-System (GA No. 757259); and the EC H2020 EUROLAB4HPC2 project (GA No. 800962).
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Vieira, J. et al. (2020). Accelerating Inference on Binary Neural Networks with Digital RRAM Processing. In: Metzler, C., Gaillardon, PE., De Micheli, G., Silva-Cardenas, C., Reis, R. (eds) VLSI-SoC: New Technology Enabler. VLSI-SoC 2019. IFIP Advances in Information and Communication Technology, vol 586. Springer, Cham. https://doi.org/10.1007/978-3-030-53273-4_12
DOI: https://doi.org/10.1007/978-3-030-53273-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53272-7
Online ISBN: 978-3-030-53273-4
eBook Packages: Computer Science, Computer Science (R0)