Abstract
The emerging memristor provides not only non-volatile memory storage but also intrinsic computing for matrix-vector multiplication, which makes it well suited to low-power, high-throughput in-memory accelerators for data analytics. However, existing memristor-crossbar computing is mostly assumed to be multi-level analog computing, whose result is sensitive to process non-uniformity and which incurs additional overhead from analog-to-digital (AD) conversion and I/O. In this chapter, we explore a matrix-vector multiplication accelerator on a binary memristor-crossbar with adaptive 1-bit-comparator based parallel conversion. A distributed in-memory computing architecture is also developed, together with the corresponding control protocol. Both the memory array and the logic accelerator are implemented on the binary memristor-crossbar, and the logic-memory pairs are distributed and coordinated through a control-bus protocol. Experimental results show that, compared to the analog memristor-crossbar, the proposed binary memristor-crossbar achieves significant area saving with better calculation accuracy. Moreover, a significant speedup is achieved for matrix-vector multiplication in neural-network based machine learning, so that both the overall training time and the testing time are reduced. In addition, a large energy saving is achieved compared to the traditional CMOS-based out-of-memory computing architecture.
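To make the idea of binary matrix-vector multiplication with parallel 1-bit comparators concrete, here is a minimal software sketch. It is not the chapter's circuit or algorithm: the function names (`column_current`, `comparator_bank`, `binary_matvec`), the idealized integer column read-out, and the fixed thresholds 1..n are all assumptions made for illustration, and the adaptive thresholding mentioned in the abstract is not modeled.

```python
from typing import List, Sequence


def column_current(x_bits: Sequence[int], g_bits: Sequence[int]) -> int:
    """Idealized column read-out (assumption): count the cells where the
    input bit and the stored binary conductance bit are both 1."""
    return sum(x & g for x, g in zip(x_bits, g_bits))


def comparator_bank(level: int, n_levels: int) -> List[int]:
    """Hypothetical bank of parallel 1-bit comparators with fixed
    thresholds 1..n_levels; the outputs form a thermometer code."""
    return [1 if level >= t else 0 for t in range(1, n_levels + 1)]


def binary_matvec(x_bits: Sequence[int],
                  columns: Sequence[Sequence[int]]) -> List[int]:
    """Multiply a binary input vector by a binary matrix stored column-wise.
    Each column sum is converted by the comparator bank, then the
    thermometer code is decoded by counting its ones."""
    out = []
    for g_bits in columns:
        thermo = comparator_bank(column_current(x_bits, g_bits), len(x_bits))
        out.append(sum(thermo))
    return out


x = [1, 0, 1, 1]
columns = [[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]]
print(binary_matvec(x, columns))  # prints [2, 1, 3]
```

In this sketch every comparator produces a single bit, so no multi-level analog value has to be resolved; that is the property the abstract contrasts with multi-level analog crossbars.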
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Yu, H., Ni, L., Huang, H. (2017). Distributed In-Memory Computing on Binary Memristor-Crossbar for Machine Learning. In: Vaidyanathan, S., Volos, C. (eds) Advances in Memristors, Memristive Devices and Systems. Studies in Computational Intelligence, vol 701. Springer, Cham. https://doi.org/10.1007/978-3-319-51724-7_12
DOI: https://doi.org/10.1007/978-3-319-51724-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51723-0
Online ISBN: 978-3-319-51724-7
eBook Packages: Engineering, Engineering (R0)