Abstract
This chapter presents least-squares-based learning for a single-hidden-layer neural network. A square-root-free Cholesky decomposition technique is applied to reduce the training complexity, and the optimized learning algorithm is then mapped onto both CMOS- and RRAM-based hardware. A detailed analysis of the two hardware implementations is given, showing significant speed-up and energy-efficiency improvements over CPU- and GPU-based implementations (figures and illustrations may be reproduced from [11, 12]).
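As a rough, software-only sketch of the idea summarized above (not the chapter's hardware-mapped implementation; all function names, the sigmoid activation, and parameter values below are illustrative assumptions), the following Python snippet trains a single-hidden-layer network in closed form: the hidden weights are random and fixed, and the output weights are obtained from the regularized normal equations factored by a square-root-free (LDL^T) Cholesky decomposition.

```python
import numpy as np

def ldl_decompose(A):
    """Square-root-free Cholesky: factor the SPD matrix A as L diag(D) L^T
    with unit-diagonal lower-triangular L; no square roots are required."""
    n = A.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    for j in range(n):
        D[j] = A[j, j] - (L[j, :j] ** 2) @ D[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ D[:j]) / D[j]
    return L, D

def train_slfn(X, T, n_hidden=64, reg=1e-3, seed=0):
    """Least-squares training of a single-hidden-layer network:
    random fixed hidden weights, output weights solved in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                  # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))             # sigmoid hidden activations

    A = H.T @ H + reg * np.eye(n_hidden)               # regularized normal equations
    L, D = ldl_decompose(A)

    # Solve (L D L^T) beta = H^T T: forward solve, diagonal scaling, back solve.
    z = np.linalg.solve(L, H.T @ T)
    beta = np.linalg.solve(L.T, z / D[:, None])
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

if __name__ == "__main__":
    # Toy regression check: fit y = sin(x) on random samples.
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(200, 1))
    T = np.sin(X)
    W, b, beta = train_slfn(X, T, n_hidden=40)
    print("training MSE:", np.mean((predict(X, W, b, beta) - T) ** 2))
```

Avoiding the square roots of a standard Cholesky factorization is the property that, per the abstract, eases the hardware mapping discussed later in the chapter.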
Notes
1. PCIe is short for Peripheral Component Interconnect Express.
References
(2016) ADM-PCIE-7V3. http://www.alpha-data.com/dcp/products.php?product=adm-pcie-7v3. Accessed 13 June 2016
(2016) BeagleBoard-xM. http://beagleboard.org/beagleboard-xm
Akinaga H, Shima H (2010) Resistive random access memory (ReRAM) based on metal oxides. Proc IEEE 98(12):2237–2251. https://doi.org/10.1109/JPROC.2010.2070830
Aljarah I, Faris H, Mirjalili S (2018) Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput 22(1):1–15
Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition. EDA Consortium, pp 854–859
Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications. Oslo, Norway
Decherchi S, Gastaldo P, Leoncini A, Zunino R (2012) Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II: Express Briefs 59(8):496–500
Franzon P, Rotenberg E, Tuck J, Davis WR, Zhou H, Schabel J, Zhang Z, Dwiel JB, Forbes E, Huh J et al (2015) Computing in 3D. In: Custom integrated circuits conference (CICC), 2015 IEEE. IEEE, California, pp 1–6
Hecht-Nielsen R (1989) Theory of the backpropagation neural network. In: International joint conference on neural networks, Washington, DC, pp 593–605
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
Huang H, Yu H (2017) Least-squares-solver based machine learning accelerator for real-time data analytics in smart buildings. In: Emerging technology and architecture for big-data analytics, Springer, pp 51–76. https://doi.org/10.1007/978-3-319-54840-1_3
Huang H, Ni L, Wang Y, Yu H, Wang Z, Cai Y, Huang R (2016) A 3D multi-layer CMOS-RRAM accelerator for neural network. In: 2016 IEEE International 3D Systems Integration Conference (3DIC), IEEE, pp 1–5. https://doi.org/10.1109/3DIC.2016.7970014
Igelnik B, Zurada JM (2013) Efficiency and scalability methods for computational intellect, 1st edn. IGI Global
Khan GM (2018) Evolutionary computation. In: Evolution of artificial neural development, Springer, pp 29–37
Kim DH, Athikulwongse K, Lim SK (2013) Study of through-silicon-via impact on the 3-D stacked IC layout. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(5):862–874
Kim KH, Gaba S, Wheeler D, Cruz-Albrecht JM, Hussain T, Srinivasa N, Lu W (2011) A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett. 12(1):389–395
Krishnamoorthy A, Menon D (2011) Matrix inversion using Cholesky decomposition. arXiv:1111.4144
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images
Lee H et al (2008) Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In: IEEE International Electron Devices Meeting (IEDM)
Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014) Scaling distributed machine learning with the parameter server. In: USENIX Symposium on Operating Systems Design and Implementation, vol 14. Broomfield, Colorado, pp 583–598
Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE International Solid-State Circuits Conference, San Francisco, California
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Martino MD, Fanelli S, Protasi M (1993) A new improved online algorithm for multi-decisional problems based on MLP-networks using a limited amount of information. In: International joint conference on neural networks. Nagoya, Japan, pp 617–620
Ni L et al (2016) An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar. In: Asia and South Pacific design automation conference. Macao, China, pp 280–285
Pao YH, Park GH, Sobajic DJ (1994) Backpropagation, Part IV: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1
Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S et al (2016) Going deeper with embedded FPGA platform for convolutional neural network. In: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ACM, Monterey, California, pp 26–35
Ren F, Marković D (2016) A configurable 12–237 kS/s 12.8 mW sparse-approximation engine for mobile data aggregation of compressively sampled physiological signals. IEEE J Solid-State Circuits 51(1):68–78
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484
Suykens JA, Van Gestel T, De Brabanter J (2002) Least squares support vector machines. World Scientific
Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer
Trefethen LN, Bau III D (1997) Numerical linear algebra, vol 50. SIAM
Wang Q, Li P, Kim Y (2015a) A parallel digital VLSI architecture for integrated support vector machine training and classification. IEEE Trans. Very Large Scale Integr. Syst. 23(8):1471–1484
Wang Y, Yu H, Ni L, Huang GB, Yan M, Weng C, Yang W, Zhao J (2015b) An energy-efficient nonvolatile in-memory computing architecture for extreme learning machine by domain-wall nanowire devices. IEEE Trans. Nanotechnol. 14(6):998–1012
Xia L, Gu P, Li B, Tang T, Yin X, Huangfu W, Yu S, Cao Y, Wang Y, Yang H (2016) Technological exploration of RRAM crossbar array for matrix-vector multiplication. J. Comput. Sci. Technol. 31(1):3–19
Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J (2015) Optimizing FPGA-based accelerator design for deep convolutional neural networks. In: International symposium on field-programmable gate arrays. Monterey, California, pp 161–170
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this chapter
Huang, H., Yu, H. (2019). Least-Squares-Solver for Shallow Neural Network. In: Compact and Fast Machine Learning Accelerator for IoT Devices. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-13-3323-1_3
DOI: https://doi.org/10.1007/978-981-13-3323-1_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3322-4
Online ISBN: 978-981-13-3323-1
eBook Packages: Intelligent Technologies and Robotics