Abstract
This chapter presents least-squares-based learning for a single-hidden-layer neural network. A square-root-free Cholesky decomposition technique is applied to reduce the training complexity, and the optimized learning algorithm is then mapped onto both CMOS- and RRAM-based hardware. A detailed analysis of the two hardware implementations is given, showing significant speed-up and energy-efficiency improvements over CPU- and GPU-based implementations (figures and illustrations may be reproduced from [11, 12]).
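As a rough, software-only sketch of the idea summarized above (not the chapter's hardware-mapped implementation; all function names, the sigmoid activation, and parameter values below are illustrative assumptions), the following Python snippet trains a single-hidden-layer network in closed form: the hidden weights are random and fixed, and the output weights are obtained from the regularized normal equations factored by a square-root-free (LDL^T) Cholesky decomposition.

```python
import numpy as np

def ldl_decompose(A):
    """Square-root-free Cholesky: factor the SPD matrix A as L diag(D) L^T
    with unit-diagonal lower-triangular L; no square roots are required."""
    n = A.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    for j in range(n):
        D[j] = A[j, j] - (L[j, :j] ** 2) @ D[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ D[:j]) / D[j]
    return L, D

def train_slfn(X, T, n_hidden=64, reg=1e-3, seed=0):
    """Least-squares training of a single-hidden-layer network:
    random fixed hidden weights, output weights solved in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                  # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))             # sigmoid hidden activations

    A = H.T @ H + reg * np.eye(n_hidden)               # regularized normal equations
    L, D = ldl_decompose(A)

    # Solve (L D L^T) beta = H^T T: forward solve, diagonal scaling, back solve.
    z = np.linalg.solve(L, H.T @ T)
    beta = np.linalg.solve(L.T, z / D[:, None])
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

if __name__ == "__main__":
    # Toy regression check: fit y = sin(x) on random samples.
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(200, 1))
    T = np.sin(X)
    W, b, beta = train_slfn(X, T, n_hidden=40)
    print("training MSE:", np.mean((predict(X, W, b, beta) - T) ** 2))
```

Avoiding the square roots of a standard Cholesky factorization is the property that, per the abstract, eases the hardware mapping discussed later in the chapter.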
Notes
1. PCIe is short for Peripheral Component Interconnect Express.
References
(2016) ADM-PCIE-7V3. http://www.alpha-data.com/dcp/products.php?product=adm-pcie-7v3. Accessed 13 June 2016
(2016) BeagleBoard-xM. http://beagleboard.org/beagleboard-xm
Akinaga H, Shima H (2010) Resistive random access memory (ReRAM) based on metal oxides. Proc IEEE 98(12):2237–2251. https://doi.org/10.1109/JPROC.2010.2070830
Aljarah I, Faris H, Mirjalili S (2018) Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput 22(1):1–15
Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition. EDA Consortium, pp 854–859
Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications. Oslo, Norway
Decherchi S, Gastaldo P, Leoncini A, Zunino R (2012) Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II: Express Briefs 59(8):496–500
Franzon P, Rotenberg E, Tuck J, Davis WR, Zhou H, Schabel J, Zhang Z, Dwiel JB, Forbes E, Huh J et al (2015) Computing in 3D. In: Custom integrated circuits conference (CICC), 2015 IEEE. IEEE, California, pp 1–6
Hecht-Nielsen R (1989) Theory of the backpropagation neural network. In: International joint conference on neural networks, Washington, DC, pp 593–605
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
Huang H, Yu H (2017) Least-squares-solver based machine learning accelerator for real-time data analytics in smart buildings. In: Emerging technology and architecture for big-data analytics, Springer, pp 51–76. https://doi.org/10.1007/978-3-319-54840-1_3
Huang H, Ni L, Wang Y, Yu H, Wang Z, Cai Y, Huang R (2016) A 3D multi-layer CMOS-RRAM accelerator for neural network. In: 2016 IEEE International 3D Systems Integration Conference (3DIC), IEEE, pp 1–5. https://doi.org/10.1109/3DIC.2016.7970014
Igelnik B, Zurada JM (2013) Efficiency and scalability methods for computational intellect, 1st edn. IGI Global
Khan GM (2018) Evolutionary computation. In: Evolution of artificial neural development, Springer, pp 29–37
Kim DH, Athikulwongse K, Lim SK (2013) Study of through-silicon-via impact on the 3-D stacked IC layout. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(5):862–874
Kim KH, Gaba S, Wheeler D, Cruz-Albrecht JM, Hussain T, Srinivasa N, Lu W (2011) A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett. 12(1):389–395
Krishnamoorthy A, Menon D (2011) Matrix inversion using Cholesky decomposition. arXiv:1111.4144
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images
Lee H et al (2008) Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In: IEEE International Electron Devices Meeting (IEDM)
Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014) Scaling distributed machine learning with the parameter server. In: USENIX Symposium on Operating Systems Design and Implementation, vol 14. Broomfield, Colorado, pp 583–598
Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE International Solid-State Circuits Conference, San Francisco, California
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Martino MD, Fanelli S, Protasi M (1993) A new improved online algorithm for multi-decisional problems based on MLP-networks using a limited amount of information. In: International joint conference on neural networks. Nagoya, Japan, pp 617–620
Ni L et al (2016) An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar. In: Asia and South Pacific design automation conference. Macao, China, pp 280–285
Pao YH, Park GH, Sobajic DJ (1994) Backpropagation, Part IV: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1
Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S et al (2016) Going deeper with embedded FPGA platform for convolutional neural network. In: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ACM, Monterey, California, pp 26–35
Ren F, Marković D (2016) A configurable 12–237 kS/s 12.8 mW sparse-approximation engine for mobile data aggregation of compressively sampled physiological signals. IEEE J Solid-State Circuits 51(1):68–78
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484
Suykens JA, Van Gestel T, De Brabanter J (2002) Least squares support vector machines. World Scientific
Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer
Trefethen LN, Bau III D (1997) Numerical linear algebra, vol 50. SIAM
Wang Q, Li P, Kim Y (2015a) A parallel digital VLSI architecture for integrated support vector machine training and classification. IEEE Trans. Very Large Scale Integr. Syst. 23(8):1471–1484
Wang Y, Yu H, Ni L, Huang GB, Yan M, Weng C, Yang W, Zhao J (2015b) An energy-efficient nonvolatile in-memory computing architecture for extreme learning machine by domain-wall nanowire devices. IEEE Trans. Nanotechnol. 14(6):998–1012
Xia L, Gu P, Li B, Tang T, Yin X, Huangfu W, Yu S, Cao Y, Wang Y, Yang H (2016) Technological exploration of RRAM crossbar array for matrix-vector multiplication. J. Comput. Sci. Technol. 31(1):3–19
Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J (2015) Optimizing FPGA-based accelerator design for deep convolutional neural networks. In: International symposium on field-programmable gate arrays. Monterey, California, pp 161–170
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this chapter
Huang, H., Yu, H. (2019). Least-Squares-Solver for Shallow Neural Network. In: Compact and Fast Machine Learning Accelerator for IoT Devices. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-13-3323-1_3
DOI: https://doi.org/10.1007/978-981-13-3323-1_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3322-4
Online ISBN: 978-981-13-3323-1
eBook Packages: Intelligent Technologies and Robotics