
Tensor-Solver for Deep Neural Network

  • Hantao Huang
  • Hao Yu
Chapter
Part of the Computer Architecture and Design Methodologies book series (CADM)

Abstract

This chapter introduces a tensorized formulation for compressing neural networks during training. By reshaping neural network weight matrices into high-dimensional tensors with low-rank decomposition, significant compression can be achieved while maintaining accuracy. A layer-wise training algorithm for the tensorized multilayer neural network is further introduced, based on the modified alternating least-squares (MALS) method. The proposed tensorized neural network (TNN) algorithm provides state-of-the-art results on various benchmarks with a significant compression rate, and accuracy can be further improved by fine-tuning with backward propagation (BP). Significant compression rates are achieved on the MNIST and CIFAR-10 datasets. In addition, a 3D multi-layer CMOS-RRAM accelerator architecture is proposed for energy-efficient and highly parallel computation. (Figures and illustrations may be reproduced from [29, 30, 31].)
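
The core idea of the tensorized formulation, replacing a dense weight matrix by a low-rank tensor-train (TT) factorization of its reshaped, high-dimensional form [50, 52], can be illustrated with a short numerical sketch. The Python code below is not the chapter's implementation: the mode sizes, the rank cap, and the use of a greedy truncated-SVD factorization are illustrative assumptions only.

    import numpy as np

    def tt_decompose(W, in_modes, out_modes, max_rank):
        """Greedy TT-SVD of a dense matrix W of shape (prod(in_modes), prod(out_modes))."""
        d = len(in_modes)
        # Reshape W into a 2d-way tensor and interleave input/output modes so
        # that each TT core owns one (input mode, output mode) pair.
        T = W.reshape(list(in_modes) + list(out_modes))
        T = T.transpose([i for pair in zip(range(d), range(d, 2 * d)) for i in pair])
        cores, r_prev = [], 1
        for k in range(d - 1):
            n_k = in_modes[k] * out_modes[k]
            T = T.reshape(r_prev * n_k, -1)
            U, s, Vt = np.linalg.svd(T, full_matrices=False)
            r = min(max_rank, len(s))              # truncate to the TT rank
            cores.append(U[:, :r].reshape(r_prev, n_k, r))
            T = s[:r, None] * Vt[:r]               # carry the remainder forward
            r_prev = r
        cores.append(T.reshape(r_prev, in_modes[-1] * out_modes[-1], 1))
        return cores

    # Example: a 784 x 625 fully connected layer, reshaped as (4,7,4,7) x (5,5,5,5),
    # compressed with a hypothetical maximum TT rank of 8.
    W = np.random.randn(784, 625)
    cores = tt_decompose(W, (4, 7, 4, 7), (5, 5, 5, 5), max_rank=8)
    print(sum(c.size for c in cores), "TT parameters vs", W.size, "dense parameters")

In this toy setting the TT cores hold under 4,000 parameters in place of 490,000 dense weights; the chapter's layer-wise MALS training optimizes such cores directly during training rather than factorizing a pre-trained matrix.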

Keywords

Tensorized neural network · Neural network compression · RRAM

References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265–283
  2. Amor BB, Su J, Srivastava A (2016) Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Trans Pattern Anal Mach Intell 38(1):1–13
  3. Annadani Y, Rakshith D, Biswas S (2016) Sliding dictionary based sparse representation for action recognition. arXiv:1611.00218
  4. Bengio Y, Lamblin P, Popovici D, Larochelle H et al (2007) Greedy layer-wise training of deep networks. Adv Neural Inf Process Syst 19:153
  5. Chen K, Li S, Muralimanohar N, Ahn JH, Brockman JB, Jouppi NP (2012) CACTI-3DD: architecture-level modeling for 3D die-stacked DRAM main memory. In: Proceedings of the conference on design, automation and test in Europe, Dresden, Germany, pp 33–38
  6. Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 design, automation and test in Europe conference and exhibition, EDA Consortium, pp 854–859
  7. Chen W, Wilson JT, Tyree S, Weinberger KQ, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, Lille, France, pp 2285–2294
  8. Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications, Oslo, Norway
  9. Cichocki A (2014) Era of big data processing: a new approach via tensor networks and tensor decompositions. arXiv:1403.2048
  10. Cireşan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) High-performance neural networks for visual object classification. arXiv:1102.0183
  11. Collins MD, Kohli P (2014) Memory bounded deep convolutional networks. arXiv:1412.1442
  12. Davis A, Arel I (2013) Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv:1312.4461
  13. Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, Tucker P, Yang K, Le QV et al (2012) Large scale distributed deep networks. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 1223–1231
  14. Deng J, Berg A, Satheesh S, Su H, Khosla A, Fei-Fei L (2012) ImageNet large scale visual recognition competition 2012 (ILSVRC2012)
  15. Denil M, Shakibi B, Dinh L, de Freitas N et al (2013) Predicting parameters in deep learning. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 2148–2156
  16. Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, Montreal, Canada, pp 1269–1277
  17. Fei W, Yu H, Zhang W, Yeo KS (2012) Design exploration of hybrid CMOS and memristor circuit by new modified nodal analysis. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(6):1012–1025
  18. Fothergill S, Mentis H, Kohli P, Nowozin S (2012) Instructing people for training gestural interactive systems. In: Proceedings of the SIGCHI conference on human factors in computing systems, Austin, Texas, pp 1737–1746
  19. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, Sardinia, Italy, pp 249–256
  20. Govoreanu B, Kar G, Chen Y, Paraschiv V, Kubicek S, Fantini A, Radu I, Goux L, Clima S, Degraeve R et al (2011) 10 × 10 nm² Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation. In: International electron devices meeting, Washington, DC, pp 31–36
  21. Hagan MT, Demuth HB, Beale MH, De Jesús O (1996) Neural network design, vol 20. PWS Publishing Company, Boston
  22. Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv:1510.00149
  23. Han S et al (2016) DSD: regularizing deep neural networks with dense-sparse-dense training flow. arXiv:1607.04381
  24. Han S et al (2017) ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: International symposium on field-programmable gate arrays, Monterey, California, pp 75–84
  25. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531
  26. Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
  27. Holtz S, Rohwedder T, Schneider R (2012) The alternating linear scheme for tensor optimization in the tensor train format. SIAM J Sci Comput 34(2):A683–A713
  28. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
  29. Huang H, Yu H (2018) LTNN: a layer-wise tensorized compression of multilayer neural network. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2018.2869974
  30. Huang H, Ni L, Yu H (2017) LTNN: an energy-efficient machine learning accelerator on 3D CMOS-RRAM for layer-wise tensorized neural network. In: 2017 30th IEEE international system-on-chip conference (SOCC). IEEE, pp 280–285
  31. Huang H, Ni L, Wang K, Wang Y, Yu H (2018) A highly parallel and energy efficient three-dimensional multilayer CMOS-RRAM accelerator for tensorized neural network. IEEE Trans Nanotechnol 17(4):645–656. https://doi.org/10.1109/TNANO.2017.2732698
  32. Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Quantized neural networks: training neural networks with low precision weights and activations. arXiv:1609.07061
  33. Hubara I, Soudry D, Yaniv RE (2016) Binarized neural networks. arXiv:1602.02505
  34. Hussein ME, Torki M, Gowayyed MA, El-Saban M (2013) Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: International joint conference on artificial intelligence, Menlo Park, California, pp 2466–2472
  35. Kasun LLC, Yang Y, Huang GB, Zhang Z (2016) Dimension reduction with extreme learning machine. IEEE Trans Image Process 25(8):3906–3918
  36. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
  37. Krizhevsky A, Nair V, Hinton G (2014) The CIFAR-10 dataset
  38. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
  39. LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits
  40. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
  41. Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: Computer vision and pattern recognition workshops, San Francisco, California, pp 9–14
  42. Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE international solid-state circuits conference, San Francisco, California
  43. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
  44. Liu Z, Li Y, Ren F, Yu H (2016) A binary convolutional encoder-decoder network for real-time natural scene text processing. arXiv:1612.03630
  45. Martens J, Sutskever I (2011) Learning recurrent neural networks with Hessian-free optimization. In: International conference on machine learning, Bellevue, Washington, pp 1033–1040
  46. Mellempudi N, Kundu A, Mudigere D, Das D, Kaul B, Dubey P (2017) Ternary neural networks with fine-grained quantization. arXiv:1705.01462
  47. Micron Technology Inc (2017) Breakthrough nonvolatile memory technology. http://www.micron.com/about/emerging-technologies/3d-xpoint-technology/. Accessed 04 Jan 2018
  48. Migacz S (2017) 8-bit inference with TensorRT. http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf. Accessed 04 Jan 2018
  49. Ni L, Huang H, Liu Z, Joshi RV, Yu H (2017) Distributed in-memory computing on binary RRAM crossbar. ACM J Emerg Technol Comput Syst (JETC) 13(3):36. https://doi.org/10.1145/2996192
  50. Novikov A, Podoprikhin D, Osokin A, Vetrov DP (2015) Tensorizing neural networks. In: Advances in neural information processing systems, Montreal, Canada, pp 442–450
  51. Nvidia (2017) GPU specs. http://www.nvidia.com/object/workstation-solutions.html. Accessed 30 Mar 2017
  52. Oseledets IV (2011) Tensor-train decomposition. SIAM J Sci Comput 33(5):2295–2317
  53. Oseledets IV, Dolgov S (2012) Solution of linear systems and matrix inversion in the TT-format. SIAM J Sci Comput 34(5):A2718–A2739
  54. Poremba M, Mittal S, Li D, Vetter JS, Xie Y (2015) DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches. In: Design, automation and test in Europe conference, Grenoble, France, pp 1543–1546
  55. Rosenberg A (2009) Linear regression with regularization. http://eniac.cs.qc.cuny.edu/andrew/gcml/lecture5.pdf
  56. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
  57. Tang J, Deng C, Huang GB (2016) Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst 27(4):809–821
  58. Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer, Berlin
  59. Vedaldi A, Lenc K (2015) MatConvNet: convolutional neural networks for MATLAB. In: International conference on multimedia, Brisbane, Australia, pp 689–692
  60. Wang J, Liu Z, Chorowski J, Chen Z, Wu Y (2012) Robust 3D action recognition with random occupancy patterns. In: Computer vision (ECCV). Springer, pp 872–885
  61. Wang P, Li Z, Hou Y, Li W (2016) Action recognition based on joint trajectory maps using convolutional neural networks. In: ACM multimedia conference, Amsterdam, The Netherlands, pp 102–106
  62. Wang Y, Zhang C, Nadipalli R, Yu H, Weerasekera R (2012) Design exploration of 3D stacked non-volatile memory by conductive bridge based crossbar. In: IEEE international conference on 3D system integration, Osaka, Japan
  63. Wang Y, Yu H, Zhang W (2014) Nonvolatile CBRAM-crossbar-based 3D-integrated hybrid memory for data retention. IEEE Trans Very Large Scale Integr (VLSI) Syst 22(5):957–970
  64. Wang Y, Huang H, Ni L, Yu H, Yan M, Weng C, Yang W, Zhao J (2015) An energy-efficient non-volatile in-memory accelerator for sparse-representation based face recognition. In: Design, automation and test in Europe conference and exhibition (DATE), 2015. IEEE, pp 932–935
  65. Wang Y, Li X, Xu K, Ren F, Yu H (2017) Data-driven sampling matrix Boolean optimization for energy-efficient biomedical signal acquisition by compressive sensing. IEEE Trans Biomed Circuits Syst 11(2):255–266
  66. Xia L, Chen CC, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: Computer vision and pattern recognition workshops, Providence, Rhode Island, pp 20–27
  67. Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Annual conference of the international speech communication association, Lyon, France, pp 2365–2369
  68. Yang S, Yuan C, Hu W, Ding X (2014) A hierarchical model based on latent Dirichlet allocation for action recognition. In: International conference on pattern recognition, Stockholm, Sweden, pp 2613–2618
  69. Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep fried convnets. In: IEEE international conference on computer vision, Santiago, Chile, pp 1476–1483
  70. Yu S et al (2013) 3D vertical RRAM-scaling limit analysis and demonstration of 3D array operation. In: Symposium on VLSI technology and circuits, Kyoto, Japan, pp 158–159
  71. Zhou L, Li W, Zhang Y, Ogunbona P, Nguyen DT, Zhang H (2014) Discriminative key pose extraction using extended LC-KSVD for action recognition. In: International conference on digital image computing: techniques and applications, New South Wales, Australia, pp 1–8

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  2. Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
