Abstract
The backpropagation (BP) method has been widely used as the de facto standard algorithm for computing the weights of deep neural networks (DNNs). Because the BP method is a stochastic gradient descent method based on derivatives of the objective function, it can be difficult to choose appropriate hyperparameters such as the learning rate. As an alternative approach to computing the weight matrices, we recently proposed an alternating optimization method using linear and nonlinear semi-nonnegative matrix factorizations (semi-NMFs). In this paper, we propose a parallel implementation of the nonlinear semi-NMF based method. The experimental results show that the nonlinear semi-NMF based method and its parallel implementation are competitive with conventional DNNs trained by the BP method.
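To make the alternating idea concrete, the following is a minimal, hypothetical sketch for a single hidden layer with ReLU activation. It is not the algorithm of this paper or of [11]: the semi-NMF subproblem solvers are replaced here by crude pseudo-inverse least-squares updates, and the data, dimensions, and iteration count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(A):
    return np.maximum(A, 0.0)

# Toy data: columns of X are inputs, columns of Y are target outputs.
n, m, hidden, samples = 20, 5, 30, 200
X = rng.standard_normal((n, samples))
W1_true = rng.standard_normal((hidden, n))
W2_true = rng.standard_normal((m, hidden))
Y = W2_true @ relu(W1_true @ X)

# Random initial weights.
W1 = 0.1 * rng.standard_normal((hidden, n))
W2 = 0.1 * rng.standard_normal((m, hidden))

for it in range(50):
    # (1) Linear step: with the nonnegative hidden factor Z = relu(W1 X)
    #     fixed, fit the unconstrained factor W2 by least squares.
    Z = relu(W1 @ X)
    W2 = Y @ np.linalg.pinv(Z)

    # (2) Nonlinear step (heuristic stand-in for the nonlinear semi-NMF):
    #     back out a nonnegative target for the hidden layer, then fit W1
    #     by least squares against the input data.
    Z_target = relu(np.linalg.pinv(W2) @ Y)
    W1 = Z_target @ np.linalg.pinv(X)

    err = np.linalg.norm(Y - W2 @ relu(W1 @ X), "fro") / np.linalg.norm(Y, "fro")

print(f"relative residual after alternating updates: {err:.3e}")
```

The point of the sketch is only the alternation itself: each half-step fixes one factor and solves a matrix approximation subproblem for the other, which is what makes the approach amenable to parallel linear-algebra kernels rather than derivative-based BP updates.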
Notes
- 1.
In [11], the simplified objective function (3), which drops the bias vectors and the sparse regularization terms, was considered. Handling bias vectors and sparse regularization requires algorithms for solving "constrained" (nonlinear) semi-NMFs with sparse regularization, because the all-ones vector \(\mathbf{1}\) is fixed in (1). Therefore, in this paper, we also consider the simplified objective function (3). Note that we have been developing methods for solving such constrained problems.
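For orientation, a plausible shape of such a simplified objective, with the bias vectors and the sparse regularization terms dropped, is
\[
\min_{W_1,\dots,W_N} \bigl\| Y - W_N\, f\bigl(W_{N-1} \cdots f(W_1 X) \cdots \bigr) \bigr\|_F^2 ,
\]
where \(X\) is the input data, \(Y\) the target output, \(f\) the activation function (e.g., ReLU), and \(N\) the number of layers; this reconstruction is an assumption based on the abstract and is not quoted from (3).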
References
- 1.
Bengio Y, Lamblin P, Popovici D, Larochelle H (2006) Greedy layer-wise training of deep networks. Proc Adv Neural Inf Process Syst 19:153–160
- 2.
Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: Proc. 22nd international joint conference on artificial intelligence, pp 1237–1242
- 3.
Ding CHQ, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32:45–55
- 4.
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, pp 249–256
- 5.
Hinton GE, Deng L, Yu D, Dahl GE, Mohamed A, Jaitly N, Senior A, Vanhoucke V (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process Mag 29:82–97
- 6.
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International conference on learning representations (ICLR), San Diego
- 7.
LeCun Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist
- 8.
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324
- 9.
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proc. ICML
- 10.
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536
- 11.
Sakurai T, Imakura A, Inoue Y, Futamura Y (2016) Alternating optimization method based on nonnegative matrix factorizations for deep neural networks. In: Proc. ICONIP 2016
- 12.
Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
- 13.
TensorFlow, https://www.tensorflow.org/
Additional information
This research was partly supported by JST/ACT-I (No. JPMJPR16U6), JST/CREST, MEXT KAKENHI (No. 17K12690), and the University of Tsukuba Basic Research Support Program Type A. This research used, in part, computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project ID: hp160138) and of COMA provided by the Interdisciplinary Computational Science Program in the Center for Computational Sciences, University of Tsukuba.
About this article
Cite this article
Imakura, A., Inoue, Y., Sakurai, T. et al. Parallel Implementation of the Nonlinear Semi-NMF Based Alternating Optimization Method for Deep Neural Networks. Neural Process Lett 47, 815–827 (2018). https://doi.org/10.1007/s11063-017-9642-2
Keywords
- Deep neural networks
- Nonlinear semi-nonnegative matrix factorizations
- Parallel implementation