A Neural Network Model with Bidirectional Whitening
We present a new model and algorithm that perform efficient natural gradient descent for multilayer perceptrons. Natural gradient descent was originally proposed from the viewpoint of information geometry; it performs steepest descent updates on a Riemannian manifold. In particular, we extend the approach taken by the “Whitened Neural Networks” model: we apply whitening not only in the feed-forward direction, as in the original model, but also in the back-propagation phase. The efficacy of this “Bidirectional Whitened Neural Networks” model is shown by applying it to handwritten character recognition data (the MNIST dataset).
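For orientation, the natural gradient update mentioned above can be written in its standard textbook form. The symbols used here ($\theta$, $\eta$, $F$, $a$, $\delta$) are notation assumed for this sketch and are not taken from the paper. The second expression is the Kronecker-factored approximation of a single layer's Fisher block, in the spirit of Martens and Grosse. It is included only to suggest why whitening both the forward activations and the back-propagated signals might bring that block close to the identity.

```latex
% Standard natural gradient step (notation assumed, not the paper's).
% The expectation is over inputs x and over y drawn from the model p(y | x; \theta).
\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t),
\qquad
F(\theta) = \mathbb{E}\!\left[ \nabla_\theta \log p(y \mid x; \theta) \,
                               \nabla_\theta \log p(y \mid x; \theta)^{\top} \right]

% Kronecker-factored approximation for one layer with input a and
% back-propagated signal \delta (factor order depends on the vec convention):
F_{\text{layer}} \approx \mathbb{E}\!\left[ a a^{\top} \right] \otimes
                         \mathbb{E}\!\left[ \delta \delta^{\top} \right]

% If both factors are whitened, i.e. E[a a^T] \approx I and E[\delta \delta^T] \approx I,
% then F_layer \approx I under this approximation.
```

Under that approximation, an ordinary gradient step taken in the whitened coordinates would approximate a natural gradient step for the layer. This is one plausible reading of the motivation for whitening in both directions, not a statement of the paper's derivation.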
This work was supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, KAKENHI No. 16H03360 and No. 16H01175.
- 4. Desjardins, G., Simonyan, K., Pascanu, R., Kavukcuoglu, K.: Natural neural networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 2071–2079. Curran Associates Inc. (2015)
- 5. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 448–456 (2015)
- 6. Martens, J.: New insights and perspectives on the natural gradient method. arXiv preprint arXiv:1412.1193 (2014)
- 7. Martens, J., Grosse, R.: Optimizing neural networks with Kronecker-factored approximate curvature. arXiv preprint arXiv:1503.05671 (2015)
- 9. Pascanu, R., Bengio, Y.: Revisiting natural gradient for deep networks. arXiv preprint arXiv:1301.3584 (2013)
- 10. Salimans, T., Kingma, D.P.: Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 901–909. Curran Associates Inc. (2016)