A Neural Network Model with Bidirectional Whitening

  • Yuki Fujimoto
  • Toru Ohira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10841)

Abstract

We present a new model and algorithm that perform efficient natural gradient descent for multilayer perceptrons. Natural gradient descent was originally proposed from the viewpoint of information geometry, and it performs steepest-descent updates on manifolds in a Riemannian space. In particular, we extend the approach taken by the “Whitened Neural Networks” model: we apply the whitening process not only in the feed-forward direction, as in the original model, but also in the back-propagation phase. The efficacy of this “Bidirectional Whitened Neural Networks” model is demonstrated on the MNIST handwritten character recognition dataset.
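For context, natural gradient descent preconditions the ordinary gradient with the inverse Fisher (Riemannian metric) matrix G, giving the update θ_{t+1} = θ_t − η G(θ_t)^{-1} ∇_θ L(θ_t) (Amari 1998). The sketch below is only an illustrative reading of the abstract, not the authors' implementation. It assumes a single fully connected layer and the standard approximation that the Fisher block of that layer's weights factorizes into the covariance of the layer's inputs and the covariance of the back-propagated errors; under that assumption, whitening both factors (the feed-forward activations and the back-propagated deltas) before forming the weight gradient approximates a natural-gradient step. All names and shapes are hypothetical.

import numpy as np

def whiten(x, eps=1e-5):
    # ZCA whitening: zero-mean the batch and decorrelate its features.
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / x.shape[0] + eps * np.eye(x.shape[1])
    w, V = np.linalg.eigh(cov)                    # eigendecomposition of the covariance
    return x @ (V @ np.diag(w ** -0.5) @ V.T)     # multiply by C^{-1/2}

# Hypothetical single-layer example.
rng = np.random.default_rng(0)
a = rng.normal(size=(128, 50))      # feed-forward inputs to the layer (batch, in_dim)
delta = rng.normal(size=(128, 20))  # back-propagated errors at the layer's outputs (batch, out_dim)

# Bidirectional whitening: decorrelate both factors before taking their outer product,
# so the resulting weight update is preconditioned in both directions.
grad_W = whiten(delta).T @ whiten(a) / a.shape[0]   # (out_dim, in_dim) update direction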

Notes

Acknowledgement

This work was supported by Grant-in-Aid for Scientific Research from Japan Society for the Promotion of Science KAKENHI No. 16H03360 and No. 16H01175.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Graduate School of Mathematics, Nagoya University, Nagoya, Japan