MaxGain: Regularisation of Neural Networks by Constraining Activation Magnitudes

  • Henry Gouk
  • Bernhard Pfahringer
  • Eibe Frank
  • Michael J. Cree
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


Abstract

Effective regularisation of neural networks is essential to combat overfitting due to the large number of parameters involved. We present an empirical analogue of the Lipschitz constant of a feed-forward neural network, which we refer to as the maximum gain. We hypothesise that constraining the gain of a network will have a regularising effect, similar to how constraining the Lipschitz constant of a network has been shown to improve generalisation. We provide a simple algorithm that rescales the weight matrix of each layer after each parameter update. We conduct a series of studies on common benchmark datasets, as well as on a novel dataset that we introduce to make significance testing easier for experiments with convolutional networks. Performance on these datasets compares favourably with other common regularisation techniques. Data related to this paper is available at:
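The abstract only sketches the constraint procedure, so the following is a minimal illustration of what "rescaling the weight matrix of each layer after each parameter update" could look like in practice. It assumes the gain of a fully connected layer is estimated as the largest ratio of output norm to input norm over the current mini-batch (ignoring the bias), and that the weights are shrunk whenever this estimate exceeds a user-chosen bound. The function and parameter names (`constrain_max_gain`, `gamma`) are hypothetical and not taken from the paper; the paper's actual MaxGain algorithm may differ in its details.

```python
import torch
import torch.nn as nn

def constrain_max_gain(layer: nn.Linear, inputs: torch.Tensor,
                       gamma: float = 2.0, eps: float = 1e-12) -> None:
    """Rescale `layer.weight` so the layer's empirical gain on `inputs`
    (largest output-norm to input-norm ratio over the mini-batch,
    bias ignored) does not exceed `gamma`."""
    with torch.no_grad():
        outputs = inputs @ layer.weight.t()            # pre-activation responses
        in_norms = inputs.norm(dim=1).clamp_min(eps)   # guard against zero-norm inputs
        gain = (outputs.norm(dim=1) / in_norms).max()  # empirical maximum gain
        if gain > gamma:
            layer.weight.mul_(gamma / gain)            # shrink weights back onto the constraint

# Hypothetical usage, interleaved with ordinary SGD/Adam updates:
#   loss.backward()
#   optimizer.step()
#   constrain_max_gain(model.hidden, batch_x, gamma=2.0)
```

In this sketch `gamma` plays the role of a tunable regularisation hyperparameter; convolutional layers would need an analogous rescaling of their kernels, with the gain estimated from the convolved feature maps.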



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Henry Gouk (1)
  • Bernhard Pfahringer (1)
  • Eibe Frank (1)
  • Michael J. Cree (2)
  1. Department of Computer Science, University of Waikato, Hamilton, New Zealand
  2. School of Engineering, University of Waikato, Hamilton, New Zealand
