Using Supervised Pretraining to Improve Generalization of Neural Networks on Binary Classification Problems

  • Alex Yuxuan PengEmail author
  • Yun Sing Koh
  • Patricia Riddle
  • Bernhard Pfahringer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


Neural networks are known to be very sensitive to the initial weights. There has been a lot of research on initialization that aims to stabilize the training process. However, very little research has studied the relationship between initialization and generalization. We demonstrate that poorly initialized model will lead to lower test accuracy. We propose a supervised pretraining technique that helps improve generalization on binary classification problems. The experimental results on four UCI datasets show that the proposed pretraining leads to higher test accuracy compared to the he_normal initialization when the training set is small. In further experiments on synthetic data, the improvement on test accuracy using the proposed pretraining reaches more than 30% when the data has high dimensionality and noisy features. Code related to this paper is available at:


Neural network Pretraining Initialization Generalization 


  1. 1.
    He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)Google Scholar
  2. 2.
    Dheeru, D., Karra Taniskidou, E.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2017).
  3. 3.
    Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)Google Scholar
  4. 4.
    Saxe, A.M., McClelland, J.L., Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (2013). arXiv preprint: arXiv:1312.6120
  5. 5.
    Mishkin, D., Matas, J.: All you need is a good init. In: Proceedings of the International Conference on Learning Representations (2016)Google Scholar
  6. 6.
    Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)Google Scholar
  7. 7.
    Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark (2003)Google Scholar
  8. 8.
    Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010)MathSciNetzbMATHGoogle Scholar
  9. 9.
    Baldi, P., Sadowski, P., Whiteson, D.: Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. 5, 4308 (2014)CrossRefGoogle Scholar
  10. 10.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  11. 11.
    Gulshan, V., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410 (2016)CrossRefGoogle Scholar
  12. 12.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  13. 13.
    Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)Google Scholar
  14. 14.
    Ranzato, M., Poultney, C., Chopra, S., Cun, Y.L.: Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems, pp. 1137–1144 (2007)Google Scholar
  15. 15.
    Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, pp. 153–160 (2007)Google Scholar
  16. 16.
    Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)MathSciNetCrossRefGoogle Scholar
  17. 17.
    Graves, A., Mohamed, A.r., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013)Google Scholar
  18. 18.
    Sak, H., Senior, A., Rao, K., Beaufays, F.: Fast and accurate recurrent neural network acoustic models for speech recognition. In: Sixteenth Annual Conference of the International Speech Communication Association (2015)Google Scholar
  19. 19.
    Chollet, F., et al.: Keras (2015).
  20. 20.
    Abadi, M., et al.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Alex Yuxuan Peng
    • 1
    Email author
  • Yun Sing Koh
    • 1
  • Patricia Riddle
    • 1
  • Bernhard Pfahringer
    • 2
  1. 1.University of AucklandAucklandNew Zealand
  2. 2.University of WaikatoHamiltonNew Zealand

Personalised recommendations