Further Advantages of Data Augmentation on Convolutional Neural Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11139)


Abstract

Data augmentation is a popular technique widely used to enhance the training of convolutional neural networks. Although many of its benefits are well known to deep learning researchers and practitioners, its implicit regularization effects, as compared to popular explicit regularization techniques such as weight decay and dropout, remain largely unstudied. In fact, convolutional neural networks for image object classification are typically trained with both data augmentation and explicit regularization, on the assumption that the benefits of all techniques are complementary. In this paper, we systematically analyze these techniques through ablation studies of different network architectures trained with different amounts of training data. Our results unveil a largely ignored advantage of data augmentation: networks trained with just data augmentation adapt more easily to different architectures and amounts of training data, whereas weight decay and dropout require specific fine-tuning of their hyperparameters.
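The kind of light data augmentation compared against explicit regularizers in such ablation studies can be illustrated with a minimal sketch (not the authors' code; the padding size and flip probability are illustrative assumptions) of the scheme commonly applied to CIFAR-style images: random horizontal flips and random crops after zero-padding.

```python
import numpy as np

def augment(images, pad=4, rng=None):
    """Light augmentation for a batch of images shaped (N, H, W, C):
    zero-pad spatially, crop back to the original size at a random
    offset, and flip horizontally with probability 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Zero-pad only the spatial dimensions.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out = np.empty_like(images)
    for i in range(n):
        top = rng.integers(0, 2 * pad + 1)   # random crop offset
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, top:top + h, left:left + w]
        if rng.random() < 0.5:               # random horizontal flip
            crop = crop[:, ::-1]
        out[i] = crop
    return out

# Example: augment a small batch of 32x32 RGB images.
batch = np.zeros((8, 32, 32, 3), dtype=np.float32)
augmented = augment(batch, rng=np.random.default_rng(0))
```

Because such transformations only perturb the input distribution, they carry no hyperparameters tied to a particular architecture, in contrast to a weight-decay coefficient or a per-layer dropout rate.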


Keywords: Data augmentation · Regularization · CNNs



This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 641805.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
