Efficient Model Averaging for Deep Neural Networks

  • Conference paper
  • Published in: Computer Vision – ACCV 2016 (ACCV 2016)
  • Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10112)

Abstract

Large neural networks trained on small datasets are increasingly prone to overfitting. Traditional machine learning methods can reduce overfitting by employing bagging or boosting to train several diverse models; for large neural networks, however, this is prohibitively expensive. To address this issue, we propose a method that leverages the benefits of ensembles without explicitly training several expensive neural network models. In contrast to Dropout, we encourage diversity among the sub-networks directly, by maximizing the diversity of the individual networks with a dedicated loss function: DivLoss. We demonstrate the effectiveness of DivLoss on the challenging CIFAR datasets.
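
The abstract does not spell out how DivLoss is computed, but the general recipe it describes, training several sub-networks jointly while explicitly rewarding disagreement between their predictions, can be sketched in a few lines. The following is a minimal NumPy sketch under stated assumptions: the negative-correlation-style diversity term (in the spirit of Liu and Yao, ref. 32), the function names, and the weight lam are illustrative choices, not the paper's actual DivLoss definition.

    import numpy as np

    def softmax(logits):
        # Row-wise softmax over the class dimension.
        z = logits - logits.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def cross_entropy(probs, labels):
        # Mean negative log-likelihood of the true classes.
        n = probs.shape[0]
        return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

    def diversity_bonus(member_probs):
        # Average squared deviation of each member's prediction from the
        # ensemble mean; larger values mean more diverse members.
        # (Negative-correlation-style term: an assumption, not the paper's DivLoss.)
        avg = member_probs.mean(axis=0)                        # (batch, classes)
        return ((member_probs - avg) ** 2).sum(axis=2).mean()

    def ensemble_loss(member_logits, labels, lam=0.1):
        # Joint objective: average per-member cross-entropy minus a weighted
        # diversity bonus, so minimizing the loss encourages disagreement.
        member_probs = np.stack([softmax(l) for l in member_logits])  # (M, batch, C)
        ce = np.mean([cross_entropy(p, labels) for p in member_probs])
        return ce - lam * diversity_bonus(member_probs)

    # Toy usage: 3 sub-networks, a batch of 4 examples, 10 classes (e.g. CIFAR-10).
    rng = np.random.default_rng(0)
    member_logits = [rng.normal(size=(4, 10)) for _ in range(3)]
    labels = np.array([1, 3, 5, 7])
    print(ensemble_loss(member_logits, labels, lam=0.1))

At test time, the averaged prediction member_probs.mean(axis=0) would serve as the ensemble output; how the sub-networks are derived so that no separate full models need to be trained is described in the paper itself.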

References

  1. Breiman, L.: Bagging predictors. Mach. Learn. (ML) 24, 123–140 (1996)

  2. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. (JCSS) 55, 119–139 (1997)

  3. Breiman, L.: Random forests. Mach. Learn. (ML) 45, 5–32 (2001)

  4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS) (2012)

  5. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (ICLR) (2014)

  6. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. (JMLR) 15, 1929–1958 (2014)

  7. Cogswell, M., Ahmed, F., Girshick, R.B., Zitnick, L., Batra, D.: Reducing overfitting in deep networks by decorrelating representations. In: Proceedings of the International Conference on Learning Representations (ICLR) (2016)

  8. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the International Conference on Machine Learning (ICML) (2010)

  9. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)

  10. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the International Conference on Machine Learning (ICML) (2013)

  11. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: Proceedings of the International Conference on Machine Learning (ICML) (2013)

  12. Clevert, D., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: Proceedings of the International Conference on Learning Representations (ICLR) (2016)

  13. Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J. (eds.) Field Guide to Dynamical Recurrent Neural Networks. IEEE Press (2001)

  14. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in Neural Information Processing Systems (NIPS) (2015)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). arXiv:1512.03385

  16. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. (JMLR) 12, 2121–2159 (2011)

  17. Tieleman, T., Hinton, G.: Lecture 6.5–RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)

  18. Zeiler, M.D.: ADADELTA: an adaptive learning rate method (2012). http://arxiv.org/abs/1212.5701

  19. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)

  20. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning (ICML) (2015)

  21. Mishkin, D., Matas, J.: All you need is a good init. In: Proceedings of the International Conference on Learning Representations (ICLR) (2016)

  22. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Artificial Intelligence and Statistics Conference (AISTATS) (2010)

  23. Krähenbühl, P., Doersch, C., Donahue, J., Darrell, T.: Data-dependent initializations of convolutional neural networks. In: Proceedings of the International Conference on Learning Representations (ICLR) (2016)

  24. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  25. Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics Conference (AISTATS) (2015)

  26. Lin, M., Chen, Q., Yan, S.: Network in network. In: Proceedings of the International Conference on Learning Representations (ICLR) (2014)

  27. Sermanet, P., LeCun, Y.: Traffic sign recognition with multi-scale convolutional networks. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN) (2011)

  28. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015). http://arxiv.org/abs/1503.02531

  29. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)

  30. Wan, L., Zeiler, M., Zhang, S., Cun, Y.L., Fergus, R.: Regularization of neural networks using DropConnect. In: Proceedings of the International Conference on Machine Learning (ICML), JMLR Workshop and Conference Proceedings (2013)

  31. Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. In: Proceedings of the International Conference on Learning Representations (ICLR) (2013)

  32. Liu, Y., Yao, X.: Ensemble learning via negative correlation. Neural Netw. 12, 1399–1404 (1999)

  33. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)

  34. Ueda, N., Nakano, R.: Generalization error of ensemble estimators. In: Proceedings of the IEEE International Conference on Neural Networks (ICNN) (1996)

  35. Krizhevsky, A.: Learning Multiple Layers of Features from Tiny Images (2009)

  36. The Theano Development Team: Theano: a Python framework for fast computation of mathematical expressions (2016). http://arxiv.org/abs/1605.02688

Acknowledgement

This work was supported by the Austrian Research Promotion Agency (FFG) project Vision+.

Author information

Corresponding author

Correspondence to Michael Opitz.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Opitz, M., Possegger, H., Bischof, H. (2017). Efficient Model Averaging for Deep Neural Networks. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) Computer Vision – ACCV 2016. Lecture Notes in Computer Science, vol. 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_13

  • DOI: https://doi.org/10.1007/978-3-319-54184-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54183-9

  • Online ISBN: 978-3-319-54184-6

  • eBook Packages: Computer Science, Computer Science (R0)
