Customised ensemble methodologies for deep learning: Boosted Residual Networks and related approaches

  • EANN 2017
  • Published in: Neural Computing and Applications

Abstract

This paper introduces a family of new customised ensemble methodologies for deep learning, centred on Boosted Residual Networks, which builds a boosted ensemble of residual networks by growing the member network at each round of boosting. The proposed approach combines recent developments in residual networks, a method for creating very deep networks by including a shortcut layer between different groups of layers, with Deep Incremental Boosting, a methodology for quickly training ensembles of networks of increasing depth through the use of boosting. Additionally, we explore a simpler variant of Boosted Residual Networks based on bagging, called Bagged Residual Networks. We then analyse how recent developments in ensemble distillation can improve our results. We demonstrate that the synergy of residual networks and Deep Incremental Boosting has better potential than simply boosting a residual network of fixed structure, or than the equivalent Deep Incremental Boosting without the shortcut layers, as it permits the creation of models with better generalisation in significantly less time.
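The abstract describes the training procedure only at a high level. The sketch below is one way to read it, under clearly stated assumptions: a SAMME-style multi-class boosting loop in which the member trained at each round is the previous member with one additional residual block appended, with weights copied across rounds wherever layer shapes still match. The framework (Keras), the input shape, the helper names (residual_block, build_resnet, boosted_residual_networks, ensemble_predict) and the position at which the new block is injected are all illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of the Boosted Residual Networks idea, assuming a
# Keras/TensorFlow backend and MNIST-shaped inputs. Helper names are
# illustrative, not the paper's reference implementation.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def residual_block(x, filters=16):
    """Identity-shortcut block: conv-BN-ReLU-conv-BN, added back to the input."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)


def build_resnet(num_blocks, input_shape=(28, 28, 1), num_classes=10):
    """Small residual network with `num_blocks` shortcut blocks."""
    inputs = keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    for _ in range(num_blocks):
        x = residual_block(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


def boosted_residual_networks(x_train, y_train, rounds=3, epochs=5):
    """SAMME-style boosting loop in which each round's member grows by one block."""
    n = len(x_train)
    k = int(y_train.max()) + 1
    w = np.full(n, 1.0 / n)                    # boosting weights over examples
    members, alphas, prev = [], [], None
    for t in range(rounds):
        net = build_resnet(num_blocks=t + 1)   # one extra residual block per round
        if prev is not None:
            # Crude weight transfer: copy layer weights where shapes still match,
            # approximating the knowledge transfer of Deep Incremental Boosting.
            for src, dst in zip(prev.layers, net.layers):
                if [v.shape for v in src.get_weights()] == [v.shape for v in dst.get_weights()]:
                    dst.set_weights(src.get_weights())
        net.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        net.fit(x_train, y_train, sample_weight=w * n, epochs=epochs, verbose=0)
        pred = net.predict(x_train, verbose=0).argmax(axis=1)
        miss = (pred != y_train).astype(float)
        err = float(np.clip(np.sum(w * miss), 1e-10, 1.0 - 1e-10))
        alpha = np.log((1.0 - err) / err) + np.log(k - 1.0)  # SAMME member weight
        w *= np.exp(alpha * miss)
        w /= w.sum()
        members.append(net)
        alphas.append(alpha)
        prev = net
    return members, alphas


def ensemble_predict(members, alphas, x):
    """Alpha-weighted soft vote over the members' softmax outputs."""
    votes = sum(a * m.predict(x, verbose=0) for a, m in zip(alphas, members))
    return votes.argmax(axis=1)
```

Growing the member network rather than retraining a fixed architecture from scratch is what lets each boosting round start from an already useful set of weights, which is where the reported savings in training time come from.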

Notes

  1. In a few cases BRN is actually faster than DIB, but we believe this to be noise caused by external factors such as system load and the affinity of some of the resulting computational graphs relative to others.


Author information

Corresponding author

Correspondence to Alan Mosca.

Ethics declarations

Conflict of interest

The authors have received a hardware grant from NVIDIA for this research.

Additional information

The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPUs used for this research.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mosca, A., Magoulas, G.D. Customised ensemble methodologies for deep learning: Boosted Residual Networks and related approaches. Neural Comput & Applic 31, 1713–1731 (2019). https://doi.org/10.1007/s00521-018-3922-2
