Abstract
This paper introduces Boosted Residual Networks, a family of new customised ensemble methodologies that build a boosted ensemble of residual networks by growing the member network at each round of boosting. The proposed approach combines recent developments in residual networks, a method for creating very deep networks by including a shortcut layer between different groups of layers, with Deep Incremental Boosting, a methodology for training fast ensembles of networks of increasing depth through boosting. We also explore a simpler variant of Boosted Residual Networks based on bagging, called Bagged Residual Networks, and analyse how recent developments in ensemble distillation can further improve our results. We demonstrate that the synergy of residual networks and Deep Incremental Boosting has more potential than simply boosting a residual network of fixed structure, or than the equivalent Deep Incremental Boosting without the shortcut layers, as it permits the creation of models with better generalisation in significantly less time.
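As a rough illustration of the procedure summarised above, the following is a minimal sketch in PyTorch (the paper does not prescribe this framework, and names such as `GrowableResNet`, `add_block` and `train_member` are hypothetical). Each boosting round copies the previous ensemble member, grows it by one residual block, retrains it on the boosting-weighted data, and re-weights the examples SAMME-style; the injection point and initialisation of the new block are simplified relative to the paper.

```python
# Hedged sketch of a Boosted-Residual-Networks-style training loop.
# All class and function names are illustrative, not taken from the paper's code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """conv -> BN -> ReLU -> conv -> BN, added to an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(x + out)

class GrowableResNet(nn.Module):
    """A small residual network whose block stack can be deepened between rounds."""
    def __init__(self, channels=16, num_classes=10, num_blocks=1):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.blocks = nn.ModuleList([ResidualBlock(channels) for _ in range(num_blocks)])
        self.head = nn.Linear(channels, num_classes)

    def add_block(self, channels=16):
        # Grow the member network by one residual block before the next round.
        self.blocks.append(ResidualBlock(channels))

    def forward(self, x):
        x = self.stem(x)
        for block in self.blocks:
            x = block(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)

def train_member(model, x, y, sample_weights, epochs=1, lr=1e-3):
    # Weighted cross-entropy so the boosting distribution over examples is respected.
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = (F.cross_entropy(model(x), y, reduction="none") * sample_weights).sum()
        loss.backward()
        opt.step()
    return model

def boosted_residual_networks(x, y, num_classes=10, rounds=3):
    n = x.shape[0]
    w = torch.full((n,), 1.0 / n)                 # boosting distribution over examples
    members, alphas = [], []
    model = GrowableResNet(num_classes=num_classes)
    for t in range(rounds):
        if t > 0:
            model = copy.deepcopy(members[-1])    # warm-start from the previous member...
            model.add_block()                     # ...and grow it by one residual block
        model = train_member(model, x, y, w)
        model.eval()
        with torch.no_grad():
            pred = model(x).argmax(1)
        err = float((w * (pred != y).float()).sum() / w.sum())
        err = min(max(err, 1e-10), 1 - 1e-10)
        # SAMME member weight and example re-weighting.
        alpha = torch.log(torch.tensor((1 - err) / err)) + torch.log(torch.tensor(num_classes - 1.0))
        w = w * torch.exp(alpha * (pred != y).float())
        w = w / w.sum()
        members.append(model)
        alphas.append(alpha)
    return members, alphas

if __name__ == "__main__":
    # Tiny synthetic smoke test (random 1-channel 8x8 "images").
    x = torch.randn(32, 1, 8, 8)
    y = torch.randint(0, 10, (32,))
    members, alphas = boosted_residual_networks(x, y, rounds=2)
    print(len(members), [round(float(a), 3) for a in alphas])
```

A Bagged Residual Networks variant, in this sketch, would replace the example re-weighting with bootstrap resampling of the training set and combine the members with uniform votes.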
Notes
In a few cases BRN is actually faster than DIB, but we believe this to be noise caused by external factors, such as system load and the hardware's affinity for some of the resulting computational graphs over others.
Ethics declarations
Conflict of interest
The authors have received a hardware grant from NVIDIA for this research.
Additional information
The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPUs used for this research.
About this article
Cite this article
Mosca, A., Magoulas, G.D. Customised ensemble methodologies for deep learning: Boosted Residual Networks and related approaches. Neural Comput & Applic 31, 1713–1731 (2019). https://doi.org/10.1007/s00521-018-3922-2