The MBPEP: a deep ensemble pruning algorithm providing high quality uncertainty prediction

  • Ruihan Hu
  • Qijun Huang (corresponding author)
  • Sheng Chang
  • Hao Wang
  • Jin He


Machine learning algorithms have been effectively applied to various real-world tasks. However, it is difficult to provide high-quality machine learning solutions that accommodate an unknown distribution of input data; this difficulty is called the uncertainty prediction problem. In this paper, a margin-based Pareto deep ensemble pruning (MBPEP) model is proposed. Using deep ensemble networks, it achieves high-quality uncertainty estimation with a small mean prediction interval width (MPIW) and a high prediction interval coverage probability (PICP). In addition to these networks, unique loss functions are proposed that make the sub-learners trainable by standard gradient descent. Furthermore, a margin-criterion fine-tuning-based Pareto pruning method is introduced to optimize the ensembles. Several experiments, including uncertainty prediction for both classification and regression, are conducted to analyze the performance of MBPEP. The experimental results show that MBPEP achieves a small interval width and a low learning error with an optimal number of ensemble members. For real-world problems, MBPEP performs well on input datasets with unknown distributions and improves learning performance on multi-task problems compared to each single model.
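The two interval-quality metrics named in the abstract, PICP (the fraction of targets covered by their prediction intervals) and MPIW (the mean interval width), have standard definitions and can be computed directly. A minimal sketch follows; the function names and toy arrays are illustrative, not taken from the paper:

```python
import numpy as np

def picp(y_true, lower, upper):
    """Prediction interval coverage probability: fraction of
    targets that fall inside their [lower, upper] interval."""
    inside = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(inside))

def mpiw(lower, upper):
    """Mean prediction interval width across all intervals."""
    return float(np.mean(upper - lower))

# Toy example: four targets with per-target interval bounds.
y  = np.array([1.0, 2.0, 3.0, 4.0])
lo = np.array([0.5, 1.8, 3.2, 3.0])
hi = np.array([1.5, 2.5, 3.8, 5.0])

print(picp(y, lo, hi))  # 3 of 4 targets are covered -> 0.75
print(mpiw(lo, hi))     # mean width, approximately 1.075
```

A high-quality interval estimator maximizes PICP (close to the nominal confidence level) while minimizing MPIW; the paper's stated goal of "small MPIW with high PICP" is exactly this trade-off.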


Keywords: Uncertainty prediction · Ensemble pruning · Loss function · Margin criterion tuning



This work was supported by the National Natural Science Foundation of China (61874079, 61574102 and 61774113), the Fundamental Research Fund for the Central Universities, Wuhan University (2042018gf0045, 2042017gf0052), the Wuhan Research Program of Application Foundation (2018010401011289), and the Luojia Young Scholars Program. Part of the calculations in this paper were performed on the supercomputing system at the Supercomputing Center of Wuhan University.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Ruihan Hu (1)
  • Qijun Huang (1, corresponding author)
  • Sheng Chang (1)
  • Hao Wang (1)
  • Jin He (1)

  1. Wuhan University, Wuhan, China
