An Ensemble of Optimal Trees for Class Membership Probability Estimation

  • Zardad Khan
  • Asma Gul
  • Osama Mahmoud
  • Miftahuddin Miftahuddin
  • Aris Perperoglou
  • Werner Adler
  • Berthold Lausen
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Machine learning methods can be used to estimate the class membership probability of an observation. We propose an ensemble of trees that are optimal in terms of their predictive performance. The ensemble is formed by selecting the best trees from a large initial set of trees grown by random forest. A proportion of the trees is first selected on the basis of their individual predictive performance on out-of-bag observations. The selected trees are then assessed for their collective performance on an independent training data set: the trees are added one by one, starting with the most predictive tree, and a tree is retained in the final ensemble only if it improves the predictive performance of the trees combined so far. The proposed method is compared with probability estimation trees, random forest and node harvest on a number of benchmark problems, using the Brier score as the performance measure. In addition to reducing the number of trees in the ensemble, our method gives better results in most cases. The results are supported by a simulation study.
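The selection procedure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `RandomForestClassifier` as the tree grower and, for simplicity, ranks the individual trees on a held-out validation split rather than on out-of-bag observations; the greedy forward pass that keeps a tree only if it lowers the ensemble Brier score follows the idea in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss


def select_optimal_trees(X_train, y_train, X_val, y_val,
                         n_trees=100, keep_frac=0.3):
    """Grow a forest, rank its trees, and greedily keep only those
    that improve the ensemble Brier score (illustrative sketch)."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)

    # Rank individual trees by their Brier score on the validation set
    # (the paper ranks on out-of-bag observations; a validation split
    # is used here as a simplification).
    scores = [brier_score_loss(y_val, t.predict_proba(X_val)[:, 1])
              for t in forest.estimators_]
    ranked = [forest.estimators_[i] for i in np.argsort(scores)]
    candidates = ranked[: max(1, int(keep_frac * n_trees))]

    # Greedy forward selection: starting from the best individual tree,
    # add a candidate only if it lowers the averaged-probability Brier score.
    selected = [candidates[0]]
    best = brier_score_loss(
        y_val, np.mean([t.predict_proba(X_val)[:, 1] for t in selected], axis=0))
    for tree in candidates[1:]:
        trial = selected + [tree]
        score = brier_score_loss(
            y_val, np.mean([t.predict_proba(X_val)[:, 1] for t in trial], axis=0))
        if score < best:
            selected, best = trial, score
    return selected, best
```

Averaging the class-1 probabilities of the retained trees gives the ensemble's membership probability estimate; the greedy pass guarantees the final ensemble scores no worse (on the assessment data) than the single best tree.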



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Zardad Khan (1, 2)
  • Asma Gul (1, 3)
  • Osama Mahmoud (1, 4)
  • Miftahuddin Miftahuddin (1)
  • Aris Perperoglou (1)
  • Werner Adler (5)
  • Berthold Lausen (1)

  1. Department of Mathematical Sciences, University of Essex, Colchester, UK
  2. Department of Statistics, Abdul Wali Khan University, Mardan, Pakistan
  3. Department of Statistics, Shaheed Benazir Bhutto Women University Peshawar, Khyber Pukhtoonkhwa, Pakistan
  4. Department of Applied Statistics, Helwan University, Cairo, Egypt
  5. Department of Biometry and Epidemiology, University of Erlangen-Nuremberg, Erlangen, Germany