Hierarchical Bayesian Classifier Combination

  • Mohammad Ghasemi Hamed
  • Ahmad Akbari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10934)


This paper proposes a Bayesian method for combining the outputs of multiple base classifiers. The focus is on methods that merge the outputs of several, possibly heterogeneous, classifiers with the aim of improving the final accuracy. Our work builds on Dawid and Skene’s work [11] on modelling disagreement among human assessors. We also take advantage of the Bayesian Model Averaging (BMA) framework without requiring the ensemble of base classifiers to correspond in a mutually exclusive and exhaustive way to all the possible data-generating models. This makes our method relevant for combining the outputs of multiple classifiers, each observing and predicting the behavior of an entity by means of diverse aspects of the underlying environment. The proposed method, called Hierarchical Bayesian Classifier Combination (HBCC), is for discrete classifiers and assumes that the individual classifiers are conditionally independent given the true class label. A comparison of HBCC with majority voting on six benchmark classification data sets shows that it generally outperforms majority voting in classification accuracy.
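The conditional-independence assumption stated in the abstract can be illustrated with a small sketch. This is not the authors' HBCC model; it is a simplified, supervised confusion-matrix combiner in the spirit of Dawid and Skene [11], compared against majority voting. All function names, the toy data, and the symmetric pseudo-count `alpha` are assumptions made for illustration only.

```python
from collections import Counter
from math import exp, log


def fit_confusions(true_labels, predictions, n_classes, alpha=1.0):
    """Estimate one confusion matrix per classifier from labelled items.

    predictions[k][i] is the label classifier k assigned to item i.
    alpha is a symmetric pseudo-count (an assumption of this sketch).
    Returns conf[k][t][c], an estimate of P(classifier k outputs c | true class t).
    """
    confusions = []
    for preds in predictions:
        counts = [[alpha] * n_classes for _ in range(n_classes)]
        for t, c in zip(true_labels, preds):
            counts[t][c] += 1.0
        confusions.append([[v / sum(row) for v in row] for row in counts])
    return confusions


def class_prior(true_labels, n_classes, alpha=1.0):
    """Smoothed estimate of the class prior P(t)."""
    counts = [alpha] * n_classes
    for t in true_labels:
        counts[t] += 1.0
    total = sum(counts)
    return [v / total for v in counts]


def combine(outputs, confusions, prior):
    """Posterior over the true class given one output per classifier.

    Under conditional independence given the true class:
    P(t | c_1..c_K) is proportional to P(t) * prod_k P(c_k | t).
    """
    log_post = [log(p) for p in prior]
    for conf, c in zip(confusions, outputs):
        for t in range(len(prior)):
            log_post[t] += log(conf[t][c])
    m = max(log_post)  # subtract the maximum for numerical stability
    weights = [exp(v - m) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]


def majority_vote(outputs):
    """Most frequent label among the classifier outputs."""
    return Counter(outputs).most_common(1)[0][0]


# Toy data (hypothetical): classifier 0 is always right,
# classifiers 1 and 2 always answer 0 regardless of the truth.
truth = [0, 0, 1, 1, 0, 1]
preds = [truth[:], [0] * 6, [0] * 6]

confusions = fit_confusions(truth, preds, n_classes=2)
prior = class_prior(truth, n_classes=2)

new_outputs = [1, 0, 0]
vote = majority_vote(new_outputs)
posterior = combine(new_outputs, confusions, prior)
combined = max(range(2), key=lambda t: posterior[t])
print(vote, combined, posterior)
```

In this toy setting majority voting follows the two uninformative classifiers, whereas the confusion-matrix combiner learns that their outputs carry no evidence and follows the reliable one. The full hierarchical model described in the paper is richer than this sketch.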


  1. Ahdesmäki, M., Strimmer, K., et al.: Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. Ann. Appl. Stat. 4(1), 503–519 (2010)
  2. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. 36(1–2), 105–139 (1999)
  3. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory, vol. 405. Wiley, Hoboken (2009)
  4. Bishop, C.M., et al.: Pattern Recognition and Machine Learning, vol. 4. Springer, New York (2006)
  5. Bolstad, W.M.: Introduction to Bayesian Statistics. Wiley, Hoboken (2013)
  6. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
  7. Breiman, L.: Stacked regressions. Mach. Learn. 24(1), 49–64 (1996)
  8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
  9. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont (1984)
  10. Clarke, B.: Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res. 4, 683–712 (2003)
  11. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 20–28 (1979)
  12. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
  13. Domingos, P.: Bayesian averaging of classifiers and the overfitting problem. In: ICML, pp. 223–230 (2000)
  14. Džeroski, S., Ženko, B.: Is combining classifiers with stacking better than selecting the best one? Mach. Learn. 54(3), 255–273 (2004)
  15. Frank, A., Asuncion, A.: UCI machine learning repository (2010)
  16. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. ICML 96, 148–156 (1996)
  17. Haitovsky, Y., Smith, A., Liu, Y.: Modelling disagreements among and within raters’ assessments from the Bayesian point of view. In: Draft. Venue: Presented at the Valencia Meeting (2002)
  18. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
  19. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)
  20. Kim, H.-C., Ghahramani, Z.: Bayesian classifier combination. In: International Conference on Artificial Intelligence and Statistics, pp. 619–627 (2012)
  21. Lacoste, A., Marchand, M., Laviolette, F., Larochelle, H.: Agnostic Bayesian learning of ensembles. In: Proceedings of the 31st International Conference on Machine Learning, pp. 611–619 (2014)
  22. Maclin, R., Opitz, D.: An empirical evaluation of bagging and boosting. AAAI/IAAI 1997, 546–551 (1997)
  23. Minka, T.P.: Bayesian model averaging is not model combination, pp. 1–2 (2000)
  24. Monteith, K., Carroll, J.L., Seppi, K., Martinez, T.: Turning Bayesian model averaging into Bayesian model combination. In: The 2011 International Joint Conference on Neural Networks (IJCNN), pp. 2657–2663. IEEE (2011)
  25. Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)
  26. Quinlan, J.R.: Bagging, boosting, and C4.5. In: AAAI/IAAI, vol. 1, pp. 725–730 (1996)
  27. Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. J. Mach. Learn. Res. 11, 1297–1322 (2010)
  28. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)
  29. Schölkopf, B., Smola, A.: Support vector machines. In: Encyclopedia of Biostatistics (1998)
  30. Simpson, E., Roberts, S., Psorakis, I., Smith, A.: Dynamic Bayesian combination of multiple imperfect classifiers. In: Guy, T., Karny, M., Wolpert, D. (eds.) Decision Making and Imperfection, vol. 474, pp. 1–35. Springer, Heidelberg (2013)
  31. Ting, K.M., Witten, I.H.: Stacking bagged and dagged models. In: ICML, pp. 367–375. Citeseer (1997)
  32. Todorovski, L., Džeroski, S.: Combining classifiers with meta decision trees. Mach. Learn. 50(3), 223–249 (2003)
  33. Tulyakov, S., Jaeger, S., Govindaraju, V., Doermann, D.: Review of classifier combination methods. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition, vol. 90, pp. 361–386. Springer, Heidelberg (2008)
  34. Wolpert, D.H.: Stacked generalization. Neural Netw. 5(2), 241–259 (1992)
  35. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)
  36. Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23(8), 1177–1193 (2012)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Audio & Speech Processing Lab, Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
