Robust Naive Bayes Combination of Multiple Classifications

  • Naonori Ueda
  • Yusuke Tanaka
  • Akinori Fujino
Conference paper
Part of the Mathematics for Industry book series (MFI, volume 1)


When we face a new, complex classification task, it is often difficult to design a good feature set for the observed raw data, and the resulting classifier is often unsatisfactorily biased: owing to its poor feature set, it can correctly classify samples from only certain classes. To tackle this problem, we propose a robust naive Bayes combination scheme that effectively combines the predictions obtained from different classifiers and/or different feature sets. Since the scheme assumes only that the multiple classifier predictions are given, any type of classifier and any feature set can be used. In our scheme, each prediction is regarded as an independent realization of a categorical random variable (i.e., a class label), and a naive Bayes model is trained on a set of these predictions within a supervised learning framework. The key feature of the scheme is a class-specific variable selection mechanism that avoids overfitting to poor classifier predictions. We demonstrate the practical benefit of this simple combination scheme on both synthetic and real data sets and show that it can achieve much higher classification accuracy than conventional ensemble classifiers.
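The core idea in the abstract can be sketched as follows. Each base classifier's output is treated as a categorical feature, and a naive Bayes model is fitted on those outputs against the true labels. This is a minimal illustration under our own assumptions, not the authors' implementation. The function names `fit_nb_combiner` and `predict_nb_combiner`, the Laplace smoothing pseudo-count `alpha`, and the toy data are all hypothetical. The class-specific variable selection, which the abstract names as the scheme's key feature, is not reproduced here because its details are not given.

```python
import math
from collections import Counter


def fit_nb_combiner(predictions, labels, n_classes, alpha=1.0):
    """Fit a naive Bayes model on base-classifier outputs (hypothetical sketch).

    predictions[i][j] is the label (0..n_classes-1) that base classifier j
    predicted for sample i; labels[i] is the true class of sample i.
    alpha is an assumed Laplace (symmetric Dirichlet) pseudo-count.
    """
    n_clf = len(predictions[0])
    n = len(labels)
    class_counts = Counter(labels)
    # Smoothed class prior: log p(t = c).
    log_prior = [
        math.log((class_counts[c] + alpha) / (n + alpha * n_classes))
        for c in range(n_classes)
    ]
    # counts[j][c][k]: times classifier j predicted k when the true class was c.
    counts = [[[0] * n_classes for _ in range(n_classes)] for _ in range(n_clf)]
    for preds, t in zip(predictions, labels):
        for j, k in enumerate(preds):
            counts[j][t][k] += 1
    # Smoothed class-conditional categorical distributions: log p(y_j = k | t = c).
    log_cond = [
        [
            [
                math.log((counts[j][c][k] + alpha) / (class_counts[c] + alpha * n_classes))
                for k in range(n_classes)
            ]
            for c in range(n_classes)
        ]
        for j in range(n_clf)
    ]
    return log_prior, log_cond


def predict_nb_combiner(model, preds):
    """Return the class with the highest naive Bayes posterior score."""
    log_prior, log_cond = model
    # Base-classifier outputs are treated as conditionally independent given the class.
    scores = [
        log_prior[c] + sum(log_cond[j][c][k] for j, k in enumerate(preds))
        for c in range(len(log_prior))
    ]
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy data (hypothetical): 8 samples, 2 classes, 3 base classifiers.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
clf_a = [0, 0, 0, 0, 1, 1, 1, 1]  # accurate
clf_b = [0, 0, 0, 0, 0, 0, 0, 0]  # biased: always predicts class 0
clf_c = [1, 1, 1, 1, 0, 0, 0, 0]  # systematically swaps the two classes
predictions = [list(p) for p in zip(clf_a, clf_b, clf_c)]

model = fit_nb_combiner(predictions, labels, n_classes=2)
print(predict_nb_combiner(model, [1, 0, 0]))  # 1 (a majority vote would give 0)
```

In this toy run, the combiner learns that the class-swapping classifier is still informative, because its outputs are simply inverted. It also learns that the always-0 classifier carries no information, since its conditional distribution is the same for both classes. A plain majority vote over `[1, 0, 0]` would return class 0 instead.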


Keywords: Classification; Naive Bayes model; Model combination; Meta-learning; Bayesian learning; Ensemble learning; Real nursing activity recognition



This research was supported by the FIRST program. The authors would like to thank the staff of Saiseikai Kumamoto Hospital, Japan, for their cooperation in the experiments.



Copyright information

© Springer Japan 2014

Authors and Affiliations

  1. NTT Communication Science Laboratories, Sorakugun, Japan
  2. NTT Service Evolution Laboratories, Yokosuka-shi, Japan
