Ensemble of Subset of k-Nearest Neighbours Models for Class Membership Probability Estimation

  • Asma Gul (Email author)
  • Zardad Khan
  • Aris Perperoglou
  • Osama Mahmoud
  • Miftahuddin Miftahuddin
  • Werner Adler
  • Berthold Lausen
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Combining multiple classifiers can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data. The technique can also be used to estimate class membership probabilities. We propose an ensemble of k-Nearest Neighbours (kNN) classifiers for class membership probability estimation in the presence of non-informative features. The ensemble is built in two steps. First, we select classifiers based on their individual performance from a set of base kNN models, each fitted on a bootstrap sample using a random feature subset drawn from the feature space of the training data. Second, stepwise selection is applied to the chosen learners, and a model is added to the ensemble only if it improves the ensemble's predictive performance. We evaluate the method on benchmark data sets with added non-informative features. Experimental comparison of the proposed method with standard kNN, bagged kNN, random kNN and random forest shows that it achieves high predictive performance, in terms of minimum Brier score, on most of the data sets. The results are further supported by simulation studies.
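The two-step construction described above can be summarised in a short sketch. The following is a minimal illustration in Python, assuming binary 0/1 labels, scikit-learn's KNeighborsClassifier as the base learner, and its brier_score_loss for evaluation; function names such as build_base_models and select_ensemble, and parameters such as n_models, subset_size and n_best, are illustrative choices, not taken from the paper.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

def build_base_models(X_train, y_train, n_models=500, subset_size=5, k=5):
    # Step 1a: fit each base kNN model on a bootstrap sample drawn with a
    # random feature subset from the training feature space.
    n, p = X_train.shape
    models = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                      # bootstrap rows
        cols = rng.choice(p, size=min(subset_size, p), replace=False)
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train[rows][:, cols], y_train[rows])
        models.append((clf, cols))
    return models

def select_ensemble(models, X_val, y_val, n_best=50):
    # Step 1b: rank models by their individual Brier score on validation data.
    scored = sorted(
        models,
        key=lambda m: brier_score_loss(y_val, m[0].predict_proba(X_val[:, m[1]])[:, 1]),
    )[:n_best]
    # Step 2: stepwise selection -- keep a model only if averaging its
    # predicted probabilities into the ensemble lowers the ensemble Brier score.
    ensemble, avg_probs, best = [], np.zeros(len(y_val)), np.inf
    for clf, cols in scored:
        p1 = clf.predict_proba(X_val[:, cols])[:, 1]
        trial = (avg_probs * len(ensemble) + p1) / (len(ensemble) + 1)
        score = brier_score_loss(y_val, trial)
        if score < best:
            ensemble.append((clf, cols))
            avg_probs, best = trial, score
    return ensemble

In use, one would hold out a validation set from the training data for the two selection steps, then average predict_proba over the selected models to obtain class membership probability estimates for new observations.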

Keywords

Random Forest · Predictive Performance · Brier Score · Simulation · Class Membership Probability


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Asma Gul (1, 2) (Email author)
  • Zardad Khan (1)
  • Aris Perperoglou (1)
  • Osama Mahmoud (1, 3)
  • Miftahuddin Miftahuddin (1)
  • Werner Adler (4)
  • Berthold Lausen (1)

  1. Department of Mathematical Sciences, University of Essex, Colchester, UK
  2. Department of Statistics, Shaheed Benazir Bhutto Women University Peshawar, Khyber Pukhtoonkhwa, Pakistan
  3. Department of Applied Statistics, Helwan University, Cairo, Egypt
  4. Department of Biometry and Epidemiology, University of Erlangen-Nuremberg, Erlangen, Germany
