
Ensemble of Subset of k-Nearest Neighbours Models for Class Membership Probability Estimation

  • Conference paper
Analysis of Large and Complex Data

Abstract

Combining multiple classifiers can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data. The same technique can also be used to estimate class membership probabilities. We propose an ensemble of k-Nearest Neighbours (kNN) classifiers for class membership probability estimation in the presence of non-informative features. The ensemble is built in two steps. First, from a set of base kNN models, each generated on a bootstrap sample using a random subset of the features of the training data, we select classifiers based on their individual performance. Second, stepwise selection is applied to the selected learners: a model is added to the ensemble only if it improves the ensemble's predictive performance. We evaluate the method on benchmark data sets with added non-informative features. Experimental comparison with standard kNN, bagged kNN, random kNN, and random forest shows that the proposed method achieves high predictive performance, in terms of minimum Brier score, on most of the data sets. The results are also verified by simulation studies.
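The two-step procedure described above translates directly into code. Below is a minimal sketch, assuming a binary 0/1 outcome, a held-out validation set for scoring, and scikit-learn's KNeighborsClassifier; the function name build_esknn and all tuning values (n_models, n_keep, n_feat, k) are illustrative assumptions, not the authors' settings. Model quality is measured by the Brier score, the mean squared difference between predicted probabilities and observed 0/1 outcomes, where smaller is better.

```python
# Sketch of the two-step ensemble of subsets of kNN models described
# in the abstract. Illustrative only: parameter defaults and the
# function name are assumptions, not the authors' settings.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import brier_score_loss

def build_esknn(X_train, y_train, X_val, y_val,
                n_models=100, n_keep=30, n_feat=5, k=5, seed=None):
    rng = np.random.default_rng(seed)
    n, p = X_train.shape

    # Grow base kNN models, each fitted on a bootstrap sample drawn
    # with a random feature subset.
    candidates = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)               # bootstrap indices
        feats = rng.choice(p, size=n_feat, replace=False)
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X_train[np.ix_(rows, feats)], y_train[rows])
        # Individual performance: Brier score on the validation set,
        # BS = mean((p_hat - y)^2). Assumes labels are 0/1, so column 1
        # of predict_proba is P(y = 1).
        p_hat = model.predict_proba(X_val[:, feats])[:, 1]
        candidates.append((brier_score_loss(y_val, p_hat), model, feats))

    # Step 1: keep the n_keep individually best models.
    candidates.sort(key=lambda t: t[0])
    candidates = candidates[:n_keep]

    # Step 2: stepwise selection -- accept a candidate only if it
    # lowers the Brier score of the averaged ensemble probabilities.
    ensemble, member_probs, best = [], [], np.inf
    for _, model, feats in candidates:
        p_hat = model.predict_proba(X_val[:, feats])[:, 1]
        trial = np.mean(member_probs + [p_hat], axis=0)
        score = brier_score_loss(y_val, trial)
        if score < best:
            best = score
            member_probs.append(p_hat)
            ensemble.append((model, feats))
    return ensemble
```

Averaging the member probabilities, rather than majority voting, is what makes the ensemble a probability estimator; the stepwise loop admits a candidate only when it reduces the validation Brier score of the averaged prediction.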



Author information


Corresponding author

Correspondence to Asma Gul.



Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gul, A. et al. (2016). Ensemble of Subset of k-Nearest Neighbours Models for Class Membership Probability Estimation. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_35

