Analysis of Large and Complex Data, pp. 411–421
Ensemble of Subset of k-Nearest Neighbours Models for Class Membership Probability Estimation
Abstract
Combining multiple classifiers can substantially improve the predictive performance of learning algorithms, especially in the presence of non-informative features in the data, and the same technique can be used to estimate class membership probabilities. We propose an ensemble of k-Nearest Neighbours (kNN) classifiers for class membership probability estimation in the presence of non-informative features. The ensemble is built in two steps. First, from a set of base kNN models, each fitted on a bootstrap sample with a random subset of the features of the training data, we select the classifiers with the best individual performance. Second, a stepwise selection is applied to these learners, and a model is added to the ensemble only if it improves the ensemble's predictive performance. We evaluate the method on benchmark data sets augmented with non-informative features. Experimental comparison with plain kNN, bagged kNN, random kNN and random forest shows that the proposed method yields the lowest Brier score on most of the data sets. The results are further supported by simulation studies.
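To make the two-step construction concrete, the sketch below builds such an ensemble for a binary outcome in Python with scikit-learn. It is a minimal sketch under stated assumptions, not the authors' implementation: the pool size `n_models`, the number of pre-selected learners `n_keep`, the √p feature-subset size, and the use of a held-out validation set for both selection steps are all illustrative choices; the abstract fixes only the overall two-step scheme.

```python
# A minimal sketch of the two-step ensemble of subsets of kNN models,
# assuming a binary 0/1 outcome and a held-out validation set for both
# selection steps. n_models, n_keep and the sqrt(p) feature-subset size
# are illustrative assumptions, not details fixed by the paper.
import numpy as np
from sklearn.metrics import brier_score_loss
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def build_esknn(X_tr, y_tr, X_val, y_val, n_models=500, n_keep=50, k=5):
    n, p = X_tr.shape
    m = max(1, int(np.sqrt(p)))  # size of each random feature subset (assumed)

    # Grow a pool of base kNN learners, each fitted on a bootstrap sample
    # of the training rows and a random subset of the features.
    pool = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)             # bootstrap sample
        feats = rng.choice(p, size=m, replace=False)  # random feature subset
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_tr[np.ix_(rows, feats)], y_tr[rows])
        prob = clf.predict_proba(X_val[:, feats])[:, 1]
        pool.append((brier_score_loss(y_val, prob), prob, clf, feats))

    # Step 1: keep the n_keep individually best models (lowest Brier score).
    pool.sort(key=lambda t: t[0])
    pool = pool[:n_keep]

    # Step 2: stepwise selection -- a model joins the ensemble only if the
    # averaged probabilities lower the ensemble's validation Brier score.
    ensemble, kept, best = [], [], np.inf
    for _, prob, clf, feats in pool:
        cand = kept + [prob]
        s = brier_score_loss(y_val, np.mean(cand, axis=0))
        if s < best:
            best, kept = s, cand
            ensemble.append((clf, feats))
    return ensemble

def ensemble_proba(ensemble, X_new):
    # Class membership probability: average of the members' estimates.
    return np.mean([clf.predict_proba(X_new[:, feats])[:, 1]
                    for clf, feats in ensemble], axis=0)
```

Scanning the pool in order of individual Brier score means the stepwise pass starts from the strongest learner and only admits models whose probability estimates complement those already kept, which is how the selection guards against the non-informative features the base models may have sampled.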
Keywords
Random forest · Predictive performance · Brier score · Simulation · Class membership probability
References
- Bay, S. (1998). Combining nearest neighbor classifiers through multiple feature subsets. In Proceedings of the Fifteenth International Conference on Machine Learning (Vol. 3, pp. 37–45).
- Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
- Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3.
- Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13, 21–27.
- Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359–378.
- Hothorn, T., & Lausen, B. (2003). Double-bagging: Combining classifiers by bootstrap aggregation. Pattern Recognition, 36(9), 1303–1309.
- Khan, Z., Perperoglou, A., Gul, A., Mahmoud, O., Adler, W., Miftahuddin, M., & Lausen, B. (2015). An ensemble of optimal trees for class membership probability estimation. In Proceedings of the European Conference on Data Analysis.
- Kruppa, J., Liu, Y., Biau, G., Kohler, M., Konig, I. R., Malley, J. D., et al. (2014a). Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory. Biometrical Journal, 56, 534–563.
- Kruppa, J., Liu, Y., Diener, H. C., Weimar, C., Konig, I. R., & Ziegler, A. (2014b). Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications. Biometrical Journal, 56, 564–583.
- Kruppa, J., Ziegler, A., & Konig, I. R. (2012). Risk estimation and risk prediction using machine-learning methods. Human Genetics, 131, 1639–1654.
- Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. New York: Wiley.
- Lee, B. K., Lessler, J., & Stuart, E. A. (2010). Improving propensity score weighting using machine learning. Statistics in Medicine, 29, 337–346.
- Li, S., Harner, E. J., & Adjeroh, D. (2011). Random KNN feature selection: A fast and stable alternative to random forests. BMC Bioinformatics, 12(1), 450.
- Mahmoud, O., Harrison, A., Perperoglou, A., Gul, A., Khan, Z., Metodiev, M. V., et al. (2014a). A feature selection method for classification within functional genomics experiments based on the proportional overlapping score. BMC Bioinformatics, 15, 274.
- Mahmoud, O., Harrison, A., Perperoglou, A., Gul, A., Khan, Z., & Lausen, B. (2014b). propOverlap: Feature (gene) selection based on the proportional overlapping scores. R package version 1.0. http://CRAN.R-project.org/package=propOverlap
- Malley, J., Kruppa, J., Dasgupta, A., Malley, K., & Ziegler, A. (2012). Probability machines: Consistent probability estimation using nonparametric learning machines. Methods of Information in Medicine, 51, 74–81.
- Mease, D., Wyner, A. J., & Buja, A. (2007). Boosted classification trees and class probability/quantile estimation. The Journal of Machine Learning Research, 8, 409–439.
- Melville, P., Shah, N., Mihalkova, L., & Mooney, R. (2004). Experiments on ensembles with missing and noisy data. Multiple Classifier Systems, 53, 293–302.
- Nettleton, D. F., Orriols-Puig, A., & Fornells, A. (2010). A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4), 275–306.
- Samworth, R. J. (2012). Optimal weighted nearest neighbour classifiers. The Annals of Statistics, 40(5), 2733–2763.