Ensemble of a subset of kNN classifiers

Gul, Asma; Perperoglou, Aris; Khan, Zardad; Mahmoud, Osama; Miftahuddin, Miftahuddin; Adler, Werner; Lausen, Berthold

doi:10.1007/s11634-015-0227-5

Regular Article
Open access
Published: 22 January 2016

Volume 12, pages 827–840, (2018)

Advances in Data Analysis and Classification

Asma Gul¹,²,
Aris Perperoglou¹,
Zardad Khan¹,³,
Osama Mahmoud¹,
Miftahuddin Miftahuddin¹,
Werner Adler⁴ &
…
Berthold Lausen¹

Abstract

Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.

1 Introduction

In supervised classification tasks, the aim is to construct a predictor that assigns a class label to new observations. To do so, training data are used in which a class label is associated with each pattern, and each observation is described by a feature vector. However, in many real life classification problems one encounters imprecise data, including non-informative features, which dramatically increase the classification error of the algorithms (Nettleton et al. 2010).

To overcome this problem, feature selection methods are usually recommended before classification to mitigate the effect of such non-informative features (Liu et al. 2014; Mahmoud et al. 2014). These methods search the original features for the most discriminative feature subset, one that increases the classification performance of a classifier. However, different feature selection methods will result in different feature subsets for the same data set, so the assessed relevance of a feature varies between methods. This encourages combining the results of several best feature subsets.

Combining multiple classifiers, known as ensemble techniques, has emerged as a promising way to improve the classification performance of weak learners and has gained a lot of interest in the last two decades (Barandela et al. 2013; Bauer and Kohavi 1999; Maclin and Opitz 2011; Melville et al. 2004). These techniques lead to a substantial reduction in classification error in many real life applications and, in general, are more resilient to non-informative features in the data than an individual model (Khoshgoftaar et al. 2011; Melville et al. 2004). One of the simplest ensemble techniques is bootstrap aggregation (bagging), which combines the outputs of classifiers constructed on randomly generated bootstrap training sets (Breiman 1996a). In bagging, B bootstrap samples are randomly drawn from the learning set, and a base learner is developed on each of these samples. A new observation is then classified by majority voting of these individual classifiers. Bagging has been used with numerous variations in the literature (Bauer and Kohavi 1999; Hothorn and Lausen 2003a, b). It has been demonstrated that bagging can improve the prediction accuracy of weak classifiers, such as decision trees (Breiman 1996a; Hothorn et al. 2004; Hothorn and Lausen 2005).

One of the simplest and oldest methods for classification is the k nearest neighbours (kNN) classifier. It assigns an unknown observation to the majority class among its k nearest neighbours in the training data, as measured by a distance metric (Cover and Hart 1967; Guvenir and Akkus 1997). Despite its simplicity, kNN gives competitive results and in some cases even outperforms other, more complex learning algorithms. However, kNN is affected by non-informative features in the data, which are often present in high dimensional data. Attempts have been made to improve the performance of the nearest neighbours classifier by ensemble techniques. Some related work on ensembles of kNN classifiers can be found in Grabowski (2002), Domeniconi and Yan (2004), Zhou and Yu (2005), Hall and Samworth (2005) and Samworth (2012).

Bay (1998) proposes an ensemble of nearest neighbour classifiers in which each member classifier has access only to a random feature subset, and the outcomes of these multiple nearest neighbour classifiers are combined for the final decision. A similar approach based on random feature subsets, random kNN, which follows the idea of random forest, is proposed for the classification of high dimensional data sets (Li et al. 2011). Li et al. (2011) rank the features according to their importance and obtain a final set of features for the final model.

In this manuscript we suggest an ensemble of subset of kNN classifiers (ESkNN), particularly to deal with the issue of non-informative features in a data set. We apply ESkNN to benchmark and simulated classification problems and compare the results with those of simple kNN, bagged kNN (BkNN), random kNN (RkNN), an ensemble based on the multiple feature subset method (MFS), random forest (RF) and support vector machines (SVM). Experiments are carried out on the data sets with their original feature sets and with some added non-informative features.

2 Ensemble of subset of kNN classifiers

Let ${\mathcal {L}}=\{(\mathbf {x}_i,y_i),\ i=1, \ldots , n\}$ be a training set consisting of n independent observations, where $\mathbf {x}_i= (x_{i1}, x_{i2}, \ldots , x_{id})$ is a d-dimensional feature vector and $y_i \in \{1, \ldots , J\}$ is the corresponding class label, J being the total number of classes. Here we consider the two class problem, thus $y_i \in \{1,2\}$. Based on this available data set ${\mathcal {L}}$, a classifier predicts the class label for a new/test observation with feature vector $\mathbf {x}^\prime $. Divide the training data ${\mathcal {L}}$ into two parts, ${\mathcal {L}_{T}}$ and ${\mathcal {L}_{\textit{V}}}$, the first for construction of the classifiers and the second for validation. For simplicity we denote the set used for construction of the models, ${\mathcal {L}_{T}}$, by ${\mathcal {L} ^ *}$. Let us denote the d input features in ${\mathcal {L} ^ *}$ by ${\mathbf {P}} = (p_1,p_2,p_3, \ldots , p_d)$. For a given subset size l, where $l < d$, a random subset of features ${\mathbf {P}}^{l}$ is drawn from $\mathbf {P}$. Using only the selected features, a bootstrap sample is drawn from ${\mathcal {L} ^ *}$. The new bootstrap learning set ${\mathcal {L}^*}^{(l)}$ consists of l-dimensional feature vectors. This process is repeated until we get m training sets, ${\mathcal {L}^ *}^{(1l)}, \ldots , {\mathcal {L}^ *}^{(ml)}$, each of dimension $n\times (l+1)$ (l features plus the class label). The base kNN classifier is constructed on each of these bootstrap training sets, so that a set of m classifiers is generated.

When a bootstrap sample of the same size n is drawn from the training set, approximately $\frac{1}{3}$ of the observations are left out of that sample. These observations are called out-of-bag (OOB) observations, and can be utilized for estimation of the classification error (Breiman 1996b). In our framework we use the OOB sample for the assessment of each classifier. The m classifiers are then ranked according to their individual classification accuracy on the OOB sample, and the first h of the m classifiers are selected. The selected classifiers are then assessed for their collective contribution as an ensemble on the validation set ${\mathcal {L}_{\textit{V}}}$. This is done by starting from the best of the h classifiers and then adding the rest of the classifiers to the ensemble one by one.
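The "approximately one third" figure can be checked directly. The probability that a given observation is never picked in n draws with replacement is $(1-1/n)^n$, which tends to $e^{-1} \approx 0.368$ as n grows. The short sketch below compares this exact value with a single simulated bootstrap sample; the function names are hypothetical and are not part of the authors' software.

```python
import math
import random


def oob_fraction_exact(n):
    """Probability that a given observation is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def oob_fraction_simulated(n, seed=0):
    """Draw one bootstrap sample of size n and return the share of observations left out."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}  # distinct indices that were sampled
    return 1.0 - len(drawn) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(oob_fraction_exact(n), 4), round(oob_fraction_simulated(n), 4))
    print("limit:", round(math.exp(-1), 4))
```

For moderate n the out-of-bag share is therefore slightly above one third, which leaves a reasonably sized sample for assessing each classifier without touching the validation set.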

The formation of the ensemble of subset of kNN classifiers can be summarized as:

1. Draw a random sample of $l < d$ features, without replacement, from the feature vector $\mathbf {P}$ of ${\mathcal {L}^*}$; denote the selected feature vector by $\mathbf {P}^{l}$.
2. Based on the selected random feature subset $\mathbf {P}^{l}$, draw a bootstrap sample of size n, ${\mathcal {L}^*}^{(l)}$, from ${\mathcal {L}^* }$.
3. Construct the kNN classifier on ${\mathcal {L}^*}^{(l)}$.
4. Calculate the accuracy of the classifier on the OOB sample, using the same feature set as used for its construction.
5. Repeat steps 1 to 4 m times and rank the m classifiers according to their accuracies.
6. Select the first h classifiers with the highest accuracies.
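The six steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the kNN rule uses squared Euclidean distance and a simple majority vote, k is fixed rather than tuned, and the names `knn_predict` and `build_and_rank` are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def build_and_rank(X, y, l, m, h, k=3, seed=0):
    """Build m kNN models on random feature subsets and bootstrap samples, keep the best h."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    models = []
    for _ in range(m):
        feats = rng.sample(range(d), l)                 # step 1: l features, no replacement
        boot = [rng.randrange(n) for _ in range(n)]     # step 2: bootstrap sample of size n
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # observations left out of the sample
        bX = [[X[i][j] for j in feats] for i in boot]   # step 3: data defining the kNN model
        by = [y[i] for i in boot]
        hits = sum(knn_predict(bX, by, [X[i][j] for j in feats], k) == y[i]
                   for i in oob)                        # step 4: OOB accuracy, same features
        acc = hits / len(oob) if oob else 0.0
        models.append({"features": feats, "X": bX, "y": by, "oob_accuracy": acc})
    models.sort(key=lambda mdl: mdl["oob_accuracy"], reverse=True)  # step 5: rank
    return models[:h]                                               # step 6: first h models


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(60)]
    # Toy data (an assumption for illustration): feature 0 is informative, the rest are noise.
    X = [[3 * yi + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(5)] for yi in y]
    for mdl in build_and_rank(X, y, l=2, m=25, h=5):
        print(sorted(mdl["features"]), round(mdl["oob_accuracy"], 3))
```

On toy data of this kind, models whose feature subset contains the informative feature tend to rank first, which is the mechanism by which the first stage filters out subsets made up of non-informative features.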

These selected classifiers are further assessed as follows:

The ensemble is started by combining the second best classifier with the best classifier, and its classification performance is evaluated on the validation set ${\mathcal {L}_{\textit{V}}}$. The ensemble is then grown by adding the third best classifier and the performance is measured again; this process is carried out for all h classifiers. Let ${{\mathcal {BS}}}^{\langle r-1 \rangle }$ be the Brier score of the ensemble of selected best kNN models without the rth model and ${{\mathcal {BS}}}^{\langle r \rangle }$ be the Brier score of the ensemble after including the rth model. Then the rth model is selected if
$$\begin{aligned} {{\mathcal {BS}}}^{\langle r \rangle } < {{\mathcal {BS}}}^{\langle r-1 \rangle }. \end{aligned}$$
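The selection rule above can be sketched as a greedy forward pass over the ranked models. The sketch assumes that each model supplies an estimated probability of class 1 for every validation point, that labels are coded 0/1, and that the ensemble estimate is the plain average of its members' probabilities; the averaging rule and the names `brier` and `forward_select` are assumptions made for illustration.

```python
def brier(probs, labels):
    """Mean squared difference between 0/1 labels and estimated class-1 probabilities."""
    return sum((y - p) ** 2 for p, y in zip(probs, labels)) / len(labels)


def forward_select(model_probs, labels):
    """Greedy second stage.

    model_probs: list of per-model probability lists on the validation set,
                 ordered from the best to the worst model of the first stage.
    Returns the indices of the models kept and the Brier score of the final ensemble.
    """
    selected = [0]                       # the ensemble starts with the best model
    sums = list(model_probs[0])          # running sum of member probabilities
    best = brier(sums, labels)
    for r in range(1, len(model_probs)):
        trial = [s + p for s, p in zip(sums, model_probs[r])]
        size = len(selected) + 1
        score = brier([t / size for t in trial], labels)
        if score < best:                 # keep model r only if the Brier score drops
            selected.append(r)
            sums, best = trial, score
    return selected, best


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    ranked = [
        [0.6, 0.6, 0.4, 0.4],
        [0.8, 0.8, 0.2, 0.2],
        [0.1, 0.1, 0.9, 0.9],            # a poor model that should be rejected
    ]
    print(forward_select(ranked, labels))
```

In this toy run the second model lowers the Brier score and is kept, while the third raises it and is left out.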

The ensemble is formed in a two stage procedure that assesses the models using two different performance measures: misclassification rate and Brier score.

In the first stage the classification models are evaluated using the misclassification rate (MR) as the performance measure. A desirable classification model has a lower misclassification rate than the others used for the classification task, and thus the classification models with a low misclassification rate are selected.

In the second stage of the algorithm the selected models are further evaluated using the Brier score as a performance measure. The Brier score measures the difference between the observed state of the outcomes of the test instances and the estimated probabilities that are in turn used to classify new observations using some threshold. Besides the traditional misclassification rate and other metrics, Brier score can also be used to evaluate the predictive performance of a classifier. While using output of the classifier as a basis for decision making, a more detailed evaluation is required; where not only the prediction accuracy of the classifier should be considered but also the quality of the estimate needs ample consideration. That can be done through a score such as the Brier score that, in principle, measures the predictive ability/quality of a classifier in classifying new data (Hernández-Orallo et al. 2012; Steyerberg et al. 2010; Kruppa et al. 2014).

Let the class labels of the test instances from the two classes, “positive” and “negative”, be represented by 0 or 1, i.e. $y \in \{0,1\}$. The Brier score for the predicted probabilities of class 1, $y=1$, is:

$$\begin{aligned} \mathcal {BS}= E\left[ \left( y_i-p(y_i=1)\right) ^2\right] . \end{aligned}$$

An estimator for the above score is:

$$\begin{aligned} \hat{\mathcal {BS}} = \frac{\sum _{i=1}^{n_{t}}\left( y_i-\hat{p}(y_i| \mathbf{{x}})\right) ^2}{n_{t}}, \end{aligned}$$

where $n_t$ is the total number of test points and $y \in \{0,1\}$ is the state of the outcome. A low Brier score indicates better performance of the predictor. Thus the models minimizing the Brier score of the ensemble are selected.

One technical reason for using the Brier score to assess the collective contribution of the models selected individually in the first stage is that this score is better able than the misclassification rate to determine the contribution of a model to be included in the ensemble. To illustrate this, let the probability, estimated by a classifier c1, that a test observation belongs to class 1, given that class 1 is the true class, be:

$$\begin{aligned} \hat{f}_{c1}= 0.56. \end{aligned}$$

Suppose that the cut-off for assigning this observation to class 1 is

$$\begin{aligned} \hat{f}(.) > 0.5, \end{aligned}$$

which implies that the given observation belongs to class 1 and classification error will be 0 (correct classification). The Brier score in this case is 0.1936.

Now consider that the second classifier gives the estimated probability for that observation as 0.68. The combined probability estimate of the two classifiers for the same observation, denoted by $\hat{f}_{{c1,c2}}$, is given as:

$$\begin{aligned} \hat{f}_{{c1,c2}} = 0.62 \ . \end{aligned}$$

Consequently, the Brier score decreases to 0.1444. For the given cut-off, the classification error of the two-classifier ensemble is still 0, the same as that of the single classifier.

A third classifier has an estimated probability of 0.88, the resultant combined probability is:

$$\begin{aligned} \hat{f}_{{c1,c2,c3}}= 0.71 . \end{aligned}$$

Here the Brier score decreases to 0.0841 while the classification error remains the same (0) as the previous ensemble of two classifiers for the given cut-off.

It follows that if classification errors were used to decide on adding classifiers to the ensemble, classifiers c2 and c3 would not become part of the ensemble, as the error remains the same, whereas the Brier score decreases with the addition of classifiers c2 and c3, thus leading to an ensemble of size 3.
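The arithmetic of this illustration can be reproduced with a short script. It assumes, as the numbers in the example suggest, that the combined estimate is the average of the member probabilities rounded to two decimals; with the unrounded three-member average of about 0.7067 the last score would be about 0.0860 instead of 0.0841. The helper names are hypothetical.

```python
def combined_probability(probs, digits=2):
    """Average of the member probabilities, rounded as in the worked example."""
    return round(sum(probs) / len(probs), digits)


def squared_error(p, y=1):
    """Brier score contribution of one observation with true class y."""
    return (y - p) ** 2


member_probs = [0.56, 0.68, 0.88]   # classifiers c1, c2 and c3 from the example
cutoff = 0.5

for size in range(1, len(member_probs) + 1):
    p = combined_probability(member_probs[:size])
    error = int(p <= cutoff)        # 0 means the observation is assigned to class 1
    print(size, p, round(squared_error(p), 4), error)

# Output:
# 1 0.56 0.1936 0
# 2 0.62 0.1444 0
# 3 0.71 0.0841 0
```

The classification error stays at 0 in every row while the Brier score falls, which is why the misclassification rate alone gives no reason to grow the ensemble here.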

The general pseudo code of ESkNN is given in Algorithm 1.

3 Simulation study

In addition to benchmark data sets, we assess ESkNN using two simulation models. The models involve several variations to give an understanding of the behaviour of the classifiers under different situations. The details of the two models are given below.

3.1 Simulation model 1

In this model, the performance of the classifiers is investigated in different setups. Firstly, the predictors of the two classes are generated with correlated and uncorrelated structures, respectively. The variables for class 1 are correlated and generated with a varying variance covariance structure, while the features determining class 2 are independent. A total of 500 independent binary class data sets are generated, each with 20 features. The variables for class 1 are generated from ${\mathcal {N}(2,w\Psi )}$, while those of class 2 are generated from ${\mathcal {N}({1},{1})}$. The values considered for w in class 1 are 3, 5, 10, 15 and 20. The predictive performance of the algorithms is investigated by adding 50, 100, 200 and 500 non-informative features, generated from a normal distribution, to the data. The variance covariance matrix $\Psi $, which is a $d\times {d}$ matrix, is:

$$\begin{aligned} \Psi = \left( \begin{array}{cccc} \sigma _{1,1} & \varrho _{1,2} & \cdots & \varrho _{1,d} \\ \varrho _{2,1} & \sigma _{2,2} & \cdots & \varrho _{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ \varrho _{d,1} & \varrho _{d,2} & \cdots & \sigma _{d,d} \end{array} \right) , \end{aligned}$$

(1)

where $\varrho _{ij}$ are the covariances given by

$$\begin{aligned} \varrho _{ij} =(1/2)^{|{i-j}|}, i,j=1,\ldots ,d. \end{aligned}$$

(2)

and $\sigma _{ii}$, on the diagonal of ${\Psi }$, is the variance, with $\sigma _{ii} = 1$ when w is 1. Changing the value of w results in different degrees of correlation between the variables. The data are generated in such a manner that the variables within class 1 are correlated with each other and exhibit negligible or no correlation with the features from class 2.
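A minimal sketch of how data of this form could be generated is given below. It builds $w\Psi $ from Eq. (2), factorizes it with a hand-written Cholesky decomposition and transforms independent standard normal draws. The number of observations per class is not stated above and is a free parameter here; the mean of class 1 is taken to be 2 in every coordinate, and all function names are hypothetical.

```python
import math
import random


def covariance_matrix(d, w=1.0):
    """w * Psi with Psi[i][j] = (1/2)**|i - j|, so the diagonal of Psi equals 1."""
    return [[w * 0.5 ** abs(i - j) for j in range(d)] for i in range(d)]


def cholesky(a):
    """Lower triangular L with L L^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_class1(n, d, w, rng):
    """n draws from N(2, w * Psi): mean 2 in every coordinate, correlated features."""
    low = cholesky(covariance_matrix(d, w))
    rows = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(d)]
        rows.append([2.0 + sum(low[i][t] * z[t] for t in range(i + 1)) for i in range(d)])
    return rows


def sample_class2(n, d, rng):
    """n draws with independent N(1, 1) features."""
    return [[rng.gauss(1.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    class1 = sample_class1(n=5, d=20, w=3, rng=rng)
    class2 = sample_class2(n=5, d=20, rng=rng)
    print(len(class1), len(class1[0]), len(class2), len(class2[0]))
```

Non-informative features can then be appended to each row as additional independent normal draws that do not depend on the class.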

3.2 Simulation model 2

The second simulation model, model 2, is a four-dimensional model derived from the model proposed in Mease et al. (2007). A set of 500 independent binary class data sets is generated, each consisting of 1000 observations and 4 features. The feature vector $\mathbf {x}$ is a four dimensional random vector uniformly distributed on [0, 100], and the response variable y has two outcomes, 0 or 1. The class is determined by r, the distance of the feature vector $\mathbf {x}$ from the central point. The class probabilities given the features are:

$$\begin{aligned} p({y=1} \mid \mathbf {x})=\left\{ \begin{array}{ll} 1, & \quad \hbox {if } r< 110,\\ \frac{150-r}{140}, & \quad \hbox {if } 110\le r\le 140,\\ 0, & \quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

(3)

The response values are generated from the above distribution using a Bernoulli random number generator. We extend the dimensions of this model by adding 50, 100, 200 and 500 non-informative features generated from a uniform distribution. The data complexity increases with the number of added non-informative features.

4 Simulation results and discussion

The average misclassification rates from model 1 and model 2 are presented in Tables 1, 2 and 3.

Table 1 Misclassification rate of the methods on the data sets with added non-informative features from model 1

The results from model 1 in Table 1 indicate that the classification accuracy of ESkNN is higher than that of all the other methods in most cases, except for the data with the original 20 features, where SVM outperforms all the methods by giving the minimum misclassification rate. The table reveals that, unsurprisingly, kNN shows a high error rate compared to the other methods, and that the performance of the kNN based methods declines with an increasing number of non-informative features in the data, whereas ESkNN still performs better.

Table 2 Misclassification rate of the classifiers on the data sets from model 1 for different values of w, on 70 features ($20+50$ noninformative), listed in column 1

Table 3 Misclassification rate of the methods on the data sets with added non-informative features from model 2

Table 2 shows an increase in the misclassification rate of all the classifiers except random forest. The prediction performance of the kNN based classification methods and SVM decreases with high variance and covariance of the data, i.e., for increasing values of w, whereas random forest gives better classification accuracy in this case. Although the performance of the kNN based methods declines, ESkNN consistently performs better than the other methods, apart from random forest, in such situations.

The results of model 2 in Table 3 reveal that ESkNN consistently outperforms the other methods in the presence of non-informative features in the data. However, in the case of data with the original features only, SVM gives the best result, and in the case of 100 features ESkNN gives better results than the other methods and results comparable to SVM. Bagged kNN provides the same results as usual kNN on the data with 4 features, and a slight accuracy gain over usual kNN is achieved on the data with added non-informative features (Fig. 1).

5 Experiments on bench mark data sets

The performance of the proposed method, in terms of misclassification rate, is evaluated on a total of 23 benchmark data sets. The data sets chosen cover a wide range of domains: microarray gene expression data sets and data sets from life science, finance and physical science. The “Diabetes” and “Sonar” data sets are from the R package “mlbench” (Leisch and Dimitriadou 2010); “dystrophy” and “Glaucoma” are from “ipred” (Peters and Hothorn 2012). All the other data sets are from UCI (Bache and Lichman 2013). A summary of the data sets is given in Table 4.

Table 4 Summary of the data sets

Table 5 Misclassification rate of kNN, RkNN, BkNN, MFS, RF, SVM and ESkNN

5.1 Experimental setup

The performance of ESkNN is evaluated on a total of 23 data sets in two scenarios: on the benchmark data sets with their original features, and after adding non-informative features to the data sets. The performance of ESkNN in terms of misclassification rate is compared with usual kNN, bagged kNN, random kNN, MFS, random forest and SVM. Each data set is divided into training and test sets; 90 % of the data is used for training and 10 % for testing. The same training and test sets are used for all the methods, and the results are averaged over a total of 1000 such splits. All the experiments are carried out using R (R Core Team 2013). For the kNN based methods, the value of k is selected from $k=1,\ldots ,10$ by tenfold cross validation using the R package “e1071” (Meyer et al. 2012). Random forest is tuned using the R function “tune.randomForest” available within the same package. For SVM we used the “kernlab” R package (Karatzoglou et al. 2004), with the automatic selection of sigma available in that package. The other parameters are fixed at their default values. A total of 1001 kNN models are generated on bootstrap samples, and 40 % of them are then selected for the second stage. The number of models generated is taken to be odd to break ties in voting among the classifiers when classifying a test point. The feature subset size is set to one-third of the input features; however, in low dimensions, i.e., for data sets with $d<6$ original features, the feature subset size is taken as 2.

6 Results and discussion

The results on the data sets with their original features and with 500 added randomly generated non-informative features are reported in Tables 5 and 6, respectively. The results in Table 5 show that ESkNN outperforms or gives comparable results to the other methods considered here. It is interesting to note that, on the data sets with their original features, ESkNN outperforms the kNN based methods on most of the data sets and gives comparable results to random forest. ESkNN gives the best overall results on 8 data sets, random forest is better than all the other methods on 9 data sets, SVM gives the minimum classification error on 5 data sets, and RkNN outperforms the rest of the methods on one data set.

In the case of non-informative features in the data (Table 6), ESkNN gives the minimum classification error on 11 data sets, RF gives the best classification performance on 9 data sets, SVM gives the best results on one data set, and on two data sets there is no clear winner between random forest and ESkNN. Overall, ESkNN gives better performance than the kNN based methods and SVM. Here again, ESkNN results in a smaller classification error than the kNN based methods on most of the data sets.

Table 6 Misclassification rate of kNN, RkNN, BkNN, MFS, RF, SVM and ESkNN with added non-informative features to the data sets

7 Conclusion and outlook

Considering the idea of ensemble techniques, we have proposed an ensemble of subset of kNN classifiers (ESkNN) for classification tasks, particularly to deal with the issue of non-informative features in the data sets. Our approach consists of forming an ensemble of the best kNN models, thus implicitly selecting the informative feature subsets and discarding the non-informative ones. ESkNN is assessed for its classification performance on simulated and benchmark data sets. Our results show that ESkNN gives comparable results to RF and outperforms kNN and kNN based ensembles. The results from the simulations in Table 2 reveal that, in the case of high variance in the classes, RF performs better than the other methods. Random projection ensemble classification (Cannings and Samworth 2015) may allow further improvements. Moreover, it would be of interest to investigate whether recent proposals such as predictive hubs (Lausser et al. 2014) and representative prototypes (Müssel et al. 2015) can be exploited to develop ESkNN further. ESkNN is implemented and available as the R package “ESKNN” on CRAN (Gul et al. 2015).

References

Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 3 October 2014
Barandela R, Valdovinos RM, Sánchez JS (2013) New applications of ensembles of classifiers. Pattern Anal Appl 6(3):245–256
Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach Learn 36(1):105–139
Bay S (1998) Combining nearest neighbor classifiers through multiple feature subsets. In: Proceedings of the Fifteenth International Conference on Machine Learning, vol 3. Morgan Kaufmann Publishers Inc., pp 37–45
Breiman L (1996a) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (1996b) Out-of-bag estimation. Tech. rep. http://citeseerx.ist.psu.edu. Accessed 3 October 2014
Cannings T, Samworth R (2015) Random projection ensemble classification. arXiv:1504.04595v1.pdf. Accessed 3 October 2015
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Domeniconi C, Yan B (2004) Nearest neighbor ensemble. In: IEEE Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol 1, pp 228–231
Grabowski S (2002) Voting over multiple k-nn classifiers. In: Proceedings of the International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science IEEE, pp 223–225
Gul A, Perperoglou A, Khan Z, Mahmoud O, Adler W, Miftahuddin M, Lausen B (2015) R package: ESKNN: ensemble of subset of K-nearest neighbours classifiers for classification and class membership probability estimation. http://cran.r-project.org/web/packages/ESKNN/index.html. Accessed 30 Sept 2015
Guvenir HA, Akkus A (1997) Weighted k nearest neighbor classification on feature projections. http://www.cs.bilkent.edu.tr/tech-reports/1997/BU-CEIS-9719.pdf. Accessed 3 October 2014
Hall P, Samworth R (2005) Properties of bagged nearest neighbour classifiers. J R Stat Soc Ser B (Statistical Methodology) 67(3):363–379
Hernández-Orallo J, Flach P, Ferri C (2012) A unified view of performance metrics: Translating threshold choice into expected classification loss. J Mach Learn Res 13(1):2813–2869
Hothorn T, Lausen B (2003a) Bagging tree classifiers for laser scanning images: a data-and simulation-based strategy. Artif Intell Med 27(1):65–79
Hothorn T, Lausen B (2003b) Double-bagging: combining classifiers by bootstrap aggregation. Pattern Recognit 36(6):1303–1309
Hothorn T, Lausen B (2005) Bundling classifiers by bagging trees. Comput Stat Data Anal 49(4):1068–1078
Hothorn T, Lausen B, Benner A, Radespiel-Tröger M (2004) Bagging survival trees. Stat Med 23(1):77–91
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 Package for Kernel Methods in R. J Stat Softw 11(9):1–20
Khoshgoftaar T, Van Hulse J, Napolitano A (2011) Comparing boosting and bagging techniques with noisy and imbalanced data. IEEE Trans Syst Man Cybern Part A Syst Hum 41(3):552–568
Kruppa J, Liu Y, Diener HC, Holste T, Weimar C, König IR, Ziegler A (2014) Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications. Biom J 56(4):564–583
Lausser L, Müssel C, Melkozerov A, Kestler HA (2014) Identifying predictive hubs to condense the training set of k-nearest neighbour classifiers. Comput Stat 29(1):81–95
Leisch F, Dimitriadou E (2010) mlbench: Machine Learning Benchmark Problems. R package version 2.1-1
Li S, Harner EJ, Adjeroh D (2011) Random knn feature selection-a fast and stable alternative to random forests. BMC Bioinform 12(1):450
Liu Z, Zhao X, Zuo MJ, Xu H (2014) Feature selection for fault level diagnosis of planetary gearboxes. Adv Data Anal Classif 8(4):377–401
Maclin R, Opitz D (2011) Popular ensemble methods: an empirical study. J Artif Res 11:169–189
Mahmoud O, Harrison A, Perperoglou A, Gul A, Khan Z, Metodiev MV, Lausen B (2014) A feature selection method for classification within functional genomics experiments based on the proportional overlapping score. BMC Bioinform 15(1):274
Mease D, Wyner AJ, Buja A (2007) Boosted classification trees and class probability/quantile estimation. J Mach Learn Res 8:409–439
Melville P, Shah N, Mihalkova L, Mooney R (2004) Experiments on ensembles with missing and noisy data. In: Roli F, Kittler J, Windeatt T (eds) Lecture Notes in Computer Science: Proceedings of the Fifth International Workshop on Multi Classifier Systems (MCS-2004), Cagliari, Italy. Springer, Heidelberg, pp 293–302
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2012) e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.6-1
Müssel C, Lausser L, Kestler HA (2015) Ensembles of representative prototype sets for classification and data set analysis. In: Lausen B, Krolak-Schwerdt S, Böhmer M (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Springer, Heidelberg, pp 329–339
Nettleton DF, Orriols-Puig A, Fornells A (2010) A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev 33(4):275–306
Peters A, Hothorn T (2012) ipred: Improved Predictors. http://CRAN.R-project.org/package=ipred. R package version 0.9-1
R Core Team (2013) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Samworth RJ (2012) Optimal weighted nearest neighbour classifiers. Ann Stat 40(5):2733–2763
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010) Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology 21(1):128–138
Zhou ZH, Yu Y (2005) Adapt bagging to nearest neighbor classifiers. J Comput Sci Technol 20(1):48–54

Acknowledgments

We thank the referees and editors for valuable suggestions. We acknowledge support from grant number ES/L011859/1, from The Business and Local Government Data Research Centre, funded by the Economic and Social Research Council to provide researchers and analysts with secure data services.

Author information

Authors and Affiliations

Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, UK
Asma Gul, Aris Perperoglou, Zardad Khan, Osama Mahmoud, Miftahuddin Miftahuddin & Berthold Lausen
Department of Statistics, Shaheed Benazir Bhutto Women University, Peshawar, Pakistan
Asma Gul
Department of Statistics, Abdul Wali Khan University, Mardan, Pakistan
Zardad Khan
Institute of Medical Informatics, Biometry and Epidemiology, University of Erlangen-Nuremberg, Erlangen, Germany
Werner Adler

Corresponding author

Correspondence to Berthold Lausen.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Gul, A., Perperoglou, A., Khan, Z. et al. Ensemble of a subset of kNN classifiers. Adv Data Anal Classif 12, 827–840 (2018). https://doi.org/10.1007/s11634-015-0227-5

Received: 22 December 2014
Revised: 12 October 2015
Accepted: 10 December 2015
Published: 22 January 2016
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11634-015-0227-5
