# Dynamic Classifier Selection for Data with Skewed Class Distribution Using Imbalance Ratio and Euclidean Distance


## Abstract

Imbalanced data analysis remains one of the critical challenges in machine learning. This work adapts the concept of *Dynamic Classifier Selection* (dcs) to the pattern classification task with a skewed class distribution. Two methods, which use the similarity (distance) to the reference instances and the class imbalance ratio to select the most confident classifier for a given observation, are proposed. Both approaches come in two modes: one based on the *k*-Nearest Oracles (knora) and the other also considering the cases where a classifier makes a mistake. The proposed methods were evaluated in computer experiments carried out on benchmark datasets with a high imbalance ratio. The obtained results and statistical analysis confirm the usefulness of the proposed solutions.

## Keywords

Classifier ensemble · Dynamic Classifier Selection · Imbalanced data

## 1 Introduction

Traditional machine learning algorithms assume that the numbers of instances belonging to the problem classes are relatively similar. However, in many real problems the size of one class (the *majority class*) may significantly exceed the size of the other (the *minority class*). This biases the algorithms towards the majority class, although the correct recognition of the less common class is often more important. This research trend is known as learning from imbalanced data [8] and is still widely discussed in scientific works.

Approaches to imbalanced data classification are commonly divided into three groups:

- *Data-level methods*, which modify the training set in such a way that it becomes suitable for classic learning algorithms (e.g., *oversampling* and *undersampling*).
- *Algorithm-level methods*, which modify existing classification algorithms to offset their bias towards the majority class.
- *Hybrid methods*, which combine the strengths of the previously mentioned approaches.
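As a minimal illustration of the data-level approach, random oversampling can be sketched with NumPy alone. This is an illustrative example, not the preprocessing implementation used in the experiments:

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate random minority-class samples until both classes are equal in size."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=deficit, replace=True)
    X_res = np.vstack([X, X[extra]])
    y_res = np.concatenate([y, y[extra]])
    return X_res, y_res

# 8 majority vs. 2 minority samples; after oversampling both classes have 8
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y, rng=0)
```

Undersampling is symmetric: instead of duplicating minority samples, a random subset of the majority class is discarded.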

A separate group of methods, to which this work belongs, is based on *Dynamic Ensemble Selection* (des) [5]. Dynamic selection (ds) methods select a single classifier or an ensemble (from an available classifier pool) to predict the decision for each unknown query, based on the assumption that each of the base classifiers is an expert in a different region of the feature space. The classification of each unknown sample by des involves three steps:

1. Definition of the region of competence, i.e., how to define the local region surrounding the unknown sample in which the competence level of the base models is estimated. This local region of competence is found in the dynamic selection dataset (dsel), which is usually a part of the training set.

2. Definition of the selection criterion later used to assess the competence of the base classifiers in the local region of competence (e.g., accuracy or diversity).

3. Determination of the selection mechanism deciding whether a single classifier or an ensemble is chosen.

Previous work related to the imbalanced data classification using classifier ensembles and des involves various approaches. Ksieniewicz in [9] proposed an *Undersampled Majority Class Ensemble* (umce) employing different combination methods and pruning, based on a *k*-fold division of the majority class to divide an imbalanced problem into many balanced ones. Chen et al. [4] presented the *Dynamic Ensemble Selection Decision-making* (desd) algorithm to select the most appropriate classifiers using a weighting mechanism to highlight the base models that are better suited for recognizing the minority class. Zyblewski et al. in [17] proposed the *Minority Driven Ensemble* (mde) for highly imbalanced data streams classification and Roy et al. in [14] combined preprocessing with dynamic ensemble selection to classify both binary and multiclass stationary imbalanced datasets.

The main contributions of this work are:

- The proposition of new dynamic selection methods adapted for the classification of highly imbalanced data.
- Experimental evaluation of the proposed algorithms on a high number of diverse benchmark datasets, together with a detailed comparison with *state-of-the-art* approaches.

## 2 Dynamic Ensemble Selection Based on Imbalance Ratio and Euclidean Distance

This paper proposes two algorithms for dynamic classifier selection for the imbalanced data classification problem. These are respectively the Dynamic Ensemble Selection using Euclidean distance (dese) and the Dynamic Ensemble Selection using Imbalance Ratio and Euclidean distance (desire).

The generation of the classifier pool is based on the *Bagging* approach [2], and more specifically on the *Stratified Bagging*, in which the samples are drawn with replacement from the minority and majority class separately in such a way that each bootstrap maintains the original training set class proportion. This is necessary due to the high imbalance, which in the case of standard bagging can lead to the generation of training sets containing only the majority class.
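The stratified bootstrap described above can be sketched as follows (a minimal NumPy illustration, not the authors' implementation):

```python
import numpy as np

def stratified_bootstrap(X, y, rng=None):
    """Draw a bootstrap sample separately within each class, so every
    bootstrap keeps the original training set class proportions."""
    rng = np.random.default_rng(rng)
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        # sample with replacement, but only within this class
        idx.extend(rng.choice(members, size=members.size, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]

X = np.random.default_rng(42).normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)  # imbalance ratio 19:1
Xb, yb = stratified_bootstrap(X, y, rng=0)
# yb always contains exactly 95 majority and 5 minority labels
```

With standard bagging on the same data, a bootstrap with zero minority samples would occur regularly; the per-class draw above makes that impossible.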

Both proposed methods are derived in part from algorithms based on local oracles, and more specifically on knora-u [7], which gives base classifiers weights based on the number of correctly classified instances in the local region of competence and then combines them by weighted majority voting. The computational cost in this type of method is mainly related to the size of the classifier pool and the dsel size, as the *k*-nearest neighbors technique is used to define local competence regions, which can be costly for large datasets. Instead of hard voting, dese and desire are based on the probabilities returned by the base models and they calculate weights for each classifier for both the minority and majority classes separately.
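The knora-u competence estimation can be sketched as follows (a minimal illustration assuming a pool of fitted scikit-learn classifiers; the helper name `knora_u_weights` is ours):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def knora_u_weights(pool, X_dsel, y_dsel, x_query, k=7):
    """Weight of each base classifier = number of correctly classified
    samples in the query's local region of competence (KNORA-Union)."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    lrc = nn.kneighbors(x_query.reshape(1, -1), return_distance=False)[0]
    return np.array([(clf.predict(X_dsel[lrc]) == y_dsel[lrc]).sum()
                     for clf in pool])

# Toy DSEL: two well-separated classes and a pool of three trees
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2))])
y = np.array([0] * 5 + [1] * 5)
pool = [DecisionTreeClassifier().fit(X, y) for _ in range(3)]
w = knora_u_weights(pool, X, y, np.zeros(2), k=3)
# every tree classifies all 3 neighbours correctly, so each weight is 3
```

In knora-u these weights feed a weighted majority vote; dese and desire instead apply soft weights to the class supports, as described below.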

The proposed methods come in two variants: *Correct* (denoted as *C*), where the weights are modified only in the case of a correct classification, and *All* (denoted as *A*), where, in addition to correct decisions, the weights are also affected by incorrect ones. The exact way of calculating the weights is presented in Algorithm 1.

First, the *k*-nearest neighbors of a given instance are found in dsel; they form the local region of competence (lrc). Next, each classifier \(\varPsi _j\) from the pool classifies all samples belonging to lrc.

In the following steps, the classifier weights are modified separately for the minority and majority class, starting from an initial value. The *All* variant uses all four conditions of Algorithm 1, while the *Correct* variant is based only on the conditions triggered by correct classification. In the case of dese, the modifications are based on the Euclidean distance between the classified sample and its neighbor from the local competence region; in the case of desire, the Euclidean distance is additionally scaled by the percentage of the minority or majority class in such a way that more emphasis is placed on the minority class.
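Under these assumptions, the weighting scheme can be sketched as follows. This is a plausible reconstruction, not a verbatim transcription of Algorithm 1: the closeness term \(1/(1+d)\), the initial weight of 1, and the exact class-share scaling are our assumptions:

```python
import numpy as np

class Always:
    """Toy classifier that always predicts the same label (demo only)."""
    def __init__(self, label):
        self.label = label
    def predict(self, X):
        return np.full(len(X), self.label)

def des_weights(pool, X_lrc, y_lrc, x_query, variant="C", use_ir=False):
    """Per-classifier weights kept separately for class 0 (majority) and
    class 1 (minority). A correctly classified LRC neighbour adds its
    closeness 1/(1+d) to the weight of its true class; the 'All'
    variant ('A') also subtracts it on mistakes. With use_ir=True
    (the DESIRE idea) each contribution is scaled by the share of the
    opposite class, putting more emphasis on the minority class."""
    d = np.linalg.norm(X_lrc - x_query, axis=1)
    closeness = 1.0 / (1.0 + d)
    share = np.array([np.mean(y_lrc == 0), np.mean(y_lrc == 1)])
    W = np.ones((len(pool), 2))
    for j, clf in enumerate(pool):
        pred = clf.predict(X_lrc)
        for i, t in enumerate(y_lrc):
            c = closeness[i] * (share[1 - t] if use_ir else 1.0)
            if pred[i] == t:
                W[j, t] += c
            elif variant == "A":
                W[j, t] -= c
    return W

# Two LRC samples: one majority point at the query, one minority farther away
X_lrc = np.array([[0.0, 0.0], [1.0, 1.0]])
y_lrc = np.array([0, 1])
W = des_weights([Always(0), Always(1)], X_lrc, y_lrc, np.zeros(2))
```

The sketch makes the distance dependence visible: the classifier that is correct on the nearby neighbour gains more weight than the one correct only on the distant neighbour.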

Finally, the weights obtained from dese or desire are normalized to the [0, 1] range and multiplied by the ensemble support matrix. The combination is carried out according to the maximum rule [6], which chooses the classifier that is most confident of itself. The choice of this combination rule was dictated by a small number of instances in the datasets, which significantly reduces the risk of base classifiers overfitting.
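The combination step can be illustrated as follows (a minimal sketch; the min-max normalization and the per-class weight layout are assumptions):

```python
import numpy as np

def max_rule_combine(W, supports):
    """Min-max normalise the (n_classifiers, 2) weight matrix to [0, 1],
    multiply it element-wise with the support matrix, and apply the
    maximum rule: the predicted class comes from the single most
    confident (classifier, class) entry."""
    W = (W - W.min()) / (W.max() - W.min() + 1e-12)
    scaled = W * supports
    return np.unravel_index(np.argmax(scaled), scaled.shape)[1]

supports = np.array([[0.6, 0.4],   # class supports of classifier 1
                     [0.2, 0.8]])  # class supports of classifier 2
W = np.array([[1.0, 1.0],
              [1.0, 3.0]])
pred = max_rule_combine(W, supports)
# normalised W is [[0, 0], [0, 1]]; scaled = [[0, 0], [0, 0.8]] -> class 1
```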

## 3 Experimental Evaluation

This section presents the details of the experimental study, the datasets used, and the results that the proposed approaches achieved compared to the *state-of-the-art* methods.

### 3.1 Experimental Set-Up

The main goal of the experiments was to compare the performance of the proposed dynamic selection methods, designed specifically for the task of imbalanced data classification, with *state-of-the-art* ensemble methods paired with preprocessing. The evaluation in each of the experiments is based on five metrics commonly used to assess the quality of classification for imbalanced problems: *F1 score* [15], *precision* and *recall* [13], *G-mean* [11], and *balanced accuracy score* (bac) [3], according to the *stream-learn* [10] implementation. All experiments have been implemented in *Python* and can be repeated using the code on *Github*^{1}.
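For reference, the five metrics can be computed as follows (an illustrative scikit-learn-based computation; the paper uses the *stream-learn* implementation, which is not reproduced here):

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             precision_score, recall_score)

# A toy prediction on an imbalanced problem (8 majority, 2 minority samples)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

prec = precision_score(y_true, y_pred)            # TP / (TP + FP) = 1/3
rec = recall_score(y_true, y_pred)                # minority recall = 1/2
spec = recall_score(y_true, y_pred, pos_label=0)  # majority recall = 3/4
f1 = f1_score(y_true, y_pred)                     # 0.4
g_mean = np.sqrt(rec * spec)                      # sqrt(0.375)
bac = balanced_accuracy_score(y_true, y_pred)     # (1/2 + 3/4) / 2 = 0.625
```

Unlike plain accuracy (here 0.7 despite half the minority class being missed), *G-mean* and bac collapse towards zero when either class is ignored, which is why they are preferred for skewed distributions.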

As the base models, three popular classifiers were selected according to the *scikit-learn* [12] implementation: *Gaussian Naive Bayes* (gnb), *Classification and Regression Trees* (cart) and the *k-Nearest Neighbors* classifier (knn). The size of the classifier pool was fixed successively at several values, the smallest being five base models. The evaluation was carried out using repeated *k*-fold cross-validation. Due to the small number of instances in the datasets, dsel is defined as the entire training set.

Table 1. Datasets characteristics.

Subsections 3.2 and 3.3 present the results of experiments comparing the presented methods, dese in Experiment 1 and desire in Experiment 2, with *state-of-the-art* ensemble algorithms used for imbalanced data classification.

Both the proposed and the reference methods occur in versions with preprocessing (in the form of *random oversampling*) and without it; the use of oversampling is denoted by the letter *O* placed before the acronym of the method. As reference methods, a single classifier, stratified bagging (sb) and dynamic selection in the form of the knora-u algorithm were selected.

The radar diagrams show the average global ranks achieved by each of the tested algorithms in terms of each of the five evaluation metrics, while the tables show the results of the Wilcoxon signed-rank test (\(p=0.05\)) for a pool of five base classifiers. The numbers under the average rank of each method indicate the algorithms that are statistically significantly worse than the one in question. The complete results for all benchmark datasets and the full statistical analysis can be found on *Github*^{2}.

### 3.2 Experiment 1 – Euclidean Distance-Based Approach

In Fig. 1 we can see how the average ranks for dese and the reference methods change in terms of different metrics depending on the ensemble size. The proposed methods (especially odese-c) with a small pool of base models achieve higher rankings in terms of each metric with the exception of *recall*. While the single classifier and bagging favor *recall*, odese-c and dese-c favor *precision*. As the number of base classifiers increases, the bac- and *G-mean*-based rankings deteriorate to the knora-u level, while the *F1 score* remains high due to the high *precision*.

Table 2 presents the results of the statistical analysis, which shows that the odese-c method performs statistically significantly better than all reference methods in terms of each metric except for *recall*.

Although the reference methods achieve a higher *precision*, odese-c performs better in terms of the other metrics, and odese-a, despite its low *F1 score* and *precision*, achieves the highest average ranks in terms of bac, *G-mean* and *recall*. Table 3 confirms that for the five base classifiers, odese-c is statistically significantly better than all reference methods, while odese-a performs statistically significantly better than odese-c in terms of *recall*, *G-mean* and bac.

Table 2. Statistical tests on mean ranks for gnb with pool size = 5.

Table 3. Statistical tests on mean ranks for cart with pool size = 5.

Table 4. Statistical tests on mean ranks for knn with pool size = 5.

In Fig. 3 and Table 4 we can see that the proposed methods using oversampling do not differ statistically from the reference methods, except for the single classifier, which is characterized by a high *precision* but at the same time achieves the worst mean ranks on the remaining metrics. As the number of base classifiers increases, knora-u and osb achieve higher average ranks than odese-c and odese-a.

### 3.3 Experiment 2 – Scaled Euclidean Distance-Based Approach

The results below show the average ranks for the proposed desire method, which calculates weights based on Euclidean distances scaled by the percentages of the minority and majority classes in the training set.

In the case of gnb as the base model (Fig. 4), the odesire-c method achieves the best results compared to the reference methods in terms of mean ranks based on *F1 score*, *precision*, *G-mean* and bac. When the ensemble size increases, the proposed method equals oknora-u in terms of bac and *G-mean* but retains the advantage in terms of *F1 score* and *precision*. Moreover, the more base classifiers there are, the smaller the differences between desire using preprocessing and the version without it. Table 5 presents the results of the statistical analysis, which shows that odesire-c is statistically better than all reference methods when the number of base classifiers is low.

Figure 5 shows that for a small classifier pool, odesire-c achieves higher ranks than the reference methods in terms of each evaluation metric, and as the number of classifiers increases, it loses significantly in *precision* compared to osb and oknora-u. odesire-a has a high *recall*, which unfortunately is reflected in the lowest *precision* and *F1 score*. In Table 6 we see that for five base classifiers, desire-c both with and without preprocessing is statistically significantly better than the reference methods in terms of all metrics except one: *G-mean* in the case of desire-c and *recall* for odesire-c.

Table 5. Statistical tests on mean ranks for gnb with pool size = 5.

Table 6. Statistical tests on mean ranks for cart with pool size = 5.

Table 7. Statistical tests on mean ranks for knn with pool size = 5.

### 3.4 Lessons Learned

The presented results confirmed that dynamic selection methods adapted specifically for imbalanced data classification can achieve statistically better results than *state-of-the-art* ensemble methods coupled with preprocessing, especially when the pool of base classifiers is relatively small. This may be due to the fact that *bagging* has not yet stabilized, while the proposed method chooses the best single classifier. The *Correct* approach, in which the weights of the models were changed only if the instances belonging to the local competence region were correctly classified, proved to be more balanced in terms of all five evaluation measures. This may indicate that the weight penalties for incorrect classification in the *All* approach are too high. When knn is used as the base classifier, the proposed methods performed statistically similarly to knora-u with a small pool, and achieved statistically inferior ranks compared to the reference methods with a larger number of classifiers. This is probably due to the support calculation method in knn, which is not suitable for the algorithms proposed in this work. For gnb and cart, dese-c and desire-c achieved results that are statistically better than or similar to the reference methods, often without the use of preprocessing, since the proposed approach has a built-in mechanism to deal with imbalance.

## 4 Conclusions

The main purpose of this work was to propose novel solutions based on dynamic classifier selection for the imbalanced data classification problem. Two methods were proposed, namely dese and desire, which use the Euclidean distance and the imbalance ratio in the training set to select the most appropriate model for the classification of each new sample. Research conducted on benchmark datasets and the statistical analysis confirmed the usefulness of the proposed methods, especially when there is a need to maintain a relatively low number of classifiers.

Future work may involve the exploration of different approaches to the base classifiers’ weighting, as well as using different combination methods and the use of proposed methods for the imbalanced data stream classification.

## Acknowledgments

This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325.

## References

- 1. Alcala-Fdez, J., et al.: KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. **17**, 255–287 (2010)
- 2. Breiman, L.: Bagging predictors. Mach. Learn. **24**(2), 123–140 (1996). https://doi.org/10.1007/BF00058655
- 3. Brodersen, K.H., Ong, C.S., Stephan, K.E., Buhmann, J.M.: The balanced accuracy and its posterior distribution. In: Proceedings of the 2010 20th International Conference on Pattern Recognition, ICPR 2010, Washington, DC, USA, pp. 3121–3124. IEEE Computer Society (2010)
- 4. Chen, D., Wang, X.-J., Wang, B.: A dynamic decision-making method based on ensemble methods for complex unbalanced data. In: Cheng, R., Mamoulis, N., Sun, Y., Huang, X. (eds.) WISE 2020. LNCS, vol. 11881, pp. 359–372. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34223-4_23
- 5. Cruz, R.M.O., Sabourin, R., Cavalcanti, G.D.C.: Dynamic classifier selection: recent advances and perspectives. Inf. Fus. **41**, 195–216 (2018)
- 6. Duin, R.P.W.: The combining classifier: to train or not to train? In: Object Recognition Supported by User Interaction for Service Robots, vol. 2, pp. 765–770, August 2002
- 7. Ko, A.H., Sabourin, R., Britto Jr., A.S.: From dynamic classifier selection to dynamic ensemble selection. Pattern Recogn. **41**(5), 1718–1731 (2008)
- 8. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Progress Artif. Intell. **5**(4), 221–232 (2016). https://doi.org/10.1007/s13748-016-0094-0
- 9. Ksieniewicz, P.: Undersampled majority class ensemble for highly imbalanced binary classification. In: Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications. Proceedings of Machine Learning Research, Dublin, Ireland, vol. 94, pp. 82–94. PMLR, ECML-PKDD, 10 September 2018
- 10. Ksieniewicz, P., Zyblewski, P.: Stream-learn: open-source Python library for difficult data stream batch analysis. arXiv preprint arXiv:2001.11077 (2020)
- 11. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML (1997)
- 12. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. **12**, 2825–2830 (2011)
- 13. Powers, D.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. **2**, 2229–3981 (2011)
- 14. Roy, A., Cruz, R.M., Sabourin, R., Cavalcanti, G.D.: A study on combining dynamic selection and data preprocessing for imbalance learning. Neurocomputing **286**, 179–192 (2018)
- 15. Sasaki, Y.: The truth of the F-measure. Teach Tutor Mater, January 2007
- 16. Woźniak, M., Graña, M., Corchado, E.: A survey of multiple classifier systems as hybrid systems. Inf. Fus. **16**, 3–17 (2014). Special Issue on Information Fusion in Hybrid Intelligent Fusion Systems
- 17. Zyblewski, P., Ksieniewicz, P., Woźniak, M.: Classifier selection for highly imbalanced data streams with *minority driven ensemble*. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds.) ICAISC 2019. LNCS (LNAI), vol. 11508, pp. 626–635. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20912-4_57