1 Introduction

Among imbalanced data classification methods, one of the most promising directions is the use of models based on classifier ensembles. In ensemble learning, great emphasis is placed, on the one hand, on good prediction quality and, on the other hand, on appropriate diversification of the base classifiers. Additionally, for tasks with unequal class proportions, where the cost of misclassifying an object from one class (usually the minority class) is higher than that of misclassifying an object from the other class, the selection of a proper classification quality metric is a significant problem. This metric should serve as the optimization criterion when learning the model. The canonical classification model assumes knowledge of the mentioned misclassification costs, which should be provided by the user in the form of a loss function. A conceptually simple criterion would then be the expected value of that function, i.e., the overall risk [20].

Unfortunately, such information is lacking in real-world problems, and determining the cost of an erroneous decision is very difficult. One may suggest that a misclassification cost should be inversely proportional to the frequency of objects in a given class. However, it is easy to show that this is not necessarily true for many practical problems; especially in the case of highly imbalanced data sets, it would lead to ignoring errors made on the majority class. Hence, approaches are constantly being developed to acquire the misclassification costs mentioned above, such as the utility-based learning of Branco et al. [9]. Most of the work on classifying imbalanced data analyzes simple metrics such as Recall and Precision (or Specificity). However, aggregate metrics such as Gmean, AUC, or F-measure are adopted out of a desire to express the quality of a method by a single value. Their undoubted disadvantage is that they assume a fixed relationship between the simple criteria, e.g., in the case of Gmean, the geometric mean of Precision and Recall. It is also worth noting that these criteria ignore the imbalance ratio and the misclassification costs of objects from different classes. Brzezinski et al. [10] showed that such metrics are biased towards the majority class and suggested using parametric metrics such as Fβ, where β should be proportional to the imbalance ratio. Nevertheless, such a recommendation still does not guarantee that the classifier evaluated best by this criterion will properly optimize the cost of misclassification, because it assumes that the misclassification cost of the minority class is proportional to the imbalance ratio, which is not always true.
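For reference, the aggregated metrics mentioned above can be written explicitly. Gmean is given here in the form used in this text, i.e., the geometric mean of Precision and Recall (other works use Recall and Specificity instead), and Fβ is the standard weighted combination of the two simple criteria:

$$ Gmean=\sqrt{Precision \cdot Recall}, \qquad F_{\beta}=\frac{(1+\beta^{2}) \cdot Precision \cdot Recall}{\beta^{2} \cdot Precision + Recall} $$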

It should also be noted that the values of these aggregated criteria are ambiguous. Based on their values, we de facto do not know how a given model behaves, i.e., what values the simple criteria take, since a given value of the aggregated criterion can be achieved for many pairs of Precision and Recall values.

Considering the above, it seems interesting to consider the problem of classifier learning for imbalanced data as a multi-criteria optimization task. As a result of such algorithms, we should obtain not a single solution but a set of solutions (Pareto front), from which the selection of a single solution can either be automated or left to the user to decide.

In this paper, we intend to answer whether it is possible to propose classifier learning algorithms that use multi-criteria optimization to train a set of Pareto-optimal svm classifiers and obtain an ensemble as good as the state-of-the-art methods. The classical svm classifier has difficulties classifying imbalanced data because determining the optimal hyperplane is not as simple as for more balanced data. The higher the imbalance ratio, the weaker the classification ability. Therefore, data sampling mechanisms (undersampling or oversampling) or changes at the classifier architecture level are used [12].

The main contributions of this paper can be summarized as follows:

  • Proposing the semoos ensemble method using multi-criteria optimization in the model learning phase.

  • Designing a strategy for building the ensemble consisting of Pareto optimal svm classifiers.

  • Developing a new bootstrapping variant for sample subspace selection from imbalanced data.

  • Analysing the impact of hyperparameter settings on the semoos behavior.

  • Estimating the computational complexity of the semoos method.

  • Experimental evaluation of the proposed approach based on diverse benchmark datasets and a detailed comparison with the state-of-the-art methods.

The paper is organized as follows. The following section discusses the related works. Section 3 provides the details of the semoos algorithm and its run-time complexity. Then we discuss the experimental setup, report the results along with our analysis, and finally present the concluding remarks and possible future research directions.

2 Related works

This section presents the main works related to our research. We briefly introduce the main topics related to multi-objective optimization, imbalanced data analysis, and classifier ensemble learning.

2.1 Multi-objective optimization

Multi-criteria (multi-objective) optimization algorithms optimize more than one objective function. Hence, unlike single-objective optimization methods, they return many solutions instead of a single one. The returned solutions may be divided into dominated and non-dominated ones [21, 60]. A non-dominated (Pareto-optimal) solution cannot be improved with respect to any criterion without compromising the value of the others. Most multi-objective optimization algorithms are global methods [17] that return approximate solutions. They may be grouped into: (i) Weighted Objectives Methods, (ii) Hierarchical Optimization Methods, (iii) Trade-Off Methods, (iv) Methods of Distance Functions and Min-Max Methods, (v) Goal Programming Methods, (vi) Genetic Methods.

Among the latter methods, nsga, nsga-ii, npga, and ffga [3] should be mentioned, as well as the multi-objective evolutionary algorithm based on decomposition (moea/d), which decomposes an optimization problem into a number of scalar optimization subproblems and optimizes them simultaneously [79]. Nguyen et al. [54] modified the moea/d algorithm and proposed two decomposition methods based on multiple reference points, static and dynamic. The developed method gave satisfactory results in feature selection for classification. Currently, multi-objective evolutionary algorithms (moeas) are eagerly developed and bring significant benefits, especially in large-scale multi-objective optimization problems. In this case, the decision space is grouped into several subspaces or reduced, or novel search strategies are used [69].

Pareto solutions make it possible to reconcile several criteria but simultaneously make it difficult to choose the single best solution, which is subjective and should depend on the user’s preference. The more critical a criterion is for a user, the more likely the solution with a better score for that criterion will be selected. The Pareto MTL method was developed to facilitate selecting a Pareto front solution [45]. Thanks to it, solutions in the objective space are well-distributed, which ensures a compromise between various, often opposing, criteria. Another approach that provides well-distributed solutions on the Pareto front is the cosmos method [61] developed for deep learning.

There are many propositions on how to solve multi-objective optimization problems. One of them is the surrogate-assisted Particle Swarm Optimization with the Pareto active learning algorithm [48], which has a relatively low computational cost, fast convergence, and good diversity. Convergence is good when the solutions are close to the Pareto front, and good diversity in the objective space means that the solutions are evenly distributed rather than concentrated at one point. Ma et al. [49] proposed the lsmoea/d method, which uses reference vectors in the control variable analysis. The experiments confirmed high quality for large test problems with 2–10 objectives and 200–1000 variables.

Multi-criteria optimization is a rapidly growing domain, primarily due to the increasing complexity of modeled processes. Increasingly, such optimization is used in real problems such as crude oil price forecasting [35], efficient use of energy in agriculture [36], renewable energy in the building sector [46], and many more. There are also some applications of multi-criteria optimization in classification, e.g., Jin et al. [77] solved the problem of multi-criteria optimization of the structure and parameters of a neural network using evolutionary methods (nsga-ii). The mo-selm method [75] has been tested for classification and regression. It relies on the joint optimization of parameters and structure of the Extreme Learning Machine network to cope with the overfitting problem. gemonn [76] employed a gradient-guided evolutionary approach combining the advantages of gradient descent and evolutionary algorithms to train deep neural networks. The optimization determines the weights of the network, while the multi-criteria formulation simultaneously targets the network sparsity and the training loss.

Many researchers combine svm classifiers with optimization algorithms that solve the feature selection problem. The article [78] used nsga-iii for feature subset selection and cnn-svm (Convolutional Neural Network - Support Vector Machine) for software defect prediction with an imbalance problem. Mierswa [51] showed the possibility of using multi-criteria optimization techniques in svm learning, pointing out that this approach makes it possible to turn away from aggregated optimization criteria formed as a combination of opposing criteria. Additionally, Pareto-optimal solutions allow complexity analysis of the solution, so the user can easily see which solutions are overfitted. The article [62] combines an svm classifier for detecting malicious traffic with a Genetic Algorithm for hyperparameter optimization.

2.2 Imbalanced data classification

In imbalanced data classification, the disproportion among the different classes is not the sole source of learning difficulties. One may easily come up with an example where, despite a high imbalance ratio, the instance distributions of the different classes are well-separated and the task remains easy. Napierała and Stefanowski observed that the minority class samples often form scattered clusters of an unknown structure [53]. An additional complication arises from the possibility that there may be an insufficient number of minority class samples for a classifier learning algorithm to achieve an adequate level of generalization, resulting in overfitting [16].

One may divide imbalanced data classification algorithms into three groups [47].

Data preprocessing methods

concentrate on decreasing the number of examples from the majority class (undersampling) or generating new minority class samples (oversampling). These mechanisms aim to balance the number of objects from the considered classes. Oversampling randomly replicates existing samples or generates new samples in a guided manner. smote is the most recognized algorithm [14]; it generates a new minority sample on the segment between a randomly selected minority object and one of its nearest minority neighbors. Unfortunately, methods such as smote may change the characteristics of the minority class and, as a result, overfit the classifier. Therefore, several modifications of smote have been proposed that identify the samples to be used more intelligently, such as Borderline-smote [29], which generates new minority class samples close to the decision border. Safe-Level smote [11] and ln-smote [50] reduce the probability of generating synthetic minority instances in areas dominated by the majority class. Koziarski et al. proposed rbo [39] and ccr [38], which enforce the relocation of majority-class instances from the areas where minority-class instances are present. The alternative preprocessing approach is undersampling. Such methods remove instances from the majority class either at random or, based on neighborhood analysis, from areas where their removal does not disrupt the classifier’s quality.
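To make the oversampling step concrete, the following sketch uses the imbalanced-learn implementations of smote and Borderline-smote on a synthetic dataset; it only illustrates the interpolation idea discussed above and is not part of the proposed method.

```python
# Minimal illustration of guided oversampling with imbalanced-learn
# (not part of the authors' method; the dataset is synthetic).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, BorderlineSMOTE

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# SMOTE creates a synthetic sample on the segment between a minority object
# and one of its nearest minority neighbours.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("after SMOTE:", Counter(y_res))

# Borderline-SMOTE restricts generation to minority objects near the border.
X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X, y)
print("after Borderline-SMOTE:", Counter(y_res))
```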

Inbuilt mechanisms

modify existing classification algorithms for imbalanced tasks, ensuring balanced accuracy for both classes. One approach is one-class classification [34], which aims to learn the decision area associated with a single class. Initially, an approach based on building models for the majority class was proposed, due to the sufficiently large number of objects representing it, while the minority class was treated as so-called outliers. A different approach was proposed in [41], where a one-class classifier was trained on the minority class. In turn, cost-sensitive classification considers an asymmetrical loss function that assigns a higher misclassification cost to the minority class [30, 40, 47, 80]. Unfortunately, such methods may cause a reverse bias towards the minority class.

Hybrid methods

combine the advantages of data preprocessing methods with different classification methods. One of the most popular approaches is the hybridization of under- and oversampling with ensemble classifiers [27]. This approach allows the data to be processed independently for each base model. It is also worth noting methods based on ensemble classification [74], such as smoteBoost [15] and AdaBoost.NC [71].

2.3 Classifier ensembles

The purpose of a classifier ensemble is to make a joint decision by a pool of base classifiers to obtain greater predictive power [42]. The main factors affecting the predictive quality of the ensemble are the individual predictive quality of the base classifiers and their diversity [52]. These factors are in opposition: in the case of an ideal classifier that makes a correct prediction in every case, increasing ensemble diversification must lead to the addition of classifiers of worse quality. Therefore, the ensemble forming process may be treated as a multi-objective optimization problem.

Chandra and Yao proposed divace (DIVerse and Accurate Ensemble Learning Algorithm) [13], which employs multi-objective optimization in the ensemble learning task to find a trade-off between diversity and accuracy. Abbass developed the Memetic Pareto Artificial Neural Network (mpann), which optimizes similar criteria [1]. In [73], the authors optimize the weights of the models in the ensemble and select the solution from the Pareto front using the promethee method. Other works focus on ensembles of decision trees [37], recursive networks [63], or fuzzy rules [33], which prove the feasibility of this approach. However, the number of works devoted to employing multi-objective optimization in classifier ensemble design is relatively small, especially compared to the common problem of designing an ensemble using a single criterion. Answers to key questions require further research, i.e., which models are best suited to the multi-objective task and whether it is possible to develop algorithms that effectively combine different decision models. It is also necessary to develop methods for forming the base classifiers and creating combination rules that connect them in the multi-objective optimization task.

Fletcher et al. [26] proposed a non-specific ensemble classification algorithm that uses multi-objective optimization to set the required trade-off. Ribeiro and Reynoso-Meza [59] developed a two-stage ensemble learning framework where, firstly, a set of diverse classifiers is generated and, secondly, the pool of classifiers is pruned. The same authors [4] also analyze different multi-objective optimization approaches for ensemble learning and propose a taxonomy of multi-objective ensemble learning. Oliveira et al. [56] employed multi-objective optimization to select the base classifiers’ valuable features and then choose the best ensemble line-up. Onan et al. [57] developed an ensemble method that employs a static classifier selection involving majority voting error and forward search, as well as a multi-objective differential evolution algorithm. Liang et al. [44] described an ensemble learning model based on multimodal multi-objective optimization. Asadi and Roshan [5] formulated an interesting proposition that focuses on the bagging procedure, considering the number of bags and simultaneously the diversity of the trained classifiers. Gu et al. [28] focused on classifier ensemble generation, proposing a solution that is a trade-off between ensemble accuracy and diversity. They also showed that the proposed solution can outperform single-objective methods.

2.4 Multi-objective optimization for imbalanced data classification

Only a few works on imbalanced data classification employ multi-objective optimization methods. It is worth mentioning the work of Bhowan et al. [6], who proposed to build a classifier ensemble based on Pareto-optimal classifiers. In turn, Soda [64] suggested training the classification system named Reliability-based Balancing using multi-objective optimization methods that maximize two criteria related to the frequency of correct decisions and G-mean. They used a data preprocessing technique based simultaneously on feature selection and prototype selection obtained from multi-objective optimization. Li et al. [43] proposed the data preprocessing method Adaptive Multi-objective Swarm Crossover Optimization, which uses both over- and under-sampling at the same time. This approach selects the best proportion between majority and minority samples by multi-objective optimization.

Several ensemble techniques have also been employed for imbalanced data classification tasks. Bhowan et al. [7] proposed a two-step approach to evolving ensembles using genetic programming for imbalanced data classification. The Pareto-optimal classifiers form an initial pool, and then ensemble pruning methods based on genetic programming are employed. Fernandez et al. [25] employed a decision tree ensemble and multi-objective optimization to find the best combination of feature and instance selection for the multi-class imbalanced task. Felicioni et al. [22] developed an algorithm that took fourth place in the ACM RecSys Challenge 2020, organized by Twitter. The challenge aimed to predict the probability of user engagement based on past interactions on the Twitter platform. The authors employ feature extraction and gradient boosting for decision trees and neural networks and build the ensemble using multi-objective optimization.

3 Proposition of the method

Let us propose the SVM Ensemble with Multi-Objective Optimization Selection (semoos) method dedicated to training an svm classifier ensemble for imbalanced data. Its pseudocode is presented in Algorithm 1 and its diagram in Fig. 1. The main idea is to find a pool of svms that gives a diversified ensemble. To achieve that, we look for the setting of the two parameters C and γ of the svm Radial Basis Function (rbf) kernel. Additionally, feature selection for each base classifier is performed to ensure the high diversity of the ensemble. Multi-objective optimization is used to select the best svm parameter setting, including feature selection [67]. The method depends on several parameters, but we distinguish two main additional versions of semoos: semoos b and semoos bp. semoos b employs an original bootstrapping method to increase the diversity of the ensemble (Bootstrapping is true). semoos bp additionally employs pruning to remove similar models from the ensemble (Pruning is true).

Fig. 1 Diagram of proposed methods: semoos, semoos b (when bootstrapping is used), semoos bp (when bootstrapping and pruning are applied)

Bootstrapping generates data subspaces. Optimization based on these data results in differentiated svm parameters from a given range and determines which features are selected. The best non-dominated solutions are used to create models, which are then added to the ensemble of classifiers. This iteration of center-based Bootstrapping is repeated several times to ensure stability and higher quality, as well as to avoid overfitting.

Algorithm 1 semoos, semoos b, semoos bp

3.1 Algorithm

Let us quickly analyze Algorithm 1, which starts with the learning set \(\mathcal{LS}\) as input. Suppose the center-based Bootstrapping mechanism is enabled (Bootstrapping = True). In that case, it is repeated n times, and it divides each fold of the dataset into subsets S. A root r is selected randomly from the set of examples only for the first iteration (r1), and this point becomes the center \(\overline{x}\). The distribution list Di is built from the distances from the root r to every sample in \(\mathcal{LS}\), normalized by their sum D. Then, Di is used as the probability distribution in sampling with replacement, which creates the subset Si composed of examples from \(\mathcal{LS}\). The following root ri+1 is the point furthest from the center of mass. If xc is the center of the first k roots and d is the dimension of x, then the center after adding the (k + 1)-th root is given by the following formula:

$$ center(x_{c},k,x)=\frac{1}{k+1} \begin{bmatrix} k x_{c}^{(1)}+x^{(1)}\\ k x_{c}^{(2)}+x^{(2)}\\ \vdots\\ k x_{c}^{(d)}+x^{(d)} \end{bmatrix} $$
(1)

For example, in two-dimensional space, after the second iteration there are two roots, and their center lies at the midpoint of the line segment that connects them. After the third iteration, three roots form a triangle, and the new center is the centroid of this figure. This process continues until the iterator reaches the value of the n-repetition parameter.
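To make the procedure concrete, the following sketch shows one possible implementation of the center-based bootstrapping described above. The function and variable names are ours, the subset size equal to the learning set size is an assumption, and the code is an illustration of Eq. 1 rather than the original implementation.

```python
# A minimal sketch of center-based bootstrapping (Eq. 1); names are ours.
import numpy as np

def update_center(x_c, k, x):
    """Incremental center of mass after adding the (k+1)-th root x (Eq. 1)."""
    return (k * x_c + x) / (k + 1)

def center_based_bootstrap(X, n_repetitions, rng=None):
    rng = np.random.default_rng(rng)
    subsets = []
    # the first root is drawn at random and also initialises the center
    root = X[rng.integers(len(X))].copy()
    center = root.copy()
    for k in range(n_repetitions):
        # distances from the current root weight the sampling with replacement
        dist = np.linalg.norm(X - root, axis=1)
        proba = dist / dist.sum()
        # subset size |LS| is an assumption; the text does not specify it
        idx = rng.choice(len(X), size=len(X), replace=True, p=proba)
        subsets.append(idx)
        # next root: the point furthest from the current center of mass
        root = X[np.argmax(np.linalg.norm(X - center, axis=1))]
        center = update_center(center, k + 1, root)
    return subsets
```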

In each iteration, the subset Si is used as the input to the multi-objective optimization performed by the nsga-ii algorithm [18]. Equation 2 presents the two fitness functions, F1 and F2, that the optimization algorithm uses to maximize Precision and Recall:

$$ \begin{cases} \text{maximize } F_{1}(C, \gamma, \hat{x}) = Precision \\ \text{maximize } F_{2}(C, \gamma, \hat{x}) = Recall \end{cases} $$
(2)

The metrics are calculated during the validation process inside the optimization with the base estimator svm and different values of its hyperparameters C and γ, which have the following lower and upper limits: C ∈ [1e6, 1e9] and γ ∈ [1e-7, 1e-4]. \(\hat{x}\) is a binary vector containing the selected features. These three parameters (\(C, \gamma, \hat{x}\)) form an initial population. The optimization is repeated until the maximum number of evaluations (m) is reached. It returns the Pareto-optimal set containing the results from the Pareto front, i.e., C, γ, and the selected features.
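As an illustration, a simplified optimization step can be sketched with Pymoo's nsga-ii implementation. Only the two real-valued hyperparameters are encoded here; the binary feature-selection vector and the genetic operators described later are omitted, so this is a reduced sketch rather than the full semoos optimization.

```python
# Simplified sketch of the optimization step with pymoo's NSGA-II: only C and
# gamma are optimized (the binary feature mask is omitted), and since pymoo
# minimizes, both objectives are negated. Bounds follow the ranges above.
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

class SVMPrecisionRecall(ElementwiseProblem):
    def __init__(self, X, y):
        # decision variables: log10(C) in [6, 9] and log10(gamma) in [-7, -4]
        super().__init__(n_var=2, n_obj=2,
                         xl=np.array([6.0, -7.0]), xu=np.array([9.0, -4.0]))
        self.X, self.y = X, y

    def _evaluate(self, x, out, *args, **kwargs):
        clf = SVC(C=10.0 ** x[0], gamma=10.0 ** x[1], kernel="rbf")
        scores = cross_validate(clf, self.X, self.y, cv=2,
                                scoring=("precision", "recall"))
        # negate because pymoo minimizes while we maximize Precision and Recall
        out["F"] = [-scores["test_precision"].mean(),
                    -scores["test_recall"].mean()]

# Usage with hypothetical training data: res.X holds the Pareto-optimal
# (log C, log gamma) pairs and res.F their negated Precision/Recall values.
# res = minimize(SVMPrecisionRecall(X_train, y_train), NSGA2(pop_size=100),
#                ("n_eval", 1000), verbose=False)
```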

Then, for each result, an estimator is trained and added to the ensemble. The semoos bp version with Pruning does not add all models to the ensemble. It finds the solutions with unique values of the fitness functions (Precision and Recall) and trains estimators only on the corresponding \((C, \gamma, \hat{x})\) triples. Finally, the algorithm returns the ensemble of classifiers. The prediction is based on Average Support Vectors.
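The combination rule is only named above, so the following helper reflects one plausible reading of "Average Support Vectors": soft voting in which each svm's per-class support (its predicted probabilities) is averaged and the class with the highest mean support is returned. This is our interpretation, not necessarily the exact rule used in semoos.

```python
# One plausible reading of "Average Support Vectors": average each member's
# per-class support (predicted probabilities, so SVC(probability=True) is
# assumed) and pick the class with the highest mean support.
import numpy as np

def ensemble_predict(ensemble, feature_masks, X):
    # each classifier was trained on its own selected feature subset
    supports = [clf.predict_proba(X[:, mask])
                for clf, mask in zip(ensemble, feature_masks)]
    mean_support = np.mean(supports, axis=0)          # average class supports
    return ensemble[0].classes_[np.argmax(mean_support, axis=1)]
```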

The basic version of semoos is much simpler. It starts from the multi-objective optimization described above, but the input is the whole \(\mathcal{LS}\). Subsequently, solutions from the Pareto set PS, i.e., \((C, \gamma, \hat{x})\), are used to train the models, and all models are added to the ensemble.

3.2 Computational complexity analysis

The computational complexity depends on a few aspects of the proposed method. Firstly, the time complexity of the base svm classifier is \(O(N^{3})\) [2], where N denotes the size of the dataset, i.e., the number of examples in the learning set. Let us assume that M is the number of objectives and K is the population size. The computational complexity of nsga-ii is \(O(MK^{2})\) [18]; in our case M = 2, so it reduces to \(O(K^{2})\). The last part of the method is Bootstrapping, whose complexity is O(N). The complexity of a single step of semoos is therefore \(O(N^{3}) + O(MK^{2}) + O(N) = O(N^{3} + MK^{2})\), and the dominant term determines the overall computational complexity of all versions of the proposed method.

4 Experimental evaluation

The experiments described in this section are used to test the proposed methods and answer the research questions posed below.

  • RQ1: What is the impact of the semoos parameters (especially Bootstrapping and Pruning) on its quality?

  • RQ2: How do the variants of the semoos method affect classification quality?

  • RQ3: Can semoos methods outperform state-of-the-art algorithms?

  • RQ4: What is the diversity of the semoos ensemble compared to the reference methods?

4.1 Setup

Experiments were prepared using the Python programming language and a few libraries: Pymoo [8], scikit-learn [58], Numpy [55], Matplotlib [32], and Pandas [72]. The implementation of the experiments is available in the GitHub repository.

As the proposed method aims to train an ensemble of svms, the Random Subspace reference method is also based on svm and randomly performs feature selection for its base classifiers. By choosing the right features for each base classifier, diversity is assured. The other reference algorithms do not use the ensemble classifier paradigm: a simple svm classifier without feature selection and two svm models based on feature selection. A description of the semoos algorithm’s variants and the reference methods used in the experiments is presented below.

  • semoos - svm Ensemble with Multi Objective Optimization Selection

  • semoos b - svm Ensemble with Multi Objective Optimization Selection with Bootstrapping

  • semoos bp - svm Ensemble with Multi Objective Optimization Selection with Bootstrapping and Pruning

  • rs - Random Subspace svm Ensemble [31]

  • svm - Support Vector Machines [65]

  • fs - Feature Selection svm

  • fsirsvm - Feature Selection Imbalance Ratio svm

Three versions of the proposed method (semoos, semoos b, semoos bp) are compared with the following benchmark solutions: svm, the rs ensemble, and two classifiers with feature selection (fs and fsirsvm). fs is a classical approach to feature selection based on the Chi-square statistic [68] and the K-best function, which chooses the K best features. fsirsvm is almost the same as fs, but it has an additional parameter, ir – the Imbalance Ratio of each fold, which is applied to the svm as the class weight parameter. fs and fsirsvm select 75% of the features from each dataset. rs creates random subspaces and has 100 models in the ensemble. The Support Vector Machine (svm) classifier with default parameters was used as the base classifier for all methods except the semoos variants: the regularization parameter C = 1, the rbf kernel, the kernel coefficient γ set to scale, probability estimates set to True, and the remaining parameters left at the scikit-learn defaults.
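For illustration, the two feature-selection baselines can be assembled in scikit-learn roughly as follows. The pipelines below are our reconstruction from the description above, not the authors' exact code; note that the chi-square score requires non-negative feature values.

```python
# Sketch of the fs and fsirsvm baselines reconstructed from the text.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC

def make_fs(n_features):
    k = int(0.75 * n_features)                       # keep 75% of the features
    return make_pipeline(SelectKBest(chi2, k=k),
                         SVC(kernel="rbf", probability=True))

def make_fsirsvm(n_features, y_train, minority_label=1):
    k = int(0.75 * n_features)
    # the imbalance ratio of the training fold is used as the minority weight
    ir = np.sum(y_train != minority_label) / np.sum(y_train == minority_label)
    return make_pipeline(SelectKBest(chi2, k=k),
                         SVC(kernel="rbf", probability=True,
                             class_weight={minority_label: ir}))
```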

As mentioned in Section 3, all semoos variants use svm and optimize its parameters C and γ; the remaining svm parameters are left at their defaults. Our methods use the nsga-ii (Non-dominated Sorting Genetic Algorithm) optimization algorithm [18] with a population size of 100 and a mixed representation, because C and γ are real values, while \(\hat{x}\) is a binary vector of selected features. Based on pre-experiments, the following genetic operators were used: Random Sampling, Polynomial Mutation (for the real representation) and Bitflip Mutation (for the binary representation), Simulated Binary Crossover (for the real representation), and Two Point Crossover (for the binary representation). The eta parameter of Simulated Binary Crossover and Polynomial Mutation was set to 5. A constraint on the representation \(\hat{x}\) limits the selection to 75% of the features, i.e., 75% of the entries in the vector are set to 1. The parameter optimization process runs for 1000 evaluations. The size of the ensemble depends on the variant of our methods. In semoos, a parameter sets the ensemble size, which is ten models. In semoos b, this value is multiplied by the number of Bootstrapping iterations (five), so the final ensemble consists of 50 classifiers. Due to the Pruning in semoos bp, we cannot specify its size in advance.

As the experimental protocol, Repeated Stratified K-Fold cross-validation (5 repeats x 2 splits) was chosen [66]. Such cross-validation was also applied inside the optimization to avoid overfitting. Results were saved as csv files for each dataset, fold, and metric. Then, we performed Wilcoxon rank-sum statistical tests to see whether one method was statistically significantly better than another in pairwise comparisons. The following metrics were used to measure the quality of the methods: bac – Balanced Accuracy, Gmean, Recall, and Precision.
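A minimal sketch of this protocol, assuming per-fold Balanced Accuracy as the compared score, could look as follows; the helper names are ours.

```python
# Sketch of the evaluation protocol: 5x2 repeated stratified cross-validation
# and a pairwise Wilcoxon rank-sum test on the per-fold scores.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import balanced_accuracy_score
from scipy.stats import ranksums

def evaluate(clf, X, y, random_state=0):
    rskf = RepeatedStratifiedKFold(n_splits=2, n_repeats=5,
                                   random_state=random_state)
    scores = []
    for train_idx, test_idx in rskf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        scores.append(balanced_accuracy_score(y[test_idx],
                                              clf.predict(X[test_idx])))
    return np.array(scores)                 # 10 per-fold BAC values

# scores_a, scores_b = evaluate(method_a, X, y), evaluate(method_b, X, y)
# stat, p_value = ranksums(scores_a, scores_b)   # p < 0.05: significant difference
```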

All imbalanced datasets used in the experiments are listed in Table 1, where ID is the dataset identifier, Dataset is the name of the dataset, IR is the Imbalance Ratio, Ex. is the number of examples, and Attr. is the number of attributes. They were loaded from the keel dataset repository [70], where the datasets are divided according to ir. The first 19 datasets, separated by the line, have ir lower than 9; the rest have ir higher than 9. We keep this division in our results to check the effectiveness of our methods on data with low and high imbalance. Most of the datasets are originally multi-class problems, but they have already been prepared as binary classification problems. For example, the glass-0-1-2-3_vs_4-5-6 dataset combines classes 0, 1, 2, 3 of the original dataset as the negative (majority) class and classes 4, 5, 6 as the positive (minority) class. The extended description with class names is given by Fernandez et al. [23, 24].

Table 1 Description of datasets

4.2 Experiments

We carried out four groups of experiments to answer the research questions:

  • Selection of the best hyperparameters of the semoos algorithm (Experiment 1).

  • Comparison of the main variants of the proposed method, i.e., investigation of the effect of bootstrapping and ensemble pruning on the quality of semoos (Experiment 2).

  • Comparison of the variants of the proposed method with selected reference algorithms (Experiment 3).

  • Evaluation of classifier ensemble diversity of the proposed variants of semoos (Experiment 4).

4.2.1 Experiment 1: Setting hyperparameters

Our methods have a few parameters, so we decided to conduct pre-experiments to choose the values of these parameters that yield the best quality. We tested semoos b on four datasets (ecoli-0-3-4-7_vs_5-6, glass5, vehicle3, yeast-2_vs_4) and averaged the results presented in the figures.

Firstly, we checked the eta parameter of crossover and mutation for each metric over the values [2, 5, 10, 20]. Deb et al. [19] point out that small eta values for crossover provide a diverse search among solutions. Figure 2 shows exemplary results for one dataset, yeast-2_vs_4, where eta_m and eta_c are the eta parameters for mutation and crossover, respectively. Figures for the remaining datasets are available in the GitHub repository. The best result is marked as a black square with bold, white font. After analyzing all datasets, the results shown in Fig. 2 indicate that the parameters achieving the highest bac, Gmean, and Recall are eta_c = 5 and eta_m = 5. Precision is not the highest for these values, but its value of 0.916 is not much worse.

Fig. 2 First pre-experiment: setting the eta parameter for mutation and crossover (the best result – the black square)

Next, the number of Bootstrapping iterations and the percentage of selected features were tested; exemplary results for the ecoli-0-3-4-7_vs_5-6 dataset are shown in Fig. 3. The situation is similar to the previous one: bac, Gmean, and Recall reach their highest values for bootstrap = 5 and features = 75%.

Fig. 3 Second pre-experiment: setting the number of Bootstrapping iterations and the percentage of selected features (the best result – the black square)

4.2.2 Experiment 2: Comparison of three variants of semoos

Based on the parameters selected in the previous experiment, tests are conducted comparing the proposed semoos, semoos b, and semoos bp methods. The results from the folds and all datasets are averaged, and Wilcoxon rank-sum statistical tests are performed. In the Wilcoxon test figures, rows correspond to the metrics (bac, Gmean, Recall, Precision), and columns are labeled with the methods. Green means that the method wins, yellow that it ties, and red that it loses to the method indicated on the bar of the chart. The black dashed line indicates the statistical significance of the method as winning. Figure 4 shows the test for datasets with an imbalance ratio of less than 9. The results do not indicate any of the methods as statistically significantly better than the others. However, it can be noticed that semoos and semoos bp win against semoos b, especially on the first three metrics, i.e., bac, Gmean, and Recall. Figure 5 covers the 59 datasets with IR > 9. Similar conclusions can be drawn: semoos is better than semoos b, and the other methods are at a similar level.

Fig. 4 Wilcoxon rank-sum test of proposed methods for datasets with IR < 9 (green – win, yellow – tie, red – loss)

Fig. 5 Wilcoxon rank-sum test of proposed methods for datasets with IR > 9 (green – win, yellow – tie, red – loss)

4.2.3 Experiment 3: Comparison with reference methods

A vital element of evaluating a method is comparing it with state-of-the-art methods to check whether the proposed method is statistically significantly better than the others. This experiment compares the three proposed variants of the semoos method with the reference methods rs, svm, fs, and fsirsvm. The Appendix contains a link to tables with the exact values of the metrics averaged over the folds, together with the standard deviation for each dataset. The best result for a given dataset is marked in bold. Each table contains a different metric. The results of the proposed methods are compared with the reference ones. Sometimes the quality is slightly higher or lower, but it is difficult to pinpoint a winning method.

Therefore, Wilcoxon statistical tests show on how many datasets each proposed method wins. As in the previous section, the figures are grouped by imbalance ratio above or below 9. Each semoos variant is compared to all reference methods for the evaluated metrics. All semoos variants behave similarly in Fig. 6, and they are statistically significantly better than fsirsvm for the bac, Gmean, and Recall metrics. They also achieve many wins against the rs method; however, only semoos shows a statistically significant win for the Recall metric.

Fig. 6 Wilcoxon rank-sum test for datasets with IR < 9 (green – win, yellow – tie, red – loss)

The results in Fig. 7 are slightly different for data with high imbalance, for which there are many more datasets. All variants of the semoos method win with statistical significance against the rs method for the bac, Gmean, and Recall metrics. There are also fewer ties between the methods for these three metrics. An important goal for us was to correctly identify the minority class, which is reflected by Recall.

Fig. 7 Wilcoxon rank-sum test for datasets with IR > 9 (green – win, yellow – tie, red – loss)

An analysis of the Pareto front in the objective function space is presented in Fig. 8. Triangles represent the reference methods. Our methods appear in two forms: the method name with the PF suffix (Pareto Front) denotes the Pareto front solutions, while the method name alone denotes the quality of the final constructed ensemble. These scatter plots show how the Pareto front solutions are located relative to the final ensembles and the reference methods. Figure 8a shows an exemplary broad Pareto front, which provides more diverse models. The reference methods lie behind the Pareto front; compared to the ensembles, they obtain a lower Precision value. Figure 8b shows PF solutions that are more concentrated than in Fig. 8a. The semoos variants obtain similar values for both metrics and win over the reference methods. In these examples, the methods score high on both Precision and Recall, but this is not a rule for all datasets.

Fig. 8 Scatter plots with Pareto front and reference methods

4.2.4 Experiment 4: Evaluating semoos ensemble diversity

The last experiment focuses on the evaluation of the diversity of the models in the ensemble. We conducted tests with four diversity metrics proposed in [42] for the three semoos variants and the rs reference ensemble. Figure 9 shows the Q-statistic metric, while the results for all other metrics are available in the GitHub repository.

The Q-statistic is a pairwise diversity measure. It assesses the outputs of two classifiers and quantifies their similarity. Q ranges from − 1 to 1, where Q = 0 means that the classifiers are statistically independent, Q < 0 that the classifiers make mistakes on different objects, and Q > 0 that the classifiers correctly recognize the same objects.
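The following helper sketches the standard pairwise Q-statistic computation (Yule's Q over the joint correct/incorrect counts); it is a minimal illustration rather than the implementation used in the experiments.

```python
# Pairwise Q-statistic for two classifiers, following the standard definition.
import numpy as np

def q_statistic(y_true, pred_i, pred_j):
    correct_i = (pred_i == y_true)
    correct_j = (pred_j == y_true)
    n11 = np.sum(correct_i & correct_j)      # both correct
    n00 = np.sum(~correct_i & ~correct_j)    # both wrong
    n10 = np.sum(correct_i & ~correct_j)     # only the first correct
    n01 = np.sum(~correct_i & correct_j)     # only the second correct
    den = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / den if den else 0.0

# Ensemble diversity is usually reported as the average Q over all pairs;
# values near 0 indicate statistically independent (diverse) members.
```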

From Fig. 9a, it may be concluded that for data with a small imbalance, the diversity of the methods hardly differs. However, for a greater imbalance (Fig. 9b), these differences are significant, and the result of the semoos method is closest to the value 0, which means that the models produced by this method are the most diverse; compared to rs, the difference is 0.3.

4.3 Lessons learned

Let us try to answer the asked research questions considering the obtained results of the experiments.

  • RQ1: What is the impact of the semoos parameters (especially Bootstrapping and Pruning) on its quality?

    During the pre-experiments, four parameters were examined: eta_c for crossover, eta_m for mutation, the number of Bootstrapping iterations, and the number of features. The effect of the eta parameter for both genetic operators depends on the selected metric. The largest difference between the best and worst results is 0.3, but it is usually only a few hundredths. The combination of different values of the eta_c and eta_m parameters does not give unequivocal results. The analysis of these cases showed that good results were obtained for eta_c = 5 and eta_m = 5. It is easier to draw conclusions for the other two parameters. The greater the number of selected features, the higher the value of most metrics. The results are worst for 1 and 10 Bootstrapping iterations. Therefore, we selected 75% of the features and five Bootstrapping iterations for further experiments to obtain the highest classification quality. Bootstrapping and Pruning, as parameters of the semoos method, improve the quality of the tested metrics.

  • RQ2: How do variants of the semoos method affect classification quality?

    None of the proposed methods shows a statistically significant difference compared to the others. Therefore, we included all three variants in further experiments. However, based on the presented statistical test results, semoos and semoos bp are better than semoos b across all datasets.

  • RQ3: Can semoos methods outperform state-of-the-art algorithms?

    Each variant of the semoos method has been compared with four state-of-the-art methods, and the statistical test results for each semoos variant are similar if we consider the number of wins. However, the results vary depending on the level of dataset imbalance. For datasets with an imbalance ratio below 9, the proposed methods outperform the fsirsvm method with statistical significance and achieve a high number of wins against the rs method. For imbalance ratios above 9, statistical significance is obtained against the rs algorithm.

  • RQ4: What is the diversity of the semoos ensemble compared to the reference methods?

    The last element of the research shows that all proposed methods are more diversified than the Random Subspace (rs) method. When analyzing the figure for the greater imbalance ratio, which covers a larger number of datasets, it is noticeable that the semoos method performs best.

5 Conclusion

The main goal of this work was to use multi-objective optimization (nsga-ii) with two independent fitness functions, Precision and Recall, to classify imbalanced data. Three ensemble classifiers using svm as the base model were proposed: semoos, semoos b, and semoos bp. semoos is the basic version that adds all svm models obtained from the optimization. semoos b adds Bootstrapping, i.e., resampling of the dataset with replacement is performed to obtain more samples from one dataset. The last version is semoos bp, which includes both Bootstrapping and the Pruning needed to remove redundant models from the ensemble.

The experiments were performed according to the experimental protocol on 78 imbalanced datasets. First, the semoos hyperparameters were set to select the parameter values that produce the best results for this kind of data. Then, the three versions of the semoos method were compared with each other, but none of them showed statistically significant superiority; therefore, all versions were selected for further research. The main experiment compared the proposed methods with various state-of-the-art methods: a single svm classifier, an ensemble classifier (rs), and two methods using feature selection (fs and fsirsvm). Statistical tests showed that the semoos variants outperform the rs and fsirsvm methods with statistical significance. The last element of the work was to compare the diversity of the models in the ensemble methods. The presented results show that the proposed methods are more diversified than the SOTA solutions.

Our methods could also be tested on multi-class problems, because data binarization is only a simplification of this problem. Due to the feature selection in the proposed methods and potentially good results for datasets with more features, it would be worthwhile to check more such datasets. If too few real datasets are available, it is worth considering artificially generated data, which also ensures controlled conditions for testing the methods. The comparison could also be made with other reference methods that have more elements in common with the semoos, semoos b, or semoos bp methods. A final direction for future work concerns the optimization itself: using other multi-objective optimization methods, e.g., moea/d, other metrics in the fitness functions, or increasing their number.