All experiments were run on an Intel Core(TM) i5-2217U CPU at 1.70 GHz with 6 GB of RAM under 64-bit Windows 8 (x64 processor).
The dataset normalization
Dataset normalization, also called feature scaling, is a mandatory preprocessing step before starting the classification task. It prevents variables in greater numeric ranges from dominating those in smaller numeric ranges. The feature values are linearly scaled to the range \([-1,+1]\) or \([0, 1]\) using formula (2), where \(X\) denotes the original value, \(X'\) denotes the scaled value, \(MAX_{a}\) is the upper bound of feature \(a\), and \(MIN_{a}\) is the lower bound of feature \(a\).
Table 2 An instance of the Australian dataset with scaled values
Table 3 The results of 50 runs of LS+SVM on the German dataset
In our study, we scaled the different feature values to the range \([-1, +1]\).
$$\begin{aligned} X^{'} = \left( \frac{X - MIN_{a}}{MAX_{a} - MIN_{a}} \right) \times 2 - 1. \end{aligned}$$
(2)
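Formula (2) can be sketched in code as follows. This is a minimal illustration, not the authors' implementation; the function name and the per-column handling are our own choices, and a guard for constant columns is added to avoid division by zero.

```python
import numpy as np

def scale_to_range(X, lo=-1.0, hi=1.0):
    """Linearly scale each column of X to [lo, hi] as in formula (2)."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)                             # MIN_a per feature
    maxs = X.max(axis=0)                             # MAX_a per feature
    span = np.where(maxs > mins, maxs - mins, 1.0)   # guard constant columns
    unit = (X - mins) / span                         # map to [0, 1]
    return unit * (hi - lo) + lo                     # then to [lo, hi]

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
print(scale_to_range(A))  # each column becomes [-1, 0, 1]
```

With the default bounds this reproduces the \([-1,+1]\) scaling used in our experiments; passing `lo=0.0, hi=1.0` gives the alternative \([0,1]\) range.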
The dataset description
To evaluate the performance of the proposed methods for credit scoring, we considered both the German and Australian credit datasets from the UCI (University of California at Irvine) Machine Learning Repository. The descriptions of the two credit datasets are given as follows:
1. The German credit dataset was provided by Professor Hans Hofmann of Universität Hamburg. It consists of 1000 instances split into two classes: class 1 (worthy, 700 instances) and class 0 (unworthy, 300 instances). Two versions of the German dataset are available on UCI:
- The original dataset "german.data", which contains categorical/symbolic variables. It has 20 variables, of which 7 are numerical and 13 are categorical.
- The "german.data-numeric" dataset provided by Strathclyde University for use with algorithms that cannot cope with categorical variables. It has 24 numerical attributes. In our experiments, we worked on the "german.data-numeric" version.
An example of an instance of "german.data-numeric" before and after the scaling process is given in Table 1. We note that an instance describes the profile of a given applicant.
2. The Australian Credit Approval dataset was provided by Quinlan [39]. It concerns credit card applications and consists of 690 instances of loan applicants. There are two classes: class 1 (worthy, 307 instances) and class 0 (unworthy, 384 instances). It has 14 variables: 6 numerical and 8 categorical. An example of an instance of the Australian dataset is given in Table 2.
Numerical results
Due to the non-deterministic nature of the proposed methods, 50 runs were considered for each dataset and each method. In the following, we give the results obtained with the LS+SVM, SLS+SVM and VNS+SVM methods, reporting the accuracy rate of each run for each method on each dataset.
We compute summary statistics on the accuracy and the number of selected variables: the minimum (Min), the mean, the median, the first quartile (first Qu.), the third quartile (third Qu.) and the maximum (Max). We also give the best solution found, with the best accuracy, for each dataset. The results are given in Tables 3, 4, 5, 6, 7 and 8.
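The summary statistics listed above are straightforward to compute; a minimal sketch is given below. The accuracy values shown are illustrative placeholders, not results from the paper.

```python
import numpy as np

# Accuracy rates from hypothetical runs (illustrative values only).
acc = np.array([76.9, 77.1, 77.4, 77.0, 77.7, 77.2])

stats = {
    "Min":       np.min(acc),
    "first Qu.": np.percentile(acc, 25),
    "Median":    np.median(acc),
    "Mean":      np.mean(acc),
    "third Qu.": np.percentile(acc, 75),
    "Max":       np.max(acc),
}
for name, value in stats.items():
    print(f"{name:>9}: {value:.3f}")
```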
Table 4 The results of 50 runs of LS+SVM on the Australian dataset
Table 5 The results of 50 runs of SLS+SVM on the German dataset
Table 6 The results of 50 runs of SLS+SVM on the Australian dataset
Table 7 The results of 50 runs of VNS+SVM on the German dataset
Table 8 The results of 50 runs of VNS+SVM on the Australian dataset
From Tables 3, 4, 5, 6, 7 and 8, we observe that different runs can select the same number of variables yet reach different accuracies. The local search-based feature selection methods do not lead to the same solution when applied to the same problem, owing to their non-deterministic nature. In addition, some variables have a significant effect on solution quality, so accuracy improves when such variables are selected. We conclude that the generated solutions are not unique.
We can obtain solutions with the same number of variables but different accuracy rates because the selected variables are not always the same. For example, Table 8 shows that the solutions with eight selected variables found in runs 1, 4 and 39 are not identical despite having the same number of selected variables; their accuracy rates are 86.811%, 86.521% and 86.376%, respectively.
For instance, the following two solutions each have 8 selected variables. The solution "0 1 1 1 1 1 1 0 0 1 0 0 1 0" has an accuracy rate equal to 86.811%; its selected variables are A2, A3, A4, A5, A6, A7, A10 and A13. The solution "1 1 1 0 0 1 0 1 1 0 1 0 0 1" has an accuracy rate equal to 86.521%; its selected variables are A1, A2, A3, A6, A8, A9, A11 and A14. This means that, for the Australian dataset, the set of variables {A2, A3, A4, A5, A6, A7, A10, A13} is more significant than the set {A1, A2, A3, A6, A8, A9, A11, A14}.
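The 0/1 solution strings above can be decoded into variable names mechanically; a small helper (our own, purely illustrative) makes the mapping explicit:

```python
def selected_variables(mask):
    """Map a 0/1 solution string to variable names A1..An:
    bit i set to 1 means variable A(i+1) is selected."""
    bits = mask.split()
    return [f"A{i + 1}" for i, b in enumerate(bits) if b == "1"]

print(selected_variables("0 1 1 1 1 1 1 0 0 1 0 0 1 0"))
# ['A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A10', 'A13']
```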
According to the numerical results, the three methods succeed in finding good results on the two considered datasets. However, we see a slight performance advantage in favor of the variable neighborhood search (VNS), which is able to find better solutions than LS and SLS. Hence, we conclude that the VNS method with its four different neighborhood structures is effective for feature selection and classification.
The superiority of VNS is due to its good balance of intensification and diversification, which allows it to explore the search space effectively and locate good solutions.
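The intensification/diversification interplay can be sketched as follows. This is a generic VNS skeleton over 0/1 feature masks, not the paper's exact algorithm: `evaluate` stands in for the SVM cross-validation accuracy used in our experiments, the neighborhood structure (flip k random bits) and the one-bit-flip local search are simplifying assumptions, and all names are our own.

```python
import random

def vns_feature_select(n_vars, evaluate, k_max=4, max_iters=100, seed=0):
    """Generic VNS sketch: shaking in neighborhood k (diversification)
    followed by first-improvement bit-flip local search (intensification)."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n_vars)]
    best_score = evaluate(best)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            # Shaking: flip k randomly chosen bits of the incumbent.
            cand = best[:]
            for i in rng.sample(range(n_vars), k):
                cand[i] ^= 1
            # Local search: keep any single-bit flip that improves the score.
            cand_score = evaluate(cand)
            improved = True
            while improved:
                improved = False
                for i in range(n_vars):
                    cand[i] ^= 1
                    s = evaluate(cand)
                    if s > cand_score:
                        cand_score, improved = s, True
                    else:
                        cand[i] ^= 1  # undo the flip
            if cand_score > best_score:
                best, best_score = cand[:], cand_score
                k = 1          # improvement: restart from the first neighborhood
            else:
                k += 1         # no improvement: try a larger neighborhood
    return best, best_score
```

When no improvement is found, the search moves to a larger neighborhood (more bits flipped), which is precisely the diversification mechanism discussed above.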
In addition to the numerical results given in Tables 3, 4, 5, 6, 7 and 8, we draw the boxplots in Figs. 3 and 4 to better visualize the distribution of the classification accuracy values.
The box diagrams depicted in Figs. 3 and 4 visualize the distribution of the classification accuracy over the 50 runs for each algorithm on both the Australian and German datasets. They show clearly that, in general, VNS is able to produce good solutions. The results are promising and demonstrate the benefit of the proposed technique for feature selection. To further demonstrate its effectiveness in credit scoring, we give further comparisons in the next subsection.
A comparison with a pure SVM
In this section, we compare the three proposed methods, LS+SVM, SLS+SVM and VNS+SVM, with a pure SVM on both the German and Australian datasets. The aim is to show the impact of feature selection on the classification task.
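The baseline is simple to reproduce in spirit: train an SVM on all features and score it by cross-validation, then score the same SVM on a selected subset. The sketch below assumes scikit-learn and uses a synthetic dataset as a stand-in; the kernel and its defaults are our assumptions, since the paper's exact SVM settings are not restated here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a credit dataset (24 features, like german.data-numeric).
X, y = make_classification(n_samples=300, n_features=24, random_state=0)

def subset_accuracy(mask):
    """Cross-validated SVM accuracy on the columns selected by `mask`."""
    cols = [i for i, b in enumerate(mask) if b == 1]
    return cross_val_score(SVC(), X[:, cols], y, cv=5).mean()

full = subset_accuracy([1] * X.shape[1])   # pure SVM: all features
print(f"pure SVM accuracy: {full:.3f}")
```

A feature selection method then searches over masks, calling `subset_accuracy` as its objective, and keeps the mask with the highest score.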
Table 9 gives the results obtained with the SVM, LS+SVM, SLS+SVM and VNS+SVM methods. We report the best accuracy rate and the corresponding set of significant variables returned by each method.
As we can see from Table 9, the three proposed methods outperform the pure SVM and are able to find good results on the two considered datasets. SLS and LS are comparable, and both succeed in improving the accuracy rate of SVM.
Further, the VNS+SVM method is more effective than both LS+SVM and SLS+SVM on both the Australian and German datasets. We draw Fig. 5 (respectively Fig. 6) to compare a pure SVM with our approach in terms of accuracy rate (respectively the number of selected variables). The performance of our approach compared to SVM is shown clearly in Figs. 5 and 6. We note that:
- LS returns 13 significant selected variables for the German dataset: A2, A4, A10, A12, A13, A15, A17, A18, A19, A20, A22, A23 and A24. The accuracy rate is equal to 77.70%. The significant variables returned by LS for the Australian dataset are A2, A3, A4, A6, A7, A9, A10, A13 and A14; the accuracy rate is equal to 86.38% and the number of selected variables is 9.
- SLS returns the following significant selected variables for the German dataset: A1, A3, A10, A13, A15, A16, A17, A18, A19, A20, A22, A23 and A24. The accuracy rate is equal to 77.90%. The significant variables found by SLS for the Australian dataset are A2, A4, A6, A7, A9, A10, A11, A13 and A14; the accuracy rate is equal to 86.38% and the number of selected variables is 9.
- The 16 significant selected variables returned by VNS for the German dataset are A2, A4, A5, A6, A9, A11, A12, A13, A14, A15, A16, A17, A18, A19, A22 and A24, where the accuracy rate is equal to 78%. For the Australian dataset, VNS returns 8 significant variables: A2, A3, A4, A5, A6, A7, A10 and A13. The best accuracy rate is equal to 86.81%.
In this section, we compared the three proposed methods with feature selection to a pure SVM to measure the effectiveness of the additional feature selection method. As shown in Table 9, the proposed methods perform better than the pure SVM on both German and Australian datasets.
Further, we remark that fewer features are selected for the SVM model than the initial number of features in the dataset. This implies that some features are redundant and should be eliminated to enhance the classification accuracy.
Further comparison
To show the performance of the proposed approaches in credit scoring, we evaluated them against some well-known classifiers available in the WEKA data mining software package [43].
We compared our approaches with the following popular classifiers: the rule-learning scheme PART, ZeroR, JRip, BayesNet, NaiveBayes, AdaBoost, AttributeSelectedClassifier, Bagging, RandomForest, RandomTree and J48. These eleven classifiers from WEKA [43] were used with their default parameters as set in WEKA.
We also add a comparison with two well-known filtering methods: correlation-based feature selection with best-first search (CFS) and the information gain ranking filter (IGRF). We note that CFS is a correlation-based feature selection method that can be used to select a set of variables; however, CFS is unable to select all relevant variables when there are strong dependencies between variables. The IGRF ranking filter selects a set of variables from the original dataset using scores or weights [19, 37]. We combined these two feature selection methods (CFS and IGRF) with SVM to classify the data.
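The score behind the IGRF filter is the classical information gain of a feature with respect to the class labels. A minimal, self-contained illustration for discrete features (not the WEKA implementation) is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(labels) in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(feature) = H(labels) - sum_v p(v) * H(labels | feature = v)."""
    n = len(labels)
    cond = 0.0
    for v, cnt in Counter(feature).items():
        subset = [l for f, l in zip(feature, labels) if f == v]
        cond += cnt / n * entropy(subset)
    return entropy(labels) - cond

# Toy example: the first feature determines the class, the second is noise.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
y  = [0, 0, 1, 1]
print(information_gain(f1, y), information_gain(f2, y))  # 1.0 0.0
```

A ranking filter sorts the features by this score and keeps the top ones, independently of the classifier that is applied afterwards.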
Table 10 compares the three proposed methods (LS+SVM, SLS+SVM and VNS+SVM), the eleven classifiers from WEKA, CFS+SVM and IGRF+SVM on the two considered datasets, Australian and German. The comparison is in terms of the average classification accuracy rates.
As shown in Table 10, the three proposed approaches (LS+SVM, SLS+SVM and VNS+SVM) are comparable to the well-known classifiers, with a slight advantage for our VNS+SVM method. VNS+SVM gives the highest average classification accuracy compared to PART, JRip, BayesNet, NaiveBayes, AdaBoost, AttributeSelectedClassifier, Bagging, RandomForest, RandomTree and J48 on both the Australian and German datasets.
Further, we remark that the OneR and VNS+SVM classifiers are comparable on the Australian dataset, but VNS+SVM is better than OneR on the German dataset. OneR gives an average accuracy of 86.6% on the Australian dataset, yet it fails on the German dataset, where its average accuracy is only 60.8%. The VNS+SVM method succeeds in finding good results on both datasets: on the Australian dataset it gives an average accuracy of 86.50% when VNS is used as a feature selection method within the SVM classifier, and on the German dataset it gives the best average accuracy, 77.46%, compared to all the considered classifiers.
When we compare the feature selection methods (CFS, IGRF and our three local search methods), our approaches provide good results compared to both the CFS and IGRF ranking methods. For example, on the German dataset, SVM with CFS as the feature selection method gives an average accuracy of 72.70%, while SVM with IGRF gives an average accuracy of 75.6%. The results are much better with our proposed approaches, in particular VNS combined with SVM: as already said, VNS+SVM gives the best average accuracy, 77.46%, on the German dataset. This performance is also confirmed on the Australian dataset with an average accuracy of 86.50%.
Table 9 SVM vs. LS+SVM vs. SLS+SVM vs. VNS+SVM
Table 10 A comparison according to the average classification accuracy rates
In conclusion, the three proposed approaches (LS+SVM, SLS+SVM and VNS+SVM) are comparable. However, the most promising results are obtained when combining SVM with the VNS-based feature selection method. This improvement is observed on both considered datasets, which demonstrates the ability of VNS+SVM as a good classifier for credit scoring.