
Ensemble of Classifiers with Modification of Confidence Values

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9842)

Abstract

In the classification task, ensembles of classifiers have attracted more and more attention in the pattern recognition community. Generally, ensemble methods have the potential to significantly improve the predictions of the base classifiers included in the team. In this paper, we propose an algorithm which modifies the confidence values obtained as the outputs of the base classifiers. The experimental results based on thirteen data sets show that the proposed method is promising for the development of multiple classifier systems. We compared the proposed method with other known ensembles of classifiers and with all the base classifiers.

Keywords

Multiple classifier system · Decision profile · Confidence value

1 Introduction

For several years, in the field of supervised learning, a number of base classifiers have been used together to solve one classification task. The use of multiple base classifiers for one decision problem is known as an ensemble of classifiers (EoC) or a multiple classifier system (MCS) [12, 15]. The building of an MCS consists of three main phases: generation, selection and integration (fusion) [4]. When the ensemble is generated by injecting randomness into the learning algorithm or by manipulating the training objects [3, 11], we speak of homogeneous classifiers. In the second approach the ensemble is composed of heterogeneous classifiers, which means that several different learning algorithms are applied to the same data set.

The selection phase is related to the choice of a set of classifiers from the whole available pool of base classifiers [14]. If we choose a single classifier, the process is called classifier selection; if we choose a subset of base classifiers from the pool, it is called ensemble selection or ensemble pruning. Generally, two approaches to ensemble selection exist: static ensemble selection and dynamic ensemble selection [2, 4].

A number of articles [7, 16, 21] present a large variety of fusion methods. In the third phase, for example, the simple majority voting scheme [19] is the most popular. Generally, the final decision made in this phase uses the predictions of the base classifiers, and fusion is popular for its ability to combine multiple classification outputs into a more accurate final classification. According to their functioning, fusion methods can be divided into selection-based, fusion-based and hybrid ones [6]. Fusion strategies can also be divided into fixed and trainable ones [7]. Another division distinguishes class-conscious and class-indifferent integration methods [16].

In this work we consider a method that modifies the confidence values. The proposed method is based on information from decision profiles. The decision scheme used in the training phase is created only from those confidence values that correspond to correct classifications.

The remainder of this paper is organized as follows. Section 2 presents the concept of the base classifier and the ensemble of classifiers. Section 3 describes the proposed method for the modification of confidence values. The experimental evaluation and a discussion of the results are presented in Sect. 4. Section 5 concludes the paper.

2 Supervised Classification

2.1 Base Classifiers

The aim of supervised classification is to assign an object to a specific class label. The object is represented by a set of d features, or attributes, viewed as a d-dimensional feature vector x. The recognition algorithm maps the feature space X to the set of class labels \(\varOmega \) according to the general formula:
$$\begin{aligned} \varPsi :X\rightarrow \varOmega . \end{aligned}$$
(1)
The recognition algorithm defines the classifier, which in the complex classification task with multiple classifiers is called a base classifier.

The output of a base classifier can be divided into three types [16].

  • The abstract level – the classifier \(\psi \) assigns the unique label j to a given input x.

  • The rank level – in this case, for each input (object) x, each classifier produces an integer rank array. Each element of this array corresponds to one of the defined class labels. The array is usually sorted, with the label at the top being the first choice.

  • The measurement level – the output of a classifier is represented by a confidence value (CV) that expresses the degree of support for assigning a given class label to the input x. An example of such an output is the posterior probability returned by a Bayes classifier. Generally, this level provides richer information than the abstract and rank levels.

In this work we consider the situation in which each base classifier returns CVs. Additionally, before the final combination of the base classifiers' outputs, a CV modification process is carried out.
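For concreteness, the three output levels can be illustrated with a probabilistic classifier; the sketch below uses scikit-learn with placeholder data (the classifier choice and data are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Placeholder training data: 100 objects, 4 features, 3 class labels.
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, 100)

clf = GaussianNB().fit(X, y)
x = np.random.rand(1, 4)

cv = clf.predict_proba(x)[0]   # measurement level: one CV per class label
ranks = np.argsort(-cv)        # rank level: labels ordered by support
label = clf.predict(x)[0]      # abstract level: the single chosen label
```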

2.2 Ensemble of Classifiers

Let us assume that K different classifiers \(\varPsi _1,\varPsi _2,\ldots ,\varPsi _K\), indexed by \(k\in \{1,2,\ldots ,K\}\), are available to solve the classification task. The output information from all K component classifiers is used to make the final decision of the MCS, which is based on the predictions of all the base classifiers.

One of the possible methods for integrating the outputs of the base classifiers is the sum rule. In this method the score of the MCS is based on the following sums:
$$\begin{aligned} s_{\omega }(x)=\sum _{k=1}^{K}p_k(\omega |x), \qquad \omega \in \varOmega , \end{aligned}$$
(2)
where \(p_k(\omega |x)\) is CV for class label \(\omega \) returned by classifier k.
The final decision of MCSs is made following the maximum rule:
$$\begin{aligned} \varPsi _{S}(x)= \arg \max _{\omega } s_{\omega }(x). \end{aligned}$$
(3)
In method (3) the CVs obtained from the individual classifiers contribute equally to the decision of the MCS. This is the simplest situation, in which no information about the base classifiers is needed beyond their trained models. One of the possible methods in which weights of the base classifiers are used is presented in [5].
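As a minimal sketch, the sum rule (2)-(3) can be written in a few lines of Python; the matrix cv below is a hypothetical \(K \times |\varOmega |\) array of CVs for one object, one row per base classifier:

```python
import numpy as np

def sum_rule(cv):
    """Sum rule fusion: Eq. (2) followed by the maximum rule, Eq. (3).
    cv: array of shape (K, n_classes) holding p_k(omega|x) for one object."""
    s = cv.sum(axis=0)        # s_omega(x), Eq. (2)
    return int(np.argmax(s))  # Psi_S(x), Eq. (3)

# Example: three base classifiers, two class labels.
cv = np.array([[0.7, 0.3],
               [0.4, 0.6],
               [0.8, 0.2]])
print(sum_rule(cv))  # -> 0
```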

The decision template (DT) approach, proposed in [17], is another way to build an MCS. In this model one DT per class label is calculated from the training set. In the operation phase, the similarity between each DT and the base classifiers' outputs for an object x is computed, and the class label of the closest DT is assigned to x. In this paper the algorithm using DTs is labelled \(\varPsi _{DT}\) and is used as one of the reference classifiers.
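A minimal sketch of the DT approach, assuming the mean decision profile per class as the template and Euclidean distance as the similarity measure (both are common choices in [17], but other variants exist):

```python
import numpy as np

def fit_decision_templates(dps, y, n_classes):
    """One template per class label: the mean DP over the training
    objects of that class. dps has shape (N, K, n_classes)."""
    return np.stack([dps[y == c].mean(axis=0) for c in range(n_classes)])

def predict_dt(dp, templates):
    """Assign the class label of the closest template (Euclidean)."""
    dists = [np.linalg.norm(dp - t) for t in templates]
    return int(np.argmin(dists))
```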

3 Modification of Confidence Values Algorithm

3.1 Training Phase

The proposed algorithm for the modification of CVs uses DPs. A DP is a matrix containing the CVs of each base classifier, i.e.:
$$\begin{aligned} DP(x)= \left[ \begin{array}{ccc} p_1(0|x) &{} \dots &{}p_1(\varOmega |x) \\ \vdots &{} \dots &{}\vdots \\ p_K(0|x) &{} \dots &{}p_K(\varOmega |x) \\ \end{array} \right] . \end{aligned}$$
(4)
In the first step of the algorithm we remove the CVs which relate to misclassifications on the training set. This set contains N labelled examples \(\{(x_1,\overline{\omega }_1),...,(x_N,\overline{\omega }_N)\}\), where \(\overline{\omega }_i\) is the true class label of the object described by feature vector \(x_i\). The CVs are removed according to the formula:
$$\begin{aligned} p'_k(\omega |x)= \left\{ \begin{array}{crr} p_k(\omega |x),&{}\quad \text {if} &{}\quad I(\varPsi (x),\overline{\omega })=1 \\ 0,&{} \text {if} &{} \quad I(\varPsi (x),\overline{\omega })=0. \end{array} \right. \end{aligned}$$
(5)
where \(I(\varPsi (x),\overline{\omega })\) is an indicator function having the value 1 in the case of the correct classification of the object described by feature vector x, i.e. when \(\varPsi (x)=\overline{\omega }\); here \(\varPsi \) denotes the base classifier \(\varPsi _k\) whose CV is being considered.
In the next step of our algorithm, the decision scheme (DS) is calculated according to the formula:
$$\begin{aligned} DS(\beta )= \left[ \begin{array}{ccc} ds(\beta )_{10} &{} \dots &{}ds(\beta )_{1\varOmega } \\ \vdots &{} \dots &{}\vdots \\ ds(\beta )_{K0} &{} \dots &{}ds(\beta )_{K\varOmega } \\ \end{array} \right] , \end{aligned}$$
(6)
where
$$\begin{aligned} ds(\beta )_{k\omega } =\overline{ds}_{k\omega } + \beta \sqrt{\frac{\sum _{n=1}^{N}(p'_k(\omega _n|x_n)\, - \overline{ds}_{k\omega })^2}{N-1}} \end{aligned}$$
(7)
and
$$\begin{aligned} \overline{ds}_{k\omega } =\frac{\sum _{n=1}^{N} p'_k(\omega _n|x_n)}{N}. \end{aligned}$$
(8)
The parameter \(\beta \) in our algorithm determines how the DS elements are computed. For example, if \(\beta =0\), then \(ds_{k\omega }\) is the average of the corresponding CVs remaining after applying condition (5).
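A sketch of the training phase, under two reading assumptions: the indicator in (5) is evaluated per base classifier, and the sums in (7)-(8) run over \(p'_k(\omega |x_n)\) for all N training objects:

```python
import numpy as np

def decision_scheme(dps, y_true, y_pred, beta):
    """Compute DS(beta) per Eqs. (5)-(8).
    dps:    (N, K, n_classes) CVs p_k(omega|x_n) on the training set
    y_true: (N,)   true class labels
    y_pred: (N, K) label predicted by each base classifier"""
    # Eq. (5): zero out the CVs of misclassified training objects.
    correct = (y_pred == y_true[:, None])   # indicator I(Psi(x), omega-bar)
    p = dps * correct[:, :, None]
    # Eq. (8): per-classifier, per-class mean of the remaining CVs.
    mean = p.mean(axis=0)
    # Eq. (7): mean plus beta times the sample standard deviation (N - 1).
    std = p.std(axis=0, ddof=1)
    return mean + beta * std
```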

3.2 Operation Phase

During the operation phase the modification of the CVs is carried out using the \(DS(\beta )\) calculated in the training phase. For a new object x being recognized, the outputs of the base classifiers form DP(x) as in (4). The CVs from DP(x) are modified with the use of the \(DS(\beta )\) defined in (6), according to the formula:
$$\begin{aligned} {p'}_k(\omega |x)= \left\{ \begin{array}{lll} \overline{m}*{p}_k(\omega |x) &{}\quad \text {if } &{} {p}_k(\omega |x) \ge ds(1)_{k\omega }\\ m*{p}_k(\omega |x) &{} \quad \text {if } &{} ds(0)_{k\omega }< {p}_k(\omega |x) < ds(1)_{k\omega },\\ \underline{m}*{p}_k(\omega |x) &{}\quad \text {if } &{} {p}_k(\omega |x) \le ds(0)_{k\omega }\\ \end{array} \right. \end{aligned}$$
(9)
where \(\overline{m}\), m and \(\underline{m}\) define how the original CVs are modified. The modification process according to Eq. (9) causes the ensemble method to use the modified CVs. The algorithm using the proposed method is denoted as \(\varPsi _{MCV}\). In the experiments the modified CVs are combined according to the sum method (2)-(3).
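A sketch of the operation phase, using the parameter values \(\overline{m}=1.25\), \(m=1\), \(\underline{m}=0.75\) adopted later in the experiments; ds0 and ds1 stand for \(DS(0)\) and \(DS(1)\) from the training phase:

```python
import numpy as np

def modify_cvs(dp, ds0, ds1, m_hi=1.25, m_mid=1.0, m_lo=0.75):
    """Eq. (9): rescale each CV depending on where it falls
    relative to DS(0) and DS(1). All arrays: (K, n_classes)."""
    out = dp.copy()
    out[dp >= ds1] *= m_hi                  # amplify strong supports
    out[dp <= ds0] *= m_lo                  # attenuate weak supports
    out[(dp > ds0) & (dp < ds1)] *= m_mid   # keep intermediate supports
    return out

# The modified CVs are then fused by the sum method (2)-(3):
# label = int(np.argmax(modify_cvs(dp, ds0, ds1).sum(axis=0)))
```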

4 Experimental Studies

In the experiments we use 13 data sets. Nine of them come from the Keel Project [1] and four come from the UCI Repository [9] (Blood, Breast Cancer Wisconsin, Indian Liver Patient and Mammographic Mass UCI). The details of the data sets are given in Table 1. Sixteen base classifiers from four different classification models were used: the first group of four base classifiers works according to the \(k\)-NN rule, the second group uses Support Vector Machines, the next group uses neural networks, and the last group uses decision tree algorithms. The base classifiers are labelled \(\varPsi _1,...,\varPsi _{16}\).

The main aim of the experiments was to compare the classification quality of the proposed CV modification algorithm \(\varPsi _{MCV}\) with the base classifiers \(\varPsi _1,...,\varPsi _{16}\) and with their ensembles without selection, \(\varPsi _{DT}\) and \(\varPsi _S\). The parameters of the \(\varPsi _{MCV}\) algorithm were set to \(\overline{m}=1.25\), \(m=1\) and \(\underline{m}=0.75\). No feature selection [13, 18] was performed, and the standard 10-fold cross-validation method was used.
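The exact configurations of the 16 base classifiers are not listed in the text; a hypothetical pool in the same spirit (four classifiers from each of the four families, evaluated with 10-fold cross-validation) could look as follows:

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical pool: the paper uses 16 base classifiers from four
# families (k-NN, SVM, neural network, decision tree) but does not
# specify their hyper-parameters; the values below are illustrative.
pool = (
    [KNeighborsClassifier(n_neighbors=k) for k in (1, 3, 5, 7)]
    + [SVC(kernel=k, probability=True) for k in ("linear", "rbf", "poly", "sigmoid")]
    + [MLPClassifier(hidden_layer_sizes=(h,), max_iter=1000) for h in (5, 10, 20, 40)]
    + [DecisionTreeClassifier(max_depth=d) for d in (2, 4, 8, None)]
)

# scores = [cross_val_score(clf, X, y, cv=10).mean() for clf in pool]
```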
Table 1. Description of data sets selected for the experiments

Data set                | Examples | Attributes | Ratio (0/1)
Cylinder bands          | 365      | 19         | 1.7
Haberman's survival     | 306      | 3          | 2.8
Hepatitis               | 80       | 19         | 0.2
Mammographic mass       | 830      | 5          | 1.1
Parkinson               | 197      | 23         | 0.3
Pima Indians diabetes   | 768      | 8          | 1.9
South African heart     | 462      | 9          | 1.9
Spectf heart            | 267      | 44         | 0.3
Statlog (heart)         | 270      | 13         | 1.3
Blood                   | 748      | 5          | 3.2
Breast cancer Wisconsin | 699      | 10         | 1.9
Indian liver patient    | 583      | 10         | 0.4
Mammographic mass UCI   | 961      | 6          | 1.2

Table 2. Classification accuracy and mean rank positions for the base classifiers (\(\varPsi _1,...,\varPsi _{16}\)), the algorithms \(\varPsi _{S}\) and \(\varPsi _{DT}\), and the proposed algorithm \(\varPsi _{MCV}\), produced by the Friedman test (higher accuracy and lower mean rank are better)

Clas.             | Bands | Blood | Cance | Haber | Heart | Hepat | Liver | MamU | Mam  | Park | Pima | Sahea | Spectf | Rank
\(\varPsi _1\)    | 0.62  | 0.77  | 0.95  | 0.75  | 0.68  | 0.83  | 0.69  | 0.81 | 0.79 | 0.86 | 0.73 | 0.67  | 0.80   | 10.70
\(\varPsi _2\)    | 0.56  | 0.79  | 0.96  | 0.76  | 0.69  | 0.83  | 0.70  | 0.81 | 0.77 | 0.81 | 0.73 | 0.69  | 0.78   | 10.58
\(\varPsi _3\)    | 0.61  | 0.80  | 0.96  | 0.74  | 0.68  | 0.84  | 0.70  | 0.81 | 0.78 | 0.80 | 0.74 | 0.68  | 0.77   | 10.50
\(\varPsi _4\)    | 0.63  | 0.76  | 0.95  | 0.75  | 0.70  | 0.84  | 0.65  | 0.81 | 0.77 | 0.86 | 0.72 | 0.64  | 0.80   | 11.81
\(\varPsi _5\)    | 0.64  | 0.80  | 0.98  | 0.74  | 0.79  | 0.85  | 0.73  | 0.80 | 0.78 | 0.91 | 0.67 | 0.69  | 0.80   | 7.31
\(\varPsi _6\)    | 0.64  | 0.74  | 0.87  | 0.69  | 0.56  | 0.84  | 0.28  | 0.70 | 0.74 | 0.45 | 0.66 | 0.66  | 0.25   | 16.54
\(\varPsi _7\)    | 0.64  | 0.79  | 0.94  | 0.75  | 0.56  | 0.84  | 0.73  | 0.79 | 0.79 | 0.73 | 0.59 | 0.67  | 0.80   | 11.50
\(\varPsi _8\)    | 0.64  | 0.77  | 0.93  | 0.71  | 0.56  | 0.84  | 0.74  | 0.80 | 0.76 | 0.74 | 0.66 | 0.66  | 0.80   | 13.04
\(\varPsi _9\)    | 0.64  | 0.81  | 0.96  | 0.74  | 0.77  | 0.76  | 0.70  | 0.83 | 0.81 | 0.87 | 0.64 | 0.66  | 0.78   | 9.66
\(\varPsi _{10}\) | 0.65  | 0.81  | 0.95  | 0.72  | 0.77  | 0.76  | 0.70  | 0.84 | 0.81 | 0.88 | 0.65 | 0.66  | 0.78   | 9.47
\(\varPsi _{11}\) | 0.65  | 0.81  | 0.94  | 0.71  | 0.73  | 0.79  | 0.68  | 0.83 | 0.81 | 0.87 | 0.65 | 0.66  | 0.76   | 11.31
\(\varPsi _{12}\) | 0.66  | 0.80  | 0.96  | 0.71  | 0.77  | 0.85  | 0.69  | 0.82 | 0.81 | 0.93 | 0.66 | 0.66  | 0.79   | 8.27
\(\varPsi _{13}\) | 0.63  | 0.80  | 0.94  | 0.74  | 0.82  | 0.84  | 0.69  | 0.84 | 0.81 | 0.91 | 0.73 | 0.73  | 0.73   | 7.81
\(\varPsi _{14}\) | 0.61  | 0.78  | 0.94  | 0.67  | 0.74  | 0.85  | 0.62  | 0.82 | 0.80 | 0.85 | 0.72 | 0.66  | 0.75   | 12.85
\(\varPsi _{15}\) | 0.63  | 0.79  | 0.94  | 0.70  | 0.79  | 0.81  | 0.65  | 0.84 | 0.80 | 0.90 | 0.70 | 0.68  | 0.76   | 11.20
\(\varPsi _{16}\) | 0.66  | 0.80  | 0.94  | 0.73  | 0.80  | 0.85  | 0.66  | 0.83 | 0.80 | 0.88 | 0.72 | 0.72  | 0.69   | 8.50
\(\varPsi _{DT}\) | 0.65  | 0.69  | 0.97  | 0.73  | 0.78  | 0.89  | 0.64  | 0.81 | 0.79 | 0.84 | 0.71 | 0.65  | 0.78   | 10.89
\(\varPsi _{S}\)  | 0.67  | 0.82  | 0.96  | 0.75  | 0.84  | 0.82  | 0.72  | 0.84 | 0.81 | 0.94 | 0.73 | 0.70  | 0.79   | 4.39
\(\varPsi _{MCV}\)| 0.68  | 0.82  | 0.97  | 0.74  | 0.83  | 0.84  | 0.70  | 0.84 | 0.81 | 0.93 | 0.74 | 0.71  | 0.80   | 3.74

The classification accuracies and the mean ranks obtained from the Friedman test for the algorithms used in the experiments are presented in Table 2. Considering only the mean ranks, the best result is achieved by the proposed algorithm \(\varPsi _{MCV}\). The results were also compared with a post-hoc test [20]. The critical difference (CD) for this test at \(p=0.05\) is \(CD=7.76\) for 19 classification methods and 13 data sets. The post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers: \(\varPsi _4\), \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). The method \(\varPsi _S\) is statistically better than three base classifiers: \(\varPsi _6\), \(\varPsi _8\) and \(\varPsi _{14}\). This suggests that the proposed algorithm improves on the ensemble of classifiers using the plain sum method. It should be noted, however, that the difference in average ranks between \(\varPsi _{MCV}\) and \(\varPsi _{S}\) is not large enough to indicate a statistically significant difference.
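For reference, the Nemenyi critical difference is \(CD = q_{\alpha }\sqrt{K(K+1)/(6N)}\); a quick check with K = 19 methods, N = 13 data sets and the tabulated constant \(q_{0.05}\approx 3.52\) for 19 groups reproduces a value close to the reported 7.76:

```python
import math

k, n = 19, 13   # compared methods, data sets
q_alpha = 3.52  # approximate Nemenyi constant for 19 groups at p = 0.05

cd = q_alpha * math.sqrt(k * (k + 1) / (6 * n))
print(round(cd, 2))  # -> 7.77
```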

5 Conclusion

In this paper we have proposed a method that uses information from the decision profiles and modifies the CVs received from the base classifiers. The aim of the experiments was to compare the proposed algorithm with all the base classifiers and with ensembles based on the sum and decision template methods. The experiments were carried out on 13 benchmark data sets. The post-hoc Nemenyi test detects significant differences between the proposed algorithm \(\varPsi _{MCV}\) and four base classifiers, while the results of the algorithm \(\varPsi _S\) are statistically different from three base classifiers. The obtained results show an improvement in the classification quality of the proposed algorithm \(\varPsi _{MCV}\) with respect to all the base classifiers and the reference ensemble methods. Future work might involve the application of the proposed method to various practical tasks [8, 10, 22] in which base classifiers are used, as well as an investigation of its ability to work in parallel and distributed environments.


Acknowledgments

This work was supported by the Polish National Science Center under the grant no. DEC-2013/09/B/ST6/02264 and by the statutory funds of the Department of Systems and Computer Networks, Wroclaw University of Technology.

References

  1. Alcalá, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Mult.-Valued Logic Soft Comput. 17(2–3), 255–287 (2010)
  2. Baczyńska, P., Burduk, R.: Ensemble selection based on discriminant functions in binary classification task. In: Jackowski, K., et al. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 61–68. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_8
  3. Breiman, L.: Randomizing outputs to increase prediction accuracy. Mach. Learn. 40(3), 229–242 (2000)
  4. Britto, A.S., Sabourin, R., Oliveira, L.E.: Dynamic selection of classifiers – a comprehensive review. Pattern Recogn. 47(11), 3665–3680 (2014)
  5. Burduk, R.: Classifier fusion with interval-valued weights. Pattern Recogn. Lett. 34(14), 1623–1629 (2013)
  6. Canuto, A.M., Abreu, M.C., de Melo Oliveira, L., Xavier, J.C., Santos, A.D.M.: Investigating the influence of the choice of the ensemble members in accuracy and diversity of selection-based and fusion-based methods for ensembles. Pattern Recogn. Lett. 28(4), 472–486 (2007)
  7. Duin, R.P.: The combining classifier: to train or not to train? In: Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, pp. 765–770. IEEE (2002)
  8. Forczmański, P., Łabedź, P.: Recognition of occluded faces based on multi-subspace classification. In: Saeed, K., Chaki, R., Cortesi, A., Wierzchoń, S. (eds.) CISIM 2013. LNCS, vol. 8104, pp. 148–157. Springer, Heidelberg (2013)
  9. Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)
  10. Frejlichowski, D.: An algorithm for the automatic analysis of characters located on car license plates. In: Kamel, M., Campilho, A. (eds.) ICIAR 2013. LNCS, vol. 7950, pp. 774–781. Springer, Heidelberg (2013)
  11. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: ICML, vol. 96, pp. 148–156 (1996)
  12. Giacinto, G., Roli, F.: An approach to the automatic design of multiple classifier systems. Pattern Recogn. Lett. 22, 25–33 (2001)
  13. Inbarani, H.H., Azar, A.T., Jothi, G.: Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Comput. Methods Programs Biomed. 113(1), 175–185 (2014)
  14. Jackowski, K., Krawczyk, B., Woźniak, M.: Improved adaptive splitting, selection: the hybrid training method of a classifier based on a feature space partitioning. Int. J. Neural Syst. 24(03), 1430007 (2014)
  15. Korytkowski, M., Rutkowski, L., Scherer, R.: From ensemble of fuzzy classifiers to single fuzzy rule base classifier. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2008. LNCS (LNAI), vol. 5097, pp. 265–272. Springer, Heidelberg (2008)
  16. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)
  17. Kuncheva, L.I., Bezdek, J.C., Duin, R.P.: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recogn. 34(2), 299–314 (2001)
  18. Rejer, I.: Genetic algorithm with aggressive mutation for feature selection in BCI feature space. Pattern Anal. Appl. 18(3), 485–492 (2015)
  19. Ruta, D., Gabrys, B.: Classifier selection for majority voting. Inf. Fusion 6(1), 63–81 (2005)
  20. Trawiński, B., Smętek, M., Telec, Z., Lasota, T.: Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. Int. J. Appl. Math. Comput. Sci. 22(4), 867–881 (2012)
  21. Xu, L., Krzyżak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)
  22. Zdunek, R., Nowak, M., Pliński, E.: Statistical classification of soft solder alloys by laser-induced breakdown spectroscopy: review of methods. J. Eur. Opt. Soc.-Rapid Publ. 11(16006), 1–20 (2016)

Copyright information

© IFIP International Federation for Information Processing 2016

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland
