1 Introduction

Deep Neural Networks (DNNs) have become increasingly popular in computer science, especially in the computer vision community, for the classification of images and other types of multimedia content. The neural network was originally based on a binary classifier, the perceptron, but it was later extended with more output nodes for multiclass classification. Each node in the output layer of a DNN can still be regarded as a binary classifier, and the outputs of these classifiers can be combined by information fusion as an alternative to the state-of-the-art multiclass DNN classifiers.

In computer vision, object detection in images is a common task, and there are application areas where not only the misclassification ratio (which measures the average) but also the largest error between any two classes is crucial. For example, the recognition of traffic signs is important, and confusing one sign with another should be avoided. So, in our special task, the aim was not only to achieve high accuracy in the classification but also to avoid a large error between any two classes. The accuracy indicator can measure the average goodness (the number of correct decisions divided by all decisions of the classifier), but it cannot measure the bottleneck of this classification problem. Therefore, new indicators were also required to evaluate the classifiers.

The focus of this paper is on the error-correcting output codes (ECOC) combination of binary DNN classifiers. The decoding of the results of the binary classifiers is based on the nearest code (closest row) in the ECOC matrix, which contains ones and zeros indicating whether a class belongs to a binary classifier or not. The coding methods can be categorized into data-dependent and data-independent codes; our research focuses only on the latter. Both have a large literature, and current ECOC research deals with weighted decoding for the competence reliability problem [24] and novel strategies focusing on the accuracy-complexity trade-off [1]. In the ECOC literature, early researchers used classical binary classifiers (e.g., SVM) as base learners; recently, deep neural networks (DNNs) have also been considered. The ECOC generation can be based on the Hadamard matrix; this Hadamard-ECOC algorithm with a DNN as the base classifier has been proposed in a recent paper [9]. However, the drawback of Hadamard-ECOC is that the number of binary classifiers cannot be chosen arbitrarily; it is always equal to \(2^{k} - 1\). For example, if the number of classes is 5 or 6, then the number of binary classifiers is 7. An open question in ECOC design is how to construct a good ECOC matrix. Existing methods answer this question, but they focus only on the general goodness of the classifier. The main purpose of our research was to strengthen the bottleneck of the ensemble classifier, where we examined (i) the mistakes and (ii) the mistake pairs in the confusion matrix of the classifier. The latter consists of two values: the number of errors where a given real class is predicted as some other class, and the number of errors where those two classes switch roles. The number of errors in the confusion matrix can be normalized by (i) the number of elements in the corresponding real classes or by (ii) all elements. In the former case, we took into account the sum of the mistake pairs (as can be seen later in Sect. 4.1, where the LCPE indicator is defined), and in the latter case, we used the number of mistakes (for the LME indicator, also defined in Sect. 4.1). Our aim was to minimize the largest values of these ratios. Our motivation in this research was to find a theoretically optimal design for this subproblem and to use it in the experiments with greater flexibility, where the number of binary classifiers can be selected from a range.

2 Related works

In machine learning, one can take advantage of using multiple models together through different strategies, like bagging, boosting [15], and stacking [16], and a frequent case in this topic is when the model is a binary classifier. In the literature, there are many possible approaches to reduce multiclass to binary classification problems [12, 34] (in special cases the binary problem can be a binary descriptor), which can be used in different areas, like medical prognosis [19], stock market prediction [37], or visual computing, e.g., pedestrian detection [42]. Various authors modified the multiclass classification approach, such as one versus one (OVO) [17, 44], one versus all (OVA) [13, 39], which is also used for transforming classifier scores into estimates of multiclass probabilities [41], and the Directed Acyclic Graph (DAG), which creates many binary (base) classifiers and combines their results to determine the correct class label [3, 10, 27]. Error Correcting Output Coding (ECOC) [11] is also a frequently used approach for the multiclass classification problem; ECOC is a generic ensemble classification framework built around an ECOC matrix, which decomposes the task into several two-class problems, and the results of the binary classifiers are aggregated by information fusion. For a special task (cancer classification), where only a few samples are available in each class, a special ECOC with a hierarchical ensemble strategy, named Hierarchical Ensemble of Error Correcting Output Codes (HE-ECOC) [25], was a good solution. Besides the multiclass problem, biometric cryptosystems can also use the ECOC framework because they often require a binarization phase to transform the original real-valued templates into their binary versions; there is a solution that combines a genetic algorithm with ECOC matrices using specific crossover, mutation, and extension operations that consider the properties of optimally constructed ECOC matrices [28].

Many types of machine learning methods, e.g., the Support Vector Machine [23], SVM with a Gaussian kernel [32], logistic regression, the naive Bayes classifier, the multi-layer perceptron, the C4.5 decision tree, and the Multiple Birth Least Squares Support Vector Machine [8], have already been investigated in information fusion [14]. Researchers constructed combined methods; for example, OVO-SVM, OVA-SVM, and ECOC-SVM were developed [36] as combinations of SVM with OVO, OVA [26], and ECOC; but it is worth examining Deep Neural Networks (DNNs) as well. Deep Neural Networks can be used in end-to-end machine learning for image classification without manual feature extraction; thus, our work focused on image datasets instead of investigating the UCI datasets [40, 45] with a simple data structure. A DNN consists of smaller parts that can be considered binary classifiers (perceptrons); thus, in this paper, we analyzed only classifiers where the number of classes in the base learners is two, although there are other possibilities, e.g., the one-class classifier [21], the N-ary classifier [43], and the Generic Subclass Ensemble [5].

Researchers developed ECOC for Recurrent Neural Networks (especially LSTM) [29] and for other types of DNN; for the Convolutional Neural Network (CNN), a construction based on the Hadamard matrix, the so-called Hadamard-ECOC, was proposed [9]. Another study analyzes ECOC in the context of deep learning (CNN) research, focusing on the accuracy-complexity trade-off [1]. We also planned to develop ECOC for CNNs, not only constructing an ECOC matrix but theoretically finding the best matrix from the point of view mentioned at the end of Sect. 1. We considered the base binary learners as independent classifiers, in contrast to the paper [30], where the authors examined the concept of correlation among them.

Row separation is an important measure for evaluating the error-correcting ability of the ECOC coding matrix [2]. The codes for different classes are expected to be as dissimilar as possible; otherwise, it is easier to commit errors. Thus, the capability of error correction relies on the distances among the rows [43]. The absolute distance and the Hamming distance can be examined, but we used only the latter because in the binary case they are equivalent. The Hamming distance cannot be exploited in a DNN without ECOC coding, because in a DNN one-hot encoding is required for multiclass labels. From the coding theory perspective, the Hamming distance between one-hot codes is always two, which does not allow error-detection or error-correcting capabilities. ECOC coding provides more possibilities for encoding categorical data into the output codes, which mitigates the limitations of one-hot encoding mentioned above [20]. So, we used ECOC coding with the Hamming distance, but contrary to the paper [20], we did not apply Zadeh fuzzy logic.
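For concreteness, a minimal sketch of the nearest-code decoding used in ECOC fusion could look as follows (our illustration, not the exact implementation of the cited works; thresholding the binary outputs at 0.5 is an assumption, and weighted or soft decoding variants are not shown):

```python
import numpy as np

def ecoc_decode(binary_outputs, ecoc_matrix):
    """Nearest-row ECOC decoding: return the class whose code row is closest
    (in Hamming distance) to the thresholded binary classifier outputs."""
    bits = (np.asarray(binary_outputs) >= 0.5).astype(int)  # threshold assumed at 0.5
    distances = np.sum(ecoc_matrix != bits, axis=1)         # Hamming distance to each code row
    return int(np.argmin(distances))
```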

Deep Neural Networks (DNNs) are multiclass classifiers, but we used networks constructed so that the DNN operated as a binary classifier. Two types of construction were developed: one type was a self-made Convolutional Neural Network (briefly, CNN21), and the other type was taken from the literature. CNN21 consists of 21 layers: convolutional layers with the ReLU activation function, max-pooling layers, dropout, and a softmax layer; and we used the binary cross-entropy loss function (the details can be seen in Table 18 in the Appendix).

Furthermore, we used the VGG19 [33] and ResNet50 [18] architectures from the literature; however, the last fully connected layer (of both VGG19 and ResNet50) was replaced with a new layer for binary classification. For the multiclass problem, an appropriate ECOC construction was needed, which is described in the next section.
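As an illustration of how such a binary base learner can be assembled, a minimal Keras-style sketch is shown below; the input size, pooling layer, and optimizer are our own assumptions for illustration, and the paper's exact layer configuration is given in Table 18 in the Appendix.

```python
import tensorflow as tf

def make_binary_base_learner(input_shape=(224, 224, 3)):
    # Pretrained ResNet50 backbone without its original multiclass head
    backbone = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    # New output layer for binary classification (replaces the last fully connected layer)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(backbone.input, out)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",  # loss function used in the paper
                  metrics=["accuracy"])
    return model
```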

We demonstrated our information fusion results on complex datasets, testing them on four datasets, e.g., on the Fashion-MNIST dataset instead of the easier MNIST dataset used in papers [4, 6] (and the paper [31] also confirms that Fashion-MNIST is more difficult for deep learning models to learn).

3 Min–Max ECOC method

3.1 ECOC construction

The number of rows in the ECOC matrix is given because it is equal to the number of classes (the number of different labels) in the classification problem; let us denote this number by \(N_L\). The number of columns is larger than \(N_L\), but a value that is too large causes a long runtime during learning; this number is denoted by N in this paper. Before calculating the theoretical largest N (depending on \(N_L\)), we present the constraints of the design. No column vector may consist of only “1” values (or only “0” values), because the columns encode the classes for the binary classification learning, and at least one class must fall on each side of the two-group split. Another constraint concerns uniqueness: each column vector should be different from the others, otherwise the columns would be redundant during learning. This uniqueness condition is not enough; an additional condition is that two column vectors cannot be complements of each other, because in binary classification swapping the “1” and “0” label values leads to the same situation with the same results. Thus, the maximal number of columns is equal to \(2^{{N_{L} - 1}} - 1\), as can be seen in the next equation.

$$ N \le 2^{{N_{L} - 1}} - 1 $$
(1)

When N reaches the maximum value according to this equation, we call this kind of matrix a full ECOC matrix. In the literature, the number of columns is greater than or equal to the number of rows in the ECOC matrix, and this restriction was also applied in our work, i.e., \(N_{L} \le N\) (this restriction is necessary but not sufficient for error-correcting capabilities).
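To make these constraints concrete, a small sketch that checks whether a candidate 0/1 matrix satisfies them could look as follows (our illustration in Python/NumPy; rows correspond to classes, columns to binary learners):

```python
import numpy as np

def is_valid_ecoc(M):
    """Check the design constraints for a 0/1 ECOC matrix M (rows: classes, columns: dichotomies)."""
    n_classes, n_cols = M.shape
    cols = [tuple(M[:, k]) for k in range(n_cols)]
    for k, col in enumerate(cols):
        # no constant column: each binary learner must see both label values
        if len(set(col)) < 2:
            return False
        # columns must be pairwise distinct and must not be complements of each other
        comp = tuple(1 - c for c in col)
        if col in cols[:k] or comp in cols[:k]:
            return False
    # at most 2^(N_L - 1) - 1 admissible columns exist (Eq. 1)
    return n_cols <= 2 ** (n_classes - 1) - 1
```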

3.2 Construction of full ECOC matrix

In the ECOC matrix, the row vectors determine the goodness of the decisions, so having different row vectors is an advantage. The difference between row vectors is measured by the Hamming distance function. The symmetric matrix constructed from each pair of row vectors is called the Hamming matrix; it contains the Hamming distance values of the pairs.

Theorem 1

For the full ECOC matrix, the entries in the Hamming matrix are all the same, except for the diagonal, where the values are equal to zero.

Theorem 2

For the full ECOC matrix, the positive entries in the Hamming matrix are equal to \(2^{{N_{L} - 2}}\).

A full ECOC matrix can be constructed from a smaller full ECOC matrix whose number of rows is less by one. We describe the construction algorithm, and based on it the proofs of both theorems will be presented.

Algorithm 1 (shown as a figure in the original article): recursive construction of a full ECOC matrix with \(N_L + 1\) rows from a full ECOC matrix with \(N_L\) rows.
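A minimal sketch of this construction, reconstructed from the steps spelled out in the proof below, is given here; the concrete choice of the appended column and the ordering of the “10” snippets are assumptions of ours, and the result is a full ECOC matrix up to column order and complementation.

```python
import numpy as np

def algorithm1_extend(M):
    """Sketch of Algorithm 1: build an (NL+1)-class full ECOC matrix from an NL-class one.

    Steps (reconstructed from the proof of Theorems 1-2): duplicate every column,
    append one extra column (assumed all-zeros for the original rows), and append
    a new last row of "10" snippets over the duplicated pairs, ending in 1.
    """
    n_rows, n_cols = M.shape
    duplicated = np.repeat(M, 2, axis=1)               # each column appears twice
    extra_col = np.zeros((n_rows, 1), dtype=M.dtype)   # assumed layout of the added column
    top = np.hstack([duplicated, extra_col])
    last_row = np.array([[1, 0] * n_cols + [1]], dtype=M.dtype)  # "10" snippets + final 1
    return np.vstack([top, last_row])

# Example: starting from the 3-class full ECOC matrix (the identity matrix, cf. Table 1)
full3 = np.eye(3, dtype=int)
full4 = algorithm1_extend(full3)   # a 4 x 7 full ECOC matrix (cf. Table 2)
```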

Proofs of the theorems

We prove both theorems by mathematical induction. When \(N_L = 3\), the matrix in Table 1 is a full ECOC matrix, and the statements of Theorems 1 and 2 are true because the Hamming distance between any two different rows is 2; thus the diagonal entries of the Hamming matrix are 0 and the other entries are equal to 2.

Table 1 Full ECOC matrix with 3 classes (identity matrix)

Let us suppose that both theorems are true when \(N_L = n\); we will prove that they are also true in the next step (\(N_L = n + 1\)). Let us use Algorithm 1; for example, when \(N_L = 3\), the algorithm gives the full ECOC matrix for \(N_L = 4\) presented in Table 2.

Table 2 Full ECOC matrix with 4 classes

The duplication step of the algorithm doubles the Hamming distance between every pair of row vectors of the original matrix. The addition of the last column does not change the Hamming distances among them, because its entries are identical for all original rows. Since in the original matrix the Hamming distances of every pair are equal, in the new matrix these pairs (i.e., all rows except the last one) remain equal as well. The added row (last row) of the new matrix consists of double snippets (“1” and “0”, briefly 10). The Hamming distance between such a snippet and the corresponding snippet (i.e., the entries in the same columns) of any other row vector is always 1, because the corresponding snippet can only be 00 or 11. The number of snippets is \(N = 2^{n - 1} - 1\). The last entry of the last row differs from every other entry in the last column, thus we add 1 when calculating the Hamming distance between the last row and any other row (we obtain \(2^{n-1}\)). These are the distances that belong to the last row; let us now consider the distances between any two rows other than the last one. These Hamming distances are \(2 \cdot 2^{n - 2} = 2^{n - 1}\), as discussed above. Since \(N_L = n + 1\), i.e., \(n = N_{L} - 1\), the Hamming distances are \(2^{n - 1} = 2^{{N_{L} - 2}}\), as stated in Theorem 2. Furthermore, Theorem 1 is also true, because the entries of the new Hamming matrix are all the same, except for the diagonal, where the values are equal to zero. □

3.3 Min–Max ECOC matrix for optimized fusion

In the fusion of outputs, the number of binary classifiers is sometimes too large; thus, the full ECOC matrix is avoided in most cases to keep the learning procedure from becoming too long. The Hamming matrix, which contains the Hamming distance values of each pair of row vectors, can help in the ECOC design. A low value in this matrix indicates that the corresponding row vectors are close to each other, which can cause a larger mistake between the two classes belonging to these rows. If the aim is to minimize the largest values (mistakes and mistake pairs) in the confusion matrix, then we should maximize the lowest value in the Hamming matrix (H) derived from the ECOC matrix M, as can be seen in the next equations.

$$ H_{ij} = \mathop \sum \limits_{k = 1}^{N} \left| {M_{ik} - M_{jk} } \right| \quad i,j = 1,2, \ldots ,N_{L} $$
(2)
$$ maximize\; H_{min} ,\quad {\text{where}}\; H_{min} = \mathop {\min }\limits_{i \ne j} \left( {H_{ij} } \right) $$
(3)

Let S(NL, N) denote the set of all possible ECOC matrices (with \(N_L\) rows and N columns) that meet all requirements (constraints) described above. The largest \(H_{min}\) in this set is denoted by \(H_{opt}\), and the ECOC matrix achieving it is called the Min–Max ECOC matrix.

$$ H_{opt} \left( {N_{L} ,N} \right) = \mathop {\max }\limits_{{M \in S\left( {N_{L} ,N} \right)}} \left( {H_{min} } \right) $$
(4)
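As a concrete sketch of Eqs. (2)–(4), the Hamming matrix, \(H_{min}\), and a brute-force search for \(H_{opt}\) over column subsets of the full ECOC matrix can be computed as follows (our illustration in Python/NumPy; restricting the search to subsets of the full matrix relies on the fact that complementing a column does not change row distances, and the exhaustive search is only feasible for small problems):

```python
import numpy as np
from itertools import combinations, product

def hamming_matrix(M):
    """Pairwise Hamming distances between the rows of the ECOC matrix M (Eq. 2)."""
    return np.sum(M[:, None, :] != M[None, :, :], axis=2)

def h_min(M):
    """Smallest off-diagonal entry of the Hamming matrix (Eq. 3)."""
    H = hamming_matrix(M)
    n = H.shape[0]
    return int(H[~np.eye(n, dtype=bool)].min())

def full_ecoc_columns(n_classes):
    """All admissible columns: non-constant 0/1 vectors, one per complement pair (first entry fixed to 1)."""
    cols = []
    for bits in product([0, 1], repeat=n_classes - 1):
        col = (1,) + bits             # fixing the first entry removes complements
        if any(b == 0 for b in col):  # drop the all-ones column
            cols.append(col)
    return np.array(cols).T           # shape: n_classes x (2^(n_classes - 1) - 1)

def h_opt_bruteforce(n_classes, n_cols):
    """Exhaustive search for H_opt(N_L, N) over all column subsets of the full matrix (Eq. 4)."""
    full = full_ecoc_columns(n_classes)
    best = 0
    for idx in combinations(range(full.shape[1]), n_cols):
        best = max(best, h_min(full[:, list(idx)]))
    return best
```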

The full matrix is always a Min–Max ECOC matrix, because all matrices in S(NL, Nmax) are equivalent (they differ only in column order and in complemented columns, which do not affect the Hamming matrix), so there is nothing to optimize. Based on Theorems 1 and 2, we can write the following.

$$ H_{opt} \left( {N_{L} ,N_{max} } \right) = \mathop {\max }\limits_{{M \in S\left( {N_{L} ,N_{max} } \right)}} \left( {H_{min} } \right) = 2^{{N_{L} - 2}} $$
(5)
$$ {\text{where}}\quad N_{max} = 2^{{N_{L} - 1}} - 1 $$
(6)

3.4 Theorems for Min–Max ECOC matrix

Before the learning procedure of the classification, the number of columns (N) should be decided. After this decision, a good ECOC matrix should be found: the optimized one, previously called the Min–Max ECOC matrix, or one close to this optimum when the optimization would take too long. The running time of the optimization is shorter if N is smaller (provided that N is less than half of the maximal \(2^{{N_{L} - 1}} - 1\)); thus, we can reduce the optimization problem to finding a smaller ECOC matrix. If we have a Min–Max ECOC matrix in smaller dimensions (the number of rows is one less than the number of classes, the number of columns is half of the chosen N), then based on this Min–Max ECOC matrix we can construct a good ECOC matrix for the actual classification problem: (i) if N is odd, then we should use Algorithm 1; (ii) otherwise (i.e., N is even), Algorithm 2 constructs the output (as can be seen below). Note that although the input matrix of Algorithm 1 is not a full matrix here, the algorithm can still be used.

Algorithm 2 (shown as a figure in the original article): construction of an \(N_L\)-class ECOC matrix with an even number of columns N from a Min–Max ECOC matrix with \(N_L - 1\) rows and N/2 columns.
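Analogously to Algorithm 1, a minimal sketch of Algorithm 2, reconstructed from the proof of Theorem 5 below, could be the following (the layout of the appended row is again our assumption):

```python
import numpy as np

def algorithm2_extend(M):
    """Sketch of Algorithm 2: from an (NL-1) x (N/2) ECOC matrix build an NL x N one.

    Reconstructed from the proof of Theorem 5: duplicate every column and append a
    new last row consisting of "10" snippets over the duplicated column pairs.
    """
    n_rows, n_cols = M.shape
    duplicated = np.repeat(M, 2, axis=1)                   # doubles all row-pair distances
    last_row = np.array([[1, 0] * n_cols], dtype=M.dtype)  # distance N/2 to every original row
    return np.vstack([duplicated, last_row])
```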

While searching for a good ECOC matrix (by random generation or by a constructive algorithm), the \(H_{min}\) values of the trials fall in a range between \(H_{worst}\) (defined below) and \(H_{opt}\).

$$ H_{worst} \left( {N_{L} ,N} \right) = \mathop {\min }\limits_{{M \in S\left( {N_{L} ,N} \right)}} \left( {H_{min} } \right) $$
(7)

For a small fusion problem, where the number of classes is 4, we calculated the \(H_{worst}\) and \(H_{opt}\) values by investigating all alternatives (by exhaustive search); the results can be seen in Table 3.

Table 3 \(H_{worst}\) and \(H_{opt}\) values with different N, where NL = 4

Without exhaustive search, the \(H_{worst}\) values can be determined based on the next theorem, Theorem 3.

Theorem 3

The \(H_{worst}\) value of the full ECOC matrix is equal to its \(H_{opt}\) value. Starting from \(N_{max}\) and decreasing N by 1 (at the same \(N_L\)), the \(H_{worst}\) value is also reduced by 1 until it reaches zero (after that, it remains zero). This value can be determined by a closed-form expression, as can be seen in the next equation.

$$ H_{worst} \left( {N_{L} ,N} \right) = \max \left( {0, 2^{{N_{L} - 2}} - \left( {2^{{N_{L} - 1}} - 1 - N} \right)} \right) = \max \left( {0, N - 2^{{N_{L} - 2}} + 1} \right) $$
(8)
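For example, with \(N_L = 4\) (so \(N_{max} = 7\) and \(2^{{N_L - 2}} = 4\)), Eq. (8) gives \(H_{worst}(4,7) = 4\), \(H_{worst}(4,6) = 3\), \(H_{worst}(4,5) = 2\), and \(H_{worst}(4,4) = 1\).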

Proof of the theorem

In the beginning, we have the full ECOC matrix, and in each step we try to delete a column that reduces the \(H_{min}\) value by the largest amount. We search for the two rows of the actual ECOC matrix whose Hamming distance is the smallest, and we delete a column in which these two rows differ. Thus, the Hamming distance between them decreases by 1. If the two rows are already the same, then the actual \(H_{min}\) value is equal to zero and cannot be reduced further. Otherwise, they differ in at least one column, so by deleting this column the \(H_{min}\) value is also reduced by 1.□

The \(H_{opt}\) values cannot be determined so easily, but based on the next theorems, lower and upper limits can be expressed.

Theorem 4

The upper limit for the \(H_{opt}\) values can be seen in the next equation.

$$ H_{opt} \left( {N_{L} ,N} \right) \le 2^{{N_{L} - 2}} $$
(9)

Proof of the theorem

The equality is true when \(N = N_{max}\); otherwise, the inequality holds, because any matrix in S(NL, N) can be obtained by deleting columns from the full ECOC matrix (up to complementing some columns, which does not change the Hamming distances between rows), and deleting columns cannot cause larger Hamming distances among the rows.

$$ H_{opt} \left( {N_{L} ,N} \right) \le H_{opt} \left( {N_{L} ,N_{max} } \right) = 2^{{N_{L} - 2}} $$
(10)

Theorem 5

The lower limit for the \(H_{opt}\) values can be seen in the following equation.

$$ {\text{min}}\left( {2 \cdot H_{opt} \left( {N_{L} - 1,\lfloor\frac{N}{2}\rfloor} \right),\lfloor\frac{N}{2}\rfloor} \right) \le H_{opt} \left( {N_{L} ,N} \right) $$
(11)

where \(\lfloor x \rfloor\) is the floor of x (x rounded down to an integer).

Proof of the theorem

Let us suppose that we have a Min–Max ECOC matrix with \(N_{L} - 1\) rows and half of N columns. If N is even, then Algorithm 2 constructs the output: in the first step (duplication of columns), all Hamming distances become twice as large, and thus so does the \(H_{min}\) value. The Hamming distance between the additional row and any other row is N/2, because the number of columns before the algorithm was N/2, and the Hamming distance between each (“1”, “0”) snippet of the new row and the corresponding (“0”, “0”) or (“1”, “1”) snippet of an original row is always 1. We do not know which of the two quantities is smaller, thus the new \(H_{min}\) value is the minimum of them, as can be seen in the equation of the theorem.

If N is odd, then using Algorithm 1 we obtain a new ECOC matrix (\(N_L\), N) from the ECOC matrix (\(N_L - 1\), (N − 1)/2). The additional column in the algorithm does not change the Hamming distances, so the statements for the even-N case also hold. By the construction (Algorithm 1 or Algorithm 2), the \(H_{min}\) value described above can always be reached, so \(H_{opt}\) is equal to or larger than this \(H_{min}\) value.□

4 Evaluation

4.1 Metrics for evaluation

In this section, we present our experimental evaluation. We used the well-known accuracy indicator, which is equal to \(1 - misclassification\;rate\) in classification. Accuracy measures the average goodness (the number of correct decisions divided by all decisions), and in our task we were also interested in the goodness of the worst class. Therefore, the minimum of the precision, the minimum of the recall, and the minimum of the F1 value (the smallest value among the per-class F1 values, as can be seen in the next equation) were also measured. We also calculated their average, the macro average of F1 (the sum of the per-class F1 values divided by the number of classes).

$$ minimal\; F_{1} = \mathop {\min }\limits_{c \in C} \frac{2 \cdot precision\left( c \right) \cdot recall\left( c \right)}{{precision\left( c \right) + recall\left( c \right)}} \quad C\, is\, the\, set\, of\,classes $$
(12)

Furthermore, we defined two new indicators based on the confusion matrix \((\overline{\overline{A}} )\) computed from the predicted and the true values.

$$ \overline{\overline{A}} = \left( {a_{ij} } \right) \in R^{n \times n} $$
(13)

The confusion matrix contains the counts of the different cases, where each entry (\(a_{ij}\)) gives the number of cases in which the predicted class is i while the real class is j. We introduce the Largest Class Pair Error (LCPE) as the largest ratio of mistakes between two classes:

$$ LCPE = \mathop {\max }\limits_{i \ne j} \frac{{\left( {a_{ij} + a_{ji} } \right)}}{{\left| {C_{i} } \right| + \left| {C_{j} } \right|}} \quad i,j = 1,2, \ldots ,n $$
(14)

where the denominator is the sum of the numbers of elements in the real classes \(C_i\) and \(C_j\), which can also be determined from the confusion matrix (as can be seen in the next equation).

$$ \left| {C_{j} } \right| = \mathop \sum \limits_{k = 1}^{n} a_{kj} \quad j = 1,2, \ldots ,n $$
(15)

The other new indicator is the Largest Mistake Error (LME), which measures the largest ratio of mistakes among all classes (relative to all cases).

$$ LME = \mathop {\max }\limits_{i \ne j} \frac{{a_{ij} }}{{Sum\left( {\overline{\overline{A}} } \right)}} \quad i,j = 1,2, \ldots ,n $$
(16)

where

$$ Sum\left( {\overline{\overline{A}} } \right) = \mathop \sum \limits_{j = 1}^{n} \mathop \sum \limits_{i = 1}^{n} a_{ij} \quad i,j = 1,2, \ldots ,n $$
(17)
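A small sketch of how these indicators can be computed from predictions follows (our illustration using NumPy and scikit-learn; the helper function names are ours):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def lcpe(y_true, y_pred):
    """Largest Class Pair Error (Eq. 14): worst pairwise mistake ratio."""
    A = confusion_matrix(y_true, y_pred).T   # A[i, j]: predicted class i, real class j
    class_sizes = A.sum(axis=0)              # |C_j| as in Eq. (15)
    n = A.shape[0]
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            worst = max(worst, (A[i, j] + A[j, i]) / (class_sizes[i] + class_sizes[j]))
    return worst

def lme(y_true, y_pred):
    """Largest Mistake Error (Eq. 16): largest off-diagonal entry relative to all cases."""
    A = confusion_matrix(y_true, y_pred).T
    off_diag = A - np.diag(np.diag(A))
    return off_diag.max() / A.sum()

def minimal_f1(y_true, y_pred):
    """Minimum of the per-class F1 values (Eq. 12)."""
    return float(f1_score(y_true, y_pred, average=None).min())
```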

4.2 Results on 4 datasets

Our experiments were based on four datasets: the Linnaeus [7], FMNIST [38], GTSRB [35], and CIFAR-10 [22] datasets. These image datasets were selected by a criterion on the number of classes, to obtain a medium (i.e., not too small and not too large) number of categories. Linnaeus consists of 5 classes of images: berry, bird, dog, flower, and other. The images are color images with 256 × 256 pixels; there are 1200 training images and 400 test images in each class. FMNIST (Fashion-MNIST) is a dataset comprising 28 × 28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. We randomly selected 5 and 6 classes among these categories (this was repeated a few times, and we calculated the average of the results), and we also tested all 10 categories. The third dataset is the GTSRB (German Traffic Sign Recognition Benchmark); this benchmark is a multi-class, single-image classification task, where each image contains one traffic sign. The images are stored in PPM format, and their sizes vary from 15 × 15 to 250 × 250 pixels. The dataset contains more than 50,000 images in total; 75% of them form the training set, and the rest form the test set. The dataset consists of 43 classes, but we randomly selected 5, 6, and 10 classes among them (several times). The last dataset, CIFAR-10, contains 60,000 32 × 32 color images in 10 different classes (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks); our experiments were based on these 10 classes. There are 6,000 images of each class, of which 5,000 belong to the training set and the rest form the test set.

All training methods in our experiments were based on the training set, and the next tables present the results on the test set. The first method in every table is a deep learning method without information fusion; we used three different Convolutional Neural Network (CNN) multiclass classifiers: (i) CNN21 for FMNIST and CIFAR-10, (ii) the VGG19 architecture [33] for Linnaeus and GTSRB, and (iii) ResNet50 [18] for CIFAR-10 (CIFAR-10 was classified by both CNN21 and ResNet to investigate the differences). The next method is the OVA information fusion method from binary classifiers; we implemented it based on the well-known one-vs-all mechanism. In this case, the CNN21, VGG19, and ResNet50 architectures were the same as described in Sect. 2, and these binary classifiers were used as base classifiers. Hadamard-ECOC [9] (briefly H-ECOC later in our evaluation) was also one of the methods in our comparison. Besides that, we investigated the ECOC method with random ECOC matrices, similarly to a recent paper [1], where the authors call this method randECOC. The next tables contain the average results of the trials with this random ECOC matrix; the two subscripts give the number of rows and columns of the matrix. The Full ECOC methods contain the maximal number of binary classifiers (\(2^{{N_{L} - 1}} - 1\)), i.e., 15 (and 31) classifiers when the number of classes is equal to 5 (and 6).

With 10 classes, the maximal number of binary classifiers would be too large (511), so Full ECOC was omitted in this case. The Min–Max ECOC with a smaller number of binary classifiers (e.g., 20) could not be calculated either, because of the computational cost. Even though the search for the theoretically best 20-column subset among the 511 possible columns would be very time-consuming, an approximately good solution is possible. For this purpose, we generated an ECOC10,20 from the Min–Max ECOC6,10 by applying Algorithm 2 in several steps (because at the output of the algorithm in each step, the best half of the columns should be selected based on the Min–Max criterion); we briefly call it gen-MM-ECOC10,20 (generated Min–Max ECOC10,20), and the matrix is shown in Table 16 in the Appendix.

The next tables contain the results, where “prec. min” is the minimum of the precision, “recall min” denotes the minimum of the recall, “F1 macro” is the macro average of the F1 values, and “F1 min” denotes the minimum of the F1 values. For these indicators and the accuracy, larger is better. The LCPE and LME indicators defined above measure the error, so for them smaller is better. In all tables, the bold numbers are the best ones (i.e., the smallest for LCPE and LME, the largest for the other indicators), and the underlined numbers are the second-best ones.

Table 4 presents the results on the Linnaeus dataset, where Min–Max ECOC5,10 and the Full ECOC were the best models (but the Full ECOC is also a Min–Max matrix, as we noted earlier). The Min–Max ECOC5,10, i.e., the optimized ECOC matrix for 5 classes with the fusion of 10 binary classifiers, is shown in Table 13 in the Appendix. This optimized ECOC matrix was found by examining all subsets containing ten elements from the whole set of 15 elements.

Table 4 Accuracy and pair error results on 5 classes of the Linnaeus dataset

Table 5 contains the results on the FMNIST dataset (with only 5 classes), where the Min–Max ECOC5,10 outperforms the other methods; and Table 6 (with 6 classes) shows that the Min–Max ECOC6,15 is the second best after the Full ECOC method. The theoretically best ECOC for 10 classes could not be calculated, but a matrix generated from a smaller Min–Max ECOC, the gen-MM-ECOC10,20, was used. This generated ECOC exceeded every competitor from every point of view according to Table 7.

Table 5 Accuracy and pair error results on 5 random classes from the FMNIST dataset
Table 6 Accuracy and pair error results on 6 random classes from the FMNIST dataset
Table 7 Accuracy and pair error results on 10 classes from the FMNIST dataset

On the GTSRB dataset, the Min–Max ECOC methods (Min–Max ECOC5,10, Min–Max ECOC6,10, and Min–Max ECOC6,15) also surpass the competitor methods, as can be seen in Tables 8 and 9. The Min–Max ECOC6,10 (and Min–Max ECOC6,15), i.e., the optimized ECOC matrix for 6 classes with the fusion of 10 (and 15) binary classifiers, is shown in Table 14 (and Table 15) in the Appendix. These optimized ECOC matrices were also found by examining all subsets of the whole set. Table 10 shows that the gen-MM-ECOC10,20 is the best among the investigated methods.

Table 8 Accuracy and pair error results on 5 random classes from the GTSRB dataset
Table 9 Accuracy and pair error results on 6 random classes from the GTSRB dataset
Table 10 Accuracy and pair error results on 10 classes from the GTSRB dataset

On the CIFAR-10 dataset, the theoretically best ECOC could not be computed, thus the gen-MM-ECOC10,20 method was included; and for most of the indicators, this method outperformed the others (with both CNN21 and ResNet). When CNN21 was the base classifier, our method achieved a 0.053 improvement in accuracy; and in the case of the stronger base classifier (ResNet), the improvement was only 0.028 (from 0.923 to 0.951) (Tables 11, 12).

Table 11 Accuracy and pair error results on 10 classes from the CIFAR-10 dataset (CNN21)
Table 12 Accuracy and pair error results on 10 classes from the CIFAR-10 dataset (ResNet)

The \(H_{opt}\) values of the Min–Max ECOC6,10 and Min–Max ECOC6,15 matrices were 6 and 8, respectively. All \(H_{opt}\) values of the Min–Max ECOC matrices for 6 classes with different numbers of binary classifiers are shown in Table 17 in the Appendix.

5 Conclusion

This paper aimed to minimize the largest error ratios of two types of bottleneck (LCPE and LME) in deep neural network-based classification using an ensemble of binary classifiers. To minimize these error ratios, we suggested maximizing the lowest value in the Hamming matrix derived from the ECOC matrix (this lowest value was denoted by \(H_{min}\)), where the elements of the Hamming matrix measure the Hamming distance between pairs of rows of the ECOC matrix. We suggested a special matrix, the Min–Max ECOC matrix, among the large set of all possible ECOC matrices with predefined numbers of rows and columns, which possesses the largest \(H_{min}\) in this set. The largest \(H_{min}\) (\(H_{opt}\)) and the corresponding Min–Max ECOC matrix can be an optimal solution to the misclassification problem of a multiclass classification task. Besides the Min–Max ECOC matrix, our contribution is an interrelation between the properties of the Min–Max ECOC matrix and the full ECOC matrix, and an estimation of the exact \(H_{opt}\) value. The full matrix is always a Min–Max ECOC matrix, and we presented a recursive construction algorithm for it. The significance of the Min–Max ECOC method is its flexibility, because it gives an optimal solution for each number of classifiers in the corresponding range (e.g., from 6 to 31 when the number of classes is 6). The usefulness of the construction algorithm lies in the fact that it can generate a matrix close to the optimum when the optimal solution cannot be calculated due to computational costs.

While searching for a good ECOC matrix (by random generation or by a constructive algorithm), the \(H_{min}\) values of the trials fall in a range between the worst, so-called \(H_{worst}\), and the best (\(H_{opt}\)) values. The \(H_{worst}\) values can be determined by a closed-form expression based on our theorem, which was proved in the paper. It is not easy to calculate exact \(H_{opt}\) values for large ECOC matrices; therefore, we stated and proved two theorems giving an upper and a lower limit for the \(H_{opt}\) values. The importance of these theorems is that the interval between the upper and lower limits gives an estimation of the exact \(H_{opt}\) value. Three types of Convolutional Neural Networks with the Min–Max ECOC matrix were tested on four real datasets and compared with OVA and variants of ECOC methods (random and Hadamard ECOC methods) in terms of known and two new indicators; the experimental results show that the proposed method surpasses the others.

The drawback of the Min–Max ECOC method is that the running time of the optimization process may be long when the number of rows is large. Thus, this method can be used only for a small number of classes. The proposed recursive construction algorithm for the ECOC matrix can help with this limitation. However, the disadvantage of this construction algorithm is that the optimal Hamming distance of the generated ECOC matrix is not guaranteed. There is a trade-off between these limitations, and a possible future work is the quantification of the advantages and disadvantages of this trade-off. Our research focused only on data-independent codes in ECOC, but in the future, we plan to investigate and work out data-dependent ECOC algorithms as well.