Employing One-Class SVM Classifier Ensemble for Imbalanced Data Stream Classification
Abstract
The classification of imbalanced data streams is gaining more and more interest. However, apart from the problem that one of the classes is poorly represented, such streams exhibit the difficulties typical for data stream classification: limited resources, restricted access to the true labels, and the possibility of concept drift. The possibility of concept drift requires equipping the method with an adaptation mechanism. In this article, we propose the OCEIS classifier (One-Class support vector machine classifier Ensemble for Imbalanced data Streams). The main idea is to build the committee from one-class classifiers trained separately on clustered data of each class. The results of experiments carried out on synthetic and real data show that the proposed method achieves results at a similar level to the state-of-the-art methods it was compared with.
Keywords
One-class classification · Imbalanced data · Data streams · Ensemble learning
1 Introduction
Currently, the classification of difficult data is a frequently chosen research topic, and data streams are one of many examples of such data. They must be processed within a limited time, under memory restrictions, and with only a single pass over the incoming data. The classifiers are also required to adapt. A phenomenon commonly accompanying streams is concept drift, which changes the distribution of the incoming data; these changes may occur indefinitely.
Another problem is data imbalance, which, when combined with streams, significantly increases the difficulty. An uneven class distribution is a fairly common phenomenon in real data sets. It is not a problem when the differences are small, but it becomes serious when the difference between the numbers of objects from the minority and majority classes is very large. One of the known ways to deal with these difficulties is data sampling: methods designed either to reduce the number of objects in the dominant class or to generate artificial objects of the minority class [2].
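As a simple illustration of the first family of sampling methods mentioned above, the sketch below implements plain random undersampling of the majority class. This is not the paper's method, only a minimal example; the function name `random_undersample` is ours.

```python
# Illustrative sketch (not the paper's method): random undersampling
# of the majority class in a binary dataset until both classes are equal.
import numpy as np

def random_undersample(X, y, majority_label, rng=None):
    """Drop random majority-class samples so both classes end up equal in size."""
    rng = np.random.default_rng(rng)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    # Keep only as many majority samples as there are minority samples.
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    sel = np.concatenate([keep, min_idx])
    return X[sel], y[sel]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # imbalance ratio 4:1
Xb, yb = random_undersample(X, y, majority_label=0, rng=0)
# Both classes now contribute 2 samples each.
```

The complementary strategy, generating artificial minority objects, is what SMOTE [2] does by interpolating between minority neighbours.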
Designing methods with mechanisms for adapting to this type of data is another approach. One such approach is the Learn++CDS method [6], which combines Learn++NSE [7] for nonstationary streams with SMOTE [2] for oversampling. A related method is Learn++NIE, which is similar to the previous one but with a small difference: a classification error measure is introduced and a variant of bagging is used for balancing the data. Wang et al. [19] designed a method that uses the k-means clustering algorithm to undersample data by generating prototypes from centroids. The REA method, proposed by Chen and He [4], is an extension of SERA [3] and MuSeRA [5]. This family of methods estimates the similarity between previous minority-class samples and the current minority data in the chunk.
One of the demanding situations when classifying imbalanced data streams is the temporary disappearance of the minority class, or its appearance only in later stages. This phenomenon can cause a significant decrease in quality or sometimes prevent a typical classifier from working at all. A solution that addresses this problem is the use of one-class classifiers, which can make decisions based on objects from a single class only. Krawczyk et al. [11] proposed forming an ensemble of one-class classifiers, where data clustered within the samples of each class is used to train new models and expand the ensemble. Liu et al. [14] designed a modular committee of one-class classifiers based on data density analysis; this is a similar approach, in which clusters are created within a single-class data set. Krawczyk and Woźniak [10] presented various metrics enabling the creation of effective one-class classifier committees.
The main contributions of this work are as follows:
- A proposal of the OCEIS method for classifying imbalanced data streams based on one-class SVM classifiers
- The introduction of a combination rule allowing full use of the potential of the one-class SVM classifier ensemble
- The design of a learning procedure for the proposed method based on the division of data into classes and k-means clustering
- An experimental evaluation of the proposed OCEIS method on real and synthetically generated imbalanced data streams, with a comparison against state-of-the-art methods
Fig. 1. Decision region visualisation on the paw dataset from the KEEL repository [1]
2 Proposed Method
The proposed method, One-Class support vector machine classifier Ensemble for Imbalanced data Streams (OCEIS), combines several approaches to data classification. The core idea is to use one-class support vector machines (OCSVM) to classify imbalanced binary problems. OCEIS is a chunk-based data stream method.
In the first step of Algorithm 1, the chunk of training data is divided into a minority set (\(D_{min}\)) and a majority set (\(D_{maj}\)). These sets are then divided into clusters; Krawczyk et al. [11] indicate the importance of this idea. This decomposition over the feature space reduces the overlap of the classifiers' decision areas in the ensemble (Fig. 1). The k-means algorithm [15] is used to create the clusters. A key aspect is choosing the right number of clusters. The Silhouette Value (SV) [18] helps here: it measures how similar an object is to its own cluster compared to the other clusters. Kaufman et al. [9] introduced the Silhouette Coefficient (SC) as the maximum of the mean SV over the entire dataset.
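The per-class training step described above can be sketched as follows. This is our reconstruction from the text, not the authors' code: the number of k-means clusters is chosen by the Silhouette Coefficient (the k maximizing the mean SV), and one OCSVM is fitted per cluster. The function name `fit_subensemble`, the search range `max_k`, and the `nu=0.1` setting are our assumptions.

```python
# Hedged sketch of the training procedure (Algorithm 1) for one class:
# pick k via the Silhouette Coefficient, then fit one OCSVM per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.svm import OneClassSVM

def fit_subensemble(X_class, max_k=5, random_state=0):
    """Cluster the samples of one class and train an OCSVM on each cluster."""
    best_k, best_sc = 1, -1.0
    upper = min(max_k, len(X_class) - 1)
    for k in range(2, upper + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X_class)
        sc = silhouette_score(X_class, labels)  # mean SV over the dataset
        if sc > best_sc:
            best_k, best_sc = k, sc
    if best_k == 1:
        labels = np.zeros(len(X_class), dtype=int)
    else:
        labels = KMeans(n_clusters=best_k, n_init=10,
                        random_state=random_state).fit_predict(X_class)
    # One one-class SVM per discovered cluster.
    return [OneClassSVM(nu=0.1).fit(X_class[labels == c])
            for c in range(best_k)]

# Example: one class whose samples form two well-separated clusters.
rng = np.random.default_rng(0)
X_min = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
                   rng.normal(5.0, 0.3, (30, 2))])
minority_models = fit_subensemble(X_min)
```

Running this on both \(D_{min}\) and \(D_{maj}\) yields the two subensembles that the combination rule below operates on.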
A crucial component of any classifier ensemble is the combination rule, which makes decisions based on the predictions of the ensemble members. Designing a good decision rule is vital for proper operation and satisfactory classification quality. Because OCEIS uses one-class classifiers and per-class clustering, the ensemble works differently from standard committees. Well-known decision making based on majority voting [20] does not allow this kind of committee to make correct decisions: the number of classifiers per class may vary significantly with the number of clusters, so there is a considerable risk that the decision would be based mainly on the majority-class classifiers.
OCEIS therefore uses an original combination rule (Algorithm 2) based on the distance from the decision boundaries of the classifiers to the predicted samples. In the first step, the distances (\(Dist_{i,m}\), \(Dist_{j,m}\)) from all objects of the predicted data to the hyperspheres of the models forming the minority and majority committees are calculated by the DecisionFunction. When the examined object lies inside the checked hypersphere, it obtains a positive value; when it lies outside, it receives a negative value. Then, for each sample, the highest values (\(D_{maj}\), \(D_{min}\)) over the majority and minority committees are determined. When the best value \(D_{maj}\) for a model from the majority subensemble is greater than the best value \(D_{min}\) for a model from the minority subensemble, the object belongs to the majority class. Similarly, when \(D_{min}\) is greater than \(D_{maj}\), the object belongs to the minority class.
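The combination rule above can be sketched in a few lines, assuming scikit-learn's `OneClassSVM.decision_function` as the DecisionFunction (positive inside the learned boundary, negative outside). The function name `predict_oceis` and the label convention (1 for the minority class) are ours.

```python
# Minimal sketch of the combination rule (Algorithm 2), reconstructed
# from the text: each sample goes to the class whose subensemble contains
# the model with the largest decision-function value for that sample.
import numpy as np
from sklearn.svm import OneClassSVM

def predict_oceis(minority_models, majority_models, X):
    """Distance-based voting between the two one-class subensembles."""
    # Highest signed distance to any minority / majority model boundary.
    d_min = np.max([m.decision_function(X) for m in minority_models], axis=0)
    d_maj = np.max([m.decision_function(X) for m in majority_models], axis=0)
    return (d_min > d_maj).astype(int)  # 1 = minority class

rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 0.5, (100, 2))   # majority class around (0, 0)
X_min = rng.normal(4.0, 0.5, (15, 2))    # minority class around (4, 4)
maj_models = [OneClassSVM(nu=0.1).fit(X_maj)]
min_models = [OneClassSVM(nu=0.1).fit(X_min)]
y_pred = predict_oceis(min_models, maj_models,
                       np.array([[0.0, 0.0], [4.0, 4.0]]))
# → [0, 1]: the first query point falls to the majority subensemble,
#   the second to the minority one.
```

Taking the maximum within each subensemble, rather than counting votes, is what makes the rule insensitive to the differing numbers of classifiers per class.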
3 Experimental Evaluation
The research hypothesis is that it is possible to design a method whose classification quality on imbalanced data streams is statistically better than or equal to that of the selected state-of-the-art methods.
3.1 Experiment Setup
Dataset | IR | Samples | Features |
---|---|---|---|
abalone-17_vs_7-8-9-10 | 39 | 2338 | 8 |
australian | 1.2 | 690 | 14 |
elecNormNew | 1.4 | 45312 | 8 |
glass-0-1-2-3_vs_4-5-6 | 3.2 | 214 | 9 |
glass0 | 2.1 | 214 | 9 |
glass1 | 1.8 | 214 | 9 |
heart | 1.2 | 270 | 13 |
jm1 | 5.5 | 2109 | 21 |
kc1 | 5.5 | 2109 | 21 |
kc2 | 3.9 | 522 | 21 |
kr-vs-k-three_vs_eleven | 35 | 2935 | 6 |
kr-vs-k-zero-one_vs_draw | 27 | 2901 | 6 |
page-blocks0 | 8.8 | 5472 | 10 |
pima | 1.9 | 768 | 8 |
segment0 | 6 | 2308 | 19 |
shuttle-1vs4 | 14 | 1829 | 9 |
shuttle-1vsA | 3.7 | 57999 | 9 |
shuttle-4-5vsA | 3.8 | 57999 | 9 |
shuttle-4vsA | 5.5 | 57999 | 9 |
shuttle-5vsA | 17 | 57999 | 9 |
vehicle0 | 3.3 | 846 | 18 |
vowel0 | 10 | 988 | 13 |
wisconsin | 1.9 | 683 | 9 |
yeast-0-2-5-6_vs_3-7-8-9 | 9.1 | 1004 | 8 |
yeast-0-2-5-7-9_vs_3-6-8 | 9.1 | 1004 | 8 |
yeast-0-3-5-9_vs_7-8 | 9.1 | 506 | 8 |
yeast-0-5-6-7-9_vs_4 | 9.4 | 528 | 8 |
yeast-2_vs_4 | 9.1 | 514 | 8 |
yeast1 | 2.5 | 1484 | 8 |
yeast3 | 8.1 | 1484 | 8 |
3.2 Results Analysis
Wilcoxon pair rank-sum tests for synthetic data streams. The dashed vertical line is the critical value at a confidence level of 0.05 (green - win, yellow - tie, red - lose) (Color figure online)
Wilcoxon pair rank-sum tests for real data streams. The dashed vertical line is the critical value at a confidence level of 0.05 (green - win, yellow - tie, red - lose) (Color figure online)
G-mean score over the data chunks for synthetic data with incremental drift
G-mean score over the data chunks for synthetic data with sudden drift
G-mean score over the data chunks for the real stream shuttle-4-5vsA
When analyzing the results, one should note the significant divergence in the performance of the proposed method on synthetic versus real data streams. The real data streams were characterized by large variety, while the artificial streams were generated with a single type of generator (albeit with different settings). The generated streams are therefore biased towards one type of data distribution, which was probably easy to analyze for some of the models, while the bias of the others did not match this type of generator. Therefore, in future work we plan to carry out the experimental research on an expanded pool of synthetic streams produced by different generators.
4 Conclusions
We proposed an imbalanced data stream classification algorithm based on a one-class classifier ensemble. The results obtained from the experiments appear to confirm the formulated research hypothesis. OCEIS achieves results at a similar level to the compared methods, but it is worth noting that it performs best on real stream data, which is an important advantage. Another advantage is that it shows no tendency towards excessive classification of objects into one of the classes, which was a problem in the experiments carried out for the REA and OUSE methods. Such “stability” contributes significantly to improving classification quality and obtaining satisfactory results.
For synthetic data streams, the proposed algorithm is not the worst-performing one; however, some dominance of the Learn++ family methods can be seen, because the decision made by OCEIS is based on all classifiers in the committee. One possible remedy would be to group newly created models by data chunk into subcommittees (the Learn++NIE method works similarly) and make decisions for each subcommittee separately. Extending this with weighted voting may significantly improve predictive performance. Another modification that could bring some improvement is the introduction of a drift detector, a mechanism that would enable the ensemble to clean up after detecting concept drift.
The conducted research indicates the potential hidden in the presented method. It is worth extending the research to streams with other types of concept drift. Increasing the number of tested real streams would also give a broader view of how the method works on real data. One idea for further research that arose while working on this paper is to test the method on streams where the imbalance ratio changes over time. A very interesting experiment would involve imbalanced data streams in which the minority class temporarily disappears or appears only after some time.
Acknowledgment
This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325 as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.
References
- 1. Alcalá-Fdez, J., et al.: KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Multiple-Valued Logic Soft Comput. 17 (2011)
- 2. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
- 3. Chen, S., He, H.: SERA: selectively recursive approach towards nonstationary imbalanced stream data mining. In: 2009 International Joint Conference on Neural Networks, pp. 522–529. IEEE (2009)
- 4. Chen, S., He, H.: Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach. Evol. Syst. 2(1), 35–50 (2011)
- 5. Chen, S., He, H., Li, K., Desai, S.: MuSeRA: multiple selectively recursive approach towards imbalanced stream data mining. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2010)
- 6. Ditzler, G., Polikar, R.: Incremental learning of concept drift from streaming imbalanced data. IEEE Trans. Knowl. Data Eng. 25(10), 2283–2301 (2012)
- 7. Elwell, R., Polikar, R.: Incremental learning of concept drift in nonstationary environments. IEEE Trans. Neural Netw. 22(10), 1517–1531 (2011)
- 8. Gao, J., Ding, B., Fan, W., Han, J., Philip, S.Y.: Classifying data streams with skewed class distributions and concept drifts. IEEE Internet Comput. 12(6), 37–49 (2008)
- 9. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)
- 10. Krawczyk, B., Woźniak, M.: Diversity measures for one-class classifier ensembles. Neurocomputing 126, 36–44 (2014)
- 11. Krawczyk, B., Woźniak, M., Cyganek, B.: Clustering-based ensembles for one-class classification. Inf. Sci. 264, 182–195 (2014)
- 12. Ksieniewicz, P., Zyblewski, P.: stream-learn: an open-source Python library for difficult data stream batch analysis. arXiv preprint arXiv:2001.11077 (2020)
- 13. Lima, M., Valle, V., Costa, E., Lira, F., Gadelha, B.: Software engineering repositories: expanding the PROMISE database. In: Proceedings of the XXXIII Brazilian Symposium on Software Engineering, pp. 427–436. ACM (2019)
- 14. Liu, J., Miao, Q., Sun, Y., Song, J., Quan, Y.: Modular ensembles for one-class classification based on density analysis. Neurocomputing 171, 262–276 (2016)
- 15. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
- 16. Pal, S.K., Mitra, S.: Multilayer perceptron, fuzzy sets, classification (1992)
- 17. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
- 18. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
- 19. Wang, Y., Zhang, Y., Wang, Y.: Mining data streams with skewed distribution by static classifier ensemble. In: Chien, B.C., Hong, T.P. (eds.) Opportunities and Challenges for Next-Generation Applied Intelligence, pp. 65–71. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-92814-0_11
- 20. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)