A Double-Ensemble Approach for Classifying Skewed Data Streams
Many applications must handle large volumes of streaming data, which often follow a skewed distribution, i.e. one or more classes are heavily under-represented compared to the others. Class-imbalance learning has already been studied for static data in pattern recognition, but little effort has been directed towards the classification of skewed data streams. Furthermore, while existing class-imbalance learning methods increase recognition accuracy on the minority class, they often harm global classification accuracy. Motivated by these observations, we develop an approach for classifying skewed data streams that integrates two ensembles of classifiers, one suited for non-skewed data and the other for skewed data. This approach substantially increases global accuracy compared to existing classification methods for skewed data. Experimental tests on three public datasets show interesting results. As a further contribution, we study metrics for evaluating the performance of skewed data stream classification. We also review the literature on class-imbalance learning and skewed data stream classification.
Keywords: Data Stream, Minority Class, Class Imbalance, Skewed Data, Imbalanced Data
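The trade-off between minority-class recognition and global accuracy described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' method or their chosen metrics, which the abstract does not specify. It assumes a binary problem and compares global accuracy with per-class recall and the geometric mean of recalls (G-mean), a measure commonly used for imbalanced data. The function name `evaluate`, the label encoding, and the two hypothetical prediction vectors are illustrative assumptions.

```python
import math

def evaluate(y_true, y_pred, minority=1):
    """Return global accuracy, per-class recall, and G-mean for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    accuracy = (tp + tn) / len(pairs)
    minority_recall = tp / (tp + fn) if (tp + fn) else 0.0
    majority_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "minority_recall": minority_recall,
        "majority_recall": majority_recall,
        "g_mean": math.sqrt(minority_recall * majority_recall),
    }

# Hypothetical stream chunk: 990 majority examples (0) and 10 minority examples (1).
y_true = [0] * 990 + [1] * 10

# Predictor A (assumed): always outputs the majority class.
pred_a = [0] * 1000

# Predictor B (assumed): raises 50 false alarms but finds 8 of the 10 minority examples.
pred_b = [1] * 50 + [0] * 940 + [1] * 8 + [0] * 2

baseline = evaluate(y_true, pred_a)
sensitive = evaluate(y_true, pred_b)

for name, result in (("A", baseline), ("B", sensitive)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this made-up example, predictor A has the higher global accuracy while never recognising the minority class, and predictor B gives up some global accuracy to recover most minority examples. This is why accuracy alone is usually considered insufficient for evaluating classifiers on skewed data.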