A Double-Ensemble Approach for Classifying Skewed Data Streams

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7301)

Abstract

Many applications must handle large volumes of streaming data, which often follow a skewed distribution, i.e. one or more classes are heavily under-represented compared with the others. Although class-imbalance learning has been studied for static data in pattern recognition, little effort has been directed towards classifying skewed data streams. Furthermore, while existing class-imbalance learning methods increase recognition accuracy on the minority class, they often harm global classification accuracy. Motivated by these observations, we develop an approach for classifying skewed data streams that integrates two ensembles of classifiers, one suited to non-skewed data and the other to skewed data. This approach substantially increases global accuracy compared with existing classification methods for skewed data. Experimental tests have been carried out on three public datasets, showing interesting results. As a further contribution, we study metrics for evaluating the performance of skewed data stream classification. We also review the literature on class-imbalance learning and skewed data stream classification.
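
The tension the abstract describes between minority-class accuracy and global accuracy can be illustrated with a small sketch. This is not the authors' method, nor the metrics they propose: the function names, the toy labels, and the choice of the geometric mean of per-class recalls (a commonly used imbalance metric) are all assumptions made for illustration.

```python
import math
from collections import Counter

def global_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred).values()
    return math.prod(recalls) ** (1 / len(recalls))

# Hypothetical skewed batch: 95 majority samples (0), 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts the majority class.
pred_majority = [0] * 100

# Hypothetical classifier B: recovers 4 of 5 minority samples,
# at the cost of misclassifying 15 majority samples.
pred_balanced = [0] * 80 + [1] * 15 + [1] * 4 + [0] * 1

for name, pred in [("A", pred_majority), ("B", pred_balanced)]:
    print(name,
          "accuracy:", round(global_accuracy(y_true, pred), 3),
          "recall:", per_class_recall(y_true, pred),
          "g-mean:", round(g_mean(y_true, pred), 3))
```

In this toy setting classifier A has the higher global accuracy yet never detects the minority class, while classifier B detects most minority samples but loses global accuracy. Any single-number metric hides one side of that trade-off.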

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, C., Soda, P. (2012). A Double-Ensemble Approach for Classifying Skewed Data Streams. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_22

  • DOI: https://doi.org/10.1007/978-3-642-30217-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30216-9

  • Online ISBN: 978-3-642-30217-6

  • eBook Packages: Computer Science, Computer Science (R0)
