Ensemble Dynamics in Non-stationary Data Stream Classification

  • Hossein Ghomeshi
  • Mohamed Medhat Gaber
  • Yevgeniya Kovalchuk
Chapter
Part of the Studies in Big Data book series (SBD, volume 41)

Abstract

Data stream classification is the process of learning supervised models from continuous labelled examples that arrive as an infinite stream and that, in most cases, can be read only once by the data mining algorithm. One of the most challenging problems in this process is learning such models in non-stationary environments, where the data or class distribution evolves over time. This phenomenon is called concept drift. Ensemble learning techniques have proven effective at adapting to concept drift. Ensemble learning is the process of training a number of classifiers and combining their outputs with a combination rule to predict incoming data. These techniques must process and learn from the data incrementally, within limited memory and time, in order to predict incoming instances and to cope with different types of concept drift, including incremental, gradual, abrupt and recurring drift. A large number of applications can benefit from the classification of non-stationary data streams, including weather forecasting, stock market analysis, spam filtering, credit card fraud detection, traffic monitoring and sensor data analysis in Internet of Things (IoT) networks, to name a few. Since each application has its own characteristics and conditions, it is difficult to introduce a single approach that would suit all problem domains. This chapter studies the dynamic behaviour of existing ensemble methods (e.g. the addition, removal and update of classifiers) in non-stationary data stream classification. It proposes a new, compact, yet informative formalisation of state-of-the-art methods. The chapter also presents the results of our experiments comparing a diverse selection of the best-performing algorithms on several benchmark data sets with different types of concept drift from different problem domains.
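
The ensemble dynamics named in the abstract (addition, removal and update of classifiers, combined through a weighted vote) can be illustrated with a minimal sketch. It is loosely in the spirit of weighted-majority schemes and is not the chapter's formalisation or any of the algorithms it evaluates. All names here (NearestMeanLearner, DynamicEnsemble, beta, theta, period, max_size) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class NearestMeanLearner:
    """Hypothetical incremental base learner: predicts the class whose
    running feature mean is closest to the instance."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sums
        self.counts = defaultdict(int)  # label -> number of examples seen

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.sums:
            return None  # nothing learnt yet

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


class DynamicEnsemble:
    """Illustrative ensemble that adds, removes and re-weights members."""

    def __init__(self, beta=0.5, theta=0.01, period=1, max_size=5):
        self.beta = beta          # weight penalty for a wrong member
        self.theta = theta        # members below this weight are removed
        self.period = period      # how often the dynamics are applied
        self.max_size = max_size  # memory bound on the ensemble
        self.members = [[NearestMeanLearner(), 1.0]]  # [learner, weight]
        self.seen = 0

    def predict(self, x):
        # Combination rule: weighted majority vote.
        votes = defaultdict(float)
        for learner, weight in self.members:
            label = learner.predict(x)
            if label is not None:
                votes[label] += weight
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        self.seen += 1
        if self.seen % self.period == 0:
            ensemble_guess = self.predict(x)
            # Update: penalise members that misclassify the new example.
            for member in self.members:
                if member[0].predict(x) != y:
                    member[1] *= self.beta
            # Normalise so that the best member has weight 1.0.
            top = max(m[1] for m in self.members)
            for member in self.members:
                member[1] /= top
            # Removal: drop members whose weight fell below the threshold.
            self.members = [m for m in self.members if m[1] >= self.theta]
            # Addition: a fresh member when the ensemble as a whole was wrong.
            if ensemble_guess != y:
                self.members.append([NearestMeanLearner(), 1.0])
                if len(self.members) > self.max_size:
                    weakest = min(range(len(self.members)),
                                  key=lambda i: self.members[i][1])
                    del self.members[weakest]
        # Every surviving member learns incrementally from the example.
        for learner, _ in self.members:
            learner.learn(x, y)


# Toy stream with an abrupt drift: the labels flip after 40 examples.
xs = [[0.1], [0.9], [0.2], [0.8]]
ensemble = DynamicEnsemble()
for t in range(240):
    x = xs[t % 4]
    y = int(x[0] > 0.5) if t < 40 else int(x[0] <= 0.5)
    ensemble.learn(x, y)
```

In this toy stream, members trained before the drift lose weight and are removed, while members added afterwards take over the vote. Real methods differ in when they add members, how they weight them and whether they use an explicit drift detector.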

Keywords

  • Data Stream Classification
  • Concept Drift
  • ADWIN Bagging
  • Forest Cover Type Data Set
  • Dynamic Weighted Majority (DWM)
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Hossein Ghomeshi (1, corresponding author)
  • Mohamed Medhat Gaber (1)
  • Yevgeniya Kovalchuk (1)
  1. School of Computing and Digital Technology, Birmingham City University, Birmingham, UK
