On Ensemble Components Selection in Data Streams Scenario with Gradual Concept-Drift

  • Piotr Duda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10842)

Abstract

In this paper we study the selection of ensemble components for data stream classification. The decision to add or remove a single component matters not only for accuracy at the current instant, but can also affect further stream processing. The algorithm proposed in this paper is an enhanced version of the ASE (Automatically Sized Ensemble) algorithm, which guarantees that a new component is added to the ensemble only if it increases accuracy not only for the current data chunk but also for the whole data stream. The proposed algorithm is designed to improve data stream processing when one concept is gradually replaced by another. The Hellinger distance is applied to allow a new component to be added if its predictions differ significantly from those of the rest of the ensemble, even when that component does not increase the accuracy of the whole ensemble.
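
As a rough illustration of the admission idea described above, the sketch below computes the Hellinger distance between the class-label distributions predicted by a candidate component and by the current ensemble on one data chunk, and uses it as a fallback when the candidate does not raise accuracy. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the threshold value 0.3, and the comparison of predicted-label frequencies are all hypothetical, and a plain comparison of two accuracy estimates stands in for the accuracy criterion of ASE, which the abstract does not detail.

```python
import math
from collections import Counter


def class_distribution(predictions, classes):
    """Relative frequency of each class label among the predictions."""
    counts = Counter(predictions)
    total = len(predictions)
    return [counts[c] / total for c in classes]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def should_add(acc_with, acc_without, cand_preds, ens_preds, classes,
               threshold=0.3):
    """Hypothetical admission rule for a candidate component.

    acc_with / acc_without: ensemble accuracy on the chunk with and
    without the candidate (a stand-in for the paper's accuracy test).
    threshold: assumed cut-off for a "significant" difference.
    """
    # Accept when the candidate improves accuracy.
    if acc_with > acc_without:
        return True
    # Otherwise accept only if its predictions differ strongly
    # from those of the current ensemble.
    distance = hellinger(class_distribution(cand_preds, classes),
                         class_distribution(ens_preds, classes))
    return distance > threshold
```

With this rule, a candidate that lowers chunk accuracy but predicts a very different mix of labels than the ensemble would still be admitted, which is the behaviour the abstract motivates for a gradually emerging concept.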

Keywords

Ensemble methods, Data streams, Gradual concept drift

Notes

Acknowledgments

This work was supported by the Polish National Science Centre under Grant No. 2014/15/B/ST7/05264.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Computational Intelligence, Czestochowa University of Technology, Czestochowa, Poland