Abstract
While diversity of ensembles has been studied in the context of static data, it has not still received such research interest for evolving data streams. This paper aims at analyzing the impact of concept drift on diversity measures calculated for streaming ensembles. We consider six popular diversity measures and adapt their calculations to data stream requirements. A comprehensive series of experiments reveals the potential of each measure for visualizing ensemble performance over time. Measures highlighted as capable of depicting sudden and virtual drifts over time are used as basis for detecting changes with the Page-Hinkley test. Experimental results demonstrate that the \(\kappa \) interrater agreement, disagreement, and double fault measures, although designed to quantify diversity, provide a means of detecting changes competitive to that using classification accuracy.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Source code, test scripts, and generator parameters available at:http://www.cs.put.poznan.pl/dbrzezinski/software.php.
- 2.
References
Banfield, R.E., Hall, L.O., Bowyer, K.W., Kegelmeyer, W.P.: A new ensemble diversity measure applied to thinning ensembles. In: Windeatt, T., Roli, F. (eds.) MCS 2003. LNCS, vol. 2709, pp. 306–316. Springer, Heidelberg (2003). doi:10.1007/3-540-44938-8_31
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: Massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
Brzezinski, D., Stefanowski, J.: Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 81–94 (2014)
Brzezinski, D., Stefanowski, J.: Prequential AUC for classifier evaluation and drift detection in evolving data streams. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (eds.) NFMCP 2014. LNCS (LNAI), vol. 8983, pp. 87–101. Springer, Heidelberg (2015). doi:10.1007/978-3-319-17876-9_6
Ditzler, G., Roveri, M., Alippi, C., Polikar, R.: Learning in nonstationary environments: a survey. IEEE Comp. Intell. Mag. 10(4), 12–25 (2015)
Gama, J.: Knowledge Discovery from Data Streams. Chapman and Hall, Boca Raton (2010)
Gama, J., Sebastião, R., Rodrigues, P.P.: On evaluating stream learning algorithms. Mach. Learn. 90(3), 317–346 (2013)
Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A surveyon concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)
Giacinto, G., Roli, F.: An approach to the automatic design of multiple classifier systems. Pattern Recogn. Lett. 22(1), 25–33 (2001)
Giacinto, G., Roli, F.: Design of effective neural network ensembles for image classification purposes. Image Vis. Comput. 19(9–10), 699–707 (2001)
Krempl, G., Zliobaite, I., Brzezinski, D., Hüllermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. SIGKDD Explorations 16(1), 1–10 (2014)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)
Margineantu, D.D., Dietterich, T.G.: Pruning adaptive boosting. In: Procreedings of 14th International Conference on Machine Learning, pp. 211–218 (1997)
Minku, L.L., White, A.P., Yao, X.: The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans. Knowl. Data Eng. 22(5), 730–742 (2010)
Oza, N.C., Russell, S.J.: Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of 7th ACM SIGKDD International Conference on Knowledge Discovery Data Mining, pp. 359–364 (2001)
Street, W.N., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of 7th ACM SIGKDD International Conference on Knowledge Discovery Data Mining, pp. 377–382 (2001)
Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.B.: PAKDD data mining competition (2009)
Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of 9th ACM SIGKDD International Conference on Knowledge Discovery Data Mining, pp. 226–235 (2003)
Woźniak, M.: Application of combined classifiers to data stream classification. In: Saeed, K., Chaki, R., Cortesi, A., Wierzchoń, S. (eds.) CISIM 2013. LNCS, vol. 8104, pp. 13–23. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40925-7_2
Zliobaite, I., Pechenizkiy, M., Gama, J.: An overview of concept drift applications. In: Japkowicz, N., Stefanowski, J. (eds.) Big Data Analysis: New Algorithms for a New Society, Studies in Big Data, vol. 16, pp. 91–114. Springer, Heidelberg (2016)
Acknowledgments
The authors’ research was funded by the Polish National Science Center under Grant No. DEC-2013/11/B/ST6/00963. Dariusz Brzezinski acknowledges the support of an FNP START scholarship.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Brzezinski, D., Stefanowski, J. (2016). Ensemble Diversity in Evolving Data Streams. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science(), vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-46307-0_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46306-3
Online ISBN: 978-3-319-46307-0
eBook Packages: Computer ScienceComputer Science (R0)