On the Creation of Diverse Ensembles for Nonstationary Environments Using Bio-inspired Heuristics
Adaptive models for dynamic data environments have recently become an active research topic because of the large number of scenarios that generate nonstationary data streams. When a change in the data distribution (concept drift) occurs, ensembles of models trained on these data sources become obsolete and do not adapt suitably to the new distribution. Although most research in the field focuses on detecting the drift in order to retrain the ensemble, the diversity of the ensemble shortly after the drift is widely known to be important for reducing the initial drop in accuracy. In a Big Data scenario, where both the volume of data and the number of past models can be huge, achieving the most diverse ensemble requires evaluating all possible combinations of models, a task that is not easy to carry out quickly in the long term. This challenge can be formulated as an optimization problem, in which bio-inspired algorithms can play a key role. This is precisely the goal of this manuscript: to validate the relevance of diversity right after drifts, and to unveil how to achieve a highly diverse ensemble by using a self-learning optimization technique.
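To make the combinatorial cost concrete, the following minimal sketch scores every fixed-size subset of a pool of past models and keeps the most diverse one. It is an illustration under stated assumptions, not the method of this manuscript: diversity is taken here to be the mean pairwise disagreement between models, and the function names (`disagreement`, `ensemble_diversity`, `most_diverse_subset`) and the correctness vectors are hypothetical.

```python
from itertools import combinations
from math import comb


def disagreement(a, b):
    """Fraction of samples on which exactly one of the two models is correct."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def ensemble_diversity(correct, members):
    """Mean pairwise disagreement over the selected members (assumed measure)."""
    pairs = list(combinations(members, 2))
    return sum(disagreement(correct[i], correct[j]) for i, j in pairs) / len(pairs)


def most_diverse_subset(correct, k):
    """Exhaustive search: score every size-k subset and keep the best one."""
    return max(
        combinations(range(len(correct)), k),
        key=lambda members: ensemble_diversity(correct, members),
    )


# Hypothetical correctness vectors (1 = correct prediction) for four past
# models on six validation samples; the values are made up for illustration.
correct = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]

best = most_diverse_subset(correct, k=2)

# Number of candidate ensembles the exhaustive search has to score.
n_candidates_small = comb(len(correct), 2)   # 4 models, pairs
n_candidates_large = comb(100, 10)           # 100 models, ensembles of 10
```

With four models the search scores only six pairs, but choosing 10 models out of a pool of 100 already yields more than 10^13 candidate ensembles, which is why a heuristic search is attractive in place of exhaustive enumeration.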
Keywords: Concept drift, Diversity, Bioinspired optimization
This work has been supported by the Basque Government through the ELKARTEK program (ref. KK-2015/0000080, BID3A project) and BID3ABI project.