Abstract
Data stream classification is a fast-growing research area owing to the increasing number of practical applications in modern technology; spam filtering and weather forecasting are just two well-known examples. However, the high rate of incoming data makes classical algorithms inefficient, as they usually rely on batch processing. Moreover, the characteristics of the data can change over time, which makes classifiers obsolete. Continuous updating can help, but it cannot be applied in classical batch algorithms. On-line or chunk-based training could be a solution. The latter repeatedly extracts data chunks from the data stream and uses them for adaptation. For many difficult classification tasks, ensembles of classifiers work much better than systems based on a single classifier. Unfortunately, ensembles require additional training of their fusion model. In this paper we present an ensemble for data stream classification and compare two optimisation methods used for its training: Genetic Algorithm and Simulated Annealing. Results of experiments on several benchmark datasets show that both methods are equally effective in terms of accuracy and outperform several competing methods.
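The abstract does not specify the fusion model or the optimiser settings, so the following is only a minimal sketch of the general idea under stated assumptions: the fusion model is assumed to be a weighted majority vote over base-classifier outputs, and Simulated Annealing is assumed to tune the weights on each incoming chunk by minimising the chunk error. All function names, parameters, and default values (`weighted_vote`, `chunk_error`, `anneal_weights`, `t_start`, `cooling`, `step`) are hypothetical and are not taken from the paper.

```python
import math
import random


def weighted_vote(weights, votes):
    """Fuse one sample's base-classifier votes (class labels) by weighted majority."""
    scores = {}
    for w, label in zip(weights, votes):
        scores[label] = scores.get(label, 0.0) + w
    # Iterate labels in sorted order so ties resolve to the smallest label.
    return max(sorted(scores), key=lambda label: scores[label])


def chunk_error(weights, chunk_votes, chunk_labels):
    """Fraction of samples in the chunk misclassified by the weighted vote."""
    wrong = sum(
        weighted_vote(weights, votes) != y
        for votes, y in zip(chunk_votes, chunk_labels)
    )
    return wrong / len(chunk_labels)


def anneal_weights(weights, chunk_votes, chunk_labels, rng,
                   t_start=1.0, t_end=1e-3, cooling=0.95, step=0.2):
    """Simulated annealing over fusion weights for one data chunk (assumed setup)."""
    current = list(weights)
    current_err = chunk_error(current, chunk_votes, chunk_labels)
    best, best_err = list(current), current_err
    t = t_start
    while t > t_end:
        # Perturb one randomly chosen weight and clip it to [0, 1].
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = min(1.0, max(0.0, candidate[i] + rng.uniform(-step, step)))
        cand_err = chunk_error(candidate, chunk_votes, chunk_labels)
        delta = cand_err - current_err
        # Metropolis criterion: accept improvements, occasionally accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = list(current), current_err
        t *= cooling  # geometric cooling schedule
    return best, best_err


def adapt_on_stream(chunks, n_classifiers, seed=0):
    """Chunk-based adaptation: re-tune the fusion weights on every incoming chunk."""
    rng = random.Random(seed)
    weights = [1.0 / n_classifiers] * n_classifiers
    errors = []
    for chunk_votes, chunk_labels in chunks:
        weights, err = anneal_weights(weights, chunk_votes, chunk_labels, rng)
        errors.append(err)
    return weights, errors
```

Here each chunk is a pair `(chunk_votes, chunk_labels)`, where `chunk_votes[k]` holds the labels predicted by the base classifiers for sample `k`. A Genetic Algorithm could be substituted for `anneal_weights` with the same objective, `chunk_error`, as its fitness function.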
Acknowledgements
This work was supported by the Polish National Science Centre under the grant no. DEC-2013/09/B/ST6/02264.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Jackowski, K. (2018). Application of Genetic Algorithm and Simulated Annealing to Ensemble Classifier Training on Data Streams. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_25
DOI: https://doi.org/10.1007/978-3-319-69835-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69834-2
Online ISBN: 978-3-319-69835-9
eBook Packages: Engineering, Engineering (R0)