Abstract
Mining concept-drifting data streams is a challenging area of data mining research. Real-world data streams are not stable but change over time. Such changes, termed concept drifts, are categorized as gradual or abrupt according to the drifting time, i.e., the number of time steps taken for the new concept to completely replace the old one. Traditional online learning systems have not exploited this categorization to develop distinct approaches for handling the different types of drift. Handling concept drifts according to their type can improve the performance of the classification system, so the issue merits further exploration. Among the most popular and effective approaches for handling concept drift is ensemble learning, in which a set of models built over different time periods is maintained and their predictions are combined, usually weighted by each model's expertise on the current concept. If early instances of the new concept are stored and used for ensemble learning once a drift is detected, the overall accuracy after the drift may increase. Moreover, if the ensemble learns instances of the new concept with zero diversity during the drifting period, it may learn the new concept faster, thus speeding up recovery. This paper presents such an approach for effectively handling gradual concept drifts in data streams.
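The combination of techniques described above — weighted-vote online bagging, an instance window of early new-concept examples, and zero-diversity training during a drift — can be sketched roughly as follows. This is a minimal illustration, not the authors' actual algorithm: the base-learner interface (`partial_fit`/`predict`), the weight-decay factor, and all parameter names are assumptions for the sake of the example. "Zero diversity" is rendered here in the sense used in online-bagging work: every base model trains exactly once on each instance, instead of `k ~ Poisson(1)` times.

```python
import random
from collections import defaultdict, deque


def poisson1():
    """Draw k ~ Poisson(lambda=1) via Knuth's method, as used in online bagging."""
    threshold, k, p = 2.718281828 ** -1, 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1


class MajorityClass:
    """Trivial illustrative base learner: predicts the most frequent label seen."""
    def __init__(self):
        self.counts = defaultdict(int)

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0


class OnlineBaggingEnsemble:
    """Online bagging ensemble with a zero-diversity mode and an instance window."""
    def __init__(self, make_learner, n_models=10, window_size=200):
        self.models = [make_learner() for _ in range(n_models)]
        self.weights = [1.0] * n_models          # expertise on the current concept
        self.window = deque(maxlen=window_size)  # early instances of the new concept
        self.zero_diversity = False              # toggled on while a drift is active

    def learn_one(self, x, y):
        if self.zero_diversity:
            # Store early new-concept instances for later (re)training.
            self.window.append((x, y))
        for i, model in enumerate(self.models):
            # Zero diversity: every model sees the instance exactly once, so all
            # models converge on the new concept together; otherwise classic
            # online bagging with k ~ Poisson(1) presentations per model.
            k = 1 if self.zero_diversity else poisson1()
            for _ in range(k):
                model.partial_fit(x, y)
            # Multiplicative weight update: halve a model's weight on a mistake.
            self.weights[i] *= 1.0 if model.predict(x) == y else 0.5

    def predict_one(self, x):
        # Weighted majority vote across the ensemble.
        votes = defaultdict(float)
        for model, w in zip(self.models, self.weights):
            votes[model.predict(x)] += w
        return max(votes, key=votes.get)
```

In use, a drift detector (e.g., of the kind described by Gama et al.) would flip `zero_diversity` on when a warning is raised and off once the new concept is stable, at which point the stored window can seed fresh ensemble members.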
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Attar, V., Chaudhary, P., Rahagude, S., Chaudhari, G., Sinha, P. (2012). An Instance-Window Based Classification Algorithm for Handling Gradual Concept Drifts. In: Cao, L., Bazzan, A.L.C., Symeonidis, A.L., Gorodetsky, V.I., Weiss, G., Yu, P.S. (eds) Agents and Data Mining Interaction. ADMI 2011. Lecture Notes in Computer Science, vol 7103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27609-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27608-8
Online ISBN: 978-3-642-27609-5
eBook Packages: Computer Science (R0)