Regression Trees from Data Streams with Drift Detection
The problem of extracting meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. We present an algorithm that learns regression trees from fast and unbounded data streams in the presence of concept drift. To the best of our knowledge, there is no other algorithm for incremental learning of regression trees that is equipped with change detection abilities. The FIRT-DD algorithm has mechanisms for drift detection and model adaptation, which make it possible to maintain an accurate and up-to-date regression model at any time. The drift detection mechanism is based on sequential statistical tests that track the evolution of the local error at each node of the tree and inform the learning process of the detected changes. In response to a local drift, the algorithm adapts the model only locally, avoiding the need for a global model adaptation. The adaptation strategy consists of building a new tree whenever a change is suspected in a region and replacing the old tree when the new one becomes more accurate. This enables smooth and granular adaptation of the global model. The results of an empirical evaluation performed over several different types of drift show that the algorithm detects concept drift consistently and adapts to it properly.
Keywords: data stream, regression trees, concept drift, change detection, stream data mining
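To illustrate what a sequential test that tracks the evolution of an error signal can look like, the following is a minimal sketch of the Page-Hinkley test applied to a stream of absolute prediction errors. It is an illustration only: the abstract does not state which sequential test FIRT-DD uses, and the class name `PageHinkley`, the helper `first_alarm`, the parameter values `delta` and `threshold`, and the synthetic error stream are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream.

    Hypothetical helper for illustration; parameter values are assumed.
    """
    delta: float = 0.005      # magnitude of change that is tolerated
    threshold: float = 50.0   # alarm threshold (often written lambda)
    n: int = 0                # number of observations seen so far
    mean: float = 0.0         # running mean of the observations
    cum: float = 0.0          # cumulative deviation from the running mean
    cum_min: float = 0.0      # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(errors, detector):
    """Return the index of the first alarm, or None if none is raised."""
    for i, err in enumerate(errors):
        if detector.update(err):
            return i
    return None


# Synthetic stream of absolute errors: the mean error jumps from about
# 1.0 to about 3.0 at index 1000, imitating an abrupt concept drift.
rng = random.Random(0)
errors = [abs(rng.gauss(1.0, 0.1)) for _ in range(1000)]
errors += [abs(rng.gauss(3.0, 0.1)) for _ in range(1000)]

alarm_at = first_alarm(errors, PageHinkley())
print("first alarm at index:", alarm_at)
```

In a tree learner of the kind described above, one such detector could be kept per node and fed the error of the examples that pass through that node, so that an alarm points to the affected region of the model rather than to the model as a whole.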