Regression Trees from Data Streams with Drift Detection

  • Elena Ikonomovska
  • João Gama
  • Raquel Sebastião
  • Dejan Gjorgjevik
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5808)

Abstract

The problem of extracting meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. We present an algorithm that learns regression trees from fast, unbounded data streams in the presence of concept drift. To the best of our knowledge, there is no other algorithm for incrementally learning regression trees that is equipped with change detection abilities. The FIRT-DD algorithm has mechanisms for drift detection and model adaptation, which make it possible to maintain accurate and up-to-date regression models at any time. The drift detection mechanism is based on sequential statistical tests that track the evolution of the local error at each node of the tree and inform the learning process of the detected changes. In response to a local drift, the algorithm adapts the model only locally, avoiding the need for a global model adaptation. The adaptation strategy consists of building a new tree whenever a change is suspected in a region and replacing the old tree once the new one becomes more accurate. This enables smooth and granular adaptation of the global model. The results of an empirical evaluation over several different types of drift show that the algorithm is capable of consistent detection of, and proper adaptation to, concept drift.
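To make the idea of a sequential test on a node's local error more concrete, the following is a minimal sketch of one widely used test of this kind, the Page-Hinkley test. The abstract does not state which test FIRT-DD uses or how it is parameterised, so the class name, the function `first_alarm`, and the values of `delta` and `threshold` are illustrative assumptions, not the authors' implementation.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream of values.

    Illustrative sketch only; delta and threshold are assumed values.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.n = 0                  # number of values seen so far
        self.mean = 0.0             # running mean of the values
        self.cum = 0.0              # cumulative deviation from the mean
        self.cum_min = 0.0          # smallest cumulative deviation so far

    def update(self, x):
        """Consume one value (e.g. an absolute error); return True on alarm."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(errors, detector):
    """Return the index of the first value that raises an alarm, or None."""
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None


if __name__ == "__main__":
    # Synthetic local error: stable at 0.1, then jumps to 1.0 at index 200.
    stream = [0.1] * 200 + [1.0] * 200
    print(first_alarm(stream, PageHinkley(delta=0.005, threshold=5.0)))
```

In a tree-based learner, one such detector could be kept per node and fed the error of each example that passes through that node, so that an alarm points to the region of the input space where the change occurred.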

Keywords

data stream, regression trees, concept drift, change detection, stream data mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Elena Ikonomovska (1)
  • João Gama (2, 3)
  • Raquel Sebastião (2, 4)
  • Dejan Gjorgjevik (1)

  1. FEEIT, Ss. Cyril and Methodius University, Skopje, Macedonia
  2. LIAAD/INESC, University of Porto, Porto, Portugal
  3. Faculty of Economics, University of Porto, Porto, Portugal
  4. Faculty of Science, University of Porto, Porto, Portugal