Learning model trees from evolving data streams

Published in: Data Mining and Knowledge Discovery

Abstract

The problem of real-time extraction of meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. Regression on time-changing data streams is a relatively unexplored topic, despite its apparent applications. This paper proposes an efficient, incremental stream-mining algorithm that learns regression and model trees from possibly unbounded, high-speed, and time-changing data streams. The algorithm is evaluated extensively in a variety of settings involving artificial and real data. To the best of our knowledge, no other general-purpose algorithm for incremental learning of regression/model trees performs explicit change detection and informed adaptation. The algorithm runs online and in real time, observes each example only once, at the speed of arrival, and maintains a ready-to-use model tree at any time. The tree leaves contain linear models induced online from the examples assigned to them, a process with low complexity. The algorithm has mechanisms for drift detection and model adaptation that enable it to maintain accurate and up-to-date regression models at all times. The drift-detection mechanism exploits the structure of the tree in the process of local change detection. In response to local drift, the algorithm updates the tree structure only locally. This approach improves the any-time performance and greatly reduces the cost of adaptation.
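The two core mechanisms the abstract describes — per-leaf linear models trained online, and split decisions validated statistically — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the parameter names (`lr`, `delta`) and the use of a plain SGD update are assumptions, though the Hoeffding-bound split test follows the general recipe for Hoeffding-tree learners.

```python
import math

class LeafModel:
    """Model-tree leaf holding a linear model trained online by
    stochastic gradient descent on the examples routed to it."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # weights, one per attribute
        self.b = 0.0                 # intercept
        self.lr = lr                 # learning rate (assumed parameter)
        self.n = 0                   # examples seen at this leaf

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """One O(d) gradient step on the squared error per example."""
        self.n += 1
        err = self.predict(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n values
    spanning `value_range` is within this epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_merit, second_merit, value_range, delta, n):
    """Split when the best candidate's advantage over the runner-up
    exceeds the Hoeffding bound, i.e. is unlikely to be a fluke."""
    return best_merit - second_merit > hoeffding_bound(value_range, delta, n)
```

Because each update touches only the leaf the example reaches, both prediction and learning stay constant-time per example in the tree depth plus the number of attributes, which is what makes the single-pass, any-time behaviour feasible.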

References

  • Aggarwal CC (2006) Data streams: models and algorithms. Springer, New York

  • Basseville M, Nikiforov I (1993) Detection of abrupt changes: theory and applications. Prentice-Hall, Englewood Cliffs, NJ

  • Blake C, Keogh E, Merz C (1999) UCI repository of machine learning databases. http://archive.ics.uci.edu/ml. Accessed 19 Jan 2010

  • Breiman L (1998) Arcing classifiers. Ann Stat 26(3): 801–824

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1998) Classification and regression trees. CRC Press, Boca Raton, FL

  • Chaudhuri P, Huang M, Loh W, Yao R (1994) Piecewise polynomial regression trees. Stat Sin 4: 143–167

  • Chen Y, Dong G, Han J, Wah BW, Wang J (2002) Multi-dimensional regression analysis of time-series data streams. In: Proc the 28th int conf on very large databases. Morgan Kaufmann, San Francisco, pp 323–334

  • CUBIST (2009) RuleQuest research. http://www.rulequest.com/cubist-info.html. Accessed 19 Jan 2010

  • Dasu T, Krishnan S, Lin D, Venkatasubramanian S, Yi K (2009) Change (detection) you can believe in: finding distributional shifts in data streams. In: Proc IDA’09. Springer, Berlin, pp 21–34

  • Data Expo (2009) ASA sections on statistical computing and statistical graphics. http://stat-computing.org/dataexpo/2009. Accessed 19 Jan 2010

  • Dawid AP (1984) Statistical theory: the prequential approach. J R Stat Soc A 147: 278–292

  • Dobra A, Gehrke J (2002) SECRET: a scalable linear regression tree algorithm. In: Proc 8th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 481–487

  • Domingos P, Hulten G (2000) Mining high speed data streams. In: Proc 6th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 71–80

  • Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19(1): 1–67. doi:10.1214/aos/1176347963

  • Gama J, Castillo G (2004) Learning with local drift detection. In: Proc 2nd int conf on advanced data mining and applications, LNCS, vol 4093. Springer, Berlin, pp 42–55

  • Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proc 9th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 523–528

  • Gama J, Medas P, Rocha R (2004) Forest trees for on-line data. In: Proc 2004 ACM symposium on applied computing. ACM Press, New York, pp 632–636

  • Gama J, Sebastiao R, Rodrigues PP (2009) Issues in evaluation of stream learning algorithms. In: Proc 16th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 329–338

  • Gammerman A, Vovk V (2002) Prediction algorithms and confidence measures based on algorithmic randomness theory. Theor Comput Sci 287: 209–217

  • Gammerman A, Vovk V (2007) Hedging predictions in machine learning. Comput J 50: 151–163

  • Gao J, Fan W, Han J, Yu PS (2007) A general framework for mining concept-drifting data streams with skewed distributions. In: Proc 7th int conf on data mining, SIAM, Philadelphia, PA

  • Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4: 1–58

  • Gratch J (1996) Sequential inductive learning. In: Proc 13th natl conf on artificial intelligence and 8th innovative applications of artificial intelligence conf, vol 1. AAAI Press, Menlo Park, CA, pp 779–786

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58: 13–30

  • Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proc 7th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 97–106

  • Ikonomovska E, Gama J (2008) Learning model trees from data streams. In: Proc 11th int conf on discovery science, LNAI, vol. 5255. Springer, Berlin, pp 52–63

  • Ikonomovska E, Gama J, Sebastião R, Gjorgjevik D (2009) Regression trees from data streams with drift detection. In: Proc 11th int conf on discovery science, LNAI, vol 5808. Springer, Berlin, pp 121–135

  • Jin R, Agrawal G (2003) Efficient decision tree construction on streaming data. In: Proc 9th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 571–576

  • Karalic A (1992) Employing linear regression in regression tree leaves. In: Proc 10th European conf on artificial intelligence. Wiley, New York, pp 440–441

  • Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proc 30th int conf on very large data bases. Morgan Kaufmann, San Francisco, pp 180–191

  • Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proc 17th int conf on machine learning. Morgan Kaufmann, San Francisco, pp 487–494

  • Klinkenberg R, Renz I (1998) Adaptive information filtering: learning in the presence of concept drifts. In: Proc AAAI98/ICML-98 wshp on learning for text categorization. AAAI Press, Menlo Park, pp 33–40

  • Loh W (2002) Regression trees with unbiased variable selection and interaction detection. Stat Sin 12: 361–386

  • Malerba D, Appice A, Ceci M, Monopoli M (2002) Trading-off local versus global effects of regression nodes in model trees. In: Proc 13th int symposium on foundations of intelligent systems, LNCS, vol 2366. Springer, Berlin, pp 393–402

  • Mouss H, Mouss D, Mouss N, Sefouhi L (2004) Test of Page–Hinckley, an approach for fault detection in an agro-alimentary production system. In: Proc 5th Asian control conference, vol 2. IEEE Computer Society, Los Alamitos, CA, pp 815–818

  • Musick R, Catlett J, Russell S (1993) Decision theoretic sub-sampling for induction on large databases. In: Proc 10th int conf on machine learning. Morgan Kaufmann, San Francisco, pp 212–219

  • Pang KP, Ting KM (2005) Improving the centered CUSUMS statistic for structural break detection in time series. In: Proc 17th Australian joint conf on artificial intelligence, LNCS, vol 3339. Springer, Berlin, pp 402–413

  • Pfahringer B, Holmes G, Kirkby R (2008) Handling numeric attributes in Hoeffding trees. In: Proc 12th Pacific-Asian conf on knowledge discovery and data mining, LNCS, vol 5012. Springer, Berlin, pp 296–307

  • Potts D, Sammut C (2005) Incremental learning of linear model trees. Mach Learn 61(1–3): 5–48. doi:10.1007/s10994-005-1121-8

  • Quinlan JR (1992) Learning with continuous classes. In: Proc 5th Australian joint conf on artificial intelligence. World Scientific, Singapore, pp 343–348

  • Rajaraman K, Tan A (2001) Topic detection, tracking, and trend analysis using self-organizing neural networks. In: Proc 5th Pacific-Asian conf on knowledge discovery and data mining, LNCS, vol 2035. Springer, Berlin, pp 102–107

  • Rodrigues PP, Gama J, Bosnic Z (2008) Online reliability estimates for individual predictions in data streams. In: Proc IEEE int conf on data mining workshops. IEEE Computer Society, Los Alamitos, CA, pp 36–45

  • Sebastiao R, Rodrigues PP, Gama J (2009) Change detection in climate data over the Iberian peninsula. In: Proc IEEE int conf on data mining workshops. IEEE Computer Society, Los Alamitos, CA, pp 248–253

  • Siciliano R, Mola F (1994) Modeling for recursive partitioning and variable selection. In: Proc int conf on computational statistics. Physica Verlag, Heidelberg, pp 172–177

  • Song X, Wu M, Jermaine C, Ranka S (2007) Statistical change detection for multidimensional data. In: Proc 13th ACM SIGKDD conf on knowledge discovery and data mining, pp 667–676

  • Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D (2006) Online outlier detection in sensor data using non-parametric methods. In: Proc 32nd int conf on very large databases, ACM, New York, pp 187–198

  • Torgo L (1997) Functional models for regression tree leaves. In: Proc 14th int conf on machine learning. Morgan Kaufmann, San Francisco, pp 385–393

  • VFML (2003) A toolkit for mining high-speed time-changing data streams. http://www.cs.washington.edu/dm/vfml. Accessed 19 Jan 2010

  • Vogel DS, Asparouhov O, Scheffer T (2007) Scalable look-ahead linear regression trees. In: Berkhin P, Caruana R, Wu X (eds) Proc 13th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 757–764

  • WEKA 3 (2005) Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka. Accessed 19 Jan 2010

  • Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1): 69–101. doi:10.1007/BF00116900

Author information

Corresponding author

Correspondence to Elena Ikonomovska.

Additional information

Responsible editor: Eamonn Keogh.

This paper has its origins in two conference papers that propose and partly evaluate a new algorithm for learning model trees from stationary data streams (Ikonomovska and Gama 2008) and an improvement of this algorithm for learning from non-stationary data streams (Ikonomovska et al. 2009). However, this paper significantly extends and upgrades that work, both in algorithmic design and in experimental evaluation. Concerning the algorithm:

  • We consider a new approach to the split selection criteria;
  • We include a memory-saving method for disabling bad split points;
  • We improve the adaptation mechanism by using the Q statistic with a fading factor (Gama et al. 2009);
  • We discuss and propose several memory management methods that enable effective learning with constrained resources (deactivating and reactivating non-problematic leaves, removing non-promising alternate trees).

We also perform a much more extensive and in-depth experimental evaluation:

  • We consider a larger collection of real-world datasets;
  • We use a more carefully designed experimental methodology (sliding-window prequential and holdout evaluation, among others);
  • We provide a much more comprehensive discussion of the experimental results.
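The fading-factor Q statistic mentioned above, which compares an alternate subtree against the original one online, can be sketched as follows. This is an assumed reconstruction of the statistic from Gama et al. (2009), not the paper's code; the fading factor `alpha` and class name are illustrative.

```python
import math

class FadedQStatistic:
    """Fading-factor Q statistic for comparing two predictors online:
    Q_i = log(S_i^A / S_i^B), where S_i = L_i + alpha * S_{i-1} is the
    exponentially faded cumulative loss. Q_i > 0 means model A has
    recently incurred more loss than model B, so B looks better."""

    def __init__(self, alpha=0.995):
        self.alpha = alpha  # fading factor: weight of past losses
        self.s_a = 0.0      # faded cumulative loss of model A
        self.s_b = 0.0      # faded cumulative loss of model B

    def update(self, loss_a, loss_b):
        """Feed one pair of per-example losses; return the current Q."""
        self.s_a = loss_a + self.alpha * self.s_a
        self.s_b = loss_b + self.alpha * self.s_b
        return math.log(self.s_a / self.s_b)  # assumes positive losses
```

Because old losses are discounted by `alpha` at every step, the statistic tracks the models' recent relative performance rather than their whole history, which is what makes it suitable for deciding when an alternate tree grown after a drift alarm should replace the original subtree.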

About this article

Cite this article

Ikonomovska, E., Gama, J. & Džeroski, S. Learning model trees from evolving data streams. Data Min Knowl Disc 23, 128–168 (2011). https://doi.org/10.1007/s10618-010-0201-y
