Abstract
The problem of real-time extraction of meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. Regression in time-changing data streams is a relatively unexplored topic, despite its evident applications. This paper proposes an efficient and incremental stream mining algorithm which is able to learn regression and model trees from possibly unbounded, high-speed and time-changing data streams. The algorithm is evaluated extensively in a variety of settings involving artificial and real data. To the best of our knowledge, there is no other general purpose algorithm for incremental learning of regression/model trees able to perform explicit change detection and informed adaptation. The algorithm performs online and in real-time, observes each example only once at the speed of arrival, and maintains a ready-to-use model tree at any time. The tree leaves contain linear models induced online from the examples assigned to them, a process with low complexity. The algorithm has mechanisms for drift detection and model adaptation, which enable it to maintain accurate and updated regression models at any time. The drift detection mechanism exploits the structure of the tree in the process of local change detection. As a response to local drift, the algorithm is able to update the tree structure only locally. This approach improves the any-time performance and greatly reduces the costs of adaptation.
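Two ingredients mentioned in the abstract can be illustrated compactly: the Hoeffding bound used by this family of stream-mining algorithms to decide when enough examples have been seen to commit to a split, and a linear model in a leaf that is updated with a single gradient step per example. The sketch below is illustrative only, not the paper's exact algorithm; the class name `LeafModel`, the learning rate, and the use of plain least-mean-squares updates are assumptions for the example.

```python
import math

def hoeffding_bound(value_range, delta, n):
    # Hoeffding bound: with probability 1 - delta, the true mean of a
    # random variable with range `value_range` lies within `eps` of the
    # sample mean after n independent observations.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

class LeafModel:
    """Linear model fit incrementally, one example at a time (LMS-style),
    so each example is observed only once at the speed of arrival."""

    def __init__(self, n_attrs, lr=0.05):
        self.w = [0.0] * (n_attrs + 1)  # attribute weights + intercept
        self.lr = lr

    def predict(self, x):
        # zip stops at len(x), so w[-1] serves as the intercept
        return self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        # single gradient step on the squared error for this example
        err = self.predict(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.w[-1] -= self.lr * err
```

A split on the best attribute is typically accepted once the observed merit gap between the two best candidates exceeds `hoeffding_bound(...)`, which shrinks as more examples arrive; the per-example leaf update keeps memory and time per example constant.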
References
Aggarwal CC (2006) Data streams: models and algorithms. Springer, New York
Basseville M, Nikiforov I (1993) Detection of abrupt changes: theory and applications. Prentice-Hall, Englewood Cliffs, NJ
Blake C, Keogh E, Merz C (1999) UCI repository of machine learning databases. http://archive.ics.uci.edu/ml. Accessed 19 Jan 2010
Breiman L (1998) Arcing classifiers. Ann Stat 26(3): 801–824
Breiman L, Friedman JH, Olshen RA, Stone CJ (1998) Classification and regression trees. CRC Press, Boca Raton, FL
Chaudhuri P, Huang M, Loh W, Yao R (1994) Piecewise polynomial regression trees. Stat Sin 4: 143–167
Chen Y, Dong G, Han J, Wah BW, Wang J (2002) Multi-dimensional regression analysis of time-series data streams. In: Proc the 28th int conf on very large databases. Morgan Kaufmann, San Francisco, pp 323–334
CUBIST (2009) RuleQuest research. http://www.rulequest.com/cubist-info.html. Accessed 19 Jan 2010
Dasu T, Krishnan S, Lin D, Venkatasubramanian S, Yi K (2009) Change (detection) you can believe in: finding distributional shifts in data streams. In: Proc IDA’09. Springer, Berlin, pp 21–34
Data Expo (2009) ASA sections on statistical computing and statistical graphics. http://stat-computing.org/dataexpo/2009. Accessed 19 Jan 2010
Dawid AP (1984) Statistical theory: the prequential approach. J R Stat Soc A 147: 278–292
Dobra A, Gherke J (2002) SECRET: a scalable linear regression tree algorithm. In: Proc 8th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 481–487
Domingos P, Hulten G (2000) Mining high speed data streams. In: Proc 6th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 71–80
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19(1): 1–67. doi:10.1214/aos/1176347963
Gama J, Castillo G (2004) Learning with local drift detection. In: Proc 2nd int conf on advanced data mining and applications, LNCS, vol 4093. Springer, Berlin, pp 42–55
Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proc 9th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 523–528
Gama J, Medas P, Rocha R (2004) Forest trees for on-line data. In: Proc 2004 ACM symposium on applied computing. ACM Press, New York, pp 632–636
Gama J, Sebastiao R, Rodrigues PP (2009) Issues in evaluation of stream learning algorithms. In: Proc 16th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 329–338
Gammerman A, Vovk V (2002) Prediction algorithms and confidence measures based on algorithmic randomness theory. Theor Comput Sci 287: 209–217
Gammerman A, Vovk V (2007) Hedging predictions in machine learning. Comput J 50: 151–163
Gao J, Fan W, Han J, Yu PS (2007) A general framework for mining concept-drifting data streams with skewed distributions. In: Proc 7th int conf on data mining, SIAM, Philadelphia, PA
Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4: 1–58
Gratch J (1996) Sequential inductive learning. In: Proc 13th natl conf on artificial intelligence and 8th innovative applications of artificial intelligence conf, vol 1. AAAI Press, Menlo Park, CA, pp 779–786
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58: 13–30
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proc 7th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 97–106
Ikonomovska E, Gama J (2008) Learning model trees from data streams. In: Proc 11th int conf on discovery science, LNAI, vol 5255. Springer, Berlin, pp 52–63
Ikonomovska E, Gama J, Sebastião R, Gjorgjevik D (2009) Regression trees from data streams with drift detection. In: Proc 12th int conf on discovery science, LNAI, vol 5808. Springer, Berlin, pp 121–135
Jin R, Agrawal G (2003) Efficient decision tree construction on streaming data. In: Proc 9th ACM SIGKDD int conf on knowledge discovery and data mining. ACM Press, New York, pp 571–576
Karalic A (1992) Employing linear regression in regression tree leaves. In: Proc 10th European conf on artificial intelligence. Wiley, New York, pp 440–441
Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proc 30th int conf on very large data bases. Morgan Kaufmann, San Francisco, pp 180–191
Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proc 17th int conf on machine learning. Morgan Kaufmann, San Francisco, pp 487–494
Klinkenberg R, Renz I (1998) Adaptive information filtering: learning in the presence of concept drifts. In: Proc AAAI98/ICML-98 wshp on learning for text categorization. AAAI Press, Menlo Park, pp 33–40
Loh W (2002) Regression trees with unbiased variable selection and interaction detection. Stat Sin 12: 361–386
Malerba D, Appice A, Ceci M, Monopoli M (2002) Trading-off local versus global effects of regression nodes in model trees. In: Proc 13th int symposium on foundations of intelligent systems, LNCS, vol 2366. Springer, Berlin, pp 393–402
Mouss H, Mouss D, Mouss N, Sefouhi L (2004) Test of Page–Hinckley, an approach for fault detection in an agro-alimentary production system. In: Proc 5th Asian control conference, vol 2. IEEE Computer Society, Los Alamitos, CA, pp 815–818
Musick R, Catlett J, Russell S (1993) Decision theoretic sub-sampling for induction on large databases. In: Proc 10th int conf on machine learning. Morgan Kaufmann, San Francisco, pp 212–219
Pang KP, Ting KM (2005) Improving the centered CUSUMS statistic for structural break detection in time series. In: Proc 17th Australian joint conf on artificial intelligence, LNCS, vol 3339. Springer, Berlin, pp 402–413
Pfahringer B, Holmes G, Kirkby R (2008) Handling numeric attributes in Hoeffding trees. In: Proc 12th Pacific-Asian conf on knowledge discovery and data mining, LNCS, vol 5012. Springer, Berlin, pp 296–307
Potts D, Sammut C (2005) Incremental learning of linear model trees. Mach Learn 61: 5–48. doi:10.1007/s10994-005-1121-8
Quinlan JR (1992) Learning with continuous classes. In: Proc 5th Australian joint conf on artificial intelligence. World Scientific, Singapore, pp 343–348
Rajaraman K, Tan AH (2001) Topic detection, tracking, and trend analysis using self-organizing neural networks. In: Proc 5th Pacific-Asian conf on knowledge discovery and data mining, LNCS, vol 2035. Springer, Berlin, pp 102–107
Rodrigues PP, Gama J, Bosnic Z (2008) Online reliability estimates for individual predictions in data streams. In: Proc IEEE int conf on data mining workshops. IEEE Computer Society, Los Alamitos, CA, pp 36–45
Sebastiao R, Rodrigues PP, Gama J (2009) Change detection in climate data over the Iberian peninsula. In: Proc IEEE int conf on data mining workshops. IEEE Computer Society, Los Alamitos, CA, pp 248–253
Siciliano R, Mola F (1994) Modeling for recursive partitioning and variable selection. In: Proc int conf on computational statistics. Physica Verlag, Heidelberg, pp 172–177
Song X, Wu M, Jermaine C, Ranka S (2007) Statistical change detection for multidimensional data. In: Proc 13th ACM SIGKDD conf on knowledge discovery and data mining, pp 667–676
Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D (2006) Online outlier detection in sensor data using non-parametric methods. In: Proc 32nd int conf on very large databases, ACM, New York, pp 187–198
Torgo L (1997) Functional models for regression tree leaves. In: Proc 14th int conf on machine learning. Morgan Kaufmann, San Francisco, pp 385–393
VFML (2003) A toolkit for mining high-speed time-changing data streams. http://www.cs.washington.edu/dm/vfml. Accessed 19 Jan 2010
Vogel DS, Asparouhov O, Scheffer T (2007) Scalable look-ahead linear regression trees. In: Berkhin P, Caruana R, Wu X (eds) Proc 13th ACM SIGKDD int conf on knowledge discovery and data mining, KDD. ACM, San Jose, CA, pp 757–764
WEKA 3 (2005) Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka. Accessed 19 Jan 2010
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23: 69–101. doi:10.1007/BF00116900
Additional information
Responsible editor: Eamonn Keogh.
This paper has its origins in two conference papers that propose and partly evaluate a new algorithm for learning model trees from stationary data streams (Ikonomovska and Gama 2008) and an improvement of this algorithm for learning from non-stationary data streams (Ikonomovska et al. 2009). However, this paper significantly extends and upgrades the work presented there, both on the algorithmic design and experimental evaluation fronts. Concerning the algorithm: we consider a new approach to the split selection criteria; we include a memory-saving method for disabling bad split points; we improve the adaptation mechanism by using the Q statistic with a fading factor (Gama et al. 2009); and we discuss and propose several memory management methods that enable effective learning under constrained resources (deactivating and reactivating non-problematic leaves, removing non-promising alternate trees). We also perform a much more extensive and in-depth experimental evaluation: we consider a larger collection of real-world datasets; we use a more carefully designed experimental methodology (sliding-window prequential and holdout evaluation, among others); and we provide a much more comprehensive discussion of the experimental results.
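The fading-factor idea behind the Q statistic (Gama et al. 2009) can be sketched briefly: prequential errors are accumulated with exponential down-weighting of older examples, and the log ratio of the faded errors of the current tree and an alternate tree indicates which one is performing better on recent data. This is a minimal sketch under assumed details; the class name `FadedError`, the use of absolute error, and the constant 0.995 are illustrative choices, not the paper's settings.

```python
import math

class FadedError:
    """Prequential error accumulator with a fading factor.

    Older errors are exponentially down-weighted, so the estimate
    tracks recent performance rather than the whole history.
    """

    def __init__(self, alpha=0.995):
        self.alpha = alpha
        self.s = 0.0  # faded sum of errors
        self.n = 0.0  # faded count of examples
    def add(self, err):
        # fold in one prequential error; returns the faded average
        self.s = self.alpha * self.s + abs(err)
        self.n = self.alpha * self.n + 1.0
        return self.s / self.n

def q_statistic(err_tree, err_alt):
    # log ratio of faded error sums: positive values mean the alternate
    # tree has been more accurate on recent examples
    return math.log(err_tree.s / err_alt.s)
```

In an adaptation scheme of this kind, an alternate subtree grown after a local drift alarm would replace the original subtree once the statistic stays above a threshold, which is how adaptation remains local and cheap.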
Ikonomovska, E., Gama, J. & Džeroski, S. Learning model trees from evolving data streams. Data Min Knowl Disc 23, 128–168 (2011). https://doi.org/10.1007/s10618-010-0201-y