Data Mining and Knowledge Discovery, Volume 23, Issue 1, pp 128–168

Learning model trees from evolving data streams

Article

Abstract

The problem of extracting meaningful patterns from time-changing data streams in real time is of increasing importance for the machine learning and data mining communities. Regression on time-changing data streams is a relatively unexplored topic, despite its evident applications. This paper proposes an efficient, incremental stream mining algorithm that learns regression and model trees from possibly unbounded, high-speed, and time-changing data streams. The algorithm is evaluated extensively in a variety of settings involving artificial and real data. To the best of our knowledge, no other general-purpose algorithm for incrementally learning regression/model trees performs explicit change detection and informed adaptation. The algorithm operates online and in real time, observes each example only once at the speed of arrival, and maintains a ready-to-use model tree at any time. The tree leaves contain linear models induced online from the examples assigned to them, a process of low complexity. The algorithm has mechanisms for drift detection and model adaptation, which enable it to maintain accurate and up-to-date regression models at any time. The drift detection mechanism exploits the structure of the tree for local change detection. In response to local drift, the algorithm updates the tree structure only locally. This approach improves the any-time performance and greatly reduces the cost of adaptation.
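Two of the building blocks described above can be illustrated with a short sketch: a linear model kept in a tree leaf and updated one example at a time, and a Page-Hinkley-style change detector that monitors the leaf's error stream for local drift. The code below is a minimal, hypothetical illustration in Python/NumPy, not the authors' implementation; the class names, the gradient-step leaf update, and the detector parameters `delta` and `lam` are assumptions made for clarity.

```python
# Minimal sketch (assumptions only): an online linear leaf model plus a
# Page-Hinkley change detector over its absolute prediction errors.
import numpy as np


class OnlineLinearLeaf:
    """Linear model updated one example at a time (stochastic gradient step)."""

    def __init__(self, n_features, learning_rate=0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return float(np.dot(self.w, x) + self.b)

    def update(self, x, y):
        # One gradient step on the squared error of this single example.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return abs(error)


class PageHinkley:
    """Signals a change when the cumulative deviation of the monitored values
    from their running mean exceeds a threshold lam (illustrative parameters)."""

    def __init__(self, delta=0.005, lam=50.0):
        self.delta, self.lam = delta, lam
        self.mean, self.cum, self.min_cum, self.n = 0.0, 0.0, 0.0, 0

    def add(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.lam  # True => drift signalled


# Usage sketch: each arriving example is seen once, the leaf model is updated,
# and its error stream is monitored; a real tree would adapt the affected
# subtree when drift is signalled.
leaf, detector = OnlineLinearLeaf(n_features=3), PageHinkley()
rng = np.random.default_rng(0)
for t in range(1000):
    x = rng.normal(size=3)
    y = 2.0 * x[0] - x[1] + (5.0 if t > 500 else 0.0)  # abrupt drift at t = 500
    err = leaf.update(x, y)
    if detector.add(err):
        print(f"drift signalled at example {t}")
        break
```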

Keywords

Non-stationary data streams · Stream data mining · Regression trees · Model trees · Incremental algorithms · On-line learning · Concept drift · On-line change detection

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Elena Ikonomovska (1, 4)
  • João Gama (2, 3)
  • Sašo Džeroski (1)

  1. Jožef Stefan Institute, Ljubljana, Slovenia
  2. LIAAD/INESC, University of Porto, Porto, Portugal
  3. Faculty of Economics, University of Porto, Porto, Portugal
  4. Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University, Skopje, Macedonia
