A hybrid decision tree training method using data streams

Abstract

Classical classification methods usually assume that a pattern recognition model does not depend on when the data were collected. This assumption fails when new data become available continuously, which is common in practice, for example in spam filtering or fraud detection, where the dependencies between feature values and class labels change over time. Unfortunately, most classical machine learning methods, such as decision trees, do not account for the possibility of the model changing as a result of so-called concept drift, and they cannot adapt to a new classification model. This paper focuses on the problem of concept drift, an issue that is particularly important for data mining methods that use complex structures, such as decision trees, for making decisions. We propose an algorithm that co-trains decision trees with a modified NGE (Nested Generalized Exemplar) algorithm. The adaptability and classification quality of the proposed algorithm are evaluated in computer experiments on benchmark datasets from the UCI Machine Learning Repository.
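To make the general idea concrete, the sketch below combines a decision tree, retrained on a sliding window of recent stream chunks so that outdated concepts are forgotten, with NGE-style hyperrectangles that memorize and generalize examples the current tree misclassifies. This is an illustrative sketch only, not the algorithm proposed in the paper: it assumes scikit-learn is available, and the class name HybridTreeNGE, the window size of 500, and the 0.1 box-merge tolerance are hypothetical choices made for the example.

```python
# Illustrative sketch of a tree + NGE hybrid for drifting streams.
# Not the paper's algorithm; window size and merge tolerance are
# hypothetical choices made for this example.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class HybridTreeNGE:
    def __init__(self, window_size=500, tol=0.1):
        self.window_size = window_size  # how much recent data the tree sees
        self.tol = tol                  # how near a box an error may fall to merge
        self.X = None                   # sliding window of samples
        self.y = None                   # sliding window of labels
        self.tree = DecisionTreeClassifier()
        self.boxes = []                 # (lower, upper, label) hyperrectangles

    def partial_fit(self, X_new, y_new):
        # Append the new chunk and truncate to the window, so the tree
        # forgets outdated concepts (a crude response to concept drift).
        if self.X is None:
            self.X, self.y = X_new.copy(), y_new.copy()
        else:
            self.X = np.vstack([self.X, X_new])
            self.y = np.concatenate([self.y, y_new])
        self.X, self.y = self.X[-self.window_size:], self.y[-self.window_size:]
        self.tree.fit(self.X, self.y)
        # NGE-style step (simplified from Salzberg's algorithm): keep
        # samples the refit tree still misclassifies as point boxes,
        # growing an existing same-class box when the error falls near it.
        errors = self.tree.predict(X_new) != y_new
        for x, label in zip(X_new[errors], y_new[errors]):
            for i, (lo, hi, lab) in enumerate(self.boxes):
                if (lab == label and np.all(x >= lo - self.tol)
                        and np.all(x <= hi + self.tol)):
                    self.boxes[i] = (np.minimum(lo, x), np.maximum(hi, x), lab)
                    break
            else:
                self.boxes.append((x.copy(), x.copy(), label))

    def predict(self, X):
        preds = self.tree.predict(X)
        for j, x in enumerate(X):
            for lo, hi, lab in self.boxes:
                if np.all(x >= lo) and np.all(x <= hi):
                    preds[j] = lab  # a covering exemplar overrides the tree
                    break
        return preds
```

In a streaming run, partial_fit would be called once per arriving labelled chunk; the window keeps the tree roughly current, while the exemplar boxes patch regions where the most recent labels disagree with it, which is one simple way a tree/hyperrectangle pair can react to drift between retraining steps.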


Acknowledgments

This work is supported in part by the Polish State Committee for Scientific Research under a grant for the period 2010–2013.


Author information

Correspondence to Michal Wozniak.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Cite this article

Wozniak, M. A hybrid decision tree training method using data streams. Knowl Inf Syst 29, 335–347 (2011). https://doi.org/10.1007/s10115-010-0345-5

Keywords

  • Nested generalized exemplar
  • Nearest hyperrectangle
  • Concept drift
  • Decision tree
  • Incremental learning
  • Pattern recognition