
Data Mining and Knowledge Discovery, Volume 29, Issue 1, pp 168–202

Very fast decision rules for classification in data streams

  • Petr Kosina
  • João Gama

Abstract

Data stream mining is the process of extracting knowledge structures from continuous, rapid records of data. Many decision tasks can be formulated as stream mining problems, and therefore many new algorithms for data streams are being proposed. Decision rules are among the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work online and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. To handle such situations we also present the adaptive extension (AVFDR), which detects changes in the process generating data and adapts the decision model accordingly. Detecting local drifts takes advantage of the modularity of rule sets: in AVFDR, each individual rule monitors the evolution of its performance metrics to detect concept drift, and rules are pruned whenever a drift is signaled. This explicit change-detection mechanism provides useful information about the dynamics of the process generating data, enables faster adaptation to changes, and produces more compact rule sets. The experimental evaluation demonstrates that the algorithms achieve competitive results in comparison to alternative methods, and that the adaptive methods are able to learn fast and compact rule sets from evolving streams.
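The core AVFDR idea described above — each rule carries its own drift detector and is pruned when its error rate degrades — can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: rule expansion via the Hoeffding bound is omitted, the drift test is a simplified DDM-style check on the rule's error statistics, and all class and method names here are illustrative.

```python
import math

class Rule:
    """One decision rule with its own DDM-style drift monitor."""
    def __init__(self, conditions, label):
        self.conditions = conditions   # list of (attribute index, op, value)
        self.label = label             # class predicted by this rule
        self.n = 0                     # examples covered so far
        self.errors = 0                # misclassifications so far
        self.p_min = float("inf")      # minimum observed error rate
        self.s_min = float("inf")      # std. dev. at that minimum

    def covers(self, x):
        return all(x[i] <= v if op == "<=" else x[i] > v
                   for i, op, v in self.conditions)

    def update(self, y):
        """Update error statistics; return True if drift is signaled."""
        self.n += 1
        if self.label != y:
            self.errors += 1
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        # drift level: error rate significantly above its recorded minimum
        return self.n > 30 and p + s > self.p_min + 3 * self.s_min

class AdaptiveRuleSet:
    """Ordered rule set that prunes individual rules on local drift."""
    def __init__(self):
        self.rules = []
        self.default_label = None      # fallback when no rule covers x

    def predict(self, x):
        for r in self.rules:           # ordered set: first matching rule wins
            if r.covers(x):
                return r.label
        return self.default_label

    def learn_one(self, x, y):
        self.default_label = y         # naive fallback update for the sketch
        for r in list(self.rules):
            if r.covers(x) and r.update(y):
                self.rules.remove(r)   # prune this rule: local drift detected
```

Because each rule monitors only the examples it covers, a drift confined to one region of the instance space removes only the affected rules, while the rest of the model is preserved — this is the locality advantage the abstract attributes to the modularity of rule sets.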

Keywords

Data streams · Classification · Rule learning · Concept drift

Notes

Acknowledgments

The authors would like to express their gratitude to the reviewers of previous versions of the paper. This work is partially funded by FCT - Fundação para a Ciência e a Tecnologia / MEC - Ministério da Educação e Ciência through National Funds (PIDDAC) and by the ERDF - European Regional Development Fund through the ON2 North Portugal Regional Operational Programme, within the projects Knowledge Discovery from Ubiquitous Data Streams FCT-KDUS (PTDC/EIA/098355/2008) and NORTE-07-0124-FEDER-000059. The authors also acknowledge the support of the European Commission through the project MAESTRA (Grant Number ICT-2013-612944). Petr Kosina also acknowledges the support of the Faculty of Informatics, MU, Brno.


Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. LIAAD - INESC TEC, Porto, Portugal
  2. Faculty of Informatics, Masaryk University, Brno, Czech Republic
  3. Faculty of Economics, University of Porto, Porto, Portugal
