# Very fast decision rules for classification in data streams


## Abstract

Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems, and therefore many new algorithms for data streams are being proposed. Decision rules are among the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. To manage these situations, we also present an adaptive extension (AVFDR) that detects changes in the process generating data and adapts the decision model. Detecting local drifts takes advantage of the modularity of the rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, enables faster adaptation to changes, and yields more compact rule sets. The experimental evaluation demonstrates that the proposed algorithms achieve competitive results in comparison to alternative methods, and that the adaptive methods are able to learn fast and compact rule sets from evolving streams.
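The per-rule monitoring and pruning idea described for AVFDR can be illustrated with a minimal sketch. The abstract does not say which statistical test each rule uses, so the sketch assumes a DDM-style test on the rule's running error rate (signal drift when the error rate plus its standard deviation exceeds the best recorded level by three standard deviations). The names `DriftMonitor`, `Rule`, and `process`, the dictionary encoding of rule conditions, and the 30-example warm-up are hypothetical choices made for illustration, not the authors' implementation. Rule induction itself is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DriftMonitor:
    """DDM-style test on one rule's 0/1 error stream (assumed detector)."""
    min_samples: int = 30     # warm-up before testing (assumed value)
    n: int = 0                # examples covered so far
    errors: int = 0           # misclassified examples so far
    best: float = math.inf    # lowest p + s observed so far
    best_p: float = 0.0
    best_s: float = 0.0

    def update(self, mistake: bool) -> bool:
        """Record one prediction outcome; return True if drift is signaled."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return False
        if p + s < self.best:                    # remember the best level
            self.best, self.best_p, self.best_s = p + s, p, s
        return p + s > self.best_p + 3.0 * self.best_s


@dataclass
class Rule:
    conditions: dict          # attribute -> required value (hypothetical format)
    label: str                # class predicted by the rule
    monitor: DriftMonitor = field(default_factory=DriftMonitor)

    def covers(self, x: dict) -> bool:
        return all(x.get(a) == v for a, v in self.conditions.items())


def process(rules: list, x: dict, y: str) -> list:
    """Update every rule covering x and prune those that signal drift."""
    kept = []
    for rule in rules:
        if rule.covers(x) and rule.monitor.update(rule.label != y):
            continue          # local drift detected: drop only this rule
        kept.append(rule)
    return kept
```

Because each rule carries its own monitor, a change that affects one region of the instance space removes only the rules covering that region, while the rest of the rule set is left intact.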

## Keywords

Data streams, Classification, Rule learning, Concept drift

## Notes

### Acknowledgments

The authors would like to express their gratitude to the reviewers of previous versions of the paper. This work is partially funded by FCT - Fundação para a Ciência e a Tecnologia/MEC - Ministério da Educação e Ciência through National Funds (PIDDAC) and the ERDF - European Regional Development Fund through ON2 North Portugal Regional Operational Programme within the projects Knowledge Discovery from Ubiquitous Data Streams FCT-KDUS (PTDC/EIA/098355/2008), NORTE-07-0124-FEDER-000059. Authors also acknowledge the support of the European Commission through the project MAESTRA (Grant Number ICT-2013-612944). Petr Kosina also acknowledges the support of Faculty of Informatics, MU, Brno.
