Very fast decision rules for classification in data streams

Abstract

Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems, and many new algorithms for data streams are therefore being proposed. Decision rules are among the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models from time-changing, high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass, any-time algorithms: they work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. To handle these situations we also present the adaptive extension (AVFDR), which detects changes in the process generating data and adapts the decision model. Detecting local drifts takes advantage of the modularity of rule sets: in AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift, and rules are pruned whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, enables faster adaptation to changes, and yields more compact rule sets. The experimental evaluation demonstrates that the proposed algorithms achieve competitive results in comparison to alternative methods and that the adaptive variants are able to learn fast, compact rule sets from evolving streams.

Notes

  1. Note that decision lists are ordered rule sets.

  2. Weighted Max generally did not produce results much different from Weighted Sum; therefore, we opted not to include this setting in the results.

References

  • Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà R, Morales-Bueno R (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams. ECML-PKDD, Berlin, pp 77–86

  • Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, Ohl P, Thiel K, Wiswedel B (2009) KNIME: the konstanz information miner: version 2.0 and beyond. SIGKDD Explor Newsl 11:26–31

  • Bifet A, Gavaldà R (2009) Adaptive learning from evolving data streams. In: Advances in intelligent data analysis VIII. Lecture notes in computer science, vol 5772. Springer, Berlin/Heidelberg, pp 249–260

  • Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res (JMLR) 11:1601–1604

  • Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’09. ACM Press, New York, pp 139–148

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees, 1st edn. Chapman and Hall/CRC, Boca Raton

  • Clark P, Boswell R (1991) Rule induction with CN2: some recent improvements. In: Proceedings of the European working session on machine learning, EWSL ’91. Springer, London, pp 151–163

  • Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3:261–283

  • Cohen W (1995) Fast effective rule induction. In: Proceedings of the 12th international conference on machine learning, ICML’95. Morgan Kaufmann, San Francisco, pp 115–123

  • Data Expo (2009) ASA sections on statistical computing and statistical graphics. http://stat-computing.org/dataexpo/2009/. Accessed 1 Feb 2013

  • Data Mining Group (2011) Predictive model markup language (PMML 4.1). http://www.dmg.org/v4-0-1/RuleSet.html. Accessed 1 Feb 2013

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Domingos P (1996) Unifying instance-based and rule-based induction. Mach Learn 24:141–168

  • Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’00. ACM Press, New York, pp 71–80

  • Ferrer F, Aguilar J, Riquelme J (2005) Incremental rule learning and border examples selection from numerical data streams. J Univ Comput Sci 11(8):1426–1439

  • Frank A, Asuncion A (2010) UCI machine learning repository. University of California, Irvine

  • Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: Proceedings of the 15th international conference on machine learning, ICML’98. Morgan Kaufmann, San Mateo, pp 144–151

  • Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

  • Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86–92

  • Fürnkranz J (2001) Round robin rule learning. In: Proceedings of the 18th international conference on machine learning, ICML’01. Morgan Kaufmann, San Mateo, pp 146–153

  • Fürnkranz J, Gamberger D, Lavrač N (2012) Foundations of rule learning. Springer, New York

  • Gama J (2010) Knowledge discovery from data streams. Chapman and Hall/CRC, Boca Raton

  • Gama J, Kosina P (2011) Learning decision rules from data streams. In: Proceedings of the 22nd international joint conference on artificial intelligence. AAAI, Menlo Park, pp 1255–1260

  • Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’03. ACM Press, New York, pp 523–528

  • Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: SBIA Brazilian symposium on artificial intelligence, LNCS 3171. Springer, Heidelberg, pp 286–295

  • Gama J, Fernandes R, Rocha R (2006) Decision trees for mining data streams. Intell Data Anal 10:23–45

  • Gama J, Sebastiao R, Rodrigues PP (2009) Issues in evaluation of stream learning algorithms. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’09. ACM Press, New York, pp 329–338

  • Grant E, Leavenworth R (1996) Statistical quality control. McGraw-Hill, New York

  • Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, The University of New South Wales, Sydney

  • Hinkley D (1970) Inference about the change point from cumulative sum-tests. Biometrika 58:509–523

  • Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 97–106

  • Katakis I, Tsoumakas G, Banos E, Bassiliades N, Vlahavas I (2009) An adaptive personalized news dissemination system. J Intell Inf Syst 32:191–212

  • Klinkenberg R (2004) Learning drifting concepts: example selection vs. example weighting. Intell Data Anal 8(3):281–300

  • Kolter JZ, Maloof MA (2003) Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the 3rd international IEEE conference on data mining. IEEE Computer Society, New York, pp 123–130

  • Kosina P, Gama J (2012a) Handling time changing data with adaptive very fast decision rules. In: Proceedings of the 2012 European conference on machine learning and knowledge discovery in databases, ECML PKDD’12, vol I. Springer, Berlin, Heidelberg, pp 827–842

  • Kosina P, Gama J (2012b) Very fast decision rules for multi-class problems. In: Proceedings of the 2012 ACM symposium on applied computing. ACM Press, New York, pp 795–800

  • Lindgren T, Boström H (2004) Resolving rule conflicts with double induction. Intell Data Anal 8(5):457–468

  • Maloof M, Michalski R (2004) Incremental learning with partial instance memory. Artif Intell 154:95–126

  • Moro S, Laureano R, Cortez P (2011) Using data mining for bank direct marketing: an application of the crisp-dm methodology. In: Proceedings of the European simulation and modelling conference, ESM’2011. EUROSIS, Guimaraes, pp 117–121

  • Nemenyi P (1963) Distribution-free multiple comparisons. PhD thesis, Princeton University

  • Oza NC, Russell S (2001) Online bagging and boosting. In: Artificial intelligence and statistics 2001. Morgan Kaufmann, San Mateo, pp 105–112

  • Quinlan JR (1991) Determinate literals in inductive logic programming. In: Proceedings of the 12th international joint conference on artificial intelligence, IJCAI’91, vol 2. Morgan Kaufmann Publishers Inc, San Francisco, pp 746–750

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Mateo

  • Rivest R (1987) Learning decision lists. Mach Learn 2:229–246

  • Schlimmer JC, Granger RH (1986) Incremental learning from noisy data. Mach Learn 1:317–354

  • Shaker A, Hüllermeier E (2012) IBLStreams: a system for instance-based classification and regression on data streams. Evol Syst 3:235–249

  • Street WN, Kim Y (2001) A streaming ensemble algorithm SEA for large-scale classification. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’01. ACM Press, New York, pp 377–382

  • Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03. ACM Press, New York, pp 226–235

  • Weiss SM, Indurkhya N (1998) Predictive data mining: a practical guide. Morgan Kaufmann Publishers, San Francisco

  • Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23:69–101

Acknowledgments

The authors would like to express their gratitude to the reviewers of previous versions of the paper. This work is partially funded by FCT - Fundação para a Ciência e a Tecnologia / MEC - Ministério da Educação e Ciência through National Funds (PIDDAC) and by the ERDF - European Regional Development Fund through the ON2 North Portugal Regional Operational Programme, within the project Knowledge Discovery from Ubiquitous Data Streams FCT-KDUS (PTDC/EIA/098355/2008), NORTE-07-0124-FEDER-000059. The authors also acknowledge the support of the European Commission through the project MAESTRA (Grant Number ICT-2013-612944). Petr Kosina also acknowledges the support of the Faculty of Informatics, MU, Brno.

Author information

Correspondence to João Gama.

Additional information

Responsible editor: Johannes Fürnkranz.

Appendices

Appendix 1: Datasets

In this section we describe the datasets used in the experiments. We have used large-scale artificial and real-world datasets. The real-world datasets were previously used in other works testing on-line learning algorithms: they are large and likely to contain drifts, although the presence and nature of such drifts are not known.

1.1 Artificial datasets

The artificial datasets are obtained using the generators proposed by Bifet et al. (2010); each generator was used to produce five datasets with different random seeds. In the hyperplane dataset the class is given by a rotating hyperplane (Hulten et al. 2001). A hyperplane in d-dimensional space is the set of points x that satisfy \(\sum \nolimits _{i=1}^{d} w_i x_i = w_0\), where \(x_i\) is the \(i\)th coordinate of x. Points with \(\sum \nolimits _{i=1}^{d} w_i x_i \ge w_0\) represent the positive concept and points with \(\sum \nolimits _{i=1}^{d} w_i x_i < w_0\) the negative one. This set of 100,000 examples has two classes and ten attributes, five of which change at speed 0.01, with 5 % noise (the probability for each instance to have its class inverted).
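
For concreteness, the following Python sketch shows how such a stream can be generated. It is illustrative only: the uniform attribute domain, the threshold \(w_0\) set to half the total weight, and the fixed per-weight drift directions are assumptions loosely following MOA's conventions, not details specified above.

```python
import numpy as np

def hyperplane_stream(n=100_000, d=10, n_drift=5, speed=0.01, noise=0.05, seed=1):
    """Sketch of a rotating-hyperplane stream (Hulten et al. 2001).

    Attributes are assumed uniform in [0, 1]; the label is positive when
    sum_i w_i * x_i >= w_0, with w_0 taken as half the total weight (an
    assumed convention). The first `n_drift` weights move by `speed` per
    example, slowly rotating the boundary, and each label is inverted
    with probability `noise`.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0, 1, d)
    direction = rng.choice([-1.0, 1.0], n_drift)  # fixed drift directions (assumed)
    for _ in range(n):
        x = rng.uniform(0, 1, d)
        w0 = w.sum() / 2.0                        # threshold recomputed as w drifts
        y = int(np.dot(w, x) >= w0)
        if rng.random() < noise:                  # class noise: invert the label
            y = 1 - y
        w[:n_drift] += direction * speed          # gradual concept drift
        yield x, y
```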

Another artificial dataset is SEA concepts (Street and Kim 2001), commonly used in stream mining tasks that require time-changing data. It is a two-class problem, defined by three attributes (two of them relevant) and 10 % noise (of the same kind as above). The domain of the attributes is \(x_i \in [0,10]\), where \(i=1,2,3\). The target concept is \(x_1 + x_2 \le \beta \), where \(\beta \in \{7,8,9,9.5\}\). There are four concepts; each spans 15,000 examples, for a total of 60,000 in the whole dataset.
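
A minimal sketch of this generator follows; the concept order (8, 9, 7, 9.5) is the one used in the original SEA paper, and the rest is a direct transcription of the description above.

```python
import numpy as np

def sea_stream(betas=(8.0, 9.0, 7.0, 9.5), block=15_000, noise=0.10, seed=1):
    """Sketch of the SEA concepts stream (Street and Kim 2001).

    Three attributes uniform in [0, 10]; only x1 and x2 are relevant.
    The label is positive when x1 + x2 <= beta, and beta changes every
    `block` examples, giving four abrupt concepts. Each label is
    inverted with probability `noise`.
    """
    rng = np.random.default_rng(seed)
    for beta in betas:
        for _ in range(block):
            x = rng.uniform(0, 10, 3)      # x3 is irrelevant to the concept
            y = int(x[0] + x[1] <= beta)
            if rng.random() < noise:       # class noise
                y = 1 - y
            yield x, y
```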

LED is formed by examples (Breiman et al. 1984) with \(\{0, 1\}\) values for each attribute, signaling whether a given LED segment is off or on. Only seven of the 24 attributes are relevant. The class label reflects the digit (0–9) displayed by the diodes. 10 % noise is added to this dataset (the probability, for each attribute, that its value is inverted). The generated set has 200,000 instances. Drift in this dataset is caused by swapping relevant attributes with irrelevant ones.
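
A sketch of the stationary LED generator, assuming one standard 7-segment encoding (the exact attribute layout of the original dataset may differ); drift would be simulated by permuting which attribute positions hold the seven relevant segments.

```python
import numpy as np

# One standard 7-segment encoding of digits 0-9 (segments a-g); assumed layout.
SEGMENTS = np.array([
    [1,1,1,1,1,1,0], [0,1,1,0,0,0,0], [1,1,0,1,1,0,1], [1,1,1,1,0,0,1],
    [0,1,1,0,0,1,1], [1,0,1,1,0,1,1], [1,0,1,1,1,1,1], [1,1,1,0,0,0,0],
    [1,1,1,1,1,1,1], [1,1,1,1,0,1,1],
])

def led_stream(n=200_000, noise=0.10, n_irrelevant=17, seed=1):
    """Sketch of the LED stream (Breiman et al. 1984): 7 relevant segment
    attributes plus 17 irrelevant random bits, 24 attributes in total;
    each segment value is inverted with probability `noise`."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        digit = rng.integers(10)                    # class label 0-9
        segs = SEGMENTS[digit].copy()
        flip = rng.random(7) < noise                # attribute noise
        segs[flip] ^= 1
        irrelevant = rng.integers(0, 2, n_irrelevant)
        yield np.concatenate([segs, irrelevant]), digit
```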

The goal in the Waveform dataset is to recognize three different classes of waveform, each generated from a combination of two of three base waves. The optimal Bayes classification rate is known to be 86 %. The dataset has 21 numeric attributes, all of which include noise, and consists of 100,000 examples. Drift switches the positions (attributes) of the generated attribute values.
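
For reference, a sketch of the underlying stationary generator following the construction in Breiman et al. (1984): three triangular base waves over 21 positions, with each class a random convex combination of a different pair of them. The base-wave centers (7, 15, 11) are taken from that construction, not from the text above.

```python
import numpy as np

def waveform_stream(n=100_000, seed=1):
    """Sketch of the waveform generator (Breiman et al. 1984).

    Each class mixes a different pair of the three triangular base
    waves with a random weight u, then adds standard Gaussian noise
    to every one of the 21 attributes.
    """
    rng = np.random.default_rng(seed)
    i = np.arange(1, 22)
    h = [np.maximum(6 - np.abs(i - c), 0) for c in (7, 15, 11)]  # base waves
    pairs = [(0, 1), (0, 2), (1, 2)]                             # class -> wave pair
    for _ in range(n):
        y = rng.integers(3)
        a, b = pairs[y]
        u = rng.random()                                         # mixing weight
        x = u * h[a] + (1 - u) * h[b] + rng.standard_normal(21)  # additive noise
        yield x, y
```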

The radial basis function (RBF) generator creates a fixed number of random centroids. Each center has a random position, a single standard deviation, a class label and a weight. A new example is generated from a randomly selected center; the weights are taken into account, so centers with higher weight are more likely to be chosen. A random direction is then chosen to offset the attribute values from the central point. The displacement length is drawn from a Gaussian distribution with standard deviation determined by the chosen centroid, which also determines the class label of the example. The generated RBF datasets have ten numerical attributes, 50 centers and two classes. The number of examples is 100,000, the speed of change of the centroids is 0.0001, and all 50 centroids drift.
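
A sketch matching this description; the attribute domain, the range of the standard deviations, and the drift mechanics (a fixed random unit direction per centroid) are assumptions, as the text does not pin them down.

```python
import numpy as np

def rbf_stream(n=100_000, d=10, n_centers=50, n_classes=2,
               drift_speed=0.0001, seed=1):
    """Sketch of the RBF stream with drifting centroids.

    Each centroid has a random position, weight, standard deviation and
    class label. A centroid is sampled proportionally to its weight; the
    example is offset from it in a random direction by a Gaussian
    displacement, and every centroid moves by `drift_speed` per example.
    """
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0, 1, (n_centers, d))           # assumed domain [0, 1]^d
    stds = rng.uniform(0, 0.1, n_centers)                 # assumed SD range
    labels = rng.integers(n_classes, size=n_centers)
    weights = rng.uniform(0, 1, n_centers)
    probs = weights / weights.sum()
    move = rng.standard_normal((n_centers, d))
    move /= np.linalg.norm(move, axis=1, keepdims=True)   # unit drift directions
    for _ in range(n):
        c = rng.choice(n_centers, p=probs)                # weight-biased choice
        direction = rng.standard_normal(d)
        direction /= np.linalg.norm(direction)
        radius = rng.normal(0, stds[c])                   # Gaussian displacement
        x = centers[c] + radius * direction
        centers += move * drift_speed                     # all centroids drift
        yield x, labels[c]
```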

1.2 Real-world datasets

The real-world datasets are large and are used with the examples ordered as they were collected, since they are likely to contain drift. The intrusion detection dataset from KDDCUP 99, obtained from the UCI repository (Frank and Asuncion 2010), describes connections labeled either as normal or as one of four categories of attack. The dataset consists of 4,898,431 instances.

The next dataset is forestCovtype, also from the UCI repository (Frank and Asuncion 2010), which has 54 cartographic attributes, both continuous and categorical. The goal is to predict the forest cover type for a given area. The dataset contains 581,012 instances.

The elec dataset (Harries 1999) contains data collected from the electricity market of New South Wales, Australia. It has 45,312 instances.

The task in the Airlines dataset, based on data from Data Expo (2009), is to predict whether a flight will be delayed, given seven attributes describing the scheduled departure. It consists of 539,383 instances.

The connect-4 dataset from the UCI repository (Frank and Asuncion 2010) consists of 42 categorical attributes and contains 67,557 examples.

The pokerhand dataset (Frank and Asuncion 2010) consists of 829,201 instances with ten predictive attributes. Each example represents a hand of five playing cards drawn from a standard deck of 52; each card is described by two attributes (suit and rank), and the class describes the poker hand. This dataset was modified so that the cards are sorted by rank and suit, and duplicates were removed, as sketched below.
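
The preprocessing amounts to canonicalizing each hand and dropping repeats. A sketch follows; the exact sort key used by the authors is not specified, so rank-then-suit ordering is assumed here.

```python
def canonicalize(hand):
    """Sort a 5-card hand, given as 10 values (suit, rank) x 5, by rank
    then suit, so that permuted but equivalent hands share one key."""
    cards = sorted(zip(hand[::2], hand[1::2]), key=lambda c: (c[1], c[0]))
    return tuple(v for card in cards for v in card)

def deduplicate(examples):
    """Drop examples whose canonical hand has already been seen."""
    seen, out = set(), []
    for features, label in examples:
        key = canonicalize(features)
        if key not in seen:
            seen.add(key)
            out.append((list(key), label))
    return out

# Two orderings of the same pair-of-kings hand collapse to one example:
hands = [([1, 13, 2, 13, 3, 4, 4, 7, 1, 2], 1),
         ([2, 13, 1, 13, 3, 4, 1, 2, 4, 7], 1)]
assert len(deduplicate(hands)) == 1
```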

The bank dataset (Moro et al. 2011) is related to direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls; often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed. The classification task is to predict whether the client will subscribe to a term deposit. The full dataset has 45,211 examples with 16 attributes.

Katakis et al. (2009) presented the spam dataset, a real-world text data stream chronologically ordered to represent the evolution of spam messages over time. There are two classes, legitimate and spam messages, and 9,324 examples with 500 attributes.

Appendix 2: Results from tests on shuffled real world datasets

See Appendix Tables 16, 17 and 18.

Table 16 Prequential error rates of the classifiers on shuffled real world data
Table 17 The p values of paired t tests on shuffled real world datasets
Table 18 Number of rules of rule classifiers and leaves of VFDTc on stationary data and shuffled real world datasets

About this article

Cite this article

Kosina, P., Gama, J. Very fast decision rules for classification in data streams. Data Min Knowl Disc 29, 168–202 (2015). https://doi.org/10.1007/s10618-013-0340-z
