Computationally Efficient Rule-Based Classification for Continuous Streaming Data

Le, Thien; Stahl, Frederic; Gomes, João Bártolo; Gaber, Mohamed Medhat; Fatta, Giuseppe Di

doi:10.1007/978-3-319-12069-0_2

Conference paper
First Online: 30 October 2014

Abstract

Advances in hardware and software technologies make it possible to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as they are generated in real time. Data stream classification is one of the most important DSM techniques, as it allows previously unseen data instances to be classified. Unlike traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules, which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as a heuristic for building rule terms on continuous attributes. Using eRules as an example, we show that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a level of accuracy similar to that of the original eRules classifier. We term this new version of eRules with our approach G-eRules.
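The abstract does not spell out how the Gaussian heuristic turns a continuous attribute into a rule term, so the following is only a minimal sketch under stated assumptions, not the G-eRules procedure itself. It assumes that, for a target class, each continuous attribute's values for that class are summarised by their mean and standard deviation, that the candidate term is the closed interval `mean ± z·std` (the width `z` is an assumed parameter), and that the attribute whose term has the highest precision for the target class is chosen, in the style of PRISM-like rule induction. The helper names `gaussian_term`, `term_precision` and `best_continuous_term` are hypothetical. The point of the sketch is that one summary per attribute replaces, for example, evaluating a candidate term for every distinct attribute value, which is where the computational cost of continuous attributes tends to arise.

```python
from statistics import mean, pstdev


def gaussian_term(values, labels, target, z=1.0):
    """Hypothetical helper: build a term 'lower <= x <= upper' for one
    continuous attribute from the Gaussian summary (mean, std) of the
    target class's values. z (width in standard deviations) is an assumption."""
    target_values = [v for v, y in zip(values, labels) if y == target]
    if not target_values:
        return None  # no evidence for this class in the current window
    mu = mean(target_values)
    sigma = pstdev(target_values)  # population std; 0.0 for a single value
    return (mu - z * sigma, mu + z * sigma)


def term_precision(values, labels, target, term):
    """Fraction of the instances covered by the term that belong to target."""
    lower, upper = term
    covered = [y for v, y in zip(values, labels) if lower <= v <= upper]
    if not covered:
        return 0.0
    return sum(1 for y in covered if y == target) / len(covered)


def best_continuous_term(rows, labels, target, z=1.0):
    """Assumed PRISM-style step: build one Gaussian term per attribute and
    keep the (attribute index, term, precision) with the highest precision."""
    best = None
    for attr in range(len(rows[0])):
        column = [row[attr] for row in rows]
        term = gaussian_term(column, labels, target, z)
        if term is None:
            continue
        precision = term_precision(column, labels, target, term)
        if best is None or precision > best[2]:
            best = (attr, term, precision)
    return best


if __name__ == "__main__":
    # Toy window of instances with two continuous attributes (made-up data).
    rows = [(1.0, 5.0), (1.2, 9.0), (0.8, 1.0), (3.0, 5.1), (3.2, 8.9), (2.8, 1.2)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_continuous_term(rows, labels, "a"))
```

Selecting on precision alone, as here, can favour very narrow terms that cover few instances; a fuller rule learner would also weigh coverage and would recompute the class summaries incrementally as the stream window changes.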

Author information

Authors and Affiliations

School of Systems Engineering, University of Reading, Whiteknights, PO Box 225, Reading, RG6 6AY, UK
Thien Le, Frederic Stahl & Giuseppe Di Fatta
Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Connexis, Singapore 138632, Singapore
João Bártolo Gomes
School of Computing Science and Digital Media, Robert Gordon University, Riverside East, Garthdee Road, Aberdeen, AB10 7GJ, UK
Mohamed Medhat Gaber

Corresponding author

Correspondence to Thien Le.

Editor information

Editors and Affiliations

University of Portsmouth, Portsmouth, United Kingdom
Max Bramer
School of Computing, Eng & Mathematics, University of Brighton, Brighton, West Sussex, United Kingdom
Miltos Petridis

About this paper

Cite this paper

Le, T., Stahl, F., Gomes, J.B., Gaber, M.M., Fatta, G.D. (2014). Computationally Efficient Rule-Based Classification for Continuous Streaming Data. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXI. SGAI 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-12069-0_2

DOI: https://doi.org/10.1007/978-3-319-12069-0_2
Published: 30 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12068-3
Online ISBN: 978-3-319-12069-0
eBook Packages: Computer Science, Computer Science (R0)