Abstract
Advances in hardware and software technologies make it possible to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as they are generated in real time. Data stream classification is one of the most important DSM techniques, allowing previously unseen data instances to be classified. Unlike traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules, which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as a heuristic for building rule terms on continuous attributes. We show, using eRules as an example, that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a level of accuracy similar to that of the original eRules classifier. We term this new version of eRules with our approach G-eRules.
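The abstract does not spell out how the Gaussian heuristic produces a rule term, so the following is only a minimal sketch of one plausible reading, not the G-eRules procedure itself. It assumes that the values of a continuous attribute for one target class are summarised by a running mean and standard deviation, and that a rule term is an interval centred on that mean. The names `RunningGaussian`, `interval_term`, `covers` and the `width` parameter are hypothetical and introduced purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningGaussian:
    """Incremental mean/variance (Welford's method) for one attribute and one class.

    Hypothetical helper: the abstract only says a Gaussian distribution is used.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Single-pass update, suitable for a stream: no values are stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values were seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def interval_term(stats: RunningGaussian, width: float = 1.0) -> tuple[float, float]:
    """Assumed rule-term shape: lo <= x <= hi, centred on the class mean."""
    return (stats.mean - width * stats.std, stats.mean + width * stats.std)


def covers(term: tuple[float, float], x: float) -> bool:
    """True if the attribute value x satisfies the interval rule term."""
    lo, hi = term
    return lo <= x <= hi


# Usage: summarise the attribute values observed for one target class.
stats = RunningGaussian()
for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
    stats.update(value)

term = interval_term(stats, width=1.0)
```

The appeal of a summary like this in a streaming setting is that a candidate term can be derived from two running statistics per attribute and class, rather than by testing many candidate split points on a continuous attribute. Whether G-eRules chooses its interval bounds this way is an assumption here.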
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Le, T., Stahl, F., Gomes, J.B., Gaber, M.M., Fatta, G.D. (2014). Computationally Efficient Rule-Based Classification for Continuous Streaming Data. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXI. SGAI 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-12069-0_2
DOI: https://doi.org/10.1007/978-3-319-12069-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12068-3
Online ISBN: 978-3-319-12069-0
eBook Packages: Computer Science, Computer Science (R0)