eRules: A Modular Adaptive Classification Rule Learning Algorithm for Data Streams

  • Frederic Stahl
  • Mohamed Medhat Gaber
  • Manuel Martin Salvador
Conference paper

Abstract

Advances in hardware and software in the past decade allow fast data streams to be captured, recorded and processed at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time, as soon as it is captured. This is the case, for example, when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
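
The two behaviours named in the abstract, abstaining when no rule applies and removing rules that no longer fit the stream, can be illustrated with a minimal Python sketch. This is not the eRules algorithm itself: the class names `Rule` and `EvolvingRuleSet`, the `min_accuracy` and `min_fires` thresholds, and the first-match prediction strategy are all assumptions made for illustration, and the induction of new rules is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    """IF every condition matches THEN predict `label` (hypothetical structure)."""
    conditions: Dict[str, str]
    label: str
    fires: int = 0  # how often the rule covered a labelled instance
    hits: int = 0   # how often it was correct when it fired

    def covers(self, instance: Dict[str, str]) -> bool:
        return all(instance.get(k) == v for k, v in self.conditions.items())

    def accuracy(self) -> float:
        # A rule that has never fired is given the benefit of the doubt.
        return self.hits / self.fires if self.fires else 1.0


@dataclass
class EvolvingRuleSet:
    min_accuracy: float = 0.5  # assumed removal threshold
    min_fires: int = 3         # assumed evidence required before removal
    rules: List[Rule] = field(default_factory=list)

    def predict(self, instance: Dict[str, str]) -> Optional[str]:
        """Return the label of the first covering rule, or None to abstain."""
        for rule in self.rules:
            if rule.covers(instance):
                return rule.label
        return None

    def update(self, instance: Dict[str, str], true_label: str) -> None:
        """Score the first covering rule, then drop rules that perform badly."""
        for rule in self.rules:
            if rule.covers(instance):
                rule.fires += 1
                rule.hits += int(rule.label == true_label)
                break
        self.rules = [
            r for r in self.rules
            if r.fires < self.min_fires or r.accuracy() >= self.min_accuracy
        ]


rule_set = EvolvingRuleSet()
rule_set.rules.append(Rule({"colour": "red"}, "stop"))

print(rule_set.predict({"colour": "red"}))   # covered by the rule
print(rule_set.predict({"colour": "blue"}))  # no rule covers it: abstain

# Simulate a concept drift: "red" now means "go", so the old rule is removed.
for _ in range(3):
    rule_set.update({"colour": "red"}, "go")
print(len(rule_set.rules))
```

Returning `None` instead of a default class keeps an abstention distinguishable from a prediction, which is the contrast with decision trees drawn in the abstract.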

Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  • Frederic Stahl (1)
  • Mohamed Medhat Gaber (2)
  • Manuel Martin Salvador (1)
  1. School of Design, Engineering and Computing, Bournemouth University, Poole, UK
  2. School of Computing, Buckingham Building, University of Portsmouth, Lion Terrace, UK
