Automatization of the Stream Mining Process

  • Lovro Šubelj
  • Zoran Bosnić
  • Matjaž Kukar
  • Marko Bajec
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8484)

Abstract

This paper addresses data stream mining and its automatization within information systems. Our aim is to show that the expertise usually provided by data and data mining experts, which is crucial for problems of this kind, can be successfully captured and computerized. To this end, we observed data mining experts at work and, in discussion with them, encoded their knowledge in the form of an expert system. An evaluation over four different datasets confirms that automatization of the stream mining process is possible and can produce results comparable to those achieved by data mining experts.

Keywords

data mining, stream mining, expert system

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Lovro Šubelj (1)
  • Zoran Bosnić (1)
  • Matjaž Kukar (1)
  • Marko Bajec (1)
  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
