Data Streams Classification: A Selective Ensemble with Adaptive Behavior

  • Valerio Grossi
  • Franco Turini
Part of the Communications in Computer and Information Science book series (CCIS, volume 271)

Abstract

Data stream classification is an important and challenging task for a wide range of applications. The spread of new technologies related to communication services, such as smartphones and sensor networks, introduces new challenges in the analysis of streaming data. Such analysis calls for approaches that need little time and space to process a single item and that keep an accurate representation of only the relevant data characteristics in order to track concept drift. Based on these premises, this paper introduces a set of requirements for data stream classification and proposes a new adaptive ensemble method. The outlined system employs two distinct structures for managing data aggregation and mining features. The latter are represented by a selective ensemble managed with an adaptive behavior. Our approach dynamically updates the threshold value that enables the models directly involved in the classification step. The system is conceived to satisfy the proposed requirements even in the presence of concept drift events. Finally, our method is compared with several existing systems on both synthetic and real data.
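
The idea of a selective ensemble with an adaptive activation threshold can be illustrated with a minimal sketch. The abstract does not state the actual update rule, so everything below is an assumption made for illustration: the threshold is set to a fixed fraction (`ratio`) of the best recent accuracy, accuracy is measured over a sliding window, and the class names `ConstantModel` and `SelectiveEnsemble` are hypothetical. It is not the authors' implementation.

```python
from collections import Counter, deque


class ConstantModel:
    """Toy stand-in for a trained base classifier: always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, x):
        return self.label


class SelectiveEnsemble:
    """Only models whose recent accuracy reaches the threshold may vote."""

    def __init__(self, models, window=20, ratio=0.9):
        self.models = models
        self.ratio = ratio  # assumed rule: threshold = ratio * best recent accuracy
        # One sliding window of hit/miss flags per model.
        self.hits = [deque(maxlen=window) for _ in models]
        self.threshold = 0.0  # every model is enabled at the start

    def accuracy(self, i):
        h = self.hits[i]
        return sum(h) / len(h) if h else 0.0

    def active(self):
        """Indices of the models currently enabled for classification."""
        return [i for i in range(len(self.models))
                if self.accuracy(i) >= self.threshold]

    def predict(self, x):
        """Unweighted majority vote among the enabled models."""
        votes = Counter(self.models[i].predict(x) for i in self.active())
        return votes.most_common(1)[0][0] if votes else None

    def update(self, x, y):
        """Record each model's hit or miss on (x, y), then adapt the threshold."""
        for i, model in enumerate(self.models):
            self.hits[i].append(model.predict(x) == y)
        best = max(self.accuracy(i) for i in range(len(self.models)))
        self.threshold = self.ratio * best
```

In this sketch, a model that was accurate before a concept drift falls below the threshold once its window fills with misses, while a model that fits the new concept becomes the only one enabled. Training and replacing the base models, which a full streaming system would also need, is left out.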

Keywords

Data Stream, Adaptive Behavior, Activation Threshold, Concept Drift, Hoeffding Tree
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Valerio Grossi (1)
  • Franco Turini (2)
  1. Dept. of Pure and Applied Mathematics, University of Padova, Padova, Italy
  2. Dept. of Computer Science, University of Pisa, Pisa, Italy
