Identifying Features with Concept Drift in Multidimensional Data Using Statistical Tests

  • Piotr Sobolewski
  • Michał Woźniak
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 436)

Abstract

Concept drift is a common problem in data streams: it can render previously trained classifiers invalid. In multidimensional data, this problem becomes difficult to tackle. This paper examines the possibility of identifying the specific features in which concept drift occurs, which makes it possible to limit the scope of the necessary update to the classification system. As a tool, we use the popular Kolmogorov-Smirnov test statistic.
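The idea of locating drift feature by feature can be illustrated with a per-feature two-sample Kolmogorov-Smirnov test. The following is a minimal sketch, not the authors' procedure: it assumes two windows of unlabeled feature vectors (a hypothetical "reference" window and "current" window), compares each feature's marginal distribution with the statistic D = sup_x |F_ref(x) - F_cur(x)|, and flags the feature when D exceeds the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2). The function names, window sizes, significance level and synthetic data are illustrative assumptions. Testing each of d features at level alpha means roughly alpha * d false flags are expected when nothing drifts; dividing alpha by d (a Bonferroni correction) is one simple way to control this.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # points, so the supremum is attained at one of those points.
    for x in a + b:
        fa = bisect_right(a, x) / n
        fb = bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic KS critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-math.log(alpha / 2) / 2)
    return c * math.sqrt((n + m) / (n * m))


def drifting_features(reference, current, alpha=0.05):
    """Hypothetical helper: indices of features whose marginal distribution
    differs between the reference and current windows at level alpha."""
    n, m = len(reference), len(current)
    threshold = ks_critical_value(n, m, alpha)
    flagged = []
    for j in range(len(reference[0])):
        ref_col = [row[j] for row in reference]
        cur_col = [row[j] for row in current]
        if ks_statistic(ref_col, cur_col) > threshold:
            flagged.append(j)
    return flagged


# Illustrative synthetic stream: only feature 1 shifts its mean from 0 to 2.
random.seed(0)
ref = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]
cur = [[random.gauss(0, 1), random.gauss(2, 1), random.gauss(0, 1)]
       for _ in range(500)]
flagged = drifting_features(ref, cur, alpha=0.05)
print(flagged)
```

Only the flagged features would then need attention when updating the classifier, which is the kind of reduced update scope the abstract describes. Because the test is applied to marginal distributions one at a time, this sketch cannot detect a change that affects only the dependence between features.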

Keywords

Concept drift, detection, statistical test

Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  • Piotr Sobolewski (1)
  • Michał Woźniak (1)
  1. Wrocław University of Technology, Wrocław, Poland

Personalised recommendations