Drift Detection Using Stream Volatility

  • David Tse Jung Huang
  • Yun Sing Koh
  • Gillian Dobbie
  • Albert Bifet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)

Abstract

Current methods for detecting concept drift in the underlying distribution of a data stream measure the distribution difference using statistics based on the mean and variance. These methods cannot proactively approximate the probability that a concept drift will occur, nor can they predict future drift points. We extend the current drift detection design by proposing the use of historical drift trends to estimate the probability of expecting a drift at different points across the stream, which we term the expected drift probability. We offer empirical evidence that applying our expected drift probability to the state-of-the-art drift detector ADWIN improves its detection performance by significantly reducing the false positive rate. To the best of our knowledge, this is the first work that investigates this idea. We also show that our overall concept can be easily incorporated into incremental classifiers such as VFDT, and demonstrate that doing so further improves the performance of the classifier.
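The abstract describes the expected drift probability only at a high level. As a rough illustration, and explicitly not the authors' formulation, the sketch below assumes that the expected drift probability at a stream position is the empirical CDF of previously observed intervals between drifts, and that a detector can use it to tighten its confidence parameter δ where a drift is not yet expected. The names `ExpectedDriftProbability`, `mean_shift`, `gated_detect` and the `p_floor` parameter are hypothetical, and `mean_shift` is a simplified single-split, Hoeffding-style mean test in the spirit of ADWIN's cut test, not ADWIN itself.

```python
import math
from bisect import bisect_right, insort
from statistics import fmean
from typing import List, Sequence


class ExpectedDriftProbability:
    """Hypothetical estimator of the expected drift probability.

    Assumption: the probability that a drift is due after `elapsed`
    instances is the fraction of past inter-drift intervals that were
    no longer than `elapsed` (an empirical CDF of stream volatility).
    """

    def __init__(self) -> None:
        self.intervals: List[int] = []  # sorted lengths between past drifts
        self.last_drift = 0             # stream index of the last drift

    def record_drift(self, t: int) -> None:
        insort(self.intervals, t - self.last_drift)
        self.last_drift = t

    def probability(self, t: int) -> float:
        if not self.intervals:
            return 1.0  # no history yet: do not damp detection at all
        elapsed = t - self.last_drift
        return bisect_right(self.intervals, elapsed) / len(self.intervals)


def mean_shift(window: Sequence[float], delta: float) -> bool:
    """Simplified mean test for values in [0, 1] (one split, not ADWIN)."""
    half = len(window) // 2
    if half == 0:
        return False  # too few values to form two sub-windows
    left, right = window[:half], window[half:]
    # m = 1 / (1/n0 + 1/n1), half the harmonic mean of the sub-window sizes
    m = 1.0 / (1.0 / len(left) + 1.0 / len(right))
    eps = math.sqrt(math.log(4.0 / delta) / (2.0 * m))
    return abs(fmean(left) - fmean(right)) > eps


def gated_detect(window: Sequence[float], t: int,
                 edp: ExpectedDriftProbability,
                 delta: float = 0.05, p_floor: float = 0.05) -> bool:
    """Hypothetical combination: shrink delta when a drift is unexpected."""
    p = max(edp.probability(t), p_floor)  # floor keeps delta positive
    return mean_shift(window, delta * p)


if __name__ == "__main__":
    edp = ExpectedDriftProbability()
    for t in (1000, 2000, 3000):  # drifts observed every 1000 instances
        edp.record_drift(t)
    window = [0.0] * 50 + [0.35] * 50
    print(gated_detect(window, 3100, edp))  # early, unexpected: damped
    print(gated_detect(window, 4000, edp))  # drift is expected here
```

Scaling δ by the estimated probability is only one possible design choice: when a change appears much earlier than the stream's history suggests, the test demands a larger difference in means before signalling, which is one plausible way to trade a little sensitivity for fewer false positives.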

Keywords

Data stream · Drift detection · Stream volatility

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • David Tse Jung Huang (1)
  • Yun Sing Koh (1)
  • Gillian Dobbie (1)
  • Albert Bifet (2)
  1. Department of Computer Science, University of Auckland, Auckland, New Zealand
  2. Huawei Noah's Ark Lab, Hong Kong, China
