Self-adaptive Change Detection in Streaming Data with Non-stationary Distribution

  • Xiangliang Zhang
  • Wei Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6440)


Non-stationary distribution, in which the data distribution evolves over time, is a common issue in many application fields, e.g., intrusion detection and grid computing. Detecting changes in massive streaming data with a non-stationary distribution helps to raise alarms on anomalies, to filter out noise, and to report new patterns. In this paper, we present a novel approach for detecting changes in streaming data, with the purpose of improving the quality of data stream models. By observing the outliers, the approach uses a weighted standard deviation to monitor how the distribution of the data stream evolves. A cumulative statistical test, Page-Hinkley, is employed to collect evidence of changes in the distribution. The parameter used for reporting changes is adjusted self-adaptively according to the distribution of the data stream, rather than set to a fixed empirical value. This self-adaptability makes data stream modeling more effective by catching distribution changes in a timely manner. We validated the approach within an online clustering framework on the benchmark KDD Cup 1999 intrusion detection data set and on a real-world grid data set. The validation results show that it achieves higher accuracy and a lower percentage of outliers than the other change detection approaches.
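To illustrate the cumulative test named in the abstract, the following is a minimal sketch of the standard Page-Hinkley test for an upward shift in the mean of a scalar sequence. It is not the authors' implementation: the class name `PageHinkley`, the helper `first_change`, and the values of `delta` and `threshold` are assumptions chosen for illustration. The threshold here is fixed, whereas the paper adjusts it self-adaptively, and the monitored input would be the weighted standard deviation statistic, neither of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkley:
    """Standard Page-Hinkley test for an increase in the mean (illustrative)."""
    delta: float = 0.005      # tolerated magnitude of change (assumed value)
    threshold: float = 5.0    # fixed alarm threshold (assumed; adaptive in the paper)
    n: int = 0                # number of observations seen so far
    mean: float = 0.0         # running mean of the observations
    cum: float = 0.0          # cumulative deviation m_t
    cum_min: float = 0.0      # minimum of m_t observed so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is reported."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_change(stream: Iterable[float]) -> Optional[int]:
    """Return the 0-based index of the first reported change, or None."""
    detector = PageHinkley()
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

For example, `first_change([0.0] * 100 + [1.0] * 100)` reports a change a few observations after index 100, while a constant stream yields `None`. A smaller `threshold` reacts faster but raises more false alarms, which is the trade-off that motivates adapting the parameter to the data.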


Change detection, Data stream, Self-adaptive parameter setting, Non-stationary distribution





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Xiangliang Zhang (1)
  • Wei Wang (2)
  1. Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Saudi Arabia
  2. Interdisciplinary Centre for Security, Reliability and Trust (SnT Centre), University of Luxembourg, Luxembourg
