A Geometric Moving Average Martingale method for detecting changes in data streams

Conference paper

Abstract

In this paper, we propose a Geometric Moving Average Martingale (GMAM) method for detecting changes in data streams. Two components underpin the GMAM method. The first is the exponential weighting of observations, which reduces false detections. The second is the use of the GMAM value for hypothesis testing: when a new data point is observed, the test decides, based on the GMAM value, whether a change has occurred at that point. Once a change is detected, all variables of the GMAM algorithm are re-initialized so that subsequent changes can be found. Experiments show that the GMAM method is effective in detecting concept changes in two synthetic time-varying data streams and in a real-world dataset, the ‘Respiration dataset’.
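
The detect-then-reset loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the GMAM statistic itself, so the code below is not the authors' method: it substitutes a plain geometric (exponentially weighted) moving average of standardized observations for the martingale value and tests it against a control limit. The function name `detect_changes` and the parameters `weight`, `threshold`, and `warmup` are hypothetical choices made for this illustration.

```python
from typing import Iterable, List


def detect_changes(stream: Iterable[float], weight: float = 0.2,
                   threshold: float = 3.0, warmup: int = 20) -> List[int]:
    """Return the stream indices at which a change is flagged.

    Assumption: a geometric moving average of z-scores stands in for the
    GMAM value, which the abstract does not define. Requires warmup >= 2.
    """
    changes: List[int] = []
    # Running baseline (Welford's algorithm) and the moving-average statistic.
    n, mean, m2, gma = 0, 0.0, 0.0, 0.0
    # An EWMA of unit-variance inputs has asymptotic std sqrt(w / (2 - w)).
    limit = threshold * (weight / (2.0 - weight)) ** 0.5

    for i, x in enumerate(stream):
        if n >= warmup:
            std = (m2 / (n - 1)) ** 0.5
            z = (x - mean) / std if std > 0 else 0.0
            # Exponential weighting: older observations decay geometrically.
            gma = (1.0 - weight) * gma + weight * z
            # Hypothesis test on the current value of the statistic.
            if abs(gma) > limit:
                changes.append(i)
                # Re-initialize all variables so later changes can be found.
                n, mean, m2, gma = 0, 0.0, 0.0, 0.0
        # Add the current point to the (possibly fresh) baseline.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return changes


# A stream that alternates 0/1 for 100 points, then jumps to 10/11.
stream = [float(i % 2) for i in range(100)] + [10.0 + i % 2 for i in range(100, 200)]
print(detect_changes(stream))  # [100]
```

In this sketch the smoothing step keeps a single moderate outlier from crossing the limit on its own, which is the intuition behind using exponential weighting to reduce false detections. The reset after each detection lets the same loop find several changes in one pass.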

Keywords

Data Stream, Control Chart, Concept Drift, Sequential Probability Ratio Test, Martingale Theory
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  1. University of Ulster, Northern Ireland, UK
