Continuous monitoring for changepoints in data streams using adaptive estimation

Abstract

Data streams are characterised by a potentially unending sequence of high-frequency observations which are subject to unknown temporal variation. Many modern streaming applications demand the capability to sequentially detect changes as soon as possible after they occur, while continuing to monitor the stream as it evolves. We refer to this problem as continuous monitoring. Sequential algorithms such as CUSUM, EWMA and their more sophisticated variants usually require a pair of parameters to be selected for practical application. However, the choice of parameter values is often based on the anticipated size of the changes, and a given choice is unlikely to be optimal for the multiple change sizes that are likely to occur in a streaming data context. To address this critical issue, we introduce a changepoint detection framework based on adaptive forgetting factors that requires only a single parameter to be selected, rather than multiple control parameters. Simulation results demonstrate that this framework is useful in a continuous monitoring setting. In particular, it reduces the burden of selecting parameters in advance. Moreover, the methodology is demonstrated on real data arising from foreign exchange markets.
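
To make the parameter-selection burden concrete, the following Python sketch implements textbook two-sided CUSUM and EWMA charts, each of which needs a pair of control parameters. It is only an illustration and is not the adaptive forgetting factor method of the paper. The function names, the assumption that the in-control mean `mu0` and standard deviation `sigma0` are known, and the parameter values in the usage note are all assumptions made for this example.

```python
import math


def cusum_detect(xs, mu0, sigma0, k, h):
    """Two-sided CUSUM on standardised observations.

    k (allowance) and h (decision threshold) are the pair of control
    parameters. Returns the 0-based index of the first alarm, or None.
    """
    s_hi = 0.0
    s_lo = 0.0
    for i, x in enumerate(xs):
        z = (x - mu0) / sigma0
        s_hi = max(0.0, s_hi + z - k)  # accumulates evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)  # accumulates evidence of a downward shift
        if s_hi > h or s_lo > h:
            return i
    return None


def ewma_detect(xs, mu0, sigma0, r, L):
    """EWMA chart with smoothing weight r and control-limit width L.

    Returns the 0-based index of the first alarm, or None.
    """
    z = mu0
    for i, x in enumerate(xs, start=1):
        z = (1.0 - r) * z + r * x
        # Standard deviation of the EWMA statistic after i observations.
        sd = sigma0 * math.sqrt(r / (2.0 - r) * (1.0 - (1.0 - r) ** (2 * i)))
        if abs(z - mu0) > L * sd:
            return i - 1
    return None
```

For instance, on a hypothetical stream `[0.0] * 50 + [2.0] * 50`, calling `cusum_detect(stream, 0.0, 1.0, k=0.5, h=5.0)` and `cusum_detect(stream, 0.0, 1.0, k=0.25, h=8.0)` gives different detection delays. This is the sensitivity to parameter choice that motivates a single-parameter approach.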

Notes

  1. Note that the empty product has value \(\prod_{p=N}^{N-1}\lambda_p = 1\).

  2. Note that since it is an offline method it does not make sense to compute performance measures such as ARL0 and ARL1 for comparison with AFF, CUSUM and EWMA.

References

  1. Adams, N.M., Tasoulis, D.K., Anagnostopoulos, C., Hand, D.J.: Temporally-adaptive linear classification for handling population drift in credit scoring. In: Lechevallier, Y., Saporta, G. (eds.) COMPSTAT2010, Proceedings of the 19th International Conference on Computational Statistics, pp. 167–176. Springer, Berlin (2010)

  2. Aggarwal, C.C. (ed.): Data Streams: Models and Algorithms. Springer, Berlin (2006)

  3. Anagnostopoulos, C.: A statistical framework for streaming data analysis. PhD thesis, Imperial College London (2010)

  4. Anagnostopoulos, C., Tasoulis, D.K., Adams, N.M., Pavlidis, N.G., Hand, D.J.: Online linear and quadratic discriminant analysis with adaptive forgetting for streaming classification. Stat. Anal. Data Mining 5(2), 139–166 (2012)

  5. Apley, D.W., Chin, C.H.: An optimal filter design approach to statistical process control. J. Qual. Technol. 39(2), 93–117 (2007)

  6. Appel, U., Brandt, A.V.: Adaptive sequential segmentation of piecewise stationary time series. Inf. Sci. 29(1), 27–56 (1983)

  7. Åström, K., Borisson, U., Ljung, L., Wittenmark, B.: Theory and applications of self-tuning regulators. Automatica 13(5), 457–476 (1977)

  8. Åström, K.J., Wittenmark, B.: On self tuning regulators. Automatica 9(2), 185–199 (1973)

  9. Basseville, M., Nikiforov, I.V.: Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs (1993)

  10. Bodenham, D.A., Adams, N.M.: Continuous monitoring of a computer network using multivariate adaptive estimation. In: IEEE 13th International Conference on Data Mining Workshops (ICDMW), pp. 311–318 (2013)

  11. Bodenham, D.A., Adams, N.M.: Adaptive change detection for relay-like behaviour. In: IEEE Joint Intelligence and Security Informatics Conference (2014)

  12. Borkar, V.S.: Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, Cambridge (2008)

  13. Capizzi, G., Masarotto, G.: An adaptive exponentially weighted moving average control chart. Technometrics 45(3), 199–207 (2003)

  14. Capizzi, G., Masarotto, G.: Self-starting CUSCORE control charts for individual multivariate observations. J. Qual. Technol. 42(2), 136–152 (2010)

  15. Capizzi, G., Masarotto, G.: Adaptive generalized likelihood ratio control charts for detecting unknown patterned mean shifts. J. Qual. Technol. 44(4), 281–303 (2012)

  16. Choi, S.W., Martin, E.B., Morris, A.J., Lee, I.B.: Adaptive multivariate statistical process control for monitoring time-varying processes. Ind. Eng. Chem. Res. 45(9), 3108–3118 (2006)

  17. Fortescue, T., Kershenbaum, L., Ydstie, B.: Implementation of self-tuning regulators with variable forgetting factors. Automatica 17(6), 831–835 (1981)

  18. Fraker, S.E., Woodall, W.H., Mousavi, S.: Performance metrics for surveillance schemes. Qual. Eng. 20(4), 451–464 (2008)

  19. Frisén, M.: Statistical surveillance. Optimality and methods. Int. Stat. Rev. 71(2), 403–434 (2003)

  20. Gama, J.: Knowledge Discovery from Data Streams. Chapman Hall, Boca Raton (2010)

  21. German, R.R., Lee, L.M., Horan, J.M., Milstein, R.L., Pertowski, C.A., Waller, M.N.: Updated guidelines for evaluating public health surveillance systems. Morb. Mortal. Wkly. Rep. 50, 1–35 (2001)

  22. Gustafsson, F.: Adaptive Filtering and Change Detection. Wiley, New York (2000)

  23. Hawkins, D.M.: Self-starting Cusum charts for location and scale. J. R. Stat. Soc. Ser. D 36(4), 299–316 (1987)

  24. Hawkins, D.M.: Cumulative sum control charting: an underutilized SPC tool. Qual. Eng. 5(3), 463–477 (1993)

  25. Hawkins, D.M., Qiu, P., Chang, W.K.: The changepoint model for statistical process control. J. Qual. Technol. 35(4), 355–366 (2003)

  26. Haykin, S.: Adaptive Filter Theory. Prentice-Hall, Upper Saddle River (2002)

  27. Jensen, W.A., Jones-Farmer, L.A., Champ, C.W., Woodall, W.H., et al.: Effects of parameter estimation on control chart properties: a literature review. J. Qual. Technol. 38(4), 349–364 (2006)

  28. Jiang, W., Shu, W., Apley, D.W.: Adaptive cusum procedures with EWMA-based shift estimators. IIE Trans. 40(10), 992–1003 (2008)

  29. Jones, L.A.: The statistical design of EWMA control charts with estimated parameters. J. Qual. Technol. 34(3), 277–288 (2002)

  30. Jones, L.A., Champ, C.W., Rigdon, S.E.: The performance of exponentially weighted moving average charts with estimated parameters. Technometrics 43(2), 156–167 (2001)

  31. Jones, L.A., Champ, C.W., Rigdon, S.E.: The run length distribution of the CUSUM with estimated parameters. J. Qual. Technol. 36(1), 95–108 (2004)

  32. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35–45 (1960)

  33. Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the 30th International Conference on Very Large Data Bases, Volume 30, VLDB Endowment, pp. 180–191 (2004)

  34. Killick, R., Eckley, I.A.: Changepoint: An R Package for Changepoint Analysis. Lancaster University, Lancaster (2011)

  35. Killick, R., Fearnhead, P., Eckley, I.A.: Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 107(500), 1590–1598 (2012)

  36. Lorden, G.: Procedures for reacting to a change in distribution. Ann. Math. Stat. 42(6), 1897–1908 (1971)

  37. Lucas, J.M.: The design and use of V-mask control schemes. J. Qual. Technol. 8, 1–11 (1976)

  38. Lucas, J.M., Saccucci, M.S.: Exponentially weighted moving average control schemes: properties and enhancements. Technometrics 32(1), 1–12 (1990)

  39. Maboudou-Tchao, E.M., Hawkins, D.M.: Detection of multiple change-points in multivariate data. J. Appl. Stat. 40(9), 1979–1995 (2013)

  40. Moustakides, G.V.: Optimal stopping times for detecting changes in distributions. Ann. Stat. 14(4), 1379–1387 (1986)

  41. Page, E.: Continuous inspection schemes. Biometrika 41(1/2), 100–115 (1954)

  42. Pavlidis, N.G., Tasoulis, D.K., Adams, N.M., Hand, D.J.: lambda-perceptron: an adaptive classifier for data streams. Pattern Recogn. 44(1), 78–96 (2011)

  43. Roberts, S.W.: Control chart tests based on geometric moving averages. Technometrics 1(3), 239–250 (1959)

  44. Ross, G.J., Adams, N.M., Tasoulis, D.K.: Nonparametric monitoring of data streams for changes in location and scale. Technometrics 53(4), 379–389 (2011)

  45. Sullivan, J.H.: Detection of multiple change points from clustering individual observations. J. Qual. Technol. 34(4), 371–383 (2002)

  46. Tsung, F., Wang, T.: Adaptive charting techniques: literature review and extensions. In: Lenz, H. (ed.) Frontiers in Statistical Quality Control, vol. 9, pp. 19–35. Springer, Berlin (2010)

  47. Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Springer, New York (2009)

  48. Xie, Y., Siegmund, D.: Sequential multi-sensor change-point detection. Ann. Stat. 41(2), 670–692 (2013)

Acknowledgments

The work of Dean Bodenham was fully supported by a Roth Studentship provided by the Department of Mathematics, Imperial College, London. The authors would like to thank C. Anagnostopoulos, D. J. Hand, N. A. Heard, G. J. Ross, W. H. Woodall and the three anonymous referees for their helpful comments which improved the manuscript. All figures were created in R using the ggplot2 package (Wickham 2009). Finally, we note that an R package ffstream implementing the AFF algorithm is in preparation.

Author information

Corresponding author

Correspondence to Dean A. Bodenham.

Cite this article

Bodenham, D.A., Adams, N.M. Continuous monitoring for changepoints in data streams using adaptive estimation. Stat Comput 27, 1257–1270 (2017). https://doi.org/10.1007/s11222-016-9684-8

Keywords

  • Changepoint detection
  • Adaptive estimation
  • Data stream
  • Sequential analysis