Advertisement

Statistics and Computing

, Volume 27, Issue 5, pp 1257–1270 | Cite as

Continuous monitoring for changepoints in data streams using adaptive estimation

  • Dean A. Bodenham
  • Niall M. Adams
Article

Abstract

Data streams are characterised by a potentially unending sequence of high-frequency observations which are subject to unknown temporal variation. Many modern streaming applications demand the capability to sequentially detect changes as soon as possible after they occur, while continuing to monitor the stream as it evolves. We refer to this problem as continuous monitoring. Sequential algorithms such as CUSUM, EWMA and their more sophisticated variants usually require a pair of parameters to be selected for practical application. However, the choice of parameter values is often based on the anticipated size of the changes and a given choice is unlikely to be optimal for the multiple change sizes which are likely to occur in a streaming data context. To address this critical issue, we introduce a changepoint detection framework based on adaptive forgetting factors that, instead of multiple control parameters, only requires a single parameter to be selected. Simulated results demonstrate that this framework has utility in a continuous monitoring setting. In particular, it reduces the burden of selecting parameters in advance. Moreover, the methodology is demonstrated on real data arising from Foreign Exchange markets.

Keywords

Changepoint detection Adaptive estimation Data stream Sequential analysis 

Notes

Acknowledgments

The work of Dean Bodenham was fully supported by a Roth Studentship provided by the Department of Mathematics, Imperial College, London. The authors would like to thank C. Anagnostopoulos, D. J. Hand, N. A. Heard, G. J. Ross, W. H. Woodall and the three anonymous referees for their helpful comments which improved the manuscript. All figures were created in R using the ggplot2 package (Wickham 2009). Finally, we note that an R package ffstream implementing the AFF algorithm is in preparation.

Supplementary material

11222_2016_9684_MOESM1_ESM.pdf (1006 kb)
Supplementary material 1 (pdf 1006 KB)

References

  1. Adams, N.M., Tasoulis, D.K., Anagnostopoulos, C., Hand, D.J.: Temporally-adaptive linear classification for handling population drift in credit scoring. In: Lechevallier, Y., Saporta, G. (eds.) COMPSTAT2010, Proceedings of the 19th International Conference on Computational Statistics, pp 167–176. Springer, Berlin (2010)Google Scholar
  2. Aggarwal, C.C. (ed.): Data Streams: Models and Algorithms. Springer, Berlin (2006)zbMATHGoogle Scholar
  3. Anagnostopoulos, C.: A statistical framework for streaming data analysis. PhD thesis, Imperial College London (2010)Google Scholar
  4. Anagnostopoulos, C., Tasoulis, D.K., Adams, N.M., Pavlidis, N.G., Hand, D.J.: Online linear and quadratic discriminant analysis with adaptive forgetting for streaming classification. Stat. Anal. Data Mining 5(2), 139–166 (2012)MathSciNetCrossRefGoogle Scholar
  5. Apley, D.W., Chin, C.H.: An optimal filter design approach to statistical process control. J. Qual. Technol. 39(2), 93–117 (2007)Google Scholar
  6. Appel, U., Brandt, A.V.: Adaptive sequential segmentation of piecewise stationary time series. Inf. Sci. 29(1), 27–56 (1983)CrossRefzbMATHGoogle Scholar
  7. Åström, K., Borisson, U., Ljung, L., Wittenmark, B.: Theory and applications of self-tuning regulators. Automatica 13(5), 457–476 (1977)CrossRefzbMATHGoogle Scholar
  8. Åström, K.J., Wittenmark, B.: On self tuning regulators. Automatica 9(2), 185–199 (1973)CrossRefzbMATHGoogle Scholar
  9. Basseville, M., Nikiforov, I.V.: Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs (1993)Google Scholar
  10. Bodenham, D.A., Adams, N.M.: Continuous monitoring of a computer network using multivariate adaptive estimation. In: IEEE 13th International Conference on Data Mining Workshops (ICDMW), pp 311–388 (2013)Google Scholar
  11. Bodenham, D.A., Adams, N.M.: Adaptive change detection for relay-like behaviour. In: IEEE Joint Information and Security Informatics Conference (2014)Google Scholar
  12. Borkar, V.S.: Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, Cambridge (2008)zbMATHGoogle Scholar
  13. Capizzi, G., Masarotto, G.: An adaptive exponentially weighted moving average control chart. Technometrics 45(3), 199–207 (2003)MathSciNetCrossRefGoogle Scholar
  14. Capizzi, G., Masarotto, G.: Self-starting CUSCORE control charts for individual multivariate observations. J. Qual. Technol. 42(2), 136–152 (2010)Google Scholar
  15. Capizzi, G., Masarotto, G.: Adaptive generalized likelihood ratio control charts for detecting unknown patterned mean shifts. J. Qual. Technol. 44(4), 281–303 (2012)Google Scholar
  16. Choi, S.W., Martin, E.B., Morris, A.J., Lee, I.B.: Adaptive multivariate statistical process control for monitoring time-varying processes. Ind. Eng. Chem. Res. 45(9), 3108–3118 (2006)CrossRefGoogle Scholar
  17. Fortescue, T., Kershenbaum, L., Ydstie, B.: Implementation of self-tuning regulators with variable forgetting factors. Automatica 17(6), 831–835 (1981)CrossRefGoogle Scholar
  18. Fraker, S.E., Woodall, W.H., Mousavi, S.: Performance metrics for surveillance schemes. Qual. Eng. 20(4), 451–464 (2008)CrossRefGoogle Scholar
  19. Frisén, M.: Statistical surveillance. Optimality and methods. Int. Stat. Rev. 71(2), 403–434 (2003)CrossRefzbMATHGoogle Scholar
  20. Gama, J.: Knowledge Discovery from Data Streams. Chapman Hall, Boca Raton (2010)CrossRefzbMATHGoogle Scholar
  21. German, R.R., Lee, L.M., Horan, J.M., Milstein, R.L., Pertowski, C.A., Waller, M.N.: Updated guidelines for evaluating public health surveillance systems. Morb. Mortal. Wkly. Rep. 50, 1–35 (2001)Google Scholar
  22. Gustafsson, F.: Adaptive Filtering and Change Detection. Wiley, New York (2000)Google Scholar
  23. Hawkins, D.M.: Self-starting Cusum charts for location and scale. J. R. Stat. Soc. Ser. D 36(4), 299–316 (1987)Google Scholar
  24. Hawkins, D.M.: Cumulative sum control charting: an underutilized SPC tool. Qual. Eng. 5(3), 463–477 (1993)Google Scholar
  25. Hawkins, D.M., Qiu, P., Chang, W.K.: The changepoint model for statistical process control. J. Qual. Technol. 35(4), 355–366 (2003)Google Scholar
  26. Haykin, S.: Adaptive Filter Theory. Prentice-Hall, Upper Saddle River (2002)zbMATHGoogle Scholar
  27. Jensen, W.A., Jones-Farmer, L.A., Champ, C.W., Woodall, W.H., et al.: Effects of parameter estimation on control chart properties: a literature review. J. Qual. Technol. 38(4), 349–364 (2006)Google Scholar
  28. Jiang, W., Shu, W., Apley, D.W.: Adaptive cusum procedures with EWMA-based shift estimators. IIE Trans. 40(10), 992–1003 (2008)CrossRefGoogle Scholar
  29. Jones, L.A.: The statistical design of EWMA control charts with estimated parameters. J. Qual. Technol. 34(3), 277–288 (2002)Google Scholar
  30. Jones, L.A., Champ, C.W., Rigdon, S.E.: The performance of exponentially weighted moving average charts with estimated parameters. Technometrics 43(2), 156–167 (2001)MathSciNetCrossRefGoogle Scholar
  31. Jones, L.A., Champ, C.W., Rigdon, S.E.: The run length distribution of the CUSUM with estimated parameters. J. Qual. Technol. 36(1), 95–108 (2004)Google Scholar
  32. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35–45 (1960)CrossRefGoogle Scholar
  33. Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the 13th international conference on Very large data bases-Volume 30, VLDB Endowment, pp. 180–191 (2004)Google Scholar
  34. Killick, R., Eckley, I.A.: Changepoint: An R Package for Changepoint Analysis. Lancaster University, Lancaster (2011)Google Scholar
  35. Killick, R., Fearnhead, P., Eckley, I.A.: Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 107(500), 1590–1598 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  36. Lorden, G.: Procedures for reacting to a change in distribution. Ann. Math. Stat. 1(6), 1897–1908 (1971)MathSciNetCrossRefzbMATHGoogle Scholar
  37. Lucas, J.M.: The design and use of V-mask control schemes. J. Qual. Technol. 8, 1–11 (1976)Google Scholar
  38. Lucas, J.M., Saccucci, M.S.: Exponentially weighted moving average control schemes: properties and enhancements. Technometrics 32(1), 1–12 (1990)MathSciNetCrossRefGoogle Scholar
  39. Maboudou-Tchao, E.M., Hawkins, D.M.: Detection of multiple change-points in multivariate data. J. Appl. Stat. 40(9), 1979–1995 (2013)MathSciNetCrossRefGoogle Scholar
  40. Moustakides, G.V.: Optimal stopping times for detecting changes in distributions. Ann. Stat. 14(4), 1379–1387 (1986)MathSciNetCrossRefzbMATHGoogle Scholar
  41. Page, E.: Continuous inspection schemes. Biometrika 41(1/2), 100–115 (1954)MathSciNetCrossRefzbMATHGoogle Scholar
  42. Pavlidis, N.G., Tasoulis, D.K., Adams, N.M., Hand, D.J.: lambda-perceptron: an adaptive classifier for data streams. Pattern Recogn. 44(1), 78–96 (2011)CrossRefzbMATHGoogle Scholar
  43. Roberts, S.W.: Control chart tests based on geometric moving averages. Technometrics 1(3), 239–250 (1959)CrossRefGoogle Scholar
  44. Ross, G.J., Adams, N.M., Tasoulis, D.K.: Nonparametric monitoring of data streams for changes in location and scale. Technometrics 53(4), 379–389 (2011)MathSciNetCrossRefGoogle Scholar
  45. Sullivan, J.H.: Detection of multiple change points from clustering individual observations. J. Qual. Control 34(4), 371–383 (2002)Google Scholar
  46. Tsung, F., Wang, T.: Adaptive charting techniques: literature review and extensions. In: Lenz, H. (ed.) Frontiers in Statistical Quality Control, vol. 9, pp. 19–35. Springer, Berlin (2010)Google Scholar
  47. Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Springer, New York (2009)CrossRefzbMATHGoogle Scholar
  48. Xie, Y., Sigmund, D.: Sequential multi-sensor change-point detection. Ann. Stat. 41(2), 670–692 (2013)MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. 1.Department of MathematicsImperial College LondonLondonUK
  2. 2.D-BSSEETH ZürichZurichSwitzerland
  3. 3.Heilbronn Institute of MathematicsUniversity of BristolBristolUK

Personalised recommendations