Abstract
Long time series are often heterogeneous, so the most appropriate model is one whose parameters are allowed to change through time. Because the number of possible solutions to the multiple change point problem grows exponentially, an efficient algorithm is needed to make the problem computationally feasible. Exact Bayesian solutions have at best quadratic complexity in the number of observations, which can still be too slow for very large data sets. Here, a pruned dynamic programming algorithm is proposed to fit a piecewise regression model with unknown break points to a data set. The algorithm removes unnecessary calculations, reducing the complexity of its most time-consuming step from quadratic in the number of observations to quadratic in the average distance between change points. A distance measure is introduced that can be used to determine the divergence of the approximate joint posterior distribution from the exact posterior distribution. Analysis of two real data sets shows that this approximate algorithm produces a nearly identical representation of the joint posterior distribution on the locations of the change points, but with a significantly faster run time than its exact counterpart.
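To make the idea of pruning a change point recursion concrete, the following is a minimal sketch and not the algorithm proposed in the article. It swaps in a simpler, non-Bayesian technique: penalized least-squares segmentation into constant-mean pieces, with a PELT-style rule that discards candidate positions for the most recent change point once they can no longer be optimal. The function name `pruned_partition`, the `penalty` parameter, and the piecewise-constant cost are all assumptions made for illustration.

```python
from itertools import accumulate


def pruned_partition(y, penalty):
    """Penalized least-squares segmentation of y into constant-mean pieces.

    Illustrative sketch only (hypothetical helper, not the article's method).
    Returns (change_points, evaluations); a change point c means that a new
    segment starts at index c.
    """
    n = len(y)
    # Prefix sums give the cost of any segment y[s:t] in O(1).
    s1 = [0.0] + list(accumulate(float(v) for v in y))
    s2 = [0.0] + list(accumulate(float(v) ** 2 for v in y))

    def cost(s, t):
        # Residual sum of squares of y[s:t] around its own mean.
        total = s1[t] - s1[s]
        return (s2[t] - s2[s]) - total * total / (t - s)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of y[:t]
    best[0] = -penalty       # so that the first segment is not penalized
    last = [0] * (n + 1)     # last[t]: start of the final segment of y[:t]
    candidates = [0]         # surviving positions for the last change point
    evaluations = 0          # number of segment costs actually computed

    for t in range(1, n + 1):
        scores = [best[s] + cost(s, t) for s in candidates]
        evaluations += len(scores)
        i = min(range(len(scores)), key=scores.__getitem__)
        best[t] = scores[i] + penalty
        last[t] = candidates[i]
        # Pruning step: a candidate whose score already exceeds best[t]
        # can never be optimal for a later t, so it is dropped.
        candidates = [s for s, sc in zip(candidates, scores) if sc <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    change_points = []
    t = n
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return change_points[::-1], evaluations


if __name__ == "__main__":
    data = [0.0] * 50 + [5.0] * 50
    cps, evals = pruned_partition(data, penalty=10.0)
    print(cps, evals, "unpruned evaluations:", len(data) * (len(data) + 1) // 2)
```

On a series with a single clear shift in the mean, the candidate list collapses shortly after the shift, so the work per step depends on the distance back to the last change point rather than on the full length of the series. This mirrors, in a much simpler setting, the kind of saving the abstract describes.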
Acknowledgements
The author would like to thank the two anonymous reviewers for their thoughtful feedback, which helped to greatly improve this manuscript. This work was supported by a grant from the National Science Foundation, DMS-1407670 (E. Ruggieri, PI).
Cite this article
Ruggieri, E. A pruned recursive solution to the multiple change point problem. Comput Stat 33, 1017–1045 (2018). https://doi.org/10.1007/s00180-017-0756-9
DOI: https://doi.org/10.1007/s00180-017-0756-9