Abstract
In this paper, a dynamic programming (DP) algorithm is applied to automatically segment multivariate time series. The definition and recursive formulation of segment errors for univariate time series are extended to multivariate time series, so that the DP algorithm remains computationally viable in the multivariate case. The autoregression order and the segmentation are determined simultaneously by Schwarz’s Bayesian information criterion. The segmentation procedure is evaluated on synthetic and hydrometeorological multivariate time series. The synthetic series are generated by a threshold autoregressive model; for the real-world series, we propose that autoregression should be taken into account in addition to regression by a constant. The experimental studies show that the proposed algorithm performs well.
References
Abonyi J, Feil B, Nemeth S, Arva P (2003) Fuzzy clustering based segmentation of time-series. In: Lecture notes in computer science, pp 275–286
Abonyi J, Feil B, Nemeth S, Arva P (2005) Modified Gath–Geva clustering for fuzzy segmentation of multivariate time-series. Fuzzy Sets Syst 149:39–56
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
Aksoy H, Unal NE, Alexandrov V, Dakova S, Yoon J (2008a) Hydrometeorological analysis of northwestern Turkey with links to climate change. Int J Climatol 28(8):1047–1060
Aksoy H, Gedikli A, Unal NE, Kehagias A (2008b) Fast segmentation algorithms for long hydrometeorological time series. Hydrol Process 22:4600–4608
Beeferman D, Berger A, Lafferty J (1999) Statistical models for text segmentation. Mach Learn 34:177–210
Bellman R (1961) On the approximation of curves by line segments using dynamic programming. Commun ACM 4:284
Braun JV, Mueller H-G (1998) Statistical methods for DNA sequence segmentation. Stat Sci 13:142–162
Braun JV, Braun RK, Mueller H-G (2000) Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika 87:301–314
Fisch D, Gruber T, Sick B (2011) Swiftrule: mining comprehensible classification rules for time series analysis. IEEE Trans Knowl Data Eng 23(5):774–787
Fortin V, Perreault L, Salas JD (2004) Retrospective analysis and forecasting of streamflows using a shifting level model. J Hydrol 296:135–163
Fuchs E, Gruber T, Nitschke J, Sick B (2009) On-line motif detection in time series with swiftmotif. Pattern Recognit 42:3015–3031
Fuchs E, Gruber T, Nitschke J, Sick B (2010) Online segmentation of time series based on polynomial least-squares approximations. IEEE Trans Pattern Anal Mach Intell 32(12):2232–2245
Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 7:773–780
Gedikli A, Aksoy H, Unal NE (2008) Segmentation algorithm for long time series analysis. Stoch Environ Res Risk Assess 22(3):291–302
Gedikli A, Aksoy H, Unal NE, Kehagias A (2010) Modified dynamic programming approach for offline segmentation of long hydrometeorological time series. Stoch Environ Res Risk Assess 24:547–557
Himberg J, Korpiaho K, Mannila H, Tikanmaki J, Toivonen HTT (2001) Time series segmentation for context recognition in mobile devices. Proc ICDM 2001:203–210
Hubert P (2000) The segmentation procedure as a tool for discrete modeling of hydrometeorological regimes. Stoch Environ Res Risk Assess 14:297–304
Jackson B et al (2005) An algorithm for optimal partitioning of data on an interval. IEEE Signal Proc Lett 12:105–108
Kehagias A (2004) A hidden Markov model segmentation procedure for hydrological and environmental time series. Stoch Environ Res Risk Assess 18:117–130
Kehagias A, Nidelkou E, Petridis V (2005) A dynamic programming segmentation procedure for hydrological and environmental time series. Stoch Environ Res Risk Assess 20:77–94
Kehagias A, Fortin V (2006) Time series segmentation with shifting means hidden Markov models. Nonlinear Process Geophys 13:339–352
Kehagias A, Petridis V, Nidelkou E (2007) Reply by the authors to the letter by Aksoy et al. Stoch Environ Res Risk Assess 21:451–455
Keogh E, Chu S, Hart D, Pazzani M (2003) Segmenting time series: a survey and novel approach. In: Last M, Kandel A, Bunke H (eds) Data mining in time series databases. World Scientific Publishing Company, Singapore
Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371
Liu X, Lin Z, Wang H (2008) Novel online methods for time series segmentation. IEEE Trans Knowl Data Eng 20:1616–1626
Lütkepohl H (2005) New introduction to multiple time series analysis. Springer, Berlin, pp 653–662
Pavlidis T, Horowitz SL (1974) Segmentation of plane curves. IEEE Trans Comput 23:860–870
Seghouane A, Amari S (2007) The AIC criterion and symmetrizing the Kullback–Leibler divergence. IEEE Trans Neural Netw 18:97–106
Tsay RS (1998) Testing and modeling multivariate threshold models. J Am Stat Assoc 443:1188–1202
Wang N, Liu X, Yin J (2012) Improved Gath–Geva clustering for fuzzy segmentation of hydrometeorological time series. Stoch Environ Res Risk Assess 26:139–155
Yao Y-C (1988) Estimating the number of change-points via Schwarz’ criterion. Stat Prob Lett 6:181–189
Acknowledgments
This work is supported by the Natural Science Foundation of China under Grants 61175041 and 11371077, and by Boshidian Funds 20110041110017.
Appendices
A: Dynamic programming segmentation
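As a rough illustration of the general technique named in this heading, the following is a minimal sketch of dynamic programming segmentation for a multivariate series. It rests on several assumptions that are not taken from the paper: the segment error is the squared deviation from a constant mean vector (regression by constant, not autoregression), the number of segments is fixed in advance instead of being selected by an information criterion, and the names `segment_costs` and `dp_segment` are hypothetical.

```python
def segment_costs(series):
    """Return cost(s, t): squared error of fitting one mean vector to series[s:t].

    Assumption: regression by a constant only; the autoregressive case is not covered.
    """
    dim = len(series[0])
    # Prefix sums per coordinate and of squared norms give O(dim) cost queries.
    prefix = [[0.0] * dim]
    prefix_sq = [0.0]
    for x in series:
        prefix.append([p + v for p, v in zip(prefix[-1], x)])
        prefix_sq.append(prefix_sq[-1] + sum(v * v for v in x))

    def cost(s, t):
        n = t - s
        sums = [a - b for a, b in zip(prefix[t], prefix[s])]
        return (prefix_sq[t] - prefix_sq[s]) - sum(v * v for v in sums) / n

    return cost


def dp_segment(series, num_segments):
    """Return (total_error, boundaries) of the best split into num_segments pieces."""
    T = len(series)
    cost = segment_costs(series)
    inf = float("inf")
    # best[k][t]: minimal total error of splitting series[:t] into k segments.
    best = [[inf] * (T + 1) for _ in range(num_segments + 1)]
    back = [[0] * (T + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for t in range(k, T + 1):
            for s in range(k - 1, t):  # s is the start of the last segment
                candidate = best[k - 1][s] + cost(s, t)
                if candidate < best[k][t]:
                    best[k][t] = candidate
                    back[k][t] = s
    # Walk the back-pointers from the end to recover the segment boundaries.
    boundaries = [T]
    t = T
    for k in range(num_segments, 0, -1):
        t = back[k][t]
        boundaries.append(t)
    boundaries.reverse()
    return best[num_segments][T], boundaries
```

For a two-dimensional series with three clearly separated levels, `dp_segment(series, 3)` returns boundaries in half-open index form, so `[0, 3, 5, 8]` means the segments `series[0:3]`, `series[3:5]` and `series[5:8]`. The triple loop costs on the order of K·T² cost evaluations, which is why cheap (or recursively updated) segment errors matter in practice.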
B: Recursive formulas for segment errors of multivariate time series
In this appendix, we prove the correctness of the recursive formulas in Eq. (15) for computing the segment error \(d_{s,t}\) of multivariate time series. As given by Eq. (13),
Based on \(vec(B)^{T}vec(A)=tr(B^{T}A)\), we have
where \(tr\) is the trace of a matrix, while based on \(\hat{\mathbf {Z}}_{s,t}=\hat{\mathbf {A}}_{s,t}{\mathbf {Y}}_{s,t}\), we have
we can get
We now compute each of the last three terms on the right-hand side of Eq. (30) in turn. With \(\mathbf {\delta A}=\hat{\mathbf {A}}_{s,t+1}-\hat{\mathbf {A}}_{s,t}\), we have
then we obtain
Also we have
Since \(vec(AB)=(I\otimes A)vec(B)\) (Lütkepohl 2005), where \(\otimes \) denotes the Kronecker product, and since the Kronecker product satisfies \((A\otimes B)^{T}=A^{T}\otimes B^{T}\) and \((A\otimes B)(C\otimes D)=AC\otimes BD\) (Lütkepohl 2005), we have
Substituting Eqs. (32), (33) and (34) into Eq. (30), we have
Further, we get
As \(tr(CD)=tr(DC)\), finally we obtain
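The matrix identities invoked in this appendix can be spot-checked numerically. The sketch below is not part of the proof: it only verifies each identity on small, arbitrarily chosen integer matrices, using plain-Python helpers whose names (`matmul`, `vec`, `kron`, `check_identities`, and so on) are hypothetical. Here `vec` stacks columns, which is the convention under which \(vec(AB)=(I\otimes A)vec(B)\) holds.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def vec(A):
    """Stack the columns of A into one flat list (column-major order)."""
    return [x for col in zip(*A) for x in col]

def kron(A, B):
    """Kronecker product: block (i, j) of the result equals A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def check_identities():
    """Return a dict mapping each identity to whether it holds on the test matrices."""
    A = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
    B = [[7, 8], [9, 10], [11, 12]]     # 3 x 2
    C = [[1, -1], [2, 0]]               # 2 x 2
    Bt = transpose(B)                   # 2 x 3
    return {
        # vec(B)^T vec(A) = tr(B^T A)
        "vec_trace": sum(b * a for b, a in zip(vec(B), vec(A))) == trace(matmul(Bt, A)),
        # vec(AC) = (I kron A) vec(C), with I of size (number of columns of C)
        "vec_product": vec(matmul(A, C)) == matvec(kron(identity(2), A), vec(C)),
        # (A kron C)^T = A^T kron C^T
        "kron_transpose": transpose(kron(A, C)) == kron(transpose(A), transpose(C)),
        # (A kron B^T)(C kron B) = (AC) kron (B^T B)
        "mixed_product": matmul(kron(A, Bt), kron(C, B)) == kron(matmul(A, C), matmul(Bt, B)),
        # tr(A B^T) = tr(B^T A), even though the two products have different sizes
        "trace_cyclic": trace(matmul(A, Bt)) == trace(matmul(Bt, A)),
    }
```

Calling `check_identities()` returns one boolean per identity; integer entries are used so that the comparisons are exact and not affected by floating-point rounding.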
Guo, H., Liu, X. & Song, L. Dynamic programming approach for segmentation of multivariate time series. Stoch Environ Res Risk Assess 29, 265–273 (2015). https://doi.org/10.1007/s00477-014-0897-0
DOI: https://doi.org/10.1007/s00477-014-0897-0