Abstract
Many ready-made methods exist for segmenting univariate time series, but far fewer are available for multivariate time series. The common practice is to build multivariate segmentation methods by extending univariate ones. This paper takes the opposite route: it reduces a multivariate time series to a univariate common factor sequence so that univariate segmentation methods can be applied directly. First, a dynamic factor model extracts a common factor sequence from the multivariate time series as a composite index. Then, three typical search methods (binary segmentation, segment neighborhoods, and pruned exact linear time) are applied to the common factor sequence to detect change points, and the resulting segmentation is taken as the final segmentation of the multivariate time series. Case studies show the applicability and robustness of the proposed approach in hydrometeorological time series segmentation.
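The two-stage idea can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the first principal component of the standardized series stands in for the common factor that the dynamic factor model would extract, the cost is a squared-error cost for shifts in the mean, and the names `first_principal_component`, `binary_segmentation`, `min_size` and the penalty rule are all hypothetical choices made for illustration.

```python
import numpy as np

def first_principal_component(X):
    """Crude stand-in for the common factor (assumption: PCA, not a DFM)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def sse(z):
    """Cost of one segment: squared deviations from the segment mean."""
    return float(np.sum((z - z.mean()) ** 2))

def binary_segmentation(z, penalty, min_size=5, offset=0):
    """Split z at the best single change point, then recurse on both halves.

    A split is kept only if it lowers the total cost by more than `penalty`.
    """
    n = len(z)
    if n < 2 * min_size:
        return []
    splits = range(min_size, n - min_size + 1)
    costs = [sse(z[:k]) + sse(z[k:]) for k in splits]
    best = int(np.argmin(costs))
    k = splits[best]
    if sse(z) - costs[best] <= penalty:
        return []
    left = binary_segmentation(z[:k], penalty, min_size, offset)
    right = binary_segmentation(z[k:], penalty, min_size, offset + k)
    return left + [offset + k] + right

# Synthetic example: four series sharing one level shift at t = 100.
rng = np.random.default_rng(1)
T = 200
level = np.where(np.arange(T) < 100, 0.0, 3.0)
X = np.outer(level, [1.0, 0.8, 1.2, 0.9]) + rng.normal(size=(T, 4))

z = first_principal_component(X)
noise_var = np.var(np.diff(z)) / 2          # rough noise-variance estimate
change_points = binary_segmentation(z, penalty=4 * np.log(T) * noise_var)
```

Segment neighborhoods and the pruned exact linear time method would replace only `binary_segmentation`; the reduction to a single sequence `z` stays the same.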
References
Abonyi J, Feil B, Nemeth S, Arva P (2003) Fuzzy clustering based segmentation of time-series. In: Advances in intelligent data analysis V, Springer, pp 275–285
Abonyi J, Feil B, Nemeth S, Arva P (2005) Modified Gath–Geva clustering for fuzzy segmentation of multivariate time-series. Fuzzy Sets Syst 149(1):39–56
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Aksoy H, Gedikli A, Unal NE, Kehagias A (2008) Fast segmentation algorithms for long hydrometeorological time series. Hydrol Process 22(23):4600–4608
Albertson DG, Pinkel D (2003) Genomic microarrays in human genetic disease and cancer. Hum Mol Genet 12(suppl 2):R145–R152
Auger IE, Lawrence CE (1989) Algorithms for the optimal identification of segment neighborhoods. Bull Math Biol 51(1):39–54
Bai J, Wang P (2015) Identification and Bayesian estimation of dynamic factor models. J Bus Econ Stat 33(2):221–240
Bellman RE, Dreyfus SE (2015) Applied dynamic programming. Princeton University Press, Princeton
Choi I (2012) Efficient estimation of factor models. Econ Theory 28(2):274–308
Dickey DA, Fuller WA (1981) Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49(4):1057–1072
Doz C, Giannone D, Reichlin L (2012) A quasi-maximum likelihood approach for large, approximate dynamic factor models. Rev Econ Stat 94(4):1014–1024
Durbin J, Koopman SJ (2012) Time series analysis by state space methods. Oxford University Press, Oxford
Edwards AW, Cavalli-Sforza LL (1965) A method for cluster analysis. Biometrics 21(2):362–375
Engle R, Watson M (1981) A one-factor multivariate time series model of metropolitan wage rates. J Am Stat Assoc 76(376):774–781
Forni M, Reichlin L (2005) The generalized dynamic factor model: one-sided estimation and forecasting. J Am Stat Assoc 100(471):830–840
Garcia-Papani F, Uribe-Opazo MA, Leiva V, Aykroyd RG (2016) Birnbaum–Saunders spatial modelling and diagnostics applied to agricultural engineering data. Stoch Environ Res Risk Assess pp 1–20
Gedikli A, Aksoy H, Unal NE (2008) Segmentation algorithm for long time series analysis. Stoch Environ Res Risk Assess 22(3):291–302
Gedikli A, Aksoy H, Unal NE, Kehagias A (2010) Modified dynamic programming approach for offline segmentation of long hydrometeorological time series. Stoch Environ Res Risk Assess 24(5):547–557
Guo H, Liu X, Song L (2015) Dynamic programming approach for segmentation of multivariate time series. Stoch Environ Res Risk Assess 29(1):265–273
Hannan EJ, Quinn BG (1979) The determination of the order of an autoregression. J R Stat Soc 41(2):190–195
Hinkley DV (1970) Inference about the change-point in a sequence of random variables. Biometrika 57(1):1–17
Holmes EE, Ward EJ, Wills K (2012) MARSS: multivariate autoregressive state-space models for analyzing time-series data. R J 4(1):11–19
Hubert P (2000) The segmentation procedure as a tool for discrete modeling of hydrometeorological regimes. Stoch Environ Res Risk Assess 14(4):297–304
Hubert P, Carbonnel JP, Chaouche A (1989) Segmentation des séries hydrométéorologiques: application à des séries de précipitations et de débits de l’Afrique de l’ouest. J Hydrol 110(3):349–367
Inclan C, Tiao GC (1994) Use of cumulative sums of squares for retrospective detection of changes of variance. J Am Stat Assoc 89(427):913–923
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Fluids Eng 82(1):35–45
Kawahara Y, Sugiyama M (2012) Sequential change-point detection based on direct density-ratio estimation. Stat Anal Data Min 5(2):114–127
Kehagias A (2004) A hidden Markov model segmentation procedure for hydrological and environmental time series. Stoch Environ Res Risk Assess 18(2):117–130
Kehagias A, Fortin V (2006) Time series segmentation with shifting means hidden Markov models. Nonlinear Process Geophys 13(3):339–352
Kehagias A, Nidelkou E, Petridis V (2006) A dynamic programming segmentation procedure for hydrological and environmental time series. Stoch Environ Res Risk Assess 20(1):77–94
Killick R, Eckley I (2014) Changepoint: an R package for changepoint analysis. J Stat Softw 58(3):1–19
Killick R, Fearnhead P, Eckley I (2012) Optimal detection of changepoints with a linear computational cost. J Am Stat Assoc 107(500):1590–1598
Koopman SJ, Shephard N, Doornik JA (1999) Statistical algorithms for models in state space using SsfPack 2.2. Econom J 2(1):107–160
Mariano RS, Murasawa Y (2003) A new coincident index of business cycles based on monthly and quarterly series. J Appl Econ 18(4):427–443
Mariano RS, Murasawa Y (2010) A coincident index, common factors, and monthly real GDP. Oxf Bull Econ Stat 72(1):27–46
Matteson DS, James NA (2014) A nonparametric approach for multiple change point analysis of multivariate data. J Am Stat Assoc 109(505):334–345
Molinari N, Daures JP, Durand JF (2001) Regression splines for threshold selection in survival data analysis. Stat Med 20(2):237–247
Muggeo VM (2003) Estimating regression models with unknown break-points. Stat Med 22(19):3055–3071
Muggeo VM, Adelfio G (2010) Efficient change point detection for genomic sequences of continuous measurements. Bioinformatics 27(2):161–166
Pfaff B (2008) VAR, SVAR and SVEC models: implementation within R package vars. J Stat Softw 27(4):1–32
Ramsey JB, Lampart C (1998) The decomposition of economic relationships by time scale using wavelets: expenditure and income. Stud Nonlinear Dyn Econom 3(1):1–22
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al (2006) Global variation in copy number in the human genome. Nature 444(7118):444–454
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Seong B, Ahn SK, Zadrozny PA (2013) Estimation of vector error correction models with mixed-frequency data. J Time Ser Anal 34(2):194–205
Shumway RH, Stoffer DS (2010) Time series analysis and its applications: with R examples. Springer, New York
Stock JH, Watson MW (1988) A probability model of the coincident economic indicators. Technical report, National Bureau of Economic Research
Stock JH, Watson MW (2011) Dynamic factor models. Oxf Handb Econ Forecast 1:35–59
Wang N, Liu X, Yin J (2012) Improved Gath–Geva clustering for fuzzy segmentation of hydrometeorological time series. Stoch Environ Res Risk Assess 26(1):139–155
Acknowledgments
This work is supported by the Natural Science Foundation of China under Grants 61673082 and 61533005.
Appendices
Appendix 1
Let for all t,
so Eqs. (7)–(8) can be transformed into the state space form:
where
Define
so the following expression holds:
Assuming \(v_{t}\sim \mathrm {N}(0, \Sigma _{v})\) and the initial state vector \(f_{0}\) is distributed \(\mathrm {N}(\delta ,\Omega )\), we can express the complete-data log-likelihood function as
where \(\Theta =(\mathrm {vec}(\Lambda )',\mathrm {vec}(B^{*})',\mathrm {vech}(\Sigma _{w})',\mathrm {vech}(\Sigma _{v})')\) is the vector containing all the unknown parameters and \(\mathrm {vec}(\cdot )\) denotes the vectorization of a matrix column-wise from left to right, and \(\mathrm {vech}(\cdot )\) denotes the vectorization of the lower triangular part of a matrix column-wise from left to right. Let
and define \(Q(\Theta )\) as the expectation of \({\log (L(\Theta ))}\) conditional on \(X_{T}\), namely,
We obtain the iteration formulas by taking the partial derivatives of Eq. (30) with respect to the unknown parameters:
where
In addition, define
as the conditional expectation based on \(X_{t}\). The conditional variance and covariance based on \(X_{t}\) are respectively denoted by
and
which can be estimated by the updating and smoothing equations of the Kalman filter (Durbin and Koopman 2012). The EM estimation procedure is performed in the following steps (Shumway and Stoffer 2010; Seong et al. 2013):
(1) Set the initial values \(\Theta ^{0},\delta\) and \(\Omega\). (In general, the initial values of \(\Sigma _{w}\) and \(\Sigma _{v}\) are set to identity matrices of the appropriate dimensions, and the initial values of \(B^{*}\) and \(\Lambda\) are set to zero matrices of the appropriate dimensions. Moreover, we set \(\delta =0\) and \(\Omega =\kappa I\), where I is an identity matrix and \(\kappa\) is 1 for a stationary process and a large value such as \(10^6\) for a nonstationary process.) On iteration \(j\), for \(j = 1,2,\ldots\):
(2) Compute the negative log-likelihood \({-\log (L_{X}(\Theta ^{j-1}))}\).
(3) Perform the E-step of the EM algorithm. Obtain the smoothed values \(s^{T}_{t},P^{T}_{t},P^{T}_{t,t-1}\) for \(t=1,\ldots ,T\) by the Kalman filter based on \(\Theta ^{j-1}\), and then calculate \(M_{ij}\) for \(i,j=0,1\) according to Eq. (35).
(4) Perform the M-step of the EM algorithm. Update the estimates \(\Theta ^{j}\) according to Eqs. (31)–(34).
(5) Repeat Steps (2)–(4) until the likelihood values converge.
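The steps above can be sketched in Python for a generic linear Gaussian state space model. This is a minimal illustration in the style of Shumway and Stoffer (2010), not the paper's implementation: it assumes the plain form x_t = Lam s_t + w_t, s_t = B s_{t-1} + v_t with unrestricted parameter matrices, whereas the appendix works with the structured matrices \(\Lambda\) and \(B^{*}\). The function names, the variables `M00`, `M10`, `M11` (assumed to play the role of \(M_{ij}\)), and the simulated data are hypothetical. The loading matrix starts at ones rather than zeros, because in this plain sketch a zero loading matrix with \(\delta =0\) produces zero smoothed states and therefore stays at zero.

```python
import numpy as np

def kalman_smoother(x, Lam, B, R, Q, delta, Omega):
    """Kalman filter and fixed-interval smoother; index 0 is the initial state."""
    T, N = x.shape
    r = B.shape[0]
    sp = np.zeros((T + 1, r)); Pp = np.zeros((T + 1, r, r))   # predicted
    sf = np.zeros((T + 1, r)); Pf = np.zeros((T + 1, r, r))   # filtered
    sf[0], Pf[0] = delta, Omega
    loglik = 0.0
    for t in range(1, T + 1):
        sp[t] = B @ sf[t - 1]
        Pp[t] = B @ Pf[t - 1] @ B.T + Q
        eta = x[t - 1] - Lam @ sp[t]                 # innovation
        F = Lam @ Pp[t] @ Lam.T + R                  # innovation variance
        Finv = np.linalg.inv(F)
        K = Pp[t] @ Lam.T @ Finv
        sf[t] = sp[t] + K @ eta
        Pf[t] = Pp[t] - K @ Lam @ Pp[t]
        loglik -= 0.5 * (N * np.log(2 * np.pi)
                         + np.linalg.slogdet(F)[1] + eta @ Finv @ eta)
    ss, Ps = sf.copy(), Pf.copy()                    # smoothed mean, variance
    Pcs = np.zeros((T + 1, r, r))                    # smoothed lag-one covariance
    for t in range(T, 0, -1):
        J = Pf[t - 1] @ B.T @ np.linalg.inv(Pp[t])
        ss[t - 1] = sf[t - 1] + J @ (ss[t] - sp[t])
        Ps[t - 1] = Pf[t - 1] + J @ (Ps[t] - Pp[t]) @ J.T
        Pcs[t] = Ps[t] @ J.T
    return ss, Ps, Pcs, loglik

def em_step(x, Lam, B, R, Q, delta, Omega):
    """One EM iteration; returns updated parameters and the log-likelihood
    evaluated at the parameters passed in (Step 2)."""
    T = x.shape[0]
    ss, Ps, Pcs, loglik = kalman_smoother(x, Lam, B, R, Q, delta, Omega)  # E-step
    M11 = ss[1:].T @ ss[1:] + Ps[1:].sum(axis=0)
    M10 = ss[1:].T @ ss[:-1] + Pcs[1:].sum(axis=0)
    M00 = ss[:-1].T @ ss[:-1] + Ps[:-1].sum(axis=0)
    B_new = M10 @ np.linalg.inv(M00)                                      # M-step
    Q_new = (M11 - B_new @ M10.T) / T
    Q_new = 0.5 * (Q_new + Q_new.T)
    Lam_new = (x.T @ ss[1:]) @ np.linalg.inv(M11)
    resid = x - ss[1:] @ Lam_new.T
    R_new = (resid.T @ resid + Lam_new @ Ps[1:].sum(axis=0) @ Lam_new.T) / T
    return Lam_new, B_new, R_new, Q_new, loglik

# Simulated one-factor data (hypothetical loadings and AR coefficient).
rng = np.random.default_rng(0)
T, N = 200, 3
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.8 * f[t - 1] + rng.normal()
x = np.outer(f, [1.0, 0.5, -0.7]) + rng.normal(size=(T, N))

Lam, B, R, Q = np.ones((N, 1)), np.array([[0.5]]), np.eye(N), np.eye(1)
delta, Omega = np.zeros(1), np.eye(1)                # kappa = 1
logliks = []
for j in range(30):
    Lam, B, R, Q, ll = em_step(x, Lam, B, R, Q, delta, Omega)
    logliks.append(ll)
    if j > 0 and abs(logliks[-1] - logliks[-2]) < 1e-6:   # Step 5
        break
```

Because each M-step maximizes the expected complete-data log-likelihood, the values stored in `logliks` should not decrease from one iteration to the next, which gives a simple check on an implementation.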
Appendix 2
Let for all t,
so Eqs. (7)–(9) can be transformed into the state space form:
where
with
in which \(I_{i\times j}\) and \({\mathbf{0}}_{i\times j}\) stand for an i-by-j identity matrix and an i-by-j zero matrix respectively.
In this case, let \(\eta _{t}=x_t-\hat{x}_t\) denote the innovations, with variance \(F_t\). The Kalman filter allows the computation of the Gaussian log-likelihood function via the prediction error decomposition (Engle and Watson 1981; Koopman et al. 1999). Assuming
the log-likelihood function is given by
where \(\Theta\) is the vector of parameters of the statistical model represented in state space form. The iterative procedure in Eq. (50) computes new estimates \(\Theta ^{k+1}\) from the estimates at the k-th iteration, using \(H^{k}\), the information matrix evaluated at \(\Theta ^{k}\), and a scalar step length \(\alpha ^{k}\):
For a symmetric matrix B, the following expressions are satisfied:
Differentiating \(L_{t}\) in Eq. (49) with respect to the parameter \(\Theta _{i}\) according to Eqs. (51) and (52), we get the following expressions (Engle and Watson 1981):
To get the second derivative matrix of the log-likelihood, first calculate
and the only random variables in this expression are the \(\eta _{t}\). Hence, taking the expected value of Eq. (56), we have
Similarly, differentiate \(L_{2_{t}}\) with respect to \(\Theta _{j}\) to obtain
Take expected values of Eq. (58),
The ij-th element of the information matrix is the negative of the sum of Eq. (57) and Eq. (59) summed over all time periods. Thus
Eq. (60) requires \(\eta _{t}\) and its variance \(F_{t}\), which can be calculated numerically by the smoothing equations of the Kalman filter (Durbin and Koopman 2012). In turn, the updated estimate of \(\Theta\) is fed back into the equations of the Kalman filter. Iterating these two steps until convergence yields the estimates of the DFM.
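The scoring iteration can be sketched in Python on a much simpler model than the DFM. Everything here is an assumption made for illustration: a scalar local-level model (random-walk state plus noise) replaces the state space form of this appendix, the two variances are parametrized by their logarithms, the derivatives of \(\eta _{t}\) and \(F_{t}\) are taken by central finite differences instead of the analytic recursions, the information matrix uses the usual scalar form built from those derivatives, and the step length is chosen by simple halving. The function names are hypothetical.

```python
import numpy as np

def innovations(theta, y, p0=1e6):
    """Kalman filter for a local-level model; returns eta_t and F_t."""
    q, r = np.exp(theta)              # state and observation noise variances
    a, p = 0.0, p0                    # near-diffuse initial state
    eta = np.empty(len(y)); F = np.empty(len(y))
    for t, yt in enumerate(y):
        p = p + q                     # predicted state variance
        eta[t] = yt - a               # innovation
        F[t] = p + r                  # innovation variance
        a = a + (p / F[t]) * eta[t]   # filtered state mean
        p = p * r / F[t]              # filtered state variance
    return eta, F

def loglik(theta, y):
    """Gaussian log-likelihood via the prediction error decomposition."""
    eta, F = innovations(theta, y)
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(F) + eta**2 / F)

def score_and_information(theta, y, h=1e-5):
    """Score vector and information matrix from derivatives of eta_t and F_t."""
    eta, F = innovations(theta, y)
    d_eta = np.empty((len(theta), len(y))); d_F = np.empty_like(d_eta)
    for i in range(len(theta)):
        step = np.zeros(len(theta)); step[i] = h
        e_hi, F_hi = innovations(theta + step, y)
        e_lo, F_lo = innovations(theta - step, y)
        d_eta[i] = (e_hi - e_lo) / (2 * h)
        d_F[i] = (F_hi - F_lo) / (2 * h)
    g = np.sum(-0.5 * d_F / F - d_eta * eta / F + 0.5 * d_F * eta**2 / F**2, axis=1)
    H = 0.5 * (d_F / F**2) @ d_F.T + (d_eta / F) @ d_eta.T
    return g, H

def scoring(y, theta0, max_iter=50, tol=1e-8):
    """Iterate theta <- theta + alpha * H^{-1} g, halving alpha until the
    log-likelihood improves."""
    theta = np.asarray(theta0, dtype=float)
    path = [loglik(theta, y)]
    for _ in range(max_iter):
        g, H = score_and_information(theta, y)
        direction = np.linalg.solve(H, g)
        alpha, improved = 1.0, False
        while alpha > 1e-4 and not improved:
            cand = theta + alpha * direction
            improved = bool(loglik(cand, y) > path[-1])   # False for nan too
            alpha = alpha if improved else alpha / 2
        if not improved:
            break
        theta = cand
        path.append(loglik(theta, y))
        if path[-1] - path[-2] < tol:
            break
    return theta, path

# Simulated local-level data with hypothetical variances q = 0.5, r = 2.
rng = np.random.default_rng(0)
T = 300
mu = np.cumsum(rng.normal(scale=np.sqrt(0.5), size=T))
y = mu + rng.normal(scale=np.sqrt(2.0), size=T)
theta_hat, path = scoring(y, theta0=np.zeros(2))
```

For the DFM, the same loop would apply with vector innovations, in which case the scalar divisions by `F` become products with \(F_{t}^{-1}\) and traces.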
Cite this article
Sun, Z., Liu, X. & Wang, L. A hybrid segmentation method for multivariate time series based on the dynamic factor model. Stoch Environ Res Risk Assess 31, 1291–1304 (2017). https://doi.org/10.1007/s00477-016-1323-6
DOI: https://doi.org/10.1007/s00477-016-1323-6