Moving dynamic principal component analysis for non-stationary multivariate time series

This paper proposes an extension of principal component analysis to non-stationary multivariate time series data. A criterion for determining the number of components to retain is proposed, and an advanced correlation matrix is developed to evaluate the dynamic relationships among the chosen components. The theoretical properties of the proposed method are given. Extensive simulation experiments show that our approach performs well on both stationary and non-stationary data, and real data examples are presented as illustrations. We develop four packages for the statistical software R that contain the functions needed to obtain and assess the results of the proposed method.


Introduction
Multivariate time series analysis has many applications because it accounts for the interrelations between variables. Modern technology allows multivariate data to be collected in a wide range of fields, such as economics, industry, healthcare, and social networks. Many existing models, such as VARIMA models, have complex structures, especially when the series has a large dimension, because the number of parameters grows rapidly as the dimension increases. Reducing the dimension of the series is therefore critical for managing such data.
Many approaches to dimension reduction have been proposed in the literature. Factor models are widely used to reduce the dimension of a vector time series by applying eigenanalysis to the covariance matrix of the data; see, for example, Peña and Box (1987), Watson (1988, 2002), Bai and Ng (2002), Forni et al. (2005), Peña and Poncela (2006), Pan and Yao (2008), Lam and Yao (2012), and many others. These models treat the variables of an observed series as linear combinations of a few hidden factors, whose interpretation is subjective. Another approach of interest is principal component analysis (PCA), which also applies eigenanalysis to the covariance matrix of the data. PCA reduces the dimension by retaining a small number of principal components that are linear combinations of the original variables, and it is a standard technique for static, independent multivariate data. However, PCA is static and therefore cannot capture the dynamic dependence between the variables of a multivariate time series, so the classical technique is not directly applicable to such data. Dimension reduction for time series can also be achieved with canonical correlation analysis (Box and Tiao 1977) and scalar component analysis (Tiao and Tsay 1989). Ku et al. (1995) introduced dynamic principal component analysis (DPCA) by including lagged series in the analysis, so that the projected components, which are linear combinations of both current and lagged values of the data, retain a valuable amount of dynamic information. However, DPCA assumes a stationary series and is therefore not suitable for non-stationary series. Chang et al. (2018) extended PCA by transforming the original series into uncorrelated subseries of lower dimension, a method called principal component analysis for time series (TS-PCA).
The resulting subseries can be analyzed separately because they are uncorrelated, but this method is also limited to stationary series.
Several PCA-based methods have been proposed to account for non-stationarity, such as moving window principal component analysis (MWPCA) by Lennox et al. (2001) and variable MWPCA by He and Yang (2008). These methods were developed mostly for process monitoring, and they perform PCA separately on each window: the observation at the next time point is added, the oldest one is dropped, and new results are obtained from the new window, and so on. However, because each analysis uses only the observations in one window, a large number of observations are excluded at a time, which leads to the loss of valuable information. Brillinger (1981) proposed a related approach in which the reduction is based on a reconstruction criterion; the resulting dynamic components are linear combinations of the original series. Peña and Yohai (2016) proposed generalized dynamic principal component analysis (GDPCA), in which the original data are reconstructed by minimizing a loss function. This method accounts for non-stationary series and produces dynamic principal components that can be non-linear combinations of the original data with nearly zero reconstruction error. This precision results from an iterative procedure that minimizes the reconstruction error; however, the same procedure reduces the accuracy of forecasts based on GDPCA's results; see Peña and Yohai (2016).
In this paper, we extend DPCA to non-stationary vector time series. The main difference between DPCA and the proposed method is that the former uses the classical covariance matrix of the data, whereas the latter uses a new form of covariance matrix called the moving cross-covariance matrix. This matrix is updated at each time point and captures both static and dynamic information from the whole series. The proposed method also differs from the MWPCA approaches mentioned above, because it uses all observations to calculate a single moving cross-covariance matrix. Using this matrix enables our method to extract static and dynamic information from series that are allowed to be non-stationary. We therefore call it moving dynamic principal component analysis (MDPCA).
Other methods, such as multivariate singular spectrum analysis (MSSA), also divide a time series into windows to analyze the data. Although both MSSA and MDPCA produce components that carry the dynamic dependence in the data, the two methods have different purposes. MDPCA reduces the dimension of a multivariate time series by seeking a few uncorrelated principal components whose directions explain most of the variation of the original data. MSSA, on the other hand, decomposes a time series into a number of components (e.g. trend, periodic components and random noise) and then reconstructs the data from the selected components that contain the dynamic information of the original series; see, for example, Golyandina et al. (2001), Broomhead and King (1986) and Hossein and Rahim (2013).
This paper is organized as follows. Section 2 describes the structure of MDPCA, along with a new diagnostic tool for evaluating the relationships between the retained components and a criterion for determining the number of components to retain. Section 3 presents the theoretical properties of our estimators. In Sect. 4, the dimension-reduction performance of MDPCA is examined on both simulated and real data, and we describe the R packages that contain the functions needed to produce and assess MDPCA's results. Section 5 gives concluding remarks and suggests problems for further research.

Methodology
Consider an $m$-dimensional time series $z_t = (z_{1,t}, z_{2,t}, \ldots, z_{m,t})'$, which is allowed to be non-stationary. The first step of MDPCA is to build an $m(l+1)$-dimensional extended data vector $y_t$, which consists of the series $z_t$ and its lagged values up to a pre-specified lag $l$:
$$y_t = (z_{1,t+l}, \ldots, z_{m,t+l},\ z_{1,t+l-1}, \ldots, z_{m,t+l-1},\ \ldots,\ z_{1,t}, \ldots, z_{m,t})'.$$
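The construction of the extended data matrix can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' R implementation; it assumes the observations are stored as a $T \times m$ array with one row per time point, and the name `extended_matrix` is hypothetical.

```python
import numpy as np


def extended_matrix(Z: np.ndarray, lag: int) -> np.ndarray:
    """Build the M x N extended data matrix Y with columns
    y_t = (z_{t+l}', z_{t+l-1}', ..., z_t')', t = 1, ..., N.

    Z   : (T, m) array, one row per time point (assumed layout).
    lag : number of lags l >= 0.
    Returns Y with M = m * (l + 1) rows and N = T - l columns.
    """
    T, m = Z.shape
    N = T - lag
    if N <= 0:
        raise ValueError("need more observations than lags")
    # Block k (k = 0, ..., l) holds z_{t+l-k}; with 0-based rows this is Z[l-k : l-k+N].
    blocks = [Z[lag - k: lag - k + N].T for k in range(lag + 1)]
    return np.vstack(blocks)


# Toy series with T = 6 observations of m = 2 variables.
Z = np.arange(12).reshape(6, 2)
Y = extended_matrix(Z, lag=1)
print(Y.shape)    # (4, 5)
print(Y[:, 0])    # [2 3 0 1], i.e. (z_2', z_1')'
```

Stacking the most recent block first follows the ordering of $y_t$ given above; any fixed ordering would give the same eigenvalues, only permuted eigenvector entries.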
The rest of the analysis is performed on $y_t$ instead of $z_t$. Assume that $z_t$ is observed at $T$ time points, and let $M = m(l+1)$ and $N = T - l$. Let $Y$ be the $M \times N$ extended data matrix whose columns are $y_1, \ldots, y_N$. A key feature of $y_t$ is that its cross-covariance matrix accounts for the dynamic relations among the components (i.e., variables) of $z_t$. Ku et al. (1995) introduced this idea to PCA, which on its own is limited to static data, in order to reduce the dimension of dynamic data. For a stationary series, the DPCA of Ku et al. (1995) is applied to the cross-covariance matrix of $y_t$ to reduce the dimension of $z_t$. For a non-stationary series, however, the results of DPCA are not valid, because DPCA assumes that the first two moments are constant over time. Moreover, applying DPCA to a non-stationary series can produce correlated dynamic principal components (DPCs), mainly because a single cross-covariance matrix cannot measure the dynamic dependence between the variables of a non-stationary series. The solution we propose is to use moving cross-covariance matrices instead; these capture the dynamic relations among the components of a non-stationary series because they are updated as we move in time. Define the cross-covariance matrix of $y_t$ as
$$\Gamma = \operatorname{E}\big[(y_t - \mu)(y_t - \mu)'\big], \qquad \mu = \operatorname{E}(y_t). \tag{1}$$
Once $z_t$ is observed, the sample cross-covariance matrix over window $i$ of pre-specified size $W = 2w + 1$, where $w$ is a positive integer, can be calculated as
$$\hat\Gamma_i = \frac{1}{W}\sum_{t=i-w}^{i+w}(y_t - \bar y_i)(y_t - \bar y_i)', \qquad \bar y_i = \frac{1}{W}\sum_{t=i-w}^{i+w} y_t, \qquad i = w+1, \ldots, N-w. \tag{2}$$
The moving cross-covariance matrix is then defined as
$$M\Gamma = \frac{1}{N-2w}\sum_{i=w+1}^{N-w}\Gamma_i, \tag{3}$$
where $\Gamma_i$ is the cross-covariance matrix of $y_t$ over window $i$. Because $M\Gamma$ collects its information from cross-covariance matrices defined over successive local windows of $y_t$, it is better suited than $\Gamma$ to measuring the dynamic dependence between the components of a non-stationary series.
Specifically, the first cross-covariance matrix is calculated over the first window, the second over the second window (obtained by including the next time point and dropping the oldest one), and so on. The moving cross-covariance matrix then combines all of these matrices to extract the dynamic dependence of $y_t$ as a whole. Based on sample data, $M\Gamma$ can be estimated by plugging $\hat\Gamma_i$ into Eq. (3):
$$\widehat{M\Gamma} = \frac{1}{N-2w}\sum_{i=w+1}^{N-w}\hat\Gamma_i. \tag{4}$$
Note that $M\Gamma$ is an $M \times M$ symmetric matrix, which has the spectral decomposition
$$M\Gamma = U\Lambda U'.$$
Correspondingly, the sample moving cross-covariance matrix has the spectral decomposition
$$\widehat{M\Gamma} = \hat U\hat\Lambda\hat U', \tag{5}$$
where $\hat U$ is an $M \times M$ orthogonal matrix whose columns are the eigenvectors of $\widehat{M\Gamma}$, and $\hat\Lambda$ is an $M \times M$ diagonal matrix with the eigenvalues of $\widehat{M\Gamma}$ along its diagonal. Let $\hat\lambda_j$, $1 \le j \le M$, be the $j$th eigenvalue of $\widehat{M\Gamma}$ (i.e., the $(j,j)$th element of $\hat\Lambda$), where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_M$, and let $\hat u_j$ be the corresponding eigenvector (i.e., the $j$th column of $\hat U$). MDPCA produces $M$ uncorrelated moving dynamic principal components (MDPCs) and reduces the dimension of $z_t$ by projecting $y_t$ onto a space of dimension $k < m$ such that
$$\frac{\hat\lambda_1 + \cdots + \hat\lambda_k}{\hat\lambda_1 + \cdots + \hat\lambda_M} \approx 1. \tag{6}$$
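A short NumPy sketch of the spectral decomposition in (5) and the ratio in (6) follows. It assumes a symmetric input matrix and uses `numpy.linalg.eigh`, which returns eigenvalues in ascending order, so they are re-sorted into decreasing order. The helper names are hypothetical, and the toy matrix merely stands in for $\widehat{M\Gamma}$.

```python
import numpy as np


def sorted_eigen(MG: np.ndarray):
    """Spectral decomposition of a symmetric matrix with eigenvalues in
    decreasing order and eigenvectors as the matching columns."""
    vals, vecs = np.linalg.eigh(MG)        # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def explained_ratio(vals: np.ndarray, k: int) -> float:
    """(lambda_1 + ... + lambda_k) / (lambda_1 + ... + lambda_M), Eq. (6)."""
    return float(vals[:k].sum() / vals.sum())


rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
MG = A @ A.T                      # any symmetric positive semi-definite matrix
vals, U = sorted_eigen(MG)
print(explained_ratio(vals, 2))   # share of variation captured by the first two components
```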
Here, $k$ is also the number of MDPCs used to reconstruct the data. The optimal value of $k$ is the smallest number of MDPCs that explain most of the variation in the data while keeping the reconstruction error small. The choice of $k$ is discussed in more detail in the next section.
Remark 1 Averaging the local sample variance-covariance matrices in Eq. (2) yields an estimate of the global average variance-covariance matrix, i.e. the sample moving cross-covariance matrix in Eq. (4). The aim of this procedure is to allow for non-stationarity while measuring variation and cross-covariation. The procedure takes a few steps. First, determine the window size $W$ based on the degree of stationarity of the data. Next, calculate the first local sample variance-covariance matrix from the observations at times 1 to $W$, then the second from the observations at times 2 to $W + 1$, and so on. Averaging these local matrices produces a global smoothed covariance matrix that captures the dynamic dependence of the original series.
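The steps of Remark 1 can be sketched as follows. This is a hedged Python/NumPy illustration, assuming each local matrix is centred at its window mean and divided by $W$ (the exact normalization in the authors' packages may differ); the function name `moving_cov` is hypothetical. With a single window ($W = N$) the result coincides with the ordinary covariance matrix, which gives a simple sanity check.

```python
import numpy as np


def moving_cov(Y: np.ndarray, W: int) -> np.ndarray:
    """Sample moving cross-covariance matrix: the average of the local
    covariance matrices over all N - W + 1 windows of width W = 2w + 1.

    Each local matrix is centred at its window mean and divided by W
    (an assumed normalization).
    """
    M, N = Y.shape
    if W % 2 == 0 or W > N:
        raise ValueError("W must be odd and no larger than N")
    n_windows = N - W + 1
    total = np.zeros((M, M))
    for start in range(n_windows):                    # window covers start .. start+W-1
        block = Y[:, start:start + W]
        centred = block - block.mean(axis=1, keepdims=True)
        total += centred @ centred.T / W              # local matrix Gamma_i-hat
    return total / n_windows


# Random-walk example: M = 3 variables observed at N = 501 time points.
rng = np.random.default_rng(1)
Y = np.cumsum(rng.normal(size=(3, 501)), axis=1)
MG_local = moving_cov(Y, W=51)        # local windows follow the changing level
MG_global = moving_cov(Y, W=501)      # single window: the ordinary covariance
```

Centring each window at its own mean is what removes the slowly changing level of a non-stationary series from the local matrices.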

Optimizing MDPCA's results
To improve the results of MDPCA, one should choose optimal values for the window size $W$, the number of lags $l$ included in the extended data vector, and the number of retained MDPCs. The choice of $W$ is vital for extracting accurate information from the data, and it depends on the degree of stationarity of the data: small windows are more suitable for data that exhibit strong non-stationarity. Hence, $W$ can be chosen by inspecting the time series plot and assessing the stationarity of the data; the choice of $W$ is examined further in the simulations section of this article. Because the window size can be adjusted, MDPCA can be applied to both stationary and non-stationary series, and DPCA is the special case of MDPCA with $W = N$.
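The special case $W = N$ can be checked directly. Assuming the window estimator is centred at the window mean with divisor $W$ (an assumption about the exact normalization), take $N$ odd and $w = (N-1)/2$; then there is exactly one window, covering $t = 1, \ldots, N$, and the average collapses to the ordinary sample covariance matrix used by DPCA:

```latex
% One window only: i = w + 1 = (N + 1)/2 and N - 2w = 1.
\widehat{M\Gamma}
  = \frac{1}{N-2w}\sum_{i=w+1}^{N-w}\hat\Gamma_i
  = \hat\Gamma_{w+1}
  = \frac{1}{N}\sum_{t=1}^{N}(y_t-\bar y)(y_t-\bar y)',
\qquad \bar y = \frac{1}{N}\sum_{t=1}^{N} y_t .
```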
In the following, we provide a procedure for determining the optimal value of $l$, and we propose a new criterion for objectively determining the optimal number of components (i.e. MDPCs) to retain.

Choosing the optimal number of lags
Including more lagged series can provide more dynamic information, but it also increases the dimension of $y_t$, which makes the analysis more complicated. One should therefore include only lagged series that add dynamic information about the original series, so as to obtain accurate results with the lowest possible dimension for $y_t$. To choose an optimal value of $l$, we adapt the procedure suggested by Ku et al. (1995), which can be summarized as follows:
1. Start with $l = 0$.
2. Build the extended data vector $y_t$ by including $l$ lagged series.
3. Apply MDPCA to $y_t$ and obtain all MDPCs.
4. Set $j = m(l+1)$ and $r(l) = 0$, where $r(l)$ is the number of relations.
5. Determine whether the $j$th MDPC represents a linear relation (i.e., whether its eigenvalue is negligibly small). If yes, go to the next step; otherwise, go to step 7.
6. Set $j = j - 1$ and $r(l) = r(l) + 1$, then repeat step 5.
7. Calculate the number of new relations as
$$r_{new}(l) = r(l) - \sum_{i=0}^{l-1}(l - i + 1)\, r_{new}(i).$$
If $r_{new}(l) \le 0$, go to step 9; otherwise, go to the next step.
8. Set $l = l + 1$ and go to step 2.
9. Stop.
The above steps assume that $W$ has already been determined. The number of significant MDPCs can be determined by examining the plot of the eigenvalues of $\widehat{M\Gamma}$, and the number of relations $r(l)$ is then obtained by subtracting the number of significant MDPCs from $M = m(l+1)$. The following example illustrates the idea behind the procedure. Suppose that $z_t$ consists of five variables that have some relationships among them, and that the contribution-rate plots of the eigenvalues after applying MDPCA with 0, 1, 2 and 3 lags are as shown in Fig. 1. Applying MDPCA with $l = 0$ reveals three static relations, because only two MDPCs have significantly high contribution rates. Note that each added lag may reveal new relations, and the static relations found with $l = 0$ are repeated $(l+1)$ times. Applying MDPCA with $l = 1$ gives eight relations: the three static relations repeated twice and two new dynamic relations exposed by including the first lagged series. Applying MDPCA with $l = 2$ gives 13 relations: the three static relations repeated three times and the two dynamic relations repeated twice. Hence, no new relations are found, so the procedure suggests not including more lags, and $l = 1$ is the optimal choice.
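The bookkeeping in steps 4 to 9 can be sketched in Python as below, assuming the number of significant MDPCs for each lag has already been read off the eigenvalue plots. The count of new relations uses $r_{new}(l) = r(l) - \sum_{i=0}^{l-1}(l - i + 1)\, r_{new}(i)$, which reproduces the five-variable example (3, 2 and 0 new relations for $l = 0, 1, 2$). All function names are hypothetical, and returning `None` when no stop occurs within the supplied lags is an illustrative choice.

```python
def relations(m, lag, n_significant):
    """r(l): number of relations = M - (number of significant MDPCs), M = m(l+1)."""
    return m * (lag + 1) - n_significant


def new_relations(r):
    """r_new(l) = r(l) - sum_{i<l} (l - i + 1) r_new(i) for each supplied lag."""
    r_new = []
    for l, r_l in enumerate(r):
        repeated = sum((l - i + 1) * r_new[i] for i in range(l))
        r_new.append(r_l - repeated)
    return r_new


def choose_lag(r):
    """Return (chosen l, r_new). Stops at the first l with r_new(l) <= 0 and keeps
    the previous lag; returns None if no stop occurs within the supplied lags."""
    r_new = new_relations(r)
    for l, value in enumerate(r_new):
        if value <= 0:
            return max(l - 1, 0), r_new
    return None, r_new


# Five-variable example: two significant MDPCs for l = 0, 1, 2.
r = [relations(5, l, 2) for l in range(3)]
print(r)              # [3, 8, 13]
print(choose_lag(r))  # (1, [3, 2, 0])
```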
Note that if the five variables in the above example were independent, with no relationships between them, then all MDPCs resulting from MDPCA would be significant, and no relation would be detected (i.e. $r = 0$).

Retained component criterion (RCC)
Once $W$ and $l$ have been determined and MDPCA has been applied to $y_t$, the next task is to choose the optimal number $k$ of MDPCs to retain. This requires balancing three goals: maximizing the percentage of explained variance, minimizing the mean squared error (MSE) of the reconstructed data, and reducing the dimension of the series. The percentage of explained variance is measured as in Eq. (6). The MSE of the data reconstructed from the first $k$ MDPCs is
$$MSE_k = \frac{1}{NM}\sum_{t=1}^{N}\sum_{j=1}^{M}\big(y_{j,t} - y^{recon}_{j,t}\big)^2, \qquad y^{recon}_{j,t} = \sum_{v=1}^{k}\hat u_{v,j}\, C_{v,t}, \tag{7}$$
where $y^{recon}_t$ is the data reconstructed from the first $k$ MDPCs, $\hat u_{v,j}$ is the $j$th element of the $v$th eigenvector of $\widehat{M\Gamma}$, and $C_{v,t} = \hat u_v' y_t$ is the $t$th observation of the $v$th MDPC $C_v$, obtained by left multiplying $y_t$ by the transpose of the $v$th column of $\hat U$. Retaining more MDPCs increases the percentage of explained variance and reduces the MSE of the reconstructed data, but it also increases the final dimension. Our goal is therefore to retain the smallest number of MDPCs that explain most of the variation and give a small reconstruction error. In the literature, this balance is usually struck subjectively. We therefore propose a criterion, called the retained component criterion (RCC), that balances these goals and objectively suggests the optimal number of MDPCs to retain.
To determine the optimal number of MDPCs, we need to measure the effect of adding each MDPC on the accuracy of MDPCA's final results, where maximum accuracy is achieved by explaining all the variation in the original data and reducing the MSE of the reconstructed data to zero. We assume that the percentage of explained variance and the MSE of the reconstructed data are equally important measures of this accuracy.
Consider a time series $y_t$ of dimension $M$. Assume an ideal case in which all $M$ variables are independent and equally important, in the sense that they carry equally important information and contribute equally to the variation in $y_t$. After applying MDPCA to $y_t$, we then expect each MDPC to explain a fraction $1/M$ of the total variation of $y_t$ and to reduce the MSE of reconstructing the data by the same fraction $1/M$, so that each MDPC improves the accuracy of MDPCA's results by $2/M$. The reason for assuming this ideal case, with equal weights for the components of $y_t$, is to build into the criterion an objective penalty for retaining each extra MDPC. Let MaxMSE be the maximum MSE of reconstructing the data, defined by
$$MaxMSE = \frac{1}{NM}\sum_{t=1}^{N}\sum_{j=1}^{M} y_{j,t}^2. \tag{8}$$
MaxMSE equals the MSE of reconstructing the data with no MDPCs, i.e. with the elements of $y^{recon}_t$ in (7) replaced by zeros. The RCC of the first $k$ MDPCs is then defined as
$$RCC_k = 2 - \frac{\sum_{j=1}^{k}\hat\lambda_j}{\sum_{j=1}^{M}\hat\lambda_j} - \frac{MaxMSE - MSE_k}{MaxMSE} + \frac{2k}{M}, \tag{9}$$
where $\hat\lambda_j$ is the $j$th largest eigenvalue of the matrix $\widehat{M\Gamma}$ defined in (4), MaxMSE is defined in (8), and $MSE_k$ is given in (7). The RCC consists of three main terms: $\sum_{j=1}^{k}\hat\lambda_j / \sum_{j=1}^{M}\hat\lambda_j$, the percentage of variance explained by the first $k$ MDPCs; $(MaxMSE - MSE_k)/MaxMSE$, the percentage reduction in MSE achieved by the first $k$ MDPCs; and $2k/M$, the penalty for retaining $k$ MDPCs. The constant 2 keeps the RCC positive; it is included for technical reasons only and does not change the decision made by the criterion. The optimal number of MDPCs to retain is the value of $k$ that minimizes the RCC in (9).
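A minimal NumPy sketch of computing $RCC_k$ for every $k$ from data follows. It assumes $MSE_k$ and MaxMSE are both averaged over all $NM$ entries of $Y$ (any common normalization cancels in the ratio $(MaxMSE - MSE_k)/MaxMSE$) and that the eigenvalues are sorted in decreasing order. The function name is hypothetical, and the toy example uses $YY'/N$ as a stand-in for $\widehat{M\Gamma}$ to keep it short.

```python
import numpy as np


def rcc_curve(Y: np.ndarray, vals: np.ndarray, U: np.ndarray) -> np.ndarray:
    """RCC_k = 2 - explained_k - reduced_k + 2k/M for k = 1, ..., M.

    Y    : M x N extended data matrix (columns y_t).
    vals : eigenvalues of the moving cross-covariance matrix, decreasing.
    U    : matching eigenvectors as columns.
    """
    M = Y.shape[0]
    max_mse = np.mean(Y ** 2)                       # MaxMSE: reconstruct with zeros
    out = np.empty(M)
    for k in range(1, M + 1):
        Uk = U[:, :k]
        C = Uk.T @ Y                                # MDPC scores C_{v,t}, v = 1..k
        mse_k = np.mean((Y - Uk @ C) ** 2)          # MSE_k of the reconstruction
        explained = vals[:k].sum() / vals.sum()
        reduced = (max_mse - mse_k) / max_mse
        out[k - 1] = 2 - explained - reduced + 2 * k / M
    return out


# Toy data: one strong direction plus small noise in M = 3 variables.
rng = np.random.default_rng(0)
Y = np.outer([0.6, 0.8, 0.0], rng.normal(size=200)) + 0.01 * rng.normal(size=(3, 200))
vals, U = np.linalg.eigh(Y @ Y.T / Y.shape[1])
vals, U = vals[::-1], U[:, ::-1]                    # decreasing order
rcc = rcc_curve(Y, vals, U)
print(int(np.argmin(rcc)) + 1)                      # retained k
```

Retaining all $M$ components always gives $RCC_M = 2 - 1 - 1 + 2 = 2$, so the criterion can only prefer a smaller $k$ when the accuracy gained outweighs the $2/M$ penalty per component.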
Furthermore, the RCC can be used to determine the optimal number of components in most PCA-based dimension-reduction methods (e.g., classical PCA and DPCA).
For example, consider a series $y_t$ of dimension $M = 8$. After applying MDPCA, suppose the first MDPC explains 50% of the total variation of $y_t$ (i.e., $\hat\lambda_1 / \sum_{j=1}^{M}\hat\lambda_j = 0.5$) and reduces MaxMSE by 85% (i.e., $(MaxMSE - MSE_1)/MaxMSE = 0.85$). Then $RCC_1 = 2 - 0.5 - 0.85 + 0.25 = 0.9$. Now suppose the second MDPC explains 40% of the total variation of $y_t$ and reduces MaxMSE by a further 10%. Then $RCC_2 = 2 - (0.5 + 0.4) - (0.85 + 0.1) + (0.25 + 0.25) = 0.65$, so adding the second component increases the accuracy of MDPCA's results substantially. Finally, if the third MDPC explains 5% of the total variation and reduces MaxMSE by 3%, then $RCC_3 = 2 - (0.5 + 0.4 + 0.05) - (0.85 + 0.1 + 0.03) + (0.25 + 0.25 + 0.25) = 0.82$, which means that the third MDPC adds only a negligible amount of accuracy: the penalty for using it exceeds the accuracy it contributes. Hence, in this example, the optimal number of retained MDPCs is 2, which has the lowest RCC value of 0.65.
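The arithmetic of this example can be checked with a few lines of Python; `rcc_from_increments` is a hypothetical helper that accumulates the per-component gains and applies $RCC_k = 2 - (\text{explained}) - (\text{MSE reduction}) + 2k/M$.

```python
def rcc_from_increments(var_gain, mse_gain, M):
    """RCC_k from per-component fractions of explained variance and of
    MaxMSE reduction (each in [0, 1])."""
    rcc, cum_var, cum_mse = [], 0.0, 0.0
    for k, (v, e) in enumerate(zip(var_gain, mse_gain), start=1):
        cum_var += v
        cum_mse += e
        rcc.append(2 - cum_var - cum_mse + 2 * k / M)
    return rcc


rcc = rcc_from_increments([0.5, 0.4, 0.05], [0.85, 0.1, 0.03], M=8)
print([round(x, 2) for x in rcc])      # [0.9, 0.65, 0.82]
best_k = rcc.index(min(rcc)) + 1        # 2
```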

MDPCA calculation procedure
The steps of MDPCA can be summarized as follows:
1. Create the extended data vector $y_t$ by including lagged series up to lag $l$.
2. Calculate the sample moving cross-covariance matrix $\widehat{M\Gamma}$ from $y_t$.
3. Calculate the eigenvalues and corresponding eigenvectors of $\widehat{M\Gamma}$.
4. Use the RCC to determine $k$, the optimal number of MDPCs to retain.
5. Left multiply $y_t$ by the transpose of the first $k$ columns of the matrix $\hat U$ defined in (5) to obtain the transformed data of reduced dimension.
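The five steps can be combined into one end-to-end sketch. It is a hedged NumPy illustration, not the authors' R packages (MACF, MCOV, MDPCA, RCC_MDPCA); it assumes window covariances centred at the window mean with divisor $W$, MSEs averaged over all entries of $Y$, and values of $l$ and $W$ chosen beforehand. The function name `mdpca` is hypothetical.

```python
import numpy as np


def mdpca(Z: np.ndarray, lag: int, W: int):
    """Sketch of steps 1-5. Z is a (T, m) array; returns (k, scores, eigenvalues)."""
    T, m = Z.shape
    N = T - lag
    # Step 1: extended data matrix, columns y_t = (z_{t+l}', ..., z_t')'.
    Y = np.vstack([Z[lag - j: lag - j + N].T for j in range(lag + 1)])
    M = Y.shape[0]
    # Step 2: average of local window covariances (window mean, divisor W assumed).
    n_win = N - W + 1
    MG = np.zeros((M, M))
    for s in range(n_win):
        B = Y[:, s:s + W] - Y[:, s:s + W].mean(axis=1, keepdims=True)
        MG += B @ B.T / W
    MG /= n_win
    # Step 3: eigenvalues in decreasing order with matching eigenvectors.
    vals, U = np.linalg.eigh(MG)
    vals, U = vals[::-1], U[:, ::-1]
    # Step 4: RCC_k for k = 1..M and its minimizer.
    max_mse = np.mean(Y ** 2)
    rcc = []
    for k in range(1, M + 1):
        Uk = U[:, :k]
        mse_k = np.mean((Y - Uk @ (Uk.T @ Y)) ** 2)
        rcc.append(2 - vals[:k].sum() / vals.sum()
                   - (max_mse - mse_k) / max_mse + 2 * k / M)
    k = int(np.argmin(rcc)) + 1
    # Step 5: reduced data, the first k MDPCs (k x N).
    return k, U[:, :k].T @ Y, vals


# Three noisy copies of one random walk, T = 300, with l = 1 and W = 51.
rng = np.random.default_rng(2)
f = np.cumsum(rng.normal(size=300))
Z = np.column_stack([f, f, f]) + 0.1 * rng.normal(size=(300, 3))
k, scores, vals = mdpca(Z, lag=1, W=51)
print(k, scores.shape)
```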

Evaluating dynamic relationships between MDPCs
For stationary series, significant correlation between the variables of a multivariate time series can be examined with visual tools such as cross-correlation plots, which generalize the autocorrelation function (ACF) plot of Box and Jenkins (1976) to multivariate time series. Methods that test multiple null hypotheses, such as the multivariate portmanteau statistic, also exist in the literature; see Hosking (1980). However, these methods were developed to capture the dynamic dependence of stationary series and are not meaningful for non-stationary series, because they use the classical correlation function with a fixed mean throughout the calculations. Co-integration methods search for stationary linear combinations of non-stationary series, but co-integration is concerned with long-run relationships between non-stationary variables; see Engle and Granger (1987) and Johansen (1995). We therefore extend some of the methods above to find correlated components or variables of non-stationary series, using a modified measure of correlation that is updated as we move forward or backward in time. Specifically, we propose a moving cross-correlation function, which will be used to check whether two non-stationary variables are correlated and to evaluate the relationships between MDPCs. The following definitions are needed. Define the lag $l$ cross-covariance matrix over window $i$ as
$$\Gamma_i(l) = \operatorname{E}\big[(y_t - \mu_i)(y_{t-l} - \mu_i)'\big], \tag{10}$$
where $\mu_i$ is the mean of $y_t$ over window $i$, and define the lag $l$ cross-correlation matrix over window $i$ as
$$\rho_i(l) = S_i^{-1}\,\Gamma_i(l)\,S_i^{-1}, \tag{11}$$
where $l$ is a non-negative integer, $\Gamma_i(l)$ is defined in (10), and $S_i$ is the diagonal matrix of the standard deviations over window $i$, whose $(j,j)$th element is the square root of the $(j,j)$th element of $\Gamma_i(0)$. These functions can be estimated as follows.
The sample lag $l$ cross-covariance matrix over window $i$ of pre-specified size $W = 2w + 1$ is calculated as
$$\hat\Gamma_i(l) = \frac{1}{W}\sum_{t=i-w+l}^{i+w}(y_t - \bar y_i)(y_{t-l} - \bar y_i)', \tag{12}$$
where
$$\bar y_i = \frac{1}{W}\sum_{t=i-w}^{i+w} y_t. \tag{13}$$
Then $\hat\Gamma_i(l)$ in (12) can be used to calculate the sample lag $l$ cross-correlation matrix over window $i$, which estimates $\rho_i(l)$:
$$\hat\rho_i(l) = \hat S_i^{-1}\,\hat\Gamma_i(l)\,\hat S_i^{-1}, \tag{14}$$
where the $(j,j)$th element of $\hat S_i$ is the square root of the $(j,j)$th element of $\hat\Gamma_i(0)$ over the same window; for negative lags, $\hat\rho_i(-l) = \hat\rho_i(l)'$. Further, define the lag $l$ moving cross-correlation matrix of $y_t$ as
$$M\rho(l) = \frac{1}{N-2w}\sum_{i=w+1}^{N-w}\rho_i(l). \tag{15}$$
Based on sample data, $M\rho(l)$ can be estimated by the sample lag $l$ moving cross-correlation matrix
$$\widehat{M\rho}(l) = \frac{1}{N-2w}\sum_{i=w+1}^{N-w}\hat\rho_i(l). \tag{16}$$
Note that $\widehat{M\rho}(l)$ is updated at each time point as we move in time, so it accounts for non-stationarity. Based on these definitions, both visualization and multiple-hypothesis testing methods can be developed to check the significance of correlations between the components of stationary or non-stationary series. For visualization, one can plot the sample moving cross-correlation matrices at lags $l = 0, \pm 1, \pm 2, \ldots, \pm p$, where $p$ is a positive integer taken as $p = 10\log_{10}(N/M)$, as in the ACF plot. The significance of a correlation can be judged against the approximate 95% bounds $\pm 1.96/\sqrt{N}$. The following example demonstrates the use of the moving cross-correlation function.
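The estimators in (12) to (16) can be sketched with NumPy as follows. The sketch assumes the convention that entry $(a, b)$ relates $y_{a,t}$ to $y_{b,t-l}$, centring at the window mean with divisor $W$, and it rounds $p$ down to an integer; all function names are hypothetical. The usage lines build a pair of series in which one is the other delayed by one step, so the lag-1 entry should be close to 1.

```python
import numpy as np


def moving_ccf(Y: np.ndarray, W: int, lag: int) -> np.ndarray:
    """Sample lag-`lag` moving cross-correlation matrix.

    Entry (a, b) averages, over all windows of width W, the correlation between
    y_{a,t} and y_{b,t-lag}; for negative lags use the transpose.
    """
    M, N = Y.shape
    if not 0 <= lag < W <= N:
        raise ValueError("need 0 <= lag < W <= N")
    n_win = N - W + 1
    total = np.zeros((M, M))
    for s in range(n_win):
        B = Y[:, s:s + W]
        B = B - B.mean(axis=1, keepdims=True)          # centre at the window mean
        gamma_l = B[:, lag:] @ B[:, :W - lag].T / W    # lag-l covariance in the window
        sd = np.sqrt(np.sum(B ** 2, axis=1) / W)       # sqrt of diag of lag-0 covariance
        total += gamma_l / np.outer(sd, sd)
    return total / n_win


def significance_band(N: int) -> float:
    """Approximate 95% bound 1.96 / sqrt(N) used to read the plots."""
    return 1.96 / np.sqrt(N)


def max_lag(N: int, M: int) -> int:
    """p = 10 log10(N / M), rounded down here (an assumption)."""
    return int(np.floor(10 * np.log10(N / M)))


# Row 1 is row 0 delayed by one step, so their lag-1 correlation is near 1.
rng = np.random.default_rng(3)
x = rng.normal(size=601)
Y = np.vstack([x[1:], x[:-1]])          # y_{1,t} = y_{0,t-1}
R1 = moving_ccf(Y, W=101, lag=1)
print(round(R1[1, 0], 2), round(significance_band(Y.shape[1]), 3))
```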
Example 1 This example is a short simulation study that tests the ability of moving cross-correlation plots to capture the dynamic relationships among the variables of a multivariate time series. A window of size 101 is used to calculate the moving cross-correlation function, and the results are compared with those based on the classical cross-correlation function.
The simulated data in this example consist of eight variables and 1200 observations; three different non-stationary models were used to generate three subseries of 4, 3 and 1 variables, as described below. Let $a_t$, $b_t$ and $c_t$ be three independent standard normal white noises, which are the innovation terms of the following three models, respectively, then: j = 1, 2, 3 and 4; $y_{j,t} = v_{t+j-4}$, j = 5, 6 and 7. A time series plot of the simulated multivariate series is shown in Fig. 2, where all variables exhibit non-stationary behaviour. First, we examine the sample cross-correlation plots of the data, i.e. plots based on the classical cross-correlation function; see Figs. 3, 4, 5 and 6. According to these plots, a strong dynamic relationship exists among all eight variables, which implies that the three simulated subgroups are strongly correlated with one another. This result contradicts the way the data were simulated, so cross-correlation plots can lead to incorrect conclusions when applied to non-stationary series.
On the other hand, the sample moving cross-correlation plots of the simulated series, provided in Figs. 7, 8, 9 and 10, detect three mutually uncorrelated subgroups of 4, 3 and 1 variables, where the variables within each subgroup are strongly correlated.
The two visualization methods give different results mainly because of the non-stationary nature of the data. We conclude that moving cross-correlation plots can capture the dynamic relationships between non-stationary series. Furthermore, it can be shown that the two methods lead to similar conclusions when applied to stationary series.

Theoretical properties
To show the reliability of the results obtained by MDPCA, we prove the consistency of the estimated MDPCs, which are generated by left multiplying the extended data matrix by $\hat U'$, with $\hat U$ as in (5). It therefore suffices to show that $\hat U$ is a consistent estimator of $U$. We establish consistency by showing that $D(\mathcal{M}(\hat U), \mathcal{M}(U)) \to 0$ as $W \to \infty$, where $W$ is the window size used in MDPCA, $\mathcal{M}(U)$ is the linear space spanned by the columns of $U$, and $D(\mathcal{M}(\hat U), \mathcal{M}(U))$ is the distance between the spaces $\mathcal{M}(\hat U)$ and $\mathcal{M}(U)$. For ease of notation, we use $c, c_1, c_2, \ldots$ to denote constants whose values may change from place to place.
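For two positive integers $c_1 < c_2$, let $B_1$ and $B_2$ be any $c_2 \times (c_2 - c_1)$ matrices satisfying
$$B_1'B_1 = B_2'B_2 = I_{c_2 - c_1}. \tag{17}$$
Define the distance between $B_1$ and $B_2$ as
$$D(B_1, B_2) = \sqrt{1 - \frac{1}{c_2 - c_1}\operatorname{tr}\big(B_1B_1'B_2B_2'\big)}. \tag{18}$$
Note that $D(B_1, B_2) = 0$ if and only if $\mathcal{M}(B_1) = \mathcal{M}(B_2)$. This measure was used by Pan and Yao (2008) and Chang et al. (2018).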
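The distance $D(B_1, B_2) = \sqrt{1 - \operatorname{tr}(B_1B_1'B_2B_2')/(c_2 - c_1)}$ is straightforward to compute. The following NumPy sketch (function name hypothetical) assumes both inputs have orthonormal columns, as the condition $B_1'B_1 = B_2'B_2 = I$ requires, and illustrates that the distance is 0 for the same space spanned by different bases and 1 for orthogonal spaces.

```python
import numpy as np


def subspace_distance(B1: np.ndarray, B2: np.ndarray) -> float:
    """D(B1, B2) for c2 x (c2 - c1) matrices with orthonormal columns."""
    d = B1.shape[1]
    val = 1.0 - np.trace(B1 @ B1.T @ B2 @ B2.T) / d
    return float(np.sqrt(max(val, 0.0)))   # clip tiny negative values from round-off


rng = np.random.default_rng(4)
B, _ = np.linalg.qr(rng.normal(size=(5, 2)))      # orthonormal 5 x 2 basis
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
E12 = np.eye(5)[:, :2]                               # span{e1, e2}
E34 = np.eye(5)[:, 2:4]                              # span{e3, e4}
print(subspace_distance(B, B @ R))   # same space, rotated basis: about 0
print(subspace_distance(E12, E34))   # orthogonal spaces: 1.0
```

Because $B_1B_1'$ is the orthogonal projection onto $\mathcal{M}(B_1)$, the distance depends only on the spanned spaces, which is why it suits eigenvectors whose signs and rotations within a space are arbitrary.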
The convergence of $\hat U$ is implied by the convergence of $\widehat{M\Gamma}$ in (4), since $\hat U$ consists of the eigenvectors of $\widehat{M\Gamma}$. Because $\widehat{M\Gamma}$ is calculated from moving windows whose width depends on the stationarity of the data, the moving cross-covariance function is more complicated than the ordinary cross-covariance function. However, consistency of $\widehat{M\Gamma}$ can still be obtained, because the matrix for each window is calculated as in the stationary case. The convergence of the estimated MDPCs therefore depends on the size of $W$. Recall that $W \le N$, where $W = N$ when the data are stationary and $W$ decreases as the data become more non-stationary.
In what follows, we establish consistency assuming that the dimension $M$ is fixed, and we state the conditions needed. Since time series data are dependent, we use the following measure of dependence:
$$\theta_l = \sup_{k}\ \sup_{A \in \mathcal{F}_{-\infty}^{k},\, B \in \mathcal{F}_{k+l}^{\infty}} \big|P(A \cap B) - P(A)P(B)\big|, \tag{19}$$
where $\mathcal{F}_{c_3}^{c_4}$ is the $\sigma$-field generated by $\{y_t : c_3 \le t \le c_4\}$. The quantities $\theta_l$ in (19) are called mixing coefficients in the literature, and $\theta_l = 0$ if the time series is a sequence of independent random variables. They express the requirement that two observations $l$ time points apart become independent as $l \to \infty$. More information on mixing coefficients can be found in Bradley (1986). The proof of Theorem 1 is provided in the "Appendix".

Simulations and real data examples
In this section, we assess the performance of the proposed method on both real and simulated multivariate time series. The examples focus on non-stationary series: recall that MDPCA and DPCA produce identical results when applied to stationary data, since MDPCA uses a window of size $W = N$ in the stationary case. The performance of MDPCA is assessed through the percentage of explained variance (i.e., the contribution percentage), the MSE of the reconstructed data, and the moving cross-correlation plots of the retained MDPCs. All analyses are done in R. The functions needed to produce and assess MDPCA's results are provided in the following R packages: MACF of Alshammri (2020a), MCOV of Alshammri (2020b), MDPCA of Alshammri (2020c) and RCC_MDPCA of Alshammri (2020d).

Simulations
In the following simulation studies, we apply MDPCA with different combinations of window and lag sizes to simulated datasets of different dimensions and sample sizes. Each simulation is replicated 500 times, and the datasets are generated with the arima.sim function in R.

Example 2
In this example, we apply MDPCA to a non-stationary series $z_t$ with ten variables. The example consists of two parts: the first studies the results of MDPCA for different combinations of $W$ and $l$, and the second examines the effect of the sample size $T$ on MDPCA's results. The series $z_t$ is generated from five different models, each of which produces two correlated variables, as follows.
Let $a_t$, $b_t$, $c_t$, $d_t$ and $e_t$ be independent standard normal white noises, which are the innovation terms of the following five models, respectively, then: j = 1 and 2; $z_{j,t} = v_{t+j-3}$, j = 3 and 4; $z_{j,t} = w_{t+j-5}$, j = 5 and 6; $z_{j,t} = x_{t+j-7}$, j = 7 and 8; $z_{j,t} = q_{t+j-9}$, j = 9 and 10 (20).
A time series plot of the simulated data is shown in Fig. 11, where every two variables follow a different non-stationary model. First, we examine the results of MDPCA for different choices of $W$ and $l$. Table 1 compares, over 500 replicates, the results of MDPCA with different values of $W$ and $l$ applied to the simulated series $z_t$ with 1500 observations, where two MDPCs are retained. The mean percentages of variance explained by the two MDPCs are higher with $l = 1$ than with $l = 5$; however, the differences are small, and using more lagged data ($l = 5$) brings more dynamic information into the analysis. For example, with $W = 101$, two MDPCs explain on average 96.48% of the variance of the data when $l = 1$, compared with 94.59% when $l = 5$. The standard errors of the explained variance are 0.01 in all cases, which indicates stable percentages across replicates. The mean MSE of the reconstructed data ranges from 309.09, the lowest value, obtained with $W = 301$ and $l = 1$, to 403.74, the highest, obtained with $W = 101$ and $l = 5$.
Furthermore, the dynamic dependence between the two MDPCs can be examined by plotting the means of the absolute value of the moving cross-correlation (computed with W = 101); see Figs. 12, 13 and 14, and the corresponding standard errors in Table 6. When MDPCA uses W = 101, there are no dynamic relationships between the two MDPCs for either l = 1 or l = 5. However, the correlations become slightly larger and cross the significance line when the window size is increased to W = 201 and W = 301. Therefore, the dimension of the simulated non-stationary series with ten variables in this example is best reduced by MDPCA with W = 101 and l = 1 or with W = 101 and l = 5.
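The moving cross-correlation is described only in words in this section. Below is a minimal sketch of one plausible reading, assuming that the cross-correlation between two components is computed within every window of W consecutive observations (sliding one step at a time), that its absolute value is averaged over the windows for each lag, and that the significance line is the usual large-sample white-noise bound 1.96/√W. These choices and the function names are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np


def cross_corr(x, y, k):
    """Sample correlation between x_t and y_{t+k}; k may be negative."""
    if k >= 0:
        a, b = x[:len(x) - k], y[k:]
    else:
        a, b = x[-k:], y[:len(y) + k]
    return np.corrcoef(a, b)[0, 1]


def mean_abs_moving_ccf(x, y, W, max_lag):
    """Average |cross-correlation| over all length-W sliding windows, per lag.

    Within each window the correlation is computed on the overlapping
    part of the two segments (a simple approximation to the usual CCF).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    lags = np.arange(-max_lag, max_lag + 1)
    n_windows = len(x) - W + 1
    totals = np.zeros(len(lags))
    for s in range(n_windows):
        xw, yw = x[s:s + W], y[s:s + W]
        totals += np.abs([cross_corr(xw, yw, k) for k in lags])
    return lags, totals / n_windows


def significance_bound(W):
    """Approximate 95% white-noise bound for a correlation from W points."""
    return 1.96 / np.sqrt(W)
```

Plotting the returned means against `lags`, with a horizontal line at `significance_bound(101)`, gives a picture of the kind described for Figs. 12 to 14.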
Second, we examine the results of MDPCA when the series z_t has different sample sizes (T = 200, 400, 600 and 800). In this part, we consider MDPCA with W = 101 and l = 5. The results over 500 replicates are shown in Table 2, where two MDPCs are retained. For the mean percentage of explained variance, the highest percentage … On the other hand, the means of the absolute value of the moving cross-correlation (computed with W = 101) indicate a significant dynamic relationship between the two MDPCs when T = 200; see Fig. 15. The correlations between the two MDPCs decrease when T = 400, with a few minor significant cross-correlations remaining. For T = 600 and 800, the plots indicate uncorrelated MDPCs; see Fig. 16. Table 7 shows the standard errors of the absolute value of the moving cross-correlation, which decrease as the sample size increases: for example, they range from 0 to 0.24 when T = 200, compared with 0 to 0.12 when T = 800. Note that applying MDPCA to the simulated data with T = 600 yields results similar to those obtained with T = 1500. Therefore, even with the dimension of z_t increased to m = 10, MDPCA still performs well on data with moderate sample sizes.
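The standard errors quoted above shrink as T grows. A minimal sketch of the replication protocol follows, assuming that the reported "standard error" is the standard deviation of a statistic across the 500 replicates (an assumption; the definition is not restated in this section). As a toy statistic it uses the lag-0 sample correlation between two independent white-noise series, whose spread is known to be roughly 1/√T, so a comparable reduction from T = 200 to T = 800 can be seen directly. The helper names are hypothetical.

```python
import numpy as np


def replicate(stat_fn, n_rep=500, seed=0):
    """Evaluate stat_fn(rng) n_rep times; return mean and across-replicate SD."""
    rng = np.random.default_rng(seed)
    values = np.array([stat_fn(rng) for _ in range(n_rep)])
    return values.mean(), values.std(ddof=1)


def corr_of_independent_noise(T):
    """Toy statistic: lag-0 correlation of two independent white noises."""
    def stat(rng):
        x, y = rng.standard_normal((2, T))
        return np.corrcoef(x, y)[0, 1]
    return stat


for T in (200, 400, 600, 800):
    mean, sd = replicate(corr_of_independent_noise(T))
    print(f"T={T}: mean={mean:.3f}, sd={sd:.3f}, 1/sqrt(T)={1 / np.sqrt(T):.3f}")
```

Replacing the toy statistic with any function of a simulated series (for example, an explained-variance percentage) reuses the same loop unchanged.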
Example 3
In the following simulation study, MDPCA is applied to a non-stationary series z_t consisting of 15 variables. The study consists of two parts: the first compares the results of MDPCA for different combinations of W and l, and the second examines the effect of the sample size T on the results. The data are generated from five different models, each of which produces three correlated variables.
Let a_t, b_t, c_t, d_t and e_t be independent standard normal white noise processes, which are the innovation terms of the following five models, respectively. Then

$$
z_{j,t} =
\begin{cases}
u_{t+j-1}, & j = 1, 2, 3,\\
v_{t+j-4}, & j = 4, 5, 6,\\
w_{t+j-7}, & j = 7, 8, 9,\\
x_{t+j-10}, & j = 10, 11, 12,\\
q_{t+j-13}, & j = 13, 14, 15,
\end{cases}
\qquad (22)
$$

where u_t, v_t, w_t, x_t and q_t denote the five models driven by a_t, b_t, c_t, d_t and e_t, respectively.

Table 3 shows, over 500 replicates, the results of MDPCA with different combinations of W and l applied to z_t with T = 2000 observations, where two MDPCs are retained. The mean percentage of variance explained by the two MDPCs increases slightly with W: for example, it is 96.13% for W = 101 and l = 1 and rises to 97.22% for W = 301 and l = 1. The percentages are slightly lower when more lags are used (i.e., when l increases): for example, the mean percentage is 96.85% for W = 201 and l = 1 but 95.90% for W = 201 and l = 5. The standard error of the percentage of explained variance is 0.01 in all cases. In contrast, the MSE of the reconstructed data has large standard errors, meaning that its value varies considerably from one replicate to another. For example, the mean MSE is 359.27 for W = 101 and l = 1, with a standard error of 234.25.
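To make the two assessment measures in Tables 1 and 3 concrete (the percentage of explained variance and the MSE of the reconstructed data), here is a simplified sketch of a windowed, lag-augmented PCA in NumPy. It is not the MDPCA estimator itself: how MDPCA combines windows is defined elsewhere in the paper, so this sketch simply runs a separate PCA on each non-overlapping window of W rows of the lagged data matrix [z_t, z_{t-1}, ..., z_{t-l}] and averages the two measures. The helper names `lagged_matrix` and `windowed_pca_summary` are hypothetical, and the MSE is computed on the centred lagged data, so its scale is not comparable with the values in the tables.

```python
import numpy as np


def lagged_matrix(z, l):
    """Rows are [z_t, z_{t-1}, ..., z_{t-l}] for t = l, ..., T-1 (DPCA-style)."""
    T = z.shape[0]
    return np.hstack([z[l - i:T - i] for i in range(l + 1)])


def windowed_pca_summary(z, W, l, k):
    """Hypothetical simplification: PCA on each non-overlapping window.

    Returns the mean percentage of variance explained by the first k
    components and the mean MSE of the rank-k reconstruction of the
    centred lagged data, both averaged over windows.
    """
    X = lagged_matrix(np.asarray(z, float), l)
    if W > X.shape[0]:
        raise ValueError("window longer than the lagged series")
    pct, mse = [], []
    for s in range(0, X.shape[0] - W + 1, W):
        Xc = X[s:s + W] - X[s:s + W].mean(axis=0)   # centre within window
        cov = Xc.T @ Xc / (W - 1)
        vals, vecs = np.linalg.eigh(cov)             # ascending eigenvalues
        vals, vecs = vals[::-1], vecs[:, ::-1]       # largest first
        U = vecs[:, :k]                              # retained loadings
        recon = Xc @ U @ U.T                         # rank-k reconstruction
        pct.append(100.0 * vals[:k].sum() / vals.sum())
        mse.append(np.mean((Xc - recon) ** 2))
    return float(np.mean(pct)), float(np.mean(mse))
```

With a single window spanning the whole lagged sample, this reduces to an ordinary lag-augmented PCA in the spirit of the DPCA of Ku et al. (1995); increasing l widens the lagged matrix, which is one way to see why the same k components can explain a smaller share of the variance.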
In addition, the plot of the means of the absolute value of the moving cross-correlation (computed with W = 101) indicates no dynamic relationship between the two components when MDPCA uses W = 101; see Fig. 17. However, minor but significant correlations appear between the two components in all cases where MDPCA uses W = 201; see Fig. 18. The correlations become slightly larger when MDPCA uses W = 301; see Fig. 19. The standard errors of the absolute value of the moving cross-correlation, reported in Table 8, are small, ranging from 0 to 0.08 in all cases.
Based on the above results, the dimension of the simulated non-stationary series with 15 variables in this example is best reduced by MDPCA with W = 101 and l = 1 or with W = 101 and l = 5.
To see the effect of the sample size of z_t on MDPCA, we apply MDPCA with W = 101 and l = 1 to z_t with sample sizes T = 200, 400, 600 and 800. The results over 500 replicates are summarized in Table 4 … Table 9, where the values improve as the sample size increases. For instance, the standard errors range from 0 to 0.23 when T = 200, compared with 0 to 0.11 when T = 800. Therefore, in this example, MDPCA performs well when applied to z_t with T ≥ 600, reducing the dimension of z_t from 15 to 2.
From the above simulation studies, we conclude that the proposed MDPCA can reduce the dimension of a multivariate time series while accounting for both the dynamic and the non-stationary behavior of the data. The performance of MDPCA remained stable as the dimension of the tested series increased.

Real data examples
Example 4
In this example, MDPCA is applied to the daily stock prices, in US dollars, of 17 US companies from November 07, 2013 to December 18, 2017, a sample of 1036 days. The names of the 17 companies are shown in Table 5. The data are available from Yahoo! Finance. The time series plots of the daily stock prices of the 17 companies reveal their non-stationary behavior; see Figs. 22 and 23. Figure 24 shows the last 36 sample moving cross-correlation plots (with W = 101) of the stock prices before the transformation. The daily stock prices of the 17 companies are moderately correlated. For example, MetLife Insurance is strongly correlated with Prudential Financial and weakly correlated with McKesson.
Based on Figs. 22 and 23, we use MDPCA with W = 101. The optimal number of lags is l = 1, as shown in Fig. 25, where ten static and seven dynamic relations were found. Figure 26 shows the eigenvalue plot together with the relative RCC criterion plot. The RCC takes the values 0.508, 0.490, 0.489, 0.486 and 0.502 for the first three, four, five, six and seven MDPCs, respectively. Since the RCC is smallest for six MDPCs, the optimal number of MDPCs to retain is six. The six MDPCs explain 90.25% of the total variation in the data and produce a reconstruction error of 304.55. The six retained MDPCs are uncorrelated, as shown by the sample moving cross-correlation plots in Fig. 27.
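The choice of six MDPCs follows from taking the number of components with the smallest RCC value. The RCC formula itself is not restated in this section, so the snippet below only applies that selection rule (an assumption consistent with the choice made here) to the reported values.

```python
# RCC values reported above for k = 3, ..., 7 retained MDPCs.
rcc = {3: 0.508, 4: 0.490, 5: 0.489, 6: 0.486, 7: 0.502}

# Assumed selection rule: retain the number of MDPCs with the smallest RCC.
best_k = min(rcc, key=rcc.get)
print(best_k)  # 6
```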
To conclude, MDPCA with W = 101 and l = 1 reduced the dimension of the daily stock prices of the US companies from 17 to 6 while accounting for the non-stationarity and the dynamic dependence in the prices. Note that MDPCA was applied directly to the original stock prices, which avoids any loss of information that could result from working with transformed data such as log returns.

Concluding remarks
In this paper, we introduced MDPCA, a PCA-based method that reduces the dimension of multivariate time series data by transforming them into uncorrelated components. MDPCA generalizes the DPCA of Ku et al. (1995) to non-stationary series; DPCA can be regarded as a special case of MDPCA with W = N.
We used three measures to assess the results of MDPCA: the moving cross-correlation function, which evaluates the dynamic relationships between the final retained MDPCs; the MSE of the reconstructed data; and the percentage of explained variance. The choice of window size for MDPCA depends on the stationarity of the data: shorter windows suit series with strongly non-stationary behavior, and longer windows suit series that are closer to stationary. Determining an optimal window size for MDPCA is a subject for further research.
The RCC criterion is a new tool for determining the optimal number of MDPCs to retain. It balances two goals: reducing the dimension of the data and increasing the accuracy of the final results. The RCC criterion can also be used to determine the optimal number of retained components in other PCA-based dimension reduction methods.
We studied the asymptotic properties of our estimator $\hat{U}$, the matrix of eigenvectors of the moving cross-covariance matrix of the data. Under some regularity assumptions, we showed that $\hat{U}$ is a consistent estimator of $U$ with convergence rate $W^{-1/2}$.
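The $W^{-1/2}$ rate can be illustrated numerically in a much simpler setting than the theorem: i.i.d. Gaussian data with a fixed covariance matrix whose eigenvalues are distinct, so that the leading eigenvector is identified up to sign. This sketch is an illustration under those assumptions, not a check of the paper's result for the moving cross-covariance estimator; the name `eigvec_error` is hypothetical. If the error decays like $W^{-1/2}$, then √W times the mean error should stay roughly constant as W grows.

```python
import numpy as np


def eigvec_error(W, n_rep=200, seed=0):
    """Mean error of the sample leading eigenvector from W i.i.d. draws."""
    rng = np.random.default_rng(seed)
    Sigma = np.diag([4.0, 2.0, 1.0])        # distinct eigenvalues
    u_true = np.array([1.0, 0.0, 0.0])      # true leading eigenvector
    L = np.linalg.cholesky(Sigma)
    errs = []
    for _ in range(n_rep):
        y = rng.standard_normal((W, 3)) @ L.T           # rows ~ N(0, Sigma)
        vals, vecs = np.linalg.eigh(np.cov(y, rowvar=False))
        u_hat = vecs[:, -1]                             # largest eigenvalue
        u_hat = u_hat * np.sign(u_hat @ u_true)         # resolve sign ambiguity
        errs.append(np.linalg.norm(u_hat - u_true))
    return float(np.mean(errs))


for W in (100, 400, 1600):
    e = eigvec_error(W)
    print(f"W={W}: mean error={e:.4f}, sqrt(W)*error={np.sqrt(W) * e:.3f}")
```

Quadrupling W should roughly halve the mean error, which is the behavior a $W^{-1/2}$ rate predicts.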
We carried out many simulations with non-stationary series of different dimensions and sample sizes. MDPCA achieved dimension reduction and performed well even for reasonably small sample sizes (i.e., T = 400). A real data example was used to confirm the results of the simulations.

Data availability
The data used to support the findings of this article are available at Yahoo! Finance via https://finance.yahoo.com.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

The proof of Theorem 1 is given here. The following lemma of Chang et al. (2018) is used to establish our results.
Lemma 1 Assume that, for $\gamma > 2$, $E(|y_{j,t} - \mu_j|^{2\gamma})$ is uniformly bounded away from infinity for $j = 1, \ldots, p$ and $t \ge 1$. Let the mixing coefficients … for each $l \le l_1$, where $l_1$ is a pre-specified positive integer.
The following lemma is based on Lemma 1.
Lemma 2 Let Assumptions 1 and 2 hold, and assume that the dimension $M$ is fixed. Then, for each $i$, … with the quantities defined in (1) and (2), respectively.
Proof of Lemma 2 Under Assumptions 1 and 2, Lemma 1 can be applied to each window, which gives … for the lagged covariance matrices up to any specified lag $l_1$. Since MDPCA considers only the cross-covariance with no lag (i.e., $\Gamma_i = \Gamma_i(0)$), the following holds: …

The following lemma is based on the results of Lemma 2.

Lemma 3 Assume that the dimension $M$ is fixed and that Assumptions 1 and 2 hold. Then, as $W \to \infty$, …

where $M_\Gamma$ and $\hat{M}_\Gamma$ are defined in (3) and (4), respectively.