A robust-filtering method for noisy non-stationary multivariate time series with econometric applications

We investigate a new filtering method to estimate the hidden states of random variables for multiple non-stationary time series data. It is designed in particular for analyzing small-sample non-stationary macro-economic time series, and it is based on the frequency-domain application of the separating information maximum likelihood (SIML) method developed by Kunitomo et al. (Separating Information Maximum Likelihood Estimation for High Frequency Financial Data. Springer, New York, 2018), Kunitomo et al. (Japan J Statistics Data Sci 2:73–101, 2020), and Nishimura et al. (Asia-Pacific Financial Markets, 2019). We solve the filtering problem of hidden random variables of trend-cycle, seasonal and measurement-error components, and propose a method to handle macro-economic time series. We develop the asymptotic theory based on the frequency-domain analysis of non-stationary time series. We illustrate applications, including some properties of the method of Müller and Watson (Econometrica 86-3:775–804, 2018), and analyses of some macro-economic data in Japan.


Introduction
There exists vast research on the use of statistical time series analysis for macro-economic time series. One important feature of macro-economic time series, which distinguishes them from the subjects of standard statistical time series analysis, is that the observed series is an apparent mixture of non-stationary and stationary components. The second feature is that measurement errors in economic time series play important roles, because macro-economic data are usually constructed from various sources, including sample surveys in major official statistics, whereas statistical time series analysis has often ignored measurement errors. Third, the sample size of macro-economic data is rather small: we have only about 120 observations for each series of quarterly data over 30 years. The quarterly GDP series, for instance, which is the most important series for the Japanese macro-economy, is regularly estimated and released by the Cabinet Office of Japan. Fourth, to publish seasonally adjusted data, the official agencies usually apply the X-12-ARIMA program, which uses the univariate reg-ARIMA model to remove seasonality, as the standard filtering procedure. As the sample size is small, it is important to use an appropriate statistical procedure to extract information on the trend-cycle, seasonal and noise (or measurement-error) components systematically from multiple time series data.
In this study, we investigate a new filtering procedure to estimate the hidden states of the trend-cycle, which are non-stationary, and to handle multiple time series data, including small-sample time series. Kunitomo and Sato (2017), Kunitomo et al. (2018), and Kunitomo et al. (2020) developed the separating information maximum likelihood (SIML) method for estimating non-stationary errors-in-variables models, and discussed the asymptotic and finite-sample properties of the estimation of the unknown parameters in these statistical models. We utilize their results to solve the filtering problem of hidden random variables and show that they lead to a new way to handle macro-economic time series.
Related literature on non-stationary economic time series analysis includes Engle and Granger (1987) and Johansen (1995), who examined multivariate non-stationary and stationary time series and developed the notion of co-integration without measurement errors. Our problem is related to their work, but it has different aspects: our focus is on the non-stationary trend-cycle, seasonality and measurement errors in non-stationary errors-in-variables models and their frequency-domain analysis. Some related econometric studies on time series in the frequency domain are Baxter and King (1999), Christiano and Fitzgerald (2003) and Müller and Watson (2018). See Yamada (2019) for a survey of related studies, including the well-known Hodrick-Prescott (HP) filter in econometrics.
In statistical multivariate analysis, studies on errors-in-variables models include Anderson (1984, 2003) and Fuller (1987); however, they considered multivariate cases of independent observations, and the underlying situation is different from ours.
For statistical filtering methods, Kitagawa (2010) discussed the standard statistical methods, including the Kalman-filtering and particle-filtering methods. Although many studies have examined statistical filtering theories, we must exercise caution in analyzing non-stationary multivariate economic time series. See Granger and Hatanaka (1964) and Brillinger and Hatanaka (1969) for early studies, and Harvey and Trimbur (2008) on the relationship between the HP filter and other methods, for instance. Here we should mention two issues. First, the existing methods often depend on the underlying distributions, such as the Gaussian distribution for Kalman filtering; second, they often depend on the dimension of the state variables. There may be some difficulty in extending the existing methods to high-dimensional cases, even when the dimension is only about 10. On the other hand, we expect our method to have robustness properties when handling small-sample economic time series with a non-stationary trend-cycle and stationary seasonality and measurement errors, because it depends neither on a specific distribution nor on the dimension of the underlying random variables. See Kunitomo et al. (2020) for a comparison of the small-sample properties of the ML (maximum likelihood) and SIML methods for non-stationary errors-in-variables models, and Nishimura et al. (2019) for an application to financial data smoothing. The most important feature of the present procedure is that it is applicable to small-sample time series data with non-stationary trend-cycle, seasonal and noise components, and it has a statistical foundation based on the (real-valued) spectral decomposition of stochastic processes by a (real-valued) Fourier transformation, as we shall explain in Sects. 4 and 5.
In Sect. 2, we give some macro-economic data that have motivated the present study. In Sect. 3, we define the non-stationary errors-in-variables models and the SIML method. In Sect. 4, we introduce the SIML filtering method. In Sect. 5, we give the statistical foundation of the method, and in Sect. 6, we discuss the problem of choosing the number of orthogonal processes and give some numerical examples based on simulation and data. Section 7 contains some applications, including an interpretation of the Müller-Watson method in econometrics, and gives two empirical applications to macro-consumption in Japan. Concluding remarks are given in Sect. 8. Appendices A and B contain some mathematical derivations of our results and the figures.

Two illustrative examples
In the first illustrative example, we plot two macro-economic time series in Japan, quarterly (real) consumption and quarterly (real) GDP (1994Q1-2018Q2), in Fig. 1. (In Japan, both the original quarterly series and the seasonally adjusted series of GDP and its major components are regularly published; this differs from US macro-data in some respects.) It looks like a simple example of linear regression in undergraduate textbooks. However, if we draw the time series sequences of these (original and seasonally unadjusted official) macro-data estimated by the Cabinet Office of the Japanese Government, as in Fig. 2, we find things to be a little more complex than in Fig. 1. There are clear trend-cycle components, seasonal fluctuations, and noise components in the two time series. Although many economists usually use the seasonally adjusted (published) data, which were constructed by using the X-12-ARIMA program in different ministries within the government, the effects of filtering in the program are often unknown. The X-12-ARIMA program uses the univariate reg-ARIMA model, which is a mixture of a univariate seasonal ARIMA model and linear regression, and it decomposes a univariate time series into trend-cycle, seasonal and noise components as the standard procedure. In contrast, the DECOMP program explained by Kitagawa (2010), which is a possible choice particularly in Japan, uses a univariate AR model and the Kalman-filtering technique with AIC, which is based on the Gaussian likelihood. When each time series is handled using a different filtering procedure (i.e., different ARIMA or reg-ARIMA models, for instance), a fundamental problem of interpretation may arise when the focus is on the relationships among different non-stationary time series.
Figure 3 gives three different macro-consumption series (January 2002-December 2016), which are observed monthly and widely used by economists in Japan to judge the current macro-business condition. The first series is Kakei-Chosa (data from the monthly consumer survey collected by the Statistics Bureau, Ministry of Internal Affairs and Communications), the second is Shougyo-Doutai-Statistics (data from the monthly retail statistics constructed by the Ministry of Economy, Trade and Industry (METI)), and the third is Dai-Sanji-Sangyo-Statistics (the index data on commerce constructed by METI). We note that the data construction processes of these series, based on sample surveys, are complex and carried out in different ministries within the government, and each series reflects different aspects of macro-consumption. Although they show similar movements, we observe some differences in trend-cycle, seasonality and noise. It may therefore be desirable to unify the monthly consumption series, because we want to judge the business condition each month simply by observing these data to evaluate the state of the Japanese macro-economy and to form macro-economic policy. Many economists in both government and the private sector usually use seasonally adjusted data, constructed from the quarterly or monthly (original) time series via the univariate X-12-ARIMA seasonal adjustment program. It is important to construct a monthly consumption index that is consistent with the published quarterly macro-consumption data, which are usually reported with substantial time lags. This was one of the motivations for developing our filtering theory.
Some econometricians use multivariate (parametric) time series models such as the VAR (vector autoregressive) model for analyzing macro-economic data and investigating the relationships among them. They may use the seasonally adjusted (official) data, but caution is needed in using such data because most published official data have already been filtered by the X-12-ARIMA program. When the dimension is more than 2, some difficulties in handling the trend-cycle, seasonal and measurement-error components at the same time typically arise. The need to handle macro-economic data in a simple non-parametric way motivated us to develop the multivariate non-stationary errors-in-variables models and the filtering method for the hidden state variables with measurement errors in Sects. 3 and 4 (see Morgenstern 1950; Nerlove et al. 1995 for related issues).

A simple non-stationary errors-in-variables model and the SIML method
In this sub-section, we first introduce a simple non-stationary errors-in-variables model with trend and noise components, and explain the SIML method. Then in the next subsection we shall investigate the general framework for non-stationary multivariate time series with trend-cycle, seasonal and measurement-error components.
Let y_ji be the ith observation of the jth time series (i = 1, …, n; j = 1, …, p). We set y_i = (y_1i, …, y_pi)′ as a p × 1 vector and Y_n = (y_i′) (= (y_ij)) as an n × p matrix of observations, and denote by y_0 the initial p × 1 vector, which is fixed. We consider the simple model in which the observation is the sum of an underlying non-stationary trend x_i and a noise v_i,

y_i = x_i + v_i , x_i = x_{i−1} + v_i^(x) (i = 1, …, n).

When each pair of vectors v_i^(x) and v_i is independently, identically and normally distributed (i.i.d.) as N_p(0, Σ_x) and N_p(0, Σ_v), respectively, we have the observations as the n × p matrix Y_n = (y_i′) and set the np × 1 random vector (y_1′, …, y_n′)′. Given the initial condition y_0 (= x_0), we use the K_n-transformation from Y_n to Z_n (= (z_k′)), defined by

Z_n = K_n (Y_n − Ȳ_0) , K_n = P_n C_n^{−1} , Ȳ_0 = 1_n y_0′ ,

where 1_n = (1, …, 1)′, C_n^{−1} is the n × n first-differencing matrix, and P_n = (p_jk^(n)) is the n × n orthogonal matrix with p_jk^(n) = √(2/(n + 1/2)) cos[(2π/(2n + 1))(k − 1/2)(j − 1/2)]. We need to use only the first m terms z_1, …, z_m in Z_n, and the reason for this becomes clear from the frequency-domain analysis, which will be explained in Sect. 5. Kunitomo et al. (2020) discussed the estimation of the variance-covariance matrix Σ_v when v_i are i.i.d. vectors, and some consistent estimators of Σ_v were developed. As we shall see in Sect. 5, the SIML estimation method is quite robust even when v_i^(x) and v_i are non-Gaussian stationary processes and are serially correlated.
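To fix ideas, the K_n-transformation can be sketched numerically. The cosine-basis form of P_n below follows the SIML construction of Kunitomo et al. (2018); the function names are ours, and the normalization convention is an assumption that may differ from the paper's by a scale factor.

```python
import numpy as np

def siml_basis(n):
    """Real-valued orthogonal cosine basis P_n used by the SIML transformation.

    p_jk = sqrt(2/(n + 1/2)) * cos((2*pi/(2n+1)) * (j - 1/2) * (k - 1/2)).
    """
    j = np.arange(1, n + 1).reshape(-1, 1)
    k = np.arange(1, n + 1).reshape(1, -1)
    return np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2.0 * np.pi / (2 * n + 1)) * (j - 0.5) * (k - 0.5))

def siml_transform(Y, y0):
    """Z_n = P_n C_n^{-1} (Y_n - bar{Y}_0): difference the data, then rotate.

    Y  : (n, p) array of observations y_1, ..., y_n.
    y0 : (p,) fixed initial vector.
    Row k of the result is the orthogonal vector z_k associated with
    the frequency lambda_k = (k - 1/2) / (2n + 1).
    """
    Y = np.asarray(Y, dtype=float)
    D = np.diff(np.vstack([y0, Y]), axis=0)  # C_n^{-1} (Y_n - bar{Y}_0)
    return siml_basis(Y.shape[0]) @ D
```

P_n is exactly orthogonal and diagonalizes C_n^{−1} C_n′^{−1} with eigenvalues a_k^(n) = 4 sin²((π/2)(2k − 1)/(2n + 1)), so under the trend-plus-noise model the rows z_k are uncorrelated with variance Σ_x + a_k^(n) Σ_v; the low-k rows are therefore dominated by the trend.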

General non-stationary errors-in-variables model
To investigate the non-stationary trend-cycle component and the stationary seasonal and measurement-error components, we consider the general non-stationary multivariate errors-in-variables model

y_i = x_i + s_i + v_i (i = 1, …, n),

where x_i, s_i and v_i are the trend-cycle, seasonal and measurement-error components, respectively. We take a positive integer s (s > 1), an integer N, and n = sN for simplicity of exposition and arguments. We explain the general model in three steps.
(1) The trend-cycle factor x_i (i = 1, …, n) is a sequence of non-stationary I(1) processes satisfying Δx_i = v_i^(x), where L is the lag operator (L x_i = x_{i−1}) and Δ = 1 − L. The random vectors v_i^(x) (i = 1, …, n) form a stationary I(0) process with a moving-average representation whose p × p coefficient matrices C_j^(v) are absolutely summable. (2) The seasonal factor s_i (i = 1, …, n) is a sequence of stationary processes satisfying a moving-average representation in the seasonal lag operator L^s (see Anderson (1971)).
Then the p × p spectral density matrix of the transformed vector process of the differenced series Δy_i (= y_i − y_{i−1}) can be represented as the sum of the spectra of the differenced components. We denote the long-run variance-covariance matrices of the trend-cycle and noise components, with (g, h)th elements for g, h = 1, …, p, by Σ_x and Σ_v, respectively. One important and often neglected issue is that, when applying the differencing procedure to non-stationary time series and then using the standard statistical methods for multivariate stationary time series, there is no guarantee that the relationships among the original time series are preserved under the transformations. Although Engle and Granger (1987) and Johansen (1995) noticed this problem, they did not consider the frequency-domain aspects with seasonality and measurement errors (see Hayashi 2000). The SIML filtering approach may shed new light on the relationships between the time-domain and frequency-domain decompositions of non-stationary multivariate time series.
Our notation for the spectral density is slightly different from the standard notation used in Anderson (1971). Let λ = 2πω and let f_A(λ) (−π ≤ λ ≤ π) be the spectral density in Chapter 7 of Anderson (1971). The present definition of the spectral density then corresponds to 2π f_A(−λ) in some literature.

Basic filtering
We introduce the general filtering procedure based on the K_n-transformation in (4). When we interpret the elements of the resulting n × p random matrix Z_n as taking real values in the frequency domain, it is easy to understand their roles. Since K_n is a kind of real-valued discrete Fourier transformation, the vectors z_k (k = 1, …, n) in Z_n are asymptotically uncorrelated, as we shall discuss in Sect. 5.1. We investigate the partial inversion of the transformed orthogonal processes: for an n × n filtering matrix Q_n, we form the n × p matrix

X̂_n = Ȳ_0 + K_n^{−1} Q_n K_n (Y_n − Ȳ_0).

The stochastic process Z_n is the orthogonal decomposition of the original time series Y_n in the frequency domain, and Q_n is an n × n filtering matrix. Because Y_n consists of non-stationary time series, we need the special form of the transformation K_n in (4). We give explicit forms of two examples, the trend-cycle filtering and the band filtering procedures. Although there can be many possible filtering procedures within our general framework, it is useful to discuss some linear filtering procedures.
We start with the case in which w_t,n = 1 (t = 1, …, n), so that Q_n is the identity matrix I_n. Then we find K_n^{−1} Q_n K_n = I_n, and the filter returns the original data. More useful cases arise by choosing w_t,n = 1 or 0 for some t's, and we present two such cases: the trend-cycle filtering and the band filtering. Although the first example could be regarded as a special case of the second, the analysis of the trend-cycle component in the first example has an important role in economic time series.
(1) Trend-cycle filtering. Let J_m = (I_m, 0) be an m × n (m < n) choice matrix, and set the n × n matrix Q_n = J_m′ J_m. The filtered n × p matrix is then X̂_n^(m) = Ȳ_0 + K_n^{−1} Q_n K_n (Y_n − Ȳ_0).
We construct an estimator of the n × p hidden state matrix X_n* using only the lower-frequency parts, by applying the inverse transformation of K_n and deleting the estimated seasonal and noise parts. We denote the hidden trend-cycle state based on the first m frequencies as in (22). This quantity is different from X_n* because x_t (t = 1, …, n) in (9) and (10) contains not only the trend-cycle component of y_t (t = 1, …, n), but also a noise component in the frequency domain, which is different from the measurement-error component v_t (t = 1, …, n) in (9)-(13). We try to estimate the trend-cycle component of y_t by using (22), and we recover the trend-cycle component of Y_n near the zero frequency because the effects of the differenced measurement-error noise (v_t − v_{t−1}) are negligible around the zero frequency. This method differs from some existing procedures that consider the decomposition of time series only in the time domain. Our arguments can be justified by using the frequency decomposition of y_t and Δy_t (= y_t − y_{t−1}, with y_0 fixed), and we shall discuss this issue in Sect. 5.2.
We partition Q_n into [m + (n − m)] × [m + (n − m)] blocks. After straightforward calculations (see Appendices A and B for the derivation), the (j, j′)th element a_{j,j′}^(n,m) of A_n = P_n′ J_m′ J_m P_n is given in explicit form. It is also possible to evaluate the MSE of the statistical state-vector estimation: (24) is the n × n variance-covariance matrix of x_ni + v_ni and x_nj + v_nj (n × 1 vectors for i, j = 1, …, p), where x_ni and v_nj (n × 1 vectors) are the ith and jth column vectors of X_n and V_n, respectively.
Since (25) does not depend on x_t, (22) minimizes the MSE with respect to the unknown state vector x_t in estimating (23), and it is optimal in this sense.
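The trend-cycle filter can be sketched in a few lines: difference, rotate by P_n, zero out all but the first m rows, rotate back and re-cumulate. This is our own minimal implementation under the cosine-basis form assumed from Kunitomo et al. (2018); the function names are ours.

```python
import numpy as np

def siml_basis(n):
    # Orthogonal cosine basis P_n (assumed SIML form, Kunitomo et al. 2018).
    j = np.arange(1, n + 1).reshape(-1, 1)
    k = np.arange(1, n + 1).reshape(1, -1)
    return np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2.0 * np.pi / (2 * n + 1)) * (j - 0.5) * (k - 0.5))

def trend_cycle_filter(Y, y0, m):
    """SIML trend-cycle filter: keep the m lowest-frequency components.

    Implements hat{X}_n = bar{Y}_0 + C_n P_n' J_m' J_m P_n C_n^{-1} (Y_n - bar{Y}_0).
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    P = siml_basis(n)
    Z = P @ np.diff(np.vstack([y0, Y]), axis=0)  # rotate the differences
    Z[m:, :] = 0.0                               # J_m' J_m: keep frequencies 1..m
    return y0 + np.cumsum(P.T @ Z, axis=0)       # C_n re-cumulates the differences
```

With m = n the filter is the identity (the case Q_n = I_n above); with small m only the low-frequency trend-cycle survives.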
(2) Band filtering. We consider a general filtering based on the K_n-transformation in (4) and use the inversion of some frequency parts of the random matrix Z_n. The leading example is the analysis of seasonal frequencies in discrete time series, and we take a positive integer s (> 1) as the seasonal lag.
Let J_{m1,m2} be the choice matrix selecting the m2 frequencies m1 + 1, …, m1 + m2, and set the n × n matrix Q_n = J_{m1,m2}′ J_{m1,m2}. When we have a particular seasonal frequency with s (> 1), for instance, we can take m1 = [2n/s] − [m/2] and m2 = m. We set s = 4 for quarterly data and s = 12 for monthly data. (We may have many seasonal frequencies in data analysis, as already pointed out by Granger and Hatanaka (1964).) As in the trend-cycle filtering problem, (26) is the SIML-filtering value, and it is an estimate of some frequency components of s_t + v_t (t = 1, …, n) in (9), (10) and (13). The filtering problem becomes difficult because a frequency component of y_t includes not only the component of s_t at that frequency, but also the component of the measurement errors at the same frequency. After straightforward calculations (see Appendices A and B), the (j, j′)th element a_{j,j′}^(n,m1,m2) of A_n = P_n′ J_{m1,m2}′ J_{m1,m2} P_n is obtained; when m1 = 0 and m2 = m, the resulting formula reduces to the trend-cycle filtering case. There are existing filtering procedures with a frequency-domain interpretation, but our procedure differs from the existing literature because of (24) and (28).
It is also possible to evaluate the MSE of the state-vector estimation in the same way as in Case (1). Let X̂_n^(m1,m2) be the estimate of the hidden state obtained by applying the band filter to X_n* + S_n; by a calculation similar to Case (1), the corresponding MSE formula is obtained. If there are several seasonal frequencies, we need more complicated filtering procedures.

Underlying asymptotic theory
At first glance, the SIML filtering procedure may be regarded as an ad-hoc statistical procedure without any mathematical foundation. However, it has a rather solid statistical foundation.
We shall utilize an asymptotic theory for stationary linear processes because the time series model defined by (9)-(13) can be regarded as a special case. Let

v_j = μ + Σ_{h=0}^∞ C_h e_{j−h} (j = 1, …, n),

where μ is a constant vector (we assume μ = 0 for simplicity in this section), C_h are p × p matrices, and e_i is a sequence of mutually independent random vectors with zero mean and finite variance-covariance matrix. We then summarize a useful result on z_k(λ_k^(n)). Although it could be regarded as a direct extension of Theorem 8.4.3 of Anderson (1971) for discrete (ergodic) stationary time series, we could not find the following representation in the literature. The proof is given in Appendices A and B.
Proposition 1. Let v_j (j = 1, …, n) be an ergodic stationary stochastic process given by (31) with Σ_{h=0}^∞ ‖C_h‖ < ∞ and finite fourth-order moments of each element of e_i. Let also z_k(λ_k^(n)) = Σ_{j=1}^n p_jk^(n) v_j. Then the z_k(λ_k^(n)) are asymptotically uncorrelated, and the asymptotic variance-covariance matrix at frequency λ_k^(n) is the (symmetrized real-valued) spectral density matrix, which is positive definite and bounded.
This proposition covers the general model (9)-(13) under the moment conditions because Δy_i are stationary. As the asymptotic variance-covariance matrix of the orthogonal random vectors z_k(λ_k^(n)) is the (symmetrized real) spectral density matrix, it can be estimated consistently. When we have noise terms as in Sect. 3, it is not possible to estimate the (long-run) variance-covariance matrix Σ_x of the trend-cycle component simply by using the differenced time series Δy_j = y_j − y_{j−1}, because the differenced noise (v_j − v_{j−1}) contaminates all frequencies. The SIML estimator of f_SR,x(0) (= Σ_x) in the general case can be defined by

Σ̂_x = (1/m_n) Σ_{k=1}^{m_n} z_k z_k′ .

Then we summarize the basic property of the SIML estimation of Σ_x; the derivation is given in Appendices A and B.
Proposition 2. Assume that the fourth-order moments of each element of v_i^(x), v_i and v_i^(s) (i = 1, …, n) in (9)-(13) are bounded. We set m_n = [n^α] (0 < α < 1). Then, as n → ∞, Σ̂_x converges in probability to Σ_x. The above result has implications for the use of frequency-domain analysis of (non-stationary) multiple time series. For 0 ≤ ω ≤ 1/2, let the symmetrized (real-valued) spectral density matrix of Δy_j (j = 1, …, n) be f_SR,y(ω) = (1/2)[f_y(ω) + f̄_y(ω)], where f̄(⋅) is the complex conjugate of f, and we have the initial condition y_0.
From this interpretation, it is easy to see that Σ̂_x is a consistent estimator of the long-run variance-covariance matrix, which is the spectral density matrix at the zero frequency, f_SR,x(0), when we use the K_n-transformation of the data.
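The consistency of the SIML estimator can be checked numerically. The sketch below is our own code, under the same assumed cosine-basis normalization as before: a univariate random-walk trend with σ_x² = 1 is buried in white noise with σ_v² = 4, and Σ̂_x averages z_k z_k′ over the m_n = [n^0.6] lowest frequencies.

```python
import numpy as np

def siml_basis(n):
    j = np.arange(1, n + 1).reshape(-1, 1)
    k = np.arange(1, n + 1).reshape(1, -1)
    return np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2.0 * np.pi / (2 * n + 1)) * (j - 0.5) * (k - 0.5))

def siml_sigma_x(Y, y0, alpha=0.6):
    """SIML estimator: average z_k z_k' over the m_n = [n^alpha] lowest frequencies."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    m = int(n ** alpha)
    Z = siml_basis(n) @ np.diff(np.vstack([y0, Y]), axis=0)
    Zm = Z[:m]
    return Zm.T @ Zm / m

# random-walk trend (Sigma_x = 1) observed with white noise (Sigma_v = 4)
rng = np.random.default_rng(42)
n = 2000
x = np.cumsum(rng.normal(0.0, 1.0, size=n))
y = x + rng.normal(0.0, 2.0, size=n)
S = siml_sigma_x(y.reshape(-1, 1), np.zeros(1))  # S[0, 0] should be near 1.0
```

By contrast, the naive sample variance of the differenced series Δy_j estimates σ_x² + 2σ_v² = 9 here, which illustrates why simple differencing cannot recover Σ_x.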

Frequency interpretation of SIML-filtering
In traditional statistical time series analysis, a stationary discrete (vector) process with (complex-valued) spectral distribution F has a representation with right-continuous (complex-valued) orthogonal increments in the frequency domain (see Doob 1953; Brillinger 1980; Brockwell and Davis 1990 for details). Chapter 7 of Anderson (1971) is informative because of its discussion of real-valued representations, although it treats only univariate cases. The real-valued multivariate orthogonal processes and the spectral density matrix play important roles in our formulation. By using the inverse transformation with K_n^{−1}, we can confirm that the data can be recovered from the orthogonal processes; this is another representation of (19) when Q_n = I_n. For any s (s = 1, …, n), Δy_s^(n) can be recovered as the weighted sum of the orthogonal processes z_n(λ_k^(n)) at the frequencies λ_k^(n) (k = 1, …, n). Then, by using the cumulation Y_n = C_n (ΔY_n), we recover the non-stationary process y_t^(n) (t = 1, …, n) given the initial condition y_0. When λ_m^(n) → λ as n → ∞ (0 < λ < 1/2), using Lemma 5.1 of Kunitomo et al. (2018), we obtain the limiting orthogonal-increment representation. If we set the stochastic process of uncorrelated increments with continuous parameter ω (0 ≤ ω ≤ 1/2) as A_n(ω) = Σ_{j=1}^n 1{λ_j^(n) ≤ ω} z_j^(n), then this corresponds to the continuous representation of a discrete (real-valued) stationary time series in the frequency domain (see Chapter 7.4 of Anderson (1971)). If we write the limit A(ω) = lim_{n→∞} A_n(ω) (assuming it exists), the (real-valued) spectral distribution matrix F_RS for any 0 ≤ ω_1 < ω_2 ≤ 1/2 can be defined through the increments of A(ω), and F_RS has a matrix-valued density f_RS(ω) when it is absolutely continuous.
From (22) and our interpretation of the SIML filtering, we find an interesting representation of (discrete-time, real-valued) stationary processes in terms of orthogonal-increment stochastic processes. These are closely related to our method of data analysis for non-stationary vector time series.

On multivariate filtering methods
There may be a natural question as to whether any multivariate method of seasonal adjustment or detrending is appropriate, because the filtered series can depend on the particular choice of co-variables. This may be true if we use time-domain models such as multivariate ARMA models or other multivariate transformations; an application of the Kalman-filtering approach is an example, because it has a multivariate state equation. On the other hand, the application of univariate ARIMA models with X-12-ARIMA has another problem: the filtering result depends on the ARIMA model chosen for each series, and there is no guarantee that the original relations among variables are preserved at each frequency of interest. Some filters, such as the HP filter, minimize a univariate criterion function with a restriction; then the filtered result depends on the choice of criterion for each series, and again there is no guarantee that the original relations among variables are preserved at each frequency. Because our method depends solely on the frequency decomposition, it may be free from these problems. Hence it should be a robust filtering method, and in this respect it may be appropriate for analyzing non-stationary multivariate time series.

A guide of choosing frequencies
When we are interested in filtering a non-stationary time series with trend-cycle, seasonal and measurement-error components, we need to choose the parameter m. We give a guide to setting m in the trend-cycle filtering case for practical purposes. From the discussion in Sect. 5.2, the orthogonal process z_n(λ_k^(n)) corresponds to the frequency λ_k^(n) = (k − 1/2)/(2n + 1) (k = 1, …, n). When we are interested in the trend-cycle component, we may use only λ_k^(n) (k = 1, …, m), and then the maximum frequency is approximately m/(2n). For instance, when we have monthly data over 20 years, we have n = 240 and s = 12. Since we have seasonal frequencies and we want to find trend-cycle components corresponding to business cycles longer than 1.5 years, say, an appropriate maximum frequency would be λ_max^(n) = 1.5/24, and then we could take m* = 480 × (1.5/24) = 30.
As another example, if we have quarterly data over 30 years, we have n = 120 and s = 4. Since we have a seasonal frequency and we want to find trend-cycle components corresponding to business cycles longer than 1.5 years, say, an appropriate maximum frequency would be λ_max^(n) = 1.5/8, and then we could take m* = 240 × (1.5/8) = 45. If we are interested in the trend-cycle component of a non-stationary time series, these choices may be reasonable candidates.
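The two worked examples follow one piece of arithmetic: since λ_k^(n) ≈ k/(2n), keeping frequencies up to λ_max means m* ≈ 2n × λ_max. A one-line helper (ours, for illustration) reproduces the numbers in the text:

```python
def m_star(n, lambda_max):
    """Rule-of-thumb cut-off: keep the m* ~ 2n * lambda_max lowest frequencies."""
    return round(2 * n * lambda_max)

monthly = m_star(240, 1.5 / 24)    # 20 years of monthly data   -> 30
quarterly = m_star(120, 1.5 / 8)   # 30 years of quarterly data -> 45
```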
As we shall see in the next subsection, this point could be checked by simulation for prediction.

Some simulation
When we have estimates of the state variables x̂_i^(m) (i = 1, …, n), the estimates of the error components are v̂_i^(m) = y_i − x̂_i^(m) (i = 1, …, n). Then an estimated MSE of the one-step-ahead prediction errors based on the SIML smoothing or filtering is given by the sample mean of the squared prediction errors, where F_n is the σ-field (information) available at n.
One may try to minimize the estimated h-step prediction MSE by choosing an appropriate m. From our limited experiments, it may be reasonable to take h = 2, 3 for short-run prediction and h = 4, 8 for long-run prediction.
We show only a result for the simple trend-plus-noise model with p = 1. The criterion function is the h-step prediction MSE, where x̂_i^(m) is the estimate of x_i. We present the minimizing m, denoted m*, based on the trend-cycle filtering in Tables 1, 2 and 3, taking h = 2, 4, 6, 8 and n = 80, 120, 200, 400. In our simulations, m* becomes larger as n increases, while m* decreases as h increases. When we have a long horizon h, it may be natural to use a small number of lower frequencies, and this holds for a wide range of σ_x and σ_v.
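A small Monte Carlo in the spirit of this subsection can be sketched as follows (our own code; the paper's exact design and tables are not reproduced here). We simulate the univariate trend-plus-noise model, filter with each candidate m, predict y_{n+h} by the filtered trend at time n, and pick the m with the smallest estimated MSE.

```python
import numpy as np

def siml_basis(n):
    j = np.arange(1, n + 1).reshape(-1, 1)
    k = np.arange(1, n + 1).reshape(1, -1)
    return np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2.0 * np.pi / (2 * n + 1)) * (j - 0.5) * (k - 0.5))

def trend_estimate(y, m):
    # univariate trend-cycle filter; y[0] is treated as the fixed initial value
    n = len(y) - 1
    P = siml_basis(n)
    z = P @ np.diff(y)
    z[m:] = 0.0
    return y[0] + np.cumsum(P.T @ z)

def h_step_mse(sigma_x, sigma_v, n, h, m, reps=200, seed=0):
    """Monte-Carlo MSE of predicting y_{n+h} by the filtered trend at time n."""
    rng = np.random.default_rng(seed)
    err = np.empty(reps)
    for r in range(reps):
        x = np.cumsum(rng.normal(0.0, sigma_x, size=n + h + 1))  # trend x_0..x_{n+h}
        y = x + rng.normal(0.0, sigma_v, size=n + h + 1)
        xhat = trend_estimate(y[:n + 1], m)   # filter on y_0, ..., y_n only
        err[r] = y[n + h] - xhat[-1]
    return float(np.mean(err ** 2))

mses = {m: h_step_mse(1.0, 2.0, n=120, h=4, m=m, seed=7) for m in (5, 15, 30, 60)}
m_best = min(mses, key=mses.get)
```

The candidate grid, horizon and noise levels here are illustrative choices, not those of Tables 1-3.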

A comparison of the SIML filtering and HP filtering
To characterize some properties of the SIML filtering, we give an illustrative example. For this purpose, we take the monthly original consumption data of Sect. 2 (the Kakei-Chosa series in Fig. 3) and show the results of the SIML filtering and the HP filtering with two values of the smoothness parameter λ in Fig. 4. (We used the hpfilter procedure in the mFilter R-package for the HP filter, and took m = 12 for the SIML filter, which corresponds to frequencies of cycles longer than 2 years since λ_max^(n) = 1/24.) An important point is that when the data have strong seasonality, the SIML filter often gives reasonable estimates, while the usefulness of the HP filter depends on the particular value chosen for λ.
Further comparison would be interesting, but it is beyond the scope of this paper.

An interpretation of Müller and Watson (2018)
Recently, Müller and Watson (2018) proposed the so-called long-run covariability of macro-economic time series. They investigated many non-stationary time series and obtained some empirical findings. In this sub-section, an interpretation of their method is given as measuring the relationship between long-run trend-cycle components in our framework when p = 2. Moreover, we obtain an important consequence from Proposition 2 in Sect. 5.1. We consider the case p = 2 in (9)-(13). The estimator β̂ can be interpreted as the least squares slope of the transformed vector from y_1n on the transformed vector from y_2n for the n × 2 matrix Y_n = (y_1n, y_2n), which is essentially the same as the quantity proposed by Müller and Watson (2018). Then, from Proposition 2 in Sect. 5 and its proof in Appendices A and B, we immediately obtain the following result.

Fig. 4 An example of the SIML and HP filters (an illustrative comparison of the SIML and HP filters using consumption data).
In the notation of Müller and Watson (2018), m corresponds to q, which is fixed. They used the (differenced) stationary data, and thus we could interpret that they calculated the linear regression from the filtered data as a modification of (19) in our notation.

Proposition 3
In (9)-(13) with p = 2, assume that the fourth-order moments of each element of v_i^(x), v_i and v_i^(s) (i = 1, …, n) are bounded, and that Σ_x is positive definite.
(1) If we fix m, then β̂ is not consistent as n → ∞.
(2) If we set m_n = [n^α] with 0 < α < 1, then β̂ is consistent as n → ∞. The second part of this proposition is an extended version of the first part of Theorem 4.1 of Kunitomo and Sato (2017).
A simulation example. To illustrate our arguments in Proposition 3, we performed a set of Monte Carlo experiments in the simple situation with p = 2 (1,000 replications in each simulation). The model is a two-dimensional I(1) process with normally distributed errors, zero covariances and a given initial value y_0; it is not co-integrated, but is close to a co-integrated process. We present the finite-sample properties of the (naive) least squares (LS) estimator from the original data, the Müller-Watson (MW) estimator, and the SIML estimator from the transformed data.
The simulation results in Tables 4, 5 and 6 are consistent with our arguments in Proposition 3. In the tables, Direct LS stands for the standard regression of y_1n on y_2n, while MW and SIML stand for the Müller-Watson and SIML estimators, respectively. When the two-dimensional time series are not co-integrated, the standard least squares method is badly biased. The procedure proposed by Müller and Watson (2018) often gives reasonable results, but its variance does not decrease as the sample size increases. The SIML estimation method has reasonable finite-sample properties as well as reasonable asymptotic properties, and it is applicable to more general cases.
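The contrast in Proposition 3 can be reproduced in a small experiment. The data-generating process below is our own choice (not the paper's exact design): two I(1) series that share a common stochastic trend with slope β = 0.8 but are not co-integrated (an independent I(1) component is added), observed with measurement noise. The SIML slope is the least squares slope computed from the m_n = [n^0.6] lowest-frequency transformed terms, under the same assumed cosine-basis normalization as before.

```python
import numpy as np

def siml_basis(n):
    j = np.arange(1, n + 1).reshape(-1, 1)
    k = np.arange(1, n + 1).reshape(1, -1)
    return np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2.0 * np.pi / (2 * n + 1)) * (j - 0.5) * (k - 0.5))

def slopes(n, beta, sigma_u, sigma_e, rng):
    """One replication: (direct LS slope, SIML low-frequency slope)."""
    x2 = np.cumsum(rng.normal(size=n))                  # common I(1) trend
    u = np.cumsum(rng.normal(0.0, sigma_u, size=n))     # independent I(1): no co-integration
    y1 = beta * x2 + u + rng.normal(0.0, sigma_e, size=n)
    y2 = x2 + rng.normal(0.0, sigma_e, size=n)
    ls = y1 @ y2 / (y2 @ y2)                            # naive regression in levels
    P = siml_basis(n)
    z1 = P @ np.diff(np.concatenate(([0.0], y1)))
    z2 = P @ np.diff(np.concatenate(([0.0], y2)))
    m = int(n ** 0.6)
    siml = z1[:m] @ z2[:m] / (z2[:m] @ z2[:m])
    return ls, siml

rng = np.random.default_rng(3)
res = np.array([slopes(400, 0.8, 0.3, 1.0, rng) for _ in range(100)])
ls_rmse = float(np.sqrt(np.mean((res[:, 0] - 0.8) ** 2)))
siml_rmse = float(np.sqrt(np.mean((res[:, 1] - 0.8) ** 2)))
```

In repeated runs the SIML slope concentrates around β, while the direct LS slope retains a non-vanishing spurious-regression component, mirroring the pattern in Tables 4-6.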

Some macro-economic data and constructing a monthly consumption index
As illustrations for the empirical analysis, we have used our filtering method to analyze Japanese quarterly (real) consumption-GDP data as the first example, and three sets of monthly consumption data as the second example, which were discussed in Sect. 2. "Appendix B" contains all figures in this subsection. For the first example, the ratio of real GDP and real consumption (quarterly original series) and their time series plots are given in Figs. 1 and 2. They show non-stationarity in their trend-cycle and seasonality, as is typical of macro-economic variables. We then calculate the transformed data using K_n (a kind of real Fourier transformation), shown in Fig. 5, from the original consumption data. In this case, the transformed series exhibits wild up-and-down fluctuations, and the trend-cycle and seasonal components are mixed together. To find seasonal components from the original series, we calculated the realized z_k (k = 1, …, n) (Fig. 6) and the empirical cumulative distribution of z_k² (k = 1, …, n) (Fig. 7), which roughly correspond to the normalized, real-valued sample spectral distribution function. Each component of z_k (k = 1, …, n) is an orthogonal decomposition in the frequency domain. Because we have quarterly macro-data, there is a large up-and-down movement around λ_k^(n) = 0.25, which corresponds to the seasonal frequency at s = 4. The empirical spectral distribution has an abrupt change at this frequency. From these figures, we can judge that the real Fourier transformation based on K_n does give useful information.
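For readers who wish to reproduce a z_k series of this kind, the K_n-transformation can be sketched as follows. This is a sketch based on the form of the transformation in KSK (2018) with a unit observation interval: the orthogonal matrix P_n with entries p_jk = √(2/(n + 1/2)) cos[(2π/(2n+1))(k − 1/2)(j − 1/2)] is applied to the differenced series. The initial-value handling (y0 = 0) is a simplifying assumption:

```python
import numpy as np

def siml_transform(y, y0=0.0):
    """Sketch of the K_n-transformation z = P_n C_n^{-1} (y - y0):
    difference the series (the C_n^{-1} step), then apply the real
    Fourier-type orthogonal matrix P_n of KSK (2018)."""
    n = len(y)
    # C_n^{-1}(y - y0) is the first-difference operation with initial value y0.
    dy = np.diff(np.concatenate(([y0], y)))
    j = np.arange(1, n + 1)[:, None]   # row index
    k = np.arange(1, n + 1)[None, :]   # column index
    P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2 * np.pi / (2 * n + 1)) * (k - 0.5) * (j - 0.5))
    return P @ dy

# P_n is orthogonal, so the transform preserves the squared norm of the
# differenced series -- a quick sanity check of the implementation.
y = np.cumsum(np.random.default_rng(1).standard_normal(200))
z = siml_transform(y)
dy = np.diff(np.concatenate(([0.0], y)))
print(np.allclose(np.sum(z**2), np.sum(dy**2)))
```

The realized z_k² (k = 1, …, n) computed this way can then be accumulated to obtain the empirical spectral distribution function discussed above.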
As it has been a practice in time series data analysis to use seasonal differencing in the Box-Jenkins method, we calculated the real Fourier transformation based on K_n (Fig. 8) after seasonal differencing. Although the contribution of the resulting orthogonal process around the seasonal frequency could be significant, there are rather wild fluctuations at many other frequencies under the K_n-transformation. Because we have some difficulty in interpreting the resulting time series, it may not be possible to justify the seasonal differencing procedure, and we recommend not using this representation. In the following analysis, we simply take first differences and then use the frequency domain analysis. In Fig. 9, we have analyzed real quarterly GDP. We show one example with m = [n^0.99], where the deleted seasonal frequencies are around 48-52 (48/196-52/196 in [0, 1/2]) and the extremely high-frequency part is also deleted. We deleted only five transformed data points around the seasonal frequency; the main intention was to investigate the effect of seasonality within a narrow band. We have taken α = 0.99 since we wanted to remove some contribution of the high frequencies, but other choices are possible and the results are not much different from Fig. 8 in most cases. We compared the filtered time series using our method with the official (published) seasonally adjusted time series. We found that the differences between the two time series (i.e., the published time series and the SIML-filtered time series) are rather small and often of negligible magnitude. Although our filtering procedure is simple, this empirical example suggests the usefulness of the method developed in this study.
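The band-deletion filtering described above can be sketched as follows. The index values (m = [n^0.99], a seasonal band at components 48-52 for n = 196) mirror the quarterly GDP example, but the selection logic and the 0-based indexing are illustrative assumptions rather than the exact implementation used for Fig. 9:

```python
import numpy as np

def siml_filter(y, keep, y0=0.0):
    """Frequency-domain filtering sketch: transform the differenced series
    with the orthogonal matrix P_n, keep only the components indexed by
    `keep` (0-based), invert, and re-integrate."""
    n = len(y)
    dy = np.diff(np.concatenate(([y0], y)))
    j = np.arange(1, n + 1)[:, None]
    k = np.arange(1, n + 1)[None, :]
    P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
        (2 * np.pi / (2 * n + 1)) * (k - 0.5) * (j - 0.5))
    z = P @ dy
    mask = np.zeros(n, dtype=bool)
    mask[list(keep)] = True
    dy_f = P.T @ (z * mask)          # P is orthogonal, so P.T inverts it
    return y0 + np.cumsum(dy_f)

# Example: n = 196 quarterly observations, m = [n^0.99], and a narrow band
# around the seasonal frequency (components 48-52, here 0-based) deleted.
n = 196
m = int(n ** 0.99)
keep = [k for k in range(m) if not (47 <= k <= 51)]
y = np.cumsum(np.random.default_rng(2).standard_normal(n))
y_f = siml_filter(y, keep)
print(len(y_f) == n)
```

Keeping all n components reproduces the original series exactly (the transform is orthogonal), which makes the effect of deleting the seasonal band and the high-frequency tail easy to isolate.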
As the second example, we have analyzed the three monthly consumption time series and the quarterly consumption time series mentioned in Sect. 2. This is an empirical example with p = 4 and s = 12 with missing observations, because the quarterly series cannot give monthly information. As we have seen in Fig. 3, the three monthly consumption series have similarities as well as some differences in their components. Our goal in this example is to construct a monthly consumption index whose trend-cycle component is consistent with the observed quarterly consumption time series. Because of the non-stationary trend-cycle, seasonal and measurement-error components, it is not obvious how to construct a useful consumption index using existing statistical tools. Let Y_i (i = 1, …, n) be the target (quarterly) time series and Z_jt (j = 1, 2, 3; t = 3(i − 1) + l, l = 1, 2, 3) be the jth monthly time series (t = 0 is the initial period and we fix the initial values Y_0, Z_j0). The criterion function is then the mean squared prediction error of the differenced quarterly series in terms of the weighted sums of the filtered trend components x̂_jt (the trend parts of Z_jt), where w_j (j = 1, 2, 3) are (unknown) weight coefficients and m, m_j (j = 1, 2, 3) are the numbers of terms in the trend-cycle filtering. In this formulation, we need to measure the prediction errors based on differenced data because we have a non-stationary trend-cycle component.
Using the least-squares method, we minimized the MSE criterion with respect to the underlying parameters. The estimated w_j (j = 1, 2, 3) are 3.69, 5.19, and 1.64; while the measurement units differ, their magnitudes are comparable to the published quarterly consumption level over 2002Q1-2016Q4, and they are statistically significant at the 1% level. We have chosen m = 29, m_1 = 36, m_2 = 23 and m_3 = 33. Although other choices are possible, in our limited experiments we found some improvement in prediction error over cases with different combinations of m and m_j (j = 1, 2, 3). In Figs. 10 and 11, for two monthly series, the black curves are the original series and the red curves are the estimated trend curves. By taking relatively large m_j (j = 1, 2, 3), we can recover the cycle components of each series, which are crucial as indicators of the macro-business condition. In Fig. 12, the green curve shows the predicted state value calculated from the latest observed (quarterly) data plus the predicted monthly part based on the estimated parameters. As there is no monthly observation of the quarterly published consumption, we draw its latest (quarterly) level with the black curve and the estimated SIML (filtered) values with the red curve. 8 Overall, we found that, while our procedure is much simpler than the X-12-ARIMA seasonal adjustment with the reg-ARIMA model, the estimated filtered series are consistent with the (published) quarterly series, given that we have non-stationary trend-cycle, seasonal and measurement-error components and do not have large samples. In Fig. 13, we have drawn the prediction errors in terms of the differenced values ΔY_i (i = 1, …, n) based on our procedure. This figure illustrates the usefulness of the procedure because the macro-economic time series are non-stationary with measurement errors. The empirical data analyses in this subsection are presented as illustrations.
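The weight-estimation step can be sketched with synthetic data as follows. This is a deliberately simplified stand-in: the SIML trend filtering of each monthly series is replaced by simple quarterly aggregation of synthetic monthly trend series, and the names (fit_weights, dX, dY) are ours for illustration, not from the estimation code used here:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_weights(dY, dX):
    """Least-squares weights w minimizing sum_i (dY_i - sum_j w_j dX_ij)^2,
    a sketch of the MSE criterion with the trend-filtering step replaced
    by plain quarterly aggregation."""
    w, *_ = np.linalg.lstsq(dX, dY, rcond=None)
    return w

# Synthetic example: three monthly trend (I(1)) series whose quarterly
# aggregates, weighted by (3.7, 5.2, 1.6), generate the quarterly target
# up to a small noise term.
n_q = 60                                              # quarters
monthly = rng.standard_normal((3, 3 * n_q)).cumsum(axis=1)
dX = (np.diff(monthly, axis=1, prepend=0.0)           # monthly differences
        .reshape(3, n_q, 3).sum(axis=2).T)            # quarterly sums, (n_q, 3)
true_w = np.array([3.7, 5.2, 1.6])
dY = dX @ true_w + 0.1 * rng.standard_normal(n_q)
w_hat = fit_weights(dY, dX)
print(np.round(w_hat, 1))
```

In the actual procedure, dX would instead be built from the SIML-filtered trend parts x̂_jt of the monthly series with the chosen m_j, and m, m_j would be selected by comparing prediction errors as described above.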
We are currently investigating the consumption data and the results will be reported on another occasion.

Concluding remarks
When the observed non-stationary multivariate time series contain seasonality and noises, it may be difficult to disentangle the effects of trend-cycle, seasonal and measurement-error components. In real (seasonally unadjusted) time series, we often observe non-stationary trend-cycle, seasonality and measurement errors, while the X-12-ARIMA program in official agencies uses the univariate reg-ARIMA model to remove the seasonality from the original time series. In this study, we investigate a new procedure to decompose time series into non-stationary trend-cycle components, stationary seasonal components, and noise (or measurement-error) components. The resulting method for non-stationary multivariate series is simple and free from the underlying distributions of components. Hence, it is robust against possible misspecification in the non-stationary multivariate economic time series. An important conclusion is that it is useful to transform the observed time series using the K_n-transformation and to investigate the transformed series.
In the empirical examples in Sect. 7, we have illustrated our method by analyzing quarterly and monthly macro-consumption data in Japan. As the second example, we presented a way to construct a monthly consumption index which is consistent with the published or official (GDP-)consumption quarterly data. Although the problem is practically complicated, we have shown that our method gives a useful way for practical purposes.
There can be several further problems. Although it is easy to handle the K_n-transformations of non-stationary multivariate time series and construct the transformed data, there is a non-trivial initial value problem of filtering. Another direction would be to handle abrupt changes in multiple time series, such as the changes in the consumption tax in Japan. Some progress on these problems has been made (Sato and Kunitomo 2020a, b), which will be reported on other occasions. As there are many important empirical applications, we need to develop a systematic statistical procedure.

Appendix A: Mathematical derivations
We now present some details of the derivations that were omitted in the previous sections. We refer to Kunitomo et al. (2018) as KSK (2018).
Hence, we show (28). When j ≠ j′, the summation of the last two terms is negligible for large n. When j = j′, θ_jk − θ_j′k = 0 and the summation of the last two terms with the second index set becomes 2m². Hence, it is possible to evaluate each term of (24) and (28). Using this relation and the corresponding results for j − j′ (there are two cases, (a) j = j′ and (b) j ≠ j′), we obtain the results of (28), and then (24) by using the first index set instead of the second.
(2) The proof of Proposition 1. Essentially, we apply the central limit theorem (Theorem 8.4.3 of Anderson (1971) or Theorem 7.6 of Durrett (1991)) to the sequence of ergodic stationary (discrete) time series. We give the basic steps of our derivations and mention that the problem here is similar to those explained in detail in Chapters 7-9 of Anderson (1971). First, we need to show that the resulting variance-covariance terms correspond to those of the limiting Gaussian random variables.
For this purpose, we need to evaluate the variance-covariance terms of the transformed variables. When k ≠ k′, we find by straightforward calculations that the right-hand side terms are bounded. We notice that the right-hand side consists of sums of four terms, associated with (A), (B), (C) and (D) in (56). First, the sums of the terms associated with (A) and (B) in (56) are bounded and become arbitrarily small when n is large. (The terms with (B) are the same as those with (A) except for the signs in the exponential parts.) Second, we need to show that the sums of the terms with (C) and (D) in (56) when k = k′ are the dominant terms. We utilize the decomposition of the double sum by the lag h = j − j′: for h ≥ 0, given h, the inner sum runs over j′ = 1, …, n − h, and similarly for h < 0. The sums of these terms are bounded when k ≠ k′ by the same argument as in (58). (The terms with (D) are the same as those with (C) except for the signs in the exponential parts.) On the other hand, when k = k′, ∑_{j′=1}^{n−h} e^{i2π(k−k′)(j′−1)/(2n+1)} = n − h, and the sums with (C) and (D) become the dominant terms. (The arguments here are similar to the derivations in Chapter 5 of KSK (2018), but there is a major difference in the conditions, because their asymptotic arguments are based on the situation in which the length of the observation intervals decreases. In the present situation, there is no √n factor in the K_n-transformation, and we use a*_kn without this factor, while they have used a_kn = √n a*_kn.)
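As a supplement, the geometric-sum evaluation used in the two cases above can be written out explicitly (a standard identity, stated here for completeness; the notation follows the sum appearing in the proof):

```latex
\sum_{j'=1}^{n-h} e^{\,i\frac{2\pi(k-k')}{2n+1}(j'-1)}
  = \begin{cases}
      n-h, & k = k',\\[6pt]
      \dfrac{1-e^{\,i\frac{2\pi(k-k')}{2n+1}(n-h)}}{1-e^{\,i\frac{2\pi(k-k')}{2n+1}}}, & k \neq k'.
    \end{cases}
```

For k ≠ k′ the modulus is therefore bounded by 2/|1 − e^{i2π(k−k′)/(2n+1)}| uniformly in h, while for k = k′ the sum grows linearly in n − h, which is why the terms with (C) and (D) at k = k′ dominate.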