1 Introduction

There is a vast literature on the statistical analysis of macro-economic time series. One important feature of macro-economic time series, which distinguishes them from those in standard time series analysis, is that the observed series are apparently mixtures of non-stationary and stationary components. A second feature is that measurement errors often play important roles, because macro-economic data are usually constructed from various sources, including sample surveys in major official statistics, although their statistical analysis often ignores measurement errors. A third important issue is that the sample size of macro-economic data is rather small: we have only about 120 observations per series when we have quarterly data over 30 years. The quarterly GDP series, for instance, have been published by the Cabinet Office of Japan from 1994 to the present. Since the sample size is small, it is important to use an appropriate statistical procedure to extract information on the trend and noise (or measurement error) components from data in a systematic way. Some of these aspects have been discussed by Morgenstern (1950), Granger and Newbold (1977), and Nerlove et al. (1995), for instance. See https://www.esri.cao.go.jp/index-e.html for the official macro-economic (GDP) data published by the Cabinet Office of the Japanese Government.

Although economists often transform non-stationary time series, for example by taking logarithms and/or differencing as in the Box–Jenkins method (Box and Jenkins 1970), the standard assumptions underlying statistical procedures applied after jointly transforming multiple time series may not be valid when there are measurement errors. In this regard, Kunitomo and Sato (2017, 2019) [or Kunitomo et al. (2018)] developed a new method, the separating information maximum likelihood (SIML) estimation, for multivariate non-stationary errors-in-variables models. Earlier and related literature on non-stationary economic time series includes Engle and Granger (1987) and Johansen (1995), who dealt with multivariate non-stationary and stationary time series and developed the notion of co-integration without measurement errors. The problem of our interest is related to their work, but it has different aspects: our focus is on non-stationarity and measurement error in the non-stationary errors-in-variables models. In the econometric literature, the identification of parametric models and estimation when the true parameters are near the boundary of the parameter space have been discussed by Rothenberg (1971) and Andrews (1999). These works are potentially related to our problem in the non-stationary errors-in-variables models, which would be an interesting future topic.

In the statistics literature, on the other hand, state space modeling and filtering for non-stationary time series have been developed by Akaike (1989) and Kitagawa (2010), among others, and applications have been reported in many fields including control engineering and statistical seismology. Their method may look different from the framework of standard time series econometrics at first glance, but the underlying statistical problem is essentially the same because of the non-stationarity of the time series and the measurement errors. Our study can be regarded as an investigation of the statistical inference problem of state space modeling, which is related to an earlier work by Chang et al. (2008), for instance. The problem of ML estimation in our setting may be related to Anderson and Takemura (1986) in a different context.

In statistical multivariate analysis there is also some literature on errors-in-variables models, such as Anderson (1984, 2003) and Fuller (1987), but these works considered the multivariate case of independent observations, and the underlying situation is different from ours.

The main purpose of this paper is to compare the SIML estimation and the maximum likelihood (ML) estimation, two different methods for estimating multivariate non-stationary errors-in-variables models with non-stationary trend and noise components. We investigate the finite sample and large sample properties of the two estimation methods. An important finding is that the Gaussian likelihood function may have a non-concave shape in some cases of the non-stationary errors-in-variables models, although the ML method works well when the non-stationary and stationary components are Gaussian, the parameter space is suitably restricted, and the measurement errors are not small. When the measurement error is small in a certain sense and/or there are co-integrating relations among trends whose rank is smaller than the dimension of observations, there can be a serious problem in the ML estimation under the assumption of Gaussian distributions (see Sect. 3.1 for the details of this issue). The SIML method, on the other hand, gives an alternative way to overcome the underlying difficulty in a non-parametric way. It has asymptotically robust properties: consistency and asymptotic normality hold under general moment conditions.

In Sect. 2 we present a general formulation of the non-stationary errors-in-variables models and explain the two estimation methods, namely the SIML and ML methods, for non-stationary time series. We also give simple examples to illustrate the importance of the measurement error problem in non-stationary time series models and to explain some motivations of the present study. In Sect. 3, as a simple example, we consider the one-dimensional random walk plus noise model, and then we discuss a common (non-stationary) factor case as a two-dimensional errors-in-variables model. In Sect. 4 we investigate the Gaussian likelihood function and its shape. We give a consistency result for the ML estimation under Gaussianity, which may be new although it is not surprising. We also discuss the asymptotic properties of the SIML method and report simulation results. In Sect. 5 we discuss possible extensions of our analysis, and we give concluding remarks in Sect. 6. Some mathematical derivations are given in the Appendix.

2 Non-stationary errors-in-variables models

2.1 The basic formulation and estimation methods

Let \(y_{j i}\) be the i-th observation of the j-th time series at i for \(i=1,\ldots ,n;\, j=1,\ldots ,p\). Let \({\mathbf{y}}_i=(y_{1 i},\ldots , y_{p i})^{\prime }\) be a \(p\times 1\) vector and \({\mathbf{Y}}_n=({\mathbf{y}}_i^{\prime })\;(=(y_{i j}))\) be an \(n\times p\) matrix of observations, and denote by \({\mathbf{y}}_0\) the initial \(p\times 1\) vector. We consider the situation when the underlying non-stationary trends \({\mathbf{x}}_i\;(=(x_{ji}))\;(i=1,\ldots ,n)\) are not necessarily the same as the observed time series, and let \({\mathbf{v}}_i^{\prime }=(v_{1 i},\ldots , v_{p i})\) be the vector of noise components, which are independent of \({\mathbf{x}}_i\). Then we use the additive state space decomposition form [see Akaike (1989) and Kitagawa (2010)]

$$\begin{aligned} {\mathbf{y}}_{i}={\mathbf{x}}_i+{\mathbf{v}}_i\quad (i=1,\ldots ,n), \end{aligned}$$
(1)

where \({\mathbf{x}}_i\;(i=1,\ldots ,n)\) are a sequence of non-stationary trend components satisfying

$$\begin{aligned} \varDelta {\mathbf{x}}_i= (1-{\mathcal{L}}) {\mathbf{x}}_i={\mathbf{w}}_i^{(x)} \end{aligned}$$
(2)

with \({\mathcal{L}}{} {\mathbf{x}}_i={\mathbf{x}}_{i-1},\) \(\varDelta =1-{\mathcal{L}},\) \({\mathcal{E}}({\mathbf{w}}_i^{(x)})={{\mathbf{0}}},\) \({\mathcal{E}}({\mathbf{w}}_i^{(x)} {\mathbf{w}}_i^{(x)'})={{\varvec{\varSigma }}}_{x}\), and \({\mathbf{v}}_i\;(i=1,\cdots ,n)\) are a sequence of (mutually) independent noise components with \({\mathcal{E}}({\mathbf{v}}_i)={{\mathbf{0}}} ,\) \({\mathcal{E}}({\mathbf{v}}_i {\mathbf{v}}_i^{\prime })={{\varvec{\varSigma }}}_{v}\).

We assume that \({\mathbf{w}}_i^{(x)}\) and \({\mathbf{v}}_i\) are the sequence of i.i.d. random variables with \({{\varvec{\varSigma }}}_{v}\) being non-negative definite and finite, and the random variables \({\mathbf{w}}_i^{(x)}\) and \({\mathbf{v}}_i\) are mutually independent.

We consider the situation when \(\varDelta {\mathbf{x}}_i\) and \({\mathbf{v}}_i\;(i=1,\ldots ,n)\) are mutually independent and each of the component vectors are independently, identically, and normally distributed as \(N_p({{\mathbf{0}}},{{\varvec{\varSigma }}}_x)\) and \(N_p({\mathbf{0}},{{\varvec{\varSigma }}}_v) ,\) respectively. We use an \(n\times p\) matrix \({\mathbf{Y}}_n=({\mathbf{y}}_i^{\prime })\) and consider the distribution of \(np\times 1\) random vector \(({\mathbf{y}}_1^{\prime },\ldots , {\mathbf{y}}_n^{\prime })^{\prime }\). Given the initial condition \({\mathbf{y}}_0 ,\) we have

$$\begin{aligned} {\mathrm{vec}}({\mathbf{Y}}_n) \sim N_{n \times p}( {\mathbf{1}}_n \cdot {\mathbf{y}}_0^{\prime }, {\mathbf{I}}_n\otimes {{\varvec{\varSigma }}}_v +{\mathbf{C}}_n{\mathbf{C}}_n^{\prime }\otimes {{\varvec{\varSigma }}}_x), \end{aligned}$$
(3)

where \({\mathbf{1}}_n^{\prime }=(1,\ldots ,1)\) and

$$\begin{aligned} {\mathbf{C}}_n= \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 0 &{} \cdots &{} 0 &{} 0 \\ 1 &{} 1 &{} 0 &{} \cdots &{} 0 \\ 1 &{} 1 &{} 1 &{} \cdots &{} 0 \\ 1 &{} \cdots &{} 1 &{} 1 &{} 0 \\ 1 &{} \cdots &{} 1 &{} 1 &{} 1 \\ \end{array} \right) _{n\times n}. \end{aligned}$$
(4)
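As a quick numerical check of the covariance structure in (3) and (4) for the scalar case \(p=1\), one can simulate many paths of the model and compare the empirical covariance of the observations with \(\sigma_v^2 {\mathbf{I}}_n + \sigma_x^2 {\mathbf{C}}_n{\mathbf{C}}_n^{\prime}\). The following Python sketch does this (the sample sizes, parameter values and variable names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 20, 20000
sigma2_x, sigma2_v = 1.0, 0.5

C = np.tril(np.ones((n, n)))               # C_n of eq. (4)

# simulate reps paths of y_i = x_i + v_i with x_0 = y_0 = 0 (p = 1);
# x = C w turns the i.i.d. innovations w into a random walk
W = rng.normal(0.0, np.sqrt(sigma2_x), (reps, n))
Y = W @ C.T + rng.normal(0.0, np.sqrt(sigma2_v), (reps, n))

emp_cov = np.cov(Y, rowvar=False)
theo_cov = sigma2_v * np.eye(n) + sigma2_x * C @ C.T   # eq. (3) with p = 1
# the empirical covariance matches the theoretical one up to sampling noise
assert np.max(np.abs(emp_cov - theo_cov)) < 1.5
```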

Then, given the initial condition \({\mathbf{y}}_0\), the conditional maximum likelihood (ML) estimator is defined as the maximizer of the conditional log-likelihood function, which, apart from a constant term, is

$$\begin{aligned} L_n^{*}= & {} \log \vert {\mathbf{I}}_n\otimes {{\varvec{\varSigma }}}_v +{\mathbf{C}}_n{\mathbf{C}}_n^{\prime }\otimes {{\varvec{\varSigma }}}_x\vert ^{-1/2}\\&-\frac{1}{2} [ \text{vec} ({\mathbf{Y}}_n-{\bar{{\mathbf{Y}}}}_0)^{\prime }]^{\prime } [ {\mathbf{I}}_n\otimes {{\varvec{\varSigma }}}_v +{\mathbf{C}}_n{\mathbf{C}}_n^{\prime }\otimes {{\varvec{\varSigma }}}_x ]^{-1} [ \text{vec} ({\mathbf{Y}}_n-{\bar{{\mathbf{Y}}}}_0)^{\prime }], \end{aligned}$$

where

$$\begin{aligned} {\bar{{\mathbf{Y}}}}_0 = {\mathbf{1}}_n \cdot {\mathbf{y}}_0^{\prime }. \end{aligned}$$
(5)

To develop the method of the SIML estimation [see Kunitomo and Sato (2017)], we use the \(K_n\)-transformation from \({\mathbf{Y}}_n\) to \(\mathbf{Z}_n\;(=(\mathbf{z}_k^{\prime }))\) by

$$\begin{aligned} \mathbf{Z}_n={\mathbf{K}}_n ({\mathbf{Y}}_n-{\bar{{\mathbf{Y}}}}_0),\quad {\mathbf{K}}_n=\mathbf{P}_n {\mathbf{C}}_n^{-1}, \end{aligned}$$
(6)

where

$$\begin{aligned} {\mathbf{C}}_n^{-1}= \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 0 &{} \cdots &{} 0 &{} 0 \\ -1 &{} 1 &{} 0 &{} \cdots &{} 0 \\ 0 &{}-1 &{} 1 &{} 0 &{} \cdots \\ 0 &{} 0 &{}-1 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} -1 &{} 1 \\ \end{array} \right) _{n\times n}, \end{aligned}$$
(7)

and

$$\begin{aligned} \mathbf{P}_n=(p_{jk}^{(n)}),\; p_{jk}^{(n)} =\sqrt{ \frac{2}{n+\frac{1}{2}} } \cos \left[ \frac{2\pi }{2n+1} \left( k-\frac{1}{2}\right) \left( j-\frac{1}{2}\right) \right] . \end{aligned}$$
(8)

We use the spectral decomposition \({\mathbf{C}}_n^{-1}{} {\mathbf{C}}_n^{' -1} =\mathbf{P}_n \mathbf{D}_n \mathbf{P}_n^{\prime }\), where \(\mathbf{D}_n\) is a diagonal matrix with the k-th element \(d_k= 2 [ 1-\cos (\pi (\frac{2k-1}{2n+1})) ] \;(k=1,\ldots ,n)\). Then the conditional likelihood function given the initial condition is proportional to

$$\begin{aligned} L_n = \sum _{k=1}^n \log \vert a_{k n}^{*}{{\varvec{\varSigma }}}_v +{{\varvec{\varSigma }}}_x\vert ^{-1/2} -\frac{1}{2}\sum _{k=1}^n \mathbf{z}_k^{\prime } [ a_{k n}^{*}{{\varvec{\varSigma }}}_v+{{\varvec{\varSigma }}}_x ]^{-1} \mathbf{z}_k, \end{aligned}$$
(9)

where

$$\begin{aligned} a_{k n}^{*} \;(=d_k) =4 \sin ^2 \left[ \frac{\pi }{2}\left( \frac{2k-1}{2n+1} \right) \right] \quad (k=1,\ldots ,n). \end{aligned}$$
(10)

This transformation of the non-stationary time series yields the random variables \(\mathbf{z}_k \;(k=1,\ldots ,n)\), which follow \(N_p({{\mathbf{0}}}, {{\varvec{\varSigma }}}_x+a_{k n}^{*}{{\varvec{\varSigma }}}_v)\), and the coefficients \(a_{kn}^{*}\) form a dense sample of the values of \(4 \sin ^2 (x)\) for \(x\) in \((0,\pi /2)\).Footnote 1
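As a numerical check of (6)–(10), the matrices \({\mathbf{C}}_n^{-1}\) and \(\mathbf{P}_n\) and the coefficients \(a_{kn}^{*}\) can be constructed directly; the following is a minimal Python sketch (the variable names are ours):

```python
import numpy as np

n = 50
k = np.arange(1, n + 1)

# C_n^{-1}: first-differencing matrix, eq. (7)
C_inv = np.eye(n) - np.eye(n, k=-1)

# P_n, eq. (8); note that it is symmetric
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(k - 0.5, k - 0.5))

# a_{kn}^* = d_k, eq. (10)
a_star = 4.0 * np.sin(0.5 * np.pi * (2 * k - 1) / (2 * n + 1)) ** 2

# spectral decomposition C_n^{-1} C_n^{-1'} = P_n D_n P_n', with D_n = diag(d_k)
assert np.allclose(P @ P.T, np.eye(n))
assert np.allclose(P @ np.diag(a_star) @ P.T, C_inv @ C_inv.T)
```

The two assertions confirm that \(\mathbf{P}_n\) is orthogonal and that the \(a_{kn}^{*}\;(=d_k)\) are exactly the eigenvalues of \({\mathbf{C}}_n^{-1}{\mathbf{C}}_n^{' -1}\).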

The ML estimation of the unknown parameters is defined as the maximization of (9) with respect to \({{\varvec{\varSigma }}}_v\) and \({{\varvec{\varSigma }}}_x\). Since the coefficients \(a_{kn}^{*}\;(k=1,\ldots ,n)\) vary with k, the ML estimator is a complicated function of the data and its computation is not a trivial task, as we shall see in Sects. 3 and 4.

From the representation (9), it may be natural to use \(\mathbf{z}_k\mathbf{z}_k^{\prime }\) to estimate \(a_{kn}^{*}{{\varvec{\varSigma }}}_v +{{\varvec{\varSigma }}}_x\), since the latter is the variance–covariance matrix of \(\mathbf{z}_k\). We notice that \(a_{k n}^{*}\rightarrow 0\) as \(n\rightarrow \infty\) for a fixed k. When k is small, \(a_{k n}^{*}\) is small, and we can expect that \(a_{k n}^{*}\) remains small for \(k=k_n\) depending on n when n is large. However, \((1/{m_n})\sum _{k=1}^{m_n}a_{kn}^{*}\) is not small if \(m_n\) is close to n, which suggests the condition \(m_n/n\rightarrow 0\) as \(n\rightarrow \infty\). The separating information maximum likelihood (SIML) estimator \({\hat{{\varvec{\varSigma }}}}_x\) is defined by

$$\begin{aligned} {\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}} =\frac{1}{m_n}\sum _{k=1}^{m_n} \mathbf{z}_k\mathbf{z}_k^{\prime }. \end{aligned}$$
(11)

This estimator of the variance–covariance matrix of the non-stationary trends uses the information on trends in the frequency domain, which corresponds to using only the trend parts of the time series observations without the measurement errors. The interpretation of (11) from the frequency domain of non-stationary time series will be discussed in a general setting [see (42) and (43) in Sect. 5].

From our construction of the SIML estimation the essential features of estimation do not much depend on the presence of noise terms when the noise terms are stationary. This feature was the main reason for developing the SIML method by Kunitomo and Sato (2017) and Kunitomo et al. (2018).

Let the quadratic variation of observed vectors \({\mathbf{y}}_i\;(i=1,\ldots ,n)\) be

$$\begin{aligned} \mathbf{QV}_y(1) =\sum _{i=1}^n({\mathbf{y}}_i-{\mathbf{y}}_{i-1})({\mathbf{y}}_i-{\mathbf{y}}_{i-1})^{\prime }, \end{aligned}$$
(12)

where \({\mathbf{y}}_0\) is the initial vector. If we denote \({\hat{{\varvec{\varSigma }}}}_y=(1/n)\mathbf{QV}_y(1),\) we need the condition \({\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}}\le {\hat{{\varvec{\varSigma }}}}_{y}\) in the sense of positive semi-definiteness.

For the SIML estimator \({\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}} ,\) the number of terms \(m_n\) should depend on n. In the representation (11) we need the order requirement \(m_n=O([n^{\alpha }])\) with \(0<\alpha <1\), which is the first property of the macro-SIML estimation. There is a trade-off between the bias and the variance of the SIML estimator of \({{\varvec{\varSigma }}}_x\). Kunitomo and Sato (2017) have shown that, when \(m_n\rightarrow \infty\) as \(n\rightarrow +\infty\), we need the condition \(0<\alpha < 1.0\) for consistency and the condition \(0<\alpha <0.8\) for asymptotic normality, under the assumption that the parameter matrices \({{\varvec{\varSigma }}}_x>0\) and \({{\varvec{\varSigma }}}_v>0\) are fixed (see Theorem 4.3 in Sect. 4). As an example, we may take \(\alpha =0.79\).
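To illustrate the estimator (11) and the choice \(m_n=O([n^{\alpha }])\), the following Python sketch simulates a scalar random walk plus noise and computes the macro-SIML estimate with \(\alpha =0.79\) (the sample size, parameter values and variable names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2_x, sigma2_v = 1000, 1.0, 0.5
m = int(n ** 0.79)                         # m_n = O(n^alpha) with alpha = 0.79

# random walk trend plus i.i.d. noise, eqs. (1)-(2), with x_0 = y_0 = 0
x = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_x), n))
y = x + rng.normal(0.0, np.sqrt(sigma2_v), n)

# z = P_n C_n^{-1}(y - y_0 1_n): first-difference, then transform by P_n, eq. (6)
k = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(k - 0.5, k - 0.5))
z = P @ np.diff(y, prepend=0.0)

# macro-SIML estimator (11): average of z_k^2 over the first m_n terms
sigma2_x_hat = np.mean(z[:m] ** 2)         # should be close to sigma2_x
```

With \(n=1000\) and \(m_n=[n^{0.79}]\), the estimate is close to the true \(\sigma _x^2=1\), up to a small upward bias of order \((m_n/n)^2\) coming from the noise term.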

2.2 Non-stationary time series and measurement errors

In this subsection, using a simple example with \(p=1\), we illustrate why the presence of a noise term in a non-stationary time series, even if it is small, forces us to change the standard thinking on time series analysis.

Fig. 1 Normal approximation (\(n=30\))

Fig. 2 (a) (Left) Estimated variance (\(n=80, c=2\)); (b) (right) estimated variance (\(n=80, c=1/2\))

Fig. 3 (a) (Left) Estimated variance (\(n=80, c=8\)); (b) (right) estimated variance (\(n=80, c=1/8\))

In traditional econometric analysis of time series, the non-stationarity of economic time series has often been discussed, but there have not been many discussions on the role of measurement errors [see Engle and Granger (1987) and Johansen (1995), for instance]. The standard argument for integrated processes has been to use Brownian functionals to describe their behavior when the sample size is large. As a typical example,Footnote 2 we generate a set of mutually independent variables \(w_i^{(x)}\;(i=1,\ldots ,n)\), each distributed as \(\sqrt{12} (U(0,1)-0.5)\), where U(0, 1) follows the uniform distribution. Then we generate \(x_i\) and \(y_i\) satisfying \(x_i=x_{i-1}+w_i^{(x)}\) and \(y_i=x_i+v_i\;(i=1,\ldots ,n)\) by adding the measurement errors \(v_i\), which are independent N(0, 1) random variables. By replicating 1000 times, the empirical distribution of \([\sum _{i=1}^ny_i]/[ \sigma _x\sqrt{n^3}]\) (we have set \(\sigma ^2_x={\mathcal{E}}(w_i^{(x) 2})=1\) and \(x_0=0\)) can be approximated, via weak convergence, by the Brownian functional

$$\begin{aligned} X(1)=\int _0^1 B(s)\,{\mathrm{d}}s, \end{aligned}$$
(13)

where B(s) is the Brownian motion on [0, 1] and X(1) follows N(0, 1 / 3). Figure 1 illustrates that this approximation is reasonable even when \(n=30\). (The red curve shows N(0, 1 / 3).) However, as illustrated in Chapters 9 and 10 of Hayashi (2000), to make statistical inference on unit roots and possible co-integrating vectors, for instance, we need an estimate of the variances and covariances of the innovation terms of the integrated processes, such as \(\sigma _x^2\), which are generally unknown. The role of the innovations requires careful analysis, but the easiest way to estimate the innovation variance of an integrated process is to use the normalized sum of squared differences of the observed time series, \({{\hat{\sigma }}}^2_y\;(=(1/n)\sum _{i=2}^n(y_i-y_{i-1})^2)\), although there are more sophisticated methods such as the maximum likelihood method for more general situations. As a simple illustration, we generate a set of observations \(y_i\;(i=1,\ldots ,n)\) by adding measurement errors \(v_i\), which are independent N(0, 0.5) random variables, to the I(1) process \(x_i\), so that \(y_i=x_i+v_i\;(i=1,\ldots ,n)\). Figure 2a (case 2-1) illustrates the estimated variance of the innovation \(w_i^{(x)}\;(=x_i-x_{i-1})\) by this method, where the true parameter value (the red line) is 1. Although the variance of the innovation of the integrated process is twice the variance of the measurement errors, so that the measurement errors are small in a sense, the standard estimate has a significant bias and is distributed around 2 when \(n=80\). This case corresponds to \(c=2=\sigma _x^2/\sigma _v^2\). We also give Figs. 2b, 3a, b for case 2-2 (\(c=1/2\)), case 2-3 (\(c=8\)) and case 2-4 (\(c=1/8\)) for comparison. Case 2-1 and case 2-3 correspond to the small noise cases, while case 2-2 and case 2-4 correspond to the large noise cases. In the latter cases the bias of the estimate of the variance \(\sigma _x^2\) becomes large, as we can expect.
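This bias is easy to reproduce; the following Python sketch repeats the case 2-1 design (\(\sigma _x^2=1\), \(\sigma _v^2=0.5\), \(c=2\); the replication scheme and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2_v = 80, 1000, 0.5          # case 2-1: sigma_x^2 = 1, c = 2

estimates = np.empty(reps)
for r in range(reps):
    # uniform innovations sqrt(12)(U(0,1) - 0.5) with unit variance, as in the text
    w = np.sqrt(12.0) * (rng.random(n) - 0.5)
    x = np.cumsum(w)                        # I(1) trend with x_0 = 0
    y = x + rng.normal(0.0, np.sqrt(sigma2_v), n)
    dy = np.diff(y, prepend=0.0)
    estimates[r] = np.mean(dy ** 2)         # naive estimate of sigma_x^2

# E[(y_i - y_{i-1})^2] = sigma_x^2 + 2*sigma_v^2 = 2, so the estimates
# concentrate around 2 rather than around the true sigma_x^2 = 1
```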

The point here is that even when we have small (Gaussian) noise, we may obtain a misleading estimate of the variance of the system variables. These examples illustrate the importance of our analysis of measurement errors in non-stationary time series. In practice we usually do not have much advance information on the magnitude of c, the ratio of the signal variance to the noise variance, or on the distribution of the measurement errors when we observe non-stationary data. Hence it is important to use a statistical method which does not depend on the value of c or on the distribution of the noise.

3 Simple cases

3.1 An illustrative example

To see the main problem of our interest clearly, we consider the simplest case, \(p=1\). Let \(y_{i}\) be the ith observation of the time series for \(i=1,\ldots ,n\) and \({\mathbf{y}}_n=(y_i)\) be an \(n\times 1\) vector of observations (\(y_0\) is the initial observation). We consider the situation when the underlying non-stationary trend \(x_i\) (\(i=1,\ldots ,n\)) satisfies

$$\begin{aligned} x_i=x_{i-1}+ w_i^{(x)}, \end{aligned}$$
(14)

where \(w_i^{(x)}\) are independent random variables distributed as \(N(0, \sigma _x^2)\) and \(x_0\) is the initial variable. Let \(v_i\) (measurement error) be a sequence of i.i.d. random variables distributed as \(N(0, \sigma _v^2)\), independent of \(x_i\;(i=1,\ldots ,n)\). For the additive model

$$\begin{aligned} y_i=x_i+v_i\quad (i=1,\ldots ,n), \end{aligned}$$
(15)

the log-likelihood function is proportional to

$$\begin{aligned} L_n = \sum _{k=1}^n \log \vert a_{k n}^{*}\sigma _v^2+\sigma _x^2\vert ^{-1/2} -\frac{1}{2}\sum _{k=1}^n \frac{z_k^2}{ a_{k n}^{*} \sigma _v^2+\sigma _x^2 }, \end{aligned}$$
(16)

where

$$\begin{aligned} a_{k n}^{*} =4 \sin ^2 \left[ \frac{\pi }{2}\left( \frac{2k-1}{2n+1} \right) \right] \quad (k=1,\ldots ,n). \end{aligned}$$
(17)

Using the variance ratio \(c =\sigma _x^2/\sigma _v^2\;(\ge 0)\),Footnote 3 we rewrite \(-2 L_n\) as

$$\begin{aligned} L_{1n} = \sum _{k=1}^n [\log \sigma _v^2 +\log ( a_{k n}^{*} + c )] +\frac{1}{\sigma _v^2} \sum _{k=1}^n \frac{z_k^2}{ a_{k n}^{*} +c }. \end{aligned}$$
(18)

Since \(z_k\sim N(0, a_{kn}^{*}\sigma _v^2+\sigma _x^2)\;(k=1,\ldots ,n)\), for given c the maximum likelihood estimator of \(\sigma _v^2\) can be represented as

$$\begin{aligned} {{\hat{\sigma }}}_{v.ML}^2 =\frac{1}{n} \sum _{k=1}^n \frac{z_k^2}{a_{k n}^{*} +c } \end{aligned}$$
(19)

and the concentrated (normalized) log-likelihood function in this simple case is proportional to \(-(1/2)\) times

$$\begin{aligned} L_{1n}(c ) = \log \left[ \frac{1}{n} \sum _{k=1}^n \frac{z_k^2}{a_{k n}^{*} +c }\right] +\frac{1}{n}\sum _{k=1}^n \log [ a_{kn}^{*}+c ] +1. \end{aligned}$$
(20)

Then it may not be straightforward to obtain the maximum likelihood estimator of c, because the likelihood function is not a simple function of c: the likelihood equation \(\frac{\partial L_{1n}(c )}{\partial c}=0\) is a polynomial equation of degree \(2n-3\), and as a consequence there can be local maxima for any finite sample.
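The shape of the concentrated function (20) can be examined directly by evaluating \(L_{1n}(c)\) on a grid; since the likelihood is proportional to \(-(1/2)L_{1n}(c)\), maximizing the likelihood amounts to minimizing \(L_{1n}(c)\). The following Python sketch does this for data simulated under case 3-1 (the grid and variable names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2_v, sigma2_x = 100, 0.4, 0.2      # case 3-1, true c = 1/2

# simulate y_i = x_i + v_i and transform to z_k as in (6)
x = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_x), n))
y = x + rng.normal(0.0, np.sqrt(sigma2_v), n)
k = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(k - 0.5, k - 0.5))
z = P @ np.diff(y, prepend=0.0)
a_star = 4.0 * np.sin(0.5 * np.pi * (2 * k - 1) / (2 * n + 1)) ** 2

def L1n(c):
    """Concentrated (normalized) minus log-likelihood, eq. (20)."""
    return (np.log(np.mean(z ** 2 / (a_star + c)))
            + np.mean(np.log(a_star + c)) + 1.0)

grid = np.logspace(-3, 2, 500)
vals = np.array([L1n(c) for c in grid])
c_hat = grid[np.argmin(vals)]              # grid-based ML estimate of c
```

Plotting `vals` against `grid` reproduces the kind of flat likelihood surfaces discussed below for the unfavorable cases.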

As a typical situation, we draw the likelihood function with respect to the parameter c together with the result of a small simulation in Fig. 4a, where the true values are \(\sigma _v^2=0.4\) and \(\sigma _x^2=0.2\) (case 3-1). We also show the likelihood function and the result for the case where the true values are \(\sigma _v^2=0.1\) and \(\sigma _x^2=0.8\) (case 3-2) in Fig. 4b. For comparison, we give case 3-3 (\(\sigma _v^2=0.2, \sigma _x^2=0.4\)) in Fig. 5a and case 3-4 (\(\sigma _v^2=0.8, \sigma _x^2=0.1\)) in Fig. 5b. The shape of the likelihood function is reasonable in case 3-1 and case 3-4, while it is not so in case 3-2 and case 3-3. In the latter cases, the likelihood function is rather flat around the maximum point, which means that it does not carry much information in a sense.

Fig. 4 (a) (Left) Likelihood function (\(n=100, c=1/2\)); (b) (right) likelihood function (\(n=100, c=8\))

Fig. 5 (a) (Left) Likelihood function (\(n=100, c=2\)); (b) (right) likelihood function (\(n=100, c=1/8\))

Case 3-1 and case 3-4 are the standard ones, and we can expect that the ML estimation under the Gaussian assumption gives a reasonable result. However, case 3-2 and case 3-3 illustrate a problem with the ML estimation. The likelihood function looks flat over a wide range of the parameter space of c, and this causes computational difficulty in finding its maximum. This may explain why Akaike (1989) suggested imposing the restriction \(0< c \le c_u\) for a pre-specified \(c_u\) (in our notation), given the usefulness of the resulting statistical models for time series filtering. Kitagawa (2010) has developed the DECOMP program,Footnote 4 which has a similar restriction on the parameter space. We sometimes obtain an estimated value of c quite near 1.0 when we analyze macro-economic data such as real (quarterly) GDP in Japan.

One of the interesting aspects of the present problem is that the ML method does not necessarily give a satisfactory solution, in the sense of a numerically stable one; this is the case when \(p=1 , \sigma _v^2=0.1\) and \(\sigma _x^2=0.8\). Since the likelihood function is nearly flat over a wide range of the parameter space, it is often difficult to find the maximum in a stable way. In fact, in finite samples there is a positive probability that the estimate is exactly zero when the true value of \(\sigma ^2_v\) is small but not zero. This can be shown easily because the normalized second-order derivative of the log-likelihood function can be approximated by a negative number as \(\tau \;(=1/c)\rightarrow 0\), so that the zero point is a local maximum. As an illustration, we give the histogram of the ML estimates with the restriction \(0<c\le 50\) for the second case in Fig. 6, with 500 replications. It is difficult to find the global maximum of the likelihood function, and there is some probability mass at the boundary point \(c=50\).

Fig. 6 Histogram of ML (\(n=100, c=8\))

On the other hand, it is certainly possible to approximate the log-likelihood function (16) as

$$\begin{aligned} L_n^{SI} = \sum _{k=1}^{m_n} \log \vert a_{k n}^{*}\sigma _v^2+\sigma _x^2\vert ^{-1/2} -\frac{1}{2}\sum _{k=1}^{m_n} \frac{z_k^2}{ a_{k n}^{*} \sigma _v^2+\sigma _x^2 }. \end{aligned}$$
(21)

We impose the requirement \(m_n/n=o(1)\) because \(a_{kn}^{*}=o(1)\) for \(k=1,\ldots ,m_n\) as \(m_n, n\rightarrow \infty\). Then the SIML estimator of \(\sigma _x^2\) (we call this the macro-SIML method) is given by

$$\begin{aligned} {{\hat{\sigma }}}_{x.{\mathrm{SIML}}}^2 =\frac{1}{m_n} \sum _{k=1}^{m_n} z_k^2. \end{aligned}$$
(22)

Since the information on the trend term is separated from the noise term, we expect that the resulting macro-SIML estimation has a robust property.

We note that the macro-SIML estimation of \(\sigma _v^2\) is not the same as the original (finance) SIML method developed by Kunitomo et al. (2018) because \(a_{kn}^{*}=O(1)\) for \(k=n-m_n+1,\ldots ,n\). One way to estimate \(\sigma _v^2\) is to use the fact that

$$\begin{aligned} {\mathcal{E}}\left[ \frac{1}{n}\sum _{k=1}^n z_k^2 \right] = \sigma _x^2+ \left( \frac{1}{n}\sum _{k=1}^n a_{kn}^{*}\right) \sigma _v^2 \longrightarrow \sigma _x^2+ 2\sigma _v^2\quad (n\rightarrow \infty ). \end{aligned}$$
(23)

It is clear that \(a_{kn}^{*}\longrightarrow 0\) when \(k_n/n\rightarrow 0\;(n\rightarrow \infty )\) and \(a_{kn}^{*}\rightarrow 4\) when \(k_n/n\rightarrow 1\;(n\rightarrow \infty )\). Then a possible SIML estimator of \(\sigma _v^2\) is given by

$$\begin{aligned} {{\hat{\sigma }}}_{v.{\mathrm{SIML}}}^2(1) =\frac{1}{2}\left[ \frac{1}{n} \sum _{k=1}^{n} z_k^2- {{\hat{\sigma }}}_{x.{\mathrm{SIML}}}^2\right] , \end{aligned}$$
(24)

with the restriction of non-negativity.

For the estimation problem with high-frequency financial data, Kunitomo et al. (2018) suggested using \({{\hat{\sigma }}}_{v.{\mathrm{SIML}}}^2 =\frac{1}{l_n} \sum _{k=n-l_n}^{n} a_{kn}^{-1} z_k^2\,,\) where \(a_{kn} =n a_{kn}^{*}\) and \(l_n=o(n)\) for the high-frequency asymptotics. In the present macro-SIML case with \(a_{kn}^{*}\), however, it is straightforward to show that \(a_{n-l+1,n}^{*}\rightarrow 4\) as \(n\rightarrow \infty\) for fixed l, and the macro-SIML estimator is given by

$$\begin{aligned} {{\hat{\sigma }}}_{v.{\mathrm{SIML}}}^2(2) = \frac{1}{l_n} \sum _{k=n-l_n+1}^{n} a_{kn}^{*-1}z_k^2 -\frac{1}{4} {{\hat{\sigma }}}_{x.{\mathrm{SIML}}}^2 \end{aligned}$$
(25)

with the restriction of non-negativity.
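The estimators (22), (24) and (25) can be put side by side in a small simulation; the following Python sketch uses \(m_n=l_n=[n^{0.79}]\) as one admissible choice (the sample size, parameter values and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2_x, sigma2_v = 2000, 1.0, 1.0
m = l = int(n ** 0.79)                     # m_n = l_n = [n^0.79]

# simulate y_i = x_i + v_i and transform to z_k as in (6), with x_0 = y_0 = 0
y = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_x), n)) \
    + rng.normal(0.0, np.sqrt(sigma2_v), n)
k = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(k - 0.5, k - 0.5))
z = P @ np.diff(y, prepend=0.0)
a_star = 4.0 * np.sin(0.5 * np.pi * (2 * k - 1) / (2 * n + 1)) ** 2

s2x = np.mean(z[:m] ** 2)                       # eq. (22)
s2v_1 = max(0.5 * (np.mean(z ** 2) - s2x), 0.0) # eq. (24), non-negativity imposed
s2v_2 = max(np.mean(z[-l:] ** 2 / a_star[-l:])  # eq. (25), non-negativity imposed
            - 0.25 * s2x, 0.0)
```

For these parameter values, all three estimates come out reasonably close to the true \(\sigma _x^2=\sigma _v^2=1\).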

3.2 A non-stationary common trend case

The difficulty in the ML estimation becomes more serious in the multivariate non-stationary errors-in-variables models. We illustrate the multivariate aspects of the problem in a simple but important formulation, which can be regarded as a simple extension of the so-called reduced rank regression.

Let \({\mathbf{y}}_{i}\) be the ith observation of the p-dimensional time series (\(i=1,\ldots ,n\)), \({\mathbf{y}}_i={\mathbf{x}}_i+{\mathbf{v}}_i\), and \({\mathbf{Y}}_n=({\mathbf{y}}_i^{\prime })\) be an \(n\times p\;(p > 1)\) matrix of observations. We assume that the vectors \({\mathbf{x}}_i\) satisfy

$$\begin{aligned} {\mathbf{x}}_i={\mathbf{x}}_{i-1}+{\mathbf{w}}_i^{(x)} , \end{aligned}$$
(26)

where \({\mathbf{w}}_i^{(x)}={{\varvec{\pi }}}{\mu }_i^{*}\), \({{\varvec{\pi }}}\) is a (non-zero) \(p\times 1\) vector, \({\mu }_i^{*}\) is a sequence of i.i.d. (one-dimensional) random variablesFootnote 5 following \(N(0,\sigma _{\mu }^2)\), and \({\mathbf{v}}_i\) are i.i.d. (p-dimensional) random variables following \(N_p({{\mathbf{0}}}, {{\varvec{\varSigma }}}_v)\) with non-singular variance–covariance matrix \({{\varvec{\varSigma }}}_v\). We set \(\mathbf{b} =\sigma _{\mu } {{\varvec{\pi }}}\) and \({\mathbf{A}}=a_{kn}^{*}{{\varvec{\varSigma }}}_v\), and then apply the standard matrix formulas: for a positive definite \({\mathbf{A}}\) and non-zero vector \(\mathbf{b}\),

$$\begin{aligned} \vert {\mathbf{A}}+\mathbf{b}{} \mathbf{b}^{\prime }\vert =\vert {\mathbf{A}}\vert [1+\mathbf{b}^{\prime }{} {\mathbf{A}}^{-1}{} \mathbf{b}] \end{aligned}$$
(27)

and

$$\begin{aligned} {[}{} {\mathbf{A}}+\mathbf{b}{} \mathbf{b}^{\prime }]^{-1} ={\mathbf{A}}^{-1}- {\mathbf{A}}^{-1}\mathbf{b}[1+ \mathbf{b}^{\prime }{} {\mathbf{A}}^{-1}{} \mathbf{b}]^{-1} \mathbf{b}^{\prime }{} {\mathbf{A}}^{-1} \end{aligned}$$
(28)

for \({{\varvec{\varSigma }}}_x=\mathbf{b}{} \mathbf{b}^{\prime }\).

The likelihood function \(L_n\) is proportional to \((-1/2)\) times

$$\begin{aligned} L_{1n}= & {} \sum _{k=1}^n\left[ \log \vert a_{kn}^{*}{{\varvec{\varSigma }}}_v\vert + \log ( 1+a_{kn}^{*-1}\mathbf{b}^{\prime } {{\varvec{\varSigma }}}_v^{-1}{} \mathbf{b} ) \right. \\&\left. +\, a_{kn}^{*-1} \mathbf{z}_k^{\prime }{{\varvec{\varSigma }}}_v^{-1} \mathbf{z}_k - \frac{ a_{kn}^{*-1} (\mathbf{z}_k^{\prime }{{\varvec{\varSigma }}}_v^{-1}{} \mathbf{b})^2}{a_{kn}^{*}+\mathbf{b}^{\prime }{{\varvec{\varSigma }}}_v^{-1}{} \mathbf{b} } \right] \\= & {} \sum _{k=1}^n\log \vert a_{kn}^{*}{{\varvec{\varSigma }}}_v\vert +\sum _{k=1}^n a_{kn}^{*-1} \mathbf{z}_k^{\prime }{{\varvec{\varSigma }}}_v^{-1} \mathbf{z}_k \\&+\sum _{k=1}^n\left[ \log ( 1+a_{kn}^{*-1}c ) - \frac{ a_{kn}^{*-1} (\mathbf{z}_k^{\prime }{{\varvec{\varSigma }}}_v^{-1}{} \mathbf{b})^2}{a_{kn}^{*} +c } \right] , \end{aligned}$$

where we denote

$$\begin{aligned} c=\sigma _{\mu }^2{{\varvec{\pi }}}^{\prime } {{\varvec{\varSigma }}}_v^{-1}{{\varvec{\pi }}}. \end{aligned}$$
(29)

We need a normalization for the vector \({{\varvec{\pi }}}\); one possibility is to take \({{\varvec{\pi }}}^{\prime }=(1,-{{\varvec{\theta }}}_2^{\prime })\), but there are other possibilities.

Remark 3.1

When \(p=2,\) we take \({{\varvec{\beta }}}^{\prime }=(1,-{\beta }_2)\) and \({{\varvec{\pi }}}^{\prime }{{\varvec{\beta }}}=0\) with a normalization. Then we can interpret \({{\varvec{\beta }}}^{\prime }{} {\mathbf{y}}_i={{\varvec{\beta }}}^{\prime }{} {\mathbf{v}}_i\;(=u_i)\) (the rank of \({\varvec{\pi }}\) is 1) as the structural equation in time series econometrics. This is because \({\mathbf{y}}_i\) is an I(1) vector while \({{\varvec{\beta }}}^{\prime }{} {\mathbf{y}}_i=u_i\) is an I(0) variable, where d in I(d) (\(d=0,1\)) is the integration order of the time series.

An intuitive way to simplify the present problem of statistical relationships among non-stationary variables and to obtain a solution is to use the moment conditions

$$\begin{aligned} {\mathcal{E}} [ \mathbf{z}_k\mathbf{z}_k^{\prime } ] ={{\varvec{\varSigma }}}_x +o(1)\quad \text{for}\ k=1,\ldots ,m_n \end{aligned}$$

and

$$\begin{aligned} {\mathcal{E}}[ a_{kn}^{*-1} \mathbf{z}_k\mathbf{z}_k^{\prime }] = {{\varvec{\varSigma }}}_v +\frac{1}{4}{{\varvec{\varSigma }}}_x +o(1)\quad \text{for}\ k=n+1-m_n,\ldots ,n. \end{aligned}$$

In the present case, the rank of the matrix \({{\varvec{\varSigma }}}_x\) is one, while the matrix \({{\varvec{\varSigma }}}_v\) has full rank. When \(p=2\) in particular, we can find a vector \({{\varvec{\beta }}}\) uniquely such that \({{\varvec{\varSigma }}}_x{{\varvec{\beta }}}=0\) with a normalization (see Sect. 5.2 for more general cases).

To estimate the structural equation vector \({{\varvec{\beta }}}\), it may then be natural to consider the characteristic equation

$$\begin{aligned}{}[ {\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}} -\lambda {\hat{{\varvec{\varSigma }}}}_{v.{\mathrm{SIML}}}]{\hat{{\varvec{\beta }}}}_{{\mathrm{SIML}}} ={{\mathbf{0}}}, \end{aligned}$$
(30)

and

$$\begin{aligned} {{\hat{\varSigma }}}_{x.{\mathrm{SIML}}}= & {} \frac{1}{m_n} \sum _{k=1}^{m_n} \mathbf{z}_k\mathbf{z}_k^{\prime }, \end{aligned}$$
(31)
$$\begin{aligned} {{\hat{\varSigma }}}_{v.{\mathrm{SIML}}}(1)= & {} \frac{1}{2}\left[ \frac{1}{n} \sum _{k=1}^{n} \mathbf{z}_k\mathbf{z}_k^{\prime }-{{\hat{\varSigma }}}_{x.{\mathrm{SIML}}}\right] , \end{aligned}$$
(32)

or

$$\begin{aligned} {{\hat{\varSigma }}}_{v.{\mathrm{SIML}}}(2) =\frac{1}{l_n} \sum _{k=n+1-l_n}^{n} a_{kn}^{*-1}{} \mathbf{z}_k\mathbf{z}_k^{\prime } -\frac{1}{4} {{\hat{\varSigma }}}_{x.{\mathrm{SIML}}}, \end{aligned}$$
(33)

where \(\mathbf{Z}_n=(\mathbf{z}_k^{\prime }) =\mathbf{P}_n {\mathbf{C}}_n^{-1} ( {\mathbf{Y}}_n-{\mathbf{1}}_n {\bar{{\mathbf{y}}}}_0^{\prime })\) as in (6), \(\lambda\) is the (scalar) eigenvalue, and \({\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}}\) and \({\hat{{\varvec{\varSigma }}}}_{v.{\mathrm{SIML}}}\) are the SIML estimators of \({{\varvec{\varSigma }}}_x\) and \({{\varvec{\varSigma }}}_v\), respectively. We require the condition that these estimators of the variance–covariance matrices are non-negative definite.

Since the rank of \({{\varvec{\varSigma }}}_x\) is degenerate (it is one in the present case), it may be natural to use the smaller eigenvalue, say \(\lambda _1\). The resulting characteristic vector \({\hat{{\varvec{\beta }}}}_{\mathrm{SIML}}\) is called the SIML estimator of \({{\varvec{\beta }}}\) because of (30). Since the estimated variance–covariance matrix of \({{\varvec{\varSigma }}}_v\) should be positive definite, we may have instability in some cases if we use (32) or (33) without imposing a restriction such as non-negative definiteness.
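As an illustration of (30)–(32), the whole procedure can be sketched in a few lines. This is not the authors' code: the transform \(\mathbf{P}_n\mathbf{C}_n^{-1}\) and the weights \(a_{kn}^{*}\) follow the standard SIML construction, and the design values (\(\sigma _{\mu }^2\), \(\beta _2\), \({{\varvec{\varSigma }}}_v={\mathbf{I}}_2\), \(m_n=[n^{0.7}]\)) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

n, alpha = 1000, 0.7
m = int(n ** alpha)                        # m_n = [n^alpha]
beta2, sig2_mu = 1.5, 1.0                  # Sigma_x = sig2_mu * pi pi', pi = (beta2, 1)'
pi_vec = np.array([beta2, 1.0])
Sigma_v = np.eye(2)                        # illustrative noise covariance

# simulate y_i = x_i + v_i with x_i = x_{i-1} + w_i, w_i ~ N(0, Sigma_x), y_0 = 0
w = np.sqrt(sig2_mu) * rng.standard_normal(n)[:, None] * pi_vec
Y = np.cumsum(w, axis=0) + rng.multivariate_normal(np.zeros(2), Sigma_v, size=n)

# z_k: rows of P_n C_n^{-1}(Y_n - 1_n y_0'); C_n^{-1} differences the data and
# P_n is the cosine transform that diagonalizes C_n^{-1} C_n^{-1}'
dY = np.diff(np.vstack([np.zeros(2), Y]), axis=0)
kk = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(kk - 0.5, kk - 0.5))
Z = P @ dY

# (31) and (32): SIML estimators of Sigma_x and Sigma_v
Sx = Z[:m].T @ Z[:m] / m
Sv = 0.5 * (Z.T @ Z / n - Sx)

# (30): beta_hat from the smallest root of det(Sx - lambda * Sv) = 0
L = np.linalg.cholesky(Sv)
Li = np.linalg.inv(L)
lam, U = np.linalg.eigh(Li @ Sx @ Li.T)    # eigenvalues in ascending order
beta = Li.T @ U[:, 0]
beta = beta / beta[0]                      # normalization beta' = (1, -beta_2)
print(Sx, Sv, -beta[1])
```

The eigenvector associated with the smallest root plays the role of \({\hat{{\varvec{\beta }}}}_{\mathrm{SIML}}\) after the normalization \({{\varvec{\beta }}}^{\prime }=(1,-\beta _2)\).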

A simplified (consistent) estimator may be given by

$$\begin{aligned} {\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}} \times {\hat{{\varvec{\beta }}}}_{\mathrm{SIL}}={{\mathbf{0}}}, \end{aligned}$$
(34)

that is,

$$\begin{aligned} {\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}} \times \left[ \begin{array}{c}1\\ -{\hat{{\varvec{\beta }}}}_{2.{\mathrm{SIL}}}\end{array}\right] ={{\mathbf{0}}}. \end{aligned}$$
(35)

We can solve (35) as

$$\begin{aligned} {\hat{{\varvec{\beta }}}}_{2.{\mathrm{SIL}}} = {\hat{{\varvec{\varSigma }}}}_{22 x.{\mathrm{SIML}}}^{-1} {\hat{{\varvec{\varSigma }}}}_{21 x.{\mathrm{SIML}}}, \end{aligned}$$
(36)

where \({\hat{{\varvec{\varSigma }}}}_{22 x.{\mathrm{SIML}}}\) and \({\hat{{\varvec{\varSigma }}}}_{21 x.{\mathrm{SIML}}}\) are the (2, 2)-element and (2, 1)-element of \({\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}}\), respectively. (\({\hat{{\varvec{\varSigma }}}}_{22 x.{\mathrm{SIML}}}\) is positive with probability one.)

This is the least squares method applied to the transformed variables \(\mathbf{z}_k\;(k=1,\ldots ,m_n)\), and hence we call it the separating information least squares (SILS) estimator.
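For a rank-one \({{\varvec{\varSigma }}}_x\), the closed form (36) can be checked directly. A minimal sketch with illustrative values, using the \({{\varvec{\pi }}}=(\beta _2,1)^{\prime }\) parametrization of the simulation design later in the paper:

```python
import numpy as np

# rank-one Sigma_x = sig2 * (beta2, 1)'(beta2, 1) with illustrative values
beta2, sig2 = 1.5, 0.8
pi_vec = np.array([beta2, 1.0])
Sx = sig2 * np.outer(pi_vec, pi_vec)

beta2_sils = Sx[1, 0] / Sx[1, 1]           # (36): Sigma_21 / Sigma_22
beta_hat = np.array([1.0, -beta2_sils])
print(beta2_sils)                           # recovers beta2 = 1.5 exactly
```

Since the matrix is exactly rank one, (35) holds exactly here; with an estimated \({\hat{{\varvec{\varSigma }}}}_{x.{\mathrm{SIML}}}\), (36) solves only the second row of (35).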

Remark 3.2

We note that there have been extensive discussions on alternative estimation methods, including forms similar to the SIML and SILS estimators, for the errors-in-variables models and the single structural equation econometric models with independent observations. Some improvements on the finite sample properties may be possible. See Anderson (1984), Anderson et al. (1986), and Fuller (1987).

4 Gaussian likelihood function and estimation methods

It may be natural to apply the general parametric principle of the maximum likelihood (ML) method. One interesting aspect of the present problem is the fact that the ML method does not necessarily give a satisfactory solution.

As a two-dimensional example, we use the example in Sect. 3.2 and set the true parameter values as \(\sigma _{\mu }^2=0.4\), \(\theta =1.0\), and

$$\begin{aligned} {{\varvec{\varSigma }}}_v =\left( \begin{array}{cc} 0.45&{} 0.23\\ 0.23&{}0.4\end{array} \right) ,\quad {{\varvec{\varSigma }}}_x =\sigma _{\mu }^2{{\varvec{\pi }}}{{\varvec{\pi }}}^{\prime },\quad {{\varvec{\pi }}}=\left( \begin{array}{c}1\\ - \theta \end{array} \right) . \end{aligned}$$

Then we generated a set of simulated observations and drew the Gaussian likelihood function of \(\theta\) in Figs. 7 and 8 for \(n=1000\), given the true values of the other parameters. It is possible to attain the maximum of the likelihood function locally, as shown in Fig. 7. Figure 8 shows the same function and suggests that there is a global maximization problem: we need the right starting point for the maximization. We have investigated the likelihood function in different cases and found that it can take non-concave forms, as illustrated by Fig. 8. We also found that the Gaussian likelihood function is nearly flat with respect to the correlation coefficient of the noise terms, so the maximization with respect to this parameter may be difficult. These are important consequences in the non-stationary errors-in-variables models.

Fig. 7 Likelihood function of \(\theta\) (\(n=1000\))

Fig. 8 Likelihood function of \(\theta\) (\(n=1000\))
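The likelihood profiles behind Figs. 7 and 8 can be reproduced in outline as follows. This sketch (not the authors' code) evaluates the expression for \(L_{1n}\) given earlier over a grid of \(\theta\), holding the other parameters at their true values; the grid, seed, and transform details are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1000
sig2_mu, theta0 = 0.4, 1.0                 # true values of the Sect. 3.2 design
Sv = np.array([[0.45, 0.23], [0.23, 0.40]])

# simulate the model and apply the SIML transform z = P_n C_n^{-1}(Y - 1 y_0')
w = np.sqrt(sig2_mu) * rng.standard_normal(n)[:, None] * np.array([1.0, -theta0])
Y = np.cumsum(w, axis=0) + rng.multivariate_normal(np.zeros(2), Sv, size=n)
dY = np.diff(np.vstack([np.zeros(2), Y]), axis=0)
kk = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(kk - 0.5, kk - 0.5))
Z = P @ dY
a = (2.0 * np.sin(np.pi * (kk - 0.5) / (2 * n + 1))) ** 2    # a*_{kn}

Svi = np.linalg.inv(Sv)
logdet_base = np.sum(2.0 * np.log(a)) + n * np.log(np.linalg.det(Sv))
quad = np.einsum('ki,ij,kj->k', Z, Svi, Z)                   # z_k' Sv^{-1} z_k

def L1n(theta):
    """L_1n with b = sigma_mu * (1, -theta)', other parameters held at truth."""
    b = np.sqrt(sig2_mu) * np.array([1.0, -theta])
    c = b @ Svi @ b
    zb = Z @ (Svi @ b)
    return (logdet_base + np.sum(np.log(1.0 + c / a))
            + np.sum(quad / a) - np.sum(zb ** 2 / (a * (a + c))))

grid = np.linspace(0.2, 2.0, 91)
vals = np.array([L1n(t) for t in grid])
theta_hat = grid[np.argmin(vals)]           # the Gaussian ML minimizes L_1n
print(theta_hat)
```

Plotting `vals` against `grid` gives a profile of the kind shown in the figures.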

It is important to see what happens when the Gaussian assumption is not true, and as an illustration we have drawn one wrong likelihood function in Fig. 9. We generated the trend innovations \({\mathbf{w}}^{(x)}_i=(w^{(x)}_{ji})\) from the uniform distribution on \([-2,+2]\), while the distribution of \({\mathbf{v}}_i\) is normal with correlation coefficient \(\rho =0.3\). As Fig. 9 suggests (\(\rho\) being the correlation coefficient among the measurement errors \({\mathbf{v}}_i\)), the ML estimation of the variance–covariance matrix of the trend components depends crucially on the assumption of Gaussianity, as we had expected. Hence there is a risk in using the ML computation to investigate the relationships among hidden trend variables unless we have strong support for the Gaussianity of the data.

Fig. 9 Wrong likelihood function of \({\rho }\) (\(n=1000\))

Now we investigate the asymptotic properties of \((-1) \times\) the log-likelihood function and the estimation methods (the ML and SIML estimators) when \({{\varvec{\varSigma }}}_x=\mathbf{b}{} \mathbf{b}^{\prime }\;(\mathbf{b}\ne {{\mathbf{0}}})\) (i.e., the rank of \({{\varvec{\varSigma }}}_x\) is 1) and \(p \ge 2\). We normalize the Gaussian log-likelihood function by \(-(1/n)\) and rewrite it as

$$\begin{aligned} L_{1n}^{*}= & {} \frac{1}{n}\sum _{k=1}^n \log \vert a_{kn}^{*}{{\varvec{\varSigma }}}_v\vert \\&+\frac{1}{n}\sum _{k=1}^n \log ( 1+a_{kn}^{*-1}c ) \\&+\frac{1}{n}\sum _{k=1}^na_{kn}^{*-1} {\mathrm{tr}}\left[ {{\varvec{\varSigma }}}_v^{-1} \left( {{\varvec{\varSigma }}}_v-\frac{1}{a_{kn}^{*} +c}\mathbf{b}\mathbf{b}^{\prime }\right) {{\varvec{\varSigma }}}_v^{-1} \left( \mathbf{z}_k\mathbf{z}_k^{\prime } -(a_{kn}^{*}{{\varvec{\varSigma }}}_v(\theta _0) +\mathbf{b}(\theta _0)\mathbf{b}(\theta _0)^{\prime })\right) \right] \\&+\frac{1}{n}\sum _{k=1}^na_{kn}^{*-1} {\mathrm{tr}}\left[ {{\varvec{\varSigma }}}_v^{-1} \left( {{\varvec{\varSigma }}}_v -\frac{1}{a_{kn}^{*}+c}\mathbf{b}\mathbf{b}^{\prime }\right) {{\varvec{\varSigma }}}_v^{-1} (a_{kn}^{*}{{\varvec{\varSigma }}}_v(\theta _0) +\mathbf{b}(\theta _0)\mathbf{b}(\theta _0)^{\prime }) \right] \\= & {} \frac{1}{n}\sum _{k=1}^n \log \vert a_{kn}^{*}{{\varvec{\varSigma }}}_v\vert +\frac{1}{n}\sum _{k=1}^n \log ( 1+a_{kn}^{*-1}c ) +L_{12n}^{*} +L_{13n}^{*}\quad (\text{say}), \end{aligned}$$
(37)

where \({{\varvec{\varSigma }}}_v(\theta _0)\) and \(\mathbf{b}(\theta _0)\) are \({{\varvec{\varSigma }}}_v\) and \(\mathbf{b},\) respectively, evaluated at the true parameter values.

We prepare the next lemma.

Lemma 4.1

Let a \(p\times 1\) random vector \(\mathbf{z}_k\) follow \(N_p({{\mathbf{0}}}, {\mathbf{Q}})\). Then for any \(p\times p\) symmetric matrices \({\mathbf{A}}_k\),

$$\begin{aligned} {\mathcal{E}}[ ({\mathrm{tr}}({\mathbf{A}}_k\mathbf{z}_k\mathbf{z}_k^{\prime }))^2] = [{\mathrm{tr}}({\mathbf{A}}_k{\mathbf{Q}})]^2 +2{\mathrm{tr}}({\mathbf{A}}_k{\mathbf{Q}}{} {\mathbf{A}}_k{\mathbf{Q}}). \end{aligned}$$
(38)
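Lemma 4.1 can be verified by simulation for a symmetric \({\mathbf{A}}_k\); a minimal Monte Carlo sketch with illustrative \({\mathbf{A}}\) and \({\mathbf{Q}}\):

```python
import numpy as np

rng = np.random.default_rng(7)

# illustrative symmetric A and covariance Q
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 0.3], [0.3, 2.0]])

rhs = np.trace(A @ Q) ** 2 + 2.0 * np.trace(A @ Q @ A @ Q)   # (38), exact

z = rng.multivariate_normal(np.zeros(2), Q, size=500_000)
lhs = np.mean(np.einsum('ni,ij,nj->n', z, A, z) ** 2)        # E[(tr(A z z'))^2]
print(lhs, rhs)
```

Here \({\mathrm{tr}}({\mathbf{A}}\mathbf{z}\mathbf{z}^{\prime })=\mathbf{z}^{\prime }{\mathbf{A}}\mathbf{z}\), which the `einsum` call computes for each draw.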

Using this lemma, it is straightforward to see that, as \(n\longrightarrow \infty\), the third term of (37) converges in probability to zero:

$$\begin{aligned} L_{12n}^{*}{\mathop {\longrightarrow }\limits ^{p}} 0. \end{aligned}$$
(39)

Then we can establish the next result on the ML estimation by evaluating the remaining terms of \(L_{1n}^{*}\) as in the Appendix. Although we expect the ML estimation under the Gaussian assumption to have good asymptotic properties, we could not find any proof in the present setting, so we have given one in the Appendix.

Theorem 4.2

For \(p\ge 2\) and the rank of \({{\varvec{\varSigma }}}_x\) being 1, we set \({{\varvec{\varSigma }}}_x=\mathbf{b}\mathbf{b}^{\prime }\) in (1) and (2). Assume that \({\mathbf{v}}_i\;(i=1,\ldots ,n)\) form a sequence of i.i.d. random vectors and \({{\varvec{\varSigma }}}_v\) is a positive definite matrix. Then, under the assumption of Gaussian distributions of \({\mathbf{v}}_i\) and \({\mathbf{w}}_i^{(x)}\), the maximum likelihood estimators of \(\mathbf{b}\) and \({{\varvec{\varSigma }}}_v\) are consistent as \(n\longrightarrow \infty\).

Remark 4.1

It should be noted that in time series econometrics it is known that the coefficient parameter vector \({{\varvec{\beta }}}\) can be estimated by the standard regression method if the observed variables are co-integrated. Johansen (1995) developed the ML method without any noise term and investigated the estimation and inference of the co-integrating vectors.

We should mention that the SIML estimator has not only consistency but also asymptotic normality under standard regularity conditions, such as the existence of fourth-order moments, without the Gaussian assumption.

Theorem 4.3

(Kunitomo and Sato 2017) For \(p\ge 1\), assume the non-stationary errors-in-variables model given by (1) and (2) with \({\mathcal{E}}[v_{ji}^4 ]<+\infty\) and \({\mathcal{E}}[w_{ji}^{(x) 4}]<+\infty\) for \({\mathbf{w}}_i^{(x)}=(w_{ji}^{(x)})\) (\(j=1,\ldots ,p;i=1,\ldots ,n\)). We set \(m_n=[n^{\alpha }]\;(0<\alpha <1)\). Then, the SIML estimator of \({{\varvec{\varSigma }}}_x\) is consistent for \(0< \alpha < 1\) and asymptotically normal for \(0< \alpha <0.8\) as \(m_n\longrightarrow \infty\) (\(n\longrightarrow \infty\)).

We omit the proof because it is parallel to the one given in Chapter 5 of Kunitomo et al. (2018); the details are given in Kunitomo and Sato (2017).

There has not been any finite sample result on the estimation methods for non-stationary time series models with errors-in-variables. Since there are many situations in which macro-economic variables exhibit both non-stationarity and measurement errors, it is worthwhile to investigate the related issues using simulations. As we have seen in Sect. 3, the ML estimator has finite sample instability, so we only report the finite sample properties of the SIML estimation in this section.

We set \(\sigma ^{2}_{\mu } = 1\), \(\sigma ^{2}_{v} = 0.5, 2\), or 4, and \(\beta _2 = 1.5\); we summarize our simulation setting as

$$\begin{aligned} \varSigma _{x}= & {} \left( \begin{array}{cc} \varSigma _{x,11} &{} \varSigma _{x,12} \\ \varSigma _{x,12} &{} \varSigma _{x,22} \\ \end{array} \right) = \sigma _{\mu }^2 \left( \begin{array}{c}\beta _2\\ 1 \end{array}\right) \left( \beta _2 , 1\right) ,\\ \varSigma _{v}= & {} \left( \begin{array}{cc} \varSigma _{v,11} &{} \varSigma _{v,12} \\ \varSigma _{v,12} &{} \varSigma _{v,22} \\ \end{array} \right) . \end{aligned}$$

(The parametrization is slightly different from that of Figs. 7, 8, 9.) We took the cases of \(n = 80\) or 400 and \(m_{n} = [n^{\alpha }]\) with \(\alpha = 0.6\) or 0.7, and the number of Monte Carlo repetitions is 1500 in each case. We summarize the main results of our simulations in Table 1, where the number inside the parentheses is the standard deviation of the estimators calculated from our simulations. We found that the SIML estimates of the trend variance–covariance \(\varSigma _{x.ij}\;(i,j=1,2)\) have reasonable finite sample properties. The SIML estimates of the noise variance–covariance \(\varSigma _{v.ij}\;(i,j=1,2)\), which are based on (22), are also reasonable. The SILS estimate of \(\beta _2\) is slightly biased in comparison with the SIML estimate, but the former has a smaller sample variance than the latter.

Table 1 Finite sample properties of SIML

We have done many simulations, but the results are similar to those of Table 1 in the present formulation. There are several general findings, which are summarized as follows. First, regarding the effect of the sample size, the performance of the SIML estimators becomes better as the sample size increases, as we had expected. Second, when the variances of the noises are small, both the SILS estimator and the SIML estimator give reasonable estimates of the coefficient parameter; the former is slightly biased toward zero, while the latter has some correction of bias. The variability of the SIML estimate in terms of simulation variance is slightly larger than that of the SILS estimate. Third, when the variances of the noises are not small, the SILS estimator has a significant bias.
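These findings can be illustrated with a small Monte Carlo in the spirit of the design above; the sketch below uses fewer repetitions than the 1500 of Table 1 and simplifies the noise covariance to \(\sigma _v^2{\mathbf{I}}_2\), so the numbers are only indicative.

```python
import numpy as np

rng = np.random.default_rng(2)

n, alpha, beta2, sig2_mu = 400, 0.7, 1.5, 1.0
m = int(n ** alpha)
kk = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(kk - 0.5, kk - 0.5))

def sils_once(sig2_v):
    """one replication: simulate, transform, return the SILS estimate of beta_2"""
    w = np.sqrt(sig2_mu) * rng.standard_normal(n)[:, None] * np.array([beta2, 1.0])
    Y = np.cumsum(w, axis=0) + np.sqrt(sig2_v) * rng.standard_normal((n, 2))
    Z = P @ np.diff(np.vstack([np.zeros(2), Y]), axis=0)
    Sx = Z[:m].T @ Z[:m] / m
    return Sx[1, 0] / Sx[1, 1]             # (36)

results = {}
for sig2_v in (0.5, 4.0):
    est = np.array([sils_once(sig2_v) for _ in range(300)])
    results[sig2_v] = est.mean()
    print(sig2_v, est.mean().round(3), est.std().round(3))
```

With the larger noise variance the SILS mean moves visibly below \(\beta _2 = 1.5\), illustrating the attenuation bias noted in the third finding.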

To summarize our simulations, the SIML estimation gives reasonable finite sample performance, as the asymptotic theory of the previous sections suggested.

5 Extensions

There can be several extensions of the problem we have been investigating. We discuss three of them rather briefly.

5.1 On autocorrelations of trend term

The main statistical problem of the time series decomposition models in Kunitomo and Sato (2017) was the estimation of the non-stationary trend and seasonal components. The results do not depend much on the specification of the trend components with measurement errors. When there are autocorrelations in the trend terms in (2), the frequency domain analysis of the underlying errors-in-variables models may give new insight into the issue.

When the sequence of random vectors \({\mathbf{w}}_i^{(x)}\;(i=2,\ldots ,n)\) in (2) follows a stationary stochastic process, we write

$$\begin{aligned} {\mathbf{w}}_i^{(x)}=\sum _{j=0}^{\infty }{} {\mathbf{C}}_j^{(x)}\mathbf{e}_{i-j}^{(x)}, \end{aligned}$$
(40)

where \({\mathbf{C}}_0^{(x)}={\mathbf{I}}_p\), \({\mathbf{C}}_j^{(x)}\;(j\ge 1)\) are absolutely summable coefficient matrices, and \(\mathbf{e}_{j}^{(x)}\;(j\ge 1)\) are a sequence of i.i.d. vectors with \({\mathcal{E}}[\mathbf{e}_j^{(x)}]={{\mathbf{0}}}\) and \({\mathcal{E}} [ \mathbf{e}_j^{(x)}{} \mathbf{e}_j^{(x)'} ]={{\varvec{\varSigma }}}_x\). (We use \(\Vert {\mathbf{C}}_j^{(x)}\Vert =\max _{k,l=1,\ldots ,p}\vert c_j^{(x)}(k,l)\vert\), where \(c_j^{(x)}(k,l)\) is the \((k,l)\)-th element of \({\mathbf{C}}_j^{(x)}\), and require \(\sum _{j=0}^{\infty }\Vert {\mathbf{C}}_j^{(x)}\Vert <+\infty\).)

Then we can represent the spectral density \(p\times p\) matrix of \(\varDelta {\mathbf{x}}_i\) as

$$\begin{aligned} f_{\varDelta x}(\lambda ) =\frac{1}{\pi } \left( \sum _{j=0}^{\infty } {\mathbf{C}}_j^{(x)}{\mathrm{e}}^{2i\lambda j}\right) {{\varvec{\varSigma }}}_x \left( \sum _{j=0}^{\infty } {\mathbf{C}}_j^{(x) '}{\mathrm{e}}^{-2i\lambda j}\right) \quad \left( -\frac{\pi }{2}\le \lambda \le \frac{\pi }{2}\right) , \end{aligned}$$
(41)

where \(i^2=-1\), \(\lambda\) is the frequency, and \({{\varvec{\varSigma }}}_x\) is the variance–covariance matrix of \(\mathbf{e}_j^{(x)}\) (see Chapter 7 of Anderson (1971) for instance).

Then the \(p\times p\) spectral density matrix of the transformed vector process \(\varDelta {\mathbf{y}}_i\;(= {\mathbf{y}}_i-{\mathbf{y}}_{i-1})\) can be also represented as

$$\begin{aligned} f_{\varDelta y}(\lambda ) =f_{\varDelta x}(\lambda ) +\frac{1}{\pi } (1-{\mathrm{e}}^{2i\lambda }) {{\varvec{\varSigma }}}_v (1-{\mathrm{e}}^{-2i\lambda }). \end{aligned}$$
(42)

Since the transformed random vectors \({{\varvec{z}}}_k\;(k=1,\ldots ,n)\) correspond to the Fourier transforms of \(\varDelta {\mathbf{y}}_i\;(i=1,\ldots ,n)\) except for the initial condition \({\mathbf{y}}_0\), it is possible to estimate the spectral density matrix \(f_{\varDelta x}(\lambda )\) at the zero frequency from the sequence of observations \(\mathbf{z}_k\;(k=1,\ldots ,n)\), and we write

$$\begin{aligned} {{\varvec{\varOmega }}}_x= \pi f_{\varDelta x}(0)=\pi f_{\varDelta y}(0). \end{aligned}$$
(43)

We denote the components of \({{\varvec{\varOmega }}}_x=(\varOmega _{x.ij})\;(i,j=1,2)\), and we have the same form of the SIML estimate as in (11), except that the estimated parameter is \({{\varvec{\varOmega }}}_x\) instead of \({{\varvec{\varSigma }}}_x\). Then the asymptotic results stated in Theorem 4.3 hold when we have (1), (2), and (40) under mild regularity conditions, using straightforward but lengthy arguments. This is because the SIML estimation can be regarded as a kernel estimation of the spectral density matrix at the zero frequency.
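As a sketch of this kernel interpretation, when the trend increments follow a stationary AR(1) process the SIML average over the first \(m_n\) transformed observations estimates the long-run variance \(\varOmega _x=\sigma _e^2/(1-\phi )^2\) (here for \(p=1\)); the AR coefficient, noise scale, and sample size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

n, alpha, phi = 2000, 0.6, 0.5
m = int(n ** alpha)

# trend increments w_i follow a stationary AR(1); Omega_x = sigma_e^2/(1-phi)^2
e = rng.standard_normal(n)
w = np.zeros(n)
for i in range(1, n):
    w[i] = phi * w[i - 1] + e[i]
x = np.cumsum(w)                            # I(1) trend with autocorrelated increments
y = x + 0.7 * rng.standard_normal(n)        # additive measurement noise

dY = np.diff(np.concatenate([[0.0], y]))
kk = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(kk - 0.5, kk - 0.5))
z = P @ dY

omega_hat = np.mean(z[:m] ** 2)             # kernel-type estimate of Omega_x
print(omega_hat, 1.0 / (1 - phi) ** 2)
```

The low-frequency components \(\mathbf{z}_k\;(k\le m_n)\) carry the spectrum near the zero frequency, which is why the same average that estimated \({{\varvec{\varSigma }}}_x\) in the i.i.d. case now estimates \({{\varvec{\varOmega }}}_x\).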

5.2 Reduced rank condition

For multivariate non-stationary (economic) time series, there are possibilities of co-integration in trends. In our framework, it may be interesting to consider the general reduced rank case in which

$$\begin{aligned} {\mathrm{rank}}[ {{\varvec{\varSigma }}}_x]=q_x ,\quad 1\le q_x < p, \end{aligned}$$
(44)

where we can represent \({{\varvec{\varSigma }}}_x=\mathbf{B}{} \mathbf{B}^{\prime }\) and \(\mathbf{B}\) is a \(p\times q_x\) matrix (its rank is \(q_x\)).

Then the example in Sect. 3.2 corresponds to the case when \(q_x=1<p\) and \(p\ge 2.\)

In the more general cases, however, there is a parametrization problem for the \(p\times p\) matrix \({{\varvec{\varSigma }}}_x\), whose rank is \(q_x\;(1\le q_x<p,\;p\ge 2)\). We can take a \(p\times r_x\) (\(r_x=p-q_x\)) matrix \({{\varvec{\beta }}}\) such that \(\mathbf{B}^{\prime }{{\varvec{\beta }}}=\mathbf{O}\) with a normalization such as \({{\varvec{\beta }}}^{\prime }{{\varvec{\varSigma }}}_v^{-1}{\varvec{\beta }}=({\mathrm{diag}}\;c_{ii})\), for instance. The algebra of Sect. 3.2 can be extended by using the following matrix formulae: for a positive definite matrix \({\mathbf{A}}\), a \(p\times q_x\) matrix \(\mathbf{B}\) (of rank \(q_x\)) and a \(p\times r_x\) matrix \({{\varvec{\beta }}}\) (of rank \(r_x\)), we have \({{\varvec{\varSigma }}}_x=\mathbf{B}{} \mathbf{B}^{\prime }\), \(\vert {\mathbf{A}}+\mathbf{B}\mathbf{B}^{\prime }\vert =\vert {\mathbf{A}}\vert \vert {\mathbf{I}}_{q_x} +\mathbf{B}^{\prime }{\mathbf{A}}^{-1}{} \mathbf{B} \vert\) and

$$\begin{aligned} {[}{} {\mathbf{A}}+\mathbf{B}{} \mathbf{B}^{\prime }]^{-1} ={\mathbf{A}}^{-1}- {\mathbf{A}}^{-1}\mathbf{B}[ {\mathbf{I}}_{q_x} + \mathbf{B}^{\prime }{} {\mathbf{A}}^{-1}{} \mathbf{B}]^{-1} \mathbf{B}^{\prime }{} {\mathbf{A}}^{-1}. \end{aligned}$$
(45)

Hence we can use the characteristic vectors associated with the \(r_x\) smaller characteristic roots of the equation [which is similar to (30)].
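The determinant identity and (45) are straightforward to check numerically; a minimal sketch with randomly generated matrices of illustrative sizes (\(p=4\), \(q_x=2\)):

```python
import numpy as np

rng = np.random.default_rng(3)

p, qx = 4, 2                                # illustrative sizes
B = rng.standard_normal((p, qx))
A = rng.standard_normal((p, p))
A = A @ A.T + p * np.eye(p)                 # a positive definite A

Ai = np.linalg.inv(A)
# determinant lemma: |A + BB'| = |A| |I_qx + B'A^{-1}B|
lhs_det = np.linalg.det(A + B @ B.T)
rhs_det = np.linalg.det(A) * np.linalg.det(np.eye(qx) + B.T @ Ai @ B)

# (45): Woodbury-type inverse of A + BB'
inv45 = Ai - Ai @ B @ np.linalg.inv(np.eye(qx) + B.T @ Ai @ B) @ B.T @ Ai
print(lhs_det, rhs_det)
```

Both identities reduce the \(p\times p\) problem to a \(q_x\times q_x\) one, which is what makes the reduced rank algebra tractable.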

The special case \(q_x=p-1\) has attracted special attention among econometricians and economists because there exists a unique vector \({{\varvec{\beta }}}\) such that \({{\varvec{\varSigma }}}_x{{\varvec{\beta }}}={{\mathbf{0}}}\) with a normalization. It has been called the single structural equation in traditional econometrics, or the co-integrating relation in time series econometrics, and we can extend the developments of Sect. 3.2. When \(p>2\) and \(0<q_x <p-1\), there are \(r_x\;(=p-q_x)\) co-integrating vectors satisfying \(\mathbf{B}^{\prime }{{\varvec{\beta }}}=\mathbf{O}\), and then there is an identification problem for the vectors consisting of \({{\varvec{\beta }}}\).

Remark 5.1

The problem of the reduced rank condition was a central issue in Anderson (1984) for the case of independent observations in statistical multivariate analysis. For non-stationary time series without measurement errors, Engle and Granger (1987) and Johansen (1995) developed the statistical inference and called the resulting equations from non-stationary multiple time series co-integrating relations. It is possible to develop a testing procedure of the rank condition for hidden non-stationary components.

5.3 Higher order integrated processes

In some cases, second-order (or higher order) differencing may often be appropriate for modelling economic time series. Although the likelihood function can be complicated in general, we can develop the SIML estimation when \(p\ge 1\) and \(d=2\), where

$$\begin{aligned} \varDelta ^d {\mathbf{x}}_i={\mathbf{v}}_i^{(x)}, \end{aligned}$$
(46)

\({\mathcal{E}}[{\mathbf{v}}_i^{(x)}]={{\mathbf{0}}},\) and \({\mathcal{E}}[{\mathbf{v}}_i^{(x)}{} {\mathbf{v}}_i^{(x)'}]={{\varvec{\varSigma }}}_x\).

We use the \({\mathbf{K}}_n\)-transformation from the observation matrix \({\mathbf{Y}}_n\) to \(\mathbf{Z}_n^{(2)}\;(=(\mathbf{z}_k^{(2)'}))\) given by

$$\begin{aligned} \mathbf{Z}_n^{(2)}=\left( \mathbf{z}^{(2)'}_k\right) ={\mathbf{K}}_n \left( {\mathbf{Y}}_n-{\bar{{\mathbf{Y}}}}_0 \right) , \quad {\mathbf{K}}_n=\mathbf{P}_n {\mathbf{C}}_n^{-2} . \end{aligned}$$
(47)

Then the separating information maximum likelihood (SIML) estimator of \({{\varvec{\varSigma }}}_x\) in this case can be defined by

$$\begin{aligned} {\hat{{\varvec{\varSigma }}}}_{x,{\mathrm{SIML}}} =\frac{1}{m_n}\sum _{k=1}^{m_n} \mathbf{z}_k^{(2)}{} \mathbf{z}_k^{(2)'}. \end{aligned}$$
(48)

We prepare the next lemma.

Lemma 5.1

Let

$$\begin{aligned} {\mathbf{K}}_n= (b_{ij}^{(2)})=\mathbf{P}_n {\mathbf{C}}_n^{-2}. \end{aligned}$$
(49)

Then for \(i, i^{\prime }=1,\ldots ,m_n\), we have

$$\begin{aligned} \sum _{j=1}^n b_{ij}^{(2)} b_{i^{\prime } ,j}^{(2)} =\delta (i,i^{\prime }) \left[ 2 \sin \left( \frac{\pi }{2}\frac{2i-1}{2n+1}\right) \right] ^4 +O\left( \frac{1}{n}\right) . \end{aligned}$$
(50)

Using the above lemma, it is straightforward to obtain the next result for the case of \(d\ge 1\). We omit the proof because it is essentially the same as the case of \(d=1\) except for Lemma 5.1. It is clear that the result holds for any positive integer d.

Theorem 5.2

Assume \(p\ge 1\), \(d=2\), and \(m_n/n\longrightarrow 0\) as \(n \longrightarrow \infty\), where \(m_n=[ n^{\alpha }] \;(0<\alpha <1)\). Under the assumption of the existence of fourth-order moments, the SIML estimator of \({{\varvec{\varSigma }}}_x\) is consistent for \(0< \alpha <1\) and asymptotically normal for \(0<\alpha <0.8\) as \(n\longrightarrow \infty\).

It is important to note that the diagonal elements \(a_{kn}^{*}\;(k=1,\ldots ,n)\) should be modified to

$$\begin{aligned} a_{kn}^{(2)}=\left[ 2 \sin \frac{\pi }{2n+1}\left( k-\frac{1}{2}\right) \right] ^4 \end{aligned}$$
(51)

in the present case and we need the corresponding bias correction for estimating the variance–covariance matrix \({{\varvec{\varSigma }}}_v\).
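As a sketch of the \(d=2\) case with \(p=1\), the estimator (48) can be applied to simulated I(2) data; here \({\mathbf{K}}_n=\mathbf{P}_n {\mathbf{C}}_n^{-2}\) is implemented as twice-repeated differencing (with zero initial conditions assumed) followed by the cosine transform, and the numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)

n, alpha = 1000, 0.6
m = int(n ** alpha)
sig2_x, sig2_v = 1.0, 0.5

# I(2) trend plus noise: Delta^2 x_i = w_i, y_i = x_i + v_i, x_0 = x_{-1} = 0
w = np.sqrt(sig2_x) * rng.standard_normal(n)
x = np.cumsum(np.cumsum(w))
y = x + np.sqrt(sig2_v) * rng.standard_normal(n)

# K_n = P_n C_n^{-2}: difference twice (with zero initial values), then transform
d2y = np.diff(np.concatenate([[0.0, 0.0], y]), n=2)
kk = np.arange(1, n + 1)
P = np.sqrt(2.0 / (n + 0.5)) * np.cos(
    (2.0 * np.pi / (2 * n + 1)) * np.outer(kk - 0.5, kk - 0.5))
z2 = P @ d2y

# (48): the d = 2 SIML estimate of Sigma_x from the first m_n components
sig2_x_hat = np.mean(z2[:m] ** 2)
print(sig2_x_hat)
```

The bias correction for \({{\varvec{\varSigma }}}_v\) mentioned above would use the modified weights \(a_{kn}^{(2)}\) in place of \(a_{kn}^{*}\); it is omitted here since the low-frequency terms used for \({{\varvec{\varSigma }}}_x\) are hardly affected.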

Remark 5.2

Akaike (1989) and Kitagawa (2010) proposed to use the ML estimation with some restrictions for the filtering problem of the same models when \(d\ge 1\). The SIML estimation would be a useful tool for the state space modeling of non-stationary multivariate time series because it has no computational problems, requires no restrictions, and has asymptotic robustness. The SIML method may give reasonable estimates not only for the coefficient parameters but also for the variance–covariance matrices, even if we do not know whether \(d=1\) or 2, by taking \(m_n\) appropriately.

6 Concluding remarks

In this study, we introduced the non-stationary errors-in-variables models. We first illustrated why the presence of a noise term in non-stationary time series, even a small one, forces us to change the standard way of thinking in time series analysis. Then, we discussed the finite sample and large sample properties of the ML and SIML estimation methods for the non-stationary errors-in-variables models when there are non-stationary trends and noise components (or measurement errors). We found that the Gaussian likelihood function has a non-concave shape in some cases and that the ML method works when the Gaussianity of the non-stationary and stationary components holds, with some restriction such as the signal–noise variance ratio in the parameter space. The SIML estimation has asymptotic robustness under the general conditions of the existence of fourth-order moments for consistency and asymptotic normality. We investigated the conditions for the asymptotic properties of the ML and SIML estimations and reported some simulations. The SIML method gives reasonable estimates of the coefficient parameters when the random variables do not necessarily follow the Gaussian distribution and when we do not have much knowledge of the value of the variance ratio c, by taking \(m_n\) appropriately. Since we usually do not have advance information on the magnitude of the signal–noise variance ratio and the precise distribution of the noises when we observe non-stationary data, it is important for practical purposes to use a statistical method that does not depend on these conditions.

There are several possible extensions, and some discussions are given in Sects. 3 and 5. It may be interesting to see whether the results reported in this paper remain valid when there are non-stationary seasonal components. There can be many applications of the methods we have discussed because many economists are interested in macro-economic data and their relations using seasonally adjusted data. Another issue is that some econometricians have relied on the large sample asymptotic theory and used approximations based on Brownian motions, which are the limits of random walks. Although there are some procedures based on the quadratic variation [see (12)] of the observed data to estimate the variance–covariances of the innovations and the long-run variance–covariances, there is a fundamental problem, as illustrated by the examples in Sect. 2.2.

If we ignore the presence of noise components and/or measurement errors, there could be serious problems. For instance, the actual sample size of macro-economic data is usually not large, and we often have measurement errors. In such situations we should be careful in using such asymptotic theory, and we need to investigate the finite sample properties of alternative estimation methods and their improvements. The precise consequences of these effects are currently under investigation, and some progress on the filtering and smoothing problem of noisy non-stationary multivariate time series has been reported in Kunitomo and Sato (2019).