Abstract
We derive novel explicit formulas for the inverses of truncated block Toeplitz matrices that correspond to a multivariate minimal stationary process. The main ingredients of the formulas are the Fourier coefficients of the phase function attached to the spectral density of the process. The derivation of the formulas is based on a recently developed finite prediction theory applied to the dual process of the stationary process. We illustrate the usefulness of the formulas by two applications. The first one is a strong convergence result for solutions of general block Toeplitz systems for a multivariate short-memory process. The second application is closed-form formulas for the inverses of truncated block Toeplitz matrices corresponding to a multivariate ARMA process. The significance of the latter is that they provide us with a linear-time algorithm to compute the solutions of corresponding block Toeplitz systems.
1 Introduction
Let \({\mathbb {T}}:=\{z\in {\mathbb {C}}:\vert z\vert =1\}\) be the unit circle in \({\mathbb {C}}\). We write \(\sigma \) for the normalized Lebesgue measure \(d\theta /(2\pi )\) on \(([-\pi ,\pi ), {\mathcal {B}}([-\pi ,\pi )))\), where \({\mathcal {B}}([-\pi ,\pi ))\) is the Borel \(\sigma \)-algebra on \([-\pi ,\pi )\); thus we have \(\sigma ([-\pi ,\pi ))=1\). For \(p\in [1,\infty )\), we write \(L_p({\mathbb {T}})\) for the Lebesgue space of measurable functions \(f:{\mathbb {T}}\rightarrow {\mathbb {C}}\) such that \(\Vert f\Vert _p<\infty \), where \(\Vert f\Vert _p:=\{\int _{-\pi }^{\pi }\vert f(e^{i\theta })\vert ^p \sigma (d\theta )\}^{1/p}\). Let \(L_p^{m\times n}({\mathbb {T}})\) be the space of \({\mathbb {C}}^{m\times n}\)-valued functions on \({\mathbb {T}}\) whose entries belong to \(L_p({\mathbb {T}})\).
Let \(d\in {\mathbb {N}}\). For \(n\in {\mathbb {N}}\), we consider the block Toeplitz matrix
where
and the symbol w satisfies the following two conditions:
Let \(\{X_k:k\in {\mathbb {Z}}\}\) be a \({\mathbb {C}}^d\)-valued, centered, weakly stationary process that has spectral density w, hence autocovariance function \(\gamma \). Then the conditions (1.2) and (1.3) imply that \(\{X_k\}\) is minimal (see Sect. 10 of [21, Chapter II]).
In this paper, we establish novel explicit formulas for \(T_n(w)^{-1}\) (Theorem 2.1), which are especially useful for large n (see [2]). The formulas are new even for \(d=1\). The main ingredients of the formulas are the Fourier coefficients of \(h^*h_{\sharp }^{-1}=h^{-1}h_{\sharp }^*\), where h and \(h_{\sharp }\) are \({\mathbb {C}}^{d\times d}\)-valued outer functions on \({\mathbb {T}}\) such that
(see [10]; see also Sect. 2). We note that the unitary matrix valued function \(h^*h_{\sharp }^{-1}=h^{-1}h_{\sharp }^*\) on \({\mathbb {T}}\) attached to w is called the phase function of w (see page 428 in [20]).
Let \(\{X_k\}\) be as above, and let \(\{X^{\prime }_k: k\in {\mathbb {Z}}\}\) be the dual process of \(\{X_k\}\) (see [19]; see also Sect. 2 below). In the proof of the above explicit formulas for \(T_n(w)^{-1}\), the dual process \(\{X^{\prime }_k\}\) plays an important role. In fact, the key to the proof of the explicit formulas for \(T_n(w)^{-1}\) is the following equality (Theorem 3.1):
Here, \(\langle \cdot , \cdot \rangle \) stands for the Gram matrix (see Sect. 3) and \(P_{[1,n]}X^{\prime }_t\) denotes the best linear predictor of \(X^{\prime }_t\) based on the observations \(X_{1},\dots ,X_{n}\) (see Sect. 2 for the precise definition). Moreover, for \(n\in {\mathbb {N}}\), \(A \in {\mathbb {C}}^{dn \times dn}\) and \(s, t\in \{1,\dots ,n\}\), we write \(A^{s,t}\in {\mathbb {C}}^{d\times d}\) for the (s, t) block of A; thus \(A = (A^{s,t})_{1\le s, t\le n}\). The equality (1.5) enables us to apply the \(P_{[1,n]}\)-related methods developed in [11, 12, 14, 15, 16] and others to derive the explicit formulas for \(T_n(w)^{-1}\).
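The equality (1.5) can be read as an infinite-dimensional version of an elementary linear-algebra fact: the Gram matrix of a dual (biorthogonal) basis is the inverse of the Gram matrix of the original basis; within \(V^X_{[1,n]}\), the projections \(P_{[1,n]}X^{\prime }_t\) form exactly the dual basis of \(X_1,\dots ,X_n\). The following minimal numerical sketch of the finite-dimensional fact is an illustration only, not the paper's construction:

```python
import numpy as np

# Finite-dimensional analogue (illustration only): if vectors x_1,...,x_n have
# Gram matrix G, then the dual vectors x'_s = sum_k (G^{-1})_{sk} x_k satisfy
# <x'_s, x_t> = delta_{st}, and the Gram matrix of the duals is G^{-1}.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))   # rows: a (generically well-conditioned) basis of R^5
G = X @ X.T                       # Gram matrix (<x_s, x_t>)
Xd = np.linalg.inv(G) @ X         # rows: the dual basis x'_1,...,x'_5

assert np.allclose(Xd @ X.T, np.eye(5))           # biorthogonality
assert np.allclose(Xd @ Xd.T, np.linalg.inv(G))   # Gram of duals = G^{-1}
```

In the paper's setting, the role of G is played by \(T_n(w)\) and the role of the dual basis by \((P_{[1,n]}X^{\prime }_t)_{t=1}^n\), which is precisely how (1.5) is proved in Sect. 3.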
We illustrate the usefulness of the explicit formulas for \(T_n(w)^{-1}\) by two applications. The first one is a strong convergence result for solutions of block Toeplitz systems. For this application, we assume (1.2) as well as the following condition:
Here, for \(a\in {\mathbb {C}}^{d\times d}\), \(\Vert a\Vert \) denotes the operator norm of a. The condition (1.6) implies that \(\{X_k\}\) with spectral density w is a short-memory process. We note that (1.3) follows from (1.2) and (1.6) (see Sect. 4). Under (1.2) and (1.6), for \(n\in {\mathbb {N}}\) and a \({\mathbb {C}}^{d\times d}\)-valued sequence \(\{y_k\}_{k=1}^{\infty }\) such that \(\sum _{k=1}^{\infty } \Vert y_k\Vert < \infty \), let
be the solution to the block Toeplitz system
where
Also, let
be the solution to the corresponding infinite block Toeplitz system
where
and
Then, our result (Theorem 4.1) reads as follows:
We explain the background of the result (1.14). As above, let \(\{X_k:k\in {\mathbb {Z}}\}\) be a \({\mathbb {C}}^d\)-valued, centered, weakly stationary process that has spectral density w. For \(n\in {\mathbb {N}}\), the finite and infinite predictor coefficients \(\phi _{n,k}\in {\mathbb {C}}^{d\times d}\), \(k\in \{1,\dots ,n\}\), and \(\phi _k\), \(k\in {\mathbb {N}}\), of \(\{X_k\}\) are defined by
respectively; see Sect. 3 for the precise definitions of \(P_{[1,n]}\) and \(P_{(-\infty ,n]}\). We note that \(\sum _{k=1}^{\infty } \Vert \phi _k\Vert < \infty \) holds under (1.2) and (1.6) (see Sect. 4 below and (2.16) in [16]). Baxter’s inequality in [1, 5, 9] states that, under (1.2) and (1.6), there exists \(K\in (0,\infty )\) such that
In particular, we have
If we put \({\tilde{w}}(e^{i\theta }):=w(e^{-i\theta })\), then \((\phi _{n,1},\dots ,\phi _{n,n})\) is the solution to the block Toeplitz system
called the Yule–Walker equation, while \((\phi _{1},\phi _{2}, \dots )\) is the solution to the corresponding infinite block Toeplitz system
Clearly, \({\tilde{w}}\) satisfies (1.2) and (1.6) since w does. Therefore, our result (1.14) can be viewed as an extension of (1.16). It should be noted, however, that we prove (1.14) directly, without proving an analogue of Baxter’s inequality (1.15).
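The convergence (1.16) is easy to observe numerically. The sketch below is an illustration with hypothetical parameter values, for \(d=1\): it uses an invertible MA(1) process \(X_k=\xi _k+\theta \xi _{k-1}\) (which satisfies (1.2) and (1.6)), whose infinite predictor coefficients are \(\phi _k=-(-\theta )^k\), and solves the Yule–Walker Toeplitz system for the finite predictor coefficients \(\phi _{n,k}\).

```python
import numpy as np

# Illustrative scalar (d = 1) example: invertible MA(1), X_k = xi_k + theta*xi_{k-1},
# Var(xi_k) = v, with hypothetical parameter values theta = 0.6, v = 1.
theta, v = 0.6, 1.0
gamma = lambda k: v * ((1 + theta**2) if k == 0 else theta if abs(k) == 1 else 0.0)

def finite_predictor(n):
    # finite predictor coefficients phi_{n,1},...,phi_{n,n}: solve the
    # Yule-Walker Toeplitz system Gamma_n * phi = (gamma(1),...,gamma(n))^T
    G = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
    return np.linalg.solve(G, np.array([gamma(k) for k in range(1, n + 1)]))

phi_inf = lambda k: -(-theta) ** k   # infinite predictor coefficients of MA(1)

errs = [sum(abs(finite_predictor(n)[k - 1] - phi_inf(k)) for k in range(1, n + 1))
        for n in (5, 40)]
print(errs)   # the l1 error shrinks as n grows, as (1.16) asserts
```

For this short-memory example the \(\ell _1\) error decays geometrically in n, consistent with Baxter's inequality (1.15).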
The convergence result (1.16) has various applications in time series analysis, such as the autoregressive sieve bootstrap (see, e.g., [16] and the references therein), while Toeplitz systems of the form (1.8) appear in various fields, such as the filtering of signals. Therefore, the extension (1.14), as well as the other results explained below, may potentially be useful in such fields. We note that Baxter’s inequality (1.15), hence (1.16), has also been proved for univariate and multivariate FARIMA (fractional autoregressive integrated moving-average) processes, which are long-memory processes, in [14] and [16], respectively. FARIMA processes have singular spectral densities w, but our explicit formulas for \(T_n(w)^{-1}\) above cover them as well, since the formulas assume only minimality. Applications of the explicit formulas to univariate and multivariate FARIMA processes will be discussed elsewhere. However, the problem of proving results of the type (1.14) for FARIMA processes remains open.
The second application of the explicit formulas for \(T_n(w)^{-1}\) is closed-form formulas for \(T_n(w)^{-1}\) with rational w that corresponds to a univariate (\(d=1\)) or multivariate (\(d\ge 2\)) ARMA (autoregressive moving-average) process (Theorem 5.2). More precisely, we assume that w is of the form
where \(h:{\mathbb {T}}\rightarrow {\mathbb {C}}^{d\times d}\) satisfies the following condition:
Here \({\overline{{\mathbb {D}}}}:=\{z\in {\mathbb {C}}:\vert z\vert \le 1\}\) is the closed unit disk in \({\mathbb {C}}\). The closed-form formulas for \(T_n(w)^{-1}\) consist of several building block matrices that are of fixed sizes independent of n. The significance of the formulas for \(T_n(w)^{-1}\) is that they provide us with a linear-time, or O(n), algorithm to compute the solution \(Z\in {\mathbb {C}}^{dn\times d}\) to the block Toeplitz system
for \(Y\in {\mathbb {C}}^{dn\times d}\) (see Sect. 6). The well-known Durbin–Levinson algorithm solves Eq. (1.19) for more general w in \(O(n^2)\) time. Algorithms for Toeplitz linear systems that run faster than \(O(n^2)\) are called superfast. While our algorithm is restricted to the class of w corresponding to ARMA processes, this class is important in applications, and the linear-time algorithm is superfast in the ideal sense that no algorithm can run faster than O(n).
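For orientation, the \(O(n^2)\) Durbin–Levinson recursion mentioned above can be sketched as follows in the scalar symmetric positive-definite case. This is an illustrative implementation of the classical recursion (following the textbook formulation of Golub and Van Loan), not the linear-time algorithm of this paper; the sequence r and right-hand side b below are arbitrary example data.

```python
def levinson_solve(r, b):
    # Solve T x = b for the symmetric positive-definite Toeplitz matrix
    # T[i][j] = r[|i - j|], by the classical Levinson recursion in O(n^2) flops.
    n = len(b)
    rho = [ri / r[0] for ri in r]        # normalize so the diagonal of T is 1
    x = [b[0]]
    y = [-rho[1]] if n > 1 else []
    beta = 1.0
    alpha = -rho[1] if n > 1 else 0.0
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha
        mu = (b[k] - sum(rho[j + 1] * x[k - 1 - j] for j in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            alpha = -(rho[k + 1]
                      + sum(rho[j + 1] * y[k - 1 - j] for j in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return [xi / r[0] for xi in x]       # undo the normalization

# example data: a 4 x 4 positive-definite Toeplitz system
r = [2.0, 1.0, 0.5, 0.25]
b = [1.0, 2.0, 3.0, 4.0]
x = levinson_solve(r, b)
```

Each of the n steps costs O(k) operations, which gives the \(O(n^2)\) total cost quoted above.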
Toeplitz matrices appear in a variety of fields, including operator theory, orthogonal polynomials on the unit circle, time series analysis, engineering, and physics. Therefore, there is a vast literature on Toeplitz matrices. Here, we refer to [2, 3, 6, 8, 22, 23] and [24] as textbook treatments. For example, in [6, III], the Gohberg–Semencul formulas in [7], which express the inverse of a Toeplitz matrix as a difference of products of lower and upper triangular Toeplitz matrices, are explained.
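The Gohberg–Semencul representation just mentioned is easy to check numerically in the scalar case. The sketch below is an illustration of the classical formula, not a formula of this paper: it rebuilds \(T^{-1}\) from its first and last columns via lower and upper triangular Toeplitz factors, under the assumption that the top-left entry of \(T^{-1}\) is nonzero.

```python
import numpy as np

def lower_toeplitz(c):
    # lower-triangular Toeplitz matrix with first column c
    n = len(c)
    return np.array([[c[i - j] if i >= j else 0.0 for j in range(n)]
                     for i in range(n)])

def gohberg_semencul_inverse(T):
    # Classical scalar Gohberg-Semencul formula (illustrative sketch):
    # T^{-1} = (M1 M2 - M3 M4) / x[0], where x = T^{-1} e_1, y = T^{-1} e_n
    # and the M's are triangular Toeplitz matrices built from x and y.
    n = T.shape[0]
    e1, en = np.zeros(n), np.zeros(n)
    e1[0], en[-1] = 1.0, 1.0
    x, y = np.linalg.solve(T, e1), np.linalg.solve(T, en)
    M1 = lower_toeplitz(x)                                    # first column x
    M2 = lower_toeplitz(y[::-1]).T                            # first row (y_n,...,y_1)
    M3 = lower_toeplitz(np.concatenate(([0.0], y[:-1])))      # first column (0,y_1,...,y_{n-1})
    M4 = lower_toeplitz(np.concatenate(([0.0], x[:0:-1]))).T  # first row (0,x_n,...,x_2)
    return (M1 @ M2 - M3 @ M4) / x[0]

T = np.array([[2.0, 1.0, 0.5, 0.25],
              [1.0, 2.0, 1.0, 0.5],
              [0.5, 1.0, 2.0, 1.0],
              [0.25, 0.5, 1.0, 2.0]])
```

For a positive-definite Toeplitz T, \(x_1=(T^{-1})_{11}>0\), so the assumption holds automatically in that case.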
After this work was completed, the author learned of [25] by Subba Rao and Yang, where they also provide an explicit series expansion for \(T_n(w)^{-1}\) that corresponds to a univariate stationary process satisfying some conditions (see [25], Sect. 3.2). The main aim of [25] is to reconcile the Gaussian and Whittle likelihood, and the series expansion in [25] is tailored to this purpose, using the complete DFT (discrete Fourier transform) introduced in [25]. It should be noted that \(T_n(w)^{-1}\) appears in the Gaussian likelihood, while the Whittle likelihood is based on the ordinary DFT. Since most results of the present paper directly concern \(T_n(w)^{-1}\), some of them may also be useful for studies related to the Gaussian likelihood.
This paper is organized as follows. We state the explicit formulas for \(T_n(w)^{-1}\) in Sect. 2. In Sect. 3, we first prove (1.5) and then use it to prove the explicit formulas for \(T_n(w)^{-1}\). In Sect. 4, we prove (1.14) for w satisfying (1.2) and (1.6), using the explicit formulas for \(T_n(w)^{-1}\). In Sect. 5, we prove the closed-form formulas for \(T_n(w)^{-1}\) with w satisfying (1.18), using the explicit formulas for \(T_n(w)^{-1}\). In Sect. 6, we explain how the results in Sect. 5 give a linear-time algorithm to compute the solution to (1.19). Finally, the Appendix contains the omitted proofs of two lemmas.
2 Explicit formulas
Let \({\mathbb {C}}^{m\times n}\) be the set of all complex \(m\times n\) matrices; we write \({\mathbb {C}}^d\) for \({\mathbb {C}}^{d\times 1}\). Let \(I_n\) be the \(n\times n\) unit matrix. For \(a\in {\mathbb {C}}^{m\times n}\), \(a^{\top }\) denotes the transpose of a, and \({\overline{a}}\) and \(a^*\) the complex and Hermitian conjugates of a, respectively; thus, in particular, \(a^*:={\overline{a}}^{\top }\). For \(a\in {\mathbb {C}}^{d\times d}\), we write \(\Vert a\Vert \) for the operator norm of a:
Here \(\vert u\vert :=(\sum _{i=1}^d\vert u^i\vert ^2)^{1/2}\) denotes the Euclidean norm of \(u=(u^1,\dots ,u^d)^{\top }\in {\mathbb {C}}^d\). For \(p\in [1,\infty )\) and \(K\subset {\mathbb {Z}}\), \(\ell _p^{d\times d}(K)\) denotes the space of \({\mathbb {C}}^{d\times d}\)-valued sequences \(\{a_k\}_{k\in K}\) such that \(\sum _{k\in K}\Vert a_k\Vert ^p<\infty \). We write \(\ell _{p+}^{d\times d}\) for \(\ell _p^{d\times d}({\mathbb {N}}\cup \{0\})\) and \(\ell _{p+}\) for \(\ell _{p+}^{1\times 1}=\ell _p^{1\times 1}({\mathbb {N}}\cup \{0\})\).
Recall \(\sigma \) from Sect. 1. The Hardy class \(H_2({\mathbb {T}})\) on \({\mathbb {T}}\) is the closed subspace of \(L_2({\mathbb {T}})\) consisting of \(f\in L_2({\mathbb {T}})\) such that \(\int _{-\pi }^{\pi }e^{im\theta }f(e^{i\theta })\sigma (d\theta )=0\) for \(m=1,2,\dots \). Let \(H_2^{m\times n}({\mathbb {T}})\) be the space of \({\mathbb {C}}^{m\times n}\)-valued functions on \({\mathbb {T}}\) whose entries belong to \(H_2({\mathbb {T}})\). Let \({\mathbb {D}}:=\{z\in {\mathbb {C}}: \vert z\vert {<}1\}\) be the open unit disk in \({\mathbb {C}}\). We write \(H_2({\mathbb {D}})\) for the Hardy class on \({\mathbb {D}}\), consisting of holomorphic functions f on \({\mathbb {D}}\) such that \(\sup _{r\in [0,1)}\int _{-\pi }^{\pi }\vert f(re^{i\theta })\vert ^2\sigma (d\theta )<\infty \). As usual, we identify each function f in \(H_2({\mathbb {D}})\) with its boundary function \(f(e^{i\theta }):=\lim _{r\uparrow 1}f(re^{i\theta })\), \(\sigma \)-a.e., in \(H_2({\mathbb {T}})\). A function h in \(H_2^{d\times d}({\mathbb {T}})\) is called outer if \(\det h\) is a \({\mathbb {C}}\)-valued outer function, that is, \(\det h\) satisfies \(\log \vert \det h(0)\vert =\int _{-\pi }^{\pi }\log \vert \det h(e^{i\theta })\vert \sigma (d\theta )\) (see Definition 3.1 in [18]).
We assume that w satisfies (1.2) and (1.3). Then \(\log \det w\) is in \(L_1({\mathbb {T}})\) (see Sect. 3 in [16]). Therefore w has the decompositions (1.4) for two outer functions h and \(h_{\sharp }\) belonging to \(H_2^{d\times d}({\mathbb {T}})\), and h and \(h_{\sharp }\) are unique up to constant unitary factors (see Chapter II in [21] and Theorem 11 in [10]; see also Sect. 3 in [16]). We may take \(h_{\sharp }=h\) in the case \(d=1\), but there is no such simple relation between h and \(h_{\sharp }\) for \(d\ge 2\). We define the outer function \({\tilde{h}}\) in \(H_2^{d\times d}({\mathbb {T}})\) by
All of \(h^{-1}\), \(h_{\sharp }^{-1}\) and \({\tilde{h}}^{-1}\) also belong to \(H_2^{d\times d}({\mathbb {T}})\) since we have assumed (1.3).
We define four \({\mathbb {C}}^{d\times d}\)-valued sequences \(\{c_k\}\), \(\{a_k\}\), \(\{{\tilde{c}}_k\}\) and \(\{{\tilde{a}}_k\}\) by
and
respectively. By (1.3), all of \(\{c_k\}\), \(\{a_k\}\), \(\{{\tilde{c}}_k\}\) and \(\{{\tilde{a}}_k\}\) belong to \(\ell _{2+}^{d\times d}\).
We define a \({\mathbb {C}}^{d\times d}\)-valued sequence \(\{\beta _k\}_{k=-\infty }^{\infty }\) as (the negatives of) the Fourier coefficients of the phase function \(h^*h_{\sharp }^{-1}=h^{-1}h_{\sharp }^*\):
For \(n\in {\mathbb {N}}\), \(u \in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we can define the sequences \(\{b_{n,u,\ell }^k\}_{\ell =0}^{\infty }\in \ell _{2+}^{d\times d}\) by the recursion
(see Sect. 3 below). Similarly, for \(n\in {\mathbb {N}}\), \(u \in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we can define the sequences \(\{{\tilde{b}}_{n,u,\ell }^k\}_{\ell =0}^{\infty }\in \ell _{2+}^{d\times d}\) by the recursion
Recall from Sect. 1 that \((T_n(w)^{-1})^{s,t}\) denotes the (s, t) block of \(T_n(w)^{-1}\). Since \(T_n(w)\), hence \(T_n(w)^{-1}\), is self-adjoint, we have
We use the following notation:
We are ready to state the explicit formulas for \(T_n(w)^{-1}\).
Theorem 2.1
We assume (1.2) and (1.3). Then the following two assertions hold.
-
(i)
For \(n\in {\mathbb {N}}\) and \(s, t\in \{1,\dots ,n\}\), we have
$$\begin{aligned}&\left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\ell = 1}^{s\wedge t} {\tilde{a}}_{s - \ell }^* {\tilde{a}}_{t - \ell } \nonumber \\&\qquad + \sum _{u=1}^t \sum _{k=1}^{\infty } \left\{ \sum _{\ell = 0}^{\infty } {\tilde{b}}_{n,u,\ell }^{2k-1} a_{n + 1 - s + \ell } + \sum _{\ell = 0}^{\infty } {\tilde{b}}_{n,u,\ell }^{2k} {\tilde{a}}_{s + \ell } \right\} ^* {\tilde{a}}_{t-u}. \end{aligned}$$(2.10) -
(ii)
For \(n\in {\mathbb {N}}\) and \(s, t\in \{1,\dots ,n\}\), we have
$$\begin{aligned}&\left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\ell =s\vee t}^n a_{\ell - s}^* a_{\ell - t} \nonumber \\&\qquad + \sum _{u=t}^n \sum _{k=1}^{\infty } \left\{ \sum _{\ell = 0}^{\infty } b_{n,u,\ell }^{2k-1} {\tilde{a}}_{s + \ell } + \sum _{\ell = 0}^{\infty } b_{n,u,\ell }^{2k} a_{n + 1 - s + \ell } \right\} ^* a_{u-t}. \end{aligned}$$(2.11)
The proof of Theorem 2.1 will be given in Sect. 3.
Corollary 2.1
We assume (1.2) and (1.3). Then the following two assertions hold.
-
(i)
For \(n\in {\mathbb {N}}\) and \(s, t\in \{1,\dots ,n\}\), we have
$$\begin{aligned}&\left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\ell = 1}^{s\wedge t} {\tilde{a}}_{s - \ell }^* {\tilde{a}}_{t - \ell } \nonumber \\&\qquad + \sum _{u=1}^s {\tilde{a}}_{s-u}^* \sum _{k=1}^{\infty } \left\{ \sum _{\ell = 0}^{\infty } {\tilde{b}}_{n,u,\ell }^{2k-1} a_{n + 1 - t + \ell } + \sum _{\ell = 0}^{\infty } {\tilde{b}}_{n,u,\ell }^{2k} {\tilde{a}}_{t + \ell } \right\} . \end{aligned}$$(2.12) -
(ii)
For \(n\in {\mathbb {N}}\) and \(s, t\in \{1,\dots ,n\}\), we have
$$\begin{aligned}&\left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\ell =s\vee t}^n a_{\ell - s}^* a_{\ell - t} \nonumber \\&\qquad + \sum _{u=s}^n a_{u-s}^* \sum _{k=1}^{\infty } \left\{ \sum _{\ell = 0}^{\infty } b_{n,u,\ell }^{2k-1} {\tilde{a}}_{t + \ell } + \sum _{\ell = 0}^{\infty } b_{n,u,\ell }^{2k} a_{n + 1 - t + \ell } \right\} . \end{aligned}$$(2.13)
Proof
Thanks to (2.9), we obtain (2.12) and (2.13) from (2.10) and (2.11), respectively. \(\square \)
Remark 2.1
Recall \(T_{\infty }(w)\) from (1.12). For \(n\in {\mathbb {N}}\cup \{0\}\), we have \(\gamma (n)=\sum _{k=0}^{\infty } {\tilde{c}}_k {\tilde{c}}_{n+k}^*\) and \(\gamma (-n)=\sum _{k=0}^{\infty } {\tilde{c}}_{n+k} {\tilde{c}}_{k}^*\) (see (2.13) in [16]), hence \(T_{\infty }(w) = {\tilde{C}}_{\infty } ({\tilde{C}}_{\infty })^*\), where
On the other hand, it follows from \({\tilde{h}}(z){\tilde{h}}(z)^{-1}=I_d\) that \(\sum _{k=0}^n {\tilde{c}}_k {\tilde{a}}_{n-k} = -\delta _{n0}I_d\) for \(n\in {\mathbb {N}}\cup \{0\}\), hence \({\tilde{C}}_{\infty } {\tilde{A}}_{\infty } = -I_{\infty }\), where
Combining, we have \(T_{\infty }(w)^{-1} = ({\tilde{A}}_{\infty })^* {\tilde{A}}_{\infty }\). Thus, we find that the first term \(\sum _{\ell = 1}^{s\wedge t} {\tilde{a}}_{s - \ell }^* {\tilde{a}}_{t - \ell }\) in (2.10) or (2.12) coincides with the (s, t) block of \(T_{\infty }(w)^{-1}\).
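Remark 2.1 can be illustrated numerically in the simplest scalar case. The sketch below is an illustration with hypothetical parameter values, not part of the derivation: it takes the real AR(1) process \(X_k=\phi X_{k-1}+\xi _k\) with noise variance v, for which \(a_0=-1/\sqrt{v}\), \(a_1=\phi /\sqrt{v}\), \(a_k=0\) otherwise, and \({\tilde{a}}_k=a_k\), and checks that the entries of \(T_n(w)^{-1}\) with \(t\le n-1\) coincide with the first term \(\sum _{\ell =1}^{s\wedge t}{\tilde{a}}_{s-\ell }^*{\tilde{a}}_{t-\ell }\), i.e., with the corresponding entries of \(T_{\infty }(w)^{-1}\) (cf. Theorem 5.1 with \(m_0=1\), where the correction series vanishes exactly).

```python
import numpy as np

# Illustrative real scalar AR(1): X_k = phi*X_{k-1} + xi_k, Var(xi) = v, so that
# h(z) = sqrt(v)/(1 - phi*z), h(z)^{-1} = -(a_0 + a_1 z), with
# a_0 = -1/sqrt(v), a_1 = phi/sqrt(v), and a~_k = a_k here (d = 1, phi real).
phi, v, n = 0.5, 2.0, 8
gamma = lambda k: v * phi ** abs(k) / (1 - phi ** 2)
a = {0: -1 / np.sqrt(v), 1: phi / np.sqrt(v)}   # a_k = 0 for k >= 2

Tn = np.array([[gamma(s - t) for t in range(n)] for s in range(n)])
Tn_inv = np.linalg.inv(Tn)

# first term of (2.10): the (s, t) entry of T_infty(w)^{-1}
first_term = lambda s, t: sum(a.get(s - l, 0.0) * a.get(t - l, 0.0)
                              for l in range(1, min(s, t) + 1))

# For t <= n - 1 the remaining series in (2.10) vanishes for AR(1), so the
# entries of T_n(w)^{-1} agree with those of T_infty(w)^{-1} up to rounding.
err = max(abs(Tn_inv[s - 1, t - 1] - first_term(s, t))
          for s in range(1, n + 1) for t in range(1, n))
print(err)   # machine-precision small
```

This also recovers the familiar tridiagonal form of the AR(1) precision matrix: diagonal \((1+\phi ^2)/v\) in the interior, off-diagonal \(-\phi /v\).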
For \(n\in {\mathbb {N}}\), we define
and
The next lemma will turn out to be useful in Sect. 6.
Lemma 2.1
For \(n\in {\mathbb {N}}\) and \(s, t \in \{1,\dots ,n\}\), we have the following two equalities:
The proof of Lemma 2.1 is straightforward and will be omitted.
3 Proof of Theorem 2.1
In this section, we prove Theorem 2.1. We assume (1.2) and (1.3). Let \(\{X_k\}=\{X_k:k\in {\mathbb {Z}}\}\) be a \({\mathbb {C}}^d\)-valued, centered, weakly stationary process, defined on a probability space \((\varOmega , {\mathcal {F}}, P)\), that has spectral density w, hence autocovariance function \(\gamma \). Thus we have \(E[X_k X_0^*] = \gamma (k) = \int _{-\pi }^{\pi }e^{-ik\theta }w(e^{i\theta })(d\theta /(2\pi ))\) for \(k\in {\mathbb {Z}}\).
Write \(X_k=(X^1_k,\dots ,X^d_k)^{\top }\), and let V be the complex Hilbert space spanned by all the entries \(\{X^j_k: k\in {\mathbb {Z}},\ j=1,\dots ,d\}\) in \(L^2(\varOmega , {\mathcal {F}}, P)\), which has inner product \((x, y)_{V}:=E[x{\overline{y}}]\) and norm \(\Vert x\Vert _{V}:=(x,x)_{V}^{1/2}\). For \(J\subset {\mathbb {Z}}\) such as \(\{n\}\), \((-\infty ,n]:=\{n,n-1,\dots \}\), \([n,\infty ):=\{n,n+1,\dots \}\), and \([m,n]:=\{m,\dots ,n\}\) with \(m\le n\), we define the closed subspace \(V_J^X\) of V by
Let \(P_J\) and \(P_J^{\perp }\) be the orthogonal projection operators of V onto \(V_J^X\) and \((V_J^X)^{\perp }\), respectively, where \((V_J^X)^{\bot }\) denotes the orthogonal complement of \(V_J^X\) in V.
By Theorem 3.1 in [11] for \(d=1\) and Corollary 3.6 in [15] for general \(d\ge 1\), the conditions (1.2) and (1.3) imply the following intersection of past and future property:
Let \(V^d\) be the space of \({\mathbb {C}}^d\)-valued random variables on \((\varOmega , {\mathcal {F}}, P)\) whose entries belong to V. The norm \(\Vert x\Vert _{V^d}\) of \(x=(x^1,\dots ,x^d)^{\top }\in V^d\) is given by \(\Vert x\Vert _{V^d}:=(\sum _{i=1}^d \Vert x^i\Vert _V^2)^{1/2}\). For \(J\subset {\mathbb {Z}}\) and \(x=(x^1,\dots ,x^d)^{\top }\in V^d\), we write \(P_Jx\) for \((P_Jx^1, \dots , P_Jx^d)^{\top }\). We define \(P_J^{\perp }x\) in a similar way. For \(x=(x^1,\dots ,x^d)^{\top }\) and \(y=(y^1,\dots ,y^d)^{\top }\) in \(V^d\),
stands for the Gram matrix of x and y.
Let
be the spectral representation of \(\{X_k\}\), where \(\eta \) is a \({\mathbb {C}}^d\)-valued random spectral measure. We define a d-variate stationary process \(\{\varepsilon _k:k\in {\mathbb {Z}}\}\), called the forward innovation process of \(\{X_k\}\), by
Then, \(\{\varepsilon _k\}\) satisfies \(\langle \varepsilon _n, \varepsilon _m\rangle = \delta _{n m}I_d\) and \(V_{(-\infty ,n]}^X=V_{(-\infty ,n]}^{\varepsilon }\) for \(n\in {\mathbb {Z}}\), hence
Recall the outer function \(h_{\sharp }\) in \(H_2^{d\times d}({\mathbb {T}})\) from (1.4). We define the backward innovation process \(\{{\tilde{\varepsilon }}_k: k\in {\mathbb {Z}}\}\) of \(\{X_k\}\) by
Then, \(\{{\tilde{\varepsilon }}_k\}\) satisfies \(\langle {\tilde{\varepsilon }}_n, {\tilde{\varepsilon }}_m\rangle =\delta _{n m}I_d\) and \(V_{[-n,\infty )}^X=V_{(-\infty , n]}^{{\tilde{\varepsilon }}}\) for \(n\in {\mathbb {Z}}\), hence
(see Sect. 2 in [16]). Moreover, by Lemma 4.1 in [16], we have
By (3.2), for \(\{s_{\ell }\}\in \ell _{2+}^{d\times d}\) and \(n\in {\mathbb {N}}\),
Therefore,
See Lemma 4.2 in [16]. In particular, for \(n\in {\mathbb {N}}\), \(u \in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we can define the sequences \(\{b_{n,u,\ell }^k\}_{\ell =0}^{\infty }\in \ell _{2+}^{d\times d}\) and \(\{{\tilde{b}}_{n,u,\ell }^k\}_{\ell =0}^{\infty }\in \ell _{2+}^{d\times d}\) by the recursions (2.7) and (2.8), respectively.
By (1.2) and (1.3), \(\{X_k\}\) has the dual process \(\{X^{\prime }_k: k\in {\mathbb {Z}}\}\), which is a \({\mathbb {C}}^d\)-valued, centered, weakly stationary process characterized by the biorthogonality relation
(see [19]). Recall \(\{a_k\} \in \ell _{2+}^{d\times d}\) and \(\{{\tilde{a}}_k\} \in \ell _{2+}^{d\times d}\) from (2.3) and (2.5), respectively. The dual process \(\{X^{\prime }_k\}\) admits the following two MA representations (see Sect. 5 in [16]):
The next theorem is the key to the proof of Theorem 2.1.
Theorem 3.1
Assume (1.2) and (1.3). Then, for \(n\in {\mathbb {N}}\) and \(s,t \in \{1,\dots ,n\}\), we have (1.5).
Proof
Fix \(n\in {\mathbb {N}}\). For \(s\in \{1,\dots ,n\}\), we can write \(P_{[1,n]}X^{\prime }_s = \sum _{k=1}^n q_{s,k} X_k\) for some \(q_{s,k}\in {\mathbb {C}}^{d\times d}\), \(k\in \{1,\dots ,n\}\). For \(s,t \in \{1,\dots ,n\}\), we have
or \(Q_n T_n(w) = I_{dn}\), where \(Q_n := (q_{s,k})_{1\le s, k\le n}\in {\mathbb {C}}^{dn\times dn}\). Therefore, we have \(Q_n = T_n(w)^{-1}\). On the other hand,
Thus, the theorem follows. \(\square \)
Lemma 3.1
Assume (1.2) and (1.3). Then, for \(n\in {\mathbb {N}}\) and \(s,t \in \{1,\dots ,n\}\), the following two equalities hold:
Proof
First, we prove (3.7). Since \(V_{[1,n]}^{X} \subset V_{(-\infty ,n]}^{X}\), we have
On the other hand, from (3.5), we have \(P_{(-\infty ,n]} X^{\prime }_t = -\sum _{m=t}^n a_{m-t}^* \varepsilon _{m}\), hence
and \(\langle X^{\prime }_s, P_{[1,n]}^{\bot } P_{(-\infty ,n]} X^{\prime }_t\rangle \) is equal to
Combining, we obtain (3.7).
Next, we prove (3.8). Since \(V_{[1,n]}^{X} \subset V_{[1,\infty )}^{X}\), we have
On the other hand, from (3.6), we have \(P_{[1, \infty )} X^{\prime }_t = -\sum _{m=1}^{t} {\tilde{a}}_{t-m}^* {\tilde{\varepsilon }}_{-m}\), hence
and \(\langle X^{\prime }_s, P_{[1,n]}^{\bot } P_{[1,\infty )} X^{\prime }_t\rangle \) is equal to
Combining, we obtain (3.8). \(\square \)
For \(n\in {\mathbb {N}}\) and \(u \in \{1,\dots ,n\}\), we define the sequence \(\{W_{n,u}^k\}_{k=1}^{\infty }\) in \(V^d\) by
Lemma 3.2
We assume (1.2) and (1.3). Then, for \(n\in {\mathbb {N}}\) and \(u\in \{1,\dots ,n\}\), we have
the sum converging strongly in \(V^d\).
Proof
Since \(\varepsilon _u\) is in \(V_{(-\infty ,n]}^X\), (3.9) follows from (3.1) and Theorem 3.2 in [16]. \(\square \)
Proposition 3.1
We assume (1.2) and (1.3). Then, for \(n\in {\mathbb {N}}\), \(u\in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we have
Proof
Note that, from the definition of \(W_{n,u}^k\),
We prove (3.10) and (3.11) by induction. First, by (3.2), we have
For \(k \in {\mathbb {N}}\), assume that \(W_{n,u}^{2k-1}=\sum _{\ell =0}^\infty b_{n,u,\ell }^{2k-1} {\tilde{\varepsilon }}_{\ell }\). Then, by (3.4),
and, by (3.3),
Thus (3.10) and (3.11) follow. \(\square \)
For \(n\in {\mathbb {N}}\) and \(u \in \{1,\dots ,n\}\), we define the sequence \(\{{\tilde{W}}_{n,u}^k\}_{k=1}^{\infty }\) in \(V^d\) by
Lemma 3.3
We assume (1.2) and (1.3). Then, for \(n\in {\mathbb {N}}\) and \(u\in \{1,\dots ,n\}\), we have
the sum converging strongly in \(V^d\).
Proof
Since \({\tilde{\varepsilon }}_{-u}\) is in \(V_{[1,\infty )}^X\), (3.12) follows from (3.1) and Theorem 3.2 in [16]. \(\square \)
Proposition 3.2
We assume (1.2) and (1.3). Then, for \(n\in {\mathbb {N}}\), \(u\in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we have
Proof
Note that, from the definition of \({\tilde{W}}_{n,u}^k\),
We prove (3.13) and (3.14) by induction. First, by (3.2), we have
For \(k \in {\mathbb {N}}\), assume that \({\tilde{W}}_{n,u}^{2k-1}=\sum _{\ell =0}^\infty {\tilde{b}}_{n,u,\ell }^{2k-1} \varepsilon _{n+1+\ell }\). Then, by (3.3),
and, by (3.4),
Thus (3.13) and (3.14) follow. \(\square \)
We are ready to prove Theorem 2.1.
Proof
(i) For \(n\in {\mathbb {N}}\), \(s, u \in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we see from (3.5) and (3.13) that
and from (3.6) and (3.14) that
Therefore, by Lemma 3.3, \(\langle X^{\prime }_s, P_{[1,n]}^{\perp } {\tilde{\varepsilon }}_{-u} \rangle \) is equal to
The assertion (i) follows from this, Theorem 3.1 and Lemma 3.1.
(ii) For \(n\in {\mathbb {N}}\), \(s, u \in \{1,\dots ,n\}\) and \(k\in {\mathbb {N}}\), we see from (3.6) and (3.10) that
and from (3.5) and (3.11) that
Therefore, by Lemma 3.2, \(\langle X^{\prime }_s, P_{[1,n]}^{\perp } \varepsilon _u \rangle \) is equal to
The assertion (ii) follows from this, Theorem 3.1 and Lemma 3.1. \(\square \)
4 Strong convergence result for Toeplitz systems
In this section, we use Theorem 2.1 to show a strong convergence result for solutions of block Toeplitz systems. We assume (1.2) and (1.6). Then w is continuous on \({\mathbb {T}}\) since \(w(e^{i\theta })=\sum _{k\in {\mathbb {Z}}} e^{ik\theta } \gamma (k)\). In particular, (1.3) is also satisfied. The conditions (1.2) and (1.6) also imply that all of \(\{a_k\}\), \(\{c_k\}\), \(\{{\tilde{a}}_k\}\) and \(\{{\tilde{c}}_k\}\) belong to \(\ell _{1+}^{d\times d}\). See Theorem 3.3 and (3.3) in [17]; see also Theorem 4.1 in [12]. In particular, we have \(h(e^{i\theta })^{-1} = - \sum _{k=0}^{\infty } e^{ik\theta } a_k\) and \(h_{\sharp }(e^{i\theta }) = {\tilde{h}}(e^{-i\theta })^* = \sum _{k=0}^{\infty } e^{ik\theta } {\tilde{c}}_k^*\), hence, by (2.6),
Under (1.2) and (1.6), we define
Then F(n) decreases to zero as \(n\rightarrow \infty \).
We need the next lemma in the proof of Theorem 4.1 below.
Lemma 4.1
Assume (1.2) and (1.6). Then, for \(n, k\in {\mathbb {N}}\) and \(u\in \{1,\dots ,n\}\), we have
Proof
For \(m\in {\mathbb {N}}\), we see from (4.1) that
hence
Let \(n\in {\mathbb {N}}\) and \(u\in \{1,\dots ,n\}\). We use induction on k to prove (4.2). Since \({\tilde{b}}^1_{n,u,\ell }=\beta _{n+1-u+\ell }^*\), we see from (4.3) that
We assume (4.2) for \(k\in {\mathbb {N}}\). Then, again by (4.3),
Thus (4.2) with k replaced by \(k+1\) also holds. \(\square \)
For \(\{y_k\}_{k=1}^{\infty } \in \ell _1^{d\times d}({\mathbb {N}})\), the solution \(Z_{\infty }\) to (1.11) with (1.12) and (1.13) is given by (1.10) with
(see Remark 2.1 in Sect. 2). Notice that the sum in (4.4) converges absolutely.
Theorem 4.1
We assume (1.2) and (1.6). Let \(\{y_k\}_{k=1}^{\infty } \in \ell _1^{d\times d}({\mathbb {N}})\). Then, for \(Z_n\) in (1.7)–(1.9) and \(Z_{\infty }\) in (1.10)–(1.13), we have (1.14).
Proof
By Theorem 2.1 (i), we have
hence, by (4.4), \(\sum _{s=1}^n \Vert z_{n,s} - z_s\Vert \le S_1(n) + S_2(n) + S_3(n) + S_4(n)\), where
and
By the change of variables \(m=s-\ell +1\), we have
By (4.2) with \(k=1\) or (4.3), we have
Furthermore, by the change of variables \(v=t-u+1\), we obtain
Since
the dominated convergence theorem yields
hence \(\lim _{n\rightarrow \infty } S_2(n) = 0\).
Choose \(N\in {\mathbb {N}}\) such that \(F(N+1)<1\). Then, by Lemma 4.1, we have, for \(n\ge N\),
Thus \(\lim _{n\rightarrow \infty }S_3(n)=0\). Similarly, we have, for \(n\ge N\),
hence \(\lim _{n\rightarrow \infty }S_4(n)=0\).
Combining, we obtain (1.14). \(\square \)
5 Closed-form formulas
In this section, we use Theorem 2.1 to derive closed-form formulas for \(T_n(w)^{-1}\) with rational symbol w that corresponds to a d-variate ARMA process. We assume that the symbol w of \(T_n(w)\) is of the form (1.17) with \(h:{\mathbb {T}}\rightarrow {\mathbb {C}}^{d\times d}\) satisfying (1.18). Then h is an outer function in \(H_2^{d\times d}({\mathbb {T}})\), and another outer function \(h_{\sharp }\in H_2^{d\times d}({\mathbb {T}})\) that appears in (1.4) also satisfies (1.18); see Sect. 6.2 in [16]. Notice that (1.17) with (1.18) implies (1.2) and (1.3).
We can write \(h(z)^{-1}\) in the form
where
Here the convention \(\sum _{k=1}^0=0\) is adopted in the sums on the right-hand side of (5.1). For example, if \(m_0=0\), then
while, if \(K=0\), then
and the corresponding stationary process \(\{X_k\}\) is a d-variate AR\((m_0)\) process.
Remark 5.1
It should be noted that the expression (5.1) with (5.2) is uniquely determined, up to a constant unitary factor, by \(\{X_k\}\) satisfying (1.17) with (1.18), since h in the factorization (1.17) with (1.18) is so determined (see Sect. 2). Suppose that we start with a d-variate, causal and invertible ARMA process \(\{X_k\}\) in the sense of [4], that is, a \({\mathbb {C}}^d\)-valued, centered, weakly stationary process described by the ARMA equation
where, for \(r, s\in {\mathbb {N}}\cup \{0\}\) and \(\varPhi _i, \varPsi _j\in {\mathbb {C}}^{d\times d}\ (i=1,\dots ,r,\ j=1,\dots ,s)\),
are \({\mathbb {C}}^{d\times d}\)-valued polynomials satisfying \(\det \varPhi (z)\ne 0\) and \(\det \varPsi (z)\ne 0\) on \({\overline{{\mathbb {D}}}}\), B is the backward shift operator defined by \(B X_m=X_{m-1}\), and \(\{\xi _k : k\in {\mathbb {Z}}\}\) is a d-variate white noise, that is, a d-variate, centered process such that \(E[\xi _n \xi _m^*]=\delta _{nm}V\) for some positive-definite \(V\in {\mathbb {C}}^{d\times d}\). Notice that the pair \((\varPhi (z),\varPsi (z))\) is not uniquely determined from \(\{X_k\}\); for example, we can replace \((\varPhi (z),\varPsi (z))\) by \(((2-z)\varPhi (z),(2-z)\varPsi (z))\). However, if we put \(h(z) = \varPhi (z)^{-1} \varPsi (z) V^{1/2}\), then h is an outer function belonging to \(H_2^{d\times d}({\mathbb {T}})\) and satisfies (1.17) for the spectral density w of \(\{X_k\}\). Therefore, h is uniquely determined, up to a constant unitary factor, from \(\{X_k\}\). In particular, the expression (5.1) with (5.2) for h is also uniquely determined, up to a constant unitary factor, from \(\{X_k\}\). From these observations and the results in [13] and this paper, we are led to the idea of parameterizing the ARMA processes by the expression (5.1) with (5.2) (see Remark 8 in [13]). This point will be discussed in future work.
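In the scalar case, the relation \(h(z) = \varPhi (z)^{-1} \varPsi (z) V^{1/2}\) and the resulting spectral density \(w=hh^*\) can be checked numerically. The sketch below, with hypothetical ARMA(1,1) parameter values, samples w on a grid and recovers the standard closed-form ARMA(1,1) autocovariances by numerical integration of \(\gamma (k)=\int _{-\pi }^{\pi }e^{-ik\theta }w(e^{i\theta })\,\sigma (d\theta )\).

```python
import numpy as np

# Illustrative scalar ARMA(1,1): Phi(z) = 1 - phi*z, Psi(z) = 1 + psi*z, V = v,
# so h(z) = Phi(z)^{-1} Psi(z) v^{1/2} and w(e^{i theta}) = |h(e^{i theta})|^2.
phi, psi, v = 0.5, 0.4, 1.0
theta = 2 * np.pi * np.arange(4096) / 4096
h = np.sqrt(v) * (1 + psi * np.exp(1j * theta)) / (1 - phi * np.exp(1j * theta))
w = np.abs(h) ** 2

# gamma(k) = int e^{-ik theta} w(e^{i theta}) d theta/(2 pi): trapezoid rule,
# which converges with spectral accuracy for this smooth periodic integrand
gam = lambda k: np.mean(np.exp(-1j * k * theta) * w).real

# standard closed-form ARMA(1,1) autocovariances, for comparison
g0 = v * (1 + psi**2 + 2 * phi * psi) / (1 - phi**2)
g1 = v * (1 + phi * psi) * (phi + psi) / (1 - phi**2)
print(gam(0) - g0, gam(1) - g1)   # both negligibly small
```

The same check works for any causal and invertible scalar ARMA pair \((\varPhi ,\varPsi )\); only the closed-form comparison values are specific to ARMA(1,1).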
By Theorem 2 in [13], \(h_{\sharp }^{-1}\) has the same \(m_0\) and the same poles with the same multiplicities as \(h^{-1}\), that is, for \(m_0\), K and \((p_1, m_1), \dots , (p_K, m_K)\) in (5.1) with (5.2), \(h_{\sharp }^{-1}\) has the form
where
Notice that if \(d=1\), then we can take \(h_{\sharp }=h\), hence \(\rho _{0,0}=\rho _{0,0}^{\sharp }\) and \(\rho _{{\mu }, j} = \rho _{{\mu }, j}^{\sharp }\) for \(\mu \in \{1,\dots ,K\}\) and \(j\in \{1,\dots ,m_{\mu }\}\).
Recall \({\tilde{h}}\) from (2.1). From (5.4), we have
where
Recall the sequences \(\{a_k\}\) and \(\{{\tilde{a}}_k\}\) from (2.3) and (2.5), respectively. We have
and
where the convention \(\left( {\begin{array}{c}0\\ 0\end{array}}\right) =1\) is adopted; see Proposition 4 in [13].
We first consider the case \(K = 0\), which corresponds to a d-variate AR\((m_0)\) process. As the following theorem shows, in this case we have simple closed-form formulas for \(T_n(w)^{-1}\).
Theorem 5.1
We assume (1.17), (1.18) and \(K = 0\) for K in (5.1). Thus we assume (5.3). Then the following four assertions hold.
(i) For \(n \ge m_0 + 1\), \(s \in \{1,\dots ,n\}\) and \(t \in \{1, \dots , n-m_0\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\lambda =1}^{s\wedge t} {\tilde{a}}_{s-\lambda }^* {\tilde{a}}_{t-\lambda }. \end{aligned}$$(5.9)
(ii) For \(n \ge m_0 + 1\), \(s \in \{1, \dots , n-m_0\}\) and \(t \in \{1,\dots ,n\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\lambda =1}^{s\wedge t} {\tilde{a}}_{s-\lambda }^* {\tilde{a}}_{t-\lambda }. \end{aligned}$$(5.10)
(iii) For \(n \ge m_0 + 1\), \(s \in \{1,\dots ,n\}\) and \(t \in \{m_0+1, \dots , n\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\lambda =s\vee t}^n a_{\lambda -s}^* a_{\lambda -t}. \end{aligned}$$(5.11)
(iv) For \(n \ge m_0 + 1\), \(s \in \{m_0+1, \dots , n\}\) and \(t \in \{1,\dots ,n\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = \sum _{\lambda =s\vee t}^n a_{\lambda -s}^* a_{\lambda -t}. \end{aligned}$$(5.12)
Proof
For w satisfying (1.17), (1.18) and \(K = 0\), let \(\{X_k\}\), \(\{X^{\prime }_k\}\), \(\{\varepsilon _k\}\) and \(\{{\tilde{\varepsilon }}_k\}\) be as in Sect. 3.
(i) By (5.4) with \(K=0\), we have \({\tilde{a}}_0={\tilde{\rho }}_{0,0}\), \({\tilde{a}}_k={\tilde{\rho }}_{0,k}\) for \(k\in \{1,\dots ,m_0\}\) and \({\tilde{a}}_k=0\) for \(k \ge m_0 + 1\). In particular, we have \(\sum _{k=0}^{m_0}{\tilde{a}}_{k}X_{u+k} + {\tilde{\varepsilon }}_{-u} = 0\) for \(u\in {\mathbb {Z}}\); see (2.15) in [16]. This implies \({\tilde{\varepsilon }}_{-u} \in V_{[1,n]}^X\), or \(P_{[1,n]}^{\bot } {\tilde{\varepsilon }}_{-u} = 0\), for \(u\in \{1,\dots ,n-m_0\}\). Therefore, (5.9) follows from Theorem 3.1 and (3.8).
(iii) By (5.3), we have \(a_0=\rho _{0,0}\), \(a_k=\rho _{0,k}\) for \(k\in \{1,\dots ,m_0\}\) and \(a_k=0\) for \(k \ge m_0 + 1\). In particular, \(\sum _{k=0}^{m_0}a_{k}X_{u-k} + \varepsilon _u = 0\) for \(u\in {\mathbb {Z}}\); see (2.15) in [16]. This implies \(\varepsilon _u \in V_{[1,n]}^X\), or \(P_{[1,n]}^{\bot } \varepsilon _u = 0\), for \(u\in \{m_0+1, \dots ,n\}\). Therefore, (5.11) follows from Theorem 3.1 and (3.7).
(ii), (iv) By (2.9), (ii) and (iv) follow from (i) and (iii), respectively. \(\square \)
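To make Theorem 5.1 concrete, here is a small numerical check in the simplest setting: a real scalar AR(1) process with \(d=1\), \(m_0=1\) and \(K=0\). The values of \(\varphi \), \(\sigma \) and n below are illustrative, and the identification \(a_0=1/\sigma \), \(a_1=-\varphi /\sigma \) (read off from the spectral factor \(h(z)=\sigma /(1-\varphi z)\)) is an assumption of this sketch, not a formula quoted from the paper.

```python
import numpy as np

# Scalar AR(1) sanity check of Theorem 5.1 (iii)/(iv):
#   (T_n(w)^{-1})^{s,t} = sum_{lam = s v t}^{n} conj(a_{lam-s}) a_{lam-t}
# on the stated index regions.  Here d = 1, m_0 = 1, K = 0.
phi, sigma, n = 0.5, 1.0, 8   # illustrative values

# Assumed normalization: h(z) = sigma/(1 - phi z), so h^{-1}(z) = (1 - phi z)/sigma,
# giving a_0 = 1/sigma, a_1 = -phi/sigma, a_k = 0 for k >= 2.
a = np.zeros(n)
a[0], a[1] = 1.0 / sigma, -phi / sigma

# Autocovariance gamma(k) = sigma^2 phi^{|k|}/(1 - phi^2); T_n(w) = [gamma(s-t)].
idx = np.subtract.outer(np.arange(n), np.arange(n))
gamma = sigma**2 * phi ** np.abs(idx) / (1 - phi**2)
Tinv = np.linalg.inv(gamma)

# Right-hand side of (5.11)/(5.12), with 1-based s,t,lambda mapped to 0-based indices.
B = np.zeros((n, n))
for s in range(n):
    for t in range(n):
        B[s, t] = sum(a[lam - s] * a[lam - t] for lam in range(max(s, t), n))

m0 = 1
assert np.allclose(B[:, m0:], Tinv[:, m0:])   # (iii): columns t >= m_0 + 1
assert np.allclose(B[m0:, :], Tinv[m0:, :])   # (iv): rows   s >= m_0 + 1
```

The assertions compare the sums with a direct numerical inverse only on the index regions where the theorem applies; outside those regions (for example the entry \(s=t=1\)) the sum need not equal the inverse entry.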
We turn to the case of \(K \ge 1\). In what follows in this section, for K in (5.1), we assume
For \(m_1,\dots ,m_K\) in (5.1), we define \(M\in {\mathbb {N}}\) by
For \(\mu \in \{1,\dots ,K\}\), \(p_{\mu }\) in (5.1) and \(i\in {\mathbb {N}}\), we define \(p_{{\mu }, i}: {\mathbb {Z}}\rightarrow {\mathbb {C}}^{d\times d}\) by
Notice that
For \(n\in {\mathbb {Z}}\), we also define \({\mathbf {p}}_n \in {\mathbb {C}}^{dM\times d}\) by the following block representation:
Notice that
We define \(\varLambda \in {\mathbb {C}}^{dM\times dM}\) by
For \({\mu }, {\nu }\in \{1,2,\dots ,K\}\), we define \(\varLambda ^{{\mu },{\nu }}\in {\mathbb {C}}^{dm_{\mu }\times dm_{\nu }}\) by the block representation
where, for \(i \in \{1,\dots ,m_{\mu }\}\) and \(j \in \{1,\dots ,m_{\nu }\}\),
Then, by Lemma 3 in [13], the matrix \(\varLambda \) has the following block representation:
We define, for \(\mu \in \{1,\dots ,K\}\) and \(j \in \{1,\dots ,m_{\mu }\}\),
where
We define \(\varTheta \in {\mathbb {C}}^{dM\times dM}\) by the block representation
where, for \(\mu \in \{1,\dots ,K\}\), \(\varTheta _{\mu } \in {\mathbb {C}}^{dm_{\mu }\times dm_{\mu }}\) is defined by
using \(\theta _{\mu ,j}\) in (5.15) with (5.16).
For \(n\in {\mathbb {Z}}\), we define \(\varPi _n\in {\mathbb {C}}^{dM\times dM}\) by the block representation
where, for \(\mu \in \{1,\dots ,K\}\) and \(n\in {\mathbb {Z}}\), \(\varPi _{\mu , n} \in {\mathbb {C}}^{dm_{\mu }\times dm_{\mu }}\) is defined by
using \(p_{{\mu }, i}(n)\) in (5.14).
The next lemma slightly extends Lemma 17 in [13].
Lemma 5.1
We assume (1.17), (1.18) and \(K \ge 1\) for K in (5.1). Then, for \(n, k, \ell \in {\mathbb {Z}}\) such that \(n+k+\ell \ge m_0\), we have
hence
The proof of Lemma 5.1 is almost the same as that of Lemma 17 in [13], hence we omit it.
For \(n\in {\mathbb {Z}}\), we define \(G_n, {\tilde{G}}_n\in {\mathbb {C}}^{dM\times dM}\) by
Lemma 5.2
We assume (1.17), (1.18) and \(K \ge 1\) for K in (5.1). Then the following two assertions hold.
(i) We assume \(n\ge u\ge m_0+1\). Then, for \(k\in {\mathbb {N}}\) and \(\ell \in {\mathbb {N}}\cup \{0\}\), we have
$$\begin{aligned} b_{n,u,\ell }^{2k-1} = {\mathbf {p}}_{u-n-1}^* ( {\tilde{G}}_n G_n )^{k-1} (\varPi _n \varTheta )^* {\overline{{\mathbf {p}}}}_{\ell }, \end{aligned}$$(5.17)
$$\begin{aligned} b_{n,u,\ell }^{2k} = {\mathbf {p}}_{u-n-1}^* ({\tilde{G}}_n G_n )^{k-1} {\tilde{G}}_n \varPi _n \varTheta {\mathbf {p}}_{\ell }. \end{aligned}$$(5.18)
(ii) We assume \(1\le u\le n-m_0\). Then, for \(k\in {\mathbb {N}}\) and \(\ell \in {\mathbb {N}}\cup \{0\}\), we have
$$\begin{aligned} {\tilde{b}}_{n,u,\ell }^{2k-1} = {\mathbf {p}}_{-u}^{\top } ( G_n {\tilde{G}}_n )^{k-1} \varPi _n \varTheta {\mathbf {p}}_{\ell }, \end{aligned}$$(5.19)
$$\begin{aligned} {\tilde{b}}_{n,u,\ell }^{2k} = {\mathbf {p}}_{-u}^{\top } (G_n {\tilde{G}}_n )^{k-1} G_n (\varPi _n \varTheta )^* {\overline{{\mathbf {p}}}}_{\ell }. \end{aligned}$$(5.20)
The proof of Lemma 5.2 will be given in the Appendix.
For \(n\in {\mathbb {N}}\) and \({\mu }, {\nu }\in \{1,2,\dots ,K\}\), we define \(\varXi _n^{{\mu },{\nu }}\in {\mathbb {C}}^{dm_{\mu }\times dm_{\nu }}\) by the block representation
where, for \(n\in {\mathbb {N}}\), \(i \in \{1,\dots ,m_{\mu }\}\) and \(j \in \{1,\dots ,m_{\nu }\}\), \(\xi _n^{{\mu },{\nu }}(i, j) \in {\mathbb {C}}^{d\times d}\) is defined by
For \(n\in {\mathbb {N}}\), we define \(\varXi _n\in {\mathbb {C}}^{dM\times dM}\) by
We also define \(\rho \in {\mathbb {C}}^{dM\times d}\) and \({\tilde{\rho }} \in {\mathbb {C}}^{dM\times d}\) by the block representations
and
respectively. For \(n\in {\mathbb {N}}\), we define \(v_n, {\tilde{v}}_n \in {\mathbb {C}}^{dM\times d}\) by
Then, by Lemma 5 in [13], we have
Moreover, if \(m_0\ge 1\), then we have
For \(n\in {\mathbb {Z}}\), we define \(w_n, {\tilde{w}}_n \in {\mathbb {C}}^{dM\times d}\) by
To give closed-form expressions for \(w_n\) and \({\tilde{w}}_n\), we introduce some matrices. For \(n\in {\mathbb {Z}}\) and \({\mu }, {\nu }\in \{1,2,\dots ,K\}\), we define \(\varPhi _n^{{\mu },{\nu }}\in {\mathbb {C}}^{dm_{\mu }\times dm_{\nu }}\) by the block representation
where, for \(n\in {\mathbb {Z}}\), \(i=1,\dots ,m_{\mu }\) and \(j = 1,\dots ,m_{\nu }\), \(\varphi _n^{{\mu },{\nu }}(i, j) \in {\mathbb {C}}^{d\times d}\) is defined by
For \(n\in {\mathbb {Z}}\), we define \(\varPhi _n\in {\mathbb {C}}^{dM\times dM}\) by
Here are closed-form expressions for \(w_n\) and \({\tilde{w}}_n\).
Lemma 5.3
We have
The proof of Lemma 5.3 will be given in the Appendix.
Recall M from (5.13). For \(n \in {\mathbb {N}}\) and \(s\in \{1,\dots ,n\}\), we define
and
Here are closed-form formulas for \(T_n(w)^{-1}\) with w satisfying (1.17), (1.18) and \(K \ge 1\).
Theorem 5.2
We assume (1.17), (1.18) and \(K \ge 1\) for K in (5.1). Then the following four assertions hold.
(i) For \(n \ge m_0 + 1\), \(s \in \{1,\dots ,n\}\) and \(t \in \{1, \dots , n-m_0\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = {\tilde{r}}_{n,s}^* {\tilde{\ell }}_{n,t}^* + \sum _{\lambda =1}^{s\wedge t} {\tilde{a}}_{s-\lambda }^* {\tilde{a}}_{t-\lambda }. \end{aligned}$$
(ii) For \(n \ge m_0 + 1\), \(s \in \{1, \dots , n-m_0\}\) and \(t \in \{1,\dots ,n\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = {\tilde{\ell }}_{n,s} {\tilde{r}}_{n,t} + \sum _{\lambda =1}^{s\wedge t} {\tilde{a}}_{s-\lambda }^* {\tilde{a}}_{t-\lambda }. \end{aligned}$$
(iii) For \(n \ge m_0 + 1\), \(s \in \{1,\dots ,n\}\) and \(t \in \{m_0+1, \dots , n\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = r_{n,s}^* \ell _{n,t}^* + \sum _{\lambda =s\vee t}^n a_{\lambda -s}^* a_{\lambda -t}. \end{aligned}$$
(iv) For \(n \ge m_0 + 1\), \(s \in \{m_0+1, \dots , n\}\) and \(t \in \{1,\dots ,n\}\), we have
$$\begin{aligned} \left( T_n(w)^{-1}\right) ^{s,t} = \ell _{n,s} r_{n,t} + \sum _{\lambda =s\vee t}^n a_{\lambda -s}^* a_{\lambda -t}. \end{aligned}$$
Proof
(i) We assume \(n \ge m_0 + 1\), \(s \in \{1,\dots ,n\}\) and \(t \in \{1, \dots , n-m_0\}\). Then, by Lemma 5.2 (ii) above and Lemma 19 in [13], we have
Similarly, by Lemma 5.2 (ii) above and Lemma 19 in [13],
On the other hand, \(\sum _{u=1}^t {\overline{{\mathbf {p}}}}_{-u} {\tilde{a}}_{t-u} =\sum _{\lambda =0}^{\infty } {\overline{{\mathbf {p}}}}_{\lambda -t} {\tilde{a}}_{\lambda } -\sum _{\lambda =0}^{\infty } {\overline{{\mathbf {p}}}}_{\lambda } {\tilde{a}}_{t+\lambda } ={\tilde{w}}_{t} - {\tilde{v}}_t\). Therefore, the assertion (i) follows from Theorem 2.1 (i).
(iii) We assume \(n \ge m_0 + 1\), \(s \in \{1,\dots ,n\}\) and \(t \in \{m_0+1, \dots , n\}\). Then, by Lemma 5.2 (i) above and Lemma 19 in [13], we have
Similarly, by Lemma 5.2 (i) above and Lemma 19 in [13], we have
On the other hand, \(\sum _{u=t}^n {\mathbf {p}}_{u-n-1} a_{u-t} = w_{n+1-t} - v_{n+1-t}\). Therefore, the assertion (iii) follows from Theorem 2.1 (ii).
(ii), (iv) By (2.9), (ii) and (iv) follow from (i) and (iii), respectively. \(\square \)
Example 5.1
Suppose that \(K\ge 1\), \(m_{\mu }=1\) for \(\mu \in \{1,\dots ,K\}\) and \(m_0=0\). Then,
We have
We also have
Example 5.2
In Example 5.1, we further assume \(d=K=1\). Then, we can write \(h(z) = h_{\sharp }(z) = -(1 - {\overline{p}}z)/\rho \), where \(\rho \in {\mathbb {C}}{\setminus }\{0\}\) and \(p \in {\mathbb {D}}{\setminus }\{0\}\). It follows that
Since \(\gamma (k) = \sum _{\ell =0}^{\infty } c_{k+\ell }{\overline{c}}_{\ell }\) and \(\gamma (-k)=\overline{\gamma (k)}\) for \(k\in {\mathbb {N}}\cup \{0\}\), we have
hence
We also have
for \({\tilde{A}}_2\) and \(A_2\) in (2.14) and (2.15), respectively. By simple calculations, we have
hence
which agrees with equalities in Theorem 5.2.
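The structure behind Theorem 5.2 can be checked numerically in the setting of Example 5.2, where \(d=K=M=1\) and \(m_0=0\): the inverse should equal the AR-type sum plus a correction \(r_{n,s}^* \ell _{n,t}^*\) of rank at most \(dM=1\). The values \(p=0.4\), \(\rho =2.0\) and n below are illustrative, and the identification \(a_k=-\rho p^k\) (from \(h^{-1}(z)=-\rho /(1-pz)\) with real p) is an assumption of this sketch.

```python
import numpy as np

# Example 5.2 setting: d = K = 1, m_0 = 0, h(z) = -(1 - p z)/rho with
# real p in (0,1) and rho != 0 (illustrative values below).
p, rho, n = 0.4, 2.0, 8

# gamma(0) = (1 + p^2)/rho^2, gamma(+-1) = -p/rho^2, gamma(k) = 0 otherwise.
idx = np.subtract.outer(np.arange(n), np.arange(n))
gamma = np.where(idx == 0, (1 + p**2) / rho**2,
                 np.where(np.abs(idx) == 1, -p / rho**2, 0.0))
Tinv = np.linalg.inv(gamma)

# h^{-1}(z) = -rho/(1 - p z) = -rho * sum_k p^k z^k, so a_k = -rho p^k.
a = -rho * p ** np.arange(2 * n)

# AR-type sum of Theorem 5.2 (iii): sum_{lam = s v t}^{n} a_{lam-s} a_{lam-t}.
A = np.zeros((n, n))
for s in range(n):
    for t in range(n):
        A[s, t] = sum(a[lam - s] * a[lam - t] for lam in range(max(s, t), n))

# Theorem 5.2 (iii) then says Tinv - A is a rank-one matrix (here m_0 = 0,
# so the formula covers every entry).
sv = np.linalg.svd(Tinv - A, compute_uv=False)
assert sv[0] > 1e-8 and np.all(sv[1:] < 1e-8)
```

The singular-value test confirms that the non-banded part of the inverse is exactly rank one in this scalar case, which is what the outer-product term \(r_{n,s}^* \ell _{n,t}^*\) predicts.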
6 Linear-time algorithm
As in Sect. 5, we assume (1.17) and (1.18). Let K be as in (5.1) with (5.2). In this section, we explain how Theorems 5.1 and 5.2 above provide us with a linear-time algorithm to compute the solution Z to the block Toeplitz system (1.19).
For
let
be the solution to (1.19), that is, \(Z = T_n(w)^{-1}Y\). For \(m_0\) in (5.1), let \(n \ge 2m_0 + 1\) so that \(n-m_0 \ge m_0+1\) holds.
Recall \({\tilde{A}}_n\) and \(A_n\) from (2.14) and (2.15), respectively. If \(K = 0\), then it follows from Lemma 2.1 and Theorem 5.1 (ii), (iv) that
where
On the other hand, if \(K \ge 1\), then we see from Lemma 2.1 and Theorem 5.2 (ii), (iv) that
where
Therefore, algorithms that compute \({\tilde{A}}_n^* {\tilde{A}}_n Y\) and \(A_n^* A_n Y\) in O(n) operations yield an O(n) algorithm for Z. We present such algorithms below.
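For \(K=0\) the point is already visible without the general machinery: by Theorem 5.1, the entries of \(T_n(w)^{-1}\) vanish for \(\vert s-t\vert >m_0\) when \(n\ge 2m_0+1\), so \(Z=T_n(w)^{-1}Y\) is a banded matrix-vector product costing O(n). A scalar AR(1) sketch with illustrative values; the band entries are the ones computed from (5.9)-(5.12) in this case, and \({\tilde{a}}_k=a_k\) is assumed, as is natural for a real scalar process.

```python
import numpy as np

# Sketch: O(n) solve of T_n(w) Z = Y for a real scalar AR(1) process
# (d = 1, m_0 = 1, K = 0), using the bandedness of T_n(w)^{-1}.
phi, n = 0.6, 200                  # illustrative values
gamma = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / (1 - phi**2)

rng = np.random.default_rng(0)
Y = rng.standard_normal(n)

# Band of T_n(w)^{-1} read off from (5.9)-(5.12) with a_0 = 1, a_1 = -phi:
# diagonal 1, 1+phi^2, ..., 1+phi^2, 1 and off-diagonal -phi.
diag = np.full(n, 1 + phi**2)
diag[0] = diag[-1] = 1.0
off = np.full(n - 1, -phi)

# Banded matrix-vector product Z = T_n(w)^{-1} Y in O(n).
Z = diag * Y
Z[:-1] += off * Y[1:]
Z[1:] += off * Y[:-1]

assert np.allclose(Z, np.linalg.solve(gamma, Y))
```

The final assertion compares the O(n) banded product against a dense \(O(n^3)\) solve; for general \(m_0\) the same idea gives an \(O(n\,m_0^2 d^2)\) product.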
For \(n\in {\mathbb {N}}\cup \{0\}\), \(\mu \in \{1,\dots ,K\}\) and \(j\in \{1,\dots ,m_{\mu }\}\), we define \(q_{\mu ,j}(n) \in {\mathbb {C}}^{d\times d}\) by \(q_{\mu ,j}(n):=p_{\mu ,j}(n+j-1)\), that is,
For \(n\in {\mathbb {N}}\), \(\mu \in \{1,\dots ,K\}\) and \(j\in \{1,\dots ,m_{\mu }\}\), we define the upper triangular block Toeplitz matrix \(Q_{\mu ,j,n} \in {\mathbb {C}}^{dn\times dn}\) by
Notice that
with \(q^*_{\mu ,j}(n) = \left( {\begin{array}{c}n+j-1\\ j-1\end{array}}\right) {\overline{p}}_{\mu }^nI_d\). For \(n\in {\mathbb {N}}\), \(\mu \in \{1,\dots ,K\}\) and \(j\in \{1,\dots ,m_{\mu }\}\), we define the block diagonal matrices \({\tilde{D}}_{\mu ,j,n} \in {\mathbb {C}}^{dn\times dn}\) and \(D_{\mu ,j,n} \in {\mathbb {C}}^{dn\times dn}\) by
respectively. Moreover, for \(n\ge m_0+1\), we define the upper and lower triangular block Toeplitz matrices \({\tilde{\varDelta }}_{n} \in {\mathbb {C}}^{dn\times dn}\) and \(\varDelta _{n} \in {\mathbb {C}}^{dn\times dn}\) by
and
respectively. Note that both \({\tilde{\varDelta }}_{n}\) and \(\varDelta _{n}\) are sparse matrices in the sense that they have only O(n) nonzero elements.
Therefore, it suffices to compute \(Q_{\mu ,i,n} Y\) and \(Q^*_{\mu ,i,n} Y\) for \(Y \in {\mathbb {C}}^{dn\times d}\) in O(n) operations. The following two propositions provide such linear-time algorithms.
Proposition 6.1
Let \(n\in {\mathbb {N}}\), \(\mu \in \{1,\dots ,K\}\) and Y be as in (6.1). We put \(Z_{\mu ,i}=Q_{\mu ,i,n} Y\) for \(i\in \{1,\dots ,m_{\mu }\}\). Then the component blocks \(z_{\mu ,i}(s)\) of \(Z_{\mu ,i}=(z_{\mu ,i}^{\top }(1),\dots ,z_{\mu ,i}^{\top }(n))^{\top }\) satisfy the following equalities:
Proof
From the definition of \(Q_{\mu ,i,n}\), (6.3) is trivial. For \(q_{\mu ,i}(k)\) in (6.2), Pascal’s rule yields the following recursions:
For \(s\in \{1,\dots ,n-1\}\), we see, from (6.6),
and, from (6.7),
for \(i\in \{2,\dots ,j\}\). Thus, (6.4) and (6.5) follow. \(\square \)
By Proposition 6.1, we can compute \(z_{\mu ,i}(s)\) in the following order in O(n) operations:
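The scheme of Proposition 6.1 can be sketched in the scalar case \(d=1\): Pascal's rule gives \(q_j(k)=q_{j-1}(k)+p\,q_j(k-1)\), which turns each product \(Q_{\mu ,j,n}Y\) into a backward sweep. The function name and the test values below are illustrative, and the recursions are reconstructed from the form of \(q_{\mu ,j}\) in (6.2), not copied from (6.4)-(6.6).

```python
import numpy as np
from math import comb

# Scalar sketch of the O(n) scheme behind Proposition 6.1.
# Q_j has (s,k) entry q_j(k-s) = C(k-s+j-1, j-1) p^{k-s} for k >= s.
# Pascal's rule q_j(k) = q_{j-1}(k) + p*q_j(k-1) turns
# z_j(s) = sum_{k>=s} q_j(k-s) y(k) into the backward recursions
#   z_1(s) = y(s) + p*z_1(s+1),   z_j(s) = z_{j-1}(s) + p*z_j(s+1).
def apply_Q_all(p, m, y):
    n = len(y)
    z = np.zeros((m + 1, n), dtype=complex)  # z[j] holds Q_j y; z[0] is unused
    for j in range(1, m + 1):
        z[j, n - 1] = y[n - 1]               # q_j(0) = 1 for every j
    for s in range(n - 2, -1, -1):           # backward sweep: O(n*m) total
        z[1, s] = y[s] + p * z[1, s + 1]
        for j in range(2, m + 1):
            z[j, s] = z[j - 1, s] + p * z[j, s + 1]
    return z[1:]

# Check against the dense upper triangular Toeplitz matrices.
p, m, n = 0.3 + 0.2j, 3, 12
rng = np.random.default_rng(1)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = apply_Q_all(p, m, y)
for j in range(1, m + 1):
    Q = np.array([[comb(k - s + j - 1, j - 1) * p ** (k - s) if k >= s else 0.0
                   for k in range(n)] for s in range(n)])
    assert np.allclose(z[j - 1], Q @ y)
```

Proposition 6.2 is the adjoint analogue: the same recursions run forward in s, since \(Q^*_{\mu ,j,n}\) is lower triangular.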
Proposition 6.2
Let \(n\in {\mathbb {N}}\), \(\mu \in \{1,\dots ,K\}\) and Y be as in (6.1). We put \(W_{\mu ,i}=Q^*_{\mu ,i,n} Y\) for \(i\in \{1,\dots ,m_{\mu }\}\). Then the component blocks \(w_{\mu ,i}(s)\) of \(W_{\mu ,i}=(w_{\mu ,i}^{\top }(1),\dots ,w_{\mu ,i}^{\top }(n))^{\top }\) satisfy the following equalities:
The proof of Proposition 6.2 is similar to that of Proposition 6.1; we omit it.
By Proposition 6.2, we can compute \(w_{\mu ,i}(s)\) in the following order in O(n) operations:
References
Baxter, G.: An asymptotic result for the finite predictor. Math. Scand. 10, 137–144 (1962)
Böttcher, A., Silbermann, B.: Introduction to Large Truncated Toeplitz Matrices. Springer, New York (1999)
Böttcher, A., Silbermann, B.: Analysis of Toeplitz Operators, 2nd edn. Springer, Berlin (2006)
Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods, 2nd edn. Springer, New York (1991)
Cheng, R., Pourahmadi, M.: Baxter’s inequality and convergence of finite predictors of multivariate stochastic processes. Probab. Theory Relat. Fields 95(1), 115–124 (1993)
Gohberg, I.C., Fel’dman, I.A.: Convolution Equations and Projection Methods for Their Solution. Translations of Mathematical Monographs, vol. 41. American Mathematical Society, Providence (1974)
Gohberg, I.C., Semencul, A.A.: The inversion of finite Toeplitz matrices and their continual analogues. (Russian) Mat. Issled. 7(2), 201–223 (1972)
Grenander, U., Szegö, G.: Toeplitz Forms and Their Applications, 2nd edn. Chelsea Publishing Co., New York (1984)
Hannan, E.J., Deistler, M.: The Statistical Theory of Linear Systems. Wiley, New York (1988)
Helson, H., Lowdenslager, D.: Prediction theory and Fourier series in several variables II. Acta Math. 106, 175–213 (1961)
Inoue, A.: Asymptotics for the partial autocorrelation function of a stationary process. J. Anal. Math. 81, 65–109 (2000)
Inoue, A.: AR and MA representation of partial autocorrelation functions, with applications. Probab. Theory Relat. Fields 140(3–4), 523–551 (2008)
Inoue, A.: Closed-form expression for finite predictor coefficients of multivariate ARMA processes. J. Multivariate Anal. 176, 104578 (2020)
Inoue, A., Kasahara, Y.: Explicit representation of finite predictor coefficients and its applications. Ann. Stat. 34(2), 973–993 (2006)
Inoue, A., Kasahara, Y., Pourahmadi, M.: The intersection of past and future for multivariate stationary processes. Proc. Amer. Math. Soc. 144(4), 1779–1786 (2016)
Inoue, A., Kasahara, Y., Pourahmadi, M.: Baxter’s inequality for finite predictor coefficients of multivariate long-memory stationary processes. Bernoulli 24(2), 1202–1232 (2018)
Kasahara, Y., Bingham, N.H.: Matricial Baxter’s theorem with a Nehari sequence. Math. Nachr. 291(17–18), 2590–2598 (2018)
Katsnelson, V.E., Kirstein, B.: On the theory of matrix-valued functions belonging to the Smirnov class. In: Topics in Interpolation Theory. (Leipzig, 1994). Oper. Theory Adv. Appl. vol. 95, pp. 299–350, Birkhäuser (1997)
Masani, P.: The prediction theory of multivariate stochastic processes. III. Unbounded spectral densities. Acta Math. 104, 141–162 (1960)
Peller, V.V.: Hankel Operators and Their Applications. Springer, New York (2003)
Rozanov, Yu.A.: Stationary Random Processes. Holden-Day, San Francisco (1967)
Simon, B.: Orthogonal Polynomials on the Unit Circle. Part 1. Classical Theory. American Mathematical Society, Providence (2005)
Simon, B.: Orthogonal Polynomials on the Unit Circle. Part 2. Spectral Theory. American Mathematical Society, Providence (2005)
Simon, B.: Szegö’s Theorem and Its Descendants. Spectral Theory for \(L^2\) Perturbations of Orthogonal Polynomials. Princeton University Press, Princeton (2011)
Subba Rao, S., Yang, J.: Reconciling the Gaussian and Whittle Likelihood with an application to estimation in the frequency domain. Ann. Stat. 49(5), 2774–2802 (2021)
Acknowledgements
The author is grateful to two referees for their valuable comments and helpful suggestions.
Proofs of Lemmas 5.2 and 5.3
As in Sect. 5, we assume (1.17) and (1.18). We use the same notation as in Sect. 5. For K in (5.1) with (5.2), we assume \(K\ge 1\).
We prove Lemma 5.2.
Proof
(i) We assume \(n\ge u\ge m_0+1\), and prove (5.17) and (5.18) by induction. First, from Lemma 5.1,
Next, for \(k=1,2,\dots \), we assume (5.17). Then, by Lemma 5.1,
or (5.18). From this as well as Lemma 5.1,
or (5.17) with k replaced by \(k+1\). Thus (5.17) and (5.18) follow.
(ii) We assume \(1\le u\le n-m_0\), and prove (5.19) and (5.20) by induction. First, from Lemma 5.1,
Next, for \(k=1,2,\dots \), we assume (5.19). Then, by Lemma 5.1,
or (5.20). From this as well as Lemma 5.1,
or (5.19) with k replaced by \(k+1\). Thus (5.19) and (5.20) follow. \(\square \)
To prove Lemma 5.3, we need some propositions.
Proposition A.1
For \(m, n\in {\mathbb {Z}}\), \(i, j \in {\mathbb {N}}\cup \{0\}\) and \(x, y\in {\mathbb {D}}\), we have
Proof
Let \(m,n\in {\mathbb {Z}}\), \(i, j \in {\mathbb {N}}\cup \{0\}\) and \(x, y\in {\mathbb {D}}\). Then, we have
hence
Thus, the proposition follows. \(\square \)
Proposition A.2
For \(n\in {\mathbb {Z}}\), \(i, j \in {\mathbb {N}}\cup \{0\}\) and \(x, y\in {\mathbb {D}}\), we have
Proof
Let \(n\in {\mathbb {Z}}\), \(i, j \in {\mathbb {N}}\cup \{0\}\) and \(x, y\in {\mathbb {D}}\). Since \(x^ny^j/(1-xy)=\sum _{\ell =0}^{\infty } x^{n+\ell }y^{j+\ell }\), we have
On the other hand, by Proposition A.1, we have
Comparing, we obtain the proposition. \(\square \)
We are ready to prove Lemma 5.3.
Proof
By (5.5)–(5.8) and Proposition A.2, we have, for \(n\in {\mathbb {Z}}\), \(\mu \in \{1,\dots ,K\}\) and \(i \in \{1,\dots ,m_{\mu }\}\),
and
Thus, the lemma follows. \(\square \)
Inoue, A. Explicit formulas for the inverses of Toeplitz matrices, with applications. Probab. Theory Relat. Fields 185, 513–552 (2023). https://doi.org/10.1007/s00440-022-01162-9