1 Main result

Certain periodic solutions of nonlinear delay differential equations with smooth nonlinearity can be computed via Galerkin equations. Oishi’s paper [3] discusses how to obtain a lower bound for the smallest singular value of the coefficient matrix G of the linearized Galerkin equation.

Throughout this note \(\Vert \cdot \Vert \) denotes the \(\ell _2\)-norm, and matrices are tacitly assumed to be nonsingular whenever their inverse is used. Dividing the matrix G into blocks

$$\begin{aligned} G:= \left( \begin{array}{ccc} A &{} B\\ C &{} D\\ \end{array}\right) \end{aligned}$$
(1)

with a square upper left block A of small size, in practical applications \(\Vert G^{-1}\Vert \) is often not too far from \(\Vert A^{-1}\Vert \). In other words, the main information on \(\Vert G^{-1}\Vert \) sits in A. Moreover, the matrix G exhibits some kind of diagonal dominance, so that \(\Vert D^{-1}\Vert \) can be expected to be not too far from \(\Vert D_d^{-1}\Vert \) for the splitting \(D = D_d+D_f\) of D into its diagonal and off-diagonal part.

Based on that, Oishi discusses in [3] how to obtain an upper bound on \(\Vert G^{-1}\Vert = \sigma _{\min }(G)^{-1}\) by estimating the influence of the remaining blocks B, C and D. Oishi proves the following theorem.

Theorem 1.1

Let n be a positive integer, and m be a non-negative integer satisfying \(m \le n\). Let \(G \in M_n(\mathbb {R})\) be as in (1) with \(A \in M_m(\mathbb {R})\), \(B \in M_{m,n-m}(\mathbb {R})\), \(C \in M_{n-m,m}(\mathbb {R})\) and \(D \in M_{n-m}(\mathbb {R})\). Let \(D_d\) and \(D_f\) be the diagonal part and the off-diagonal part of D, respectively. If

$$\begin{aligned} \Vert A^{-1}B\Vert< 1,\;\; \Vert CA^{-1}\Vert< 1 \quad \text{ and }\quad \Vert D_d^{-1}(D_f-C A^{-1} B)\Vert < 1, \end{aligned}$$
(2)

then G is invertible and

$$\begin{aligned} \Vert G^{-1}\Vert \le \displaystyle \frac{\max \left\{ \Vert A^{-1}\Vert ,\displaystyle \frac{\Vert D_d^{-1}\Vert }{1-\Vert D_d^{-1}(D_f-C A^{-1} B)\Vert }\right\} }{(1-\Vert A^{-1}B\Vert )(1-\Vert CA^{-1}\Vert )} \;. \end{aligned}$$
(3)

In a practical application the matrix G may be large. The advantage of Oishi’s bound is that a reasonable upper bound for \(\Vert G^{-1}\Vert \), and thus a lower bound for \(\sigma _{\min }(G)\), is obtained with small computational effort. In particular, the clever use of the Schur complement requires only the small upper left block A of G to be inverted, whereas the inverse of the large lower right block D is estimated using a Neumann expansion.

For \(p:= \min (m,n)\), denote by \(I_{m,n}\) the \(m \times n\) matrix with the \(p \times p\) identity matrix in the upper left corner and zeros elsewhere. If \(m=n\), we briefly write \(I_m\). Oishi’s estimate is based on the factorization [2] of G using its Schur complement

$$\begin{aligned} G = \left( \begin{array}{ccc} I_m &{} 0\\ C A^{-1} &{} I_{n-m}\\ \end{array}\right) \left( \begin{array}{ccc} A &{} 0\\ 0 &{} D-CA^{-1}B\\ \end{array}\right) \left( \begin{array}{ccc} I_m &{} A^{-1}B\\ 0 &{} I_{n-m}\\ \end{array}\right) . \end{aligned}$$

It follows

$$\begin{aligned} \Vert G^{-1}\Vert \le \varphi (C A^{-1}) \, \max \left\{ \Vert A^{-1}\Vert ,\Vert (D-CA^{-1}B)^{-1}\Vert \right\} \, \varphi (A^{-1}B) \end{aligned}$$
(4)

where for \(N \in M_{m,n-m}(\mathbb {R})\)

$$\begin{aligned} \varphi (N):= \left\| \left( \begin{array}{ccc} I_m &{} -N\\ 0 &{} I_{n-m}\\ \end{array}\right) \right\| = \left\| \left( \begin{array}{ccc} I_m &{} N\\ 0 &{} I_{n-m}\\ \end{array}\right) ^{-1} \right\| = \left\| \left( \begin{array}{ccc} I_m &{} 0\\ N^T &{} I_{n-m}\\ \end{array}\right) ^{-1} \right\| = \left\| \left( \begin{array}{ccc} I_m &{} 0\\ -N^T &{} I_{n-m}\\ \end{array}\right) \right\| . \end{aligned}$$

To bound \(\Vert (D-CA^{-1}B)^{-1}\Vert \) the standard estimate

$$\begin{aligned} \Vert (D-CA^{-1}B)^{-1}\Vert = \Vert \left( I + D_d^{-1}(D_f - CA^{-1}B) \right) ^{-1} D_d^{-1}\Vert \le \displaystyle \frac{ \Vert D_d^{-1}\Vert }{1-\Vert D_d^{-1}(D_f - CA^{-1}B)\Vert } \end{aligned}$$

based on a Neumann expansion is used. In practical applications such as those in [3] this causes only a small overestimation, and often the maximum in the middle of (4) equals \(\Vert A^{-1}\Vert \).
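The identity behind this estimate is \(D-CA^{-1}B = D_d\bigl (I + D_d^{-1}(D_f-CA^{-1}B)\bigr )\), which can be confirmed numerically. In the following sketch the random test matrix with growing diagonal (mimicking the applications described above) is our own choice.

```python
# Sketch of the Neumann estimate for the Schur complement (random test
# matrix with growing diagonal; our own choice, not from the paper).
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 6
G = 0.1 * rng.standard_normal((n, n))
np.fill_diagonal(G, np.arange(1.0, n + 1))   # growing diagonal
A, B, C, D = G[:m, :m], G[:m, m:], G[m:, :m], G[m:, m:]
Dd = np.diag(np.diag(D))
Df = D - Dd
S = D - C @ np.linalg.inv(A) @ B             # Schur complement
# identity: D - C A^{-1} B = D_d ( I + D_d^{-1} (D_f - C A^{-1} B) )
M = np.linalg.inv(Dd) @ (Df - C @ np.linalg.inv(A) @ B)
assert np.allclose(S, Dd @ (np.eye(n - m) + M))
r = np.linalg.norm(M, 2)                     # contraction factor, here < 1
lhs = np.linalg.norm(np.linalg.inv(S), 2)
rhs = np.linalg.norm(np.linalg.inv(Dd), 2) / (1 - r)
print(r, lhs, rhs)  # r < 1 and lhs <= rhs as claimed
```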

Thus it remains to bound \(\varphi (N)\) for \(N \in \{CA^{-1},A^{-1}B\}\). This is done in [3] via \(\varphi (N) \le \displaystyle \frac{1}{1-\Vert N\Vert }\), requiring \(\Vert N\Vert < 1\). The following lemma removes that restriction and gives an explicit formula for \(\varphi (N)\).

Lemma 1.2

Let \(H \in M_n(\mathbb {R})\) have the block representation

$$\begin{aligned} H = \left( \begin{array}{ccc} I_k &{} -N\\ 0 &{} I_{n-k}\\ \end{array}\right) \end{aligned}$$

for \(1 \le k < n\) and \(N \in M_{k,n-k}(\mathbb {R})\) with \(\Vert N\Vert \ne 0\). Abbreviate \(\mu := \Vert N\Vert \) and define

$$\begin{aligned} \alpha := \left\{ \frac{1}{2} \left( 1+\displaystyle \frac{1}{\sqrt{1+\frac{4}{\mu ^2}}} \right) \right\} ^{1/2} \in (0,1). \end{aligned}$$

Then

$$\begin{aligned} \psi (N):= \Vert H\Vert = \Vert H^{-1}\Vert = \sqrt{ 1 + 2\alpha \mu \sqrt{1-\alpha ^2} + \alpha ^2\mu ^2 } \;. \end{aligned}$$
(5)

Proof

Let \(x \in \mathbb {R}^k\) and \(y \in \mathbb {R}^{n-k}\). Then

$$\begin{aligned} \left\| H \left( \begin{array}{ccc} \hspace{-4pt} x \hspace{-4pt}\\ \hspace{-4pt} y \hspace{-4pt}\\ \end{array}\right) \right\| ^2 = \left\| \left( \begin{array}{ccc} \hspace{-4pt} x + Ny \hspace{-4pt}\\ \hspace{-4pt} y \hspace{-4pt}\\ \end{array}\right) \right\| ^2 \le (\Vert x\Vert + \Vert N\Vert \Vert y\Vert )^2 + \Vert y\Vert ^2 \end{aligned}$$

with equality if, and only if, \(\Vert Ny\Vert = \Vert N\Vert \Vert y\Vert \) and x is a positive multiple of Ny. Let \(u \in \mathbb {R}^k\) and \(v \in \mathbb {R}^{n-k}\) with \(\Vert u\Vert = \Vert v\Vert = 1\) and \(Nv = \Vert N\Vert u\) and define \(y:= \alpha v\) and \(x:= \sqrt{1-\alpha ^2} \, u\). Then \(\left\| \left( \begin{array}{cc} \hspace{-4pt} x \hspace{-4pt}\\ \hspace{-4pt} y \hspace{-4pt}\\ \end{array}\right) \right\| = 1\) and

$$\begin{aligned} \nu (\alpha ):= \left\| H \left( \begin{array}{ccc} \hspace{-4pt} x \hspace{-4pt}\\ \hspace{-4pt} y \hspace{-4pt}\\ \end{array}\right) \right\| ^2 = \left( \sqrt{1-\alpha ^2} + \alpha \mu \right) ^2 + \alpha ^2 = 1 + 2\alpha \mu \sqrt{1-\alpha ^2} + \alpha ^2\mu ^2, \end{aligned}$$
(6)

so that the maximum of \(\nu (\alpha )\) over \(\alpha \in [0,1]\) is equal to \(\Vert H\Vert ^2\). A computation yields

$$\begin{aligned} \displaystyle \frac{d \nu (\alpha )}{d \alpha } = \displaystyle \frac{2\mu }{\sqrt{1-\alpha ^2}} \left( 1 - 2\alpha ^2 + \alpha \mu \sqrt{1-\alpha ^2} \right) . \end{aligned}$$

Setting the derivative to zero and abbreviating \(\beta := \alpha ^2\) gives

$$\begin{aligned} \beta ^2 - \beta + \frac{1}{4+\mu ^2} = 0 \end{aligned}$$

with the relevant (larger) root

$$\begin{aligned} \beta = \frac{1}{2}\left( 1+\displaystyle \frac{1}{\sqrt{1+\frac{4}{\mu ^2}}} \right) \in (0,1) \end{aligned}$$

for nontrivial N. That proves the lemma. \(\square \)
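Formula (5) can be validated against a direct computation of \(\Vert H\Vert \). In the sketch below the block sizes and the random N are arbitrary choices; note that \(\Vert N\Vert \) may well exceed 1, where the bound \(1/(1-\Vert N\Vert )\) of [3] is not applicable.

```python
# Check of formula (5): psi(||N||) equals the spectral norm of H and of
# H^{-1} (block sizes and the random N are arbitrary choices).
import numpy as np

def psi(mu):
    """Formula (5) for ||H|| = ||H^{-1}|| with mu = ||N|| > 0."""
    alpha = np.sqrt(0.5 * (1 + 1 / np.sqrt(1 + 4 / mu**2)))
    return np.sqrt(1 + 2 * alpha * mu * np.sqrt(1 - alpha**2) + (alpha * mu) ** 2)

rng = np.random.default_rng(1)
kdim, n = 3, 8
N = rng.standard_normal((kdim, n - kdim))
H = np.eye(n)
H[:kdim, kdim:] = -N                 # H as in Lemma 1.2
mu = np.linalg.norm(N, 2)
print(psi(mu), np.linalg.norm(H, 2), np.linalg.norm(np.linalg.inv(H), 2))
```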

The lemma removes the first two assumptions in (2). In practical applications these are usually satisfied anyway; the advantage is that they no longer have to be verified. That proves the following theorem.

Theorem 1.3

Let n, m and \(G \in M_n(\mathbb {R})\) with the splitting as in (1) be given. With the splitting \(D = D_d+D_f\) of D into its diagonal and off-diagonal part assume

$$\begin{aligned} \Vert D_d^{-1}(D_f-C A^{-1} B)\Vert < 1. \end{aligned}$$
(7)

Then G is invertible and

$$\begin{aligned} \Vert G^{-1}\Vert \le \max \left\{ \Vert A^{-1}\Vert ,\displaystyle \frac{\Vert D_d^{-1}\Vert }{1-\Vert D_d^{-1}(D_f-C A^{-1} B)\Vert }\right\} \; \psi (A^{-1}B) \; \psi (CA^{-1}) \end{aligned}$$
(8)

using \(\psi (\cdot )\) as in (5) in Lemma 1.2.
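The following sketch compares the original bound (3) with the new bound (8) on a matrix of the family treated in the next sections (\(G_{ij}=k^{|i-j|}\) off the diagonal, \((1,\ldots ,n)\) on the diagonal); the parameters \(n=300\), \(k=0.9\), \(m=20\) are our own choices. Since \(\psi (\mu ) \le 1/(1-\mu )\) for \(\mu \in (0,1)\), the new bound never exceeds the original one whenever both apply.

```python
# Comparison of the original bound (3) and the new bound (8); the test
# matrix family is from the paper, the parameters n, k, m are ours.
import numpy as np

def psi(mu):  # formula (5), written via a2 = alpha^2
    a2 = 0.5 * (1 + 1 / np.sqrt(1 + 4 / mu**2))
    return np.sqrt(1 + 2 * mu * np.sqrt(a2 * (1 - a2)) + a2 * mu**2)

n, k, m = 300, 0.9, 20
i, j = np.indices((n, n))
G = k ** abs(i - j)
np.fill_diagonal(G, np.arange(1, n + 1))
nrm = lambda M: np.linalg.norm(M, 2)
A, B, C, D = G[:m, :m], G[:m, m:], G[m:, :m], G[m:, m:]
Ainv = np.linalg.inv(A)
Ddinv = np.diag(1.0 / np.diag(D))
Df = D - np.diag(np.diag(D))
r = nrm(Ddinv @ (Df - C @ Ainv @ B))        # assumption (7): r < 1
mid = max(nrm(Ainv), nrm(Ddinv) / (1 - r))
r1, r2 = nrm(Ainv @ B), nrm(C @ Ainv)
new = mid * psi(r1) * psi(r2)               # bound (8)
old = mid / ((1 - r1) * (1 - r2))           # bound (3), needs r1, r2 < 1
true_norm = nrm(np.linalg.inv(G))
print(old, new, true_norm)  # true_norm <= new <= old
```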

For a matrix G arising in linearized Galerkin equations, the norm \(\Vert (D-CA^{-1}B)^{-1}\Vert \) of the inverse of the Schur complement is often dominated by \(\Vert A^{-1}\Vert \). The reason is the increasing diagonal \(D_d\) of the lower right part of G: \(\Vert D_d^{-1}\Vert \) equals the reciprocal of the upper left element of \(D_d\), and \(\Vert D_d^{-1}(D_f-CA^{-1}B)\Vert \) becomes small. In that case the bound (8) reduces to \(\Vert G^{-1}\Vert \le \Vert A^{-1}\Vert \; \psi (A^{-1}B) \; \psi (CA^{-1})\). It can be adapted to other norms following the lines of the proof of Lemma 1.2.

2 Computational improvement and an infinite dimensional example

For both the original bound (3) and the new bound (8) the main computational effort is to estimate \(\Vert D_d^{-1}(D_f-C A^{-1} B)\Vert \) requiring \({\mathcal {O}}((n-m)^2\,m)\) operations. It often suffices to use \(\Vert N\Vert \le \sqrt{\Vert N\Vert _1\Vert N\Vert _{\infty }} =: \pi (N)\) and the crude estimate

$$\begin{aligned} \Vert D_d^{-1}(D_f-C A^{-1} B)\Vert \le \pi (D_d^{-1} D_f) + \pi (D_d^{-1} C) \pi (A^{-1} B) {=}{:} \gamma . \end{aligned}$$
(9)

That avoids products of large matrices as well as the computation of the \(\ell _2\)-norm of large matrices, thus reducing the total computational cost to \({\mathcal {O}}((n-m)m^2)\) operations. For instance, in Example 3 in [4] with \(n=10{,}000\) and \(k=0.9\) the bound (9) is successful in the sense that \(\max \left( \Vert A^{-1}\Vert ,\frac{\Vert D_d^{-1}\Vert }{1-\gamma }\right) = \Vert A^{-1}\Vert \) for dimension \(m \ge 13\) of the upper left block A. The additional computational cost for (9) is marginal, so in any case it is worth trying. If successful, the computing time for dimension \(n=10{,}000\) reduces from 8 minutes to less than 2 seconds on a standard laptop.
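The estimate (9) needs only absolute row and column sums of the factors. A sketch with our own test parameters (the matrix family is the one of Example 3, the values \(n=300\), \(m=20\) are ours):

```python
# Sketch of the crude estimate (9); pi(N) = sqrt(||N||_1 ||N||_inf)
# requires only row and column sums (test parameters are ours).
import numpy as np

def pi(M):
    """pi(M) = sqrt(||M||_1 ||M||_inf) >= ||M||_2."""
    return np.sqrt(np.linalg.norm(M, 1) * np.linalg.norm(M, np.inf))

n, k, m = 300, 0.9, 20
i, j = np.indices((n, n))
G = k ** abs(i - j)
np.fill_diagonal(G, np.arange(1, n + 1))
A, B, C, D = G[:m, :m], G[:m, m:], G[m:, :m], G[m:, m:]
Ddinv = np.diag(1.0 / np.diag(D))
Df = D - np.diag(np.diag(D))
gamma = pi(Ddinv @ Df) + pi(Ddinv @ C) * pi(np.linalg.inv(A) @ B)
exact = np.linalg.norm(Ddinv @ (Df - C @ np.linalg.inv(A) @ B), 2)
print(gamma, exact)  # gamma >= exact, and here gamma < 1
```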

In order to compute rigorous bounds, an upper bound for the \(\ell _2\)-norm of a matrix \(C \in M_n(\mathbb {R})\) is necessary. An obvious possibility is to use \(\Vert C\Vert _2 \le \sqrt{\Vert C\Vert _1\Vert C\Vert _\infty }\). Denoting the spectral radius of a matrix by \(\varrho (\cdot )\), better bounds are obtained by \(\Vert C\Vert _2^2 \le \Vert |C|\Vert _2^2 = \varrho (|C|^T|C|)\) and Collatz’ bound [1]

$$\begin{aligned} \varrho (|C|^T|C|) \le \max _i \displaystyle \frac{(|C|^T(|C|x))_i}{x_i} \, \end{aligned}$$

which is true for any positive vector x. Then a few power iterations on x yield an accurate upper bound for \(\Vert |C|\Vert _2\) in \({\mathcal {O}}(n^2)\) operations. More sophisticated methods for bounding \(\Vert C\Vert _2\) can be found in [6].
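A minimal sketch of this procedure follows (the number of iterations and the nonnegative test matrix are our choices). Note that Collatz’ bound is valid for any positive x, so the iteration only improves the quality of the bound, never its validity.

```python
# Collatz' bound: for any positive x, rho(|C|^T|C|) <= max_i (|C|^T|C| x)_i / x_i.
# A few power iterations sharpen x; validity never depends on convergence.
import numpy as np

def norm2_upper_bound(C, iters=10):
    Ca = np.abs(C)
    x = np.ones(C.shape[1])
    for _ in range(iters):
        x = Ca.T @ (Ca @ x)        # power iteration with |C|^T |C|
        x /= x.max()               # rescale to avoid overflow
    x = np.maximum(x, 1e-300)      # keep x strictly positive
    return np.sqrt(np.max(Ca.T @ (Ca @ x) / x))

n, k = 150, 0.9
i, j = np.indices((n, n))
C = k ** abs(i - j)                # nonnegative test matrix (our choice)
np.fill_diagonal(C, np.arange(1, n + 1))
ub = norm2_upper_bound(C)
print(ub, np.linalg.norm(C, 2))   # ub is an upper bound for ||C||_2
```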

A practical consideration, in particular for so-called verification methods [4, 5], is that the true value \(\mu = \Vert N\Vert \) is usually not available. However, \(\nu (\alpha )\) in (6) is increasing in \(\mu \), implying

$$\begin{aligned} \Vert N\Vert \le \mu \quad \Rightarrow \quad \Vert H\Vert = \Vert H^{-1}\Vert \le \sqrt{ 1 + 2\alpha \mu \sqrt{1-\alpha ^2} + \alpha ^2\mu ^2 }. \end{aligned}$$
(10)

For example, \(\pi (N)\) can be used as an upper bound \(\mu \) for \(\Vert N\Vert \). Applying this to (8), i.e., bounding \(\psi (N)\) for \(N \in \{A^{-1}B,CA^{-1}\}\) via \(\Vert N\Vert \le \pi (N)\), entirely avoids the computation of the \(\ell _2\)-norm of large matrices.

As an illustration, we show how to compute an upper bound of \(\Vert G^{-1}\Vert \) for the third example in [4] in infinite dimension. The matrix G is parameterized by some \(k \in (0.5,1)\) with elements \(G_{ij} = k^{|i-j|}\) for \(i \ne j\) and \((1,\ldots ,n)\) on the diagonal. As before, m denotes the size of the upper left block A.

We first show how to estimate \(\gamma \) in (9) for arbitrarily large n. Define

$$\begin{aligned} P = D_d^{-1} D_f \quad \Rightarrow \quad P_{ij}:= {\left\{ \begin{array}{ll} 0 \quad &{} \text{ if } i = j \\ k^{|i-j|}/(i+m) \quad &{} \text{ otherwise } \\ \end{array}\right. }. \end{aligned}$$

Then for every fixed \(s \in \{1,\ldots ,n-m\}\)

$$\begin{aligned} \sum _{j=1}^{n-m} P_{sj} = \sum _{j=1}^{s-1} k^{s-j}/(s+m) + \sum _{j=s+1}^{n-m} k^{j-s}/(s+m) < 2 \sum _{j=1}^{\infty } k^{j}/(1+m) = \displaystyle \frac{2k}{(1-k)(m+1)} \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^{n-m} P_{is} = \sum _{i=1}^{s-1} k^{s-i}/(i+m) + \sum _{i=s+1}^{n-m} k^{i-s}/(i+m) < 2 \sum _{i=1}^{\infty } k^{i}/(1+m) = \displaystyle \frac{2k}{(1-k)(m+1)} \end{aligned}$$

proves

$$\begin{aligned} \Vert D_d^{-1} D_f\Vert \le \sqrt{\Vert D_d^{-1} D_f\Vert _1\Vert D_d^{-1} D_f\Vert _\infty }< \displaystyle \frac{2k}{(1-k)(m+1)}. \end{aligned}$$
(11)

That estimate is valid for any dimension \(n > m\). For \(Q = D_d^{-1} C\) we have \(Q_{ij} = k^{m+i-j}/(m+i)\) (note \(j \le m < m+i\)), so that for every fixed \(s \in \{1,\ldots ,n-m\}\)

$$\begin{aligned} \sum _{j=1}^{m} Q_{sj} \le \sum _{j=1}^{m} k^{m+1-j}/(m+1) < \displaystyle \frac{k}{(1-k)(m+1)} \end{aligned}$$

and for every fixed \(s \in \{1,\ldots ,m\}\)

$$\begin{aligned} \sum _{i=1}^{n-m} Q_{is} = \sum _{i=1}^{n-m} k^{m+i-s}/(m+i) < \displaystyle \frac{k}{(1-k)(m+1)}. \end{aligned}$$

It follows

$$\begin{aligned} \Vert D_d^{-1} C\Vert < \displaystyle \frac{k}{(1-k)(m+1)}. \end{aligned}$$
(12)
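The estimates (11) and (12) are uniform in the dimension n, which can be observed on finite sections. In the following sketch the parameters \(k=0.9\), \(m=25\) and the values of n are our choices.

```python
# Finite sections confirm that (11) and (12) hold uniformly in n
# (k, m and the tested dimensions are our own choices).
import numpy as np

def pi(M):
    return np.sqrt(np.linalg.norm(M, 1) * np.linalg.norm(M, np.inf))

k, m = 0.9, 25
bound_P = 2 * k / ((1 - k) * (m + 1))   # right-hand side of (11)
bound_Q = k / ((1 - k) * (m + 1))       # right-hand side of (12)
results = []
for n in (100, 400, 1600):
    i, j = np.indices((n, n))
    G = k ** abs(i - j)
    np.fill_diagonal(G, np.arange(1, n + 1))
    C, D = G[m:, :m], G[m:, m:]
    Ddinv = np.diag(1.0 / np.diag(D))
    results.append((pi(Ddinv @ (D - np.diag(np.diag(D)))), pi(Ddinv @ C)))
print(bound_P, bound_Q, results)  # every pi-value stays below its bound
```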

In order to estimate \(\Vert A^{-1}B\Vert \) we split \(B = [ S \;\; T]\) into blocks \(S \in M_{m,q}\) and \(T \in M_{m,n-m-q}\) with \(1 \le q < n-m\). Then \(B_{ij} = k^{m-i+j}\) implies \(T_{ij} = k^{q+m-i+j}\) and

$$\begin{aligned} \Vert T\Vert \le \sqrt{\left( \sum _{i=1}^m k^{q+m-i+1}\right) \left( \sum _{j=1}^{n-m-q} k^{q+j}\right) } < \displaystyle \frac{k^{q+1}}{1-k} \end{aligned}$$

so that

$$\begin{aligned} \Vert A^{-1} B\Vert \le \Vert A^{-1} S\Vert + \Vert A^{-1}T\Vert < \Vert A^{-1} S\Vert + \displaystyle \frac{k^{q+1}\Vert A^{-1}\Vert }{1-k}. \end{aligned}$$
(13)

Summarizing and using (11), (12), and (13), (9) becomes

$$\begin{aligned} \Vert D_d^{-1}(D_f-C A^{-1} B)\Vert < \displaystyle \frac{2k}{(1-k)(m+1)} + \displaystyle \frac{k}{(1-k)(m+1)}\left( \Vert A^{-1} S\Vert + \displaystyle \frac{k^{q+1}\Vert A^{-1}\Vert }{1-k}\right) =: \delta . \end{aligned}$$

Note that only “small” matrices are involved in the computation of \(\delta \). For given k, we choose m and q large enough to ensure \(\max \left( \Vert A^{-1}\Vert ,\frac{\Vert D_d^{-1}\Vert }{1-\delta }\right) = \Vert A^{-1}\Vert \). Since \(\delta \) and \(\Vert D_d^{-1}\Vert = 1/(m+1)\) do not depend on n, this remains true for the infinite dimensional matrix G as well. The computational effort is \({\mathcal {O}}(\max (m^2q,m^3))\) operations.
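A sketch of this computation for \(k=0.9\) with the choice \(m=200\), \(q=100\) used for Table 1; the matrices A and S follow the definitions above, the code itself is our own illustration.

```python
# Computation of delta for k = 0.9 with m = 200, q = 100 (the choice
# reported for Table 1); only m x m and m x q matrices appear.
import numpy as np

k, m, q = 0.9, 200, 100
i, j = np.indices((m, m))
A = k ** abs(i - j)
np.fill_diagonal(A, np.arange(1, m + 1))
i, j = np.indices((m, q))
S = k ** (m - i + j)          # S_ij = k^{m-i+j}; the 1-based index shifts cancel
Ainv = np.linalg.inv(A)
nA = np.linalg.norm(Ainv, 2)
nAS = np.linalg.norm(Ainv @ S, 2)
c = k / ((1 - k) * (m + 1))
delta = 2 * c + c * (nAS + k ** (q + 1) * nA / (1 - k))
nDd = 1 / (m + 1)             # ||D_d^{-1}||, independent of n > m
print(delta, nA, nDd / (1 - delta))  # delta < 1, and the maximum equals nA
```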

Note that the bound is very crude, with much room for improvement. Nevertheless, \(k=0.9\), \(m=25\) and \(q=35\) ensure \(\max \left( \Vert A^{-1}\Vert ,\frac{\Vert D_d^{-1}\Vert }{1-\delta }\right) = \Vert A^{-1}\Vert \) for arbitrary dimension n with marginal computational effort.

In our example, symmetry implies \(\Vert A^{-1}B\Vert = \Vert C A^{-1}\Vert \), so that inserting the above quantities proves the bounds displayed in Table 1 for arbitrarily large dimension n.

Table 1 Bounds of \(\Vert G^{-1}\Vert \) for infinite dimensional G with \(k=0.9\) and choices of m and q

Numerical evidence suggests that the true norm does not exceed 2.39. The total computing time for infinite dimensional G with \(m=200\) and \(q=100\) is less than 0.09 seconds on a laptop.

3 Comparison of the original and new bound

In [3] the bounds on \(\Vert G^{-1}\Vert \) are tested by means of three typical application examples. The first example is a tridiagonal matrix G with all elements on the first subdiagonal equal to 2, all elements on the first superdiagonal equal to 3, and \((1,\ldots ,n)\) on the diagonal. Therefore, for dimensions \(n \ge 21\) neither \(\varphi (CA^{-1})\) nor \(\varphi (A^{-1}B)\) changes. Since the maximum in (4) is equal to \(\Vert A^{-1}\Vert \), the original bound and the new bound using (4) and (5) do not change for any \(n \ge 21\), i.e., they are valid for the infinite dimensional case. Computed bounds for different sizes m of the upper left block are displayed in Table 2. Computational evidence suggests that \(\Vert G^{-1}\Vert \) does not exceed 3.12 for infinite dimension.

Table 2 Bounds of \(\Vert G^{-1}\Vert \) for infinite dimensional G in Example 1 of [4] for several choices of m

In the second example in [3] the upper left \(5 \times 5\) block of G is replaced by a random positive matrix. Since the lower right \((n-5)\times (n-5)\) block of G is still tridiagonal, the original and the new bound again do not change for any dimension \(n \ge 21\). Typical values of the original bound, the new bound and \(\Vert G^{-1}\Vert \) are 1.909, 1.648 and 1.449, respectively.

The third example in [3] was already treated in the previous section. For k close to 1 the bounds (3) and (8) become weak. Results for different values of k and dimensions n are displayed in Table 3. For the true value of \(\Vert G^{-1}\Vert \), verified bounds were computed using verification methods, so that the digits displayed in the last column are correct.

Table 3 Bounds of \(\Vert G^{-1}\Vert \) using dimension \(m=20\) of the upper left block A for different parameters k and dimensions n of G

Both the original and the new bound show little dependence on the dimension, and \(\Vert G^{-1}\Vert \) practically none. As k approaches the upper limit 1, the improvement by the new bound increases, but naturally both bounds also become weaker compared to \(\Vert G^{-1}\Vert \). For \(k \ge 0.97\) the original bound fails because the assumptions \(\Vert CA^{-1}\Vert < 1\) and \(\Vert A^{-1}B\Vert < 1\) are not satisfied.