1 Introduction

Let \(T >0, u^0\in H,\) and consider the initial value problem of seeking \(u \in C((0,T];{\mathscr {D}}(A))\cap C([0,T];H)\) satisfying

$$\begin{aligned} \left\{ \begin{aligned}&u' (t) + Au(t) = f(t), \quad 0<t<T,\\&u(0)=u^0, \end{aligned} \right. \end{aligned}$$
(1.1)

with A a positive definite, selfadjoint, linear operator on a Hilbert space \((H, (\cdot , \cdot )) \) with domain \({\mathscr {D}}(A)\) dense in H and \(f: [0,T] \rightarrow H\) a given forcing term.

The backward difference formula (BDF) methods are popular for stiff differential equations, in particular, for parabolic equations. They are frequently implemented on nonuniform partitions for numerical efficiency.

For an integer \(N\geqslant 2,\) consider a partition \(0=t_0<t_1<\cdots <t_N=T\) of the time interval [0, T],  with time steps \(k_n:=t_n-t_{n-1}, n=1,\dotsc ,N.\) We recursively define a sequence of approximations \(u^n\) to the nodal values \( u(t_n)\) of the exact solution by the variable two-step BDF method,

$$\begin{aligned} D_2 u^{n}+A u^{n} =f^n, \quad n=2,\dotsc ,N, \end{aligned}$$
(1.2)

with \(f^n:=f(t_n),\) assuming that arbitrary starting approximations \(u^0\) and \(u^{1}\) are given. Here,

$$\begin{aligned} D_2 \upsilon ^n:=\Bigg (1+\tfrac{k_n}{k_{n-1}}\Bigg )\frac{\upsilon ^n-\upsilon ^{n-1}}{k_n} -\frac{k_n}{k_{n-1}}\frac{\upsilon ^n-\upsilon ^{n-2}}{k_n+k_{n-1}}. \end{aligned}$$
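For illustration, the difference quotient \(D_2\) is easy to probe numerically. The following Python sketch (a sanity check, not part of the analysis) implements the definition above and verifies that \(D_2 \upsilon^n\) reproduces the exact derivative \(\upsilon'(t_n)\) for polynomials of degree at most two, reflecting the second-order consistency of the variable two-step BDF formula; the node values used are arbitrary illustrative choices.

```python
def bdf2(v2, v1, v0, kn, kn1):
    """Variable two-step BDF difference quotient D_2 v^n, as defined above.

    v2, v1, v0 are v^n, v^{n-1}, v^{n-2}; kn, kn1 are k_n, k_{n-1}.
    """
    r = kn / kn1
    return (1 + r) * (v2 - v1) / kn - r * (v2 - v0) / (kn + kn1)

# D_2 is exact for polynomials of degree <= 2: D_2 v^n = v'(t_n)
t0, t1, t2 = 0.0, 0.4, 1.0                      # illustrative nonuniform nodes
k1, k2 = t1 - t0, t2 - t1
for v, dv in [(lambda t: t, lambda t: 1.0), (lambda t: t * t, lambda t: 2 * t)]:
    assert abs(bdf2(v(t2), v(t1), v(t0), k2, k1) - dv(t2)) < 1e-12
```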

Let \(| \cdot |\) denote the norm on H induced by the inner product \((\cdot , \cdot )\), and introduce on \(V:={\mathscr {D}}(A^{1/2})\) the norm \(\Vert \cdot \Vert \) given by \(\Vert \upsilon \Vert :=| A^{1/2} \upsilon |.\) We identify H with its dual, denote by \(V'\) the dual of V, and by \(\Vert \cdot \Vert _\star \) the dual norm on \(V',\) \(\Vert \upsilon \Vert _\star :=| A^{-1/2} \upsilon |.\) We shall use the notation \((\cdot , \cdot )\) also for the antiduality pairing between \(V'\) and V. For simplicity, we denote by \(\langle \cdot ,\cdot \rangle \) the inner product on \(V,\) \(\langle \upsilon , w\rangle :=(A^{1/2}\upsilon , A^{1/2}w).\)

1.1 Main result

We establish the following stability result.

Theorem 1.1

(Stability estimate) Let \(u^n\) satisfy (1.2), with \(u^0, u^1\in V\), and assume that

$$\begin{aligned} r_n:=\frac{k_n}{k_{n-1}}\leqslant r^\star \approx 1.9398,\quad n=2,\dotsc ,N; \end{aligned}$$
(1.3)

the bound \(r^\star \) is expressed in terms of the multipliers \(\delta =0.9672\) and \(\eta =-0.1793\) in (3.1); see also (4.17) for more precise values of the bound \(r^\star \) as well as of the multipliers. Then, the variable two-step BDF method (1.2) is stable in the sense that

$$\begin{aligned} |u^n|^2+\sum _{j=2}^{n}k_j\Vert u^{j}\Vert ^2 \leqslant C\textrm{e}^{c\varGamma _n}\Bigg (|u^0|^2+|u^1|^2 +k_2\Vert u^0\Vert ^2+k_2\Vert u^1\Vert ^2+\sum _{j=2}^{n}k_j\Vert f^{j}\Vert _\star ^2\Bigg ), \end{aligned}$$
(1.4)

\(n=2,\dotsc ,N.\) Here, \(\varGamma _n\) is a mesh-dependent quantity,

$$\begin{aligned} \varGamma _n:=\sum \limits _{j=2}^{n-2}[r_{j}-r_{j+2}]_{+}\quad \text {with}\quad [x]_+:=\max (x,0), \end{aligned}$$
(1.5)

and C, c denote generic constants, independent of T and the operator A, as well as of f and of the partition of the time interval.
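The numerical values quoted in the theorem are easy to reproduce from the closed-form expression (3.1) for \(r^\star (\delta ,\eta )\); the short Python check below (illustrative, not part of the paper) evaluates it for the multipliers above, and also for the single-multiplier case \(\eta =0\) with \(\delta =0.72349\) discussed in Remark 3.1.

```python
import math

def r_star(delta, eta):
    """Step-ratio bound r*(delta, eta) from formula (3.1)."""
    s = math.sqrt(1 + delta + 3 * eta)
    return s / (2 * math.sqrt(1 + eta) - s)

# the two-multiplier bound of Theorem 1.1
assert abs(r_star(0.9672, -0.1793) - 1.9398) < 5e-4
# the single-multiplier (eta = 0) bound of Emmrich, cf. Remark 3.1
assert abs(r_star(0.72349, 0.0) - 1.9104) < 5e-4
```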

Let us recall some partitions for which \(\varGamma _n\) is finite; see [12, p. 175]. If the sequence of the ratios \((r_n)\) is monotone (and bounded), then \(\varGamma _n\) is bounded; more precisely, \(\varGamma _n=0\) if \((r_n)\) is nondecreasing, and \(\varGamma _n=r_2+r_3-r_{n-1}-r_n\) if \((r_n)\) is decreasing. More generally, \(\varGamma _n\) is bounded in the practically reasonable case that the number of changes in monotonicity of the sequence \((r_n)\) is bounded, uniformly with respect to the number N of time steps. For partitions of the form \(t_i=(i/N)^\alpha ,\) with \(\alpha >1,\) the time steps \(k_i\) increase and the ratios \(r_i\) decrease to 1,  whence, in particular, \(r_i\leqslant r^\star \) except for a finite number of i.
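The behavior of \(\varGamma _n\) on such graded partitions can be checked directly; the following Python sketch (an illustrative computation, not from the paper, with arbitrarily chosen \(N\) and \(\alpha\)) builds the partition \(t_i=(i/N)^\alpha ,\) confirms that the ratios decrease, and verifies that \(\varGamma _N\) telescopes to \(r_2+r_3-r_{N-1}-r_N.\)

```python
N, alpha = 100, 2.0                      # illustrative choices
t = [(i / N) ** alpha for i in range(N + 1)]
k = [t[i] - t[i - 1] for i in range(1, N + 1)]   # k[0] = k_1, ..., k[N-1] = k_N
r = [k[i] / k[i - 1] for i in range(1, N)]       # r[0] = r_2, ..., r[N-2] = r_N

# Gamma_N = sum over j = 2, ..., N-2 of [r_j - r_{j+2}]_+
Gamma = sum(max(r[j] - r[j + 2], 0.0) for j in range(len(r) - 2))

# the ratios decrease toward 1, so the sum telescopes (note r_2 = 3 > r*
# here, consistent with "except for a finite number of i")
assert all(r[i] >= r[i + 1] for i in range(len(r) - 1))
assert abs(Gamma - (r[0] + r[1] - r[-2] - r[-1])) < 1e-12
```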

1.2 Main ingredients of the proof

We shall use the energy technique. Let \(r_n=k_n/k_{n-1}, n=2,\dotsc ,N,\) be the adjacent time step ratios. With the notation

$$\begin{aligned} \delta _k\upsilon ^n:=\upsilon ^n-\upsilon ^{n-k},\quad \omega _n:=\frac{1}{1+r_n},\quad \psi _n:=\Bigg (\frac{r_n}{1+r_n}\Bigg )^2, \end{aligned}$$

the backward difference quotient \(D_2 \upsilon ^{n}\) can be written in the form (cf. [2])

$$\begin{aligned} D_2 \upsilon ^{n}=\frac{1}{\omega _nk_n}\big (\delta _1\upsilon ^n-\psi _n\delta _2\upsilon ^n\big ). \end{aligned}$$
(1.6)
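The algebraic identity behind (1.6) can be confirmed numerically; the sketch below (a sanity check only, with arbitrary sample values) compares the original definition of \(D_2\) with the factored form (1.6).

```python
def d2_original(v2, v1, v0, kn, kn1):
    """D_2 v^n as defined in the introduction."""
    r = kn / kn1
    return (1 + r) * (v2 - v1) / kn - r * (v2 - v0) / (kn + kn1)

def d2_factored(v2, v1, v0, kn, kn1):
    """Form (1.6): D_2 v^n = (delta_1 v^n - psi_n delta_2 v^n) / (omega_n k_n)."""
    r = kn / kn1
    omega, psi = 1 / (1 + r), (r / (1 + r)) ** 2
    return ((v2 - v1) - psi * (v2 - v0)) / (omega * kn)

vals = (1.3, -0.7, 2.1)                  # arbitrary sample values v^n, v^{n-1}, v^{n-2}
assert abs(d2_original(*vals, 0.6, 0.4) - d2_factored(*vals, 0.6, 0.4)) < 1e-12
```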

Testing the BDF method (1.2) by \(2\omega _nk_n\left( u^{n} -\delta u^{n-1}-\eta u^{n-2}\right) \), with \(0<\delta <1\) and \(-1<\eta < 0\) two multipliers to be suitably chosen below, we obtain

$$\begin{aligned} {\mathscr {D}}_n+{\mathscr {A}}_n={\mathscr {F}}_n,\quad n=2,\dotsc ,N, \end{aligned}$$
(1.7)

with

$$\begin{aligned} \left\{ \begin{aligned}&{\mathscr {D}}_n:=2\omega _nk_n\left( D_2 u^n, u^{n} -\delta u^{n-1}-\eta u^{n-2}\right) ,\\&{\mathscr {A}}_n:=2\omega _nk_n\langle u^n, u^{n} -\delta u^{n-1}-\eta u^{n-2}\rangle ,\\&{\mathscr {F}}_n:=2\omega _nk_n\big (f^n, u^{n} -\delta u^{n-1}-\eta u^{n-2}\big ). \end{aligned} \right. \end{aligned}$$
(1.8)

The terms \({\mathscr {F}}_n\) on the right-hand side of (1.7), accounting for the forcing term f,  can be easily estimated from above by the generalized Cauchy–Schwarz and the weighted arithmetic–geometric mean inequalities. We shall estimate \({\mathscr {D}}_n\) from below, and subsequently the sum over all \({\mathscr {D}}_n,\) in Sect. 3.1, while in Sect. 3.2 we shall directly estimate the sum over all \({\mathscr {A}}_n\) from below rather than each term \({\mathscr {A}}_n\) separately. The key point in the estimate of the sum over all \({\mathscr {A}}_n\) is the positive definiteness of families of certain banded matrices; this property is described and established in Sect. 2.

1.3 Previous work

Stability of the A-stable two-step BDF method for parabolic equations on equidistant partitions can easily be established by the energy technique. The zero-stability property, and thus the stability for ODEs satisfying a Lipschitz condition, of the variable two-step BDF method is also well understood; a sufficient condition is \(r^\star <1+\sqrt{2}\approx 2.414\) in (1.3), and this bound is sharp; see [4, 7] as well as [10, p. 405]. In contrast, the analysis of the variable two-step BDF method for parabolic equations is cumbersome and still incomplete.

Grigorieff proved stability for linear parabolic equations, with bounds independent of \(\varGamma _n\), for \(r^\star \leqslant (1+\sqrt{3})/2\approx 1.366\) in [8, 9]. In [2], Becker established stability of the form (1.4) and derived error estimates for linear parabolic equations for \(r^\star \leqslant (2+\sqrt{13})/3\approx 1.8685\); see also [12, pp. 174–180]. Emmrich [5] further relaxed the bound to 1.9104 for semilinear parabolic equations. For stability estimates for the three-step BDF method, with a mesh-dependent quantity similar to \(\varGamma _n,\) we refer to [3].

In [2, 12] and [5], the method is tested by linear combinations of two terms, \(u^n\) and \(u^{n-1};\) here, to relax the condition on the ratios, we test, as mentioned, by linear combinations of all three terms that enter into the method, namely \(u^n, u^{n-1},\) and \(u^{n-2}.\) Furthermore, we directly estimate from below the sum of the terms accounting for the elliptic operator; this is in sharp contrast to Becker [2], Thomée [12], and Emmrich [5], where each of these terms is estimated separately; see Sect. 3.2.

Several stability estimates of a different kind, in which the difference quotient \((u^1-u^0)/k_1\) enters on the right-hand side, have been recently established both for linear and nonlinear parabolic equations, for bounds \(r^\star \) significantly larger than the optimal bound \(1+\sqrt{2}\) for zero-stability; see [11] and references therein. Notice that \((u^1-u^0)/k_1\) may enter implicitly, if, for instance, the starting value \(u^1\) is computed by employing one step of the trapezoidal method.

We establish key auxiliary results in Sect. 2 and provide the proof of Theorem 1.1 in Sect. 3. We motivate the choice of the multipliers \(\delta \) and \(\eta \) in Sect. 4.

2 Auxiliary results

Our main tool in the proof of the stability result in Theorem 1.1 will be the positive definiteness of families of certain banded matrices. This property will allow us to suitably estimate from below the sum over all terms \({\mathscr {A}}_n\) entering into (1.7).

For given real numbers \(\delta \) and \(\eta \leqslant 0,\) we are interested in properties of families of banded lower triangular \((n-1)\times (n-1)\) real matrices of the form

$$\begin{aligned} {\mathbb {L}}(r_2,\dotsc ,r_n):= \begin{pmatrix} \frac{1}{1+r_2} & & & & \\ -\delta \frac{\sqrt{r_3}}{1+r_3} & \frac{1}{1+r_3} & & & \\ -\eta \frac{\sqrt{r_3r_4}}{1+r_4} & -\delta \frac{\sqrt{r_4}}{1+r_4} & \frac{1}{1+r_4} & & \\ & \ddots & \ddots & \ddots & \\ & & -\eta \frac{\sqrt{r_{n-1}r_n}}{1+r_n} & -\delta \frac{\sqrt{r_n}}{1+r_n} & \frac{1}{1+r_n} \end{pmatrix} \end{aligned}$$
(2.1)

with positive entries \(r_2,\dotsc ,r_n\leqslant r,\) where r is a uniform positive upper bound, for all \(n\geqslant 4.\)

Lemma 2.1

(Property of matrices of the form (2.1)) Let \((\cdot ,\cdot )_2\) and \(\Vert \cdot \Vert _2\) denote the Euclidean inner product and norm, respectively, on \(\mathbb {R}^{n-1},\) and let c be a real constant. Then,

$$\begin{aligned} ({\mathbb {L}}(r_2,\dotsc ,r_n)x,x)_2\geqslant c\Vert x\Vert _2^2\quad \forall x\in \mathbb {R}^{n-1}, \end{aligned}$$
(2.2)

for all matrices of the form (2.1) and for all \(n\geqslant 4,\) if and only if

$$\begin{aligned} p(y)=\frac{1}{1+r}\big [1+\eta r-\delta \sqrt{r} y-2\eta r y^2\big ]\geqslant c\quad \forall y \in [-1,1]. \end{aligned}$$
(2.3)

As we shall see later on, the necessity of (2.3) is an easy consequence of well-known properties of the spectrum of symmetric, banded Toeplitz matrices.
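Before turning to the proof, the sufficiency direction of the lemma can be probed numerically. The sketch below (illustrative only, using numpy; the ratio sample is random) assembles \({\mathbb {L}}(r_2,\dotsc ,r_n)\) for ratios bounded by r and checks that the smallest eigenvalue of its symmetric part is at least \(c=\min _{[-1,1]}p,\) as (2.2) asserts.

```python
import numpy as np

# multipliers of Theorem 1.1 and a ratio bound r <= r*
delta, eta, r = 0.9672, -0.1793, 1.9398

rng = np.random.default_rng(0)
ratios = rng.uniform(0.1, r, size=40)          # r_2, ..., r_n, all <= r

m = len(ratios)
L = np.zeros((m, m))
for i in range(m):                             # assemble (2.1) row by row
    L[i, i] = 1.0 / (1.0 + ratios[i])
    if i >= 1:
        L[i, i - 1] = -delta * np.sqrt(ratios[i]) / (1.0 + ratios[i])
    if i >= 2:
        L[i, i - 2] = -eta * np.sqrt(ratios[i - 1] * ratios[i]) / (1.0 + ratios[i])

# c = min of the quadratic p over [-1, 1], cf. (2.3); a fine grid suffices
y = np.linspace(-1.0, 1.0, 200001)
c = np.min((1.0 + eta * r - delta * np.sqrt(r) * y - 2.0 * eta * r * y**2) / (1.0 + r))

lam_min = np.linalg.eigvalsh(0.5 * (L + L.T)).min()   # smallest eigenvalue of sym. part
assert c > 0 and lam_min >= c - 1e-9
```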

Proof

First, we shall prove that condition (2.3) implies the estimate (2.2). With

$$\begin{aligned} J:=\begin{pmatrix} \frac{1}{1+r_2} & & & \\ & \frac{1}{1+r_3} & & \\ & & \ddots & \\ & & & \frac{1}{1+r_n} \end{pmatrix}\quad \text {and}\quad G:=\begin{pmatrix} 0 & & & \\ \sqrt{r_3} & 0 & & \\ & \ddots & \ddots & \\ & & \sqrt{r_n} & 0 \end{pmatrix} \end{aligned}$$

the matrix \({\mathbb {L}}:={\mathbb {L}}(r_2,\dotsc ,r_n)\) in (2.1) can be rewritten as

$$\begin{aligned} {\mathbb {L}}=J-\delta JG-\eta JG^2. \end{aligned}$$

It suffices to consider the symmetric part \({\mathbb {L}}_s\) of the matrix \({\mathbb {L}},\)

$$\begin{aligned} {\mathbb {L}}_s=\frac{1}{2} (\mathbb {L}+\mathbb {L}^\top ) =J-\frac{\delta }{2} (JG+G^\top J)-\frac{\eta }{2} \big (JG^2+(G^\top )^2J\big ), \end{aligned}$$

since \(({\mathbb {L}}x,x)_2=({\mathbb {L}}_sx,x)_2.\) With \(K:=J^{1/2},\) we have

$$\begin{aligned} 2K^{-1}{\mathbb {L}}_sK^{-1} =2I-\delta \big (KGK^{-1}+K^{-1}G^\top K\big )-\eta \big (KG^2K^{-1}+K^{-1}(G^\top )^2K\big ). \end{aligned}$$

Letting

$$\begin{aligned} P:=KGK^{-1}= \begin{pmatrix} 0 & & & \\ \sqrt{\frac{1+r_2}{1+r_3}r_3} & 0 & & \\ & \ddots & \ddots & \\ & & \sqrt{\frac{1+r_{n-1}}{1+r_n}r_n} & 0 \end{pmatrix}, \end{aligned}$$

we can rewrite \(2K^{-1}{\mathbb {L}}_sK^{-1}\) in the form

$$\begin{aligned} 2K^{-1}{\mathbb {L}}_sK^{-1}=2I-\delta (P+P^\top )-\eta \big (P^2+(P^\top )^2\big ), \end{aligned}$$

i.e.,

$$\begin{aligned} 2K^{-1}{\mathbb {L}}_sK^{-1} =2I-\delta \sqrt{r}\frac{P+P^\top }{\sqrt{r}}-\eta r \frac{P^2+(P^\top )^2}{r}. \end{aligned}$$

Therefore, with

$$\begin{aligned} Z:=\frac{P}{\sqrt{r}} =\begin{pmatrix} 0 & & & \\ z_3 & 0 & & \\ & \ddots & \ddots & \\ & & z_n & 0 \end{pmatrix}, \end{aligned}$$

we have

$$\begin{aligned} 2K^{-1}{\mathbb {L}}_sK^{-1} =2I-\delta \sqrt{r} (Z+Z^\top )-\eta r \big (Z^2+(Z^\top )^2\big ). \end{aligned}$$

Using here the identity \(Z^2+(Z^\top )^2=(Z+Z^\top )^2-ZZ^\top -Z^\top Z,\) we see that

$$\begin{aligned} 2K^{-1}{\mathbb {L}}_sK^{-1}=2M-\eta r (2I-ZZ^\top -Z^\top Z) \end{aligned}$$
(2.4)

with the symmetric matrix M

$$\begin{aligned} M:=(1+\eta r)I-\delta \sqrt{r} Z_s-2\eta r Z_s^2, \end{aligned}$$
(2.5)

where \(Z_s:=(Z+Z^\top )/2\) is the symmetric part of the matrix Z.

Since \(\frac{r_i}{1+r_i}\leqslant \frac{r}{1+r}\) and \(1+r_{i-1}\leqslant 1+r\), we have \(z_i=\sqrt{\frac{r_i}{1+r_i}\frac{1+r_{i-1}}{r}}\leqslant 1,\)

$$\begin{aligned} 0<z_i\leqslant 1, \quad i=3,\dotsc ,n. \end{aligned}$$
(2.6)

To prove (2.2), we shall proceed in two steps: first we shall show that (2.6) implies that the diagonal matrix \(2I-ZZ^\top -Z^\top Z\) is positive semidefinite, and subsequently, using the Rayleigh quotient criterion, that the eigenvalues of the matrix M are bounded from below by \(c(1+r).\)

Now,

$$\begin{aligned} ZZ^\top =\begin{pmatrix} 0 & & & \\ & z_3^2 & & \\ & & \ddots & \\ & & & z_n^2 \end{pmatrix}\quad \text {and}\quad Z^\top Z=\begin{pmatrix} z_3^2 & & & \\ & \ddots & & \\ & & z_n^2 & \\ & & & 0 \end{pmatrix}, \end{aligned}$$

and, thus, the matrix \(2I-ZZ^\top -Z^\top Z\) is diagonal. In view of (2.6), its diagonal entries are nonnegative; consequently, this matrix is indeed positive semidefinite. Notice also that \(\eta \leqslant 0.\)

To complete the proof of (2.2), it remains to show that the eigenvalues of the symmetric matrix M are bounded from below by \(c(1+r).\) Now, the eigenvalues \(\mu _i\) and \(\lambda _i\) of the symmetric matrices M and \(Z_s\), respectively, are related by

$$\begin{aligned} \mu _i=1+\eta r-\delta \sqrt{r} \lambda _i-2\eta r \lambda _i^2=(1+r)p(\lambda _i); \end{aligned}$$
(2.7)

see (2.5) and (2.3).

Let us first show that \(\lambda _i\in [-1,1]\) via the Rayleigh quotient criterion. Indeed, for \(y=(y_2,y_3,\dotsc ,y_n)^\top \in \mathbb {R}^{n-1},\) we have

$$\begin{aligned}(Z_s y,y)_2=\sum _{i=3}^nz_iy_iy_{i-1},\end{aligned}$$

whence, in view of (2.6),

$$\begin{aligned}|(Z_s y,y)_2|\leqslant \frac{1}{2}\sum _{i=3}^n\big ((y_{i-1})^2+(y_i)^2\big )=\Vert y\Vert _2^2-\frac{1}{2} \big [(y_2)^2+(y_n)^2\big ].\end{aligned}$$

Therefore,

$$\begin{aligned}|\lambda _i|\leqslant \sup _{\begin{array}{c} y\in \mathbb {R}^{n-1}\\ y\ne 0 \end{array}}\frac{|(Z_s y,y)_2|}{\Vert y\Vert _2^2}\leqslant 1.\end{aligned}$$

Now, it follows immediately from (2.3) and (2.7) that the eigenvalues \(\mu _i\) of the symmetric matrix M are bounded from below by \(c(1+r).\) Thus, for \(x\in \mathbb {R}^{n-1},\)

$$\begin{aligned}\big (K^{-1}{\mathbb {L}}_sK^{-1}x,x)_2\geqslant (Mx,x)_2 \geqslant c(1+r)\Vert x\Vert _2^2,\end{aligned}$$

which, in combination with \(\Vert K^{-1}x\Vert _2^2\leqslant (1+r)\Vert x\Vert _2^2,\) yields the asserted estimate (2.2).

Next, we prove that condition (2.3) is necessary for (2.2).

It suffices to show that condition (2.3) is necessary for (2.2) for all matrices of the form (2.1) with \(r_2=\cdots =r_n=r.\) The symmetric part \({\mathbb {L}}_s(r,\dotsc ,r):=\big ({\mathbb {L}}(r,\dotsc ,r)+{\mathbb {L}}(r,\dotsc ,r)^\top \big )/2\) of the \((n-1)\times (n-1)\) matrix \({\mathbb {L}}(r,\dotsc ,r)\) is a symmetric pentadiagonal Toeplitz matrix with generating function g (see [1, 6]),

$$\begin{aligned} g(x):=\frac{1}{1+r} \big [1-\delta \sqrt{r}\cos x-\eta r\cos (2x)\big ],\quad x\in \mathbb {R}. \end{aligned}$$

Now, with p the polynomial of (2.3) and the change of variables \(y=\cos x,\) we have

$$\begin{aligned} g_{\min }:=\min _{x\in \mathbb {R}}g(x)=\min _{-1\leqslant y\leqslant 1}p(y). \end{aligned}$$

Assume that (2.3) is not satisfied; then, we would have \(g_{\min }<c.\) From Theorem 2.1, a simplified version of more general results for symmetric banded Toeplitz matrices, we would then infer that the matrices \({\mathbb {L}}_s(r,\dotsc ,r)\) possess, for sufficiently large dimension, eigenvalues less than c,  a contradiction to (2.2).\(\square \)

Theorem 2.1

(Grenander–Szegő theorem, and asymptotic behavior of extreme eigenvalues of symmetric, banded Toeplitz matrices; cf. [6, Theorems 6.1 and 6.6]) Let g be a nonconstant, real and even, \(2\pi \)-periodic, trigonometric polynomial. Then, the eigenvalues of all symmetric, banded, \(n\times n\) Toeplitz matrices \(T_n,\) with generating function g,  belong to the open interval \((g_{\min },g_{\max })\) with \(g_{\min }\) and \(g_{\max }\) the minimum and maximum of g,  respectively.

Let \(\lambda _1(T_n)\geqslant \lambda _2(T_n)\geqslant \cdots \geqslant \lambda _n(T_n)\) be the eigenvalues of \(T_n\) sorted in nonincreasing order. Then, for each fixed integer \(j\geqslant 1,\) we have

$$\begin{aligned}\lim _{n\rightarrow \infty }\lambda _j(T_n)=g_{\max }\quad \text {and}\quad \lim _{n\rightarrow \infty }\lambda _{n-j+1}(T_n)=g_{\min }.\end{aligned}$$
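Both parts of the theorem are easy to illustrate numerically; the following Python sketch (with arbitrarily chosen, nonconstant generating-function coefficients, labeled as an assumption) checks that the spectrum of a symmetric pentadiagonal Toeplitz matrix lies in \((g_{\min },g_{\max })\) and that the smallest eigenvalue approaches \(g_{\min }\) as the dimension grows.

```python
import numpy as np

# illustrative coefficients (an assumption, not from the paper); the
# generating function is g(x) = a0 + 2*a1*cos(x) + 2*a2*cos(2x), nonconstant
a0, a1, a2 = 1.0, -0.45, 0.08
x = np.linspace(0.0, np.pi, 200001)
g = a0 + 2 * a1 * np.cos(x) + 2 * a2 * np.cos(2 * x)
gmin, gmax = g.min(), g.max()

gaps = []
for n in (20, 80, 320):
    T = (a0 * np.eye(n)
         + a1 * (np.eye(n, k=1) + np.eye(n, k=-1))
         + a2 * (np.eye(n, k=2) + np.eye(n, k=-2)))
    ev = np.linalg.eigvalsh(T)                 # ascending eigenvalues
    assert gmin < ev[0] and ev[-1] < gmax      # spectrum inside (gmin, gmax)
    gaps.append(ev[0] - gmin)

# the smallest eigenvalue decreases toward gmin as n grows
# (T_n is a principal submatrix of T_{n+1}: Cauchy interlacing)
assert gaps[0] >= gaps[1] >= gaps[2] > 0
```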

Remark 2.1

The Grenander–Szegő theorem applies to Toeplitz matrices; see the first part of Theorem 2.1. Here, Lemma 2.1 can be viewed as a variant of the Grenander–Szegő theorem, applicable to a class of non-Toeplitz matrices.

3 Proof of Theorem 1.1

In this section, we prove Theorem 1.1.

Let us first recall a discrete version of Gronwall’s lemma that we will need in the sequel.

Lemma 3.1

(Discrete Gronwall inequality; Emmrich, [5]) Let \(\alpha _n,\beta _n,\xi _n,\varphi _n\) be nonnegative numbers, with a monotonically increasing sequence \((\xi _n)_{n\geqslant 2},\) satisfying the inequalities

$$\begin{aligned} \alpha _n+\beta _n\leqslant \sum _{i=2}^{n-1}\varphi _i\alpha _i+\xi _n,\quad n=2,3,\dotsc . \end{aligned}$$

Then, the following estimate is valid

$$\begin{aligned} \alpha _n+\beta _n\leqslant \xi _n\exp \Bigg (\sum _{i=2}^{n-1}\varphi _i\Bigg ), \quad n=2,3,\dotsc . \end{aligned}$$
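The lemma can be sanity-checked on synthetic data; the Python sketch below (illustrative, with arbitrary random coefficients) builds sequences that satisfy the hypothesis with equality and verifies the asserted exponential bound.

```python
import math, random

random.seed(1)
N = 50
phi = [random.uniform(0.0, 0.3) for _ in range(N)]   # phi_2, phi_3, ...
xi = [1.0 + 0.05 * n for n in range(N)]              # nondecreasing xi_n

alpha, beta = [], []
for n in range(N):
    s = sum(phi[i] * alpha[i] for i in range(n))     # sum_{i<n} phi_i alpha_i
    alpha.append(0.6 * (s + xi[n]))                  # any split with
    beta.append(0.4 * (s + xi[n]))                   # alpha_n + beta_n <= s + xi_n

for n in range(N):                                   # the Gronwall bound
    bound = xi[n] * math.exp(sum(phi[:n]))
    assert alpha[n] + beta[n] <= bound + 1e-9
```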

3.1 Estimation of the terms accounting for the difference quotient

Let us first focus on the first term on the left-hand side of (1.7).

Lemma 3.2

(Estimation of \({\mathscr {D}}_n\)) Assume that \(0<\delta<1,-1<\eta <0\) with \(2-\delta +2\eta \geqslant 0\), \(1+\delta +3\eta \geqslant 0\), and \(r_j\leqslant r, j=2,\dotsc ,N,\) with r such that

$$\begin{aligned} r\leqslant \frac{\sqrt{1+\delta +3\eta }}{2\sqrt{1+\eta }-\sqrt{1+\delta +3\eta }} =:r^\star (\delta ,\eta ). \end{aligned}$$
(3.1)

Then,

$$\begin{aligned} \sum _{j=2}^{n}{\mathscr {D}}_j\geqslant (1-\delta -\eta ) \Bigg ( (1-\psi _n)|u^n|^2-\psi _{n-1}|u^{n-1}|^2 -|u^1|^2 -\sum _{j=2}^{n-2}[\psi _{j}-\psi _{j+2}]_{+}|u^j|^2\Bigg ) -\big [-\eta +\left( 2-\delta +2\eta \right) \psi _2\big ]|\delta _1u^1|^2, \end{aligned}$$
(3.2)

\(n=3,\dotsc ,N.\) For \(n=2,\) (3.2) is also valid without the second and fourth terms on the right-hand side, i.e.,

$$\begin{aligned}{\mathscr {D}}_2\geqslant (1-\delta -\eta ) \Big ( (1-\psi _2)|u^2|^2-|u^1|^2\Big ) -\big [-\eta +\left( 2-\delta +2\eta \right) \psi _2\big ]|\delta _1u^1|^2.\end{aligned}$$

Proof

We shall estimate each term \({\mathscr {D}}_j\) from below separately and subsequently sum over j to obtain (3.2).

Using (1.6) and expanding \({\mathscr {D}}_n\) in (1.8), we have

$$\begin{aligned} {\mathscr {D}}_n=I_1^n+I_2^n+I_3^n+I_4^n+I_5^n+I_6^n \end{aligned}$$
(3.3)

with

$$\begin{aligned} \left\{ \begin{aligned} I_1^n&=2(\delta _1u^n,u^n),&I_2^n&=-2\psi _n(\delta _2u^n,u^n),&I_3^n&=-2\delta (\delta _1u^n,u^{n-1}),\\ I_4^n&=2\delta \psi _n(\delta _2u^n,u^{n-1}),&I_5^n&=-2\eta (\delta _1u^n,u^{n-2}),&I_6^n&=2\eta \psi _n(\delta _2u^n,u^{n-2}). \end{aligned} \right. \end{aligned}$$
(3.4)

Using the identities

$$\begin{aligned} 2(\delta _ku^n,u^n)=\delta _k|u^n|^2+|\delta _ku^n|^2,\quad 2(\delta _ku^n,u^{n-k})=\delta _k|u^n|^2-|\delta _ku^n|^2, \end{aligned}$$

we see that

$$\begin{aligned} I_1^n&=\delta _1|u^n|^2+|\delta _1u^n|^2,&I_2^n&=-\psi _n\big (\delta _2|u^n|^2+|\delta _2u^n|^2\big ),\\ I_3^n&=-\delta \big (\delta _1|u^n|^2- |\delta _1u^n|^2\big ),&I_6^n&=\eta \psi _n\big (\delta _2|u^n|^2-|\delta _2u^n|^2\big ). \end{aligned}$$

Furthermore, since \(\delta _2u^n=\delta _1u^n+\delta _1u^{n-1}\), we have

$$\begin{aligned} \begin{aligned} I_4^n&=2\delta \psi _n(\delta _1u^n+\delta _1u^{n-1},u^{n-1})\\&=\delta \psi _n\big (\delta _2|u^n|^2-|\delta _1u^n|^2 +|\delta _1u^{n-1}|^2\big ),\\ I_5^n&=-2\eta (\delta _1u^n,u^{n-2}) =-2\eta (\delta _2u^n,u^{n-2})+2\eta (\delta _1u^{n-1},u^{n-2})\\&=-\eta \big (\delta _1|u^n|^2-|\delta _2u^n|^2+|\delta _1u^{n-1}|^2\big ). \end{aligned} \end{aligned}$$

Collecting terms, we therefore obtain from (3.3) and (3.4)

$$\begin{aligned} {\mathscr {D}}_n= J_1^n+(1+\delta -\delta \psi _n)|\delta _1u^n|^2+(\delta \psi _n-\eta )|\delta _1u^{n-1}|^2 +(\eta -\eta \psi _n-\psi _n)|\delta _2u^n|^2\geqslant J_1^n+J_2^n \end{aligned}$$
(3.5)

with

$$\begin{aligned} J_1^n=(1-\delta -\eta )\big (\delta _1|u^n|^2-\psi _n\delta _2|u^n|^2\big ),\quad J_2^n=A_n|\delta _1u^n|^2-B_n|\delta _1u^{n-1}|^2, \end{aligned}$$

where

$$\begin{aligned} A_n:=1+\delta +2\eta -(2+\delta +2\eta )\psi _n,\quad B_n:=-\eta +(2-\delta +2\eta )\psi _n; \end{aligned}$$

in the derivation of the inequality in (3.5), we used the obvious estimate \(|\delta _2u^n|^2\leqslant 2|\delta _1u^n|^2+2|\delta _1u^{n-1}|^2\).

Now,

$$\begin{aligned} \begin{aligned} \sum _{j=2}^{n}J_1^j&=(1-\delta -\eta ) \Bigg (\left( 1-\psi _n\right) |u^n|^2-\psi _{n-1}|u^{n-1}|^2-|u^1|^2 -\sum _{j=2}^{n-2}\left( \psi _{j}-\psi _{j+2}\right) |u^j|^2\Bigg )\\&\quad +\,(1-\delta -\eta )\big (\psi _2|u^0|^2+\psi _3|u^1|^2\big ). \end{aligned} \end{aligned}$$

Hence, noting that \(\delta +\eta <1,\) we have

$$\begin{aligned} \sum _{j=2}^{n}J_1^j \geqslant (1-\delta -\eta ) \Bigg ( (1-\psi _n)|u^n|^2-\psi _{n-1}|u^{n-1}|^2 -|u^1|^2 -\sum _{j=2}^{n-2}[\psi _{j}-\psi _{j+2}]_{+}|u^j|^2\Bigg ). \end{aligned}$$
(3.6)

Moreover,

$$\begin{aligned} \sum _{j=2}^{n}J_2^j= \sum _{j=2}^{n}\big (A_j|\delta _1u^j|^2 -B_j|\delta _1u^{j-1}|^2\big ) = \sum _{j=2}^{n-1}(A_j-B_{j+1})|\delta _1u^j|^2 +A_n|\delta _1u^n|^2-B_2|\delta _1u^1|^2. \end{aligned}$$
(3.7)

We now show that \(A_j-B_{j+1}\geqslant 0\) whenever \(r_j\leqslant r\) for all j, with r satisfying (3.1). Since \(2+\delta +2\eta >0\) and \(2-\delta +2\eta \geqslant 0,\) \(A_j\) and \(B_j\) are decreasing and increasing functions of \(r_j\), respectively; thus,

$$\begin{aligned} \begin{aligned} A_j-B_{j+1}&\geqslant 1+\delta +2\eta -\left( 2+\delta +2\eta \right) \Big (\frac{r}{1+r}\Big )^2 +\eta -\left( 2-\delta +2\eta \right) \Big (\frac{r}{1+r}\Big )^2\\&=1+\delta +3\eta -4\left( 1+\eta \right) \Big (\frac{r}{1+r}\Big )^2. \end{aligned} \end{aligned}$$

In view of (3.1), there holds \(A_j-B_{j+1}\geqslant 0,\) and (3.7) yields

$$\begin{aligned} \sum _{j=2}^{n}J_2^j\geqslant -B_2|\delta _1u^1|^2. \end{aligned}$$
(3.8)

The asserted estimate (3.2) is an immediate consequence of (3.5), (3.6), and (3.8).\(\square \)
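The sign condition \(A_j-B_{j+1}\geqslant 0\) at the heart of this proof is easy to verify numerically; the Python sketch below (a sanity check only) confirms that \(A-B\) vanishes exactly at \(r=r^\star (\delta ,\eta )\) and is nonnegative for smaller ratios.

```python
import math

delta, eta = 0.9672, -0.1793
s = 1 + delta + 3 * eta
r_star = math.sqrt(s) / (2 * math.sqrt(1 + eta) - math.sqrt(s))  # formula (3.1)

psi = lambda rho: (rho / (1 + rho)) ** 2
A = lambda rho: 1 + delta + 2 * eta - (2 + delta + 2 * eta) * psi(rho)
B = lambda rho: -eta + (2 - delta + 2 * eta) * psi(rho)

# A is decreasing and B increasing in rho, so the worst case is rho = r_star,
# where A - B = 1 + delta + 3*eta - 4*(1+eta)*psi(r_star) = 0 exactly
assert abs(A(r_star) - B(r_star)) < 1e-12
for rho in (0.5, 1.0, 1.5, r_star):
    for rho2 in (0.5, 1.0, 1.5, r_star):
        assert A(rho) - B(rho2) >= -1e-12
```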

3.2 Estimation of the terms \(\pmb {\mathscr {A}}_n\) accounting for the elliptic operator

Here, we shall estimate the sum of the terms \({\mathscr {A}}_n\) from below. Lemma 2.1 plays a key role in the proof.

Lemma 3.3

(Estimation of \({\mathscr {A}}_n\)) Let \(\delta =0.9672\) and \(\eta =-0.1793\), and assume that \(r_j\leqslant r^\star (0.9672,-0.1793) \approx 1.9398, j=2,\dotsc ,N;\) see (3.1). Then,

$$\begin{aligned} \frac{1}{2} \sum _{j=2}^{n}{\mathscr {A}}_j \geqslant c_1\sum _{j=2}^{n}k_j\Vert u^{j}\Vert ^2-\delta \omega _2k_2\langle u^{2}, u^{1}\rangle -\eta \omega _2k_2\langle u^{2}, u^{0}\rangle -\eta \omega _3k_3\langle u^{3}, u^{1}\rangle , \end{aligned}$$
(3.9)

\(n=3,\dotsc ,N,\) with \(c_1=10^{-6};\) for \(n=2,\) (3.9) is also valid without the last term on the right-hand side.

Proof

We rewrite the sum on the left-hand side of (3.9) in the form

$$\begin{aligned} \frac{1}{2} \sum _{j=2}^{n}{\mathscr {A}}_j =\sum _{i,j=1}^{n-1}L_{ij}\langle u^{i+1}, u^{j+1}\rangle -\delta \omega _2k_2\langle u^{2}, u^{1}\rangle -\eta \omega _2k_2\langle u^{2}, u^{0}\rangle -\eta \omega _3k_3 \langle u^{3}, u^{1}\rangle , \end{aligned}$$
(3.10)

with \(L_{ij}\) the entries of the matrix \(L\in \mathbb {R}^{(n-1)\times (n-1)},\)

$$\begin{aligned} L:=\begin{pmatrix} \omega _2k_2 & & & & \\ -\delta \omega _3k_3 & \omega _3k_3 & & & \\ -\eta \omega _4k_4 & -\delta \omega _4k_4 & \omega _4k_4 & & \\ & \ddots & \ddots & \ddots & \\ & & -\eta \omega _nk_n & -\delta \omega _nk_n & \omega _nk_n \end{pmatrix}. \end{aligned}$$
(3.11)

With \({\mathbb {L}}(r_2,\dotsc ,r_n)\) the matrix in (2.1) and \(\varLambda \) the diagonal matrix

$$\begin{aligned} \varLambda :={{\,\textrm{diag}\,}}\Big (\frac{1}{k_2},\frac{1}{k_3},\dotsc ,\frac{1}{k_n}\Big ), \end{aligned}$$

it is easily seen that \({\mathbb {L}}(r_2,\dotsc ,r_n)=\varLambda ^{1/2}L\varLambda ^{1/2}.\)

It suffices to show that

$$\begin{aligned} ({\mathbb {L}}(r_2,\dotsc ,r_n)x,x)_2\geqslant c_1\Vert x\Vert _2^2\quad \forall x\in \mathbb {R}^{n-1}. \end{aligned}$$
(3.12)

Indeed, then, the first term on the right-hand side of (3.10) is larger than or equal to \(c_1\big (k_2\Vert u^2\Vert ^2+\cdots +k_n\Vert u^n\Vert ^2\big ),\) and thus (3.12) leads to the asserted estimate (3.9).

To see that (3.12) is valid for \(n\geqslant 4,\) we note that the quadratic polynomial p in (2.3), with r replaced by \(r^\star ,\) attains its minimum over \([-1,1]\) at \(y^\star =\frac{-\delta \sqrt{r^\star }}{4\eta r^\star }.\) For \(\delta =0.9672\) and \(\eta =-0.1793,\) we have \(r\leqslant r^\star (0.9672,-0.1793) \approx 1.9398\) by inequality (3.1), and indeed \(0<y^\star <1.\) Furthermore,

$$\begin{aligned} p(y^\star )\approx 7.3592\cdot 10^{-6}>c_1. \end{aligned}$$

Notice that (3.12) is valid also for \(n=2\) and \(n=3\). Indeed, for \(n=2,\) \(({\mathbb {L}}(r_2)x,x)_2 =\frac{1}{1+r_2}\Vert x\Vert _2^2\geqslant \frac{1}{1+r^\star }\Vert x\Vert _2^2\geqslant c_1\Vert x\Vert _2^2,\) and for \(n=3,\) \(({\mathbb {L}}(r_2,r_3)x,x)_2=\frac{1}{1+r_2}x_2^2-\delta \frac{\sqrt{r_3}}{1+r_3}x_2x_3+\frac{1}{1+r_3}x_3^2 \geqslant \big (\frac{1}{1+r^\star }-\frac{\delta }{2}\frac{\sqrt{r_3}}{1+r_3}\big )\Vert x\Vert _2^2 \geqslant \big (\frac{1}{1+r^\star }-\frac{\delta }{4}\big )\Vert x\Vert _2^2\geqslant c_1\Vert x\Vert _2^2.\)

Now, in view of (3.12), (3.10) and (2.2) lead to the asserted estimate (3.9).

For the motivation of the specific choice of the multipliers \(\delta \) and \(\eta \), see Sect. 4.\(\square \)

3.3 Proof of Theorem 1.1

Here, we use Lemmata 3.2 and 3.3, the discrete Gronwall inequality in Lemma 3.1, and elementary inequalities, to prove Theorem 1.1.

Replacing n by j in (1.7), summing from \(j=2\) to \(j=n,\) and using (3.2) and (3.9), we obtain

$$\begin{aligned} \begin{aligned}&\left( 1-\delta -\eta \right) \left( 1-\psi _n\right) |u^n|^2 +2c_1\sum _{j=2}^{n}k_j\Vert u^{j}\Vert ^2\\&\quad \leqslant \left( 1-\delta -\eta \right) \psi _{n-1}|u^{n-1}|^2+C \left( |u^0|^2+|u^1|^2\right) +\left( 1-\delta -\eta \right) \sum _{j=2}^{n-2}[\psi _{j}-\psi _{j+2}]_{+}|u^j|^2\\&\qquad +\,\sum _{j=2}^{n}{\mathscr {F}}_j+2\delta \omega _2k_2\langle u^{2}, u^{1}\rangle +2\eta \omega _2k_2\langle u^{2}, u^{0}\rangle +2\eta \omega _3k_3\langle u^{3}, u^{1}\rangle . \end{aligned} \end{aligned}$$

Now, the terms involving the forcing term or the starting approximations can be estimated by the Cauchy–Schwarz inequality and the elementary inequality

$$\begin{aligned} 2ab\leqslant \varepsilon a^2+\varepsilon ^{-1}b^2,\quad a,b\in \mathbb {R}, \end{aligned}$$

with \(\varepsilon >0\) small enough. We obtain

$$\begin{aligned} {\mathscr {F}}_j\leqslant \omega _jk_j\varepsilon _1^{-1}(1+\delta -\eta )\Vert f^j\Vert _\star ^2+ \omega _jk_j\varepsilon _1\left( \Vert u^j\Vert ^2+\delta \Vert u^{j-1}\Vert ^2-\eta \Vert u^{j-2}\Vert ^2\right) \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} 2|\langle u^i,u^j \rangle |\leqslant \varepsilon _2 \Vert u^i\Vert ^2+\varepsilon _2^{-1}\Vert u^j\Vert ^2,~~i=2,3,~~j=0,1, \end{aligned} \end{aligned}$$

with sufficiently small \(\varepsilon _1\) and \(\varepsilon _2,\) and we are led to the inequality

$$\begin{aligned} \begin{aligned} |u^n|^2 +c_1\sum _{j=2}^{n}k_j\Vert u^{j}\Vert ^2&\leqslant \frac{\psi _{n-1}}{1-\psi _{n}}\left| u^{n-1}\right| ^2 +C \big (|u^0|^2+|u^1|^2+k_2\Vert u^0\Vert ^2+k_2\Vert u^1\Vert ^2\big )\\&\quad +\,C\sum _{j=2}^{n-2}[\psi _{j}-\psi _{j+2}]_{+}|u^j|^2 +C\sum _{j=2}^{n}k_j\Vert f^j\Vert _\star ^2,~~n\geqslant 2. \end{aligned} \end{aligned}$$

Since \(\frac{\psi _{n-1}}{1-\psi _{n}}\leqslant \bar{c}<1\), and \([\psi _{j}-\psi _{j+2}]_{+}\leqslant C[r_{j}-r_{j+2}]_{+}\) (see [12, p. 179]), we have

$$\begin{aligned} |u^n|^2 +c_1\sum _{j=2}^{n}k_j\Vert u^{j}\Vert ^2\leqslant \bar{c}|u^{n-1}|^2 +C\left( |u^0|^2+|u^1|^2+k_2\Vert u^0\Vert ^2+k_2\Vert u^1\Vert ^2\right) +C\sum _{j=2}^{n-2}[r_{j}-r_{j+2}]_{+}|u^j|^2 +C\sum _{j=2}^{n}k_j\Vert f^j\Vert _\star ^2,\quad n\geqslant 2. \end{aligned}$$
(3.13)

Hence, we have

$$\begin{aligned} \begin{aligned} |u^n|^2\leqslant \bar{c}|u^{n-1}|^2+K_n,\quad n\geqslant 2, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} K_n=C\Bigg (|u^0|^2+|u^1|^2+k_2\Vert u^0\Vert ^2+k_2\Vert u^1\Vert ^2 +\sum _{j=2}^{n-2}[r_{j}-r_{j+2}]_{+}|u^j|^2+\sum _{j=2}^{n}k_j\Vert f^j\Vert _\star ^2\Bigg ). \end{aligned}$$

Let \(2\leqslant n_\star \leqslant n\) be such that \(|u^{n_\star }| =\max \nolimits _{1\leqslant \ell \leqslant n}|u^\ell |\). Setting \(n:=n_\star \) in the above inequality and using the fact that \(K_{n_\star }\leqslant K_n\), we get

$$\begin{aligned} |u^{n_\star }|^2\leqslant \bar{c}|u^{{n_\star }-1}|^2+K_{n_\star }\leqslant \bar{c}|u^{{n_\star }}|^2+K_n, \end{aligned}$$

which leads to

$$\begin{aligned} |u^{{n}-1}|^2\leqslant |u^{n_\star }|^2\leqslant \frac{1}{1-\bar{c}}K_n. \end{aligned}$$

Thus, (3.13) yields

$$\begin{aligned} |u^n|^2 +c_1\sum _{j=2}^{n}k_j\Vert u^{j}\Vert ^2 \leqslant \bar{c} |u^{n-1}|^2+K_n \leqslant \frac{1}{1-\bar{c}}K_n. \end{aligned}$$

Applying here the discrete Gronwall Lemma 3.1, we obtain the asserted stability estimate (1.4).\(\square \)

Remark 3.1

Proceeding as in the proof of Theorem 1.1, we can see that Emmrich's bound \(r^\star \approx 1.9104,\) corresponding to the multipliers \(\delta =0.72349\) and \(\eta =0,\) is optimal in the case of a single multiplier, as far as the positive definiteness of the corresponding matrices is concerned.

4 On the choice of the multipliers \(\delta \) and \(\eta \)

Here, we comment on the choice \(\delta =0.9672\) and \(\eta =-0.1793\) of the multipliers; we also give more precise values of the multipliers and of the bound \(r^\star \); see (4.17).

We recall that in our stability analysis we used two conditions on the bound r of the ratios, namely,

$$\begin{aligned} 0<r\leqslant r^\star (\delta ,\eta )=\frac{\sqrt{1+\delta +3\eta }}{2\sqrt{1+\eta }-\sqrt{1+\delta +3\eta }} \end{aligned}$$
(4.1)

and the positivity condition

$$\begin{aligned} P(y)=(1+r)p(y)=1+\eta r-\delta \sqrt{r}y-2\eta ry^2> 0\quad \forall y\in [-1,1] \end{aligned}$$
(P)

to estimate the terms accounting for the difference quotient and for the elliptic operator, respectively; see (3.1) and (2.3). Our goal here is to choose the multipliers \(\delta \) and \(\eta \) in such a way that both conditions, (4.1) and (P), are satisfied for values of r as large as possible.

Let us focus on the condition (P) and introduce the domain

$$\begin{aligned} D:=\big \{(\delta ,\eta ): 0<\delta<2,-1<\eta < 0,1+\delta +3\eta \geqslant 0\big \}=D_1\cup D_2 \end{aligned}$$
(4.2)

with

$$\begin{aligned} D_1:=\big \{(\delta ,\eta )\in D: \dfrac{3}{16}\delta ^2\leqslant -\eta \big \},\quad D_2:=\big \{(\delta ,\eta )\in D: \dfrac{3}{16}\delta ^2> -\eta \}; \end{aligned}$$

see Fig. 1. Notice that, instead of trying to determine optimal multipliers in the set of admissible multipliers, i.e., multipliers satisfying all conditions needed in our stability analysis, we find it more convenient to determine optimal multipliers in the larger domain D, in which only some of these conditions are automatically satisfied, and to check a posteriori that these multipliers are indeed admissible.

Fig. 1 The domains D (colored region), \(D_1,\) and \(D_2,\) as well as the segments \(L_t\); see (4.2) and (4.7)

Claim. For \((\delta , \eta )\in D,\) the positivity condition (P) is satisfied if and only if

$$\begin{aligned} r< h(\delta , \eta ):={\left\{ \begin{array}{ll} -\dfrac{1}{\eta }-\dfrac{\delta ^2}{8\eta ^2}, &{} (\delta , \eta )\in D_1, \\ \left( \dfrac{2}{\delta +\sqrt{\delta ^2+4\eta }}\right) ^2, &{} (\delta , \eta )\in D_2. \end{array}\right. } \end{aligned}$$
(4.3)

To see this, we consider separately the cases \((\delta , \eta )\in D_1\) and \((\delta , \eta )\in D_2\).

We write P in the form

$$\begin{aligned} P(y)=-2\eta r\Big (y+\frac{\delta }{4\eta \sqrt{r}}\Big )^2 +1+\eta r+\frac{\delta ^2}{8\eta },\quad y\in [-1,1]. \end{aligned}$$
(4.4)

Since the first term on the right-hand side is nonnegative, P is positive in \([-1,1]\) provided that \(1+\eta r+\frac{\delta ^2}{8\eta }\) is positive, i.e.,

$$\begin{aligned} r< -\dfrac{1}{\eta }-\dfrac{\delta ^2}{8\eta ^2}. \end{aligned}$$
(4.5)

Notice that (4.5) is also necessary if \(-\frac{\delta }{4\eta \sqrt{r}}\leqslant 1.\)

For \((\delta ,\eta )\in D_1,\) in case \(-\frac{\delta }{4\eta \sqrt{r}}> 1,\) i.e., for \(r<\frac{\delta ^2}{16\eta ^2},\) a seemingly milder condition for the positivity of P in \([-1,1]\) suffices, namely, \(P(1)> 0.\) However, we have

$$\begin{aligned}\frac{\delta ^2}{16\eta ^2}\leqslant -\dfrac{1}{\eta }-\dfrac{\delta ^2}{8\eta ^2} \iff \frac{3}{16}\delta ^2 \leqslant -\eta ,\end{aligned}$$

which is the motivation for the definition of \(D_1\).

Summarizing, for \((\delta ,\eta )\in D_1, P\) is positive in \([-1,1]\) if and only if (4.5) holds; this proves (4.3) for \((\delta ,\eta )\in D_1.\)

Next, we consider the case \((\delta , \eta )\in D_2\). For \(0<-\dfrac{\delta }{4\eta \sqrt{r}}\leqslant 1\), i.e., for \(r\geqslant \frac{\delta ^2}{16\eta ^2},\) we have

$$\begin{aligned}1+\eta r+\frac{\delta ^2}{8\eta }\leqslant 1+\dfrac{3\delta ^2}{16\eta }<0 \quad \text {since}\quad \dfrac{3}{16}\delta ^2> -\eta \quad \text {for}\quad (\delta ,\eta )\in D_2,\end{aligned}$$

and we easily infer from (4.4) that (P) is not satisfied.

For \(-\dfrac{\delta }{4\eta \sqrt{r}}>1\), i.e., for \(0<\sqrt{r}<-\dfrac{\delta }{4\eta }\), P is positive in \([-1,1]\) if and only if

$$\begin{aligned} P(1)=-\eta \left( \sqrt{r}+\frac{\delta }{2\eta }\right) ^2+1+\frac{\delta ^2}{4\eta }> 0. \end{aligned}$$

The discriminant \(\delta ^2+4\eta \) is positive for \((\delta ,\eta )\in D_2\), whence P(1), viewed as a quadratic polynomial in \(\sqrt{r}\), has the two real roots \(\sqrt{r}=\dfrac{\delta \pm \sqrt{\delta ^2+4\eta }}{-2\eta }\). In this case, we have

$$\begin{aligned}0<\sqrt{r}< \dfrac{\delta -\sqrt{\delta ^2+4\eta }}{-2\eta }<-\dfrac{\delta }{4\eta } \quad \text {since}\quad \dfrac{3}{16}\delta ^2> -\eta \quad \text {for}\quad (\delta ,\eta )\in D_2.\end{aligned}$$

Summarizing, for \((\delta , \eta )\in D_2, P\) is positive in \([-1,1]\) if and only if

$$\begin{aligned} r< \left( \dfrac{2}{\delta +\sqrt{\delta ^2+4\eta }}\right) ^2, \quad (\delta , \eta )\in D_2; \end{aligned}$$

this proves (4.3) for \((\delta ,\eta )\in D_2.\)
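The claim can also be checked numerically: for sample points in \(D_1\) and \(D_2\), condition (P) should hold on \([-1,1]\) for r slightly below the threshold \(h(\delta ,\eta )\) of (4.3) and fail slightly above it. The following sketch (the names `h` and `P_positive` are ours) does exactly this on a fine grid:

```python
import math

def h(delta, eta):
    """Threshold (4.3) for condition (P)."""
    if 3 * delta**2 / 16 <= -eta:                             # (delta, eta) in D1
        return -1 / eta - delta**2 / (8 * eta**2)
    return (2 / (delta + math.sqrt(delta**2 + 4 * eta)))**2   # (delta, eta) in D2

def P_positive(delta, eta, r, samples=20001):
    # check P(y) > 0 on a grid of [-1, 1] (endpoints included)
    return all(
        1 + eta * r - delta * math.sqrt(r) * y - 2 * eta * r * y * y > 0
        for y in (-1 + 2 * k / (samples - 1) for k in range(samples))
    )

for delta, eta in [(0.5, -0.2), (1.5, -0.2)]:  # sample points in D1 and D2
    rc = h(delta, eta)
    print(P_positive(delta, eta, 0.99 * rc), P_positive(delta, eta, 1.01 * rc))
    # expect: True False (for both sample points)
```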

Obviously, the mildest condition on r such that (4.1) and (4.3) are satisfied is

$$\begin{aligned} r< \max _{(\delta ,\eta )\in D} \min \{r^\star (\delta , \eta ), h(\delta , \eta )\}. \end{aligned}$$
(4.6)

It will be convenient to rewrite the expression on the right-hand side of (4.6). Since \(\frac{1+\eta }{2-\delta }\in [1/3, \infty )\) for \((\delta ,\eta )\in D,\) we let

$$\begin{aligned} t:=\frac{1+\eta }{2-\delta }\quad \text {with}\quad t\in \left[ 1/3, \infty \right) , \end{aligned}$$

and, for fixed t,  consider the secant segments \(L_t\subset D,\)

$$\begin{aligned} L_t:\quad \eta =t(2-\delta )-1\quad \text {for}\quad \eta \in (-1,0). \end{aligned}$$
(4.7)

Notice that the secant segment \(L_{1/3}\) is the segment \(1+\delta +3\eta =0\), and, as t increases from 1/3 to \(\infty ,\) the secant segment \(L_t\) rotates clockwise about the point \((2,-1)\) and approaches the right boundary of the domain D; hence the segments \(L_t\) sweep the whole domain D; see the colored part in Fig. 1. Consequently, (4.6) can be equivalently written in the form

$$\begin{aligned} r< \max _{t\in [1/3,\infty )} \max _{(\delta ,t(2-\delta )-1)\in L_t} \min \{r^\star (\delta , t(2-\delta )-1), h(\delta , t(2-\delta )-1)\}. \end{aligned}$$
(4.8)

From (4.7) and (4.1), we get

$$\begin{aligned} H(t):=r^\star (\delta ,t(2-\delta )-1) =\dfrac{\sqrt{(3t-1)(2-\delta )}}{2\sqrt{t(2-\delta )}-\sqrt{(3t-1)(2-\delta )}} =\dfrac{\sqrt{3t-1}}{2\sqrt{t}-\sqrt{3t-1}}. \end{aligned}$$
(4.9)

Analogously, in view of (4.3) and (4.7), we let

$$\begin{aligned} G(t):=\max _{\{\delta : (\delta ,t(2-\delta )-1)\in L_t\}} h(\delta , t(2-\delta )-1)\quad \text {for}\quad t\in \left[ 1/3, \infty \right) \end{aligned}$$
(4.10)

with

$$\begin{aligned} h(\delta , t(2-\delta )-1) ={\left\{ \begin{array}{ll} -\dfrac{1}{t(2-\delta )-1}-\dfrac{\delta ^2}{8\left( t(2-\delta )-1\right) ^2}, &{} (\delta ,t(2-\delta )-1)\in D_1, \\ \left( \dfrac{2}{\delta +\sqrt{\delta ^2+4\left( t(2-\delta )-1\right) }}\right) ^2, &{} (\delta ,t(2-\delta )-1)\in D_2. \end{array}\right. } \end{aligned}$$
(4.11)

According to (4.9) and (4.10), inequality (4.8) can be written as

$$\begin{aligned} r< \max _{t\in [1/3,\infty )} \min \{H(t), G(t)\}. \end{aligned}$$

Next, we consider the maximum of \(h(\delta , t(2-\delta )-1)\) in (4.11) over the segment \(L_t\).

For the points \((\delta ,t(2-\delta )-1)\in L_t\cap D_2\), according to (4.11), we have

$$\begin{aligned} \begin{aligned} \dfrac{\partial h(\delta ,t(2-\delta )-1)}{\partial \delta }&=\frac{-8\left( \sqrt{\delta ^{2} + 4\left( t(2-\delta )-1\right) }+\delta -2t\right) }{\left( \delta + \sqrt{\delta ^{2} + 4\left( t(2-\delta )-1\right) }\right) ^{3} \sqrt{\delta ^{2} + 4 \left( t(2-\delta )-1\right) }}\\&=\frac{4\left( \sqrt{\delta ^{2} \!+\! 4\left( t(2-\delta )-1\right) }\!+\!\delta - 2\right) ^2}{(2-\delta )\left( \delta + \sqrt{\delta ^{2} + 4\left( t(2-\delta )-1\right) }\right) ^{3} \sqrt{\delta ^{2} \!+\! 4\left( t(2-\delta )-1\right) }} \geqslant 0. \end{aligned} \end{aligned}$$

Notice that \(t(2-\delta )-1=\eta \in (-1,0)\) and \(\delta ^2+4\eta \) is positive. Therefore, \(h(\delta ,t(2-\delta )-1)\) is increasing with respect to \(\delta \) on the segment \(L_t\cap D_2\), which implies that its maximum over \(L_t\cap D_2\) is attained at the endpoint on the curve \(\dfrac{3}{16}\delta ^2=-\eta \). Notice also that this curve lies in \(D_1\). Hence, we only need to consider the points \((\delta , t(2-\delta )-1)\in D_1\).

For points \((\delta ,t(2-\delta )-1)\in L_t\cap D_1\), in view of (4.11), we see that

$$\begin{aligned} \begin{aligned} \dfrac{\partial h(\delta ,t(2-\delta )-1)}{\partial \delta }&=-\frac{1}{4\left( t(2-\delta )-1\right) ^{3}}\left[ \delta \left( t(2-\delta )-1\right) +\left( \delta ^{2} + 4\left( t(2-\delta )-1\right) \right) t\right] \\&=-\frac{1}{4\left( t(2-\delta )-1\right) ^{3}}\rho (\delta ) \end{aligned} \end{aligned}$$

with

$$\begin{aligned} t(2-\delta )-1=\eta \in (-1,0),\quad \rho (\delta ):=-\left( 4t^2-2t+1\right) \delta +8t^2-4t. \end{aligned}$$

Notice that \(\rho \) is a decreasing function of \(\delta \) since it is linear and \(4t^2-2t+1=4(t-\frac{1}{4})^2+\frac{3}{4}>0\).

If \(t\in [1/3, 1/2)\), then \(\rho (\delta )<\rho (0)=8t^2-4t<0\). Therefore, \(h(\delta ,t(2-\delta )-1)\) is decreasing with respect to \(\delta \) and attains its maximum on the secant segment \(L_t\) at \(\delta =0\). From (4.7), we infer that \(\eta =2t-1\). According to (4.3), we have

$$\begin{aligned} r< h(\delta ,\eta )=-\dfrac{1}{\eta }=\dfrac{1}{1-2t}. \end{aligned}$$
(4.12)

If \(t=1/2\), then \(\rho (\delta )=-\delta <0\), so that \(h(\delta ,-\delta /2)=\frac{2}{\delta }-\frac{1}{2}\) is decreasing with respect to \(\delta \) on \(L_{1/2}\) and tends to \(\infty \) as \(\delta \searrow 0\); the supremum is thus \(+\infty \). If \(t\in (1/2,\infty )\), then \(8t^2-4t>0\). The root \(\delta ^\star \) of \(\rho \) is

$$\begin{aligned} \delta ^\star =\dfrac{4t(2t-1)}{4t^2-2t+1}. \end{aligned}$$
(4.13)

If \(\delta \in (0,\delta ^\star )\), then \(\rho (\delta )>0\) and \(h(\delta ,t(2-\delta )-1)\) is increasing with respect to \(\delta \). If \(\delta \in (\delta ^\star ,2)\), then \(\rho (\delta )<0\) and \(h(\delta ,t(2-\delta )-1)\) is decreasing with respect to \(\delta \). Therefore, \(h(\delta ,t(2-\delta )-1)\) attains its maximum on the secant segment \(L_t\) at \(\delta ^\star \). From (4.7), we have

$$\begin{aligned} \eta ^\star =t\left( 2-\delta ^\star \right) -1=-\dfrac{\left( 2t-1\right) ^{2}}{4t^{2}-2t+1}. \end{aligned}$$
(4.14)

Therefore, from (4.3), we obtain

$$\begin{aligned} r< h(\delta ^\star ,\eta ^\star )=-\dfrac{1}{\eta ^\star }-\dfrac{\left( \delta ^\star \right) ^2}{8\left( \eta ^\star \right) ^2}=\dfrac{1}{2}+\dfrac{1}{2(2t-1)^2}. \end{aligned}$$
(4.15)

Combining (4.10), (4.12) and (4.15), we have

$$\begin{aligned} G(t)={\left\{ \begin{array}{ll} \dfrac{1}{1-2t}, ~~&{}t\in \left[ 1/3,1/2\right) , \\ +\infty , ~~&{}t=1/2, \\ \dfrac{1}{2}+\dfrac{1}{2(2t-1)^2},~~&{}t\in (1/2,+\infty ). \end{array}\right. } \end{aligned}$$
(4.16)
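The closed form (4.16) can be cross-checked against a brute-force maximization of h over the segment \(L_t\). The following sketch (the names `h` and `G` are ours) performs this comparison for the sample value \(t=0.8\):

```python
import math

def h(delta, eta):
    # threshold (4.3)
    if 3 * delta**2 / 16 <= -eta:
        return -1 / eta - delta**2 / (8 * eta**2)
    return (2 / (delta + math.sqrt(delta**2 + 4 * eta)))**2

def G(t):
    # closed form (4.16), branch t > 1/2
    return 0.5 + 0.5 / (2 * t - 1)**2

t = 0.8
# brute-force maximum of h over L_t: eta = t*(2 - delta) - 1, (delta, eta) in D
best = max(
    h(d, t * (2 - d) - 1)
    for d in (2 * k / 100000 for k in range(1, 100000))
    if -1 < t * (2 - d) - 1 < 0 and 1 + d + 3 * (t * (2 - d) - 1) >= 0
)
print(abs(best - G(t)) < 1e-6)  # True
```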

It is easily seen from (4.9) that H is increasing with respect to \(t\in [1/3,\infty ).\) Furthermore, in view of (4.16), G is increasing in the interval [1/3, 1/2) and decreasing in the interval \((1/2,\infty )\); see Fig. 2. Since \(H(t)<H(1/2)=1<3=G(1/3)\leqslant G(t)\) for \(t\in [1/3,1/2)\), the graphs of H and G do not intersect for \(t\in [1/3,1/2)\).

Fig. 2 The graphs of H and G; see (4.9) and (4.16)

On the other hand, since H is increasing and G is decreasing in \((1/2,\infty )\), the equation \(H(t)=G(t)\) has a unique solution in \((1/2,\infty )\). Indeed, from (4.9) and (4.16), for \(t\in (1/2,\infty )\), we have

$$\begin{aligned} \dfrac{\sqrt{3t-1}}{2\sqrt{t}-\sqrt{3t-1}}=\dfrac{1}{2}+\dfrac{1}{2(2t-1)^2},\quad t\in (1/2,\infty ), \end{aligned}$$

that is,

$$\begin{aligned} 23t^5-55t^4+55t^3-29t^2+8t-1=0,\quad t\in (1/2,\infty ). \end{aligned}$$

Notice that the above polynomial has only one real root, namely \(t \approx 0.794645365827\). Substituting this value of t in (4.13), (4.14), and (4.15), respectively, we obtain the optimal values

$$\begin{aligned} \delta ^\star \approx 0.967237837020572,\quad \eta ^\star \approx -0.179320334471962,\quad r^\star \approx 1.9398285699451. \end{aligned}$$
(4.17)

Let us mention that the multipliers \(\delta ^\star \) and \(\eta ^\star \) are admissible, i.e., they satisfy all conditions in our stability analysis; in particular, \(2-\delta ^\star +2\eta ^\star \geqslant 0,\) which is used in Lemma 3.2 but does not enter into the definition of the domain D.
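The values in (4.17) are reproducible with a few lines of code; the following sketch locates the root of the quintic by bisection on the bracketing interval (1/2, 1) and then evaluates (4.13)–(4.15), including the admissibility check \(2-\delta ^\star +2\eta ^\star \geqslant 0\) (all variable names here are ours):

```python
def f(t):
    # quintic whose root in (1/2, 1) determines the optimal t
    return 23*t**5 - 55*t**4 + 55*t**3 - 29*t**2 + 8*t - 1

# bisection on (0.5, 1): f(0.5) < 0 < f(1)
a, b = 0.5, 1.0
for _ in range(80):
    m = 0.5 * (a + b)
    if f(m) <= 0:
        a = m
    else:
        b = m
t = 0.5 * (a + b)

delta_star = 4 * t * (2 * t - 1) / (4 * t**2 - 2 * t + 1)  # (4.13)
eta_star = -(2 * t - 1)**2 / (4 * t**2 - 2 * t + 1)        # (4.14)
r_star = 0.5 + 0.5 / (2 * t - 1)**2                        # (4.15)

print(t, delta_star, eta_star, r_star)
# t ≈ 0.794645…, delta* ≈ 0.967238…, eta* ≈ -0.179320…, r* ≈ 1.939829…
print(2 - delta_star + 2 * eta_star >= 0)  # admissibility: True
```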