1 Introduction

The nonlinear Schrödinger (NLS) equation

$$\begin{aligned} i \partial _T A = \nu _1 \partial _X^2 A + {\widetilde{\nu }}_2 A |A|^2, \end{aligned}$$
(1)

with coefficients \( \nu _1, {\widetilde{\nu }}_2 \in {\mathbb {R}} \), can be derived for the description of small modulations in time and space of oscillatory wave packets in dispersive wave systems, such as the quadratic (\( f(u) = u^2 \)) or cubic (\( f(u) = u^3 \)) Klein–Gordon equation

$$\begin{aligned} \partial _t^2 u = \partial _x^2 u - u - f(u), \qquad (x, t, u(x,t) \in {\mathbb {R}}), \end{aligned}$$

the water wave problem and systems from nonlinear optics. For the cubic Klein–Gordon equation, the ansatz for the derivation of the NLS equation is given by

$$\begin{aligned} u(x,t) = \varepsilon A(\varepsilon (x-c_gt),\varepsilon ^2 t) e^{i(k_0 x - \omega _0 t)} + c.c., \end{aligned}$$

where \( c_g\) is the linear group velocity, \( k_0\) the basic spatial wave number, \( \omega _0 \) the basic temporal wave number and \( 0 < \varepsilon \ll 1 \) a small perturbation parameter.

Various NLS approximation results have been proved in the last decades. Such a result can trivially be established for a dispersive wave system with no quadratic terms by using Gronwall’s inequality, cf. [18]. However, in case of quadratic nonlinearities such a result is non-trivial since terms of order \( {\mathcal {O}}(\varepsilon ) \) have to be controlled on the long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale. The idea to get rid of this problem is to use so-called normal form transformations. By a near identity change of variables, the terms of order \( {\mathcal {O}}(\varepsilon ) \) can be eliminated if non-resonance conditions are satisfied, cf. [13]. The last years saw various attempts to weaken these non-resonance conditions in order to control appearing resonances, cf. [22], and to make the theory applicable to quasilinear systems, cf. [4, 7, 28], such as the water wave problem, cf. [6, 8, 11, 27].

It turned out that in case of initial conditions for the NLS equation which are analytic in a strip of the complex plane almost no non-resonance conditions are necessary, cf. [5, 21]. It is the purpose of this paper to explain that the method developed in [5, 21] can be used in the justification of the derivative nonlinear Schrödinger (DNLS) approximation, too. Interestingly, it allows us to get rid of a problem which is not present in the justification of the NLS approximation, see below.

The DNLS equation

$$\begin{aligned} i \partial _T A = \nu _1 \partial _X^2 A + \nu _2 A |A|^2 + i \nu _3 |A|^2 \partial _X A + i \nu _4 A^2 \partial _X {\overline{A}} + \nu _5 A |A|^4, \end{aligned}$$
(2)

with \( T \ge 0 \), \( X \in {\mathbb {R}}\), \( A(X,T) \in {\mathbb {C}}\), and coefficients \( \nu _j \in {\mathbb {R}} \) for \( j = 1,\ldots ,5 \), appears if the cubic coefficient \( {\widetilde{\nu }}_2 = {\widetilde{\nu }}_2 (k_0) \) in (1) vanishes for the chosen basic spatial wave number \( k_0 \). This situation appears for instance in the water wave problem for one-dimensional sets in the parameter plane of surface tension and basic spatial wave number \( k_0 \), cf. [1]. The DNLS equation can be derived with an ansatz

$$\begin{aligned} u(x,t)= \varepsilon ^{1/2} A(\varepsilon (x-c_gt),\varepsilon ^2 t) e^{i(k_0 x - \omega _0 t)} + c.c.. \end{aligned}$$
(3)

The justification is more difficult and, from a mathematical point of view, even more interesting than that for the NLS approximation: for original dispersive wave systems with a quadratic nonlinearity, terms of order \( {\mathcal {O}}(\varepsilon ^{1/2}) \) in the equation for the error have to be controlled on a long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale. Even for dispersive wave systems with a cubic nonlinearity, terms of order \( {\mathcal {O}}(\varepsilon ) \) in the equation for the error have to be controlled on a long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale. Therefore, as a first step in establishing an approximation theory for the DNLS approximation, we start with the simplest toy problem, namely a nonlinear Klein–Gordon equation with a special cubic nonlinearity,

$$\begin{aligned} \partial _t^2 u = \partial _x^2 u - u + \varrho (\partial _x) u^3. \end{aligned}$$
(4)

Herein, \( x \in {\mathbb {R}} \), \( t \in {\mathbb {R}} \), \( u(x,t) \in {\mathbb {R}} \), and

$$\begin{aligned} \varrho (ik) = \frac{k^2-1}{k^2+1}, \qquad \text {resp.} \qquad \varrho (\partial _x) = - (1-\partial _x^2)^{-1} (1+\partial _x^2). \end{aligned}$$

Plugging the ansatz (3) with \( k_0 = 1 \) into (4) and equating the coefficients in front of \( e^{i(k_0 x - \omega _0 t)} \) to zero gives at \( {\mathcal {O}}(\varepsilon ^{1/2})\) the linear dispersion relation \( \omega _0^2 = k_0^2 + 1 \) and at \( {\mathcal {O}}(\varepsilon ^{3/2}) \) the linear group velocity \( c_g = k_0/\omega _0 \). Using the expansion

$$\begin{aligned} \varrho (i + \varepsilon \partial _X) = \frac{-(i + \varepsilon \partial _X)^2-1}{-(i + \varepsilon \partial _X)^2+ 1} = - i \varepsilon \partial _X +{\mathcal {O}}(\varepsilon ^2) \end{aligned}$$

gives at \( {\mathcal {O}}(\varepsilon ^{5/2}) \) the DNLS equation

$$\begin{aligned} - 2 i \omega _0 \partial _T A = (1-c_g^2 )\partial _X^2 A - 3 i \partial _X (A|A|^2). \end{aligned}$$
(5)
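The leading-order coefficients in this derivation can be checked numerically. The following minimal sketch is not part of the derivation; the function names `varrho` and `expansion_error` are ours and purely illustrative. It evaluates the symbol \( \varrho (ik) \) at \( k = k_0 + \varepsilon K \), assuming that \( \partial _X \) acts as multiplication by \( iK \) in Fourier space, so that \( -i \varepsilon \partial _X \) corresponds to \( \varepsilon K \).

```python
import math

def varrho(k: float) -> float:
    """Symbol of the operator in front of u^3 in (4): (k^2 - 1)/(k^2 + 1)."""
    return (k * k - 1.0) / (k * k + 1.0)

k0 = 1.0
omega0 = math.sqrt(k0 * k0 + 1.0)   # linear dispersion relation omega_0^2 = k_0^2 + 1
c_g = k0 / omega0                   # linear group velocity

def expansion_error(eps: float, K: float = 1.0) -> float:
    """|varrho(k0 + eps*K) - eps*K|, expected to scale like eps^2."""
    return abs(varrho(k0 + eps * K) - eps * K)

for eps in (1e-1, 1e-2, 1e-3):
    err = expansion_error(eps)
    print(f"eps={eps:g}  error={err:.3e}  error/eps^2={err / eps**2:.4f}")
print("omega0 =", omega0, " c_g =", c_g, " 1 - c_g^2 =", 1.0 - c_g**2)
```

The ratio error/eps^2 staying bounded as eps decreases is consistent with the \( {\mathcal {O}}(\varepsilon ^2) \) remainder in the expansion of \( \varrho \).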

Remark 1.1

The Fourier transform of the DNLS approximation (3) is given by

$$\begin{aligned} {\widehat{u}}(k,t)= \varepsilon ^{1/2} \varepsilon ^{-1} {\widehat{A}}(\frac{k-k_0}{\varepsilon },\varepsilon ^2 t) e^{ - i \omega _0 t - i c_g (k-k_0) t } + \widehat{c.c}. \end{aligned}$$
(6)

Hence, the Fourier transform is strongly concentrated at the wave numbers \( \pm k_0 = \pm 1 \) and so the evolution of \( {\widehat{A}} \) is determined by the form of the dispersion relation and of the nonlinearity at the wave number \( k_0 = 1 \).

It is the goal of this paper to prove that the DNLS equation (5) makes, through the ansatz (3), correct predictions about the dynamics of the Klein–Gordon model (4).

For the formulation of our approximation theorem, we need the following definition.

Definition 1.2

For \( \sigma , s \ge 0 \), we define the Gevrey spaces

$$\begin{aligned} G_{\sigma }^s = \{ u: {\mathbb {R}} \rightarrow {\mathbb {C}}: \Vert u\Vert _{G^s_\sigma }:=\Vert e^{\sigma (|k|+1)}(1+|k|^{2s})^{1/2}\widehat{u}(k)\Vert _{L^2(dk)} < \infty \}. \end{aligned}$$
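To get a feeling for this norm, the following sketch approximates it by a Riemann sum for the Gaussian \( u(x) = e^{-x^2/2} \). It assumes the Fourier convention with prefactor \( 1/(2\pi ) \) from the Notation paragraph; the helper names, the truncation `k_max` and the grid size `n` are our own illustrative choices.

```python
import math

def gaussian_hat(k: float) -> float:
    """Fourier transform of u(x) = exp(-x^2/2) for the convention with prefactor 1/(2*pi)."""
    return math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)

def gevrey_norm(u_hat, sigma: float, s: float, k_max: float = 40.0, n: int = 20001) -> float:
    """Riemann-sum approximation of the G_sigma^s norm, truncated to [-k_max, k_max]."""
    h = 2.0 * k_max / (n - 1)
    total = 0.0
    for i in range(n):
        k = -k_max + i * h
        # squared weight e^{2 sigma (|k|+1)} (1 + |k|^{2s})
        weight = math.exp(2.0 * sigma * (abs(k) + 1.0)) * (1.0 + abs(k) ** (2.0 * s))
        total += weight * u_hat(k) ** 2 * h
    return math.sqrt(total)

for sigma in (0.0, 0.5, 1.0):
    print(f"sigma={sigma}: norm ~ {gevrey_norm(gaussian_hat, sigma, s=1.0):.6f}")
```

Since the Gaussian decays faster than any exponential in Fourier space, its norm is finite for every \( \sigma \ge 0 \) and grows with \( \sigma \).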

Then our approximation theorem is as follows.

Theorem 1.3

Let \( s_A \ge 12 \), \( \sigma _0 > 0 \), and \( A\in C([0,T_0],G_{\sigma _0}^{s_A}) \) be a solution of the DNLS equation (5). Then there exist \( \varepsilon _0 > 0 \), \( T_1 \in (0,T_0] \), and \( C > 0 \) such that for all \( \varepsilon \in (0,\varepsilon _0) \) we have solutions u of the Klein–Gordon model (4) such that

$$\begin{aligned} \sup _{t \in [0,T_1/\varepsilon ^{2}]} \sup _{x \in {\mathbb {R}}}| u(x,t) - (\varepsilon ^{1/2} A(\varepsilon (x-c_gt),\varepsilon ^2 t) e^{i(k_0 x - \omega _0 t)} + c.c.)| \le C \varepsilon . \end{aligned}$$

Remark 1.4

As already said, such an approximation result is non-trivial since solutions of order \( {\mathcal {O}}(\varepsilon ^{1/2}) \) of (4) have to be controlled on an \( {\mathcal {O}}(1/\varepsilon ^{2}) \)-timescale. Although we have a cubic nonlinearity, a simple application of Gronwall’s inequality would only give estimates on an \( {\mathcal {O}}(1/\varepsilon ) \)-timescale.
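The timescale count in this remark can be made concrete under a simplifying assumption: suppose the error satisfies a differential inequality of the form \( \frac{d}{dt} y \le C \varepsilon y \), which is only a caricature of the actual error equation. Gronwall's inequality then gives the growth factor \( e^{C \varepsilon t} \). The constants `C` and `T` below are hypothetical and chosen only for illustration.

```python
import math

def gronwall_factor(C: float, eps: float, t: float) -> float:
    """Growth factor exp(C*eps*t) obtained from d/dt y <= C*eps*y."""
    return math.exp(C * eps * t)

C, T = 1.0, 1.0  # hypothetical constants, for illustration only
for eps in (0.1, 0.05, 0.02):
    short = gronwall_factor(C, eps, T / eps)      # O(1/eps) timescale
    long_ = gronwall_factor(C, eps, T / eps**2)   # O(1/eps^2) timescale
    print(f"eps={eps}: factor at T/eps = {short:.3f}, at T/eps^2 = {long_:.3e}")
```

On the \( {\mathcal {O}}(1/\varepsilon ) \)-timescale the factor equals \( e^{CT} \) independently of \( \varepsilon \), whereas on the \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale it equals \( e^{CT/\varepsilon } \) and is useless for small \( \varepsilon \).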

Remark 1.5

Such an approximation result should not be taken for granted. There are counterexamples, cf. [10, 19, 23], showing that there are amplitude equations which are derived in a formally correct way, but fail to make correct predictions about the original system on the natural timescale of the approximation.

Remark 1.6

We recall that for \( \sigma , s \ge 0 \) due to the Paley–Wiener theorem functions \( u \in G_{\sigma }^{s} \) can be extended to a strip \( \{ z \in {{\mathbb {C}}}: |\text {Im} z| < \sigma \}\) in the complex plane, cf. [15].

Remark 1.7

The approximation result is not optimal: the error estimates hold on the correct timescale, namely for \( t \in [0,T_1/\varepsilon ^2] \), but not necessarily for all \( t \in [0,T_0/\varepsilon ^2] \) on which the DNLS solution exists. Hence, we can only guarantee that parts of the DNLS dynamics can be seen in the original system.

It turns out that there are two new difficulties which were not present in the justification analysis of other modulation equations so far and which have to be overcome, namely the problem of a total resonance and the problem of a second-order resonance, see below and Sect. 4 for detailed definitions of these resonances. We get rid of the second-order resonance by adapting a method developed in [5, 21] for justifying the NLS approximation under rather weak non-resonance conditions. The drawback of this method is that the initial conditions for the NLS equation have to be chosen to be analytic in a strip of the complex plane. This approach will be combined with some energy estimates in order to get rid of the total resonance, which is also not present in the justification analysis of the NLS approximation. For a detailed outline of the proof, see Sect. 2.

Remark 1.8

We call the following approach to justify the DNLS approximation robust since our approximation result holds under rather weak non-resonance conditions.

Remark 1.9

For completeness, we remark that the DNLS equation is a well-studied nonlinear dispersive system. Local well-posedness of smooth solutions in Sobolev spaces \(H^s\) with \(s>3/2\) was established by Tsutsumi and Fukuda [26]. See also [3, 9, 25, 29, 30] for further improvements. The complete integrability of the DNLS equation has been established in [16]. For a recent overview, see [12].

Notation

The Fourier transform of a function \( u: {\mathbb {R}} \rightarrow {\mathbb {C}} \) is given by

$$\begin{aligned} ({\mathcal {F}}u)[k]={{\widehat{u}}}(k)=\frac{1}{2\pi }\mathop {\int }\limits _{{\mathbb {R}}} u(x) e^{-i kx}\hbox {d}x. \end{aligned}$$

The inverse Fourier transform of a function \( {\widehat{u}}: {\mathbb {R}} \rightarrow {\mathbb {C}} \) is given by

$$\begin{aligned} ({\mathcal {F}}^{-1}{{\widehat{u}}})[x]= u(x)= \mathop {\int }\limits _{{\mathbb {R}}} {\widehat{u}}(k) e^{i kx}{d}k. \end{aligned}$$

Multiplication \((uv)(x)=u(x)v(x)\) in physical space corresponds in Fourier space to the convolution

$$\begin{aligned} ({{\widehat{u}}}*{{\widehat{v}}})(k)=\mathop {\int }\limits _{{\mathbb {R}}} {\widehat{u}}(k-l){\widehat{v}}(l){d}l. \end{aligned}$$
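With the prefactor \( 1/(2\pi ) \) in the Fourier transform, no additional constant appears in front of the convolution. A quick numerical check with the Gaussian \( u(x) = e^{-x^2/2} \) illustrates this; the closed-form transforms used below are standard Gaussian integrals, while the quadrature parameters `l_max` and `n` are our own choices.

```python
import math

def gauss_hat(k: float) -> float:
    """hat u for u(x) = exp(-x^2/2), with the 1/(2*pi) convention above."""
    return math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)

def product_hat(k: float) -> float:
    """hat(u*u) for u(x)^2 = exp(-x^2), computed in closed form."""
    return math.sqrt(math.pi) * math.exp(-0.25 * k * k) / (2.0 * math.pi)

def convolution(f, g, k: float, l_max: float = 12.0, n: int = 4801) -> float:
    """Riemann-sum approximation of (f * g)(k) = int f(k - l) g(l) dl."""
    h = 2.0 * l_max / (n - 1)
    return sum(f(k - (-l_max + i * h)) * g(-l_max + i * h) for i in range(n)) * h

for k in (0.0, 0.7, 2.0):
    print(k, convolution(gauss_hat, gauss_hat, k), product_hat(k))
```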

The weighted Lebesgue space \( L^2_s \) which will be used a few times in Fourier space is equipped with the norm \( \Vert {\widehat{u}} \Vert _{L^2_s} = \Vert {\widehat{u}}\rho _0^s \Vert _{L^2} \) with \( \rho _0(k)=(1+k^2)^{1/2} \).

In the following, many possibly different constants are denoted with the same symbol C if they can be chosen independent of the small perturbation parameter \( 0 < \varepsilon \ll 1 \).

2 Plan of the paper

The plan of the paper is as follows. In Sect. 3, we write the Klein–Gordon model (4) as a first-order system in Fourier space and derive the equations for the error made by an improved DNLS approximation. The improved DNLS approximation is \( {\mathcal {O}}(\varepsilon ^{3/2}) \)-close to the original DNLS approximation (3), but has the advantage that it makes the residual smaller, i.e., the terms which do not cancel after inserting the ansatz in the original system. The improved DNLS approximation is constructed and remaining residual terms are estimated in “Appendix A.”

There are terms of order \( {\mathcal {O}}(\varepsilon ) \) in the equations for the error which prevent us from obtaining bounds on the long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale. Therefore, in Sect. 4 we perform some normal form transformations in order to get rid of these terms in the equations for the error. It turns out that this elimination is a non-trivial task since resonances are present in the system, and therefore, not all terms of order \( {\mathcal {O}}(\varepsilon ) \) can be eliminated by these transformations.

There are cubic terms which cannot be eliminated for any wave number \( k \in {{\mathbb {R}}}\). We call the associated resonance a total resonance. Moreover, there is another resonance at the wave numbers \( \pm k_0 = \pm 1 \). The denominator in the normal form transformation vanishes to second order at these wave numbers, and so this resonance will be called second-order resonance in the following. Since the nonlinear terms which appear in the numerator only vanish linearly at the wave numbers \( \pm k_0 = \pm 1 \), the associated part of the normal form transformation would be unbounded. Therefore, in Sect. 4 we only eliminate all terms which are not associated with the total resonance or second-order resonance. It turns out that the total resonant terms can be controlled with a simple energy estimate, and so, we concentrate on the handling of the terms associated with the second-order resonance by the use of Gevrey spaces in the following.

As a preparation we recall some estimates from the local existence and uniqueness proof for the DNLS equation in Gevrey spaces in Sect. 5. By lowering the decay rates \( \sigma \) in the definition of the norm \( \Vert \cdot \Vert _{G^s_{\sigma }} \) with time, we obtain an artificial smoothing which allows us to get rid of first-order derivatives in the nonlinearity.

The transfer of the analysis made in Sect. 5 to Fourier space is the basis of our approach to get rid of the terms associated with the second-order resonance. The transfer to Fourier space corresponds to giving up the exponential localization of the solutions in Fourier space with time which will allow us to use the derivative in front of the nonlinear term in (5) to reach the correct \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale. However, in order to use this idea on the original system (4) we have to get rid of the fact that the DNLS modes are concentrated at the wave number \( k_0=1 \) and that the nonlinear term vanishes at this wave number, too. Hence, in Sect. 6 we introduce a space where the Fourier modes are located at integer multiples of \( k_0 \) with an exponential decay proportional to \( |k-mk_0| \). Again by lowering the decay rates with time, we rebuild the construction from Sect. 5 for (4), cf. Fig. 1. This allows us to extend our error estimates to the natural \( {\mathcal {O}}(1/\varepsilon ^{2}) \)-timescale of the DNLS approximation. This construction was originally used in [5, 21] to get rid of resonances which are bounded away from the integer multiples of the basic wave number \( k_0 \). The solutions at such resonant wave numbers will grow like \( e^{ {\mathcal {O}}(\varepsilon ) t} \). However, by the chosen spaces the solutions are initially \( e^{ -{\mathcal {O}}(1/\varepsilon )} \) small. Thus, the error will stay \( {\mathcal {O}}(1) \)-bounded on a long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale in t.

Fig. 1

The mode distribution of the solutions \( {\widehat{A}} \) and \( {\widehat{u}} \) will be controlled by weighted \( L^2 \)-spaces. The left panel shows the inverse of the weight for the DNLS equation (5) for \( t = 0 \) in blue and for a \( t > 0 \) in red. Moreover, it shows the vanishing of the nonlinearity at \( K=0 \) in Fourier space in green. The right panel shows the inverse of the weight for the original system (4) for \( t = 0 \) in blue and for a \( t > 0 \) in red. Moreover, it shows the vanishing of the nonlinearity at \( k=\pm 1 \) in green again (color figure online)

In Sect. 7, we introduce so-called mode filters which split the error function into parts that are handled differently in Fourier space. Near the wave numbers \( k = \pm k_0 \) and outside small neighborhoods of the integer multiples of \( k_0 \), we use the time-dependent weighted \( L^2 \)-spaces to control the magnitude of the error. In the small neighborhoods of the integer multiples of \( k_0 \), except for the neighborhoods around \( k = \pm k_0 \), we use normal form transformations. Estimates for the normal form transformations in the chosen spaces can be found in Sect. 8. The final error estimates can be found in Sect. 9. We use energy estimates for the transformed system since we still have to get rid of the totally resonant terms. All ideas from the previous sections can be incorporated in these energy estimates. We close the paper in Sect. 10 with a discussion about possible improvements, generalizations and about the possible transfer to more complicated systems.

3 Equations for the error

The Fourier transformed cubic Klein–Gordon model (4) is given by

$$\begin{aligned} \partial _t^2 {\widehat{u}}(k,t) = - \omega ^2(k) {\widehat{u}}(k,t)- \omega (k) \rho (k) {\widehat{u}}^{*3}(k,t) \end{aligned}$$
(7)

where \( \omega (k) = \text {sign}(k) \sqrt{1+k^2} \) and \( \rho (k)=- \frac{\varrho (i k)}{\omega (k)} \). By this choice of \( \omega \) and \( \rho \), the subsequent variables will be real-valued in physical space. With \( u = u_1 \), we write (7) as a first-order system

$$\begin{aligned} \partial _t {\widehat{u}}_1(k,t)= & {} - i \omega (k) {\widehat{u}}_2(k,t),\\ \partial _t {\widehat{u}}_2(k,t)= & {} - i \omega (k) {\widehat{u}}_1(k,t) - i \rho (k) {\widehat{u}}_1^{*3}(k,t). \end{aligned}$$

This system is diagonalized with

$$\begin{aligned} 2 {\widehat{v}}_{-1} = {\widehat{u}}_1+ {\widehat{u}}_2, \qquad 2 {\widehat{v}}_{1} = {\widehat{u}}_1- {\widehat{u}}_2. \end{aligned}$$
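As a sanity check of this change of variables, the following sketch verifies for fixed \( k \) that the linear part of the first-order system becomes diagonal with entries \( -i\omega (k) \) for \( {\widehat{v}}_{-1} \) and \( i\omega (k) \) for \( {\widehat{v}}_{1} \). The complex test values are arbitrary, the helper names are ours, and setting \( \text {sign}(0) = +1 \) is an assumption made only for this illustration.

```python
import math

def omega(k: float) -> float:
    """omega(k) = sign(k) sqrt(1 + k^2); sign(0) := +1 is an assumption made here."""
    root = math.sqrt(1.0 + k * k)
    return root if k >= 0 else -root

def linear_rhs_u(k: float, u1: complex, u2: complex):
    """Linear part of the first-order system for (u1_hat, u2_hat)."""
    w = omega(k)
    return (-1j * w * u2, -1j * w * u1)

def to_v(u1: complex, u2: complex):
    """(v_{-1}, v_{1}) defined by 2 v_{-1} = u1 + u2 and 2 v_{1} = u1 - u2."""
    return (0.5 * (u1 + u2), 0.5 * (u1 - u2))

def diagonalization_defect(k: float, u1: complex, u2: complex):
    """Defects |d/dt v_{-1} + i w v_{-1}| and |d/dt v_{1} - i w v_{1}| for the linear flow."""
    du1, du2 = linear_rhs_u(k, u1, u2)
    dv_m1, dv_p1 = to_v(du1, du2)   # the transformation is linear and time-independent
    v_m1, v_p1 = to_v(u1, u2)
    w = omega(k)
    return abs(dv_m1 + 1j * w * v_m1), abs(dv_p1 - 1j * w * v_p1)

print(diagonalization_defect(1.3, 0.4 - 0.2j, -1.1 + 0.7j))
```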

With \( V = (v_{-1},v_1) \), we obtain

$$\begin{aligned} \partial _t V = \Lambda V + N(V,V,V), \end{aligned}$$
(8)

where in Fourier space

$$\begin{aligned} {\widehat{\Lambda }}(k) = \left( \begin{array}{cc} - i \omega (k) &{} \quad 0 \\ 0 &{} \quad i \omega (k) \end{array}\right) \end{aligned}$$

is a skew-symmetric operator and

$$\begin{aligned} {\widehat{N}}({\widehat{V}},{\widehat{V}},{\widehat{V}})(k,t) = \frac{1}{2} i \rho (k) \left( \begin{array}{c} -({\widehat{v}}_1+ {\widehat{v}}_{-1})^{*3} \\ ({\widehat{v}}_1+ {\widehat{v}}_{-1})^{*3} \end{array}\right) (k,t) \end{aligned}$$

a symmetric trilinear mapping. The DNLS approximation is of the form

$$\begin{aligned} \varepsilon ^{1/2} \psi =\left( \begin{array}{c} \varepsilon ^{1/2} a_1 + \varepsilon ^{1/2} a_{-1} + \varepsilon ^{3/2} \psi _{s,-1} \\ \varepsilon ^{3/2} \psi _{s,1}\end{array}\right) , \end{aligned}$$
(9)

cf. (30), with \( a_j \) concentrated at the wave number \( k = j \) and higher-order approximation terms \( \varepsilon ^{3/2} \psi _{s,\pm 1} \). See “Appendix A” for the detailed construction. The error \( \varepsilon ^{{\widetilde{\beta }}} R = V - \varepsilon ^{1/2} \psi \) with \( {\widetilde{\beta }} > 3/2 \) made by the DNLS approximation satisfies

$$\begin{aligned} \partial _t R = \Lambda R + \varepsilon L_{c}(R) + \varepsilon ^2 L_{s}(R) + \varepsilon ^{{\widetilde{\beta }}+1/2} L_r(R) + \varepsilon ^{-{\widetilde{\beta }}} \text {Res}(\varepsilon ^{1/2} \psi ), \end{aligned}$$
(10)

where \( \varepsilon L_{c}(R) \) are terms which are linear in R but have too few powers of \( \varepsilon \) in front and thus make it difficult to obtain \( {\mathcal {O}}(1) \)-bounds for R on the long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale, where \( \varepsilon ^2 L_{s}(R) \) are terms which are linear in R and have enough powers of \( \varepsilon \) in front, and where \( \varepsilon ^{{\widetilde{\beta }}+1/2} L_r(R) \) are terms which are nonlinear in R (and have enough powers of \( \varepsilon \) in front), in detail

$$\begin{aligned} \widehat{L_{c}(R)}(k,t)= & {} \frac{3}{2} i \rho (k) \left( \begin{array}{c} -({\widehat{a}}_1+ {\widehat{a}}_{-1})^{*2}*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \\ ({\widehat{a}}_1+ {\widehat{a}}_{-1})^{*2}*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \end{array}\right) (k,t), \\ \widehat{L_{s}(R)}(k,t)= & {} 3 i \rho (k) \left( \begin{array}{c} -({\widehat{a}}_1+ {\widehat{a}}_{-1})* ({\widehat{\psi }}_{s,1}+ {\widehat{\psi }}_{s,-1} )*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \\ ({\widehat{a}}_1+ {\widehat{a}}_{-1})*({\widehat{\psi }}_{s,1}+ {\widehat{\psi }}_{s,-1} )*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \end{array}\right) (k,t) \\ {}{} & {} + \varepsilon \frac{3}{2} i\rho (k) \left( \begin{array}{c} -({\widehat{\psi }}_{s,1}+ {\widehat{\psi }}_{s,-1} )^{*2}*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \\ ({\widehat{\psi }}_{s,1}+ {\widehat{\psi }}_{s,-1} )^{*2}*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \end{array}\right) (k,t), \\ \widehat{L_{r}(R)}(k,t)= & {} \frac{3}{2} i\rho (k) \left( \begin{array}{c} -({\widehat{\psi }}_{1}+ {\widehat{\psi }}_{-1} )*({\widehat{R}}_1+ {\widehat{R}}_{-1})^{*2} \\ ({\widehat{\psi }}_{1}+ {\widehat{\psi }}_{-1} )*({\widehat{R}}_1+ {\widehat{R}}_{-1})^{*2} \end{array}\right) (k,t) \\{} & {} + \varepsilon ^{{\widetilde{\beta }}- 1/2} \frac{1}{2} i\rho (k) \left( \begin{array}{c} -({\widehat{R}}_1+ {\widehat{R}}_{-1})^{*3} \\ ({\widehat{R}}_1+ {\widehat{R}}_{-1})^{*3} \end{array}\right) (k,t). \end{aligned}$$

Moreover, \( \varepsilon ^{-{\widetilde{\beta }}} \text {Res}(\varepsilon ^{1/2} \psi ) \) are the so-called residual terms. These are the terms which do not cancel after inserting the DNLS approximation into the nonlinear Klein–Gordon equation (4).

In the following, we concentrate on estimating the error made by the DNLS approximation and postpone the standard construction of an improved approximation and the estimates for the residual to Appendix A. The improved approximation will be chosen in such a way that the term \( \varepsilon ^{-{\widetilde{\beta }}} \text {Res}(\varepsilon ^{1/2} \psi ) \) is of order \( {\mathcal {O}}(\varepsilon ^2) \).

In the following, in our notation we keep \( {\widetilde{\beta }} > 3/2 \) in order to show that by using improved approximations the error can be made arbitrarily small. All terms on the right-hand side of (10) are at least of order \( {\mathcal {O}}(\varepsilon ^2)\) except for the first two terms. Since \( \Lambda \) is skew-symmetric, the first term on the right-hand side of (10) causes no problems either. However, the second term \( \varepsilon L_{c}(R) \) which is of order \( {\mathcal {O}}(\varepsilon ) \) causes serious problems in estimating the error on the long \({\mathcal {O}}(1/\varepsilon ^{2}) \)-timescale.
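The powers of \( \varepsilon \) and the combinatorial factors in (10) can be checked with a scalar caricature in which convolutions are replaced by ordinary products and the common prefactor \( \frac{1}{2} i \rho (k) \) as well as the sign structure of the two components are dropped. In this sketch, which rests on these simplifying assumptions, `a`, `psi_s` and `R` stand for \( a_1 + a_{-1} \), \( \psi _{s,1} + \psi _{s,-1} \) and \( R_1 + R_{-1} \); all numerical values are arbitrary.

```python
def cubic_error_terms(eps: float, beta: float, a: float, psi_s: float, R: float):
    """Compare the expanded cubic term with eps*L_c + eps^2*L_s + eps^(beta+1/2)*L_r."""
    psi = a + eps * psi_s
    # ((eps^{1/2} psi + eps^beta R)^3 - (eps^{1/2} psi)^3) / eps^beta
    lhs = ((eps**0.5 * psi + eps**beta * R) ** 3 - (eps**0.5 * psi) ** 3) / eps**beta
    L_c = 3.0 * a**2 * R                                  # factor 3/2 after multiplying by 1/2
    L_s = 6.0 * a * psi_s * R + eps * 3.0 * psi_s**2 * R  # factors 3 and 3/2
    L_r = 3.0 * psi * R**2 + eps ** (beta - 0.5) * R**3   # factors 3/2 and 1/2
    rhs = eps * L_c + eps**2 * L_s + eps ** (beta + 0.5) * L_r
    return lhs, rhs

lhs, rhs = cubic_error_terms(eps=0.1, beta=2.0, a=0.7, psi_s=-0.3, R=1.3)
print(lhs, rhs, abs(lhs - rhs))
```

The part of the cubic term that does not depend on R belongs to the residual and is therefore subtracted in `lhs`.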

4 Normal form transformations and the resonance structure

The dangerous term \( \varepsilon L_{c}(R) \) in (10), which is a sum of trilinear mappings of \( a_{j_1} \), \( a_{j_2} \) and \( R_{j_3} \), with \( j_1,j_2,j_3 \in \{ -1,1\} \), is removed by normal form transformations. By such a near-identity change of variables

$$\begin{aligned} R = w + \varepsilon M(\psi ,\psi ,R), \end{aligned}$$

with M a trilinear mapping, the \( {\mathcal {O}}(\varepsilon ) \)-terms can be transformed into \( {\mathcal {O}}(\varepsilon ^2) \)-terms, if a number of non-resonance conditions are satisfied.

Remark 4.1

In order to eliminate a trilinear term \( \varepsilon B(a_{j_1},a_{j_2},R_{j_3}) \) of the form

$$\begin{aligned} \widehat{B(a_{j_1},a_{j_2},R_{j_3})}= & {} \int \int b(k,k-k_1,k_1-k_2,k_2) {\widehat{a}}_{j_1}(k-k_1){\widehat{a}}_{j_2}(k_1-k_2) {\widehat{R}}_{j_3}(k_2) dk_2 dk_1 \end{aligned}$$

from the equation of \( R_j \) by a near identity transformation \( w_j = R_j + \varepsilon M(a_{j_1},a_{j_2},R_{j_3}) \), using the fact that the \( {\widehat{a}}_{j} \) in (9) are strongly concentrated at the wave numbers \( k = j \), we have to choose

$$\begin{aligned} \widehat{M(a_{j_1},a_{j_2},R_{j_3})}= & {} \int \int m(k,k-k_1,k_1-k_2,k_2) {\widehat{a}}_{j_1}(k-k_1){\widehat{a}}_{j_2}(k_1-k_2) {\widehat{R}}_{j_3}(k_2) dk_2 dk_1, \end{aligned}$$

with

$$\begin{aligned} m(k,k-k_1,k_1-k_2,k_2) = \frac{b(k,k-k_1,k_1-k_2,k_2)}{-j \omega (k) - \omega (j_1)-\omega (j_2) + j_3 \omega (k-j_1-j_2)}, \end{aligned}$$

cf. [24, §11]. For the terms which will be eliminated, the numerator b is bounded and the denominator is bounded away from zero.

Thus, the non-resonance condition

$$\begin{aligned} r_{jj_1j_2j_3}(k) = -j \omega (k) - \omega (j_1)- \omega (j_2) + j_3 \omega (k-j_1-j_2) \ne 0 \end{aligned}$$
(11)

has to be satisfied for all \( k \in {{\mathbb {R}}}\) for the elimination of a term \( {\widehat{a}}_{j_1}*{\widehat{a}}_{j_2}*R_{j_3} \) from the equation for \( R_{j} \), cf. [24, §11] or Remark 4.1. The non-resonance conditions (11) can be analyzed graphically. We find no resonances except for

(TR):

For \((j,j_1,j_2,j_3)=(j,j_1,-j_1,j)\), the resonance function \( r_{jj_1j_2j_3}(k) \) vanishes identically. Thus, the associated terms in \( \varepsilon L_{c}(R) \) cannot be eliminated by a normal form transformation.

(SOR):

For \((j,j_1,j_2,j_3)=(-1,j_1,j_1,-1)\), cf. Fig. 2, there is a resonance at \(k=j_1\), which is of second order, in detail

$$\begin{aligned} \omega (k)-2\omega (j_1) -\omega (k-2j_1)= \omega ''(j_1) (k-j_1)^2 +{\mathcal {O}}(|k-j_1|^3) \, \end{aligned}$$

for k near \(j_1\). This second-order resonance would appear in the denominator of the normal form transformation. It cannot be balanced by the term \( \rho \) in the numerator of the normal form transformation which only vanishes linearly at \( k = \pm 1 \). Thus, the normal form transformation would be singular near the wave numbers \( k = \pm 1 \).

Fig. 2

Plots of \( r_{jj_1j_2j_3}(k) \) for the resonances with the second-order touching. The left panel shows the case \( (j,j_1,j_2,j_3)= (-1,1,1,-1)\), and the right panel shows the case \( (j,j_1,j_2,j_3)= (-1,-1,-1,-1) \)

Thus, besides the normal form transformations which we use to get rid of the non-resonant terms, we need an idea to get rid of the terms which cannot be eliminated at all due to the total resonance (TR), and we need an idea to get rid of the terms which cannot be eliminated in a small neighborhood of the wave numbers \( k = \pm 1 \) due to the second-order resonance (SOR).
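The resonance structure described above can also be explored numerically. The following sketch scans the resonance function (11) for all sixteen index combinations on a finite grid and separates the combinations for which it vanishes identically from those with isolated zeros. The grid, the tolerance and the convention \( \text {sign}(0) = +1 \) are our own assumptions, and a grid scan is of course no substitute for the analysis. The last lines print the ratio \( r(k)/(k-j_1)^2 \) close to \( k = j_1 = 1 \) next to \( \omega ''(1) = 2^{-3/2} \), for comparison with the leading coefficient of the expansion in (SOR).

```python
import math
from itertools import product

def omega(k: float) -> float:
    """omega(k) = sign(k) sqrt(1 + k^2); sign(0) := +1 is an assumption made here."""
    root = math.sqrt(1.0 + k * k)
    return root if k >= 0 else -root

def r(j: int, j1: int, j2: int, j3: int, k: float) -> float:
    """Resonance function (11)."""
    return -j * omega(k) - omega(j1) - omega(j2) + j3 * omega(k - j1 - j2)

grid = [i / 100.0 for i in range(-1000, 1001)]  # contains k = -1 and k = 1 exactly

def classify(tol: float = 1e-12):
    """Return index tuples with r = 0 on the whole grid, and those with isolated zeros."""
    total, isolated = [], []
    for idx in product((-1, 1), repeat=4):
        values = [abs(r(*idx, k)) for k in grid]
        if max(values) < tol:
            total.append(idx)
        elif min(values) < tol:
            isolated.append(idx)
    return total, isolated

total, isolated = classify()
print("vanishing on the whole grid (TR):", total)
print("isolated zeros on the grid (SOR):", isolated)

delta = 1e-3
print("r(1 + delta)/delta^2 =", r(-1, 1, 1, -1, 1.0 + delta) / delta**2)
print("omega''(1)           =", 2.0 ** -1.5)
```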

It turned out that the problem with the total resonance (TR) can be solved rather easily by using energy estimates. For the handling of the second-order resonance (SOR), we use the fact that in lowest order the system near the wave numbers \( k = \pm 1 \) is given by the DNLS equation. Our approach to solve this problem is similar to the approach chosen for instance in [17, 20] for the justification of the KdV approximation. By this approach, we not only get rid of the quasilinearity of the DNLS equation but also gain the missing \( {\mathcal {O}}(\varepsilon ) \) order to come to the long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale. In order to explain this approach, we have a look at the solution theory of the DNLS equation in Gevrey spaces in Sect. 5 first.

Solving the NLS equation in Gevrey spaces was also the basis of the approach which has been used in [5, 21] for justifying the NLS approximation under rather weak non-resonance conditions. The underlying idea of this approach is introduced in Sect. 6. Interestingly, it also allows us to get rid of the second-order resonances which are not present in the justification of the NLS approximation.

In the end, the transfer of the method developed in [5, 21] from the NLS approximation to the DNLS approximation not only gains the missing \( {\mathcal {O}}(\varepsilon ) \) order needed to reach the long \({\mathcal {O}}(1/\varepsilon ^2) \)-timescale but also allows us to justify the DNLS approximation under rather weak non-resonance conditions. Since we need to control the total resonant terms, too, we use energy estimates instead of the variation of constants formula, and so the \( L^1 \)-based spaces in Fourier space from [5] are replaced here by \( L^2 \)-based spaces.

5 The DNLS equation in Gevrey spaces

As already said, we solve the DNLS equation in Gevrey spaces \(G_\sigma ^s\) equipped with the norm

$$\begin{aligned} \Vert u\Vert _{G^s_\sigma }:=\Vert e^{\sigma (|\xi |+1)}(1+|\xi |^{2s})^{1/2}\widehat{u}(\xi )\Vert _{L^2(d \xi )}. \end{aligned}$$
(12)

In our presentation of the properties of these spaces, we follow [2]. For the local existence and uniqueness of solutions, we use that \(G_\sigma ^s\) is an algebra for \(s>1/2\) and \( \sigma \ge 0\), i.e., if \(u,v\in G_\sigma ^s\), then \(uv\in G_\sigma ^s\) and

$$\begin{aligned} \Vert uv\Vert _{G_\sigma ^s}\le C_s\Vert u\Vert _{G_\sigma ^s}\Vert v\Vert _{G^s_\sigma }, \end{aligned}$$
(13)

where the constant \(C_s\) is independent of \(\sigma \ge 0\). Since the DNLS equation is a quasilinear system, we need the following improved, so-called tame, estimate

$$\begin{aligned} \Vert uv\Vert _{G^s_\sigma }\le C_s (\Vert u\Vert _{G^s_\sigma }\Vert v\Vert _{G^{\kappa }_\sigma }+\Vert u\Vert _{G^{\kappa }_\sigma }\Vert v\Vert _{G^s_\sigma }) \end{aligned}$$
(14)

which holds for all \( \sigma \ge 0 \), \(\kappa > 1/2\) and \(s \ge 0\). The elements of \( G^s_\sigma \) form a proper subset of the space of functions which are analytic in a strip of the complex plane of width \( < 2 \sigma \), symmetric around the real axis, equipped with the sup-norm due to the Paley–Wiener theorem, cf. [15].

For the DNLS equation, we have the following local existence and uniqueness result.

Theorem 5.1

Let \(s>1\) and \(\sigma _A>0\). Then, for every \(R_0>0\), there exists \(\eta =\eta (R_0,s,\sigma _A)\) such that for every \(A_0\in G_{\sigma _A}^s\), with \(\Vert A_0\Vert _{G_{\sigma _A}^{s}}\le R_0\), there exists a unique local solution \(A(T)\in G^s_{\sigma (T)}\) of the DNLS equation (5) with \(\sigma (T):=\sigma _A-\eta T\), \(T\in [0,\sigma _A/\eta ]\), and \( \sup _{T\in [0,\sigma _A/\eta ]} \Vert A(T) \Vert _{G^s_{\sigma (T)}} \le R_0 \).

Proof

By rescaling T, X and A, the DNLS equation (5) is brought into its normal form

$$\begin{aligned} \partial _T A = i \partial _X^2 A - \partial _X (A|A|^2). \end{aligned}$$

Next we set

$$\begin{aligned} A(\cdot ,T) = S(T) B(\cdot ,T) = e^{-\sigma (T)(1+M)} B(\cdot ,T), \end{aligned}$$

where \(M=\sqrt{-\partial _X^2}\). Then B satisfies

$$\begin{aligned} \partial _T B = - \eta (1+M) B + i \partial _X^2 B - \partial _X (S^{-1}(T) ((S(T) B)|S(T) B|^2)). \end{aligned}$$

We denote the scalar product in \( H^s \) with \( ( \cdot ,\cdot )_s \) and obtain

$$\begin{aligned} \partial _T ( B,B)_s = - \eta ((1+M)^{1/2} B, (1+M)^{1/2} B)_s + g(B), \end{aligned}$$

where

$$\begin{aligned} |g(B)| \le C \Vert B \Vert _{H^s}^2 \Vert B \Vert _{H^{s+1/2}}^2 \le C \Vert B \Vert _{H^s}^2 ((1+M)^{1/2} B, (1+M)^{1/2} B)_s \end{aligned}$$

such that

$$\begin{aligned} \partial _T (B,B)_s \le (- \eta + C \Vert B \Vert _{H^s}^2 ) ((1+M)^{1/2} B, (1+M)^{1/2} B)_s. \end{aligned}$$

Hence, \( (B,B)_s \) decays in time if \( \eta > 0 \) is chosen so large that initially

$$\begin{aligned} (- \eta + C \Vert B|_{T=0} \Vert _{H^s}^2 ) < 0. \end{aligned}$$

With these a priori estimates, the existence and uniqueness of solutions follows by standard arguments, cf. [14]. \(\square \)

The existence of the solutions of the DNLS equation (5) which is assumed in Theorem 1.3 is guaranteed by the following corollary.

Corollary 5.2

Let \( s_A \ge 12 \) and \(\sigma _A>0\). Then, for every \(R_0>0\), there exists \(\eta =\eta (R_0,s_A,\sigma _A)\) such that for every \(A_0\in G_{\sigma _A}^{s_A}\) with \(\Vert A_0\Vert _{G_{\sigma _A}^{s_A}}\le R_0\) and every \( \sigma _0 \in [0,\sigma _A) \) there exist a \( T_0 > 0 \) and a unique local solution \(A\in C([0,T_0],G^{s_A}_{\sigma _0})\) of the DNLS equation (5) with \( \sup _{T\in [0,\sigma _A/\eta ]} \Vert A(T) \Vert _{G^{s_A}_{\sigma (T)}} \le R_0 \).

Proof

As above choose \(\sigma (T):=\sigma _A-\eta T\) and stop for \(\sigma (T_0) = \sigma _0 \). \(\square \)
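The bookkeeping behind the shrinking analyticity radius can be summarized in a few lines. In the following sketch, the constant `C` plays the role of the unspecified constant in the estimate for \( g(B) \), and `margin` is a hypothetical safety factor; both are assumptions for illustration and are not quantified in the text. The sketch chooses \( \eta \) such that \( -\eta + C R_0^2 < 0 \) and returns the time \( T_0 \) with \( \sigma (T_0) = \sigma _0 \).

```python
def radius_schedule(R0: float, C: float, sigma_A: float,
                    sigma_0: float = 0.0, margin: float = 2.0):
    """Return eta, T_0 and the function sigma(T) = sigma_A - eta*T."""
    if not 0.0 <= sigma_0 < sigma_A:
        raise ValueError("need 0 <= sigma_0 < sigma_A")
    if margin <= 1.0:
        raise ValueError("margin must exceed 1 so that -eta + C*R0^2 < 0")
    eta = margin * C * R0**2          # guarantees -eta + C*R0^2 < 0
    T0 = (sigma_A - sigma_0) / eta    # time at which sigma(T0) = sigma_0

    def sigma(T: float) -> float:
        return sigma_A - eta * T

    return eta, T0, sigma

eta, T0, sigma = radius_schedule(R0=2.0, C=1.5, sigma_A=1.0, sigma_0=0.25)
print("eta =", eta, " T0 =", T0, " sigma(T0) =", sigma(T0))
```

Larger data (larger \( R_0 \)) force a larger \( \eta \) and hence a shorter time \( T_0 \) before the radius \( \sigma _0 \) is reached.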

6 Modulational Gevrey spaces

In the last section, we have seen that with an initial exponential decay in Fourier space for wave numbers \( |K| \rightarrow \infty \) we can create an artificial smoothing which allows us to control the derivative in front of the nonlinear terms of the DNLS equation. For the nonlinear Klein–Gordon equation (4), the DNLS equation is the lowest order approximation for the modes located at the wave number \( k = 1 \); in particular, the derivative in front of the nonlinear terms of the DNLS equation corresponds to the vanishing of the nonlinear terms of the nonlinear Klein–Gordon equation (4) at the wave number \( k = 1 \). For the DNLS approximation, the associated modes decay with an exponential rate around the wave number \( k = 1 \), cf. Fig. 1. However, by nonlinear interaction small peaks with width of order \( {\mathcal {O}}(\varepsilon ) \) are created at odd integer multiples of the basic wave number \( k_0 = 1\). See the right panel of Fig. 1. This means that the solutions of the nonlinear Klein–Gordon equation (4) will have a Fourier mode distribution which is bounded from above by a multiple of \(1/\vartheta _{\beta }\), where

$$\begin{aligned} \vartheta _\beta (k):=\textrm{exp}\Big (\beta \inf _{m\in {\mathbb {Z}}_{odd}}|k-mk_0|\Big ) \end{aligned}$$

or equivalently

$$\begin{aligned} \frac{1}{\vartheta _\beta (k)}=\sup _{m\in {\mathbb {Z}}_{odd}}e^{-\beta |k-mk_0|} \end{aligned}$$

for \(\beta \ge 0\). We define a number of spaces to combine these facts with the ideas from Sect. 5 for the DNLS equation (2) in order to handle the nonlinear Klein–Gordon equation (4). For estimating the solutions of the original system, we use energy estimates, and so, the nonlinear Klein–Gordon equation (4) will be solved in the \( L^2 \)-based space

$$\begin{aligned} {\mathcal {M}}^s_{\beta } = \{ u: {\mathbb {R}} \rightarrow {\mathbb {C}}: \Vert u \Vert _{{\mathcal {M}}^s_{\beta } } = \Vert {\widehat{u}}(k) \vartheta _{\beta }(k) (1+k^2)^{s/2} \Vert _{L^2(dk)} < \infty \}. \end{aligned}$$

As a consequence, for \( u \in {\mathcal {M}}^s_{\beta } \) with \( \beta > 0 \), the modes bounded away from odd integer multiples of the basic wave number \( {k}_0 = 1\) are exponentially small w.r.t. \( \varepsilon \), i.e., these modes are of order \( {\mathcal {O}}(e^{-r/\varepsilon }) \) for \( 0 < \varepsilon \ll 1 \) with an \( {\mathcal {O}}(1) \)-bound \( r > 0 \). Due to the \( L^2 \)-scaling properties, the DNLS approximation is of order \( {\mathcal {O}}(1) \) in the \( {\mathcal {M}}^s_{\beta } \)-spaces and not of the formal order \( {\mathcal {O}}(\varepsilon ^{1/2}) \). Therefore, we additionally define the spaces

$$\begin{aligned} {\mathcal {W}}^s_{\beta } = \{ u: {\mathbb {R}} \rightarrow {\mathbb {C}}: \Vert u \Vert _{{\mathcal {W}}^s_{\beta } } = \Vert {\widehat{u}}(k) \vartheta _{\beta }(k) (1+k^2)^{s/2} \Vert _{L^1(dk)} < \infty \} \end{aligned}$$

for which the DNLS approximation is of order \( {\mathcal {O}}(\varepsilon ^{1/2}) \).
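The different orders of the DNLS approximation in the two scales of spaces can be illustrated numerically. The following sketch is not part of the proof. It assumes a hypothetical Gaussian profile \( {\widehat{A}}(K) = e^{-K^2} \), the scaling \( {\widehat{u}}(k) = \varepsilon ^{-1/2} {\widehat{A}}((k-1)/\varepsilon ) \) of a wave packet \( \varepsilon ^{1/2} A(\varepsilon x) e^{ix} \) (up to the normalization of the Fourier transform), the choice \( \beta = \sigma /\varepsilon \), and a truncation of \( {\mathbb {Z}}_{odd} \) to \( |m| \le 9 \). The names `theta` and `norms` are ours.

```python
import numpy as np

def theta(k, beta, k0=1.0, m_max=9):
    """Weight exp(beta * distance of k to the nearest odd multiple of k0)."""
    odd = k0 * np.arange(-m_max, m_max + 1, 2)      # -9, -7, ..., 7, 9 (truncated Z_odd)
    dist = np.min(np.abs(np.asarray(k, dtype=float)[..., None] - odd), axis=-1)
    return np.exp(beta * dist)

def norms(eps, sigma=1.0, s=1.0, K_max=12.0, n=4001):
    """Return (W-norm, M-norm) of the model wave packet at wave number k = 1.

    Assumption: A_hat(K) = exp(-K**2) and u_hat(k) = eps**-0.5 * A_hat((k - 1)/eps),
    measured with the weight theta_beta for beta = sigma/eps.
    """
    K = np.linspace(-K_max, K_max, n)               # scaled wave number K = (k - 1)/eps
    k = 1.0 + eps * K
    dk = eps * (K[1] - K[0])
    u_hat = eps ** -0.5 * np.exp(-K ** 2)
    weighted = np.abs(u_hat) * theta(k, sigma / eps) * (1.0 + k ** 2) ** (s / 2)
    w_norm = np.sum(weighted) * dk                  # L^1(dk)-based norm
    m_norm = np.sqrt(np.sum(weighted ** 2) * dk)    # L^2(dk)-based norm
    return w_norm, m_norm

for eps in (0.04, 0.02, 0.01):
    w, m = norms(eps)
    print(f"eps={eps:5.2f}  W/eps^0.5={w / eps ** 0.5:.4f}  M={m:.4f}")
```

Under these assumptions, both printed columns should be nearly independent of \( \varepsilon \), which reflects that the \( {\mathcal {W}}^s_{\beta } \)-norm scales like \( \varepsilon ^{1/2} \) while the \( {\mathcal {M}}^s_{\beta } \)-norm stays of order \( {\mathcal {O}}(1) \).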

For the subsequent error estimates, we need that these spaces are closed under point-wise multiplication.

Lemma 6.1

For all \( \beta \ge 0 \) and \( s > 1/2 \), we have

$$\begin{aligned} \Vert u v \Vert _{{\mathcal {M}}^s_\beta } \le \Vert u \Vert _{{\mathcal {M}}^s_\beta } \Vert v \Vert _{{\mathcal {M}}^s_\beta }. \end{aligned}$$

Moreover, for all \( \beta , s \ge 0 \) we have

$$\begin{aligned} \Vert u v \Vert _{{\mathcal {M}}^s_\beta } \le \Vert u \Vert _{{\mathcal {W}}^s_\beta } \Vert v \Vert _{{\mathcal {M}}^s_\beta } \quad \text {and} \quad \Vert u v \Vert _{{\mathcal {W}}^s_\beta } \le \Vert u \Vert _{{\mathcal {W}}^s_\beta } \Vert v \Vert _{{\mathcal {W}}^s_\beta }. \end{aligned}$$
(15)

Proof

The estimates immediately follow from

$$\begin{aligned} \Vert u v \Vert _{{\mathcal {M}}^s_\beta }= & {} \Vert ({\widehat{u}}* {\widehat{v}}) \vartheta _\beta \Vert _{L^2_s} \\\le & {} \Vert {\widehat{u}} \vartheta _\beta \Vert _{L^1} \Vert {\widehat{v}} \vartheta _\beta \Vert _{L^2_s} + \Vert {\widehat{u}} \vartheta _\beta \Vert _{L^2_s} \Vert {\widehat{v}} \vartheta _\beta \Vert _{L^1} \\\le & {} \Vert u \Vert _{{\mathcal {W}}^0_\beta } \Vert v \Vert _{{\mathcal {M}}^s_\beta } + \Vert u \Vert _{{\mathcal {M}}^s_\beta } \Vert v \Vert _{{\mathcal {W}}^0_\beta } \end{aligned}$$

due to Young’s inequality for convolutions, Sobolev’s embedding

$$\begin{aligned} \Vert u \Vert _{{\mathcal {W}}^s_\beta } \le C \Vert u \Vert _{{\mathcal {M}}^{s+\delta }_\beta } \end{aligned}$$

for \( \delta > 1/2 \), and the inequality

$$\begin{aligned} \frac{1}{\vartheta _\beta (k- l)\vartheta _\beta (l)}= & {} \sup _{m\in {\mathbb {Z}}_{odd}}\left( e^{-\beta | k- l-m{k_0}|}\right) \sup _{m\in {\mathbb {Z}}_{odd}}\left( e^{-\beta | l-m{k_0}|}\right) \\\le & {} \sup _{m\in {\mathbb {Z}}_{odd}}\left( e^{- \beta | k-m{ k_0}|}\right) =\frac{1}{\vartheta _\beta ( k)}. \end{aligned}$$

\(\square \)

Remark 6.2

The inequalities of (15) will be used to estimate combinations of the approximation with the error subsequently. The DNLS approximation will be estimated in the space \( {\mathcal {W}}^s_\beta \) where it is of order \( {\mathcal {O}}(\varepsilon ^{1/2}) \). The error will be estimated in the space \( {\mathcal {M}}^s_\beta \).

The initial value problem for (4) with initial conditions of order \( {\mathcal {O}}(\varepsilon ^{1/2}) \) can be solved in these spaces on a time interval of length \( {\mathcal {O}}(1/\varepsilon ) \) by using the variation of constants formula, the fact that the nonlinearity is cubic, and the fact that these spaces are closed under multiplication. In order to bound the error not only on the short \( {\mathcal {O}}(1/\varepsilon ) \)-timescale but also on the natural \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale of the DNLS approximation, we use the spaces \( {\mathcal {M}}^s_{\beta } \), but now with time-dependent \( \beta \). Similar to above, we choose

$$\begin{aligned} \beta (t) = \sigma _0/\varepsilon - \eta \varepsilon t, \end{aligned}$$
(16)

with constants \( \sigma _0,\, \eta >0\), which can be chosen independently of \( 0 < \varepsilon \ll 1 \). Note that (16) is the scaled version of \(\sigma (T)=\sigma _0-\eta T\) defined in Theorem 5.1 and that \( T_1 = \sigma _0/\eta \). If A is initially in a space \( G^{s+1}_{\sigma _0} \), then according to Remark 1.1 the DNLS approximation is initially in a space \( {\mathcal {W}}^s_{\sigma _0/\varepsilon } \). This is the reason why \( \beta (t) \) starts with \( \sigma _0/\varepsilon \). The decay \( - \eta \varepsilon t \) allows us to consider t on the natural \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale of the DNLS approximation, since \( \beta (t) \ge 0 \) for all \( t \in [0,T_1/\varepsilon ^2] \). It turns out subsequently that choosing \( \eta = {\mathcal {O}}(1) \) is sufficient for our purposes.
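The relation between (16) and the slow-time radius \( \sigma (T) \) can be checked with a few lines of code. This is only an illustrative sketch; the values \( \sigma _0 = 1 \), \( \eta = 2 \), \( \varepsilon = 0.05 \) and the function names are hypothetical choices of ours.

```python
def beta(t, eps, sigma0=1.0, eta=2.0):
    """Time-dependent weight exponent (16): beta(t) = sigma0/eps - eta*eps*t."""
    return sigma0 / eps - eta * eps * t

def sigma(T, sigma0=1.0, eta=2.0):
    """Slow-time analyticity radius sigma(T) = sigma0 - eta*T."""
    return sigma0 - eta * T

eps, sigma0, eta = 0.05, 1.0, 2.0     # hypothetical sample values
T1 = sigma0 / eta                     # slow time at which sigma(T) reaches zero
for T in (0.0, 0.25 * T1, 0.5 * T1, T1):
    t = T / eps ** 2                  # fast time t belonging to the slow time T
    # eps * beta(t) and sigma(T) should agree up to rounding
    print(T, eps * beta(t, eps, sigma0, eta), sigma(T, sigma0, eta))
```

With \( T = \varepsilon ^2 t \), one has \( \varepsilon \beta (t) = \sigma (T) \), so the weight exponent remains non-negative exactly up to \( t = T_1/\varepsilon ^2 \).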

In the subsequent sections, we explain in detail how this approach overcomes all obstacles to reaching the long \( {\mathcal {O}}(1/\varepsilon ^2) \)-timescale that were identified in Sect. 4.

7 Separation of the modes

In order to obtain a bound for the error on the long \({\mathcal {O}}(1/\varepsilon ^2)\)-timescale, independently of the small perturbation parameter \( 0 < \varepsilon \ll 1 \), we have to get rid of the term \( \varepsilon L_c(R) \) in (10). Except at the resonant wave numbers, this term is oscillatory and can be removed by a near identity change of variables. In the last sections, we explained our strategy to get rid of the total resonance (TR) and of the second-order resonance (SOR).

Hence, for the error estimates we separate the modes into three parts. The first part, denoted by \( R_n \), has support near the odd integer multiples of the basic wave number \( k_0= 1 \), excluding neighborhoods of the basic wave numbers \( \pm k_0= \pm 1 \). It will be handled with normal form transformations and energy estimates. The second part, denoted by \( R_r \), has support bounded away from the odd integer multiples of the basic wave number \( k_0= 1 \) and will be exponentially damped at the linear level by our choice of time-dependent weights. The third part, denoted by \( R_c \), has support near the basic wave numbers \( \pm k_0= \pm 1 \) and will be handled with the ideas explained in Sects. 5 and 6 and with energy estimates. The index n stands for normal form, r for rest, and c for critical.

In detail, we define for a small \(\delta _r>0\), but independent of \( 0 < \varepsilon \ll 1 \), the mode filter

$$\begin{aligned} {\widehat{E}}_r(k) ={\left\{ \begin{array}{ll} 1, &{} \quad \text {for }{ \inf _{n \in {{\mathbb {Z}}}_{odd}} |k-n|> \delta _r},\\ 0, &{} \quad \text {else}, \end{array}\right. } \end{aligned}$$

the mode filter

$$\begin{aligned} {\widehat{E}}_c(k) ={\left\{ \begin{array}{ll} 1, &{} \quad \text {for }{ \inf _{n \in \{ -1,1\} } |k-n|\le \delta _r},\\ 0, &{} \quad \text {else}, \end{array}\right. } \end{aligned}$$

and finally the mode filter \({\widehat{E}}_n=1-{\widehat{E}}_r- {\widehat{E}}_c\).

Fig. 3 Support of the mode filters \( E_r \), \( E_n \) and \( E_c \) in Fourier space
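The three mode filters defined above can be written down directly as indicator functions. The following sketch is an illustration only; the value \( \delta _r = 0.1 \), the truncation of \( {\mathbb {Z}}_{odd} \) to \( |n| \le 9 \), and the helper names are assumptions of ours.

```python
import numpy as np

def dist_to(k, centers):
    """Distance from each wave number in k to the nearest element of centers."""
    k = np.asarray(k, dtype=float)
    return np.min(np.abs(k[..., None] - np.asarray(centers, dtype=float)), axis=-1)

def mode_filters(k, delta_r=0.1, m_max=9):
    """Return the Fourier symbols (E_r, E_n, E_c) evaluated at the wave numbers k."""
    odd = np.arange(-m_max, m_max + 1, 2)                  # odd integers -9, ..., 9
    E_r = (dist_to(k, odd) > delta_r).astype(float)        # away from all odd integers
    E_c = (dist_to(k, [-1, 1]) <= delta_r).astype(float)   # near k = -1 or k = 1
    E_n = 1.0 - E_r - E_c                                  # near the remaining odd integers
    return E_r, E_n, E_c

k = np.array([0.0, 0.95, 1.0, 2.0, 3.05, -1.02, -5.0])
E_r, E_n, E_c = mode_filters(k)
print(E_r)
print(E_n)
print(E_c)
```

By construction the three symbols take values in \( \{0,1\} \) and sum to one, so every wave number belongs to exactly one of the three parts.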

We use these projections to separate the error \( R=R_r+R_n + R_c\) into three parts, namely \(R_r=E_rR\), \(R_c=E_cR\) and \(R_n=E_nR\). These new variables satisfy

$$\begin{aligned} \partial _t R_r= & {} \Lambda R_r + \varepsilon E_r L_{c}(R) + \varepsilon ^2 E_r G, \end{aligned}$$
(17)
$$\begin{aligned} \partial _t R_n= & {} \Lambda R_n + \varepsilon E_n L_{c}(R) + \varepsilon ^2 E_n G, \end{aligned}$$
(18)
$$\begin{aligned} \partial _t R_c= & {} \Lambda R_c + \varepsilon E_c L_{c}(R) + \varepsilon ^2 E_c G, \end{aligned}$$
(19)

where

$$\begin{aligned} \varepsilon ^2 G = \varepsilon ^2 L_{s}(R) + \varepsilon ^{{\widetilde{\beta }}+1/2} L_r(R) + \varepsilon ^{-{\widetilde{\beta }}} \text {Res}(\varepsilon ^{1/2} \psi ). \end{aligned}$$

8 The normal form transform

As already said, in order to come to the long \({\mathcal {O}}(1/\varepsilon ^2)\)-timescale, we have to get rid of the terms

$$\begin{aligned} \varepsilon \widehat{E_j L_{c}(R)}(k,t)= & {} \varepsilon \frac{3}{2} i \rho (k) {\widehat{E}}_j(k) \left( \begin{array}{c} -({\widehat{a}}_1+ {\widehat{a}}_{-1})^{*2}*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \\ ({\widehat{a}}_1+ {\widehat{a}}_{-1})^{*2}*({\widehat{R}}_1+ {\widehat{R}}_{-1}) \end{array}\right) (k,t), \end{aligned}$$

for \( j = n,r,c \) in (17)–(19). In a first step, we simplify the \( \varepsilon E_j L_{c}(R) \) for \( j = n,r,c \) by eliminating all non-resonant terms by normal form transformations. The \( \varepsilon E_j L_{c}(R) \) for \( j = n,r,c \) are sums of trilinear mappings w.r.t. \( a_{j_1} \), \( a_{j_2} \) and \( R_{j_3} \), with \( j_1,j_2,j_3 \in \{ -1,1\} \).

We recall from Remark 4.1 that to eliminate a trilinear term \( \varepsilon B(a_{j_1},a_{j_2},R_{j_3}) \) of the form

$$\begin{aligned} \widehat{B(a_{j_1},a_{j_2},R_{j_3})}= & {} \int \int b(k,k-k_1,k_1-k_2,k_2) {\widehat{a}}_{j_1}(k-k_1){\widehat{a}}_{j_2}(k_1-k_2) {\widehat{R}}_{j_3}(k_2) dk_2 dk_1 \end{aligned}$$

from the equation of \( R_j \) by a near identity transformation \( w_j = R_j + \varepsilon M(a_{j_1},a_{j_2},R_{j_3}) \) we have to choose

$$\begin{aligned} \widehat{M(a_{j_1},a_{j_2},R_{j_3})}= & {} \int \int m(k,k-k_1,k_1-k_2,k_2) {\widehat{a}}_{j_1}(k-k_1){\widehat{a}}_{j_2}(k_1-k_2) {\widehat{R}}_{j_3}(k_2) dk_2 dk_1 \end{aligned}$$

with

$$\begin{aligned} m(k,k-k_1,k_1-k_2,k_2) = \frac{b(k,k-k_1,k_1-k_2,k_2)}{-j \omega (k) - \omega (j_1)-\omega (j_2) + j_3 \omega (k-j_1-j_2)}. \end{aligned}$$

By the analysis of the denominator, made in Sect. 4, we can eliminate all terms except for the total resonant terms and second-order resonant terms.

After the transformation, we obtain a system

$$\begin{aligned} \partial _t w_r= & {} \Lambda w_r + \varepsilon E_r L_{c}(w_r) + \varepsilon ^2 H_r, \end{aligned}$$
(20)
$$\begin{aligned} \partial _t w_n= & {} \Lambda w_n + \varepsilon B_1(a_1,a_{-1},w_n) + \varepsilon ^2 H_n, \end{aligned}$$
(21)
$$\begin{aligned} \partial _t w_c= & {} \Lambda w_c + \varepsilon B_2(a_1,a_{-1},w_c) \nonumber \\{} & {} + \varepsilon B_3(a_{-1},a_{-1},w_c) + \varepsilon B_4(a_1,a_{1},w_c) + \varepsilon ^2 H_c, \end{aligned}$$
(22)

where the \( B_j \) are smooth trilinear mappings in their arguments and \( \varepsilon ^2 H_{r,n,c} = {\mathcal {O}}(\varepsilon ^2)\) with properties specified below.

\( \bullet \) The totally resonant term \( B_1(a_1,a_{-1},w_n) \) in (21) and the totally resonant term \( B_2(a_1,a_{-1},w_c) \) in (22) will be controlled by energy estimates in the following.

\( \bullet \) For the second-order resonant terms \( B_3(a_{-1},a_{-1},w_c) \) and \( B_4(a_1,a_{1},w_c) \) in (22), the denominator in the above normal form transformation would vanish quadratically for \( k = \pm 1 \) as we have seen in Sect. 4. Since the numerator only vanishes linearly at these wave numbers, the normal form transform would be unbounded. Therefore, the second-order resonant terms will be handled with the ideas presented in Sects. 5 and 6.

\( \bullet \) The term \( \varepsilon E_r L_{c}(w_r) \) contains totally resonant terms, too, i.e., it is of the form \( B_0(a_1,a_{-1},w_r) + {\mathcal {O}}(\varepsilon ^2) \). In order to explain subsequently a few possible improvements to our approach, we handle the totally resonant terms in \( \varepsilon E_r L_{c}(w_r) \) differently than the other totally resonant terms. The reason for this is, as already said, that the term \( \varepsilon E_r L_{c}(w_r) \) will be exponentially small w.r.t \( \varepsilon \) initially, and that it will take an \({\mathcal {O}}(1/\varepsilon ^2)\)-timescale to grow to an order \({\mathcal {O}}(\varepsilon )\) in any case. This observation allows us to reduce the number of necessary non-resonance conditions for general dispersive systems subsequently.

The properties of the normal form transformation are summarized in the following lemma.

Lemma 8.1

Let \( s > 1/2 \) and \( \sigma _0 \ge 0 \). The transformation

$$\begin{aligned} {\mathcal {T}}^\varepsilon : \left\{ \begin{array}{c} ({\mathcal {M}}^s_{\sigma /\varepsilon })^3 \rightarrow ({\mathcal {M}}^s_{\sigma /\varepsilon })^3,\\ (R_n,R_r,R_c) \mapsto (w_n,w_r,w_c), \end{array} \right. \end{aligned}$$

is a small perturbation of the identity. For all \( \sigma \in [0,\sigma _0 ] \), the mapping is analytic. For all \( C_1 > 0 \), there exists an \( \varepsilon _0 > 0 \) such that for all \( \varepsilon \in (0,\varepsilon _0) \) and all \( \sigma \in [0,\sigma _0 ] \) the following holds. For all \( (w_n,w_r,w_c) \) with \( \Vert (w_n,w_r,w_c) \Vert _{{\mathcal {M}}^s_{\sigma /\varepsilon } } \le C_1 \), there exists an analytic inverse. All bounds are independent of \( \varepsilon \in (0,\varepsilon _0) \) and \( \sigma \in [0,\sigma _0 ] \).

Proof

The estimate (15), the fact that \( |r_{jj_1j_2j_3}(k)| \) is uniformly bounded away from zero for the terms considered in the normal form transformation, and Neumann’s series imply the statements. \(\square \)

With this lemma, we immediately have

Corollary 8.2

Let \( s > 1/2 \) and

$$\begin{aligned} {\check{M}} = \Vert w \Vert _{{\mathcal {M}}_{\beta (t)}^s }:= \Vert w_{r} \Vert _{{\mathcal {M}}_{\beta (t)}^s }+\Vert w_{n} \Vert _{{\mathcal {M}}_{\beta (t)}^s } + \Vert w_{c} \Vert _{{\mathcal {M}}_{\beta (t)}^s }, \end{aligned}$$

with \( \beta (t) \) defined in (16). There exist constants \( C_1 \), \( C_3>0 \) independent of \( {\check{M}} \) and \( \varepsilon \in (0,\varepsilon _0] \), with \( \varepsilon _0 > 0 \) from Lemma 8.1, and a monotonically increasing function \( C_2( {\check{M}} )>0 \), independent of \( \varepsilon \in (0,\varepsilon _0] \), such that

$$\begin{aligned} \Vert \varepsilon ^2 H_j \Vert _{{\mathcal {M}}_{\beta (t)}^s }\le & {} C_1 \varepsilon ^2 \Vert w \Vert _{{\mathcal {M}}_{\beta (t)}^s } + C_2( {\check{M}} ) \varepsilon ^{{\widetilde{\beta }}+1/2} \Vert w \Vert _{{\mathcal {M}}_{\beta (t)}^s }^2 + C_3 \varepsilon ^2, \end{aligned}$$

for \( j = r,n,c \).

9 The final error estimates

In order to estimate the solutions of Eqs.  (20)–(22) for the error, we use the modulational Gevrey spaces introduced in Sect. 6. Introducing the new weighted variables

$$\begin{aligned} {\widehat{W}}_j (k) = {\widehat{w}}_j (k) \vartheta _{\beta }(k) \end{aligned}$$

for \( j = r,n,c \) allows us to work in classical Sobolev spaces. We find

$$\begin{aligned} \partial _t W_r= & {} \Lambda W_r + \Gamma W_r+ \varepsilon E_r {\widetilde{L}}_{c}(W_r) + \varepsilon ^2 {\widetilde{H}}_r, \end{aligned}$$
(23)
$$\begin{aligned} \partial _t W_n= & {} \Lambda W_n + \Gamma W_n+ \varepsilon {\widetilde{B}}_1(a_1,a_{-1},W_n) + \varepsilon ^2 {\widetilde{H}}_n, \end{aligned}$$
(24)
$$\begin{aligned} \partial _t W_c= & {} \Lambda W_c + \Gamma W_c+ \varepsilon {\widetilde{B}}_2(a_1,a_{-1},W_c) \nonumber \\{} & {} + \varepsilon {\widetilde{B}}_3(a_{-1},a_{-1},W_c) + \varepsilon {\widetilde{B}}_4(a_1,a_{1},W_c) + \varepsilon ^2 {\widetilde{H}}_c, \end{aligned}$$
(25)

where the operator \( \Gamma \) is defined in Fourier space by

$$\begin{aligned} \widehat{\Gamma W}(k) = - \eta \varepsilon (\inf _{m \in {{\mathbb {Z}}}_{odd}} |k-mk_0|) {\widehat{W}}(k). \end{aligned}$$
(26)

The trilinear mappings \( B_j \) from (20)–(22) transform into the \( {\widetilde{B}}_j \) which are again smooth trilinear mappings in their arguments. They are estimated below in detail. The remaining terms \( H_j \) from (20)–(22) transform into the \( {\widetilde{H}}_j \) whose properties are specified in the subsequent lemma.
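The damping operator \( \Gamma \) in (26) arises from the time derivative of the weight: since \( \partial _t \beta (t) = - \eta \varepsilon \), we have \( \partial _t \vartheta _{\beta (t)}(k) = - \eta \varepsilon \inf _{m \in {\mathbb {Z}}_{odd}} |k-mk_0| \, \vartheta _{\beta (t)}(k) \). The following sketch checks this identity by a central finite difference. The sample values of \( \varepsilon \), \( \eta \), \( \sigma _0 \), k and t, as well as the function names, are hypothetical.

```python
import math

def dist_odd(k, k0=1.0):
    """Distance from k to the nearest odd integer multiple of k0."""
    r = (k / k0 - 1.0) % 2.0          # position inside one period of length 2
    return k0 * min(r, 2.0 - r)

def theta(k, t, eps, sigma0=1.0, eta=2.0):
    """Weight exp(beta(t) * dist) with beta(t) = sigma0/eps - eta*eps*t, cf. (16)."""
    return math.exp((sigma0 / eps - eta * eps * t) * dist_odd(k))

def gamma_symbol(k, eps, eta=2.0):
    """Fourier symbol of Gamma as given in (26)."""
    return -eta * eps * dist_odd(k)

eps, k, t, h = 0.1, 1.3, 5.0, 1e-4    # hypothetical sample values
# central difference approximation of the time derivative of the weight
dtheta = (theta(k, t + h, eps) - theta(k, t - h, eps)) / (2 * h)
# logarithmic derivative of the weight versus the symbol of Gamma
print(dtheta / theta(k, t, eps), gamma_symbol(k, eps))
```

The two printed numbers should agree up to the discretization error, which is the reason why the weighted variables \( W_j \) satisfy equations with the additional linear term \( \Gamma W_j \).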

Lemma 9.1

Let \( s > 1/2 \) and

$$\begin{aligned} M = \Vert W \Vert _{H^s }:= (\Vert W_{r} \Vert _{H^s }^2+\Vert W_{n} \Vert _{H^s }^2 + \Vert W_{c} \Vert _{H^s }^2)^{1/2}. \end{aligned}$$

There are constants \( C_1 \), \( C_3>0 \) independent of M and \( \varepsilon \in (0,\varepsilon _0] \), with \( \varepsilon _0 > 0 \) from Lemma 8.1, and a monotonically increasing function \( C_2(M)>0 \), independent of \( \varepsilon \in (0,\varepsilon _0] \), such that

$$\begin{aligned} \Vert \varepsilon ^2 {\widetilde{H}}_j \Vert _{H^s }\le & {} C_1 \varepsilon ^2 \Vert W \Vert _{H^s } + C_2(M) \varepsilon ^{{\widetilde{\beta }}+1/2} \Vert W \Vert _{H^s }^2 + C_3 \varepsilon ^2, \end{aligned}$$

for \( j = r,n,c \).

Proof

The lemma is mainly a reformulation of Corollary 8.2. \(\square \)

In order to estimate the solutions \( W_j \) of Eqs.  (23)–(25), we use energy estimates, i.e., we multiply the equation for \( W_j \) with \( W_j \) for \( j =r,n,c \) and take the \( H^s \)-scalar product \( (\cdot ,\cdot )_s \). We find

$$\begin{aligned} \partial _t (W_r,W_r)_s= & {} 2 \text {Re}(s_1 + s_2 + s_3 + s_4), \\ \partial _t (W_n,W_n)_s= & {} 2 \text {Re}(s_5 + s_6 + s_7 + s_8), \\ \partial _t (W_c,W_c)_s= & {} 2 \text {Re}(s_9 +s_{10} +s_{11}+s_{12} +s_{13}+s_{14}), \end{aligned}$$

with

$$\begin{aligned} \begin{array}{ll} s_1 = (W_r, \Lambda W_r )_s, &{} s_2 = (W_r, \Gamma W_r)_s, \\ s_3 = (W_r, \varepsilon E_r {\widetilde{L}}_{c}(W_r) )_s, &{} s_4 = (W_r, \varepsilon ^2 {\widetilde{H}}_r)_s, \\ \end{array} \end{aligned}$$

with

$$\begin{aligned} \begin{array}{ll} s_5 = (W_n, \Lambda W_n )_s, &{} s_6 = (W_n, \Gamma W_n)_s, \\ s_7 = (W_n, \varepsilon {\widetilde{B}}_1(a_1,a_{-1},W_n) )_s, &{} s_8 = (W_n, \varepsilon ^2 {\widetilde{H}}_n)_s, \\ \end{array} \end{aligned}$$

and

$$\begin{aligned} \begin{array}{ll} s_9 = (W_c, \Lambda W_c )_s, &{} s_{10} = (W_c, \Gamma W_c)_s, \\ s_{11} = (W_c, \varepsilon {\widetilde{B}}_2(a_1,a_{-1},W_c) )_s, &{} s_{12} = (W_c, \varepsilon {\widetilde{B}}_3(a_{-1},a_{-1},W_c) )_s, \\ s_{13} = (W_c, \varepsilon {\widetilde{B}}_4(a_1,a_{1},W_c) )_s, &{} s_{14} = (W_c, \varepsilon ^2 {\widetilde{H}}_c)_s. \end{array} \end{aligned}$$

Among the terms \( s_1,\ldots ,s_{14} \), there are terms which vanish identically and terms which do not cause any difficulties since they have an \( \varepsilon ^2 \) in front. The dangerous terms are the ones which only have an \( \varepsilon \) in front, namely \( s_3,s_7,s_{11},s_{12},s_{13} \). They will be estimated either through integration by parts, as for the totally resonant terms \( s_7 \) and \( s_{11} \), or with the help of the damping terms \( s_2, s_6, s_{10} \), as for \( s_3 \) and the second-order resonant terms \(s_{12},s_{13} \).

We start with the terms which vanish identically, namely

\( \textbf{s}_{\textbf{1}}, \textbf{s}_{\textbf{5}}, \textbf{s}_{\textbf{9}}\): Due to the skew symmetry of \( \Lambda \), we immediately have

$$\begin{aligned} s_1 = s_5 = s_9 = 0. \end{aligned}$$

For the terms which have an \( \varepsilon ^2 \) in front, we proceed as follows and find:

\( \textbf{s}_{\textbf{4}}, \textbf{s}_{\textbf{8}}, \textbf{s}_{\textbf{14}}\): Using Lemma 9.1, the Cauchy–Schwarz inequality and \( a \le 1+ a^2 \) in the last term in \( {\widetilde{H}}_r \), \( {\widetilde{H}}_n \) and \( {\widetilde{H}}_c \), respectively, yields

$$\begin{aligned} |s_j|\le & {} C_1 \varepsilon ^2 \Vert W \Vert _{H^s}^2 + C_2(M) \varepsilon ^{{\widetilde{\beta }}+1/2} \Vert W \Vert _{H^s }^3 + C_3 \varepsilon ^2 (1+ \Vert W \Vert _{H^s}^2), \end{aligned}$$

for \( j = 4,8,14 \) using the notation from Lemma 9.1.

Next we go on with the rest of the \( W_r \)-equation.

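\( \textbf{s}_{\textbf{2}}\): is the good term in the \( W_r \)-equation. There is an \( \alpha (\eta ) > 0 \) independent of \( 0 < \varepsilon \ll 1 \) such that

$$\begin{aligned} 2 \text {Re} ( s_2) = 2 \text {Re} (W_r, \Gamma W_r)_s \le - \alpha (\eta ) \varepsilon (W_r, W_r)_s. \end{aligned}$$

Indeed, as the support of \( {\widehat{W}}_r \) is bounded away from the odd integer multiples of \( k_0 \) by \( \delta _r \), one can take \( \alpha (\eta ) = 2 \eta \delta _r \) by (26). In particular, we have \( \alpha (\eta ) \rightarrow \infty \) for \( \eta \rightarrow \infty \).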

\( \textbf{s}_{\textbf{3}}\): is the dangerous term in the \( W_r \)-equation which, however, can be estimated by the \( s_2 \)-term. For the \( s_3 \)-term, we have

$$\begin{aligned} |s_3| =|(W_r, \varepsilon E_r {\widetilde{L}}_{c}(W_r) )_s| \le C_{s_3} \varepsilon (W_r, W_r)_s \end{aligned}$$

for a constant \( C_{s_3} = C_{s_3}(\psi ) \) independent of \( 0 < \varepsilon \ll 1 \).

Next we go on with the rest of the \( W_n \)-equation.

\( \textbf{s}_{\textbf{6}}\): is the good term in the \( W_n \)-equation. However, for our purposes it is sufficient to have \( 2 \text {Re}(s_6) \le 0 \).

\( \textbf{s}_{\textbf{7}}\): comes from the totally resonant terms. In Fourier space, we have to control terms of the form

$$\begin{aligned}{} & {} \varepsilon \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \overline{{\widehat{W}}_n(k)} i \vartheta (k) \rho (k) \big ( {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) \vartheta ^{-1}(l) {\widehat{W}}_{n}(l) \big ) {d}m {d}l {d}k\\{} & {} \quad + \varepsilon \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} {\widehat{W}}_n(k) \overline{ i \vartheta (k) \rho (k) \big ( {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) \vartheta ^{-1}(l) {\widehat{W}}_{n}(l) \big )} {d}m {d}l {d}k\\{} & {} \quad = \varepsilon \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \overline{{\widehat{W}}_n(k)} i \vartheta (k) \rho (k) \big ( {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) \vartheta ^{-1}(l) {\widehat{W}}_{n}(l) \big ) {d}m {d}l {d}k\\{} & {} \quad + \varepsilon \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} {\widehat{W}}_n(l) \overline{ i \vartheta (l) \rho (l) \big ( {\widehat{a}}_{j_1}(l-m){\widehat{a}}_{-j_1}(m-k) \vartheta ^{-1}(k) {\widehat{W}}_{n}(k) \big )} {d}m {d}k {d}l\\{} & {} \quad = \varepsilon \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \overline{{\widehat{W}}_n(k)} \big (Q(k,k-l,l) {\widehat{W}}_{n}(l) \big ) {d}k {d}l, \end{aligned}$$

with

$$\begin{aligned} Q(k,k-l,l)= & {} \mathop {\int }\limits _{{\mathbb {R}}} i \vartheta (k) \rho (k) {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) \vartheta ^{-1}(l) \\ {}{} & {} \qquad + \overline{ i \vartheta (l) \rho (l) {\widehat{a}}_{j_1}(l-m){\widehat{a}}_{-j_1}(m-k) \vartheta ^{-1}(k) } dm \\= & {} \mathop {\int }\limits _{{\mathbb {R}}} i \vartheta (k) \rho (k) {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) \vartheta ^{-1}(l) \\ {}{} & {} \qquad + \overline{ i \vartheta (l) \rho (l) } {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) \overline{\vartheta ^{-1}(k) } {d}m \\= & {} \varrho _1(k,l) \mathop {\int }\limits _{{\mathbb {R}}}{\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) {d}m, \end{aligned}$$

with

$$\begin{aligned} \varrho _1(k,l) = ( i \vartheta (k) \rho (k) \vartheta ^{-1}(l) - i \vartheta (l) \rho (l) \vartheta ^{-1}(k) ), \end{aligned}$$

and where we used that the product \( a_{j_1}a_{-j_1} \) is real. We have that \((k,l) \mapsto \varrho _1(k,l) \) is smooth and satisfies \( \varrho _1(k,k) = 0 \) such that finally \( | \varrho _1(k,l) | \le C | k -l | \). Since \( \mathop {\int }\limits _{{\mathbb {R}}}{\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) dm \) is strongly concentrated at \( k-l= 0 \), we gain another power of \( \varepsilon \). In detail, we estimate

$$\begin{aligned}{} & {} \varepsilon |\mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} \overline{{\widehat{W}}_n(k)} \big (Q(k,k-l,l) {\widehat{W}}_{n}(l) \big ) dk dl|\\{} & {} \quad \le \varepsilon \mathop {\int }\limits _{{\mathbb {R}}} \mathop {\int }\limits _{{\mathbb {R}}} | \overline{{\widehat{W}}_n(k)} | | {\mathcal {O}}(k-l) \mathop {\int }\limits _{{\mathbb {R}}}{\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) dm | | {\widehat{W}}_{n}(l) | {d}k {d}l\\{} & {} \quad \le \varepsilon \Vert \overline{{\widehat{W}}_n} \Vert _{L^2} \Vert {\mathcal {O}}(k-l) \mathop {\int }\limits _{{\mathbb {R}}}{\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) dm \Vert _{L^1(d(k-l))} \Vert {\widehat{W}}_{n}\Vert _{L^2} \end{aligned}$$

due to the Cauchy–Schwarz inequality and Young’s inequality for convolutions, such that finally

$$\begin{aligned} s_7 = {\mathcal {O}}(\varepsilon ^2). \end{aligned}$$
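The gain of one power of \( \varepsilon \) in the last step comes from the concentration of the kernel at \( k - l = 0 \). As a minimal sketch, assume that the convolution \( \int {\widehat{a}}_{j_1}(k-m){\widehat{a}}_{-j_1}(m-l) dm \) behaves like a model kernel \( g_\varepsilon (\kappa ) = \varepsilon ^{-1} e^{-(\kappa /\varepsilon )^2} \) in \( \kappa = k - l \). The Gaussian shape and the function name are hypothetical; only the scaling matters.

```python
import numpy as np

def first_moment_and_mass(eps, K_max=10.0, n=20001):
    """Return (integral of |kappa| * g_eps, integral of g_eps) for the model kernel
    g_eps(kappa) = exp(-(kappa/eps)**2) / eps, computed by a Riemann sum."""
    K = np.linspace(-K_max, K_max, n)       # scaled variable K = kappa/eps
    kappa = eps * K
    dkappa = eps * (K[1] - K[0])
    g = np.exp(-K ** 2) / eps
    moment = np.sum(np.abs(kappa) * g) * dkappa
    mass = np.sum(g) * dkappa
    return moment, mass

for eps in (0.1, 0.01, 0.001):
    moment, mass = first_moment_and_mass(eps)
    print(f"eps={eps:6.3f}  mass={mass:.6f}  moment/eps={moment / eps:.6f}")
```

For this model kernel the \( L^1 \)-norm is independent of \( \varepsilon \), whereas the \( L^1 \)-norm of \( |\kappa | g_\varepsilon (\kappa ) \) is proportional to \( \varepsilon \). This is the mechanism by which the factor \( {\mathcal {O}}(k-l) \) turns the prefactor \( \varepsilon \) into \( \varepsilon ^2 \).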

Finally we come to the remaining terms of the \( W_c \)-equation.

\( \textbf{s}_{\textbf{11}}\): The totally resonant terms \( \mathbf {s_{11}} \) are handled line for line as the totally resonant terms \( \mathbf {s_{7}} \), and so, we also have

$$\begin{aligned} s_{11} = {\mathcal {O}}(\varepsilon ^2). \end{aligned}$$

\( \textbf{s}_{\textbf{10}}\): is the good term which allows us to handle the second-order resonant terms. We have

$$\begin{aligned} 2 \text {Re} (s_{10}) = 2 \text {Re} (W_c, \Gamma W_c)_s \le - \eta \varepsilon (\Gamma ^{1/2}W_c, \Gamma ^{1/2} W_c)_s. \end{aligned}$$

\( \textbf{s}_{\textbf{12}}, \textbf{s}_{\textbf{13}}\): In the following, \( W_{c,1} \) denotes the part of \( W_c \) located at \( k = 1 \) and \( W_{c,-1} \) the part of \( W_c \) located at \( k = -1 \). Then the second-order resonant terms are written as

$$\begin{aligned} {\widetilde{B}}_{3,4}(a_{-1},a_{-1},W_{c,1}) = \rho \widetilde{{\widetilde{B}}}_{3,4}(a_{-1},a_{-1},W_{c,1}) \end{aligned}$$

and estimated by

$$\begin{aligned} |s_{12}|+|s_{13}|\le & {} C |(W_{c,-1}, \varepsilon {\widetilde{B}}_3(a_{-1},a_{-1},W_{c,1}) )_s|+ C | (W_{c,1}, \varepsilon {\widetilde{B}}_4(a_1,a_{1},W_{c,-1}) )_s | \\ {}\le & {} C \varepsilon \Vert \rho ^{1/2}W_{c,-1} \Vert _{H^s} \Vert \rho ^{1/2} \widetilde{{\widetilde{B}}}_3(a_{-1},a_{-1},W_{c,1}) \Vert _{H^s} \\ {}{} & {} + C \varepsilon \Vert \rho ^{1/2}W_{c,1} \Vert _{H^s} \Vert \rho ^{1/2} \widetilde{{\widetilde{B}}}_4(a_1,a_{1},W_{c,-1}) \Vert _{H^s}. \end{aligned}$$

The term \( \Vert \rho ^{1/2} \widetilde{{\widetilde{B}}}_4(a_1,a_{1},W_{c,-1}) \Vert _{L^2}^2 \) is in Fourier space of the form

$$\begin{aligned} \int \int \int | \rho ^{1/2}(k) {\widehat{a}}_1(k-l) {\widehat{a}}_1(l-m) {\widehat{W}}_{c,-1}(m) |^2 {d}m {d}l {d}k \le s_{15} + s_{16}, \end{aligned}$$

with

$$\begin{aligned} s_{15}= & {} \int \int \int | (\rho (k) - \rho (m+2) )|| {\widehat{a}}_1(k-l) {\widehat{a}}_1(l-m) {\widehat{W}}_{c,-1}(m) |^2 {d}m {d}l {d}k, \\ s_{16}= & {} \int \int \int | {\widehat{a}}_1(k-l) {\widehat{a}}_1(l-m) |^2 |\rho (m+2)| | {\widehat{W}}_{c,-1}(m) |^2 {d}m {d}l {d}k. \end{aligned}$$

We use that

$$\begin{aligned} \rho (k) - \rho (m+2) = \rho (m+2+k-m-2) - \rho (m+2) = {\mathcal {O}}(|k-m-2|), \end{aligned}$$

that \( \int {\widehat{a}}_1(k-l) {\widehat{a}}_1(l-m) dl \) is strongly concentrated at \( k-m \approx 2 \) and Young’s inequality to obtain a bound

$$\begin{aligned} s_{15} \le C \varepsilon \Vert W_{c,-1} \Vert ^2. \end{aligned}$$

Less complicated is the bound

$$\begin{aligned} s_{16} \le C \Vert \Gamma ^{1/2} W_{c,-1} \Vert ^2 \end{aligned}$$

since \( \rho (m+2) = {\mathcal {O}}(|m+1|) \). Thus, we finally obtain

$$\begin{aligned} |s_{12}|+|s_{13}|\le & {} C \varepsilon (\Gamma ^{1/2}W_c, \Gamma ^{1/2} W_{c})_s + C \varepsilon ^{3/2} \Vert \Gamma ^{1/2} W_{c} \Vert _{H^s} \Vert W_c \Vert _{H^s} \\\le & {} 2 C \varepsilon (\Gamma ^{1/2}W_c, \Gamma ^{1/2} W_{c})_s + C \varepsilon ^2 \Vert W_c \Vert _{H^s}^2 \\\le & {} C_{\psi } \varepsilon (\Gamma ^{1/2}W_c, \Gamma ^{1/2} W_{c})_s + C_1 \varepsilon ^2 \Vert W_c \Vert _{H^s}^2 \end{aligned}$$

where we used \( \varepsilon ^{3/2} ab = (\varepsilon ^{1/2} a)(\varepsilon b) \le \varepsilon a^2 + \varepsilon ^{2} b^2 \). This defines the constant \( C_{\psi } \) and we may increase the original constant \( C_1 \).

Summary: Collecting all estimates gives for

$$\begin{aligned} E_s = (W_r,W_r)_s + (W_n,W_n)_s + (W_c,W_c)_s \end{aligned}$$

that

$$\begin{aligned} \partial _t E_s\le & {} 2 \text {Re}(s_2) + 2 |s_3| +2 |s_4| + 2 \text {Re}(s_7) + 2 |s_8| \\ {}{} & {} +2 \text {Re}(s_{10}) +2 \text {Re}(s_{11})+2|s_{12}| +2|s_{13}|+2 |s_{14}| \\\le & {} - \alpha (\eta ) \varepsilon (W_r, W_r)_s + C_{s_3} \varepsilon (W_r, W_r)_s \\ {}{} & {} - \eta \varepsilon (\Gamma ^{1/2}W_c, \Gamma ^{1/2} W_c)_s + C_\psi \varepsilon (\Gamma ^{1/2}W_c, \Gamma ^{1/2} W_c)_s \\ {}{} & {} + 2 C_1 \varepsilon ^2 E_s + C_2(M) \varepsilon ^{{\widetilde{\beta }}+1/2} E_s^{3/2} + C_3 \varepsilon ^2 (1+ E_s). \end{aligned}$$

The third and fourth lines can be made non-positive by choosing \( \eta \) sufficiently large, but independent of the small perturbation parameter \( 0 < \varepsilon \ll 1 \), such that we finally have

$$\begin{aligned} \partial _t E_s \le 2 C_1 \varepsilon ^2 E_s + C_2(M) \varepsilon ^{{\widetilde{\beta }}+1/2} E_s^{3/2} + C_3 \varepsilon ^2 (1+ E_s). \end{aligned}$$

Choosing \( C_2(M) \varepsilon ^{{\widetilde{\beta }}-3/2} E_s^{1/2}\le 1 \) yields

$$\begin{aligned} \partial _t E_s \le (2 C_1+C_3 + 1) \varepsilon ^2 E_s + C_3 \varepsilon ^2. \end{aligned}$$
(27)

Applying Gronwall’s inequality yields for all \( t \in [0,T_1/\varepsilon ^2] \) that

$$\begin{aligned} E_s(t) \le C_3 T_1 e^{ (2 C_1+C_3 + 1) T_1}=: M^2 \end{aligned}$$

independent of \(\varepsilon \in (0,\varepsilon _0)\), where \(\varepsilon _0>0\) has to be chosen so small that \( C_2(M) \varepsilon ^{{\widetilde{\beta }}-3/2} M \le 1 \). Therefore, we are done. \(\square \)
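The Gronwall step can be illustrated by integrating the extremal case of (27) numerically. The following sketch assumes that the error vanishes initially, \( E_s(0) = 0 \), and uses hypothetical constants \( c = 2C_1 + C_3 + 1 = 3 \), \( C_3 = 1 \) and \( T_1 = 1/2 \); it is not part of the proof.

```python
import math

def integrate_energy(eps, c, C3, T1, steps=20000):
    """Explicit Euler scheme for E' = eps^2 * (c*E + C3), E(0) = 0,
    on the long time interval 0 <= t <= T1/eps^2."""
    dt = (T1 / eps ** 2) / steps
    E = 0.0
    for _ in range(steps):
        E += dt * eps ** 2 * (c * E + C3)
    return E

c, C3, T1 = 3.0, 1.0, 0.5                  # hypothetical constants, c = 2*C1 + C3 + 1
bound = C3 * T1 * math.exp(c * T1)         # right-hand side of the Gronwall bound
exact = C3 / c * (math.exp(c * T1) - 1.0)  # exact solution at t = T1/eps^2
for eps in (0.1, 0.01):
    print(eps, integrate_energy(eps, c, C3, T1), exact, bound)
```

In this model the value at the end of the interval does not depend on \( \varepsilon \) and stays below \( C_3 T_1 e^{c T_1} \), in accordance with the elementary inequality \( (e^{x}-1)/x \le e^{x} \) for \( x > 0 \).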

10 Discussion

In this section, we make a number of remarks about possible improvements and generalizations.

Remark 10.1

For an arbitrary, but fixed, \( {\widetilde{\beta }} > 3/2 \), the improved approximation can be constructed in such a way that the residual term \( \varepsilon ^{-{\widetilde{\beta }}} \text {Res}(\varepsilon ^{1/2} \psi ) \) is of order \( {\mathcal {O}}(\varepsilon ^2) \), cf. “Appendix A.” Hence, the error made by this higher-order approximation is of order \( {\mathcal {O}}(\varepsilon ^{{\widetilde{\beta }}}) \) in some Sobolev norm.

Remark 10.2

It is obvious by the proof that the assumption on the solutions of the DNLS equation (5), namely \( A\in C([0,T_0],G_{\sigma _0}^{s_A}) \), can be replaced by the weaker assumption that we take a solution constructed in Theorem 5.1 with \( s = s_A \), \( \sigma _A = \sigma _0 \) and \(A(T)\in G^s_{\sigma (T)}\) with \(\sigma (T)=\sigma _0-\eta T\).

Remark 10.3

From (17) to (20), we have eliminated all terms of order \( {\mathcal {O}}(\varepsilon ) \) except for the totally resonant ones. This is not necessary since finally \( \varepsilon E_r {\widetilde{L}}_{c}(W_r) \) appears in an equation where \( W_r \) is exponentially damped. Hence, other terms can be kept and other resonances can be handled as long as they are bounded away from odd integer multiples of the basic wave number \( k_0 = 1 \). This follows line for line as in [5]. Obviously our approach breaks down if a resonance falls on an integer multiple of the basic wave number \( k_0 \).

Remark 10.4

We have demonstrated that the DNLS approximation makes correct predictions about the dynamics of our chosen nonlinear Klein–Gordon equation (4). The question of possible generalizations and of a possible transfer to more complicated systems arises. First of all, we would like to mention that the problems with the total resonance and the second-order resonance occur for all non-trivial systems for which the DNLS approximation can be derived. On the one hand, in this respect our system is not more complicated than necessary. On the other hand, the chosen nonlinear Klein–Gordon equation (4) is sufficiently complicated to contain all principal difficulties which have to be overcome.

Remark 10.5

Other additional difficulties one could think of have been handled in our situation before. For instance, quadratic terms in the original systems can be eliminated completely with a normal form transform for Klein–Gordon models. For other more complicated original systems, additional quadratic or quartic resonances can occur. It is not obvious how existing methods to handle such resonances interact with the approach presented in this paper. The same is true for quasilinear systems such as the water wave problem. This will be the topic of future research.

Remark 10.6

It is the topic of parallel research to prove a DNLS approximation result for initial conditions which are not analytic in a strip of the complex plane but only live in a Sobolev space. In this case, the totally resonant terms have to be handled with energy estimates again. New ideas are needed to handle the second-order resonant terms. Moreover, all other terms of order \( {\mathcal {O}}(\varepsilon ) \) in the error Eq. (10) have to be eliminated by normal form transformations, i.e., no other resonances can be allowed. This is different from the situation in this paper where additional resonances bounded away from integer multiples of the basic wave number \( k_0 = 1 \) can be allowed due to the exponential smallness of these modes initially, cf. Remark 10.3.