1 Introduction

In this paper we study the long-time dynamics of the quasi-linear Klein–Gordon equation on the compact interval \([0, 2\pi ]\), under periodic boundary conditions

$$\begin{aligned} u_{tt}-u_{xx}+m u+N(u)=0 \quad u(t, x)\in \mathbb {R}\,, \quad t\in \mathbb {R}\,, \quad x\in \mathbb {T}:=\mathbb {R}/(2\pi \mathbb {Z})\,, \end{aligned}$$
(1.1)

where

$$\begin{aligned} m\in [1, 2], \qquad N(u):=f_2(u, u_x, u_{xx})\, \end{aligned}$$
(1.2)

and \(f_2\in C^{\infty }(\mathbb {R}^{3};\mathbb {R})\) is a homogeneous polynomial of degree 2.

We consider Klein–Gordon equations possessing a Hamiltonian structure, namely we require that the nonlinearity N in (1.2) has the form

$$\begin{aligned} N(u)=\partial _u G(u, u_{x})-\frac{d}{dx}\big (\partial _{u_x}G(u, u_x)\big )\,, \end{aligned}$$
(1.3)

where \(G\in C^{\infty }(\mathbb {R}^{2};\mathbb {R})\) is a homogeneous polynomial of degree 3.

The origin \(u=0\) is an elliptic equilibrium of equation (1.1). The long-time stability of this fixed point has been proved in the seminal works [23,24,25] by Delort for a positive measure set of masses m. By long-time stability we mean the following property: given \(N\ge 1\) there exist \(\varepsilon _0=\varepsilon _0(N)>0\) and constants \(c=c(N), C=C(N)>0\) such that, for any \(0<\varepsilon <\varepsilon _0\), every solution u(t) with initial datum \(u_0\) satisfying \(\Vert u_0 \Vert _{H^s}<\varepsilon \) fulfills \(\Vert u(t)\Vert _{H^s}\le C \varepsilon \) for all \(t\in [0, c\, \varepsilon ^{-N}]\). This is an orbital stability result for the elliptic equilibrium \(u=0\) and it does not provide information about the dynamics of small data solutions. While the theory of long-time stability is by now well-established for semilinear PDEs on some compact manifolds (without trying to be exhaustive, see [2, 3, 21, 22, 30] and the more recent results [5, 15]), there are few results concerning quasi-linear PDEs obtained via normal form techniques. We mention for instance [9, 10, 34], and the partial results (namely, those where the stability time scale is \(\varepsilon ^{-m}\) for some fixed \(m\in \mathbb {Q}\), \(m\ge 1\)) [6, 14, 26, 33, 37, 42, 44]. We point out that results of long/global in time existence for Klein–Gordon equations such as (1.1) have also been proved in the Euclidean setting; we cite for instance [43, 46, 52].

The aim of the present paper is to provide an accurate description of the long-time dynamics of solutions arising from an open set of small initial data. By long-time behavior (or dynamics) we mean the evolution of the system (1.1) beyond the local existence time scale. Since the nonlinearity in (1.1) is quadratic, the local time scale for solutions with initial data of size \(\varepsilon \) is \(\varepsilon ^{-1}\). If the equation has no 3-wave resonant interactions, as happens for instance for the pure gravity water waves system and the KdV equation, then one expects a quadratic lifespan \(\varepsilon ^{-2}\), over which the solutions remain of size \(\varepsilon \). Our aim is to provide dynamical information beyond this time scale, where we have to deal with the effect of higher order resonances.

We prove that the long-time evolution of initial data within the above-mentioned open set is almost recurrent, in the sense that these solutions stay very close to nonlinear oscillations over the time interval \([0, c\,\varepsilon ^{{-9/4+\delta }}]\) for any \(\delta >0\) and some \(c>0\) independent of \(\varepsilon \).

The oscillatory motions that we consider are supported on finite dimensional tori and they may or may not be minimal, in the sense that they may or may not fill the underlying torus densely. It is possible that certain motions (of the former kind) can be continued to periodic or quasi-periodic solutions by KAM techniques. However, these would describe the dynamics of a zero measure set of data in phase space. In any case, to the best of our knowledge, such KAM results are not yet available in the literature for the quasi-linear Klein–Gordon equation on the circle. Concerning the existence of periodic and quasi-periodic solutions for semilinear Klein–Gordon equations on the circle we refer to [7, 12, 16, 19, 20, 39, 41, 49, 55], and to [8, 11, 13, 17, 54] (and references therein) for the higher dimensional case.

An interesting research direction in the study of Hamiltonian PDEs concerns the orbital stability of invariant objects other than fixed points, such as plane waves and quasi-periodic tori. In this regard we mention [31, 45, 47, 56]. We also quote the stability result [18] for traveling waves of the Burgers–Hilbert equation.

We point out that our result is not merely an orbital stability result: we do not only prove the existence of solutions that stay close to certain embedded tori, but also that the solutions of equation (1.1) follow (in the sense of Sobolev norms) periodic or quasi-periodic orbits that can be explicitly computed.

To explain the main difficulties in proving our result, one can think of introducing action-angle variables on the finite dimensional embedded torus, complemented by Cartesian coordinates in the normal, infinite dimensional directions. In the following, tangential and normal directions are understood with respect to the torus.

To prove that solutions u(t) of the Klein–Gordon equation starting close to the torus closely follow some of the orbits \(\varphi (t)\) supported on it, we study the evolution of the error function \(R:=u-\varphi \).

One of the main issues is that the tangential and normal dynamics of R are coupled. We use normal form techniques to almost decouple them at the linear level.

Since the finite dimensional torus is embedded in an infinite dimensional phase space, we need to control the deviation between u(t) and \(\varphi (t)\) along infinitely many normal directions. In this analysis the quasi-linear nature of the equation is the main source of difficulty. We deal with it by combining para-differential calculus techniques with a modified energy method.

A further issue comes from the analysis of the tangential directions. Indeed, we need to control the deviation of the trajectories both in the action and in the angle directions.

As is well known, even for integrable Hamiltonian systems the linear dynamics along the angle directions is unstable. To overcome this problem we need to impose certain non-resonance conditions on the linear frequencies of oscillation. This amounts to excluding only a zero measure set of masses.

We will continue this discussion, giving more details, in Sect. 1.2.

1.1 Main Result

To state our main result it is useful to look at equation (1.1) as a first order system, which, in appropriate complex coordinates \(Z:=(z^+, z^-)\) (see (2.10)), reads

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{z}^+ =\textrm{i} \Lambda z^++ \frac{\textrm{i} }{\sqrt{2}} \Lambda ^{-1/2}N\Big (\Lambda ^{-1/2}\big (\frac{z^++ z^-}{\sqrt{2}}\big )\Big )\,, \qquad \Lambda :=\sqrt{-\partial _{xx}+m}\\[2mm] \dot{{z}}^- =-\textrm{i} \Lambda {z}^- - \frac{\textrm{i} }{\sqrt{2}} \Lambda ^{-1/2}N\Big (\Lambda ^{-1/2}\big (\frac{z^++ z^-}{\sqrt{2}}\big )\Big )\,. \end{array}\right. } \end{aligned}$$
(1.4)

We look for solutions in the Sobolev real subspaces

$$\begin{aligned} H^s:=\left( H^{s}(\mathbb {T};\mathbb C)\times H^{s}(\mathbb {T};\mathbb C) \right) \cap \mathcal {U}\,, \qquad \mathcal {U}:=\{(z^{+},z^{-})\in {L}^{2}(\mathbb {T};\mathbb {C}^{2})\; :\; \overline{z^{+}}=z^{-}\}\,, \end{aligned}$$
(1.5)

with \(s\ge 0\). With an abuse of notation we denote by \(\Vert \cdot \Vert _{H^{s}}\) (see (2.3)) both the norm on \(H^{s}(\mathbb T;\mathbb C)\) and the norm on the product space \(H^s\).

Fix \(\varepsilon >0\) small, \(N\in \mathbb N\), a symmetric subset \(S:=\{j_{1},\ldots ,j_{N}\}\subset \mathbb Z\) (meaning that \( j\in S\) implies \(-j\in S\, \)) and let \((\xi ,\theta )\in \mathcal {O}^{N}\times \mathbb T^{N}\) for some compact subset \(\mathcal {O}\subset [0, \infty )\) containing the origin. We consider small amplitude, oscillating functions of the form

$$\begin{aligned} \varepsilon \varphi (t)=\varepsilon \varphi _{\xi ,\theta }(t):=\sum _{j\in S}\varepsilon \sqrt{\xi _{j}}e^{\textrm{i} (\theta _j+jx+\omega _j t) }\,, \qquad \xi _{j_i}:=\xi _i\,,\quad \theta _{j_i}:=\theta _{i}\,,\;\;i=1,\ldots ,N\,, \end{aligned}$$
(1.6)

where the frequencies \(\omega _{j}\), \(j\in S\) satisfy

$$\begin{aligned} \sup _{j\in S}|\omega _{j}-\Lambda (j)|\lesssim \varepsilon ^2,\qquad \Lambda (j):=\sqrt{j^2+m}. \end{aligned}$$

These oscillating functions will be obtained as nonlinear corrections of small amplitude linear solutions through normal form methods.
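For orientation, when \(\omega _j=\Lambda (j)\) each summand in (1.6) is an exact solution of the linear part of (1.4): recalling that \(\Lambda e^{\textrm{i} jx}=\Lambda (j)e^{\textrm{i} jx}\), one has

$$\begin{aligned} z(t,x)=\varepsilon \sqrt{\xi _{j}}\,e^{\textrm{i} (\theta _j+jx+\Lambda (j) t)} \qquad \Longrightarrow \qquad \partial _t z=\textrm{i} \Lambda (j)\, z=\textrm{i} \Lambda z\,. \end{aligned}$$

The corrections \(\omega _j-\Lambda (j)=O(\varepsilon ^2)\) account for the frequency shift generated by the nonlinearity, as quantified in Theorem 1.1 below.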

Our goal is to prove that there are plenty of solutions of (1.4) which are “well-approximated” by functions as in (1.6) over a very long time scale, provided that the \(\omega _j\) are chosen as suitable corrections of the linear frequencies of oscillation. More precisely, the main result is the following.

Theorem 1.1

Let \(\texttt{n}\in \mathbb {N}\) and set \(N=2 \texttt{n}+1\). Consider the equation (1.4)–(1.3) and the subset \(S=\{ -\texttt{n}, -(\texttt{n}-1), \dots , \texttt{n}-1, \texttt{n} \}\subset \mathbb Z\). There exists a symmetric matrix \(\texttt{C}\in \mathbb {R}^{N\times N}\) such that for almost all \(m\in [1,2]\) there is \(s_0>1/2\) such that for any \(s\ge s_0\) the following holds. There is \(\varepsilon _0=\varepsilon _0(s,S)\) such that for all \(0<\varepsilon <\varepsilon _0\) and for any \(\xi \in \mathbb {R}^{N}_{+}\), \(\theta \in \mathbb T^{N}\) there exists an open set \(\mathcal {U}^{N}_{\xi ,\theta }\subseteq H^{s}(\mathbb T;\mathbb C)\) such that for any \(z_0\in \mathcal {U}^{N}_{\xi ,\theta }\) the solution z(t) of (1.4) with \(z(0)=z_0\) belongs to \(C([0,T];H^{s})\), with \(T=\varepsilon ^{-{(\frac{9}{4})^{-}}}\), and satisfies

$$\begin{aligned} \sup _{t\in [0,T]}\Vert z(t)-\varepsilon \varphi (t)\Vert _{H^{s}}\lesssim _s \varepsilon ^2\,, \end{aligned}$$
(1.7)

where \(\varepsilon \varphi \) is defined as in (1.6) with frequencies

$$\begin{aligned} \omega _j=\omega _{j}(\xi ):=\sqrt{j^2+m}+\varepsilon ^2\sum _{k\in S}\texttt{C}_{jk}\xi _{{k}},\;\;\; \quad \,j\in S. \end{aligned}$$

Remark 1.2

We remark that we mostly exploit the symmetry of the set S in order to avoid certain resonant interactions. We need the additional property

$$\begin{aligned} j\notin S \qquad \Leftrightarrow \qquad |j|>\max _{k\in S}\{ |k| \} \end{aligned}$$

only for the construction of a modified energy in Sect. 5.2.2.

The matrix \(\texttt{C}\), which provides the corrections to the linear frequencies of oscillation \(\Lambda (j)\), is completely determined by the choice of the set S and by the coefficients of the nonlinearity N.

When the matrix \(\texttt{C}\) is invertible one can modulate the frequencies \((\omega _j)_{j\in S}\) through the choice of the amplitudes \((\xi _j)_{j\in S}\). When the PDE has no external/physical parameters, this is the approach implemented to find small amplitude quasi-periodic solutions, which bifurcate from quasi-periodic functions of the form (1.6) on which Diophantine non-resonance conditions are imposed (we mention for instance [1, 32, 38, 40]).

To prove our result these conditions are not required. In principle we could also consider resonant oscillating motions such as

$$\begin{aligned} \varepsilon \varphi (t, x)=\varepsilon \,\left( \sqrt{\xi _{j}}e^{\textrm{i} j x }+\sqrt{\xi _{-j}}e^{-\textrm{i} j x }\right) \,e^{\textrm{i} ( \theta _j+ \omega _j t)}, \end{aligned}$$

where we assumed that \(\theta _j=\theta _{-j}\) and \(\omega _j=\omega _{-j}\). The existence of such functions depends on the choice of the nonlinearity. Therefore, our result allows us to control the dynamics of a larger set of initial data than the one considered by KAM theory, but only for finite (even if long) times. On the other hand, Birkhoff normal form methods allow one to control the evolution of any small (enough) initial datum over any polynomial time scale, but they do not provide a description of the orbits like the one given by Theorem 1.1.

Description of the Long-time Dynamics and Time Scales. Theorem 1.1 provides an accurate description of the long-time dynamics for general Hamiltonian Klein–Gordon equations with quadratic nonlinearities. The strength of this result lies in the fact that we are able to approximate solutions of the Klein–Gordon equation over long times by explicit oscillating functions \(\varphi (t)\) which are solutions of integrable finite dimensional systems. Such systems are obtained as the restriction of the truncated Hamiltonian, after some steps of normalization, to a finite dimensional subspace. Since these systems are integrable in the sense of Liouville, all their trajectories lie on (finite dimensional) tori and are conjugate to a linear flow \(\theta \mapsto \omega t+\theta \). Hence, once the nonlinearity has been fixed, the evolution of the functions \(\varphi (t)\) is completely determined, even for all times. We remark again that the orbits \(\varphi (t)\) may also lie on resonant tori.
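For the reader's convenience we record, schematically, the elementary computation behind this description (a sketch, neglecting all remainder terms and using the complex Hamiltonian conventions of Sect. 2.1). On the modes of S, in action-angle variables \(w_j=\sqrt{I_j}\,e^{\textrm{i} \theta _j}\), the normalized truncated Hamiltonian (see (1.11) below) reads

$$\begin{aligned} h(I)=\sum _{j\in S}\Lambda (j)\,I_{j}+\tfrac{1}{2}\texttt{C} I\cdot I \qquad \Longrightarrow \qquad \dot{I}_{j}=0\,,\quad \dot{\theta }_{j}=\Lambda (j)+(\texttt{C} I)_{j}\,,\quad j\in S\,. \end{aligned}$$

On the trajectory with \(I_j=\varepsilon ^2\xi _j\) each mode therefore evolves as \(w_j(t)=\varepsilon \sqrt{\xi _j}\,e^{\textrm{i} (\theta _j+\omega _j(\xi ) t)}\) with the frequencies \(\omega _j(\xi )\) of Theorem 1.1, which is precisely the flow appearing in (1.6).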

In the framework of Hamiltonian PDEs, a similar result has been obtained by Bernier–Grébert [4] for the generalized KdV and Benjamin–Ono equations on the circle. There, a description of the dynamics is provided for any polynomial time scale \(\varepsilon ^{-r}\) and for many small initial data by using the rational normal form method [5]. However, apart from the fact that the above mentioned PDEs are not quasi-linear equations, the information provided about the long time behavior of small data solutions differs substantially from that given by Theorem 1.1; indeed:

  1. 1.

    The solutions of the generalized KdV and Benjamin–Ono equations are approximated by oscillating functions whose expression is not explicit. For instance, it is not clear whether these are periodic, quasi-periodic or neither.

  2. 2.

    The closeness between the solutions of the PDEs and their approximating oscillating functions is not guaranteed in the phase space (where the initial data belong). Indeed, the approximation (as in (1.7)) is provided in a Sobolev space which is less regular than the space of initial data (with a loss of one derivative).

With respect to [4], the price we pay for this extra information on the dynamics is that we consider “fewer” initial data (even though they still form open sets in the phase space) and shorter time scales. Concerning the latter, the fact that we are not able to prove a result like Theorem 1.1 for arbitrary polynomial time scales is not merely a technical issue.

We observe that even just to obtain an orbital stability result one would need the integrability of the normalized Hamiltonian at any order, and this requirement is not satisfied by the Klein–Gordon system (nor is it expected in general for non-integrable equations, even on one dimensional spatial domains). The normalized Hamiltonian of Klein–Gordon at degree six is not integrable: it has only half of the required constants of motion, usually called super-actions. This is due to the dispersion relation of Klein–Gordon, and more precisely to the multiplicity of the linear eigenvalues \(\textrm{i} \Lambda (j)\).

Therefore the dynamics of the normal form at higher orders can be very complicated to analyze and we cannot expect to give an accurate description of it, especially when the number of degrees of freedom (that is, N in Theorem 1.1) is large.

For this reason there is an expected optimal time scale over which the approximation (1.7) can hold. Certainly the approximation (1.7) provided by Theorem 1.1 cannot hold true beyond the time scale \(\varepsilon ^{-3^-}\). One can see this from the proof of Lemma 6.5, which provides energy estimates for the linearized part of equation (1.12) for low frequencies. We now explain this issue and how we obtain the time scale \(\varepsilon ^{-\frac{9}{4}^-}\) in the main theorem. It is useful to denote by \(\varepsilon ^{-2-\sigma }\) the time scale of the approximation and to highlight the restrictions on the parameter \(\sigma >0\).

We start by noticing that a sub-problem of our analysis consists in controlling the deviation between the trajectories of a finite dimensional integrable Hamiltonian which is quadratic in the action variables I (up to terms linear in the actions, which play no role in this analysis); we refer to the Hamiltonian (6.10), where \(I_j=|w_j|^2\).

The key problem is to control the linearization of the Hamiltonian vector field at one of the trajectories considered. This linearized problem approximately describes the time evolution of the first variation of the action-angle variables.

The main issue is that the variation in the angle directions grows linearly in time. The first bound in (6.19), which can be satisfied by taking \(\sigma <1\), shows how this fact affects the estimate of the energy for low frequencies. We remark that this is a finite dimensional issue (independent of the PDE context that we are considering) and it cannot be overcome even in the presence (if any) of integrable Hamiltonian terms of higher order. Indeed, in a small amplitude regime, the term quadratic in the actions provides the main contribution. Hence, this gives the upper bound \(\varepsilon ^{-3^-}\) on the time of validity of the approximation (1.7).
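To illustrate this mechanism in the simplest possible setting, consider the model Hamiltonian \(h(I)=\tfrac{1}{2}\texttt{C} I\cdot I\) in action-angle variables (a caricature of (6.10), stripped of all lower order and infinite dimensional terms). Along a trajectory the actions are constant, and the first variation satisfies

$$\begin{aligned} \dot{I}=0\,,\quad \dot{\theta }=\texttt{C} I \qquad \Longrightarrow \qquad \delta \dot{I}=0\,,\quad \delta \dot{\theta }=\texttt{C}\,\delta I\,,\qquad \delta \theta (t)=\delta \theta (0)+t\,\texttt{C}\,\delta I(0)\,. \end{aligned}$$

Hence a deviation of size \(\delta \) in the actions produces a deviation of size \(t\delta \) in the angles, which is the linear-in-time growth responsible for the degradation of the low-frequency energy estimates.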

A further restriction on the parameter \(\sigma \) comes from the fact that the normalized Hamiltonian of Klein–Gordon is integrable only up to degree six. Since we are interested in approximating solutions of the Klein–Gordon equation with trajectories of an integrable system (hence completely known), we need \(\varepsilon \varphi \) in (1.7) to solve the Klein–Gordon equation only up to remainders \(O(\varepsilon ^5)\), see (4.7). This means that the residual term \( \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\) in equation (1.12) cannot be made too small. This affects the last bound in (6.19), where we remark that the parameter \(\beta \) has to be chosen \(>2+2\sigma \) in order to control the nonlinear terms of equation (1.12). If we relax the constraints on \(\beta \) we have to strengthen the limitations on \(\sigma \), and vice versa. Then, in order to satisfy all the constraints on \(\beta \) and \(\sigma \), it turns out that \(\sigma \) has to be taken strictly less than 1/4. This gives the time scale \(\varepsilon ^{-\frac{9}{4}^-}\) provided in Theorem 1.1. Clearly this loss, with respect to the time scale \(\varepsilon ^{-3^-}\), depends on our method, and we plan to improve it in future investigations.
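Summarizing, the bookkeeping on the exponents is the following (here \(\delta >0\) is arbitrary and corresponds to the superscript \(^{-}\)):

$$\begin{aligned} T\sim \varepsilon ^{-(2+\sigma )}\,,\qquad \sigma <1\;\Rightarrow \; T\lesssim \varepsilon ^{-3+\delta }\,,\qquad \sigma <\tfrac{1}{4}\;\Rightarrow \; T\lesssim \varepsilon ^{-\frac{9}{4}+\delta }\,, \end{aligned}$$

so the constraint \(\sigma <1\) alone would allow almost cubic time scales, while the additional constraints involving \(\beta \) reduce the admissible range to \(\sigma <1/4\).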

The Set of Initial Data. The open set \(\mathcal {U}^N_{\xi , \theta }\) is a neighborhood of an embedded N-dimensional torus in the phase space. Hence the set of initial data for which we can describe the long-time dynamics forms an open covering of a continuous family of embedded N-dimensional tori.

The following proposition provides a characterization of this set.

Proposition 1.3

Under the assumptions of Theorem 1.1 there exists a diffeomorphism \(\mathcal {F}:B_{\varepsilon }(H^s)\rightarrow H^s\), where \(B_{\varepsilon }(H^s)\) is the ball of radius \(\varepsilon >0\) centered at the origin, such that the following holds. Defining

$$\begin{aligned} {\mathcal {B}}_S^{N}:=\left\{ \varepsilon \varphi _{\xi ,\theta }(0):=\sum _{j\in S}\varepsilon \sqrt{\xi _{j}}e^{\textrm{i} (\theta _j+jx) }\;\;:\;\; (\xi ,\theta )\in \mathcal {O}^{N}\times \mathbb T^{N} \right\} \,, \end{aligned}$$
(1.8)

one has that the union of the open sets \(\mathcal {U}_{\xi ,\theta }^{N}\) is a covering of an \(\varepsilon ^{7/2}\)-neighbourhood of the embedded manifold \(\mathcal {F}({\mathcal {B}}_S^{N})\subseteq H^{s}\), namely

$$\begin{aligned} \mathcal {A}_{\varepsilon ^{7/2}}^{N}:=\left\{ u:=u_1+u_2\in H^{s}\;:\; u_1\in \mathcal {F}({\mathcal {B}}_S^{N})\,, \Vert u_{2}\Vert _{H^{s}}\le \frac{\varepsilon ^{7/2}}{2} \right\} \subseteq \bigcup _{(\xi ,\theta )\in \mathcal {O}^{N}\times \mathbb T^{N}}\mathcal {U}_{\xi ,\theta }^{N}\,. \end{aligned}$$
(1.9)

Remark 1.4

We note that the union of the sets \(\mathcal {B}_S^N\), as N and S vary respectively in \(\mathbb {N}\) and among the symmetric subsets of \(\mathbb Z\), contains the set of trigonometric polynomials.

The result above implies that solutions evolving from a large family of initial data can be approximated, in the Sobolev topology, by oscillatory motions thanks to Theorem 1.1. We remark that, since trigonometric polynomials are dense in Sobolev spaces, in principle by taking N larger one can cover more and more functions \(u_1\in \mathcal {F}({\mathcal {B}}_S^{N})\) in (1.9), the map \(\mathcal {F}\) being a diffeomorphism. The price to pay in order to apply Theorem 1.1 is that the radius \(\varepsilon \) of the ball on which the approximation holds shrinks to zero.

Relation with Modulation Theory. The functions \(\varepsilon \varphi \) shall be chosen as good approximate solutions of the quasi-linear Klein–Gordon equation (1.1). An efficient way to construct approximate solutions for a Hamiltonian PDE is to perform a normalization of the Hamiltonian and to consider orbits of its resonant part.

A different method to study the long time evolution and stability of oscillatory solutions of partial differential equations is provided by modulation theory. In this framework we mention the work by Düll [27], where the NLS approximation is justified for a quadratic quasi-linear Klein–Gordon equation on \(\mathbb {R}\) (we also mention [28, 29, 50, 51], and references therein, for similar results on other PDEs). The description of the dynamics in these papers is given over the cubic time scale \(\varepsilon ^{-2}\), where the only resonant effects are given by 3-wave resonances. The crucial difference with respect to our analysis is that here we deal with higher order resonances, taking advantage of the fact that we consider the PDE on the torus and of its Hamiltonian structure.

1.2 Strategy of the Proof

First of all we remark that equation (1.4), in view of the assumption (1.3), can be written as

$$\begin{aligned} \partial _{t}Z=X_{H}(Z)\,,\quad Z={\bigl [{\begin{matrix}z\\ \overline{z}\end{matrix}}\bigr ]}\,, \end{aligned}$$
(1.10)

where \(X_{H}\) is the Hamiltonian vector field of the KG equation in complex coordinates and H admits the expansion (see (2.12))

$$\begin{aligned} H(Z)=H^{(2)}(Z)+H^{(3)}(Z),\qquad H^{(2)}(Z)=\int _{\mathbb {T}}\overline{z}\Lambda z\ \textrm{d}x, \end{aligned}$$

where \(H^{(3)}\) is homogeneous of degree 3 in the variables \(Z,Z_{x}\). Precise details on the Hamiltonian structure will be given in Sect. 2.1. We now discuss the strategy of the proof of our main result.

(1) Approximate solution.

The first step is to find a suitable approximate solution of the KG equation, of the form (1.6), supported on a symmetric set of Fourier modes \(S\subset \mathbb Z\). It turns out that, for our purposes, the choice \(\omega _j\equiv \Lambda (j)\) (namely, considering linear solutions of KG) provides only a rough approximation. In order to determine a better approximation through oscillatory motions we perform some steps of “weak” Birkhoff normal form. In other words, we construct an invertible, bounded, symplectic map \(\Phi _{B}\) with the following properties:

\(\bullet \) The transformed Hamiltonian has the form

$$\begin{aligned} \mathcal {H}:=H\circ \Phi _B^{-1}=H^{(2)} + \mathcal {H}_{\text{ res }}^{(4, 0)} +\mathcal {H}^{(>)}+\mathfrak {R}^{(\ge 6)}\,, \end{aligned}$$
(1.11)

where

$$\begin{aligned} \mathcal {H}_{\text{ res }}^{(4, 0)}(W)=\tfrac{1}{2}\texttt{C} I\cdot I\qquad I:=(|w_{j}|^2)_{j\in S},\qquad \Phi _{B}(Z):=W={\bigl [{\begin{matrix}w\\ \overline{w}\end{matrix}}\bigr ]}, \end{aligned}$$

with \(\texttt{C}\) a symmetric \(N\times N\) matrix, \(\mathfrak {R}^{(\ge 6)}\) a remainder with a zero of order 6 at the origin and \(\mathcal {H}^{(>)}\) a Hamiltonian function which vanishes on the finite dimensional subspace \(\mathcal {U}_S=\{ w_n=0\,,\;\; n\notin S \}\). We remark that the matrix \(\texttt{C}\) is completely determined in terms of the coefficients of the original Hamiltonian H and of the set of modes S. This is the content of Proposition 3.9.

\(\bullet \) Functions \(\varepsilon \varphi \) of the form (1.6) are approximate solutions for the transformed equation

$$\begin{aligned} \partial _{t}W=X_{\mathcal {H}}(W),\qquad \Phi _{B}(Z)=W, \end{aligned}$$

in the sense that

$$\begin{aligned} \textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ):=-\varepsilon \partial _{t}\varphi +X_{\mathcal {H}}(\varepsilon \varphi )\sim \varepsilon ^{5}. \end{aligned}$$

The precise estimates are given in Lemma 4.1.

(2) The error function. Once the approximate oscillatory solution has been constructed, we introduce the “error” function

$$\begin{aligned} \varepsilon ^{\beta }V=W-\varepsilon \varphi ,\qquad \beta >2. \end{aligned}$$

Our aim is to control the norms of the function V over a long time interval, so as to measure the deviation between the true solution W of the KG equation in the Birkhoff coordinates and the approximate solution \(\varepsilon \varphi \). Thanks to estimates on the map \(\Phi _{B}\) we are able to show (see Proposition 4.2) that \(Z-\varepsilon \varphi \) remains small over the time scale \(T\sim \varepsilon ^{-(9/4)^{-}}\), i.e. satisfies (1.7), provided that

$$\begin{aligned} \sup _{t\in [0, T]} \varepsilon ^{\beta } \Vert V \Vert _{H^s} \le 2 \varepsilon ^{\beta },\qquad T\sim \varepsilon ^{-(9/4)^{-}}. \end{aligned}$$

The core of the paper is then to show that the above estimate holds true. The function V solves the problem (see Sect. 4.2)

$$\begin{aligned} \dot{V}=d X_{\mathcal {H}}(\varepsilon \varphi ) [V] +\varepsilon ^{\beta } \mathcal {Q} (\varepsilon \varphi )[V, V] +\varepsilon ^{-\beta } \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned}$$
(1.12)

where

$$\begin{aligned} \mathcal {Q} (\varepsilon \varphi )[V, V]:= \int _0^1 (1-t)\,d^2 X_{\mathcal {H}}(\varepsilon \varphi +t \varepsilon ^{\beta } V)[V, V]\,dt\,. \end{aligned}$$
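Equation (1.12) is nothing but the Taylor expansion, with integral remainder, of the vector field at \(\varepsilon \varphi \): writing \(W=\varepsilon \varphi +\varepsilon ^{\beta } V\) in \(\partial _t W=X_{\mathcal {H}}(W)\) one finds

$$\begin{aligned} \varepsilon \partial _{t}\varphi +\varepsilon ^{\beta }\dot{V} =X_{\mathcal {H}}(\varepsilon \varphi )+\varepsilon ^{\beta }\,d X_{\mathcal {H}}(\varepsilon \varphi )[V] +\varepsilon ^{2\beta }\int _0^1 (1-t)\,d^2 X_{\mathcal {H}}(\varepsilon \varphi +t \varepsilon ^{\beta } V)[V, V]\,dt\,, \end{aligned}$$

and (1.12) follows by subtracting \(\varepsilon \partial _{t}\varphi \), dividing by \(\varepsilon ^{\beta }\) and recalling that \(\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )=-\varepsilon \partial _{t}\varphi +X_{\mathcal {H}}(\varepsilon \varphi )\).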

In order to study the evolution of the Sobolev norms of solutions of (1.12) we provide a priori energy estimates for this equation, recalling that the term \(\text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\) is perturbative thanks to the choice of \(\varepsilon \varphi \). Due to the quasi-linear nature of the nonlinearity we first need to study the pseudo-differential structure of the linear operator \(d X_{\mathcal {H}}(\varepsilon \varphi )[\cdot ]\) and to provide a para-differential formulation of the nonlinear term \(\mathcal {Q} (\varepsilon \varphi )[V, V]\). This is the content of Sect. 4.3.

(3) High/low frequencies analysis and normal forms. We first note that, given V defined for all \(t\in [0,T]\), one has \(\partial _{t}\Vert \varepsilon ^{\beta }V\Vert _{s}^{2}=\varepsilon ^{2\beta }\partial _{t}(D^{2s}V,V)_{L^{2}}\), where D is the Fourier multiplier \(D=\sqrt{-\partial _{xx}+1}\) and \((\cdot ,\cdot )_{L^2}\) is the \(L^{2}\)-scalar product. Using equation (1.12) we have that \(\partial _{t}\Vert \varepsilon ^{\beta }V\Vert _{s}^{2}\) is controlled from above by the sum of three terms

$$\begin{aligned} \partial _{t}\Vert \varepsilon ^{\beta }V\Vert _{s}^{2}\lesssim A_{dX_{\mathcal {H}}}+A_{\mathcal {Q}}+A_{Res}, \end{aligned}$$

with the following properties:

\(\bullet \) \(A_{Res}\) is the contribution coming from the residual term. In particular we show that

$$\begin{aligned} A_{Res}\lesssim \varepsilon ^{\beta }\sup _{[0,T]}\Vert V\Vert _{H^{s}}\sup _{[0,T]}\Vert \textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )\Vert _{H^s}\lesssim \varepsilon ^{\beta +5}, \end{aligned}$$

The smallness of the residual \(\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )\) is obtained by construction of \(\varepsilon \varphi \).

\(\bullet \) The term \(A_{\mathcal {Q}}\) is the contribution coming from the quadratic part in V. The energy estimates for the quadratic terms \(\mathcal {Q}(\varepsilon \varphi )\) are obtained thanks to the use of para-differential techniques. So we get

$$\begin{aligned} A_{\mathcal {Q}}\lesssim \varepsilon ^{3\beta }\sup _{[0,T]}\Vert V\Vert _{H^{s}}^{3}. \end{aligned}$$

Its smallness is guaranteed by taking \(\beta \) large enough.

\(\bullet \) The term \(A_{dX_{\mathcal {H}}}\) is the most delicate one and comes from the contribution of the linearized operator \(dX_{\mathcal {H}}(\varepsilon \varphi )\) at \(\varepsilon \varphi \). The main issue is to control the effect of this linear part. We remark that a direct estimate of this term provides a bound like

$$\begin{aligned} A_{dX_{\mathcal {H}}}\lesssim \varepsilon ^{2\beta +1}\sup _{[0,T]}\Vert V\Vert _{H^{s}}^{2}. \end{aligned}$$

An estimate like this allows one to control the norm of V only over the trivial time scale \(\varepsilon ^{-1}\). This is due to the fact that our equation has a quadratic nonlinearity. Of course this estimate is not sufficient for our purposes.
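Indeed, keeping only the contribution \(A_{dX_{\mathcal {H}}}\) and suppressing all constants (a purely heuristic computation), integration in time gives, with \(M(T):=\sup _{[0,T]}\Vert V\Vert _{H^{s}}\),

$$\begin{aligned} \varepsilon ^{2\beta }M(T)^{2}\lesssim \varepsilon ^{2\beta }M(0)^{2}+T\,\varepsilon ^{2\beta +1}M(T)^{2} \qquad \Longrightarrow \qquad M(T)^{2}\lesssim M(0)^{2}+\varepsilon \,T\,M(T)^{2}\,, \end{aligned}$$

so that the a priori bound on \(M(T)\) can be propagated only as long as \(\varepsilon T\ll 1\).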

In Sects. 5 and 6 we improve this estimate in order to show the stability of V over a longer time scale.

The analysis of the linearized operator is performed by means of normal form methods. Since we want to reach stability beyond the nonlinear time scale we have to deal with resonances of order (three and) four, i.e. solutions of

$$\begin{aligned} \sigma _1\Lambda (j_1)+\sigma _2\Lambda (j_2)+\sigma _3\Lambda (j_3)+\sigma _4\Lambda (j_4)=0, \qquad \sigma _i=\pm , j_i\in \mathbb Z. \end{aligned}$$

By choosing the mass \(m\in [1, 2]\) in a suitable subset of positive Lebesgue measure, we can prove (see Lemma 3.8) that these resonances are of the form (up to permutations)

$$\begin{aligned} j_1=j_2, \quad j_3=j_4, \qquad \sigma _1+\sigma _2=0, \qquad \sigma _3+\sigma _4=0. \end{aligned}$$
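These resonances are present for every value of the mass: for any such pairing of equal indices with opposite signs one has, trivially,

$$\begin{aligned} \sigma _1\Lambda (j_1)+\sigma _2\Lambda (j_2)+\sigma _3\Lambda (j_3)+\sigma _4\Lambda (j_4) =\pm \big (\Lambda (j_1)-\Lambda (j_1)\big )\pm \big (\Lambda (j_3)-\Lambda (j_3)\big )=0 \end{aligned}$$

for every \(j_1, j_3\in \mathbb Z\) and every m. The content of Lemma 3.8 is that, for the chosen masses, these pairings (up to permutations) are the only solutions.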

Then the Hamiltonian terms which are Fourier supported on these resonances (resonant terms) cannot be eliminated, and we have to guarantee that they do not make the Sobolev norms of V grow too much. In principle it is not clear how to deal with the resonant terms Fourier supported on

$$\begin{aligned} j_1, j_3\in \text{ supp } (\widehat{V}) \qquad \text{ and } \qquad j_2, j_4\in \text{ supp }(\widehat{\varphi }). \end{aligned}$$

Indeed such resonant monomials do not depend only on the actions \(|\widehat{\varphi }(j)|^2\), \(|\widehat{V}(j)|^2\). For instance they may have the form

$$\begin{aligned} \widehat{\varphi }(j_1)\, \overline{\widehat{V}(j_1)}\,\widehat{\varphi }(j_3)\, \overline{\widehat{V}(j_3)}\,, \end{aligned}$$
(1.13)

which are clearly angle-dependent and could generate hyperbolicity and instability phenomena.

To overcome this problem we exploit the fact that the Fourier support S of \(\varepsilon \varphi \) is finite. The idea is to study separately the time evolution of the low and high modes of the error function V. Thanks to the weak normal form of step (1), the projections onto the low frequencies of the linearizations at \(\varepsilon \varphi \) of the Hamiltonian terms \(\mathcal {H}^{(>)}\) and \(\mathfrak {R}^{(\ge 6)}\) vanish. Then the linear part of the equation for the low frequencies arises from the linearization of the integrable finite dimensional Hamiltonian \(H^{(2)} + \mathcal {H}_{\text{ res }}^{(4, 0)}\).

Despite the integrability, the problem obtained by linearizing at an approximate solution of an integrable system presents instabilities in some directions (the angle directions). Thanks to the fact that we pushed the non-normalized part of the Hamiltonian to higher orders, and to the choice of the initial data (6.2), we are able to control the evolution of the low modes for a long time.

In the normal form coordinates the linear dynamics of the low and high frequencies are decoupled, and terms like (1.13) do not appear in the linear problem for the high frequencies. In Sect. 5 we provide energy estimates for the projection of V onto the high frequencies, where we deal with the quasi-linear nature of the equation by using normal form techniques and energy methods.

2 Functional Setting

2.1 Hamiltonian Formalism

We denote by \(H^s(\mathbb T):=H^{s}(\mathbb {T};\mathbb {C})\) the usual Sobolev space of functions \(\mathbb {T}\ni x \mapsto u(x)\in \mathbb {C}\). We consider the following Fourier expansion of a function u(x) , \(x\in \mathbb {T}\),

$$\begin{aligned} u(x) = \frac{1}{\sqrt{2\pi }} \sum _{n \in \mathbb Z} \widehat{u}(n) e^{\textrm{i} n x } \, , \qquad \widehat{u}(n) := \frac{1}{\sqrt{2\pi }} \int _{\mathbb {T}} u(x) e^{-\textrm{i} n x } \, dx \, . \end{aligned}$$
(2.1)

We shall also use the notation

$$\begin{aligned} u_n^{+1} := u_n:=\widehat{u}(n) \qquad \textrm{and} \qquad u_n^{-1} := \overline{u_n} :=\overline{\widehat{u}(n)} \, . \end{aligned}$$
(2.2)

For \(\xi \in \mathbb {R}\) we define \(\langle \xi \rangle :=\sqrt{1+|\xi |^{2}}\) and we denote by \(\langle D\rangle \) the Fourier multiplier with symbol \(\langle \xi \rangle \), i.e.

$$\begin{aligned} \langle D\rangle e^{\textrm{i} n x}=\langle n\rangle e^{\textrm{i} n x},\quad n\in \mathbb {Z}. \end{aligned}$$

We endow \(H^{s}(\mathbb {T};\mathbb {C})\) with the norm

$$\begin{aligned} \Vert u(\cdot )\Vert _{H^{s}}^{2}:= \sum _{j\in \mathbb {Z}}\langle j\rangle ^{2s}| u_{j}|^{2}\,. \end{aligned}$$
(2.3)

Moreover, for \(r\in \mathbb {R}^{+}\), we denote by \(B_{r}(X)\) the ball of the Banach space X with radius r centred at the origin.

Notation. We shall use the notation \(A\lesssim B\) to denote \(A\le C B\) where C is a positive constant depending on parameters fixed once for all, for instance d and s. We use the notation \(\lesssim _{q}\) to highlight the dependence of the constant C on some parameter q. We use the notation \(A\sim B\) to denote that \(C_1 A\le B \le C_2 A\) for some constants \(C_1, C_2>0\).

The Klein–Gordon equation in (1.1) reads

$$\begin{aligned} \partial _{tt}u+\Lambda ^{2}u+N(u)=0 \end{aligned}$$
(2.4)

where \(\Lambda \) is the Fourier multiplier defined by linearity as

$$\begin{aligned} \Lambda \, e^{\textrm{i} n\cdot x}=\Lambda (n) e^{\textrm{i} n\cdot x}\,, \qquad \Lambda ({n}):=\sqrt{|n|^{2}+m}\,, \qquad \forall \,n\in \mathbb {Z}\,. \end{aligned}$$
(2.5)

Introducing the variable \(v=-\dot{u}=-\partial _{t}u\) we can rewrite equation (2.4) as

$$\begin{aligned} \dot{u}=-v\,,\qquad \dot{v}=\Lambda ^{2} u+N(u)\,. \end{aligned}$$
(2.6)

By (1.3) we note that (2.6) can be written in the Hamiltonian form

$$\begin{aligned} \partial _{t}{\bigl [{\begin{matrix}u\\ v\end{matrix}}\bigr ]}=X_{H_{\mathbb {R}}}(u,v)=J\left( \begin{matrix} \partial _{u}H_{\mathbb {R}}(u,v)\\ \partial _{v}H_{\mathbb {R}}(u,v) \end{matrix}\right) , \quad J={\bigl [{\begin{matrix}0&{}-1\\ 1&{}0\end{matrix}}\bigr ]} \end{aligned}$$

where \(\partial \) denotes the \(L^{2}\)-gradient of the Hamiltonian function

$$\begin{aligned} H_{\mathbb {R}}(u,v)= \int _{\mathbb {T}}\left( \frac{1}{2}v^{2}+\frac{1}{2}(\Lambda ^{2}u) u +G(u,u_x)\right) dx\,, \end{aligned}$$
(2.7)

on the phase space \(H^{1}(\mathbb {T};\mathbb {R})\times L^{2}(\mathbb {T};\mathbb {R})\). Indeed we have

$$\begin{aligned} \textrm{d}H_{\mathbb {R}}(u,v){\bigl [{\begin{matrix}\widehat{u}\\ \widehat{v}\end{matrix}}\bigr ]}=-\omega _{\mathbb {R}}\Big (X_{H_{\mathbb {R}}}(u,v),{\bigl [{\begin{matrix}\widehat{u}\\ \widehat{v}\end{matrix}}\bigr ]}\Big ) \end{aligned}$$
(2.8)

for any \((u,v), (\widehat{u},\widehat{v})\) in \(H^{1}(\mathbb {T};\mathbb {R})\times L^{2}(\mathbb {T};\mathbb {R})\), where \(\omega _{\mathbb {R}}\) is the non-degenerate symplectic form

$$\begin{aligned} \omega _{\mathbb {R}}(W_1,W_2):=\int _{\mathbb {T}}(u_1v_2-v_1 u_2)dx, \qquad W_1:={\bigl [{\begin{matrix}u_1\\ v_1\end{matrix}}\bigr ]}, W_2:={\bigl [{\begin{matrix}u_2\\ v_2\end{matrix}}\bigr ]}. \end{aligned}$$

The Poisson bracket between two Hamiltonians \(H_{\mathbb {R}}, G_{\mathbb {R}}: H^{1}(\mathbb {T};\mathbb {R})\times L^{2}(\mathbb {T};\mathbb {R})\rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} \{H_{\mathbb {R}},G_{\mathbb {R}}\} =\omega _{\mathbb {R}}(X_{H_{\mathbb {R}}},X_{G_{\mathbb {R}}})\,. \end{aligned}$$
(2.9)

We introduce the complex symplectic variables

$$\begin{aligned} \!\!\!\!\!\!\left( \begin{matrix} z \\ \overline{z} \end{matrix} \right) = \mathcal {C} \left( \begin{matrix} u \\ v\end{matrix}\right) : =\frac{1}{\sqrt{2}} \left( \begin{matrix} \Lambda ^{\frac{1}{2}} u+ \textrm{i} \Lambda ^{-\frac{1}{2}} v \\ \Lambda ^{\frac{1}{2}} u - \textrm{i} \Lambda ^{-\frac{1}{2}} v \end{matrix} \right) \, , \qquad \left( \begin{matrix} u \\ v \end{matrix}\right) = \mathcal {C}^{-1} \left( \begin{matrix} z \\ \overline{z} \end{matrix}\right) = \frac{1}{\sqrt{2}} \left( \begin{matrix} \Lambda ^{-\frac{1}{2}}( z + \overline{z} ) \\ - \textrm{i} \Lambda ^{\frac{1}{2}}( z - \overline{z} ) \end{matrix}\right) \,, \end{aligned}$$
(2.10)

where \(\Lambda \) is the Fourier multiplier defined in (2.5). Then the system (2.6) reads as in (1.4).
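Indeed, a direct computation using (2.6) and (2.10) gives

$$\begin{aligned} \dot{z}=\tfrac{1}{\sqrt{2}}\big (\Lambda ^{\frac{1}{2}}\dot{u}+\textrm{i} \Lambda ^{-\frac{1}{2}}\dot{v}\big ) =\tfrac{1}{\sqrt{2}}\big (-\Lambda ^{\frac{1}{2}}v+\textrm{i} \Lambda ^{\frac{3}{2}}u+\textrm{i} \Lambda ^{-\frac{1}{2}}N(u)\big ) =\textrm{i} \Lambda z+\tfrac{\textrm{i} }{\sqrt{2}}\Lambda ^{-\frac{1}{2}}N(u)\,, \end{aligned}$$

with \(u=\Lambda ^{-1/2}\big (\frac{z+\overline{z}}{\sqrt{2}}\big )\), which is the first equation in (1.4); the second one follows by complex conjugation.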

Notice that (1.4) can be written in the Hamiltonian form

$$\begin{aligned} \partial _{t}Z=X_{H}(Z)\,,\qquad X_{H}(Z):=\textrm{i} {\bigl [{\begin{matrix}\partial _{\overline{z}}H(Z)\\ -\partial _{z}H(Z)\end{matrix}}\bigr ]}\,, \end{aligned}$$
(2.11)

with Hamiltonian function (see (2.7))

$$\begin{aligned} \begin{aligned} H(Z)&=H_{\mathbb {R}}(\mathcal {C}^{-1}Z)=H^{(2)}(Z)+H^{(3)}(Z)\,,\\ H^{(2)}(Z)&=\int _{\mathbb {T}}\overline{z}\Lambda z\ \textrm{d}x \,,\qquad H^{(3)}(Z):=\int _{\mathbb {T}} G\Big ( \frac{\Lambda ^{-1/2}(z+\overline{z})}{\sqrt{2}}, \frac{\Lambda ^{-1/2}(z_{x}+\overline{z}_{x})}{\sqrt{2}}\Big ) dx\,, \end{aligned} \end{aligned}$$
(2.12)

and where \(\partial _{\overline{z}}=(\partial _{\textrm{Re}\,(z)}+\textrm{i} \partial _{\textrm{Im}\,(z)})/2\), \(\partial _{z}=(\partial _{\textrm{Re}\,(z)}-\textrm{i} \partial _{\textrm{Im}\,(z)})/2\). Notice that

$$\begin{aligned} X_{H}=\mathcal {C}\circ X_{H_{\mathbb {R}}}\circ \mathcal {C}^{-1} \end{aligned}$$
(2.13)

and that (using (2.10))

$$\begin{aligned} \begin{aligned} \textrm{d}H(z, \overline{z}){\bigl [{\begin{matrix}h\\ \overline{h}\end{matrix}}\bigr ]}= (\textrm{d}H_{\mathbb {R}})(u,v)[\mathcal {C}^{-1}{\bigl [{\begin{matrix}h\\ \overline{h}\end{matrix}}\bigr ]} ] {\mathop {=}\limits ^{(2.8),(2.13)}} -\omega \left( X_{H}(Z),{\bigl [{\begin{matrix}h\\ \overline{h}\end{matrix}}\bigr ]} \right) \end{aligned} \end{aligned}$$
(2.14)

for any \(h\in H^{2}(\mathbb {T};\mathbb {C})\) and where the two form \(\omega \) is given by the pullback of \(\omega _{\mathbb {R}}\) through \(\mathcal {C}^{-1}\). In complex variables the Poisson bracket in (2.9) reads as

$$\begin{aligned} \begin{aligned} \{H,G\}&:=\omega (X_{H},X_{G}) =\textrm{i} \left( \int _{\mathbb {T}}\partial _{z}G\partial _{\overline{z}}H- \partial _{\overline{z}}G\partial _{z}H\right) \textrm{d}x\,, \end{aligned} \end{aligned}$$
(2.15)

where we set \(H=H_{\mathbb {R}}\circ \mathcal {C}^{-1}\), \(G=G_{\mathbb {R}}\circ \mathcal {C}^{-1}\).
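As a consistency check of (2.12), substituting (2.10) into the quadratic part of (2.7) and using that \(\Lambda \) is self-adjoint one finds

$$\begin{aligned} \int _{\mathbb {T}}\Big (\tfrac{1}{2}v^{2}+\tfrac{1}{2}(\Lambda ^{2}u) u\Big )\,dx =\tfrac{1}{4}\int _{\mathbb {T}}\Big (\big (\Lambda ^{\frac{1}{2}}(z+\overline{z})\big )^{2}-\big (\Lambda ^{\frac{1}{2}}(z-\overline{z})\big )^{2}\Big )\,dx =\int _{\mathbb {T}}\overline{z}\,\Lambda z\,dx\,, \end{aligned}$$

while the cubic part of (2.7) becomes \(H^{(3)}\) simply by expressing u in terms of \(z, \overline{z}\).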

2.2 Preliminaries

In this section we introduce some classes of operators that we shall consider along the paper.

2.2.1 Basic Para-Differential Calculus

We follow the notation of [36]. We denote by \(\mathcal {N}_{s}^{\; m}\), \(s\ge 0,m\in \mathbb {R}\), the spaces of functions \(\mathbb {T}\times \mathbb {R}\ni (x,\xi )\rightarrow a(x,\xi )\) of symbols defined by the norms

$$\begin{aligned} |a|_{\mathcal {N}_{s}^{\; m}}:=\sup _{0\le \alpha +\beta \le s}\sup _{\xi \in \mathbb {R}} \langle \xi \rangle ^{-m+\beta }\Vert \partial _{\xi }^{\beta }\partial _{x}^{\alpha }a(x,\xi )\Vert _{L^{2}}\,. \end{aligned}$$
(2.16)

The constant \(m\in \mathbb {R}\) indicates the order of the symbols, while s denotes its regularity. The following result is a consequence of item (i) of Lemma 2.3 in [36].

Lemma 2.1

Let \(m_1,m_2\in \mathbb {R}\), \(s>1/2\) and \(a\in \mathcal {N}^{\; m_1}_s\), \(b\in \mathcal {N}^{\; m_2}_s\). One has

$$\begin{aligned} |ab|_{\mathcal {N}^{\; m_1+m_2}_s}+|\{a,b\}|_{\mathcal {N}_{s-1}^{\; m_1+m_2-1}}+ |\sigma (a,b)|_{\mathcal {N}_{s-2}^{\; m_1+m_2-2}} \lesssim |a|_{\mathcal {N}_{s}^{\; m_1}}|b|_{\mathcal {N}_{s}^{\; m_2}} \end{aligned}$$
(2.17)

where

$$\begin{aligned}{} & {} \{a,b\}:=(\partial _{\xi }a)(\partial _{x}b) -(\partial _{x}a)(\partial _{\xi }b), \end{aligned}$$
(2.18)
$$\begin{aligned}{} & {} \sigma (a,b):=(\partial _{\xi \xi }a)(\partial _{xx}b) -2(\partial _{x\xi }a)(\partial _{\xi x}b) +(\partial _{xx}a)(\partial _{\xi \xi }b). \end{aligned}$$
(2.19)

The Weyl quantization. For a symbol \(a(x,\xi )\) in \(\mathcal {N}_{s}^{\; m}\) we define its (Weyl) quantization as

$$\begin{aligned} {Op^{\textrm{W}}}(a(x,\xi ))h:=\frac{1}{\sqrt{2\pi }}\sum _{j\in \mathbb {Z}}e^{\textrm{i} j\, x} \sum _{k\in \mathbb {Z}} \widehat{a}\big (j-k,\frac{j+k}{2}\big )\widehat{h}(k) \end{aligned}$$
(2.20)

where \(\widehat{a}(\eta ,\xi )\) denotes the Fourier transform of \(a(x,\xi )\) in the variable \(x\in \mathbb {T}\).

Smoothed symbols. Let \(0<\epsilon < 1/2\) and consider a smooth function \(g: \mathbb {R}\rightarrow [0,1]\)

$$\begin{aligned} g(\xi )=\left\{ \begin{aligned}&1 \quad \textrm{if}\,\,\,\, |\xi |\le 5/4 \\&0 \quad \textrm{if}\,\,\,\, |\xi |\ge 8/5 \end{aligned}\right. \qquad \mathrm{and \; define}\quad \qquad {\chi (\xi )=\chi (\epsilon ; \xi ):=g(|\xi |/\epsilon )}\,. \end{aligned}$$
(2.21)

Given \(a(x,\xi )\in \mathcal {N}_{s}^{\; m}\) we define

$$\begin{aligned} a_{\chi }(x,\xi )=\mathcal {F}^{-1}\big (\widehat{a}(\eta ,\xi )\chi (|\eta |/\langle \xi \rangle )\big )\, \end{aligned}$$
(2.22)

where \(\mathcal {F}\) denotes the Fourier transform in x.

We denote by

$$\begin{aligned} {Op^{\textrm{BW}}}(a(x, \xi )):={Op^{\textrm{W}}}(a_{\chi }(x, \xi ))\,. \end{aligned}$$
(2.23)
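The role of the cutoff is to localize the symbol at x-frequencies which are small with respect to \(\langle \xi \rangle \): by (2.21)–(2.22),

$$\begin{aligned} \widehat{a_{\chi }}(\eta ,\xi )=\widehat{a}(\eta ,\xi )\,\chi \big (|\eta |/\langle \xi \rangle \big )\,,\qquad \widehat{a_{\chi }}(\eta ,\xi )\ne 0\;\;\Rightarrow \;\;|\eta |<\tfrac{8}{5}\,\epsilon \,\langle \xi \rangle \,, \end{aligned}$$

so that in (2.20) only the quasi-diagonal interactions with \(|j-k|\ll \langle j+k\rangle \) (hence \(\langle j\rangle \sim \langle k\rangle \)) survive; this is the support condition exploited in the proof of Lemma 2.2 below.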

Lemma 2.2

Let \(s_0>1\). Then the following holds.

  1. (i)

    Let \(m\in \mathbb {R}, a\in \mathcal {N}^{\; m}_{s_0}\), then for any \(s\in \mathbb {R}\)

    $$\begin{aligned} \Vert {Op^{\textrm{BW}}}(a)[h]\Vert _{H^{s-m}}\lesssim _{s} |a|_{\mathcal {N}^{\; m}_{s_0}}\Vert h\Vert _{H^{s}}\,, \qquad \forall h\in H^{s}(\mathbb {T};\mathbb {C})\,. \end{aligned}$$
    (2.24)
  2. (ii)

    Let \(m_1, m_2\in \mathbb {R}, a\in \mathcal {N}_{s_0+4}^{\; m_1}\), \(b\in \mathcal {N}_{s_0+4}^{\; m_2}\), then

    $$\begin{aligned} {Op^{\textrm{BW}}}(a)\circ {Op^{\textrm{BW}}}(b)={Op^{\textrm{BW}}}(ab+\tfrac{1}{2\textrm{i} }\{a,b\}-\tfrac{1}{8}\sigma (a,b))+R(a,b)\,, \end{aligned}$$
    (2.25)

    where R(ab) is a remainder satisfying, for any \(s\in \mathbb {R}\),

    $$\begin{aligned} \Vert R(a,b)h\Vert _{H^{s-m_1-m_2+3}}\lesssim \Vert h\Vert _{H^{s}}|a|_{\mathcal {N}^{\; m_1}_{s_0+4}} |b|_{\mathcal {N}^{\; m_2}_{s_0+4}}\,. \end{aligned}$$
    (2.26)

    In particular

    $$\begin{aligned} \big [{Op^{\textrm{BW}}}(a), {Op^{\textrm{W}}}(b)\big ]={Op^{\textrm{BW}}}(\tfrac{1}{\textrm{i} }\{a,b\})+R(a,b)\,, \end{aligned}$$
    (2.27)

    with R(ab) as in (2.26).

  3. (iii)

    Let \(\rho \ge 0\) and \(a=a(x), b=b(x)\in H^{\rho +s_0}(\mathbb {T};\mathbb {C})\) (independent of \(\xi \in \mathbb {R}\)) then, for any \(s\in \mathbb {R}\),

    $$\begin{aligned} \Vert ({Op^{\textrm{BW}}}(a)\circ {Op^{\textrm{BW}}}(b)-{Op^{\textrm{BW}}}(ab))h\Vert _{H^{s+\rho }} \lesssim \Vert h\Vert _{H^{s}}\Vert a\Vert _{H^{\rho +s_0}}\Vert b\Vert _{H^{\rho +s_0}}\,. \end{aligned}$$
    (2.28)

Proof

By the definition of the semi-norm (2.16) we have

$$\begin{aligned} |\widehat{a}(j,\xi )|\lesssim _s \langle \xi \rangle ^{\; m}\langle j\rangle ^{-{s_0}}\,|a|_{\mathcal {N}_{s_0}^{\; m}}, \quad \forall \, j,\xi \in \mathbb {Z}\,. \end{aligned}$$
(2.29)
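Indeed, (2.29) follows from (2.16) with \(\beta =0\) and the elementary bound \(|\widehat{f}(j)|\le \Vert f\Vert _{L^{2}}\) (for simplicity we take the regularity index \(s_0\) to be an integer):

$$\begin{aligned} \langle j\rangle ^{s_0}|\widehat{a}(j,\xi )|\lesssim |\widehat{a}(j,\xi )|+|\widehat{\partial _{x}^{s_0}a}(j,\xi )| \le \Vert a(\cdot ,\xi )\Vert _{L^{2}}+\Vert \partial _{x}^{s_0}a(\cdot ,\xi )\Vert _{L^{2}} \lesssim \langle \xi \rangle ^{\; m}|a|_{\mathcal {N}_{s_0}^{\; m}}\,. \end{aligned}$$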

We have (recall (2.23), (2.20))

$$\begin{aligned} \Vert {Op^{\textrm{BW}}}(a)[h]\Vert ^2_{H^{s-m}}&\le \sum _{j\in \mathbb Z} \left( \sum _{\begin{array}{c} |j-k|\le 2^{-1} |j+k| \end{array}} \left| \widehat{a}\left( j-k, \frac{j+k}{2} \right) \right| |\widehat{h}(k) | \langle j \rangle ^{s-m} \right) ^2 . \end{aligned}$$

We note that \( |j-k|\le 2^{-1} |j+k|\) implies that \(\langle j \rangle \sim \langle k \rangle \). Then by using (2.29) and Young’s inequality we have

$$\begin{aligned} \Vert {Op^{\textrm{BW}}}(a)[h]\Vert ^2_{H^{s-m}}&\lesssim _s |a|^2_{\mathcal {N}^{\; m}_{s_0}} \sum _{j\in \mathbb Z} \left( \sum _{\begin{array}{c} |j-k|\le 2^{-1} |j+k| \end{array}} \frac{|j+k|^m}{|j-k|^{s_0} \langle j \rangle ^m} |\widehat{h}(k) | \langle k \rangle ^{s} \right) ^2 \\ {}&\lesssim _s |a|^2_{\mathcal {N}^{\; m}_{s_0}} \sum _{j\in \mathbb Z} \left( \sum _{\begin{array}{c} |j-k|\le 2^{-1} |j+k| \end{array}} |j-k|^{-s_0} |\widehat{h}(k) | \langle k \rangle ^{s} \right) ^2 \\ {}&\lesssim _s |a|^2_{\mathcal {N}^{\; m}_{s_0}} \Vert h \Vert ^2_{H^{s}}\,. \end{aligned}$$

The items (ii)–(iii) follow from Proposition 2.5 in [36].

2.2.2 Multilinear Operators

We are interested in studying properties of multilinear operators of the form

$$\begin{aligned} \begin{aligned}&R_{p}[z_{p+1}]=R_{p}(z_1,\ldots ,z_p)[z_{p+1}] : (C^{\infty }(\mathbb {T};\mathbb {C}))^{p+1}\rightarrow C^{\infty }(\mathbb {T};\mathbb {C})\,, \quad p=1,2\,, \\ {}&R_{p}[z_{p+1}]= \frac{1}{\sqrt{2\pi }}\sum _{{j_{p+1}}, {j}\in \mathbb {Z}}(R_{p})_{j}^{j_{p+1}} \widehat{z}(j_{p+1})e^{\textrm{i} j x}\,, \end{aligned} \end{aligned}$$
(2.30)

with

$$\begin{aligned} (R_{1})_{j}^{j_{2}}&=\frac{1}{\sqrt{2\pi }} \sum _{\begin{array}{c} j_1,j_2\in \mathbb {Z}, \sigma _1\in \{\pm \} \\ \sigma _1 j_1+j_{2}=j \end{array}} r_{1}^{\sigma _1}(j, j_1,j_2) \widehat{z}_1^{\sigma _1}(j_1)\,, \end{aligned}$$
(2.31)
$$\begin{aligned} (R_{2})_{j}^{j_{3}}&=\frac{1}{{2\pi }} \sum _{\begin{array}{c} j_1,j_2,j_3\in \mathbb {Z}, \sigma _1,\sigma _2\in \{\pm \} \\ \sigma _1 j_1+\sigma _2 j_{2}+j_3=j \end{array}} r_{2}^{\sigma _1,\sigma _2}(j, j_1,j_2,j_3) \widehat{z}_1^{\sigma _1}(j_1)\widehat{z}_2^{\sigma _2}(j_2)\,, \end{aligned}$$
(2.32)

where the coefficients \(r_{p}^{\sigma _1\ldots \sigma _{p}}(j, j_1,\ldots ,j_{p+1})\in \mathbb {C}\) for any \(j, j_1,\ldots ,j_{p+1}\in \mathbb {Z}\), \(p=1,2\). We introduce the following notation: given \(j_1,\ldots , j_q \ge 0\), \(q\ge 2\) we define

$$\begin{aligned} {\max }_{i}\{j_1, \ldots ,j_q\}= i\mathrm{-th} \; \mathrm{largest \; among } \;j_1,\ldots ,j_q\,. \end{aligned}$$
(2.33)

We need the following definition.

Definition 2.3

Let \(\rho \in \mathbb {R}\), \(p=1,2\), \(q\in \mathbb {N}\) and \(0<r\le 1\).

(i) (Multilinear operators). We say that a \((p+1)\)-linear map \(R_p\) as in (2.30)–(2.32) belongs to the class \({\textbf{M}}^{-\rho }_{p}\) if there is \(\mu \ge 0\) such that for any \(j_1,\ldots ,j_{p+1}\in \mathbb {Z}\), \(\sigma _1,\ldots ,\sigma _{p}\in \{\pm \}\) one has

$$\begin{aligned} |r_{p}^{\sigma _1\ldots \sigma _{p}}(\zeta ,j_1,\ldots ,j_{p+1})|\lesssim \frac{ \max _{2}\{\langle j_1\rangle , \ldots , \langle j_{p+1}\rangle \}^{{\mu +|\rho |}}}{\max _{1}\{\langle j_1\rangle , \ldots , \langle j_{p+1}\rangle \}^{\rho }}\,, \end{aligned}$$
(2.34)

for \(\zeta :=\sigma _1 j_1+\cdots +\sigma _{p} j_{p}+j_{p+1}\).

(ii) (Non-homogeneous operators). We denote by \(\textbf{NH}_{q}^{-\rho }\) the class of maps \((\varphi ,V, u) \mapsto {\mathcal R}(\varphi ,V)[u] \) defined on \(B_r(H^{\mu }(\mathbb T))^2\times H^{\mu }(\mathbb T)\) for some \(\mu >1/2\), linear in u, such that the following holds. For any \(s \ge \mu \), and \((\varphi ,V)\in B_r(H^{\mu }(\mathbb T))^2\cap H^{s}(\mathbb T) \), \(u\in H^s(\mathbb T)\) one has

$$\begin{aligned} \begin{aligned} \Vert \mathcal {R}(\varphi ,V)u \Vert _{ H^{s+ \rho }}&\lesssim _{s, \rho } (\Vert \varphi \Vert _{H^{s}}^q +\Vert V\Vert _{H^{s}})\Vert u\Vert _{H^{s}}\,, \\ \Vert d_{\varphi }\big (\mathcal {R}(\varphi ,V) \big ) (u)h \Vert _{ H^{s+ \rho }}&\lesssim _{s, \rho } (\Vert \varphi \Vert _{H^{s}}^{q-1} +\Vert V\Vert _{H^{s}})\Vert u\Vert _{H^{s}} \Vert h\Vert _{H^{s}}\,, \\ \Vert d_{V}\big (\mathcal {R}(\varphi ,V) \big ) (u)h \Vert _{ H^{s+ \rho }}&\lesssim _{s, \rho } \Vert u\Vert _{H^{s}} \Vert h\Vert _{H^{s}}\,. \end{aligned} \end{aligned}$$
(2.35)

(iii) We denote by \(\mathbf {\Sigma }^{-\rho }_{1}[r,3]\) the space of operators \(R(\varphi )[\cdot ]\) of the form

$$\begin{aligned} R(\varphi )=R_1(\varphi )+R_2(\varphi )+R_{\ge 3}(\varphi ,V), \end{aligned}$$

where \(R_{j}\in {\textbf{M}}^{-\rho }_{j}\), \(j=1,2\), and \(R_{\ge 3}\in \textbf{NH}^{-\rho }_{3}\).

We now prove that the operators defined above extend as continuous maps on the Sobolev spaces.

Lemma 2.4

Let \(\rho \in \mathbb {R}\). Consider a multilinear operator \(R\in \textbf{M}^{-\rho }_{p}\) and \(\mu \) introduced in Def. 2.3-(i). Then, there is \(s_0=s_0(\mu )>1/2\) such that, for \(s\ge s_0+|\rho |\), the map R in (2.30) with coefficients satisfying (2.34) extends as a continuous map from \((H^{s}(\mathbb {T};\mathbb {C}))^{p}\times H^{s}(\mathbb {T};\mathbb {C})\) to \(H^{s+\rho }(\mathbb {T};\mathbb {C})\). Moreover one has

$$\begin{aligned} \Vert R(u_1,\ldots ,u_p)[u_{p+1}]\Vert _{H^{s+\rho }}\lesssim \Vert u_{p+1}\Vert _{H^{s}}\prod _{i=1}^{p}\Vert u_{i}\Vert _{H^{s}}\,. \end{aligned}$$
(2.36)

Proof

We follow Lemma 2.5 in [33]. We give the proof only in the case \(p=2\). The case \(p=1\) will follow similarly. By (2.3) we have

$$\begin{aligned} \Vert R_{2}(u_1,u_2)[u_{3}]\Vert _{H^{s+\rho }}^{2}\lesssim \sum _{j\in \mathbb {Z}}\langle j\rangle ^{2(s+\rho )} \Big (\sum _{\begin{array}{c} \sigma _1 j_1+\sigma _2 j_{2}+j_3=j \\ \sigma _1,\sigma _2\in \{\pm \} \end{array}} |r_{2}^{\sigma _1,\sigma _2}(j, j_1,j_2,j_3)|\, |\widehat{u}_1(j_1)|\,|\widehat{u}_2(j_2)|\,|\widehat{u}_3(j_3)|\Big )^{2} \lesssim I+II+III \end{aligned}$$
(2.37)

where I, II, III denote the terms in (2.37) which are supported respectively on the indexes such that

$$\begin{aligned} \max _{1}\{\langle j_1\rangle , \langle j_2\rangle ,\langle j_3\rangle \}=\langle j_1\rangle \,,\qquad \textrm{or}\qquad \max _{1}\{\langle j_1\rangle , \langle j_2\rangle ,\langle j_3\rangle \}=\langle j_2\rangle \,,\qquad \textrm{or}\qquad \max _{1}\{\langle j_1\rangle , \langle j_2\rangle ,\langle j_3\rangle \}=\langle j_3\rangle \,. \end{aligned}$$

Consider for instance the term III. By using the Young inequality for sequences we deduce

$$\begin{aligned} \begin{aligned} III&\lesssim \Vert ( \langle j_1 \rangle ^{\mu +|\rho |}\widehat{u}_1(j_1) )*(\langle j_2\rangle ^{\mu }\widehat{u}_2(j_2))*(\langle j_3\rangle ^{s}\widehat{u}_3(j_3))\Vert _{\ell ^{2}} \\ {}&\lesssim \Vert u_1\Vert _{H^{1/2+\epsilon +\mu +|\rho |}}\Vert u_2\Vert _{H^{1/2+\epsilon +\mu }} \Vert u_3\Vert _{H^{s}}, \end{aligned} \end{aligned}$$

for \(\epsilon >0\), which is (2.36) taking \(s_0> \mu +1/2\). The bounds for I and II are similar.

Now we introduce a class of linear operators obtained by linearizing multilinear and non-homogeneous maps at a \(C^{\infty }\)-function \(\varphi \). Such operators are pseudo-differential according to the definition given in Sect. 2.2.1.

Definition 2.5

Let \(m\in \mathbb {R}, s >1/2\), \(q\in \mathbb {N}\) and \(0<r\le 1\).

(i) (Linear symbols). We denote by \(\textbf{S M}_1^{\; m}\) the class of maps \(\varphi \in C^{\infty }(\mathbb T; \mathbb {C})\rightarrow a_1(\varphi ; \cdot ) \in \mathcal {N}_s^m\) such that it has the form

$$\begin{aligned} a_1(\varphi ;x,\xi )=\sum _{j\in \mathbb {Z},\sigma \in \{\pm \}}a_1^{\sigma }(j,\xi )\varphi ^{\sigma }_{j}e^{\sigma \textrm{i} jx}\,, \end{aligned}$$
(2.38)

and

$$\begin{aligned} |\partial _{\xi }^{\beta }(a_1)^{\sigma }(j,\xi )|\lesssim \langle j\rangle ^{\mu }\langle \xi \rangle ^{m-\beta }\,, \end{aligned}$$
(2.39)

for some \(\mu \ge 0\).

(ii) (Quadratic symbols). We denote by \(\textbf{S M}_2^{\; m}\) the class of maps \(\varphi \in C^{\infty }(\mathbb T; \mathbb {C})\rightarrow a_2(\varphi ; \cdot ) \in \mathcal {N}_s^m\) such that it has the form

$$\begin{aligned} \begin{aligned} a_2(\varphi ;x,\xi )&=\sum _{j_1,j_2\in \mathbb {Z},\sigma \in \{\pm \}}a_2^{\sigma \sigma }(j_1,j_2,\xi ) \varphi ^{\sigma }_{j_1}\varphi ^{\sigma }_{j_2}e^{\textrm{i} \sigma (j_1+j_2)x} \\ {}&\qquad \qquad \qquad + \sum _{j_1,j_2\in \mathbb {Z}}a_{2}^{+-}(j_1,j_2,\xi )\varphi _{j_1}\overline{\varphi _{j_2}}e^{\textrm{i} (j_1-j_2)x}\,, \end{aligned} \end{aligned}$$
(2.40)

and

$$\begin{aligned} |\partial _{\xi }^{\beta }(a_2)^{\sigma _1\sigma _2}(j_1,j_2,\xi )|\lesssim \max \{\langle j_1\rangle , \langle j_2\rangle \}^{\mu }\langle \xi \rangle ^{m-\beta }\, \end{aligned}$$
(2.41)

for some \(\mu \ge 0\).

(iii) (Non-homogeneous symbols). Let \(r\in (0, 1)\). We denote by \(\textbf{SNH}_{q}^{\; m}\) the class of maps

$$\begin{aligned} (\varphi ,V)\in B_r(H^{\mu }(\mathbb T))^2 \rightarrow a_{\ge q}(\varphi ,V; \cdot )\in \mathcal {N}_s^m \qquad \mathrm {for\,\,some\,\,}\mu >1/2, \end{aligned}$$

such that, for any \(s\ge s_0+\mu \), if \((\varphi ,V)\in (B_r(H^{\mu }(\mathbb T))\cap H^s(\mathbb T))^2\) one has

$$\begin{aligned} \begin{aligned} |a_{\ge q}(\varphi ,V;\cdot )|_{\mathcal {N}_{s}^{\; m}}&\lesssim _{s} \Vert \varphi \Vert ^{q}_{H^{s+\mu }}+\Vert V\Vert _{H^{s+\mu }}\,, \\ |d_{\varphi }a_{\ge q}(\varphi ,V,\cdot )[h]|_{\mathcal {N}_{s}^{\; m}}&\lesssim _{s} (\Vert \varphi \Vert ^{q-1}_{H^{s+\mu }}+\Vert V\Vert _{H^{s+\mu }}) \Vert h\Vert _{H^{s+\mu }}\,, \qquad \forall \, h\in H^{s+\mu }(\mathbb {T})\,, \\ |d_{V}a_{\ge q}(\varphi ,V,\cdot )[h]|_{\mathcal {N}_{s}^{\; m}}&\lesssim _{s} \Vert h\Vert _{H^{s+\mu }}\,. \end{aligned} \end{aligned}$$
(2.42)

(iv) (Symbols). We denote by \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\) the space of symbols \(a(\varphi ,V;x,\xi )\) of the form

$$\begin{aligned} a(\varphi ,V;x,\xi )=a_1(\varphi ;x,\xi )+a_2(\varphi ;x,\xi )+a_{\ge 3}(\varphi ,V;x,\xi ), \end{aligned}$$

where \(a_j\in \textbf{SM}^{\; m}_j\), \(j=1,2\), and \(a_{\ge 3}\in \textbf{SNH}^{\; m}_{3}\).
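As a simple illustration (a hypothetical example, not one of the symbols actually used in the sequel), the map \(\varphi \mapsto \varphi (x)\,\xi ^{2}\) belongs to \(\textbf{SM}^{\; 2}_1\): by (2.1),

$$\begin{aligned} a_1(\varphi ;x,\xi ):=\varphi (x)\,\xi ^{2}=\sum _{j\in \mathbb {Z}}\tfrac{1}{\sqrt{2\pi }}\,\xi ^{2}\,\varphi _{j}\,e^{\textrm{i} jx}\,,\qquad \big |\partial _{\xi }^{\beta }\big (\tfrac{1}{\sqrt{2\pi }}\xi ^{2}\big )\big |\lesssim \langle \xi \rangle ^{2-\beta }\,, \end{aligned}$$

so that (2.38)–(2.39) hold with \(\mu =0\) and \(m=2\). Symbols of this type, with one factor \(\varphi \) and order at most 2, are, schematically, those produced by linearizing the quasi-linear nonlinearity \(N(u)=f_2(u,u_x,u_{xx})\) at \(\varepsilon \varphi \).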

Lemma 2.6

Let \(a_j=a_{j}(\varphi ;\cdot )\in \textbf{SM}^{\; m}_j\), \(j=1,2\). Then, there exist \(\mu \ge 0\), \(s_0=s_0(\mu )>1/2\) such that, for \(s\ge s_0\), the maps \(\varphi \rightarrow a_{j}(\varphi ;\cdot )\) extend as continuous maps \(H^{s+\mu }(\mathbb {T})\rightarrow \mathcal {N}_{s}^{\; m}\). Moreover

$$\begin{aligned} \begin{aligned} |a_1(\varphi )|_{\mathcal {N}_{s}^{\; m}}&\lesssim _s\Vert \varphi \Vert _{H^{s+\mu }},\qquad |a_2(\varphi )|_{\mathcal {N}_{s}^{\; m}}&\lesssim _s\Vert \varphi \Vert ^{2}_{H^{s+\mu }}. \end{aligned} \end{aligned}$$

Proof

The proof is straightforward using (2.16) and the bounds on the coefficients of \(a_{j}\) in Def. 2.5.

Lemma 2.7

Let \(m\in \mathbb {R}\), \(s_0>1\). The following holds.

(i) Let \(a=a(\varphi , \cdot )\in \textbf{SM}^m_j\), \(j=1, 2\). Then \({Op^{\textrm{BW}}}(a)\in \textbf{M}^m_j\) and there is \(\mu \ge 0\) such that

$$\begin{aligned} \Vert {Op^{\textrm{BW}}}(a) h \Vert _{H^{s-m}}\lesssim _s \Vert \varphi \Vert ^j_{H^{s_0+\mu }} \Vert h \Vert _{H^s}. \end{aligned}$$

(ii) Let \(a=a(\varphi , V; \cdot )\in \textbf{SNH}^m_3\). Then, for all \(h\in H^s(\mathbb T)\),

$$\begin{aligned} \Vert {Op^{\textrm{BW}}}(a)[h]\Vert _{H^{s-m}}\lesssim _{s} (\Vert \varphi \Vert ^3_{H^{s_0+\mu }}+\Vert V\Vert _{H^{s_0+\mu }}) \Vert h\Vert _{H^{s}}\,. \end{aligned}$$
(2.43)

(iii) Let \(a=a(\varphi , V; \cdot )\in \mathbf {S\Sigma }^{\; m}_1[r, 3]\). Then \({Op^{\textrm{BW}}}(a)\in \mathbf {\Sigma }_1^m[r, 3]\).

Proof

(i) It follows from Lemmata 2.2 and 2.6. (ii) The bound (2.43) is a consequence of Lemma 2.2 and of the bounds (2.42). (iii) If the symbol \(a(\varphi ,\cdot )\) is multilinear (see Def. 2.5), then (2.34) follows from (2.23), (2.20), (2.38), (2.39), (2.40) and (2.41) with \(\beta =0\), by explicit computations. Assume now that the symbol is non-homogeneous. We first notice that

$$\begin{aligned} d_{\varphi , V} {Op^{\textrm{W}}}(a)={Op^{\textrm{W}}}(d_{\varphi , V} a). \end{aligned}$$

Then (2.42) and Lemma 2.2 imply (2.35). This concludes the proof.

Lemma 2.8

Let \(m, \rho _1, \rho _2\in \mathbb {R}\). We have the following:

(i) \(\textbf{SM}_{j}^{\; m}\subset \textbf{SNH}^{\; m}_{j}\) for \(j=1, 2\);

(ii) \(\textbf{M}_{j}^{\; m}\subset \textbf{NH}^{\; m}_{j}\) for \(j=1, 2\);

(iii) If \(R_{i}\in \textbf{M}^{-\rho _1}_{i}\), \(Q_{j}\in \textbf{M}^{-\rho _2}_{j}\) for some \(i,j\in \mathbb {N}\), then \(R_i\circ Q_{j}\in \textbf{M}^{-\min \{\rho _1, \rho _2\}}_{i+j}\). If \(R\in \mathbf {\Sigma }^{-\rho _1}_{1}[r,3]\), \(Q\in \mathbf {\Sigma }^{-\rho _2}_{1}[r,3]\) then \(R\circ Q\in \mathbf {\Sigma }^{-\min \{\rho _1, \rho _2\}}_{1}[r,3]\).

Proof

(i) Using formulæ (2.38), (2.40) one can write explicitly the differential of the symbol with respect to the variable \(\varphi \). Then the bounds (2.42) follow by using (2.39), (2.41) and Lemma 2.6. (ii) One can argue as for item (i), using (2.30)–(2.32), (2.34) and reasoning as in the proof of Lemma 2.4. (iii) Since the classes of symbols and operators in Definitions 2.3, 2.5 are equivalent to those introduced in [9, 14], we refer to these papers for a detailed proof of item (iii).

Lemma 2.9

Let \(m, n\in \mathbb {R}\), \(i, k\in \mathbb {N}\), \(a\in \textbf{SM}_i^m\), \(B=B(\varphi )\in \textbf{M}_k^n\) and \(c(\varphi ; x, \xi ):=a(B(\varphi ) \varphi ; x, \xi )\). Then the following holds.

(i) If \(i=k=1\) then \(c\in \textbf{SM}_2^m\).

(ii) If i or k are strictly greater than 1 then \(c\in \textbf{SNH}_{i+k}^m\).

Proof

(i) By (2.38), (2.30), (2.31) we have

$$\begin{aligned} c(\varphi ; x, \xi )&= \sum _{j\in \mathbb Z} a^{\sigma }(j, \xi ) (B(\varphi ) \varphi )_j^{\sigma }\, e^{\textrm{i} \sigma j x} =\frac{1}{\sqrt{2\pi }}\sum _{j\in \mathbb Z} a^{\sigma }(j, \xi ) \left( \sum _{j_2\in \mathbb Z} (B(\varphi ))_{\sigma j}^{\sigma _2 j_{2}} {\varphi }^{\sigma _2}_{j_2} \right) \,e^{\textrm{i} \sigma j x} \\ {}&=\frac{1}{2\pi } \sum _{\begin{array}{c} j, j_1, j_2\in \mathbb Z,\\ \sigma _1 j_1+\sigma _2 j_2= \sigma j \end{array}} a^{\sigma }(j, \xi ) b^{\sigma _1}(j, j_1, j_2) \varphi ^{\sigma _1}_{j_1} \varphi ^{\sigma _2}_{j_2}\, e^{\textrm{i} (\sigma _1 j_1+\sigma _2 j_2) x}\,. \end{aligned}$$

Then setting, for \(\sigma j=\sigma _1 j_1+\sigma _2 j_2\),

$$\begin{aligned} c^{\sigma _1, \sigma _2}(j_1, j_2, \xi ):= \frac{1}{2\pi }\,a^{\sigma }(j, \xi ) b^{\sigma _1}(j, j_1, j_2) , \end{aligned}$$

and using the fact that \(\varphi \in C^{\infty }(\mathbb T;\mathbb C)\) we have (recall bounds (2.39), (2.34))

$$\begin{aligned} |\partial _{\xi }^{\beta } c^{\sigma _1, \sigma _2}(j_1, j_2, \xi )|&\lesssim |\partial _{\xi }^{\beta } a^{\sigma }(j, \xi )| \max _{2}\{\langle j_1\rangle ,\langle j_{2}\rangle \}^{{\mu +|n|}} \max _{1}\{\langle j_1\rangle , \langle j_{2}\rangle \}^{n} \\ {}&\lesssim \max \{\langle j_1\rangle , \langle j_2\rangle \}^{\tilde{\mu }} \langle \xi \rangle ^{m-\beta }\,, \end{aligned}$$

for some \(\tilde{\mu }\ge 0\). This concludes the proof of (i). To prove (ii) we first notice that, reasoning as before, one can prove that c is homogeneous of degree \(i+k\ge 3\). More precisely

$$\begin{aligned} c(\varphi ; x, \xi )= \sum _{\begin{array}{c} j_p\in \mathbb Z, p=1, \dots , i+k, \\ \sigma _1 j_1+\dots +\sigma _{i+k} j_{i+k}=\sigma j \end{array}} a^{\sigma }(j, \xi ) b^{\sigma _1, \dots , \sigma _{i+k}}(j, j_1, \dots , j_{i+k})\, \varphi _{j_1}^{\sigma _1} \dots \varphi _{j_{i+k}}^{\sigma _{i+k}} \, e^{\textrm{i} \sigma j x}. \end{aligned}$$

Without loss of generality we can set \(\langle j_1 \rangle =\max _{p=1, \dots , i+k}\{ \langle j_p \rangle \}\). Therefore, for \(s\ge \beta \ge 0 \) we have

$$\begin{aligned} \Vert \partial _{\xi }^{\beta } c(\varphi ; x, \xi ) \Vert ^2_{H^s}&\lesssim \sum _{\begin{array}{c} j\in \mathbb Z \end{array}} |\partial _{\xi }^{\beta } a^{\sigma }(j, \xi )|^2 \left( \sum _{\sigma _1 j_1+\dots +\sigma _{i+k} j_{i+k}=\sigma j} |\varphi _{j_1}^{\sigma _1} \dots \varphi _{j_{i+k}}^{\sigma _{i+k}} | \langle j_{1}\rangle ^{ s+|n|+\mu }\right) ^2 \\ {}&\lesssim \sum _{\begin{array}{c} j\in \mathbb Z \end{array}} | \big (\partial _{\xi }^{\beta } a^{\sigma }(j, \xi ) \big )\,\,(\langle j_1 \rangle ^{s+|n|+\mu } |\varphi _{j_1}|* \dots * |\varphi _{j_{i+k}}|)_j|^2 \\ {}&\lesssim |a|_{\mathcal {N}_s^m} \Vert \varphi \Vert _{H^{s+\tilde{\mu }}}^{2(i+k)}\,, \end{aligned}$$

for some \(\tilde{\mu }\ge 0\). Since c is a multilinear symbol its differential can be computed explicitly and we obtain the bounds (2.42).

Lemma 2.10

Let \(a\in \textbf{SM}_i^m\), \(Q\in \textbf{M}_j^{-\rho }\) and \(A:={Op^{\textrm{BW}}}(a)\circ Q, Q\circ {Op^{\textrm{BW}}}(a)\). Then the following holds.

(i) If \(i=j=1\) then \(A\in \textbf{M}_{2}^{m-\rho }\).

(ii) If i or j are greater than 1 then \(A\in \textbf{NHM}_{i+j}^{m-\rho }\).

Proof

Let us study the case \(A={Op^{\textrm{BW}}}(a)\circ Q\). The other is similar.

(i) By using (2.20), (2.22)–(2.23), (2.38) and (2.30) (with \(p=1\)), (2.31) one deduces that the operator A can be expanded as in (2.30) with \(p=2\) with coefficients as in (2.32) where

$$\begin{aligned} \begin{aligned}&r_{2}^{\sigma _1,\sigma _2}(j, j_1,j_2,j_3):= a^{\sigma _1}(\sigma _1j_1, \frac{j-\sigma _1j_1}{2}) \chi \big (\frac{2|j_1|}{\langle j-\sigma _1j_1\rangle }\big ) Q^{\sigma _2}(j-\sigma _1j_1,j_3,j_2) \\ {}&\sigma _1 j_1+\sigma _2j_2+j_3=j, \end{aligned} \end{aligned}$$

where \(a^{\sigma _1}(p,j)\), \(Q^{\sigma _2}(j,p_1,p_2)\), \(j,p,p_1,p_2\in \mathbb Z\) are respectively the coefficients of the symbol \(a(x,\xi )\) and of the operator Q, and satisfy the bounds (2.39) and (2.34). Recalling (2.21), one can note that

$$\begin{aligned} \chi \big (\frac{2|j_1|}{\langle j-\sigma _1j_1\rangle }\big )\ne 0 \quad \Rightarrow \quad |j_1|\ll |j|\sim |j-\sigma _1j_1|\le |j_2| +|j_3|\lesssim \max _2\{\langle j_1\rangle , \langle j_2\rangle ,\langle j_3\rangle \}. \end{aligned}$$

Therefore, by (2.39) and (2.34), we get

$$\begin{aligned} \begin{aligned} |r_{2}^{\sigma _1,\sigma _2}(j, j_1,j_2,j_3)|&\lesssim |j_1|^{\mu }\langle j-\sigma _1j_1\rangle ^{\; m} \frac{\max _2\{ \langle j_2\rangle ,\langle j_3\rangle \}^{\mu +\rho }}{\max _1\{ \langle j_2\rangle ,\langle j_3\rangle \}^{\rho }} \\ {}&\lesssim \frac{\max _2\{\langle j_1\rangle , \langle j_2\rangle ,\langle j_3\rangle \}^{\tilde{\mu }+\rho }}{\max _1\{\langle j_1\rangle , \langle j_2\rangle ,\langle j_3\rangle \}^{\rho -m}}, \end{aligned} \end{aligned}$$

for some \(\tilde{\mu }\ge \mu \). This implies \(A\in \textbf{M}_{2}^{m-\rho }\).

(ii) It follows by Lemmata 2.2, 2.6, 2.7 and 2.4, 2.8.

Remark 2.11

Given \(a\in \mathbf {S \Sigma }^{\; m_1}_1[r,3]\) and \(b\in \mathbf {S\Sigma }^{\; m_2}_1[r,3]\) and reasoning as in [9, 14], one can note that the remainder \(R(a,b)\in \mathbf {\Sigma }^{\; m_1+m_2-3}_{1}[r, 3]\).

2.2.3 Matrices of Operators

We define some special classes of linear operators on spaces of functions.

Definition 2.12

Let \(\rho \in \mathbb {R}\), \(p=1, 2\), \(q\ge 3\). We denote by \(\textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_{2}(\mathbb C)\) the space of \(2\times 2\) matrices whose entries are operators in \(\textbf{M}^{-\rho }_{p}\). We denote by \(\textbf{SM}_{p}^{\; m}\otimes \mathcal {M}_2(\mathbb {C})\) the space of \(2\times 2\) matrices whose entries are symbols in \(\textbf{SM}_{p}^{\; m}\). Given a matrix of operators \(Q\in \textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_{2}(\mathbb C)\) (resp. \(\textbf{NH}^{-\rho }_{q}[r]\otimes \mathcal {M}_{2}(\mathbb C)\), \(\mathbf {\Sigma }^{-\rho }_{1}[r,3]\otimes \mathcal {M}_{2}(\mathbb C)\)) we shall write

$$\begin{aligned} Q:=(Q_{\sigma }^{\sigma '})_{\sigma ,\sigma '\in \{\pm \}}:= \left( \begin{matrix} Q_{+}^{+} &{} Q_{+}^{-} \\ Q_{-}^{+} &{} Q_{-}^{-} \end{matrix} \right) \end{aligned}$$
(2.44)

where \(Q_{\sigma }^{\sigma '}\in \textbf{M}^{-\rho }_{p}\) denote the entries of the matrix Q.

Similar definitions and notations are used when we consider \(\textbf{NH}^{-\rho }_{q}[r]\), \(\mathbf {\Sigma }^{-\rho }_{1}[r,3]\) instead of \( \textbf{M}^{-\rho }_{p}\).

Throughout the paper we shall consider special subspaces of the matrix-valued operators introduced above.

Definition 2.13

Let \(\rho \in \mathbb {R}\), \(p\in \mathbb {N}\cup \{0\}\), \(0<r\le 1\) and let \(Q\in \textbf{M}^{-\rho }_{p}\).

(i) We define the operator \(\overline{Q}\) as

$$\begin{aligned} \overline{Q}[h]:= \overline{Q[\overline{h}]}, \qquad h\in H^{s}(\mathbb T). \end{aligned}$$
(2.45)

(ii) We say that a matrix of linear operators \(\mathcal {Q}\in \textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_2(\mathbb {C})\) is real-to-real (or reality preserving) if it has the form (see (2.44))

$$\begin{aligned} \mathcal {Q}:=\left( \begin{matrix} Q_{+}^{+} &{} Q_{+}^{-} \\ \overline{Q_{+}^{-}} &{} \overline{Q_{+}^{+}}\end{matrix}\right) \,, \quad \mathrm{i.e.}\quad Q_{-\sigma }^{-\sigma '}=\overline{Q_{\sigma }^{\sigma '}}\,, \quad \forall \, \sigma ,\sigma '\in \{\pm \}\,, \end{aligned}$$
(2.46)

for some \(Q_{+}^{+}, Q_{+}^{-}\in \textbf{M}^{-\rho }_{p}\).

(iii) We say that a matrix of linear operators \(\mathcal {Q}\in \textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_2(\mathbb {C})\) is real if it has the form

$$\begin{aligned} \mathcal {Q}:=\left( \begin{matrix} Q^{+} &{} Q^{-} \\ {Q^{-}} &{} {Q^{+}}\end{matrix}\right) \,, \quad \textrm{where}\quad Q^{+}:={Q_{+}^{+}}\,, \quad Q^{-}:={Q_{+}^{-}}\,, \quad Q^{\sigma }=\overline{Q^{\sigma }} \end{aligned}$$
(2.47)

for some \(Q_{+}^{+}, Q_{+}^{-}\in \textbf{M}^{-\rho }_{p}\).

Similar definitions and notations are used when we consider \(\textbf{NH}^{-\rho }_{q}[r]\), \(\mathbf {\Sigma }^{-\rho }_{1}[r,3]\) instead of \( \textbf{M}^{-\rho }_{p}\).

Recall the spaces \(H^s(\mathbb {T})\) defined in (1.5). One can easily check that a real-to-real linear operator \(\mathcal {Q} \) preserves the spaces \(H^s(\mathbb {T})\). We also observe that a real matrix is real-to-real.

On the space \(H^0(\mathbb T)\) we define the scalar product

$$\begin{aligned} (U,V)_{{{ L}}^2}:= \int _{\mathbb {T}}U\cdot \overline{V}dx, \qquad U={\bigl [{\begin{matrix}u\\ \overline{u}\end{matrix}}\bigr ]}, \quad V={\bigl [{\begin{matrix}v\\ \overline{v}\end{matrix}}\bigr ]}. \end{aligned}$$
(2.48)

Given an operator \(\mathcal {Q}\) of the form (2.47) we denote by \(\mathcal {Q}^{*}\) its adjoint with respect to the scalar product (2.48), i.e.

$$\begin{aligned} (\mathcal {Q}U,V)_{{{ L}}^2}=(U,\mathcal {Q}^{*}V)_{{{ L}}^2}\,, \quad \forall \,\, U,\, V\in H^0(\mathbb T)\,. \end{aligned}$$
(2.49)

One can check that

$$\begin{aligned} \mathcal {Q}^*:=\left( \begin{matrix} (Q^{+})^* &{} (\overline{Q^{-}})^* \\ ({Q^{-}})^* &{} (\overline{Q^{+}})^*\end{matrix}\right) {\mathop {=}\limits ^{(2.47)}} \left( \begin{matrix} (Q^{+})^* &{} ({Q^{-}})^* \\ ({Q^{-}})^* &{} ({Q^{+}})^*\end{matrix}\right) \,, \end{aligned}$$
(2.50)

where \((Q^{+})^*\) and \((Q^{-})^*\) are respectively the adjoints of the operators \(Q^{+}\) and \(Q^{-}\) with respect to the standard complex scalar product on \(L^{2}(\mathbb {T};\mathbb {C})\).
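The identity (2.50) can be tested on a finite dimensional analogue of the setting above. In the following sketch (Python, purely illustrative and not part of the proof) the operators \(Q^{+}, Q^{-}\) are modelled by real matrices, so that the reality condition \(Q^{\sigma }=\overline{Q^{\sigma }}\) of (2.47) holds, and the duality relation (2.49) is checked against the candidate adjoint (2.50).

```python
import numpy as np

# Finite-dimensional sanity check of (2.49)-(2.50): Q^+ and Q^- are modelled by
# real matrices (so that Q^sigma = conj(Q^sigma)), acting on sampled functions.
rng = np.random.default_rng(0)
n = 6
Qp = rng.standard_normal((n, n))   # plays the role of Q^+
Qm = rng.standard_normal((n, n))   # plays the role of Q^-

def Q(u):
    # the real matrix of operators (2.47) applied to U = (u, conj(u))
    return (Qp @ u + Qm @ np.conj(u), Qm @ u + Qp @ np.conj(u))

def Qstar(v):
    # the candidate adjoint (2.50); for real matrices the L^2 adjoint is the transpose
    return (Qp.T @ v + Qm.T @ np.conj(v), Qm.T @ v + Qp.T @ np.conj(v))

def pairing(U, V):
    # discrete analogue of the scalar product (2.48)
    return np.sum(U[0] * np.conj(V[0]) + U[1] * np.conj(V[1]))

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
U, V = (u, np.conj(u)), (v, np.conj(v))

print(abs(pairing(Q(u), V) - pairing(U, Qstar(v))))   # ~ 1e-15
```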

Definition 2.14

Let \(\mathcal {Q}\in \textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_2(\mathbb {C})\) be a real linear operator of the form (2.47). We say that \(\mathcal {Q}\) is:

  • self-adjoint if

    $$\begin{aligned} (Q^{+})^{*}=Q^{+}\,,\;\; \;\; {Q^{-}}=(Q^{-})^{*}\,. \end{aligned}$$
    (2.51)
  • Hamiltonian if

    $$\begin{aligned} \mathcal {Q}=\textrm{i} E \mathcal {A}\,, \qquad E={\bigl [{\begin{matrix}1&{}0\\ 0&{}-1\end{matrix}}\bigr ]}\,, \end{aligned}$$
    (2.52)

    where \(\mathcal {A}\in \textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_2(\mathbb {C})\) is a self-adjoint operator matrix.

Similar definitions are used when we consider operators in \(\textbf{NH}^{-\rho }_{q}[r]\otimes \mathcal {M}_2(\mathbb {C})\), \(\mathbf {\Sigma }^{-\rho }_{1}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\).

Consider now a symbol \(a(x,\xi )\in \mathbf {S\Sigma }_1^{\; m}[r,3]\) and set \(A:={Op^{\textrm{W}}}(a(x,\xi ))\). Using (2.20) one can check that (recall (2.45))

$$\begin{aligned}&\overline{A}={Op^{\textrm{W}}}(\overline{a(x,-\xi )})\,, \qquad A^{*}={Op^{\textrm{W}}}\big (\,\overline{a(x,\xi )}\,\big )\,. \end{aligned}$$
(2.53)

By (2.53) we deduce that the operator A is self-adjoint with respect to the standard scalar product on \(L^{2}(\mathbb {T};\mathbb {C})\) if and only if the symbol \(a(x,\xi )\) is real valued.

We need the following definition. Consider two symbols \(a, b\in \mathcal {N}_s^{\; m}\) and the matrix

$$\begin{aligned} A := A(x,\xi ):= \left( \begin{matrix} a(x,\xi ) &{} b(x,\xi ) \\ \overline{b(x,-\xi )} &{} \overline{a(x,-\xi )} \end{matrix} \right) \,. \end{aligned}$$

Define the operator

$$\begin{aligned} M:={Op^{\textrm{W}}}(A(x,\xi )):= \left( \begin{matrix} {Op^{\textrm{W}}}(a(x,\xi )) &{} {Op^{\textrm{W}}}(b(x,\xi )) \\ {Op^{\textrm{W}}}(\overline{b(x,-\xi )}) &{} {Op^{\textrm{W}}}(\overline{a(x,-\xi )}) \end{matrix} \right) \,. \end{aligned}$$
(2.54)

For the matrix of pseudo-differential operators defined above the following facts hold:

  • Real-to-real: by (2.53) we have that the operator M in (2.54) has the form (2.46), hence it is real-to-real;

  • Self-adjointness: using (2.53) the operator M in (2.54) is self-adjoint with respect to the scalar product (2.48) if and only if (recall (2.51))

    $$\begin{aligned} a(x,\xi )=\overline{a(x,\xi )}\,, \qquad b(x,-\xi )=b(x,\xi )\,. \end{aligned}$$
    (2.55)
  • Reality: if both the symbols \(a(x,\xi ), b(x,\xi )\) are real valued we have that the operator M in (2.54) has the form (2.47), hence it is real and self-adjoint.

Definition 2.15

(Symplectic map) Let \(\mathcal {Q}\in \textbf{M}^{-\rho }_{p}\otimes \mathcal {M}_2(\mathbb {C})\) (resp. \(\textbf{NH}^{-\rho }_{p}[r]\otimes \mathcal {M}_2(\mathbb {C})\) or \(\mathbf {\Sigma }^{-\rho }_{1}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\)) be a real-to-real (or real) linear operator of the form (2.46) (or (2.47)). We say that \(\mathcal {Q}\) is symplectic if

$$\begin{aligned} \mathcal {Q}^{*}(\textrm{i} E )\mathcal {Q}= \textrm{i} E\,. \end{aligned}$$
(2.56)

3 Birkhoff Normal Form

In this section we construct a suitable normal form for the Klein–Gordon Hamiltonian H in (2.12). This is the content of Proposition 3.9. This normal form procedure presents small divisor problems; estimates on such small divisors are provided by Lemma 3.8. In Sect. 3.1 we recall some properties of homogeneous Hamiltonian functions. In Sect. 3.3 we construct the oscillating functions which approximate the dynamics of the Klein–Gordon equation.

3.1 Homogeneous Hamiltonians

Let F be a homogeneous Hamiltonian of the form

$$\begin{aligned} F(Z)=\sum _{\pi (\mathbf {\sigma }\,, \textbf{j})=0} F_{\mathbf {\sigma }\,, \textbf{j}} \,z_{j_1}^{\sigma _1}\dots z_{j_n}^{\sigma _n}\,, \qquad Z={\bigl [{\begin{matrix}z\\ \overline{z}\end{matrix}}\bigr ]}\,, \end{aligned}$$
(3.1)

where \(\mathbf {\sigma }:=(\sigma _1, \dots , \sigma _n)\) is an n-dimensional vector of signs \(\{\pm \}\), \(\textbf{j}:=(j_1, \dots , j_n)\in \mathbb Z^{n}\), \(z_j^+:=z_j\), \(z_j^-:=\overline{z_j}\), \(F_{\mathbf {\sigma }, \textbf{j}}\in \mathbb {C}\) and

$$\begin{aligned} \pi (\mathbf {\sigma }\,, \textbf{j}):=\sum _{i=1}^n \sigma _i j_i\,. \end{aligned}$$
(3.2)

The equality \(\pi (\mathbf {\sigma }, \textbf{j})=0\) encodes the fact that F commutes with the momentum Hamiltonian

$$\begin{aligned} M:=-\frac{\textrm{i}}{2} ( Z_x,\overline{Z}\,)_{H^0(\mathbb T)}=\sum _{j\in \mathbb {Z}} j\,|z_j|^2. \end{aligned}$$

If we want to highlight the degree of homogeneity of the Hamiltonian F we shall write \(F=F^{(n)}\).

Given a subset S of \(\mathbb {Z}\), we define, for \(0\le k\le n\),

$$\begin{aligned} \mathcal {A}_{n, k}:=\big \{ (\mathbf {\sigma }, \textbf{j})\in \{ \pm \}^n\times \mathbb Z^{n}: \text{ there are exactly } k \text{ indexes } j_i \text{ that do not belong to } S, \,\,\pi (\mathbf {\sigma }, \textbf{j})=0 \big \}. \end{aligned}$$

We define also

$$\begin{aligned} \mathcal {A}_{n, \le k}:= \bigcup _{i\le k} \mathcal {A}_{n, i}, \qquad \mathcal {A}_{n, \ge k}:= \bigcup _{i\ge k} \mathcal {A}_{n, i}, \end{aligned}$$

and

$$\begin{aligned} F^{(n, k)}=\sum _{\mathcal {A}_{n, k}} F_{\mathbf {\sigma }, \textbf{j}} \,z_{j_1}^{\sigma _1}\dots z_{j_n}^{\sigma _n}. \end{aligned}$$

Analogously one can define \(F^{(n, \le k)}\) and \(F^{(n, \ge k)}\) replacing \(\mathcal {A}_{n, k}\) with \(\mathcal {A}_{n, \le k}\) and \(\mathcal {A}_{n, \ge k}\) respectively. We define the projections

$$\begin{aligned} \Pi ^{(n, k)} F=F^{(n, k)}\,, \qquad \Pi ^{(n, \le k)} F=F^{(n, \le k)}\,, \qquad \Pi ^{(n, \ge k)} F=F^{(n, \ge k)}\,. \end{aligned}$$
(3.3)

Remark 3.1

By the conservation of momentum, if S is a finite set then \(\mathcal {A}_{n, \le 1}\) is finite, for all \(n \in \mathbb {N}\).

Definition 3.2

(Resonances) Let \(n\ge 0\). We say that \((\mathbf {\sigma }, \textbf{j})\in \{ \pm \}^n\times \mathbb Z^{n}\) is a resonance, or an n-resonance, if

$$\begin{aligned} \Omega (m; \mathbf {\sigma }, \textbf{j}):=\Omega (\mathbf {\sigma }\,, \textbf{j}) :=\sum _{i=1}^n \sigma _i \Lambda (j_i)=0\,, \qquad \pi (\mathbf {\sigma }, \textbf{j})=0\,. \end{aligned}$$
(3.4)

We say that an n-resonance \((\mathbf {\sigma }, \textbf{j})\) is trivial if n is even and, up to permutations, one has

$$\begin{aligned} \textbf{j}=(j, j, k, k, \dots ), \qquad \mathbf {\sigma }=(+, -, +, -, \dots ) . \end{aligned}$$

Definition 3.3

(Resonant monomials)

\(\bullet \) We say that \(z_{j_1}^{\sigma _1}\dots z_{j_n}^{\sigma _n}\) is a resonant monomial if \((\mathbf {\sigma }, \textbf{j})\) is a resonance.

\(\bullet \) A resonant monomial \(z_{j_1}^{\sigma _1}\dots z_{j_n}^{\sigma _n}\) is said to be action-preserving if it depends only on the squared moduli (the actions) \(|z_{j_i}|^2\).

\(\bullet \) Given a homogenous Hamiltonian F of degree n as in (3.1) we denote by \(F_{\text{ res }}\) the projection of F on the space of resonant monomials.

Remark 3.4

Using the conservation of momentum, one can prove that actually all 4-resonances are trivial. This means that the only resonant monomials of degree 4 are action-preserving. We prove this in Lemma 3.8-(iv).

Definition 3.5

We denote by \(\text{ ad}_{H^{(2)}}(\cdot ):=\{ \cdot , H^{(2)} \}\) the adjoint action of the Hamiltonian \(H^{(2)}\). We denote by \(\Pi _{\text{ Ker }}\) and \(\Pi _{\text{ Rg }}\) the projections on the kernel and the range of the adjoint action respectively. We observe that \(\Pi _{\text{ Ker }}\) is the projection on the space of resonant monomials.

Definition 3.6

Let E be a subset of \(\mathbb {Z}\). Let us define the subspace

$$\begin{aligned} \textbf{V}_E:=\Big \{ Z=\frac{1}{\sqrt{2\pi }}\sum _{j\in \mathbb {Z}} Z_j\,e^{\textrm{i} j x} : Z_j=0 \,\,\,j\notin E \Big \}\,. \end{aligned}$$
(3.5)

We denote with \(\Pi _{\textbf{V}_E}\) the projection on the subspace \(\textbf{V}_E\).

If E is a finite subset and a vector field X has image contained in \(\textbf{V}_E\) then we say that X has finite rank or it is a finite rank vector field.

In the following lemma we provide estimates for the vector fields associated to the generators of the Birkhoff maps. With an abuse of notation we denote \(L^{\infty }(\mathbb {T})\times L^{\infty }(\mathbb {T})\) by \(L^{\infty }(\mathbb {T})\).

Lemma 3.7

Let \(n\ge 3\) and \(S\subset \mathbb {Z}\) be a finite set. Let F be a homogeneous Hamiltonian of degree n as in (3.1) such that

$$\begin{aligned} F=F^{(n , \le 1)}\,\quad \textrm{and}\qquad [ \! [F ] \! ]:=\sup _{(\mathbf {\sigma }, \textbf{j})} |F_{\mathbf {\sigma }, \textbf{j}}|<\infty \,. \end{aligned}$$
(3.6)

Then the following holds:

  1. (i)

    We have the estimates

    $$\begin{aligned} \begin{aligned} \Vert X_{F}(Z) \Vert _{H^{s}}&\lesssim _s [ \! [F ] \! ]\Vert Z \Vert _{L^{\infty }}^{n-2}\, \Vert Z \Vert _{{H}^0}, \quad \Vert X_{F}(Z) \Vert _{L^{\infty }}\lesssim [ \! [F ] \! ]\Vert Z \Vert ^{n-1}_{L^{\infty }}, \\ \quad \forall Z\in H^0(\mathbb T)\cap L^{\infty }(\mathbb T), \\ \Vert (dX_{F})(Z)[ \widetilde{Z} ]\Vert _{H^{s}}&\lesssim _s [ \! [F ] \! ]\Vert Z \Vert _{L^{\infty }}^{n-2}\, \Vert \widetilde{Z}\Vert _{H^0}, \qquad \forall \widetilde{Z}\in H^0(\mathbb T), \\ \Vert (dX_{F})(Z)[ \widetilde{Z} ]\Vert _{L^{\infty }}&\lesssim [ \! [F ] \! ]\Vert Z \Vert _{L^{\infty }}^{n-2}\,\Vert \widetilde{Z} \Vert _{L^{\infty }}, \qquad \forall \widetilde{Z}\in L^{\infty }(\mathbb T). \end{aligned} \end{aligned}$$
  2. (ii)

    If G is another homogeneous Hamiltonian of degree m then \(\{ F, G \}\) is a homogeneous Hamiltonian of degree \(n+m-2\) whose vector field has finite rank, and the following estimate holds

    $$\begin{aligned} \Vert X_{\{ F, G \}} \Vert _{H^{s}}\lesssim _s [ \! [F ] \! ][ \! [G ] \! ]\Vert Z \Vert _{L^{\infty }}^{n+m-4}\,\Vert Z\Vert _{H^0}\, \quad \quad \forall Z\in H^0(\mathbb T)\cap L^{\infty }(\mathbb T). \end{aligned}$$

Proof

By Remark 3.1 the assumption (3.6) guarantees that the vector field \(X_F\) has finite rank. In particular its image is contained in \(\textbf{V}_{E_n}\) (recall (3.5)) where \(E_n:=(n-1)\,S\). Then \(\Vert X_F(Z) \Vert _{H^s}\le C \Vert X_F(Z)\Vert _{H^0}\) where \(C>0\) is a constant depending on the index s and on the set S. For some constant \(C_*>0\) we have

$$\begin{aligned} \Vert X_F(Z)\Vert _{H^0} \le C_* [ \! [F ] \! ]\Vert \widehat{Z} * \dots * \widehat{Z} \Vert _{\ell ^2} =C_*\,[ \! [F ] \! ]\Vert Z^{n-1} \Vert _{H^0} \le C_* [ \! [F ] \! ]\Vert Z \Vert ^{n-2}_{L^{\infty }} \Vert Z \Vert _{{H}^0}, \end{aligned}$$

where we denoted by \(\widehat{Z}\) the sequence of the Fourier coefficients of Z. This concludes the proof of the first bound in item (i). The third follows similarly using the fact that \(X_{F}\) is a multilinear operator (see (3.1)).

Regarding the second bound, we have that \(\Vert Z \Vert _{L^{\infty }}\le \Vert Z \Vert _{\ell ^1}\), where

$$\begin{aligned} \Vert Z \Vert _{\ell ^1}:=\frac{1}{\sqrt{2\pi }}\sum _{j\in \mathbb Z} |Z_j|. \end{aligned}$$

Therefore, by Young's inequality,

$$\begin{aligned} \Vert X_{F}(Z)\Vert _{L^{\infty }}\le \Vert X_{F}(Z)\Vert _{\ell ^1} \lesssim \frac{[ \! [F ] \! ]}{\sqrt{2\pi }} \sum _{j\in E} |\sum _{\sum _{i=1}^{n-1}\sigma _i j_i=j} z^{\sigma _1}_{j_1}\dots z_{j_{n-1}}^{\sigma _{n-1}} | \lesssim [ \! [F ] \! ]\Big ( \frac{1}{\sqrt{2\pi }}\sum _{j\in E_n} |Z_j| \Big )^{n-1}, \end{aligned}$$

for some \(E\subseteq E_n\). By noticing that \(|Z_j|\le \sqrt{2\pi } \Vert Z \Vert _{L^{\infty }}\) we get the second bound in (i). The fourth bound is obtained in the same way using that \(d X_F\) is a homogenous polynomial of degree \(n-1\).

Item (ii) follows using formula (2.15) and reasoning as in the proof of item (i). We observe that the support of the Hamiltonian \(\{ F, G \}\) is contained in the support of F, which is finite, hence \(X_{\{ F, G\}}\) has finite rank.

The following lemma provides lower bounds on the function \(\Omega (\mathbf {\sigma }, \textbf{j})\) in (3.4).

Lemma 3.8

(Small divisors) Let S be a finite subset of \(\mathbb {Z}\). There exists a full measure set \(\mathcal {M}\subset [1, 2]\) such that for all \(m\in \mathcal {M}\) the following holds:

  1. (i)

    if \(n=3,5\) then \(\Omega (\mathbf {\sigma }, \textbf{j})\ne 0\) for all \((\mathbf {\sigma }, \textbf{j})\in \mathcal {A}_{n, \le 1}\);

  2. (ii)

    if \(n=4\) and \((\mathbf {\sigma }, \textbf{j})\in \mathcal {A}_{n, \le 1}\) is such that \(\Omega (\mathbf {\sigma }, \textbf{j})= 0\) then \((\mathbf {\sigma }, \textbf{j})\in \mathcal {A}_{n, 0}\). Moreover the 4-resonances are trivial, namely they have the following form (up to permutations)

    $$\begin{aligned} \textbf{j}=(j, j, k, k)\,, \quad \mathbf {\sigma }=(+, -, +, -)\,. \end{aligned}$$
    (3.7)
  3. (iii)

    If \(n=3\) and \(\pi (\mathbf {\sigma },\textbf{j})=0\) then

    $$\begin{aligned} |\Omega (\mathbf {\sigma }, \textbf{j})| \ge \dfrac{m}{2 \sqrt{j^2+m}}\,, \end{aligned}$$
    (3.8)

    where \(|j|=\min _{i=1, 2, 3} |j_i|\).

  4. (iv)

    If \(n=4\), \(\pi (\mathbf {\sigma },\textbf{j})=0\) and \((\mathbf {\sigma }, \textbf{j})\) is not a trivial resonance (recall Def. 3.2) then

    $$\begin{aligned} |\Omega (\mathbf {\sigma }, \textbf{j})| \ge c_* \dfrac{m}{ (\sqrt{j^2+m})^3}\,, \end{aligned}$$
    (3.9)

    where \(|j|=\min _{i=1, 2, 3, 4} |j_i|\) and \(c_*>0\) is an absolute constant.

  5. (v)

    For any \(m\in \mathcal {M}\) and for \(n=5\) one has

    $$\begin{aligned} |\Omega (m; \mathbf {\sigma }, \textbf{j})|> \gamma \,, \end{aligned}$$
    (3.10)

    for some \(\gamma >0\) depending only on S, for any \((\mathbf {\sigma }, \textbf{j})\in \mathcal {A}_{5, \le 1}\).

Proof

First of all, given \((\mathbf {\sigma }, \textbf{j})\in \{ \pm \}^n\times \mathbb Z^{n}\), \(n\ge 3\), without loss of generality we may assume \(|j_1|\ge \dots \ge |j_n|\ge 0\). We have to give lower bounds on the combinations \(\Omega (\mathbf {\sigma }, \textbf{j})\) for any possible choice of the signs \(\sigma _i\). To simplify the notation we shall write (recall (3.2), (3.4))

$$\begin{aligned} \Omega _{\sigma _1, \dots , \sigma _n}(m):=\Omega (m; \mathbf {\sigma }, \textbf{j}), \qquad \pi _{\sigma _1, \dots , \sigma _n}:=\pi (\mathbf {\sigma }, \textbf{j}). \end{aligned}$$

Proof of item (iii). The case of equal signs is immediate, since then \(|\Omega (\mathbf {\sigma }, \textbf{j})|\ge 3\sqrt{m}\), and the remaining configurations are obtained from \((-,+,+)\), \((+,+,-)\), \((+,-,+)\) by a global change of sign. We then observe that \(\Omega _{++-}(m), \Omega _{+-+}(m) \ge \Omega _{-++}(m)\). Hence we can find a bound from below just for \(\Omega _{-++}(m)\). First we note that

$$\begin{aligned} \Omega _{-++}(0)=-|j_1|+|j_2|+|j_3|\ge 0, \end{aligned}$$

since \(\pi _{-++}=-j_1+j_2+j_3=0\). Secondly, we note that

$$\begin{aligned} \Omega '_{-++}(m)= \frac{1}{2} \left( -\frac{1}{\Lambda (j_1)}+\frac{1}{\Lambda (j_2)}+\frac{1}{\Lambda (j_3)} \right) \ge \frac{1}{2 \Lambda (j_3)}, \end{aligned}$$

where \(\Lambda (j)\) is in (2.5). Then we have

$$\begin{aligned} \Omega _{-++}(m)= \Omega _{-++}(0)+\int _0^m \Omega '_{-++}(\widetilde{m})\,d\widetilde{m}\ge \frac{m}{2 \sqrt{j_3^2+m}}, \end{aligned}$$

which is the bound (3.8).
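The bound (3.8) is also easy to test numerically. The following sketch (Python; an illustrative check, not a proof) scans all momentum-conserving triples up to a cutoff, for a few sampled masses in [1, 2], and compares \(|\Omega (\mathbf {\sigma }, \textbf{j})|\) with the right hand side of (3.8).

```python
import numpy as np
from itertools import product

# Illustrative check of the three-wave bound (3.8) on a finite range of indices
# and a few sampled masses; a numerical sanity check only.
def Lam(j, m):
    return np.sqrt(j**2 + m)

J, worst = 20, np.inf
for m in np.linspace(1.0, 2.0, 5):
    for s1, s2, s3 in product((1, -1), repeat=3):
        for j1, j2 in product(range(-J, J + 1), repeat=2):
            j3 = -s3 * (s1*j1 + s2*j2)      # momentum conservation fixes the last index
            if abs(j3) > J:
                continue
            Omega = s1*Lam(j1, m) + s2*Lam(j2, m) + s3*Lam(j3, m)
            jmin = min(abs(j1), abs(j2), abs(j3))
            worst = min(worst, abs(Omega) / (m / (2*Lam(jmin, m))))

print(worst)   # observed to stay >= 1 on this range, consistently with (3.8)
```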

Proof of item (iv). The interesting cases are those with three or with two plus signs. For the first case we reason as before. We have \(\Omega _{+-++}(m),\, \Omega _{++-+}(m),\, \Omega _{+++-} (m)\ge \Omega _{-+++}(m)\), hence it suffices to study \(\Omega _{-+++}(m)\). Recalling again that

$$\begin{aligned} \pi _{-+++}=-j_1+j_2+j_3+j_4=0, \end{aligned}$$

we observe that

$$\begin{aligned} \Omega _{-+++}(0)=-|j_1|+|j_2|+|j_3|+|j_4|\ge 0, \quad \Omega '_{-+++}(m)\ge \frac{1}{2\Lambda (j_4)}. \end{aligned}$$

Then we obtain

$$\begin{aligned} \Omega _{-+++}(m)\ge \frac{m}{2\sqrt{j_4^2+m}}. \end{aligned}$$

For the case of 2 plus signs we have \(\Omega _{+-+-}(m), \Omega _{++--}(m)\ge \Omega _{+--+}(m)\). All the other cases are obtained by a global change of sign. Let us now write

$$\begin{aligned} \Omega _{+--+}(m):=\Omega (m) =\sqrt{j_1^2+m}-\sqrt{j_2^2+m}-\sqrt{j_3^2+m}+\sqrt{j_4^2+m}. \end{aligned}$$

We notice that \(\Omega _{+--+}(m)\) vanishes identically on 4-resonances \((\mathbf {\sigma }, \textbf{j})\) of the form

$$\begin{aligned} \textbf{j}=(j, \pm j, k, \pm k), \qquad \mathbf {\sigma }=(+, -, -, +) , \end{aligned}$$

up to permutations. Actually we claim that the above resonances are trivial. Indeed, up to permutations, we have the following cases.

The first case is when \(\textbf{j}=(j, j, k, k)\) or \(\textbf{j}=(j, j, -k, -k)\), \(\mathbf {\sigma }=(+,-,-,+)\) which are obviously trivial. The second case is when \(\textbf{j}=(j, -j, k, - k)\), \(\mathbf {\sigma }=(+,-,-,+)\). Then (recall (3.2)), by the conservation of momentum, we have \(\pi (\mathbf {\sigma }, \textbf{j})=2j-2k=0\), which implies that \(j=k\). Then \((\mathbf {\sigma }, \textbf{j})\) has the form (3.7). The other case is \(\textbf{j}=(j, -j, k, k)\), \(\mathbf {\sigma }=(+,-,-,+)\). Then \(\pi (\mathbf {\sigma }, \textbf{j})=2j=0\). Again \((\mathbf {\sigma }, \textbf{j})\) has the form (3.7). This proves the claim.

Hence, recalling the momentum conservation \(j_1-j_2-j_3+j_4=0\), and assuming that \((\mathbf {\sigma }, \textbf{j})\) is not a trivial resonance (see Def. 3.2), we can follow Lemma 7.2 in [7] to obtain the lower bound

$$\begin{aligned} \Omega (m)\ge c_*\frac{m}{(\sqrt{j_4^2+m})^3}\, \end{aligned}$$

for some absolute constant \(c_*>0\). This implies (3.9) and concludes the proof of item (iv).
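Although the lower bound (3.9) is borrowed from Lemma 7.2 in [7], a quick numerical experiment gives a feeling for the size of \(c_*\). The sketch below (Python, illustrative only) scans momentum-conserving quadruples up to a cutoff, discards the trivial resonances of Definition 3.2 and records the smallest observed value of \(|\Omega |\,(\sqrt{j^2+m})^{3}/m\), with \(|j|\) the smallest index in modulus; on the sampled range this quantity stays bounded away from zero, consistently with (3.9).

```python
import numpy as np
from itertools import product
from collections import Counter

# Illustrative check of (3.9): non-trivial momentum-conserving quadruples only.
def Lam(j, m):
    return np.sqrt(j**2 + m)

def is_trivial(sig, jj):
    # trivial resonances pair up, up to permutation, as (+, j), (-, j), (+, k), (-, k)
    plus = Counter(j for s, j in zip(sig, jj) if s == 1)
    minus = Counter(j for s, j in zip(sig, jj) if s == -1)
    return plus == minus

J, c_star = 10, np.inf
for m in (1.0, 1.5, 2.0):
    for sig in product((1, -1), repeat=4):
        for j1, j2, j3 in product(range(-J, J + 1), repeat=3):
            j4 = -sig[3] * (sig[0]*j1 + sig[1]*j2 + sig[2]*j3)
            jj = (j1, j2, j3, j4)
            if abs(j4) > J or is_trivial(sig, jj):
                continue
            Omega = sum(s * Lam(j, m) for s, j in zip(sig, jj))
            jmin = min(abs(j) for j in jj)
            c_star = min(c_star, abs(Omega) * Lam(jmin, m)**3 / m)

print(c_star)   # an empirical lower bound for c_* on this finite range
```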

Proof of item (ii). By item (iv) proved above we know that \(\Omega (\mathbf {\sigma }, \textbf{j})= 0\) only if \((\mathbf {\sigma }, \textbf{j})\) is a trivial resonance, i.e. if it has the form (3.7). This implies item (ii).

Proof of items (i) and (v). By section 5 in [22] we have that for almost every mass m, there are \(\texttt{c}>0\) and \(N\in \mathbb N\) such that (see (3.4))

$$\begin{aligned} |\Omega (m; \mathbf {\sigma }, \textbf{j})|\ge \texttt{c}(1+|j_1|^{2}+\cdots +|j_{n}|^{2})^{-N} \end{aligned}$$

for any \((\mathbf {\sigma }, \textbf{j})\in \{ \pm \}^n\times \mathbb Z^{n}\) and n odd. We are only interested in the case \(n=3,5\). Moreover, taking \((\mathbf {\sigma }, \textbf{j})\in \mathcal {A}_{n,\le 1}\), \(n=3,5\), we have, by Remark 3.1, that only a finite number of choices of \((\mathbf {\sigma }, \textbf{j})\) are possible. Hence, the bound above implies that actually \(|\Omega (m; \mathbf {\sigma }, \textbf{j})|\ge \gamma \) for some constant \(\gamma \) depending only on the set of indexes S. Therefore bound (3.10) and item (i) follow. This concludes the proof.

3.2 Partial Birkhoff Normal Form

In this subsection we prove the following result.

Proposition 3.9

Let S be a finite subset of \(\mathbb {Z}\) and \(\mathcal {M}\) the set defined in Lemma 3.8. Then for all \(m\in \mathcal {M}\) and for all \(s\ge 0\) there exists \(C(s)>0\) such that for all \(r>0\) satisfying

$$\begin{aligned} C(s)\, r \le 1\,, \end{aligned}$$
(3.11)

the following holds. There exists a finite set \(E\subset \mathbb {Z}\) such that \(S\subset E\) and an analytic, invertible, symplectic change of variables \(\Phi _B:B_{1}(H^s(\mathbb T))\cap B_r(L^{\infty }(\mathbb T))\rightarrow B_{2}(H^s(\mathbb {T}))\cap B_{2 r}(L^{\infty }(\mathbb T))\) of the form

$$\begin{aligned} \Phi _B=\textrm{I}+\Pi _{\textbf{V}_E} \Phi \Pi _{\textbf{V}_E}\,, \qquad \Phi _B(Z)=:W\,, \end{aligned}$$
(3.12)

with

$$\begin{aligned} \Vert \Phi _B(Z)-Z \Vert _{H^{s}}&\lesssim _s r \Vert Z \Vert _{H^0}\,, \end{aligned}$$
(3.13)
$$\begin{aligned} \Vert (d\Phi _B(Z)) \widetilde{Z}-\widetilde{Z}\Vert _{H^s}&\lesssim _s r \Vert \widetilde{Z} \Vert _{H^0}\,, \qquad \forall \, \widetilde{Z}\in {H^0(\mathbb T)}\,, \end{aligned}$$
(3.14)

such that the Hamiltonian H in (2.12) is transformed in

$$\begin{aligned} \mathcal {H}:=H\circ \Phi _B^{-1}=H^{(2)}+H^{(3, \ge 2)} + \mathcal {H}_{\text{ res }}^{(4, 0)} +\mathcal {H}^{(4, \ge 2)}+\mathcal {H}^{(5, \ge 2)}+\mathfrak {R}^{(\ge 6)}\,, \end{aligned}$$
(3.15)

where \(\mathcal {H}^{(4, \ge 2)}, \mathcal {H}^{(5, \ge 2)}\) and \(\mathfrak {R}^{(\ge 6)}\) generate finite rank vector fields. Moreover

  1. (i)

    \( \mathcal {H}_{\text{ res }}^{(4, 0)}\) is an action-preserving Hamiltonian of the form

    $$\begin{aligned} \mathcal {H}_{\text{ res }}^{(4, 0)}(W)=\frac{1}{2}\sum _{j, k\in S} \texttt{C}_{j k} |w_j|^2 |w_k|^2\,, \qquad \texttt{C}_{j k}\in \mathbb {R}\,, \quad \texttt{C}_{j k}=\texttt{C}_{k j}\,. \end{aligned}$$
    (3.16)
  2. (ii)

    The vector field \(X_{\mathfrak {R}^{(\ge 6)}}\) is a smooth function with a zero at the origin of order at least 5 and

    $$\begin{aligned} \begin{aligned} \Vert X_{\mathfrak {R}^{(\ge 6)}} (W) \Vert _{H^{s}}&\lesssim _{s} \Vert W\Vert _{L^{\infty }}^{4}\Vert W\Vert _{H^{s}}\,, \qquad \forall \,W\in H^s(\mathbb {T})\,. \end{aligned} \end{aligned}$$
    (3.17)

    Moreover the Hamiltonians \(\mathcal {H}^{(4,\ge 2)}, \mathcal {H}^{(5,\ge 2)}\) are homogeneous of degree 4 and 5 respectively, and their vector fields \(X_{\mathcal {H}^{(4,\ge 2)}}, X_{\mathcal {H}^{(5,\ge 2)}}\) satisfy

    $$\begin{aligned} \begin{aligned} \Vert X_{\mathcal {H}^{(4,\ge 2)}}(W) \Vert _{H^s}&\lesssim _s \Vert W\Vert _{L^{\infty }}^{2}\Vert W\Vert _{H^{s}}\,, \\ \Vert X_{\mathcal {H}^{(5,\ge 2)}}(W) \Vert _{H^s}&\lesssim _s \Vert W\Vert _{L^{\infty }}^{3}\Vert W\Vert _{H^{s}}\,. \end{aligned} \end{aligned}$$
    (3.18)
  3. (iii)

    Let \(\varphi \in B_{r}(H^s(\mathbb {T}))\). Then the linearized operator \((dX_{\mathfrak {R}^{(\ge 6)}})(\varphi )[\cdot ]\) belongs to the class \(\textbf{NH}_{4}^{-\rho }[r]\) (see Def. 2.3) for any \(\rho \ge 0\). In particular one has

    $$\begin{aligned} \begin{aligned} \Vert (dX_{\mathfrak {R}^{(\ge 6)}})(\varphi )[W]\Vert _{H^s}&\lesssim _{s}\Vert \varphi \Vert _{H^s}^{4}\Vert W\Vert _{H^s}\,, \\ \Vert (d^{2}X_{\mathfrak {R}^{(\ge 6)}})(\varphi )[W_1,W_2]\Vert _{H^s}&\lesssim _{s}\Vert \varphi \Vert _{H^s}^{3}\Vert W_1\Vert _{H^s}\Vert W_2\Vert _{H^s}\,, \end{aligned} \end{aligned}$$
    (3.19)

    for any \(W,W_1,W_2\in H^s(\mathbb {T})\).

The inverse map \(\Phi _{B}^{-1}\) fulfils bounds like (3.13)–(3.14).

Proof

The map \(\Phi _B\) is constructed in three steps.

First step: Let us consider the homogeneous Hamiltonian (recall (3.4))

$$\begin{aligned} F^{(3)}=\sum _{\mathcal {A}_{3, \le 1}} F^{(3)}_{\mathbf {\sigma }, \textbf{j}} \,z_{j_1}^{\sigma _1} z_{j_2}^{\sigma _2} z_{j_3}^{\sigma _3}, \qquad F^{(3)}_{\mathbf {\sigma }, \textbf{j}}:= {\left\{ \begin{array}{ll} \dfrac{H_{\mathbf {\sigma }, \textbf{j}}^{(3, \le 1)}}{\textrm{i} \Omega (\mathbf {\sigma }, \textbf{j})} \qquad \text{ if }\,\,\Omega (\mathbf {\sigma }, \textbf{j})\ne 0,\\ 0 \qquad \quad \text{ if }\,\,\Omega (\mathbf {\sigma }, \textbf{j})=0. \end{array}\right. } \end{aligned}$$

By definition the Hamiltonian \(F^{(3)}\) solves the homological equation

$$\begin{aligned} \{ F^{(3)}, H^{(2)} \}+H^{(3, \le 1)}=\Pi _{\text{ Ker }} H^{(3, \le 1)}\,. \end{aligned}$$
(3.20)

By item (i) of Lemma 3.8 we have that \(\Pi _{\text{ Ker }} H^{(3, \le 1)}=0\). We observe that by Remark 3.1 the equation \(\dot{Z}=X_{F^{(3)}}(Z)\) is an analytic finite dimensional ODE. In particular the image of \(X_{F^{(3)}}\) is contained in a subspace of the form (3.5). Then its flow \(\Phi ^t_3\) is well defined, at least for small times, and analytic. Since \(F^{(3)}\) is a homogeneous Hamiltonian of degree 3, the vector field \(X_{F^{(3)}}\) is a homogeneous function of degree 2. Moreover by the lower bound (3.8) on the three-wave small divisors and the fact that \([ \! [H^{(3)} ] \! ]\lesssim 1\) we have

$$\begin{aligned}{}[ \! [F^{(3)} ] \! ]\lesssim 1\,. \end{aligned}$$
(3.21)
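The construction of \(F^{(3)}\) is completely explicit and, by Remark 3.1, involves only finitely many monomials. The following schematic sketch (Python; the coefficients of \(H^{(3, \le 1)}\) are placeholders, and only one arrangement of the index allowed outside S is shown) illustrates the book-keeping: the last index is fixed by momentum conservation and each coefficient is divided by the corresponding small divisor \(\textrm{i}\,\Omega (\mathbf {\sigma }, \textbf{j})\).

```python
import numpy as np
from itertools import product

# Schematic version of the first normal form step (3.20). The coefficients of
# H^(3,<=1) are placeholders (set to 1): only the enumeration of A_{3,<=1} and
# the division by the small divisors i*Omega are meant to be illustrative.
m = 1.3
S = [-2, -1, 1, 2]                        # a sample finite symmetric set
Lam = lambda j: np.sqrt(j**2 + m)

F3 = {}
for sig in product((1, -1), repeat=3):
    for j1, j2 in product(S, repeat=2):
        # at most one index outside S: here the third one, fixed by momentum
        j3 = -sig[2] * (sig[0]*j1 + sig[1]*j2)
        jj = (j1, j2, j3)
        Omega = sum(s * Lam(j) for s, j in zip(sig, jj))
        if abs(Omega) < 1e-10:
            continue                      # resonant monomials are not removed
        H3_coeff = 1.0                    # placeholder for H^(3,<=1)_{sigma, j}
        F3[(sig, jj)] = H3_coeff / (1j * Omega)

print(len(F3), "monomials in the generator F^(3)")
```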

Since \(F^{(3)}\) is homogeneous, in a sufficiently small neighbourhood of the origin the flow \(\Phi ^t_3\) is defined for times \(t\in [0, 1]\). We call \(\Phi _3:=\Phi ^1_{F^{(3)}}\). We claim that

$$\begin{aligned} \Vert \Phi ^t_3(Z) \Vert _{H^s}\le 2 \Vert Z \Vert _{H^s}\,, \qquad \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }}\le 2 \Vert Z \Vert _{L^{\infty }}\,, \qquad \forall t\in [0, 1]\,, \end{aligned}$$
(3.22)

for \(Z\in B_1(H^s(\mathbb {T}))\cap B_r(L^{\infty }(\mathbb {T}))\) for \(r>0\) small enough.

We start by proving the bound for the \(L^{\infty }\)-norm. We use a bootstrap argument. Let us call

$$\begin{aligned} T_*:=\sup \big \{ T\ge 0: \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }}\le 2 \Vert Z \Vert _{L^{\infty }}\,\,\,\forall \, t\in [0, T] \big \}. \end{aligned}$$

We observe that \(T_*>0\). Assume by contradiction that \(T_*\le 1\). We shall prove that for all \(t\in [0, T_*]\) a strictly better estimate on \( \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }}\) holds, which forces \(T_*\) to be greater than 1. We have that

$$\begin{aligned} \Phi ^t_3(Z)=Z+\int _0^t X_{F^{(3)}}(\Phi _3^{\tau }(Z))\,d\tau , \end{aligned}$$
(3.23)

then, by Lemma 3.7-(i) and using that \(Z\in B_1(H^s(\mathbb {T}))\cap B_r(L^{\infty }(\mathbb {T}))\), we have for \(t\in [0, T_*)\)

$$\begin{aligned} \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }}\le \Vert Z \Vert _{L^{\infty }} +2\,C \,[ \! [F ] \! ]r \int _0^t \Vert \Phi ^{\tau }_{F^{(3)}}(Z) \Vert _{L^{\infty }}\,d\tau , \end{aligned}$$

for some universal constant \(C>0\). By Gronwall's lemma

$$\begin{aligned} \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }}\le \Vert Z \Vert _{L^{\infty }}\,\exp (2 C [ \! [F ] \! ]r t). \end{aligned}$$

Then taking \(r\ll (2C[ \! [F^{(3)} ] \! ])^{-1}\) we have that \( \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }}\le (3/2) \Vert Z \Vert _{L^{\infty }}\) for all \(t\in [0, T_*]\); by continuity the bound \(\le 2 \Vert Z \Vert _{L^{\infty }}\) then persists beyond \(T_*\), contradicting the definition of \(T_*\). Hence \(T_*>1\) and we proved the claim on the bound of the \(L^{\infty }\)-norm. Now we prove the one on the \(H^s\)-norm. By Cauchy–Schwarz and Lemma 3.7-(i) we have, for \(Z\in B_1(H^s(\mathbb {T}))\cap B_r(L^{\infty }(\mathbb {T}))\),

$$\begin{aligned} \frac{d}{d t} \Vert \Phi ^t_3(Z) \Vert ^2_{H^0}&\lesssim \Vert X_{F^{(3)}}(\Phi ^t_3(Z)) \Vert _{H^0} \Vert \Phi ^t_3(Z) \Vert _{H^0} \\ {}&\lesssim \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }} \Vert \Phi ^t_3(Z) \Vert ^2_{H^0} \lesssim 2 r \Vert \Phi ^t_3(Z) \Vert ^2_{H^0}\,. \end{aligned}$$

Again by Gronwall's lemma we have \(\Vert \Phi ^t_3(Z) \Vert ^2_{H^0}\le \Vert Z \Vert ^2_{H^0}\,\exp (2 r C t)\) for all \(t\in [0, T_*)\). Taking \(r\ll (2 C)^{-1}\) we get \(\Vert \Phi ^t_3(Z) \Vert _{H^0}\le 2 \Vert Z \Vert _{H^0}\) for all \(t\in [0, 1]\). By (3.23) and Lemma 3.7-(i)

$$\begin{aligned} \Vert \Phi ^t_3(Z) \Vert _{H^s}&\le \Vert Z \Vert _{H^s} +C(s) [ \! [F^{(3)} ] \! ]\int _0^t \Vert \Phi _3^{\tau }(Z) \Vert _{L^{\infty }}\,\Vert \Phi _3^{\tau }(Z) \Vert _{H^0}\,d\tau \\ {}&\le \Vert Z \Vert _{H^s}+2\,C(s) [ \! [F^{(3)} ] \! ]r\,\int _0^t \Vert \Phi _3^{\tau }(Z) \Vert _{H^s}\,d\tau \,, \end{aligned}$$

for some constant \(C(s)>0\) depending on s. Reasoning as for the bound of the \(L^{\infty }\)-norm we get the estimate \(\Vert \Phi ^t_3(Z) \Vert _{H^s}\le 2 \Vert Z \Vert _{H^s}\) for all \(t\in [0, 1]\).

Now we claim that

$$\begin{aligned} \Vert d\Phi ^t_3(Z)[\widetilde{Z}] \Vert _{H^s}\le 2 \Vert \widetilde{Z} \Vert _{H^s}\,, \qquad \Vert d\Phi ^t_3(Z) [\widetilde{Z}] \Vert _{L^{\infty }}\le 2 \Vert \widetilde{Z} \Vert _{L^{\infty }}\qquad \forall t\in [0, 1]\,, \end{aligned}$$
(3.24)

for \(Z\in B_1(H^s(\mathbb {T}))\cap B_r(L^{\infty }(\mathbb {T}))\) for \(r>0\) small enough. We have

$$\begin{aligned} d \Phi ^t_3(Z)[\widetilde{Z}]=\widetilde{Z} +\int _0^t d X_{F^{(3)}}(\Phi ^{\tau }_{F^{(3)}}(Z))\,[d \Phi ^{\tau }_{F^{(3)}}(Z)[\widetilde{Z}]]\,d\tau \,. \end{aligned}$$
(3.25)

By Lemma 3.7-(i) and (3.22) we have

$$\begin{aligned} \Vert d \Phi ^t_3(Z)[\widetilde{Z}] \Vert _{L^{\infty }}\le \Vert \widetilde{Z} \Vert _{L^{\infty }} +2 C [ \! [F^{(3)} ] \! ]r \int _0^t \Vert d \Phi ^{\tau }_{F^{(3)}}(Z)[\widetilde{Z}] \Vert _{L^{\infty }}\,d\tau \end{aligned}$$

for some universal constant \(C>0\). Taking r small enough and using Gronwall's lemma we obtain the bound for the \(L^{\infty }\)-norm in (3.24). In the same way one can prove the bound for the \(H^s\)-norm.

By Lemma 3.7, bounds (3.21), (3.22) and the expression (3.23) we have for all \(Z\in B_1(H^s(\mathbb T))\cap B_r(L^{\infty }(\mathbb {T}))\)

$$\begin{aligned} \begin{aligned} \Vert \Phi _3(Z)-Z \Vert _{H^s}&\le \sup _{t\in [0, 1]} \Vert X_{F^{(3)}} (\Phi ^t_3(Z)) \Vert _{H^s} \\ {}&\lesssim _s \sup _{t\in [0, 1]} \Vert \Phi ^t_3(Z) \Vert _{L^{\infty }} \Vert \Phi ^t_3(Z) \Vert _{H^0} \lesssim _s r \Vert Z \Vert _{H^0}\,, \end{aligned} \end{aligned}$$
(3.26)

which is a bound like (3.13). By using the estimates (3.24) we obtain a similar bound for the differential \(d \Phi ^t_3\). For the inverse of \(\Phi _3\) a similar estimate holds.

We obtain the new Hamiltonian by Taylor expanding \(H\circ \Phi _3^{-t}\) at \(t=0\). Hence

$$\begin{aligned} H_1:=H\circ \Phi ^{-1}_3&=H+\{ F^{(3)}, H \}+ \sum _{p=2}^{3}\frac{1}{p!}\{\underbrace{F^{(3)}, \{F^{(3)},\ldots }_{p-times},H\}\cdots \} +\mathfrak {R}_1^{(\ge 6)} \\&{\mathop {=}\limits ^{(3.20)}} H^{(2)}+H^{(3, \ge 2)}+H_1^{(4)}+H_1^{(5)}+\mathfrak {R}_1^{(\ge 6)} \end{aligned}$$

with

$$\begin{aligned} H_1^{(4)}&:=\frac{1}{2}\{ F^{(3)}, H^{(3, \le 1)} \}+\{ F^{(3)}, H^{(3, \ge 2)} \}\,, \\ \mathfrak {R}_1^{(\ge 6)}&:= \frac{1}{3!} \int _{0}^1 (1-t)^{4} \{ F^{(3)}, \{F^{(3)}, \{ F^{(3)}, \{ F^{(3)},H \} \} \} \} \circ \Phi ^{-t}_3\,dt\,, \end{aligned}$$

and where \(H_{1}^{(5)}\) collects all the terms of degree of homogeneity 5. By (3.21) and the fact that \(X_{F^{(3)}}\) has finite rank we have

$$\begin{aligned}{}[ \! [H_1^{(5)} ] \! ]\lesssim 1,\qquad [ \! [\{ F^{(3)}, \{F^{(3)}, \{ F^{(3)}, \{ F^{(3)},H \} \} \} \} ] \! ]\lesssim 1. \end{aligned}$$

We observe that \(H_1-H^{(2)}-H^{(3, \ge 2)}\) generates a finite rank vector field. Moreover, using that

$$\begin{aligned} X_{\mathfrak {R}^{(\ge 6)}_1}=\frac{1}{3!} \int _{0}^1 (1-t)^{4} d \Phi _3^t [X_{F^{(3)}}, [X_{F^{(3)}}, [X_{F^{(3)}}, [X_{F^{(3)}}, X_H]]]] \circ \Phi ^{-t}_3\,dt, \end{aligned}$$

Lemma 3.7-(i) and the estimates (3.22), (3.24) on the map \(\Phi _{3}\) and its differential one can check that \(X_{\mathfrak {R}^{(\ge 6)}_1}\) has a zero at the origin of order at least 5 and it satisfies (3.17) and (3.19).

Second step: Now we look for a transformation that normalizes the term \(H_1^{(4, \le 1)}\). Let us consider the homogeneous Hamiltonian

$$\begin{aligned} F^{(4)}=\sum _{\mathcal {A}_{4, \le 1}} F^{(4)}_{\mathbf {\sigma }, \textbf{j}} \,z_{j_1}^{\sigma _1} z_{j_2}^{\sigma _2} z_{j_3}^{\sigma _3} z_{j_4}^{\sigma _4}, \qquad F^{(4)}_{\mathbf {\sigma }, \textbf{j}}:={\left\{ \begin{array}{ll} \dfrac{H_{\mathbf {\sigma }, \textbf{j}}^{(4, \le 1)}}{\textrm{i} \Omega (\mathbf {\sigma }, \textbf{j})} \qquad \text{ if }\,\,\Omega (\mathbf {\sigma }, \textbf{j})\ne 0 \\[3mm] 0 \qquad \qquad \quad \text{ if }\,\,\Omega (\mathbf {\sigma }, \textbf{j})=0. \end{array}\right. } \end{aligned}$$

By definition the Hamiltonian \(F^{(4)}\) solves the homological equation

$$\begin{aligned} \{ F^{(4)}, H^{(2)} \}+H_1^{(4, \le 1)}=\Pi _{\text{ Ker }} H_1^{(4, \le 1)}\,. \end{aligned}$$
(3.27)

By item (ii) of Lemma 3.8 we have that \(\Pi _{\text{ Ker }} H_1^{(4, \le 1)}=\Pi _{\text{ Ker }} H_1^{(4, 0)}\). By Remark 3.4, \(\Pi _{\text{ Ker }} H_1^{(4, 0)}\) is an action-preserving homogeneous Hamiltonian of degree 4, hence it has the form (3.16). By Lemma 3.8-(iv) and by considering that \(\mathcal {A}_{4, \le 1}\) is finite we have

$$\begin{aligned}{}[ \! [F^{(4)} ] \! ]\lesssim 1. \end{aligned}$$

Reasoning as in the first step and using item (i) of Lemma 3.7 we have that the flow

$$\begin{aligned} \Phi ^t_{F^{(4)}}:B_1(H^s(\mathbb {T}))\cap B_{r}(L^{\infty }(\mathbb {T}))\rightarrow B_{2}(H^s(\mathbb {T}))\cap B_{2r}(L^{\infty }(\mathbb {T})) \end{aligned}$$

is well defined for times \(t\in [0, 1]\) and \(r>0\) small enough (satisfying a condition like (3.11)), and satisfies estimates as in (3.22) (and similarly for its differential, as in (3.24)). We call \(\Phi _4:=\Phi ^1_{F^{(4)}}\) and we have, reasoning as in (3.26),

$$\begin{aligned} \Vert \Phi _4(Z)-Z \Vert _{H^{s}}\lesssim _{s} r^2 \Vert Z \Vert _{H^{0}} \,, \qquad \forall Z\in B_1(H^s(\mathbb T))\cap B_{r}(L^{\infty }(\mathbb {T}))\,. \end{aligned}$$
(3.28)

Using the smallness condition (3.11), we have that (3.28) implies a bound like (3.13). The bound (3.14) follows by reasoning as in the previous step. We obtain the new Hamiltonian by Taylor expanding \(H_1\circ \Phi _{F^{(4)}}^{-t}\) at \(t=0\)

$$\begin{aligned} H_2:=H_1\circ \Phi ^{-1}_4&=H_1+\{ F^{(4)}, H_1 \} +\int _{0}^1 (1-t) \{ F^{(4)}, \{ F^{(4)}, H_1\} \}\circ \Phi ^{-t}_{F^{(4)}}\,dt \\&{\mathop {=}\limits ^{(3.27)}} H^{(2)}+H^{(3, \ge 2)} +\Pi _{\text{ Ker }} H_1^{(4, 0)}+H_1^{(4, \ge 2)}+H_2^{(5)}+\mathfrak {R}_2^{(\ge 6)} \end{aligned}$$

with

$$\begin{aligned} \begin{aligned}&H_2^{(5)}=\{ F^{(4)}, H^{(3, \ge 2)}\}+{H}_1^{(5)}, \\ {}&\mathfrak {R}_2^{(\ge 6)}:=\mathfrak {R}_1^{(\ge 6)} +\{ F^{(4)}, H_1^{(4)}+\mathfrak {R}_1^{(\ge 6)} \} +\int _{0}^1 (1-t) \{ F^{(4)}, \{ F^{(4)}, H_1\} \}\circ \Phi ^t_{F^{(4)}}\,dt. \end{aligned} \end{aligned}$$

We have

$$\begin{aligned}{}[ \! [H_2^{(5)} ] \! ]\lesssim L^2, \qquad [ \! [\{ F^{(4)}, \{ F^{(4)}, H_1\} \} ] \! ]\lesssim 1. \end{aligned}$$

We observe that \(H_1^{(4, \ge 2)}, H_2^{(5)}, \mathfrak {R}_2^{(\ge 6)}\) generate finite rank vector fields. Moreover, reasoning as in the previous step, one can check that \(X_{\mathfrak {R}_2^{(\ge 6)}}\) has a zero at the origin of order at least 5 and satisfies (3.17), (3.19).

Third step: We want to normalize \(H_2^{(5, \le 1)}\). Let us consider the homogeneous Hamiltonian

$$\begin{aligned} F^{(5)}=\sum _{\mathcal {A}_{5, \le 1}} F^{(5)}_{\mathbf {\sigma }, \textbf{j}} \,z_{j_1}^{\sigma _1} z_{j_2}^{\sigma _2} z_{j_3}^{\sigma _3} z_{j_4}^{\sigma _4} z_{j_5}^{\sigma _5}, \qquad F^{(5)}_{\mathbf {\sigma }, \textbf{j}}:={\left\{ \begin{array}{ll} \dfrac{H_{\mathbf {\sigma }, \textbf{j}}^{(5, \le 1)}}{\textrm{i} \Omega (\mathbf {\sigma }, \textbf{j})} \qquad \text{ if }\,\,\Omega (\mathbf {\sigma }, \textbf{j})\ne 0 \\[3mm] 0 \qquad \qquad \quad \text{ if }\,\,\Omega (\mathbf {\sigma }, \textbf{j})=0. \end{array}\right. } \end{aligned}$$

By definition the Hamiltonian \(F^{(5)}\) solves the homological equation

$$\begin{aligned} \{ F^{(5)}, H^{(2)} \}+H_2^{(5, \le 1)}=\Pi _{\text{ Ker }} H_2^{(5, \le 1)}\,. \end{aligned}$$

By item (i) of Lemma 3.8 we have that \(\Pi _{\text{ Ker }} H_2^{(5, \le 1)}=0\). By item (v) of Lemma 3.8 and by considering that \(\mathcal {A}_{5, \le 1}\) is finite we have

$$\begin{aligned}{}[ \! [F^{(5)} ] \! ]\lesssim 1. \end{aligned}$$

Reasoning as in the first step and using item (i) of Lemma 3.7 we have that the flow

$$\begin{aligned} \Phi ^t_{F^{(5)}}:B_1(H^s(\mathbb {T}))\cap B_{r}(L^{\infty }(\mathbb {T}))\rightarrow B_{2}(H^s(\mathbb {T}))\cap B_{2r}(L^{\infty }(\mathbb {T})) \end{aligned}$$

is well defined for times \(t\in [0, 1]\) and \(r>0\) satisfying a condition like (3.11), and satisfies estimates as in (3.22) (and similarly for its differential, as in (3.24)). We call \(\Phi _5:=\Phi ^1_{F^{(5)}}\) and we have, reasoning as in (3.26),

$$\begin{aligned} \Vert \Phi _5(Z)-Z \Vert _{H^s}\lesssim _{s} r^3 \Vert Z \Vert _{H^{0}} \,, \qquad \forall Z\in B_{r}(L^{\infty }(\mathbb {T}))\,. \end{aligned}$$
(3.29)

Using the smallness condition (3.11), we have that (3.29) implies a bound like (3.13). The bound (3.14) follows by reasoning as in the previous step, using the fact that \(\Phi _{5}\) is the time one flow map generated by the vector field \(X_{F^{(5)}}\). We obtain the new Hamiltonian by Taylor expanding \(H_2\circ \Phi _{F^{(5)}}^{-t}\) at \(t=0\)

$$\begin{aligned} H_3:=H_2\circ \Phi ^{-1}_5&=H_2+\{ F^{(5)}, H_2 \} +\int _{0}^1 (1-t) \{ F^{(5)}, \{ F^{(5)}, H_2\} \}\circ \Phi ^{-t}_{F^{(5)}}\,dt \\&= H^{(2)}+H^{(3, \ge 2)} +\Pi _{\text{ Ker }} H_1^{(4, 0)}+H_1^{(4, \ge 2)}+H_2^{(5, \ge 2)}+\mathfrak {R}_3^{(\ge 6)} \end{aligned}$$

with

$$\begin{aligned} \mathfrak {R}_3^{(\ge 6)}:= \mathfrak {R}_2^{(\ge 6)}+\{ F^{(5)}, H_1^{(4)}+H_2^{(5)}+\mathfrak {R}_2^{(\ge 6)} \} +\int _{0}^1 (1-t) \{ F^{(5)}, \{ F^{(5)}, H_2\} \}\circ \Phi ^{-t}_{F^{(5)}}\,dt. \end{aligned}$$

We observe that \(H_1^{(4, \ge 2)}, H_2^{(5, \ge 2)}, \mathfrak {R}_3^{(\ge 6)}\) generate finite rank vector fields. Moreover, reasoning as in the previous step, one can check that \(X_{\mathfrak {R}_3^{(\ge 6)}}\) has a zero at the origin of order at least 5 and satisfies (3.17), (3.19).

Now we define

$$\begin{aligned} \Phi _B:=\Phi _5 \circ \Phi _{4} \circ \Phi _{3}, \qquad \mathfrak {R}^{(\ge 6)}:=\mathfrak {R}_3^{(\ge 6)}. \end{aligned}$$

Then formula (3.15) holds. Since each of the maps \(\Phi _{i}\), \(i=3,4,5\), has the form (3.12), the composition \(\Phi _B\) has the same form. By the estimates (3.26), (3.28), (3.29) and using the smallness condition (3.11) we obtain (3.13). The bound (3.14) follows by composition since the maps \(\Phi _{i}\), \(i=3,4,5\), satisfy analogous properties.

3.3 The Dynamics of the Normalized Hamiltonian

Let \(S\subset \mathbb Z\) be any finite set. In this section we study the dynamics of the leading part of the Hamiltonian normalized by the procedure described in Proposition 3.9. We call

$$\begin{aligned} \mathcal {N}=H^{(2)}+H^{(3, \ge 2)} + \mathcal {H}_{\text{ res }}^{(4, 0)} +\mathcal {H}^{(4, \ge 2)}+\mathcal {H}^{(5, \ge 2)} \end{aligned}$$
(3.30)

the Hamiltonian in the Birkhoff coordinates \(\Phi _B\) provided by Proposition 3.9, up to the remainder \(\mathfrak {R}^{(\ge 6)}\). We observe that the vector field of \(H^{(3, \ge 2)}+\mathcal {H}^{(4, \ge 2)}+\mathcal {H}^{(5, \ge 2)}\) vanishes on the finite dimensional subspace

$$\begin{aligned} \mathcal {U}_S=\{ w_n=0\,\,\,n\notin S \}. \end{aligned}$$

The restricted Hamiltonian \(\mathcal {N}_{|_{\mathcal {U}_S}}\) then coincides with

$$\begin{aligned} \mathcal {H}_{res}=H^{(2)}+\mathcal {H}_{\text{ res }}^{(4, 0)}. \end{aligned}$$
(3.31)

Thus, the equations of motion read as

$$\begin{aligned} \dot{w}_n=\textrm{i} \left( \Lambda (n)+\sum _{k\in S} \texttt{C}_{n k} |w_k|^2\right) \,w_n, \qquad n\in S. \end{aligned}$$
(3.32)

For all subsets \(\tilde{S}\subseteq S\) the subspace

$$\begin{aligned} \mathcal {V}_{\tilde{S}}=\{ w_n=0\,\,\, n\notin \tilde{S} \} \end{aligned}$$

is invariant under the flow of (3.32). The actions \(|w_j|^2\), \(j\in S\), are all conserved quantities. Then the complement of \(\bigcup _{\tilde{S}\subset S} \mathcal {V}_{\tilde{S}}\) is foliated by maximal (\(\#S\)-dimensional) invariant tori \(\{ |w_j|^2=\xi _j,\,\,\,j\in S \}\) with \(\xi _j>0,\,j\in S\). They support quasi-periodic motions given by

$$\begin{aligned} \varphi (\xi , \theta ; t, x)=\sum _{j\in S} \sqrt{\xi _j} \,e^{\textrm{i} (\theta _j+ \omega _j(\xi ) t+j x)}\,, \qquad \xi =(\xi _j)_{j\in S}, \qquad \theta =(\theta _j)_{j\in S} \qquad \theta _j\in \mathbb T\,, \end{aligned}$$
(3.33)

where the frequency vector has components

$$\begin{aligned} \omega _j(\xi )=\Lambda (j)+\sum _{k\in S} \texttt{C}_{j k} \,\xi _k \qquad j\in S\,. \end{aligned}$$
(3.34)

We observe that these frequencies can be very close to resonance, depending on the choice of the \(\xi _j\)’s. The subspaces \(\mathcal {V}_{\tilde{S}}\) are foliated by lower dimensional invariant tori supporting the quasi-periodic motions

$$\begin{aligned} \varphi (\xi , \theta ; t, x) =\sum _{j\in \tilde{S}} \sqrt{\xi _j} \,e^{\textrm{i} (\theta _j+ \omega _j(\xi ) t+j x)}, \qquad \theta _j\in \mathbb T. \end{aligned}$$
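The integrable character of (3.32) can be visualized numerically. The sketch below (Python; the set S, the coefficients \(\texttt{C}_{jk}\) and the initial actions are arbitrary toy choices) integrates (3.32) with a Runge–Kutta scheme and checks that the actions \(|w_j|^2\) are conserved and that the numerical solution agrees with the explicit flow \(w_j(t)=\sqrt{\xi _j}\,e^{\textrm{i} (\theta _j+\omega _j(\xi ) t)}\) underlying (3.33)–(3.34).

```python
import numpy as np

# Toy integration of the resonant model (3.32) on a sample set S, illustrating
# that the actions |w_j|^2 are conserved and that (3.33)-(3.34) give the exact flow.
m = 1.5
S = np.array([-2, -1, 1, 2])
rng = np.random.default_rng(1)
C = rng.standard_normal((len(S), len(S)))
C = 0.5 * (C + C.T)                        # toy coefficients with C_{jk} = C_{kj}
Lam = np.sqrt(S.astype(float)**2 + m)

xi = 0.01 * rng.random(len(S)) + 0.01      # initial actions xi_j > 0
theta = 2*np.pi * rng.random(len(S))
w0 = np.sqrt(xi) * np.exp(1j * theta)

def field(w):                              # right hand side of (3.32)
    return 1j * (Lam + C @ np.abs(w)**2) * w

def rk4(w, dt, steps):
    for _ in range(steps):
        k1 = field(w); k2 = field(w + dt/2*k1)
        k3 = field(w + dt/2*k2); k4 = field(w + dt*k3)
        w = w + dt/6*(k1 + 2*k2 + 2*k3 + k4)
    return w

T, steps = 50.0, 50000
wT = rk4(w0, T/steps, steps)

omega = Lam + C @ xi                       # frequencies (3.34)
w_exact = np.sqrt(xi) * np.exp(1j * (theta + omega*T))

print(np.max(np.abs(np.abs(wT)**2 - xi)))  # drift of the actions (small)
print(np.max(np.abs(wT - w_exact)))        # agreement with the explicit solution
```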

4 Setting of the Problem in the Birkhoff Coordinates

Let us fix some \(N\in \mathbb N\) and consider a finite symmetric subset \(S\subset \mathbb {Z}\) such that

$$\begin{aligned} \begin{aligned} S=\{ j_1, \dots , j_N\}\subset \mathbb Z\,, \qquad j\in S \qquad \Rightarrow \qquad -j\in S,\\ j\notin S \qquad \Leftrightarrow \qquad |j|>\max _{k\in S}\{ |k| \}\,. \end{aligned} \end{aligned}$$
(4.1)

We need the above properties of the set S only for the construction of the modified energy on high frequencies in Sect. 5.2.2, see Lemma 5.16. Consider the new variables

$$\begin{aligned} W:={\bigl [{\begin{matrix}w\\ \overline{w}\end{matrix}}\bigr ]}:=\Phi _{B}(Z) \end{aligned}$$

where \(\Phi _B\) is the Birkhoff map constructed in Proposition 3.9 applied with S as in (4.1). By the symplectic nature of the Birkhoff map we have that the equation \(\partial _{t}Z=X_{H}(Z)\), with H in (2.12), becomes

$$\begin{aligned} \partial _{t}W=X_{\mathcal {H}}(W)\,, \end{aligned}$$
(4.2)

where \(X_{\mathcal {H}}\) is the vector field of the new Hamiltonian \(\mathcal {H}=H\circ \Phi _B^{-1}\) in (3.15).

4.1 The Approximate Solution

We consider a solution \(\varphi (t, x)\) of the normalized Hamiltonian system \(\mathcal {N}\) with initial condition on the subspace \(\mathcal {V}_S\). We could also consider solutions with initial data in \(\mathcal {V}_{\tilde{S}}\), with \(\tilde{S}\subset S\) symmetric, without relevant changes in the proof.

The function \(\varphi (t, x)=\varphi (\xi , \theta ; t, x)\) has the form (3.33), where \(\xi =(\xi _j)_{j\in S}\) can be any vector with positive components and \(\theta =(\theta _j)_{j\in S}\) is any vector of angles. Since the normalizing change of coordinates \(\Phi _B\) provided in Proposition 3.9 is well defined only on a small neighbourhood of the origin of \(L^{\infty }\), it is convenient to rescale the actions \(\xi _j \mapsto \varepsilon ^2 \xi _j\), with \(\varepsilon >0\) small enough so that condition (3.11) is satisfied with \(r=\varepsilon \). Then the rescaled solution has the form

$$\begin{aligned} \varepsilon \varphi (t, x)=\varepsilon \sum _{j\in {S}}\sqrt{\xi _j}\,e^{\textrm{i} (\theta _j+ \omega _j(\xi ) t+j x)}\,. \end{aligned}$$
(4.3)

We observe that

$$\begin{aligned} \sup _{t\in \mathbb {R}} \Vert \varepsilon \varphi \Vert _{H^s}\lesssim \varepsilon \qquad \forall s\ge 0. \end{aligned}$$
(4.4)

In the following we shall assume further smallness conditions on the parameter \(\varepsilon \). Such conditions can all be written in the following form

$$\begin{aligned} C_{s}\varepsilon < 1\,, \end{aligned}$$
(4.5)

for some constant \(C_{s}>0\) depending on \(s>1/2\) and the set S.

The next lemma shows that the functions \(\varepsilon \varphi \) constructed in (4.3) are approximate solutions of the Hamiltonian equation given by \(\mathcal {H}\).

Lemma 4.1

Let \(s>1/2\). There is \(C_s>1\) such that if (4.5) holds, then the residual

$$\begin{aligned} \textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ):=-\varepsilon \partial _{t}\varphi +X_{\mathcal {H}}(\varepsilon \varphi ) \end{aligned}$$
(4.6)

satisfies

$$\begin{aligned} \sup _{t\in [0,T]}\Vert \textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )\Vert _{H^s}\lesssim _{s}\varepsilon ^{5}\,. \end{aligned}$$
(4.7)

Proof

We have that \(X_{\mathcal {H}}=X_{\mathcal {N}}+X_{\mathfrak {R}^{(\ge 6)}}\). Since \(\varepsilon \varphi \) is a solution of the Hamiltonian system given by \(\mathcal {N}\) we have

$$\begin{aligned} \textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )=X_{\mathfrak {R}^{(\ge 6)}}(\varepsilon \varphi ). \end{aligned}$$

Then the claim follows by estimates (3.17) and (4.4), taking \(\varepsilon \) small enough.

4.2 The Error Function in Birkhoff Coordinates

We set

$$\begin{aligned} \varepsilon ^{\beta } V:=W-\varepsilon \varphi \,, \qquad \textrm{for}\,\,\beta >2, \end{aligned}$$
(4.8)

where \(\varepsilon \varphi \) is the approximate solution of the form (4.3), supported on a set S as in (4.1), and with frequencies of oscillation given in (3.34). We also recall that estimate (4.4) holds true. Then the error function V in the Birkhoff coordinates solves the equation (recall the definition of \(\text{ Res}_{\mathcal {H}}\) in (4.6))

$$\begin{aligned} \dot{V}=d X_{\mathcal {H}}(\varepsilon \varphi ) [V] +\varepsilon ^{\beta } \mathcal {Q} (\varepsilon \varphi )[V, V] +\varepsilon ^{-\beta } \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned}$$
(4.9)

where

$$\begin{aligned} \mathcal {Q} (\varepsilon \varphi )[V, V]:= \int _0^1 (1-t)\,d^2 X_{\mathcal {H}}(\varepsilon \varphi +t \varepsilon ^{\beta } V)[V, V]\,dt\,. \end{aligned}$$
(4.10)
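For the reader's convenience we recall how (4.9)–(4.10) are obtained. Since \(W=\varepsilon \varphi +\varepsilon ^{\beta } V\) solves (4.2) and, by (4.6), \(\varepsilon \partial _t \varphi =X_{\mathcal {H}}(\varepsilon \varphi )-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )\), the Taylor formula with integral remainder gives

$$\begin{aligned} \varepsilon ^{\beta } \dot{V}=X_{\mathcal {H}}(\varepsilon \varphi +\varepsilon ^{\beta } V)-X_{\mathcal {H}}(\varepsilon \varphi )+\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ) = d X_{\mathcal {H}}(\varepsilon \varphi )[\varepsilon ^{\beta } V] +\int _0^1 (1-t)\, d^2 X_{\mathcal {H}}(\varepsilon \varphi +t \varepsilon ^{\beta } V)[\varepsilon ^{\beta } V, \varepsilon ^{\beta } V]\,dt +\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned}$$

and dividing by \(\varepsilon ^{\beta }\) yields (4.9) with \(\mathcal {Q}\) as in (4.10).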

In the next proposition we show that the control on the function V implies the main result Theorem 1.1.

Proposition 4.2

There is \(C_s>1\) such that if

$$\begin{aligned} \begin{aligned} \sup _{t\in [0, T]} \varepsilon ^{\beta } \Vert V \Vert _{H^s}&\le 2 \varepsilon ^{\beta } \,,\quad \beta>2\,, \\ \sup _{t\in [0, T]} \Vert Z \Vert _{H^s}&\lesssim _s\varepsilon \,, \qquad T=c_0 \varepsilon ^{-2-\sigma }\,,\;\;\sigma >0\,, \end{aligned} \end{aligned}$$
(4.11)

for some \(c_0>0\) and with \(\varepsilon \) satisfying (4.5), then the error \(\varepsilon ^{\beta }R:=Z-\varepsilon \varphi \) satisfies the bound

$$\begin{aligned} \sup _{t\in [0, T]}\varepsilon ^{\beta } \Vert R \Vert _{H^s}\lesssim _s\varepsilon ^{2}\,. \end{aligned}$$
(4.12)

Proof

Recall that \(Z=\Phi _{B}^{-1}(W)\). Then, by estimate (3.13) in Prop. 3.9, which holds also for \(\Phi _{B}^{-1}\), and the assumptions (4.11), (4.5) with \(C_s>1\) large enough, we deduce that

$$\begin{aligned} \begin{aligned} \Vert W\Vert _{H^s}&\lesssim _{s} \varepsilon ,\quad \textrm{and}\quad \Vert Z-W\Vert _{H^{s}}=\Vert \Phi _{B}^{-1}(W)-W\Vert _{H^{s}}\lesssim _s \varepsilon \Vert W\Vert _{H^s}\lesssim _{s}\varepsilon ^2, \end{aligned} \end{aligned}$$

uniformly in \(t\in [0,T]\). Then, recalling \(\beta >2\), one has

$$\begin{aligned} \begin{aligned} \Vert Z-\varepsilon \varphi \Vert _{H^{s}}&{\mathop {\le }\limits ^{(4.8)}} \Vert Z-W\Vert _{H^{s}}+\Vert \varepsilon ^{\beta } V\Vert _{H^{s}} {\mathop {\lesssim _{s}}\limits ^{(4.11)}}\varepsilon ^2+\varepsilon ^{\beta }\lesssim _{s}\varepsilon ^{2}, \end{aligned} \end{aligned}$$

uniformly in \(t\in [0,T]\). This implies (4.12).

The result above guarantees that, in order to obtain our main result, we must show that the solution V of (4.9) satisfies (4.11). To do this we will provide some a priori estimates on V.

Recalling (4.9) the main issues are the following:

  • Show that the term \(d X_{\mathcal {H}}(\varepsilon \varphi )[V]+\varepsilon ^{\beta }\mathcal {Q}(\varepsilon \varphi )[V, V]\) has a pseudo-differential structure. This is the content of Sect. 4.3.

  • In Sects. 5 and 6 we provide the energy estimates for the flow.

4.3 Pseudo-Differential Structure of the Equation for the Remainder

As explained before, in the next sections we shall provide a priori bounds (see (4.11)) on the solution V of the problem (4.9). In Lemma 4.1 we provided suitable upper bounds on the non-homogeneous term \(\varepsilon ^{-\beta } \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\) appearing in the right hand side of (4.9). Hence, we shall prove that the vector field

$$\begin{aligned} \mathcal {Y}(\varepsilon \varphi , V):= d X_{\mathcal {H}}(\varepsilon \varphi ) [V] + \mathcal {Q} (\varepsilon \varphi )[V, V]\,, \end{aligned}$$
(4.13)

which is nonlinear in V, generates a well-defined flow on the spaces \(H^s\). In this subsection we show that actually \(\mathcal {Y}(\varepsilon \varphi , V)\) has a pseudo-differential structure. We deal with the two summands in (4.13) separately.

Define the following real symbols (recall (1.2)–(1.3))

$$\begin{aligned} \begin{aligned} a(U; x)&:=-\tfrac{1}{2}(\partial _{{\phi }_{xx}}f_2)(\phi , \phi _x,\phi _{xx})\,, \qquad \phi =\tfrac{\Lambda ^{-\frac{1}{2}}}{\sqrt{2}}(u+\overline{u})\,, \quad U={\bigl [{\begin{matrix}u\\ \overline{u}\end{matrix}}\bigr ]}\,, \\ b(U; x)&:=(\partial _{{\phi }_{x}}f_2)(\phi , \phi _x,\phi _{xx})\,, \\ c(U; x)&:=(\partial _{\phi }f_2)(\phi , \phi _x,\phi _{xx}) \\ d(U; x)&:=-m a(U; x)-\frac{3}{8}a_{xx}(U; x)-\frac{1}{2}b_{x}(U; x)+c(U; x)\,. \end{aligned} \end{aligned}$$
(4.14)
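The symbols (4.14) are explicitly computable for any given quadratic nonlinearity. As an illustration, the sketch below (Python/sympy; the choice \(f_2(\phi , \phi _x, \phi _{xx})=\phi \,\phi _{xx}+\phi _x^2\) is a hypothetical example, and we work directly in the variable \(\phi \), leaving aside the substitution \(\phi =\tfrac{\Lambda ^{-1/2}}{\sqrt{2}}(u+\overline{u})\)) evaluates a, b, c, d.

```python
import sympy as sp

x, m = sp.symbols('x m', real=True)
phi = sp.Function('phi', real=True)(x)
p, q, r = sp.symbols('p q r', real=True)       # stand for (phi, phi_x, phi_xx)

f2 = p*r + q**2                                 # hypothetical example of f_2, degree 2

at = {p: phi, q: phi.diff(x), r: phi.diff(x, 2)}

a = -sp.Rational(1, 2) * sp.diff(f2, r).subs(at)              # a = -(1/2) d_{phi_xx} f_2
b = sp.diff(f2, q).subs(at)                                   # b =  d_{phi_x}  f_2
c = sp.diff(f2, p).subs(at)                                   # c =  d_{phi}    f_2
d = -m*a - sp.Rational(3, 8)*a.diff(x, 2) - sp.Rational(1, 2)*b.diff(x) + c

print(sp.simplify(a))   # -phi/2
print(sp.simplify(d))   # m*phi/2 + (3/16)*phi_xx, printed with Derivative(...) by sympy
```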

Consider the functions in (4.8) and the symbols in (4.14) with \(U\rightsquigarrow \varepsilon \varphi , V\). Let us define

$$\begin{aligned} \begin{aligned} \texttt{f}(\varepsilon \varphi , V; x)&:=a(\varepsilon \varphi ;x)+\tfrac{1}{2}a(V;x)\,, \\ \texttt{g}(\varepsilon \varphi , V; x)&:=d(\varepsilon \varphi ;x)+\tfrac{1}{2}d(V;x)\,. \end{aligned} \end{aligned}$$
(4.15)

As a consequence of Propositions 4.4, 4.8 we deduce the following result.

Proposition 4.3

The vector field \(\mathcal {Y}(\varepsilon \varphi , V)\) in (4.13) has the form

$$\begin{aligned} \begin{aligned} \mathcal {Y}(\varepsilon \varphi , V)&= \textrm{i} E{Op^{\textrm{BW}}}\Big (\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , V; x)\big )\Lambda (\xi )\Big )V \\&\quad +\textrm{i} E{Op^{\textrm{BW}}}\Big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{g}(\varepsilon \varphi , V; x)\Lambda ^{-1}(\xi )\Big )V + \textrm{i} E\mathfrak {Q}(\varepsilon \varphi )V+ R(V)\,, \end{aligned} \end{aligned}$$
(4.16)

where the remainder (recall Def. 2.3)

$$\begin{aligned} \mathfrak {Q}(\varepsilon \varphi )=\mathfrak {Q}_1(\varepsilon \varphi ) +\mathfrak {Q}_2(\varepsilon \varphi )+\mathfrak {Q}_{\ge 3}(\varepsilon \varphi )\in \mathbf {\Sigma }_1^{-2}[r,3] \otimes \mathcal {M}_2(\mathbb {C}), \end{aligned}$$

is real according to Def. 2.13. Moreover the non-homogeneous component \(\mathfrak {Q}_{\ge 3}(\varepsilon \varphi )\in \textbf{NH}^{-2}_{3}[r] \otimes \mathcal {M}_2(\mathbb {C})\) satisfies bounds like (2.35). Furthermore, for s large enough, the remainder R(V) has the form

$$\begin{aligned} ({R}^+(V), \overline{{R}^+(V)})^{T} \end{aligned}$$

and it satisfies

$$\begin{aligned} \begin{aligned} \Vert R(V)\Vert _{H^{s+2}}\lesssim _{s}&\, \Vert V\Vert _{H^{s}}^{2}\, \qquad \forall V\in B_{\varepsilon }(H^s). \end{aligned} \end{aligned}$$
(4.17)

Finally, for s large enough and \(\varepsilon \) satisfying (4.5), one has

$$\begin{aligned} \Vert \mathcal {Y}(\varepsilon \varphi , V)\Vert _{H^{s-1}}\lesssim _{s} \Vert V\Vert _{H^{s}}\,,\qquad \forall \, V\in B_{\varepsilon \sqrt{L}}(H^{s})\,. \end{aligned}$$
(4.18)

4.3.1 Para-Differential Structure of the Linearized Operator

Consider the function \(\varepsilon \varphi \) in (4.8) and recall the bound (4.4). The aim of this section is to prove the following result.

Proposition 4.4

There exists \(s_0>0\) large enough such that for all \(s\ge s_0\) there is \(C_s>0\) such that the following holds. If \(\varepsilon \) satisfies (4.5) and \(\varepsilon \varphi \) defined in (4.8) satisfies (4.4) then we have that \(a(\varepsilon \varphi ;x), d(\varepsilon \varphi ;x)\in \textbf{SM}^{0}_{1}\) (see Def. 2.5) are independent of \(\xi \in \mathbb {R}\) and real valued, and

$$\begin{aligned} \begin{aligned} (dX_{\mathcal {H}})(\varepsilon \varphi )[V]&= \textrm{i} E {Op^{\textrm{BW}}}\Big ( \big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}a(\varepsilon \varphi ;x)\big ) \Lambda (\xi )\Big )V \\ {}&\quad + \textrm{i} E {Op^{\textrm{BW}}}\big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}d(\varepsilon \varphi ;x)\Lambda ^{-1}(\xi )\big )V + \textrm{i} E\mathfrak {Q}(\varepsilon \varphi ) V\,, \end{aligned} \end{aligned}$$
(4.19)

where \(\mathcal {H}\) is defined in (3.15), \(\Lambda (\xi )\) is in (2.5) and where the remainder

$$\begin{aligned} \mathfrak {Q}(\varepsilon \varphi )=\mathfrak {Q}_1(\varepsilon \varphi ) +\mathfrak {Q}_2(\varepsilon \varphi )+\mathfrak {Q}_{\ge 3}(\varepsilon \varphi )\in \mathbf {\Sigma }_1^{-2}[r,3] \otimes \mathcal {M}_2(\mathbb {C}), \end{aligned}$$

is real according to Def. 2.13. Moreover the non-homogeneous component \(\mathfrak {Q}_{\ge 3}(\varepsilon \varphi )\in \textbf{NH}^{-2}_{3}[r] \otimes \mathcal {M}_2(\mathbb {C})\) satisfies bounds like (2.35). Finally the following estimate holds: for \(s\ge s_0\)

$$\begin{aligned} \Vert dX_{\mathcal {H}}(\varepsilon \varphi )[Z]\Vert _{ H^{s-1}} \lesssim _{s} \Vert Z\Vert _{ H^{s}}(1+C_s\Vert \varepsilon \varphi \Vert _{{ H}^{s}})\,, \end{aligned}$$
(4.20)

for any \(Z\in H^s\).

Remark 4.5

We remark that the operator \((dX_{\mathcal {H}})(\varepsilon \varphi )[\cdot ]\) is Hamiltonian according to Definition 2.14.

We have the following.

Lemma 4.6

Recall (3.15) and let \(X:=X_{\mathcal {H}_{\text{ res }}^{(4, 0)}} + X_{\mathcal {H}^{(4, \ge 2)}}+ X_{\mathcal {H}^{(5, \ge 2)}}\). Then we have

$$\begin{aligned} (dX)(\varepsilon \varphi )[\cdot ]:=\sum _{j=2}^{3}\mathcal {S}_{j}(\varepsilon \varphi )[\cdot ], \end{aligned}$$

for some real \({\mathcal {S}_{j}}(\varphi )\in \textbf{M}^{-\rho }_{j}\otimes \mathcal {M}_{2}(\mathbb {C})\), for any \(\rho \ge 0\). Moreover the coefficients of \(\mathcal {S}_{2}(\varepsilon \varphi )\) and \(\mathcal {S}_{3}(\varepsilon \varphi )\) satisfy (2.34).

Proof

It follows recalling that X is a sum of finite rank vector fields (recall Def. 3.6).

In view of Lemma 4.6, we have that the linearized operator of the vector field \(X_{\mathcal {H}}\) has the form

$$\begin{aligned} (dX_{\mathcal {H}})(\varepsilon \varphi )[V]=\textrm{i} E \Lambda V+(dX_{{H}^{(3,\ge 2)}})(\varepsilon \varphi )[V] +\mathcal {S}(\varepsilon \varphi )[V]\,, \end{aligned}$$
(4.21)

for some \(\mathcal {S}(\varepsilon \varphi )\in \Sigma _{1}^{-\rho }[r,3]\otimes \mathcal {M}_{2}(\mathbb {C})\) (see Def. 2.3) for any \(\rho \ge 0\). To study the contribution coming from the Hamiltonian \(H^{(3,\ge 2)}\) we first analyse the linearized operator of \(X_{H^{(3)}}\) where \(H^{(3)}\) is the Hamiltonian in (2.12). We have the following Lemma.

Lemma 4.7

Consider the symbols \(a(\varepsilon \varphi ;x), d(\varepsilon \varphi ;x)\) in (4.14). Then we have that \(a(\varepsilon \varphi ;x), d(\varepsilon \varphi ;x)\) belong to \( \textbf{SM}^{0}_{1}\) (see Def. 2.5), are independent of \(\xi \in \mathbb {R}\) and real valued, and

$$\begin{aligned} (dX_{H^{(3)}})(\varepsilon \varphi )[V]= \textrm{i} E{Op^{\textrm{BW}}}\Big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}a(\varepsilon \varphi ;x)\Lambda (\xi ) +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]} d(\varepsilon \varphi ;x)\Lambda ^{-1}(\xi ) \Big )V+ \widetilde{\mathcal {S}}(\varepsilon \varphi )V \end{aligned}$$
(4.22)

for some real \(\widetilde{\mathcal {S}}(\varepsilon \varphi )\in \textbf{M}^{-2}_1\otimes \mathcal {M}_{2}(\mathbb {C})\), where \(\Lambda (\xi )\) is in (2.5).

Proof

Recall that

$$\begin{aligned} X_{H^{(3)}}= \left( \begin{matrix} X_{H^{(3)}}^{+} \\ \overline{X_{H^{(3)}}^{+} } \end{matrix} \right) , \qquad X_{H^{(3)}}^{+}(v){\mathop {=}\limits ^{(1.4)}}\frac{\textrm{i} }{\sqrt{2}} \Lambda ^{-1/2}N\left( \Lambda ^{-1/2}\left( \frac{v+ \overline{v}}{\sqrt{2}}\right) \right) \end{aligned}$$

where N is the nonlinearity in (1.2), (1.3). Therefore, by linearizing at (recall \(\mathcal {C}\) in (2.10)) \( \varepsilon \varphi =\mathcal {C} {\bigl [{\begin{matrix}u\\ \tilde{u}\end{matrix}}\bigr ]}, \) we get

$$\begin{aligned} \begin{aligned} (dX_{H^{(3)}}^{+})(\varepsilon \varphi )[V]&= \frac{\textrm{i} \Lambda ^{-1/2} }{\sqrt{2}} \big ((\partial _{u}f_{2})(u,u_{x},u_{xx}) h\big ) \\ {}&+\frac{\textrm{i} \Lambda ^{-1/2} }{\sqrt{2}}\big ( (\partial _{u_{x}}f_{2})(u,u_{x},u_{xx}) h_{x}+ (\partial _{u_{xx}}f_{2})(u,u_{x},u_{xx}) h_{xx} \big ) \end{aligned} \end{aligned}$$
(4.23)

where \(h:=\Lambda ^{-1/2}(v+\overline{v})/\sqrt{2}\) and \(V={\bigl [{\begin{matrix}v\\ \overline{v}\end{matrix}}\bigr ]}\). We now expand the operators above in symbols of decreasing order. Since the function G in (1.3) is \(C^{\infty }\) and cubic, it is easy to check that the functions \(\partial _{u}f_2\), \(\partial _{u_{x}}f_2\), \(\partial _{u_{xx}}f_2\) are symbols independent of \(\xi \in \mathbb {R}\) and in the class \(\textbf{SM}^{0}_{1}\) (see Def. 2.5). As a consequence all the symbols defined in (4.14) belong to \(\textbf{SM}^{0}_{1}\) as well.

We write

$$\begin{aligned} a(\varepsilon \varphi ;x)=a_{\chi }(\varepsilon \varphi ;x,\xi )+B(\varepsilon \varphi ;x,\xi )\,,\qquad B(\varepsilon \varphi ;x,\xi ):= \mathcal {F}^{-1}\Big (\widehat{a}(\eta )\big (1-\chi (|\eta |/\langle \xi \rangle )\big )\Big )\,. \end{aligned}$$
(4.24)

The operator \({Op^{\textrm{W}}}(B)\) has the form (2.30), (2.31) with

$$\begin{aligned} r^{\sigma }(j, j_1, j_2)=a\left( j_1, \frac{j_1+2 j_2}{2} \right) \left( 1-\chi \left( \frac{2 j_1}{j_1+2 j_2}\right) \right) , \qquad j=j_1+j_2. \end{aligned}$$

Let us consider \({Op^{\textrm{W}}}(\widetilde{B})={Op^{\textrm{W}}}(\widetilde{B}(\varepsilon \varphi ; x, \xi ))\) as the multilinear operator with the form (2.30), (2.31) with coefficients

$$\begin{aligned} \tilde{r}^{\sigma }(j, j_1, j_2)={\left\{ \begin{array}{ll} r^{\sigma }(j, j_1, j_2) \qquad \; \textrm{if}\,\,j_1\in S, \\ 0 \qquad \qquad \qquad \quad \textrm{if}\,\, j_1\notin S. \end{array}\right. } \end{aligned}$$

Since \(r^{\sigma }(j, j_1, j_2)\ne 0\) implies that \(\langle j_1 \rangle \ge \langle j+j_2 \rangle \) and \(S\subset \mathbb Z\) is finite, we have \({Op^{\textrm{W}}}(\widetilde{B})\in \textbf{M}_1^{-2}\).

Since \(\varepsilon \varphi \) is compactly Fourier supported (its Fourier support is contained in S), we have \({Op^{\textrm{W}}}(B)={Op^{\textrm{W}}}(\widetilde{B})\). By abuse of notation we say that \({Op^{\textrm{W}}}(B)\in \textbf{M}_1^{-2}\).

One can reason in the same way to deal with the term depending on the symbol \(d(\varepsilon \varphi ;x)\).

By the symbolic calculus in Lemma 2.2 we get that

$$\begin{aligned}&\Lambda ^{-1/2}{Op^{\textrm{BW}}}\left( \partial _{u_{xx}}f_2\right) \partial _{xx}\Lambda ^{-1/2}\\&\quad = {Op^{\textrm{BW}}}\Big ( -\xi ^{2}\partial _{u_{xx}}f_2\Lambda ^{-1}(\xi ) -\frac{1}{2\textrm{i} }\{\Lambda ^{-1/2}(\xi ), \xi ^{2}\Lambda ^{-1/2}(\xi )\partial _{u_{xx}}f_2\}\Big ) \\ {}&\qquad -{Op^{\textrm{BW}}}\Big (\frac{\Lambda ^{-1/2}(\xi )}{2\textrm{i} }\{\partial _{u_{xx}}f_2, \xi ^{2}\Lambda ^{-1/2}(\xi )\} +\frac{\Lambda ^{-\frac{1}{2}}(\xi )}{8}\sigma (\partial _{u_{xx}}f_2, \xi ^{2}\Lambda ^{-\frac{1}{2}}(\xi )) \Big ) \\ {}&\qquad +{Op^{\textrm{BW}}}\Big ( \frac{1}{8}\sigma \big (\Lambda ^{-\frac{1}{2}}(\xi ), \partial _{u_{xx}}f_2\xi ^{2}\Lambda ^{-\frac{1}{2}}(\xi )\big ) +\frac{1}{4}\big \{ \Lambda ^{-\frac{1}{2}}(\xi ), \{\partial _{u_{xx}}f_2 ,\xi ^{2}\Lambda ^{-\frac{1}{2}}(\xi )\} \big \} \Big ) \\ {}&\quad ={Op^{\textrm{BW}}}(-\xi ^{2}\partial _{u_{xx}}f_2\Lambda ^{-1}(\xi )-(\partial _{u_{xx}}f_2)_{x}\textrm{i} \xi \Lambda ^{-1}(\xi ) +\texttt{a}^{(-1)}(x,\xi )) \end{aligned}$$

up to smoothing remainders in the class \(\textbf{M}^{-2}_{1}\) and where

$$\begin{aligned} \begin{aligned} \texttt{a}^{(-1)}(x,\xi )&:=-\frac{\Lambda ^{-\frac{1}{2}}(\xi )}{8}\sigma (\partial _{u_{xx}}f_2, \xi ^{2}\Lambda ^{-\frac{1}{2}}(\xi )) +\frac{1}{8}\sigma \big (\Lambda ^{-\frac{1}{2}}(\xi ), \partial _{u_{xx}}f_2\xi ^{2}\Lambda ^{-\frac{1}{2}}(\xi )\big ) \\ {}&+\frac{1}{4}\big \{ \Lambda ^{-\frac{1}{2}}(\xi ), \{\partial _{u_{xx}}f_2 ,\xi ^{2}\Lambda ^{-\frac{1}{2}}(\xi )\} \big \}\,. \end{aligned} \end{aligned}$$
(4.25)

Similarly we have

$$\begin{aligned}{} & {} \Lambda ^{-1/2}{Op^{\textrm{BW}}}\left( \partial _{u}f_2 \right) \Lambda ^{-1/2} +\Lambda ^{-1/2}{Op^{\textrm{BW}}}\left( \partial _{u_{x}}f_2 \right) \partial _{x}\Lambda ^{-1/2} \\{} & {} ={Op^{\textrm{BW}}}\Big ( \textrm{i} \xi \Lambda ^{-1}(\xi )(\partial _{u_{x}}f_2)+\texttt{b}^{(-1)}(x,\xi )\Big ) \end{aligned}$$

up to smoothing remainders in the class \(\textbf{M}^{-2}_{1}\) and where

$$\begin{aligned} \begin{aligned} \texttt{b}^{(-1)}(x,\xi )&:= \frac{\Lambda ^{-\frac{1}{2}}(\xi )}{2\textrm{i} }\big \{ \partial _{u_{x}}f_2,\textrm{i} \xi \Lambda ^{-\frac{1}{2}}(\xi ) \big \}+\frac{1}{2\textrm{i} }\big \{ \Lambda ^{-\frac{1}{2}}(\xi ), \partial _{u_{x}}f_2\textrm{i} \xi \Lambda ^{-\frac{1}{2}}(\xi ) \big \}+\partial _{u}f_2 \Lambda ^{-1}(\xi )\,. \end{aligned} \end{aligned}$$
(4.26)

Therefore

$$\begin{aligned} \begin{aligned} (dX^{+}_{H^{(3)}})(\varepsilon \varphi )[V]&= \frac{\textrm{i} }{2}{Op^{\textrm{BW}}}\Big ( -\xi ^{2}\partial _{u_{xx}}f_2\Lambda ^{-1}(\xi )+\textrm{i} \xi \Lambda ^{-1}(\xi )\big ( \partial _{u_{x}}f_2-(\partial _{u_{xx}}f_2)_{x} \big ) \Big )[v+\overline{v}] \\ {}&+\frac{\textrm{i} }{2}{Op^{\textrm{BW}}}\Big (\texttt{a}^{(-1)}(x,\xi )+\texttt{b}^{(-1)}(x,\xi )\Big )[v+\overline{v}] \end{aligned} \end{aligned}$$
(4.27)

with \(\texttt{a}^{(-1)}, \texttt{b}^{(-1)}\) in (4.25), (4.26), up to some remainder in \(\textbf{M}^{-2}_{1}\). Now we show that, thanks to the Hamiltonian structure of the nonlinearity, there is a cancelation of the terms of order zero. Indeed, by (1.2)–(1.3), one has

$$\begin{aligned} \begin{aligned} f_2&=(\partial _{u}G)(u,u_{x})-\partial _{uu_{x}}G(u,u_{x})u_x- \partial _{u_{x}u_{x}}G(u,u_{x})u_{xx}, \\ \partial _{u_{x}}f_2&= (\partial _{uu_{x}}G)(u,u_{x})- (\partial _{u_{x}u}G)(u,u_{x}) \\ {}&\quad -(\partial _{uu_{x}u_x}G)(u,u_{x})u_{x} -(\partial _{u_{x}u_{x}u_x}G)(u,u_{x})u_{xx}, \\ \partial _{u_{xx}}f_2&=-(\partial _{u_{x}u_{x}}G)(u,u_{x}), \\ (\partial _{u_{xx}}f_2)_{x}&=\partial _{u_{x}}f_2. \end{aligned} \end{aligned}$$
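
For the reader's convenience, the last identity (the one responsible for the cancellation below) is a direct chain rule computation: since \(\partial _{u_{xx}}f_2=-(\partial _{u_{x}u_{x}}G)(u,u_{x})\),

$$\begin{aligned} (\partial _{u_{xx}}f_2)_{x}=-(\partial _{uu_{x}u_x}G)(u,u_{x})\,u_{x}-(\partial _{u_{x}u_{x}u_x}G)(u,u_{x})\,u_{xx}=\partial _{u_{x}}f_2\,, \end{aligned}$$

where in the last step we used that the first two terms in the expression for \(\partial _{u_{x}}f_2\) above cancel, by the symmetry \(\partial _{uu_{x}}G=\partial _{u_{x}u}G\).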

Moreover, using formulæ  (2.18)–(2.19), one gets that (recall (4.25), (4.26))

$$\begin{aligned} \begin{aligned} \texttt{a}^{(-1)}(x,\xi )+\texttt{b}^{(-1)}(x,\xi )&= (\partial _{u_{xx}}f_2)_{xx}\Lambda ^{-1}(\xi )\Big (\frac{1}{4} -\frac{\xi ^{2}\Lambda ^{-2}(\xi )}{8}+\frac{\xi ^{4}\Lambda ^{-4}(\xi )}{4}\Big ) \\ {}&-\frac{1}{2}(\partial _{u_x}f_2)_{x}\Lambda ^{-1}(\xi )+(\partial _{u}f_2)\Lambda ^{-1}(\xi )\,. \end{aligned} \end{aligned}$$
(4.28)

Notice also that (recall (2.5))

$$\begin{aligned} \begin{aligned} -\xi ^{2}\Lambda ^{-1}(\xi )&=\frac{-\xi ^{2}}{\sqrt{|\xi |^{2}+m}}= -\sqrt{|\xi |^{2}+m}+\frac{m}{\sqrt{|\xi |^{2}+m}} =-\Lambda (\xi )+m\Lambda ^{-1}(\xi ) , \\ \xi ^{2}\Lambda ^{-2}(\xi )&=1-m\Lambda ^{-2}(\xi ). \end{aligned} \end{aligned}$$
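
Concretely, the recombination leading to (4.29) is the following elementary bookkeeping (a sketch; the symbols of order \(\le -3\) produced along the way contribute operators which are absorbed into the remainder in \(\textbf{M}^{-2}_{1}\), cf. Lemma 2.8). Using \((\partial _{u_{xx}}f_2)_{x}=\partial _{u_{x}}f_2\) and the identities above one has

$$\begin{aligned} \begin{aligned} &\textrm{i} \xi \Lambda ^{-1}(\xi )\big (\partial _{u_{x}}f_2-(\partial _{u_{xx}}f_2)_{x}\big )=0\,,\qquad -\xi ^{2}\partial _{u_{xx}}f_2\,\Lambda ^{-1}(\xi )=-\partial _{u_{xx}}f_2\,\Lambda (\xi )+m\,\partial _{u_{xx}}f_2\,\Lambda ^{-1}(\xi )\,, \\ &\frac{1}{4} -\frac{\xi ^{2}\Lambda ^{-2}(\xi )}{8}+\frac{\xi ^{4}\Lambda ^{-4}(\xi )}{4} =\frac{3}{8}-\frac{3m}{8}\Lambda ^{-2}(\xi )+\frac{m^{2}}{4}\Lambda ^{-4}(\xi )\,. \end{aligned} \end{aligned}$$

Hence the terms of order zero in (4.27) cancel, the order-one part reduces to \(-\partial _{u_{xx}}f_2\,\Lambda (\xi )\), and in the coefficient of \(\Lambda ^{-1}(\xi )\) only the factor \(\tfrac{3}{8}\) survives from the first line of (4.28), which accounts for the expression in (4.29).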

By the discussion above we have that (4.27) can be written as

$$\begin{aligned} \begin{aligned} (dX^{+}_{H^{(3)}})&(\varepsilon \varphi )[V]= \frac{\textrm{i} }{2}{Op^{\textrm{BW}}}\Big ( -\partial _{u_{xx}}f_2\Lambda (\xi ) \Big )[v+\overline{v}] \\ {}&+\frac{\textrm{i} }{2}{Op^{\textrm{BW}}}\Big ( \big ( m(\partial _{u_{xx}}f_2)+\frac{3}{8}(\partial _{u_{xx}}f_2)_{xx}-\frac{1}{2}(\partial _{u_{x}}f_2)_x +(\partial _{u}f_2) \big )\Lambda ^{-1}(\xi ) \Big )[v+\overline{v}]\,, \end{aligned} \end{aligned}$$
(4.29)

up to some remainder in \(\textbf{M}^{-2}_{1}\). In the latter equation we have used that pseudo-differential operators of order \(-2\) are 2-smoothing multilinear operators by Lemma 2.8. Recalling (4.14), we conclude that (4.29) implies (4.22).

Proof of Proposition 4.4

Notice that \(H^{(3)}=H^{(3,\le 1)}+H^{(3,\ge 2)}\). Hence

$$\begin{aligned} (dX_{H^{(3,\ge 2)}})(\varepsilon \varphi )[\cdot ]=(dX_{H^{(3)}})(\varepsilon \varphi )[\cdot ]-(dX_{H^{(3,\le 1)}})(\varepsilon \varphi )[\cdot ]. \end{aligned}$$

Since \(X_{H^{(3,\le 1)}}\) is finite rank and quadratic we have \((dX_{H^{(3,\le 1)}})(\varepsilon \varphi )\in \textbf{M}_1^{-2}\otimes \mathcal {M}_{2}(\mathbb {C})\). Hence the structure of the linearized operator \((dX_{H^{(3,\ge 2)}})(\varepsilon \varphi )[\cdot ]\) is essentially determined by Lemma 4.7, and (4.19) follows by (4.21).

The bound (4.20) follows from Lemmata 4.6, 2.4 to estimate the smoothing remainders, and from Lemma 4.7 and Lemma 2.7 (recall also Def. 2.5) to estimate the unbounded pseudo-differential terms.

4.3.2 Para-Linearization of the Nonlinear Term

We consider the nonlinear term \(\mathcal {Q}(\varepsilon \varphi )[V,V]\) appearing in (4.9). Recalling (4.10) we write

$$\begin{aligned} \begin{aligned} \mathcal {Q}(\varepsilon \varphi )[V, V]&:= \widetilde{\mathcal {Q}}(\varepsilon \varphi )[V, V]+\mathcal {Q}^{\star }(\varepsilon \varphi )[V, V]\,, \quad \textrm{where} \\ \widetilde{\mathcal {Q}}(\varepsilon \varphi )[V, V]&:= \frac{1}{2}\, X_{{H}^{(3, \ge 2)}}(V,V)\,, \\ \mathcal {Q}^{\star }(\varepsilon \varphi )[V, V]&:=\int _0^1 (1-t)\,d^2 X_{\mathcal {H} -H^{(3, \ge 2)}}(\varepsilon \varphi +t V)[V, V]\,dt\,. \end{aligned} \end{aligned}$$
(4.30)

We note that \(\mathcal {Q}^{\star }\) has finite rank and the following estimate holds

$$\begin{aligned} \Vert \mathcal {Q}^{\star }(\varepsilon \varphi )[V, V] \Vert _{{H}^s} \lesssim _s \Vert \varepsilon \varphi \Vert _{L^{\infty }}\Vert V \Vert _{H^{0}}^2\, \qquad \forall \,\, V\in {B_{\varepsilon }(H^s)}. \end{aligned}$$
(4.31)

Estimate (4.31) follows recalling (3.15), (3.18), (3.19), estimate (4.4) and the smallness condition (4.5). The aim of this section is to show that the term \(\widetilde{\mathcal {Q}}\) in (4.30) has a para-differential structure.

We have the following.

Proposition 4.8

Let \(s>2+1/2\), \(V\in H^{s}\) and let \(\mathcal {Q}(\varepsilon \varphi )[V,V]\) be the nonlinearity in (4.30) with \(\varepsilon \varphi \) satisfying (4.4). We have that

$$\begin{aligned} \mathcal {Q}(\varepsilon \varphi )[V,V]= \tfrac{\textrm{i} }{2} E{Op^{\textrm{BW}}}\big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}a(V;x)\Lambda (\xi ) + {\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}d(V;x)\Lambda ^{-1}(\xi ) \big )V+{R}(V)\,, \end{aligned}$$
(4.32)

where \(a(V;x), d(V;x)\) are in (4.14). The function R(V) has the form \(({R}^+(V), \overline{{R}^+(V)})^{T}\) and it satisfies a bound like (4.17). Moreover we have that \(a\in \textbf{S M}_1^{0}\) (see Def. 2.5) and

$$\begin{aligned} \begin{aligned} |a(V)|_{\mathcal {N}_p^{0}}&\lesssim \Vert V\Vert _{H^{p+s_0+2}}\,, \quad \forall \, p+s_0+2\le s\,,\,\quad p\in \mathbb {N}\,, \\ |d(V)|_{\mathcal {N}_p^{0}}&\lesssim \Vert V\Vert _{H^{p+s_0+4}}\,, \quad \forall \, p+s_0+4\le s\,,\,\quad p\in \mathbb {N}\,, \end{aligned} \end{aligned}$$
(4.33)

where \(s_0>1/2\).

Proof

By (4.30) we write

$$\begin{aligned} \mathcal {Q}(\varepsilon \varphi )[V, V]=\frac{1}{2}X_{H^{(3)}}(V,V)-\frac{1}{2}X_{H^{(3,\le 1)}}(V,V)+ \mathcal {Q}^{\star }(\varepsilon \varphi )[V, V]\,, \end{aligned}$$
(4.34)

where \(H^{(3)}\) is the Hamiltonian function in (2.12). The vector field \(X_{H^{(3,\le 1)}}\) is finite rank (recall Remark 3.1). Then the last two summands in the r.h.s. of (4.34) satisfy a bound like (4.17) using also the estimate (4.31). By applying the Bony-paralinearization formula (see [48, 53]) to the vector field \(X_{H^{(3)}}\), using symbolic calculus (see Lemma 2.7) and reasoning as in Lemma 4.7, one gets

$$\begin{aligned} X_{H^{(3)}}(V,V)=\textrm{i} E {Op^{\textrm{BW}}}\big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}a(V;x)\Lambda (\xi ) + {\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}d(V;x)\Lambda ^{-1}(\xi )\big )V \end{aligned}$$

up to some smoothing remainder satisfying (4.17). For more details we refer the reader to Proposition 3.6 in [33]. Therefore formula (4.32) follows. The bound (4.33) follows using the explicit definition of the symbols \(a(V;x), d(V;x)\) in (4.14) and by Lemma 2.6.

Proof of Proposition 4.3

It follows by Propositions 4.4 and 4.8.

Remark 4.9

(Hamiltonian structure 1). Recalling (4.15), Remark 4.5 and Proposition 4.4, we notice that the operator

$$\begin{aligned} \textrm{i} E{Op^{\textrm{BW}}}\Big (\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , V; x)\big )\Lambda (\xi ) +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{g}(\varepsilon \varphi , V; x)\Lambda ^{-1}(\xi ) \Big )_{|V\equiv 0}[\cdot ]+ \textrm{i} E\mathfrak {Q}(\varepsilon \varphi )[\cdot ] \end{aligned}$$

is Hamiltonian according to Definition 2.14.

5 The Estimates on the Error Function: High Frequencies

Consider the remainder V in (4.8) which solves (4.9). We shall provide a priori bounds on the norm of V as long as (4.11) holds. This section concerns the study of the high frequencies of V. In particular we consider the equation

$$\begin{aligned} \varepsilon ^{\beta }\partial _t \Pi ^{\perp }_S {V}= \Pi ^{\perp }_S d X_{\mathcal {H}}(\varepsilon \varphi ) [\varepsilon ^{\beta }V] +\Pi _S^{\perp } \mathcal {Q}(\varepsilon \varphi )[\varepsilon ^{\beta }V, \varepsilon ^{\beta }V] +\Pi ^{\perp }_S \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned}$$
(5.1)

where S is in (4.1). The main result of this section is the following.

Theorem 5.1

(Estimates of high modes) There is \(s_0\gg 1\) such that for any \(s\ge s_0\) there is a constant \(C_s>0\) such that if \(\varepsilon \) satisfies (4.5) the following holds. Consider \(\varepsilon \varphi \) in (4.8) and let V be a solution of (4.9) defined for \(t\in [0,T]\) for some \(T>0\). Then, if (4.11) and (4.4) hold true, one has

$$\begin{aligned} \begin{aligned} \varepsilon ^{2\beta }\Vert \Pi _{S}^{\perp }V\Vert _{H^{s}}^{2}&\le (1+\varepsilon C_{s})\varepsilon ^{2\beta }\Vert \Pi _{S}^{\perp }V(0)\Vert _{H^{s}}^{2} \\ {}&+C_{s}T \sup _{t\in [0,T]}\Vert \varepsilon \varphi \Vert _{H^{s}}^{3}\sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{2} \\ {}&+ C_{s}T\sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{3} +C_{s}T\varepsilon ^{5}\sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\,, \end{aligned} \end{aligned}$$
(5.2)

uniformly in \(t\in [0,T]\).

The proof of the above result involves several arguments. We start by rewriting equation (5.1) in a more suitable way. Let us define

$$\begin{aligned} U^{\perp }:=\varepsilon ^{\beta }\Pi _{S}^{\perp }V\,, \qquad U^{\perp }=\begin{pmatrix} u^{\perp }\\ \overline{u^{\perp }} \end{pmatrix}\,, \quad u^{\perp }:=\varepsilon ^{\beta }\Pi _S^{\perp } v\,. \end{aligned}$$
(5.3)

In the following lemma we provide some properties of the vector field in the r.h.s. of (5.1).

Lemma 5.2

(i) The function \(U^{\perp }\) in (5.3) satisfies the problem

$$\begin{aligned} \begin{aligned} \partial _{t}U^{\perp }&=\Pi _{S}^{\perp }\textrm{i} E{Op^{\textrm{BW}}}\Big (\big ( {\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]}+{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x)\big )\Lambda (\xi ) \Big )U^{\perp } \\ {}&\qquad \qquad +\Pi _{S}^{\perp }\textrm{i} E{Op^{\textrm{BW}}}\Big ( {\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{g}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x)\Lambda ^{-1}(\xi ) \Big )U^{\perp } \\ {}&\qquad \qquad +\Pi _{S}^{\perp }\textrm{i} E\mathfrak {Q}(\varepsilon \varphi )U^{\perp } +\Pi _{S}^{\perp }\widetilde{R}(\varepsilon \varphi ,\varepsilon ^{\beta }V) +\Pi _S^{\perp } \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned} \end{aligned}$$
(5.4)

where the symbols \(\texttt{f}, \texttt{g}\) and the remainder \(\mathfrak {Q}\) are given by Proposition 4.3. Moreover the remainder

$$\begin{aligned} \widetilde{R}(\varepsilon \varphi ;\varepsilon ^{\beta }V)= \big (\widetilde{R}^{+}(\varepsilon \varphi ;\varepsilon ^{\beta }V), \overline{\widetilde{R}^{+}(\varepsilon \varphi ;\varepsilon ^{\beta }V)}\big )^{T}, \end{aligned}$$

and there is \(s_0>1\) such that, for \(s\ge s_0\), it satisfies the bound

$$\begin{aligned} \begin{aligned} \Vert \widetilde{R}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\Vert _{{H}^{s}}\lesssim _{s}&\,\varepsilon ^{2\beta }\Vert V\Vert _{H^{s}}^{2}\,. \end{aligned} \end{aligned}$$
(5.5)

(ii) The vector field

$$\begin{aligned} \begin{aligned} \Pi _{S}^{\perp }\textrm{i} E{Op^{\textrm{BW}}}\Big (\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]}&+{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , \varepsilon ^{\beta }V; x)\big )\Lambda (\xi )\Big )_{|V\equiv 0}\Pi _{S}^{\perp }+ \Pi _{S}^{\perp }\textrm{i} E\mathfrak {Q}(\varepsilon \varphi )\Pi _{S}^{\perp } \\ {}&\qquad +\Pi _{S}^{\perp }\textrm{i} E{Op^{\textrm{BW}}}\Big ( {\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{g}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x,\xi )\Lambda ^{-1}(\xi ) \Big )_{|V\equiv 0} \\ {}&=\Pi _{S}^{\perp } (dX_{\mathcal {H}})(\varepsilon \varphi )\Pi _{S}^{\perp } \end{aligned} \end{aligned}$$

is Hamiltonian according to Def. 2.14.

Proof

By Proposition 4.3 and (5.3) (see also (4.9), (4.13) and (4.16)) we have that equation (5.1) can be written as

$$\begin{aligned} \begin{aligned} \partial _{t}U^{\perp }&= \Pi _{S}^{\perp }\textrm{i} E{Op^{\textrm{BW}}}\Big (\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , \varepsilon ^{\beta }V; x)\big )\Lambda (\xi )\Big )[\varepsilon ^{\beta }V] \\ {}&+\Pi _{S}^{\perp }\textrm{i} E{Op^{\textrm{BW}}}\Big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{g}(\varepsilon \varphi , \varepsilon ^{\beta }V; x)\Lambda ^{-1}(\xi )\Big ) [\varepsilon ^{\beta }V] \\ {}&+\Pi _{S}^{\perp }\textrm{i} E\mathfrak {Q}(\varepsilon \varphi )[\varepsilon ^{\beta }V] + \Pi _{S}^{\perp }R(\varepsilon ^{\beta }V) +\Pi _S^{\perp } \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,. \end{aligned} \end{aligned}$$
(5.6)

Then (5.4) comes from the fact that \(\varepsilon ^{\beta } V=\Pi _S \varepsilon ^{\beta } V+U^{\perp }\) and that the terms obtained by composing the operators in the right hand side of (5.6) with \(\Pi _S\) are bounded and satisfy (5.5). Recalling Propositions 4.4 and 4.3 we notice that

$$\begin{aligned} \begin{aligned} \Pi _{S}^{\perp }(dX_{\mathcal {H}})(\varepsilon \varphi )[\varepsilon ^{\beta }V]= \Pi _S^{\perp } d X_{\mathcal {H}}(\varepsilon \varphi ) [\Pi _S^{\perp }\varepsilon ^{\beta }V]\,, \end{aligned} \end{aligned}$$
(5.7)

where we used that \(\varepsilon \varphi \) is Fourier supported only on the tangential set S in (4.1), so that

$$\begin{aligned} \Pi _S^{\perp } d X_{\mathcal {H}_{res}^{(4, 0)}} (\varepsilon \varphi )= \Pi _S^{\perp } d X_{\mathcal {H}^{(n, \ge 2)}} (\varepsilon \varphi )[\Pi _S V] =0. \end{aligned}$$

This proves item (ii).

5.1 Block-Diagonalization and Basic Energy Estimates

In this section we construct a (linear) change of coordinates which block-diagonalizes the system (5.4) up to smoothing remainders. We first introduce a map, given in terms of a suitable para-product, in such a way that one can diagonalize the matrix of symbols

$$\begin{aligned} E\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , \varepsilon ^{\beta }V; x)\big )\,, \end{aligned}$$
(5.8)

where \(\texttt{f}\) is the symbol in (4.15). This is the content of Sect. 5.1.1. In Sect. 5.1.2 we provide a correction which gives us a map which is symplectic on the restriction of the space \(H^{s}(\mathbb {T};\mathbb {C})\times H^{s}(\mathbb {T};\mathbb {C})\) to functions Fourier supported only on \(S^{c}\). Finally in Sect. 5.1.3 we diagonalize the system (5.4) at the highest order.

5.1.1 Construction of an Approximate Diagonalizating Matrix

Let us define the matrices

(5.9)

where the symbols \(\texttt{g}^{+},\texttt{g}^{-}\) are defined as

$$\begin{aligned} \begin{aligned} \texttt{g}^{+}&:=\texttt{g}^{+}(\varepsilon \varphi , \varepsilon ^{\beta }V):=\frac{1+\texttt{f} +\lambda }{\sqrt{2\lambda \big (1+\texttt{f}+ \lambda \big ) }}, \\ \texttt{g}^{-}&:=\texttt{g}^{-}(\varepsilon \varphi ,\varepsilon ^{\beta }V) :=\frac{-\texttt{f}}{\sqrt{2\lambda \big (1 +\texttt{f} +\lambda \big ) }}\,, \end{aligned} \end{aligned}$$
(5.10)

and

$$\begin{aligned} \lambda :=\lambda (\varepsilon \varphi ,\varepsilon ^{\beta }V):= \sqrt{(1+\texttt{f})^{2} -(\texttt{f})^{2}}=\sqrt{1+2\texttt{f}}\,. \end{aligned}$$
(5.11)
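
Note, for later use, the normalization (a direct check, using \(\lambda ^{2}=1+2\texttt{f}\), so that \(2(1+\texttt{f}+\lambda )=(1+\lambda )^{2}\) and \(1+2\texttt{f}+\lambda =\lambda (1+\lambda )\))

$$\begin{aligned} (\texttt{g}^{+})^{2}-(\texttt{g}^{-})^{2} =\frac{(1+\texttt{f}+\lambda )^{2}-\texttt{f}^{2}}{2\lambda \big (1+\texttt{f}+\lambda \big )} =\frac{(1+\lambda )\big (1+2\texttt{f}+\lambda \big )}{\lambda (1+\lambda )^{2}}=1\,. \end{aligned}$$

This hyperbolic normalization is what allows one to write \(\texttt{g}^{+}\) and \(\texttt{g}^{-}\) as the \(\cosh \) and \(\sinh \) of a single real symbol in Lemma 5.4 below.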

The first result of this section is the following.

Lemma 5.3

Recall the symbol \(\texttt{f}\) in (4.15) (see also (4.14)), and the functions \(\varepsilon \varphi , V\) in (4.8). Assume (4.4) and (4.11). Then the following holds.

(i) the symbol \(\lambda -1\) in (5.11) is independent of \(\xi \in \mathbb {R}\) and belongs to (see Def. 2.5) \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\) with estimates uniform in \(t\in [0,T]\).

(ii) the symbols \(\texttt{g}^{+}-1, \texttt{g}^{-}\) in (5.10) are independent of \(\xi \in \mathbb {R}\) and belong to (see Def. 2.5) \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\) with estimates uniform in \(t\in [0,T]\).

(iii) One has that

$$\begin{aligned} \partial _{t}\texttt{g}^{\sigma }= \widetilde{\texttt{g}}_{1}^{\sigma }(\varepsilon \varphi ) +\widetilde{\texttt{g}}_{2}^{\sigma }(\varepsilon \varphi ,\varepsilon \varphi ) +\widetilde{\texttt{g}}_{\ge 3}^{\sigma }(\varepsilon \varphi ,\varepsilon ^\beta V)\,,\;\;\; \sigma \in \{\pm \}\,, \end{aligned}$$
(5.12)

where \( \widetilde{\texttt{g}}^{\sigma }_{j}\in \textbf{SM}_{j}^{0}\), \(j=1,2\), and \(\widetilde{\texttt{g}}_{\ge 3}^{\sigma }\), \(\sigma \in \{\pm \}\), are in \(\mathcal {N}_{p}^{0}\) with

$$\begin{aligned} \begin{aligned} |\widetilde{\texttt{g}}_{\ge 3}^{\sigma }|_{\mathcal {N}_{p}^{0}}&\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }} +\varepsilon ^5\,, \end{aligned} \end{aligned}$$
(5.13)

\(\sigma \in \{\pm \}\), for any \(p+\mu \le s\), \(p\in \mathbb {N}\), for some \(\mu >1/2\), with estimates uniform in \(t\in [0,T]\).

(iv) Recall (5.8), (5.9). One has that

$$\begin{aligned} G^{-1}(\varepsilon \varphi ,\varepsilon ^{\beta }V)E\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , \varepsilon ^{\beta }V; x)\big ) G(\varepsilon \varphi ,\varepsilon ^{\beta }V)=E{\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]}\lambda (\varepsilon \varphi ,\varepsilon ^{\beta }V)\,. \end{aligned}$$
(5.14)

Proof

Item (i). By Taylor expanding the function \(\lambda \) in (5.11) we get

$$\begin{aligned} \lambda =1+\texttt{f}-\frac{\texttt{f}^{2}}{2} +\frac{3\texttt{f}^{3}}{2}\int _{0}^{1}(1+2t\texttt{f})^{-\frac{5}{2}}(1-t)^{2}dt\,, \end{aligned}$$
(5.15)

from which we deduce, using also (4.15),

$$\begin{aligned} \begin{aligned} \lambda =1+a(\varepsilon \varphi )-\tfrac{1}{2}[a(\varepsilon \varphi )]^{2}+\tfrac{1}{2}a(\varepsilon ^{\beta }V)-\tfrac{1}{2}a(\varepsilon \varphi )&a(\varepsilon ^{\beta }V) -\tfrac{1}{8}[a(\varepsilon ^{\beta }V)]^{2} \\ {}&+\frac{3\texttt{f}^{3}}{2}\int _{0}^{1}(1+2t\texttt{f})^{-\frac{5}{2}}(1-t)^{2}dt\,. \end{aligned} \end{aligned}$$
(5.16)

Using (4.33), (4.4), the fact that the symbol \(a(\varepsilon \varphi )\) in (4.14) is in \(\textbf{SM}^{0}_{1}\), and Lemma 2.8 we deduce that \(\lambda -1\) belongs to the class \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\) of Definition 2.5.

Item (ii). It follows using the explicit formulæ  (5.10) and reasoning as done for Item (i).

Item (iii). Recalling \(\mathcal {H}\) in (3.15) and (4.6), we notice that the function \(\varepsilon \varphi \) in (4.8) solves the equation

$$\begin{aligned} \partial _{t} \varepsilon \varphi =X_{\mathcal {H}}(\varepsilon \varphi )-\text{ Res}_{\mathcal {H}}(\varepsilon \varphi ). \end{aligned}$$

By the properties of \(\mathcal {H}\) given by Proposition 3.9 we can write

$$\begin{aligned} \begin{aligned} \partial _{t} \varepsilon \varphi&= \textrm{i} E{Op^{\textrm{W}}}({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} \Lambda (\xi )) \varepsilon \varphi +\textrm{i} E M(\varepsilon \varphi ) \varepsilon \varphi -\text{ Res}_{\mathcal {H}}(\varepsilon \varphi ) \\ M(\varepsilon \varphi )&=M_1(\varepsilon \varphi )+M_2(\varepsilon \varphi )+M_{\ge 3}(\varepsilon \varphi ) \end{aligned} \end{aligned}$$
(5.17)

for some real and self-adjoint matrix of operators \(M(\varepsilon \varphi )\), where \(M_i\in \textbf{M}^{1}_i\otimes \mathcal {M}_2(\mathbb {C})\), \(i=1,2\), and \(M_{\ge 3}\in \textbf{NH}^{1}_{3}[r]\otimes \mathcal {M}_2(\mathbb {C})\). Recall also that V solves equation (4.9) which, by Proposition 4.3, can be written as

$$\begin{aligned} \varepsilon ^{\beta }\partial _{t}V=\mathcal {Y}(\varepsilon \varphi ,\varepsilon ^{\beta }V)+\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ) \end{aligned}$$
(5.18)

where \(\mathcal {Y}(\varepsilon \varphi ,V)\) is in (4.16). We now compute the time derivative of the symbol \(\texttt{g}^{+}\) in (5.10). The case of \(\texttt{g}^{-}\) is similar. By item (ii) (recall Def. 2.5) we have that \(\texttt{g}^{+}\) admits the expansion

$$\begin{aligned} \texttt{g}^{+}=1+\texttt{g}^{+}_{1}(\varepsilon \varphi )+\texttt{g}^{+}_{2}(\varepsilon \varphi ,\varepsilon \varphi ) +\texttt{g}_{\ge 3}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \quad \texttt{g}^{+}_{j}\in \textbf{SM}_{j}^{0}\,,\;\; j=1,2\,, \end{aligned}$$
(5.19)

and \(\texttt{g}_{\ge 3}^{+}\) is in \(\mathcal {N}_{p}^{0}\) with

$$\begin{aligned} |\texttt{g}_{\ge 3}^{+}|_{\mathcal {N}_{p}^{0}}\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }}\,, \end{aligned}$$
(5.20)

for any \(p+\mu \le s\), \(p\in \mathbb {N}\) and some \(\mu >1/2\), with estimates uniform in \(t\in [0,T]\). Moreover one also has (recall estimates (2.42))

$$\begin{aligned} \begin{aligned} |d_{\varepsilon \varphi }\texttt{g}^{+}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[h]|_{\mathcal {N}_{p}^{0}}&\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{2}_{H^{p+\mu }}\Vert h\Vert _{H^{p+\mu }} +\Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }} \Vert h\Vert _{H^{p+\mu }}\,, \\ |d_{\varepsilon ^{\beta }V}\texttt{g}^{+}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[h]|_{\mathcal {N}_{p}^{0}}&\lesssim _{s} \Vert h\Vert _{H^{p+\mu }}\,, \end{aligned} \end{aligned}$$
(5.21)

for any \(p+\mu \le s\), \(p\in \mathbb {N}\), for any \(h\in H^{s}\).

Consider the symbol \(\texttt{g}^{+}_{1}\) in (5.19), which is linear in \(\varepsilon \varphi \); by linearity we have \(\partial _{t}\texttt{g}^{+}_{1}(\varepsilon \varphi )=\texttt{g}^{+}_{1}(\partial _{t}(\varepsilon \varphi ))\), with \(\partial _{t}(\varepsilon \varphi )\) given by (5.17).

Using the properties of the matrices of operators \(M_{i}(\varepsilon \varphi )\), \(i=1,2\), \(M_{\ge 3}(\varepsilon \varphi )\), and the composition properties in Lemma 2.9, we deduce

$$\begin{aligned} \begin{aligned} \partial _{t}\texttt{g}^{+}_{1}(\varepsilon \varphi )&= \texttt{a}_1(\varepsilon \varphi )+\texttt{a}_2(\varepsilon \varphi )+\texttt{a}_{\ge 3}(\varepsilon \varphi ) +\texttt{g}^{+}_{1}(-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )), \end{aligned} \end{aligned}$$

for some symbols \(\texttt{a}_{j}\in \textbf{SM}^{0}_j\), \(j=1,2\), and \(\texttt{a}_{\ge 3}\in \textbf{SNH}^{\; m}_{3}[r]\). The term \(\texttt{g}^{+}_{1}(-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ))\) can be estimated (as a symbol of order 0) using Lemmas 2.6 and 4.1 to estimate the residual. One concludes that \(\texttt{g}^{+}_{1}(-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ))\) can be absorbed in the term \(\widetilde{\texttt{g}}_{\ge 3}^{+}\) in (5.12) satisfying (5.13). Concerning the term \(\texttt{g}^{+}_{2}\) in (5.19) (which is quadratic in \(\varepsilon \varphi \)), one can reason as done for \(\texttt{g}^{+}_{1}\). Consider now \(\texttt{g}^{+}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)\) in (5.19). One has

$$\begin{aligned} \partial _{t}\texttt{g}^{+}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)= (d_{\varepsilon \varphi }\texttt{g}^{+}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta } V))[\varepsilon \partial _{t}\varphi ] +(d_{\varepsilon ^{\beta }V}\texttt{g}^{+}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V))[\varepsilon ^{\beta }\partial _{t}V]. \end{aligned}$$

The estimate (5.13) follows using (5.21), (5.17), (5.18), together with (4.7) and Proposition 3.9 to estimate the norms of \(\varepsilon \partial _{t}\varphi \), and with (4.18) and Lemma 4.1 to estimate \(\varepsilon ^{\beta }\partial _{t}V\).

Item (iv). Identity (5.14) follows by an explicit computation using (5.9), (5.10) and (5.8).
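
A quick way to read item (iv) is the following sketch (we use the standard convention \(E={\bigl [{\begin{matrix}1&{}0\\ 0&{}-1\end{matrix}}\bigr ]}\); the opposite sign convention leads to the same conclusion). The matrix of symbols in (5.8) is

$$\begin{aligned} A:=E\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}\big )={\bigl [{\begin{matrix}1+\texttt{f}&{}\texttt{f}\\ -\texttt{f}&{}-(1+\texttt{f})\end{matrix}}\bigr ]}\,,\qquad \textrm{tr}\,A=0\,,\quad \det A=-\big ((1+\texttt{f})^{2}-\texttt{f}^{2}\big )=-\lambda ^{2}\,, \end{aligned}$$

so \(A\) has eigenvalues \(\pm \lambda \); identity (5.14) expresses the fact that the columns of \(G(\varepsilon \varphi ,\varepsilon ^{\beta }V)\), built from the symbols \(\texttt{g}^{\pm }\), are corresponding eigenvectors.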

The idea now is that a possible map which block-diagonalizes the system (5.1) at the highest order would be \({Op}(G^{-1}_{\chi })\) with \(G^{-1}\) as in (5.9). This is a consequence of Item (iv) in Lemma 5.3. However the linear map \(Z\mapsto {Op}(G^{-1}_{\chi }(\varepsilon \varphi ,\varepsilon ^{\beta }V))Z\) is not symplectic. In the following we will show how to construct a symplectic correction of such a map.

Lemma 5.4

Under the hypotheses of Lemma 5.3 there exists a real valued symbol \(\texttt{m}:=\texttt{m}(\varepsilon \varphi , \varepsilon ^{\beta }V)\) such that the following holds:

(i) one has

$$\begin{aligned} \begin{aligned} \texttt{g}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V)&=\cosh (|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|)\,, \\ \texttt{g}^{-}(\varepsilon \varphi ,\varepsilon ^{\beta }V)&= \frac{\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)}{|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|} \sinh (|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|)\,, \end{aligned} \end{aligned}$$
(5.22)

where \(\texttt{g}^{\sigma }\), \(\sigma \in \{\pm \}\), are given in (5.10).

(ii) the symbol \(\texttt{m}\) is in (see Def. 2.5) \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\).

(iii) One has that

$$\begin{aligned} \partial _{t}\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)= \widetilde{\texttt{m}}_{1}(\varepsilon \varphi ) +\widetilde{\texttt{m}}_{2}(\varepsilon \varphi ,\varepsilon \varphi ) +\widetilde{\texttt{m}}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \end{aligned}$$
(5.23)

where \(\widetilde{\texttt{m}}_{j}\in \textbf{SM}_{j}^{0}\), \(j=1,2\), and \(\widetilde{\texttt{m}}_{\ge 3}\) is in \(\mathcal {N}_{p}^{0}\) with

$$\begin{aligned} |\widetilde{\texttt{m}}_{\ge 3}|_{\mathcal {N}_{p}^{0}}\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }} +\varepsilon ^5\,, \end{aligned}$$
(5.24)

for any \(p+\mu \le s\), \(p\in \mathbb {N}\), \(\mu >1/2\), with estimates uniform in \(t\in [0,T]\).

Proof

We look for a solution of the equations (5.22). First of all notice that, using the estimates on the symbol \(\texttt{f}\) (see (4.15) and (4.33)), the estimate on \(\lambda \in \mathbf {S\Sigma }^{\; m}_{1}[r,3]\) (see item (i) of Lemma 5.3), estimates (4.4) and (4.11) on the functions \(\varepsilon \varphi , V\) and the smallness assumption on \(\varepsilon \), we get that

$$\begin{aligned} (\texttt{g}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{2}-1 {\mathop {=}\limits ^{(5.10), (5.11)}} \frac{\texttt{f}^{2}}{2\lambda \big (1 +\texttt{f} +\lambda \big )}\ge 0. \end{aligned}$$

Therefore we set

$$\begin{aligned} \begin{aligned} |\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|&:=\textrm{arccosh}(\texttt{g}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V)) \\ {}&=\ln \Big (\texttt{g}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V) +\sqrt{(\texttt{g}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{2}-1}\Big )\,. \end{aligned} \end{aligned}$$
(5.25)

Consider now the second equation (5.22). We first observe that

$$\begin{aligned} \frac{\sinh (|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|)}{|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|}=1 +\sum _{k\ge 1} \frac{({\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{2k}}{(2k+1)!}\ge 1. \end{aligned}$$

Hence we set

$$\begin{aligned} \texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V):=\texttt{g}^{-}(\varepsilon \varphi ,\varepsilon ^{\beta }V) \frac{|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|}{\sinh (|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|)}\,. \end{aligned}$$
(5.26)
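
A short consistency check: the two definitions (5.25) and (5.26) are compatible. Indeed, by the normalization \((\texttt{g}^{+})^{2}-(\texttt{g}^{-})^{2}=1\) observed after (5.11),

$$\begin{aligned} \sinh \big (|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|\big )=\sqrt{(\texttt{g}^{+}(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{2}-1}=|\texttt{g}^{-}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|\,, \end{aligned}$$

so the symbol defined in (5.26) has indeed \(|\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|\) as its absolute value, and both equations in (5.22) hold.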

Using the properties of the symbols \(\texttt{g}^{+}, \texttt{g}^{-}\) (given by Lemma 5.3), by Taylor expanding formulæ  (5.25), (5.26), reasoning as in item (i) of Lemma 5.3 and using (5.19)–(5.20), one obtains that \(\texttt{m}\) belongs to \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\) (see Def. 2.5). This proves item (ii).

Item (iii) follows by an explicit computation using (5.26), estimates (5.21) and reasoning as done in the proof of item (iii) of Lemma 5.3.

We need the following technical Lemma, which is essentially a consequence of Proposition 3.6 in [35].

Lemma 5.5

Under the hypotheses of Lemma 5.3 the following holds. For any \(k\ge 1\) one has that the operator

$$\begin{aligned} Q^{(k)}(\varepsilon \varphi ,\varepsilon ^{\beta }V):= \Big ({Op^{\textrm{BW}}}(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Big )^{k} -{Op^{\textrm{BW}}}\Big ((\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{k} \Big ) \end{aligned}$$

belongs to the class \(\mathbf {\Sigma }^{-3}_{1}[r,3]\otimes \mathcal {M}_2(\mathbb {C}) \) (see Def. 2.3). More precisely there is \(\texttt{C}=\texttt{C}(s)\) independent of \(k\ge 1\), such that

$$\begin{aligned} \begin{aligned} \Vert Q^{(k)}(\varepsilon \varphi ,\varepsilon ^{\beta }V)Z \Vert _{ H^{s+ \rho }}&\le \texttt{C}^{k} (\Vert \varepsilon \varphi \Vert _{H^{s}}^k +\Vert \varepsilon ^{\beta }V\Vert ^{k}_{H^{s}})\Vert Z\Vert _{H^{s}}\,, \\ \Vert d_{\varepsilon \varphi }\big (Q^{(k)}(\varepsilon \varphi ,\varepsilon ^{\beta }V) \big ) (Z)h \Vert _{ H^{s+ \rho }}&\le \texttt{C}^{k} (\Vert \varphi \Vert _{H^{s}}^{k-1} +\Vert V\Vert ^{k}_{H^{s}})\Vert Z\Vert _{H^{s}} \Vert h\Vert _{H^{s}}\,, \\ \Vert d_{\varepsilon ^{\beta }V}\big (Q^{(k)}(\varepsilon \varphi ,\varepsilon ^{\beta }V) \big ) (Z)h \Vert _{ H^{s+ \rho }}&\le \texttt{C}^{k} \Vert V\Vert ^{k-1}_{H^{s}}\Vert Z\Vert _{H^{s}} \Vert h\Vert _{H^{s}}. \end{aligned} \end{aligned}$$
(5.27)

for any \(Z={\bigl [{\begin{matrix}z\\ \overline{z}\end{matrix}}\bigr ]}\in H^s\).

Proof

The case \(k=1\) is trivial. Let us consider \(k=2\). This case is essentially a consequence of (2.28) in Remark 2.2. Indeed, notice that

$$\begin{aligned} \begin{aligned} Q^{(2)}(\varepsilon \varphi ,\varepsilon ^{\beta }V)h&= \Big ({Op^{\textrm{BW}}}(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\circ {Op^{\textrm{BW}}}(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V))- {Op^{\textrm{BW}}}\big ((\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V) )^{2}\big )\Big )h \\ {}&=\sum _{\xi \in \mathbb {Z}}e^{\textrm{i} \xi \, x}\widehat{(Q^{(2)}h)}(\xi )\,, \\ \widehat{(Q^{(2)}h)}(\xi )&=(2\pi )^{-\frac{3}{2}}\sum _{\eta ,\theta \in \mathbb {Z}} (r_1-r_2)(\xi ,\theta ,\eta ) \widehat{\texttt{m}}(\xi -\theta ) \widehat{\texttt{m}}(\theta -\eta )\widehat{h}(\eta ) \end{aligned} \end{aligned}$$
(5.28)

where

$$\begin{aligned} r_1(\xi ,\theta ,\eta ):=\chi \left( \frac{|\xi -\theta |}{\langle \xi +\theta \rangle }\right) \chi \left( \frac{|\theta -\eta |}{\langle \theta +\eta \rangle }\right) \,, \qquad r_2(\xi ,\eta ):=\chi \left( \frac{|\xi -\eta |}{\langle \xi +\eta \rangle }\right) \end{aligned}$$
(5.29)

and \(\widehat{\texttt{m}}(\xi )\) denotes the Fourier transform in x of the function \((\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V))(x)\) evaluated at \(\xi \in \mathbb {Z}\). We remark that the remainder \(Q^{(2)}\) is bilinear in the symbol \(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\). Recall that, by item (ii) of Lemma 5.4, the symbol \(\texttt{m}\) belongs to \(\mathbf {S\Sigma }^{\; m}_{1}[r,3]\). Therefore, using (5.28), we can expand \(Q^{(2)}\) as the sum of two remainders, the first depending only on \(\varepsilon \varphi \) and the second depending at least linearly in the variable \(\varepsilon ^{\beta }V\). The estimates (5.27) can be deduced by following almost word by word the proof of Proposition 2.5 in [36], and using formulæ  (5.28) and (5.29). In order to get the result for any \(k\ge 3\), one reasons by induction following the proof of Proposition 3.6 in [35]. The Taylor expansions in \(\varepsilon \varphi \) and \(\varepsilon ^{\beta }V\) follow again recalling that the remainders \(Q^{(k)}\) are multilinear in the symbol \(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\in \mathbf {S\Sigma }^{\; m}_{1}[r,3]\).

Consider the matrix of symbols

$$\begin{aligned} \textbf{M}:=\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)=\left( \begin{matrix} 0 &{} \texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\\ {\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)} &{} 0 \end{matrix}\right) \,, \end{aligned}$$
(5.30)

where \(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is given by Lemma 5.4. We shall study, for \(\tau \in [0,1]\), the properties of the flow map of the following problem:

$$\begin{aligned} \left\{ \begin{aligned}&\partial _{\tau }\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ] ={Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)) \Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ],\\&\Phi _{\texttt{m}}^{0}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]=\mathbb {1}[\cdot ]\,, \end{aligned}\right. \end{aligned}$$
(5.31)

where \(\textbf{M}_{\chi }\) is the matrix whose entries are given in terms of the symbol \(\texttt{m}_{\chi }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) defined as in (4.15). More precisely we prove the following.

Lemma 5.6

Under the hypotheses of Lemma 5.3 the following holds. The flow map of (5.31) is well-posed for \(\tau \in [0,1]\) and has the form

$$\begin{aligned} \Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]-\mathbb {1}=\texttt{Q}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V) \qquad \tau \in [0,1]\,, \end{aligned}$$
(5.32)

for some real operator \(\texttt{Q}\in \mathbf{\Sigma }_1^{0}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) (see Def. 2.3), with estimates uniform in \(\tau \in [0,1]\). The same holds for the inverse map \((\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{-1}[\cdot ]\). Moreover for the time one flow map we have the following expansion

$$\begin{aligned}{} & {} \Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]_{|\tau =1}:= {Op^{\textrm{BW}}}\big ( G^{-1}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )[\cdot ] +\texttt{R}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ], \end{aligned}$$
(5.33)
$$\begin{aligned}{} & {} (\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))^{-1}[\cdot ]_{|\tau =1}:= {Op^{\textrm{BW}}}\big ( G(\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )[\cdot ] +\texttt{R}'(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ], \end{aligned}$$
(5.34)

where \(G,G^{-1}\) are in (5.9), the operators \(\texttt{R},{\texttt{R}'}\) are real and belong to \(\mathbf{\Sigma }_1^{-3}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) (see Def. 2.3). Finally

$$\begin{aligned} (\partial _{t}\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ])_{|\tau =1}:= {Op^{\textrm{BW}}}\big ( \widetilde{G}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )[\cdot ]+ \widetilde{\texttt{R}}_{\le 2}(\varepsilon \varphi )[\cdot ]+\widetilde{\texttt{R}}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]\,, \end{aligned}$$
(5.35)

where \(\widetilde{G}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is a \(2\times 2\) matrix of real valued symbols, independent of \(\xi \in \mathbb {R}\), of the form

$$\begin{aligned} \widetilde{G}(\varepsilon \varphi ,\varepsilon ^{\beta }V):= \widetilde{G}_1(\varepsilon \varphi ) +\widetilde{G}_2(\tau ;\varepsilon \varphi ,\varepsilon \varphi ) +\widetilde{G}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \end{aligned}$$
(5.36)

where \(\widetilde{G}_j\in \textbf{SM}_j^{0}\otimes \mathcal {M}_2(\mathbb {C})\), \(j=1,2\), and where \(\widetilde{G}_{\ge 3}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is in \(\mathcal {N}_{p}^{0}\) with

$$\begin{aligned} |\widetilde{G}_{\ge 3}|_{\mathcal {N}_{p}^{0}}\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }} +\varepsilon ^5\,, \end{aligned}$$
(5.37)

for any \(p+\mu \le s\), \(p\in \mathbb {N}\), for some \(\mu >1/2\), with estimates uniform in \(\tau \in [0,1]\). Moreover the operator \(\widetilde{\texttt{R}}_{\le 2}\) has the form

$$\begin{aligned} \widetilde{\texttt{R}}_{\le 2}(\tau ;\varepsilon \varphi ):=\widetilde{\texttt{R}}_{1}(\tau ;\varepsilon \varphi )+ \widetilde{\texttt{R}}_{ 2}(\tau ;\varepsilon \varphi ),\qquad \widetilde{\texttt{R}}_{j}\in \textbf{M}^{-3}_{j}\otimes \mathcal {M}_2(\mathbb {C}),\;\;\;j=1,2, \end{aligned}$$

and \(\widetilde{\texttt{R}}_{\ge 3}\) satisfies

$$\begin{aligned} \Vert \widetilde{\texttt{R}}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)Z\Vert _{H^{s+3}} \lesssim _{s} \Vert Z\Vert _{H^{s}}\big ( \Vert \varepsilon \varphi \Vert ^{3}_{H^{s}}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{s}} +\varepsilon ^5\big )\,. \end{aligned}$$
(5.38)

Proof

By Lemma 2.2 (see (2.24)) and the bounds for the semi-norm of the symbol \(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\in \mathbf{S\Sigma }_1^{0}[r,3]\), we have that \({Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\) is a bounded operator on \(\mathbf{{H}}^{s}\). Then, by the standard theory of ODEs on Banach spaces, we have that the flow \(\Phi _{\texttt{m}}^{\tau }\) is well-defined. In particular we have

$$\begin{aligned} \Phi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]= \exp \Big (\tau {Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Big ) =\sum _{k\ge 0}\frac{\tau ^{k}}{k!} ({Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)))^{k}. \end{aligned}$$

Hence by Lemma 5.5 one has

$$\begin{aligned} \Phi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]= {Op^{\textrm{BW}}}\Big ( \exp (\tau \textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Big )+ \sum _{k\ge 0}\frac{\tau ^{k}}{k!}Q^{(k)}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,. \end{aligned}$$

Reasoning as in Corollary 3.1 in [35], one deduces that

$$\begin{aligned} \Phi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]= {Op^{\textrm{BW}}}\Big ( \exp (\tau \textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Big )+ \texttt{R}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V) \end{aligned}$$
(5.39)

where \(\texttt{R}^{\tau }\in \mathbf{\Sigma }_1^{-3}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\), with estimates uniform in \(\tau \in [0,1]\). The latter assertion follows by estimates (5.27), using the smallness assumption on \(\varepsilon \) to obtain the convergence of the series. Furthermore, by (5.30), we have

$$\begin{aligned} \exp \big (\tau \textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )= \left( \begin{matrix} \cosh (\tau |\texttt{m}|) &{} \tfrac{\texttt{m}}{|\texttt{m}|}\sinh (\tau |\texttt{m}|)\\ \tfrac{\texttt{m}}{|\texttt{m}|}\sinh (\tau |\texttt{m}|) &{} \cosh (\tau |\texttt{m}|) \end{matrix} \right) \,,\qquad \texttt{m}=\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,. \end{aligned}$$
(5.40)
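
Indeed, since \(\textbf{M}^{2}=\texttt{m}^{2}{\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]}\), the even and odd powers in the exponential series resum separately (a direct computation, consistent with (5.22)):

$$\begin{aligned} \exp (\tau \textbf{M})=\sum _{k\ge 0}\frac{(\tau \texttt{m})^{2k}}{(2k)!}{\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +\sum _{k\ge 0}\frac{\tau ^{2k+1}\texttt{m}^{2k}}{(2k+1)!}\textbf{M} =\cosh (\tau |\texttt{m}|){\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]}+\frac{\sinh (\tau |\texttt{m}|)}{|\texttt{m}|}\,\textbf{M}\,. \end{aligned}$$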

We claim that there exist two real valued symbols \(\texttt{K}^{\sigma }(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V)\), \(\sigma \in \{\pm \}\), independent of \(\xi \in \mathbb {R}\), such that

(5.41)

with estimates uniform in \(\tau \in [0,1]\). Indeed \(\cosh (\tau |\texttt{m}|)\) and \((\texttt{m}/|\texttt{m}|)\sinh (\tau |\texttt{m}|)\) are analytic functions of the symbol \(\texttt{m}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\in \mathbf{S\Sigma }_1^0[r,3]\) (see Lemma 5.4 and Def. 2.5). Therefore the multilinear expansions in \(\varepsilon \varphi \) and estimates (2.42) follow by explicit computations. This implies the claim (5.41). By the discussion above we have that (5.39) reads

$$\begin{aligned} \Phi ^{\tau }_{\texttt {m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]= {Op^{\textrm{BW}}}\Big ( \begin{matrix}{} \texttt {K}^{+}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V) &{} \texttt {K}^{-}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V)\\ \texttt {K}^{-}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V)&{} \texttt {K}^{+}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V) \end{matrix} \Big )+ \texttt {R}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,. \end{aligned}$$
(5.42)

This implies (5.32) recalling Lemma 2.7-(iii) and the fact that \(\texttt{R}^{\tau }\in \mathbf{\Sigma }_1^{-3}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\).

Using Lemma 5.4 (see (5.22)) and recalling also (5.9)–(5.10) we deduce that formulæ  (5.39), (5.40) and (5.42) imply the expansion (5.33). The expansion (5.34) follows similarly. Let us check (5.35). By differentiating equation (5.31) in t we get

$$\begin{aligned} \begin{aligned} \partial _{\tau }(\partial _{t}\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ]&= {Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)) (\partial _{t}\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ] \\ {}&\qquad \qquad \qquad \qquad +{Op^{\textrm{BW}}}(\partial _{t}\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ], \end{aligned} \end{aligned}$$

with \((\partial _{t}\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ]_{|\tau =0}=0\). By the Duhamel formula we deduce that

$$\begin{aligned} (\partial _{t}\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,V))[\cdot ]=\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,V) \int _{0}^{\tau }(\Phi _{\texttt{m}}^{\sigma }(\varepsilon \varphi ,V))^{-1} {Op^{\textrm{BW}}}(\partial _{t}\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Phi _{\texttt{m}}^{\sigma }(\varepsilon \varphi ,V)[\cdot ]d\sigma . \end{aligned}$$

The term \({Op^{\textrm{W}}}(\partial _{t}{} \textbf{M}_{\chi }(\varepsilon \varphi ,\varepsilon ^{\beta }V))\) in the integral is completely under control for any \(t\in [0,T]\) thanks to item (iii) of Lemma 5.4, recalling (5.30). Therefore the expansion (5.35), together with (5.36)–(5.37) and estimate (5.38), follows using the expansion (5.42) of the flow \(\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\), Lemma 2.2, Lemma 2.10, Lemma 2.8, and the expansions (5.23) of \(\partial _{t}\texttt{m}\) and (5.24).

Remark 5.7

(Hamiltonian structure 2). Notice that the operator \({Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))_{|V\equiv 0}[\cdot ]\) is Hamiltonian according to Definition 2.14. Therefore the flow map \(\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)_{|V\equiv 0}[\cdot ]\) is symplectic according to Definition 2.15.

Lemma 5.8

(Equivalence of the norms) Consider the function \(\varepsilon \varphi \) in (4.8) and let V be a solution of (4.9). There is \(s_0>1\) such that for any \(s\ge s_0\) there is a constant \(C_s>1\) such that, if (4.11) and (4.4) hold true with \(\varepsilon >0\) satisfying (4.5), the following holds. One has that

$$\begin{aligned} \sup _{\tau \in [0,1]}\Vert (\Phi ^{\tau }_{\texttt{m}} (\varepsilon \varphi ,\varepsilon ^{\beta }V) )^{\pm 1}Z\Vert _{H^{s}} \le \Vert Z\Vert _{H^{s}} (1+C_s\Vert \varepsilon \varphi \Vert _{H^{s}} +C_{s}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}})\,, \end{aligned}$$
(5.43)

for any \(Z\in H^s\), where \(\Phi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is the flow map given in Lemma 5.6. Moreover, defining the function

$$\begin{aligned} W:={\bigl [{\begin{matrix}w\\ \overline{w}\end{matrix}}\bigr ]}:=\big (\Phi ^{\tau }_{\texttt{m}} (\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )_{|\tau =1}[\varepsilon ^{\beta }V]\,, \end{aligned}$$
(5.44)

one has the equivalence

$$\begin{aligned} (1+\varepsilon C_{s})^{-1}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}} \le \Vert W\Vert _{H^{s}} \le (1+\varepsilon C_{s}) \Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\,. \end{aligned}$$
(5.45)

Proof

Consider the map in (5.33) and recall \(G^{-1}\) in (5.9)–(5.10). Then, using (5.20) and (2.43) to estimate the action of the pseudo-differential operator \({Op^{\textrm{BW}}}\big ( G^{-1}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )\), using that \(\texttt{R}\in \mathbf{\Sigma }_1^{-3}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) (see Lemma 5.6) and the smallness assumptions on \(\varepsilon \varphi , V\), one gets the bound (5.43) for the map \(\Phi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\). The bound (5.43) on the inverse follows similarly using (5.34). The equivalence (5.45) follows by (5.43) using also (4.4), (4.11) and the smallness (4.5).

5.1.2 The Symplectic Correction

The map \(\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]\) introduced in Lemma 5.6 is a linear symplectic map w.r.t. the symplectic form \(\omega \) in (2.14). However we need to study the equation (5.4) which is posed on the subspace

$$\begin{aligned} H_{\perp }^{s}:=\{U\in H^s \;:\; \Pi _{S}^{\perp }U=U\}\, . \end{aligned}$$
(5.46)

Hence we need to find a correction of the map \(\Phi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]\) which is symplectic with respect to the restricted symplectic form \(\omega \big (\Pi _S^{\perp } (\cdot ), \Pi _S^{\perp } (\cdot ) \big )\). This fact plays an important role in order to prove item (iii) of Prop. 5.11 (see also Remark 5.12). The Hamiltonian structure of these operators in the conjugated vector field will be used in Sect. 5.2 to ensure that some terms in the energy forms vanish on resonances.

This is the content of the next lemma.

Lemma 5.9

Recall (5.30), (5.31). Under the hypotheses of Lemma 5.3 the following holds. Consider the flow \(\Psi ^{\tau }_{\texttt{m}}=\Psi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\), \(\tau \in [0, 1]\), defined by the system

$$\begin{aligned} \left\{ \begin{aligned}&\partial _{\tau }\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ] =\Pi _S^{\perp }{Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Pi _S^{\perp } \Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ],\\&\Psi _{\texttt{m}}^{0}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]=\mathbb {1}_{\perp }[\cdot ]\,, \end{aligned}\right. \end{aligned}$$
(5.47)

where \(\mathbb {1}_{\perp }\) is the identity on \( H_{\perp }^{s}\) in (5.46). There exists a real-to-real matrix of linear operators \(\Theta ^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) belonging to \(\mathbf{\Sigma }_1^{-\rho }[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) for any \(\rho \ge 0\) with estimates uniform in \(\tau \in [0,1]\), such that

$$\begin{aligned} \Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi , \varepsilon ^{\beta }V)= \Pi _S^{\perp } \Phi ^{\tau }_{\texttt{m}}(\varepsilon \varphi , \varepsilon ^{\beta }V) \circ (\mathbb {1}+ \Theta ^{\tau }(\varepsilon \varphi , \varepsilon ^{\beta }V)) \,\Pi _S^{\perp }\,. \end{aligned}$$
(5.48)

In particular one has the expansion

$$\begin{aligned} \Psi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]=\Pi _S^{\perp } {Op^{\textrm{BW}}}\Big ( \begin{matrix}{\texttt{K}}^{+}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V) &{} {\texttt{K}}^{-}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V) \\ {\texttt{K}}^{-}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V)&{} {\texttt{K}}^{+}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta } V) \end{matrix} \Big ) \Pi _S^{\perp }+ \texttt{R}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \end{aligned}$$
(5.49)

where the symbols \({\texttt{K}}^{\sigma }\), \(\sigma \in \{\pm \}\), are given in (5.42) and \(\texttt{R}^{\tau }\in \mathbf{\Sigma }_1^{-3}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\). The inverse map \(\Psi ^{-\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]\) admits an expansion similar to (5.49). Moreover

$$\begin{aligned} (\partial _{t}\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ])_{|\tau =1}:= {Op^{\textrm{BW}}}\big ( \widetilde{G}^{\perp }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\big )[\cdot ]+ \widetilde{\texttt{R}}^{\perp }_{\le 2}(\varepsilon \varphi )[\cdot ] +\widetilde{\texttt{R}}^{\perp }_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]\,, \end{aligned}$$
(5.50)

where \(\widetilde{G}^{\perp }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is a \(2\times 2\) matrix of real valued symbols, independent of \(\xi \in \mathbb {R}\), of the form

$$\begin{aligned} \widetilde{G}^{\perp }(\varepsilon \varphi ,\varepsilon ^{\beta }V):= \widetilde{G}^{\perp }_{1}(\varepsilon \varphi )+\widetilde{G}^{\perp }_2(\tau ;\varepsilon \varphi ,\varepsilon \varphi ) +\widetilde{G}^{\perp }_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \end{aligned}$$
(5.51)

where \(\widetilde{G}^{\perp }_j\in \textbf{SM}_j^{0}\otimes \mathcal {M}_2(\mathbb {C})\), \(j=1,2\) and where \(\widetilde{G}^{\perp }_{\ge 3}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is in \(\mathcal {N}_{p}^{0}\) with

$$\begin{aligned} |\widetilde{G}^{\perp }_{\ge 3}|_{\mathcal {N}_{p}^{0}}\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }} +\varepsilon ^5\,, \end{aligned}$$
(5.52)

for any \(p+s_0+3\le s\), \(p\in \mathbb {N}\), and some \(\mu >1/2\), with estimates uniform in \(\tau \in [0,1]\). The operator \(\widetilde{\texttt{R}}^{\perp }_{\le 2}\) has the form

$$\begin{aligned} \widetilde{\texttt{R}}^{\perp }_{\le 2}(\tau ;\varepsilon \varphi ):=\widetilde{\texttt{R}}^{\perp }_{1}(\tau ;\varepsilon \varphi )+ \widetilde{\texttt{R}}^{\perp }_{ 2}(\tau ;\varepsilon \varphi ),\qquad \widetilde{\texttt{R}}^{\perp }_{j}\in \textbf{M}^{-3}_{j}\otimes \mathcal {M}_2(\mathbb {C}),\;\;\;j=1,2, \end{aligned}$$

and \(\widetilde{\texttt{R}}^{\perp }_{\ge 3}\) satisfies

$$\begin{aligned} \Vert \widetilde{\texttt{R}}^{\perp }_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)Z\Vert _{H^{s+3}} \lesssim _{s} \Vert Z\Vert _{H^{s}} \big ( \Vert \varepsilon \varphi \Vert ^{3}_{H^{s}}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{s}} +\varepsilon ^5\big )\,. \end{aligned}$$
(5.53)

Finally one has that \(U^{\perp } \mapsto \Psi ^{\tau }_m(\varepsilon \varphi , \varepsilon ^{\beta }V) [U^{\perp }]\) is symplectic with respect to the restricted symplectic form \( \omega \big (\Pi _S^{\perp } (\cdot ), \Pi _S^{\perp } (\cdot ) \big ) \) (recall \(\omega \) in (2.14)).

Proof

To simplify the notation during this proof we denote by X the operator \({Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi , \varepsilon ^{\beta }V))\). Recalling (5.30) and Lemma 5.4 we shall write

$$\begin{aligned}{} & {} X=X_{\le 2}+X_{\ge 3},\\{} & {} \nonumber \begin{aligned} X_{\le 2}=X_{\le 2}(\varepsilon \varphi )&:= {Op^{\textrm{BW}}}\begin{pmatrix} 0 &{} \texttt{m}_{\le 2}(\varepsilon \varphi )\\ \texttt{m}_{\le 2}(\varepsilon \varphi ) &{} 0 \end{pmatrix},&\qquad \\ X_{\ge 3}=X_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)&:= {Op^{\textrm{BW}}}\begin{pmatrix} 0 &{} \texttt{m}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)\\ \texttt{m}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V) &{} 0 \end{pmatrix}, \end{aligned} \end{aligned}$$
(5.54)

where \(\texttt{m}_{\le 2}=\texttt{m}_1+\texttt{m}_2\), \(\texttt{m}_j\in \textbf{SM}_{j}^{0}\), \(j=1,2\) and \(\texttt{m}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)\in \textbf{SNH}_{3}^{0}\). To lighten the notation in the following we omit the dependence of the maps on \((\varepsilon \varphi ,\varepsilon ^{\beta }V)\). We look for a one-parameter group of bounded linear transformations \(\Upsilon ^{\tau }:H^s\rightarrow H^s\) such that

$$\begin{aligned} \Psi ^{\tau }=\Pi _S^{\perp } \,\Phi _{\texttt{m}}^{\tau }\circ \Upsilon ^{\tau }\, \Pi _S^{\perp }\,. \end{aligned}$$
(5.55)

We differentiate both sides with respect to the parameter \(\tau \) using (5.31), (5.47). One obtains

$$\begin{aligned} \Pi _S^{\perp } X \Pi _S^{\perp } \Psi ^{\tau } = \Pi _S^{\perp } X \Phi _{\texttt{m}}^{\tau } \Upsilon ^{\tau } \Pi _S^{\perp } +\Pi _S^{\perp } \Phi _{\texttt{m}}^{\tau } (\partial _{\tau }\Upsilon ^{\tau }) \Pi _S^{\perp }\,, \end{aligned}$$

which is equivalent to

$$\begin{aligned} -\Pi _S^{\perp } X \Pi _S \Phi _{\texttt{m}}^{\tau } \Upsilon ^{\tau } \Pi _S^{\perp }= \Pi _S^{\perp } \Phi _{\texttt{m}}^{\tau } (\partial _{\tau }\Upsilon ^{\tau }) \Pi _S^{\perp }, \end{aligned}$$

by noticing that

$$\begin{aligned} \Pi _S^{\perp } X \Phi _{\texttt{m}}^{\tau } \Upsilon ^{\tau } \Pi _S^{\perp } = \Pi _S^{\perp } X \Phi _{\texttt{m}}^{\tau } \Pi _S^{\perp } \Upsilon ^{\tau } \Pi _S^{\perp } +\Pi _S^{\perp } X \Pi _S \Phi _{\texttt{m}}^{\tau } \Upsilon ^{\tau } \Pi _S^{\perp }. \end{aligned}$$

Then (5.55) is satisfied by the solution of the following Cauchy problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _{\tau } \Upsilon ^{\tau }=Y(\tau )(\Upsilon ^{\tau })\,, \qquad Y(\tau )=Y:=-\Phi _{\texttt{m}}^{-\tau } X \Pi _S \Phi _{\texttt{m}}^{\tau }\,,\\ \Upsilon ^0=\mathbb {1}\,. \end{array}\right. } \end{aligned}$$
(5.56)

We observe that \(X\Pi _S\) is a finite rank operator. In particular

$$\begin{aligned} X_{\le 2}\Pi _S(Z)=\sum _{j\in S} Z_j \varsigma _j, \qquad X_{\ge 3}\Pi _S(Z)=\sum _{j\in S} Z_j \varrho _j, \end{aligned}$$

with

$$\begin{aligned} \varsigma _j:=X_{\le 2}(\texttt{e}_j), \qquad \varrho _j:=X_{\ge 3}(\texttt{e}_j) , \qquad \texttt{e}_j:= \begin{pmatrix} e^{\textrm{i} j x}\\ e^{-\textrm{i} j x} \end{pmatrix}. \end{aligned}$$

Recall that the operator \(\texttt{Q}\) in (5.32) is in \(\mathbf{\Sigma }_1^{0}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) and so can be expanded as

$$\begin{aligned} \begin{aligned}&\texttt{Q}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V):=\texttt{Q}_{\le 2}(\tau ;\varepsilon \varphi )+ \texttt{Q}_{\ge 3}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V), \\ {}&\texttt{Q}_{\le 2}(\tau ;\varepsilon \varphi ):=\texttt{Q}_{1}(\tau ;\varepsilon \varphi )+ \texttt{Q}_{ 2}(\tau ;\varepsilon \varphi ) \qquad \texttt{Q}_{j}\in \textbf{M}^{0}_{j}\otimes \mathcal {M}_2(\mathbb {C}),\;\;\;j=1,2, \end{aligned} \end{aligned}$$

where the operator \(\texttt{Q}_{\ge 3}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V)\) satisfies, for \(s\ge s_0\gg 1\),

$$\begin{aligned} \Vert \texttt{Q}_{\ge 3}(\tau ;\varepsilon \varphi ,\varepsilon ^{\beta }V)Z\Vert _{H^{s}} \lesssim _{s} \Vert Z\Vert _{H^{s}} \big (\Vert \varepsilon \varphi \Vert ^{3}_{H^{s}}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\big )\,, \end{aligned}$$
(5.57)

uniformly in \(\tau \in [0,1]\). In turn the vector field \(Y\) is finite rank, since \(\Phi _{\texttt{m}}^{\tau }\) is a linear operator, and \( Y=-X\Pi _S+Y_{\le 2}+Y_{\ge 3} \) where, using (5.32) and recalling (2.48),

$$\begin{aligned} Y_{\le 2}(\tau )[Z]:=&-\sum _{j\in S} (Z, \texttt{e}_j )_{L^2} \texttt{Q}_{\le 2}(-\tau ; \varepsilon \varphi )[\varsigma _j] -\sum _{j\in S} ( \texttt{Q}_{\le 2}(\tau ; \varepsilon \varphi ) [Z], \texttt{e}_j )_{L^2} \varsigma _j \\ {}&-\sum _{j\in S} ( \texttt{Q}_{\le 2}(\tau ; \varepsilon \varphi )[Z], \texttt{e}_j )_{L^2} \texttt{Q}_{\le 2}(-\tau ; \varepsilon \varphi )[\varsigma _j] -\sum _{j\in S} ( Z, \texttt{e}_j )_{L^2} \texttt{Q}_{\le 2}(-\tau ; \varepsilon \varphi )[\varrho _j] \\ {}&-\sum _{j\in S} ( \texttt{Q}_{\le 2}(\tau ; \varepsilon \varphi ) [Z], \texttt{e}_j )_{L^2} \varrho _j -\sum _{j\in S} ( \texttt{Q}_{\le 2}(\tau ; \varepsilon \varphi )[Z], \texttt{e}_j )_{L^2} \texttt{Q}_{\le 2}(-\tau ; \varepsilon \varphi )[\varrho _j]\,, \end{aligned}$$

and \(Y_{\ge 3}\) can be written explicitly as done for \(Y_{\le 2}\), but it either depends on \(V\) or is at least cubic in \(\varepsilon \varphi \). We recall that by Lemma 5.6 we have \(\texttt{Q}_1(\tau )\in \mathbf{\Sigma }_{1}^{0}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) (recall Definition 2.3-(iii)). Then, by the structure of \(Y_{\le 2}\), it is evident that it has an expansion in terms linear and quadratic in \(\varepsilon \varphi \). Moreover every term in \(Y_{\le 2}\) is an infinitely regularizing operator. Hence \(Y_{\le 2}(\tau )\in \mathbf{\Sigma }_{1}^{-\rho }[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) for all \(\rho >0\).

By (5.57), the fact that \(\texttt{Q}_1(\tau )\in \mathbf {\Sigma }^0_1[r, 3]\otimes \mathcal {M}_2(\mathbb C)\) and that \(Y_{\ge 3}\) is finite rank, we have

$$\begin{aligned} \Vert Y_{\ge 3}(\tau )[Z] \Vert _{H^{s+\rho }}\lesssim _s \Vert Z\Vert _{H^{s}} \big ( \Vert \varepsilon \varphi \Vert ^{3}_{H^{s}}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\big ). \end{aligned}$$

By the classical theory of ODEs in Banach spaces the Cauchy problem (5.56) is well-posed on \(H^s(\mathbb T)\). In particular \(\Upsilon ^{\tau }\) is invertible and bounded on any \(H^s(\mathbb T)\) for \(\tau \in [0, 1]\), namely

$$\begin{aligned} \Vert (\Upsilon ^{\tau })^{\pm 1} (Z) \Vert _{H^s}\lesssim \Vert Z \Vert _{H^s}\,, \qquad \tau \in [0, 1]\,. \end{aligned}$$
(5.58)
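In a nutshell, a minimal sketch of the standard argument behind (5.58): since \(Y(\tau )=-\Phi _{\texttt{m}}^{-\tau } X \Pi _S \Phi _{\texttt{m}}^{\tau }\) is bounded on \(H^s\) uniformly in \(\tau \in [0,1]\), the integral (Picard) formulation of (5.56) and Gronwall's inequality give

$$\begin{aligned} \Vert \Upsilon ^{\tau } Z\Vert _{H^s}\le \Vert Z\Vert _{H^s}+\int _0^{\tau }\Vert Y(l)\Vert _{\mathcal {L}(H^s)}\,\Vert \Upsilon ^{l} Z\Vert _{H^s}\,dl \quad \Longrightarrow \quad \Vert \Upsilon ^{\tau } Z\Vert _{H^s}\le e^{\int _0^{\tau }\Vert Y(l)\Vert _{\mathcal {L}(H^s)}\,dl}\,\Vert Z\Vert _{H^s}\,, \end{aligned}$$

where \(\Vert \cdot \Vert _{\mathcal {L}(H^s)}\) denotes the operator norm on \(H^s\); the same argument applied to the backward flow gives the bound on \((\Upsilon ^{\tau })^{-1}\).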

By Taylor expanding \(\Upsilon ^{\tau }\) at \(\tau =0\) we have

$$\begin{aligned} \Upsilon ^{\tau }[\cdot ]-\mathbb {1}[\cdot ]=- \tau \,Y(0)[\cdot ] +\int _0^{\tau } (1-l)\,\partial ^2_{\tau } \Upsilon ^l [\cdot ]\,dl =:\Theta ^{\tau }. \end{aligned}$$

We claim that \(\Theta ^{\tau }\) is in \(\mathbf{\Sigma }_1^{-\rho }[r,3]\otimes \mathcal {M}_2(\mathbb {C})\). We observe that

$$\begin{aligned} \partial ^2_{\tau } \Upsilon ^l=Z(l)\circ \Upsilon ^{l}, \qquad Z(l):=\partial _{\tau } Y(l)+Y^2(l), \end{aligned}$$

and

$$\begin{aligned} \partial _{\tau } Y=&-\sum _{j\in S}( X(\Phi _{\texttt{m}}^{\tau }(Z)), e^{\textrm{i} j x} )_{L^2}\, \Phi _{\texttt{m}}^{-\tau }(\varsigma _j) +\sum _{j\in S} ( \Phi _{\texttt{m}}^{\tau }(Z), e^{\textrm{i} j x} )_{L^2}\, X(\Phi _{\texttt{m}}^{-\tau }(\varsigma _j)) \\ {}&-\sum _{j\in S} ( X(\Phi _{\texttt{m}}^{\tau }(Z)), e^{\textrm{i} j x} )_{L^2}\, \Phi _{\texttt{m}}^{-\tau }(\varrho _j) +\sum _{j\in S} ( \Phi _{\texttt{m}}^{\tau }(Z), e^{\textrm{i} j x} )_{L^2}\, X(\Phi _{\texttt{m}}^{-\tau }(\varrho _j))\,, \\ Y^2=&Y_{\le 2}^2+Y_{\ge 3}^2+Y_{\le 2} Y_{\ge 3}+Y_{\ge 3}Y_{\le 2}\,. \end{aligned}$$

Recalling (5.54), (5.32), we consider the splitting \(X\circ \Phi _{\texttt{m}}^{\tau }=\mathfrak {X}_{\le 2}(\tau ) +\mathfrak {X}_{\ge 3}(\tau )\) where

$$\begin{aligned} \mathfrak {X}_{\le 2}(\tau ):=X_{\le 2}+X_{\le 2} \texttt{Q}_{\le 2}(\tau ), \qquad \mathfrak {X}_{\ge 3}(\tau ):=X_{\ge 3}+X_{\le 2}\texttt{Q}_{\ge 3}(\tau ) +X_{\ge 3} \texttt{Q}_{\le 2}(\tau )+X_{\ge 3}\texttt{Q}_{\ge 3}(\tau ). \end{aligned}$$

We write \(Z(\tau )=Z_{\le 2}(\tau )+Z_{\ge 3}(\tau )\), where \(Z_{\le 2}=Z_{\le 2}(\tau )\) is obtained by the sum of

  • terms of \(\partial _{\tau } Y\) where \(X\circ \Phi _{\texttt{m}}^{\tau }\) is replaced by \(\mathfrak {X}_{\le 2}(\tau )\) and \(\Phi _{\texttt{m}}^{\tau }\) by \(\mathbb {1}+\texttt{Q}_{\le 2}(\tau )\);

  • the term \(Y_{\le 2}^2\),

while \(Z_{\ge 3}=Z_{\ge 3}(\tau )\) is defined by difference. We notice that \(Z_{\ge 3}\) is a sum of terms depending on \(V\) (plus terms of higher homogeneity in \(\varepsilon \varphi \)), while \(Z_{\le 2}\) is independent of \(V\). Moreover, reasoning as for \(Y_{\le 2}\) and \(Y_{\ge 3}\), we have that, for all \(\rho \ge 0\), \(Z_{\le 2}\in \mathbf {\Sigma }^{-\rho }_1[r, 3]\otimes \mathcal {M}_2(\mathbb C)\) and \(Z_{\ge 3}\) satisfies

$$\begin{aligned} \Vert Z_{\ge 3}(\tau )[Z] \Vert _{H^{s+\rho }}\lesssim _s \Vert Z\Vert _{H^{s}} \big ( \Vert \varepsilon \varphi \Vert ^{3}_{H^{s}}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\big ). \end{aligned}$$

We set

$$\begin{aligned} \begin{aligned} \widetilde{\Theta }_1^{\tau }&:=-\tau \,X_{\le 2}\Pi _S +\tau \,Y_{\le 2}(0) +\int _0^{\tau } Z_{\le 2}(l)\circ \Upsilon ^l\,dl, \\ \widetilde{\Theta }_2^{\tau }&:=-\tau \,X_{\ge 3}\Pi _S +\tau \,Y_{\ge 3}(0)+\int _0^{\tau } Z_{\ge 3}(l)\circ \Upsilon ^l\,dl. \end{aligned} \end{aligned}$$

By the discussion on \(\Upsilon ^{\tau }\), the bound (5.58) and Lemma 5.4-(ii) we can conclude that \(\widetilde{\Theta }^{\tau }_1+ \widetilde{\Theta }^{\tau }_2\) belongs to the class \(\mathbf {\Sigma }^{-\rho }_1[r, 3]\otimes \mathcal {M}_2(\mathbb C)\). Therefore formula (5.48) is proved. By (5.48), the expansion of the map \(\Theta ^{\tau }\), (5.42) and Lemmata 2.8, 2.10 we deduce the expansion (5.49).

To prove (5.50) we proceed exactly as in the proof of (5.35) in Lemma 5.6. We have

$$\begin{aligned} \begin{aligned} \partial _{\tau }(\partial _{t}\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ]&=\Pi _S^{\perp } {Op^{\textrm{BW}}}(\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V))\Pi _S^{\perp } (\partial _{t}\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ] \\ {}&\qquad \qquad \qquad +\Pi _S^{\perp }{Op^{\textrm{BW}}}(\partial _{t}\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)) \Pi _S^{\perp } \Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ] \end{aligned} \end{aligned}$$

with \((\partial _{t}\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ]_{|\tau =0}=0\). By the Duhamel formulation we deduce that

$$\begin{aligned} (\partial _{t}\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V))[\cdot ] =\int _{0}^{\tau } \Psi _{\texttt{m}}^{\tau -l}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\, \Pi _S^{\perp }{Op^{\textrm{BW}}}(\partial _{t}\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)) \Pi _S^{\perp }\, \Psi _{\texttt{m}}^{l}(\varepsilon \varphi ,\varepsilon ^{\beta }V)[\cdot ]\,dl\,. \end{aligned}$$

The term \(\Pi _S^{\perp } {Op^{\textrm{BW}}}(\partial _{t}\textbf{M}(\varepsilon \varphi ,\varepsilon ^{\beta }V)) \Pi _S^{\perp }\) in the integral can be controlled for any \(t\in [0,T]\) thanks to item (iii) of Lemma 5.4 and recalling (5.30). Therefore, the expansion (5.50) and estimate (5.53) follow using the expansion (5.49) for the flow \(\Psi _{\texttt{m}}^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\), (5.23), the properties of \(\texttt{m}\) in Lemma 5.4 and Lemmata 2.2, 2.8, 2.10.

Corollary 5.10

Consider the flow map \(\Psi ^{\tau }_{\texttt{m}}(\varepsilon \varphi , \varepsilon ^{\beta }V)\) given in Lemma 5.9. Then, for any \(s\ge s_0\) with \(s_0\gg 1\) one has that

$$\begin{aligned} \sup _{\tau \in [0,1]} \Vert (\Psi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V) )^{\pm 1}Z\Vert _{H^{s}} \le \Vert Z\Vert _{H^{s}}(1+C_{s}\Vert \varepsilon \varphi \Vert _{H^{s}} +C_{s}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}) \end{aligned}$$
(5.59)

for some \(C_s>0\) depending on s, for any \(Z\in H^s\) such that \(\Pi _{S}^{\perp }Z=Z\). Define the function

$$\begin{aligned} W^{\perp }:={\bigl [{\begin{matrix}w^{\perp }\\ \overline{w}^{\perp }\end{matrix}}\bigr ]}:= \Psi _{\texttt{m}}(\varepsilon \varphi , \varepsilon ^{\beta }V) [U^{\perp }]\,, \end{aligned}$$
(5.60)

where \(U^{\perp }\) is in (5.3) and \(\Psi _{\texttt{m}}(\varepsilon \varphi , \varepsilon ^{\beta }V):=\big (\Psi ^{\tau }_{\texttt{m}}(\varepsilon \varphi , \varepsilon ^{\beta }V)\big )_{|_{\tau =1}}\). One has the equivalence

$$\begin{aligned} (1+\varepsilon C_s)^{-1} \Vert \varepsilon ^{\beta }V\Vert _{H^{s}} \le \Vert W^{\perp }\Vert _{H^{s}} \le (1+\varepsilon C_{s}) \Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\,. \end{aligned}$$
(5.61)

Proof

Let us check the bound (5.59) on \(\Psi ^{\tau }_{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\). First of all, using (2.35) and Lemma 2.4 to estimate \(\Theta ^{\tau }(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) (which belongs to \(\mathbf{\Sigma }_1^{-\rho }[r,3]\otimes \mathcal {M}_2(\mathbb {C})\) for any \(\rho \ge 0\)) one can prove that (recall also the assumptions (4.4), (4.11) and (4.5))

$$\begin{aligned} \Vert \Theta ^{\tau }(\varepsilon \varphi , \varepsilon ^{\beta }V) Z \Vert _{H^{s+3}} \lesssim _{s} \Vert Z\Vert _{H^{s}}(\Vert \varepsilon \varphi \Vert _{H^{s}} +\Vert \varepsilon ^{\beta }V\Vert _{H^{s}})\,, \end{aligned}$$
(5.62)

for any \(Z\in H^s\). The bound (5.59) follows by formula (5.48), estimates (5.43) and (5.62). Using a Neumann series, estimate (5.62) and the smallness (4.5) one can prove that \((\mathbb {1}+ \Theta ^{\tau })^{-1}-\mathbb {1}\) satisfies an estimate like (5.62). Then the estimate (5.59) on the inverse map follows using (5.43). The equivalence (5.61) follows by (5.59) using also (4.4), (4.11) and the smallness (4.5).
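For instance, a minimal sketch of the Neumann series step: assuming, thanks to (5.62), (4.4), (4.11) and (4.5), that \(\Vert \Theta ^{\tau } Z\Vert _{H^{s}}\le C\varepsilon \Vert Z\Vert _{H^{s}}\) with \(C\varepsilon <1\), one writes

$$\begin{aligned} (\mathbb {1}+ \Theta ^{\tau })^{-1}-\mathbb {1}=\sum _{n\ge 1}(-\Theta ^{\tau })^{n}\,,\qquad \Vert \big ((\mathbb {1}+ \Theta ^{\tau })^{-1}-\mathbb {1}\big )Z\Vert _{H^{s}}\le \sum _{n\ge 1}(C\varepsilon )^{n}\Vert Z\Vert _{H^{s}}=\frac{C\varepsilon }{1-C\varepsilon }\,\Vert Z\Vert _{H^{s}}\,; \end{aligned}$$

writing moreover \((\mathbb {1}+ \Theta ^{\tau })^{-1}-\mathbb {1}=-\Theta ^{\tau }(\mathbb {1}+ \Theta ^{\tau })^{-1}\) and applying (5.62) to the outer factor \(\Theta ^{\tau }\), one recovers a smoothing estimate of the same form as (5.62).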

5.1.3 Conjugation and First Energy Estimate

We are now in position to state our conjugation result.

Proposition 5.11

Under the assumptions of Theorem 5.1 the following holds. Let \(U^{\perp }\) in (5.3) be a solution of (5.4); then the function \(W^{\perp }\) in (5.60) satisfies

$$\begin{aligned} \begin{aligned} \partial _{t}W^{\perp }&= \textrm{i} E\Pi _S^{\perp } {Op^{\textrm{BW}}}\left( {\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} \lambda (\varepsilon \varphi ,\varepsilon ^{\beta }V;x)\Lambda (\xi ) +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]} a^{(0)}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x, \xi )\right) W^{\perp } \\ {}&+\textrm{i} E\Pi _{S}^{\perp }\mathfrak {R}_{\le 2}(\varepsilon \varphi ) W^{\perp } +\Pi _{S}^{\perp }\mathfrak {R}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V) +\Pi _S^{\perp } \Psi _{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V) \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned} \end{aligned}$$
(5.63)

where the symbol \(\lambda (\varepsilon \varphi ,\varepsilon ^{\beta }V)\) is in (5.11), the symbol \(a^{(0)}\) is real and belongs to \(\mathbf{S\Sigma }_1^{0}[r,3]\) with estimates uniform in \(t\in [0,T]\). Moreover the following holds.

(i) The remainder \(\mathfrak {R}_{\le 2}(\varepsilon \varphi )\) is real (see Def. 2.13) and has the form

$$\begin{aligned} \mathfrak {R}_{\le 2}(\varepsilon \varphi )=\mathfrak {R}_{1}(\varepsilon \varphi ) +\mathfrak {R}_{2}(\varepsilon \varphi ,\varepsilon \varphi )\,, \qquad \mathfrak {R}_{j}\in \textbf{M}_{j}^{-2}\otimes \mathcal {M}_2(\mathbb {C})\,,\;\;\; j=1,2\,. \end{aligned}$$
(5.64)

(ii) The remainder \(\mathfrak {R}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)\) has the form

$$\begin{aligned} \mathfrak {R}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)= \big (\mathfrak {R}^{+}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V), \overline{\mathfrak {R}^{+}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V)}\big )^{T}\, \end{aligned}$$
(5.65)

and it satisfies the bound

$$\begin{aligned} \begin{aligned} \Vert \mathfrak {R}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\Vert _{{H}^{s}}&\lesssim _s \Vert \varepsilon \varphi \Vert _{H^{s}}^{3}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}} + \, \Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{2} +\varepsilon ^5\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\,. \end{aligned} \end{aligned}$$
(5.66)

(iii) The vector field

$$\begin{aligned} \begin{aligned} \Pi _{S}^{\perp }\textrm{i}&E{Op^{\textrm{BW}}}\Big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} \lambda (\varepsilon \varphi ,\varepsilon ^{\beta }V;x)\Lambda (\xi ) +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]} a^{(0)}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x, \xi )\Big )_{|V\equiv 0}\Pi _{S}^{\perp }\\&+ \textrm{i} E\Pi _{S}^{\perp }\mathfrak {R}_{\le 2}(\varepsilon \varphi )\Pi _{S}^{\perp } \end{aligned} \end{aligned}$$

is Hamiltonian according to Def. 2.14.

Proof

By (5.4) and (5.60) we have

(5.67)
(5.68)
(5.69)
(5.70)
(5.71)
(5.72)
(5.73)

By Lemma 5.2 and Proposition 4.3 we have \(\mathfrak {Q}(\varepsilon \varphi )\in \mathbf {\Sigma }_1^{-2}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\). Recall the expansion (5.49). By Lemmata 2.8, 2.10 we deduce that the remainder (5.73) can be absorbed into a remainder \(\mathfrak {R}_{\le 2}(\varepsilon \varphi )\) satisfying the conditions in item (i), up to a term satisfying (5.66). Similarly, using Lemmata 5.9, 5.2, one can check that the remainders in (5.69) can be absorbed into the remainder \(\mathfrak {R}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) satisfying the conditions in item (ii). Consider now the terms in (5.71)–(5.72). Using formula (5.48), the estimates on the remainder \(\Theta ^{\tau }\in \mathbf{\Sigma }_1^{-\rho }[r,3]\otimes \mathcal {M}_2(\mathbb {C})\), for any \(\rho \ge 0\), and the composition properties in Lemmata 2.8, 2.10, one deduces that

$$\begin{aligned}&(5.71)+(5.72)\\&\quad = \Pi _{S}^{\perp }\Phi _{\texttt{m}}(\varepsilon \varphi , V) [\textrm{i} E{Op^{\textrm{BW}}}\Big (\big ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} +{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{f}(\varepsilon \varphi , V; x)\big )\Lambda (\xi )\Big )] (\Phi _{\texttt{m}}(\varepsilon \varphi , V))^{-1} W^{\perp } \\ {}&\qquad + \Pi _{S}^{\perp }\Phi _{\texttt{m}}(\varepsilon \varphi , V) [\textrm{i} E{Op^{\textrm{BW}}}\Big ({\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]}\texttt{g}(\varepsilon \varphi , V; x)\Lambda ^{-1}(\xi )\Big )] (\Phi _{\texttt{m}}(\varepsilon \varphi , V))^{-1} W^{\perp } \end{aligned}$$

up to remainders which satisfy the properties of \(\mathfrak {R}_{\le 2}, \mathfrak {R}_{\ge 3}\) in items (i), (ii). Using the expansions (5.33)–(5.34), by Lemma 2.2, Remark 2.11 and (5.14) in Lemma 5.3 one gets

$$\begin{aligned} (5.71)+(5.72)= \Pi _{S}^{\perp } \textrm{i} E{Op^{\textrm{BW}}}\left( ({\bigl [{\begin{matrix}1&{}0\\ 0&{}1\end{matrix}}\bigr ]} (\lambda (\varepsilon \varphi ,V)))\Lambda (\xi )+{\bigl [{\begin{matrix}1&{}1\\ 1&{}1\end{matrix}}\bigr ]} \widetilde{a}^{(0)}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x, \xi )\right) W^{\perp } \end{aligned}$$

for some \(\widetilde{a}^{(0)}\in \mathbf{S\Sigma }_1^{0}[r,3]\), up to remainders that can be absorbed in terms \(\mathfrak {R}_{\le 2}, \mathfrak {R}_{\ge 3}\) satisfying items (i), (ii) and where \(\lambda (\varepsilon \varphi ,V)\) is in Lemma 5.3.

Consider the term (5.68). By Lemma 5.9 we have that \(\widetilde{\texttt{R}}^{\perp }_{\le 2}\in \mathbf {\Sigma }^{-3}_{1}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\); hence the term in (5.68) can be absorbed into terms \(\mathfrak {R}_{\le 2}, \mathfrak {R}_{\ge 3}\) satisfying items (i), (ii). Finally, consider the term in (5.67). Recalling (5.51), the expansion (5.49) and Lemmata 2.2, 2.10 we deduce that (5.67) is the sum of a pseudo-differential operator of order zero (with symbols in \(\mathbf{S\Sigma }_1^{0}[r,3]\)) plus remainders of the form \(\mathfrak {R}_{\le 2},\mathfrak {R}_{\ge 3}\) as in items (i)–(ii). Item (iii) follows recalling item (ii) of Lemma 5.2 and the fact that the map \(U^{\perp } \mapsto \Psi ^{\tau }_m(\varepsilon \varphi , \varepsilon ^{\beta }V) [U^{\perp }]\) is symplectic (see Lemma 5.9). This concludes the proof.

Recalling Proposition 5.11, we have the following expansions of the symbols on the r.h.s. of (5.63):

$$\begin{aligned} \lambda (\varepsilon \varphi ,\varepsilon ^{\beta }V)&=1+\lambda _{\le 2}(\varepsilon \varphi ) +\lambda _{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \end{aligned}$$
(5.74)
$$\begin{aligned} a^{(0)}(\varepsilon \varphi ,\varepsilon ^{\beta }V)&= a^{(0)}_{\le 2}(\varepsilon \varphi ) +a^{(0)}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\,, \end{aligned}$$
(5.75)

where

$$\begin{aligned}{} & {} \lambda _{\le 2}(\varepsilon \varphi )=\lambda _1(\varepsilon \varphi )+\lambda _2(\varepsilon \varphi ,\varepsilon \varphi ), \qquad \lambda _{j}\in \textbf{S M}_j^{0},\;\; j=1,2, \end{aligned}$$
(5.76)
$$\begin{aligned}{} & {} a_{\le 2}^{(0)}(\varepsilon \varphi )=a^{(0)}_{1}(\varepsilon \varphi ) +a_{2}^{(0)}(\varepsilon \varphi ,\varepsilon \varphi ), \qquad a^{(0)}_{j}\in \textbf{SM}_{j}^{0}, \quad j=1,2, \end{aligned}$$
(5.77)

and \(\lambda _{\ge 3}, a^{(0)}_{\ge 3}\) are in \(\mathcal {N}_{p}^{0}\) with

$$\begin{aligned}{} & {} |\lambda _{\ge 3}|_{\mathcal {N}_{p}^{0}}\lesssim _{s} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }}, \end{aligned}$$
(5.78)
$$\begin{aligned}{} & {} |a^{(0)}_{\ge 3}|_{\mathcal {N}_{p}^{0}}\lesssim _{p} \Vert \varepsilon \varphi \Vert ^{3}_{H^{p+\mu }}+ \Vert \varepsilon ^{\beta }V\Vert _{H^{p+\mu }}, \end{aligned}$$
(5.79)

for any \(p+\mu \le s\), \(p\in \mathbb {N}\), and some \(\mu >1/2\).

We now expand the right hand side of (5.63) in degrees of homogeneity in \(\varepsilon \varphi \). We define

$$\begin{aligned} \begin{aligned} \mathcal {M}&:=\begin{pmatrix} \mathcal {M}^{+} &{} 0\\ 0 &{} \mathcal {M}^{+} \end{pmatrix}, \qquad \mathcal {M}^+:=\mathcal {M}^{+}_0+\mathcal {M}^{+}_1+\mathcal {M}^{+}_2,\\ \mathcal {Z}&:=\begin{pmatrix} \mathcal {Z}^{+} &{} \mathcal {Z}^{-}\\ \mathcal {Z}^{-} &{} \mathcal {Z}^{+} \end{pmatrix}\,, \qquad \mathcal {Z}^{+}:=\mathcal {Z}^{+}_1+\mathcal {Z}^{+}_2\,, \quad \mathcal {Z}^{-}:=\mathcal {Z}^{-}_1+\mathcal {Z}^{-}_2 \end{aligned} \end{aligned}$$
(5.80)

where

$$\begin{aligned} \begin{aligned} \mathcal {M}_{0}^{+}&:=\Pi _{S}^{\perp }{Op^{\textrm{W}}}(\Lambda (\xi ))\,, \qquad \mathcal {M}_{j}^{+}:=\Pi _{S}^{\perp }{Op^{\textrm{BW}}}\Big ( \lambda _{j}(\varepsilon \varphi ;x)\Lambda (\xi ) \Big )\,, \qquad j=1, 2\,,\\ \mathcal {Z}_{j}^{\pm }&:=\Pi _{S}^{\perp }{Op^{\textrm{BW}}}\Big ( a^{(0)}_{j}(\varepsilon \varphi ;x,\xi ) \Big ) + \Pi _{S}^{\perp }\mathfrak {R}_{j}^{\pm }(\varepsilon \varphi )\,, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \mathcal {M}_{>}&:=\begin{pmatrix} {\mathcal {M}}^{+}_{>} &{}0\\ 0 &{} \mathcal {M}^{+}_{>} \end{pmatrix}\,, \qquad {\mathcal {Z}}_{>}:=\begin{pmatrix} {\mathcal {Z}}^{+}_{>} &{} \mathcal {Z}^{-}_{>}\\ \mathcal {Z}^{-}_{>} &{} \mathcal {Z}^{+}_{>} \end{pmatrix}\,, \\ \mathcal {M}_{>}^+&:={Op^{\textrm{BW}}}\big ( \lambda _{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V;x,\xi )\Lambda (\xi )\big )\\ \mathcal {Z}^{+}_{>}&:=\Pi _{S}^{\perp } {Op^{\textrm{BW}}}\big ( a^{(0)}_{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x,\xi )\big )\,,\qquad \mathcal {Z}^{-}_{>}:= \Pi _{S}^{\perp }{Op^{\textrm{BW}}}\big (a^{(0)}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V;x,\xi )\big )\,, \end{aligned} \end{aligned}$$
(5.81)

We finally define the function \(\widetilde{\mathcal {Z}}:=\widetilde{\mathcal {Z}}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) as

$$\begin{aligned} \widetilde{\mathcal {Z}}:= \Pi _{S}^{\perp }\mathfrak {R}_{\ge 3}(\varepsilon \varphi , \varepsilon ^{\beta }V) +\Pi _S^{\perp } \Psi _{\texttt{m}}(\varepsilon \varphi ,\varepsilon ^{\beta }V) \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned}$$
(5.82)

where \(\mathfrak {R}_{\ge 3}\) is in (5.65). Using the notation above we rewrite the equation (5.63) as

$$\begin{aligned} \partial _{t}W^{\perp }=\textrm{i} E(\mathcal {M}+\mathcal {Z})W^{\perp } +\textrm{i} E (\mathcal {M}_{>}+\mathcal {Z}_{>}) W^{\perp } +\widetilde{\mathcal {Z}}\,. \end{aligned}$$
(5.83)

Remark 5.12

By item (iii) of Proposition 5.11 one deduces that the operators \(\mathcal {M},\mathcal {Z}\) and \(\mathcal {M}_{>}, \mathcal {Z}_{>}\) are self-adjoint according to Def. 2.14. This will be used to obtain the commutator structure in (5.86).

Consider the Fourier multiplier \(\langle D\rangle :={Op^{\textrm{W}}}(\langle \xi \rangle )\), set \(\mathcal {D}:={\bigl [{\begin{matrix}\langle D\rangle &{}0\\ 0&{}\langle D\rangle \end{matrix}}\bigr ]}\) and

$$\begin{aligned} \mathcal {N}_{s}(W^{\perp }):=\Vert W^{\perp }\Vert _{H^{s}}^{2}= \frac{1}{2}(\mathcal {D}^{2s} W^{\perp }, W^{\perp })_{L^2} \end{aligned}$$
(5.84)

(see (2.48) and (2.3) ). We have the following.

Lemma 5.13

(First energy estimates) Under the assumptions of Proposition 5.11 the following holds. One has that, for \(t\in [0,T]\)

$$\begin{aligned} \partial _{t}\mathcal {N}_{s}(W^{\perp })= \tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s}\mathcal {A}(\varepsilon \varphi ) W^{\perp }, W^{\perp })_{L^2} +\mathcal {B}(\varepsilon \varphi ,\varepsilon ^{\beta }V), \end{aligned}$$

where

\(\bullet \) the remainder \(\mathcal {B}(\varepsilon \varphi ,\varepsilon ^{\beta }V)\) satisfies

$$\begin{aligned} \begin{aligned} \sup _{t\in [0,T] }|\mathcal {B}(\varepsilon \varphi ,\varepsilon ^{\beta }V)|&\lesssim _{s} \sup _{t\in [0,T]}\Vert \varepsilon \varphi \Vert _{H^{s}}^{3} \sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{2} \\ {}&+ \sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{3} +\varepsilon ^5\sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\,; \end{aligned} \end{aligned}$$
(5.85)

\(\bullet \) the operator \(\mathcal {A}(\varepsilon \varphi )\) has the form

$$\begin{aligned} \mathcal {A}(\varepsilon \varphi ):={\bigl [{\begin{matrix}A^{+}&{}A^{-}\\ A^{-}&{}A^{+}\end{matrix}}\bigr ]} :=\mathcal {D}^{-2s} \big [\textrm{i} E (\mathcal {M}+\mathcal {Z})(\varepsilon \varphi ), \textrm{i} E \mathcal {D}^{2s}\big ]\,, \end{aligned}$$
(5.86)

and it is real, where \(\mathcal {Z}\) is in (5.80). In particular

$$\begin{aligned} \begin{aligned}&A^{\sigma }(\varepsilon \varphi )=A^{\sigma }_{1}(\varepsilon \varphi )+A^{\sigma }_{2}(\varepsilon \varphi ,\varepsilon \varphi )\,,\qquad \sigma \in \{\pm \}\,, \end{aligned} \end{aligned}$$
(5.87)

where (recall (5.80))

$$\begin{aligned} \begin{aligned} A_k^{+}(\varepsilon \varphi )&:= \langle D\rangle ^{-2s} \big [ \textrm{i} (\mathcal {M}_{k}^+(\varepsilon \varphi )+\mathcal {Z}_{k}^{+}(\varepsilon \varphi )), \textrm{i} \langle D\rangle ^{2s}\big ]\,, \quad k=1,2\,, \\ A_k^{-}(\varepsilon \varphi )&:=\langle D\rangle ^{-2s} \Big (\textrm{i} \mathcal {Z}_{k}^{-} (\varepsilon \varphi )\textrm{i} \langle D\rangle ^{2s}+ \textrm{i} \langle D\rangle ^{2s}\textrm{i} \mathcal {Z}_{k}^{-}(\varepsilon \varphi )\Big )\,, \quad k=1,2\,. \end{aligned} \end{aligned}$$
(5.88)

Moreover the following holds.

\(\bullet \) The operator \(A_{1}^{+}\) has the form

$$\begin{aligned} A_1^{+}(\varepsilon \varphi ):={Op^{\textrm{BW}}}(\texttt{a}_1(\varepsilon \varphi ;x,\xi ))+\texttt{A}_1(\varepsilon \varphi )\,,\qquad \texttt{a}_1\in \textbf{SM}_1^{0}\,,\quad \texttt{A}_1\in \textbf{M}_1^{-2}\,, \end{aligned}$$
(5.89)

and \((A_1^{+})^{\sigma }(j, 0, j)=0\) for any \(\sigma \in \{\pm \}\), \(j\in \mathbb {Z}\).

\(\bullet \) The operator \(A_2^{+}\) has the form

$$\begin{aligned} A_2^{+}(\varepsilon \varphi ,\varepsilon \varphi ):={Op^{\textrm{BW}}}(\texttt{a}_2(\varepsilon \varphi ;x,\xi )) +\texttt{A}_2(\varepsilon \varphi )\,, \qquad \texttt{a}_2\in \textbf{SM}_2^{0}\,, \quad \texttt{A}_2\in \textbf{M}_2^{-2}\,, \end{aligned}$$
(5.90)

and \((A_2^{+})^{\sigma , -\sigma }(j, p_1, p_2, j)=0\), for any \(\sigma \in \{\pm \}\), \(j,p_1,p_2\in \mathbb {Z}\).

\(\bullet \) The operators \(A^{-}_{k}\) belong to \(\textbf{M}_{k}^{0}\), \(k=1,2\).

Proof

By (5.83), Remark 5.12, and an explicit computation one gets

$$\begin{aligned} \partial _{t}\mathcal {N}_{s}(W^{\perp })&= \tfrac{1}{2}(\mathcal {D}^{2s} \partial _{t}W^{\perp }, W^{\perp })_{L^{2}}+ \tfrac{1}{2}(\mathcal {D}^{2s} W^{\perp }, \partial _{t}W^{\perp })_{L^{2}}\nonumber \\&=\tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s}\mathcal {A}(\varepsilon \varphi ) W^{\perp }, W^{\perp })_{L^2} \end{aligned}$$
(5.91)
$$\begin{aligned}&+\tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s} \Big (\langle D\rangle ^{-2s}\big [\textrm{i} \mathcal {M}^{+}_{>}, \textrm{i} \langle D\rangle ^{2s} \big ]\Big ) W^{\perp }, W^{\perp })_{L^2} \end{aligned}$$
(5.92)
$$\begin{aligned}&+\tfrac{1}{2}(\textrm{i} E\big [\textrm{i} E\mathcal {Z}_{>}, \textrm{i} E \mathcal {D}^{2s} \big ] W^{\perp }, W^{\perp })_{L^2} \end{aligned}$$
(5.93)
$$\begin{aligned}&+\tfrac{1}{2}( \mathcal {D}^{2s} \widetilde{\mathcal {Z}}, W^{\perp } )_{L^2} +(\mathcal {D}^{2s} W^{\perp }, \widetilde{\mathcal {Z}} )_{L^2}\,, \end{aligned}$$
(5.94)

where \(\mathcal {A}(\varepsilon \varphi )\) is in (5.86) and where we used the fact that

$$\begin{aligned} ( [\textrm{i} E \mathcal {M}^{+}_0, \textrm{i} E\mathcal {D}^{2s}] W^{\perp }, W^{\perp } )_{L^2}=0. \end{aligned}$$

which holds since \(\mathcal {M}^{+}_{0}\) and \(\mathcal {D}^{2s}\) are both Fourier multipliers and hence commute. The terms in (5.94) satisfy the bound (5.85), by (5.82), the Cauchy–Schwarz inequality, estimates (5.66) and (5.59), the equivalence (5.61), the bounds (4.4), (4.11), the smallness (4.5) and Lemma 4.1 to estimate the residual.

By (5.81) \(\mathcal {Z}_{>}^{\pm }\) are bounded para-differential operators with symbols in \(\textbf{SNH}_{3}^{0}[r]\). Then, by Lemma 2.2, we have that the operator \( \mathcal {D}^{-2s} \big [\textrm{i} E\mathcal {Z}_{>}, \textrm{i} E \mathcal {D}^{2s} \big ] \) is bounded. Therefore one can check that the term in (5.93) satisfies the bound (5.85).

Consider the term (5.92). By Lemmata 2.2, 2.1, and recalling the definition of \(\mathcal {M}_{>}^{+}\) in (5.81), we deduce that the operator

$$\begin{aligned} \langle D\rangle ^{-2s}\big [\langle D\rangle ^{2s}, {Op^{\textrm{BW}}}(\lambda _{\ge 3}(\varepsilon \varphi ,\varepsilon ^{\beta }V;x,\xi )\Lambda (\xi ))\big ] \end{aligned}$$

is bounded and

$$\begin{aligned} \begin{aligned} \Vert \langle D\rangle ^{-2s}\big [\langle D\rangle ^{2s},&{Op^{\textrm{W}}}((\lambda _{\ge 3})_{\chi }(\varepsilon \varphi ,\varepsilon ^{\beta }V;x,\xi )\Lambda (\xi ))\big ] h\Vert _{H^{s}} \\ {}&\lesssim _{s}\Vert h\Vert _{H^{s}} \big (\sup _{t\in [0,T]}\Vert \varepsilon \varphi \Vert ^{3}_{H^{s}}+ \sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\Big ), \end{aligned} \end{aligned}$$

where we used (5.78). This implies that the term (5.92) satisfies (5.85). Consider the term (5.91), whose components are given in (5.86)–(5.88). Since \(\mathcal {M}+\mathcal {Z}\) in (5.80) is a para-differential operator plus a smoothing remainder and the operator \(\mathcal {M}\) is diagonal, one can check, using Lemmata 2.2, 2.1 and the commutator structure in \(A_{k}^{+}\), that \(A_{k}^{-}\) is in \(\textbf{M}_k^{0}\) and \(A^{+}_{k}\) admits the expansions (5.89)–(5.90). Since \(A_{k}^{+}\) involves a commutator with the diagonal operator \(\langle D\rangle ^{2s}\), it is easy to see that \((A_1^{+})^{\sigma }(j, p, k)\) and \((A_2^{+})^{\sigma , -\sigma }(j, p_1, p_2, k)\) vanish when \(j=k\).
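For instance, a minimal sketch of this last vanishing property at the level of matrix coefficients (with the conventions of (2.30), (2.32)): for a linear operator \(\mathcal {T}\) with coefficients \(\mathcal {T}^{\sigma }(j,p,k)\) (here a placeholder for \(\mathcal {M}_{k}^{+}+\mathcal {Z}_{k}^{+}\)), since \(\langle D\rangle ^{2s}\) acts as the Fourier multiplier \(\langle j\rangle ^{2s}\), one has

$$\begin{aligned} \Big (\langle D\rangle ^{-2s}\big [\textrm{i}\,\mathcal {T},\, \textrm{i} \langle D\rangle ^{2s}\big ]\Big )^{\sigma }(j,p,k) =\mathcal {T}^{\sigma }(j,p,k)\, \frac{\langle j\rangle ^{2s}-\langle k\rangle ^{2s}}{\langle j\rangle ^{2s}}\,, \end{aligned}$$

and the factor on the right vanishes when \(j=k\); the same computation applies to the quadratic coefficients \((A_2^{+})^{\sigma ,-\sigma }(j,p_1,p_2,k)\).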

5.2 Modified Energies

We introduce the following energy forms (recall (2.48), (5.84)):

$$\begin{aligned} \texttt{E}&:= \tfrac{1}{2}(\textrm{i}\,E\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi )W^{\perp }, W^{\perp })_{L^{2}}\,,\nonumber \\ \mathcal {Q}(\varphi )&:={\bigl [{\begin{matrix}Q^{+}&{}Q^{-}\\ Q^{-}&{}Q^{+}\end{matrix}}\bigr ]}\in \mathbf {\Sigma }_1^{0}[r,3]\otimes \mathcal {M}_2(\mathbb {C})\,, \end{aligned}$$
(5.95)
$$\begin{aligned} \mathcal {E}_{s}&:=\mathcal {N}_{s}+\texttt{E}\,. \end{aligned}$$
(5.96)

In particular we look for an operator \(\mathcal {Q}(\varphi )\) which is self-adjoint, real-to-real and that admits an expansion of the form (recall Def. 2.3)

$$\begin{aligned} \begin{aligned}&Q^{\sigma }(\varphi )=Q^{\sigma }_{1}(\varphi )+Q^{\sigma }_{2}(\varphi ,\varphi )\,,\qquad Q_{k}^{\sigma }\in \textbf{M}^{0}_k\,, \;\;\; k=1,2\,,\;\;\sigma \in \{\pm \}\,. \end{aligned} \end{aligned}$$
(5.97)

We shall construct the modified energies \(Q_1^{\sigma }, Q_{2}^{\sigma }\), \(\sigma \in \{\pm \}\), in a suitable way. More precisely we prove the following.

Proposition 5.14

Under the assumptions of Theorem 5.1 the following holds. There exists a real, self-adjoint operator \(\mathcal {Q}(\varphi )\) as in (5.95) with \(Q_{i}^{\sigma }\), \(i=1,2\), \(\sigma \in \{\pm \}\), of the form (5.97) such that the energy \(\mathcal {E}_s\) in (5.96)–(5.95) satisfies, for any \(t\in [0,T]\) and for \(s\ge s_0\) with \(s_0\gg 1\),

$$\begin{aligned} \begin{aligned} | \partial _{t}\mathcal {E}_{s}(W^{\perp })&|\lesssim _{s}\sup _{t\in [0,T]}\Vert \varphi \Vert _{H^{s}}^{3} \sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{2} +\sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{3} +\varepsilon ^5\sup _{t\in [0,T]}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}\,. \end{aligned} \end{aligned}$$
(5.98)

Moreover the following holds:

\(\bullet \) (Off-diagonal). The operators \(Q^{-}_{k}\) belong to \(\textbf{M}_{k}^{-1}\), \(k=1,2\) .

\(\bullet \) (Diagonal I). The operator \(Q_{1}^{+}\) has the form

$$\begin{aligned} Q_1^{+}(\varepsilon \varphi ):={Op^{\textrm{BW}}}(\texttt{q}_1(\varepsilon \varphi ;x,\xi ))+\texttt{Q}_1(\varepsilon \varphi )\,,\qquad \texttt{q}_1\in \textbf{SM}_1^{0}\,,\quad \texttt{Q}_1\in \textbf{M}_1^{-2}\,. \end{aligned}$$
(5.99)

\(\bullet \) (Diagonal II). The operator \(Q_2^{+}\) has the form

$$\begin{aligned} Q_2^{+}(\varepsilon \varphi ):={Op^{\textrm{BW}}}(\texttt{q}_2(\varepsilon \varphi ;x,\xi ))+\texttt{Q}_2(\varepsilon \varphi )\,, \qquad \texttt{q}_2\in \textbf{SM}_2^{0}\,, \quad \texttt{Q}_2\in \textbf{M}_2^{-1}\,. \end{aligned}$$
(5.100)

Finally (recall (2.32), (2.34) and (2.40)) one has

$$\begin{aligned} |\texttt{q}_2^{\sigma _1\sigma _2}(j_1,j_2,\xi )|&\lesssim \gamma ^{-1}\max \{\langle j_1\rangle ,\langle j_2\rangle \}^{\mu }\,, \end{aligned}$$
(5.101)
$$\begin{aligned} |(\texttt{Q}_{2})^{\sigma _1\sigma _2}(\xi ,j_1,j_{2},j_3)|&+|(Q_{2}^{-})^{\sigma _1\sigma _2}(\xi ,j_1,j_{2},j_3)| \lesssim \gamma ^{-1} \frac{ \max _{2}\{\langle j_1\rangle ,\langle j_2\rangle ,\langle j_3\rangle \}^{\mu +1}}{\max \{\langle j_1\rangle ,\langle j_2\rangle ,\langle j_3\rangle \}}\,, \end{aligned}$$
(5.102)

for some \(\mu >0\).

In Sect. 5.2.1 we construct the operators \(Q_{1}^{\sigma }\), \(\sigma \in \{\pm \}\), while in Sect. 5.2.2 we construct the operators \(Q_{2}^{\sigma }\), \(\sigma \in \{\pm \}\). Then in Sect. 5.2.3 we prove Proposition 5.14 and conclude the proof of Theorem 5.1.

5.2.1 First Order Homological Equation

In this section we prove the following result.

Lemma 5.15

(Homological equation 1) There exists an operator \(Q_{1}^{+}\) satisfying (5.99) such that

$$\begin{aligned} A^{+}_1+Q^{+}_1(\textrm{i} E \Lambda \varepsilon \varphi ) +\langle D\rangle ^{-2s} [\langle D\rangle ^{2s} Q^{+}_1, \textrm{i}\Lambda ]=0 \end{aligned}$$
(5.103)

and an operator \( Q_{1}^{-}\in \textbf{M}^{-1}_{1}\) such that

$$\begin{aligned} A^{-}_1+Q^{-}_1(\textrm{i} E \Lambda \varphi ) -\textrm{i}\langle D\rangle ^{-2s}(\langle D\rangle ^{2s} Q_1^{-} \Lambda +\Lambda \,\langle D\rangle ^{2s}\,Q_1^{-})=0\,, \end{aligned}$$
(5.104)

where \(A^{+}_1,A^{-}_1\) are the operators introduced in (5.88) and \(\Lambda \) is the Fourier multiplier in (2.5).

Moreover \(Q_{1}^{\sigma }=\overline{ Q_{1}^{\sigma }}\), \(\sigma \in \{\pm \}\).

Proof

We start by writing the equation (5.103) for the Fourier coefficients. We have, fixing \(j, k\in S^c\), \(p\in S\) and \(\sigma \in \{\pm \}\) such that \(\sigma p=j-k\)

$$\begin{aligned} \Big ( (Q_1^{+})^{\sigma }(j, p, k)\,\sigma \textrm{i}\,\Lambda (p) \, +\textrm{i}(\Lambda (k)-\Lambda (j)) (Q_1^{+})^{\sigma }(j, p, k)\,+ (A^{+}_1)^{\sigma }(j, p, k)\, \Big ) \widehat{\varphi }^{\sigma }(p)=0\,. \end{aligned}$$

Then we set

$$\begin{aligned} \begin{aligned} (Q_1^{+})^{\sigma }(j, p, k)&= \frac{1}{ \textrm{i} \big (\Lambda (j)-\Lambda (k)-\sigma \Lambda (p)\big )}\, (A^{+}_1)^{\sigma }(j, p, k)\,. \end{aligned} \end{aligned}$$
(5.105)
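Indeed, one can directly check that (5.105) solves the previous equation: since \(\sigma \textrm{i}\,\Lambda (p)+\textrm{i}(\Lambda (k)-\Lambda (j))=-\textrm{i}\big (\Lambda (j)-\Lambda (k)-\sigma \Lambda (p)\big )\), one has

$$\begin{aligned} \frac{(A^{+}_1)^{\sigma }(j, p, k)}{ \textrm{i} \big (\Lambda (j)-\Lambda (k)-\sigma \Lambda (p)\big )}\, \Big (\sigma \textrm{i}\,\Lambda (p) +\textrm{i}(\Lambda (k)-\Lambda (j))\Big ) +(A^{+}_1)^{\sigma }(j, p, k)=0\,, \end{aligned}$$

provided \(\Lambda (j)-\Lambda (k)-\sigma \Lambda (p)\ne 0\) (see the bound (3.8) used below).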

The coefficients \((Q_1^{+})^{\sigma }(j, p, k)\) vanish when \(j=k\) because, by Lemma 5.13, we have \((A^{+}_1)^{\sigma }(j, 0, j)=0\). Recalling (5.89), Definitions 2.3, 2.5 and (2.20), we have

$$\begin{aligned} (A^{+}_1)^{\sigma }(j, p, k):= {(\texttt{a}_1)}^{\sigma }(\sigma p, (j+k)/2)\chi \big ( \frac{2|p|}{\langle j+k\rangle }\big ) +\texttt{A}_1^{\sigma }(j, p, k), \end{aligned}$$

where

$$\begin{aligned} |\texttt{A}_1^{\sigma }(j, p, k)|\lesssim \langle p\rangle ^{\mu }\,(\max \{ \langle p \rangle , \langle k \rangle \})^{-2} \end{aligned}$$
(5.106)

for some \(\mu >0\). We observe that, by the definition of \(S\) in (4.1), the modulus of a site in \(S^{c}\) is always greater than the modulus of a site in \(S\); hence we must have \(j k>0\), because otherwise the momentum condition would imply \(|p|=| j |+ |k|\), which is not possible since \(p\in S\) and \(j, k\in S^c\). Now we claim that we can expand the denominator in (5.105) as

$$\begin{aligned} \frac{1}{ \Lambda (j)-\Lambda (k)-\sigma \Lambda (p)}={\left\{ \begin{array}{ll} \dfrac{1}{ \sigma p-\sigma \Lambda (p)}+r_{-2}(p; j, k)\qquad \text{ if } \quad j, k>0\\[3mm] -\dfrac{1}{ \sigma p+\sigma \Lambda (p)}+\tilde{r}_{-2}(p; j, k) \qquad \,\, \text{ if } \quad j, k<0 \end{array}\right. } \end{aligned}$$

where

$$\begin{aligned} |r_{-2}(p;j, k)|, |\tilde{r}_{-2}(p;j, k)|\lesssim \langle p\rangle ^{\mu } \left( \min \{\langle j\rangle , \langle k\rangle \} \right) ^{-2}\,, \end{aligned}$$
(5.107)

for some \(\mu >0\). Let us prove the claim in the case \(j, k>0\). The other case is similar. We use that

$$\begin{aligned} \Lambda (j)=|j|+\frac{m}{2|j|}+\mathcal {O}(|j|^{-3}). \end{aligned}$$
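This is a standard Taylor expansion of the dispersion relation: since \(\Lambda (j)=\sqrt{j^{2}+m}\) (recall (2.5)), as \(|j|\rightarrow \infty \) one can write

$$\begin{aligned} \Lambda (j)=|j|\sqrt{1+\tfrac{m}{j^{2}}} =|j|\Big (1+\frac{m}{2 j^{2}}-\frac{m^{2}}{8 j^{4}}+\mathcal {O}(|j|^{-6})\Big ) =|j|+\frac{m}{2|j|}+\mathcal {O}(|j|^{-3})\,. \end{aligned}$$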

We have that

$$\begin{aligned} \frac{1}{ \Lambda (j)-\Lambda (k)-\sigma \Lambda (p)} -\frac{1}{ \sigma p-\sigma \Lambda (p)} = \frac{\Lambda (k)-\Lambda (j)+\sigma p}{\sigma p-\sigma \Lambda (p)}\,\cdot \, \frac{1}{\Lambda (j)-\Lambda (k)-\sigma \Lambda (p)} =:r_{-2}(p; j, k)\,. \end{aligned}$$

By momentum condition \(j-k=\sigma p\) we have that

$$\begin{aligned} \sigma p+\Lambda (k)-\Lambda (j)&=\frac{m}{2}(k^{-1}-j^{-1}) +\mathcal {O}(\left( \min \{\langle j\rangle , \langle k\rangle \} \right) ^{-3} ) \\ {}&=\frac{m}{2}\frac{\sigma p}{j\,k} +\mathcal {O}(\left( \min \{\langle j\rangle , \langle k\rangle \} \right) ^{-3} ) =\mathcal {O}(\left( \min \{\langle j\rangle , \langle k\rangle \} \right) ^{-2} )\,. \end{aligned}$$

Therefore, using bound (3.8), we notice that \(r_{-2}\) satisfies (5.107). This concludes the proof of the claim.

Therefore (recall that \(j, k\in S^c\), hence \(j, k\ne 0\)) we write

$$\begin{aligned} (Q_1^{+})^{\sigma }(j, p, k) = \widetilde{\texttt{q}}_{1}^{\sigma }(j,p,k) \chi \big ( \frac{2|p|}{\langle j+k\rangle }\big ) +\widetilde{\texttt{Q}}_1^{\sigma }(j,p,k) \end{aligned}$$
(5.108)

where

$$\begin{aligned} \widetilde{\texttt{q}}_{1}^{\sigma }(j,p,k):= \frac{1}{\textrm{i} \big (\widetilde{\chi }(j+k)\,\, \sigma p-\sigma \Lambda (p)\big )}\, {(\texttt{a}_1)}^{\sigma }(\sigma p, (j+k)/2) \end{aligned}$$
(5.109)

and (recall \(j+k\ne 0\) because \(j\, k>0\))

$$\begin{aligned} \widetilde{\chi }(x)={\left\{ \begin{array}{ll} 1 \qquad \quad \text{ if }\,\,x>0,\\ -1 \qquad \,\, \text{ if } \,x< 0 \end{array}\right. } \end{aligned}$$

and \(\widetilde{\texttt{Q}}^{\sigma }_1(j,p, k)\) is defined by difference. Using the bound (5.107) and Lemma 3.8 (see bound (3.8)) we deduce the estimate

$$\begin{aligned} |\widetilde{\texttt{Q}}_1^{\sigma }(j,p, k)|\lesssim \langle p \rangle ^{\mu } \left( \max \{\langle p\rangle , \langle k\rangle \} \right) ^{-2}, \end{aligned}$$

for some \(\mu >0\) (possibly larger than the one appearing in (5.106)). Then the operator \(\widetilde{\texttt{Q}}_1\) of the form (2.30)–(2.31) (with \(p=1\)) with coefficients \(\widetilde{\texttt{Q}}_1^{\sigma }(j,p, k)\) is a remainder in the class \(\textbf{M}^{-2}_1\). We shall consider a slight modification of \(\widetilde{\texttt{q}}^{\sigma }_1\). Let us now introduce the \(C^{\infty }\) cut-off function \(\eta : \mathbb {R}\rightarrow \mathbb {R}\) defined as

$$\begin{aligned} \eta (y):=\left\{ \begin{aligned}&1\quad y\ge 1/2,\\&0 \quad y\le -1/2. \end{aligned} \right. \end{aligned}$$
(5.110)

Let us define the symbol

$$\begin{aligned} \texttt{q}^{\sigma }_1(\varepsilon \varphi ;x,\xi ):= \eta (\xi )\widetilde{\texttt{q}}_1^{\sigma }(L_{+} \varepsilon \varphi ;x,\xi )+ (1-\eta (\xi ))\widetilde{\texttt{q}}_1^{\sigma }(L_{-} \varepsilon \varphi ;x,\xi ) \end{aligned}$$

where \(L_{\pm }\) are the operators defined by linearity as

$$\begin{aligned} L_{\pm }e^{\textrm{i} \sigma px}= \frac{1}{\pm \sigma p-\sigma \Lambda (p)} e^{\textrm{i} \sigma px},\quad p\in S, \end{aligned}$$

and 0 otherwise. It is easy to check that \(\texttt{q}^{\sigma }_1(\varepsilon \varphi ;x,\xi )\in \textbf{SM}_1^{0}\) and \(\texttt{q}^{\sigma }_1-\widetilde{\texttt{q}}^{\sigma }_1\) is infinitely smoothing. By the discussion above we have that the operator \(Q^{+}_1\) of the form (2.30) with coefficients as in (5.108) can be written as

$$\begin{aligned} Q^{+}_1(\varepsilon \varphi ):={Op^{\textrm{BW}}}(\texttt{q}_1(\varepsilon \varphi ;x,\xi ))+\texttt{Q}_1(\varepsilon \varphi ),\qquad \texttt{q}_1\in \textbf{SM}_1^{0},\quad \texttt{Q}_1\in {\textbf{M}_1^{-2}}, \end{aligned}$$

where \(\texttt{Q}_1:={Op^{\textrm{BW}}}(\texttt{q}_1-\widetilde{\texttt{q}}_1)+\widetilde{\texttt{Q}}_1\). We now study the equation (5.104). In Fourier we have, fixing \(j, k\in S^c\), \(p\in S\) and \(\sigma \in \{\pm \}\) such that \(\sigma p=j-k\)

$$\begin{aligned}&(Q_1^{-})^{\sigma }(j, p, k)\,\sigma \textrm{i}\,\Lambda (p) \,\widehat{\varphi }^{\sigma }(p) -\textrm{i} (\Lambda (j)+\Lambda (k)) (Q_1^{-})^{\sigma }(j, p, k)\,\widehat{\varphi }^{\sigma }(p)\\&\quad +(A^{-}_1)^{\sigma }(j, p, k)\,\widehat{\varphi }^{\sigma }(p) =0\,. \end{aligned}$$

We set

$$\begin{aligned} \begin{aligned} (Q_1^{-})^{\sigma }(j, p, k)&= \frac{1}{ \textrm{i}\big (\Lambda (j)+\Lambda (k)-\sigma \Lambda (p)\big )}\, (A^{-}_1)^{\sigma }(j, p, k)\,. \end{aligned} \end{aligned}$$
(5.111)

First we note that, by Lemma 5.13, \(A_1^{-}\in \textbf{M}_1^0\) and then

$$\begin{aligned} | (A^{-}_1)^{\sigma }(j, p, k)|\lesssim \langle p\rangle ^{\mu }\, \end{aligned}$$

for some \(\mu >0\). Moreover \(|\Lambda (j)+\Lambda (k)-\sigma \Lambda (p)|\ge \Lambda (j)\), since \(\min \{ \langle j \rangle , \langle k \rangle \}\ge |p|\). Therefore the operator \(Q_1^{-}\) with coefficients in (5.111) belongs to \(\textbf{M}_1^{-1}\). This concludes the proof. \(\square \)

5.2.2 Second order Homological Equation

In this section we prove the following result.

Lemma 5.16

(Homological equation 2) Recall (5.17) and Lemma 5.13. Consider the operators \(Q_{1}^{\sigma }\), \(\sigma \in \{\pm \}\), given by Lemma 5.15. There exists an operator \(Q_{2}^{+}\in \textbf{M}^{0}_{2}\) such that

$$\begin{aligned} \begin{aligned}&Q^{+}_2(\textrm{i} E \Lambda \varepsilon \varphi , \varepsilon \varphi ) +Q^{+}_2(\varepsilon \varphi , \textrm{i} E \Lambda \varepsilon \varphi ) +\langle D\rangle ^{-2s} [\langle D\rangle ^{2s} Q^{+}_2, \textrm{i}\Lambda ] = \\ {}&=-A^{+}_2-Q^{+}_1(M_1(\varepsilon \varphi ) \varepsilon \varphi ) \\ {}&\quad \,-\langle D\rangle ^{-2s} \Big ( [ \langle D\rangle ^{2s} Q^{+}_1, \textrm{i}(\mathcal {M}_1^++\mathcal {Z}_1^{+})] -\textrm{i} (\langle D\rangle ^{2s} Q_1^{-} \mathcal {Z}_1^{-} +\mathcal {Z}_1^{-}\langle D\rangle ^{2s} Q_1^{-}) \Big )\,. \end{aligned} \end{aligned}$$
(5.112)

and an operator \( Q_{2}^{-}\in \textbf{M}^{-1}_{2}\) such that

$$\begin{aligned} \begin{aligned} Q_2^{-}(\textrm{i} E \Lambda \varepsilon \varphi ,\varepsilon \varphi ) +&Q_2^{-}(\varepsilon \varphi ,\textrm{i} E \Lambda \varepsilon \varphi ) +\textrm{i} \Lambda Q_2^{-}+\textrm{i} Q_2^{-} \Lambda = \\ {}&=-A_2^{-}-Q^{-}_1(M_1(\varepsilon \varphi )\varepsilon \varphi )-\textrm{i} [Q_1^{+}, \mathcal {Z}_1^{-}] -\langle D\rangle ^{-2s}[\langle D\rangle ^{2s}Q_{1}^{+},\textrm{i} \mathcal {Z}_1^{-}] \\&+\langle D\rangle ^{-2s} \Big (\textrm{i} (\langle D\rangle ^{2s} Q_1^{-}(\mathcal {M}_1^{+}+ \mathcal {Z}_1^{+}) +(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+}) \langle D\rangle ^{2s} Q_1^{-}) \Big ). \end{aligned} \end{aligned}$$
(5.113)

Moreover the operator \(Q_2^{+}\) has the form (5.100) and satisfies (5.102). Finally \(Q_{2}^{\sigma }=\overline{ Q_{2}^{\sigma }}\), \(\sigma \in \{\pm \}\).

Proof

We start by considering the equation (5.112). We claim that the right hand side of (5.112) can be rewritten as

$$\begin{aligned} F(\varepsilon \varphi )={Op^{\textrm{BW}}}(\texttt{f}(\varepsilon \varphi ; x, \xi ))+\texttt{F}(\varepsilon \varphi ) \end{aligned}$$

where \(\texttt{f}(\varepsilon \varphi ; x, \xi )\in \textbf{SM}^0_2\) and \(\texttt{F}(\varepsilon \varphi )\in \textbf{M}_2^{-1}\). By Lemma 5.13–(5.90), \(A_2^{+}\) is the sum of a pseudo differential operator of order 0 and a 2-smoothing operator. By Lemma 5.15 the same holds for \(Q^{+}_1\). The most critical term is the one involving \(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+}\), because it is unbounded. Indeed, we recall that \(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+}\) in (5.80) is the sum of a pseudo differential operator of order 1 and an operator in \(\textbf{M}_1^{-2}\), which for notational convenience we call respectively \(\mathcal {P}\) and \(\mathcal {B}\). Then the commutator \([ \langle D\rangle ^{2\,s} Q^{+}_1, \textrm{i}(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})]\) loses just \(2\,s\) derivatives; indeed

$$\begin{aligned}{}[ \langle D\rangle ^{2s} Q^{+}_1, \textrm{i}(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})] =[ \langle D\rangle ^{2s} {Op^{\textrm{BW}}}(\texttt{q}_1), \textrm{i}\mathcal {P}] + [ \langle D\rangle ^{2s} \texttt{Q}_1, \textrm{i}\mathcal {P}] +[ \langle D\rangle ^{2s} Q^{+}_1, \textrm{i}\mathcal {B}]. \end{aligned}$$

The first term is the commutator between two pseudo differential operators; hence it is a pseudo differential operator which gains one derivative, so it has a symbol in \(\textbf{SM}_2^{2s}\); the second term is, by composition, a linear operator in \(\textbf{M}^{2s-1}_2\) because \(\texttt{Q}_1\in \textbf{M}_1^{-2}\); the third term is, by composition, in \(\textbf{M}^{2s-2}_2\). Therefore

$$\begin{aligned} \langle D\rangle ^{-2s}[ \langle D\rangle ^{2s} Q^{+}_1, \textrm{i}(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})] \end{aligned}$$

is the sum of a pseudo differential operator with symbol in \(\textbf{SM}_2^0\) and an operator in \(\textbf{M}_2^{-1}\). Recalling that \(\mathcal {Z}_1^{-}\) is bounded and \(Q_1^{-}\in \textbf{M}_1^{-1}\), the remaining terms in the right hand side of (5.112) are in \(\textbf{M}_2^{-1}\). This proves the claim.

Recalling Definitions 2.3, 2.5 and Lemma 2.6 we have the following bounds on the coefficients of \(\texttt{f}(\varepsilon \varphi ;x,\xi )\) and \(\texttt{F}(\varepsilon \varphi )\):

$$\begin{aligned} \begin{aligned} |\texttt{f}^{\sigma _1\sigma _2}(p_1,p_2, j)|&\lesssim _{s} \max _{1}\{\langle p_1\rangle ,\langle p_2\rangle \}^{\mu }\,, \\ |\texttt{F}^{\sigma _1 \sigma _2}(j, p_1, p_2, k)|&\lesssim _{s} \frac{\max _{1}\{\langle p_1\rangle , \langle p_2\rangle \}^{\mu } \max _{2}\{\langle p_1\rangle , \langle p_2\rangle ,\langle k\rangle \}^{\mu +1}}{\max _{1}\{\langle p_1\rangle , \langle p_2\rangle ,\langle k\rangle \}}\,, \end{aligned} \end{aligned}$$
(5.114)

for any \(j,k \in S^c\), \(p_1,p_2\in S\), \(\sigma _1,\sigma _2\in \{\pm 1\} \) and for some \(\mu >0\) .

Then we define

$$\begin{aligned} \begin{aligned}&\big ( {Q_2^{+} } \big )^{\sigma _1 \sigma _2} (j, p_1, p_2, k)= \frac{ \texttt{f}^{\sigma _1 \sigma _2}( p_1, p_2, (j+k)/2) \chi \big ( \frac{2|p|}{\langle j+k\rangle }\big ) + \texttt{F}^{\sigma _1 \sigma _2}(j, p_1, p_2, k)}{ \textrm{i} \big ( \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2) \big )}\,, \\ {}&\textrm{if} \quad \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2)\ne 0 \quad \textrm{and}\quad \sigma _1 p_1+\sigma _2 p_2=j-k\,, \end{aligned} \end{aligned}$$
(5.115)

and

$$\begin{aligned} \begin{aligned}&\big ( {Q_2^{+} } \big )^{\sigma _1 \sigma _2} (j, p_1, p_2, k)=0 \\ {}&\textrm{if} \quad \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2)=0 \quad \textrm{and}\quad \sigma _1 p_1+\sigma _2 p_2=j-k\,. \end{aligned} \end{aligned}$$
(5.116)

We have to show that the operator defined by (5.115)–(5.116) solves the equation (5.112). We know that the only possible resonances at order four are \((j, k, p_1, p_2)\) such that (up to permutations)

$$\begin{aligned} |j|=|k|, \quad |p_1|=|p_2| \qquad \text{ or } \qquad |j|=|p_1|, \quad |k|=|p_2|. \end{aligned}$$

Since \(S\) is symmetric, \(j, k\notin S\) and \(p_1, p_2\in S\), the only possible case is \(|j|=|k|\), \(|p_1|=|p_2|\) with \(\sigma _1\ne \sigma _2\). By Lemma 3.8 we have that

$$\begin{aligned} \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2)=0, \qquad \sigma _1 p_1+\sigma _2 p_2=j-k, \end{aligned}$$

only if

$$\begin{aligned} j=k\,,\;\;\;p_1=p_2\,,\quad \sigma _1=-\sigma _2\,. \end{aligned}$$
(5.117)

Therefore the operator \(Q_{2}^{+}\) defined by (5.115)–(5.116) is a solution of the equation (5.112) if and only if the coefficients of the operator in the right hand side of (5.112) are zero when (5.117) holds true. In order to prove this we study each summand in the r.h.s. of (5.112).

By the previous step in Lemma 5.15, in particular (5.105) and the fact that \((A^{+}_1)^{\sigma }(j, 0, j)=0\), the coefficients of the operator \(Q_1^{+}( \textrm{i} E M_1(\varepsilon \varphi ) \varepsilon \varphi )\) vanish when \(j=k\). By Lemma 5.13 the same holds for the coefficients of \(A_2^{+}\).

Consider the linear operator \([\langle D\rangle ^{2\,s}Q_1^{+}, \textrm{i} (\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})]\). Recalling the expansion (2.32), (5.105) and (5.88), for indices satisfying

$$\begin{aligned} \sigma _1p_1+\sigma _2p_2=j-k, \end{aligned}$$

we have that

where

$$\begin{aligned} k'=j-\sigma _1p_1=k+\sigma _2p_2. \end{aligned}$$

If (5.117) holds one can check that \([\langle D\rangle ^{2\,s}Q_1^{+}, \textrm{i} (\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})]^{\sigma _1, -\sigma _1}(j, p_1, -p_1, j)=0\). Concerning the term

$$\begin{aligned} \langle D\rangle ^{2s}Q_1^{-} \mathcal {Z}_1^{-}+\mathcal {Z}_1^{-} \langle D\rangle ^{2s}Q_1^{-} \end{aligned}$$

one can reason as above using equation (5.111). This proves the claim.

It remains to prove (5.100) for the operator \(Q_{2}^{+}\). It is easy to check (using (5.114) and (3.9) in Lemma 3.8) that the coefficients

$$\begin{aligned} \frac{\texttt{F}^{\sigma _1 \sigma _2}(j, p_1, p_2, k)}{ \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2) } \end{aligned}$$

appearing in (5.115) contribute to a smoothing remainder satisfying (5.102). To estimate the contribution of the first summand in the r.h.s. of (5.115) we reason as done for the operator \(Q_{1}^{+}\) in Lemma 5.15. Notice that, by the momentum condition \(\sigma _1p_1+\sigma _2p_2=j-k\) we deduce that, if \(jk<0\), then

$$\begin{aligned} \max \{|j|,|k|\}\lesssim \max \{|p_1|, |p_2|\}. \end{aligned}$$

Indeed, if \(jk<0\) then \(|j|+|k|=|j-k|=|\sigma _1 p_1+\sigma _2 p_2|\le |p_1|+|p_2|\). Therefore the coefficients

$$\begin{aligned} \frac{\texttt{f}^{\sigma _1 \sigma _2}( p_1, p_2, (j+k)/2)}{ \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2) }, \qquad \qquad jk<0 \end{aligned}$$

satisfy (5.102), which means that they contribute to a smoothing remainder in \(\textbf{M}_2^{-1}\). Hence we only study the case \(jk>0\). We Taylor expand the denominator

$$\begin{aligned} \frac{1}{ \Lambda (j)-\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2\Lambda (p_2)}= \left\{ \begin{aligned}&\frac{1}{\sigma _1p_1+\sigma _2p_2+\sigma _1 \Lambda (p_1)+\sigma _2\Lambda (p_2)}+r_1 , \quad j, k>0\\&\frac{1}{ -\sigma _1p_1-\sigma _2p_2+\sigma _1 \Lambda (p_1)+\sigma _2\Lambda (p_2)}+\tilde{r}_1, \quad j, k<0, \end{aligned}\right. \end{aligned}$$

where \(r_1:=r_1(j,p_1,p_2, k)\), \(\tilde{r}_1:=\tilde{r}_1(j,p_1,p_2, k)\) satisfy

$$\begin{aligned} |r_1(j,p_1,p_2, k)|, |\tilde{r}_1(j,p_1,p_2, k)|\le \max \{\langle p_1\rangle , \langle p_2\rangle \}^{\mu }\min \{\langle j\rangle , \langle k\rangle \}^{-1}, \end{aligned}$$

and \(p_1,p_2\in S\). Recalling (5.110) we set

with

$$\begin{aligned} \texttt{q}_2^{\sigma _1\sigma _2}(p_1,p_2,\xi ):= \frac{\textrm{i} \eta (\xi )\texttt{f}^{\sigma _1 \sigma _2}( p_1, p_2, \xi )}{ \sigma _1p_1+\sigma _2p_2-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2) } + \frac{\textrm{i} (1-\eta (\xi ))\texttt{f}^{\sigma _1 \sigma _2}( p_1, p_2, \xi )}{ -\sigma _1p_1-\sigma _2p_2-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2) }. \end{aligned}$$

By (5.114) and (3.9) one can check that \(\texttt{q}_2(\varphi ;x,\xi )\) is a symbol in the class \(\textbf{SM}^{0}_2\) with coefficients satisfying (5.101). Therefore, the discussion above implies that the operator \(Q_{2}^{+}\) with coefficients in (5.115) can be written as

$$\begin{aligned} Q_{2}^{+}:={Op^{\textrm{BW}}}(\texttt{q}_2(\varepsilon \varphi ;x,\xi ))+\texttt{Q}_2(\varepsilon \varphi ) \end{aligned}$$

for some \(\texttt{Q}_2(\varepsilon \varphi )\in \textbf{M}^{-1}_{2}\). Then formula (5.100) follows.

Consider now the equation (5.113). By Lemma 5.13 and Lemma 5.15, using Lemmata 2.2, 2.8, 2.10 and 2.11, we note that the right hand side of (5.113) is a linear operator \(\texttt{R}(\varphi )\) in the class \(\textbf{M}_{2}^{0}\) with coefficients satisfying

$$\begin{aligned} |\texttt{R}^{\sigma _1\sigma _2}(j,p_1,p_2,k)|\lesssim \max _{2}\{\langle p_1\rangle , \langle p_2\rangle ,\langle k\rangle \}^{\mu }\,, \end{aligned}$$
(5.118)

for any \(j,k,p_1,p_2\in \mathbb {Z}\), \(\sigma _1,\sigma _2\in \{\pm \}\), for some \(\mu >0\). Then we define

$$\begin{aligned} \big ( {Q_2^{-} } \big )^{\sigma _1 \sigma _2} (j, p_1, p_2, k)= \frac{\textrm{i} \texttt{R}^{\sigma _1 \sigma _2}(j, p_1, p_2, k)}{ \Lambda (j)+\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2) }\,, \end{aligned}$$
(5.119)

for any \(j,k,p_1,p_2\in \mathbb {Z}\), \(\sigma _1,\sigma _2\in \{\pm \}\) with \(\sigma _1p_1+\sigma _2p_2=j+k\). If \(jk>0\) then

$$\begin{aligned} \max \{|j|,|k|\}\lesssim \max \{|p_1|, |p_2|\}. \end{aligned}$$

Hence, without loss of generality, we can assume that \(j>0\), \(k<0\): otherwise \(Q_2^-\) is infinitely smoothing, because \(j, k\) are equivalent to the inner frequencies \(p_1, p_2\). The momentum condition reads as \(|j|-|k|=\sigma _1 p_1+\sigma _2 p_2\). If \(p_1=p_2=0\) the bound on the coefficients (5.119) is trivial. Hence we assume that \(|p_1|+|p_2|\ge 1\) (they cannot both be zero). Assume that \(k\) is such that

$$\begin{aligned} |k|>4( |p_1|+|p_2|). \end{aligned}$$
(5.120)

Then we have that \(\Lambda (k)>\Lambda (p_1)+\Lambda (p_2)\), indeed

$$\begin{aligned} \Lambda (k)^2>16 (|p_1|+|p_2|)^2+m&>(|p_1|+|p_2|)^2+2 m^2+m(|p_1|+|p_2|)\\&\quad +m >(\Lambda (p_1)+\Lambda (p_2))^2, \end{aligned}$$

where we used that \(m\le 2\) and \(|p_1|+|p_2|\ge 1\). Hence, since \(\sigma _1 \Lambda (p_1)+\sigma _2 \Lambda (p_2)\le \Lambda (p_1)+\Lambda (p_2)<\Lambda (k)\), we get

$$\begin{aligned} \Lambda (j)+\Lambda (k)-\sigma _1 \Lambda (p_1)-\sigma _2 \Lambda (p_2)\ge \Lambda (j). \end{aligned}$$

Therefore the operator with coefficients (5.119), restricted to indices \(k\) satisfying (5.120), belongs to \(\textbf{M}_1^{-1}\). If \(k\) is such that \( |k|\le 4( |p_1|+|p_2|) \), then by the momentum condition \(|j|\le 5 ( |p_1|+|p_2|)\). Therefore we can argue as before, because \(j, k\) are equivalent to the inner frequencies \(p_1, p_2\).

By estimate (5.118) and the above discussion we deduce that the coefficients in (5.119) satisfy (5.102). Hence the thesis follows.

5.2.3 Proof of Proposition 5.14

Recalling (5.96) we have

$$\begin{aligned} \partial _{t}\mathcal {E}_{s}(W^{\perp })&=\partial _{t}\mathcal {N}_{s}(W^{\perp }) + \frac{1}{2}(\textrm{i}\,E\mathcal {D}^{2s}\partial _{t}(\mathcal {Q}(\varepsilon \varphi ))W^{\perp }, W^{\perp })_{L^{2}} \\ {}&+(\textrm{i}\,E\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi )\dot{W}^{\perp }, W^{\perp })_{L^{2}} +(\textrm{i}\,E\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi )W^{\perp }, \dot{W}^{\perp })_{L^{2}}\,. \end{aligned}$$

Therefore, using Lemma 5.13 and the equation (5.83) (recall also Remark 5.12), one gets

$$\begin{aligned} \begin{aligned} \partial _{t}\mathcal {E}_{s}(W^{\perp })&= \tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s}\mathcal {L}(\varepsilon \varphi ) W^{\perp }, W^{\perp })_{L^{2}} +\mathcal {B}(\varepsilon \varphi ,V) \\ {}&+\frac{1}{2}(\textrm{i} E\big [ \mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi ), \textrm{i} E\big (\mathcal {M}_{>}+\mathcal {Z}_{>} \big )\big ] W^{\perp }, W^{\perp })_{L^{2}} \\ {}&+\tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s} \mathcal {Q}(\varepsilon \varphi ) \widetilde{\mathcal {Z}}, W^{\perp })_{L^{2}} +\tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi ) W^{\perp }, \widetilde{\mathcal {Z}})_{L^{2}} \end{aligned} \end{aligned}$$
(5.121)

where \(\mathcal {M}_{>}, \mathcal {Z}_{>}\) are in (5.81), \(\widetilde{\mathcal {Z}}\) is in (5.82) and where

$$\begin{aligned} \mathcal {L}(\varepsilon \varphi ):= \mathcal {A}(\varepsilon \varphi ) +\partial _{t}(\mathcal {Q}(\varepsilon {\varphi })) + \mathcal {D}^{-2s}[\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi ), \textrm{i} E \big (\mathcal {M}+\mathcal {Z}\big )] \,, \end{aligned}$$

with \(\mathcal {M}, \mathcal {Z}\) in (5.80). By (5.95), (5.97), (5.80) we can write

$$\begin{aligned} \mathcal {L}^{({\textrm{diag}})}&:={A}^{+} + \partial _t \big ( Q^{+}(\varepsilon {\varphi })\big )\nonumber \\&+\langle D\rangle ^{-2s} \Big ([\langle D\rangle ^{2s} Q^{+}, \textrm{i}(\mathcal {M}^{+}+\mathcal {Z}^{+})] -\textrm{i} (\langle D\rangle ^{2s} Q^{-} \mathcal {Z}^{-} +\mathcal {Z}^{-} \langle D\rangle ^{2s} Q^{-})\Big )\,, \end{aligned}$$
(5.122)
$$\begin{aligned} \mathcal {L}^{({\textrm{off}})}&:=A^{-}+\partial _t \big ( Q^{-}(\varepsilon {\varphi }) \big )\nonumber \\&+\langle D\rangle ^{-2s} \Big ([\langle D\rangle ^{2s} Q^{+}, \textrm{i} \mathcal {Z}^{-}] -\textrm{i} (\langle D\rangle ^{2s} Q^{-} (\mathcal {M}^{+}+\mathcal {Z}^{+}) +(\mathcal {M}^{+}+\mathcal {Z}^{+})\langle D\rangle ^{2s} Q^{-})\Big )\,. \end{aligned}$$
(5.123)

Notice that by (5.17) we have, for \(\sigma \in \{\pm \}\),

$$\begin{aligned} \begin{aligned} \partial _t (Q^{\sigma }(\varepsilon \varphi ))&=Q^{\sigma }_1(\varepsilon \varphi _t)+Q^{\sigma }_2(\varepsilon \varphi _t, \varepsilon \varphi ) +Q^{\sigma }_2(\varepsilon \varphi , \varepsilon \varphi _t)\\&=Q^{\sigma }_1(\textrm{i} E \Lambda \varepsilon \varphi )+Q^{\sigma }_1(M_1(\varepsilon \varphi ) \varepsilon \varphi ) +Q^{\sigma }_2(\textrm{i} E \Lambda \varepsilon \varphi , \varepsilon \varphi )\\&\quad +Q^{\sigma }_2(\varepsilon \varphi , \textrm{i} E \Lambda \varepsilon \varphi )+\textbf{R}_{\ge 3}^{\sigma } \end{aligned} \end{aligned}$$
(5.124)

where

$$\begin{aligned} \begin{aligned} \textbf{R}^{\sigma }_{\ge 3}&:= Q^{\sigma }_1(M_2(\varepsilon \varphi ) \varepsilon \varphi +M_{\ge 3}(\varepsilon \varphi ) \varepsilon \varphi ) \\ {}&+Q^{\sigma }_2 \big (\varepsilon \varphi , M_1(\varepsilon \varphi ) \varepsilon \varphi +M_2(\varepsilon \varphi ) \varepsilon \varphi +M_{\ge 3}(\varepsilon \varphi ) \varepsilon \varphi \big ) \\ {}&+Q^{\sigma }_2(M_1(\varepsilon \varphi ) \varepsilon \varphi +M_2(\varepsilon \varphi ) \varepsilon \varphi +M_{\ge 3}(\varepsilon \varphi ) \varepsilon \varphi , \varepsilon \varphi ) \\ {}&+Q^{\sigma }_1(-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )) +Q^{\sigma }_2(\varepsilon \varphi ,-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )) +Q^{\sigma }_2(-\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi ),\varepsilon \varphi )\,. \end{aligned} \end{aligned}$$
(5.125)
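In other words, comparing (5.124) with (5.125), the expansion above amounts to substituting for \(\varepsilon \varphi _t\) the identity

$$\begin{aligned} \varepsilon \varphi _t= \textrm{i} E \Lambda \varepsilon \varphi +\big (M_1(\varepsilon \varphi )+M_2(\varepsilon \varphi )+M_{\ge 3}(\varepsilon \varphi )\big )\varepsilon \varphi -\textrm{Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \end{aligned}$$

and then collecting the contributions which are homogeneous of degree one and two in \(\varepsilon \varphi \); all the remaining contributions have degree at least three and are stored in \(\textbf{R}^{\sigma }_{\ge 3}\).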

Then, using (5.87), (5.97), (5.80), (5.124), we Taylor expand \(\mathcal {L}^{({\textrm{diag}})}\) and \(\mathcal {L}^{({\textrm{off}})}\) in (5.122)–(5.123) as

$$\begin{aligned} \begin{aligned} \mathcal {L}^{({\textrm{diag}})}&=\mathcal {L}^{({\textrm{diag}})}_1+\mathcal {L}^{({\textrm{diag}})}_2+\mathcal {L}^{({\textrm{diag}})}_{\ge 3}, \qquad \mathcal {L}^{({\textrm{off}})}=\mathcal {L}^{({\textrm{off}})}_{1}+\mathcal {L}^{({\textrm{off}})}_2+\mathcal {L}^{({\textrm{off}})}_{\ge 3} \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \mathcal {L}^{({\textrm{diag}})}_1&= A^{+}_1 +Q^{+}_1(\textrm{i} E \Lambda \varepsilon \varphi ) +\langle D\rangle ^{-2s} [\langle D\rangle ^{2s} Q^{+}_1, \textrm{i}\Lambda ], \\ \mathcal {L}^{({\textrm{diag}})}_2&= A^{+}_2 +Q^{+}_1(M_1(\varepsilon \varphi )\varepsilon \varphi )+Q^{+}_2(\textrm{i} E \Lambda \varepsilon \varphi , \varepsilon \varphi ) +Q^{+}_2(\varepsilon \varphi , \textrm{i} E \Lambda \varepsilon \varphi )\,, \\ {}&\quad +\langle D\rangle ^{-2s} \Big ( [\langle D\rangle ^{2s} Q^{+}_2, \textrm{i}\Lambda ] +[ \langle D\rangle ^{2s} Q^{+}_1, \textrm{i}(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})] -\textrm{i} (\langle D\rangle ^{2s} Q_1^{-} \mathcal {Z}_1^{-}\\&\quad +\mathcal {Z}_1^{-} \langle D\rangle ^{2s} Q_1^{-}) \Big ), \end{aligned}$$
$$\begin{aligned} \begin{aligned} \mathcal {L}^{({\textrm{diag}})}_{\ge 3}&=\textbf{R}^{+}_{\ge 3} +\langle D\rangle ^{-2s}\Big ( [ \langle D\rangle ^{2s} Q^{+}_2, \textrm{i}(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+})]\Big ) \\ {}&\quad +\langle D\rangle ^{-2s}\Big ( [ \langle D\rangle ^{2s} Q^{+}_1, \textrm{i}(\mathcal {M}_2^{+}+\mathcal {Z}_2^{+})] +[ \langle D\rangle ^{2s} Q^{+}_2, \textrm{i}(\mathcal {M}_2^{+}+\mathcal {Z}_2^{+})] \Big )\,, \\ {}&\quad -\textrm{i} \langle D\rangle ^{-2s}\Big ( \langle D\rangle ^{2s} Q_1^{-} \mathcal {Z}_2^{-} +\mathcal {Z}_2^{-} \langle D\rangle ^{2s} Q_1^{-} +\langle D\rangle ^{2s} Q_2^{-} \mathcal {Z}_1^{-} +\mathcal {Z}_1^{-} \langle D\rangle ^{2s} Q_2^{-} \Big ) \\ {}&\quad -\textrm{i} \langle D\rangle ^{-2s}\Big ( \langle D\rangle ^{2s} Q_2^{-} \mathcal {Z}_2^{-} +\mathcal {Z}_2^{-} \langle D\rangle ^{2s} Q_2^{-} \Big )\,, \end{aligned} \end{aligned}$$
(5.126)

and

$$\begin{aligned} \mathcal {L}^{({\textrm{off}})}_1&= A^{-}_1 +Q^{-}_1(\textrm{i} E \Lambda \varepsilon \varphi ) -\textrm{i}\langle D\rangle ^{-2s}(\langle D\rangle ^{2s} Q_1^{-} \Lambda +\Lambda \,\langle D\rangle ^{2s}\,Q_1^{-})\,, \\ \mathcal {L}^{({\textrm{off}})}_2&= A_2^{-} +Q^{-}_1(M_1(\varepsilon \varphi )\varepsilon \varphi ) +Q^{-}_2(\textrm{i} E \Lambda \varepsilon \varphi , \varepsilon \varphi ) +Q^{-}_2(\varepsilon \varphi , \textrm{i} E \Lambda \varepsilon \varphi ) \\ {}&\quad +\langle D\rangle ^{-2s} \Big ( [\langle D\rangle ^{2s} Q_1^{+}, \textrm{i}\mathcal {Z}_1^{-}] \Big ) \\ {}&\quad -\langle D\rangle ^{-2s}\Big (\textrm{i} (\langle D\rangle ^{2s} Q_1^{-} (\mathcal {M}_1^{+}+\mathcal {Z}_1^{+}) +(\mathcal {M}_1^{+}+\mathcal {Z}_1^{+}) \langle D\rangle ^{2s} Q_1^{-}) \Big ) \\ {}&\quad -\textrm{i} \langle D\rangle ^{-2s}\Big (\langle D\rangle ^{2s} Q^{-}_2 \Lambda +\Lambda \langle D\rangle ^{2s} Q^{-}_2 \Big )\,, \end{aligned}$$
$$\begin{aligned} \begin{aligned} \mathcal {L}^{({\textrm{off}})}_{\ge 3}&=\textbf{R}^{-}_{\ge 3} +\langle D\rangle ^{-2s} \Big ( [\langle D\rangle ^{2s} Q_2^{+}, \textrm{i}\mathcal {Z}_1^{-}] +[\langle D\rangle ^{2s} Q_1^{+}, \textrm{i}\mathcal {Z}_2^{-}] +[\langle D\rangle ^{2s} Q_2^{+}, \textrm{i}\mathcal {Z}_2^{-}] \Big ) \\ {}&\quad -\textrm{i} \langle D\rangle ^{-2s} \Big ( \langle D\rangle ^{2s} Q_2^{-} ( \mathcal {M}_1^{+}+\mathcal {Z}_1^{+}) +( \mathcal {M}_1^{+}+\mathcal {Z}_1^{+} )\langle D\rangle ^{2s} Q_2^{-}\Big ) \\ {}&\quad -\textrm{i} \langle D\rangle ^{-2s}\Big ( \langle D\rangle ^{2s} Q_1^{-} \mathcal {Z}_2^{+} +\mathcal {Z}_2^{+} \langle D\rangle ^{2s} Q_1^{-} \Big ) \\ {}&\quad -\textrm{i} \langle D\rangle ^{-2s} \Big ( \langle D\rangle ^{2s} Q_2^{-} \mathcal {Z}_2^{+} +\mathcal {Z}_2^{+} \langle D\rangle ^{2s} Q_2^{-} \Big )\,. \end{aligned} \end{aligned}$$
(5.127)

By Lemmata 5.15, 5.16 we have constructed \(Q^{(1)}, Q^{(2)}\) such that

$$\begin{aligned} \mathcal {L}^{({\textrm{diag}})}_1=\mathcal {L}^{({\textrm{diag}})}_2 =\mathcal {L}^{({\textrm{off}})}_1=\mathcal {L}^{({\textrm{off}})}_2\equiv 0. \end{aligned}$$

Then, by (5.121), we deduce

$$\begin{aligned} \partial _{t}\mathcal {E}_{s}(W^{\perp })&= \tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s}\mathcal {L}_{\ge 3}(\varepsilon \varphi ) W^{\perp }, W^{\perp })_{L^{2}} +\mathcal {B}(\varepsilon \varphi ,V) \\ {}&\quad +\frac{1}{2}(\textrm{i} E\big [ \mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi ), \textrm{i} E(\mathcal {M}_{>}+\mathcal {Z}_{>} )\big ] W^{\perp }, W^{\perp })_{L^{2}} \\ {}&\quad +\tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s} \mathcal {Q}(\varepsilon \varphi ) \widetilde{\mathcal {Z}}, W^{\perp })_{L^{2}} +\tfrac{1}{2}(\textrm{i} E\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi ) W^{\perp }, \widetilde{\mathcal {Z}})_{L^{2}} \end{aligned}$$

where

$$\begin{aligned} \mathcal {L}_{\ge 3}(\varepsilon \varphi ):= \begin{pmatrix} \mathcal {L}^{({\textrm{diag}})}_{\ge 3} &{} \mathcal {L}^{({\textrm{off}})}_{\ge 3}\\ \overline{\mathcal {L}^{({\textrm{off}})}_{\ge 3}} &{}\overline{\mathcal {L}^{({\textrm{diag}})}_{\ge 3}} \end{pmatrix}. \end{aligned}$$

We claim that the operators \(\mathcal {L}^{({\textrm{diag}})}_{\ge 3}\), \( \mathcal {L}^{({\textrm{off}})}_{\ge 3} \) belong to \(\textbf{NH}^{0}_{3}[r]\) and satisfy for \(s\ge s_0\) (\(s_0\) large enough)

$$\begin{aligned} \Vert \mathcal {L}^{({\textrm{diag}})}_{\ge 3}h\Vert _{H^{s}} + \Vert \mathcal {L}^{({\textrm{off}})}_{\ge 3}\overline{h}\Vert _{H^{s}} \lesssim _{s} L^{2}\Vert \varphi \Vert _{{H}^{s}(\mathbb T)}^{3}\Vert h\Vert _{H^{s}}\,, \end{aligned}$$
(5.128)

for all \(h\in H^{s}\).

We start by considering the term \(\mathcal {L}^{({\textrm{diag}})}_{\ge 3}\) in (5.126). First of all we recall that, by (5.97), (5.99), (5.100), we have

$$\begin{aligned}{} & {} Q^{+}(\varphi )={Op^{\textrm{BW}}}(\texttt{q}_1+\texttt{q}_2)+\texttt{Q}_1(\varphi )+\texttt{Q}_2(\varphi ), \qquad \texttt{q}_1\in \textbf{SM}^{0}_1,\;\;\texttt{q}_2\in \textbf{SM}^{0}_2,\\{} & {} \quad \texttt{Q}_1\in \textbf{M}^{-2}_1, \quad \texttt{Q}_2\in \textbf{M}^{-1}_2. \end{aligned}$$

We also recall that

$$\begin{aligned} Q^{-}(\varphi )=Q_1^{-}(\varphi )+Q_2^{-}(\varphi )\,, \qquad Q_1^{-}\in \textbf{M}^{-1}_1\,,\;\;Q_2^{-}\in \textbf{M}^{-1}_2\,. \end{aligned}$$
(5.129)

We now analyse the cubic contributions coming from each summand in (5.126) and we show that they are all bounded operators. We point out that \(\mathcal {M}^{+}\) is an unbounded operator of order one, while \(\mathcal {Z}^{+}\) is bounded. By the properties of \(Q^{\sigma }_j\), \(\sigma =\pm \), \(j=1, 2\), discussed above we have that the operators \(\textbf{R}^{\sigma }_{\ge 3}\) in (5.125) are bounded.

In the second line of (5.126) the terms are bounded because \(Q_j^{+}={Op^{\textrm{BW}}}(\texttt{q}_j)+\texttt{Q}_j\) and

(i) the commutator between the pseudo differential operators

$$\begin{aligned} {Op^{\textrm{W}}}(\texttt{q}_j),\quad \mathcal {M}^{+}_{k}+\mathcal {Z}^{+}_k, \;\;\; (j, k)\in \{ (1, 2), (2, 1), (2, 2) \} , \end{aligned}$$

gains one derivative (see the principal-symbol computation sketched after item (ii));

(ii) the composition between \(\texttt{Q}_j\) and \(\mathcal {M}^{+}_{k}+\mathcal {Z}^{+}_k\), \((j, k)\in \{ (1, 2), (2, 1), (2, 2) \}\), is bounded by Lemma 2.10, because the operators \(\texttt{Q}_j\) are at least 1-smoothing.
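The gain of one derivative in item (i) can be seen at the level of principal symbols. The following is only a heuristic sketch, stated for smooth symbols (in the para-differential setting an analogous expansion holds up to more regular remainders, by the symbolic calculus used throughout this section): if \(a\) is a symbol of order zero and \(b\) a symbol of order one, then

$$\begin{aligned} \big [{Op^{\textrm{BW}}}(a), {Op^{\textrm{BW}}}(b)\big ] ={Op^{\textrm{BW}}}\big (\tfrac{1}{\textrm{i}}\{a, b\}\big )+\text{lower order}\,, \qquad \{a, b\}:=\partial _{\xi }a\,\partial _{x}b-\partial _{x}a\,\partial _{\xi }b\,, \end{aligned}$$

and the Poisson bracket \(\{a, b\}\) has order \(0+1-1=0\); hence the commutator has order zero, one derivative better than each of the two products separately.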

The third and fourth lines in (5.126) are bounded because they are compositions of bounded operators.

Now we make a similar analysis for \(\mathcal {L}_{\ge 3}^{({\textrm{off}})}\) in (5.127). As before the only critical terms are the ones involving \(\mathcal {M}^{+}\), namely the terms of the third and fourth lines in (5.127). By (5.129) and Lemma 2.10 the composition of \(Q^{-}_j\) with \(\mathcal {M}^{+}_k\), \((j, k)\in \{ (1, 2), (2, 1), (2, 2) \}\), is bounded. The estimate (5.128) is a consequence of (5.102) and of (5.76), (5.77), (5.64) for the estimates on \(\mathcal {M}_{k}^{+}, \mathcal {Z}_k^{\pm }\), \(k=1, 2\).

Recall \(\widetilde{\mathcal {Z}}\) in (5.82). By (5.66), (5.59) and (4.7) we have

$$\begin{aligned} \Vert \widetilde{\mathcal {Z}} \Vert _{H^s}\lesssim _s \Vert \varepsilon \varphi \Vert _{H^{s}}^{3}\Vert \varepsilon ^{\beta }V\Vert _{H^{s}} + \,\Vert \varepsilon ^{\beta }V\Vert _{H^{s}}^{2} +\varepsilon ^5 \,. \end{aligned}$$
(5.130)

We observe that \(\big [ \mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi ), \textrm{i} E(\mathcal {M}_{>}+\mathcal {Z}_{>}) \big ]\) loses only \(2s\) derivatives, thanks to the commutator structure, since \(\mathcal {M}_{>}\) is a diagonal matrix of para-differential operators of order one, while \(\mathcal {Q}, \mathcal {Z}_{>}\) are (up to smoothing remainders) para-differential operators of order zero.

By the Cauchy–Schwarz inequality, (5.85), (5.128), (5.78), (5.79) and (5.130) we obtain (5.98).

5.3 Proof of Theorem 5.1

We are now in a position to conclude the proof of Theorem 5.1. Consider the function \(\varepsilon ^{\beta }V\), which is a solution of (4.9). Our aim is to prove an a priori estimate on \(\varepsilon ^{\beta }\Pi _{S}^{\perp }V\), which solves the equation (5.1). Recall that, by Lemma 5.2, equation (5.1) can be written as (5.4). By Proposition 5.11 (see also Lemma 5.9) we have that the function \(W^{\perp }\) in (5.60) solves the equation (5.63). Now consider the modified energy \(\mathcal {E}_s(W^{\perp })\) in (5.95)–(5.96) (recall also (5.84)), where the operator \(\mathcal {Q}(\varepsilon \varphi )\) is given by Proposition 5.14. We claim that there is \(C_s>0\) such that, for \(\varepsilon >0\) small enough,

$$\begin{aligned} \frac{1}{1+\varepsilon C_s}\Vert W^{\perp }\Vert ^{2}_{H^{s}}\le |\mathcal {E}_{s}(W^{\perp })| \le (1+\varepsilon C_{s})\Vert W^{\perp }\Vert ^{2}_{H^{s}}\,. \end{aligned}$$
(5.131)

Indeed, using (5.99)–(5.102), recalling Definitions 2.3, 2.5, Lemma 2.2 and the Cauchy–Schwarz inequality, one gets

$$\begin{aligned} \begin{aligned} |\mathcal {E}_s(W^{\perp })|&{\mathop {\le }\limits ^{(5.96)}} \Vert W^{\perp }\Vert ^{2}_{H^{s}}+ |\tfrac{1}{2}(\textrm{i}\,E\mathcal {D}^{2s}\mathcal {Q}(\varepsilon \varphi )W^{\perp }, W^{\perp })_{\textbf{H}^{0}}| \\ {}&\le \Vert W^{\perp }\Vert ^{2}_{H^{s}}+c_s \Vert \mathcal {Q}(\varepsilon \varphi )W^{\perp }\Vert _{H^{s}} \Vert W^{\perp }\Vert _{H^{s}} \\ {}&\le \Vert W^{\perp }\Vert ^{2}_{H^{s}} (1+c_{s}\Vert \varepsilon \varphi \Vert _{H^{s}}) \end{aligned} \end{aligned}$$

which implies the second inequality in (5.131). The first inequality follows similarly. Using (5.131), estimate (5.98) and integrating in \(t\) one obtains

$$\begin{aligned} \begin{aligned} \Vert W^{\perp }(t)\Vert ^{2}_{H^{s}}&\le \Vert W^{\perp }(0)\Vert ^{2}_{H^{s}} + C_sT\sup _{t\in [0,T]}\Vert \varphi \Vert _{H^{s}}^{3} \sup _{t\in [0,T]}\Vert V\Vert _{H^{s}}^{2} \\ {}&+C_sT\sup _{t\in [0,T]}\Vert V\Vert _{H^{s}}^{3} +C_sT \varepsilon ^5\sup _{t\in [0,T]}\Vert V\Vert _{H^{s}}, \end{aligned} \end{aligned}$$

for \(t\in [0,T]\). Therefore, using the equivalence (5.61), one gets the bound (5.2). This concludes the proof.
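For completeness, here is how (5.131) and (5.98) combine in the integration step above (a sketch): for every \(t\in [0,T]\),

$$\begin{aligned} \Vert W^{\perp }(t)\Vert ^{2}_{H^{s}} \le (1+\varepsilon C_s)\,|\mathcal {E}_{s}(W^{\perp })(t)| \le (1+\varepsilon C_s)\Big ( |\mathcal {E}_{s}(W^{\perp })(0)| +\int _0^{t} |\partial _{\tau }\mathcal {E}_{s}(W^{\perp })(\tau )|\, d\tau \Big )\,, \end{aligned}$$

and one concludes by bounding \(|\mathcal {E}_{s}(W^{\perp })(0)|\le (1+\varepsilon C_s)\Vert W^{\perp }(0)\Vert ^{2}_{H^{s}}\) via (5.131), estimating the time integral by (5.98), and absorbing the harmless factors \(1+\varepsilon C_s\) into the constant \(C_s\).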

6 The Estimates on the Remainder and Proof of the Main Result

Consider the function V in (4.8) which solves the problem (4.9) and set

$$\begin{aligned} \begin{aligned} 0<\sigma <1/4\,,\quad \beta \in (2+2\sigma , 3-2\sigma )\,. \end{aligned} \end{aligned}$$
(6.1)

The main result of this section is the following.

Proposition 6.1

(Main bootstrap) Let \(\sigma , \beta \) be as in (6.1). There exists \(s_0\gg 1\) such that for all \(s\ge s_0\) there exist constants \(c_0=c_0(s)>0\), \(C_{s}>0\) such that the following holds. Let \(\varepsilon >0\) satisfy (4.5) and let \(\varepsilon \varphi \) be the approximate solution in (4.3), satisfying (4.4). Let \(V(t, x)\) be a solution of (4.9) with initial condition \(V_0\in H^s(\mathbb {T})\) defined for times \(t\in [0, T_0]\) for some \(T_0>0\). If

$$\begin{aligned} \varepsilon ^{\beta }\Vert V_0\Vert _{H^s}\le \varepsilon ^{\beta }\,, \quad \varepsilon ^{\beta }\Vert \Pi _S V_0\Vert _{H^{1}}\le \varepsilon ^{\beta +1}\,, \end{aligned}$$
(6.2)

then \(V(t, x)\) extends to an interval \([0, \widetilde{T}]\) with

$$\begin{aligned} \widetilde{T}:=c_0 \varepsilon ^{-2-\sigma }\,, \end{aligned}$$
(6.3)

and we have the following bound

$$\begin{aligned} \sup _{t\in [0,\widetilde{T}]}\ \varepsilon ^{\beta } \Vert V(t)\Vert _{H^{s}{}}\le 2 \varepsilon ^{\beta }\,. \end{aligned}$$
(6.4)

The proof of Proposition 6.1 is based on a bootstrap argument. We define

$$\begin{aligned} T_*:= \sup \Big \{ T\ge 0 : \sup _{t\in [0,T]} \Vert \varepsilon ^{\beta } V \Vert _{H^s}\le 2 \varepsilon ^{\beta } \Big \}\,. \end{aligned}$$
(6.5)

By the assumptions of Proposition 6.1 we have \(T_*>0\). We shall prove, by contradiction, that the supremum \(T_*\) cannot be smaller than \(\widetilde{T}\) in (6.3).

Remark 6.2

The bootstrap assumption implies (4.11).

We split the equation (4.9) in the following way

$$\begin{aligned} \left\{ \begin{aligned}&\varepsilon ^{\beta }\partial _t \Pi _S {V}= \varepsilon ^{\beta }\Pi _S d X_{\mathcal {H}}(\varepsilon \varphi ) [V] +\varepsilon ^{2\beta }\Pi _S \mathcal {Q}(\varepsilon \varphi )[V, V] +\Pi _S \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,, \\&\varepsilon ^{\beta }\partial _t \Pi ^{\perp }_S {V}= \varepsilon ^{\beta } \Pi ^{\perp }_S d X_{\mathcal {H}}(\varepsilon \varphi ) [V] +\varepsilon ^{2\beta }\Pi _S^{\perp } \mathcal {Q}(\varepsilon \varphi )[V, V] +\Pi ^{\perp }_S \text{ Res}_{\mathcal {H}}(\varepsilon \varphi )\,. \end{aligned}\right. \end{aligned}$$
(6.6)

The rest of the section is organized as follows: in Sect. 6.1 we show the improved bound

$$\begin{aligned} \varepsilon ^{\beta }\sup _{t\in [0,T]}\Vert \Pi _{S} V \Vert _{H^s(\mathbb {T})}\le \frac{3}{2} \varepsilon ^{\beta } \, \end{aligned}$$
(6.7)

as long as \(T\le \widetilde{T}\) in (6.3). Similarly in Sect. 6.2 we prove

$$\begin{aligned} \varepsilon ^{\beta }\sup _{t\in [0,T]} \Vert \Pi ^{\perp }_{S} V \Vert _{H^s(\mathbb {T})}\le \frac{3}{2} \varepsilon ^{\beta } \, \end{aligned}$$
(6.8)

as long as \(T\le \widetilde{T}\). By continuity, the bounds (6.7)–(6.8) imply that the supremum \(T_*\) in (6.5) cannot be smaller than \(\widetilde{T}\) in (6.3). We then conclude the proof of Proposition 6.1.

From now on we argue by contradiction and consider \(T\le T_{*}\le \widetilde{T}\), with \(T_*\) and \(\widetilde{T}\) as in (6.5) and (6.3).

6.1 The Equation for Low Frequencies

We study the time evolution of the Sobolev norms

$$\begin{aligned} \frac{d}{d t} \Vert \varepsilon ^{\beta } \Pi _S V \Vert ^2_{H^s}= \varepsilon ^{2\beta }(\mathcal {D}^{2s} \partial _t \Pi _S {V}, \Pi _S V)_{L^2}+\varepsilon ^{2\beta }(\mathcal {D}^{2s} \Pi _S {V}, \partial _t \Pi _S V)_{L^2}. \end{aligned}$$

We use the first equation in (6.6) and we provide estimates for each term. The most delicate term is the one involving \(d X_{\mathcal {H}}(\varepsilon \varphi )\). Recall the definition of \(\mathcal {H}_{res}\) in (3.31).

Lemma 6.3

We have

$$\begin{aligned} \Pi _S d X_{\mathcal {H}} (\varepsilon \varphi )[V]= d X_{\mathcal {H}_{res}} (\varepsilon \varphi )[\Pi _S V] +\Pi _S d X_{\mathfrak {R}^{(\ge 6)}}(\varepsilon \varphi ) [V]. \end{aligned}$$

Proof

Notice that, since \(\varepsilon \varphi \) is Fourier supported on S, we have

$$\begin{aligned} \Pi _S d X_{\mathcal {H}^{(n, \ge 2)}}(\varepsilon \varphi )[V]=0, \qquad n=3, 4, 5. \end{aligned}$$

Hence the thesis follows.

Lemma 6.4

Recall \(T_*\) in (6.5). We have for \(0<T\le T_*\)

$$\begin{aligned} \varepsilon ^{2 \beta }\left| \int _0^T ( \mathcal {D}^{2s} \Pi _S d X_{\mathfrak {R}^{(\ge 6)}}(\varepsilon \varphi ) [ V], \Pi _S V)_{L^2} \,dt \right| \lesssim _s \varepsilon ^{4+2 \beta } T. \end{aligned}$$

Proof

The estimate holds by item (iii) of Proposition 3.9, in particular the first bound in (3.19), together with (4.4) and the Cauchy–Schwarz inequality.

In the following lemma we give an energy estimate for the term \(\Pi _S d X_{\mathcal {H}_{res}} (\varepsilon \varphi ) [V]\).

Lemma 6.5

Recall \(T_*\) in (6.5) and let V be the solution of (6.6). We have, for \(0<T\le T_*\),

$$\begin{aligned} \varepsilon ^{2\beta }\left| \int _0^T \left( (\mathcal {D}^{2s} \Pi _S d X_{\mathcal {H}_{res}}(\varepsilon \varphi ) +[d X_{\mathcal {H}_{res}}(\varepsilon \varphi )]^* \Pi _S \mathcal {D}^{2s} ) [\Pi _S V], \Pi _S V \right) _{L^2} \,dt \right| \le \frac{1}{4} \varepsilon ^{2\beta }\,. \end{aligned}$$
(6.9)

Proof

By (3.16) we have

$$\begin{aligned} \mathcal {H}_{res}(W)=H^{(2)}(W)+\mathcal {H}^{(4, 0)}_{res}(W) =\sum _{j\in \mathbb Z} \Lambda (j) |w_j|^2+\sum _{j, k\in S} C_{j k} |w_j|^2 |w_k|^2\,, \end{aligned}$$
(6.10)

with \(C_{j k}=C_{k j}\in \mathbb {R}\), and hence \(X_{\mathcal {H}_{res}}(W)=(X^{+}_{\mathcal {H}_{res}}(W), \overline{X^{+}_{\mathcal {H}_{res}}(W)})^{T}\) with

$$\begin{aligned} \left( X^{+}_{\mathcal {H}_{res}}(W) \right) _j= -\textrm{i} \partial _{\overline{w_j}} \mathcal {H}_{res}(W) =-\textrm{i} \Lambda (j) w_j -\textrm{i} \left( \sum _{k\in S} C_{j k} |w_k|^2\right) w_j, \qquad j\in S. \end{aligned}$$

We define \(\texttt{B}:=\Pi _{S}dX_{\mathcal {H}_{res}}(\varepsilon \varphi )\). Then, for \(j\in S\), we have

$$\begin{aligned} \begin{aligned} \left( \texttt{B} [\Pi _S V] \right) _j&= \sum _{k\in S}\begin{pmatrix} \texttt{B}_{j, +}^{k, +} v_k+\texttt{B}_{j, +}^{k, -} \overline{v_k} \\[2mm] \texttt{B}_{j, -}^{k, +} v_k+\texttt{B}_{j, -}^{k, -} \overline{v_k} \end{pmatrix} \\&=\begin{pmatrix} -\textrm{i} \Lambda (j) v_j-\textrm{i} \left( \sum _{k\in S} C_{j k} |\varepsilon \varphi _k|^2\right) v_j -2\textrm{i} \varepsilon \left( \sum _{k\in S} C_{j k} \textrm{Re}\,(v_k \overline{\varphi _k}) \right) \varepsilon \varphi _j \\[2mm] \textrm{i} \Lambda (j) \overline{v_j} +\textrm{i} \left( \sum _{k\in S} C_{j k} |\varepsilon \varphi _k|^2\right) \overline{v_j}+2\textrm{i} \varepsilon \left( \sum _{k\in S} C_{j k} \textrm{Re}\,(v_k \overline{\varphi _k}) \right) \varepsilon \overline{\varphi _j} \end{pmatrix}\,. \end{aligned} \end{aligned}$$
(6.11)

Therefore we have

$$\begin{aligned} \begin{aligned} \texttt{B}_{j, +}^{k, +}&:= {\left\{ \begin{array}{ll} -\textrm{i} (\Lambda (j) +2 \varepsilon ^2 C_{j j} |\varphi _j|^2) \qquad \;j=k\,, \\ -2\textrm{i} \varepsilon ^2 C_{j k} \overline{\varphi _k} \varphi _j \qquad \qquad \qquad j\ne k\,, \end{array}\right. } \\ \texttt{B}_{j, -}^{k, -}&:= {\left\{ \begin{array}{ll} \textrm{i} (\Lambda (j) +2 \varepsilon ^2 C_{j j} |\varphi _j|^2) \qquad \; j=k\,, \\ 2\textrm{i} \varepsilon ^2 C_{j k} {\varphi _k} \overline{\varphi _j} \qquad \qquad \qquad j\ne k\,, \end{array}\right. } \end{aligned} \qquad \begin{aligned} \texttt{B}_{j, +}^{k, -}&:= {\left\{ \begin{array}{ll} -2\textrm{i} \varepsilon ^2 C_{j j} \varphi _j^2 \qquad \;\;\;\; j=k\,, \\ -2\textrm{i} \varepsilon ^2 C_{j k} {\varphi _k} \varphi _j \qquad j\ne k\,, \end{array}\right. } \\ \texttt{B}_{j, -}^{k, +}&:={\left\{ \begin{array}{ll} 2\textrm{i} \varepsilon ^2 C_{j j} \overline{\varphi _j}^2 \qquad \;\;\; j=k\,, \\ 2\textrm{i} \varepsilon ^2 C_{j k} \overline{\varphi _k} \overline{\varphi _j} \qquad j\ne k\,. \end{array}\right. } \end{aligned} \end{aligned}$$

We define

$$\begin{aligned} \texttt{A}:=\mathcal {D}^{2s}\texttt{B}+\mathtt {B^{*}} \mathcal {D}^{2s}= \mathcal {D}^{2s} \Pi _S d X_{\mathcal {H}_{res}}(\varepsilon \varphi ) +[d X_{\mathcal {H}_{res}}(\varepsilon \varphi )]^* \Pi _S \mathcal {D}^{2s}. \end{aligned}$$

By using the following formulas

$$\begin{aligned} (\texttt{B}^*)^{k, \sigma }_{j, \sigma }= \overline{\texttt{B}_{j, -\sigma }^{k, -\sigma }}=\texttt{B}_{j, \sigma }^{k, \sigma }, \qquad \quad (\texttt{B}^*)^{k, \sigma '}_{j, \sigma }= \overline{\texttt{B}_{j, \sigma '}^{k, \sigma }}=\texttt{B}_{j, -\sigma '}^{k, -\sigma }, \end{aligned}$$

we have

$$\begin{aligned} \big ( \texttt{A} [\Pi _S V]\big )_j= \sum _{k\in S}\begin{pmatrix} \big (\langle j \rangle ^{2s} \texttt{B}_{j, +}^{k, +} +\overline{\texttt{B}_{j, -}^{k, -}} \langle k \rangle ^{2s} \big ) v_k +\big ( \langle j \rangle ^{2s} \texttt{B}_{j, +}^{k, -} +\overline{\texttt{B}_{j, -}^{k, +} } \langle k \rangle ^{2s}\big ) \overline{v_k} \\[2mm] \big ( \langle j \rangle ^{2s} \texttt{B}_{j, -}^{k, +} + \overline{\texttt{B}_{j, +}^{k, -}} \langle k \rangle ^{2s} \big ) v_k +\big ( \langle j \rangle ^{2s} \texttt{B}_{j, -}^{k, -} +\overline{\texttt{B}_{j, +}^{k, +}} \langle k \rangle ^{2s}\big )\overline{v_k} \end{pmatrix}. \end{aligned}$$

By using that

$$\begin{aligned} \overline{\texttt{B}_{j, -}^{k, -}}=\texttt{B}_{j, +}^{k, +}, \qquad \overline{\texttt{B}_{j, -}^{k, +}}=\texttt{B}_{j, +}^{k, -}, \end{aligned}$$

we get

$$\begin{aligned} (\texttt{A} [\Pi _{S}V], \Pi _{S}V)_{L^2}= \sum _{j, k\in S } ( \langle j \rangle ^{2s}+ \langle k \rangle ^{2s})\, \Big ( \texttt{B}_{j, +}^{k, +} v_k \overline{v_j} +\texttt{B}_{j, +}^{k, -} \overline{v_k} \overline{v_j} + \texttt{B}_{j, -}^{k, +} v_k v_j+ \texttt{B}_{j, -}^{k, -}\overline{v_k} v_j \Big ). \end{aligned}$$

By using that \(C_{jk}=C_{kj}\in \mathbb {R}\) and that the set S is symmetric we have

$$\begin{aligned} \texttt{B}_{j, \sigma }^{k, \sigma '}= \overline{\texttt{B}_{j, -\sigma }^{k, -\sigma '}}, \qquad \texttt{B}_{j, \sigma }^{k, \sigma }= - \texttt{B}_{k, -\sigma }^{j, -\sigma }, \qquad \forall j, k\in S. \end{aligned}$$

Therefore

$$\begin{aligned} (\texttt{A} [\Pi _{S}V], \Pi _{S}V)_{L^2}&= \sum _{j, k\in S} ( \langle j \rangle ^{2s}+ \langle k \rangle ^{2s})\, \Big (\texttt{B}_{j, +}^{k, -} \overline{v_k} \overline{v_j} + \overline{\texttt{B}_{j, +}^{k, -}} v_k v_j \Big ) \\ {}&=2\textrm{i}\varepsilon ^2 \sum _{j, k\in S} ( \langle j \rangle ^{2s}+ \langle k \rangle ^{2s}) (-C_{j k} \varphi _k \varphi _j \overline{v_k} \overline{v_j} +C_{j k} \overline{\varphi _k} \overline{\varphi _j} {v_k} {v_j}) \\ {}&=-8 \varepsilon ^2 \sum _{j, k\in S} ( \langle j \rangle ^{2s}+ \langle k \rangle ^{2s}) C_{j k} \textrm{Re}\,(\overline{\varphi _k} v_k)\,\textrm{Im}\,(\overline{\varphi _j}\,{v_j})\,. \end{aligned}$$

We have

$$\begin{aligned} (\texttt{A} [\Pi _{S}V], \Pi _{S}V)_{L^2}= 16\, \sum _{j\in S} \left( \sum _{k\in S} C_{j k} \langle k \rangle ^{2 s} \textrm{Re}\,(v_k \varepsilon \overline{\varphi _k}) \right) \textrm{Im}\,(\varepsilon \varphi _j \overline{v_j})\,. \end{aligned}$$
(6.12)

We now study the derivative of \(\textrm{Re}\,(v_p \varepsilon \overline{\varphi _p})\) with \(p\in S\). One has

$$\begin{aligned} \frac{d}{dt} \textrm{Re}\,(v_p \varepsilon \overline{\varphi }_p)= \dot{v}_p \varepsilon \overline{\varphi _p} +v_p \varepsilon \dot{\overline{\varphi _p}} +\dot{\overline{v_p}} \varepsilon \varphi _p+\overline{v_p} \varepsilon \dot{\varphi }_p\,. \end{aligned}$$
(6.13)

Recall that in Sect. 3.3 we constructed \(\varepsilon \varphi \) in such a way that (see (3.30), (3.31))

$$\begin{aligned} \varepsilon \partial _{t}\varphi =X_{ \mathcal {N}}(\varepsilon \varphi )\equiv X_{\mathcal {H}_{res}}(\varepsilon \varphi )\,, \end{aligned}$$
(6.14)

which in Fourier reads as

$$\begin{aligned} \dot{\varphi }_p=\textrm{i} \omega _p(\xi ) \,\varphi _p, \qquad p\in S. \end{aligned}$$
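Equivalently, each Fourier mode of the approximate solution rotates at the constant speed \(\omega _p(\xi )\): integrating the ODE above gives

$$\begin{aligned} \varphi _p(t)=e^{\textrm{i} \omega _p(\xi )\, t}\,\varphi _p(0)\,, \qquad p\in S\,, \end{aligned}$$

so that, in particular, \(|\varphi _p(t)|\) is constant in time.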

The first equation in (6.6) reads in Fourier, for \(p\in S\), as

$$\begin{aligned} \begin{aligned} \dot{v}_{p}&=(\Pi _S d X_{\mathcal {H}}(\varepsilon \varphi ) [V])_{p}^{+} +\varepsilon ^{\beta }(\Pi _S \mathcal {Q}(\varepsilon \varphi )[V, V])_{p}^{+} +\varepsilon ^{-\beta }(\Pi _S \text{ Res}_{\mathcal {H}}(\varepsilon \varphi ))_{p}^{+} \\ {}&= (\Pi _S d X_{\mathcal {H}_{res}}(\varepsilon \varphi ) [V])_{p}^{+} \\ {}&+\underbrace{(\Pi _S d X_{\mathfrak {R}^{(\ge 6)}}(\varepsilon \varphi ) [V])_{p}^{+}}_{I} +\varepsilon ^{\beta }\underbrace{(\Pi _S \mathcal {Q}(\varepsilon \varphi )[V, V])_{p}^{+}}_{II} +\underbrace{\varepsilon ^{-\beta }(\Pi _S \text{ Res}_{\mathcal {H}}(\varepsilon \varphi ))_{p}^{+}}_{III}\,. \end{aligned} \end{aligned}$$
(6.15)

Notice that, by estimate (3.19) on \(d X_{\mathfrak {R}^{(\ge 6)}}\), (4.4) on \(\varepsilon \varphi \), and the bootstrap assumption on \(\varepsilon ^{\beta }V\), one gets \(|I |\lesssim _{s}\varepsilon ^{4}\). Recalling (4.30)–(4.31) and that we are projecting on a finite dimensional subspace of \(H^{s}\), we get \(|II|\lesssim _{s}\varepsilon ^{\beta }\). Finally, by Lemma 4.1 on the residual, one gets \(|III|\lesssim _{s}\varepsilon ^{5-\beta }\). All the latter estimates are uniform in \(t\in [0,T]\). By substituting (6.14), (6.15) into (6.13) we obtain

$$\begin{aligned} \frac{d}{dt} \textrm{Re}\,(v_p \varepsilon \overline{\varphi }_p)&= \dot{v}_p \varepsilon \overline{\varphi _p} +v_p \varepsilon \dot{\overline{\varphi _p}} +\dot{\overline{v_p}} \varepsilon \varphi _p+\overline{v_p} \varepsilon \dot{\varphi }_p\nonumber \\&=2\textrm{Re}\big (\varepsilon \overline{\varphi _p} (\Pi _S d X_{\mathcal {H}_{res}}(\varepsilon \varphi ) [V])_{p}^{+} +\overline{v_p} \varepsilon \textrm{i} \omega _{p}(\xi )\varphi _p\big )\end{aligned}$$
(6.16)
$$\begin{aligned}&\quad +2\textrm{Re}\big (\varepsilon \overline{\varphi _p}\big (I+II+III\big )\big )\,. \end{aligned}$$
(6.17)

Using the explicit form of \(\Pi _S d X_{\mathcal {H}_{res}}(\varepsilon \varphi ) [V]\) in (6.11) and recalling (3.34), one can check that the term (6.16) vanishes identically. On the other hand, we also have \(|(6.17)|\lesssim _{s}\varepsilon (\varepsilon ^{4}+\varepsilon ^{\beta }+\varepsilon ^{5-\beta })\). Let us define

$$\begin{aligned} \texttt{P}_p:= \textrm{Re}\,(v_p \varepsilon \overline{\varphi _p})(0), \qquad \texttt{L}_p(t):= \textrm{Re}\,(v_p \varepsilon \overline{\varphi _p})(t)-\textrm{Re}\,(v_p \varepsilon \overline{\varphi _p})(0). \end{aligned}$$

By the discussion above we deduce

$$\begin{aligned} \sup _{\tau \in [0, t]}| \texttt{L}_p(\tau )| \lesssim \varepsilon \big (\varepsilon ^{4}+\varepsilon ^{\beta }+\varepsilon ^{5-\beta }\big ) t, \qquad \forall t\in [0, T]. \end{aligned}$$
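More precisely, this bound follows by integrating (6.16)–(6.17) in time and using that the term (6.16) vanishes:

$$\begin{aligned} |\texttt{L}_p(t)|=\Big | \int _0^{t} \frac{d}{d\tau }\,\textrm{Re}\,(v_p \varepsilon \overline{\varphi _p})(\tau )\, d\tau \Big | \le \int _0^{t} |(6.17)|\, d\tau \lesssim _{s}\varepsilon \big (\varepsilon ^{4}+\varepsilon ^{\beta }+\varepsilon ^{5-\beta }\big )\, t\,, \qquad t\in [0, T]. \end{aligned}$$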

Let us now write (6.12) as

$$\begin{aligned} (\texttt{A} [\Pi _S V(t)], \Pi _S V(t))_{L^2}= 16 \sum _{j\in S} \left( \sum _{k\in S} C_{j k} \langle k \rangle ^{2 s} (\texttt{P}_k +\texttt{L}_{k}(t))\right) \textrm{Im}\,(\varepsilon \varphi _j(t) \overline{v_j}(t)). \end{aligned}$$

By the bootstrap assumption (6.5) and the bound (4.4) we get

$$\begin{aligned} \sup _{t\in [0, T]} | \textrm{Im}\,(\varepsilon \varphi _j \overline{v_j})|\lesssim \varepsilon . \end{aligned}$$

Setting \(\delta :=\max _{k\in S} \texttt{P}_k\), one can note that

$$\begin{aligned} \delta \le 2 \Vert \Pi _S V_0 \Vert _{H^1(\mathbb T)} \varepsilon {\mathop {\le }\limits ^{(6.2)}} 2 \varepsilon ^2 . \end{aligned}$$

Thus

$$\begin{aligned} \left| \int _0^T \left( \texttt{A} [\Pi _S V(t)], \Pi _S V(t) \right) _{L^2} \,dt \right|&\lesssim _{s} \delta \varepsilon T+ \varepsilon ^2\big (\varepsilon ^{4}+\varepsilon ^{\beta }+\varepsilon ^{5-\beta }\big ) T^2 \lesssim _{s} c_0\,, \end{aligned}$$
(6.18)

provided that

$$\begin{aligned} \delta \varepsilon T\lesssim \varepsilon ^3T\lesssim 1,\qquad \varepsilon ^{6}T^2\lesssim 1 ,\qquad \varepsilon ^{\beta +2}T^2\lesssim 1,\qquad \varepsilon ^{7-\beta }T^2\lesssim 1. \end{aligned}$$

Since we set \(T\lesssim \varepsilon ^{-2-\sigma }\) (see (6.3)), we need that

$$\begin{aligned} \varepsilon ^{6-4-2\sigma }= \varepsilon ^{2-2\sigma }\lesssim 1\,,\qquad \varepsilon ^{\beta +2-4-2\sigma }=\varepsilon ^{\beta -2-2\sigma }\lesssim 1\,, \qquad \varepsilon ^{7-\beta -4-2\sigma }= \varepsilon ^{3-\beta -2\sigma }\lesssim 1, \end{aligned}$$
(6.19)

which follows by taking \(\sigma \) and \(\beta \) as in (6.1). This implies the thesis.

Lemma 6.6

Recall \(T_*\) in (6.5). We have, for \(0<T\le T_*\),

$$\begin{aligned} \varepsilon ^{3 \beta }\left| \int _0^T (\mathcal {D}^{2s} \Pi _S \mathcal {Q}(\varepsilon \varphi ) [V, V], \Pi _S V)_{L^2} \,dt \right| \lesssim _s \varepsilon ^{3 \beta } T. \end{aligned}$$

Proof

It follows by the Cauchy–Schwarz inequality, recalling (4.30)–(4.31) and that we are projecting on a finite dimensional subspace of \(H^{s}\).

Lemma 6.7

Recall \(T_*\) in (6.5). We have, for \(0<T\le T_*\),

$$\begin{aligned} \varepsilon ^{\beta }\left| \int _0^T (\mathcal {D}^{2s} \Pi _S \text{ Res}_{\mathcal {H}}(\varepsilon \varphi ), V)_{L^2}\,dt \right| \lesssim _s \varepsilon ^{\beta +5} T. \end{aligned}$$

Proof

It follows by Lemma 4.1.

By collecting Lemmata 6.4, 6.5, 6.6, 6.7, we get

$$\begin{aligned} \varepsilon ^{2\beta } \sup _{t\in [0,T]}\Vert \Pi _S V(t) \Vert ^2_{H^s}&\le \varepsilon ^{2\beta } \Vert \Pi _S V_0 \Vert ^2_{H^s} \\ {}&\quad +C(s)\varepsilon ^{2\beta } \Big (\varepsilon ^{4} T+ 1+\varepsilon ^{\beta } T +\varepsilon ^{5-\beta }T\Big )\,. \end{aligned}$$

The bound (6.7) holds for \(T\lesssim \varepsilon ^{-2-\sigma }\) by (6.2) and (6.1).

6.2 The Equation for High Frequencies

Consider the second equation in (6.6), which is (5.1), take an initial condition \(V_0\) satisfying the assumptions of Proposition 6.1, and take the solution \(V(t, x)\) evolving from \(V_0\) and defined on a time interval \([0, T]\) with \(T\le T_{*}\) (see (6.5)). By Remark 6.2 and the assumptions of Proposition 6.1 we can apply Theorem 5.1. Then, by the bootstrap assumption on \(V\) and the bound (4.4) on \(\varepsilon \varphi \), we have that estimate (5.2) implies

$$\begin{aligned} \begin{aligned} \varepsilon ^{2\beta }\sup _{t\in [0,T]} \Vert \Pi ^{\perp }_S {V}(t) \Vert ^2_{H^s}&\le (1+C_{s}\varepsilon ) \varepsilon ^{2\beta } \Vert \Pi ^{\perp }_S {V}(0) \Vert ^2_{H^s} \\ {}&+C_s \varepsilon ^{2\beta }\Big ( \varepsilon ^{3} T +\varepsilon ^{\beta } T +T\,\varepsilon ^{5-\beta } \Big ), \end{aligned} \end{aligned}$$

for some \(C_s>0\). If we take \(T=c_0 \varepsilon ^{-2-\sigma }\), \(0<\sigma <1/4\), then, taking \(\varepsilon \) small enough and using (6.1), we obtain (6.8), provided that \(c_0\) is small enough.

Proof of Proposition 6.1

By the discussion of Sects. 6.1, 6.2 we have that, if the bound in (6.5) holds, then one obtains the improved estimates (6.7) and (6.8). Hence the time \(T_{*}\le c_0\varepsilon ^{-2-\sigma }\) is not the maximal time for which the bound (6.4) holds. This contradicts the definition (6.5), and the thesis follows.

6.3 Proof of the Main Results

Proof of Theorem 1.1

Fix \(N\in \mathbb N\) and consider a symmetric subset \(S\subset \mathbb Z\) of cardinality N as in (4.1). Fix the mass m in the full measure set \(\mathcal {M}\) given by Lemma 3.8, so that the normal form Proposition 3.9 applies. By the arguments in Sect. 4.1, for \((\xi ,\theta )\in [1, 2]^{N}\times \mathbb T^{N}\), we construct an oscillating function \(\varepsilon \varphi (t,x)\) of the form (4.3) supported on S, with frequencies of oscillation \(\omega _{j}(\xi )\), \(j\in S\), given in (3.34), where \(\texttt{C}_{jk}\) are the coefficients computed in (3.16). This also defines the linear operator \(\texttt{C}:\mathbb {R}^{N}\rightarrow \mathbb {R}^{N}\).

Now take any \(V_0\in H^{s}\) such that (6.2) holds with \(\beta :=5/2\) and set

$$\begin{aligned} Z_0=Z(0)=\Phi _{B}^{-1}\big (\varepsilon \varphi (0)+\varepsilon ^{\beta }V_0\big )\,. \end{aligned}$$
(6.20)

We observe that \(\varepsilon \varphi (0)\) is determined by the choice of \((\xi , \theta )\). Consider the solution Z(t) of the Klein–Gordon equation (1.4) with initial condition \(Z(0)=Z_0\) (which exists by local theory), and notice that, by the continuity of the map \(\Phi _{B}^{-1}\), \(Z_0\) can be chosen in an open set of \(H^{s}\).

We shall prove that the bound (1.7) holds over a time interval [0, T] with \(T=\varepsilon ^{{-\frac{9}{4}+\delta }}\) for any \(\delta >0\). We write

$$\begin{aligned} T=\frac{1}{\varepsilon ^{9/4-\delta }}= \varepsilon ^{-2-\sigma }\varepsilon ^{\delta /2}, \quad \sigma :=\frac{1}{4}-\frac{\delta }{2}. \end{aligned}$$

By this choice of \(\sigma \), we have

$$\begin{aligned} 2+2\sigma =\frac{5}{2}-\delta , \qquad 3-2\sigma =\frac{5}{2}+\delta . \end{aligned}$$

Then the choice \(\beta =5/2\) is such that condition (6.1) is fulfilled and Proposition 6.1 applies. It guarantees that if V is a solution of (4.9) with initial condition \(V(0)=V_0\) satisfying (6.2) then the bound (6.4) holds over a time interval \([0,\widetilde{T}]\) with \(\widetilde{T}=c_0 \varepsilon ^{-2-\sigma }\) for some \(c_0=c_0(s)\) small. Notice that, taking \(\varepsilon \) small enough with respect to s, we also have that

$$\begin{aligned} \widetilde{T}=c_0 \varepsilon ^{-2-\sigma }\ge T=\varepsilon ^{\delta /2} \varepsilon ^{-2-\sigma }=\frac{1}{\varepsilon ^{9/4-\delta }}. \end{aligned}$$

The bound (6.4) on \(\varepsilon ^{\beta }V\), and estimate (4.4) imply that

$$\begin{aligned} \sup _{[0,T]}\Vert W\Vert _{H^{s}}=\sup _{[0,T]}\Vert \varepsilon \varphi +\varepsilon ^{\beta }V\Vert _{H^{s}}\lesssim _{s}\varepsilon , \end{aligned}$$

which, together with the estimates on the map \(\Phi _{B}\), guarantees that the assumptions of Proposition 4.2 hold true, so Proposition 4.2 applies. Estimate (4.12) implies (1.7). This concludes the proof.

Proof of Proposition 1.3

We define the map \(\mathcal {F}:=\Phi _{B}^{-1}\) where \(\Phi _{B}\) is the Birkhoff map constructed in the proof of Theorem 1.1 (see Prop. 3.9).

By definition (recall (1.8), (1.9)) a function \(u\in \mathcal {A}^{N}_{\varepsilon ^{7/2}}\) can be written as

$$\begin{aligned} u=u_1+u_2\,,\qquad u_1=\mathcal {F}(\varepsilon \varphi _{\xi ,\theta })\,,\qquad \Vert u_2\Vert _{H^{s}}\le \frac{\varepsilon ^{\beta +1}}{2}\,,\quad \beta :=5/2\,, \end{aligned}$$
(6.21)

for some \((\xi ,\theta )\in \mathcal {O}^{N}\times \mathbb T^{N}\). By the proof of Theorem 1.1 we know that \(u\in \mathcal {U}_{\xi ,\theta }^{N}\) if it is the image, under the map \(\mathcal {F}\) (see (6.20)), of a function \(\varepsilon \varphi _{\xi ,\theta }+\varepsilon ^{\beta }V_0\) for some \(V_0\) satisfying (6.2). So to prove the inclusion (1.9) it is sufficient to show that there is some \(V_0\) (satisfying (6.2)) such that

$$\begin{aligned} \mathcal {F}^{-1}(u_1+u_2)=\mathcal {F}^{-1}(u)=\varepsilon \varphi _{\xi ,\theta }+\varepsilon ^{\beta }V_0 \qquad \Leftrightarrow \qquad u_1+u_2=u=\mathcal {F}\big (\varepsilon \varphi _{\xi ,\theta }+\varepsilon ^{\beta }V_0\big ). \end{aligned}$$

By (6.21) we have

$$\begin{aligned} \begin{aligned} \mathcal {F}^{-1}(u)&=\mathcal {F}^{-1}\big (\mathcal {F}(\varepsilon \varphi _{\xi ,\theta })+u_2\big ) \\ {}&=\varepsilon \varphi _{\xi ,\theta }+u_2+ \mathcal {F}^{-1}\big (\mathcal {F}(\varepsilon \varphi _{\xi ,\theta }+u_2)+R\big ) -\mathcal {F}^{-1}\Big (\mathcal {F}(\varepsilon \varphi _{\xi ,\theta }+u_2)\Big ) \end{aligned} \end{aligned}$$
(6.22)

where

$$\begin{aligned} R:=\mathcal {F}(\varepsilon \varphi _{\xi ,\theta })+u_2-\mathcal {F}(\varepsilon \varphi _{\xi ,\theta }+u_2) =u_2-d\mathcal {F}(\varepsilon \varphi _{\xi ,\theta }+\sigma u_2)[u_2], \end{aligned}$$

for some \(\sigma \in [0,1]\). By estimates (3.13), (3.14) we get

$$\begin{aligned} \Vert R\Vert _{H^{s}}\le C_{s}\varepsilon \,\Vert u_{2}\Vert _{H^s}\,, \end{aligned}$$
(6.23)

taking \(\varepsilon >0\) small enough. Similarly, by (6.22), we can write

$$\begin{aligned} \mathcal {F}^{-1}(u)=\varepsilon \varphi _{\xi ,\theta }+f, \qquad f:=u_2+d\mathcal {F}^{-1}\big (\mathcal {F}(\varepsilon \varphi _{\xi ,\theta }+u_2)+\tau R\big )[R], \end{aligned}$$

for some \(\tau \in [0,1]\). Hence, again by (3.13), (3.14), we deduce that

$$\begin{aligned} \Vert f\Vert _{H^{s}}\le \Vert u_2\Vert _{H^{s}}+C_2\varepsilon \Vert R\Vert _{H^{s}} {\mathop {<}\limits ^{(6.21), (6.23)}} \varepsilon ^{\beta +1}\,, \end{aligned}$$
(6.24)

taking \(\varepsilon >0\) small enough. Notice that the function f depends only on the known functions \(\varepsilon \varphi _{\xi ,\theta },u_2\). Setting \(\varepsilon ^{\beta }V_0:=f\) we have that condition (6.2) is fulfilled thanks to estimate (6.24). This implies the thesis.