Reducible KAM Tori for the Degasperis–Procesi Equation

We develop KAM theory close to an elliptic fixed point for quasi-linear Hamiltonian perturbations of the dispersive Degasperis–Procesi equation on the circle. The overall strategy in KAM theory for quasi-linear PDEs is based on Nash–Moser nonlinear iteration, pseudo differential calculus and normal form techniques. In the present case the complicated symplectic structure, the weak dispersive effects of the linear flow and the presence of strong resonant interactions require a novel set of ideas. The main points are to exploit the integrability of the unperturbed equation, to look for special wave packet solutions and to perform a very careful algebraic analysis of the resonances. Our approach is quite general and can be applied also to other 1d integrable PDEs. We are confident for instance that the same strategy should work for the Camassa–Holm equation.


Introduction and Main Result
In this paper we prove existence and stability of Cantor families of quasi-periodic, small amplitude solutions for quasi-linear Hamiltonian perturbations of the Degasperis–Procesi (DP) equation under periodic boundary conditions $x \in \mathbb{T} := \mathbb{R}/2\pi\mathbb{Z}$. Here the "Hamiltonian density" f belongs to $C^\infty(\mathbb{R},\mathbb{R})$ and satisfies $f(u) = O(u^9)$, where $O(u^9)$ denotes a function with a zero of order at least nine at the origin. Eq. (1.1) is a Hamiltonian PDE of the form $u_t = J\nabla H(u)$, where $\nabla H$ is the $L^2(\mathbb{T},\mathbb{R})$ gradient and J and the Hamiltonian H are given in (1.4). The Hamiltonian H is defined on the phase space $H^1_0(\mathbb{T}) := \{u \in H^1(\mathbb{T},\mathbb{R}) : \int_{\mathbb{T}} u\,dx = 0\}$. For f = 0, Eq. (1.1) is the DP equation, which was first proposed in [29] in the form

$$ u_t + c_0 u_x + \gamma u_{xxx} - \alpha^2 u_{xxt} = \Big( -\frac{2c_1}{\alpha^2} u^2 + c_2 (u_x^2 + u u_{xx}) \Big)_x, \tag{1.5} $$

where $c_0, c_1, c_2, \gamma, \alpha \in \mathbb{R}$ and $\alpha \neq 0$. Applying Galilean boosts, translations and time rescaling to (1.5) gives Eq. (1.1) with f = 0. The DP equation can be regarded as a model for nonlinear shallow water dynamics. Its asymptotic accuracy is the same as that of the Camassa–Holm equation and one order higher than that of the KdV equation [23]. The literature on this equation is rather large, starting from the paper [28], in which its complete integrability is proved. For instance, local and global well-posedness have been extensively studied, as well as the existence of wave breaking phenomena (peakons, N-peakon solutions). Without trying to be exhaustive, we quote [18,20–22,48,54], and we refer to [32] and the references therein for more literature on the Degasperis–Procesi equation.
Actually, many of these results (notably those on wave breaking) concern the dispersionless case, which corresponds to (1.1) with f = 0 and $u \mapsto u + 1$. In the present paper the dispersive terms $-4u_x + u_{xxx}$ are fundamental. Our main purpose is to prove existence of quasi-periodic solutions in high Sobolev regularity by following a KAM approach. In this setting, a quasi-periodic solution with $\nu \in \mathbb{N}$ frequencies is defined by an embedding

$$ \mathbb{T}^\nu \ni \varphi \mapsto U(\varphi, x) \in H^1_0(\mathbb{T},\mathbb{R}) \tag{1.6} $$

together with a frequency vector $\omega \in \mathbb{R}^\nu$ with rationally independent entries, such that $u(t,x) = U(\omega t, x)$ is a solution of (1.1) and $U(\varphi,x) \in H^p(\mathbb{T}^{\nu+1},\mathbb{R})$ for some p sufficiently large. Notice that, in a neighbourhood of u = 0, (1.1) can be seen as a perturbation of the linear equation (1.7), whose bounded solutions have the form

$$ v(t,x) = \sum_{j\in\mathbb{Z}} v_j\, e^{i(\lambda(j)t + jx)}, \qquad \lambda(j) := \frac{j(4+j^2)}{1+j^2} = j + \frac{3j}{1+j^2}, \qquad j \in \mathbb{Z}, \tag{1.8} $$

where $j \mapsto \lambda(j)$ is the linear dispersion law. It is easily seen that all solutions of (1.7) with compact Fourier support are periodic, but with a period that depends on the support. It is therefore natural to ask whether Eq. (1.1) has periodic or quasi-periodic solutions close to the small amplitude linear solutions (1.8). We remark that, since the finitely supported linear solutions (1.8) are all periodic, the existence of quasi-periodic solutions, if any, strongly relies on the presence of the quadratic nonlinearity in (1.1).
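The following minimal Python sketch is our own illustration, not code from the paper. It evaluates the dispersion law (1.8) in exact rational arithmetic and checks its two expressions. It also computes the common time period of a solution of (1.7) with finite Fourier support. Since every $\lambda(j)$ is rational, finitely many modes always share a period, and that period changes with the support. The helper names `dispersion` and `time_period_over_2pi` are hypothetical, and the support is assumed to exclude j = 0, consistent with the zero-mean phase space.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm  # math.lcm requires Python >= 3.9


def dispersion(j: int) -> Fraction:
    """Linear dispersion law lambda(j) = j(4 + j^2) / (1 + j^2) of (1.8), exactly."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def time_period_over_2pi(support: list[int]) -> Fraction:
    """Smallest T / (2*pi) such that every mode e^{i(lambda(j) t + j x)}, j in
    `support` (nonzero integers), is T-periodic in time.

    T is a period iff lambda(j) * T / (2*pi) is an integer for every j, so
    T / (2*pi) = 1 / g with g the gcd of the rationals |lambda(j)|, i.e.
    g = gcd(numerators) / lcm(denominators) for fractions in lowest terms.
    """
    lams = [abs(dispersion(j)) for j in support]
    num = reduce(gcd, (q.numerator for q in lams))
    den = reduce(lcm, (q.denominator for q in lams))
    return Fraction(den, num)


# lambda(j) = j + 3j/(1 + j^2): rational and asymptotically linear.
print(dispersion(1), dispersion(2), dispersion(3))  # 5/2 16/5 39/10
print(float(dispersion(1000) - 1000))  # approximately 3/1000
# The time period depends on the Fourier support of the solution.
print(time_period_over_2pi([1]), time_period_over_2pi([1, 2]))  # 2/5 10
```

For the support {1, 2}, for example, the common period is $20\pi$, whereas the single mode j = 1 has period $4\pi/5$.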
In the present paper we construct quasi-periodic solutions whose Fourier support is concentrated on $\nu \ge 2$ distinct tangential sites

$$ S^+ := \{j_1, \dots, j_\nu\}, \qquad S := S^+ \cup (-S^+), \qquad j_i \in \mathbb{N}\setminus\{0\}, \quad \forall\, i = 1, \dots, \nu, \tag{1.9} $$

where, without loss of generality, we shall always assume that $j_1 = \max_{i=1,\dots,\nu} j_i$. We denote by

$$ \omega := \Big( \frac{j_1(4+j_1^2)}{1+j_1^2}, \dots, \frac{j_\nu(4+j_\nu^2)}{1+j_\nu^2} \Big) \in \mathbb{Q}^\nu \tag{1.10} $$

the linear frequencies of oscillation associated with the tangential sites. More precisely, our solutions will have the form (1.11), where $o(\sqrt{|\xi|})$ is meant in the $H^s$-topology with s large. It is well known that "small divisor" problems arise when looking for quasi-periodic solutions. To overcome them, we shall require that $S^+$ satisfies a wave packet condition and that the unperturbed amplitudes ξ belong to an appropriate Cantor-like set of positive measure. The following definition quantifies the wave packet condition. Denoting by B(0, ) the ball centred at the origin of $\mathbb{R}^\nu$ of radius > 0, our result can be stated as follows.
Moreover we have the following stability result. Theorems 1, 2 are formulated in the typical style of results on reducible KAM tori for PDEs. For the proof we use the overall strategy of [4], which however has to be substantially developed to deal with (1.1). Let us briefly explain the main new issues.
• The dispersion law is asymptotically linear, as for the Klein–Gordon equation, studied for instance in [6,7]. As explained in those papers, the very weak dispersive effects (essentially, time and space play the same role) create a number of difficulties even in KAM theory for semi-linear PDEs. Of course, since (1.1) is quasi-linear, the strong perturbative effects of the nonlinearity cause additional serious difficulties.
• The DP equation is resonant at zero and does not depend on any external parameters. This is a fundamental difference with respect to the Klein–Gordon equation, where one modulates the mass in order to avoid resonances. Moreover, unlike the equations treated in previous KAM results for quasi-linear PDEs, the DP equation has non-trivial resonances already at order four (see Sect. 1.3). As a further difficulty, the algebraic structure of the resonances is quite complicated. To overcome these problems we rely on the presence of "many" (precisely eight) approximate constants of motion of (1.1), coming from the integrable structure of the DP equation. Dealing with the problems related to resonances is the core of this paper and requires a set of new ideas and a careful analysis.
• The very strong restriction on the tangential sites $S^+$ is used several times to simplify the problems arising from the rational and asymptotically linear dispersion law. Physically, we are looking for solutions whose Fourier support is concentrated on modes which are relatively close to each other. It seems reasonable that this condition could be weakened, but it is not clear to us how to deal with the technical difficulties that would arise.
• As in other resonant cases, the diophantine constant γ is related to the size of the solution one is looking for (see (1.11)). Moreover, due to the linear dispersion law, we are forced to impose very "weak" non-degeneracy conditions on the linear frequencies of oscillation. As a consequence, we need a refined bifurcation analysis in order to find a very good first approximate solution and fulfil the smallness conditions required for the Nash–Moser scheme.
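The second point above can be made concrete. The linear frequencies (1.10) are rational, so combinations of the form $\omega\cdot\ell + \lambda(j)$, which enter small divisor conditions of Melnikov type, can vanish exactly at normal sites $j \notin S$. The sketch below is our own illustration, with hypothetical helper names. It searches for such exact zeros for the example $S^+ = \{1, 2\}$. It looks only at the unperturbed linear frequencies, and the sign convention $\omega\cdot\ell + \lambda(j)$ is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product


def dispersion(j: int) -> Fraction:
    """lambda(j) = j(4 + j^2) / (1 + j^2), cf. (1.8)."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def exact_normal_resonances(tangential, max_l=3, max_j=10):
    """Pairs (l, j) with l a nonzero integer vector (|l_i| <= max_l) and a normal
    site j (j != 0, j not in S = S+ union -S+, |j| <= max_j) such that
        omega . l + lambda(j) == 0   exactly,
    where omega = (lambda(j_1), ..., lambda(j_nu)) are the frequencies (1.10)."""
    omega = [dispersion(s) for s in tangential]
    sites = set(tangential) | {-s for s in tangential}
    hits = []
    for l in product(range(-max_l, max_l + 1), repeat=len(tangential)):
        if not any(l):
            continue
        w = sum((li * oi for li, oi in zip(l, omega)), Fraction(0))
        for j in range(-max_j, max_j + 1):
            if j != 0 and j not in sites and w + dispersion(j) == 0:
                hits.append((l, j))
    return hits


hits = exact_normal_resonances((1, 2))
print(((1, -2), 3) in hits)  # True: 5/2 - 2*16/5 + 39/10 = 0
```

Such exact cancellations cannot be removed by shifting an external parameter. This is consistent with the remark that the required modulation has to come from the amplitudes.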
Some comments on Eq. (1.1) and on Theorems 1, 2 are in order.
The unperturbed DP equation. We look at (1.1) as a perturbation of the linear equation (1.7) in order to fit the typical perturbative setting of KAM theory for PDEs; see Sect. 1.1 for more details. Since the Degasperis–Procesi equation is completely integrable (see [28]), it would be very natural to try to construct solutions of (1.1) that bifurcate from quasi-periodic solutions of the unperturbed DP equation (1.14), which corresponds to (1.1) with f = 0. Indeed, near zero, (1.1) can also be seen as a perturbation of (1.14). Unfortunately, algebro-geometric finite-gap solutions of the DP equation have already been constructed in the literature (see [42)), but it is not clear to us whether they are real quasi-periodic solutions in the sense of (1.6). If one were able to bifurcate from finite-gap solutions of (1.14), then it would be possible to prove existence of large quasi-periodic solutions by requiring that f is small. Such a strategy has been followed successfully for the KdV and cubic NLS equations on the circle. For those equations one can prove the existence of Birkhoff coordinates [41,43] (the Cartesian version of action-angle variables). These coordinates trivialize the dynamics, in the sense that all solutions turn out to be periodic, quasi-periodic or almost periodic. They also provide a fundamental tool for investigating the dynamical consequences of small perturbative effects, even far from the origin; see [14].
For 1d integrable PDEs one would expect this to be the typical scenario at least in a neighborhood of zero, see [5,46]; however, as far as we know, up to now such results are available only for the KdV, the NLS and the Toda system. Theorem 1 provides, again as far as we know, the first existence result of quasi-periodic solutions, in the sense of (1.6), for (1.14).
It would be interesting to apply our KAM approach to the Camassa-Holm equation, which is a well-known integrable PDE with an asymptotically linear dispersion law, but with a different symplectic structure. Even though we have not performed the computations, we expect to be able to prove the equivalent of Theorems 1, 2 also for this equation. We remark that in this case, the finite gap solutions are known to be quasi-periodic tori, see [20].
One could start by comparing them with the solutions predicted by our method and then possibly develop KAM theory close to large finite gap solutions.
Approximate constants of motion of (1.1). Although we do not fully exploit the integrability of (1.14), it is fundamental for us that the (non-integrable) equation (1.1) has at least eight approximate constants of motion, up to an error of order $O(u^9)$. Interestingly, as shown in [29], no other equation with the same dispersion law and the same symplectic structure has eight approximate conserved quantities. This means that in (1.1) we cannot consider an arbitrary quadratic nonlinearity: we really need the DP structure.
The requirement that such approximate conserved quantities exist is not merely technical. In order to implement a Nash–Moser/KAM algorithm, one looks for a family of approximately invariant tori of (1.1), with a sufficiently good approximation, with the following properties. The dynamics on the tori is integrable and non-degenerate. The dynamics normal to the tori is non-degenerate at the linear level and satisfies the Melnikov conditions. If there are external parameters modulating the linear frequencies, then the linear solutions can serve as approximate solutions. Otherwise the modulation must come from the initial data, and this can hopefully be achieved by means of Birkhoff normal form (BNF); see for instance [4,39]. In our case the dispersion law in (1.8) takes rational values and is asymptotically linear, so such a procedure is very difficult. One has to explicitly compute some potentially dangerous resonant terms in the Hamiltonian and show that they vanish. Computations of the same type were done for water waves by Craig–Worfolk [27], who verify, by explicit computation, that the coefficients of the fourth order resonant interactions (the so-called Benjamin–Feir resonances) vanish. In our case we have to deal with higher order resonances (up to order eight), so this would be computationally extremely heavy. Instead, our approach uses the approximate constants of motion; this is explained in more detail in Sect. 1.3. Once the approximate invariant tori have been constructed, we have to impose the non-degeneracy and Melnikov conditions. Unlike in the KdV case, this is not possible for every choice of the tangential set, and this is where we use the condition $S^+ \in V(r)$; see Definition 1.1.
Linear stability. The linear stability result of Theorem 2 is, of course, relevant dynamical information in the study of evolutionary PDEs. It is also a consequence of a fundamental ingredient of our proof: the reducibility of the linearized equation at any quasi-periodic approximate solution. Reducibility for the Degasperis–Procesi equation linearized at a quasi-periodic function was obtained in [33], under appropriate diophantine conditions on the frequencies. Unfortunately, because of the resonances, our case does not fit those hypotheses, and overcoming this difficulty will be a major point. Here we use that result (appropriately adapted) inside a nonlinear algorithm to prove the existence of quasi-periodic solutions, which is a classical feature of the KAM literature.
1.1. Some literature. Proving existence and stability of quasi-periodic solutions for PDEs close to an elliptic fixed point is a natural extension of the classical KAM theory for lower dimensional tori [51]. The first results in this direction concerned model PDEs on an interval, with no derivatives in the nonlinearity and with either Dirichlet [44,47,51,53] or periodic [16,19,26] boundary conditions. For extensions of KAM theory to higher spatial dimensions we mention [8,11,17,25,30,34,52]. KAM methods for constructing quasi-periodic solutions of PDEs on the circle with no derivatives in the nonlinearity are by now well established. Generalizing them to nonlinearities with derivatives is, however, far from trivial in general, even in the semi-linear case (where the derivatives in the nonlinearity are of lower order than those in the linear terms). We mention [45] for the KdV, [49] for the derivative NLS, and [6,7] for the derivative NLW. Recently, an innovative strategy for quasi-linear and fully nonlinear PDEs on the circle was proposed in [3,4]. This approach was first developed for the KdV equation, but it can be applied to many equations of interest in hydrodynamics, such as NLS [37,38], Kirchhoff [50], or directly the water waves equations [2,15]. These methods were first conceived for PDEs on the circle, so a very interesting question is their generalization to higher dimensions.
Equation (1.1) is a quasi-linear PDE on the circle and in our study we shall follow the general strategy of [4], extended and adapted to our case. Let us briefly explain the point of view of [4], referring also to [2] for more details.
1.2. The general strategy. We describe the strategy to prove existence and linear stability for small, reducible quasi-periodic solutions of completely resonant quasi-linear PDEs.
(i) The starting point is a Nash–Moser theorem of hypothetical conjugation, following [9]. The strategy is to construct a quadratically convergent sequence of families of approximately invariant (isotropic) tori. This construction is based on tame estimates for the inverse of the operator associated to Eq. (1.1), linearized at an approximate torus and restricted to the normal directions. These estimates are proved by exploiting the Hamiltonian structure and by exhibiting symplectic variables adapted to each approximate invariant torus, which essentially decouple the linearized dynamics. The bounds on the inverse are then achieved by removing all the "bad" values of the parameters. We also mention [24] for a parallel strategy that does not rely on the Hamiltonian structure.
(ii) To construct the sequence of item (i) we need a good starting point, i.e. a first family of approximately invariant tori parametrized by real vectors $\xi \in \mathbb{R}^\nu$.
As explained before, this is achieved by BNF techniques. In the quasi-linear context it is convenient to perform a weak BNF, i.e. to exhibit a change of variables, close to the identity up to a finite rank operator, with the following properties. The Hamiltonian H transforms into $H_{Birk} + R$, where R is a small remainder. Moreover, the Hamiltonian restricted to $U_S$ is integrable and non-degenerate, in the sense that the "frequency-to-amplitude" map is invertible.
To describe the dynamics in a neighborhood of $U_S$ more simply, it is convenient to introduce action-angle variables. This allows one to distinguish the tangential and normal dynamics with respect to the approximately invariant tori. We remark that, for semi-linear PDEs, one typically performs a stronger preliminary BNF step in order to "normalize" also the linearized dynamics normal to the torus, i.e. the terms in the Hamiltonian which are quadratic in the normal directions. In this case the Birkhoff map is close to the identity up to a bounded operator (at most one-smoothing); see for instance [47,51]. Compared to the latter approach, the weak procedure has the disadvantage that the normal form depends on the angles. On the other hand, we do not have to address well-posedness issues, since these changes of coordinates are time-one flow maps of an ODE. Note that the recent papers [10,35,36] directly study the full Birkhoff normal form for quasi-linear PDEs.
(iii) The third key point is to study the invertibility of the linearized operator restricted to the normal directions. Thanks to the very "mild" conjugation procedure of item (ii) (with a map = identity+finite rank) it turns out that such linear operator is pseudo differential (with non constant coefficients) up to a finite rank remainder. This is the most important reason for adopting the weak procedure described in (ii).
The invertibility of the linearized operator, with appropriate tame estimates, is based on a reducibility argument divided into two parts:
(a) a reduction-in-decreasing-order procedure. It conjugates the linearized operator to a pseudo differential one with constant coefficients, up to a bounded/regularizing remainder, i.e. one mapping $H^s(\mathbb{T},\mathbb{R})$ to $H^{s+\rho}(\mathbb{T},\mathbb{R})$ with $\rho \ge 0$. The choice of ρ depends, of course, on the problem under study;
(b) a quadratic KAM scheme (for bounded operators) which completely diagonalizes the bounded/smoothing remainder of the previous step.
We want to point out the following:
• Step (a) strongly relies on the pseudo differential structure of the operator;
• The normal form contains angle-dependent terms, and some of them turn out to be non-perturbative for the KAM scheme (b). Conjugating these terms to constant coefficients relies on purely algebraic arguments. We refer to this procedure as linear Birkhoff normal form;
• Because the weak and the linear Birkhoff procedures have been applied, the normal form around the approximately invariant tori has constant coefficients also in the normal directions. To perform the diagonalization of step (b), one needs the second Melnikov conditions, which essentially amount to requiring that the operator has simple eigenvalues with a lower bound on their differences. Once the operator has been diagonalized, the bounds on the inverse follow trivially from lower bounds on the eigenvalues, i.e. from the first Melnikov conditions.
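The role of the second Melnikov conditions in step (b) can be illustrated in finite dimensions. The following Python/NumPy sketch is a toy model of our own and not the scheme of the paper: it has no angle dependence φ, no parameter ξ and no infinite-dimensional or tame estimates. It diagonalizes a small matrix D + P by repeatedly solving a homological equation. The solution of that equation divides by the differences $d_j - d_k$ of the diagonal entries, which is exactly where a lower bound on eigenvalue differences is needed. The function name `kam_diagonalize` is hypothetical.

```python
import numpy as np


def kam_diagonalize(H, steps=6):
    """Toy quadratic KAM scheme for a finite matrix H = D + P, where D is the
    diagonal of H (simple, well separated entries) and P is off-diagonal and small.

    Each step solves the homological equation [D, S] = -P, i.e.
        S_jk = -P_jk / (d_j - d_k)  for j != k,
    and conjugates H <- (I + S)^{-1} H (I + S). The new off-diagonal part is
    of order |P|^2 / min|d_j - d_k|, so its size is roughly squared each step.
    Returns the final matrix and the max-norm of its off-diagonal part per step.
    """
    H = np.array(H, dtype=float)
    n = H.shape[0]
    history = []
    for _ in range(steps):
        d = np.diag(H).copy()
        P = H - np.diag(d)                  # off-diagonal part (zero diagonal)
        gaps = d[:, None] - d[None, :]      # d_j - d_k: the "second Melnikov" divisors
        np.fill_diagonal(gaps, 1.0)         # P has zero diagonal, so this value is unused
        S = -P / gaps
        Phi = np.eye(n) + S
        H = np.linalg.solve(Phi, H @ Phi)   # (I + S)^{-1} H (I + S)
        off = H - np.diag(np.diag(H))
        history.append(float(np.abs(off).max()))
    return H, history


# Example: diagonal 1, ..., 5 plus a small symmetric off-diagonal perturbation.
idx = np.arange(5)
H0 = 0.05 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
np.fill_diagonal(H0, idx + 1.0)
Hf, history = kam_diagonalize(H0)
print(history)               # rapidly decaying off-diagonal sizes
print(np.sort(np.diag(Hf)))  # agrees with np.linalg.eigvalsh(H0)
```

With small gaps $d_j - d_k$ the same iteration would lose control, which is the finite-dimensional shadow of the need for a lower bound on the differences.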
(iv) At each step of the scheme above we have removed some bad values of the parameters ξ where the Melnikov conditions do not hold. Hence the last (but not least) step is to prove that a positive measure set of parameters still remains at the end of the procedure. It is often more convenient to express these conditions in terms of the frequency of the quasi-periodic solution, which is possible thanks to the invertibility of the frequency-to-amplitude map.
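A classical toy computation illustrates the measure estimate of step (iv). It is our own and much simpler than the estimates of the paper. In dimension ν = 2, the sketch uses Monte Carlo sampling to estimate the fraction of frequencies $\omega \in [1,2]^2$ that violate a diophantine condition $|\omega\cdot\ell| \ge \gamma |\ell|^{-\tau}$ for finitely many ℓ. The domain, the choice τ = 2, the cut-off on ℓ and the name `bad_fraction` are illustrative assumptions. The only point is that the excluded set shrinks as γ → 0, so a positive measure set survives.

```python
import numpy as np


def bad_fraction(gamma, tau=2.0, max_l=10, n_samples=5000, seed=0):
    """Monte Carlo estimate of the fraction of omega in [1, 2]^2 that violate
        |omega . l| >= gamma / |l|_1^tau   for some l in Z^2, 0 < |l|_1 <= max_l.
    A fixed seed reuses the same sample points for every gamma."""
    rng = np.random.default_rng(seed)
    omega = rng.uniform(1.0, 2.0, size=(n_samples, 2))
    ls = np.array([(a, b)
                   for a in range(-max_l, max_l + 1)
                   for b in range(-max_l, max_l + 1)
                   if 0 < abs(a) + abs(b) <= max_l], dtype=float)
    thresholds = gamma / np.abs(ls).sum(axis=1) ** tau     # one threshold per l
    bad = (np.abs(omega @ ls.T) < thresholds).any(axis=1)   # one flag per sample
    return float(bad.mean())


for gamma in (1e-1, 1e-2, 1e-3):
    print(gamma, bad_fraction(gamma))  # the excluded fraction shrinks with gamma
```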

Main novelties and scheme of the proof.
We describe the structure of the paper following Sect. 1.2, and with particular attention to the main novelties. In Sect. 2 we introduce the Hamiltonian formalism for the DP equation and the functional spaces on which we shall work.
In Sect. 3 we perform the weak Birkhoff normal form explained in item (ii) of the previous section. The result is stated in Proposition 3.2. To reach a sufficiently good first approximate solution we need to perform 6 BNF steps. As is well known, at the n-th step of this procedure one has to take into account the denominators (recall (1.8))

$$ \lambda(j_1) + \dots + \lambda(j_{n+2}). \tag{1.15} $$

We say that an (n+2)-tuple of integer indices $(j_1, \dots, j_{n+2})$ is a resonance, and hence may appear in $H_{Birk}$, if (1.15) = 0 and the momentum condition $\sum_{i=1}^{n+2} j_i = 0$ holds. We say that a resonance is trivial if it has the form $(i, -i, j, -j, \dots)$, so that the corresponding monomial is integrable.
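Non-trivial resonances at order four (n = 2 in (1.15)) are easy to exhibit with exact arithmetic. The following brute-force sketch is our own illustration, with hypothetical helper names. It enumerates quadruples that satisfy the momentum condition and (1.15) = 0 for $|j_i| \le 6$, and separates trivial from non-trivial ones. For instance, (2, 2, −1, −3) is non-trivial. Writing $\lambda(j) = j + 3j/(1+j^2)$ and using the momentum condition, the resonance reduces to $2\cdot\tfrac{2}{5} - \tfrac{1}{2} - \tfrac{3}{10} = 0$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dispersion(j: int) -> Fraction:
    """lambda(j) = j(4 + j^2) / (1 + j^2), cf. (1.8)."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def is_trivial(js) -> bool:
    """True if the indices can be grouped into pairs (i, -i)."""
    counts = Counter(js)
    return all(counts[i] == counts[-i] for i in counts)


def resonances_order_four(max_j: int) -> set:
    """Multisets {j1, j2, j3, j4} of nonzero integers with |j_i| <= max_j,
    j1 + j2 + j3 + j4 = 0 (momentum) and lambda(j1) + ... + lambda(j4) = 0,
    i.e. the resonances of (1.15) with n = 2."""
    sites = [j for j in range(-max_j, max_j + 1) if j != 0]
    found = set()
    for j1, j2, j3 in product(sites, repeat=3):
        j4 = -(j1 + j2 + j3)
        if j4 == 0 or abs(j4) > max_j:
            continue
        if sum(dispersion(j) for j in (j1, j2, j3, j4)) == 0:
            found.add(tuple(sorted((j1, j2, j3, j4))))
    return found


res = resonances_order_four(6)
nontrivial = sorted(r for r in res if not is_trivial(r))
print(nontrivial)  # includes (-3, -1, 2, 2)
```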
As mentioned before, a major difficulty comes from the fact that the DP equation has many non-trivial resonances (already at order four), so in principle there is no reason why the Birkhoff Hamiltonian restricted to $U_S$ should be integrable. Since the Hamiltonian density f is of order $O(u^9)$, the perturbation does not affect the leading terms of the Birkhoff Hamiltonian, and we can exploit the integrability of the DP equation. Indeed, the same Birkhoff transformation should simultaneously normalize all the commuting Hamiltonians. Hence a resonant monomial contributes to $H_{Birk}$ if and only if it is resonant for all the constants of motion. This was proved in detail in [32] at the level of formal power series. Here we adapt this result to Eq. (1.1), which is only approximately integrable (close to the origin), and reformulate it in a way better suited to the weak Birkhoff normal form context; see Proposition 3.6.
Once we have shown that the $H_{Birk}$-dynamics restricted to $U_S$ is integrable, in Sect. 4 we prove that it is non-degenerate, i.e. that the frequency-to-amplitude map is a diffeomorphism. We have a very explicit description of this map. Hence this step amounts to proving that the matrix A in (4.6), which depends only on $S^+$, has determinant bounded away from zero (the so-called twist condition); see Lemma 4.1. A big difference with respect to [4] is that, in our case, the determinant of A is a rational function of several variables $j_i$ that could accumulate to zero as $|j_i| \to \infty$. By imposing the wave packet condition, we restrict the study of its asymptotic behaviour to regions in which it behaves like a function of one variable. We then use continuity arguments to guarantee the invertibility of A for every choice of $S^+ \in V(r)$ (see Definition 1.1), for r small enough. Outside $V(r)$, the proof of lower bounds for det A should rely on purely algebraic arguments rather than perturbative ones.
In Sect. 5 we introduce the Nash–Moser hypothetical conjugation theorem (see Theorem 5.4). In Sect. 6 we explain how to prove the invertibility of the linearized operator at an approximate solution by studying it only in the normal directions. Since there is no difference with respect to [4], we only give a synopsis.
In Sects. 7 and 7.3 we prove Theorems 7.1 and 7.13, which provide the reducibility of the linearized operator following item (iii) of Sect. 1.2. As already mentioned, in [33] we provide a reducibility result for the DP equation (1.1) linearized at sufficiently small quasi-periodic functions, under appropriate diophantine conditions on the frequencies. Unfortunately, in our case the diophantine constant γ is related to the size of the approximate solutions (see (5.3)), and hence the smallness and diophantine conditions above cannot be met simultaneously.
In [4] this issue appears only in step (b) of the strategy, where it is solved by the linear Birkhoff normal form method. A first difficulty in our case is that the problem also appears in step (a), so we first need to perform some preliminary steps (see Sect. 7.1). More precisely, we need changes of coordinates, preserving the pseudo differential structure, that conjugate the leading order of the linearized operator to a diagonal one plus a correction. This correction is unbounded but perturbative in the sense of [33]. In these steps the changes of coordinates are similar in structure to those of step (a). However, they are shown to be well defined by algebraic computations involving the Birkhoff resonances (see Lemma A.1) rather than by perturbative arguments. These difficulties also appear for the quasi-linear generalized KdV [39], but here we face several further problems due to the complexity of the symplectic structure of the DP equation. The first step, removing the terms of order ε, is straightforward. Already at the second step we encounter the difficulties arising from the presence of non-trivial resonances of order 4, and a priori there is no reason why the normal form should be integrable. Here it does not appear simple to apply the strategy of the weak BNF using the constants of motion. On the other hand, computing the normal form explicitly by hand, as done in [39], is unmanageable. To bypass this problem we take a different point of view, based on an a posteriori identification argument for normal forms. More precisely, in Theorem 7.9 we prove that the normal form obtained after the weak BNF, the preliminary steps and the linear BNF coincides with the one that we would obtain by performing the full formal BNF and then projecting onto the terms quadratic in the normal variables. This result strongly relies on the fact that all the resonances contributing to the formal normal form are trivial.
A similar identification argument has been used, for instance, in [12,13].
A further point is that, due to the rational dispersion law λ(j), a denominator in the linear BNF may be non-zero but still uncontrollably small. To deal with this problem in the third step, we need to include in the unperturbed Hamiltonian also the integrable terms of order $\varepsilon^2$ coming from the previous steps of linear BNF. For this reason, it is important to know the exact expression of the leading order correction to the eigenvalues produced by the perturbation; see for instance (5.5). This is also needed in the KAM scheme (b), in order to impose the second Melnikov conditions. Computing these corrections by hand would be very difficult, but they come for free from Theorem 7.9.
In the first part of Sect. 8 we show the convergence of the Nash–Moser algorithm (see Theorem 8.1). This requires the ratio between the size of $R = H - H_{Birk}$ and $\gamma^{7/2}$ to be small (see the smallness condition (8.5)). In the second part we prove that the set of "bad" parameters, i.e. the frequencies which do not meet the first and second Melnikov conditions, has small measure (see (8.25)). Note that these sets are indexed by three parameters ℓ, j, k.
In Lemma 8.4 we estimate the measure of a single bad set. Here we use the algebraic arguments provided by Lemma A.1, which guarantee the non-degeneracy of the leading terms of the small divisors. In Sect. 8.1.2 we deal with the summability of the bad sets in j, k for fixed ℓ.
The key difficulty is that the spectral gap λ(j) − λ(k) is asymptotically constant, so the eigenvalues have a bad separation property. The same occurs for the wave equation [6,7]. Because the spectral gap is asymptotically constant, there are infinitely many such bad sets. The key ingredient is then to show that, for j, k sufficiently large, the second Melnikov conditions are implied by the first ones. This is possible provided that we use two different diophantine constants. More precisely, we impose the second Melnikov conditions with $\gamma^{3/2}$ (see (8.6)), which is much smaller than γ. This is why we have to perform many steps of Birkhoff normal form to obtain a very good first approximate solution.
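The bad separation of the eigenvalues can be checked directly from (1.8). Since $\lambda(j) - \lambda(k) = (j - k) + \frac{3j}{1+j^2} - \frac{3k}{1+k^2}$, the differences accumulate at the integers j − k instead of growing with j and k. The short exact-arithmetic sketch below is our own, with hypothetical helper names. It uses only the unperturbed λ(j), not the corrected eigenvalues of the paper.

```python
from fractions import Fraction


def dispersion(j: int) -> Fraction:
    """lambda(j) = j(4 + j^2) / (1 + j^2), cf. (1.8)."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def gap_defect(j: int, k: int) -> Fraction:
    """(lambda(j) - lambda(k)) - (j - k) = 3j/(1 + j^2) - 3k/(1 + k^2)."""
    return dispersion(j) - dispersion(k) - (j - k)


for n in (10, 100, 1000):
    # Neighbouring eigenvalues: the gap lambda(n+1) - lambda(n) tends to 1.
    print(n, float(dispersion(n + 1) - dispersion(n)), float(gap_defect(n + 1, n)))
```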
We point out that, unlike in [2], our Melnikov conditions do not imply a loss of regularity in space. In [2] this loss is acceptable, since in the regularization step (step (a) of Sect. 1.2) the diagonalization is performed up to a very smoothing remainder. In that procedure it is fundamental that the diophantine constant γ is independent of the size of the solution. In our case this is not true, so in the regularization step we end up with a remainder of order −1. Then, in the measure estimates, we make an extra effort to prove the second Melnikov conditions without loss of regularity.

Functional Setting
Hamiltonian formalism of the Degasperis–Procesi equation. For any u, v in the space we define the non-degenerate symplectic form where J is defined in (1.4) and $(\cdot,\cdot)_{L^2}$ is the $L^2(\mathbb{T},\mathbb{R})$ scalar product. To any $C^1$ function $H : H^1_0(\mathbb{T}) \to \mathbb{R}$ we associate a vector field $X_H$ by requiring The Hamiltonian vector field $X_H$ is uniquely determined since the symplectic form in (2.1) is non-degenerate; in particular $X_H(u) = J\nabla H(u)$. The Poisson bracket between two $C^1$ functions F, G : (2.2) In this way where $\mathrm{ad}^0_G := \mathrm{I}$ is the identity map.

Functional space. We consider functions $u(\varphi, x)$ defined on $\mathbb{T}^\nu \times \mathbb{T}$. Passing to the Fourier representation (2.6) We define the scale of Sobolev spaces where $\langle \ell, j \rangle := \max\{1, |\ell|, |j|\}$ and $|\ell| := \sum_{i=1}^{\nu} |\ell_i|$. We shall work on the phase space $H^s \cap H^1_0(\mathbb{T},\mathbb{R})$. We denote by $B_r(0, X)$ the ball of radius r centered at the origin of a Banach space X. where we denoted by [r] the integer part of $r \in \mathbb{R}$.
Linear operators. Let $A : \mathbb{T}^\nu \to \mathcal{L}(L^2(\mathbb{T},\mathbb{R}))$, $\varphi \mapsto A(\varphi)$, be a φ-dependent family of linear operators acting on $L^2(\mathbb{T},\mathbb{R})$. We consider A as an operator acting on $H^s(\mathbb{T}^{\nu+1},\mathbb{R})$ by setting This action is represented in Fourier coordinates as (2.11) Conversely, given a Toeplitz-in-time operator A, namely one whose matrix coefficients (with respect to the Fourier basis in φ, x) satisfy we can associate with it a time-dependent family of operators acting on $H^s(\mathbb{T})$ by setting For $m = 1, \dots, \nu$ we define the operators $\partial_{\varphi_m} A(\varphi)$ as We say that A is a real operator if it maps real-valued functions into real-valued functions. For the matrix coefficients this means that

Hamiltonian linear operators. In this paper we shall deal with operators which are Hamiltonian according to the following definition.

Notation. We write $A \lesssim B$ to denote $A \le CB$, where C is a positive constant possibly depending on fixed parameters given by the problem. We write $A \lesssim_y B$ to denote $A \le C(y)B$ when we wish to highlight the dependence of the constant $C(y) > 0$ on the variable y.

Linear tame operators. Here we rigorously introduce the spaces and the classes of operators we work with.

Modulo-tame operators and majorant norms
The modulo-tame operators are introduced in Sect. 2.2 of [15]. Note that we are interested only in the Lipschitz variation of the operators with respect to the parameters of the problem, whereas in [15] the authors also need to control higher order derivatives. We have a partial ordering relation in the set of infinite dimensional matrices, i.e. if Since we are working with a majorant norm, the projections onto monomial subspaces are continuous; in particular, we define the following functor acting on the matrices Finally we define for $b_0 \in \mathbb{N}$ In the sequel let $1 > \gamma > \gamma^{3/2} > 0$ be fixed constants.
When the index σ is not relevant we write M In the following we shall systematically use −1 modulo-tame operators. We refer the reader to the "Appendix" of [33] for the properties of tame and modulo-tame operators.

Pseudo differential operators. Following [15] we give the following definitions. where a(x, j), called the symbol of A, is the restriction to $\mathbb{T}\times\mathbb{Z}$ of a complex valued function a(x, y) which is $C^\infty$ smooth on $\mathbb{T}\times\mathbb{R}$, 2π-periodic in x and satisfies We denote by $A[\cdot] = \mathrm{Op}(a)[\cdot]$ the pseudo differential operator with symbol $a := a(x, j)$. We call $OPS^m$ the class of pseudo differential operators of order less than or equal to m, and we set $OPS^{-\infty} := \bigcap_m OPS^m$. We define the class $S^m$ as the set of symbols which satisfy (2.22).
We will mainly consider operators acting on $H^s(\mathbb{T},\mathbb{R})$ with a quasi-periodic time dependence. For pseudo differential operators this corresponds to considering symbols $a(\varphi, x, y)$ with $\varphi \in \mathbb{T}^\nu$. These operators can clearly be thought of as acting on functions $u(\varphi,x) = \sum_{j\in\mathbb{Z}} u_j(\varphi)e^{ijx}$ in $H^s(\mathbb{T}^{\nu+1},\mathbb{R})$ in the following sense: The symbol $a(\varphi, x, y)$ is $C^\infty$ smooth also in the variable φ. We still denote $A := A(\varphi) = \mathrm{Op}(a(\varphi,\cdot)) = \mathrm{Op}(a)$. We will also use the notation $|a|_{m,s,\alpha} := |A|_{m,s,\alpha}$.
Note that the norm $|\cdot|_{m,s,\alpha}$ is non-decreasing in $s$ and $\alpha$. Moreover, given a symbol $a(\varphi,x)$ independent of $y$, the norm of the associated multiplication operator $\mathrm{Op}(a)$ is just the $H^s$ norm of the function $a$. If instead the symbol $a(y)$ depends only on $y$, the norm of the corresponding Fourier multiplier $\mathrm{Op}(a(y))$ is controlled by a constant. Let $A=\mathrm{Op}(a(\omega,\varphi,x,y))\in OPS^m$ be a family of pseudo differential operators with symbols $a(\omega,\varphi,x,y)\in S^m$, depending in a Lipschitz way on a parameter $\omega\in\mathcal{O}\subset\mathbb{R}^\nu$. As in formula (2.10), we define its weighted Lipschitz norm as in (2.24). For the properties of composition and adjointness of pseudo differential operators, and for quantitative estimates of their action on the Sobolev spaces $H^s$, we refer to Appendix B of [33].

Weak Birkhoff Normal Form
The aim of this section is to construct a $\xi$-parameter family of approximately invariant, finite-dimensional tori supporting quasi-periodic motions with frequency $\omega(\xi)$. We will require the map $\xi\mapsto\omega(\xi)$ to be a diffeomorphism. These approximate solutions will be the starting point of the Nash–Moser algorithm. In order to state the main result of this section we need some preliminary definitions. We write the DP Hamiltonian in (1.4) as in (3.1).

Recall $S$ in (1.9) and define $S^c:=\mathbb{Z}\setminus(S\cup\{0\})$. We decompose the phase space as $H^1_0(\mathbb{T})=H_S\oplus H_S^\perp$, where $H_S:=\{v=\sum_{j\in S}v_je^{ijx}\}$ and $H_S^\perp:=\{z=\sum_{j\in S^c}z_je^{ijx}\}$ (3.2). We denote by $\Pi_S$, $\Pi_S^\perp$ the corresponding orthogonal projectors. The subspaces $H_S$ and $H_S^\perp$ are symplectically orthogonal with respect to the 2-form in (2.1). We write $u=v+z$ with $v:=\Pi_Su$ and $z:=\Pi_S^\perp u$. For a finite-dimensional space $E$, let $\Pi_E$ denote the corresponding $L^2$-projector onto $E$. The notation $R(v^{k-q}z^q)$ indicates a homogeneous polynomial of degree $k$ in $(v,z)$, of degree $k-q$ in $v$ and of degree $q$ in $z$.

We now start the "weak" Birkhoff normal form procedure. That is, we look for a change of coordinates which normalizes the terms in (3.1) that are independent of, or linear in, the normal variable $z$.
As is well known, one of the main problems in Birkhoff normal form procedures is dealing with the resonances. These are the solutions of the equation obtained by setting the expression in (1.15) equal to zero, and they arise from the kernel of the adjoint action $\mathrm{ad}_{H^{(2)}}$ (see (2.4)). It turns out that when $n\ge2$ there are many non-trivial solutions of these equations. One way to deal with this problem is to exploit the integrability of the DP equation. In [32] the authors construct infinitely many conserved quantities $K_n$ for Eq. (1.1) with $f=0$, starting from those given in [28]. By an explicit characterization of the quadratic part of each $K_n$, they deduce that, at a purely formal level, the Birkhoff normal form of the Degasperis–Procesi equation is action preserving (i.e. integrable). Here we rename these constants of motion as in (3.4), writing only their quadratic parts, which are fundamental for the study of the Birkhoff resonances at $u=0$. We remark that $K_1$ is the momentum Hamiltonian arising from the translation invariance of the equation.
Definition 3.1. Given a quadratic diagonal Hamiltonian $Q(u)=\sum_j l(j)|u_j|^2$, we define $\Pi_{\mathrm{Ker}(Q)}$ as the projection onto the kernel of the adjoint action $\mathrm{ad}_Q$ (recall (2.2)). We define the projector onto the range of the adjoint action as $\Pi_{\mathrm{Rg}(Q)}:=\mathrm{I}-\Pi_{\mathrm{Ker}(Q)}$.
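The splitting $\Pi_{\mathrm{Ker}}+\Pi_{\mathrm{Rg}}=\mathrm{I}$ can be illustrated on Hamiltonians stored as dictionaries {index tuple: coefficient}, one entry per monomial $u_{j_1}\cdots u_{j_n}$. This sketch takes $Q=H^{(2)}$ and rests on two assumptions of ours that the text does not state here:

- the linear frequencies are $\lambda(j)=j(4+j^2)/(1+j^2)$, which is how we read (1.8);
- $\mathrm{ad}_{H^{(2)}}$ acts on each monomial with an eigenvalue proportional to $\lambda(j_1)+\dots+\lambda(j_n)$.

Exact rational arithmetic avoids spurious small divisors.

```python
from fractions import Fraction


def lam(j):
    """Assumed linear DP frequency lambda(j) = j (4 + j^2) / (1 + j^2)."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def split_kernel_range(hamiltonian):
    """Split {(j_1, ..., j_n): coeff} into its Pi_Ker and Pi_Rg parts.

    Assumption: u_{j_1} ... u_{j_n} lies in Ker(ad_{H^(2)}) exactly when
    lambda(j_1) + ... + lambda(j_n) == 0.
    """
    kernel, rng = {}, {}
    for idx, coeff in hamiltonian.items():
        part = kernel if sum(lam(j) for j in idx) == 0 else rng
        part[idx] = coeff
    return kernel, rng


H = {(1, -1): 1, (1, 1, -2): 3, (2, -1, -1): 3, (1, -1, 2, -2): 5}
kernel, rng = split_kernel_range(H)
```

Each monomial lands in exactly one of the two parts, mirroring $\Pi_{\mathrm{Rg}(Q)}:=\mathrm{I}-\Pi_{\mathrm{Ker}(Q)}$.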
We say that $K$, as in (3.6), "preserves" momentum if and only if its Fourier coefficients $K_{j_1,\dots,j_n}$ vanish unless $j_1+\dots+j_n=0$. The main result of this section is the following.

Proposition 3.2.
There exist $r>0$, depending on $S$ (see (1.9)), and an analytic symplectic change of coordinates $\mathcal{B}$ with the following properties. $\mathcal{B}$ is of the form identity plus a finite-rank operator and is defined in terms of a finite-dimensional space $E$ as in (3.3). Under $\mathcal{B}$, the Hamiltonian $H$ in (3.1) transforms into the Hamiltonian (3.7), where the terms $H^{(k,0)}=\Pi_{\mathrm{Ker}(H^{(2)})}H^{(k,0)}$, $k=4,6,8$, depend only on $|u_j|^2$. The same change of variables $\mathcal{B}$ puts all the Hamiltonians in (3.4) in weak Birkhoff normal form up to order eight, as in (3.8). In particular we have (3.9). In order to prove Proposition 3.2 we need some preliminary results, which are proved in detail in [32].
We recall the quadratic parts of $H$ and $K_r$, $2\le r\le M$, in (3.4). We say that an $n$-tuple …

Proof. Since this proposition is proved in [32] with different notation, for completeness we give here a concise proof by induction on $M$. For $M=3$ the thesis follows trivially. Indeed, direct computations show that, up to permutations, …, and this solution is incompatible with $j_i\in\mathbb{Z}\setminus\{0\}$.
Let us now suppose that the thesis holds up to $M-1\ge3$ and prove it for $M$. We start by noticing that if $n<M$ then (3.10) with … . Without loss of generality we assume that $n=M$ and that $j_{i_1}+j_{i_2}\neq0$ for any $i_1\neq i_2$. Up to a permutation, we can assume that for some $M\ge k\ge1$ and $\alpha_1,\dots,\alpha_k\ge1$ one has … . Then we can extract $k$ equations from these and write them in the form (3.12). The determinant of the Vandermonde matrix in (3.12) is $\prod_{i<h}(j_i^2-j_h^2)\neq0$ since, by hypothesis, $j_i\neq\pm j_h$. Hence the only possible solution corresponds to $j_i=0$ for all $i$, which is incompatible with $j_i\in\mathbb{Z}\setminus\{0\}$.

Remark 3.5. The Hamiltonian vector field generated by the finitely supported Hamiltonian $F^{(N,\le1)}$ in (3.13) is finite rank. In particular, it vanishes outside the finite-dimensional subspace $E:=E_{(N-1)j_1}$ (see (3.3)). Therefore its flow $\Phi^{(N)}$ is analytic and invertible on the phase space $H^1_0(\mathbb{T})$, provided that $|\Pi_Eu|$ is sufficiently small.
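The only algebraic input in the inductive step above is the following fact. A Vandermonde matrix built on the squares $j_i^2$ is invertible exactly when these squares are pairwise distinct, i.e. when $j_i\neq\pm j_h$. We assume, as our reading of (3.12) (which is not reproduced here), that its rows are successive powers of $j_i^2$, possibly up to nonzero factors $j_i$. The sketch below checks the classical determinant formula with exact arithmetic; the function names are ours.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def vandermonde_in_squares(js):
    """Matrix with row r (r = 0, ..., k-1) equal to ((j_1^2)^r, ..., (j_k^2)^r)."""
    return [[(j * j) ** r for j in js] for r in range(len(js))]


def product_formula(js):
    """prod_{i<h} (j_h^2 - j_i^2), the classical Vandermonde determinant."""
    p = 1
    for a, b in combinations(js, 2):
        p *= b * b - a * a
    return p
```

For instance, $j=(1,2,3)$ gives the determinant $3\cdot8\cdot5=120$. Any pair with $j_i=-j_h$ makes two columns coincide, so the determinant vanishes.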
In order to prove Proposition 3.2 we need the following result.
Proposition 3.6. … where $E$ is a finite-dimensional space as in (3.3), such that …

Proof. The terms of degree at most 2 in the variable $z$ are not affected by the procedure described below. We prove the result by induction on the number of steps $N$. For $N=0$ it is trivial, since $\Phi_0$ is the identity map.
Suppose that we have performed $N$ steps. Since $\{H,K_r\}=0$, … . For the latter, we are interested in the corresponding equations for the terms of homogeneity at most $N+3$ and of degree at most one in the variable $z$, so we consider the corresponding projection … . We note the following fact, which follows from the Jacobi identity: if $f\in\mathrm{Ker}(H^{(2)})$ … ; see (3.16), (3.17).

In order to obtain the Birkhoff normal form at order $N+3$, we consider a Birkhoff transformation $\Phi_{F^{(N+3,\le1)}}$ with generator $F^{(N+3,\le1)}$ of the form (3.13), with $N$ replaced by $N+3$. We define $\Phi_{N+1}:=\Phi_{F^{(N+3,\le1)}}\circ\Phi_N$. By Remark 3.5, the flow $\Phi_{F^{(N+3,\le1)}}$ is well defined in a sufficiently small ball and has the form identity plus a finite-rank operator. Note that $F^{(N+3,\le1)}$ is Fourier supported on indices $(j_1,\dots,j_{N+3})$ with $j_1+\dots+j_{N+3}=0$. Hence the Hamiltonian $K_1$ commutes with $F^{(N+3,\le1)}$, and, by the inductive hypothesis, $F^{(N+3,\le1)}$ is chosen so as to solve the homological equation for $H$. We now show that $F^{(N+3,\le1)}$ also solves the homological equation for the commuting Hamiltonians $K_r$: by (3.16), (3.17) we get … $=0$, and we define $Z$ accordingly.

We do not compute explicitly the radius $r$ of the ball in which the Birkhoff change of variables can be performed. However, one can easily check that $r\to0$ as $N\to\infty$ or as $r\to0$ in Definition 1.9.
Proof of Proposition 3.2. We apply Proposition 3.6 with $N=6$ and $M=8$, and we obtain (3.7), (3.8) by setting $\mathcal{B}:=\Phi_N^{-1}$. To prove (3.9) we have to carry out explicitly the first steps of the Birkhoff normal form.

First we remove from the Hamiltonian the cubic terms that are independent of $z$ or linear in $z$, see (3.18). We consider $\Phi_1:=(\Phi^t_{F^{(3,\le1)}})|_{t=1}$, the time-1 flow map generated by the Hamiltonian vector field $X_{F^{(3,\le1)}}$. Here $F^{(3,\le1)}$ is an auxiliary Hamiltonian of the form (3.13) with $N=3$. In the transformed Hamiltonian $H\circ\Phi_1$, the term $H_1^{(\ge5)}$ collects all the terms of order at least five in $(v,z)$. We choose $F^{(3,\le1)}$ such that the homological equation (3.20) holds. Recalling (2.2) and (3.18), the solution of Eq. (3.20) is given by $F^{(3,\le1)}$ as in (3.13) with $N=3$, with explicitly computable coefficients.

In the second step we normalize the terms of total degree 4 and of degree at most 1 in the variable $z$.
The remaining steps of this procedure do not affect the terms with degree of homogeneity less than or equal to 4. Hence we obtain (3.9) by combining (3.23), the fact that $\lambda(-j)=-\lambda(j)$ (see (1.8)), and the symmetry of $S$.
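One can check that the cubic terms removed in the first step create no small divisors, by computing the divisors of the homological equation (3.20) in closed form. This rests on two assumptions of ours: $\lambda(j)=j(4+j^2)/(1+j^2)$ (our reading of (1.8)), and that these divisors are $\lambda(j_1)+\lambda(j_2)+\lambda(j_3)$ on triples with $j_1+j_2+j_3=0$. Writing $\lambda(j)=j+3j/(1+j^2)$, a short computation gives

$$\lambda(j_1)+\lambda(j_2)+\lambda(j_3)=-\frac{3\,j_1j_2j_3\,(6+j_1^2+j_2^2+j_3^2)}{2(1+j_1^2)(1+j_2^2)(1+j_3^2)},$$

which never vanishes for $j_i\in\mathbb{Z}\setminus\{0\}$. The Python sketch below verifies the identity exactly on a box of triples; the function names are ours.

```python
from fractions import Fraction


def lam(j):
    """Assumed linear frequency lambda(j) = j (4 + j^2) / (1 + j^2)."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def cubic_divisor(j1, j2, j3):
    """lambda(j1) + lambda(j2) + lambda(j3) on a zero-momentum triple."""
    assert j1 + j2 + j3 == 0
    return lam(j1) + lam(j2) + lam(j3)


def closed_form(j1, j2, j3):
    """-3 j1 j2 j3 (6 + j1^2 + j2^2 + j3^2) / (2 prod_i (1 + j_i^2))."""
    num = -3 * j1 * j2 * j3 * (6 + j1 * j1 + j2 * j2 + j3 * j3)
    den = 2 * (1 + j1 * j1) * (1 + j2 * j2) * (1 + j3 * j3)
    return Fraction(num, den)


# All zero-momentum triples of nonzero indices with |j1|, |j2| <= N.
N = 30
triples = [(a, b, -a - b) for a in range(-N, N + 1) for b in range(-N, N + 1)
           if a != 0 and b != 0 and a + b != 0]
```

For example, $(1,1,-2)$ gives $5/2+5/2-16/5=9/5$.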

Action-Angle Variables
On the submanifold $\{z=0\}$ we introduce action-angle variables $(\theta,I)$. Note that this change of coordinates is real if and only if $I_{-j}=I_j$ and $\theta_{-j}=-\theta_j$. The symplectic form in (2.1), restricted to the subspace $H_S$, transforms into the corresponding 2-form in the variables $(\theta,I)$. The Hamiltonian $H^{(\le8)}(\theta,I,0)=\sum_{j\in S^+}I_j+H^{(4,0)}(I)+H^{(6,0)}(I)+H^{(8,0)}(I)$ depends only on the actions $I$. Its equations of motion are given in (4.2), where, by (3.9), … . In order to highlight the fact that we are working close to zero, we introduce a small parameter $\varepsilon>0$ and rescale $I\to\varepsilon^2I$. Then the frequency-amplitude map can be written as (4.5), where $\bar\omega$ is the vector of the linear frequencies (see (1.10)) and $A$ is the matrix in (4.6).

The submanifold $\{z=0\}$ is foliated by tori, parameterized by the actions, supporting small-amplitude quasi-periodic solutions of the truncated system with Hamiltonian $H^{(\le8)}$. We shall select some of them as the starting point of the Nash–Moser scheme by fixing $I=\xi$, where $\xi$ is a parameter, so that appropriate non-resonance conditions on the frequency $\alpha(I)$ hold. In order to work in a small neighbourhood of the prescribed torus $\{I\equiv\xi\}$, it is convenient to introduce a set of coordinates $(\theta,y,z)\in\mathbb{T}^\nu\times\mathbb{R}^\nu\times H_S^\perp$ adapted to it, defined in (4.7). The parameter $b$ will be chosen close to one; to this purpose we express $b$ in terms of a parameter $a>0$, which we fix sufficiently small. For the tangential sites $S^+:=\{j_1,\dots,j_\nu\}$ we also write $\theta_{j_i}=\theta_i$. Up to a rescaling of time, the symplectic 2-form in (2.1) becomes (4.9), where $\Omega_{S^\perp}$ is the symplectic form in (2.1) restricted to the subspace $H_S^\perp$ in (3.2). The Hamiltonian system generated by $H$ in (3.8) becomes (4.10).

In the following lemma we prove that, under an appropriate choice of the tangential set (1.9), the map (4.5) is a diffeomorphism for $\varepsilon$ small enough. Hence the system (4.2) is integrable and non-isochronous.
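The reality condition $I_{-j}=I_j$, $\theta_{-j}=-\theta_j$ can be checked numerically. For illustration only, this sketch assumes that the tangential component is $v(x)=\sum_{j\in S}\sqrt{I_j}\,e^{i\theta_j}e^{ijx}$. The normalization actually used in the text depends on the symplectic form (2.1) and may include extra positive weights, which do not affect reality. The tangential set $S^+=\{1,3\}$ and all numerical values are hypothetical.

```python
import cmath
import math


def tangential_profile(actions, angles, x):
    """v(x) = sum over S = S+ U (-S+) of sqrt(I_j) e^{i theta_j} e^{ijx}.

    Illustrative normalization (assumption). `actions` and `angles` are
    indexed by S+ only; the modes -j are filled in through the reality
    symmetries I_{-j} = I_j and theta_{-j} = -theta_j.
    """
    v = 0j
    for j, action in actions.items():
        amp = math.sqrt(action)
        v += amp * cmath.exp(1j * (angles[j] + j * x))   # mode +j
        v += amp * cmath.exp(1j * (-angles[j] - j * x))  # mode -j
    return v


actions = {1: 0.04, 3: 0.01}  # hypothetical actions on S+ = {1, 3}
angles = {1: 0.3, 3: -1.2}
```

The two modes $\pm j$ combine into $2\sqrt{I_j}\cos(\theta_j+jx)$, so the imaginary part cancels.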
As a consequence of the non-degeneracy condition in Lemma 4.1, the map in (4.5) is invertible, and we denote its inverse by $\omega\mapsto\xi(\omega)$ (4.11).
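The invertibility statement can be illustrated on a toy two-site model of (4.5), $\omega=\bar\omega+\varepsilon^2A\xi+\varepsilon^4q(\xi)$, inverted by a fixed-point iteration. The matrix $A$ and the correction $q$ below are invented for illustration and are not those of (4.6). Only $\bar\omega=(\lambda(1),\lambda(2))=(5/2,16/5)$ uses linear frequencies, again assuming $\lambda(j)=j(4+j^2)/(1+j^2)$. The sketch shows how invertibility of $A$ alone, for $\varepsilon$ small, yields $\xi(\omega)$ with $\xi=O(\varepsilon^{-2}|\omega-\bar\omega|)$.

```python
def solve2(A, b):
    """Solve the 2x2 system A x = b by Cramer's rule (A given by rows)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("A is singular: the twist condition fails")
    return ((b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det)


def frequency_map(xi, omega_bar, A, eps, quad):
    """Toy frequency-amplitude map omega = omega_bar + eps^2 A xi + eps^4 q(xi)."""
    q = quad(xi)
    return tuple(omega_bar[i] + eps**2 * (A[i][0] * xi[0] + A[i][1] * xi[1]) + eps**4 * q[i]
                 for i in range(2))


def invert_frequency_map(omega, omega_bar, A, eps, quad, iters=60):
    """Iterate xi = A^{-1}((omega - omega_bar)/eps^2 - eps^2 q(xi)); a contraction for small eps."""
    target = [(omega[i] - omega_bar[i]) / eps**2 for i in range(2)]
    xi = solve2(A, target)
    for _ in range(iters):
        q = quad(xi)
        xi = solve2(A, [target[i] - eps**2 * q[i] for i in range(2)])
    return xi


omega_bar = (2.5, 3.2)         # (lambda(1), lambda(2)) under the assumed lambda
A = ((1.0, 0.5), (0.5, 2.0))   # hypothetical non-degenerate twist matrix
eps = 0.1


def quad(xi):
    """Hypothetical higher-order correction q(xi)."""
    return (xi[0] * xi[1], xi[0] ** 2)
```

A round trip `invert_frequency_map(frequency_map(xi, ...), ...)` recovers `xi` up to rounding.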

The Nonlinear Functional Setting
We write the Hamiltonian in (4.10), possibly after eliminating constant terms depending only on $\xi$ (which are irrelevant for the dynamics), as $H_\varepsilon=N+P$ (5.1). Here $N$ describes the linear dynamics normal to the torus, and $P:=H_\varepsilon-N$ collects the nonlinear perturbative effects. Note that both $N$ and $P$ depend on $\omega$ through the map $\omega\mapsto\xi(\omega)$.
We consider $H_\varepsilon$ as an $(\omega,\varepsilon)$-parameter family of Hamiltonians. For $P=0$, $H_\varepsilon$ possesses an invariant torus at the origin with frequency $\omega$, which we want to continue to an invariant torus of the full system. We select the frequency parameters from the set $\Omega_\varepsilon$ (recall (4.11)). Then we define the non-resonant sets $\mathcal{G}_0^{(0)}$, $\mathcal{G}_0^{(1)}$ and $\mathcal{G}_0$ … for some constant $C$ depending on $S$, where $A$ is defined in (4.6) and $l_j$ in (5.6).

Proof. The proof is postponed to Appendix A.
The lower bound in $\mathcal{G}_0^{(0)}$ is typical of KAM schemes. The lower bound in $\mathcal{G}_0^{(1)}$ involves resonances of order five with two normal modes. As explained in the introduction, in order to impose such lower bounds we also need to take into account the corrections of order $\varepsilon^2$. The matrix $A$ comes from the weak BNF of Sect. 3. The terms $l_j$ come from the linear BNF procedure of Sect. 7.2; in particular, they are evaluated explicitly using the identification argument of Theorem 7.9.

Remark 5.3.
Note that the definition of $\gamma$ in (5.3) is slightly stronger than the minimal condition under which one can prove that $\mathcal{G}_0^{(0)}$ has large measure, namely $\gamma\le c\,\varepsilon^2$ with $c>0$ small enough. Our choice turns out to be useful for proving that the Cantor set of frequencies of the expected quasi-periodic solutions has asymptotically full measure as $\varepsilon\to0$.
We look for an embedded invariant torus of the Hamiltonian vector field $X_{H_\varepsilon}$ (see (5.1)) supporting quasi-periodic solutions with Diophantine frequency $\omega\in\mathcal{G}_0$. For technical reasons it is useful to consider a modified Hamiltonian $H_{\varepsilon,\zeta}$, depending on an auxiliary vector $\zeta\in\mathbb{R}^\nu$. More precisely, we introduce $\zeta$ in order to control the average of the $y$-component in our Nash–Moser scheme. The vector $\zeta$ has no dynamical consequences, since an invariant torus for the Hamiltonian vector field $X_{H_{\varepsilon,\zeta}}$ is actually invariant for $X_{H_\varepsilon}$ itself. Thus we look for zeros of the nonlinear operator (5.10), where $z\in H^s_{S^\perp}:=H^s\cap H_S^\perp$ (recall (3.2)) with norm defined in (2.7). With abuse of notation, we denote by $\|\cdot\|_s$ the Sobolev norms of functions in $H^s(\mathbb{T}^\nu,\mathbb{R}^\nu)$. From now on we fix $s_0:=[\nu/2]+4$.
Notice that in the coordinates (4.7), a quasi-periodic solution corresponds to an embedded invariant torus (5.8). Therefore we can reformulate the main Theorem 1 as follows.
We can deduce Theorem 1 from Theorem 5.4. Indeed, the quasi-periodic solution $u$ in (1.11) is recovered from the embedded torus given by Theorem 5.4, where $\omega(\xi)$ is the frequency-amplitude map (4.5). The rest of the paper is devoted to the proof of Theorem 5.4.

Tame estimates of the nonlinear vector field.
We give tame estimates for the composition operator induced by the Hamiltonian vector fields $X_N$ and $X_P$ in (5.10). The functions $y\mapsto\xi+\varepsilon^{2(b-1)}y$ and $\theta\mapsto e^{i\theta}$ are analytic for $\varepsilon$ small enough and $|y|\le C$. Hence classical composition results (see for instance Lemma 6.2 in [3]) imply that, for all $\mathfrak{I}$, … . In the following lemma we collect tame estimates for the Hamiltonian vector fields $X_N$, $X_P$, $X_{H_\varepsilon}$ (see (5.1)). These bounds rely on tame estimates for composition operators, and their proof is completely analogous to the one in Sect. 5 of [4].
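… and, for all $\imath:=(\Theta,y,z)$, … . In the sequel we will use the following consequence of the Diophantine condition (5.7). The operator $(\omega\cdot\partial_\varphi)^{-1}$ is defined for all functions $u$ with zero $\varphi$-average and satisfies $\|(\omega\cdot\partial_\varphi)^{-1}u\|^{\gamma,\mathcal{O}}_s\lesssim\gamma^{-1}\|u\|^{\gamma,\mathcal{O}}_{s+2\tau+1}$.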
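Both facts are easy to illustrate in Fourier variables. The Diophantine lower bound can only be tested on a finite box of $l\in\mathbb{Z}^\nu$, and $(\omega\cdot\partial_\varphi)^{-1}$ divides the $l$-th Fourier coefficient by $i\,\omega\cdot l$. This sketch assumes that the condition reads $|\omega\cdot l|\ge\gamma|l|^{-\tau}$ with $|l|=|l|_1$; the norm and constants in (5.7) may differ. The function names and the sample frequency $\omega=(1,\sqrt2)$ are ours.

```python
import itertools
import math


def is_diophantine_on_box(omega, gamma, tau, K):
    """Check |omega . l| >= gamma / |l|_1^tau for all l with 0 < max|l_i| <= K.

    Only a finite box is tested, so True is evidence, not a proof.
    """
    for l in itertools.product(range(-K, K + 1), repeat=len(omega)):
        if any(l):
            size = sum(abs(c) for c in l)
            if abs(sum(w * c for w, c in zip(omega, l))) < gamma / size**tau:
                return False
    return True


def solve_transport(f_coeffs, omega):
    """Solve (omega . d_phi) u = f in Fourier variables: u_l = f_l / (i omega . l).

    `f_coeffs` maps l in Z^nu to f_l. f must have zero phi-average, and the
    zero-average solution u is returned.
    """
    zero = tuple(0 for _ in omega)
    if f_coeffs.get(zero, 0) != 0:
        raise ValueError("f must have zero phi-average")
    return {l: c / (1j * sum(w * k for w, k in zip(omega, l)))
            for l, c in f_coeffs.items() if l != zero}


omega = (1.0, math.sqrt(2))  # sample frequency vector, nu = 2
```

For $\omega=(1,\sqrt2)$ the check passes, because $|l_1^2-2l_2^2|\ge1$ for $l\neq0$. The resonant vector $(1,1)$ fails at $l=(1,-1)$. Feeding $f=\cos\varphi_1$ to `solve_transport` returns the coefficients of $u=\sin\varphi_1$.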

Approximate Inverse
We want to solve the nonlinear functional equation $\mathcal{F}(i,\zeta)=0$ (see (5.10)) by applying a Nash–Moser scheme. It is well known that the main issue in implementing this algorithm is the approximate inversion of the linearized operator of $\mathcal{F}$ at any approximate solution $(i_n,\zeta_n)$, namely $D\mathcal{F}(i_n,\zeta_n)$. Note that $D\mathcal{F}(i_n,\zeta_n)$ is independent of $\zeta_n$. One of the main problems is that the $(\theta,y,z)$-components of $D\mathcal{F}(i_n,\zeta_n)$ are coupled, so the linear system is quite involved. In order to approximately solve (6.2) we follow the scheme developed by Berti–Bolle in [9], which describes a way to approximately triangularize (6.2). This method has been applied in [4,39]. Since the strategy is identical to [39], we only summarize it and underline the differences, which mainly come from the symplectic structure. For a fully detailed exposition see [40].

We now study the solvability of Eq. (6.2) at an approximate solution $(i_0,\zeta_0)$, $i_0(\varphi)=(\theta_0(\varphi),y_0(\varphi),z_0(\varphi))$, keeping the notation of [4,39]. We assume the following hypothesis, which we shall verify at each step of the Nash–Moser iteration: … (see (5.2)) and, for some $p_0:=p_0(\nu)>0$, … . Here $\mathfrak{I}_0(\varphi):=i_0(\varphi)-(\varphi,0,0)$, and $Z$ is the error function $Z(\varphi):=\mathcal{F}(i_0,\zeta_0)(\varphi)$. By estimating the Sobolev norm of $Z$ we can measure how close the embedding $i_0$ is to being invariant for $X_{H_{\varepsilon,\zeta_0}}$. If $Z=0$ then $i_0$ is a solution; in general we say that $i_0$ is "approximately invariant" up to order $O(Z)$. We observe that, by Lemma 6.1 in [4], if $i_0$ is a solution then the parameter $\zeta_0$ has to vanish. Hence the embedded torus $i_0$ supports a quasi-periodic solution of the "original" system with Hamiltonian $H_\varepsilon$ (see (5.1)).
By [9], it is possible to construct an embedded torus $i_\delta(\varphi)=(\theta_0(\varphi),y_\delta(\varphi),z_0(\varphi))$ which differs from $i_0$ only by a small modification of the $y$-component. This torus is isotropic: the 2-form $\mathcal{W}$ (recall (4.9)) vanishes on $i_\delta(\mathbb{T}^\nu)$ (see Lemma 7 in [9]). More precisely, there exists $\bar p:=\bar p(\nu)>0$ such that … .

The strategy is to construct an approximate inverse for $D\mathcal{F}(i_0,\zeta_0)$ starting from an approximate inverse of the linear operator $D\mathcal{F}(i_\delta,\zeta_0)$. The advantage of analyzing the linearized problem at $i_\delta$ is that, thanks to the isotropy of $i_\delta$, one can construct a symplectic change of variables which approximately triangularizes the linear system. For the details we refer to [9] and [4]; here we only give the relevant definitions and state the main result. We define the symplectic change of coordinates $G_\delta$ … , where $\tilde z_0:=z_0(\theta_0^{-1}(\theta))$. We denote the transformed Hamiltonian by $K:=K(\varphi,\eta,w,\zeta_0)$. We then define the operator $\mathcal{L}_\omega$ in (6.8), where $K_{02}$ is the linear operator representing the terms of $K$ that are quadratic in $w$, i.e. … .
$\mathcal{L}_\omega$ corresponds to the $w$-component of the linearized operator after the change of variables $G_\delta$. In [9] (see also [4,39]) the following result is proved.
… the operator $\mathcal{L}_\omega$ in (6.8) has the form (6.15). Recall that $T_\delta$ is defined in (6.13), $\mathcal{B}$ is the Birkhoff map given in Proposition 3.2, and $f$ is the Hamiltonian density in (1.3). The operator $Q_0$ is finite rank and has the form (6.16). The remainders $R_i$ do not depend on $\mathfrak{I}_\delta$ and satisfy … . Finally, recalling Definition 2.3, we have … .

Proof. The expression (6.15) follows from the definition (6.8). Indeed, $G_\delta$ and the weak BNF transformation $\mathcal{B}$ are the identity plus a finite-rank operator, while the action-angle change of coordinates is a rescaling plus a finite-rank operator acting only on $v$. Then, applying the chain rule, we get … , where the finite-rank part contains all the terms in which a derivative falls on the change of variables. Then (6.15) follows from the definition of $H$ in (1.4).

Regarding the estimates, (6.17) follows from (6.14). For the bounds (6.18), (6.19), we split the finite-rank part $R_1+R_2$ as follows. The operator $R_1$ contains all terms arising from derivatives of $G_\delta$. By tame estimates on the map $G_\delta$ (see for instance Lemma 6.7 in [4]), it satisfies the bounds (6.19), and we put it in $R_{>5}$. The finite-rank term $R_2$ comes from the Birkhoff map. This is an analytic map, so we consider its Taylor expansion … , where $\mathtt{l}(j_i)$ is the $i$-th vector of the canonical basis of $\mathbb{Z}^\nu$ and … . Hence we can expand as $\sum_{i=0}^{5}\varepsilon^if_i$, where the $f_i$ are independent of $\varepsilon$, plus a remainder of size $\varepsilon^6+\varepsilon\|\mathfrak{I}_\delta\|^{\gamma,\mathcal{O}_0}_s$, which is not analytic in $\varepsilon$. By the assumption (6.3), in low norm $s=s_0+p_1$ all these remainders are negligible with respect to terms of order $\varepsilon^5$. This distinction is needed because of the resonant nature of the DP equation. We must perform by hand five steps of the order reduction and of the linear BNF (see Sects. 7.1 and 7.2) before entering a perturbative regime. In this framework $R_{>5}$ is purely a remainder, while the $R_i$ are homogeneous polynomial terms.
One could apply the same division to the non-finite-rank terms, obtaining … , where $g$ satisfies the same estimates as (6.19).
The Hamiltonian of the operator (6.15) with respect to the symplectic form (6.29) is (see (6.28)) … . Of course one can be even more explicit and write everything in terms of the original Hamiltonian (1.4) and of the generating functions of the weak BNF; for example, one has … . The terms $H_i$ can be computed explicitly; however, we only need to prove that they fit the following definitions.
Definition 6.4. We say that a matrix $B$ … . Let $B(\varphi)$ be a Toeplitz-in-time operator (recall (2.12)). We say that $B(\varphi)$ is almost diagonal if its associated matrix is almost diagonal.
Let $H:=H(\varphi)$ be a quadratic Hamiltonian of the form $H=(A(\varphi)z,z)_{L^2}$, where $A(\varphi)$ is a Toeplitz-in-time operator. We say that $H$ and its vector field are almost diagonal if $A(\varphi)$ is almost diagonal.

Remark 6.5. It is easy to verify that if $X$ and $Y$ are almost diagonal operators, then $X+Y$ and $X\circ Y$ are almost diagonal.

Definition 6.6. Let $p\in\mathbb{N}$ and $m\in\mathbb{R}$. We say that a pseudo differential operator $B=\mathrm{Op}(b(\varphi,x,j))$ (recall Definition 2.8) is homogeneous of degree $p$ in the function $v$ in (6.25) if its symbol $b(\varphi,x,j)\in S^m$ has the form
$b(\varphi,x,j):=\sum_{j_1,\dots,j_p\in S}C_{j_1,\dots,j_p}(j)\,\xi_{j_1}\cdots\xi_{j_p}\,e^{i(j_1+\dots+j_p)x}\,e^{i(\mathtt{l}(j_1)+\dots+\mathtt{l}(j_p))\cdot\varphi}$. (6.34)

Definition 6.7. Let $p\in\mathbb{N}$. We say that a Hamiltonian is pseudo differential and $p$-homogeneous if it has the form (6.35), with the following ingredients:
- $f_p$ is a homogeneous real-valued function of $v$ (of degree $p$) of the form (6.36);
- $B_p\in OPS^{-2}$ is a $p$-homogeneous pseudo differential operator according to Definition 6.6, self-adjoint with respect to $(\cdot,\cdot)_{L^2}$;
- $R_p$ is a finite-dimensional operator of the form (6.16), with $g_j$, $\chi_j$ $p$-homogeneous functions of $v$.
One has that the Hamiltonian … is equivalent to $\{H_p,G_q\}_e$, in the sense that they generate the same vector field. Here $A_i^*$, $i=1,2$, denotes the adjoint of $A_i$ with respect to the $L^2$ scalar product. Notice that … is a homogeneous function of $v$ of degree $p+q$. We use the results on compositions of pseudo differential operators in Sect. 2 of [33], together with the following facts: $J$ is skew-self-adjoint, $B_i$, $i=p,q$, are self-adjoint, and $f_q$, $f_p$ are real valued. From these we deduce that $A_2\in OPS^{-1}$ is skew-self-adjoint at the principal order. Hence, using formula (2.13) in [33] for the adjoint, $A_2+A_2^*$ is a pseudo differential homogeneous operator (according to Definition 6.6) in $OPS^{-2}$.

Reduction and Inversion of the Linearized Operator
The aim of this section is to prove the claim in (6.9). As explained in the introduction, one first reduces the unbounded part of $\mathcal{L}_\omega$ and then uses classical KAM reducibility results to diagonalize it. The difficulty is that a few steps of this procedure must be done by hand, since they do not fit the usual smallness conditions; see [33].
The key result of this section is the following.

Theorem 7.1. There exist $S>s_0$ and $\mu_1=\mu_1(\nu)>0$ such that, if condition (6.12) is satisfied with $p_1=\mu_1$, then the following holds. There exists a constant $\mathtt{m}(\omega)$, defined for $\omega\in\Omega_\varepsilon$, with … , and there exists a real, bounded linear operator … , where $l_j$ is defined in (5.6). The constant $\mathtt{m}$ depends on $i$ in a Lipschitz way: for … , then … .

The result above has two relevant consequences. First, it shows that the operator $\mathcal{L}_\omega$ in (6.15) can be conjugated to an operator (see (7.4)) which is diagonal at the highest order of derivatives, plus a remainder which is $-1$-smoothing. Second, thanks to a linear BNF procedure (performed in Sect. 7.2), the non-diagonal term $P_0$ in (7.4) has size much smaller than $\varepsilon$ (see estimates (7.7), (7.8)). In particular, it is perturbative with respect to the constant $\gamma$ in (5.3). This allows us to apply the reducibility scheme of [33] to complete the diagonalization of the operator $\mathcal{L}$ (see Theorem 7.13). Then the inversion assumption (6.9) follows directly from Proposition 7.14.

Strategy of the proof of Theorem 7.1.
• Reduction at the highest order. The first step exploits the pseudo differential structure of the operator $\mathcal{L}_\omega$ to conjugate it to an operator with constant coefficients, up to a smoothing remainder of order $-1$. To this purpose we use changes of variables generated as the time-one flow map $\Phi^\tau|_{\tau=1}$ of Hamiltonians of the form (7.10), where $\beta$ is some smooth function. In Proposition C.2 we show that $\Phi^\tau$ is well defined as a symplectic map on $H^s_{S^\perp}$ (see Lemma C.1), and we study the structure of $\Phi^\tau\mathcal{L}_\omega(\Phi^\tau)^{-1}$. Proposition C.2 gives an explicit formula for the new coefficient at the highest order (see (C.17)). Then Corollary 3.6 of [31] (see also Proposition 3.6 in [33]) solves the equation "(C.17) = const", provided that a smallness condition is satisfied. This smallness condition has the form (7.12), for some $s_0+p_1>s_1>s_0$ and some constant $C(s_1)>0$. As shown in [33], thanks to the Hamiltonian structure, this reduces $\mathcal{L}_\omega$ to constant coefficients up to a correction of order $-1$.
Unfortunately, since here $\gamma=\varepsilon^{2+a}$ with $a>0$, by (6.17) the coefficient $a_0(\varphi,x)$ in $\mathcal{L}_\omega$ does not satisfy (7.12). This is why we have to perform some preliminary steps. They bring us into the perturbative regime, where we can apply the scheme described in the proof of Corollary 3.6 in [31].
We first "regularize" the purely polynomial terms H i (see (6.32)) by hand, by exploiting their homogeneity according to Definition 6.7. After that we are left with only unbounded terms which satisfy the smallness conditions of [33]. We "regularize" them by applying the results of [33] adapted to our slightly more general setting, see Proposition C.2.
… where the Poisson brackets $\{\cdot,\cdot\}_e$ are defined in (6.30). Recall that $\Phi^\tau$ is $C^k$ in $\tau$ as a map from $H^s$ to $H^{s-k}$. Therefore the Taylor expansion of the conjugated Hamiltonian coincides with the Lie series of the generator up to any order $\tau^k$.
• Linear BNF. The second step is to diagonalize the bounded terms. Here we diagonalize by hand the terms up to order $\varepsilon^3$: they are almost diagonal according to Definition 6.4, so a linear BNF applies. Once this is done, the full diagonalization follows from a standard KAM reducibility theorem (see Theorem 7.13).
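The mechanism behind "making the highest-order coefficient constant" can be seen in a toy model. The sketch below is not the procedure of Proposition C.2 or Corollary 3.6 in [31], which handle a $\varphi$-dependent coefficient and a quasi-periodic transport term. It is a time-independent analogue, under our own simplifying assumption that the top-order part is $(1+a(x))\partial_x$ on $\mathbb{T}$.

The change of variable $y=x+\beta(x)$ turns this operator into $(1+a)(1+\beta_x)\partial_y$. Choosing $1+\beta_x=\mathtt{m}/(1+a)$ makes the coefficient the constant $\mathtt{m}$, and periodicity of $\beta$ forces $\mathtt{m}$ to be the harmonic mean of $1+a$. The function name `straighten` is hypothetical.

```python
import math


def straighten(a, n=2048):
    """Toy, time-independent model of the reduction at the highest order.

    For (1 + a(x)) d/dx on T, the change of variable y = x + beta(x) gives
    (1 + a)(1 + beta') d/dy. Taking 1 + beta' = m / (1 + a) makes this
    coefficient the constant m; beta is periodic iff beta' has zero average,
    which forces m = 1 / mean(1 / (1 + a)). Averages use the trapezoidal
    rule on n equispaced nodes (spectrally accurate for smooth periodic a).
    """
    xs = [2 * math.pi * k / n for k in range(n)]
    coeff = [1.0 + a(x) for x in xs]
    if min(coeff) <= 0:
        raise ValueError("1 + a must stay positive for y = x + beta(x) to be a diffeomorphism")
    mean_inverse = sum(1.0 / c for c in coeff) / n
    m = 1.0 / mean_inverse
    beta_x = [m / c - 1.0 for c in coeff]  # zero average by the choice of m
    return m, xs, beta_x
```

For $a(x)=0.1\cos x$ one has $\frac{1}{2\pi}\int_{\mathbb{T}}\frac{dx}{1+0.1\cos x}=(0.99)^{-1/2}$, so the constant is $\mathtt{m}=\sqrt{0.99}$.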

Reduction at the highest order.
In the following we shall assume that (6.12) holds with some $p_1\gg1$. The loss of regularity $p_1$ will be determined explicitly at the end of the section. In order to perform the non-perturbative steps, we construct changes of coordinates $\mathcal{B}_i$, $i=1,\dots,5$, as the time-one flow maps generated by Hamiltonians as in (7.10). Then we set $\mathcal{L}_0:=\mathcal{L}_\omega$ and define iteratively $\mathcal{L}_i:=\mathcal{B}_i\mathcal{L}_{i-1}\mathcal{B}_i^{-1}$. Note that $\mathcal{L}_0$ is pseudo differential plus a finite-rank operator. The $\mathcal{B}_i$ preserve the pseudo differential structure. Still, to have good quantitative control on the symbols, we fix appropriate values $p\ge s_0$, $\rho\ge s_0+6\tau+9$ (7.15), and write … .

The class $\mathfrak{L}_{\rho,p}$ appearing there consists of operators of order $-\rho$; we introduced it in [33] and recall it in Definition C.5. Note that by Lemma C.7, $Q_0$ belongs to $\mathfrak{L}_{\rho,p}$ for all $\rho,p$, with bounds on $\mathbb{M}^\gamma_{Q_0}(s,b)$ given in the same lemma. Then one proves iteratively that … , with $\sigma_0$ defined in Proposition 6.2 and $\sigma_{i+3}>0$, $i=1,\dots,5$, depending only on $\nu$. Essentially, the $\sigma_{i+3}$ are the losses coming from the application of Proposition C.2. We can obtain (7.16) for any $\rho,p$ satisfying (7.15). However, if we want (7.18) to hold for some given $p$, we have to assume the smallness condition (6.12) with $p+\sigma_0+\sigma_{i+3}<p_1$.
Step ($\varepsilon^2$). We now deal with the terms of order $\varepsilon^2$ of the Hamiltonian (6.31). We consider the auxiliary Hamiltonian $\tilde S$ … , where $\beta_2$ is some function of the form (6.34) with $p=2$, to be determined. Notice that $(\partial_\tau\tilde S)(0)\sim O(\varepsilon^4)$. The Hamiltonian system associated with $\tilde S(\tau)$ is of the form (7.11), with $b$ replaced by $b_2$. Let $\mathcal{B}_2$ be the time-one flow generated by $\tilde S(\tau)$. Then the Hamiltonian of the conjugated linearized operator $\mathcal{L}_2:=\mathcal{B}_2\mathcal{L}_1\mathcal{B}_2^{-1}$ is given by (7.30) (recall (7.22), (7.28), (7.14)).

We want to solve the equation (7.31), where:
- $B_2$ is some pseudo differential, 2-homogeneous operator of order $-2$ (see Definition 6.6);
- $c$ is a constant to be determined;
- $H_{R_2}$, possibly different from the one in (6.31), is a Hamiltonian of the form (6.35).

By Lemma 6.9, the relevant Hamiltonian can be written as in (7.32) for some $B_2\in OPS^{-2}$ as in Definition 6.6, up to a finite-rank remainder. Hence Eq. (7.31) is equivalent to (7.33). Since $f_2$ in (7.32) has the form (6.36) with $p=2$, we look for a function $\beta_2$ of the same form (6.36), with coefficients $(\beta_2)_{j_1,j_2}\in\mathbb{C}$. Hence Eq. (7.33) reads … . For $j_1,j_2\in S$, we have $\lambda(j_1)+\lambda(j_2)-(j_1+j_2)=0$ if and only if $j_1+j_2=0$, since $j_1j_2\neq-1$. The terms with $j_1=-j_2$ correspond to the average in $x$ of the function $f_2(v)$. Hence we set … (7.35) and evaluate it explicitly. The remaining terms, including the one involving $\partial_{xx}\beta_2$, do not contribute, since they have zero average in space.
Recalling that … , the constant $c=c(\omega)$ (recall (4.11)) in (7.35) is given by … . Note that $\varepsilon^2\|\beta_2\|^{\gamma,\mathcal{O}}_s\lesssim_s\varepsilon^2$ for all $s\ge s_0$ (7.37). Using this, (7.17)–(7.18) with $i=1$, and the assumption (6.12) with $p_1$ sufficiently large, the smallness assumption of Lemma C.1 and the condition (C.15) are satisfied. In this case $q$ is replaced by $q_1$. Hence, by (7.18), (7.19), the bounds (C.13), (C.14) hold with $k_1$ replaced by $\varepsilon$, … .

Steps ($\varepsilon^3$)-($\varepsilon^4$)-($\varepsilon^5$). Consider $i=3,4,5$; we proceed exactly as in the previous steps. We consider a change of coordinates $\mathcal{B}_i$ given by the time-one flow map of … , for some smooth function $\beta_i$ of the form (6.36) (with $p=i$) to be determined. Using Lemma 6.9 for the Hamiltonians of order $\varepsilon^i$, $i=3,4,5$, we can choose $\beta_i$ so as to solve an equation like … . Here $f_i$ is a homogeneous function as in (6.36) (with $p=i$). The condition (1.13) implies that the equation … . The resulting remainders are in $\mathfrak{L}_{\rho',p'}$; as usual, we rename $\rho',p'$ as $\rho,p$.
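The factorization behind the claim in Step ($\varepsilon^2$), that $\lambda(j_1)+\lambda(j_2)-(j_1+j_2)=0$ only when $j_1+j_2=0$, can be checked exactly. Assuming $\lambda(j)=j(4+j^2)/(1+j^2)$ (our reading of (1.8)), one has $\lambda(j)-j=3j/(1+j^2)$, and therefore
$\lambda(j_1)+\lambda(j_2)-(j_1+j_2)=3(j_1+j_2)(1+j_1j_2)/((1+j_1^2)(1+j_2^2))$.
For nonzero integers, $j_1j_2=-1$ forces $\{j_1,j_2\}=\{1,-1\}$, which already has $j_1+j_2=0$. So the statement holds on all of $\mathbb{Z}\setminus\{0\}$. The helper names below are ours.

```python
from fractions import Fraction


def lam(j):
    """Assumed linear frequency lambda(j) = j (4 + j^2) / (1 + j^2)."""
    return Fraction(j * (4 + j * j), 1 + j * j)


def two_point_divisor(j1, j2):
    """lambda(j1) + lambda(j2) - (j1 + j2), the quantity discussed in Step (eps^2)."""
    return lam(j1) + lam(j2) - (j1 + j2)


def factored(j1, j2):
    """3 (j1 + j2)(1 + j1 j2) / ((1 + j1^2)(1 + j2^2))."""
    return Fraction(3 * (j1 + j2) * (1 + j1 * j2), (1 + j1 * j1) * (1 + j2 * j2))
```

Only the pairs with $j_1=-j_2$, i.e. the space average of $f_2(v)$, survive as an obstruction. This is why they are collected into the constant $c$ in (7.35).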
By (6.31), and since the generators $\beta_i$ in (7.10) are independent of $\mathfrak{I}_\delta$, $q_5^{(\ge4)}$ and $Q_5^{(\ge4)}$ contain two kinds of terms:
- terms of size $\varepsilon^4$, which are functions of $v$ only;
- terms that also depend on $\mathfrak{I}_\delta$, of size $O(\varepsilon\|\mathfrak{I}_\delta\|_{s+\sigma})$ (see the estimates (7.17), (7.18), (7.19)).

By the uniqueness of the Taylor expansion, $\sum_{i=1}^3\varepsilon^i(q_5^{(i)}+Q_5^{(i)})$ coincides with the vector field $-\sum_{i=1}^3\varepsilon^iJ\nabla K_i$. Here, recalling (7.22), (7.30), (7.23), (7.36), … , and $K_3$ is some pseudo differential 3-homogeneous Hamiltonian as in (6.35), with corresponding function $f_3(v)=0$. Now we apply Proposition 3.6 in [33] (or Corollary 3.6 in [31]) to make the coefficient $a_5$ of the linearized operator $\mathcal{L}_5$ constant; namely, we find $\beta$ such that … . Note that, by (7.17) with $i=5$ and (6.12), the smallness condition (7.12) is satisfied by the function $a_5$. We have the following.

Let us define
Then the Hamiltonian of the operator $\mathcal{L}_6$ is given by (7.57) (recall (6.31), (7.43) and (7.47)). Notice also that $\{H_0,Z_0\}_e=0$. Together with Remark 7.3, the expansion (7.57) gives a more precise expression for the remainder $Q_6$ in (7.47). This is the content of Lemma 7.6 in the next section.

Linear Birkhoff normal form.
The aim of this section is to eliminate $K_1$, $K_3$ and to normalize the Hamiltonian $K_2$ in (7.57). Our first observation is that the $-1$-smoothing remainder $o(\varepsilon^3)$ belongs to a special class of operators, denoted by $\mathcal{C}_{-1}$ and defined in Definition C.6. It turns out that this class is preserved under the changes of variables used in the linear Birkhoff normal form procedure (see Lemmata C.8, C.9).
Remark 7.5. In the following steps of the linear Birkhoff normal form we shall use the relation (7.58), which holds by the conservation of momentum.
Proof. By the discussion of Sect. 7.1, the Hamiltonians $K_i$, $i=1,2,3$, are of the form (6.35) with $f_i=0$. Hence the vector fields $X_{K_i}$ are pseudo differential of order $-1$ up to a finite-rank term. In addition, they are almost diagonal by (6.34) and the momentum condition (7.58). By Lemma C.8-(ii), $\varepsilon X_{K_1}$, $\varepsilon^2X_{K_2}$, $\varepsilon^3X_{K_3}$ belong to $\mathcal{C}_{-1}$; by (7.27), (7.26), (7.35), (7.37), (7.41), they satisfy (7.60). By Proposition 7.4, the choice of $\rho$ in (7.15), and Lemma C.8-(i) with $p=s_0$ and $p_1$ large enough, we get $Q_6\in\mathcal{C}_{-1}$. Thus $\mathcal{R}\in\mathcal{C}_{-1}$. Only $Q_6$ in (7.61) depends on the torus embedding $i_\delta$, so the second bound in (7.62) follows from Lemma C.8-(i), (ii), (7.49) and (7.51).

To prove the first bound in (7.62) we reason as follows. By (7.45) and (7.17) with $i=5$ we have … (7.63). The map $\mathcal{B}_6$ leaves invariant the terms of size $\varepsilon,\varepsilon^2,\varepsilon^3$ in $\mathcal{L}_5$ (by Remark 7.2). Hence, by Remark 7.3, those terms in $Q_6$ are given by $-\varepsilon$ … . From the proof of the bounds (C.18), (C.20) in Proposition C.2, the operators $\mathrm{Op}(q_6)$ and $Q_6$ admit a "formal" expansion in $\beta^{(\infty)}$, obtained by expanding the flow in $\tau$. By the discussion above, the largest terms in $\mathcal{R}$ are those which are linear in $\beta^{(\infty)}$. Such terms come from the conjugation of $\mathcal{L}_5$ under the map $\mathcal{B}_6$, more precisely from the conjugation of $a_5(\varphi,x)$.
We refer to formula (3.11) in Proposition 3.1 of [33] for the term bounded by the norm of $\beta^{(\infty)}$. Comparing the bounds (7.63) and (7.42), one deduces the first bound in (7.62).
In order to normalize the vector fields $\varepsilon^iJ\nabla K_i$, we look for changes of coordinates $\Upsilon_i$ generated as time-one flows of quadratic Hamiltonians $H_{A_i}$, described by almost diagonal matrices $A_i$ (see (C.47), (C.48), (C.49), (C.50)). We remark that the Hamiltonian $\varepsilon^2Z_0$ is left invariant by these changes of coordinates, since $\{H_0,Z_0\}_e=0$. At each step of the procedure we shall verify that the operators $JB_i$ (see (C.49), (C.50)) are almost diagonal and belong to $\mathcal{C}_{-1}$. This allows us to apply Lemma C.13, which guarantees well-posedness and tame estimates of $\Upsilon_i$.
Step one (order $\varepsilon$). At this step we want to eliminate $\varepsilon X_{K_1}$ from (7.59). We have […] We choose $A_1$ such that […] Recalling that $K_1 := H^{(1)}_1$, we have (see (7.27), (7.28)) […] Then we choose $B_1 = B_1$ in (C.49). Recalling the definition of $B_1$ in (7.27), it is easy to see that $JB_1 \in C_{-1}$, since it is a pseudo-differential operator of order $-1$. Moreover, it is almost diagonal because $J$ and $3\partial_x$ are diagonal operators and $\beta_1$ is a function supported on the finite set $S$. Given $X \in C_{-1}$, to shorten the notation in the following lemma we write (recall (2.3)) $\mathrm{ad}_X[\cdot] := [X, \cdot]$. (7.68) With this notation we have the following lemma.
Step two (order $\varepsilon^2$). At this step we want to normalize $\varepsilon^2 X_{K^{(1)}_2}$ from (7.69). We have […] (7.76), where $K^{(1)}_2$ is given in (7.65) (see also (7.70)). We choose $A_2$ in order to solve the equation (7.77) (see (C.49)). Note that $X_{K_2}$ is pseudo-differential of order $-1$, and that $JA_1$ and $X_{K_1}$ belong to $C_{-1}$, and so does their Poisson bracket. Hence $JB_2 \in C_{-1}$. By Remark 6.5, $JB_2$ is also almost diagonal.
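A minimal sketch of the homological equation behind these linear BNF steps, under strong simplifying assumptions: the unperturbed operator is a finite matrix $L_0 = \mathrm{i}\,\mathrm{diag}(d_j)$ with real $d_j$, and conjugation by $e^{\varepsilon A}$ gives $L_0 + \varepsilon(B + [A, L_0]) + O(\varepsilon^2)$. Choosing $A_{jk} = B_{jk}/(\mathrm{i}(d_j - d_k))$ whenever $d_j \neq d_k$ removes the off-resonant entries, while the resonant entries ($d_j = d_k$) survive as normal form. The function names and the data are illustrative assumptions, not the operators $A_2$, $B_2$ of the paper.

```python
# Toy model (assumption): L0 = i*diag(d) with real frequencies d and a finite
# perturbation B. Conjugating by exp(eps*A) gives L0 + eps*(B + [A, L0]) + O(eps^2);
# we choose A so that B + [A, L0] = R, where R keeps only resonant entries.


def solve_homological(d, B, tol=1e-12):
    """Return (A, R): A[j][k] = B[j][k] / (i (d[j] - d[k])) off resonance,
    R the resonant part of B (entries with d[j] == d[k] up to tol)."""
    n = len(d)
    A = [[0j] * n for _ in range(n)]
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            divisor = 1j * (d[j] - d[k])  # the divisor i (d_j - d_k)
            if abs(divisor) > tol:
                A[j][k] = B[j][k] / divisor
            else:
                R[j][k] = B[j][k]  # resonant entry: kept in the normal form
    return A, R


def commutator_with_L0(A, d):
    """[A, L0] = A L0 - L0 A for L0 = i*diag(d), i.e. A[j][k] * i (d[k] - d[j])."""
    n = len(d)
    return [[A[j][k] * 1j * (d[k] - d[j]) for k in range(n)] for j in range(n)]


d = [1.0, 2.0, 2.0, 3.5]  # d[1] == d[2] produces an off-diagonal resonance
B = [[0.1, 0.2 + 0.1j, -0.3, 0.05],
     [0.4, 0.0, 0.7j, -0.2],
     [0.1j, 0.6, 0.3, 0.2],
     [0.05, -0.1, 0.25, 0.0]]
A, R = solve_homological(d, B)
C = commutator_with_L0(A, d)
residual = max(abs(C[j][k] + B[j][k] - R[j][k]) for j in range(4) for k in range(4))
print(residual)  # round-off level: B + [A, L0] equals its resonant part R
```

In this model the diagonal entries are always resonant, which mirrors the fact that a normalization step produces diagonal corrections rather than eliminating everything.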
In order to perform the third step of the linear BNF we need to compute explicitly the corrections of order $\varepsilon^2$ coming from the resonant part $\Pi_{\mathrm{Ker}(H_0)} K^{(1)}_2$. The point is that, a priori, it is not clear whether the resonant terms $\Pi_{\mathrm{Ker}(H_0)} K^{(1)}_2$ are supported only on trivial resonances. Our approach is then to show that the normal form we obtain must necessarily coincide with the formal one, which is relatively easy to compute.

Definition 7.8. Recalling the notation of Sect. 3, we denote by $d_z^{\le k}$, respectively $d_z^{=k}$, the projector of a homogeneous Hamiltonian of degree $n$ onto the monomials of degree less than or equal to $k$, respectively equal to $k$, in the normal variable $z$, i.e. […] We denote by $\Pi_{\mathrm{triv}}$ the projection onto trivial resonances (of the form (3.11)), i.e. onto monomials of the form […]

The following theorem allows us to compute easily the resonant terms $\Pi_{\mathrm{Ker}(H_0)} K^{(1)}_2$ in (B.17).

Theorem 7.9 (Normal form identification). Consider the symplectic change of coordinates $A_\varepsilon$ in (4.7). Then […]

Proof. The proof is postponed to Appendix B.
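To make the degree projectors of Definition 7.8 concrete, here is a sketch of $d_z^{\le k}$ and $d_z^{=k}$ acting on a polynomial Hamiltonian stored as a dictionary of monomials. The representation (tangential exponents, normal exponents) is a hypothetical choice made for illustration; the projection $\Pi_{\mathrm{triv}}$ is not implemented because the form (3.11) of the trivial resonances is not reproduced here.

```python
# Hypothetical representation: a polynomial Hamiltonian is a dict mapping monomials
# to coefficients; a monomial is (tangential_exponents, normal_exponents), where
# normal_exponents collects the exponents of all normal variables (z_j and their
# conjugates). The degree in z is the sum of the normal exponents.
from typing import Dict, Tuple

Monomial = Tuple[Tuple[int, ...], Tuple[int, ...]]
Hamiltonian = Dict[Monomial, complex]


def z_degree(m: Monomial) -> int:
    return sum(m[1])


def total_degree(m: Monomial) -> int:
    return sum(m[0]) + sum(m[1])


def proj_z_le(H: Hamiltonian, k: int) -> Hamiltonian:
    """d_z^{<=k}: keep the monomials of degree <= k in the normal variable."""
    return {m: c for m, c in H.items() if z_degree(m) <= k}


def proj_z_eq(H: Hamiltonian, k: int) -> Hamiltonian:
    """d_z^{=k}: keep the monomials of degree exactly k in the normal variable."""
    return {m: c for m, c in H.items() if z_degree(m) == k}


# A homogeneous Hamiltonian of degree n = 3 (two tangential, two normal exponents).
H3: Hamiltonian = {
    ((2, 1), (0, 0)): 1.0,   # degree 0 in z
    ((1, 1), (0, 1)): 0.3j,  # degree 1 in z
    ((1, 0), (1, 1)): -0.5,  # degree 2 in z
    ((0, 0), (2, 1)): 2.0,   # degree 3 in z
}
low = proj_z_le(H3, 1)  # terms of degree <= 1 in z, the ones treated by the weak BNF
```

By construction $d_z^{\le k} = \sum_{i \le k} d_z^{=i}$, which the dictionary version makes explicit.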
Proof. The proof uses the same arguments as the proof of Lemma 7.7. In particular, expanding the left-hand side of (7.84) by means of (C.58), we get (7.86). By (7.77) and Theorem 7.9 we have that […] By (7.10), $D(\xi) \in C_{-1}$ and, since $A_2$ is almost diagonal, we have that […] The bounds (7.85) are obtained by using the estimates (C.43), (C.44), (7.74), (C.54) and (7.72).
Step three (order $\varepsilon^3$). At this step we eliminate $\varepsilon^3 X_{K^{(2)}_3}$ from (7.84). Recalling that $K^{(2)}_3$ is given in Lemma 7.11, we have […] Note that we also include the $\varepsilon^2$-terms in the normal form. We want to solve the equation […] Hence we choose the matrix $B_3$ accordingly (its expression involves $\nabla K_3$). Recalling (7.71), it is easy to see that $JB_3$ is a sum of Lie brackets of elements of $C_{-1}$; hence, by Lemma C.8, it belongs to $C_{-1}$. Since $A_1$ is almost diagonal, Remark 6.5 implies that $JB_3$ is almost diagonal.
Lemma 7.12. The transformed operator is (recall (7.69)) […] for some $\tilde\sigma$, possibly larger than the one in Lemma 7.11.
Proof. The proof follows the same arguments used to prove Lemma 7.7. By (C.58) we deduce […] We note that, by (7.88), we have (recall (4.5) and (7.36)) […] since $A_3$ is almost diagonal. Hence the bounds (7.90) follow from (C.54), (7.85) and Lemma C.9.
Proof of Theorem 7.1. We choose $\mu_1 = \tilde\sigma$ given in Lemma 7.12. We consider $p$ and $p_1$ such that […] where $\tilde\sigma$ is the loss of regularity in Lemma 7.12, $\sigma_0$ has been introduced in Sect. 7 (see estimates (6.17)-(6.22)), and $\sigma_1 > 0$ and $s_1$ are given in Lemma C.1 and in Proposition 3.6 of [33], respectively.

KAM reducibility and inversion of the linearized operator
In this subsection we prove the claim (6.9) by diagonalizing the operator $L$ in (7.4). We first write […] Notice that, by the smallness condition (7.94), Proposition 4.1 in [33] applies to the operator $L$ in (7.4). Hence, following almost word by word the proof of Theorem 1.7 in [33], one obtains the following result.
then the following holds.
with $m$ and $\kappa_j$ in (7.48) and (7.81), respectively. Furthermore, for all $j \in S^c$, sup j j |r […] All the eigenvalues $\mathrm{i}d^\infty_j$ are purely imaginary.

Proof of the inversion assumption
We deduce the inversion assumption (6.9) from the following result.

The Nash-Moser Nonlinear Iteration
In this section we prove Theorem 5.4, which will be a consequence of the Nash-Moser Theorem 8.1. Consider the finite-dimensional subspaces […] We define $\Pi_n^\perp = I - \Pi_n$. The classical smoothing properties hold, namely, for all $\alpha, s \ge 0$, […] Recalling (5.3), (4.8) for the definition of $b$, we set $a := 2b - 2$. We define the constants $\alpha_0 := 3\mu + 3$, $\alpha := 3\alpha_0 + 1$, […] where $\mu := \mu(\nu) > 0$ is the "loss of regularity" given by Theorem 6.1 and $C_1$ is fixed below.
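The smoothing properties can be illustrated on a one-dimensional model. The sketch assumes functions on $\mathbb{T}$ with finitely many Fourier coefficients, the norm $\|u\|_s^2 = \sum_j \langle j\rangle^{2s}|u_j|^2$ with $\langle j\rangle = \max(1, |j|)$, and a projector $\Pi_N$ onto $|j| \le N$; it checks the standard inequalities $\|\Pi_N u\|_{s+\alpha} \le N^\alpha\|u\|_s$ and $\|\Pi_N^\perp u\|_s \le N^{-\alpha}\|u\|_{s+\alpha}$, which we assume to be the form meant here. The paper's projectors act on $(\ell, j) \in \mathbb{Z}^{\nu+1}$ with their own truncation sequence, so this is only a model with hypothetical names.

```python
# Model (assumption): 1D torus, Fourier coefficients u[j], <j> = max(1, |j|),
# ||u||_s^2 = sum <j>^{2s} |u_j|^2, and Pi_N = truncation to |j| <= N.
import math
import random


def bracket(j):
    return max(1, abs(j))


def sobolev_norm(u, s):
    return math.sqrt(sum(bracket(j) ** (2 * s) * abs(c) ** 2 for j, c in u.items()))


def proj(u, N):
    """Pi_N: keep the Fourier modes with |j| <= N."""
    return {j: c for j, c in u.items() if abs(j) <= N}


def proj_perp(u, N):
    """Pi_N^perp = I - Pi_N: keep the Fourier modes with |j| > N."""
    return {j: c for j, c in u.items() if abs(j) > N}


def check_smoothing(u, N, s, alpha):
    """True if ||Pi_N u||_{s+alpha} <= N^alpha ||u||_s and
    ||Pi_N^perp u||_s <= N^{-alpha} ||u||_{s+alpha} (with a round-off margin)."""
    lhs1 = sobolev_norm(proj(u, N), s + alpha)
    rhs1 = N ** alpha * sobolev_norm(u, s)
    lhs2 = sobolev_norm(proj_perp(u, N), s)
    rhs2 = N ** (-alpha) * sobolev_norm(u, s + alpha)
    return lhs1 <= rhs1 * (1 + 1e-12) and lhs2 <= rhs2 * (1 + 1e-12)


rng = random.Random(0)
u = {j: complex(rng.gauss(0, 1), rng.gauss(0, 1)) / bracket(j) ** 2 for j in range(-64, 65)}
results = [check_smoothing(u, N, s, alpha)
           for N in (1, 4, 16) for s in (0.0, 1.0, 2.5) for alpha in (0.5, 1.0, 3.0)]
```

Both inequalities reduce to the pointwise facts $\langle j\rangle \le N$ on the range of $\Pi_N$ and $\langle j\rangle > N$ on the range of $\Pi_N^\perp$.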
To conclude the proof of Theorem 8.1 it remains to show the bounds (8.9). This is done in the next section. Recalling (8.6), we can write, setting $\eta = \gamma_n$ for the sets $Q_{\ell j}(\eta, \sigma)$ and $P_{\ell j}(\eta, \sigma)$, $\eta = \gamma^*_n$ for the set $R_{\ell jk}(\eta, \sigma)$, and $\sigma = \tau$, […] (8.25). Since, by (5.7) and $\gamma > \gamma^{3/2}$ (see (8.5)), $R_{\ell jk}(i_n) = \emptyset$ for $j = k$, in the sequel we assume that $j \neq k$. We start with a preliminary lemma, which gives a first relation between $\ell, j, k$ that must be satisfied in order to have non-empty resonant sets.

Lemma 8.3. Let $n \ge 0$. There is a constant $C > 0$, depending on the tangential set and independent of $\ell, j, k, n, i_n, \omega$, such that the following holds: […] Moreover, using (7.95), (7.96), (7.2), (7.5), we get $|d^\infty$ […], and this proves the first claim on $R_{\ell jk}$. If $Q_{\ell j} \neq \emptyset$, then $|mj| < 2\eta\langle\ell\rangle^{-\sigma} + |\omega\cdot\ell|$. Hence, for $\varepsilon$ small enough, we have […] Following the same arguments and using that $|d_j| \ge C|j|$ for some constant $C > 0$, we get the last statement.

Measure of a resonant set
The aim of this subsection is to prove the following lemma.

Lemma 8.4.
There is $r_0 > 0$ such that, for any $0 < r \le r_0$ and any choice of $S^+ \in V(r)$, we have that
The proof of the lemma above involves many arguments, and we split it into several steps. In several bounds we make explicit the dependence of the constants on the tangential set $S$, in order to highlight that the smallness of the amplitudes $\xi$ depends on the choice of the tangential sites.
Let us first consider the set $R_{\ell jk}$, which is the most difficult case. We study the sublevel sets of the function $\omega \mapsto \varphi_R(\omega)$ defined by (recall (4.5), (7.95)) […] We recall that (see (7.2), (7.48)) […] (8.28) where $\kappa_j$ is defined in (7.5) (see also (7.40)) and […] We first study some properties of the function $\varphi_R(\omega)$ in (8.27).
For $\varepsilon$ small enough, by (8.33), we get […] The lemma follows from Fubini's theorem.
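The Fubini argument can be sketched in the simplest model case, assumed here only for illustration: the function is affine, $\varphi(\omega) = \ell\cdot\omega + c$ on $[0,1]^2$, whereas in the paper $\varphi_R$ also contains $\omega$-dependent corrections. Integrating first in the coordinate $\omega_i$ with the largest $|\ell_i|$, each one-dimensional slice of $\{|\varphi| < \delta\}$ has length at most $2\delta/|\ell_i|$, and so does the whole set. Function names are hypothetical.

```python
# Model (assumption): phi(omega) = ell . omega + c, affine on [0, 1]^2.


def slice_length(a, b, delta):
    """Length of {x in [0, 1] : |a x + b| < delta} for a != 0; at most 2 delta / |a|."""
    lo, hi = (-delta - b) / a, (delta - b) / a
    if lo > hi:
        lo, hi = hi, lo
    return max(0.0, min(hi, 1.0) - max(lo, 0.0))


def sublevel_measure_2d(ell, c, delta, M=2000):
    """Measure of {omega in [0,1]^2 : |ell . omega + c| < delta}, computed as in Fubini:
    exact slice length in the coordinate with the largest |ell_i|, midpoint rule in the other."""
    i = 0 if abs(ell[0]) >= abs(ell[1]) else 1
    other = 1 - i
    total = 0.0
    for m in range(M):
        t = (m + 0.5) / M  # value of the other coordinate
        total += slice_length(ell[i], ell[other] * t + c, delta)
    return total / M


def fubini_bound(ell, delta):
    """Every slice has length <= 2 delta / max_i |ell_i|, hence so does the measure."""
    return 2 * delta / max(abs(x) for x in ell)


for ell, c in [((3.0, -5.0), 0.7), ((1.0, 1.0), -0.05), ((1.0, 1.0), 5.0)]:
    print(ell, c, sublevel_measure_2d(ell, c, 0.01), fubini_bound(ell, 0.01))
```

In the first example every slice lies inside $[0,1]$, so the bound is attained; in the last one the set is empty.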
Proof of Lemma 8.4. For the sets $R_{\ell jk}$ the lemma follows from Lemmata 8.6, 8.7, 8.8. The proof for the sets $Q_{\ell j}$ and $P_{\ell j}$ uses the same arguments as for $R_{\ell jk}$. Lemmata 8.6 and 8.7 are identical, the only difference being that the non-resonance condition now reads $|\omega\cdot\ell + j| \ge \gamma_0\langle\ell\rangle^{-\sigma}$ in the case of $Q_{\ell j}$ and $|\omega\cdot\ell + \lambda(j)| \ge \gamma_0\langle\ell\rangle^{-\sigma}$ in the case of $P_{\ell j}$. Lemma 8.8 follows from (A.4) in the case of $Q_{\ell j}$ and from (A.6) in the case of $P_{\ell j}$.
We are in a position to prove (8.9). We have, by (8.42), […] On the one hand, using Lemmata 8.4 and 8.10, we have that […]

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A. Non-degeneracy conditions
Proof of Lemma 4.1. Recalling (4.6), we introduce a matrix $K$ such that $A =: \frac{2}{9}\,\mathrm{diag}\big(\lambda(j_i)(1 + j_i^2)\big)\,K$. We now show that the entries of $K$ are bounded by a constant independent of the $j_i$. After some direct computations we have that […] Obviously $|K_{jj}| \le 2$; regarding the off-diagonal terms, we note that $0 \le 2kj \le k^2 + j^2$, hence $|K_{jk}| \le 12$ if $j \neq k$. We consider the variables $x, p_2, \dots, p_\nu$ defined as […] so that $P(x, p_i) = \det(K)$ is a rational function. It is easily seen that $K$ computed at $p_i = 1$ for all $i$ coincides with the matrix […] where $U$ is the matrix with entries $U_{ij} = 1$ for all $i, j = 1, \dots, \nu$. Its determinant is […] We note that the absolute value of (A.2) is $\ge 1$ at $x = 0$. We conclude that there exists $0 < r_0 < 1$ such that […] This implies the claim.
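The display giving $K$ at $p_i = 1$ and its determinant (A.2) did not survive. If that matrix has the form $aI + bU$ (an assumption made only for this illustration), its determinant follows from the spectrum of $U$, which has the eigenvalue $\nu$ once and $0$ with multiplicity $\nu - 1$: $\det(aI + bU) = a^{\nu-1}(a + \nu b)$. The sketch checks this identity exactly over the rationals; it does not reproduce the paper's coefficients, and the function names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign, result = 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the matrix is singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        result *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return sign * result


def a_identity_plus_b_ones(a, b, nu):
    """The nu x nu matrix a*I + b*U, where U has all entries equal to 1."""
    return [[a + b if i == j else b for j in range(nu)] for i in range(nu)]


def predicted_det(a, b, nu):
    """a^(nu-1) (a + nu b): U has eigenvalue nu once and 0 with multiplicity nu - 1."""
    return Fraction(a) ** (nu - 1) * (Fraction(a) + nu * Fraction(b))


print(det(a_identity_plus_b_ones(Fraction(3), Fraction(1, 2), 4)),
      predicted_det(3, Fraction(1, 2), 4))  # 135 135
```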
Lemma A.1. There exists $0 < r_0 < 1$ such that, for any $S^+ \in V(r)$ with $0 < r \le r_0$ (see Definition 1.1), the following holds true: […] Here $v$ and $w_j$ are defined in (8.28) (see also (7.5)), $\omega$ in (1.10), and $\delta$ is some appropriately small pure constant.
Proof of (A.3). The case $|\ell| = 1$ is trivial. For $|\ell| = 2$ we use the fact that the $j_i$ are all distinct. For $|\ell| = 3, 5$ we pass to the variables (A.1) and we get […] We notice that $L(0, 1) = |\sum_i \ell_i| \ge 1$ (since $\sum_i \ell_i$ has the same parity as $|\ell|$) so, by continuity, there exists $0 < r_0 < 1$ such that $L(x, p) > 1/2$ for all $0 \le x < r_0$, $|p_i - 1| \le r_0$. This implies the result.
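The parity argument can be checked by brute force. The sketch assumes $|\ell| = \sum_i |\ell_i|$; since $\ell_i \equiv |\ell_i| \pmod 2$, the sum $\sum_i \ell_i$ has the parity of $|\ell|$ and therefore cannot vanish when $|\ell|$ is odd. Function names are hypothetical.

```python
from itertools import product


def vectors_with_norm(nu, n):
    """All ell in Z^nu with |ell| = sum_i |ell_i| equal to n."""
    return [ell for ell in product(range(-n, n + 1), repeat=nu)
            if sum(abs(x) for x in ell) == n]


def parity_matches(nu, n):
    """True if sum_i ell_i has the same parity as |ell| for every ell with |ell| = n."""
    return all((sum(ell) - n) % 2 == 0 for ell in vectors_with_norm(nu, n))


def min_abs_sum(nu, n):
    """min over |ell| = n of |sum_i ell_i|; it is at least 1 whenever n is odd."""
    return min(abs(sum(ell)) for ell in vectors_with_norm(nu, n))


print(min_abs_sum(3, 3), min_abs_sum(3, 4))  # 1 0
```

For even $|\ell|$ the sum can vanish (for instance $\ell = (2, -2, 0)$), which is why the odd cases are handled by this argument.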
Proof of (A.4). We first note that (recall (8.28)) […] Consider the change of variables (A.1). One can note that the matrix $A$ in (4.6) at $p_i = 1$, $i = 2, \dots, \nu$, is given by […] (A.9). We note that, for $x = 0$, one has $|\det($ […]

Proof of (A.5), (A.6). We systematically use the variables (A.1). We define the diagonal matrix $\mathrm{diag}(\omega_i)$ […] Then $w_j$ in (8.28) can be written as […] where $e_i$ is the $i$-th vector of the canonical basis of $\mathbb{R}^\nu$. In the new coordinates $(x, p)$ in (A.1), and setting $t = j\,(j_1)^{-1}$, this reads […] By direct computation we have […] with $C$ independent of $t$, for $(x, p)$ in a neighborhood of $(0, 1)$. Moreover, $\sup_t |f(t, x)| \le 3|x|^2/2$. Thus, for $(x, p)$ sufficiently close to $(0, 1)$, we have that […] which implies the claim (A.10). Hence, setting $s = k\,(j_1)^{-1}$, we have […] where $h(t, s) := t\,(t^4 + t^2 + 1)^{-1} - s\,(s^4 + s^2 + 1)^{-1}$.
We are left to deal with the case $|j| \le C(S)$. We write (A.12) as $P(j_i, j)/Q(j_i, j)$, where $P$ and $Q$ are polynomials with integer coefficients and $Q$ has no real zeros. We remark that $1 < Q < C(S)$ due to the condition $|j| \le C(S)$. If $P \neq 0$, then $|P| \ge 1$ and again (A.12) is larger than some $K(S)$; we conclude that $T_{\ell jk} = \emptyset$ by reasoning as in the case of large $j$. Now we study the case $P = 0$. For fixed $\ell$ in (A.12), $P$ has degree four in $j$, so the condition $P = 0$ allows at most four values of $j$, which we call $j_1, j_2, j_3, j_4$. For $P = 0$ (that is, (A.12) $= 0$) we have […] where $v$ is in (8.28) and $\kappa_j = w_j \cdot \xi$ with $\kappa_j$ in (7.5). These are a finite number (depending only on $\nu$) of linear functions of $\xi$. We compute the derivative in $\xi$, which is […] where $j \in \{j_1, j_2, j_3, j_4\}$. Now (A.5) implies that the quantity (A.14) is bounded from below by a constant depending on $S$. This lower bound and Fubini's theorem imply that $|T_{\ell jk}| \le C(S)\,\varepsilon^{2(\nu-1)}\gamma$ for some $C(S) > 0$ depending on $S$. By the discussion above we have […] where $K(S) > 0$ and $C(S) > 0$ are constants depending on the set $S$. This implies the claim.

Appendix B. Normal form identification
Proof of Theorem 7.9. The core of Theorem 7.9 is to show that the terms in the l.h.s. of (7.78), which are obtained through a rather complicated set of bounded changes of coordinates, coincide with those obtained by a purely formal full Birkhoff normal form procedure. In [32] it has been shown that, at the purely formal level, the latter is well defined and non-resonant, i.e. the resonant Hamiltonian is supported only on trivial resonances as in (3.11). We proceed as follows.
Step 1. The first step is to show that the resonant terms at order $\varepsilon^2$ of the full Birkhoff normal form coincide with those obtained by using the weak BNF procedure of Sect. 3, passing to action-angle variables and finally applying a formal linear BNF.
Step 2. To conclude, we note that the bounded maps applied in Sects. 7.1 and 7.2 are, as functions of $\varepsilon$, of class $C^3$ with values in $\mathcal{L}(H^s, H^{s-3})$. Therefore the Taylor expansion of the Hamiltonian associated to the operators in (7.69) coincides with the Lie series of the generator up to order $\varepsilon^2$ (see also (7.71)). We then show that this Lie series coincides, up to order $\varepsilon^2$, with the one obtained in Step 1. Even though we tailor our proof to the particular set of changes of variables used in Sect. 7, the argument is quite general: in essence, the linear BNF up to order $\varepsilon^2$ is coordinate independent.
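Step 2 relies on the fact that the Taylor expansion in $\varepsilon$ of a conjugated operator agrees with the Lie series of its generator. In a finite-dimensional model (an assumption: matrices $A$, $L$ in place of the paper's operators), $e^{\varepsilon A} L e^{-\varepsilon A} = L + \varepsilon[A, L] + \frac{\varepsilon^2}{2}[A,[A,L]] + O(\varepsilon^3)$. The sketch compares the two sides numerically; halving $\varepsilon$ should divide the error by roughly eight. The matrices and function names are illustrative.

```python
# Model (assumption): finite matrices A (generator) and L (operator to conjugate).


def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y, c=1.0):
    """Return X + c * Y."""
    return [[X[i][j] + c * Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]


def scale(X, c):
    return [[c * v for v in row] for row in X]


def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return add(mul(X, Y), mul(Y, X), -1.0)


def expm(X, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small ||X||)."""
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(mul(term, X), 1.0 / k)  # X^k / k!
        result = add(result, term)
    return result


def conjugate(A, L, eps):
    """e^{eps A} L e^{-eps A}."""
    return mul(mul(expm(scale(A, eps)), L), expm(scale(A, -eps)))


def lie_series_2(A, L, eps):
    """L + eps [A, L] + eps^2 / 2 [A, [A, L]]."""
    c1 = comm(A, L)
    c2 = comm(A, c1)
    return add(add(L, c1, eps), c2, eps ** 2 / 2)


def truncation_error(A, L, eps):
    D = add(conjugate(A, L, eps), lie_series_2(A, L, eps), -1.0)
    return max(abs(v) for row in D for v in row)


A = [[0.0, 1.0, 0.0], [-1.0, 0.0, 2.0], [0.0, -2.0, 0.0]]
L = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
for eps in (0.02, 0.01):
    print(eps, truncation_error(A, L, eps))  # expected to scale like eps**3
```

For these matrices the leading error term is $\frac{\varepsilon^3}{6}[A,[A,[A,L]]]$, whose largest entry has absolute value $62$.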
Let us now perform the same Birkhoff procedure by first cancelling the terms of degree ≤ 1 in z (weak BNF) and then the terms of degree two (linear BNF).
By the discussion in Sect. 3 (recall the notation of Proposition 3.6), after two steps of weak Birkhoff normal form the Hamiltonian of degree less than or equal to 4 is […] Here $Z^{(3,2)}$, $H$ […] where $H_1$, $H_1$ are defined in (3.19). The monomials of degree greater than 4 are not involved in this computation, so we omit them. It is important to notice that, by direct inspection, $F^{(3,\le 1)} = F^{(3,\le 1)}$ defined in formula (3.21). By Proposition 3.6, the same change of variables puts one of the constants of motion (say $K_3$; we drop the subindex 3) into normal form, […] The step of formal linear BNF consists in applying the formal change of variables generated by $F^{(3,2)} := [\mathrm{ad}\,H^{(2)}]^{-1} H^{(3,2)}$.
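The formal homological equation $F^{(3,2)} = [\mathrm{ad}\,H^{(2)}]^{-1} H^{(3,2)}$ can be sketched on monomials when $H^{(2)}$ is diagonal. The sketch assumes $H^{(2)} = \sum_j \lambda_j |z_j|^2$ and the convention $\{H^{(2)}, z^\alpha \bar z^\beta\} = \mathrm{i}\,\lambda\cdot(\alpha - \beta)\, z^\alpha \bar z^\beta$ (signs vary between references); $\mathrm{ad}\,H^{(2)}$ is then inverted coefficientwise away from its kernel, and the kernel part is returned separately. This is a toy model with hypothetical names and data, not the Degasperis-Procesi Poisson structure.

```python
# Toy model (assumptions): normal variables z_0, ..., z_{n-1}, H2 = sum_j lam_j |z_j|^2,
# and {H2, z^alpha zbar^beta} = i lam.(alpha - beta) z^alpha zbar^beta.
# A polynomial is a dict {(alpha, beta): coefficient}.
from typing import Dict, Sequence, Tuple

Exps = Tuple[int, ...]
Poly = Dict[Tuple[Exps, Exps], complex]


def ad_eigenvalue(lam: Sequence[float], alpha: Exps, beta: Exps) -> complex:
    """Eigenvalue of ad H2 on the monomial z^alpha zbar^beta."""
    return 1j * sum(lj * (a - b) for lj, a, b in zip(lam, alpha, beta))


def invert_ad(lam: Sequence[float], H: Poly, tol: float = 1e-12) -> Tuple[Poly, Poly]:
    """Return (F, Z): F = [ad H2]^{-1} applied to the non-resonant part of H,
    Z = resonant part of H (monomials in the kernel of ad H2)."""
    F: Poly = {}
    Z: Poly = {}
    for (alpha, beta), c in H.items():
        mu = ad_eigenvalue(lam, alpha, beta)
        if abs(mu) > tol:
            F[(alpha, beta)] = c / mu
        else:
            Z[(alpha, beta)] = c
    return F, Z


lam = (1.0, 2.5, 3.5)
H3: Poly = {
    ((2, 0, 0), (0, 1, 0)): 0.5,          # z0^2 zbar1: 2*1.0 - 2.5 = -0.5, non-resonant
    ((1, 1, 0), (0, 1, 0)): -1.0 + 0.5j,  # z0 z1 zbar1: 1.0, non-resonant
    ((1, 1, 0), (0, 0, 1)): 2.0,          # z0 z1 zbar2: 1.0 + 2.5 - 3.5 = 0, resonant
}
F, Z = invert_ad(lam, H3)
```

Whether the kernel part contains only trivial resonances depends on the frequencies; this is exactly the question that Theorem 7.9 settles for the paper's normal form.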
Again, by direct inspection, one can note that $F^{(3,2)} \equiv F^{(3,2)}$. We now want to pass to the action-angle variables introduced in (4.7). Since the rescaling with the parameter $\varepsilon$ is covariant under the change of variables that we use, we consider, instead of $A_\varepsilon$, the symplectic change of variables $A_1 := A_\varepsilon|_{\varepsilon=1}$. Recalling that $\theta = \varphi$ and setting $F^{(3,2)}$ […], we have that (B.10) reads […]

The rigorous procedure of Subsections 7.1, 7.2 and the linear BNF. Since the r.h.s. of (B.12) is the r.h.s. of (7.78), it remains to show that the $\varepsilon^2$-terms of the Hamiltonian associated to the operator $L_7$ in (7.69) coincide with the l.h.s. of (B.12). We remark that the operator $L_7$ has been constructed through a rigorous procedure which also provides "tame" estimates of the remainder of higher order in $\varepsilon$. As already explained, the maps $B$, $\Upsilon_i$, $i = 1, 2$, are, as functions of $\varepsilon$, of class $C^3$ with values in $\mathcal{L}(H^s, H^{s-3})$. Therefore the Taylor expansion of the Hamiltonian associated to the operators in (7.69) coincides with the Lie series of the generator up to order $\varepsilon^2$ (see also (7.71)). Let us then Taylor expand the Hamiltonian $K^{(2)}$ […] (B.14). Using the Jacobi identity we have […] Since $\mathrm{Ker}(H^{(2)})$ is trivial on cubic monomials, we deduce that […] and this concludes the proof.
Proof. The strategy is the following.
In particular $q_+$ and $Q$ satisfy bounds like (C.18)-(C.20). By Lemma C.1 we have (recall (C.7), (C.8)) […] We define the remainder $Q_+ := Q + Q$. To conclude the proof we show that $Q$ satisfies the bounds (C.19) and (C.20). We note that j := e i j x , χ (1) j := L 0 −1 e i j x , j ) L 2 χ (2) j , g (2) j := L 0 −1 e i j x , χ (2) j = e i j x , j := ( −1 ) * e i j x , χ […] where $a^{(i)}$, $q^{(i)}$ have the form (6.36), (6.34), respectively, and $a^{(\ge 4)}$, $q^{(\ge 4)}$ satisfy estimates like (7.42). The remainders $Q^{(i)}$ are almost diagonal (see Def. 6.4) and $Q^{(\ge 4)}$ satisfies estimates like (7.42). Assume also that $\beta$ in (7.10) has an expansion as in (C.23). Then the symbols $a_+$, $q_+$ and the operator $Q_+$ in (C.16) admit the same expansion in $\varepsilon$ as in (C.23). This fact can be deduced by following the proof of Proposition 3.5 in [33]. More precisely, one reasons as follows. First of all, by linearity, the conjugate of a sum as in (C.23) is the sum of the conjugates. The conjugate of a $Q^{(i)}$ in (C.23) under the flow of (7.11) is a smoothing remainder, by Lemma B.10 in [33]. Of course, in order to obtain homogeneous terms of degree $\le 3$ in $\varepsilon$ we must Taylor expand the flow; following Remark 7.2, this implies that the remainders are in $L_{\rho-3,p}$ (since $\rho$ is arbitrary, this is not a problem). The conjugation of the pseudo-differential operators $J \circ a(\varphi, x)$ and $\mathrm{Op}(q(\varphi, x, \xi))$ is based on the Egorov Theorem 3.4 in [33]. This is a constructive perturbative scheme, so we can Taylor expand up to order three. In conclusion, (C.18) holds for each term in the homogeneity expansion, possibly with a larger $\tilde\sigma$. On the other hand, expanding the remainder $Q_+$ gives an estimate like (C.19) but with $\rho$ replaced by $\rho - 3$.
Remark C. 4. We point out that the remainder in Proposition C.2 is of the order of β, i.e. of the generator of the torus diffeomorphism. This will create problems in fulfilling the smallness conditions in the KAM reducibility scheme of Sect. 7.3, where a term is perturbative if it is small w.r.t. γ 3/2 ( and γ ε 2 ).

C.2 Classes of "smoothing" operators.
In the first step of our reduction procedure (see Theorem 7.1) we need to work with operators which are pseudo-differential up to a remainder in the class $L_{\rho,p}$ defined as follows. This class of operators, smoothing in space, was introduced in [33].
Definition C.5. Fix $s_0 \ge (\nu + 1)/2$ and $p, S \in \mathbb{N}$ with $s_0 \le p < S$, possibly with $S = +\infty$. Fix $\rho \in \mathbb{N}$ with $\rho \ge 3$ and consider any subset $O$ of $\mathbb{R}^\nu$. We denote by $L_{\rho,p} = L_{\rho,p}(O)$ the set of linear operators $A = A(\omega)\colon H^s(\mathbb{T}^{\nu+1}) \to H^s(\mathbb{T}^{\nu+1})$, $\omega \in O$, with the following properties: • The operator $A$ is Lipschitz in $\omega$, […] (C.29) We define, for $0 \le b \le \rho - 3$, […] (C.30) By construction one has that $M^\gamma$ […] We shall also deal with "tame" operators in the following class.
(C.36) Proof. The lemma follows by reasoning as in the proof of Lemma B.2 in [33] and using the explicit formula (6.16).
We conclude this section by showing the connection between the class L ρ, p and the class C −1 in Definition C.6.
for $\mathbf{b} \in \mathbb{N}^\nu$ with $|\mathbf{b}| = b$, following the same reasoning as above, one gets (C.37). In order to prove item (ii) one can follow almost word by word the proof of Lemma … One concludes the proof of (C.39) and (C.40) following the same ideas used above. For further details we refer to the proof of Lemma B.1 in [33].