1 Introduction and Main Results

The main goal of this work is to introduce a unified way for proving “long time stability” of the action variables for perturbations of completely integrable Hamiltonian systems which belong to a large class of function spaces. We will limit ourselves here to Hölder perturbations of analytic systems, but our method is flexible enough to be adapted to many other settings.Footnote 1

The effective stability theory for nearly-integrable Hamiltonian systems was initiated by the pioneering work of Littlewood [14] and reached a first major milestone in the seventies with the work of Nekhoroshev [19]; it was then developed by many authors. The usual setting is that of Hamiltonian systems of the form

$$\begin{aligned} H(I,\theta )=h(I)+ f(I,\theta ), \end{aligned}$$
(1.1)

where \((I,\theta )\in \mathbb {R}^n\times \mathbb {T}^n\) are the action-angle variables and f is small with respect to h. In Nekhoroshev’s work the Hamiltonian H is analytic and h satisfies a steepness condition (see Definition 1.1 below). The theory has since been developed in various settings: H can be assumed to be Gevrey (which includes the analytic case) or \(C^k\) with integer \(k\ge 2\), while h can be assumed to be convex or quasi-convex (see for example [18] or [5]).

The norm of f, relative to the function space at hand, is denoted by \(\varepsilon \). For systems such as (1.1), the previous results assert that the action variables are confined in a ball of radius \(\mathbf{R}(\varepsilon )\) centered at the initial action during a time \(\mathbf{T}(\varepsilon )\), provided that \(\varepsilon \) is smaller than some threshold \(\mathbf{E}\). We say that \(\mathbf{R}(\varepsilon )\) is the confinement radius, \(\mathbf{T}(\varepsilon )\) is the stability time and \(\mathbf{E}\) is the applicability threshold. The remarkable fact is that – h being given – the results depend only on the norm of f and not on its particular form.

Much attention has been paid in the literature to obtaining good estimates for the quantities \(\mathbf{R}(\varepsilon )\) and \(\mathbf{T}(\varepsilon )\) in the different frameworks. As we shall see in the sequel, in the setting of Hölder perturbations of analytic integrable systems, the method we introduce in this paper yields sharper estimates than those found in the literature so far. Before stating our results rigorously, however, it is useful to have an overview of the classical results on the effective stability of near-integrable Hamiltonian systems.

1.1 The classical results

Let us briefly describe the classical abstract results. In the 70’s Nekhoroshev proved his seminal theorem [19], which asserts that for a steep real-analytic function h and for any real-analytic perturbation f with analytic extension to a complex domain \({{\mathcal {D}}}\), all solutions are stable at least over exponentially long time intervals. Namely, there exist positive exponents a, b and a positive threshold \(\mathbf{E}\), depending only on h, such that if \({\left| f\right| }_{{{\mathcal {D}}}}\le \mathbf{E}\), then any initial condition \((I_0,\theta _0)\) gives rise to a solution \(\big (I(t),\theta (t)\big )\) which is defined at least for \({\left| t\right| }\le \exp \big (c(1/\varepsilon )^a\big )\) and satisfies \({\left| I(t)-I_0\right| }\le C\varepsilon ^b\) in that range. Here \({\left| f\right| }_{{{\mathcal {D}}}}\) is the \(C^0\) sup-norm on the domain \({{\mathcal {D}}}\) and c, C are positive constants which also depend only on h. With our notation, for these systems:

$$\begin{aligned} \mathbf{T}(\varepsilon )=\exp (c(1/\varepsilon )^a),\qquad \mathbf{R}(\varepsilon )=C\varepsilon ^b, \end{aligned}$$
(1.2)

while the expression of the threshold \(\mathbf{E}\) is quite difficult to obtain explicitly,Footnote 2 see [19]. Since the constants c and C are less significant than the exponents, we will omit them in the subsequent description.

Nekhoroshev’s proof is based on the construction of a partition (a “patchwork”) of the phase space into zones of approximate resonances of different multiplicities, over which one can construct adapted normal forms. The global stability result necessitates a very delicate control of the size and disposition of the elements of the patchwork in order to produce a “dynamical confinement” preventing the orbits from fast motions along distances larger than the confinement radius (see below for a discussion).

In the convex case, as noticed in [11] and [4], a shrewd use of energy conservation leads to a much simpler and “physical” way to confine the orbits. This gave rise to two distinct series of works, originating in the articles of Lochak [15] - where the simultaneous approximation method was introduced - and Pöschel [23] - where the construction of Nekhoroshev’s patchwork was made much easier - both relying on the convexity or quasi-convexity of the integrable Hamiltonian.

The simplicity of these methods made it possible to prove that the Nekhoroshev Theorem in the analytic case holds with

$$\begin{aligned} \mathbf{T}(\varepsilon )=\exp (c(1/\varepsilon )^{1/2n}),\qquad \mathbf{R}(\varepsilon )=C\varepsilon ^{1/2n}, \end{aligned}$$
(1.3)

if h is assumed to be quasi-convex (see [15, 17, 23]). Moreover, besides the global result, one can state local results for neighborhoods of resonant surfaces. For \(m\in \{1,\ldots ,n-1\}\), consider a sublattice \(\Lambda \subset \mathbb {Z}^n_K:=\{k\in \mathbb {Z}^n:|k|_1\le K\}\) of rank m and the resonant subset \({{\mathcal {M}}}_\Lambda :=\{I: k\cdot \nabla h(I)=0\ \ \forall k\in \Lambda \}\). Then, for all trajectories starting at a distance of order \(\varepsilon ^{1/2}\) from \({{\mathcal {M}}}_\Lambda \), one gets larger stability exponents, namely \(a=b=1/(2(n-m))\). Moreover, in the resonant block \({{\mathcal {B}}}_\Lambda \) (which is obtained by eliminating from \({{\mathcal {M}}}_\Lambda \) all the intersections with other resonant subsets \({{\mathcal {M}}}_{\Lambda '}\), with \(\text {rank }\Lambda '=m+1\)) one can even take \(a=1/(2(n-m)),\ b=1/2\).
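The exponents quoted above for the quasi-convex analytic case can be tabulated with a small helper. This is only an illustrative sketch: the function name and its arguments are hypothetical, and the convention that `m = 0` stands for the global estimate (1.3) is ours.

```python
from fractions import Fraction


def quasi_convex_exponents(n, m=0, in_block=False):
    """Return the exponents (a, b) of T = exp(c (1/eps)^a), R = C eps^b.

    m = 0 is used here (our convention) for the global estimate (1.3);
    1 <= m <= n-1 gives the local exponents near a resonance of rank m.
    in_block=True selects the resonant-block variant with b = 1/2.
    """
    if not 0 <= m <= n - 1:
        raise ValueError("m must satisfy 0 <= m <= n-1")
    if in_block and m == 0:
        raise ValueError("the block estimate is only stated for m >= 1")
    a = Fraction(1, 2 * (n - m))
    b = Fraction(1, 2) if in_block else a
    return a, b


if __name__ == "__main__":
    n = 4
    print("global:", quasi_convex_exponents(n))
    for m in range(1, n):
        print(m, quasi_convex_exponents(n, m), quasi_convex_exponents(n, m, in_block=True))
```

Exact fractions are used so that the exponents can be compared without rounding.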

As alluded to above, long time stability does not require a priori the analyticity of the Hamiltonian at hand. For general Gevrey quasi-convex systems,Footnote 3 the fast decay of the Fourier coefficients also yields exponentially long stability times. Namely, for \(\beta \)-Gevrey systems (where \(\beta \) is the Gevrey exponent) it is proved in [18] that

$$\begin{aligned} \mathbf{T}(\varepsilon )=\exp \big (c/\varepsilon ^{1/(2n\beta )}\big ),\qquad \mathbf{R}(\varepsilon )=C\varepsilon ^{1/(2n\beta )}. \end{aligned}$$

The proof is based on a direct construction of normal forms for Gevrey systems. This study was initiated by M. Herman for proving the optimality of the stability exponents by constructing explicit examples taking advantage of the flexibility of the Gevrey category, see below.

Soon after, finitely differentiable systems were investigated in [5] using a direct implementation of Lochak’s scheme in this setting, which yields the estimates

$$\begin{aligned} \mathbf{T}(\varepsilon )=c/\varepsilon ^{(\ell -2)/(2n)},\qquad \mathbf{R}(\varepsilon )=C\varepsilon ^{1/(2n)} \end{aligned}$$

for quasi-convex \(C^\ell \) systems with integer \(\ell \ge 2\). On the other hand, the stability of \(C^\ell \) systems, with \(\ell \) an integer such that \(\ell \ge \ell ^*n+1\) for some suitable \(\ell ^*\ge 1, \ell ^*\in {{\mathbb {N}}}\), satisfying a property known as the Diophantine-Morse condition,Footnote 4 was investigated in [6], where the values

$$\begin{aligned} \mathbf{T}(\varepsilon )=c/\varepsilon ^{\ell ^*/[3(4(n+1))^n]},\qquad \mathbf{R}(\varepsilon )=C\varepsilon ^{1/(4(n+1))^n} \end{aligned}$$

were found.

The case \(\ell =+\infty \) has been studied in [1], where the authors find that, in the case \(h(I)=I^2/2\) and for fixed \(b\in (0,1/2)\), for any \(M>0\) there exists \(C_M>0\) such that

$$\begin{aligned} \mathbf{T}(\varepsilon )=\frac{C_M}{\varepsilon ^M},\qquad \mathbf{R}(\varepsilon )=C_M\varepsilon ^{b}\ . \end{aligned}$$

The result is achieved by implementing an innovative global normal form in Pöschel’s framework.

Finally, we also refer to the recent work [7] and references therein for much more information about stability in various functional classes.

1.2 Purpose of the work

The objective of this paper is to make a systematic use of analytic smoothing methods to derive normal forms in a very simple way - whatever the regularity of the Hamiltonians at hand - from the usual analytic ones. This way we get maximal flexibility to adapt the different long-time stability proofs to a large class of function spaces. We will investigate here only the case of Hölder differentiable Hamiltonians, but our method extends to any steep functions belonging to any regularity class which admits an analytic smoothing. More precisely, the proposed strategy (see Sect. 4.3) allows us to prove, in a very simple way, the first Nekhoroshev-type stability result for Hölder steep Hamiltonians with presumed sharp exponentsFootnote 5. In this case one cannot expect to get more than polynomial stability times relative to the size \(\varepsilon \) of the perturbation [5]. In the course of the proof we need to adjust in a rather unusual way the size of the various parameters: ultraviolet cutoff and, in an essential way, the analyticity width, as a function of the size \(\varepsilon \) of the perturbation.

1.3 Main results

Let us fix the main definitions and assumptions. In the following, given \(\nu \in {\left\{ 1,\ldots ,\infty \right\} }\), we denote by \(|\cdot |_\nu \) the corresponding \(\ell ^\nu \)-norm in \(\mathbb {R}^n\) or \(\mathbb {C}^n\). We denote by \( B_\nu (I_0,R)\) the open ball centered at \(I_0\) of radius R for the norm \({\left| \, \cdot \,\right| }_\nu \) in \(\mathbb {R}^n\).

Consider a Hamiltonian of the form (1.1), where we assume, for the sake of simplicity, that the unperturbed part h is analyticFootnote 6 while only the perturbation f is Hölder, so:

$$\begin{aligned} h\in C^\omega (B_\infty (0,R)_{\rho _0}), \qquad f\in C^\ell (B_\infty (0,R)\times \mathbb {T}^n), \end{aligned}$$
(1.4)

where \(B_\infty (0,R)_{\rho _0}\) is the complex extension of analyticity width \(\rho _0\ge 1\) of \(B_\infty (0,R)\), and \(\ell \in (1,+\infty )\) (meaning that f is Hölder differentiable when \(\ell \) is not an integer, see Sect. 3 for a brief overview on this class of functions). The small parameter is

$$\begin{aligned} \varepsilon := |f|_{C^\ell (B_\infty (0,R)\times \mathbb {T}^n)}, \end{aligned}$$
(1.5)

(see (3.2) for a definition of the Hölder norm). We denote by \({\omega }=\nabla h:\mathbb {R}^n\rightarrow \mathbb {R}^n\) the action-to-frequency map attached to h.

We will assume that the Hessian of h is uniformly bounded from above:

$$\begin{aligned} M := \sup _{I\in B_\infty (0,R)_{\rho _0}} \left\Vert D^2 h(I)\right\Vert _{op} < \infty , \end{aligned}$$
(1.6)

where \(\left\Vert \ \right\Vert _{op} \) stands for the operator norm induced by the Hermitian norm on \(\mathbb {C}^n\).

We will also assume that the Hamiltonian h is steep according to the following definition.

Definition 1.1

(Steepness). Fix \(\delta >0\). A \(C^1\) function \(h: B_\infty (0, R+\delta )\rightarrow \mathbb {R}\) is steep with steepness indices \({\varvec{\alpha }}_1,\ldots ,{\varvec{\alpha }}_{n-1}\ge 1\) and steepness coefficients \(C_1,\ldots , C_{n-1}, \delta \) if:

  1. (1)

    \(\inf _{B_\infty (0,R)} {\left| {\omega }(I)\right| }_2 > 0\);

  2. (2)

    for any \(I \in B_\infty (0,R)\) and any m-dimensional subspace \(\Gamma \) orthogonal to \({\omega }(I)\), with \(1\le m < n\):

    $$\begin{aligned} \max _{0\le \eta \le \xi } \, \min _{u\in \Gamma ,{\left| u\right| }_2=\eta } {\left| \pi _\Gamma {\omega }(I+u)\right| }_2 > C_m \xi ^{{\varvec{\alpha }}_m},\quad \forall \xi \in (0,\delta ], \end{aligned}$$
    (1.7)

    where \(\pi _\Gamma \) stands for the orthogonal projection on \(\Gamma \).

Remark 1.1

Note that a uniformly strictly convex function is steep with steepness indices equal to 1.
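A short sketch of why condition (1.7) holds with index 1 in the convex case, under the assumption (made here only for illustration) that h is \(\mu \)-strongly convex, i.e. \(\big ({\omega }(I')-{\omega }(I)\big )\cdot (I'-I)\ge \mu {\left| I'-I\right| }_2^2\) for some \(\mu >0\). Let \(\Gamma \) be orthogonal to \({\omega }(I)\) and take \(u\in \Gamma \) with \({\left| u\right| }_2=\eta \). Since \(u\in \Gamma \) and \({\omega }(I)\cdot u=0\),

$$\begin{aligned} \pi _\Gamma {\omega }(I+u)\cdot u={\omega }(I+u)\cdot u=\big ({\omega }(I+u)-{\omega }(I)\big )\cdot u\ge \mu \,\eta ^2 , \end{aligned}$$

so that, by the Cauchy–Schwarz inequality, \({\left| \pi _\Gamma {\omega }(I+u)\right| }_2\ge \mu \,\eta \). Choosing \(\eta =\xi \) in the maximum then gives

$$\begin{aligned} \max _{0\le \eta \le \xi } \, \min _{u\in \Gamma ,{\left| u\right| }_2=\eta } {\left| \pi _\Gamma {\omega }(I+u)\right| }_2 \ge \mu \,\xi > C_m\, \xi ,\quad \forall \xi \in (0,\delta ], \end{aligned}$$

for any choice of \(C_m<\mu \), which is (1.7) with \({\varvec{\alpha }}_m=1\). This sketch only addresses item (2) of Definition 1.1; item (1) is a separate requirement on the frequency map.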

Remark 1.2

The steepness condition is generic in the space of jets of sufficiently regular functions (see [20] for the general discussion and [2, 25] for sufficient conditions for steepness in the space of jets of order four and five respectively).

Our main theorem is the following.

Theorem 1.1

(Stability estimates in the steep case). Consider a near-integrable Hamiltonian system (1.1) satisfying (1.4) and assume \(\ell \ge n+1\).Footnote 7 Suppose that h is steep in \(B_\infty (0,R)\) with steepness indices \({\varvec{\alpha }}:= ({\varvec{\alpha }}_1,\ldots ,{\varvec{\alpha }}_{n-1})\) and set:

$$\begin{aligned} {\mathtt {a}} := \frac{\ell -1}{2n{\varvec{\alpha }}_1\times \cdots \times {\varvec{\alpha }}_{n-2}}+\frac{1}{2}, \qquad {\mathtt {b}} := \frac{1}{2n{\varvec{\alpha }}_1\times \cdots \times {\varvec{\alpha }}_{n-1}} . \end{aligned}$$

Then, there exist positive constants \({\mathbf{E}}={\mathbf{E}}(n,\ell ,{\varvec{\alpha }}),{{\texttt {C}}''_\mathbf{I}}:={{\texttt {C}}''_\mathbf{I}}(n,\ell ,{\varvec{\alpha }})\), \({{\texttt {C}}''_{\mathbf{T}}}:={{\texttt {C}}''_{\mathbf{T}}}(n,\ell ,{\varvec{\alpha }})\) such that, for \(\varepsilon \le \mathbf{E}\), the radius and time of confinement relative to any initial condition in the set \( B_{\infty }(0, R/4)\) satisfy:

$$\begin{aligned} \mathbf{R}(\varepsilon ) \le {{\texttt {C}}''_{\mathbf{I}}} {\varepsilon ^{\texttt {b}}} , \qquad \mathbf{T}(\varepsilon ) \le {{\texttt {C}}''_{\mathbf{T}}}\frac{ 1}{|\ln \varepsilon |^{\ell -1}\,{\varepsilon ^{\texttt {a}}}}\,. \end{aligned}$$
(1.8)

Remark 1.3

  • The presence of the logarithm in (1.8) comes from the fact that in our method we have some freedom to fix the analyticity width depending on \(\varepsilon \), in contrast with the classical analytic setting. We refer the reader to Remark 5.1, where this comment is contextualized, the dependence of the analyticity width on \(\varepsilon \) is made precise and a qualitative justification is given.

  • If we set \({\varvec{\alpha }}_1=\cdots ={\varvec{\alpha }}_{n-1}=1\) (i.e. the convex case) we obtain better estimates than in [5].

  • Our proof relies on the geometric construction of the geography of resonances introduced in [13], which is appropriate only for Hamiltonians in \(n \ge 3\) degrees of freedom. Here too we shall restrict to this setting, the 2 d.o.f. isoenergetic non-degenerate case being easily managed through KAM theory. A specific construction should be implemented to treat the peculiarity of the isoenergetic degenerate 2 d.o.f. case. This study is in progress in a forthcoming work.
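The exponents of Theorem 1.1 and the comparison with [5] in the convex case can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the logarithmic factor and the constants in (1.8) are ignored, and the time exponent attributed to [5] is the value \((\ell -2)/(2n)\) quoted in Sect. 1.1.

```python
from fractions import Fraction
from math import prod


def steep_exponents(n, ell, alphas):
    """Exponents (a, b) of Theorem 1.1 for steepness indices alphas = (alpha_1, ..., alpha_{n-1})."""
    alphas = [Fraction(x) for x in alphas]
    if n < 3 or len(alphas) != n - 1:
        raise ValueError("need n >= 3 and exactly n-1 steepness indices")
    if Fraction(ell) < n + 1 or any(x < 1 for x in alphas):
        raise ValueError("need ell >= n+1 and all indices >= 1")
    # a uses the product alpha_1 ... alpha_{n-2}; b uses alpha_1 ... alpha_{n-1}.
    a = (Fraction(ell) - 1) / (2 * n * prod(alphas[: n - 2])) + Fraction(1, 2)
    b = 1 / (2 * n * prod(alphas))
    return a, b


def convex_time_exponent_ref5(n, ell):
    """Time exponent (ell - 2)/(2n) quoted in Sect. 1.1 for quasi-convex C^ell systems."""
    return (Fraction(ell) - 2) / (2 * n)


if __name__ == "__main__":
    n, ell = 3, 4
    a, b = steep_exponents(n, ell, (1, 1))  # convex case: all indices equal to 1
    print("a =", a, " b =", b, " exponent from [5] =", convex_time_exponent_ref5(n, ell))
```

In the convex case the difference between the two time exponents is \(1/2+1/(2n)\), independently of \(\ell \).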

1.4 Prospects

The sharpness of the exponents in Theorem 1.1 should be proved in the same way as in the case of convex systems. The first attempt to tackle this problem led to work in the Gevrey category instead of the analytic one and construct examples with unstable orbits, which experience a drift in action of the same order as the confinement radius within a time of the same order as the stability time, see [18]. It was then realized that the initial conjecture in quasi-convex analytic systems (\(a\sim 1/2n\), see [10] and Lochak [15]) was in fact incorrect: as proved in [8] using a purely topological argument together with the previous remark on the local exponents near simple resonances, one can choose \(a=1/(2(n-1))\) as a global stability exponent for \(\mathbf{T}(\varepsilon )\). This result was improved soon after with \(a\sim 1/(2(n-2))\) (see [26]). The construction of unstable systems proving the optimality of these latter exponents was achieved in [16, 18, 26]. A remarkable fact is that the unstable mechanism introduced by Arnold in the 60’s, with its subsequent improvements, is exactly what is needed to produce the unstable examples in the quasi-convex case.

As for the steep case, a careful construction of the geography of resonances leads with strong evidence to the conjecture that the exponents \(a=1/(2n{\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-2})\) and \(b=1/(2n{\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-1})\) are sharp (see ref. [13]). The question of constructing explicit examples with unstable orbits proving this sharpness is still open and is perhaps the last challenging problem in the general long time stability theory, probably relying on new Arnold diffusion ideas.

The paper is organized as follows: in the next section we give a short overview of the classical methods with particular attention to the geometry of resonant blocks, on which the present work strongly relies. Next we define the functional setting. In Sect. 4 we introduce the analytic smoothing appropriately adapted to our problem. Finally Sect. 5 is devoted to the study of the steep case.

2 General Setting and Classical Methods: A Geometric Framework

2.1 Resonances, resonant normal forms and the steepness condition

Consider a Hamiltonian system of the form (1.1) defined on \(O\times \mathbb {T}^n\), where O is an open subset of \(\mathbb {R}^n\). The main feature underlying Hamiltonian perturbation theory is that one can modify the form of the perturbation f by composing H with properly chosen local Hamiltonian diffeomorphisms, in order to remove a large number of “nonessential harmonics”. The result of this process - a local normal form - strongly depends on the location of the domain of the normalizing diffeomorphism w.r.t the resonances of the unperturbed part h, and enables one to discriminate between fast drift and extremely slow drift directions in the action space, according to this location.

Let us first make this idea more precise. Given an integer lattice \(\Lambda \subset \mathbb {R}^n\) of dimension \(m\in \{1,\ldots ,n-1\}\) – a resonance lattice – one associates with \(\Lambda \) the resonance vector subspace \(\Lambda ^\bot \subset \mathbb {R}^n\) in the frequency space \(\mathbb {R}^n\), together with the corresponding resonance subset in the action space previously introduced

$$\begin{aligned} {{\mathcal {M}}}_\Lambda :={\omega }^{-1}(\Lambda ^\bot )=\{I\in O\mid {\omega }(I)\in \Lambda ^{\bot }\}, \end{aligned}$$

where \({\omega }=\nabla h\) is the frequency map. The dimension m of \(\Lambda \) is said to be the multiplicity of the resonance \({{\mathcal {M}}}_\Lambda \). Of course, given a resonance module \(\Lambda '\supset \Lambda \) with \(\dim \Lambda ' >\dim \Lambda \), the resonance \({{\mathcal {M}}}_{\Lambda '}\) is contained in \({{\mathcal {M}}}_\Lambda \), so that a resonance subset contains in general infinitely many resonances of higher multiplicity. The complement \({{\mathcal {M}}}_0\subset O\) of the union of all resonance subsets is the non-resonant subset. In general, a resonance subset \({{\mathcal {M}}}_\Lambda \) has no particular structure; however, one can think of \({{\mathcal {M}}}_\Lambda \) as a submanifold of \(\mathbb {R}^n\) of the same dimension as \(\Lambda ^\bot \) (with perhaps singular loci).

As a rule, when \(\varepsilon \) is small enough, for a small enough \(\varepsilon \)-depending neighborhood \(W_\Lambda \) of the parts of the resonance subset \({{\mathcal {M}}}_\Lambda \) located far enough from resonances of higher multiplicity,Footnote 8 one can iteratively construct a symplectic diffeomorphism \(\Psi _\Lambda \), whose image contains \(W_\Lambda \times \mathbb {T}^n\), such that the pull-back \(H_\Lambda =H\circ \Psi _\Lambda \) takes the following form

$$\begin{aligned} H_\Lambda =h+N_\Lambda +R_\Lambda . \end{aligned}$$
(2.1)

Here \(R_\Lambda \) is a remainder whose \(C^2\) norm is (very) smallFootnote 9 with respect to \(\varepsilon \) and the resonant part \(N_\Lambda \) contains only harmonics belonging to \(\Lambda \), that is:

$$\begin{aligned} N_\Lambda (I,\theta )=\sum _{k\in \Lambda ,\,{\left| k\right| }_1\le K(\varepsilon )}a_k(I)\,e^{i k\cdot \theta }, \end{aligned}$$

where \(K(\varepsilon )\) is an ultraviolet cutoff which has to be properly chosen.Footnote 10 Both terms \(N_\Lambda \) and \(R_\Lambda \) of course depend on \(\varepsilon \). A subset \(W_\Lambda \) for which such a normal form is proved to exist will be called a normal form neighborhood associated with \(\Lambda \), with multiplicity \(\dim \Lambda \). One proves that the space of actions can be covered by such neighborhoods, and in Sect. 5.1, we will construct finer covers by subsets of those, named resonant blocks (and denoted by \(D_\Lambda \) in the aforementioned section).

The iterative process to construct the normalizing diffeomorphism involves the control of small denominators which appear during the resolution of the so-called homological equation, and which depend on the location of the normalization domain with respect to the resonances (see for instance [23]). This can be seen as a drawback of the method, which could be greatly simplified by an idea due to Lochak (see below); however, the general method presented here gives precise dynamical information which would not be reachable otherwise.

The Hamilton equations generated by (2.1) yield the following form for the evolution of the action variables:

$$\begin{aligned} \begin{array}{lll} I(t)-I(0)&{}=&{}\displaystyle \int _0^t \partial _\theta N_\Lambda \big (I(s),\theta (s)\big )+\partial _\theta R_\Lambda \big (I(s),\theta (s)\big )\,ds\\ &{}=&{}\displaystyle \sum _{k\in \Lambda ,\,{\left| k\right| }_1\le K(\varepsilon )}k\cdot \Big (\int _0^t i\,a_k(I(s))\,e^{i k\cdot \theta (s)}\,ds\Big )+ {{\mathcal {R}}}(t). \end{array} \end{aligned}$$
(2.2)

The variation of I is therefore the sum of the main part

$$\begin{aligned} {{\mathcal {D}}}(t):=\sum _{k\in \Lambda ,\,{\left| k\right| }_1\le K(\varepsilon )}k\cdot {{\mathcal {N}}}^{(k)}(t), \qquad {{\mathcal {N}}}^{(k)}(t)=\int _0^t i\,a_k(I(s))\,e^{i k\cdot \theta (s)}\,ds, \end{aligned}$$
(2.3)

and the very small remainder term \({{\mathcal {R}}}(t)\). To simplify the presentation in the following, we will forget about the angles and consider only the action part of the solutions of our system (which is legitimized by the fact that the angles play no role in the various estimates).

The whole theory relies firstly on the obvious fact that the main drift term \({{\mathcal {D}}}(t)\) in (2.3) belongs to the vector space \(\mathrm Vect\,\Lambda \) spanned by \(\Lambda \) (which is often called “plane” of fast drift), and secondly on the smallness of the remainder term \({{\mathcal {R}}}\). A solution starting from some initial condition \(I(0)\in W_\Lambda \) will therefore remain very close to the fast drift space

$$\begin{aligned} I(0)+\mathrm Vect\,\Lambda \end{aligned}$$

during a very long time – governed by the smallness of \({{\mathcal {R}}}\) – as long as it is contained inside the neighborhood \(W_\Lambda \). This makes it necessary to understand first the intersections of the fast drift planes \(I+\mathrm Vect\,\Lambda \) and the neighborhoods \(W_\Lambda \) to which they are attached.

As an extreme example, let us consider the Hamiltonian

$$\begin{aligned} h(I)=\frac{1}{2}(I_1^2-I_2^2) \end{aligned}$$

on \(\mathbb {A}^2\), with (invertible) frequency map \({\omega }(I_1,I_2)=(I_1,-I_2)\). We focus on the resonance module \(\Lambda =\mathbb {Z}(1,-1)\), so that \(\Lambda ^\bot =\mathbb {R}(1,1)\) and \(\mathrm Vect\,\Lambda ={{\mathcal {M}}}_\Lambda \). Hence, given an initial action \(I(0)\in {{\mathcal {M}}}_\Lambda \), the entire fast drift affine subspace \(I(0)+\mathrm Vect\,\Lambda \) coincides with \({{\mathcal {M}}}_\Lambda \), so that nothing prevents the fast drift from taking place during the whole motion provided the perturbation is well-chosen: the resonance \({{\mathcal {M}}}_\Lambda \) is called a superconductivity channel. No long time stability result can be expected in this case: indeed, when \(f(I,\theta )=\sin (\theta _1-\theta _2)\), the initial condition \(I=0\), \(\theta =0\) yields the fast evolution \((I_1(t),I_2(t))=(-\varepsilon t,\varepsilon t)\) for the action variables.Footnote 11
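The fast drift in this example can be reproduced numerically. The following sketch integrates Hamilton's equations for \(H=\frac{1}{2}(I_1^2-I_2^2)+\varepsilon \sin (\theta _1-\theta _2)\) with a classical fourth-order Runge–Kutta scheme; the choice of integrator, the function names and the values of \(\varepsilon \), final time and step count are ours and purely illustrative.

```python
import math


def vector_field(state, eps):
    """Hamilton's equations for H = (I1^2 - I2^2)/2 + eps*sin(th1 - th2)."""
    i1, i2, th1, th2 = state
    c = math.cos(th1 - th2)
    # dI/dt = -dH/dtheta, dtheta/dt = dH/dI
    return (-eps * c, eps * c, i1, -i2)


def rk4_step(state, eps, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = vector_field(state, eps)
    k2 = vector_field(shifted(k1, dt / 2), eps)
    k3 = vector_field(shifted(k2, dt / 2), eps)
    k4 = vector_field(shifted(k3, dt), eps)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(eps, t_final, steps):
    """Integrate from I = 0, theta = 0 up to time t_final; returns (I1, I2, th1, th2)."""
    state = (0.0, 0.0, 0.0, 0.0)
    dt = t_final / steps
    for _ in range(steps):
        state = rk4_step(state, eps, dt)
    return state


if __name__ == "__main__":
    eps, t_final = 1e-3, 100.0
    i1, i2, _, _ = integrate(eps, t_final, 1000)
    print("I1(T) =", i1, " expected -eps*T =", -eps * t_final)
    print("I2(T) =", i2, " expected  eps*T =", eps * t_final)
```

Along this orbit \(I_1+I_2\) and \(\theta _1-\theta _2\) stay equal to zero, so the actions move linearly in time along \(\mathrm Vect\,\Lambda \) without ever leaving the resonance.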

In contrast with the previous example, for the Hamiltonian

$$\begin{aligned} H(I,\theta )=\frac{1}{2}{{\left| I\right| }}^2_2+\varepsilon f(\theta ) \end{aligned}$$

on \(\mathbb {A}^n\), for any \(\Lambda \subset \mathbb {Z}^n_K\), the resonant set \({{\mathcal {M}}}_\Lambda \) coincides with \(\Lambda ^\bot \), so that the affine planes of fast drift are always orthogonal to \({{\mathcal {M}}}_\Lambda \). In this case a fast drift - if it happens - makes the orbits move away from the resonance in a very short time.

These extreme examples illustrate the role of the Nekhoroshev condition: steepness is an intermediate quantitative property, which prevents the existence of superconductivity channels by ensuring a certain amount of transversality between the fast drift planes and the corresponding resonances in action. Starting from an action \(I=I(0)\) located at some resonance \({{\mathcal {M}}}_\Lambda \), so that its associated frequency \({\omega }(I)\) is orthogonal to \(\Gamma :=\mathrm Vect\,\Lambda \), the condition

$$\begin{aligned} \max _{0\le \eta \le \xi } \, \min _{u\in \Gamma ,{\left| u\right| }_2=\eta } {\left| \pi _\Gamma {\omega }(I+u)\right| }_2 > C_m \xi ^{{\varvec{\alpha }}_m},\quad \forall \xi \in (0,\delta ], \end{aligned}$$
(2.4)

(where \(\pi _\Gamma \) stands for the orthogonal projection on \(\Gamma \)) imposes that a drift of length \(\xi \) starting from I and occurring along the fast drift plane \(I+\Gamma \) makes the norm of the projection \(\pi _\Gamma ({\omega })\) exceed \(C_m\xi ^{{\varvec{\alpha }}_m}\) at some point along the way.

Fig. 1 Geometric interpretation of the steep condition

This admits an easy geometric interpretation (see Fig. 1). Assume \(\dim \Lambda =m\) and consider the vector space \(\Gamma \) spanned by \(\Lambda \), together with its orthogonal space \(\Lambda ^\bot \) - of dimension \(n-m\). Then one can define a family of tubular neighborhoods of \(\Lambda ^\bot \) of width \(\delta >0\) by

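$$\begin{aligned} \mathbf{T}_\delta (\Lambda ^\bot ):=\big \{\varpi \in \mathbb {R}^n\mid {\left| \pi _\Gamma (\varpi )\right| }_2<\delta \big \}. \end{aligned}$$
(2.5)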

Each such neighborhood gives rise to a neighborhood of the resonance \({{\mathcal {M}}}_\Lambda \) in action, namely:

$$\begin{aligned} \mathbf{W}_\delta ({{\mathcal {M}}}_\Lambda )={\omega }^{-1}\big (\mathbf{T}_\delta (\Lambda ^\bot )\big ). \end{aligned}$$
(2.6)

Therefore, condition (2.4) just says that any orbit starting from I and drifting to a distance \(\xi \) from I along the plane of fast drift \(\Gamma \) must exit the neighborhood \( \mathbf{W}_{\delta }({{\mathcal {M}}}_\Lambda )\) with \( \delta =C_m\xi ^{{\varvec{\alpha }}_m}. \)

Note finally that given disjoint subsets \(\mathbf{T}\), \(\mathbf{T}'\) of tubular neighborhoods of the form (2.5), the associated neighborhoods \({\omega }^{-1}(\mathbf{T})\) and \({\omega }^{-1}(\mathbf{T}')\) are disjoint too, whatever the geometric assumptions on the frequency map \({\omega }\).

2.2 Nekhoroshev’s hierarchy

This section is inspired by Nekhoroshev’s ideas as presented in the very nice paper [13]. We also refer to [12] for further details and to [22] for a different approach. Nekhoroshev’s strategy to prove long-time stability results for perturbations of steep Hamiltonians is based on the previous description of resonant neighborhoods, and relies on the following key observation.

Given \(\varepsilon \) small enough, there exist \(T(\varepsilon )\), \(R(\varepsilon )\) and a covering of the action space O by resonant “blocks” \(({{\mathcal {B}}}_{m,p})_{0\le p\le p_m}\), for \({0\le m\le n-1}\), and \(m, p, p_m\in {{\mathbb {N}}}\), which satisfy the following properties:

  1. (1)

    \(T(\varepsilon )\rightarrow +\infty \) and \(R(\varepsilon )\rightarrow 0\) when \(\varepsilon \rightarrow 0\);

  2. (2)

    each block \({{\mathcal {B}}}_{m,p}\) is contained in a resonant neighborhood of multiplicity m and admits an enlargement \(\widehat{{\mathcal {B}}}_{m,p}\supset {{\mathcal {B}}}_{m,p}\) contained in the same resonant neighborhood;

  3. (3)

    any solution starting from an initial condition in \({{\mathcal {B}}}_{m,p}\) either stays inside \(\widehat{{\mathcal {B}}}_{m,p}\) for \(0\le t\le T(\varepsilon )\) or admits a first exit time \(t_1\) such that \(I(t_1)\) belongs to a block \({{\mathcal {B}}}_{m',p'}\) with \(m'<m\);

  4. (4)

    for any initial condition I(0) inside a block \({{\mathcal {B}}}_{m,p}\) and for any interval \({{\mathcal {I}}}\) such that \(I(t)\in \widehat{{\mathcal {B}}}_{m,p}\) for all \(t\in {{\mathcal {I}}}\), one has

    $$\begin{aligned} {\left| I(t)-I(0)\right| }_2<R(\varepsilon ),\quad \forall t\in {{\mathcal {I}}}. \end{aligned}$$

We say that m is the multiplicity of the block \({{\mathcal {B}}}_{m,p}\). Taking the previous observation for granted, the stability of the action variable over a timescale \(T(\varepsilon )\) is easy to prove by finite induction. Given an initial condition I(0) located in some block \({{\mathcal {B}}}_{m_0,p_0}\), either \(I(t)\in \widehat{{\mathcal {B}}}_{m_0,p_0}\) for \(0\le t\le T(\varepsilon )\), or there is a \(t_1\) such that \(I(t)\in \widehat{{\mathcal {B}}}_{m_0,p_0}\) for \(0\le t <t_1\) and \(I(t_1)\) belongs to a block \({{\mathcal {B}}}_{m_1,p_1}\) with \(m_1<m_0\). Consequently, there is a finite sequence \((m_0,p_0),\ldots ,(m_j,p_j)\) such that \(m_0>m_1>\cdots >m_j\) (possibly with \(m_j=0\)) and a finite sequence of times \(t_0=0<t_1<\cdots <t_{j+1}=T(\varepsilon )\) such that for \(0\le i \le j\):

$$\begin{aligned} I(t)\in \widehat{{\mathcal {B}}}_{(m_i,p_i)},\quad \forall t\in [t_i,t_{i+1}]. \end{aligned}$$

In words, any orbit crosses a finite number of enlarged blocks during the interval \([0,T(\varepsilon )]\) and gets trapped inside the last one. To conclude, one just has to use property (4), which proves that the distance between I(0) and I(t) is at most \(nR(\varepsilon )\) for \(t\in [0,T(\varepsilon )]\).
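The bound \(nR(\varepsilon )\) follows from a simple telescoping argument, sketched here with notation introduced only for this purpose: let \(s\le n\) be the number of enlarged blocks visited (at most n, since the multiplicities are strictly decreasing integers in \(\{0,\ldots ,n-1\}\)) and let \(0=t_0<t_1<\cdots <t_{s-1}\) be the successive entrance times. For t in the q-th leg of the orbit, \(0\le q\le s-1\), property (4) applied to each leg gives

$$\begin{aligned} {\left| I(t)-I(0)\right| }_2\le {\left| I(t)-I(t_q)\right| }_2+\sum _{i=0}^{q-1}{\left| I(t_{i+1})-I(t_i)\right| }_2\le (q+1)\,R(\varepsilon )\le s\,R(\varepsilon )\le n\,R(\varepsilon ). \end{aligned}$$

Each term on the right-hand side measures the displacement inside a single enlarged block, starting from the point at which the orbit entered the corresponding block.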

One should be aware that the covering by the blocks is not a partition of O: two distinct blocks may have a nonempty intersection. However, one can choose the blocks visited by the orbits according to a hierarchical order, in such a way that their multiplicity decreases as t increases.Footnote 12 We say that a covering of O by blocks satisfying the previous properties is a Nekhoroshev patchwork.

2.3 Construction of Nekhoroshev patchworks

Let us now describe how the blocks are constructed so as to possess their covering and confinement properties.Footnote 13

Given \(\varepsilon >0\), we first fix an ultraviolet cutoff \(K(\varepsilon )\) and consider only the set \(\mathbf{M}_\varepsilon \) of resonance modules which are spanned by vectors of length smaller than \(K(\varepsilon )\). Given a resonant module \(\Lambda \in \mathbf{M}_\varepsilon \) of multiplicity m, we start with the resonant zone of “width” \(\delta _\Lambda \)

$$\begin{aligned} Z_\Lambda := W_{\delta _\Lambda }({{\mathcal {M}}}_\Lambda )={\omega }^{-1}\big \{\varpi \in \mathbb {R}^n\mid {\left| \pi _\Gamma (\varpi )\right| }_2<\delta _\Lambda \big \}, \end{aligned}$$

where \(\delta _\Lambda \) has to be properly chosen as a function of \(\varepsilon \) and the various geometric invariants of the module (see Sect. 5). We then define the (\(\varepsilon \)-dependent) resonant zone \(Z_m\) of multiplicity m as

$$\begin{aligned} Z_m=\bigcup _{\Lambda \in \mathbf{M}_\varepsilon ,\,\dim \Lambda =m}Z_\Lambda . \end{aligned}$$

Given \(\Lambda \in \mathbf{M}_\varepsilon \), \(\dim \Lambda =m\), the block attached to \(\Lambda \) is obtained by removing from \(Z_\Lambda \) its intersection with the complete resonant zone of multiplicity \(m+1\):

$$\begin{aligned} {{\mathcal {B}}}_\Lambda = Z_\Lambda \setminus Z_{m+1}. \end{aligned}$$

The blocks \({{\mathcal {B}}}_{m,p}\) are the connected components of \(Z_m\). With no great loss of generality, one can think of (the closure of) a block as a submanifold with boundary and corners – even if it is not necessary.
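The membership tests behind the zones \(Z_\Lambda \) and the sets \({{\mathcal {B}}}_\Lambda =Z_\Lambda \setminus Z_{m+1}\) can be sketched in a few lines. The code below works directly with a frequency vector \(\varpi ={\omega }(I)\); the function names, the representation of a module by a list of integer generators, and the numerical widths are illustrative assumptions, not the choices made in Sect. 5.

```python
import math


def orthonormal_basis(generators):
    """Gram-Schmidt on generators of Lambda: orthonormal basis of Gamma = Vect Lambda."""
    basis = []
    for g in generators:
        v = [float(x) for x in g]
        for e in basis:
            coeff = sum(a * b for a, b in zip(v, e))
            v = [a - coeff * b for a, b in zip(v, e)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip generators that are linearly dependent
            basis.append([a / norm for a in v])
    return basis


def projected_norm(freq, generators):
    """Euclidean norm of the orthogonal projection of freq onto Gamma."""
    basis = orthonormal_basis(generators)
    return math.sqrt(sum(sum(a * b for a, b in zip(freq, e)) ** 2 for e in basis))


def in_zone(freq, generators, width):
    """True if freq lies in the tube of the given width around Lambda^perp."""
    return projected_norm(freq, generators) < width


def in_block(freq, generators, width, higher_modules):
    """True if freq is in Z_Lambda but in no zone of multiplicity m+1.

    higher_modules is a list of (generators, width) pairs, one per module
    of multiplicity m+1 kept after the ultraviolet cutoff.
    """
    if not in_zone(freq, generators, width):
        return False
    return not any(in_zone(freq, g, w) for g, w in higher_modules)


if __name__ == "__main__":
    simple = [(0, 1, 0)]               # multiplicity 1
    double = [([(0, 1, 0), (1, 0, 0)], 0.2)]  # one module of multiplicity 2
    print(in_block((0.5, 0.05, 1.0), simple, 0.1, double))
    print(in_block((0.0, 0.05, 1.0), simple, 0.1, double))
```

For a general h, one would first evaluate \(\varpi ={\omega }(I)\) and then apply these tests, in accordance with the definition of \(Z_\Lambda \) as a preimage under \({\omega }\).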

The following figure (Fig. 2) shows the construction of the blocks in the case \(n=3\) (and in a transverse section). The resonance zone of multiplicity 2 is the disjoint union of the blue blocks, the resonance zone of multiplicity 1 is the union of the strips with red boundaries, while the 0-multiplicity zone is the complement of the 1-multiplicity zone.

In any case, the blocks satisfy two main properties.

  • The closures of two different blocks can intersect only when their multiplicities are distinct.

This comes from a very careful choice of the widths of the various resonance zones (see [13] and Sect. 5), which in fact ensures a more stringent (and crucial) property: the enlargement of a block contained in some \({{\mathcal {B}}}_\Lambda \) cannot intersect any other block contained in the zone \({{\mathcal {B}}}_\Lambda \), nor any other neighborhood \({{\mathcal {M}}}_{\Lambda '}\) with \(\dim \Lambda '=\dim \Lambda \) (see below for details on the construction of the enlargement).

Fig. 2. Construction of the resonant blocks

We state the second property in the spirit of Conley’s isolating blocks theory.

  • The frontier \(\partial {{\mathcal {B}}}_{m,p}\) of \({{\mathcal {B}}}_{m,p}\) is the union of two subsets

    $$\begin{aligned} \partial {{\mathcal {B}}}_{m,p}=\partial ^+{{\mathcal {B}}}_{m,p}\cup \partial ^-{{\mathcal {B}}}_{m,p} \end{aligned}$$

    where \(\partial ^+{{\mathcal {B}}}_{m,p}\) (resp. \(\partial ^-{{\mathcal {B}}}_{m,p}\)) is contained in blocks \({{\mathcal {B}}}_{m',p'}\) with \(m'>m\) (resp. \(m'<m\)).

This raises new questions which could be the starting point of a better understanding of the relations between diffusion along invariant subsets and long-time stability theory. Indeed, given a block \({{\mathcal {B}}}_{m,p}\), a description of the (generic) features of the Hamiltonian vector field \(X_{H_\varepsilon }\) at the frontier \(\partial {{\mathcal {B}}}_{m,p}\) has never been given. In particular, nothing is known about the locus where \(X_{H_\varepsilon }\) “enters the block” and the locus where \(X_{H_\varepsilon }\) “exits the block” (Figs. 3 and 4). These two subsets are crucial for the understanding of the homology of the invariant sets contained in the blocks, following Conley’s theory, and could provide one with a new tool for constructing diffusing orbits in the steep setting.

Fig. 3. Interpretation of the resonant blocks in the light of Conley’s theory

Fig. 4. The steepness property prevents the existence of superconductivity channels by ensuring a contact of finite order between the resonant manifold and the plane of fast drift. In the figure, \(\ell \) is the size of the resonant zone (see Section 5.1).

Going back to the construction of Nekhoroshev’s patchwork, we have to make precise the process leading to the enlargement of a block and its stability property. Here we will again make crucial use of the fact that an orbit starting from an initial condition \(I:=I(0)\) located in \({{\mathcal {B}}}_{m,p}\) will remain extremely close to the fast drift space \(I+\mathrm{Vect}\,\Lambda \) for \(0\le t\le T(\varepsilon )\), as long as it stays inside the resonant neighborhood \({{\mathcal {M}}}_\Lambda \) and far enough from the higher multiplicity resonance zones. Hence, to enlarge the block \({{\mathcal {B}}}_{m,p}\), we just have to add to it the collection of all the parts of the disks centered at points \(I\in {{\mathcal {B}}}_{m,p}\) which are contained in the intersection of the fast drift spaces \(I+\mathrm{Vect}\,\Lambda \) with the resonant neighborhood \({{\mathcal {M}}}_\Lambda \) (the resulting added subset is the green part in the previous two figures). We have in fact to add a very small neighborhood of this union of disks, in order to prevent the solutions from exiting the extended block under the influence of the remainder part \({{\mathcal {R}}}\) of the dynamics during the time \(T(\varepsilon )\), but this does not change our description significantly. Finally, one has to make sure that the extension does not intersect any other block of the same neighborhood \({{\mathcal {B}}}_\Lambda \) or any other resonance neighborhood, which can be done by a careful tuning of the width of the zone (see Sect. 5).

This concludes our description of Nekhoroshev’s method.

3 Functional Setting

For \(n\ge 1\), we denote the standard n-dimensional torus by \(\mathbb {T}^n = \mathbb {R}^n/2\pi \mathbb {Z}^n \) and the standard 2n-dimensional annulus by \( \mathbb {A}^n = \mathbb {R}^n\times \mathbb {T}^n\).

3.1 Hölder differentiable functions

Given an integer \(q\ge 0\) and an open subset D of \(\mathbb {R}^n\), we denote by \(C^{q}(D)\) the set of q-times continuously differentiable maps \(f:D\rightarrow \mathbb {R}\) (\(C^0(D)\) being the set of continuous functions on D). We identify \(C^{q}(\mathbb {T}^n)\) with the subset of \(C^{q}(\mathbb {R}^n)\) formed by the functions that are \(2\pi \mathbb {Z}^n\)-periodic and \(C^{q}(D\times \mathbb {T}^n)\) with the subset of \(C^{q}(D\times \mathbb {R}^n)\) formed by the functions which are \(2\pi \mathbb {Z}^n\)-periodic with respect to their last n variables.

We use the conventional notation for partial derivatives: given \(f\in C^{q}(D)\) and \(\alpha \in {{\mathbb {N}}}^n\), we set for \(x\in D\):

$$\begin{aligned} \partial ^\alpha f(x)=\frac{\partial ^{\vert \alpha \vert }f}{\partial x_1^{\alpha _1}\ldots \partial x_n^{\alpha _n}}(x), \end{aligned}$$

with \(\vert \alpha \vert =\alpha _1+\cdots +\alpha _n\).

We denote by \(C_b^q(D)\) the set of \(f\in C^q(D)\) such that

$$\begin{aligned} \left\Vert f\right\Vert _{C^q(D)} := \sup _{{\left| \alpha \right| } \le q}\, \sup _{x\in D} {\left| \partial ^\alpha f(x)\right| }<+\infty , \end{aligned}$$
(3.1)

so that \(\big (C_b^q(D),\left\Vert \cdot \right\Vert _{C^q(D)}\big )\) is a Banach space with multiplicative norm.Footnote 14 It is understood that, for a function defined on a complex domain D, \(\left\Vert \cdot \right\Vert _{C^0(D)}\) is the usual sup-norm.

If \(\ell >0\) is a non-integer real number, we write \(q:=\left\lfloor \ell \right\rfloor \) for its integer part and \(\mu = \ell - q \in (0,1)\) for its fractional part. Given a non-negative integer q and \(\mu \in (0,1)\), we denote by \(C_b^{q,\mu }(D)\) the space formed by those functions \(f\in C^q(D)\) such that

$$\begin{aligned} \begin{aligned} |f|_{C^{q,\mu }(D)}:=\left\Vert f\right\Vert _{C^q(D)}+\sup _{\alpha \in \mathbb {N}^n:|\alpha |= q}\ \ \sup _{\begin{array}{c} x,y \in D:\\ 0<|x-y|<1 \end{array}}\frac{|\partial ^\alpha f(x)-\partial ^\alpha f(y)|}{|x-y|^{\mu }}<+\infty . \end{aligned} \end{aligned}$$
(3.2)

It is well-known that \(\big (C_b^{q,\mu }(D),\vert \cdot \vert _{C^{q,\mu }(D)}\big )\) is also a Banach space with multiplicative norm. Functions belonging to these spaces are called Hölder-differentiable functions.

Given a non-integer real number \(\ell >0\), together with its integer part \(q:=\left\lfloor \ell \right\rfloor \) and its fractional part \(\mu = \ell - q \in (0,1)\), we also write \(C^\ell _b(D)\) instead of \(C_b^{q,\mu }(D)\) and \(|\cdot |_{C^{\ell }(D)}\) instead of \(|\cdot |_{C^{q,\mu }(D)}\). Clearly \(C^\ell _b(D)\subset C^{\ell '}_b(D)\) when \(\ell \ge \ell '\) and, if \(f\in C^\ell _b(D)\), then

$$\begin{aligned} |f|_{C^{\ell '}(D)}\le |f|_{C^{\ell }(D)}. \end{aligned}$$
(3.3)
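
As a small numerical illustration of (3.2) and (3.3), the sketch below estimates the Hölder norm with \(q=0\) on a finite sample. The test function \(f(x)=\sqrt{|x|}\), the domain \(D=(-1/2,1/2)\) and the grid are our own choices; a grid only gives a lower bound for the true suprema.

```python
import math

def holder_norm(f, mu, xs):
    """Grid estimate of |f|_{C^{0,mu}} from (3.2): sup-norm plus Hoelder quotient."""
    sup = max(abs(f(x)) for x in xs)
    semi = 0.0
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            d = abs(x - y)
            if 0 < d < 1:  # the quotient in (3.2) only involves 0 < |x - y| < 1
                semi = max(semi, abs(f(x) - f(y)) / d ** mu)
    return sup + semi

def f(x):
    # Illustrative function: Hoelder continuous with exponent 1/2, not Lipschitz at 0.
    return math.sqrt(abs(x))

xs = [i / 100 for i in range(-50, 51)]   # sample of D = (-1/2, 1/2), closed for convenience
norm_half = holder_norm(f, 0.5, xs)      # exponent mu = 1/2
norm_quarter = holder_norm(f, 0.25, xs)  # smaller exponent mu' = 1/4
```

For this function the Hölder quotient with \(\mu =1/2\) equals 1 on pairs \((x,0)\), so `norm_half` is close to \(\sqrt{1/2}+1\), and `norm_quarter` does not exceed it, in agreement with (3.3).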

3.2 Domains and their complex extensions

Let us define the complex n-dimensional torus \(\mathbb {T}_\mathbb {C}^n\) and the complex 2n-dimensional annulus \(\mathbb {A}_\mathbb {C}^n\) as

$$\begin{aligned} \mathbb {T}^n_\mathbb {C}= \mathbb {C}^n/2\pi \mathbb {Z}^n \qquad \text {and} \qquad \mathbb {A}^n_\mathbb {C}= \mathbb {C}^n\times \mathbb {T}^n_\mathbb {C}. \end{aligned}$$
(3.4)

We use angle coordinates \(\theta \) on \(\mathbb {T}^n_\mathbb {C}\) (with the usual abuse \(\theta \in \mathbb {C}^n\) when there is no ambiguity) and action-angle coordinates \((I,\theta )\) on \(\mathbb {A}_\mathbb {C}^n\). We see \(\mathbb {T}^n_\mathbb {C}\) as a real n-dimensional vector bundle over \(\mathbb {T}^n\). Consequently, we write

$$\begin{aligned} {\left| \theta \right| } := \max _j {\left( {\left| \mathrm{Im}\, \theta _j\right| }\right) }\ , \qquad {\left| I\right| }:= \max _j {\left| I_j\right| }\ , \qquad {\left| (I, \theta )\right| } = \max {\left( {\left| I\right| },{\left| \theta \right| }\right) }\, . \end{aligned}$$
(3.5)

For integer vectors \(k\in \mathbb {Z}^n\), we use the “dual” \(\ell ^1\)-norm, which we write |k| only when there is no risk of confusion.

We need to introduce specific domains in \(\mathbb {A}^n_\mathbb {C}\). First, given \(r>0\), for a domain \(D\subset \mathbb {R}^n\), we set

$$\begin{aligned} D_r:=\big \{z\in \mathbb {C}^n:\exists z^*\in D: |z-z^*|_2< r \big \}\ . \end{aligned}$$
(3.6)

As for the torus, given \(s>0\), we introduce the global complex neighborhood

$$\begin{aligned} \mathbb {T}^n_s := \big \{\theta \in \mathbb {T}^n_\mathbb {C}\, : \, |\theta | < s\big \}. \end{aligned}$$
(3.7)

We will essentially deal with complex domains of the form

$$\begin{aligned} {{\mathcal {D}}}_{r,s}\ := D_r\times \mathbb {T}^n_s\subset \mathbb {A}_\mathbb {C}^n. \end{aligned}$$
(3.8)

We finally write \(D^\mathbb {R}_r\) and \({{\mathcal {D}}}_{r,s}^{\mathbb {R}}\) for the projections of \(D_r\) and \({{\mathcal {D}}}_{r,s}\) on \(\mathbb {R}^n\) and \(\mathbb {A}^n\) respectively.

3.3 Analytic functions and norms

If g is a bounded holomorphic function defined on \(\mathbb {T}^n_s, D_r\) or \({{\mathcal {D}}}_{r,s}\) we denote the corresponding classical sup-norms by

$$\begin{aligned} {\left| g\right| }_s = \sup _{\theta \in \mathbb {T}^n_s} {\left| g(\theta )\right| },\quad {\left| g\right| }_{r} = \sup _{I\in D_r} {\left| g(I)\right| },\quad {\left| g\right| }_{r,s} = \sup _{(I,\theta )\in {{\mathcal {D}}}_{r,s}} {\left| g(I,\theta )\right| }. \end{aligned}$$
(3.9)

Fix a bounded holomorphic function \(g:{{\mathcal {D}}}_{r,s+2\sigma }\rightarrow \mathbb {C}\), where \(\sigma >0\), and let \(g(I,\theta ) = \sum _{k\in \mathbb {Z}^n} {\hat{g}}_k(I)e^{\mathrm{i}\,k\cdot \theta }\) be its Fourier expansion, where \(k\cdot \theta = k_1\theta _1 + \cdots + k_n\theta _n\). We then introduce the weighted Fourier norm

$$\begin{aligned} \left| \left| g\right| \right| _{r,s}:= \sup _{I \in D_r}\sum _{k\in \mathbb {Z}^n}{\left| {\hat{g}}_k(I)\right| }\,e^{{\left| k\right| }s}, \end{aligned}$$
(3.10)

which is finite and satisfies

$$\begin{aligned} {\left| g\right| }_{r,s} \le \left| \left| g\right| \right| _{r,s}\le \coth ^n\sigma \,{\left| g\right| }_{r,s+2\sigma }. \end{aligned}$$
(3.11)

We denote by \({{\mathcal {A}}}_{r,s}\) the space of holomorphic functions on \({{\mathcal {D}}}_{r,s}\) with finite Fourier norm. Endowed with this norm, \({{\mathcal {A}}}_{r,s}\) is a Banach algebra.

Finally, the norm of a vector valued function will be the maximum of the norms of its components.
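
The comparison between the sup-norm and the weighted Fourier norm can be checked numerically on an example. The sketch below assumes \(n=1\), no action dependence, and the hypothetical test function with coefficients \({\hat{g}}_k=A^{|k|}\), \(A=e^{-2}\). The right-hand bound is taken over the full strip of width \(s+2\sigma \) on which g is defined; this version follows from the Cauchy estimates \(|{\hat{g}}_k|\le |g|_{s+2\sigma }\,e^{-|k|(s+2\sigma )}\) and the geometric series \(\sum _{k\in \mathbb {Z}}e^{-2|k|\sigma }=\coth \sigma \).

```python
import cmath
import math

A = math.exp(-2.0)  # hypothetical coefficients g_k = A^{|k|}: g is analytic for |Im theta| < 2

def g(theta):
    """Closed form of sum_k A^{|k|} e^{i k theta} (two geometric series)."""
    w = A * cmath.exp(1j * theta)   # terms with k >= 0
    v = A * cmath.exp(-1j * theta)  # terms with k <= -1
    return 1 / (1 - w) + v / (1 - v)

def sup_norm(width, samples=2001):
    """Sampled sup of |g| on |Im theta| <= width.

    By the maximum principle it suffices to sample the two boundary lines;
    sampling yields a lower bound of the true supremum.
    """
    best = 0.0
    for j in range(samples):
        x = -math.pi + 2 * math.pi * j / (samples - 1)
        for y in (-width, width):
            best = max(best, abs(g(complex(x, y))))
    return best

def fourier_norm(s, kmax=200):
    """Weighted Fourier norm (3.10), truncated at |k| <= kmax."""
    return sum(A ** abs(k) * math.exp(abs(k) * s) for k in range(-kmax, kmax + 1))

s, sigma = 0.5, 0.5
lower = sup_norm(s)
middle = fourier_norm(s)
upper = math.cosh(sigma) / math.sinh(sigma) * sup_norm(s + 2 * sigma)
```

Here `lower <= middle <= upper`, and `middle` agrees with the closed form \((1+Ae^{s})/(1-Ae^{s})\).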

4 Analytic Smoothing

We state in this section the key ingredient of the present work. We first recall the analytic smoothing method as developed by Jackson-Moser-Zehnder for Hölder functions of \(\mathbb {R}^n\): given a Hölder function \(f\in C^\ell (\mathbb {R}^n)\) and a positive number \(s\le 1\), this yields an analytic function on the complex neighborhood \(\mathbb {R}^n_s\) whose restriction to \(\mathbb {R}^n\) is close to f in the \(C^k\) topology, for \(1\le k\le \ell \).

We then adapt their method to our specific setting of functions defined on \(\mathbb {A}^n\) (see Sect. 4.2) and, in addition, we derive the new estimate (4.22) for the weighted Fourier norm of the smoothed function.

4.1 Analytic smoothing in \(\mathbb {R}^n\)

We recall here the result by Jackson, Moser and Zehnder, following the presentation by [9] and [24].

Proposition 4.1

(Jackson-Moser-Zehnder). Fix an integer \(n\ge 1\), a real number \(\ell > 0\) and let \(f\in C^\ell _b(\mathbb {R}^n)\). Then there is a constant \({\texttt {C}}_{\texttt {J}}= {\texttt {C}}_{\texttt {J}}(\ell , n)\) such that for every \(0<s\le 1\) there exists a function \({\mathtt {f}}_s\), analytic on \(\mathbb {R}^n_s\), which satisfies

$$\begin{aligned} \left| \partial ^\alpha {\mathtt {f}}_s(x) - \sum _{\begin{array}{c} \beta \in \mathbb {N}^n:|\beta |\le \left\lfloor \ell \right\rfloor -|\alpha | \end{array}}\partial ^{\alpha +\beta } f(\mathrm{Re} x)\frac{(\mathrm{Im} x)^\beta }{\beta !}\right| \le {\texttt {C}}_{\texttt {J}}\, s^{\ell -|\alpha |} {\left| f\right| }_{C^\ell (\mathbb {R}^n)},\quad \forall x \in \mathbb {R}^n_s,\nonumber \\ \end{aligned}$$
(4.1)

for all multi-indices \(\alpha \in {{\mathbb {N}}}^{n}\) such that \({\left| \alpha \right| }\le \left\lfloor \ell \right\rfloor \). More precisely, given any even \(C^\infty \) function \(\Phi \) with compact support in \(\mathbb {R}^n\) and setting

$$\begin{aligned} K(\xi ) := \frac{1}{(2\pi )^{n}}\int _{\mathbb {R}^{n}} \Phi (x) e^{\mathrm{i}x\cdot \xi }dx ,\quad \xi \in \mathbb {R}^n_s, \end{aligned}$$
(4.2)

the function

$$\begin{aligned} {\mathtt {f}}_s( x):=\int _{\mathbb {R}^{n}}{K}\left( \frac{x}{s}-\xi \right) f(s\xi )\ d\xi \ , \end{aligned}$$
(4.3)

satisfies the previous requirements (where the constant \({\texttt {C}}_{\texttt {J}}(\ell , n)\) depends on the choice of \(\Phi \)).

Observe that \({\mathtt {f}}_s\) takes real values when its argument is in \(\mathbb {R}^n\).

4.2 Analytic smoothing in \(\mathbb {A}^n\)

In the following, the Hölder regularity \(\ell \) is assumed to satisfy \(\left\lfloor \ell \right\rfloor \ge n+1\) as in the hypotheses of Theorem 1.1.

We now specialize the previous result to our setting and give a more detailed description of the method in the case of functions of \(\mathbb {A}^n\). In that case, the analytic smoothing is a truncation of the Fourier series of the initial Hölder function with suitably modified Fourier coefficients (the so-called Jackson polynomials). Our main concern here is to derive an estimate on the weighted Fourier norm of an s-smoothed \(C^\ell \) function over a complex strip of width s.

To make the whole presentation more explicit and take the anisotropy of the weighted Fourier norm into account, we first consider functions defined on \(\mathbb {R}^n\) and \(\mathbb {T}^n\) separately. This then yields a statement for functions of \(\mathbb {A}^n\).

  • The non-periodic case. Fix an even function \(\Phi :\mathbb {R}^n\rightarrow [0,1]\), of class \(C^\infty \), with support in the ball \(\overline{B}_2(0,1)\) and let \(K:\mathbb {C}^n\rightarrow \mathbb {C}\) be its Fourier-Laplace transform:

    $$\begin{aligned} K(y)=\frac{1}{(2\pi )^n}\int _{\mathbb {R}^n}\Phi (\eta )e^{-i \eta \cdot y}d\eta . \end{aligned}$$
    (4.4)

Since \(\Phi \) is compactly supported, K is an entire function. Moreover its restriction to \(\mathbb {R}^n\) is in the Schwartz class \({{\mathcal {S}}}(\mathbb {R}^n)\) since \(\Phi \) is, and this is also the case for the translates \(y\mapsto K(y-z)\) for \(y\in \mathbb {R}^n\) and fixed \(z\in \mathbb {C}^n\).

Let \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) be a \(C^\ell \) function with \(\left\lfloor \ell \right\rfloor \ge n+1\), with compact support contained in the ball \(\overline{B}_\infty (0,R_0)\) for some \(R_0>0\). Given \(s\in \,]0,1]\), set for \(x\in \mathbb {R}^n\):

$$\begin{aligned} \mathbf{f}_s(x)= & {} \frac{1}{s^n}\int _{\mathbb {R}^n}K\Big (\frac{x-y}{s}\Big )f(y)dy \nonumber \\= & {} \int _{\mathbb {R}^n}K\Big (\frac{x}{s}-y\Big )f(sy)dy=\int _{\mathbb {R}^n}K(y)f(x-sy)dy. \end{aligned}$$
(4.5)

By Fourier reciprocity:

$$\begin{aligned} \mathbf{f}_s(x)=\int _{\mathbb {R}^n}\Phi (\eta )\widehat{f(x-sy)}(\eta )d\eta , \end{aligned}$$

with:

$$\begin{aligned} \widehat{f(x-sy)}(\eta )= & {} \frac{1}{(2\pi )^n}\int _{\mathbb {R}^n}f(x-sy)e^{-iy\cdot \eta }dy\\= & {} \frac{1}{(2\pi )^ns^n}\int _{\mathbb {R}^n}f(u)e^{-i(x-u)\cdot \eta /s}du\\= & {} \frac{e^{-i x\cdot \eta /s}}{s^n}\widehat{f}\Big (\frac{-\eta }{s}\Big ). \end{aligned}$$

Therefore, since \(\Phi \) is even:

$$\begin{aligned} \mathbf{f}_s(x)= & {} \frac{1}{s^n}\int _{\mathbb {R}^n}\Phi (\eta )\widehat{f}\Big (\frac{-\eta }{s}\Big )e^{-ix\cdot \eta /s}d\eta = \int _{\mathbb {R}^n}\Phi (s\eta )\widehat{f}({-\eta })e^{-i x \cdot \eta }d\eta \nonumber \\= & {} \int _{\mathbb {R}^n}\Phi (s\eta )\widehat{f}({\eta })e^{i x\cdot \eta }d\eta . \end{aligned}$$
(4.6)

Hence \(\mathbf{f}_s\) is the inverse Fourier-Laplace transform of the “truncation”

$$\begin{aligned} \eta \mapsto \Phi (s\eta )\widehat{f}(\eta ). \end{aligned}$$
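
As a quick sanity check of this interpretation (a formal illustration, not part of the argument), one can apply the last expression in (4.5) to the bounded test function \(f(x)=e^{i\xi \cdot x}\) with fixed \(\xi \in \mathbb {R}^n\). This f is neither real-valued nor compactly supported, but the integral converges since K is a Schwartz function. Using \(\int _{\mathbb {R}^n}K(y)e^{i\eta \cdot y}dy=\Phi (\eta )\), which is Fourier inversion applied to (4.4), and the evenness of \(\Phi \):

$$\begin{aligned} \mathbf{f}_s(x)=\int _{\mathbb {R}^n}K(y)\,e^{i\xi \cdot (x-sy)}dy=e^{i\xi \cdot x}\int _{\mathbb {R}^n}K(y)\,e^{-is\xi \cdot y}dy=\Phi (s\xi )\,e^{i\xi \cdot x}. \end{aligned}$$

Each harmonic is thus multiplied by \(\Phi (s\xi )\), and the harmonics with \(|\xi |_2>1/s\) are removed since \(\Phi \) is supported in \(\overline{B}_2(0,1)\).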

The first term of (4.5) shows that \(\mathbf{f}_s\) extends to \(\mathbb {C}^n\) and is an entire function. To get our final estimate we go back to the second term in (4.5), which yields

$$\begin{aligned} |\mathbf{f}_s(z)|\le \left\Vert f\right\Vert _{C^0(\mathbb {R}^n)} \int _{\mathbb {R}^n}\left| K\left( \frac{z}{s}-y\right) \right| dy,\qquad z\in \mathbb {C}^n. \end{aligned}$$
(4.7)

By the Schwartz estimate of Lemma A.1, there exists a constant \(C_n\) such that

$$\begin{aligned} \left| K\left( \frac{z}{s}-y\right) \right| \le C_n \frac{e^{|\mathrm{Im} (z/s-y)|_2}}{(1+|z/s-y|_2)^{n+1}}, \end{aligned}$$

so that, for \(y\in \mathbb {R}^n,z\in \mathbb {C}^n\) and \(|\mathrm{Im}\; z|_2\le s\):

$$\begin{aligned} \left| K\left( \frac{z}{s}-y\right) \right| \le C_n \frac{e}{(1+|\mathrm{Re}(z/s-y)|_2)^{n+1}}. \end{aligned}$$

Hence:

$$\begin{aligned} |\mathbf{f}_s(z)|\le \left\Vert f\right\Vert _{C^0(\mathbb {R}^n)} C_n e\int _{\mathbb {R}^n} \frac{dy}{(1+|y|_2)^{n+1}}, \end{aligned}$$
(4.8)

since z/s is fixed and can be eliminated by a simple translation. We finally get the following estimate:

$$\begin{aligned} |\mathbf{f}_s|_s=\sup _{z\in \mathbb {C}^n: {\left| \mathrm{Im} z\right| }_2\le s}|\mathbf{f}_s(z)|\le C_1(n)\left\Vert f\right\Vert _{C^0(\mathbb {R}^n)}, \end{aligned}$$
(4.9)

with

$$\begin{aligned} C_1(n):=C_n e\int _{\mathbb {R}^n} \frac{dy}{(1+|y|_2)^{n+1}}<\infty . \end{aligned}$$

  • The periodic case. Fix now an even function \(\Psi :\mathbb {R}^n\rightarrow [0,1]\), of class \(C^\infty \), with support in the ball \(\overline{B}_1(0,1)\) and define the associated kernel K as in (4.4).

Fix a \({2\pi }\mathbb {Z}^n\)-periodic function \(f\in C^\ell (\mathbb {R}^n)\) with \(\ell \ge n+1\). Then the Fourier expansion

$$\begin{aligned} f(\theta )=\sum _{k\in \mathbb {Z}^n}\widehat{f}_k e^{ik\cdot \theta },\qquad \widehat{f}_k=\frac{1}{(2\pi )^n}\int _{\mathbb {T}^n}f(\varphi )e^{-ik\cdot \varphi }d\varphi , \end{aligned}$$

converges normally since, by Lemma A.2 in Appendix 5.3, for \(k\in \mathbb {Z}^n\setminus \{0\}\), there exists a universal constant \({\texttt {C}}_{\mathtt { F}}(n,\ell )\) satisfying

$$\begin{aligned} {\left| \widehat{f}_k\right| }\le {\texttt {C}}_{\mathtt { F}}(n,\ell )\frac{||f||_{C^{\left\lfloor \ell \right\rfloor }}}{{\left| k\right| }_\infty ^{\left\lfloor \ell \right\rfloor }} \end{aligned}$$
(4.10)

and \(\left\lfloor \ell \right\rfloor \ge n+1\) by hypothesis. For \(s\in \,]0,1]\), the function

$$\begin{aligned} \mathbf{f}_s(\theta )=\frac{1}{s^n}\int _{\mathbb {R}^n}K\Big (\frac{\theta -\varphi }{s}\Big )f(\varphi )d\varphi \end{aligned}$$

is well-defined and, by Fubini’s theorem:

$$\begin{aligned} \mathbf{f}_s(\theta )=\sum _{k\in \mathbb {Z}^n}\widehat{f}_k\int _{\mathbb {R}^n}K(\varphi )e^{ik\cdot (\theta -s\varphi )}d\varphi =\sum _{k\in \mathbb {Z}^n}\widehat{f}_k e^{ik\cdot \theta }\int _{\mathbb {R}^n}K(\varphi )e^{-i sk\cdot \varphi }d\varphi . \end{aligned}$$

Hence, since K is the inverse Fourier transform of \(\Psi \), by the Fourier inversion theorem:

$$\begin{aligned} \mathbf{f}_s(\theta )=\sum _{k\in \mathbb {Z}^n}\widehat{f}_k \Psi (sk)\,e^{ik\cdot \theta },\quad \theta \in \mathbb {R}^n. \end{aligned}$$
(4.11)

As in the non-periodic case, this makes apparent that \(\mathbf{f}_s\) is a continuous truncation of the Fourier expansion of f with a \(\Psi \)-dependent modification of its Fourier coefficients (the so-called Jackson polynomial):

$$\begin{aligned} \widehat{(\mathbf{f}_s)}_k=\Psi (sk)\widehat{f}_k\ . \end{aligned}$$
(4.12)

Consequently, the Fourier norm

$$\begin{aligned} \left\Vert \mathbf{f}_s\right\Vert _s=\sum _{k\in \mathbb {Z}^n}{\left| \widehat{(\mathbf{f}_s)}_k\right| }e^{s{\left| k\right| }_1} \end{aligned}$$

depends only on the harmonics such that \(|k|_1\le 1/s\) and satisfies

$$\begin{aligned} \left\Vert \mathbf{f}_s\right\Vert _s\le \sum _{{\left| k\right| }_1\le 1/s}{\left| \widehat{(\mathbf{f}_s)}_k\right| }\,e^{s{\left| k\right| }_1} \le e\sum _{{\left| k\right| }_1\le 1/s}{\left| \widehat{(\mathbf{f}_s)}_k\right| }\le e\,\sum _{k\in \mathbb {Z}^n}{\left| \widehat{f}_k\right| }. \end{aligned}$$

Hence, by (4.10):

$$\begin{aligned} \left\Vert \mathbf{f}_s\right\Vert _s\le C_2(\ell ) {\left| f\right| }_{C^{\left\lfloor \ell \right\rfloor }} \end{aligned}$$
(4.13)

with

$$\begin{aligned} C_2(\ell ):=e\Bigg (1+{\texttt {C}}_{\mathtt { F}}(n,\ell )\sum _{k\in \mathbb {Z}^n\setminus \{0\}}\frac{1}{{\left| k\right| }_\infty ^{\left\lfloor \ell \right\rfloor }} \Bigg ). \end{aligned}$$
(4.14)
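
The periodic construction (4.11) to (4.13) can be sketched numerically as follows. This is only an illustration for \(n=1\): the bump function below is one possible choice of \(\Psi \), and the coefficients `f_hat` are hypothetical, chosen to have the decay (4.10) with \(\left\lfloor \ell \right\rfloor =3\).

```python
import math

def bump(t):
    """Even C-infinity cutoff with values in [0, 1] and support in [-1, 1]."""
    return math.exp(1 - 1 / (1 - t * t)) if abs(t) < 1 else 0.0

def f_hat(k):
    """Hypothetical Fourier coefficients decaying like 1 / |k|^3."""
    return 1.0 if k == 0 else 1.0 / abs(k) ** 3

def smoothed_coefficients(s, kmax):
    """Coefficients (4.12) of the smoothed function: Psi(s k) * f_hat(k), |k| <= kmax."""
    return {k: bump(s * k) * f_hat(k) for k in range(-kmax, kmax + 1)}

def fourier_norm(coeffs, s):
    """Weighted Fourier norm: sum over k of |c_k| e^{s |k|}."""
    return sum(abs(c) * math.exp(s * abs(k)) for k, c in coeffs.items())

s = 0.25
kmax = 20
coeffs = smoothed_coefficients(s, kmax)
degree = max(abs(k) for k, c in coeffs.items() if c != 0.0)
norm_s = fourier_norm(coeffs, s)
bound = math.e * sum(abs(f_hat(k)) for k in range(-kmax, kmax + 1))
```

The smoothed function is a trigonometric polynomial of degree at most \(1/s\), and its Fourier norm is bounded by \(e\sum _k|\widehat{f}_k|\), as in the chain of inequalities preceding (4.13).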
  • Functions on \(\mathbb {A}^n\). We finally gather together the previous two cases. Let \(\Phi \otimes \Psi :\mathbb {R}^n\times \mathbb {R}^n\rightarrow [0,1]\) be defined by

    $$\begin{aligned} \Phi \otimes \Psi (x,\theta )=\Phi (x)\Psi (\theta ), \end{aligned}$$

and define the kernel

$$\begin{aligned} K(y,\varphi )=\frac{1}{(2\pi )^{2n}}\int _{\mathbb {R}^{2n}}\Phi \otimes \Psi (x,\theta )\,e^{-i (x,\theta )\cdot (y,\varphi )}\,dxd\theta =K_\Phi (y)K_\Psi (\varphi )=K_\Phi \otimes K_\Psi (y,\varphi ) \end{aligned}$$

where \(K_\Phi \) and \(K_\Psi \) are defined as above. Fix a function \(f:\mathbb {R}^n\times \mathbb {R}^n\rightarrow \mathbb {C}\), \(2\pi \mathbb {Z}^n\)-periodic with respect to its last n variables, with support in \(\overline{B}_2(0,R_0)\times \mathbb {R}^n\) for some \(R_0>0\), belonging to \(C^\ell (\mathbb {R}^{2n})\) with \(\left\lfloor \ell \right\rfloor \ge n+1\). For \(s\in \,]0,1]\) and \((x,\theta )\in \mathbb {R}^n\times \mathbb {R}^n\), set

$$\begin{aligned} \begin{array}{lll} \mathbf{f}_s(x,\theta )&{} =\displaystyle \int _{\mathbb {R}^{2n}}K(y,\varphi )f(x-sy,\theta -s\varphi )dyd\varphi \\ &{} =\displaystyle \int _{\mathbb {R}^{2n}}K(y,\varphi )\sum _{k\in \mathbb {Z}^n} \widehat{f}_k(x-sy)e^{ik\cdot (\theta -s\varphi )}dyd\varphi \\ \end{array} \end{aligned}$$

with

$$\begin{aligned} \widehat{f}_k(u)=\frac{1}{(2\pi )^n}\int _{\mathbb {T}^n}f(u,v) e^{-i k\cdot v} dv. \end{aligned}$$
(4.15)

Note that \(\widehat{f}_k\) is \(C^{\ell }\), with support in \(\overline{B}_2(0,R_0)\), so that the previous study on the non-periodic case applies to \(\widehat{f}_k\). By Fubini’s theorem:

$$\begin{aligned} \mathbf{f}_s(x,\theta )= & {} \displaystyle \sum _{k\in \mathbb {Z}^n}\int _{\mathbb {R}^{2n}}K(y,\varphi )\widehat{f}_k(x-sy)e^{ik\cdot (\theta -s\varphi )}dyd\varphi \nonumber \\= & {} \displaystyle \sum _{k\in \mathbb {Z}^n}\Big (\int _{\mathbb {R}^{n}}K_\Phi (y)\widehat{f}_k(x-sy)dy\Big )\Big (\int _{\mathbb {R}^{n}}K_\Psi (\varphi )e^{ik\cdot (\theta -s\varphi )}d\varphi \Big ) \nonumber \\= & {} \displaystyle \sum _{k\in \mathbb {Z}^n}(\mathbf{\widehat{f}}_k)_s(x)\Psi (sk)e^{ik\cdot \theta } \end{aligned}$$
(4.16)

where \((\mathbf{\widehat{f}}_k)_s\) stands for the analytic smoothing of the Fourier coefficient \(\widehat{f}_k\). This proves that the Fourier coefficient \((\widehat{\mathbf{f}}_s)_k(x)\) relative to the periodic variable \(\theta \) reads

$$\begin{aligned} (\widehat{\mathbf{f}}_s)_k(x)=\Psi (sk)(\mathbf{\widehat{f}}_k)_s(x),\quad k\in \mathbb {Z}^n. \end{aligned}$$
(4.17)

Expressions (4.16) and (4.17) make clear that the whole smoothing procedure of a function depending both on action and angle variables consists in constructing a Jackson trigonometric polynomial by smoothing the Fourier coefficients and by suitably truncating the Fourier series.

Using the definition of \(\Psi \), \((\widehat{\mathbf{f}}_s)_k=0\) when \({\left| k\right| }_1>1/s\) and, by (4.17) and (4.9):

$$\begin{aligned} |(\widehat{\mathbf{f}}_s)_k(z)|\le&|(\widehat{\mathbf{f}}_k)_s(z)|\le C_1(n) \left\Vert \widehat{f}_k\right\Vert _{C^0(\mathbb {R}^n)}\le C_1(n){\texttt {C}}_{\mathtt { F}}(n,\ell )\frac{|f|_{C^{\left\lfloor \ell \right\rfloor }(\mathbb {R}^n)}}{|k|_\infty ^{\left\lfloor \ell \right\rfloor }}, \nonumber \\&\quad k\ne 0, \ {\left| k\right| }_1\le 1/s, \end{aligned}$$
(4.18)

and

$$\begin{aligned} |(\widehat{\mathbf{f}}_s)_0(z)|\le C_1(n)\left\Vert \widehat{f}_0\right\Vert _{C^0(\mathbb {R}^n)}\le C_1(n)\left\Vert f\right\Vert _{C^0(\mathbb {R}^n)}. \end{aligned}$$
(4.19)

As for the weighted Fourier norm of \(\mathbf{f}_s\), we finally get:

$$\begin{aligned} \begin{array}{lll} ||\mathbf{f}_s||_{s,s}&{}=&{}\sup _{{\left| \mathrm{Im}\, z\right| }_2\le s}\sum _{k\in \mathbb {Z}^n} {\left| (\widehat{\mathbf{f}}_s)_k(z)\right| } \,e^{s{\left| k\right| }_1}\\ &{}\le &{}C_1(n)\left\Vert f\right\Vert _{C^0(\mathbb {R}^n)}+\displaystyle \sum _{\begin{array}{c} k\in \mathbb {Z}^n\backslash \{0\}:\\ |k|_1\le 1/s \end{array}} eC_1(n){\texttt {C}}_{\mathtt { F}}(n,\ell )\frac{|f|_{C^{\left\lfloor \ell \right\rfloor }(\mathbb {R}^n)}}{|k|_\infty ^{\left\lfloor \ell \right\rfloor }}\le C_L(n,\ell ) |f|_{C^\ell (\mathbb {R}^n)}\,, \end{array} \end{aligned}$$

where

$$\begin{aligned} C_L(n,\ell ):=C_1(n)\left( 1+e{\texttt {C}}_{\mathtt { F}}(n,\ell )\sum _{k\in \mathbb {Z}^n\setminus \{0\}}\frac{1}{|k|_\infty ^{\left\lfloor \ell \right\rfloor }}\right) <+\infty . \end{aligned}$$
(4.20)
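
The finiteness of the lattice sum in (4.20) is where the hypothesis \(\left\lfloor \ell \right\rfloor \ge n+1\) enters: the number of \(k\in \mathbb {Z}^n\) with \(|k|_\infty =m\) is \((2m+1)^n-(2m-1)^n\), which grows like \(m^{n-1}\), so the sum converges exactly when the exponent exceeds n. A minimal sketch (function names are ours):

```python
import math

def shell_count(n, m):
    """Number of k in Z^n with |k|_infty == m, for m >= 1."""
    return (2 * m + 1) ** n - (2 * m - 1) ** n

def lattice_sum(n, q, mmax):
    """Partial sum of 1 / |k|_infty^q over 0 < |k|_infty <= mmax, grouped by shells."""
    return sum(shell_count(n, m) / m ** q for m in range(1, mmax + 1))

# n = 2, q = n + 1 = 3: the shells contain 8 m points, so the sum is 8 * zeta(2).
convergent = lattice_sum(2, 3, 10 ** 5)
# n = 2, q = n = 2: the partial sums grow like 8 * log(mmax).
growth = lattice_sum(2, 2, 10 ** 4) - lattice_sum(2, 2, 10 ** 2)
```

For \(n=2\) and exponent 3 the partial sums approach \(8\pi ^2/6\), whereas for exponent 2 they keep growing logarithmically.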

4.3 The main result with an application to normal forms

4.3.1 Main result

Gathering together the elements of the previous section, we get the following result.

Theorem 4.1

(Analytic smoothing). Fix an integer \(n\ge 1\), \(R>0\) and \(s\in \,]0,1]\). Let f be a \(C^\ell \) function on \(B_\infty (0,2R)\times \mathbb {T}^n\). Then there exist two constants \({\texttt {C}}_A(R,\ell ,n),{\texttt {C}}_B(R,\ell ,n)\) and an analytic function \({\mathtt {f}}_s\) on the set \(\mathbb {A}_s^n\) satisfying

$$\begin{aligned}&\left\Vert f-\mathtt {f}_s\right\Vert _{C^p(B_\infty (0,R)\times \mathbb {T}^n)} \le \ {\texttt {C}}_A(R,\ell ,n)\ s^{\ell -p} |f|_{C^{\ell }(B_\infty (0,2R)\times \mathbb {T}^n)} \nonumber \\&\quad \hbox { for any integer}\ 0\le p\le \left\lfloor \ell \right\rfloor \end{aligned}$$
(4.21)

and

$$\begin{aligned} \left\Vert {\mathtt {f}}_s\right\Vert _{s,s}\le {\texttt {C}}_B(R,\ell ,n) {|f|_{C^{\ell }(B_\infty (0,2R)\times \mathbb {T}^n)}}. \end{aligned}$$
(4.22)

Moreover, \(\mathtt {f}_s\) is a trigonometric polynomial in the angular variables.

Proof

Fix a function \(\chi \in C^\infty (\mathbb {R}^n)\), with values in [0, 1], equal to 1 on the ball \(B_\infty (0,R)\) and with support in \(B_\infty (0,2R)\). Then the product \(\overline{f}:=\chi f\) is \(C^\ell \) on \(\mathbb {A}^n\), has compact support in \(B_\infty (0,2R)\times \mathbb {T}^n\) and coincides with f on \( B_\infty (0,R)\times \mathbb {T}^n\). Moreover

$$\begin{aligned} |\overline{f}|_{C^{\ell }(B_\infty (0,2R)\times \mathbb {T}^n)}\le {{\texttt {C}}_K}|f|_{C^{\ell }(B_\infty (0,2R)\times \mathbb {T}^n)} \end{aligned}$$

where \({{\texttt {C}}_K}=C|\chi |_{C^{\ell }(B_\infty (0,2R)\times \mathbb {T}^n)}\) and C is a universal constant. By the Jackson-Moser-Zehnder theorem applied to \(\overline{f}\), there is an analytic function \(\bar{\mathtt {{{f}}}}_s\) on \(\mathbb {A}^n_s\) satisfying

$$\begin{aligned} \left| \partial ^\alpha \bar{\mathtt {{{f}}}}_s(I,\theta )-\sum _{\begin{array}{c} \beta \in \mathbb {N}^{2n}:\\ |\beta |\le \left\lfloor \ell \right\rfloor -|\alpha | \end{array}}\partial ^{\alpha +\beta } {\bar{f}}(\mathrm{Re} (I,\theta ))\frac{(\mathrm{Im} (I,\theta ))^\beta }{\beta !}\right| \le {\texttt {C}}_{\texttt {J}}s^{{\ell }-|\alpha |}|{\bar{f}}|_{C^{\ell }(\mathbb {A}^n)},\nonumber \\ \end{aligned}$$
(4.23)

so that for any \(p\le \left\lfloor \ell \right\rfloor \):

$$\begin{aligned} \left\Vert {\bar{f}}- \bar{\mathtt {{{f}}}}_s\right\Vert _{C^p(\mathbb {A}^n)} \le \ {\texttt {C}}_{\texttt {J}}s^{\ell -p} |{\bar{f}}|_{C^{\ell }(\mathbb {A}^n)}. \end{aligned}$$
(4.24)

As a consequence, taking the form of \(\chi \) into account, one gets

$$\begin{aligned} \left\Vert {f}-\mathtt {f}_s\ \right\Vert _{C^p(B_\infty (0,R)\times \mathbb {T}^n)} \le {{\texttt {C}}_K} {\texttt {C}}_{\texttt {J}}s^{\ell - p} |f|_{C^{\ell }(B_\infty (0,2R)\times \mathbb {T}^n)}. \end{aligned}$$
(4.25)

Setting \({\texttt {C}}_A:={{\texttt {C}}_K} {\texttt {C}}_{\texttt {J}}\), and since the analyticity width \(\rho \) of the integrable part h is greater than s, the bound (4.21) follows. The bound (4.22) is an immediate consequence of the previous paragraphs if one sets \({\texttt {C}}_B:=C_L\, {{\texttt {C}}_K}\). \(\square \)

4.3.2 An easy way to derive normal forms for Hölder functions from analytic ones.

Let us now explain our strategy for a general Hölder Hamiltonian; we will then restrict ourselves to the case where h is analytic. Let

$$\begin{aligned} H(I,\theta ):=h(I)+f(I,\theta ) \end{aligned}$$
(4.26)

be \(C^\ell \) on \(B_\infty (0,2R) \times \mathbb {T}^n\). Given \(s\in \,]0,1]\), let \(\mathbf{H}_s\) be the s-smoothed analytic function given by Theorem 4.1 applied to the function H. By classical constructions (alluded to in the introduction and which will be recalled in the following), there exist (close to identity) symplectic analytic local diffeomorphisms \(\Psi \) defined on domains \(D\subset \mathbb {A}^n\) which bring \(\mathbf{H}_s=\mathbf{h}_s+\mathbf{f}_s\) to the normal form \(\mathbf{H}_s\circ \Psi : D\rightarrow \mathbb {R}\):

$$\begin{aligned} \mathbf{H}_s\circ \Psi =\mathbf{h}_s+\mathbf {g}+ \mathbf{f}_s^* \end{aligned}$$
(4.27)

where \(\mathbf{h}_s\) is nothing else than the smoothed initial integrable Hamiltonian, \(\mathbf {g}\) is a resonant part which controls the fast drift in certain directions and \(\mathbf{f}_s^*\) is a very small remainder, all these functions being analytic on D. The key point in our subsequent constructions is the following very simple equality

$$\begin{aligned} H\circ \Psi =\mathbf{H}_s\circ \Psi +(H-\mathbf{H}_s)\circ \Psi =\mathbf{h}_s+\mathbf {g}+ \big [\mathbf{f}_s^*+(H-\mathbf{H}_s)\circ \Psi \big ]. \end{aligned}$$
(4.28)

This is a normal form for H, obtained by composition of H with an analytic diffeomorphism, in which the first three terms are analytic on D and only the last one is \(C^\ell \). So \(H\circ \Psi \) has the same structure and dynamical interpretation as \(\mathbf{H}_s\circ \Psi \), provided that the \(C^\ell \) size of the additional remainder \((H-\mathbf{H}_s)\circ \Psi \) is of the same order as the size of the initial remainder \(\mathbf{f}_s^*\). This issue strongly depends on the analytic smoothing method in use; we will show in the sequel that the Jackson-Moser-Zehnder method is relevant for our purposes. Our study will be even easier since we assume from the beginning that the integrable part h is analytic.

It turns out that the same smoothing method, and the same simple way to get a normal form from an analytic one, are also relevant in many other functional classes, the main ones being the Gevrey classes already used in [18], but also other ultradifferentiable ones. This will be developed in a further work.

5 Estimates of Stability

The aim of this section is to prove Theorem 1.1. The proof consists of several steps. Following the discussion in Sect. 2.1 of the introduction, we first build an appropriate resonant covering of the phase space for the integrable Hamiltonian h. Secondly, we study the local dynamics by applying Pöschel’s resonant normal form (see Appendix 5.3) in each resonant block and we set the dependencies of the ultraviolet cut-off K and of the analyticity widths r, s on the perturbative parameter \(\varepsilon \). Finally, we exploit the properties of the resonant covering and obtain a global stability result by means of the so-called “capture in resonance” argument.

5.1 Construction of the resonant patchwork

In the sequel, we follow ref. [13], in which the choices of the parameters and the dependencies of the small denominators on the ultraviolet cut-off K are justified heuristically. To keep the notation coherent, we denote the resonant blocks introduced in Sect. 2 by \(D_\Lambda \) rather than \({{\mathcal {B}}}_\Lambda \). Moreover, when possible we will not keep track of constantsFootnote 15 but rather indicate their presence in equalities and bounds by using the symbols \(\circeq \), \(\lessdot \) and \(\gtrdot \) respectively.

We start by setting some parameters, depending on the steepness indices \({\varvec{\alpha }}_1,...,{\varvec{\alpha }}_{n-1}\) of h, that will be useful throughout this section.

$$\begin{aligned}&p_j:={\left\{ \begin{array}{ll} \Pi _{i=j}^{n-2}{\varvec{\alpha }}_i, &{} \quad \text {if }j\in \{1,...,n-2\}\\ 1, &{} \quad \text {if }j\in \{n-1,n\} \end{array}\right. }\ ; \quad q_j:=np_j-j\ , \ \ j\in \{1,...,n\}\ ; \nonumber \\&\quad c_j:=q_j-q_{j+1} \ , \ \ j\in \{1,...,n-1\} \end{aligned}$$
(5.1)

and set

$$\begin{aligned} a:=\frac{1}{2n{\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-2}}=\frac{1}{2np_1}, \qquad b:=\frac{1}{2n{\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-1}}=\frac{a}{{\varvec{\alpha }}_{n-1}}, \qquad {\mathtt {R}}(\varepsilon ):\circeq \, \varepsilon ^b\ .\nonumber \\ \end{aligned}$$
(5.2)
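
As an illustration of (5.1) and (5.2), the following worked case (our own specialization, not taken from [13]) takes \(n=3\) with generic steepness indices \({\varvec{\alpha }}_1,{\varvec{\alpha }}_2\):

$$\begin{aligned}&p_1={\varvec{\alpha }}_1,\quad p_2=p_3=1\ ;\qquad q_1=3{\varvec{\alpha }}_1-1,\quad q_2=1,\quad q_3=0\ ;\qquad c_1=3{\varvec{\alpha }}_1-2,\quad c_2=1\ ;\\&a=\frac{1}{6{\varvec{\alpha }}_1},\qquad b=\frac{1}{6{\varvec{\alpha }}_1{\varvec{\alpha }}_2}\ . \end{aligned}$$

In particular, when both indices are equal to one, this gives \(a=b=1/6=1/(2n)\).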

With this setting, we fix an action \(I_0\in B_\infty (0,R/4)\) and we consider its neighborhood \(B_2(I_0,{\mathtt {R}}(\varepsilon ))\).

Since h is steep in \(B_\infty (0,R)\), the norm of the frequency \(\omega (I):=\partial _I h(I)\) admits a uniform positive lower bound on this set, that is \(\inf _{I\in B_\infty (0,R)}||\omega (I)||\gtrdot 1\). Hence, when studying the geography of resonances for h, for sufficiently small \(\varepsilon \) and without any loss of generality we can just consider maximal lattices \(\Lambda \subset \mathbb {Z}^n_K\) of dimension \(j\in \{0,...,n-1\}\), with \(K\ge 1\) the ultraviolet cut-off. For a lattice \(\Lambda \) of dimension \(j\in \{0,...,n-1\}\) we define its associated resonant zone as

$$\begin{aligned} Z_\Lambda :=\{I\in B_2(I_0,{\mathtt {R}}(\varepsilon )):\ \forall k\in \Lambda \ \text {one has }\ \ |k\cdot \omega (I)|<\delta _\Lambda \}\ ,\ \ \delta _\Lambda :\circeq \frac{1}{|\Lambda |K^{q_j}}\ .\nonumber \\ \end{aligned}$$
(5.3)

and its associated resonant block \(D_\Lambda \) as

$$\begin{aligned} D_\Lambda :=Z_\Lambda \backslash \bigcup _{\Lambda ':\,\dim \Lambda '=j+1}Z_{\Lambda '}\ . \end{aligned}$$
(5.4)

Note that \(D_\Lambda \) is the part of the resonant zone \(Z_\Lambda \) that contains no resonances other than the ones associated to \(\Lambda \). In particular, this implies that for the completely non-resonant block associated to \(\Lambda =\{0\}\) and for any block \(D_\Lambda \) corresponding to a maximal resonance of dimension \(j=n-1\) one has, respectively

$$\begin{aligned} D_0:=B(I_0,{\mathtt {R}}(\varepsilon ))\backslash \bigcup _{\Lambda ':\ \dim \Lambda '=1}Z_{\Lambda '}\quad \text { and } \quad D_\Lambda =Z_\Lambda \ . \end{aligned}$$
(5.5)

For any \(j\in \{0,...,n-1\}\) we set

$$\begin{aligned} D_j:=\bigcup _{\Lambda :\ \dim \Lambda =j}D_{\Lambda },\qquad Z_j:= \bigcup _{\Lambda :\ \dim \Lambda =j}Z_{\Lambda } \ . \end{aligned}$$
(5.6)

It is easy to see from (5.4) that

$$\begin{aligned} D_j=Z_j\backslash Z_{j+1} \end{aligned}$$
(5.7)

so that from the definition of \(D_0\) in (5.5) one has the decompositions

$$\begin{aligned} B_2(I_0,{\mathtt {R}}(\varepsilon ))=\bigcup _{i=0}^{n-1} D_{i},\quad B_2(I_0,{\mathtt {R}}(\varepsilon ))=\left( \bigcup _{i=0}^{j-1} D_i\right) \cup Z_j \quad \forall j=1,...,n-1.\qquad \end{aligned}$$
(5.8)
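
The identity (5.7) and the decompositions (5.8) can be checked directly from (5.4) and (5.6). The following is a short sketch, in which the convention \(Z_n:=\varnothing \) (no lattice of dimension n is considered) is our own assumption:

$$\begin{aligned}&D_j=\bigcup _{\Lambda :\ \dim \Lambda =j}\big (Z_\Lambda \backslash Z_{j+1}\big )=\bigg (\bigcup _{\Lambda :\ \dim \Lambda =j}Z_\Lambda \bigg )\backslash Z_{j+1}=Z_j\backslash Z_{j+1}\ ,\\&Z_j=\big (Z_j\backslash Z_{j+1}\big )\cup \big (Z_j\cap Z_{j+1}\big )\subset D_j\cup Z_{j+1}\ . \end{aligned}$$

Starting from \(B_2(I_0,{\mathtt {R}}(\varepsilon ))=D_0\cup Z_1\), the second line gives the second decomposition in (5.8) by induction on j, and the first decomposition follows at \(j=n-1\), where \(Z_{n-1}=D_{n-1}\).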

As we have explained in the introduction (see Sect. 2.1), a large drift over a short time of any action variable \(I\in D_\Lambda \) is only possible along the plane of fast drift \(I+\langle \Lambda \rangle \) spanned by the vectors belonging to \(\Lambda \). Moreover, the fast motion of the orbit starting at I along \(I+\langle \Lambda \rangle \) can take the actions out of the block \(D_\Lambda \). So, we are interested in understanding what happens when the actions leave \(D_\Lambda \) but remain in \(Z_\Lambda \). Hence, we are naturally led to consider the intersection of a neighborhood of \(I+\langle \Lambda \rangle \) with \(Z_\Lambda \). In this spirit, we fix

$$\begin{aligned} \rho (\varepsilon ):=\displaystyle \frac{{\mathtt {R}}(\varepsilon )}{2n} \end{aligned}$$
(5.9)

and, for any \(0<\eta \le \rho (\varepsilon )\) and for any action \(I\in D_\Lambda \) with \(\Lambda \ne \{0\}\), we define the disc associated to I as

$$\begin{aligned} {\mathbf {D}}^{\rho }_{\Lambda ,\eta }(I):=\Bigg (\ \bigg (\bigcup _{I'\in I+\langle \Lambda \rangle }B_2(I',\eta )\bigg )\cap Z_\Lambda \cap B\big (I_0,{\mathtt {R}}(\varepsilon )-\rho (\varepsilon )\big ) \ \Bigg )_I \end{aligned}$$
(5.10)

where the subscript I denotes the connected component of the set containing the action I. Since we are going to study the fate of all orbits starting at a fixed block \(D_\Lambda \), with \(\Lambda \ne \{0\}\), that exit such block in a short time along the plane of fast drift, we are also led to define the extended resonant block

(5.11)

where M was defined in (1.6). In the same way, the extended non-resonant block is defined as

$$\begin{aligned} D_0^\rho :=D_0\cap B(I_0,{\mathtt {R}}(\varepsilon )-\rho (\varepsilon ))\ . \end{aligned}$$
(5.12)

5.2 The resonant blocks

As we have explained in Sect. 2.1, Nekhoroshev proved in [20] that, if h is steep, when any action \(I\in D_\Lambda \), with \(\Lambda \ne \{0\}\), moves along the plane of fast drift, it must exit the resonant zone \(Z_\Lambda \) after having travelled a short distance. Indeed, if h is steep with steepness indices \({\varvec{\alpha }}_1,...,{\varvec{\alpha }}_{n-1}\) one can prove that the diameter of the intersection of a neighborhood of the fast drift plane with the resonant zone is small in the sense given by the following

Lemma 5.1

For any \(\Lambda \ne \{0\}\), \(\dim \Lambda =j\in \{1,...,n-1\}\), for any \(I\in D_\Lambda \cap B(I_0,{\mathtt {R}}(\varepsilon )-\rho (\varepsilon ))\) and for any \(I'\in {\mathbf {D}}^\rho _{\Lambda ,r_\Lambda }(I)\) one has

$$\begin{aligned} {\left| I-I'\right| }_2\le r_j,\qquad \hbox {where} \quad r_j:\circeq \displaystyle \frac{1}{K^{q_j/{\varvec{\alpha }}_j}} . \end{aligned}$$
(5.13)

For a proof of this result we refer to Lemma 2.1 of ref. [13].

We notice that a smaller value of \(\varepsilon \), i.e. a higher value of K (since the ultraviolet cut-off is always a decreasing function of \(\varepsilon \)), leads to a smaller maximal distance between any action I belonging to a resonant block and any action belonging to its disc.

Since we will perform normal forms in the (extended) resonant blocks, we also need an estimate of the small divisors in these sets, namely we have

Lemma 5.2

For any maximal lattice \(\Lambda \subset \mathbb {Z}^n_K\) of dimension \(j\in \{0,...,n-1\}\), for any \( k\in \mathbb {Z}^n_K\backslash \Lambda \) and for any \( I\in D^\rho _{\Lambda ,r_\Lambda }\) one has

$$\begin{aligned} |\langle k,\omega (I)\rangle |\ge \alpha _\Lambda :\circeq \displaystyle \frac{1}{|\Lambda |K^{q_j-c_j}}\ , \end{aligned}$$
(5.14)

whereas for any action I in the completely non-resonant block \(D_0\) and for any \(k\in \mathbb {Z}^n_K\) one has

$$\begin{aligned} |\langle k,\omega (I)\rangle |\ge \alpha _0:\circeq \displaystyle \frac{1}{K^{q_1}}\ . \end{aligned}$$
(5.15)

We refer again to [13, Lemma 2.2] for a proof of this result.
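
The exponent in (5.14) can be rewritten by using the definition of \(c_j\) in (5.1). The comparison with (5.3) made after the display is our own reading and is not a statement taken from [13]:

$$\begin{aligned} q_j-c_j=q_j-(q_j-q_{j+1})=q_{j+1}\quad \Longrightarrow \quad \alpha _\Lambda \circeq \frac{1}{|\Lambda |K^{q_{j+1}}}\ ,\qquad j\in \{1,...,n-1\}\ . \end{aligned}$$

Hence, in a block of dimension j, the lower bound on the non-resonant small divisors scales in K with the same exponent \(q_{j+1}\) as the widths \(\delta _{\Lambda '}\) in (5.3) of the resonant zones of dimension \(j+1\). This is consistent with (5.4), where the blocks are obtained by removing those zones, and with (5.15), where the exponent is \(q_1\).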

Finally, a key ingredient in order to ensure stability in the steep case is the fact that, when possibly exiting a resonant zone along the plane of fast drift, the actions must enter another resonant zone associated to a lattice of lower dimension. This is the content of

Lemma 5.3

Let \(\Lambda ,\Lambda '\) be two distinct maximal lattices of \(\mathbb {Z}^n_K\) having the same dimension \(j\in \{1,...,n-1\}\). Then one has

$$\begin{aligned} \text {closure}\left( D^\rho _{\Lambda ,r_\Lambda }\right) \cap Z_{\Lambda '}=\varnothing \ . \end{aligned}$$
(5.16)

Once again, the proof of this Lemma can be found in [13] (Lemma 2.3).

With the ingredients of this paragraph, we are able to prove stability.

5.3 Proof of Theorem 1.1

We start by giving the standard estimates of stability in the completely non-resonant extended block \(D_0^\rho \). Note that the following bounds do not require any geometric assumption on the integrable part h.

Lemma 5.4

(Non-resonant Stability Estimates). For any sufficiently small \(\varepsilon \) and for any time t satisfying

$$\begin{aligned} |t|\le T_0:\circeq \frac{1}{(1+a\ell )|\ln \varepsilon |^{\ell -1}\ \varepsilon ^{a(\ell -1)+1/2}},\qquad a:=\frac{1}{2np_1}\ , \end{aligned}$$
(5.17)

any initial condition \(I(0)\in D_0^\rho \) drifts at most as

(5.18)

Proof

Our goal is to apply Pöschel’s normal form (see Lemma B.1) to the smoothed Hamiltonian of Theorem 4.1 with analyticity widths r and s.

  • Normal form. By monotonicity of the Fourier norm w.r.t. the action variables and (4.22) we immediately get

    $$\begin{aligned} ||\mathtt {f}_s||_{{r,s}}\le ||\mathtt {f}_s||_{{s,s}} \le {\texttt {C}}_B(R, \ell , n ) \varepsilon =:\epsilon \ , \end{aligned}$$
    (5.19)

    for any \(r\le s\), where we set \( \varepsilon :=|f|_{C^\ell (B_\infty (0,R)\times \mathbb {T}^n)} \).

Denote

$$\begin{aligned} {{\mathcal {B}}}_{\varrho ,\sigma }:=\{(I,\theta )\in \mathbb {C}^n:|I-B_\infty (0,R/4)|_2<\varrho \ ,\ \ \theta \in \mathbb {T}^n_\sigma \}\ . \end{aligned}$$

Since h is analytic, we choose not to regularize it further. So let \({\mathtt {H}}_s := h(I) + {\mathtt {f}}_s\) be the corresponding analytic Hamiltonian defined on \({{\mathcal {B}}}_{s,s}\). By Pöschel’s Lemma B.1 applied in the complex extension, denoted \({{\mathcal {D}}}^\rho _{0,r,s}\), of the non-resonant block \(D_0^\rho \), with \(\varrho ' \rightsquigarrow r, \varrho \rightsquigarrow s, {\sigma }\rightsquigarrow s\), if

(5.20)

are satisfied, then there exists a symplectic diffeomorphism \(\Psi _0\) that puts \({\mathtt {H}}_s\) into resonant normal form:

$$\begin{aligned} {\mathtt {H}}_s \circ \Psi _0 = h(I) + {\mathtt {g}}+ {\mathtt {f}}^*_s\ ,\ \ \{h,{\mathtt {g}}\}=0,\quad \Psi _0:{{\mathcal {D}}}^\rho _{0,r/2,s/6}\longrightarrow {{\mathcal {D}}}^\rho _{0,r,s}\ . \end{aligned}$$
(5.21)

In particular, the resonant and non-resonant parts satisfy, respectively,

(5.22)

where are the projectors defined in Lemma B.1.

  • Setting of the initial parameters

Let us set the following dependence on \(\epsilon \) of the ultraviolet cut-off K and of the analyticity widths r, s

$$\begin{aligned}&K:= \left( \frac{\epsilon _0}{\epsilon }\right) ^a\ ,\ \ s:\circeq \,\left( \frac{\epsilon }{\epsilon _0}\right) ^{a}\left| \ln \left[ \left( \frac{\epsilon }{\epsilon _0}\right) ^{6(1+a\ell )}\right] \right| \ ,\nonumber \\&r:\circeq \frac{1}{K^{1+q_1}}\circeq \left( \frac{\epsilon }{\epsilon _0}\right) ^{a(1+q_1)}= \left( \frac{\epsilon }{\epsilon _0}\right) ^{1/2}\ . \end{aligned}$$
(5.23)

where \(\epsilon _0\) is a free parameter and \(\epsilon \le \epsilon _0\) since \(K\ge 1\).
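
Two consequences of the choices (5.23) can be checked by direct substitution. In the sketch below the implicit constants hidden in \(\circeq \) are taken equal to one, which is an assumption made only for readability, and the factor 1/6 in the exponential is chosen here only because it matches the factor 6 appearing in (5.23):

$$\begin{aligned}&a(1+q_1)=\frac{1+(np_1-1)}{2np_1}=\frac{1}{2}\ ,\\&Ks=\left| \ln \left[ \left( \frac{\epsilon }{\epsilon _0}\right) ^{6(1+a\ell )}\right] \right| =6(1+a\ell )\left| \ln \frac{\epsilon }{\epsilon _0}\right| \quad \Longrightarrow \quad e^{-Ks/6}=\left( \frac{\epsilon }{\epsilon _0}\right) ^{1+a\ell }\ . \end{aligned}$$

The first identity is the last equality in (5.23). The second one shows that, with this choice of s, a factor that is exponentially small in Ks becomes a power of \(\epsilon \).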

Remark 5.1

The freedom in the definitions above is subject to the fact that, in order for the construction to be meaningful, the remainder produced by the normal form must be less than or equal to the size of the additional term \((H - \mathtt {H}_s)\circ \Psi _0\), a byproduct of the analytic smoothing. As we are working in finite regularity, the latter is expected to be polynomial. The remainder of the normal form being of order \(e^{-Ks}\), one must have \(Ks \sim O(|\log \epsilon |^c)\) for some \(c>0\). Since s tunes the size of the remainder yielded by the analytic smoothing, it has to be polynomial. Hence one is left with two possibilities: either the choice we made in (5.23), or to set \(K \sim \epsilon ^{-a} |\log \epsilon |^{c} \) and \(s \sim \epsilon ^a\). However, this second choice would worsen the exponents of stability, since the thresholds of applicability in the normal form lemma strongly depend on K. Of course, to deal with other regularity classes, such as the Gevrey one, other choices must be made.

By plugging the choices (5.23) into the three thresholds in (5.20), it is easy to see that there exists an appropriate choice of \(\epsilon _0\) for which the three conditions are simultaneously satisfied. Hence, for the Hölder Hamiltonian

$$\begin{aligned} H=h+f = {\mathtt {H}}_s + f - {\mathtt {f}}_s,\qquad {\mathtt {H}}_s := h + {\mathtt {f}}_s \end{aligned}$$

we can write

$$\begin{aligned} H\circ \Psi _0=\mathtt {H}_s\circ \Psi _0+(f-\mathtt {f}_s)\circ \Psi _0=h + {\mathtt {f}}_s^*+ (f-\mathtt {f}_s)\circ \Psi _0 \ . \end{aligned}$$
(5.24)

Note that since we are in a completely non-resonant block, the resonant term \({\mathtt {g}}\) does not appear in the normal form. Now, the normal form in Lemma B.1 ensures that there exists a constant \(\xi >1\) such that any initial condition \((I(0),\theta (0))\in D_0^\rho \times \mathbb {T}^n\) is mapped by \(\Psi _0\) into \(({\mathtt {I}}(0),\vartheta (0))\in ({{\mathcal {D}}}^\rho _{0,\frac{r}{32\xi }})^\mathbb {R}\times \mathbb {T}^n\). For any time t such that the normalized flow \(\Phi ^t_{H\circ \Psi _0}: ({\mathtt {I}}(0),\vartheta (0))\longmapsto ({\mathtt {I}}(t),\vartheta (t))\) starting at \(({{\mathcal {D}}}^\rho _{0,\frac{r}{32\xi }})^\mathbb {R}\times \mathbb {T}^n\) does not exit from \(({{\mathcal {D}}}^\rho _{0,r/2})^\mathbb {R}\times \mathbb {T}^n\), the evolution of the normalized variables reads \((i=1,...,n)\)

$$\begin{aligned}&|{\mathtt {I}}_i(t)-{\mathtt {I}}_i(0)|\nonumber \\&\quad \le \int _0^t \sup _{({\mathtt {I}},\vartheta )\in ({{\mathcal {D}}}^\rho _{0,\frac{r}{32\xi }})^\mathbb {R}\times \mathbb {T}^n}\bigg (\left| (\partial _{\vartheta _i} {\mathtt {f}}^*_s)\circ \Phi ^t_{H\circ \Psi _0}\right| +\left| \{\partial _{\vartheta _i} [(f-\mathtt {f}_s)\circ \Psi _0]\}\circ \Phi ^t_{H\circ \Psi _0}\right| \bigg ) dt\nonumber \\&\quad \le \int _0^t\left( \sup _{({\mathtt {I}},\vartheta )\in ({{\mathcal {D}}}^\rho _{0,r/2})^\mathbb {R}\times \mathbb {T}^n}|\partial _{\vartheta _i} {\mathtt {f}}^*_s|+\sup _{({\mathtt {I}},\vartheta )\in ({{\mathcal {D}}}^\rho _{0,r/2})^\mathbb {R}\times \mathbb {T}^n}|\partial _{\vartheta _i}[(f-\mathtt {f}_s)\circ \Psi _0]|\right) dt\nonumber \\&\quad \le |t|\left[ \frac{||\mathtt {f}_s^*||_{r/2,s/6}}{s}+ \left\Vert f-\mathtt {f}_s\right\Vert _{C^1(B_\infty (0,R/2)\times \mathbb {T}^n)}\times \sup _{({\mathtt {I}},\vartheta )\in ({{\mathcal {D}}}^\rho _{0,r/2})^\mathbb {R}\times \mathbb {T}^n}|\partial _{\vartheta _i}\Psi _0|\right] .\nonumber \\ \end{aligned}$$
(5.25)

The normal form Lemma B.1, together with the choices in (5.23) and the definition of \(\epsilon \) in (5.19), ensures that

(5.26)

whereas, by Theorem 4.1, we have

(5.27)

Finally, by writing in the usual way \(|\partial _{\vartheta _i} \Psi _0|=|\partial _{\vartheta _i} (\Psi _0-\text {id}+\text {id})|\), the Cauchy estimates together with the bounds in (B.5) imply (since \(r\le s\))

(5.28)

It is easy to see from estimates (5.26), (5.27) and (5.28) that the remainder coming from the analytic smoothing dominates, in order of magnitude, the one coming from the normal form, namely

$$\begin{aligned} \frac{||\mathtt {f}_s^*||_{r/2,s/6}}{s}\ll \left\Vert f-\mathtt {f}_s\right\Vert _{C^1(B_\infty (0,R/2)\times \mathbb {T}^n)}\times \sup _{({\mathtt {I}},\vartheta )\in ({{\mathcal {D}}}^\rho _{0,r/2})^\mathbb {R}\times \mathbb {T}^n}|\partial _{\vartheta _i}\Psi _0| \end{aligned}$$

so that finally we can write

(5.29)

Hence, over a time

one has and, by scaling back to the original variables,

\(\square \)

As for the dynamics in the resonant blocks, we have the following

Lemma 5.5

Consider a maximal lattice \(\Lambda \subset \mathbb {Z}^n_K\) of dimension \(j\in \{1,...,n-1\}\). There exists \(\mathtt {T}_j>0\) such that for any sufficiently small \(\varepsilon \) and for any initial condition \((I(0),\theta (0))\in \bigg (D_\Lambda \cap B\big (I_0,{\mathtt {R}}(\varepsilon )-(j+1)\rho (\varepsilon )\big )\bigg ) \times \mathbb {T}^n\), if one sets

$$\begin{aligned} T_\Lambda :=&\mathtt {T}_j\times \frac{r_\Lambda }{|\ln \varepsilon ^{6(1+a\ell )}|^{\ell -1}\,\varepsilon ^{1+a(\ell -1)}},\qquad a:=\frac{1}{2np_1}\ , \end{aligned}$$
(5.30)

and considers the time of escape of the flow generated by H from the extended resonant block

$$\begin{aligned} \tau _e:=&\inf \left\{ t\in \mathbb {R}: \Phi ^t_{H}\bigg (D_\Lambda \cap B\big (I_0,{\mathtt {R}}(\varepsilon )-(j+1)\rho (\varepsilon )\big )\times \mathbb {T}^n\bigg )\not \subset D^\rho _{\Lambda ,r_\Lambda }\times \mathbb {T}^n\right\} \ , \end{aligned}$$
(5.31)

the following dichotomy applies:

  1. (1)

    If \(|\tau _e|\ge T_\Lambda \) one has

    $$\begin{aligned} |I(t)-I(0)|_2< \rho (\varepsilon ) \end{aligned}$$
    (5.32)

    over a time \(|t|\le T_\Lambda \);

  2. (2)

    If \(|\tau _e|< T_\Lambda \) there exists \(i\in \{0,...,j-1\}\) such that

    $$\begin{aligned} I(\tau _e)\in D_i\cap \bigg (B\big (I_0,{\mathtt {R}}(\varepsilon )-j\rho (\varepsilon )\big )\bigg )\ . \end{aligned}$$

Proof

We start by considering the case \(|\tau _e|\ge T_\Lambda \). In a similar way to what we did in the proof of Lemma 5.4, we apply Pöschel’s Normal Form (see Lemma B.1) to the smoothed Hamiltonian \({\mathtt {H}}_s\) in the complex extension \((D^\rho _{\Lambda ,r_\Lambda })_{r_\Lambda }\) of the real extended resonant block \(D^\rho _{\Lambda ,r_\Lambda }\), with parameters

$$\begin{aligned} K:= \left( \frac{\epsilon _0}{\epsilon }\right) ^a\ ,\ \ s:\circeq \,\left( \frac{\epsilon }{\epsilon _0}\right) ^{a}\left| \ln \left[ \left( \frac{\epsilon }{\epsilon _0}\right) ^{6(1+a\ell )}\right] \right| \ ,\ \ r_\Lambda :\circeq \frac{1}{|\Lambda |K^{q_j}} \end{aligned}$$
(5.33)

and with a small divisor estimate given by formula (5.14) in Lemma 5.2, namely

$$\begin{aligned} \alpha _\Lambda :\circeq \frac{1}{|\Lambda |K^{q_j-c_j}}\ . \end{aligned}$$
(5.34)

As before, we plug (5.33) and (5.34) into Pöschel’s thresholds (B.1) – (B.2) and we derive the conditions

(5.35)

By definition of the parameters \(p_j\) in (5.1), it is easy to see that the first two conditions are always satisfied by appropriately choosing \(\epsilon _0\), whereas the last condition is trivial.

Therefore, by taking into account the notations in (3.6), there exists a symplectic transformation \(\Psi _\Lambda :(D^\rho _{\Lambda ,r_\Lambda })_{r_\Lambda /2}\times \mathbb {T}^n_{s/6}\longrightarrow (D^\rho _{\Lambda ,r_\Lambda })_{r_\Lambda }\times \mathbb {T}^n_{s}\ ,\ \ ({\mathtt {I}},\vartheta )\longmapsto (I,\theta )\), that takes H into the resonant normal form

$$\begin{aligned} H\circ \Psi _\Lambda =\mathtt {H}_s\circ \Psi _\Lambda +(H-\mathtt {H}_s)\circ \Psi _\Lambda =h + {\mathtt {g}}+{\mathtt {f}}_s^*+ (f-\mathtt {f}_s)\circ \Psi _\Lambda \end{aligned}$$
(5.36)

with \(\{h,{\mathtt {g}}\}=0\).

Now, for any time t such that \(|t|\le T_\Lambda \le |\tau _e|\), the dynamics on the subspace orthogonal to the plane of fast drift \(\langle \Lambda \rangle \) can be controlled in the usual way by exploiting the smallness of the non-resonant remainder \({\mathtt {f}}_s^*\), as well as that of \((f-{\mathtt {f}}_s)\circ \Psi _{\Lambda }\). Namely, for any initial position in the actions \(I(0)\in D_\Lambda \), by the first estimate in (B.5) one has that the associated normalized coordinate satisfies \({\mathtt {I}}(0)\in (D_\Lambda )^\mathbb {R}_{\frac{r_\Lambda }{32\xi }}\), where \((D_\Lambda )^\mathbb {R}_{\frac{r_\Lambda }{32\xi }}\) represents the real projection of the complex extension of width \(\frac{r_\Lambda }{32\xi }\) around \(D_\Lambda \) (not to be confused with the extended resonant block) and where \(\xi >1\) is a free parameter that can be suitably adjusted. By taking into account the fact that \(\Pi _{\langle \Lambda \rangle ^\perp }(\partial _{\vartheta } {\mathtt {g}})=0\), one can write

$$\begin{aligned} \begin{aligned}&\left| \Pi _{\langle \Lambda \rangle ^\perp }\big ({\mathtt {I}}(t)-{\mathtt {I}}(0)\big )\right| _2\\&\le \int _0^t \sup _{({\mathtt {I}},\vartheta )\in \left( D_\Lambda \right) _{\frac{r_\Lambda }{32\xi }}^\mathbb {R}\times \mathbb {T}^n} \bigg (\left| \Pi _{\langle \Lambda \rangle ^\perp }(\partial _{\vartheta } {\mathtt {g}} +\partial _{\vartheta } {\mathtt {f}}^*_s)\circ \Phi ^t_{H\circ \Psi _\Lambda }\right| _2\\&\quad +\left| \Pi _{\langle \Lambda \rangle ^\perp }\{\partial _{\vartheta } [(f-\mathtt {f}_s)\circ \Psi _\Lambda ]\}\circ \Phi ^t_{H\circ \Psi _\Lambda }\right| _2\bigg ) dt\\&\le \int _0^t \sup _{({\mathtt {I}},\vartheta )\in \left( D_\Lambda \right) _{\frac{r_\Lambda }{32\xi }}^\mathbb {R}\times \mathbb {T}^n} \bigg (\left| (\partial _{\vartheta } {\mathtt {f}}^*_s)\circ \Phi ^t_{H\circ \Psi _\Lambda }\right| _2 +\left| \{\partial _{\vartheta } [(f-\mathtt {f}_s)\circ \Psi _\Lambda ]\}\circ \Phi ^t_{H\circ \Psi _\Lambda }\right| _2\bigg ) dt\\&\le \sup _{({\mathtt {I}},\vartheta )\in \left( D^\rho _{\Lambda ,r_\Lambda }\right) _{\frac{r_\Lambda }{32\xi }}^\mathbb {R}\times \mathbb {T}^n}\bigg (\left| (\partial _{\vartheta } {\mathtt {f}}^*_s) \right| _2+\left| \{\partial _{\vartheta } [(f-\mathtt {f}_s)\circ \Psi _\Lambda ]\}\right| _2\bigg ) |t|\ ,\\ \end{aligned} \end{aligned}$$
(5.37)

where the last inequality follows from the fact that \(|t|\le \tau _e\) and, since the initial variables are confined in \(D^\rho _{\Lambda ,r_\Lambda }\), the normalized ones stay in \((D^\rho _{\Lambda ,r_\Lambda })^\mathbb {R}_{\frac{r_\Lambda }{32\xi }}\) over the same time.

Since \(|t|\le T_\Lambda \le \tau _e\), by the same arguments that were used in estimate (5.25) and estimate (5.37) we obtain

(5.38)

by suitably choosing \(\mathtt {T}_j\).

Let us decompose the variation of the action variables as

$$\begin{aligned} \begin{aligned} I(t)-I(0)=&I(t)-{\mathtt {I}}(t)+{\mathtt {I}}(t)-{\mathtt {I}}(0)+{\mathtt {I}}(0)-I(0)\\ =&I(t)-{\mathtt {I}}(t)+\Pi _{\langle \Lambda \rangle ^\perp }\big ({\mathtt {I}}(t)-{\mathtt {I}}(0)\big ) +\Pi _{\langle \Lambda \rangle }\big ({\mathtt {I}}(t)-{\mathtt {I}}(0)\big )+{\mathtt {I}}(0)-I(0)\ , \end{aligned} \end{aligned}$$
(5.39)

so that estimate (5.38), together with the size of the normal form, implies that, for \(|t|\le T_\Lambda \), the motion orthogonal to the fast drift plane is bounded by

$$\begin{aligned} \begin{aligned}&|I(t)-I(0)-\Pi _{\langle \Lambda \rangle }\big ({\mathtt {I}}(t)-{\mathtt {I}}(0)\big )|_2\\&\quad \le |I(t)-{\mathtt {I}}(t)|_2 +|\Pi _{\langle \Lambda \rangle ^\perp }\big ({\mathtt {I}}(t)-{\mathtt {I}}(0)\big )|_2+|{\mathtt {I}}(0)-I(0)|_2\\&\quad \le \frac{r_\Lambda }{32\xi } + \frac{r_\Lambda }{4} + \frac{r_\Lambda }{32\xi }\le \frac{3}{4} r_\Lambda \ , \end{aligned} \end{aligned}$$
(5.40)

where we have used the fact that \(\xi >1\). Hence, by (5.40), \(I(t)\in D^\rho _{\Lambda ,r_\Lambda }\) since \(I(0)\in D_\Lambda \) and the orbit lies entirely in this set for any \(|t|\le T_\Lambda \le \tau _e\); moreover, the definition in (5.10) implies

$$\begin{aligned} I(t)\in {\mathbf {D}}^\rho _{\Lambda ,\frac{3}{4} r_\Lambda }(I(0))\subset {\mathbf {D}}^\rho _{\Lambda , r_\Lambda }(I(0))\ . \end{aligned}$$

This fact, together with Lemma 5.1, yields

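$$\begin{aligned} |I(t)-I(0)|_2\le r_j\qquad \text {for any }\ |t|\le T_\Lambda \ . \end{aligned}$$
(5.41)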

As it is shown in [13] (formula (38)), a careful choice of the constants leads to

$$\begin{aligned} \max _{j\in \{1,...,n-1\}}r_j< \rho (\varepsilon )\ , \end{aligned}$$

which concludes the proof of the first claim of this Lemma.

We now consider the second claim. In this case, for any time t such that \(|t|< |\tau _e|< T_\Lambda \) we can repeat the same arguments above and find \(I(t)\in {\mathbf {D}}^\rho _{\Lambda ,\frac{3}{4} r_\Lambda }(I(0))\). Then, by continuity of the flow, the actions at the escape time satisfy

$$\begin{aligned} I(\tau _e)\in \text {closure}\big ({\mathbf {D}}^\rho _{\Lambda , \frac{3}{4} r_\Lambda }(I(0))\big ) . \end{aligned}$$
(5.42)

Again, by Lemma 5.1, this implies \(|I(t)-I(0)|_2< \rho (\varepsilon )\) for any \(|t|<|\tau _e|< T_\Lambda \), so that, since \(I(0)\in B_2\big (I_0, {\mathtt {R}}(\varepsilon )-(j+1)\rho (\varepsilon )\big )\), one has

$$\begin{aligned} I(\tau _e)\in B_2\big (I_0, {\mathtt {R}}(\varepsilon )-j\rho (\varepsilon )\big )\,. \end{aligned}$$
(5.43)

Now, we shall prove that \(I(\tau _e)\not \in Z_\Lambda \). By definition(5.11) we have \(I(\tau _e)\not \in D^\rho _{\Lambda ,r_\Lambda }\) and, thanks to , this means that there does not exist any action \( I^*\in D_\Lambda \cap B\big (I_0, {\mathtt {R}}(\varepsilon )-\rho (\varepsilon )\big )\) such that \(I(\tau _e)\) belongs to its disc \({\mathbf {D}}^\rho _{\Lambda ,r_\Lambda }(I^*)\). Hence, by (5.10), \(I(\tau _e)\) must satisfy at least one of the three following conditions:

  1. (1)

    \(\not \exists I^*\in D_\Lambda \cap B_2\big (I_0,{\mathtt {R}}(\varepsilon )-\rho (\varepsilon )\big ):\, I(\tau _e)\in \bigcup _{I'\in I^*+\langle \Lambda \rangle }B_2(I',r_\Lambda )\);

  2. (2)

    \(I(\tau _e)\not \in Z_\Lambda \);

  3. (3)

    \(I(\tau _e)\not \in B_2\big (I_0,{\mathtt {R}}(\varepsilon )-\rho (\varepsilon )\big )\).

By taking (5.42) and (5.43) into account, we see that the first and the third possibility cannot occur. Therefore, there must exist a maximal lattice \(\Lambda '\ne \Lambda \) and a resonant zone \(Z_{\Lambda '}\) such that \(I(\tau _e)\in Z_{\Lambda '}\). Moreover, Lemma 5.3 ensures that \(\dim \Lambda '\ne \dim \Lambda \) so that \(I(\tau _e)\not \in Z_j\). The second decomposition in (5.8) together with (5.43) and (5.9) implies that \(I(\tau _e)\) belongs to a resonant block of lower multiplicity, hence the claim. \(\square \)

Remark 5.2

The decompositions in (5.8) are a covering of \(B(I_0,{\mathtt {R}}(\varepsilon ))\) but they are not a partition since, in general, \(D_i\cap D_j\ne \varnothing \) for \(j>i+1\). Hence, nothing prevents \(I(\tau _e)\) from belonging to a resonant block of strictly higher multiplicity than the starting one. If this happens, however, thanks to the construction in (5.8), one is guaranteed that \(I(\tau _e)\) will also belong to another block associated to a lower order resonance. One therefore chooses the block in which to study the evolution of the actions once they leave the resonant zone they started at. This is at the core of the resonant trap argument, which is discussed in the sequel.

Proof of Theorem 1.1

Theorem 1.1 follows from Lemmas 5.4 and 5.5. Indeed, for any initial condition in the action variables \(I_0\in B_\infty (0,R/4)\), we consider the ball \(B_2(I_0,{\mathtt {R}}(\varepsilon ))\) and the following dichotomy holds:

  1. (1)

    either \(I_0\) belongs to the completely non-resonant domain \(D_0^\rho \), in which case the proof ends here thanks to Lemma 5.4;

  2. (2)

    or for some \(j\in \{1,...,n-1\}\) and some maximal \(\Lambda \subset \mathbb {Z}^n_K\) of rank j, \(I_0\in D_\Lambda \cap B\big (I_0,{\mathtt {R}}(\varepsilon )-(j+1)\rho (\varepsilon )\big )\).

In the second case, Lemma 5.5 applies and one has another dichotomy:

  1. (1)

    either \(|I(t)-I(0)|_2<\rho (\varepsilon )\) over a time \(T_\Lambda \); in this case the Theorem is proven since, taking into account the fact that the analyticity width in Lemma 5.4 satisfies \(r\circeq \varepsilon ^{1/2}\), one has

    (5.44)

    where the last inequality is a consequence of the fact that, by (5.23), (5.33), one can write

    $$\begin{aligned} r\le r_\Lambda \quad \longleftrightarrow \quad \frac{1}{K^{1+q_1}}\le \frac{1}{|\Lambda |K^{q_j}} \end{aligned}$$

    and that, since \(|\Lambda |\le K^j\), the stricter inequality

    $$\begin{aligned} \frac{1}{K^{1+q_1}}\le \frac{1}{K^{j+q_j}}\quad \longleftrightarrow \quad 1+q_1 \ge j+q_j \quad \longleftrightarrow \quad p_1\ge p_j\ , \end{aligned}$$

    is trivially satisfied by the definition of \(p_1\) and \(p_j\), \(j\in \{1,...,n-1\}\), in (5.1) and by the fact that the steepness indices are always greater than or equal to one.

  2. (2)

    or the actions enter a resonant block \(D_i\cap \bigg (B\big (I_0,{\mathtt {R}}(\varepsilon )-j\rho (\varepsilon )\big )\bigg )\) corresponding to a resonant lattice of dimension \(i<j\) after having travelled a distance \(\rho (\varepsilon )\) over a time shorter than the time of escape. In this block, the above arguments can be repeated so that, after having possibly visited at most \(n-1\) blocks, overall the actions can travel at most a distance \((n-1)\rho (\varepsilon )\) before entering the completely non-resonant block, in which they are trapped for a time \(T_0\) given by Lemma 5.4 and they travel for another length \(\rho (\varepsilon )\). Thanks to (5.9), by construction one has \(|I(t)-I(0)|\le n\rho (\varepsilon )=\frac{1}{2} {\mathtt {R}}(\varepsilon )\circeq \varepsilon ^b\).
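
The exponent comparison used in case (1) above can be unpacked from the definitions in (5.1). The following is a short sketch of that computation, written for \(j\in \{1,...,n-1\}\):

$$\begin{aligned}&1+q_1=np_1\ ,\qquad j+q_j=np_j\quad \Longrightarrow \quad \big (1+q_1\ge j+q_j\ \Longleftrightarrow \ p_1\ge p_j\big )\ ,\\&p_1=\bigg (\prod _{i=1}^{j-1}{\varvec{\alpha }}_i\bigg )\, p_j\ge p_j\qquad \text {since }\ {\varvec{\alpha }}_i\ge 1\ . \end{aligned}$$

For \(j=1\) the product is empty and the inequality is an equality, while for \(j=n-1\) one has \(p_{n-1}=1\) and \(p_1={\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-2}\ge 1\).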

This is the so-called resonant trap argument and concludes the proof of Theorem 1.1, once one sets

$$\begin{aligned} \mathtt {a}=a(\ell -1)+\frac{1}{2},\qquad \mathtt {b}=b\ . \end{aligned}$$

\(\square \)
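
For orientation, substituting the definitions of a and b in (5.2) into the two exponents above gives the explicit expressions below. This is a plain substitution under the notation of this section and not an additional result:

$$\begin{aligned} \mathtt {a}=\frac{\ell -1}{2n{\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-2}}+\frac{1}{2}\ ,\qquad \mathtt {b}=\frac{1}{2n{\varvec{\alpha }}_1...{\varvec{\alpha }}_{n-1}}\ . \end{aligned}$$

When all the steepness indices are equal to one, these reduce to \(\mathtt {a}=\frac{\ell +n-1}{2n}\) and \(\mathtt {b}=\frac{1}{2n}\).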