Analytic Smoothing and Nekhoroshev estimates for H\"older steep Hamiltonians

In this paper we prove the first result of Nekhoroshev stability for steep Hamiltonians in H\"older class. Our new approach combines the classical theory of normal forms in the analytic category with an improved smoothing procedure to approximate a H\"older Hamiltonian with an analytic one. It is only for the sake of clarity that we consider the (difficult) case of H\"older perturbations of an analytic integrable Hamiltonian, but our method is flexible enough to work in many other functional classes, including the Gevrey one. The stability exponents can be taken to be $(\ell-1)/(2n{\mathbf{\alpha}}_1\cdots{\mathbf{\alpha}}_{n-2})+1/2$ for the time of stability and $1/(2n{\mathbf{\alpha}}_1\cdots{\mathbf{\alpha}}_{n-1})$ for the radius of stability, $n$ being the dimension, $\ell>n+1$ being the regularity and the ${\mathbf{\alpha}}_i$'s being the indices of steepness. Crucial to obtaining the exponents above is a new non-standard estimate on the Fourier norm of the smoothed function. As a byproduct we improve the stability exponents in the $C^k$ class, with integer $k$.


Introduction and main results
1. The main goal of this work is to introduce a unified way of proving "long time stability" of the action variables for perturbations of completely integrable Hamiltonian systems which belong to a large class of function spaces. We will limit ourselves here to Hölder perturbations of analytic systems, but our method is flexible enough to be adapted to many other settings 1 .
The effective stability theory for nearly-integrable Hamiltonian systems was initiated by the pioneering work of J.E. Littlewood [14] and reached a first main achievement in the seventies with the work of N.N. Nekhoroshev [19]; it was then developed by many authors. The usual setting is that of Hamiltonian systems of the form
(1.1) $H(I,\theta) = h(I) + f(I,\theta)$,
where $(I,\theta) \in \mathbb{R}^n \times \mathbb{T}^n$ are the action-angle variables and f is small with respect to h. In Nekhoroshev's work the Hamiltonian H is analytic and h satisfies a steepness condition (see definition 1.1 below). The theory has then been developed in various settings: H can be assumed to be Gevrey (which includes the analytic case) or $C^k$ with integer $k \ge 2$, while h can be assumed to be convex or quasi-convex (see for example [18] or [5]). The norm of f, relative to the function space at hand, is denoted by ε. For systems of the form (1.1), the previous results assert that the action variables are confined in a ball of radius R(ε) centered at the initial action during a time T(ε), provided that ε is smaller than some threshold E. We say that R(ε) is the confinement radius, T(ε) is the stability time and E is the applicability threshold. The remarkable fact is that, h being given, the results depend only on the norm of f and not on its particular form.
1 Assuming that the unperturbed system is analytic is just a matter of simplification.
Much attention has been paid in the literature to obtaining good estimates for the quantities R(ε) and T(ε) in the different frameworks. As we shall see in the sequel, in the setting of Hölder perturbations of analytic integrable systems, the method we introduce in this paper yields sharper estimates than those found in the literature up to now. Before stating our results rigorously, however, it is useful to have an overview of the classical results on the effective stability of near-integrable Hamiltonian systems.
2. The classical results. Let us briefly describe the classical abstract results. In the 70's Nekhoroshev proved his seminal theorem [19], which asserts that for a steep real-analytic function h and for any real-analytic perturbation f with analytic extension to a complex domain D, all solutions are stable at least over exponentially long time intervals. Namely, there exist positive exponents a, b and a positive threshold E, depending only on h, such that if $|f|_D \le E$, then any initial condition $(I_0, \theta_0)$ gives rise to a solution $(I(t), \theta(t))$ which is defined at least for $|t| \le \exp\bigl(c(1/\varepsilon)^a\bigr)$ and satisfies $|I(t) - I_0| \le C\varepsilon^b$ in that range. Here $|f|_D$ is the $C^0$ sup-norm on the domain D and c, C are positive constants which also depend only on h. With our notation, for these systems:
$R(\varepsilon) = C\varepsilon^b$, $T(\varepsilon) = \exp\bigl(c(1/\varepsilon)^a\bigr)$,
while the expression of the threshold E is quite difficult to obtain explicitly 2 , see [19]. Since the constants c and C are less significant than the exponents we will get rid of them in our subsequent description. Nekhoroshev's proof is based on the construction of a partition (a "patchwork") of the phase space into zones of approximate resonances of different multiplicities, over which one can construct adapted normal forms. The global stability result requires a very delicate control of the size and disposition of the elements of the patchwork in order to produce a "dynamical confinement" preventing the orbits from fast motions along distances larger than the confinement radius (see below for a discussion).
In the convex case, as noticed in [11] and [4], a shrewd use of energy conservation leads to a much simpler and "physical" way to confine the orbits. This gave rise to two distinct series of works, originating in the articles of Lochak [15], where the simultaneous approximation method was introduced, and Pöschel [23], where the construction of Nekhoroshev's patchwork was made much easier, both relying on the convexity or quasi-convexity of the integrable Hamiltonian.
As alluded to above, long time stability does not require a priori the analyticity of the Hamiltonian at hand. For general Gevrey quasi-convex systems 3 , the fast decay of the Fourier coefficients also yields exponentially long stability times. Namely, for β-Gevrey systems (where β is the Gevrey exponent) it is proved in [18] that [...]. The proof is based on a direct construction of normal forms for Gevrey systems. This study was initiated by M. Herman with the aim of proving the optimality of the stability exponents by constructing explicit examples that take advantage of the flexibility of the Gevrey category, see below.
Soon after, finitely differentiable systems were investigated in [5] using a direct implementation of Lochak's scheme in this setting, which yields the estimates [...] for quasi-convex $C^\ell$ systems with integer $\ell \ge 2$. On the other hand, the stability of $C^\ell$ systems, with ℓ an integer such that $\ell \ge \ell^* n + 1$ for some suitable $\ell^* \ge 1$, $\ell^* \in \mathbb{N}$, satisfying a property known as Diophantine-Morse condition 4 , was investigated in [6], where the values R(ε) = Cε 1/(4(n+1)) n were found. The case $\ell = +\infty$ has been studied in [1], where the authors find that, in the case $h(I) = I^2/2$ and for fixed $b \in (0, 1/2)$, for any $M > 0$ there exists $C_M > 0$ such that [...]. The result is achieved by implementing an innovative global normal form in Pöschel's framework. Finally, we also refer to the recent work [7] and references therein for much more information about stability in various functional classes.
3. Purpose of the work. The objective of this paper is to make a systematic use of analytic smoothing methods to derive normal forms in a very simple way (whatever the regularity of the Hamiltonians at hand) from the usual analytic ones. This way we get maximal flexibility to adapt the different long-time stability proofs to a large class of function spaces. We will investigate here only the case of Hölder differentiable Hamiltonians, but our method extends to any steep functions belonging to any regularity class which admits an analytic smoothing. More precisely, the proposed strategy (see Section 4.3) allows us to prove, in a very simple way, the first Nekhoroshev-type result of stability for Hölder steep Hamiltonians with presumed sharp exponents 5 . In this case one cannot expect to get more than polynomial stability times relative to the size ε of the perturbation [5]. In the course of the proof we need to adjust in a rather unusual way the size of the various parameters, namely the ultraviolet cutoff and, in an essential way, the analyticity width, as functions of the size ε of the perturbation.
4. Main results. Let us fix the main definitions and assumptions. In the following, given $\nu \in \{1, \dots, \infty\}$, we denote by $|\cdot|_\nu$ the corresponding $\ell^\nu$-norm in $\mathbb{R}^n$ or $\mathbb{C}^n$. We denote by $B_\nu(I_0, R)$ the open ball centered at $I_0$ of radius R for the norm $|\cdot|_\nu$ in $\mathbb{R}^n$.
Consider a Hamiltonian of the form (1.1), where we assume, for the sake of simplicity, that the unperturbed part h is analytic 6 while only the perturbation f is Hölder, so: [...] where $B_\infty(0,R)_{\rho_0}$ is the complex extension of analyticity width $\rho_0 \ge 1$ of $B_\infty(0,R)$, and $\ell \in (1,+\infty)$ (meaning that f is Hölder differentiable when ℓ is not an integer, see section 3 for a brief overview on this class of functions). The small parameter is ε, the Hölder norm of f (see (3.2) for a definition of the Hölder norm). We denote by $\omega = \nabla h : \mathbb{R}^n \to \mathbb{R}^n$ the action-to-frequency map attached to h. We will assume that the Hessian of h is uniformly bounded from above: [...] where op stands for the operator norm induced by the Hermitian norm on $\mathbb{C}^n$. We will also assume that the Hamiltonian h is steep according to the following definition.
4 The Diophantine-Morse property is a special case of the Diophantine-steep condition introduced in [22] which, in turn, is a prevalent condition on integrable systems that ensures long time stability once these are perturbed. All steep functions are Diophantine-steep.
5 Sharpness has the same meaning as in [13], i.e. these are the best values of the exponents for T(ε) and R(ε) that one can obtain with these techniques.
6 As we will see in the course of the proof, assuming that h is Hölder with large enough exponents would be enough, see [...].
Remark 1.1. Note that a uniformly strictly convex function is steep with steepness indices equal to 1.
Remark 1.2. The steepness condition is generic in the space of jets of sufficiently regular functions (see [20] for the general discussion and [25], [2] for sufficient conditions for steepness in the space of jets of order four and five respectively).
Then, there exist positive constants $E = E(n,\ell,\alpha)$, $C_I := C_I(n,\ell,\alpha)$, $C_T := C_T(n,\ell,\alpha)$ such that, for ε ≤ E, the radius and time of confinement relative to any initial condition in the set $B_\infty(0, R/4)$ satisfy: [...]
• The presence of the logarithm in (1.8) comes from the fact that in our method we have some freedom to fix the analyticity width depending on ε, in contrast with the classical analytic setting. We refer the reader to Remark 5.1, where this comment is contextualized, the dependence of the analyticity width on ε is made precise and a qualitative justification is given.
• Our proof relies on the geometric construction of the geography of resonances introduced in [13], which is appropriate only for Hamiltonians in n ≥ 3 degrees of freedom. Here too we shall restrict to this setting, the 2 d.o.f. isoenergetic non-degenerate case being easily managed through KAM theory. A specific construction should be implemented to treat the peculiarity of the isoenergetic degenerate 2 d.o.f. case. This study is in progress in a forthcoming work.
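For orientation, the exponents quoted in the abstract can be evaluated for concrete values of n, ℓ and the steepness indices. The following is a minimal Python sketch, not part of the proof: the function name `stability_exponents` and its argument layout are illustrative assumptions, and the logarithmic correction appearing in (1.8) is not modelled.

```python
from fractions import Fraction
from math import prod


def stability_exponents(n, ell, alphas):
    """Return (a, b), the exponents quoted in the abstract.

    a = (ell - 1) / (2 n alpha_1 ... alpha_{n-2}) + 1/2   (time of stability)
    b = 1 / (2 n alpha_1 ... alpha_{n-1})                 (radius of stability)

    `alphas` is the list [alpha_1, ..., alpha_{n-1}] of steepness indices.
    This helper is hypothetical and only evaluates the two formulas.
    """
    if n < 3:
        raise ValueError("the construction is restricted to n >= 3")
    if len(alphas) != n - 1:
        raise ValueError("expected n - 1 steepness indices")
    ell = Fraction(ell)
    alphas = [Fraction(x) for x in alphas]
    a = (ell - 1) / (2 * n * prod(alphas[: n - 2])) + Fraction(1, 2)
    b = 1 / (2 * n * prod(alphas))
    return a, b


if __name__ == "__main__":
    # Convex-like case: all steepness indices equal to 1 (see Remark 1.1).
    a, b = stability_exponents(n=3, ell=5, alphas=[1, 1])
    print("time exponent a =", a, " radius exponent b =", b)
```

For instance, n = 3, ℓ = 5 and α_1 = α_2 = 1 give the time exponent 7/6 and the radius exponent 1/6; larger steepness indices decrease both exponents.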

Prospects.
The sharpness of the exponents in Theorem 1.1 should be proved in the same way as in the case of convex systems. The first attempt to tackle this problem led to work in the Gevrey category instead of the analytic one and to construct examples with unstable orbits, which experience a drift in action of the same order as the confinement radius within a time of the same order as the stability time, see [18]. It was then realized that the initial conjecture in quasi-convex analytic systems (a ∼ 1/2n, see [10] and Lochak [15]) was in fact incorrect: as proved in [8] using a purely topological argument together with the previous remark on the local exponents near simple resonances, one can choose a = 1/(2(n − 1)) as a global stability exponent for T(ε). This result was improved soon after with a ∼ 1/(2(n − 2)) (see [26]). The construction of unstable systems proving the optimality of these latter exponents was achieved in [18], [16], [26]. A remarkable fact is that the unstable mechanism introduced by Arnold in the 60's, with its subsequent improvements, is exactly what is needed to produce the unstable examples in the quasi-convex case.
7 Actually one could probably get n/2 by making use of Paley-Littlewood theory.
As for the steep case, a careful construction of the geography of resonances leads with strong evidence to the conjecture that the exponents $a = 1/(2n\alpha_1\cdots\alpha_{n-2})$ and $b = 1/(2n\alpha_1\cdots\alpha_{n-1})$ are sharp (see ref. [13]). The question of constructing explicit examples with unstable orbits proving this sharpness is still open and is maybe the last challenging problem in the general long time stability theory, probably relying on new Arnold diffusion ideas.
The paper is organized as follows: in the next section we give a short overview of the classical methods with particular attention on the geometry of resonant blocks, on which the present work strongly relies. Next we define the functional setting. In Section 4 we introduce the analytic smoothing appropriately adapted to our problem. Finally Section 5 is devoted to the study of the steep case.
Acknowledgements. We wish to thank A. Bounemoura, L. Biasco, L. Chierchia, M. Salvatori, and L. Niederman for fruitful discussions and stimulating comments, which definitely helped to improve this work. J.E.M. acknowledges the support of the INdAM-GNAMPA grant "Spectral and dynamical properties of Hamiltonian systems".
Let us first make this idea more precise. Given an integer lattice $\Lambda \subset \mathbb{R}^n$ of dimension $m \in \{1, \dots, n-1\}$ (a resonance lattice), one associates with Λ the resonance vector subspace $\Lambda^\perp \subset \mathbb{R}^n$ in the frequency space $\mathbb{R}^n$, together with the corresponding resonance subset in the action space previously introduced,
$M_\Lambda := \omega^{-1}(\Lambda^\perp) = \{ I \in O : \omega(I) \in \Lambda^\perp \}$,
where ω = ∇h is the frequency map. The dimension m of Λ is said to be the multiplicity of the resonance $M_\Lambda$. Of course, given a resonance module $\Lambda' \supset \Lambda$ with $\dim \Lambda' > \dim \Lambda$, the resonance $M_{\Lambda'}$ is contained in $M_\Lambda$, so that a resonance subset contains in general infinitely many resonances of higher multiplicity. The complement $M_0 \subset O$ of the union of all resonance subsets is the non-resonant subset. In general, a resonance subset $M_\Lambda$ has no particular structure; however, one can think of $M_\Lambda$ as a submanifold of $\mathbb{R}^n$ of the same dimension as $\Lambda^\perp$ (with perhaps singular loci).
As a rule, when ε is small enough, for a small enough ε-depending neighborhood $W_\Lambda$ of the parts of the resonance subset $M_\Lambda$ located far enough from resonances of higher multiplicity 8 , one can iteratively construct a symplectic diffeomorphism $\Psi_\Lambda$, whose image contains $W_\Lambda \times \mathbb{T}^n$, such that the pull-back $H_\Lambda = H \circ \Psi_\Lambda$ takes the following form
(2.1) $H_\Lambda(I,\theta) = h(I) + N_\Lambda(I,\theta) + R_\Lambda(I,\theta)$.
Here $R_\Lambda$ is a remainder whose $C^2$ norm is (very) small 9 with respect to ε and the resonant part $N_\Lambda$ contains only harmonics belonging to Λ, that is:
(2.2) $N_\Lambda(I,\theta) = \sum_{k \in \Lambda,\ |k| \le K(\varepsilon)} \hat N_k(I)\, e^{i k\cdot\theta}$,
where K(ε) is an ultraviolet cutoff which has to be properly chosen 10 . Both terms $N_\Lambda$ and $R_\Lambda$ of course depend on ε. A subset $W_\Lambda$ for which such a normal form is proved to exist will be called a normal form neighborhood associated with Λ, with multiplicity dim Λ. One proves that the space of actions can be covered by such neighborhoods, and in Section 5.1, we will construct finer covers by subsets of those, named resonant blocks (and denoted by $D_\Lambda$ in the aforementioned section).
8 In fact, only a finite ε-depending subset (related to the cutoff K(ε) introduced below) of these resonances has to be taken into account.
The iterative process to construct the normalizing diffeomorphism involves the control of small denominators which appear during the resolution of the so-called homological equation, and which depend on the location of the normalization domain with respect to the resonances (see for instance [23]). This can be seen as a drawback of the method, which could be greatly simplified by an idea due to Lochak (see below); however, the general method presented here gives precise dynamical information which would not be reachable otherwise.
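The Hamilton equations generated by (2.1) yield the following form for the evolution of the action variables:
(2.3) $I(t) - I(0) = D(t) + R(t)$, with $D(t) := -\int_0^t \partial_\theta N_\Lambda\bigl(I(\tau),\theta(\tau)\bigr)\,d\tau$ and $R(t) := -\int_0^t \partial_\theta R_\Lambda\bigl(I(\tau),\theta(\tau)\bigr)\,d\tau$.
The variation of I is therefore the sum of the main part D(t) and the very small remainder term R(t).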
To simplify the presentation in the following, we will forget about the angles and consider only the action part of the solutions of our system (which is legitimized by the fact that the angles play no role in the various estimates).
The whole theory relies firstly on the obvious fact that the main drift term D(t) in (2.3) belongs to the vector space Vect Λ spanned by Λ (which is often called "plane" of fast drift), and secondly on the smallness of the remainder term R. A solution starting from some initial condition I(0) ∈ W Λ will therefore remain very close to the fast drift space during a very long time -governed by the smallness of R -as long as it is contained inside the neighborhood W Λ . This makes it necessary to understand first the intersections of the fast drift planes I + Vect Λ and the neighborhoods W Λ to which they are attached.
As an extreme example, let us consider the Hamiltonian $h(I) = \tfrac12 (I_1^2 - I_2^2)$ on $\mathbb{A}^2$, with (invertible) frequency map $\omega(I_1, I_2) = (I_1, -I_2)$. We focus on the resonance module $\Lambda = \mathbb{Z}(1,-1)$, so that $\Lambda^\perp = \mathbb{R}(1,1)$ and $\mathrm{Vect}\,\Lambda = M_\Lambda$. Hence, given an initial action $I(0) \in M_\Lambda$, the entire fast drift affine subspace $I(0) + \mathrm{Vect}\,\Lambda$ coincides with $M_\Lambda$, so that nothing prevents the fast drift from taking place during the whole motion provided the perturbation is well-chosen: the resonance $M_\Lambda$ is called a superconductivity channel. No long time stability result can be expected in this case: indeed, when $f(I,\theta) = \sin(\theta_1 - \theta_2)$, the initial condition I = 0, θ = 0 yields the fast evolution $(I_1(t), I_2(t)) = (-\varepsilon t, \varepsilon t)$ for the action variables 11 .
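The fast drift along the superconductivity channel can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the Hamiltonian to be H = (I_1^2 − I_2^2)/2 + ε sin(θ_1 − θ_2) (the integrable part is inferred from the frequency map ω(I_1, I_2) = (I_1, −I_2)), and it uses a hand-written classical Runge-Kutta step; the function names are hypothetical.

```python
import math


def vector_field(state, eps):
    """Hamilton equations for H = (I1^2 - I2^2)/2 + eps*sin(th1 - th2)."""
    I1, I2, th1, th2 = state
    c = math.cos(th1 - th2)
    # dI/dt = -dH/dtheta and dtheta/dt = dH/dI
    return (-eps * c, eps * c, I1, -I2)


def rk4_step(state, dt, eps):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = vector_field(state, eps)
    k2 = vector_field(shifted(state, k1, dt / 2), eps)
    k3 = vector_field(shifted(state, k2, dt / 2), eps)
    k4 = vector_field(shifted(state, k3, dt), eps)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(eps, t_final, steps=1000):
    """Integrate from I = 0, theta = 0 up to time t_final."""
    state = (0.0, 0.0, 0.0, 0.0)
    dt = t_final / steps
    for _ in range(steps):
        state = rk4_step(state, dt, eps)
    return state


if __name__ == "__main__":
    eps, t_final = 0.01, 10.0
    I1, I2, th1, th2 = integrate(eps, t_final)
    print("I1 =", I1, "expected", -eps * t_final)
    print("I2 =", I2, "expected", eps * t_final)
```

Along this orbit I_1 + I_2 is conserved and θ_1 − θ_2 stays equal to 0, so the cosine remains equal to 1 and the actions drift linearly in time, at speed ε.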
In contrast with the previous example, for the Hamiltonian $h(I) = \tfrac12 (I_1^2 + \cdots + I_n^2)$ on $\mathbb{A}^n$, for any $\Lambda \subset \mathbb{Z}^n_K$, the resonant set $M_\Lambda$ coincides with $\Lambda^\perp$, so that the affine planes of fast drift are always orthogonal to $M_\Lambda$. In this case a fast drift, if it happens, makes the orbits move away from the resonance in a very short time.
These extreme examples illustrate the role of the Nekhoroshev condition: steepness is an intermediate quantitative property, which prevents the existence of superconductivity channels by ensuring a certain amount of transversality between the fast drift planes and the corresponding resonances in action. Starting from an action I = I(0) located at some resonance $M_\Lambda$, so that its associated frequency ω(I) is orthogonal to $\Gamma := \mathrm{Vect}\,\Lambda$, the condition (2.4) [...] (where $\pi_\Gamma$ stands for the orthogonal projection on Γ) imposes that a drift of length ξ starting from I and occurring along the fast drift plane I + Γ makes the projection $\pi_\Gamma(\omega)$ change by an amount of $C_m \xi^{\alpha_m}$ along the way. This admits an easy geometric interpretation (see Figure 1). Assume dim Λ = m and consider the vector space Γ spanned by Λ, together with its orthogonal space $\Lambda^\perp$, of dimension n − m. Then one can define a family of tubular neighborhoods of $\Lambda^\perp$ of width δ > 0 by (2.5) [...]. Each such neighborhood gives rise to a neighborhood of the resonance $M_\Lambda$ in action, namely: [...]. Therefore, condition (2.4) just says that any orbit starting from I and drifting to a distance ξ from I along the plane of fast drift Γ must exit the neighborhood $W_\delta(M_\Lambda)$ with $\delta = C_m \xi^{\alpha_m}$.
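As a sanity check of the scaling $\delta = C_m\xi^{\alpha_m}$, one can work out the convex model by hand. The computation below is a sketch under the assumption that the convex example above is $h(I) = \tfrac12|I|_2^2$, so that ω(I) = I and $M_\Lambda = \Lambda^\perp$; it recovers steepness indices equal to 1, in agreement with Remark 1.1.

```latex
\begin{aligned}
 & I \in M_\Lambda = \Lambda^\perp, \qquad u \in \Gamma = \mathrm{Vect}\,\Lambda, \qquad |u|_2 = \xi, \\
 & \pi_\Gamma\bigl(\omega(I+u)\bigr) = \pi_\Gamma(I) + \pi_\Gamma(u) = 0 + u = u, \\
 & \bigl|\pi_\Gamma\bigl(\omega(I+u)\bigr)\bigr|_2 = \xi = C_m\,\xi^{\alpha_m}
   \quad \text{with } C_m = 1,\ \alpha_m = 1 .
\end{aligned}
```

In other words, a drift of length ξ along the fast drift plane moves the projected frequency by exactly ξ, which is the strongest transversality one can hope for.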
Note finally that given disjoint subsets T, T' of tubular neighborhoods of the form (2.5), the associated neighborhoods $\omega^{-1}(T)$ and $\omega^{-1}(T')$ are disjoint too, whatever the geometric assumptions on the frequency map ω.
2. Nekhoroshev's hierarchy. This section is inspired by Nekhoroshev's ideas as presented in the very nice paper [13]. We also refer to [12] for further details and to [22] for a different approach. Nekhoroshev's strategy to prove long-time stability results for perturbations of steep Hamiltonians is based on the previous description of resonant neighborhoods, and relies on the following key observation.
Given ε small enough, there exist T(ε), R(ε) and a covering of the action space O by resonant "blocks" $(B_{m,p})_{0 \le p \le p_m}$, for 0 ≤ m ≤ n − 1, and $m, p, p_m \in \mathbb{N}$, which satisfy the following properties:
(2) each block $B_{m,p}$ is contained in a resonant neighborhood of multiplicity m and admits an enlargement $\widehat{B}_{m,p} \supset B_{m,p}$ contained in the same resonant neighborhood;
(3) any solution starting from an initial condition in $B_{m,p}$ either stays inside $\widehat{B}_{m,p}$ for 0 ≤ t ≤ T(ε) or admits a first exit time $t_1$ such that $I(t_1)$ belongs to a block $B_{m',p'}$ with m' < m;
(4) for any initial condition I(0) inside a block $B_{m,p}$ and for any interval $\mathcal{I}$ such that $I(t) \in \widehat{B}_{m,p}$ for all $t \in \mathcal{I}$, then [...].
We say that m is the multiplicity of the block $B_{m,p}$. Taking the previous observation for granted, the stability of the action variable over a timescale T(ε) is easy to prove by finite induction. Given an initial condition I(0) located in some block $B_{m_0,p_0}$, either $I(t) \in \widehat{B}_{m_0,p_0}$ for 0 ≤ t ≤ T(ε), or there is a $t_1$ such that $I(t) \in \widehat{B}_{m_0,p_0}$ for $0 \le t < t_1$ and $I(t_1)$ belongs to a block $B_{m_1,p_1}$ with $m_1 < m_0$. Consequently, there is a finite sequence $(m_0, p_0), \dots, (m_j, p_j)$ such that $m_0 > m_1 > \cdots > m_j$ (with maybe $m_j = 0$) and a finite sequence of times $t_0 = 0 < t_1 < \cdots < t_p = T(\varepsilon)$ such that for 0 ≤ i < j: [...]. In words, any orbit crosses a finite number of enlarged blocks during the interval [0, T(ε)] and gets trapped inside the last one. To conclude, one just has to use property (4), which proves that the distance between I(0) and I(t) is at most nR(ε) for t ∈ [0, T(ε)].
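The last step of the induction can be spelled out as follows. This is a sketch under two assumptions that are not stated explicitly above: property (4) is taken to bound the displacement inside one enlarged block by R(ε), and the exit times are relabelled so that $t_0 = 0$ and $t_{j+1} := T(\varepsilon)$.

```latex
\begin{aligned}
|I(t) - I(0)|
 &\le \sum_{i=0}^{j} \bigl| I(\min\{t, t_{i+1}\}) - I(\min\{t, t_i\}) \bigr|
 && \text{(telescoping sum, triangle inequality)} \\
 &\le (j+1)\, R(\varepsilon)
 && \text{(property (4) on each visited block)} \\
 &\le n\, R(\varepsilon)
 && \text{(} n-1 \ge m_0 > m_1 > \dots > m_j \ge 0 \text{ forces } j+1 \le n \text{)} .
\end{aligned}
```

The factor n thus only counts the maximal number of blocks an orbit can visit, since the multiplicity strictly decreases at each exit.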
One should be aware that the covering by the blocks is not a partition of O: two distinct blocks may have a nonempty intersection. However, one can choose the blocks visited by the orbits according to a hierarchical order, in such a way that their multiplicity decreases as t increases 12 . We say that a covering of O by blocks satisfying the previous properties is a Nekhoroshev patchwork.

Construction of Nekhoroshev patchworks.
Let us now describe how the blocks are constructed so as to possess their covering and confinement properties 13 .
Given ε > 0, we first fix an ultraviolet cutoff K(ε) and consider only the set $\mathcal{M}_\varepsilon$ of resonance modules which are spanned by vectors of length smaller than K(ε). Given a resonant module $\Lambda \in \mathcal{M}_\varepsilon$ of multiplicity m, we start with the resonant zone of "width" $\delta_\Lambda$: [...] where $\delta_\Lambda$ has to be properly chosen as a function of ε and the various geometric invariants of the module (see section 5). We then define the (ε-dependent) resonant zone $Z_m$ of multiplicity m as
$Z_m := \bigcup_{\Lambda \in \mathcal{M}_\varepsilon,\ \dim\Lambda = m} Z_\Lambda$.
12 This raises the question of the existence of local finite time Lyapunov functions on the phase space, a still unclear issue.
13 A source of inspiration for nowadays governments.
Given $\Lambda \in \mathcal{M}_\varepsilon$, dim Λ = m, the block attached to Λ is obtained by removing from $Z_\Lambda$ its intersection with the complete resonant zone of multiplicity m + 1:
$B_\Lambda := Z_\Lambda \setminus Z_{m+1}$.
The blocks $B_{m,p}$ are the connected components of $Z_m$. With no great loss of generality, one can think of (the closure of) a block as a submanifold with boundary and corners, even if it is not necessary.
The following figure shows the construction of the blocks in the case n = 3 (and in a transverse section). The resonance zone of multiplicity 2 is the disjoint union of the blue blocks, the resonance zone of multiplicity 1 is the union of the strips with red boundaries, while the 0-multiplicity zone is the complement of the 1-multiplicity zone.
In any case, the blocks satisfy two main properties. This comes from a very careful choice of the widths of the various resonance zones (see [13] and Section 5), which in fact ensures a more stringent (and crucial) property: the enlargement of a block contained in some $B_\Lambda$ cannot intersect any other block contained in the zone $B_\Lambda$, nor any other neighborhood $M_{\Lambda'}$ with dim Λ' = dim Λ (see below for details on the construction of the enlargement).
This raises new questions which could be the starting point of a better understanding of the relations between diffusion along invariant subsets and long-time stability theory. Indeed, given a block $B_{m,p}$, a description of the (generic) features of the Hamiltonian vector field $X_{H_\varepsilon}$ at the frontier $\partial B_{m,p}$ has never been done. In particular, nothing is known on the locus where $X_{H_\varepsilon}$ "enters the block" and the locus where $X_{H_\varepsilon}$ "exits the block". These two subsets are crucial for the understanding of the homology of the invariant sets contained in the blocks, following Conley's theory, and could provide one with a new tool for constructing diffusing orbits in the steep setting.
Going back to the construction of Nekhoroshev's patchwork, we have to make precise the process leading to the enlargement of a block and its stability property. Here we will again make a crucial use of the fact that an orbit starting from an initial condition I := I(0) located in $B_{m,p}$ will remain extremely close to the fast drift space $I + \mathrm{Vect}\,\Lambda$ for 0 ≤ t ≤ T(ε), as long as it stays inside the resonant neighborhood $M_\Lambda$ and far enough from the higher multiplicity resonance zones. Hence, to enlarge the block $B_{m,p}$, we just have to add to it the collection of all the parts of the disks centered at points $I \in B_{m,p}$ which are contained in the intersection of the fast drift spaces $I + \mathrm{Vect}\,\Lambda$ with the resonant neighborhood $M_\Lambda$ (the resulting added subset is the green part in the previous two figures).
We have in fact to add a very small neighborhood of this union of disks, in order to prevent the solutions from exiting the extended block under the influence of the remainder part R of the dynamics during the time T(ε), but this would not change our description significantly. Finally, one has to make sure that the extension will not intersect any other block of the same neighborhood $B_\Lambda$ or any other resonance neighborhood, which can be done by a careful tuning of the width of the zone (see Section 5).
This concludes our description of Nekhoroshev's method.

Functional setting
For n ≥ 1, we denote the standard n-dimensional torus by T n = R n /2πZ n and the standard 2n-dimensional annulus by A n = R n × T n .
1. Hölder differentiable functions. Given an integer q ≥ 0 and an open subset D of $\mathbb{R}^n$, we denote by $C^q(D)$ the set of q-times continuously differentiable maps $f : D \to \mathbb{R}$ ($C^0(D)$ being the set of continuous functions on D). We identify $C^q(\mathbb{T}^n)$ with the subset of $C^q(\mathbb{R}^n)$ formed by the functions that are $2\pi\mathbb{Z}^n$-periodic and $C^q(D \times \mathbb{T}^n)$ with the subset of $C^q(D \times \mathbb{R}^n)$ formed by the functions which are $2\pi\mathbb{Z}^n$-periodic with respect to their last n variables. We use the conventional notation for partial derivatives: given $f \in C^q(D)$ and $\alpha \in \mathbb{N}^n$, we set for x ∈ D: [...] is a Banach space with multiplicative norm 14 . It is understood that, for a function defined on a complex domain D, the norm $\|\cdot\|_{C^0(D)}$ is the usual sup-norm.
If ℓ > 0 is a non-integer real number, we write $q := \lfloor \ell \rfloor$ for its integer part and $\mu = \ell - q \in (0,1)$ for its fractional part. Given a non-negative integer q and μ ∈ (0, 1), we denote by $C^{q,\mu}_b(D)$ the space formed by those functions $f \in C^q(D)$ such that (3.2) [...]. It is well-known that $\bigl(C^{q,\mu}_b(D), |\cdot|_{C^{q,\mu}(D)}\bigr)$ is also a Banach space with multiplicative norm. Functions belonging to these spaces are called Hölder-differentiable functions.
Given a non-integer real number ℓ > 0, together with its integer part $q := \lfloor \ell \rfloor$ and its fractional part $\mu = \ell - q \in (0,1)$, we also write $C^\ell_b(D)$ instead of $C^{q,\mu}_b(D)$ and $|\cdot|_{C^\ell(D)}$ instead of $|\cdot|_{C^{q,\mu}(D)}$. Clearly [...]

Domains and their complex extensions.
Let us define the complex n-dimensional torus $\mathbb{T}^n_{\mathbb{C}}$ and the complex 2n-dimensional annulus $\mathbb{A}^n_{\mathbb{C}}$ as
(3.4) $\mathbb{T}^n_{\mathbb{C}} = \mathbb{C}^n / 2\pi\mathbb{Z}^n$ and $\mathbb{A}^n_{\mathbb{C}} = \mathbb{C}^n \times \mathbb{T}^n_{\mathbb{C}}$.
We use angle coordinates θ on $\mathbb{T}^n_{\mathbb{C}}$ (with the usual abuse $\theta \in \mathbb{C}^n$ when there is no ambiguity) and action-angle coordinates (I, θ) on $\mathbb{A}^n_{\mathbb{C}}$. We see $\mathbb{T}^n_{\mathbb{C}}$ as a real n-dimensional vector bundle over $\mathbb{T}^n$. Consequently, we write [...] For integer vectors $k \in \mathbb{Z}^n$, we use the "dual" $\ell^1$-norm, which we write |k| only when there is no risk of confusion.
We need to introduce specific domains in $\mathbb{A}^n_{\mathbb{C}}$. First, given r > 0, for a domain $D \subset \mathbb{R}^n$, we set
(3.6) $D_r := \{ z \in \mathbb{C}^n : \exists z^* \in D : |z - z^*|_2 < r \}$.
As for the torus, given s > 0, we introduce the global complex neighborhood
(3.7) $\mathbb{T}^n_s := \{ \theta \in \mathbb{T}^n_{\mathbb{C}} : |\theta| < s \}$.
We will essentially deal with complex domains of the form
(3.8) $D_{r,s} := D_r \times \mathbb{T}^n_s \subset \mathbb{A}^n_{\mathbb{C}}$.
We finally write $D^{\mathbb{R}}_r$ and $D^{\mathbb{R}}_{r,s}$ for the projections of $D_r$ and $D_{r,s}$ on $\mathbb{R}^n$ and $\mathbb{A}^n$ respectively.
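3. Analytic functions and norms. If g is a bounded holomorphic function defined on $\mathbb{T}^n_s$, $D_r$ or $D_{r,s}$, we denote the corresponding classical sup-norms by
$|g|_s := \sup_{\mathbb{T}^n_s} |g|$, $|g|_r := \sup_{D_r} |g|$, $|g|_{r,s} := \sup_{D_{r,s}} |g|$.
Fix a bounded holomorphic function $g : D_{r,s+2\sigma} \to \mathbb{C}$, where σ > 0, and let $g(I,\theta) = \sum_{k \in \mathbb{Z}^n} \hat g_k(I)\, e^{i k\cdot\theta}$ be its Fourier expansion, where $k\cdot\theta = k_1\theta_1 + \cdots + k_n\theta_n$. We then introduce the weighted Fourier norm [...] which is finite and satisfies
(3.11) $|g|_{r,s} \le \|g\|_{r,s} \le \coth^n\sigma\, |g|_{r,s+\sigma}$.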
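The left inequality in (3.11) can be illustrated numerically on a trigonometric polynomial of one angle. The sketch below assumes that, for a function independent of the actions, the weighted Fourier norm is the sum of |ĝ_k| e^{|k| s} over k and that the strip is {|Im θ| ≤ s}; both conventions, as well as the function names and the sample coefficients, are assumptions made for the illustration only.

```python
import cmath
import math


def fourier_norm(coeffs, s):
    """Weighted Fourier norm sum_k |g_k| exp(|k| s) (assumed convention)."""
    return sum(abs(c) * math.exp(abs(k) * s) for k, c in coeffs.items())


def sampled_sup_norm(coeffs, s, samples=256):
    """Lower estimate of sup |g| on the strip |Im theta| <= s.

    The function is sampled on the three horizontal lines
    Im theta = -s, 0, s, so the result never exceeds the true sup.
    """
    best = 0.0
    for y in (-s, 0.0, s):
        for j in range(samples):
            theta = complex(2 * math.pi * j / samples, y)
            value = sum(c * cmath.exp(1j * k * theta) for k, c in coeffs.items())
            best = max(best, abs(value))
    return best


if __name__ == "__main__":
    # Hypothetical Fourier coefficients of a real trigonometric polynomial.
    coeffs = {0: 1.0, 1: 0.5, -1: 0.5, 3: 0.1j, -3: -0.1j}
    s = 0.5
    print("sampled sup-norm :", sampled_sup_norm(coeffs, s))
    print("Fourier norm     :", fourier_norm(coeffs, s))
```

The comparison rests on the elementary bound |e^{ikθ}| = e^{−k Im θ} ≤ e^{|k| s} on the strip, applied term by term to the Fourier expansion.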
We denote by A r,s the space of holomorphic functions on D r,s with finite Fourier norm. Endowed with this norm, A r,s is a Banach algebra.
Finally, the norm of a vector valued function will be the maximum of the norms of its components.

Analytic smoothing
We state in this section the key ingredient of the present work. We first recall the analytic smoothing method as developed by Jackson-Moser-Zehnder for Hölder functions of $\mathbb{R}^n$: given a Hölder function $f \in C^\ell(\mathbb{R}^n)$ and a positive number s ≤ 1, this yields an analytic function on the complex neighborhood $\mathbb{R}^n_s$ whose restriction to $\mathbb{R}^n$ is close to f in the $C^k$ topology, for 1 ≤ k ≤ ℓ.
We then adapt their method to our specific setting of functions defined on A n (see Section 4.2) and, in addition, we derive the new estimate (4.22) for the weighted Fourier norm of the smoothed function.

4.1. Analytic smoothing in $\mathbb{R}^n$. We recall here the result by Jackson, Moser and Zehnder, following the presentation by [9] and [24].
Proposition 4.1 (Jackson-Moser-Zehnder). Fix an integer n ≥ 1, a real number ℓ > 0 and let $f \in C^\ell_b(\mathbb{R}^n)$. Then there is a constant $C_J = C_J(\ell, n)$ such that for every 0 < s ≤ 1 there exists a function $f_s$, analytic on $\mathbb{R}^n_s$, which satisfies [...] for all multi-integer $\alpha \in \mathbb{N}^n$ such that |α| ≤ ℓ. More precisely, given any even $C^\infty$ function Φ with compact support in $\mathbb{R}^n$ and setting [...] Observe that $f_s$ takes real values when its argument is in $\mathbb{R}^n$.

4.2. Analytic smoothing in $\mathbb{A}^n$. In the following, the Hölder regularity is assumed to satisfy ℓ ≥ n + 1 as in the hypotheses of Theorem 1.1. We now specialize the previous result to our setting and give a more detailed description of the method in the case of functions of $\mathbb{A}^n$. In that case, the analytic smoothing is a truncation of the Fourier series of the initial Hölder function with suitably modified Fourier coefficients (the so-called Jackson polynomials). Our main concern here is to derive an estimate on the weighted Fourier norm of an s-smoothed $C^\ell$ function over a complex strip of width s.
To make the whole presentation more explicit and take the anisotropy of the weighted Fourier norm into account, we first consider functions defined on R n and T n separately. This then yields a statement for functions of A n .
• The non-periodic case. Fix an even function $\Phi : \mathbb{R}^n \to [0,1]$, of class $C^\infty$, with support in the ball $B_2(0,1)$ and let $K : \mathbb{C}^n \to \mathbb{C}$ be its Fourier-Laplace transform: [...] Since Φ is compactly supported, K is an entire function. Moreover its restriction to $\mathbb{R}^n$ is in the Schwartz class $S(\mathbb{R}^n)$ since Φ is, and this is also the case for the translates $y \mapsto K(y - z)$ for $y \in \mathbb{R}^n$ and fixed $z \in \mathbb{C}^n$.
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^\ell$ function with ℓ ≥ n + 1, with compact support contained in the ball $B_\infty(0, R_0)$ for some $R_0 > 0$. Given $s \in\, ]0,1]$, set for $x \in \mathbb{R}^n$: [...] By Fourier reciprocity: [...] Therefore, since Φ is even: [...] Hence $f_s$ is the inverse Fourier-Laplace transform of the "truncation" [...] The first term of (4.5) shows that $f_s$ extends to $\mathbb{C}^n$ and is an entire function. To get our final estimate we go back to the second term in (4.5), which yields [...] By the Schwartz estimate of Lemma A.1, there exists a constant $C_n$ such that
$\bigl|K\bigl(\tfrac{z}{s} - y\bigr)\bigr| \le C_n\, \dfrac{e^{|\mathrm{Im}(z/s - y)|}}{(1 + |z/s - y|_2)^{n+1}}$,
so that, for $y \in \mathbb{R}^n$, $z \in \mathbb{C}^n$ and $|\mathrm{Im}\, z|_2 \le s$: [...] Hence: [...] since z/s is fixed and can be eliminated by a simple translation. We finally get the following estimate: [...]
• The periodic case. Fix now an even function $\Psi : \mathbb{R}^n \to [0,1]$, of class $C^\infty$, with support in the ball $B_1(0,1)$ and define the associated kernel K as in (4.4).
Fix a $2\pi\mathbb{Z}^n$-periodic function $f \in C^\ell(\mathbb{R}^n)$ with $\ell \ge n+1$. Then the Fourier expansion is well-defined and, by Fubini's theorem: Hence, since $K$ is the inverse Fourier transform of $\Psi$, by the Fourier inversion theorem: As in the non-periodic case, this makes apparent that $f_s$ is a continuous truncation of the Fourier expansion of $f$ with a $\Psi$-dependent modification of its Fourier coefficients (the so-called Jackson polynomial): Consequently, the Fourier norm depends only on the harmonics such that $|k|_1 \le 1/s$ and satisfies Hence, by (4.10): (4.13) $\|f_s\|_s \le C_2(\ell)\,|f|_{C^\ell}$ with (4.14) $C_2(\ell) := e\,(1 + C_F(n,\ell))$.

• Functions on $\mathbb{A}^n$. We finally gather together the previous two cases. Let $\Phi \otimes \Psi : \mathbb{R}^n \times \mathbb{R}^n \to [0,1]$ be defined by $\Phi \otimes \Psi(x,\theta) = \Phi(x)\Psi(\theta)$, and define the kernel where $K_\Phi$ and $K_\Psi$ are defined as above.
Fix a function $f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{C}$, $2\pi\mathbb{Z}^n$-periodic with respect to its last $n$ variables, with support in $B_2(0,R_0) \times \mathbb{R}^n$ for some $R_0 > 0$, belonging to $C^\ell(\mathbb{R}^{2n})$ with $\ell \ge n+1$. For $s \in\, ]0,1]$ and $(x,\theta) \in \mathbb{R}^n \times \mathbb{R}^n$, set Note that $f_k$ is $C^\ell$, with support in $B_2(0,R_0)$, so that the previous study on the non-periodic case applies to $f_k$.

By Fubini's theorem
where $(f_k)_s$ stands for the analytic smoothing of the Fourier coefficient $f_k$. This proves that the Fourier coefficient $(f_s)_k(x)$ relative to the periodic variable $\theta$ reads Expressions (4.16) and (4.17) make clear that the whole smoothing procedure of a function depending on both action and angle variables consists in constructing a Jackson trigonometric polynomial by smoothing the Fourier coefficients and by suitably truncating the Fourier series.
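As a purely illustrative numerical sketch of the periodic part of this construction, the Python code below works in one angular dimension: it multiplies the discrete Fourier coefficients of a sampled periodic function by $\Psi(sk)$, so that every harmonic with $|k| \ge 1/s$ is removed. The bump profile `bump`, the function names, the normalization $\Psi(0) = 1$ and the use of the FFT as a stand-in for the exact Fourier coefficients are assumptions of this sketch, not choices made in the paper.

```python
import numpy as np


def bump(x):
    """Even C-infinity cutoff with values in [0, 1], supported in [-1, 1].

    Hypothetical profile standing in for Psi; normalized so that bump(0) = 1.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - x[inside] ** 2))
    return out


def smooth_periodic(samples, s):
    """Return the harmonics k and the modified coefficients Psi(s k) * f_k.

    `samples` are values of a 2*pi-periodic function on a uniform grid;
    the FFT approximates its Fourier coefficients.
    """
    n = samples.size
    coeffs = np.fft.fft(samples) / n
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer harmonics 0, 1, ..., -1
    return k, coeffs * bump(s * k)


def weighted_fourier_norm(k, coeffs, s):
    """Weighted Fourier norm sum_k |c_k| exp(s |k|) on a strip of width s."""
    return float(np.sum(np.abs(coeffs) * np.exp(s * np.abs(k))))


# Example: a periodic function of finite regularity, smoothed with s = 0.1.
theta = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
samples = np.abs(np.sin(theta)) ** 3
k, smoothed = smooth_periodic(samples, 0.1)
smoothed_samples = np.fft.ifft(smoothed * samples.size).real
```

Since the surviving harmonics satisfy $s|k| \le 1$, each exponential weight is at most $e$, which is the origin of the factor $e$ in bounds of the type (4.13).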
Using the definition of $\Psi$, $(f_s)_k = 0$ when $|k|_1 > 1/s$ and, by (4.17) and (4.9): As for the weighted Fourier norm of $f_s$, we finally get:
Moreover, $f_s$ is a trigonometric polynomial in the angular variables.
Proof. Fix a function $\chi \in C^\infty(\mathbb{R}^n)$, with values in $[0,1]$, equal to 1 on the ball $B_\infty(0,R)$ and with support in $B_\infty(0,2R)$. Then the product $\tilde f := \chi f$ is $C^\ell$ on $\mathbb{A}^n$, has compact support in $B_\infty(0,2R) \times \mathbb{T}^n$ and coincides with $f$ on $B_\infty(0,R) \times \mathbb{T}^n$. Moreover where $C_K = C\,|\chi|_{C^\ell(B_\infty(0,R) \times \mathbb{T}^n)}$ and $C$ is a universal constant. By the Jackson-Moser-Zehnder theorem applied to $\tilde f$, there is an analytic function $\tilde f_s$ on $\mathbb{A}^n_s$ satisfying so that for any $p \le \ell$: As a consequence, taking the form of $\chi$ into account, one gets Setting $C_A := C_K C_J$ and using the fact that the analyticity width $\rho$ of the integrable part $h$ is greater than $s$, the bound (4.21) follows. The proof of (4.22) is an immediate consequence of the previous paragraphs if one sets $C_B := C_L \times C_K$.
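The cut-off $\chi$ used at the start of the proof can be built in several ways. The following Python sketch shows one classical construction, assuming a product of one-dimensional smooth transitions based on $e^{-1/t}$. The helper names `smooth_step` and `cutoff` are hypothetical, and the paper does not prescribe a specific $\chi$.

```python
import math


def _h(t):
    """The classical smooth function exp(-1/t) for t > 0, and 0 otherwise."""
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def smooth_step(t):
    """C-infinity transition: 0 for t <= 0, 1 for t >= 1, increasing between."""
    return _h(t) / (_h(t) + _h(1.0 - t))


def cutoff(x, R):
    """Smooth cutoff with values in [0, 1] on R^n (illustrative choice of chi).

    Equals 1 on the sup-norm ball B_inf(0, R) and vanishes as soon as one
    coordinate satisfies |x_i| >= 2R, so its support lies in B_inf(0, 2R).
    """
    value = 1.0
    for xi in x:
        value *= 1.0 - smooth_step((abs(xi) - R) / R)
    return value


# The product chi * f then coincides with f wherever cutoff(...) == 1.
inside = cutoff((0.5, -1.0), R=1.0)
outside = cutoff((2.0, 0.0), R=1.0)
```

A product over coordinates is used here because the balls in the proof are taken with respect to the sup norm.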

4.3.2. An easy way to derive normal forms for Hölder functions from analytic ones.
Let us now explain our strategy for a general Hölder Hamiltonian; we will then restrict ourselves to the case where $h$ is analytic. Let $H_s$ denote the analytic smoothing of $H$, and suppose that an analytic diffeomorphism $\Psi$ brings it to the normal form $H_s \circ \Psi = h_s + g + f_s^*$, where $h_s$ is nothing else than the smoothed initial integrable Hamiltonian, $g$ is a resonant part which controls the fast drift in certain directions and $f_s^*$ is a very small remainder, all these functions being analytic on $D$. The key point in our subsequent constructions is the following very simple equality: (4.28) $H \circ \Psi = h_s + g + f_s^* + (H - H_s) \circ \Psi$. This is a normal form for $H$, obtained by composition of $H$ with an analytic diffeomorphism, in which the first three terms are analytic on $D$ and only the last one is $C^\ell$. So $H \circ \Psi$ has the same structure and dynamical interpretation as $H_s \circ \Psi$, provided that the $C^\ell$ size of the additional remainder $(H - H_s) \circ \Psi$ is of the same order as the size of the initial remainder $f_s^*$. This issue strongly depends on the analytic smoothing method in use; we will show in the sequel that the Jackson-Moser-Zehnder method is relevant for our purposes. Our study will be even easier since we assume from the beginning that the integrable part $h$ is analytic.
It turns out that the same smoothing method, and the same simple way to get a normal form from an analytic one, are also relevant in many other functional classes, the main ones being the Gevrey classes already used in [18], but also other ultradifferentiable ones. This will be developed in a further work.

Estimates of stability
The aim of this section is to prove Theorem 1.1. The proof consists of several steps. Following the discussion in section 2 of the introduction, we first build an appropriate resonant covering of the phase space for the integrable Hamiltonian $h$. Secondly, we study the local dynamics by applying Pöschel's resonant normal form (see Appendix B) in each resonant block, and we set the dependence of the ultraviolet cut-off $K$ and of the analyticity widths $r$, $s$ on the perturbative parameter $\varepsilon$. Finally, we use the properties of the resonant covering to obtain a global stability result through the so-called "capture in resonance" argument.

Construction of the resonant patchwork.
In the sequel, we follow ref. [13], in which the choices of the parameters and the dependence of the small denominators on the ultraviolet cut-off $K$ are justified heuristically. For the sake of clarity, and in order to have coherent notation, we denote by $D_\Lambda$ rather than $B_\Lambda$ the resonant blocks introduced in Section 2. Moreover, when possible, we will not keep track of constants but rather indicate their presence in bounds and equalities by dedicated symbols.
Since $h$ is steep in $B_\infty(0,R)$, the norm of the frequency $\omega(I) := \partial_I h(I)$ at any point of this set admits a uniform positive lower bound, that is, $\inf_{I \in B_\infty(0,R)} \|\omega(I)\| > 0$. Hence, when studying the geography of resonances for $h$, for sufficiently small $\varepsilon$ and without any loss of generality we can just consider maximal lattices $\Lambda \subset \mathbb{Z}^n_K$ of dimension $j \in \{0, \dots, n-1\}$, with $K \ge 1$ the ultraviolet cut-off. For a lattice $\Lambda$ of dimension $j \in \{0, \dots, n-1\}$ we define its associated resonant zone as (5.3) $Z_\Lambda := \{ I \in B_2(I_0, R(\varepsilon)) : \forall k \in \Lambda \text{ one has } |k \cdot \omega(I)| < \delta_\Lambda \}$, with $\delta_\Lambda$ of order $1/(|\Lambda| K^{q_j})$, and its associated resonant block $D_\Lambda$ as Note that $D_\Lambda$ corresponds to that part of the resonant zone $Z_\Lambda$ which does not contain any resonances other than the one associated to $\Lambda$. In particular, this implies that for the completely non-resonant block associated to $\Lambda = \{0\}$ and for any block corresponding to a maximal resonance of dimension $j = n-1$ one has, respectively For any $j \in \{0, \dots, n-1\}$ we set It is easy to see from (5.4) that so that from the definition of $D_0$ in (5.5) one has the decompositions

As we have explained in the introduction (see section 2), a large drift over a short time of any action variable $I \in D_\Lambda$ is only possible along the plane of fast drift $I + \Lambda$ spanned by the vectors belonging to $\Lambda$. Moreover, the fast motion of the orbit starting at $I$ along $I + \Lambda$ can take the actions out of the block $D_\Lambda$. So, we are interested in understanding what happens when the actions leave $D_\Lambda$ but stay in $Z_\Lambda$. Hence, we are naturally led to consider the intersection of a neighborhood of $I + \Lambda$ with $Z_\Lambda$. In this spirit, we fix and, for any $0 < \eta \le \rho(\varepsilon)$ and for any action $I \in D_\Lambda$ with $\Lambda \neq \{0\}$, we define the disc associated to $I$ as where $M$ was defined in (1.6). In the same way, the extended non-resonant block is defined as For a proof of this result we refer to Lemma 2.1 of ref. [13]. We notice that a smaller value of $\varepsilon$, i.e. a higher value of $K$ since the ultraviolet cut-off is always a decreasing function of $\varepsilon$, leads to a smaller maximal distance between any action $I$ belonging to a resonant block and any action belonging to its disc.
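The membership condition defining a resonant zone can be illustrated with a short Python sketch. It assumes that $\mathbb{Z}^n_K$ denotes the nonzero integer vectors with $|k|_1 \le K$ and that the condition in (5.3) is tested only on a finite list of vectors of $\Lambda$; the frequency map `omega`, the threshold `delta` and all function names are hypothetical placeholders rather than the paper's actual choices.

```python
import itertools

import numpy as np


def integer_vectors(n, K):
    """All nonzero k in Z^n with |k|_1 <= K (assumed meaning of Z^n_K)."""
    rng = range(-K, K + 1)
    return [
        np.array(k)
        for k in itertools.product(rng, repeat=n)
        if 0 < sum(abs(c) for c in k) <= K
    ]


def in_resonant_zone(action, lattice_vectors, delta, omega):
    """Check |k . omega(I)| < delta for every listed vector k of the lattice."""
    w = omega(np.asarray(action, dtype=float))
    return all(abs(float(np.dot(k, w))) < delta for k in lattice_vectors)


def resonant_vectors(action, K, delta, omega):
    """List the k with |k|_1 <= K that are delta-resonant at the given action."""
    w = omega(np.asarray(action, dtype=float))
    return [k for k in integer_vectors(len(w), K) if abs(float(np.dot(k, w))) < delta]


# Toy frequency map omega(I) = I, i.e. h(I) = |I|^2 / 2 (illustration only).
omega = lambda I: I
near_resonance = in_resonant_zone((1.0, 1.0005), [np.array([1, -1])], 1e-3, omega)
```

In this toy example the action $(1, 1.0005)$ lies close to the resonance $k = (1, -1)$, so only $\pm k$ are returned by `resonant_vectors` for $K = 2$.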
Since we will perform normal forms in the (extended) resonant blocks, we also need an estimate of the small divisors in these sets, namely we have

Lemma 5.2. For any maximal lattice $\Lambda \subset \mathbb{Z}^n_K$ of dimension $j \in \{0, \dots, n-1\}$, for any $k \in \mathbb{Z}^n_K \setminus \Lambda$ and for any $I \in D^\rho_{\Lambda, r_\Lambda}$ one has whereas for any action $I$ in the completely non-resonant block $D_0$ and for any $k \in \mathbb{Z}^n_K$ one has We refer again to [13, Lemma 2.2] for a proof of this result. Finally, a key ingredient in order to ensure stability in the steep case is the fact that, when possibly exiting a resonant zone along the plane of fast drift, the actions must enter another resonant zone associated to a lattice of lower dimension. This is the content of

Lemma 5.3. Let $\Lambda$, $\Lambda'$ be two maximal lattices of $\mathbb{Z}^n_K$ having the same dimension $j \in \{1, \dots, n-1\}$. Then one has Once again, the proof of this lemma can be found in [13] (Lemma 2.3).
With the ingredients of this paragraph, we are able to prove stability.

5.3. Proof of Theorem 1.1.
We start by giving the standard stability estimates in the completely non-resonant extended block $D^\rho_0$. Note that the following bounds do not require any geometric assumption on the integrable part $h$. where $g_0 := P_\Lambda P_K f_s$ and $P_\Lambda$, $P_K$ are the projectors defined in Lemma B.1.
• Setting of the initial parameters. Let us set the dependence on $\varepsilon$ of the ultraviolet cut-off $K$ and of the analyticity widths $r$, $s$ as in (5.23), where $\varepsilon_0$ is a free parameter and $\varepsilon \le \varepsilon_0$ since $K \ge 1$.
Remark 5.1. The freedom in the definitions above is subordinated to the fact that, in order for the construction to be meaningful, the remainder produced by the normal form must be less than or equal to the size of the additional term $(H - H_s) \circ \Psi_0$, a byproduct of the analytic smoothing. As we are working in finite regularity, the latter is expected to be polynomial in $\varepsilon$. The remainder of the normal form being of order $e^{-Ks}$, one must have $Ks \sim |\log \varepsilon|^c$ for some $c > 0$. Since $s$ tunes the size of the remainder yielded by the analytic smoothing, it has to be polynomial. Hence one is left with two possibilities: either the choice we made in (5.23), or to set $K \sim \varepsilon^{-a} |\log \varepsilon|^c$ and $s \sim \varepsilon^a$. However, this second choice would worsen the exponents of stability, since the thresholds of applicability in the normal form lemma strongly depend on $K$. Of course, to deal with other regularity classes, such as the Gevrey one, other choices must be made.
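The constraint on $Ks$ discussed in Remark 5.1 can be made explicit by a one-line computation. The following is only a sketch under an assumption not stated above, namely that $Ks = \kappa\,|\log \varepsilon|$ for some constant $\kappa > 0$ (the case $c = 1$) and $0 < \varepsilon < 1$; the symbol $\kappa$ is introduced here for illustration.

```latex
\[
  e^{-Ks}
  \;=\; e^{-\kappa\,|\log \varepsilon|}
  \;=\; e^{\kappa \log \varepsilon}
  \;=\; \varepsilon^{\kappa},
  \qquad 0 < \varepsilon < 1 .
\]
```

Under this assumption the normal form remainder is polynomially small in $\varepsilon$, with an exponent tuned by $\kappa$, and can therefore be compared directly with the polynomial size of the term produced by the analytic smoothing.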
By plugging the choices (5.23) into the three thresholds in (5.20), it is easy to see that there exists an appropriate choice of $\varepsilon_0$ that makes the three conditions hold simultaneously. Hence, for the Hölder Hamiltonian Note that since we are in a completely non-resonant block, the resonant term $g$ does not appear in the normal form. Now, the normal form in Lemma B.1 ensures that there exists a constant $\xi > 1$ such that any initial condition $(I(0), \theta(0)) \in D^\rho_0 \times \mathbb{T}^n$ is mapped by $\Psi_0$ into Finally, by writing in the usual way $|\partial_{\vartheta_i} \Psi_0| = |\partial_{\vartheta_i}(\Psi_0 - \mathrm{id} + \mathrm{id})|$, the Cauchy estimates together with the bounds in (B.5) imply (since $r \le s$) It is easy to see from estimates (5.26), (5.27) and (5.28) that the remainder from the analytic smoothing dominates the one coming from the normal form, namely

As for the dynamics in the resonant blocks, we have the following

Lemma 5.5. Consider a maximal lattice $\Lambda \subset \mathbb{Z}^n_K$ of dimension $j \in \{1, \dots, n-1\}$. There exists $T_j > 0$ such that for any sufficiently small $\varepsilon$ and for any initial condition $(I(0), \theta(0)) \in D_\Lambda \cap B(I_0, R(\varepsilon) - (j+1)\rho(\varepsilon)) \times \mathbb{T}^n$, if one sets (5.30) $T_\Lambda := T_j \times \dfrac{r_\Lambda}{|\ln \varepsilon^{6(1+a\ell)}|^{\ell-1}\, \varepsilon^{1+a(\ell-1)}}$, $a := \dfrac{1}{2np_1}$, and considers the time of escape of the flow generated by $H$ from the extended resonant block (5.31) $\tau_e := \inf \{ t \in \mathbb{R} : \Phi^t_H ( D_\Lambda \cap B(I_0, R(\varepsilon) - (j+1)\rho(\varepsilon)) \times \mathbb{T}^n ) \not\subset D^\rho_{\Lambda, r_\Lambda} \times \mathbb{T}^n \}$, the following dichotomy applies: (1) If $|\tau_e| \ge T_\Lambda$ one has

Remark 5.2. The decompositions in (5.8) are a covering of $B(I_0, R(\varepsilon))$ but they are not a partition since, in general, $D_i \cap D_j \neq \emptyset$ for $j > i+1$. Hence, nothing prevents $I(\tau_e)$ from belonging to a resonant block of strictly higher multiplicity than the starting one. If this happens, however, thanks to the construction in (5.8), one is assured that $I(\tau_e)$ will also belong to another block associated to a lower order resonance.
One therefore chooses the block in which to study the evolution of the actions once they leave the resonant zone they started at. This is at the core of the resonant trap argument, which is discussed in the sequel.
(2) or the actions enter a resonant block $D_i \cap B(I_0, R(\varepsilon) - j\rho(\varepsilon))$ corresponding to a resonant lattice of dimension $i < j$ after having travelled a distance $\rho(\varepsilon)$ over a time shorter than the time of escape. In this block, the above arguments can be repeated so that, after having possibly visited at most $n-1$ blocks, overall the actions can travel at most a distance $(n-1)\rho(\varepsilon)$ before entering the completely non-resonant block, in which they are trapped for a time $T_0$ given by Lemma 5.4 and travel for another length $\rho(\varepsilon)$. Thanks to (5.9), by construction one has $|I(t) - I(0)| \le n\rho(\varepsilon) = \frac{1}{2} R(\varepsilon)$, which is of order $\varepsilon^{\mathbf{b}}$. This is the so-called resonant trap argument and concludes the proof of Theorem 1.1, once one sets $\mathbf{a} = a(\ell - 1)$.
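For concreteness, the stability exponents quoted in the abstract, $(\ell-1)/(2n\alpha_1 \cdots \alpha_{n-2}) + 1/2$ for the time and $1/(2n\alpha_1 \cdots \alpha_{n-1})$ for the radius, can be evaluated with the small Python sketch below. The function name `stability_exponents` and the input checks are assumptions of this illustration; only the two formulas come from the text.

```python
from fractions import Fraction
from math import prod


def stability_exponents(n, ell, alphas):
    """Return (time exponent, radius exponent) as exact fractions.

    n      : number of degrees of freedom (n >= 2 assumed here)
    ell    : Hoelder regularity; this section assumes ell >= n + 1
             (the abstract states ell > n + 1)
    alphas : steepness indices alpha_1, ..., alpha_{n-1}
    """
    if n < 2 or len(alphas) != n - 1:
        raise ValueError("expected n >= 2 and exactly n - 1 steepness indices")
    if ell < n + 1:
        raise ValueError("the regularity ell must be at least n + 1")
    idx = [Fraction(a) for a in alphas]
    # Time exponent: uses only the first n - 2 indices.
    time_exp = Fraction(ell - 1) / (2 * n * prod(idx[: n - 2])) + Fraction(1, 2)
    # Radius exponent: uses all n - 1 indices.
    radius_exp = Fraction(1) / (2 * n * prod(idx))
    return time_exp, radius_exp


# Example with hypothetical values n = 3, ell = 5, alpha = (1, 2).
time_exp, radius_exp = stability_exponents(3, 5, [1, 2])
```

Exact fractions are used so that the dependence on $n$, $\ell$ and the steepness indices can be read off without rounding.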
Furthermore, Ψ is close to the identity, in the sense that, for any (I, θ) ∈ D Λ, /2,σ/6 , one has (B.5) where Π I , Π θ denote the projection on the action and angle variables, respectively.
Declarations. Data sharing not applicable to this article as no datasets were generated or analyzed during the current study. Conflicts of interest: The authors have no conflicts of interest to declare.