1 Introduction and statement of the result

We reconsider the classical problem of iteration of analytic functions, previously investigated by Schröder [14] and Siegel [15], extending it to higher dimension. Our aim is to improve the existing results on the convergence radius of the analytic near-the-identity transformation that conjugates the map to its linear part, thus producing general and possibly optimal estimates.

We consider an analytic map of the complex space \({\mathbb {C}}^n\) into itself that leaves the origin fixed, so that it may be written as

$$\begin{aligned} x' = \varLambda x + v_1(x) + v_2(x) + \ldots \;,\quad x\in {\mathbb {C}}^n \end{aligned}$$
(1)

where \(\varLambda \) is an \(n\times n\) complex matrix and \(v_s(x)\) is a homogeneous polynomial of degree \(s+1\). The series is assumed to be convergent in a neighborhood of the origin of \({\mathbb {C}}^n\). We also assume that \(\varLambda =\hbox {diag}(\lambda _1,\ldots ,\lambda _n)\) is a diagonal complex matrix with non-resonant eigenvalues, in a sense to be clarified shortly.

The problem raised by Schröder is the following: to find an analytic near-the-identity coordinate transformation written as a convergent power series

$$\begin{aligned} y_j = x_j + \varphi _{1,j}(x) + \varphi _{2,j}(x) + \cdots \ ,\quad j=1,\ldots ,n, \end{aligned}$$
(2)

with \(\varphi _{s,j}\) homogeneous of degree \(s+1\), which conjugates the map (1) to its linear part, namely such that in the new coordinates \(y\) the map takes the form

$$\begin{aligned} y' = \varLambda y. \end{aligned}$$
(3)

Let us write the eigenvalues of the complex matrix \(\varLambda \) in exponential form, namely as \(\lambda _j=e^{\mu _j+i\omega _j}\), with real \(\mu _j\) and \(\omega _j\). We define the sequence \(\{\beta _r\}_{r\ge 0}\) of positive real numbers as

$$\begin{aligned} \beta _0=1\ ,\quad \beta _r=\min _{|k|=r+1,\,1\le j\le n} \bigl |e^{\langle k,\mu +i\omega \rangle -\mu _j-i\omega _j}-1\bigr |\>,\ r\ge 1 \end{aligned}$$
(4)

The eigenvalues are said to be non-resonant if \(\beta _r\ne 0\) for all \(r\ge 1\). This is enough in order to answer the question raised by Schröder in the affirmative in a formal sense, i.e., disregarding the question of convergence.

We briefly recall some known results that motivate the present work. For a detailed discussion of the problem, we refer to Arnold’s book [1]. The eigenvalues are said to belong to the Poincaré domain in case all the \(\mu _j\)’s have the same sign (i.e., the eigenvalues \(\lambda _j\) lie all inside or all outside the unit circle in the complex plane). The complement of the Poincaré domain is named the Siegel domain. In the latter case, the problem raised by Schröder represents a paradigmatic case of perturbation series involving small divisors, since the sequence \(\{\beta _r\}_{r\ge 0}\) above has zero as its inferior limit.

The formal solution of the problem was established by Schröder [14] for the case \(n=1\), where the Siegel domain is the unit circle, i.e., \(\lambda =e^{i\omega }\) with real \(\omega \). The problem of convergence in this case was solved by Siegel [15] under a strong non-resonance condition of diophantine type. Siegel’s proof is worked out using the classical method of majorants introduced by Cauchy and makes use of a delicate number-theoretical lemma, often called Siegel’s lemma, which puts particular emphasis on the relevance of Diophantine approximations. This was actually the first convergence proof for a meaningful problem involving small divisors.

A condition of diophantine type was also used by Kolmogorov in his proof of the persistence of invariant tori under small perturbations of integrable Hamiltonian systems and has become a standard one in KAM theory. It should be noted that Kolmogorov also introduced an iteration scheme exhibiting fast convergence, which he described as “similar to Newton’s method” and which is often referred to as the “quadratic method.” The fast convergence scheme applies also to the case of maps discussed here: see, e.g., [1], section 28.

In the last decades, two problems have been raised (among many others). The first is concerned with determining the optimal non-resonance condition, the second with the size of the convergence radius of the transformation to normal form. In the case \(n=1\), both questions have been answered by exploiting the geometric renormalization approach introduced by Yoccoz (see [2, 16, 17]). The optimal set of rotation numbers for which an analytic transformation to linear normal form exists has been identified with the set of Bruno numbers. These numbers obey a non-resonance condition weaker than the diophantine one introduced by Siegel.

Bruno’s condition in case \(n> 1\) may be stated as follows. Let us introduce the sequence \(\{\alpha _r\}_{r\ge 0}\) defined as

$$\begin{aligned} \alpha _r = \min _{0\le s\le r} \beta _s\ ,\quad r\ge 0. \end{aligned}$$
(5)

The latter sequence is monotonically non-increasing and, if the eigenvalues belong to the Siegel domain, has zero as its lower limit. The non-resonance condition is

$$\begin{aligned} \mathcal{B}(\lambda ) = -\sum _{k\ge 0} \frac{\ln \alpha _{2^k}}{2^k} < \infty . \end{aligned}$$
The series expansion of the transformation that solves Schröder’s problem is proved to be convergent in a disk \(\varDelta _{\rho }\) centered at the origin with radius \(\rho (\lambda )\) satisfying

$$\begin{aligned} \rho (\lambda ) \ge C'\, e^{-C\,\mathcal{B}(\lambda )} \end{aligned}$$
(6)

where \(C'\) is a constant independent of \(\lambda \). For \(n=1\), the optimal value has been proved to be \(C=1\). However, the geometric renormalization methods cannot be extended to the case \(n> 1\), for which only the value \(C=2\) has been obtained. A proof that gives \(C=1\) in the case \(n=1\) using the majorant method has been given in [3], but it has not been extended to the case \(n> 1\).

In this paper, we prove that the same bound with \(C=1\) holds true in any finite dimension. We obtain this result by representing the map with the method of Lie series and Lie transforms and by producing convergence estimates in the spirit of Cauchy’s majorant method. The formal background on the representation of maps may be found in [11].

This paper extends to the case of maps a previous, similar result for the Poincaré–Siegel center problem, where the linearization of an analytic system of differential equations in the neighborhood of a fixed point is considered. We stress that the interest of the method is not limited to the cases mentioned here; e.g., for applications to KAM theory see [4, 5] and [12]; a proof of Lyapounov’s theorem on the existence of periodic orbits in the neighborhood of an equilibrium is given in [10]; extensions of Lyapounov’s theorem may be found in [6] and [8].

We come now to a formal statement of our result. We assume the following

Condition \(\varvec{\tau }\): The sequence \(\alpha _r\) above satisfies

$$\begin{aligned} -\sum _{r\ge 1} \frac{\ln \alpha _r}{r(r+1)} = \Gamma < \infty . \end{aligned}$$
(7)

Theorem 1

Consider the map (1) and assume that the eigenvalues of \(\varLambda \) are non-resonant and satisfy condition \(\varvec{\tau }\). Then there exists a near-the-identity coordinate transformation \(y=x+\psi (x)\), with \(\psi \) analytic at least in the polydisk of radius \(B^{-1}e^{-\Gamma }\), where \(B>0\) is a universal constant, which transforms the map into the normal form \(y'=\varLambda y\).

Our condition \(\varvec{\tau }\) is equivalent to Bruno’s one. We have indeed

$$\begin{aligned} \frac{1}{2}\,\mathcal{B}(\lambda ) \le \Gamma \le \mathcal{B}(\lambda ), \end{aligned}$$

as one checks by splitting the sum in (7) over the dyadic blocks \(2^k\le r< 2^{k+1}\) and using the monotonicity of \(\alpha _r\). However, our formulation of condition \(\varvec{\tau }\) comes out naturally from our analysis of the accumulation of small divisors and turns out to be the key that allows us to find the estimate with \(C=1\). For a brief discussion of the relations between the two conditions, see [9].
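By way of illustration (the following numerical sketch is ours and plays no role in the proofs), the sequences \(\beta _r\) and \(\alpha _r\) of (4) and (5) and a truncation of the sum in (7) are easily evaluated. We take the one-dimensional example \(\lambda =e^{2\pi i\omega }\) with \(\omega \) the golden mean, for which \(\beta _r=2\,|\sin (\pi r\omega )|\).

```python
# Illustration (ours): beta_r, alpha_r and a truncated Gamma for the 1-D map
# x' = lambda*x + ..., with lambda = e^{2*pi*i*omega} and omega the golden mean.
import math

omega = (math.sqrt(5) - 1) / 2   # golden mean, a diophantine (hence Bruno) number
R = 10_000                       # truncation order of the sum (7)

betas = [1.0] + [2 * abs(math.sin(math.pi * r * omega)) for r in range(1, R + 1)]
alphas, cur = [], 1.0
for b in betas:                  # alpha_r = min_{0 <= s <= r} beta_s, as in (5)
    cur = min(cur, b)
    alphas.append(cur)

gamma = -sum(math.log(alphas[r]) / (r * (r + 1)) for r in range(1, R + 1))
print(f"alpha_{R} = {alphas[R]:.3e},  truncated Gamma = {gamma:.4f}")
```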

The paper is organized as follows. In Sect. 2, we include the essential information on the formal algorithm, referring to [11] for details. In Sect. 3, we give the technical estimates that lead to the proof of convergence of the formal algorithm. In particular, we include a complete discussion of the mechanism of accumulation of small divisors, which allows us to keep it under accurate control. We remark in particular that we do not need the Siegel lemma, because the latter controls combinations of small divisors that do not occur in our scheme. The estimates of Sect. 3 are used in Sect. 4 in order to complete the proof of the main theorem. A technical appendix follows.

2 Formal algorithm

We represent a map of type (1) using Lie transforms. A detailed exposition of the representation algorithm and of the method of reduction to normal form is given in [11], where the problem is treated at a formal level. We refer to that paper for details concerning the formal setting, while here we pay particular attention to the problem of convergence. We include only the definitions that we will need later, which also fixes the notation. Some relevant lemmas are reported in Appendix A.

We start with the definition of Lie series and Lie transform. Let \(X_s(x)\) be a vector field on \({\mathbb {C}}^n\) whose components are homogeneous polynomials of degree \(s+1\). We will say that \(X_s(x)\) is of order \(s\), as indicated by the label. Moreover, in the following, we will denote by \(X_{s,j}\) the \(j\)th component of the vector field \(X_s\). The Lie series operator is defined as

$$\begin{aligned} \exp (L_{X_s}) = \sum _{j\ge 0} \frac{1}{j!} L_{X_s}^j \end{aligned}$$
(8)

where \(L_{X_s}\) is the Lie derivative with respect to the vector field \(X_s\).

Let now \(X=\{X_j\}_{j\ge 1}\) be a sequence of polynomial vector fields, with \(X_j\) homogeneous of degree \(j+1\). The Lie transform operator \(T_X\) is defined as

$$\begin{aligned} T_{X} = \sum _{s\ge 0} E^{X}_s, \end{aligned}$$
(9)

where the sequence \(E^{X}_s\) of linear operators is recursively defined as

$$\begin{aligned} E^{X}_0 = \mathsf{1}\ ,\quad E^{X}_s = \sum _{j=1}^{s} \frac{j}{s}L_{X_j} E^{X}_{s-j}. \end{aligned}$$
(10)

The superscript in \(E^{X}\) is introduced in order to specify which sequence of vector fields is intended. By letting the sequence have only one non-vanishing vector field, e.g., \(X=\{0,\ldots ,0,X_k,0,\ldots \}\), one easily sees that \(T_X=\exp \bigl (L_{X_k}\bigr )\).
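The recursion (10) is straightforward to implement for polynomial vector fields. The following sketch (ours, meant only as an illustration; the function names are not from [11]) computes \(E^{X}_s V\) with sympy, using for \(L_X\) the vector-field Lie derivative recalled in the proof of Lemma 4 below.

```python
# Sketch (ours): the operators E_s^X of (10) on polynomial vector fields over C^n,
# with (L_X V)_j = sum_l (X_l dV_j/dx_l - V_l dX_j/dx_l).
import sympy as sp

n = 2
x = sp.symbols(f'x0:{n}')

def lie_derivative(X, V):
    return [sp.expand(sum(X[l] * sp.diff(V[j], x[l]) - V[l] * sp.diff(X[j], x[l])
                          for l in range(n))) for j in range(n)]

def E(X, s, V):
    """E_0 = identity, E_s = sum_{j=1}^{s} (j/s) L_{X_j} E_{s-j}, as in (10).
    X is a dict {j: vector field of order j}; V is a list of components."""
    if s == 0:
        return list(V)
    out = [sp.Integer(0)] * n
    for j in range(1, s + 1):
        if j in X:
            term = lie_derivative(X[j], E(X, s - j, V))
            out = [out[c] + sp.Rational(j, s) * term[c] for c in range(n)]
    return [sp.expand(c) for c in out]

# With a single non-zero field X_1, T_X = exp(L_{X_1}) and E_s = L_{X_1}^s / s!.
X = {1: [x[0]**2, x[0] * x[1]]}
V = [x[1]**2, sp.Integer(0)]
print(E(X, 2, V))   # equals (1/2) L_{X_1}^2 V
```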

2.1 Representation and conjugation of maps

We recall the representation of maps introduced in [11] together with some formal results that we are going to use here. Let \(\varLambda =e^{\mathsf{A}}\), i.e., in our case, \(\mathsf{A}=\hbox {diag}(\mu _1+i\omega _1,\ldots ,\mu _n+i\omega _n)\) with \(\lambda _j=e^{\mu _j+i\omega _j}\). Remark that we may express the linear part of the map as a Lie series by introducing the exponential operator \(\mathsf{R}=\exp \bigl (L_{\mathsf{A}x}\bigr )\). The action of the operator \(\mathsf{R}\) on a function \(f\) or on a vector field \(V\) is easily calculated as

$$\begin{aligned} \bigl (\mathsf{R}f\bigr )(x) = f(\varLambda x)\;,\quad \bigl (\mathsf{R}V\bigr )(x) = \varLambda ^{-1} V(\varLambda x). \end{aligned}$$
(11)
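As a quick sanity check of (11) (ours): on a monomial \(x^k\), the operator \(L_{\mathsf{A}x}\) acts with eigenvalue \(\langle k,\mu +i\omega \rangle \), so the Lie series \(\exp (L_{\mathsf{A}x})\) multiplies \(x^k\) by \(e^{\langle k,\mu +i\omega \rangle }\), which is precisely the substitution \(x\mapsto \varLambda x\).

```python
# Sketch (ours): L_{Ax} has eigenvalue <k, a> on the monomial x^k, so that
# exp(L_{Ax}) x^k = e^{<k,a>} x^k = (Lambda x)^k with Lambda = diag(e^{a_j}).
import sympy as sp

n = 2
x = sp.symbols(f'x0:{n}')
a = sp.symbols(f'a0:{n}')
k = (2, 1)

f = x[0]**k[0] * x[1]**k[1]
Lf = sum(a[j] * x[j] * sp.diff(f, x[j]) for j in range(n))
assert sp.simplify(Lf - sum(kj * aj for kj, aj in zip(k, a)) * f) == 0
```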

The first result is concerned with the representation of the map (1) using a Lie transform.

Lemma 1

There exist generating sequences of vector fields \(V= \bigl \{V_s(x)\bigr \}_{s\ge 1}\) and \(W =\bigl \{W_s(x)\bigr \}_{s\ge 1}\) with \(W_s =\mathsf{R}V_s\) such that the map (1) is represented as

$$\begin{aligned} x' = \mathsf{R}\circ T_{V} x\quad \mathrm{and}\quad x' = T_{W}\circ \mathsf{R}\, x \end{aligned}$$
(12)

The second result is concerned with the composition of Lie transforms.

Lemma 2

Let \(X,\,Y\) be generating sequences. Then, one has \(T_X\circ T_Y = T_Z\) where \(Z\) is the generating sequence recursively defined as

$$\begin{aligned} Z_1 = X_1 + Y_1\; ,\quad Z_s = X_s + Y_s + \sum _{j=1}^{s-1} \frac{j}{s} E^{X}_{s-j} Y_j\ . \end{aligned}$$
(13)

The latter formula is reminiscent of the well-known Baker–Campbell–Hausdorff composition of exponentials. The difference is that the result is expressed as a Lie transform instead of an exponential, which makes the formula more effective for our purposes.

The third result gives the algorithm that we will use in order to conjugate the map (1) to its linear part. We formulate it in a more general form, looking for a conjugation between two maps. Let the two maps

$$\begin{aligned} x'=T_{W}\circ \mathsf{R}\, x\ ,\quad y'=T_{Z}\circ \mathsf{R}\, y \end{aligned}$$
(14)

be given, where \(W=\bigl \{W_s\bigr \}_{s\ge 1}\), \(Z=\bigl \{Z_s\bigr \}_{s\ge 1}\) are generating sequences. We say that the maps are conjugated up to order \(r\) in case there exists a finite generating sequence \(X=\bigl \{X_1,\ldots ,X_r\bigr \}\) such that the transformation \(y=S^{(r)}_{X}x\) makes the difference between the maps of order higher than \(r\), i.e.,

$$\begin{aligned} S^{(r)}_{X} x'\Big |_{x' = T_{W}\circ \mathsf{R}\, x} - T_{Z}\circ \mathsf{R}\, y\Big |_{y = S^{(r)}_{X} x} = \mathcal{O}(r+1), \end{aligned}$$

where \(S^{(r)}_{X} =\exp \bigl (L_{X_r}\bigr )\circ \ldots \circ \exp \bigl (L_{X_1}\bigr )\). Suppose that we have \(W_1=Z_1,\ldots ,W_{r}=Z_{r}\). Then, the maps are conjugated up to order \(r\), since one has \(T_{W} x - T_{Z} x = \mathcal{O}(r+1)\).

Lemma 3

Let the generating sequences of the maps (14) coincide up to order \(r-1\) and let \(X_r\) be a vector field of order \(r\) generating the near-the-identity transformation \(y=\exp \bigl (L_{X_r}\bigr )\,x\). Then, the maps are conjugated up to order \(r\) if

$$\begin{aligned} T_{Z} = \exp \bigl (L_{X_r}\bigr ) \circ T_{W} \circ \exp \bigl (L_{-\mathsf{R}X_r}\bigr ). \end{aligned}$$
(15)

The vector field \(X_r\) must satisfy the equation

$$\begin{aligned} \mathsf{D}X_r = W_r - Z_r\ ,\quad \mathsf{D}= \mathsf{R}-\mathsf{1}. \end{aligned}$$
(16)

2.2 Construction of the normal form

Following Schröder and Siegel, we want to conjugate the map (1) to its linear part; that is, writing the map as

$$\begin{aligned} x' =T_{W^{(0)}}\circ \mathsf{R}x \end{aligned}$$
(17)

with a known sequence \(W^{(0)}\) of vector fields, we want to reduce it to the linear normal form

$$\begin{aligned} x' = \mathsf{R}x. \end{aligned}$$
(18)

To this end, we look for a generating sequence \(\{X_r\}_{r\ge 1}\) of vector fields and a corresponding sequence \(\{W^{(r)}\}_{r\ge 1}\) satisfying \(W^{(r)}_1=\ldots =W^{(r)}_r=0\). We emphasize that the map \(x' =T_{W^{(r)}}\circ \mathsf{R}x\) is conjugated to (18) up to order \(r\). We say that \(W^{(r)}\) is in normal form up to order \(r\).

According to (16), we should solve for \(X_r\) the equation

$$\begin{aligned} \mathsf{D}X_r = W^{(r-1)}_r\ ,\quad \mathsf{D}= \mathsf{R}- \mathsf{1}. \end{aligned}$$
(19)

The operator \(\mathsf{D}\) is diagonal on the basis of monomials \(x^k\mathbf{e}_j = x_1^{k_1}\cdot \ldots \cdot x_n^{k_n}\mathbf{e}_j\), where \((\mathbf{e}_1,\ldots ,\mathbf{e}_n)\) is the canonical basis of \({\mathbb {C}}^n\). Indeed, we have

$$\begin{aligned} \mathsf{D}\, x^k\mathbf{e}_j = \bigl (e^{\langle k,\mu +i\omega \rangle - \mu _j-i\omega _j}-1\bigr )\, x^k\mathbf{e}_j. \end{aligned}$$
(20)

Thus, provided the eigenvalues of \(\mathsf{D}\) do not vanish, the vector field \(X_r\) is determined as

$$\begin{aligned} X_r = \sum _{j=1}^{n} \mathbf{e}_j \sum _{k} \frac{w_{j,k}}{e^{\langle k,\mu +i\omega \rangle - \mu _j-i\omega _j}-1} x^k, \end{aligned}$$

where \(w_{j,k}\) are the coefficients of \(W^{(r-1)}_r\).
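At the level of coefficients, solving equation (19) is a componentwise division by the eigenvalues (20), as in the following sketch (ours; the data layout and the function name are hypothetical).

```python
# Sketch (ours): solve the homological equation D X_r = W_r coefficientwise,
# dividing each coefficient by the eigenvalue (20) of D on the monomial x^k e_j.
import cmath

def solve_homological(W, mu, omega):
    """W maps (j, k) -> w_{j,k}, with j a component index and k a multi-index
    tuple with |k| = r + 1; returns the coefficients of X_r."""
    X = {}
    for (j, k), w in W.items():
        e = sum(ki * (m + 1j * o) for ki, m, o in zip(k, mu, omega)) \
            - (mu[j] + 1j * omega[j])
        divisor = cmath.exp(e) - 1.0   # non-zero by the non-resonance assumption
        X[(j, k)] = w / divisor
    return X

# Example with n = 2, r = 1: W_1 has components of degree 2.
mu, omega = (0.0, 0.0), (2.618, 1.618)   # lambda_j = e^{i omega_j}, Siegel domain
W = {(0, (2, 0)): 1.0, (1, (1, 1)): 0.5}
print(solve_homological(W, mu, omega))
```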

Next, we need an explicit form for the transformed sequence \(W^{(r)}\). We use the conjugation formula (15) replacing \(W\) and \(Z\) with \(W^{(r-1)}\) and \(W^{(r)}\), respectively. It is convenient to introduce an auxiliary vector field \(V^{(r)}\) and to split the formula as

$$\begin{aligned} T_{V^{(r)}}&= T_{W^{(r-1)}} \circ \exp \bigl (L_{-\mathsf{R}X_r}\bigr ),\nonumber \\ T_{W^{(r)}}&= \exp \bigl (L_{X_r}\bigr ) \circ T_{V^{(r)}}. \end{aligned}$$
(21)

A more explicit form comes from Lemma 2, recalling that \(T_{X}=\exp (L_{X_r})\) if one considers the generating sequence \(X=\{0,\ldots ,0,X_r,0,\ldots \}\). The auxiliary vector field \(V^{(r)}\) is determined as

$$\begin{aligned} V^{(r)}_r&= W^{(r-1)}_r - \mathsf{R}X_r,\nonumber \\ V^{(r)}_s&= W^{(r-1)}_s -\frac{r}{s} E^{(r-1)}_{s-r} \mathsf{R}X_r \qquad \hbox {for}\; s> r, \end{aligned}$$
(22)

where we use the simplified notation \(E^{(r-1)}\) in place of \(E^{(W^{(r-1)})}\). Having so determined the sequence \(V^{(r)}\), we calculate the transformed sequence \(W^{(r)}\) as

$$\begin{aligned} W^{(r)}_r&= V^{(r)}_r + X_r\nonumber \\ W^{(r)}_{s}&= V^{(r)}_{s} + \frac{1}{s} \sum _{k=1}^{\lfloor s/r\rfloor -1} {\frac{s-kr}{k!}} L_{X_r}^{k} V^{(r)}_{s-kr},\qquad s> r. \end{aligned}$$
(23)

Here, a remark is in order. The formulæ above define the sequences \(V^{(r)}_r,V^{(r)}_{r+1},\ldots \) and \(W^{(r)}_{r+1},W^{(r)}_{r+2},\ldots \), starting with terms of order \(r\) and \(r+1\), respectively. This is fully natural, because all terms of lower order vanish due to \(W^{(r)}\) being in normal form up to order \(r\). Moreover, we emphasize that in view of (16), we determine \(X_r\) by solving the equation \(\mathsf{D}X_r=W^{(r-1)}_r\), since we want \(W^{(r)}_r=0\).
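For the reader’s convenience, we spell out the first step \(r=1\) of the scheme; this is merely a specialization of (19), (22) and (23). One determines \(X_1\) from \(\mathsf{D}X_1=W^{(0)}_1\) and then, for \(s>1\),

$$\begin{aligned} V^{(1)}_1 = W^{(0)}_1 - \mathsf{R}X_1\ ,\quad V^{(1)}_s = W^{(0)}_s - \frac{1}{s}\, E^{(0)}_{s-1} \mathsf{R}X_1\ ,\quad W^{(1)}_{s} = V^{(1)}_{s} + \frac{1}{s} \sum _{k=1}^{s-1} \frac{s-k}{k!}\, L_{X_1}^{k} V^{(1)}_{s-k}\ , \end{aligned}$$

so that \(W^{(1)}_1=V^{(1)}_1+X_1=W^{(0)}_1-\mathsf{D}X_1=0\), i.e., the map is in normal form up to order \(1\).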

3 Quantitative estimates

Our aim now is to complete the formal algorithm of the previous section with quantitative estimates that will lead to the proof of convergence of the transformation to normal form. The main result of this section is the iteration lemma of Sect. 3.3 below. However, we must first introduce a few technical tools.

3.1 Norms on vector fields and generalized Cauchy estimates

For a homogeneous polynomial \(f(x)=\sum _{|k|=s}f_kx^k\) (using multi-index notation, with \(|k|=k_1+\cdots +k_n\)) with complex coefficients \(f_k\) and for a homogeneous polynomial vector field \(X_s=(X_{s,1},\ldots ,X_{s,n})\), we use the polynomial norm

$$\begin{aligned} \bigl \Vert {f}\bigr \Vert = \sum _{k} |f_k|\ ,\quad \bigl \Vert {X_s}\bigr \Vert = \sum _{j=1}^{n}\, \bigl \Vert {X_{s,j}}\bigr \Vert . \end{aligned}$$
(24)

The following lemma allows us to control the norms of Lie derivatives of functions and vector fields.

Lemma 4

Let \(X_r\) be a homogeneous polynomial vector field of degree \(r+1\). Let \(f_s\) and \(v_s\) be a homogeneous polynomial and a homogeneous polynomial vector field, respectively, of degree \(s+1\). Then, we have

$$\begin{aligned} \bigl \Vert L_{X_r} f_s\bigr \Vert \le (s+1)\, \Vert X_r\Vert \, \Vert f_s\Vert \quad \mathrm{and}\quad \bigl \Vert {L_{X_r} v_s}\bigr \Vert \le (r+s+2)\, \Vert X_r\Vert \, \Vert v_s\Vert . \end{aligned}$$
(25)

Proof

Write \(f_s=\sum _{|k|=s+1} b_k x^k\) with complex coefficients \(b_k\). Similarly, write the \(j\)th component of the vector field \(X_r\) as \(X_{r,j}=\sum _{|k'|=r+1} c_{j,k'} x^{k'}\). Recalling that \(L_{X_r}f_s=\sum _{j=1}^{n}X_{r,j}{{{\partial }{f_s}}\over {{\partial }{x_j}}}\), we have

$$\begin{aligned} L_{X_r}f_s = \sum _{j=1}^{n} \sum _{k,k'} \frac{c_{j,k'}k_jb_k}{x_j} x^{k+k'}. \end{aligned}$$

Thus, in view of \(|k_j|\le s+1\), we have

$$\begin{aligned} \bigl \Vert {L_{X_r}f_s}\bigr \Vert \le (s+1)\sum _{j=1}^{n}\sum _{k'} |c_{j,k'}| \sum _{k} |b_k| = (s+1)\Vert {X_r}\Vert \, \Vert {f_s}\Vert , \end{aligned}$$

namely the first of (25). In order to prove the second inequality recall that the \(j\)th component of the Lie derivative of the vector field \(v_s\) is

$$\begin{aligned} \bigl (L_{X_r} v_s\bigr )_j = \sum _{l=1}^{n}\left( X_{r,l}{{{\partial }{v_{s,j}}}\over {{\partial }{x_l}}} - v_{s,l}{{{\partial }{X_{r,j}}}\over {{\partial }{x_l}}}\right) . \end{aligned}$$

Then using the first of (25), we have

$$\begin{aligned} \biggl \Vert {\sum _{l=1}^{n}\left( X_{r,l}{{{\partial }{v_{s,j}}}\over {{\partial }{x_l}}} - v_{s,l}{{{\partial }{X_{r,j}}}\over {{\partial }{x_l}}}\right) }\biggr \Vert \le (s+1) \Vert {X_r}\Vert \, \Vert {v_{s,j}}\Vert +(r+1) \Vert {v_s}\Vert \, \Vert {X_{r,j}}\Vert , \end{aligned}$$

which readily gives the wanted inequality in view of the definition (24) of the polynomial norm of a vector field.\(\square \)
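The first bound in (25) is easily spot-checked numerically; the following sketch (ours, on random toy data) does so for a single instance with \(n=2\).

```python
# Sketch (ours): a numeric spot-check of the first inequality in (25),
# ||L_X f|| <= (s+1) ||X|| ||f||, on random homogeneous data with n = 2.
import random
import sympy as sp

n, r, s = 2, 1, 2
x = sp.symbols(f'x0:{n}')
random.seed(1)

def poly_norm(f):          # the norm (24): sum of moduli of the coefficients
    return sum(abs(c) for c in sp.Poly(sp.expand(f), *x).coeffs())

def rand_hom(d):           # random homogeneous polynomial of degree d
    return sum(random.uniform(-1, 1) * x[0]**i * x[1]**(d - i) for i in range(d + 1))

X = [rand_hom(r + 1) for _ in range(n)]   # vector field of order r
f = rand_hom(s + 1)                       # function of order s
LXf = sum(X[j] * sp.diff(f, x[j]) for j in range(n))

lhs = poly_norm(LXf)
rhs = (s + 1) * sum(poly_norm(Xj) for Xj in X) * poly_norm(f)
assert lhs <= rhs + 1e-12
print(lhs, '<=', rhs)
```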

Lemma 5

Let \(V_r\) be a homogeneous polynomial vector field of degree \(r+1\). Then the solution \(X_r\) of the equation \(\mathsf{D}X_r = V_r\) satisfies

$$\begin{aligned} \bigl \Vert {X_r}\bigr \Vert \le \frac{1}{\alpha _r}\bigl \Vert {V_r}\bigr \Vert \ ,\quad \bigl \Vert {\mathsf{R}X_r}\bigr \Vert \le \frac{1+\alpha _r}{\alpha _r}\bigl \Vert {V_r}\bigr \Vert , \end{aligned}$$
(26)

with the sequence \(\alpha _r\) defined by (5).

Proof

The first inequality is a straightforward consequence of the definition (24) of the norm and of the sequence \(\alpha _r\) in terms of \(\beta _r\) defined by (4). If \(v_{j,k}\) are the coefficients of \(V_r\), then the coefficients of \(X_r\) are bounded by \(|v_{j,k}|/\beta _r\le |v_{j,k}|/\alpha _r\). The second inequality follows from \(\mathsf{R}X_r = X_r+\mathsf{D}X_r = X_r+V_r\), whence \(\Vert {\mathsf{R}X_r}\Vert \le (1+1/\alpha _r)\Vert {V_r}\Vert \).\(\square \)

3.2 Accumulation of small divisors

Lemma 5 shows that solving the equation for every vector field of the generating sequence introduces a divisor \(\alpha _r\). Such divisors do accumulate, and our aim now is to analyze the accumulation process in detail. It will be convenient to introduce a further sequence \(\{\sigma _r\}_{r\ge 0}\) defined as

$$\begin{aligned} \sigma _0 = 1\;,\quad \sigma _r = \frac{\alpha _r}{r^2}\>,\ r\ge 1\>, \end{aligned}$$
(27)

which will play a major role in the rest of the proof. The extra factor \(1/r^2\) can be interpreted as due to the generalized Cauchy estimates for derivatives, which are also a source of divergence in perturbation processes. The quantities \(\sigma _r\) are the actual small divisors that we must deal with. Here, we follow the scheme presented in [9], with a few variations. However, since this is a crucial part of the proof, we include it in a detailed and self-contained form.

The guiding remark is that the small divisors \(\sigma _r\) propagate and accumulate through the formal construction due to the use of the recursive formulæ (22) and (23); e.g., the expression \(L_{X_r}V_s^{(r)}\) will contain the product of the divisors contained in both \(X_r\) and \(V_s^{(r)}\), with no extra divisors generated by the Lie derivative. The explicit constructive form of the algorithm allows us to have quite precise control on the accumulation process. It is an easy remark that unfolding the recursive formulæ (22) and (23) will produce an estimate of \(\Vert {W^{(r)}_{s}}\Vert \) as a sum of many terms, every one of which contains as denominator a product of \(q\) divisors of the form \(\sigma _{j_1}\ldots \sigma _{j_q}\), with some indexes \(j_1,\ldots ,j_q\) and some \(q\) to be found. This is what we call the accumulation of small divisors, and the problem is to identify the worst product among them. The key of our argument is to focus our attention on the indexes rather than on the actual values of the divisors.

We call \(I=\{j_1,\ldots ,j_s\}\) with non-negative integers \(j_1,\ldots ,j_s\) a set of indexes. We introduce a partial ordering as follows. Let \(I=\{j_1,\ldots ,j_s\}\) and \(I'=\{j'_1,\ldots ,j'_s\}\) be two sets of indexes with the same number \(s\) of elements. We say that \(I\triangleleft I'\) in case there is a permutation of the indexes such that the relation \(j_m\le j'_m\) holds true for \(m=1,\ldots ,s\,\). If two sets of indexes contain a different number of elements, we pad the shorter one with zeros and use the same definition. We also define the special sets of indexes

$$\begin{aligned} I^*_s = \Bigl (\Bigl \lfloor \frac{s}{s}\Bigr \rfloor , \Bigl \lfloor \frac{s}{s-1}\Bigr \rfloor ,\ldots , \Bigl \lfloor \frac{s}{2}\Bigr \rfloor \Bigr ). \end{aligned}$$
(28)

Lemma 6

For the sets of indexes \(I_s^*=\{j_1,\ldots ,j_s\}\), the following statements hold true:

  1. (i)

    the maximal index is \(j_\mathrm{max}=\bigl \lfloor \frac{s}{2}\bigr \rfloor \,\);

  2. (ii)

    for every \(k\in \{1,\ldots ,j_{\max }\}\) the index \(k\) appears exactly \(\bigl \lfloor \frac{s}{k}\bigr \rfloor -\bigl \lfloor \frac{s}{k+1}\bigr \rfloor \) times;

  3. (iii)

    for \(0< r\le s\) one has

    $$\begin{aligned} \bigl (\{r\}\cup I^*_r\cup I^*_s\bigr ) \triangleleft I^*_{r+s}. \end{aligned}$$

Proof

The claim (i) is a trivial consequence of the definition.

(ii) For each fixed value of \(s>0\) and \(1\le k\le \lfloor s/2\rfloor \,\), we have to determine the cardinality of the set \(M_{k,s}=\{m\in {\mathbb {N}}:\ 2\le m\le s\,,\ \lfloor s/m\rfloor =k\}\). For this purpose, we use the trivial inequalities

$$\begin{aligned} \biggl \lfloor \frac{s}{\lfloor s/k\rfloor }\biggr \rfloor \ge k \quad \hbox {and}\quad \biggl \lfloor \frac{s}{\lfloor s/k\rfloor +1}\biggr \rfloor < k. \end{aligned}$$

After having rewritten the same relations with \(k+1\) in place of \(k\,\), one immediately realizes that an index \(m\in M_{k,s}\) if and only if \(m\le \lfloor s/k\rfloor \) and \(m\ge \lfloor s/(k+1)\rfloor +1\); therefore, \(\#M_{k,s}=\bigl \lfloor \frac{s}{k}\bigr \rfloor -\bigl \lfloor \frac{s}{k+1}\bigr \rfloor \).

(iii) Since \(r\le s\), the definition in (28) implies that neither \(\{r\}\cup I^*_r\cup I^*_s\) nor \(I^*_{r+s}\) can include any index exceeding \(\bigl \lfloor (r+s)/2\bigr \rfloor \). Thus, let us define some finite sequences of non-negative integers as follows:

$$\begin{aligned} \begin{array}{ll} R_k = \#\bigl \{j\in I^*_{r}\>:\> j\le k\bigr \},&{} S_k = \#\bigl \{j\in I^*_{s}\>:\> j\le k\bigr \},\\ M_k = \#\bigl \{j\in \{r\}\cup I^*_r\cup I^*_s\>:\> j\le k\bigr \},&{} N_k = \#\bigl \{j\in I^*_{r+s}\>:\> j\le k\bigr \}, \end{array} \end{aligned}$$

where \(1\le k \le \lfloor (r+s)/2\rfloor \). When \(k< r\), the property (ii) of the present lemma allows us to write

$$\begin{aligned} R_k = r - \Bigl \lfloor \frac{r}{k+1}\Bigr \rfloor ,\qquad S_k = s - \Bigl \lfloor \frac{s}{k+1}\Bigr \rfloor ,\qquad N_k = r+s - \Bigl \lfloor \frac{r+s}{k+1}\Bigr \rfloor ; \end{aligned}$$

using the elementary estimate \(\lfloor x\rfloor + \lfloor y\rfloor \le \lfloor x+y\rfloor \,\), from the equations above it follows that \(M_k \ge N_k\) for \(1\le k<r\,\). In the remaining cases, i.e., when \(r\le k\le \lfloor (r+s)/2\rfloor \,\), we have that

$$\begin{aligned} R_k = r - 1,\qquad S_k = s - \Bigl \lfloor \frac{s}{k+1}\Bigr \rfloor ,\qquad N_k = r+s - \Bigl \lfloor \frac{r+s}{k+1}\Bigr \rfloor ; \end{aligned}$$

therefore, \(M_k=1+R_k+S_k\ge N_k\). Since we have just shown that \(M_k \ge N_k\) for all \(1\le k\le \lfloor (r+s)/2\rfloor \), it is now an easy matter to complete the proof. Let us first imagine having reordered both the sets of indexes \(\{r\}\cup I^*_r\cup I^*_s\) and \(I^*_{r+s}\) in increasing order; moreover, let us recall that \(\#\big (\{r\}\cup I^*_r\cup I^*_s\big )=\# I^*_{r+s}=r+s-1\), because of the definition in (28). Thus, since \(M_1\ge N_1\), every element equal to \(1\) in \(\{r\}\cup I^*_r\cup I^*_s\) has a corresponding index in \(I^*_{r+s}\) the value of which is at least \(1\,\). Analogously, since \(M_2\ge N_2\), every index \(2\) in \(\{r\}\cup I^*_r\cup I^*_s\) has a corresponding index in \(I^*_{r+s}\) which is at least 2, and so on up to \(k =\lfloor (r+s)/2\rfloor \). We conclude that \(\{r\}\cup I^*_r\cup I^*_s\triangleleft I^*_{r+s}\).\(\square \)
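Property (iii) can also be confirmed by brute force for small orders (a check of ours, not needed for the proof); recall that \(I\triangleleft J\) holds if and only if, after padding with zeros and sorting, the elements of \(I\) are bounded termwise by those of \(J\).

```python
# Sketch (ours): brute-force check of Lemma 6 (iii) for small r, s, using the
# padding-and-sorting characterization of the partial order I ◁ J.
def I_star(s):
    return [s // m for m in range(s, 1, -1)]   # (28): s - 1 entries

def precedes(I, J):
    m = max(len(I), len(J))
    I = sorted(list(I) + [0] * (m - len(I)))
    J = sorted(list(J) + [0] * (m - len(J)))
    return all(a <= b for a, b in zip(I, J))

for s in range(1, 40):
    for r in range(1, s + 1):
        assert precedes([r] + I_star(r) + I_star(s), I_star(r + s))
print('Lemma 6 (iii) verified for 0 < r <= s < 40')
```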

We come now to identify the sets of indexes that describe the allowed combinations of small divisors. These are the sets

$$\begin{aligned} \mathcal{J}_{r,s} = \bigl \{I=\{j_1,\ldots ,j_{s-1}\}\>:\> j_m\in \{0,\ldots ,\min (r,s/2)\}\,,\> I\triangleleft I^*_s\bigr \}. \end{aligned}$$
(29)

We will refer to condition \(I\triangleleft I^*_s\) as the selection rule \(\mathsf{S}\). The relevant properties are stated by the following

Lemma 7

For the sets of indexes \(\mathcal{J}_{r,s}\) the following statements hold true:

  1. (i)

    \(\mathcal{J}_{r-1,s}\subset \mathcal{J}_{r,s}\);

  2. (ii)

if \(I\in \mathcal{J}_{r-1,r}\) and \(I'\in \mathcal{J}_{r,s}\), then we have \(\bigl (\{r\} \cup I \cup I'\bigr )\in \mathcal{J}_{r,r+s}\).

Remark

Property (ii) plays a major role in controlling the accumulation of small divisors, since it tells us how the indexes accumulate. Indeed, it reflects the fact that a Lie derivative, e.g., \(L_{X_r}V_s^{(r)}\), contains the union of the divisors in \(V_s^{(r)}\) and in \(X_r\), taking also into account that \(X_r\) contains an extra divisor in view of Lemma 5.

Proof of Lemma 7

(i) is immediately checked in view of the definition of \(\mathcal{J}_{r,s}\).

(ii) We have \(\#\bigl (\{r\}\cup \, I\cup \, I'\bigr ) = 1+\#(I)+\#(I') = 1+r-1+s-1 = r+s-1\,\). For \(j\in \{r\}\cup \, I\cup \, I'\,\), we also have \(0\le j\le r\) because this is true for all \(j\in I\) and for all \(j\in I'\), and we just add an extra index \(r\). Since \(r\le s\), we also have \(r =\min \bigl (r,(r+s)/2\bigr )\), as required. Coming to the selection rule \(\mathsf{S}\), we remark that \(\{r\}\cup I\cup I'\triangleleft \{r\}\cup I^*_r\cup I^*_s\) readily follows from \(I\triangleleft I^*_r\) and \(I'\triangleleft I^*_s\), which are true in view of \(I\in \mathcal{J}_{r-1,r}\) and \(I'\in \mathcal{J}_{r,s}\), so that the claim follows from property (iii) of Lemma 6.\(\square \)

We come now to consider the accumulation of small divisors. Recall the definition (27) of the sequence \(\sigma _r\). We associate to the sets of indexes \(\mathcal{J}_{r,s}\) the sequence of positive real numbers \(T_{r,s}\) defined as

$$\begin{aligned} T_{0,s} = 1\ ,\quad T_{r,s} = \max _{I\in \mathcal{J}_{r,s}} \prod _{j\in I} \frac{1}{\sigma _j} \ ,\quad 0< r\le s. \end{aligned}$$
(30)

Lemma 8

The sequence \(T_{r,s}\) satisfies the following properties for \(1\le r\le s\):

  1. (i)

    \(T_{r-1,s} \le T_{r,s}\);

  2. (ii)

    \(\frac{1}{\sigma _r} T_{r-1,r} T_{r,s} \le T_{r,r+s}\).

Proof

(i) From property (i) of Lemma 7, we readily get

$$\begin{aligned} T_{r-1,s} = \max _{I\in \mathcal{J}_{r-1,s}} \prod _{j\in I}\frac{1}{\sigma _j} \le \max _{I\in \mathcal{J}_{r,s}} \prod _{j\in I}\frac{1}{\sigma _j}, \end{aligned}$$

since the maximum is evaluated over a larger set of indexes.

(ii) Compute

$$\begin{aligned} \frac{1}{\sigma _r} T_{r-1,r} T_{r,s}&= \frac{1}{\sigma _r} \max \limits _{I\in \mathcal{J}_{r-1,r}} \prod _{j\in I}\frac{1}{\sigma _j} \max \limits _{I'\in \mathcal{J}_{r,s}} \prod _{j'\in I'}\frac{1}{\sigma _{j'}}\\&= \max \limits _{I\in \mathcal{J}_{r-1,r}} \max \limits _{I'\in \mathcal{J}_{r,s}} \prod _{j\in \{r\}\cup \,I\cup \,I'}\frac{1}{\sigma _{j}}\\&\le \max \limits _{I\in \mathcal{J}_{r,r+s}} \prod _{j\in I}\frac{1}{\sigma _j} = T_{r,r+s}, \end{aligned}$$

where in the inequality of the last line property (ii) of Lemma 7 has been used. \(\square \)
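Since the sets \(\mathcal{J}_{r,s}\) are finite, property (ii) can also be verified by direct enumeration on small cases. The following sketch (ours) does so with toy divisors built from the golden-mean example of Sect. 1; the assertion holds for any choice of positive \(\sigma _j\), in agreement with the purely combinatorial nature of the proof.

```python
# Sketch (ours): brute-force T_{r,s} from (29)-(30) and a check of Lemma 8 (ii)
# on small cases, with toy divisors built from the golden-mean example.
from itertools import combinations_with_replacement as cwr
import math

def I_star(s):                      # (28)
    return [s // m for m in range(s, 1, -1)]

def precedes(I, J):                 # I ◁ J via zero-padding and sorting
    m = max(len(I), len(J))
    return all(a <= b for a, b in
               zip(sorted(list(I) + [0] * (m - len(I))),
                   sorted(list(J) + [0] * (m - len(J)))))

def T(r, s, sigma):                 # (30), by direct enumeration of J_{r,s}
    if r == 0:
        return 1.0
    return max(math.prod(1.0 / sigma[j] for j in I)
               for I in cwr(range(min(r, s // 2) + 1), s - 1)
               if precedes(list(I), I_star(s)))

w = (math.sqrt(5) - 1) / 2
beta = [1.0] + [2 * abs(math.sin(math.pi * k * w)) for k in range(1, 16)]
alpha = [min(beta[:k + 1]) for k in range(len(beta))]
sigma = [1.0] + [alpha[k] / k**2 for k in range(1, len(alpha))]   # (27)

for r in range(1, 4):
    for s in range(r, 6):
        assert T(r - 1, r, sigma) * T(r, s, sigma) / sigma[r] \
               <= T(r, r + s, sigma) * (1 + 1e-9)
```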

The final estimate uses condition \(\varvec{\tau }\) and the definition (27) of the sequence \(\sigma _r\,\).

Lemma 9

Let \(\lambda \) satisfy condition \(\varvec{\tau }\). Then the sequence \(T_{r,s}\) is bounded by

$$\begin{aligned} T_{r,s} \le \gamma ^s e^{s\Gamma }\ ,\quad \frac{1}{\sigma _s} T_{r,s} \le \gamma ^s e^{s\Gamma } \end{aligned}$$

with some positive constant \(\gamma \) not depending on \(\lambda \).

Proof

In view of \(\sigma _s\le 1\), it is enough to prove the second inequality. We use the selection rule \(\mathsf{S}\) together with the monotonicity of the sequence \(\sigma _r\). We readily get

$$\begin{aligned} \frac{1}{\sigma _s} T_{r,s} = \frac{1}{\sigma _s} \max _{I\in \mathcal{J}_{r,s}} \prod _{j\in I} \frac{1}{\sigma _j} \le \max _{I\in \mathcal{J}_{r,s}} \prod _{j\in \{s\}\cup I} \frac{1}{\sigma _j} \le \prod _{j\in \{s\}\cup I^*_s} \frac{1}{\sigma _j}. \end{aligned}$$

By property (ii) of Lemma 6, the latter product is evaluated as

$$\begin{aligned} \prod _{j\in \{s\}\cup I^*_s} \frac{1}{\sigma _j} = \left[ \sigma _1^{q_1}\cdot \ldots \cdot \sigma _{\scriptscriptstyle \lfloor s/2\rfloor }^{q_{\lfloor s/2\rfloor }} \sigma _s^{q_s} \right] ^{-1}, \end{aligned}$$

where \(q_k =\bigl \lfloor \frac{s}{k}\bigr \rfloor -\bigl \lfloor \frac{s}{k+1}\bigr \rfloor \) is the number of indexes in \(I_s^*\) which are equal to \(k\). In view of the definition (27) of the sequence \(\sigma _s\), we have

$$\begin{aligned} \ln \Bigl (\frac{1}{\sigma _s} T_{r,s}\Bigr ) \le -\sum _{k=1}^{s} \Bigl (\Bigl \lfloor \frac{s}{k}\Bigr \rfloor -\Bigl \lfloor \frac{s}{k+1}\Bigr \rfloor \Bigr ) (\ln \alpha _k - 2\ln k) \le - s \sum _{k\ge 1} \frac{\ln \alpha _k - 2\ln k}{k(k+1)} = s(\Gamma + a), \end{aligned}$$

where \(a=\sum _{k\ge 1} \frac{2\ln k}{k(k+1)}<\infty \) is clearly independent of \(\lambda \). The claim follows by just setting \(\gamma =e^a\).\(\square \)

3.3 Iteration lemma

This is the main lemma, which allows us to control the norms of the sequence of vector fields \(\{X_r\}_{r\ge 1}\,\).

Lemma 10

Assume that the sequence \(W^{(0)}\) of vector fields satisfies \(\Vert {W^{(0)}_s}\Vert \le \frac{C_0^{s-1}A}{s}\) with some constants \(A> 0\) and \(C_0\ge 0\). Then the sequence of vector fields \(\{X_r\}_{r\ge 1}\) that for every \(r\) gives the normal form \(W^{(r)}\) satisfies the following estimates: there exists a bounded, monotonically non-decreasing sequence \(\{C_r\}_{r\ge 1}\) of positive constants, with \(C_r\rightarrow C_{\infty }<\infty \) for \(r\rightarrow \infty \), such that we have

$$\begin{aligned} \bigl \Vert {X_r}\bigr \Vert \le T_{r-1,r} \frac{C_{r-1}^{r-1}A}{r\alpha _r} \; ,\quad \bigl \Vert {W^{(r)}_s}\bigr \Vert \le T_{r,s} \frac{C_r^{s-1} A}{s}. \end{aligned}$$
(31)

The sequence may be recursively defined as

$$\begin{aligned} C_1 = 2C_0+16A\ ,\quad C_r = \Bigl (1+\frac{1}{r^2}\Bigr )^{1/r}\Bigl (1+\frac{1}{r}\Bigr )^{1/r} C_{r-1}\ ,\quad r\ge 2, \end{aligned}$$
(32)

so that one has \(C_r> 16A\).

Proof

The proof proceeds by induction. For \(r=0\), the inequality for \(W^{(0)}_s\) is nothing but the initial hypothesis, recalling that by definition we have \(T_{0,s}=1\). From this, the corresponding inequality for \(X_1\) immediately follows by Lemma 5. The induction step requires two main parts, namely (a) estimating \(\Vert {V^{(r)}_s}\Vert \) as defined by (22) and (b) estimating \(\Vert {W^{(r)}_s}\Vert \) as defined by (23).

The estimate for \(X_r\) is determined by solving (19). In view of the induction hypothesis (31), we rewrite the claim of Lemma 5 as

$$\begin{aligned} \bigl \Vert {X_r}\bigr \Vert \le \frac{r^2 T_{r-1,r}}{\alpha _r} \cdot \frac{C_{r-1}^{r-1} A}{r^3}\ ,\quad \bigl \Vert {\mathsf{R}X_r}\bigr \Vert \le \frac{T_{r-1,r}}{\alpha _r} \cdot \frac{(1+\alpha _r)C_{r-1}^{r-1} A}{r}. \end{aligned}$$
(33)

In order to estimate \(\Vert {V^{(r)}_s}\Vert \), we first prove that for \(s> r\) we have

$$\begin{aligned} \bigl \Vert {E_{s-1}^{(1)} \mathsf{R}X_1}\bigr \Vert&\le T_{1,s} C_0^{s-1} A \,\frac{8A}{C_0} \left( 1+\frac{4A}{C_0}\right) ^{s-2},\end{aligned}$$
(34)
$$\begin{aligned} \bigl \Vert {E_{s-r}^{(r-1)} \mathsf{R}X_r}\bigr \Vert&< T_{r,s} \frac{C_{r-1}^{s-1} A}{r^2} \left( 1+\frac{1}{r}\right) ^{\frac{s}{r}-2}\quad \mathrm{for}\ r> 1\>. \end{aligned}$$
(35)

The proof of the latter estimates is based on the general inequality

$$\begin{aligned} \bigl \Vert {E_{s-r}^{(r-1)} \mathsf{R}X_r}\bigr \Vert \le \!\!\!\sum _{k=1}^{\lfloor s/r\rfloor -1}\!\!\! \Theta (r,s\!-\!r,k)\, T_{r-1,j_1}\ldots T_{r-1,j_k}\, \frac{T_{r-1,r}}{\alpha _r} \cdot \frac{(1\!+\!\alpha _r)C_{r-1}^{s-k-1}A^{k+1}}{r}\qquad \quad \end{aligned}$$
(36)

where we have introduced the quantities

$$\begin{aligned} \Theta (r,s,k) = \!\!\!\!\mathop {\sum _{j_1+\cdots +j_k=s}}\limits _ {j_1,\ldots ,j_k\ge r} \!\!\!\!\frac{(j_1\!+\cdots +\!j_k\!+\!r\!+\!2) (j_1\!+\cdots \!+j_{k-1}\!+\!r+\!2)\cdots (j_{1}\!+\!r\!+\!2)}{(j_1+\cdots +j_k) (j_1+\cdots +j_{k-1})\cdots j_1}\qquad \qquad \end{aligned}$$
(37)

The inequality (36) follows by repeatedly applying Lemma 4 to the non-recursive expression for \(E^{(r-1)}_{s-r}\) given in Lemma 23. We describe this process in detail. First estimate

$$\begin{aligned} \bigl \Vert {L_{W^{(r-1)}_{j_1}}\mathsf{R}X_r}\bigr \Vert \le (j_1+r+2) T_{r-1,j_1} \frac{C_{r-1}^{j_1-1} A}{j_1} \cdot \frac{T_{r-1,r}}{\alpha _r}\cdot \frac{(1+\alpha _r)C_{r-1}^{r-1} A}{r} \end{aligned}$$

which follows by the induction hypothesis (31), from (33) and from Lemma 4. Remark that the factor \(j_1+r+2\) is the degree of the vector field \(L_{W^{(r-1)}_{j_1}}\mathsf{R}X_r\). The same estimate is repeated for \(L_{W^{(r-1)}_{j_2}}L_{W^{(r-1)}_{j_1}}\mathsf{R}X_r,\ldots \) thus yielding

$$\begin{aligned} \bigl \Vert {L_{W^{(r-1)}_{j_k}}\ldots L_{W^{(r-1)}_{j_1}}\mathsf{R}X_r}\bigr \Vert&\le \frac{(j_1+\cdots +j_k+r+2)\cdots (j_1+r+2)}{j_k\ldots j_1} \nonumber \\&\times T_{r-1,j_1}\ldots T_{r-1,j_k}\, \frac{T_{r-1,r}}{\alpha _r} \cdot \frac{(1\!+\!\alpha _r)C_{r-1}^{s-k-1}A^{k+1}}{r} \end{aligned}$$

In view of the expression (44) of \(E^{(r-1)}_{s-r}\), we get (36) and (37). Here, we need an estimate for \(\Theta (r,s,k)\), that we defer to Appendix A.3. Putting \(m=2\) in (55), we get

$$\begin{aligned} \Theta (r,s-r,k) < 2^kr^{k-1} \Bigl (1 +\frac{1}{r}\Bigr )^{k} {{\frac{s}{r}-2}\atopwithdelims (){k-1}}. \end{aligned}$$
(38)

Furthermore, we isolate the contribution of the quantities \(T_{r-1,\cdot }\) that control the small divisors. With an appropriate use of property (ii) of Lemma 8, we have

$$\begin{aligned} T_{r-1,j_1}\ldots T_{r-1,j_k}\frac{T_{r-1,r}}{\alpha _r}&= \frac{1}{r^2} \left( \frac{\alpha _r}{r^2}\right) ^{k-1} \frac{r^2 T_{r-1,j_1}}{\alpha _r} \cdots \frac{r^2 T_{r-1,j_{k-1}}}{\alpha _r} T_{r-1,j_k} \frac{r^2 T_{r-1,r}}{\alpha _r}\nonumber \\&\le \frac{1}{r^2} \left( \frac{\alpha _r}{r^2}\right) ^{k-1} T_{r,s}. \end{aligned}$$
(39)

Thus, using also \(\alpha _r\le 1\) which is true in view of the definition of \(\alpha _r\), we replace (36) with

$$\begin{aligned} \bigl \Vert {E_{s-r}^{(r-1)} \mathsf{R}X_r}\bigr \Vert&\le T_{r,s} C_{r-1}^{s-1} \frac{(1+\alpha _r)A}{\alpha _r} \sum _{k=1}^{\lfloor s/r\rfloor -1} \Theta (r,s-r,k) \left( \frac{\alpha _r A}{r^2 C_{r-1}}\right) ^k\\&\le T_{r,s} C_{r-1}^{s-1} A\cdot \frac{4\left( 1+\frac{1}{r}\right) A}{r^2 C_{r-1}} \sum _{l=0}^{\lfloor s/r\rfloor -2} {{\frac{s}{r}-2}\atopwithdelims (){l}} \left( \frac{2\left( 1+\frac{1}{r}\right) A}{r C_{r-1}}\right) ^l\\&< T_{r,s} C_{r-1}^{s-1} A\cdot \frac{4\left( 1+\frac{1}{r}\right) A}{r^2 C_{r-1}} \left( 1+\frac{2\left( 1+\frac{1}{r}\right) A}{r C_{r-1}}\right) ^{\frac{s}{r}-2}. \end{aligned}$$

For \(r=1\), the latter formula readily gives (34). For \(r> 1\), we use \(1+\frac{1}{r}< 2\) and \(\frac{8A}{C_{r-1}}< 1\), which follows from the choice of the sequence \(C_r\) in the statement of the Lemma, so that (35) is easily recovered.

We come now to the estimate of \(\Vert {V^{(r)}_s}\Vert \). Here, it is convenient to treat the case \(r=1\) separately. Putting (34) and the induction hypothesis (31) in (22), we get

$$\begin{aligned} \bigl \Vert {V^{(1)}_s}\bigr \Vert&< T_{0,s}\frac{C_{0}^{s-1} A}{s} + T_{1,s} \frac{C_0^{s-1} A}{s} \cdot \frac{8A}{C_{0}} \left( 1+\frac{4A}{C_{0}}\right) ^{s-2}\\&< T_{1,s} \frac{C_0^{s-1} A}{s} \left[ 1+ \frac{8A}{C_0} \left( 1+\frac{4A}{C_{0}}\right) ^{s-2}\right] \\&< T_{1,s} \frac{C_0^{s-1} A}{s} \left( 1+\frac{8A}{C_{0}}\right) ^{s-1}. \end{aligned}$$

Thus, we may write

$$\begin{aligned} \bigl \Vert {V^{(1)}_s}\bigr \Vert < T_{1,s} \frac{\hat{C}_1^{s-1} A}{s}\ ,\quad \hat{C}_1 =C_0 + 8A. \end{aligned}$$
(40)

For \(r> 1\) we put (35) in (22), and we get

$$\begin{aligned} \bigl \Vert {V^{(r)}_s}\bigr \Vert&< T_{r-1,s} \frac{C_{r-1}^{s-1} A}{s} + T_{r,s} \frac{C_{r-1}^{s-1} A}{r s} \left( 1+\frac{1}{r}\right) ^{\frac{s}{r}-2}\\&< T_{r,s} \frac{C_{r-1}^{s-1} A}{s} \left[ 1+\frac{1}{r} \left( 1+\frac{1}{r}\right) ^{\frac{s}{r}-2} \right] < T_{r,s} \frac{C_{r-1}^{s-1} A}{s} \left( 1+\frac{1}{r}\right) ^{\frac{s-1}{r}} \end{aligned}$$

Thus, we conclude

$$\begin{aligned} \bigl \Vert {V^{(r)}_s}\bigr \Vert < T_{r,s} \frac{\hat{C}_{r}^{s-1} A}{s}\ ,\quad \hat{C}_r = \left( 1+\frac{1}{r}\right) ^{1/r} C_{r-1}. \end{aligned}$$
(41)

Now we look for an estimate of \(\Vert {W^{(r)}_s}\Vert \) as given by (23). Recall that we have \(s> r\), because \(W^{(r)}_r=0\) by construction. We use (40) and (41) together with Lemma 4, and get

$$\begin{aligned} \bigl \Vert {W^{(r)}_{s}}\bigr \Vert&\le T_{r,s}\frac{\hat{C}_{r}^{s-1} A}{s} + \frac{1}{s}\sum _{k=1}^{\lfloor s/r\rfloor -1} \frac{(s+2)(s-r+2)\cdots (s-kr+r+2)}{k!}\\&\times \biggl (\frac{r^2 T_{r-1,r}}{\alpha _r}\biggr )^{\!\! k} T_{r,s-kr} \times \biggl (\frac{C_{r-1}^{r-1}A}{r^2} \biggr )^{\!\! k}\, {\hat{C}_r^{s-kr-1}A}. \end{aligned}$$

Here, we use again the statement (ii) of Lemma 8. By the trivial inequality

$$\begin{aligned} s-jr+2\le 4r\Bigl (\frac{s}{r} - 1 -j\Bigr )\quad \mathrm{for}\quad 0\le j< \lfloor s/r\rfloor -1 \end{aligned}$$

and remarking that \(\hat{C}_r> 4A\), the latter estimate yields

$$\begin{aligned} \bigl \Vert {W^{(r)}_{s}}\bigr \Vert&\le T_{r,s}\frac{\hat{C}_{r}^{s-1} A}{s} \biggl [ 1 + \sum _{k=1}^{\lfloor s/r\rfloor -1} {{\frac{s}{r}-1}\atopwithdelims (){k}} \left( \frac{4A}{r^2\hat{C}_{r}}\right) ^k \biggr ] \\&< T_{r,s}\frac{\hat{C}_{r}^{s-1} A}{s} \sum _{k=0}^{\lfloor s/r\rfloor -1} {{\frac{s}{r}-1}\atopwithdelims (){k}} \left( \frac{1}{r^{2}}\right) ^k\\&< T_{r,s}\frac{\hat{C}_{r}^{s-1} A}{s} \biggl (1+\frac{1}{r^2}\biggr )^{\frac{s-1}{r}}. \end{aligned}$$

We conclude

$$\begin{aligned} \bigl \Vert {W^{(r)}_{s}}\bigr \Vert < T_{r,s}\frac{C_{r}^{s-1} A}{s}\ ,\quad C_{r} = \biggl (1+\frac{1}{r^2}\biggr )^{1/r}\hat{C}_r. \end{aligned}$$
(42)

In view of (40)–(42), this proves the claim of the Lemma.\(\square \)
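Incidentally, the boundedness of the sequence defined by (32) is elementary: \(\ln (C_r/C_1)\) is a partial sum of the convergent series \(\sum _{j\ge 2}\frac{1}{j}\ln \bigl [(1+1/j^2)(1+1/j)\bigr ]\). A quick numerical evaluation (ours) of the limiting ratio:

```python
# Sketch (ours): the product defining C_r in (32) converges, so C_r is bounded;
# numerically, C_infinity / C_1 = prod_{r>=2} [(1+1/r^2)(1+1/r)]^(1/r) ~ 2.12.
import math

log_ratio = sum(math.log((1 + 1 / r**2) * (1 + 1 / r)) / r for r in range(2, 200_000))
print(math.exp(log_ratio))   # approximately 2.12
```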

4 Proof of the main theorem

Having established the estimate of the iteration Lemma 10 on the sequence of generating vector fields, it is now a standard matter to complete the proof of Theorem 1. Hence, this section will be less detailed than the previous ones.

In view of the iteration lemma, we are given an infinite sequence \(\{X_r\}_{r\ge 1}\) of generating vector fields with \(X_r\) homogeneous polynomial of degree \(r+1\) satisfying

$$\begin{aligned} \bigl \Vert {X_r}\bigr \Vert \le T_{r-1,r} \frac{C_{r-1}^{r-1}A}{r\alpha _r}. \end{aligned}$$

By Lemma 9, we have \(\frac{T_{r-1,r}}{\alpha _r}\le \gamma ^r e^{r\Gamma }\) with \(\Gamma \) as in condition \(\varvec{\tau }\) and \(\gamma \) a constant independent of \(\lambda \). Moreover, still by Lemma 10, we have \(C_r\le C_{\infty }\), a positive constant independent of \(\lambda \). Thus, we have

$$\begin{aligned} \bigl \Vert {X_r}\bigr \Vert \le \frac{\eta ^{r}e^{r\Gamma }}{r} K \end{aligned}$$
(43)

with positive constants \(\eta \) and \(K\) independent of \(\lambda \).

We refer now to the analytic setting that we recall in Appendix A.2. Every vector field \(X_r\) generates a near-the-identity transformation \(y=\exp \bigl (L_{X_r}\bigr )x\) which transforms the generating sequence \(W^{(r-1)}\) into \(W^{(r)}\) according to the algorithm of Sect. 2.2. Thus, the near-the-identity transformation to normal form is generated by the limit \(S_X\) of the sequence of operators \(S^{(r)}_X =\exp (L_{X_r})\circ \ldots \circ \exp (L_{X_1})\). We apply Proposition 2 of Appendix A.2. In view of (43), in a polydisk \(\varDelta _\rho \) of radius \(\rho \) centered at the origin of \({\mathbb {C}}^n\), we have

$$\begin{aligned} \bigl |X_r\bigr |_{\rho } \le \bigl \Vert {X_r}\bigr \Vert \rho ^{r+1} \le \frac{\eta ^{r}e^{r\Gamma }}{r} K \rho ^{r+1}, \end{aligned}$$

where \(\bigl |X_r\bigr |_{\rho }\) is the supremum norm. Thus, condition (52) of Proposition 2 reads

$$\begin{aligned} \rho \sum _{r\ge 1} \frac{\eta ^{r}e^{r\Gamma }}{r} K \rho ^{r} < \frac{\rho }{4eK}, \end{aligned}$$

which is true if we take, e.g., \(\rho <\overline{\rho }=\frac{3}{2}B^{-1}e^{-\Gamma }\) with a constant \(B\) independent of \(\lambda \). Thus, Proposition 2 applies with, e.g., \(\delta =\overline{\rho }/3\), and we conclude that the near-the-identity transformation that brings the map to the wanted linear normal form is analytic at least in a polydisk of radius \(B^{-1}e^{-\Gamma }\), as claimed. This concludes the proof of the main theorem.