1 Introduction

The goal of this paper is to study the simultaneous linearization problem for some commuting nearly integrable \(C^\infty \) diffeomorphisms of the cylinder. The question of linearization has been one of the central themes in dynamical systems. We start by considering two types of typical integrable maps on the infinite cylinder \(\mathbb {T}\times \mathbb {R}\), whose perturbations will be discussed below. Here, \(\mathbb {T}=\mathbb {R}/\mathbb {Z}\) denotes the circle. Let \(\mathcal {F}_0:\mathbb {T}\times \mathbb {R}\rightarrow \mathbb {T}\times \mathbb {R}\) be a smooth integrable twist map of the form

$$\begin{aligned}\mathcal {F}_0 (x, y)=(x+\omega (y), y),\end{aligned}$$

where the frequency map \(\omega (y)\) is non-degenerate, in the sense that \(\omega (y):\mathbb {R}\rightarrow \mathbb {R}\) has a smooth inverse map. A typical example is \(\omega (y)=y\). For \(\alpha \in \mathbb {R}\), we denote by \(T_\alpha : \mathbb {T}\times \mathbb {R}\rightarrow \mathbb {T}\times \mathbb {R}\) the linear map defined by

$$\begin{aligned}T_\alpha (x, y)=(x+\alpha , y).\end{aligned}$$

Clearly, the phase spaces of \(\mathcal {F}_0\) and \(T_\alpha \) are completely foliated by smooth invariant circles, on which the dynamics are conjugate to the rigid rotations.
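The invariant foliation can be observed directly on a computer. The following is a minimal sketch, assuming the typical frequency map \(\omega (y)=y\) and an illustrative value of \(\alpha \); the helper names `orbit` and `circle_dist` are hypothetical and serve only this illustration.

```python
import math

def F0(x, y, omega=lambda y: y):
    """Integrable twist map (x, y) -> (x + omega(y) mod 1, y)."""
    return ((x + omega(y)) % 1.0, y)

def T(alpha, x, y):
    """Translation (x, y) -> (x + alpha mod 1, y)."""
    return ((x + alpha) % 1.0, y)

def circle_dist(a, b):
    """Distance between two points of the circle T = R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def orbit(step, x, y, n):
    """The first n+1 points of the orbit of (x, y) under the map `step`."""
    pts = [(x, y)]
    for _ in range(n):
        x, y = step(x, y)
        pts.append((x, y))
    return pts

alpha = (math.sqrt(5) - 1) / 2        # illustrative irrational number
pts = orbit(F0, 0.1, 0.37, 50)
heights = {p[1] for p in pts}         # a single value: the circle {y = 0.37} is invariant
```

Every orbit stays on its circle \(\{y=const\}\), and on that circle the map acts as the rotation by \(\omega (y)\).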

We wish to study the perturbations of \(\mathcal {F}_0\) and the perturbations of \(T_\alpha \). They arise naturally in many physical and geometric problems. Consider a smooth diffeomorphism (not necessarily symplectic) \(\mathcal {F}\) which is a perturbation of \(\mathcal {F}_0\) and homotopic to the identity. This means there is a perturbation \(f=(f_1, f_2)\) with \(f_1, f_2\in C^\infty (\mathbb {T}\times \mathbb {R},\mathbb {R})\), such that

$$\begin{aligned} \mathcal {F}=&\,\mathcal {F}_0+f : \mathbb {T}\times \mathbb {R}\rightarrow \mathbb {T}\times \mathbb {R}\nonumber \\&\big (x,y\big ) \longmapsto \big (x+\omega (y) +f_1(x,y)\quad \text {mod}~1, \quad y+f_2(x,y) \big ). \end{aligned}$$
(1.1)

In particular, for the case where \(\mathcal {F}\) is exact symplectic, the question of persistence of invariant circles has been much studied. The celebrated KAM (Kolmogorov-Arnold-Moser) theorem asserts that the Diophantine invariant circles persist under small perturbations. Moreover, the question of when there do or do not exist invariant circles has led to deep studies by Rüssmann, Herman, Mather, et al. See [20] and the references therein.

We also consider a perturbation \(\mathcal {K}\) of \(T_\alpha \) that is homotopic to the identity. This means there is a perturbation \(k=(k_1, k_2)\) with \(k_1, k_2\in C^\infty (\mathbb {T}\times \mathbb {R},\mathbb {R})\), such that

$$\begin{aligned} \mathcal {K}=&T_\alpha +k ~:~ \mathbb {T}\times \mathbb {R}\rightarrow \mathbb {T}\times \mathbb {R}\nonumber \\&\big (x,y\big ) \longmapsto \big (x+\alpha +k_1(x,y)\quad \text {mod}~1,\quad y+k_2(x,y)\big ). \end{aligned}$$
(1.2)

There are many related problems and results under certain assumptions. We briefly review some of them. Restricted to the bounded annulus \(\mathbb {T}\times [0,1]\), an important model is the irrational pseudo-rotation, i.e., an orientation- and area-preserving diffeomorphism of the annulus that has no periodic points and whose rotation number on a boundary circle is \(\alpha \). For any Liouville \(\alpha \in \mathbb {R}\), examples of weak mixing pseudo-rotations were constructed by the Anosov-Katok method [2, 17]. For Diophantine \(\alpha \), it was observed by Herman (see also [14]) that the pseudo-rotation is always smoothly conjugated to \(T_\alpha \) in a small neighborhood of the boundary circle. We also refer to the course notes [3] and the recent work [1] for more background and an overview of the properties of pseudo-rotations. However, our paper does not focus on pseudo-rotations. In fact, we presuppose neither the existence of a \(\mathcal {K}\)-invariant circle nor the area-preserving condition for the map \(\mathcal {K}\).

In this paper, we are interested in the local rigidity aspect of \(\mathcal {F}_0\) and \(T_\alpha \), i.e., the preservation of smooth foliations under small perturbations. This is essentially a linearization problem. In general, it is not possible to find a smooth conjugacy to the linear model for a single element of the action generated by the pair \((\mathcal {F}, \mathcal {K})\). Indeed, for a single map, it has been known since the work of Poincaré that the smooth foliation structure is in general destroyed by an arbitrarily small perturbation. Here, we are motivated by an attempt to investigate the following question:

Question

Let \(\mathcal {F}\) and \(\mathcal {K}\) be smooth cylinder maps that are close to \(\mathcal {F}_0\) and \(T_\alpha \), respectively, and assume that they commute (i.e., \(\mathcal {F}\circ \mathcal {K}=\mathcal {K}\circ \mathcal {F}\)). Can \(\mathcal {F}\) and \(\mathcal {K}\) be simultaneously \(C^\infty \)-linearized?

The present paper gives a positive answer in the case where \(\alpha \) is Diophantine.

The linearization problem for commuting diffeomorphisms is related to the rigidity theory of a higher rank \(\mathbb {Z}^n\)-action where \(n\geqslant 2\) is the number of diffeomorphisms which generate the action. The case of circle maps has been thoroughly studied. In [26], the problem of linearizing commuting circle diffeomorphisms was raised by Moser in connection with the holonomy group of certain foliations with codimension 1. Using a perturbative KAM scheme, he proved that for commuting \(C^\infty \) circle diffeomorphisms \(\phi _1,\ldots , \phi _n\), if the rotation numbers satisfy a simultaneous Diophantine condition and \(\phi _1,\ldots , \phi _n\) are close to the rigid rotations, then they can be simultaneously \(C^\infty \)-conjugated to the rigid circle rotations. Later, the global version of Moser’s result was proved by Fayad and Khanin [13] by using the global theory of Herman [19] and Yoccoz [36]. In the higher dimensional case, the local rigidity for commuting diffeomorphisms (close to the torus translations) of \(\mathbb {T}^d\) was obtained in [28] for \(d=2\) and in [5, 27, 34] for \(d\geqslant 2\), by assuming an appropriate Diophantine condition on the rotation sets.

Historically, the dynamical motivation for investigating the rigidity of abelian group actions started with the study of structural stability of hyperbolic diffeomorphisms, see [22] for a brief introduction. Unlike the rigidity of elliptic group actions which mainly uses analytic methods, rigidity of the hyperbolic group actions uses more geometric techniques from the hyperbolic theory. For higher rank Anosov actions on compact manifolds, the rigidity problem has been widely studied (cf. [11, 15, 21, 23, 29], etc.). For local rigidity of certain higher rank partially hyperbolic abelian actions, see [5, 7, 8, 33] and the references therein. A complete local picture for affine actions by higher rank lattices in semisimple Lie groups was obtained in [16]. For background and overview of the local rigidity problem for general group actions, we refer to the survey [12].

The linearization problem in this paper is inspired by studying a corresponding local rigidity question for a class of parabolic \(\mathbb {Z}^2\)-action on the cylinder \(\mathbb {T}\times \mathbb {R}\). More precisely, consider a \(\mathbb {Z}^2\)-action \((G_1, G_2)\) generated by two linear twist maps \(G_1(x,y)= (x+y+\alpha _1, y)\) and \(G_2(x,y)=(x+y+\alpha _2,y)\). Let \(({\widetilde{G}}_1, {\widetilde{G}}_2)\) be a small perturbation of the action \((G_1, G_2)\). Then, analogous to [26] one may assume that the frequency maps satisfy a simultaneous Diophantine condition as follows,

$$\begin{aligned} \max _{j=1,2}\left| e^{i2\pi m(y+\alpha _j)}-1\right| \geqslant \frac{\sigma }{|m|^\tau } ,\quad \forall ~m\in \mathbb {Z}\setminus \{0\},\quad \forall ~ y\in \mathbb {R}. \end{aligned}$$

Note that this also implies that the number \(\alpha _2-\alpha _1\) is Diophantine (by taking \(y=-\alpha _1\) in the inequality above, where the term with \(j=1\) vanishes). Meanwhile, observe that the diffeomorphisms \({\widetilde{G}}_1\) and \({\widetilde{G}}_2\circ {\widetilde{G}}_1^{-1}\) are also generators of the action \(({\widetilde{G}}_1, {\widetilde{G}}_2)\), so the local rigidity problem of \(({\widetilde{G}}_1, {\widetilde{G}}_2)\) is equivalent to that of \(({\widetilde{G}}_1, {\widetilde{G}}_2\circ {\widetilde{G}}_1^{-1})\). Thus, one finds that \({\widetilde{G}}_1\) is of the form (1.1), and \({\widetilde{G}}_2\circ {\widetilde{G}}_1^{-1}\) is a small perturbation of the translation map \(T_{\alpha _2-\alpha _1}\), so it is of the form (1.2). Consequently, the problem reduces to the simultaneous linearization problem of commuting diffeomorphisms \(\mathcal {F}\) and \(\mathcal {K}\) given in (1.1)–(1.2).
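For the unperturbed generators the change of generators can be checked by hand or numerically. The sketch below assumes illustrative values of \(\alpha _1, \alpha _2\) and verifies that \(G_2\circ G_1^{-1}\) is the translation \(T_{\alpha _2-\alpha _1}\); the function names are hypothetical.

```python
import math

def G(alpha_j, x, y):
    """Linear twist map (x, y) -> (x + y + alpha_j mod 1, y)."""
    return ((x + y + alpha_j) % 1.0, y)

def G_inv(alpha_j, x, y):
    """Inverse of G(alpha_j, ., .)."""
    return ((x - y - alpha_j) % 1.0, y)

def circle_dist(a, b):
    """Distance between two points of the circle T = R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

a1 = 0.2                              # illustrative alpha_1
a2 = 0.2 + (math.sqrt(2) - 1)         # illustrative alpha_2
x, y = 0.3, 1.7
lhs = G(a2, *G_inv(a1, x, y))         # G_2 o G_1^{-1} applied to (x, y)
rhs = ((x + (a2 - a1)) % 1.0, y)      # T_{alpha_2 - alpha_1} applied to (x, y)
```

The twist term \(y\) cancels in the composition, which is why the second generator becomes a perturbation of a pure translation.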

1.1 Statement of results

Denote by \(Diff _0^\infty (\mathbb {T}\times \mathbb {R})\) the set of \(C^\infty \) diffeomorphisms of the infinite cylinder \(\mathbb {T}\times \mathbb {R}\) that are homotopic to the identity. The diffeomorphisms \(\mathcal {F}\) and \(\mathcal {K}\) defined in (1.1)–(1.2) belong to \(Diff _0^\infty (\mathbb {T}\times \mathbb {R})\).

A number \(\alpha \in \mathbb {R}\) is said to be Diophantine if there exist \(\tau >0\) and \(\sigma >0\) such that

$$\begin{aligned} \left| e^{i2\pi m\alpha }-1\right| \geqslant \frac{\sigma }{|m|^\tau }\qquad \forall ~m\in \mathbb {Z}\setminus \{0\}. \end{aligned}$$
(1.3)

In the sequel, we denote by \(\mathrm {DC}(\sigma ,\tau )\) the set of all numbers satisfying (1.3).
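Condition (1.3) can be tested on a finite range of \(m\) using the identity \(|e^{i2\pi m\alpha }-1|=2|\sin (\pi m\alpha )|\). The following sketch is only a finite check under floating-point arithmetic, so it can refute membership in \(\mathrm {DC}(\sigma ,\tau )\) but never prove it; the cutoff `m_max` and the sample values are assumptions made for illustration.

```python
import math

def small_divisor(m, alpha):
    """|exp(2*pi*i*m*alpha) - 1|, computed as 2*|sin(pi*m*alpha)|."""
    return 2.0 * abs(math.sin(math.pi * m * alpha))

def satisfies_dc_up_to(alpha, sigma, tau, m_max):
    """Check (1.3) for 1 <= |m| <= m_max only (m and -m give the same value)."""
    return all(small_divisor(m, alpha) >= sigma / m**tau
               for m in range(1, m_max + 1))

golden = (math.sqrt(5) - 1) / 2       # golden mean, a classical Diophantine number
```

For instance, `satisfies_dc_up_to(golden, 1.0, 1.0, 10000)` holds, while any rational \(\alpha =p/q\) fails at \(m=q\), where the small divisor vanishes.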

Definition 1.1

[30] A map \(F(x,y)\) of \(\mathbb {T}\times \mathbb {R}\) is said to satisfy the intersection property if each homotopically nontrivial circle \(C^1\)-close to \(\{y=const\}\) intersects its image under F.

Remark 1.1

It is known that any exact symplectic map of \(\mathbb {T}\times \mathbb {R}\) has the intersection property. In addition, an area-preserving map of \(\mathbb {T}\times \mathbb {R}\) having at least one homotopically nontrivial invariant circle also satisfies such a property. Here, we mention that the intersection property was also used to obtain the KAM-type result (e.g. codimension-one invariant tori) for certain non-symplectic maps of \(\mathbb {T}^d\times \mathbb {R}\), \(d\geqslant 1\) [4, 35, 37].

We need the notion of semi-conjugacy.

Definition 1.2

For the map \(\mathcal {K}\) defined in (1.2), we say \(\mathcal {K}\) is Lipschitz semi-conjugate to the rigid circle rotation \(R_\alpha \) \(: x\mapsto x+\alpha \) mod \(\mathbb {Z}\) if there exists a Lipschitz continuous surjective map \(W: \mathbb {T}\times \mathbb {R}\rightarrow \mathbb {T}\) such that \(W \circ \mathcal {K}=R_\alpha \circ W\). The Lipschitz semi-conjugacy W can be written as \(W(x,y)=x+v(x,y)\) mod \(\mathbb {Z}\) for some function \(v\in Lip (\mathbb {T}\times \mathbb {R},\mathbb {R})\).

For example, \(K(x,y)=(x+\alpha ,y+k_2(x,y))\) is always Lipschitz semi-conjugate to the rotation \(R_\alpha \) via the projection map \(\pi _1(x,y)=x\), that is \(\pi _1 \circ K=R_\alpha \circ \pi _1\).
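This example is easy to verify numerically. In the sketch below the perturbation `k2` and the value of \(\alpha \) are hypothetical choices; any function \(k_2\) that is 1-periodic in x would do.

```python
import math

ALPHA = math.sqrt(2) - 1              # illustrative rotation number

def k2(x, y):
    """Hypothetical perturbation of the y-component, 1-periodic in x."""
    return 0.05 * math.sin(2 * math.pi * x) * math.cos(y)

def K(x, y):
    """K(x, y) = (x + alpha mod 1, y + k2(x, y))."""
    return ((x + ALPHA) % 1.0, y + k2(x, y))

def R(x):
    """Rigid circle rotation x -> x + alpha mod 1."""
    return (x + ALPHA) % 1.0

def pi1(x, y):
    """Projection onto the circle factor; here W = pi1, i.e. v = 0."""
    return x
```

The relation \(\pi _1\circ K=R_\alpha \circ \pi _1\) holds here regardless of how \(k_2\) distorts the y-direction, which shows that the semi-conjugacy condition constrains only the first component of \(\mathcal {K}\).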

Throughout this paper, the frequency map \(\omega (y)\) in \(\mathcal {F}_0(x,y)=(x+\omega (y),y)\) is always assumed to be non-degenerate, in the sense that \(\omega (y): \mathbb {R}\rightarrow \mathbb {R}\) is a smooth diffeomorphism of \(\mathbb {R}\). This also implies that \(\mathcal {F}_0\) is smoothly conjugate to the standard twist map \((x,y)\mapsto (x+y, y)\), see Sect. 4. We are now ready to state the main result.

Theorem A

Let \(\mathcal {F}, \mathcal {K}\in Diff _0^\infty (\mathbb {T}\times \mathbb {R})\) be commuting diffeomorphisms defined as in (1.1) and (1.2). Suppose that \(\mathcal {F}\) satisfies the intersection property and \(\mathcal {K}\) is Lipschitz semi-conjugate to the rigid rotation \(R_\alpha \) \(: x\mapsto x+\alpha \) mod \(\mathbb {Z}\) with \(\alpha \in \mathrm {DC}(\sigma ,\tau )\).

Then, there exists \(\mu =\mu (\tau )>0\) such that: for any \(\delta >0\) and bounded open interval \(\mathcal {I}\subset \mathbb {R}\) we denote by \(\mathcal {I}_\delta =\{y\in \mathbb {R}:~dist (y, \mathcal I)<\delta \}\) the \(\delta \)-neighborhood of \(\mathcal I\), if

$$\begin{aligned}\left\| \mathcal {F}-\mathcal {F}_0\right\| _{C^\mu (\mathbb {T}\times \mathcal {I}_\delta )}<\varepsilon _0,\qquad \left\| \mathcal {K}-T_\alpha \right\| _{C^\mu (\mathbb {T}\times \mathcal {I}_\delta )}<\varepsilon _0\end{aligned}$$

for a sufficiently small \(\varepsilon _0=\varepsilon _0(\tau , \mathcal {I}, \delta )>0\), then \(\mathcal {F}\) and \(\mathcal {K}\) can be simultaneously \(C^\infty \)-conjugated to \(\mathcal {F}_0\) and \(T_\alpha \) on \(\mathbb {T}\times \mathcal {I}\), in the sense that there exists a \(C^\infty \) diffeomorphism H from \(\mathbb {T}\times \mathcal {I}\) onto its image such that

$$\begin{aligned} H^{-1}\circ \mathcal {F}\circ H=\mathcal {F}_0,\qquad H^{-1}\circ \mathcal {K}\circ H=T_\alpha . \end{aligned}$$

Remark 1.2

It is worth noting that we do not presuppose the intersection property for \(\mathcal {K}\). We also do not presuppose the existence of any invariant circle for \(\mathcal {K}\).

Remark 1.3

Even if \(\mathcal {F}\) and \(\mathcal {K}\) are assumed to be both symplectic, the conjugacy H is in general non-symplectic. For example, let \(\mathcal {F}_0(x,y)=(x+y,y)\) and we consider the perturbations \(\mathcal {F}(x,y)=(x+y+\varepsilon y, y)\), with \(\varepsilon \ll 1\), and \(\mathcal {K}=T_\alpha \). Then, we can define

$$\begin{aligned}H: (x,y)\mapsto ( x, \frac{y}{1+\varepsilon } ).\end{aligned}$$

It is easy to check that \(H^{-1}\circ \mathcal {F}\circ H(x,y)=(x+y,y)=\mathcal {F}_0\) and \(H^{-1}\circ \mathcal {K}\circ H(x,y)=T_\alpha \). Obviously, H is non-symplectic if \(\varepsilon \ne 0\).
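The computation in this remark can be reproduced as follows. The sketch works on the lift \(\mathbb {R}\times \mathbb {R}\) (no reduction mod 1) and takes an illustrative value of \(\varepsilon \).

```python
EPS = 0.01                            # illustrative small parameter

def F(x, y):
    """Perturbed map (x, y) -> (x + y + eps*y, y), taken on the lift."""
    return (x + y + EPS * y, y)

def F0(x, y):
    """Standard twist map (x, y) -> (x + y, y)."""
    return (x + y, y)

def H(x, y):
    """Conjugacy H(x, y) = (x, y / (1 + eps))."""
    return (x, y / (1.0 + EPS))

def H_inv(x, y):
    """Inverse conjugacy H^{-1}(x, y) = (x, (1 + eps) * y)."""
    return (x, (1.0 + EPS) * y)

def conjugated(x, y):
    """H^{-1} o F o H applied to (x, y)."""
    return H_inv(*F(*H(x, y)))

# H is linear with Jacobian diag(1, 1/(1+eps)); its determinant differs from 1,
# so H does not preserve area when eps != 0.
JACOBIAN_DET_H = 1.0 / (1.0 + EPS)
```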

Remark 1.4

As we will see in Sect. 7, for the value \(\mu =\mu (\tau )\) it is enough to take any number greater than or equal to \(15(\varrho +1)=15([\tau ]+3)\).

We point out that without the semi-conjugacy condition our simultaneous linearization result is not true in general. The following existence result shows that using only the intersection property of \(\mathcal {F}\) and the commutativity condition can not guarantee that both maps are linearizable.

Proposition B

Let \(\alpha \in \mathbb {R}\), and \(\mathcal {I}\) be a bounded open interval and \(\delta >0\). For any \(r\in \mathbb {N}\) and any small \(\varepsilon >0\), we can always find two commuting diffeomorphisms \(\mathcal {F}, \mathcal {K}\in Diff _0^\infty (\mathbb {T}\times \mathbb {R})\) where \(\mathcal {F}\) satisfies the intersection property and

$$\begin{aligned}\left\| \mathcal {F}-\mathcal {F}_0\right\| _{C^r(\mathbb {T}\times \mathcal {I}_\delta )}<\varepsilon ,\qquad \left\| \mathcal {K}-T_\alpha \right\| _{C^r(\mathbb {T}\times \mathcal {I}_\delta )}<\varepsilon ,\end{aligned}$$

but at least one of the maps \(\mathcal {F}\) and \(\mathcal {K}\) is non-integrable in \(\mathbb {T}\times \mathcal {I}\).

In Proposition B there is no restriction on the number \(\alpha \) (not necessarily irrational or Diophantine).

As a direct corollary of Theorem A and Proposition B, we obtain a result concerning perturbations of abelian actions generated by integrable twist maps.

We say that two maps \({\widetilde{G}}_1, {\widetilde{G}}_2\) of the cylinder are \(\alpha \)-compatible if \({\widetilde{G}}_2\circ {\widetilde{G}}_1^{-1}\) is Lipschitz semi-conjugate to the rigid circle rotation \(x\mapsto x+\alpha \).

Corollary C

Consider two linear twist maps \(G_1(x,y)= (x+y+\alpha _1, y)\) and \(G_2(x,y)=(x+y+\alpha _2,y)\) with \(\alpha := \alpha _2-\alpha _1\) a Diophantine number. Let \({\widetilde{G}}_1, {\widetilde{G}}_2\) be commuting diffeomorphisms of the cylinder which are \(\alpha \)-compatible, and such that some element of the action \(({\widetilde{G}}_1, {\widetilde{G}}_2)\) has the intersection property. If \({\widetilde{G}}_1, {\widetilde{G}}_2\) are sufficiently close to \(G_1, G_2\), respectively, then \({\widetilde{G}}_1, {\widetilde{G}}_2\) are simultaneously smoothly conjugate to \(G_1, G_2\), respectively. Consequently, all the elements of the action \((\widetilde{G}_1, {\widetilde{G}}_2)\) are integrable.

Without the \(\alpha \)-compatibility condition, there exist perturbations \({\widetilde{G}}_1, {\widetilde{G}}_2\) such that the action \(({\widetilde{G}}_1, {\widetilde{G}}_2)\) contains a non-integrable element.

As a by-product of Theorem A, one can study the local perturbation problem of certain commuting non-ergodic maps of \(\mathbb {T}^2\). Recall that an automorphism of \(\mathbb {T}^2\) is determined by a matrix in \(GL (2,\mathbb {Z})\) with determinant \(\pm 1\). Given a matrix \(A= \left( \begin{matrix} 1 &{} n\\ 0 &{} 1 \end{matrix} \right) \) with \(n\in \mathbb {Z}\setminus \{0\},\) it determines a toral automorphism which we also denote by A. In other words,

$$\begin{aligned}A: \mathbb {T}^2\rightarrow \mathbb {T}^2, \qquad (x, y)\mapsto (x+ny~mod ~ 1, y). \end{aligned}$$

We also consider a translation of \(\mathbb {T}^2\) given by \((x,y)\mapsto (x+\alpha , y)\), and for simplicity we still denote by the same symbol \(T_\alpha \). For such toral maps, it is easy to see that \(T_\alpha \) is homotopic to the identity while A is not. Moreover, the commutation relation \(A\circ T_\alpha =T_\alpha \circ A\) holds.

We are interested in the local perturbation problem, so we consider two perturbations \(F_A\in Diff ^\infty (\mathbb {T}^2)\) and \(G\in Diff ^\infty (\mathbb {T}^2)\) which are homotopic to A and id respectively. This means that \(F_A\) and G can be defined by \(F_A=A+ f\) and \(G=T_\alpha + g\), and \(f(x,y), g(x,y)\in C^\infty (\mathbb {T}^2,\mathbb {R}^2)\) are both \(\mathbb {Z}\)-periodic in x and y. Then, under certain assumptions we have the following local result.

Corollary D

For \(\alpha \in DC (\sigma ,\tau )\), we can find \(\mu =\mu (\tau )>0\) and a small number \(\varepsilon _0=\varepsilon _0(\tau )>0\) such that: for any pair of commuting maps \(F_A\in Diff ^\infty (\mathbb {T}^2) \) and \(G\in Diff ^\infty (\mathbb {T}^2)\), if the \(C^\mu \)-distance satisfies

$$\begin{aligned}dist _{C^\mu }(F_A, A)<\varepsilon _0,\qquad dist _{C^\mu }(G, T_\alpha )<\varepsilon _0, \end{aligned}$$

and if \(F_A\) satisfies the intersection property and \(G(x,y)\) is Lipschitz semi-conjugate to the rigid circle rotation \(x\mapsto x+\alpha \), then there exists a near-identity \(C^\infty \) diffeomorphism \(H:\mathbb {T}^2\rightarrow \mathbb {T}^2\) such that \(H^{-1}\circ F_A\circ H=A\) and \(H^{-1}\circ G\circ H=T_\alpha \).

The proof of Corollary D is a direct application of Theorem A. Note that \(F_A\) (resp. G) is necessarily homotopic to A (resp. id) because the toral diffeomorphism \(F_A\) (resp. G) is sufficiently close to A (resp. \(T_\alpha \)). Thus, the diffeomorphism \(F_A\) of \(\mathbb {T}^2\) admits a natural lift to the infinite cylinder \(\mathbb {T}\times \mathbb {R}\), denoted by \(\widetilde{F_A}\), which can be defined by \(\widetilde{F_A}(x,y)=(x+ny+ f_1(x,y)~mod ~ 1 ,y +f_2(x,y) )\) where \(f_1, f_2\in C^\infty (\mathbb {T}\times \mathbb {R}, \mathbb {R})\) are periodic in both x and y with period 1. Meanwhile, the diffeomorphism G also admits a natural lift to \(\mathbb {T}\times \mathbb {R}\), denoted by \(\widetilde{G}\), which can be defined by \(\widetilde{G}(x,y)=(x+\alpha + g_1(x,y)~mod ~ 1 ,y +g_2(x,y) )\), where \(g_1(x,y)\), \( g_2(x,y)\) are periodic in x and y with period 1. Therefore, by applying Theorem A with \(\mathcal {F}=\widetilde{F_A}\), \(\mathcal {K}={\widetilde{G}}\) and \(\mathcal {I}=[-1, 1]\) and \(\delta =\frac{1}{2}\), we are able to obtain Corollary D.

We also remark that for perturbations of affine abelian actions on the torus \(\mathbb {T}^n\), with parabolic linear parts, there is a more general result on classifying perturbations [6].

1.2 Remarks on our assumptions and method

The assumptions in Theorem A are essentially needed for the simultaneous linearization result. We have made three assumptions: (1) the commutativity condition; (2) the intersection property of \(\mathcal {F}\); (3) the Lipschitz semi-conjugacy condition for \(\mathcal {K}\).

  • The commutativity condition is important for the simultaneous linearization problem. For example, consider \(\mathcal {F}(x,y)=\mathcal {F}_0(x,y)=(x+y,y)\) and \(\mathcal {K}(x,y)=(x+\alpha , y+\varepsilon (x,y) )\) with \(\varepsilon (x,y)\ne 0\). Observe that \(\mathcal {F}\) has the intersection property, and \(\mathcal {K}\) is obviously Lipschitz semi-conjugate to the circle rotation \(x\mapsto x+\alpha \) mod \(\mathbb {Z}\), but \(\mathcal {F}\circ \mathcal {K}\ne \mathcal {K}\circ \mathcal {F}\). For this model, it is well known that for a generic small perturbation \(\varepsilon (x,y)\), \(\mathcal {K}\) can not be conjugated to \(T_\alpha \).

  • The intersection property of \(\mathcal {F}\) is also crucial, otherwise \(\mathcal {F}\) may not be integrable. For example, consider two smooth diffeomorphisms of the cylinder, \(\mathcal {F}(x,y)=(x+y, y+\varepsilon (y))\) with \(\varepsilon (y)>0\) and \(\mathcal {K}=T_\alpha \). Obviously, \(\mathcal {F}\) commutes with \(\mathcal {K}\). But \(\mathcal {F}\) does not satisfy the intersection property. Then, we find that \(\mathcal {F}\) is non-integrable since there are no invariant circles.

  • The semi-conjugacy condition is also needed (see Proposition B). In fact, the Lipschitz semi-conjugacy condition of \(\mathcal {K}\) is only used to control the average part of the perturbation for \(\mathcal {K}\) during the KAM process. Besides, we also point out that it is possible to replace this Lipschitz semi-conjugacy condition by a Hölder semi-conjugacy whose Hölder exponent is close to 1. See Remark 6.2 for an explanation.
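The second example in the list above, where the intersection property fails, can be illustrated with a short experiment. The sketch assumes the simplest choice of a constant drift \(\varepsilon (y)\equiv 0.01\) and an illustrative \(\alpha \); both are assumptions, not part of the statement.

```python
import math

ALPHA = math.sqrt(2) - 1              # illustrative rotation number

def F(x, y, drift=0.01):
    """(x, y) -> (x + y mod 1, y + drift), with constant eps(y) = drift > 0."""
    return ((x + y) % 1.0, y + drift)

def T(x, y):
    """Translation T_alpha."""
    return ((x + ALPHA) % 1.0, y)

def heights(x, y, n):
    """The y-coordinates along the first n+1 points of the F-orbit of (x, y)."""
    out = [y]
    for _ in range(n):
        x, y = F(x, y)
        out.append(y)
    return out

def circle_dist(a, b):
    """Distance between two points of the circle T = R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

hs = heights(0.2, 0.0, 100)           # strictly increasing: every orbit drifts upward
```

Each circle \(\{y=c\}\) is mapped to the disjoint circle \(\{y=c+0.01\}\), so the intersection property fails, and since every orbit escapes to infinity no invariant circle can exist, although \(\mathcal {F}\) still commutes with \(T_\alpha \).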

Let us compare our conditions with those used in [14]. For a single diffeomorphism \(\mathcal {K}\) of the form (1.2), if the following three conditions (see [14]) are satisfied:

  (i) the intersection property;

  (ii) it possesses a smooth invariant circle \(\Gamma \) with Diophantine rotation number \(\alpha \);

  (iii) it has no periodic points,

then \(\mathcal {K}\) can be \(C^\infty \)-conjugated to \(T_\alpha \) in a small neighborhood U of \(\Gamma \).

If this happens, and if one continues to assume the commutation relation \(\mathcal {F}\circ \mathcal {K}=\mathcal {K}\circ \mathcal {F}\), then the other map \(\mathcal {F}\) would also be integrable in the small neighborhood U. We briefly explain why. From the preceding analysis, one obtains a diffeomorphism \(H_1\) from U onto its image such that \(H_1^{-1}\circ \mathcal {K}\circ H_1=T_\alpha \). Next, we study the conjugated map \(\widetilde{\mathcal {F}}:=H_1^{-1}\circ \mathcal {F}\circ H_1\). It commutes with \(T_\alpha \), and if we write \(\widetilde{\mathcal {F}}(x,y)=(x+\omega (y)+f_1(x,y), y+f_2(x,y))\), the commutation relation yields \(f_1(x+\alpha ,y)=f_1(x,y)\) and \( f_2(x+\alpha ,y)=f_2(x,y).\) As a consequence, \(f_1\) and \(f_2\) must be independent of the variable x, since \(\alpha \) is Diophantine and hence irrational. In other words, they are of the form \(f_1(x,y)=f_1(y)\) and \(f_2(x,y)=f_2(y)\). On the other hand, \(\mathcal {F}\) also satisfies the intersection property, so \(f_2(y)\) must be 0. Thus we obtain \(\widetilde{\mathcal {F}}=(x+{\widetilde{\omega }}(y), y)\), where \({\widetilde{\omega }}(y)=\omega (y)+f_1(y)\) would be invertible. Finally, using the transformation \(H_2(x,y):=(x, \widetilde{\omega }^{-1}\circ \omega (y))\) it is not difficult to check that \(H^{-1}_2\circ \widetilde{\mathcal {F}}\circ H_2(x,y)=(x+\omega (y), y)=\mathcal {F}_0(x,y)\) and \(H_2^{-1}\circ T_\alpha \circ H_2=T_\alpha \). In conclusion, by setting \(H=H_1\circ H_2\), we can conjugate \(\mathcal {F}\) and \(\mathcal {K}\) to \(\mathcal {F}_0\) and \(T_\alpha \).

However, the above analysis can not be applied to our model since we do not assume the intersection property for the map \(\mathcal {K}\) nor the existence of any \(\mathcal {K}\)-invariant circle with rotation number \(\alpha \). Instead, for our purpose we impose a semi-conjugacy condition on \(\mathcal {K}\).

Now, we outline the method for proving Theorem A. First, as the frequency map \(\omega (y)\) is non-degenerate, under a suitable coordinate transformation Theorem A can be reduced to Theorem 4.1 which studies commuting maps \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\) with \(U_0(x,y)=(x+y,y)\). Next, the technique used to prove Theorem 4.1 is based on a KAM iterative scheme for the group action \((\mathbf {F}, \mathbf {K})\). We linearize the nonlinear problem and solve the corresponding linearized equation to obtain a better approximation. By iterating this process, the limit of successive iterations produces a solution to the nonlinear problem. The commutativity is enough to provide a common (approximate) solution to the linearized conjugacy equations of \((\mathbf {F},\mathbf {K})\). At each iteration step, in order to show that the new error is smaller than the initial one, in principle the hard part is the elimination of the average (over \(x\in \mathbb {T}\)) of the perturbations, i.e. \([\mathbf {f}]=([\mathbf {f}_1],[\mathbf {f}_2])\) and \([\mathbf {k}]=([\mathbf {k}_1],[\mathbf {k}_2])\). For this purpose, the intersection property of \(\mathbf {F}\) enters and causes the term \([\mathbf {f}_2]\) to be of higher order, and the semi-conjugacy condition of \(\mathbf {K}\) causes the term \([\mathbf {k}_1]\) to be of higher order. Besides, using the commutativity condition we can show that \([\mathbf {k}_2]\) is quadratic. As for \([\mathbf {f}_1]\), this term can be, to some extent, eliminated by choosing suitably an approximate solution to the cohomological equation. See Sects. 5 and 6 for more discussions.
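To indicate where the Diophantine condition and the averages enter, consider the model cohomological equation over a rotation, \(h(x+\alpha )-h(x)=g(x)-[g]\), which is solved mode by mode in Fourier space by dividing by \(e^{i2\pi m\alpha }-1\). The sketch below is only this standard model problem for a trigonometric polynomial g; the linearized equations actually treated in Sects. 5 and 6 involve both maps and may differ, and all names and sample coefficients are assumptions.

```python
import cmath
import math

def solve_cohomological(g_hat, alpha):
    """Fourier coefficients of h solving h(x + alpha) - h(x) = g(x) - g_hat[0].

    g_hat maps each integer frequency m to the coefficient of exp(2*pi*i*m*x).
    The average (m = 0) cannot be removed this way; the zero mode of h is set to 0.
    """
    h_hat = {}
    for m, c in g_hat.items():
        if m == 0:
            continue
        # Small divisor exp(2*pi*i*m*alpha) - 1, bounded below by (1.3).
        h_hat[m] = c / (cmath.exp(2j * math.pi * m * alpha) - 1.0)
    return h_hat

def evaluate(coeffs, x):
    """Value at x of the trigonometric polynomial with the given coefficients."""
    return sum(c * cmath.exp(2j * math.pi * m * x) for m, c in coeffs.items())

alpha = (math.sqrt(5) - 1) / 2
g_hat = {0: 0.3, 1: 0.5, -1: 0.5, 4: -0.1j, -4: 0.1j}   # g = 0.3 + cos(2 pi x) + 0.2 sin(8 pi x)
h_hat = solve_cohomological(g_hat, alpha)
```

The mode \(m=0\) is exactly the average \([g]\), which the equation leaves untouched; this is why the averaged terms require the separate arguments described above.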

1.3 Structure of this paper

The paper is organized as follows. Section 2 is devoted to the proof of Proposition B; the construction is based on the generalized standard family. Section 3 reviews some basic concepts used in this paper. In Sect. 4, by using a suitable coordinate transformation we show that the simultaneous linearization problem of \(\mathcal {F}=\mathcal {F}_0+f\) and \(\mathcal {K}=T_\alpha +k\) is equivalent to that of \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\), where \(U_0(x,y)=(x+y,y)\). Theorem A thus reduces to Theorem 4.1. In Sects. 5 and 6, we study the commutativity property, and prove the inductive lemma which is the main ingredient of the iteration process. In Sect. 7, by applying inductively Proposition 6.1 we use the KAM scheme to prove Theorem 4.1.

2 An example of non-integrable commuting diffeomorphisms

In this section we prove Proposition B. For this purpose, we first introduce the generalized standard family. It is a generalization of the Chirikov–Taylor standard family, and is one of the most widely studied families of monotone twist maps. Consider symplectic diffeomorphisms of the cylinder \(\mathbb {T}\times \mathbb {R}\) which are defined by

$$\begin{aligned}S_\varepsilon (x,y)=(x+y+\varepsilon V'(x) ,y +\varepsilon V'(x))\end{aligned}$$

where \(V(x)\in C^\infty (\mathbb {T},\mathbb {R})\) is 1-periodic in x.

\(S_\varepsilon \) is a small perturbation of the integrable map \((x,y)\mapsto (x+y,y)\). It is an elementary fact in symplectic geometry that such a map \(S_\varepsilon \) can be induced by a generating function. More precisely, it is implicitly defined by the following generating function

$$\begin{aligned} G(x,X)=\frac{1}{2}(X-x)^2+\varepsilon V(x) \end{aligned}$$

through the equations:

$$\begin{aligned}y=-\frac{\partial G(x, X)}{\partial x},\qquad Y=\frac{\partial G(x, X)}{\partial X}.\end{aligned}$$

Thus \(S_\varepsilon \) is exact symplectic, which implies zero flux, that is

$$\begin{aligned}\oint _{S_\varepsilon (\gamma )} ydx=\oint _{\gamma } ydx \end{aligned}$$

for every non-contractible loop \(\gamma \) on the cylinder. As a consequence, \(S_\varepsilon \) satisfies the intersection property.
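The relation between \(S_\varepsilon \) and the generating function G can be confirmed by finite differences. The sketch assumes the illustrative potential \(V(x)=-\cos (2\pi qx)/(2\pi q)\) with \(q=3\) and \(\varepsilon =0.1\), and works on the lift \(\mathbb {R}\times \mathbb {R}\).

```python
import math

EPS, Q = 0.1, 3                       # illustrative parameters

def V(x):
    """Assumed potential V(x) = -cos(2 pi Q x) / (2 pi Q)."""
    return -math.cos(2 * math.pi * Q * x) / (2 * math.pi * Q)

def dV(x):
    """Derivative V'(x) = sin(2 pi Q x)."""
    return math.sin(2 * math.pi * Q * x)

def S(x, y):
    """Generalized standard map on the lift: (X, Y) = (x + y + eps V'(x), y + eps V'(x))."""
    return (x + y + EPS * dV(x), y + EPS * dV(x))

def G(x, X):
    """Generating function G(x, X) = (X - x)^2 / 2 + eps V(x)."""
    return 0.5 * (X - x) ** 2 + EPS * V(x)

def partials(x, X, h=1e-6):
    """Central-difference approximations of dG/dx and dG/dX at (x, X)."""
    dGdx = (G(x + h, X) - G(x - h, X)) / (2 * h)
    dGdX = (G(x, X + h) - G(x, X - h)) / (2 * h)
    return dGdx, dGdX
```

With \((X,Y)=S_\varepsilon (x,y)\), the two returned values agree with \(-y\) and \(Y\) up to discretization error, as the defining equations require.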

We also point out that if \(V'(x)\) is \(\frac{1}{q}\)–periodic with \(q\in \mathbb {N}\), then \(S_\varepsilon \) commutes with the linear map \(T_{p/q}(x,y)=(x+p/q, y)\) for any \(p\in \mathbb {Z}\). Indeed,

$$\begin{aligned} \begin{aligned} S_\varepsilon \circ T_{p/q}(x,y)=&\,\left( x+\frac{p}{q}+y+\varepsilon V'\left( x+\frac{p}{q}\right) , y+\varepsilon V'\left( x+\frac{p}{q}\right) \right) \\ =&\,\left( x+\frac{p}{q}+y+\varepsilon V'(x), y+\varepsilon V'(x) \right) \\ =&\, T_{p/q}\circ S_\varepsilon (x,y). \end{aligned}\nonumber \\ \end{aligned}$$
(2.1)
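Identity (2.1) can also be tested numerically. The values \(q=3\), \(p=1\), \(r=2\) and \(\varepsilon =0.05\) below are illustrative assumptions, with \(V'(x)=\sin (2\pi qx)/(2\pi q)^r\) as in the construction that follows.

```python
import math

Q, P, R_ORDER, EPS = 3, 1, 2, 0.05    # illustrative parameters

def dV(x):
    """V'(x) = sin(2 pi Q x) / (2 pi Q)^r, which is (1/Q)-periodic."""
    return math.sin(2 * math.pi * Q * x) / (2 * math.pi * Q) ** R_ORDER

def S(x, y):
    """Generalized standard map S_eps on the cylinder."""
    k = EPS * dV(x)
    return ((x + y + k) % 1.0, y + k)

def T(x, y):
    """Rational translation T_{p/q}."""
    return ((x + P / Q) % 1.0, y)

def circle_dist(a, b):
    """Distance between two points of the circle T = R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)
```

Comparing `S(*T(x, y))` with `T(*S(x, y))` at sample points gives agreement up to rounding error.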

Now, we turn to the proof of Proposition B. The construction will be based on the generalized standard maps described above.

Proof of Proposition B

Let \(\alpha \in \mathbb {R}\). For any \(\varepsilon >0\) and any \(r\in \mathbb {N}\), we can choose a rational number \(\frac{p}{q}\) such that

$$\begin{aligned} 0<\left| \alpha -\frac{p}{q}\right| <\varepsilon , \end{aligned}$$
(2.2)

and choose \(V(x)=\frac{-1}{(2\pi q)^{r+1}}\cos 2\pi qx\). Then we define a pair of smooth diffeomorphisms \(S_\varepsilon \) and \(\mathcal {K}\) by

$$\begin{aligned} S_\varepsilon (x,y)=\left( x+y+\frac{\varepsilon }{(2\pi q)^r}\sin 2\pi q x, ~y+\frac{\varepsilon }{(2\pi q)^r}\sin 2\pi q x \right) \end{aligned}$$
(2.3)

and

$$\begin{aligned} \mathcal {K}(x,y)=\left( x+\frac{p}{q},y\right) . \end{aligned}$$
(2.4)

Since \(S_\varepsilon \) is exact symplectic, \(S_\varepsilon \) satisfies the intersection property. By (2.1) we see that \(S_\varepsilon \) commutes with \(\mathcal {K}\). Moreover, due to (2.2)–(2.3) the perturbations are small in the \(C^r\) topology,

$$\begin{aligned}\Vert S_\varepsilon -S_0\Vert _{C^r}\leqslant \varepsilon , \quad \Vert \mathcal {K}-T_\alpha \Vert _{C^r}\leqslant \varepsilon .\end{aligned}$$
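The role of the normalization \((2\pi q)^{-r}\) in (2.3) is that each derivative of \(\sin 2\pi qx\) produces a factor \(2\pi q\), so the j-th derivative of the perturbation has sup-norm \(\varepsilon (2\pi q)^{j-r}\). The sketch below evaluates this closed-form expression (it does not differentiate numerically); the sample values are assumptions.

```python
import math

def sup_norm_of_derivative(j, eps, q, r):
    """Sup-norm of the j-th x-derivative of eps * sin(2 pi q x) / (2 pi q)^r."""
    return eps * (2 * math.pi * q) ** (j - r)

def c_r_norm(eps, q, r):
    """C^r norm of the perturbation term: maximum over derivative orders 0..r."""
    return max(sup_norm_of_derivative(j, eps, q, r) for j in range(r + 1))
```

For example, `c_r_norm(0.01, 3, 5)` equals `0.01`, attained at the top order \(j=r\); all lower orders are smaller because \(2\pi q>1\).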

However, a basic fact is that there always exists an arbitrarily small \(\varepsilon >0\) such that the generalized standard map \(S_\varepsilon \) is chaotic and non-integrable (see an illustration in Fig. 1).

To finish our proof, we recall that the frequency map \(\omega (y)\) in \(\mathcal {F}_0\) is a smooth diffeomorphism, and its inverse map is denoted by \(\omega ^{-1}(y)\). Then, under the coordinate transformation Q which is defined by \(Q(x,y)=(x, \omega ^{-1}(y))\) and \(Q^{-1}(x,y)=(x, \omega (y))\), the map \(S_\varepsilon \) can be transformed into

$$\begin{aligned} \mathcal {F}=Q\circ S_\varepsilon \circ Q^{-1}=\mathcal {F}_0+f: \mathbb {T}\times \mathbb {R}\rightarrow \mathbb {T}\times \mathbb {R}\end{aligned}$$

Here, \(\mathcal {F}_0(x,y)=(x+\omega (y), y)\) and \(f=(f_1, f_2)\) is given by

$$\begin{aligned}f_1(x,y)= \frac{\varepsilon }{(2\pi q)^r}\sin 2\pi qx, \quad f_2(x,y)=\omega ^{-1}\left( \omega (y)+\frac{\varepsilon }{(2\pi q)^r}\sin 2\pi qx\right) -y.\end{aligned}$$

Clearly, on the bounded region \(\mathbb {T}\times \mathcal {I}_\delta \), f can be arbitrarily small in the \(C^r\) topology provided that \(\varepsilon \) is small enough. Moreover, by (2.4) we have \(\mathcal {K}=Q\circ \mathcal {K}\circ Q^{-1}\).

Therefore, \(\mathcal {F}\) commutes with \(\mathcal {K}\), and \(\mathcal {F}\) also satisfies the intersection property. In view of the non-integrability of \(S_\varepsilon \), the desired result follows immediately. \(\square \)

Fig. 1 An example for \(q=3\)

3 Preliminaries

In this section we review some basic terminology.

A Fréchet space X is defined to be a complete metrizable locally convex topological vector space. Its topology may be induced by a family of seminorms \(\{\Vert \cdot \Vert _r\}_r\). A Fréchet space X is graded if the topology is defined by a family of semi-norms \(\{\Vert \cdot \Vert _r\}_r\) satisfying \(\Vert x\Vert _s\leqslant \Vert x\Vert _t\) for every \(x\in X\) and \(s\leqslant t\). For example, the space \(C^\infty (\mathbb {T}^d,\mathbb {R})\) with the topology given by the \(C^r\) semi-norms \(|g|_r=\max _{|j|=r}\sup _{z\in \mathbb {T}^d}|\partial ^j g(z)|\), \(r\in \mathbb {N}\) is a Fréchet space. By summing up the first i semi-norms for every \(i\in \mathbb {N}\), it turns \(C^\infty (\mathbb {T}^d,\mathbb {R})\) into a graded Fréchet space.

The method of this paper uses some approximation properties and quantitative estimates, e.g. smoothing operators, interpolation inequalities and the regularity of the composition operator. In particular, we need to control the norm of a function in the scale of Hölder spaces.

We now define the Hölder norms. For our purpose, it is sufficient to consider a convex set \(U=\mathbb {T}\times E\) or \(\mathbb {T}\) or E, with \(E\subset \mathbb {R}\) an open interval, and then study the Hölder regularities of functions defined on U.

For \(\lambda \in (0,1)\), we denote by \(C^\lambda (U,\mathbb {R})\) the space of bounded \(\lambda \)-Hölder functions \(g:U\rightarrow \mathbb {R}\) with the following norm

$$\begin{aligned} \Vert g\Vert _\lambda :=\max \left\{ \Vert g\Vert _{C^0}, \quad \sup _{0<|z-z'|\leqslant 1}\frac{|g(z)-g(z')|}{|z-z'|^\lambda }\right\} .\end{aligned}$$
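As an illustration only (it plays no role in the proofs), the norm above can be approximated on a finite sample of points. The following Python sketch assumes a one-dimensional domain and the sample function \(g(z)=\sqrt{z}\) on [0, 1]; the helper name `holder_norm` and the grid are hypothetical choices made for this example.

```python
import math

def holder_norm(g, lam, points):
    """Discrete stand-in for the lambda-Hoelder norm on a finite sample.

    Returns the max of the sup norm and the largest Hoelder quotient over
    pairs with 0 < |z - z'| <= 1, mirroring the definition in the text.
    """
    sup_norm = max(abs(g(z)) for z in points)
    quotient = 0.0
    for i, z in enumerate(points):
        for w in points[i + 1:]:
            d = abs(z - w)
            if 0.0 < d <= 1.0:
                quotient = max(quotient, abs(g(z) - g(w)) / d ** lam)
    return max(sup_norm, quotient)

# Sample case (an assumption for illustration): g(z) = sqrt(z) on [0, 1].
grid = [i / 200 for i in range(201)]
norm_half = holder_norm(math.sqrt, 0.5, grid)  # exponent 1/2: quotient bounded by 1
norm_09 = holder_norm(math.sqrt, 0.9, grid)    # exponent 0.9: quotient large near z = 0
```

On this grid the value for \(\lambda =1/2\) stays at 1, while for \(\lambda =0.9\) the quotient near \(z=0\) is already larger than 8 and keeps growing as the grid is refined, in line with the fact that \(\sqrt{z}\) is \(\frac{1}{2}\)-Hölder but not 0.9-Hölder.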

For integer \(p\in \mathbb {N}\), \(C^p(U,\mathbb {R})\) denotes the space of functions with continuous derivatives up to p with the following norm

$$\begin{aligned}\Vert g\Vert _p:=\max _{0\leqslant t\leqslant p} \max _{ |j|=t}\,\Vert \partial ^j g\Vert _{C^0}.\end{aligned}$$

For \(\ell =p+\lambda \) with \(p\in \mathbb {N}\) and \(\lambda \in (0,1)\), we denote by \(C^\ell (U,\mathbb {R})\) the space of functions \(g:U\rightarrow \mathbb {R}\) with continuous derivatives up to order p and Hölder continuous partial derivatives \(\partial ^j g\) for all multi-indices j satisfying \(|j|=p\). We define its norm by

$$\begin{aligned}\Vert g\Vert _\ell :=\max \left\{ \Vert g\Vert _p,~\max _{ |j|=p}\Vert \partial ^jg\Vert _\lambda \right\} . \end{aligned}$$

Here, following [31], we have used the restriction \(0<|z-z'|\leqslant 1\) for the Hölder part of the norm. In this context, an immediate observation is that for any \(f\in C^r(U,\mathbb {R})\), we have

$$\begin{aligned}\Vert f\Vert _r\geqslant \Vert f\Vert _s,\qquad \text { for all~} r\geqslant s\geqslant 0.\end{aligned}$$

Indeed, this can be readily verified using the mean value theorem, since the domain U is convex.

In consequence, we find that the space \(C^\infty (U,\mathbb {R})\) of smooth functions with the family of Hölder norms \(\{\Vert \cdot \Vert _r\}_{r\geqslant 0}\) is a graded Fréchet space.

Throughout this paper, the \(C^r\) norm of a vector-valued function \(G\in C^\infty (U, \mathbb {R}^l)\) is defined by

$$\begin{aligned}\Vert G\Vert _r:=\max _{1\leqslant i \leqslant l}\Vert g_i\Vert _r,\end{aligned}$$

where \(g_i\in C^\infty (U, \mathbb {R})\) is the i-th coordinate function of \(G=(g_1,\ldots ,g_l)\).

4 Initial reduction

In this section we will show that the proof of Theorem A can be reduced to that of Theorem 4.1. The basic idea is simple: since \(\omega (y)\) is non-degenerate, the map \(\mathcal {F}\) can be transformed into a simplified form which is just a perturbation of the standard integrable map \(U_0(x,y)=(x+y,y)\).

Recall that the frequency map \(\omega (y): \mathbb {R}\longrightarrow \mathbb {R}\) is a smooth diffeomorphism, with its inverse denoted by \(\omega ^{-1}(y)\). Define a smooth diffeomorphism Q by

$$\begin{aligned} Q: ~\mathbb {T}\times \mathbb {R}\longrightarrow \mathbb {T}\times \mathbb {R},\quad (x,y) \longmapsto (x, \omega ^{-1}(y)), \end{aligned}$$

and its inverse is

$$\begin{aligned} Q^{-1}: ~\mathbb {T}\times \mathbb {R}\longrightarrow \mathbb {T}\times \mathbb {R},\quad (x,y) \longmapsto (x, \omega (y)). \end{aligned}$$

Under the change of coordinates by Q, the unperturbed map \(\mathcal {F}_0\) can be transformed into

$$\begin{aligned} U_0=Q^{-1}\circ \mathcal {F}_0\circ Q&: ~ \mathbb {T}\times \mathbb {R}\longrightarrow \mathbb {T}\times \mathbb {R}\\ U_0(x,y)&=(x+y,y). \end{aligned}$$

Meanwhile, it is easily seen that \(T_\alpha \) is invariant under the conjugacy Q, that is

$$\begin{aligned}Q^{-1}\circ T_\alpha \circ Q =T_\alpha .\end{aligned}$$

For the maps \(\mathcal {F}\) and \(\mathcal {K}\) considered in Theorem A, under the coordinate transformation Q we obtain the corresponding conjugated maps

$$\begin{aligned}\mathbf {F}= & {} Q^{-1}\circ \mathcal {F}\circ Q :~ \mathbb {T}\times \mathbb {R}\longrightarrow \mathbb {T}\times \mathbb {R}.\\ \mathbf {K}= & {} Q^{-1}\circ \mathcal {K}\circ Q : ~\mathbb {T}\times \mathbb {R}\longrightarrow \mathbb {T}\times \mathbb {R}.\end{aligned}$$

More precisely, \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\) for some \(\mathbf {f}, \mathbf {k}\in C^\infty (\mathbb {T}\times \mathbb {R},\mathbb {R}^{2})\), and

$$\begin{aligned} \mathbf {F}(x,y)=(x+y+\mathbf {f}_1(x,y),~ y+\mathbf {f}_2(x,y)), \end{aligned}$$
(4.1)

where \(\mathbf {f}_1=f_1\circ Q\) and \(\mathbf {f}_2(x,y)=\omega (\omega ^{-1}(y)+f_2\circ Q(x,y))-y\).

$$\begin{aligned} \mathbf {K}(x,y)=(x+\alpha +\mathbf {k}_1(x,y),~ y+\mathbf {k}_2(x,y)), \end{aligned}$$
(4.2)

where \(\mathbf {k}_1=k_1\circ Q\) and \(\mathbf {k}_2(x,y)=\omega (\omega ^{-1}(y)+k_2\circ Q(x,y))-y\).
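The conjugation formulas above can be checked numerically. The sketch below is only a sanity check under stated assumptions: the frequency map \(\omega (y)=\sinh y\), the perturbation \((f_1,f_2)\), the value of \(\alpha \) and all function names are hypothetical choices, and the maps are treated as lifts to \(\mathbb {R}\times \mathbb {R}\) (no reduction mod 1).

```python
import math

# Hypothetical choices, for illustration only: omega(y) = sinh(y) is a smooth
# diffeomorphism of R; f1, f2 form an arbitrary small smooth perturbation.
omega, omega_inv = math.sinh, math.asinh
alpha = 0.3

def f1(x, y):
    return 0.01 * math.sin(2 * math.pi * x)

def f2(x, y):
    return 0.02 * math.cos(2 * math.pi * x) / (1.0 + y * y)

def Q(x, y):
    return (x, omega_inv(y))

def Q_inv(x, y):
    return (x, omega(y))

def F0(x, y):            # integrable twist map (lift, no reduction mod 1)
    return (x + omega(y), y)

def T_alpha(x, y):
    return (x + alpha, y)

def cal_F(x, y):         # perturbed map F_0 + f
    return (x + omega(y) + f1(x, y), y + f2(x, y))

def U0(x, y):
    return (x + y, y)

def conjugate(M):
    """Return the map Q^{-1} o M o Q."""
    return lambda x, y: Q_inv(*M(*Q(x, y)))

def bold_F(x, y):
    """Formula (4.1) with the bold f_1, f_2 stated right after it."""
    qx, qy = Q(x, y)
    bf1 = f1(qx, qy)
    bf2 = omega(omega_inv(y) + f2(qx, qy)) - y
    return (x + y + bf1, y + bf2)

def gap(A, B, points):
    """Largest coordinate-wise discrepancy between the maps A and B."""
    return max(abs(a - b) for p in points for a, b in zip(A(*p), B(*p)))

samples = [(0.1, -1.5), (0.4, 0.0), (0.75, 2.0)]
gap_U0 = gap(conjugate(F0), U0, samples)          # Q^{-1} F_0 Q versus U_0
gap_T = gap(conjugate(T_alpha), T_alpha, samples)  # Q^{-1} T_alpha Q versus T_alpha
gap_F = gap(conjugate(cal_F), bold_F, samples)     # Q^{-1} F Q versus (4.1)
```

All three discrepancies are at the level of floating-point rounding for these sample points, which is consistent with (4.1) and with the invariance of \(T_\alpha \) under Q.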

It is easy to verify the following facts.

Lemma 4.1

The commutativity \(\mathbf {F}\circ \mathbf {K}=\mathbf {K}\circ \mathbf {F}\) holds. \(\mathbf {F}\) satisfies the intersection property. \(\mathbf {K}\) is Lipschitz semi-conjugate to \(R_\alpha \).

In fact, the commutativity of \(\mathbf {F}\) and \(\mathbf {K}\) follows directly from that of \(\mathcal {F}\) and \(\mathcal {K}\). The intersection property and the Lipschitz semi-conjugacy property are both preserved under coordinate transformations.

Therefore, by what we have shown above, Theorem A reduces to the following theorem.

Theorem 4.1

Let \(\mathbf {F}, \mathbf {K}\in Diff _0^\infty (\mathbb {T}\times \mathbb {R})\) be commuting diffeomorphisms which are induced by \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\), where \(\mathbf {f}, \mathbf {k}\in C^\infty (\mathbb {T}\times \mathbb {R},\mathbb {R}^{2})\) and \(\alpha \in \mathrm {DC}(\sigma ,\tau )\). Suppose that

  • \(\mathbf {F}\) satisfies the intersection property.

  • \(\mathbf {K}\) is Lipschitz semi-conjugate to the rigid circle rotation \(R_\alpha \).

Then, there exists \(\mu =\mu (\tau )>0\) such that: for any \(\delta >0\) and any bounded open interval \(\mathcal {I}\subset \mathbb {R}\), if the perturbations

$$\begin{aligned}\left\| \mathbf {f},~\mathbf {k}\right\| _{C^\mu (\mathbb {T}\times \mathcal {I}_\delta )}<\varepsilon _0\end{aligned}$$

for a sufficiently small \(\varepsilon _0=\varepsilon _0(\tau ,\mathcal {I},\delta )>0\), then \(\mathbf {F}\) and \(\mathbf {K}\) can be simultaneously \(C^\infty \)-conjugated to \(U_0\) and \(T_\alpha \) on \(\mathbb {T}\times \mathcal {I}\), in the sense that there exists a \(C^\infty \) diffeomorphism H from \(\mathbb {T}\times \mathcal {I}\) onto its image such that

$$\begin{aligned} H^{-1}\circ \mathbf {F}\circ H=U_0,\qquad H^{-1}\circ \mathbf {K}\circ H=T_\alpha . \end{aligned}$$

We remark that in the above theorem, \(\mathcal {I}_\delta :=\{y\in \mathbb {R},~ dist (y, \mathcal {I})<\delta \}\). For simplicity we have used the notation

$$\begin{aligned}\Vert \mathbf {f},~\mathbf {k}\Vert _{C^\mu (\mathbb {T}\times \mathcal {I}_\delta )}\overset{def }{=}\max \{\Vert \mathbf {f}\Vert _{C^\mu (\mathbb {T}\times \mathcal {I}_\delta )}, ~\Vert \mathbf {k}\Vert _{C^\mu (\mathbb {T}\times \mathcal {I}_\delta )}\}.\end{aligned}$$

\(C^\infty (\mathbb {T}\times \mathbb {R},\mathbb {R}^2)\) is the set of functions \(\phi (x,y)\in C^\infty (\mathbb {R}\times \mathbb {R},\mathbb {R}^2)\) that are 1-periodic in x.

The following sections are devoted to proving Theorem 4.1.

5 Linearized conjugacy equations and the commutativity

5.1 Linearized conjugacy equations

Let us focus on the commuting diffeomorphisms \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\) obtained in (4.1) and (4.2). In our setting, the simultaneous \(C^\infty \)-linearization problem amounts to finding a smooth near-identity conjugacy \(H=id +\mathbf {h}\), with \(\mathbf {h}(x,y)=(\mathbf {h}_1(x,y), \mathbf {h}_2(x,y))\), such that

$$\begin{aligned} \mathbf {K}\circ H=H\circ T_\alpha ,\qquad \mathbf {F}\circ H=H\circ U_0. \end{aligned}$$

Since \(\mathbf {K}=T_\alpha +\mathbf {k}\), the conjugacy equation \(\mathbf {K}\circ H=H\circ T_\alpha \) is reduced to

$$\begin{aligned} \mathbf {h}\circ T_\alpha -\mathbf {h}=\mathbf {k}\circ H. \end{aligned}$$
(5.1)

Simultaneously, as \(\mathbf {F}=U_0+\mathbf {f}\), the conjugacy equation \(\mathbf {F}\circ H=H\circ U_0\) is reduced to

$$\begin{aligned} \left\{ \begin{array}{lll} \mathbf {h}_1\circ U_0-\mathbf {h}_1-\mathbf {h}_2&{}=\mathbf {f}_1\circ H\\ \mathbf {h}_2\circ U_0-\mathbf {h}_2&{}=\mathbf {f}_2\circ H. \end{array} \right. \end{aligned}$$
(5.2)
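Equations (5.1) and (5.2) can be tested on an explicit example. The sketch below is an illustration under stated assumptions: it takes the hypothetical conjugacy \(H(x,y)=(x, y+b\sin 2\pi x)\), whose inverse is explicit, defines \(\mathbf {K}=H\circ T_\alpha \circ H^{-1}\) and \(\mathbf {F}=H\circ U_0\circ H^{-1}\), and evaluates the residuals of (5.1) and (5.2) at sample points. The values of \(\alpha \) and b and all function names are choices made for this example; the maps are treated as lifts.

```python
import math

alpha, b = (math.sqrt(5) - 1) / 2, 0.05   # sample values (assumptions)

def s(x):
    return b * math.sin(2 * math.pi * x)

def h(x, y):            # h = (h1, h2) = (0, b sin 2 pi x)
    return (0.0, s(x))

def H(x, y):            # H = id + h
    return (x, y + s(x))

def H_inv(x, y):        # explicit inverse of H
    return (x, y - s(x))

def T(x, y):
    return (x + alpha, y)

def U0(x, y):
    return (x + y, y)

def K(x, y):            # K = H o T_alpha o H^{-1}
    return H(*T(*H_inv(x, y)))

def F(x, y):            # F = H o U_0 o H^{-1}
    return H(*U0(*H_inv(x, y)))

def k(x, y):            # k = K - T_alpha
    return tuple(p - q for p, q in zip(K(x, y), T(x, y)))

def f(x, y):            # f = F - U_0
    return tuple(p - q for p, q in zip(F(x, y), U0(x, y)))

def residual_51(x, y):
    """Componentwise residual of (5.1): h o T_alpha - h - k o H."""
    hT, h0, kH = h(*T(x, y)), h(x, y), k(*H(x, y))
    return tuple(hT[i] - h0[i] - kH[i] for i in range(2))

def residual_52(x, y):
    """Componentwise residual of the system (5.2)."""
    hU, h0, fH = h(*U0(x, y)), h(x, y), f(*H(x, y))
    return (hU[0] - h0[0] - h0[1] - fH[0], hU[1] - h0[1] - fH[1])

points = [(0.05 * i, -1.0 + 0.2 * j) for i in range(20) for j in range(11)]
worst = max(abs(c) for p in points for c in residual_51(*p) + residual_52(*p))
```

Since H conjugates the pair exactly by construction, the residuals vanish up to rounding. In the setting of the paper the roles are reversed: \(\mathbf {f}\) and \(\mathbf {k}\) are given and \(\mathbf {h}\) is the unknown.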

There is no direct way to solve the nonlinear equations (5.1)-(5.2). Instead, we will use a KAM iterative scheme to solve this nonlinear problem. In other words, the solution is the limit of successive approximations obtained by approximating the nonlinear problem by its linear part, and solving approximately the corresponding linearized equation.

To simplify the notation, for any convex domain \(E\subset \mathbb {R}\) we define two linear operators on \(C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})\) as follows: for \(u(x,y)=(u_1(x,y), u_2(x,y))\),

$$\begin{aligned} \begin{aligned} \Delta _{\alpha }: ~C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})&\longrightarrow C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})\\ u&\longmapsto u\circ T_\alpha -u, \end{aligned} \end{aligned}$$
(5.3)

where \(T_\alpha (x,y)=(x+\alpha , y)\), and

$$\begin{aligned} \begin{aligned} \Delta _{U_0}: ~ C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})&\longrightarrow C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})\\ u= \left( \begin{array}{ll} u_1\\ u_2 \end{array} \right)&\longmapsto \left( \begin{array}{ll} u_1\circ U_0-u_1-u_2\\ u_2\circ U_0-u_2 \end{array} \right) , \end{aligned} \end{aligned}$$
(5.4)

where \(U_0(x,y)=(x+y,y)\). It is easily seen that the two linear operators commute, i.e.,

$$\begin{aligned}\Delta _{U_0}\Delta _{\alpha }=\Delta _{\alpha }\Delta _{U_0}.\end{aligned}$$
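The commutation of the two operators comes down to \(T_\alpha \circ U_0=U_0\circ T_\alpha \). As a quick numerical illustration, the following sketch applies both compositions to a sample function u; the function u, the value of \(\alpha \) and the helper names are hypothetical.

```python
import math

alpha = (math.sqrt(5) - 1) / 2   # sample value; the identity holds for every real alpha

def T_alpha(x, y):
    return (x + alpha, y)

def U0(x, y):
    return (x + y, y)

def delta_alpha(u):
    """Delta_alpha u = u o T_alpha - u, acting on u = (u1, u2)."""
    def out(x, y):
        a, b = u(*T_alpha(x, y)), u(x, y)
        return (a[0] - b[0], a[1] - b[1])
    return out

def delta_U0(u):
    """Delta_{U_0} u = (u1 o U0 - u1 - u2, u2 o U0 - u2)."""
    def out(x, y):
        a, b = u(*U0(x, y)), u(x, y)
        return (a[0] - b[0] - b[1], a[1] - b[1])
    return out

def u(x, y):
    # Arbitrary smooth test function, 1-periodic in x.
    return (math.sin(2 * math.pi * x) * y * y, math.cos(4 * math.pi * x) + y)

left = delta_U0(delta_alpha(u))
right = delta_alpha(delta_U0(u))
points = [(0.1 * i, -1.0 + 0.25 * j) for i in range(10) for j in range(9)]
commutator_gap = max(abs(p - q) for z in points
                     for p, q in zip(left(*z), right(*z)))
```

The quantity `commutator_gap` is of the order of rounding errors, as expected.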

Now, the corresponding linearized equations of (5.1)–(5.2) can be written as

$$\begin{aligned} \Delta _\alpha \mathbf {h}&=\mathbf {k} \end{aligned}$$
(5.5)
$$\begin{aligned} \Delta _{U_0}\mathbf {h}&=\mathbf {f} \end{aligned}$$
(5.6)

where \(\mathbf {h}=(\mathbf {h}_1, \mathbf {h}_2)\), \(\mathbf {k}=(\mathbf {k}_1,\mathbf {k}_2)\) and \(\mathbf {f}=(\mathbf {f}_1,\mathbf {f}_2)\).

The basic idea of finding a common approximate solution is as follows. Thanks to the Diophantine property of \(\alpha \), one can first obtain a solution \(\mathbf {h}\) to equation (5.5). Then, by exploiting the commutativity relation we can show that \(\mathbf {h}\) also solves equation (5.6) up to a higher order error. This idea is inspired by Moser’s commuting mechanism [26].

For our purpose, we first give the following lemma for the linear operator \(\Delta _\alpha \). It can be proved using Fourier analysis; we repeat the argument here for completeness. We also remark that the norms of the functions are taken in the scale of Hölder spaces.

Lemma 5.1

Let \(\alpha \in \mathrm {DC}(\sigma ,\tau )\) and \(E\subset \mathbb {R}\) be a convex open set. Given \(\varphi (x,y)\in C^\infty (\mathbb {T}\times E,\mathbb {R})\), there is a unique solution \(u\in C^\infty (\mathbb {T}\times E,\mathbb {R})\) satisfying \(\int _{\mathbb {T}} u(x,y)\, dx=0\) such that

$$\begin{aligned} \Delta _\alpha u(x,y)=\varphi (x,y)-\int _{\mathbb {T}} \varphi (x,y)\, dx. \end{aligned}$$
(5.7)

Moreover, for every real number \(r\in [0,\infty )\) the solution u satisfies

$$\begin{aligned} \left\| u\right\| _r\leqslant C\left\| \varphi \right\| _{r+\varrho }, \qquad \varrho =[\tau ]+2, \end{aligned}$$
(5.8)

where the constant \(C=C(\tau ,\sigma )=\frac{1}{\sigma }\sum _{m\ne 0}\frac{1}{|m|^{2+[\tau ]-\tau }}\), and \([\tau ]\) is the integer part of \(\tau >0\). For \(r\notin \mathbb {N}\) we use the Hölder norm (see Sect. 3).

Remark 5.1

The constant C can be made independent of \(\tau \) if one chooses \(\varrho =\tau +2\) instead of \([\tau ]+2\). Linear equations of the form (5.7) are sometimes called cohomological equations.

Proof

Using Fourier series, equation (5.7) becomes

$$\begin{aligned} \sum _{m\in \mathbb {Z}\setminus \{0\}}\left( e^{i2\pi m\alpha }-1\right) \widehat{u}_m(y)\, e^{i2\pi mx}=\sum _{m\in \mathbb {Z}\setminus \{0\}} \widehat{\varphi }_m(y)\, e^{i2\pi mx} \end{aligned}$$

where the Fourier coefficients \(\widehat{\varphi }_m(y)=\int _{\mathbb {T}} \varphi (\theta ,y)e^{-i2\pi m\theta }\,d\theta \). Then we formally have a solution

$$\begin{aligned} u(x,y)=\sum _{m\in \mathbb {Z}\setminus \{0\}}\frac{\widehat{\varphi }_m(y)}{e^{i2\pi m\alpha }-1}e^{i2\pi mx}. \end{aligned}$$

Observe that for each m,

$$\begin{aligned} \widehat{\varphi }_m(y)e^{i2\pi mx}=\int _{\mathbb {T}}\varphi (\theta ,y)e^{-i2\pi m(\theta -x)}\,d\theta =\int _{\mathbb {T}} \varphi (\theta +x,y)e^{-i2\pi m\theta }\,d\theta . \end{aligned}$$
(5.9)

Using integration by parts, we thus obtain

$$\begin{aligned} \left\| \widehat{\varphi }_m(y) e^{i2\pi mx}\right\| _p\leqslant \frac{1}{(2\pi )^q} \frac{\Vert \varphi \Vert _{p+q}}{|m|^q}\leqslant \frac{\Vert \varphi \Vert _{p+q}}{|m|^q},\qquad \text {for all ~} p, q\in \mathbb {N}. \end{aligned}$$
(5.10)

Meanwhile, for each m the following Hölder norm estimate holds

$$\begin{aligned} \left\| \widehat{\varphi }_m(y) e^{i2\pi mx}\right\| _{p+\lambda }\leqslant \frac{\Vert \varphi \Vert _{p+q+\lambda }}{|m|^q},\qquad \text {for all ~} p, q\in \mathbb {N},\quad \lambda \in (0,1). \end{aligned}$$
(5.11)

To verify this estimate, we define \(G(x,y)=\widehat{\varphi }_m(y)\,e^{i2\pi mx}\) for simplicity. Recall that

$$\begin{aligned}\left\| G\right\| _{p+\lambda }=\max \left\{ \left\| G\right\| _{p},~\max _{|J|=p}\left\| \partial ^J G\right\| _{\lambda }\right\} \end{aligned}$$

where the multi-index \(J=(J_1, J_2)\in \mathbb {N}^{2}\) and \(\partial ^J=\partial ^{J_1}_x\partial ^{J_2}_y\). Since by (5.10)

$$\begin{aligned} \Vert G\Vert _p\leqslant \frac{\Vert \varphi \Vert _{p+q}}{|m|^q} \leqslant \frac{\Vert \varphi \Vert _{p+q+\lambda }}{|m|^q},\end{aligned}$$

it remains to check the Hölder norm \(\left\| \partial ^J G\right\| _{\lambda }\) for every multi-index J satisfying \(|J|=p\). In fact, by (5.9), for any two points \((x_1,y_1)\) and (\(x_2,y_2\)) ,

$$\begin{aligned}&\left| \partial ^J G(x_1,y_1)-\partial ^J G(x_2,y_2)\right| \nonumber \\&\quad = \left| \int _{\mathbb {T}}\left( \partial ^J \varphi (\theta +x_1,y_1)-\partial ^J \varphi (\theta +x_2,y_2)\right) e^{-i2\pi m\theta }\,d\theta \right| \nonumber \\&\quad \leqslant \frac{1}{|m|^q}\left| \int _{\mathbb {T}}\left( \partial ^{J} \partial _x^q\varphi (\theta +x_1,y_1)-\partial ^{J}\partial _x^q \varphi (\theta +x_2,y_2)\right) \,e^{-i2\pi m\theta }\,d\theta \right| \nonumber \\&\quad \leqslant \frac{1}{|m|^q}\,\sup _{\theta }\,\left| \partial ^{J}\partial _x^q \varphi (\theta +x_1,y_1)-\partial ^{J}\partial _x^q \varphi (\theta +x_2,y_2)\right| \end{aligned}$$
(5.12)

Here, we have used integration by parts for the third line. As \(|J|=p\), we infer from (5.12) that

$$\begin{aligned} \left\| \partial ^J G\right\| _{\lambda }=&\sup \limits _{0<\Vert (x_1,y_1)-(x_2,y_2)\Vert \leqslant 1} \frac{\left| \partial ^J G(x_1,y_1)-\partial ^J G(x_2,y_2)\right| }{\Vert (x_1,y_1)-(x_2, y_2)\Vert ^\lambda }\\ \leqslant&\frac{1}{|m|^q}\cdot \sup \limits _{0<\Vert (x_1,y_1)-(x_2,y_2)\Vert \leqslant 1} \frac{\Vert \varphi \Vert _{p+q+\lambda }\cdot \Vert (x_1,y_1)-(x_2, y_2)\Vert ^\lambda }{\Vert (x_1,y_1)-(x_2, y_2)\Vert ^\lambda }\\ \leqslant&\frac{1}{|m|^q}\Vert \varphi \Vert _{p+q+\lambda }. \end{aligned}$$

This thus verifies the desired result (5.11).

Next, we will estimate the \(C^r\) norm of the solution u for any \(r\geqslant 0\). By (5.10)–(5.11), for any real \(r\in \mathbb {R}^+\) and \(q\in \mathbb {N}\),

$$\begin{aligned} \left\| u\right\| _r\leqslant \sum _{m\in \mathbb {Z}\setminus \{0\}}\frac{\left\| \widehat{\varphi }_m(y) e^{i2\pi mx}\right\| _r}{|e^{i2\pi m\alpha }-1|} \leqslant \sum _{m\in \mathbb {Z}\setminus \{0\}}\frac{\Vert \varphi \Vert _{r+q}}{ |m|^{q}\,|e^{i2\pi m\alpha }-1|} \leqslant \frac{1}{\sigma }\sum _{m\in \mathbb {Z}\setminus \{0\}}\frac{\Vert \varphi \Vert _{r+q}}{ |m|^{q-\tau }} , \end{aligned}$$

where for the last inequality we have used the Diophantine condition \(\alpha \in \mathrm {DC}(\sigma ,\tau )\). Note that the series on the right hand side is convergent if and only if the integer q satisfies \(q-\tau >1\). Hence, choosing \(q=[\tau ]+2\), we obtain

$$\begin{aligned} \left\| u\right\| _r\leqslant \frac{1}{\sigma }\sum _{m\in \mathbb {Z}\setminus \{0\}}\frac{\Vert \varphi \Vert _{r+[\tau ]+2}}{ |m|^{2+[\tau ]-\tau }} \leqslant C(\tau ,\sigma )\cdot \Vert \varphi \Vert _{r+[\tau ]+2}, \end{aligned}$$

where the constant \(C(\tau ,\sigma )=\frac{1}{\sigma }\sum _{m\ne 0}\frac{1}{|m|^{2+[\tau ]-\tau }}<\infty \) depends on \(\tau \) and \(\sigma \). This therefore proves estimate (5.8) for any real \(r\geqslant 0\). This finishes the proof. \(\square \)

This lemma tells us that given a differentiable function \(\varphi \), the cohomological equation \(\Delta _\alpha u=\varphi -[\varphi ]\) has a solution, which in general is of lower regularity than \(\varphi \). However, the loss of regularity can be controlled by the Diophantine exponent \(\tau \). In particular, the solution \(u\in C^\infty \) if \(\varphi \in C^\infty \).
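The Fourier construction in the proof can be illustrated on a trigonometric polynomial. The following sketch is a minimal example, assuming a fixed value of y, the golden-mean rotation number for \(\alpha \), and hypothetical Fourier coefficients for \(\varphi \); the helper names are not taken from the paper. It divides each nonzero mode by \(e^{i2\pi m\alpha }-1\) and checks that \(\Delta _\alpha u=\varphi -[\varphi ]\) at sample points.

```python
import cmath
import math

alpha = (math.sqrt(5) - 1) / 2   # golden-mean rotation number (sample choice)

def solve_cohomological(phi_hat):
    """Given Fourier coefficients {m: c_m} of phi(., y) at a fixed y,
    return the coefficients of the zero-average solution u of
    Delta_alpha u = phi - [phi]."""
    return {m: c / (cmath.exp(2j * math.pi * m * alpha) - 1)
            for m, c in phi_hat.items() if m != 0}

def evaluate(coeffs, x):
    """Sum the finite Fourier series with the given coefficients at x."""
    return sum(c * cmath.exp(2j * math.pi * m * x) for m, c in coeffs.items())

# Hypothetical real trigonometric polynomial with nonzero mean (the m = 0 term).
phi_hat = {0: 0.7, 1: 0.5, -1: 0.5, 3: -0.25j, -3: 0.25j}
u_hat = solve_cohomological(phi_hat)

xs = [j / 16 for j in range(16)]
residual = max(abs(evaluate(u_hat, x + alpha) - evaluate(u_hat, x)
                   - (evaluate(phi_hat, x) - phi_hat[0])) for x in xs)
imag_part = max(abs(evaluate(u_hat, x).imag) for x in xs)
```

For a trigonometric polynomial the sum is finite, so no small-divisor issue arises; the Diophantine condition is needed only to control the infinitely many modes of a general \(\varphi \).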

5.2 The commutativity property

Now we investigate the commutativity assumption.

Suppose that \(\mathbf {F}=U_0+\mathbf {f}\) commutes with \(\mathbf {K}=T_\alpha +\mathbf {k}\) on \(\mathbb {T}\times E\) with \(E\subset \mathbb {R}\) being convex and open. Then the commutation relation \(\mathbf {F}\circ \mathbf {K}=\mathbf {K}\circ \mathbf {F}\) implies

$$\begin{aligned} \begin{aligned} \mathbf {f}_1\circ \mathbf {K}-\mathbf {f}_1&=\mathbf {k}_1\circ \mathbf {F}-\mathbf {k}_1-\mathbf {k}_2\\ \mathbf {f}_2\circ \mathbf {K}-\mathbf {f}_2&=\mathbf {k}_2\circ \mathbf {F}-\mathbf {k}_2 \end{aligned} \end{aligned}$$
(5.13)

on \(\mathbb {T}\times E\).

In view of the linear operators \(\Delta _\alpha \) and \(\Delta _{U_0}\) defined in (5.3)–(5.4), we also introduce a new linear operator

$$\begin{aligned}\mathcal {L}: C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})\times C^\infty (\mathbb {T}\times E,\mathbb {R}^{2}) \longrightarrow C^\infty (\mathbb {T}\times E,\mathbb {R}^{2})\end{aligned}$$

given by

$$\begin{aligned} \mathcal {L}(f, g)\overset{\Delta }{=}\Delta _{U_0}\, g-\Delta _\alpha \, f. \end{aligned}$$
(5.14)

In what follows, for a smooth function \(\psi (x,y)\) we use \([\psi ](y)\) to denote the average (or mean value) of \(\psi \) over \(\mathbb {T}\), that is

$$\begin{aligned} {}[\psi ](y)=\int _{\mathbb {T}} \psi (x,y)\, dx.\end{aligned}$$

In fact, this is exactly the 0-th Fourier coefficient \({\widehat{\psi }}_0(y)\) of \(\psi (x,y)=\sum {\widehat{\psi }}_m(y) e^{i2\pi mx}\).

For our maps \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\), the following result states that \(\mathcal {L}(\mathbf {f}, \mathbf {k})\) and the average \([\mathbf {k}_2]\) are both of higher order with respect to the size of the perturbations \(\mathbf {f}\) and \(\mathbf {k}\). This is essentially due to the commutativity property.

Lemma 5.2

If \(\mathbf {F}=U_0+\mathbf {f}\) commutes with \(\mathbf {K}=T_\alpha +\mathbf {k}\), then the following estimates hold:

$$\begin{aligned} \Vert \mathcal {L}(\mathbf {f}, \mathbf {k})\Vert _r\leqslant&C_r \big (\Vert \mathbf {f}\Vert _{r+1}\,\Vert \mathbf {k}\Vert _r+\Vert \mathbf {k}\Vert _{r+1}\,\Vert \mathbf {f}\Vert _r\big ),\qquad \text {for any~} r\geqslant 0 \end{aligned}$$
(5.15)
$$\begin{aligned} \Vert ~[\mathbf {k}_2]~\Vert _0\leqslant&\Vert \mathbf {f}\Vert _1\,\Vert \mathbf {k}\Vert _0+\Vert \mathbf {k}\Vert _1\,\Vert \mathbf {f}\Vert _0\, , \end{aligned}$$
(5.16)

where \([\mathbf {k}_2](y)\) is the average (over \(\mathbb {T}\)) of the second component of \(\mathbf {k}=(\mathbf {k}_1,\mathbf {k}_2)\).

Proof

The commutation relation gives (5.13), which can be rewritten as

$$\begin{aligned} \Delta _{U_0}\mathbf {k}-\Delta _\alpha \mathbf {f}=\mathbf {f}\circ \mathbf {K}-\mathbf {f}\circ T_\alpha -\mathbf {k}\circ \mathbf {F}+\mathbf {k}\circ U_0 \end{aligned}$$
(5.17)

which means that

$$\begin{aligned} \mathcal {L}(\mathbf {f},\mathbf {k})=&\mathbf {f}\circ \mathbf {K}-\mathbf {f}\circ T_\alpha -\mathbf {k}\circ \mathbf {F}+\mathbf {k}\circ U_0 = \int _0^1 D\mathbf {f}(T_\alpha +t\mathbf {k})\,\mathbf {k}-D\mathbf {k}(U_0+t\mathbf {f})\,\mathbf {f}\, dt . \end{aligned}$$
(5.18)

Then,

$$\begin{aligned} \Vert \mathcal {L}(\mathbf {f},\mathbf {k})\Vert _0&\leqslant \Vert D\mathbf {f}\Vert _0\,\Vert \mathbf {k}\Vert _0+\Vert D\mathbf {k}\Vert _0\,\Vert \mathbf {f}\Vert _0\leqslant \Vert \mathbf {f}\Vert _1\,\Vert \mathbf {k}\Vert _0+\Vert \mathbf {k}\Vert _1\,\Vert \mathbf {f}\Vert _0. \end{aligned}$$

This verifies (5.15) for \(r=0\). Based on (5.18), the \(C^r\), \(r\geqslant 1\), norm estimates can be proved similarly, see for example [5, Proposition A.2] or [24, Appendix II].

It remains to prove inequality (5.16). Taking the average over \(\mathbb {T}\) on both sides of (5.17), we get

$$\begin{aligned} {}[\Delta _{U_0}\mathbf {k}]-[\Delta _\alpha \mathbf {f}]=\int _\mathbb {T}\mathbf {f}\circ \mathbf {K}(x,y)-\mathbf {f}\circ T_\alpha (x,y)-\mathbf {k}\circ \mathbf {F}(x,y)+\mathbf {k}\circ U_0(x,y)\, dx. \end{aligned}$$
(5.19)

Here, by (5.4) and the translation invariance of the integral over \(\mathbb {T}\), it follows that

$$\begin{aligned} {}[\Delta _{U_0}\mathbf {k}]=\left( \begin{array}{ll} \int _{\mathbb {T}}\mathbf {k}_1(x+y,y)\, dx-\int _{\mathbb {T}}\mathbf {k}_1(x,y)\, dx-\int _{\mathbb {T}}\mathbf {k}_2(x,y)\, dx\\ \int _{\mathbb {T}}\mathbf {k}_2(x+y,y)\, dx-\int _{\mathbb {T}}\mathbf {k}_2(x,y)\, dx \end{array} \right) =\left( \begin{array}{ll} -[\mathbf {k}_2]\\ 0 \end{array} \right) . \end{aligned}$$

Similarly, we can show that \([\Delta _\alpha \mathbf {f}]=(0,0)\). Thus, the first component of (5.19) implies that

$$\begin{aligned}-[\mathbf {k}_2](y)= \int _{\mathbb {T}}\mathbf {f}_1\circ \mathbf {K}(x,y)-\mathbf {f}_1\circ T_\alpha (x,y)-\mathbf {k}_1\circ \mathbf {F}(x,y)+\mathbf {k}_1\circ U_0(x,y)\,dx,\end{aligned}$$

which yields

$$\begin{aligned}\Vert [\mathbf {k}_2]\Vert _0\leqslant \Vert D\mathbf {f}_1\Vert _0\Vert \,\mathbf {k}\Vert _0+\Vert D\mathbf {k}_1\Vert _0\,\Vert \mathbf {f}\Vert _0\leqslant \Vert \mathbf {f}\Vert _1\,\Vert \mathbf {k}\Vert _0+\Vert \mathbf {k}\Vert _1\,\Vert \mathbf {f}\Vert _0.\end{aligned}$$

This finishes the proof. \(\square \)
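Identity (5.17), on which the proof rests, can be checked on an explicit commuting pair. The sketch below is an illustration under stated assumptions: the pair is produced by conjugating \(U_0\) and \(T_\alpha \) with the hypothetical map \(H(x,y)=(x, y+b\sin 2\pi x)\), so it commutes by construction; the values of \(\alpha \) and b and all helper names are choices made for this example, and the maps are treated as lifts.

```python
import math

alpha, b = (math.sqrt(5) - 1) / 2, 0.05   # sample values (assumptions)

def s(x):
    return b * math.sin(2 * math.pi * x)

def H(x, y):
    return (x, y + s(x))

def H_inv(x, y):
    return (x, y - s(x))

def T(x, y):
    return (x + alpha, y)

def U0(x, y):
    return (x + y, y)

def F(x, y):            # F = H o U_0 o H^{-1}
    return H(*U0(*H_inv(x, y)))

def K(x, y):            # K = H o T_alpha o H^{-1}
    return H(*T(*H_inv(x, y)))

def f(x, y):            # f = F - U_0
    return tuple(p - q for p, q in zip(F(x, y), U0(x, y)))

def k(x, y):            # k = K - T_alpha
    return tuple(p - q for p, q in zip(K(x, y), T(x, y)))

def L_op(x, y):
    """L(f, k) = Delta_{U_0} k - Delta_alpha f, evaluated at (x, y)."""
    kU, k0 = k(*U0(x, y)), k(x, y)
    fT, f0 = f(*T(x, y)), f(x, y)
    return (kU[0] - k0[0] - k0[1] - (fT[0] - f0[0]),
            kU[1] - k0[1] - (fT[1] - f0[1]))

def rhs_517(x, y):
    """Right-hand side of (5.17): f o K - f o T_alpha - k o F + k o U_0."""
    fK, fT = f(*K(x, y)), f(*T(x, y))
    kF, kU = k(*F(x, y)), k(*U0(x, y))
    return tuple(fK[i] - fT[i] - kF[i] + kU[i] for i in range(2))

points = [(0.05 * i, -1.0 + 0.2 * j) for i in range(20) for j in range(11)]
gap_commute = max(abs(p - q) for z in points
                  for p, q in zip(F(*K(*z)), K(*F(*z))))
gap_517 = max(abs(p - q) for z in points
              for p, q in zip(L_op(*z), rhs_517(*z)))
# Riemann sum for the average [k_2](y) at the sample height y = 0.3.
n = 64
avg_k2 = sum(k(j / n, 0.3)[1] for j in range(n)) / n
```

In this example \([\mathbf {k}_2]\) vanishes identically, so (5.16) holds trivially; for a general commuting pair the estimate says that \([\mathbf {k}_2]\) is quadratically small in the size of the perturbations.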

We end this section by mentioning an interesting result [32] which reveals a connection between commutativity and the KAM set for analytic systems. It shows that for two nearly integrable exact symplectic \(C^\omega \) maps, if the images of the KAM curves of the two maps intersect on a \(C^\infty \)-uniqueness set, then the two maps commute.

5.3 Smoothing operators

As we can see from estimate (5.8) in Lemma 5.1, the \(C^r\) norm of the solution u can be estimated by the \(C^{r+\varrho }\) norm of \(\varphi \), with a fixed loss of regularity \(\varrho =[\tau ]+2\). For our KAM iterative scheme in the following sections, we shall choose an appropriate smoothing operator to compensate for this fixed loss of regularity at each iterative step. By using interpolation inequalities, one can recover good behavior of some intermediate norms. Then the error introduced by this smoothing operator would not destroy the rapid convergence of the iteration (The convergence is not quadratic, but it is still faster than exponential). This idea comes from the Nash-Moser technique.

The following approximation result is well known. We refer to [25, 31, 38] for the proof and more details.

Lemma 5.3

Let \(E\subset \mathbb {R}\) be open and convex. There exists a family of linear smoothing operators \(\{\mathrm {S}_N\}_{N\in \mathbb {R}^+}\) from \(C^\infty (\mathbb {T}\times E,\mathbb {R})\) into itself, such that for every \(\psi \in C^\infty (\mathbb {T}\times E,\mathbb {R})\), one has \(\lim _{N\rightarrow \infty }\left\| \psi -\mathrm {S}_N \psi \right\| _0=0\), and

$$\begin{aligned} \left\| \mathrm {S}_N \psi \right\| _l&\leqslant C_{s,l} N^{l-s}\left\| \psi \right\| _s\qquad \text {for~} l\geqslant s, \end{aligned}$$
(5.20)

and for the linear operator \(\mathrm {R}_N\overset{def }{=}id -\mathrm {S}_N\), it satisfies

$$\begin{aligned} \left\| \mathrm {R}_N \psi \right\| _s&\leqslant C_{s,l} \frac{\left\| \psi \right\| _l}{N^{l-s}} \qquad \text {for~} l\geqslant s. \end{aligned}$$
(5.21)

Here, \(C_{s,l}>0\) are constants depending on s and l.

Remark 5.2

In fact, the smoothing operators \(\mathrm {S}_N\) are constructed by convolving with appropriate kernels that decay rapidly at infinity. So, if \(\psi \) is periodic in some variables, then so are the approximating functions \(\mathrm {S}_N\psi \) in the same variables. Moreover, by the definition of convolution, it is not difficult to check that \([\mathrm {S}_N\psi ](y)=\mathrm {S}_N[\psi ](y)\).

However, the operators \(\mathrm {S}_N\) given in Lemma 5.3 may not preserve the averages, i.e., \([\mathrm {S}_N\psi ](y)\ne [\psi ](y)\) and \(\mathrm {S}_N[\psi ](y)\ne [\psi ](y)\) in general.

We also note that for the functions defined on \(\mathbb {T}\times E\), the Fourier truncation operators \(S_N\psi (x,y)=\sum _{|m|\leqslant N} \widehat{\psi }_m(y) e^{i2\pi mx}\) are not smoothing operators. In fact, for Fourier truncation operators, although inequality (5.21) is still true, inequality (5.20) does not hold for the partial derivatives of \(\psi \) with respect to y (it holds only for the partial derivatives of \(\psi \) with respect to x).

As pointed out in [38], one important consequence of the existence of smoothing operators is the interpolation inequalities (Hadamard convexity inequalities), which will be very useful to us later on.

Lemma 5.4

[38] Let \(g\in C^\infty (\mathbb {T}\times E,\mathbb {R})\) with \(E\subset \mathbb {R}\) convex and open. Then, for all \(s\leqslant m\leqslant l\), \(m=(1-\lambda ) s+\lambda l\) with \(\lambda \in [0,1]\),

$$\begin{aligned}\left\| g\right\| _m\leqslant C_{\lambda ,s,l}\left\| g\right\| _s^{1-\lambda } \left\| g\right\| _l^{\lambda },\end{aligned}$$

where the constants \(C_{\lambda ,s,l}>0\) depend only on s, l and \(\lambda \).

In fact, as \(s\leqslant m\leqslant l\), we choose \(N\in \mathbb {R}^+\) satisfying \(N^{l-s}=\frac{ \Vert g\Vert _l }{ \Vert g\Vert _s }\), and then invoke Lemma 5.3 to obtain that

$$\begin{aligned} \left\| g\right\| _m \leqslant&\, \Vert \mathrm {S}_N g\Vert _m+\Vert \mathrm {R}_Ng\Vert _m\leqslant C_{s,m} N^{m-s}\Vert g\Vert _s +C_{m,l} N^{m-l}\Vert g\Vert _l\\ =&(C_{s,m}+C_{m,l}) \Vert g\Vert ^{\frac{l-m}{l-s}}_s \Vert g\Vert _l^\frac{m-s}{l-s}. \end{aligned}$$
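The choice of N balances the two terms of the estimate. As a small arithmetic illustration, the following sketch evaluates both terms for hypothetical norm values \(\Vert g\Vert _s=2\), \(\Vert g\Vert _l=162\) with \(s=1\), \(m=3\), \(l=5\), and compares them with the geometric mean \(\Vert g\Vert _s^{1-\lambda }\Vert g\Vert _l^{\lambda }\); the function name and the numbers are assumptions made for this example, and the constants \(C_{s,m}\), \(C_{m,l}\) are left out.

```python
def interpolate_via_cutoff(norm_s, norm_l, s, m, l):
    """Return the two terms N^{m-s} ||g||_s and N^{m-l} ||g||_l for the
    cutoff N defined by N^{l-s} = ||g||_l / ||g||_s."""
    N = (norm_l / norm_s) ** (1.0 / (l - s))
    return N ** (m - s) * norm_s, N ** (m - l) * norm_l

# Sample values (hypothetical): norms 2 and 162 at s = 1, l = 5, and m = 3.
low_term, high_term = interpolate_via_cutoff(2.0, 162.0, 1, 3, 5)
lam = (3 - 1) / (5 - 1)                       # m = (1 - lam) s + lam l
geometric = 2.0 ** (1 - lam) * 162.0 ** lam   # ||g||_s^{1-lam} ||g||_l^{lam}
```

Here \(N=3\), and both terms as well as the geometric mean equal 18, which matches the last line of the display above with \(\lambda =\frac{m-s}{l-s}=\frac{1}{2}\).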

We also refer to [10] for a proof by an elementary method, which extends even to Hölder spaces of functions defined on a Banach space [9].

6 Inductive lemma and the error estimates

The goal of this section is to prove Proposition 6.1, which will be the main ingredient in the proof of Theorem 4.1. It allows us to obtain smaller errors after each iteration, which thus ensures the convergence of our KAM iteration scheme, see Sect. 7.

For \(\alpha \in \mathrm {DC}(\sigma ,\tau )\), we recall the constant

$$\begin{aligned}\varrho =[\tau ]+2\end{aligned}$$

obtained in Lemma 5.1. Then the following result holds.

Proposition 6.1

Let \(\mathbf {F}=U_0+\mathbf {f}\) and \(\mathbf {K}=T_\alpha +\mathbf {k}\) be commuting \(C^\infty \) diffeomorphisms, where \(\mathbf {F}\) has the intersection property. Let \(\delta \in (0, \frac{1}{2}]\), let \(\mathcal {I}\subset \mathbb {R}\) be a bounded open interval, and write \(\Vert \mathbf {f},~\mathbf {k}\Vert _r=\Vert \mathbf {f},~\mathbf {k}\Vert _{C^r(\mathbb {T}\times \mathcal {I}_\delta )}\). Suppose that \(\mathbf {K}\) is semi-conjugate to \(R_\alpha \) via a Lipschitz semi-conjugacy of the form \(W(x,y)=x+v(x,y)\), where \(v\in Lip (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R})\) satisfies \(|v(z)-v(z')|\leqslant {\mathfrak {L}}\cdot dist (z, z')\) for some \({\mathfrak {L}}>1\).

Then, for \(N>1\), there exists \(\mathbf {h}\in C^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})\), see formula (6.13), satisfying

$$\begin{aligned} \Vert \mathbf {h}\Vert _r\leqslant C_{r',r,\varrho }\, N^{r-r'+\varrho }\Vert \mathbf {f},~\mathbf {k}\Vert _{r'},\qquad \text {for~}r\geqslant r'\geqslant 0. \end{aligned}$$
(6.1)

Denote \(\theta =\Vert \mathbf {h}\Vert _1\), \(\theta '=\Vert \mathbf {f},~\mathbf {k}\Vert _0\) and assume that

$$\begin{aligned} \widetilde{\delta }:=\delta -2\theta -\theta '>0, \end{aligned}$$
(6.2)

then the map \(H=id +\mathbf {h}\) has a smooth inverse \(H^{-1}\) defined on \(\mathbb {T}\times \mathcal {I}_{\delta -\theta }\), and the conjugated maps

$$\begin{aligned} {\widetilde{\mathbf {F}}}=H^{-1} \circ \mathbf {F}\circ H, \qquad {\widetilde{\mathbf {K}}}=H^{-1}\circ \mathbf {K}\circ H, \end{aligned}$$

are smooth diffeomorphisms from \(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}}\) onto their images.

Writing \( {\widetilde{\mathbf {F}}}=U_0+{\widetilde{\mathbf {f}}}\) and \({\widetilde{\mathbf {K}}}=T_\alpha +{\widetilde{\mathbf {k}}}\), where \({\widetilde{\mathbf {f}}}, {\widetilde{\mathbf {k}}}\in C^\infty (\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}},\mathbb {R}^{2})\), we have:

$$\begin{aligned} \left\| {\widetilde{\mathbf {f}}}, ~{\widetilde{\mathbf {k}}} \right\| _0\leqslant & {} C_{r,\varrho } \cdot {\mathfrak {L}} \cdot \left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\,\Vert \mathbf {f},~ \mathbf {k}\Vert _0+\frac{\Vert \mathbf {f},\mathbf {k}\Vert ^2_{\varrho +r+1}}{N^r}+\frac{\Vert \mathbf {f}, \mathbf {k}\Vert _{\varrho +r}}{N^r}\right) , \quad \text {for~} r\geqslant 0,\nonumber \\ \end{aligned}$$
(6.3)
$$\begin{aligned} \left\| {\widetilde{\mathbf {f}}},~{\widetilde{\mathbf {k}}}\right\| _r\leqslant & {} C_{r,\varrho }\,\Big (1+N^{\varrho }\Vert \mathbf {f},~\mathbf {k}\Vert _{r}\Big ),\qquad \text {for~}r> 0. \end{aligned}$$
(6.4)

Moreover, \({\widetilde{\mathbf {K}}}\) is semi-conjugate to \(R_\alpha \) via a Lipschitz semi-conjugacy \({\widetilde{W}}(x,y)=x+{\widetilde{v}}(x,y)\), where \({\widetilde{v}}\in Lip (\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}},\mathbb {R})\) has a Lipschitz bound \(\mathfrak {{\widetilde{L}}}>1\) satisfying

$$\begin{aligned} \mathfrak {{\widetilde{L}}}\leqslant {\mathfrak {L}}\,(1+2\Vert \mathbf {h}\Vert _1). \end{aligned}$$
(6.5)

Remark 6.1

In fact, to simplify the notation we have used

$$\begin{aligned}\Vert \mathbf {h}\Vert _r=\Vert \mathbf {h}\Vert _{C^r(\mathbb {T}\times \mathcal {I}_\delta )}, \qquad \left\| {\widetilde{\mathbf {f}}}, ~{\widetilde{\mathbf {k}}} \right\| _r=\left\| {\widetilde{\mathbf {f}}}, ~{\widetilde{\mathbf {k}}} \right\| _{C^r(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}})}.\end{aligned}$$

Condition (6.2) implies that \(\Vert \mathbf {f},~\mathbf {k}\Vert _1\) shall be suitably small.

The proof of Proposition 6.1 will be divided into several lemmas.

6.1 Construction of \(\mathbf {h}\)

The following lemma shows that the solution of the linearized equation \(\Delta _\alpha u=\mathrm {S}_N\mathbf {k}-[\mathrm {S}_N\mathbf {k}]\) is, to some extent, an approximate solution of the linearized equation \(\Delta _{U_0}u=\mathrm {S}_N\mathbf {f}-[\mathrm {S}_N\mathbf {f}]\). It is essentially due to the commutativity condition (see Lemma 5.2).

For simplicity we introduce the set

$$\begin{aligned}C_0^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})=\left\{ \phi (x,y)\in C^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})~:~ [\phi ](y)=\int _{\mathbb {T}}\phi (x,y)\,dx=0\right\} .\end{aligned}$$

Lemma 6.1

Given \(N>1\), there is a unique solution \(\xi _N(x,y)\in C_0^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})\) of the following equation in the unknown \(u\):

$$\begin{aligned} \Delta _\alpha u=\mathrm {S}_N\mathbf {k}-[\mathrm {S}_N\mathbf {k}]. \end{aligned}$$
(6.6)

It satisfies

$$\begin{aligned} \Vert \xi _N\Vert _{r}\leqslant C_{r',r+\varrho } N^{r-r'+\varrho }\Vert \mathbf {k}\Vert _{r'}\, , \end{aligned}$$
(6.7)

for any \(r\geqslant r'\geqslant 0\). Moreover, define \(\mathcal N\) by

$$\begin{aligned} \mathcal {N}(x,y)\overset{def }{=} \Delta _{U_0}\xi _N(x,y)-\big (\mathrm {S}_N \mathbf {f}(x,y)-[\mathrm {S}_N \mathbf {f}](y)\big ). \end{aligned}$$
(6.8)

Then,

$$\begin{aligned} \Vert \mathcal N\Vert _0\leqslant C_{\varrho ,r} \left( N^\varrho \Vert \mathbf {f},~\mathbf {k}\Vert _1\Vert \mathbf {f},~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {f},\mathbf {k}\Vert ^2_{\varrho +r+1}}{N^r}+\frac{\Vert \mathbf {f}, \mathbf {k}\Vert _{\varrho +r}}{N^r}\right) ,\quad \text {for~} r\geqslant 0. \end{aligned}$$
(6.9)

Proof

By Lemma 5.1, there is a unique solution denoted by \(\xi _N(x,y)\in C_0^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})\) to the linear equation (6.6), and by estimate (5.8), it follows that \(\Vert \xi _N\Vert _{r}\leqslant C\Vert \mathrm {S}_N \mathbf {k}\Vert _{r+\varrho }\). Then, due to Lemma 5.3 we have

$$\begin{aligned} \Vert \xi _N\Vert _{r}\leqslant C_{r',r+\varrho } N^{r-r'+\varrho }\Vert \mathbf {k}\Vert _{r'}\,, \end{aligned}$$

for any \(r\geqslant r'\geqslant 0\). Next, we consider the function \(\mathcal N\). Recall that the smoothing operators \(\mathrm {S}_N\) are constructed by convolution. It is then easy to see that every \(\mathrm {S}_N\) commutes with the operator \(\Delta _\alpha \), and that \(\Delta _\alpha \) also commutes with \(\Delta _{U_0}\), namely

$$\begin{aligned}\Delta _\alpha \Delta _{U_0}=\Delta _{U_0}\Delta _\alpha ,\quad \Delta _\alpha \,\mathrm {S}_N=\mathrm {S}_N\,\Delta _\alpha .\end{aligned}$$
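As a quick check of the first identity, here is a sketch under the assumption that the operators act as \(\Delta _\alpha u=u\circ T_\alpha -u\) and \(\Delta _{U_0}u=u\circ U_0-DU_0\cdot u\), which is how they are used in this section (the precise definitions are the ones given earlier in the paper). Since \(DU_0\) is a constant matrix and \(U_0\circ T_\alpha =T_\alpha \circ U_0\), we have

$$\begin{aligned} \Delta _\alpha \Delta _{U_0}u=&\, u\circ U_0\circ T_\alpha -DU_0\cdot (u\circ T_\alpha )-u\circ U_0+DU_0\cdot u,\\ \Delta _{U_0}\Delta _\alpha u=&\, u\circ T_\alpha \circ U_0-u\circ U_0-DU_0\cdot (u\circ T_\alpha )+DU_0\cdot u, \end{aligned}$$

and the two right-hand sides coincide term by term.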

Then \(\mathcal N\) satisfies the following equation

$$\begin{aligned} \begin{aligned} \Delta _\alpha \mathcal N=&\,\Delta _\alpha \Delta _{U_0}\xi _N-\Delta _\alpha \mathrm {S}_N \mathbf {f}+\Delta _\alpha [\mathrm {S}_N \mathbf {f}] = \Delta _{U_0}\Delta _\alpha \xi _N-\Delta _\alpha \mathrm {S}_N \mathbf {f}\\ =&\, \Delta _{U_0}(\mathrm {S}_N\mathbf {k}-[\mathrm {S}_N\mathbf {k}])-\Delta _\alpha \mathrm {S}_N \mathbf {f}\\ =&\, \Delta _{U_0}\mathrm {S}_N\mathbf {k}- \Delta _{U_0}[\mathrm {S}_N\mathbf {k}]-\Delta _\alpha \mathrm {S}_N \mathbf {f}\\ =&\, \mathcal {L}(\mathrm {S}_N\mathbf {f},\mathrm {S}_N\mathbf {k})- \Delta _{U_0}[\mathrm {S}_N\mathbf {k}].\\ \end{aligned}\nonumber \\ \end{aligned}$$
(6.10)

See also (5.14) for the definition of \(\mathcal {L}\). Note that the average

$$\begin{aligned} {}[\mathcal {N}]=[\Delta _{U_0}\xi _N]=&\, \left( \begin{array}{ll} \int _{\mathbb {T}}\xi _{N,1}(x+y,y)\, dx-\int _{\mathbb {T}}\xi _{N,1}(x,y)\, dx-\int _{\mathbb {T}}\xi _{N,2}(x,y)\, dx\\ \int _{\mathbb {T}}\xi _{N,2}(x+y,y)\, dx-\int _{\mathbb {T}}\xi _{N,2}(x,y)\, dx \end{array} \right) \\ =&\, \left( \begin{array}{ll} -[\xi _{N,2}]\\ 0 \end{array} \right) =\mathbf {0} \end{aligned}$$

as a result of \(\xi _N\in C_0^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})\). Thus, applying Lemma 5.1 to (6.10) we deduce that

$$\begin{aligned} \left\| \mathcal {N}\right\| _0\leqslant&\, C\left\| ~\mathcal {L}(\mathrm {S}_N\mathbf {f},\mathrm {S}_N\mathbf {k})- \Delta _{U_0}[\mathrm {S}_N\mathbf {k}]~\right\| _\varrho \,. \end{aligned}$$

Since \(\mathrm {S}_N=id-\mathrm {R}_N\), we have

$$\begin{aligned} \mathcal {L}(\mathrm {S}_N\mathbf {f},\mathrm {S}_N\mathbf {k})=&\, \mathcal {L}(\mathbf {f},\mathbf {k})-\Delta _{U_0}\mathrm {R}_N\mathbf {k}+\Delta _{\alpha }\mathrm {R}_N\mathbf {f}\\ =&\, \mathrm {S}_N\mathcal {L}(\mathbf {f},\mathbf {k})+ \mathrm {R}_N\mathcal {L}(\mathbf {f},\mathbf {k})-\Delta _{U_0}\mathrm {R}_N\mathbf {k}+\Delta _{\alpha }\mathrm {R}_N\mathbf {f}. \end{aligned}$$

Thus, by Lemma 5.3 and inequality (5.15) of Lemma 5.2 we deduce that: for any \(r\geqslant 0\),

$$\begin{aligned} \Vert \mathcal {L}(\mathrm {S}_N\mathbf {f},\mathrm {S}_N\mathbf {k})\Vert _\varrho&\leqslant C_{\varrho ,r} \left( N^\varrho \Vert \mathcal {L}(\mathbf {f},\mathbf {k})\Vert _0+ \frac{\Vert \mathcal {L}(\mathbf {f},\mathbf {k})\Vert _{\varrho +r}}{N^r}+ \Vert \mathrm {R}_N\mathbf {k}\Vert _\varrho +\Vert \mathrm {R}_N\mathbf {f}\Vert _\varrho \right) \nonumber \\&\leqslant C'_{\varrho ,r} \left( N^\varrho \Vert \mathbf {f},\mathbf {k}\Vert _1\Vert \mathbf {f},\mathbf {k}\Vert _0+\frac{\Vert \mathbf {f},\mathbf {k}\Vert _{\varrho +r+1}\Vert \mathbf {f},\mathbf {k}\Vert _{\varrho +r}}{N^r}+\frac{\Vert \mathbf {f}, \mathbf {k}\Vert _{\varrho +r}}{N^r} \right) \nonumber \\&\leqslant C'_{\varrho ,r}\left( N^\varrho \Vert \mathbf {f},\mathbf {k}\Vert _1\Vert \mathbf {f},\mathbf {k}\Vert _0+\frac{\Vert \mathbf {f},\mathbf {k}\Vert ^2_{\varrho +r+1}}{N^r}+\frac{\Vert \mathbf {f}, \mathbf {k}\Vert _{\varrho +r}}{N^r} \right) . \end{aligned}$$
(6.11)

Meanwhile, it is easy to check that \(\Delta _{U_0}[\mathrm {S}_N\mathbf {k}]=(-[\mathrm {S}_N\mathbf {k}_2],0)\). Then, using Lemma 5.3 and inequality (5.16) of Lemma 5.2, we obtain

$$\begin{aligned} \Vert \Delta _{U_0}[\mathrm {S}_N\mathbf {k}]\Vert _\varrho = \Vert ~[\mathrm {S}_N\mathbf {k}_2]~\Vert _\varrho = \Vert ~\mathrm {S}_N[\mathbf {k}_2]~\Vert _\varrho \leqslant&\, C_{\varrho } \,N^{\varrho } \Vert ~[\mathbf {k}_2]~\Vert _{0}\nonumber \\ \leqslant&\, C_\varrho N^{\varrho }\left( \Vert \mathbf {f}\Vert _1\Vert \mathbf {k}\Vert _0+\Vert \mathbf {k}\Vert _1\Vert \mathbf {f}\Vert _0\right) . \end{aligned}$$
(6.12)

Therefore, (6.11) together with (6.12) implies that

$$\begin{aligned} \left\| \mathcal {N}\right\| _0 \leqslant&\, C_{\varrho ,r} \left( N^\varrho \Vert \mathbf {f},~\mathbf {k}\Vert _1\Vert \mathbf {f},~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {f},\mathbf {k}\Vert ^2_{\varrho +r+1}}{N^r}+\frac{\Vert \mathbf {f}, \mathbf {k}\Vert _{\varrho +r}}{N^r}\right) ,\qquad \text {for any~} r\geqslant 0. \end{aligned}$$

\(\square \)

Based on the solution \(\xi _N\) obtained in Lemma 6.1, we construct the near-identity conjugacy \(H=id +\mathbf {h}\) as follows:

$$\begin{aligned} \mathbf {h}\overset{def }{=}\left( \begin{array}{r} 0\\ -{[}\mathrm {S}_N \mathbf {f}_1{]} \end{array} \right) +\xi _N=\left( \begin{array}{r} \xi _{N,1}\\ -{[}\mathrm {S}_N \mathbf {f}_1{]}+\xi _{N,2} \end{array} \right) , \end{aligned}$$
(6.13)

where we write \(\xi _N=(\xi _{N,1}, \xi _{N,2})\). Note that \(\Delta _\alpha [\mathrm {S}_N \mathbf {f}_1]=0\), so \(\mathbf {h}\in C^\infty (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R}^{2})\) is still a solution of (6.6), that is

$$\begin{aligned}\Delta _\alpha \mathbf {h}=\mathrm {S}_N\mathbf {k}-[\mathrm {S}_N\mathbf {k}].\end{aligned}$$

However, the average \([\mathbf {h}]\ne 0\) in general.

Lemma 6.2

\(\mathbf {h}\) satisfies the following estimates.

$$\begin{aligned} \Vert \mathbf {h}\Vert _r \leqslant&\, C_{r,\varrho } N^{\varrho }\Vert \mathbf {f},~\mathbf {k}\Vert _r\, ,\qquad \text {for every~ } r\geqslant 0. \end{aligned}$$
(6.14)
$$\begin{aligned} \Vert \mathbf {h}\Vert _r\leqslant&\, C_{r',r,\varrho } N^{r-r'+\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _{r'}\, ,\qquad \text {for every~ } r\geqslant r'\geqslant 0. \end{aligned}$$
(6.15)

Moreover, under assumption (6.2), the map \(H=id +\mathbf {h}\) has a smooth inverse

$$\begin{aligned}H^{-1}: \mathbb {T}\times \mathcal {I}_{\delta -\theta }\longrightarrow \mathbb {T}\times \mathbb {R}\end{aligned}$$

which is a smooth diffeomorphism from \(\mathbb {T}\times \mathcal {I}_{\delta -\theta }\) onto its image, and \(H^{-1}(\mathbb {T}\times \mathcal {I}_{\delta -\theta })\) \(\subset \) \(\mathbb {T}\times \mathcal {I}_{\delta }\).

Proof

Applying Lemma 5.3 and inequality (6.7) to the formula (6.13),

$$\begin{aligned} \begin{aligned} \Vert \mathbf {h}\Vert _r\leqslant \Vert \mathrm {S}_N \mathbf {f}\Vert _{r}+\Vert \xi _N\Vert _{r}\leqslant&\, C_{r',r}N^{r-r'}\Vert \mathbf {f}\Vert _{r'}+C_{r',r+\varrho }N^{r-r'+\varrho }\Vert \mathbf {k}\Vert _{r'}\\ \leqslant&\, C_{r',r,\varrho } N^{r-r'+\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _{r'} \end{aligned} \end{aligned}$$

for any \(r\geqslant r'\geqslant 0\), where the constant \(C_{r', r,\varrho }>0\) depends on \(r', r\) and \(\varrho \). This proves the desired estimate (6.15). In particular, (6.14) follows immediately by taking \(r=r'\).

By assumption (6.2), we infer that \(\theta =\Vert \mathbf {h}\Vert _1\) satisfies

$$\begin{aligned}\theta <\delta /2\leqslant \frac{1}{4}.\end{aligned}$$

Then, Proposition A.1 implies that the map \(H=id +\mathbf {h}\) has a smooth inverse \(H^{-1}\), which is a smooth diffeomorphism from \(\mathbb {T}\times \mathcal {I}_{\delta -\theta }\) onto its image, and \(H^{-1}(\mathbb {T}\times \mathcal {I}_{\delta -\theta })\) \(\subset \) \(\mathbb {T}\times \mathcal {I}_{\delta }\). \(\square \)

6.2 \(C^0\)-norm estimates of the new errors

By assumption (6.2), \(\theta =\Vert \mathbf {h}\Vert _1\) and \(\theta '=\Vert \mathbf {f}, \mathbf {k}\Vert _0\) satisfy

$$\begin{aligned} {\widetilde{\delta }}:=\delta -2\theta -\theta '>0. \end{aligned}$$
(6.16)

Then, it is easy to see that \(\mathbf {F}\circ H (\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}})\subset \mathbb {T}\times \mathcal {I}_{\delta -\theta }\). According to Lemma 6.2, \(H^{-1}\) is well defined on \(\mathbb {T}\times \mathcal {I}_{\delta -\theta }\), so we have the following conjugated map

$$\begin{aligned} \widetilde{\mathbf {F}}=H^{-1}\circ \mathbf {F}\circ H: ~ \mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}}\longrightarrow \mathbb {T}\times \mathbb {R}\end{aligned}$$

which is a smooth diffeomorphism from \(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}}\) onto its image. Similarly,

$$\begin{aligned} \widetilde{\mathbf {K}}=H^{-1}\circ \mathbf {K}\circ H: ~ \mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}}\longrightarrow \mathbb {T}\times \mathbb {R}\end{aligned}$$

is also a smooth diffeomorphism from \(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}} \) onto its image.
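The inclusion used above can be sketched as follows, assuming (as the notation suggests) that the intervals \(\mathcal {I}_a\) are nested in such a way that moving a point of \(\mathcal {I}_a\) by at most \(b\) keeps it in \(\mathcal {I}_{a+b}\). For \((x,y)\in \mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}}\), the second component of \(\mathbf {F}\circ H(x,y)\) is \(y+\mathbf {h}_2(x,y)+\mathbf {f}_2\circ H(x,y)\), and

$$\begin{aligned} \left| \mathbf {h}_2(x,y)\right| \leqslant \Vert \mathbf {h}\Vert _1=\theta ,\qquad \left| \mathbf {h}_2(x,y)+\mathbf {f}_2\circ H(x,y)\right| \leqslant \theta +\theta ',\qquad {\widetilde{\delta }}+\theta +\theta '=\delta -\theta . \end{aligned}$$

The first bound gives \(H(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}})\subset \mathbb {T}\times \mathcal {I}_{\delta }\), so \(\mathbf {f}\circ H\) is defined, and the second gives \(\mathbf {F}\circ H(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}})\subset \mathbb {T}\times \mathcal {I}_{\delta -\theta }\). The same computation with \(\mathbf {k}_2\) in place of \(\mathbf {f}_2\) applies to \(\mathbf {K}\circ H\).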

We write \(\widetilde{\mathbf {F}}=U_0+\widetilde{\mathbf {f}}\) and \(\widetilde{\mathbf {K}}=T_\alpha +\widetilde{\mathbf {k}}\), where \({\widetilde{\mathbf {f}}}, {\widetilde{\mathbf {k}}} \in C^\infty (\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}},\mathbb {R}^{2})\). We will show that the new errors \(\Vert {\widetilde{\mathbf {f}}}\Vert _0\) and \(\Vert {\widetilde{\mathbf {k}}}\Vert _0\) are of higher order. As we will see below, the hard part is the average terms. This is the only place where we need the intersection property and the Lipschitz semi-conjugacy condition.

Lemma 6.3

For every \(r\geqslant 0\),

$$\begin{aligned} \left\| {\widetilde{\mathbf {f}}}\right\| _0 \leqslant C_{r,\varrho }\left( N^\varrho \Vert \mathbf {f},~\mathbf {k}\Vert _1\Vert \mathbf {f},~\mathbf {k}\Vert _0 +\frac{\Vert \mathbf {f},\mathbf {k}\Vert ^2_{\varrho +r+1}}{N^r}+\frac{\Vert \mathbf {f}, \mathbf {k}\Vert _{\varrho +r}}{N^r}\right) . \end{aligned}$$
(6.17)

For \({\widetilde{\mathbf {k}}}=({\widetilde{\mathbf {k}}}_1,{\widetilde{\mathbf {k}}}_2)\), it satisfies

$$\begin{aligned} \left\| {\widetilde{\mathbf {k}}}_1\right\| _0\leqslant C_{r, \varrho }\cdot {\mathfrak {L}} \cdot \left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {k}\Vert _{r}}{N^r}\right) , \end{aligned}$$
(6.18)
$$\begin{aligned} \left\| {\widetilde{\mathbf {k}}}_2\right\| _0\leqslant C_{r, \varrho }\left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {k}\Vert _{r}}{N^r}\right) . \end{aligned}$$
(6.19)

Here, \({\mathfrak {L}}>1\) is a Lipschitz bound of \(v(x,y)\) for the semi-conjugacy \(W(x,y)=x+v(x,y)\).

Proof

We first consider \({\widetilde{\mathbf {f}}}\). Note that the identity \( H\circ \widetilde{\mathbf {F}}=\mathbf {F}\circ H \) implies

$$\begin{aligned} \widetilde{\mathbf {f}}=U_0\circ H-U_0+\mathbf {f}\circ H-\mathbf {h}\circ \widetilde{\mathbf {F}}.\end{aligned}$$

In light of \(U_0(x,y)=(x+y, y)\) and \(\mathbf {h}\) given in (6.13), we deduce that

$$\begin{aligned} \widetilde{\mathbf {f}}=&\left( \begin{array}{l} \mathbf {h}_1+\mathbf {h}_2\\ \mathbf {h}_2 \end{array}\right) +\mathbf {f}\circ H-\mathbf {h}\circ \widetilde{\mathbf {F}}\\ =&-\Delta _{U_0}\mathbf {h}+\mathbf {h}\circ U_0+\mathbf {f}\circ H-\mathbf {h}\circ \widetilde{\mathbf {F}}\\ =&-\Delta _{U_0}\mathbf {h}+\mathbf {f}+ (\mathbf {f}\circ H-\mathbf {f}+\mathbf {h}\circ U_0-\mathbf {h}\circ \widetilde{\mathbf {F}})\\ =&\left( \begin{array}{l} -[\mathrm {S}_N\mathbf {f}_1]\\ 0 \end{array}\right) -\Delta _{U_0}\xi _N+ \mathbf {f}+(\mathbf {f}\circ H-\mathbf {f}+\mathbf {h}\circ U_0-\mathbf {h}\circ \widetilde{\mathbf {F}})\\ =&\left( \begin{array}{l} 0\\ {[}\mathrm {S}_N\mathbf {f}_2{]} \end{array}\right) -\left( [\mathrm {S}_N\mathbf {f}]+\Delta _{U_0}\xi _N-\mathrm {S}_N\mathbf {f}\right) + \mathrm {R}_N\mathbf {f}+(\mathbf {f}\circ H-\mathbf {f}+\mathbf {h}\circ U_0-\mathbf {h}\circ \widetilde{\mathbf {F}})\\ =&\left( \begin{array}{l} 0\\ {[}\mathrm {S}_N\mathbf {f}_2{]} \end{array}\right) -\mathcal {N}+(\mathrm {R}_N\mathbf {f}+\mathbf {f}\circ H-\mathbf {f}+\mathbf {h}\circ U_0-\mathbf {h}\circ \widetilde{\mathbf {F}}), \end{aligned}$$

where \(\mathcal N\) is given in (6.8). Writing \({\widetilde{\mathbf {f}}}=(\widetilde{\mathbf {f}}_1,\widetilde{\mathbf {f}}_2 )\) and \(\mathcal {N}=(\mathcal {N}_1, \mathcal {N}_2)\), we get

$$\begin{aligned} \begin{array}{lll} \widetilde{\mathbf {f}}_1=&{} &{}-\mathcal {N}_1+ \mathrm {R}_N\mathbf {f}_1+(\mathbf {f}_1\circ H-\mathbf {f}_1)-(\mathbf {h}_1\circ {\widetilde{\mathbf {F}}}-\mathbf {h}_1\circ U_0),\\ \widetilde{\mathbf {f}}_2= &{}[\mathrm {S}_N \mathbf {f}_2]&{}-\mathcal {N}_2+ \mathrm {R}_N\mathbf {f}_2+(\mathbf {f}_2\circ H-\mathbf {f}_2)-(\mathbf {h}_2\circ {\widetilde{\mathbf {F}}}-\mathbf {h}_2\circ U_0). \end{array} \end{aligned}$$

Basically, \({\widetilde{\mathbf {f}}}_1\) is of higher order. In fact, we get the following preliminary estimate

$$\begin{aligned} \begin{aligned} \left\| {\widetilde{\mathbf {f}}}_1\right\| _0\leqslant&\left\| \mathcal {N}\right\| _0+ \left\| \mathrm {R}_N\mathbf {f}\right\| _0+\left\| \mathbf {f}\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}\right\| _1\left\| {\widetilde{\mathbf {f}}}\right\| _0\,. \end{aligned} \end{aligned}$$
(6.20)

As for \({\widetilde{\mathbf {f}}}_2\), the hard part is the average term \([\mathrm {S}_N\mathbf {f}_2]\) which is only of order one without further information. It is here that the intersection property of \({\widetilde{\mathbf {F}}}\) comes into play, causing this term to be of higher order. More precisely, as \(\widetilde{\mathbf {F}}(x,y)=(x+y+{\widetilde{\mathbf {f}}}_1, y+{\widetilde{\mathbf {f}}}_2)\) satisfies the intersection property, we have that for each point \(y\in \mathcal {I}_{{\widetilde{\delta }}}\),

$$\begin{aligned}\big (\mathbb {T}\times \{y\}\big ) ~\bigcap ~{\widetilde{\mathbf {F}}}\big (\mathbb {T}\times \{y\}\big ) \ne \emptyset , \end{aligned}$$

which implies that for every y, the map \(x\longmapsto {\widetilde{\mathbf {f}}}_2(x,y)\) has zeros. Hence, it follows that

$$\begin{aligned} \begin{aligned} \left\| {\widetilde{\mathbf {f}}}_2\right\| _0\leqslant&\, 2\left\| {\widetilde{\mathbf {f}}}_2-[\mathrm {S}_N \mathbf {f}_2]\right\| _0\\ \leqslant&\, 2\Big ( \left\| \mathcal {N}_2\right\| _0+\left\| \mathrm {R}_N \mathbf {f}_2\right\| _0+\left\| \mathbf {f}_2\circ H-\mathbf {f}_2\right\| _0+\left\| \mathbf {h}_2\circ {\widetilde{\mathbf {F}}}-\mathbf {h}_2\circ U_0\right\| _0\Big )\\ \leqslant&\, 2\Big ( \left\| \mathcal {N}_2\right\| _0+\left\| \mathrm {R}_N \mathbf {f}_2\right\| _0+\left\| D\mathbf {f}_2\right\| _0\left\| \mathbf {h}\right\| _0+\left\| D\mathbf {h}_2\right\| _0\left\| {\widetilde{\mathbf {f}}}\right\| _0\Big )\\ \leqslant&\, 2\Big ( \left\| \mathcal {N}\right\| _0+\left\| \mathrm {R}_N \mathbf {f}\right\| _0+\left\| \mathbf {f}\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}\right\| _1\left\| \widetilde{\mathbf {f}}\right\| _0\Big ). \end{aligned}\nonumber \\ \end{aligned}$$
(6.21)
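The first inequality in (6.21) can be unpacked as follows. Fix \(y\in \mathcal {I}_{{\widetilde{\delta }}}\) and choose a point \(x_y\in \mathbb {T}\) (a name introduced here only for this check) with \({\widetilde{\mathbf {f}}}_2(x_y,y)=0\). Since \([\mathrm {S}_N\mathbf {f}_2]\) depends only on \(y\),

$$\begin{aligned} \big |[\mathrm {S}_N\mathbf {f}_2](y)\big |=&\, \big |{\widetilde{\mathbf {f}}}_2(x_y,y)-[\mathrm {S}_N\mathbf {f}_2](y)\big |\leqslant \left\| {\widetilde{\mathbf {f}}}_2-[\mathrm {S}_N \mathbf {f}_2]\right\| _0,\\ \big |{\widetilde{\mathbf {f}}}_2(x,y)\big |\leqslant&\, \big |{\widetilde{\mathbf {f}}}_2(x,y)-[\mathrm {S}_N\mathbf {f}_2](y)\big |+\big |[\mathrm {S}_N\mathbf {f}_2](y)\big |\leqslant 2\left\| {\widetilde{\mathbf {f}}}_2-[\mathrm {S}_N \mathbf {f}_2]\right\| _0 \end{aligned}$$

for every \(x\in \mathbb {T}\). In other words, the existence of a zero on each circle lets the non-average part control the average term.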

Since \(\left\| {\widetilde{\mathbf {f}}}\right\| _0=\max \left\{ \left\| {\widetilde{\mathbf {f}}}_1\right\| _0,~\left\| {\widetilde{\mathbf {f}}}_2\right\| _0\right\} \), we combine (6.21) with (6.20) to obtain

$$\begin{aligned} \left\| {\widetilde{\mathbf {f}}}\right\| _0 \leqslant 2\Big ( \left\| \mathcal {N}\right\| _0+\left\| \mathrm {R}_N \mathbf {f}\right\| _0+\left\| \mathbf {f}\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}\right\| _1\left\| \widetilde{\mathbf {f}}\right\| _0\Big ). \end{aligned}$$

This yields

$$\begin{aligned} (1-2\left\| \mathbf {h}\right\| _1)\cdot \left\| {\widetilde{\mathbf {f}}}\right\| _0 \leqslant 2\Big ( \left\| \mathcal {N}\right\| _0+\left\| \mathrm {R}_N \mathbf {f}\right\| _0+\left\| \mathbf {f}\right\| _1\left\| \mathbf {h}\right\| _0\Big ). \end{aligned}$$

As \(\Vert \mathbf {h}\Vert _1=\theta <\delta /2\leqslant 1/4\), we infer that

$$\begin{aligned} \left\| {\widetilde{\mathbf {f}}}\right\| _0 \leqslant&\, 4\Big (\left\| \mathcal {N}\right\| _0+\left\| \mathrm {R}_N \mathbf {f}\right\| _0+\left\| \mathbf {f}\right\| _1\left\| \mathbf {h}\right\| _0\Big ). \end{aligned}$$

Here, by estimate (6.14) and Lemma 5.3 we readily get

$$\begin{aligned} \left\| \mathrm {R}_N \mathbf {f}\right\| _0\leqslant C_r \frac{\left\| \mathbf {f}\right\| _{r}}{N^{r}},\qquad \left\| \mathbf {f}\right\| _1\left\| \mathbf {h}\right\| _0\leqslant C_{\varrho } N^{\varrho }\left\| \mathbf {f}\right\| _1\,\Vert \mathbf {f},~\mathbf {k}\Vert _0,\end{aligned}$$

for any \(r\geqslant 0\). The term \(\Vert \mathcal N\Vert _0\) can be estimated by (6.9). Thus, the desired estimate (6.17) follows immediately.

Now, we turn to investigate \(\widetilde{\mathbf {k}}\). Observe that

$$\begin{aligned} {\widetilde{\mathbf {k}}}=H^{-1}\circ \mathbf {K}\circ H-T_\alpha =(H^{-1}-id )\circ \mathbf {K}\circ H+ \mathbf {h}+\mathbf {k}\circ H. \end{aligned}$$

Then, using Proposition A.1 we obtain a preliminary estimate for \(\Vert {\widetilde{\mathbf {k}}}\Vert _0\) which will be useful below,

$$\begin{aligned} \Vert {\widetilde{\mathbf {k}}}\Vert _0\leqslant \Vert H^{-1}-id \Vert _0+ \Vert \mathbf {h}\Vert _0+\Vert \mathbf {k}\Vert _0\leqslant 2\Vert \mathbf {h}\Vert _0+\Vert \mathbf {k}\Vert _0. \end{aligned}$$
(6.22)

On the other hand, we deduce from the conjugacy equation \( H\circ \widetilde{\mathbf {K}}=\mathbf {K}\circ H \) that

$$\begin{aligned} \begin{aligned} {\widetilde{\mathbf {k}}}&= \mathbf {k}\circ H-\mathbf {h}\circ {\widetilde{\mathbf {K}}}+\mathbf {h}\\&=\mathbf {k}-\Delta _\alpha \mathbf {h}+(\mathbf {k}\circ H-\mathbf {k})-(\mathbf {h}\circ {\widetilde{\mathbf {K}}}-\mathbf {h}\circ T_\alpha )\\&= [\mathrm {S}_N \mathbf {k}]+\mathrm {R}_N \mathbf {k}+ (\mathbf {k}\circ H-\mathbf {k})-(\mathbf {h}\circ {\widetilde{\mathbf {K}}}-\mathbf {h}\circ T_\alpha ), \end{aligned} \end{aligned}$$

where for the last line we used the fact \(\Delta _\alpha \mathbf {h}=\mathrm {S}_N\mathbf {k}-[\mathrm {S}_N\mathbf {k}]\). Then, for \({\widetilde{\mathbf {k}}}=({\widetilde{\mathbf {k}}}_1, {\widetilde{\mathbf {k}}}_2)\),

$$\begin{aligned} {\widetilde{\mathbf {k}}}_1=[\mathrm {S}_N \mathbf {k}_1]+{\widetilde{\mathbf {k}}}_1',\quad \text {with~} {\widetilde{\mathbf {k}}}_1'=\mathrm {R}_N \mathbf {k}_1+ (\mathbf {k}_1\circ H-\mathbf {k}_1)-(\mathbf {h}_1\circ {\widetilde{\mathbf {K}}}-\mathbf {h}_1\circ T_\alpha ), \end{aligned}$$
(6.23)

and

$$\begin{aligned} {\widetilde{\mathbf {k}}}_2=[\mathrm {S}_N \mathbf {k}_2]+\mathrm {R}_N \mathbf {k}_2+ (\mathbf {k}_2\circ H-\mathbf {k}_2)-(\mathbf {h}_2\circ {\widetilde{\mathbf {K}}}-\mathbf {h}_2\circ T_\alpha ). \end{aligned}$$
(6.24)
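For completeness, the first two lines of the decomposition of \({\widetilde{\mathbf {k}}}\) above can be checked as follows, assuming that \(\Delta _\alpha u=u\circ T_\alpha -u\), which is consistent with how the operator is used here. Since \(T_\alpha \) is a translation, \(T_\alpha \circ H=T_\alpha +\mathbf {h}\), and therefore \(H\circ \widetilde{\mathbf {K}}=\mathbf {K}\circ H\) reads

$$\begin{aligned} T_\alpha +{\widetilde{\mathbf {k}}}+\mathbf {h}\circ {\widetilde{\mathbf {K}}}=&\, T_\alpha +\mathbf {h}+\mathbf {k}\circ H,\\ {\widetilde{\mathbf {k}}}=&\, \mathbf {k}\circ H-\mathbf {h}\circ {\widetilde{\mathbf {K}}}+\mathbf {h}\\ =&\, \mathbf {k}-(\mathbf {h}\circ T_\alpha -\mathbf {h})+(\mathbf {k}\circ H-\mathbf {k})-(\mathbf {h}\circ {\widetilde{\mathbf {K}}}-\mathbf {h}\circ T_\alpha ). \end{aligned}$$

The last line is obtained by adding and subtracting \(\mathbf {k}\) and \(\mathbf {h}\circ T_\alpha \).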

For the term \({\widetilde{\mathbf {k}}}_2\), we apply estimate (6.22) to obtain that

$$\begin{aligned} \begin{aligned} \left\| {\widetilde{\mathbf {k}}}_2\right\| _0\leqslant&\left\| ~[\mathrm {S}_N \mathbf {k}_2]~\right\| _0+\left\| \mathrm {R}_N \mathbf {k}_2\right\| _0+\left\| \mathbf {k}_2\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}_2\right\| _1\left\| \widetilde{\mathbf {k}}\right\| _0\,\\ \leqslant&\left\| ~[\mathbf {k}_2]-[\mathrm {R}_N\mathbf {k}_2]~\right\| _0+\left\| \mathrm {R}_N \mathbf {k}_2\right\| _0+\left\| \mathbf {k}_2\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}_2\right\| _1 \,(2\Vert \mathbf {h}\Vert _0+\Vert \mathbf {k}\Vert _0)\\ \leqslant&\left\| ~[\mathbf {k}_2]~\right\| _0+2\left\| \mathrm {R}_N \mathbf {k}_2\right\| _0+\left\| \mathbf {k}_2\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}_2\right\| _1 \,(2\Vert \mathbf {h}\Vert _0+\Vert \mathbf {k}\Vert _0)\\ \leqslant&2\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+2\left\| \mathrm {R}_N \mathbf {k}\right\| _0+\left\| \mathbf {k}\right\| _1\left\| \mathbf {h}\right\| _0+2\left\| \mathbf {h}\right\| _1\left\| \mathbf {h}\right\| _0+\left\| \mathbf {h}\right\| _1\left\| \mathbf {k}\right\| _0, \end{aligned}\nonumber \\ \end{aligned}$$
(6.25)

where the last line used Lemma 5.2 to estimate \(\Vert [\mathbf {k}_2]\Vert _0\). Now, applying (6.14) to estimate \(\Vert \mathbf {h}\Vert _1\) and \(\Vert \mathbf {h}\Vert _0\) we can show that

$$\begin{aligned} \left\| \mathbf {k}\right\| _1\left\| \mathbf {h}\right\| _0 \leqslant C_{\varrho }N^{\varrho }\left\| \mathbf {k}\right\| _1\Vert \mathbf {f},~\mathbf {k}\Vert _0, \quad \left\| \mathbf {h}\right\| _1\left\| \mathbf {h}\right\| _0\leqslant C_{1,\varrho }N^{2\varrho }\left\| \mathbf {f},~\mathbf {k}\right\| _1\Vert \mathbf {f},~\mathbf {k}\Vert _0\,, \end{aligned}$$

and

$$\begin{aligned}\left\| \mathbf {h}\right\| _1\left\| \mathbf {k}\right\| _0\leqslant C_{1,\varrho }N^{\varrho }\left\| \mathbf {f},~\mathbf {k}\right\| _1\Vert \mathbf {k}\Vert _0\,. \end{aligned}$$

The term \(\left\| \mathrm {R}_N \mathbf {k}\right\| _0\) can be estimated using Lemma 5.3. Therefore, (6.25) reduces to

$$\begin{aligned} \left\| {\widetilde{\mathbf {k}}}_2\right\| _0\leqslant C_{r, \varrho }\left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {k}\Vert _{r}}{N^r}\right) ,\qquad \text {for any~} r\geqslant 0. \end{aligned}$$
(6.26)

This verifies the desired estimate (6.19).

Using similar arguments, one can also show that

$$\begin{aligned} \left\| {\widetilde{\mathbf {k}}}_1'\right\| _0\leqslant C_{r, \varrho }\left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {k}\Vert _{r}}{N^r}\right) ,\qquad \text {for any~} r\geqslant 0. \end{aligned}$$
(6.27)

Thus, in order to complete the \(C^0\) norm estimate of \(\widetilde{\mathbf {k}}_1=[\mathrm {S}_N \mathbf {k}_1]+{\widetilde{\mathbf {k}}}_1'\), it remains to control the average term \([\mathrm {S}_N \mathbf {k}_1](y)\). In general, \([\mathrm {S}_N\mathbf {k}_1]\) is only of order one without further information. This is the moment where we need the Lipschitz semi-conjugacy condition. Recall that \(\mathbf {K}\) is semi-conjugate to \(R_\alpha \) via a Lipschitz semi-conjugacy \(W:\mathbb {T}\times \mathcal {I}_\delta \rightarrow \mathbb {T}\), which can be written as \(W(x,y)=x+v(x,y)\) with \(v\in Lip (\mathbb {T}\times \mathcal {I}_\delta ,\mathbb {R})\). Define \({\widetilde{W}}(x,y):=W\circ H(x,y)\). It is Lipschitz continuous and

$$\begin{aligned} {\widetilde{W}}(x,y)=x+\widetilde{v}(x,y),\quad \text {with~} \widetilde{v}(x,y)=\mathbf {h}_1(x,y)+v\circ H(x,y). \end{aligned}$$
(6.28)

Clearly, \({\widetilde{\mathbf {K}}}\) is semi-conjugate to \(R_\alpha \) via the semi-conjugacy \({\widetilde{W}}\), that is \({\widetilde{W}}\circ \widetilde{\mathbf {K}}=R_\alpha \circ {\widetilde{W}}\) on \(\mathbb {T}\times \mathcal {I}_{{\widetilde{\delta }}}\).

By (6.28), the semi-conjugacy equation \({\widetilde{W}}\circ {\widetilde{\mathbf {K}}}=R_\alpha \circ {\widetilde{W}}\) reduces to

$$\begin{aligned} x+\alpha +{\widetilde{\mathbf {k}}}_1+{\widetilde{v}}\circ \widetilde{\mathbf {K}}=x+{\widetilde{v}}+\alpha , \end{aligned}$$

or equivalently, \([\mathrm {S}_N \mathbf {k}_1]+{\widetilde{\mathbf {k}}}_1'+{\widetilde{v}}\circ {\widetilde{\mathbf {K}}}-{\widetilde{v}}=0\). It can be rewritten as

$$\begin{aligned} \begin{aligned} {}[\mathrm {S}_N \mathbf {k}_1](y)=&-{\widetilde{\mathbf {k}}}_1'-{\widetilde{v}} (x+\alpha +[\mathrm {S}_N \mathbf {k}_1]+{\widetilde{\mathbf {k}}}_1',y+{\widetilde{\mathbf {k}}}_2) +{\widetilde{v}}(x+\alpha +[\mathrm {S}_N \mathbf {k}_1],y)\\&-{\widetilde{v}}(x+\alpha +[\mathrm {S}_N \mathbf {k}_1],y)+{\widetilde{v}}. \end{aligned} \end{aligned}$$

Taking the average over \(x\in \mathbb {T}\) on both sides of the above identity, we get

$$\begin{aligned}{}[\mathrm {S}_N \mathbf {k}_1](y)&= -\int _{\mathbb {T}}{\widetilde{\mathbf {k}}}_1'\,dx-\int _{\mathbb {T}}\Big ({\widetilde{v}} (x+\alpha +[\mathrm {S}_N \mathbf {k}_1]+{\widetilde{\mathbf {k}}}_1',y+{\widetilde{\mathbf {k}}}_2)\nonumber \\&\qquad -\widetilde{v}(x+\alpha +[\mathrm {S}_N \mathbf {k}_1],y)\Big )\, dx, \end{aligned}$$
(6.29)

where we already used the fact that for each fixed y,

$$\begin{aligned}\int _{\mathbb {T}} {\widetilde{v}}(x+\alpha +[\mathrm {S}_N \mathbf {k}_1](y),y)\, dx=\int _{\mathbb {T}} {\widetilde{v}}(x,y)\, dx.\end{aligned}$$

Moreover, \(|{\widetilde{v}}(z)-{\widetilde{v}}(z')|\) \(\leqslant \) \(\mathfrak {{\widetilde{L}}}\cdot dist (z,z')\) with some Lipschitz bound \(\mathfrak {{\widetilde{L}}}>1\) that satisfies

$$\begin{aligned} \mathfrak {{\widetilde{L}}}\leqslant \Vert D\mathbf {h}_1\Vert _0+\mathfrak L\,(1+\Vert D\mathbf {h}\Vert _0)\leqslant {\mathfrak {L}}\,(1+2\Vert \mathbf {h}\Vert _1), \end{aligned}$$
(6.30)

as a consequence of (6.28) and \({\mathfrak {L}}>1\). Then, we infer from (6.29) and (6.30) that

$$\begin{aligned} \left\| ~[\mathrm {S}_N \mathbf {k}_1]~\right\| _0\leqslant \left\| {\widetilde{\mathbf {k}}}_1'\right\| _0+\mathfrak {\widetilde{L}} \cdot \left\| {\widetilde{\mathbf {k}}}_1',~ {\widetilde{\mathbf {k}}}_2\right\| _0\leqslant \left\| {\widetilde{\mathbf {k}}}_1'\right\| _0+2\mathfrak {L} \cdot \left\| {\widetilde{\mathbf {k}}}_1',~ {\widetilde{\mathbf {k}}}_2\right\| _0, \end{aligned}$$

where for the last inequality we used the fact \(\Vert \mathbf {h}\Vert _1= \theta <\delta /2\leqslant 1/4.\) This yields

$$\begin{aligned} \begin{aligned} \left\| {\widetilde{\mathbf {k}}}_1\right\| _0=\left\| ~[\mathrm {S}_N \mathbf {k}_1]+{\widetilde{\mathbf {k}}}_1'\right\| _0 \leqslant&\, \left\| {\widetilde{\mathbf {k}}}_1'\right\| _0+2\mathfrak {L} \cdot \left\| {\widetilde{\mathbf {k}}}_1',~ {\widetilde{\mathbf {k}}}_2\right\| _0+\left\| {\widetilde{\mathbf {k}}}_1'\right\| _0\\ \leqslant&\ \left( 2+2\mathfrak {L} \right) \cdot \left\| {\widetilde{\mathbf {k}}}_1',~ {\widetilde{\mathbf {k}}}_2\right\| _0\\ \leqslant&\, 4 {\mathfrak {L}} \cdot \left\| {\widetilde{\mathbf {k}}}_1',~ {\widetilde{\mathbf {k}}}_2\right\| _0, \end{aligned} \end{aligned}$$

since \({\mathfrak {L}}>1\). Thus, using (6.26)–(6.27) the desired estimate (6.18) follows immediately. \(\square \)
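The Lipschitz bound (6.30) used in the proof above can be verified directly from (6.28). A short sketch, for points \(z, z'\) in the domain and using only the Lipschitz bound \({\mathfrak {L}}>1\) of \(v\) and the mean value inequality for \(\mathbf {h}\):

$$\begin{aligned} |{\widetilde{v}}(z)-{\widetilde{v}}(z')|\leqslant&\, |\mathbf {h}_1(z)-\mathbf {h}_1(z')|+|v(H(z))-v(H(z'))|\\ \leqslant&\, \Vert D\mathbf {h}_1\Vert _0\, dist (z,z')+{\mathfrak {L}}\, dist (H(z),H(z'))\\ \leqslant&\, \big (\Vert D\mathbf {h}_1\Vert _0+{\mathfrak {L}}\,(1+\Vert D\mathbf {h}\Vert _0)\big )\, dist (z,z')\\ \leqslant&\, {\mathfrak {L}}\,(1+2\Vert \mathbf {h}\Vert _1)\, dist (z,z'), \end{aligned}$$

where the last step uses \(\Vert D\mathbf {h}_1\Vert _0\leqslant \Vert \mathbf {h}\Vert _1\leqslant {\mathfrak {L}}\Vert \mathbf {h}\Vert _1\). With \(\Vert \mathbf {h}\Vert _1<1/4\) this gives \(\mathfrak {{\widetilde{L}}}<\frac{3}{2}{\mathfrak {L}}\leqslant 2{\mathfrak {L}}\).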

6.3 Proof of Proposition 6.1

By what we have shown above, the desired \(C^r\)-estimate (6.1) of \(\mathbf {h}\) follows from Lemma 6.2. The desired \(C^0\)-estimate (6.3) of \(\left\| {\widetilde{\mathbf {f}}}, ~{\widetilde{\mathbf {k}}}\right\| _0\) follows from Lemma 6.3. The estimate (6.5) for the Lipschitz bound \(\widetilde{{\mathfrak {L}}}\) comes from (6.30).

Thus, to complete the proof of Proposition 6.1, it remains to verify estimate (6.4) for \(\left\| {\widetilde{\mathbf {f}}}, ~{\widetilde{\mathbf {k}}}\right\| _r\). More precisely, \({\widetilde{\mathbf {f}}}\) can be rewritten as

$$\begin{aligned} {\widetilde{\mathbf {f}}}=H^{-1}\circ \mathbf {F}\circ H-U_0=&\,(H^{-1}-id )\circ \mathbf {F}\circ H+\mathbf {F}\circ H-U_0 \\ =&\,(H^{-1}-id )\circ \mathbf {F}\circ H+ U_0\circ H-U_0+\mathbf {f}\circ H. \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} \left\| {\widetilde{\mathbf {f}}}\right\| _r\leqslant \Vert (H^{-1}-id )\circ \mathbf {F}\circ H\Vert _r+ 2\Vert \mathbf {h}\Vert _r+\Vert \mathbf {f}\circ H\Vert _r. \end{aligned} \end{aligned}$$
(6.31)

According to Proposition A.2, for two smooth functions the \(C^r\) norm of their composition can be controlled linearly if the \(C^1\) norms of the two functions are bounded. We also point out that

$$\begin{aligned}(H^{-1}-id )\circ \mathbf {F}\circ H(x+m,y)= & {} (H^{-1}-id )\circ \mathbf {F}\circ H(x,y),\\ \mathbf {f}\circ H(x+m,y)= & {} \mathbf {f}\circ H(x,y)\end{aligned}$$

for any \(m\in \mathbb {Z}\), which means that \((H^{-1}-id )\circ \mathbf {F}\circ H\) and \(\mathbf {f}\circ H\) are functions on \(\mathbb {R}\times \mathcal {I}_{{\widetilde{\delta }}}\) that are \(\mathbb {Z}\)-periodic in \(x\).

Thus, to estimate \(\Vert {\widetilde{\mathbf {f}}}\Vert _r\) it suffices to give the \(C^r\) norm of the right hand side terms of (6.31) on the bounded domain \([0, 1]\times \mathcal {I}_{{\widetilde{\delta }}}\). In fact, since \(\Vert \mathbf {h}\Vert _1\) and \(\Vert \mathbf {f}\Vert _1\) are bounded, we infer from Proposition A.2 that

$$\begin{aligned} \begin{aligned} \left\| (H^{-1}-id )\circ \mathbf {F}\circ H\right\| _r \leqslant&\,C_r\left( 1+ \left\| H^{-1}-id \right\| _r+\Vert \mathbf {f}\Vert _r+\Vert \mathbf {h}\Vert _r\right) ,\\ \left\| \mathbf {f}\circ H\right\| _r\leqslant&\, C_r(1+\Vert \mathbf {f}\Vert _r+\Vert \mathbf {h}\Vert _r). \end{aligned} \end{aligned}$$

By Proposition A.1,

$$\begin{aligned}\left\| H^{-1}-id \right\| _r\leqslant C_r \Vert \mathbf {h}\Vert _r.\end{aligned}$$

Together with inequality (6.14), we finally get

$$\begin{aligned} \left\| {\widetilde{\mathbf {f}}}\right\| _r \leqslant C'_r\Big (1+\Vert \mathbf {h}\Vert _r+\Vert \mathbf {f}\Vert _r\Big ) \leqslant C_{r,\varrho } \Big (1+N^{\varrho }\Vert \mathbf {f},~\mathbf {k}\Vert _{r}\Big ) \end{aligned}$$

for every \(r> 0\). Next, we consider \({\widetilde{\mathbf {k}}}\). Observe that

$$\begin{aligned} {\widetilde{\mathbf {k}}}=&H^{-1}\circ \mathbf {K}\circ H-T_\alpha =(H^{-1}-id )\circ \mathbf {K}\circ H+\mathbf {K}\circ H-T_\alpha \\ =&(H^{-1}-id )\circ \mathbf {K}\circ H+ \mathbf {h}+\mathbf {k}\circ H. \end{aligned}$$

Analogous to \(\widetilde{\mathbf {f}}\), one can show that

$$\begin{aligned} \left\| {\widetilde{\mathbf {k}}}\right\| _r \leqslant C_{r,\varrho } \Big (1+N^{\varrho }\Vert \mathbf {f},~\mathbf {k}\Vert _{r}\Big ) \end{aligned}$$

for every \(r> 0\). This verifies the desired estimate (6.4). Therefore, we finish the proof of Proposition 6.1.

Remark 6.2

(Lipschitz versus Hölder semi-conjugacy) We would like to say a little more on our Lipschitz semi-conjugacy condition, which is only used to control the \(C^0\)-norm of the average term \([\mathrm {S}_N\mathbf {k}_1]\). It seems possible to replace the Lipschitz semi-conjugacy condition by a Hölder one with a suitable Hölder exponent. More precisely, if one assumes that \(\mathbf {K}\) is semi-conjugate to \(R_\alpha \) via a \(\beta \)-Hölder semi-conjugacy, then by formula (6.29) and the estimates (6.26)–(6.27) we would get

$$\begin{aligned} \Vert [\mathrm {S}_N \mathbf {k}_1]\Vert _0\ll&\Vert {\widetilde{\mathbf {k}}}_1'\Vert _0+ \Vert {\widetilde{\mathbf {k}}}_1',~ {\widetilde{\mathbf {k}}}_2\Vert _0^\beta \\ \ll&\left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {k}\Vert _{r}}{N^r}\right) +\left( N^{2\varrho }\Vert \mathbf {f}, ~\mathbf {k}\Vert _1\Vert \mathbf {f}, ~\mathbf {k}\Vert _0+\frac{\Vert \mathbf {k}\Vert _{r}}{N^r}\right) ^\beta \end{aligned}$$

for every \(r\geqslant 0\). Thus, for an exponent \(\beta \) greater than \(\frac{1}{2}\) and close to 1, one may still obtain a higher-order estimate for \([\mathrm {S}_N \mathbf {k}_1]\) by choosing \(N\) suitably large at each KAM step.
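Heuristically, the threshold \(\beta >\frac{1}{2}\) can be read off from a simplified version of the estimate. As assumptions made only for this sketch, ignore the loss \(N^{2\varrho }\) and the tail \(\Vert \mathbf {k}\Vert _r/N^r\), and treat \(\Vert \mathbf {f},~\mathbf {k}\Vert _1\) as comparable to \(\varepsilon :=\Vert \mathbf {f},~\mathbf {k}\Vert _0\) (the symbol \(\varepsilon \) is introduced here and is not part of the original notation). The bound then reduces to

$$\begin{aligned} \Vert [\mathrm {S}_N \mathbf {k}_1]\Vert _0\ll \varepsilon ^{2}+\varepsilon ^{2\beta }, \end{aligned}$$

which is of higher order than \(\varepsilon \) exactly when \(2\beta >1\). The factor \(N^{2\varrho }\) and the interpolation between norms consume part of this gain, which is consistent with \(\beta \) having to be close to 1 rather than merely above \(\frac{1}{2}\).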

In any case, our approach requires the Hölder exponent \(\beta \) to be close to 1. It does not yield results for an arbitrary exponent \(\beta \in (0,1] \), so we do not pursue this direction in this paper.

7 The KAM iterative scheme

In this section we prove Theorem 4.1 by using a KAM iterative scheme. At each iteration step we choose a smoothing operator \(\mathrm {S}_{N_i}\) with an appropriate \(N_i>0\), and then apply Proposition 6.1 to conjugate the maps \(\mathbf {F}, \mathbf {K}\) closer and closer to the linear maps \(U_0, T_\alpha \). The KAM technique ensures the rapid convergence of the iteration.

Proof of Theorem 4.1

Let \(\delta \in (0, \frac{1}{2})\). To begin the iterative process, we set up

$$\begin{aligned} \mathbf {f}^{(0)}= & {} \mathbf {f},\quad \mathbf {k}^{(0)}=\mathbf {k};\\ \mathbf {F}^{(0)}= & {} U_0+\mathbf {f}^{(0)},\quad \mathbf {K}^{(0)}=T_\alpha +\mathbf {k}^{(0)}; \quad \mathbf {h}^{(0)}=0. \end{aligned}$$

Here, the commuting maps

$$\begin{aligned} \mathbf {F}^{(0)}, ~\mathbf {K}^{(0)}: \mathbb {T}\times \mathcal {I}_{\delta ^{(0)}}\longrightarrow \mathbb {T}\times \mathbb {R}\end{aligned}$$

are diffeomorphisms from \( \mathbb {T}\times \mathcal {I}_{\delta ^{(0)}}\) onto their images, where \(\delta ^{(0)}=\delta \). By assumption, \(\mathbf {K}^{(0)}\) is semi-conjugate to \(R_\alpha \) via a Lipschitz semi-conjugacy of the form \(W^{(0)}(x,y)=x+v^{(0)}(x,y)\). The function \(v^{(0)}\) has a Lipschitz bound \(\mathfrak {L}^{(0)}=\mathfrak {L}>1\) on \(\mathbb {T}\times \mathcal {I}_{\delta ^{(0)}}\).

Then, at the i-th step (\(i=1,2,\ldots \)), with an appropriately large \(N_i>0\) we inductively apply Proposition 6.1 to obtain \(\mathbf {h}^{(i)}\), \(\mathbf {f}^{(i)}\), \(\mathbf {k}^{(i)}\) such that

$$\begin{aligned}H^{(i)}= & {} id +\mathbf {h}^{(i)}\\ \mathbf {F}^{(i)}= & {} \left( H^{(i)}\right) ^{-1}\circ \mathbf {F}^{(i-1)}\circ H^{(i)}=U_0+\mathbf {f}^{(i)}\\ \mathbf {K}^{(i)}= & {} \left( H^{(i)}\right) ^{-1}\circ \mathbf {K}^{(i-1)}\circ H^{(i)}=T_\alpha +\mathbf {k}^{(i)}\end{aligned}$$

where \(\mathbf {h}^{(i)}\in C^\infty (\mathbb {T}\times \mathcal {I}_{\delta ^{(i-1)}},\mathbb {R}^{2})\), \(\mathbf {F}^{(i)}\) and \(\mathbf {K}^{(i)}\) are smooth diffeomorphisms from \( \mathbb {T}\times \mathcal {I}_{\delta ^{(i)}}\) onto their images, for some \(\delta ^{(i)}>0\), and \(\mathbf {f}^{(i)}, \mathbf {k}^{(i)}\in C^\infty (\mathbb {T}\times \mathcal {I}_{\delta ^{(i)}},\mathbb {R}^{2})\). In what follows, we introduce the notation

$$\begin{aligned} \mathcal {E}_{i,r}\overset{def }{=}&\left\| \mathbf {f}^{(i)},~\mathbf {k}^{(i)}\right\| _{C^r\left( \mathbb {T}\times \mathcal {I}_{\delta ^{(i)}}\right) },\qquad \mathcal {U}_{i,r}\overset{def }{=}\left\| \mathbf {h}^{(i)}\right\| _{C^r\left( \mathbb {T}\times \mathcal {I}_{\delta ^{(i-1)}}\right) }. \end{aligned}$$

To ensure the convergence of the iteration process, at the i-th step (\(i\geqslant 1\)) we choose

$$\begin{aligned} N_{i}=\mathcal {E}_{i-1,0}^{-\frac{1}{4(\varrho +1)}}. \end{aligned}$$
(7.1)
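It may help to record what the choice (7.1) gives. The following identities are immediate from (7.1) and, assuming \(\mathcal {E}_{i-1,0}<1\), show the form in which \(N_i\) enters the estimates:

$$\begin{aligned} N_i^{\varrho +1}=\mathcal {E}_{i-1,0}^{-\frac{1}{4}},\qquad N_i^{2\varrho }=\mathcal {E}_{i-1,0}^{-\frac{\varrho }{2(\varrho +1)}},\qquad N_i^{\varrho }=\mathcal {E}_{i-1,0}^{-\frac{\varrho }{4(\varrho +1)}}\leqslant \mathcal {E}_{i-1,0}^{-\frac{1}{4}}. \end{aligned}$$

In particular, each loss factor \(N_i^{\varrho }\) costs at most a quarter power of the current \(C^0\) error.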

Then, we infer from Proposition 6.1 that for \(i=1,2,\ldots ,\)

$$\begin{aligned} \mathcal {U}_{i,r}\leqslant&\, C_{r',r,\varrho }\, N_i^{r-r'+\varrho }\,\mathcal {E}_{i-1,r'},\qquad \text {for~}r\geqslant r'\geqslant 0. \end{aligned}$$
(7.2)
$$\begin{aligned} \mathcal {E}_{i,0} \leqslant&\, C_{r,\varrho } \cdot \mathfrak {L}^{(i-1)} \cdot \left( N_i^{2\varrho }\,\mathcal {E}_{i-1,1}\cdot \mathcal {E}_{i-1,0}+\frac{\mathcal {E}^2_{i-1,\varrho +r+1}}{N_i^r}+\frac{\mathcal {E}_{i-1,\varrho +r}}{N_i^r}\right) ,\qquad \text {for~} r\geqslant 0. \end{aligned}$$
(7.3)
$$\begin{aligned} \mathcal {E}_{i,r}\leqslant&\, C_{r,\varrho } \Big (1+N_i^{\varrho }\,\mathcal {E}_{i-1,r}\Big ),\qquad \text {for~}r> 0. \end{aligned}$$
(7.4)

and

$$\begin{aligned} \delta ^{(i)}=\delta ^{(i-1)}-2 \mathcal {U}_{i,1}-\mathcal {E}_{i-1,0}. \end{aligned}$$
(7.5)

Moreover, by (6.5), \(\mathbf {K}^{(i)}\) is semi-conjugate to \(R_\alpha \) via a Lipschitz semi-conjugacy \(W^{(i)}(x,y)=x+v^{(i)}(x,y)\), where \(v^{(i)}\) has a Lipschitz bound \(\mathfrak {L}^{(i)}\) satisfying

$$\begin{aligned} \mathfrak {L}^{(i)}\leqslant \mathfrak {L}^{(i-1)}\,(1+2\mathcal {U}_{i,1}). \end{aligned}$$
(7.6)

Set \(\mu =15(\varrho +1)\). The following result holds.

Lemma 7.1

If \(\mathcal {E}_{0,\mu }=\Vert \mathbf {f}^{(0)},~\mathbf {k}^{(0)}\Vert _\mu \) is sufficiently small, then for all \(i\geqslant 1\),

$$\begin{aligned} \mathcal {E}_{i,0}\leqslant \mathcal {E}^{\frac{5}{4}}_{i-1,0}~,\quad \mathcal {E}_{i,\mu }\leqslant \mathcal {E}_{i,0}^{-1}~,\quad \mathcal {U}_{i,1}\leqslant \mathcal {E}^{\frac{1}{2}}_{i-1,0},\qquad \delta ^{(i)}\geqslant \frac{\delta }{2}+\frac{\delta }{2^{i+1}}. \end{aligned}$$
(7.7)

Proof

Note that by the interpolation inequalities (see Lemma 5.4), we get

$$\begin{aligned} \mathcal {E}_{i,1}\leqslant C_\mu \,\mathcal {E}_{i,0}^{1-\frac{1}{\mu }} \,\mathcal {E}_{i,\mu }^{\frac{1}{\mu }},\qquad \text {for all}~ i. \end{aligned}$$
(7.8)

According to (7.1)–(7.6), it is easy to check that the inequalities in (7.7) hold for the first step \(i=1\), provided that \(\mathcal {E}_{0,\mu }\) is suitably small.

Suppose inductively that all inequalities in (7.7) hold for \(1 ,\ldots ,i\). We now verify them for the \((i+1)\)-th step.

Since \(N_{i+1}=\mathcal {E}_{i,0}^{-\frac{1}{4(\varrho +1)}}\) and (7.8) holds, using inequality (7.3) with \(r=\mu -(\varrho +1)\) we obtain

$$\begin{aligned} \mathcal {E}_{i+1,0}\leqslant&\, C_{\mu ,\varrho }\cdot \mathfrak {L}^{(i)} \cdot \left( N_{i+1}^{2\varrho }\mathcal {E}_{i,1}\cdot \mathcal {E}_{i,0}+\frac{\mathcal {E}^2_{i,\mu }}{N_{i+1}^{\mu -\varrho -1}}+\frac{\mathcal {E}_{i,\mu -1}}{N_{i+1}^{\mu -\varrho -1}} \right) \nonumber \\ \leqslant&\, C'_{\mu ,\varrho } \cdot \mathfrak {L}^{(i)}\cdot \left( N_{i+1}^{2\varrho }\mathcal {E}_{i,0}^{2-\frac{2}{\mu }} +\frac{\mathcal {E}_{i,0}^{-2}}{N_{i+1}^{\mu -\varrho -1}}+\frac{\mathcal {E}_{i,0}^{-1}}{N_{i+1}^{\mu -\varrho -1} }\right) \nonumber \\ =&\, C'_{\mu ,\varrho } \cdot \mathfrak {L}^{(i)} \cdot \left( \mathcal {E}_{i,0}^{2-\frac{2}{\mu }-\frac{\varrho }{2(\varrho +1)}} +\mathcal {E}_{i,0}^{\frac{\mu -\varrho -1}{4(\varrho +1)}-2}+\mathcal {E}_{i,0}^{\frac{\mu -\varrho -1}{4(\varrho +1)}-1} \right) . \end{aligned}$$
(7.9)

By (7.6), we derive inductively that

$$\begin{aligned} \mathfrak {L}^{(i)}\leqslant \mathfrak {L}\,\prod _{t=1}^{i}(1+2\mathcal {U}_{t,1})\leqslant \mathfrak {L}\,\prod _{t=1}^{i}(1+2\mathcal {E}_{t-1,0}^{\frac{1}{2}}) \leqslant \mathfrak {L}\,\prod _{t=1}^{i}\left( 1+2\mathcal {E}_{0,0}^{\frac{1}{2}\left( \frac{5}{4}\right) ^{t-1}}\right) \leqslant C\,\mathfrak {L}, \end{aligned}$$

where \(C>1\) is a constant independent of i provided that \(\mathcal {E}_{0,0}<1/2\). Since \(\mu =15(\varrho +1)\), we have

$$\begin{aligned}2-\frac{2}{\mu }-\frac{\varrho }{2(\varrho +1)}>\frac{3}{2},\qquad \frac{\mu -\varrho -1}{4(\varrho +1)}-2=\frac{14(\varrho +1)}{4(\varrho +1)}-2= \frac{3}{2}.\end{aligned}$$

Substituting these into (7.9), we obtain

$$\begin{aligned} \mathcal {E}_{i+1,0}\leqslant C'_{\mu ,\varrho } \, C\,\mathfrak {L}\, \left( \mathcal {E}_{i,0}^{\frac{3}{2}} +\mathcal {E}_{i,0}^{\frac{3}{2}} +\mathcal {E}_{i,0}^{\frac{5}{2}} \right) \leqslant \mathcal {E}_{i,0}^{\frac{5}{4}}. \end{aligned}$$
(7.10)
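The exponent arithmetic behind (7.10) can be double-checked with exact rational numbers. The sketch below is only an illustration: the function name `exponents_in_7_9` and the sample values of \(\varrho \) (assumed non-negative) are hypothetical choices, not part of the original argument.

```python
from fractions import Fraction

def exponents_in_7_9(rho):
    """Exponents of E_{i,0} in the last line of (7.9), as exact fractions.

    rho stands for the loss exponent (varrho in the text). The function name
    and interface are hypothetical, introduced only for this check.
    """
    rho = Fraction(rho)
    mu = 15 * (rho + 1)                        # mu = 15(rho + 1)
    gain = (mu - rho - 1) / (4 * (rho + 1))    # from N^{-(mu - rho - 1)}; equals 7/2
    first = 2 - 2 / mu - rho / (2 * (rho + 1))
    return first, gain - 2, gain - 1

for rho in (0, 1, 2, Fraction(7, 2), 100):
    first, second, third = exponents_in_7_9(rho)
    # Every exponent must exceed 5/4 for (7.10) to follow from (7.9).
    assert first > Fraction(3, 2)
    assert second == Fraction(3, 2) and third == Fraction(5, 2)
    print(rho, first, second, third)
```

Since every exponent is at least \(\frac{3}{2}\), the sum in (7.10) is at most \(3\,\mathcal {E}_{i,0}^{\frac{1}{4}}\cdot \mathcal {E}_{i,0}^{\frac{5}{4}}\), and the spare factor \(\mathcal {E}_{i,0}^{\frac{1}{4}}\) absorbs the constant \(3\,C'_{\mu ,\varrho }\,C\,\mathfrak {L}\) once \(\mathcal {E}_{0,0}\) is small enough.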

Applying (7.4) with \(r=\mu \), it follows that

$$\begin{aligned} \mathcal {E}_{i+1,\mu }\leqslant C_{\mu , \varrho } \left( 1+N_{i+1}^{\varrho }\mathcal {E}_{i,\mu } \right) \leqslant 2C_{\mu ,\varrho }\mathcal {E}_{i,0}^{-1-\frac{\varrho }{4(\varrho +1)}}\leqslant \mathcal {E}_{i,0}^{-\frac{5}{4}}\leqslant \mathcal {E}^{-1}_{i+1,0}. \end{aligned}$$
(7.11)

Here, for the last inequality we used (7.10).
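The two middle inequalities in (7.11) can be spelled out as follows, using the inductive bound \(\mathcal {E}_{i,\mu }\leqslant \mathcal {E}_{i,0}^{-1}\) and \(\mathcal {E}_{i,0}<1\) (so that \(1\leqslant N_{i+1}^{\varrho }\,\mathcal {E}_{i,0}^{-1}\)):

$$\begin{aligned} 1+N_{i+1}^{\varrho }\mathcal {E}_{i,\mu }\leqslant&\, 2N_{i+1}^{\varrho }\mathcal {E}_{i,0}^{-1}=2\,\mathcal {E}_{i,0}^{-1-\frac{\varrho }{4(\varrho +1)}},\\ 2C_{\mu ,\varrho }\,\mathcal {E}_{i,0}^{-1-\frac{\varrho }{4(\varrho +1)}}=&\,\Big (2C_{\mu ,\varrho }\,\mathcal {E}_{i,0}^{\frac{1}{4(\varrho +1)}}\Big )\,\mathcal {E}_{i,0}^{-\frac{5}{4}}. \end{aligned}$$

Hence the third inequality holds as soon as \(2C_{\mu ,\varrho }\,\mathcal {E}_{i,0}^{\frac{1}{4(\varrho +1)}}\leqslant 1\), which follows from \(\mathcal {E}_{i,0}\leqslant \mathcal {E}_{0,0}\) and the smallness of \(\mathcal {E}_{0,0}\).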

Next, applying inequality (7.2) with \(r=1\) and \(r'=0\), we have

$$\begin{aligned} \mathcal {U}_{i+1,1}&\leqslant C_{0,1,\varrho }\, N_{i+1}^{\varrho +1}\mathcal {E}_{i,0} \leqslant C_{0,1,\varrho }\, \mathcal {E}^{1-\frac{\varrho +1}{4(\varrho +1)}}_{i,0}\leqslant \mathcal {E}_{i,0}^{\frac{1}{2}}. \end{aligned}$$
(7.12)

Finally, by (7.5) it follows that

$$\begin{aligned} \delta ^{(i+1)}=&\, \delta ^{(i)}-2 \mathcal {U}_{i+1,1}-\mathcal {E}_{i,0} \geqslant \frac{\delta }{2}+\frac{\delta }{2^{i+1}}-2\mathcal {E}_{i,0}^{\frac{1}{2}}-\mathcal {E}_{i,0}\nonumber \\ \geqslant&\, \frac{\delta }{2}+\frac{\delta }{2^{i+1}}-3\mathcal {E}_{0,0}^{\frac{1}{2}\left( \frac{5}{4}\right) ^{i}}\geqslant \frac{\delta }{2}+\frac{\delta }{2^{i+2}} \end{aligned}$$
(7.13)

as long as \(\mathcal {E}_{0,0}<c\cdot \delta \) for some small constant \(c>0\).

Combining (7.10)–(7.13), we thus verify (7.7) for \((i+1)\) in place of i. This proves Lemma 7.1. \(\square \)
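The two places in the proof where infinitely many small losses are accumulated, namely the product bounding \(\mathfrak {L}^{(i)}\) and the domain loss in (7.13), can be explored numerically. The following is a minimal sketch under the model assumption \(\mathcal {E}_{i,0}=\mathcal {E}_{0,0}^{(5/4)^i}\) (the worst case allowed by (7.7)); the function name, its parameters and the sample values are hypothetical.

```python
import math

def check_model_sequence(eps0, delta, steps=40):
    """Iterate the model bounds eps_i = eps0**((5/4)**i) from Lemma 7.1.

    Returns the running Lipschitz factor prod(1 + 2*sqrt(eps_{t-1})) and whether
    the domain-loss condition 3*sqrt(eps_i) <= delta / 2**(i+2) held at every step.
    All names here are hypothetical; this only illustrates the arithmetic.
    """
    log_eps = math.log(eps0)      # work with logarithms to avoid underflow issues
    lipschitz_factor = 1.0
    domain_ok = True
    for i in range(steps):
        sqrt_eps_i = math.exp(0.5 * log_eps)         # eps_i ** (1/2)
        lipschitz_factor *= 1.0 + 2.0 * sqrt_eps_i   # factor from (7.6) at step i+1
        if 3.0 * sqrt_eps_i > delta / 2 ** (i + 2):  # requirement used in (7.13)
            domain_ok = False
        log_eps *= 1.25                              # eps_{i+1} = eps_i ** (5/4)
    return lipschitz_factor, domain_ok

factor, ok = check_model_sequence(eps0=1e-6, delta=0.25)
print(factor, ok)
```

With these sample values the product stays very close to 1 and the domain condition holds at every step, whereas a larger initial error such as `eps0=1e-2` already violates the condition at the first step. This illustrates why \(\mathcal {E}_{0,0}\) must be small relative to \(\delta \).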

Now, let us proceed with the proof of Theorem 4.1. By Lemma 7.1, as long as \(\mathcal {E}_{0,\mu }\) is sufficiently small, the following sequences

$$\begin{aligned}\Vert \mathbf {f}^{(i)},~\mathbf {k}^{(i)}\Vert _0\leqslant \mathcal {E}_{0,0}^{\left( \frac{5}{4}\right) ^i},\quad \Vert \mathbf {h}^{(i)}\Vert _1\leqslant \mathcal {E}_{0,0}^{\frac{1}{2}\left( \frac{5}{4}\right) ^{i-1}} \end{aligned}$$

converge rapidly to zero. Also, \(\delta ^{(i)}\rightarrow \frac{\delta }{2}\). Thus, this rapid convergence ensures that as \(l\rightarrow \infty \), the composition

$$\begin{aligned}\mathcal {H}_l =H^{(1)}\circ \cdots \circ H^{(l)}\end{aligned}$$

converges in the \(C^1\) topology to some \(\mathcal {H}_\infty \) which is a \(C^1\) diffeomorphism from \(\mathbb {T}\times \mathcal {I}_{\frac{\delta }{2}}\) onto its image, for which the following conjugacy equations hold

$$\begin{aligned} \mathbf {F}\circ \mathcal {H}_\infty =\mathcal {H}_\infty \circ U_0,\qquad \mathbf {K}\circ \mathcal {H}_\infty =\mathcal {H}_\infty \circ T_\alpha .\end{aligned}$$
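To see where these equations come from, note that composing the conjugacies of the first \(l\) steps gives, directly from the definitions of \(\mathbf {F}^{(i)}\) and \(\mathbf {K}^{(i)}\),

$$\begin{aligned} \mathbf {F}\circ \mathcal {H}_l=\mathcal {H}_l\circ \mathbf {F}^{(l)}=\mathcal {H}_l\circ \big (U_0+\mathbf {f}^{(l)}\big ),\qquad \mathbf {K}\circ \mathcal {H}_l=\mathcal {H}_l\circ \mathbf {K}^{(l)}=\mathcal {H}_l\circ \big (T_\alpha +\mathbf {k}^{(l)}\big ) \end{aligned}$$

on \(\mathbb {T}\times \mathcal {I}_{\frac{\delta }{2}}\). Since \(\Vert \mathbf {f}^{(l)},~\mathbf {k}^{(l)}\Vert _0\rightarrow 0\) and \(\mathcal {H}_l\rightarrow \mathcal {H}_\infty \) in the \(C^1\) topology, letting \(l\rightarrow \infty \) yields the two conjugacy equations.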

Now, it remains to show that the \(C^1\) limit solution \(\mathcal {H}_\infty \) is also of class \(C^s\) for every \(s>1\). In fact, just as shown in [38], this can be achieved by making full use of the interpolation inequalities.

More precisely, we first observe that for any \(t>0\), applying (7.4) with \(r=t\) we get

$$\begin{aligned} \mathcal {E}_{i,t}\leqslant C_{t,\varrho } \Big (1+N_i^{\varrho }\mathcal {E}_{i-1,t}\Big ), \end{aligned}$$

for some constant \(C_{t,\varrho }> 1\). In light of the choice of \(N_i\) (see (7.1)), it follows that

$$\begin{aligned}1+\mathcal {E}_{i,t}\leqslant C_{t,\varrho } \,\mathcal {E}_{i-1,0}^{-\frac{1}{4}}\Big (1+\mathcal {E}_{i-1,t}\Big ),\end{aligned}$$

from which we derive inductively that

$$\begin{aligned} \mathcal {E}_{i,t}\leqslant&\left( 1+\mathcal {E}_{0,t} \right) \prod _{j=0}^{i-1}\left( C_{t,\varrho } \,\mathcal {E}_{j,0}^{-\frac{1}{4}}\right) \leqslant \left( 1+\mathcal {E}_{0,t} \right) C^i_{t,\varrho }\prod _{j=0}^{i-1} \mathcal {E}_{0,0}^{-\frac{1}{4}\left( \frac{5}{4}\right) ^j}\\ \leqslant&M_t\cdot C^i_{t,\varrho }\cdot \mathcal {E}^{-\left( \frac{5}{4}\right) ^i}_{0,0}, \end{aligned}$$

with \(M_t= (1+\mathcal {E}_{0,t})\).
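The last inequality rests on a geometric sum, written out here for convenience (with \(\mathcal {E}_{0,0}<1\)):

$$\begin{aligned} \sum _{j=0}^{i-1}\frac{1}{4}\left( \frac{5}{4}\right) ^{j}=\frac{1}{4}\cdot \frac{\left( \frac{5}{4}\right) ^{i}-1}{\frac{5}{4}-1}=\left( \frac{5}{4}\right) ^{i}-1,\qquad \text {so}\quad \prod _{j=0}^{i-1}\mathcal {E}_{0,0}^{-\frac{1}{4}\left( \frac{5}{4}\right) ^{j}}=\mathcal {E}_{0,0}^{1-\left( \frac{5}{4}\right) ^{i}}\leqslant \mathcal {E}_{0,0}^{-\left( \frac{5}{4}\right) ^{i}}. \end{aligned}$$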

Now, for any fixed \(s>1\), we choose \(t=4s\). The interpolation inequalities (Lemma 5.4) imply

$$\begin{aligned} \mathcal {E}_{i,s}\leqslant C_{s}\, \mathcal {E}_{i,t}^{\frac{1}{4}}\cdot \mathcal {E}_{i,0}^{\frac{3}{4}}\leqslant C_s\left( M_t\, C^i_{t,\varrho }\right) ^{\frac{1}{4}}\, \mathcal {E}_{0,0}^{-\frac{1}{4}\left( \frac{5}{4}\right) ^i}\cdot \mathcal {E}_{0,0}^{\frac{3}{4}\left( \frac{5}{4}\right) ^i}\leqslant C_s\, M_t\, C_{t,\varrho }^{i}\cdot \mathcal {E}^{\frac{1}{2}\left( \frac{5}{4}\right) ^i}_{0,0}\, . \end{aligned}$$
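The choice \(t=4s\) makes the interpolation weight \(s/t\) equal to \(\frac{1}{4}\). Assuming Lemma 5.4 has the standard form \(\Vert u\Vert _s\leqslant C\,\Vert u\Vert _0^{1-\frac{s}{t}}\,\Vert u\Vert _t^{\frac{s}{t}}\) for \(0\leqslant s\leqslant t\) (the same form as in (7.8); this is an assumption about a lemma stated earlier), the powers of \(\mathcal {E}_{0,0}\) combine as

$$\begin{aligned} -\frac{1}{4}\left( \frac{5}{4}\right) ^{i}+\frac{3}{4}\left( \frac{5}{4}\right) ^{i}=\frac{1}{2}\left( \frac{5}{4}\right) ^{i}, \end{aligned}$$

where we used \(\mathcal {E}_{i,0}\leqslant \mathcal {E}_{0,0}^{\left( \frac{5}{4}\right) ^i}\). The constant is handled by \(\left( M_t\,C^i_{t,\varrho }\right) ^{\frac{1}{4}}\leqslant M_t\,C^i_{t,\varrho }\), which holds because \(M_t\geqslant 1\) and \(C_{t,\varrho }>1\).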

Then, applying (7.2) with \(r=r'=s\) yields

$$\begin{aligned} \mathcal {U}_{i+1,s}\leqslant C_{s,\varrho }\, N_{i+1}^{\varrho }\mathcal {E}_{i,s}= C_{s,\varrho } \mathcal {E}_{i,0}^{-\frac{\varrho }{4(\varrho +1)}}\mathcal {E}_{i,s} \leqslant&C_{s,\varrho } C_s\,M_t\, C^{i}_{t,\varrho }\cdot \mathcal {E}_{0,0}^{-\frac{\varrho }{4(\varrho +1)}\left( \frac{5}{4}\right) ^i}\mathcal {E}^{\frac{1}{2}\left( \frac{5}{4}\right) ^i}_{0,0}\nonumber \\ \leqslant&L \cdot b^i\cdot \mathcal {E}^{\frac{1}{4}\left( \frac{5}{4}\right) ^i}_{0,0} \end{aligned}$$

where the constants \(L=C_{s,\varrho } C_s\,M_t\) and \(b=C_{t,\varrho }>1\), with \(t=4s\). Observe that although \( b^i \) grows exponentially, the term \(\mathcal {E}^{\frac{1}{4}\left( \frac{5}{4}\right) ^i}_{0,0}\) decays super-exponentially as \(i\rightarrow \infty \). Hence,

$$\begin{aligned}\mathcal {U}_{i+1,s}=\Vert \mathbf {h}^{(i+1)}\Vert _{s}\end{aligned}$$

still converges rapidly to zero as \(i\rightarrow \infty \). This implies the convergence of the sequence \(\mathcal {H}_l\) in the \(C^s\) topology and the limit is exactly \(\mathcal {H}_\infty \). Therefore, the limit \(\mathcal {H}_\infty \) is a \(C^\infty \) diffeomorphism of \(\mathbb {T}\times \mathcal {I}_{\frac{\delta }{2}}\) onto its image. This finishes the proof. \(\square \)
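For the bound on \(\mathcal {U}_{i+1,s}\) used in the proof above, the competition between \(b^i\) and the super-exponentially small factor can be made explicit by taking logarithms (assuming \(0<\mathcal {E}_{0,0}<1\)):

$$\begin{aligned} \log \left( L\, b^{i}\,\mathcal {E}_{0,0}^{\frac{1}{4}\left( \frac{5}{4}\right) ^{i}}\right) =\log L+i\log b-\frac{1}{4}\left( \frac{5}{4}\right) ^{i}\big |\log \mathcal {E}_{0,0}\big |\longrightarrow -\infty \qquad \text {as}~ i\rightarrow \infty , \end{aligned}$$

since \(\left( \frac{5}{4}\right) ^{i}\) grows faster than any linear function of i. In particular, \(\sum _i \mathcal {U}_{i+1,s}<\infty \), which is the summability that the convergence of \(\mathcal {H}_l\) in the \(C^s\) topology relies on.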

Since Theorem 4.1 has now been proved, Theorem A follows from what we have shown in Sect. 4.