1 Introduction and main results

Among the various techniques for finding quasi-periodic solutions of partial differential equations (PDEs), Kolmogorov–Arnold–Moser (KAM) theory has proved to be one of the most powerful. KAM theory provides not only the desired quasi-periodic solutions of PDEs but also the linear stability of the preserved invariant tori. Let us briefly recall the existing literature along this line. Kuksin [12,13,14] and Wayne [24] were the first to extend finite-dimensional KAM theory to the infinite-dimensional case and to construct quasi-periodic solutions for some Hamiltonian PDEs. Among those PDEs, the nonlinear Schrödinger equations (\(\mathbf{i}u_{t}-u_{xx}+Vu+f(|u|^{2})u=0\)) and the nonlinear wave equations (\(u_{tt}-u_{xx}+Vu+f(u)=0\)) in various settings have been investigated by many authors; see [4,5,6, 8, 9, 17, 23, 25, 26]. For PDEs whose nonlinearity contains spatial derivatives, such as the KdV equations, Benjamin–Ono equations, derivative nonlinear Schrödinger equations and derivative nonlinear wave equations, the corresponding unbounded KAM theorems were developed to establish the existence of quasi-periodic solutions; see, for instance, [1, 2, 11, 15, 16, 18,19,20,21, 27].

It is well known that the potential V plays a key role in constructing quasi-periodic solutions for PDEs via KAM theory, because it helps to solve the small divisor problem occurring at each step of the KAM iteration. Indeed, the KAM machinery reformulates the PDE as a non-degenerate, partially integrable system plus a small perturbation, where parameters must be introduced to adjust the frequencies and overcome the small divisor problem; fortunately, such parameters can be extracted from the potential V. However, in the aforementioned papers, the potential V is always regular. A natural question is: what happens when the potential possesses a singularity? To the best of our knowledge, the only existing result in this direction is [7], where Cao and Yuan proved that the nonlinear wave equation with a Legendre potential admits plenty of quasi-periodic solutions with two frequencies. More precisely, the authors of [7] studied the following Legendre potential:

$$ V_{L}(x)=-\frac{1}{2}-\frac{1}{4}\tan ^{2}{x},\quad x\in \biggl(-\frac{ \pi }{2},\frac{\pi }{2} \biggr). $$
(1)

Obviously, this potential is singular at the endpoints \(x=\pm \frac{\pi }{2}\).

Inspired by [7], in the present note we consider a nonlinear wave equation of the following form:

$$ u_{tt}-u_{xx}+V_{L}(x)u+mu+\sec x \cdot u^{3}=0, $$
(2)

subject to the boundary conditions

$$ u\cdot \sqrt{\sec x}\quad \text{is bounded on } \biggl(- \frac{\pi }{2},\frac{ \pi }{2} \biggr). $$
(3)

We introduce the change of variables

$$ \textstyle\begin{cases} y=\sin {x},\\ z=\frac{u}{\sqrt{\cos {x}}}. \end{cases} $$

Equation (2) with its boundary conditions (3) can be rewritten as

$$ \textstyle\begin{cases} z_{tt}- ((1-y^{2})z_{y} )_{y}+ mz+z^{3}=0,\\ z(y)\quad \text{is bounded on }(-1,1). \end{cases} $$
(4)

By abuse of notation, we still write u for z and x for y. Set \(A= - \frac{d}{d x}(1-x^{2})\frac{d}{d x}+m\) and let \(\lambda ^{2}_{j}\) (\(\lambda _{j}>0\)), \(\phi _{j}\) (\(j=1,2,\ldots \)) be the eigenvalues and eigenfunctions of A, respectively. From [7], we know that

$$ \lambda ^{2}_{j}=j(j-1)+m,\qquad \phi _{j}=\sqrt{j-\frac{1}{2}}P_{j-1}(x), \quad j=1,2, \ldots , $$
(5)

where \(P_{j}(x)\) denotes the jth Legendre polynomial. It is well known that the sequence \(\{\phi _{j}\}_{j\geq 1}\) forms a complete orthonormal basis of \(L^{2}(-1,1)\). Hence one can expand u in terms of the \(\phi _{j}\), that is, \(u=\sum_{j\geq 1}\frac{q_{j}(t)}{\sqrt{ \lambda _{j}}}\phi _{j}(x)\), and Eq. (4) turns into infinitely many ODEs, namely,

$$ \ddot{q}_{j}+\lambda ^{2}_{j}q_{j}+ \sqrt{\lambda _{j}} \bigl\langle u^{3}, \phi _{j} \bigr\rangle =0 $$
(6)

where \(q\in \ell ^{2}_{s}\) (the precise definition of \(\ell ^{2}_{s}\) will be given in (8)).
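As a sanity check on (5), the following Python sketch (not part of the original argument) verifies in exact rational arithmetic that \(-((1-x^{2})P_{j-1}')'=j(j-1)P_{j-1}\) and that \((j-\frac{1}{2})\int _{-1}^{1}P_{j-1}^{2}\,dx=1\) for the first few j. Adding the mass term m then gives \(\lambda _{j}^{2}=j(j-1)+m\). The helper names (`legendre`, `sturm`, `check_mode`) are illustrative choices of ours and do not come from [7].

```python
from fractions import Fraction

def legendre(n):
    """Coefficients (lowest degree first) of the Legendre polynomial P_n."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]
    if n == 0:
        return prev
    for k in range(1, n):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        shifted = [Fraction(0)] + cur
        padded = prev + [Fraction(0)] * (len(shifted) - len(prev))
        prev, cur = cur, [((2 * k + 1) * a - k * b) / (k + 1)
                          for a, b in zip(shifted, padded)]
    return cur

def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            out[a + b] += ca * cb
    return out

def integrate(p):
    """Exact integral of p over [-1, 1]; odd powers integrate to zero."""
    return sum(2 * c / (k + 1) for k, c in enumerate(p) if k % 2 == 0)

def deriv(p):
    """Derivative of a polynomial."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def sturm(p):
    """Return -((1 - x^2) p')' as a coefficient list."""
    one_minus_x2 = [Fraction(1), Fraction(0), Fraction(-1)]
    return [-c for c in deriv(mul(one_minus_x2, deriv(p)))]

def same(p, q):
    """Compare two coefficient lists up to trailing zeros."""
    n = max(len(p), len(q))
    return p + [Fraction(0)] * (n - len(p)) == q + [Fraction(0)] * (n - len(q))

def check_mode(j):
    """Exact checks behind (5) for phi_j = sqrt(j - 1/2) P_{j-1}."""
    P = legendre(j - 1)
    eigen = same(sturm(P), [j * (j - 1) * c for c in P])
    unit_norm = Fraction(2 * j - 1, 2) * integrate(mul(P, P)) == 1
    return eigen and unit_norm

print(all(check_mode(j) for j in range(1, 9)))
```

Exact fractions are used instead of floating point so that the eigenvalue relation is checked as an identity between polynomials rather than up to rounding.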

Since the quasi-periodic solutions to be constructed are of small amplitude, (4) may be considered as the linear equation \(u_{tt}- ((1-x^{2})u_{x} )_{x}+ mu=0\) plus a small nonlinear perturbation \(u^{3}\). It is clear that every solution of the linear system is the superposition of the eigenfunctions \(\phi _{j}\) and is of the form

$$ u=\sum_{j\geq 1}\frac{q_{j}(t)}{\sqrt{\lambda _{j}}}\phi _{j}(x), \quad q_{j}(t)=I_{j}\cos \bigl(\lambda _{j}t+\varphi ^{0}_{j}\bigr), $$

with amplitude \(I_{j}\geq 0\) and initial phase \(\varphi _{j}^{0}\). The solution \(u(t,x)\) is periodic, quasi-periodic or almost periodic, depending on whether one, finitely many, or infinitely many modes are excited. In this paper, for simplicity, we excite the three modes \(\phi _{1}\), \(\phi _{2}\), \(\phi _{3}\). Let E be the corresponding invariant linear space of real dimension \(2\times 3\), which is completely foliated into rotational tori. That is,

$$\begin{aligned} E =& \biggl\{ (u,v)=\biggl(\frac{q_{1}}{\sqrt{\lambda _{1}}}\phi _{1}+ \frac{q _{2}}{\sqrt{\lambda _{2}}}\phi _{2}+ \frac{q_{3}}{\sqrt{\lambda _{3}}}\phi _{3}, \sqrt{\lambda _{1}}p_{1}\phi _{1}+\sqrt{\lambda _{2}}p_{2}\phi _{2}+\sqrt{\lambda _{3}}p_{3}\phi _{3}\biggr)\biggr\} \\ =& \bigcup_{I\in \overline{\mathbb{P}^{3}}}\mathscr{T}(I), \end{aligned}$$

where \(\mathbb{P}^{3}=\{I\in \mathbb{R}^{3}:I_{j}>0\text{ for }j=1,2,3 \}\) is the positive octant in \(\mathbb{R}^{3}\) and

$$ \mathscr{T}(I)=\bigl\{ (u,v) : q_{j}^{2}+p_{j}^{2}=I_{j} \text{ for }j=1,2,3 \bigr\} . $$

This is the linear situation. Upon restoration of the nonlinearity \(u^{3}\), the invariant manifold E will not persist in its entirety due to resonances. However, we shall show that a large Cantor subfamily of rotational 3-tori persists in a sufficiently small neighborhood of the origin.

Theorem 1.1

(Main theorem)

For every \(m\in (0,\frac{1}{4})\cup ( \frac{1}{4},+\infty )\), there exist a set \(\mathscr{C}\) in \(\mathbb{P}^{3}\) with positive Lebesgue measure, a family of 3-tori

$$ \mathscr{T}[\mathscr{C}]=\bigcup_{I\in \mathscr{C}}\mathscr{T}(I) \subset E $$

over \(\mathscr{C}\), and a Lipschitz continuous embedding into phase space \(\mathscr{P}\),

$$ \varPhi :\mathscr{T}[\mathscr{C}]\hookrightarrow \mathscr{P}, $$

which is a higher order perturbation of the inclusion map \(\varPhi _{0}:E \hookrightarrow \mathscr{P}\) restricted to \(\mathscr{T}[\mathscr{C}]\), such that the restriction of Φ to each \(\mathscr{T}(I)\) in the family is an embedding of a rotational invariant 3-torus for the nonlinear Hamiltonian differential equation (4).

The present paper generalizes the results of [7] in the following respects. First, in verifying Lemma 4.2 of [7] and the non-degeneracy condition, the authors of [7] imposed the restriction \(m<\frac{41}{4}\); we remove this restriction. Let us briefly explain our strategy. On the one hand, we choose \(z_{1}\), \(z_{2}\), \(z_{3}\) as tangential variables and remove only the fourth order terms with at most two normal variables, instead of three as in [7], to obtain the partial Birkhoff normal form. On the other hand, we adopt a different method to verify the non-degeneracy condition without that restriction on m. Second, it is known that for the case \(V(x)\equiv m\) the perturbed vector field \(G_{q}\) belongs to \(\mathbf{A}(\ell ^{2}_{s},\ell ^{2}_{s+1})\), the collection of all real analytic maps from some neighborhood of the origin in \(\ell ^{2}_{s}\) into \(\ell ^{2}_{s+1}\). For the Legendre potential, Cao and Yuan [7] proved the corresponding regularity only for \(s= \frac{7}{2}\), by means of several complicated inequalities often used in the analysis of PDEs. In contrast, we show that, for any \(s>1\), \(G_{q}\) belongs to \(\mathbf{A}(\ell ^{2}_{s},\ell ^{2}_{s+\frac{1}{2}})\), which is enough to apply the KAM theorem. Finally, we obtain quasi-periodic solutions of (4) with three frequencies instead of two as in [7].

Remark 1.2

Note that our results hold for \(m\in \mathbb{R}_{+}\setminus \{ \frac{1}{4}\}\). When \(m=\frac{1}{4}\), it follows from (5) that \(\lambda _{j}=j- \frac{1}{2}\); this is a completely resonant case for Eq. (2), and we cannot deal with it here. However, we point out a potential strategy: by making full use of a Lyapunov–Schmidt decomposition, variational methods and Nash–Moser implicit function theory, one may expect to be able to handle this case. Indeed, Berti and Procesi [3] and Gentile, Mastropietro and Procesi [10] used this strategy to obtain small amplitude periodic solutions for the completely resonant wave equation \(u_{tt}-u_{xx}+u ^{3}=0\), while Yuan [25] derived quasi-periodic solutions for a completely resonant wave equation.

2 The Hamiltonian

The Hamiltonian of the nonlinear wave equation (4) is

$$ H=\frac{1}{2}\langle v,v\rangle +\frac{1}{2}\langle Au,u\rangle + \frac{1}{4} \int _{-1}^{1}u^{4}\,dx, $$

where \(A=-\frac{d}{d x}(1-x^{2})\frac{d}{d x}+m\). We rewrite H as a Hamiltonian in infinitely many coordinates by making the ansatz

$$ u=\sum_{j\geq 1}\frac{q_{j}}{\sqrt{\lambda _{j}}}\phi _{j}(x), \qquad v=\sum_{j\geq 1}\sqrt{\lambda _{j}}p_{j}\phi _{j}(x). $$
(7)

The coordinates are taken from the Hilbert space \(\ell ^{2}_{s}\) of all real valued sequences \(w=(w_{1},w_{2},\ldots )\) with finite norm,

$$ \Vert {w} \Vert ^{2}_{s}=\sum _{j\geq 1}j^{2s} \vert w_{j} \vert ^{2}< \infty . $$
(8)

One then gets the Hamiltonian

$$ H=\varLambda +G=\frac{1}{2}\sum _{j\geq 1}\lambda _{j}\bigl(p_{j}^{2}+q_{j}^{2} \bigr)+ \frac{1}{4}\sum_{i,j,k,l}G_{ijkl}q_{i}q_{j}q_{k}q_{l}, $$
(9)

where

$$ G_{ijkl}=\frac{1}{\sqrt{\lambda _{i}\lambda _{j}\lambda _{k}\lambda _{l}}} \int _{-1}^{1}\phi _{i}\phi _{j}\phi _{k}\phi _{l}\,dx $$
(10)

on the phase space with the symplectic structure in the form of \(\sum_{j\geq 1} dq_{j}\wedge d p_{j}\). Then its equations of motion are

$$ \dot{q}_{j}=\frac{\partial H}{\partial p_{j}}=\lambda _{j}p_{j},\qquad \dot{p}_{j}= -\frac{\partial H}{\partial q_{j}}=-\lambda _{j}q_{j}-\frac{ \partial G}{\partial q_{j} }, \quad j\geq 1. $$
(11)

Next, we shall establish the regularity of the vector field \(G_{q}:= (G_{ q_{j}} )_{j\geq 1}\). To this end, we need some properties of the coefficients \(G_{ijkl}\).

Lemma 2.1

Assume \(0< i\leq j\leq k\leq l\). Then \(G_{ijkl}=0\) unless \(l \leq i+j+k-2 \) and \(i\pm j\pm k\pm l \in 2\mathbb{Z}\). Moreover, there exists a constant \(C>0\) such that \(0\leq G_{ijkl}\leq \frac{C}{\sqrt{\lambda _{i}\lambda _{j}\lambda _{k}\lambda _{l}}}\).

For details, we refer to (3.13) and Lemma 3.3 in [7]. Note that the indices i, j, k, l in [7] are replaced by \(i-1\), \(j-1\), \(k-1\), \(l-1\), respectively.
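The vanishing pattern in Lemma 2.1 can be illustrated on a finite range of indices. The sketch below is our own check, not the argument of [7]: it evaluates \(\int _{-1}^{1}P_{i-1}P_{j-1}P_{k-1}P_{l-1}\,dx\) exactly and tests that it is nonnegative and vanishes whenever \(l>i+j+k-2\) or \(i+j+k+l\) is odd. Since \(G_{ijkl}\) differs from this integral only by a positive factor, the same pattern holds for \(G_{ijkl}\). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def legendre(n):
    """Coefficients (lowest degree first) of the Legendre polynomial P_n."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]
    if n == 0:
        return prev
    for k in range(1, n):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        shifted = [Fraction(0)] + cur
        padded = prev + [Fraction(0)] * (len(shifted) - len(prev))
        prev, cur = cur, [((2 * k + 1) * a - k * b) / (k + 1)
                          for a, b in zip(shifted, padded)]
    return cur

def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            out[a + b] += ca * cb
    return out

def integrate(p):
    """Exact integral of p over [-1, 1]."""
    return sum(2 * c / (k + 1) for k, c in enumerate(p) if k % 2 == 0)

def quartic_integral(i, j, k, l):
    """Exact integral of P_{i-1} P_{j-1} P_{k-1} P_{l-1} over [-1, 1]."""
    prod = [Fraction(1)]
    for n in (i, j, k, l):
        prod = mul(prod, legendre(n - 1))
    return integrate(prod)

def selection_rules_hold(max_index):
    """Test the pattern of Lemma 2.1 for all 1 <= i <= j <= k <= l <= max_index."""
    for i, j, k, l in combinations_with_replacement(range(1, max_index + 1), 4):
        value = quartic_integral(i, j, k, l)
        allowed = l <= i + j + k - 2 and (i + j + k + l) % 2 == 0
        if value < 0 or (not allowed and value != 0):
            return False
    return True

print(selection_rules_hold(6))
```

A finite scan of this kind is of course no substitute for the proof in [7]; it only guards against misreading the two conditions.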

Lemma 2.2

For \(s>1\), the vector field \(G_{q}\) is real analytic as a map from some neighborhood of the origin in \(\ell ^{2}_{s}\) into \(\ell ^{2}_{s+ \frac{1}{2}}\), with

$$ \Vert {G_{q}} \Vert _{s+\frac{1}{2}}= O\bigl( \Vert {q} \Vert ^{3}_{s}\bigr). $$
(12)

Proof

From (9), we obtain \(G_{ q_{j}}=\sum_{i,k,l}G _{ijkl}q_{i}q_{k}q_{l}\). In view of Lemma 2.1 and the fact that \(\lambda _{j}\sim j\), it is clear that

$$ \vert G_{ q_{j}} \vert \leq Cj^{-\frac{1}{2}}\sum _{i,k,l}\frac{ \vert q_{i} \vert }{ \sqrt{i}}\frac{ \vert q_{k} \vert }{\sqrt{k}}\frac{ \vert q_{l} \vert }{\sqrt{l}}. $$

Thus, it follows that

$$\begin{aligned} \Vert {G_{q}} \Vert ^{2}_{s+\frac{1}{2}} &=\sum_{j\geq 1} j^{2s+1} \vert G_{ q_{j}} \vert ^{2} \\ &\leq C \sum_{j\geq 1} j^{2s} \biggl(\sum _{i,k,l}\frac{ \vert q_{i} \vert }{ \sqrt{i}}\frac{ \vert q_{k} \vert }{\sqrt{k}} \frac{ \vert q_{l} \vert }{\sqrt{l}} \biggr) ^{2} \\ &\leq C\sum_{j\geq 1} j^{2s} \biggl(6\sum _{i\leq k\leq l}\frac{ \vert q _{i} \vert }{\sqrt{i}}\frac{ \vert q_{k} \vert }{\sqrt{k}} \frac{ \vert q_{l} \vert }{\sqrt{l}} \biggr)^{2} \\ &\leq C\sum_{j\geq 1} j^{2s} \biggl[\biggl( \sum_{j\leq i\leq k\leq l\leq j+i+k-2}+ \sum_{i\leq j\leq k\leq l\leq i+j+k-2}+ \sum_{ i\leq k\leq j\leq l\leq i+k +j-2}+ \sum_{i\leq k\leq l\leq j\leq i+k+l-2} \biggr) \\ &\quad{}\cdot \frac{ \vert q_{i} \vert }{\sqrt{i}}\frac{ \vert q_{k} \vert }{\sqrt{k}}\frac{ \vert q _{l} \vert }{\sqrt{l}} \biggr]^{2} \\ &\leq C\sum_{j\geq 1} j^{2s} \biggl(\sum _{j\leq i\leq k\leq l\leq j+i+k-2}\frac{ \vert q _{i} \vert }{\sqrt{i}}\frac{ \vert q_{k} \vert }{\sqrt{k}} \frac{ \vert q_{l} \vert }{\sqrt{l}} \biggr)^{2}\\ &\quad {}+C\sum_{j\geq 1} j^{2s} \biggl(\sum_{i\leq j\leq k\leq l\leq i+j+k-2}\frac{ \vert q_{i} \vert }{\sqrt{i}} \frac{ \vert q _{k} \vert }{\sqrt{k}}\frac{ \vert q_{l} \vert }{\sqrt{l}} \biggr)^{2} \\ &\quad{} +C\sum_{j\geq 1} j^{2s} \biggl(\sum _{ i\leq k\leq j\leq l\leq i+k +j-2}\frac{ \vert q _{i} \vert }{\sqrt{i}}\frac{ \vert q_{k} \vert }{\sqrt{k}} \frac{ \vert q_{l} \vert }{\sqrt{l}} \biggr)^{2}\\ &\quad {}+C\sum_{j\geq 1} j^{2s} \biggl(\sum_{i\leq k\leq l\leq j\leq i+k+l-2}\frac{ \vert q_{i} \vert }{\sqrt{i}} \frac{ \vert q _{k} \vert }{\sqrt{k}}\frac{ \vert q_{l} \vert }{\sqrt{l}} \biggr)^{2} \\ & :=I+\mathit{II}+\mathit{III}+\mathit{IV}. \end{aligned}$$

For I, by the Cauchy inequality, one simply has

$$\begin{aligned} I &\leq C \sum_{j\geq 1} j^{2s} \biggl(\sum_{j\leq i\leq k\leq l\leq j+i+k-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \\ &\quad{}\cdot \biggl(\sum_{j\leq i\leq k\leq l\leq j+i+k-2}\frac{1}{i^{2s}k ^{2s}l^{2s}} \biggr) \\ &\leq C\sum_{j\geq 1} \biggl(\sum _{j\leq i\leq k\leq l\leq j+i+k-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \\ &\quad{}\cdot \biggl(\sum_{j\leq i\leq k\leq l\leq j+i+k-2 }\frac{1}{i^{2s}k ^{2s}} \biggr) \quad (\text{using } j\leq l ) \\ &\leq C\sum_{j\geq 1} \biggl( \sum _{j\leq i\leq k\leq l\leq j+i+k-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \biggl(\sum_{i\leq k} \frac{i+j}{i^{2s}k^{2s}} \biggr) \\ &\leq C\sum_{j\geq 1}\sum_{j\leq i\leq k\leq l\leq j+i+k-2}i^{2s-1} \vert q _{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \\ &\leq C \sum_{l\geq 1}l^{2s-1} \vert q_{l} \vert ^{2}\sum_{k=1}^{l}k^{2s-1} \vert q _{k} \vert ^{2}\sum _{i=1}^{k}i^{2s-1} \vert q_{i} \vert ^{2}\sum_{j=1}^{i}1 \\ &\leq C\sum_{l\geq 1}l^{2s-1} \vert q_{l} \vert ^{2}\sum_{k=1}^{l}k^{2s-1} \vert q_{k} \vert ^{2} \sum _{i=1}^{k}i^{2s} \vert q_{i} \vert ^{2}=O\bigl( \Vert {q} \Vert ^{6}_{s} \bigr). \end{aligned}$$

By the same argument as that of estimating I, one gets \(\mathit{II},\mathit{III}=O(\| {q}\|^{6}_{s})\).

As to IV, due to the fact that \(j\leq i+k+l\leq 3l\), it is clear that

$$\begin{aligned} \mathit{IV} \leq & C\sum_{j\geq 1} \biggl(\sum _{ i\leq k\leq l\leq j\leq i+k +l-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \\ &{}\cdot \biggl(\sum_{i\leq k\leq l\leq j\leq i+k +l}\frac{ j^{2s}}{i ^{2s}k^{2s}l^{2s}} \biggr) \\ \leq & C\sum_{j\geq 1} \biggl(\sum _{ i\leq k\leq l\leq j\leq i+k +l-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \\ &{}\cdot \biggl(\sum_{i\leq k\leq l\leq j\leq i+k +l-2}\frac{(3l)^{2s}}{i ^{2s}k^{2s}l^{2s}} \biggr) \\ \leq & C\sum_{j\geq 1} \biggl(\sum _{ i\leq k\leq l\leq j\leq i+k +l-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \\ &{}\cdot \biggl(\sum_{i\leq k\leq l\leq j\leq i+k +l-2}\frac{1}{i^{2s}k ^{2s}} \biggr) \\ \leq & C\sum_{j\geq 1} \biggl(\sum _{ i\leq k\leq l\leq j\leq i+k +l-2}i ^{2s-1} \vert q_{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s-1} \vert q_{l} \vert ^{2} \biggr) \biggl(\sum_{i\leq k} \frac{j-k}{i^{2s}k^{2s}} \biggr) \\ \leq & C\sum_{j\geq 1}\sum _{i\leq k\leq l \leq j\leq i+k+l-2}i^{2s-1} \vert q _{i} \vert ^{2}k^{2s-1} \vert q_{k} \vert ^{2}l^{2s} \vert q_{l} \vert ^{2} \\ \leq & C\sum_{l\geq 1}l^{2s} \vert q_{l} \vert ^{2}\sum_{k=1}^{l}k^{2s-1} \vert q_{k} \vert ^{2} \sum _{i=1}^{k}i^{2s-1} \vert q_{i} \vert ^{2}\sum_{j=l}^{i+k+l}1 \\ \leq & C\sum_{l\geq 1}l^{2s} \vert q_{l} \vert ^{2}\sum_{k=1}^{l}k^{2s} \vert q_{k} \vert ^{2} \sum _{i=1}^{k}i^{2s} \vert q_{i} \vert ^{2}=O\bigl( \Vert {q} \Vert ^{6}_{s} \bigr). \end{aligned}$$

This completes the proof.

To verify the hypotheses of the KAM theorem in the last section, we need the coefficients \(G_{iijj}\). If \(\phi _{j}\) were the trigonometric function \(\sqrt{\frac{2}{\pi }}\sin jx\), it would be easy to calculate \(G_{iijj}\). In this paper, however, \(\phi _{j}\) is a normalized Legendre polynomial, and the calculation of the integral \(\int _{-1}^{1}\phi _{i}\phi _{j}\phi _{i}\phi _{j} \,d x\) is considerably more involved. By direct calculation, one can derive the following useful facts:

$$\begin{aligned}& G_{11jj}=\frac{1}{2\lambda _{1}\lambda _{j}},\quad j\geq 1; \end{aligned}$$
(13)
$$\begin{aligned}& G_{22jj}=\frac{3(2j^{2}-2j-1)}{2\lambda _{2}\lambda _{j}(2j-3)(2j+1)}, \quad j\geq 2; \end{aligned}$$
(14)
$$\begin{aligned}& G_{33jj}=\frac{5(11j^{4}-22j^{3}-31j^{2}+42j+18)}{4\lambda _{3}\lambda _{j}(2j-5)(2j-3)(2j+1)(2j+3)},\quad j\geq 3. \end{aligned}$$
(15)

 □
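The closed forms (13)–(15) can be cross-checked against the definition (10). The following sketch, under the assumption that \(\phi _{j}=\sqrt{j-\frac{1}{2}}P_{j-1}\) as in (5), computes \(\lambda _{i}\lambda _{j}G_{iijj}=\int _{-1}^{1}\phi _{i}^{2}\phi _{j}^{2}\,dx\) exactly and compares it with the right-hand sides of (13)–(15) for a range of j. The names `scaled_G` and `closed_form` are ours.

```python
from fractions import Fraction

def legendre(n):
    """Coefficients (lowest degree first) of the Legendre polynomial P_n."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]
    if n == 0:
        return prev
    for k in range(1, n):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        shifted = [Fraction(0)] + cur
        padded = prev + [Fraction(0)] * (len(shifted) - len(prev))
        prev, cur = cur, [((2 * k + 1) * a - k * b) / (k + 1)
                          for a, b in zip(shifted, padded)]
    return cur

def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            out[a + b] += ca * cb
    return out

def integrate(p):
    """Exact integral of p over [-1, 1]."""
    return sum(2 * c / (k + 1) for k, c in enumerate(p) if k % 2 == 0)

def scaled_G(i, j):
    """lambda_i * lambda_j * G_{iijj}: the integral of phi_i^2 phi_j^2 over [-1, 1]."""
    Pi, Pj = legendre(i - 1), legendre(j - 1)
    norm = Fraction(2 * i - 1, 2) * Fraction(2 * j - 1, 2)
    return norm * integrate(mul(mul(Pi, Pi), mul(Pj, Pj)))

def closed_form(i, j):
    """Right-hand sides of (13)-(15), multiplied by lambda_i * lambda_j."""
    if i == 1:
        return Fraction(1, 2)
    if i == 2:
        return Fraction(3 * (2 * j * j - 2 * j - 1), 2 * (2 * j - 3) * (2 * j + 1))
    if i == 3:
        return Fraction(5 * (11 * j**4 - 22 * j**3 - 31 * j**2 + 42 * j + 18),
                        4 * (2 * j - 5) * (2 * j - 3) * (2 * j + 1) * (2 * j + 3))
    raise ValueError("closed forms are only stated for i = 1, 2, 3")

matches = all(scaled_G(i, j) == closed_form(i, j)
              for i in (1, 2, 3) for j in range(i, 10))
print(matches)
```

The comparison is restricted to \(j\geq i\), which is the range in which (13)–(15) are stated.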

3 Partial Birkhoff normal form

In this section, we shall derive the partial Birkhoff normal form for the Hamiltonian (9). To this end, we introduce the following complex coordinates:

$$ z_{j}=\frac{1}{\sqrt{2}}(q_{j}+\sqrt{-1}p_{j}), \qquad \bar{z}_{j}=\frac{1}{ \sqrt{2}}(q_{j}- \sqrt{-1}p_{j}). $$

Inserting them into (9), one gets the real analytic Hamiltonian

$$ \begin{aligned}[b] H &=\varLambda +G \\ &=\sum_{j}\lambda _{j} \vert z_{j} \vert ^{2}+ \frac{1}{16}\sum _{i,j,k,l} G_{ijkl}(z _{i}+\bar{z}_{i}) (z_{j}+\bar{z}_{j}) (z_{k}+\bar{z}_{k}) (z_{l}+\bar{z} _{l}) \end{aligned} $$
(16)

on the complex Hilbert space \(\ell ^{2}_{s}\) with symplectic structure \(\sqrt{-1}\sum_{j\geq 1} dz_{j}\wedge d\bar{z}_{j}\). Real analytic means that H is a function of z and \(\bar{z}\), real analytic in the real and imaginary parts of z. For convenience, we introduce \(z_{-j}= \bar{z}_{j}\) for \(j\geq 1\); then H in (16) can be written as

$$ H=\sum_{j\geq 1}\lambda _{j}z_{j}z_{-j}+ \frac{1}{16} \sum _{i,j,k,l\in \mathbb{Z}_{*}} G_{ijkl}z_{i}z_{j}z_{k}z_{l}, $$
(17)

where \(G_{ijkl}:=G_{|i||j||k||l|}\) for \(i,j,k,l\in \mathbb{Z}_{*}:= \mathbb{Z}\setminus \{0\}\).

Since the quadratic part of the Hamiltonian does not provide any “twist” required by KAM theory, we shall use the normal form technique to extract “twisted” integrable terms from the fourth order terms. To obtain a three-dimensional KAM torus, for simplicity, we choose \((z_{1},z_{2},z _{3})\) as tangential variables; all the other variables are called normal ones. In this part, the non-resonant fourth order terms with at most two normal variables will be eliminated, while the other fourth order terms are kept since they have no effect on the tori. To this end, we define the index sets \(\triangle _{*}\), \(*=0,1,2\), and \(\triangle _{3}\) as follows: \(\triangle _{*}\) is the set of indices \((i, j, k, l)\) with exactly ∗ components not in \(\{\pm 1,\pm 2,\pm 3\}\), and \(\triangle _{3}\) is the set of indices \((i, j, k, l)\) with at least three components not in \(\{\pm 1,\pm 2,\pm 3\}\).

Define the normal form set

$$ \mathcal{N}=\bigl\{ (i,j,k,l)\in \mathbb{Z}_{*}^{4} : (i,j,k,l) \text{ is of the form }(p,-p,q,-q) \text{ or its permutations}\bigr\} . $$

For our convenience, rewrite \(G=\bar{G}+\tilde{G}+\hat{G}\), where

$$\begin{aligned}& \bar{G}=\frac{1}{16}\sum_{\substack{(i,j,k,l)\in (\triangle _{0}\cup \triangle _{1} \cup \triangle _{2})\bigcap \mathcal{N}}}G_{ijkl}z_{i}z_{j}z_{k}z _{l}, \\& \tilde{G}=\frac{1}{16}\sum_{\substack{(i,j,k,l)\in (\triangle _{0}\cup \triangle _{1} \cup \triangle _{2})\setminus \mathcal{N}}}G_{ijkl}z_{i}z_{j}z_{k}z _{l}, \end{aligned}$$

and

$$ \hat{G}=\frac{1}{16}\sum_{ \substack{(i,j,k,l)\in \triangle _{3}}}G_{ijkl}z_{i} z_{j} z_{k} z_{l}. $$

We will eliminate \(\tilde{G}\) by a symplectic coordinate transformation \(\varGamma =X_{F}^{1}\), the time-1 map of the flow of the Hamiltonian vector field \(X_{F}\) given by a Hamiltonian

$$ F=\sum_{i,j,k,l\in \mathbb{Z}_{*}}F_{ijkl}z_{i} z_{j} z_{k} z_{l} $$
(18)

with coefficients

$$ \sqrt{-1} F_{ijkl}= \textstyle\begin{cases} \frac{1}{16}\frac{G_{ijkl}}{\lambda _{i}^{\prime }+\lambda _{j}^{ \prime }+\lambda _{k}^{\prime }+\lambda _{l}^{\prime }} & \text{for } (i,j,k,l)\in (\triangle _{0}\cup \triangle _{1}\cup \triangle _{2})\setminus \mathcal{N},\\ 0& \text{otherwise}. \end{cases} $$
(19)

Here \(\lambda _{j}^{\prime }:=\operatorname{sgn}j\cdot \lambda _{|j|}\) for \(j\in \mathbb{Z}_{*}\). Then formally we have

$$ \{\varLambda ,F\}+\tilde{G}=0 $$

where \(\{\cdot ,\cdot \}\) is the Poisson bracket with respect to the symplectic structure

$$ \sqrt{-1}\sum_{j\geq 1}\,dz_{j}\wedge dz_{-j}. $$

Expanding at \(t=0\) and using Taylor’s formula we formally obtain

$$ \begin{aligned}[b] H\circ \varGamma &= H\circ X^{t}_{F}|_{t=1} \\ &=H+ \{H,F \}+ \int _{0}^{1}(1-t) \bigl\{ \{H,F \},F \bigr\} \circ X^{t} _{F} \,d t \\ &=\varLambda +\bar{G}+\hat{G}+ \{G,F \}+ \int _{0} ^{1}(1-t) \bigl\{ \{H,F \},F \bigr\} \circ X^{t}_{F} \,d t. \end{aligned} $$
(20)

Now we need to show that the definition (19) is meaningful and to establish the regularity of the vector field \(X_{F}\). To this end, we check that the divisors \(\lambda _{i}^{\prime }+\lambda _{j}^{\prime }+ \lambda _{k}^{\prime }+\lambda _{l}^{\prime }\) stay away from zero.

Lemma 3.1

If \(m\in (0,\frac{1}{4})\cup (\frac{1}{4},+\infty )\), there exists a positive number σ depending on m such that

$$ \bigl\vert \lambda _{i}^{\prime }+\lambda _{j}^{\prime }+ \lambda _{k}^{\prime }+ \lambda _{l}^{\prime } \bigr\vert >\sigma (m),\quad \forall (i,j,k,l)\in ( \triangle _{0}\cup \triangle _{1}\cup \triangle _{2})\setminus \mathcal{N}. $$

Proof

If \(0< m<\frac{1}{4}\), the conclusion holds; see [7]. In the following, we assume that \(m> \frac{1}{4}\). Since \(\lambda _{i}^{\prime }+\lambda _{j}^{\prime }+ \lambda _{k}^{\prime }+\lambda _{l}^{\prime }=\pm \lambda _{|i|}\pm \lambda _{|j|}\pm \lambda _{|k|}\pm \lambda _{|l|}\), it is equivalent to study divisors of the form \(\delta :=\pm \lambda _{i}\pm \lambda _{j}\pm \lambda _{k}\pm \lambda _{l}\), where the indices i, j, k, l are positive integers. Without loss of generality, we assume that \(i\leq j\leq k\leq l\). By Lemma 2.1, we may also assume that \(l\leq i+j+k-2\) and \(i\pm j\pm k\pm l \in 2\mathbb{Z}\), since otherwise \(G_{ijkl}=0\). To show that δ does not vanish, we distinguish cases according to the number of minus signs. To shorten notation, we write, for example, \(\delta _{++-+}= \lambda _{i}+\lambda _{j}-\lambda _{k}+\lambda _{l}\).

Case 0: No minus sign appears. This case is trivial.

Case 1: One minus sign appears. Since \(i\le j\le k\le l\), it is sufficient to study

$$ \delta =\delta _{+++-}=\sqrt{i(i-1)+m}+\sqrt{j(j-1)+m}+ \sqrt{k(k-1)+m}- \sqrt{l(l-1)+m}. $$

We regard δ as a function of m and claim that \(\delta (0)>0\). In fact, if \(j\geq 2\), we have

$$ \begin{aligned}[b] \delta (0) &=\sqrt{i(i-1)}+ \sqrt{j(j-1)}+\sqrt{k(k-1)}- \sqrt{l(l-1)} \\ &=i+j+k-l-\frac{i}{\sqrt{i(i-1)}+i}-\frac{j}{\sqrt{j(j-1)}+j}-\frac{k}{ \sqrt{k(k-1)}+k} \\ &\quad{}+\frac{l}{\sqrt{l(l-1)}+l} \\ &\geq 2-1-2\times \frac{2}{2+\sqrt{2}}+\frac{1}{2} \\ &>0, \end{aligned} $$
(21)

where in the last two lines we have used \(i+j+k-l\geq 2\) and the fact that the function \(\frac{s}{\sqrt{s(s-1)}+s}\) is monotone decreasing on \([1,+\infty )\) and approaches \(\frac{1}{2}\) as \(s\to +\infty \). Since

$$ \begin{aligned}[b] \delta '(m) &=\frac{1}{2\sqrt{i(i-1)+m}}+ \frac{1}{2 \sqrt{j(j-1)+m}}+\frac{1}{2\sqrt{k(k-1)+m}}-\frac{1}{2 \sqrt{l(l-1)+m}} \\ &\ge \frac{1}{2\sqrt{i(i-1)+m}}, \end{aligned} $$

and \(i\le 3\) (at least two of the indices i, j, k, l lie in \(\{1,2,3\}\)), it is easy to see that

$$ \delta (m)\ge \int _{0}^{m}\frac{1}{2\sqrt{i(i-1)+s}}\,ds\ge \frac{m}{2 \sqrt{i(i-1)+m}}\geq \frac{m}{2\sqrt{6+m}},\quad \forall m>0. $$

Otherwise \(j=1\), so that \(i=j=1\) and \(k=l\) because \(k\leq l\leq i+j+k-2=k\). Thus, \(\delta (m)=2\sqrt{m}\).

Case 2: Two minus signs appear. It suffices to consider the case \(\delta (m)=\delta _{+--+}\), namely,

$$ \delta (m)=\sqrt{i(i-1)+m}-\sqrt{j(j-1)+m}-\sqrt{k(k-1)+m}+ \sqrt{l(l-1)+m}. $$

Denote \(\mathfrak{S}=\{1,2,3\}\). We divide the discussion into several subcases.

  1. (a)

    \(i,j,k,l\in \mathfrak{S}\),

    1. (a1)

      When all of the four elements overlap, that is, \(i=j=k=l\), it follows that the corresponding terms are resonant ones.

    2. (a2)

      When \(i=j=k< l\) or \(i< j=k=l\), one obtains

      $$ \vert \delta \vert >\min \{\sqrt{m+2}-\sqrt{m},\sqrt{m+6}-\sqrt{m+2}\}>0. $$
    3. (a3)

      When \(i=j< k< l\) or \(i< j=k< l\) or \(i< j< k=l\), it is easy to check \(|\delta |>0\) for \(m\ne \frac{1}{4}\), respectively.

  2. (b)

    Three elements of i, j, k, l lie in \(\mathfrak{S}\), while the remaining one, l, lies outside \(\mathfrak{S}\), that is, \(l\ge 4\).

    1. (b1)

      When \(i=j=k\), say \(i=j=k=3\), one has

      $$ \delta =\sqrt{l(l-1)+m}-\sqrt{m+6}\geq \sqrt{m+12}-\sqrt{m+6}>0. $$
    2. (b2)

      When \(i=j< k,l\ge 4\), one obtains

      $$ \delta \ge \sqrt{12+m}-\sqrt{6+m}>0. $$

      As to the case \(i< j=k\), without loss of generality, we assume that \(i=1\), \(j=k=3\). In view of the fact that \(h(t)=\sqrt{t(t-1)+m}\) is monotone increasing and convex when \(m>\frac{1}{4}\), one has

      $$\begin{aligned} \delta =& \sqrt{m}-2\sqrt{6+m}+\sqrt{l(l-1)+m} \\ \geq & \sqrt{m}-2\sqrt{6+m}+\sqrt{20+m}>h''(5)>0. \end{aligned}$$

      We remark that the case \(i=1\), \(j=k=3\), \(l=4\) does not appear because of the condition \(i\pm j\pm k\pm l \in 2\mathbb{Z}\).

    3. (b3)

      When \(i=1\), \(j=2\), \(k=3\), \(l\ge 4\), one gets

      $$ \begin{aligned}[b] \delta &=\sqrt{m}-\sqrt{2+m}-\sqrt{6+m}+ \sqrt{l(l-1)+m} \\ &\ge \sqrt{m}-\sqrt{2+m}-\sqrt{6+m}+\sqrt{12+m} \\ &\ge \sqrt{6+m}-2\sqrt{2+m}+\sqrt{m} \\ &\ge h''(3) >0, \end{aligned} $$

      by using the convexity of \(h(t)\) when \(m>\frac{1}{4}\).

  3. (c)

    There exist two elements that lie in \(\mathfrak{S}\), while \(k,l\notin \mathfrak{S}\).

    1. (c1)

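      In the case \(i=j\in \mathfrak{S}\), the terms with \(k=l\) are resonant, while for \(k<l\) the parity condition forces \(l\geq k+2\), and the convexity of h gives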

      $$ \begin{aligned}[b] \delta &=\sqrt{l(l-1)+m}-\sqrt{k(k-1)+m} \\ &\ge \sqrt{6+m}-\sqrt{m}>0. \end{aligned} $$
    2. (c2)

      In the case \(i< j\), when \(i=1\), \(j=2\), one gets

      $$ \begin{aligned}[b] \delta &=\sqrt{l(l-1)+m}-\sqrt{k(k-1)+m}- \sqrt{2+m}+\sqrt{m} \\ &\ge \sqrt{6+m}-2\sqrt{2+m}+\sqrt{m} \\ &\ge h''(3)>0. \end{aligned} $$

      When \(i=1\), \(j=3 \), it is clear that \(k\le l\le k+i+j-2\le k+2\). Since \(i\pm j\pm k\pm l\in 2\mathbb{Z}\), it follows that \(k=l\) or \(l=k+2\). When \(k=l\), \(\delta =\sqrt{6+m}-\sqrt{m}>0\). When \(l=k+2\), one has

      $$ \begin{aligned}[b] \delta &=\sqrt{(k+1) (k+2)+m}-\sqrt{k(k-1)+m}- \sqrt{6+m}+ \sqrt{m} \\ &\ge h(5)-2h(3)+h(1) \\ &\ge h''(5)>0. \end{aligned} $$

      When \(i=2\), \(j=3\), in view of \(i\pm j\pm k\pm l\in 2\mathbb{Z}\), we know that \(k\ne l\), that is, \(l\ge k+1\). Hence, it follows that

      $$ \begin{aligned}[b] \delta &=\sqrt{l(l-1)+m}-\sqrt{k(k-1)+m}- \sqrt{6+m}+ \sqrt{2+m} \\ &\ge h(k+1)-h(k)-h(3)+h(2) \\ &\ge h(4)-2h(3)+h(2) \\ &\ge h''(4)>0. \end{aligned} $$

 □
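Lemma 3.1 can be explored numerically on a truncated index range. The sketch below is only an illustration in floating point, with the sample value \(m=1\) and the cutoff on the indices chosen by us. It enumerates the sorted indices with at most two normal components that satisfy the conditions of Lemma 2.1, runs over all sign patterns, discards the tuples in \(\mathcal{N}\), and records the smallest \(|\delta |\). For comparison it repeats the scan at \(m=\frac{1}{4}\), where (5) gives \(\lambda _{j}=j-\frac{1}{2}\) and, for instance, \(\lambda _{1}-\lambda _{2}-\lambda _{3}+\lambda _{4}=0\).

```python
import math
from collections import Counter
from itertools import product

def lam(j, m):
    """Frequency lambda_j = sqrt(j (j - 1) + m) from (5)."""
    return math.sqrt(j * (j - 1) + m)

def admissible_indices(max_index):
    """Sorted indices i <= j <= k <= l with i, j in {1, 2, 3} and G_{ijkl} possibly nonzero."""
    for i in range(1, 4):
        for j in range(i, 4):
            for k in range(j, max_index + 1):
                for l in range(k, min(max_index, i + j + k - 2) + 1):
                    if (i + j + k + l) % 2 == 0:
                        yield i, j, k, l

def in_normal_form_set(signed):
    """True if the signed tuple is a permutation of (p, -p, q, -q)."""
    return Counter(signed) == Counter(-s for s in signed)

def smallest_divisor(m, max_index):
    """Smallest |delta| over admissible, non-resonant signed index tuples."""
    best = math.inf
    for idx in admissible_indices(max_index):
        for signs in product((1, -1), repeat=4):
            signed = tuple(s * n for s, n in zip(signs, idx))
            if in_normal_form_set(signed):
                continue
            delta = sum(s * lam(n, m) for s, n in zip(signs, idx))
            best = min(best, abs(delta))
    return best

print(smallest_divisor(1.0, 40))   # Lemma 3.1 predicts a positive lower bound
print(smallest_divisor(0.25, 40))  # the excluded value m = 1/4 is resonant
```

Such a scan cannot replace the case analysis above, since it covers only finitely many indices and one value of m, but it is a quick way to catch a sign or index slip.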

In view of (10) and the above lemma, in the same way as [7], the regularity of the vector field \(X_{F}\) could easily be established, that is,

$$ X_{F}\in \mathbf{A}\bigl(\ell ^{2}_{s,b}, \ell ^{2}_{s+\frac{1}{2},b}\bigr), $$
(22)

where \(\mathbf{A}(\ell ^{2}_{s,b},\ell ^{2}_{s+\frac{1}{2},b})\) denotes the class of all real analytic maps from some neighborhood of the origin in \(\ell ^{2}_{s,b}\) into \(\ell ^{2}_{s+\frac{1}{2},b}\), and \(\ell ^{2}_{s,b}\) denotes the Hilbert space of all bi-infinite sequences with finite norm \(\Vert q\Vert ^{2}_{s,b}=|q_{0}|^{2}+\sum_{j}|q_{j}|^{2}|j| ^{2s}\).

Next, by (20), we transform the Hamiltonian into the partial Birkhoff normal form of order four so that the KAM theorem can be applied.

Proposition 3.2

Assume \(m\in (0,\frac{1}{4})\cup (\frac{1}{4},+\infty )\). Then, for the Hamiltonian \(H=\varLambda +G\) in (9), there exists a real analytic, symplectic change of coordinates Γ in some neighborhood of the origin in \(\ell ^{2}_{s,b}\) that takes it into

$$ H\circ \varGamma =\varLambda +\bar{G}+\hat{G}+K, $$

where

$$ \begin{aligned}[b] &K = \{G,F \}+ \int _{0}^{1}(1-t) \bigl\{ \{H,F \},F \bigr\} \circ X^{t}_{F} \,dt=O\bigl( \Vert {z} \Vert ^{6}_{4}\bigr), \\ &\bar{G} =\frac{1}{2}\sum_{\min (i,j)\leq 3} \bar{G}_{ij} \vert z_{i} \vert ^{2} \vert z _{j} \vert ^{2}, \end{aligned} $$

with uniquely determined coefficients \(\bar{G}_{ii}=\frac{12}{16}G _{iiii}\) and \(\bar{G}_{ij}=\frac{24}{16}G_{iijj}\) for \(i\neq j\) and \(\min (i,j)\leq 3\). Moreover, \(X_{\bar{G}},X_{\hat{G}},X_{K} \in \mathbf{A}(\ell ^{2}_{s,b},\ell ^{2}_{s+\frac{1}{2},b})\).

The proof is similar to that of Proposition 4.1 in [7]; we omit the details.
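The numerical factors \(\frac{12}{16}\) and \(\frac{24}{16}\) in Proposition 3.2 come from counting how many ordered index tuples in \(\mathcal{N}\) produce the same monomial. The short sketch below, which is our own bookkeeping rather than the proof in [7], counts the distinct arrangements of \((i,-i,i,-i)\) and of \((i,-i,j,-j)\) with \(i\neq j\), and converts them into the coefficients of \(\frac{1}{2}\sum \bar{G}_{ij}|z_{i}|^{2}|z_{j}|^{2}\).

```python
from fractions import Fraction
from itertools import permutations

def arrangements(signed_indices):
    """Number of distinct ordered 4-tuples obtained by permuting the signed indices."""
    return len(set(permutations(signed_indices)))

diag = arrangements((1, -1, 1, -1))      # tuples contributing to |z_1|^4
off_diag = arrangements((1, -1, 2, -2))  # tuples contributing to |z_1|^2 |z_2|^2

# (1/16) * diag * G_iiii |z_i|^4 must equal (1/2) * bar_G_ii |z_i|^4.
bar_G_ii_over_G = 2 * Fraction(diag, 16)
# (1/16) * off_diag * G_iijj |z_i|^2 |z_j|^2 must equal
# (1/2) * (bar_G_ij + bar_G_ji) |z_i|^2 |z_j|^2 = bar_G_ij |z_i|^2 |z_j|^2.
bar_G_ij_over_G = Fraction(off_diag, 16)

print(diag, off_diag, bar_G_ii_over_G, bar_G_ij_over_G)
```

The off-diagonal pair is counted once here because the symmetric double sum over \((i,j)\) and \((j,i)\) already absorbs the factor \(\frac{1}{2}\).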

4 Proof of the main theorem

In this section, with the aid of the KAM theorem for infinite-dimensional Hamiltonian systems [22], we shall establish the existence of quasi-periodic solutions for Eq. (4).

First, we introduce symplectic polar and real coordinates by setting

$$ z_{j}= \textstyle\begin{cases} \sqrt{\xi _{j}+y_{j}}e^{-\sqrt{-1}x_{j}}, & j=1,2,3,\\ \frac{1}{ \sqrt{2}}(u_{j}+\sqrt{-1}v_{j}), &j\geq 4, \end{cases} $$
(23)

depending on the parameters \(\xi =(\xi _{1},\xi _{2},\xi _{3})\in \varPi =[0,1]^{3}\). The precise domain will be specified later when it matters. Then one has

$$ \sqrt{-1}\sum_{j\geq 1}\,dz_{j} \wedge d \bar{z}_{j}=\sum_{1\leq j \leq 3 }\,dx_{j} \wedge dy_{j}+\sum_{ j\geq 4 }\,du_{j} \wedge dv_{j}, $$

and

$$ \textstyle\begin{cases} I_{j}=\xi _{j}+y_{j},&j=1,2,3, \\ I_{k}=\frac{1}{2}(u_{k}^{2}+v_{k}^{2}),&k\ne 1,2,3. \end{cases} $$

Hence, up to a constant depending only on ξ, the Hamiltonian (still denoted by H) reads

$$ H=\varLambda +\bar{G}+\hat{G}+K=\bigl\langle \omega (\xi ),y\bigr\rangle + \frac{1}{2}\bigl\langle \varOmega (\xi ),u^{2}+v^{2} \bigr\rangle +\check{G}+ \hat{G}+K, $$

where

$$\begin{aligned}& \varLambda =\sum_{j=1}^{3} \lambda _{j} y_{j}+\frac{1}{2}\sum _{j\ge 4}\lambda _{j}\bigl(u_{j}^{2}+v_{j}^{2} \bigr), \end{aligned}$$
(24)
$$\begin{aligned}& \begin{aligned}[b] \bar{G} &=\frac{1}{2}\sum _{\max \{i,j\}\le 3}\bar{G}_{ij} \vert z _{i} \vert ^{2} \vert z_{j} \vert ^{2}+\frac{1}{2}\sum_{1\le i\le 3< j}\bar{G} _{ij} \vert z_{i} \vert ^{2} \vert z_{j} \vert ^{2}+\frac{1}{2}\sum _{1\le j\le 3< i} \bar{G}_{ij} \vert z_{i} \vert ^{2} \vert z_{j} \vert ^{2} \\ &=\frac{1}{2}\sum_{\max \{i,j\}\le 3}\bar{G}_{ij}( \xi _{i}+y _{i}) (\xi _{j}+y_{j})+ \frac{1}{4}\sum_{1\le i\le 3< j}\bar{G} _{ij}( \xi _{i}+y_{i}) \bigl(u_{j}^{2}+v_{j}^{2} \bigr) \\ &\quad{}+\frac{1}{4}\sum_{1\le j\le 3< i} \bar{G}_{ij}\bigl(u_{i}^{2}+v_{i} ^{2}\bigr) (\xi _{j}+y_{j}) \\ &=\frac{1}{2}\sum_{1\le i, j\le 3}\bar{G}_{ij}( \xi _{i} \xi _{j}+\xi _{j} y_{i}+\xi _{i} y_{j}+y_{i} y_{j})+ \frac{1}{4}\sum_{1\le i\le 3< j}\bar{G}_{ij}\xi _{i}\bigl(u_{j}^{2}+v_{j}^{2} \bigr) \\ &\quad{}+\frac{1}{4}\sum_{1\le i\le 3< j} \bar{G}_{ij}\bigl(u_{j}^{2}+v _{j}^{2} \bigr)y_{i}+\frac{1}{4}\sum_{1\le j\le 3< i} \bar{G}_{ij}\xi _{j}\bigl(u_{i}^{2}+v_{i}^{2} \bigr) \\ &\quad{}+\frac{1}{4}\sum_{1\le j\le 3< i} \bar{G}_{ij}y_{j}\bigl(u_{i}^{2}+v _{i}^{2}\bigr) \\ &=\frac{1}{2}\sum_{1\le i,j\le 3}\bar{G}_{ij} \xi _{i}\xi _{j}+ \sum_{j=1}^{3} \Biggl(\sum_{i=1}^{3}\bar{G}_{ij} \xi _{i}\Biggr)y_{j}+ \frac{1}{2}\sum _{j\ge 4} \sum_{i=1}^{3} \bar{G}_{ij}\xi _{i}\bigl(u_{j}^{2}+v_{j}^{2} \bigr)+\text{h.o.t.} \end{aligned} \end{aligned}$$
(25)

Hence the frequencies take the following form:

$$ \omega (\xi )=\alpha +A\xi ,\qquad \varOmega (\xi )=\beta +B\xi , $$
(26)

where

$$\begin{aligned}& \alpha =(\lambda _{1},\lambda _{2},\lambda _{3}),\qquad \beta =(\lambda _{4},\lambda _{5}, \ldots ), \end{aligned}$$
(27)
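$$\begin{aligned}& A= \begin{pmatrix} \bar{G}_{11} & \bar{G}_{12} & \bar{G}_{13} \\ \bar{G}_{21} & \bar{G}_{22} & \bar{G}_{23} \\ \bar{G}_{31} & \bar{G}_{32} & \bar{G}_{33} \end{pmatrix} =\frac{1}{16} \begin{pmatrix} \frac{6}{\lambda _{1}^{2}} & \frac{12}{\lambda _{1}\lambda _{2}} & \frac{12}{\lambda _{1}\lambda _{3}} \\ \frac{12}{\lambda _{1}\lambda _{2}} & \frac{54}{5\lambda _{2}^{2}} & \frac{132}{7\lambda _{2}\lambda _{3}} \\ \frac{12}{\lambda _{1}\lambda _{3}} & \frac{132}{7\lambda _{2}\lambda _{3}} & \frac{90}{7\lambda _{3}^{2}} \end{pmatrix}, \end{aligned}$$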
(28)
$$\begin{aligned}& B= \begin{pmatrix} \bar{G}_{41} & \bar{G}_{42} & \bar{G}_{43} \\ \bar{G}_{51} & \bar{G}_{52} & \bar{G}_{53} \\ \vdots & \vdots & \vdots \end{pmatrix}, \end{aligned}$$
(29)

and the remainder

$$ \check{G}=O\bigl(y^{2}+ \vert y \vert \bigl(u^{2}+v^{2}\bigr)\bigr),\qquad \hat{G}=O\bigl( \vert \xi \vert ^{\frac{1}{2}}\bigl(u ^{2}+v^{2}\bigr)^{\frac{3}{2}} \bigr),\qquad K=O\bigl( \vert \xi \vert ^{3}\bigr). $$
(30)

From (26), we know that the frequencies are affine functions of the parameters ξ. To prove our main theorem, by Theorem D in [22] it suffices to check the assumptions of Theorem A in [22], namely non-degeneracy, spectral asymptotics, regularity, and smallness of the perturbation. By (5) and Proposition 3.2, the second and third assumptions are easily verified. To verify the non-degeneracy assumption, it is enough to prove that the following three conditions hold:

$$ \begin{aligned} &(A_{1})\quad \det {A}\neq 0, \\ &(A_{2})\quad \langle l,\beta \rangle \neq 0, \\ &(A_{3})\quad \bigl\langle k,\omega (\xi )\bigr\rangle +\bigl\langle l, \varOmega (\xi )\bigr\rangle \not \equiv 0, \end{aligned} $$
(31)

for all \((k,l)\in \mathbb{Z}^{3}\times \mathbb{Z}^{\infty }\) with \(1\leq |l|\leq 2\).

For simplicity, set

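$$ A_{0}:= \begin{pmatrix} 6 & 12 & 12 \\ 12 & \frac{54}{5} & \frac{132}{7} \\ 12 & \frac{132}{7} & \frac{90}{7} \end{pmatrix},\qquad A_{1}:= \begin{pmatrix} \frac{1}{\lambda _{1}} & 0 & 0 \\ 0 & \frac{1}{\lambda _{2}} & 0 \\ 0 & 0 & \frac{1}{\lambda _{3}} \end{pmatrix}, $$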

then one gets

$$ A=\frac{1}{16}A_{1} A_{0} A_{1}. $$
(32)

It is easy to obtain the inverse matrix of \(A_{0}\),

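$$ A_{0}^{-1}= \begin{pmatrix} -\frac{1475}{4926} & \frac{245}{2463} & \frac{329}{2463} \\ \frac{245}{2463} & -\frac{455}{4926} & \frac{35}{821} \\ \frac{329}{2463} & \frac{35}{821} & -\frac{539}{4926} \end{pmatrix}, $$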
(33)

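which yields \(\det A\ne 0\) by (32), since \(A_{1}\) is invertible as well. Condition \((A_{2})\) holds because, for \(1\le |l|\le 2\), \(\langle l,\beta \rangle \) is, up to sign, of the form \(\lambda _{j}\), \(2\lambda _{j}\) or \(\lambda _{i}\pm \lambda _{j}\) (\(i\neq j\)), none of which equals zero.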
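The inverse of \(A_{0}\) can be double-checked in exact rational arithmetic. The following is a minimal sketch, not part of the original argument; the helper names `det3` and `inverse3` are hypothetical, and the inverse is computed from the adjugate of the \(3\times 3\) matrix.

```python
from fractions import Fraction as F

# A_0 as defined before (32), with exact rational entries.
A0 = [[F(6), F(12), F(12)],
      [F(12), F(54, 5), F(132, 7)],
      [F(12), F(132, 7), F(90, 7)]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inverse3(m):
    """Inverse of a 3x3 matrix as adjugate divided by determinant."""
    d = det3(m)
    inv = [[None] * 3 for _ in range(3)]
    for r in range(3):
        for c in range(3):
            # Entry (r, c) of the adjugate is the cofactor of entry (c, r).
            rows = [i for i in range(3) if i != c]
            cols = [j for j in range(3) if j != r]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            inv[r][c] = (-1) ** (r + c) * minor / d
    return inv

inv = inverse3(A0)
for row in inv:
    print([str(x) for x in row])
```

Since the determinant of \(A_{0}\) computed this way is nonzero, the invertibility of \(A\) follows from (32) as stated.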

Next, we check the non-degeneracy condition \((A_{3})\). It is equivalent to showing that, for all \((k,l)\) with \(1\leq |l|\leq 2 \), either \(\langle \alpha ,k\rangle +\langle \beta ,l\rangle \neq 0\) or \(Ak+B^{T}l\neq 0\). Suppose that \(Ak+B^{T}l=0\), that is, \(k=-A^{-1}B^{T}l\). We distinguish two cases.

Case I. \(|l|=1\). Without loss of generality, we assume that

$$ l=\bigl(0,\ldots ,\underset{\underset{(j-3)\text{th}}{\uparrow }}{-1},0,\ldots \bigr) $$

with \(j\geq 4\). Then one has

$$ k= \begin{pmatrix} k_{1} \\ k_{2} \\ k_{3} \end{pmatrix} =A^{-1}( \bar{G}_{j1},\bar{G}_{j2},\bar{G}_{j3})^{T} =16A_{1} ^{-1}A_{0}^{-1} A_{1}^{-1}(\bar{G}_{j1},\bar{G}_{j2}, \bar{G}_{j3})^{T}, $$
(34)

where

$$ \begin{aligned} \bar{G}_{j1} &= \frac{24\cdot \frac{1}{2}(j+\frac{1}{2})P(0,j)}{16 \lambda _{1}\lambda _{j}}=\frac{12}{16\lambda _{1}\lambda _{j}}, \\ \bar{G}_{j2} &=\frac{24\cdot (1+\frac{1}{2})(j+\frac{1}{2})P(1,j)}{16 \lambda _{2}\lambda _{j}}=\frac{36(2j^{2}+2j-1)}{16\lambda _{2}\lambda _{j}(2j-1)(2j+3)}, \\ \bar{G}_{j3} &=\frac{24\cdot (2+\frac{1}{2})(j+\frac{1}{2})P(2,j)}{16 \lambda _{3}\lambda _{j}}=\frac{30(11j^{4}+22j^{3}-31j^{2}-42j+18)}{16 \lambda _{3} \lambda _{j}(2j-3)(2j-1)(2j+3)(2j+5)}. \end{aligned} $$
(35)

Let

$$ f(t)=\frac{2t^{2}+2t-1}{(2t-1)(2t+3)},\qquad g(t)=\frac{11t^{4}+22t^{3}-31t^{2}-42t+18}{(2t-3)(2t-1)(2t+3)(2t+5)}. $$

One can show that both functions \(f(t)\) and \(g(t)\) are monotonically decreasing for \(t\ge 3\), with limits \(\frac{1}{2}\) and \(\frac{11}{16}\), respectively, as \(t\to \infty \), which yields

$$ f(j)\in \biggl(\frac{1}{2},\frac{23}{45}\biggr], \qquad g(j)\in \biggl( \frac{11}{16}, \frac{122}{165}\biggr],\quad j\ge 3. $$
(36)
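The endpoints in (36) can be checked numerically. The sketch below is illustrative only: it evaluates \(f\) and \(g\) exactly at integers, with \(g\) taken with the four-factor denominator \((2t-3)(2t-1)(2t+3)(2t+5)\) that appears in (35), and it samples a finite range of \(j\) (the cutoff 500 is an arbitrary choice), so it supports but does not prove the monotonicity claim.

```python
from fractions import Fraction as F

def f(t):
    """f(t) = (2t^2 + 2t - 1) / ((2t - 1)(2t + 3)), evaluated exactly."""
    t = F(t)
    return (2 * t * t + 2 * t - 1) / ((2 * t - 1) * (2 * t + 3))

def g(t):
    """g(t) with the denominator (2t - 3)(2t - 1)(2t + 3)(2t + 5) from (35)."""
    t = F(t)
    num = 11 * t**4 + 22 * t**3 - 31 * t**2 - 42 * t + 18
    den = (2 * t - 3) * (2 * t - 1) * (2 * t + 3) * (2 * t + 5)
    return num / den

# Sample j = 3, ..., 499 (arbitrary finite cutoff for illustration).
fs = [f(j) for j in range(3, 500)]
gs = [g(j) for j in range(3, 500)]

# Strictly decreasing on the sample, bounded below by the limits 1/2 and 11/16.
f_decreasing = all(a > b for a, b in zip(fs, fs[1:]))
g_decreasing = all(a > b for a, b in zip(gs, gs[1:]))
f_above_limit = all(x > F(1, 2) for x in fs)
g_above_limit = all(x > F(11, 16) for x in gs)

print(fs[0], gs[0], f_decreasing, g_decreasing, f_above_limit, g_above_limit)
```

The largest values are attained at \(j=3\), which is where the right endpoints of the two intervals come from.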

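Reading off the second component of (34), we have

$$ k_{2}=\frac{\lambda _{2}}{\lambda _{j}} \biggl[\frac{2940}{2463}- \frac{16380}{4926}f(j)+\frac{1050}{821}g(j) \biggr]. $$

In view of (36), the bracket lies strictly between 0 and \(\frac{1}{2}\), and \(0<\lambda _{2}<\lambda _{j}\), so \(0<k_{2}<1\). Hence \(k_{2}\notin \mathbb{Z}\), which is a contradiction.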
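The coefficients in the bracket above and the resulting bound can be reproduced as follows. This is a hedged sketch under two assumptions: the second row of \(A_{0}^{-1}\) is \((\frac{245}{2463},-\frac{455}{4926},\frac{35}{821})\), as obtained by inverting \(A_{0}\), and the numerators in (35) contribute the weights \(12\), \(36f(j)\), \(30g(j)\). The variable names are illustrative.

```python
from fractions import Fraction as F

# Second row of A_0^{-1} (assumed as stated in the lead-in).
row2 = [F(245, 2463), F(-455, 4926), F(35, 821)]

# Numerical weights from (35): 12, 36 f(j), 30 g(j).
weights = [12, 36, 30]
coeffs = [w * a for w, a in zip(weights, row2)]  # constant, f- and g-coefficients

# Interval endpoints from (36).
f_lo, f_hi = F(1, 2), F(23, 45)
g_lo, g_hi = F(11, 16), F(122, 165)

# The f-coefficient is negative and the g-coefficient positive, so the bracket
# is smallest at (f_hi, g_lo) and largest at (f_lo, g_hi).
bracket_lo = coeffs[0] + coeffs[1] * f_hi + coeffs[2] * g_lo
bracket_hi = coeffs[0] + coeffs[1] * f_lo + coeffs[2] * g_hi

print(float(bracket_lo), float(bracket_hi))
```

Both bounds lie strictly between 0 and \(\frac{1}{2}\), so multiplying by a ratio of eigenvalues in \((0,1)\) cannot produce an integer.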

Case II. \(|l|=2\). In this case, it suffices to consider the following two subcases.

Subcase 1. Assume that \(l=(0,\ldots ,\underset{\underset{(i-3)\text{th}}{ \uparrow }}{-1},0,\ldots ,0, \underset{\underset{(j-3)\text{th}}{\uparrow }}{-1},0,\ldots )\) with \(j> i\geq 4\). This time we have

$$ k=(k_{1},k_{2},k_{3})^{T}=A^{-1}( \bar{G}_{j1}+\bar{G}_{i1},\bar{G} _{j2}+ \bar{G}_{i2},\bar{G}_{j3}+\bar{G}_{i3})^{T}, $$
(37)

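in particular, one has

$$ k_{2}=\lambda _{2} \biggl[\frac{1}{\lambda _{i}} \biggl( \frac{2940}{2463}- \frac{16380}{4926}f(i)+\frac{1050}{821}g(i) \biggr)+\frac{1}{\lambda _{j}} \biggl( \frac{2940}{2463}- \frac{16380}{4926}f(j)+\frac{1050}{821}g(j) \biggr) \biggr]. $$
(38)

Since \(f(i),f(j)\in (\frac{1}{2},\frac{23}{45}]\) and \(g(i),g(j)\in ( \frac{11}{16},\frac{122}{165}]\), each inner bracket lies strictly between 0 and \(\frac{1}{2}\). Together with \(0<\lambda _{2}<\lambda _{i}<\lambda _{j}\), this gives \(0<k_{2}<1\), so \(k_{2}\notin \mathbb{Z}\), which cannot happen.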

Subcase 2. Assume that \(l=(0,\ldots ,\underset{\underset{(i-3)\text{th}}{ \uparrow }}{1},0,\ldots ,0, \underset{\underset{(j-3)\text{th}}{\uparrow }}{-1},0,\ldots )\) with \(j>i\geq 4\). One has

$$ k=A^{-1}(\bar{G}_{j1}-\bar{G}_{i1}, \bar{G}_{j2}-\bar{G}_{i2},\bar{G} _{j3}- \bar{G}_{i3})^{T} $$
(39)

and

$$ k_{2}=\lambda _{2} \biggl[\frac{2940}{2463} \biggl(\frac{1}{\lambda _{j}}-\frac{1}{ \lambda _{i}}\biggr)-\frac{16380}{4926}\biggl( \frac{f(j)}{\lambda _{j}}-\frac{f(i)}{ \lambda _{i}}\biggr) +\frac{1050}{821}\biggl( \frac{g(j)}{\lambda _{j}}-\frac{g(i)}{ \lambda _{i}}\biggr) \biggr], $$
(40)

where

$$ \begin{aligned} &\frac{f(j)}{\lambda _{j}}-\frac{f(i)}{\lambda _{i}} =f(j) \biggl(\frac{1}{ \lambda _{j}}-\frac{1}{\lambda _{i}}\biggr)+\frac{f(j)-f(i)}{\lambda _{i}}, \\ &\frac{g(j)}{\lambda _{j}}-\frac{g(i)}{\lambda _{i}}=g(j) \biggl(\frac{1}{ \lambda _{j}}- \frac{1}{\lambda _{i}}\biggr)+\frac{g(j)-g(i)}{\lambda _{i}}. \end{aligned} $$

By simple computation, one gets

$$ \frac{1}{\lambda _{j}}-\frac{1}{\lambda _{i}}=\frac{\lambda _{i}^{2}- \lambda _{j}^{2}}{\lambda _{i}\lambda _{j}(\lambda _{i}+\lambda _{j})}= \frac{(i-j)(i+j+1)}{ \lambda _{i}\lambda _{j}(\lambda _{i}+\lambda _{j})}. $$

Furthermore,

$$ \biggl\vert \frac{i-j}{\lambda _{j}} \biggr\vert < 1,\qquad \biggl\vert \frac{i+j+1}{\lambda _{i}+\lambda _{j}} \biggr\vert < \frac{8}{7}. $$

Using (36), (40) and the above inequalities, we have \(|k_{2}|<1\), which forces \(k_{2}=0\) because \(k_{2}\) is an integer. Similar arguments yield \(k_{3}=0\). As to \(k_{1}\), we have

$$ \begin{aligned}[b] k_{1} &=\frac{\lambda _{1}}{\lambda _{i}} \biggl[- \frac{17700}{4926}+ \frac{8820}{2463}f(j)+\frac{9870}{2463}g(j) \biggr] \biggl[\frac{\lambda _{i}}{\lambda _{j}}-1\biggr] \\ &\quad{}+\frac{\lambda _{1}}{\lambda _{i}} \biggl[\frac{8820}{2463}\bigl(f(j)-f(i)\bigr)+ \frac{9870}{2463}\bigl(g(j)-g(i)\bigr) \biggr], \end{aligned} $$
(41)

it follows that \(k_{1}\in (-1.6,0)\), which indicates \(k_{1}=-1\). This time, if

$$ \langle k,\alpha \rangle +\langle l,\beta \rangle =0, $$

then one gets \(-\alpha _{1}+\beta _{i}-\beta _{j}=0\), that is, \(\beta _{i}-\beta _{j}=\alpha _{1}>0\), which contradicts the fact that \(\beta _{j}>\beta _{i}\).
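The sign of \(k_{1}\) in (41) can be traced to the size of the first bracket there. The sketch below is only a partial check: it assumes the first row of \(A_{0}^{-1}\) is \((-\frac{1475}{4926},\frac{245}{2463},\frac{329}{2463})\), recovers the coefficients of (41) from the weights \(12\), \(36\), \(30\) in (35), and bounds the bracket using (36). It does not reproduce the interval \((-1.6,0)\), which also depends on the values of \(\lambda _{1}\), \(\lambda _{i}\), \(\lambda _{j}\); the function name `first_bracket` is hypothetical.

```python
from fractions import Fraction as F

# First row of A_0^{-1} (assumed as stated in the lead-in).
row1 = [F(-1475, 4926), F(245, 2463), F(329, 2463)]
c0, cf, cg = (w * a for w, a in zip([12, 36, 30], row1))

def first_bracket(f_val, g_val):
    """Value of c0 + cf * f + cg * g, the first bracket in (41)."""
    return c0 + cf * f_val + cg * g_val

# Both cf and cg are positive, so the bracket is increasing in f and g and
# its range over (36) is bounded by the values at the interval endpoints.
lower = first_bracket(F(1, 2), F(11, 16))
upper = first_bracket(F(23, 45), F(122, 165))

print(float(lower), float(upper))
```

Because the bracket is positive and \(\frac{\lambda _{i}}{\lambda _{j}}-1<0\) for \(j>i\), while \(f(j)-f(i)\) and \(g(j)-g(i)\) are negative, both terms in (41) are negative, consistent with \(k_{1}<0\).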

Finally, it remains to check the small perturbation assumption. To make this more precise we introduce complex neighborhoods

$$ D(s,r): \vert \mathit{Im}\, x \vert < s, \vert y \vert < r^{2}, \Vert u \Vert _{s}+ \Vert v \Vert _{s}< r $$

of \(\mathbb{T}^{3}\times \{y=0\}\times \{u=0\}\times \{v=0\}\) and weighted norms

$$ \bigl\vert (x,y,u,v) \bigr\vert _{r}= \vert x \vert + \frac{ \vert y \vert }{r^{2}}+\frac{ \Vert u \Vert _{s}}{r}+ \frac{ \Vert v \Vert _{s}}{r}, $$

where \(|\cdot |\) is the max-norm for complex vectors. Then we assume that the Hamiltonian vector field \(X_{G}\) is real analytic on \(D(s,r)\) for some positive s, r uniformly in ξ with finite norm \(|X_{G}|_{r,D(s,r)}=\sup_{D(s,r)}|X_{G}|_{r}\), and that the same holds for its Lipschitz semi-norm

$$ \vert X_{G} \vert ^{\mathcal{L}}_{r}=\sup _{\xi \neq \zeta }\frac{ \vert \Delta _{\xi \zeta }X_{G} \vert _{r,D(s,r)}}{ \vert \xi -\zeta \vert }, $$

where \(\Delta _{\xi \zeta }X_{G}=X_{G}(\cdot ,\xi )-X_{G}(\cdot , \zeta )\), and where the supremum is taken over \(\xi ,\zeta \in \varPi \).

Set \(\varPi =\{ \xi \in [0,1]^{3}: 0<|\xi |\leq r^{\frac{4}{3}}\}\). From the perturbation term (30), we easily get

$$\begin{aligned} \begin{aligned}[b] \vert X_{(\check{G}+\hat{G}+K)} \vert _{r,D(s,r)}&\leq \vert X_{ \check{G}} \vert _{r,D(s,r)}+ \vert X_{\hat{G}} \vert _{r,D(s,r)}+ \vert X_{K} \vert _{r,D(s,r)} \\ &=O\bigl(r^{2}\bigr)+O\bigl(r^{\frac{5}{3}}\bigr)+O \bigl(r^{2}\bigr)=O\bigl(r^{\frac{5}{3}}\bigr). \end{aligned} \end{aligned}$$
(42)

Since \(X_{(\check{G}+\hat{G}+K)}\) is analytic in ξ, one has

$$ \vert X_{(\check{G}+\hat{G}+K)} \vert ^{\mathcal{L}}_{r}=O \bigl(r^{ \frac{5}{3}}r^{-\frac{4}{3}}\bigr)=O\bigl(r^{\frac{1}{3}}\bigr). $$
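The exponents of \(r\) in (42) and in the Lipschitz estimate can be tracked with a few lines of exact arithmetic. This is a heuristic sketch, not the authors' computation: it assumes \(|y|\sim r^{2}\), \(\|u\|_{s},\|v\|_{s}\sim r\), \(|\xi |\sim r^{4/3}\) on \(D(s,r)\times \varPi \), and that the weighted vector field norm of a Hamiltonian is of the order of the Hamiltonian divided by \(r^{2}\).

```python
from fractions import Fraction as F

XI_EXP = F(4, 3)  # |xi| <= r^(4/3) on the parameter set

# Order of each Hamiltonian on D(s, r), written as an exponent of r, using (30).
hamiltonian_exp = {
    "G_check": F(4),                # y^2 + |y|(u^2 + v^2) ~ r^4
    "G_hat": XI_EXP / 2 + 3,        # |xi|^(1/2) (u^2 + v^2)^(3/2) ~ r^(2/3 + 3)
    "K": 3 * XI_EXP,                # |xi|^3 ~ r^4
}

# Heuristic assumption: the weighted field norm costs a factor r^(-2).
field_exp = {name: e - 2 for name, e in hamiltonian_exp.items()}

# The sum is dominated by the term with the smallest exponent.
total_exp = min(field_exp.values())

# Dividing by |xi - zeta| ~ r^(4/3) gives the Lipschitz exponent.
lipschitz_exp = total_exp - XI_EXP

print(field_exp, total_exp, lipschitz_exp)
```

Under these assumptions the three field exponents are \(2\), \(\frac{5}{3}\), \(2\), which matches the right-hand side of (42), and the Lipschitz exponent is \(\frac{1}{3}\).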

If r is small enough, the small perturbation assumption of the KAM theorem is satisfied. Applying the KAM theorem in [22] then completes the proof of our main theorem.