1 Introduction

The Kustaanheimo–Stiefel (KS) transformation is probably the most renowned regularization technique for the three-dimensional Kepler problem. In the planar case, the conversion of the Kepler problem into a harmonic oscillator has been known since Goursat (1889) and Levi-Civita (1906), but its extension to the three-dimensional problem took many decades of futile efforts. Finally, Kustaanheimo (1964) discovered that the way to the third dimension is not direct, but requires a detour through a constrained problem with four degrees of freedom. The KS transformation gained popularity in the matrix–vector formulation of Kustaanheimo and Stiefel (1965), but it is much easier to interpret and generalize in the language of quaternion algebra, very closely related to the original spinor formulation of Kustaanheimo (1964).

The most common use of the KS transformation is the numerical integration of perturbed elliptic motion, where many intricacies introduced by the additional degree of freedom can be ignored, although—as recently demonstrated by Roa et al. (2016)—they can be quite useful in the assessment of a global integration error. Analytical perturbation methods for KS-transformed problems often follow the way indicated by Kustaanheimo and Stiefel (1965) and developed by Stiefel and Scheifele (1971): variation of arbitrary constants is applied to constant vector amplitudes of the KS coordinates and velocities. But those who want to benefit from the wealth of canonical formalism require a set of action–angle variables of the regularized Kepler problem.

The first step in this direction can be found in the monograph by Stiefel and Scheifele (1971), where the symplectic polar coordinates are introduced for each separate degree of freedom. However, this approach does not account for degeneracy of the problem and thus is unfit for the averaging-based perturbation techniques. Moreover, no attempt was made to relate this set to the constraint known as the ‘bilinear invariant’, effectively reducing the system to three degrees of freedom. Both problems have been resolved by Zhao (2015), who proposed the ‘LCF’ variables [presumably named after Levi-Civita (1906) and Féjoz (2001)]. In his approach, the motion in the KS variables is considered in an osculating ‘Levi-Civita plane’ (Deprit et al. 1994) as a two-degree-of-freedom problem. The third degree of freedom is added by the pair of action–angle variables orienting the plane. The redundant fourth degree is hidden in the definition of the Levi-Civita plane. The transformed Keplerian Hamiltonian depends on a single action variable, the other two actions being closely related to the angular momentum and its projection on the polar axis. Interestingly, the result is identical to the ‘isoenergetic variables’ found by Levi-Civita (1913) without regularization.

The LCF variables respect the degeneracy and bring the oscillations back to three degrees of freedom. Yet they possess a significant weakness: they are founded on the orientation of a plane determined by the angular momentum. Whenever the angular momentum vanishes (even temporarily), the angles become undetermined and the equations of motion are singular. It turns out that, in seeking proximity to the Delaunay variables, Zhao (2015) reintroduced the singularities of the unregularized Kepler problem. Of course, some singularities are inevitable when a problem having spherical topology is mapped onto a torus of action–angle variables. But there is always some freedom in the choice of the singularities. Recalling that the main purpose of regularization is to allow the study of highly elliptic and rectilinear orbits, we find it worth the effort to construct an action–angle set that, unlike the LCF variables, is regular for this class of motions.

The main goal of the present work is to derive an alternative set of the action–angle variables which is not based upon the notion of an orbital plane (thus avoiding singularities when the orbit degenerates into a straight segment) and to test it on some well-known astronomical problem. Section 2 introduces some preliminary concepts related to the KS coordinate transformation in the language of quaternions. We use its generalized form with an arbitrary ‘defining vector’ (Breiter and Langner 2017), which helps to realize how the choice of the KS1 or KS3 convention allows or inhibits the use of the Levi-Civita plane in the construction of the action–angle sets. We have also benefited from the opportunity to polish and extend the geometrical interpretation given to the KS transformation by Saha (2009). In Sect. 3, we complement the KS coordinates with their conjugate momenta and provide the Hamiltonian function in the extended phase space as the departure point for further transformations. Section 4 builds the new action–angle set—the Lissajous–Kustaanheimo–Stiefel (LKS) variables. Two independent Lissajous transformations are followed by a linear Mathieu transformation. In Sect. 5, we show how to interpret the new variables not only in terms of the Lissajous ellipses, but also by the reference to the angular momentum and Laplace vectors of the Kepler problem. As an application, we discuss the classical Lidov–Kozai problem (Sect. 6), showing that stability of rectilinear orbits can be discussed directly in terms of the LKS variables, which has not been possible using the Delaunay or the LCF framework. Conclusions and future prospects are presented in the closing Sect. 7.

2 KS transformation in quaternion form

2.1 Quaternion algebra

Adhering to the convention used by Deprit et al. (1994), we treat a quaternion \(\mathsf {v} \in {\mathbb {H}}\) as the union of a scalar \(v_0\) and a vector \(\mathbf {v}\),

$$\begin{aligned} \mathsf {v} = \left( v_0, \mathbf {v} \right) = \sum _{j=0}^3 v_j \mathsf {e}_j, \end{aligned}$$
(1)

where the standard basis quaternions

$$\begin{aligned} \mathsf {e}_0 = (1,\mathbf {0}), \quad \mathsf {e}_1 = (0,\mathbf {e}_1), \quad \mathsf {e}_2 = (0,\mathbf {e}_2), \quad \mathsf {e}_3 = (0,\mathbf {e}_3), \end{aligned}$$
(2)

have been defined by referring to the standard vector basis \(\mathbf {e}_j\). Downgrading a ‘pure quaternion’ \(\mathsf {u} = (0,\mathbf {u}) \in {\mathbb {H}}'\) to a vector \(\mathbf {u} \in {\mathbb {R}}^3\) requires application of the projection operator \(\natural \), whose action on any quaternion is \(\mathsf {v}^\natural = (v_0, \mathbf {v})^\natural = \mathbf {v}\).

As members of the Euclidean linear space \({\mathbb {R}}^4\), quaternions admit the sum and product-by-scalar rules

$$\begin{aligned} \mathsf {u} + \mathsf {v} = \sum _{j=0}^3 \left( u_j+v_j\right) \mathsf {e}_j, \quad \alpha \mathsf {v} = \sum _{j=0}^3 \alpha v_j \mathsf {e}_j, \end{aligned}$$
(3)

as well as the scalar product

$$\begin{aligned} \mathsf {u} \mathbf {\cdot }\mathsf {v} = \sum _{j=0}^3 u_j v_j = u_0 v_0 + \mathbf {u} \mathbf {\cdot }\mathbf {v}, \end{aligned}$$
(4)

implying the norm \( |\mathsf {v} | = \sqrt{\mathsf {v} \mathbf {\cdot }\mathsf {v} } = \sqrt{v_0^2 + \Vert \mathbf {v}\Vert ^2}\), where \(\Vert \mathbf {v}\Vert = \sqrt{\mathbf {v}\cdot \mathbf {v}},\) to distinguish the norms in \({\mathbb {R}}^3\) and \({\mathbb {R}}^4\).

What makes the four-vectors \(\mathsf {u}\) and \(\mathsf {v}\) members of the quaternion algebra \({\mathbb {H}}\) over \({\mathbb {R}}\) is the noncommutative quaternion product

$$\begin{aligned} \mathsf {u}\mathsf {v} = \left( u_0 v_0 - \mathbf {u} \cdot \mathbf {v}, u_0 \mathbf {v} + v_0 \mathbf {u} + \mathbf {u} \mathbf {\times }\mathbf {v}\right) . \end{aligned}$$
(5)

Note that \({\mathbb {H}}'\) is only a linear subspace, but not a subalgebra of \({\mathbb {H}}\), because the quaternion product of two pure quaternions may have a nonzero scalar part.

Two other useful operations to be defined are the quaternion conjugate

$$\begin{aligned} {\overline{\mathsf {v}}} = (v_0 , - \mathbf {v}), \end{aligned}$$
(6)

which allows us to write \(|\mathsf {v}|^2 = \mathsf {v} {\overline{\mathsf {v}}}\), and the quaternion cross product

$$\begin{aligned} \mathsf {u} \wedge \mathsf {v} = \frac{ \mathsf {v} \bar{\mathsf {u}} - \mathsf {u} \bar{\mathsf {v}} }{2} = ( 0, u_0 \mathbf {v} - v_0 \mathbf {u} + \mathbf {u} \mathbf {\times }\mathbf {v}), \end{aligned}$$
(7)

always resulting in a pure quaternion, and reducing to a standard vector cross product if \(u_0=v_0=0\).
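
The rules (5)–(7) are easy to check numerically. The following Python sketch is only an illustration: it assumes that a quaternion is stored as a NumPy array `[v0, v1, v2, v3]`, and the helper names `qmul`, `qconj` and `qwedge` are our own, hypothetical choices rather than part of any library.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); u and v are arrays [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def qconj(v):
    """Quaternion conjugate of Eq. (6)."""
    return np.concatenate(([v[0]], -v[1:]))

def qwedge(u, v):
    """Quaternion cross product, first form of Eq. (7)."""
    return 0.5 * (qmul(v, qconj(u)) - qmul(u, qconj(v)))

rng = np.random.default_rng(0)  # arbitrary test quaternions
u, v = rng.normal(size=4), rng.normal(size=4)

# v * conj(v) is the quaternion (|v|^2, 0): only the scalar part survives.
norm_sq = qmul(v, qconj(v))

# Closed form on the right-hand side of Eq. (7).
wedge_closed = np.concatenate(
    ([0.0], u[0] * v[1:] - v[0] * u[1:] + np.cross(u[1:], v[1:]))
)
print(np.allclose(qwedge(u, v), wedge_closed))
```

Storing the scalar part first mirrors the notation \((v_0, \mathbf {v})\) used in the text; other software conventions place it last.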

2.2 KS coordinates transformation

2.2.1 Generalized definition

In a recent paper (Breiter and Langner 2017), we have proposed a generalized form of the standard KS transformation \(\kappa \) that uses an arbitrary ‘defining vector’ \(\mathbf {c}\) with a unit norm and its respective pure quaternion \(\mathsf {c} = (0,\mathbf {c})\), so that

$$\begin{aligned} \kappa : {\mathbb {H}} \rightarrow {\mathbb {H}}' : \mathsf {v} \mapsto \mathsf {x} = \frac{\mathsf {v} \mathsf {c} {\overline{\mathsf {v}}}}{\alpha }, \end{aligned}$$
(8)

or, equivalently,

$$\begin{aligned} \alpha \mathbf {x} = \left( v_0^2 - \mathbf {v}\mathbf {\cdot }\mathbf {v} \right) \mathbf {c} + 2 \left( \mathbf {c}\mathbf {\cdot }\mathbf {v}\right) \mathbf {v} + 2 v_0 \mathbf {v} \mathbf {\times }\mathbf {c} = \left( \mathbf {c} \mathbf {\cdot }\mathbf {v}\right) \,\mathbf {v}+\left[ \mathsf {v} \wedge (\mathsf {v} \wedge \mathsf {c})\right] ^\natural , \end{aligned}$$
(9)

links the KS variables quaternion \(\mathsf {v}\) with the original Cartesian coordinates \(\mathbf {x} \in {\mathbb {R}}^3\), the latter being the vector part of a pure quaternion \(\mathsf {x} = (0, \mathbf {x})\). A real, positive parameter \(\alpha \) was introduced by Deprit et al. (1994). They gave it the units of length, in order to allow the KS coordinates \(v_j\) carry the same units as \(x_j\). We adhere to this convention for a while, although other options will be presented in Sect. 3. With \(|\mathsf {c}|=\Vert \mathbf {c}\Vert =1\), the KS transformation \(\kappa \) admits the well-known property

$$\begin{aligned} \Vert \mathbf {x} \Vert = r = \frac{\mathsf {v} \mathbf {\cdot }\mathsf {v}}{\alpha }. \end{aligned}$$
(10)
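
As a hedged numerical illustration of Eqs. (8)–(10), the sketch below evaluates the KS map both as a quaternion product and through the first vector form of Eq. (9), and compares the resulting radius with Eq. (10). The array layout `[v0, v1, v2, v3]`, the function names and the test values are assumptions made only for this example.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def qconj(v):
    """Quaternion conjugate of Eq. (6)."""
    return np.concatenate(([v[0]], -v[1:]))

def ks_map(v, c, alpha):
    """Eq. (8): the quaternion x = v c conj(v) / alpha for a unit vector c."""
    c_quat = np.concatenate(([0.0], c))
    return qmul(qmul(v, c_quat), qconj(v)) / alpha

def ks_vector(v, c, alpha):
    """Vector part of x from the first form of Eq. (9)."""
    v0, vec = v[0], v[1:]
    return ((v0**2 - vec @ vec) * c + 2 * (c @ vec) * vec
            + 2 * v0 * np.cross(vec, c)) / alpha

rng = np.random.default_rng(1)
v = rng.normal(size=4)           # arbitrary KS quaternion
c = rng.normal(size=3)
c /= np.linalg.norm(c)           # defining vector with unit norm
alpha = 2.5                      # arbitrary positive scale parameter

x = ks_map(v, c, alpha)
r = np.linalg.norm(x[1:])
print(x[0], r, v @ v / alpha)    # scalar part, radius, right-hand side of Eq. (10)
```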

2.2.2 Fibres

The noninjective nature of the KS map has been known since its origins, although only recently has it been considered more an advantage than a nuisance (Roa et al. 2016).

Let us introduce a quaternion-valued function of angle \(\phi \)

$$\begin{aligned} \mathsf {q}(\phi ) = (\cos {\phi }, \sin {\phi }\mathbf {c}), \end{aligned}$$
(11)

with a number of useful properties, such as

$$\begin{aligned} |\mathsf {q}(\phi )|= & {} 1, \end{aligned}$$
(12)
$$\begin{aligned} \mathsf {q}(\phi ) \mathsf {q}(\psi )= & {} \mathsf {q}(\phi +\psi ), \end{aligned}$$
(13)
$$\begin{aligned} \left[ \mathsf {q}(\phi )\right] ^{-1}= & {} \mathsf {q}(-\phi ) = \overline{\mathsf {q}}(\phi ), \end{aligned}$$
(14)
$$\begin{aligned} \mathsf {q}(\phi ) \mathsf {c} \overline{\mathsf {q}}(\phi )= & {} \mathsf {c}, \end{aligned}$$
(15)

and special values \(\mathsf {q}(0) = \mathsf {e}_0\), \(\mathsf {q}(\pi /2) = \mathsf {c}\). Property (15) clearly implies that the KS transformation (8) is only homomorphic: given some representative KS quaternion \(\mathsf {v}\), all quaternions \(\mathsf {v}\mathsf {q}(\phi )\) belonging to the fibre parameterized by \(0 \leqslant \phi < 2 \pi \) render the same vector \(\mathbf {x}\), i.e. \(\kappa (\mathsf {v}) = \kappa (\mathsf {v}\mathsf {q}(\phi ))\). Indeed, since (15) describes the rotation of vector \(\mathbf {c}\) around the axis \(\mathbf {c}\), the left-hand side of the equality can be substituted for \(\mathsf {c}\) in Eq. (8), and then \(\mathsf {v} \mathsf {c} \, {\overline{\mathsf {v}}} = (\mathsf {v} \mathsf {q}) \mathsf {c} \overline{(\mathsf {v} \mathsf {q})}\), leading to the same \(\mathsf {x}\).

On the other hand, one might ask about the possibility of generating the fibre through the left multiplication by some quaternion function. Multiplying both sides of equality in (8) by a quaternion \(\mathsf {p}\) from the left and its conjugate from the right, we find the condition

$$\begin{aligned} \mathsf {p}\,\mathsf {x}\,\overline{\mathsf {p}} = \frac{(\mathsf {p} \mathsf {v}) \mathsf {c} \, \overline{(\mathsf {p} \mathsf {v})} }{\alpha }, \end{aligned}$$

where the left-hand side remains equal to \(\mathsf {x} = (0,\mathbf {x})\) only if \(\mathsf {p}\) is a function

$$\begin{aligned} \mathsf {p}(\phi ) = ( \cos {\phi }, \sin {\phi } {\hat{\mathbf {x}}}), \end{aligned}$$
(16)

that rotates vector \(\mathbf {x}\) around itself. Thus, given some representative KS quaternion \(\mathsf {v}\), we can create the fibre \(\mathsf {p}(\phi )\mathsf {v}\) parameterized by \(0 \leqslant \phi < 2 \pi \), such that \(\kappa (\mathsf {v})=\kappa (\mathsf {p}(\phi )\mathsf {v}) = \mathsf {x}\).

The action of the fibre generators (11) and (16) with the same argument \(\phi \) is equivalent; direct computation demonstrates that

$$\begin{aligned} \mathsf {p}(\phi )\mathsf {v} = \mathsf {v}\mathsf {q}(\phi ), \quad \text{ and } \quad \mathsf {p}(\phi )\mathsf {v}\overline{\mathsf {q}}(\phi ) = \mathsf {v}. \end{aligned}$$
(17)

A kind of symmetry between the defining vector \(\mathbf {c}\) and the normalized Cartesian position vector \({\hat{\mathbf {x}}}\) implied by the form of \(\mathsf {q}(\phi )\) and \(\mathsf {p}(\phi )\) manifests also in the geometrical construction of the next section.
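
The equivalence of the two fibre generators can be illustrated numerically. The sketch below is a minimal check under our own assumptions (array layout `[v0, v1, v2, v3]`, hypothetical helper names, arbitrary test values): it builds \(\mathsf {v}\mathsf {q}(\phi )\) and \(\mathsf {p}(\phi )\mathsf {v}\) and compares their KS images.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def qconj(v):
    """Quaternion conjugate of Eq. (6)."""
    return np.concatenate(([v[0]], -v[1:]))

def ks_map(v, c, alpha):
    """Eq. (8): x = v c conj(v) / alpha."""
    return qmul(qmul(v, np.concatenate(([0.0], c))), qconj(v)) / alpha

def fibre_generator(phi, axis):
    """Unit quaternion (cos phi, sin phi * axis), as in Eqs. (11) and (16)."""
    return np.concatenate(([np.cos(phi)], np.sin(phi) * axis))

rng = np.random.default_rng(2)
v = rng.normal(size=4)
c = rng.normal(size=3)
c /= np.linalg.norm(c)
alpha, phi = 1.0, 0.7            # arbitrary scale and fibre angle

x = ks_map(v, c, alpha)
x_hat = x[1:] / np.linalg.norm(x[1:])

v_right = qmul(v, fibre_generator(phi, c))      # v q(phi), Eq. (11)
v_left = qmul(fibre_generator(phi, x_hat), v)   # p(phi) v, Eq. (16)
print(np.allclose(v_left, v_right))
```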

2.3 KS quaternions more geometrico

Given the transformation (9), let us polish the geometrical interpretation of the KS variables proposed by Saha (2009). Scalar multiplication of both sides of (9) by \(\mathsf {v}\) leads to the basic relation

$$\begin{aligned} {\hat{\mathbf {x}}} \mathbf {\cdot }\mathbf {v} = \mathbf {c} \mathbf {\cdot }\mathbf {v}, \end{aligned}$$
(18)

with two unit vectors \(\mathbf {c}\) and \({\hat{\mathbf {x}}} = \mathbf {x}/r\). This property, valid for any scalar part \(v_0\), means that all quaternions \(\mathsf {v}\) belonging to the fibre of given Cartesian vector \(\mathbf {x}\) have vector parts \(\mathbf {v} = \mathsf {v}^\natural \) forming the same angle with \(\mathbf {c}\) and \(\mathbf {x}\), hence lying in the symmetry plane of this pair of vectors. The plane, marked grey in Fig. 1, contains \(\mathbf {c}+{\hat{\mathbf {x}}}\) and is perpendicular to \(\mathbf {c} - {\hat{\mathbf {x}}}\). The norm \(|\mathsf {v}|= \sqrt{r \alpha }\) is the upper bound on the length of \(\mathbf {v}\), so the dashed circle in Fig. 1 has the radius \(\sqrt{\alpha r}\). Setting \(v_0=0\) in Eq. (9), we see that \(\mathbf {x}\) is a linear combination of \(\mathbf {c}\) and \(\mathbf {v}\), so the three vectors must be coplanar. Accordingly, there are exactly two pure quaternions related to \(\mathbf {x}\): \(\mathsf {v}_\mathrm {s} = (0,\mathbf {v}_\mathrm {s})\), and \(- \mathsf {v}_\mathrm {s}\), where

$$\begin{aligned} \mathbf {v}_\mathrm {s} = \sqrt{\alpha r} \, \frac{\mathbf {c} + {\hat{\mathbf {x}}}}{\Vert \mathbf {c} + {\hat{\mathbf {x}}}\Vert }, \end{aligned}$$
(19)

is the ‘Saha–Kustaanheimo–Stiefel (SKS) vector’ of Breiter and Langner (2017).

Fig. 1 Geometrical construction for the vector part of the KS quaternion \(\mathsf {v}\)

The entire fibre \(\mathsf {v}\) can be generated from \(\mathsf {v}_\mathrm {s}\) by the application of the generator (11),

$$\begin{aligned} \mathsf {v} = \mathsf {v}_\mathrm {s} \,\mathsf {q}(-\phi ), \end{aligned}$$
(20)

leading to

$$\begin{aligned} v_0= & {} \mathbf {c} \mathbf {\cdot }\mathbf {v}_\mathrm {s} \sin {\phi }, \end{aligned}$$
(21)
$$\begin{aligned} \mathbf {v}= & {} \cos {\phi } \mathbf {v}_\mathrm {s} + \sin {\phi } \left( \mathbf {c} \mathbf {\times }\mathbf {v}_\mathrm {s} \right) . \end{aligned}$$
(22)

The latter of the formulae is a parametric equation of an ellipse with the major semi-axis \(\sqrt{\alpha r}\) and the eccentricity \(\sqrt{(1+\mathbf {c} \mathbf {\cdot }{\hat{\mathbf {x}}})/2}\). The ellipse is drawn with a solid line in Fig. 1. The position angle \(\beta \) in the figure should not be confused with the parametric longitude \(\phi \); the angles are related by the formula

$$\begin{aligned} \tan \beta = \sqrt{\frac{1 - \mathbf {c} \mathbf {\cdot }{\hat{\mathbf {x}}} }{2}} \, \tan \phi . \end{aligned}$$
(23)

The line segment with arrowheads at both ends in Fig. 1 complements the length of \(\mathbf {v}\) to the full value \(\sqrt{\alpha r}\), so its length can be interpreted as the absolute value of the scalar part of \(\mathsf {v}\).

Of course, the generic picture shown in Fig. 1 does not include the special case of the parallel \(\mathbf {x}\) and \(\mathbf {c}\). If \({\hat{\mathbf {x}}} = \mathbf {c}\), the fibre degenerates to the set of quaternions having vector part aligned with \(\mathbf {c}\), i.e.

$$\begin{aligned} \mathsf {v} = \sqrt{\alpha r} \left( \sin {\phi } , \cos {\phi } \mathbf {c} \right) , \end{aligned}$$
(24)

with \(\mathsf {v}_\mathrm {s} = (0,\sqrt{\alpha r} \mathbf {c})\). The eccentricity of the ellipse from Fig. 1 attains the value 1, so the ellipse degenerates into a straight segment. The shaded plane from the figure is no longer defined.

But if \({\hat{\mathbf {x}}} = -\mathbf {c}\), the situation is different. Observing that then the ellipse from Fig. 1 turns into a circle, we conclude that the fibre consists exclusively of the pure quaternions \(\mathsf {v} = (0, \sqrt{\alpha r} \,\hat{\mathbf {f}})\), where \(\hat{\mathbf {f}}\) is any vector orthogonal to \(\mathbf {c}\).
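
The construction of this section can be sketched in Python for the generic case \({\hat{\mathbf {x}}} \ne \pm \mathbf {c}\). The code below is an illustration under our own assumptions (hypothetical function names, arbitrary test vector, quaternions stored as `[v0, v1, v2, v3]`): it builds the SKS vector of Eq. (19), samples the fibre through Eqs. (21) and (22), and maps each sample back with Eq. (8).

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def ks_map(v, c, alpha):
    """Eq. (8): x = v c conj(v) / alpha."""
    v_conj = np.concatenate(([v[0]], -v[1:]))
    return qmul(qmul(v, np.concatenate(([0.0], c))), v_conj) / alpha

def sks_vector(x, c, alpha):
    """Eq. (19); not defined when x is antiparallel to c."""
    r = np.linalg.norm(x)
    s = c + x / r
    return np.sqrt(alpha * r) * s / np.linalg.norm(s)

def fibre_point(x, c, alpha, phi):
    """KS quaternion on the fibre of x, from Eqs. (21) and (22)."""
    v_s = sks_vector(x, c, alpha)
    v0 = (c @ v_s) * np.sin(phi)
    vec = np.cos(phi) * v_s + np.sin(phi) * np.cross(c, v_s)
    return np.concatenate(([v0], vec))

c = np.array([0.0, 0.0, 1.0])    # defining vector (illustrative choice)
x = np.array([1.0, -2.0, 0.5])   # arbitrary Cartesian position
alpha = 1.5
r = np.linalg.norm(x)

phis = np.linspace(0.0, 2.0 * np.pi, 9)
fibre = [fibre_point(x, c, alpha, phi) for phi in phis]
images = [ks_map(v, c, alpha)[1:] for v in fibre]

# Minor semi-axis of the ellipse traced by the vector part, Eq. (22).
minor_axis = np.linalg.norm(np.cross(c, sks_vector(x, c, alpha)))
print(all(np.allclose(img, x) for img in images))
```

The minor semi-axis computed this way can be compared with \(\sqrt{\alpha r}\sqrt{(1-\mathbf {c} \mathbf {\cdot }{\hat{\mathbf {x}}})/2}\), the value implied by the quoted eccentricity.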

2.4 Bilinear form \({\mathscr {J}}\) and LC planes

2.4.1 Definitions

The skew-symmetric bilinear form \({\mathscr {J}} : {\mathbb {H}} \times {\mathbb {H}} \rightarrow {\mathbb {R}}\), introduced by Kustaanheimo (1964) and discussed in later works, can be generalized to an arbitrary defining vector \(\mathbf {c}\) as

$$\begin{aligned} {\mathscr {J}}(\mathsf {v},\mathsf {w}) = \left( \bar{\mathsf {v}} \wedge \bar{\mathsf {w}} \right) \cdot \mathsf {c} = - v_0 \mathbf {w} \mathbf {\cdot }\mathbf {c} + w_0 \mathbf {v} \mathbf {\cdot }\mathbf {c} + \left( \mathbf {v} \mathbf {\times }\mathbf {w} \right) \mathbf {\cdot }\mathbf {c}. \end{aligned}$$
(25)

The form plays a central role in the KS formulation of motion. If the motion can be restricted to the linear subspace of \({\mathbb {H}}\) spanned by two basis quaternions \(\mathsf {u}\) and \(\mathsf {w}\), such that \({\mathscr {J}}(\mathsf {u},\mathsf {w})=0\), the KS transformation reduces to the Levi-Civita transformation (Levi-Civita 1906). For this reason, a two-dimensional subspace P of quaternions being the linear combinations of \(\mathsf {u}\) and \(\mathsf {w}\), hence such that the form \({\mathscr {J}}\) on any two of them equals 0, was dubbed the ‘Levi-Civita plane’ by Stiefel and Scheifele (1971). We will use the name ‘LC plane’, although, strictly speaking, a (hyper-)plane in a space of dimension 4 should be spanned by three basis quaternions.

Repeating the proof of Theorem 3 from Deprit et al. (1994) in our generalized framework, we conclude that for any unitary quaternion \(\mathsf {u}\) selected for the orthonormal basis of LC plane P, the second basis quaternion should be

$$\begin{aligned} \mathsf {w} = \mathsf {u} (0, \mathbf {f}) = \left( - \mathbf {u} \cdot \mathbf {f}, u_0 \mathbf {f} + \mathbf {u} \times \mathbf {f}\right) , \end{aligned}$$
(26)

where \(\mathbf {f}\) is any unitary vector orthogonal to the defining vector, i.e. \(\mathbf {f} \cdot \mathbf {c} =0\), and \(||\mathbf {f}||=1\). This meaning of the symbol \(\mathbf {f}\) will be held throughout the text. The basis is indeed orthonormal, since \(\mathsf {u}\cdot \mathsf {w} = 0\), and \(|\mathsf {u}|=|\mathsf {w}|=1\), by the definition of \(\mathsf {u}\) and \(\mathbf {f}\).
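
A short numerical sketch may help to see Eqs. (25) and (26) at work. Everything in it beyond the formulae themselves (function names, random test data, the array layout `[v0, v1, v2, v3]`) is an assumption made for illustration only.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def bilinear_form(v, w, c):
    """Vector form of the bilinear form J, right-hand side of Eq. (25)."""
    return (-v[0] * (w[1:] @ c) + w[0] * (v[1:] @ c)
            + np.cross(v[1:], w[1:]) @ c)

rng = np.random.default_rng(3)
c = rng.normal(size=3)
c /= np.linalg.norm(c)                     # defining vector
f = np.cross(c, rng.normal(size=3))
f /= np.linalg.norm(f)                     # unit vector orthogonal to c
u = rng.normal(size=4)
u /= np.linalg.norm(u)                     # first basis quaternion, |u| = 1
w = qmul(u, np.concatenate(([0.0], f)))    # second basis quaternion, Eq. (26)

# Two arbitrary members of the plane spanned by u and w.
a = 0.3 * u - 1.2 * w
b = 2.0 * u + 0.7 * w
print(bilinear_form(u, w, c), u @ w, w @ w, bilinear_form(a, b, c))
```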

2.4.2 KS map of an LC plane

Once the LC plane has been defined, a question arises about the possibility of restricting the motion in KS variables to this subspace. But such a restriction implies that the motion in the ‘physical’ configuration space \({\mathbb {R}}^3\) is planar.

Let us prove that KS transformation maps any quaternion in the LC plane P onto a plane \(\varPi \) in ‘physical’ \({\mathbb {R}}^3\) space. Using the basis of two orthonormal quaternions \(\mathsf {u}\) and \(\mathsf {w} = \mathsf {u} (0, \mathbf {f})\), we consider their linear combination

$$\begin{aligned} \mathsf {v} = \xi \mathsf {u} + \eta \mathsf {w} = \mathsf {u}\,(\xi ,\,\eta \mathbf {f}), \end{aligned}$$
(27)

with real parameters \(\xi ,\eta \) having the dimension of length. The KS transform of these \(\mathsf {v}\), belonging to P, is, by the definition (8),

$$\begin{aligned} \mathsf {x} = \kappa {(\mathsf {v})} = \frac{\mathsf {u}\,(\xi ,\,\eta \mathbf {f})\,\mathsf {c}\,(\xi ,\,-\eta \mathbf {f})\,{\overline{\mathsf {u}}}}{\alpha }. \end{aligned}$$
(28)

Thanks to the orthogonality of \(\mathbf {c}\) and \(\mathbf {f}\), the product in the middle evaluates to

$$\begin{aligned} (\xi ,\,\eta \mathbf {f})\,\mathsf {c}\,(\xi ,\,-\eta \mathbf {f}) = \left( \xi ^2 - \eta ^2\right) ( 0, \mathbf {c}) + 2 \xi \eta ( 0, \mathbf {f} \times \mathbf {c}), \end{aligned}$$
(29)

so the vector part of \(\kappa {(\mathsf {v})}\) is a linear combination of two fixed, orthonormal vectors

$$\begin{aligned} \mathbf {x} = \frac{ \xi ^2 - \eta ^2 }{\alpha } {\hat{\mathbf {x}}}_1 + \frac{2 \xi \eta }{\alpha }\,{\hat{\mathbf {x}}}_2, \end{aligned}$$
(30)

where

$$\begin{aligned} {\hat{\mathbf {x}}}_1= & {} \left[ \mathsf {u} (0,\mathbf {c}) {\overline{\mathsf {u}}} \right] ^\natural = (2 u_0^2-1) \mathbf {c} + 2 (\mathbf {u}\cdot \mathbf {c}) \, \mathbf {u} + 2 u_0 \mathbf {u} \times \mathbf {c}, \end{aligned}$$
(31)
$$\begin{aligned} {\hat{\mathbf {x}}}_2= & {} \left[ \mathsf {u} (0,\mathbf {f} \times \mathbf {c}) {\overline{\mathsf {u}}} \right] ^\natural \nonumber \\= & {} (2 u_0^2-1) (\mathbf {f} \times \mathbf {c}) + 2 (\mathbf {u}\cdot (\mathbf {f} \times \mathbf {c})) \mathbf {u} + 2 u_0 \mathbf {u} \times (\mathbf {f} \times \mathbf {c}). \end{aligned}$$
(32)

Since Eq. (30) is actually a parametric equation of a plane in \({\mathbb {R}}^3\), we have demonstrated that the KS transformation of any LC plane \(P \subset {\mathbb {H}}\) is a plane \(\varPi \) in \({\mathbb {R}}^3\) (or in \({\mathbb {H}}'\), depending on the context). The parameters \(\xi \), \(\eta \) become parabolic coordinates on \(\varPi \), i.e. the usual Levi-Civita variables.

The vector \({\hat{\mathbf {x}}}_3 = {\hat{\mathbf {x}}}_1 \times {\hat{\mathbf {x}}}_2\), normal to the plane, can be most easily derived in terms of the quaternion cross product (7), with the first lines of (31) and (32) substituted. Thus, letting \(\mathsf {x}_j = (0,{\hat{\mathbf {x}}}_j)\), and \(\mathsf {b} = (0, \mathbf {f} \times \mathbf {c})\), we find

$$\begin{aligned} \mathsf {x}_3= & {} \frac{1}{2} \left[ \overline{\mathsf {x}}_2 \mathsf {x}_1 - \overline{\mathsf {x}}_1 \mathsf {x}_2 \right] \nonumber \\= & {} \frac{1}{2} \left[ ( \mathsf {u} \mathsf {b} {\overline{\mathsf {u}}})( \mathsf {u} \overline{\mathsf {c}} \,{\overline{\mathsf {u}}}) - ( \mathsf {u} \overline{\mathsf {c}} {\overline{\mathsf {u}}})( \mathsf {u} \mathsf {b} {\overline{\mathsf {u}}})\right] \nonumber \\= & {} \mathsf {u} (\mathsf {c} \wedge \mathsf {b} ) {\overline{\mathsf {u}}} = \mathsf {u}\,(0,\mathbf {f}) {\overline{\mathsf {u}}} \nonumber \\= & {} \left( 0, (2 u_0^2 -1) \mathbf {f} + 2 ( \mathbf {u} \cdot \mathbf {f} ) \mathbf {u} + 2 u_0 \mathbf {u} \times \mathbf {f}\right) . \end{aligned}$$
(33)

Thanks to the above equation, we can relate the choice of the LC plane basis to the orientation of \(\varPi \). The cosine of the angle between the defining vector \(\mathbf {c}\) and the normal to the plane of motion \({\hat{\mathbf {x}}}_3\) is given by the scalar product

$$\begin{aligned} \mathbf {c} \cdot {\hat{\mathbf {x}}}_3 = 2 (\mathbf {u} \cdot \mathbf {f}) (\mathbf {u} \cdot \mathbf {c}) - 2 u_0 \mathbf {u} \cdot (\mathbf {c} \times \mathbf {f}). \end{aligned}$$
(34)
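
The chain of Eqs. (27)–(34) can be verified numerically for a randomly chosen LC plane. The following sketch is illustrative only; the helper `rotate`, the test values of \(\xi \), \(\eta \) and the array layout `[v0, v1, v2, v3]` are our assumptions.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def rotate(u, vec):
    """Vector part of u (0, vec) conj(u), the pattern of Eqs. (8), (31)-(33)."""
    u_conj = np.concatenate(([u[0]], -u[1:]))
    return qmul(qmul(u, np.concatenate(([0.0], vec))), u_conj)[1:]

rng = np.random.default_rng(4)
c = rng.normal(size=3)
c /= np.linalg.norm(c)                     # defining vector
f = np.cross(c, rng.normal(size=3))
f /= np.linalg.norm(f)                     # unit vector orthogonal to c
u = rng.normal(size=4)
u /= np.linalg.norm(u)                     # first basis quaternion
w = qmul(u, np.concatenate(([0.0], f)))    # second basis quaternion, Eq. (26)

alpha, xi, eta = 1.0, 0.8, -1.3            # arbitrary scale and LC coordinates
v = xi * u + eta * w                       # Eq. (27)
x = rotate(v, c) / alpha                   # vector part of Eq. (8)

x1 = rotate(u, c)                          # Eq. (31)
x2 = rotate(u, np.cross(f, c))             # Eq. (32)
x3 = np.cross(x1, x2)                      # normal to the plane of motion

# Right-hand side of Eq. (34).
cos_normal = (2 * (u[1:] @ f) * (u[1:] @ c)
              - 2 * u[0] * (u[1:] @ np.cross(c, f)))
print(np.allclose(x, ((xi**2 - eta**2) * x1 + 2 * xi * eta * x2) / alpha))
```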

2.4.3 KS1 and KS3 setup

Some particular choices of the first basis quaternion \(\mathsf {u}\) deserve a special comment. Inspecting Eq. (34), we notice three obvious cases leading to \(\mathbf {c}\) positioned in the plane of motion: a pure scalar \(\mathsf {u} = \pm \,(1,\mathbf {0})\), or the pure quaternions \(\mathsf {u} = (0, \pm \,\mathbf {f})\) and \(\mathsf {u} = (0, \pm \, \mathbf {c})\). The basis vectors \({\hat{\mathbf {x}}}_1\), resulting from Eq. (31), are \(\mathbf {c}\), \(-\,\mathbf {c}\), and \(\mathbf {c}\), respectively. The last case, i.e. \(\mathsf {u} = (0, \mathbf {c})\), has been the most common choice in Celestial Mechanics since the first paper of Kustaanheimo (1964). It allows the most direct identification of the LC plane with the plane of motion, both spanned by the same vectors (or pure quaternions) \(\mathsf {u}^\natural = {\hat{\mathbf {x}}}_1 = \mathbf {c}\), and \(\mathsf {w}^\natural = {\hat{\mathbf {x}}}_2 = \mathbf {c} \times \mathbf {f}\). The freedom of choice for \(\mathbf {f}\) (any vector perpendicular to \(\mathbf {c}\)) allows us to identify \(\mathbf {c}\) and \(-\mathbf {f}\) with the basis vectors \(\mathbf {e}_1\) and \(\mathbf {e}_3\) of the particular reference frame used to describe the planar (\(x_3=0\)) motion. For this reason, let us call the KS transformation based upon the paradigmatic choice \(\mathbf {c}=\mathbf {e}_1\) the KS1 transformation.

Remaining in the domain of pure quaternions, let us consider \(\mathsf {u} = (0,\mathbf {u})\). Without loss of generality, we can assume \(\mathbf {u} = \cos {\psi } \mathbf {c} + \sin {\psi } \mathbf {f}\), with \(0 \leqslant \psi \leqslant \pi \). Then, according to Eq. (34), we have \(\mathbf {c} \cdot {\hat{\mathbf {x}}}_3 = \sin {2 \psi }\), so an appropriate choice of the parameter \(\psi \) may lead to any orientation of the orbital plane with respect to \(\mathbf {c}\). In particular, the defining vector will coincide with \({\hat{\mathbf {x}}}_3\) when \(\psi =\pi /4\), where \(\sin {2 \psi } = 1\). The LC plane spanned by the basis quaternions

$$\begin{aligned} \mathsf {u} = \left( 0 , \frac{\mathbf {c}+\mathbf {f}}{\sqrt{2}} \right) , \quad \mathsf {w} = \left( - \frac{1}{\sqrt{2}}, \frac{\mathbf {c} \times \mathbf {f}}{\sqrt{2}} \right) , \end{aligned}$$
(35)

is mapped onto the plane of motion \(\varPi \) with basis vectors \({\hat{\mathbf {x}}}_1 = \mathbf {f}\), and \({\hat{\mathbf {x}}}_2 = \mathbf {c} \times \mathbf {f}\), both orthogonal to \(\mathbf {c}\). Thus, the choice of \(\mathbf {c}=\mathbf {e}_3\), and \(\mathbf {f}=\mathbf {e}_1\) leads to the KS3 transformation, which may look less attractive than KS1, with its LC plane no longer consisting of pure quaternions. Indeed, it is not practiced in Celestial Mechanics, save for two exceptions known to the authors (Saha 2009; Breiter and Langner 2017). In physics, however, the KS3 transformation has been common at least since the 1970s (e.g. Duru and Kleinert 1979; Cordani 2003; Díaz et al. 2010; Egea et al. 2011; van der Meer et al. 2016); there are good reasons for this, but they come out only in the context of dynamics and symmetries of a perturbed Kepler (or Coulomb) problem.
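
The dependence of the plane orientation on \(\psi \) can be illustrated with a few lines of Python. The sketch assumes the KS3-like choice \(\mathbf {c}=\mathbf {e}_3\), \(\mathbf {f}=\mathbf {e}_1\); the function names and the array layout `[v0, v1, v2, v3]` are hypothetical and serve only this example.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def rotate(u, vec):
    """Vector part of u (0, vec) conj(u), as in Eqs. (31)-(33)."""
    u_conj = np.concatenate(([u[0]], -u[1:]))
    return qmul(qmul(u, np.concatenate(([0.0], vec))), u_conj)[1:]

def plane_basis(psi, c, f):
    """Basis x1, x2 and normal x3 for the pure u = (0, cos(psi) c + sin(psi) f)."""
    u = np.concatenate(([0.0], np.cos(psi) * c + np.sin(psi) * f))
    return rotate(u, c), rotate(u, np.cross(f, c)), rotate(u, f)

c = np.array([0.0, 0.0, 1.0])   # defining vector e_3
f = np.array([1.0, 0.0, 0.0])   # auxiliary vector e_1, orthogonal to c

# Cosine of the angle between c and the normal x3, sampled over psi.
psis = np.linspace(0.0, np.pi, 13)
cosines = np.array([c @ plane_basis(psi, c, f)[2] for psi in psis])

# Basis of Eq. (35): the normal to the plane of motion is c itself.
x1, x2, x3 = plane_basis(np.pi / 4, c, f)
print(x1, x2, x3)
```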

3 Canonical KS variables in the extended phase space

In contrast to earlier works, let us consider from the outset a canonical problem in the extended phase space \((x^*,\mathbf {x},X^*,\mathbf {X})\), with a Hamiltonian

$$\begin{aligned} {\mathscr {H}}(x^*,\mathbf {x},X^*,\mathbf {X}) = {\mathscr {H}}_0(\mathbf {x},\mathbf {X}) + {\mathscr {R}}(x^*,\mathbf {x},\mathbf {X})+X^*= 0, \end{aligned}$$
(36)

where the Keplerian term

$$\begin{aligned} {\mathscr {H}}_0 = \frac{\mathbf {X} \cdot \mathbf {X}}{2} - \frac{\mu }{r}, \end{aligned}$$
(37)

depends on the Cartesian coordinates \(\mathbf {x}\), their conjugate momenta \(\mathbf {X}\) and the gravitational parameter \(\mu \). The time-dependent perturbation \({\mathscr {R}}(t,\mathbf {x},\mathbf {X})\) is converted into a conservative term by substituting a formal, time-like coordinate \(x^*\) for physical time t. The fact that \(x^*(t)=t\) is a direct consequence of the way its conjugate momentum \(X^*\) appears in Eq. (36), because

$$\begin{aligned} {\dot{x^*}} = \frac{\partial {\mathscr {H}}}{\partial X^*} = 1, \end{aligned}$$
(38)

and an appropriate choice of the arbitrary constant leads to the identity map of t on \(x^*\). The momentum \(X^*\) itself evolves according to

$$\begin{aligned} {\dot{X}}^*= - \frac{\partial {\mathscr {H}}}{\partial x^*} = - \frac{\partial {\mathscr {R}}}{\partial x^*}, \end{aligned}$$
(39)

counterbalancing the variations of energy in nonconservative problems, or staying constant in the conservative case.

If the same problem is to be handled canonically in terms of the KS coordinates, their conjugate momenta \(\mathsf {V}\) are implicitly defined through

$$\begin{aligned} \mathsf {X} = \frac{\mathsf {V} \mathsf {c} \bar{\mathsf {v}}}{2r}, \text{ or } \mathsf {V} = \frac{2\,\mathsf {X} \,\mathsf {v} \,\bar{\mathsf {c}}}{\alpha }. \end{aligned}$$
(40)

In this transformation, we postulate

$$\begin{aligned} {\mathscr {J}}(\mathsf {v},\mathsf {V}) = \left( \bar{\mathsf {v}} \wedge {\bar{\mathsf {V}}} \right) \cdot \mathsf {c} = 0, \end{aligned}$$
(41)

to secure

$$\begin{aligned} X_0 = \frac{{\mathscr {J}}(\mathsf {v},\mathsf {V})}{2 r} =0, \end{aligned}$$
(42)

so that \(\mathsf {X} = (0, \mathbf {X})\) remains a pure quaternion. The transformation \({\mathbb {R}}^2 \times {\mathbb {H}}^2 \rightarrow {\mathbb {R}}^2 \times {\mathbb {H}}' \times {\mathbb {H}}'\), which maps \((v^*,\mathsf {v},V^*,\mathsf {V}) \mapsto (x^*,\mathbf {x},X^*,\mathbf {X})\) according to the definitions (8), (40) and the identities \(x^*= v^*\), \(X^*= V^*\), is known to be weakly canonical [i.e. canonical only on a specific manifold (41)].
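
The momentum relations (40)–(42) lend themselves to a quick numerical check. In the sketch below, which rests on our own assumptions (hypothetical function names, random test data, quaternions stored as `[v0, v1, v2, v3]`), a KS momentum is built from a Cartesian momentum with the second form of Eq. (40); the constraint (41) and the inverse relation are then evaluated.

```python
import numpy as np

def qmul(u, v):
    """Quaternion product of Eq. (5); arrays are [scalar, x, y, z]."""
    scalar = u[0] * v[0] - u[1:] @ v[1:]
    vector = u[0] * v[1:] + v[0] * u[1:] + np.cross(u[1:], v[1:])
    return np.concatenate(([scalar], vector))

def qconj(v):
    """Quaternion conjugate of Eq. (6)."""
    return np.concatenate(([v[0]], -v[1:]))

def pure(vec):
    """Pure quaternion (0, vec)."""
    return np.concatenate(([0.0], vec))

def bilinear_form(v, w, c):
    """Vector form of the bilinear form J, Eq. (25)."""
    return (-v[0] * (w[1:] @ c) + w[0] * (v[1:] @ c)
            + np.cross(v[1:], w[1:]) @ c)

rng = np.random.default_rng(5)
v = rng.normal(size=4)            # KS coordinates
X = rng.normal(size=3)            # Cartesian momentum
c = rng.normal(size=3)
c /= np.linalg.norm(c)            # defining vector
alpha = 2.0                       # fixed scale parameter in this example

r = v @ v / alpha                                     # Eq. (10)
x = qmul(qmul(v, pure(c)), qconj(v)) / alpha          # Eq. (8)
V = 2.0 * qmul(qmul(pure(X), v), qconj(pure(c))) / alpha    # Eq. (40), second form
X_back = qmul(qmul(V, pure(c)), qconj(v)) / (2.0 * r)       # Eq. (40), first form
J = bilinear_form(v, V, c)                            # Eq. (41)
print(J, X_back)
```

Since the momentum \(\mathsf {V}\) is generated here from a pure quaternion \(\mathsf {X}\), the constraint (41) is expected to hold up to rounding errors.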

Let us now generalize the transformation by allowing that \(\alpha \), instead of being a fixed parameter, is an arbitrary differentiable function of the energy-like momentum \(X^*\) or \(V^*\). A similar assumption was recently made for the Levi-Civita transformation (Breiter and Langner 2018). The necessity or at least usefulness of such a generalization will not be clear until the action–angle variables are introduced, but it has to be introduced already at this stage. If the generalized transformation is to be kept weakly canonical, while maintaining the direct relation \(V^*= X^*\), the new formal time-like variable \(v^*\) should differ from \(x^*\). Then, the transformation

$$\begin{aligned} \lambda : \quad (v^*,\mathsf {v},V^*,\mathsf {V}) \mapsto (x^*,\mathbf {x},X^*,\mathbf {X}), \end{aligned}$$
(43)

conserves the Pfaffian one-form up to the total differential of a primitive function Q (Arnold et al. 1997)

$$\begin{aligned} V^*\,\mathrm {d}v^*+ \mathsf {V} \cdot \mathrm {d}\mathsf {v}- X^*\,\mathrm {d}x^*- \mathbf {X} \cdot \mathrm {d}\mathbf {x} = \mathrm {d}Q + \frac{{\mathscr {J}}(\mathsf {v},\mathsf {V}) {\mathscr {J}}(\mathsf {v} ,\mathrm {d} \mathsf {v})}{\mathsf {v}\cdot \mathsf {v}}, \end{aligned}$$
(44)

provided

$$\begin{aligned} Q= & {} \left[ \frac{X^*}{\alpha } \frac{\partial \alpha }{\partial X^*}\right] \mathbf {x} \cdot \mathbf {X} = \left[ \frac{V^*}{ \alpha } \frac{\partial \alpha }{\partial V^*}\right] \frac{\mathsf {v} \cdot \mathsf {V}}{2}, \end{aligned}$$
(45)
$$\begin{aligned} x^*= & {} v^*- \frac{Q}{V^*}, \end{aligned}$$
(46)

and with a necessary condition of \({\mathscr {J}}(\mathsf {v},\mathsf {V})=0\).

It is worth noting that with an elementary choice of \(\alpha = k_1 (X^*)^{k_2}\), the expression in the square bracket evaluates to a single number \(k_2\), and the multiplier \(k_1\) has no influence on canonicity; hence, it can be selected at will—for example to conserve (or to modify) the units of time and length.
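
As a brief worked check of this statement, with \(k_1\) and \(k_2\) treated as constants, substituting the power law into Eqs. (45) and (46) gives

$$\begin{aligned} \frac{X^*}{\alpha } \frac{\partial \alpha }{\partial X^*} = \frac{X^*}{k_1 (X^*)^{k_2}} \, k_1 k_2 (X^*)^{k_2-1} = k_2, \quad Q = k_2 \, \mathbf {x} \cdot \mathbf {X} = \frac{k_2}{2} \, \mathsf {v} \cdot \mathsf {V}, \quad x^*= v^*- \frac{k_2 \, \mathsf {v} \cdot \mathsf {V}}{2 V^*}, \end{aligned}$$

so \(k_1\) cancels, and a constant \(\alpha \) (\(k_2=0\)) recovers \(Q=0\) and \(x^*= v^*\).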

In order to convert the Hamiltonian (36) into a perturbed harmonic oscillator, the independent variable has to be changed from the physical time t to the Sundman time \(\tau \), related by

$$\begin{aligned} \frac{\mathrm {d}\tau }{\mathrm {d} t} = \frac{\alpha }{4 r} = \frac{\alpha ^2}{4 \,\mathsf {v} \cdot \mathsf {v}}, \end{aligned}$$
(47)

involving \(\alpha \) as a function of \(V^*\) or \(X^*\). Transforming the Hamiltonian (36) by the composition of \(\lambda \) and \(t \mapsto \tau \), we obtain

$$\begin{aligned} {\mathscr {K}}(v^*,\mathsf {v},V^*,\mathsf {V})= & {} {\mathscr {K}}_0(\mathsf {v},V^*,\mathsf {V})+ {\mathscr {P}}(v^*,\mathsf {v},V^*,\mathsf {V}) = 0, \end{aligned}$$
(48)
$$\begin{aligned} {\mathscr {K}}_0(\mathsf {v},V^*,\mathsf {V})= & {} \frac{\mathsf {V} \cdot \mathsf {V}}{2} + \frac{\omega ^2 \mathsf {v} \cdot \mathsf {v}}{2} - \frac{4\mu }{\alpha } + \frac{\alpha {\mathscr {J}}(\mathsf {v},\mathsf {V})^2}{2 |\mathsf {v}|^2},~~~~ \end{aligned}$$
(49)
$$\begin{aligned} {\mathscr {P}}(v^*,\mathsf {v},V^*,\mathsf {V})= & {} \frac{4 r}{\alpha } {\mathscr {R}}^\star (v^*,\mathsf {v},V^*,\mathsf {V}), \end{aligned}$$
(50)

where \({\mathscr {R}}^\star \) is the perturbation Hamiltonian \({\mathscr {R}}(x^*,\mathbf {x},\mathbf {X})\) expressed in terms of the extended KS coordinates and momenta, and

$$\begin{aligned} \omega = \frac{2 \sqrt{2 V^*}}{\alpha }, \end{aligned}$$
(51)

will have a constant value only if the original Hamiltonian \({\mathscr {H}}\) does not depend on time. Let us emphasize that now every function of \(\mathbf {x}\), when expressed in terms of \(\mathsf {v}\), will generally depend on the energy-like momentum \(V^*\) as well, due to its presence in \(\alpha \). Notably, the simplification to \(\omega =1\) can be achieved by assuming \(\alpha = \sqrt{8 V^*}\), which makes the Sundman time dimensionless. Choosing \(\alpha = \mu / V^*\) is roughly equivalent to \(\alpha = 2 a\), in terms of the Keplerian orbit semi-axis a.
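The two choices of \(\alpha \) can be compared numerically. The following is only a sketch: the function name `omega`, the units with \(\mu =1\), and the sample semi-axis are assumptions made for illustration, and the Keplerian value \(V^*=\mu /(2a)\) is the one valid in the unperturbed problem.

```python
import math

def omega(V_star, alpha):
    """Frequency of Eq. (51): omega = 2*sqrt(2 V*) / alpha."""
    return 2.0 * math.sqrt(2.0 * V_star) / alpha

mu = 1.0                   # gravitational parameter (illustrative units)
a = 10.0                   # semi-major axis (illustrative value)
V_star = mu / (2.0 * a)    # energy-like momentum of the unperturbed orbit

alpha_unit = math.sqrt(8.0 * V_star)   # choice that normalizes the frequency
alpha_axis = mu / V_star               # choice proportional to the semi-axis

omega_unit = omega(V_star, alpha_unit)  # expected: 1
omega_axis = omega(V_star, alpha_axis)  # by the same arithmetic: sqrt(mu/a^3)
```

With the second choice, the arithmetic of Eq. (51) gives \(\omega = \sqrt{\mu /a^3}\) for the unperturbed orbit, i.e. the value of the Keplerian mean motion.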

4 Action–angle variables

4.1 LLC and LCF variables

When the motion is planar, with \(x_3=0\), an appropriate action–angle set lgLG can be created using a combination of the Levi-Civita (Levi-Civita 1906) and Lissajous transformations (Deprit and Williams 1991). This approach has been recently revisited and discussed by Breiter and Langner (2018). Viewed as the special case of the KS framework, the Lissajous–Levi-Civita (LLC) variables are inherently attached to the KS1 setup, requiring the identification of the LC plane \(P \subset {\mathbb {H}}'\) of pure quaternions and the plane of motion \(\varPi \subset {\mathbb {R}}^3\). A generalization of this approach was proposed by Zhao (2015). Roughly speaking, he attached the LC plane to an osculating plane of motion \(\varPi \) and added the third action–angle pair hH orienting \(\varPi \) in \({\mathbb {R}}^3\) by direct analogy with the third Delaunay pair: longitude of the ascending node, and projection of the angular momentum on the axis \({\hat{\mathbf {x}}}_3\). As noted by the author, this approach has the same drawbacks as the Delaunay set, in particular the singularity when the orbit in physical space is rectilinear and thus has no unique orbital plane.

4.2 Lissajous–Kustaanheimo–Stiefel (LKS) variables

4.2.1 Intermediate set

The starting point for the new set of variables we would like to propose is completely different than in Zhao (2015). First, we choose the KS3 framework, assuming the defining vector \(\mathbf {c}=\mathbf {e}_3\). Then, we select two subspaces of \({\mathbb {H}}\): \(P_{03}\) with the basis \(\mathsf {e}_0, \mathsf {e}_{3}\), and \(P_{12}\) spanned by \(\mathsf {e}_1\) and \(\mathsf {e}_2\). None of them is a Levi-Civita plane, because in the KS3 framework

$$\begin{aligned} {\mathscr {J}}(\mathsf {e}_0,\mathsf {e}_3)= & {} {\mathscr {J}}((1,\mathbf {0}),(0,\mathbf {c})) = -1, \nonumber \\ {\mathscr {J}}(\mathsf {e}_1,\mathsf {e}_2)= & {} {\mathscr {J}}((0,\mathbf {f}),(0,\mathbf {c} \times \mathbf {f}))=1. \end{aligned}$$
(52)

Thus, even for the planar case, we do not restrict motion to an invariant plane P, but merely project \(\mathsf {v}\) on two orthogonal subspaces. The orthogonality is readily checked by

$$\begin{aligned} (v_0 \mathsf {e}_0 + v_3 \mathsf {e}_3) \cdot (v_1 \mathsf {e}_1 + v_2 \mathsf {e}_2) = 0. \end{aligned}$$
(53)

On each plane, with \((i,j) = (0,3)\), or \((i,j) = (1,2)\), we perform the Lissajous transformation of Deprit (1991)

$$\begin{aligned} v_i= & {} \sqrt{\frac{L_{ij}+G_{ij}}{2 \omega }} \cos {(l_{ij}+g_{ij})} - \sqrt{\frac{L_{ij}-G_{ij}}{2 \omega }} \cos {(l_{ij}-g_{ij})}, \end{aligned}$$
(54)
$$\begin{aligned} v_j= & {} \sqrt{\frac{L_{ij}+G_{ij}}{2 \omega }} \sin {(l_{ij}+g_{ij})} + \sqrt{\frac{L_{ij}-G_{ij}}{2 \omega }} \sin {(l_{ij}-g_{ij})}, \end{aligned}$$
(55)
$$\begin{aligned} V_i= & {} -\sqrt{\frac{\omega (L_{ij}+G_{ij})}{2}} \sin {(l_{ij}+g_{ij})} + \sqrt{\frac{\omega (L_{ij}-G_{ij})}{2}} \sin {(l_{ij}-g_{ij})}, \end{aligned}$$
(56)
$$\begin{aligned} V_j= & {} \sqrt{\frac{\omega (L_{ij}+G_{ij})}{2}} \cos {(l_{ij}+g_{ij})} + \sqrt{\frac{\omega (L_{ij}-G_{ij})}{2}} \cos {(l_{ij}-g_{ij})}. \end{aligned}$$
(57)

Similarly to Breiter and Langner (2018), we allow \(\omega > 0\) to be a function of \(V^*\), as given by Eq. (51)—both directly, and through \(\alpha \). This requires a new time-like variable s to be different from \(v^*\), while retaining its conjugate \(S=V^*\). Only then, the 1-forms are conserved up to the total differential

$$\begin{aligned} L_{03} \mathrm {d}l_{03} + G_{03} \mathrm {d}g_{03} + L_{12} \mathrm {d}l_{12} + G_{12} \mathrm {d}g_{12} + S \mathrm {d}s - \mathsf {V} \cdot \mathrm {d} \mathsf {v} - V^*\mathrm {d} v^*= \mathrm {d}Q^*, \end{aligned}$$
(58)

with

$$\begin{aligned} Q^*= -\frac{\mathsf {v} \cdot \mathsf {V}}{2} \left( 1 - \frac{ S}{\omega }\frac{\mathrm {d} \omega }{\mathrm {d} S} \right) , \end{aligned}$$
(59)

and

$$\begin{aligned} v^*= s -\frac{\mathsf {v} \cdot \mathsf {V}}{2 \omega } \frac{\mathrm {d} \omega }{\mathrm {d} S}, \end{aligned}$$
(60)

where

$$\begin{aligned} \mathsf {v} \cdot \mathsf {V} = \sqrt{L_{03}^2-G_{03}^2} \sin {2 l_{03}} + \sqrt{L_{12}^2-G_{12}^2} \sin {2 l_{12}}. \end{aligned}$$
(61)

The Hamiltonian function (48) is converted into the sum of

$$\begin{aligned} {\mathscr {K}}'_0 = \omega L_{03} + \omega L_{12} - \frac{4\mu }{\alpha } + \frac{\alpha \left( G_{03}-G_{12}\right) ^2}{8 |\mathsf {v}|^2}, \end{aligned}$$
(62)

and of the perturbation \({\mathscr {P}}\) expressed in terms of the Lissajous variables.
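The Lissajous transformation of a single plane is easy to test numerically. The sketch below implements Eqs. (54)–(57) and evaluates the oscillator energy, the product \(\mathsf {v}\cdot \mathsf {V}\) restricted to one plane, and the angular-momentum-like quantity \(v_i V_j - v_j V_i\). The function name `lissajous` and the sample values are assumptions made only for this illustration.

```python
import math

def lissajous(l, g, L, G, omega):
    """Eqs. (54)-(57) for one plane (i, j); requires L >= |G| and omega > 0."""
    sp = math.sqrt((L + G) / (2.0 * omega))   # coordinate amplitudes
    sm = math.sqrt((L - G) / (2.0 * omega))
    Sp = math.sqrt(omega * (L + G) / 2.0)     # momentum amplitudes
    Sm = math.sqrt(omega * (L - G) / 2.0)
    vi = sp * math.cos(l + g) - sm * math.cos(l - g)
    vj = sp * math.sin(l + g) + sm * math.sin(l - g)
    Vi = -Sp * math.sin(l + g) + Sm * math.sin(l - g)
    Vj = Sp * math.cos(l + g) + Sm * math.cos(l - g)
    return vi, vj, Vi, Vj

# Arbitrary sample point with G < 0 (retrograde ellipse)
l, g, L, G, w = 0.7, -0.3, 2.0, -1.2, 1.5
vi, vj, Vi, Vj = lissajous(l, g, L, G, w)

energy = 0.5 * (Vi**2 + Vj**2) + 0.5 * w**2 * (vi**2 + vj**2)  # one-plane part of Eq. (49)
dot = vi * Vi + vj * Vj                                         # one-plane part of Eq. (61)
ang = vi * Vj - vj * Vi                                         # reduces to G
```

The three quantities reproduce \(\omega L_{ij}\), \(\sqrt{L_{ij}^2-G_{ij}^2}\,\sin 2l_{ij}\), and \(G_{ij}\), respectively, which is how the first two terms of Eq. (62) and the right-hand side of Eq. (61) arise.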

This transformation is merely an intermediate step, but before the final move let us inspect the meaning and properties of the variables in the Kepler problem defined by \({\mathscr {K}}'_0=0\). As a generic example, we take a heliocentric orbit in physical phase space with the following Keplerian elements: major semi-axis \(a=10\,\,\mathrm {au}\), eccentricity \(e=0.5\), inclination \(I=10^\circ \), argument of perihelion \(\omega _\mathrm {o} = 60^\circ \), longitude of the ascending node \(\varOmega =10^\circ \), and the initial true anomaly \(f=60^\circ \). From these elements, we compute first the position \(\mathbf {x}(0)\) and momentum \(\mathbf {X}(0)\), and then the representative KS3 quaternions \(\mathsf {v}(0)\) and \(\mathsf {V}(0)\)—an SKS vector given by Eq. (19), and its conjugate momentum defined by Eq. (40), both with \(\mathsf {c}=\mathsf {e}_3\). These initial conditions are labelled with black dots in Fig. 2a. The ellipses described in the \((v_0,v_3)\) and \((v_1,v_2)\) planes have different semi-axes and different eccentricities; however, both are traversed in the same direction—retrograde (clockwise) in the discussed example. The retrograde motion follows from the fact that \(G_{03}=G_{12} < 0\) (the momenta are equal due to the postulate (41), where \(\left( \bar{\mathsf {v}} \wedge {\bar{\mathsf {V}}} \right) \cdot \mathsf {e}_3 = (G_{03}-G_{12})/2\)). The constant angles \(g_{03}\) and \(g_{12}\), measured counterclockwise, position the ellipses in the coordinate planes. The initial angles \(l_{03}\) and \(l_{12}\) are marked according to the geometrical construction similar to that of the eccentric anomaly. Comparing our Fig. 2 with Fig. 1 of Deprit (1991), the readers may note the reverse direction of the \(l_{ij}\) angle. The difference comes from the fact that Deprit (1991) assumed \(G > 0\), i.e. the prograde (counterclockwise) motion along the Lissajous ellipse. 
Yet, regardless of the sign of \(G_{ij}\), equations of motion imply \(\mathrm {d}l_{03}/\mathrm {d}\tau = \mathrm {d}l_{12}/\mathrm {d}\tau = \omega > 0\).

Each Lissajous ellipse has the major semi-axis \(a_{ij}\) and the minor semi-axis \(b_{ij}\) defined by the two momenta and frequency

$$\begin{aligned} a_{ij} = \frac{\sqrt{L_{ij}+G_{ij}}+\sqrt{L_{ij}-G_{ij}}}{\sqrt{2 \omega }}, \quad b_{ij} = \frac{|\sqrt{L_{ij}+G_{ij}}-\sqrt{L_{ij}-G_{ij}}|}{\sqrt{2 \omega }}. \end{aligned}$$
(63)

The absolute value operator is necessary for \(G_{ij}<0\), unless one adopts a convention of negative minor semi-axis for an ellipse traversed clockwise.

Fig. 2 The motion in two configuration planes of the KS3 variables for the Kepler problem. a Initial conditions \(\mathsf {v}\) set according to Eq. (19). b Initial conditions are multiplied by \(\mathsf {q}(\pi /2)\). More details in the text

Another point worth observing is the ambiguity in the choice of the \(l_{ij}\) and \(g_{ij}\) pair. Their values are determined from two possible sets of four equations linking quadratic forms of \(v_i,v_j,V_i,V_j\) with sine and cosine functions of the angles. Regardless of whether we use

$$\begin{aligned} \sin {2 g_{ij}}, \cos {2 g_{ij}}, \sin {2 l_{ij}}, \cos {2 l_{ij}}, \end{aligned}$$

or

$$\begin{aligned} \sin {(l_{ij}+ g_{ij})}, \cos {(l_{ij}+g_{ij})}, \sin {(l_{ij}-g_{ij})}, \cos {(l_{ij}-g_{ij})}, \end{aligned}$$

the solution will always result in two pairs: \((l_{ij},g_{ij})\), and \((l_{ij}+\pi , g_{ij}+\pi )\), both giving the same values of the sine and cosine (see Footnote 2). In other words, one of the two minor semi-axes in each of the ellipses in Fig. 2 can be chosen at will as the reference one.

Recalling the fibration property of the KS variables, we have plotted the ellipses obtained from the same Cartesian \(\mathbf {x}(0)\) and \(\mathbf {X}(0)\), but with the KS3 initial conditions \(\mathsf {v}(0)\) and \(\mathsf {V}(0)\) right-multiplied by \(\mathsf {q}(\pi /2)=\mathsf {c}=\mathsf {e}_3\), according to Eq. (11) in the KS3 case. The results are displayed in Fig. 2b. Not only the initial conditions, but the entire ellipses are rotated by \(90^\circ \) in the \((v_0,v_3)\) plane and by \(-\,90^\circ \) in the \((v_1,v_2)\) plane. The momenta \(L_{ij}, G_{ij}\), and the angles \(l_{ij}\) remain intact, compared to Fig. 2a. The new angles positioning the ellipses are \(g'_{03} = g_{03}+\pi /2\), and \(g'_{12} = g_{12}-\pi /2\), but their sum has not changed: \(g'_{03}+g'_{12} = g_{03}+g_{12}.\)

4.2.2 Final transformation

Bearing in mind the example shown in Fig. 2, we can establish the final set of the LKS variables by defining four action–angle pairs

$$\begin{aligned} l= & {} \frac{1}{2} \left( l_{12}+l_{03}\right) , \nonumber \\ \lambda= & {} \frac{1}{2} \left( l_{12}-l_{03}\right) , \nonumber \\ g= & {} \frac{1}{2} \left( g_{12}+g_{03}\right) , \nonumber \\ \gamma= & {} \frac{1}{2} \left( g_{12}-g_{03}\right) , \nonumber \\ L= & {} L_{12}+L_{03}, \nonumber \\ \varLambda= & {} L_{12} - L_{03}, \nonumber \\ G= & {} G_{12} + G_{03}, \nonumber \\ \varGamma= & {} G_{12}-G_{03}, \end{aligned}$$
(64)

with s and S retained unaffected. One may easily verify that (64) amounts to an elementary Mathieu transformation; thus, the complete composition

$$\begin{aligned} \zeta :~(x^*,\mathsf {x},X^*,\mathsf {X}; t) \rightarrow (s,l,\lambda ,g,\gamma ,S,L,\varLambda ,G,\varGamma ; \tau ), \end{aligned}$$

is a weakly canonical, dimension raising transformation. The Hamiltonian \({\mathscr {H}}\) from Eq. (36) is transformed into

$$\begin{aligned} {\mathscr {M}}(s,l,\lambda ,g,S,L,\varLambda ,G,\varGamma ) = {\mathscr {M}}_0(l,\lambda ,S,L,\varLambda ,\varGamma ) + {\mathscr {Q}}(s,l,\lambda ,g,S,L,\varLambda ,G,\varGamma ) = 0,\nonumber \\ \end{aligned}$$
(65)

where

$$\begin{aligned} {\mathscr {M}}_0 = \omega (S)\,L - \frac{4 \mu }{\alpha (S)} + \frac{\varGamma ^2}{8 r}, \end{aligned}$$
(66)

and \({\mathscr {Q}}\) is the pullback of \(\frac{4 r}{\alpha (S)} {\mathscr {R}}(x^*,\mathbf {x},\mathbf {X})\) by \(\zeta \).

To express the Cartesian variables from the initial extended phase space in terms of the LKS variables, we first introduce six action-dependent coefficients

$$\begin{aligned} A_1= & {} \frac{1}{2} \sqrt{(L + G)^2 - (\varLambda + \varGamma )^2}, \nonumber \\ A_2= & {} \frac{1}{2} \sqrt{(L - G)^2 - (\varLambda - \varGamma )^2}, \nonumber \\ B_1= & {} \frac{1}{2} \sqrt{(L + \varLambda )^2 - (G + \varGamma )^2}, \nonumber \\ B_2= & {} \frac{1}{2} \sqrt{(L - \varLambda )^2 - (G - \varGamma )^2}, \nonumber \\ C_1= & {} \frac{1}{2} \sqrt{(L + \varGamma )^2 - (G + \varLambda )^2}, \nonumber \\ C_2= & {} \frac{1}{2} \sqrt{(L - \varGamma )^2 - (G - \varLambda )^2}, \end{aligned}$$
(67)

allowing a compact formulation of the expressions for coordinates

$$\begin{aligned} x_0= & {} 0, \end{aligned}$$
(68)
$$\begin{aligned} x_1= & {} \frac{1}{ \sqrt{8 S}} \left( A_1 \sin {2 (l + g)} - A_2 \sin {2 (l - g)} \right. \nonumber \\&\left. -\, C_1 \sin {2 (g + \lambda )} - C_2 \sin {2 (g - \lambda )} \right) , \end{aligned}$$
(69)
$$\begin{aligned} x_2= & {} \frac{1}{ \sqrt{8 S}} \left( - A_1 \cos {2 (l + g)} - A_2 \cos {2 (l - g)} \right. \nonumber \\&\left. +\, C_1 \cos {2 (g + \lambda )} + C_2 \cos {2 (g - \lambda )} \right) , \end{aligned}$$
(70)
$$\begin{aligned} x_3= & {} \frac{1}{ \sqrt{8 S}} \left( - \varLambda + B_1 \cos {2 (l + \lambda )} - B_2 \cos {2 (l - \lambda )} \right) , \end{aligned}$$
(71)

and momenta

$$\begin{aligned} X_0= & {} \frac{\varGamma }{2 r} = 0, \nonumber \\ X_1= & {} \frac{ A_1 \cos {2(l+g)} - A_2 \cos {2(l-g)}}{2 r} = \frac{\sqrt{8 S}}{4 r} \frac{\partial x_1}{\partial l}, \nonumber \\ X_2= & {} \frac{A_1 \sin {2 (l+g)} + A_2 \sin {2(l-g)}}{2 r} = \frac{\sqrt{8 S}}{4 r} \frac{\partial x_2}{\partial l}, \nonumber \\ X_3= & {} \frac{-B_1 \sin {2(l+\lambda )} + B_2 \sin {2(l-\lambda )}}{2 r} = \frac{\sqrt{8 S}}{4 r} \frac{\partial x_3}{\partial l}, \end{aligned}$$
(72)

where

$$\begin{aligned} r = \frac{ L - B_1 \cos {2(l+\lambda )} - B_2 \cos {2 (l-\lambda )}}{ \sqrt{8 S}}. \end{aligned}$$
(73)

Finally, the ‘time deputy’ variable \(x^*= t\) is linked with the formal time-like variable s through

$$\begin{aligned} x^*= s - \frac{B_1 \sin {2(l+\lambda )} + B_2 \sin {2(l-\lambda )}}{4 S} = s - \frac{1}{\sqrt{8 S}} \frac{\partial r}{\partial l}. \end{aligned}$$
(74)

We have skipped the explicit expression of the KS variables, because it can be immediately obtained from the substitution of (64) into (54)–(57).

Two features of the above expressions for \(\mathsf {x}\), \(\mathsf {X}\), and \(x^*\) deserve special attention. First, none of them depends on \(\gamma \), which means that any dynamical system primarily defined in terms of \(\mathbf {x}\), \(\mathbf {X}\), and time, conserves the value of \(\varGamma \). Secondly, the expressions for the Cartesian coordinates and momenta in the extended phase space do not depend on the particular choice of \(\alpha (S)\) and \(\omega (S)\); the choice affects only the form of the Hamiltonian \({\mathscr {M}}\).
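A minimal numerical sketch of Eqs. (67)–(73) may help when checking an implementation. The function name `lks_to_cartesian`, the sample values, and the units with \(\mu =1\) are assumptions made only for this illustration; \(\varGamma =0\) is imposed, and L is fixed by the unperturbed relation of Eq. (77), so the checks refer to the pure Kepler problem.

```python
import math

def lks_to_cartesian(l, lam, g, S, L, Lam, G, Gam=0.0):
    """Return (x, X, r) from Eqs. (67)-(73); x and X are 3-tuples."""
    # Half square root of a difference of squares, clamped against rounding
    h = lambda u, w: 0.5 * math.sqrt(max(u * u - w * w, 0.0))
    A1, A2 = h(L + G, Lam + Gam), h(L - G, Lam - Gam)
    B1, B2 = h(L + Lam, G + Gam), h(L - Lam, G - Gam)
    C1, C2 = h(L + Gam, G + Lam), h(L - Gam, G - Lam)
    k = math.sqrt(8.0 * S)
    a, b = 2 * (l + g), 2 * (l - g)
    c, d = 2 * (g + lam), 2 * (g - lam)
    p, m = 2 * (l + lam), 2 * (l - lam)
    x = ((A1 * math.sin(a) - A2 * math.sin(b) - C1 * math.sin(c) - C2 * math.sin(d)) / k,
         (-A1 * math.cos(a) - A2 * math.cos(b) + C1 * math.cos(c) + C2 * math.cos(d)) / k,
         (-Lam + B1 * math.cos(p) - B2 * math.cos(m)) / k)
    r = (L - B1 * math.cos(p) - B2 * math.cos(m)) / k
    X = ((A1 * math.cos(a) - A2 * math.cos(b)) / (2 * r),
         (A1 * math.sin(a) + A2 * math.sin(b)) / (2 * r),
         (-B1 * math.sin(p) + B2 * math.sin(m)) / (2 * r))
    return x, X, r

mu, a_axis = 1.0, 10.0              # illustrative units and semi-axis
S = mu / (2 * a_axis)               # Keplerian value of S
L = 2 * mu / math.sqrt(2 * S)       # Eq. (77)
Lam, G = 0.25 * L, -0.35 * L        # any values with |Lam| + |G| <= L
x, X, r = lks_to_cartesian(0.4, 0.3, 1.1, S, L, Lam, G)

norm_x = math.sqrt(sum(u * u for u in x))        # should equal r
energy = 0.5 * sum(u * u for u in X) - mu / r    # should equal -S
Go3 = x[0] * X[1] - x[1] * X[0]                  # third component of x cross X
```

In this sketch the norm of \(\mathbf {x}\) reproduces r of Eq. (73), the Keplerian energy equals \(-S\), and the third component of \(\mathbf {x}\times \mathbf {X}\) equals \(G/2\).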

5 LKS variables and orbital elements

Let us interpret the variables forming the LKS set—first the momenta and then their conjugate angles—by showing their relation to the Keplerian elements or the Delaunay variables.

5.1 LKS momenta

Comparing Eqs. (42) and (72), one immediately finds that

$$\begin{aligned} {\mathscr {J}}(\mathsf {v},\mathsf {V}) = \varGamma , \end{aligned}$$
(75)

when \(\mathbf {c}=\mathbf {e}_3\), so observing that \({\mathscr {J}}(\mathsf {v},\mathsf {V})=0\) is the fundamental assumption of the KS transformation since the time of Kustaanheimo (1964), there is no other choice than \(\varGamma =0\). Recalling the absence of its conjugate angle \(\gamma \) in the Hamiltonian, \(\varGamma =0\) is the integral of motion.

The meaning of G becomes clear once we find the pullback of the orbital angular momentum \(\mathbf {G}_\mathrm {o}\) by \(\zeta \), obtaining

$$\begin{aligned} \mathbf {G}_\mathrm {o}= & {} \mathbf {x} \times \mathbf {X} \nonumber \\= & {} \frac{1}{2} \left( C_1 \sin {2(g+\lambda )} - C_2 \sin {2(g-\lambda )} \right) \mathbf {e}_1 \nonumber \\&+ \frac{1}{2} \left( - C_1 \cos {2(g+\lambda )} + C_2 \cos {2(g-\lambda )} \right) \mathbf {e}_2 \nonumber \\&+ \frac{G}{2} \mathbf {e}_3 + \frac{\varGamma \mathbf {x}}{2 r}. \end{aligned}$$
(76)

Thus, setting \(\varGamma =0\), we find the momentum G to be twice the projection of the orbital angular momentum on the third axis (i.e. twice the Delaunay action \(H_\mathrm {o}\)). Whenever the Hamiltonian admits the rotational symmetry around \(\mathbf {e}_3\), the momentum G will be the first integral of the system.

Proceeding to the momentum L, we have to distinguish the pure Kepler problem and the perturbed one. In the former case, we can set \({\mathscr {M}}_0=0\) in Eq. (66), finding

$$\begin{aligned} L = \frac{4\mu }{\alpha \omega } = \frac{2 \mu }{\sqrt{2 S}}, \end{aligned}$$
(77)

at \(\varGamma =0\). Moreover, in the pure Kepler problem, the momentum S can be expressed in terms of the major semi-axis a as \(S=\mu /(2a)\), which justifies the direct link between the values of L and of the Delaunay action \(L_\mathrm {o}\)

$$\begin{aligned} L = 2 \sqrt{\mu a} = 2 L_\mathrm {o}. \end{aligned}$$
(78)

The two restrictive clauses of the previous sentence (‘values’ and ‘pure Kepler’) deserve comments. Equation (78) does not imply differential relations, because, for example, \(\partial \mathbf {x}/\partial L_\mathrm {o} \ne 2\partial \mathbf {x}/\partial L\) (cf. Deprit and Williams 1991). Moreover, the values of L and \(2 L_\mathrm {o}\) generally differ in a perturbed problem due to the fact that \(L_\mathrm {o}\) is always defined by \({\mathscr {H}}_0\) alone, whereas the definition of LKS momentum L depends on the complete Hamiltonian \({\mathscr {H}}_0+{\mathscr {R}}\) through the value of \(S=X^*\) (the latter fixed by the restriction to the manifold \({\mathscr {H}}=0\)).

Similar intricacies are met for the momentum \(\varLambda \), which turns out to be related to the Laplace (eccentricity) vector \(\mathbf {e}\), or rather the Laplace–Runge–Lenz vector \(\mathbf {J} = L_\mathrm {o} \mathbf {e}\), having the dimension of angular momentum. In the pure Kepler problem, substituting \(\varGamma =0\), we find

$$\begin{aligned} \mathbf {J}= & {} L_\mathrm {o} \left( \frac{\mathbf {X} \times \mathbf {G}}{\mu } - \frac{\mathbf {x}}{r}\right) \nonumber \\= & {} \frac{1}{2} \left( C_1 \sin {2(g+\lambda )} + C_2 \sin {2(g-\lambda )} \right) \mathbf {e}_1 \nonumber \\&- \frac{1}{2} \left( C_1 \cos {2(g+\lambda )} + C_2 \cos {2(g-\lambda )} \right) \mathbf {e}_2 \nonumber \\&+ \frac{\varLambda }{2} \mathbf {e}_3. \end{aligned}$$
(79)

Thus, the momentum \(\varLambda \) has been identified as twice the projection of the Laplace–Runge–Lenz vector on the third axis, yet this equality, using the property \(2 S L_\mathrm {o}^2 = \mu ^2\), holds only in the pure Kepler problem. In the perturbed case, one should refer to the general definition of \(\mathbf {e}\) in terms of the KS variables (Breiter and Langner 2017).

Closing the discussion of the momenta, let us collect the bounds on their values:

$$\begin{aligned} L > 0, \quad |\varLambda | + |G| \leqslant L, \quad \varGamma =0. \end{aligned}$$
(80)

By construction, the value of \(L=L_{12}+L_{03}\) must be nonnegative; but \(L=0\) implies the permanent location at the origin (\(\mathbf {x}=\mathbf {X}=\mathbf {0}\)), so we exclude it. The momenta \(\varLambda \) and G may be either positive or negative, but the above inequality guarantees that all coefficients in Eq. (67) are real.

5.2 LKS angles

As already mentioned, the angle \(\gamma \) is a cyclic variable, absent in the pullback of any Hamiltonian \({\mathscr {H}}\) by \(\zeta \). Actually, \(\gamma \) is the ‘KS angle’ parameterizing the fibre of KS variables \((\mathsf {v},\mathsf {V})\) mapped into the same point in the \((\mathbf {x},\mathbf {X})\) phase space. Thus, unless we are interested in some topological stability issues (Roa et al. 2016), the angle can be ignored.

The only fast angular variable is l. As expected, its values in the pure Kepler problem are equal to a half of the orbital eccentric anomaly E. Indeed,

$$\begin{aligned} \frac{\mathrm {d}l}{\mathrm {d}t} = \frac{\mathrm {d}l}{\mathrm {d}\tau } \frac{\mathrm {d}\tau }{\mathrm {d}t} = \frac{\partial {\mathscr {M}}_0}{\partial L} \frac{\alpha (S)}{4r} = \frac{\omega (S) \alpha (S)}{4 r} = \frac{\sqrt{2 S}}{2 r} = \frac{\mu }{2 L_\mathrm {o} r} = \frac{1}{2}\,\frac{\mathrm {d}E}{\mathrm {d}t}, \end{aligned}$$
(81)

and both angles are equal to 0 at the pericentre. Once again, this direct relation does not survive the addition of the perturbation. Nevertheless, it also reveals the nature of Eq. (74) as a generalized Kepler’s equation.

The two remaining angles are more unusual. A quick look at Eqs. (76) and (79) might suggest that the role of g and \(\lambda \) in \(\mathbf {G}_\mathrm {o}\) and \(\mathbf {J}\) is similar. But if the norms of the vectors are evaluated, one finds

$$\begin{aligned} G_\mathrm {o}= & {} \frac{1}{2} \sqrt{ G^2 + C_1^2+C_2^2 - 2 C_1 C_2 \cos {4 \lambda } } = L_\mathrm {o} \sqrt{1-e^2}, \end{aligned}$$
(82)
$$\begin{aligned} J= & {} \frac{1}{2} \sqrt{ \varLambda ^2 + C_1^2+C_2^2 + 2 C_1 C_2 \cos {4 \lambda } } = L_\mathrm {o} e. \end{aligned}$$
(83)

The absence of g proves it to be some rotation angle; the presence of \(\lambda \) means that this angle plays a different role (and is somehow related to the eccentricity e).
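The dependence of the two norms on \(\lambda \) can be explored with a few lines of code. The sketch below evaluates Eqs. (82) and (83) with \(\varGamma =0\) and \(L_\mathrm {o}=L/2\) (pure Kepler problem); the function name `norms` and the sample values are assumptions made for illustration.

```python
import math

def norms(L, Lam, G, lam):
    """G_o and J from Eqs. (82)-(83), with Gamma = 0 in Eq. (67)."""
    C1 = 0.5 * math.sqrt(max(L * L - (G + Lam) ** 2, 0.0))
    C2 = 0.5 * math.sqrt(max(L * L - (G - Lam) ** 2, 0.0))
    cross = 2.0 * C1 * C2 * math.cos(4.0 * lam)
    Go = 0.5 * math.sqrt(max(G * G + C1 * C1 + C2 * C2 - cross, 0.0))
    J = 0.5 * math.sqrt(max(Lam * Lam + C1 * C1 + C2 * C2 + cross, 0.0))
    return Go, J

L = 2.0
Go, J = norms(L, 0.5, -0.7, 0.3)      # generic sample point
e = 2.0 * J / L                       # e = J / L_o with L_o = L / 2

# Lambda = 0 with lambda = pi/4, and G = 0 with lambda = 0
Go_a, J_a = norms(L, 0.0, 0.8, math.pi / 4)
Go_b, J_b = norms(L, 0.6, 0.0, 0.0)
```

The sum \(G_\mathrm {o}^2+J^2\) equals \(L^2/4 = L_\mathrm {o}^2\) for any \(\lambda \), so only the split between the two norms (hence e) depends on this angle. The two special inputs give \(J=0\) and \(G_\mathrm {o}=0\), respectively.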

More light is shed on this problem if we introduce the vectors

$$\begin{aligned} \mathbf {M}= & {} \frac{\mathbf {J}+\mathbf {G}_\mathrm {o}}{2} = \frac{C_1}{2} \sin {2(g+\lambda )} \mathbf {e}_1 - \frac{C_1}{2} \cos {2(g+\lambda )} \mathbf {e}_2 + \frac{\varLambda +G}{4}\,\mathbf {e}_3, \end{aligned}$$
(84)
$$\begin{aligned} \mathbf {N}= & {} \frac{\mathbf {J}-\mathbf {G}_\mathrm {o}}{2} = \frac{C_2}{2} \sin {2(g-\lambda )} \mathbf {e}_1 - \frac{C_2}{2} \cos {2(g-\lambda )} \mathbf {e}_2 + \frac{\varLambda -G}{4}\, \mathbf {e}_3. \end{aligned}$$
(85)

These are essentially the so-called Cartan or Pauli vectors (Cordani 2003), except that we use the sign of \(\mathbf {N}\) opposite to the usual convention. Both vectors have the same norm \(M=N= L/4 = L_\mathrm {o}/2\) and lie either in a plane perpendicular to the orbital plane, or along a degenerate radial orbit direction. The angle \(\theta \) they form depends on the eccentricity alone, because

$$\begin{aligned} \cos {\theta } = \frac{\mathbf {M} \cdot \mathbf {N}}{M N} = \frac{J^2-G^2_\mathrm {o}}{L_\mathrm {o}^2} = 2 e^2 - 1, \quad \sin {\theta } = 2 e \sqrt{1-e^2}. \end{aligned}$$
(86)

Obviously, \(\theta \) is the upper bound for the angle \(\theta '\) between the projections of the Cartan vectors on the coordinate plane \((x_1,x_2)\)

$$\begin{aligned} \mathbf {M}' = \mathbf {M} - \frac{\varLambda +G}{4} \mathbf {e}_3, \quad \mathbf {N}' = \mathbf {N} - \frac{\varLambda -G}{4} \mathbf {e}_3. \end{aligned}$$
(87)

Using Eqs. (84), (85), and (87), one finds

$$\begin{aligned} \cos {\theta '} = \frac{\mathbf {M}' \cdot \mathbf {N}'}{M' N'} = \cos {4 \lambda }. \end{aligned}$$
(88)

Let us make \(\theta '\) an oriented angle by postulating that it is measured from \(\mathbf {N}'\) to \(\mathbf {M}'\), counterclockwise. Then its sine is given by

$$\begin{aligned} \sin {\theta '} = \frac{(\mathbf {N}' \times \mathbf {M}') \cdot \mathbf {e}_3 }{N' M'} = \sin {4 \lambda }. \end{aligned}$$
(89)

Thus we have identified the angle \(\lambda \) as the quarter of the angle between the projections of the Cartan vectors on the reference plane \((x_1,x_2)\), measured from \(\mathbf {N}'\) to \(\mathbf {M}'\). Finding \(\theta '\) from the eccentricity-dependent \(\theta \) involves orbital inclination and the argument of pericentre, which means that \(\lambda \) is a function of e, I, and \(\omega _\mathrm {o}\).
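The identification of \(\lambda \) can be verified numerically from Eqs. (84) and (85). In the sketch below, the function name `cartan_vectors` and the sample values are assumptions made for illustration; \(\varGamma =0\) is imposed, and both \(C_1\) and \(C_2\) are assumed nonzero, so that the projections \(\mathbf {M}'\) and \(\mathbf {N}'\) do not vanish.

```python
import math

def cartan_vectors(L, Lam, G, g, lam):
    """M and N from Eqs. (84)-(85), assuming Gamma = 0."""
    C1 = 0.5 * math.sqrt(L * L - (G + Lam) ** 2)
    C2 = 0.5 * math.sqrt(L * L - (G - Lam) ** 2)
    M = (0.5 * C1 * math.sin(2 * (g + lam)),
         -0.5 * C1 * math.cos(2 * (g + lam)),
         0.25 * (Lam + G))
    N = (0.5 * C2 * math.sin(2 * (g - lam)),
         -0.5 * C2 * math.cos(2 * (g - lam)),
         0.25 * (Lam - G))
    return M, N

L, Lam, G, g, lam = 2.0, 0.5, -0.7, 1.1, 0.3
M, N = cartan_vectors(L, Lam, G, g, lam)

norm_M = math.sqrt(sum(u * u for u in M))
norm_N = math.sqrt(sum(u * u for u in N))

# Oriented angle from N' to M' in the (x1, x2) plane, Eqs. (88)-(89)
theta_p = math.atan2(N[0] * M[1] - N[1] * M[0], N[0] * M[0] + N[1] * M[1])
```

For the sample values, `theta_p` equals \(4\lambda \) (here within the principal range of `atan2`), and both norms equal \(L/4\).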

Interestingly, whenever the argument of pericentre \(\omega _\mathrm {o}\) exists, the statement \(\sin {4\lambda }=0\) means \(\cos {\omega _\mathrm {o}}=0\). Thus, any \(\lambda = k\pi /4\) refers to \(\omega _\mathrm {o}=\pi /2\) or \(\omega _\mathrm {o}=3\pi /2\).

Once we have interpreted \(\lambda \), the meaning of g comes out of Eqs. (84) and (85): let us create the sum of normalized vectors \(\mathbf {M}'/\Vert \mathbf {M}'\Vert +\mathbf {N}'/\Vert \mathbf {N}'\Vert \) and let us rotate the resulting vector by \(\pi /2\) counterclockwise, obtaining

$$\begin{aligned} \mathbf {M}_\mathrm {m} = 2 \cos {2\lambda } \left( \cos {2g} \mathbf {e}_1 + \sin {2g} \mathbf {e}_2 \right) . \end{aligned}$$
(90)

This formula suggests that g is a half of the longitude of \(\mathbf {M}_\mathrm {m}\), or of \(-\mathbf {M}_\mathrm {m}\), depending on the sign of \(\cos {2\lambda }\). Whichever the case, changing the value of g we perform a simultaneous rotation of both \(\mathbf {N}'\) and \(\mathbf {M}'\) by the same angle. Indirectly, it means the rotation of the orbital plane (if it exists) around the third axis, which makes g a relative of the ascending node longitude.
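The step leading from Eqs. (84) and (85) to Eq. (90) can be spelled out, assuming \(C_1 \ne 0\) and \(C_2 \ne 0\) so that both projections can be normalized:

$$\begin{aligned} \frac{\mathbf {M}'}{\Vert \mathbf {M}'\Vert }+\frac{\mathbf {N}'}{\Vert \mathbf {N}'\Vert }= & {} \left[ \sin {2(g+\lambda )}+\sin {2(g-\lambda )}\right] \mathbf {e}_1 - \left[ \cos {2(g+\lambda )}+\cos {2(g-\lambda )}\right] \mathbf {e}_2 \nonumber \\= & {} 2 \cos {2\lambda } \left( \sin {2g}\, \mathbf {e}_1 - \cos {2g}\, \mathbf {e}_2 \right) . \end{aligned}$$

A counterclockwise rotation by \(\pi /2\) maps \(a\,\mathbf {e}_1 + b\,\mathbf {e}_2\) into \(-b\,\mathbf {e}_1 + a\,\mathbf {e}_2\), which turns the last expression into the right-hand side of Eq. (90).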

5.3 Special orbit types

Let us inspect how some specific orbit types are mapped onto the LKS variables. The discussion is restricted to the elliptic orbits (\(0 \leqslant e \leqslant 1\)) in the pure Kepler problem (Table 1).

5.3.1 Circular orbits

Circular orbits, having \(e=0\), are characterized by \(\mathbf {J}=\mathbf {0}\), and hence all must possess \(\varLambda =0\), since \(\varLambda = 2\, \mathbf {J}\cdot \mathbf {e}_3\). Then, the norm of the Laplace–Runge–Lenz vector (83) simplifies, thanks to \(C_1=C_2=\sqrt{L^2-G^2}/2\), and equating its square to 0, we find the condition

$$\begin{aligned} 4 J^2 = \left( L^2-G^2\right) \left( \cos {2\lambda }\right) ^2 = 0. \end{aligned}$$
(91)

Setting \(\lambda = (2k+1) \pi /4\), \(k \in {\mathbb {Z}}\), leads to generic circular orbits with the inclination \(I = \arccos {(G/L)}\), including circular polar orbits when \(G=0\). However, if \(|G|=L\), then the first factor is null regardless of \(\lambda \). This is the case of circular orbits in the ‘equatorial plane’ \((x_1,x_2)\): prograde for \(G=L\), or retrograde for \(G=-L\).

Table 1 Particular orbits and their relation to the LKS variables

The values of \(\lambda \) mentioned above agree well with the interpretation from Sect. 5.2. In circular orbits, the Cartan vectors \(\mathbf {N}\) and \(\mathbf {M}\) are collinear and opposite; thus, the angle \(\theta =\pi \), and its projection \(\theta '\) remains \(\pm \pi \) as long as the orbit is not equatorial. Thus \(\lambda = \theta '/4 = \pm \pi /4\), plus any multiple of \((2\pi )/4\).

Another explanation of the LKS variables for \(e=0\) can be given by inspecting the Lissajous ellipses in Fig. 2. The orbital distance r is the sum of \(\rho ^2_{03} =v_0^2+v_3^2\) and \(\rho ^2_{12} = v_1^2+v_2^2\), both divided by \(\alpha \). In order to secure a constant \(r= (\rho ^2_{12}+\rho ^2_{03})/\alpha \), it is not necessary that both \(\rho _{ij}\) are constant; enough if they oscillate with the same amplitude and a phase shift of \(\pm \pi /2\). Equal amplitudes result from \(L_{12}=L_{03}\) (because \(G_{12}=G_{03}\) by \(\varGamma =0\)), hence \(\varLambda = L_{12}-L_{03}=0\). The phase shift condition is given by \(l_{12}-l_{03} = 2 \lambda = (2k+1) \pi /2\), which means the values of \(\lambda \) as above.
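The constancy of r under these conditions can be confirmed from Eq. (73). The sketch below samples l over several periods with \(\varLambda =0\), \(\varGamma =0\), and \(\lambda =\pi /4\); the function name `radius`, the units with \(\mu =1\), and the sample values are assumptions made for illustration, with S and L taken from the pure Kepler problem.

```python
import math

def radius(l, lam, S, L, Lam, G):
    """Orbital distance r of Eq. (73), with Gamma = 0 in Eq. (67)."""
    B1 = 0.5 * math.sqrt(max((L + Lam) ** 2 - G * G, 0.0))
    B2 = 0.5 * math.sqrt(max((L - Lam) ** 2 - G * G, 0.0))
    return (L - B1 * math.cos(2 * (l + lam))
              - B2 * math.cos(2 * (l - lam))) / math.sqrt(8.0 * S)

mu, a = 1.0, 10.0                 # illustrative units and semi-axis
S = mu / (2.0 * a)                # Keplerian value of S
L = 2.0 * math.sqrt(mu * a)       # Eq. (78)

# Lambda = 0 and lambda = pi/4: equal amplitudes, phase shift pi/2
radii = [radius(0.1 * k, math.pi / 4, S, L, 0.0, 0.4 * L) for k in range(63)]
spread = max(radii) - min(radii)
```

The two oscillating terms cancel for every l, so `spread` vanishes up to rounding and every sampled value equals \(L/\sqrt{8S} = a\), although neither \(\rho _{12}\) nor \(\rho _{03}\) is constant for \(|G|<L\).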

The case of constant \(\rho _{ij}\), mentioned above, should be related to some special kind of a circular orbit. Indeed, since it needs \(L_{12}=|G|/2=L_{03}\), i.e. two circles of equal radii in Fig. 2, we obtain the circular equatorial orbits with \(\varLambda =0\) and \(|G|=L\) (prograde or retrograde, depending on the sign of G). Observe that due to the lack of distinct semi-axes in the two circles, the angles \(l_{ij}\) and \(g_{ij}\) are undefined, and so are l, g, \(\gamma \), and \(\lambda \). But still one can use properly defined ‘longitudes’ \(l+g\) or \(l-g\) in the prograde and retrograde cases, respectively, at least until some ‘virtual singularities’ appear (Henrard 1974). In terms of the Cartan vectors, \(\mathbf {M}= -\mathbf {N} = \mathbf {G}_\mathrm {o}/2\), so \(\mathbf {M}'=\mathbf {N}'=\mathbf {0}\), making the angles g and \(\lambda \) undetermined.

5.3.2 Radial orbits

Rectilinear (radial) orbits require \(\mathbf {G}_\mathrm {o}=\mathbf {0}\), hence \(G=0\) and \(\mathbf {G}_\mathrm {o} \cdot \mathbf {G}_\mathrm {o} = 0\). According to Eq. (67), \(G=0\) means \(C_1=C_2= \sqrt{L^2-\varLambda ^2}/2\), wherefrom Eq. (82) implies

$$\begin{aligned} 4 G_\mathrm {o}^2 = \left( L^2-\varLambda ^2\right) \, \left( \sin {2\lambda }\right) ^2 = 0. \end{aligned}$$
(92)

Regardless of \(\lambda \), it is satisfied by \(|\varLambda |=L\), i.e. by polar radial orbits with \(\mathbf {J} = (\varLambda /2)\,\mathbf {e}_3\). For all other directions of the Laplace–Runge–Lenz vector, radial orbits need \(\lambda = k \pi /2\), where \(k \in {\mathbb {Z}}\); this time \(\varLambda \) can be arbitrary, with \(\varLambda =0\) indicating an equatorial radial orbit. In terms of the Cartan vectors, \(\mathbf {G}_\mathrm {o}=0\) means \(\mathbf {N}=\mathbf {M}\), so their angle \(\theta =0\) is projected as \(\theta '=0+2 k\pi \), which (divided by 4) gives the above values of \(\lambda \).

In terms of the Lissajous ellipses in \((v_1,v_2)\) and \((v_0, v_3)\) planes from Fig. 2, \(G=0\) means that both degenerate into straight segments. The motion along the segments must obey \(l_{12}=l_{03}+k\pi \), to guarantee that \(v_0=v_1=v_2=v_3=0\) at the same epoch. The direction of \(\mathbf {x}(\mathsf {v})\) is determined by the difference of lengths of the two segments: equatorial orbits result if the segments have the same length, whereas polar orbits require that one of the segments collapses into a point. In the latter case, l and \(\lambda \) are undetermined, but \(l+\lambda =l_{12}\) or \(l-\lambda =l_{03}\) retain a well-defined meaning for an appropriate sign of \(\varLambda \). Problems with the definition of \(g_{ij}\) due to the vanishing minor axes are only apparent, because they can be solved by an alternative definition: instead of ‘position angle of the minor semi-axis’, one can equally well say ‘position angle of the major semi-axis minus \(\pi /2\)’.

5.3.3 Equatorial orbits

Since the Laplace–Runge–Lenz vector lies in the plane \((x_1,x_2)\) for the equatorial orbits, they all must have \(\varLambda =0\), whence \(C_1=C_2=\sqrt{L^2-G^2}/2\). Moreover, \(G_\mathrm {o}^2=(G/2)^2\), which leads to the condition

$$\begin{aligned} 4 G_\mathrm {o}^2-G^2 = \left( L^2-G^2\right) \, \left( \sin {2\lambda }\right) ^2 = 0. \end{aligned}$$
(93)

The case of \(|G|=L\) brings us back to the circular equatorial orbits, already discussed. Other values of G require \(\lambda = k \pi /2\), where \(k \in {\mathbb {Z}}\). These are the same values as in the case of radial orbits, which makes sense, because \(G=0\) should bring us to the radial equatorial orbit.

For an elliptic (\(e\ne 0\)) equatorial orbit, the Cartan vectors \(\mathbf {N}\) and \(\mathbf {M}\) may form different angles \(\theta \), but since they lie in a polar plane, the projection of this angle is always \(\theta '=0\), exactly as in the radial orbit case; thus the same values of \(\lambda \) result.

The two Lissajous ellipses in Fig. 2 must have the same semi-axes, and \(l_{12}=l_{03}+k\pi \). This is necessary to obtain \(v_1^2+v_2^2=v_0^2+v_3^2\), which guarantees \(x_3=0\) for all epochs, according to Eq. (9) in the KS3 setup.

5.3.4 Polar orbits

Polar orbits are generically indicated by the simple condition \(G=0\). The angle \(\lambda \) comes into play only in two special cases: circular polar orbits (\(\varLambda =0\)) need \(\lambda =(2k+1)\pi /4\), whereas radial polar orbits \((|\varLambda |=L)\) are the ones where \(\lambda \) is undetermined. Since \(G=0\), both Lissajous ellipses degenerate into segments, but their lengths may be different, and the phase shift arbitrary.

5.3.5 Singularities

Inspecting specific types of orbits, we met situations where \(\lambda \) and g become undetermined: circular equatorial orbits with \(|G|=L, \varLambda =0\) and rectilinear polar orbits with \(|\varLambda |=L, G=0\). These four points are the vertices of the square on the \((G,\varLambda )\) plane defined by the constraint \(|G|+|\varLambda | \leqslant L\). However, all four edges of the square leave the angles undetermined. This is related to the fact that:

  (a) \(L=G+\varLambda \) (upper right edge in Fig. 4) means \(L_{03}=G_{03}\), i.e. prograde circular motion on \((v_0,v_3)\) plane with undetermined \(l_{03}\) and \(g_{03}\) (but \(l_{03}+g_{03}\) is well defined),

  (b) \(L=-G+\varLambda \) (upper left edge in Fig. 4) means \(L_{03}=-G_{03}\), i.e. retrograde circular motion on \((v_0,v_3)\) plane with undetermined \(l_{03}\) and \(g_{03}\) (but \(l_{03}-g_{03}\) is well defined),

  (c) \(L=G-\varLambda \) (lower right edge in Fig. 4) means \(L_{12}=G_{12}\), i.e. prograde circular motion on \((v_1,v_2)\) plane with undetermined \(l_{12}\) and \(g_{12}\) (but \(l_{12}+g_{12}\) is well defined),

  (d) \(L=-G-\varLambda \) (lower left edge in Fig. 4) means \(L_{12}=-G_{12}\), i.e. retrograde circular motion on \((v_1,v_2)\) plane with undetermined \(l_{12}\) and \(g_{12}\) (but \(l_{12}-g_{12}\) is well defined).

The Keplerian orbits obtained by mapping the edges of the \((G,\varLambda )\) square onto \(\mathbf {J}\) and \(\mathbf {G}_0\) or \(\mathbf {x}\) and \(\mathbf {X}\) have \(e=\sin {I}\), and \(\sin \omega _\mathrm {o}=\pm 1\). Thus the vertices in Fig. 4 are \(\sin {I}=e=0\) (left and right) and \(\sin {I}=e=1\) (top and bottom). Along the edges, half of the coefficients (67) vanish, and one of the vanishing coefficients is always \(C_1\) or \(C_2\), which implies that either \(\mathbf {M}'\) or \(\mathbf {N}'\) is a null vector, so the angles \(\lambda \) and g become undefined.

6 Application to the Lidov–Kozai problem

6.1 Derivation of the secular model

In order to test the LKS variables in a nontrivial astronomical problem, let us revisit the Lidov–Kozai resonance arising in artificial satellite theory (Lidov 1962) or asteroid dynamics (Kozai 1962). In this already classical problem, the Keplerian motion of a small body (a satellite or an asteroid) around a central mass with the gravitational parameter \(\mu \) (a planet or the Sun) is influenced by a distant perturber with the gravitational parameter \(\mu _\mathrm {p}\) (the Sun or a planet, respectively). The origin of the reference frame is attached to the central mass, the plane \((x_1,x_2)\) coincides with the orbital plane of the perturber, and the third axis basis vector \(\mathbf {e}_3\) is directed along the angular momentum of the perturber. Further, let us assume that the perturber moves on a circular orbit with the mean motion \(n_\mathrm {p}\), so its position vector is

$$\begin{aligned} \mathbf {r}_\mathrm {p} = a_\mathrm {p} \cos n_\mathrm {p}t \,\mathbf {e}_1 + a_\mathrm {p} \sin n_\mathrm {p}t \mathbf {e}_2. \end{aligned}$$
(94)

Compared to the small body, whose position vector is \(\mathbf {x}\), the perturber is distant, i.e. \(||\mathbf {x}||/||\mathbf {r}_\mathrm {p}|| = r/a_\mathrm {p}\) is small enough to approximate the perturbing function by the second degree Legendre polynomial term. Thus we obtain the problem with the Hamiltonian \({\mathscr {H}}\) from Eq.  (36) with the perturbation

$$\begin{aligned} {\mathscr {R}} = - \frac{\mu _\mathrm {p} r^2}{a_\mathrm {p}^3}\,P_2(\mathbf {x}\cdot \mathbf {r}_\mathrm {p}/(r a_\mathrm {p})). \end{aligned}$$
(95)

The perturbation is time dependent, so—after substituting (94)—we replace t by its formal twin \(x^*\), obtaining

$$\begin{aligned} {\mathscr {R}} = - \frac{\mu _\mathrm {p}}{4 a_\mathrm {p}^3} \left[ r^2-3 x_3^2+3 (x_1^2-x_2^2) \cos {2 n_\mathrm {p} x^*} + 6 x_1 x_2 \sin {2 n_\mathrm {p} x^*} \right] . \end{aligned}$$
(96)

Choosing \(\omega =1\), and \(\alpha = \sqrt{8 X^*} = \sqrt{8 S}\), we apply the LKS transformation, setting \(\varGamma =0\), because we are not interested in the evolution of the KS angle \(\gamma \). The resulting Hamiltonian (65) is

$$\begin{aligned} {\mathscr {M}}= {\mathscr {M}}_0 + {\mathscr {Q}} = 0, \end{aligned}$$
(97)

with the Keplerian part

$$\begin{aligned} {\mathscr {M}}_0 = L - \frac{2 \mu }{\sqrt{2 S}}. \end{aligned}$$
(98)

For a while, the perturbation \({\mathscr {Q}}\) will be given in an intermediate form without the explicit substitution of the LKS variables into \(\mathbf {x}\) and r, which leads to a relatively concise form

$$\begin{aligned} {\mathscr {Q}} = - \frac{\mu _\mathrm {p} r}{32 a_\mathrm {p}^3 \sqrt{2 S}} \left[ r^2-3 x_3^2+3 (x_1^2-x_2^2) \cos {\left( 2 n_\mathrm {p} s - \sigma \right) } + 6 x_1 x_2 \sin {\left( 2 n_\mathrm {p} s - \sigma \right) } \right] , \end{aligned}$$
(99)

where, according to Eq. (74),

$$\begin{aligned} \sigma = \frac{n_\mathrm {p}}{2 S} \left( B_1 \sin {2(l+\lambda )} + B_2 \sin {2(l-\lambda )} \right) . \end{aligned}$$
(100)

By the choice of \(\alpha \), the Sundman time \(\tau \) is dimensionless and the unperturbed motion gives

$$\begin{aligned} \frac{\mathrm {d}l}{\mathrm {d}\tau }= & {} \frac{\partial {\mathscr {M}}_0}{\partial L} = 1, \end{aligned}$$
(101)
$$\begin{aligned} \frac{\mathrm {d}s}{\mathrm {d}\tau }= & {} \frac{\partial {\mathscr {M}}_0}{\partial S} = \frac{\mu }{\sqrt{2 S^3}}, \end{aligned}$$
(102)

with all the remaining variables constant. Solving (101) we find \(l=\tau +l_0\). The value of S is set to give \({\mathscr {M}}=0\), but ignoring the contribution of \({\mathscr {Q}}\) we may estimate that \(s \approx \tau /n\), where n is the Keplerian mean motion.

According to the standard Lie transform method (e.g. Ferraz-Mello 2007), the mean variables can be introduced by a near-identity canonical transformation that converts \({\mathscr {M}}\) into \({\mathscr {N}}={\mathscr {N}}_0+{\mathscr {Q}}'\), with \({\mathscr {N}}_0 = {\mathscr {M}}_0\) and \({\mathscr {Q}}'\) being constant along the phase trajectory generated by \({\mathscr {N}}_0\). Up to the first order, the new perturbation \({\mathscr {Q}}'\) is simply the average of \({\mathscr {Q}}\) with respect to \(\tau \), assuming \(l=\tau +l_0\) and \(s=\tau /n\).

Since the perturber has been assumed distant, its mean motion \(n_\mathrm {p}\) is small compared to n, and the ratio of the two frequencies can be treated as irrational; even if it is not, the resonance will occur in high-degree harmonics with practically negligible amplitudes. In these circumstances, any product of a sine or cosine of \(2n_\mathrm {p} s = 2 (n_\mathrm {p}/n) \tau \) with a function which is either constant or \(2\pi \)-periodic in \(\tau \) has zero average (footnote 3). Then \({\mathscr {Q}}'\) simplifies to

$$\begin{aligned} {\mathscr {Q}}' = - \frac{\mu _\mathrm {p} }{64 \pi a_\mathrm {p}^3 \sqrt{2 S}}\int _0^{2\pi } \left[ r^3- 3 r x_3^2 \right] \mathrm {d}l. \end{aligned}$$
(103)
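The zero-average argument used to reach (103) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the frequency ratio `RATIO` (standing for \(n_\mathrm {p}/n\)), the parameter `E`, and the test function `periodic` are hypothetical choices, not quantities taken from the model. It compares the long-time average of a \(2\pi \)-periodic function with the average of the same function multiplied by \(\cos (2 (n_\mathrm {p}/n) \tau )\).

```python
import math

def time_average(func, tau_max, step=0.05):
    """Midpoint-rule estimate of (1/tau_max) * integral of func over [0, tau_max]."""
    n = int(tau_max / step)
    h = tau_max / n
    total = 0.0
    for i in range(n):
        total += func((i + 0.5) * h)
    return total / n

E = 0.5                        # hypothetical eccentricity-like parameter
RATIO = math.sqrt(2.0) / 20.0  # hypothetical irrational ratio n_p / n

def periodic(tau):
    """A 2*pi-periodic test function of tau (an assumption, not the actual r^3 - 3 r x_3^2)."""
    return (1.0 - E * math.cos(tau)) ** 3

def mixed(tau):
    """The same function multiplied by the slow harmonic cos(2 (n_p/n) tau)."""
    return math.cos(2.0 * RATIO * tau) * periodic(tau)

if __name__ == "__main__":
    T = 2.0 * math.pi * 2000.0          # 2000 periods of the fast angle
    print(time_average(periodic, T))    # stays close to 1 + 3 E^2 / 2
    print(time_average(mixed, T))       # decays towards zero as T grows
```

The average of the mixed term decays roughly like \(1/T\), which is why only the purely \(2\pi \)-periodic part survives in the first-order secular perturbation.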

Thus we obtain the first order approximation of the secular system

$$\begin{aligned} {\mathscr {N}}= & {} L - \frac{2 \mu }{\sqrt{2 S}} - \frac{\mu _\mathrm {p} L}{1024 a_\mathrm {p}^3 S^2} \left( L^2 -6 \varLambda ^2 + 6 C_1 C_2 \cos {4\lambda } \right) = 0, \end{aligned}$$
(104)
$$\begin{aligned} C_1 C_2= & {} \frac{1}{4} \sqrt{\left( L^2-(G-\varLambda )^2\right) \,\left( L^2-(G+\varLambda )^2\right) }, \end{aligned}$$
(105)

where the mean variables should be given different symbols, but we adhere to a widespread habit of distinguishing the mean and the osculating variables by context. The following study of motion generated by \({\mathscr {N}}\) will refer only to the mean variables, so no confusion should occur.

Both \({\mathscr {N}}\) and the classical secular Hamiltonian of the Lidov–Kozai problem share the same property: they are reduced to one degree of freedom. In our case, it is the canonically conjugate pair \((\lambda ,\varLambda )\) instead of the usual Delaunay pair of the argument of pericentre and the angular momentum norm. All other momenta are constant and will be treated as parameters. However, there is a fundamental difference between our formulation and the classical approach: the equations of motion for \(\lambda \) and \(\varLambda \) are not singular for most of the radial orbits.

6.2 Secular motion and equilibria

Let us set

$$\begin{aligned} B = \frac{3 \mu _\mathrm {p} L}{1024 a_\mathrm {p}^3 S^2}. \end{aligned}$$
(106)

The equations of motion derived from (104) are

$$\begin{aligned} \frac{\mathrm {d}\lambda }{\mathrm {d}\tau }= & {} \frac{\partial {\mathscr {N}}}{\partial \varLambda } = B \,\varLambda \left( 4 + \frac{L^2+G^2-\varLambda ^2}{4 C_1 C_2} \cos {4 \lambda } \right) , \end{aligned}$$
(107)
$$\begin{aligned} \frac{\mathrm {d}\varLambda }{\mathrm {d}\tau }= & {} -\frac{\partial {\mathscr {N}}}{\partial \lambda } = - 8 B C_1 C_2 \sin {4 \lambda }. \end{aligned}$$
(108)
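Equations (107) and (108) can be cross-checked against the Hamiltonian (104) by finite differences. The following is a minimal sketch, assuming that B is treated as a constant parameter (it depends only on L and S) and using hypothetical function names and sample values; only the part of \({\mathscr {N}}\) that depends on \(\lambda \) and \(\varLambda \) is coded.

```python
import math

def c1c2(L, G, Lam):
    """Product C1*C2 from Eq. (105)."""
    return 0.25 * math.sqrt((L**2 - (G - Lam)**2) * (L**2 - (G + Lam)**2))

def secular_part(lam, Lam, L, G, B):
    """(lambda, Lambda)-dependent part of N in Eq. (104), written with B of Eq. (106)."""
    return -(B / 3.0) * (L**2 - 6.0 * Lam**2
                         + 6.0 * c1c2(L, G, Lam) * math.cos(4.0 * lam))

def rhs(lam, Lam, L, G, B):
    """Right-hand sides of Eqs. (107) and (108)."""
    cc = c1c2(L, G, Lam)
    dlam = B * Lam * (4.0 + (L**2 + G**2 - Lam**2) / (4.0 * cc) * math.cos(4.0 * lam))
    dLam = -8.0 * B * cc * math.sin(4.0 * lam)
    return dlam, dLam

def rhs_fd(lam, Lam, L, G, B, h=1e-6):
    """Same right-hand sides from central differences of the secular part."""
    dN_dLam = (secular_part(lam, Lam + h, L, G, B)
               - secular_part(lam, Lam - h, L, G, B)) / (2.0 * h)
    dN_dlam = (secular_part(lam + h, Lam, L, G, B)
               - secular_part(lam - h, Lam, L, G, B)) / (2.0 * h)
    return dN_dLam, -dN_dlam

if __name__ == "__main__":
    # Hypothetical sample point inside the square |G| + |Lambda| <= L.
    L, G, B = 1.0, 0.75, 1.0
    print(rhs(0.3, 0.1, L, G, B))
    print(rhs_fd(0.3, 0.1, L, G, B))
```

The two printed pairs should agree to within the finite-difference error; the functions `rhs` and `c1c2` can also be fed to any standard ODE integrator to reproduce integral curves of the kind discussed next.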

Integral curves of this system are plotted in Fig. 3 for three values of G: 0.9L, 0.75L and 0. The phase plane has been clipped to \(-\pi /2 \leqslant \lambda \leqslant \pi /2\), because the remaining range of \(\lambda \) is a simple duplication of the plotted phase portrait.

Fig. 3 Integral curves of the regularized Lidov–Kozai problem on the \((\lambda ,\varLambda )\) phase plane. Top: \(G = 0.9\,L\), middle: \(G=0.75\,L\), bottom: \(G=0\)

Referring to Table 1, one can check that a generic radial orbit does not introduce a singularity into Eq. (107). Indeed, \(G=0\) and \(\lambda = k \pi /2\) result in a well-defined

$$\begin{aligned} \frac{\mathrm {d}\lambda }{\mathrm {d}\tau } = 5 B \varLambda , \quad \frac{\mathrm {d}\varLambda }{\mathrm {d}\tau } = 0. \end{aligned}$$
(109)
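The step from (107) and (108) to (109) is short: for \(G=0\) and \(|\varLambda | \ne L\), Eq. (105) gives \(C_1 C_2 = (L^2-\varLambda ^2)/4\), so that

$$\begin{aligned} \left. \frac{L^2+G^2-\varLambda ^2}{4 C_1 C_2} \right| _{G=0} = \frac{L^2-\varLambda ^2}{L^2-\varLambda ^2} = 1, \qquad \cos {4\lambda } = 1, \qquad \sin {4 \lambda } = 0 \quad \text{ at } \lambda = k\pi /2, \end{aligned}$$

and the bracket in (107) reduces to \(4+1=5\), while the right-hand side of (108) vanishes. Hence (109) holds for every radial orbit with \(|\varLambda | \ne L\).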

It means that a radial orbit is not an equilibrium, unless \(\varLambda =0\), which is exactly the case of a radial orbit in the equatorial plane. Observing that for \(G=0\) the points \((\lambda = k\pi /2,\varLambda =0)\) are well-defined local minima of \({\mathscr {N}}\), we are able to state that radial orbits in the orbital plane of the perturber are stable equilibria (footnote 4). The bottom panel of Fig. 3 confirms this observation: the points (0, 0), \((90^\circ ,0)\), and \((-\,90^\circ ,0)\) are surrounded by closed, oval-shaped contours. Intersection of any of the integral curves plotted in the bottom panel with the vertical lines \(\lambda = 0\) or \(\lambda =\pm \,90^\circ \) marks a temporary passage through the radial orbit degeneracy.

As far as the polar radial orbits (with \(|\varLambda |=L\)) are concerned, Eq. (107) becomes singular, but this singularity is purely geometrical. Such orbits should be located at the upper and lower edges of the bottom panel in Fig. 3, where \(\lambda \) is undetermined. But since the integral curves approaching the edges become parallel to them, one should expect that polar radial orbits are stable equilibria (which is actually the case, as can be shown by performing the analysis in terms of the vectors \(\mathbf {G}_\mathrm {o}\) and \(\mathbf {J}\), or simply by observing that for \(G=0\) the Hamiltonian \({\mathscr {N}}\) has local maxima at \(\varLambda = \pm L\) regardless of the value of \(\lambda \)).

Actually, the presence of \(\varLambda \) as a factor in Eq. (107) means that for any value of \(|G| \ne L\), the equilibria exist at \((\lambda = j\,\pi /4,\varLambda =0)\), as shown in Fig. 3. For even \(j=2k\), the equilibria refer to equatorial orbits with the eccentricity depending on G through \(e=\sqrt{1-(G/L)^2}\). It is easy to check that they are local minima of the Hamiltonian \({\mathscr {N}}\); hence, the equatorial orbits are stable. The circular equatorial case with \(|G|=L\) is problematic, because then the upper and lower limits of \(\varLambda \) merge, and in order to prove that these are actually stable equilibria, one has to resort to the analysis of the \(\mathbf {G}_\mathrm {o}\) and \(\mathbf {J}\) vectors.

For odd \(j=(2k+1)\), the equilibria are circular orbits with inclinations depending on G (polar if \(G=0\), prograde for \(G>0\) and retrograde when \(G<0\)). Their stability depends on the ratio \(G/L\). Unlike in the Delaunay chart, variational equations can be formulated directly in the phase plane of \((\lambda ,\varLambda )\), leading to eigenvalues that are purely imaginary for \((G/L)^2 > 3/5\). Thus circular orbits are stable for inclinations below \(I_1=\arccos {\sqrt{3/5}} \approx 39^\circ \!.23\) and above \(I_2=\arccos {\left( -\sqrt{3/5}\right) } \approx 140^\circ \!.77\). At these critical values, a bifurcation occurs: when \((G/L)^2 < 3/5\) circular orbits become unstable and two stable equilibria are created at \((\lambda = (2k+1)\,\pi /4, \varLambda = \pm \varLambda _\mathrm {c})\) (see the middle panel of Fig. 3). Recall that, in the general case of inclined, elliptic orbits, this value of \(\lambda \) means the argument of pericentre equal to \(\pi /2\) or \(3\pi /2\). The value of \(\varLambda _\mathrm {c}\) is the root of Eq. (107) with \(\varLambda \ne 0\) and \(\cos {4\lambda }= -1\), i.e.

$$\begin{aligned} 4 - \frac{L^2+G^2-\varLambda ^2}{4 C_1 C_2} = 0, \end{aligned}$$
(110)

leading to

$$\begin{aligned} \varLambda _\mathrm {c} = L \sqrt{1-\frac{8 |G|}{\sqrt{15} L} + \left( \frac{G}{L}\right) ^2}. \end{aligned}$$
(111)

These are the classical equilibria of the Lidov–Kozai problem—the only ones that can be analysed directly in the Delaunay variables. In terms of the orbital elements, Eq. (111) is equivalent to the well-known condition (Lidov 1962)

$$\begin{aligned} 1-e^2 = \frac{3 (\cos {I})^2}{5}. \end{aligned}$$
(112)

The equilibria can be located in the middle panel of Fig. 3 at \(\lambda = \pm 45^\circ \), \(\varLambda \approx \pm 0.115\,L\).
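The numbers quoted above can be reproduced with a few lines of code. The sketch below is an illustration with hypothetical function names: it evaluates (111), checks that the result satisfies (110) with \(C_1 C_2\) taken from (105), and evaluates the squared eigenvalue of the circular equilibrium. The expression \(8B^2(3L^2-5G^2)\) used for the latter is not quoted from the text; it is what one obtains by linearizing (107) and (108) at \(\varLambda =0\), \(\cos {4\lambda }=-1\), and should be read as an assumption of this sketch.

```python
import math

def c1c2(L, G, Lam):
    """Product C1*C2 from Eq. (105)."""
    return 0.25 * math.sqrt((L**2 - (G - Lam)**2) * (L**2 - (G + Lam)**2))

def lambda_c(G, L=1.0):
    """Lambda_c from Eq. (111); None when (G/L)^2 >= 3/5 (no classical equilibrium)."""
    g = abs(G) / L
    if g * g >= 0.6:
        return None
    return L * math.sqrt(1.0 - 8.0 * g / math.sqrt(15.0) + g * g)

def residual_110(G, L=1.0):
    """Left-hand side of Eq. (110) evaluated at Lambda = Lambda_c."""
    Lam = lambda_c(G, L)
    return 4.0 - (L**2 + G**2 - Lam**2) / (4.0 * c1c2(L, G, Lam))

def circular_eig_sq(G, L=1.0, B=1.0):
    """Squared eigenvalue at the circular equilibrium (sketch-level linearization).

    Negative values mean purely imaginary eigenvalues, i.e. linear stability.
    """
    return 8.0 * B**2 * (3.0 * L**2 - 5.0 * G**2)

if __name__ == "__main__":
    print(lambda_c(0.75))                           # about 0.115 (units of L)
    print(residual_110(0.75))                       # about 0
    print(math.degrees(math.acos(math.sqrt(0.6))))  # about 39.23 degrees
    print(circular_eig_sq(0.9), circular_eig_sq(0.75))
```

For \(G=0.9\,L\) the squared eigenvalue is negative (stable circular orbit), whereas for \(G=0.75\,L\) it is positive, in agreement with the bifurcation at \((G/L)^2=3/5\).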

Fig. 4 Equilibria in the regularized Lidov–Kozai problem. Grey square: admissible region of G and \(\varLambda \). The case of \(\lambda = (2k+1) \pi /4\): horizontal line, circular orbits (solid line for stable, dashed for unstable equilibria); vertical curves, stable classical equilibria (112). The case of \(\lambda = k \pi /2\): horizontal line, stable equatorial orbits. Solid circles at the vertices: stable equilibria with undetermined \(\lambda \), namely polar radial (\(G=0\)) and equatorial circular (\(\varLambda =0\)) orbits

Figure 4 shows all the equilibria and their stability, with the dashed lines marking the unstable equilibria. The edges of the \((G,\varLambda )\) square (upper and lower boundaries of the plots in Fig. 3) may not be attached to any of the values of \(\lambda \), but we added the black dots at the corners to show the stable equilibria of the special type as the natural limits of the stable branches (solid lines).

It is not unusual for action–angle-like variables with a bounded momentum to suffer from an indeterminate angle at the boundary of the range of its conjugate momentum. The LKS variables cannot be different, even if many cases, problematic in the Delaunay chart, have been located inside the boundaries of \(\varLambda \). For each value of \(G \ne 0\) (and \(|G| \ne L\)), there exist integral curves passing through both the extremes: \(\varLambda = L - |G|\) and \(\varLambda = -L+ |G|\). In the top or the middle panel of Fig. 3, they are seen as four disjoint fragments; for example, the two open curves approaching the edges at \(\lambda = \pm 22^\circ \!.5\) are the fragments of such an integral curve. There is no singularity in these orbits (see Sect. 5.3.5) other than the indeterminacy of longitude at the poles of a sphere (Deprit 1994).

7 Conclusions

While commenting on a transformation due to Fukushima, Deprit (1994) observed that it amounts to swapping singularities, and immediately added ‘This remark is not meant to diminish its practical merit, quite the contrary’. The LKS variables we have presented also ‘trade in singularities’, but the rule of trade we propose is to spare the radial, rectilinear orbits (except the polar ones) at the expense of some other types. The exceptions include mostly a family of expendable, rank-and-file orbits with \(e=\sin {I}\) and the lines of apsides perpendicular to the lines of nodes: the cases easily tractable without the KS regularization and unlikely to attract attention by becoming equilibria in typical problems of Celestial Mechanics. We regret more the problems caused by the polar radial and equatorial circular orbits. Nevertheless, we believe that more has been gained than lost. It is enough to enumerate the orbits that remain regular points in our chart: circular inclined, equatorial elliptic, and all radial (except the polar ones). Thanks to refraining from the use of the orbital plane in their construction, the LKS variables are better fitted to the study of highly elliptic orbits than any other action–angle set known to the authors.

The analysis of the quadrupole Lidov–Kozai problem in Sect. 6 suggests that the LKS variables may be a handy tool in the analysis of more problematic cases, like the eccentric, octupolar Lidov–Kozai problem. In the latter, the ‘orbital flip’ phenomenon occurs: changing the direction of motion with the passage through an equatorial rectilinear orbit phase (Lithwick and Naoz 2011). Previous attempts to discuss this phenomenon in terms of the action–angle variables (e.g. Sidorenko 2018) faced problems which may possibly be resolved with the newly presented parameterization.

Some of the readers might be sceptical about the unnecessary duplication of the phase space resulting from the LKS transformation \(\zeta \). Indeed, Fig. 3 covers the whole phase space in terms of the argument of pericentre \(\omega _\mathrm {o}\), although it has been clipped to the half range of \(\lambda \). This feature can be trivially removed by means of a symplectic transformation \((\lambda ,\varLambda ) \rightarrow (2 \lambda , \varLambda /2)\), and similarly for other conjugate pairs. We have not made this move in the present work for the sake of retaining the fundamental, angle-halving property of both the Levi-Civita and the Kustaanheimo–Stiefel transformations. Avoiding the factor 2 in the arguments of sines and cosines in Eqs. (69) and (72), we would introduce the factor \(\frac{1}{2}\) in the expressions for \(\mathsf {v}\) and \(\mathsf {V}\). Let us mention that the restriction of the LKS transformation to \((\mathsf {v},\mathsf {V}) \rightarrow (l,g,h,\gamma ,L,G,H,\varGamma )\) can also be useful in the studies of perturbed, four-degree-of-freedom oscillators, not necessarily resulting from the KS transformation (e.g. Crespo et al. 2015; van der Meer et al. 2016). In that case, unwanted spurious singularities may arise in the course of the Birkhoff normalization, when the multiple of an angle does not properly match the power of the action.

Having based the LKS variables upon the KS3 variant of the KS transformation, we do not exclude the possibility of performing a similar construction within the KS1 framework. But then the G and \(\varLambda \) variables will be the projections of the angular momentum and the Laplace–Runge–Lenz vector on the \(x_1\) axis. With such a choice, the Lidov–Kozai Hamiltonian (104) would depend on both g and \(\lambda \), with the rotation symmetry hidden deeply in some complicated function of all variables, instead of the obvious \(G=\text{ const }\).