Abstract
The aim of this paper is to present a new, analytical, method for computing the exact number of relative equilibria in the planar, circular, restricted 4-body problem of celestial mechanics. The new approach allows for a very efficient computer-aided proof, and opens a potential pathway to proving harder instances of the n-body problem.
1 Introduction
According to Newton’s law of gravity, the movements of n planets (located at positions \(p_1,\dots ,p_n\), and having masses \(m_1,\dots ,m_n\)) are given by the system of differential equations:
A relative equilibrium motion is a planar solution to the n-body problem that performs a rotation of uniform angular velocity \(\omega ^2\) about the system’s center of mass c. In other words, a planar solution to the following system of equations:
The relative equilibria of the 3-body problem have been known for centuries. In terms of equivalence classes, there are, irrespective of the masses, exactly five relative equilibria. Three of these are the collinear configurations discovered by Euler [5]; the remaining two are Lagrange’s equilateral triangles [12], see Fig. 1.
The collinear configurations found by Euler have been generalized for n bodies by Moulton [16]. There are exactly n!/2 such collinear equivalence classes.
In 2006, Hampton and Moeckel [8] proved that the number of relative equilibria of the Newtonian 4-body problem is finite (always between 32 and 8472). Their computer-aided proof is based on symbolic and exact integer computations. The upper bound 8472 is believed to be a large overestimation; numerical simulations suggest that no more than 50 equilibria exist, see e.g. [21].
Albouy and Kaloshin [1] almost settled the question of finiteness for \(n = 5\) bodies. They proved that there are finitely many relative equilibria in the Newtonian 5-body problem, except perhaps if the 5-tuple of positive masses belongs to a given co-dimension-2 subvariety of the mass space. By Bézout’s theorem, an upper bound on the number of relative equilibria is obtained (outside the exceptional subvariety), but the authors conclude “However, the bound is so bad that we avoid writing it explicitly”.
Relaxing the positivity of the masses can produce a continuum of relative equilibria. In [20] Roberts demonstrated this for the 5-body problem with one negative mass.
Looking at the restricted 4-body problem (i.e. when one of the planets has an infinitesimally small mass), Lindow [13] and Palmore [18] found that in the collinear case, only two relative equilibria exist. In the equilateral setting, Gannaway [6], Pedersen [19] and Simó [21] found numerical evidence of there being 8, 9, or 10 relative equilibria. Gannaway’s thesis is further explained in [2]. As a first rigorous result, Kulevich et al. [11] proved finiteness with an upper bound of 196 relative equilibria. This, however, was assumed to be a great overestimation, and later Barros and Leandro [3, 4] proved, as earlier works had indicated, that there could only be 8, 9 or 10 relative equilibria, depending on the three primary masses. These proofs are based on techniques from algebra, used to count solutions to large polynomial equations with huge integer coefficients. Due to the high degrees and large number of monomials of the polynomials involved, Barros and Leandro resorted to the software Maple, so their proof is (mildly) computer-assisted. The main technique used is inspired by an algorithm developed by Vincent [23] which allows for a significant reduction in the number of variations of signs in the coefficients of a polynomial. Together with Descartes’ rule of signs, this makes it possible to determine the sign of the relevant polynomials in a given region of space. The main result of [4] puts the earlier works mentioned above on firm mathematical ground.
In this paper, we present a new approach to counting relative equilibria in various settings. We use techniques from real analysis rather than algebraic geometry, an approach we believe will generalize better to more complicated settings such as the full 4-body problem, which still remains unresolved. Moreover, the techniques presented here do not use algebraic properties of the system, only differentiability. This quality could play a role in more general contexts, like in curved spaces or in physical systems where the potential is not given by Newton’s laws of gravitation.
In what follows, we will focus on the planar, circular, restricted 4-body problem, and give a new proof of the results of Barros and Leandro. In this setting, the three primary bodies form an equilateral triangle as in Fig. 1b.
2 Formulating the Problem
Let \(m_1, m_2, m_3\) denote the positive masses of the three primaries, and let \(p_1, p_2, p_3\) denote their positions in \({\mathbb {R}}^2\), which form an equilateral triangle. Also let z be the position of the fourth (weightless) body. In this setting, the gravitational pull on z is described by the amended potential:
Here c denotes the center of mass of the primaries. It follows that the locations of the relative equilibria are given by the critical points of V. Thus, the challenge of counting the number of relative equilibria can be translated into the task of counting the critical points of V, given the appropriate search region \({{\mathcal {C}}}\times {{\mathcal {M}}}\). Here \({{\mathcal {C}}}\subset {\mathbb {R}}^2\) is the set of positions of the weightless body, and \({{\mathcal {M}}}\) is the set of masses, discussed in Sects. 2.2 and 2.1, respectively.
We are now prepared to formulate the main problem of this study (Fig. 2):
Main Problem 1
How many solutions can the critical equation \(\nabla _z V(z;m) = 0\) have (in \({{\mathcal {C}}}\)) when \(m \in {{\mathcal {M}}}\)?
With our approach, there are three difficulties that must be resolved in order to be successful:
1. The potential V (and its gradient) is singular at the three primaries.
2. When the mass of a primary tends to zero, several critical points of V tend to that primary.
3. As masses (not close to zero) are varied, the number of critical points of V may change due to bifurcations taking place.
When all masses are uniformly bounded away from zero, the singularities at the primaries (case 1) can be handled by proving that no critical points can reside in certain small disks centered at the primaries. This is explained further in Sect. 2.2.
For masses approaching zero (case 2), we end up with a multi-restricted problem of type \(2 + 2\) (\(m_1\rightarrow 0\)) or of type \(1 + 3\) (\(m_1\rightarrow 0\) and \(m_2\rightarrow 0\)). Both scenarios can be resolved by desingularizing the potential V along the circle \(\Vert z - p_3\Vert = 1\). More about this in Sects. 4 and 6.3.1.
Out of these difficulties, the bifurcations (case 3) are the easiest to resolve. In principle, this only requires verifying the sign of certain combinations of partial derivatives of the potential V. We will address this in Sect. 6.5.
2.1 The Mass Space \({{\mathcal {M}}}\)
Without loss of generality, we may assume that the masses of the primaries are normalized, \(m_1 + m_2 + m_3 = 1\), and ordered, \(0 < m_1 \le m_2\le m_3\). Call this set \({{\mathcal {M}}}\):
\[ {{\mathcal {M}}} = \{(m_1,m_2,m_3) :m_1 + m_2 + m_3 = 1,\ 0 < m_1 \le m_2 \le m_3\}. \]
We illustrate the unordered and ordered mass space in Fig. 3. Note that the midpoint of the large triangle corresponds to all masses being equal: \(m = (\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3})\).
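As a small illustration of the normalization and ordering (a sketch, not part of the proof), the following Python function maps any triple of positive masses to its representative in \({{\mathcal {M}}}\). The function name `to_ordered_mass_space` is hypothetical; exact rational arithmetic is used only to make the example easy to check.

```python
from fractions import Fraction


def to_ordered_mass_space(masses):
    """Return the normalized, ordered representative (m1 <= m2 <= m3, sum = 1)."""
    if len(masses) != 3 or any(m <= 0 for m in masses):
        raise ValueError("expected three positive masses")
    total = sum(masses)
    # Divide by the total mass, then sort so that m1 <= m2 <= m3.
    return tuple(sorted(Fraction(m) / total for m in masses))


# All six permutations of the same masses share one representative.
representative = to_ordered_mass_space((3, 1, 2))
```

Any permutation of the input yields the same output, which is the 6-fold symmetry of the unordered mass space.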
The bifurcations taking place are of two kinds: quadratic and cubic. The latter are rare, and come from the inherent 6-fold symmetry of the normalized mass space, see Fig. 3 (compare to Figure 21 of [19]). The quadratic bifurcations are of a saddle-node type, in which two distinct solutions approach each other, merge at the bifurcation, and no longer exist beyond the point of bifurcation, see Fig. 4.
In Sect. 6.5 we will present the mathematics needed to resolve the two types of bifurcations taking place.
2.2 The Configuration Space \({{\mathcal {C}}}\)
Without loss of generality, we may fix the positions of the three primaries: \(p_1 = (\tfrac{\sqrt{3}}{2}, +\tfrac{1}{2})\), \(p_2 = (\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2})\), and \(p_3 = (0,0)\), thus forming an equilateral triangle with unit length sides, as in Fig. 1b.
We begin by deriving some basic results that will be used later on. Let \(\nabla _z V(z;m)\) denote the gradient (with respect to z) of the potential V. A relative equilibrium is then simply a solution to the equation \(\nabla _z V(z;m) = 0\), i.e., a critical point of V(z; m).
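To make the critical equation concrete, here is a minimal floating-point sketch in Python. It assumes the amended potential has the standard form \(V(z;m) = \tfrac{1}{2}\Vert z - c\Vert ^2 + \sum _i m_i/\Vert z - p_i\Vert \), so that \(\nabla _z V = (z - c) - \sum _i m_i (z - p_i)/\Vert z - p_i\Vert ^3\); this form and the name `grad_V` are assumptions made for illustration only.

```python
import math

# Positions of the primaries p1, p2, p3, as fixed in the text.
P = [(math.sqrt(3) / 2, 0.5), (math.sqrt(3) / 2, -0.5), (0.0, 0.0)]


def grad_V(z, m):
    """Gradient of the ASSUMED amended potential at z, for masses m = (m1, m2, m3)."""
    # Center of mass of the primaries (the masses are assumed to sum to 1).
    cx = sum(mi * p[0] for mi, p in zip(m, P))
    cy = sum(mi * p[1] for mi, p in zip(m, P))
    gx, gy = z[0] - cx, z[1] - cy
    for mi, (px, py) in zip(m, P):
        dx, dy = z[0] - px, z[1] - py
        r_cubed = math.hypot(dx, dy) ** 3
        gx -= mi * dx / r_cubed
        gy -= mi * dy / r_cubed
    return gx, gy


# For equal masses, the centroid of the triangle is a critical point by symmetry.
centroid = (math.sqrt(3) / 3, 0.0)
residual = grad_V(centroid, (1 / 3, 1 / 3, 1 / 3))
```

Such a point evaluation only suggests where zeros lie; the rigorous counts in this paper rely on set-valued computations instead.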
In what follows, it will be convenient to adopt the following notation:
\[ r_i(z) = \Vert z - p_i\Vert , \qquad i = 1,2,3. \]
In determining the relevant configuration space, we will use the following two exclusion results (Lemmas 3 and 5, respectively):
- Assume that \(m_3 \ge 1/3\). If \(r_3(z) \le 1/3\), then z is not a critical point of the potential V.
- If \(r_i(z) \ge 2\) for some \(i =1,2,3\), then z is not a critical point of the potential V.
Combining these two results, we see that, for \(m\in {{\mathcal {M}}}\), all relative equilibria must satisfy \(1/3\le r_3\le 2\). We will take this to be our global search region \({{\mathcal {C}}}\) in configuration space:
\[ {{\mathcal {C}}} = \{z\in {\mathbb {R}}^2 :1/3\le r_3(z)\le 2\}. \tag{4} \]
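The search region is simply an annulus centered at \(p_3 = (0,0)\). A membership test can be sketched in Python as follows; the function name `in_search_region` is hypothetical.

```python
import math


def in_search_region(z, r_min=1.0 / 3.0, r_max=2.0):
    """True if z lies in the annulus r_min <= r_3(z) <= r_max around p_3 = (0, 0)."""
    r3 = math.hypot(z[0], z[1])  # distance from z to the heaviest primary
    return r_min <= r3 <= r_max
```

For instance, the point \((1, 0)\) lies in the region, whereas \((0.1, 0)\) and \((0, 2.5)\) are excluded by the two lemmas.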
A more detailed analysis reveals that the relative equilibria all reside in an even smaller subset of \({{\mathcal {C}}}\), see Fig. 5a. These seven regions were presented already in the work of Pedersen [19]. We will, however, not use this level of detail in our computations.
We end this section by deriving the two exclusion results used above. The critical equation \(\nabla _z V(z;m) = 0\) can be written as
Note that we have
The following lemma provides an a priori bound on how close a solution of \(\nabla _z V(z;m) = 0\) can be to one of the primaries.
Lemma 2
Let \(z \in {\mathbb {R}}^2\) and set \(r_i = r_i(z)\). If for some \(i =1,2,3\) we have
then z is not a critical point of the potential V.
Proof
First, we note that Eq. (5) can be rewritten as
We use this formulation, and consider small \(r_1\), so that the norm of the left-hand side is larger than the norms of terms appearing to the right.
Observe that (7) implies that \(r_i < 1\), because \( \frac{m_i}{r_i^2} > 1\) and \(m_i < 1\). Hence, throughout the proof, we assume that \(r_1 < 1\).
From the triangle inequality \(r_1 + r_2 \ge 1\), we have
Using this together with (7), we obtain
hence (8) is not satisfied. \(\square \)
Since we are only considering ordered masses, we always have \(m_3\ge 1/3\). This fact, combined with Lemma 2, immediately gives us a uniform bound for the primary \(p_3\).
Lemma 3
Assume that \(m_3 \ge 1/3\). If \(r_3(z) \le 1/3\), then z is not a critical point of the potential V.
Proof
We verify (7) with \(i = 3\). Estimating each side of the inequality, we find
\(\square \)
Remark
For \(m_3 \ge 1/3\), we can exclude the slightly larger region \(r_3 \le 0.3405784\).
Note that Lemma 2 can also be used to exclude a small disc centered at \(p_i\) (\(i = 1,2\)) whenever \(m_i\) is not too small. As an example, if \(m_i\ge \varepsilon > 0\), then we can exclude the disc \(r_i \le \tfrac{1}{2}\sqrt{\varepsilon }\).
Another exclusion principle is given by the following result.
Lemma 4
Let \(z \in {\mathbb {R}}^2\) and set \(r_i = r_i(z)\). If for some \(i = 1,2,3\) we have
then z is not a critical point of the potential V.
Proof
The proof uses the idea that we may rewrite Eq. (5) as
Without loss of generality, we may assume that \(i=3\) and shift the coordinate frame so that \(p_3\) is situated at the origin. Then \(\Vert z\Vert = \Vert z - p_3\Vert = r_3\), and from (9) (with \(i=3\)) it follows that \(r_3 > 1\). This, together with the triangle inequality, gives
Here we use the fact that the center of mass c is located within the triangle spanned by the three primaries, and therefore \(\Vert c\Vert \le 1\).
Applying the triangle inequality again, we have \(r_i + 1 \ge r_3\) (\(i=1,2\)), from which it follows that
Therefore we obtain the following estimate (using (6), (11), and (12))
hence (10) is not satisfied. \(\square \)
A direct consequence of Lemma 4 is the following uniform bound.
Lemma 5
If \(r_i(z) \ge 2\) for some \(i =1,2,3\), then z is not a critical point of the potential V.
Proof
Using only \(r_i(z) \ge 2\), we verify (9). Indeed, a straightforward computation gives:
\(\square \)
3 Reparametrizing the Masses
Due to the normalisation \(m_1 + m_2 + m_3 = 1\), the mass space can be viewed as a 2-dimensional set parametrized by \(m_1\) and \(m_2\). Instead of working directly with the masses \((m_1, m_2)\), we introduce the following non-linear, singular transformation:
The new parameters (s, t) can be transformed back to mass space via the inverse transformation:
The reason for working in the mass space using (s, t)-coordinates is as follows: when \(m_1\) and \(m_2\) tend to zero, then some relative equilibria may move in a non-continuous way and no limit exists. This makes our kind of study virtually impossible. When seen in (s, t)-coordinates, however, the movements are regular and amenable to our computer assisted techniques.
In the \((m_1,m_2)\)-space, the ordered mass space \({{\mathcal {M}}}\) is a right-angled triangle, see Fig. 3b. Under the transformation (13) it is mapped to a non-linear 2-dimensional region \({\tilde{{{\mathcal {P}}}}}\). Taking \({{\mathcal {P}}}\) to be the rectangular hull of \({\tilde{{{\mathcal {P}}}}}\), we have our new parameter region, see Fig. 6. The three vertices of \({{\mathcal {M}}}\) are mapped into \({{\mathcal {P}}}\) according to the following transformations:
Note how the single point \((m_1, m_2) = (0, 0)\) is mapped to the line segment \((s,t) = ([0, 1/2], 0)\) when taking all possible limits from within \({{\mathcal {M}}}\). This desingularization is the main reason for moving to the (s, t)-space; it gives us a better control when masses are near the multi-restricted cases \(m_1 = 0\) and \((m_1, m_2) = (0, 0)\).
Based on this, we will define \({{\mathcal {P}}}= \{(s,t):0\le s\le \frac{1}{2}; 0\le t \le \tfrac{2}{3}\}\), and use the partition \({{\mathcal {P}}}= {{\mathcal {P}}}_1\cup {{\mathcal {P}}}_2\cup {{\mathcal {P}}}_3\) as illustrated in Fig. 6. More precisely we use
Note that each of the three partition elements contains points from \({{\mathcal {P}}}{\setminus }{\tilde{{{\mathcal {P}}}}}\). Such points correspond to unordered masses, and we will automatically remove most of them from our computations. In the following, when we say that \({{\mathcal {P}}}_i\) has some property, we mean that the ordered parameters \(\tilde{{{\mathcal {P}}}_i} = {{\mathcal {P}}}_i\cap \tilde{{{\mathcal {P}}}}\) have that property.
The three partition elements \({{\mathcal {P}}}_1, {{\mathcal {P}}}_2, {{\mathcal {P}}}_3\) have the following properties:
- For each \((s,t)\in {{\mathcal {P}}}_1\), there are exactly 8 solutions in \({{\mathcal {C}}}\).
- For each \((s,t)\in {{\mathcal {P}}}_2\), there are exactly 10 solutions in \({{\mathcal {C}}}\).
- For each \((s,t)\in {{\mathcal {P}}}_3\), there are between 8 and 10 solutions in \({{\mathcal {C}}}\).
Let us describe these three regions in more detail. In subsequent sections, we will prove that these descriptions are accurate.
For \((s,t)\in {{\mathcal {P}}}_2\) no bifurcations take place in \({{\mathcal {C}}}\); exactly ten solutions exist but never come close to each other or a primary. This is the easiest region to account for. Also for \((s,t)\in {{\mathcal {P}}}_1\) there are no bifurcations in \({{\mathcal {C}}}\); exactly eight solutions exist. This region, however, includes parameters corresponding to arbitrarily small masses, which presents other complications that must be resolved; more about this in Sect. 6.3.1. The remaining set \({{\mathcal {P}}}_3\) contains all parameters for which a bifurcation occurs. As discussed in Sect. 2.1, there are two bifurcation types that we must account for: quadratic and cubic. These bifurcations only take place in a small subset \({{\mathcal {C}}}_2\) of the full configuration space. For \((s,t)\in {{\mathcal {P}}}_3\), there can be 1, 2 or 3 solutions in \({{\mathcal {C}}}_2\). In the remaining space \({{\mathcal {C}}}_1 = {{\mathcal {C}}}{\setminus }{{\mathcal {C}}}_2\) there are exactly seven solutions, all isolated from each other and the primaries. Summing up, when \((s,t)\in {{\mathcal {P}}}_3\) we have 8, 9 or 10 solutions in \({{\mathcal {C}}}\).
In the ideal setting, \({{\mathcal {P}}}_3\) would correspond to the transformed (blue) bifurcation curve illustrated in Fig. 6. It would bisect \({\tilde{{{\mathcal {P}}}}}\), acting as a common boundary line separating \({\tilde{{{\mathcal {P}}}}}_1\) from \({\tilde{{{\mathcal {P}}}}}_2\). Our approach, however, will build upon finite resolution computations, and therefore \({{\mathcal {P}}}_3\) will be constructed as a rectangular subset of \({{\mathcal {P}}}\), covering the entire bifurcation curve, see Fig. 6.
4 Polar Coordinates
Given the shape of the configuration space (4), and how the solutions behave when masses become small, it makes sense to work in polar coordinates centered at the heaviest primary \(p_3 = (0,0)\). In these coordinates the lighter primaries become \(p_1=(1,\pi /6)\) and \(p_2=(1,-\pi /6)\).
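The polar coordinates of the lighter primaries can be checked directly. The following Python sketch converts the Cartesian positions fixed in Sect. 2.2; the helper name `to_polar` is hypothetical.

```python
import math


def to_polar(p):
    """Polar coordinates (r, phi) of a point p = (x, y), centered at p_3 = (0, 0)."""
    return math.hypot(p[0], p[1]), math.atan2(p[1], p[0])


p1 = (math.sqrt(3) / 2, 0.5)
p2 = (math.sqrt(3) / 2, -0.5)
r1, phi1 = to_polar(p1)  # approximately (1, pi/6)
r2, phi2 = to_polar(p2)  # approximately (1, -pi/6)
```

Both lighter primaries lie on the unit circle around \(p_3\), which is the circle along which the potential is desingularized.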
For convenience, let us define
Let \((z_1|z_2)\) denote the scalar product on \({\mathbb {R}}^2\). Given \(z=(r\cos \phi ,r \sin \phi )\), we have
It follows that
where \(g(m_1,m_2)\) depends only on the masses and not on z. Therefore we can ignore g when studying spatial derivatives of the potential V.
Note that, for \(i=1,2\), we have
Therefore, if we define
then for \(i=1,2\) we have
Concluding, in polar coordinates, we obtain the new expression for the amended potential (compare to (1)):
where
Taking partial derivatives, the gradient of the potential (16) is given by
where
Note that (18) has a factor r in both terms. We rescale this equation by a factor 1/r and \(1/(m_1 + m_2)\). In the (s, t) parameters this gives us
where
This \(F(r,\varphi ; s,t)\) will be used in place of \(\nabla _z V(z;m)\) appearing in the previous sections. Zeros of F correspond to critical points of V, and these correspond to relative equilibria.
5 General Strategy and Key Results
Recall that we want to determine the number of solutions to \(\nabla _z V(z;m) = 0\), and to understand how these behave within \({{\mathcal {C}}}\) when the masses vary within \({{\mathcal {M}}}\) (see Main Problem 1). For this to succeed, we must have a means of locating all solutions to the critical equation, and a way to analyze the various bifurcations that are possible as the masses vary.
As mentioned in Sect. 1, our overall goal is to construct a completely analytic proof of the following theorem:
Theorem 6
For each \(m\in {{\mathcal {M}}}\) there are exactly 8,9 or 10 relative equilibria (i.e., solutions to the critical equation \(\nabla _z V(z;m) = 0\)) in \({{\mathcal {C}}}\).
This result was originally established by Barros and Leandro [4] using algebraic techniques. Developing analytic tools for the proof, we hope to apply these to harder instances of the n-body problem that are not within reach using algebraic methods.
In our new setting, using the (s, t)-coordinates together with polar coordinates (described in Sects. 3 and 4), the critical equation is transformed into its equivalent form \(F(r,\varphi ; s,t) = 0\). As explained earlier, this form is better suited for our approach, where set-valued numerical computations will play a major role.
Before going into the details of the computations used as part of our computer-assisted framework, we present the following key results. We will use three different techniques: finding the exact number of solutions for given parameters (s, t); proving that no bifurcations take place for a range of parameters; and controlling the number of solutions when a bifurcation takes place.
We begin by determining the number of solutions for two different parameters.
Theorem 7
Consider the critical equation \(F(r,\varphi ;s,t) = 0\) for the \(3+1\) body problem.
(a) For parameters \((s,t) = (1/4, 1/4)\), there are exactly 8 solutions in \({{\mathcal {C}}}\).
(b) For parameters \((s,t) = (9/20, 3/5)\), there are exactly 10 solutions in \({{\mathcal {C}}}\).
Note that \((s,t) = (1/4, 1/4)\in {{\mathcal {P}}}_1\) and \((s,t) = (9/20, 3/5)\in {{\mathcal {P}}}_2\), as discussed in Sect. 3.
Combining the results of Theorem 7 with a criterion that ensures that no bifurcations are taking place (see Sect. 6.3), we can extend the two solution counts to the two connected regions \({{\mathcal {P}}}_1\) and \({{\mathcal {P}}}_2\), respectively:
Theorem 8
Consider the critical equation \(F(r,\varphi ;s,t) = 0\) for the \(3+1\) body problem.
(a) For all parameters \((s,t)\in {{\mathcal {P}}}_1\), no solution in \({{\mathcal {C}}}\) bifurcates.
(b) For all parameters \((s,t)\in {{\mathcal {P}}}_2\), no solution in \({{\mathcal {C}}}\) bifurcates.
It follows that the number of solutions in \({{\mathcal {P}}}_1\) and \({{\mathcal {P}}}_2\) is constant (8 and 10, respectively).
For the remaining part \({{\mathcal {P}}}_3\) of the parameter space, we must be a bit more detailed. As discussed in Sect. 7, we will split the configuration space into two connected components: \({{\mathcal {C}}}= {{\mathcal {C}}}_1 \cup {{\mathcal {C}}}_2\), thus isolating the region where all bifurcations occur.
Theorem 9
Consider the critical equation \(F(r,\varphi ;s,t) = 0\) for the \(3+1\) body problem.
(a) For all parameters \((s,t)\in {{\mathcal {P}}}_3\), there are exactly 7 solutions in \({{\mathcal {C}}}_1\).
(b) For all parameters \((s,t)\in {{\mathcal {P}}}_3\), there are 1, 2, or 3 solutions in \({{\mathcal {C}}}_2\).
Theorem 6 now follows from the combination of Theorems 7, 8, and 9. In what follows, we will justify each of the three steps described above.
6 Computational Techniques
Here we present the computational techniques that we need to employ in order to establish our main theorem. We also discuss the underlying set-valued methods used later on in the computer-assisted proofs.
6.1 Set-Valued Mathematics
We begin by giving a very brief introduction to set-valued mathematics and rigorous numerics. For more in-depth accounts we refer to e.g. [15, 17, 22].
We will exclusively work with compact boxes \({{\varvec{x}}}\) in \({\mathbb R}^n\), represented as vectors whose components are compact intervals: \({{\varvec{x}}}= ({{\varvec{x}}}_1,\dots ,{{\varvec{x}}}_n)\), where \({{\varvec{x}}}_i = \{x\in {\mathbb R}:\underline{x}_i \le x \le \overline{x}_i\}\) for \(i = 1,\dots , n\).
Given an explicit formula for a function \(f:{\mathbb R}^n\rightarrow {\mathbb R}^m\), we can form its interval extension (which we also denote by f), by extending each real operation by its interval counterpart. As long as the resulting interval image \(f({{\varvec{x}}})\) is well-defined, we always have the following inclusion property:
\[ \textrm{range}(f;{{\varvec{x}}}) = \{f(x) :x\in {{\varvec{x}}}\} \subseteq f({{\varvec{x}}}). \tag{21} \]
The main benefit of moving from real-valued to interval-valued analysis is the ability to discretise continuous problems while retaining full control of the discretisation errors. Indeed, whilst the exact range of a function \(\textrm{range}(f;{{\varvec{x}}})\) is hard to compute, its interval image \(f({{\varvec{x}}})\) can be found by a finite computation. In practice, the interval image is computed via a finite sequence of numerical operations. Carefully crafted libraries using directed rounding, such as [9], ensure that the numerical output respects the mathematical inclusion property of (21).
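The inclusion property can be illustrated with a toy interval type in Python. This is a sketch only and not the library [9] used in the proofs: the class `Interval` is hypothetical, and instead of true directed rounding it widens every result by one floating-point step in each direction via `math.nextafter` (Python 3.9 or later), which is a conservative substitute.

```python
import math


class Interval:
    """Closed interval [lo, hi]; each operation is widened outward by one ulp."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _outward(lo, hi):
        # Absorb the rounding error of the preceding floating-point operation.
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._outward(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

    def contains(self, x):
        return self.lo <= x <= self.hi


def f(x):
    """Interval extension of f(x) = x * (1 - x)."""
    return x * (Interval(1.0) - x)


image = f(Interval(0.0, 1.0))
```

The exact range of f over \([0,1]\) is \([0, 1/4]\), while the interval image is roughly \([0,1]\): the enclosure is guaranteed, but it may overestimate, which is why adaptive subdivision is used in practice.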
6.2 Equation Solving and Safe Exclusions
We begin by stating what is known as the exclusion principle:
Theorem 10
If \(f({{\varvec{x}}})\) is well-defined and if \(0\notin f({{\varvec{x}}})\), then f has no zero in \({{\varvec{x}}}\).
The proof is an immediate consequence of (21). The exclusion principle can be used in an adaptive bisection scheme, gradually discarding subsets of a global search space \({{\varvec{X}}}\). At each stage of the bisection process, a (possibly empty) collection of subsets \({{\varvec{x}}}_1,\dots , {{\varvec{x}}}_n\) remains, whose union must contain all zeros of f. Note, however, that this does not imply that \(f(x) = 0\) has any solutions; we have only discarded subsets of \({{\varvec{X}}}\) where we are certain that no zeros of f reside. In order to prove the existence of zeros, we need an additional result.
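The adaptive bisection scheme can be sketched in one dimension as follows. The example function \(f(x) = x^2 - 2\), the names `f_interval` and `enclose_zeros`, and the tolerance are all illustrative assumptions; plain floating-point arithmetic is used without outward rounding, so this sketch is not rigorous.

```python
def f_interval(lo, hi):
    """Interval image of f(x) = x**2 - 2 on [lo, hi], assuming 0 <= lo <= hi."""
    return lo * lo - 2.0, hi * hi - 2.0


def enclose_zeros(lo, hi, tol=1e-6):
    """Return subintervals of [lo, hi], of width <= tol, that may contain zeros."""
    pending, kept = [(lo, hi)], []
    while pending:
        a, b = pending.pop()
        f_lo, f_hi = f_interval(a, b)
        if f_lo > 0.0 or f_hi < 0.0:
            continue                      # 0 is not in f([a, b]): safe to discard
        if b - a <= tol:
            kept.append((a, b))           # undecided, but small enough to keep
        else:
            mid = 0.5 * (a + b)
            pending.extend([(a, mid), (mid, b)])
    return kept


boxes = enclose_zeros(0.0, 2.0)
```

The surviving boxes cluster around \(\sqrt{2}\), but the exclusion principle alone does not prove that a zero exists there.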
Let \(f\in C^1({{\varvec{X}}}, {\mathbb R}^n)\), where \(\textrm{Dom}(f) = {{\varvec{X}}}\subseteq {\mathbb R}^n\). Given an interval vector \({{\varvec{x}}}\subset {{\varvec{X}}}\), a point \({\check{x}}\in {{\varvec{x}}}\), and an invertible \(n\times n\) matrix C, we define the Krawczyk operator [10, 17] as
\[ K_f({{\varvec{x}}}; {\check{x}}; C) = {\check{x}} - Cf({\check{x}}) + \big (I - C\cdot Df({{\varvec{x}}})\big )({{\varvec{x}}}- {\check{x}}). \]
Popular choices are \({\check{x}} = \textrm{mid}({{\varvec{x}}})\) and \(C = \textrm{mid}([Df({\check{x}})])^{-1}\), resulting in Newton-like convergence rates near simple zeros.
Theorem 11
Assume that \(K_f({{\varvec{x}}}; {\check{x}}; C)\) is well-defined. Then the following statements hold:
1. If \(K_f({{\varvec{x}}}; {\check{x}}; C) \cap {{\varvec{x}}}= \emptyset \), then f has no zeros in \({{\varvec{x}}}\).
2. If \(K_f({{\varvec{x}}}; {\check{x}}; C) \subset \textrm{int}\,{{\varvec{x}}}\), then f has a unique zero in \({{\varvec{x}}}\).
We will use this theorem together with the interval bisection scheme, where we adaptively bisect the initial search region \({{\varvec{X}}}\) into subsets \({{\varvec{x}}}\) that are either discarded due to the fact that \(0\notin f({{\varvec{x}}})\), kept intact because of \(K_f({{\varvec{x}}}; {\check{x}}; C) \subset \textrm{int}\,{{\varvec{x}}}\), or bisected for further study. On termination, this will give us an exact count on the number of zeros of f within \({{\varvec{X}}}\).
Theorem 11 can be extended to the setting where f also depends on some m-dimensional parameter: \(f:{\mathbb R}^n\times {\mathbb R}^m \rightarrow {\mathbb R}^n\) with \((x;p)\mapsto f(x;p)\). This is what we use to establish Theorem 7.
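A scalar version of the Krawczyk test can be sketched as follows, using the standard form \(K = {\check{x}} - Cf({\check{x}}) + (1 - C f'({{\varvec{x}}}))({{\varvec{x}}}- {\check{x}})\) with \({\check{x}}\) the midpoint. The function name `krawczyk_1d` is hypothetical, C is taken here as the inverse of the midpoint of the derivative enclosure (one possible choice), and rounding errors are ignored, so the sketch is illustrative rather than rigorous.

```python
def krawczyk_1d(f, df_interval, lo, hi):
    """Krawczyk image of [lo, hi] for a scalar function f, with x_check = midpoint."""
    x_check = 0.5 * (lo + hi)
    d_lo, d_hi = df_interval(lo, hi)          # enclosure of f' over [lo, hi]
    c = 1.0 / (0.5 * (d_lo + d_hi))           # C = inverse of the midpoint derivative
    # Interval 1 - C * f'([lo, hi]).
    s_lo, s_hi = sorted((1.0 - c * d_lo, 1.0 - c * d_hi))
    radius = 0.5 * (hi - lo)                  # [lo, hi] - x_check = [-radius, radius]
    spread = max(abs(s_lo), abs(s_hi)) * radius
    centre = x_check - c * f(x_check)
    return centre - spread, centre + spread


# f(x) = x**2 - 2 on [1, 2]; its derivative 2x is enclosed by [2*lo, 2*hi].
k_lo, k_hi = krawczyk_1d(lambda x: x * x - 2.0,
                         lambda lo, hi: (2.0 * lo, 2.0 * hi),
                         1.0, 2.0)
```

Here the Krawczyk image is approximately \([1.25, 1.583]\), which lies in the interior of \([1,2]\); by the second statement of Theorem 11 this indicates a unique zero (namely \(\sqrt{2}\)) in \([1,2]\).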
6.3 A Set-Valued Criterion for Local Uniqueness
Continuing in the set-valued, parameter dependent setting, we will explain in detail the criteria used for detecting when (and when not) a bifurcation occurs for a general system of non-linear equations, depending on some parameters. We will also derive results aimed at extracting more detailed information about some specific bifurcations that may occur.
Let us begin by considering the general problem of solving a system of (non-linear) equations
\[ f(x;p) = 0, \qquad x\in {{\varvec{x}}},\quad p\in {{\varvec{p}}}, \tag{23} \]
where \({{\varvec{x}}}\subset {\mathbb R}^n\) and \({{\varvec{p}}}\subset {\mathbb R}^m\) are high-dimensional boxes. For a sufficiently smooth \(f:{\mathbb R}^n\times {\mathbb R}^m\rightarrow {\mathbb R}^n\), we want to know how many solutions (23) can have. We will focus on the bifurcations and develop a criterion which will tell us when the number of solutions of (23) changes.
For now, we will suppress the parameter dependence of f for clarity. All results that follow are extendable to the parameter-dependent setting.
An obvious condition for the local uniqueness of solutions to (23) is given by the following theorem.
Theorem 12
Let \(f:{\mathbb R}^n\rightarrow {\mathbb R}^n\) be \(C^1\). Assume that we are given a box \({{\varvec{x}}}\subset {\mathbb R}^n\) such that \(0 \notin \det Df({{\varvec{x}}})\). Then f is a bijection from \({{\varvec{x}}}\) onto its image.
Note that \(Df({{\varvec{x}}})\) is a matrix with interval entries; it contains all possible Jacobian matrices Df(x), where \(x\in {{\varvec{x}}}\).
Proof
We have for \(x,y \in {{\varvec{x}}}\), \(x\ne y\)
Now, since \(0 \notin \det Df({{\varvec{x}}})\), all (point-valued) matrices in \(Df({{\varvec{x}}})\) are non-singular. Therefore the right-hand side of (24) cannot contain the zero vector. Hence the left-hand side \(f(x) - f(y)\) cannot vanish. \(\square \)
This result constitutes the core of most of our computations. Given two boxes \({{\varvec{x}}}\) and \({{\varvec{p}}}\), we can easily compute \({{\varvec{y}}}= \det Df({{\varvec{x}}};{{\varvec{p}}})\) using interval arithmetic. If \(0\notin {{\varvec{y}}}\), we know that \(f(\cdot ;p)\) is a bijection (between \({{\varvec{x}}}\) and its image) for each \(p\in {{\varvec{p}}}\). Therefore, Eq. (23) can have at most one solution in \({{\varvec{x}}}\), and such a solution cannot undergo any bifurcation when \(p\in {{\varvec{p}}}\).
In dimension two, it is easy to illustrate the condition we are checking: the level sets of \(f_1\) and \(f_2\) must never become parallel, see Fig. 7.
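The determinant criterion can be sketched in Python for a toy system, \(f(x,y) = (x^2 + y^2 - 1,\ x - y)\), whose Jacobian has rows \((2x, 2y)\) and \((1, -1)\). The system, the helper names, and the box are illustrative assumptions; intervals are plain tuples and rounding errors are ignored.

```python
def imul(a, b):
    """Product of two intervals given as (lo, hi) tuples."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(products), max(products)


def isub(a, b):
    """Difference a - b of two intervals."""
    return a[0] - b[1], a[1] - b[0]


def det_jacobian(x, y):
    """Enclosure of det Df over the box x × y for f(x, y) = (x^2 + y^2 - 1, x - y)."""
    two, one, minus_one = (2.0, 2.0), (1.0, 1.0), (-1.0, -1.0)
    a, b = imul(two, x), imul(two, y)     # first row of Df: (2x, 2y)
    c, d = one, minus_one                 # second row of Df: (1, -1)
    return isub(imul(a, d), imul(b, c))   # a*d - b*c


det = det_jacobian((0.5, 1.0), (0.5, 1.0))
```

On the box \([0.5,1]^2\) the enclosure is \([-4,-2]\), which excludes zero, so the toy system has at most one solution in that box.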
6.3.1 Small Masses
In the case when one or two masses are very small, Theorem 12 is not practically applicable. In this situation some solutions to the critical equation are close to the primaries, and a careful analysis combined with rigorous, computer-assisted bounds is needed.
The strategy to study this problem is the following: we focus on each primary with small mass, and use (local) polar coordinates \((\varrho ,\varphi )\) around it to study the problem. In this setting we study the following system of equations (suppressing the dependence on masses) based on the original amended potential (1)
It turns out that it is relatively easy to control the solutions of the equation \(\frac{\partial V}{\partial \varphi }(\varrho ,\varphi )=0\) for \(\varphi \), obtaining four curves \((\varrho ,\varphi (\varrho ))\) on which we have uniform estimates over a whole range of \((m_1,m_2)\) including (0, 0). Turning to the remaining equation
we study two cases separately: when \(m_1\) alone tends to zero, and when \(m_1\) and \(m_2\) both tend to zero. In both cases we can prove that any solution of (25) is of the form \((\varrho , \varphi (\varrho ))\), and satisfies
This implies that all solutions are regular: there are no bifurcations. We summarise our findings in a quantitative and practically applicable statement:
Theorem 13
For \(0 < m_i \le 10^{-2}\) and \(R=10^{-3}\), any relative equilibrium z of (1) with \(\Vert z-p_i\Vert \le R\), \(i=1,2\), is non-degenerate: it does not bifurcate.
The details can be found in Appendix A, with a thorough analysis of the solutions close to the light primaries in question, and with quantitative bounds that are both useful for proving the theorem, and of general interest for further studies. We note that there exist earlier, qualitative, results [24] that treat the case of several infinitesimal masses. The strength of Theorem 13 is that it is quantitative: we are given explicit bounds for the subset of \({{\mathcal {C}}}\times {{\mathcal {M}}}\) on which the statement holds. This is crucial for our approach: in the remainder of \({{\mathcal {C}}}\times {{\mathcal {M}}}\), not covered by Theorem 13, all three masses are quite large, and all relative equilibria are well-separated from the primaries. This is an important condition for an efficient implementation of Theorem 12, which forms a major part of our general computer-assisted proof.
This concludes what we use to establish Theorem 8.
6.4 Lyapunov–Schmidt Reduction in \({\mathbb R}^2\)
If we cannot invoke Theorems 12 or 13 on a given region \({{\varvec{x}}}\times {{\varvec{p}}}\), we must explore the system of equations (23) further.
Assume that the elimination of one variable (the Lyapunov–Schmidt reduction) is possible in the box \({{\varvec{x}}}\) for all \(p \in {{\varvec{p}}}\). The condition \(0\notin \frac{\partial f_i}{\partial x_j}({{\varvec{x}}};{{\varvec{p}}})\) for some \(i,j\in \{1,\dots ,n\}\) is sufficient for this to be possible. It ensures that \(f_i\) is strictly monotone in the variable \(x_j\) for all parameters \(p\in {{\varvec{p}}}\). This implies that a relation \(f_i(x) = C\) implicitly defines \(x_j\) in terms of the other independent variables: \(x_j = x_j(x_1,\dots , x_{j-1}, x_{j+1},\dots , x_n)\). The domain of this parametrization will depend on the constant C.
From now on we will work exclusively in the two-dimensional setting, and we will once again suppress the parameter dependency for clarity. As we are interested in solutions to (23), we want to understand how the zero-level sets of \(f_1\) and \(f_2\) behave for various parameters. Without loss of generality, consider the case \((i,j)=(1,1)\), when we have
\[ 0\notin \frac{\partial f_1}{\partial x_1}({{\varvec{x}}}). \tag{26} \]
Now assume also that the level set \(f_1(x) = 0\) forms exactly one connected component in \({{\varvec{x}}}\). Then there exists a parametrization \(x_1 = x_1(x_2)\) defined on a connected domain \([x_2^-,x_2^+]\subset {{\varvec{x}}}_2\), such that
\[ f_1(x_1(x_2), x_2) = 0 \quad \text{for all } x_2\in [x_2^-,x_2^+]. \tag{27} \]
We can now define the reduced form of (23) as follows
\[ g(x_2) = f_2(x_1(x_2), x_2). \tag{28} \]
Thus, the number of zeros of g will determine the number of solutions to (23).
For the remaining three cases, the analogous construction would produce
To simplify notation, we will use y as the independent variable of g, and its domain will be denoted \({{\varvec{y}}}= [y^-,y^+]\). Note that a sufficient condition for the uniqueness of a solution of \(g(y)=0\), and hence also of (23) is
It turns out that (29) can be formulated in an invariant way. We will return to the case \((i,j) = (1,1)\) for the sake of clarity.
Implicit differentiation of (27) gives
Applying the chain rule to (28) gives
and substituting the expression for \(x_1'(x_2)\) into this produces
The importance of the non-vanishing condition (26) is now clear, as is the non-vanishing determinant condition of Theorem 12.
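For reference, the computation behind this remark can be written out in full. The block below is a reconstruction from the surrounding text for the case \((i,j) = (1,1)\); it assumes that (27) is the identity \(f_1(x_1(x_2), x_2) = 0\) and that (28) defines \(g(x_2) = f_2(x_1(x_2), x_2)\), which is how the parametrization is described above.

```latex
\begin{align*}
  f_1\bigl(x_1(x_2), x_2\bigr) = 0
  \quad &\Longrightarrow \quad
  x_1'(x_2) = -\,\frac{\partial f_1/\partial x_2}{\partial f_1/\partial x_1}, \\[4pt]
  g(x_2) = f_2\bigl(x_1(x_2), x_2\bigr)
  \quad &\Longrightarrow \quad
  g'(x_2) = \frac{\partial f_2}{\partial x_1}\, x_1'(x_2) + \frac{\partial f_2}{\partial x_2}
          = \frac{\dfrac{\partial f_1}{\partial x_1}\dfrac{\partial f_2}{\partial x_2}
                  - \dfrac{\partial f_1}{\partial x_2}\dfrac{\partial f_2}{\partial x_1}}
                 {\partial f_1/\partial x_1}
          = \frac{\det Df}{\partial f_1/\partial x_1}.
\end{align*}
```

All partial derivatives are evaluated at \((x_1(x_2), x_2)\). Since the denominator is non-zero by assumption, \(g'\) vanishes exactly where \(\det Df\) does.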
In what follows, we will refer to the constructed function g as the bifurcation function.
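The reduction can be illustrated numerically on a toy system. The following is a minimal floating-point sketch, not the interval-based procedure used in the proof: the functions `f1`, `f2`, the box, and the helper names `solve_monotone` and `bifurcation_function` are all hypothetical choices made for illustration, and plain bisection stands in for the implicit parametrization \(x_1 = x_1(x_2)\).

```python
def solve_monotone(h, lo, hi, iters=80):
    """Bisection for the zero of a strictly monotone h with a sign change on [lo, hi]."""
    h_lo = h(lo)
    if h_lo == 0:
        return lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_mid == 0:
            return mid
        if (h_mid > 0) == (h_lo > 0):
            lo = mid   # zero lies to the right of mid
        else:
            hi = mid   # zero lies to the left of mid
    return 0.5 * (lo + hi)


def bifurcation_function(f1, f2, x1_lo, x1_hi):
    """Return g(x2) = f2(x1(x2), x2), where x1(x2) solves f1(x1, x2) = 0.

    Assumes f1 is strictly monotone in x1 on [x1_lo, x1_hi], i.e. case (i, j) = (1, 1).
    """
    def x1_of(x2):
        return solve_monotone(lambda x1: f1(x1, x2), x1_lo, x1_hi)

    def g(x2):
        return f2(x1_of(x2), x2)

    return g


# Toy system on the box x1 in [0, 4], x2 in [0, 2] (illustrative only).
f1 = lambda x1, x2: x1 - x2 ** 2      # d f1 / d x1 = 1, never zero
f2 = lambda x1, x2: x1 + x2 - 2.0
g = bifurcation_function(f1, f2, 0.0, 4.0)
root = solve_monotone(g, 0.0, 2.0)    # zero of the reduced scalar equation
```

Here the reduced function is \(g(x_2) = x_2^2 + x_2 - 2\), which is increasing on \([0,2]\), so its single zero corresponds to the single solution of the two-dimensional system in the box.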
6.5 Bifurcation Analysis
Having seen how to apply the Lyapunov–Schmidt reduction, we now re-introduce the parameters to f (and thus g). Given a bifurcation function \(g:{{\varvec{y}}}\times {{\varvec{p}}}\rightarrow {\mathbb {R}}\), we would like to study the maximal number of solutions \(g(y;p) = 0\) can have in \({{\varvec{y}}}\) for \(p \in {{\varvec{p}}}\).
Instead of fully resolving the details of all possible bifurcations, we will use the following simple observation:
Lemma 14
Let g be the bifurcation function as defined in Sect. 6.4. Assume that for some \(k \in {\mathbb {Z}}^+\) we have
Then for each \(p \in {{\varvec{p}}}\), the equation \(g(y;p) = 0\) has at most k solutions in \({{\varvec{y}}}\).
The rightmost part of Fig. 7 illustrates the case \(k=2\): the bifurcation function g then has a quadratic behaviour.
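The counting principle behind Lemma 14 can be tried out on polynomials. The sketch below assumes, as the discussion following the lemma indicates, that the hypothesis is \(0\notin g^{(k)}({{\varvec{y}}})\). It uses a naive interval evaluation in ordinary floating point, without outward rounding, so it is illustrative rather than rigorous; the function names are hypothetical.

```python
def imul(a, b):
    """Product of two intervals given as (lo, hi) tuples."""
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def iadd(a, b):
    """Sum of two intervals."""
    return (a[0] + b[0], a[1] + b[1])


def horner_interval(coeffs, y):
    """Enclose sum(coeffs[i] * y**i) over the interval y using Horner's scheme."""
    acc = (0.0, 0.0)
    for c in reversed(coeffs):
        acc = iadd(imul(acc, y), (c, c))
    return acc


def derivative(coeffs):
    """Coefficients of the derivative of a polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def smallest_k(coeffs, y, k_max=5):
    """Smallest k <= k_max such that the enclosure of g^(k) over y excludes 0."""
    d = list(coeffs)
    for k in range(1, k_max + 1):
        d = derivative(d)
        if not d:
            return None          # g^(k) is identically zero
        lo, hi = horner_interval(d, y)
        if lo > 0 or hi < 0:
            return k             # at most k zeros of g in y
    return None
```

For example, `smallest_k([0, -1, 0, 1], (-2.0, 2.0))` examines \(g(y) = y^3 - y\), which has three zeros in \([-2,2]\), and returns 3.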
Given a search region \({{\varvec{x}}}\times {{\varvec{p}}}\) for the original problem (23), we can try to find a positive integer k such that \(0\notin g^{(k)}({{\varvec{x}}}_i;{{\varvec{p}}})\). Here i can be any index for which the Lyapunov–Schmidt reduction works (note that then we have \({{\varvec{y}}}\subset {{\varvec{x}}}_i\)). If we succeed, we have an upper bound on the number of solutions to \(g(y;p) = 0\) in the region \({{\varvec{x}}}_i\times {{\varvec{p}}}\supset {{\varvec{y}}}\times {{\varvec{p}}}\). Note that this number translates to the original system of equations (23). By construction, the solutions to \(g(y;p) = 0\) are in one-to-one correspondence to those of \(f(x;p) = 0\).
For the planar circular restricted 4-body problem it turns out that we only have to consider the cases \(k=1,2,3\). The bifurcation function g never behaves worse than a cubic function. The actual evaluation of the increasingly complicated expressions \(g^{(k)}\) is achieved by automatic differentiation—a technique that automatically computes the (partial) derivatives of a given function, having access only to the algebraic expression of the function itself [7]. Furthermore, the Lyapunov–Schmidt reduction always succeeds in the case \((i,j) = (2,2)\), so we always have \(g = g(x_1) = f_1(x_1, x_2(x_1))\).
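To illustrate the idea of automatic differentiation, here is a minimal sketch of truncated Taylor arithmetic for a function of one variable. It is not the implementation used for the proof (which works with interval-valued quantities); the class name `Jet` and its methods are hypothetical, and only addition, subtraction and multiplication are supported.

```python
from math import factorial


class Jet:
    """Truncated Taylor polynomial c[0] + c[1]*h + ... + c[n]*h**n at a base point."""

    def __init__(self, coeffs):
        self.c = [float(v) for v in coeffs]

    @classmethod
    def variable(cls, x0, order):
        """The independent variable x = x0 + h, tracked up to the given order (>= 1)."""
        return cls([x0, 1.0] + [0.0] * (order - 1))

    def _lift(self, other):
        # Promote a plain number to a constant Jet of matching length.
        if isinstance(other, Jet):
            return other
        return Jet([other] + [0.0] * (len(self.c) - 1))

    def __add__(self, other):
        o = self._lift(other)
        return Jet([a + b for a, b in zip(self.c, o.c)])

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Jet([a - b for a, b in zip(self.c, o.c)])

    def __mul__(self, other):
        # Cauchy product, truncated at the tracked order.
        o = self._lift(other)
        n = len(self.c)
        return Jet([sum(self.c[i] * o.c[k - i] for i in range(k + 1)) for k in range(n)])

    __rmul__ = __mul__

    def derivatives(self):
        """Values f(x0), f'(x0), f''(x0), ... recovered from the Taylor coefficients."""
        return [factorial(k) * ck for k, ck in enumerate(self.c)]
```

Evaluating an expression such as `x * x * x + 2 * x` with `x = Jet.variable(2.0, 3)` propagates all derivatives up to order three through the arithmetic, using only the algebraic expression of the function.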
We end this section by describing how we ensure that the level set \(f_2(x) = 0\) forms exactly one connected component in \({{\varvec{x}}}\). This is an important part of the Lyapunov–Schmidt reduction, and allows us to obtain an upper bound on the number of solutions via Lemma 14. In our implementation, all operations and functions are extended to their interval-valued counterparts, as described in Sect. 6.1. First we verify that \((i,j) = (2,2)\) are suitable indices by checking that \(0 \notin \frac{\partial f_2}{\partial x_2}({{\varvec{x}}})\). Note that this condition prevents a component of \(f_2(x) = 0\) forming a closed loop inside \({{\varvec{x}}}\); each component must enter and exit \({{\varvec{x}}}\). Writing \({{\varvec{x}}}= [x_1^-, x_1^+]\times [x_2^-, x_2^+]\), we next verify that \(f_2(x_1^-, x_2^-)< 0 < f_2(x_1^+, x_2^+)\). This implies that \(f_2\) must vanish at least twice on the boundary of \({{\varvec{x}}}\). For each of the four sides \({{\varvec{s}}}_i\) \((i = 1,\dots ,4)\) of the rectangle \({{\varvec{x}}}\), we compute an enclosure of the zero set \({{\varvec{z}}}_i = \{x\in {{\varvec{s}}}_i:f_2(x) = 0\}\). On each such non-empty \({{\varvec{z}}}_i\), we check that \(f_2\) is strictly increasing in its non-constant variable. This implies that \(\cup _{i=1}^4 {{\varvec{z}}}_i\) must form two connected components \(w_1\) and \(w_2\). Each \(w_i\) is made up of either one (non-empty) zero set \({{\varvec{z}}}_j\) or two such sets joined at a corner of \({{\varvec{x}}}\). The level set \(f_2(x) = 0\) crosses each of \(w_1\) and \(w_2\) transversally, and exactly once. Thus we have proved that \(f_2(x) = 0\) forms exactly one connected component in \({{\varvec{x}}}\).
7 Computational Results
We will now describe the program used for proving the results of Sect. 5. Throughout all computations, we use the parameters \((s,t)\in {{\mathcal {P}}}= [0, \tfrac{1}{2}]\times [0, \tfrac{2}{3}]\) in place of the masses \((m_1, m_2, m_3)\in {{\mathcal {M}}}\), as described in Sect. 3. We also represent the phase space variable \(z\in {{\mathcal {C}}}\) in polar coordinates \((r, \theta ) \in [\tfrac{1}{3},2]\times [-\pi , \pi ]\), and desingularize the equations as described in Sect. 4. This transforms the original critical equation \(\nabla _z V(z;m) = 0\) into the equivalent problem \(F(r,\varphi ; s, t) = 0\), which is more suitable for computations.
The syntax of the program is rather straightforward:
- tol is the stopping tolerance used in the adaptive bisection process (used to discard subsets of the search region proved to have no zeros);
- We consider parameters \(s\in {{\varvec{s}}}\) satisfying minS \(\le s\le \) maxS.
- We consider parameters \(t\in {{\varvec{t}}}\) satisfying minT \(\le t\le \) maxT.
- strategy determines which method we use:
  1. Explicitly count all solutions in \({{\mathcal {C}}}\) (used for \({{\varvec{s}}}\times {{\varvec{t}}}\subset {{\mathcal {P}}}\) of very small width only).
  2. Verify that there are no bifurcations taking place in \({{\mathcal {C}}}\) for \((s,t)\in {{\varvec{s}}}\times {{\varvec{t}}}\subset {{\mathcal {P}}}\).
  3. Resolve all bifurcations taking place in \({{\mathcal {C}}}\) for \((s,t)\in {{\varvec{s}}}\times {{\varvec{t}}}\subset {{\mathcal {P}}}\).

The three strategies are based on Theorem 11, Theorem 12 together with Theorem 13, and Lemma 14, respectively.
When using strategy 1, no splitting in parameter space is carried out. Only the configuration space is adaptively subdivided during the search for isolated solutions. By contrast, when using strategy 2 or 3, splitting occurs in both parameter and configuration space. The splitting is carried out according to several different criteria in the code—all with the aim of locally verifying the conditions required by the theorems employed.
7.1 Proof of Theorem 7
We demonstrate the program by proving Theorem 7, running it on two different point-valued parameters (using strategy 1). The first one is for \((s,t) = (1/4, 1/4)\). This corresponds to the masses \(m = (1/16, 3/16, 12/16)\), and produces 8 solutions.
Here \(\texttt {smallList}\) is empty, signaling that the bisection stage was successful and had no unresolved subdomains. \(\texttt {noList}\) contains all boxes that were excluded using Theorem 10: there were 185 such boxes in this run. The eight elements of \(\texttt {yesList}\) are certified to contain unique zeros of f, according to Theorem 11. The same eight zeros are enclosed in the much smaller boxes stored in \(\texttt {tightList}\). Finally, the four boxes of \(\texttt {ndtList}\) are close to the lighter primaries, and are proven not to contain any zeros of f according to the exclusion results of Sect. 2.2. The output of this run is illustrated in Fig. 8a.
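The bookkeeping behind these lists can be mimicked in one dimension. The sketch below is only an analogy: an interval enclosure of \(f\) plays the role of the exclusion test, and monotonicity combined with a sign change at the endpoints stands in for the uniqueness test of Theorem 11. The test function \(x^3 - x\), the names `classify`, `no_list`, `yes_list`, `small_list`, and the absence of outward rounding are all assumptions made for illustration.

```python
def f(x):
    return x ** 3 - x


def F(lo, hi):
    """Interval enclosure of f on [lo, hi]; x**3 is increasing."""
    return (lo ** 3 - hi, hi ** 3 - lo)


def dF(lo, hi):
    """Interval enclosure of f'(x) = 3*x**2 - 1 on [lo, hi]."""
    if lo >= 0:
        sq = (lo * lo, hi * hi)
    elif hi <= 0:
        sq = (hi * hi, lo * lo)
    else:
        sq = (0.0, max(lo * lo, hi * hi))
    return (3 * sq[0] - 1, 3 * sq[1] - 1)


def classify(box, tol):
    """Adaptive bisection: sort sub-boxes into excluded, certified and unresolved."""
    no_list, yes_list, small_list = [], [], []
    stack = [box]
    while stack:
        lo, hi = stack.pop()
        f_lo, f_hi = F(lo, hi)
        if f_lo > 0 or f_hi < 0:
            no_list.append((lo, hi))        # enclosure excludes zero
            continue
        d_lo, d_hi = dF(lo, hi)
        monotone = d_lo > 0 or d_hi < 0
        a, b = f(lo), f(hi)
        if monotone and a * b < 0:
            yes_list.append((lo, hi))       # exactly one zero inside
            continue
        if monotone and a * b > 0:
            no_list.append((lo, hi))        # monotone without sign change
            continue
        if hi - lo < tol:
            small_list.append((lo, hi))     # unresolved at tolerance
            continue
        mid = 0.5 * (lo + hi)
        stack.append((lo, mid))
        stack.append((mid, hi))
    return no_list, yes_list, small_list


no_list, yes_list, small_list = classify((-2.0, 3.0), 1e-6)
```

As in the run described above, an empty `small_list` signals that every sub-box was either excluded or certified.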
The second run is for \((s,t) = (9/20, 3/5)\). This corresponds to \(m = (27/100, 33/100, 40/100)\), and produces 10 solutions. The output of this run is listed below, and is illustrated in Fig. 8b.
These two runs complete the proof of Theorem 7.
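The correspondence between \((s,t)\) and the masses quoted in the two runs is consistent with \(m = (st, (1-s)t, 1-t)\). The sketch below uses this formula as an assumption inferred from those two examples (the definition itself is given in Sect. 3), and the helper names `masses` and `is_ordered` are hypothetical. Exact rational arithmetic avoids any rounding in the check.

```python
from fractions import Fraction


def masses(s, t):
    """Masses (m1, m2, m3) for parameters (s, t); formula inferred from the worked examples."""
    s, t = Fraction(s), Fraction(t)
    return (s * t, (1 - s) * t, 1 - t)


def is_ordered(s, t):
    """True when m1 <= m2 <= m3; read here as the opposite of an 'unordered' parameter."""
    m1, m2, m3 = masses(s, t)
    return m1 <= m2 <= m3


first_run = masses(Fraction(1, 4), Fraction(1, 4))     # (1/16, 3/16, 3/4)
second_run = masses(Fraction(9, 20), Fraction(3, 5))   # (27/100, 33/100, 2/5)
```

Under this reading, a parameter such as \((s,t) = (1/10, 11/20)\) gives \(m_2 > m_3\) and would therefore be discarded.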
7.2 Proof of Theorem 8
Extending these results to larger domains in the (s, t)-parameter space, we turn to Theorem 8. In order to improve execution times, we pre-split the parameter domain into smaller subsets as illustrated in Fig. 9. This particular splitting is rather ad-hoc, and is based on some heuristic trial runs; other splittings would work fine too.
We begin with the parameter set \({{\mathcal {P}}}_1 = [0,0.5]\times [0,0.55]\) which is pre-split into four rectangles:
see Fig. 9b. Note that \(\tilde{{{\mathcal {P}}}_1} = {{\mathcal {P}}}_1\cap {\tilde{{{\mathcal {P}}}}}\) corresponds to the mass region above the bifurcation line in Fig. 3b: this region contains all small masses \(m_1\) and \(m_2\). Examining each of the four rectangles separately, we now use strategy number 2, which means that we are attempting to verify that no bifurcations take place for these parameter ranges. We begin with \((s,t)\in [0, 10^{-6}]\times [0, 10^{-6}]\):
We repeat the computation on the second rectangle \((s,t)\in [10^{-6},0.5]\times [0, 10^{-6}]\):
We repeat the computation on the third rectangle \((s,t)\in [0, 10^{-6}]\times [10^{-6}, 0.55]\):
Finally, we repeat the computation on the fourth rectangle \((s,t)\in [10^{-6},0.5]\times [10^{-6}, 0.55]\):
These four runs prove that \({{\mathcal {P}}}_1\) is bifurcation free, which establishes part (a) of Theorem 8. Here we also get a report of several encountered parameter regions that are unordered; they are a consequence of the reparametrization of the mass space, and belong to the set \({{\mathcal {P}}}{\setminus }{\tilde{{{\mathcal {P}}}}}\), see Fig. 6. Such parameters are automatically discarded. Furthermore, ndgList contains all boxes where we have established that no bifurcations are taking place. As we are encountering parameters corresponding to both \(m_1\) and \(m_2\) becoming (arbitrarily) small, there is a significant partitioning near the lighter primaries, resulting in many boxes in ndtList.
Turning to part (b) of Theorem 8, we continue with the parameter set \({{\mathcal {P}}}_2 = [0,0.5]\times [0.58, 0.67]\). Note that \({{\mathcal {P}}}_2\cap {\tilde{{{\mathcal {P}}}}}\) corresponds to the mass region below the bifurcation line in Fig. 3b. This region contains the point of equal masses. Again we use strategy number 2.
This run proves that \({{\mathcal {P}}}_2\) is bifurcation free, as claimed. Note that the computational effort was smaller than that for \({{\mathcal {P}}}_1\). This is due to the fact that we are not considering any parameters corresponding to small masses in this run. We have now completed the proof of Theorem 8.
7.3 Proof of Theorem 9
Turning to Theorem 9, we focus on the only remaining parameter set \({{\mathcal {P}}}_3 = [0,0.5]\times [0.55, 0.58]\), which has been constructed to contain the entire line of bifurcation, see Fig. 6b. We now use strategy number 3, which means that we will locate and resolve all occurring bifurcations.
It turns out that all bifurcations take place within a small subset \({{\mathcal {C}}}_2\) of the configuration space. In the complement \({{\mathcal {C}}}_1 = {{\mathcal {C}}}{\setminus }{{\mathcal {C}}}_2\), the solutions are non-degenerate and persist for all parameter values in \({{\mathcal {P}}}_3\). In light of this, the program splits the configuration space into two parts: \({{\mathcal {C}}}= {{\mathcal {C}}}_1\cup {{\mathcal {C}}}_2\) which are examined separately. For purely technical reasons \({{\mathcal {C}}}_1\) is further divided into three subregions (named C11, C12, C13 in the output).
As explained above, \({{\mathcal {P}}}_3\) is pre-split into three rectangles:
see Fig. 9b.
We begin with \((s,t)\in [0, 0.2]\times [0.55, 0.58]\):
This run is not so interesting: there are no bifurcations taking place in the parameter region. Indeed, most of these parameters are unordered, and are discarded immediately. Apart from these, there are some parameters that are provably bifurcation-free (we are actually using the techniques of strategy 2 as part of the computation). No higher order bifurcations take place at all.
The next two parameter regions are much more challenging. We continue with \((s,t)\in [0.2, 0.25]\times [0.55, 0.58]\):
Let us now explain what we can learn from this run.
The first part of the run verifies that no bifurcations take place within \({{\mathcal {C}}}_1\) for parameters in \({{\mathcal {P}}}_3\): we use strategy 2 to establish this fact. Figure 10a illustrates the outcome of this part of the computations.
The second part of the run focuses on the remaining portion of the configuration space \({{\mathcal {C}}}_2\) (named \(\texttt {C0}\) in the output); in polar coordinates \({{\mathcal {C}}}_2 = [\tfrac{1}{3},1]\times [\tfrac{2}{10}, \tfrac{7}{10}]\). Throughout these computations, the parameter region \({{\mathcal {P}}}_3\) is adaptively bisected into many smaller sets, each of which is examined by various tests. When successfully classified, a parameter set is stored in one of several lists. Together with each such parameter set, we also store a covering of all possible solutions in \({{\mathcal {C}}}_2\). The lists are organized as follows:
1. s0List: the parameters are unordered (we do not consider these).
2. s1List: the number of solutions is at most one and fixed (no bifurcations) in \({{\mathcal {C}}}_2\).
3. s111List: there are three connected components in \({{\mathcal {C}}}_2\); in each one of them the number of solutions is at most one and fixed (no bifurcations).
4. s210List: there are two connected components in \({{\mathcal {C}}}_2\) where solutions may reside. For one of these the number of solutions is at most one and fixed (no bifurcations). For the second component there can be 0, 1, or 2 solutions (a quadratic bifurcation).
5. s300List: there is one connected component in \({{\mathcal {C}}}_2\) where the number of solutions can be 1, 2 or 3 (a cubic bifurcation).
We also mention that s3List is the union of s111List, s210List, and s300List.
Our final computation deals with the region \((s,t)\in [0.25, 0.5]\times [0.55, 0.58]\):
The main difference from the previous run is that there are no cubic bifurcations in this parameter region, only quadratic ones. Indeed, s300List is empty, but s210List is not.
In summary, the classifications of the parameter sets corresponding to the lists produced by these three runs are illustrated in Fig. 11.
The data presented in Fig. 11 (and Fig. 12) combined with our previous results provide all information needed to get an accurate count on the number of solutions.
First, note that the parameters in s1List form a connected set having a non-empty intersection with \({{\mathcal {P}}}_1\). We can therefore use part (a) of Theorem 8 to conclude that each parameter in s1List will yield exactly eight solutions in \({{\mathcal {C}}}\): seven in \({{\mathcal {C}}}_1\) and one in \({{\mathcal {C}}}_2\).
Similarly, we note that the parameters in s111List form a connected set having a non-empty intersection with \({{\mathcal {P}}}_2\). Using part (b) of Theorem 8 we conclude that out of the ten solutions in \({{\mathcal {C}}}\), seven reside in \({{\mathcal {C}}}_1\) and the remaining three belong to \({{\mathcal {C}}}_2\).
Turning to s210List, these parameters form a connected set having a non-empty intersection with both s1List and s111List. The bifurcation-free connected component of \({{\mathcal {C}}}_2\) detected for all parameters in s210List must, by continuity, carry exactly one solution in \({{\mathcal {C}}}_2\). Thus each parameter in s210List yields 1, 2 or 3 solutions in \({{\mathcal {C}}}_2\).
Finally, we discuss the parameters in s300List. Without additional information, the condition used to indicate the presence of a cubic bifurcation does not exclude there being no solutions; it only bounds the number of solutions from above by three. We do, however, guarantee the existence of at least one solution for elements of s300List via an extra topological check performed during the computations.
As mentioned above, each parameter set stored in s300List comes equipped with a cover of the associated solution set \(\{{{\varvec{c}}}_i\}_{i=1}^N\) making up a connected subset of \({{\mathcal {C}}}_2\). Forming the rectangular hull \({{\varvec{c}}}\) of all \({{\varvec{c}}}_i\), \(i=1,\dots ,N\) we prove that there must be a zero of f inside \({{\varvec{c}}}\) (and thus inside \({{\mathcal {C}}}_2\)) using the following topological theorem.
Theorem 15
Let \(f:[x^-, x^+]\times [y^-, y^+]\rightarrow {\mathbb R}^2\) be a continuous function (with components \(f_1\) and \(f_2\)), and assume that the following holds:
1. Both \(f_1\) and \(f_2\) are negative on the two sides \(\{x^-\}\times [y^-, y^+]\) and \([x^-, x^+]\times \{y^-\}\).
2. Both \(f_1\) and \(f_2\) are positive at the upper-right corner \((x^+, y^+)\).
3. \(\max \{x\in [x^-, x^+] :f_1(x,y^+) = 0\} < \min \{x\in [x^-, x^+] :f_2(x,y^+) = 0\}\).
4. \(\max \{y\in [y^-, y^+] :f_2(x^+,y) = 0\} < \min \{y\in [y^-, y^+] :f_1(x^+,y) = 0\}\).
Then f has (at least) one zero in \([x^-, x^+]\times [y^-, y^+]\).
Of course, the theorem remains true when we reverse all appearing signs, or interchange the function components. We can also relax assumption 1 and simply demand that \(f_1\) is negative on \([x^-, x^+]\times \{y^-\}\), and \(f_2\) is negative on \(\{x^-\}\times [y^-, y^+]\). We keep the current (stronger) assumptions as they actually hold for the problem at hand. A typical realization of these assumptions is illustrated in Fig. 13.
Proof
By the intermediate value theorem, assumptions 1 and 2 imply that both \(f_1\) and \(f_2\) change sign (and therefore vanish) somewhere along the two sides \([x^-, x^+]\times \{y^+\}\) and \(\{x^+\}\times [y^-, y^+]\). Therefore the sets appearing in assumptions 3 and 4 are non-empty, and the inequalities can be checked. Now, let \(x^\star \) satisfy \(\max \{x\in [x^-, x^+] :f_1(x,y^+) = 0\}< x^\star < \min \{x\in [x^-, x^+] :f_2(x,y^+) = 0\}\). Similarly, let \(y^\star \) satisfy \(\max \{y\in [y^-, y^+] :f_2(x^+,y) = 0\}< y^\star < \min \{y\in [y^-, y^+] :f_1(x^+,y) = 0\}\). Then, by a continuous deformation, we can form a new rectangle with corners \((x^-,y^-), (x^\star , y^+), (x^+, y^+)\), and \((x^+, y^\star )\) on which we can directly apply the Poincaré-Miranda theorem, see [14]. It follows that f has a zero in the original rectangle. \(\square \)
As all assumptions of the theorem are open, we can extend it to the case when f depends on (set-valued) parameters. This is what we use for the parameters forming \(\texttt {s300List}\).
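The four assumptions of Theorem 15 are easy to probe numerically on a concrete map. The following sketch samples the boundary of the rectangle on a uniform grid and locates zeros by sign changes, so it is a sanity check rather than a proof; a rigorous version would replace sampling by interval enclosures. The helper names and the linear test functions are hypothetical.

```python
def grid(a, b, n):
    """Uniform grid of n + 1 points on [a, b]."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def approx_zeros(h, a, b, n=1000):
    """Approximate zeros of h on [a, b], located by sign changes on a grid."""
    ts = grid(a, b, n)
    vals = [h(t) for t in ts]
    zeros = [t for t, v in zip(ts, vals) if v == 0]
    zeros += [0.5 * (t0 + t1)
              for t0, t1, v0, v1 in zip(ts, ts[1:], vals, vals[1:])
              if v0 * v1 < 0]
    return sorted(zeros)


def check_assumptions(f1, f2, xm, xp, ym, yp, n=1000):
    """Sampled check of assumptions 1-4 of Theorem 15 on [xm, xp] x [ym, yp]."""
    xs, ys = grid(xm, xp, n), grid(ym, yp, n)
    a1 = (all(f(xm, y) < 0 for f in (f1, f2) for y in ys)
          and all(f(x, ym) < 0 for f in (f1, f2) for x in xs))
    a2 = f1(xp, yp) > 0 and f2(xp, yp) > 0
    top1 = approx_zeros(lambda x: f1(x, yp), xm, xp, n)
    top2 = approx_zeros(lambda x: f2(x, yp), xm, xp, n)
    right1 = approx_zeros(lambda y: f1(xp, y), ym, yp, n)
    right2 = approx_zeros(lambda y: f2(xp, y), ym, yp, n)
    a3 = bool(top1) and bool(top2) and max(top1) < min(top2)
    a4 = bool(right1) and bool(right2) and max(right2) < min(right1)
    return a1 and a2 and a3 and a4


# Illustrative map on the unit square; its zero is at x = y = 2.2 / 3.
f1 = lambda x, y: x + 2 * y - 2.2
f2 = lambda x, y: 2 * x + y - 2.2
ok = check_assumptions(f1, f2, 0.0, 1.0, 0.0, 1.0)
```

For this map the zeros on the top side are near \(x = 0.2\) (for \(f_1\)) and \(x = 0.6\) (for \(f_2\)), and on the right side near \(y = 0.2\) (for \(f_2\)) and \(y = 0.6\) (for \(f_1\)), so the ordering required by assumptions 3 and 4 holds.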
7.4 Timings
We end this section by reporting the timings of all computations. These were carried out sequentially on a laptop, using a single thread on an Intel Core i7-7500U CPU running at 2.70 GHz. The memory requirements are very low, and are not reported. For all computations we used the same splitting tolerance: \(\texttt {tol} = 10^{-6}\). The total wall time amounts to 16m:29s. This can of course be massively reduced by a finer pre-splitting in parameter space, combined with a simple script for parallel execution (Table 1).
8 Conclusions and Future Work
We have demonstrated a novel way to account for all relative equilibria in the planar, circular, restricted 4-body problem, and used it to give a new proof of the results by Barros and Leandro (Theorem 6). The novelty of our approach is that it does not rely upon any algebraic considerations; it is purely analytic. As such it is completely insensitive to the exact shape of the gravitational potential, and generalizes to a wider range of problems. The main advantage, however, is that our method is amenable to set-valued computations, and as such can use the mature techniques and machinery available to solve non-linear equations with computer-assisted methods. This, in turn, gives us a realistic expectation that our approach is transferable to harder instances of the n-body problem, and our next challenge will be to work on the unrestricted 4-body problem, where we expect the number of relative equilibria to range between 32 and 50 (depending on the masses of the four primaries).
Notes
A restricted n-body problem is said to be of type \(m+k\) if there are m bodies with positive mass and k weightless bodies, and \(n = m+k\).
References
Albouy, A., Kaloshin, V.: Finiteness of central configurations of five bodies in the plane. Ann. Math. 176, 535–588 (2012)
Arenstorf, R.F.: Central configurations of four bodies with one inferior mass. Celest. Mech. 28, 9–15 (1982)
Barros, J., Leandro, E.: The set of degenerate central configurations in the planar restricted four-body problem. SIAM J. Math. Anal. 43, 634–661 (2011)
Barros, J., Leandro, E.: Bifurcations and enumeration of classes of relative equilibria in the planar restricted four-body problem. SIAM J. Math. Anal. 46, 03 (2014)
Euler, L.: De motu rectilineo trium corporum se mutuo attrahentium. Novi Comm. Acad. Sci. Imp. Petrop. 11, 144–151 (1767)
Gannaway, J.R.: Determination of all central configurations in the planar four-body problem with one inferior mass. Ph.D. thesis, Vanderbilt University (1981)
Griewank, A., Walther, A.: Evaluating Derivatives, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2008)
Hampton, M., Moeckel, R.: Finiteness of relative equilibria of the four-body problem. Invent. Math. 163, 289–312 (2006)
Kapela, T., Mrozek, M., Wilczak, D., Zgliczyński, P.: CAPD::DynSys: a flexible C++ toolbox for rigorous numerical analysis of dynamical systems. Commun. Nonlinear Sci. Numer. Simul. 101, 105578 (2020)
Krawczyk, R.: Newton-Algorithmen zur Bestimmung von Nullstellen mit Fehlerschranken. Computing 4, 187–201 (1969)
Kulevich, J., Roberts, G., Smith, C.: Finiteness in the planar restricted four-body problem. Qual. Theory Dyn. Syst. 8, 357–370 (2010)
Lagrange, J.L.: Essai sur le problème des trois corps. Œuvres 6, 229–324 (1772)
Lindow, M.: Ein Spezialfall des Vierkörperproblems. Astron. Nachr. 216, 389–408 (1922)
Miranda, C.: Un’osservazione su un teorema di Brouwer. Consiglio Nazionale delle Ricerche, Lamezia Terme (1940)
Moore, R.E.: Interval Analysis, vol. 4. Prentice-Hall, Englewood Cliffs (1966)
Moulton, F.R.: The straight line solutions of the problem of \(n\) bodies. Ann. Math. 12, 1–17 (1910)
Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press, Cambridge (1990)
Palmore, J.I.: Collinear relative equilibria of the planar n-body problem. Celest. Mech. 28, 17–24 (1982)
Pedersen, P.: Librationspunkte im restringierten Vierkörperproblem. Dan. Mat. Fys. Medd. 21, 6 (1944)
Roberts, G.E.: A continuum of relative equilibria in the five-body problem. Physica D 127(3–4), 141–145 (1999)
Simó, C.: Relative equilibrium solutions in the four body problem. Celest. Mech. 18, 165–184 (1978)
Tucker, W.: Validated Numerics: A Short Introduction to Rigorous Computations. Princeton University Press, Princeton (2011)
Vincent, A.J.H.: Sur la résolution des equations numériques. J. Math. Pures Appl. 10, 341–372 (1836)
Xia, Z.: Central configurations with many small masses. J. Differ. Equ. 91(1), 168–179 (1991)
Acknowledgements
We are very grateful to Professor Carles Simó for bringing this problem to our attention and for fruitful discussions. W.T. has been partially supported by VR Grant 2013-4964. P.Z. has been partially supported by the NCN Grant 2019/35/B/ST1/00655. J-Ll.F. has been partially supported by VR Grant 2019-04591.
Funding
Open access funding provided by Uppsala University.
Ethics declarations
Conflict of interest
The conflict of interest and data sharing are not applicable to this article as no datasets were generated or analysed during the current study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Bounding the Number of Small Masses
The goal of this appendix is to prove Theorem 13.
From the discussion in Sect. 2.2 (see (4)) we know that no relative equilibrium exists close to \(p_3\), while for small \(m_1\) or \(m_2\) the relative equilibria can exist arbitrarily close to \(p_1\) and \(p_2\) (this is proved in this “Appendix”). If we relax the requirement that \(m_1 \le m_2\), then it is enough to investigate the neighborhood of \(p_1\) and the same conclusions will be valid by symmetry close to \(p_2\).
We will investigate the neighborhood of \(p_1\) when \(m_1 \in (0,M]\) and \(0<m_2<1\) with \(M=10^{-2}\). This will later be split into two cases: when the second mass \(m_2\) is away from zero (\(m_2 \ge M =10^{-2}\)) and when \(m_2 \in [0,M]\). The difference between these two cases is that in the former we can control all four solutions in a neighbourhood of \(p_1\), while in the latter two of the four can escape the neighbourhood (while the other two always stay in it). Technically this is related to the behaviour of the eigenvalues of the Hessian of the nonsingular part of V at \(p_1\). In the first case both eigenvalues are separated from 0, and in the second case the smaller eigenvalue approaches zero when \(m_1,m_2 \rightarrow 0\).
1.1 The Separation of Eigenvalues at \(p_1\) for Non-singular Part of \(V\)
Let
The following straightforward lemma will be used during the entire discussion.
Lemma 16
The matrix \(D{\tilde{V}}(p_1)\) is positive definite. Moreover, if \(m_1 \le 1/2\) (in our case it is true because \(m_3 \ge m_1\) and \(m_3 + m_1 \le 1\)), then its eigenvalues \(\lambda _1 > \lambda _2\) are given by
and satisfy the following inequalities
The proof of this lemma is straightforward. It is important to notice that if \(m_1,m_2 \rightarrow 0\), then \(\lambda _2 \rightarrow 0\), but if \(m_2>\epsilon \) then \(\lambda _2\) stays away from 0. Note also that \(\lambda _1-\lambda _2\) has a positive lower bound independent of the size of \(m_1\) and \(m_2\). Moreover, with the help of the code Bounds_eigenvalues and under the assumption that \(m_2, m_3 \ge 10^{-2}\) we can obtain sharper bounds: \(\lambda _1-\lambda _2<2.95643\), \(1.62287< \lambda _1 < 2.97843\), \(0.02167< \lambda _2 < 0.87695\). Also, if we allow \(m_2, m_3\in [0,1]\) then we have \(\lambda _1 \in [1,3]\) and \(\lambda _2\in [0,1]\).
1.2 Estimates for Higher Order Terms in \(V\)
We shift the coordinate origin to \(p_1\). Then V(x, y) can be written as follows
where \(r=\Vert (x,y)\Vert \) and h is analytic with respect to all variables and is \(O(\Vert (x,y)\Vert ^3)\) uniformly with respect to the masses. In fact h depends linearly on masses \(m_2\) and \(m_3\). The term V(0, 0) will be ignored.
For any \(R >0\) there exist \(C_1=C_1(m_2, m_3, x, y)\) and \(C_2=C_2(m_2,m_3, x, y)\) such that
for \(\Vert (x,y)\Vert \le R\).
We obtain quantitative instances of the upper bounds (34) and (35) as follows: we set \(R=2\) (from the discussion in Sect. A.1 it follows that for any \(i=1,2,3\) all relative equilibria are contained in \({\overline{B}}(p_i,R)\).) By performing automatic differentiation of the function h(x, y) for all (x, y) with \(\Vert (x,y)\Vert \le 10^{-4}\), the first one is three times the sum of all Taylor coefficients of order 3, and the second one is six times the sum of all Taylor coefficients of order 3. Hence, using the code Bounds_h and under the assumption of \(0\le m_2, m_3\le 1\), and \(\Vert (x,y)\Vert \le 10^{-3}\), we obtain the bounds
1.2.1 Estimates in Polar Coordinates
Using the polar coordinates we obtain the following expressions for partial derivatives of \(h(r,\varphi )\)
Let
In a more geometric way we can express the above partial derivatives as follows
Therefore we have the following estimates for the partial derivatives of h with respect to polar coordinates.
Lemma 17
Assume that (34, 35) hold for \(r \le R\). Then the following estimates are satisfied for \(r \le R\)
Lemma 18
The same assumptions as in Lemma 17. Let us set
Then we have for \(r \le R\)
1.3 Our System in Polar Coordinates Near \({p}_{1}\)
By an orthogonal change of variables V we can diagonalize \(D{\widetilde{V}}(0)\), hence V can be written as follows (compare (33))
Observe that \(\lambda \) and a depend on mass parameters (see Lemma 16) and the same holds for the coordinate change. Bounds on h and its derivatives uniform with respect to masses have been obtained for \(r<R\) in previous subsections.
In polar coordinates our potential (39) becomes (we set \(m=m_1\))
and
We will study the system
It turns out that it is relatively easy to solve (46), i.e. to find four curves \(\varphi (r)\) such that all solutions of (46) are of the form \((r,\varphi (r))\). This is done in the next subsection for \(0<m_2<1\) and \(m_1\) small enough.
Then we study the equation \(\frac{\partial V}{\partial r}(r,\varphi (r))=0\) on each of these curves. This is the place where we will split our considerations into two cases: \(m_2\) bounded from below and both \(m_1,m_2 \rightarrow 0\). The second case is much more subtle.
1.4 Solving \(\frac{\partial V}{\partial \varphi }(r,\varphi )=0\) for \(\varphi (r)\)
Our strategy for studying the solutions of \(\frac{\partial V}{\partial \varphi }(r,\varphi (r))=0\) in the set \(\{(r,\varphi ): r \le R, \varphi \in [0,2\pi ] \}\) is as follows. We first try to find intervals, as large as possible, on which solutions are excluded; for this it is enough to have (see (42) and Lemma 18)
and then on the complementary intervals we want \(\frac{\partial ^2 V}{\partial \varphi ^2}\) to be either positive or negative on the whole interval; this is guaranteed by the following inequality (see (44) and Lemma 18)
Remark 19
Recall that all these computations require \(R\le 2\), since this is the range in which the existence of the constants \(C_1, C_2\) has been proved.
To deal with (48) let us set
For this to make sense we must have that
which in our case (\(C_1=9.68850\), see (36), and \(\lambda -a > \frac{3}{4}\)) is satisfied when \(R< 0.03952\).
We define the following sets (intervals) in \(\varphi \)
These sectors are positioned as follows as we move in the direction of increasing \(\varphi \): \(J^+_0\), \(N_0\), \(J^-_0\), \(N_1\), \(J^+_1\), \(N_2\), \(J^-_1\), \(N_3\), \(J^+_0\), .... The meaning of ± in J symbol is the sign of \(\cos (2 \varphi )\).
Lemma 20
Assume that assumptions of Lemma 17 are satisfied. Assume that
Then the equation
has a unique solution for \(r \le R\) in each of the intervals \(J_{0,1}^\pm \). These are all solutions of (57) with \(r \le R\).
The solution curves of (57) satisfy
where \(\delta (r)\) is a different function for each branch and satisfies the following estimate
In particular, by assuming \(C_1=9.68850\), \(C_2=19.3770\) (see (36)), \(m_1\le 10^{-2}\), \(m_2\in [0,1]\) and that \(\lambda -a>\frac{3}{4}\) we obtain that R must be less than 0.02193 and that \(|\delta (r)|\le 0.0259 \) for all \(|r|\le 10^{-3}\).
Proof
Observe first that (56) implies condition (51) and hence each of the intervals \(J^+_0\), \(N_0\), \(J^-_0\), \(N_1\), \(J^+_1\), \(N_2\), \(J^-_1\), \(N_3\) is nonempty and we will show that the following conditions are satisfied
and
To establish (62, 63) we need condition (49) to hold for \(\varphi \in J^+_0 \cup J^+_1 \cup J^-_0 \cup J^-_1 \).
It is easy to see that it will hold iff
From (50) we know the value of \(\sin (\alpha )\), therefore (64) is equivalent to the following chain of inequalities
This is condition (56). Monotonicity in \(J_{0,1}^\pm \) and opposite signs on the end points give us the existence of one branch of \(\varphi (r)\) in each sector.
In fact in the above reasoning we can use any \(r \le R\) to define \(\alpha =\alpha (r)=\arcsin \left( \frac{2r C_1}{\lambda -a}\right) \) and define the sectors \(J_{0,1}^\pm \) using this \(\alpha \). In this way we obtain that
From this and the series expansion of \(\arcsin x\) we obtain estimates of \(\delta (r)\). \(\square \)
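As a numerical sanity check of the constants quoted in Lemma 20, the following sketch evaluates \(\alpha (r)=\arcsin \left( \frac{2r C_1}{\lambda -a}\right) \) at \(r=10^{-3}\), using the lower bound \(\frac{3}{4}\) for \(\lambda -a\), and compares it with a truncated arcsine series. Reading the bound \(|\delta (r)|\le 0.0259\) as a bound on \(\alpha (10^{-3})\) is our assumption; the function names are illustrative, and plain floating point is used rather than rigorous interval arithmetic.

```python
import math

C1 = 9.68850            # constant C_1 quoted from (36)
LAMBDA_MINUS_A = 0.75   # lower bound for lambda - a used in the text


def alpha(r, c1=C1, gap=LAMBDA_MINUS_A):
    """alpha(r) = arcsin(2 r C_1 / (lambda - a)), evaluated with the lower bound for the gap."""
    x = 2.0 * r * c1 / gap
    if not 0.0 <= x <= 1.0:
        raise ValueError("arcsin argument outside [0, 1]; r is too large")
    return math.asin(x)


def arcsin_series(x, terms=4):
    """Truncated Maclaurin series: arcsin x = x + x^3/6 + 3 x^5/40 + ..."""
    total = 0.0
    for k in range(terms):
        coeff = math.factorial(2 * k) / (4**k * math.factorial(k) ** 2 * (2 * k + 1))
        total += coeff * x ** (2 * k + 1)
    return total


alpha_at_R1 = alpha(1e-3)
series_at_R1 = arcsin_series(2.0 * 1e-3 * C1 / LAMBDA_MINUS_A)
```

Since \(\alpha \) grows as \(\lambda -a\) shrinks, evaluating at the lower bound \(\frac{3}{4}\) gives the worst case over the admissible range.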
Now we estimate the derivative \(\varphi '(r)\).
Lemma 21
Assume the same hypotheses as in Lemma 20. Then, with \(R_1 = 10^{-3}\), for all \(r\le R_1\) and all branches of \(\varphi (r)\) it holds that
Proof
From (44), (45) and Lemma 17 it follows that
Now, using the fact that \(\left| \sin (2\varphi )\right| \le \frac{2 C_1 R_1}{\lambda -a}\) and \(|\cos (2\varphi )|=\sqrt{1-\sin (2\varphi )^2}\) we obtain the desired inequality. Finally, by plugging in the same constants as in Lemma 20 we obtain the numerical upper bound. \(\square \)
1.5 The Case of \(m_1\rightarrow 0\) with \(m_2\) Bounded from Below
Here we treat the case \(0< m_1 \le M\) and \((x,y) \in {\overline{B}}(0,R)\) in the configuration space (we will see below that \(M=10^{-2}\) and \(R=10^{-3}\) suffice), with the goal of obtaining \(R>0\), independent of \(m_1 \in (0,M]\), for which we know that there are at most four solutions (central configurations) and that all of them are non-degenerate.
We work with the representation of potential V given by (40). In that setting
where \(\lambda _1\), \(\lambda _2\) are as in Lemma 16 and for \(m_2 \ge 10^{-2}\) we have the bounds
1.5.1 Solving \(\frac{\partial V}{\partial r}(r,\varphi (r))=0\)
On curves \(\varphi (r)\) in \(J_{0,1}^+\) (i.e. \(\cos (2\varphi )>0\)) we have from (41) and Lemma 18
with \(S(r)=2 A(r) (\delta (r))^2\), where \(A(r)\in [-1,1]\). (All these come from expanding \(\cos \) around \(\varphi (0)\) up to order 2.) Notice that
with \(r\le R\).
With the same assumptions and constants as the ones from Lemma 20 we obtain the numerical upper bound \({|L(r)|\le 9.68851}\). So,
For the derivative we obtain (we use (44),(45), Lemmas 17, 20 and 21)
with \(A(r)\in [-1,1]\). So we have
for \(r\le R\).
With the same assumptions and constants as the ones from Lemma 20 we obtain the numerical upper bound \({|T(r)|\le 21.33076}\). So,
Notice that this last interval expression is always positive because \(\lambda >21.33076\cdot 10^{-3}\).
Similarly as before, on curves in \(J_{0,1}^-\) (i.e. \(\cos (2\varphi )<0\)) we have from (41) and Lemma 18
with \(S(r)=2 A(r) (\delta (r))^2\), where \(A(r)\in [-1,1]\). (All these come from expanding \(\cos \) around \(\varphi (0)\) up to order 2.) Hence, the bound (68) applies here as well.
With the same assumptions and constants as the ones from Lemma 20 we obtain the numerical upper bound \({|L(r)|\le 9.68851}\). So,
For the derivative we obtain (see the derivation of (69))
with \(A(r)\in [-1,1]\). Hence, the same bound (70) for T works here.
With the same assumptions and constants as the ones from Lemma 20 we obtain the numerical upper bound \({|T(r)|\le 21.33076}\). So,
In particular, the interval expression above is positive, since \(a>0.02167>{21.33076\cdot 10^{-3}}\).
1.5.2 The Solutions are Non-degenerate
We have proven above that
does not vanish in the range \(m_1 \in (0,10^{-2}]\), \(m_2\in [10^{-2},1)\) and \(r\le 10^{-3}\). This suffices to see that any critical point of V is non-degenerate. In fact, this is equivalent to the Hessian having full rank, since
Hence \(\det D^2V (r,\varphi (r)) \ne 0\).
1.6 The Case of \(m_1, m_2\rightarrow 0\)
In this section we study the critical points in a neighbourhood of \(p_1\) with masses \(m_1, m_2\le M\) (as stated below, \(M=10^{-2}\)). To do so, we use polar coordinates \((r, \varphi )\) centered at \(p_1\). Hence, we will study the system of equations
In Sect. A.4 the equation \(\frac{\partial V}{\partial \varphi }(r,\varphi )=0\) was solved for \(\varphi \), obtaining four curves \((r,\varphi (r))\) on which we have uniform estimates over the whole range of \((m_1,m_2)\), including (0, 0). These bounds were produced in a coordinate system in which \(D^2{\tilde{V}}(p_1)\) is diagonal.
This time we work in another coordinate system (this choice simplifies the computations)
With this choice of axes, the Hessian matrix \(D^2{\tilde{V}}(p_1)\) in \((r,\varphi )\)-coordinates has the following form for \(m_1=m_2=0\)
The second eigenvalue \(a=\lambda _2=0\) is the reason why we cannot apply the tools from the previous subsection. As a consequence, when studying the equation \(\frac{\partial V}{\partial r}(r,\varphi (r))=0\) we will find that the number of solutions is more subtle and, in some cases, depends on the ratio \(\frac{m_1}{m_2}\).
1.6.1 Some Useful Formulae
In polar coordinates we have
Since we are interested in the stationary solutions of \(\nabla V(r,\varphi )\), in the following computations we will drop the terms in \(\Vert z-c\Vert ^2\) which do not depend on \((r,\varphi )\). Hence, the important part of \(\Vert z-c\Vert ^2\) is
Let us set
Using \(m_3=1-m_1-m_2\) we separate the potential V into several parts as follows
where
It turns out that the point \(p_1\) is a critical point for the potential
for any \(m_1,m_2\), hence obtaining that (z here denotes cartesian coordinates)
1.6.2 More Useful Formulae
To obtain a formula for \(r_2(r,\varphi )\) or \(1-\frac{1}{r_2^3}\) it is enough to do the substitution \(\varphi \rightarrow \varphi + \pi /3\) in the above expressions.
From (77) we obtain
From (78) we have
From (79) we have
A nicer and better organized expression for \( \frac{\partial V_2}{\partial \varphi }\) is
We will also need second derivatives. From (82, 83) we obtain
Third derivatives:
For the future use observe that we can factor \(r^2\) from \(\frac{\partial V_0}{\partial \varphi }\) as follows (see (81))
1.6.3 Solving \(\frac{\partial V}{\partial \varphi }(r,\varphi )=0\) for \(\varphi (r)\)
As we will see in the next subsections, the equation \(\frac{\partial V}{\partial \varphi }(r,\varphi )=0\) has four continuous solution curves \(\varphi (r)\), each satisfying \(\varphi (r)=\varphi (0)+O(r)\) with \(\varphi (0)\in \left\{ 0, \frac{\pi }{2}, \pi , \frac{3\pi }{2}\right\} \). This is very similar to the result in Lemma 20. However, we obtain it differently: it is enough to study the problem for \(\frac{\partial V_0}{\partial \varphi }(r,\varphi )=0\), since the other terms are perturbative in the two small masses \(m_1, m_2\).
1.6.4 Equation \(\frac{\partial V_0}{\partial \varphi }=0\)
From (82) we see that the solution of \(\frac{\partial V_0(r,\varphi )}{\partial \varphi }=0\) is given by \(\sin \varphi =0\) which is \(\varphi =0\) or \(\varphi =\pi \) (these are the solutions "collinear" with the large body at \(p_3\)) or by
which is equivalent to (we drop the solution \(r=0\))
There are two branches of solutions of (91), denoted by \(\varphi _0^\pm (r)\). The series expansion for the first branch is
and for the other branch
Actually, we have a more accurate result.
Lemma 22
From code Bound_1 we obtain for \(r \in [0,R]\), \(R=10^{-1}\), that
Observe that \(\varphi _0^+(r)-\varphi _0^+(0)=-(\varphi _0^-(r)-\varphi _0^-(0))\). This is implied by \((r,\varphi ^\pm (r))\) being just a parametrisation of the circle \(r_3=1\).
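The symmetry of the two branches and the size of the higher-order remainder can be checked numerically. The sketch below assumes that \(p_3\) lies at distance 1 from \(p_1\) in the direction \(\varphi =\pi \), so that \(r_3^2 = 1+2r\cos \varphi +r^2\) and the circle \(r_3=1\) is \(\cos \varphi =-r/2\). This placement is our inference from the expansions \(\frac{\pi }{2}+\frac{r}{2}+\frac{r^3}{48}+\dots \) and \(\frac{3\pi }{2}-\frac{r}{2}-\frac{r^3}{48}-\dots \) stated in Theorem 25, not a formula quoted from the paper. The function names are illustrative, and the computation is plain floating point, not the rigorous code Bound_1.

```python
import math


def r3_squared(r, phi):
    """Squared distance to p_3, ASSUMING p_3 sits at polar angle pi, distance 1 from p_1."""
    return 1.0 + 2.0 * r * math.cos(phi) + r * r


def phi0_plus(r):
    """Branch through pi/2: cos(phi) = -r/2, i.e. phi = pi/2 + arcsin(r/2)."""
    return math.pi / 2 + math.asin(r / 2)


def phi0_minus(r):
    """Branch through 3*pi/2: the mirror image of phi0_plus."""
    return 3 * math.pi / 2 - math.asin(r / 2)


def quartic_remainder(r):
    """(phi0_plus(r) - pi/2 - r/2 - r^3/48) / r^4, computed without subtracting pi/2."""
    return (math.asin(r / 2) - r / 2 - r**3 / 48) / r**4


samples = [0.05, 0.1]
on_circle = [abs(r3_squared(r, phi0_plus(r)) - 1.0) for r in samples]
symmetry_defect = [
    abs((phi0_plus(r) - math.pi / 2) + (phi0_minus(r) - 3 * math.pi / 2))
    for r in samples
]
remainders = [quartic_remainder(r) for r in samples]
```

Under this assumption the remainder is governed by the next arcsine term, \(\frac{3}{40}\left( \frac{r}{2}\right) ^5\), which is why it is nonnegative and of order \(r^5\).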
Since \(r_3=1\) on \((r,\varphi _0^\pm )\) from (83) we obtain
1.6.5 Analysis of \(\frac{\partial V}{\partial \varphi }=0\)
In this subsection we study the full problem of \(\frac{\partial V}{\partial \varphi }=0\) treating it as the perturbation of the curves solving equation \(\frac{\partial V_0}{\partial \varphi }=0\) considered in Sect. A.6.4.
Observe that from (76, 84, 86) it follows that
Therefore we rewrite equation \(\frac{\partial V}{\partial \varphi }=0\) as
where \(m_3=1-m_1-m_2\).
The implicit function theorem implies that
where \(\delta (r,m_1,m_2)\) is analytic and \(\varphi _0(r)\) is any of the four curves satisfying
Below we develop constructive estimates.
1.6.6 Bounds for \(\delta (r,m_1,m_2)\)
We want to solve Equation (73) for \(\varphi (r,m_1,m_2)=\varphi _0(r) + \Delta (r,m_1,m_2)\), where \(\varphi _0(r)\) is any of the four branches of solutions of \(\frac{\partial V_0}{\partial \varphi }(r,\varphi _0(r))=0\).
Let us set (compare (89))
and then
Equation (73) becomes
From Expansion (81) we obtain the following result.
Lemma 23
Let \(R_0 <1\). Then
where for \(r \le R_0\) it holds that
for some positive constants \(D_1\), \(D_2\), \(D_3\), \(D_4\) and \(D_5\).
In particular, for \(R_0=10^{-3}\) we have constants
\(D_1 = 10.0662, D_2 = 16.665, D_3 = 45.4482\), \(D_4= 98.4105, D_5= 173.801\). The code with its proof is named Lemma_gs.
Proof
The idea of the proof is that, with the help of automatic differentiation and interval arithmetic, we can compute the Taylor expansion of the left-hand side of (103), call it \(f(r,\varphi )\), on any interval. Then, since \(f(r, \varphi ) = r^3g(r, \varphi )\), we have \(g(r, \varphi )\in f_{3,0}([0, R_0]\times [0, 2\pi ])\) (\(f_{3,0}\) is the coefficient (3, 0) of f). Also, \(\partial _\varphi g\in f_{3,1}\) and \(\partial _{\varphi ,\varphi }g \in 2f_{3,2}\). For the partial derivatives of g with respect to r, one uses \(r \partial _r f-3f=r^4 \partial _r g\) and proceeds as before, replacing the computations on f with computations on \(r\partial _r f-3f\). \(\square \)
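For completeness, the identity used for the r-derivatives is the product rule applied to \(f=r^3g\). The short derivation below is a routine check added for the reader; it is not taken from the code Lemma_gs.

$$\begin{aligned} \partial _r f&= 3r^2 g + r^3\, \partial _r g, \\ r\, \partial _r f - 3f&= \left( 3r^3 g + r^4\, \partial _r g\right) - 3r^3 g = r^4\, \partial _r g. \end{aligned}$$

Thus the Taylor coefficient of order 4 in r of \(r\partial _r f-3f\) encloses \(\partial _r g\), in the same way that the coefficient of order 3 of f encloses g.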
We will need some expressions for the derivatives of \(f_0\). From (100, 82) it follows that
From (104) and (103) in Lemma 23 we obtain
1.6.7 Estimates on \(\delta \)
Here we establish the following theorem, bounding the solution curves \(\varphi (r)\) of the equation \({\frac{\partial V}{\partial \varphi }(r,\varphi (r))=0}\).
Theorem 24
Consider Eq. (102). Let \(\varphi _0(r)\) be any of the four branches of solutions of \(\frac{\partial V_0}{\partial \varphi }=0\), let \(\alpha \) be a positive number and let J be any of the intervals centered at \(\left\{ 0, \frac{\pi }{2}, \pi , \frac{3\pi }{2}\right\} \) with width \(\frac{\alpha }{2}\).
With \(R_0\), \(m_1, m_2\), and \(D_1\), \(D_2\), \(D_3\) as in Lemma 23, we obtain the bounds
Assume that \(0<R_1 \le R_0\) is such that
Then any solution \(\varphi (r)\) of (102) is defined for \(0 <r \le R_1\) and satisfies
In particular, this Theorem is true for \(m_1, m_2\in [0, 10^{-2}]\), \(R\le 10^{-3}\), \(\alpha =\frac{\pi }{4}\) and constants \(D_1, D_2\) and \(D_3\) as in Lemma 23. The proof is in the code Theorem_phis and we obtain
Proof
The basic idea of the proof is to find \(\Delta \) such that \(f(r,\varphi _0(r)-\Delta )\) and \(f(r,\varphi _0(r)+\Delta )\) have opposite signs and \(\frac{\partial f}{\partial \varphi }(r,\varphi _0(r) + [-\Delta ,\Delta ])\) has constant sign (either positive or negative). We should also obtain that \(\Delta =O(m_2)\).
By combining assumptions (111, 112) we obtain that for \(\varphi \in J\) and \(r \le R_0\) it holds that
From assumption (113) it follows that for a fixed \(r \in (0,R_1]\) equation (102) has at most one solution. This is in fact already contained in Lemma 20, but here it is made explicit.
Let us fix \(0 < \Delta \le \alpha /2\). From assumptions (114) and (111) we see that \(f_0(r,\varphi _0(r)-\Delta )\) and \(f_0(r,\varphi _0(r)+\Delta )\) have opposite signs and
To prove that \(f(r,\varphi _0(r)-\Delta )\) and \(f(r,\varphi _0(r)+\Delta )\) have opposite signs we require that (we use (117) and (110) )
Therefore we have proved that
provided that it holds
\(\square \)
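The bracketing argument of the proof (opposite signs at \(\varphi _0(r)\pm \Delta \) together with a derivative of constant sign) can be illustrated on a toy problem. In the sketch below, \(f_0(\varphi )=\cos \varphi +r/2\) stands in for the unperturbed equation, under the assumption that the branch through \(\frac{\pi }{2}\) is given by \(\cos \varphi =-r/2\), and \(g(\varphi )=\sin (2\varphi +1)\) is a purely hypothetical perturbation with \(|g|\le 1\). Neither function is the paper's actual f; the point is only that the root of \(f_0+m_2 g\) stays within \(O(m_2)\) of \(\varphi _0(r)\).

```python
import math


def bracketed_root(f, lo, hi, tol=1e-13):
    """Bisection on [lo, hi]; requires a strict sign change at the end points."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi >= 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root lies in the left half
        else:
            lo, f_lo = mid, f_mid    # root lies in the right half
    return 0.5 * (lo + hi)


r, m2 = 1e-3, 1e-2
phi0 = math.pi / 2 + math.asin(r / 2)   # zero of the toy f_0
half_width = math.pi / 8                # plays the role of Delta <= alpha/2 with alpha = pi/4


def f0(phi):
    return math.cos(phi) + r / 2         # toy unperturbed equation (assumption)


def g(phi):
    return math.sin(2 * phi + 1)         # hypothetical perturbation, |g| <= 1


def f(phi):
    return f0(phi) + m2 * g(phi)


root = bracketed_root(f, phi0 - half_width, phi0 + half_width)
shift = abs(root - phi0)
```

On this bracket \(|f_0'|=\sin \varphi \ge 0.9\), so the mean value theorem gives \(|\varphi -\varphi _0|\le m_2/0.9\), mirroring the estimate \(\Delta =O(m_2)\) in the proof.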
1.6.8 Estimates on \(\frac{\partial }{\partial r}\delta (r,m)\)
We would like to find explicit bounds on \(\frac{\partial \delta }{\partial r}(r,m)\), which will be O(1) for \(m_2 \rightarrow 0\).
We will work under assumptions of Theorem 24.
Let \(\Delta (r;m)=m_2 \delta (r;m)\). We will use the notation \(\delta '(r)=\frac{\partial \delta }{\partial r}(r,m)\) and \(\Delta '(r)=\frac{\partial \Delta }{\partial r}(r,m)\).
Differentiating Eq. (102) with respect to r we obtain (we write \(\varphi (r)=\varphi _0(r) + \Delta (r)\))
Hence
We will now estimate various terms in (121) in order to show that it is \(O(m_2)\) times a bounded function.
Since by the definition of curves \(\varphi _0(r)\) it holds that
we obtain that
With the help of the Taylor theorem we obtain that there exists \(\theta _1(r) \in (0,1)\) such that
implying that
where
Observe that \(h(r)=O(1)\). Indeed from (110, 111) we have
From (107) we have for \(r \le R_1\) and \(\varphi \in {\mathbb {R}}\)
From this and the estimate (115) for \(\delta (r)\) we obtain
Combining (125, 123, 124) we obtain
Therefore from (122) and the above estimates we infer that
where
Observe that
hence \(h_2(r)=O(1)\) and, more explicitly,
We have for some \(\theta _2(r) \in (0,1)\)
We have
It is clear that \(\Delta '(r)=m_2 O(1)\). More explicitly,
where
and
Hence, we have that
1.6.9 Non-degeneracy of the Solutions for \(m_1, m_2\le 10^{-2}\) and \(R\le 10^{-3}\)
The following theorem summarizes all the expansions for the four solutions, proving in particular the non-degeneracy (\(\frac{d}{dr}\left( \frac{\partial V}{\partial r}(r, \varphi (r))\right) \ne 0\)).
Moreover, all solutions are nondegenerate.
Theorem 25
From the codes Existence_results and Existence_results_derivative, and for masses \(m_1, m_2\in [0, 10^{-2}]\) and \(r\le 10^{-3}\) we have that
-
$$\begin{aligned} \varphi (r)\in & {} \frac{\pi }{2}+\frac{r}{2}+\frac{r^3}{48}+[ 0, 0.00119]r^4+m_2\delta (r), \\ \frac{\partial V}{\partial r}(r, \varphi (r))\in & {} m_1 r+m_2\left( \frac{9}{4}r+[ 1.8369662715, 2.7184275039]r^2\right) \\{} & {} + m_2^2([ -1.3099756791, 2.3608417928]r) \\{} & {} + m_1m_2([ -0.0242767833, 0.0242767833]r)- \frac{m_1}{r^2} \\\in & {} m_1 r+[ 2.2366574753, 2.2765696133]m_2 r-\frac{m_1}{r^2},\\ \frac{d}{dr}\left( \frac{\partial V}{\partial r}(r, \varphi (r))\right)\in & {} [ 0.9997275169, 1.0000428593]m_1\\{} & {} +[ 2.1720852328, 2.3307223616] m_2+2m_1/r^3 \end{aligned}$$
-
$$\begin{aligned} \varphi (r)\in & {} \frac{3\pi }{2}-\frac{r}{2}-\frac{r^3}{48}-[ 0, 0.0001171998]r^4+m_2\delta (r), \\ \frac{\partial V}{\partial r}(r, \varphi (r))\in & {} m_1 r+m_2\left( \frac{9}{4}r+[ -2.7094865798, -1.8290229479]r^2\right) \\{} & {} + m_2^2([ -1.3055557029, 2.3564218165]r)\\{} & {} + m_1m_2([ -0.0242767833, 0.0242767833]r)- \frac{m_1}{r^2} \\\in & {} m_1 r+[ 2.2339921885, 2.2738069860]m_2 r-\frac{m_1}{r^2},\\ \frac{d}{dr}\left( \frac{\partial V}{\partial r}(r, \varphi (r))\right)\in & {} [ 0.9997275169, 1.0000428593]m_1\\{} & {} +[ 2.1720852328, 2.3307223616]m_2+2m_1/r^3 \end{aligned}$$
-
$$\begin{aligned} \varphi (r)= & {} 0+m_2\delta (r) \\ \frac{\partial V}{\partial r}(r, \varphi (r))= & {} 3r+[ -3.0990751276, -2.8895899185]r^2 \\{} & {} + m_1[ -2.0120180121, -1.9819212587]r \\{} & {} + m_2[ -2.3047529557, -2.1865555824]r -\frac{m_1}{r^2}\\\in & {} [ 2.9969009248, 3.0000000000]r \\{} & {} +[ -2.0120180121, -1.9819212587]m_1 r\\{} & {} +[ -2.3047529557, -2.1865555824]m_2 r-\frac{m_1}{r^2}\\ \frac{d}{dr}\left( \frac{\partial V}{\partial r}(r, \varphi (r))\right)\in & {} [ 2.9419980924, 3.0090010494]+2m_1/r^3 \end{aligned}$$
-
$$\begin{aligned} \varphi (r)= & {} \pi +m_2\delta (r) \\ \frac{\partial V}{\partial r}(r, \varphi (r))= & {} 3r+[ 2.9009801393, 3.1116899497]r^2\\{} & {} + m_1[ -2.0180903512, -1.9878489103] r \\{} & {} + m_2[ -2.3135901858, -2.1950040940] r -\frac{m_1}{r^2}\\\in & {} [ 3.0000000000, 3.0031116900] r \\{} & {} +[ -2.0180903512, -1.9878489103]m_1 r\\{} & {} +[ -2.3135901858, -2.1950040940]m_2 r-\frac{m_1}{r^2}\\ \frac{d}{dr}\left( \frac{\partial V}{\partial r}(r, \varphi (r))\right)\in & {} [ 2.9477930888, 3.0150582104]+2m_1/r^3 \end{aligned}$$
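The way the detailed enclosures of Theorem 25 collapse into a single interval coefficient of \(m_2 r\) can be reproduced with a few lines of interval arithmetic. The sketch below factors \(m_2 r\) out of the first two items, leaving \(\frac{9}{4}+[\cdot ]\,r+[\cdot ]\,m_2+[\cdot ]\,m_1\), and evaluates it over \(r\in [0,10^{-3}]\), \(m_1,m_2\in [0,10^{-2}]\). The `Interval` class is a minimal illustration written for this purpose: it uses ordinary floating point without outward rounding, so it is not rigorous and is not the code Existence_results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __mul__(self, other):
        other = _as_interval(other)
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    __rmul__ = __mul__


def _as_interval(x):
    """Promote a plain number to a degenerate interval."""
    return x if isinstance(x, Interval) else Interval(float(x), float(x))


R = Interval(0.0, 1e-3)    # range of r
M1 = Interval(0.0, 1e-2)   # range of m_1
M2 = Interval(0.0, 1e-2)   # range of m_2


def m2_r_coefficient(c_r2, c_m2sq, c_m1m2):
    """Coefficient of m_2 * r: 9/4 + c_r2 * r + c_m2sq * m_2 + c_m1m2 * m_1."""
    return 2.25 + c_r2 * R + c_m2sq * M2 + c_m1m2 * M1


# Branch through pi/2 (first item of Theorem 25).
branch_a = m2_r_coefficient(Interval(1.8369662715, 2.7184275039),
                            Interval(-1.3099756791, 2.3608417928),
                            Interval(-0.0242767833, 0.0242767833))

# Branch through 3*pi/2 (second item of Theorem 25).
branch_b = m2_r_coefficient(Interval(-2.7094865798, -1.8290229479),
                            Interval(-1.3055557029, 2.3564218165),
                            Interval(-0.0242767833, 0.0242767833))
```

Both results lie strictly above zero, which is what makes the \(m_2 r\) term in \(\frac{\partial V}{\partial r}\) sign-definite on these two branches.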
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Figueras, JL., Tucker, W. & Zgliczynski, P. The Number of Relative Equilibria in the PCR4BP. J Dyn Diff Equat 36, 2827–2877 (2024). https://doi.org/10.1007/s10884-022-10230-6
DOI: https://doi.org/10.1007/s10884-022-10230-6