1 Introduction

According to Newton’s law of gravity, the movements of n planets (located at positions \(p_1,\dots ,p_n\), and having masses \(m_1,\dots ,m_n\)) are given by the system of differential equations:

$$\begin{aligned} m_i\ddot{p}_i = - G\sum _{j \ne i}m_im_j\frac{p_i - p_j}{\Vert p_i - p_j\Vert ^3}\qquad (1\le i \le n). \end{aligned}$$

A relative equilibrium motion is a planar solution to the n-body problem that performs a rotation of uniform angular velocity \(\omega \) about the system’s center of mass c. In other words, it is a planar solution to the following system of equations (with units chosen so that \(G = 1\)):

$$\begin{aligned} \omega ^2(p_i - c) = \sum _{j \ne i}m_j\frac{p_i - p_j}{\Vert p_i - p_j\Vert ^3}\qquad (1\le i \le n). \end{aligned}$$

The relative equilibria of the 3-body problem have been known for centuries. In terms of equivalence classes there are, irrespective of the masses, exactly five relative equilibria. Three of these are the collinear configurations discovered by Euler [5]; the remaining two are Lagrange’s equilateral triangles [12], see Fig. 1.
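As a quick numerical sanity check (not part of any argument in this paper), the following sketch verifies that Lagrange's equilateral configuration with three equal masses satisfies the relative-equilibrium equation above for a suitable \(\omega ^2\). The unit masses, the circumradius-one triangle, and the normalization \(G = 1\) are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

# Illustrative check (assuming G = 1): an equilateral triangle of three unit
# masses, rotating about its center of mass, satisfies
#   omega^2 (p_i - c) = sum_{j != i} m_j (p_i - p_j) / |p_i - p_j|^3.
m = np.array([1.0, 1.0, 1.0])
angles = 2 * np.pi * np.arange(3) / 3
p = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # vertices, circumradius 1
c = (m[:, None] * p).sum(axis=0) / m.sum()                # center of mass

def pull(i):
    """Right-hand side of the relative-equilibrium equation for body i."""
    return sum(m[j] * (p[i] - p[j]) / np.linalg.norm(p[i] - p[j])**3
               for j in range(3) if j != i)

# Read off omega^2 from body 0, then verify the equation for all three bodies.
omega2 = pull(0) @ (p[0] - c) / np.dot(p[0] - c, p[0] - c)
for i in range(3):
    assert np.allclose(omega2 * (p[i] - c), pull(i))
print("omega^2 =", omega2)   # approx 0.577 = 1/sqrt(3) for this triangle
```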

Fig. 1

Relative equilibria for the 3-body problem. a The collinear case of Euler; b the equilateral triangle case of Lagrange

The collinear configurations found by Euler have been generalized to n bodies by Moulton [16]. There are exactly n!/2 such collinear equivalence classes.

In 2006, Hampton and Moeckel [8] proved that the number of relative equilibria of the Newtonian 4-body problem is finite (always between 32 and 8472). Their computer-aided proof is based on symbolic and exact integer computations. The upper bound 8472 is believed to be a large overestimate; numerical simulations suggest that no more than 50 equilibria exist, see e.g. [21].

Albouy and Kaloshin [1] almost settled the question of finiteness for \(n = 5\) bodies. They proved that there are finitely many relative equilibria in the Newtonian 5-body problem, except perhaps if the 5-tuple of positive masses belongs to a given co-dimension-2 subvariety of the mass space. By Bézout’s theorem, an upper bound on the number of relative equilibria is obtained (outside the exceptional subvariety), but the authors conclude “However, the bound is so bad that we avoid writing it explicitly”.

Relaxing the positivity of the masses can produce a continuum of relative equilibria. In [20] Roberts demonstrated this for the 5-body problem with one negative mass.

Looking at the restricted 4-body problem (i.e. when one of the planets has an infinitesimally small mass), Lindow [13] and Palmore [18] found that in the collinear case, only two relative equilibria exist. In the equilateral setting, Gannaway [6], Pedersen [19] and Simó [21] found numerical evidence of there being 8, 9, or 10 relative equilibria. Gannaway’s thesis is further explained in [2]. As a first rigorous result, Kulevich et al. [11] proved finiteness with an upper bound of 196 relative equilibria. This, however, was assumed to be a great overestimate, and later Barros and Leandro [3, 4] proved, as earlier works had indicated, that there can only be 8, 9 or 10 relative equilibria, depending on the three primary masses. These proofs are based on techniques from algebra, used to count the solutions of large polynomial systems with huge integer coefficients. Due to the high degrees and the large number of monomials of the polynomials involved, Barros and Leandro resorted to the software Maple, so their proof is (mildly) computer-assisted. The main technique is inspired by an algorithm developed by Vincent [23], which allows for a significant reduction in the number of sign variations among the coefficients of a polynomial. Together with Descartes’ rule of signs, this makes it possible to determine the sign of the relevant polynomials in a given region of space. The main result of [4] puts the earlier works mentioned above on firm mathematical ground.

In this paper, we present a new approach to counting relative equilibria in various settings. We use techniques from real analysis rather than algebraic geometry, an approach we believe will generalize better to more complicated settings such as the full 4-body problem, which still remains unresolved. Moreover, the techniques presented here do not use algebraic properties of the system, only differentiability. This quality could play a role in more general contexts, like in curved spaces or in physical systems where the potential is not given by Newton’s laws of gravitation.

In what follows, we will focus on the planar, circular, restricted 4-body problem, and give a new proof of the results of Barros and Leandro. In this setting, the three primary bodies form an equilateral triangle as in Fig. 1b.

2 Formulating the Problem

Let \(m_1, m_2, m_3\) denote the positive masses of the three primaries, and let \(p_1, p_2, p_3\) denote their positions in \({\mathbb {R}}^2\), which form an equilateral triangle. Also let z be the position of the fourth (weightless) body. In this setting, the motion of z in the co-rotating frame is governed by the amended potential:

$$\begin{aligned} V(z;m) = \frac{1}{2}\Vert z - c\Vert ^2 + \sum _{i=1}^3 m_i\Vert z - p_i\Vert ^{-1}. \end{aligned}$$
(1)

Here c denotes the center of mass of the primaries. It follows that the locations of the relative equilibria are given by the critical points of V. Thus, the challenge of counting the number of relative equilibria can be translated into the task of counting the critical points of V, given the appropriate search region \({{\mathcal {C}}}\times {{\mathcal {M}}}\). Here \({{\mathcal {C}}}\subset {\mathbb {R}}^2\) is the set of positions of the weightless body, and \({{\mathcal {M}}}\) is the set of masses, discussed in Sects. 2.2 and 2.1, respectively.
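For readers who wish to experiment, here is a minimal, non-rigorous sketch of the amended potential (1) and its gradient, using the primary positions that will be fixed in Sect. 2.2; the test point z, the masses, and the finite-difference step are arbitrary illustrative choices, and the check is a plausibility test rather than a verified computation.

```python
import numpy as np

# A minimal sketch of the amended potential (1) and its gradient; the primary
# positions are those fixed in Sect. 2.2, the test point z is arbitrary.
p = np.array([[np.sqrt(3)/2,  0.5],
              [np.sqrt(3)/2, -0.5],
              [0.0,           0.0]])

def V(z, m):
    c = m @ p                                   # center of mass of the primaries
    r = np.linalg.norm(z - p, axis=1)
    return 0.5 * np.dot(z - c, z - c) + np.sum(m / r)

def gradV(z, m):
    c = m @ p
    d = z - p
    r = np.linalg.norm(d, axis=1)
    return (z - c) - (m / r**3) @ d             # cf. Eq. (5) in Sect. 2.2

m = np.array([1/16, 3/16, 12/16])
z = np.array([0.9, -0.8])

# Plausibility check of the analytic gradient against central finite differences.
h = 1e-5
fd = np.array([(V(z + h*e, m) - V(z - h*e, m)) / (2*h) for e in np.eye(2)])
assert np.allclose(gradV(z, m), fd)
print("V(z) =", V(z, m), " gradV(z) =", gradV(z, m))
```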

We are now prepared to formulate the main problem of this study (Fig. 2):

Main Problem 1

How many solutions can the critical equation \(\nabla _z V(z;m) = 0\) have (in \({{\mathcal {C}}}\)) when \(m \in {{\mathcal {M}}}\)?

Fig. 2

Level curves of the amended potential (1) with the three primaries forming an equilateral triangle. a Equal masses: \(m_1 = m_2 = m_3 = 1/3\); b different masses: \(m_1 = 1/16\), \(m_2 = 3/16\), \(m_3 = 12/16\) with the heaviest primary located at the origin

With our approach, there are three difficulties that must be resolved in order to succeed:

  1. The potential V (and its gradient) is singular at the three primaries.

  2. When the mass of a primary tends to zero, several critical points of V tend to that primary.

  3. As masses (not close to zero) are varied, the number of critical points of V may change due to bifurcations taking place.

When all masses are uniformly bounded away from zero, the singularities at the primaries (case 1) can be handled by proving that no critical points can reside in certain small disks centered at the primaries. This is explained further in Sect. 2.2.

For masses approaching zero (case 2), we end up with a multi-restricted problem of type \(2 + 2\) (\(m_1\rightarrow 0\)) or of type \(1 + 3\) (\(m_1\rightarrow 0\) and \(m_2\rightarrow 0\)). Both scenarios can be resolved by desingularizing the potential V along the circle \(\Vert z - p_3\Vert = 1\). More about this in Sects. 4 and 6.3.1.

Out of these difficulties, the bifurcations (case 3) are the easiest to resolve. In principle, this only requires verifying the sign of certain combinations of partial derivatives of the potential V. We will address this in Sect. 6.5.

2.1 The Mass Space \({{\mathcal {M}}}\)

Without loss of generality, we may assume that the masses of the primaries are normalized \(m_1 + m_2 + m_3 = 1\) and ordered \(0 < m_1 \le m_2\le m_3\). Call this set \({{\mathcal {M}}}\):

$$\begin{aligned} {{\mathcal {M}}}= \{m\in {\mathbb {R}}^3 :m_1 + m_2 + m_3 = 1 \text { and } 0 < m_1 \le m_2\le m_3\}. \end{aligned}$$
(2)

We illustrate the unordered and ordered mass space in Fig. 3. Note that the centroid of the large triangle corresponds to all masses being equal: \(m = (\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3})\).

Fig. 3

a The normalized mass space \(m_1 + m_2 + m_3 = 1\) in barycentric coordinates, with the region of ordered masses (\(0 < m_1 \le m_2 \le m_3\)) highlighted in a darker shade. The blue curve illustrates the set on which bifurcations take place. b 1/6 of the normalized mass space corresponding to the space \({{\mathcal {M}}}\). The mass \(m^\star \) corresponds to a cubic-type bifurcation (Color figure online)

The bifurcations taking place are of two kinds: quadratic and cubic. The latter are rare, and come from the inherent 6-fold symmetry of the normalized mass space, see Fig. 3 (compare to Figure 21 of [19]). The quadratic bifurcations are of a saddle-node type, in which two distinct solutions approach each other, merge at the bifurcation, and no longer exist beyond the point of bifurcation, see Fig. 4.

Fig. 4

The typical quadratic bifurcation (blue dot), in which two solutions merge and vanish, becomes a cubic-type bifurcation at \(m^\star \) due to symmetry

In Sect. 6.5 we will present the mathematics needed to resolve the two types of bifurcations taking place.

2.2 The Configuration Space \({{\mathcal {C}}}\)

Without loss of generality, we may fix the positions of the three primaries: \(p_1 = (\tfrac{\sqrt{3}}{2}, +\tfrac{1}{2})\), \(p_2 = (\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2})\), and \(p_3 = (0,0)\), thus forming an equilateral triangle with unit length sides, as in Fig. 1b.

We begin by deriving some basic results that will be used later on. Let \(\nabla _z V(z;m)\) denote the gradient (with respect to z) of the potential V. A relative equilibrium is then simply a solution to the equation \(\nabla _z V(z;m) = 0\), i.e., a critical point of V(z; m).

In what follows, it will be convenient to adopt the following notation:

$$\begin{aligned} r_i = r_i(z) = \Vert z - p_i\Vert \quad \quad (i = 1,2,3). \end{aligned}$$
(3)

In determining the relevant configuration space, we will use the following two exclusion results (Lemmas 3 and 5, respectively):

  • Assume that \(m_3 \ge 1/3\). If \(r_3(z) \le 1/3\), then z is not a critical point of the potential V.

  • If \(r_i(z) \ge 2\) for some \(i =1,2,3\), then z is not a critical point of the potential V.

Combining these two results, we see that, for \(m\in {{\mathcal {M}}}\), all relative equilibria must satisfy \(1/3\le r_3\le 2\). We will take this to be our global search region \({{\mathcal {C}}}\) in configuration space:

$$\begin{aligned} {{\mathcal {C}}}= \{z\in {\mathbb {R}}^2 :1/3 \le \Vert z\Vert \le 2\}. \end{aligned}$$
(4)

A more detailed analysis reveals that the relative equilibria all reside in an even smaller subset of \({{\mathcal {C}}}\), see Fig. 5a. These seven regions were already presented in the work of Pedersen [19]. We will, however, not use this level of detail in our computations.

Fig. 5

The configuration space with the three primaries \(p_1, p_2, p_3\) (red points) spanning an equilateral triangle. a The seven regions (colored light grey) in phase space where all relative equilibria must reside. When the masses are ordered, all relative equilibria are restricted to a smaller subset (colored dark grey). The blue curve illustrates the set on which bifurcations take place. b The global search region \({{\mathcal {C}}}= \{z\in {\mathbb {R}}^2 :1/3 \le \Vert z\Vert \le 2\}\) of the configuration space. Note that the heaviest primary \(p_3 = (0,0)\) does not belong to \({{\mathcal {C}}}\) (Color figure online)

We end this section by deriving the two exclusion results used above. The critical equation \(\nabla _z V(z;m) = 0\) can be written as

$$\begin{aligned} (z-c) + m_1 \nabla \frac{1}{r_1} + m_2 \nabla \frac{1}{r_2} + m_3 \nabla \frac{1}{r_3} = 0. \end{aligned}$$
(5)

Note that we have

$$\begin{aligned} \left\| \nabla \frac{1}{r_i} \right\| = \frac{1}{r_i^2} \qquad (i = 1,2,3). \end{aligned}$$
(6)

The following lemma provides an a priori bound on how close a solution of \(\nabla _z V(z;m) = 0\) can be to one of the primaries.

Lemma 2

Let \(z \in {\mathbb {R}}^2\) and set \(r_i = r_i(z)\). If for some \(i =1,2,3\) we have

$$\begin{aligned} \frac{m_i}{r_i^2} > \frac{1-m_i}{(1-r_i)^2} + r_i + 1, \end{aligned}$$
(7)

then z is not a critical point of the potential V.

Proof

First, we note that Eq. (5) can be rewritten as

$$\begin{aligned} m_1 \nabla \frac{1}{r_1}= -m_2 \nabla \frac{1}{r_2} - m_3 \nabla \frac{1}{r_3} - (z-c). \end{aligned}$$
(8)

Without loss of generality, we may take \(i = 1\). We use this formulation and show that, under (7), the norm of the left-hand side exceeds the norm of the right-hand side.

Observe that (7) implies that \(r_1 < 1\): indeed, (7) gives \(\frac{m_1}{r_1^2} > 1\), while \(m_1 < 1\). Hence, throughout the proof, we have \(r_1 < 1\).

From the triangle inequalities \(r_1 + r_2 \ge \Vert p_1 - p_2\Vert = 1\) and \(r_1 + r_3 \ge \Vert p_1 - p_3\Vert = 1\), and since the center of mass c lies inside the unit triangle (so that \(\Vert p_1 - c\Vert \le 1\)), we have

$$\begin{aligned} r_2&\ge 1-r_1, \qquad r_3 \ge 1-r_1, \\ r_2^2&\ge (1-r_1)^2, \qquad r_3^2 \ge (1-r_1)^2, \\ \Vert z-c\Vert&\le \Vert z-p_1\Vert + \Vert p_1 - c\Vert \le r_1 + 1. \end{aligned}$$

Using this together with (7), we obtain

$$\begin{aligned} \left\| -m_2 \nabla \frac{1}{r_2} - m_3 \nabla \frac{1}{r_3} - (z-c) \right\| &\le \frac{m_2}{r_2^2} + \frac{m_3}{r_3^2} + r_1 + 1 \\ &\le \frac{m_2}{(1-r_1)^2} + \frac{m_3}{(1-r_1)^2} + r_1 + 1\\ &= \frac{1-m_1}{(1-r_1)^2} + r_1 + 1 \\ &< \frac{m_1}{r_1^2} = \left\| m_1 \nabla \frac{1}{r_1}\right\| , \end{aligned}$$

hence (8) is not satisfied. \(\square \)

Since we are only considering ordered masses, we always have \(m_3\ge 1/3\). This fact, combined with Lemma 2, immediately gives us a uniform bound for the primary \(p_3\).

Lemma 3

Assume that \(m_3 \ge 1/3\). If \(r_3(z) \le 1/3\), then z is not a critical point of the potential V.

Proof

We verify (7) with \(i = 3\). Estimating each side of the inequality, we find

$$\begin{aligned} \frac{m_3}{r_3^2} \ge \frac{1/3}{(1/3)^2} = 3 \qquad \textrm{and}\qquad \frac{1-m_3}{(1-r_3)^2} + r_3 + 1 \le \frac{2/3}{(2/3)^2} + \frac{1}{3} + 1 = \frac{3}{2} + \frac{4}{3} = \frac{17}{6} < 3. \end{aligned}$$

\(\square \)

Remark

For \(m_3 \ge 1/3\), we can exclude the slightly larger region \(r_3 \le 0.3405784\).

Note that Lemma 2 can also be used to exclude a small disc centered at \(p_i\) (\(i = 1,2\)) whenever \(m_i\) is not too small. As an example, if \(m_i\ge \varepsilon > 0\), then we can exclude the disc \(r_i \le \tfrac{1}{2}\sqrt{\varepsilon }\).
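The threshold quoted in the remark can be reproduced, non-rigorously, by locating the radius at which the two sides of (7) balance in the worst case \(m_3 = 1/3\). The sketch below assumes scipy is available and is purely illustrative; it is not a substitute for the rigorous enclosures used in the proofs.

```python
from scipy.optimize import brentq

# Non-rigorous check of the Remark: for m3 = 1/3 (the worst case), the
# exclusion inequality (7), m3/r^2 > (1 - m3)/(1 - r)^2 + r + 1, holds for
# every r below the root computed here (the difference h is decreasing in r).
m3 = 1/3
h = lambda r: m3/r**2 - (1 - m3)/(1 - r)**2 - r - 1   # left minus right of (7)
r_star = brentq(h, 0.2, 0.5)
print(r_star)   # approx 0.34058, in line with the stated bound 0.3405784
```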

Another exclusion principle is given by the following result.

Lemma 4

Let \(z \in {\mathbb {R}}^2\) and set \(r_i = r_i(z)\). If for some \(i = 1,2,3\) we have

$$\begin{aligned} r_i - 1 > \frac{1-m_i}{(r_i-1)^2} + \frac{m_i}{r_i^2}, \end{aligned}$$
(9)

then z is not a critical point of the potential V.

Proof

The proof uses the idea that we may rewrite Eq. (5) as

$$\begin{aligned} m_1 \nabla \frac{1}{r_1} + m_2 \nabla \frac{1}{r_2} + m_3 \nabla \frac{1}{r_3} = -(z - c). \end{aligned}$$
(10)

Without any loss of generality, we may assume that \(i=3\); recall that \(p_3\) is situated at the origin (for \(i=1,2\) one simply relabels the primaries and shifts the coordinate frame accordingly). Then \(\Vert z\Vert = \Vert z - p_3\Vert = r_3\), and from (9) (with \(i=3\)) it follows that \(r_3 > 1\). This, together with the triangle inequality, gives

$$\begin{aligned} \Vert z-c\Vert \ge \Vert z\Vert - \Vert c\Vert \ge r_3 - 1. \end{aligned}$$
(11)

Here we use the fact that the center of mass c is located within the triangle spanned by the three primaries, and therefore \(\Vert c\Vert \le 1\).

Applying the triangle inequality again, we have \(r_i + 1 \ge r_3\) (\(i=1,2\)), from which it follows that

$$\begin{aligned} r_i \ge r_3 - 1 \qquad \textrm{ and }\qquad r_i^2 \ge (r_3-1)^2 \qquad \qquad (i = 1,2). \end{aligned}$$
(12)

Therefore we obtain the following estimate (we use (6), (12), and (11))

$$\begin{aligned} \left\| m_1 \nabla \frac{1}{r_1} + m_2 \nabla \frac{1}{r_2} + m_3 \nabla \frac{1}{r_3}\right\| &\le \frac{m_1}{r_1^2} + \frac{m_2}{r_2^2} + \frac{m_3}{r_3^2} \\ &\le \frac{m_1}{(r_3-1)^2} + \frac{m_2}{(r_3-1)^2} + \frac{m_3}{r_3^2} = \frac{1-m_3}{(r_3-1)^2} + \frac{m_3}{r_3^2} \\ &< r_3 - 1 \le \Vert z-c\Vert , \end{aligned}$$

hence (10) is not satisfied. \(\square \)

A direct consequence of Lemma 4 is the following uniform bound.

Lemma 5

If \(r_i(z) \ge 2\) for some \(i =1,2,3\), then z is not a critical point of the potential V.

Proof

Using only \(r_i(z) \ge 2\), we verify (9). Indeed, a straightforward computation gives:

$$\begin{aligned} \frac{1-m_i}{(r_i-1)^2} + \frac{m_i}{r_i^2} \le 1-m_i + \frac{m_i}{4}=1-\frac{3 m_i}{4} < 1 \le r_i -1. \end{aligned}$$

\(\square \)

3 Reparametrizing the Masses

Due to the normalisation \(m_1 + m_2 + m_3 = 1\), the mass space can be viewed as a 2-dimensional set parametrized by \(m_1\) and \(m_2\). Instead of working directly with the masses \((m_1, m_2)\), we introduce the following non-linear, singular transformation:

$$\begin{aligned} s = \frac{m_1}{m_1 + m_2}\qquad \textrm{ and }\qquad t = m_1 + m_2. \end{aligned}$$
(13)

The new parameters (s, t) can be transformed back to mass space via the inverse transformation:

$$\begin{aligned} m_1 = st, \qquad m_2 = (1-s)t,\qquad (\text {and } m_3 = 1 - t). \end{aligned}$$
(14)

The reason for working in the mass space using (s, t)-coordinates is as follows: when \(m_1\) and \(m_2\) tend to zero, some relative equilibria may move in a non-continuous way, and no limit exists. This makes our kind of study virtually impossible. When seen in (s, t)-coordinates, however, the movements are regular and amenable to our computer-assisted techniques.

In the \((m_1,m_2)\)-space, the ordered mass space \({{\mathcal {M}}}\) is a triangle, see Fig. 3b. Under the transformation (13) it is mapped to a non-linear 2-dimensional region \({\tilde{{{\mathcal {P}}}}}\). Taking \({{\mathcal {P}}}\) to be the rectangular hull of \({\tilde{{{\mathcal {P}}}}}\), we have our new parameter region, see Fig. 6. The three vertices of \({{\mathcal {M}}}\) are mapped into \({{\mathcal {P}}}\) according to the following transformations:

$$\begin{aligned} (m_1, m_2) = (1/3,1/3)&\mapsto (s,t) = (1/2, 2/3), \\ (m_1, m_2) = (0, 1/2)&\mapsto (s,t) = (0, 1/2), \\ (m_1, m_2) = (0, 0)&\mapsto (s,t) = ([0, 1/2], 0). \end{aligned}$$

Note how the single point \((m_1, m_2) = (0, 0)\) is mapped to the line segment \((s,t) = ([0, 1/2], 0)\) when taking all possible limits from within \({{\mathcal {M}}}\). This desingularization is the main reason for moving to the (s, t)-space; it gives us better control when masses are near the multi-restricted cases \(m_1 = 0\) and \((m_1, m_2) = (0, 0)\).
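The transformation (13)-(14) and the vertex images listed above are easy to check numerically. The sketch below does so, approaching the singular vertex \((m_1, m_2) = (0, 0)\) along rays with a fixed ratio \(a = m_1/(m_1 + m_2)\); the function names and tolerances are ad hoc illustrative choices.

```python
import numpy as np

# The reparametrization (13) and its inverse (14), with a check of the vertex
# images listed above (the last one in the limiting sense t -> 0).
def to_st(m1, m2):
    return m1 / (m1 + m2), m1 + m2

def to_masses(s, t):
    return s * t, (1 - s) * t, 1 - t

assert np.allclose(to_st(1/3, 1/3), (1/2, 2/3))
assert np.allclose(to_st(1e-12, 1/2), (0.0, 1/2), atol=1e-9)

# Along the ray (m1, m2) = (a*eps, (1-a)*eps), eps -> 0, the parameter s stays
# equal to a while t tends to zero, filling out the segment [0, 1/2] x {0}.
for a in (0.0, 0.25, 0.5):
    s, t = to_st(a * 1e-12, (1 - a) * 1e-12)
    assert abs(s - a) < 1e-9 and t < 1e-11
print("vertex images as in the text")
```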

Fig. 6

a The image \({\tilde{{{\mathcal {P}}}}}\) (shaded) of the ordered mass space \({{\mathcal {M}}}\) under the transformation (13). Compare to Fig. 3b. The unshaded part of the rectangle \({{\mathcal {P}}}\) corresponds to unordered masses. b The partition \({{\mathcal {P}}}= {{\mathcal {P}}}_1\cup {{\mathcal {P}}}_2\cup {{\mathcal {P}}}_3\) we will be using in this parametrization

Based on this, we will define \({{\mathcal {P}}}= \{(s,t):0\le s\le \frac{1}{2}; 0\le t \le \tfrac{2}{3}\}\), and use the partition \({{\mathcal {P}}}= {{\mathcal {P}}}_1\cup {{\mathcal {P}}}_2\cup {{\mathcal {P}}}_3\) as illustrated in Fig. 6. More precisely we use

$$\begin{aligned} {{\mathcal {P}}}_1 = [0,0.5]\times [0,0.55],\quad {{\mathcal {P}}}_2 = [0,0.5]\times [0.58, 0.67],\quad {{\mathcal {P}}}_3 = [0,0.5]\times [0.55,0.58]. \end{aligned}$$

Note that each of the three partition elements contains points from \({{\mathcal {P}}}{\setminus }{\tilde{{{\mathcal {P}}}}}\). Such points correspond to unordered masses, and we will automatically remove most of them from our computations. In the following, when we say that \({{\mathcal {P}}}_i\) has some property, we mean that the ordered parameters \(\tilde{{{\mathcal {P}}}_i} = {{\mathcal {P}}}_i\cap \tilde{{{\mathcal {P}}}}\) have that property.

The three partition elements \({{\mathcal {P}}}_1, {{\mathcal {P}}}_2, {{\mathcal {P}}}_3\) have the following properties:

  • For each \((s,t)\in {{\mathcal {P}}}_1\), there are exactly 8 solutions in \({{\mathcal {C}}}\).

  • For each \((s,t)\in {{\mathcal {P}}}_2\), there are exactly 10 solutions in \({{\mathcal {C}}}\).

  • For each \((s,t)\in {{\mathcal {P}}}_3\), there are between 8 and 10 solutions in \({{\mathcal {C}}}\).

Let us describe these three regions in more detail. In subsequent sections, we will prove that these descriptions are accurate.

For \((s,t)\in {{\mathcal {P}}}_2\) no bifurcations take place in \({{\mathcal {C}}}\); exactly ten solutions exist, and they never come close to each other or to a primary. This is the easiest region to account for. Also for \((s,t)\in {{\mathcal {P}}}_1\) there are no bifurcations in \({{\mathcal {C}}}\); exactly eight solutions exist. This region, however, includes parameters corresponding to arbitrarily small masses, which leads to other complications that must be resolved; more about this in Sect. 6.3.1. The remaining set \({{\mathcal {P}}}_3\) contains all parameters for which a bifurcation occurs. As discussed in Sect. 2.1, there are two bifurcation types that we must account for: quadratic and cubic. These bifurcations only take place in a small subset \({{\mathcal {C}}}_2\) of the full configuration space. For \((s,t)\in {{\mathcal {P}}}_3\), there can be 1, 2 or 3 solutions in \({{\mathcal {C}}}_2\). In the remaining space \({{\mathcal {C}}}_1 = {{\mathcal {C}}}{\setminus }{{\mathcal {C}}}_2\) there are exactly seven solutions, all isolated from each other and from the primaries. Summing up, when \((s,t)\in {{\mathcal {P}}}_3\) we have 8, 9 or 10 solutions in \({{\mathcal {C}}}\).

In the ideal setting, \({{\mathcal {P}}}_3\) would correspond to the transformed (blue) bifurcation curve illustrated in Fig. 6. It would bisect \({\tilde{{{\mathcal {P}}}}}\), acting as a common boundary separating \({\tilde{{{\mathcal {P}}}}}_1\) from \({\tilde{{{\mathcal {P}}}}}_2\). Our approach, however, builds upon finite-resolution computations, and therefore \({{\mathcal {P}}}_3\) is constructed as a rectangular subset of \({{\mathcal {P}}}\) covering the entire bifurcation curve, see Fig. 6.

4 Polar Coordinates

Given the shape of the configuration space (4), and how the solutions behave when masses become small, it makes sense to work in polar coordinates centered at the heaviest primary \(p_3 = (0,0)\). In these coordinates the lighter primaries become \(p_1=(1,\pi /6)\) and \(p_2=(1,-\pi /6)\).

For convenience, let us define

$$\begin{aligned} \alpha _1 = \pi /6, \qquad \alpha _2=-\pi /6. \end{aligned}$$

Let \((z_1|z_2)\) denote the scalar product on \({\mathbb {R}}^2\). Given \(z=(r\cos \varphi ,r \sin \varphi )\), we have

$$\begin{aligned} \Vert z- c\Vert ^2&= (z-m_1 p_1 - m_2 p_2 \,|\, z-m_1 p_1 - m_2 p_2)\\&= z^2 - 2m_1(p_1|z) - 2m_2(p_2|z) + 2m_1m_2(p_1|p_2) + m_1^2 p_1^2 + m_2^2 p_2^2, \\ (p_i|z)&= r \cos (\varphi -\alpha _i), \quad i=1,2. \end{aligned}$$

It follows that

$$\begin{aligned} \Vert z- c\Vert ^2= r^2 - 2 m_1 r \cos (\varphi -\pi /6) - 2m_2 r \cos (\varphi + \pi /6) + g(m_1,m_2) \end{aligned}$$

where \(g(m_1,m_2)\) depends only on the masses and not on z. Therefore we can ignore g when studying spatial derivatives of the potential V.

Note that, for \(i=1,2\), we have

$$\begin{aligned} r_i^2= (z-p_i)^2=z^2 - 2(p_i|z) + p_i^2= r^2 - 2r \cos (\varphi -\alpha _i) + 1. \end{aligned}$$

Therefore, if we define

$$\begin{aligned} d(r,\alpha )= \left( r^2 - 2r \cos \alpha + 1 \right) ^{1/2}, \end{aligned}$$
(15)

then for \(i=1,2\) we have

$$\begin{aligned} r_i(r,\varphi )=d(r,\varphi -\alpha _i). \end{aligned}$$

In conclusion, in polar coordinates the amended potential takes the new form (compare to (1)):

$$\begin{aligned} V(r,\varphi ;m)= V_0(r) + m_1 W(r, \varphi -\alpha _1) + m_2 W(r, \varphi -\alpha _2) + g(m_1, m_2) \end{aligned}$$
(16)

where

$$\begin{aligned} V_0(r) = \frac{r^2}{2} + \frac{m_3}{r} \quad \textrm{ and }\quad W(r,\alpha ) = \frac{1}{d(r,\alpha )} - r \cos (\alpha ). \end{aligned}$$

Taking partial derivatives, the gradient of the potential (16) is given by

$$\begin{aligned} \frac{\partial V}{\partial r}(r,\varphi ;m) = r - \frac{m_3}{r^2} + m_1 \frac{\partial W}{\partial r}(r,\varphi -\alpha _1) + m_2 \frac{\partial W}{\partial r}(r,\varphi -\alpha _2), \end{aligned}$$
(17)
$$\begin{aligned} \frac{\partial V}{\partial \varphi }(r,\varphi ;m) = m_1 \frac{\partial W}{\partial \alpha }(r,\varphi -\alpha _1) + m_2 \frac{\partial W}{\partial \alpha }(r,\varphi -\alpha _2), \end{aligned}$$
(18)

where

$$\begin{aligned} \frac{\partial W}{\partial \alpha }(r,\alpha )&= r \sin \alpha \left( 1 - \frac{1}{d(r,\alpha )^3} \right) , \\ \frac{\partial W}{\partial r}(r,\alpha )&= -\frac{r - \cos \alpha }{d(r,\alpha )^3} - \cos \alpha . \end{aligned}$$

Note that both terms of (18) carry a factor r. We rescale this equation by the factor \(1/(r(m_1 + m_2))\). Expressed in the (s, t) parameters, (17) and the rescaled (18) become

$$\begin{aligned} F_1(r,\varphi ;s,t) = r - \frac{1 - t}{r^2} + st \frac{\partial W}{\partial r}(r,\varphi -\alpha _1) + (1 - s)t \frac{\partial W}{\partial r}(r,\varphi -\alpha _2), \end{aligned}$$
(19)
$$\begin{aligned} F_2(r,\varphi ;s,t) = s \frac{\partial W^\star }{\partial \alpha }(r,\varphi -\alpha _1) + (1-s) \frac{\partial W^\star }{\partial \alpha }(r,\varphi -\alpha _2), \end{aligned}$$
(20)

where

$$\begin{aligned} \frac{\partial W^\star }{\partial \alpha }(r,\alpha ) = \frac{1}{r}\frac{\partial W}{\partial \alpha }(r,\alpha ) = \sin \alpha \left( 1 - \frac{1}{d(r,\alpha )^3} \right) . \end{aligned}$$

The map \(F = (F_1, F_2)\) will be used in place of \(\nabla _z V(z;m)\) appearing in the previous sections. Zeros of F correspond to critical points of V, which in turn correspond to relative equilibria.
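As a consistency check, not used anywhere in the proofs, the sketch below implements \(F = (F_1, F_2)\) directly from (19)-(20) and compares it at an arbitrary test point with the Cartesian gradient of the amended potential (1); by the derivation above one should have \(F_1 = \partial V/\partial r\) and \(F_2 = (rt)^{-1}\,\partial V/\partial \varphi \). All concrete numbers are illustrative.

```python
import numpy as np

# Direct (non-rigorous) implementation of F = (F1, F2) from (19)-(20),
# checked against the Cartesian gradient of the amended potential (1).
a1, a2 = np.pi/6, -np.pi/6                       # angles of p1 and p2

d      = lambda r, a: np.sqrt(r*r - 2*r*np.cos(a) + 1.0)
dW_dr  = lambda r, a: -(r - np.cos(a)) / d(r, a)**3 - np.cos(a)
dWs_da = lambda r, a: np.sin(a) * (1.0 - 1.0 / d(r, a)**3)

def F(r, phi, s, t):
    F1 = (r - (1 - t)/r**2 + s*t*dW_dr(r, phi - a1)
          + (1 - s)*t*dW_dr(r, phi - a2))
    F2 = s*dWs_da(r, phi - a1) + (1 - s)*dWs_da(r, phi - a2)
    return np.array([F1, F2])

# Cartesian gradient of V, with the primaries as in Sect. 2.2.
p = np.array([[np.cos(a1), np.sin(a1)], [np.cos(a2), np.sin(a2)], [0.0, 0.0]])
def gradV(z, m):
    c = m @ p
    dz = z - p
    rr = np.linalg.norm(dz, axis=1)
    return (z - c) - (m / rr**3) @ dz

# F1 should equal dV/dr and F2 should equal dV/dphi / (r t) at any point.
r, phi, s, t = 1.2, 2.0, 0.3, 0.4
m = np.array([s*t, (1 - s)*t, 1 - t])
z = np.array([r*np.cos(phi), r*np.sin(phi)])
e_r, e_phi = np.array([np.cos(phi), np.sin(phi)]), np.array([-np.sin(phi), np.cos(phi)])
assert np.allclose(F(r, phi, s, t), [gradV(z, m) @ e_r, gradV(z, m) @ e_phi / t])
print("F matches the Cartesian gradient of V")
```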

5 General Strategy and Key Results

Recall that we want to determine the number of solutions to \(\nabla _z V(z;m) = 0\), and to understand how these behave within \({{\mathcal {C}}}\) when the masses vary within \({{\mathcal {M}}}\) (see Main Problem 1). For this to succeed, we must have a means of locating all solutions to the critical equation, and a way to analyze the various bifurcations that are possible as the masses vary.

As mentioned in Sect. 1 our overall goal is to construct a completely analytic proof of the following theorem:

Theorem 6

For each \(m\in {{\mathcal {M}}}\) there are exactly 8, 9, or 10 relative equilibria (i.e., solutions to the critical equation \(\nabla _z V(z;m) = 0\)) in \({{\mathcal {C}}}\).

This result was originally established by Barros and Leandro [4] using algebraic techniques. Developing analytic tools for the proof, we hope to apply these to harder instances of the n-body problem that are not within reach using algebraic methods.

In our new setting, using the (s, t)-coordinates together with polar coordinates (described in Sects. 3 and 4), the critical equation is transformed into its equivalent form \(F(r,\varphi ; s,t) = 0\). As explained earlier, this form is better suited for our approach, where set-valued numerical computations will play a major role.

Before going into the details of the computations used as part of our computer-assisted framework, we present the following key results. We will use three different techniques: finding the exact number of solutions for given parameters (s, t); proving that no bifurcations take place for a range of parameters; and controlling the number of solutions when a bifurcation takes place.

We begin by determining the number of solutions for two different parameters.

Theorem 7

Consider the critical equation \(F(r,\varphi ;s,t) = 0\) for the \(3+1\) body problem.

  (a) For parameters \((s,t) = (1/4, 1/4)\), there are exactly 8 solutions in \({{\mathcal {C}}}\).

  (b) For parameters \((s,t) = (9/20, 3/5)\), there are exactly 10 solutions in \({{\mathcal {C}}}\).

Note that \((s,t) = (1/4, 1/4)\in {{\mathcal {P}}}_1\) and \((s,t) = (9/20, 3/5)\in {{\mathcal {P}}}_2\), as discussed in Sect. 3.

Combining the results of Theorem 7 with a criterion that ensures that no bifurcations are taking place (see Sect. 6.3), we can extend the two solution counts to the two connected regions \({{\mathcal {P}}}_1\) and \({{\mathcal {P}}}_2\), respectively:

Theorem 8

Consider the critical equation \(F(r,\varphi ;s,t) = 0\) for the \(3+1\) body problem.

  (a) For all parameters \((s,t)\in {{\mathcal {P}}}_1\), no solution in \({{\mathcal {C}}}\) bifurcates.

  (b) For all parameters \((s,t)\in {{\mathcal {P}}}_2\), no solution in \({{\mathcal {C}}}\) bifurcates.

It follows that the number of solutions in \({{\mathcal {C}}}\) is constant for parameters in \({{\mathcal {P}}}_1\) and in \({{\mathcal {P}}}_2\) (8 and 10, respectively).

For the remaining part \({{\mathcal {P}}}_3\) of the parameter space, we must be a bit more detailed. As discussed in Sect. 7, we will split the configuration space into two connected components: \({{\mathcal {C}}}= {{\mathcal {C}}}_1 \cup {{\mathcal {C}}}_2\), thus isolating the region where all bifurcations occur.

Theorem 9

Consider the critical equation \(F(r,\varphi ;s,t) = 0\) for the \(3+1\) body problem.

  (a) For all parameters \((s,t)\in {{\mathcal {P}}}_3\), there are exactly 7 solutions in \({{\mathcal {C}}}_1\).

  (b) For all parameters \((s,t)\in {{\mathcal {P}}}_3\), there are 1, 2, or 3 solutions in \({{\mathcal {C}}}_2\).

Theorem 6 now follows from the combination of Theorems 7, 8, and 9. In what follows, we will justify each of the three steps described above.

6 Computational Techniques

Here we present the computational techniques that we need to employ in order to establish our main theorem. We also discuss the underlying set-valued methods used later on in the computer-assisted proofs.

6.1 Set-Valued Mathematics

We begin by giving a very brief introduction to set-valued mathematics and rigorous numerics. For more in-depth accounts we refer to e.g. [15, 17, 22].

We will exclusively work with compact boxes \({{\varvec{x}}}\) in \({\mathbb R}^n\), represented as vectors whose components are compact intervals: \({{\varvec{x}}}= ({{\varvec{x}}}_1,\dots ,{{\varvec{x}}}_n)\), where \({{\varvec{x}}}_i = \{x\in {\mathbb R}:\underline{x}_i \le x \le \overline{x}_i\}\) for \(i = 1,\dots , n\).

Given an explicit formula for a function \(f:{\mathbb R}^n\rightarrow {\mathbb R}^m\), we can form its interval extension (which we also denote by f), by extending each real operation by its interval counterpart. As long as the resulting interval image \(f({{\varvec{x}}})\) is well-defined, we always have the following inclusion property:

$$\begin{aligned} \textrm{range}(f;{{\varvec{x}}}) = \{f(x):x\in {{\varvec{x}}}\} \subseteq f({{\varvec{x}}}). \end{aligned}$$
(21)

The main benefit of moving from real-valued to interval-valued analysis is the ability to discretise continuous problems while retaining full control of the discretisation errors. Indeed, whilst the exact range of a function \(\textrm{range}(f;{{\varvec{x}}})\) is hard to compute, its interval image \(f({{\varvec{x}}})\) can be found by a finite computation. In practice, the interval image is computed via a finite sequence of numerical operations. Carefully crafted libraries using directed rounding, such as [9], ensure that the numerical output respects the mathematical inclusion property of (21).
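The inclusion property (21) can be illustrated with a toy interval type. The sketch below deliberately ignores the directed rounding that a genuine library such as [9] must provide, so it is a conceptual illustration only; the class name and the sample function are ad hoc.

```python
from dataclasses import dataclass

# A toy interval type illustrating the inclusion property (21). A real
# implementation must also control the rounding mode of every floating-point
# operation, which is omitted here.
@dataclass
class I:
    lo: float
    hi: float
    def __add__(self, o):  return I(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):  return I(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        c = [self.lo*o.lo, self.lo*o.hi, self.hi*o.lo, self.hi*o.hi]
        return I(min(c), max(c))
    def __contains__(self, x): return self.lo <= x <= self.hi

def f(x):                       # works for numbers and for intervals alike
    return x*x - x

x = I(0.0, 1.0)
fx = f(x)                       # interval image [-1, 1], a superset of the range
for xi in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert f(xi) in fx          # sampled values of the true range [-1/4, 0]
print(fx)
```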

6.2 Equation Solving and Safe Exclusions

We begin by stating what is known as the exclusion principle:

Theorem 10

If f(x) is well-defined and if \(0\notin f({{\varvec{x}}})\), then f has no zero in \({{\varvec{x}}}\).

The proof is an immediate consequence of (21). The exclusion principle can be used in an adaptive bisection scheme, gradually discarding subsets of a global search space \({{\varvec{X}}}\). At each stage of the bisection process, a (possibly empty) collection of subsets \({{\varvec{x}}}_1,\dots , {{\varvec{x}}}_n\) remain, whose union must contain all zeros of f. Note, however, that this does not imply that \(f(x) = 0\) has any solutions; we have only discarded subsets of \({{\varvec{X}}}\) where we are certain that no zeros of f reside. In order to prove the existence of zeros, we need an additional result.

Let \(f\in C^1({{\varvec{X}}}, {\mathbb R}^n)\), where \(\textrm{Dom}(f) = {{\varvec{X}}}\subseteq {\mathbb R}^n\). Given an interval vector \({{\varvec{x}}}\subset {{\varvec{X}}}\), a point \({\check{x}}\in {{\varvec{x}}}\), and an invertible \(n\times n\) matrix C, we define the Krawczyk operator [10, 17] as

$$\begin{aligned} K_f({{\varvec{x}}}; {\check{x}}; C) = {\check{x}} - C\cdot f({\check{x}}) + \big (I - C\cdot Df({{\varvec{x}}})\big )\cdot ({{\varvec{x}}}- {\check{x}}). \end{aligned}$$
(22)

Popular choices are \({\check{x}} = \textrm{mid}({{\varvec{x}}})\) and \(C = \textrm{mid}([Df({\check{x}})])^{-1}\), resulting in Newton-like convergence rates near simple zeros.

Theorem 11

Assume that \(K_f({{\varvec{x}}}; {\check{x}}; C)\) is well-defined. Then the following statements hold:

  1. If \(K_f({{\varvec{x}}}; {\check{x}}; C) \cap {{\varvec{x}}}= \emptyset \), then f has no zeros in \({{\varvec{x}}}\).

  2. If \(K_f({{\varvec{x}}}; {\check{x}}; C) \subset \textrm{int}\,{{\varvec{x}}}\), then f has a unique zero in \({{\varvec{x}}}\).

We will use this theorem together with the interval bisection scheme, where we adaptively bisect the initial search region \({{\varvec{X}}}\) into subsets \({{\varvec{x}}}\) that are either discarded due to the fact that \(0\notin f({{\varvec{x}}})\), kept intact because of \(K_f({{\varvec{x}}}; {\check{x}}; C) \subset \textrm{int}\,{{\varvec{x}}}\), or bisected for further study. On termination, this will give us an exact count on the number of zeros of f within \({{\varvec{X}}}\).
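To make the two tests concrete, here is a one-dimensional toy sketch of the exclusion principle (Theorem 10) and of the Krawczyk test (Theorem 11) for \(f(x) = x^2 - 2\). The interval images are obtained by hand from monotonicity, no rounding control is attempted, and the example has no connection to the 4-body equations.

```python
# Toy 1-D versions of the tests from Theorems 10 and 11 for f(x) = x^2 - 2 on
# subintervals of [1, 2]; on the positive axis both f and f' are increasing,
# so their interval images are obtained directly from the endpoints.
def f(x):  return x*x - 2.0
def df(x): return 2.0*x

def f_image(lo, hi):  return (f(lo), f(hi))
def df_image(lo, hi): return (df(lo), df(hi))

def krawczyk(lo, hi):
    """Enclosure of K_f(x; mid(x); 1/f'(mid(x))) for the interval [lo, hi]."""
    mid = 0.5*(lo + hi)
    C = 1.0/df(mid)
    dlo, dhi = df_image(lo, hi)
    slope_lo, slope_hi = 1.0 - C*dhi, 1.0 - C*dlo        # interval 1 - C*f'(x)
    rad = 0.5*(hi - lo)
    extent = max(abs(slope_lo), abs(slope_hi))*rad        # bound on (1 - C f'(x))(x - mid)
    center = mid - C*f(mid)
    return center - extent, center + extent

# Exclusion (Theorem 10): 0 is not in f([1.6, 2.0]) = [0.56, 2.0].
assert f_image(1.6, 2.0)[0] > 0.0

# Uniqueness (Theorem 11): K([1, 2]) = [1.25, 1.583...] lies inside (1, 2),
# so x^2 - 2 has a unique zero there (namely sqrt(2)).
klo, khi = krawczyk(1.0, 2.0)
assert 1.0 < klo and khi < 2.0
print(klo, khi)
```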

Theorem 11 can be extended to the setting where f also depends on some m-dimensional parameter: \(f:{\mathbb R}^n\times {\mathbb R}^m \rightarrow {\mathbb R}^n\) with \((x;p)\mapsto f(x;p)\). This is what we use to establish Theorem 7.

6.3 A Set-Valued Criterion for Local Uniqueness

Continuing in the set-valued, parameter dependent setting, we will explain in detail the criteria used for detecting when (and when not) a bifurcation occurs for a general system of non-linear equations, depending on some parameters. We will also derive results aimed at extracting more detailed information about some specific bifurcations that may occur.

Let us begin by considering the general problem of solving a system of (non-linear) equations

$$\begin{aligned} f(x;p) = 0\quad x\in {{\varvec{x}}},\quad p\in {{\varvec{p}}}, \end{aligned}$$
(23)

where \({{\varvec{x}}}\subset {\mathbb R}^n\) and \({{\varvec{p}}}\subset {\mathbb R}^m\) are high-dimensional boxes. For a sufficiently smooth \(f:{\mathbb R}^n\times {\mathbb R}^m\rightarrow {\mathbb R}^n\), we want to know how many solutions (23) can have. We will focus on the bifurcations and develop a criterion which will tell us when the number of solutions of (23) changes.

For now, we will suppress the parameter dependence of f for clarity. All results that follow are extendable to the parameter-dependent setting.

An obvious condition for the local uniqueness of solutions to (23) is given by the following theorem.

Theorem 12

Let \(f:{\mathbb R}^n\rightarrow {\mathbb R}^n\) be \(C^1\). Assume that we are given a box \({{\varvec{x}}}\subset {\mathbb R}^n\) such that \(0 \notin \det Df({{\varvec{x}}})\). Then f is a bijection from \({{\varvec{x}}}\) onto its image.

Note that \(Df({{\varvec{x}}})\) is a matrix with interval entries; it contains all possible Jacobian matrices Df(x), where \(x\in {{\varvec{x}}}\).

Proof

We have for \(x,y \in {{\varvec{x}}}\), \(x\ne y\)

$$\begin{aligned} f(x) - f(y) = \int _0^1 Df(y + t(x-y))dt \cdot (x-y) \in Df({{\varvec{x}}})\cdot (x-y). \end{aligned}$$
(24)

Now, since \(0 \notin \det Df({{\varvec{x}}})\), all (point-valued) matrices in \(Df({{\varvec{x}}})\) are non-singular. Therefore the right-hand side of (24) cannot contain the zero vector. Hence the left-hand side \(f(x) - f(y)\) cannot vanish. \(\square \)

This result constitutes the core of most of our computations. Given two boxes \({{\varvec{x}}}\) and \({{\varvec{p}}}\), we can easily compute \({{\varvec{y}}}= \det Df({{\varvec{x}}};{{\varvec{p}}})\) using interval arithmetic. If \(0\notin {{\varvec{y}}}\), we know that \(f(\cdot ;p)\) is a bijection (between \({{\varvec{x}}}\) and its image) for each \(p\in {{\varvec{p}}}\). Therefore, Eq. (23) can have at most one solution in \({{\varvec{x}}}\), and such a solution cannot undergo any bifurcation when \(p\in {{\varvec{p}}}\).
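A toy instance of this determinant test, with a hand-computed interval enclosure chosen only for illustration, could look as follows; the function f below is not related to the 4-body problem.

```python
# Determinant test of Theorem 12 for the toy map f(x, y) = (x^2 + y^2 - 1, x - y).
# Its Jacobian determinant is det [[2x, 2y], [1, -1]] = -2x - 2y, which is
# decreasing in both variables, so an enclosure over a box in the positive
# quadrant follows directly from the corners.
def det_Df_enclosure(x_lo, x_hi, y_lo, y_hi):
    return (-2.0*x_hi - 2.0*y_hi, -2.0*x_lo - 2.0*y_lo)

lo, hi = det_Df_enclosure(0.1, 0.5, 0.1, 0.5)
assert hi < 0.0   # 0 is not in [lo, hi] = [-2.0, -0.4]
# Hence f is injective on [0.1, 0.5] x [0.1, 0.5]: the system f = 0 has at most
# one solution there, and such a solution cannot bifurcate.
print(lo, hi)
```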

In dimension two, it is easy to illustrate the condition we are checking: the level sets of \(f_1\) and \(f_2\) must never become parallel, see Fig. 7.

Fig. 7

Some of the possible configurations of level sets of \(f_1\) and \(f_2\). The right-most configuration cannot happen under the assumptions of Theorem 12

6.3.1 Small Masses

In the case when one or two masses are very small, Theorem 12 is not practically applicable. In this situation some solutions to the critical equation are close to the primaries, and a careful analysis, combined with rigorous computer-assisted bounds, is needed.

The strategy to study this problem is the following: we focus on each primary with small mass, and use (local) polar coordinates \((\varrho ,\varphi )\) around it to study the problem. In this setting we study the following system of equations (suppressing the dependence on masses) based on the original amended potential (1)

$$\begin{aligned} \left\{ \begin{array}{c} \frac{\partial V}{\partial \varrho }(\varrho ,\varphi )=0,\\ \frac{\partial V}{\partial \varphi }(\varrho ,\varphi )=0. \end{array} \right. \end{aligned}$$
(25)

It turns out that it is relatively easy to control the solutions of the equation \(\frac{\partial V}{\partial \varphi }(\varrho ,\varphi )=0\) for \(\varphi \), obtaining four curves \((\varrho ,\varphi (\varrho ))\) on which we have uniform estimates over a whole range of \((m_1,m_2)\) including (0, 0). Turning to the remaining equation

$$\begin{aligned} \frac{\partial V}{\partial \varrho }(\varrho ,\varphi (\varrho ))=0, \end{aligned}$$

we study two cases separately: when \(m_1\) alone tends to zero, and when \(m_1\) and \(m_2\) both tend to zero. In both cases we can prove that any solution of (25) is of the form \((\varrho , \varphi (\varrho ))\), and satisfies

$$\begin{aligned} \frac{d}{d\varrho }\left( \frac{\partial V}{\partial \varrho }(\varrho ,\varphi (\varrho ))\right) \ne 0. \end{aligned}$$

This implies that all solutions are regular: there are no bifurcations. We summarise our findings in a quantitative and practically applicable statement:

Theorem 13

For \(0 < m_i \le 10^{-2}\) and \(R=10^{-3}\), any relative equilibrium z of (1) with \(\Vert z-p_i\Vert \le R\), \(i=1,2\), is non-degenerate: it does not bifurcate.

The details of all this can be found in Appendix A, with a thorough analysis of the solutions close to the light primaries in question, and with quantitative bounds that are both useful for proving the theorem, and of general interest for further studies. We note that there exist earlier, qualitative, results [24] that treat the case of several infinitesimal masses. The strength of Theorem 13 is that it is quantitative: we are given explicit bounds for the subset of \({{\mathcal {C}}}\times {{\mathcal {M}}}\) on which the statement holds. This is crucial for our approach: in the remainder of \({{\mathcal {C}}}\times {{\mathcal {M}}}\), not covered by Theorem 13, all three masses are quite large, and all relative equilibria are well-separated from the primaries. This is an important condition for an efficient implementation of Theorem 12, which forms a major part of our general computer-assisted proof.

This concludes what we use to establish Theorem 8.

6.4 Lyapunov–Schmidt Reduction in \({\mathbb R}^2\)

If we cannot invoke Theorems 12 or 13 on a given region \({{\varvec{x}}}\times {{\varvec{p}}}\), we must explore the system of equations (23) further.

Assume that the elimination of one variable (the Lyapunov–Schmidt reduction) is possible in the box \({{\varvec{x}}}\) for all \(p \in {{\varvec{p}}}\). The condition \(0\notin \frac{\partial f_i}{\partial x_j}({{\varvec{x}}};{{\varvec{p}}})\) for some \(i,j\in \{1,\dots ,n\}\) is sufficient for this to be possible. It ensures that \(f_i\) is strictly monotone in the variable \(x_j\) for all parameters \(p\in {{\varvec{p}}}\). This implies that a relation \(f_i(x) = C\) implicitly defines \(x_j\) in terms of the other independent variables: \(x_j = x_j(x_1,\dots , x_{j-1}, x_{j+1},\dots , x_n)\). The domain of this parametrization will depend on the constant C.

From now on we will work exclusively in the two-dimensional setting, and we will once again suppress the parameter dependency for clarity. As we are interested in solutions to (23), we want to understand how the zero-level sets of \(f_1\) and \(f_2\) behave for various parameters. Without any loss of generality consider the case \((i,j)=(1,1)\), when we have

$$\begin{aligned} \frac{\partial f_1}{\partial x_1}(x) \ne 0, \quad x \in {{\varvec{x}}}. \end{aligned}$$
(26)

Now assume also that the level set \(f_1(x) = 0\) forms exactly one connected component in \({{\varvec{x}}}\). Then there exists a parametrization \(x_1 = x_1(x_2)\) defined on a connected domain \([x_2^-,x_2^+]\subset {{\varvec{x}}}_2\), such that

$$\begin{aligned} f_1(x_1,x_2)=0 \quad \text{ if } \text{ and } \text{ only } \text{ if } \quad x_1=x_1(x_2). \end{aligned}$$
(27)

We can now define the reduced form of (23) as follows

$$\begin{aligned} g(x_2) = f_2(x_1(x_2),x_2) = 0, \qquad x_2 \in [x_2^-,x_2^+]. \end{aligned}$$
(28)

Thus, the number of zeros of g will determine the number of solutions to (23).

For the remaining three cases, the analogous construction would produce

$$\begin{aligned} g(x_1)&= f_2(x_1,x_2(x_1)) \qquad (i,j) = (1,2), \\ g(x_2)&= f_1(x_1(x_2),x_2) \qquad (i,j) = (2,1), \\ g(x_1)&= f_1(x_1,x_2(x_1)) \qquad (i,j) = (2,2). \end{aligned}$$

To simplify notation, we will use y as the independent variable of g, and its domain will be denoted \({{\varvec{y}}}= [y^-,y^+]\). Note that a sufficient condition for the uniqueness of a solution of \(g(y)=0\), and hence also of (23) is

$$\begin{aligned} g'(y) \ne 0, \quad y \in {{\varvec{y}}}. \end{aligned}$$
(29)

It turns out that (29) can be formulated in an invariant way. We will return to the case \((i,j) = (1,1)\) for the sake of clarity.

Implicit differentiation of (27) gives

$$\begin{aligned} x_1'(x_2)= -\left( \frac{\partial f_1}{\partial x_1}(x_1(x_2),x_2)\right) ^{-1} \frac{\partial f_1}{\partial x_2}(x_1(x_2),x_2). \end{aligned}$$

Applying the chain rule to (28) gives

$$\begin{aligned} g'(x_2)= \frac{\partial f_2}{\partial x_1}(x_1(x_2),x_2) x_1'(x_2) + \frac{\partial f_2}{\partial x_2}(x_1(x_2),x_2) \end{aligned}$$

and substituting the expression for \(x_1'(x_2)\) into this produces

$$\begin{aligned} g'(x_2)&= -\left( \frac{\partial f_1}{\partial x_1}\right) ^{-1} \frac{\partial f_1}{\partial x_2}\frac{\partial f_2}{\partial x_1}+\frac{\partial f_2}{\partial x_2} \\&= \left( \frac{\partial f_1}{\partial x_1}\right) ^{-1} \left( \frac{\partial f_1}{\partial x_1} \frac{\partial f_2}{\partial x_2} -\frac{\partial f_1}{\partial x_2}\frac{\partial f_2}{\partial x_1}\right) = \left( \frac{\partial f_1}{\partial x_1}\right) ^{-1} \det (Df). \end{aligned}$$

The importance of the non-vanishing condition (26) is now clear, as is the non-vanishing determinant condition of Theorem 12.

In what follows, we will refer to the constructed function g as the bifurcation function.
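The identity \(g'(x_2) = \left( \partial f_1/\partial x_1\right) ^{-1}\det (Df)\) derived above can be checked on a toy system where the reduction is explicit. The example below is illustrative only; the functions \(f_1, f_2\) are ad hoc and unrelated to the 4-body equations.

```python
import numpy as np

# Toy check of g'(x2) = (df1/dx1)^(-1) det Df for f1(x1, x2) = x1 - x2^2
# (so the level set f1 = 0 is x1 = x2^2) and f2(x1, x2) = x1*x2 - 1.
def g(x2):               # reduced (bifurcation) function f2(x1(x2), x2) = x2^3 - 1
    return x2**3 - 1.0

def det_Df(x1, x2):      # det [[1, -2*x2], [x2, x1]]
    return x1 + 2.0*x2**2

x2 = 0.7
x1 = x2**2                                             # point on the level set f1 = 0
h = 1e-6
g_prime_fd = (g(x2 + h) - g(x2 - h)) / (2*h)           # finite-difference g'
assert np.isclose(g_prime_fd, det_Df(x1, x2) / 1.0)    # here df1/dx1 = 1
print(g_prime_fd, 3*x2**2)
```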

6.5 Bifurcation Analysis

Having seen how to apply the Lyapunov–Schmidt reduction, we now re-introduce the parameters to f (and thus g). Given a bifurcation function \(g:{{\varvec{y}}}\times {{\varvec{p}}}\rightarrow {\mathbb {R}}\), we would like to study the maximal number of solutions \(g(y;p) = 0\) can have in \({{\varvec{y}}}\) for \(p \in {{\varvec{p}}}\).

Instead of fully resolving the details of all possible bifurcations, we will use the following simple observation:

Lemma 14

Let g be the bifurcation function as defined in Sect. 6.4. Assume that for some \(k \in {\mathbb {Z}}^+\) we have

$$\begin{aligned} \frac{\partial ^k g}{\partial y^k}(y;p) \ne 0 , \quad \forall (y,p) \in {{\varvec{y}}}\times {{\varvec{p}}}. \end{aligned}$$
(30)

Then for each \(p \in {{\varvec{p}}}\), the equation \(g(y;p) = 0\) has at most k solutions in \({{\varvec{y}}}\).

The rightmost part of Fig. 7 illustrates the case \(k=2\): the bifurcation function g then has a quadratic behaviour.

Given a search region \({{\varvec{x}}}\times {{\varvec{p}}}\) for the original problem (23), we can try to find a positive integer k such that \(0\notin g^{(k)}({{\varvec{x}}}_i;{{\varvec{p}}})\). Here i can be any index for which the Lyapunov–Schmidt reduction works (note that then we have \({{\varvec{y}}}\subset {{\varvec{x}}}_i\)). If we succeed, we have an upper bound on the number of solutions to \(g(y;p) = 0\) in the region \({{\varvec{x}}}_i\times {{\varvec{p}}}\supset {{\varvec{y}}}\times {{\varvec{p}}}\). Note that this number translates to the original system of equations (23). By construction, the solutions to \(g(y;p) = 0\) are in one-to-one correspondence with those of \(f(x;p) = 0\).
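As a toy illustration of Lemma 14, consider the family \(g(y;p) = y^3 - py\): its third derivative in y is identically 6, so (30) holds with \(k = 3\) on any box, and no parameter value can produce more than three zeros. The sketch below confirms this for a few sample parameters; it is not related to the actual bifurcation functions of the 4-body problem.

```python
import numpy as np

# For g(y; p) = y^3 - p*y we have d^3 g / dy^3 = 6 everywhere, so Lemma 14
# (with k = 3) caps the number of zeros at three for every parameter p.
for p in (-0.5, 0.0, 0.5):
    roots = np.roots([1.0, 0.0, -p, 0.0])          # roots of y^3 - p*y
    real = [r.real for r in roots if abs(r.imag) < 1e-9]
    n = len(set(np.round(real, 8)))                # number of distinct real zeros
    assert n <= 3                                  # the bound from Lemma 14
    print(p, n)                                    # 1, 1 and 3 zeros, respectively
```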

For the planar circular restricted 4-body problem it turns out that we only have to consider the cases \(k=1,2,3\). The bifurcation function g never behaves worse than a cubic function. The actual evaluation of the increasingly complicated expressions \(g^{(k)}\) is achieved by automatic differentiation—a technique that automatically computes the (partial) derivatives of a given function, having access only to the algebraic expression of the function itself [7]. Furthermore, the Lyapunov–Schmidt reduction always succeeds in the case \((i,j) = (2,2)\), so we always have \(g = g(x_1) = f_1(x_1, x_2(x_1))\).

We end this section by describing how we ensure that the level set \(f_2(x) = 0\) forms exactly one connected component in \({{\varvec{x}}}\). This is an important part of the Lyapunov–Schmidt reduction, and allows us to obtain an upper bound on the number of solutions via Lemma 14. In our implementation, all operations and functions are extended to their interval-valued counterparts, as described in Sect. 6.1. First we verify that \((i,j) = (2,2)\) are suitable indices by checking that \(0 \notin \frac{\partial f_2}{\partial x_2}({{\varvec{x}}})\). Note that this condition prevents a component of \(f_2(x) = 0\) forming a closed loop inside \({{\varvec{x}}}\); each component must enter and exit \({{\varvec{x}}}\). Writing \({{\varvec{x}}}= [x_1^-, x_1^+]\times [x_2^-, x_2^+]\), we next verify that \(f_2(x_1^-, x_2^-)< 0 < f_2(x_1^+, x_2^+)\). This implies that \(f_2\) must vanish at least twice on the boundary of \({{\varvec{x}}}\). For each of the four sides \({{\varvec{s}}}_i\) \((i = 1,\dots ,4)\) of the rectangle \({{\varvec{x}}}\), we compute an enclosure of the zero set \({{\varvec{z}}}_i = \{x\in {{\varvec{s}}}_i:f_2(x) = 0\}\). On each such non-empty \({{\varvec{z}}}_i\), we check that \(f_2\) is strictly increasing in its non-constant variable. This implies that \(\cup _{i=1}^4 {{\varvec{z}}}_i\) must form two connected components \(w_1\) and \(w_2\). Each \(w_i\) is made up of either one (non-empty) zero set \({{\varvec{z}}}_j\) or two such sets joined at a corner of \({{\varvec{x}}}\). The level set \(f_2(x) = 0\) crosses each of \(w_1\) and \(w_2\) transversally, and exactly once. Thus we have proved that \(f_2(x) = 0\) forms exactly one connected component in \({{\varvec{x}}}\).

7 Computational Results

We will now describe the program used for proving the results of Sect. 5. Throughout all computations, we use the parameters \((s,t)\in {{\mathcal {P}}}= [0, \tfrac{1}{2}]\times [0, \tfrac{2}{3}]\) in place of the masses \((m_1, m_2, m_3)\in {{\mathcal {M}}}\), as described in Sect. 3. We also represent the phase space variable \(z\in {{\mathcal {C}}}\) in polar coordinates \((r, \varphi ) \in [\tfrac{1}{3},2]\times [-\pi , \pi ]\), and desingularize the equations as described in Sect. 4. This transforms the original critical equation \(\nabla _z V(z;m) = 0\) into the equivalent problem \(F(r,\varphi ; s, t) = 0\), which is more suitable for computations.

The syntax of the program is rather straightforward:

figure a
  • tol is the stopping tolerance for the adaptive bisection process, which discards subsets of the search region proved to contain no zeros;

  • We consider parameters \(s\in {{\varvec{s}}}\) satisfying minS \(\le s\le \) maxS.

  • We consider parameters \(t\in {{\varvec{t}}}\) satisfying minT \(\le t\le \) maxT.

  • strategy determines which method we use:

    1. Explicitly count all solutions in \({{\mathcal {C}}}\) (used only for \({{\varvec{s}}}\times {{\varvec{t}}}\subset {{\mathcal {P}}}\) of very small width).

    2. Verify that there are no bifurcations taking place in \({{\mathcal {C}}}\) for \((s,t)\in {{\varvec{s}}}\times {{\varvec{t}}}\subset {{\mathcal {P}}}\).

    3. Resolve all bifurcations taking place in \({{\mathcal {C}}}\) for \((s,t)\in {{\varvec{s}}}\times {{\varvec{t}}}\subset {{\mathcal {P}}}\).

The three strategies are based on Theorem 11, Theorem 12 together with Theorem 13, and Lemma 14, respectively.

When using strategy 1, no splitting in parameter space is carried out. Only the configuration space is adaptively subdivided during the search for isolated solutions. By contrast, when using strategy 2 or 3, splitting occurs in both parameter and configuration space. The splitting is carried out according to several different criteria in the code—all with the aim of locally verifying the conditions required by the theorems employed.

7.1 Proof of Theorem 7

We demonstrate the program by proving Theorem 7, running it on two different point-valued parameters (using strategy 1). The first one is for \((s,t) = (1/4, 1/4)\). This corresponds to the masses \(m = (1/16, 3/16, 12/16)\), and produces 8 solutions.

figure b

Here \(\texttt {smallList}\) is empty, signaling that the bisection stage was successful and had no unresolved subdomains. \(\texttt {noList}\) contains all boxes that were excluded using Theorem 10: there were 185 such boxes in this run. The eight elements of \(\texttt {yesList}\) are certified to contain unique zeros of f, according to Theorem 11. The same eight zeros are enclosed in the much smaller boxes stored in \(\texttt {tightList}\). Finally, the four boxes of \(\texttt {ndtList}\) are close to the lighter primaries, and are proven not to contain any zeros of f according to the exclusion results of Sect. 2.2. The output of this run is illustrated in Fig. 8a.

Fig. 8

Relative equilibria (blue small disks) for the restricted 4-body problem. The three primaries are red large disks. Green sectors contain unique solutions. White sectors are solution-free. a When masses differ substantially, there are eight relative equilibria. b Near equal masses yield ten relative equilibria (Color figure online)

The second run is for \((s,t) = (9/20, 3/5)\). This corresponds to \(m = (27/100, 33/100, 40/100)\), and produces 10 solutions. The output of this run is listed below, and is illustrated in Fig. 8b.

figure c

These two runs complete the proof of Theorem 7.

7.2 Proof of Theorem 8

Extending these results to larger domains in the (s, t)-parameter space, we turn to Theorem 8. In order to improve execution times, we pre-split the parameter domain into smaller subsets as illustrated in Fig. 9. This particular splitting is rather ad hoc and is based on some heuristic trial runs; other splittings would work fine too.

Fig. 9

a The original partition \({{\mathcal {P}}}= {{\mathcal {P}}}_1\cup {{\mathcal {P}}}_2\cup {{\mathcal {P}}}_3\) used in the (st)-space. b The finer partition of \({{\mathcal {P}}}\) used in our computations (not to scale). \({{\mathcal {P}}}_3\) is pre-split into three subsets, and \({{\mathcal {P}}}_1\) is pre-split into four. \({{\mathcal {P}}}_2\) remains intact

We begin with the parameter set \({{\mathcal {P}}}_1 = [0,0.5]\times [0,0.55]\) which is pre-split into four rectangles:

$$\begin{aligned} {{\mathcal {P}}}_1 = [0, 10^{-6}]\times [0, 10^{-6}] \,\cup \, [10^{-6},0.5]\times [0, 10^{-6}] \,\cup \, [0, 10^{-6}]\times [10^{-6}, 0.55] \,\cup \, [10^{-6},0.5]\times [10^{-6}, 0.55], \end{aligned}$$

see Fig. 9b. Note that \(\tilde{{{\mathcal {P}}}_1} = {{\mathcal {P}}}_1\cap {\tilde{{{\mathcal {P}}}}}\) corresponds to the mass region above the bifurcation line in Fig. 3b: this region contains all small masses \(m_1\) and \(m_2\). Examining each of the four rectangles separately, we now use strategy number 2, which means that we are attempting to verify that no bifurcations take place for these parameter ranges. We begin with \((s,t)\in [0, 10^{-6}]\times [0, 10^{-6}]\):

figure d

We repeat the computation on the second rectangle \((s,t)\in [10^{-6},0.5]\times [0, 10^{-6}]\):

figure e

We repeat the computation on the third rectangle \((s,t)\in [0, 10^{-6}]\times [10^{-6}, 0.55]\):

figure f

Finally, we repeat the computation on the fourth rectangle \((s,t)\in [10^{-6},0.5]\times [10^{-6}, 0.55]\):

figure g

These four runs prove that \({{\mathcal {P}}}_1\) is bifurcation free, which establishes part (a) of Theorem 8. Here we also get a report of several encountered parameter regions that are unordered; they are a consequence of the reparametrization of the mass space, and belong to the set \({{\mathcal {P}}}{\setminus }{\tilde{{{\mathcal {P}}}}}\), see Fig. 6. Such parameters are automatically discarded. Furthermore, ndgList contains all boxes where we have established that no bifurcations are taking place. As we are encountering parameters corresponding to both \(m_1\) and \(m_2\) becoming (arbitrarily) small, there is a significant partitioning near the lighter primaries, resulting in many boxes in ndtList.

Turning to part (b) of Theorem 8, we continue with the parameter set \({{\mathcal {P}}}_2 = [0,0.5]\times [0.58, 0.67]\). Note that \({{\mathcal {P}}}_2\cap {\tilde{{{\mathcal {P}}}}}\) corresponds to the mass region below the bifurcation line in Fig. 3b. This region contains the point of equal masses. Again we use strategy number 2.

figure h

This run proves that \({{\mathcal {P}}}_2\) is bifurcation free, as claimed. Note that the computational effort was smaller than that for \({{\mathcal {P}}}_1\). This is due to the fact that we are not considering any parameters corresponding to small masses in this run. We have now completed the proof of Theorem 8.

7.3 Proof of Theorem 9

Turning to Theorem 9, we focus on the only remaining parameter set \({{\mathcal {P}}}_3 = [0,0.5]\times [0.55, 0.58]\), which has been constructed to contain the entire line of bifurcation, see Fig. 6b. We now use strategy number 3, which means that we will locate and resolve all occurring bifurcations.

It turns out that all bifurcations take place within a small subset \({{\mathcal {C}}}_2\) of the configuration space. In the complement \({{\mathcal {C}}}_1 = {{\mathcal {C}}}{\setminus }{{\mathcal {C}}}_2\), the solutions are non-degenerate and persist for all parameter values in \({{\mathcal {P}}}_3\). In light of this, the program splits the configuration space into two parts: \({{\mathcal {C}}}= {{\mathcal {C}}}_1\cup {{\mathcal {C}}}_2\) which are examined separately. For purely technical reasons \({{\mathcal {C}}}_1\) is further divided into three subregions (named C11, C12, C13 in the output).

As explained above, \({{\mathcal {P}}}_3\) is pre-split into three rectangles:

$$\begin{aligned} {{\mathcal {P}}}_3 = [0, 0.2]\times [0.55, 0.58] \cup [0.2,0.25]\times [0.55, 0.58] \cup [0.25, 0.5]\times [0.55, 0.58], \end{aligned}$$

see Fig. 9b.

We begin with \((s,t)\in [0, 0.2]\times [0.55, 0.58]\):

figure i

This run is not so interesting: there are no bifurcations taking place in the parameter region. Indeed, most of these parameters are unordered, and are discarded immediately. Apart from these, there are some parameters that are provably bifurcation-free (we are actually using the techniques of strategy 2 as part of the computation). No higher order bifurcations take place at all.

The next two parameter regions are much more challenging. We continue with \((s,t)\in [0.2, 0.25]\times [0.55, 0.58]\):

figure j

Let us now explain what we can learn from this run.

The first part of the run verifies that no bifurcations take place within \({{\mathcal {C}}}_1\) for parameters in \({{\mathcal {P}}}_3\): we use strategy 2 to establish this fact. Figure 10a illustrates the outcome of this part of the computations.

Fig. 10
figure 10

a There are no bifurcations taking place inside \({{\mathcal {C}}}_1\). For all parameters in \({{\mathcal {P}}}_3\), there exist seven solutions here, each located in one of the green connected regions. b Inside \({{\mathcal {C}}}_2\) two kinds of bifurcations occur: quadratic and cubic. As before, some exclusion theorems are valid in the yellow sections (Color figure online)

The second part of the run focuses on the remaining portion of the configuration space \({{\mathcal {C}}}_2\) (named \(\texttt {C0}\) in the output); in polar coordinates \({{\mathcal {C}}}_2 = [\tfrac{1}{3},1]\times [\tfrac{2}{10}, \tfrac{7}{10}]\). Throughout these computations, the parameter region \({{\mathcal {P}}}_3\) is adaptively bisected into many smaller sets, each of which is examined via various tests. When successfully classified, a parameter set is stored in one of several lists (a schematic of this bookkeeping is sketched after the list below). Together with each such parameter set, we also store a covering of all possible solutions in \({{\mathcal {C}}}_2\). The lists are organized as follows:

  1. s0List: the parameters are unordered (we do not consider these).

  2. s1List: the number of solutions is at most one and fixed (no bifurcations) in \({{\mathcal {C}}}_2\).

  3. s111List: there are three connected components in \({{\mathcal {C}}}_2\); in each one of them the number of solutions is at most one and fixed (no bifurcations).

  4. s210List: there are two connected components in \({{\mathcal {C}}}_2\) where solutions may reside. For one of these the number of solutions is at most one and fixed (no bifurcations). For the second component there can be 0, 1, or 2 solutions (a quadratic bifurcation).

  5. s300List: there is one connected component in \({{\mathcal {C}}}_2\) where the number of solutions can be 1, 2 or 3 (a cubic bifurcation).

We also mention that s3List is the union of s111List, s210List, and s300List.
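To clarify how a classified parameter box ends up in one of these lists, here is a small illustrative Python dispatch. It assumes, as the descriptions above suggest, that the digits in a list name record the maximal number of solutions in each connected component of the solution cover inside \({{\mathcal {C}}}_2\); the names mirror the output, but the routine itself is purely schematic and not part of the actual program.

```python
# Hypothetical dispatch of a classified parameter box into the output lists.
# max_solutions_per_component: for each connected component of the solution
# cover in C2, a rigorously established upper bound on the number of
# solutions; an empty list encodes unordered (discarded) parameters.

def store(box, max_solutions_per_component, lists):
    key = sorted(max_solutions_per_component, reverse=True)
    if key == []:
        lists["s0List"].append(box)      # unordered parameters
    elif key == [1]:
        lists["s1List"].append(box)      # one component, no bifurcation
    elif key == [1, 1, 1]:
        lists["s111List"].append(box)    # three components, no bifurcations
    elif key == [2, 1]:
        lists["s210List"].append(box)    # quadratic bifurcation in one component
    elif key == [3]:
        lists["s300List"].append(box)    # cubic bifurcation
    else:
        raise ValueError("unclassified box; bisect the parameters further")
```

With this reading, s3List is simply the concatenation of the last three lists.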

Our final computation deals with the region \((s,t)\in [0.25, 0.5]\times [0.55, 0.58]\):

figure k

The main difference from the previous run is that there are no cubic bifurcations in this parameter region, only quadratic ones. Indeed, s300List is empty, but s210List is not.

In summary, the classification of the parameter sets into the lists produced by these three runs is illustrated in Fig. 11.

Fig. 11
figure 11

The bifurcations (and non-bifurcations) taking place within the parameter region \({{\mathcal {P}}}_3 = [0, 0.5]\times [0.55, 0.58]\). There are five groups of parameters illustrated here: (1) the white region corresponds to s0List; (2) the green region corresponds to s1List; (3) the yellow region corresponds to s111List; (4) the blue region corresponds to s210List; (5) the red region corresponds to s300List. Each group of parameters forms a simply connected set (Color figure online)

The data presented in Fig. 11 (and Fig. 12), combined with our previous results, provide all the information needed to obtain an accurate count of the number of solutions.

First, note that the parameters in s1List form a connected set having a non-empty intersection with \({{\mathcal {P}}}_1\). We can therefore use part (a) of Theorem 8 to conclude that each parameter in s1List will yield exactly eight solutions in \({{\mathcal {C}}}\): seven in \({{\mathcal {C}}}_1\) and one in \({{\mathcal {C}}}_2\).

Similarly, we note that the parameters in s111List form a connected set having a non-empty intersection with \({{\mathcal {P}}}_2\). Using part (b) of Theorem 8 we conclude that out of the ten solutions in \({{\mathcal {C}}}\), seven reside in \({{\mathcal {C}}}_1\) and the remaining three belong to \({{\mathcal {C}}}_2\).

Turning to s210List, these parameters form a connected set having a non-empty intersection with both s1List and s111List. The bifurcation-free connected component of \({{\mathcal {C}}}_2\) detected for all parameters in s210List must, by continuity, carry exactly one solution in \({{\mathcal {C}}}_2\). Thus each parameter in s210List yields 1, 2 or 3 solutions in \({{\mathcal {C}}}_2\).

Fig. 12
figure 12

Successive zoom-ins toward the region of finest subdivision in Fig. 11

Finally, we discuss the parameters in s300List. Without additional information, the condition used to indicate the presence of a cubic bifurcation does not exclude there being no solutions; it only bounds the number of solutions from above by three. We do, however, guarantee the existence of at least one solution for elements of s300List via an extra topological check performed during the computations.

As mentioned above, each parameter set stored in s300List comes equipped with a cover of the associated solution set \(\{{{\varvec{c}}}_i\}_{i=1}^N\) making up a connected subset of \({{\mathcal {C}}}_2\). Forming the rectangular hull \({{\varvec{c}}}\) of all \({{\varvec{c}}}_i\), \(i=1,\dots ,N\), we prove that there must be a zero of f inside \({{\varvec{c}}}\) (and thus inside \({{\mathcal {C}}}_2\)) using the following topological theorem.

Theorem 15

Let \(f:[x^-, x^+]\times [y^-, y^+]\rightarrow {\mathbb R}^2\) be a continuous function (with components \(f_1\) and \(f_2\)), and assume that the following holds:

  1. Both \(f_1\) and \(f_2\) are negative on the two sides \(\{x^-\}\times [y^-, y^+]\) and \([x^-, x^+]\times \{y^-\}\).

  2. Both \(f_1\) and \(f_2\) are positive at the upper-right corner \((x^+, y^+)\).

  3. \(\max \{x\in [x^-, x^+] :f_1(x,y^+) = 0\} < \min \{x\in [x^-, x^+] :f_2(x,y^+) = 0\}\).

  4. \(\max \{y\in [y^-, y^+] :f_2(x^+,y) = 0\} < \min \{y\in [y^-, y^+] :f_1(x^+,y) = 0\}\).

Then f has (at least) one zero in \([x^-, x^+]\times [y^-, y^+]\).

Of course, the theorem remains true when we reverse all appearing signs or interchange the function components. We can also relax assumption 1 and simply demand that \(f_1\) is negative on \([x^-, x^+]\times \{y^-\}\) and \(f_2\) is negative on \(\{x^-\}\times [y^-, y^+]\). We keep the current (stronger) assumptions as they actually hold for the problem at hand. A typical realization of these assumptions is illustrated in Fig. 13.

Fig. 13
figure 13

An illustration of how the (set-valued) zero-sets of \(f_1\) and \(f_2\) behave for typical parameters in s300List

Proof

By the intermediate value theorem, assumptions 1 and 2 imply that both \(f_1\) and \(f_2\) change sign (and therefore vanish) somewhere along the two sides \([x^-, x^+]\times \{y^+\}\) and \(\{x^+\}\times [y^-, y^+]\). Therefore the sets appearing in assumptions 3 and 4 are non-empty, and the inequalities can be checked. Now, let \(x^\star \) satisfy \(\max \{x\in [x^-, x^+] :f_1(x,y^+) = 0\}< x^\star < \min \{x\in [x^-, x^+] :f_2(x,y^+) = 0\}\). Similarly, let \(y^\star \) satisfy \(\max \{y\in [y^-, y^+] :f_2(x^+,y) = 0\}< y^\star < \min \{y\in [y^-, y^+] :f_1(x^+,y) = 0\}\). Then, by a continuous deformation, we can form a new rectangle with corners \((x^-,y^-), (x^\star , y^+), (x^+, y^+)\), and \((x^+, y^\star )\) on which we can directly apply the Poincaré-Miranda theorem, see [14]. It follows that f has a zero in the original rectangle. \(\square \)

As all assumptions of the theorem are open conditions, we can extend it to the case when f depends on (set-valued) parameters. This is what we use for the parameters forming \(\texttt {s300List}\).
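To illustrate how this verification can be organized, the following Python sketch checks the assumptions of Theorem 15 from set-valued data. It presumes that the rigorous computations have already produced the rectangular hull of the cover, the sign information for assumptions 1 and 2, and interval enclosures of the zero sets of \(f_1\) and \(f_2\) on the top and right edges; all names are illustrative and do not reflect the actual program's interface.

```python
# Sketch of the set-valued verification of Theorem 15.  Boxes and intervals
# are plain (lo, hi) pairs; the rigorous enclosures are assumed to be given.

def hull(boxes):
    """Rectangular hull of a list of boxes ((x_lo, x_hi), (y_lo, y_hi))."""
    return ((min(b[0][0] for b in boxes), max(b[0][1] for b in boxes)),
            (min(b[1][0] for b in boxes), max(b[1][1] for b in boxes)))

def theorem15_applies(neg_on_lower_left, pos_at_corner,
                      Z1_top, Z2_top, Z1_right, Z2_right):
    """Return True if assumptions 1-4 hold, so f has a zero in the hull.

    neg_on_lower_left, pos_at_corner: booleans established by interval
    evaluation of f1 and f2 (assumptions 1 and 2).
    Z1_top, Z2_top: interval enclosures of the zero sets of f1, f2 on the
    top edge; Z1_right, Z2_right: likewise on the right edge.  Under
    assumptions 1 and 2 these zero sets are non-empty (intermediate value
    theorem), so the max/min below are well defined."""
    if not (neg_on_lower_left and pos_at_corner):
        return False
    # Assumption 3: every zero of f1 on the top edge lies strictly to the
    # left of every zero of f2 there.
    if not max(z[1] for z in Z1_top) < min(z[0] for z in Z2_top):
        return False
    # Assumption 4: every zero of f2 on the right edge lies strictly below
    # every zero of f1 there.
    if not max(z[1] for z in Z2_right) < min(z[0] for z in Z1_right):
        return False
    return True
```

Since all inequalities are strict and evaluated on enclosures, a positive answer remains valid for every parameter in the enclosing (set-valued) parameter box, which is exactly the extension used for the parameters in s300List.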

7.4 Timings

We end this section by reporting the timings of all computations. These were carried out sequentially on a laptop, using a single thread on an Intel Core i7-7500U CPU running at 2.70 GHz. The memory requirements are very low and are not reported. For all computations we used the same splitting tolerance: \(\texttt {tol} = 10^{-6}\). The total wall time amounts to 16m:29s. This can of course be reduced substantially by a finer pre-splitting in parameter space, combined with a simple script for parallel execution (Table 1).

Table 1 Wall timings for the computer-assisted part of the proof
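As an illustration of the parallelization just mentioned, here is a minimal driver that pre-splits the s-range into strips and verifies them concurrently; the executable name ./verify and its command-line flags are hypothetical and only indicate the shape of such a script.

```python
# Hypothetical parallel driver: pre-split the parameter rectangle into
# strips in s and verify each strip in a separate process.

import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_strip(bounds):
    s_lo, s_hi, t_lo, t_hi = bounds
    result = subprocess.run(
        ["./verify", "--s", str(s_lo), str(s_hi),
         "--t", str(t_lo), str(t_hi), "--tol", "1e-6"],
        capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # eight strips covering s in [0, 0.5] for a fixed t-range
    strips = [(0.5 * k / 8, 0.5 * (k + 1) / 8, 0.55, 0.58) for k in range(8)]
    with ProcessPoolExecutor() as pool:
        for output in pool.map(run_strip, strips):
            print(output)
```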

8 Conclusions and Future Work

We have demonstrated a novel way to account for all relative equilibria in the planar, circular, restricted 4-body problem, and used it to give a new proof of the results by Barros and Leandro (Theorem 6). The novelty of our approach is that it does not rely upon any algebraic considerations; it is purely analytic. As such, it is completely insensitive to the exact shape of the gravitational potential, and generalizes to a wider range of problems. The main advantage, however, is that our method is amenable to set-valued computations, and can therefore draw on the mature techniques and machinery available for solving non-linear equations with computer-assisted methods. This, in turn, gives us a realistic expectation that our approach is transferable to harder instances of the n-body problem; our next challenge will be to work on the unrestricted 4-body problem, where we expect the number of relative equilibria to range between 32 and 50 (depending on the masses of the four primaries).