1 Introduction

1.1 Context

Let \(\lambda : A \rightarrow B\) be an isogeny of abelian varieties over \(\mathbb {Q}\). The \(\lambda \)- of A is defined by

$$\begin{aligned} {{\,\mathrm{Sel}\,}}_{\lambda }A :=\ker \left( \mathrm {H}^1(\mathbb {Q},A[\lambda ]) \rightarrow \prod _v \mathrm {H}^1(\mathbb {Q}_v,A) \right) , \end{aligned}$$

where \(A[\lambda ]\) denotes the kernel of \(\lambda \), the cohomology groups are Galois cohomology and the product runs over all places v of \(\mathbb {Q}\). It is a finite group defined by local conditions and fits in an exact sequence

The determination of \({{\,\mathrm{Sel}\,}}_{\lambda }A\), known as performing a \(\lambda \)-descent, is often the first step towards determining the finitely generated abelian groups \(A(\mathbb {Q})\) and \(B(\mathbb {Q})\). One is therefore led to ask how \({{\,\mathrm{Sel}\,}}_{\lambda }A\) behaves on average as \(\lambda \) varies in families. When \(A=B\) ranges over a family of Jacobian varieties and \(\lambda \) is multiplication by an integer, the last ten years have seen spectacular progress in this direction; see for example [7, 12, 13, 52, 56] for works of particular relevance to this paper. There are some results when A is not a Jacobian variety (see for example [9, 40, 41]) but they concern twists of a single abelian variety over \(\mathbb {Q}\), therefore considering only an isotrivial family in the relevant moduli space. By contrast in this paper we study for the first time a non-isotrivial family of abelian varieties which are not Jacobians.

1.2 Statement of results

Let \(\mathscr {E} \subset \mathbb {Z}^4\) be the subset of 4-tuples of integers \(b=(p_2,p_6,p_8,p_{12})\) such that the projective closure of the equation

$$\begin{aligned} y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12} \end{aligned}$$
(1.1)

defines a smooth genus-3 curve \(C_b\) over \(\mathbb {Q}\). The quotient of \(C_b\) by the involution \(\tau (x,y) = (x,-y)\) is an elliptic curve \(E_b\) given by the equation

$$\begin{aligned} y^2+p_2xy+p_6y = x^3+p_8x+p_{12}. \end{aligned}$$
(1.2)

The associated morphism \(f:C_b\rightarrow E_b\) is a double cover ramified at four points, namely the ones with \(y=0\) and the point at infinity. The families of curves \(C_b\) and \(E_b\) parametrized by such b have a moduli interpretation, see Remark 1.2.

Let \(J_b\) be the Jacobian variety of \(C_b\) and let \(P_b\) be the kernel of the norm map \(f_* :J_b \rightarrow E_b\). Then \(P_b\) is an abelian surface carrying a polarization \(\rho : P_b \rightarrow P_b^{\vee }\) of type (1, 2). (This means that \(P_b[\rho ](\overline{\mathbb {Q}})\simeq (\mathbb {Z}/2\mathbb {Z})^2\).) It is called the associated to the double cover \(C_b \rightarrow E_b\). The abelian threefold \(J_b\) is isogenous to \(P_b \times E_b\).

For \(b\in \mathscr {E}\) we define the of b as

$$\begin{aligned} \text {ht}(b) = \max |p_i(b)|^{1/i}. \end{aligned}$$

Note that for every \(X\in \mathbb {R}_{>0}\), the set \(\{b\in \mathscr {E} \mid \text {ht}(b) < X \}\) is finite.

Theorem 1.1

(Theorem 8.2) The average size of \({{\,\mathrm{Sel}\,}}_{\rho } P_b\) for \(b\in \mathscr {E}\), when ordered by height, equals 3. More precisely, we have

$$\begin{aligned} \lim _{X\rightarrow \infty } \frac{ \sum _{b\in \mathscr {E},\; \text {ht}(b)<X }\# {{\,\mathrm{Sel}\,}}_{\rho }P_b }{\# \{b \in \mathscr {E}\mid \text {ht}(b) < X\}} = 3. \end{aligned}$$

Theorem 1.2

(Theorem 8.4) The average size of \({{\,\mathrm{Sel}\,}}_{2} P_b\) for \(b\in \mathscr {E}\), when ordered by height, is bounded above by 5. More precisely, we have

$$\begin{aligned} \limsup _{X\rightarrow \infty } \frac{ \sum _{b\in \mathscr {E},\; \text {ht}(b)<X }\# {{\,\mathrm{Sel}\,}}_{2}P_b }{\# \{b \in \mathscr {E}\mid \text {ht}(b) < X\}} \le 5. \end{aligned}$$

In fact, both theorems also hold when \(\mathscr {E}\) is replaced by a subset defined by finitely many congruence conditions.

Remark 1.1

We expect that the limit in Theorem 1.2 exists and equals 5, see the end of Sect. 1.3.

We mention a few standard consequences of the above theorems. The first one concerns the Mordell–Weil rank \({{\,\mathrm{rk}\,}}(P_b)\) of \(P_b\). Using the inequalities \(2{{\,\mathrm{rk}\,}}(P_b) \le 2^{{{\,\mathrm{rk}\,}}(P_b)}\le \#{{\,\mathrm{Sel}\,}}_2P_b\), Theorem 1.2 immediately implies:

Corollary 1.1

The average rank of \(P_b\) for \(b\in \mathscr {E}\), when ordered by height, is bounded above by 5/2.

Because the rank of \(J_b\) equals the sum of the ranks of its isogeny factors \(P_b\) and \(E_b\), Corollary 1.1 also gives a bound on the average rank of the family of Jacobians \(J_b\) for \(b\in \mathscr {E}\), once a bound for the average rank of \(E_b\) is known. Since the statistical properties of Selmer groups of the family of elliptic curves \(E_b\) reduce to those of the family of elliptic curves in short Weierstrass form (see Remark 8.1), we may use the previously obtained estimates in the case of elliptic curves [11,  Theorem 3] to obtain:

Corollary 1.2

The average rank of \(J_b\) for \(b\in \mathscr {E}\), when ordered by height, is \(<5/2+0.885 = 3.385\).

1.3 Methods

The basic proof strategy is the same as the one employed in previous works: for each of the isogenies \(\rho \) and [2], we construct a representation of a reductive group over \(\mathbb {Q}\) whose integral orbits parametrize Selmer elements and then count those orbits using the geometry-of-numbers techniques pioneered by Bhargava and his collaborators. Given the robustness of these counting techniques, the crux of the matter is finding the right representation in the first place and showing that its rational orbits relate to the arithmetic of our isogeny of interest.

Previous cases suggest that relevant representations can very often be constructed using graded Lie algebras. In the special case of \(\mathbb {Z}/2\mathbb {Z}\)-gradings on simply laced Lie algebras, Thorne [60] has made this very explicit using the connection with simple singularities [58], paving the way for studying the 2-Selmer groups of certain families of curves using orbit-counting techniques. (See the introduction of [34] for a more detailed exposition.) In the classical cases \(A_n\) or \(D_n\) the families of curves in question are hyperelliptic with marked points and most of these results were already obtained using different methods (where the papers [7, 55, 56] handle the cases \(A_{2n}, A_{2n+1}\), \(D_{2n+1}\) respectively), but in the exceptional cases \(E_6, E_7, E_8\) the curves are not hyperelliptic and this framework has led to new results: see [34, 52, 53, 61].

The present work is a first attempt to incorporate non-simply laced Dynkin diagrams in the above picture, and more specifically the Dynkin diagram of type \(F_4\). Since non-simply laced Dynkin diagrams have a more complicated relationship to geometry (as can be seen in the work of Slodowy [58] which forms the basis of Thorne’s framework), this introduces various difficulties. The starting observation is the following. If \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\) is a simple complex Lie algebra of type \(E_6\), then there exists an involution \(\zeta :{{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\rightarrow {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\) whose fixed point subalgebra \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}^{\zeta }\) is a simple complex Lie algebra of type \(F_4\). This procedure is somewhat informally depicted as folding the Dynkin diagram of \(E_6\):

figure d

It suggests that studying the \(F_4\) case should correspond to studying the \(E_6\) case equivariantly with respect to the symmetry of the Dynkin diagram. This viewpoint is already present in the work of Slodowy [58] where he identifies the restriction of the adjoint quotient of the \(F_4\) Lie algebra to a subregular transverse slice as the semi-universal deformation of the \(E_6\) surface singularity with ‘fixed symmetries’, and analogously for other non-simply laced Lie algebras. We will approach Theorems 1.1 and 1.2 similarly.

In more detail, we will define an involution \(\zeta \) on the representation \((\mathsf {G}_{\mathrm {E}},\mathsf {V}_{\mathrm {E}})\) constructed by Thorne in the \(E_6\) case, whose fixed points give rise to a representation \(\mathsf {V}\) of a reductive group \(\mathsf {G}\). The family \(C\) of Eq. (1.1) is then the subfamily of the semi-universal deformation of the \(E_6\) curve singularity (explicitly given by Eq. (3.1)) to which the involution \(\tau (x,y) =(x,-y)\) lifts. In our previous work [34] we have constructed an embedding of \({{\,\mathrm{Sel}\,}}_2 J_b\) in the set of \(\mathsf {G}_{\mathrm {E}}(\mathbb {Q})\)-orbits of \(\mathsf {V}_{\mathrm {E}}(\mathbb {Q})\). The techniques of that paper combined with a detailed study of the actions of \(\tau \) and \(\zeta \) allow us to embed \({{\,\mathrm{Sel}\,}}_2 P_b\) into the set of \(\mathsf {G}(\mathbb {Q})\)-orbits of \(\mathsf {V}(\mathbb {Q})\). In that same paper, a general construction of integral orbit representatives was given using properties of compactified Jacobians. A similar construction works here using a compactified Prym variety instead.

It then seems that Theorem 1.2 follows from geometry-of-numbers arguments to count integral orbits in \(\mathsf {V}\), but there is a catch: such arguments will only allow us to count ‘strongly irreducible’ elements of \({{\,\mathrm{Sel}\,}}_2 P_b\). To explain what this means, note that there exists a unique isogeny \(\hat{\rho }:P_b^{\vee } \rightarrow P_b\) such that \([2] = \hat{\rho }\circ \rho \), giving rise to the exact sequence

$$\begin{aligned} {{\,\mathrm{Sel}\,}}_{\rho } P_b \rightarrow {{\,\mathrm{Sel}\,}}_2 P_b \rightarrow {{\,\mathrm{Sel}\,}}_{\hat{\rho }} P_b^{\vee }. \end{aligned}$$

We say an element of \({{\,\mathrm{Sel}\,}}_2 P_b\) is if it has nontrivial image in \({{\,\mathrm{Sel}\,}}_{\hat{\rho }} P_b^{\vee }\). Estimating \({{\,\mathrm{Sel}\,}}_2P_b\) then breaks up into two parts: estimating the strongly irreducible elements (which can be done using the representation \(\mathsf {V}\)), and \({{\,\mathrm{Sel}\,}}_{\rho }P_b\). This is not unlike the situation of [10], where the representation used in that paper only counts elements of the 4-Selmer group of an elliptic curve of exact order 4, i.e. having nontrivial image in the 2-Selmer group.

Therefore to prove Theorem 1.2 it remains to prove Theorem 1.1, which we focus on now. Using a classical geometric construction going back to Pantazis, we may reduce to estimating the size of \({{\,\mathrm{Sel}\,}}_{\hat{\rho }}P_b^{\vee }\) instead. A construction in invariant theory which we call the ‘resolvent binary quartic’ allows us to embed \({{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b\) in the set of \({{\,\mathrm{PGL}\,}}_2(\mathbb {Q})\)-orbits of binary quartic forms with rational coefficients. Counting orbits of integral binary quartic forms using the techniques of [12] and modifying the local conditions leads to the determination of the average size of \({{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b\), proving Theorem 1.1 and consequently Theorem 1.2.

We end this introduction by discussing some limitations, questions and remarks. We only obtain an upper bound in Theorem 1.2 because we are unable to prove a uniformity estimate similar to [12,  Theorem 2.13] hence we cannot apply the so-called square-free sieve to obtain an equality in Theorem 7.2. We expect that a similar such estimate holds and that the average size of \({{\,\mathrm{Sel}\,}}_2 P_b\) equals 5. For proving an equality in Theorem 1.1, we bypassed proving such a uniformity estimate by reducing it to the one established by Bhargava and Shankar [12,  Theorem 2.13]. The crucial ingredient for this reduction step is Corollary 5.1 which is based on a detailed analysis of Néron component groups of certain Prym varieties in Sect. 5.3.

The fact that the \(\hat{\rho }\)-Selmer group of \(P_b^{\vee }\) (and so consequently, by the ‘bigonal construction’ of Theorem 3.1, the \(\rho \)-Selmer group of \(P_b\)) has an interpretation in terms of binary quartic forms (Theorem 4.2) might be of independent interest. It seems conceivable that a further analysis would make the computation of \({{\,\mathrm{Sel}\,}}_{\rho }P_b\) possible using binary quartic forms, similar to the computation of the 2-Selmer group of an elliptic curve.

We compare our results with the heuristics of Poonen and Rains [47], which provide a framework for statistics of Selmer groups using random matrix models. The self-dual isogeny \(\rho :P_b\rightarrow P^{\vee }_b\) is defined by a symmetric line bundle, soFootnote 1 [47,  Theorem 4.13] shows that \({{\,\mathrm{Sel}\,}}_{\rho }P_b\) is the intersection of two maximal isotropic subspaces of an infinite-dimensional quadratic space over \(\mathbb {F}_2\). It is therefore natural to ask whether the distribution of \(\#{{\,\mathrm{Sel}\,}}_{\rho }P_b\) coincides with the one modelling 2-Selmer groups of elliptic curves (Conjecture 1.1 of op. cit.); Theorem 1.1 provides evidence for this. On the other hand, the isogeny \([2]:P_b \rightarrow P_b\) is not self-dual and a different type of matrix model is needed.

Remark 1.2

The families of curves considered here have a moduli interpretation. Loosely speaking, Eq. (1.2) defines the universal family of elliptic curves with a marked line in its Weierstrass embedding (here given by intersecting with the line \(\{y = 0\}\)) not meeting the origin \(\infty \), and Eq. (1.1) describes the double cover of this elliptic curve branched along the marked line and \(\infty \).

Remark 1.3

Stable gradings on nonsimply laced Lie algebras have played an implicit role before in arithmetic statistics. In [6], the authors study the 3-isogeny Selmer group of the family of cubic twist elliptic curves \(y^2= x^3+k\). They use a representation associated to a \(\mathbb {Z}/3\mathbb {Z}\)-grading on a Lie algebra of type \(G_2\). This forms the starting point of the previously cited results of [9], so graded Lie algebras play a role there too.

Remark 1.4

Bhargava and Ho have studied the representation \(\mathsf {V}\) before in the context of invariant theory of genus-1 curves (cf. Entry 10 of [8,  Table 1]). It would be interesting to relate their geometric constructions to ours, and to see how the Prym variety fits in their description.

1.4 Organization

In Sect. 2 we define the representation \((\mathsf {G},\mathsf {V})\), summarize its invariant theory and describe it explicitly. Moreover we describe the resolvent binary quartic of an element of \(\mathsf {V}\). In Sect. 3, we start by establishing a link between stable orbits in \(\mathsf {V}\) and the family of curves \(C\rightarrow \mathsf {B}\). Then we introduce the family of Prym varieties \(P\rightarrow \mathsf {B}\) and study its geometry. The construction of orbits associated with Selmer elements is the content of Sect. 4. We start by embedding the 2-Selmer group inside the space of rational orbits of the representation \(\mathsf {V}\). We then define a new representation \(\mathsf {V}^{\star }\) of \(\mathsf {G}^{\star }\) (very closely related to binary quartic forms) and embed the \(\hat{\rho }\)-Selmer group inside the space of rational orbits of \(\mathsf {V}^{\star }\). In Sect. 5, we prove that orbits coming from Selmer elements admit integral representatives away from small primes. Then we count integral orbits of \(\mathsf {V}\) and \(\mathsf {V}^{\star }\) using geometry-of-numbers techniques in Sects. 6 and 7 respectively. Finally in Sect. 8 we combine all of the above ingredients and prove Theorems 1.1 and 1.2.

1.5 Notation

For a field k we write \(\bar{k}\) for a fixed algebraic closure of k.

If X is a scheme over S and \(T\rightarrow S\) a morphism we write \(X_T\) for the base change of X to T. If \(T = {{\,\mathrm{Spec}\,}}A\) is an affine scheme we also write \(X_A\) for \(X_T\). We write X(S) for the set of sections of the structure map \(X\rightarrow S\) and \(X(T) = X_T(T)\).

Table 1 Notation used throughout the paper

If \(\lambda :A\rightarrow B\) is a morphism between group schemes we write \(A[\lambda ]\) for the kernel of \(\lambda \).

If T is a torus over a field k and V a representation of T, we write \(\varPhi (V,T)\subset X^*(T)\) for the set of weights of T on V. If H is a group scheme over k containing T, we write \(\varPhi (H,T)\) for \(\varPhi ({{\,\mathrm{Ad}\,}}H, T)\), where \({{\,\mathrm{Ad}\,}}H\) denotes the adjoint representation of H.

If G is a smooth group scheme over S we write \(\mathrm {H}^1(S,G)\) for the set of isomorphism classes of étale sheaf torsors under G over S, which is a pointed set coming from non-abelian Čech cohomology. If \(S = {{\,\mathrm{Spec}\,}}R\) we write \(\mathrm {H}^1(R,G)\) for the same object. If \(G\rightarrow S\) is affine then every sheaf torsor under G is representable by a scheme.

If \(G\rightarrow S\) is a group scheme acting on \(X\rightarrow S\) and \(x \in X(T)\) is a T-valued point, we write \(Z_G(x) \rightarrow T\) for the centralizer of x. If x is an element of a Lie algebra \({{\,\mathrm{\mathfrak {h}}\,}}\), we write \(\mathfrak {z}_{{{\,\mathrm{\mathfrak {h}}\,}}}(x)\) for the centralizer of x, a subalgebra of \({{\,\mathrm{\mathfrak {h}}\,}}\).

A \(\mathbb {Z}/2\mathbb {Z}\)- on a Lie algebra \({{\,\mathrm{\mathfrak {h}}\,}}\) over a field k is a direct sum decomposition

$$\begin{aligned} {{\,\mathrm{\mathfrak {h}}\,}}= \bigoplus _{i\in \mathbb {Z}/2\mathbb {Z}} {{\,\mathrm{\mathfrak {h}}\,}}(i) \end{aligned}$$

of linear subspaces of \({{\,\mathrm{\mathfrak {h}}\,}}\) such that \([h(i),h(j)] \subset {{\,\mathrm{\mathfrak {h}}\,}}(i+j)\) for all \(i,j \in \mathbb {Z}/2\mathbb {Z}\). If 2 is invertible in k then giving a \(\mathbb {Z}/2\mathbb {Z}\)-grading is equivalent to giving an involution of \({{\,\mathrm{\mathfrak {h}}\,}}\).

If V is a finite free R-module over a ring R we write R[V] for the graded algebra \({{\,\mathrm{Sym}\,}}(R^{\vee })\). Then V is naturally identified with the R-points of the scheme \({{\,\mathrm{Spec}\,}}R[V]\), and we call this latter scheme V as well. If G is a group scheme over R we write \(V \mathbin {//}G:={{\,\mathrm{Spec}\,}}R[V]^G\) for the of V by G.

2 Representation theory

2.1 Definition of the representation \(\mathsf {V}\)

In this section we define the pair \((\mathsf {G},\mathsf {V})\) using a \(\mathbb {Z}/2\mathbb {Z}\)-grading on a Lie algebra of type \(F_4\). We will define it by embedding it in a larger representation \((\mathsf {G}_{\mathrm {E}},\mathsf {V}_{\mathrm {E}})\) defined in [34,  §2.1] using a \(\mathbb {Z}/2\mathbb {Z}\)-grading on a Lie algebra of type \(E_6\), which we recall first. Objects related to \(\mathsf {V}_{\mathrm {E}}\) will usually denoted by a subscript \((-)_{\mathrm {E}}\).

Let \(\mathsf {H}_{\mathrm {E}}\) be a split adjoint semisimple group of type \(E_6\) over \(\mathbb {Q}\) with Lie algebra \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\). We suppose that \(\mathsf {H}_{\mathrm {E}}\) comes with a pinning \((\mathsf {T}_{\mathrm {E}},\mathsf {P}_{\mathrm {E}},\{Y_{\alpha }\})\). So \(\mathsf {T}_{\mathrm {E}}\subset \mathsf {H}_{\mathrm {E}}\) is a split maximal torus (which determines a root system \(\varPhi (\mathsf {H}_{\mathrm {E}},\mathsf {T}_{\mathrm {E}}) \subset X^*(\mathsf {T}_{\mathrm {E}})\)), \(\mathsf {P}_{\mathrm {E}}\subset \mathsf {H}_{\mathrm {E}}\) is a Borel subgroup containing \(\mathsf {T}_{\mathrm {E}}\) (which determines a root basis \(S_{\mathsf {H}_{\mathrm {E}}} \subset \varPhi (\mathsf {H}_{\mathrm {E}},\mathsf {T}_{\mathrm {E}})\)) and \(Y_{\alpha }\) is a generator for each root space \(({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}})_{\alpha }\) for \(\alpha \in S_{\mathsf {H}_{\mathrm {E}}}\). The group \(\mathsf {H}_{\mathrm {E}}\) is of dimension 78.

Let \(\check{\rho }_{\mathrm {E}}\in X_*(\mathsf {T}_{\mathrm {E}})\) be the sum of the fundamental coweights with respect to \(S_{\mathsf {H}_{\mathrm {E}}}\), defined by the property that \(\langle \check{\rho }_{\mathrm {E}},\alpha \rangle = 1\) for all \(\alpha \in S_{\mathsf {H}_{\mathrm {E}}}\). Write \(\zeta :\mathsf {H}_{\mathrm {E}}\rightarrow \mathsf {H}_{\mathrm {E}}\) for the unique nontrivial automorphism preserving the pinning: it is an involution inducing the order-2 symmetry of the Dynkin diagram of \(E_6\). Let

$$\begin{aligned} \theta _{\mathrm {E}}:=\zeta \circ {{\,\mathrm{Ad}\,}}(\check{\rho }_{\mathrm {E}}(-1)) = {{\,\mathrm{Ad}\,}}(\check{\rho }_{\mathrm {E}}(-1)) \circ \zeta . \end{aligned}$$

Then \(\theta _{\mathrm {E}}\) defines an involution of \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\) and thus by considering \((\pm 1)\)-eigenspaces it determines a \(\mathbb {Z}/2\mathbb {Z}\)-grading

$$\begin{aligned} {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}= {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(0) \oplus {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(1). \end{aligned}$$

Let \(\mathsf {G}_{\mathrm {E}}:=\mathsf {H}_{\mathrm {E}}^{\theta _{\mathrm {E}}}\) be the centralizer of \(\theta _{\mathrm {E}}\) in \(\mathsf {H}_{\mathrm {E}}\subset {{\,\mathrm{Aut}\,}}({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}})\) and write \(\mathsf {V}_{\mathrm {E}}:={{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(1)\); the space \(\mathsf {V}_{\mathrm {E}}\) defines a representation of \(\mathsf {G}_{\mathrm {E}}\) and its Lie algebra \(\mathfrak {g}_{\mathrm {E}}\) by restricting the adjoint representation. The pair \((\mathsf {G}_{\mathrm {E}},\mathsf {V}_{\mathrm {E}})\) has been studied extensively in [34].

We now consider the \(\zeta \)-fixed points of the above objects. Let \(\mathsf {H}:=\mathsf {H}_{\mathrm {E}}^{\zeta }\) and \(\mathfrak {h}:={{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}^{\zeta }\). Then \(\mathsf {H}\) is a split adjoint semisimple group of type \(F_4\) with Lie algebra \(\mathfrak {h}\), and the pinning of \(\mathsf {H}_{\mathrm {E}}\) induces a pinning of \(\mathsf {H}\), cf. [49,  §3.1]. Indeed, \(\mathsf {T}:=\mathsf {T}_{\mathrm {E}}^{\zeta }\) is a split maximal torus and \(\mathsf {P}:=\mathsf {P}_{\mathrm {E}}^{\zeta }\) is a Borel subgroup containing \(\mathsf {T}\). They determine a root system \(\varPhi (\mathsf {H},\mathsf {T}) \subset X^*(\mathsf {T})\) and a root basis \(S_{\mathsf {H}} \subset \varPhi (\mathsf {H},\mathsf {T})\) respectively. The natural map \(X^*(\mathsf {T}_{\mathrm {E}}) \rightarrow X^*(\mathsf {T})\) restricts to a surjection \(S_{\mathsf {H}_{\mathrm {E}}} \rightarrow S_{\mathsf {H}}\) where two different elements \(\beta ,\beta '\in S_{\mathsf {H}_{\mathrm {E}}}\) define the same element of \(S_{\mathsf {H}}\) if and only if \(\beta ' = \zeta (\beta )\). (The map \(S_{\mathsf {H}_{\mathrm {E}}}\rightarrow S_{\mathsf {H}}\) can be seen as ‘folding’ the \(E_6\) Dynkin diagram alluded to in the introduction.) If \(\alpha \in S_{\mathsf {H}}\) we write \([\alpha ]\) for its inverse image in \(S_{\mathsf {H}_{\mathrm {E}}}\) under this map, and we define \(X_{\alpha } :=\sum _{\beta \in [\alpha ]} Y_{\beta } \in \mathfrak {h}_{\alpha }\). Then the triple \((\mathsf {T},\mathsf {P},\{X_{\alpha }\})\) is a pinning of \(\mathsf {H}\). Since \(\theta _{\mathrm {E}}\) commutes with \(\zeta \), the restriction \(\theta :=\theta _{\mathrm {E}}|_{\mathsf {H}}\) defines an involution \(\mathsf {H}\rightarrow \mathsf {H}\). We have \(\theta = {{\,\mathrm{Ad}\,}}\check{\rho }(-1)\), where \(\check{\rho }\in X_*(\mathsf {T})\) is the sum of the fundamental coweights with respect to \(S_{\mathsf {H}}\). As before this determines a \(\mathbb {Z}/2\mathbb {Z}\)-grading

$$\begin{aligned} \mathfrak {h}= \mathfrak {h}(0) \oplus \mathfrak {h}(1). \end{aligned}$$

Let \(\mathsf {G}= \mathsf {H}^{\theta }\) be the centralizer of \(\theta \) in \(\mathsf {H}\) and write \(\mathsf {V}:=\mathfrak {h}(1)\). Again \(\mathsf {V}\) defines a representation of \(\mathsf {G}\) and its Lie algebra \(\mathfrak {g}\). The pair \((\mathsf {G},\mathsf {V})\) is the central object of study in this paper. We summarize some of its basic properties here.

Proposition 2.1

The groups \(\mathsf {G}_{\mathrm {E}},\mathsf {H},\mathsf {G}\) are split connected semisimple groups over \(\mathbb {Q}\) with maximal torus \(\mathsf {T}\). Their properties are listed in Table 2. The vector spaces \(\mathsf {V}_{\mathrm {E}}\) and \(\mathsf {V}\) have dimension 42 and 28 respectively.

Table 2 Properties of the semisimple groups

Proof

The properties of \(\mathsf {H}\) follow from [49,  Lemma 3.1]. The isomorphism class of \(\mathsf {G}_{\mathrm {E}}\) and \(\mathsf {G}\) over \(\overline{\mathbb {Q}}\) can be deduced from the analysis of the Kač diagrams of the automorphisms \(\theta _{\mathrm {E}}\), \(\theta \) given in [50,  §7.1, Tables 2 and 6], using the results of [49]. (The notation \(({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)/\mu _2\) means the quotient of \({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2\) by the diagonally embedded \(\mu _2\) in the center.) These groups are split since \(\mathsf {T}\) is a split torus of maximal rank. \(\square \)

The next proposition concerns the invariant theory of the pair \((\mathsf {G},\mathsf {V})\) and shows that regular semisimple orbits over algebraically closed fields are well understood. For a field \(k/\mathbb {Q}\), we say \(v\in \mathsf {V}(k)\) is respectively if it is so when considered as an element of \(\mathfrak {h}(k)\).

Proposition 2.2

Let \(k/\mathbb {Q}\) be a field. The following properties are satisfied:

  1. 1.

    \(\mathsf {V}_k\) satisfies the Chevalley restriction theorem: if \(\mathfrak {a} \subset \mathsf {V}_k\) is a Cartan subalgebra, then the map \(N_{\mathsf {G}}(\mathfrak {a}) \rightarrow W_{\mathfrak {a}} :=N_{\mathsf {H}}(\mathfrak {a})/Z_{\mathsf {H}}(\mathfrak {a})\) is surjective, and the inclusions \(\mathfrak {a} \subset \mathsf {V}_k\subset \mathfrak {h}_k\) induce isomorphisms

    $$\begin{aligned} \mathfrak {a}\mathbin {//}W_{\mathfrak {a}} \simeq \mathsf {V}_k\mathbin {//}\mathsf {G}\simeq \mathfrak {h}_k \mathbin {//}\mathsf {H}. \end{aligned}$$

    In particular, the quotient is isomorphic to affine space.

  2. 2.

    Suppose that k is algebraically closed and let \(x,y\in \mathsf {V}(k)\) be regular semisimple elements. Then x is \(\mathsf {G}(k)\)-conjugate to y if and only if xy have the same image in \(\mathsf {V}\mathbin {//}\mathsf {G}\).

Proof

These are classical results in the invariant theory of graded Lie algebras due to Vinberg and Kostant–Rallis; we refer to [60,  §2] for precise references. \(\square \)

We now give some alternative characterizations of regular semisimple elements in \(\mathsf {V}\), after introducing some more notation. First recall that the of \(\mathfrak {h}\) is the image under the Chevalley isomorphism \(\mathbb {Q}[\mathfrak {t}]^{W(\mathsf {H},\mathsf {T})} \rightarrow \mathbb {Q}[\mathfrak {h}]^{\mathsf {H}}\) of the product of all roots \(\alpha \in \varPhi (\mathsf {H},\mathsf {T})\), where \(\mathfrak {t}:={{\,\mathrm{Lie}\,}}\mathsf {T}\). Write \(\varDelta \in \mathbb {Q}[\mathsf {V}]^{\mathsf {G}}\) for its restriction to \(\mathsf {V}\subset \mathfrak {h}\). Next we introduce weights of one-parameter subgroups. If \(k/\mathbb {Q}\) is a field and \(\lambda :\mathbb {G}_m \rightarrow \mathsf {G}_{k}\) a homomorphism, we may decompose \(\mathsf {V}(k)\) as \(\oplus _{i\in \mathbb {Z}} \mathsf {V}_i\) where \(\mathsf {V}_i = \{v\in \mathsf {V}(k)\mid \lambda (t)\cdot v = t^iv \}\). Every \(v\in \mathsf {V}(k)\) can be written as \(v = \sum v_i\) and we call integers i with \(v_i \ne 0\) the v \(\lambda \).

Proposition 2.3

Let \(k/\mathbb {Q}\) be field and \(v\in \mathsf {V}(k)\). Then the following are equivalent:

  1. 1.

    v is regular semisimple.

  2. 2.

    \(\varDelta (v) \ne 0\).

  3. 3.

    The \(\mathsf {G}\)-orbit of v is closed in \(\mathsf {V}\) and \(Z_{\mathsf {G}}(v)\) is finite (i.e. v is stable in the sense of geometric invariant theory).

  4. 4.

    For every nontrivial homomorphism \(\lambda :\mathbb {G}_m \rightarrow \mathsf {G}_{\bar{k}}\), v has a negative weight with respect to \(\lambda \).

Proof

The equivalence between the first two properties is a well known property of the discriminant. The first property implies the third by [60,  Proposition 2.8], and the converse follows from [50,  Lemma 5.6]. Finally, the equivalence between the last two properties is the content of the Hilbert-Mumford stability criterion [43]. \(\square \)

We write \(\mathsf {B}:=\mathsf {V}\mathbin {//}\mathsf {G}= {{\,\mathrm{Spec}\,}}\mathbb {Q}[\mathsf {V}]^{\mathsf {G}}\), \(\mathsf {B}_{\mathrm {E}}:=\mathsf {V}_{\mathrm {E}}\mathbin {//}\mathsf {G}_{\mathrm {E}}= {{\,\mathrm{Spec}\,}}\mathbb {Q}[\mathsf {V}_{\mathrm {E}}]^{\mathsf {G}_{\mathrm {E}}}\) and \(\pi :\mathsf {V}\rightarrow \mathsf {B}\), \(\pi _{\mathrm {E}}:\mathsf {V}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) for the natural quotient maps. Scaling defines \(\mathbb {G}_m\)-actions on \(\mathsf {V}\) and \(\mathsf {V}_{\mathrm {E}}\), and there are unique \(\mathbb {G}_m\)-actions on \(\mathsf {B}\) and \(\mathsf {B}_{\mathrm {E}}\) such that the morphisms \(\pi \) and \(\pi _{\mathrm {E}}\) are \(\mathbb {G}_m\)-equivariant. In Sect. 3.1 we will describe the weights of \(\mathsf {B}\) and \(\mathsf {B}_{\mathrm {E}}\).

2.2 The distinguished orbit

We describe a section of the quotient map \(\pi :\mathsf {V}\rightarrow \mathsf {B}\) whose construction is originally due to Kostant. Let \(E :=\sum _{\alpha \in S_{\mathsf {H}_{\mathrm {E}}}} Y_{\alpha } \in {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\). Then E is a regular nilpotent element of \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\) which lies in \(\mathfrak {h}(1)\). Using [60,  Proposition 2.7], there exists a unique normal \({{\,\mathrm{\mathfrak {sl}}\,}}_2\)-triple (EXF) containing E. By definition, this means that (EXF) satisfies the identities

$$\begin{aligned} {[}X,E] = 2E , \quad [X,F] = -2F ,\quad [E,F] = H, \end{aligned}$$

with the additional property that \(X\in {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(0)\) and \(F \in {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(1)\). Since \((E,\zeta (X),\zeta (F))\) is also a normal \({{\,\mathrm{\mathfrak {sl}}\,}}_2\)-triple containing E, we see that \((E,\zeta (X),\zeta (F)) = (E,X,F)\) hence X and F lie in \(\mathfrak {h}\).

We define affine linear subspaces \(\kappa _{\mathrm {E}}:=E +\mathfrak {z}_{{{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}}(F) \subset \mathsf {V}_{\mathrm {E}}\) and \(\kappa :=\kappa _{\mathrm {E}}^{\zeta } = E+\mathfrak {z}_{\mathfrak {h}}(F) \subset \mathsf {V}\).

Proposition 2.4

  1. 1.

    The composite maps \(\kappa \hookrightarrow \mathsf {V}\rightarrow \mathsf {B}\) and \(\kappa _{\mathrm {E}}\hookrightarrow \mathsf {V}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) are isomorphisms.

  2. 2.

    \(\kappa \) and \(\kappa _{\mathrm {E}}\) are contained in the open subscheme of regular elements of \(\mathsf {V}\) and \(\mathsf {V}_{\mathrm {E}}\) respectively.

  3. 3.

    The morphisms \(\mathsf {G}\times \kappa \rightarrow \mathsf {V}, (g,v) \mapsto g\cdot v\) and \(\mathsf {G}_{\mathrm {E}}\times \kappa _{\mathrm {E}}\rightarrow \mathsf {V}_{\mathrm {E}}, (g,v) \mapsto g\cdot v\) are étale.

Proof

Parts 1 and 2 are [60,  Lemma 3.5]; the last part is [60,  Proposition 3.4]. (These facts are stated only for simply laced groups in [60] but they remain valid in the \(F_4\) case by the same proof.) \(\square \)

Write \(\sigma : \mathsf {B}\rightarrow \mathsf {V}\) for the inverse of \(\pi |_{\kappa }\) and \(\sigma _{\mathrm {E}}: \mathsf {B}_{\mathrm {E}}\rightarrow \mathsf {V}_{\mathrm {E}}\) for the inverse of \(\pi _{\mathrm {E}}|_{\kappa _{\mathrm {E}}}\). We call \(\sigma \) the for the pair \((\mathsf {G},\mathsf {V})\). It determines, for every field \(k/\mathbb {Q}\) and \(b\in \mathsf {B}(k)\), a distinguished orbit in \(\mathsf {V}(k)\) with invariants b, playing an analogous role to reducible binary quartic forms as studied in [12]. It will be used to organize the set of \(\mathsf {G}(k)\)-orbits of \(\mathsf {V}(k)\).

Definition 2.1

Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}(k)\). We say v is k- if v is not regular semisimple or v is \(\mathsf {G}(k)\)-conjugate to \(\sigma (b)\) with \(b = \pi (v)\).

If \(k/\mathbb {Q}\) is algebraically closed, every element of \(\mathsf {V}(k)\) is k-reducible by Proposition 2.2.

2.3 The action of \(\zeta \) on \(\mathsf {B}_{\mathrm {E}}\)

The involution \(\zeta : \mathsf {V}_{\mathrm {E}}\rightarrow \mathsf {V}_{\mathrm {E}}\) induces an involution \(\mathsf {B}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\), still denoted by \(\zeta \).

Proposition 2.5

  1. 1.

    The inclusion \(\mathsf {V}\subset \mathsf {V}_{\mathrm {E}}\) induces a closed embedding \(\mathsf {B}\hookrightarrow \mathsf {B}_{\mathrm {E}}\) whose image is the subset of \(\zeta \)-fixed points of \(\mathsf {B}_{\mathrm {E}}\).

  2. 2.

    The involution \(\zeta :\mathsf {B}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) coincides with the involution \((-1):\mathsf {B}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) induced by the \(\mathbb {G}_m\)-action on \(\mathsf {B}_{\mathrm {E}}\).

  3. 3.

    Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}(k)\). Then v is regular semisimple as an element of \(\mathfrak {h}(k)\) if and only if v is regular semisimple as an element of \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(k)\).

Proof

Because the inclusion \(\mathsf {V}\subset \mathsf {V}_{\mathrm {E}}\) restricts to the inclusion \(\kappa \subset \kappa _{\mathrm {E}}\), the first claim follows from Part 1 of Proposition 2.4.

To prove the second claim, recall that \(\mathsf {T}_{\mathrm {E}}\subset \mathsf {H}_{\mathrm {E}}\) denotes a split maximal torus. Write \(\mathfrak {t}_{\mathrm {E}}\subset {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\) for its Lie algebra and \(W_{\mathrm {E}}\) for its Weyl group. By the classical Chevalley restriction theorem and Proposition 2.2 respectively, the inclusions \(\mathfrak {t}_{\mathrm {E}}\hookrightarrow {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\), \(\mathsf {V}_{\mathrm {E}}\hookrightarrow {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\) induce isomorphisms \(\mathfrak {t}_{\mathrm {E}}\mathbin {//}W_{\mathrm {E}} \simeq {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\mathbin {//}\mathsf {H}_{\mathrm {E}}\), \(\mathsf {B}_{\mathrm {E}}\simeq {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\mathbin {//}\mathsf {H}_{\mathrm {E}}\), equivariant with respect to the actions of \(\mathbb {G}_m\) and \(\zeta \). So it suffices to prove that the action of \(\zeta \) on \(\mathfrak {t}_{\mathrm {E}}\mathbin {//}W_{\mathrm {E}}\) is given by \(-1\). Since \(\zeta \) and \(-1\) are not contained in \(W_{\mathrm {E}}\) and this group has index 2 in \(N_{\mathsf {G}_{\mathrm {E}}}(\mathfrak {t}_{\mathrm {E}})\), the product \(-\zeta \) lies in \(W_{\mathrm {E}}\). Therefore \(-\zeta \) acts trivially on \(\mathfrak {t}_{\mathrm {E}}\mathbin {//}W_{\mathrm {E}}\), as desired.

To prove the third claim, we may assume that k is algebraically closed and after conjugating by \(\mathsf {H}(k)\) that \(v\in \mathfrak {t}(k) = \mathfrak {t}_{\mathrm {E}}^{\zeta }(k)\). Then v is regular semisimple as an element of \(\mathfrak {h}(k)\) if and only if \(d\alpha (v)\ne 0\) for all \(\alpha \in \varPhi (\mathsf {H},\mathsf {T})\), and v is regular semisimple as an element of \({{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}(k)\) if and only if \(d\alpha (v)\ne 0\) for all \(\alpha \in \varPhi (\mathsf {H}_{\mathrm {E}},\mathsf {T}_{\mathrm {E}})\). These two statements are equivalent because the restriction map \(\varPhi (\mathsf {H}_{\mathrm {E}},\mathsf {T}_{\mathrm {E}})\rightarrow \varPhi (\mathsf {H},\mathsf {T})\) is surjective. \(\square \)

2.4 An explicit description of \(\mathsf {V}\)

In this section we give an explicit description of \(\mathsf {V}\) which will be convenient for performing computations in Sects. 2.5, 2.6 and 6.10. Recall from Sect. 2.1 that \(\mathfrak {h}\) is a Lie algebra of type \(F_4\) and that there is a direct sum decomposition \(\mathfrak {h}= \mathfrak {g}\oplus \mathsf {V}\) where \(\mathfrak {g}\simeq {{\,\mathrm{\mathfrak {sp}}\,}}_6 \oplus {{\,\mathrm{\mathfrak {sl}}\,}}_2\) and \(\mathsf {V}\) is a 28-dimensional representation of \(\mathfrak {g}\). The split maximal torus \(\mathsf {T}\subset \mathsf {H}\) gives rise to three subsets of \(X^*(\mathsf {T})\): \(\varPhi (\mathsf {H},\mathsf {T})\), \(\varPhi (\mathsf {G},\mathsf {T})\) and \(\varPhi (\mathsf {V},\mathsf {T})\). They will be denoted by \(\varPhi _{\mathsf {H}}\), \(\varPhi _{\mathsf {G}}\) and \(\varPhi _{\mathsf {V}}\) and satisfy \(\varPhi _{\mathsf {H}} = \varPhi _{\mathsf {G}}\sqcup \varPhi _{\mathsf {V}}\). Using the root basis \(S_{\mathsf {H}}\) fixed in Sect. 2.1, \(\varPhi _{\mathsf {V}}\) (resp. \(\varPhi _{\mathsf {G}}\)) consists of those roots in \(\varPhi _{\mathsf {H}}\) which have odd root height (resp. even root height).

Following Bourbaki [19,  Planche VIII], we denote the elements of \(S_{\mathsf {H}} = \{\alpha _1,\alpha _2,\alpha _3,\alpha _4\}\) according to the following labeling of the nodes of the Dynkin diagram:

figure n

Define \(\beta _1, \beta _2, \beta _3, \beta _4\) to be \(\alpha _2+\alpha _3, \alpha _3+\alpha _4, \alpha _1+\alpha _2, \alpha _1+\alpha _2+2\alpha _3\) respectively. Then \(S_{\mathsf {G}} :=\{\beta _1,\beta _2,\beta _3,\beta _4\}\) is a root basis of \(\varPhi _{\mathsf {G}}\), according to the following labelling of the Dynkin diagram of type \(C_3\times A_1\):

figure o

With respect to this root basis the positive roots of \(\varPhi _{\mathsf {G}}\), denoted \(\varPhi _{\mathsf {G}}^+\), are given by

$$\begin{aligned} \{ \beta _1, \beta _2, \beta _3,\beta _1+ & {} \beta _2,\beta _2+\beta _3,2\beta _2+\beta _3,\beta _1+\beta _2 +\beta _3,\beta _1+2\beta _2+\beta _3,2\beta _1\\&+2\beta _2+\beta _3 \} \cup \{\beta _4\}. \end{aligned}$$

Another basis of \(X^*(\mathsf {T})\otimes {\mathbb {Q}}\) will be convenient for describing \(\varPhi _{\mathsf {G}}\) and \(\varPhi _{\mathsf {V}}\). We define

$$\begin{aligned} {\left\{ \begin{array}{ll} L_1 = (2\beta _1+2\beta _2+\beta _3)/2, \\ L_2 = (2\beta _2+\beta _3)/2, \\ L_3 = \beta _3/2,\\ L_4 = \beta _4/2.\\ \end{array}\right. } \end{aligned}$$

Then \(S_{\mathsf {G}} = \{L_1-L_2,L_2-L_3,2L_3,2L_4 \}\) and the elements of \(\varPhi _{\mathsf {G}}\) are given by

$$\begin{aligned} \{\pm L_i \pm L_j \mid 1\le i, j \le 3 \} \cup \{\pm 2L_4 \}. \end{aligned}$$

Using the above explicit description or a general recipe applied to the Kač diagram of \(\theta \) (given in [50,  §7.1, Table 6]), we see that \(\mathsf {V}\) is isomorphic to \(\mathsf {W}\boxtimes (2)\) where \(\mathsf {W}\) is a 14-dimensional irreducible representation of \({{\,\mathrm{\mathfrak {sp}}\,}}_6\) with highest weight \(L_1+L_2+L_3\) (we will choose an explicit realization of this representation in a moment) and (2) denotes the standard representation of \({{\,\mathrm{\mathfrak {sl}}\,}}_2\). The elements of \(\varPhi _{\mathsf {V}}\) are of the form \(x\pm L_4\), where x is any element of the set

$$\begin{aligned} \varPhi _{\mathsf {W}} :=\{\pm L_i \mid i= 1,2,3 \} \cup \{\pm L_1 \pm L_2 \pm L_3 \}. \end{aligned}$$

Every element \(\alpha \in X^*(\mathsf {T}) \otimes \mathbb {Q}\) has a unique expression of the form \(\sum _i n_i(\alpha ) \beta _i\) with \(n_i(\alpha ) \in \mathbb {Q}\). We define a partial ordering on \(X^*(\mathsf {T})\otimes \mathbb {Q}\) by declaring for \(x,y\in X^*(\mathsf {T})\otimes \mathbb {Q}\) that

$$\begin{aligned} x\ge y \quad \text {if } \, n_i(x-y) \ge 0\quad \text {for all }i= 1,\dots ,4. \end{aligned}$$
(2.1)

This induces a partial ordering on \(\varPhi _{\mathsf {V}}\).

Table 3 The elements of \(\varPhi _{\mathsf {V}}\)

We have tabulated the elements of \(\varPhi _{\mathsf {V}}\) in Table 3; the second column displays the coordinates of a weight in the basis \(\{\beta _1/2,\beta _2/2,\beta _3/2,\beta _4/2\}\). For example, the first entry is \(\alpha _0 = L_1+L_2+L_3+L_4 = 2\alpha _1+3\alpha _2+4\alpha _3+2\alpha _4=(2\beta _1+4\beta _2+3\beta _3+\beta _4)/2 \in \varPhi _{\mathsf {V}}\); it is the highest root of \(\varPhi _{\mathsf {H}}\) and the unique maximal element of \(\varPhi _{\mathsf {V}}\) with respect to the partial ordering.

We now describe the \({{\,\mathrm{Sp}\,}}_6\)-representation \(\mathsf {W}\) explicitly following [32,  §2.2]. Fix a vector space \(\mathbb {Q}^6\) with standard basis \(e_1, \dots , e_6\). We define \({{\,\mathrm{Sp}\,}}_6\) as the symplectic group stabilizing the 2-form \(\omega \) on \(\mathbb {Q}^6\) given by the matrix

$$\begin{aligned} \begin{pmatrix} 0 &{}\quad I_3 \\ -I_3 &{}\quad 0 \end{pmatrix}. \end{aligned}$$

The form \(\omega \) defines a \({{\,\mathrm{Sp}\,}}_6\)-equivariant contraction map \(\text {cont}_{\omega } :\bigwedge \nolimits ^{3}(\mathbb {Q}^6) \rightarrow \mathbb {Q}^6 , x_1 \wedge x_2 \wedge x_3 \mapsto \omega (x_2,x_3)-\omega (x_1,x_3)+\omega (x_1,x_2)\). Define

$$\begin{aligned} \mathsf {W}:=\ker \text {cont}_{\omega } \subset \bigwedge \nolimits ^{3}(\mathbb {Q}^6). \end{aligned}$$

We may organize an element \(\sum c_{ijk} e_i \wedge e_j \wedge e_k \in \bigwedge \nolimits ^{3}\mathbb {Q}^6\) in the matrices:

$$\begin{aligned} (u,X,Y,z) = \left( c_{123},\begin{pmatrix} c_{423} &{}\quad c_{143} &{}\quad c_{124} \\ c_{523} &{}\quad c_{153} &{}\quad c_{125} \\ c_{623} &{}\quad c_{163} &{}\quad c_{126} \end{pmatrix}, \begin{pmatrix} c_{156} &{}\quad c_{416} &{}\quad c_{451} \\ c_{256} &{}\quad c_{426} &{}\quad c_{452} \\ c_{356} &{}\quad c_{436} &{}\quad c_{453} \end{pmatrix}, c_{456} \right) . \end{aligned}$$

Then elements of \(\mathsf {W}\) correspond to 4-tuples (uXYz) such that X and Y are symmetric matrices. An element of \(\mathsf {W}\) will be usually thought of as such a 4-tuple.

The ring of invariant polynomials \(\mathbb {Q}[\mathsf {W}]^{{{\,\mathrm{Sp}\,}}_6}\) is freely generated by one degree-4 polynomial F explicitly given by

$$\begin{aligned} F(u,X,Y,z) :=\left( uz-{{\,\mathrm{tr}\,}}XY \right) ^2+4u\det Y+4 z \det X - 4 \sum _{ij} \det (\hat{X}_{ij})\det (\hat{Y}_{ij}), \end{aligned}$$
(2.2)

where for a matrix A we denote by \(\hat{A}_{ij}\) the matrix obtained by crossing out the \(i\hbox {th}\) row and \(j\hbox {th}\) column.

Proposition 2.6

Let \(k/\mathbb {Q}\) be an algebraically closed field. Then \(\mathsf {W}(k)\) has finitely many \({{\,\mathrm{Sp}\,}}_6(k) \times k^{\times }\)-orbits. Moreover:

  • \(\{w\in \mathsf {W}(k) \mid F(w)\ne 0\}\) is the unique open dense orbit.

  • If \(w\in \mathsf {W}(k)\) is nonzero with \(F(w)=0\), then w is \({{\,\mathrm{Sp}\,}}_6(k)\times k^{\times }\)-conjugate to an element of the form

    $$\begin{aligned} \left( 1,\begin{pmatrix} * &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad * &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad * \end{pmatrix}, \begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix}, 0 \right) . \end{aligned}$$

Proof

It is well-known that \(\mathsf {W}(k)\) has finitely many \({{\,\mathrm{Sp}\,}}_6(k)\times k^{\times }\)-orbits with \(\{F\ne 0\}\) the unique open dense one; see [32,  §2.3] for precise references. The description of the remaining orbits and Proposition 2.3.3 of loc. cit. implies the existence of the representatives above. \(\square \)

We now fix the identifications of this subsection to remove any ambiguities. There exists an isomorphism \(\mathsf {G}\simeq ({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)/\mu _2\) such that:

  • the weights \(L_1, L_2, L_3\) correspond to the weights of \(e_1, e_2, e_3\) in the defining representation of \({{\,\mathrm{Sp}\,}}_6\) (and \({{\,\mathrm{SL}\,}}_2\) acts trivially),

  • the weight \(L_4\) corresponds to the weight \(\begin{pmatrix} t&{}\quad 0 \\ 0 &{}\quad t^{-1} \end{pmatrix} \mapsto t\) of \({{\,\mathrm{SL}\,}}_2\).

Then there exists a unique isomorphism \(\mathsf {V}\simeq \mathsf {W}\boxtimes (2)\) of \(\mathsf {G}\)-representations which sends \(X_{\alpha _1}\in \mathsf {V}_{\alpha _1}\) (part of the pinning of \(\mathsf {H}\) fixed in 2.1) to the element \((e_4\wedge e_2\wedge e_3,0)\). This choice is somewhat arbitrary but what is important for us is that it preserves the ‘obvious’ integral structures on both sides; this will be relevant in Sect. 5.1. We fix these isomorphisms for the remainder of the paper. It is therefore permitted, for every field \(k/\mathbb {Q}\), to view an element \(v\in \mathsf {V}(k)\) as a pair \((w_1,w_2)\) of elements of \(\mathsf {W}(k)\), where \(A\in {{\,\mathrm{SL}\,}}_2(k)\) acts on \((w_1,w_2)\) via \((w_1,w_2)\cdot A^{t}\).

2.5 The resolvent binary quartic

In this section we define for every \(v\in \mathsf {V}(k)\) a binary quartic form \(Q_v\). At the end of Sect. 2.4 we fixed an isomorphism \(\mathsf {G}\simeq ({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)/\mu _2\); let \(p:\mathsf {G}\rightarrow {{\,\mathrm{PGL}\,}}_2\) be the corresponding projection map. Moreover we have fixed an isomorphism \(\mathsf {V}\simeq \mathsf {W}\boxtimes (2)\), where \(\mathsf {W}\) is the 14-dimensional \({{\,\mathrm{Sp}\,}}_6\)-representation described in Sect. 2.4.

Definition 2.2

Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}(k)\), giving rise to a pair of elements \((w_1,w_2)\) in \(\mathsf {W}(k)\). We define the \(Q_v\) by the formula

$$\begin{aligned} Q_v :=F(xw_1+yw_2) \in k[x,y]_{\deg = 4}. \end{aligned}$$

Note that \(Q_{\lambda v} = \lambda ^4 Q_v\) and \(Q_{g\cdot v} = p(g)\cdot Q_v\), where an element \([A] \in {{\,\mathrm{PGL}\,}}_2(k)\) acts on a binary quartic form Q(xy) by \([A]\cdot Q(x,y):=Q((x,y)\cdot A)/(\det A)^2\).

Definition 2.3

Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}(k)\). We say v is if \(Q_v\) has distinct roots in \(\mathbb {P}^1(\bar{k})\).

Lemma 2.1

Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}(k)\). If v is regular semisimple, then v is almost regular semisimple.

Proof

We may assume that k is algebraically closed. Assume for contradiction that \(Q_v\) does not have distinct roots. Then there exists an element \(\gamma \in {{\,\mathrm{PGL}\,}}_2(k)\) so that the coefficients of \(\gamma \cdot Q_v\) at \(x^4\) and \(x^3y\) vanish. Choosing a lift \(g\in \mathsf {G}(k)\) of \(\gamma \) and replacing v by \(g\cdot v\), we may assume that this holds for \(Q_v\). Therefore if \(v = (w_1,w_2)\in \mathsf {V}(k)\) and \(g(t) := F(w_1+tw_2)\) then \(g(0)=g'(0)=0\). Since \(F(w_1)=0\), Proposition 2.6 shows that we may assume after conjugation by \({{\,\mathrm{Sp}\,}}_6(k)\) that \(w_1\) is of the form

$$\begin{aligned} \left( 1,\begin{pmatrix} * &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad * &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad * \end{pmatrix}, \begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix}, 0 \right) . \end{aligned}$$

To derive a contradiction we will use the equivalence between Parts 1 and 4 of Proposition 2.3 repeatedly.

We first claim that all the elements \(*\) on the diagonal are nonzero. If not, then we may assume that the one in the bottom right corner is zero. But then the one-parameter subgroup (in the explicit realizations of \({{\,\mathrm{Sp}\,}}_6\) described in Sect. 2.4)

$$\begin{aligned} t\mapsto \text {diag}(1,1,t,1,1,t^{-1}) \times \text {diag}(t^{-1},t) \end{aligned}$$

does not have a negative weight with respect to v, contradicting the assumption that v is regular semisimple. Secondly, the condition \(g'(0)=0\) translates into the condition that the coordinate of \(w_2\) at z in the decomposition (uXYz) vanishes, by an explicit computation using Formula (2.2). But then the one-parameter subgroup \(t \mapsto \text {diag}(t,t,t,t^{-1},t^{-1},t^{-1}) \times \text {diag}(t^{-1},t)\) again has no negative weight with respect to v, a contradiction. \(\square \)

Definition 2.4

For a field \(k/\mathbb {Q}\) and an element \(v \in \mathsf {V}(k)\), we say v is k- if it is not regular semisimple or the resolvent binary quartic form \(Q_v\) has a k-rational linear factor.

Lemma 2.2

Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}(k)\). If v is k-reducible (Definition 2.1), then v is almost k-reducible.

Proof

We may assume that v is regular semisimple and of the form \(\sigma (b)\) for some \(b\in \mathsf {B}(k)\). A well-known result of Kostant determines the adjoint action of the \({{\,\mathrm{\mathfrak {sl}}\,}}_2\)-subalgebra generated by (EXF) on \(\mathfrak {h}\) in terms of the exponents 2, 6, 8, 12 of \(F_4\) [33,  Corollary 8.7]. It implies that \(\sigma (b)\in \kappa (k)\) is supported on vectors whose weights, considered as elements of \(\varPhi _{\mathsf {H}}\), have root height \(1,-1,-5,-7,-11\) with respect to \(S_{\mathsf {H}}\). Using Table 3 it follows that if \(\sigma (b) = (w_1,w_2)\) with \(w_i \in \mathsf {W}(k)\) then \(w_1\) is of the form

$$\begin{aligned} \left( 0,\begin{pmatrix} * &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix}, \begin{pmatrix} 0 &{}\quad * &{}\quad * \\ * &{}\quad * &{}\quad 0 \\ * &{}\quad 0 &{}\quad * \end{pmatrix}, * \right) . \end{aligned}$$

Formula (2.2) shows that the polynomial F vanishes on elements of such form, so \(Q_v\) is divisible by y. \(\square \)

Remark 2.1

Not every element \(v\in \mathsf {V}(\mathbb {R})\) is almost \(\mathbb {R}\)-reducible. For a somewhat arbitrary example, let \(v = (w_1,w_2)\in \mathsf {V}(\mathbb {R})\) be given by:

$$\begin{aligned} w_1&=\left( 1,\begin{pmatrix} 1 &{}\quad 2&{}\quad 3\\ 4 &{}\quad 1 &{}\quad 0 \\ 2 &{}\quad 0 &{}\quad 1\end{pmatrix},\begin{pmatrix} 1 &{}\quad 2 &{}\quad 3 \\ 1 &{}\quad -2 &{}\quad -1\\ 2 &{}\quad 3 &{}\quad 1\end{pmatrix},0\right) , \\ w_2&=\left( -1, \begin{pmatrix} 3 &{}\quad 2 &{}\quad 5 \\ -1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 2 &{}\quad 1\end{pmatrix} , \begin{pmatrix} 0 &{}\quad -3 &{}\quad 1\\ 2 &{}\quad 2 &{}\quad 1 \\ 1 &{}\quad 1 &{}\quad 0\end{pmatrix},2\right) . \end{aligned}$$

Then one computes that \(Q_v = 376 x^4+507x^3y+1697x^2y^2+846 xy^3 +119 y^4\), which has no real roots nor repeated roots. If v is regular semisimple, we have obtained a valid example; if not, then we may replace v by a small perturbation which is regular semisimple whose resolvent binary quartic form has no real roots either. This observation will be used in the proof of Lemma 4.6.

2.6 A criterion for almost reducibility

Let \(\varPhi _{\mathsf {H}}^+\) denote the positive roots of \(\varPhi _{\mathsf {H}}\) with respect to \(S_{\mathsf {H}}\) and write \(\varPhi _{\mathsf {V}}^+ :=\varPhi _{\mathsf {V}} \cap \varPhi _{\mathsf {H}}^+\). If \(v\in \mathsf {V}\) we can decompose v as \(\sum _{{\alpha }\in \varPhi _{\mathsf {V}}} v_{\alpha }\) with \(v_{\alpha }\) in the weight space corresponding to \({\alpha }\). For any subset M of \(\varPhi _{\mathsf {V}}\) we define the linear subspace

$$\begin{aligned} \mathsf {V}(M) = \{v\in \mathsf {V}\mid v_{\alpha } = 0 \text { for all }{\alpha } \in M \}\subset \mathsf {V}. \end{aligned}$$

We state a lemma which describes sufficient conditions for an element \(v\in \mathsf {V}\) to be almost \(\mathbb {Q}\)-reducible. This will (only) be useful when estimating the number of irreducible orbits in the cuspidal region in Sect. 6.10. Recall that we write \(\alpha = \sum n_i(\alpha ) \beta _i\).

Lemma 2.3

Let M be a subset of \(\varPhi _{\mathsf {V}}\), and suppose that one of the following three conditions is satisfied:

  1. 1.

    There exist integers \(b_1,\dots ,b_4\) not all equal to zero such that

    $$\begin{aligned} \left\{ {\alpha } \in \varPhi _{\mathsf {V}} \mid \sum _{i=1}^4 b_i n_i({\alpha }) > 0 \right\} \subset M. \end{aligned}$$
  2. 2.

    For every \(v = (w_1,w_2) \in \mathsf {V}(M)(\mathbb {Q})\), we have \(F(w_1)=0\).

Then every element of \(\mathsf {V}(M)(\mathbb {Q})\) is almost \(\mathbb {Q}\)-reducible.

Proof

In the first case, the integers \(b_1,\dots ,b_4\) determine a cocharacter of \(\mathsf {T}\) with respect to which every element of \(\mathsf {V}(M)(\mathbb {Q})\) has only nonnegative weights. By the Hilbert-Mumford stability criterion (Proposition 2.3), \(\mathsf {V}(M)(\mathbb {Q})\) then contains no regular semisimple elements so consists solely of almost \(\mathbb {Q}\)-reducible elements.

If the second condition is satisfied, then for every \(v\in \mathsf {V}(\mathbb {Q})\) the resolvent binary quartic form \(Q_v\) has a \(\mathbb {Q}\)-rational linear factor, so v is almost \(\mathbb {Q}\)-reducible too. \(\square \)

Lemma 2.4

Let M be a subset of \(\varPhi _{\mathsf {V}}\), and suppose M contains one of the following subsets, in the notation of Table 3:

$$\begin{aligned} \{1,2,3,4,5,7,8,11\}, \{1,2,3,4,6,7,8,10,13,15\},\\ \{1,2,3,5,6,9,10\}, \{1,2,3,5,6,9,14\}. \end{aligned}$$

Then every element of \(\mathsf {V}(M)(\mathbb {Q})\) is almost \(\mathbb {Q}\)-reducible.

Proof

We show that M satisfies one of the conditions of Lemma 2.3. If M contains the first displayed subset, we may use Condition 1 with \((b_1,b_2,b_3,b_4)= (0,1,0,0)\). If M contains the second subset, we use the same condition with \((b_1,b_2,b_3,b_4)= (1,0,0,0)\). The last two cases follow from Condition 2: indeed for \(v\in \mathsf {V}(M)(\mathbb {Q})\) the vector \(w_1\) is either of the form

$$\begin{aligned} \left( 0,\begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix}, \begin{pmatrix} * &{}\quad * &{}\quad * \\ * &{}\quad * &{}\quad * \\ * &{}\quad * &{}\quad * \end{pmatrix}, * \right) \text { or } \left( 0,\begin{pmatrix} * &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix}, \begin{pmatrix} 0 &{}\quad * &{}\quad * \\ * &{}\quad * &{}\quad * \\ * &{}\quad * &{}\quad * \end{pmatrix}, * \right) . \end{aligned}$$

In both cases we see using Formula (2.2) that \(F(w_1) = 0\). \(\square \)

3 Geometry

3.1 A family of curves

In this section we relate the representation \((\mathsf {G},\mathsf {V})\) to our family of curves of interest, see Proposition 3.2. The proof involves a similar result for the representation \((\mathsf {G}_{\mathrm {E}},\mathsf {V}_{\mathrm {E}})\) and a study of the involution \(\zeta \). We first recall this result for \((\mathsf {G}_{\mathrm {E}},\mathsf {V}_{\mathrm {E}})\), after introducing some notation.

Let \(\mathsf {V}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\) denote the open subscheme of regular semisimple elements of \(\mathsf {V}_{\mathrm {E}}\subset {{\,\mathrm{\mathfrak {h}}\,}}_{\mathrm {E}}\), and let \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\) be its image under \(\pi _{\mathrm {E}}:\mathsf {V}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\). Define the \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\)-scheme \(\mathsf {A}_{\mathrm {E}}:=Z_{\mathsf {H}_{\mathrm {E}}}(\sigma _{\mathrm {E}}|_{\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}})\), the centralizer of the \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\)-point \(\sigma _{\mathrm {E}}|_{\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}}\) of \(\mathsf {V}_{\mathrm {E}}\). It is a family of maximal tori in \(\mathsf {H}_{\mathrm {E}}\) parametrized by \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\). We also define \(\varLambda _{\mathrm {E}}:={{\,\mathrm{Hom}\,}}(\mathsf {A}_{\mathrm {E}},\mathbb {G}_m)\) as the character group of \(\mathsf {A}_{\mathrm {E}}\). Then \(\varLambda _{\mathrm {E}}\) is an étale sheaf of \(E_6\) root lattices on \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\). By definition, this means that \(\varLambda _{\mathrm {E}}\) is a locally constant étale sheaf of finite free \(\mathbb {Z}\)-modules, equipped with a pairing \((\cdot ,\cdot ):\varLambda _{\mathrm {E}}\times \varLambda _{\mathrm {E}}\rightarrow \mathbb {Z}\) such that for every geometric point \(\bar{x}\) of \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\), the stalk of \(\varLambda _{\mathrm {E}}\) at \(\bar{x}\) is a root lattice of type \(E_6\). This induces a pairing \(\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\times \varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\rightarrow \{\pm 1\} : (\lambda ,\mu ) \mapsto (-1)^{(\lambda ,\mu )}\).

Proposition 3.1

We can choose polynomials \(p_2, p_5, p_6, p_8, p_9, p_{12} \in \mathbb {Q}[\mathsf {V}_{\mathrm {E}}]^{\mathsf {G}_{\mathrm {E}}}\) with the following properties:

  1. 1.

    Each polynomial \(p_i\) is homogeneous of degree i and \(\mathbb {Q}[\mathsf {V}_{\mathrm {E}}]^{\mathsf {G}_{\mathrm {E}}} \simeq \mathbb {Q}[p_2, p_5, p_6, p_8, p_9, p_{12}]\). Consequently, there is an isomorphism \(\mathsf {B}_{\mathrm {E}}\simeq \mathbb {A}^6_{\mathbb {Q}}\).

  2. 2.

    Let \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) be the family of projective curves inside \(\mathbb {P}^2_{\mathsf {B}_{\mathrm {E}}}\) with affine model

    $$\begin{aligned} y^4+x(p_2y^2+p_5y)+(p_6y^2+p_9y) = x^3+p_8x+p_{12}. \end{aligned}$$
    (3.1)

    If \(k/\mathbb {Q}\) is a field and \(b\in \mathsf {B}_{\mathrm {E}}(k)\), then \((C_{\mathrm {E}})_b\) is smooth if and only if \(b\in \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}(k)\).

  3. 3.

    Let \(J_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\) be the Jacobian of its smooth part [18,  §9.3; Theorem 1]. Then there is a unique isomorphism \(\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\simeq J_{\mathrm {E}}[2] \) of finite étale group schemes over \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\) that sends the pairing on \(\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\) to the Weil pairing \(J_{\mathrm {E}}[2] \times J_{\mathrm {E}}[2] \rightarrow \{ \pm 1\}\).

  4. 4.

    There exists an isomorphism \(Z_{\mathsf {G}_{\mathrm {E}}}(\sigma _{\mathrm {E}}|_{\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}}) \simeq J_{\mathrm {E}}[2]\) of finite étale group schemes over \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\).

Proof

This is a combination of classical results and Thorne’s thesis [60]. We refer to [34,  Proposition 2.5] for precise references, with the caveat that the role of the coordinates x and y is interchanged here. The only part that remains to be proven is the uniqueness of the isomorphism \(\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\simeq J_{\mathrm {E}}[2]\) that preserves the pairings on both sides. This follows from [34,  Proposition 2.6(4)]. \(\square \)

We now incorporate the involution \(\zeta \) in the picture, and compare it to an involution defined on the level of curves. Recall that \(\zeta :\mathsf {G}_{\mathrm {E}}\rightarrow \mathsf {G}_{\mathrm {E}}\) is an involution with fixed points \(\mathsf {G}\). Since \(\zeta \) commutes with \(\sigma _{\mathrm {E}}\), it defines an involution of the scheme \(Z_{\mathsf {H}_{\mathrm {E}}}(\sigma _{\mathrm {E}}|_{\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}})\) lifting the involution \(\zeta :\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}} \). It induces an involution of \(\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\), still denoted by \(\zeta \).

On the other hand, if \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) denotes the family of Eq. (3.1), then the map \((x,y) \mapsto (x,-y)\) defines an involution \(\tau :C_{\mathrm {E}}\rightarrow C_{\mathrm {E}}\) lifting the involution \((-1):\mathsf {B}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\). It induces an involution of \(J_{\mathrm {E}}[2]\), denoted by \(\tau ^*\).

Lemma 3.1

Under the isomorphism \(\phi :\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\xrightarrow {\sim } J_{\mathrm {E}}[2]\) from Proposition 3.1, the involutions \(\zeta \) and \(\tau ^*\) are identified.

Proof

Write \(\phi ' = \tau ^* \circ \phi \circ \zeta \). We need to prove that \(\phi ' = \phi \). By the second part of Proposition 2.5, \(\phi ':\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\rightarrow J_{\mathrm {E}}[2]\) is an isomorphism of \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\)-schemes. Moreover, \(\zeta \) and \(\tau ^*\) respect the pairings on \(\varLambda _{\mathrm {E}}/2\varLambda _{\mathrm {E}}\) and \(J_{\mathrm {E}}[2]\) respectively. The result follows from the uniqueness statement in Part 3 of Proposition 3.1. \(\square \)

Proposition 3.1 and Lemma 3.1 have the following important consequence, which connects the representation \((\mathsf {G},\mathsf {V})\) with the subfamily \(C\rightarrow \mathsf {B}\) of \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\). Again let \(\mathsf {V}^{{{\,\mathrm{rs}\,}}}\) denote the open subscheme of regular semisimple elements of \(\mathsf {V}\) and let \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\) be its image under \(\pi :\mathsf {V}\rightarrow \mathsf {B}\).

Proposition 3.2

We can choose polynomials \(p_2, p_6, p_8, p_{12} \in \mathbb {Q}[\mathsf {V}]^{\mathsf {G}}\) with the following properties:

  1. 1.

    Each polynomial \(p_i\) is homogeneous of degree i and \(\mathbb {Q}[\mathsf {V}]^{\mathsf {G}} \simeq \mathbb {Q}[p_2, p_6, p_8, p_{12}]\). Consequently, there is an isomorphism \(\mathsf {B}\simeq \mathbb {A}^4_{\mathbb {Q}}\).

  2. 2.

    Let \(C\rightarrow \mathsf {B}\) be the family of projective curves inside \(\mathbb {P}^2_{\mathsf {B}}\) with affine model

    $$\begin{aligned} y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12}. \end{aligned}$$
    (3.2)

    If \(k/\mathbb {Q}\) is a field and \(b\in \mathsf {B}(k)\), then \(C_b\) is smooth if and only if \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\).

  3. 3.

    Let \(J\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) be the Jacobian of the morphism \(C^{{{\,\mathrm{rs}\,}}}\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\). Let \(\tau : C^{{{\,\mathrm{rs}\,}}} \rightarrow C^{{{\,\mathrm{rs}\,}}}\) be the involution of \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\)-schemes sending (xy) to \((x,-y)\) and let \(\tau ^*:J\rightarrow J\) be the induced morphism on \(J\). Then the isomorphism \(Z_{\mathsf {G}_{\mathrm {E}}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \simeq J[2]\) obtained from Proposition 3.1 intertwines the involutions \(\zeta \) and \(\tau ^*\) and restricts to an isomorphism \(Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \simeq J[2]^{\tau ^*}\)

Proof

Let \(p'_2, p'_5, p'_6, p'_8, p'_9, p'_{12} \in \mathbb {Q}[\mathsf {V}_{\mathrm {E}}]^{\mathsf {G}_{\mathrm {E}}}\) be a choice of polynomials satisfying the conclusion of Proposition 3.1. Write \(p_i\in \mathbb {Q}[\mathsf {V}]^{\mathsf {G}}\) for the restriction of \(p'_i\) to \(\mathsf {V}\). The first two parts of Proposition 2.5 imply that \(p_5 = p_9 = 0\) and \(\mathbb {Q}[\mathsf {V}]^{\mathsf {G}} = \mathbb {Q}[p_2,p_6,p_8,p_{12}]\). The family \(C\rightarrow \mathsf {B}\) is the pullback of the family \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) along \(\mathsf {B}\rightarrow \mathsf {B}_{\mathrm {E}}\). Moreover, Part 3 of Proposition 2.5 shows that \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}=\mathsf {B}\cap \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\). The proposition now follows from Proposition 3.1 and Lemma 3.1. \(\square \)

We henceforth fix \(p_2,p_6,p_8,p_{12}\in \mathbb {Q}[\mathsf {V}]^{\mathsf {G}}\) satisfying the conclusions of Proposition 3.2. Recall that we have defined a \(\mathbb {G}_m\)-action on \(\mathsf {B}\) which satisfies \(\lambda \cdot p_i = \lambda ^ip_i\). The assignment \(\lambda \cdot (x,y) = (\lambda ^4x,\lambda ^3y)\) defines a \(\mathbb {G}_m\)-action on \(C\) such that the morphism \(C\rightarrow \mathsf {B}\) is \(\mathbb {G}_m\)-equivariant.

3.2 Monodromy of \(J[2]\)

We give some additional properties of the finite étale group scheme \(J[2] \rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\). Recall that \(\mathsf {T}\) is a split maximal torus of \(\mathsf {H}\); let \({{\,\mathrm{\mathfrak {t}}\,}}\) be its Lie algebra and \(W:=N_{\mathsf {G}}(\mathsf {T})/\mathsf {T}\) its Weyl group . We define a map \(f:{{\,\mathrm{\mathfrak {t}}\,}}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) as follows. The inclusions \({{\,\mathrm{\mathfrak {t}}\,}}\subset {{\,\mathrm{\mathfrak {h}}\,}}\) and \(\mathsf {V}\subset {{\,\mathrm{\mathfrak {h}}\,}}\) induce isomorphisms \({{\,\mathrm{\mathfrak {t}}\,}}\mathbin {//}W \simeq {{\,\mathrm{\mathfrak {h}}\,}}\mathbin {//}\mathsf {H}\) and \(\mathsf {B}\simeq {{\,\mathrm{\mathfrak {h}}\,}}\mathbin {//}\mathsf {H}\) by the classical Chevalley isomorphism and Proposition 2.2 respectively. Composing the first with the inverse of the second determines an isomorphism \({{\,\mathrm{\mathfrak {t}}\,}}\mathbin {//}W \xrightarrow {\sim } \mathsf {B}\). Precomposing this isomorphism with the natural projection \({{\,\mathrm{\mathfrak {t}}\,}}\rightarrow {{\,\mathrm{\mathfrak {t}}\,}}\mathbin {//}W\) and restricting to regular semisimple elements defines a morphism \(f:{{\,\mathrm{\mathfrak {t}}\,}}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\). Since \({{\,\mathrm{\mathfrak {t}}\,}}^{{{\,\mathrm{rs}\,}}} \rightarrow {{\,\mathrm{\mathfrak {t}}\,}}^{{{\,\mathrm{rs}\,}}}\mathbin {//}W\) is a torsor under W, f is a W-torsor too.

Let \(W_{\mathrm {E}}\) be the Weyl group of the split maximal torus \(\mathsf {T}_{\mathrm {E}}\) of \(\mathsf {H}_{\mathrm {E}}\). It it known that the inclusion \(\mathsf {T}\subset \mathsf {T}_{\mathrm {E}}\) induces an isomorphism of W onto \(W_{\mathrm {E}}^{\zeta }\), the centralizer of \(\zeta \) in \(W_{\mathrm {E}}\) [21,  §13.3.3]. We therefore obtain an action of W on \(\varLambda :=X^*(\mathsf {T}_{\mathrm {E}})\), a root lattice of type \(E_6\).

Proposition 3.3

The finite étale group scheme \(J[2] \rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) becomes trivial after the base change \(f:{{\,\mathrm{\mathfrak {t}}\,}}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\), where it is isomorphic to the constant group scheme \(\varLambda /2\varLambda \). The monodromy action is given by the natural action of \(W \simeq W_{\mathrm {E}}^{\zeta }\).

Proof

By [34,  Part 1 of Proposition 2.6], the group scheme \(J_{\mathrm {E}}[2] \rightarrow \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}} \) becomes trivial after the base change \(f_{\mathrm {E}}:\mathfrak {t}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\) where \(f_{\mathrm {E}}\) is defined analogously as before. Moreover the monodromy action is given by the natural action of \(W_{\mathrm {E}}\) on \(\varLambda /2\varLambda \). The proposition is thus implied by the following commutative diagram:

figure t

Corollary 3.1

The finite étale \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\)-subgroup schemes of \(J[2]\) are

$$\begin{aligned} 0 \subset (1+\tau ^*)J[2] \subset J[2]^{\tau ^*}\subset J[2] \end{aligned}$$
(3.3)

of order \(1,2^2,2^4,2^6\) respectively. Moreover the \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\)-group schemes \((1+\tau ^*)J[2]\) and \(J[2]^{\tau ^*}/(1+\tau ^*)J[2]\) are not isomorphic, even after base change to k for any field extension \(k/\mathbb {Q}\).

Proof

In light of Proposition 3.3, the above claims are reduced to analyzing the action of \(W_{\mathrm {E}}^{\zeta }\) on \(\varLambda /2\varLambda \). For example for the first part it suffices to determine the \(W_{\mathrm {E}}^{\zeta }\)-invariant subgroups of \(\varLambda /2\varLambda \) and for the second part, it suffices to find an element of \(W_{\mathrm {E}}^{\zeta }\) which acts trivially on \((1+\zeta )\left( \varLambda /2\varLambda \right) \) but not so on \(\left( \varLambda /2\varLambda \right) ^{\zeta }/(1+\zeta )\left( \varLambda /2\varLambda \right) \). Both are direct computations in the \(E_6\) root lattice, which we omit. \(\square \)

3.3 A family of Prym varieties

In this section we introduce the family of Prym surfaces \(P\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) and discuss some of its properties. We first discuss it in a more general set-up.

Let k be a field of characteristic different from 2 and X/k a smooth projective genus-3 curve. Let \(\tau :X\rightarrow X\) be an involution with four fixed points. Suppose we are given a k-point \(\infty \in X(k)\) fixed by \(\tau \). Let \(E:=X/\tau \) be the quotient of X by \(\tau \) and \(f:X\rightarrow E\) be the associated double cover which is branched at four points. By the Riemann–Hurwitz formula, E is an elliptic curve with origin \(f(\infty )\). This defines an isomorphism \(E\simeq J_E\) between E and its Jacobian which sends \(f(\infty )\) to the identity of \(J_E\).

The Jacobian variety \(J_X\) of X is not simple. Indeed the map f induces a surjective norm homomorphism \(f_*:J_X \rightarrow J_E \simeq E\) which sends the equivalence class [D] of a divisor to [f(D)], so E is an isogeny factor of \(J_X\). To describe the remaining part of \(J_X\) we use the following classical definition.

Definition 3.1

We define the \(P_{X,\tau }\) of the pair \((X,\tau )\) as the kernel of the norm map:

$$\begin{aligned} P_{X,\tau } :=\ker \left( f_*: J_X \rightarrow E\right) . \end{aligned}$$

Prym varieties have been studied by Mumford in a much more general set-up [42]. We warn the reader that many authors only consider fixed-point free involutions when defining Prym varieties or equivalently, unramified double covers. In our case the algebraic group \(P_{X,\tau }\) satisfies the following properties:

  1. 1.

    Let \(f^*:E \rightarrow J_X\) be the pullback map on divisors. Then \(f^*\) is injective, \(f_*\circ f^*=[2]\) and \(f^*\circ f_* = 1+\tau ^*\). Hence

    $$\begin{aligned} P_{X,\tau } = \ker \left( 1+\tau ^*:J_X\rightarrow J_X \right) . \end{aligned}$$
    (3.4)
  2. 2.

    \(P_{X,\tau }\) is connected, hence an abelian surface.

  3. 3.

    The restriction of \(f^*: E \rightarrow J_X\) to E[2] induces an isomorphism \(E[2] \xrightarrow {\sim } {{\,\mathrm{image}\,}}(f^*) \cap P_{X,\tau }\). Consequently there is an injective morphism \(\psi : E[2] \hookrightarrow P_{X,\tau }[2]\).

  4. 4.

    The map \(E\times P_{X,\tau } \rightarrow J_X\), determined by \(f^*\) and the inclusion \(P_{X,\tau } \hookrightarrow J_X\), is surjective with kernel equal to the graph of \(\psi \), given by \(\{(x,\psi (x)) \mid x\in E[2] \}\). Consequently there is an isomorphism

    $$\begin{aligned} J_X \simeq \left( E\times P_{X,\tau } \right) /\{(x,\psi (x) ) \mid x\in E[2] \}. \end{aligned}$$

    So \(J_X\) is isogenous to \(E \times P_{X,\tau }\).

  5. 5.

    The cokernel of \(f^*\) is naturally identified with the dual abelian variety of \(P_{X,\tau }\), written \(P_{X,\tau }^{\vee }\), and the composite \(P_{X,\tau } \hookrightarrow J_X \twoheadrightarrow P_{X,\tau }^{\vee } \), denoted \(\rho \), is a polarization of type (1, 2). (This means that \(P_{X,\tau }[\rho ]\) is isomorphic to \((\mathbb {Z}/2)^2\) over \(\bar{k}\).) The map \(f^*:E[2] \rightarrow P_{X,\tau }[\rho ]\) is an isomorphism.

Indeed, to verify the above properties we may assume that k is algebraically closed. Then Property 1 follows from [42,  §3; Lemma 1] and the fact that \(X\rightarrow E\) is ramified at four points. The other properties follow from going through the correspondence described in [42,  §2]: in the notation of that paper, we start with Data I of the form \((X,Y,\phi ) = (E,X,f^*)\) whose invariants are \((a,b,c) = (1,2,1)\). Equation (2.1) of loc. cit. holds by the discussion in §1 of op. cit. We additionally record the following important fact.

Lemma 3.2

The isogeny \(\rho :P_{X,\tau } \rightarrow P_{X,\tau }^{\vee }\) is self-dual.

Proof

This follows from the fact that \(f_*\) and \(f^*\) are dual to each other when transported along the principal polarizations of \(J_X\) and E, see [42,  End of §1]. \(\square \)

We now specialize to our situation of interest. Recall that \(C^{{{\,\mathrm{rs}\,}}}\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) consists of the smooth members of the projective closure of the family of curves

$$\begin{aligned} y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12} \end{aligned}$$

and that \(\tau :C\rightarrow C, (x,y) \mapsto (x,-y)\) is the involution which defines, for every field \(k/\mathbb {Q}\) and \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\), an involution \(\tau _b:C_b\rightarrow C_b\) with four fixed points fixing the point at infinity \(\infty \in C_b(k)\).

Define \(\overline{E}\rightarrow \mathsf {B}\) to be the projective completion of the family of plane curves given by

$$\begin{aligned} y^2+p_2xy+p_6y = x^3+p_8x+p_{12}. \end{aligned}$$
(3.5)

Define \(E\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) to be its restriction to \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\). Then there is a unique morphism of \(\mathsf {B}\)-schemes \(f:C\rightarrow \overline{E}\) sending a point (xy) to \((x,y^2)\). This identifies, for each field \(k/\mathbb {Q}\) and \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\), \(E_b\) with the quotient of \(C_b\) by \(\tau _b\).

The morphism \(\tau \) defines via pullback a morphism of abelian schemes \(\tau ^*:J\rightarrow J\). Define

$$\begin{aligned} P:=\ker (1+\tau ^*:J\rightarrow J). \end{aligned}$$
(3.6)

The morphism \(P\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) is proper and by Eq. (3.4) its fibres are abelian surfaces enjoying the properties described above. The next useful lemma [28,  Proposition 3.5] applied to the \(\mathbb {Z}/2\mathbb {Z}\)-action \(-\tau ^*\) on \(J\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) shows that \(P\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) is smooth, hence an abelian scheme.

Lemma 3.3

Let G be a finite group, acting equivariantly on a smooth morphism of schemes \(X\rightarrow S\). If the order of G is invertible on S, then the induced morphism on fixed points \(X^G \rightarrow S^G\) is smooth.

Lemma 3.4

The filtration \(0 \subset (1+\tau ^{*})J[2] \subset J[2]^{\tau ^*} \subset J[2]\) of Corollary 3.1 is identified with the filtration

$$\begin{aligned} 0 \subset E[2] \subset P[2] \subset J[2]. \end{aligned}$$

Here we see \(E[2]\) as a subgroup of \(P[2]\) using the pullback map \(f^* :E[2] \hookrightarrow J[2]\).

Proof

This follows from Corollary 3.1 and the fact that \(E[2]\) and \(P[2]\) have order \(2^2\) and \(2^4\) respectively. \(\square \)

The map \(1+\tau ^*:J[2]\rightarrow J[2]\) has image \(E[2]\) and kernel \(P[2]\), so \(J[2]/P[2]\simeq E[2]\). The remaining graded piece \(P[2]/E[2]\) of the filtration of Lemma 3.4 will be determined in Corollary 3.2.

Since \(P[2] = J[2]^{\tau ^*}\), Proposition 3.2 immediately implies the following.

Proposition 3.4

The isomorphism \(Z_{\mathsf {G}_{\mathrm {E}}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \simeq J[2]\) of Proposition 3.2 restricts to an isomorphism \(Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \simeq P[2]\) of finite étale group schemes over \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\).

The following diagram of smooth group schemes over \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\) summarizes the situation.

figure v

3.4 The bigonal construction

Let \(k/\mathbb {Q}\) be a field and let \((X,\tau )\) be a pair where X/k a smooth projective genus-3 curve and \(\tau :X \rightarrow X\) an involution with four fixed points. Let \(\infty \in X(k)\) be a k-point fixed by \(\tau \). In Sect. 3.3 we have associated to this data a Prym variety \(P_{X,\tau }\). In this section we will, under the presence of additional assumptions, realize the dual \(P_{X,\tau }^{\vee }\) as the Prym variety of another such pair \((\hat{X},\hat{\tau })\). This is a special case of the bigonal construction going back to Pantazis [45] but we present it in a way closer in spirit to Barth [3] who analyzed the above situation in great detail. Only Theorem 3.1 will be used later.

Recall that we have defined a polarization \(\rho :P_{X,\tau } \rightarrow P_{X,\tau }^{\vee }\) of type (1, 2); let \(\hat{\rho }:P^{\vee }_{X,\tau }\rightarrow P_{X,\tau }\) be the unique isogeny such that \(\hat{\rho }\circ \rho = [2]\). (Warning: \(\hat{\rho }\) is not the dual of \(\rho \)!) Define \(i:X\rightarrow P_{X,\tau }^{\vee }\) as the composite of the Abel–Jacobi map \(X\rightarrow J_X\) with respect to \(\infty \) with the projection \(J_X \rightarrow P_{X,\tau }^{\vee }\).

Proposition 3.5

(Barth)

  1. 1.

    The morphism \(i:X\hookrightarrow P_{X,\tau }^{\vee }\) is a closed embedding.

  2. 2.

    The divisor i(X) is ample and the induced polarization of \(P_{X,\tau }^{\vee }\) coincides with \(\hat{\rho }\).

  3. 3.

    If A/k is an abelian surface and \(j:X\hookrightarrow A\) is a closed embedding mapping \(\infty \) to 0 such that \([-1]\) restricts to \(\tau \) on X, then there exists a unique isomorphism of abelian varieties \(P_{X,\tau }^{\vee }\rightarrow A\) sending i to j.

Proof

We may suppose that k is algebraically closed. Part 1 follows from the proof of [3,  Proposition 1.8]. For Part 2, note that by the adjunction formula we have \(i(X)\cdot i(X) = 2p_a(X)-2 = 4\) and if a curve Y on A is not numerically equivalent to i(X) we can translate Y using A so that it intersects i(X) in a finite non-empty subscheme, implying that \(Y\cdot i(X)>0\). Therefore i(X) is ample by the Nakai–Moishezon criterion. Let \(\lambda :P_{X,\tau }^{\vee } \rightarrow P_{X,\tau }\) be the corresponding polarization. The fact that \(\lambda = \hat{\rho }\) follows from the equality \(\ker \lambda = \ker \hat{\rho }\) [3,  Lemma 1.11]. Part 3 is [3,  Proposition 1.10]. \(\square \)

We now describe the bigonal construction. Suppose in addition to the above that X is not hyperelliptic and we are given an effective divisor \(\kappa \) on X fixed by \(\tau \) such that \(2\kappa \) is canonical. Let \(\varTheta _{\kappa }\subset J_X\) be the corresponding theta divisor, namely the pullback of the image of the natural summing map \(X\times X\rightarrow {{\,\mathrm{Pic}\,}}^2(X)\) along the translation-by-\(\kappa \) map \(J_X \rightarrow {{\,\mathrm{Pic}\,}}^2(X), D\mapsto D+\kappa \). The divisor \(\varTheta _{\kappa }\) is symmetric and induces the principal polarization on \(J_X\); we refer to [14,  Chapter 11, § 2] for these classical facts. Set

$$\begin{aligned} \hat{X} :=\varTheta _{\kappa } \cap P_{X,\tau }. \end{aligned}$$

Then \(\hat{X}\) induces the polarization \(\rho :P_{X,\tau } \rightarrow P_{X,\tau }^{\vee }\), by construction of \(\rho \). Let \(\hat{\tau }\) be the restriction of \([-1] \) to \(\hat{X}\), which coincides with the restriction of \(\tau ^*\) to \(\hat{X}\).

Lemma 3.5

The curve \(\hat{X}\) is smooth, geometrically connected and of genus 3. The involution \(\hat{\tau }:\hat{X} \rightarrow \hat{X}\) has 4 fixed points over \(\bar{k}\).

Proof

We may suppose that k is algebraically closed. Since \(\hat{X}\) is an ample divisor on the smooth projective surface \(P_{X,\tau }\), it is connected by the Kodaira vanishing theorem. Moreover because \(\hat{X}\) defines a polarization of degree 4, it has self-intersection 4 so arithmetic genus 3 by the adjunction formula. Because we assumed that X is not hyperelliptic and of genus 3, \(\varTheta _{\kappa }\) is smooth by Riemann’s singularity theorem [14,  Chapter 11, §2.5]. Therefore \(\hat{X}\) is smooth by Lemma 3.3, being the fixed points of the involution \([-1]\circ \tau :\varTheta _{\kappa } \rightarrow \varTheta _{\kappa }\).

It remains to calculate the number of fixed points of \(\hat{\tau }\). Let \(f:X\rightarrow E\) be the quotient of X by \(\tau \) and let \(g:E\rightarrow \mathbb {P}^1\) be the morphism induced by the degree-2 divisor \(f_*(\kappa )\). Since \(\varTheta _{\kappa }\) is smooth and X is not hyperelliptic, the summing map \({{\,\mathrm{Sym}\,}}^2X\rightarrow \varTheta _{\kappa }, D\mapsto D-\kappa \) is an isomorphism; let \(\widetilde{X}\) be the inverse image of \(\hat{X}\) under this isomorphism. An effective degree-2 divisor D lies on \(\widetilde{X}\) if and only if \(D+\tau (D)\sim 2\kappa \). Since \(f^*:{{\,\mathrm{Pic}\,}}(E) \rightarrow {{\,\mathrm{Pic}\,}}(X)\) is injective and \(f^*\circ f_*=1+\tau ^*\) (Property 1 of Sect. 3.3), the latter holds if and only if \(f_*(D)\sim f_*(\kappa )\).

It suffices to prove that the involution \(D\mapsto \tau (D)\) on \(\widetilde{X}\) has 4 fixed points. If \(e_1,\dots ,e_4\) are the ramification points of g then \(f^*(e_1),\dots ,f^*(e_4)\) are fixed points; we claim that these are the only ones. Arguing by contradiction, suppose that \(D=P_1+P_2\in \widetilde{X}\) is fixed by \(\tau \) and not of this form. Then \(\tau (P_i) = P_i\) for \(i=1,2\) and \(P_1\ne P_2\); write \(P_3, P_4\) for the remaining fixed points of \(\tau \) on X. We have equivalences of divisors \(2P_1+2P_2 = D+\tau (D) \sim 2\kappa \sim P_1+P_2+P_3+P_4\) where last equivalence follows from the Riemann-Hurwitz formula applied to f. This implies that \(P_1+P_2 \sim P_3+P_4\). Since X is not hyperelliptic and \(P_1,\dots ,P_4\) are distinct, we obtain a contradiction. \(\square \)

The effective degree-2 divisor \(\kappa \) defines a point \(\hat{\infty }\in \hat{X}(k)\) fixed by \(\hat{\tau }\). We thus obtain a Prym variety \(P_{\hat{X},\hat{\tau }}\) and an embedding \(\hat{i}:\hat{X} \hookrightarrow P_{\hat{X},\hat{\tau }}^{\vee }\) as defined above. The inclusion \(\hat{X} \hookrightarrow P_{X,\tau }\) maps \(\hat{\infty }\) to 0 and extends to a homomorphism \(J_{\hat{X}} \rightarrow P_{X,\tau }\) from the Jacobian of \(\hat{X}\).

Proposition 3.6

The homomorphism \(J_{\hat{X}} \rightarrow P_{X,\tau }\) factors through an isomorphism of abelian varieties \(P_{\hat{X},\hat{\tau }}^{\vee } \rightarrow P_{X,\tau }\) which identifies the polarizations \(\hat{\rho }_{\hat{X}}\) and \(\rho _{X}\).

Proof

The first claim follows from Part 3 of Proposition 3.5 applied to the closed embedding \(\hat{X} \hookrightarrow P_{X,\tau }\). Since the polarizations of \(P_{X,\tau }\) and \(P^{\vee }_{\hat{X},\hat{\tau }}\) are defined by the embedded curve \(\hat{X}\), the isomorphism identifies the polarizations. \(\square \)

We apply the above generalities to the family of curves that concern us. If \(b=(p_2,p_6,p_8,p_{12}) \in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\) then \(C_b\) and \(E_b\) are of the form

$$\begin{aligned}&C_b : y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12},\\&E_b : y^2+p_2xy+p_6y = x^3+p_8x+p_{12}. \end{aligned}$$

The point \(\infty \in C_b(k)\) is the unique point at infinity, \(\tau _b:C_b \rightarrow C_b\) is the involution sending (xy) to \((x,-y)\) and \(f_b:C_b\rightarrow E_b\) the quotient of \(C_b\) by \(\tau _b\). The divisor \(\kappa = 2\infty \) is a theta characteristic fixed by \(\tau _b\).

The proof of Lemma 3.5 shows that \(\hat{C}_b\) is isomorphic to the closed subscheme of \({{\,\mathrm{Sym}\,}}^2 C_b\) consisting of degree-2 divisors D with the property that \(f_b(D)\sim 2f_b(\infty )\). It follows that \(\hat{C}_b\) has an affine open given by the closed subscheme of \(\mathbb {A}^4\) defined by the equations

$$\begin{aligned} {\left\{ \begin{array}{ll} y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12},\\ y'^4+p_2x'y'^2+p_6y'^2 = x'^3+p_8x'+p_{12},\\ x = x', \\ y^2+y'^2+p_2x+p_6 = 0, \end{array}\right. } \end{aligned}$$

quotiented by the involution \((x,y,x',y') \mapsto (x',y',x,y)\). This quotient can be realized by introducing the variables \(y+y'\) and \(yy'\); a computation then shows that \(\hat{C}_b\) and its quotient by \(\hat{\tau }\) are given by (the projective closure of) the equations

$$\begin{aligned}&\hat{C}_b : (y^2+p_2x+p_6)^2 = -4(x^3+p_8x+p_{12}), \end{aligned}$$
(3.7)
$$\begin{aligned}&\hat{E}_b : (y+p_2x+p_6)^2 = -4(x^3+p_8x+p_{12}). \end{aligned}$$
(3.8)

This construction motivates us to define a \(\mathbb {G}_m\)-equivariant morphism \(\chi :\mathsf {B}\rightarrow \mathsf {B}\) by sending \((p_2,p_6,p_8,p_{12})\) to

$$\begin{aligned} 3\cdot \left( -2 p_2,8 p_6-\frac{2 p_2^3}{3},16 p_8 -\frac{p_2^4}{3}+8 p_2p_6,-64p_{12}-\frac{2p_2^6}{27}+\frac{8p_2^3p_6}{3}-16p_6^2+\frac{16p_2^2p_8}{3} \right) . \end{aligned}$$
(3.9)

(We include the factor 3 in front so that \(\chi \) has integer coefficients.) We have defined \(\chi \) so that for all \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\), \(C_{\chi (b)}\) is isomorphic to \(\hat{C}_b\). We also write \(\hat{b}\) for \(\chi (b)\), thinking of it as the ‘bigonal dual’ of b.

Theorem 3.1

(The bigonal construction) For any field \(k/\mathbb {Q}\) and \(b = (p_2,p_6,p_8,p_{12})\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\):

  • The projective curve \(C_{\chi (b)}\) is isomorphic to the projective closure of the curve

    $$\begin{aligned} \left( y^2+p_2x+p_6\right) ^2 = -4(x^3+p_8x+p_{12}), \end{aligned}$$
    (3.10)

    and \(\tau :C_{\chi (b)}\rightarrow C_{\chi (b)}\) maps (xy) to \((x,-y)\).

  • There exists an isomorphism \(P_{\chi (b)} \simeq P_b^{\vee }\) of (1, 2)-polarized abelian varieties.

  • We have \(\chi (\chi (b)) = 18\cdot b\) for all \(b\in \mathsf {B}\).

Proof

Only the last part is not yet established, which follows from an explicit computation. \(\square \)

For any \(\mathsf {B}\)-scheme U we define the \(\mathsf {B}\)-scheme \(\hat{U}\) as the pullback of \(U\rightarrow \mathsf {B}\) along \(\chi :\mathsf {B}\rightarrow \mathsf {B}\). In particular, we obtain the \(\mathsf {B}\)-schemes \(\hat{C}, \hat{E}, \hat{J}, \hat{P}\). In this notation, one can prove that there exists an isomorphism \(P^{\vee } \simeq \hat{P}\) of polarized abelian schemes over \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\). However, we will not need this fact in what follows.

Corollary 3.2

There exists an exact sequence of finite étale group schemes over \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\):

$$\begin{aligned} 0 \rightarrow E[2] \rightarrow P[2]\rightarrow \hat{E}[2] \rightarrow 0 \end{aligned}$$

isomorphic to the exact sequence

$$\begin{aligned} 0 \rightarrow P[\rho ] \rightarrow P[2] \xrightarrow {\rho } P^{\vee }[\hat{\rho }]\rightarrow 0. \end{aligned}$$

Moreover, the \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\)-groups \(E[2]\) and \(\hat{E}[2]\) are not isomorphic, even after base change to k for any field extension \(k/\mathbb {Q}\).

Proof

Since \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\) is normal, it suffices to prove the corollary over the generic point. The second exact sequence of the corollary follows from the identity \(\hat{\rho }\circ \rho = [2]\). We have seen in Sect. 3.3 (Property 5) that the kernel \(P[\rho ]\) of \(\rho \) is identified with \(E[2]\). Since \(P^{\vee }[\hat{\rho }] \simeq \hat{P}[\rho ]\) by Theorem 3.1, we see that \(P^{\vee }[\hat{\rho }]\simeq \hat{E}[2]\). The last claim follows from the last claim of Corollary 3.1. \(\square \)

3.5 The compactified Prym variety

In Section 3.3 we have constructed a family of abelian varieties \(P\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\). In this section we construct a projective scheme \(\overline{P}\rightarrow \mathsf {B}\) containing \(P\) as a dense open subscheme. The properties of \(\overline{P}\) (which are summarized in Proposition 3.9) will be the crucial geometric input for the construction of integral orbit representatives for \((\mathsf {G},\mathsf {V})\) in Sect. 5.4.

Recall that \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) is the family of projective curves given by Eq. (3.1). Let \(J_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\) be the relative Jacobian of its smooth part, a smooth and proper morphism. In [34,  §4.3], a proper morphism \(\bar{J}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) is constructed which parametrizes torsion-free rank-1 sheaves on the fibres of \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\). (In fact, in that paper the compactified Jacobian \(\bar{\mathcal {J}}_{E}\) was constructed over \({{\,\mathrm{Spec}\,}}\mathbb {Z}[1/N]\) for some \(N\ge 1\); we define \(\bar{J}_{\mathrm {E}}\) as the \(\mathbb {Q}\)-fibre of \(\bar{\mathcal {J}}_{E}\).) We state some of its properties here, referring to [34,  Corollary 4.13] for proofs and references.

Proposition 3.7

  1. 1.

    For any \(\mathsf {B}_{\mathrm {E}}\)-scheme T, the T-points of \(\bar{J}_{\mathrm {E}}\) are in natural bijection with the set of isomorphism classes of locally finitely presented \(\mathcal {O}_{C_{\mathrm {E}}\times T}\)-modules \(\mathscr {F}\), flat over T, with the property that \(\mathscr {F}_t\) is torsion-free rank 1 of degree zero for every geometric point t of T, and that there exists an isomorphism of \(\mathcal {O}_T\)-modules \(\infty _T^*\mathscr {F}\simeq \mathcal {O}_T\), where \(\infty :\mathsf {B}_{\mathrm {E}}\rightarrow C_{\mathrm {E}}\) denotes the section at infinity.

  2. 2.

    The morphism \(\bar{J}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) is flat, projective and its restriction to \(\mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\subset \mathsf {B}_{\mathrm {E}}\) is isomorphic to \(J_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}^{{{\,\mathrm{rs}\,}}}\).

  3. 3.

    The variety \(\bar{J}_{\mathrm {E}}\rightarrow {{\,\mathrm{Spec}\,}}\mathbb {Q}\) is smooth.

Recall from Sect. 3.1 that the involution \(\tau : (x,y) \mapsto (x,-y)\) of \(C_{\mathrm {E}}\) lifts the involution \((-1) :\mathsf {B}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\). It induces an involution \(\tau ^*\) of \(\bar{J}_{\mathrm {E}}\), sending a rank 1 torsion-free sheaf \(\mathscr {F}\) to its pullback \(\tau ^*(\mathscr {F})\).

On the other hand, we may construct a different involution of \(\bar{J}_{\mathrm {E}}\) extending \([-1]: J_{\mathrm {E}}\rightarrow J_{\mathrm {E}}\), as follows. If \(\mathscr {F}\) is a coherent sheaf on a scheme X, we define \(\mathscr {F}^{\vee } :=\mathscr {H} om(\mathscr {F},\mathcal {O}_X)\).

Lemma 3.6

Let T be a \(\mathsf {B}_{\mathrm {E}}\)-scheme and \(\mathscr {F}\) an \(\mathcal {O}_{C_{\mathrm {E}}\times T}\)-module, corresponding to a T-point of \(\bar{J}_{\mathrm {E}}\). Then the \(\mathcal {O}_{C_{\mathrm {E}}\times T}\)-module \(\mathscr {F}^{\vee }\) corresponds to a T-point of \(\bar{J}_{\mathrm {E}}\), and the corresponding morphism \(\bar{J}_{\mathrm {E}}\rightarrow \bar{J}_{\mathrm {E}},\, \mathscr {F}\mapsto \mathscr {F}^{\vee }\) is an involution.

Proof

Since the fibres of \(C_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\) are all Gorenstein curves (being complete intersections), [31,  Lemma 1.1(a)] shows that \(\mathscr {E} xt_{\mathcal {O}_{C_{\mathrm {E},t}}}^1(\mathscr {F}_t,\mathcal {O}_{C_{\mathrm {E},t}})=0\) for all geometric points t of T. Therefore [1,  Theorem 1.10(ii)] implies that \(\mathscr {F}^{\vee }\) is locally finitely presented and flat over T, and that \((\mathscr {F}^{\vee })_S\simeq (\mathscr {F}_S)^{\vee }\) for every morphism \(S\rightarrow T\). It follows that \((\mathscr {F}^{\vee })_t=\mathscr {F}_t^{\vee }\) is torsion-free rank 1 of degree zero since the same is true for \(\mathscr {F}_t\). Moreover \(\infty _T^*(\mathscr {F}^{\vee })\simeq (\infty _T^*\mathscr {F})^{\vee } \simeq \mathcal {O}_T^{\vee }\simeq \mathcal {O}_T\). Therefore by Proposition 3.7, \(\mathscr {F}^{\vee }\) corresponds to a T-point of \(\bar{J}_{\mathrm {E}}\).

It remains to prove that the natural map \(\mathscr {F} \rightarrow \mathscr {F}^{\vee \vee }\) is an isomorphism. Since the formation of \(\mathscr {F}^{\vee \vee }\) commutes with base change, we may assume that T is the spectrum of an algebraically closed field. In this case the claim follows from [31,  Lemma 1.1(b)]. \(\square \)

Write \(\mu \) for the composite of the commuting involutions \(\tau ^*\) and \( \mathscr {F}\mapsto \mathscr {F}^{\vee }\). Write \(\bar{J}\rightarrow \mathsf {B}\) for the restriction of \(\bar{J}_{\mathrm {E}}\) to \(\mathsf {B}\hookrightarrow \mathsf {B}_{\mathrm {E}}\).

Definition 3.2

We define the \(\overline{P}\rightarrow \mathsf {B}\) as the \(\mu \)-fixed points of the morphism \(\bar{J}_{\mathrm {E}}\rightarrow \mathsf {B}_{\mathrm {E}}\).

The scheme \(\overline{P}\) is a closed subscheme of \(\bar{J}_{\mathrm {E}}\) so the morphism \(\overline{P}\rightarrow \mathsf {B}\) is projective. By definition of \(P\) (cf. Eq. (3.6)) the restriction of \(\overline{P}\) to \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\subset \mathsf {B}\) is isomorphic to \(P\).

Lemma 3.7

The scheme \(\overline{P}\) is smooth over \(\mathbb {Q}\).

Proof

Apply Lemma 3.3 to the \(\mathbb {Z}/2\mathbb {Z}\)-action of \(\mu \) on the smooth morphism \(\bar{J}_{\mathrm {E}}\rightarrow {{\,\mathrm{Spec}\,}}\mathbb {Q}\). \(\square \)

We now analyze the irreducible components of \(\overline{P}\). The following lemma contains the key calculation of the central fibre of \(\overline{P}\rightarrow \mathsf {B}\).

Proposition 3.8

The fibre of the morphism \(\overline{P}\rightarrow \mathsf {B}\) above 0 is geometrically irreducible of dimension 2.

Proof

In the course of the proof we may and will assume that all schemes are base changed to \(\mathbb {C}\) and by abuse of notation will identify them with their set of complex points. Write \(J_0\) for the identity component of the Picard scheme of the projective curve \(C_0\) given by the equation \((y^4=x^3)\), an open subscheme of \(\bar{J}_0\) stable under \(\mu \). To prove the proposition it suffices to prove that \(P_0:=J_0^{\mu }\) is irreducible of dimension 2 and \(P_0 \subset \overline{P}_0\) is a dense open subscheme. Because the normalization \(\pi :\tilde{C}_0\rightarrow C_0\) is rational and \(C_0\) is Gorenstein, we may appeal to the results of [4] (originally due to Rego [51]) to describe \(\bar{J}_0\) explicitly.

Define the local rings \(\tilde{\mathcal {O}} = \mathbb {C}[[t]]\), \(\mathcal {O}= \mathbb {C}[[t^3,t^4]]\subset \tilde{\mathcal {O}}\) and the truncated versions \(\tilde{A} = \tilde{\mathcal {O}}/t^6\) and \(A = {{\,\mathrm{image}\,}}( \mathcal {O}\rightarrow \tilde{\mathcal {O}}/t^6)\subset \tilde{A}\). Then \(\mathcal {O}\) is the completed local ring of \(C_0\) at the origin and \(\tilde{\mathcal {O}}\) its normalization. Let \(Gr(3,\tilde{A})\) be the Grassmannian parametrizing 3-dimensional subspaces of \(\tilde{A}\). Let \(\mathcal {M} \subset Gr(3,\tilde{A})\) be the reduced closed subscheme parametrizing those subspaces which are stable under the action of A. The map \(M \mapsto M\otimes _{\mathcal {O}}A\) establishes a bijection between the \(\mathcal {O}\)-submodules M of \(\tilde{\mathcal {O}}\) with \(\dim _{\mathbb {C}} \tilde{\mathcal {O}}/M=3\) and \(\mathcal {M}\) (by [29,  Lemma 1.1(iv)] and the fact that \(\dim _{\mathbb {C}} \tilde{\mathcal {O}}/\mathcal {O}=3\)), whose inverse we denote by \(M\mapsto M^{\mathcal {O}}\). We have a natural morphism of \(\mathcal {O}_{C_0}\)-modules \(\pi _*\mathcal {O}_{\tilde{C}_0}\rightarrow \tilde{A}\), where \(\tilde{A}\) is considered as the structure sheaf of the degree-6 divisor supported at the preimage under \(\pi \) of the singular point. The assignment

$$\begin{aligned} M \mapsto \mathscr {F}_M:=\ker (\pi _*\mathcal {O}_{\tilde{C}_0} \rightarrow \tilde{A}/M) \end{aligned}$$

defines a morphism \(e:\mathcal {M}\rightarrow \bar{J}_0\) which is bijective and proper [4,  Proposition 3.7], hence a homeomorphism in the Zariski topology. The sheaf \(\mathscr {F}_M\) is invertible if and only if \(M^{\mathcal {O}}\) is a cyclic \(\mathcal {O}\)-module; the locus of such M define an open subscheme \(\mathcal {M}^{\circ }\subset \mathcal {M}\). Let \(\tau :\tilde{A} \rightarrow \tilde{A}\) be the \(\mathbb {C}\)-algebra homomorphism sending t to \(-t\). For \(M\in \mathcal {M}(\mathbb {C})\) define \(M^{\vee } = \{ x\in \tilde{A} \mid x\cdot M \subset A\}\) and \(\tau ^*M = \{ \tau (m) \mid m\in M\}\); they define involutions \((-)^{\vee }\) and \(\tau ^*\) of \(\mathcal {M}\) with composite \(\mu \). Since \(\mathscr {F}_{\mu (M)} \simeq \mu (\mathscr {F}_M)\), it suffices to prove that \(\mathcal {M}_P^{\circ } :=\left( \mathcal {M}^{\circ }\right) ^{\mu }\) is irreducible, two-dimensional and dense in \(\mathcal {M}_P:=\mathcal {M}^{\mu }\).

The map \(a\mapsto A\cdot a\) defines a bijection \(\tilde{A}^{\times }/A^{\times }\rightarrow \mathcal {M}^{\circ } \). We have a group isomorphism \(\mathbb {G}_a^3\rightarrow \tilde{A}^{\times }/A^{\times }\) given by sending \((a_1,a_2,a_5)\) to the coset of

$$\begin{aligned} \text {exp}(a_1t+a_2t^2+a_5t^5)= & {} 1+(a_1t+a_2t^2+a_5t^5)+(a_1t+a_2t^2+a_5t^5)^2/2\\&+\dots +(a_1t+a_2t^2+a_5t^5)^5/5!. \end{aligned}$$

The composite \(\mathbb {G}_a^3\rightarrow \mathcal {M}^{\circ }\) is an isomorphism of varieties and gives \(\mathcal {M}^{\circ }\) the structure of an algebraic group which acts on \(\mathcal {M}\). The restriction of \(\mu \) to \(\mathcal {M}^{\circ }\) corresponds to the involution \((a_1,a_2,a_5) \mapsto (a_1,-a_2,a_5)\) under the above isomorphism. Therefore \(\mathcal {M}_P^{\circ }\) is isomorphic to \(\mathbb {G}_a^2\), hence irreducible and two-dimensional; it remains to prove that it is dense in \(\mathcal {M}_{P}\). The orbits of the action of \(\mathcal {M}^{\circ }\) stratifies \(\mathcal {M}\) into affine cells which are described in [25,  §4]. They correspond to isomorphism classes of torsion-free rank 1 \(\mathcal {O}\)-modules and their properties are described in Table 4. The second column gives an \(\mathcal {O}\)-module representative \(M^{\mathcal {O}}\) for some \(M\in X_i\); the third column depicts the powers of t generating M.

Since \(\mu \) preserves \(\mathcal {M}^{\circ }\) it permutes the strata. By dimension reasons it can only permute \(X_2\) and \(X_3\). Since the dual of tA is \(t^2A+t^3A\) and \(\tau \) fixes tA we see that \(\mu (X_2)=X_3\). Therefore \(\mathcal {M}_{P} = \mathcal {M}^{\circ }_P \sqcup X_4^{\mu } \sqcup X_5^{\mu }\). So it will be enough to show that the closure of \(\mathcal {M}^{\circ }_P\) contains \(X_4^{\mu } \sqcup X_5^{\mu }\).

Table 4 Stratification of M

Using the description of \(\mathcal {M}^{\circ }\) given above and the exponential map, every element of \(\mathcal {M}_P^{\circ }\) is an A-module generated by

$$\begin{aligned} \left( 1+at+\frac{a^2}{2}t^2+\frac{a^3}{6}t^3+\frac{a^4}{24}t^4+\left( \frac{a^5}{120}+b\right) t^5 \right) \end{aligned}$$

for some \(a,b \in \mathbb {C}\). Using the Plucker coordinates \(\{t^i\wedge t^j \wedge t^k \}\) in \(Gr(3,\tilde{A})\), one can compute that the closure of \(\mathcal {M}_P^{\circ }\) contains \(\lambda (t^2\wedge t^4\wedge t^5)+\mu (t^3\wedge t^4\wedge t^5)\) for all \(\lambda , \mu \in \mathbb {C}\). Since every element of \(X_4\) or \(X_5\) is of this form, this proves the proposition. \(\square \)

Recall that we have defined a \(\mathbb {G}_m\)-action on \(C\rightarrow \mathsf {B}\) in Sect. 3.1 after Proposition 3.2. By functoriality this induces a \(\mathbb {G}_m\)-action on \(\overline{P}\) such that the morphism \(\overline{P}\rightarrow \mathsf {B}\) is \(\mathbb {G}_m\)-equivariant. The following fact will be used in the next three lemmas: if \(Z\subset \mathsf {B}\) is a closed, nonempty and \(\mathbb {G}_m\)-invariant subscheme, then it contains the central point 0.

Lemma 3.8

The scheme \(\overline{P}\) is geometrically irreducible.

Proof

Since \(\overline{P}\) is smooth (Lemma 3.7), the irreducible components of \(\overline{P}_{\overline{\mathbb {Q}}}\) coincide with its connected components so in particular are disjoint. The image of each connected component under \(\overline{P}\rightarrow \mathsf {B}\) is closed (using properness) and \(\mathbb {G}_m\)-invariant, hence contains the central point. But \(\overline{P}_{0,\overline{\mathbb {Q}}}\) is irreducible by Proposition 3.8 so there exists at most one such connected component, as required. \(\square \)

Lemma 3.9

The morphism \(\overline{P}\rightarrow \mathsf {B}\) is flat.

Proof

We first claim that all the fibres of \(\overline{P}\rightarrow \mathsf {B}\) are 2-dimensional. Since \(\overline{P}\rightarrow \mathsf {B}\) is proper, the fibre dimension of this morphism is upper semicontinuous on \(\mathsf {B}\) [30,  Corollaire 13.1.5]. The general fibre is 2-dimensional; let \(Z\subset \mathsf {B}\) be the closed subset where the fibre has larger dimension. The \(\mathbb {G}_m\)-action on \(\overline{P}\rightarrow \mathsf {B}\) shows that this locus is invariant under \(\mathbb {G}_m\) hence it must contain the central point 0, if it is non-empty. But Proposition 3.8 shows that \(0\not \in Z\), proving the claim.

The lemma now follows from the smoothness and irreducibility of \(\overline{P}\) (Lemmas 3.7 and 3.8) and \(\mathsf {B}\) and Miracle Flatness [37,  Theorem 23.1]. \(\square \)

Lemma 3.10

The fibres of the morphism \(\overline{P}\rightarrow \mathsf {B}\) are geometrically integral.

Proof

We first claim that \(\overline{P}_0\) is geometrically reduced. Proposition 3.8 shows that \(\overline{P}_0\) contains a smooth open dense subscheme \(P_0\). Therefore \(\overline{P}_0\) is generically reduced and it suffices to prove that it is geometrically Cohen–Macaulay. But since \(\overline{P}\rightarrow \mathsf {B}\) is flat (Lemma 3.9) and \(0\hookrightarrow \mathsf {B}\) is a complete intersection, the pullback \(\overline{P}_0 \hookrightarrow \overline{P}\) is a complete intersection. Since \(\overline{P}\) is smooth (Lemma 3.7), \(\overline{P}_{0,\overline{\mathbb {Q}}}\) is a local complete intersection hence Cohen–Macaulay. We conclude that \(\overline{P}_0\) is geometrically reduced hence by Proposition 3.8 geometrically integral.

The proposition now follows from the contracting \(\mathbb {G}_m\)-action. Indeed, the locus Z of elements of \(\mathsf {B}\) above which the fibre fails to be geometrically integral is closed [30,  Théorème 12.2.1(x)] and \(\mathbb {G}_m\)-invariant. Since we have just shown that Z does not contain the central point, it must be empty. \(\square \)

We summarize the properties of \(\overline{P}\) for later reference in the following proposition.

Proposition 3.9

The morphism \(\overline{P}\rightarrow \mathsf {B}\) constructed above is flat, projective and its restriction to \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\) is isomorphic to \(P\). Moreover \(\overline{P}\) is smooth and geometrically integral. The locus of \(\overline{P}\) where the morphism \(\overline{P}\rightarrow \mathsf {B}\) is smooth is an open subset whose complement has codimension at least two.

Proof

The only thing that remains to be proven is the statement about the smooth locus of \(\overline{P}\rightarrow \mathsf {B}\); denote this morphism by \(\phi \). Let \(Z\subset \overline{P}\) be the (reduced) closed subscheme where \(\phi \) fails to be smooth. The smoothness of \(P\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) shows that \(Z_b\) is empty if \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}\). Moreover since the fibres of \(\phi \) are geometrically integral by Lemma 3.10, the smooth locus of \(\overline{P}_b\) is nonempty and \(Z_b\subset \overline{P}_b\) is a proper closed subset of smaller dimension for every \(b\in B\). Combining the last two sentences proves the statement. \(\square \)

The discussion of this section has another geometric consequence, which will be useful in Sect. 5.3.

Proposition 3.10

Let \({{\,\mathrm{Pic}\,}}^0_{C/\mathsf {B}} \rightarrow \mathsf {B}\) be the identity component of the relative Picard scheme of \(C\rightarrow \mathsf {B}\) [18,  §9.3, Theorem 1]. Then the fibres of \({{\,\mathrm{Pic}\,}}^0[1+\tau ^*]\rightarrow \mathsf {B}\) are geometrically integral.

Proof

By construction of \(\overline{P}\), there exists a morphism of \(\mathsf {B}\)-schemes \({{\,\mathrm{Pic}\,}}^0[1+\tau ^*]\rightarrow \overline{P}\) which is an open immersion. Therefore for every \(b\in \mathsf {B}\), \({{\,\mathrm{Pic}\,}}^0[1+\tau ^*]_b\) is a non-empty open subset of \(\overline{P}_b\). Since the fibres of \(\overline{P}\rightarrow \mathsf {B}\) are geometrically integral (Lemma 3.10), the proposition follows.

3.6 The discriminant polynomial

We give an explicit description of the discriminant polynomial \(\varDelta \in \mathbb {Q}[\mathsf {V}]^{\mathsf {G}} = \mathbb {Q}[\mathsf {B}]\) introduced in Sect. 2.1 before Proposition 2.3. Recall that we have fixed an isomorphism \(\mathbb {Q}[\mathsf {V}]^{\mathsf {G}} \simeq \mathbb {Q}[p_2,p_6,p_8,p_{12}]\) from Proposition 3.2, so we consider \(\varDelta \) as a polynomial in \(p_2,p_6,p_8,p_{12}\). Since the \(F_4\) root system has 48 roots, \(\varDelta \) is homogeneous of degree 48 with respect to the \(\mathbb {G}_m\)-action on \(\mathsf {B}\).

Set \(\varDelta _{\hat{E}} :=4p_8^3+27p_{12}^2\) and \(\varDelta _{E} :=\varDelta _{\hat{E}}\circ \chi \), where \(\chi \) is defined by Formula (3.9), both elements of \(\mathbb {Q}[\mathsf {B}]\). Then \(\varDelta _{E}\) and \(\varDelta _{\hat{E}}\) are up to elements of \(\mathbb {Q}^{\times }\) the discriminants of the curves \({\overline{E}}\rightarrow \mathsf {B}\) and \(\hat{\overline{E}}\rightarrow \mathsf {B}\).

Lemma 3.11

The polynomial \(\varDelta \in \mathbb {Q}[p_2,p_6,p_8,p_{12}]\) equals, up to an element of \(\mathbb {Q}^{\times }\), the polynomial \(\varDelta _{E}\cdot \varDelta _{\hat{E}}\). In other words, there exists a constant \(A_0\in \mathbb {Q}^{\times }\) such that

$$\begin{aligned} \varDelta (b) = A_0\left( 4p_8(\hat{b})^3+27p_{12}(\hat{b}) \right) \left( 4p_8(b)^3+27p_{12}(b) \right) . \end{aligned}$$

Proof

It suffices to prove the claim when base changed to an algebraically closed field \(k/\mathbb {Q}\). The polynomials \(\varDelta \) and \(\varDelta _{E}\cdot \varDelta _{\hat{E}}\) both have degree 48. Moreover \(\varDelta _{E}\) and \(\varDelta _{\hat{E}}\) are irreducible coprime polynomials in \(k[\mathsf {B}]\). So to prove the claim it suffices to prove that the vanishing loci of \(\varDelta \) and \(\varDelta _{E}\cdot \varDelta _{\hat{E}}\) agree. By Proposition 3.2, if \(b\in \mathsf {B}(k)\) then \(\varDelta (b)\ne 0\) if and only if the curve \((y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12})\) is smooth. By the Jacobian criterion for smoothness, this happens if and only if the curve \(y^2+p_2xy+p_6y = x^3+p_8x+p_{12}\) is smooth and the polynomial \(x^3+p_8x+p_{12}\) has no multiple roots. The lemma then follows from the explicit descriptions of \(\overline{E}\rightarrow \mathsf {B}\) and \(\hat{\overline{E}} \rightarrow \mathsf {B}\) given by Eqs. (3.5) and (3.8) respectively. \(\square \)

Remark 3.1

The factorization of \(\varDelta \) into a product of two degree-24 polynomials of Lemma 3.11 can be interpreted Lie-theoretically. It corresponds to the fact that the Weyl group \(W(\mathsf {H},\mathsf {T})\) has two orbits on \(\varPhi (\mathsf {H},\mathsf {T})\), namely an orbit consisting of the 24 short roots and one consisting of the 24 long roots. It is true (although we do not prove this) that \(\varDelta _{\hat{E}}\) corresponds the short root orbit and \(\varDelta _{E}\) to the long root orbit.

4 Orbit parametrization

In this section we construct, for each \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\), an embedding of \({{\,\mathrm{Sel}\,}}_2 P_b\) inside the set of \(\mathsf {G}(\mathbb {Q})\)-orbits of \(\mathsf {V}(\mathbb {Q})\) with invariants b. Moreover we introduce a different representation \((\mathsf {G}^{\star },\mathsf {V}^{\star })\) and similarly prove that \({{\,\mathrm{Sel}\,}}_{\hat{\rho }} P_b^{\vee }\) embeds in its rational orbits.

We first recall a well-known lemma which gives a cohomological description of orbits.

Lemma 4.1

Let \(G\rightarrow S\) be a smooth affine group scheme. Suppose that G acts on the S-scheme X and let \(e\in X(S)\). Suppose that the action map \(m:G\rightarrow X, g\mapsto g\cdot e\) is smooth and surjective. Then the assignment \(x\mapsto m^{-1}(x)\) induces a bijection between the set of G(S)-orbits on X(S) and the kernel of the map of pointed sets \(\mathrm {H}^1(S,Z_{G}(e)) \rightarrow \mathrm {H}^1(S,G)\).

Proof

This is [24,  Exercise 2.4.11]: the conditions imply that \(X\simeq G/Z_{G}(e)\) and since G and \(Z_G(e)\) (the fibre above e of a smooth map) are S-smooth we can replace fppf cohomology by étale cohomology. \(\square \)

Lemma 4.1 has the following concrete consequence. Let k be a field and G/k a smooth algebraic group which acts on a k-scheme X. Suppose that \(e\in X(k)\) has smooth stabilizer \(Z_G(e)\) and the action of \(G(k^s)\) on \(X(k^s)\) is transitive, where \(k^s\) denotes a separable closure of k. Then the G(k)-orbits of X(k) are in bijection with \(\ker (\mathrm {H}^1(k,Z_G(e))\rightarrow \mathrm {H}^1(k,G))\). This fact allows us to make the connection with Galois cohomology and lies at the basis of our orbit parametrizations in Sect. 4.1 and Sect. 4.3.

4.1 Embedding the 2-Selmer group

The purpose of this section is to prove Theorem 4.1 and its consequence, Corollary 4.1. The essential input is a similar orbit parametrization obtained in the \(E_6\) case in [34]. For any morphism \(b:S\rightarrow \mathsf {B}\) we write \(\mathsf {V}_b\) for the fibre of \(\pi :\mathsf {V}\rightarrow \mathsf {B}\) under b and similarly for \(\mathsf {V}_{\mathrm {E}}\).

Lemma 4.2

Let R be a \(\mathbb {Q}\)-algebra and \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(R)\). Then there are canonical bijections of sets

  1. 1.

    \( \mathsf {G}(R) \backslash \mathsf {V}_b(R) \simeq \ker \left( \mathrm {H}^1(R,P_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G})\right) .\)

  2. 2.

    \( \mathsf {G}_{\mathrm {E}}(R)\backslash \mathsf {V}_{\mathrm {E},b}(R) \simeq \ker \left( \mathrm {H}^1(R,J_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G}_{\mathrm {E}})\right) .\)

The reducible orbits \(\mathsf {G}(R)\cdot \sigma (b)\) and \(\mathsf {G}_{\mathrm {E}}(R)\cdot \sigma _{\mathrm {E}}(b)\) correspond to the trivial element in \(\mathrm {H}^1(R,P_b[2])\) and \(\mathrm {H}^1(R,J_b[2])\) respectively. Moreover the following diagram is commutative:

figure x

Here the horizontal maps are induced by the natural inclusions and the vertical maps are the injections induced by the above bijections.

Proof

We consider the case of \((\mathsf {G},\mathsf {V})\), the case of \((\mathsf {G}_{\mathrm {E}},\mathsf {V}_{\mathrm {E}})\) being analogous. The bijection then follows from Lemma 4.1 applied to the action of \(\mathsf {G}_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}\) on \(\mathsf {V}^{{{\,\mathrm{rs}\,}}}\). Indeed, the action map \(\mathsf {G}\times \mathsf {B}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {V}^{{{\,\mathrm{rs}\,}}}, (g,b) \mapsto g\cdot \sigma (b)\) is étale (Proposition 2.4) and it is surjective by Part 2 of Proposition 2.2. Moreover we have an isomorphism \(Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}})\simeq P[2]\) by Proposition 3.4. Pulling back along \(b:{{\,\mathrm{Spec}\,}}R\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) gives the desired bijection.

The claim about \(\mathsf {G}(R)\cdot \sigma (b)\) follows from the explicit description of the bijection of Lemma 4.1. The commutative diagram follows from the definition of the pushout of torsors and the compatibility between the isomorphisms \(Z_{\mathsf {G}_{\mathrm {E}}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \simeq J[2]\) and \(Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \simeq P[2]\). \(\square \)

Lemma 4.3

Let \(\mathsf {G}^{sc}\rightarrow \mathsf {G}\) be the simply connected cover of \(\mathsf {G}\). Let R be a \(\mathbb {Q}\)-algebra such that every locally free R-module of constant rank is free. Then the pointed set \(\mathrm {H}^1(R,\mathsf {G}^{sc})\) is trivial.

Proof

We have \(\mathsf {G}^{sc} \simeq {{\,\mathrm{SL}\,}}_2 \times {{\,\mathrm{Sp}\,}}_6\) (Proposition 2.1). The result now follows from the triviality of \(\mathrm {H}^1(R,{{\,\mathrm{SL}\,}}_2)\) (by Hilbert’s theorem 90) and \(\mathrm {H}^1(R,{{\,\mathrm{Sp}\,}}_6)\) [34,  Lemma 3.12]. \(\square \)

Lemma 4.4

Let R be a \(\mathbb {Q}\)-algebra such that every locally free R-module of constant rank is free. Then the natural map of pointed sets \(\mathrm {H}^1(R,\mathsf {G}) \rightarrow \mathrm {H}^1(R,\mathsf {G}_{\mathrm {E}})\) has trivial kernel.

Proof

Let \(\mathsf {G}_{\mathrm {E}}^{sc}\rightarrow \mathsf {G}_{\mathrm {E}}\) be the simply connected cover of \(\mathsf {G}_{\mathrm {E}}\). We have a commutative diagram with exact rows over R:

figure y

Here the maps are the natural ones and we omit the subscript R from the notation. Considering the long exact sequence in cohomology we obtain a commutative diagram with exact rows of pointed sets:

figure z

Lemma 6.1 implies that \(\mathrm {H}^1(R,\mathsf {G}^{sc})\) is trivial. The exactness of the rows and the commutativity of the diagram imply that the kernel of the map \(\mathrm {H}^1(R,\mathsf {G}) \rightarrow \mathrm {H}^1(R,\mathsf {G}_{\mathrm {E}}) \) is trivial, as desired. \(\square \)

Theorem 4.1

Let R be a \(\mathbb {Q}\)-algebra such that every locally free R-module is free and \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(R)\). Then there is a canonical injection \(\eta _b : P_b(R)/2P_b(R) \hookrightarrow \mathsf {G}(R)\backslash \mathsf {V}_b(R)\) compatible with base change. Moreover the map \(\eta _b\) sends the identity element to the orbit of \(\sigma (b)\).

Proof

If \(A\in P_b(R)\), define \(\eta _b(A)\in \mathrm {H}^1(R,P_b[2])\) as the image of A under the 2-descent map, namely the isomorphism class of the \(P_b[2]\)-torsor \([2]^{-1}(A)\). It suffices to prove, under the identification of Lemma 4.2, that the class \(\eta _b(A)\) is killed under the map \(\mathrm {H}^1(R,P_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G})\). By Lemma 4.4 it suffices to prove that this class is trivial in \(\mathrm {H}^1(R,\mathsf {G}_{\mathrm {E}})\). By the parametrization of orbits of the representation \(\mathsf {V}_{\mathrm {E}}\) [34,  Theorem 3.13], the composite \(J_b(R)/2J_b(R) \hookrightarrow \mathrm {H}^1(R,J_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G}_{\mathrm {E}})\) is trivial. The commutative diagram

figure aa

then implies the theorem. \(\square \)

We obtain the following concrete corollary of the parametrization of 2-Selmer elements.

Corollary 4.1

Let \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\) and write \({{\,\mathrm{Sel}\,}}_2 P_b\) for the 2-Selmer group of \(P_b\). Then the injection \(\eta _b:P_b(\mathbb {Q})/2P_b(\mathbb {Q}) \hookrightarrow \mathsf {G}(\mathbb {Q})\backslash \mathsf {V}_b(\mathbb {Q})\) of Theorem 4.1 extends to an injection

$$\begin{aligned} {{\,\mathrm{Sel}\,}}_2 P_b \hookrightarrow \mathsf {G}(\mathbb {Q})\backslash \mathsf {V}_b(\mathbb {Q}). \end{aligned}$$

Proof

To prove the corollary it suffices to prove that 2-Selmer elements in \(\mathrm {H}^1(\mathbb {Q},P_b[2])\) are killed under the natural map \(\mathrm {H}^1(\mathbb {Q},P_b[2]) \rightarrow \mathrm {H}^1(\mathbb {Q},\mathsf {G})\). By definition, an element of \({{\,\mathrm{Sel}\,}}_2 P_b\) consists of a class in \(\mathrm {H}^1(\mathbb {Q},P_b[2])\) whose restriction to \(\mathrm {H}^1(\mathbb {Q}_v,P_b[2])\) lies in the image of the 2-descent map for every place v. By Theorem 4.1 the image of such an element in \(\mathrm {H}^1(\mathbb {Q}_v,\mathsf {G})\) is trivial for every v. Since the restriction map \(\mathrm {H}^2(\mathbb {Q},\mu _2) \rightarrow \prod _{v} \mathrm {H}^2(\mathbb {Q}_v,\mu _2)\) has trivial kernel by the Hasse principle for the Brauer group, the kernel of \(\mathrm {H}^1(\mathbb {Q},G) \rightarrow \prod _{v} \mathrm {H}^1(\mathbb {Q}_v,G)\) is trivial too. \(\square \)

4.2 The representation \((\mathsf {G}^{\star },\mathsf {V}^{\star })\)

We define a representation \((\mathsf {G}^{\star }, \mathsf {V}^{\star })\) and study its relation to \((\mathsf {G},\mathsf {V})\) using the binary quartic resolvent map from Sect. 2.5.

Definition 4.1

Define the \(\mathbb {Q}\)-group \(\mathsf {G}^{\star }:={{\,\mathrm{PGL}\,}}_2\). Define the \(\mathsf {G}^{\star }\)-representation \(\mathsf {V}^{\star }:=\mathbb {Q}\oplus \mathbb {Q}\oplus {{\,\mathrm{Sym}\,}}^4(2)\), where \(\mathbb {Q}\) denotes a copy of the trivial representation and \({{\,\mathrm{Sym}\,}}^4(2)\) denotes the space of binary quartic forms

$$\begin{aligned} \{q\mid q(x,y) = ax^4+bx^3y+cx^2y^2+dxy^3+ey^4\}. \end{aligned}$$

An element \([A]\in {{\,\mathrm{PGL}\,}}_2(\mathbb {Q})\) acts on q via \([A]\cdot q(x,y)=q((x,y)\cdot A)/(\det A)^2\). Define \(\mathsf {B}^{\star }:=\mathsf {V}^{\star }\mathbin {//}\mathsf {G}^{\star }\).

We will typically write an element of \(\mathsf {V}^{\star }\) as a triple \((b_2,b_6,q)\). We define a \(\mathbb {G}_m\)-action on \(\mathsf {V}^{\star }\) by \(\lambda \cdot (b_2,b_6,q) = (\lambda ^2b_2,\lambda ^6b_6,\lambda ^4 q)\). Write \(\mathcal {Q}:\mathsf {V}\rightarrow \mathsf {V}^{\star }\) for the morphism \(v\mapsto (p_2(v),p_6(v), Q_v)\), where \(Q_v\) denotes the resolvent binary quartic from Sect. 2.5 and \(p_2,p_6\) denote the invariant polynomials fixed in Proposition 3.2. The odd choice of \(\mathbb {G}_m\)-action on \(\mathsf {V}^{\star }\) is explained by the fact it makes \(\mathcal {Q}\) equivariant with respect to the \(\mathbb {G}_m\)-actions on \(\mathsf {V}\) and \(\mathsf {V}^{\star }\). Similarly to Sect. 2.5 write \(p:\mathsf {G}\rightarrow \mathsf {G}^{\star }\) for the projection associated to the identification \(\mathsf {G}\simeq ({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)/\mu _2\) chosen in Sect. 2.4. There exists a unique \(\mathbb {G}_m\)-action on \(\mathsf {B}^{\star }\) such that the quotient morphism \(\pi ^{\star }:\mathsf {V}^{\star }\rightarrow \mathsf {B}^{\star }\) is \(\mathbb {G}_m\)-equivariant.

If \(q(x,y) = ax^4+bx^3y+cx^2y^2+dxy^3+ey^4 \), we define

$$\begin{aligned} I(q)&:=-3(12ae-3bd+c^2), \end{aligned}$$
(4.1)
$$\begin{aligned} J(q)&:=(72ace+9bcd-27ad^2-27eb^2-2c^3). \end{aligned}$$
(4.2)

Then IJ generate the ring of invariants of a binary quartic form. (Our I(q) is \(-3\) times the degree-2 invariant defined in [12,  §2, Eq. (4)].) We obtain an isomorphism of graded \(\mathbb {Q}\)-algebras \(\mathbb {Q}[\mathsf {B}^{\star }]\simeq \mathbb {Q}[b_2,b_6,I,J]\) where \(b_2,b_6,I,J\) have degree 2, 6, 8, 12 respectively. Moreover a binary quartic form q with coefficients in a field extension \(k/\mathbb {Q}\) has distinct roots in \(\mathbb {P}^1(\bar{k})\) if and only if \(4I(q)^3+27J(q)^2 \ne 0\).

We describe centralizers of elements of \(\mathsf {V}^{\star }\) in two ways. First we recall their classical relation to 2-torsion of elliptic curves. If k is a field and \(I,J\in k\) write \(E^{I,J}\) for the elliptic curve over k given by the Weierstrass equation \(y^2 = x^3+Ix+J\).

Lemma 4.5

Let \(k/\mathbb {Q}\) be a field and \(v\in \mathsf {V}^{\star }(k)\) have invariants \((I,J) :=(I(v),J(v)) \in k^2\) such that \(4I^3+27J^2 \ne 0\). Then there is an isomorphism of finite étale group schemes over k:

$$\begin{aligned} Z_{\mathsf {G}^{\star }}(v) \simeq E^{I,J}[2]. \end{aligned}$$

Proof

Up to scaling the invariants and changing an elliptic curve by a quadratic twist which doesn’t affect the 2-torsion group scheme, this is contained in [12,  Theorem 3.2]. \(\square \)

Next we give an alternative interpretation of centralizers in \(\mathsf {V}^{\star }\) using the results of Sect. 3. Recall from Corollary 3.2 that we have an exact sequence of finite étale group schemes over \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\):

$$\begin{aligned} 0 \rightarrow E[2] \rightarrow P[2] \rightarrow \hat{E}[2]\rightarrow 0. \end{aligned}$$

Lemma 4.6

The following two morphisms are canonically identified:

  • The morphism \(p:Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \rightarrow Z_{\mathsf {G}^{\star }}(\mathcal {Q}\circ \sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}})\).

  • The morphism \(P[2] \rightarrow \hat{E}[2]\).

In particular for every field \(k/\mathbb {Q}\) and \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\), we have an isomorphism of k-group schemes

$$\begin{aligned} Z_{\mathsf {G}^{\star }}(\mathcal {Q}(\sigma (b))) \simeq E^{p_8(b),p_{12}(b)}[2]. \end{aligned}$$

Proof

The last sentence follows from the first claim and the fact that \(\hat{E}_b\) and \(E^{p_8(b),p_{12}(b)}\) are quadratic twists so have isomorphic 2-torsion group scheme. To prove the first claim it suffices to prove that the map \(Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}}) \rightarrow Z_{\mathsf {G}^{\star }}(\mathcal {Q}\circ \sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}})\) is a nonconstant morphism of finite étale group schemes and \(Z_{\mathsf {G}^{\star }}(\mathcal {Q}\circ \sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}})\) has order 4; its kernel must then correspond, under the isomorphism \(Z_{\mathsf {G}}(\sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}})\simeq P[2]\) of Proposition 3.4, to the unique finite étale subgroup scheme of \(P[2]\) of order 4 by Corollary 3.1. Lemma 2.1 implies that \(Z_{\mathsf {G}^{\star }}(\mathcal {Q}\circ \sigma |_{\mathsf {B}^{{{\,\mathrm{rs}\,}}}})\) is finite étale and Lemma 4.5 implies that it is of order 4. Assume for contradiction that p is constant. Then by Lemma 4.1 we obtain a commutative diagram for every field \(k/\mathbb {Q}\) and \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\):

figure ab

where the bottom map is constant. This implies that for every field \(k/\mathbb {Q}\) and for every two \(v_1, v_2\in \mathsf {V}_b(k)\), the binary quartic forms \(Q_{v_1}\) and \(Q_{v_2}\) are \({{\,\mathrm{PGL}\,}}_2(k)\)-equivalent. In particular by taking \(v_1 = \sigma (b)\), Lemma 2.2 shows that \(Q_v\) has a k-rational linear factor for every \(v\in \mathsf {V}^{{{\,\mathrm{rs}\,}}}(k)\). It is now simple to exhibit an explicit \(v\in \mathsf {V}^{{{\,\mathrm{rs}\,}}}(k)\) for which this fails; an example with \(k=\mathbb {R}\) is given in Remark 2.1. \(\square \)

The morphism \(\mathcal {Q}:\mathsf {V}\rightarrow \mathsf {V}^{\star }\) induces a morphism \(\mathsf {B}\rightarrow \mathsf {B}^{\star }\), still denoted by \(\mathcal {Q}\). We write \(\mathcal {Q}_2, \mathcal {Q}_6 , \mathcal {Q}_I , \mathcal {Q}_J \in \mathbb {Q}[\mathsf {B}]\) for the components of \(\mathcal {Q}\) using the coordinates \(b_2,b_6, I,J\). Evidently, we have \(\mathcal {Q}_2 = p_2\) and \(\mathcal {Q}_6 =p_6\). The next lemma determines \(\mathcal {Q}_I\) and \(\mathcal {Q}_J\) up to a constant.

Proposition 4.1

There exists \(\lambda \in \mathbb {Q}^{\times }\) such that

$$\begin{aligned} \mathcal {Q}_I = \lambda ^2 p_8, \quad \mathcal {Q}_J = \lambda ^3 p_{12}. \end{aligned}$$

Proof

Since \(\mathcal {Q}\) is \(\mathbb {G}_m\)-equivariant, the elements \(\mathcal {Q}_I\) and \(\mathcal {Q}_J\) of \(\mathbb {Q}[\mathsf {B}]\) are homogeneous of degree 8, 12 respectively. Lemma 2.1 implies that \(\mathcal {Q}\) maps \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\) in the locus of \(\mathsf {B}^{\star }\) where \(4\mathcal {Q}_I^3+27\mathcal {Q}_J^2\) does not vanish. In other words, we have a divisibility of polynomials in \(\mathbb {Q}[\mathsf {B}]\):

$$\begin{aligned} 4\mathcal {Q}_I(b)^3+27\mathcal {Q}_J(b)^2 \mid (4p_8(\hat{b})^3+27p_{12}(\hat{b})^2)(4p_8(b)^3+27p_{12}(b)^2). \end{aligned}$$
(4.3)

Here we have replaced \(\varDelta \in \mathbb {Q}[\mathsf {B}]\) by its explicit description afforded by Lemma 3.11. By degree considerations and the fact that the right hand side of (4.3) is a product of two irreducible polynomials, we know that up to a nonzero constant \(4\mathcal {Q}_I(b)^3+27\mathcal {Q}_J(b)^2\) equals either \(4p_8(b)^3+27p_{12}(b)^2\) or \(4p_8(\hat{b})^3+27p_{12}(\hat{b})^2\). In the first case, an explicit computation (using that \(\mathcal {Q}_I\) is a \(\mathbb {Q}\)-linear combination of elements of the form \(p_8, p_2p_6, p_2^3\) and analogously for \(\mathcal {Q}_J\)) one see that we must have \((\mathcal {Q}_I(b), \mathcal {Q}_J(b)) = (\lambda ^2 p_8(b), \lambda ^3 p_{12}(b))\) for some \(\lambda \in \mathbb {Q}^{\times }\). In the second case, we must have \((\mathcal {Q}_I(b), \mathcal {Q}_J(b)) = (\lambda ^2 p_8(\hat{b}), \lambda ^3 p_{12}(\hat{b}))\) for some \(\lambda \in \mathbb {Q}^{\times }\) since \(b\mapsto \hat{b}\) is an isomorphism (Theorem 3.1).

We argue by contradiction to exclude the second case, so suppose that it holds. Let \(k/\mathbb {Q}\) be an algebraically closed field extension and \(\mu \in k^{\times }\) a fourth root of \(\lambda \). Then \((\mathcal {Q}_I(b), \mathcal {Q}_J(b)) = (p_8(\mu \cdot \hat{b}), p_{12}(\mu \cdot \hat{b}))\). Let \(\eta :{{\,\mathrm{Spec}\,}}k(\eta ) \rightarrow \mathsf {B}_k\) be the generic point of \(\mathsf {B}_k\) and for ease of notation write \(v^{\star } = \mathcal {Q}(\sigma (\eta ))\).

By Lemma 4.5 we have an isomorphism \(Z_{\mathsf {G}^{\star }}(v^{\star }) \simeq E^{\mathcal {Q}_I(\eta ),\mathcal {Q}_J(\eta )}[2]\). On the other hand by Lemma 4.6 we have \(Z_{\mathsf {G}^{\star }}(v^{\star }) \simeq E^{p_8(\eta ),p_{12}(\eta )}[2]\). Using the assumption \((\mathcal {Q}_I(b), \mathcal {Q}_J(b)) = (p_8(\mu \cdot \hat{b}),p_{12}(\mu \cdot \hat{b}))\) and the fact that \(\hat{E}_{b}[2] \simeq E^{p_8(b),p_{12}(b)}[2]\), we obtain a chain of isomorphisms

$$\begin{aligned} \hat{E}_{\eta }[2] \simeq E^{p_8(\eta ),p_{12}(\eta )}[2] \simeq Z_{\mathsf {G}^{\star }}(v^{\star }) \simeq E^{\mathcal {Q}_I(\eta ),\mathcal {Q}_J(\eta )}[2] = E^{p_8(\mu \cdot \hat{\eta }),p_{12}(\mu \cdot \hat{\eta })}[2] \simeq E_{\eta }[2]. \end{aligned}$$

But by Corollary 3.2, E[2] and \(\hat{E}[2]\) are not isomorphic as finite étale group schemes over \(\mathsf {B}^{{{\,\mathrm{rs}\,}}}\). By [59,  Tag 0BQM] and the fact that \(\mathsf {B}\) is normal, the \(k(\eta )\)-groups \(E_{\eta }[2]\) and \(\hat{E}_{\eta }[2]\) are not isomorphic either. This is a contradiction, proving the proposition. \(\square \)

Corollary 4.2

The map \(\mathcal {Q}:\mathsf {B}\rightarrow \mathsf {B}^{\star }\) is a \(\mathbb {G}_m\)-equivariant isomorphism.

Proof

In the coordinates \(\mathsf {B}\simeq \mathbb {A}^4_{(p_2,p_6,p_8,p_{12})}\) and \(\mathsf {B}^{\star }\simeq \mathbb {A}^4_{(b_2,b_6,I,J)}\), \(\mathcal {Q}\) takes the form \((p_2,p_6,p_8,p_{12}) \mapsto (p_2,p_6,\lambda ^2 p_8, \lambda ^3 p_{12})\) for some \(\lambda \in \mathbb {Q}^{\times }\) by Proposition 4.1. \(\square \)

4.3 Embedding the \(\hat{\rho }\)-Selmer group

The following proposition follows from Lemmas 4.1 and 4.6 by the same proof as Lemma 4.2. If \(b^{\star }:S\rightarrow \mathsf {B}^{\star }\) is an S-valued point we write \(\mathsf {V}^{\star }_{b^{\star }}\) for the fibre of \(\pi ^{\star }:\mathsf {V}^{\star }\rightarrow \mathsf {B}^{\star }\) under \(b^{\star }\).

Proposition 4.2

Let R be a \(\mathbb {Q}\)-algebra. Let \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(R)\) with \(b^{\star } :=\mathcal {Q}(b)\). Then there are canonical bijections of sets:

  1. 1.

    \( \mathsf {G}(R) \backslash \mathsf {V}_b(R) \simeq \ker \left( \mathrm {H}^1(R,P_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G})\right) .\)

  2. 2.

    \( \mathsf {G}^{\star }(R) \backslash \mathsf {V}^{\star }_{b^{\star }}(R) \simeq \ker \left( \mathrm {H}^1(R,\hat{E}_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G}^{\star })\right) .\)

The reducible orbits \(\mathsf {G}(R)\cdot \sigma (b)\) and \(\mathsf {G}^{\star }(R)\cdot \mathcal {Q}(\sigma (b))\) correspond to the trivial element in \(\mathrm {H}^1(R,P_b[2])\) and \(\mathrm {H}^1(R,\hat{E}_b[2])\) respectively. Moreover the following diagram is commutative.

figure ac

Here the horizontal maps are induced by \(\mathcal {Q}:\mathsf {V}\rightarrow \mathsf {V}^{\star }\) and the projection \(P_b[2] \rightarrow \hat{E}_b[2]\) respectively and the vertical maps are the injections induced by the above identifications.

The following corollary will be useful later and connects the notion of almost reducibility to a more arithmetic one. It follows from the commutative diagram of Proposition 4.2.

Corollary 4.3

Let \(k/\mathbb {Q}\) be a field and \(b \in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\). Then the following are equivalent for \(v \in \mathsf {V}_b(k)\):

  • v is almost k-reducible (Definition 2.4).

  • The class of \(\mathsf {G}(k)\cdot v\) in \(\mathrm {H}^1(k,P_b[2])\) under the bijection of Proposition 4.2 lies in the kernel of the map \(\mathrm {H}^1(k,P_b[2]) \rightarrow \mathrm {H}^1(k,\hat{E}_b[2])\).

Theorem 4.2

Let R be a \(\mathbb {Q}\)-algebra such that every locally free R-module of constant rank is free and let \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(R)\) with \(b^{\star } :=\mathcal {Q}(b)\). Then there exists a natural embedding \(\eta ^{\star }_b:P_b(R)/\hat{\rho }(P_b^{\vee }(R)) \hookrightarrow \mathsf {G}^{\star }(R)\backslash \mathsf {V}^{\star }_{b^{\star }}(R) \) compatible with base change on R. Moreover it sends the identity element to the orbit \(\mathsf {G}^{\star }(R)\cdot \mathcal {Q}(\sigma (b))\).

Proof

Recall from Corollary 3.2 that we have an isomorphism \(P_b^{\vee }[\hat{\rho }] \simeq \hat{E}_b[2]\). For \(A \in P_b(R)\), write \(\eta ^{\star }_b(A)\in \mathrm {H}^1(R,\hat{E}_b[2])\) for the image of A under the \(\hat{\rho }\)-descent map transported along the isomorphism \(\mathrm {H}^1(R,P_b^{\vee }[\hat{\rho }]) \simeq \mathrm {H}^1(R,\hat{E}_b[2])\). Using the identification of Proposition 4.2 it suffices to prove that \(\eta ^{\star }_b(A)\) is killed under the map \(\mathrm {H}^1(R,\hat{E}_b[2]) \rightarrow \mathrm {H}^1(R,\mathsf {G}^{\star })\). The commutative diagram

figure ad

shows that \(\eta ^{\star }_b(A)\) lifts to a class in \(\mathrm {H}^1(R,P_b[2])\) lying in the image of the 2-descent map. By the proof of Theorem 4.1, the image of this class in \(\mathrm {H}^1(R,\mathsf {G})\) is trivial. Therefore the image of \(\eta _b^{\star }(A)\) in \(\mathrm {H}^1(R,\mathsf {G}^{\star })\) is trivial too. \(\square \)

We obtain the following consequence for the \(\hat{\rho }\)-Selmer group, whose proof is identical to the proof of Corollary 4.1 and uses the fact that \(\mathrm {H}^1(\mathbb {Q},\mathsf {G}^{\star }) \rightarrow \prod _v \mathrm {H}^1(\mathbb {Q}_v,\mathsf {G}^{\star })\) has trivial kernel.

Corollary 4.4

Let \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\) with \(b^{\star } :=\mathcal {Q}(b)\) and write \({{\,\mathrm{Sel}\,}}_{\hat{\rho }} P_b^{\vee }\) for the \(\hat{\rho }\)-Selmer group of \(P_b^{\vee }\). Then the injection \(\eta ^{\star }_b:P_b(\mathbb {Q})/\hat{\rho }(P^{\vee }_b(\mathbb {Q})) \hookrightarrow \mathsf {G}^{\star }(\mathbb {Q})\backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q})\) of Theorem 4.2 extends to an injection

$$\begin{aligned} {{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_{b} \hookrightarrow \mathsf {G}^{\star }(\mathbb {Q})\backslash \mathsf {V}^{\star }_{b^{\star } }(\mathbb {Q}). \end{aligned}$$

5 Integral orbit representatives

5.1 Integral structures

So far we have considered properties of the pair \((\mathsf {G},\mathsf {V})\) and \((\mathsf {G}^{\star },\mathsf {V}^{\star })\) over \(\mathbb {Q}\). In this subsection we define these objects over \(\mathbb {Z}\) and observe that the above results and constructions are still valid over \(\mathbb {Z}[1/N]\) for an appropriate choice of integer \(N\ge 1\).

Indeed, our choice of pinning of \(\mathsf {H}\) in Sect. 2.1 determines a Chevalley basis of \(\mathfrak {h}\), hence a \(\mathbb {Z}\)-form \(\underline{\mathfrak {h}}\) of \(\mathfrak {h}\) (in the sense of [17,  § 1]) with adjoint group \(\underline{\mathsf {H}}\), a split reductive group of type \(F_4\) over \(\mathbb {Z}\). The \(\mathbb {Z}\)-lattice \(\underline{\mathsf {V}}= \mathsf {V}\cap \underline{\mathfrak {h}}\) is admissible [17,  Definition 2.2]; define \(\underline{\mathsf {G}}\) as the Zariski closure of \(\mathsf {G}\) in \({{\,\mathrm{GL}\,}}(\underline{\mathsf {V}})\). The \(\mathbb {Z}\)-group scheme \(\underline{\mathsf {G}}\) has generic fibre \(\mathsf {G}\) and acts faithfully on the free \(\mathbb {Z}\)-module \(\underline{\mathsf {V}}\) of rank 28. The automorphism \(\theta :\mathsf {H}\rightarrow \mathsf {H}\) extends by the same formula to an automorphism \(\underline{\mathsf {H}}\rightarrow \underline{\mathsf {H}}\), still denoted by \(\theta \). We may similarly define \(\underline{\mathsf {H}}_{\mathrm {E}}, \underline{\mathsf {G}}_{\mathrm {E}}\) and \(\underline{\mathsf {V}}_{\mathrm {E}}\) and extend \(\theta _{\mathrm {E}}\), \(\zeta \) to involutions \(\underline{\mathsf {H}}_{\mathrm {E}}\rightarrow \underline{\mathsf {H}}_{\mathrm {E}}\).

Lemma 5.1

  1. 1.

    \(\underline{\mathsf {G}}\) is a split reductive group over \(\mathbb {Z}\) of type \(C_3\times A_1\).

  2. 2.

    The equality \(\mathsf {H}^{\theta }=\mathsf {G}\) extends to an isomorphism \(\underline{\mathsf {H}}^{\theta }_{\mathbb {Z}[1/2]}\simeq \underline{\mathsf {G}}_{\mathbb {Z}[1/2]}\).

  3. 3.

    The equality \(\mathsf {H}_{\mathrm {E}}^{\zeta }=\mathsf {H}\) extends to an isomorphism \(\underline{\mathsf {H}}^{\zeta }_{\mathrm {E},\mathbb {Z}[1/2]}\simeq \underline{\mathsf {H}}_{\mathbb {Z}[1/2]}\).

Proof

For the first claim, it suffices to prove that \(\underline{\mathsf {G}}\rightarrow {{\,\mathrm{Spec}\,}}\mathbb {Z}\) is smooth and affine and that its geometric fibres are connected reductive groups. But \(\underline{\mathsf {G}}\) is \(\mathbb {Z}\)-flat and affine by construction, and its fibres are reductive by [17,  §4.3]. The second claim follows from the fact that \(\underline{\mathsf {H}}^{\theta }_{\mathbb {Z}[1/2]}\) is a reductive group scheme of the same type as \(\underline{\mathsf {G}}_{\mathbb {Z}[1/2]}\), which follows from [24,  Remark 3.1.5]. The third claim follows from the fact that \(\underline{\mathsf {H}}^{\zeta }_{\mathrm {E},\mathbb {Z}[1/2]}\) is \(\mathbb {Z}[1/2]\)-smooth by Lemma 3.3, and that its geometric fibres are adjoint semisimple of type \(F_4\) (by the same reasoning as [49,  §3.1]). \(\square \)

We define the smooth \(\mathbb {Z}\)-group \(\underline{\mathsf {G}}^{\star }:={{\,\mathrm{PGL}\,}}_2\) and \(\underline{\mathsf {G}}^{\star }\)-representation \(\underline{\mathsf {V}}^{\star }:=\mathbb {Z}\oplus \mathbb {Z}\oplus {{\,\mathrm{Sym}\,}}^4(2)\), where \({{\,\mathrm{Sym}\,}}^4(2)\) denotes the space of binary quartic forms \(ax^4+bx^3y+cx^2y^2+dxy^3+ey^4\) with \(a,\dots ,e\in \mathbb {Z}\). The \(\mathbb {Z}\)-module \(\underline{\mathsf {V}}^{\star }\) is free of rank 7.

Recall that in Sect. 3.1 we have fixed polynomials \(p_2,p_6,p_8,p_{12}\in \mathbb {Q}[\mathsf {V}]^{\mathsf {G}}\) satisfying the conclusions of Proposition 3.2. Note that those conclusions are invariant under the \(\mathbb {G}_m\)-action on \(\mathsf {B}\). By rescaling the polynomials \(p_2,\dots ,p_{12}\) using this \(\mathbb {G}_m\)-action, we can assume they lie in \(\mathbb {Z}[\underline{\mathsf {V}}]^{\underline{\mathsf {G}}}\). We may additionally assume that the discriminant \(\varDelta \) from Sect. 3.6 lies in \(\mathbb {Z}[\underline{\mathsf {V}}]^{\underline{\mathsf {G}}}\). Define \(\underline{\mathsf {B}}:={{\,\mathrm{Spec}\,}}\mathbb {Z}[p_2,p_6,p_8,p_{12}]\) and \(\underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}} :={{\,\mathrm{Spec}\,}}\mathbb {Z}[p_2,p_6,p_8,p_{12}][\varDelta ^{-1}]\). We have an invariant map \(\pi :\underline{\mathsf {V}}\rightarrow \underline{\mathsf {B}}\).

Recall from Sect. 4.2 that we have defined a morphism \(\mathcal {Q}:\mathsf {V}\rightarrow \mathsf {V}^{\star }\) using the binary quartic resolvent from Sect. 2.5, which extends by the same formula to a morphism \(\mathcal {Q}:\underline{\mathsf {V}}\rightarrow \underline{\mathsf {V}}^{\star }\) (This follows from Formula (2.2) and our choice of isomorphism \(\mathsf {V}\simeq \mathsf {W}\boxtimes (2)\) made at the end of Sect. 2.4). Define \(\underline{\mathsf {B}}^{\star }= {{\,\mathrm{Spec}\,}}\mathbb {Z}[b_2,b_6,I,J]\) and write \(\pi ^{\star }:\underline{\mathsf {V}}^{\star }\rightarrow \underline{\mathsf {B}}^{\star }\) for the invariant map.

Extend the morphism \(\chi \) from Sect. 3.4 to the morphism \(\chi :\underline{\mathsf {B}}\rightarrow \underline{\mathsf {B}}\) given by the same Formula (3.9). Following Sect. 3.6 we define \(\varDelta _{\hat{E}} :=4p_8^3+27p_{12}^2\) and \(\varDelta _{E} :=\varDelta _{E} \circ \chi \), both elements of \(\mathbb {Z}[\underline{\mathsf {B}}]\).

We extend the family of curves given by Eq. (3.2) to the family \(\mathcal {C}\rightarrow \underline{\mathsf {B}}\) given by that same equation. Similarly we define \(\overline{\mathcal {E}}\rightarrow \underline{\mathsf {B}}\) by the family of curves given by Eq. (3.5). They are defined by the projective closures of the equations

$$\begin{aligned}&\mathcal {C}:y^4+p_2xy^2+p_6y^2 = x^3+p_8x+p_{12}, \end{aligned}$$
(5.1)
$$\begin{aligned}&\overline{\mathcal {E}}:y^2+p_2xy+p_6y = x^3+p_8x+p_{12}. \end{aligned}$$
(5.2)

As before if \(\mathcal {X}\) is a \(\underline{\mathsf {B}}\)-scheme we write \(\hat{\mathcal {X}}\) for the pullback of \(\mathcal {X}\) along \(\chi :\underline{\mathsf {B}}\rightarrow \underline{\mathsf {B}}\).

We can find an integer N with the following properties (set \(S = \mathbb {Z}[1/N]\)):

  1. 1.

    The integer N is good in the sense of [34,  Proposition 4.1]. In particular, 2, 3 and 5 are invertible in S and \(\mathcal {C}_S \rightarrow \underline{\mathsf {B}}_S\) is flat and proper with geometrically integral fibres and smooth exactly above \(\underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\).

  2. 2.

    The morphism \(\mathcal {Q}:\mathsf {B}\rightarrow \mathsf {B}^{\star }\) of §4.2 extends to an isomorphism \(\mathcal {Q}:\underline{\mathsf {B}}_S\rightarrow \underline{\mathsf {B}}^{\star }_S\), and there exists \(\lambda \in S^{\times }\) such that \((\mathcal {Q}_I,\mathcal {Q}_J) = (\lambda ^2p_8,\lambda ^{3}p_{12})\). (Proposition 4.1.)

  3. 3.

    The discriminant locus \(\{ \varDelta =0 \}_S \rightarrow \underline{\mathsf {B}}_S\) has geometrically reduced fibres. Moreover \(\varDelta \) agrees with \(\varDelta _{E}\varDelta _{\hat{E}}\) up to a unit in \(\mathbb {Z}[1/N]\). (Proposition 3.11.)

  4. 4.

    There exists open subschemes \(\underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}} \subset \underline{\mathsf {V}}^{{{\,\mathrm{reg}\,}}} \subset \underline{\mathsf {V}}_S\) such that if \(S\rightarrow k\) is a map to a field and \(v\in \underline{\mathsf {V}}(k)\) then v is regular if and only if \(v \in \underline{\mathsf {V}}^{{{\,\mathrm{reg}\,}}}(k)\) and v is regular semisimple if and only if \(v\in \underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}(k)\). Moreover, \(\underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}\) is the open subscheme defined by the nonvanishing of the discriminant polynomial \(\varDelta \in S[\underline{\mathsf {V}}]\).

  5. 5.

    \(S[\underline{\mathsf {V}}]^{\underline{\mathsf {G}}} = S[p_2,p_6,p_8,p_{12}]\). The Kostant section of Sect. 2.2 extends to a section \(\sigma :\underline{\mathsf {B}}_S \rightarrow \underline{\mathsf {V}}^{{{\,\mathrm{reg}\,}}}\) of \(\pi \) satisfying the following property: for any \(b\in \underline{\mathsf {B}}(\mathbb {Z})\subset \underline{\mathsf {B}}_S(S)\) we have \(\sigma (N\cdot b) \in \underline{\mathsf {V}}(\mathbb {Z})\).

  6. 6.

    Define \(\mathcal {J}\rightarrow \underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\) to be the Jacobian of the family of smooth curves \(\mathcal {C}^{{{\,\mathrm{rs}\,}}}_S \rightarrow \underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\) [18,  §9.3, Theorem 1]. Let \(\mathcal {E}\rightarrow \underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\) denote the restriction of \(\overline{\mathcal {E}}_S\) to \(\underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\). Let \(\mathcal {P}\rightarrow \underline{\mathsf {B}}_S\) be the Prym variety of the cover \(\mathcal {C}^{{{\,\mathrm{rs}\,}}}_S\rightarrow \mathcal {E}\) as defined in Sect. 3.3. Then the isomorphism from Proposition 3.4 extends to an isomorphism \(\mathcal {P}[2] \simeq Z_{\underline{\mathsf {G}}_S}(\sigma |_{\underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}})) \) of finite étale group schemes over \(\underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\).

  7. 7.

    The action map \(\underline{\mathsf {G}}_S \times \underline{\mathsf {B}}_S \rightarrow \underline{\mathsf {V}}^{{{\,\mathrm{reg}\,}}},\, (g,b) \mapsto g\cdot \sigma (b) \) is étale and its image contains \(\underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}\). (Proposition 2.4.)

  8. 8.

    The \(\mathsf {B}\)-scheme \(\overline{P}\) constructed in Sect. 3.5 extends to a \(\underline{\mathsf {B}}_S\)-scheme \(\overline{\mathcal {P}}\rightarrow \underline{\mathsf {B}}_S\) which is flat, projective, with geometrically integral fibres and whose restriction to \(\underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\) is isomorphic to \(\mathcal {P}\). Moreover, \(\overline{\mathcal {P}}\rightarrow S\) is smooth with geometrically integral fibres, and the smooth locus of the morphism \(\overline{\mathcal {P}}\rightarrow \underline{\mathsf {B}}_S\) is an open subscheme of \(\overline{\mathcal {P}}\) whose complement is S-fibrewise of codimension at least two. (Proposition 3.9.)

  9. 9.

    Let \({{\,\mathrm{Pic}\,}}_{\mathcal {C}_S/\underline{\mathsf {B}}_S}^0\) denote the identity component of the relative Picard scheme of \(\mathcal {C}_S\rightarrow \underline{\mathsf {B}}_S\). Then the fibres of \({{\,\mathrm{Pic}\,}}_{\mathcal {C}_S/\underline{\mathsf {B}}_S}^0[1+\tau ^*]\rightarrow \underline{\mathsf {B}}_S\) are geometrically integral. (Proposition 3.10.)

  10. 10.

    For every field k of characteristic not dividing N and \(b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(k)\), there exists an isomorphism \(\hat{\mathcal {P}}_b\simeq \mathcal {P}^{\vee }_b\) of (1, 2)-polarized abelian varieties. (Theorem 3.1.)

The existence of such an N follows from the principle of spreading out. (See [34,  Proposition 4.1] for more details.) We fix such an integer for the remainder of the paper.

Using these properties, we can extend our previous results to S-algebras rather than \(\mathbb {Q}\)-algebras. We mention in particular:

Proposition 5.1

(Analogue of Lemma 4.2 and Proposition 4.2) Let R be an S-algebra and \(b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(R)\) with \(b^{\star } :=\mathcal {Q}(b)\). Then we have natural bijections of pointed sets:

  1. 1.

    \( \underline{\mathsf {G}}(R) \backslash \underline{\mathsf {V}}_b(R) \simeq \ker \left( \mathrm {H}^1(R,\mathcal {P}_b[2]) \rightarrow \mathrm {H}^1(R,\underline{\mathsf {G}})\right) .\)

  2. 2.

    \(\underline{\mathsf {G}}_{\mathrm {E}}(R)\backslash \underline{\mathsf {V}}_{\mathrm {E},b}(R) \simeq \ker \left( \mathrm {H}^1(R,\mathcal {J}_b[2]) \rightarrow \mathrm {H}^1(R,\underline{\mathsf {G}}_{\mathrm {E}})\right) .\)

  3. 3.

    \( \underline{\mathsf {G}}^{\star }(R) \backslash \underline{\mathsf {V}}^{\star }_{b^{\star }}(R) \simeq \ker \left( \mathrm {H}^1(R,\hat{\mathcal {E}}_b[2]) \rightarrow \mathrm {H}^1(R,\underline{\mathsf {G}}^{\star })\right) .\)

Proposition 5.2

(Analogue of Theorems 4.1 and 4.2) Let R be an S-algebra and \(b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(R)\) with \(b^{\star }:=\mathcal {Q}(b)\). Suppose that every locally free R-module of constant rank is free. Then there is a commutative diagram

figure ae

Here the horizontal arrows \(\eta _b\) and \(\eta ^{\star }_b\) are injections and send the identity to the orbit of \(\sigma (b)\) and \(\mathcal {Q}(\sigma (b))\) respectively.

The main aim of Sect. 5 is to prove the following two theorems concerning integral orbit representatives. Both have consequences for orbits over \(\mathbb {Z}\), see Corollaries 5.3 and 5.4.

Theorem 5.1

Let p be a prime not dividing N. Let \(b\in \underline{\mathsf {B}}(\mathbb {Z}_p)\) with \(\varDelta (b)\ne 0\). Then every orbit in the image of the map

$$\begin{aligned} \eta _b :P_b(\mathbb {Q}_p)/2P_b(\mathbb {Q}_p) \rightarrow \mathsf {G}(\mathbb {Q}_p) \backslash \mathsf {V}_b(\mathbb {Q}_p) \end{aligned}$$

of Theorem 4.1 has a representative in \(\underline{\mathsf {V}}(\mathbb {Z}_p)\).

Theorem 5.2

Let p be a prime not dividing N. Let \(b\in \underline{\mathsf {B}}(\mathbb {Z}_p)\) with \(\varDelta (b)\ne 0\) and write \(b^{\star } :=\mathcal {Q}(b)\). Then every orbit in the image of the map

$$\begin{aligned} \eta ^{\star }_b:P_b(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }_b(\mathbb {Q}_p)) \rightarrow \mathsf {G}^{\star }(\mathbb {Q}_p) \backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p) \end{aligned}$$

of Theorem 4.2 has a representative in \(\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\).

Theorem 5.2 will follow easily from Theorem 5.1, so we spend most of Sect. 5 proving Theorem 5.1. We follow the general strategy of [34,  §4], the main difference being that the role of the compactified Jacobian is played here by the compactified Prym variety introduced in Sect. 3.5.

5.2 Some groupoids

In this section we define some groupoids which will be a convenient way to think about orbits in our representations and a crucial ingredient for the proof of Theorem 5.1. It is closely modelled on the corresponding section [52,  §4.3]; the reader may also consult [34,  §4.2]. Throughout this section we fix a scheme X over \(S = \mathbb {Z}[1/N]\).

We define the groupoid \({{\,\mathrm{GrLie}\,}}_X\) whose objects are pairs \((H',\theta ')\) where

  • \(H'\) is a reductive group scheme over X whose geometric fibres are simple of Dynkin type \(F_4\). (See [24,  Definition 3.1.1] for the definition of a reductive group scheme over a general base.)

  • \(\theta ': H' \rightarrow H'\) is an involution of reductive X-group schemes such that for each geometric point \(\bar{x}\) of X there exists a maximal torus \(A_{\bar{x}}\) of \(H'_{\bar{x}}\) such that \(\theta '\) acts as \(-1\) on \(X^*(A_{\bar{x}})\).

A morphism \((H',\theta ') \rightarrow (H'',\theta '')\) in \({{\,\mathrm{GrLie}\,}}_X\) is given by an isomorphism \(\phi : H'\rightarrow H''\) such that \(\phi \circ \theta ' = \theta '' \circ \phi \). There is a natural notion of base change and the groupoids \({{\,\mathrm{GrLie}\,}}_X\) form a stack over the category of schemes over S in the étale topology. Recall that in Sect. 5.1 we have defined a pair \((\underline{\mathsf {H}}_S,\theta _S)\) which by [50,  Corollary 14] defines an object of \({{\,\mathrm{GrLie}\,}}_{S}\).

Proposition 5.3

Let X be an S-scheme. The assignment \((H',\theta ')\mapsto {{\,\mathrm{Isom}\,}}((\underline{\mathsf {H}}_X,\theta _X),(H',\theta '))\) defines a bijection between:

  • The isomorphism classes of objects in \({{\,\mathrm{GrLie}\,}}_X\).

  • The set \(\mathrm {H}^1(X,\underline{\mathsf {G}})\).

Proof

Since \({{\,\mathrm{GrLie}\,}}\) is a stack in the étale topology of S-schemes and \({{\,\mathrm{Aut}\,}}((\underline{\mathsf {H}}_X,\theta _X))=\mathsf {G}_X\), it suffices to prove that any two objects \((H,\theta ), (H',\theta ')\) of \({{\,\mathrm{GrLie}\,}}_X\) are étale locally isomorphic. The proof of this fact is very similar to the proof of [52,  Lemma 2.3] and we omit it. (See also [34,  Proposition 4.4].) \(\square \)

We define the groupoid \({{\,\mathrm{GrLieE}\,}}_X\) whose objects are triples \((H',\theta ',\gamma ')\) where \((H',\theta ')\) is an object of \({{\,\mathrm{GrLie}\,}}_X\) and \(\gamma '\in {{\,\mathrm{\mathfrak {h}}\,}}'\) (the Lie algebra of \(H'\)) satisfying \(\theta '(\gamma ') = -\gamma '\). A morphism \((H',\theta ',\gamma ')\rightarrow (H'',\theta '',\gamma '')\) in \({{\,\mathrm{GrLieE}\,}}_X\) is given by a morphism \(\phi : H' \rightarrow H''\) in \({{\,\mathrm{GrLie}\,}}_X\) mapping \(\gamma '\) to \(\gamma ''\).

We define a functor \({{\,\mathrm{GrLieE}\,}}_X \rightarrow \underline{\mathsf {B}}(X)\) (where \(\underline{\mathsf {B}}(X)\) is seen as a discrete category) as follows. For an object \((H',\theta ',\gamma ')\) in \({{\,\mathrm{GrLieE}\,}}_X\), choose a faithfully flat extension \(X'\rightarrow X\) such that there exists an isomorphism \(\phi : (H',\theta ')_{X'} \rightarrow (\underline{\mathsf {H}}_S,\theta )_{X'}\) in \({{\,\mathrm{GrLie}\,}}_X\). We define the image of the object \((H',\theta ',\gamma ')\) under the map \({{\,\mathrm{GrLieE}\,}}_X \rightarrow \underline{\mathsf {B}}(X)\) by \(\pi (\phi (\gamma '))\). This procedure is independent of the choice of \(\phi \) and \(X'\) and by descent defines an element of \(\underline{\mathsf {B}}(X)\). For \(b\in \underline{\mathsf {B}}(X)\) we write \({{\,\mathrm{GrLieE}\,}}_{X,b}\) for the full subcategory of elements of \({{\,\mathrm{GrLieE}\,}}_{X,b}\) mapping to b under this map. In Sect. 5.1 we have defined an object \((\underline{\mathsf {H}}_{\underline{\mathsf {B}}_S}, \theta _{\underline{\mathsf {B}}_S}, \sigma )\) of \({{\,\mathrm{GrLieE}\,}}_{\underline{\mathsf {B}}_S}\).

Recall that for \(b\in \underline{\mathsf {B}}(X)\), \(\underline{\mathsf {V}}_b\) denotes the fibre of b of the map \(\pi : \underline{\mathsf {V}}\rightarrow \underline{\mathsf {B}}\).

Proposition 5.4

Let X be an S-scheme and let \(b \in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(X)\). The assignment

$$\begin{aligned} (H',\theta ',\gamma ')\mapsto {{\,\mathrm{Isom}\,}}((\underline{\mathsf {H}}_X,\theta _X,\sigma (b)),(H',\theta ',\gamma ')) \end{aligned}$$

defines a bijection between

  • Isomorphism classes of objects in \({{\,\mathrm{GrLieE}\,}}_{X,b}\).

  • The set \(\mathrm {H}^1(X,Z_{\underline{\mathsf {G}}}(\sigma (b)))\).

Proof

The object \((\underline{\mathsf {H}}_X,\theta _X,\sigma (b))\) of \({{\,\mathrm{GrLieE}\,}}_{X,b}\) has automorphism group \(Z_{\underline{\mathsf {G}}}(\sigma (b))\). By descent, it suffices to prove that every object \((H',\theta ', \gamma ')\) in \({{\,\mathrm{GrLieE}\,}}_{X,b}\) is étale locally isomorphic to \((\underline{\mathsf {H}}_X,\theta _X,\sigma (b))\). By Proposition 5.3, we may assume that \((H',\theta ') = (\underline{\mathsf {H}}_X,\theta _X)\). By Property 7 of Sect. 5.1 (which is a spreading out of Proposition 2.4 over S), the action map \(\underline{\mathsf {G}}_S \times \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}_S \rightarrow \underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}\) is étale and surjective. Therefore it has sections étale locally, hence \(\gamma '\) is étale locally \(\underline{\mathsf {G}}\)-conjugate to \(\sigma (b)\). \(\square \)

The following important proposition gives an interpretation of the (not necessarily regular semisimple) \(\underline{\mathsf {G}}(X)\)-orbits of \(\underline{\mathsf {V}}(X)\) in terms of the groupoids \({{\,\mathrm{GrLie}\,}}_X\) and \({{\,\mathrm{GrLieE}\,}}_X\).

Proposition 5.5

Let X be an S-scheme and let \(b\in \underline{\mathsf {B}}(X)\). The following sets are in canonical bijection:

  • The set of \(\underline{\mathsf {G}}(X)\)-orbits on \(\underline{\mathsf {V}}_b(X)\).

  • Isomorphism classes of objects \((H',\theta ',\gamma ')\) in \({{\,\mathrm{GrLieE}\,}}_{X,b}\) such that \((H',\theta ') \simeq (\underline{\mathsf {H}}_S,\theta )_X\) in \({{\,\mathrm{GrLie}\,}}_X\).

Proof

If \(v\in \underline{\mathsf {V}}_b(X)\) we define the object \(\mathcal {A}_v = (\underline{\mathsf {H}}_X,\theta _X,v)\) of \({{\,\mathrm{GrLieE}\,}}_{X,b}\). The assignment \(v\mapsto \mathcal {A}_v\) establishes a well-defined bijection between the two sets of the proposition; we omit the formal verification. (See [34,  Proposition 4.6] for the proof of a similar statement.) \(\square \)

The following lemma is analogous to [52,  Lemma 5.6] and will be useful in Sect. 5.4 to extend orbits over a base of dimension 2.

Lemma 5.2

Let X be an integral regular scheme of dimension 2. Let \(U\subset X\) be an open subscheme whose complement has dimension 0. If \(b\in \underline{\mathsf {B}}_S(X)\), then restriction induces an equivalence of categories \({{\,\mathrm{GrLieE}\,}}_{X,b}\rightarrow {{\,\mathrm{GrLieE}\,}}_{U,b|_U}\).

Proof

We will use the following fact [23,  Lemme 2.1(iii)] repeatedly: if Y is an affine X-scheme of finite type, then restriction of sections \(Y(X)\rightarrow Y(U)\) is bijective. To prove essential surjectivity, let \((H',\theta ',\gamma ')\) be an object of \({{\,\mathrm{GrLieE}\,}}_{U,b|_U}\). By [23,  Théoreme 6.13] and Proposition 5.3, \((H',\theta ')\) extends to an object \((H'',\theta '')\) of \({{\,\mathrm{GrLie}\,}}_{X}\). If Y is the closed subscheme of \({{\,\mathrm{\mathfrak {h}}\,}}''\) of elements \(\gamma \) satisfying \(\theta ''(\gamma )=-\gamma \) and \(\gamma \) maps to b in \(\underline{\mathsf {B}}(X)\), then Y is affine and of finite type over X. It follows by the fact above that \(\gamma '\) lifts to an element \(\gamma ''\in {{\,\mathrm{\mathfrak {h}}\,}}''(X)\) and that \((H'',\theta '',\gamma '')\) defines an object of \({{\,\mathrm{GrLieE}\,}}_{X,b}\). Since the scheme of isomorphisms \({{\,\mathrm{Isom}\,}}_{{{\,\mathrm{GrLieE}\,}}}(\mathcal {A},\mathcal {A}')\) is X-affine, fully faithfulness follows again from the above fact. \(\square \)

5.3 Néron component groups of Prym varieties

In this subsection we perform some calculations with component groups of Néron models of Prym varieties. For the purposes of constructing integral representatives (Theorems 5.1 and 5.2), only Proposition 5.6 will be needed. However, the finer analysis succeeding it is necessary to obtain a lower bound in Theorem 1.1; of this analysis only Corollary 5.1 will be used later.

Notation. For the remainder of Sect. 5.3, let R be a discrete valuation ring with residue field k and fraction field K. We suppose that N is invertible in R.

Recall that we have defined in Sect. 5.1 abelian schemes \(\mathcal {J}, \mathcal {P}\) and \(\mathcal {E}\) over \(\underline{\mathsf {B}}_S^{{{\,\mathrm{rs}\,}}}\). We will use a minor abuse of notation and for any \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\), we write \(\mathcal {J}_b\) (which is a priori only defined when \(\varDelta (b)\in R^{\times }\)) for the K-scheme \(\mathcal {J}_{b_K}\), and similarly for \(\mathcal {P}_b\) and \(\mathcal {E}_b\). For such b we write \(\mathscr {J}_b,\mathscr {P}_b, \mathscr {P}^{\vee }_b, \mathscr {E}_b\) for the Néron models of \(\mathcal {J}_b, \mathcal {P}_b, \mathcal {P}_b^{\vee }, \mathcal {E}_b\) respectively. The involution \(\tau _b^*\) of \(\mathcal {J}_b\) uniquely extends to an involution of \(\mathscr {J}_b\), again denoted by \(\tau ^*_b\).

Lemma 5.3

Let \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\). Then the equality \(\mathcal {P}_b = \ker (1+\tau _b^* :\mathcal {J}_b \rightarrow \mathcal {J}_b)\) from (3.4) extends to an isomorphism \(\mathscr {P}_b \simeq \ker (1+\tau ^*_b:\mathscr {J}_b \rightarrow \mathscr {J}_b)\).

Proof

It suffices to prove that \(\ker (1+\tau ^*_b:\mathscr {J}_b \rightarrow \mathscr {J}_b)\) is smooth over R and satisfies the Néron mapping property for \(\mathcal {P}_b\). The smoothness follows from Lemma 3.3 applied to \(-\tau _b^*\). The Néron mapping property follows from that of \(\mathscr {J}_b\). \(\square \)

Lemma 5.4

Let \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\). Suppose that the curve \(\mathcal {C}_b/R\) is regular. Then \(\mathscr {J}_b\) and \(\mathscr {P}_b\) have connected fibres.

Proof

Since \(\mathcal {C}_b\rightarrow {{\,\mathrm{Spec}\,}}R\) has geometrically integral fibres, [18,  §9.5, Theorem 1] shows that \(\mathscr {J}_b\) is isomorphic to \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_b/R}\), the identity component of the Picard scheme of \(\mathcal {C}_b\rightarrow {{\,\mathrm{Spec}\,}}R\). Since \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_b/R}\) has connected fibres by definition, the same holds for \(\mathscr {J}_b\).

It remains to consider \(\mathscr {P}_b\). Lemma 5.3 and the previous paragraph shows that \(\mathscr {P}_b\) is isomorphic to \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_b/R}[1+\tau _b^*]\). It therefore suffices to prove that \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_{b}/R}[1+\tau _b^*]\) has connected fibres. By Proposition 3.10 (and its analogue over \(\underline{\mathsf {B}}_S\): Property 9 of §5.1), \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_S/\underline{\mathsf {B}}_S}[1+\tau ^*]\rightarrow \underline{\mathsf {B}}_S\) has connected fibres. Since \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_{b}/R}[1+\tau _b^*]\) is the pullback of \({{\,\mathrm{Pic}\,}}^0_{\mathcal {C}_S/\underline{\mathsf {B}}_S}[1+\tau ^*]\) along the R-point b, the lemma follows. \(\square \)

Proposition 5.6

Let R be a discrete valuation ring in which N is a unit. Let \(K = {{\,\mathrm{Frac}\,}}R\) and let \({{\,\mathrm{ord}\,}}_K: K^{\times } \twoheadrightarrow \mathbb {Z}\) be the normalized discrete valuation. Let \(b\in \underline{\mathsf {B}}(R)\) and suppose that \({{\,\mathrm{ord}\,}}_K \varDelta (b)\le 1 \). Let \(\mathscr {J}_b, \mathscr {P}_b, \mathscr {P}^{\vee }_b\) and \(\mathscr {E}_b\) be the Néron models of \(\mathcal {J}_b\), \(\mathcal {P}_b, \mathcal {P}_b^{\vee }\) and \(\mathcal {E}_b\) respectively. Then:

  1. 1.

    The special fibres of the R-groups \(\mathscr {J}_b, \mathscr {P}_b, \mathscr {P}_b^{\vee }\) and \(\mathscr {E}_b\) are connected.

  2. 2.

    If \({{\,\mathrm{ord}\,}}_K \varDelta (b)=1 \), the special fibre of the quasi-finite étale R-group scheme \(\mathscr {P}_b[2]\) has order \(2^3\).

Proof

If \({{\,\mathrm{ord}\,}}_K \varDelta (b)= 0 \), all abelian varieties in question have good reduction so the proposition holds. Thus for the remainder of the proof we assume that \({{\,\mathrm{ord}\,}}_K \varDelta (b)= 1 \). The choice of N implies that \(\varDelta \) equals \(\varDelta _E \cdot \varDelta _{\hat{E}}\) up to a unit in \(\mathbb {Z}[1/N]\) (Property 3 of Sect. 5.1). This allows us to consider two separate cases, the first case being \(({{\,\mathrm{ord}\,}}_K \varDelta _{E}(b),{{\,\mathrm{ord}\,}}_K \varDelta _{\hat{E}}(b)) = (1,0)\) and the second case being \(({{\,\mathrm{ord}\,}}_K \varDelta _{E}(b),{{\,\mathrm{ord}\,}}_K \varDelta _{\hat{E}}(b)) = (0,1)\). Let \(\mathcal {R}\) and \(\mathcal {B}\) be the closed subschemes of \(\mathcal {C}_b\) and \(\overline{\mathcal {E}}_b\) given by intersecting them with the line \(\{y=0\} \subset \mathbb {P}^2_R\) using Eqs. (5.1) and (5.2) respectively. Then the morphism \(\mathcal {C}_b \rightarrow \overline{\mathcal {E}}_b\) restricts to a finite étale morphism \(\mathcal {C}_b - \mathcal {R} \rightarrow \overline{\mathcal {E}}_b - \mathcal {B}\). We recall that \(\varDelta _{\hat{E}}(b) = 4p_8(b)^3+27p_{12}(b)^2\).

  1. Case 1.

    In this case the discriminant of \(\overline{\mathcal {E}}_b\) has valuation 1. By Tate’s algorithm (see [57,  Lemma IV.9.5(a)]), this implies that \(\overline{\mathcal {E}}_b\) is regular and its special fibre has a unique singularity, which is a node. Since \({{\,\mathrm{ord}\,}}_K \varDelta _{\hat{E}}(b) =0\), the singular point of \(\overline{\mathcal {E}}_{b,k}\) is not contained in \(\mathcal {B}\) hence it lifts to two distinct singular points of \(\mathcal {C}_{b,k}\) which are also nodes. Since \(\mathcal {C}_b \rightarrow \overline{\mathcal {E}}_b\) is étale outside \(\mathcal {B}\), \(\mathcal {C}_b\) is regular and its special fibre has two nodal singular points which are swapped by the involution \(\tau _b:\mathcal {C}_b \rightarrow \mathcal {C}_b\).

  2. Case 2.

    Now \(\overline{\mathcal {E}}_b\) is smooth over R so the singular points of \(\mathcal {C}_b\) are contained in \(\mathcal {R}\). Since \({{\,\mathrm{ord}\,}}_K \varDelta _{\hat{E}}(b) =1\), the discriminant of the polynomial \(x^3+p_8x+p_{12}\) has valuation 1. Since \(2,3\in R^{\times }\), the unique multiple root of the reduction of this polynomial lies in k; let \(\alpha \in R\) be a lift of this root. Using the substitution \(x\mapsto x-\alpha \), the curve \(\mathcal {C}_b\) is given by the equation

    $$\begin{aligned} y^4+a_2xy^2+a_6y^2=x^3+a_4x^2+a_8x+a_{12}, \end{aligned}$$
    (5.3)

    for some \(a_i\in R\) with \({{\,\mathrm{ord}\,}}_Ka_8\ge 1\) and \({{\,\mathrm{ord}\,}}_Ka_{12}\ge 1\). Since the discriminant of the cubic polynomial on the right hand side of (5.3) has valuation 1, the formula for such a discriminant shows that \(a_{12}\) is a uniformizer and \(a_4\in R^{\times }\). Since \(\overline{\mathcal {E}}_b\) is smooth over R, we have \(a_6\in R^{\times }\). Therefore \(\mathcal {C}_b\) is regular and its special fibre contains a unique nodal singularity. (For this last claim, see [36,  Exercise 7.5.7(b)].)

We conclude that in both cases \(\overline{\mathcal {E}}_b\) and \(\mathcal {C}_b\) are regular. Since \(\mathscr {E}_b\) can be identified with the smooth locus of \(\overline{\mathcal {E}}_b\) [57,  Theorem IV.9.1], the special fibre of \(\mathscr {E}_b\) is connected. By Lemma 5.4, the special fibres of \(\mathscr {J}_b\) and \(\mathscr {P}_b\) are connected. Because of the isomorphism \(\mathscr {P}_{b}^{\vee } \simeq \mathscr {P}_{\hat{b}}\) (Theorem 3.1 and its spreading out: Property 10 of Sect. 5.1) and the fact that \(\varDelta (b)\) and \(\varDelta (\hat{b})\) are equal up to a unit in R, the connectedness of the special fibre of \(\mathscr {P}_b^{\vee }\) follows from that of \(\mathscr {P}_b\).

For the second part of the lemma, it suffices to prove that the special fibre of \(\mathscr {P}_b\) is an extension of an elliptic curve by a rank 1 torus. This follows from the fact that the special fibres of \(\mathscr {J}_b\) and \(\mathscr {E}_b\) are semiabelian varieties of toric rank 2 and 1 respectively in the first case and of toric rank 1 and 0 in the second case. \(\square \)

We proceed with a finer analysis of Néron models of Prym varieties, only necessary to obtain a lower bound in Theorem 1.1. If A/K is an abelian variety with Néron model \(\mathscr {A}/R\) we write \(\mathscr {A}^{\circ }\) for the of \(\mathscr {A}\), obtained by removing the connected components of \(\mathscr {A}_k\) not containing the identity section. Recall from Lemma 5.3 that we may view \(\mathscr {P}_b\) as a closed subgroup scheme of \(\mathscr {J}_b\).

Definition 5.1

Let \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\). We say b is if \(\mathscr {J}_b^{\circ } \cap \mathscr {P}_b=\mathscr {P}_b^{\circ }\) or equivalently, \(\mathscr {J}_b^{\circ } \cap \mathscr {P}_b\) has connected fibres.

The reason for introducing admissibility is Lemma 5.7. It seems unlikely that every \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\) is admissible, but we have not found a counterexample. Our first goal is establishing a sufficient condition for admissibility, Proposition 5.7. This we achieve with the help of the following two lemmas.

Lemma 5.5

Let \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\). Let \(\tilde{\mathcal {C}}\) be a regular model of \(C_b\), i.e. a regular, proper, flat R-scheme whose generic fibre is isomorphic to \(C_b\). Suppose that the involution \(\tau _b\) of \(C_b\) extends to an involution \(\tau _b\) of \(\tilde{\mathcal {C}}\). Let \({{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}/R}\rightarrow {{\,\mathrm{Spec}\,}}R\) be the identity component of the Picard scheme of \(\tilde{\mathcal {C}}/R\). Then b is admissible if (and only if) the special fibre of \({{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}/R}[1+\tau ^*_b]\) is connected.

Proof

Since \(C_b\) has a K-rational point \(\infty \), the special fibre of \(\tilde{\mathcal {C}}\) has an irreducible component of degree 1. Therefore by a theorem of Raynaud [18,  §9.5, Theorem 4(b)], we have an isomorphism \(\mathscr {J}_b^{\circ }\simeq {{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}/R}\). This isomorphism intertwines the involutions \(\tau _b^*\) on both sides, because these involutions are the unique extensions of their restriction to the generic fibre. By Lemma 5.3, we see that \(\mathscr {J}_b^{\circ }\cap \mathscr {P}_b\simeq {{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}/R}[1+\tau _b^*]\). Since the generic fibre of \(\mathscr {J}_b^{\circ }\cap \mathscr {P}_b\) equals \(\mathcal {P}_b\) which is connected, the equivalence of the lemma follows from the definition of admissibility. \(\square \)

For the statement of the next lemma, recall [27,  Expose \(\text {VI}_{\text {A}}\), Theoreme 5.4.2] that the category of finite type commutative group schemes over a field is abelian.

Lemma 5.6

Let

$$\begin{aligned} 1 \rightarrow A \rightarrow B \rightarrow C \rightarrow 1 \end{aligned}$$

be a short exact sequence of finite type commutative group schemes over k. Let \(\tau \) be an involution of A, B and C whose action is compatible with the above sequence. Suppose that either (1) the quotient \(A^{\tau }/(1+\tau )A\) is trivial, or (2) \(A^{\tau }/(1+\tau )A\) is finite étale and \(C[1+\tau ]\) is connected. Then the following sequence is short exact:

$$\begin{aligned} 1 \rightarrow A[1+\tau ] \rightarrow B[1+\tau ] \rightarrow C[1+\tau ] \rightarrow 1. \end{aligned}$$

Proof

We consider AB and C as sheaves on the big fppf site of \({{\,\mathrm{Spec}\,}}k\). The long exact sequence in \(\mathbb {Z}/2\mathbb {Z}\)-group cohomology of sheaves applied to the \(\mathbb {Z}/2\mathbb {Z}\)-action \(-\tau \) shows that the following sequence is exact:

$$\begin{aligned} 1 \rightarrow A[1+\tau ] \rightarrow B[1+\tau ] \rightarrow C[1+\tau ] \xrightarrow {\delta } A^{\tau }/(1+\tau )A. \end{aligned}$$

(Alternatively, the exactness of this sequence is a statement that can be formulated in any abelian category. Since it is true in the category of R-modules for any ring R, it remains true in our setting.) If \(A^{\tau }/(1+\tau )A\) is trivial, the lemma is proven. If \(A^{\tau }/(1+\tau )A\) is finite étale and \(C[1+\tau ]\) is connected, then \(\delta =0\) since there are no nonconstant maps from a geometrically connected scheme to a finite étale k-scheme. \(\square \)

Recall that (up to a unit in R) the discriminant \(\varDelta \) factors as \(\varDelta _E\cdot \varDelta _{\hat{E}}\). The proof of Proposition 5.6 shows that every \(b\in \underline{\mathsf {B}}(R)\) with \({{\,\mathrm{ord}\,}}_K\varDelta (b)\le 1\) is admissible. We will need the stronger:

Proposition 5.7

Let \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\) and \({{\,\mathrm{ord}\,}}_K\varDelta _E(b)\le 1\). Then b is admissible.

Proof

Since Néron models commute with the formation of strict henselization and completion [18,  §7.2, Theorem 1(b)], we may assume that R is complete and its residue field k is separably closed. We distinguish cases according to the value of \({{\,\mathrm{ord}\,}}_K \varDelta _E(b)\).

\(\underline{\hbox {Case }{{\,\mathrm{ord}\,}}_K \varDelta _E(b)=0.}\)

If \(\mathcal {C}_b\) is regular, b is admissible by Lemma 5.4. We may therefore assume that \(\mathcal {C}_b\) has a non-regular point \(P\in \mathcal {C}_b\). Since \(\overline{\mathcal {E}}_b\) is R-smooth, P is the unique non-regular point and lies in the ramification locus of the morphism \(\mathcal {C}_b\rightarrow \overline{\mathcal {E}}_b\). By the same reasoning as Case 2 in the proof of Proposition 5.6, we may assume after changing variables \(x\mapsto x-\alpha \) that \(\mathcal {C}_b\) is given by the equation

$$\begin{aligned} y^4+a_2xy^2+a_6y^2=x^3+a_4x^2+a_8x+a_{12} \end{aligned}$$
(5.4)

for some \(a_i\in R\) with \({{\,\mathrm{ord}\,}}_K a_{8} \ge 1\) and \({{\,\mathrm{ord}\,}}_K a_{12} \ge 1\), and P corresponds to the origin in the special fibre. Again by the smoothness of \(\overline{\mathcal {E}}_b/R\), we see that \(a_6\in R^{\times }\). Therefore the completed local ring of \(P_k\) in \(\mathcal {C}_{b,k}\) is isomorphic to \(k[[x,y]]/(y^2-(x^3+a_4x^2))\). So \(\mathcal {C}_{b,k}\) has one singular point which is a node or a cusp.

Consider the sequence of proper birational morphisms \(\dots \rightarrow \mathcal {C}_2 \rightarrow \mathcal {C}_1 \rightarrow \mathcal {C}_0:=\mathcal {C}_b\), where for \(i\ge 0\) we inductively define \(\mathcal {C}_{i+1}\rightarrow \mathcal {C}_i\) as the composition of the blowup \(\mathcal {C}_i' \rightarrow \mathcal {C}_i\) of the non-regular locus and the normalization \(\mathcal {C}_{i+1}\rightarrow \mathcal {C}_i'\). By a result of Lipman [36,  Theorem 8.3.44], there exists an \(n\ge 1\) such that the scheme \(\mathcal {C}_n\) is regular; we denote this scheme by \(\tilde{\mathcal {C}}\). The morphism \(\tilde{\mathcal {C}}\rightarrow \mathcal {C}_b\) does not depend on n and we call it the of \(\mathcal {C}_b\). Since this process is canonical, the involution \(\tau _b\) of \(\mathcal {C}_b\) lifts to an involution of \(\tilde{\mathcal {C}}\).

Let X be the closure of \(\mathcal {C}_{b,k}\setminus \{P\}\) in \(\tilde{\mathcal {C}}_k\) and let \(Y\rightarrow X\) be its normalization. The composite \(Y\rightarrow \mathcal {C}_{b,k}\) is also the normalization of \(\mathcal {C}_{b,k}\). Since \(\mathcal {C}_{b,k}\) has arithmetic genus 3 and resolving a cusp or node decreases the genus by 1, the smooth curve Y has genus 2. The involution \(\tau _b\) of \(\mathcal {C}_{b,k}\) uniquely lifts to an involution \(\tau _b\) of Y. Moreover the composite morphism \(Y\rightarrow X\rightarrow \mathcal {C}_{b,k}\rightarrow \overline{\mathcal {E}}_{b,k}\) is a double cover of a smooth genus-1 curve by a smooth genus-2 curve hence has two branch points. Therefore the Prym variety \({{\,\mathrm{Pic}\,}}^0_{Y/k}[1+\tau _b^*]\) of this cover is connected and one-dimensional: the connectedness follows from [42,  §2, Property (vi)] (in particular the description of ‘\(\ker \psi \)’ there) combined with [42,  §3, Lemma 1].

We have an exact sequence [18,  §9.2, Corollary 11]

$$\begin{aligned} 1\rightarrow G \rightarrow {{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}_k/k} \rightarrow {{\,\mathrm{Pic}\,}}^0_{Y/k} \rightarrow 1, \end{aligned}$$

where G is a smooth commutative algebraic group of dimension 1 which is an extension of an abelian variety by a connected linear algebraic group, hence connected. Since \({{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}_{k}/k}[1+\tau _b^*]\) is two-dimensional and \({{\,\mathrm{Pic}\,}}^0_{Y/k}[1+\tau _b^*]\) is one-dimensional, \(G[1+\tau _b^*]\) must be one-dimensional hence equal to G itself. Therefore \(\tau ^*_b|_{G}=-{{\,\mathrm{Id}\,}}_G\) and so \(G^{\tau ^*_b}/(1+\tau ^*_b)G=G[2]\), which is finite étale (note that 2 is invertible in k). Since \({{\,\mathrm{Pic}\,}}^0_{Y/k}[1+\tau _b^*]\) is connected, Lemma 5.6(2) shows that the following sequence is exact:

$$\begin{aligned} 1\rightarrow G \rightarrow {{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}_{k}/k}[1+\tau _b^*] \rightarrow {{\,\mathrm{Pic}\,}}^0_{Y/k}[1+\tau _b^*] \rightarrow 1. \end{aligned}$$

Since the outer terms of the sequence are connected, the same is true for the middle term. Therefore b is admissible by Lemma 5.5.

\(\underline{\text {Case }{{\,\mathrm{ord}\,}}_K \varDelta _E(b)=1}\)

By assumption, the curve \(\overline{\mathcal {E}}_b/R\) is regular and its special fibre has a unique singularity, which is a node [57,  Lemma IV.9.5(a)]; let \(Q\in \overline{\mathcal {E}}_b\) be this point. Recall that the morphism \(f:\mathcal {C}_b\rightarrow \overline{\mathcal {E}}_b\) is branched over the closed subscheme \((y=0)\) and étale over the complement. We distinguish two cases.

  • Suppose that Q is contained in the branch locus of f. Then Q uniquely lifts to \(P\in \mathcal {C}_b\) and \(\mathcal {C}_b/R\) is smooth outside P. We may then assume after changing variables \(x\mapsto x-\alpha \) that \(\mathcal {C}_b\) is given by the equation

    $$\begin{aligned} y^4+a_2xy^2+a_6y^2=x^3+a_4x^2+a_8x+a_{12}, \end{aligned}$$

    where \({{\,\mathrm{ord}\,}}_K a_8\ge 1\) and \({{\,\mathrm{ord}\,}}_K a_{12} \ge 1\). Since \(\overline{\mathcal {E}}_b/R\) is not smooth at Q, we have \({{\,\mathrm{ord}\,}}_K a_6\ge 1\). On the other hand, \(\overline{\mathcal {E}}_b\) is regular at Q so \(a_{12}\) is a uniformizer for R. Therefore \(\mathcal {C}_b\) is also regular at P. So \(\mathcal {C}_b\) is regular everywhere, hence b is admissible by Lemma 5.4.

  • Suppose that Q is not contained in the branch locus of f. Since \(\mathcal {C}_b \rightarrow \overline{\mathcal {E}}_b\) is étale above Q, this point lifts to two regular nodal points \(P_1,P_2\) of \(\mathcal {C}_b\). Therefore the curve \(\mathcal {C}_b\) is regular outside \((y=0)\subset \mathcal {C}_{b,k}\). Let \(\tilde{\mathcal {C}} \rightarrow \mathcal {C}_b\) be the canonical desingularization, described in the second paragraph of the first case of the proof. The involution \(\tau _b\) of \(\mathcal {C}_b\) lifts to an involution of \(\tilde{\mathcal {C}}\). Let \(X\rightarrow \tilde{\mathcal {C}}_k\) be the partial normalization of the special fibre, given by normalizing the nodes corresponding to \(P_1\) and \(P_2\). Then \(\tau _b\) also lifts to an involution of X. We have a \(\tau _b^*\)-equivariant exact sequence

    $$\begin{aligned} 1 \rightarrow \mathbb {G}_m\times \mathbb {G}_m \rightarrow {{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}_k/k}\rightarrow {{\,\mathrm{Pic}\,}}^0_{X/k}\rightarrow 1, \end{aligned}$$

    where \(\tau ^*\) acts on \(\mathbb {G}_m\times \mathbb {G}_m\) by interchanging the two factors. Since \({{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}_k/k}[1+\tau ^*]\) is two-dimensional and \((\mathbb {G}_m\times \mathbb {G}_m)[1+\tau _b^*]\) is one-dimensional, \({{\,\mathrm{Pic}\,}}^0_{X/k}[1+\tau ^*]={{\,\mathrm{Pic}\,}}^0_{X/k}\) by dimension reasons. By Lemma 5.6(1) we obtain an exact sequence

    $$\begin{aligned} 1\rightarrow \mathbb {G}_m \rightarrow {{\,\mathrm{Pic}\,}}^0_{\tilde{\mathcal {C}}_k/k}[1+\tau ^*]\rightarrow {{\,\mathrm{Pic}\,}}^0_{X/k}\rightarrow 1. \end{aligned}$$

    Since the outer terms are connected, the same is true for the middle term hence b is admissible by Lemma 5.5. \(\square \)

The reason for introducing admissibility is the following lemma, which is a key ingredient for Proposition 5.8.

Lemma 5.7

Let \(b\in \underline{\mathsf {B}}(R)\) be admissible. Then the following commutative diagram has exact rows:

figure ai

Proof

The exactness of the bottom row follows from the exact sequence \(0\rightarrow \mathcal {P}_b\rightarrow \mathcal {J}_b\rightarrow \mathcal {E}_b\rightarrow 0\) and [18,  §7.5, Proposition 3(a)], noting that there exists an injection \(\mathcal {E}_b\hookrightarrow \mathcal {J}_b\) such that the composite \(\mathcal {E}_b\rightarrow \mathcal {J}_b \rightarrow \mathcal {E}_b\) is multiplication by 2 (Property 1 of Sect. 3.3). To verify exactness of the top row, note that again by [18,  §7.5, Proposition 3(a)] the image of \(\mathscr {J}_b\rightarrow \mathscr {E}_b\) contains \(\mathscr {E}_b^{\circ }\). Hence the top row is exact at \(\mathscr {E}_b^{\circ }\). Since b is admissible, it is also exact at \(\mathscr {J}_b^{\circ }\). Finally because \(\mathscr {P}_b\rightarrow \mathscr {J}_b\) is a closed immersion, the same holds for \(\mathscr {P}^{\circ }_b\rightarrow \mathscr {J}^{\circ }_b\) so the top row is exact at \(\mathscr {P}_b^{\circ }\) too. \(\square \)

Remark 5.1

If b is not admissible, the top row of the above commutative diagram fails to be exact at \(\mathscr {J}_b^{\circ }\).

If A/K is an abelian variety with Néron model \(\mathscr {A}/R\), define the of \(\mathscr {A}\) as \(\varPhi _A :=\mathscr {A}_k/\mathscr {A}_k^{\circ }\), a finite étale group scheme over k.

Proposition 5.8

Let \(b\in \underline{\mathsf {B}}(R)\) with \(\varDelta (b)\ne 0\). Suppose that b is admissible and that \(\mathscr {E}_b=\mathscr {E}_b^{\circ }\). Then the morphism \(\rho :\mathscr {P}_b \rightarrow \mathscr {P}_b^{\vee }\) induces an isomorphism of component groups \(\varPhi _{\mathcal {P}_b}\xrightarrow {\sim } \varPhi _{\mathcal {P}_b^{\vee }}\).

Proof

Since \(\hat{\rho }\circ \rho = [2]\) and \(\rho \circ \hat{\rho }= [2]\), it will suffice to prove that the restriction to 2-primary parts \(\varPhi _{\mathcal {P}_b}[2^{\infty }]\rightarrow \varPhi _{\mathcal {P}_b^{\vee }}[2^{\infty }]\) is an isomorphism of finite étale group schemes. By definition, \(\rho \) is given by the composite of \(\mathscr {P}_b\rightarrow \mathscr {J}_b\) with \(\mathscr {J}_b\rightarrow \mathscr {P}_b^{\vee }\). So it will suffice to prove that the morphisms \(\varPhi _{\mathcal {P}_b}[2^{\infty }]\rightarrow \varPhi _{\mathcal {J}_b}[2^{\infty }]\) and \(\varPhi _{\mathcal {J}_b}[2^{\infty }]\rightarrow \varPhi _{\mathcal {P}^{\vee }_b}[2^{\infty }]\) are isomorphisms.

By Lemma 5.7 and the snake lemma, we obtain an exact sequence \(0\rightarrow \varPhi _{\mathcal {P}_b}\rightarrow \varPhi _{\mathcal {J}_b} \rightarrow \varPhi _{\mathcal {E}_b}\). Since \(\varPhi _{\mathcal {E}_b}\) is trivial by assumption, \(\varPhi _{\mathcal {P}_b}\rightarrow \varPhi _{\mathcal {J}_b}\) is an isomorphism of finite étale group schemes. Since Grothendieck’s pairing on component groups is perfect on l-primary parts when l is invertible in k [5,  Theorem 7], the finite étale group schemes \(\varPhi _{\mathcal {P}_b}[2^{\infty }], \varPhi _{\mathcal {J}_b}[2^{\infty }]\) and \(\varPhi _{\mathcal {P}^{\vee }_b}[2^{\infty }] \) have the same order.

By the same reasoning as the proof of Lemma 5.7 using the fact that \(\mathscr {J}_b^{\circ }\cap \mathscr {E}_b=\mathscr {E}_b^{\circ } = \mathscr {E}_b\), the exact sequence \(0\rightarrow \mathcal {E}_b\rightarrow \mathcal {J}_b\rightarrow \mathcal {P}_b^{\vee }\rightarrow 0\) induces a commutative diagram with exact rows:

figure ak

The snake lemma gives an exact sequence \(0 \rightarrow \varPhi _{\mathcal {E}_b} \rightarrow \varPhi _{\mathcal {J}_b} \rightarrow \varPhi _{\mathcal {P}^{\vee }_b}\). Since \(\varPhi _{\mathcal {E}_b}\) is trivial, the morphism of finite étale k-group schemes \(\varPhi _{\mathcal {J}_b} \rightarrow \varPhi _{\mathcal {P}^{\vee }_b}\) is injective. Since the 2-primary parts have the same order, the induced morphism \(\varPhi _{\mathcal {J}_b}[2^{\infty }] \rightarrow \varPhi _{\mathcal {P}^{\vee }_b}[2^{\infty }]\) is an isomorphism, proving the proposition. \(\square \)

The following important corollary will be the one that we will use later on.

Corollary 5.1

Let p be a prime not dividing N. Let \(b\in \underline{\mathsf {B}}(\mathbb {Z}_p)\) with \(\varDelta (b)\ne 0\) and \(p^2\not \mid \varDelta _{\hat{E}}(b)\). Then the image of the \(\hat{\rho }\)-descent map \(P_b(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }_b(\mathbb {Q}_p)) \rightarrow \mathrm {H}^1(\mathbb {Q}_p, P^{\vee }_b[\hat{\rho }])\) coincides with the subset of unramified classes \(\mathrm {H}^1_{\text {nr}}(\mathbb {Q}_p, P^{\vee }_b[\hat{\rho }])\).

Proof

Since \(\varDelta _{\hat{E}}(b)\) and \(\varDelta _{E}(\hat{b})\) coincide up to a unit in \(\mathbb {Z}_p\), we see that \(p^2\not \mid \varDelta _{E}(\hat{b})\). By Proposition 5.7, \(\hat{b}\) is admissible and by Tate’s algorithm [57,  Lemma IV.9.5(a)], \(\mathscr {E}_{\hat{b}}=\mathscr {E}_{\hat{b}}^{\circ }\). Therefore by Proposition 5.8 and Theorem 3.1, \(\hat{\rho }\) induces an isomorphism \(\varPhi _{P_b^{\vee }} \xrightarrow {\sim } \varPhi _{P_b}\). Under these circumstances, the claim of the corollary is well-known and follows essentially from Lang’s theorem; see [22,  Proposition 2.7(d)]. \(\square \)

5.4 The case of square-free discriminant

In this section we analyze the orbits of \(\underline{\mathsf {V}}\) and \(\underline{\mathsf {V}}^{\star }\) over points in \(\underline{\mathsf {B}}(\mathbb {Z}_p)\) and \(\underline{\mathsf {B}}^{\star }(\mathbb {Z}_p)\) of square-free discriminant. This will be the first step in proving Theorem 5.1 and will be used in the proofs of Theorems 8.1 and 8.3 when applying the square-free sieve.

Lemma 5.8

Let R be a discrete valuation ring with residue field k in which N is a unit. Let \(K = {{\,\mathrm{Frac}\,}}R\) and let \({{\,\mathrm{ord}\,}}_K: K^{\times } \twoheadrightarrow \mathbb {Z}\) be the normalized discrete valuation. Let \(x\in \underline{\mathsf {V}}(R)\) with \(b=\pi (x)\in \underline{\mathsf {B}}(R)\) and suppose that \({{\,\mathrm{ord}\,}}_K \varDelta (b)=1\). Then the reduction \(x_k\) of x in \(\underline{\mathsf {V}}(k)\) is regular and \(\underline{\mathsf {G}}(\bar{k})\)-conjugate to \(\sigma (b)_k\). In addition the R-group scheme \(Z_{\underline{\mathsf {G}}}(x)\) is quasi-finite étale and has special fibre of order \(2^3\).

Proof

We are free to replace R by a discrete valuation ring \(R'\) containing R such that any uniformizer in R is also a uniformizer in \(R'\). Therefore we may assume that R is complete and k algebraically closed.

Let \(x_k=y_s+y_n\) be the Jordan decomposition of \(x_k\in \underline{\mathsf {V}}(k)\) as a sum of its semisimple and nilpotent parts. Let \(\underline{\mathfrak {h}}_{0,k}=\mathfrak {z}_{\underline{\mathfrak {h}}}(y_s)\) and \(\underline{\mathfrak {h}}_{1,k}={{\,\mathrm{image}\,}}({{\,\mathrm{Ad}\,}}(y_s))\). Then \(\underline{\mathfrak {h}}_{k}=\underline{\mathfrak {h}}_{0,k}\oplus \underline{\mathfrak {h}}_{1,k}\), where \({{\,\mathrm{Ad}\,}}(x_k)\) acts nilpotently on \(\underline{\mathfrak {h}}_{0,k}\) and invertibly on \(\underline{\mathfrak {h}}_{1,k}\). By Hensel’s lemma, this decomposition lifts to an \({{\,\mathrm{Ad}\,}}(x)\)-invariant decomposition of free R-modules \(\underline{\mathfrak {h}}_R = \underline{\mathfrak {h}}_{0,R}\oplus \underline{\mathfrak {h}}_{1,R}\), where \({{\,\mathrm{Ad}\,}}(x)\) acts topologically nilpotently on \(\underline{\mathfrak {h}}_{0,R}\) and invertibly on \(\underline{\mathfrak {h}}_{1,R}\). There exists a unique closed subgroup \(\mathsf {L}\subset \underline{\mathsf {H}}_R\) with Lie algebra \(\underline{\mathfrak {h}}_{0,R}\) such that \(\mathsf {L}\) is R-smooth with connected fibres; this follows from an argument identical to the proof of [34,  Lemma 4.19]. Moreover the construction of \(\mathsf {L}\) shows that \(\mathsf {L}_k=Z_{\underline{\mathsf {H}}}(y_s)\).

The proof of Proposition 5.6 shows that the curve \(\mathcal {C}_{b,k}\) either has one node, or two nodes swapped by \(\tau \). Therefore the affine surface \(\mathcal {S}/k\) cut out by the equation \(z^2+y^4+p_2(b)xy^2+p_6(b)y^2 = x^3+p_8(b)x+p_{12}(b)\) in \(\mathbb {A}^3_k\) either has one ordinary double point, or two such double points swapped by \((x,y,z)\mapsto (x,-y,-z)\). The surface \(\mathcal {S}\) is the fibre above \(b_k\in \underline{\mathsf {B}}(k)\) of a semi-universal deformation of a simple singularity of type \(F_4\), in the sense of [58,  §6.2]. The results of [58,  §6.6] (in particular Propositions 2, 3 and the subsequent remark) imply that the derived group of \(\mathsf {L}\) has type \(A_1\) and the center \(Z(\mathsf {L})\) of \(\mathsf {L}\) has rank 3. Moreover the restriction \(\theta _{\mathsf {L}}\) of \(\theta \) to \(\mathsf {L}\) is a stable involution, in the sense that for each geometric point of \({{\,\mathrm{Spec}\,}}R\) there exists a maximal torus of \(\mathsf {L}\) on which \(\theta \) acts as \(-1\), by [60,  Lemma 2.4]. There is an isomorphism \(\mathsf {L}/Z(\mathsf {L})\simeq {{\,\mathrm{PGL}\,}}_2\) inducing an isomorphism \(\underline{\mathfrak {h}}_{R,0}^{der}\simeq \underline{\mathfrak {h}}_{R,0}/\mathfrak {z}(\underline{\mathfrak {h}}_{R,0}) \simeq {{\,\mathrm{\mathfrak {sl}}\,}}_{2,R}\) under which \(\theta _{\mathsf {L}}\) corresponds to the involution \(\xi = {{\,\mathrm{Ad}\,}}\left( \text {diag}(1,-1) \right) \). The lemma now follow easily from explicit calculations in \({{\,\mathrm{\mathfrak {sl}}\,}}_{2,R}\) identical to [34,  Lemma 4.19], which we omit. \(\square \)

The following proposition and its corollary describe orbits in \(\underline{\mathsf {V}}\) of square-free discriminant. Their proofs are identical to the proofs of [34,  Proposition 4.20 and Corollary 4.21], using Proposition 5.6 and Lemma 5.8; they will be omitted.

Proposition 5.9

Let R be a discrete valuation ring in which N is a unit. Let \(K = {{\,\mathrm{Frac}\,}}R\) and let \({{\,\mathrm{ord}\,}}_K: K^{\times } \twoheadrightarrow \mathbb {Z}\) be the normalized discrete valuation. Let \(b\in \underline{\mathsf {B}}(R)\) and suppose that \({{\,\mathrm{ord}\,}}_K \varDelta (b)\le 1\). Then:

  1. 1.

    If \(x\in \underline{\mathsf {V}}_b(R)\), then \(Z_{\underline{\mathsf {G}}}(x)(K) = Z_{\underline{\mathsf {G}}}(x)(R)\).

  2. 2.

    The natural map \(\alpha :\underline{\mathsf {G}}(R)\backslash \underline{\mathsf {V}}_b(R) \rightarrow \underline{\mathsf {G}}(K)\backslash \underline{\mathsf {V}}_b(K)\) is injective and its image contains \(\eta _b\left( P_b(K)/2P_b(K)\right) \).

  3. 3.

    If further R is complete and has finite residue field then the image of \(\alpha \) equals \(\eta _b\left( P_b(K)/2P_b(K)\right) \).

Corollary 5.2

Let X be a Dedekind scheme in which N is a unit with function field K. For every closed point p of X write \({{\,\mathrm{ord}\,}}_{p} :K^{\times } \twoheadrightarrow \mathbb {Z}\) for the normalized discrete valuation of p. Let \(b\in \underline{\mathsf {B}}(X)\) be a morphism such that \({{\,\mathrm{ord}\,}}_{p}(\varDelta (b))\le 1\) for all p. Let \(P\in \mathcal {P}_b(K)/2\mathcal {P}_b(K)\) and let \(\eta _b(P)\in G(K) \backslash V_b(K)\) be the corresponding orbit from Proposition 5.2. Then the object of \({{\,\mathrm{GrLieE}\,}}_{K,b}\), corresponding to \(\eta _b(P)\) using Proposition 5.5, uniquely extends to an object of \({{\,\mathrm{GrLieE}\,}}_{X,b}\).

We now consider orbits of square-free discriminant in the representation \(\underline{\mathsf {V}}^{\star }\). We will only need to consider the case of \(\mathbb {Z}_p\); the representation-theoretic input of the following proposition has already been established by Bhargava and Shankar [12].

Proposition 5.10

Let p be a prime number not dividing N and let \(b\in \underline{\mathsf {B}}(\mathbb {Z}_p)\) such that \(\varDelta (b)\ne 0\) and \(p^2\not \mid \varDelta _{\hat{E}}(b)\). Let \(b^{\star } = \mathcal {Q}(b) \in \underline{\mathsf {B}}^{\star }(\mathbb {Z}_p)\). Then:

  1. 1.

    If \(x\in \underline{\mathsf {V}}^{\star }_{b^{\star }}(\mathbb {Z}_p)\), then \(Z_{\underline{\mathsf {G}}^{\star }}(x)(\mathbb {Q}_p) = Z_{\underline{\mathsf {G}}^{\star }}(x)(\mathbb {Z}_p)\).

  2. 2.

    The natural map \(\alpha :\underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\backslash \underline{\mathsf {V}}^{\star }_{b^{\star }}(\mathbb {Z}_p) \rightarrow \mathsf {G}^{\star }(\mathbb {Q}_p)\backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p)\) is injective and its image equals \(\eta ^{\star }_{b}\left( P_{b}(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }_{b}(\mathbb {Q}_p))\right) \).

Proof

Let \(E'/\mathbb {Q}_p\) be the elliptic curve with Weierstrass equation \(y^2=x^3+I(\mathcal {Q}(b))/9x-J(\mathcal {Q}(b))/27\), where \(\mathcal {Q}:\underline{\mathsf {B}}_S \rightarrow \underline{\mathsf {B}}^{\star }_S\) is the resolvent binary quartic map from Sect. 4.2 and IJ are the invariants of a binary quartic form of (4.1) and (4.2). By Eq. (3.8), Proposition 4.1 and the choice of N, \(\hat{E}_b\) and \(E'\) are quadratic twists, where the twisting is given by an element of \(\mathbb {Z}[1/N]^{\times }\). We have chosen \(E'\) so that by [12,  Theorem 3.2], there exists an injection \(\eta ':E'(\mathbb {Q}_p)/2E'(\mathbb {Q}_p) \hookrightarrow \mathsf {G}^{\star }(\mathbb {Q}_p)\backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p)\) with the property that the composite \(E'(\mathbb {Q}_p)/2E'(\mathbb {Q}_p) \xrightarrow {\eta '} \mathsf {G}^{\star }(\mathbb {Q}_p)\backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p) \hookrightarrow \mathrm {H}^1(\mathbb {Q}_p, \hat{E}_b[2]) =\mathrm {H}^1(\mathbb {Q}_p, E'[2]) \) coincides with the 2-descent map. (The second map comes from Proposition 4.2.)

Since p is a unit in \(\mathbb {Z}[1/N]\) and the discriminant of \(\hat{E}_b\) is not divisible by \(p^2\), the same is true for the discriminant of \(E'\). Therefore the Tamagawa number of \(E'\) is 1 [57,  Lemma IV.9.5(a)], hence the image of \(E'(\mathbb {Q}_p)/2E'(\mathbb {Q}_p) \xrightarrow {\eta '} \mathsf {G}^{\star }(\mathbb {Q}_p)\backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p) \hookrightarrow \mathrm {H}^1(\mathbb {Q}_p, \hat{E}_b[2])\) coincides with the subgroup of unramified classes \(\mathrm {H}_{\text {nr}}^1(\mathbb {Q}_p, \hat{E}_b[2])\subset \mathrm {H}^1(\mathbb {Q}_p, \hat{E}_b[2])\) [20,  Lemma 7.1].

Again by the fact that the discriminant of \(E'\) is square-free, [12,  Proposition 3.18] implies that \(Z_{\underline{\mathsf {G}}^{\star }}(x)(\mathbb {Q}_p) = Z_{\underline{\mathsf {G}}^{\star }}(x)(\mathbb {Z}_p)\), that \(\alpha \) is injective and that the image of the composite \(\underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\backslash \underline{\mathsf {V}}^{\star }_{b^{\star }}(\mathbb {Z}_p) \rightarrow \mathsf {G}^{\star }(\mathbb {Q}_p)\backslash \mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p) \hookrightarrow \mathrm {H}^1(\mathbb {Q}_p,\hat{E}_b[2])\) is \(\mathrm {H}_{\text {nr}}^1(\mathbb {Q}_p, \hat{E}_b[2])\). By the construction of \(\eta _b^{\star }\) in the proof of Theorem 4.2, it remains to prove that this subset of \(\mathrm {H}^1(\mathbb {Q}_p,\hat{E}_b[2])\) coincides with the image of the \(\hat{\rho }\)-descent map \(P_b(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }(\mathbb {Q}_p))\rightarrow \mathrm {H}^1(\mathbb {Q}_p,P_b^{\vee }[\hat{\rho }])\) transported along the isomorphism \(\mathrm {H}^1(\mathbb {Q}_p,P_b^{\vee }[\hat{\rho }])\simeq \mathrm {H}^1(\mathbb {Q}_p,\hat{E}_b[2])\) afforded by Corollary 3.2. Since the latter isomorphism preserves the unramified classes (which only depend on the Galois module \(\hat{E}_b[2] \simeq P_b^{\vee }[\hat{\rho }]\)), this follows from Corollary 5.1. \(\square \)

5.5 Integral representatives for \(\mathsf {V}\)

In this subsection we prove Theorem 5.1. Our strategy is closely modelled on the strategy of proving [34,  Theorem 4.1]: we deform to the case of square-free discriminant using a Bertini type theorem over \(\mathbb {Z}_p\) and using the compactified Prym variety. We have done all the necessary preparations and what follows is a routine adaptation of [34,  §4.5]. The following proposition and its proof are very similar to [34,  Corollary 4.23]. It establishes the existence of a deformation with good properties.

Proposition 5.11

Let p be a prime number not dividing N. Let \(b\in \underline{\mathsf {B}}(\mathbb {Z}_p)\) with \(\varDelta (b)\ne 0\) and \(Q \in P_b(\mathbb {Q}_p)\). Then there exists a morphism \(\mathcal {X} \rightarrow \mathbb {Z}_p\) that is of finite type, smooth of relative dimension 1 and with geometrically integral fibres, together with a point \(x \in \mathcal {X}(\mathbb {Z}_p)\) satisfying the following properties.

  1. 1.

    There exists a morphism \(\tilde{b}: \mathcal {X} \rightarrow \underline{\mathsf {B}}_{\mathbb {Z}_p}\) with the property that \(\tilde{b}(x) = b\) and that the discriminant \(\varDelta (\tilde{b})\), seen as a map \(\mathcal {X} \rightarrow \mathbb {A}^1_{\mathbb {Z}_p}\), is not identically zero on the special fibre and is square-free on the generic fibre of \(\mathcal {X}\).

  2. 2.

    Write \(\mathcal {X}^{{{\,\mathrm{rs}\,}}}\) for the open subscheme of \(\mathcal {X}\) where \(\varDelta (\tilde{b})\) does not vanish. Then there exists a morphism \(\tilde{Q}: \mathcal {X}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathcal {P}\) lifting the morphism \( \mathcal {X}^{{{\,\mathrm{rs}\,}}} \rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}_{\mathbb {Z}_p}\) satisfying \(\tilde{Q}(x_{\mathbb {Q}_p}) = Q\).

Proof

We apply [34,  Proposition 4.22] to the compactified Prym variety \(\overline{P}\rightarrow \mathsf {B}\) introduced in Sect. 3.5. In Sect. 5.1 we have spread out \(\overline{P}\) to a scheme \(\overline{\mathcal {P}}\rightarrow \underline{\mathsf {B}}_S\) with similar properties. (Recall that \(S = \mathbb {Z}[1/N]\).) Define \(\mathcal {D} \) to be the pullback of \(\{\varDelta = 0\} \subset \underline{\mathsf {B}}_{\mathbb {Z}_p}\) along \(\overline{\mathcal {P}}_{\mathbb {Z}_p} \rightarrow \underline{\mathsf {B}}_{\mathbb {Z}_p}\). Since the latter morphism is proper, we may extend \(Q\in P_b(\mathbb {Q}_p) \subset \overline{P}_b(\mathbb {Q}_p)\) to an element of \(\overline{\mathcal {P}}_b(\mathbb {Z}_p)\), still denoted by Q. We now claim that the triple \((\overline{\mathcal {P}}_{\mathbb {Z}_p},\mathcal {D},Q)\) satisfies the assumptions of [34,  Proposition 4.22]. Indeed, the properties of \(\overline{\mathcal {P}}_{\mathbb {Z}_p}\) follow from Proposition 3.9. (Or rather the analogous properties obtained by spreading out in Sect. 5.1.) Moreover \(\overline{\mathcal {P}}_{\mathbb {F}_p}\) is not contained in \(\mathcal {D}\) since \(\varDelta \) is nonzero mod p by our assumptions on N. Since \(\overline{\mathcal {P}}_{\mathbb {Z}_p} \) is \(\underline{\mathsf {B}}_{\mathbb {Z}_p}\)-flat, \(\mathcal {D}\) is a Cartier divisor. Since the smooth locus of \(\overline{\mathcal {P}}_{\mathbb {Q}_p} \rightarrow \underline{\mathsf {B}}_{\mathbb {Q}_p}\) has complement of codimension at least two and \(\{\varDelta = 0\} \subset \underline{\mathsf {B}}_{\mathbb {Q}_p}\) is reduced, the scheme \(\mathcal {D}_{\mathbb {Q}_p}\) is reduced too. Finally \(Q_{\mathbb {Q}_p} \not \in \mathcal {D}_{\mathbb {Q}_p}\) since b has nonzero discriminant.

We obtain a closed subscheme \(\mathcal {X}\hookrightarrow \overline{\mathcal {P}}_{\mathbb {Z}_p}\) satisfying the conclusions of [34,  Proposition 4.22]. Write \(x\in \mathcal {X}(\mathbb {Z}_p)\) for the section corresponding to Q, \(\widetilde{b}\) for the restriction of \(\overline{\mathcal {P}}_{\mathbb {Z}_p} \rightarrow \underline{\mathsf {B}}_{\mathbb {Z}_p}\) to \(\mathcal {X}\) and \(\widetilde{Q}\) for the restriction of the inclusion \(\mathcal {X} \hookrightarrow \overline{\mathcal {P}}_{\mathbb {Z}_p}\) to \(\mathcal {X}^{{{\,\mathrm{rs}\,}}}\). Then the tuple \((\mathcal {X},x,\widetilde{b},\widetilde{Q})\) satisfies the conclusion of the proposition. \(\square \)

We now give the proof of Theorem 5.1. We keep the assumptions and notation of Proposition 5.11 and assume that we have made a choice of \((\mathcal {X},x,\widetilde{b},\widetilde{Q})\) satisfying the conclusions of that proposition. The strategy is to extend the orbit \(\eta _b(Q)\) (which corresponds to the point \(x_{\mathbb {Q}_p}\)) to larger and larger subsets of \(\mathcal {X}\).

Let \(y\in \mathcal {X}\) be a closed point of the special fibre with nonzero discriminant having an affine open neighborhood containing \(x_{\mathbb {Q}_p}\). Let R be the semi-local ring of \(\mathcal {X}\) at \(x_{\mathbb {Q}_p}\) and y. Since every projective module of constant rank over R is free, we can apply Proposition 5.2 to obtain an orbit \(\eta _{\widetilde{b}}(\widetilde{Q}) \in \underline{\mathsf {G}}(R)\backslash \underline{\mathsf {V}}_{\widetilde{b}}(R)\). This orbit spreads out to an element of \(\underline{\mathsf {G}}(U_1)\backslash \underline{\mathsf {V}}_{\widetilde{b}}(U_1)\), where \(U_1\subset \mathcal {X}\) is an open subset containing \(x_{\mathbb {Q}_p}\) and intersecting the special fibre nontrivially. Under the bijection of Proposition 5.5, this defines an object \(\mathcal {A}_1\) of \({{\,\mathrm{GrLieE}\,}}_{U_1,\widetilde{b}}\) such that the pullback of \(\mathcal {A}_1\) along the point \(x_{\mathbb {Q}_p}\in U_1(\mathbb {Q}_p)\) corresponds to the orbit \(\eta _b(Q)\).

Let \(U_2 = \mathcal {X}_{\mathbb {Q}_p}\). By Corollary 5.2, the restriction of \(\mathcal {A}_1\) to \(U_1\cap U_2\) extends to an object \(\mathcal {A}_2\) of \({{\,\mathrm{GrLieE}\,}}_{U_2,\widetilde{b}}\). We can glue \(\mathcal {A}_1\) and \(\mathcal {A}_2\) to an object \(\mathcal {A}_0\) of \({{\,\mathrm{GrLieE}\,}}_{U_0,\widetilde{b}}\), where \(U_0 = U_1 \cup U_2\). The complement of \(U_0\) has dimension zero since \(\mathcal {X}_{\mathbb {F}_p}\) is irreducible. By Lemma 5.2, \(\mathcal {A}_0\) extends to an object \(\mathcal {A}_3\) of \({{\,\mathrm{GrLieE}\,}}_{\mathcal {X},\widetilde{b}}\). Let \(\mathcal {A}_4\in {{\,\mathrm{GrLieE}\,}}_{\mathbb {Z}_p,b}\) denote the pullback of \(\mathcal {A}_3\) along the point \(x\in \mathcal {X}(\mathbb {Z}_p)\). Since \(\mathrm {H}^1(\mathbb {Z}_p, \underline{\mathsf {G}})\) is trivial by [39,  III.3.11(a)] and Lang’s theorem, Propositions 5.3 and 5.5 implies that \(\mathcal {A}_4\) determines an element of \(\underline{\mathsf {G}}(\mathbb {Z}_p)\backslash \underline{\mathsf {V}}_b(\mathbb {Z}_p)\) mapping to \(\eta _b(Q)\) in \(\mathsf {G}(\mathbb {Q}_p)\backslash \mathsf {V}_b(\mathbb {Q}_p)\). This completes the proof of Theorem 5.1.

We conclude this subsection by stating a consequence for orbits of \(\mathbb {Z}\). We will need the following lemma, whose proof is identical to that of [52,  Proposition 5.7]. Write \(\mathscr {E} :=\underline{\mathsf {B}}(\mathbb {Z}) \cap \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\) and \(\mathscr {E}_p :=\underline{\mathsf {B}}(\mathbb {Z}_p) \cap \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q}_p)\).

Lemma 5.9

Let p be a prime (not necessarily coprime to N) and let \(b_0\in \mathscr {E}_p\). Then there exists an integer \(n\ge 1\) and an open compact neighborhood \(W_p \subset \mathscr {E}_p\) of \(b_0\) such that for all \(b\in W_p\) and for all \(y\in P_{p^n\cdot b}(\mathbb {Q}_p)\), the orbit \(\eta _{p^n\cdot b}(y)\in \underline{\mathsf {G}}(\mathbb {Q}_p)\backslash \underline{\mathsf {V}}_{p^n\cdot b}(\mathbb {Q}_p)\) of Theorem 4.1 has a representative in \(\underline{\mathsf {V}}_{p^n\cdot b}(\mathbb {Z}_p)\).

Corollary 5.3

Let \(b_0 \in \mathscr {E}\). Then for each prime p dividing N we can find an open compact neighborhood \(W_p\) of \(b_0\) in \(\mathscr {E}_p\) and an integer \(n_p\ge 0\) with the following property. Let \(M = \prod _{p\mid N} p^{n_p}\). Then for all \(b\in \mathscr {E} \cap \left( \prod _{p\mid N} W_p \right) \) and for all \(y \in {{\,\mathrm{Sel}\,}}_2(P_{M\cdot b})\), the orbit \(\eta _{M\cdot b}(y) \in \mathsf {G}(\mathbb {Q}) \backslash \mathsf {V}_{M\cdot b}(\mathbb {Q})\) contains an element of \(\underline{\mathsf {V}}_{M\cdot b}(\mathbb {Z})\).

Proof

The group \(\underline{\mathsf {G}}\) has class number 1: \(\mathsf {G}(\mathbb {A}^{\infty })=\mathsf {G}(\mathbb {Q})\cdot \underline{\mathsf {G}}(\widehat{\mathbb {Z}})\) (Proposition 6.1). Therefore an orbit \(v\in \mathsf {G}(\mathbb {Q}) \setminus \mathsf {V}(\mathbb {Q})\) has a representative in \(\underline{\mathsf {V}}(\mathbb {Z})\) if and only if for every prime p the associated \(\mathsf {G}(\mathbb {Q}_p)\)-orbit has a representative in \(\underline{\mathsf {V}}(\mathbb {Z}_p)\). The corollary follows from combining Theorem 5.1 and Lemma 5.9. \(\square \)

5.6 Integral representatives for \(\mathsf {V}^{\star }\)

Proof of Theorem 5.2

Let \(A\in P_b(\mathbb {Q}_p)\) be an element giving rise to a \(\mathsf {G}^{\star }(\mathbb {Q}_p)\)-orbit \(\eta ^{\star }_b(A)\) in \(\mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q}_p)\). By construction of \(\eta _b^{\star }\) we have a commutative diagram:

figure al

The diagram shows that the orbit \(\eta ^{\star }_b(A)\) is the image of the orbit \(\eta _b(A)\) under the map \(\mathcal {Q}\). By Theorem 5.1, \(\eta _b(A)\) has an integral representative \(v\in \underline{\mathsf {V}}(\mathbb {Z}_p)\). Therefore, the element \(\mathcal {Q}(v)\in \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\) is an integral representative of \(\eta _b^{\star }(A)\). \(\square \)

Again we state the following global corollary which follows from Lemma 5.9.

Corollary 5.4

Let \(b_0 \in \mathscr {E}\). Then for each prime p dividing N we can find an open compact neighborhood \(W_p\) of \(b_0\) in \(\mathscr {E}_p\) and an integer \(n_p\ge 0\) with the following property. Let \(M = \prod _{p\mid N} p^{n_p}\). Then for all \(b\in \mathscr {E} \cap \left( \prod _{p\mid N} W_p \right) \) with \(b^{\star } = \mathcal {Q}(b)\) and for all \(y \in {{\,\mathrm{Sel}\,}}_{\hat{\rho }}(P^{\vee }_{M\cdot b})\), the orbit \(\eta ^{\star }_{M\cdot b}(y) \in \mathsf {G}^{\star }(\mathbb {Q}) \backslash \mathsf {V}^{\star }_{M\cdot b^{\star }}(\mathbb {Q})\) contains an element of \(\underline{\mathsf {V}}^{\star }_{M\cdot b^{\star }}(\mathbb {Z})\).

Proof

For each p dividing N, let \(W_p\subset \mathscr {E}_p\) be an open compact neighborhood of \(b_0\) and \(n_p\ge 0\) be an integer satisfying the conclusion of Lemma 5.9. Let \(M=\prod _{p\mid N} n_p\), let \(b\in \mathscr {E} \cap \left( \prod _{p\mid N} W_p \right) \) and let \(y \in {{\,\mathrm{Sel}\,}}_{\hat{\rho }}(P^{\vee }_{M\cdot b})\) with corresponding orbit \(\mathsf {G}^{\star }(\mathbb {Q})\cdot v=\eta _{M\cdot b}^{\star }(y)\). The orbit \(\mathsf {G}^{\star }(\mathbb {Q}_p)\cdot v\) lies in the image of \(\eta ^{\star }_{M\cdot b}\) so by an argument similar to the proof of Theorem 5.2, it is of the form \(\mathsf {G}^{\star }(\mathbb {Q}_p)\cdot \mathcal {Q}(w_p)\) for some \(w_p\in \underline{\mathsf {V}}(\mathbb {Q}_p)\) that lies in the image of \(\eta _{M\cdot b}\). By Lemma 5.9 we may assume that \(w_p\in \underline{\mathsf {V}}(\mathbb {Z}_p)\). Therefore \(\mathsf {G}^{\star }(\mathbb {Q}_p)\cdot v\) has a representative in \(\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\) for every prime p. Since \(\underline{\mathsf {G}}^{\star }={{\,\mathrm{PGL}\,}}_2\) has class number one, \(\underline{\mathsf {G}}^{\star }(\mathbb {A}^{\infty })=\underline{\mathsf {G}}^{\star }(\widehat{\mathbb {Z}})\underline{\mathsf {G}}^{\star }(\mathbb {Q})\) so \(\mathsf {G}^{\star }(\mathbb {Q})\cdot v\) has a representative in \(\underline{\mathsf {V}}^{\star }(\mathbb {Z})\).

\(\square \)

6 Counting integral orbits in \(\mathsf {V}\)

In this section we will apply the counting techniques of Bhargava to provide estimates for the integral orbits of bounded height in our representation \((\underline{\mathsf {G}},\underline{\mathsf {V}})\).

6.1 Heights

Recall that \(\underline{\mathsf {B}}= {{\,\mathrm{Spec}\,}}\mathbb {Z}[p_2,p_6,p_8,p_{12}]\) and that \(\pi : \underline{\mathsf {V}}\rightarrow \underline{\mathsf {B}}\) denotes the morphism of taking invariants. For any \(b\in \mathsf {B}(\mathbb {R})\) we define the of b by the formula

$$\begin{aligned} \text {ht}(b) :=\sup |p_i(b)|^{1/i}. \end{aligned}$$

We define \(\text {ht}(v) = \text {ht}(\pi (v))\) for any \(v\in \mathsf {V}(\mathbb {R})\). We have \(\text {ht}(\lambda \cdot b) = |\lambda |\text {ht}(b)\) for all \(\lambda \in \mathbb {R}\) and \(b\in \mathsf {B}(\mathbb {R})\). If A is a subset of \(\mathsf {V}(\mathbb {R})\) or \(\mathsf {B}(\mathbb {R})\) and \(X\in \mathbb {R}_{>0}\) we write \(A_{<X}\subset A\) for the subset of elements of height \(<X\). For every such X, the set \(\underline{\mathsf {B}}(\mathbb {Z})_{<X}\) is finite.

6.2 Measures

Let \(\omega _{\mathsf {G}}\) be a generator for the \(\mathbb {Q}\)-vector space of left-invariant top differential forms on \(\underline{\mathsf {G}}\) over \(\mathbb {Q}\). It is well-defined up to an element of \(\mathbb {Q}^{\times }\) and it determines Haar measures dg on \(\mathsf {G}(\mathbb {R})\) and \(\mathsf {G}(\mathbb {Q}_p)\) for each prime p.

Proposition 6.1

  1. 1.

    \(\underline{\mathsf {G}}\) has class number 1: \(\mathsf {G}(\mathbb {A}^{\infty }) = \mathsf {G}(\mathbb {Q})\underline{\mathsf {G}}(\widehat{\mathbb {Z}})\).

  2. 2.

    The product \({{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z})\backslash \underline{\mathsf {G}}(\mathbb {R}) \right) \cdot \prod _p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}_p)\right) \) converges absolutely and equals 2, the Tamagawa number of \(\mathsf {G}\).

Proof

The group \(\underline{\mathsf {G}}\) is the Zariski closure of \(\mathsf {G}\) in \({{\,\mathrm{GL}\,}}(\underline{\mathsf {V}})\) and in a suitable basis of \(\underline{\mathsf {V}}\), \(\mathsf {G}\) contains a maximal \(\mathbb {Q}\)-split torus consisting of diagonal matrices in \({{\,\mathrm{GL}\,}}(\underline{\mathsf {V}})\). Therefore \(\underline{\mathsf {G}}\) has class number 1 by [46,  Theorem 8.11; Corollary 2] and the fact that \(\mathbb {Q}\) has class number one. The first part implies that the product in the second part equals the Tamagawa number \(\tau (\mathsf {G})\) of \(\mathsf {G}\simeq ({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)/\mu _2\). Now use the identities \(\tau (\mathsf {G})=2\tau ({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)\) [44,  Theorem 2.1.1] and \(\tau ({{\,\mathrm{Sp}\,}}_6\times {{\,\mathrm{SL}\,}}_2)=1\) (because \({{\,\mathrm{Sp}\,}}_6\) and \({{\,\mathrm{SL}\,}}_2\) are simply connected). \(\square \)

We can decompose the measure dg on \(\mathsf {G}(\mathbb {R})\) using the Iwasawa decomposition. Fix, once and for all, a maximal compact subgroup \(K\subset G(\mathbb {R})\). Let \(\mathsf {P}= \mathsf {T}\mathsf {N}\subset \mathsf {G}\) be the Borel subgroup corresponding to the root basis \(S_{\mathsf {G}}\), with unipotent radical \(\mathsf {N}\). Let \(\overline{\mathsf {P}}= \mathsf {T}\overline{\mathsf {N}}\subset \mathsf {G}\) be the opposite Borel subgroup. Then the natural product maps

$$\begin{aligned} \overline{\mathsf {N}}(\mathbb {R})\times \mathsf {T}(\mathbb {R})^{\circ }\times K \rightarrow \mathsf {G}(\mathbb {R}) ,\; \mathsf {T}(\mathbb {R})^{\circ } \times \overline{\mathsf {N}}(\mathbb {R}) \times K \rightarrow \mathsf {G}(\mathbb {R}) \end{aligned}$$

are diffeomorphisms. If \(t\in \mathsf {T}(\mathbb {R})\), let \(\delta _{\mathsf {G}}(t) = \prod _{\beta \in \varPhi _{\mathsf {G}}^-} \beta (t) = \det {{\,\mathrm{Ad}\,}}(t)|_{{{\,\mathrm{Lie}\,}}\overline{\mathsf {N}}(\mathbb {R})}\). (Here \(\varPhi _{\mathsf {G}}^-\) denotes the subset of negative roots.) The following result follows from well-known properties of the Iwasawa decomposition; see [35,  Chapter 3;§1].

Lemma 6.1

Let dtdndk be Haar measures on \(\mathsf {T}(\mathbb {R})^{\circ }, \overline{\mathsf {N}}(\mathbb {R}), K\) respectively. Then the assignment

$$\begin{aligned} f \mapsto \int _{t\in \mathsf {T}(\mathbb {R})^{\circ } } \int _{n\in \overline{\mathsf {N}}(\mathbb {R})} \int _{k\in K } f(tnk)\, dk\, dn\, dt = \int _{t\in \mathsf {T}(\mathbb {R})^{\circ } } \int _{n\in \overline{\mathsf {N}}(\mathbb {R})} \int _{k\in K } f(ntk)\delta _{\mathsf {G}}(t)^{-1} \, dk\, dn\, dt \end{aligned}$$

defines a Haar measure on \(\mathsf {G}(\mathbb {R})\).

We now fix Haar measures on the groups \(\mathsf {T}(\mathbb {R})^{\circ }, K\) and \(\overline{\mathsf {N}}(\mathbb {R})\), as follows. We give \(\mathsf {T}(\mathbb {R})^{\circ }\) the measure pulled back from the isomorphism \(\prod _{\beta \in S_{\mathsf {G}}} \beta :\mathsf {T}(\mathbb {R})^{\circ } \rightarrow \mathbb {R}_{>0}^4\), where \(\mathbb {R}_{>0}\) gets its standard Haar measure \(d^{\times } \lambda = d\lambda /\lambda \). We give K its probability Haar measure. Finally we give \(\overline{\mathsf {N}}(\mathbb {R})\) the unique Haar measure dn such that the Haar measure on \(\mathsf {G}(\mathbb {R})\) from Lemma 6.1 coincides with dg.

Next we introduce measures on \(\mathsf {V}\) and \(\mathsf {B}\). Let \(\omega _{\mathsf {V}}\) be a generator for the free rank one \(\mathbb {Z}\)-module of left-invariant top differential forms on \(\underline{\mathsf {V}}\). Then \(\omega _{\mathsf {V}}\) is uniquely determined up to sign and it determines Haar measures dv on \(\mathsf {V}(\mathbb {R})\) and \(\mathsf {V}(\mathbb {Q}_p)\) for every prime p. We define the top form \(\omega _{\mathsf {B}} = dp_2\wedge dp_6 \wedge dp_8 \wedge dp_{12}\) on \(\underline{\mathsf {B}}\). It defines measures db on \(\mathsf {B}(\mathbb {R})\) and \(\mathsf {B}(\mathbb {Q}_p)\) for every prime p.

Lemma 6.2

There exists a unique rational number \(W_0\in \mathbb {Q}^{\times }\) with the following property. Let \(k/\mathbb {Q}\) be a field extension, let \(\mathfrak {c}\) a Cartan subalgebra of \(\mathfrak {h}_k\) contained in \(\mathsf {V}_k\), and let \(\mu _{\mathfrak {c}}:\mathsf {G}_k \times \mathfrak {c} \rightarrow \mathsf {V}_k\) be the natural action map. Then \(\mu ^*_{\mathfrak {c}}\omega _{\mathsf {V}} = W_0 \omega _{\mathsf {G}}\wedge \pi |_{\mathfrak {c}}^*\omega _{\mathsf {B}}\).

Proof

The proof is identical to that of [61,  Proposition 2.13]. Here we use the fact that the sum of the invariants equals the dimension of the representation: \(2+6+8+12 = 28 = \dim _{\mathbb {Q}} \mathsf {V}\). \(\square \)

Lemma 6.3

Let \(W_0\in \mathbb {Q}^{\times }\) be the constant of Lemma 6.2. Then:

  1. 1.

    Let \(\underline{\mathsf {V}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} = \underline{\mathsf {V}}(\mathbb {Z}_p)\cap \mathsf {V}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q}_p)\) and define a function \(m_p: \underline{\mathsf {V}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} \rightarrow \mathbb {R}_{\ge 0}\) by the formula

    $$\begin{aligned} m_p(v) = \sum _{v' \in \underline{\mathsf {G}}(\mathbb {Z}_p)\backslash \left( \mathsf {G}(\mathbb {Q}_p)\cdot v\cap \underline{\mathsf {V}}(\mathbb {Z}_p) \right) } \frac{\#Z_{\underline{\mathsf {G}}}(v)(\mathbb {Q}_p) }{\#Z_{\underline{\mathsf {G}}}(v)(\mathbb {Z}_p) } . \end{aligned}$$
    (6.1)

    Then \(m_p(v)\) is locally constant.

  2. 2.

    Let \(\underline{\mathsf {B}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} = \underline{\mathsf {B}}(\mathbb {Z}_p)\cap \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q}_p)\) and let \(\psi _p: \underline{\mathsf {V}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} \rightarrow \mathbb {R}_{\ge 0}\) be a bounded, locally constant function which satisfies \(\psi _p(v) = \psi _p(v')\) when \(v,v'\in \underline{\mathsf {V}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}}\) are conjugate under the action of \(\mathsf {G}(\mathbb {Q}_p)\). Then we have the formula

    $$\begin{aligned} \int _{v\in \underline{\mathsf {V}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}}} \psi _p(v) \mathrm {d} v = |W_0|_p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}_p)\right) \int _{b\in \underline{\mathsf {B}}(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}}} \sum _{g\in \mathsf {G}(\mathbb {Q}_p)\backslash \underline{\mathsf {V}}_b(\mathbb {Z}_p) } \frac{m_p(v)\psi _p(v) }{\# Z_{\underline{\mathsf {G}}}(v)(\mathbb {Q}_p)} \mathrm {d} b . \end{aligned}$$
    (6.2)

Proof

The proof is identical to that of [53,  Proposition 3.3], using Lemma 6.2. \(\square \)

6.3 Fundamental sets

Let \(K\subset \mathsf {G}(\mathbb {R})\) be the maximal compact subgroup fixed in Sect. 6.2. For any \(c\in \mathbb {R}_{>0}\), define \(T_c :=\{t\in \mathsf {T}(\mathbb {R})^{\circ } \mid \forall \beta \in S_{\mathsf {G}} ,\, \beta (t) \le c \}\). A is, by definition, any subset \(\mathfrak {S}_{\omega , c} :=\omega \cdot T_c \cdot K\), where \(\omega \subset \overline{\mathsf {N}}(\mathbb {R})\) is a compact subset and \(c>0\).

Proposition 6.2

  1. 1.

    For every \(\omega \subset \overline{\mathsf {N}}(\mathbb {R})\) and \(c>0\), the set

    $$\begin{aligned} \{\gamma \in \underline{\mathsf {G}}(\mathbb {Z}) \mid \gamma \cdot \mathfrak {S}_{\omega ,c} \cap \mathfrak {S}_{\omega ,c}\ne \emptyset \} \end{aligned}$$

    is finite.

  2. 2.

    We can choose \(\omega \subset \overline{\mathsf {N}}(\mathbb {R})\) and \(c>0\) such that \(\underline{\mathsf {G}}(\mathbb {Z}) \cdot \mathfrak {S}_{\omega ,c} = \mathsf {G}(\mathbb {R})\).

Proof

The first part follows from the Siegel property [16,  Corollaire 15.3]. By [46,  Theorem 4.15], the second part is reduced to proving that \(\mathsf {G}(\mathbb {Q})=\mathsf {P}(\mathbb {Q})\cdot \underline{\mathsf {G}}(\mathbb {Z})\). This follows from [15,  §6, Lemma 1(b)], using that (in the terminology of that paper) the lattice \(\underline{\mathsf {V}}\) is special with respect to the pinning \((\mathsf {T},\mathsf {P},\{X_{\alpha }\})\). \(\square \)

Now fix \(\omega \subset \overline{\mathsf {N}}(\mathbb {R})\) and \(c>0\) so that \(\mathfrak {S}_{\omega ,c}\) satisfies the conclusions of Proposition 6.2. By enlarging \(\omega \), we may assume that \(\mathfrak {S}_{\omega ,c}\) is semialgebraic. We drop the subscripts and for the remainder of Sect. 6 we write \(\mathfrak {S}\) for this fixed Siegel set. The set \(\mathfrak {S}\) will serve as a fundamental domain for the action of \(\underline{\mathsf {G}}(\mathbb {Z})\) on \(\mathsf {G}(\mathbb {R})\).

A \(\underline{\mathsf {G}}(\mathbb {Z})\)-coset of \(\mathsf {G}(\mathbb {R})\) may be represented more than once in \(\mathfrak {S}\), but by keeping track of the multiplicities this will not cause any problems. The surjective map \(\varphi :\mathfrak {S}\rightarrow \underline{\mathsf {G}}(\mathbb {Z})\backslash \mathsf {G}(\mathbb {R})\) has finite fibres and if \(g\in \mathfrak {S}\) we define \(\mu (g) :=\# \varphi ^{-1}(\varphi (g))\). The function \(\mu :\mathfrak {S}\rightarrow \mathbb {N}\) is uniformly bounded by \(\mu _{\max } :=\# \{\gamma \in \underline{\mathsf {G}}(\mathbb {Z}) \mid \gamma \mathfrak {S}\cap \mathfrak {S}\ne \emptyset \} \) and has semialgebraic fibres. By pushing forward measures via \(\varphi \), we obtain the formula

$$\begin{aligned} \int _{g\in \mathfrak {S}} \mu (g)^{-1} \,dg = {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}) \backslash \mathsf {G}(\mathbb {R}) \right) . \end{aligned}$$
(6.3)

We now construct special subsets of \(\mathsf {V}^{{{\,\mathrm{rs}\,}}}(\mathbb {R})\) which serve as our fundamental domains for the action of \(\mathsf {G}(\mathbb {R})\) on \(\mathsf {V}^{{{\,\mathrm{rs}\,}}}(\mathbb {R})\). By the same reasoning as in [61,  § 2.9], we can find open subsets \(L_1,\dots , L_k\) of \(\{b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {R})\mid \text {ht}(b)=1 \}\) and sections \(s_i :L_i \rightarrow \mathsf {V}(\mathbb {R})\) of the map \(\pi :\mathsf {V}\rightarrow \mathsf {B}\) satisfying the following properties:

  • For each i, \(L_i\) is connected and semialgebraic and \(s_i\) is a semialgebraic map with bounded image.

  • Set \(\varLambda = \mathbb {R}_{>0}\). Then we have an equality

    $$\begin{aligned} \mathsf {V}^{{{\,\mathrm{rs}\,}}}(\mathbb {R}) = \bigcup _{i=1}^k \mathsf {G}(\mathbb {R}) \cdot \varLambda \cdot s_i(L_i). \end{aligned}$$
    (6.4)

If \(v\in s_i(L_i)\) let \(r_i = \# Z_{\mathsf {G}}(v)(\mathbb {R})\); this integer is independent of the choice of v. We record the following change-of-measure formula, which follows from Lemma 6.2.

Lemma 6.4

Let \(\phi :\mathsf {V}(\mathbb {R}) \rightarrow \mathbb {C}\) be a continuous function of compact support and \(i\in \{1,\dots ,k\}\). Let \(G_0\subset \mathsf {G}(\mathbb {R})\) be a measurable subset and let \(m_{\infty }(v)\) be the cardinality of the fibre of the map \(G_0 \times \varLambda \times L_i \rightarrow \mathsf {V}(\mathbb {R}), (g,\lambda ,l)\mapsto g\cdot \lambda \cdot s(l)\) above \(v\in \mathsf {V}(\mathbb {R})\). Then

$$\begin{aligned} \int _{v\in G_0 \cdot \varLambda \cdot s_i(L_i)} f(v)m_{\infty }(v) \,dv = |W_0| \int _{b\in \varLambda \cdot L_i}\int _{g\in G_0} f(g\cdot s_i(b)) \,dg \,db, \end{aligned}$$

where \(W_0\in \mathbb {Q}^{\times }\) is the scalar of Lemma 6.2.

6.4 Counting integral orbits in \(\mathsf {V}\)

For any \(\underline{\mathsf {G}}(\mathbb {Z})\)-invariant subset \(A\subset \underline{\mathsf {V}}(\mathbb {Z})\), define

$$\begin{aligned} N(A,X) :=\sum _{v\in \underline{\mathsf {G}}(\mathbb {Z})\backslash A_{<X}} \frac{1}{\# Z_{\underline{\mathsf {G}}}(v)(\mathbb {Z})}. \end{aligned}$$

(Recall that \(A_{<X}\) denotes the elements of A of height \(<X\).) Let k be a field of characteristic not dividing N. We say an element \(v\in \underline{\mathsf {V}}(k)\) with \(b=\pi (v)\) is:

  • k- if \(\varDelta (b)=0\) or if it is \(\underline{\mathsf {G}}(k)\)-conjugate to the Kostant section \(\sigma (b)\), and k- otherwise.

  • k- if \(\varDelta (b)=0\) or if \(\mathcal {Q}(v)\) is \(\underline{\mathsf {G}}^{\star }(k)\)-conjugate to \(\mathcal {Q}(\sigma (b))\), and k otherwise.

  • k- if \(\varDelta (b)\ne 0\) and it lies in the image of the map \(\eta _b: \mathcal {P}_b(k)/2\mathcal {P}_b(k) \rightarrow \underline{\mathsf {G}}(k)\backslash \underline{\mathsf {V}}_b(k)\) of Theorem 4.1.

We note that every strongly k-irreducible element is k-irreducible by Lemma 2.2. For any \(A\subset \underline{\mathsf {V}}(\mathbb {Z})\), write \(A^{irr}\subset A\) for the subset of \(\mathbb {Q}\)-irreducible elements and \(A^{sirr} \subset A\) for the subset of strongly \(\mathbb {Q}\)-irreducible elements. Write \(\mathsf {V}(\mathbb {R})^{sol} \subset \mathsf {V}(\mathbb {R})\) for the subset of \(\mathbb {R}\)-soluble elements.

Theorem 6.1

We have

$$\begin{aligned} N(\underline{\mathsf {V}}(\mathbb {Z})^{sirr} \cap \mathsf {V}(\mathbb {R})^{sol},X) = \frac{|W_0|}{4}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z})\backslash \mathsf {G}(\mathbb {R})\right) {{\,\mathrm{vol}\,}}\left( \mathsf {B}(\mathbb {R})_{<X} \right) + o\left( X^{28}\right) , \end{aligned}$$

where \(W_0\in \mathbb {Q}^{\times }\) is the scalar of Lemma 6.2.

We first explain how to reduce Theorem 6.1 to Proposition 6.3. Recall that there exists \(\mathbb {G}_m\)-actions on \(\mathsf {V}\) and \(\mathsf {B}\) such that the morphism \(\pi : \mathsf {V}\rightarrow \mathsf {B}\) is \(\mathbb {G}_m\)-equivariant and that we write \(\varLambda = \mathbb {R}_{>0}\). By an argument identical to [34,  Lemma 5.5], the subset \(\mathsf {V}(\mathbb {R})^{sol}\subset \mathsf {V}^{{{\,\mathrm{rs}\,}}}(\mathbb {R})\) is open and closed in the Euclidean topology. Therefore by discarding some of the subsets \(L_1,\dots ,L_k\) of Sect. 6.3, we may write \(\mathsf {V}(\mathbb {R})^{sol} = \bigcup _{i\in J} \mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s_i(L_i)\) for some \(J\subset \{1,\dots ,k\}\). Moreover for every \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {R})\) we have equalities

$$\begin{aligned} \#\left( \mathsf {G}(\mathbb {R})\backslash \mathsf {V}_b(\mathbb {R})^{sol}\right) /\#Z_{\mathsf {G}}(\sigma (b))(\mathbb {R})= \#\left( P_b(\mathbb {R})/2P_b(\mathbb {R})\right) /\#P_b[2](\mathbb {R})=1/4, \end{aligned}$$

where the first follows from the definition of \(\mathbb {R}\)-solubility and Proposition 3.4, and the second is a general fact about real abelian surfaces. Therefore by the inclusion-exclusion principle, to prove Theorem 6.1 it suffices to prove the following proposition.

For any subset I of \(\{1,\dots ,k\}\), write \(L_I=\pi \left( \cap _{i\in I} \mathsf {G}(\mathbb {R})\cdot s_i(L_i) \right) \). Write \(s_I\) for the restriction of \(s_i\) to \(L_I\) and write \(r_I=r_i\) for some choice of \(i\in I\). (The section \(s_I\) may depend on i but the number \(r_I\) does not if \(L_I\) is non-empty.)

Proposition 6.3

In the above notation, let (Lsr) be \((L_I, s_I, r_I)\) for some \(I\subset \{1,\dots , k\}\).

Then

$$\begin{aligned} N(\mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L) \cap \underline{\mathsf {V}}(\mathbb {Z})^{sirr},X) = \frac{|W_0|}{r}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}) \backslash \mathsf {G}(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}((\varLambda \cdot L)_{<X}) +o\left( X^{28} \right) . \end{aligned}$$

So to prove Theorem 6.1 it remains to prove Proposition 6.3. For the latter we will follow the general orbit-counting techniques established by Bhargava, Shankar and Gross [7, 12] closely. The only notable differences are that we work with a Siegel set instead of a true fundamental domain and that we have to carry out a case-by-case analysis for cutting off the cusp in Sect. 6.10. For the remainder of Sect. 6 we fix a triple (Lsr) as above with \(L\ne \emptyset \).

6.5 First reductions

We first reduce Proposition 6.3 to estimating the number of (weighted) lattice points in a region of \(\mathsf {V}(\mathbb {R})\). Recall that \(\mathfrak {S}\) denotes the Siegel set fixed in Sect. 6.3 and it comes with a multiplicity function \(\mu :\mathfrak {S}\rightarrow \mathbb {N}\). Because \(\underline{\mathsf {G}}(\mathbb {Z})\cdot \mathfrak {S}= \mathsf {G}(\mathbb {R})\), every element of \(\mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L)\) is \(\underline{\mathsf {G}}(\mathbb {Z})\)-equivalent to an element of \(\mathfrak {S}\cdot \varLambda \cdot s(L)\). In fact, we can be more precise about how often a \(\underline{\mathsf {G}}(\mathbb {Z})\)-orbit will be represented in \(\mathfrak {S}\cdot \varLambda \cdot s(L)\). Let \(\nu :\mathfrak {S}\cdot \varLambda \cdot s(L) \rightarrow \mathbb {R}_{>0}\) be the ‘weight’ function defined by

$$\begin{aligned} x\mapsto \nu (x) :=\sum _{\begin{array}{c} g\in \mathfrak {S}\\ x\in g\cdot \varLambda \cdot s(L) \end{array}} \mu (g)^{-1}. \end{aligned}$$
(6.5)

Then \(\nu \) takes only finitely many values and has semialgebraic fibres. We now claim that if every element of \(\mathfrak {S}\cdot \varLambda \cdot s(L)\) is weighted by \(\nu \), then the \(\underline{\mathsf {G}}(\mathbb {Z})\)-orbit of an element \(x\in \mathsf {G}(\mathbb {R}) \cdot \varLambda \cdot s(L)\) is represented exactly \(\#Z_{\mathsf {G}}(x)(\mathbb {R})/\#Z_{\underline{\mathsf {G}}}(x)(\mathbb {Z})\) times. More precisely, for any \(x \in \mathsf {G}(\mathbb {R}) \cdot \varLambda \cdot s(L)\) we have

$$\begin{aligned} \sum _{x' \in \underline{\mathsf {G}}(\mathbb {Z}) \cdot x \cap \mathfrak {S}\cdot \varLambda \cdot s(L) } \nu (x') = \frac{\#Z_{\mathsf {G}}(x)(\mathbb {R})}{\#Z_{\underline{\mathsf {G}}}(x)(\mathbb {Z})}. \end{aligned}$$
(6.6)

This follows from an argument similar to [12,  p. 202], by additionally keeping track of the multiplicity function \(\mu \).

In conclusion, for any \(\underline{\mathsf {G}}(\mathbb {Z})\)-invariant subset \(A\subset \underline{\mathsf {V}}(\mathbb {Z}) \cap \mathsf {G}(\mathbb {R}) \cdot \varLambda \cdot s(L)\) we have

$$\begin{aligned} N(A,X) = \frac{1}{r} \#\left[ A \cap (\mathfrak {S}\cdot \varLambda \cdot s(L))_{<X} \right] , \end{aligned}$$
(6.7)

with the caveat that elements on the right-hand-side are weighted by \(\nu \). (Recall that \(r = \#Z_{\mathsf {G}}(v)(\mathbb {R})\) for some \(v\in s(L) \).)

6.6 Averaging and counting lattice points

We consider an averaged version of (6.7) and obtain a useful expression for N(AX) (Lemma 6.5) using a trick due to Bhargava. Then we use this expression to count orbits lying in the ‘main body’ of \(\mathsf {V}\) using geometry-of-numbers techniques, see Proposition 6.6.

Fix a compact, semialgebraic subset \(G_0 \subset \mathsf {G}(\mathbb {R}) \times \varLambda \) of non-empty interior, that in addition satisfies \(K\cdot G_0 = G_0\), \({{\,\mathrm{vol}\,}}(G_0) = 1\) and the projection of \(G_0\) onto \(\varLambda \) is contained in \([1,K_0]\) for some \(K_0>1\). Moreover we may suppose that \(G_0\) is of the form \(G_0'\times [1,K_0]\) where \(G_0'\) is a subset of \(\mathsf {G}(\mathbb {R})\). Equation (6.7) still holds when L is replaced by hL for any \(h\in \mathsf {G}(\mathbb {R})\), by the same argument given above. Thus for any \(\underline{\mathsf {G}}(\mathbb {Z})\)-invariant \(A\subset \underline{\mathsf {V}}(\mathbb {Z}) \cap \mathsf {G}(\mathbb {R}) \cdot \varLambda \cdot s(L)\) we obtain

$$\begin{aligned} N(A,X) = \frac{1}{r} \int _{h\in G_0} \# \left[ A\cap (\mathfrak {S}\cdot \varLambda \cdot hs(L))_{<X} \right] \, dh. \end{aligned}$$
(6.8)

We use Eq. (6.8) to define N(AX) for any subset \(A\subset \underline{\mathsf {V}}(\mathbb {Z}) \cap \mathsf {G}(\mathbb {R}) \cdot \varLambda \cdot s(L)\) which is not necessarily \(\underline{\mathsf {G}}(\mathbb {Z})\)-invariant. We can rewrite this integral using the decomposition \(\mathfrak {S}= \omega \cdot T_c \cdot K\) and an argument similar to [12,  §2.3], which we omit. We obtain:

Lemma 6.5

Given \(X\ge 1\), \(n\in \overline{\mathsf {N}}(\mathbb {R})\), \(t\in \mathsf {T}(\mathbb {R})\) and \(\lambda \in \varLambda \), define \(B(n,t,\lambda ,X) :=(nt\lambda G_0 \cdot s(L))_{<X}\). Then for any subset \(A\subset \underline{\mathsf {V}}(\mathbb {Z}) \cap (\mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L) ) \) we have

$$\begin{aligned} N(A,X) = \frac{1}{r}\int _{\lambda = K_0^{-1}}^{X} \int _{t\in T_c} \int _{n\in \omega } \#\left[ A \cap B(n,t,\lambda ,X) \right] \mu (nt)^{-1} \delta _{\mathsf {G}}(t)^{-1} \, dn \, dt \, d^{\times }\lambda , \end{aligned}$$
(6.9)

where an element \(v\in A\cap B(n,t,\lambda ,X)\) on the right hand side is counted with weight \(\#\{h\in G_0 \mid v\in nt\lambda h \cdot s(L)) \}\).

Before estimating the integrand of (6.9) by counting lattice points in the bounded regions \(B(n,t,\lambda ,X)\), we first need to handle the so-called cuspidal region after recalling and introducing some notation.

Recall that any \(v\in \mathsf {V}(\mathbb {Q})\) can be decomposed as \(\sum v_{\alpha }\) where \(v_{\alpha }\) lies in the weight space corresponding to \(\alpha \in \varPhi _{\mathsf {V}}\). In Sect. 2.6 we have defined the subspace \(\mathsf {V}(M)\subset \mathsf {V}\) of elements v with \(v_{\alpha } = 0\) for all \(\alpha \in M\). Define \(S(M) :=\mathsf {V}(M)(\mathbb {Q})\cap \underline{\mathsf {V}}(\mathbb {Z})\). Recall that \(\alpha _0 \in \varPhi _{\mathsf {V}}\) denotes the highest root of \(\varPhi _{\mathsf {H}}\); it is maximal with respect to the partial ordering on \(\varPhi _{\mathsf {V}}\) defined by (2.1) in Sect. 2.4. We define \(S(\alpha _0)\) as the and \(\underline{\mathsf {V}}(\mathbb {Z})\setminus S(\alpha _0)\) as the of \(\mathsf {V}\).

The next proposition, proved in Sect. 6.10, says that the number of strongly irreducible elements in the cuspidal region is negligible.

Proposition 6.4

There exists \(\delta >0\) such that \(N(S(\alpha _0)^{sirr},X) = O(X^{28-\delta })\).

Having dealt with the cuspidal region, we may now count lattice points in the main body using the following proposition [2,  Theorem 1.3], which strengthens a well-known result of Davenport [26].

Proposition 6.5

Let \(m,n\ge 1\) be integers, and let \(Z\subset \mathbb {R}^{m+n}\) be a semialgebraic subset. For \(T\in \mathbb {R}^m\), let \(Z_T = \{x\in \mathbb {R}^n\mid (T,x) \in Z\}\), and suppose that all such subsets \(Z_T\) are bounded. Then for any unipotent upper-triangular matrix \(u\in {{\,\mathrm{GL}\,}}_n(\mathbb {R})\), we have

$$\begin{aligned} \#(Z_T \cap u\mathbb {Z}^n) = {{\,\mathrm{vol}\,}}(Z_T)+O(\max \{1,{{\,\mathrm{vol}\,}}(Z_{T,j}\}), \end{aligned}$$

where \(Z_{T,j}\) runs over all orthogonal projections of \(Z_T\) to any j-dimensional coordinate hyperplane \((1\le j \le n-1)\). Moreover, the implied constant depends only on Z.

Proposition 6.6

Let \(A = \underline{\mathsf {V}}(\mathbb {Z})\cap (\mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L))\). Then

$$\begin{aligned} N(A \setminus S(\alpha _0) ,X) = \frac{|W_0|}{r}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}) \backslash \mathsf {G}(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}((\varLambda \cdot L)_{<X})+o(X^{28}). \end{aligned}$$

Proof

This follows from estimating the set \(\#[(A\setminus S(\alpha _0)) \cap B(n,t,\lambda ,X)]\) using Proposition 6.5, together with Lemmas 6.4 and 6.5 and Formula (6.3); we omit the details. (See [53,  Proposition 4.6] for a similar proof.) \(\square \)

6.7 End of the proof of Proposition 6.3

The following proposition is proven in Sect. 6.9.

Proposition 6.7

Let \(\mathsf {V}^{alred}\) denote the subset of almost \(\mathbb {Q}\)-reducible elements \(v\in \underline{\mathsf {V}}(\mathbb {Z})\) with \(v\not \in S(\alpha _0)\). Then \(N(\mathsf {V}^{alred},X) = o(X^{28})\).

We now finish the proof of Proposition 6.3. Again let \(A = \underline{\mathsf {V}}(\mathbb {Z})\cap (\mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L))\). Then

$$\begin{aligned} N(A^{sirr},X) = N(A^{sirr} \setminus S(\alpha _0),X)+N(S(\alpha _0)^{sirr},X) \end{aligned}$$

The second term on the right-hand-side is \(o(X^{28})\) by Proposition 6.4, and \(N(A^{sirr} \setminus S(\alpha _0),X)= N(A \setminus S(\alpha _0),X)+o(X^{28})\) by Proposition 6.7. Using Proposition 6.6, we obtain

$$\begin{aligned} N(A^{sirr},X) = \frac{|W_0|}{r}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}) \backslash \mathsf {G}(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}((\varLambda \cdot L)_{<X})+o(X^{28}). \end{aligned}$$

This completes the proof of Proposition 6.3, hence also that of Theorem 6.1.

6.8 Congruence conditions

We now introduce a weighted version of Theorem 6.1. If \(w:\underline{\mathsf {V}}(\mathbb {Z}) \rightarrow \mathbb {R}\) is a function and \(A\subset \underline{\mathsf {V}}(\mathbb {Z})\) is a \(\underline{\mathsf {G}}(\mathbb {Z})\)-invariant subset we define

$$\begin{aligned} N_w(A,X) :=\sum _{\begin{array}{c} v\in \underline{\mathsf {G}}(\mathbb {Z})\backslash A \\ \text {ht}(v)<X \end{array}} \frac{w(v)}{\# Z_{\underline{\mathsf {G}}}(v)(\mathbb {Z})}. \end{aligned}$$
(6.10)

We say a function w is if w is obtained from pulling back a function \(\bar{w} :\underline{\mathsf {V}}(\mathbb {Z}/M\mathbb {Z}) \rightarrow \mathbb {R}\) along the projection \(\underline{\mathsf {V}}(\mathbb {Z}) \rightarrow \underline{\mathsf {V}}(\mathbb {Z}/M\mathbb {Z})\) for some \(M \ge 1\). For such a function write \(\mu _w\) for the average of \(\bar{w}\) where we put the uniform measure on \(\underline{\mathsf {V}}(\mathbb {Z}/M\mathbb {Z})\). The following theorem follows immediately from the proof of Theorem 6.1, compare [12,  §2.5].

Theorem 6.2

Let \(w:\underline{\mathsf {V}}(\mathbb {Z}) \rightarrow \mathbb {R}\) be defined by finitely many congruence conditions. Then

$$\begin{aligned} N_w(\underline{\mathsf {V}}(\mathbb {Z})^{sirr} \cap \mathsf {V}(\mathbb {R})^{sol},X) = \mu _w \frac{|W_0|}{4}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z})\backslash \mathsf {G}(\mathbb {R})\right) {{\,\mathrm{vol}\,}}\left( \mathsf {B}(\mathbb {R})_{<X} \right) + o\left( X^{28}\right) , \end{aligned}$$

where \(W_0\in \mathbb {Q}^{\times }\) is the scalar of Lemma 6.2.

Next we will consider infinitely many congruence conditions. Suppose we are given for each prime p a \(\underline{\mathsf {G}}(\mathbb {Z}_p)\)-invariant function \(w_p: \underline{\mathsf {V}}(\mathbb {Z}_p) \rightarrow [0,1]\) with the following properties:

  • The function \(w_p\) is locally constant outside the closed subset \(\{v\in \underline{\mathsf {V}}(\mathbb {Z}_p) \mid \varDelta (v) = 0\} \subset \underline{\mathsf {V}}(\mathbb {Z}_p)\).

  • For p sufficiently large, we have \(w_p(v) = 1\) for all \(v \in \underline{\mathsf {V}}(\mathbb {Z}_p)\) such that \(p^2 \not \mid \varDelta (v)\).

In this case we can define a function \(w: \underline{\mathsf {V}}(\mathbb {Z}) \rightarrow [0,1]\) by the formula \(w(v) = \prod _{p} w_p(v)\) if \(\varDelta (v) \ne 0\) and \(w(v) = 0\) otherwise. Call a function \(w: \underline{\mathsf {V}}(\mathbb {Z}) \rightarrow [0,1]\) defined by this procedure .

Theorem 6.3

Let \(w: \underline{\mathsf {V}}(\mathbb {Z}) \rightarrow [0,1]\) be an acceptable function. Then

$$\begin{aligned} N_w(\underline{\mathsf {V}}(\mathbb {Z})^{irr}\cap \mathsf {V}^{sol}(\mathbb {R}) ,X) \le \frac{|W_0|}{4} \left( \prod _p \int _{\underline{\mathsf {V}}(\mathbb {Z}_p)} w_p(v) \mathrm {d} v \right) {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}) \backslash \mathsf {G}(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}\left( \mathsf {B}(\mathbb {R})_{<X} \right) + o(X^{28}). \end{aligned}$$

Proof

This inequality follows from Theorem 6.2; the proof is identical to the first part of the proof of [12,  Theorem 2.21].

6.9 Estimates on reducibility and stabilizers

In this subsection we give the proof of Proposition 6.7 and the following proposition which will be useful in Sect. 8.

Proposition 6.8

Let \(\mathsf {V}^{bigstab}\) denote the subset of strongly \(\mathbb {Q}\)-irreducible elements \(v\in \underline{\mathsf {V}}(\mathbb {Z})\) with \(\#Z_{\mathsf {G}}(v)(\mathbb {Q})>1\). Then \(N(\mathsf {V}^{bigstab},X) = o(X^{28})\).

By the same reasoning as [7,  §10.7] it will suffice to prove Lemma 6.6 below, after having introduced some notation.

Let N be the integer of Sect. 5.1 and let p be a prime not dividing N. We define \(\mathsf {V}_p^{alred}\subset \underline{\mathsf {V}}(\mathbb {Z}_p)\) to be the set of vectors whose reduction mod p is almost \(\mathbb {F}_p\)-reducible. We define \(\mathsf {V}_p^{bigstab} \subset \underline{\mathsf {V}}(\mathbb {Z}_p)\) to be the set of vectors \(v\in \underline{\mathsf {V}}(\mathbb {Z}_p)\) such that \(p | \varDelta (v)\) or the image \(\bar{v}\) of v in \(\underline{\mathsf {V}}(\mathbb {F}_p)\) has nontrivial stabilizer in \(\underline{\mathsf {G}}(\mathbb {F}_p)\).

Lemma 6.6

We have

$$\begin{aligned} \lim _{Y\rightarrow +\infty } \prod _{N<p<Y} \int _{\mathsf {V}_p^{alred}} \,dv = 0, \end{aligned}$$

and similarly

$$\begin{aligned} \lim _{Y\rightarrow +\infty } \prod _{N<p<Y} \int _{\mathsf {V}_p^{bigstab}} \,dv = 0. \end{aligned}$$

Proof

The proof is very similar to the proof of [34,  Lemma 5.7] which is in turn based on the proof of [52,  Proposition 6.9]. We first treat the case of \(\mathsf {V}_p^{alred}\). We have the formula

$$\begin{aligned} \int _{\mathsf {V}_p^{alred}} \,dv = \frac{1}{\# \underline{\mathsf {V}}(\mathbb {F}_p)}\# \{v\in \underline{\mathsf {V}}(\mathbb {F}_p)\mid v \text { is almost }\mathbb {F}_p\text {-reducible} \}. \end{aligned}$$
(6.11)

Since \(\#\underline{\mathsf {V}}(\mathbb {F}_p) = \#\underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)+O(p^{27})\), it suffices to prove that there exists a nonnegative \(\delta <1\) with the property that

$$\begin{aligned} \frac{1}{\# \underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)}\# \{v\in \underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)\mid v \text { is almost }\mathbb {F}_p\text {-reducible} \}<\delta \end{aligned}$$

for all p large enough. If \(b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)\), Proposition 5.1 and the triviality of \(\mathrm {H}^1(\mathbb {F}_p, \underline{\mathsf {G}})\) (Lang’s theorem) show that \(\underline{\mathsf {V}}_b(\mathbb {F}_p)\) is partitioned into \(\#\mathrm {H}^1(\mathbb {F}_p, \mathcal {P}_b[2])\) many orbits, each of size \(\#\underline{\mathsf {G}}(\mathbb {F}_p)/\#\mathcal {P}_b[2](\mathbb {F}_p)\). Since \(\#\mathcal {P}_b[2](\mathbb {F}_p) = \#\mathcal {P}_b(\mathbb {F}_p)/2\mathcal {P}_b(\mathbb {F}_p) = \#\mathrm {H}^1(\mathbb {F}_p, \mathcal {P}_b[2])\), we have \(\#\underline{\mathsf {V}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p) = \#\underline{\mathsf {G}}(\mathbb {F}_p)\#\underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)\). Moreover by Corollary 4.3 (or rather a similar statement for \(\mathbb {Z}[1/N]\)-algebras, which continues to hold by the same proof), an orbit corresponding to an element of \(\mathrm {H}^1(\mathbb {F}_p, \mathcal {P}_b[2])\) is almost \(\mathbb {F}_p\)-reducible if and only if its image in \(\mathrm {H}^1(\mathbb {F}_p, \hat{\mathcal {E}}_b[2])\) is trivial. Therefore the left-hand-side of (6.11) equals

$$\begin{aligned} \frac{1}{\#\underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)}\sum _{b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p) } \frac{\# \ker \left( \mathrm {H}^1(\mathbb {F}_p,\mathcal {P}_b[2]) \rightarrow \mathrm {H}^1(\mathbb {F}_p, \hat{\mathcal {E}}_b[2]) \right) }{\# \mathcal {P}_b[2](\mathbb {F}_p) }. \end{aligned}$$
(6.12)

We have \(\# \ker \left( \mathrm {H}^1(\mathbb {F}_p,\mathcal {P}_b[2]) \rightarrow \mathrm {H}^1(\mathbb {F}_p, \hat{\mathcal {E}}_b[2]) \right) \le \#\mathrm {H}^1(\mathbb {F}_p, \mathcal {E}_b[2])\) by Corollary 3.2. Since \(\# \mathrm {H}^1(\mathbb {F}_p, \mathcal {E}_b[2]) =\#\mathcal {E}_b[2]\), the quantity (6.12) is bounded above by

$$\begin{aligned} \frac{1}{\#\underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)}\sum _{b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p) } \frac{\# \mathcal {E}_b[2](\mathbb {F}_p) }{\# \mathcal {P}_b[2](\mathbb {F}_p) } . \end{aligned}$$
(6.13)

Each summand in (6.13) is the inverse of an integer; let \(\eta _p\) be the proportion of \(b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)\) where this summand equals 1. Then the quantity (6.13) is \(\le \eta _p + (1-\eta _p)/2 = 1/2+\eta _p/2\). So it suffices to prove that \(\eta _p \rightarrow \eta \) for some \(\eta <1\). In the notation of Proposition 3.3, let \(C\subset W_{\mathrm {E}}^{\zeta }\) be the subset of elements such that

$$\begin{aligned} \frac{\#\left( (1+\zeta )\varLambda /2\varLambda \right) ^w}{\#\left( (\varLambda /2\varLambda )^{\zeta }\right) ^w } = 1. \end{aligned}$$

Then [54,  Proposition 9.15] applied to the \(W_{\mathrm {E}}^{\zeta }\)-torsor \(\mathfrak {t}^{{{\,\mathrm{rs}\,}}}\rightarrow \mathsf {B}^{{{\,\mathrm{rs}\,}}}\) from Proposition 3.3 implies that

$$\begin{aligned} \frac{1}{\# \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p)}\#\left\{ b\in \underline{\mathsf {B}}^{{{\,\mathrm{rs}\,}}}(\mathbb {F}_p) \mid \frac{\# \mathcal {E}_b[2](\mathbb {F}_p) }{\# \mathcal {P}_b[2](\mathbb {F}_p) } = 1 \right\} = \frac{\#C}{\#W_{\mathrm {E}}^{\zeta }}+O(p^{-1/2}). \end{aligned}$$

Since \(1\not \in C\), this implies that \(\eta <1\).

Next we briefly treat the case of \(\mathsf {V}_p^{bigstab}\), referring to [34,  Lemma 5.7] for more details. By a similar argument to the one above, it suffices to find an element \(w\in W_{\mathrm {E}}^{\zeta }\) with \(\left( (\varLambda /2\varLambda )^{\zeta }\right) ^w = 0\). This can be achieved by taking a Coxeter element of \(W_{\mathrm {E}}\) fixed by \(\zeta \): the end of the proof of [34,  Lemma 5.7] shows that such an element has no nonzero fixed vector on \(\varLambda /2\varLambda \) hence the same is true for its restriction to the \(\zeta \)-fixed points. An example of such a Coxeter element is \(w_1w_6w_2w_3w_5w_4\), using Bourbaki notation [19,  Planche V] for labelling the simple roots of \(E_6\). \(\square \)

We explain why Lemma 6.6 implies Propositions 6.7 and 6.8. We first claim that if \(v\in \underline{\mathsf {V}}(\mathbb {Z})\) with \(b=\pi (v)\) is almost \(\mathbb {Q}\)-reducible, then for each prime p not dividing N the reduction of v in \(\underline{\mathsf {V}}(\mathbb {F}_p)\) is almost \(\mathbb {F}_p\)-reducible. Indeed, either \(\varDelta (b)=0\) in \(\mathbb {F}_p\) (in which case v is almost \(\mathbb {F}_p\)-reducible), or \(p\not \mid \varDelta (b)\) and \(\mathcal {Q}(v)\) is \(\mathsf {G}^{\star }(\mathbb {Q})\)-conjugate to \(\mathcal {Q}(\sigma (b))\). In the latter case Proposition 5.10 implies that \(\mathcal {Q}(v)\) is \(\mathsf {G}^{\star }(\mathbb {Z}_p)\)-conjugate to \(\mathcal {Q}(\sigma (b))\), so their reductions are \(\underline{\mathsf {G}}^{\star }(\mathbb {F}_p)\)-conjugate, proving the claim. By a congruence version of Proposition 6.6, for every subset \(L\subset \mathsf {B}(\mathbb {R})\) considered in Proposition 6.3 and for every \(Y>0\) we obtain the estimate:

$$\begin{aligned} N(\mathsf {V}^{alred}\cap \mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L),X)\le C \left( \prod _{N<p<Y} \int _{\mathsf {V}_p^{alred}} \,dv\right) \cdot X^{28} +o(X^{28}), \end{aligned}$$

where \(C>0\) is a constant independent of Y. By Lemma 6.6, the product of the integrals converges to zero as Y tends to infinity, so \( N(\mathsf {V}^{alred}\cap \mathsf {G}(\mathbb {R})\cdot \varLambda \cdot s(L),X)=o(X^{28})\). Since this holds for every such subset L, we obtain Proposition 6.7.

Note that we have not used Theorem 6.1 in this argument, but we may use it now to prove Proposition 6.8. Again the reduction of an element of \(\mathsf {V}^{bigstab}\) modulo p lands in \(\mathsf {V}_p^{bigstab}\) if p does not divide N, by Proposition 5.9. Since \(\lim _{X\rightarrow +\infty } N(\mathsf {V}^{bigstab},X)/X^{28}\) is \(O(\prod _{N<p<Y}\int _{\mathsf {V}_p^{alred}} dv) \) by Theorem 6.2 and the product of the integrals converges to zero by Lemma 6.6, this proves Proposition 6.8.

6.10 Cutting off the cusp

In this section we prove Proposition 6.4. We continue to use the notation introduced above its statement. We will follow the proof of the \(E_6\) case [61,  Proposition 3.6] using simplifications from the proof of [52,  Theorem 6.2]. We first reduce the statement to a combinatorial result, after introducing some notation.

If \((M_0, M_1)\) is a pair of disjoint subsets of \(\varPhi _{\mathsf {V}}\) we define \(S(M_0,M_1) = \{v\in \underline{\mathsf {V}}(\mathbb {Z}) \mid \forall {\alpha }\in M_0, v_{\alpha }=0; \forall {\alpha }\in M_1, v_{\alpha } \ne 0 \}\). Let \(\mathcal {C}\) be the collection of non-empty subsets \(M_0\subset \varPhi _{\mathsf {V}}\) such that if \(\alpha \in M_0\) and \(\beta \ge \alpha \) then \(\beta \in M_0\). (We have fixed a partial ordering on \(\varPhi _{\mathsf {V}}\) in Sect. 2.4, Eq. (2.1).) Given a subset \(M_0 \in \mathcal {C}\) we define \(\lambda (M_0) :=\{ \alpha \in \varPhi _{\mathsf {V}}\setminus M_0 \mid M_0 \cup \{\alpha \} \in \mathcal {C}\}\), i.e. the set of maximal elements of \(\varPhi _{\mathsf {V}}\setminus M_0\).

By definition of \(\mathcal {C}\) and \(\lambda \) we see that \(S(\{\alpha _0\}) = \cup _{M_0\in \mathcal {C}} S(M_0,\lambda (M_0))\). Therefore to prove Proposition 6.4, it suffices to prove that for each \(M_0\in \mathcal {C}\), either \(S(M_0,\lambda (M_0))^{sirr}=\emptyset \) or \(N(S(M_0,\lambda (M_0)),X)=o(X^{28})\). By the same logic as [61,  Proposition 3.6 and §5] (itself based on a trick due to Bhargava), the estimate \(N(S(M_0,\lambda (M_0)),X)=o(X^{28})\) holds if there exists a subset \(M_1\subset \varPhi _{\mathsf {V}}\setminus M_0\) and a function \(f:M_1 \rightarrow \mathbb {R}_{\ge 0}\) with \(\sum _{\alpha \in M_1} f(\alpha ) <\#M_0\) such that

$$\begin{aligned} \sum _{\alpha \in \varPhi _{\mathsf {G}}^+\setminus M_0 }\alpha +\sum _{\alpha \in M_1}f(\alpha )\alpha \end{aligned}$$

has strictly positive coordinates with respect to the basis \(S_{\mathsf {G}}\). It will thus suffice to prove the following proposition, which is the analogue of [7,  Proposition 29]. Recall that we write \(\alpha = \sum _{i=1}^4 n_i(\alpha ) \beta _i\) for every \(\alpha \in X^*(\mathsf {T})\otimes \mathbb {Q}\).

Proposition 6.9

Let \(M_0 \in \mathcal {C}\) be a subset such that \(\mathsf {V}(M_0)(\mathbb {Q})\) contains strongly \(\mathbb {Q}\)-irreducible elements. Then there exists a subset \(M_1\subset \varPhi _{\mathsf {V}} \setminus M_0\) and a function \(f:M_1 \rightarrow \mathbb {R}_{\ge 0}\) satisfying the following conditions:

  • We have \(\sum _{\alpha \in M_1} p(\alpha ) < \# M_0\).

  • For each \(i = 1,\dots , 4\) we have \(\sum _{\alpha \in \varPhi _{\mathsf {G}}^+} n_i(\alpha )- \sum _{\alpha \in M_0} n_i(\alpha ) + \sum _{\alpha \in M_1} p(\alpha ) n_i(\alpha ) >0\).

The proof of Proposition 6.9 will be given after some useful lemmas. We will use the notation of Table 3 to label the elements of \(\varPhi _{\mathsf {V}}\).

Lemma 6.7

Let \(M_0\in \mathcal {C}\) and suppose that \(\mathsf {V}(M_0)(\mathbb {Q})^{sirr}\ne \emptyset \). Then \(M_0\subset \{1,2,3,4,5,6,7,8,9,10,13\}\) and \(\{9,10\} \not \subset M_0\).

Proof

Let \(M_0\) be such a subset. Suppose that \(11\in M_0\). Since \(M_0 \in \mathcal {C}\) we have \(\{1,2,3,4,5,7,8,11\}\subset M_0\). By Lemma 2.4, this implies that \(\mathsf {V}(M_0)(\mathbb {Q})^{sirr} = \emptyset \), contradiction. The same argument involving the other three subsets of Lemma 2.4 shows that \(15\not \in M_0\), \(\{9,10\}\not \subset M_0\) and \(14\not \in M_0\). Therefore \(M_0\) is contained in the subset of \(\alpha \in \varPhi _{\mathsf {V}}\) with the property that \(\alpha \not \le 11, \alpha \not \le 14\) and \(\alpha \not \le 15\), which is easily checked to be \(\{1,2,3,4,5,6,7,8,9,10,13\}\). \(\square \)

For the reader’s convenience we give the Hasse diagram of the subset \(\{1,2,3,4,5,6,7,8,9,10,13\}\) with respect to the partial ordering on \(\varPhi _{\mathsf {V}}\).

figure az

We say a subset \(M_0\in \mathcal {C}\) is if there exists a subset \(M_1\subset \varPhi _{\mathsf {V}} \setminus M_0\) and a function \(f:M_1\rightarrow \mathbb {R}_{\ge 0}\) satisfying the conclusions of Proposition 6.9. The following lemma is a slight generalization of [52,  Lemma 6.6]; its proof is identical.

Lemma 6.8

Suppose that \(M_0', M_0''\in \mathcal {C}\) with \(M_0''\subset M_0'\), that \(M_1' \subset \varPhi _{\mathsf {V}}\setminus M_0'\), and that there exists a function \(f':M_1'\rightarrow \mathbb {R}_{\ge 0}\) satisfying the conditions of Proposition 6.9. If there exists a function \(g:(M_0'\setminus M_0'')\rightarrow M_1'\) such that

  1. 1.

    \(\alpha \ge g(\alpha )\) for all \(\alpha \in M_0'\setminus M_0''\),

  2. 2.

    \(f'(\alpha )-\#g^{-1}(\alpha )\ge 0\) for all \(\alpha \in M_1'\),

then any \(M_0 \in \mathcal {C}\) such that \(M_0''\subset M_0 \subset M_0'\) is good.

Proof

Given such a subset \(M_0\), define \(f:M_1' \rightarrow \mathbb {R}\) by \(f(\alpha ) = f'(\alpha )-\#[g^{-1}(\alpha )\cap (M_0'\setminus M_0)]\). The second condition on g implies that f takes values in \(\mathbb {R}_{\ge 0 }\). We have

$$\begin{aligned} \sum _{\alpha \in M_1'}f(\alpha ) = \sum _{\alpha \in M_1'}f'(\alpha )-\#[M_0'\setminus M_0] < \#M_0. \end{aligned}$$

Moreover

$$\begin{aligned} \sum _{\alpha \in \varPhi _{\mathsf {G}}^+\setminus M_0 }\alpha +\sum _{\alpha \in M_1'}f(\alpha )\alpha = \left( \sum _{\alpha \in \varPhi _{\mathsf {G}}^+\setminus M'_0 }\alpha +\sum _{\alpha \in M_1'}f'(\alpha )\alpha \right) + \sum _{\alpha \in M_0'\setminus M_0 }(\alpha -g(\alpha )). \end{aligned}$$

The first term on the right-hand-side has positive coordinates with respect to \(S_{\mathsf {G}}\) by assumption on \(f'\) and the second term has nonnegative coordinates by the first condition on g. \(\square \)

Table 5 gives examples of \(M_0'\in \mathcal {C}\) together with a subset \(M_1' \subset \varPhi _{\mathsf {V}} \setminus M_0'\) and a function \(f':M_1'\rightarrow \mathbb {R}_{\ge 0}\) satisfying the conclusions of Proposition 6.9. The third column denotes the coordinates of \(2\sum _{\alpha \in \varPhi _{\mathsf {V}}\setminus M'_0} \alpha \) with respect to the basis \(S_{\mathsf {G}}\). The validity this table can be easily checked in conjunction with Table 3. For example, checking the second row amounts to checking that the nonnegative reals \((v_4,v_{10},v_{14}) = (0,3\frac{1}{2},1\frac{1}{2})\) have the property that \(v_4+v_{10}+v_{14}<6\) and that the vector

$$\begin{aligned} (2v_4+2v_{10}-2v_{14}+4 ,4v_4+8 , 3v_4-v_{10}+v_{14}+4, -v_4 +v_{10}+v_{14}-4 ) \end{aligned}$$

has strictly positive entries.

Table 5 Examples of good \(M_0'\)

Proof of Proposition 6.9

Let \(M_0\in \mathcal {C}\) be such a subset. By Lemma 6.7, \(M_0 \subset \{1,2,3,4,5,6,7,8,9,10,13\}\) and \(\{9,10\}\not \subset M_0\). We prove that \(M_0\) is good by considering various cases together with the information of Table 5. If \(\#M_0\le 2\) then \(M_0 = \{1\}, \{1,2\}\) or \(\{1,4\}\) so taking \(M_1 = {3}\) and \(f(3) = 1/2\) shows that \(M_0\) is good. We may assume for the remainder of the proof that \(\#M_0\ge 3\), which implies that \(\{1,2\} \subset M_0\).

Case 1: \(4\not \in M_0\) and \(9\not \in M_0\). Then \(\{1,2\} \subset M_0\subset \{1,2,3,5,6,10\}\). We apply Lemma 6.8 with \((M_0',M_1',f')\) given by the first row of Table 5, \(M_0'' = \{1,2\}\) and \(g:(M'_0 \setminus M_0'') \rightarrow M_1'\) given by \(3,5,6,10 \mapsto 12\).

Case 2: \(4\not \in M_0\) and \(9\in M_0\). Then \(10 \not \in M_0\) by Lemma 6.7, hence \(M_0 = \{1,2,3,5,6,9\}\). The second row of Table 5 shows that \(M_0\) is good.

Case 3: \(4\in M_0\) and \(9\not \in M_0\). Then \(\{1,2,4\} \subset M_0 \subset \{1,2,3,4,5,6,7,8,10,13\}\). We apply Lemma 6.8 with \((M_0',M_1',f')\) given by the third row of Table 5, \(M_0'' = \{1,2,4\}\) and \(g:(M'_0 \setminus M_0'') \rightarrow M_1'\) given by \(3,5,6\mapsto 9; 7\mapsto 11; 8,10,13\mapsto 15\).

Case 4: \(4\in M_0\) and \(9\in M_0\). Lemma 6.7 then implies that \(10\not \in M_0\). If \(13\in M_0\), then \(M_0 = \{1,2,3,4,5,6,7,8,9,13\}\), which is good by the fourth row of Table 5. If \(13\not \in M_0\), then \(\{1,2,3,4,5,6,9\} \subset M_0 \subset \{1,2,3,4,5,6,7,8,9\}\). We apply Lemma 6.8 with \((M_0',M_1',f')\) given by the fifth row of Table 5, \(M_0'' = \{1,2,3,4,5,6,9\}\) and \(g:(M'_0 \setminus M_0'') \rightarrow M_1'\) given by \(7\mapsto 11\); \(8\mapsto 13\). \(\square \)

7 Counting integral orbits in \(\mathsf {V}^{\star }\)

In this section we count integral orbits in the representation \(\underline{\mathsf {V}}^{\star }\). Since \(\underline{\mathsf {V}}^{\star }\) is essentially the space of binary quartic forms, the methods here will be very similar to the ones employed by Bhargava and Shankar [12].

7.1 The local Selmer ratio of a self-dual isogeny

The following lemma is presumably well-known; it generalizes the observation that \(\# (E(\mathbb {Q}_p)/nE(\mathbb {Q}_p)) = |n|_p^{-1} \#E(\mathbb {Q}_p)[n]\) if \(E/\mathbb {Q}_p\) is an elliptic curve. It follows from local duality theorems.

Lemma 7.1

Let \(k = \mathbb {R}\) or \(\mathbb {Q}_p\) for some p and let K be a finite extension of k. Write \(| \cdot |_{k}: k^{\times } \rightarrow \mathbb {R}_{>0}\) for the normalized absolute value of k. Let A be an abelian variety over K with dual \(A^{\vee }\). Let \(\lambda : A\rightarrow A^{\vee }\) be a self-dual isogeny. Then the degree of \(\lambda \) is a square number \(m^2\) for some \(m\in \mathbb {Z}_{\ge 1}\). Consider the quantity

$$\begin{aligned} c(\lambda ) := \frac{\#\left( A^{\vee }(K)/\lambda (A(K)) \right) }{\#A[\lambda ](K)}. \end{aligned}$$

Then \(c(\lambda ) = |m|_{k}^{-[K:k]}\).

Proof

The selfduality of \(\lambda \) implies that there is a perfect alternating pairing \(A[\lambda ]\times A[\lambda ] \rightarrow \mathbb {G}_{m,K} , \) so the degree of \(\lambda \) is a square number \(m^2\) for some \(m\in \mathbb {Z}_{\ge 0}\). This pairing induces a pairing on Galois cohomology \(\mathrm {H}^1(K,A[\lambda ])\times \mathrm {H}^1(K,A[\lambda ]) \rightarrow \mathrm {H}^2(K,\mathbb {G}_m)\hookrightarrow \mathbb {Q}/\mathbb {Z}\) which is also perfect and alternating and the image of the descent map \(A^{\vee }(K)/\lambda (A(K))\rightarrow \mathrm {H}^1(K,A[\lambda ])\) is a maximal isotropic subspace [47,  Proposition 4.10]. This implies that

$$\begin{aligned} \left( \#\frac{A^{\vee }(K)}{\lambda (A(K))}\right) ^2 = \# \mathrm {H}^1(K,A[\lambda ]). \end{aligned}$$
(7.1)

By the local Euler characteristic formula [38,  Theorems I.2.8 and I.2.13], we obtain the equality

$$\begin{aligned} \#\mathrm {H}^1(K,A[\lambda ]) = \#\mathrm {H}^0(K,A[\lambda ]) \#\mathrm {H}^2(K,A[\lambda ]) |m|^{-2[K:\mathbb {Q}_p]}_{k} . \end{aligned}$$
(7.2)

We have \(\mathrm {H}^0(K,A[\lambda ])=A[\lambda ](K)\) and local Tate duality implies that \(\mathrm {H}^2(K,A[\lambda ])\simeq \mathrm {H}^0(K,A[\lambda ])^{\vee } \simeq A[\lambda ](K)^{\vee }\simeq A[\lambda ](K)\) too. The lemma follows from combining Eqs. (7.1) and (7.2). \(\square \)

Corollary 7.1

Let \(k = \mathbb {R}\) or \(\mathbb {Q}_p\) for some p. If \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(k)\), then in the notation of Lemma 7.1 the quantities \(c(\rho :P_b\rightarrow P_b^{\vee })\) and \(c(\hat{\rho }:P_b^{\vee } \rightarrow P_b)\) coincide and equal

$$\begin{aligned} {\left\{ \begin{array}{ll} 1/2 &{}\quad \text {if }k = \mathbb {R},\\ 2 &{}\quad \text {if }k = \mathbb {Q}_2, \\ 1 &{}\quad \text {else}. \end{array}\right. } \end{aligned}$$

Proof

We only treat the case of \(c(\rho )\), the case of \(c(\hat{\rho })\) being analogous. The isogeny \(\rho \) is self-dual by Lemma 3.2 and its kernel is isomorphic to \(E_b[2]\), which has order 4. Apply Lemma 7.1. \(\square \)

7.2 Heights, measures and fundamental sets

We discuss objects and notions analogous to those of Sects. 6.1 and 6.2 in the context of the representation \(\underline{\mathsf {V}}^{\star }\). Recall from Sect. 4.2 that we have defined the isomorphism \(\mathcal {Q}:\mathsf {B}\rightarrow \mathsf {B}^{\star }\) which was spread out to an isomorphism \(\underline{\mathsf {B}}_S\rightarrow \underline{\mathsf {B}}^{\star }_S\) in Sect. 5.1. We will use \(\mathcal {Q}\) to transport definitions of \(\underline{\mathsf {B}}_S\) to \(\underline{\mathsf {B}}^{\star }_S\). For example, we set \(\varDelta ^{\star }:=\varDelta \circ \mathcal {Q}^{-1}\), \(\varDelta ^{\star }_{E}:=\varDelta _{E}\circ \mathcal {Q}^{-1}\), \(\varDelta ^{\star }_{\hat{E}}:=\varDelta _{\hat{E}}\circ \mathcal {Q}^{-1}\), all elements of \(S[\underline{\mathsf {B}}^{\star }]\). Furthermore we define \(\underline{\mathsf {B}}^{\star ,{{\,\mathrm{rs}\,}}}:=\underline{\mathsf {B}}^{\star }_S[\left( \varDelta ^{\star }\right) ^{-1}]\), and we define \(\underline{\mathsf {V}}^{\star ,{{\,\mathrm{rs}\,}}}\) as the preimage of \(\underline{\mathsf {B}}^{\star ,{{\,\mathrm{rs}\,}}}\) under \(\pi ^{\star }:\underline{\mathsf {V}}^{\star }\rightarrow \underline{\mathsf {B}}^{\star }\). The S-schemes \(\underline{\mathsf {B}}^{\star ,{{\,\mathrm{rs}\,}}}\) and \(\underline{\mathsf {V}}^{\star ,{{\,\mathrm{rs}\,}}}\) have generic fibre \(\mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}\) and \(\mathsf {V}^{\star ,{{\,\mathrm{rs}\,}}}\).

For any \(b\in \mathsf {B}^{\star }(\mathbb {R})\) we define the of b by the formula

$$\begin{aligned} \text {ht}(b) = \text {ht}(\mathcal {Q}^{-1}(b)), \end{aligned}$$

where the height on \(\mathsf {B}(\mathbb {R})\) is defined in Sect. 6.1. We define \(\text {ht}(v) = \text {ht}(\pi ^{\star }(v))\) for any \(v\in \mathsf {V}^{\star }(\mathbb {R})\). If A is a subset of \(\mathsf {V}^{\star }(\mathbb {R})\) or \(\mathsf {B}^{\star }(\mathbb {R})\) and \(X\in \mathbb {R}_{>0}\) we write \(A_{<X}\subset A\) for the subset of elements height \(<X\).

Let \(\omega _{\mathsf {G}^{\star }}\) be a generator for the free rank 1 \(\mathbb {Z}\)-module of left-invariant top differential forms on \(\underline{\mathsf {G}}^{\star }\) over \(\mathbb {Z}\). It is well-defined up to sign and it determines Haar measures dg on \(\mathsf {G}^{\star }(\mathbb {R})\) and \(\mathsf {G}^{\star }(\mathbb {Q}_p)\) for each prime p.

Let \(\omega _{\mathsf {V}^{\star }}\) be a generator for the free rank one \(\mathbb {Z}\)-module of left-invariant top differential forms on \(\underline{\mathsf {V}}^{\star }\). Then \(\omega _{\mathsf {V}^{\star }}\) is uniquely determined up to sign and it determines Haar measures dv on \(\mathsf {V}^{\star }(\mathbb {R})\) and \(\mathsf {V}^{\star }(\mathbb {Q}_p)\) for every prime p. We define the top form \(\omega _{\mathsf {B}^{\star }}\) on \(\mathsf {B}^{\star }\) as the pullback of the form \(\omega _{\mathsf {B}}\) from Sect. 6.2 under the isomorphism \(\mathcal {Q}^{-1} :\mathsf {B}^{\star }\rightarrow \mathsf {B}\). It defines measures db on \(\mathsf {B}^{\star }(\mathbb {R})\) and \(\mathsf {B}^{\star }(\mathbb {Q}_p)\) for every prime p.

Lemma 7.2

There exists a constant \(W_1\in \mathbb {Q}^{\times }\) with the following properties:

  1. 1.

    Let \(\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} = \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\cap \mathsf {V}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {Q}_p)\) and define a function \(m_p: \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} \rightarrow \mathbb {R}_{\ge 0}\) by the formula

    $$\begin{aligned} m_p(v) = \sum _{v' \in \underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\backslash \left( \mathsf {G}^{\star }(\mathbb {Q}_p)\cdot v\cap \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p) \right) } \frac{\#Z_{\underline{\mathsf {G}}^{\star }}(v)(\mathbb {Q}_p) }{\#Z_{\underline{\mathsf {G}}^{\star }}(v)(\mathbb {Z}_p) } . \end{aligned}$$
    (7.3)

    Then \(m_p(v)\) is locally constant.

  2. 2.

    Let \(\underline{\mathsf {B}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} = \underline{\mathsf {B}}^{\star }(\mathbb {Z}_p)\cap \mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {Q}_p)\) and let \(\psi _p: \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}} \rightarrow \mathbb {R}_{\ge 0}\) be a bounded, locally constant function which satisfies \(\psi _p(v) = \psi _p(v')\) when \(v,v'\in \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}}\) are conjugate under the action of \(\mathsf {G}^{\star }(\mathbb {Q}_p)\). Then we have the formula

    $$\begin{aligned} \int _{v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}}} \psi _p(v) \mathrm {d} v = |W_1|_p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\right) \int _{b\in \underline{\mathsf {B}}^{\star }(\mathbb {Z}_p)^{{{\,\mathrm{rs}\,}}}} \sum _{g\in \mathsf {G}^{\star }(\mathbb {Q}_p)\backslash \underline{\mathsf {V}}^{\star }_b(\mathbb {Z}_p) } \frac{m_p(v)\psi _p(v) }{\# Z_{\underline{\mathsf {G}}^{\star }}(v)(\mathbb {Q}_p)} \mathrm {d} b . \end{aligned}$$
    (7.4)

Proof

The proof is the same as that of [53,  Proposition 3.3], using the fact that the sum of the weights of the \(\mathbb {G}_m\)-action on \(\mathsf {V}^{\star }\) equals 28, the sum of the invariants of \(\mathsf {B}^{\star }\).

\(\square \)

We henceforth fix a constant \(W_1\in \mathbb {Q}^{\times }\) satisfying the properties of Lemma 7.2.

We now construct special subsets of \(\mathsf {V}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\) which serve as our fundamental domains for the action of \(\mathsf {G}^{\star }(\mathbb {R})\) on \(\mathsf {V}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\). We have to slightly alter the fundamental sets used in [12] because the discriminant we use here to define \(\mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\) is larger than the discriminant used by Bhargava and Shankar.

In the notation of [12,  §2.1], if \(i=0,1,2+, 2-\), let \(\mathsf {V}^{\star }(\mathbb {R})^{(i)}\subset \mathsf {V}^{\star }(\mathbb {R})\) be the subset of elements \((b_2,b_6,q)\) where the binary quartic form q has \(4-2i\) real roots, and is positive/negative definite if \(i=2+/2-\) respectively. For each such i, an open semialgebraic subset \(K^{(i)}\subset \{b\in \mathsf {B}^{\star }(\mathbb {R}) \mid b_2=b_6=0 \text { and } \text {ht}(b)=1\}\) and a section \(s_{(i)}:K^{(i)} \rightarrow \mathsf {V}^{\star }(\mathbb {R})^{(i)}\) of \(\pi ^{\star }\) is given in [12,  Table 1], with the property that (if \(\varLambda = \mathbb {R}_{>0}\)):

$$\begin{aligned} \{v\in \mathsf {V}^{\star }(\mathbb {R}) \mid b_2=b_6=0 \text { and }\varDelta ^{\star }_{\hat{E}}(\pi ^{\star }(v))\ne 0\} = \bigsqcup _{i=0,1,2\pm }\mathsf {G}^{\star }(\mathbb {R}) \cdot \varLambda \cdot s_{(i)}(L^{(i)}). \end{aligned}$$

Let \(M^{(i)}\subset \mathsf {V}^{\star }(\mathbb {R})\) be the subset of elements \(v=(b_2,b_6,q) \) satisfying \(\text {ht}(v)=1\) and \(q\in \varLambda \cdot K^{(i)}\). Let \(L_1,\dots ,L_k\) be the connected components of \(\pi ^{\star }(M^{(i)})\cap \mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\) for all \(i=0,1,2\pm \); they are also the connected components of \(\{b\in \mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\mid \text {ht}(b)=1 \}\).

By construction, the open subsets \(L_1,\dots , L_k\) of \(\{b\in \mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\mid \text {ht}(b)=1 \}\) come equipped with sections \(s_i :L_i \rightarrow \mathsf {V}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\) of \(\pi ^{\star }:\mathsf {V}^{\star }(\mathbb {R}) \rightarrow \mathsf {B}^{\star }(\mathbb {R}) \) and satisfy the following properties, which follow from the corresponding properties of \(K^{(i)}\):

  • For each i, \(L_i\) is connected and semialgebraic and \(s_i\) is a semialgebraic map with bounded image.

  • Set \(\varLambda = \mathbb {R}_{>0}\). Then we have an equality

    $$\begin{aligned} \mathsf {V}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R}) = \bigsqcup _{i=1}^k \mathsf {G}^{\star }(\mathbb {R}) \cdot \varLambda \cdot s_i(L_i). \end{aligned}$$
    (7.5)

If \(v\in s_i(L_i)\) let \(r_i = \# Z_{\mathsf {G}^{\star }}(v)(\mathbb {R})\); this integer is independent of the choice of v.

7.3 Counting integral orbits in \(\mathsf {V}^{\star }\)

For any \(\underline{\mathsf {G}}^{\star }(\mathbb {Z})\)-invariant subset \(A\subset \underline{\mathsf {V}}^{\star }(\mathbb {Z})\) and function \(w:\underline{\mathsf {V}}^{\star }(\mathbb {Z}) \rightarrow \mathbb {R}\), define

$$\begin{aligned} N_w(A,X) :=\sum _{v\in \underline{\mathsf {G}}^{\star }(\mathbb {Z})\backslash A_{<X}} \frac{w(v)}{\# Z_{\underline{\mathsf {G}}^{\star }}(v)(\mathbb {Z})}. \end{aligned}$$

Let k be a field of characteristic not dividing N. We say an element \(v\in \underline{\mathsf {V}}^{\star }(k)\) with \(b^{\star } = \pi ^{\star }(v)\) and \(b = \mathcal {Q}^{-1}(b^{\star })\) is:

  • k- if \(\varDelta (b)=0\) or if v is \(\underline{\mathsf {G}}^{\star }(k)\)-conjugate to \(\mathcal {Q}(\sigma (b))\), and k- otherwise.

  • k- if \(\varDelta (b)\ne 0\) and v lies in the image of the map \(\eta ^{\star }_b:\mathcal {P}_b(k)/\hat{\rho }(\mathcal {P}_b^{\vee }(k)) \rightarrow \underline{\mathsf {G}}^{\star }(k)\backslash \underline{\mathsf {V}}^{\star }_{b^{\star }}(k)\) of Theorem 4.2.

In more classical language, an element \(v = (b_2,b_6,q) \in \underline{\mathsf {V}}(k)\) is k-reducible if \(\varDelta ^{\star }(v)=0\) or the binary quartic form q has a k-rational linear factor. (This follows from Lemma 2.2.) The definition of k-soluble elements introduced here does not relate in a direct way to the notion of solubility used in the more classical sense as in [12]; it depends not only on q but also on \(b_2\) and \(b_6\).

For any \(A\subset \underline{\mathsf {V}}^{\star }(\mathbb {Z})\) write \(A^{irr}\subset A\) for the subset of \(\mathbb {Q}\)-irreducible elements. Write \(\mathsf {V}^{\star }(\mathbb {R})^{sol} \subset \mathsf {V}^{\star }(\mathbb {R})\) for the subset of \(\mathbb {R}\)-soluble elements. We say a function \(w :\underline{\mathsf {V}}^{\star }(\mathbb {Z})\rightarrow \mathbb {R}\) is defined by if it is the pullback of a function \(\bar{w}:\underline{\mathsf {V}}^{\star }(\mathbb {Z}/M\mathbb {Z}) \rightarrow \mathbb {R}\) and for such w write \(\mu _w\) for the average of \(\bar{w}\) when \(\underline{\mathsf {V}}^{\star }(\mathbb {Z}/M\mathbb {Z})\) is given the uniform probability measure. Recall that \(W_1\) denotes the constant fixed in Sect. 7.2 satisfying the conclusions of Lemma 7.2.

Theorem 7.1

Let \(w:\underline{\mathsf {V}}^{\star }(\mathbb {Z}) \rightarrow \mathbb {R}\) be a function defined by finitely many congruence conditions. Then

$$\begin{aligned} N_w(\underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr} \cap \mathsf {V}^{\star }(\mathbb {R})^{sol},X) \!=\! \mu _w \frac{|W_1|}{2}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}) \backslash \underline{\mathsf {G}}^{\star }(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}(\mathsf {B}(\mathbb {R})_{<X}) \!+\!o(X^{28}). \end{aligned}$$

Proof

For every \(b\in \mathsf {B}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\) and \(v\in \mathsf {V}^{\star }_b(\mathbb {R})\) we have equalities

$$\begin{aligned} \#\left( \mathsf {G}^{\star }(\mathbb {R})\backslash \mathsf {V}^{\star }_b(\mathbb {R})^{sol}\right) /\#Z_{\mathsf {G}^{\star }}(v)(\mathbb {R})= \#\left( P_b(\mathbb {R})/\hat{\rho }(P^{\vee }_b(\mathbb {R}))\right) /\#P_b^{\vee }[\hat{\rho }](\mathbb {R})=1/2, \end{aligned}$$

where the first follows from the definition of \(\mathbb {R}\)-solubility and Lemma 4.6, and the second from Corollary 7.1.

By an argument identical to that of [34,  Lemma 5.5], the subset \(\mathsf {V}^{\star }(\mathbb {R})^{sol}\) is open and closed in \(\underline{\mathsf {V}}^{\star ,{{\,\mathrm{rs}\,}}}(\mathbb {R})\). Using the decomposition (7.5) and discarding those sections which do not contain \(\mathbb {R}\)-soluble elements, it suffices to prove that for each \(L_i\) we have

$$\begin{aligned} N_w(\mathsf {G}^{\star }(\mathbb {R})\cdot \varLambda \cdot s_i(L_i) \cap \underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr},X)= & {} \mu _w \frac{|W_1|}{r_i}{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}) \backslash \mathsf {G}^{\star }(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}((\varLambda \cdot L_i)_{<X})\\&+o\left( X^{28} \right) . \end{aligned}$$

This may be proved in exactly the same way as Proposition 6.3 using the results of [12,  §2]; we omit the details. \(\square \)

Next we consider infinitely many congruence conditions. The key input is the following uniformity estimate, which follows immediately from the one obtained by Bhargava and Shankar [12,  Theorem 2.13]. We recall from Sects. 3.6, 4.2 that \(\varDelta ^{\star }= \varDelta ^{\star }_{E}\varDelta ^{\star }_{\hat{E}}\) (up to a unit in \(\mathbb {Z}[1/N]\)) and that \(\varDelta ^{\star }_{\hat{E}}\) coincides with the usual discriminant of the binary quartic form q (again up to a unit in \(\mathbb {Z}[1/N]\)).

Proposition 7.1

For a prime p not dividing N, let \(\mathcal {W}_p(\mathsf {V}^{\star })\) denote the subset of \(v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr}\) such that \(p^2 \mid \varDelta ^{\star }_{\hat{E}}(v)\). For any \(M>N\) we have

$$\begin{aligned} \lim _{X\rightarrow \infty } N\left( \cup _{p>M}\mathcal {W}_p(\mathsf {V}^{\star }),X \right) /X^{28} = O(1/\log M) \end{aligned}$$
(7.6)

where the implied constant is independent of M.

Suppose we are given for each prime p a \(\underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\)-invariant function \(w_p: \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p) \rightarrow [0,1]\) with the following properties:

  • The function \(w_p\) is locally constant outside a closed subset of \(\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\) of measure zero.

  • For p sufficiently large and not dividing N, we have \(w_p(v) = 1\) for all \(v \in \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\) such that \(p^2 \not \mid \varDelta ^{\star }_{\hat{E}}(v)\) and \(\varDelta ^{\star }(v)\ne 0\).

In this case we can define a function \(w: \underline{\mathsf {V}}^{\star }(\mathbb {Z}) \rightarrow [0,1]\) by the formula \(w(v) = \prod _{p} w_p(v)\) if \(\varDelta ^{\star }(v) \ne 0\) and \(w(v) = 0\) otherwise. Call a function \(w: \underline{\mathsf {V}}^{\star }(\mathbb {Z}) \rightarrow [0,1]\) defined by this procedure .

Theorem 7.2

Let \(w: \underline{\mathsf {V}}^{\star }(\mathbb {Z}) \rightarrow [0,1]\) be an acceptable function. Then

$$\begin{aligned} N_w(\underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr}\cap \mathsf {V}^{\star }(\mathbb {R})^{sol} ,X)= & {} \frac{|W_1|_{\infty }}{2} \left( \prod _p \int _{\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)} w_p(v) \mathrm {d} v \right) \\&{{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}) \backslash \mathsf {G}^{\star }(\mathbb {R}) \right) {{\,\mathrm{vol}\,}}\left( \mathsf {B}(\mathbb {R})_{<X} \right) + o(X^{28}). \end{aligned}$$

Proof

Our definition of an acceptable function slightly differs from the one the one employed in [12,  §2.7], since we only require that for sufficiently large primes p, \(w_p(v)=1\) if \(p^2\not \mid \varDelta ^{\star }_{\hat{E}}(v)\) and \(\varDelta ^{\star }(v)\ne 0\). Let \(S\subset \underline{\mathsf {V}}^{\star }(\mathbb {Z})\) be the subset of b with \(\varDelta ^{\star }(b)=0\). Bearing in mind that \(N(S,X)=o(X^{28})\) and the closure of S in \(\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\) is of measure zero, the proof of the theorem is identical to that of [12,  Theorem 2.21], using Proposition 7.1. \(\square \)

Remark 7.1

We obtain an equality in Theorem 7.2 whereas in Theorem 6.3 we only obtain an upper bound. This is because the proof of Theorem 7.2 relies on the uniformity estimate for \(\varDelta _{\hat{E}}^*\) of Proposition 7.1. We expect a similar uniformity estimate to hold for \(\underline{\mathsf {V}}\) with respect to \(\varDelta \), but this is not known.

8 Proof of the main theorems

In this section we combine all the previous results and prove the main theorems of the introduction. To calculate the average size of the \(\rho \)-Selmer group in Sect. 8.2, we reduce it to calculating the average size of the \(\hat{\rho }\)-Selmer group using the bigonal construction (Theorem 3.1). To make this reduction step precise, we consider the effect of changing the parameter space by an automorphism in Sect. 8.1.

8.1 Changing the parameter space

This section is based on a remark of Poonen and Stoll [48,  Remark 8.11]. Let \(n\ge 1\) be an integer and \(\mathcal {B}= \mathbb {A}^n_{\mathbb {Z}}\) be affine n-space with coordinates \(x_1,\dots ,x_n\). Suppose that \(\mathcal {B}\) is equipped with a \(\mathbb {G}_m\)-action such that \(\lambda \cdot x_i = \lambda ^{d_i}x_i\) for some set of positive weights \(d_1\le \dots \le d_n\); let d be their sum.

Definition 8.1

Let T be a subset of \(\mathcal {B}(\mathbb {R}) \times \prod _{p}\mathcal {B}(\mathbb {Z}_p)\) of the form \(T_{\infty } \times \prod _p T_p\). We say T is a if

  • \(T_{\infty }\subset \mathcal {B}(\mathbb {R})\) is open, bounded and semialgebraic.

  • For each prime number p, \(T_p\subset \mathcal {B}(\mathbb {Z}_p)\) is open and compact and for all but finitely many p we have \(T_p = \mathcal {B}(\mathbb {Z}_p)\).

If in addition \(T_{\infty }\) is a product of intervals \((a_1,b_1)\times \cdots \times (a_n,b_n)\), we say T is a .

Let T be a generalized box. We define

$$\begin{aligned} \mathscr {E}_{T,<X} :=\{b\in \mathcal {B}(\mathbb {Q}) \mid b\in X\cdot T_{\infty } \text { and } b\in T_p \text { for all } p \}. \end{aligned}$$

Every element of \(\mathscr {E}_{T,<X}\) lies in \(\mathcal {B}(\mathbb {Z})\). We note that if \(\mathcal {B}= \underline{\mathsf {B}}\) and \(T = [-1,1]^6 \times \prod \underline{\mathsf {B}}(\mathbb {Z}_p)\) then \(\mathscr {E}_{T,<X}\) coincides with the elements of \(\underline{\mathsf {B}}(\mathbb {Z})\) of height bounded by X.

The top form \(dx_1\wedge \dots \wedge dx_n\) defines measures on \(\mathcal {B}(\mathbb {R})\) and \(\mathcal {B}(\mathbb {Z}_p)\) for every prime p which satisfy \({{\,\mathrm{vol}\,}}(\mathcal {B}(\mathbb {Z}_p)) = 1\) for all p. We define the volume of a generalized box T by \({{\,\mathrm{vol}\,}}(T_{\infty }) \prod _p {{\,\mathrm{vol}\,}}(T_p)\); the previous sentence shows this is well-defined.

We first show that the volume is well-behaved under \(\mathbb {G}_m\)-equivariant automorphisms.

Lemma 8.1

Let \(\phi :\mathcal {B}\rightarrow \mathcal {B}\) be a \(\mathbb {G}_m\)-equivariant morphism such that \(\phi _{\mathbb {Q}}:\mathcal {B}_{\mathbb {Q}}\rightarrow \mathcal {B}_{\mathbb {Q}}\) is an isomorphism. Let T be a generalized box of \(\mathcal {B}\). Then \(\phi (T)\) is a generalized box, \(\phi (\mathscr {E}_{T,<X}) = \mathscr {E}_{\phi (T),<X}\) and moreover \({{\,\mathrm{vol}\,}}(\phi (T)) = {{\,\mathrm{vol}\,}}(T)\).

Proof

The first two claims follow from the definitions; it remains to compute the volume of \(\phi (T)\).

Up to an element of \(\mathbb {Q}^{\times }\), the form \(\omega = dx_1\wedge \dots \wedge dx_n\) is the unique nonzero n-form of \(\mathcal {B}_{\mathbb {Q}}\) that is homogeneous of degree d. Since the pullback \(\phi ^*\omega \) has the same properties, \(\phi ^*\omega = a\cdot \omega \) for some \(a\in \mathbb {Q}^{\times }\). The lemma follows from the product formula \(|a|\prod _p |a|_p=1\). \(\square \)

Let \(f:\mathcal {B}(\mathbb {Q}) \rightarrow \mathbb {R}_{\ge 0}\) be a function such that \(f(\lambda \cdot b) = f(b)\) for all \(\lambda \in \mathbb {Q}^{\times }\) and \(b\in \mathcal {B}(\mathbb {Q})\). Let \(C \in \mathbb {R}_{\ge 0}\) be a constant. We say \({{\,\mathrm{Eq}\,}}(f,C)\) T if

$$\begin{aligned} \sum _{b\in \mathscr {E}_{T,<X}} f(b) = C {{\,\mathrm{vol}\,}}(T) X^d+o(X^d) \end{aligned}$$
(8.1)

as \(X\rightarrow +\infty \). We say \({{\,\mathrm{Eq}\,}}^{\le }(f,C)\) T if in (8.1) the equality is replaced by \(\le \).

Proposition 8.1

Let fC and \(\phi \) be as above and \(\bullet \in \{\emptyset ,\le \}\). Suppose that \({{\,\mathrm{Eq}\,}}^{\bullet }(f,C)\) holds for all boxes of \(\mathcal {B}\). Then \({{\,\mathrm{Eq}\,}}^{\bullet }(f\circ \phi ,C)\) holds for all generalized boxes of \(\mathcal {B}\).

Proof

We only consider the case of \({{\,\mathrm{Eq}\,}}^{\le }(f,C)\), the case of \({{\,\mathrm{Eq}\,}}(f,C)\) being analogous. By approximating the infinite component using rectangles, \({{\,\mathrm{Eq}\,}}^{\le }(f,C)\) holds for all generalized boxes of \(\mathcal {B}\). If T is a generalized box then by Lemma 8.1, \(\phi (T)\) is a generalized box with the same volume as T and \(\phi (\mathscr {E}_{T,<X}) = \mathscr {E}_{\phi (T),<X}\). So

$$\begin{aligned} \sum _{\mathscr {E}_{T,<X}} f(\phi (b)) = \sum _{\mathscr {E}_{\phi (T),<X}}f(b) \le C {{\,\mathrm{vol}\,}}(\phi (T)) X^d+o(X^d) = C{{\,\mathrm{vol}\,}}(T) X^d +o(X^d), \end{aligned}$$

proving the proposition. \(\square \)

Remark 8.1

Most orbit-counting results using the geometry-of-numbers methods as employed in §6 are valid for any generalized box, with the same proof. Proposition 8.1 shows that for these counting results, the choice of homogeneous coordinates of \(\mathcal {B}\) is irrelevant. For example, consider the family of elliptic curves

$$\begin{aligned} y^2+p_2xy+p_6y = x^3+p_8x+p_{12}. \end{aligned}$$
(8.2)

After applying a homogeneous change of coordinates we obtain the family

$$\begin{aligned} (y+p_2x+p_6)^2 = x^3+p_8x+p_{12}. \end{aligned}$$
(8.3)

The results of [10,11,12,13] are valid for any box of \(\mathbb {A}^2_{(p_8,p_{12})}\) hence trivially for any box of \(\mathbb {A}^4_{(p_2,p_6,p_8,p_{12})}\) parametrizing elliptic curves in Family (8.3). Proposition 8.1 shows that these results remain valid for any generalized box for the elliptic curves in Family (8.2) too.

8.2 The average size of the \(\rho \)-Selmer group

Recall that \(\mathscr {E}\subset \underline{\mathsf {B}}(\mathbb {Z})\) denotes the subset of elements b with \(\varDelta (b)\ne 0\). We say a subset \(\mathcal {F} \subset \mathscr {E}\) is if it is the preimage of a subset of \(\underline{\mathsf {B}}(\mathbb {Z}/M\mathbb {Z})\) under the mod M reduction map \(\mathscr {E} \rightarrow \underline{\mathsf {B}}(\mathbb {Z}/M\mathbb {Z})\).

Theorem 8.1

Let \(\mathcal {F}\subset \mathscr {E}\) be a subset defined by finitely many congruence conditions. Then

$$\begin{aligned} \lim _{X\rightarrow \infty } \frac{1}{\# \mathcal {F}_{<X}} \sum _{b\in \mathcal {F}_{<X}} \# {{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b = 3. \end{aligned}$$

The proof is very similar to the proof of [34,  Theorem 6.1]; we include it here for completeness. Note that we obtain an equality here and not just an upper bound using the uniformity estimate of Proposition 7.1 combined with Proposition 5.10. We first prove a local statement. Recall that there is a \(\mathbb {G}_m\)-action on \(\underline{\mathsf {B}}\) which satisfies \(\lambda \cdot p_i = \lambda ^i p_i\), and that \(\mathscr {E}_p\) denotes the subset of elements b of \(\underline{\mathsf {B}}(\mathbb {Z}_p)\) with \(\varDelta (b)\ne 0\), equipped with the p-adic subspace topology. Also let \(\mathcal {F}_p\) be the closure of \(\mathcal {F}\) in \(\mathscr {E}_p\).

Proposition 8.2

Let \(b_0 \in \mathcal {F}\). Then we can find for each prime p dividing N an open compact neighborhood \(W_p\) of \(b_0\) in \(\mathscr {E}_p\) with the following property. Let \(\mathcal {F}_W = \mathcal {F} \cap \left( \prod _{p | N} W_p \right) \). Then we have

$$\begin{aligned} \lim _{X\rightarrow \infty } \frac{ \sum _{b\in \mathcal {F}_W,\; \text {ht}(b)<X }\# {{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b }{\# \{b \in \mathcal {F}_W \mid \text {ht}(b) < X \}} =3. \end{aligned}$$

Proof

Choose sets \(W_p\) and integers \(n_p\ge 0\) for p|N satisfying the conclusion of Corollary 5.4. We assume after shrinking the \(W_p\) that they satisfy \(W_p \subset \mathcal {F}_p\). If p does not divide N, set \(W_p = \mathcal {F}_p\) and \(n_p = 0\). Let \(M = \prod _{p} p^{n_p}\).

For \(v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z})\) with \(b^{\star } = \pi ^{\star }(v)\) and \(\mathcal {Q}^{-1}(b^{\star }) = b \in \mathsf {B}(\mathbb {Q})\), define \(w(v) \in \mathbb {Q}_{\ge 0}\) by the following formula:

$$\begin{aligned} w(v) = {\left\{ \begin{array}{ll} \left( {\sum }_{v'\in \underline{\mathsf {G}}^{\star }(\mathbb {Z})\backslash \left( \underline{\mathsf {G}}^{\star }(\mathbb {Q})\cdot v \cap \underline{\mathsf {V}}^{\star }(\mathbb {Z}) \right) } \frac{\# Z_{\underline{\mathsf {G}}^{\star }}(v')(\mathbb {Q})}{\# Z_{\underline{\mathsf {G}}^{\star }}(v')(\mathbb {Z})} \right) ^{-1} &{} \text {if }b\in p^{n_p}\cdot W_p \text { and } \\ &{}\mathsf {G}^{\star }(\mathbb {Q}_p)\cdot v \in \eta ^{\star }_{b}(P_b(\mathbb {Q}_p)/\hat{\rho }(P_b(\mathbb {Q}_p))) \text { for all }p, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Define \(w'(v)\) by the formula \(w'(v) = \#Z_{\underline{\mathsf {G}}^{\star }}(v)(\mathbb {Q})\cdot w(v)\). Corollaries 4.4 and 5.4 imply that if \(b\in M \cdot \mathcal {F}_{W}\), non-identity elements in the \(\hat{\rho }\)-Selmer group of \(P^{\vee }_b\) correspond bijectively to \(\mathsf {G}^{\star }(\mathbb {Q})\)-orbits in \(\mathsf {V}^{\star }_{b^{\star }}(\mathbb {Q})\) that intersect \(\underline{\mathsf {V}}^{\star }(\mathbb {Z})\) nontrivially, that are \(\mathbb {Q}\)-irreducible and that are soluble at \(\mathbb {R}\) and \(\mathbb {Q}_p\) for all p. In other words, we have the formula:

$$\begin{aligned} \sum _{\begin{array}{c} b \in \mathcal {F}_W \\ \text {ht}(b)<X \end{array}}\left( \#{{\,\mathrm{Sel}\,}}_{\hat{\rho }}(P^{\vee }_b)-1 \right) = \sum _{\begin{array}{c} b \in M\cdot \mathcal {F}_W \\ \text {ht}(b) <M \cdot X \end{array}}\left( \#{{\,\mathrm{Sel}\,}}_{\hat{\rho }}(P^{\vee }_b)-1 \right) = N_{w'}(\underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr}\cap \mathsf {V}^{\star }(\mathbb {R})^{sol} ,M \cdot X). \end{aligned}$$
(8.4)

Since the number of \(\underline{\mathsf {G}}^{\star }(\mathbb {Z})\)-orbits of \(v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z})_{<X}\) with \(Z_{\mathsf {G}^{\star }}(v)(\mathbb {Q})\ne 1\) is negligible [12,  Lemma 2.4], we have

$$\begin{aligned} N_{w'}(\underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr}\cap \mathsf {V}^{\star }(\mathbb {R})^{sol} ,M \cdot X) = N_{w}(\underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr}\cap \mathsf {V}^{\star }(\mathbb {R})^{sol},M \cdot X) + o(X^{28}). \end{aligned}$$
(8.5)

It is more convenient to work with w(v) than with \(w'(v)\) because w(v) is an acceptable function in the sense of Sect. 7.3. Indeed, for \(v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)\) with \(\pi ^{\star }(v)=b^{\star }\) and \(b =\mathcal {Q}^{-1}(b^{\star })\), define \(w_p(v) \in \mathbb {Q}_{\ge 0}\) by the following formula:

$$\begin{aligned} w_p(v) = {\left\{ \begin{array}{ll} \left( {\sum }_{v'\in \underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\backslash \left( \underline{\mathsf {G}}^{\star }(\mathbb {Q}_p)\cdot v \cap \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p) \right) } \frac{\# Z_{\underline{\mathsf {G}}^{\star }}(v')(\mathbb {Q}_p)}{\# Z_{\underline{\mathsf {G}}^{\star }}(v')(\mathbb {Z}_p)} \right) ^{-1} &{} \text {if }b\in p^{n_p}\cdot W_p \text { and }\\ &{} \mathsf {G}^{\star }(\mathbb {Q}_p)\cdot v \in \eta ^{\star }_{b}(P_b(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }_b(\mathbb {Q}_p)) ), \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Then [12,  Proposition 3.6] shows that \(w(v) =\prod _pw_p(v)\) for all \(v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z})\). The remaining properties for w(v) to be acceptable follow from Part 1 of Lemma 7.2 and Proposition 5.10. Moreover using Lemma 7.2 we obtain the formula

$$\begin{aligned} \int _{v\in \underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)} w_p(v) d v = |W_1|_p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}_p) \right) \int _{b \in p^{n_p}\cdot {W_p}} \frac{\#P_b(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }_b(\mathbb {Q}_p))}{\#P^{\vee }_b[\hat{\rho }](\mathbb {Q}_p)}d b. \end{aligned}$$
(8.6)

Using the equality \(\#P_b(\mathbb {Q}_p)/\hat{\rho }(P^{\vee }_b(\mathbb {Q}_p)) = |1/2|_p \#P^{\vee }_b[\hat{\rho }](\mathbb {Q}_p)\) of Corollary 7.1, we see that the integral on the right hand side equals \(|1/2|_p{{\,\mathrm{vol}\,}}(p^{n_p}\cdot W_p)=|1/2|_pp^{-28n_p} {{\,\mathrm{vol}\,}}(W_p)\). Combining the identities (8.4) and (8.5) shows that

$$\begin{aligned} \lim _{X\rightarrow +\infty } X^{-28} \sum _{\begin{array}{c} b \in \mathcal {F}_W \\ \text {ht}(b) <X \end{array}}\left( \#{{\,\mathrm{Sel}\,}}_{\hat{\rho }}(P^{\vee }_b)-1 \right)&= \lim _{X\rightarrow +\infty } X^{-28}N_w(\underline{\mathsf {V}}^{\star }(\mathbb {Z})^{irr}\cap \mathsf {V}^{\star }(\mathbb {R})^{sol},M \cdot X). \end{aligned}$$

(That is, the limit on the left-hand-side exists if and only if the limit on the right-hand-side exists, and in that case their values coincide.) By Theorem 7.2 and the estimate \({{\,\mathrm{vol}\,}}(\mathsf {B}(\mathbb {R})_{<X})= 2^4 X^{28}+o(X^{28})\), the right-hand-side equals

$$\begin{aligned} \frac{|W_1|}{2} \left( \prod _p \int _{\underline{\mathsf {V}}^{\star }(\mathbb {Z}_p)} w_p(v) d v \right) {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}) \backslash \mathsf {G}^{\star }(\mathbb {R}) \right) 2^4M^{28}. \end{aligned}$$

Using (8.6) and the remarks thereafter this simplifies to

$$\begin{aligned} {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z})\backslash \underline{\mathsf {G}}^{\star }(\mathbb {R}) \right) \prod _p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\right) 2^4\prod _{p} {{\,\mathrm{vol}\,}}(W_p). \end{aligned}$$

On the other hand, since \(\mathcal {F}_W\) is defined by congruence conditions we have

$$\begin{aligned} \lim _{X\rightarrow +\infty } \frac{\# \{b \in \mathcal {F}_W \mid \text {ht}(b) < X \}}{X^{28}} = 2^4\prod _p {{\,\mathrm{vol}\,}}(W_p). \end{aligned}$$
(8.7)

We conclude that

$$\begin{aligned} \lim _{X\rightarrow \infty } \frac{ \sum _{b\in \mathcal {F}_W,\; \text {ht}(b)<X } \left( \# {{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b-1 \right) }{\# \{b \in \mathcal {F}_W \mid \text {ht}(b) < X \}} = {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z})\backslash \underline{\mathsf {G}}^{\star }(\mathbb {R}) \right) \cdot \prod _p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}^{\star }(\mathbb {Z}_p)\right) . \end{aligned}$$

Since the Tamagawa number of \(\mathsf {G}^{\star }= {{\,\mathrm{PGL}\,}}_2\) is 2, the proposition follows. \(\square \)

To deduce Theorem 8.1 from Proposition 8.2, choose for each \(i\in \mathbb {Z}_{\ge 1}\) open compact subsets \(W_{p,i} \subset \mathscr {E}_p\) (for p dividing N) such that if \(\mathcal {F}_{W_i} = \mathcal {F} \cap \left( \prod _{p | N} W_{p,i} \right) \), then \(W_i\) satisfies the conclusion of Proposition 8.2 and we have a countable partition \(\mathcal {F} = \mathcal {F}_{W_1}\sqcup \mathcal {F}_{W_2} \sqcup \cdots \). By an argument identical to the proof of [52,  Theorem 7.1], we see that for any \(\varepsilon >0\), there exists \(k\ge 1\) such that

$$\begin{aligned} \limsup _{X\rightarrow +\infty } \frac{ \sum _{\begin{array}{c} b \in \sqcup _{i\ge k} \mathcal {F}_{W_i} , \text {ht}(b)< X \end{array}} \left( \#{{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b -1\right) }{ \# \{b \in \mathcal {F} \mid \text {ht}(b)< X \} }<\varepsilon . \end{aligned}$$

Using Proposition 8.2 this implies that

$$\begin{aligned} \limsup _{X\rightarrow +\infty } \frac{ \sum _{\begin{array}{c} b\in \mathcal {F} , \text {ht}(b)< X \end{array}} \left( \#{{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b -1\right) }{ \# \{b \in \mathcal {F} \mid \text {ht}(b)< X \} }&\le 2 \limsup _{X\rightarrow +\infty }\frac{\# \{b \in \sqcup _{i<k} \mathcal {F}_{W_i} \mid \text {ht}(b)< X \} }{ \# \{b \in \mathcal {F} \mid \text {ht}(b) < X \} } +\varepsilon \\&\le 2+\varepsilon . \end{aligned}$$

Since the above inequality is true for any \(\varepsilon >0\), the expression on the left is bounded above by 2. Similarly we obtain

$$\begin{aligned} \liminf _{X\rightarrow \infty }\frac{ \sum _{\begin{array}{c} b\in \mathcal {F} , \text {ht}(b)< X \end{array}} \left( \#{{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b -1\right) }{ \# \{b \in \mathcal {F} \mid \text {ht}(b) < X \} } \ge 2. \end{aligned}$$

Combining the last two inequalities concludes the proof of Theorem 8.1.

Using the bigonal construction from Sect. 3.4, we immediately obtain the average size of \({{\,\mathrm{Sel}\,}}_{\rho }P_b\).

Theorem 8.2

Let \(\mathcal {F}\subset \mathscr {E}\) be a subset defined by finitely many congruence conditions. Then the average size of \({{\,\mathrm{Sel}\,}}_{\rho }P_b\) for \(b\in \mathcal {F}\), when ordered by height, exists and equals 3.

Proof

In the notation of Sect. 8.1, Theorem 8.1 remains valid for any box of \(\underline{\mathsf {B}}\), by an identical proof. Since \({{\,\mathrm{Sel}\,}}_{\rho }P_{\hat{b}}\simeq {{\,\mathrm{Sel}\,}}_{\hat{\rho }}P_b^{\vee }\) (Theorem 3.1), the theorem follows from Proposition 8.1 applied to the automorphism \(\chi :\mathsf {B}\rightarrow \mathsf {B}\).

8.3 The average size of the 2-Selmer group

If \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\), let \({{\,\mathrm{Sel}\,}}^{\natural }_2 P_b \subset {{\,\mathrm{Sel}\,}}_2 P_b\) be the subset of elements whose image under the embedding \({{\,\mathrm{Sel}\,}}_2 P_b \hookrightarrow \mathsf {G}(\mathbb {Q}) \backslash \mathsf {V}_b(\mathbb {Q})\) is strongly \(\mathbb {Q}\)-irreducible (as defined in Sect. 6.4). By Corollary 4.3, it coincides with the subset of \({{\,\mathrm{Sel}\,}}_2P_b\) whose image in \({{\,\mathrm{Sel}\,}}_{\hat{\rho }}P^{\vee }_b\) under \(\rho \) is nontrivial.

Theorem 8.3

Let \(\mathcal {F}\subset \mathscr {E}\) be a subset defined by finitely many congruence conditions (see Sect. 8.2). Then the average size of \({{\,\mathrm{Sel}\,}}_2^{\natural }P_b\) for \(b\in \mathcal {F}\), when ordered by height, is bounded above by 2.

Proof

The proof is very similar to that of Theorem 8.1, using the results of Sects. 4.1, 5.4 and 6. We give a brief sketch. Again it suffices to prove that for each \(b_0\in \mathcal {F}\) and for every prime p dividing N, we can find an open compact neighborhood \(W_p\) of \(b_0\) in \(\mathcal {F}_p\) such that the average size of \({{\,\mathrm{Sel}\,}}_2^{\natural }P_b\) is bounded above by 2 in the family \(\mathcal {F}_W:=\mathcal {F} \cap \left( \prod _{p | N} W_p \right) \). Choose sets \(W_p\subset \mathcal {F}_p\) and integers \(n_p\ge 0\) for \(p\mid N\) satisfying the conclusion of Corollary 5.3. Set \(W_p=\mathcal {F}_p\) and \(n_p=0\) if p does not divide N. Let \(M=\prod _p p^{n_p}\). For \(v\in \underline{\mathsf {V}}(\mathbb {Z})\) with \(\pi (v)=b\), define \(w(v) \in \mathbb {Q}_{\ge 0}\) by the following formula:

$$\begin{aligned} w(v) = {\left\{ \begin{array}{ll} \left( {\sum }_{v'\in \underline{\mathsf {G}}(\mathbb {Z})\backslash \left( \underline{\mathsf {G}}(\mathbb {Q})\cdot v \cap \underline{\mathsf {V}}(\mathbb {Z}) \right) } \frac{\# Z_{\underline{\mathsf {G}}}(v')(\mathbb {Q})}{\# Z_{\underline{\mathsf {G}}}(v')(\mathbb {Z})} \right) ^{-1} &{} \text {if }b\in p^{n_p}\cdot W_p \text { and } \mathsf {G}(\mathbb {Q}_p)\\ &{}\quad \cdot v \in \eta _{b}(P_b(\mathbb {Q}_p)/2P_b(\mathbb {Q}_p) \text { for all }p , \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Then Corollaries 4.1 and 5.3 and Proposition 6.8 imply that

$$\begin{aligned} \sum _{\begin{array}{c} b \in \mathcal {F}_W \\ \text {ht}(b) <X \end{array}} \#{{\,\mathrm{Sel}\,}}_{2}^{\natural }(P_b) = N_{w}(\underline{\mathsf {V}}(\mathbb {Z})^{sirr}\cap \mathsf {V}(\mathbb {R})^{sol} ,M \cdot X)+o(X^{28}). \end{aligned}$$

Similar to the proof of Theorem 8.1, the function w(v) decomposes into a product of local terms \(\prod _p w_p(v)\) and is acceptable by Part 1 of Lemma 6.3 and Proposition 5.9. By Theorem 6.3 and the evaluation of the integrals \(\int _{\underline{\mathsf {V}}(\mathbb {Z}_p)}w_p(v)dv\) using Lemma 6.3, we obtain the estimate

$$\begin{aligned} N_{w}(\underline{\mathsf {V}}(\mathbb {Z})^{sirr}\cap \mathsf {V}(\mathbb {R})^{sol} , M \cdot X)\le & {} {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z})\backslash \underline{\mathsf {G}}(\mathbb {R}) \right) \\&\prod _p {{\,\mathrm{vol}\,}}\left( \underline{\mathsf {G}}(\mathbb {Z}_p)\right) \prod _{p} {{\,\mathrm{vol}\,}}(W_p){{\,\mathrm{vol}\,}}(\mathsf {B}(\mathbb {R})_{<X})+o(X^{28}). \end{aligned}$$

The result now follows from Eq. (8.7) and the fact that the Tamagawa number of \(\mathsf {G}\) is 2 (Proposition 6.1). \(\square \)

To obtain a bound on the full 2-Selmer group of \(P_b\), we use the results of §8.2. For every \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\), the factorization of isogenies \([2] = \hat{\rho }\circ \rho \) gives rise to an exact sequence

$$\begin{aligned} {{\,\mathrm{Sel}\,}}_{\rho } P_b \rightarrow {{\,\mathrm{Sel}\,}}_2 P_b \rightarrow {{\,\mathrm{Sel}\,}}_{\hat{\rho }} P^{\vee }_b . \end{aligned}$$

We obtain the inequality

$$\begin{aligned} \#{{\,\mathrm{Sel}\,}}_2P_b \le \#{{\,\mathrm{Sel}\,}}_2^{\natural }P_b+\# {{\,\mathrm{Sel}\,}}_{\rho }P_b. \end{aligned}$$

Our last result then follows from Theorems 8.2 and 8.3:

Theorem 8.4

Let \(\mathcal {F}\subset \mathscr {E}\) be a subset defined by finitely many congruence conditions. Then the average size of \({{\,\mathrm{Sel}\,}}_2P_b\) for \(b\in \mathcal {F}\), when ordered by height, is bounded above by 5.

Remark 8.2

For every \(b\in \mathsf {B}^{{{\,\mathrm{rs}\,}}}(\mathbb {Q})\) we have an exact sequence

$$\begin{aligned} \hat{E}_b[2](\mathbb {Q}) \rightarrow {{\,\mathrm{Sel}\,}}_{\rho } P_b \rightarrow {{\,\mathrm{Sel}\,}}_2 P_b \rightarrow {{\,\mathrm{Sel}\,}}_{\hat{\rho }} P^{\vee }_b . \end{aligned}$$

Moreover an easy Hilbert irreducibility argument shows that the average size of \(\#\hat{E}_b[2](\mathbb {Q})\) for \(b\in \mathcal {F}\) is 1. We conclude that the average size of \({{\,\mathrm{Sel}\,}}_2 P_b\) equals the sum of the average sizes of \({{\,\mathrm{Sel}\,}}_{\rho }P_b\) (which is 3 by Theorem 8.2) and \({{\,\mathrm{Sel}\,}}_2^{\natural }P_b\), provided the latter quantity exists.