The geometric distribution of Selmer groups of elliptic curves over function fields

Fix a positive integer $n$ and a finite field $\mathbb F_q$. We study the joint distribution of the rank of $E$, the $n$-Selmer group of $E$, and the $n$-torsion in the Tate-Shafarevich group of $E$ as $E$ varies over elliptic curves of fixed height $d \geq 2$ over $\mathbb F_q(t)$. We compute this joint distribution in the large $q$ limit. We also show that the "large $q$, then large height" limit of this distribution agrees with the one predicted by Bhargava-Kane-Lenstra-Poonen-Rains.

1. Introduction
1.1. Arithmetic statistics of Selmer groups. The statistical behavior of Selmer groups has recently been the focus of much study. In [BKL + 15], remarkable probability distributions are introduced to model the distribution of the $n$-Selmer group $\mathrm{Sel}_n(E)$, for $E$ varying through isomorphism classes of elliptic curves over a fixed global field. We refer to these distributions, and the models which generate them, as the "BKLPR heuristic." The BKLPR heuristic is consistent with all known results on the statistics of Selmer groups.
One can also consider the analogous question for elliptic curves over a global function field. The heuristics make sense in that case as well, and it is generally believed that in the "large height, then large q" limit, lim q→∞ lim d→∞ , the statistics of Selmer groups over global function fields should behave the same as in the case of number fields. For example, [dJ02] computes the average size of 3-Selmer groups in this limit, and [HLHN14] computes the average size of 2-Selmer groups in this limit. Most notably, breakthrough work of Bhargava-Shankar [BS15a,BS15b,BS13a,BS13b] computes the average size of n-Selmer groups for elliptic curves over number fields for n = 2, 3, 4, 5; the methods are expected to extend to global function fields with the same answers (and without taking a large q limit!). The proofs of all these results rely on special features of small n, and confirming the BKLPR heuristic for the average size of Sel n seems out of reach at present when n > 5. Our goal is to nevertheless provide some partial evidence for the full BKLPR heuristic, by studying an easier version of the problem.
To this end, we study the limiting process in the reversed order, lim d→∞ lim q→∞ for elliptic curves over a rational function field F q (t). This problem is significantly more accessible by algebraic geometry, which allows us to identify the distribution completely. Informally speaking, we show that in the "large q, then large height" limit, the distribution of Sel n (E) is exactly as predicted by the BKLPR heuristic. A novel difficulty of this result is that it cannot be proved simply by computing and comparing the moments of the two distributions, because these distributions are not determined by their moments. Conversely, because the distribution is unbounded, convergence in distribution in the "large q, then large height" limit does not automatically imply convergence of the moments in these limits, though we do show the moments converge to the BKLPR moments as well.
1.2. Statement of results.
1.2.1. Some notation. We now introduce notation in order to state our main results precisely. Let $p = \operatorname{char}(\mathbb F_q)$. For $p > 2$, an elliptic curve $E$ over $\mathbb F_q(t)$ has a minimal Weierstrass model of the form $y^2 = x^3 + a_2(t)x^2 + a_4(t)x + a_6(t)$, where $a_{2i}(t)$ is a polynomial of degree $2id$ for $i \in \{1, 2, 3\}$ (cf. [dJ02, §4.2-4.8]). This value of $d$ is uniquely determined by $E$, and we define $h(E) := d$ to be the height of $E$. Let $(\mathrm{rk}, \mathrm{Sel}_n)^d_{\mathbb F_q}$ denote the probability distribution assigning to a pair $(r, G)$, for $r \in \mathbb Z$ and $G$ a finite abelian group, the proportion of isomorphism classes of height $d$ elliptic curves over $\mathbb F_q(t)$ with algebraic rank $r$ and $n$-Selmer group isomorphic to $G$ (see Definition 1.3).
1.2.2. The BKLPR heuristic. We summarize the BKLPR heuristic in § 5.3. Briefly put, it models the distribution of the ℓ ∞ -Selmer group in terms of the intersection in (Q ℓ /Z ℓ ) m induced by two maximal isotropic subspaces of Z m ℓ (with the standard split quadratic form) as m → ∞. Conditioned on the rank, the ℓ-primary parts of the Selmer group are predicted to behave independently. This gives, in particular, a conjectural joint distribution (rk BKLPR , Sel BKLPR n ) for the rank and n-Selmer group of elliptic curves, described in Definition 5.12. (rk, Sel n ) d Fq as functions on {(r, G)}. 1 (Note that because we are taking a pointwise lim inf or lim sup, the resulting function may no longer be a probability distribution, i.e., its sum over all (r, G) may not be 1.) Our main result is the following, which we deduce as a consequence of Theorem 6.1 and Theorem 6.4: Theorem 1.1. For fixed integers d ≥ 2 and n ≥ 1, and q ranging over prime powers, the limits As far as we are aware, our results give the first direct connection between the heuristics of [BKL + 15] for general n and the arithmetic of elliptic curves. Further, our results suggest a potential approach to proving the conjectures of [BKL + 15] in the function field setting via homological stability techniques as used in [EVW16] to prove a version of the Cohen-Lenstra heuristics over function fields.
Remark 1.2. One can deduce a more precise version of Theorem 1.1 with estimates on the error terms in the above limits directly from Theorem 6.1 and Theorem 6.4. One may also deduce that the same result holds with algebraic rank replaced by analytic rank. Further, one may include the joint distribution of Tate-Shafarevich groups; see Remark 1.8.
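To make the flavor of the random model in § 1.2.2 concrete, here is a minimal Python sketch in a much simpler setting than the paper's. It is a toy under stated assumptions: it replaces $\mathbb Z_\ell$ by $\mathbb F_2$, fixes $m = 4$, and uses the split form $x_1x_2 + x_3x_4$; the function names are illustrative and do not come from the paper. It lists the maximal isotropic subspaces by brute force and tabulates the dimension of the intersection of an ordered pair.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def Q(v):
    """Split quadratic form x1*x2 + x3*x4 on F_2^4 (toy stand-in, an assumption)."""
    return (v[0] * v[1] + v[2] * v[3]) % 2

def span(a, b):
    """All F_2-linear combinations of a and b, as a frozenset of vectors."""
    return frozenset(
        tuple((s * x + t * y) % 2 for x, y in zip(a, b))
        for s in range(2) for t in range(2)
    )

def maximal_isotropic_subspaces():
    """2-dimensional subspaces of F_2^4 on which Q vanishes identically."""
    vectors = list(product(range(2), repeat=4))
    planes = {span(a, b) for a, b in combinations(vectors, 2)}
    return [P for P in planes if len(P) == 4 and all(Q(v) == 0 for v in P)]

def intersection_dimension_counts():
    """Tally dim(Z ∩ W) over all ordered pairs (Z, W) of maximal isotropic subspaces."""
    subspaces = maximal_isotropic_subspaces()
    counts = Counter()
    for Z in subspaces:
        for W in subspaces:
            # |Z ∩ W| = 2^dim, so the dimension is bit_length - 1
            counts[len(Z & W).bit_length() - 1] += 1
    return counts

def average_intersection_size():
    """Average of #(Z ∩ W) over all ordered pairs, as an exact fraction."""
    counts = intersection_dimension_counts()
    total = sum(counts.values())
    return Fraction(sum(2 ** dim * c for dim, c in counts.items()), total)
```

In this toy case there are 6 maximal isotropic subspaces, the 36 ordered pairs meet in dimension 0, 1, 2 with frequencies 12, 18, 6, and the average intersection size is 2. The heuristic itself concerns the limit $m \to \infty$ over $\mathbb Z_\ell$, which this sketch does not attempt.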
1.2.4. Summary of the main difficulties. Experts will recognize that the distribution in this "large q limit" is completely determined by certain monodromy representations. Letting $W^{•d}_B$ be the "moduli space of smooth height d elliptic surfaces" (described more precisely in § 3.3), the relevant monodromy representations take the form $\rho^d_{n,B} : \pi_1(W^{•d}_B) \to \mathrm{GL}(V^d_n)$. Their significance lies in the fact that they control the number of connected components of moduli spaces parameterizing Selmer elements. Let us call the image of $\rho^d_{n,B}$ the (arithmetic) monodromy group, and the image of $\rho^d_{n,B}|_{\pi_1((W^{•d}_B)_{\overline{\mathbb F}_q})}$ the geometric monodromy group.
Let us talk through some of the difficulties in proving Theorem 1.1 in order to orient the reader where the content of the paper lies. First, it is important that we determine the monodromy group precisely. If we had just wanted to compute the moments of $\mathrm{Sel}_n$, then it would have been enough to know that the geometric monodromy group is "large enough." However, the behavior of the distribution depends more subtly on the arithmetic monodromy group. For example, it turns out that sometimes the Selmer distribution does not have a limit as q → ∞, and this can happen even when q is taken only over powers of a fixed odd prime p. Nevertheless, both the "lim sup q→∞ " and the "lim inf q→∞ " exist, and tend towards each other as the height tends to ∞.
In a bit more detail, it is possible that for fixed height d, the Selmer distribution does not have a well defined limit as q → ∞. Specifically, the lim sup q→∞ and lim inf q→∞ do not agree when, for an infinite sequence of q's over which the limits run, the arithmetic monodromy group contains an element of non-trivial spinor norm (see § 3.2.2) but the geometric monodromy group does not. In this case, the arithmetic monodromy group fluctuates between two possibilities, which ends up creating a discrepancy between lim sup q→∞ and lim inf q→∞ .
A second substantial issue is that even after having determined the monodromy representations that control the Selmer groups, it is not straightforward to identify the resulting distribution with the BKLPR heuristic. (To be clear, this is a purely combinatorial question, although it turns out to require techniques from algebraic geometry, number theory, etc. to address.) The reason for this difficulty is that the BKLPR heuristic is not described in terms of explicit closed formulas, but in terms of a random algebraic model. For example, it is not determined by its moments, as illustrated in Example 1.12 below. In order to compare the BKLPR distribution to the distribution coming from a monodromy representation, we introduce a "random kernel model" that mediates between the two distributions. We observe that both the BKLPR heuristic and the random kernel model enjoy Markov properties which reduce their comparison to simpler cases that can be computed explicitly, by matching enough moments. (Even this is a little oversimplified: what we need is to establish enough control on the moments already at a "finite height" level; see § 4.)
1.2.5. Defining the random variables. In order to state the next results, we will need to introduce some more notation.
Let Ab n denote the set of isomorphism classes of finite Z/nZ-modules. We will next define several distributions on Z ≥0 × Ab n modeling the joint distribution of the rank and n-Selmer group of an elliptic curve. For E an elliptic curve, we use rk(E) to denote the algebraic rank of E and rk an (E) to denote the analytic rank of E. In what follows, we use E to denote an isomorphism class of elliptic curves. For a random variable X, we let E[X] denote the expected value of X (if it exists).
Remark 1.4. In Definition 1.3, for the purposes of computing these distributions in the limit q → ∞, we could equally well replace the condition h(E) = d by the condition h(E) ≤ d. The reason for this is that isomorphism classes of curves with h(E) < d are parameterized by k points of the stack W ′ i k (defined below in § 2.1.5) for i < d, which is a finite type global quotient stack of strictly smaller dimension than W ′ d k .
Hence, $\bigcup_{i<d} {W'}^{i}_k$ contributes at most $O_{n,d}(q^{-1/2})$ to the probability distributions in question, as can be deduced from the Lang-Weil estimate and [Lan21, Lemma 5.3].
For analogous reasons, one can equally well weight the above counts by automorphisms (which would be the correct "stacky way" to count points) and the distribution in the q → ∞ limit will remain the same. Note that after excising the locus of elliptic curves with more than 2 automorphisms, there will be a factor of one half in both the numerator and denominator in the definition of the distributions in Definition 1.3, which cancel out.
1.2.6. Some consequences. The following result (which is part of Corollary 6.5) is a variant of the Katz-Sarnak minimalist conjecture, stating that for fixed height, in the large q limit, the average rank is 1/2. Moreover, in the large q limit, the rank takes value 1 and 0 with probability 1/2, and takes value ≥ 2 with probability 0. It can also be deduced from [Kat05,Theorem 13.3.3], though the more precise error terms given in Corollary 6.5 do not directly follow from [Kat05,Theorem 13.3.3]. We note that the fact that elliptic curves in the large q limit have rank 0 with probability 1/2 is not a direct consequence of Theorem 1.1, but it comes out of the more refined analysis used to prove Theorem 1.1 for n = ℓ a prime. 2 The following calculation of the geometric moments of Selmer groups is a consequence of Theorem 6.6, which includes more precise error terms. (1) Fix c ℓ ∈ Z ≥0 for each prime ℓ | n. Then (3) For m ≤ 6d − 3, we have The following corollary is the more familiar case of Theorem 1.6 when n is taken to be a prime ℓ. One can also deduce a version with explicit error terms in q, as in Theorem 6.6. (1) We have 2 However, the statement that elliptic curves in the large q limit have rank at least 2 with probability 0 does follow from just the computation of the average size of # Seln, see [Lan21, Corollary 1.3].
(2) We have
Remark 1.8 (Distributions of Tate-Shafarevich groups). Throughout this paper, we mostly work with the joint distribution of ranks and n-Selmer groups of elliptic curves, while [BKL + 15] also makes predictions for Tate-Shafarevich groups of elliptic curves. Indeed, as an easy consequence of our results, we obtain analogous predictions for Tate-Shafarevich groups, as we now explain. For $E$ a torsion free elliptic curve over $\mathbb F_q(t)$, we have an exact sequence $0 \to (\mathbb Z/n\mathbb Z)^{\mathrm{rk}(E)} \to \mathrm{Sel}_n(E) \to Ш(E)[n] \to 0$. Note that the torsion freeness condition is satisfied 100% of the time [BKL + 15, Lemma 5.7]. Therefore, the algebraic rank and n-Selmer group of $E$ determine $Ш(E)[n]$, and hence the joint distribution of algebraic ranks and n-Selmer groups determines the joint distribution of algebraic ranks, n-Selmer groups, and n-torsion in Tate-Shafarevich groups. Let $(\mathrm{rk}^{\mathrm{BKLPR}}, \mathrm{Sel}_n^{\mathrm{BKLPR}}, Ш[n]^{\mathrm{BKLPR}})$ denote the conjectural joint distribution for ranks, n-Selmer groups, and n-torsion in Tate-Shafarevich groups described in [BKL + 15, §5.7] and let $(\mathrm{rk}, \mathrm{Sel}_n, Ш[n])^d_{\mathbb F_q}$ denote the joint distribution of algebraic ranks, n-Selmer groups, and n-torsion in Tate-Shafarevich groups of height d elliptic curves over $\mathbb F_q(t)$. Then, it follows from Theorem 1.1 and the above remarks that the "large q, then large height" limit of $(\mathrm{rk}, \mathrm{Sel}_n, Ш[n])^d_{\mathbb F_q}$, taken as in Theorem 1.1, agrees with $(\mathrm{rk}^{\mathrm{BKLPR}}, \mathrm{Sel}_n^{\mathrm{BKLPR}}, Ш[n]^{\mathrm{BKLPR}})$. One can also bound the error in these limits using Theorem 6.1 and Theorem 6.4. We note that for fixed height d ≥ 2, the proportion of elliptic curves of height up to d over $\mathbb F_q(t)$ with analytic rank equal to algebraic rank tends to 1 as q → ∞ over prime powers q with gcd(q, 2) = 1. This follows from Theorem 1.1 and Proposition 6.3. Therefore, the Birch and Swinnerton-Dyer Conjecture holds for all such curves, implying the Tate-Shafarevich group is finite for all such curves.
Remark 1.9 (Families of quadratic twists). In other families of elliptic curves, such as quadratic twist families, the "geometric distribution" will similarly be controlled by the analogous monodromy representations to those described in §1.2.4.
Adapting our arguments will yield similar results for such families whenever the geometric monodromy group is large enough. However, the precise distribution that results depends rather delicately on the precise monodromy group, for the same reasons as described in §1.2.4.
For example, in forthcoming work [PW21], Park and Wang carry out an analog of the results of [Lan21] for quadratic twist families of elliptic curves, at least in the case of n-Selmer groups for n prime. We note this should often be extendable to composite n, see [Lan21, Remark 1.7]. Suppose one chooses a quadratic twist family such that the sheaf on that family constructed analogously to S •d n,B on the universal family has geometric monodromy containing the commutator of the relevant orthogonal group, but with nontrivial Dickson invariant (see § 3.2.4). Given such a family, via similar arguments to those in this paper, if one first takes lim inf q→∞ or lim sup q→∞ , and then a large height limit, the joint distribution of the rank and n-Selmer group will agree with (rk BKLPR , Sel BKLPR n ). We note that triviality or nontriviality of the Dickson invariant can often be verified for explicit examples, as in the proof of [Zyw14, Theorem 4.1].
On the other hand, it is possible for the Dickson invariant to be trivial in quadratic twist families; explicit such examples are constructed in [Zyw14,§5 and §6]. In these cases, the distribution of ranks and Selmer groups in the quadratic twist family will differ from those predicted in [BKL + 15]. E.g., the minimalist conjecture will fail as 100% of elliptic curves in such families will have rank 0. Nevertheless, for sufficiently high degree twists, the large q limit mth moments in these quadratic twist families will agree with those predicted in [BKL + 15]. Additionally, it is possible to choose quadratic twist families where the relevant geometric monodromy does not contain the commutator of the relevant orthogonal group, in which case the large q limit statistics of ranks and Selmer groups may differ drastically from those predicted in [BKL + 15].
Remark 1.10 (The inverse Galois problem). For ℓ a prime, let Q d ℓ denote the quadratic form defined in Definition 3.1, which we note has discriminant 1 and hence is equivalent to the standard quadratic form x 1 x 2 + x 3 x 4 + · · · + x 12d−5 x 12d−4 . In order to prove Theorem 1.1, we perform a certain monodromy computation in Theorem 3.14, which shows that for even d ≥ 2, and ℓ ∤ d − 1, O(Q d ℓ ) occurs as a Galois group over Q(t 1 , . . . , t 10d+2 ), and hence also as a Galois group over Q by Hilbert irreducibility ([Ser97, §9.2, Proposition 2] in conjunction with [Ser97, §13.1, Theorem 3]). To our knowledge, it was not previously known that these groups all appear as Galois groups over Q.
Closely related constructions to ours are given in [Zyw14, Theorem 1.1], and the techniques of [Zyw14] can likely be adapted to construct the Galois groups O(Q d ℓ ) when ℓ ≥ 5. However, our results also apply in the cases ℓ = 2 and ℓ = 3, to which the techniques of [Zyw14] seem not to apply.
Remark 1.11. An interesting byproduct of the proof of Theorem 1.1 is that the analytic rank of an elliptic curve over F q (t) with smooth minimal proper regular model is realized as the dimension of the generalized 1-eigenspace of a certain matrix associated to an action of Frobenius (see Lemma 3.18) while the ℓ ∞ -Selmer rank is the dimension of the 1-eigenspace of that same matrix (see Lemma 6.2). These dimensions agree for 100% of elliptic curves of fixed height d over F q (t) in the large q limit and also agree with the rank of the elliptic curve (see Proposition 6.3). Hence, at least in the function field setting, this gives an answer to the question raised in [PPVW19, Remark 1.1.4] as to whether there exists a natural matrix coming from the arithmetic of elliptic curves giving rise to the rank and Selmer group of an elliptic curve.
Example 1.12 (A distribution not determined by its moments). Consider the three distributions with the latter two the distributions conditioning upon whether the rank is even or odd. These give examples of three distinct distributions which we claim have the same mth moments for all m ≥ 0.
We now justify why the moments of these three distributions agree. For simplicity, we assume n is prime, though the same claim holds true for general composite n, as can be deduced from the Markov properties verified in § 5. By Theorem 6.4, the above three distributions agree with the three distributions respectively. By Definition 4.2, these distributions are all given by the limit as d → ∞ of the dimension of the kernel of a random matrix drawn from certain cosets of the orthogonal group of rank 12d − 4. The distribution conditioned on even rank corresponds to the cosets with Dickson invariant 0 while that conditioned on odd rank corresponds to cosets with Dickson invariant 1. Therefore, by Theorem 4.10, the moments of these distributions all stabilize in d (in fact once 6d − 3 ≥ m), and are equal to $\prod_{i=1}^{m} (\ell^i + 1)$.
1.3. Overview of the proof. We next indicate the idea of the proof of Theorem 1.
The cover π is associated to a monodromy representation ρ d n,Fq : After determining the monodromy group, this reduces to a combinatorial problem: compute the distribution of dim ker(g − id) for a g drawn randomly from the monodromy group. For V d n over Z/ℓZ (i.e., the case that n = ℓ is prime) and g drawn from the full O(Q d ℓ ), this computation was done in unpublished work of Rudvalis and Shinoda, as we learned from [FS16]. We give an alternative proof which generalizes to the case where g is drawn from certain proper subgroups of O(Q d ℓ ) related to the monodromy group (which is needed for our results).
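The combinatorial problem described above can be tried by hand in the smallest case. The following Python sketch is a toy under stated assumptions, not the computation carried out in the paper: it takes the rank 2 form $Q(x, y) = xy$ over $\mathbb F_\ell$ instead of the rank $12d - 4$ form, enumerates the full orthogonal group by brute force, and tallies dim ker(g − id). All function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def orthogonal_group(l):
    """All 2x2 matrices (a, b, c, d) over F_l with Q(g v) = Q(v), where Q(x, y) = x*y."""
    vectors = list(product(range(l), repeat=2))
    return [
        (a, b, c, d)
        for a, b, c, d in product(range(l), repeat=4)
        if all(((a * x + b * y) * (c * x + d * y) - x * y) % l == 0
               for x, y in vectors)
    ]

def kernel_dimension(g, l):
    """dim ker(g - id), read off from the number of fixed vectors l^dim."""
    a, b, c, d = g
    fixed = sum(
        1 for x, y in product(range(l), repeat=2)
        if (a * x + b * y) % l == x and (c * x + d * y) % l == y
    )
    dim = 0
    while l ** dim < fixed:
        dim += 1
    return dim

def kernel_dimension_counts(l):
    """Tally dim ker(g - id) over the whole group."""
    return Counter(kernel_dimension(g, l) for g in orthogonal_group(l))

def average_kernel_size(l):
    """Average of #ker(g - id) = l^dim over the group, as an exact fraction."""
    counts = kernel_dimension_counts(l)
    return Fraction(sum(l ** dim * c for dim, c in counts.items()),
                    sum(counts.values()))
```

For ℓ = 3 the group has order 4, the kernel dimensions 0, 1, 2 occur 1, 2, 1 times, and the average of $\ell^{\dim\ker(g - \mathrm{id})}$ is 4 = ℓ + 1, which is the m = 1 value of the product formula quoted in Example 1.12. In this toy the elements with odd-dimensional kernel are exactly those of determinant −1, matching the split by Dickson invariant described there. Higher moments do not match at such small rank.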
After handling the case where n = ℓ is prime, we move on to the case of Sel ℓ e . In this case, we prove that there is a characterization of ker(g − id) in terms of a Markov property, and that the BKLPR heuristic is also characterized by this same Markov property. The case of general Sel n for n composite follows from the prime power case by the Chinese remainder theorem.
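The reduction from composite n to prime powers rests on splitting a finite Z/nZ-module into its ℓ-primary parts. A minimal Python sketch of that splitting, assuming the module is presented as a list of cyclic orders (the name `primary_parts` and this input format are illustrative choices, not notation from the paper):

```python
from collections import defaultdict

def primary_parts(cyclic_orders):
    """Split Z/a_1 x ... x Z/a_k into l-primary parts via the Chinese remainder theorem.

    Returns a dict mapping each prime l to the list of orders of the cyclic
    factors of the l-primary part, sorted in decreasing order.
    """
    parts = defaultdict(list)
    for order in cyclic_orders:
        m, p = order, 2
        while m > 1:
            if m % p == 0:
                # strip the full power of p dividing this cyclic order
                pk = 1
                while m % p == 0:
                    m //= p
                    pk *= p
                parts[p].append(pk)
            p += 1
    return {p: sorted(v, reverse=True) for p, v in parts.items()}
```

For example, `primary_parts([12, 6])` returns `{2: [4, 2], 3: [3, 3]}`, reflecting Z/12 × Z/6 ≅ (Z/4 × Z/2) × (Z/3 × Z/3).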
1.4. Outline of Paper. We next give a brief outline of the content of the various sections in this paper. In § 2 we recall the construction of Selmer spaces, which parameterize Selmer elements of elliptic curves. The Selmer spaces mentioned above are generically finite étale covers of the moduli space of height d elliptic surfaces. In § 3 we compute the monodromy associated to these covers. Next, in § 4 we establish that the geometric distribution of prime order Selmer groups agrees with that predicted by the BKLPR heuristic. In § 5, we show that both the BKLPR heuristic distribution and our geometric distribution agree for prime powers, by relating the two distributions for ℓ j -Selmer groups to the two distributions for ℓ j+1 -Selmer groups via separate Markov processes. Finally, in § 6 we put the pieces together to prove our main results.
1.5. Acknowledgements. It is our pleasure to thank Ravi Vakil for organizing the "What's on My Mind" seminar, which led to the genesis of this paper. We thank Johan de Jong, Chao Li, Bjorn Poonen, Arul Shankar, Doug Ulmer, and Melanie Matchett Wood for helpful discussions. We thank Lisa Sauermann for help translating [Kne84]. We also thank David Zureick-Brown and Jackson Morrow for help with writing and running MAGMA code.
This sheaf is closely related to the L-function of elliptic curves, and hence gives us a way to access the analytic ranks of elliptic curves in terms of the Selmer sheaf. Our notation differs slightly from that of [Lan21] due to a minor error (only appearing in characteristic 3), as we will explain further in Remark 2.1.
denote the open subscheme parameterizing those points such that the Weierstrass equation defines an elliptic surface with smooth generic fiber. This is open as it corresponds to the open subscheme of A 12d+3 B such that the discriminant is nonzero.
Remark 2.1. There was a minor error in [Lan21, Definition 3.1] where it was claimed that a Weierstrass model is minimal if and only if it is of the form y 2 z = x 3 + a 2 (s, t)x 2 z + a 4 (s, t)xz 2 + a 6 (s, t)z 3 with no non-constant polynomial f ∈ k[s, t] with f 2i | a 2i (s, t) for all i ∈ {1, 2, 3}. However, it is only true that it can be written in this form after a change of variables.
This makes it less obvious that in characteristic 3, the locus of minimal Weierstrass equations is open in $\mathbb A^{12d+3}_B$. It is fairly simple to see this is true in characteristic neither 2 nor 3, since one can make a change of variables to assume $a_2(s, t) = 0$, and then the resulting equation $y^2 z = x^3 + a_4(s, t)xz^2 + a_6(s, t)z^3$ is minimal if and only if there is no non-constant polynomial $f$ with $f^{2i} \mid a_{2i}(s, t)$ for $i \in \{2, 3\}$. In characteristic 3, the minimal locus is still open, but we only found a somewhat involved proof which involves tracing through the steps of Tate's algorithm.
To avoid this fairly involved proof, we opt to work over a slightly larger open set W ′ d B , which does not parameterize minimal Weierstrass models, but instead parameterizes all Weierstrass models over A 12d+3 B with smooth generic fiber. Since the two open subsets differ by a divisor, their point counts do not contribute in the large q limit, and so which set we work with does not substantially alter the argument.

As mentioned in
Assume further that 2n is invertible on B. Define the n-Selmer sheaf over B of height d as Seℓ ′ d n,B := R 1 g * (R 1 f * µ n ). Define the n-Selmer space over B of height d, denoted Sel ′ d n,B , as the algebraic space representing the sheaf of Z/nZ modules Seℓ ′ d n,B . Let
2.1.5. A moduli stack of elliptic curves. Note that G 2d+1 a ⋊ G m acts on UW ′ d B and W ′ d B compatibly. Loosely speaking, (r 0 , . . . , r 2d ) ∈ G 2d+1 a acts by sending x → x + r 0 s 2d + r 1 ts 2d−1 + · · · + r 2d t 2d and λ ∈ G m acts by sending a 2i (s, t) → λ 2i a 2i (s, t), see [ , for η the generic point of P 1 . We often notate this by where E x is the elliptic curve corresponding to x.
2.2. The relation between Selmer spaces and Selmer groups. We have now defined the Selmer space, but have not yet explained the connection to Selmer groups of elliptic curves. The following lemma provides the relation.
Define E[n] S := (j S ) * R 1 f S * µ n (we note that E[n] S is a slight abuse of notation since it depends on the map α S and not just the scheme S). This sheaf represents the relative n torsion of f S . Define the sheaf denote the map induced from the composition of functors spectral sequence for g • j. We will show that S •d n,B is the image of the composition ψ ∘ φ. Once we show this, it will immediately follow that S •d n,B is constructible, being the image of a map of constructible sheaves.
By the Leray spectral sequence, ψ is always injective. Hence, to identify S •d n,B as the image of ψ•φ, we only need to show φ is surjective. To this end, define M as the quotient sheaf j Note that M is supported on the complement of U which is finite over W •d B . Therefore, R 1 g * M = 0 and we conclude that  [Ver67] gives an isomorphism of sheaves in the derived category . Note that the [2] denotes a cohomological shift by 2 while the [n] refers to the n-torsion.
We will now take (−1)st cohomology of both sides. By construction of U , E[n] W • d B is locally constant on U , and therefore the ith cohomology of , the latter isomorphism induced by the Weil pairing. Additionally, since . Therefore, taking (−1)st cohomology of the Poincaré duality isomorphism yields Since the right hand side is locally constant constructible over W d B , the left hand side is as well, and therefore commutes with base change.
We next produce an isomorphism , crucially using that the formation of both sheaves commute with base change.
Proposition 2.7. Retain notation from Notation 2.4. There is a canonical map , which commutes with base change.
Proof. Retaining notation from Notation 2.4, define the maps j ′ and f ′ as in the fiber square We have canonical maps coming from Leray spectral sequences (2.3) Using the Kummer exact sequence (possible since n is invertible by Notation 2.4) and the assumption that the fibers of f ′ are smooth connected elliptic curves so [BLR90, §9.5, Theorem 1] applies, we obtain isomorphisms (2.4) We show this map induces an isomorphism ). To verify this is an isomorphism, it suffices to do so on stalks. As the formation of both sides commutes with base change by proper base change and Lemma 2.6, we can check this is an isomorphism in the case that the base is a geometric point.
Thus, it suffices to show that if f x : W x → P 1 x is a smooth minimal Weierstrass model corresponding to a point x ∈ W •d B , j x is the restriction of j to x, and g x is the restriction of g to x, then the map on stalks is also represented by the Néron model of E x [n] by the Néron mapping property. The Néron mapping property implies that to check 3) is an isomorphism, it suffices to check its restriction to U is an isomorphism. That is, we want to show the base change of x is an isomorphism. If we could show this is the natural base change map, it would indeed be an isomorphism by proper base change.
So, to conclude the proof, we only need to check the constructed map j * R 1 f * (µ n ) → R 1 f ′ * j ′ * µ n , coming from pulling back (2.3) along j, is the base change map. Indeed, this follows from the definitions. In more detail, recall that for F a sheaf on UW •d B , the base change map is given as the map of δ-functors However, pulling back the map of (2.3) along j is given by the composition . This is precisely the resulting map on degree 1 δ-functors, and hence is the natural base change map.

3. The precise monodromy of Selmer spaces
The main result of this section is Theorem 3.14 where we compute precisely the monodromy group associated to the cover Sel •d n,B → W •d B . In order to state the theorem, we first introduce some notation relating to orthogonal groups and the monodromy representation. Following this, we recall a general result on equidistribution of Frobenius elements in § 3.4. The remainder of the section is devoted to proving Theorem 3.14, whose proof is outlined at the end of § 3.5.
3.1. Adelic notation. For R an integral noetherian ring with fraction field Frac(R) such that char(Frac(R)) = p, let We allow p = 0, in which case Z (0) = Z.

3.2. Notation for orthogonal groups.
3.2.1. Notation for quadratic forms. Let R be a ring. A quadratic space over R is a pair (V, Q) where V is a free module over R and Q : V → R is a quadratic form. We say a quadratic space (V, Q) is nondegenerate if the hypersurface defined by the vanishing of Q in PV ∨ is smooth over Spec R. When 2 is invertible or rk V is even, this is equivalent to the discriminant of Q being a unit on Spec R, see [Con14, Remark C.1.1]. See [Con14, C.1] for a characterization in terms of non-degeneracy of the associated bilinear form on fibers. Let O(Q) denote the corresponding orthogonal group. Note that we will use O(Q) to denote both the group and the group scheme. We will primarily consider it as a group, and whenever we use it to denote the group scheme O(Q), we refer to it as "the algebraic group O(Q)." When the map φ is understood, we notate this as (V S , Q S ) := (V φ , Q φ ). In the special case that S = Z/nZ, we will also use (V n , Q n ) := (V Z/nZ , Q Z/nZ ).
to be the rank 12d − 4 free Z module associated to U ⊕(2d−2) ⊕ (−E 8 ) ⊕d , for U a hyperbolic plane and −E 8 the E 8 lattice with the negative of its usual pairing. Then (V d n , Q d n ) denotes the reduction of this quadratic space modulo n.
For Q a quadratic form on a free module V over a ring R, the associated bilinear form In what follows, we assume the quadratic form Q is nondegenerate. For Remark 3.2. When R is a field, O(Q) is generated by these reflections so long as
3.2.2. The spinor norm. For completeness, we briefly recall the formal definition of the −1-spinor norm. We follow [Con14,p. 349] which gives the definition in the more general context of algebraic groups. Let (V, Q) be a quadratic space over R, and suppose that either rk V is even or 2 is invertible on R. The +1-spinor norm is then defined as the boundary map on cohomology
3.2.3. The adelic spinor map. We now spell out some notation to describe the spinor map for a quadratic form over Z (p) . Let p either be a prime or p = 0. Let (V, Q) be a nondegenerate quadratic space over Z (p) . Let where the first copy of (Z/2Z) 2 comes from and the copy of Z/2Z indexed by an odd prime ℓ comes from When p ≠ 0 and q is a power of p, we let denote the element induced by multiplication by q on Z (p) .
3 Although it will not be relevant to this paper, as we shall ultimately only be interested in the even rank quadratic space of Definition 3.1, one can define the spinor norm on O(Q) in the case that R is a field of characteristic 2 and rk V is odd. This can be done using the equality O(Q) = SO(Q) as abstract groups (even though the corresponding group schemes are not isomorphic) since the group scheme SO(Q) is the underlying reduced subscheme of the group scheme O(Q), see [Con14, Remark C.5.12].

3.2.4. The Dickson invariant.
Next, for (Q, V ) a quadratic space over a ring R with Spec R connected, the Dickson invariant is a map as defined in [Con14, (C.2.2) and Remark C.2.5]. In the case (Q, V ) is a quadratic space over a ring R such that Spec R is a disjoint union of finitely many connected components, such as when R = Z/nZ, we define the Dickson invariant as the resulting map obtained by restricting to a given connected component of Spec R and then applying the Dickson invariant on that component. In the case R = Z (p) , we define the Dickson invariant as the resulting composition Remark 3.5. In the case that 2 is invertible on R with Spec R connected, the Dickson invariant agrees with the determinant [Con14, Corollary C.3.2]. However, over a field k of characteristic 2, the determinant is trivial while the Dickson invariant is nontrivial (and it is nontrivial on k-points when the rank of the quadratic space is even) [Con14, Proposition C.2.8].
Over a field of characteristic 2, the Dickson invariant is sometimes also called the pseudodeterminant, and the following explicit description, which follows from the fact that reflections always have nontrivial Dickson invariant, will be useful: for any T ∈ O(Q), and any expression of T as a product of reflections T = r v1 · · · r vs (which exists so long as (k, rk V ) ≠ (F 2 , 4) by Remark 3.2), the Dickson invariant is given by the map O(Q) → Z/2Z which sends T → s mod 2.
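The contrast between the determinant and the Dickson invariant can be seen on a single reflection. The Python sketch below is an illustration under assumptions, not part of the paper: it uses the rank 2 form $Q(x, y) = xy$, the standard convention $B(v, w) = Q(v + w) - Q(v) - Q(w)$ and $r_v(x) = x - B(x, v)Q(v)^{-1}v$ (assumed here, since the paper's own formula is not reproduced above), and the classical description of the Dickson invariant as the rank of g − id modulo 2. It requires Python 3.8+ for the modular inverse via `pow`.

```python
def Q(v, p):
    """Toy quadratic form Q(x, y) = x*y over F_p."""
    return (v[0] * v[1]) % p

def B(v, w, p):
    """Polar form B(v, w) = Q(v + w) - Q(v) - Q(w) of the toy form."""
    return (v[0] * w[1] + v[1] * w[0]) % p

def reflection(v, p):
    """Matrix (tuple of rows) of r_v(x) = x - B(x, v) Q(v)^{-1} v, assuming Q(v) != 0."""
    inv = pow(Q(v, p), -1, p)  # modular inverse, Python 3.8+
    images = []
    for e in ((1, 0), (0, 1)):
        s = (B(e, v, p) * inv) % p
        images.append(((e[0] - s * v[0]) % p, (e[1] - s * v[1]) % p))
    # images[j] is the image of the j-th basis vector, i.e. the j-th column
    return ((images[0][0], images[1][0]), (images[0][1], images[1][1]))

def det(m, p):
    """Determinant of a 2x2 matrix modulo p."""
    return (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p

def rank_of_g_minus_id(m, p):
    """Rank of g - id for a 2x2 matrix g over F_p."""
    n = (((m[0][0] - 1) % p, m[0][1] % p), (m[1][0] % p, (m[1][1] - 1) % p))
    if all(x == 0 for row in n for x in row):
        return 0
    return 2 if det(n, p) != 0 else 1
```

With v = (1, 1), the reflection over $\mathbb F_3$ is ((0, 2), (2, 0)) with determinant 2 = −1, while over $\mathbb F_2$ it is the coordinate swap ((0, 1), (1, 0)) with determinant 1. In both cases g − id has rank 1, so the parity s mod 2 = 1 is visible even where the determinant is trivial.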
Because the −1-spinor norm agrees with the +1-spinor norm when restricted to SO(Q), it follows that Ω(Q) is also the joint kernel of the Dickson map and the +1-spinor norm.
3.3. Notation for the monodromy representation. When d > 0, the map π : Sel •d n,B → W •d B is finite étale, representing a locally constant constructible sheaf of rank 12d − 4 free Z/nZ modules by [Lan21,Corollary 3.22]. For B an integral noetherian Z[1/2n] scheme, letting V d n denote the rank 12d − 4 free Z/nZ module corresponding to the geometric generic fiber of π, we obtain a monodromy representation Remark 3.7. Strictly speaking, we should keep track of base points in our fundamental groups. However, as we will ultimately be concerned with integral base schemes B, changing basepoint only changes the map ρ d n,k by conjugation on the domain. Since we will only care about the image of ρ d n,k , we will often omit the basepoint from our notation.
For R a ring, we use ρ d n,R to denote ρ d n,Spec R .
3.3.1. The adelic monodromy map. For n ′ | n both prime to char(k), we obtain a map ). For n prime to p, we have a natural reduction mod n map r n : ,R is uniquely characterized by the property that for all n prime to p, r n ρ d . In this section we prove an equidistribution result for Frobenius classes in the monodromy group, in the large q limit. To state the proposition, we define the "mult" map.
Definition 3.8. Let X be a geometrically connected finite type scheme over F q , let G be a profinite group, and let λ : π 1 (X) → G be a group homomorphism. Let G 0 denote the image of the composition π geom 1 (X) := π 1 (X Fq ) → π 1 (X) → G and let Γ := G/G 0 . Then, we define mult : G → Γ as the natural projection. Because π 1 (Spec F q ) = π 1 (X)/π geom 1 (X), we obtain a resulting map π 1 (Spec F q ) → Γ. We let γ q denote the image in Γ of geometric Frobenius.
The following is an equidistribution result for Frobenii in a monodromy group, which is a generalization of [Kow06b, Theorem 1].
Proposition 3.9. Let X be a smooth affine scheme of finite type over O[1/S], where O is a ring of integers in a number field, with geometrically irreducible fibers. For q a maximal ideal of O[1/S] with residue field F q , write X := X| O/q . Assume that we have a commutative diagram with λ 0 tamely ramified and surjective, G a finite group, and Γ abelian. Suppose C ⊂ G is a conjugacy-invariant subset. Then where G mult γ n q := mult −1 (γ n q ). Here the constant in the error term O X #G #C∩G mult γ n q q n is independent of q, the choice of G, and the choice of λ, so long as λ 0 is tamely ramified and surjective.
Proof. By the Lang-Weil bound, we have #X(F q ) = q dim X Fq + O X (q dim X Fq −1/2 ) and so after multiplying both sides by #X(F q ) (see also [Kow06b, Remark 2]), this statement nearly appears in [Kow06b, Theorem 1]. There are two differences however: First, Kowalski assumes that #G is prime to q instead of only that λ 0 is tamely ramified. Second, Kowalski works over a field instead of over O[1/S]. The proof of Proposition 3.9 is the same as that given in [Kow06b, Theorem 1], once these two differences are addressed. First we address the tamely ramified constraint. Indeed, a careful examination of the proof of [Kow06b, Theorem 1], shows that the only reason for assuming #G is prime to q appears in the reference to [Kow06a,Proposition 4.7], which in turn only uses this assumption in its reference to [Kow06a,Proposition 4.5], which in turn only uses this assumption in [Kow06a,(4.13)]. However, [Kow06a, (4.13)] holds whenever λ 0 , or the associated map labeled φ in [Kow06a], is tamely ramified, see [Ill81, 2.6, Cor 2.8]. We note that a generic hyperplane section of a tamely ramified cover remains tamely ramified, using Bertini's theorem to ensure that the hyperplane intersects the divisor of ramification generically. Hence, [Kow06a, Proposition 4.6], used in the proof of [Kow06a, Proposition 4.5], can be suitably generalized to include the assumption that the restriction of φ to the hyperplane is tamely ramified.
Second, we address the issue of working over O[1/S] in place of a finite field. The proof in [Kow06b] shows that if X comes as the reduction of a smooth X over O[1/S], then the constant in the error term O X #G #C q n of Proposition 3.9 can be taken to be a sum of (compactly supported) Betti numbers of X, which is uniform in q by Ehresmann's Theorem and proper base change for compactly supported étale cohomology. This applies in particular to the Selmer spaces, as they are smooth over Z[1/2].
In computing the image of the monodromy representation associated to the Selmer space, the following criterion for when an irreducible cover is geometrically connected will be crucial.
Corollary 3.10. Let Y be a geometrically irreducible finite type F q scheme and let π : X → Y be a finite étale connected Galois G cover corresponding to a surjective map ρ : π 1 (Y ) → G which is tamely ramified. Then, X is geometrically disconnected if and only if there exist infinitely many positive integers i such that for all y ∈ Y (F q i ), ρ(Frob y ) ≠ id ∈ G.
Proof. If X is geometrically connected, then once i is sufficiently large, there do exist y ∈ Y (F q i ) with ρ(Frob y ) = id, using the equidistribution of Frobenius elements in G resulting from Proposition 3.9 (using that G = G 0 in that statement).
We next show the converse. Suppose X is geometrically disconnected and let j denote the number of components of X Fq . We claim that for any i relatively prime to j, To conclude the proof, it suffices to show that for any such i relatively prime to j, and any y ∈ Y (F q i ), ρ(Frob y ) ≠ id ∈ G. Indeed, if ρ(Frob y ) = id ∈ G, the fiber of π : X → Y over y would necessarily be deg π copies of y, so in particular, X would have some F q i point. However, since X F q i is connected but geometrically disconnected, the j geometric components of X Fq must be nontrivially permuted by the action of Gal(F q /F q i ). In particular, this Galois action on the fiber X y over y must be nontrivial, and so X cannot have any F q i points.
Corollary 3.11. Retain the notation of Definition 3.8. For any n ≥ 1 and C ⊂ im ρ d n,Z[1/2n] a conjugacy class and F q a finite field of characteristic p with gcd(p, 2n) = 1, we have The same statement holds true with W •d k in place of W •d k . Proof. Note that in this setting, the tameness assumption on ρ d n,k was verified in the proof of [Lan21, Proposition 4.9], see especially the end of the first paragraph of [Lan21,p. 702]. The first statement follows immediately from Proposition 3.9. Note here that G and C as in the statement of Proposition 3.9 are fixed, and so we may absorb their orders into the constant in the error term O n,d (q −1/2 ).
To deduce the equidistribution statement for is cartesian. In other words, the monodromy representation associated to , all mapping to the same conjugacy class under ρ d n,k . Therefore, the 3.5. Determining the image of monodromy. In [Lan21, Theorem 4.4], a partial description of im ρ d n,k was given for k a field. The goal of this section is to precisely compute im ρ d n,k . First, we recall the description from [Lan21, Theorem 4.4]. Keeping notation as in § 3.2.1, for (V, Q) a quadratic space over a ring R with a map R → Z/nZ, we let (V n , Q n ) := (V Z/nZ , Q Z/nZ ) and let r n : O(Q) → O(Q n ) denote the induced reduction mod n map of orthogonal groups. We will be most concerned with the case R = Z or R = Z (p) . In We next recall a slight generalization of the usual cyclotomic character, which we shall need to characterize im ρ d n,k . Remark 3.13. Note that χ cyc of Definition 3.12 is the usual cyclotomic character when char(k) = 0. Further, from the definition, in the case p ≠ 0, k = F p , and q is a power of p, we have For the statement of Theorem 3.14, recall the notation for the spinor norm and Dickson invariant from § 3.2. Also, let ∆ Z/2Z : Z/2Z → primes ℓ ≠ p Z/2Z the diagonal inclusion. For k a field of characteristic p and d ∈ Z ≥2 , let χ d−1 denote the composition Theorem 3.14. Let k be a field of characteristic p, allowing p = 0, and let d ∈ Z ≥2 . With ∆ Z/2Z and χ d−1 defined above, Example 3.15. Let's explicate what Theorem 3.14 says in the cases of interest to this paper.
• If k is algebraically closed or d is odd, then .
• If d is even and k = F q has characteristic p > 0, using Remark 3.13, we have where [q] is the group generated by the class of q.
We will prove Theorem 3.14 at the end of this section in § 3.10. The general outline of the proof is as follows. First, in § 3.6, we show the image of the monodromy representation contains Ω(Q d Z (p) ). Next, in § 3.7, we explain how to compute the spinor norm and Dickson invariant of images of Frobenius, in certain cases. Then, in § 3.8 and § 3.9 we compute the spinor norm and Dickson invariants on im ρ d Z (p) ,k , for k a finite field. Finally, we piece these parts together in § 3.10.
3.6. Showing the monodromy is big. We next explain how to deduce In particular, combining this with [Lan21,Theorem 4.4] gives Proof. The last sentence follows from the first by [Lan21,Theorem 4 invertible as a linear transformation over Z or equivalently the natural map induced by B Q from V to V ∨ , the dual lattice, is an isomorphism.
In the case that n is a prime power, since is unimodular and nondegenerate of rank more than 5 (see [Lan21,Remark 4 However, we may arrange the latter by applying [Kne84,Satz 3 We then find that r v • r w agrees with r v • r w when reduced mod p ai i and is the identity when reduced mod p aj j for any j ≠ i. It follows that 3.7. Tools to compute the Dickson invariant and spinor norm of Frobenius. In this section, we prove Proposition 3.17 which allows us to compute the spinor norm and Dickson invariants of the images of Frobenius elements under the monodromy representation. The following result essentially appears as [Zyw14, Proposition 2.9], where an analog is stated over Z/ℓZ in place of Z (p) . The following generalization has essentially the same proof, using that L-functions associated to elliptic curves are power series with coefficients in Z. Slight care must be taken to deal with the fact that the determinant disagrees with the Dickson invariant over fields of characteristic 2.
For E an elliptic curve over F q (t), we let L(T, E) denote the L-function associated to E and let ε E ∈ {±1} denote the root number associated to E, see [Zyw14, §2.3] and [Zyw14, §2.2] respectively for definitions. The only property of root numbers we will use is that they appear in the functional equation of the L-function associated to E. Recall our notation (1) For where [q] is the class of the integer q in In order to prove Proposition 3.17 we will need the following Lemma, which is essentially shown in [Zyw14, p. 10].
Lemma 3.18. Let d ≥ 1, p an odd prime, ℓ a prime with ℓ ≠ p, and , viewed as an equality of polynomials with coefficients in Z ℓ . In particular, the analytic rank of E x is equal to the Z ℓ -rank of the generalized viewed as an equality with coefficients in Q ℓ . As explained in [Zyw14, p. 10], we have Fq over which the minimal proper regular model of E x is smooth. Let j : U → P 1 Fq denote the inclusion morphism. Let E x [ℓ k ] denote the rank 2 locally free sheaf of Z/ℓ k Z modules parameterizing the ℓ k torsion of the smooth minimal proper regular model of E given by multiplication by ℓ. We next identify H 1 (P 1 Fq , j * T ℓ (E x )) with V d Z ℓ so as to compare this representation with ρ d Z ℓ ,Fp . By Proposition 2.7, there is a natural identification between the geometric fiber of the Selmer space over x, . Further, these are both free Z/ℓ k Z modules of rank 12d − 4 by [Lan21, Corollary 3.19]. By compatibility of these isomorphisms with the maps , viewed as an equality of polynomials with coefficients in Q ℓ .
To conclude the proof, it remains to explain why the final statement regarding analytic rank follows from the equality det(id −g x,ℓ T ) = L(T /q, E x ). The analytic rank is the largest power of T − 1 dividing L(T /q, E x ) = det(id −g x,ℓ T ). This agrees with the largest power of T − 1 dividing det g −1 x,ℓ − T , which is the characteristic polynomial of g −1 x,ℓ . Hence, the analytic rank agrees with the dimension of the generalized 1-eigenspace of g −1 x,ℓ , which is the same as the dimension of the generalized 1-eigenspace of g x,ℓ .
Proof of Proposition 3.17. Define g x,ℓ := ρ d Z ℓ ,Fp (Frob x ). First, we verify (1) regarding the Dickson invariant. From the definition of the Dickson invariant from § 3.2.4, to compute the D Q d ) for each prime ℓ ≠ p separately and show this is equal to (1 − ε Ex )/2.
Next, observe that det(T − g x,ℓ ) = det(T − g −1 x,ℓ ). Indeed, for any nondegenerate quadratic space (V, Q) and M ∈ O(Q), and for M t the transpose of M , we have Hence, the characteristic polynomial of M agrees with that of M t which agrees with that of M −1 . Therefore, the characteristic polynomial of g x,ℓ agrees with that of g −1 x,ℓ using g x,ℓ ∈ O(Q d Z ℓ ) by the easier containment of [Lan21, Theorem 4.4].
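The comparison of characteristic polynomials used here can be written out as a short computation. This is a sketch under the assumption that, in a chosen basis, the Gram matrix of the bilinear form B Q is invertible (for instance after base change to the fraction field); the symbol B for this Gram matrix is introduced only for illustration.

```latex
\begin{align*}
M^{t} B M = B
  &\implies M^{t} = B M^{-1} B^{-1}, \\
\det(T - M) = \det\bigl((T - M)^{t}\bigr) = \det(T - M^{t})
  &= \det\bigl(B\,(T - M^{-1})\,B^{-1}\bigr) = \det(T - M^{-1}).
\end{align*}
```

The first line says that M preserves the bilinear form; the second uses only that transposition and conjugation leave the determinant unchanged.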
Therefore, we have By [Zyw14, Theorem 2.2] in conjunction with Lemma 3.18, we also have implying det(g x,ℓ ) = ε Ex . Note that in the case ℓ = 2, we are using crucially that we are working over Z 2 which does not have characteristic 2. The relation between the Dickson invariant and the determinant for matrices over Z 2 given in [Con14, Corollary C.3.2] implies (1).

We next verify (2). It suffices to verify sp
, for every prime ℓ ≠ p. As in the previous part, let g x,ℓ := ρ d Z ℓ ,Fp (Frob x ). First, observe that as det(id −g x,ℓ ) ≠ 0, it follows that g x,ℓ has trivial 1-eigenspace. Because the Dickson invariant for an orthogonal group over a nondegenerate free module of even rank is congruent to the rank of the 1-eigenspace mod 2 by [Tay92, p. 160], we find To conclude the proof, we only need check L(1/q, E) ∈ q d−1 (Z × ℓ ) 2 . In fact, considering L(T, E) as a polynomial with integer coefficients, we will verify L(1/q, E) ∈ q d−1 (Q × ) 2 , and the fact that both L(1/q, E) and q d−1 lie in Z × ℓ will imply they agree up to a square in Z × ℓ . Since det(id −g x,ℓ ) = L(1/q, E x ) and det(id −g x,ℓ ) ≠ 0, we find that the L-function of E x has analytic rank 0, meaning that ord T =1/q L(T, E x ) = 0 or equivalently L(1/q, E x ) ≠ 0. It follows from [Zyw14, Corollary 2.6] (as is deduced from the Birch and Swinnerton-Dyer conjecture, applicable because the analytic rank and algebraic rank are both 0) that Lemma 3.19. For any field k of characteristic p ≠ 2 (allowing p = 0) and any height d ≥ 2, the image of the map ). Therefore, it is similarly nontrivial on im ρ d Z (p) ,k . Therefore, to conclude the proof, it suffices to show im D Q d . Further, from the definition of profinite groups as a limit of finite groups, it suffices to show that for any integer n of the form n = ℓ 1 · · · ℓ t , for primes ℓ 1 , . . . , ℓ t with no ℓ i = p, im D Q d n • ρ d n,k is contained in im ∆ Z/2Z . By base change, it suffices to establish the containment im D Q d n • ρ d n,k ⊂ im ∆ Z/2Z when k is either Q or a finite field of odd characteristic. If the composition D Q d n • ρ d n,k defines a surjective map π 1 (W •d k ) → G, we obtain a resulting finite étale Galois G-cover U G,n,d,k → W •d k .
By Chebotarev density, for example as in [Eke90, Lemma 1.2], it suffices to establish that U G,n,d,Q is geometrically connected and to establish the claim for all finite fields k of odd characteristic. Further, geometric irreducibility for U G,n,d,Q follows from geometric irreducibility of U G,n,d,Fp for all but finitely many primes p, because U G,n,d,k → W •d k → Spec k is in fact the base change of a map U G,n,d We claim that the cover U G,n,d,k → W •d k is tamely ramified. Indeed, this holds because we are assuming k does not have characteristic 2, while the cover U G,n,d,k → W •d k has degree which is a power of 2 because the Dickson invariant takes values in a 2-group.
It follows from Corollary 3.10 that over any finite field k, the resulting G-cover is geometrically connected, and so the containment D Q d n •ρ d n,k (Frob x ) ⊂ im ∆ Z/2Z in fact holds for all finite fields of odd characteristic. 3.9. Controlling the spinor norm. We next use Proposition 3.17(2) to analyze the spinor norm applied to im ρ d Z (p) ,k . The general strategy in what follows will be to compute the image of the spinor norm restricted to the kernel of the Dickson invariant, and then use this to deduce the joint image of the spinor norm and Dickson invariant.
For this proof, we will need to know there are many elliptic curves [E x ] ∈ W •d k with trivial 1-eigenspace. This will follow from the group theoretic statement soon established in Proposition 3.22. In order to state this precisely, we recall a relevant distribution on the ℓ-adic points of a finite type scheme from [ Lemma 3.20. Let X be a finite type Z ℓ scheme of dimension d and equip X(Z ℓ ) with the ℓ-adic topology. There exists a unique bounded R ≥0 -valued measure µ X on the Borel σ-algebra of X(Z ℓ ) such that for any open and closed subset S of X(Z ℓ ), we have Remark 3.21. Lemma 3.20 is correct as stated, but the proof in [BKL + 15, Proposition 2.1(b)] has a minor error. There, it is stated that #Y (Z/ℓ e Z) = O (ℓ e ) d−1 , which is not in general true. The correct statement is that # im (Y (Z ℓ ) → Y (Z/ℓ e Z)) = O (ℓ e ) d−1 . A counterexample to the incorrect statement is provided by the subscheme Y = Spec Z[x]/(x 2 ) and X = A 1 Z ℓ . In this case, we easily see that #Y (Z ℓ ) = 1 because Z ℓ is reduced, but #Y (Z/ℓ e Z) = ℓ ⌊e/2⌋ as such points are in bijection with elements of Z/ℓ e Z which square to 0.
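The point count in the counterexample of Remark 3.21 is easy to confirm by brute force. The sketch below simply enumerates the elements of Z/ℓ e Z that square to 0; the function name is hypothetical and introduced only for this illustration.

```python
def count_square_zero(l, e):
    """Number of x in Z/l^e Z with x^2 = 0, i.e. #Y(Z/l^e Z) for Y = Spec Z[x]/(x^2)."""
    n = l ** e
    return sum(1 for x in range(n) if (x * x) % n == 0)

# Compare with the closed form l^floor(e/2) for a few small cases.
for l, e in [(2, 5), (3, 3), (3, 4), (5, 1)]:
    assert count_square_zero(l, e) == l ** (e // 2)
```

For d = 1 the claimed bound O((ℓ e ) d−1 ) would be O(1), while these counts grow like ℓ ⌊e/2⌋ , which is the failure the remark points out.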
In the following proposition only, we use O(Q) and SO(Q) to denote the algebraic groups associated to a quadratic form Q, and O(Q)(R) to denote its Spec R points, for R a ring.
Proposition 3.22. Let (V, Q) be a nondegenerate quadratic space of even rank at least 4 over Z ℓ . There is a Zariski closed pure codimension 1 subscheme Z ⊂ O(Q), such that g ∈ Z if and only if g has a generalized 1-eigenspace of dimension at least 2.
Further, any g ∈ (O(Q) − Z)(Z ℓ ) has a zero dimensional generalized 1-eigenspace and zero dimensional 1-eigenspace when g ∈ SO(Q)(Z ℓ ) and a one dimensional generalized 1-eigenspace and one dimensional 1-eigenspace when g / ∈ SO(Q)(Z ℓ ). In particular, Z(Z ℓ ) has measure 0 with respect to the distribution of Lemma 3.20.
Proof. For V L an even dimensional free module over a field L and g : V L → V L , let V g=λ L denote the λ-eigenspace and V [g=λ] L denote the generalized λ-eigenspace. Let Q L be a nondegenerate quadratic form on V L . Recall that the Dickson invariant agrees with dim V g=1 L mod 2, using that dim V L is even and [Tay92,p. 160]. (In [Tay92, p. 160] the notation [V, f ] is used for im(1 − f ), whose rank taken mod 2 agrees with dim V g=1 L mod 2 since dim V L is even.) In particular, every element in (O(Q L ) − SO(Q L ))(L) has odd dimensional 1-eigenspace while every element of SO(Q L )(L) has even dimensional 1-eigenspace. Now, let (V, Q) be a nondegenerate even rank quadratic space over Z ℓ as in the statement of the proposition. We may apply the above discussion to the base change (V Q ℓ , Q Q ℓ ) to deduce that any element g ∈ SO(Q)(Z ℓ ) has rk V g=1 L ≡ 0 mod 2 and any element of g ∈ (O(Q) − SO(Q))(Z ℓ ) has rk V g=1 Further, the condition that an element g ∈ SO(Q)(Z ℓ ) has rk V [g=1] Q ℓ > 0 is Zariski closed and nonempty in the algebraic group SO(Q) over Z ℓ ; it is Zariski closed because this condition can be expressed as T − 1 dividing the characteristic polynomial of g and it is nonempty because there are elements in a maximal torus with dim V g=1 is Zariski closed and nonempty. (This uses that char Z ℓ = 0 ≠ 2, as in characteristic 2 every element of O(Q) − SO(Q) would have generalized 1 eigenspace of dimension at least 2.) Therefore, to establish the statement regarding generalized 1-eigenspaces, it suffices to show that a proper Zariski closed subscheme of an integral scheme over Z ℓ parameterizes a measure 0 subset, which is the content of Lemma 3.20.
The statement for generalized 1-eigenspaces established above implies the corresponding statement for 1-eigenspaces because when the generalized 1-eigenspace is at most 1 dimensional, it is equal to the 1-eigenspace. The final statement that Z(Z ℓ ) has measure 0 follows from Lemma 3.20.
We next define a double cover Z d k → W •d k so that the Dickson invariant is trivial on π 1 (Z d k ). Definition 3.23. Let n ≥ 1, d ≥ 2, and let k be an integral domain (not necessarily a field) on which 2n is invertible. By Lemma 3.19, the Dickson invariant defines a surjective map π 1 (W •d k ) → Z/2Z and hence corresponds to a finite étale Z/2Z cover Z d k → W •d k . This yields a map π 1 (Z d k ) → SO(Q d n ) which is identified with the restriction of ρ d n,k to the kernel of the Dickson invariant.
In the case k is a field, by abuse of notation, we have a map χ cyc : π 1 (Spec k) → (Z/nZ) × (induced by the cyclotomic character χ cyc to Z (p) × from Definition 3.12). In the general case where k is just an integral domain, we also obtain a map χ cyc : π 1 (Spec k) → (Z/nZ) × which can be defined as the unique map making the diagram below commute: We have a diagram Lemma 3.24. The square (3.3) commutes when k is a field of characteristic prime to 2n.
Proof. Because commutativity of (3.3) is compatible with base change on the integral domain k, it suffices to verify it in the cases that k = Q and that k is a finite field of characteristic prime to 2n. First, we verify the claim when k is a finite field of characteristic prime to 2n. It suffices to establish the claim for all sufficiently divisible n. Hence, to simplify matters later, we make the further harmless assumption that 8 | n. Using that (Z/nZ) × / (Z/nZ) × 2 has even order, it suffices to verify commutativity of (3.3) for all sufficiently large finite fields of characteristic p with gcd(p, 2n) = 1, and odd degree over F p . Now, for such sufficiently large finite fields, we only need verify that for varying x ∈ Z (k), sp − Q d n (ρ d n,k (Frob x )) is always equal to q d−1 . By Proposition 3.9, Frobenius elements are equidistributed in a coset of the geometric monodromy group and so it suffices to establish sp − Q d n ρ d n,k (Frob x ) = q d−1 for a subset of x ∈ W •d k (k) with density in W •d k (k) tending to 1 as #k → ∞. Further, we note that the spinor norm is unchanged upon replacing n with n j for any j ≥ 1. Note that here we are using the assumption 8 | n, as, for example, sp − Q d 2 maps to the trivial group while sp − Q d 4 maps to a nontrivial group. By replacing n with a sufficiently large power we can ensure that the density of g ∈ im ρ d n,k with a 0-dimensional 1 eigenspace is arbitrarily close to 1 by Proposition 3. 22 is trivial. From the exact sequence [Gro71, Exposé IX, Théorème 6.1] we obtain that the cover T × Spec Z[1/2n] Spec Q → Z d Q is the pullback of a cover S → Spec Q along the structure map Z d Z → Spec Q. To conclude, we wish to show S is a trivial cover of Spec Q. By Chebotarev density, it suffices to show that the normalization of Spec Z in S is the trivial cover over a density 1 subset of primes.
Since S pulls back to T × Spec Z[1/2n] Spec Q along the map Z d Z[1/2n] → Spec Q, it suffices to show that T → Z d Z[1/2n] is the trivial cover over a density 1 subset of primes. Indeed, this triviality holds by the previously established commutativity of (3.3) when char(k) is positive.
Recall in Definition 3.23, we defined Z d k as the double cover of W •d k corresponding to the kernel of the Dickson invariant. That is, Lemma 3.25. For a field k of characteristic p ≠ 2 (allowing p = 0) and any height d ≥ 2, the image of the spinor norm map restricted to ker(D Q d is identified with the image of the composition (3.5) Remark 3.26. In the case k is algebraically closed or d is odd, Lemma 3.25 says the image of the spinor norm map sp − • ρ d Z (p) ,k , when restricted to the kernel of the Dickson invariant, is trivial.
Proof. It suffices to establish the claim for all finite n, with no prime factor of n equal to p, in place of Z (p) . The result then follows from Lemma 3.24.
3.10. Proving Theorem 3.14. Combining the results of the preceding subsections, we are ready to complete our monodromy computation.
Proof of Theorem 3.14. First, by Lemma 3.16, we find Ω(Q d Z (p) ) ⊂ im ρ d Z (p) ,k . As Therefore, the image of the joint map (D Q d ,k is generated by ∆ Z/2Z (Z/2Z) × id together with the image of the spinor norm when restricted to the kernel of the Dickson invariant. The latter image is given in the theorem statement by Lemma 3.25. Therefore, the joint map (D Q d ) has image as claimed in the statement of Theorem 3.14.

4. The distribution of Sel ℓ
In this section we will prove the key results towards showing that the BKLPR heuristic agrees with the geometric distribution of Sel ℓ , for prime ℓ. The psychology of the problem is as follows: one would like to "understand" the distributions by computing numerical invariants such as moments, but the distributions in question are not determined by their moments, since these moments grow too quickly. However, both distributions are the limit as a certain "height" parameter tends to infinity, and at finite height they are distributions on finite sets, hence obviously determined by their moments. We can then verify that the two limiting distributions agree by showing that the "finite height" distributions are very close, which we can then do by computing enough moments.
The key point that makes this computation feasible is that the moments stabilize very quickly as the height grows. It was already observed in [Lan21, Theorem 1.2] that the first moment (i.e., the average size) of Sel ℓ for height d elliptic curves (in the large q limit) is already equal to its limiting value as soon as the height d is at least 2. In this section we go much further, computing the first 6d − 2 moments for families of elliptic curves with height d (in the large q limit), and showing that they are all already equal to their limiting values. Even computing one fewer moment would be insufficient for our purposes, and it seems that computing one more moment in closed form would be quite difficult, as the next moment is not equal to its limiting value! We caution, however, that the distribution at finite height depends quite delicately on the monodromy group; for example, the large q limit does not literally exist because of small fluctuations among the monodromy groups, but the difference between its lim inf q→∞ and lim sup q→∞ will tend to 0 as the height tends to infinity.
We now give an outline of the contents of this section. In § 4.1, we introduce the random kernel model, which is our model for Selmer groups that directly connects to points of the Selmer space. This model will be defined in terms of kernels of random elements of subgroups of an orthogonal group, and so in § 4.2 we compute the probability distributions of the dimensions of these kernels. In § 4.3.5 we show how to compute the moments of the above mentioned random kernels, and then how to determine their distribution in terms of these moments, which is used in § 4.4 to bound the total variation distance between the random kernel model and the BKLPR model. We emphasize that these results a priori concern the random kernel model rather than Sel n , but later in §6 it will be spelled out how to relate the two. 4.1. The random kernel model. We introduce another probabilistic model which is closely related to the distribution of Selmer elements. We will continue to use the notation introduced earlier, especially from § 3.2.1. is the orthogonal group for the quadratic form of Definition 3.1. We define RSel H V d n to be the random variable ker(g − id), valued in isomorphism classes of Z/nZ-modules, for g drawn uniformly at random from H.
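To make the random kernel model concrete, here is a toy computation. It is only a sketch under assumptions that are not from the paper: instead of the rank 12d − 4 space V d n , it takes the hyperbolic plane Q(x, y) = xy over F p for a small odd prime p, takes H to be the full orthogonal group O(Q), and tabulates dim ker(g − id) over all g in H by brute force. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

def kernel_dimension_counts(p):
    """Count g in O(Q), for Q(x, y) = x*y over F_p (p an odd prime), by dim ker(g - id)."""
    vectors = list(product(range(p), repeat=2))

    def q(v):
        return (v[0] * v[1]) % p

    counts = Counter()
    for a, b, c, d in product(range(p), repeat=4):
        def act(v):
            # Apply the matrix [[a, b], [c, d]] to the column vector v.
            return ((a * v[0] + b * v[1]) % p, (c * v[0] + d * v[1]) % p)

        # Keep only the linear maps preserving Q; these form O(Q).
        if any(q(act(v)) != q(v) for v in vectors):
            continue
        # ker(g - id) is the fixed subspace, of size p^dim.
        fixed = sum(1 for v in vectors if act(v) == v)
        dim = {1: 0, p: 1, p * p: 2}[fixed]
        counts[dim] += 1
    return dict(counts)
```

For p = 3 the group has four elements and the kernel dimensions 0, 1, 2 occur with probabilities 1/4, 1/2, 1/4. In this toy case the odd-dimensional kernels come exactly from the two elements of determinant −1, matching the parity statement for the Dickson invariant recalled in § 3.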
In this section, we will primarily be concerned with the case of Definition 4.1 where n = ℓ is prime, but in § 5, we will crucially use the case that n = ℓ e is a prime power. Now we will define the precise random variable that we end up relating to the distribution of ranks and Selmer groups of elliptic curves for our universal family.
Define RSel d n,k as the distribution on Ab n given by Define (Rrk, RSel n ) d k as the distribution on Z ≥0 × Ab n given by Theorem 3.14, adapted to the case of finite fields, gives: , so we choose to work with the latter. Observe that the monodromy agrees with the geometric monodromy (i.e., im ρ d n,Fq = im ρ d n,Fq ) when q is a square or d is odd or n ≤ 2, and contains the geometric monodromy with index 2 when q is not a square and d is even and n > 2 by Theorem 3.14. Therefore, in the former case, it is equidistributed in the monodromy group, which is H d n,k in this case, up to an error of O n,d (q −1/2 ) by Proposition 3.9. On the other hand, when q is not a square and d is even and n > 2, γ q as in Definition 3.8 is nontrivial since the geometric monodromy is not equal to the monodromy. Hence, by Proposition 3.9, Frob x is equidistributed in the nontrivial coset of ρ d n,Fq ⊂ ρ d n,Fq , which is precisely im ρ d n,Fq − im ρ d n,Fq = H d n,k . The statement regarding the concrete characterization of the Dickson invariant and spinor norm is merely a restatement of Theorem 3.14.
In §6, we will use the results from §2.2 and §3.4 to relate the random kernel model to the distribution of Selmer groups. For the rest of this section, we focus on analyzing the random kernel model.

4.2. Distribution of random 1-eigenspaces. We now focus on the case where n = ℓ is prime. 4.2.1. Some notation. We will use Theorem 4.9 in conjunction with Lemma 4.5 to deduce the probability generating function for ker(g − id) for g drawn uniformly at random from a coset of Ω . We label these cosets as in the following table.
For Z a random variable valued in isomorphism classes of finite-dimensional F ℓ -vector spaces, define the probability generating function of Z to be the polynomial in t given by For a polynomial f (t) = ∑ i∈N a i t i , introduce the notation [f (t)] r := a r to denote the coefficient of t r in f (t).

4.2.2. The probability generating functions. We will now work towards the proof of: Theorem 4.4. Let ℓ > 2 be an odd prime and d ≥ 1 a positive integer. Then we have

4.2.3. Some lemmas.
We begin with some preliminary results. For (V, Q) a quadratic space and k ∈ Z ≥0 , we will abbreviate We claim that if dim L V ≥ 2m + 2, for every a ∈ L, there is some w ∈ W ⊥ with Q(w) = a. Assuming this claim, let us show that the orbits of O(Q) and Ω(Q) coincide. First, we tackle the case char(L) ≠ 2. In this case, it suffices to show that for each (α, β) ∈ Z/2Z × Z/2Z, there is some h ∈ O(Q) fixing (v 1 , . . . , v m ) with sp − Q (h) = α and det(h) = β. To see such an h exists, let w be an element in W ⊥ with −Q(w) a square in L, and let w ′ be an element with −Q(w ′ ) a non-square in L. Then the four elements id, r w , r w ′ , r w • r w ′ ∈ O(Q) attain all four possible values of (sp − Q , det) and fix (v 1 , . . . , v m ). This implies that Ω(Q) acts transitively on the O(Q)-orbit of (v 1 , . . . , v m ).
The case char(L) = 2 is similar, but easier. To show Ω(Q) has the same orbits as O(Q), it suffices to exhibit an element of nontrivial Dickson invariant fixing (v 1 , . . . , v m ). Indeed, for any v ∈ W ⊥ , r v is such an element.
We now conclude the proof by verifying the claim. If (V, Q) is any nondegenerate quadratic space of dimension at least 2 over a finite field L, then for every a ∈ L there is some v ∈ V with Q(v) = a. Recall that the rank of a quadratic space (V, Q) is defined to be rk(V, Q) := dim V − dim rad(V, Q), where rad(V, Q) is the radical of (V, Q), i.e., the set of x ∈ V with B Q (x, y) = 0 for all y ∈ V . Therefore, it suffices to show that rk(Q| W ⊥ , W ⊥ ) ≥ 2. Note that rad(Q| W ⊥ , W ⊥ ) = W ∩ W ⊥ . Hence (4.1) It will also be useful later to have a result on the case when dim V = 2m. To see that such w exists, it suffices to show that rk(Q| W ⊥ , W ⊥ ) > 0. But by (4.1), this holds as long as W is not maximal isotropic.
Lemma 4.7. For ℓ a prime and d ≥ 1, Proof. For g ∈ G, let V g=1 denote the 1-eigenspace of g acting on V . Let G ′ ⊂ G be a subgroup. By definition, we have so that By Lemma 4.5, the right hand side of (4.4) has the same value when we take G ′ to be any of We then obtain the result by noting that any coset can be expressed in terms of differences of the above subgroups. For example, we can obtain the result for H = B by writing Proof of Theorem 4.4. Recall that the Dickson invariant of any element g ∈ O(Q d n ) agrees with the dimension of its 1-eigenspace mod 2. Indeed, in general, the Dickson invariant of g agrees with dim im(1 − g), by [Tay92, p. 160], where the notation Because of this, only odd powers of t can appear in . Furthermore, they have degree at most 12d − 5 since dim V = 12d − 4. By Lemma 4.7, these functions agree at the 6d − 2 points 1, ℓ, . . . , ℓ 6d−3 . Since they are both odd functions, they must agree as well at 0, −1, −ℓ, . . . , −ℓ 6d−3 . But two polynomials of degree at most 12d − 5 agreeing at 12d − 3 points must be the same.
are even polynomials of degree at most 12d − 4, and they agree at the To find the constant of proportionality, note that the coefficient of is the probability that g ∈ H fixes all of V , i.e. is the identity. This happens with probability where U denotes the hyperbolic plane and −E 8 denotes the quadratic form associated to the E 8 lattice with negative its usual pairing. Since , is proved in an unpublished manuscript of Rudvalis-Shinoda, cf. [FS16]. We will give an independent proof of this theorem in § 4.3.1.
For Z a random variable we let E(Z m ) denote the mth moment of Z, which is the expected value of the random variable Z m .

Furthermore, we have
(4.5) Additionally, for 0 ≤ m ≤ 6d − 2, the moments of #RSel From Theorem 4.9 and Theorem 4.4, it is fairly straightforward to deduce explicit formulas for the probability generating functions However, we omit the answers as we will not need them.
4.3. Direct computation of the moments. In this subsection we give an alternate computation of the moments of dim ker(g − id) for g ∈ O(Q), for Q a quadratic form over F ℓ of sufficiently large rank, without using the unpublished results of Rudvalis and Shinoda. We will explain that this gives an alternate proof of Theorem 4.9. In addition, the analysis here is used later to get better control on the convergence of the random kernel model. As already mentioned above, [FS16] computed an explicit formula for the moments of dim ker(g − id) for g ∈ O(Q), using the probability distribution obtained in unpublished work of Rudvalis-Shinoda. The calculation of Rudvalis-Shinoda rests on intricate combinatorial analysis. We learned of this work after we had already found an independent computation of the probability distribution, which we will explain in this subsection. Our logic in this subsection runs in the opposite direction: we directly compute the moments, and deduce the probability distribution from them. (The advantage of this approach is that it also gives the distribution for g drawn from subgroups of O(Q), such as Ω.)

Theorem 4.10. Fix m ∈ Z ≥0 , let n be squarefree, and let (V, Q) be a nondegenerate quadratic space over Z/nZ. If rk Z/nZ V ≥ 2m + 2, then:
(1) The number of orbits of O(Q) acting diagonally on V m is $\prod_{\ell \text{ prime},\, \ell \mid n} (1 + \ell)(1 + \ell^2) \cdots (1 + \ell^m)$. (4.6)
(2) The orbits of Ω(Q) acting diagonally on V m coincide with those of O(Q) acting diagonally on V m .
For the next part (which is about getting slightly sharper results in the "edge case" r = 2m), we let n = ℓ be prime and ask that (V, Q) be a split 4 quadratic space of dimension r over F ℓ .
(3) For r = 2m, the number of orbits of O(Q) acting diagonally on V m is also given by (4.6).
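The smallest instance of part (3) can be checked by brute force. The following is a minimal sketch, assuming the split plane is modeled as F ℓ ² with Q(x, y) = xy; the function names are illustrative and not from the paper. It uses Burnside's lemma, under which the number of orbits of O(Q) on V m equals the average of (#ker(g − id)) m over g ∈ O(Q), which is the link between orbit counts and moments used throughout this section. For r = 2 and m = 1 the count should be 1 + ℓ.

```python
from fractions import Fraction
from itertools import product


def q_form(v, ell):
    """Split quadratic form Q(x, y) = x*y on F_ell^2 (an assumed model)."""
    return (v[0] * v[1]) % ell


def act(g, v, ell):
    """Apply the 2x2 matrix g = (a, b, c, d), read row by row, to v."""
    a, b, c, d = g
    return ((a * v[0] + b * v[1]) % ell, (c * v[0] + d * v[1]) % ell)


def orthogonal_group(ell):
    """All invertible 2x2 matrices over F_ell that preserve Q."""
    vectors = list(product(range(ell), repeat=2))
    return [
        g
        for g in product(range(ell), repeat=4)
        if (g[0] * g[3] - g[1] * g[2]) % ell != 0
        and all(q_form(act(g, v, ell), ell) == q_form(v, ell) for v in vectors)
    ]


def orbit_count(ell, m):
    """Orbits of O(Q) on V^m via Burnside: average of (#fixed vectors)^m."""
    vectors = list(product(range(ell), repeat=2))
    group = orthogonal_group(ell)
    total = 0
    for g in group:
        fixed = sum(1 for v in vectors if act(g, v, ell) == v)
        total += fixed**m
    return Fraction(total, len(group))


if __name__ == "__main__":
    for ell in (2, 3, 5, 7):
        print(ell, len(orthogonal_group(ell)), orbit_count(ell, 1))
```

Enumerating all matrices is only feasible in this tiny case; for larger rank one would generate O(Q) from reflections instead.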

4.3.1.
Proof of Theorem 4.9, assuming Theorem 4.10. Let G(t) be the generating function of the distribution in Theorem 4.9. This is a polynomial of degree 12d − 4; write where G odd (t) is an odd polynomial and G even (t) is an even polynomial. The computation in [FS16] shows that the moments of the even and odd parts of the distributions coincide, so that As explained in Lemma 4.7, the orbit counts in Theorem 4.10 are the moments of #RSel Theorem 4.10 shows that the mth moment of #RSel for the decomposition into odd and even parts, Lemma 4.7 implies also that Since they are both odd polynomials, they also agree at −ℓ m for 0 ≤ m ≤ 6d − 3. But since they both have degree at most 12d − 5, and they agree at 12d − 4 points, they must be equal. Similarly, G even (ℓ m ) = G even Since they are both even polynomials, they also agree at −ℓ m for 0 ≤ m ≤ 6d − 2. Hence their difference is a polynomial of degree at most 12d − 4 vanishing at the 12d − 4 points ±ℓ m for 0 ≤ m ≤ 6d − 3, and must therefore be a multiple of 6d−2 m=0 (t 2 − ℓ 2m ). But the coefficients of t 12d−4 in both G even (t) and G even , so the constant of proportionality must be 0.
The rest of this subsection is devoted towards proving Theorem 4.10.

Counting orbits of independent vectors.
Recall that a quadratic space is hyperbolic if it has the form W ⊕ W ∨ with form Q(w, λ) = λ(w); over a field, this is equivalent to the condition that it be metabolic, i.e., that it is nondegenerate and contains an isotropic subspace of half the dimension [MH73, III, Lemma 1.2].
Lemma 4.11. Let (V, Q) be a metabolic quadratic space over a field. Then any (possibly degenerate) quadratic space (W, Q ′ ) of dimension dim(W ) ≤ dim(V )/2 embeds isometrically in V .
Proof. Any nondegenerate quadratic space over a finite field is isomorphic to the direct sum of a hyperbolic quadratic space and a nondegenerate quadratic space of dimension at most 2, and Lemma 4.11 shows that (W, Q ′ ) embeds in the former.
The key technical ingredient in the proof of Theorem 4.10 is the following Proposition.

4.3.3.
Orbits of dependent vectors. We aim to explain how to determine the orbits of tuples of vectors that are linearly dependent inductively using Proposition 4.13. The following lemma is key to counting these dependent orbits. Proof. Suppose that (x i1 , . . . , x it ) is a basis for W , so dim W = t. Then for any g ∈ O(Q), g·(x 1 , . . . , x m−1 , y) is uniquely determined by g · (x i1 , . . . , x it ).
To count the number of orbits, we can express y uniquely as Then the orbit of (x 1 , . . . , x m−1 , y) is uniquely determined by the scalars (a i ∈ F ℓ ) 1≤i≤t , and so there are ℓ dim W such orbits.

A recursive formula.
Definition 4.15. Fix a quadratic space (V, Q) over a finite field k. Let f (n, i) be the number of orbits of V n under the action of O(Q) such that dim k Span(x 1 , . . . , x n ) = i.
We next explain a recursive formula for the f (n, i).
This completes the proof of parts (1) and (2). Now we move onto parts (3) and (4). The argument for part (3) is the same as for the proof of Theorem 4.10. For Part (4), we note by Lemma 4.6 that the orbits coincide except on vectors (x 1 , . . . , x m ) ∈ V m that span a maximal isotropic subspace of V . In this case there is only one orbit of such vectors under O(Q), but two orbits under SO(Q) [CF17, Corollary T.3.4].

4.4.
Bounding the TV distance. We use the moment computations in § 4.3 to obtain certain useful expressions for the probability generating functions.
In this section, let (V r , Q r ) be the split orthogonal space over F ℓ of rank r (hence discriminant 1). We denote O r = O(V r , Q r ), SO r = SO(V r , Q r ), Ω r = Ω(V r , Q r ), etc.
Let H 2r ⊂ O 2r denote the kernel of the Dickson invariant, i.e., H 2r = SO 2r when ℓ is odd, and H 2r = Ω 2r when ℓ is even. For j ≥ 0, let M j be the limit as r → ∞ of the jth moment of RSel SO Vr , which by Theorem 4.9 is $\prod_{i=1}^{j} (\ell^i + 1)$.

Lemma 4.21. We have the following values for the moments of # ker(g − 1) for g drawn from H 2r and its complement:

Proof. The claims for j < r follow from Lemma 4.5 plus Theorem 4.10. The claims for j = r follow from Lemma 4.6 plus Theorem 4.10.

Let P r (t) be the unique even polynomial of degree 2r such that P r (ℓ j ) = M j for all 0 ≤ j ≤ r, and let P ′ r (t) be the unique odd polynomial of degree 2r − 1 such that P ′ r (ℓ j ) = M j for 0 ≤ j < r (not to be confused with the derivative of P r ). Define $G_r(t) := \mathbb{E}_{g \in H_{2r}}[t^{\dim \ker(g-1)}]$ to be the probability generating function for 1-eigenspaces of elements drawn randomly from H 2r , and $G'_r(t) := \mathbb{E}_{g \in O_{2r} - H_{2r}}[t^{\dim \ker(g-1)}]$.

Lemma 4.22. We have identities
$G_r(t) = P_{r-1}(t) + \frac{1}{\#H_{2r}} \prod_{0 \le j < r} (t^2 - \ell^{2j})$, (4.9)
$G_r(t) = P_r(t) + \prod_{0 \le j < r} \frac{t^2 - \ell^{2j}}{\ell^{2r} - \ell^{2j}}$, (4.10)
(4.12)

Proof. First, we check (4.9). By Lemma 4.21, $G_r(t) - P_{r-1}(t)$ vanishes at $t = \pm \ell^j$ for 0 ≤ j ≤ r − 1, and is of degree 2r, hence is proportional to $\prod_{0 \le j < r} (t^2 - \ell^{2j})$. Therefore, we can determine $G_r(t)$ completely by examining the coefficient of $t^{2r}$, which is $(\#H_{2r})^{-1}$ because that is the probability of drawing the identity element.
We next check (4.10). Similarly, G r (t)−P r (t) is proportional to 0≤j<r (t 2 −ℓ 2j ), and it can be determined by evaluating at ℓ r , where the value is 1 by Lemma 4.21.
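As a sanity check on the two descriptions of G r (t) in Lemma 4.22, one can compare them numerically for small r. The sketch below makes several assumptions that are not spelled out in the text: ℓ is odd, so that H 2r = SO 2r ; identity (4.9) is read as G r = P r−1 + (1/#H 2r ) ∏ (t² − ℓ^{2j}) and (4.10) as G r = P r + ∏ (t² − ℓ^{2j})/(ℓ^{2r} − ℓ^{2j}), with products over 0 ≤ j < r; and the standard order formula #SO 2r = ℓ^{r(r−1)} (ℓ^r − 1) ∏_{i=1}^{r−1} (ℓ^{2i} − 1) for the split special orthogonal group is used. All function names are illustrative.

```python
from fractions import Fraction


def moment_limit(ell, j):
    """M_j = prod_{i=1}^{j} (ell^i + 1)."""
    out = 1
    for i in range(1, j + 1):
        out *= ell**i + 1
    return out


def eval_P(ell, r, t):
    """Evaluate the even polynomial P_r with P_r(ell^j) = M_j for 0 <= j <= r.

    Lagrange interpolation in the variable s = t^2 at the nodes ell^(2j).
    """
    s = Fraction(t) ** 2
    nodes = [Fraction(ell) ** (2 * j) for j in range(r + 1)]
    total = Fraction(0)
    for j, sj in enumerate(nodes):
        term = Fraction(moment_limit(ell, j))
        for k, sk in enumerate(nodes):
            if k != j:
                term *= (s - sk) / (sj - sk)
        total += term
    return total


def order_so_split(ell, r):
    """Order of the split SO_{2r}(F_ell) for odd ell (standard formula, assumed)."""
    n = ell ** (r * (r - 1)) * (ell**r - 1)
    for i in range(1, r):
        n *= ell ** (2 * i) - 1
    return n


def rhs_4_9(ell, r, t):
    """Right-hand side of (4.9) as read above."""
    s = Fraction(t) ** 2
    prod = Fraction(1)
    for j in range(r):
        prod *= s - Fraction(ell) ** (2 * j)
    return eval_P(ell, r - 1, t) + prod / order_so_split(ell, r)


def rhs_4_10(ell, r, t):
    """Right-hand side of (4.10) as read above."""
    s = Fraction(t) ** 2
    top = Fraction(ell) ** (2 * r)
    prod = Fraction(1)
    for j in range(r):
        node = Fraction(ell) ** (2 * j)
        prod *= (s - node) / (top - node)
    return eval_P(ell, r, t) + prod


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, [rhs_4_9(3, r, t) == rhs_4_10(3, r, t) for t in (0, 2, 7, 11)])
```

Both sides are polynomials of degree r in s = t², so agreement at r + 1 distinct values of s shows they are equal. For r = 1 and ℓ = 3 both give (t² + 1)/2, matching the direct count over SO 2 (F 3 ) = {±1}.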
Recall that the Total Variation distance (TV) between two probability distributions P and P ′ is When P and P ′ are defined on a countable discrete probability space X, as shown in [LPW09, Proposition 4.2] we can write this as In other words, conflating P and P ′ with functions on X, this is (up to the normalization factor 1/2) the L 1 -norm. Clearly, convergence in TV distance implies convergence as distributions (which is pointwise convergence in the case of distributions on a discrete space). We define the TV distance between two random variables to be the TV distance between their induced probability distributions.
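For concreteness, here is a minimal sketch of the two standard descriptions of TV distance on a finite set: the supremum of |P(A) − P ′ (A)| over events A, and half the L 1 -norm of P − P ′ . The representation of distributions as dictionaries and the function names are assumptions made for illustration. Applied to the coefficient lists of two probability generating functions, the second function computes half the sum of the absolute values of the coefficient differences.

```python
from fractions import Fraction
from itertools import combinations


def tv_distance(p, q):
    """Half the L^1 distance between two finitely supported distributions."""
    support = set(p) | set(q)
    return sum(abs(Fraction(p.get(x, 0)) - Fraction(q.get(x, 0))) for x in support) / 2


def tv_distance_sup(p, q):
    """Supremum over all events A of |P(A) - Q(A)| (exponential time, tiny supports only)."""
    support = sorted(set(p) | set(q))
    best = Fraction(0)
    for k in range(len(support) + 1):
        for event in combinations(support, k):
            gap = abs(sum(Fraction(p.get(x, 0)) - Fraction(q.get(x, 0)) for x in event))
            best = max(best, gap)
    return best


if __name__ == "__main__":
    p = {0: Fraction(1, 2), 1: Fraction(1, 2)}
    q = {0: Fraction(1, 4), 1: Fraction(3, 4)}
    print(tv_distance(p, q), tv_distance_sup(p, q))
```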
Theorem 4.23. For ℓ a prime, d ≥ 2, and q ranging over prime powers with gcd(q, 2ℓ) = 1 We have where the implicit constants are absolute in both cases.
Proof. We write the proof in the case where ℓ is odd; the case where ℓ = 2 is even easier, as the analysis of the cosets simplifies because there are fewer cosets (cf. the discussion in § 4.2.4). We first compare the TV distance between dim RSel and Note that the TV distance between random variables Z and Z ′ has a clean formulation in terms of the probability generating functions G Z (t) and G Z ′ (t): it is half the sum of the absolute values of the differences of the coefficients, as follows from (4.13). Using this observation together with Theorem 4.4, we have (1 + ℓ 2i ).
By examining the dimension of the orthogonal group, we find On the other hand, we have .
Next, we estimate d TV (RSel . It suffices to show that We compare the even and odd parts of their generating functions, using the computations of the preceding section. For the even part, using Lemma 4.22 gives that the sum of the absolute values of the coefficients of This shows Corollary 4.24. Fix a prime ℓ, an integer d ≥ 2, and consider a sequence of prime powers {q 1 , q 2 , . . .} with gcd(q i , 2ℓ) = 1, so that the q i lie in a fixed residue class mod ℓ if ℓ is odd, and lie in a fixed residue class mod 8 if ℓ = 2. Then, the TV distance between the BKLPR heuristic and lim i→∞ dim RSel d Proof. First, we impose the assumption that the q i lie in a fixed residue class mod ℓ if ℓ is odd, and lie in a fixed residue class mod 8 if ℓ = 2, so that the distribution in Theorem 3.14 is independent of the choice of q i in this sequence, since im χ d−1 is independent of the choice of q i . Hence, lim i→∞ dim RSel d ℓ,Fq i exists. Note that in the case where ℓ is prime, which we are currently considering, the "BKLPR heuristic" first appeared as the "Poonen-Rains heuristic" [PR12], whose explicit formula is given by [

Markov properties
In this section, we establish Markov properties satisfied by both the random kernel model and the BKLPR model, which will be used to identify their distributions for prime power order Selmer groups. In § 5.1 we state the Markov property satisfied by the random kernel model, which we prove in § 5.2. We then recall the BKLPR model in § 5.3 and demonstrate the Markov property satisfied by the BKLPR model in § 5.4. 5.1. Markov property for random 1-eigenspaces. Let (V, Q) be a nondegenerate quadratic space of rank 2m over Z/ℓ e Z. Recall from Definition 4.1 that for a subset H ⊂ O(V, Q) we let RSel H V be the random variable ker(g − id), valued in isomorphism classes of finite abelian ℓ-groups, for g drawn uniformly at random from H.
In this section only, we will use the notation O(V, Q), Ω(V, Q), and SO(V, Q) for various subgroups of orthogonal groups, because we will consider various coefficient changes and wish to emphasize this in the notation. Noting that H acts on V [ℓ j ], we let H j be the image of Proof. By definition, im ρ d n,k ∩ mult −1 (mult γ q ) is a coset of the geometric monodromy group in the monodromy group. By Theorem 3.14, the geometric monodromy group contains Ω(V d n , Q d n ) and the monodromy group is contained in , and we can apply Theorem 5.1 to each of the cosets.
We next reduce Theorem 5.1 to Theorem 5.4 below. For any 1 ≤ j ≤ e, consider ℓ e−j V = V [ℓ j ], which is a nondegenerate quadratic space of rank 2m over Z/ℓ j Z. ) will also be uniform in a coset of Ω(V Z/ℓ j Z , Q Z/ℓ j Z ). We now naturally generalize Definition 4.1 to the setting of quadratic space over Z ℓ . Definition 5.3. Let (V, Q) be a quadratic space over Z ℓ , and let H ⊂ O(V, Q) be a subset which is a union of cosets of Ω(V, Q) in O(V, Q). Define the random variable RSel H V ⊗Q ℓ /Z ℓ to be given by ker(g − id | V ⊗Q ℓ /Z ℓ ) for g ∈ H drawn from the Haar measure (normalized to be a probability measure) of Lemma 3.20.
By the compatibility with reduction modulo ℓ j discussed above, Theorem 5.1 then follows from: Theorem 5.4. Let (V, Q) be a nondegenerate quadratic space of rank 2m over Z ℓ . Let H ⊂ O(V, Q) be a union of cosets of Ω(V, Q). Define the random variable

5.2.
Proving Theorem 5.4. We now embark on the proof of Theorem 5.4. The proof encompasses this entire subsection, and notation is built cumulatively throughout the section.
We begin by giving one more interpretation of the sequences d j (H). Referring to notation of Theorem 5.4, let V H j be the random variable 6 , valued in isomorphism classes of F ℓ -vector spaces, given by for g drawn from the Haar measure on H. For a fixed g ∈ O(V, Q) we write Hence dim V H j coincides with the random variable d j (H). Proof. This is a straightforward verification which follows from commutativity of We set V H 0 := V ⊗ Z ℓ F ℓ by convention. We claim that the sequence V H 1 , V H 2 , . . . of random subspaces is Markov, and more precisely that if ℓ is odd or V H j = V H 0 , then V H j+1 is the kernel of a uniformly distributed alternating form on V H j . In view of Lemma 5.6, this will complete the proof of Theorem 5.4. Lemma 5.7. The orthogonal complement of V g j ⊂ V ⊗ F ℓ with respect to the quadratic form induced by Q is This immediately induces the claim about orthogonal complements inside V ⊗ F ℓ .
Given j and g, for v ∈ V g j , we use v to denote any choice of lift to V . Lemma 5.8. Keep the notation of the preceding discussion. The following are equivalent: where B is the bilinear form associated to the quadratic form Q on V .
Proof. Given v ∈ V g j , we want to know when it is in V g j+1 . The condition that v ∈ V g j is equivalent to there being a lift ṽ of v to V such that (g − id)ṽ ∈ ℓ j V . Fixing such a lift ṽ, the question is whether we can modify it to another lift ṽ ′ such that (g − id)ṽ ′ ∈ ℓ j+1 V . The freedom for modification is that we can replace ṽ by ṽ + ℓδ for some δ ∈ V . So we want to know if δ can be chosen so that Since we know that (g − id)ṽ ∈ ℓ j V by assumption, we can rewrite this as This establishes the equivalence of (i) and (ii).
The equivalence of (ii) and (iii) then follows from Lemma 5.7.
The F ℓ -linear functional w → B(ℓ −j (g − id)ṽ, w) on V g j depends only on v, and expresses V g j+1 as the kernel of a linear transformation V g j → (V g j ) ∨ , or equivalently as the radical of a bilinear form. Lemma 5.9. Keep the notation of the preceding discussion. Define the bilinear form on V g j : v, w j := B(ℓ −j (g − id)ṽ, w).
Proof. Part (i) follows from Lemma 5.8. For (ii), we need to show that But this follows by observing: We thus find that V g j+1 is the kernel of an alternating form on V g j , so it remains only to show that as g varies over elements with fixed sequence (V g 1 , . . . , V g j ), this alternating form is uniformly distributed. It suffices to show this when g merely varies over elements of a fixed coset of Ω(V, Q) ⊂ O(V, Q). Let Ω j ⊂ Ω(V, Q) be the subgroup consisting of elements which are 1 mod ℓ j . We will show that the uniform distribution holds already when drawing uniformly from the coset H = Ω j g. For fixed v, changing g → hg with h ∈ Ω j changes the linear functional by where δ h = ℓ −j (h − 1). We view its reduction modulo ℓ as an element of the Lie algebra of the special fiber of O(V, Q): δ h ∈ Lie(O(V, Q) F ℓ ). To get equidistribution, it suffices for the induced homomorphism from Ω j /Ω j+1 to the space ∧ 2 (V g j ) ∨ of alternating forms on V g j , sending h to the restriction of δ h , to be surjective. 5.2.1. The case ℓ > 2. If ℓ is odd, then Ω 1 is a pro-ℓ-group, and thus the spinor norm vanishes on Ω 1 . It immediately follows that the logarithm induces an isomorphism Ω j /Ω j+1 for any g ∈ O(V, Q) and any alternating form α ∈ ∧ 2 (V ⊗ F ℓ ) ∨ . Take g to be any lift of the reflection in a nonisotropic vector v ∈ V F ℓ (i.e., a vector with Q(v) ≠ 0).
(the unusual expression because we are in characteristic 2). Then A computation shows all w * ⊗ v * with B(v, w) = 0 are in the space generated by such expressions 7 Since for any w, w ⊥ is spanned by nonisotropic vectors, the space log(Ω j ) in fact contains x i y i .
There is a probability measure on OGr (V,Q) (Z ℓ ) such that the distribution of Z/ℓ e Z in V /ℓ e V for each e ≥ 1 is uniform [BKL + 15, §1.2, §2, §4]. We define Q 2m,ℓ (notated in [BKL + 15] as Q 2m ) to be the distribution associated to the random variable S, valued in isomorphism classes of abelian groups, where S is obtained by drawing Z and W from OGr (V,Q) (Z ℓ ) independently from this measure, and forming Remark 5.10. In [BKL + 15], Q 2m,ℓ and related distributions were defined on symplectic abelian groups, which are abelian groups together with a nondegenerate alternating pairing to Q/Z. Since two symplectic abelian groups are isomorphic if and only if their underlying abelian groups are isomorphic [BKL + 15, §3.2], their distribution can be regarded as a distribution on abelian groups (which takes probability 0 on any abelian group not admitting a symplectic structure). As m → ∞ the distributions Q 2m,ℓ converge to a discrete probability distribution Q ℓ [BKL + 15, Theorem 1.2], which is conjectured in [BKL + 15, Conjecture 1.3] to determine the asymptotic distribution of ℓ ∞ -Selmer groups of elliptic curves ordered by height.

7 We spell out this computation in more detail. Let x be such that B(x, v) = 1. Take α to be represented by x * ⊗w ∈ V * F ℓ ⊗V F ℓ , where we have used B to identify V with V * . Then gαg t − α is represented by
Furthermore, S fits naturally into a short exact sequence where R := (Z ∩ W ) ⊗ Q ℓ /Z ℓ and T is torsion. It is further conjectured that the joint distribution of (R, S, T ) models the joint distribution of the rank of the elliptic curve (i.e., R = (Q ℓ /Z ℓ ) r for r modeling the rank), the ℓ ∞ -Selmer group, and the ℓ-primary part of the Tate-Shafarevich group, respectively [BKL + 15, Conjecture 1.3]. For example, the following proposition expresses the compatibility of these predictions with the Katz-Sarnak philosophy [KS99] that 50% of elliptic curves should have rank 0 and 50% should have rank 1.

5.3.2.
The ℓ ∞ Selmer distribution from BKLPR conditioned on rank. Let T 2m,r,ℓ be the distribution on finite abelian ℓ-groups, (notated in [BKL + 15] as T 2m,r ) given by the above process in § 5.3.1 conditioned on the assumption rk(Z ∩ W ) = r. By [BKL +  There is another characterization of the distribution T r,ℓ . For non-negative integers m, r with m − r ∈ 2Z ≥0 , let A be drawn randomly from the Haar probability measure on the set of alternating m × m-matrices over Z ℓ having rank m − r, and A m,r,ℓ be the distribution of (coker A) tors . According to [BKL + 15, Theorem 1.10], as m → ∞ through integers with m − r ∈ 2Z ≥0 , the distributions A m,r,ℓ converge to a limit A r,ℓ , which coincides with T r,ℓ .
Finally, [BKL + 15, §5.6] predicts that, conditioned on elliptic curves having rank r, the Tate-Shafarevich group Ш is distributed as the direct sum over all primes ℓ of a finite abelian group drawn from T r,ℓ . 5.3.3. The BKLPR n-Selmer distribution. We next review the model for n-Selmer elements described at the beginning of [BKL + 15, §5.7]. Let T r,ℓ denote the random variable defined on isomorphism classes of finite abelian ℓ-groups (notated T r in [BKL + 15]) defined in [BKL + 15, Theorem 1.6] and reviewed in § 5.3.2. For G an abelian group, we let G[n] denote the n-torsion of G. For n ∈ Z ≥1 with prime factorization $n = \prod_{\ell \mid n} \ell^{a_\ell}$, define a distribution T r,Z/nZ on finitely generated Z/nZ-modules by choosing a collection of abelian groups {T ℓ } ℓ|n , with T ℓ drawn from T r,ℓ , and defining the probability that T r,Z/nZ = G to be the probability that ⊕ ℓ|n T ℓ [n] ≃ G.
Given the above predicted distribution for the n-Selmer group of elliptic curves of rank r, the heuristic that 50% of elliptic curves have rank 0 and 50% have rank 1 leads to the following predicted joint distribution of the n-Selmer group and rank: Definition 5.12. Let (rk BKLPR , Sel BKLPR n ) be the joint distribution on Z ≥0 × Ab n defined by which are the analogues of the V j in Lemma 5.6. Although S j depends on Z and W , and will be viewed as a random variable in the future, we suppress this dependence for notational convenience. The main result of this subsection is the following Theorem 5.13, and the proof encompasses the remainder of this subsection.
Theorem 5.13. Let V, Z, and W be as in § 5.3. Define random variables, valued in isomorphism classes of finite-dimensional F ℓ -vector spaces, by S 0 := V ⊗ F ℓ , and S 1 , S 2 , . . . , S j , . . . as in (5.3). Then, the sequence S 1 , S 2 , . . . is Markov, and the distribution of dim S i+1 given S i coincides with the distribution of the dimension of the kernel of a uniformly random alternating form on S i .
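The one-step transition law in this Markov property, namely the dimension of the kernel of a uniformly random alternating form, can be tabulated exactly in small cases. The following is a minimal sketch by exhaustive enumeration; the function names are illustrative. Alternating forms on F p k are modeled as k × k matrices with zero diagonal and a ji = −a ij , which is the correct notion also for p = 2.

```python
from collections import Counter
from itertools import combinations, product


def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, by Gaussian elimination."""
    m = [list(r) for r in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] % p:
                c = m[i][col]
                m[i] = [(x - c * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def kernel_dim_counts(k, p):
    """Count alternating forms on F_p^k by the dimension of their kernel."""
    pairs = list(combinations(range(k), 2))
    counts = Counter()
    for entries in product(range(p), repeat=len(pairs)):
        a = [[0] * k for _ in range(k)]
        for (i, j), x in zip(pairs, entries):
            a[i][j] = x
            a[j][i] = (-x) % p
        counts[k - rank_mod_p(a, p)] += 1
    return counts


if __name__ == "__main__":
    # Dividing each count by p**(k*(k-1)//2) gives the transition probabilities.
    print(kernel_dim_counts(4, 2))
```

Since an alternating form has even rank, the kernel dimension always has the same parity as k, so the parity of dim S j is constant along the chain.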
We omit the proof of the following lemma, which is similar to that of Lemma 5.6.
Lemma 5.14. Keep the notation above. Under the identification The non-degenerate bilinear form B on V induces a non-degenerate bilinear form on V ⊗ F ℓ , that we denote by B. We may sometimes abbreviate notation by using B(v, x), with v ∈ V and x ∈ V ⊗ F ℓ , to denote B(v (mod ℓ), x).
We will construct the sequence of alternating forms (one for each S j , whose radical is S j+1 ) referenced in Theorem 5.13.
using that Z and W are maximal isotropic. Therefore, The result then follows by tensoring with F ℓ .
Next, given v ∈ S j , we seek to characterize when v ∈ S j+1 . By definition, v ∈ S j is equivalent to the existence of a representative v ∈ W/ℓ j ∩ Z/ℓ j reducing to v mod ℓ, and lifts w v of v to W and Lemma 5.16. With notation above, v ∈ S j lies in S j+1 if and only if the associated ǫ as above satisfies ǫ ∈ ℓ 1−j (Z/ℓ j + W/ℓ j ) ∩ ℓ j−1 V /ℓ j V .
Proof. We have v ∈ S j+1 if and only if we can find other lifts v ′ , w ′ v , z ′ v satisfying the same conditions, but such that w ′ v ≡ z ′ v (mod ℓ j+1 ). Such modifications are exactly of the form Hence v ∈ S j+1 if and only if we can choose δ W , δ Z such that Since w v = z v + ℓ j ǫ, this is equivalent to solving Lemma 5.17. There is a well defined bilinear form Proof. We need to check that the value is independent of the choices of v, w v , and z v . Indeed, any other allowable w ′ v differs from w v by an element of ℓ j W , say ℓ j δ with δ ∈ W . But since W/ℓ is isotropic and x lies in S j ⊂ W/ℓ ⊂ V /ℓ, we have B(δ, x) ≡ 0 (mod ℓ). Similarly, replacing z v with any other allowable z ′ v will not alter (5.5). Lemma 5.18. Keep the notation of the preceding discussion.
Proof. By definition, v ∈ S j is in the radical of A j if and only if (following the notation above) ǫ v := ℓ −j (w v − z v ) lies in S ⊥ j . But by Lemma 5.15, ǫ v ∈ S ⊥ j if and only if ǫ v ∈ ℓ 1−j (Z/ℓ j + W/ℓ j ) ∩ ℓ j−1 V /ℓ j V , which, as we proved in Lemma 5.16, occurs if and only if v ∈ S j+1 . For (ii), since we can take z v as a lift of v to V , it suffices to check B(w v − z v , z v ) ∈ ℓ j+1 Z ℓ . For this, write w v − z v = ℓ j ǫ and observe that, since Z and W are isotropic for Q, we have As in § 5.1, it suffices to show that as Z and W are drawn from the canonical measure on OGr (V,Q) (Z ℓ ), the alternating form A j is uniformly distributed. Proof. Fix W, Z ∈ OGr (V,Q) (Z ℓ ). Then we have a scheme over Z ℓ . This is evidently a torsor for the parabolic subgroup Isom(W, W ) ⊂ O(V, Q). Moreover, Witt's theorem implies that Isom(W, Z) has a point over F ℓ , which lifts to a Z ℓ -point because Isom(W, Z) is smooth (being a torsor for a smooth group scheme).
It will suffice to show that conditioning on a fixed W , the distribution of A j is already uniform. The distribution of Z conditioned on a fixed W coincides with the orbit measure on OGr (V,Q) (Z ℓ ) induced by the Haar measure on O(V, Q), since O(V, Q) acts transitively on OGr (V,Q) (Z ℓ ) by Lemma 5.19. As in § 5.1, it suffices to show that the distribution of A j is already uniform as Z varies over an orbit of a coset of the principal congruence subgroup For fixed Z 0 , which induces the alternating form , the alternating form associated to γZ 0 for γ ∈ Γ(ℓ j ) is , the resulting alternating form A j is uniformly distributed, so we are done.
Remark 5.20. Note that unlike in the case of the random kernel model, where we had additional complications to deal with associated to ℓ = 2 in § 5.2.2, there are no additional complications here for ℓ = 2 in the proof of Theorem 5.13, because here we are working with the full congruence subgroup Γ(ℓ j ), instead of a subgroup which may have index 2, as was the case in § 5.2.

Proofs of the main theorems
We conclude the paper by proving our main theorems. In § 6.1 we connect the actual Selmer distribution to the random kernel model, while in § 6.2 we connect the random kernel model to the BKLPR distribution. Combining these gives us a proof of our main theorem, Theorem 1.1. Finally, in § 6.3 we prove Theorem 1.6 and Corollary 1.7. 6.1. Comparing the Selmer distribution with the random kernel model. To start, we state one of our main theorems, which compares the distribution of Selmer groups of elliptic curves to the random kernel model. We prove this at the end of the subsection.
Theorem 6.1. Fix integers d ≥ 2 and n ≥ 1. For q ranging over prime powers, with gcd(q, 2n) = 1 and (r, G) ∈ Z ≥0 × Ab n , we have and In particular, The values of (6.3) and (6.4) agree when d is odd or n ≤ 2, but differ when d is even and n > 2.
We are nearly ready to prove Theorem 6.1, but first we will need to establish two preliminary results. The first preliminary result relates the Selmer group of an elliptic curve to the 1-eigenspace of Frobenius.
Proof. Notate the geometric fiber of W is a finite étale F q -scheme, we have Hence, combining this with Lemma 2.3, we obtain that for Here we are using that there is an isomorphism Our second preliminary result relates the rank of an elliptic curve denotes the quadratic space over Z, whose reduction mod n is (Q d n , V d n ) on which the monodromy representation ρ d n,k acts.
Proposition 6.3. Let d ≥ 2, and let ℓ be a prime. For q a prime power with gcd(q, 2ℓ) = 1, define (1) For q ranging over prime powers with gcd(q, 2ℓ) = 1, we have (2) For all x ∈ W d,rk an ≤1 (3) The above statements are true with analytic rank replaced by algebraic rank. , there is a particular Zariski closed hypersurface Z in the algebraic group O(Q d Z ℓ ), i.e., the hypersurface parameterizing elements with a two or more dimensional generalized 1-eigenspace, such that ρ d Z ℓ ,Z[1/2ℓ] (Frob x ) ∈ Z(Z ℓ ). By Lemma 3.20, for any positive integer e, we have By Theorem 3.14, we know im ρ d ℓ e ,Z[1/2ℓ] has index at most 2 in O(Q d Z ℓ ), and hence has size within a constant factor of ℓ e dim O(Q d Z ℓ ) . Therefore, it follows from Proposition 3.9 that (6.5) Crucially, the above constant does not depend on e, and so we may freely choose e to minimize the above error term. Indeed, we may take e to be the least positive integer so that q ≤ (ℓ e ) (1+3 dim O(Q d Z ℓ )) , or equivalently ) ≤ ℓ e . Then, so long as q > ℓ, replacing q by (ℓ e ) (1+3 dim O(Q d Z ℓ )) will introduce at most a factor of ℓ, and so . (6.6) Further, for the finitely many q < ℓ, we can adjust the constants so that the above still holds with no dependence on q. Combining (6.5) and (6.6), we find .
Part (3) follows from the preceding ones and the fact that, for elliptic curves of rank at most 1 over F q of characteristic ≥ 3, we know on a full density (as q → ∞) subset that algebraic rank equals analytic rank. For char F q > 3 the statement holds for every elliptic curve of rank at most 1, as explained in [ Lemma 3.14], so that in the large q limit, a density 1 subset of W ′ d B (F q ) corresponds to elliptic curves with everywhere semistable reduction.
Proof of Theorem 6.1. We will explain how the distribution of (rk an , Sel n ) This describes the distribution (Rrk, RSel n ) d Fq and hence yields (6.2), (6.3) and (6.4). To conclude the proof we need to justify that the values of (6.3) and (6.4) agree when d is odd or n ≤ 2 but differ when d is even and n > 2. Because these limits approach (Rrk, RSel n ) d Fq , it suffices to show (Rrk, RSel n ) d Fq is independent of q when d is odd or n ≤ 2 but depends on q when d is even and n > 2. When d is odd, this follows from Definition 4.2 because the square class of q d−1 is always trivial, hence independent of q. Also, when n ≤ 2, this holds again by Definition 4.2 because the spinor norm is trivial. However, when d is even and n > 2, the spinor norm is nontrivial, and (Rrk, RSel n ) d Fq will change depending on whether q is a square or nonsquare. Indeed, when q is a square, Prob(RSel d n,Fq = (Z/nZ) 12d−4 ) > 0, corresponding to the case that g = id in Definition 4.2, while when q is not a square, Prob(RSel d n,Fq = (Z/nZ) 12d−4 ) = 0.

6.2.
Comparing the random kernel model with the BKLPR heuristic. We now prove: Theorem 6.4. The TV distance between the BKLPR heuristic and lim sup q→∞ (Rrk, RSel n ) d Fq is O(2 −(6d−2) 2 ), where the implicit constant is absolute, and similarly for the TV distance between the BKLPR heuristic and lim inf q→∞ (Rrk, RSel n ) d Fq .
In particular, we have Proof. By Definition 4.2, with probability one the rank is 0 or 1, and determined by whether the random g in the random kernel model has Dickson invariant 0 or 1, respectively. Since the rank component of these distributions is therefore completely determined by the Selmer component, we can focus our attention on the Selmer component. Thanks to Corollary 4.24, we know that the TV distance between lim inf q→∞ dim RSel d ℓ,Fq and the BKLPR heuristic for Sel ℓ is O(ℓ −(6d−2) 2 ), and similarly for lim sup q→∞ in place of lim inf q→∞ . The Markov properties Theorem 5.1 and Corollary 5.2 and Theorem 5.13 imply that for ℓ > 2, the two distributions for Sel ℓ e agree conditioned upon them agreeing for Sel ℓ . For ℓ = 2, the same is true as long as d 1 < 12d − 4 where the notation d 1 is as in Theorem 5.1, which only fails if g reduces to the identity element in O(12d − 4, F ℓ ). This happens with probability 1/#O(12d − 4, F ℓ ), which is negligible compared to the error term we seek. We conclude that the TV distance between the two distributions for Sel ℓ e is also O(ℓ −(6d−2) 2 ).
Finally, we consider general n. For $n = \prod_{\ell \mid n} \ell^{a_\ell}$, the prime factorization of n, we have The BKLPR heuristic predicts that the distributions of the Sel ℓ a ℓ are independent after conditioning on the rank. If (V, Q) is a quadratic form over Z/nZ then note that Ω(Q) ≃ $\prod_{\text{prime } \ell \mid n}$ Ω(Q| Z/ℓ a ℓ Z ). Therefore, conditioned on each coset of Ω in H d,i ℓ,k the distributions (RSel kernel ℓ a ℓ ) d Fq are independent. Since the TV distance of two product distributions is at most the sum of the TV distances of the factors, the TV distance between the BKLPR heuristic and lim sup We can now complete the proof of Theorem 1.1.
Proof of Theorem 1.1. This follows immediately from combining Theorem 6.1 and Theorem 6.4.
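The passage from prime powers to general $n$ in the proof of Theorem 6.4 rests on the fact that the TV distance between two product distributions is bounded by the sum of the TV distances between their factors. The following minimal sketch checks this bound with exact rational arithmetic on one small example; the four distributions are arbitrary illustrative choices and the function names are hypothetical, so this is a sanity check rather than a proof.

```python
from fractions import Fraction
from itertools import product


def tv_distance(p: dict, q: dict) -> Fraction:
    """Total variation distance: half the L1 distance between two
    finitely supported probability mass functions."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def product_distribution(p1: dict, p2: dict) -> dict:
    """Joint distribution of two independent random variables."""
    return {(x, y): p1[x] * p2[y] for x, y in product(p1, p2)}


# Arbitrary example factors (assumptions made purely for illustration).
p1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
q1 = {0: Fraction(2, 3), 1: Fraction(1, 3)}
p2 = {0: Fraction(1, 4), 1: Fraction(3, 4)}
q2 = {0: Fraction(1, 3), 1: Fraction(2, 3)}

lhs = tv_distance(product_distribution(p1, p2), product_distribution(q1, q2))
rhs = tv_distance(p1, q1) + tv_distance(p2, q2)
print(lhs, rhs, lhs <= rhs)  # prints: 1/6 1/4 True
```

In this example the inequality is strict, so in general the sum of the factor distances is only an upper bound; this suffices for an error estimate of the form $O(\ell^{-(6d-2)^2})$ at each prime.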
6.3. Remaining results. We conclude by proving two remaining results, promised in the introduction. First, we prove Corollary 6.5, which is a version of Proposition 1.5 with more precise error terms, and then we prove Theorem 6.6, which is a version of Theorem 1.6 with more precise error terms.
Corollary 6.5 (Large $q$ analog of [PR12, Conjecture 1.2]). For fixed integers $d \geq 2$ and $n \geq 1$, and $q$ ranging over prime powers with $\gcd(q, 2n) = 1$, we have
if $r \geq 2$. (6.7)
Furthermore,
Proof. The first statement follows immediately from (6.2) by summing over the set of possible groups $G$ which can appear. For the statement regarding average rank, we also need to know that there is a uniform bound on the rank of elliptic curves of height $d$ over $\mathbb F_q(t)$, depending only on $d$. This holds because the rank is bounded by the size of the Selmer group, which is uniformly bounded in $q$ among all elliptic curves of height $d$, as follows from [Lan21, Corollary 3.27], since the Selmer space $\operatorname{Sel}'^{\,d}_{n,\mathbb F_q}$ is quasi-compact and quasi-finite over $W'^{\,d}_{\mathbb F_q}$ and hence has uniformly bounded fiber degree.
Theorem 6.6 (Large q analog of [PR12, Conjecture 1.4]). Let n be a squarefree positive integer, d ≥ 2, and ω(n) be the number of prime factors of n.
(1) Fix $c_\ell \in \mathbb Z_{\geq 0}$ for each prime $\ell \mid n$. Then
Proof. The first part follows from Theorem 1.1 once we establish that $\operatorname{Sel}^{\mathrm{BKLPR}}_n$ has the distribution predicted in the bottom line of (6.8). To see this, note that, by definition, the model $\operatorname{Sel}^{\mathrm{BKLPR}}_n$ is determined by the models $\operatorname{Sel}^{\mathrm{BKLPR}}_\ell$ for $\ell \mid n$, which are independent except for the constraint that the parities of their $\mathbb Z/\ell\mathbb Z$-ranks are all equal. Hence, it suffices to establish the first part in the case that $n = \ell$ is prime. Note that the model $\operatorname{Sel}^{\mathrm{BKLPR}}$ …
Note that (2) is the special case of (3) with $m = 1$, so it suffices to prove (3). … has values as given by the right hand sides of (2) and (3). To show this is the case, it is enough to show both that $\#W^{\bullet d}_{\mathbb F_q}(\mathbb F_q)$ is within a factor of $1 + O_{n,d,m}(q^{-1/2})$ of the total number of height $d$ elliptic curves and that $\#\operatorname{Sel}^{\bullet d,m}_{n,\mathbb F_q}(\mathbb F_q)$ is within a factor of $1 + O_{n,d,m}(q^{-1/2})$ of the sum of $\#\operatorname{Sel}_n(E)^m$ over all height $d$ elliptic curves. First, $\#W^{\bullet d}_{\mathbb F_q}(\mathbb F_q)$ certainly furnishes a lower bound for the size of the set of all elliptic curves of height $d$, while $\#W'^{\,d}_{\mathbb F_q}(\mathbb F_q)$ furnishes an upper bound (it is only an upper bound because it includes non-minimal smooth elliptic curves). Next, using Lemma 2.3 to compare $\#\operatorname{Sel}_n(E)$ to $\#H^1(\mathbb P^1, E^0[n])$ for $E$ with smooth Weierstrass model, we find that $\#\operatorname{Sel}^{\bullet d,m}_{n,\mathbb F_q}(\mathbb F_q)$ indeed furnishes a lower bound for the sum of $\#\operatorname{Sel}_n(E)^m$ over all height $d$ elliptic curves.
Finally, to reduce to computing $\lim_{\substack{q\to\infty \\ \gcd(q,2n)=1}}$ … and $\mathrm{O}(Q^d_n)$ have $\prod_{\ell \mid n} \prod_{i=1}^{m} (\ell^i + 1)$ orbits on $(V^d_n)^m$. This follows from Theorem 4.9 and Lemma 4.5, together with the Chinese remainder theorem to bootstrap this latter result from primes to squarefree integers.
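As a quick sanity check on the orbit count $\prod_{\ell \mid n} \prod_{i=1}^{m} (\ell^i + 1)$ appearing in the proof above, the sketch below evaluates it for small squarefree $n$ and confirms that for $m = 1$ it equals the sum of divisors $\sigma(n)$. The function names and the trial-division factorization are illustrative choices made here, not constructions from the paper.

```python
def prime_factors_squarefree(n: int) -> list:
    """Prime factors of a squarefree positive integer, by trial division.
    Raises ValueError if n is not a squarefree positive integer."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                raise ValueError("n is not squarefree")
            factors.append(p)
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def orbit_count(n: int, m: int) -> int:
    """The product over primes ell | n and 1 <= i <= m of (ell^i + 1)."""
    total = 1
    for ell in prime_factors_squarefree(n):
        for i in range(1, m + 1):
            total *= ell ** i + 1
    return total


def sigma(n: int) -> int:
    """Sum of the positive divisors of n (naive, for small n only)."""
    return sum(k for k in range(1, n + 1) if n % k == 0)


print(orbit_count(6, 1), sigma(6))  # prints: 12 12
print(orbit_count(6, 2))            # prints: 600
```

The multiplicativity over primes $\ell \mid n$ mirrors the Chinese remainder theorem step used to pass from primes to squarefree integers.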

Conflict of interest.
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Data availability. Data sharing not applicable to this article as no datasets were generated or analysed during the current study.