1 Introduction

The Chabauty–Kim method is a method for determining the set \(X({\mathbb {Q}})\) of rational points of a curve X over \({\mathbb {Q}}\) of genus bigger than 1. The idea is to locate \(X({\mathbb {Q}})\) inside \(X({\mathbb {Q}}_p )\) by finding an obstruction to a p-adic point being global. The method developed in [39, 40] produces a tower of obstructions

$$\begin{aligned} X({\mathbb {Q}}_p )\supset X({\mathbb {Q}}_p )_1 \supset X({\mathbb {Q}}_p )_2 \supset \ldots \supset X({\mathbb {Q}}) \end{aligned}$$

In [5], it is conjectured that \(X({\mathbb {Q}}_p )_n =X({\mathbb {Q}})\) for all \(n\gg 0\), and in [40] it is proved that standard conjectures in arithmetic geometry imply \(X({\mathbb {Q}}_p )_n \) is finite for all \(n\gg 0\), but in general neither the equality nor the finiteness is known unconditionally.

The first obstruction set \(X({\mathbb {Q}}_p )_1\) is the one produced by Chabauty’s method. In situations when \(X({\mathbb {Q}}_p )_1 \) is finite, it can often be used to determine \(X({\mathbb {Q}})\).

The main results of this paper concern the finiteness of the Chabauty–Kim set \(X({\mathbb {Q}}_p )_2 \) when X is one of the modular curves \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\) or \(X_0 ^+ (N)\) (N a prime different from p), whose definition and properties we now recall briefly (more details are given in Sect. 4).

The curve \(X_0 ^+ (N)\) is the quotient of \(X_0 (N)\) by the Atkin–Lehner involution \(w_N\). The curve \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\) is the quotient of X(N) by the normalizer of a nonsplit Cartan subgroup. Determining the rational points of \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\) would resolve Serre’s uniformity question [58, §4.3]: is there an \(N_0\) such that, for all \(N>N_0\) and all elliptic curves E defined over \({\mathbb {Q}}\) without complex multiplication, the mod N Galois representation

$$\begin{aligned} \rho _{E,N} :{\text {Gal}}({\overline{{\mathbb {Q}}}} / {\mathbb {Q}})\rightarrow {\text {Aut}}(E[N]) \end{aligned}$$

is surjective? The Borel and normalizer-of-split-Cartan cases of Serre’s uniformity question have been answered positively in the celebrated papers [50] and [11, 12], respectively.

Mazur’s proof may, very crudely, be described as having two stages.

  1. Construct a non-constant map \( f :X\rightarrow A \) from X to an abelian variety of rank zero over \({\mathbb {Q}}\).

  2. Compute the finite set \(A({\mathbb {Q}})\), and the pre-image \(f^{-1}(A({\mathbb {Q}}))\supset X({\mathbb {Q}})\).

As is explained in Sect. 4, in contrast to \(X_0 (N)\) and \(X_{\mathrm {s}}^+ (N)\), for \(X=X_0 ^+ (N)\) or \(X=X_{{{\,\mathrm{ns}\,}}}^+ (N)\), the Birch–Swinnerton-Dyer conjecture implies that there are no non-constant maps from X to abelian varieties of rank zero over \({\mathbb {Q}}\). It is hence natural to ask whether we can mimic Mazur’s strategy, with the set \(A({\mathbb {Q}})\) replaced by the set \(X({\mathbb {Q}}_p )_n\) for some n. In Sect. 4, we show that the Birch–Swinnerton-Dyer conjecture similarly implies \(X({\mathbb {Q}}_p )_1\) is infinite, hence we expect to need \(n>1\). The main result of this paper is to carry out the first stage of Mazur’s strategy for \(n=2\).

Theorem 1

  1. For all primes N such that \(g(X_0 ^+ (N)) \ge 2\), \(X_0 ^+ (N)({\mathbb {Q}}_p )_2 \) is finite for any \(p\ne N\).

  2. For all primes N such that \(g(X_{{{\,\mathrm{ns}\,}}}^+ (N)) \ge 2\) and \(X_{{{\,\mathrm{ns}\,}}}^+ (N)({\mathbb {Q}})\ne \emptyset \), \(X_{{{\,\mathrm{ns}\,}}}^+ (N)({\mathbb {Q}}_p )_2 \) is finite for any \(p\ne N\).

Remark 1

  • For all primes N for which one of the curves X above has genus 0 or 1, \(X({\mathbb {Q}})\) is infinite. Indeed, the prime numbers N such that \(g(X_0^+(N)) \le 1\) make up a finite list with maximal element 131 [27, Propositions 3.1 and 3.2], and the elliptic cases can then be checked on the LMFDB [48] by looking at the corresponding explicit elliptic curves sorted by conductor (the genus 0 case is automatic due to the rational cusp). For the nonsplit Cartan modular curve, the genus formula [55, Proposition 13] proves that \(g(X_\mathrm{{ns}}^+(N)) \le 1\) if and only if \(N \le 11\), and these 5 cases are settled similarly, as one can always find a rational point associated to an elliptic curve with CM by one of the 9 imaginary quadratic fields of class number one.

  • The only reason for the assumption that \(X_{{{\,\mathrm{ns}\,}}}^+ (N)({\mathbb {Q}})\) is nonempty is that the definition of \(X({\mathbb {Q}}_p )_2 \) currently assumes that X has a rational point (if Serre’s uniformity question has a positive answer, then there are infinitely many N for which \(X_{{{\,\mathrm{ns}\,}}}^+ (N)({\mathbb {Q}})\) is empty). One can modify the definition of \(X({\mathbb {Q}}_p )_2 \), for example in a similar manner to [33], to remove this assumption, and then \(X_{{{\,\mathrm{ns}\,}}}^+ (N)({\mathbb {Q}}_p )_2 \) will be finite whenever the genus of \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\) is greater than 1. In particular, such a modification should in principle give a method to prove that \(X_{{{\,\mathrm{ns}\,}}}^+(N)({\mathbb {Q}}_p )_2\), and hence \(X_{{{\,\mathrm{ns}\,}}}^+ (N)({\mathbb {Q}})\), is empty in these cases (although the large genera of such curves mean that in practice they are currently beyond the scope of existing computational methods for other reasons). As this involves several techniques not relevant to the proof of Theorem 1, we do not pursue this point in this paper.

  • Finally, results of [3], together with Edixhoven and Parent’s explicit models for \(X_{{{\,\mathrm{ns}\,}}}(N)\) [23], allow us to deduce from our result an explicit bound (polynomial in N) on the number of rational points on \(X_0^+(N)\) and \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\), which we do in Sect. 3.1.

In this paper we say nothing about carrying out the second stage of Mazur’s strategy (i.e. computing the finite set \(X({\mathbb {Q}}_p )_2\)). However, as alluded to above, for a given X, once \(X({\mathbb {Q}}_p )_2\) is known to be finite, there has been significant recent progress in computing it, and hence \(X({\mathbb {Q}})\), in practice. For example, when \(N=13\), the rational points of \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\) are computed in [8], by computing \(X({\mathbb {Q}}_p )_2\). Similarly for \(X=X_0 ^+ (N)\), the rational points of all X of genus 2 are computed in [4], and in forthcoming work [9], the case of all X of genus 3 is handled.

The proof of Theorem 1 proceeds along the lines of the quadratic Chabauty method, which requires a precise inequality (namely (2)) in terms of invariants of the Jacobian J of X to hold (see Sect. 1.1). This inequality is expected to hold asymptotically for \(X=X_0 ^+ (N)\) or \(X=X_{{{\,\mathrm{ns}\,}}}^+ (N)\) conditionally on the Birch and Swinnerton-Dyer conjecture (see §4.1), but looks out of reach unconditionally for N outside the computable range. There are thus two important steps in the proof of Theorem 1:

  • For p a prime of good reduction of a smooth projective geometrically irreducible curve X over \({\mathbb {Q}}\) with \(X({\mathbb {Q}}) \ne \emptyset \), \(X({\mathbb {Q}}_p)_2\) is finite under the condition that an inequality similar to (2) holds not for J but for a quotient abelian variety A of J, and under an additional hypothesis (C) on (X, J, A).

  • For \(X=X_0^+(N)\) or \(X=X_{{{\,\mathrm{ns}\,}}}^+ (N)\), there is a quotient abelian variety A of J satisfying the analogue of (2) and such that (X, J, A) satisfies (C), if for \(M=N\) (resp. \(N^2\)) there are two distinct normalised eigenforms \(f \in S_2 (\varGamma _0 (M))^{+,\text {new}} \) such that \(L'(f,1) \ne 0\).

The final input in the proof of Theorem 1 is the following theorem.

Theorem 2

For all \(M=N\) or \(N^2\) with N prime, if the space \(S_2 (\varGamma _0 (M))^{+,\mathrm {new}} \) is of dimension at least two, it contains two distinct normalised newforms f such that \(L'(f,1) \ne 0\).

As explained in Remark 8, this result of nonvanishing is in fact quite weak compared to known or expected asymptotic estimates (giving a positive linear proportion of nonvanishing values) so the main difficulty in the proof of Theorem 2 lies in making such estimates effective enough to prove the result except for small enough N so that the remaining cases can be checked algorithmically.

1.1 Chow–Heegner points and quadratic Chabauty

In general, \(X({\mathbb {Q}}_p )_n \) cannot unconditionally be proved to be finite without some assumptions on the Jacobian of X (Kim showed that the Bloch–Kato conjectures imply that \(X({\mathbb {Q}}_p )_n \) is finite for all \(n\gg 0\) [40, Observation 2]). In the case \(n=1\) (which reduces to the classical set-up of Chabauty’s method) it is known that a sufficient condition is that

$$\begin{aligned} {{\,\mathrm{rk}\,}}(J) < \dim (J) \end{aligned}$$
(1)

where \({{\,\mathrm{rk}\,}}(J)\) is the Mordell–Weil rank of \(J({\mathbb {Q}})\). The simplest instance extending Chabauty’s method when finiteness of \(X({\mathbb {Q}}_p )_n\) can be proved for \(n >1\) is the following Lemma. To state the Lemma, let J denote the Jacobian of X, and recall that the Picard number \(\rho (J)\) is defined to be the rank of the Néron–Severi group \({{\,\mathrm{NS}\,}}(J):={\text {Pic}}(J)/{\text {Pic}}^0 (J)\). By [51, Proposition 17.2], this is the same as the dimension of the subspace denoted by \({\text {End}}^\dagger (J)\) of \({\text {End}}^0 (J):={\text {End}}(J) \otimes {\mathbb {Q}}\) consisting of endomorphisms that are symmetric, i.e. fixed by the Rosati involution.

Lemma 1

([6], Lemma 3.2) If

$$\begin{aligned} {{\,\mathrm{rk}\,}}(J) < \dim (J) + \rho (J) - 1, \end{aligned}$$
(2)

then \(X({\mathbb {Q}}_p )_2 \) is finite. In particular, if \({{\,\mathrm{rk}\,}}(J) = \dim (J)\), then \(X({\mathbb {Q}}_p )_2 \) is finite whenever \(\rho (J)>1\).
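
To illustrate how condition (2) is read, the following lines spell out the boundary case of the lemma together with one purely hypothetical numerical instance (the values \(\dim (J)=3\), \({{\,\mathrm{rk}\,}}(J)=3\), \(\rho (J)=2\) are chosen for illustration only and do not refer to any curve considered in this paper).

$$\begin{aligned} {{\,\mathrm{rk}\,}}(J) = \dim (J) :\quad&\dim (J)< \dim (J) + \rho (J) - 1 \iff \rho (J) > 1, \\ \dim (J) = 3,\ {{\,\mathrm{rk}\,}}(J) = 3,\ \rho (J) = 2 :\quad&3 < 3 + 2 - 1 = 4. \end{aligned}$$

In the hypothetical instance, the Chabauty condition (1) fails, since 3 is not strictly less than 3, while (2) holds.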

By Kolyvagin–Logachev type results due to Nekovář and Tian (see Proposition 8 and its Corollary 4), Theorem 2 implies that the Jacobians of \(X_0 ^+ (N)\) and \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\), which we will henceforth denote by \(J_0 ^+ (N)\) and \(J_{{{\,\mathrm{ns}\,}}}^+ (N)\) respectively, do have \({\mathbb {Q}}\)-isogeny factors A satisfying \({{\,\mathrm{rk}\,}}(A) <\dim (A) + \rho (A)-1\), but it seems unattainable to prove unconditionally such a result for the full Jacobian. To deduce Theorem 1, we thus need a ‘quadratic Chabauty for quotients’ result, analogous to the well-known fact that Chabauty’s method also works under the relaxed condition \({{\,\mathrm{rk}\,}}(A)<\dim (A)\), i.e. (1) for an isogeny factor A instead of J (in fact, for modular curves, Mazur–Kamienny’s method refines this for factors A such that \({{\,\mathrm{rk}\,}}(A)=0\), see e.g. [2]).

As explained below, in general such a result seems non-trivial. Fix a basepoint \(b\in X({\mathbb {Q}})\), and let \(\mathrm {AJ}:X\rightarrow J\) be the corresponding Abel–Jacobi map. Let A, B be abelian varieties over \({\mathbb {Q}}\), satisfying \({\text {Hom}}(A,B)=0\), and suppose we have a surjection \((\pi _A ,\pi _B ) :J \rightarrow A \times B\).

A slight modification denoted by \({\widetilde{\mathrm {AJ}}}^*\) of the pullback by \(\mathrm {AJ}\) (which basically amounts to considering the restriction of \(\mathrm {AJ}^*\) on symmetric line bundles, see §2.1) vanishes on \({\text {Pic}}^0(J)\), so it factors through \({{\,\mathrm{NS}\,}}(J)\) and \({\widetilde{\mathrm {AJ}}}^* :{{\,\mathrm{NS}\,}}(J) \rightarrow {\text {Pic}}(X)\) will denote this factorisation by abuse of notation. It induces a map

$$\begin{aligned} d_{\pi _A} :{{\,\mathrm{NS}\,}}(A) \overset{{\widetilde{\mathrm {AJ}}}^* \circ \pi _A^*}{\longrightarrow } {\text {Pic}}(X) \overset{\deg }{\rightarrow } {\mathbb {Z}}\end{aligned}$$
(3)

and therefore a map

$$\begin{aligned} \theta _{X,\pi _A ,\pi _B } :{\text {Ker}}d_{\pi _A} \overset{{\widetilde{\mathrm {AJ}}}^* \circ \pi _A^*}{\longrightarrow } {\text {Pic}}^0(X) \longrightarrow J({\mathbb {Q}}) \overset{\pi _B \otimes {\mathbb {Q}}}{\longrightarrow }B({\mathbb {Q}})\otimes {\mathbb {Q}}, \end{aligned}$$
(4)

which is called the Chow–Heegner construction (see Definition 3 for details).

Remark 2

As an alternative definition (useful for the proofs), for any correspondence \(Z \subset X\times X\), we can associate a cycle \(D_Z (b)\in {\text {Pic}}^0(X)\) (see (16)), and this defines a homomorphism \({{\,\mathrm{NS}\,}}(X \times X) \rightarrow {\text {Pic}}^0(X)\) so that the composition

$$\begin{aligned} {{\,\mathrm{NS}\,}}(J) \overset{(\mathrm {AJ}^{(2)})^*}{\longrightarrow } {{\,\mathrm{NS}\,}}(X \times X) \longrightarrow {\text {Pic}}^0(X), \end{aligned}$$

where \(\mathrm {AJ}^{(2)} :X \times X \rightarrow J\) is defined by \((x,y) \mapsto [x] + [y] - 2[b]\), is equal to \({\widetilde{\mathrm {AJ}}}^*\) on \(({\widetilde{\mathrm {AJ}}}^*)^{-1}({\text {Pic}}^0(X))\), which then allows us to retrieve \(\theta _{X,\pi _A,\pi _B}\) on cycles Z coming from \({\text {Ker}}d_{\pi _A}\).

The ‘quadratic Chabauty for quotients’ result that we prove in this paper says that we can replace J with A, but the price we pay is that we replace \(\rho (J)-1\) with the rank of \({\text {Ker}}(\theta _{X,\pi _A ,\pi _B})\), which can be smaller than \(\rho (A)-1\).

Proposition 1

Let X be a curve as above. Suppose J admits an isogeny \((\pi _A ,\pi _B) :J\rightarrow A\times B\), where \({\text {Hom}}(A,B)=0\). If

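$$\begin{aligned} {{\,\mathrm{rk}\,}}(A) < \dim (A) + {{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X,\pi _A ,\pi _B })), \end{aligned}$$
(C)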

then \(X({\mathbb {Q}}_p )_2 \) is finite.

In the case where \({{\,\mathrm{rk}\,}}(A)=\dim (A)\) which we will focus on, we can simplify this condition in terms of nice correspondences, defined in Sect. 2.1. More precisely, \((\pi _A, \pi _B)\) induces an isomorphism \({\text {End}}^0(J) \cong {\text {End}}^0(A) \times {\text {End}}^0(B)\), and \(X({\mathbb {Q}}_p )_2 \) is finite whenever there exists a nontrivial nice correspondence Z on \(X\times X\) whose corresponding endomorphism of J is zero in \({\text {End}}^0(B)\), and whose corresponding Chow–Heegner point \(D_Z (b) \in {\text {Pic}}^0(X)\) is torsion when projected to B.

Remark 3

Note that, since \({{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X, \pi _A ,\pi _B })) \le \rho (A) -1\), inequality (C) implies that A satisfies the naive analogue of Lemma 1

$$\begin{aligned} {{\,\mathrm{rk}\,}}(A)< \dim (A)+\rho (A) -1. \end{aligned}$$
(5)

However, in general (C) is strictly stronger than (5). In fact, the trivial lower bound on \({{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X,\pi _A,\pi _B}))\) is \(\rho (A)-1 - {{\,\mathrm{rk}\,}}(B)\) and if the latter were positive, it would imply (2). This is why Proposition 2 looks quite particular to modular curves. Moreover, understanding the rank of \({\text {Ker}}(\theta _{X,\pi _A ,\pi _B })\) in general seems somewhat subtle: as becomes apparent in Example 1 and Sect. 3.2, this quantity is not an invariant of the pair (A, B), or even of the triple (X, A, B), and does not seem to behave so well functorially even under quite strong hypotheses. Finally, as explained in the first appendix, this quantity is also related to the Gross–Kudla–Schoen cycles constructed in [31].
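
The bounds quoted in Remark 3 can be spelled out as follows. This is only a sketch, under two assumptions made explicit here: (C) is read as the inequality \({{\,\mathrm{rk}\,}}(A) < \dim (A) + {{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X,\pi _A ,\pi _B }))\), and A is nonzero, so that \(d_{\pi _A}\) is a nonzero map to \({\mathbb {Z}}\) (an ample class on A pulls back to a divisor of positive degree on X). The only other input is that \(\theta _{X,\pi _A ,\pi _B }\) takes values in \(B({\mathbb {Q}})\otimes {\mathbb {Q}}\), a \({\mathbb {Q}}\)-vector space of dimension \({{\,\mathrm{rk}\,}}(B)\).

$$\begin{aligned} \rho (A) - 1 - {{\,\mathrm{rk}\,}}(B)&= {{\,\mathrm{rk}\,}}({\text {Ker}}(d_{\pi _A})) - {{\,\mathrm{rk}\,}}(B) \le {{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X,\pi _A ,\pi _B })) \le {{\,\mathrm{rk}\,}}({\text {Ker}}(d_{\pi _A})) = \rho (A) - 1, \\ {{\,\mathrm{rk}\,}}(A)&< \dim (A) + {{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X,\pi _A ,\pi _B })) \le \dim (A) + \rho (A) - 1. \end{aligned}$$

The first line gives both the trivial lower bound and the upper bound on the rank of the kernel; the second line shows how (C) implies (5).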

The following proposition emphasises that in fact, the supplementary condition (C) can always be satisfied for our modular curves.

Proposition 2

Let \(X=X_0 ^+ (N)\) or \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\), and \(J={\text {Jac}}(X)\). Assume Theorem 2 holds, and the genus of X is at least two. Then J admits an isogeny \((\pi _A,\pi _B) :J \rightarrow A \times B\) satisfying

  1. \({{\,\mathrm{rk}\,}}(A) = \dim A \ge 2\).

  2. \(\rho (A)>1\).

  3. \({{\,\mathrm{rk}\,}}({\text {Ker}}(\theta _{X,\pi _A,\pi _B})) =\rho (A)-1\).

As will become apparent in the proof, in fact we take A to be the maximal isogeny factor of J whose analytic rank is equal to its dimension and B its complement, otherwise we might not be able to ensure that the kernel of \(\theta _{X,\pi _A,\pi _B}\) is nontrivial. This idea relies heavily on the use of (traces of) Heegner points on the modular curves \(X_0(N),X_\mathrm{{ns}}(N)\), which generate \(A({\mathbb {Q}})\) up to finite index, but will automatically be torsion in \(B({\mathbb {Q}})\), both situations being ultimately by-products of the generalised Gross–Zagier formula (see Sect. 4.2). Note that in this case the kernel of the theta morphism is not only nontrivial, but as large as it can be, which might indicate a deeper phenomenon at play.

The structure of the paper is as follows. In Sect. 2, we give some reminders on Néron–Severi groups, Chow groups and correspondences, and describe the map \(\theta _{X,\pi _A,\pi _B}\) in terms of cycles. In Sect. 3 we prove Proposition 1. In Sect. 4, we prove Proposition 2 assuming Theorem 2, after some discussion on (C), and using generalised Gross–Zagier formulas. In Sect. 5, we prove Theorem 2. Finally, for the sake of clarity and for lack of easily available references in the literature, we gather in Appendix 6 results about the Chow–Heegner construction above and explain in Appendix 7 the proof of the Kolyvagin–Logachev type result needed to translate Theorem 2 into an algebraic rank result.

1.2 Notation and conventions

Unless stated otherwise, we adopt the following conventions in this paper.

\(\bullet \) X is a smooth projective geometrically irreducible curve of genus \(\ge 2\) over \({\mathbb {Q}}\). J is the Jacobian of X and \(\mathrm {AJ}:X \rightarrow J\) is the Albanese morphism with a fixed base point \(b \in X({\mathbb {Q}})\). The notation \({\widetilde{\mathrm {AJ}}}^*\) refers to twice the pullback of symmetric line bundles on J to \({\text {Pic}}(X)\) (see (13)), and then factors through \({{\,\mathrm{NS}\,}}(J)\) (this is not the same as just the pullback \(\mathrm {AJ}^*\) from \({\text {Pic}}(J)\) to \({\text {Pic}}(X)\), which does not vanish on \({\text {Pic}}^0(J)\)).

\(\bullet \) For any n and any \(S \subset \{1, \ldots , n\}\), the morphism

$$\begin{aligned} i_S(b):X \rightarrow X^n \end{aligned}$$
(6)

is defined so that the j-th coordinate of \(i_S(b)(x)\) is x if \(j \in S\) and b otherwise. When there is no ambiguity on b we denote it simply by \(i_S\). Similarly, the morphism

$$\begin{aligned} \pi _S:X^n \rightarrow X^{\# S} \end{aligned}$$
(7)

denotes the projection of \((x_1, \ldots , x_n)\) on the coordinates belonging to S.

\(\bullet \) Morphisms between algebraic varieties over \({\mathbb {Q}}\) and their structures (line bundles, divisors, etc) are assumed to be defined over \({\mathbb {Q}}\).

\(\bullet \) For a smooth projective algebraic variety Y over \({\mathbb {Q}}\), \({{\,\mathrm{NS}\,}}(Y)\) is the Néron–Severi group of Y, and \(\rho (Y):= {{\,\mathrm{rk}\,}}{{\,\mathrm{NS}\,}}(Y)\) is the Picard number of Y (see §2.1).

\(\bullet \) For any abelian variety A over \({\mathbb {Q}}\) (in particular for J), \({{\,\mathrm{rk}\,}}(A)\) is the rank of the finitely generated \({\mathbb {Z}}\)-module \(A({\mathbb {Q}})\) and \({\text {End}}^0 (A) := ({\text {End}}_{\mathbb {Q}}A) \otimes {\mathbb {Q}}\).

\(\bullet \) N is a prime number (the level of our modular curves) and \(M=N\) or \(N^2\).

\(\bullet \) \(X_0(N)\) (resp. \(X_\mathrm{{s}}^+(N)\), \(X_\mathrm{{ns}}^+(N)\)) is the modular curve quotient of X(N) corresponding to the Borel structure (resp. normaliser of split Cartan, normaliser of nonsplit Cartan), and \(X^+_0(N)\) is the quotient of \(X_0(N)\) by the Atkin–Lehner involution \(w_N\). Accordingly, the Jacobians of these modular curves are denoted respectively by \(J_0(N), J_\mathrm{{s}}^+(N), J_\mathrm{{ns}}^+(N), J_0^+(N)\) (see Sect. 4).

\(\bullet \) For X a variety over a field \(K\subset \mathbb {C}\), \(H^k (X,\mathbb {Z})\) refers to the singular cohomology of \(X({\mathbb {C}})\).

\(\bullet \) Given a unipotent group U, the central series filtration of U is defined by \(U^{(1)} = U\) and \(U^{(i+1)}= [U,U^{(i)}]\), and \({\text {gr}}_i (U):=U^{(i)}/U^{(i+1)}\) (in particular \({\text {gr}}_1 (U)=U^{\mathrm {ab}}\)). If a group G acts continuously on U, then G acts on the set of normal subgroups of U, and we say that a quotient U/H is G-stable if the normal subgroup H is stabilised by G. In this case there is a unique G-action on U/H making the surjection G-equivariant.

\(\bullet \) The letter p denotes a prime number different from N which will be used (except in Appendix 7) only in the context of p-adic numbers.
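
As an illustration of the central series convention in the list above (an example chosen here for concreteness, not one taken from the paper), let U be the unipotent group of upper unitriangular \(3\times 3\) matrices. Its filtration and graded pieces are

$$\begin{aligned} U = \left\{ \begin{pmatrix} 1 &{} a &{} c \\ 0 &{} 1 &{} b \\ 0 &{} 0 &{} 1 \end{pmatrix} \right\} , \quad U^{(2)} = [U,U] = \{ a = b = 0 \} , \quad U^{(3)} = [U,U^{(2)}] = \{ 1 \} , \end{aligned}$$

so that \({\text {gr}}_1 (U)=U^{\mathrm {ab}}\) is two-dimensional (coordinates a, b) and \({\text {gr}}_2 (U)=U^{(2)}\) is one-dimensional (coordinate c).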

2 The quadratic Chabauty condition (C) for a quotient

2.1 Reminders on Chow groups and Néron–Severi groups

We recall here the basic notions on correspondences of curves, and the Chow groups and Néron–Severi groups that we need. A good reference on correspondences is Smith’s thesis [61, Chapter 3], and classical ones are [13, section 11.5] for the complex case and [26, Chapter 16] for the general case.

Definition 1

For any geometrically smooth and irreducible projective variety Y over \({\mathbb {Q}}\) and any \(k \le \dim Y\):

  • The Chow group \(\mathrm {CH}^{k}(Y)\) is the group of cycles of Y of codimension k up to rational equivalence.

  • \(c_k :\mathrm {CH}^k (Y)\rightarrow H^{2k}(Y,\mathbb {Z})\) is the cycle map, and \(\mathrm {CH}^{k}_0(Y) :={\text {Ker}}(c_k )\) is its subgroup of homologically trivial cycles (in \(Y({\mathbb {C}})\)).

In particular, there are canonical isomorphisms

$$\begin{aligned} \mathrm {CH}^1(Y) \cong {\text {Pic}}(Y), \quad \mathrm {CH}^1_0 (Y) \cong {\text {Pic}}^0(Y). \end{aligned}$$

The Néron-Severi group \({{\,\mathrm{NS}\,}}(Y) := {\text {Pic}}(Y)/ {\text {Pic}}^0(Y)\) is thus embedded in \(H^2(Y({\mathbb {C}}),{\mathbb {Z}})\).

We can also define a geometric étale cycle map [21, Cycle]

$$\begin{aligned} c_k ^{l,{\acute{\mathrm{e}}\mathrm{t}}} :\mathrm {CH}^k (Y) \rightarrow H^{2k}_{{\acute{\mathrm{e}}\mathrm{t}}}(Y_{{\overline{{\mathbb {Q}}}}},\mathbb {Z}_l (k)) \end{aligned}$$

and an absolute étale cycle map

$$\begin{aligned} c_k ^{\mathrm {abs}} :\mathrm {CH}^k (Y)\rightarrow H^{2k}_{{\acute{\mathrm{e}}\mathrm{t}}}(Y,\mathbb {Z}_l (k)). \end{aligned}$$

By the Artin comparison theorem we have \({\text {Ker}}(\prod _l c_k ^{l,{\acute{\mathrm{e}}\mathrm{t}}})=\mathrm {CH}^k _0 (Y)\). The étale Abel–Jacobi morphism is a homomorphism

$$\begin{aligned} \mathrm {AJ}_{{\acute{\mathrm{e}}\mathrm{t}}} :\mathrm {CH}^k _0 (Y)\rightarrow {{\,\mathrm{Ext}\,}}^1 _{{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}}({\mathbb {Q}}_p ,H^{2k-1}_{{\acute{\mathrm{e}}\mathrm{t}}}(Y_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p (k))) \end{aligned}$$

which may be defined using the Leray spectral sequence or (equivalently but more directly) by realising the extension class of a homologically trivial cycle Z inside \(H^{2k-1}((Y-Z)_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p (k))\) (see Jannsen [38, II.9] or Nekovář [53, 5.1]). By Poincaré duality, we may equivalently think of the target of \(\mathrm {AJ}_{{\acute{\mathrm{e}}\mathrm{t}}} \) as being

$$\begin{aligned} {{\,\mathrm{Ext}\,}}^1 _{{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}}(H^{2(d-k)+1}_{{\acute{\mathrm{e}}\mathrm{t}}}(Y_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p (d)),{\mathbb {Q}}_p (k)) \quad (d = \dim Y). \end{aligned}$$

In particular, when \(Y=X\) is a curve, and for \(k=1\), the target of \(\mathrm {AJ}_{{\acute{\mathrm{e}}\mathrm{t}}}\) is

$$\begin{aligned} {{\,\mathrm{Ext}\,}}^1_{{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}}(V_p(J),{\mathbb {Q}}_p(1)), \end{aligned}$$

where J is the Jacobian of X and \(V_p(J) = T_p(J) \otimes _{{\mathbb {Z}}_p} {\mathbb {Q}}_p\).
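
As a quick check of the indices in this special case (a sketch; the identification of \(H^1_{{\acute{\mathrm{e}}\mathrm{t}}}\) with the Tate module used below is the standard one coming from the Kummer sequence, and is not spelled out in the text), take \(d=k=1\) in the Poincaré dual form of the target:

$$\begin{aligned} 2(d-k)+1 = 1, \qquad H^{1}_{{\acute{\mathrm{e}}\mathrm{t}}}(X_{{\overline{{\mathbb {Q}}}}},{\mathbb {Z}}/p^n (1)) \cong {\text {Pic}}(X_{{\overline{{\mathbb {Q}}}}})[p^n] = J({\overline{{\mathbb {Q}}}})[p^n], \qquad H^{1}_{{\acute{\mathrm{e}}\mathrm{t}}}(X_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p (1)) \cong V_p(J). \end{aligned}$$

Substituting the last isomorphism into the first argument of the \({{\,\mathrm{Ext}\,}}^1\) group, with second argument \({\mathbb {Q}}_p (k) = {\mathbb {Q}}_p (1)\), gives the stated target.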

Let us now review the basic definitions of correspondences.

Definition 2

For two curves \(X_1,X_2\) as before:

  • A correspondence Z on \(X_1,X_2\) is a divisor on \(X_1 \times X_2\), i.e. an element of \({\text {Div}}(X_1 \times X_2)\). It is called prime if the underlying divisor is, and fibral if its prime components are horizontal or vertical divisors.

  • If Z is a nonfibral prime correspondence, the two projections \(\pi _{1,Z}, \pi _{2,Z} :Z \rightarrow X_1, X_2\) are nonconstant so \(\psi _Z :=(\pi _{2,Z})_* \circ \pi _{1,Z}^*\) defines a morphism from \({\text {Div}}(X_1)\) to \({\text {Div}}(X_2)\), inducing a morphism between the Jacobians of \(X_1\) and \(X_2\), and two rationally equivalent divisors define the same morphism. This defines by linearity (extending to 0 for fibral prime divisors) a surjective morphism

    $$\begin{aligned} {\psi } :{\text {Pic}}(X_1 \times X_2) \rightarrow {\text {Hom}}({\text {Jac}}(X_1), {\text {Jac}}(X_2)), \end{aligned}$$
    (8)

    with kernel \(\pi _1 ^* {\text {Pic}}(X_1)\oplus \pi _2 ^* {\text {Pic}}(X_2)\) with notation (7) ( [13, Theorem 11.5.1] or [61, Theorem 3.3.12]).

When \(X=X_1=X_2\), with the choice of a base point b, using notation from (6) and (7), we obtain from \(\pi _1 \circ i_1 = {\text {Id}}_X\) and similar relations the identities

$$\begin{aligned} {\text {Pic}}(X \times X)= & {} \pi _1^*{\text {Pic}}(X) \oplus \pi _2^* {\text {Pic}}(X) \oplus {\text {Ker}}(i_1^* \oplus i_2^*) \end{aligned}$$
(9)
$$\begin{aligned} {\text {Pic}}^0(X \times X)= & {} \pi _1^*{\text {Pic}}^0(X) \oplus \pi _2^* {\text {Pic}}^0(X), \end{aligned}$$
(10)

(see [61, Proposition 3.3.8], as homologically trivial cycles are homomorphically trivial) which induces a decomposition

$$\begin{aligned} {{\,\mathrm{NS}\,}}(X \times X) = \pi _1^* {{\,\mathrm{NS}\,}}(X) \oplus \pi _2^* {{\,\mathrm{NS}\,}}(X) \oplus {\text {Ker}}(i_1^* \oplus i_2^*), \end{aligned}$$
(11)

where the last direct factor then canonically identifies with \({\text {End}}(J)\) via (8). By abuse of notation, we thus denote

$$\begin{aligned} \psi ^{-1} :{\text {End}}(J) \overset{\cong }{\rightarrow } {\text {Ker}}(i_1^* \oplus i_2^*) \end{aligned}$$

the inverse of this isomorphism. Now, the morphism \(i_{1,2}^* - i_1^* - i_2^*\) is trivial when restricted to \({\text {Pic}}^0(X \times X)\), hence induces a morphism

$$\begin{aligned} \varphi :{{\,\mathrm{NS}\,}}(X \times X) \rightarrow {\text {Pic}}(X). \end{aligned}$$
(12)

Define

$$\begin{aligned} \begin{array}{c|ccl} \mathrm {AJ}^{(2)} :&{} X \times X &{} \longrightarrow &{} J \\ &{} (x,y) &{} \longmapsto &{} [x]+[y]-2[b] \end{array}, \quad {\widetilde{\mathrm {AJ}}}^* := \varphi \circ (\mathrm {AJ}^{(2)})^*. \end{aligned}$$

We have \({\widetilde{\mathrm {AJ}}}^* =\mathrm {AJ}^* \circ [2]^* -2\mathrm {AJ}^*\), so for \([{{\mathcal {L}}}] \in {\text {Pic}}(J)\),

$$\begin{aligned} {\widetilde{\mathrm {AJ}}}^*([{{\mathcal {L}}}]) = \mathrm {AJ}^*( [{{\mathcal {L}}}]) + \mathrm {AJ}^* ([-1]^* [{{\mathcal {L}}}]), \end{aligned}$$
(13)

using the classical identity \([n]^* ({\mathcal {L}})\simeq {\mathcal {L}}^{\otimes (\frac{n^2+n}{2})}\otimes [-1]^* ({\mathcal {L}}^{\otimes (\frac{n^2 -n}{2})})\). In particular, \({\widetilde{\mathrm {AJ}}}^*\) is twice the usual pullback by \(\mathrm {AJ}\) on symmetric line bundles.
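
For completeness, the step from the classical identity to (13) can be written out. This is a sketch in additive notation on \({\text {Pic}}(X)\), using only the case \(n=2\) of the identity above (so that the exponents are \(\frac{n^2+n}{2}=3\) and \(\frac{n^2-n}{2}=1\)), and reading the first term of \({\widetilde{\mathrm {AJ}}}^*\) as \(\mathrm {AJ}^*\) applied to \([2]^*[{{\mathcal {L}}}]\):

$$\begin{aligned} \mathrm {AJ}^*([2]^* [{{\mathcal {L}}}]) - 2\mathrm {AJ}^*([{{\mathcal {L}}}])&= 3\mathrm {AJ}^*([{{\mathcal {L}}}]) + \mathrm {AJ}^*([-1]^* [{{\mathcal {L}}}]) - 2\mathrm {AJ}^*([{{\mathcal {L}}}]) \\&= \mathrm {AJ}^*([{{\mathcal {L}}}]) + \mathrm {AJ}^*([-1]^* [{{\mathcal {L}}}]). \end{aligned}$$

If \({{\mathcal {L}}}\) is symmetric, i.e. \([-1]^* {{\mathcal {L}}}\simeq {{\mathcal {L}}}\), the right-hand side is \(2\mathrm {AJ}^*([{{\mathcal {L}}}])\).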

For any divisor D of \(X \times X\), the degree of \(\varphi (D)\) is equal to the rational trace of \(\psi (D)\) ( [13, Proposition 11.5.2]). This induces a morphism

$$\begin{aligned} {\widetilde{\theta }}_{X,b} :{\text {End}}(J)^\mathrm{{tr}=0} \overset{\varphi \circ \psi ^{-1} }{\longrightarrow } {\text {Pic}}^0(X). \end{aligned}$$

By [52, IV.20], the rule \({\mathcal {L}}\mapsto \lambda _{{\mathcal {L}}}\) defined by \(\lambda _{{\mathcal {L}}}(P) = T_P^*{{\mathcal {L}}}\otimes {{\mathcal {L}}}^{-1} \in {\text {Pic}}^0(J)\) induces an isomorphism

$$\begin{aligned} \begin{array}{c|ccl} {\tilde{\lambda }} :&{} {{\,\mathrm{NS}\,}}(J) &{} \longrightarrow &{} {\text {End}}^{\dagger }(J) \\ &{} [{{\mathcal {L}}}] &{} \longmapsto &{} {{\mathcal {P}}}^{-1} \circ \lambda _{{\mathcal {L}}} \end{array} \end{aligned}$$
(14)

where \({{\mathcal {P}}}:J \overset{\cong }{\rightarrow } {\widehat{J}}\) is a natural principal polarisation given by a theta divisor. This is the same as applying the composition \(- \psi \circ (\mathrm {AJ}^{(2)})^*\). Indeed, via the natural morphisms \({\widehat{J}} \cong {\text {Pic}}^0(J)\) and \({\text {Pic}}^0(X) \cong J\), the inverse \({\widehat{J}} \rightarrow J\) of the principal polarisation given by a theta divisor on J is equal to \(- \mathrm {AJ}^*\) from \({\text {Pic}}^0(J)\) to \({\text {Pic}}^0(X)\) [13, Proposition 11.3.5].

Now, in terms of line bundles, by definition, given a line bundle L on \(X \times X\), the endomorphism of \({\text {Pic}}(X)\) associated to it is given on points by \(x \mapsto i_2(x)^*(L)\) with notation (6). As \(\mathrm {AJ}^{(2)} \circ i_2(x) = T_{[x]-[b]} \circ \mathrm {AJ}\), for a line bundle \({{\mathcal {L}}}\) on J and points x, y of X, the endomorphism associated to \(L=(\mathrm {AJ}^{(2)})^* {{\mathcal {L}}}\) sends \([x]-[y]\) to

$$\begin{aligned} \mathrm {AJ}^* (T_{[x]-[b]}^* {{\mathcal {L}}}- T_{[y]-[b]}^* {{\mathcal {L}}}) = \mathrm {AJ}^* (T_{[x]-[y]}^* {{\mathcal {L}}}- {{\mathcal {L}}}) = \mathrm {AJ}^* \lambda _{{\mathcal {L}}}([x]-[y]), \end{aligned}$$

which gives the equality up to \(-1\). Hence, if we define

$$\begin{aligned} {{\,\mathrm{NS}\,}}(J)^0 :={\text {Ker}}({{\,\mathrm{NS}\,}}(J){\mathop {\longrightarrow }\limits ^{\deg }}{{\,\mathrm{NS}\,}}(X)), \end{aligned}$$

and

$$\begin{aligned} \theta _{X,b} := {\widetilde{\mathrm {AJ}}}^*_{|{{\,\mathrm{NS}\,}}(J)^0} :{{\,\mathrm{NS}\,}}(J)^0 \rightarrow {\text {Pic}}^0(X) \end{aligned}$$

then we have the following commutative diagram to sum up all the previous properties. Every symbol \(\circlearrowleft \) means that the diagram around it commutes, and every \(\circlearrowleft _{-}\) means that one composition is equal to \(-1\) times the other. Dashed arrows indicate that the morphisms are only defined on part of the domain or with a smaller codomain, but in each case, the morphism admits a natural extension. By abuse of notation, \(\psi \) and \((\mathrm {AJ}^{(2)})^*\) are used both on Picard groups and Néron–Severi groups.

(15)

Remark 4

In [8], an element of \({\text {Pic}}(X\times X)\) whose image under \(\psi \) lies in \({\text {End}}^\dagger (J)^{{{\,\mathrm{tr}\,}}=0}\) is referred to as a ‘nice correspondence’.

2.2 Chow–Heegner points and diagonal cycles

We recall an equivalent version of the morphism \({\widetilde{\theta }}_{X,b}\), which appears in [18] and [6]. As our discussion applies in fairly broad generality, we take X to be a smooth geometrically irreducible projective curve over a field K of characteristic zero. Fix \(b\in X(K)\) and, for \(S \subset \{ 1,\ldots , n\}\), let \(X_S\) denote the image of X under the closed immersion \(i_S (b)\) defined in (6). For any \(Z \in {\text {Div}}(X \times X)\), let \(C_Z(b) := (i_{\{1,2\}}(b)^* -i_{\{1 \}}(b)^* -i_{\{2 \} }(b)^* )(Z) = \varphi ([Z])\) and

$$\begin{aligned} D_Z (b) := C_Z(b)- \deg (C_Z(b)) \cdot b \in {\text {Pic}}^0(X). \end{aligned}$$
(16)

We refer to \(D_Z (b)\) and \(C_Z (b)\) as Chow–Heegner points, following [19].

The map \(Z\mapsto D_Z (b)\) factors through \({\text {Pic}}(X\times X)\), and has the following relation to \({\widetilde{\theta }}_{X,b}\). The projection

$$\begin{aligned} \varPi :{\text {Pic}}(X\times X)\rightarrow {\text {Ker}}(i_1 ^* \oplus i_2 ^* ) \end{aligned}$$

associated to (9), (10) is given by \((1-\pi _1 ^* \circ i_1 ^* -\pi _2 ^* \circ i_2 ^* )\), giving the identities

$$\begin{aligned} \varphi \circ \varPi = i_{\{ 1,2 \} }^* \circ \varPi = i_{\{1,2\} }^* - i_1 ^* -i_2 ^*, \quad \psi ^{-1} \circ \psi = \varPi . \end{aligned}$$
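
The second equality in the first identity can be verified directly. The sketch below uses only the relations \(\pi _j \circ i_{\{1,2\}} = {\text {Id}}_X\) for \(j=1,2\), which follow from the definitions (6) and (7) since \(i_{\{1,2\}}\) is the diagonal embedding:

$$\begin{aligned} i_{\{ 1,2 \} }^* \circ \varPi = i_{\{ 1,2 \} }^* - (\pi _1 \circ i_{\{ 1,2 \} })^* \circ i_1 ^* - (\pi _2 \circ i_{\{ 1,2 \} })^* \circ i_2 ^* = i_{\{ 1,2 \} }^* - i_1 ^* - i_2 ^* . \end{aligned}$$

The first equality then follows because the image of \(\varPi \) lies in \({\text {Ker}}(i_1 ^* \oplus i_2 ^* )\), so that \(\varphi \), which is induced by \(i_{\{1,2\}}^* - i_1 ^* - i_2 ^*\), agrees with \(i_{\{1,2\}}^*\) there.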

Since \(\deg (C_Z (b))=\deg (\varphi (\varPi ([Z])))\), for any Z in \({\text {Pic}}(X\times X)\) which lies in the kernel of \(\deg \varphi \), we have

$$\begin{aligned} D_Z (b)=C_Z (b)=\varphi ([Z]) = \varphi (\varPi ([Z])) = {\tilde{\theta }}_{X,b} (\psi ([Z])). \end{aligned}$$
(17)

These computations also prove the claims of Remark 2 using the diagram (15). We define \(Z^t \in \mathrm {CH}^1 (X\times X)\) to be the pull-back of Z under the involution

$$\begin{aligned} X\times X&\rightarrow X\times X \\ (x,y)&\mapsto (y,x). \end{aligned}$$

Lemma 2

In the notation of Definition 2, we have

$$\begin{aligned} D_Z (b')-D_Z (b)=\psi _Z (b-b')+\psi _{Z^t }(b-b'). \end{aligned}$$

Proof

We have \(i_{\{1,2\}}(b)=i_{\{1,2\}}(b')\). Hence

$$\begin{aligned} C_Z (b')-C_Z (b)=i_{\{1\}}(b)^*(Z) -i_{\{ 1\}}(b')^*(Z) + i_{\{2\}}(b)^*(Z) -i_{\{ 2\}}(b')^*(Z). \end{aligned}$$

By definition of the correspondences, we then have

$$\begin{aligned} (i_{\{1\}}(b)^* -i_{\{1 \} }(b')^* )(Z)=\psi _Z (b-b') \end{aligned}$$

and

$$\begin{aligned} (i_{\{2\}}(b)^* -i_{\{2 \} }(b')^* )(Z)=\psi _{Z^t} (b-b'), \end{aligned}$$

which proves the equality for \(C_Z(b') - C_Z(b)\), thus for \(D_Z(b') - D_Z(b)\) as the degrees are then equal. \(\quad \square \)

Definition 3

Given a surjective homomorphism \(\pi _B:J\rightarrow B\) of abelian varieties, we obtain a homomorphism

$$\begin{aligned} {\text {Ker}}({{\,\mathrm{NS}\,}}(J) \overset{\deg \circ {\widetilde{\mathrm {AJ}}}^*}{\longrightarrow } {\mathbb {Z}}) \overset{ {\widetilde{\mathrm {AJ}}}^*}{\longrightarrow } {\text {Pic}}^0(X) \longrightarrow J \overset{\pi _B}{\longrightarrow } B. \end{aligned}$$
(18)

By Lemma 2 and (17), for a divisor Z on \(X \times X\), if \(\psi _{\varPi (Z)}\) has image contained in \({\text {Ker}}(\pi _B )\), then the image of [Z] in B via (18) is independent of the choice of basepoint. In particular, if we have a surjection \((\pi _A ,\pi _B):J\rightarrow A\times B\), and \({\text {Hom}}(A,B)=0\), then we obtain a homomorphism independent of b, which we will denote by

$$\begin{aligned} \begin{array}{c|ccl} \theta _{X,\pi _A,\pi _B} :&{} {\text {Ker}}(d_{\pi _A }) \subset {{\,\mathrm{NS}\,}}(A) &{} \longrightarrow &{} B \\ &{} [{{\mathcal {L}}}] &{} \longmapsto &{} \pi _B \circ {\theta }_{X,b}\circ \pi _A ^* ([{{\mathcal {L}}}]) \end{array}. \end{aligned}$$

Remark 5

This construction also has a direct description in terms of line bundles, although this is not the one we use to calculate \(\theta _{X,\pi _A ,\pi _B}\) in examples. Given a line bundle \({{\mathcal {L}}}_A\) on A whose pull-back to X via \(\mathrm {AJ}^* \circ \pi _A ^* \) has degree zero, we may also consider the projection of \(\mathrm {AJ}^* \circ \pi _A ^* ({{\mathcal {L}}}_A)\) to B. Variants of this construction are studied in the thesis of Michael Daub [20] when \({\text {Hom}}(A,B)=0\). By (13) and because \({\text {Pic}}^0(J)\) contains all classes of antisymmetric line bundles, we have the identity [20, Proposition 3.3.3]

$$\begin{aligned} \theta _{X,\pi _A ,\pi _B } \circ p = [2] \circ \pi _B \circ \mathrm {AJ}^* \circ \pi _A ^* ; \end{aligned}$$

where p is the projection from \({\text {Pic}}(A)\) to \({{\,\mathrm{NS}\,}}(A)\) restricted to \(p^{-1}({\text {Ker}}(d_{\pi _A}))\). In particular, the right-hand side does vanish on \({\text {Pic}}^0(A)\) [20, Proposition 3.3.2].

Example 1

Note that \(\theta _{X,\pi _A ,\pi _B }\) is not an invariant of A and B, or even of X, A and B. For example, let A and B be distinct isogeny factors of \(J_0 (N)\), and let \(X=X_0 (N^2 )\). Let \(f_1 ,f_2 :X\rightarrow X_0 (N)\) be the two natural morphisms, and let \((\pi _{A,i },\pi _{B,i })\) be the morphisms \({\text {Jac}}(X)\rightarrow A\times B\) obtained by composing the surjection \(J_0 (N)\rightarrow A\times B\) with \(f_{i*}\). Then \(\theta _{X,\pi _{A,i} ,\pi _{B,i} }\) can be nonzero (see [18] for examples). However, if \(i\ne j\), then \(\theta _{X,\pi _{A,i},\pi _{B,j}}\) is identically zero: for any choice of line bundle \([{{\mathcal {L}}}]\) in \({{\,\mathrm{NS}\,}}(A)\), the associated point \(D_{[{{\mathcal {L}}}]}(b)\) lies in \(f_{i}^* J_0 (N)\), hence its projection to \(f_{j*}J_0 (N)\) is torsion.

3 Proof of finiteness of the Chabauty–Kim set under (C)

The strategy of proof of Proposition 1 is very similar to that of [6, Lemma 3.2]. To explain this strategy, we need to establish some notation. Let X, A and B be as in the proposition. Define

$$\begin{aligned} V:=T_p (J)\otimes {\mathbb {Q}}_p , \quad V_A := T_p (A) \otimes {\mathbb {Q}}_p , \quad V_B :=T_p (B) \otimes {\mathbb {Q}}_p. \end{aligned}$$

Let \(U_n (b)\) denote the maximal n-unipotent quotient of the \({\mathbb {Q}}_p \)-unipotent fundamental group of \({\overline{X}}\) at some basepoint b as defined in [22, §10]. Let U be a Galois-stable quotient of \(U_n (b)\) (i.e. a quotient by a Galois-stable normal subgroup of \(U_n(b)\)). Let \(T_0 \) be the set of primes of bad reduction for X, and let \(T=T_0 \cup \{ p \}\). Denote the maximal quotient of \({\text {Gal}}({\overline{{\mathbb {Q}}}} /{\mathbb {Q}})\) unramified outside T by \(G_{{\mathbb {Q}},T}\), and for \(v\in T\) denote \({\text {Gal}}({\overline{{\mathbb {Q}}}}_v /{\mathbb {Q}}_v )\) by \(G_{{\mathbb {Q}}_v} \). Then by [39, 40], we have a commutative diagram

figure b

with the following properties.

  1. For \(G=G_{{\mathbb {Q}},T}\) or \(G_{{\mathbb {Q}}_v }\), and all \(i<k\), the sets \(H^1 (G,U^{(i)}/U^{(k)})\) have the structure of \({\mathbb {Q}}_p \) points of an algebraic variety, so that the algebraic structure on \(H^1 (G,{\text {gr}}_i U)\) is just the usual scheme structure on a vector space, and the maps

    $$\begin{aligned} H^1 (G,{\text {gr}}_i U)\rightarrow H^1 (G,U/U^{(i+1)})\rightarrow H^1 (G,U/U^{(i)}) \end{aligned}$$

    come from morphisms of algebraic varieties. The maps \({\text {loc}}_v \) are then algebraic for these structures.

  2. For \(v\in T_0 \), the map \(j_v \) has finite image.

  3. The image of the map \(j_p \) is contained inside the subvariety \(H^1 _f (G_{{\mathbb {Q}}_p },U)\) of crystalline torsors.

The following Lemma is proved in [6, Lemma 3.1] (although the result is stated only in the case \(A=J\), the proof generalises to the case where A is an arbitrary quotient of J).

Lemma 3

Let U be a Galois-stable quotient of \(U_2 (b)\). Suppose U is an extension of \(V_A\) by \({\mathbb {Q}}_p (1)^n\), where A is some abelian variety over \({\mathbb {Q}}\) and \(V_A= T_p (A) \otimes {\mathbb {Q}}_p\). If

$$\begin{aligned} {{\,\mathrm{rk}\,}}(A({\mathbb {Q}}))<n+\dim (A), \end{aligned}$$

then \(X({\mathbb {Q}}_p )_2\) is finite. In particular, if \({{\,\mathrm{rk}\,}}(A({\mathbb {Q}}))=\dim (A)\), then \(X({\mathbb {Q}}_p )_2 \) is finite whenever \(n> 0\).
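
The inequality can be read as a dimension count. The following is only a heuristic sketch of the usual Chabauty–Kim argument, with details as in [6]; the names \(d_{\mathrm {glob}}\) and \(d_{\mathrm {loc}}\) are introduced here for illustration. Finiteness follows once the global Selmer variety attached to U has smaller dimension than the local variety \(H^1 _f (G_{{\mathbb {Q}}_p },U)\). On the global side the graded piece \(V_A\) contributes at most \({{\,\mathrm{rk}\,}}(A({\mathbb {Q}}))\) and \({\mathbb {Q}}_p (1)^n\) contributes nothing, while on the local side \(V_A\) contributes \(\dim (A)\) and each copy of \({\mathbb {Q}}_p (1)\) contributes 1:

$$\begin{aligned} d_{\mathrm {glob}}\le {{\,\mathrm{rk}\,}}(A({\mathbb {Q}}))+0, \qquad d_{\mathrm {loc}}=\dim (A)+n, \qquad d_{\mathrm {glob}}<d_{\mathrm {loc}} \ \text {whenever} \ {{\,\mathrm{rk}\,}}(A({\mathbb {Q}}))<n+\dim (A). \end{aligned}$$

For instance, if \({{\,\mathrm{rk}\,}}(A({\mathbb {Q}}))=\dim (A)=2\) and \(n=1\), the count reads \(2<3\).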

To prove Proposition 1, we construct a quotient U of \(U_2(b)\) as in Lemma 3, with \(n={{\,\mathrm{rk}\,}}({\text {Ker}}\theta _{X,\pi _A,\pi _B})\). We again take X to be a smooth projective geometrically irreducible curve over a field K of characteristic zero.

The group \(U_2 (b)\) is an extension

$$\begin{aligned} 1 \rightarrow {\text {Ker}}(H^2 (J_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p ){\mathop {\longrightarrow }\limits ^{\mathrm {AJ}^* }}H^2 (X_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p ))^* \rightarrow U_2 (b) \rightarrow V \rightarrow 1. \end{aligned}$$
(19)

Hence for any \(\xi \in {\text {Ker}}({{\,\mathrm{NS}\,}}(J) \overset{{\widetilde{\mathrm {AJ}}}^*}{\rightarrow } {{\,\mathrm{NS}\,}}(X))\), we may quotient by the kernel of the dual of the Chern class \(c_p ^{{\acute{\mathrm{e}}\mathrm{t}}}(\xi ) \in H^2(X_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p(1))\) (see Sect. 1.1)

$$\begin{aligned} c_p ^{{\acute{\mathrm{e}}\mathrm{t}}}(\xi )^* (1):{\text {Ker}}(H^2 (J_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p ){\mathop {\longrightarrow }\limits ^{\mathrm {AJ}^* }}H^2 (X_{{\overline{{\mathbb {Q}}}}},{\mathbb {Q}}_p ))^* \rightarrow {\mathbb {Q}}_p (1) \end{aligned}$$

to obtain a quotient \(U_Z\) of \(U_2 (b)\) which is an extension of V by \({\mathbb {Q}}_p (1)\). Similarly, for any nice correspondence on \(X\times X\), we obtain a quotient of \(U_2 (b)\) which is an extension of V by \({\mathbb {Q}}_p (1)\).

Lemma 4

([6], Theorem 6.3) Let U be a Galois-stable quotient of \(U_2 (b)\) of the form

$$\begin{aligned} 1\rightarrow {\mathbb {Q}}_p (1)\rightarrow U\rightarrow V_p(J) \rightarrow 1, \end{aligned}$$

coming from a correspondence \(Z\subset X\times X\) as above. Then the associated extension class of \(\mathrm {Lie}(U)\) in \({{\,\mathrm{Ext}\,}}^1 _{G_K }(V_p(J),{\mathbb {Q}}_p (1))\) is equal to the étale Abel–Jacobi class of the cycle \(D_Z (b)\) (see Sect. 2.1).

Proof

Let \({\mathcal {E}}(\mathrm {Lie}(U))\) be the universal enveloping algebra of \(\mathrm {Lie}(U)\), and let \(I(\mathrm {Lie}(U))\) be the kernel of the co-unit morphism \({\mathcal {E}}(\mathrm {Lie}(U))\rightarrow {\mathbb {Q}}_p \). In [6, §6], a Galois representation \(E_Z\) is constructed as a quotient of \({\mathcal {E}}(\mathrm {Lie}(U))\). The image of \(I(\mathrm {Lie}(U))\) in \(E_Z\) is an extension \(IE_Z \) of V by \({\mathbb {Q}}_p (1)\). By [6, Theorem 6.3], the extension class of \(IE_Z\) in \({{\,\mathrm{Ext}\,}}^1_{{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}}(V_p(J),{\mathbb {Q}}_p(1))\) is the Abel–Jacobi class of \(D_Z (b)\). The restriction of \(I(\mathrm {Lie}(U))\rightarrow IE_Z\) to \(\mathrm {Lie}(U)\subset I(\mathrm {Lie}(U))\) is an isomorphism, and hence the extension class of \(\mathrm {Lie}(U)\) is equal to the Abel–Jacobi class of \(D_Z (b)\). \(\quad \square \)

As explained in Appendix 6, Lemma 4 is really a consequence of Hain and Matsumoto’s computation of the extension class of \(\mathrm {Lie}(U_2 )\) in terms of the Ceresa cycle. Hence to complete the proof of Proposition 1, it will be enough to prove the following Lemma.

Lemma 5

Let \(U'\) denote the quotient of \(U_2 \) obtained from the surjection \({\text {gr}}_2 (U_2 )\rightarrow {\text {Ker}}(d_{\pi _A} )^* \otimes {\mathbb {Q}}_p (1)\). There exists a Galois stable quotient U of \(U'\) which is an extension of \(V_A\) by \({\text {Ker}}(\theta _{X,\pi _A ,\pi _B })\):

figure c

Proof

It will be enough to prove the corresponding statement for the Lie algebra \(L'\) of \(U'\). The commutator map

$$\begin{aligned}{}[\cdot ,\cdot ]_{U'} :(V_A \oplus V_B )\times (V_A \oplus V_B )\rightarrow {\text {Ker}}(d_{\pi _A} )^* \otimes {\mathbb {Q}}_p (1) \end{aligned}$$

is the composite of the commutator on \(U_2 \), given by

$$\begin{aligned} (V_A \oplus V_B )\times (V_A \oplus V_B ) \rightarrow {{\,\mathrm{Coker}\,}}({\mathbb {Q}}_p (1) {\mathop {\longrightarrow }\limits ^{\cup ^* }} \wedge ^2 V_A \oplus V_A \otimes V_B \oplus \wedge ^2 V_B ) \end{aligned}$$

with the surjection

$$\begin{aligned} {{\,\mathrm{Coker}\,}}({\mathbb {Q}}_p (1) {\mathop {\longrightarrow }\limits ^{\cup ^* }} \wedge ^2 V_A \oplus V_A \otimes V_B \oplus \wedge ^2 V_B ) \rightarrow {\text {Ker}}(d_{\pi _A} )^* \otimes {\mathbb {Q}}_p (1) \end{aligned}$$

Since the latter map factors through projection onto \(\wedge ^2 V_A /{\mathbb {Q}}_p (1)\), the composite map factors through projection onto \(V_A \times V_A \). Hence for any quotient Q of \({\text {Ker}}(d_{\pi _A })^* \otimes {\mathbb {Q}}_p (1)\), we can construct a Lie algebra quotient of \(L'\) which is an extension of \(V_A\) by Q. It remains to show that, when \(Q={\text {Ker}}(\theta _{X,\pi _A ,\pi _B})\), we can make this quotient Galois stable. That is, we first quotient out by \(({\text {Ker}}(d_{\pi _A })/{\text {Ker}}(\theta _{X,\pi _A ,\pi _B }))^* \otimes {\mathbb {Q}}_p (1)\), to form an extension

$$\begin{aligned} 0\rightarrow {\text {Ker}}(\theta _{X,\pi _A ,\pi _B })^* \otimes {\mathbb {Q}}_p (1)\rightarrow L''\rightarrow V_A \oplus V_B \rightarrow 0. \end{aligned}$$

The surjection \(L''\rightarrow V_B \) induces a Galois equivariant short exact sequence of Lie algebras

$$\begin{aligned} 0\rightarrow L'\rightarrow L''\rightarrow V_B \rightarrow 0, \end{aligned}$$

and to construct the quotient \(U\rightarrow U'\), it is enough to show that this short exact sequence admits a Galois equivariant section. Here \(L'\) sits in a short exact sequence

$$\begin{aligned} 0\rightarrow {\text {Ker}}(\theta _{X,\pi _A ,\pi _B})^* \otimes {\mathbb {Q}}_p (1)\rightarrow L' \rightarrow V_A \rightarrow 0, \end{aligned}$$

and since \(L''/{\text {Ker}}(\theta _{X,\pi _A ,\pi _B})^* \otimes {\mathbb {Q}}_p (1)=V_A \oplus V_B\), it is enough to show that the image of \([L'']\) under the composite map

$$\begin{aligned} {{\,\mathrm{Ext}\,}}^1 _{G_{{\mathbb {Q}}}}(V_A \oplus V_B ,{\text {Ker}}(\theta _{X,\pi _A ,\pi _B})^* \otimes {\mathbb {Q}}_p (1))\rightarrow {{\,\mathrm{Ext}\,}}^1 _{G_{{\mathbb {Q}}}}(V_B ,{\text {Ker}}(\theta _{X,\pi _A ,\pi _B})^* \otimes {\mathbb {Q}}_p (1)) \end{aligned}$$

is zero.

Equivalently, we want to show that \({\text {Ker}}(\theta _{X,\pi _A ,\pi _B})\) is contained in the kernel of the homomorphism

$$\begin{aligned} {\text {Ker}}(d_{\pi _A })\rightarrow {{\,\mathrm{Ext}\,}}^1 _{G_{{\mathbb {Q}}}}(V_B ,{\mathbb {Q}}_p (1)) \end{aligned}$$

sending \(\xi \in {\text {Ker}}(d_{\pi _A })\) to the \(V_B\) component of the extension class in \({{\,\mathrm{Ext}\,}}^1 (V_A \oplus V_B ,{\mathbb {Q}}_p (1))\) associated to the quotient of \(L'\) defined by \(c _p ^{{\acute{\mathrm{e}}\mathrm{t}}}(\xi )\):

figure d

By Lemma 4, this extension class is equal to the étale Abel–Jacobi class of \(D_{c_p ^{{\acute{\mathrm{e}}\mathrm{t}}}(\xi )}(b)\), and hence its \(V_B\) component is equal to the étale Abel–Jacobi class of \(\theta _{X,\pi _A ,\pi _B }(c_p ^{{\acute{\mathrm{e}}\mathrm{t}}}(\xi ))\). Under the hypothesis, the latter is 0 so the extension class is trivial, which concludes the proof of Proposition 1. \(\square \)

3.1 Bounding the number of rational points on curves satisfying (C)

Following [3], the proof of finiteness of \(X({\mathbb {Q}}_p )_2\) may be used to prove an explicit upper bound on \(\# X({\mathbb {Q}}_p )_2\). To explain this, we introduce some notation. By [41, Corollary 1], for all \(v\ne p\), the size of the image of \(X({\mathbb {Q}}_v )\) in \(H^1 (G_{{\mathbb {Q}}_v },U_2 )\) is finite, and is equal to one for all primes of good reduction for X. Let \(T_0\) denote the set of primes of bad reduction for X, and for \(v\in T_0\) let \(n_v \) denote the size of the image of \(X({\mathbb {Q}}_v )\) in \(H^1 (G_{{\mathbb {Q}}_v },U_2 )\).

Corollary 1

Suppose X satisfies the hypotheses of Proposition 1, and furthermore that the rank of \(A({\mathbb {Q}})\) is equal to its dimension, and the p-adic closure of \(A({\mathbb {Q}})\) has finite index in \(A({\mathbb {Q}}_p )\). Let \(n:=\prod _{v\in T_0 }n_v \). Let D be an effective divisor on X, let \(Y\subset X_{\mathbb {Z}_p }\) be the complement of the support of a normal crossings divisor on Y with generic fibre D, and let \(\{ \omega _0 ,\ldots ,\omega _{2g-1}\}\) be a set of differentials in \(H^0 (X,\varOmega (D))\) forming a basis of \(H^1 _{\text {dR}} (X)\). Then there are \(a_{ij},a_i \in {\mathbb {Q}}_p\), \(\eta \in H^0 (X,\varOmega (D))\) and \(g\in H^0 (X,\varOmega (2D))\), and \(\alpha _1 ,\ldots ,\alpha _n \) in \({\mathbb {Q}}_p \), such that

$$\begin{aligned}&X({\mathbb {Q}}_p )_2 \cap Y(\mathbb {Z}_p )\subset \bigcup _{i=1}^n \{x\in Y(\mathbb {Z}_p ): \sum a_{ij}\int ^x _b \omega _i \omega _j\nonumber \\&\quad +\sum a_i \int ^x _b \omega _i +\int ^x _b \eta +g(x)=\alpha _i \}. \end{aligned}$$
(20)

Proof

The argument is identical to the proof of [7, Proposition 6.4]; however, as the hypotheses are different, we explain the steps. Arguing as in loc. cit., there are \(b_{ij}\), \(b_i \) in \({\mathbb {Q}}_p \) such that \(X({\mathbb {Q}}_p )_2 \cap Y(\mathbb {Z}_p )\) is contained in the finite set of \(x\in Y(\mathbb {Z}_p )\) satisfying

$$\begin{aligned} h_p (A_Z (x))-\sum b_{ij}\left( \int ^x _b \omega _i \right) \left( \int ^x _b \omega _j \right) -\sum b_i \int ^x _b \omega _i =-\sum _{v\in T_0 }h(A_Z (b)^{\phi _v }) , \end{aligned}$$

for some \((\phi _v )\) in \(\prod _{v \in T_0 }j_v (X({\mathbb {Q}}_v ))\). Here \(A_Z (b)^{\phi _v }\) denotes the twist of \(A_Z (b)\) by \(\phi _v \).

Hence we deduce (20) from the formula for \(h_p (A_Z (x))\) given in [7, Lemma 6.7], and the formula

$$\begin{aligned} \left( \int ^x _b \omega _i \right) \left( \int ^x _b \omega _j \right) =\int ^x _b \omega _i \omega _j +\int ^x _b \omega _j \omega _i . \end{aligned}$$

\(\square \)
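
The product formula used in the last step is the shuffle relation for iterated integrals. As a sanity check, the sketch below verifies it exactly in a toy setting. The setting is an assumption of this illustration and is not the p-adic Coleman setting of the corollary: the forms are \(f(t)\,dt\) and \(g(t)\,dt\) with polynomial coefficients, the basepoint is 0, and all function names are hypothetical.

```python
from fractions import Fraction


def integrate(poly):
    """Antiderivative vanishing at 0; poly[k] is the coefficient of t^k."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(poly)]


def multiply(p, q):
    """Product of two non-empty coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def evaluate(poly, x):
    """Value of the polynomial at x, computed exactly."""
    x = Fraction(x)
    return sum(Fraction(c) * x**k for k, c in enumerate(poly))


def single(f, x):
    """int_0^x f(t) dt."""
    return evaluate(integrate(f), x)


def iterated(f, g, x):
    """int_0^x f(t) * (int_0^t g(s) ds) dt, a length-two iterated integral."""
    return evaluate(integrate(multiply(f, integrate(g))), x)


f = [1, 2]      # 1 + 2t
g = [0, 0, 3]   # 3t^2
x = Fraction(1, 2)

lhs = single(f, x) * single(g, x)
rhs = iterated(f, g, x) + iterated(g, f, x)
print(lhs, rhs)
```

The symmetric sum on the right does not depend on which ordering convention is used for \(\int \omega _i \omega _j \), which is why the check needs no choice of convention.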

Corollary 2

Suppose X satisfies the hypotheses of Proposition 1, and furthermore that the rank of \(A({\mathbb {Q}})\) is equal to its dimension. Then

$$\begin{aligned} \# X({\mathbb {Q}}) <\kappa _p \left( \prod _{v\in T_0 }n_v \right) \# X(\mathbb {F}_p )(16g^3+15g^2-16g+10), \end{aligned}$$

where \(\kappa _p :=1+\frac{p-1}{p-2}\frac{1}{\log (p)}\).

Proof

It is enough to prove that, for all \(x_0 \in X(\mathbb {Z}_p )\), we can choose \(D,\omega _i \) such that \({\overline{x}}:={{\,\mathrm{red}\,}}(x_0 )\) lies in \(Y(\mathbb {F}_p )\), and

$$\begin{aligned}&\# \{x\in {{\,\mathrm{red}\,}}^{-1}(\{{\overline{x}}\})\subset X({\mathbb {Q}}_p ): \sum a_{ij}\int ^x _b \omega _i \omega _j +\sum a_i \int ^x _b \omega _i +\int ^x _b \eta +g(x)=0 \} \\&\quad < \kappa _p (16g^3+15g^2-16g+10). \end{aligned}$$

This follows from [3, Proposition 3.2] together with [3, §4, below Lemma 4.4.]. \(\square \)
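
The bound of Corollary 2 is straightforward to evaluate once its inputs are known. The sketch below is a minimal illustration with hypothetical function names. It assumes that \(\log \) is the natural logarithm and that \(p>2\), and the sample inputs (the point count modulo p and the local sizes \(n_v\)) are placeholders, not data for any particular curve.

```python
import math


def kappa(p):
    """kappa_p = 1 + (p - 1) / ((p - 2) * log p), natural logarithm assumed."""
    if p <= 2:
        raise ValueError("kappa_p is only defined here for p > 2")
    return 1 + (p - 1) / ((p - 2) * math.log(p))


def genus_factor(g):
    """The polynomial 16 g^3 + 15 g^2 - 16 g + 10 from Corollary 2."""
    return 16 * g**3 + 15 * g**2 - 16 * g + 10


def point_bound(p, g, points_mod_p, local_sizes):
    """Right-hand side of Corollary 2.

    points_mod_p stands for #X(F_p); local_sizes lists the n_v for v in T_0.
    An empty list gives the empty product 1.
    """
    n = math.prod(local_sizes)
    return kappa(p) * n * points_mod_p * genus_factor(g)


# Placeholder inputs, for illustration only.
print(genus_factor(2))
print(point_bound(p=3, g=2, points_mod_p=5, local_sizes=[2]))
```

Since the corollary is a strict inequality, any integer point count must be strictly smaller than the returned value.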

Remark 6

In [10], it is proved that the size of \(j_{2,v}(X({\mathbb {Q}}_v ))\) can be bounded by the number of irreducible components of a regular semistable model of X over a finite extension of \({\mathbb {Q}}_v \). Hence using work of Edixhoven and Parent on stable models of \(X_{{{\,\mathrm{ns}\,}}}^+(N)\) [23], one can use the above corollary, together with Theorem 1, to give explicit bounds on the size of \(X_{{{\,\mathrm{ns}\,}}}^+(N)\) and \(X_0 ^+ (N)\).

3.2 Functoriality properties of (C)

The heart of the proof of Proposition 3 is an interpretation of diagonal cycles on \(X_0 (N)\) and \(X_{{{\,\mathrm{ns}\,}}}(N)\) in terms of Heegner points. The following Lemma allows us to use this to deduce something about diagonal cycles on \(X_0 ^+ (N)\) and \(X_\mathrm{ns }^+ (N)\). This lemma is a special case of a theorem of Daub [20, Proposition 3.3.5].

Lemma 6

  1. Let \(f:X'\rightarrow X\) be a non-constant morphism of curves over a field K. Suppose \(b'\in X'(K)\) maps to \(b\in X(K)\) under f, and let Z be an element of \(\mathrm {CH}^1 (X\times X)\). Then

    $$\begin{aligned} D_{(f,f)^* Z}(b')=f^* (D_Z (b)). \end{aligned}$$

  2. Let \(f:X'\rightarrow X\) and \(b'\) be as above, and let \(f_*\) denote the induced surjection \(J':={\text {Jac}}(X')\rightarrow J:={\text {Jac}}(X)\). Let \((\pi _A ,\pi _B)\) be a surjective homomorphism from J to \(A\times B\). Then

    $$\begin{aligned} {\text {Ker}}(\theta _{X,\pi _A ,\pi _B})={\text {Ker}}(\theta _{X' ,\pi _A \circ f_* ,\pi _B \circ f_*}). \end{aligned}$$

Proof

For \(*=\{1\},\{2\}\) or \(\{1,2\}\), the diagram

figure e

commutes. Hence we obtain, in \(\mathrm {CH}^1 (X')\),

$$\begin{aligned} f^* (C_Z (b))&=(f^* \circ i_{\{1,2\}}(b)^* -f^* \circ i_{\{1 \}}(b)^*-f^* \circ i_{\{2\}}(b)^* )(Z)\\&= (i_{\{1,2\}}(b')^* \circ (f,f)^* - i_{\{1 \}}(b')^* \circ (f,f)^* - i_{\{2\}}(b')^* \circ (f,f)^* )(Z) \\&= C_{(f,f)^* (Z)}(b') \end{aligned}$$

and the result follows for \(D_Z(b)\). The second item follows from the first, as we now prove. Let \({{\mathcal {L}}}\) be a line bundle on A belonging to \({\text {Ker}}d_{\pi _A}\). By definition of \(\theta _{X,\pi _A,\pi _B}\) and the right part of diagram (15), we fix some cycle Z on \(X \times X\) such that \([Z] = (\mathrm {AJ}^{(2)})^* \circ \pi _A ^* ([{{\mathcal {L}}}])\), and then by (17)

$$\begin{aligned} \theta _{X,\pi _A,\pi _B}([{{\mathcal {L}}}]) = (\pi _B \otimes {\mathbb {Q}}) \circ \varphi ([Z]) = (\pi _B \otimes {\mathbb {Q}}) (D_Z(b)). \end{aligned}$$

Now, considering the morphism \(f : X' \rightarrow X\) with those choices of base points, we have \(f_* \circ \mathrm {AJ}_{X'}^{(2)} = \mathrm {AJ}_{X}^{(2)} \circ (f,f)\). Consequently, with the same \({{\mathcal {L}}}\) and Z, \([(f,f)^* Z] = (\mathrm {AJ}_{X'}^{(2)})^* \circ (\pi _A \circ f_*)^* ([{{\mathcal {L}}}])\), so the kernels of \(d_{\pi _A}\) and \(d_{\pi _A \circ f_*}\) are the same, and on this common kernel,

$$\begin{aligned} \theta _{X',\pi _A \circ f_*,\pi _B \circ f_*}([{{\mathcal {L}}}])= & {} ((\pi _B \circ f_*) \otimes {\mathbb {Q}}) (D_{(f,f)^*Z}(b')) \\= & {} ((\pi _B \circ f_*) \otimes {\mathbb {Q}}) (f^* (D_Z(b))) \\= & {} [\deg f] (\pi _B \otimes {\mathbb {Q}}) (D_Z(b)). \end{aligned}$$

In particular, \(\theta _{X,\pi _A,\pi _B}\) and \(\theta _{X',\pi _A \circ f_*,\pi _B \circ f_*}\) have the same kernel. \(\square \)
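
The last equality in the computation above uses the standard identity \(f_* \circ f^* =[\deg f]\) on \({\text {Pic}}^0 (X)\). It can be checked on a point \(x\in X({\overline{K}})\), writing \(e_y\) for the ramification index of f at y (notation introduced for this check):

$$\begin{aligned} f_* (f^* (x))=f_* \Big ( \sum _{f(y)=x}e_y \cdot y\Big ) =\sum _{f(y)=x}e_y \cdot f(y)=\Big ( \sum _{f(y)=x}e_y \Big ) \cdot x=\deg (f)\cdot x. \end{aligned}$$

Since multiplication by \(\deg f\) is injective on \(B(K)\otimes {\mathbb {Q}}\), a class is killed by one of the two maps exactly when it is killed by the other.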

Note that while the behaviour of diagonal cycles under pull-backs is tautological, their behaviour under push-forwards is not. For this reason it seems difficult to deduce statements about diagonal cycles on \(X_{{{\,\mathrm{ns}\,}}}(N)\) from results on \(X_{\mathrm {s}}(N)\), in spite of the explicit isogeny relating their Jacobians explained below.

4 Proof of (C) for \(X_0 ^+ (N)\) and \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\)

Given Proposition 1, it will be enough to prove Theorem 2, and the following.

Proposition 3

Assume Theorem 2. Then, for \(X=X_0 ^+ (N)\) or \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\) of genus at least 2, there exists an isogeny

$$\begin{aligned} (\pi _A ,\pi _B ):J\rightarrow A\times B, \end{aligned}$$

where \({{\,\mathrm{rk}\,}}(A) = \dim (A) = \rho (A) \ge 2\) and such that, for all \({{\mathcal {L}}}\) in \({\text {Ker}}(d_{\pi _A })\), \( \theta _{X,\pi _A ,\pi _B }({{\mathcal {L}}}) \) is torsion (see Definition 4 for the choices of A and B).

We recall the definitions of some of the modular curves which appear, for example, in [16]. Define \(C_{{{\,\mathrm{ns}\,}}}^+ (N),C_{\mathrm {s}}^+ (N)\) to be the normalisers in \({\text {GL}}_2 (\mathbb {Z}/N\mathbb {Z})\) of fixed choices of a nonsplit Cartan subgroup \(C_\mathrm{{ns}}(N)\) and a split Cartan subgroup \(C_\mathrm{{s}}(N)\) of \({\text {GL}}_2 (\mathbb {Z}/N\mathbb {Z})\). The (normaliser of) nonsplit and split Cartan modular curves are defined by

$$\begin{aligned} X_{{{\,\mathrm{ns}\,}}}^+ (N) :=X(N)/C_{{{\,\mathrm{ns}\,}}}^+ (N), \quad X_{\mathrm {s}}^+ (N)=X(N) /C_{\mathrm {s}}^+ (N). \end{aligned}$$

Similarly we define \(X_{{{\,\mathrm{ns}\,}}}(N)\) and \(X_{\mathrm {s}}(N)\) to be the quotients of X(N) by \(C_{{{\,\mathrm{ns}\,}}}(N)\) and \(C_{\mathrm {s}}(N)\) respectively. Since \(C_{{{\,\mathrm{ns}\,}}}(N)\) and \(C_{\mathrm {s}}(N)\) contain the centre of \({\text {GL}}_2 (\mathbb {Z}/N\mathbb {Z})\) and the determinant maps each of them onto \(({\mathbb {Z}}/N{\mathbb {Z}})^*\), the curves \(X_{{{\,\mathrm{ns}\,}}}(N)\), \(X_{\mathrm {s}}(N)\) and their Atkin–Lehner quotients are geometrically connected and defined over \({\mathbb {Q}}\).

Non-cuspidal points of \(X_{\mathrm {s}}(N)\) (in characteristic not dividing N) correspond to elliptic curves E together with a pair \(C_1 ,C_2 \) of cyclic subgroups of E of order N generating E[N]. We have an isomorphism

$$\begin{aligned} X_0 (N^2 )\simeq X_{\mathrm {s}}(N), \end{aligned}$$
(21)

which sends a point \((f:E\rightarrow E' )\) to \((E'',C_1 ,C_2 )\), where \(E'' :=E/(N\cdot {\text {Ker}}(f))\), \(C_1 \) is the image of \({\text {Ker}}(f)\) in \(E''\), and \(C_2 \) is the image of E[N] in \(E''\).
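
To see that \((E'',C_1 ,C_2 )\) is indeed a point of \(X_{\mathrm {s}}(N)\), recall that \({\text {Ker}}(f)\) is cyclic of order \(N^2\) (the moduli description of \(X_0 (N^2 )\), which this check takes as given). Then \(N\cdot {\text {Ker}}(f)={\text {Ker}}(f)\cap E[N]\) is cyclic of order N, and

$$\begin{aligned} \# C_1&=\# \big ( {\text {Ker}}(f)/N\cdot {\text {Ker}}(f)\big ) =N^2 /N=N, \qquad \# C_2 =\# \big ( E[N]/N\cdot {\text {Ker}}(f)\big ) =N^2 /N=N, \\ C_1 \cap C_2&=\big ( {\text {Ker}}(f)\cap E[N]\big ) /N\cdot {\text {Ker}}(f)=0. \end{aligned}$$

Both \(C_1\) and \(C_2\) are killed by N, so \(C_1 +C_2\) is a subgroup of \(E''[N]\) of order \(N^2\), hence equal to \(E''[N]\).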

The curve \(X_{\mathrm {s}}(N)\) is naturally a degree two cover of \(X_{\mathrm {s}}^+ (N)\), and there is an isomorphism \(X_{\mathrm {s}}^+ (N)\simeq X_0 ^+ (N^2 )\) compatible with (21).

4.1 Jacobians of modular curves and the asymptotics of the quadratic Chabauty condition

We recall a formula for the Picard numbers and ranks of modular Jacobians and their quotients, due to Siksek [59]. Let \({\mathcal {B}}_{N^k} \) denote a normalised eigenbasis for the space of newforms in \(S_2 (\varGamma _0 (N^k ))\). Let \({\mathcal {B}}_{N^k }/{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}\) denote a choice of representatives of the orbits of \({{\mathcal {B}}}_{N^k}\) under \({{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}\). We denote by \({\mathcal {B}}_{N^k }^+\) the subset of \({\mathcal {B}}_{N^k }\) with Atkin–Lehner eigenvalue 1 for \(w_{N^k }\). The Jacobians \(J_0 (N^k )^{\mathrm {new}}\) and \(J_0 ^+ (N^k )^{\mathrm {new}}\) admit \({\mathbb {Q}}\)-isogenies

$$\begin{aligned} J_0 (N^k )^{\mathrm {new}} \sim \prod _{f\in {\mathcal {B}} _{N^k} /{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}} A_f , \quad J_0 ^+ (N^k )^{\mathrm {new}} \sim \prod _{f\in {\mathcal {B}}^+ _{N^k} /{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}} A_f , \end{aligned}$$

where \(A_f \) denotes the \({\mathbb {Q}}\)-simple abelian variety associated to f by the Eichler–Shimura correspondence (which is independent of the choice of representative of the orbit). Because \(X_s^+(N)\) is isomorphic to \(X_0^+(N^2)\) as we have seen above,

$$\begin{aligned} J_s^+(N) \cong J_0 ^+ (N^2 ) \sim J_0 (N)\times J_0 ^+ (N^2 )^{\mathrm {new}} \end{aligned}$$

and by a theorem of Chen [16, Theorem 1], we also have a \({\mathbb {Q}}\)-isogeny

$$\begin{aligned} J_{{{\,\mathrm{ns}\,}}}^+ (N)\sim J_0 ^+ (N^2 )^{\mathrm {new}}. \end{aligned}$$
(22)
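
Comparing dimensions in the last two displayed isogenies gives a quick consistency check:

$$\begin{aligned} \dim J_{\mathrm {s}}^+ (N)=\dim J_0 (N)+\dim J_0 ^+ (N^2 )^{\mathrm {new}}=\dim J_0 (N)+\dim J_{{{\,\mathrm{ns}\,}}}^+ (N). \end{aligned}$$

For example, for \(N=13\) the curve \(X_0 (13)\) has genus 0, while \(X_{\mathrm {s}}^+ (13)\) and \(X_{{{\,\mathrm{ns}\,}}}^+ (13)\) both have genus 3, in agreement with \(3=0+3\).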

The following lemma says that one would not expect to be able to use Chabauty’s method to understand \(X({\mathbb {Q}})\).

Lemma 7

Let \(X=X_0 ^+ (N)\) or \(X_{{{\,\mathrm{ns}\,}}}^+ (N)\). Then the weak Birch–Swinnerton-Dyer conjecture implies \(X({\mathbb {Q}}_p )_1 =X({\mathbb {Q}}_p )\).

Proof

The weak Birch–Swinnerton-Dyer conjecture implies that, for \(f\in {\mathcal {B}}_{N^k}\), \(A_f\) will have positive rank whenever f has positive analytic rank. Since \(f\in {\mathcal {B}}_{N^k}\) has odd analytic rank whenever \(w_{N^k}(f)=1\), and \(A_f\) is simple over \({\mathbb {Q}}\), the Birch–Swinnerton-Dyer conjecture hence implies that every isogeny factor of \({\text {Jac}}(X)\) (over \({\mathbb {Q}}\)) has positive rank.

Since \({\text {End}}(A_f )\) is an order in the totally real field \(K_f\), every isogeny factor of \({\text {Jac}}(X)\) has rank at least equal to its dimension. To prove the lemma, we must show that the image of \(A_f ({\mathbb {Q}})\) in \(\mathrm {Lie}(A_f )_{{\mathbb {Q}}_p }\) under the p-adic logarithm map generates \(\mathrm {Lie}(A_f )_{{\mathbb {Q}}_p }\) as a \({\mathbb {Q}}_p \)-vector space. This is equivalent to the statement that the image of \(A_f ({\mathbb {Q}})\) in \(\mathrm {Lie}(A_f )_{\mathbb {C}_p }\) generates the latter as a \(\mathbb {C}_p \)-vector space. Since \(\mathrm {Lie}(A_f )_{{\overline{{\mathbb {Q}}}}}\) decomposes as a sum of one-dimensional isotypic components \(\mathrm {Lie}(A_f )_{{\overline{{\mathbb {Q}}}},g}\), for g conjugate to f, and the p-adic logarithm is \({\text {End}}(A_f )\)-equivariant, we deduce that if the image of \(A_f ({\mathbb {Q}})\) does not span \(\mathrm {Lie}(A_f )_{\mathbb {C}_p }\) then there is a g conjugate to f such that the image of \(A_f ({\mathbb {Q}})\) in \(\mathrm {Lie}(A_f )_{\mathbb {C}_p ,g}\) is zero. By the p-adic analytic subgroup theorem [49, Theorem 1], [25, Theorem 2.2], if \(P\in A_f ({\overline{{\mathbb {Q}}}} )\) has the property that \(\log (P)\in \mathrm {Lie}(A_f )_{\mathbb {C}_p }\) lies in a proper subspace defined over \({\overline{{\mathbb {Q}}}}\), then P lies in a proper commutative subvariety \(B\subset A_{f,{\overline{{\mathbb {Q}}}}}\). Hence we deduce that if \(A_f ({\mathbb {Q}})\) does not generate \(\mathrm {Lie}(A_f )_{\mathbb {Q}_p }\), then \(A_f ({\mathbb {Q}})\) lies in a proper commutative subvariety of \(A_{f,{\overline{{\mathbb {Q}}}}}\), since the isotypic components of \(\mathrm {Lie}(A_f )_{\mathbb {C}_p }\) are defined over \({\overline{{\mathbb {Q}}}}\).

We claim that this contradicts the Birch–Swinnerton-Dyer conjecture. More generally, if A is a simple abelian variety over \({\mathbb {Q}}\) and \(\pi :A_K \rightarrow B\) is a non-zero morphism of abelian varieties over a finite Galois extension \(K/{\mathbb {Q}}\), we claim that \(P\in A({\mathbb {Q}})\) is torsion if and only if its image in B(K) is torsion (in particular, when \(A=A_f \) and B is an isogeny factor, we deduce that \(A_f\) has rank zero over \({\mathbb {Q}}\) if and only if there is an isogeny factor B of \(A_{f,{\overline{{\mathbb {Q}}}}}\) such that the image of \(A_f ({\mathbb {Q}})\) in B is torsion). To see this claim, for \(\sigma \in {\text {Gal}}(K/{\mathbb {Q}})\) let \(\pi ^{\sigma }\) denote the conjugate homomorphism \(A_K \rightarrow B^{\sigma }\). If \(\pi (P)\) is torsion then \(\pi ^{\sigma }(P)=\pi (P)^{\sigma }\) is torsion for all \(\sigma \), hence the image of P under the map

$$\begin{aligned} \prod _{\sigma \in {\text {Gal}}(K|{\mathbb {Q}})}\pi ^{\sigma }:A_K \rightarrow \prod _{\sigma }B^{\sigma } \end{aligned}$$

is torsion. However, this map descends to a non-zero morphism defined over \({\mathbb {Q}}\), and hence by simplicity of A, if \(\pi (P)\) is torsion then P is torsion. \(\square \)

Moreover, two abelian varieties \(A_f\), \(A_g\) for \(f,g \in {{\mathcal {B}}}_{N^k}\) are non-isogenous unless f and g are conjugate by \({{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}\), and \({\text {End}}^{\dagger }(A_f)\) is always totally real of rank \(\dim (A_f )\), which proves that each of the Jacobians \(J = J_0^+(N),J_{\mathrm {s}}^+(N), J_\mathrm{{ns}}^+(N)\) satisfies \(\rho (J) = \dim J\), and hence the condition (2) becomes

$$\begin{aligned} {{\,\mathrm{rk}\,}}(J) < 2 \cdot \dim (J) - 1 \end{aligned}$$
(23)

(for a more general such condition for modular curves, see the main result of [59]). Using the isogenies above, the Birch–Swinnerton-Dyer conjecture implies

$$\begin{aligned} {{\,\mathrm{rk}\,}}(J_0 ^+ (N)) = \sum _{f \in {{\mathcal {B}}}_N^+} {\text {ord}}_{s=1} L(f,s), \quad {{\,\mathrm{rk}\,}}(J_\mathrm{{ns}}^+(N)) = \sum _{f \in {{\mathcal {B}}}_{N^2}^+} {\text {ord}}_{s=1} L(f,s). \end{aligned}$$

There is a whole literature on analytic estimates for these types of analytic ranks. In particular, using [45, Theorem 1.4] one can show that the Birch–Swinnerton-Dyer conjecture implies that

$$\begin{aligned} \limsup _{N} \frac{{{\,\mathrm{rk}\,}}(J_0^+(N))}{\dim J_0^+(N)} \le 1.3782, \end{aligned}$$

and in particular asymptotically that (2) is always satisfied. It is likely that the same result can be obtained for \(J_\mathrm{{ns}}^+(N)\), but the square level (we are looking at \(J_0^+(N^2)^\mathrm{{new}}\)) raises serious technical difficulties for analytic estimates of second moments used there.
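
To make the asymptotic claim concrete, write \(d=\dim J_0^+(N)\) and fix \(\varepsilon \) with \(0<\varepsilon \le 0.1\) (both pieces of notation are introduced for this check). Granting the limsup bound above, for all sufficiently large N we have \({{\,\mathrm{rk}\,}}(J_0^+(N))\le (1.3782+\varepsilon )\, d\), and

$$\begin{aligned} (1.3782+\varepsilon )\, d<2d-1 \iff d>\frac{1}{0.6218-\varepsilon }, \qquad \frac{1}{0.6218-\varepsilon }\le \frac{1}{0.5218}<2. \end{aligned}$$

So condition (23) holds for all sufficiently large N as soon as \(d\ge 2\), which is automatic for large N since \(\dim J_0^+(N)\) grows with N.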

On the other hand, by Corollary 4, Theorem 2 implies that we have an isogeny factor A of J satisfying \(\rho (A)>1\) and \({{\,\mathrm{rk}\,}}(A)=\dim (A)\), hence to prove Proposition 3 it suffices to construct a nonzero \([L] \in {\text {Ker}}({{\,\mathrm{NS}\,}}(A)\rightarrow {{\,\mathrm{NS}\,}}(X))\) satisfying \(\theta _{X,\pi _A ,\pi _B}([L])=0\), where B is the isogeny factor consisting of modular abelian varieties associated to modular forms whose L-functions have analytic rank greater than 1. It will be shown that for any L, its image \(\theta _{X,\pi _A ,\pi _B}(L)\) can be represented by a divisor supported on cusps and Heegner points, and hence is torsion by the generalised Gross–Zagier formula ([67, Theorem 6.1]). This motivates the following definition.

Definition 4

(Heegner quotient) Let \(M=N\) or \(N^2\). The Heegner quotient A of \(J_0(M)^\mathrm{{new}}\) is the product

$$\begin{aligned} A := \prod _{\begin{array}{c} f \in {{\mathcal {B}}}^{+,\mathrm {new}}_{M}/{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}\\ L'(f,1) \ne 0 \end{array}} A_f, \end{aligned}$$

and its complement is

$$\begin{aligned} B := \prod _{\begin{array}{c} f \in {{\mathcal {B}}}^{+,\mathrm {new}}_M/{{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}\\ L'(f,1) = 0 \end{array}} A_f \end{aligned}$$

(so that \(A \times B\) is isogenous to \(J_0^+(M)^\mathrm{{new}}\), not the full \(J_0(M)^\mathrm{{new}}\)).

In particular, Corollary 4 implies that \({{\,\mathrm{rk}\,}}(A) = \dim (A)\) (assuming the Birch–Swinnerton-Dyer conjecture, it is the largest factor of \(J_0^+(M)\) with this property) and the generalised Gross–Zagier formula implies that all images of traces of Heegner points on \(X_0 (N)\) in B are torsion (see Sects. 4.2 and 4.3). In the case of \(X_{{{\,\mathrm{ns}\,}}}(N)\), there is also a notion of Heegner point due to Kohen and Pacetti, inspired by the points used in Zhang’s Gross–Zagier formula for \(X_{{{\,\mathrm{ns}\,}}}(N)\) (and more general Shimura curves).

The main result of the next section is the following lemma, which refers to \(X_0 (N)\) and \(X_{{{\,\mathrm{ns}\,}}}(N)\) rather than their Atkin–Lehner quotients. However, by Lemma 6 it implies Proposition 3.

Lemma 8

Let \(X=X_0 (N)\) or \(X_{{{\,\mathrm{ns}\,}}}(N)\), and A, B the Heegner quotient and its complement as defined above, endowed with the natural projections \((\pi _A ,\pi _B ):{\text {Jac}}(X)\rightarrow A\times B.\) Then for all [L] in \({\text {Ker}}(d_{\pi _A })\), \( \theta _{X,\pi _A ,\pi _B }([L]) \) is torsion. In particular the rank of the kernel of \(\theta _{X,\pi _A,\pi _B}\) is maximal (hence at least 1 if \(\dim A \ge 2\)).

4.2 How to prove (C) using Heegner points under the analytic hypothesis: \(X=X_0 (N)\)

In this section we prove Lemma 8. We will deduce it from the Gross–Zagier–Zhang theorem. In the case of \(X_0 (N)\), as explained in [20] or [19], we could also deduce it from the Yuan–Zhang–Zhang formula for the height of diagonal cycles (see Sect. 4.4). By a Heegner point on \(X_0 (N)\) we will mean a point

$$\begin{aligned} E\rightarrow E' \end{aligned}$$

on \(Y_0 (N)\) such that E and \(E'\) have CM by the same order of an imaginary quadratic field K, not necessarily maximal but assumed to be with conductor prime to N (see [28] for a review of their properties, in particular N has to be split or ramified in K).

An eigenform \(f \in S_2 (\varGamma _0 (N))^{+,\text {new}}\) defines by Eichler-Shimura theory a \({\mathbb {Q}}\)-simple quotient \(\pi :J_0(N) \rightarrow A_f\) of \(J_0(N)\) (in fact of \(J_0 ^+ (N)\)) and the Heegner points behave on \(A_f\) in the following way.

Lemma 9

  1.

    If \(L'(f,1)\ne 0\), then \({{\,\mathrm{rk}\,}}(A_f)= \dim (A_f )\) (and \(A_f({\mathbb {Q}})\) is generated by the projection of a trace of a suitable choice of Heegner point).

  2.

    If \(L'(f,1)=0\), then for any P in \({\text {Div}}^0 (X_0(N))({\overline{{\mathbb {Q}}}} )^{{\text {Gal}}({\overline{{\mathbb {Q}}}}|{\mathbb {Q}})}\) supported on the set of Heegner points, the image \(\pi (P)\) is torsion in \(A_f ({\mathbb {Q}})\).

Remark 7

The original Gross–Zagier formula [32, Theorem I.6.3] is not sufficient for the second part of the Lemma, as it only deals with Heegner points for which the discriminant of the order is squarefree (in particular, the order is maximal) and prime to N, which we cannot afford to assume here. This is why we need Zhang’s formula and the ensuing technical interpretation.

Proof

The first part is given by Proposition 8. The second part is a consequence of the generalised Gross–Zagier formula of Zhang [67, Theorem 6.1] which for this case is made completely explicit in [15, Theorem 1.1], see also [15, Example after Theorem 1.5]. We use the following notation: \(f \in S_2(\varGamma _0(N))\) is a normalised eigenform, K an imaginary quadratic field in which N is not inert, c prime to N, \({{\mathcal {O}}}_c = {\mathbb {Z}}+ c {{\mathcal {O}}}_K\), and \(1_c\) the trivial ring class character on \({\text {Pic}}({{\mathcal {O}}}_c)\). We denote by \(H_c\) the ring class field of K with conductor c. If P is a Heegner point on \(X_0(N)\) with CM by \({{\mathcal {O}}}_c\), it belongs to \(X_0(N)(H_c)\), and we define

$$\begin{aligned} P_{1_c} = \sum _{\sigma \in {\text {Gal}}(H_c/K)} (P^\sigma -[\infty ]) \in J_0(N)(K) \subset J_0(N)(H_c). \end{aligned}$$

On the other hand, if \( J(H_c) \otimes {\mathbb {C}}\) denotes the extension of scalars of \(J(H_c )\) endowed with the extended Néron-Tate height, we have the decomposition into isotypical components

$$\begin{aligned} J_0(N)(H_c) \otimes {\mathbb {C}}= \bigoplus _{g} J_0(N)_{g}, \end{aligned}$$

where g goes through all eigenforms of weight 2 of \(J_0(N)\), so that \(J_0(N)_g\) is exactly the isotypical part where \(T_n\) acts by multiplication by \(a_n(g)\). We denote by \(P_{1_c}^f\) the projection of \(P_{1_c}\) on the f-isotypical component. The statement of [15, Theorem 1.1] then tells us (which is sufficient for our purposes) that \(L'(f,1_c,1)\) as defined there is proportional (by an explicit nonzero factor) to the extended Néron-Tate height of \(P_{1_c}^f\).

We have the equality of L-functions

$$\begin{aligned} L(f,1_K,s) = L\left( f,s\right) L\left( f \otimes \chi _K,s \right) , \end{aligned}$$

with \(1_K\) the trivial class character on \({\text {Pic}}({{\mathcal {O}}}_K)\) and \(\chi _K\) the Dirichlet character associated to K. In particular (and given the signs of functional equations on the right), our hypothesis \(L'(f,1)=0\) guarantees that \(L(f,1_K,s)\) vanishes with order at least 2 at 1, so the left-hand side of [15, Theorem 1.1] is zero for \(c=1\). This also holds for any c prime to N, because by construction \(L(f,1_{c},s)\) is a multiple of \(L(f,1_K,s)\) around 1 (given the definition again). We have thus proved that \(P_{1_c}^f\) is zero in \(J_0(N)(H_c) \otimes {\mathbb {C}}\).

Now, the group \({\text {Aut}}({\mathbb {C}})\) acts on \(J_0(N)(H_c) \otimes {\mathbb {C}}\) by the identity on the left and the natural action on the right, and for every \(\alpha \in {\text {Aut}}({\mathbb {C}})\) acting as such, we have \(P_{1_c}^\alpha = P_{1_c}\) and then for every \(\alpha \in {\text {Aut}}({\mathbb {C}})\), we obtain \((P_{1_c}^g)^{\alpha } = P_{1_c}^{\alpha (g)}\) where \(\alpha (g)\) is the eigenform obtained by conjugating the coefficients of g (see [32, Corollary V.1.2]). Now, as we also have the decomposition

$$\begin{aligned} J_0 (N)(H_c) \otimes {\mathbb {C}}\cong \prod _{f \in \mathcal {B}_N / {{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}} A_f(H_c) \otimes {\mathbb {C}} \end{aligned}$$

in subrepresentations of the Hecke algebra, the sum of all \(P_{1_c}^g\) for g conjugate to f is proportional to the projection \(\pi \) of the trace of \(P- (\infty )\) (belonging to \(J_0(N)(K)\)) in \(A_f(K) \otimes {\mathbb {C}}\), so we have proven that this projection in \(A_f(K)\) is torsion. \(\square \)

We now explain how to deduce Lemma 8 from this result. Let m be an integer coprime to N. Define the Hecke correspondence \({\widetilde{C}}_{m}\) to be the image of \(X_0 (mN)\) in \(X_0 (N)\times X_0 (N)\) under the product of the two natural maps \(X_0 (mN) \rightarrow X_0 (N)\). We define

$$\begin{aligned} C_{m}=(1-\pi _1 ^* i_1 ^* -\pi _2 ^* i_2 ^* ){\widetilde{C}}_{m} \end{aligned}$$

to be the projection of \({\widetilde{C}}_{m}\) onto the \({\text {End}}(J_0 (N))\) component of \({\text {Pic}}(X_0 (N)\times X_0 (N))\) (see (8)). Then \(C_{m}\) lands in the subspace \({{\,\mathrm{NS}\,}}(J_0(N)) \subset {\text {End}}(J_0 (N))\) of endomorphisms symmetric with respect to the Rosati involution. When m is square-free, \(C_{m}\) is the Hecke operator \(T_{m}\). In general, \(C_{m}\) is a linear combination of \(T_{m/d}\) for d divisors of m.

Recall that \(i_{1,2} :X_0 (N)\hookrightarrow X_0 (N)\times X_0 (N)\) denotes the diagonal morphism. A non-cuspidal point in the support of \(i_{1,2} ^* ({\widetilde{C}}_{m} )\) is a cyclic N-isogeny \(f:E_1 \rightarrow E_2 \), together with cyclic subgroups \(G_i\) of \(E_i \) of order m such that \(f(G_1 )=G_2 \), and isomorphisms

$$\begin{aligned} E_i {\mathop {\longrightarrow }\limits ^{\simeq }}E_i /G_i \end{aligned}$$

which commute with f and the induced isogeny \(E_1 /G_1 \rightarrow E_2 /G_2 \). In particular, the ring of endomorphisms of each \(E_i\), of discriminant denoted by \(D_i\), thus contains an element of norm m so there exist \(A_i ,B_i \) in \(\mathbb {Z}\) for which

$$\begin{aligned} A_i ^2 + D_i B_i ^2 =4m. \end{aligned}$$
(24)

The isogeny being cyclic, \(A_i\) and \(B_i\) must be coprime here. The point \(E_1 \rightarrow E_2 \) is a Heegner point of \(Y_0(N)\) if and only if \(D_1 =D_2 \).

Lemma 10

Let \(X=X_0 (N)\), let m be prime to N, and let \({\widetilde{C}}_m\) be the Hecke correspondence defined above. Then the divisor \(i_{1,2} ^* {\widetilde{C}}_{m}\) is supported on the set of Heegner points whenever m is less than \(N^2 /4\).

Proof

Let \((E_1 \rightarrow E_2 )\) be a non-cuspidal point in the support of \(i_{1,2}^* {\widetilde{C}}_m\) as above. Suppose the point is not Heegner. Since \(E_1 \) and \(E_2 \) are N-isogenous, \(D_2 = \lambda ^2 D_1\) for some rational \(\lambda >0\) a power of N. Since \(\lambda \ne 1\), we must have \(D_i\) divisible by \(N^2\) for some i, and hence \(m>N^2 /4\), by (24). Finally, if the conductor of the order was not prime to N, we would also have \(N^2 | D_i\) which leads to the same inequality. \(\square \)
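The inequality used in this proof can be checked by brute force. The following Python sketch is only an illustration and not part of the argument: it reads (24) with \(D_i\) taken as a positive integer (the absolute value of the discriminant) and assumes \(B_i \ge 1\); the function name `solutions_with_N2_dividing_D` is ours. It lists the solutions of (24) with \(N^2 | D\), which should not exist when \(m<N^2 /4\).

```python
import math


def solutions_with_N2_dividing_D(N, m):
    """Triples (A, B, D) with A >= 0, B >= 1, D > 0 a multiple of N^2,
    and A^2 + D * B^2 = 4m, i.e. solutions of (24) with N^2 dividing D.

    Assumption of this sketch: D stands for the absolute value of the
    discriminant, so that every term in (24) is non-negative.
    """
    solutions = []
    D = N * N
    while D <= 4 * m:
        B = 1
        while D * B * B <= 4 * m:
            rest = 4 * m - D * B * B
            A = math.isqrt(rest)
            if A * A == rest:
                solutions.append((A, B, D))
            B += 1
        D += N * N
    return solutions


if __name__ == "__main__":
    for N in (5, 7, 11, 13):
        # No solution can exist below the threshold N^2 / 4.
        small = [m for m in range(1, N * N) if 4 * m < N * N]
        assert all(not solutions_with_N2_dividing_D(N, m) for m in small)
    # Above the threshold solutions do appear, e.g. for N = 5 and m = 25.
    print(solutions_with_N2_dividing_D(5, 25))
```

The search is exhaustive because \(D B^2 \le 4m\) bounds both D and B, so the empty result for \(m<N^2 /4\) simply reflects \(4m \ge D \ge N^2\).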

By the following Lemma (essentially just the Sturm bound) we have enough Hecke operators \(C_m\) for which \(i_{1,2}^* C_m\) is supported on cusps and Heegner points to complete the proof of the first part of Lemma 8.

Lemma 11

Let N be a prime. Then, any element of \({\text {End}}^\dagger (J_0 ^+ (N))^{{{\,\mathrm{tr}\,}}=0}\), viewed as a subspace of \({\text {End}}^\dagger (J_0 (N))^{{{\,\mathrm{tr}\,}}=0}\), can be written as a \({\mathbb {Z}}\)-linear combination of endomorphisms associated to the Hecke correspondences \(C_m\), for \(m<N^2 /4\) prime to N.

Proof

By the Sturm bound ([62, Theorem 9.18]), the set of Hecke operators \(T_m\) for \(m<N^2 /4\) spans the Hecke algebra of endomorphisms of \(J_0(N)\). Since \(a_N (f)=-1\) on newforms such that \(f_{|w_N} = -f\), the set of Hecke operators \(T_m\) for \(m<N^2 /4\) prime to N spans the Hecke algebra of endomorphisms of \(J_0 ^+ (N)\) (which is the full endomorphism algebra over \({\mathbb {Q}}\)). \(\square \)

This completes the proof of case (1) of Proposition 3. Indeed, Lemma 11 implies that any nice correspondence Z on \(X_0 (N)\) can be written as a linear combination of the \(C_m \) for \(m<N^2 /4\) prime to N. By Lemma 10, for any such Z, \(D_Z (b)\) is supported on Heegner points and cusps, so by Lemma 9 (part 2), its image by \(\pi _B\) is torsion.

4.3 How to prove (C) using Heegner points under the analytic hypothesis: \(X=X_{{{\,\mathrm{ns}\,}}}^+ (N)\)

The second case is similar to the first, but we must replace the classical notion of Heegner point with Heegner points on non-split Cartan modular curves in the sense of Zhang/Kohen–Pacetti, and replace Gross–Zagier–Zhang on \(X_0 (N)\) with Zhang’s Gross–Zagier theorem on \(X_{{{\,\mathrm{ns}\,}}}(N)\).

To make results easier to state, we use the moduli interpretation of \(X_\mathrm{{ns}}(N)\) and \(X_\mathrm{{ns}}^+(N)\) given in [42] and its consequences. To do so, one fixes an \(\varepsilon \in \mathbb {F}_N\) which is not a square. A pair \((E,\phi _\varepsilon )\) is then an elliptic curve E together with an endomorphism \(\phi _\varepsilon \) of E[N] whose square is multiplication by \(\varepsilon \). Such an endomorphism has eigenvalues in \(\mathbb {F}_{N^2} \backslash \mathbb {F}_N\), and two pairs \((E,\phi _\varepsilon )\) and \((E',\phi _\varepsilon ')\) are isomorphic if there is an isomorphism \(\psi : E \rightarrow E'\) such that on E[N], \(\psi \circ \phi _\varepsilon = \phi _{\varepsilon }' \circ \psi \).

\(X_\mathrm{{ns}}(N)\) is the compactified moduli space of such pairs up to isomorphism [42, §1.2]. Furthermore, the natural involution on this modular curve is given by \((E,\phi _\varepsilon ) \mapsto (E, - \phi _\varepsilon )\).

First, we define Hecke correspondences \(\widetilde{C_m} \subset X_{{{\,\mathrm{ns}\,}}}(N)\times X_{{{\,\mathrm{ns}\,}}}(N)\) (for m prime to N) as follows. We have a curve \(X_{{{\,\mathrm{ns}\,}}}(N,m)=X_{{{\,\mathrm{ns}\,}}}(N)\times _{X(1)}X_0 (m)\) given by adding an auxiliary \(\varGamma _0(m)\) structure. We have two maps \(X_{{{\,\mathrm{ns}\,}}}(N,m)\rightarrow X_{{{\,\mathrm{ns}\,}}}(N)\), the forgetful one, and the one sending \((E, \phi _\varepsilon ,C)\) to \((E/C,\overline{\pi _C} \circ \phi _\varepsilon \circ \overline{\pi _C}^{-1})\) where C is a cyclic subgroup of order m, \(\pi _C : E \rightarrow E/C\) the natural projection, and \(\overline{\pi _C}\) the induced map \(E[N] \rightarrow (E/C)[N]\). Furthermore, Chen morphisms between \(J_\mathrm{{ns}}(N)\) and \(J_0(N^2)\) are equivariant with respect to the Hecke actions [42, Theorem 1.11].

We will again use the generalised Gross–Zagier formula of Zhang [67], in a slightly different context here. We follow the notation of [67, §6]. Let \(K/{\mathbb {Q}}\) be an imaginary quadratic field inert at N (instead of split or ramified in the previous case), and let \(K\hookrightarrow M_2({\mathbb {Q}})\) be an embedding associated to an integral basis of \({{\mathcal {O}}}_K\). For a choice of order \({{\mathcal {O}}}_c\) of K of conductor c prime to N, define

$$\begin{aligned} R_c={\mathcal {O}}_c +N\cdot M_2({\mathbb {Z}}) \end{aligned}$$

(notice the index of \(N {{\mathcal {O}}}_K\) is \(N^2\)). The Shimura variety \(M_{U_c}\) is then uniformised as

$$\begin{aligned} M_{U_c} (\mathbb {C})={\text {GL}}_2({\mathbb {Q}})_+ \backslash {\mathcal {H}} \times {\text {GL}}_2(\mathbb {A}_f )/U_c, \end{aligned}$$

where \(U_c\) can be defined as \({\text {GL}}_2({\mathbb {Z}}_v)\) for places v not dividing N, and \((R_c \otimes {\mathbb {Z}}_N)^* \subset {\text {GL}}_2({\mathbb {Z}}_N)\) at N. Note that \({\text {GL}}_2({\mathbb {Q}})_+ \cdot U_c ={\text {GL}}_2(\mathbb {A}_f )\) and \({\text {GL}}_2({\mathbb {Q}})_+ \cap U_c \subset {\text {SL}}_2({\mathbb {Z}})\) contains the subgroup \(\varGamma (N)\) of \({\text {SL}}_2({\mathbb {Z}})\) of all matrices congruent to the identity modulo N, and the quotient is a conjugate of \(C_{{{\,\mathrm{ns}\,}}}(N) \cap {\text {SL}}_2({\mathbb {Z}}/N{\mathbb {Z}})\), where the precise choice of \(C_\mathrm{{ns}}(N)\) comes from the reduction modulo N of \({{\mathcal {O}}}_c\) inside \(M_2({\mathbb {Z}}/N{\mathbb {Z}})\) given by the embedding (it is nonsplit precisely because N is inert in \({{\mathcal {O}}}_c\)). This gives an isomorphism

$$\begin{aligned} M_{U_c} (\mathbb {C})\simeq Y_{{{\,\mathrm{ns}\,}}}(N)_{\mathbb {C}}. \end{aligned}$$

The CM points on \(M_{U_c}\) in the sense of Zhang are then the double cosets of pairs \((h_0 ,i_c )\), where \(h_0 \) is fixed by the image T of the torus \(K^\times \), and \(i_c\) has the property that

$$\begin{aligned} i_c U_c i_c ^{-1}\cap T(\mathbb {A}_f )\simeq \widehat{{\mathcal {O}}}^\times _c /\widehat{{\mathcal {O}}}^\times _F , \end{aligned}$$

in other words the nonsplit Cartan structure of level N is the one determined by the endomorphism ring of the CM elliptic curve.

On the other hand, we say that \((E,\phi _\varepsilon ) \in Y_\mathrm{{ns}}(N)\) is a Heegner point (in the sense of Kohen–Pacetti) with multiplication by \({{\mathcal {O}}}_c\) if \({\text {End}}(E) \cong {{\mathcal {O}}}_c\) (with c prime to N) and \(\phi _\varepsilon \) comes from an endomorphism \(\beta \) of E. Note that this implies that N is inert in \({{\mathcal {O}}}_c\), since the minimal polynomial of \(\beta \) modulo N is then irreducible.

This discussion thus implies the following equivalence of definitions.

Lemma 12

Under the identification \(M_{U_c} \simeq Y_{{{\,\mathrm{ns}\,}}}(N)\) for every order \({{\mathcal {O}}}_c\) of conductor c prime to N, Zhang’s CM points correspond to Heegner points with CM by \({{\mathcal {O}}}_c\) in \(Y_\mathrm{{ns}}(N)\) in the sense of Kohen–Pacetti.

Let f be an eigenform in \(S_2 (\varGamma _0 (N^2 ))^{+,\mathrm {new}} \). It can be seen as an automorphic form on an \(M_{U_c}\) as above, using the isomorphism of Hecke modules \(S_2 (\varGamma _0 (N^2 ))^{+,\mathrm {new}} \cong S_2(\varGamma _\mathrm{{ns}}^+(N))\) and the isomorphism \(M_{U_c}({\mathbb {C}}) \cong Y_\mathrm{{ns}}(N)_{\mathbb {C}}\) and we again have by Eichler-Shimura theory a \({\mathbb {Q}}\)-simple quotient \(A_f\) of \(J_\mathrm{{ns}}^+(N)\).

The consequence of Zhang’s result that we will use is the following.

Theorem 3

([67], Theorem 6.1) With notation as above, let \(1_c\) be the trivial character of \({\text {Gal}}(H_c /K)\) and P a Heegner point on \(Y_\mathrm{{ns}}(N)\) with CM by \({{\mathcal {O}}}_c\) in the sense of Kohen-Pacetti. Denote by \(P_{1_c}\) the projection of \(P - \xi \) (\(\xi \) the Hodge class) in \(J_{{{\,\mathrm{ns}\,}}}(N)(K) = J_{{{\,\mathrm{ns}\,}}}(N)(H_c )^{1_c}\). Let \(P_{1_c}^f\) be the projection of \(P_{1_c}\) onto the f-isotypical component of \(J_{{{\,\mathrm{ns}\,}}}(N)(H_c )\otimes \mathbb {C}\).

If \(L'(f,1)=0\), then \(P_{1_c}^f=0\) and \(\pi _f (P_{1_c})\) is torsion in \(A_f(H_c)\).

Proof

Using the previous lemmas and discussion, we can translate everything in terms of the Shimura curve \(M_{U_c}\): the Heegner point P becomes a CM point in the sense of Zhang and f becomes an automorphic representation \(\phi \). These changes are compatible with Hecke operators and Galois actions, so they preserve the decompositions into isotypical components above. We can then proceed along the same lines as the proof of Lemma 9 part 2 to deduce the conclusion from Zhang’s theorem. \(\square \)

We are now ready to prove the analogue of Lemma 10 with \(X_0 (N)\) replaced by \(X_{{{\,\mathrm{ns}\,}}}^+(N)\).

Lemma 13

Let \(X=X_{{{\,\mathrm{ns}\,}}} (N)\), let m be prime to N, and let \({\widetilde{C}}_m\) be the Hecke correspondence defined above. Then the divisor \(i_{1,2} ^* {\widetilde{C}}_{m}\) is supported on Heegner points in the sense of Kohen-Pacetti and cusps whenever m is less than \(N^2 /4\).

Proof

By the moduli interpretation of \(X_\mathrm{{ns}}(N)\) and the Hecke correspondences, a noncuspidal point in the support of \(i_{1,2} ^* {\widetilde{C}}_{m}\) is a pair \((E,\phi _\varepsilon )\) such that there exists an endomorphism \(\alpha \) of E of norm m with cyclic kernel (of order m) such that if \({\overline{\alpha }}\) is the induced endomorphism of E[N], \({\overline{\alpha }} \circ \phi _\varepsilon \circ {\overline{\alpha }}^{-1} = \phi _\varepsilon \). This implies that \({\overline{\alpha }}\) belongs to the nonsplit Cartan subgroup associated to \(\phi _\varepsilon \) (which is also the group of invertible elements of \({\mathbb {Z}}[\phi _\varepsilon ]\)). We claim \({\overline{\alpha }}\) is not scalar: if it were, we could write \(\alpha = k + N \beta \) with \(k \in {\mathbb {Z}}\), \(\beta \in {\text {End}}(E)\), and then the norm of \(\alpha \) being \(m <N^2/4\) forces \(\beta \) to be an integer as well, contradicting the assumption that \(\alpha \) has cyclic kernel.

From this, we deduce that \({\mathbb {Z}}[{\overline{\alpha }}] = {\mathbb {Z}}[\phi _\varepsilon ]\), as both are \({\mathbb {Z}}/N{\mathbb {Z}}\)-vector spaces of dimension 2 and the former is included in the latter. This implies that \(\phi _\varepsilon \) is induced by the action of an element of \({\mathbb {Z}}[\alpha ] \subset {\text {End}}(E)\) on E[N], and the ring of endomorphisms has conductor prime to N for the same reasons as in \(X_0(N)\), and its discriminant is automatically prime to N as discussed after defining Heegner points in the sense of Kohen-Pacetti. \(\square \)

By the compatibility with Hecke correspondences on \(X_0 (N^2 )\) (which is a consequence of Chen’s theorem without quotient by Atkin-Lehner involutions, e.g. [60, Théorème 2]), Lemma 11 implies that any nice correspondence Z on \(X_\mathrm{{ns}}^+(N)\) can be written as a linear combination of \(C_m\) for \(m<N^2 /4\) prime to N. By Lemma 13, for any such Z, \(D_Z (b)\) is supported on Heegner points (in the sense of Kohen–Pacetti) and cusps. Hence, Zhang’s Gross–Zagier theorem (together with Manin–Drinfeld) implies \(\pi _B (D_Z (b))\) is torsion. Assuming the conclusions of Theorem 2 hold for M, the Heegner quotient A of \(J_0^+(M)^\mathrm{{new}}\) is of dimension at least 2 so \(\rho (A) \ge 2\). This completes the proof of case (2) of Proposition 3.

4.4 An alternative approach

In this subsection, we sketch an alternative and less ad hoc approach for proving Proposition 3 in the case \(X=X_0 ^+ (N)\), using the Theorem of Yuan–Zhang–Zhang on the heights of diagonal cycles.

Theorem 4

(Darmon–Rotger–Sols [19], Theorem 3.7) Let \(X=X_0 (N)\), and let f, g be non-conjugate eigenforms in \(S_2 (\varGamma _0 (N))\). Let \(Z\in {{\,\mathrm{NS}\,}}(J_0 (N))\) lie in the image of \({{\,\mathrm{NS}\,}}(A_g )\). Suppose \(\epsilon (f)=-1\) and \(\epsilon ({{\,\mathrm{Sym}\,}}^2 (g)\otimes f)=1\). If the projection of \(D_Z (b)\) to \(A_f\) is non-torsion, then \(L'(f,1)\ne 0\).

The result above holds for arbitrary N, but is most useful when N is prime, since in this case we have \(\epsilon (f\otimes g\otimes g)=-a_N (f)a_N (g)^2 =-a_N (f)\) (see e.g. [30]). Hence in this case Theorem 4 implies that the image of \(D_Z (b)\) in \(A_f\) is torsion for all eigenforms f in \(S_2 ^+ (\varGamma _0 (N))\), which gives an alternative proof for \(X_0 ^+ (N)\). One way to view Proposition 3 is that it shows that it is easier to prove diagonal cycles are torsion than it is to prove they are non-torsion. On the other hand, one can show directly that the image of \(D_Z (b)\) in \(A_f \) is torsion for all eigenforms f satisfying \(w_N (f)=-f\), as explained in [20, Theorem 3.3.8]: by Lemma 6, we have

$$\begin{aligned} w_{N}^* (D_Z (b))=D_{w_N ^* (Z)}(b). \end{aligned}$$

Since \(w_N ^* (Z)=Z\), and \(w_N ^* \) acts as (-1) on \(A_f\), we deduce \(\pi _{f *}(D_Z (b))\) is torsion.

5 Proof of the analytic part

In this section, we prove Theorem 2 using analytic weighted averages techniques, following guiding principles e.g. from [37] and [24]. For convenience and consistency, the notation below is as close as possible to that from [47].

Notation

  • N is a prime number and \(M = N\) or \(N^2\) in all of the following.

  • If \(f,g \in S_2(\varGamma _0(M))\), we denote their Petersson scalar product by

    $$\begin{aligned} \langle f,g \rangle _M = \int _{{\mathcal {D}}}\overline{f(x+iy)} g(x+iy) dx dy, \end{aligned}$$

    where \({{\mathcal {D}}}\) is a fundamental domain of \(\varGamma _0(M)\), and the associated Petersson norm by \(\Vert \cdot \Vert _M\).

  • For \(\varepsilon = \pm 1\), the space \(S_2(\varGamma _0(M))^\varepsilon \) refers to the subspace of modular forms f of \(S_2(\varGamma _0(M))\) such that \(f_{|w_M} = \varepsilon \cdot f\), where \(w_M\) is the Fricke involution of \(S_2(\varGamma _0(M))\). Note that in weight 2, this is the space of modular forms f such that \(L(f,s)\) has root number \(- \varepsilon \).

  • For A, B linear forms on \(S_2(\varGamma _0(M))\) (resp. on a subspace indicated by superscripts), we write

    $$\begin{aligned} \langle A,B \rangle _M = \sum _f \frac{\overline{A(f)} B(f)}{\Vert f\Vert _M^2}, \end{aligned}$$

    where f goes through an orthogonal basis of \(S_2(\varGamma _0(M))\) (it is readily checked not to depend on this choice of basis), resp. of the prescribed subspace. We will add superscripts \(\{+,-,\mathrm {new},\mathrm {old}\}\) to refer to the sum restricted to an orthogonal basis of the corresponding subspaces of \(S_2(\varGamma _0(M))\).

  • We denote by \(a_m\) (for \(m \in {\mathbb {N}}_{\ge 1}\)) and \(L'\) the linear forms on \(S_2(\varGamma _0(M))\) which to f associate respectively the m-th coefficient of the q-expansion of f, and \(L'(f,1)\) (defined properly in the next paragraph).

  • The (positive) greatest common divisors of integers a, b or integers a, b, c are respectively denoted by (a, b) and (a, b, c).

  • For any positive number B, \(O_1(B)\) refers to a complex number of absolute value \(\le B\).

The proof of Theorem 2 relies on the following lemma.

Lemma 14

Theorem 2 holds for M if

$$\begin{aligned} \langle a_1, L' \rangle _M^{+,\mathrm {new}} \ne 0 \quad \text {and} \quad \frac{\langle a_2, L' \rangle _M^{+,\mathrm {new}}}{\langle a_1,L' \rangle _M^{+, \mathrm {new}}} \in \, ]0,1[. \end{aligned}$$

Proof

If \(\langle a_1,L'\rangle _M^{+,\text {new}} \ne 0\), by definition of this sum, there must be at least one normalised newform \(f \in S_2(\varGamma _0(M))^{+,\text {new}}\) such that \(L'(f,1) \ne 0\). As a byproduct of the Gross–Zagier formula ([32], Corollary V.1.3), this implies that \(L'(g,1) \ne 0\) for all normalised newforms g which are conjugates of f by \({{\text {Gal}}({\overline{{\mathbb {Q}}}}/ {\mathbb {Q}})}\), thus Theorem 2 holds for M unless the field of coefficients of f is \({\mathbb {Q}}\) and this f is unique, which we assume now. As f is normalised, those coefficients are algebraic integers hence belong to \({\mathbb {Z}}\). Now, one has

$$\begin{aligned} \frac{\langle a_2, L' \rangle _M^{+,\text {new}}}{\langle a_1,L' \rangle _M^{+, \text {new}}} = \frac{\overline{a_2(f)} L'(f,1) / \Vert f\Vert _M^2}{\overline{a_1(f)} L'(f,1) / \Vert f\Vert _M^2} = a_2(f) \in \, ]0,1[ \end{aligned}$$

by hypothesis, so \(a_2(f) \notin {\mathbb {Z}}\) which leads to a contradiction and Theorem 2 holds. \(\square \)

Remark 8

The statement of this lemma appears quite ad hoc so let us explain the main motivations behind it.

  • As we will see later, as long as m is small compared to \(\sqrt{M}\), one has

    $$\begin{aligned} \frac{ \langle a_m,L' \rangle _{M}^{+,\text {new}}}{4 \pi } = \ln (\sqrt{M}) + C - \ln (m) + O(m/\sqrt{M}) \end{aligned}$$

    with explicit implied constants. This proves that the hypotheses of the lemma are indeed satisfied for large M.

  • The error terms of the estimate above are smaller when the m’s are smaller, hence the choices of \(m=1\) and 2 for the ratio.

  • There are far better asymptotic estimates on the number of newforms f in \(S_2(\varGamma _0(M))^{+,\text {new}}\) such that \(L'(f,1) \ne 0\): e.g. by [45] (at least for \(M=N\) prime), the proportion of such forms is asymptotically at least 7/8, so in particular there are far more than just 2 for M large. These techniques, using also estimates of second moments and of the norms \(\Vert f\Vert _M\), are harder to make explicit, and we suspect the effective bounds obtained by following the arguments step by step would be huge. Lemma 14, while very crude (and giving a weaker result), is tailor-made to be efficient enough for precise estimates and approachable bounds.

5.1 Splitting of the terms to estimate the first moments

The starting point to estimate the weighted averages \(\langle a_m, L'\rangle _N^\mathrm{{new}}\) is the following trace formula of Petersson adapted by Akbary (and proven in greater generality in [47]).

Proposition 4

Let m, n, M be three positive integers, and \(\varepsilon = \pm 1\). Then, we have

$$\begin{aligned} \frac{1}{2 \pi \sqrt{mn}} \langle a_m, a_n \rangle _{M}^\varepsilon= & {} \delta _{mn} - 2 \pi \sum _{\begin{array}{c} c>0 \\ M |c \end{array} } \frac{S(m,n ;c)}{c} J_1 \left( \frac{4 \pi \sqrt{mn}}{c} \right) \nonumber \\&- 2 \pi \varepsilon \sum _{\begin{array}{c} d > 0 \\ (d,M)= 1 \end{array}} \frac{ S(m,nM^{-1};d)}{d \sqrt{M}} J_1 \left( \frac{4 \pi \sqrt{mn}}{d \sqrt{M}} \right) , \end{aligned}$$
(25)

where S is the notation for Kloosterman sums

$$\begin{aligned} S(m,n;c) = \sum _{k \in ({\mathbb {Z}}/c {\mathbb {Z}})^*} e^{ 2 i \pi (m k + n k^{-1})/c} \end{aligned}$$

(except for \(c=1\) where its value is 1 by convention), \(M^{-1}\) means the inverse of M modulo d in the Kloosterman sums and \(J_1\) is the Bessel function of the first kind and order 1.

The sums on the right-hand side are absolutely convergent thanks to the following well-known uniform bounds: \(|J_1(x)| \le |x|/2\) for all x, and the Weil bounds

$$\begin{aligned} |S(m,n;c)| \le (m,n,c) ^{1/2} \tau (c) \sqrt{c}, \end{aligned}$$
(26)

with \(\tau \) the divisor-counting function, which improves, if M is a prime power dividing c, to

$$\begin{aligned} |S(m,n;c)| \le 2 (m,n,c) ^{1/2} \tau (c/M) \sqrt{c} \end{aligned}$$

( [36], (3.2), (3.3), Theorem 11.11 and Corollary 11.12).
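As a sanity check on the definitions above, one can evaluate Kloosterman sums directly and compare them with the Weil bound (26). The Python sketch below is purely illustrative: the function names are ours, the sums are computed naively in \(O(c)\) operations, and the test ranges are arbitrary small values.

```python
import cmath
import math


def kloosterman(m, n, c):
    """Naive evaluation of S(m, n; c); its value is real.

    For c = 1 we return 1, following the convention stated in the text.
    """
    if c == 1:
        return 1.0
    total = 0j
    for k in range(1, c):
        if math.gcd(k, c) == 1:
            k_inv = pow(k, -1, c)  # inverse of k modulo c (Python >= 3.8)
            total += cmath.exp(2j * math.pi * (m * k + n * k_inv) / c)
    return total.real


def divisor_count(c):
    """tau(c): number of positive divisors of c."""
    return sum(1 for d in range(1, c + 1) if c % d == 0)


def weil_bound(m, n, c):
    """Right-hand side of (26): (m, n, c)^(1/2) * tau(c) * sqrt(c)."""
    g = math.gcd(math.gcd(m, n), c)
    return math.sqrt(g) * divisor_count(c) * math.sqrt(c)


if __name__ == "__main__":
    for c in range(1, 31):
        for m in range(1, 6):
            for n in range(1, 6):
                assert abs(kloosterman(m, n, c)) <= weil_bound(m, n, c) + 1e-9
    print(kloosterman(1, 1, 3))
```

For the estimates of this section the sums are of course never computed; only the bound (26) and its refinement are used.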

Now, our normalisation of the L-function associated to a form \(f \in S_2(\varGamma _0(M))\) is given by

$$\begin{aligned} L(f,s) = \sum _{n=1}^{+ \infty } \frac{a_n(f)}{n^s}, \end{aligned}$$

and this L-series converges uniformly on any compact subset of \(\{{\text {Re}}(s)>2 \}\).

One can express \(L'(f,1)\) itself in terms of the Fourier coefficients of f in the following way.

Lemma 15

For any \(M \ge 1\) and any \(f \in S_2(\varGamma _0(M))^+\), one has

$$\begin{aligned} L'(f,1) = 2 \sum _{n=1}^{+ \infty } \frac{a_n(f)}{n} E_1 \left( \frac{2 \pi n}{\sqrt{M}} \right) \end{aligned}$$

where \(E_1\) is the exponential integral function, defined on \(]0,+\infty [\) by

$$\begin{aligned} E_1(y) = \int _y^{+ \infty } \frac{e^{-t}}{t}dt. \end{aligned}$$

Proof

We define the completed L-function \(\varLambda \) associated to L by

$$\begin{aligned} \varLambda (f,s) := \left( \frac{\sqrt{M}}{2 \pi } \right) ^{s} \varGamma (s) L(f,s). \end{aligned}$$
(27)

By standard arguments (e.g. [14], section 1.5), this function extends to a holomorphic function on \({\mathbb {C}}\) and satisfies the functional equation

$$\begin{aligned} \varLambda (f,2-s) = - \varLambda (f_{|w_M},s). \end{aligned}$$
(28)

The expression of \(L'(f,1)\) is then deduced from the functional equation of \(\varLambda \) by integration of residues on vertical axes and Mellin transform (see e.g. [36] (26.10) where the definition of L is translated by 1/2). \(\square \)
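The formula of Lemma 15 converges very quickly and is easy to test numerically. The sketch below is an illustration under assumptions that are not part of the text: we take for f the newform of level 37 attached to the elliptic curve \(y^2+y=x^3-x\), whose L-function has root number \(-1\), obtain its coefficients by naive point counting, and evaluate \(E_1\) by the classical continued fraction, valid here because all arguments \(2\pi n/\sqrt{37}\) exceed 1. All function names are ours.

```python
import math


def e1_continued_fraction(x, max_iter=500, tol=1e-15):
    """E_1(x) for x >= 1, via the continued fraction
    e^x E_1(x) = 1/(x+1 - 1/(x+3 - 4/(x+5 - ...))) (modified Lentz scheme)."""
    if x < 1.0:
        raise ValueError("this sketch only handles x >= 1")
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * math.exp(-x)


def count_points(p):
    """Number of projective points of y^2 + y = x^3 - x over F_p."""
    lhs = [0] * p
    for y in range(p):
        lhs[(y * y + y) % p] += 1
    return 1 + sum(lhs[(x ** 3 - x) % p] for x in range(p))


def hecke_coefficients(n_max, level=37):
    """a_1, ..., a_{n_max} of the assumed newform (index 0 is unused)."""
    a = [0] * (n_max + 1)
    a[1] = 1
    for n in range(2, n_max + 1):
        p = next(q for q in range(2, n + 1) if n % q == 0)  # smallest prime factor
        prime_power, cofactor = 1, n
        while cofactor % p == 0:
            cofactor //= p
            prime_power *= p
        if cofactor > 1:            # multiplicativity on coprime factors
            a[n] = a[prime_power] * a[cofactor]
        elif prime_power == p:      # prime: a_p = p + 1 - #E(F_p)
            a[n] = p + 1 - count_points(p)
        else:                       # prime power: Hecke recursion
            weight = 0 if level % p == 0 else p
            a[n] = a[p] * a[n // p] - weight * a[n // (p * p)]
    return a


def l_derivative_at_1(level, coeffs):
    """Truncation of 2 * sum_n a_n / n * E_1(2 pi n / sqrt(level))."""
    scale = 2.0 * math.pi / math.sqrt(level)
    return 2.0 * sum(coeffs[n] / n * e1_continued_fraction(scale * n)
                     for n in range(1, len(coeffs)))


if __name__ == "__main__":
    print(l_derivative_at_1(37, hecke_coefficients(60)))
```

Truncating at \(n=60\) is more than enough here, since \(E_1(y)\le e^{-y}/y\) makes the tail negligible in double precision.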

With this formula and by uniform convergence of the terms involved, we obtain:

$$\begin{aligned} \frac{\langle a_m, L' \rangle _{M}^+}{4 \pi } = E_1 \left( \frac{2 \pi m}{\sqrt{M}} \right) - 2 \pi \sqrt{m} \left( \sum _{M|c} \frac{{{\mathcal {S}}}(c)}{c} + \sum _{(d,M)=1} \frac{{{\mathcal {T}}}(d)}{d \sqrt{M}} \right) , \end{aligned}$$
(29)

where

$$\begin{aligned} {{\mathcal {S}}}(c) = \sum _{n=1}^{+ \infty } \frac{S(m,n;c)}{\sqrt{n}} J_1 \left( \frac{4 \pi \sqrt{mn}}{c} \right) E_1 \left( \frac{2 \pi n}{\sqrt{M}} \right) \end{aligned}$$
(30)

and

$$\begin{aligned} {{\mathcal {T}}}(d) = \sum _{n=1}^{+ \infty } \frac{S(m,nM^{-1};d)}{\sqrt{n}} J_1 \left( \frac{4 \pi \sqrt{mn}}{d \sqrt{M}} \right) E_1 \left( \frac{2 \pi n}{\sqrt{M}} \right) . \end{aligned}$$
(31)

The main term in (29) will be \(E_1(2 \pi m /\sqrt{M})\) as long as \(m \ll \sqrt{M}\).
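To get a feeling for the size of this main term, one can compare \(E_1(2\pi m/\sqrt{M})\) with its logarithmic approximation, using the classical expansion \(E_1(x)=-\gamma -\ln (x)-\sum _{k\ge 1}(-x)^k/(k\cdot k!)\). The Python sketch below is only illustrative: it concerns the main term alone (not the full average), the sample value of M is arbitrary, and the function names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329


def e1_series(x, terms=40):
    """E_1(x) from its power series; intended for small x > 0 only."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term = (-x)^k / k!
        total -= term / k
    return -EULER_GAMMA - math.log(x) + total


def main_term(m, M):
    """The term E_1(2 pi m / sqrt(M)) appearing in (29)."""
    return e1_series(2.0 * math.pi * m / math.sqrt(M))


def log_approximation(m, M):
    """ln(sqrt(M)) - ln(m) - gamma - ln(2 pi), the leading part of main_term."""
    return (math.log(math.sqrt(M)) - math.log(m)
            - EULER_GAMMA - math.log(2.0 * math.pi))


if __name__ == "__main__":
    M = 10007 ** 2  # arbitrary sample value
    for m in (1, 2):
        x = 2.0 * math.pi * m / math.sqrt(M)
        gap = main_term(m, M) - log_approximation(m, M)
        print(m, main_term(m, M), gap, x)  # the gap is of size about x
    print(main_term(2, M) / main_term(1, M))
```

The gap between the two quantities is \(x-x^2/4+\cdots \) with \(x=2\pi m/\sqrt{M}\), which is consistent with an error of order \(m/\sqrt{M}\).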

The trace formula does not separate the old and new spaces, which we need for \(M=N^2\). This is taken care of in the following lemma.

Lemma 16

For N prime and \(m \ge 1\) not divisible by N,

$$\begin{aligned} \langle a_m, L' \rangle _{N^2}^{+,\mathrm {new}} = \langle a_m, L' \rangle _{N^2}^+ - \frac{1}{N-1} \left( \langle a_m, L' \rangle _N^+ + \frac{\ln (N)}{2} \langle a_m, L \rangle _N^- \right) . \end{aligned}$$

Proof

By orthogonality of the new and old subspaces,

$$\begin{aligned} \langle a_m, L' \rangle _{N^2}^{+,\text {new}} = \langle a_m, L' \rangle _{N^2}^{+} - \langle a_m, L' \rangle _{N^2}^{+,\text {old}}. \end{aligned}$$

To prove the formula on the old part, we need to be a bit careful with the definitions of completed L-functions: although the definition of \(L(f,s)\) does not depend on the ambient space of modular forms, the definition of the completed L-function \(\varLambda (f,s)\) in (27) does. The degeneracy operators are denoted by \(A_n\) as in the original article [1]. Let

$$\begin{aligned} A_1 = I_2, \quad A_N = \begin{pmatrix} N &{} 0 \\ 0 &{} 1 \end{pmatrix}, \quad W_N = \begin{pmatrix} 0 &{} 1 \\ -N &{} 0 \end{pmatrix}, \quad W_{N^2} = \begin{pmatrix} 0 &{} 1 \\ -N^2 &{} 0 \end{pmatrix}. \end{aligned}$$

Notice that \((A_N W_{N^2} W_N^{-1})/N\) belongs to \(\varGamma _0(N)\), thus for \(f \in S_2(\varGamma _0(N))\) such that \(f_{|W_N} = \varepsilon _f \cdot f\), one has

$$\begin{aligned} (f_{|A_N})_{|W_{N^2}} = (f_{|W_N})_{|A_1} = \varepsilon _f \cdot f_{|A_1}, \end{aligned}$$
(32)

hence also

$$\begin{aligned} (f_{|A_1})_{|W_{N^2}} = \varepsilon _f \cdot f_{|A_N}. \end{aligned}$$
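The matrix identity behind (32) can be verified mechanically. The short Python sketch below is an illustration only (helper names are ours): it computes \((A_N W_{N^2} W_N^{-1})/N\) in exact rational arithmetic for a few primes N, and in this computation the result is the identity matrix, which indeed lies in \(\varGamma _0(N)\).

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse_2x2(X):
    """Inverse of an invertible 2x2 matrix with rational entries."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]


def normalised_product(N):
    """The matrix (A_N * W_{N^2} * W_N^{-1}) / N, with exact fractions."""
    F = Fraction
    A_N = [[F(N), F(0)], [F(0), F(1)]]
    W_N = [[F(0), F(1)], [F(-N), F(0)]]
    W_N2 = [[F(0), F(1)], [F(-N * N), F(0)]]
    product = mat_mul(mat_mul(A_N, W_N2), inverse_2x2(W_N))
    return [[entry / N for entry in row] for row in product]


if __name__ == "__main__":
    for N in (3, 5, 7, 11):
        print(N, normalised_product(N))
```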

Consequently, an orthogonal (see the computations of section 4 of [47] for example) basis of \(S_2(\varGamma _0(N^2))^{+,\text {old}}\) is given by the \(f_{|A_1} + (f_{|A_1})_{|W_{N^2}}\), where f goes through an eigenbasis of \(S_2(\varGamma _0(N))\). The aforementioned computations also prove with (32) that if \(f_{|W_N} = \varepsilon _f \cdot f\), then

$$\begin{aligned} \langle f_{|A_1} + (f_{|A_1})_{|W_{N^2}},f_{|A_1} + (f_{|A_1})_{|W_{N^2}} \rangle _{N^2} = 2(N-1) \langle f,f \rangle _N. \end{aligned}$$

If N does not divide m (so that \(a_m(f_{|A_N})=0\)), this implies that

$$\begin{aligned} \langle a_m, L' \rangle _{N^2}^{+,\text {old}} = \frac{1}{2(N-1)} \sum _f \overline{a_m(f)} L'( f_{|A_1} + (f_{|A_1})_{|W_{N^2}},1) \end{aligned}$$

where f goes through an orthonormal basis of \(S_2(\varGamma _0(N))\). Now, by the functional equation of \(\varLambda (f,s)\) in (28), \( \varLambda '(f_{|A_1},1) = \varLambda '((f_{|A_1})_{|W_{N^2}},1)\) but

$$\begin{aligned} \varLambda '(f_{|A_1},1)= & {} \frac{N}{2 \pi } (L'(f_{|A_1},1) + (\ln (N/2\pi ) + \gamma ) L(f,1)) \\ \varLambda '((f_{|A_1})_{|W_{N^2}},1)= & {} \frac{N}{2 \pi } (L'((f_{|A_1})_{|W_{N^2}},1) + (\ln (N/2\pi ) + \gamma ) \varepsilon _f L(f,1)). \end{aligned}$$

The first equality is a direct application of the definition of \(\varLambda \); the second one uses that \(L(f_{|A_N},1) = L(f,1)\) (easy to show from the integral formula for L(f, 1)) and the results above. Thus, to compute \(L'(f_{|A_1} + (f_{|A_1})_{|W_{N^2}},1)\), it is enough to know the sum of the two right-hand sides, which is the sum of the two left-hand sides, and these are equal to one another. Now, if \(\varepsilon _f=1\) then \(L(f,1)=0\) by the sign of the functional equation of \(\varLambda (f,s)\) (in level N here!), and if \(\varepsilon _f = -1\), \(\varLambda '(f,1) =0\). We thus obtain in this case

$$\begin{aligned} L'(f,1) = - (\ln (\sqrt{N}/(2 \pi )) + \gamma ) L(f,1), \end{aligned}$$

and get the lemma by summation over the forms f, gathered by sign of \(\varepsilon _f\). \(\square \)

5.2 First estimates

We recall that \(M=N\) or \(N^2\).

Lemma 17

Using the Weil bounds, we get for every c multiple of M and d prime to M:

$$\begin{aligned} |{{\mathcal {S}}}(c)| \le 2 \sqrt{mM} \tau (c/M) \frac{f((m,c))}{\sqrt{c}}, \quad |{{\mathcal {T}}}(d)| \le \tau (d) \sqrt{m} \frac{f((m,d))}{\sqrt{d}} \end{aligned}$$

where for every integer k, \(f(k) = \sum _{k'|k} \frac{1}{\sqrt{k'}}\). For \(m=2\) and c, d even, these estimates are improved to

$$\begin{aligned} |{{\mathcal {S}}}(c)| \le (\sqrt{2}+2) \frac{\sqrt{M} \tau (c/M)}{\sqrt{c}}, \quad |{{\mathcal {T}}}(d)| \le (1+1/\sqrt{2}) \frac{\tau (d)}{\sqrt{d}}. \end{aligned}$$
(33)

Proof

In the definitions of \({{\mathcal {S}}}(c)\) (and similarly for \({{\mathcal {T}}}(d)\)), we separate the terms in n depending on the values of \((m,n,c) = m'\), which is a divisor of (m, c). Then, using \(|J_1(x)| \le |x|/2\), it only remains to control the sum of the \(E_1(2 \pi m'n/\sqrt{M})\) for n from 1 to \(+ \infty \), which after sum-integral comparison and a change of variables is smaller than \(\sqrt{M}/(2\pi m')\).

In the specific case where \(m=2\) and c or d even, the case distinction is made from the beginning on the values of \((m,n,c)^{1/2}\) instead of bounding by \((m,c)^{1/2}\), and a careful computation gives those bounds. \(\square \)

This allows us to bound the sum of the \({{\mathcal {S}}}(c)/c\) for all multiples c of M. By multiplicativity of \(\tau \),

$$\begin{aligned} \left| \sum _{M|c} \frac{{{\mathcal {S}}}(c)}{c} \right|\le & {} \frac{2 \sqrt{m}}{M} \sum _{m'|m} \frac{f(m') \tau (m')}{(m')^{3/2}} \sum _{c=1}^{+ \infty } \frac{\tau (c)}{c^{3/2}} \\\le & {} \frac{2 \sqrt{m}}{M} \sum _{m'|m} \frac{\tau (m')}{m'} \sum _{c=1}^{+ \infty } \frac{\tau (c) }{c^{3/2}}, \end{aligned}$$

the sum on c being exactly \(\zeta (3/2)^2\). We denote

$$\begin{aligned} g(m) = \sum _{m'|m} \frac{f(m') \tau (m')}{(m')^{3/2}} \end{aligned}$$

hence (and similarly for \({{\mathcal {T}}}\)):

$$\begin{aligned} 2 \pi \sqrt{m} \left| \sum _{M|c} \frac{{{\mathcal {S}}}(c)}{c} \right| \le \frac{86 m}{M} g(m), \quad 2 \pi \sqrt{m} \left| \sum _{(d,M)=1} \frac{{{\mathcal {T}}}(d)}{d\sqrt{M}} \right| \le \frac{43 m}{\sqrt{M}} g(m) \end{aligned}$$
(34)

which gives

$$\begin{aligned} \frac{\langle a_m, L' \rangle _M^+}{4 \pi } = E_1 (2 \pi m / \sqrt{M}) + g(m) m \left( O_1 \left( \frac{86}{M} \right) + O_1 \left( \frac{43 }{\sqrt{M}} \right) \right) . \end{aligned}$$
(35)
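The constants 86 and 43 in (34) can be traced back to \(\zeta (3/2)^2\): multiplying the bound on the sum of the \({{\mathcal {S}}}(c)/c\) by \(2\pi \sqrt{m}\) gives the factor \(4\pi \zeta (3/2)^2\), and the analogous factor for \({{\mathcal {T}}}\) is half of it. The following Python sketch is only an illustration of this arithmetic (the function names `tau`, `f`, `g` and `zeta` are ours, and the zeta value is approximated by a truncated Euler-Maclaurin formula); it also evaluates g(m) directly from its definition.

```python
import math

def tau(k):
    """Number of positive divisors of k."""
    return sum(1 for d in range(1, k + 1) if k % d == 0)

def f(k):
    """f(k) = sum over divisors k' of k of 1/sqrt(k'), as in Lemma 17."""
    return sum(1 / math.sqrt(d) for d in range(1, k + 1) if k % d == 0)

def g(m):
    """g(m) = sum over divisors m' of m of f(m') * tau(m') / m'^(3/2)."""
    return sum(f(d) * tau(d) / d ** 1.5 for d in range(1, m + 1) if m % d == 0)

def zeta(s, cutoff=1000):
    """Approximate zeta(s) for real s > 1: partial sum plus Euler-Maclaurin tail."""
    head = sum(n ** -s for n in range(1, cutoff))
    tail = (cutoff ** (1 - s) / (s - 1)
            + cutoff ** -s / 2
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

# 2*pi*sqrt(m) * (2*sqrt(m)/M) * zeta(3/2)^2 = constant_S * m / M
constant_S = 4 * math.pi * zeta(1.5) ** 2
# The bound for T(d) in Lemma 17 lacks the factor 2, hence half the constant.
constant_T = constant_S / 2

print(constant_S, constant_T, g(1), g(2))
```

Rounding both constants up to the next integer recovers the values used in (34).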

For \(m=2\), the previous refinements can be exploited and we get instead

$$\begin{aligned} 2 \pi \sqrt{2} \left| \sum _{M|c} \frac{{{\mathcal {S}}}(c)}{c} \right| \le \frac{213}{M},\quad 2 \pi \sqrt{2} \left| \sum _{(d,M)=1} \frac{{{\mathcal {T}}}(d)}{d \sqrt{M}} \right| \le \frac{97}{\sqrt{M}} \end{aligned}$$

hence

$$\begin{aligned} \frac{\langle a_2, L' \rangle _M^+}{4 \pi } = E_1 (4 \pi / \sqrt{M}) + O_1 \left( \frac{213}{M} \right) + O_1 \left( \frac{97}{\sqrt{M}} \right) . \end{aligned}$$
(36)

Identical bounds are found for

$$\begin{aligned} {{\mathcal {S}}}_0(c)= & {} \sum _{n=1}^{+ \infty } \frac{S(m,n;c)}{\sqrt{n}} J_1 \left( \frac{4 \pi \sqrt{mn}}{c} \right) \exp \left( - \frac{2 \pi n}{\sqrt{M}} \right) \\ {{\mathcal {T}}}_0(d)= & {} \sum _{n=1}^{+ \infty } \frac{S(m,nM^{-1};d)}{\sqrt{n}} J_1 \left( \frac{4 \pi \sqrt{mn}}{d\sqrt{M}} \right) \exp \left( - \frac{2 \pi n}{\sqrt{M}} \right) \end{aligned}$$

as the integral of \(e^{-t}\) on \([0,+\infty [\) is equal to 1 like the one of \(E_1\). Thus, by similar computations,

$$\begin{aligned} \frac{\langle a_m,L \rangle _N^-}{4 \pi } = e^{ - 2 \pi m/\sqrt{N}} + m g(m) \left( O_1 \left( \frac{86}{N} + \frac{43}{\sqrt{N}} \right) \right) . \end{aligned}$$

Gathering those bounds, we get for all m prime to N,

$$\begin{aligned} \frac{\langle a_m,L'\rangle _{N^2}^{+,\text {new}}}{4 \pi }= & {} E_1\left( \frac{2 \pi m}{N} \right) - \frac{E_1 \left( \frac{2 \pi m}{\sqrt{N}} \right) }{N-1} - \frac{\ln (N) e^{- 2 \pi m/\sqrt{N}}}{2(N-1)} \end{aligned}$$
(37)
$$\begin{aligned}&+ m g(m) O_1 \left( \frac{86}{N^2} + \frac{43}{N} + \frac{\ln (N)/2+1}{N-1} \left( \frac{86}{N} + \frac{43}{\sqrt{N}} \right) \right) \end{aligned}$$
(38)

and slightly better ones for \(m=2\) coming from refinements above (it suffices to replace 86mg(m) by 213 and 43mg(m) by 97 above).

By computations in Sage, we deduce the following first estimates.

Proposition 5

With the previous estimates, one finds

$$\begin{aligned} \begin{array}{rcl|rcl} \langle a_1, L' \rangle _{N}^+> 0 &{} \text {for} &{} N \ge 1213 &{} \langle a_1, L' \rangle _{N^2}^{+,\mathrm {new}}> 0 &{} \text {for} &{} N \ge 47 \\ \langle a_2, L' \rangle _{N}^+>0 &{} \text {for} &{} N \ge 5437 &{} \langle a_2, L' \rangle _{N^2}^{+,\mathrm {new}} > 0 &{} \text {for} &{} N \ge 97 \\ \frac{\langle a_2, L' \rangle _{N}^+}{\langle a_1, L' \rangle _{N}^+} \in \, ]0,1[ &{} \mathrm {for} &{} N \ge 45341 &{} \frac{\langle a_2, L' \rangle _{N^2}^{+,\mathrm {new}}}{\langle a_1, L' \rangle _{N^2}^{+,\mathrm {new}}} \in \, ]0,1[ &{} \text {for} &{} N \ge 269. \end{array} \end{aligned}$$

Hence, Lemma 14 applies and Theorem 2 is true for \(N \ge 45341\) for \(X_0 ^+ (N)\) and for \(N \ge 269\) for \(X_\mathrm{{ns}}^+(N)\).
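To illustrate how the estimate (37)–(38) translates into a numerical check, here is a minimal Python sketch (not the Sage code used for the proposition). It reads \(O_1(x)\) as a quantity of absolute value at most x, takes \(m=1\) with \(g(1)=1\) by default, and evaluates \(E_1\) by its power series; the names `exp_int_e1` and `new_part_lower_bound` are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_int_e1(x, terms=60):
    """E_1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!).

    The series is only numerically reliable for moderate x (say 0 < x <= 5).
    """
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term is now (-x)^k / k!
        total -= term / k
    return total

def new_part_lower_bound(N, m=1, g_m=1.0):
    """Main term of (37) minus the error term of (38), for <a_m, L'>/(4 pi)."""
    x_small = 2 * math.pi * m / N
    x_large = 2 * math.pi * m / math.sqrt(N)
    main = (exp_int_e1(x_small)
            - exp_int_e1(x_large) / (N - 1)
            - math.log(N) * math.exp(-x_large) / (2 * (N - 1)))
    error = m * g_m * (86 / N ** 2 + 43 / N
                       + (math.log(N) / 2 + 1) / (N - 1)
                       * (86 / N + 43 / math.sqrt(N)))
    return main - error

for N in (43, 47, 53):
    print(N, new_part_lower_bound(N))
```

With these conventions the lower bound is negative at \(N=43\) and positive at \(N=47\), in line with the first entry of the right-hand column of the table.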

For \(M=N\), the estimates of \(\langle a_m,L'\rangle _N\) are readily obtained, but the slowness of convergence is much more visible. This is mainly due to the fact that the error term is in \(m/\sqrt{N}\) instead of m/N.
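The slow convergence at prime level can be seen numerically. The sketch below, written in Python under the same conventions as (35) and (36) (with \(O_1(x)\) read as a quantity of absolute value at most x, and \(g(1)=1\)), evaluates the resulting lower bound for \(\langle a_m, L' \rangle _N^+/(4\pi )\) with \(m=1\) or 2. The helper names are hypothetical and this is not the computation carried out in Sage.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_int_e1(x, terms=60):
    """E_1(x) by its power series; intended for moderate x (0 < x <= 5)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term is now (-x)^k / k!
        total -= term / k
    return total

def prime_level_lower_bound(N, m):
    """E_1(2 pi m / sqrt(N)) minus the error terms of (35) (m=1) or (36) (m=2)."""
    if m == 1:
        over_N, over_sqrt_N = 86, 43      # constants of (35), with g(1) = 1
    elif m == 2:
        over_N, over_sqrt_N = 213, 97     # refined constants of (36)
    else:
        raise ValueError("only m = 1 or m = 2 is handled in this sketch")
    x = 2 * math.pi * m / math.sqrt(N)
    return exp_int_e1(x) - over_N / N - over_sqrt_N / math.sqrt(N)

for N, m in ((1201, 1), (1213, 1), (5437, 2)):
    print(N, m, prime_level_lower_bound(N, m))
```

The main term grows only like \(\ln (\sqrt{N})\) while the error decays like \(1/\sqrt{N}\), which is why the sign changes so late: the bound for \(m=1\) is still negative at the prime 1201 and becomes positive at 1213.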

5.3 Improving the estimates for prime level

To bring the threshold \(N \ge 45341\) down to a range where all remaining primes can be checked by a different method, one needs to improve upon the worst error term appearing in \(\langle a_m,L' \rangle _N^+\), which is in \(m/\sqrt{N}\) and comes from the estimates of \({{\mathcal {T}}}(d)\) after looking at (33).

The following arguments rely on cancellations of Kloosterman sums not exploited by the Weil bounds. For \(d=1\), the Kloosterman sum is always 1 (see the convention) so this case has to be dealt with separately. A careful analysis proves that

$$\begin{aligned} 0.4 \sqrt{m} \le {{\mathcal {T}}}(1) \le \sqrt{m}, \end{aligned}$$

which will slightly improve the bounds later.

Assume now that \(d \ge 2\). The main term contributing to the bound is \(E_1(2\pi n/\sqrt{N})\), hence we write

$$\begin{aligned} {{\mathcal {T}}}(d) = {{\mathcal {T}}}_M(d) + {{\mathcal {T}}}_R(d), \end{aligned}$$

where \({{\mathcal {T}}}_M(d)\) is the sum of terms for which \(n \le 3 \sqrt{N}/\pi \) and \({{\mathcal {T}}}_R(d)\) is the remainder.

By the Weil bounds, using the fact that the integral of \(E_1\) on \([5,+\infty [\) is less than \(10^{-4}\), we obtain

$$\begin{aligned} 2 \pi \sqrt{m} \sum _{d \ge 2} \left| \frac{{{\mathcal {T}}}_R(d)}{d \sqrt{N}} \right| \le 10^{-4} \frac{\lambda _m}{\sqrt{N}} \end{aligned}$$

where \(\lambda _m = 43\) for \(m=1\) and 97 for \(m=2\) as before, so this contribution will be very small. For \({{\mathcal {T}}}_M(d)\), we will exploit Pólya–Vinogradov-type estimates ([46], Lemma 5.9).

Proposition 6

For every \(d>1\), every k invertible modulo d and every \(m,K,K' \in {\mathbb {N}}\),

$$\begin{aligned} \left| \sum _{n=K}^{K'} S(m,nk;d) \right| \le \frac{4d}{\pi ^2} (\log (d) + 1.5). \end{aligned}$$

Now, assume \(N \ge 1000\), so that for \(m=1\) or 2 and \(n \le 5 \sqrt{N}/(2 \pi )\), \(4 \pi \sqrt{mn}/(d \sqrt{N}) \le 1.5\). This implies that in the considered range for n, the function \(t \mapsto J_1(4 \pi \sqrt{mt}/(d \sqrt{N})) \cdot E_1(2 \pi t/\sqrt{N})/\sqrt{t}\) is decreasing and positive (as the product of two such functions). Its total variation on \([1,5 \sqrt{N}/(2 \pi )]\) is then bounded by its first value (itself controlled by \(E_1(2 \pi /\sqrt{N})/2\)).
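The bound on the argument of \(J_1\) is easy to confirm numerically. The following Python sketch (the function name is ours) evaluates the argument at the largest admissible n; the worst case is \(m=2\), \(d=2\), \(N=1000\), since the quantity decreases in d and in N.

```python
import math

def max_bessel_argument(N, m, d):
    """Value of 4*pi*sqrt(m*n)/(d*sqrt(N)) at n = 5*sqrt(N)/(2*pi)."""
    n_max = 5 * math.sqrt(N) / (2 * math.pi)
    return 4 * math.pi * math.sqrt(m * n_max) / (d * math.sqrt(N))

worst = max_bessel_argument(1000, 2, 2)
print(worst)
```

The worst-case value is about 1.41, below the threshold 1.5 required above.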

By Abel transform and the previous proposition, we thus obtain

$$\begin{aligned} |{{\mathcal {T}}}_{M}(d)| \le \frac{8}{\pi } \frac{\sqrt{m}}{\sqrt{N}} (\log (d)+1.5) E_1 \left( \frac{2 \pi }{\sqrt{N}} \right) . \end{aligned}$$

Compared to the Weil bounds in Lemma 17, the new bound is better roughly for \(d \le f(N)= \lfloor N/(2.5^2 E_1(2\pi /\sqrt{N})^2) \rfloor \). We then obtain

$$\begin{aligned} 2 \pi \sqrt{m} \left| \sum _{d=2}^{f(N)} \frac{{{\mathcal {T}}}_{M}(d)}{d \sqrt{N}} \right|\le & {} \frac{16 m}{N} E_1 \left( \frac{2 \pi }{\sqrt{N}} \right) \sum _{d=2}^{f(N)} \frac{\log (d)+1.5}{d} \\\le & {} \frac{8m}{N} E_1 \left( \frac{2 \pi }{\sqrt{N}} \right) \left( \log (f(N))^2 + 3 \log (f(N)) + 1 \right) \end{aligned}$$

by Lemma 5.11 of [46]. By the Weil bounds and the same lemma, for \(m=1\),

$$\begin{aligned} 2 \pi \left| \sum _{d=f(N)+1}^{+ \infty } \frac{{{\mathcal {T}}}_{M}(d)}{d \sqrt{N}} \right| \le \frac{4 \pi }{\sqrt{N f(N)}} (\log (f(N))+4) \end{aligned}$$
(39)

and for \(m=2\),

$$\begin{aligned} 2 \pi \sqrt{2} \left| \sum _{d=f(N)+1}^{+ \infty } \frac{{{\mathcal {T}}}_{M}(d)}{d \sqrt{N}} \right| \le \frac{8 \pi (2-1/\sqrt{2})}{\sqrt{N f(N)}} (\log (f(N))+4). \end{aligned}$$
(40)
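As a sanity check on the cutoff f(N) and on the passage from the sum over \(2 \le d \le f(N)\) to the closed expression in \(\log (f(N))\), one can compare both sides numerically. The Python sketch below does this for a few sample values of N; it is an illustration only, the helper names are ours, and \(E_1\) is evaluated by its power series.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_int_e1(x, terms=60):
    """E_1(x) by its power series; intended for moderate x (0 < x <= 5)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term is now (-x)^k / k!
        total -= term / k
    return total

def cutoff(N):
    """f(N) = floor(N / (2.5^2 * E_1(2 pi / sqrt(N))^2))."""
    e1 = exp_int_e1(2 * math.pi / math.sqrt(N))
    return math.floor(N / (2.5 ** 2 * e1 ** 2))

def log_sum(F):
    """Sum of (log(d) + 1.5)/d for 2 <= d <= F."""
    return sum((math.log(d) + 1.5) / d for d in range(2, F + 1))

def closed_bound(F):
    """(log(F)^2 + 3 log(F) + 1)/2, i.e. the factor turning 16m/N into 8m/N."""
    L = math.log(F)
    return (L * L + 3 * L + 1) / 2

for N in (1000, 8641, 45341):
    F = cutoff(N)
    print(N, F, log_sum(F), closed_bound(F))
```

In each sampled case the explicit sum stays below the closed bound, as the inequality requires.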

Combining these arguments, we get, for \(N \ge 1000\),

$$\begin{aligned} \frac{\langle a_1,L' \rangle _N^{+}}{4 \pi } \ge E_1 \left( \frac{2 \pi }{\sqrt{N}} \right) - \frac{6.3}{\sqrt{N}} - \frac{86}{N} - 2 \pi \left| \sum _{d=2}^{+ \infty } \frac{{{\mathcal {T}}}_M(d)}{d \sqrt{N}} \right| \end{aligned}$$

and

$$\begin{aligned} \frac{\langle a_2,L' \rangle _N^{+}}{4 \pi } \ge E_1 \left( \frac{4 \pi }{\sqrt{N}} \right) - \frac{6.3\sqrt{2}}{\sqrt{N}} - \frac{213}{N} - 2 \pi \sqrt{2} \left| \sum _{d=2}^{+ \infty } \frac{{{\mathcal {T}}}_M(d)}{d \sqrt{N}} \right| \end{aligned}$$

and finally

$$\begin{aligned} \langle a_1,L' \rangle _N^{+} >0 \quad \text {and} \quad \frac{\langle a_2,L' \rangle _N^{+}}{\langle a_1,L' \rangle _N^{+}} \in \, ]0,1[ \end{aligned}$$

for \(N \ge 8641\), which is much more reasonable than 45341.

The same improvements for the bounds apply exactly for \(M=N^2 \ge 1000\), thus allowing us to replace the estimate in 43/N in (37)–(38) by the same expressions as above with f(M) instead of f(N).

One gets that \(\langle a_2,L' \rangle _{N^2}^{+,\mathrm {new}} >0\) for \(N \ge 71\) instead of 97, and that

$$\begin{aligned} \frac{\langle a_2,L' \rangle _{N^2}^{+,\mathrm {new}}}{\langle a_1,L' \rangle _{N^2}^{+,\mathrm {new}}} \in \, ]0,1[ \end{aligned}$$

for \(N \ge 151\).

We now discuss how to deal with the remaining cases, namely those for which \(N \le 8641\) and \(g(X_0^+(N)) \ge 2\), and those for which \(N \le 151\) and \(g(X_{\text {ns}}^+(N)) \ge 2\).

The most natural approach is the following: for any small N, compute a basis of eigenforms for \(S_2(\varGamma _0(M))^{+,\text {new}}\), and for every f (normalised) in this basis, compute \(L'(f,1)\) up to sufficient precision to ensure that \(L'(f,1) \ne 0\).

Recall that by ([32], Corollary V.1.3), if \(L'(f,1) \ne 0\) under the same assumptions, the same is true for the Galois conjugate eigenforms, so only one check needs to be performed for the Galois orbit. Theorem 2 requires exactly that the sum of sizes of those Galois orbits is at least 2, so we only need to check that for two Galois orbits of size 1 (or one of size at least 2), one has \(L'(f,1) \ne 0\).

We have performed these verifications in MAGMA, and obtained the following.

\(\bullet \) For any prime \(N \le 2000\) such that \(X_0 ^+ (N)\) is of genus at least two, there are at least two distinct normalised newforms such that \(L'(f,1) \ne 0\), hence Theorem 2 holds. In fact, we have also checked that for all such N, \(L'(f,1) \ne 0\) for all the eigenforms in \(S_2(\varGamma _0(N))^{+}\), therefore by Proposition 8, \({{\,\mathrm{rank}\,}}J_0^+(N) ({\mathbb {Q}}) = \dim J_0^+(N)\) unconditionally for all those small primes.

\(\bullet \) Similarly, for any prime \(N \le 53\) such that \(X_\mathrm{{ns}}^+(N)\) is of genus at least two, \(L'(f,1) \ne 0\) for all the eigenforms in \(S_2(\varGamma _0(N^2))^{+,\text {new}}\), therefore by the same arguments, \({{\,\mathrm{rank}\,}}{\text {Jac}} (X_\mathrm{{ns}}^+(N))({\mathbb {Q}}) = \dim {\text {Jac}} (X_\mathrm{{ns}}^+(N))\) for all those small primes.

Unfortunately, these algorithms require explicit embeddings of the fields of coefficients \(K_f\) of f into \({\mathbb {C}}\), which makes them very slow when N becomes larger than 2000 (then, the degree of \(K_f\) can be larger than 100). We thus could not complete the argument using this method alone. Let us explain how to deal with the intermediate range \(N \in [2000,9000]\) for \(X_0^+(N)\) and \(N \in [59,151]\) for \(X_\mathrm{{ns}}^+(N)\).

The idea is to look at the simple quotients of the two relevant Jacobians which are elliptic curves. If there are none, then since we have proved that \(\langle a_1,L' \rangle _M^{+,\text {new}} \ne 0\) in this range, there must be an f such that \(L'(f,1) \ne 0\), and it generates a simple quotient of dimension at least 2 by hypothesis, so we are done.

Now, if there are elliptic curves among these quotients, it is sufficient to find two of them of rank 1 for the same reasons. Quotients of \(J_0(M)^{+,\text {new}}\) of dimension 1 are in one-to-one correspondence with isogeny classes of elliptic curves of conductor M and root number \(-1\) (the fact that this correspondence is surjective is a consequence of Cremona’s tables in this range but also a particular case of modularity theorems).

One can thus eliminate all levels N except the ones for which there exists exactly one (up to isogeny) elliptic curve E of analytic rank 1 and conductor N. Using Cremona’s tables, we obtain a list of respectively 70 (\(M=N\)) and 7 (\(M=N^2\)) possible exceptions, namely N in \(\{61,67,73,101,109,113\}\) for the latter.

Now, we use a last argument: if the modular form \(f_E\) associated to E is really the only one such that \(L'(f,1) \ne 0\) in the space, one should have

$$\begin{aligned} \langle a_1,L' \rangle _{M}^{+,\text {new}} = \frac{L'(E,1)}{\Vert f_E\Vert ^2} \end{aligned}$$

(the fact that this equality holds without a normalisation factor comes from the Manin constant being equal to 1 here, which is true in this range by results of Cremona).

Now, the left-hand side is larger than 4/5 for \(M=N\), \(N \ge 2000\), and larger than 1/2 for \(M=N^2\), \(N \ge 53\), by the (optimised) lower bounds given above, and the right-hand side is computable in terms of periods of E. This idea turns out to eliminate all remaining possible exceptions in both cases of M, which concludes the proof.

Remark 9

In some sense, this heuristic is natural: all terms in the sum defined by \(\langle a_1,L' \rangle _{M}^{+,\text {new}}\) are positive (another consequence of the Gross–Zagier formula), hence there is no cancellation among those, and the idea is that one of them alone cannot be enough to approach the estimates given for the sum.