For over two decades now much has been written on the study of what happens when a fixed algebraic variety sitting inside a fixed commutative group variety is intersected with the union of group subvarieties of suitable dimension. When the group variety is the multiplicative group \(\mathbf{G}_{\mathrm{m}}^n\), we may refer to the work of Bombieri, Zannier and the second author (for example the early paper [8] on curves, our later paper [11] on varieties of codimension 2, and our paper [12] on planes) and the wide-ranging extension of Habegger to arbitrary varieties (see [31] for example). When the group variety is projectively complete there are the results of Viada about powers of a fixed elliptic curve (see [58] for example) as well as those of Rémond generalizing to abelian varieties (see [54] for example); see especially the paper [32] of Habegger and Pila. There are also investigations of Zannier and the second author inside varying group varieties such as elliptic and abelian schemes (see [45,46,47] for example). All this work on “unlikely intersections” takes place over zero characteristic, and one may consult the book [59] of Zannier for a comprehensive survey. The general conjectures are due to Zilber [60] and Pink [51].

Over positive characteristic it is well-known that related simpler problems, such as those associated with the names Manin-Mumford about torsion points, can become false. For example over zero characteristic the equation

$$\begin{aligned} x+y=1 \end{aligned}\tag{1}$$

has only two solutions in roots of unity x and y (involving primitive sixth roots). However over characteristic p there are infinitely many; indeed we can take any \(x \ne 0,1\) in the algebraic closure \(\overline{\mathbf{F}_p}\) and then y accordingly.
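The last remark can be illustrated by a minimal Python sketch; the prime \(p=7\) and the function names are illustrative choices made here, not part of the argument. Already inside \(\mathbf{F}_p\) every non-zero element has multiplicative order dividing \(p-1\), so each \(x \ne 0,1\) gives a solution of \(x+y=1\) in roots of unity.

```python
def multiplicative_order(x, p):
    """Order of x in the multiplicative group of F_p (x not divisible by p)."""
    order, power = 1, x % p
    while power != 1:
        power = (power * x) % p
        order += 1
    return order


def unit_solutions(p):
    """Pairs (x, y) in F_p with x + y = 1 and x not equal to 0 or 1."""
    return [(x, (1 - x) % p) for x in range(2, p)]


p = 7  # illustrative prime, chosen only for this demonstration
pairs = unit_solutions(p)
for x, y in pairs:
    # Both coordinates are non-zero, hence roots of unity of order dividing p - 1.
    assert (p - 1) % multiplicative_order(x, p) == 0
    assert (p - 1) % multiplicative_order(y, p) == 0
print(len(pairs))  # number of solutions found inside F_p itself
```

Passing to larger finite fields inside \(\overline{\mathbf{F}_p}\) only increases the count, which is the source of the infinitely many solutions.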

Another special kind of unlikely intersection occurs when we intersect the variety with a finitely generated group, an area often associated with the names Mordell-Lang. For example over zero characteristic we can ask for solutions of (1) with x a power of 3 and y a power of \(-2\), amounting essentially to the equation \(3^a-2^b=1\). This has for centuries been known to have only two solutions in integers \(a,b\). However over characteristic p inside the function field \(\mathbf{F}_p(t)\), with x a power of t and y a power of \(1-t\), we have infinitely many solutions

$$\begin{aligned} x=t^q,~~~y=(1-t)^q=1-t^q~~~~(q=1,p,p^2,\ldots ). \end{aligned}$$
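The identity \((1-t)^q=1-t^q\) behind these solutions can be checked by direct expansion. The following is a small Python sketch with polynomials in \(\mathbf{F}_p[t]\) stored as coefficient lists; the representation and the function names are assumptions made for the illustration.

```python
def poly_mul(a, b, p):
    """Product in F_p[t]; polynomials are coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] = (out[i + j] + u * v) % p
    return out


def power_identity_holds(p, q):
    """Check whether (1 - t)^q equals 1 - t^q in F_p[t]."""
    lhs = [1]
    for _ in range(q):
        lhs = poly_mul(lhs, [1, (-1) % p], p)
    rhs = [1] + [0] * (q - 1) + [(-1) % p]
    return lhs == rhs


p = 3
print([power_identity_holds(p, p ** k) for k in range(3)])  # q = 1, p, p^2
print(power_identity_holds(p, 2))                           # q = 2 is not a power of p
```

The second call shows that the identity is special to powers of p: for \(p=3\) and \(q=2\) the cross term \(-2t\) survives.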

For much more see for example the papers [33] of Hrushovski and [49] of Moosa and Scanlon.

And the torsion situation can be combined with the finitely generated situation by allowing finite rank; under this heading see for example the papers [29] of Ghioca and Moosa and [26] of Ghioca.

The second author [43] made a start on Zilber-Pink problems over positive characteristic, formulating a conjecture for curves in \(\mathbf{G}_{\mathrm{m}}^n\) and proving it for \(\mathbf{G}_{\mathrm{m}}^3\).

Then in [15] we continued the study of such problems, but now for the additive group \(\mathbf{G}_{\mathrm{a}}^n\). Over zero characteristic the naive conjectures for \(\mathbf{G}_{\mathrm{a}}^n\) become false, because they implicitly involve group subvarieties (of codimension 2), and there are simply far too many of these. For example the union of all of codimension 1 (and even of codimension \(n-1\)) is the whole \(\mathbf{G}_{\mathrm{a}}^n\).

Over positive characteristic it is well-known that problems of Manin-Mumford or Mordell-Lang type can be formulated for \(\mathbf{G}_{\mathrm{a}}^n\) by imposing some extra structure. One immediately thinks of Drinfeld modules (on which the literature is already substantial); but there is an easier way using Frobenius (also see [26], in particular Theorem 2.6 p.3841). It is these “Frobenius modules” or “F-modules” that we recently studied in [15].

In the present paper we go in the direction of Drinfeld, but we restrict ourselves to the simplest and most attractive forerunner, the Carlitz module.

To fix ideas, let us first review the situation for the multiplicative \(\mathbf{G}_{\mathrm{m}}^n\) over zero characteristic. The decisive result was obtained by Maurin [48] (see [7] also), and, taking into account [13], we now know the following best possible result.

Theorem A

Let K be an algebraically closed field of characteristic 0, and let C in \(\mathbf{G}_{\mathrm{m}}^n\) be an irreducible curve defined over K. Assume for any non-zero \((r_1, \ldots , r_n)\) in \(\mathbf{Z}^n\) that the monomial \(x_1^{r_1}\cdots x_n^{r_n}\) is not identically 1 on C. Then there are at most finitely many \((\xi _1,\ldots ,\xi _n)\) in C(K) for which there exist linearly independent \((a_1,\ldots ,a_n),(b_1,\ldots ,b_n)\) in \(\mathbf{Z}^n\) such that

$$\begin{aligned} \xi _1^{a_1}\cdots \xi _n^{a_n}=\xi _1^{b_1}\cdots \xi _n^{b_n}=1. \end{aligned}$$

It was already pointed out in [43] (p.506) that the naive analogue of this over positive characteristic is false, in that a (stronger) hypothesis about two monomials, not one, is needed. There is an exactly analogous situation in [15] for \(\mathbf{G}_{\mathrm{a}}^n\) with Frobenius structure associated with \(x^p\).

Our Carlitz structure is associated with \(x^p+tx\) instead, and it may be found surprising that such a small change makes the situation revert back to that of the original multiplicative result in Theorem A above, with only one monomial.

Let us now recall this \(\mathbf{G}_{\mathrm{a}}^n\) with Carlitz structure.

We use a distinguished parameter t.

Write \({{{\mathcal {C}}}}=t{{{\mathcal {F}}}}^0+{{{\mathcal {F}}}}^1\) where \({{{\mathcal {F}}}}^r\) is the Frobenius taking x to \(x^{p^r}\). Thus \({{{\mathcal {C}}}}(x)=tx+x^p\); or we shall usually write just \({{{\mathcal {C}}}}x\). We have the non-twisted \({{{\mathcal {R}}}}=\mathbf{F}_p[{{{\mathcal {C}}}}]\) inside the twisted \({{{\mathcal {K}}}}\{{{{\mathcal {F}}}}^1\}\), where \({{\mathcal {K}}}\) is any field of characteristic p. Of course \({{{\mathcal {K}}}}\{{{{\mathcal {F}}}}^1\}\) acts on \(\mathbf{G}_{\mathrm{a}}\) by

$$\begin{aligned} \alpha x=a_0x+a_1x^p+a_2x^{p^2}+\cdots \end{aligned}$$

for \(\alpha =a_0{{{\mathcal {F}}}}^0+a_1{{{\mathcal {F}}}}^1+a_2{{{\mathcal {F}}}}^2+\cdots \) in \({{{\mathcal {K}}}}\{{{{\mathcal {F}}}}^1\}\). Here we have used the same juxtaposition notation for the module action, as in \(\alpha x\), and the field action, as in ax. In general throughout this paper the first will be used mainly with Greek letter coefficients, and the second mainly with Roman letter coefficients; and the two actions will rarely be side-by-side.
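To make the action concrete, here is a minimal Python sketch restricted to the case where the coefficients \(a_i\) and the element x all lie in \(\mathbf{F}_p[t]\) (an assumption made only for the illustration; the list representation and the names `act`, `carlitz`, `frobenius` are ours). It evaluates \({{{\mathcal {C}}}}x=tx+x^p\) and shows why the ring is called twisted: applying \({{{\mathcal {F}}}}^1\) after multiplication by t is not the same as multiplying by t after \({{{\mathcal {F}}}}^1\).

```python
def trim(a):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a


def add(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(u + v) % p for u, v in zip(a, b)])


def mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] = (out[i + j] + u * v) % p
    return trim(out)


def power(a, e, p):
    result = [1]
    for _ in range(e):
        result = mul(result, a, p)
    return result


def act(alpha, x, p):
    """alpha = [a_0, a_1, ...] with a_i in F_p[t]; returns sum_i a_i * x^(p^i)."""
    total = []
    for i, a_i in enumerate(alpha):
        total = add(total, mul(a_i, power(x, p ** i, p), p), p)
    return total


p = 3
t = [0, 1]
carlitz = [t, [1]]      # C = t F^0 + F^1
frobenius = [[], [1]]   # F^1 alone
x = [1, 1]              # the element 1 + t

print(act(carlitz, x, p))  # coefficients of t(1+t) + (1+t)^3, lowest degree first
# Twisting: F^1 applied to t*x gives t^p x^p, whereas t times F^1 x gives t x^p.
print(act(frobenius, mul(t, x, p), p) != mul(t, act(frobenius, x, p), p))
```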

There is an action on \(\mathbf{G}_{\mathrm{a}}^n\) by \(\alpha (x_1,\ldots ,x_n)=(\alpha x_1,\ldots ,\alpha x_n)\). Any algebraic subgroup of \(\mathbf{G}_{\mathrm{a}}^n\) that is an \({{{\mathcal {R}}}}\)-module is then defined by several equations of the form

$$\begin{aligned} \alpha _1x_1+\cdots +\alpha _nx_n=0 \end{aligned}$$

where \(\alpha _1,\ldots ,\alpha _n\) are in \({{\mathcal {R}}}\). The codimension is the rank of the various \((\alpha _1,\ldots ,\alpha _n)\) in \({{{\mathcal {R}}}}^n\). We believe in the following version of Theorem A.


Conjecture

Let K be an algebraically closed field containing \(\mathbf{F}_p(t)\), and let C in \(\mathbf{G}_{\mathrm{a}}^n\) be an irreducible curve defined over K. Assume for any non-zero \((\rho _1, \ldots , \rho _n)\) in \({{{\mathcal {R}}}}^n\) that the form \(\rho _1x_1+\cdots +\rho _nx_n\) is not identically zero on C. Then there are at most finitely many \((\xi _1,\ldots ,\xi _n)\) in C(K) for which there exist linearly independent \((\alpha _1,\ldots ,\alpha _n),(\beta _1,\ldots ,\beta _n)\) in \({{{\mathcal {R}}}}^n\) such that

$$\begin{aligned} \alpha _1\xi _1+\cdots +\alpha _n\xi _n=\beta _1\xi _1+\cdots +\beta _n\xi _n=0. \end{aligned}$$

The case \(n=1\) is empty.

The case \(n=2\) amounts to an analogue of Manin-Mumford. It was proved in the general context of Drinfeld modules by Scanlon [55], using techniques from model theory. Here we will sketch a more elementary method in the Carlitz context.

Already the case \(n=3\), going beyond torsion, is in the sense of Zilber-Pink. The main result of this paper is a proof for \(n=3\).

It is possible that the cases \(n=4,5\) can be handled by adapting the methods of [10] and the results of Amoroso and David [1].

But for \(n\ge 6\) quite different methods will probably be needed, maybe following [7, 48] or [13].

Also for general n it may well be possible to prove a weaker form of the Conjecture under the stronger hypothesis that \(\rho _1x_1+\cdots +\rho _nx_n\) is not identically constant on C (the analogue of the hypothesis in [8] for zero characteristic \(\mathbf{G}_{\mathrm{m}}^n\)).

Anyway, we shall prove


Theorem

Let K be an algebraically closed field containing \(\mathbf{F}_p(t)\), and let C in \(\mathbf{G}_{\mathrm{a}}^3\) be an irreducible curve defined over K. Assume for any non-zero \((\rho _1,\rho _2,\rho _3)\) in \({{{\mathcal {R}}}}^3\) that the form \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is not identically zero on C. Then there are at most finitely many \((\xi _1,\xi _2,\xi _3)\) in C(K) for which there exist linearly independent \((\alpha _1,\alpha _2,\alpha _3),(\beta _1,\beta _2,\beta _3)\) in \({{{\mathcal {R}}}}^3\) such that

$$\begin{aligned} \alpha _1\xi _1+\alpha _2\xi _2+\alpha _3\xi _3=\beta _1\xi _1+\beta _2\xi _2+\beta _3\xi _3=0. \end{aligned}$$

In fact Theorem A above for \(n=3\) and \(K=\overline{\mathbf{Q}}\) was first proved in [8]. There the concept of height was unavoidable, and we needed also results on upper bounds as well as considerably deeper results on lower bounds.

By contrast the proofs in [43] and [15] do not use heights at all.

It turns out that the proof of our Theorem above follows much more closely [8] and in particular we need heights \(h(\xi )\) on \(\overline{\mathbf{F}_p(t)}\) (see later).

Of course the condition of algebraic closure can be omitted in all the above statements, but its retention is meant to emphasize that we are considering points of unbounded degree (over \(\mathbf{F}_p(t)\) for example).

We will prove the following upper bound, a Carlitz analogue of Theorem 1 of [8] (p.1120).

Proposition 1

For \(K=\overline{\mathbf{F}_p(t)}\) let C be an irreducible curve in \(\mathbf{G}_{\mathrm{a}}^n\) defined over K. Assume for any non-zero \((\rho _1,\ldots ,\rho _n)\) in \({{{\mathcal {R}}}}^n\) that the form \(\rho _1x_1+\cdots +\rho _nx_n\) is not identically constant on C. Then there is \({{\mathfrak {B}}}\) such that

$$\begin{aligned} h(\xi _1)+\cdots +h(\xi _n)\le {\mathfrak {B}} \end{aligned}$$

for all \((\xi _1,\ldots ,\xi _n)\) on C(K) for which there exists non-zero \((\alpha _1,\ldots ,\alpha _n)\) in \({{{\mathcal {R}}}}^n\) with

$$\begin{aligned} \alpha _1\xi _1+\cdots +\alpha _n\xi _n=0. \end{aligned}$$

For the lower bound we have to go beyond [8] with a Carlitz analogue of the Néron-Tate height on an elliptic curve. This was constructed by Denis [21] (even for Drinfeld modules), and we shall denote it by \({\hat{h}}(\xi ) \ge 0\) for \(\xi \) in \(\overline{\mathbf{F}_p(t)}\) (see later). It is well-known that \({\hat{h}}(\zeta ) = 0\) if and only if \(\zeta \) is torsion in the sense of Carlitz (see later). Then \(\mathbf{F}_p(t,\zeta )\) is a cyclotomic (see later) extension of \(\mathbf{F}_p(t)\), and every cyclotomic extension \(F_{\mathrm{c}}\) has this form (see later). They are separable (see later). We have to consider extensions of \(F_{\mathrm{c}}\) that are not necessarily separable. Thus we will prove the following lower bound, where from now on we write \(F_0=\mathbf{F}_p(t)\).

Proposition 2

There is a positive constant c depending only on p with the following property. Let \(F_{\mathrm{c}}\) be a finite cyclotomic extension of \(F_0\) and let F be an extension of \(F_{\mathrm{c}}\) of degree d. Then for any non-torsion \(\xi \) in F we have

$$\begin{aligned} {\hat{h}}(\xi ) \ge c^{-1}d^{-1}{(\log {16D})^{-3} \over (\log \log {16D})^{-2}} \end{aligned}$$

where \(D=[F:F_0]\).

In fact the lower bound can be multiplied by

$$\begin{aligned} \min \left\{ {(\log {16D})^3 \over \log \log {16D}},q^2\log \log {16D}, q^2\sqrt{d}\right\} \ge 1, \end{aligned}$$

where q is the inseparable degree of F over \(F_{\mathrm{c}}\); but this seems such a tiny improvement that we did not bother to include the details.

In the case \(F_{\mathrm{c}}=F_0\) (so no cyclotomy) a very slightly stronger result with \((\log \log {16D})^{-3}\) in place of \((\log \log {16D})^{-2}\) was proved by Denis [21] as Théorème 2 (p.218); but only for extensions which are regular (apparently not essential) and separable - a genuine restriction. This restriction was lifted by Demangos [20], even for a class of Drinfeld modules including Carlitz. The lower bound of his Theorem 2 (p.153) involves an extra negative power of the inseparable degree of F over \(F_0\). But it has the advantage that all the constants appearing are explicitly calculated. See also Bosser and Galateau [14] for several simplifications and improvements, especially Theorem 1.8 (p.168).

In the case \(F=F_{\mathrm{c}}\) (so only cyclotomy) David and Pacheco [19] have shown in Théorème 1.0.1 (p.1046) that in fact \({\hat{h}}(\xi ) \ge c^{-1}\) (and even the generalizations to abelian extensions and Drinfeld modules). See also Bauchère [4] for further generalizations.

We now describe our proofs.

That of our Conjecture for \(n=2\) follows one of the classical proofs over \(\overline{\mathbf{Q}}\).

For \(n=3\), the proof of our Theorem for \(K=\overline{F_0}\) follows the general strategy of [8], using height upper and lower bounds.

We prove Proposition 1 by adopting the slightly simplified exposition in [42].

As for Proposition 2, it is an analogue of a result of Amoroso and Zannier [2] over \(\overline{\mathbf{Q}}\) (they actually treated abelian extensions). We have at our disposal the proof in [21] for \(F_{\mathrm{c}}=F_0\); this is the natural analogue of the classical result of Dobrowolski [23]. But in fact this proof in [21] resembles much more Laurent’s analogue [37] for elliptic curves with complex multiplication. It was Ratazzi [53] who extended Laurent’s result to abelian extensions. Meanwhile Pontreau [52] gave a simpler proof of the result of [2] restricted to cyclotomic extensions (whose discriminants are known), and it is this that we adapt here to the Carlitz context. However our choice of parameters in the auxiliary polynomial is rather different from his; for example we differentiate about d times and he only about \(\log d\) times.

In much of the early work it was possible in analogues of Proposition 2 to restrict to \(\xi \) that are integral in some sense. But already this made some trouble in the elliptic case [37] and in the original Carlitz work [21]. Here we are obliged to distinguish between valuations of small and large ramification and exploit the known ramification properties of cyclotomic extensions (this argument may fail for abelian extensions).

We have mentioned that over positive characteristic there may well be problems with inseparability. In principle this causes trouble here, especially in the use of an analogue of Siegel’s Lemma. We overcome this by adapting a version due to Thunder [57]. This version also involves the genus of the function field \(F_{\mathrm{c}}\) analogous to well-known results of Bombieri and Vaaler [6] involving the discriminant; for us it is fortunate that the genus of \(F_{\mathrm{c}}\) is also known (just as for cyclotomic extensions of \(\mathbf{Q}\)).

On the other hand inseparability can be an advantage. For example Theorem 6.6 (p.64) of Ghioca [25] about Drinfeld modules implies that \({\hat{h}}(\xi ) \ge p^{-19}\) for any non-torsion \(\xi \) in any purely inseparable extension of \(F_0\). So there is no dependence at all on field degrees. In fact here the only possible torsion \(\xi \) are 0 when \(p>2\) and \(0,1,t,t+1\) when \(p=2\) (see later).

In the earlier work [43] and [15] over positive characteristic it was relatively easy to extend the results from \(\overline{F_0}\) to general K by means of transcendence degree arguments. This does not seem possible here. Instead we follow a specialization strategy of Bombieri, Zannier and the second author in [9]. But the additive situation diverges somewhat from the multiplicative situation and there are several new elements. For example the zeroes and poles of \(x_1^{a_1}x_2^{a_2}x_3^{a_3}\) are on an equal footing, but this is not true of the Carlitz analogue \(f=\alpha _1x_1+\alpha _2x_2+\alpha _3x_3\), whose poles are clear but whose zeroes are far from evident. In [9] we used Mason’s abc inequality to settle similar problems. We do not yet have a Carlitz abc but we can exploit the underlying differentiating idea by noting that the derivative of \(tx+x^p\) is just t. Using certain systems of identities we can in this way show that the degree of f does not drop too much under specialization. Also in [9] we made an innocent-looking appeal to a result in Mumford [50] for counting inverse images of algebraic maps. This result used the concept of topologically unibranch. In the literature we could not find a suitable positive characteristic analogue and so we developed our own substitute.

The rest of our paper is arranged as follows.

In Sect. 2 we consider the case \(n=2\) of the Conjecture. After warming-up with a simple explicit example, we turn to the general proof; but in view of previous work we feel justified with just a sketch. It also follows from our Theorem by “lifting” the curve in \(\mathbf{G}_{\mathrm{a}}^2\) to \(\mathbf{G}_{\mathrm{a}}^3\) by introducing a sufficiently general constant value of \(x_3\).

Then in Sect. 3 we prove Proposition 1, also after giving an explicit example.

We postpone the comparatively technical proof of Proposition 2, and in Sect. 4 we show that Propositions 1 and 2 imply the Theorem for \(K=\overline{F_0}\).

Section 5 contains preliminary material on Siegel’s Lemma, and Sect. 6 more preparations for the proof of Proposition 2, which then follows in Sect. 7.

In Sects. 8 and 9 we start some preliminaries for the general K, including the identities mentioned above. After that Sect. 10 contains an extension of Proposition 1. The main specialization arguments follow in Sect. 11.

We are then able almost to complete the proof in Sect. 12, with a final extra argument in Sect. 13 because certain statements of Mordell-Lang type are not quite in the literature.

As a matter of fact inseparability turns out to be not quite such a problem for the application to our Theorem, because we show in an Appendix that the relevant inseparable degree is bounded. Nevertheless, we have to take it into account in the proof of Proposition 2.

It will be clear to the experts that everything in this paper extends immediately from \(\mathbf{F}_p\) to arbitrary finite fields.

And it should go without saying that all our results are effective, and indeed we shall make no further reference to such matters.

We have mentioned Drinfeld modules several times already, so there naturally occurs the problem of generalizing this paper to those (or even to t-modules). It is reasonable to expect some sort of analogue of our Conjecture to hold.

We heartily thank Umberto Zannier for valuable correspondence regarding some of the estimates in Sect. 8.


We start with two examples, for the line \(x+y=1\) as in (1) and for the hyperbola \(xy=1\), with K arbitrary as in the Conjecture. In fact we go further and determine the solutions for every p, as Leitner did in [38, 39] with \(\mathbf{G}_{\mathrm{m}}^4\) (confirming an expectation of Hrushovski [33] p.669). This leads to the 23 mentioned in the abstract.

Example 1

(a) If \(p > 2\) then there are no \(\xi ,\eta \) in K with \(\xi +\eta =1\) for which there exist \(\alpha \ne 0,~ \beta \ne 0\) in \({{\mathcal {R}}}\) with \(\alpha \xi = \beta \eta = 0\). If \(p=2\) then there are infinitely many solutions, all obtained by choosing any torsion \(\xi \) and taking \(\eta =1-\xi \).

(b) If \(p > 3\) then there are no \(\xi ,\eta \) in K with \(\xi \eta =1\) for which there exist \(\alpha \ne 0,~ \beta \ne 0\) in \({{\mathcal {R}}}\) with \(\alpha \xi = \beta \eta = 0\). If \(p=3\) there are twelve, corresponding to

$$\begin{aligned} \xi ^4+t\xi ^2+1=0,~~~~\xi ^4+(t+1)\xi ^2+1=0,~~~~\xi ^4+(t+2)\xi ^2+1=0. \end{aligned}\tag{2}$$

If \(p=2\) there are eleven, corresponding to

$$\begin{aligned} \xi =1,~~~~\xi ^2+t\xi +1=0,~~~~\xi ^2+(t+1)\xi +1=0 \end{aligned}\tag{3}$$


$$\begin{aligned} \xi ^3+(t+1)\xi ^2+t\xi +1=0,~~~~\xi ^3+t\xi ^2+(t+1)\xi +1=0. \end{aligned}\tag{4}$$

Verification. For (a) we start by checking that the line defined by \(x+y=1\) does not lie in a proper Carlitz submodule of \(\mathbf{G}_{\mathrm{a}}^2\) defined by say \(\rho x+\sigma y=0\). Otherwise we would get

$$\begin{aligned} 0=\rho x+\sigma y=\rho x+\sigma (1-x)=\rho x+\sigma 1-\sigma x=(\rho -\sigma )x+\sigma 1 \end{aligned}$$

identically in x. However if \(\rho -\sigma \ne 0\) then \((\rho -\sigma )x\) involves x. Thus \(\rho -\sigma = 0\) and we are left with \(\sigma 1=0\). But if \(\sigma =S({{{\mathcal {C}}}})\) for a non-constant polynomial S of degree d then as \(p \ne 2\) we can quickly check that \(\sigma 1\) is a polynomial in t of degree \(p^{d-1}\). This fails for \(p=2\) and indeed then \(\sigma _0 1=0\) for \(\sigma _0={{{\mathcal {C}}}}^2+{{{\mathcal {C}}}}\) (compare [30] p.61). So here the line lies in \(\sigma _0x+\sigma _0y=0\). This even makes the above example (a) fail for \(p=2\): simply take any torsion element \(\xi \); then, because 1 is torsion, so also is \(\eta =1-\xi \).

In fact this calculation proves (a) at once: if \(\xi ,\eta \) are torsion then so is \(\xi +\eta =1\) (it is the Carlitz analogue of say \(xy=2\) in zero characteristic \(\mathbf{G}_{\mathrm{m}}^2\)).
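The degree claim for \(\sigma 1\) and the exceptional behaviour at \(p=2\) can be checked on small cases. The following Python sketch is only an illustration: elements of \(\mathbf{F}_p[t]\) are coefficient lists (lowest degree first), S is given by its list of coefficients in \(\mathbf{F}_p\), and the names `carlitz` and `apply_poly` are our own.

```python
def trim(a):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a


def add(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(u + v) % p for u, v in zip(a, b)])


def frobenius(x, p):
    """x(t)^p for x in F_p[t]: coefficients are unchanged, exponents are multiplied by p."""
    if not x:
        return []
    out = [0] * (p * (len(x) - 1) + 1)
    for i, c in enumerate(x):
        out[p * i] = c
    return out


def carlitz(x, p):
    """C x = t x + x^p."""
    return add([0] + x if x else [], frobenius(x, p), p)


def apply_poly(S, x, p):
    """S(C) x for S = [s_0, s_1, ...] with s_k in F_p."""
    total, power = [], x
    for s in S:
        total = add(total, [(s * c) % p for c in power], p)
        power = carlitz(power, p)
    return total


# p = 3: sigma 1 for S = X^d has degree p^(d-1) in t.
print([len(apply_poly([0] * d + [1], [1], 3)) - 1 for d in (1, 2, 3)])
# p = 2: sigma_0 = C^2 + C kills 1, so 1 is torsion.
print(apply_poly([0, 1, 1], [1], 2) == [])
```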

For (b) it is clear that the hyperbola defined by \(xy=1\) does not lie in a proper Carlitz submodule of \(\mathbf{G}_{\mathrm{a}}^2\) defined by \(\rho x+\sigma y=0\), because if \(\rho \ne 0\) then \(\rho x\) has positive degree in x and if \(\sigma \ne 0\) then \(\sigma y\) for \(y=1/x\) has negative degree in x.

Now we proceed to the main argument, by diophantine approximation as in the classical case of \(\mathbf{G}_{\mathrm{m}}^2\) over zero characteristic. There is a monic polynomial N in \(\mathbf{F}_p[X]\) of minimal degree, say n, such that the torsion elements \(\xi ,\eta \) (if any) satisfy

$$\begin{aligned} N({{{\mathcal {C}}}})\xi =N({{{\mathcal {C}}}})\eta =0. \end{aligned}\tag{5}$$

If \(\zeta \) is a primitive N-torsion element, then there are polynomials \(R,S\) with

$$\begin{aligned} \xi =R({{{\mathcal {C}}}})\zeta ,~\eta =S({{{\mathcal {C}}}})\zeta \end{aligned}\tag{6}$$

(see for example [30] p.55); further we can assume \(R,S\) are of degree at most \(n-1\). We can find polynomials \(A,B,D\) not all zero with \(AR+BS=DN\); and a simple counting argument shows that we can take \(A,B,D\) of degree at most n/2. Indeed there are \(3(1+[n/2])\) coefficients at our disposal subject to \(n+[n/2]\) linear conditions, and the difference is

$$\begin{aligned} 3+2\left[ {n\over 2}\right] -n \ge 3+2\left( {n\over 2}-{1\over 2}\right) -n=2. \end{aligned}$$

It follows that

$$\begin{aligned} \alpha \xi +\beta \eta =0, \end{aligned}\tag{7}$$

for \(\alpha =A({{{\mathcal {C}}}}),\beta =B({{{\mathcal {C}}}})\), an equation of total degree at most \(p^{n / 2}\) in \(\xi ,\eta \). We can apply Bezout to this and \(\xi \eta =1\) by our opening observation. We find that for this particular N there are at most \(2p^{n/2}\) pairs \((\xi ,\eta )\).

On the other hand replacing \(\zeta \) by any of its conjugates over \(F_0\) in (6) gives a conjugate pair also on the same hyperbola. Further these pairs are all different, because by the minimality of N and (5), (6) the polynomials \(R,S,N\) can have no common factor; and so we can solve \(GR+HS+LN=1\) giving \(\zeta =G({{{\mathcal {C}}}})\xi +H({{{\mathcal {C}}}})\eta \).

Now the degree of \(\zeta \) over \(F_0\) is well-known to be the Carlitz-Euler \(\phi (N)\) (see [16] p.173). It follows that

$$\begin{aligned} 2p^{n/2} \ge \phi (N). \end{aligned}\tag{8}$$

In terms of a prime factorization \(N=\prod _{i=1}^r N_i^{e_i}\) it is

$$\begin{aligned} \phi (N)=\prod _{i=1}^r\phi (N_i^{e_i})=\prod _{i=1}^rp^{e_in_i}\left( 1-{1 \over p^{n_i}}\right) =p^n\prod _{i=1}^r\left( 1-{1 \over p^{n_i}}\right) \end{aligned}$$

with \(n_i\) as the degree of \(N_i~(i=1,\ldots ,r)\). Thus

$$\begin{aligned} \phi (N) \ge p^n\left( 1-{1 \over p}\right) ^r\ge (p-1)^n. \end{aligned}\tag{9}$$

So from (8) and the fact that \(2p^{1/2} < p-1\) for \(p>5\) we reduce to the cases \(p=2,3,5\).
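The reduction can be followed numerically. The sketch below (function names and the cut-off `limit` are illustrative assumptions) evaluates the Carlitz-Euler function from the degrees and exponents of the prime factors of N, and lists the degrees n compatible with \(2p^{n/2} \ge (p-1)^n\), written in the squared form \(4p^n \ge (p-1)^{2n}\) so that only integer arithmetic is used.

```python
def carlitz_euler_phi(p, factors):
    """phi(N) from factors = [(n_i, e_i), ...]: degree and exponent of each
    distinct monic irreducible factor of N."""
    phi = 1
    for n_i, e_i in factors:
        phi *= p ** (n_i * e_i) - p ** (n_i * (e_i - 1))
    return phi


def admissible_degrees(p, limit=50):
    """Degrees n <= limit with 2 p^(n/2) >= (p-1)^n, tested as 4 p^n >= (p-1)^(2n)."""
    return [n for n in range(1, limit + 1) if 4 * p ** n >= (p - 1) ** (2 * n)]


for p in (2, 3, 5, 7, 11):
    print(p, admissible_degrees(p, limit=10))

# phi for N = X(X+1)(X^2+X+1) over F_2, i.e. N = X^4 + X
print(carlitz_euler_phi(2, [(1, 1), (1, 1), (2, 1)]))
```

For \(p=2\) the test is vacuous, since \((p-1)^n=1\); this is why that case needs the finer case analysis, whereas for \(p=5\) and \(p=3\) only \(n=1\) and \(n \le 4\) survive.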

If \(p=5\) then \(2p^{n/2} \ge (p-1)^n\) forces \(n=1\), so \(N=X+a ~(a=0,1,2,3,4)\). But the polynomials \(N({{{\mathcal {C}}}})T=T^5+tT+a\) for \(a \ne 0\) are irreducible and non-reciprocal and none is the reciprocal of another, and for \(a=0\) it is irreducible and non-reciprocal after dividing by T. So by (5) there are no solutions if \(p=5\).

If \(p=3\) then \(2p^{n/2} \ge (p-1)^n\) forces \(n \le 4\). Checking each of the 81 possibilities for N we find that

$$\begin{aligned} N=X^2+2,~~~~X^2+2X,~~~~X^2+X \end{aligned}$$

give rise via (5) to the solutions indicated in (2). We find no other \(\xi \).

If \(p=2\) we have to work a bit harder, and we divide into eight cases according to which of the three irreducible polynomials \(X,X+1,X^2+X+1\) of degree at most 2 divide N.

The worst case is when all three divide N, say as \(N_1,N_2,N_3\) respectively. Thus \(r \ge 3\) and

$$\begin{aligned} \phi (N) = 2^n\left( {1 \over 2}\right) \left( {1 \over 2}\right) \left( {3 \over 4}\right) \prod _{i=4}^r\left( 1-{1 \over 2^{n_i}}\right) . \end{aligned}$$

In the product \(n_i \ge 3\) so it is at least \((7/8)^{r-3}\). But also

$$\begin{aligned} n=e_1+e_2+2e_3+\sum _{i=4}^re_in_i \ge 4+3(r-3). \end{aligned}$$

So now \(n \ge 4\) and we get

$$\begin{aligned} 2(2^{n/2}) \ge 3(2^{n-4})\left( {7 \over 8}\right) ^{(n-4)/3}, \end{aligned}$$

which implies \(n \le 7\). Therefore \(N=N_1N_2N_3(X^3+aX^2+bX+c)\) leading to just eight possibilities for N.
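The bound \(n \le 7\) can be confirmed by evaluating both sides of the last inequality. This is a minimal floating-point sketch (the function name and the search range are our own choices); the margins are wide enough that rounding is not an issue.

```python
def worst_case_holds(n):
    """Test 2 * 2^(n/2) >= 3 * 2^(n-4) * (7/8)^((n-4)/3) for n >= 4."""
    return 2 * 2 ** (n / 2) >= 3 * 2 ** (n - 4) * (7 / 8) ** ((n - 4) / 3)


print([n for n in range(4, 40) if worst_case_holds(n)])
```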

The other seven cases lead to \(n \le 6\) and better. For example when none of \(N_1,N_2,N_3\) divide N, then every \(n_i \ge 3\) and so the above product is at least \((7/8)^n\).

Here the factorization of each \(N({{{\mathcal {C}}}})T\) on Maple can take up to two hours. Spotting reciprocal factors or reciprocal pairs by eye is not too easy and so ad hoc methods using resultants were developed. Thus if the resultant of \(N({{{\mathcal {C}}}})T\) and its own reciprocal is non-zero, then such factors cannot exist. Already all solutions come from

$$\begin{aligned} N=X^2+X,~~~~X^3+X^2,~~~~X^3+X,~~~~X^4+X. \end{aligned}$$

The first N gives \(\xi =1\) of course in (3) and the next two give the next two there; but the last N provides the reciprocal pair

$$\begin{aligned} T^3+(t+1)T^2+tT+1,~~~~T^3+tT^2+(t+1)T+1 \end{aligned}$$

leading to the last two in (4).

This completes the verification of Example 1 (we checked that allowing powers of primes does not yield any additional solutions).
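The divisibility statements in Example 1 are easy to recheck without a computer algebra system. The following Python sketch stores a polynomial in T and t as a dictionary from exponent pairs to coefficients; this representation, the helper names, and the pairing of each N with a particular factor are our own reading for the illustration, and the code only tests that each listed factor divides \(N({{{\mathcal {C}}}})T\), not that the list is complete.

```python
def add(a, b, p, sign=1):
    """a + sign*b for polynomials stored as {(deg_T, deg_t): coeff mod p}."""
    out = dict(a)
    for key, c in b.items():
        v = (out.get(key, 0) + sign * c) % p
        if v:
            out[key] = v
        else:
            out.pop(key, None)
    return out


def mul(a, b, p):
    out = {}
    for (i1, j1), c1 in a.items():
        for (i2, j2), c2 in b.items():
            out = add(out, {(i1 + i2, j1 + j2): (c1 * c2) % p}, p)
    return out


def carlitz(a, p):
    """C a = t*a + a^p; over F_p the p-th power multiplies all exponents by p."""
    t_a = {(i, j + 1): c for (i, j), c in a.items()}
    a_p = {(p * i, p * j): c for (i, j), c in a.items()}
    return add(t_a, a_p, p)


def apply_poly(S, a, p):
    """S(C) a for S = [s_0, s_1, ...] in F_p[X]."""
    total, power = {}, a
    for s in S:
        total = add(total, {k: s * c for k, c in power.items()}, p)
        power = carlitz(power, p)
    return total


def remainder(a, g, p):
    """Remainder of a modulo g, where g is monic as a polynomial in T."""
    deg_g = max(i for i, _ in g)
    while a and max(i for i, _ in a) >= deg_g:
        deg_a = max(i for i, _ in a)
        lead = {(i - deg_g, j): c for (i, j), c in a.items() if i == deg_a}
        a = add(a, mul(lead, g, p), p, sign=-1)
    return a


def from_rows(rows):
    """rows[i] = coefficients in F_p[t] (lowest degree first) of T^i."""
    return {(i, j): c for i, row in enumerate(rows) for j, c in enumerate(row) if c}


T = {(1, 0): 1}
checks = [  # (p, coefficients of N, candidate factor of N(C)T)
    (3, [2, 0, 1], [[1], [], [0, 1], [], [1]]),        # X^2+2,  T^4+tT^2+1
    (3, [0, 2, 1], [[1], [], [1, 1], [], [1]]),        # X^2+2X, T^4+(t+1)T^2+1
    (3, [0, 1, 1], [[1], [], [2, 1], [], [1]]),        # X^2+X,  T^4+(t+2)T^2+1
    (2, [0, 1, 1], [[1], [1]]),                        # X^2+X,  T+1
    (2, [0, 0, 1, 1], [[1], [0, 1], [1]]),             # X^3+X^2, T^2+tT+1
    (2, [0, 1, 0, 1], [[1], [1, 1], [1]]),             # X^3+X,  T^2+(t+1)T+1
    (2, [0, 1, 0, 0, 1], [[1], [0, 1], [1, 1], [1]]),  # X^4+X,  T^3+(t+1)T^2+tT+1
    (2, [0, 1, 0, 0, 1], [[1], [1, 1], [0, 1], [1]]),  # X^4+X,  T^3+tT^2+(t+1)T+1
]
results = [remainder(apply_poly(S, T, p), from_rows(rows), p) == {}
           for p, S, rows in checks]
print(results)
```

Each entry of `results` records whether the corresponding factor divides \(N({{{\mathcal {C}}}})T\); the degrees of the factors add up to \(4+4+4=12\) for \(p=3\) and \(1+2+2+3+3=11\) for \(p=2\).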

It is not much more difficult to prove the general conjecture above in \(\mathbf{G}_{\mathrm{a}}^2\) by these means. We just sketch the details, as it also follows rather easily from our Theorem by “lifting” the curve in \(\mathbf{G}_{\mathrm{a}}^2\) to \(\mathbf{G}_{\mathrm{a}}^3\) by introducing a sufficiently general constant value of \(x_3\).

It suffices to treat the case \(K=\overline{F_0}\). We can assume that C is defined over a finite extension F of \(F_0\), otherwise it contains anyway at most finitely many points algebraic over \(F_0\) and in particular torsion points. We obtain as above (7) and now it is by assumption that we can apply Bezout. This time we are allowed to use the conjugates of \(\zeta \) only over F, but that hardly affects their number. However as in Example 1 we need a lower bound for \(\phi (N)\) better than (9) and at least of the form \(c(p^n)^\theta \) for some \(\theta > {1 \over 2}\). The only trouble is at \(p=2\) but in general we can argue

$$\begin{aligned} \prod _{i=1}^r\left( 1-{1 \over p^{n_i}}\right) \ge \prod _{j=1}^n\left( 1-{1 \over p^{j}}\right) ^{r_j} \end{aligned}$$

for the number \(r_j\) of monic irreducible polynomials of degree j in \(F_0\). Taking logarithms, using \(-\log (1-x) \le 2x\) for \(0 \le x \le {1 \over 2}\), and also

$$\begin{aligned} r_j={1 \over j}\sum _{d|j}\mu (d)p^{j/d}\le {p^j \over j}\left( 1+\sum _{6 \le d |j}{1 \over p^{j(1-1/d)}}\right) \le {p^j \over j}\left( 1+{j \over 2^{5j/6}}\right) \le 2{p^j\over j} \end{aligned}$$

as well as \(\sum _{j=1}^n{1 / j} \le \log n +1\) we find

$$\begin{aligned} \phi (N)\ge {p^n \over 55n^4} \end{aligned}$$

(with n as the degree of N) comfortably of the required form. We shall need both (10) and (11) later.
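The count \(r_j\) and the estimate \(r_j \le 2p^j/j\) used above can be tabulated directly from the Möbius formula. The following Python sketch is an illustration only; the function names are ours.

```python
def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0          # a square divides the original n
            result = -result
        d += 1
    return -result if n > 1 else result


def count_irreducibles(p, j):
    """Number r_j of monic irreducible polynomials of degree j over F_p."""
    total = sum(mobius(d) * p ** (j // d) for d in range(1, j + 1) if j % d == 0)
    return total // j


for p in (2, 3, 5):
    counts = [count_irreducibles(p, j) for j in range(1, 9)]
    # The estimate r_j <= 2 p^j / j, cleared of denominators.
    assert all(j * r <= 2 * p ** j for j, r in enumerate(counts, start=1))
    print(p, counts)
```

For \(p=2\) the first values are \(r_1=2\) and \(r_2=1\), matching the three polynomials \(X,X+1,X^2+X+1\) of degree at most 2 used in Example 1.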

It will now be seen that \(\Vert N\Vert \) is a useful notation for \(p^n\).

Proof of Proposition 1

We use the standard height function h on \(F_0=\mathbf{F}_p(t)\) normalized to \(h(t^d)=d\). Thus if \(\xi _0\) is in \(F_0\) we have

$$\begin{aligned} h(\xi _0)=\sum _w\log \max \{1,|\xi _0|_w\} \end{aligned}$$

taken over all w on \(F_0\) (which correspond to monic irreducible P in \(F_0\), with \(|P|_P=e^{-d}\) for d the degree of P, and \(|t|_\infty =e\)). We extend it in the usual way to the algebraic closure, so that if \(\xi \) is in an extension F of degree D over \(F_0\), then

$$\begin{aligned} h(\xi )={1 \over D}\sum _vD_v\log \max \{1,|\xi |_v\} \end{aligned}$$

over all valuations v on F which extend those on \(F_0\) (that is, for all x in \(F_0\) we have \(|x|_v=|x|_w\) for some w as above) and \(D_v=e_vf_v\) the local degrees. We shall often say that v divides P or \(\infty \).

For example

$$\begin{aligned} h(t^\theta )=|\theta | \end{aligned}\tag{13}$$

when \(\theta \) is rational.

First here is an illustration in the spirit of Example 1. However we cannot use the line \(x+y=1\) here because \(1x+1y\) is constant on it. And indeed the points with coordinates

$$\begin{aligned} \xi ={{{\mathcal {C}}}}^m1,~~\eta =1-{{{\mathcal {C}}}}^m1=(1-{{{\mathcal {C}}}}^m)1 \end{aligned}$$

have \(h(\xi )=p^{m-1}\) going to infinity with m provided \(p \ne 2\), in view of our remarks above about the degree of \(\sigma 1\) (again the analogue of \(xy=2\) in zero characteristic \(\mathbf{G}_{\mathrm{m}}^2\)). But the (Carlitz) hyperbola \(xy=1\) is fine.

Example 2

If \(\xi ,\eta \) are in \(\overline{F_0}\) with \(\xi \eta =1\) for which there exist \(\alpha ,\beta \) not both zero in \({{\mathcal {R}}}\) with \(\alpha \xi =\beta \eta \) then \(h(\xi )+h(\eta )\le 18\).

Verification. We use the canonical height introduced by Denis for Drinfeld modules (for which he proved his lower bound, at least in the Carlitz case). This is defined as

$$\begin{aligned} {\hat{h}}(\xi )=\lim _{m \rightarrow \infty }{h({{{\mathcal {C}}}}^m\xi ) \over p^m}. \end{aligned}$$

Not only do we have the obvious \({\hat{h}}({{{\mathcal {C}}}}\xi )=p{\hat{h}}(\xi )\) but even (in the notation at the end of Sect. 2)

$$\begin{aligned} {\hat{h}}(P({{{\mathcal {C}}}})\xi )=\Vert P\Vert {\hat{h}}(\xi ) \end{aligned}\tag{14}$$

for any non-zero P in \(\mathbf{F}_p[X]\). Also since \({{{\mathcal {C}}}}\) is additive we get

$$\begin{aligned} {\hat{h}}(\xi +\eta ) \le {\hat{h}}(\xi )+{\hat{h}}(\eta ) \end{aligned}$$

(but not \({\hat{h}}(\xi \eta ) \le {\hat{h}}(\xi )+{\hat{h}}(\eta )\) or even \({\hat{h}}(\xi ^2) \le 2{\hat{h}}(\xi )\), for example \(\xi \) can be torsion while \(\xi ^2\) is not, as for \(\xi =\sqrt{-t}\) with \(p=3\); and even \({\hat{h}}(1/\xi )\) need not be \({\hat{h}}(\xi )\), as for \(\xi =t\) with \(p=2\)).

To compare with (13), it may be shown for \(p=2\) that

$$\begin{aligned} {\hat{h}}(t^\theta )=\theta ,~~0,~~{1+\theta \over 2},~~-\theta +2^{[\theta ]-1}(\theta -[\theta ]+1),~~-\theta \end{aligned}$$


for

$$\begin{aligned} \theta >1,~~\theta =1,~~0<\theta <1,~~0 \ge \theta \notin \mathbf{Z},~~0 \ge \theta \in \mathbf{Z} \end{aligned}$$

respectively (we shall not need these values).

Denis showed that \({\hat{h}}(\xi )\) differs from \(h(\xi )\) by a bounded amount, and indeed we now check that

$$\begin{aligned} |{\hat{h}}(\xi )-h(\xi )| \le 3 \end{aligned}$$

independently of p.

In the first place we have an upper bound

$$\begin{aligned} h({{{\mathcal {C}}}}\xi )= & {} h \left( \xi \left( t+\xi ^{p-1}\right) \right) \le h(\xi )+h \left( t+\xi ^{p-1}\right) \\\le & {} h(\xi )+1+(p-1)h(\xi )=ph(\xi )+1. \end{aligned}$$

For a corresponding lower bound we use the standard Nullstellensatz argument. With \(\rho =\xi t^{p+1}\) and

$$\begin{aligned} \sigma =\xi ^{(p-1)p}-t\xi ^{(p-1)(p-1)}+t^2\xi ^{(p-1)(p-2)}-\cdots -t^p={(\xi ^{p-1})^{p+1}-(-t)^{p+1}\over \xi ^{p-1}-(-t)} \end{aligned}$$

we have

$$\begin{aligned} \rho +\sigma (\xi ^p+t\xi )=\xi ^{p^2}. \end{aligned}$$
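This polynomial identity can be confirmed mechanically for small p. The sketch below is an illustration under our own conventions: polynomials in \(\mathbf{F}_p[t,\xi ]\) are dictionaries mapping exponent pairs (degree in t, degree in \(\xi \)) to coefficients, and \(\sigma \) is entered as the geometric sum \(\sum _{k=0}^p(-t)^k\xi ^{(p-1)(p-k)}\).

```python
def poly_mul(f, g, p):
    """Multiply polynomials in F_p[t, xi] stored as {(deg_t, deg_xi): coeff}."""
    out = {}
    for (a, b), c in f.items():
        for (a2, b2), c2 in g.items():
            key = (a + a2, b + b2)
            out[key] = (out.get(key, 0) + c * c2) % p
    return {k: v for k, v in out.items() if v}


def poly_add(f, g, p):
    """Add polynomials in F_p[t, xi], dropping zero coefficients."""
    out = dict(f)
    for k, v in g.items():
        out[k] = (out.get(k, 0) + v) % p
    return {k: v for k, v in out.items() if v}


def identity_holds(p):
    """Check rho + sigma*(xi^p + t*xi) == xi^(p^2) in F_p[t, xi]."""
    rho = {(p + 1, 1): 1}                                    # xi * t^(p+1)
    sigma = {(k, (p - 1) * (p - k)): (-1) ** k % p for k in range(p + 1)}
    carlitz_xi = {(0, p): 1, (1, 1): 1}                      # xi^p + t*xi
    lhs = poly_add(rho, poly_mul(sigma, carlitz_xi, p), p)
    return lhs == {(0, p * p): 1}


if __name__ == "__main__":
    print([identity_holds(p) for p in (2, 3, 5, 7)])
```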

Thus for any ultrametric valuation we deduce

$$\begin{aligned} \max \left\{ 1,|\xi |^{p^2}\right\} \le \max \{1,|{{{\mathcal {C}}}}\xi |\}\max \left\{ 1,|\xi |^{p^2-p}\right\} \max \left\{ 1,|t|^{p+1}\right\} . \end{aligned}$$

Cancelling and taking the product with suitable exponents leads in the usual way to

$$\begin{aligned} h({{{\mathcal {C}}}}\xi )\ge ph(\xi )-(p+1). \end{aligned}$$

The standard telescoping sum gives

$$\begin{aligned} |{\hat{h}}(\xi )-h(\xi )| \le (p+1)\sum _{i=1}^\infty p^{-i}={p+1 \over p-1} \le 3. \end{aligned}$$
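For the reader's convenience, here is one way to spell out the telescoping step. It uses only the upper and lower bounds just proved, applied to \({{{\mathcal {C}}}}^{i-1}\xi \) in place of \(\xi \), together with the definition of \({\hat{h}}\) as a limit:

$$\begin{aligned} {\hat{h}}(\xi )-h(\xi )=\sum _{i=1}^\infty {h({{{\mathcal {C}}}}^i\xi )-ph({{{\mathcal {C}}}}^{i-1}\xi ) \over p^i},~~~~-(p+1)\le h({{{\mathcal {C}}}}^i\xi )-ph({{{\mathcal {C}}}}^{i-1}\xi )\le 1. \end{aligned}$$

Each term of the series thus has absolute value at most \((p+1)p^{-i}\), and the resulting geometric series has sum \((p+1)/(p-1)\), which is largest (namely 3) at \(p=2\).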

Now take \(\xi ,\eta \) as in Example 2. If \(\alpha =0\) then \(\eta \) is torsion so \(h(\eta )\le 3\) by (16). Thus \(h(\xi )=h(\eta ) \le 3\) too, and we are done. Similarly if \(\beta =0\); so we will henceforth assume that \(\alpha \ne 0,\beta \ne 0\). Write \(\alpha =A({{{\mathcal {C}}}}),\beta =B({{{\mathcal {C}}}})\). Then (14) leads to

$$\begin{aligned} p^l{\hat{h}}(\xi )=p^m{\hat{h}}(\eta ) \end{aligned}$$

with l the degree of A and m the degree of B. As in the situation over \(\overline{\mathbf{Q}}\) (see for example Theorem 14.9 of [44] p.176) everything depends on the relation between l and m. We note from (16) that

$$\begin{aligned} {\hat{h}}(\eta )={\hat{h}}(1/\xi )\le h(1/\xi )+3=h(\xi )+3\le {\hat{h}}(\xi )+6. \end{aligned}$$

First suppose \(l>m\). Then \(p{\hat{h}}(\xi ) \le {\hat{h}}(\eta )\) which by (17) leads to \({\hat{h}}(\xi ) \le 6\) and so \(h(\xi ) \le 9\); further \(h(\eta )=h(\xi )\le 9\) as well. So

$$\begin{aligned} h(\xi )+h(\eta ) \le 18. \end{aligned}$$

By symmetry we get the same result if \(l<m\). So it remains only to consider \(l=m\).

We can now write

$$\begin{aligned} \alpha =\alpha _0+a{{{\mathcal {C}}}}^m,~\beta =\beta _0+b{{{\mathcal {C}}}}^m \end{aligned}$$

with nonzero a, b in \(\mathbf{F}_p\) and \(\alpha _0,\beta _0\) of degree in \({{{\mathcal {C}}}}\) smaller than m. Then

$$\begin{aligned} \alpha _0\xi -\beta _0\eta =b{{{\mathcal {C}}}}^m\eta -a{{{\mathcal {C}}}}^m\xi \end{aligned}$$

and we proceed to compare the canonical heights.

To begin with the left-hand side, (15) gives

$$\begin{aligned} {\hat{h}}(\alpha _0\xi -\beta _0\eta )\le {\hat{h}}(\alpha _0\xi )+{\hat{h}}(\beta _0\eta ) \end{aligned}$$

which is by (14) and (17) at most

$$\begin{aligned} p^{m-1}\left( {\hat{h}}(\xi )+{\hat{h}}(\eta )\right) \le p^{m-1}(2h+6) \end{aligned}$$

with \(h=h(\xi )\).

To continue with the right-hand side

$$\begin{aligned} {\hat{h}}(b{{{\mathcal {C}}}}^m\eta -a{{{\mathcal {C}}}}^m\xi )=p^m{\hat{h}}(b\eta -a\xi ) \end{aligned}$$


and

$$\begin{aligned} {\hat{h}}(b\eta -a\xi )\ge h(b\eta -a\xi )-3=h(b/\xi -a\xi )-3=2h-3 \end{aligned}$$

(here we used \(ab \ne 0\)). Comparison with (19) gives

$$\begin{aligned} h \le {3p+6 \over 2p-2} \le 6. \end{aligned}$$
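In more detail, and only as a sketch of the arithmetic: the two sides of the relation between \(\alpha _0\xi -\beta _0\eta \) and \(b{{{\mathcal {C}}}}^m\eta -a{{{\mathcal {C}}}}^m\xi \) have the same canonical height, so the lower bound for the right-hand side and the upper bound for the left-hand side combine to

$$\begin{aligned} p^m(2h-3)\le p^{m-1}(2h+6),~~~\mathrm{that~is,}~~~(2p-2)h\le 3p+6. \end{aligned}$$

The quotient \((3p+6)/(2p-2)\) decreases as p increases, and it takes the value 6 at \(p=2\).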

The same argument gives the same bound for \(h(\eta )\) and by addition we get something stronger than (18). So the verification is complete.

Next we need an analogue of Lemma 2.1 of [42] (p.327).

Lemma 1

If \(\xi _1,\ldots ,\xi _n\) are in \(\overline{F_0}\) for which there exist \(\alpha _1,\ldots ,\alpha _n\) not all zero in \({{\mathcal {R}}}\) with \(\alpha _1\xi _1+\cdots +\alpha _n\xi _n=0\), then for any non-negative integer m there are \(\beta _1,\ldots ,\beta _n\) in \({{\mathcal {R}}}\), not all zero, such that

$$\begin{aligned} {\hat{h}}(\beta _1\xi _1+\cdots +\beta _n\xi _n) \le p^{-m/n}({\hat{h}}(\xi _1)+\cdots +{\hat{h}}(\xi _n)) \end{aligned}$$

with \(\beta _1=B_1({{{\mathcal {C}}}}),\ldots ,\beta _n=B_n({{{\mathcal {C}}}})\) for

$$\begin{aligned} \max \{\Vert B_1\Vert ,\ldots ,\Vert B_n\Vert \} \le p^m. \end{aligned}$$


Write \(\alpha _i=A_i({{{\mathcal {C}}}})~(i=1,\ldots ,n)\) with \(p^d=\max \{\Vert A_1\Vert ,\ldots ,\Vert A_n\Vert \}\). Choose any A in \(\mathbf{F}_p[X]\) with degree exactly d. Then the \({A_i / A}~(i=1,\ldots ,n)\) are in the completion \(\mathbf{F}_p[[1/X]]\) of \(\mathbf{F}_p[X]\). It follows that for any positive integer l we can find \(Q\ne 0,B_1,\ldots ,B_n\) in \(\mathbf{F}_p[X]\) of degree at most m such that

$$\begin{aligned} Q{A_i \over A}-B_i \mathrm{~is~in~} X^{-l}\mathbf{F}_p[[1/X]]~~~~~~~~(i=1,\ldots ,n) \end{aligned}$$

as long as \(m+1>n(l-1)\). Thus we can choose \(l=[{m / n}]+1>{m / n}\). Now the polynomials \(C_i=QA_i-AB_i~(i=1,\ldots ,n)\) have degrees at most \(d-l\).

We now put

$$\begin{aligned} \beta _i=B_i({{{\mathcal {C}}}}),~~~\gamma _i=C_i({{{\mathcal {C}}}})~~~~~(i=1,\ldots ,n) \end{aligned}$$

and note that

$$\begin{aligned} -A({{{\mathcal {C}}}})(\beta _1\xi _1+\cdots +\beta _n\xi _n)=\gamma _1\xi _1+\cdots +\gamma _n\xi _n. \end{aligned}$$

Taking canonical heights gives

$$\begin{aligned} p^d{\hat{h}}(\beta _1\xi _1+\cdots +\beta _n\xi _n)={\hat{h}}(\gamma _1\xi _1+\cdots +\gamma _n\xi _n)\le p^{d-l}\left( {\hat{h}}(\xi _1)+\cdots +{\hat{h}}(\xi _n)\right) . \end{aligned}$$

This is the required result since \(l>{m / n}\). Note that \(\beta _1,\ldots ,\beta _n\) are indeed not all zero, since otherwise (20) would give a contradiction, because \(l >0\) and some \(A_i\) has the same degree as A. \(\square \)
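The approximation step in this proof is effective, and for small parameters it can be carried out by exhaustive search. The following Python sketch is an illustration only: it assumes polynomials over \(\mathbf{F}_p\) stored as coefficient lists (lowest degree first), takes \(B_i\) to be the quotient of \(QA_i\) by A, and looks for a non-zero Q of degree at most m for which every remainder \(C_i=QA_i-AB_i\) has degree at most \(d-l\) with \(l=[m/n]+1\). All function names are ours.

```python
from itertools import product


def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f


def add(f, g, p):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % p
                 for i in range(n)])


def mul(f, g, p):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)


def divmod_poly(f, g, p):
    """Quotient and remainder of f by a non-zero g in F_p[X]."""
    f = trim(f[:])
    q = [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], -1, p)          # modular inverse, Python 3.8+
    while len(f) >= len(g):
        c = f[-1] * inv % p
        s = len(f) - len(g)
        q[s] = c
        for i, b in enumerate(g):
            f[s + i] = (f[s + i] - c * b) % p
        trim(f)
    return trim(q), f


def approximate(As, A, m, p):
    """Search for Q != 0, deg Q <= m, with deg(Q*A_i - A*B_i) <= d - l for all i."""
    n, d = len(As), len(A) - 1
    l = m // n + 1
    for coeffs in product(range(p), repeat=m + 1):
        Q = trim(list(coeffs))
        if not Q:
            continue
        pairs = [divmod_poly(mul(Q, Ai, p), A, p) for Ai in As]
        if all(len(C) - 1 <= d - l for _, C in pairs):
            return Q, [B for B, _ in pairs], [C for _, C in pairs]
    return None


if __name__ == "__main__":
    # p = 3, A_1 = X^3 + X + 1, A_2 = 2X^3 + X^2, A = X^3, m = 2
    print(approximate([[1, 1, 0, 1], [0, 0, 1, 2]], [0, 0, 0, 1], 2, 3))
```

The search always succeeds, because the vanishing of the top \(l-1\) coefficients of each remainder imposes \(n(l-1)\) linear conditions on the \(m+1\) coefficients of Q, which is the counting used in the proof.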

We can now prove Proposition 1. Suppose that C is defined over a finite extension F of \(F_0\). We fix some m with

$$\begin{aligned} p^{m/n}\ge 2nd \end{aligned}$$

where now d is the degree of the curve C. For any point \(P=(\xi _1,\ldots ,\xi _n)\) as there we construct \(\beta _i=B_i({{{\mathcal {C}}}})~(i=1,\ldots ,n)\) as in Lemma 1. Then the function \(y=\beta _1x_1+\cdots +\beta _nx_n\) is not constant on C by hypothesis. Thus for any i there is a polynomial \(\Phi _i(Y,X)\) in F[Y,X], of positive degree in X, such that \(\Phi _i(y,x_i)=0\) on C; further we can take the degree in Y to be at most d. By specialization there follows \(\Phi _i(\eta ,\xi _i)=0\) for \(\eta =\beta _1\xi _1+\cdots +\beta _n\xi _n\). Standard height estimates now give \(h(\xi _i) \le dh(\eta )+c\) for some c independent of P. So for \(h=h(\xi _1)+\cdots +h(\xi _n)\) we get by Lemma 1 and (16)

$$\begin{aligned} h\le & {} n(dh(\eta )+c)\le nd{\hat{h}}(\eta )+c'\\ {}\le & {} ndp^{-m/n}({\hat{h}}(\xi _1)+\cdots +{\hat{h}}(\xi _n))+c'\le ndp^{-m/n}h+c'' \end{aligned}$$

for \(c',c''\) also independent of P. The required result \(h \le 2c''\) now follows from (22). This completes the proof of Proposition 1.

Proof of Theorem for \(K=\overline{F_0}\)

Here we deduce the Theorem for \(K=\overline{F_0}\) from Proposition 1 just proved together with Proposition 2 whose proof will follow in Sect. 7. To avoid logarithmic pedantries we reformulate Proposition 2 as follows.

Assertion. Given \(\varepsilon >0\), there is a positive constant c depending only on p and \(\varepsilon \) with the following property. Let \(F_{\mathrm{c}}\) be a finite cyclotomic extension of \(F_0\) and let F be a finite extension of \(F_{\mathrm{c}}\). Then for any non-torsion \(\xi \) in F we have

$$\begin{aligned} {\hat{h}}(\xi ) \ge c^{-1}[F:F_{\mathrm{c}}]^{-1-\varepsilon }[F_{\mathrm{c}}:F_0]^{-\varepsilon }. \end{aligned}$$

It will be clear that all we need is any \(\varepsilon < 1/3\). But during this section we will use \(\ll \) instead of c.

In what follows we shall be relatively brief, as the argument follows the strategy over \(\overline{\mathbf{Q}}\) (see for example [42] p.330).

Given \((\xi _1,\xi _2,\xi _3)\) as in the Theorem, we can find \(\xi \) and a torsion point \(\zeta \) such that

$$\begin{aligned} F_0(\xi _1,\xi _2,\xi _3)=F_0(\zeta ,\xi ) \end{aligned}$$


and

$$\begin{aligned} \xi _1=\sigma _1\xi +\tau _1\zeta ,~~\xi _2=\sigma _2\xi +\tau _2\zeta ,~~\xi _3=\sigma _3\xi +\tau _3\zeta \end{aligned}$$

for \(\sigma _1,\tau _1,\sigma _2,\tau _2,\sigma _3,\tau _3\) in \({{\mathcal {R}}}\), somewhat as in (6). Here it is because the \({{\mathcal {R}}}\)-module generated by \(\xi _1,\xi _2,\xi _3\) in \(\overline{F_0}\) has rank at most 1, and so is \({{{\mathcal {R}}}}\xi +{{{\mathcal {Z}}}}\) for a finitely generated torsion module \({{\mathcal {Z}}}\), which further has the form \({{{\mathcal {R}}}}\zeta \). And if \(\zeta \) has order \(\nu \) then we can take \(\tau _1,\tau _2,\tau _3\) as polynomials in \({{{\mathcal {C}}}}\) of degree less than that of \(\nu \).

We now want to find small \(\gamma _1,\gamma _2,\gamma _3,\delta \) in \({{\mathcal {R}}}\), not all zero, such that

$$\begin{aligned} \gamma _1\sigma _1+\gamma _2\sigma _2+\gamma _3\sigma _3=0,~~\gamma _1\tau _1+\gamma _2\tau _2+\gamma _3\tau _3=\delta \nu . \end{aligned}$$

Using any form of Siegel’s Lemma over \(\mathbf{F}_p[X]\), or by counting as above, we find a solution with

$$\begin{aligned} \max \{\Vert \gamma _1\Vert ,\Vert \gamma _2\Vert ,\Vert \gamma _3\Vert \} \ll (\Vert \nu \Vert M)^{1/2},~~~M=\max \{\Vert \sigma _1\Vert ,\Vert \sigma _2\Vert ,\Vert \sigma _3\Vert \}. \end{aligned}$$

It follows from (24) that

$$\begin{aligned} \gamma _1\xi _1+\gamma _2\xi _2+\gamma _3\xi _3=0. \end{aligned}$$

Clearly \(\gamma _1,\gamma _2,\gamma _3\) are not all zero, and we deduce from (25) and Bezout that

$$\begin{aligned} D=[F_0(\xi _1,\xi _2,\xi _3):F_0] \ll (\Vert \nu \Vert M)^{1/2}. \end{aligned}$$

On the other hand (24) gives

$$\begin{aligned} {\hat{h}}(\xi _1)=\Vert \sigma _1\Vert {\hat{h}}(\xi ),~~{\hat{h}}(\xi _2)=\Vert \sigma _2\Vert {\hat{h}}(\xi ),~~{\hat{h}}(\xi _3)=\Vert \sigma _3\Vert {\hat{h}}(\xi ). \end{aligned}$$

Let us temporarily assume that no non-trivial \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is constant on our curve C, as in Proposition 1. Then summing we get

$$\begin{aligned} {\hat{h}}(\xi ) \ll M^{-1}. \end{aligned}$$

Assuming further that \(\xi \) is non-torsion, we are now set up to apply the Assertion, with \(F=F_0(\xi _1,\xi _2,\xi _3)=F_{\mathrm{c}}(\xi )\) for \(F_{\mathrm{c}}=F_0(\zeta )\). We get

$$\begin{aligned} {\hat{h}}(\xi ) \gg (D/\phi (\nu ))^{-1-\varepsilon }(\phi (\nu ))^{-\varepsilon }=D^{-1-\varepsilon }\phi (\nu ). \end{aligned}$$

By (11) we have \(\phi (\nu ) \gg \Vert \nu \Vert ^{1-\varepsilon }\), and comparison gives \(\Vert \nu \Vert ^{1-\varepsilon }M \ll D^{1+\varepsilon }\). But for \(\varepsilon <1/3\) this contradicts (26) or better gives an upper bound for D which implies everything (by the usual Northcott).
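To make the role of \(\varepsilon <1/3\) explicit, one can combine the last comparison with the upper bound \(D \ll (\Vert \nu \Vert M)^{1/2}\) obtained above. This is only a sketch of the bookkeeping, with implied constants independent of the point:

$$\begin{aligned} \Vert \nu \Vert ^{1-\varepsilon }M \ll D^{1+\varepsilon } \ll (\Vert \nu \Vert M)^{(1+\varepsilon )/2},~~~\mathrm{so}~~~\Vert \nu \Vert ^{(1-3\varepsilon )/2}M^{(1-\varepsilon )/2} \ll 1. \end{aligned}$$

Both exponents are positive when \(\varepsilon <1/3\), so \(\Vert \nu \Vert \) and M are bounded, and then so is D.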

If \(\xi \) is torsion, then \(\xi _1,\xi _2,\xi _3\) are, and Manin-Mumford on a suitable projection to two dimensions settles the thing.

Finally, what if some non-trivial \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is constant on C? We can assume the coefficients are coprime in \({{\mathcal {R}}}\) and then use \(\mathrm{GL}_3({{{\mathcal {R}}}})\) to assume that it is \(x_3\) that is some constant, which we can call \(\xi _3\). Then \(\xi _3\) is non-torsion. Now the thing reduces to Mordell-Lang in two dimensions, that is, a group of finite \(\mathbf{Q}\)-dimension (in fact dimension 1). This was done by Ghioca [27]. But we can too. Namely, we can just eliminate \(\xi _3\) from the two relations between \(\xi _1,\xi _2,\xi _3\) to get a relation between \(\xi _1,\xi _2\) on the projected curve \(C'\) in \(\mathbf{G}_{\mathrm{a}}^2\). By Proposition 1 for this projection we see that \(\xi _1,\xi _2\) have bounded heights (unless some non-trivial \(\phi =\kappa _1x_1+\kappa _2x_2\) is constant on \(C'\)); and of course so does \(\xi _3\). We still have (24) and we can argue as before. And really finally, if some non-trivial \(\phi \) is constant on \(C'\) then it may as well be \(x_2=\xi _2\). But now \(\xi _2,\xi _3\) must be independent and we cannot have two relations. All this is exactly parallel to the situation over \(\overline{\mathbf{Q}}\).

Siegel’s Lemma

Denis [21] and others use an ad hoc version for function fields, rather as in the proof of our Lemma 1. We need a relative version. That for number fields involves discriminants. The correct analogue for function fields involves the genus and was found by Thunder [57].

We stick with our \(F_0=\mathbf{F}_p(t)\), and a finite extension F of \(F_0\). We have a genus g(F) (see [3] for this and much more). In [57] (pp.148,150) a projective absolute height \(h_\mathbf{P}\) on \(F^n\) is defined; it involves the integer

$$\begin{aligned} {{\mathfrak {m}}}(F,F_0)={[F:F_0] \over [\mathbf{F}:\mathbf{F}_p]} \end{aligned}$$

where \(\mathbf{F}\) is the algebraic closure of \(\mathbf{F}_p\) in F. It is not hard to see that it coincides with the natural extension of (12) to non-zero vectors by

$$\begin{aligned} h_\mathbf{P}(\xi _1,\ldots ,\xi _n)={1 \over D}\sum _vD_v\log \max \{|\xi _1|_v,\ldots ,|\xi _n|_v\}. \end{aligned}$$

Also it is convenient to define the height of the zero vector as zero.

We have the following extension of Corollary 3 of [57] (p.149), in which a condition about full rank is eliminated. Of course we cannot afford the luxury of a Grassmannian height anymore.

Lemma 2

Let \(F_*\) be a finite extension of \(F_0\) and let \(F_{\mathrm{s}}\) be a separable extension of \(F_*\) of degree r. Let M be a matrix with \(m \ge 1\) rows \(R_1,\ldots ,R_m\) and n columns with entries in \(F_{\mathrm{s}}\). If \(l=n-rm \ge 1\) then there are linearly independent rows \(\mathbf{b}=\mathbf{b}_1,\ldots ,\mathbf{b}_{l}\) in \(F_*^n\) with \(M\mathbf{b}^t=0\) that satisfy

$$\begin{aligned} \sum _{\nu =1}^{l}h_\mathbf{P}(\mathbf{b}_\nu ) ~\le ~ rm \max _{\mu = 1, \ldots ,m} h_\mathbf{P}(R_\mu )+l {g+{{\mathfrak {m}}}\over {\mathfrak {m}}} \end{aligned}$$

where \(g=g(F_*)\) and \({{\mathfrak {m}}}={{\mathfrak {m}}}(F_*,F_0).\)


If \(M=0\) the result is obvious (for example with any l standard basis elements of \(F_*^n\)), so we assume \(M \ne 0\). We follow the argument of Bombieri and Gubler [5] (pp.75,79,80). For that we define \(h_\mathbf{P}({M'})\) for any matrix \(M'\) with \(m'\) rows and \(n'\) columns and \(m' < n'\) of rank \(s \ge 1\) to be the Grassmannian height \(h_\mathbf{P}({\hat{M}})\) as in [57] (p.151), where \({\hat{M}}\) consists of any s independent rows of \(M'\). This is the analogue of the definition in [5] (p.75).

Pick a basis \((\lambda _1,\ldots ,\lambda _r)\) of \(F_{\mathrm{s}}/F_*\) and write \(M=\lambda _1M_1+\cdots +\lambda _rM_r\) with \(M_1,\ldots ,M_r\) over \(F_*\). Let \(\sigma _1,\ldots ,\sigma _r\) be the different (here we use separability) embeddings of \(F_{\mathrm{s}}\), fixing \(F_*\), into the algebraic closure of \(F_*\). Then we check that \(\sigma (M)=\Lambda M_\sigma \) for

$$\begin{aligned} \sigma (M)=\begin{pmatrix}\sigma _1(M) \\ \vdots \\ \sigma _r(M)\end{pmatrix},~~M_\sigma = \begin{pmatrix}M_1 \\ \vdots \\ M_r\end{pmatrix} \end{aligned}$$

with invertible \(\Lambda \).

Let \(j \ge l\) be the dimension of the space B of all \(\mathbf{b}\) in \(F_*^n\) with \(M\mathbf{b}^t=0\). Thus we see that \(M_\sigma \) has rank \(n-j>0\). Thus \(\sigma (M)\) too. Let \({\hat{M}}_{\sigma }\) be a submatrix of \(M_\sigma \) consisting of \(n-j\) independent rows. By Corollary 2 of [57] (p.148) there are linearly independent \(\mathbf{b}_1,\ldots ,\mathbf{b}_{j}\) in \(F_*^n\) with \({\hat{M}}_{\sigma }{} \mathbf{b}^t=0\) that satisfy

$$\begin{aligned} \sum _{\nu =1}^{j}h_\mathbf{P}(\mathbf{b}_\nu ) ~\le ~ h_\mathbf{P}({\hat{M}}_{\sigma })+jG \end{aligned}$$

with \(G=({g-1+{{\mathfrak {m}}}) / {\mathfrak {m}}}\). These form of course a basis of B. Also by definition \(h_\mathbf{P}({\hat{M}}_{\sigma })=h_\mathbf{P}(M_\sigma )\) and it is not difficult to see that \(h_\mathbf{P}(M_\sigma )=h_\mathbf{P}(\sigma (M))\). We can calculate this last height by choosing any \(n-j\) independent rows. These have the form \(R=\sigma _\rho (R_\mu )\) with

$$\begin{aligned} h_\mathbf{P}(R)=h_\mathbf{P}(R_\mu ) \le \max _{\mu = 1, \ldots ,m} h_\mathbf{P}(R_\mu )=H \end{aligned}$$

(say), and because \(h_\mathbf{P}({\hat{M}}_\sigma )\) is at most the sum of the heights of its rows we get

$$\begin{aligned} \sum _{\nu =1}^{j}h_\mathbf{P}(\mathbf{b}_\nu ) ~\le ~ (n-j)H+jG. \end{aligned}$$

Ordering by increasing height we deduce

$$\begin{aligned} \sum _{\nu =1}^{l}h_\mathbf{P}(\mathbf{b}_\nu ) ~\le ~ {l \over j}((n-j)H+jG)=l{n-j \over j}H+lG, \end{aligned}$$

and finally recalling \(j \ge l\) we get the required result. \(\square \)

We next drop to a single solution.

Lemma 3

Let \(F_*\) be a finite extension of \(F_0\) and let \(F_{\mathrm{s}}\) be a separable extension of \(F_*\) of degree r. Let M be a matrix with \(m \ge 1\) rows \(R_1,\ldots ,R_m\) and n columns with entries in \(F_{\mathrm{s}}\). If \(n>rm\) then there is non-zero \(\mathbf{b}\) in \(F_*^n\) with \(M\mathbf{b}^t=0\) that satisfies

$$\begin{aligned} h_\mathbf{P}(\mathbf{b}) ~\le ~ {rm \over n-rm}\max _{\mu = 1, \ldots ,m} h_\mathbf{P}(R_\mu )+{g+{{\mathfrak {m}}} \over {{\mathfrak {m}}}} \end{aligned}$$

where \(g=g(F_*)\) and \({{\mathfrak {m}}}={{\mathfrak {m}}}(F_*,F_0).\)


This follows from Lemma 2 after taking the \(\mathbf{b}_\nu \) with smallest height. \(\square \)

Finally we allow inseparable extensions. This seems to be new.

Lemma 4

Let \(F_*\) be a finite extension of \(F_0\) and let F be an extension of \(F_*\) of degree d and inseparable degree q. Let M be a matrix with \(m \ge 1\) rows \(R_1,\ldots ,R_m\) and n columns with entries in F. If \(qn>dm\) then there is non-zero \(\mathbf{b}\) in \((F_*^{1/q})^n\) with \(M\mathbf{b}^t=0\) that satisfies

$$\begin{aligned} h_\mathbf{P}(\mathbf{b}) ~\le ~ {dm \over qn-dm}\max _{\mu = 1, \ldots ,m} h_\mathbf{P}(R_\mu )+{g+{{\mathfrak {m}}} \over {{\mathfrak {m}}}} \end{aligned}$$

where \(g=g(F_*)\) and \({{\mathfrak {m}}}={{\mathfrak {m}}}(F_*,F_0).\)


We first solve \(M^q\mathbf{c}^t=0\) using Lemma 3, where \(M^q\) is not the q-th power but just has entries the q-th powers of those of M. Now we are over the separable extension \(F^q\) of degree \(r=d/q\) and so there is a non-zero solution \(\mathbf{c}\) in \(F_*^n\) with

$$\begin{aligned} h_\mathbf{P}(\mathbf{c}) ~\le ~ {rm \over n-rm}\max _{\mu = 1, \ldots ,m} qh_\mathbf{P}(R_\mu )+{g+{{\mathfrak {m}}} \over {{\mathfrak {m}}}}. \end{aligned}$$

To finish we take \(\mathbf{b}=\mathbf{c}^{1/q}\), so that the height gets divided by q. \(\square \)
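The passage from \(\mathbf{c}\) to \(\mathbf{b}\) can be spelled out as follows; this is a sketch of the arithmetic only. Since \(r=d/q\) we have

$$\begin{aligned} {rm \over n-rm}={dm/q \over n-dm/q}={dm \over qn-dm}, \end{aligned}$$

and dividing the bound for \(h_\mathbf{P}(\mathbf{c})\) by q gives

$$\begin{aligned} h_\mathbf{P}(\mathbf{b})={1 \over q}h_\mathbf{P}(\mathbf{c}) ~\le ~ {dm \over qn-dm}\max _{\mu = 1, \ldots ,m} h_\mathbf{P}(R_\mu )+{1 \over q}{g+{{\mathfrak {m}}} \over {{\mathfrak {m}}}}, \end{aligned}$$

which is at most the bound stated in Lemma 4 because \(q \ge 1\).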

More preliminaries

Some of these are analogues of those occurring in the original Dobrowolski proof [23]. We write \({{{\mathcal {O}}}}_0\) for the ring of integers \(\mathbf{F}_p[t]\) of \(F_0=\mathbf{F}_p(t)\).

Lemma 5

For \(j \ge 8\) the number \(r_j\) of monic irreducible Q in \({{{\mathcal {O}}}}_0\) of degree j satisfies

$$\begin{aligned} {1 \over 2}{p^j \over j} \le r_j \le 2{p^j\over j}. \end{aligned}$$


The upper bound is (10) and the lower bound follows by similar arguments. Of course the Prime Number Theorem for \({{{\mathcal {O}}}}_0\) suffices for our purpose. \(\square \)
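The bounds of Lemma 5 are easy to test numerically. The sketch below does not follow the argument via (10); instead it assumes the classical formula of Gauss, \(jr_j=\sum _{d | j}\mu (d)p^{j/d}\), for the number of monic irreducible polynomials of degree j over \(\mathbf{F}_p\), and the function names are ours.

```python
def mobius(n):
    """Moebius function by trial division."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0          # square factor
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result


def count_irreducibles(p, j):
    """Number r_j of monic irreducible polynomials of degree j in F_p[t]."""
    total = sum(mobius(d) * p ** (j // d) for d in range(1, j + 1) if j % d == 0)
    return total // j


def lemma5_bounds_hold(p, j):
    """Check (1/2) p^j / j <= r_j <= 2 p^j / j using integer arithmetic."""
    r = count_irreducibles(p, j)
    return p ** j <= 2 * j * r and j * r <= 2 * p ** j


if __name__ == "__main__":
    print(count_irreducibles(2, 8))
    print(all(lemma5_bounds_hold(p, j) for p in (2, 3, 5) for j in range(8, 16)))
```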

From now on we find it more convenient to use the notation \(\xi ^Q\) instead of \(Q({{{\mathcal {C}}}})\xi \) for our Carlitz \({{{\mathcal {C}}}}\).

Lemma 6

Let \(F_*\) be an extension of \(F_0\) and let F be an extension of \(F_*\) of degree d. Suppose \(\xi \ne 0\) in F is not torsion.

(a) As Q runs over all monic irreducible polynomials in \({{{\mathcal {O}}}}_0\), the \(\xi ^Q\) are all non-conjugate over \(F_*\).

(b) As Q runs over all monic irreducible polynomials in \({{{\mathcal {O}}}}_0\), we have \([F_*(\xi ^Q):F_*]=[F_*(\xi ):F_*]\) with at most \(({\log d})/(\log 2)\) exceptions.


This is essentially Lemma 4 of [21] (p.219), where it is merely said that the proof is identical to Dobrowolski’s (and was for the separable case). We supply some details.

For (a) suppose \(\xi ^{Q_1},\xi ^{Q_2}\) are conjugate. Then so are \((\xi ^{Q_1})^{Q_1}=\xi ^{Q_1^2}\) and \((\xi ^{Q_2})^{Q_1}=\xi ^{Q_1Q_2}\), and so are \((\xi ^{Q_1})^{Q_2}=\xi ^{Q_2Q_1}=\xi ^{Q_1Q_2}\) and \((\xi ^{Q_2})^{Q_2}=\xi ^{Q_2^2}\). Iterating we find that the \(d+1\) elements \(\xi ^{Q_1^d},\xi ^{Q_1^{d-1}Q_2},\ldots ,\xi ^{Q_2^d}\) are all conjugate over \(F_*\). So two must coincide. As \(\xi \) is non-torsion this leads to \(Q_1^l=Q_2^l\) for some positive integer l. And as \(Q_1,Q_2\) are monic this leads to \(Q_1=Q_2\).

For (b) suppose \(Q_1,\ldots ,Q_m\) are different with

$$\begin{aligned}{}[F_*(\xi ^{Q_i}):F_*] \ne [F_*(\xi ):F_*]~~(i=1,\ldots ,m). \end{aligned}$$

Write for brevity \(Q=Q_{i}\) and \(R=Q_1\cdots Q_{i-1}\) (with \(R=1\) if \(i=1\)). Then \(F_*(\xi ^{RQ})\) lies in \(F_*(\xi ^{R})\) and we claim that equality is impossible. Otherwise \(\xi ^R\) in \(F_*(\xi ^{RQ})\) is in \(F_*(\xi ^{Q})\). Now there are U, V in \({{{\mathcal {O}}}}_0\) with \(UR+VQ=1\), and then \(\xi =(\xi ^R)^U+(\xi ^Q)^V\) also lies in \(F_*(\xi ^{Q})=F_*(\xi ^{Q_i})\). But this would contradict (28).

Taking \(i=m,\ldots ,1\) we deduce that the fields

$$\begin{aligned} F_*(\xi ^{Q_1\cdots Q_m}),~F_*(\xi ^{Q_1\cdots Q_{m-1}}),\ldots ,~F_*(\xi ^{Q_1}),~F_*(\xi ) \end{aligned}$$

form a strictly increasing chain. Thus

$$\begin{aligned} 2^{m} \le [F_*(\xi ):F_*(\xi ^{Q_1\cdots Q_m})] \le [F_*(\xi ):F_*] \le d \end{aligned}$$

and now (b) is clear. \(\square \)

The next result reflects the need in some of the previous literature to distinguish between small and large ramification \(e_v\) as in (12).

Lemma 7

Let F be an extension of \(F_0\) of degree D, let Q be monic irreducible in \({{{\mathcal {O}}}}_0\), let \(\xi \) be in F, and let E be the set of v on F dividing Q such that \(\xi \) is not v-integral. Then

$$\begin{aligned} {\hat{h}}(\xi ) \ge {1 \over D}{\log \Vert Q\Vert \over \log p}\sum _{v \in {E}}{ D_v \over e_v}. \end{aligned}$$


If Q has degree n, the value group of the valuation on \(F_0\) corresponding to Q is generated by \(g=e^n=\Vert Q\Vert ^{1/\log p}\). So that of v by \(g^{1 / e_v}\). Thus for each v in E we have \(|\xi |_v \ge g^{1 / e_v}\). It follows easily that \(|\xi ^{t^m}|_v \ge g^{p^m/e_v}\) for any positive integer m. Thus \(h(\xi ^{t^m})\ge D^{-1}p^m(\log g)\sum _{v \in {E}}D_v/e_v\). So

$$\begin{aligned} p^m{\hat{h}}(\xi )={\hat{h}}(\xi ^{t^m}) \ge -3+{1 \over D}p^m(\log g)\sum _{v \in {E}}{D_v \over e_v} \end{aligned}$$

by (16). Making m tend to infinity gives the result. \(\square \)

From now on the intermediate field \(F_*\) will be cyclotomic over \(F_0\), so it has the form \(F_{\mathrm{c}}=F_0(\zeta )\), where \(\zeta \) has order N for some monic N in \({{{\mathcal {O}}}}_0\). Its degree over \(F_0\) is \(n_{\mathrm{c}}=\phi (N)\). For any monic Q in \({{{\mathcal {O}}}}_0\) prime to N there is a Frobenius automorphism \(\sigma =\sigma _Q\) of \(F_{\mathrm{c}}\) over \(F_0\) such that \(\sigma (\zeta )=\zeta ^Q\). Write \({{{\mathcal {O}}}}_{\mathrm{c}}={{{\mathcal {O}}}}_0[\zeta ]\); this is in fact the ring of integers of \(F_{\mathrm{c}}\), that is, the integral closure of \({{{\mathcal {O}}}}_0\) in \(F_{\mathrm{c}}\) (see for example [56] p.82).

Lemma 8

Suppose Q is monic irreducible in \({{{\mathcal {O}}}}_0\) prime to N and \(\Delta \) is in \({{{\mathcal {O}}}}_{\mathrm{c}}[X]\).

(a) We have \(\Delta ^\sigma (X^Q) \equiv \Delta ^\sigma (X^{\Vert Q\Vert })\) mod Q in \({{{\mathcal {O}}}}_{\mathrm{c}}[X]\).

(b) We have \(\Delta (X)^{\Vert Q\Vert } \equiv \Delta ^\sigma (X^{\Vert Q\Vert })\) mod Q in \({{{\mathcal {O}}}}_{\mathrm{c}}[X]\).


Part (a) for \(\Delta =X\) is Lemma 3 of [21] (p.219) - and then it holds even in \(\mathbf{F}_p[X]\). It follows immediately for general \(\Delta \). For part (b) we need to note that every coefficient \(\alpha \) of \(\Delta \) has the form \({\Phi }(t,\zeta )\) for \({\Phi }(X,Y)\) in \(\mathbf{F}_p[X,Y]\), so \(\alpha ^\sigma ={\Phi }(t,\zeta ^Q)\) which is congruent to \({\Phi }(t,\zeta ^{\Vert Q\Vert })\) mod Q in \({{{\mathcal {O}}}}_{\mathrm{c}}\) by part (a) with \(\Delta ={\Phi }(t,X)\) and then by substituting \(X=\zeta \). This in turn is congruent to \({\Phi }(t^{\Vert Q\Vert },\zeta ^{\Vert Q\Vert })=\alpha ^{\Vert Q\Vert }\) mod Q in \({{{\mathcal {O}}}}_{\mathrm{c}}\) as Q divides \(t^{\Vert Q\Vert }-t\) in \({{{\mathcal {O}}}}_0\). \(\square \)
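The basic congruence \(X^Q \equiv X^{\Vert Q\Vert }\) mod Q behind part (a) can be tested directly for small Q. The following Python sketch is an illustration under our own conventions: it assumes \({{{\mathcal {C}}}}X=tX+X^p\), writes \(Q({{{\mathcal {C}}}})X=\sum _kc_kX^{p^k}\) with each \(c_k\) in \(\mathbf{F}_p[t]\) stored as a coefficient list (lowest degree first), and checks that the top coefficient is 1 and that all others are divisible by Q.

```python
def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f


def add(f, g, p):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % p
                 for i in range(n)])


def frob(f, p):
    """f^p for f in F_p[t]: the coefficient of t^i moves to t^(p*i)."""
    out = [0] * (p * (len(f) - 1) + 1) if f else []
    for i, a in enumerate(f):
        out[p * i] = a
    return out


def poly_mod(f, g, p):
    """Remainder of f modulo a non-zero g in F_p[t]."""
    f = trim(f[:])
    inv = pow(g[-1], -1, p)          # modular inverse, Python 3.8+
    while len(f) >= len(g):
        c = f[-1] * inv % p
        s = len(f) - len(g)
        for i, b in enumerate(g):
            f[s + i] = (f[s + i] - c * b) % p
        trim(f)
    return f


def carlitz_poly(Q, p):
    """Coefficients c_0, ..., c_n (in F_p[t]) of Q(C)X = sum c_k X^(p^k)."""
    total = [[] for _ in range(len(Q))]
    power = [[1]]                                   # C^0 X = X
    for q_i in Q:
        for k, c in enumerate(power):
            total[k] = add(total[k], [(q_i * a) % p for a in c], p)
        nxt = [[] for _ in range(len(power) + 1)]   # C(f) = t*f + f^p
        for k, c in enumerate(power):
            nxt[k] = add(nxt[k], [0] + c if c else [], p)
            nxt[k + 1] = add(nxt[k + 1], frob(c, p), p)
        power = nxt
    return total


def congruence_holds(Q, p):
    """True if Q(C)X is congruent to X^(p^deg Q) modulo Q."""
    coeffs = carlitz_poly(Q, p)
    return coeffs[-1] == [1] and all(poly_mod(c, Q, p) == [] for c in coeffs[:-1])


if __name__ == "__main__":
    print(congruence_holds([1, 0, 1], 3))   # Q = t^2 + 1 over F_3
    print(congruence_holds([0, 0, 1], 2))   # Q = t^2 over F_2, not irreducible
```

The second call shows that irreducibility of Q cannot simply be dropped: for \(Q=t^2\) and \(p=2\) the coefficient of \(X^2\) is \(t+t^2\), which is not divisible by \(t^2\).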

The following is an analogue of an estimate of Amoroso-David [1] (p.157) restricted to a single variable. Again F is a finite extension of \(F_{\mathrm{c}}\). For a polynomial \(\Phi \) in F[X] or F[X,Y] we write \(|{\Phi }|_v\) for the maximum of \(|f|_v\) as f runs over the coefficients (at first sight this could be confused with \(|Q|_v\) below - but in fact it is the same notation because Q, even though a polynomial in t, is already in F).

Lemma 9

Suppose \({\Omega }\) in \({{{\mathcal {O}}}}_{\mathrm{c}}[X]\) of degree at most M vanishes at some \(\xi \) in F to order at least S. Then for any monic irreducible Q in \({{{\mathcal {O}}}}_0\) prime to N and any valuation v on F dividing Q we have

$$\begin{aligned} |{\Omega }^\sigma (\xi ^Q)|_v \le |{\Omega }^\sigma |_v|Q|_v^{S}\max \{1,|\xi ^Q|_v\}^M. \end{aligned}$$


Let \(\Delta (X)=\alpha _0X^d+\cdots +\alpha _d\) be a minimal polynomial of \(\xi \) over \({{{\mathcal {O}}}}_{\mathrm{c}}\). We use Strong Approximation but not quite as in [1] (whose argument for one variable uses already a special case for two variables). In fact this allows us to assume that \(|\Delta |_w=1\) for all w on \(F_{\mathrm{c}}\) dividing Q. Namely for each such w there is \(i=i(w)\) with \(0<|\alpha _i|_w=\mu _w=|\Delta |_w\le 1\). Now by the Theorem of [17] (p.67 - see also first paragraph of proof) there is \(\beta \) in \(F_{\mathrm{c}}\) with \(|\beta -\alpha _i^{-1}|_w < \mu _w^{-1}\) for each w dividing Q and \(|\beta |_{w'} \le 1\) for all other \(w'\) on \(F_{\mathrm{c}}\) not dividing \(\infty \). The first of these implies \(|\beta |_w=\mu _w^{-1}\) (in particular \(\beta \ne 0\)) and so \(|\beta \Delta |_w=1\) for each w dividing Q; and by the second \(|\beta \Delta |_{w'}\le 1\) for all other \(w'\) not dividing \(\infty \). So \(\beta \alpha _0,\ldots ,\beta \alpha _d\) are all in \({{{\mathcal {O}}}}_{\mathrm{c}}\). Thus we just have to replace \(\Delta \) by \(\beta \Delta \).

In fact since \(|x^\sigma |_w=|x|_{w(\sigma )}\) for any x in \(F_{\mathrm{c}}\) and some other \(w(\sigma )\), we see that \(|\Delta ^\sigma |_w=1\) for all w dividing Q. In particular \(|\Delta ^\sigma |_v=1\) for our v as well.

Next \({\Omega }=\Delta ^S{\hat{\Omega }}\) for \({\hat{\Omega }}\) in \(F_{\mathrm{c}}[X]\) of degree at most \(M-dS\) so by Gauss’s Lemma we have \(|{\Omega }^\sigma |_v=|\Delta ^\sigma |_v^S|{\hat{\Omega }}^\sigma |_v=|{\hat{\Omega }}^\sigma |_v\).

Now \({\Omega }^\sigma (\xi ^Q)=\Delta ^\sigma (\xi ^Q)^S{\hat{\Omega }}^\sigma (\xi ^Q)\) and so

$$\begin{aligned} |{\Omega }^\sigma (\xi ^Q)|_v \le |{\Omega }^\sigma |_v|\Delta ^\sigma (\xi ^Q)|_v^{S}\max \{1,|\xi ^Q|_v\}^{M-dS}. \end{aligned}$$

By Lemma 8 we see that \(\Delta ^\sigma (X^Q)-\Delta (X)^{\Vert Q\Vert }=Q{\Xi }(X)\) for some \(\Xi \) in \({{{\mathcal {O}}}}_{\mathrm{c}}[X]\) of degree at most \(d\Vert Q\Vert \). Putting \(X=\xi \) we deduce \(\Delta ^\sigma (\xi ^Q)=Q{\Xi }(\xi )\) and so

$$\begin{aligned} |\Delta ^\sigma (\xi ^Q)|_v\le |Q|_v\max \{1,|\xi |_v\}^{d\Vert Q\Vert }=|Q|_v\max \{1,|\xi ^Q|_v\}^{d}. \end{aligned}$$

The result now follows from this and (29). \(\square \)

For the application of Lemma 4 we need information about \(g(F_{\mathrm{c}})\) and \({{\mathfrak {m}}}(F_{\mathrm{c}},F_0)\).

Lemma 10

(a) We have \({{\mathfrak {m}}}(F_{\mathrm{c}},F_0)=n_{\mathrm{c}}\).

(b) We have \(g(F_{\mathrm{c}}) \le 8000n_{\mathrm{c}}\log (n_{\mathrm{c}}+1)\).


For (a) we have by definition \({{\mathfrak {m}}}(F_{\mathrm{c}},F_0)={n_{\mathrm{c}} / [\mathbf{F}:\mathbf{F}_p]}\), where \(\mathbf{F}\) is the algebraic closure of \(\mathbf{F}_p\) in \(F_{\mathrm{c}}\). But according to Gebhardt [24] (p.91) \(\mathbf{F}=\mathbf{F}_p\), and the result follows.

For (b) we need a formula given by Keller [34] (see also [24] p.92). Namely

$$\begin{aligned} g(F_{\mathrm{c}})=1+{1 \over 2}n_{\mathrm{c}}\left( -2+{p-2 \over p-1}+\sum _{i=1}^r\delta _i\left( {e_iq_i-e_i-1 \over q_i-1}\right) \right) \end{aligned}$$

where \(N=\prod _{i=1}^rN_i^{e_i}\) for distinct monic irreducible \(N_1,\ldots ,N_r\) of degrees \(\delta _1,\ldots ,\delta _r\) with \(q_1=p^{\delta _1},\ldots ,q_r=p^{\delta _r}\). Clearly this is at most \(n_{\mathrm{c}}\sum _{i=1}^r\delta _ie_i=\phi (N)n\) for the degree n of N. Now the result follows without difficulty from (11) above, using \(n \le 8000\log (1+2^n/(55n^4))\). \(\square \)
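The genus formula is simple to evaluate. The sketch below is an illustration that takes the formula exactly as displayed above, for a non-constant N described only by the pairs \((\delta _i,e_i)\); it uses exact rational arithmetic and also returns \(n_{\mathrm{c}}=\phi (N)=\prod _i(q_i^{e_i}-q_i^{e_i-1})\). The function name and input format are ours.

```python
from fractions import Fraction


def cyclotomic_degree_and_genus(p, factors):
    """factors: list of (delta_i, e_i) with N = prod N_i^{e_i}, deg N_i = delta_i.

    Returns (n_c, g) with n_c = phi(N) and g given by the displayed formula.
    Assumes N is non-constant, i.e. factors is non-empty.
    """
    n_c = 1
    bracket = Fraction(-2) + Fraction(p - 2, p - 1)
    for delta, e in factors:
        q = p ** delta
        n_c *= q ** e - q ** (e - 1)
        bracket += delta * Fraction(e * q - e - 1, q - 1)
    genus = 1 + Fraction(n_c, 2) * bracket
    return n_c, genus


if __name__ == "__main__":
    # p = 3 and N irreducible of degree 2, for example N = t^2 + 1
    n_c, g = cyclotomic_degree_and_genus(3, [(2, 1)])
    print(n_c, g, g <= n_c * 2)
```

In each example the value is compared with the cruder bound \(n_{\mathrm{c}}\sum _i\delta _ie_i\) used in the proof.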

Proof of Proposition 2

We may assume that \(F_{\mathrm{c}}(\xi )=F\) because making F smaller decreases d. We may also assume \(D=[F:F_0] \ge 16\) from the remarks in [21] (p.218). We continue the notation \(F_{\mathrm{c}}=F_0(\zeta )\) for \(\zeta \) of order N. Thus \(D=d\phi (N)\).

We will suppose that

$$\begin{aligned} {\hat{h}}(\xi ) \le C^{-11}d^{-1}{(\log D)^{-3} \over (\log \log D)^{-2}}. \end{aligned}$$

It then suffices to deduce a contradiction if the constant C (which cannot possibly be mistaken for our curve C) is sufficiently large as a function of p. Generally below c will denote various positive quantities depending only on p.

We use the Carlitz exponential function

$$\begin{aligned} e(z)=\sum _{i=0}^\infty {z^{p^i} \over A(i)} \end{aligned}$$

where \(A(0)=1\) and

$$\begin{aligned} A(i)=\prod _{j=1}^i(t^{p^i}-t^{p^{i-j}})=\prod _{j=1}^i(t^{p^j}-t)^{p^{i-j}} ~~(i=1,2,\ldots ) \end{aligned}$$

can be taken as the \(a_i\) of Lemma 2(ii) of [21] (p.218). We pick any u with \(e(u)=\xi \).
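The two product expressions for \(A(i)\) agree because \(t^{p^i}-t^{p^{i-j}}=(t^{p^j}-t)^{p^{i-j}}\) over \(\mathbf{F}_p\). As a sanity check, and only as an illustration with our own function names, the following Python sketch multiplies out both products in \(\mathbf{F}_p[t]\) (coefficient lists, lowest degree first) and also confirms that \(A(i)\) has degree \(ip^i\).

```python
def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f


def mul(f, g, p):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)


def monomial_diff(a, b, p):
    """The polynomial t^a - t^b in F_p[t], for a > b >= 0."""
    out = [0] * (a + 1)
    out[a] = 1
    out[b] = (-1) % p
    return out


def A_first(i, p):
    """prod_{j=1}^{i} (t^(p^i) - t^(p^(i-j)))."""
    result = [1]
    for j in range(1, i + 1):
        result = mul(result, monomial_diff(p ** i, p ** (i - j), p), p)
    return result


def A_second(i, p):
    """prod_{j=1}^{i} (t^(p^j) - t)^(p^(i-j))."""
    result = [1]
    for j in range(1, i + 1):
        factor = monomial_diff(p ** j, 1, p)
        for _ in range(p ** (i - j)):
            result = mul(result, factor, p)
    return result


if __name__ == "__main__":
    for p, i in ((2, 3), (3, 2), (5, 2)):
        a = A_first(i, p)
        print(p, i, a == A_second(i, p), len(a) - 1 == i * p ** i)
```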

It is well-known that \(F_{\mathrm{c}}\) is separable over \(F_0\). Actually it follows at once from the identity

$$\begin{aligned} {\partial \over \partial X}X^N=N \end{aligned}$$

which we shall use later.

Let q be the inseparable degree of F over \(F_{\mathrm{c}}\). We fix any monic irreducible \({Q_0}\) in \({{{\mathcal {O}}}}_0=\mathbf{F}_p[t]\) satisfying

$$\begin{aligned} \Vert {Q_0}\Vert \le C^4{d \log D \over q\log \log D} < p\Vert {Q_0}\Vert \end{aligned}$$

and we define

$$\begin{aligned} L=\left[ C^3{d \log D \over q\log \log D}\right] ,~~T=\left[ C^4{d \log D \over q\log \log D}\right] . \end{aligned}$$

Now for a non-zero polynomial \(\Phi \) in F[X,Y] we define the height

$$\begin{aligned} h_\mathbf{P}({\Phi })={1 \over D}\sum _vD_v\log |{\Phi }|_v \end{aligned}$$

as in (27). Later we will have to be slightly careful about the projectivity. We also use the term hyperzero to remind the reader that we are considering Taylor series expansions rather than actually differentiating.

Lemma 11

There is a non-zero polynomial \({\tilde{\Phi }}(X,Y)\) of degree at most L in each of X and Y, such that \({\Phi }={\tilde{\Phi }}^q\) is in \({{{\mathcal {O}}}}_{\mathrm{c}}[X,Y]\) with \(h_\mathbf{P}({\Phi }) \le cC^3d \log D\), and such that the function

$$\begin{aligned} \varphi (z)={\Phi }(e(z),e({Q_0}z)) \end{aligned}$$

has a hyperzero of order at least qT at \(z=u\).



With

$$\begin{aligned} {\tilde{\Phi }}(X,Y)=\sum _{i=0}^L\sum _{j=0}^La_{ij}X^iY^j,~~{{\tilde{\varphi }}}(z)={\tilde{\Phi }}(e(z),e({Q_0}z)) \end{aligned}$$

and \(z=w+u\) we have

$$\begin{aligned} {{\tilde{\varphi }}}(w+u)=\sum _{i=0}^L\sum _{j=0}^La_{ij}(e(w)+\xi )^i(e({Q_0}w)+\xi ^{Q_0})^j=\sum _{k=0}^\infty b_kw^k \end{aligned}$$

and we start by solving \(b_0=b_1=\cdots =b_{T-1}=0\). These are \(m=T\) linear equations in the \(n=(L+1)^2\) unknowns \(a_{ij}\) over F, to be solved first in \(F_{\mathrm{c}}^{1/q}\) as in Lemma 4. We note from (35) that

$$\begin{aligned} {dm \over qn} \le cC^{-2}{\log \log D \over \log D} \end{aligned}$$

in readiness for an application of this lemma.

Here (31) gives

$$\begin{aligned} (e(w)+\xi )^i=\sum _{r=0}^i{i \atopwithdelims ()r}\xi ^{i-r}\sum _{i_1=0}^\infty \cdots \sum _{i_r=0}^\infty {w^{p^{i_1}+\cdots +p^{i_r}} \over A(i_1)\cdots A(i_r)}. \end{aligned}$$


Similarly

$$\begin{aligned} (e({Q_0}w)+\xi ^{Q_0})^j=\sum _{s=0}^j{j \atopwithdelims ()s}(\xi ^{Q_0})^{j-s}\sum _{j_1=0}^\infty \cdots \sum _{j_s=0}^\infty {({Q_0}w)^{p^{j_1}+\cdots +p^{j_s}} \over A(j_1)\cdots A(j_s)}, \end{aligned}$$

so \(b_l\) involves an apparent denominator (forgetting \(\xi ,\xi ^{Q_0}\)), namely the lowest common multiple of all

$$\begin{aligned} A(i_1)\cdots A(i_r)A(j_1)\cdots A(j_s) \end{aligned}$$

subject to

$$\begin{aligned} p^{i_1}+\cdots +p^{i_r}+p^{j_1}+\cdots +p^{j_s}=l. \end{aligned}$$

It is clear from (32) that \(A(i_1)\cdots A(i_r)\) for \(p^{i_1}+\cdots +p^{i_r} \le p^I\) contains the factor \(t^{p^j}-t\) at most \(p^{i_1-j}+\cdots +p^{i_r-j} \le p^{I-j}\) times (\(j=1,\ldots ,I\)), so their lowest common multiple has degree at most \(\sum _{j=1}^Ip^{I-j}p^j=Ip^I\). Thus we get a contribution \(cT\log T\) to the height of the rows of the matrix of linear equations in the \(a_{ij}\); here

$$\begin{aligned} T\log T \le cC^{5}{d (\log D)^2 \over q\log \log D}. \end{aligned}$$

Similarly for the \(A(j_1)\cdots A(j_s)\).
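The counting behind the bound \(Ip^I\) is elementary and can be confirmed by brute force. The sketch below is an illustration only: it enumerates all ways of writing an integer \(l \le p^I\) as a sum of powers of p, and records the largest formal multiplicity of the factor \(t^{p^j}-t\) read off from the second product in (32). The function names and the small values of p and I are assumptions made for the example.

```python
def p_power_partitions(total, p, max_exp):
    """Yield non-increasing exponent tuples (e_1, ..., e_r) with p^e_1 + ... + p^e_r == total."""
    if total == 0:
        yield ()
        return
    for e in range(max_exp, -1, -1):
        if p**e <= total:
            for rest in p_power_partitions(total - p**e, p, e):
                yield (e,) + rest

def max_multiplicities(p, I):
    """Largest formal multiplicity of t^{p^j} - t in A(i_1)...A(i_r), over all sums <= p^I."""
    best = {j: 0 for j in range(1, I + 1)}
    for total in range(1, p**I + 1):
        for parts in p_power_partitions(total, p, I):
            for j in best:
                # A(i) contains (t^{p^j} - t) formally p^{i-j} times when j <= i
                mult = sum(p**(e - j) for e in parts if e >= j)
                best[j] = max(best[j], mult)
    return best

def lcm_degree(p, I):
    """Degree of the formal lowest common multiple built from these multiplicities."""
    return sum(m * p**j for j, m in max_multiplicities(p, I).items())

for p, I in [(2, 3), (3, 2), (3, 3)]:
    print(p, I, max_multiplicities(p, I), lcm_degree(p, I), I * p**I)
```

In these cases the maximum multiplicity is exactly \(p^{I-j}\) (attained by the single term \(p^I\)), so the degree bound \(Ip^I\) is attained.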

Another contribution comes from the \(\xi \) in (37). However \(h(\xi ) \le 3+{\hat{h}}(\xi ) \le 4\) by (16), so we get only cL from this; here

$$\begin{aligned} L \le C^3{d \log D \over q\log \log D}. \end{aligned}$$

Similarly in (38) we have

$$\begin{aligned} h(\xi ^{Q_0}) \le 3+{\hat{h}}(\xi ^{Q_0})=3+\Vert {Q_0}\Vert {\hat{h}}(\xi ) \le 4 \end{aligned}$$

as well.

Finally the \({Q_0}^{p^{j_1}+\cdots +p^{j_s}}\) in (38) contributes \(cT\log \Vert {Q_0}\Vert \); here

$$\begin{aligned} T\log \Vert {Q_0}\Vert \le cC^5{d (\log D)^2 \over q\log \log D}. \end{aligned}$$

Taking into account (41), (42), (44) and not forgetting (36), we get using Lemma 4 (the extra \({g / {\mathfrak {m}}}\) there is at most \(c\log D\) by Lemma 10) \({\tilde{\Phi }}\) of degree at most L with coefficients in \(F_{\mathrm{c}}^{1/q}\) of projective height at most \(cC^3(d/q)\log D\), such that \({{\tilde{\varphi }}}\) has a zero of order at least T. The present lemma follows by raising to the power q, a dirty cheap trick which one might well think very wasteful. At first the \(a_{ij}^q\) are in \(F_{\mathrm{c}}=F_0(\zeta )\) but we can clear denominators to get them into \({{{\mathcal {O}}}}_{\mathrm{c}}={{{\mathcal {O}}}}_0[\zeta ]\). \(\square \)

We next define

$$\begin{aligned} T_1=\left[ C^2d\right] ; \end{aligned}$$

this is quite a bit smaller than qT because later estimates as in the proof of Lemma 11 will not be helped by a small Siegel exponent as in (36). We also fix n satisfying

$$\begin{aligned} p^n \le C^7{(\log D)^2 \over \log \log D}<p^{n+1}. \end{aligned}$$

Lemma 12

For every monic irreducible Q in \({{{\mathcal {O}}}}_0\) prime to N of degree n and \(\sigma =\sigma _Q\) the function \(\varphi _\sigma (z)={\Phi }^\sigma (e(z),e({Q_0}z))\) has a hyperzero of order at least \(T_1\) at \(z=Qu\).


We show by induction on k that there is a hyperzero of order at least \(k~(k=0,1,\ldots ,T_1)\).

The case \(k=0\) is empty, so we assume it holds up to \(k-1\) for some k with \(1 \le k \le T_1\). We can write \(\varphi (z)={\Psi }(e(z))\) for \({\Psi }(X)\) in \({{{\mathcal {O}}}}_{\mathrm{c}}[X]\) of degree at most \(M=qL+qL\Vert {Q_0}\Vert \) (incidentally it is the second term here that forces our non-integrality considerations). It follows from Lemma 11 that \(\Psi \) vanishes at \(\xi \) to order at least qT. So the kth hyperderivative \({\Omega }={\Psi }^{[k]}\) vanishes at \(\xi \) to order at least \(T'=qT-k \ge qT/2\). Let V be the set of valuations v on F dividing Q such that \(\xi \) and so \(\xi ^Q\) is v-integral. For these we have \(|{\Psi }^\sigma |_v \le |{\Phi }^\sigma |_v\) and clearly also \(|{\Omega }^\sigma |_v \le |{\Psi }^\sigma |_v\). Thus from Lemma 9 we deduce for \(\eta ={\Omega }^\sigma (\xi ^Q)\) the estimate

$$\begin{aligned} |\eta |_v \le |{\Phi }^\sigma |_v|Q|_v^{T'}\le |{\Phi }^\sigma |_v|Q|_v^{qT/2} \end{aligned}$$

still for v in V.

For v on F not in V we argue analytically, using our induction on k. From \(\varphi (z)={\Psi }(e(z))\) and the fact that the z-derivative of e(z) is 1, we deduce \(\varphi ^{[k]}(z)={\Psi }^{[k]}(e(z))+\cdots \), where the missing terms involve lower hyperderivatives of \(\Psi \). Applying \(\sigma \), putting \(z=Qu\) and using our induction we see that \(\eta \) is the k-th Taylor coefficient of \(\varphi _\sigma (z)={\Phi }^\sigma (e(z),e({Q_0}z))\) at Qu. Estimating as we did in the proof of Lemma 11 with the analogues of (37) and (38) for \(\Phi \) (instead of \({\tilde{\Phi }}\)) we get

$$\begin{aligned} |\eta |_v \le |{\Phi }^\sigma |_v{{\mathfrak {M}}}_v{{\mathfrak {N}}}_v{{\mathfrak {Q}}}_v{{\mathfrak {A}}}_v \end{aligned}$$


with

$$\begin{aligned} {{\mathfrak {M}}}_v=\max \{1,|\xi ^Q|_v\}^{qL},~~{{\mathfrak {N}}}_v=\max \{1,|\xi ^{{Q_0}Q}|_v\}^{qL},~~{{\mathfrak {Q}}}_v=\max \{1,|{Q_0}|_v\}^k \end{aligned}$$

and

$$\begin{aligned} {{\mathfrak {A}}}_v=\max \left\{ 1,\max \left\{ \left| {1 \over A}\right| _v\right\} \right\} \end{aligned}$$

the inner maximum running over all A in (39) subject to (40).

We are trying to prove \(\eta =0\). If \(\eta \ne 0\) then the sum S of \(D^{-1}D_v\log |\eta |_v\) over all v on F should be zero by the Product Formula. We will deduce a contradiction. By (46) and (47)

$$\begin{aligned} S \le h_\mathbf{P}({\Phi }^\sigma )+S_0+S_{{\mathfrak {M}}}+S_{{\mathfrak {N}}}+S_{{\mathfrak {Q}}}+S_{{\mathfrak {A}}} \end{aligned}$$


where

$$\begin{aligned} S_0={1 \over D}{qT \over 2}\sum _{v \in {V}}D_v\log |Q|_v \end{aligned}$$

and the last four terms correspond to sums with \({{\mathfrak {M}}}_v,{{\mathfrak {N}}}_v,{{\mathfrak {Q}}}_v,{{\mathfrak {A}}}_v\) over all v on F not in V. The first three of the latter are easily estimated. We find

$$\begin{aligned} S_{{\mathfrak {M}}} \le qLh(\xi ^Q) \le 4qL \le 4C^3d\log D \end{aligned}$$

as in (42) and (43); and even the same for \(S_{{\mathfrak {N}}}\), as \(\Vert {Q_0}\Vert \Vert Q\Vert {\hat{h}}(\xi ) \le 1/q \le 1\). Also

$$\begin{aligned} S_{{\mathfrak {Q}}} \le kh({Q_0}) =k{\log \Vert {Q_0}\Vert \over \log p} \le cC^3d\log D \end{aligned}$$

instead of (44).

It would be tedious to estimate each \({{\mathfrak {A}}}_v\) explicitly. But \(S_{{\mathfrak {A}}}\) is at most the height of the corresponding vector of 1 with 1/A in (48), so the calculation can be done in \(F_0\), which we already did in (41), now getting \(T_1\log T_1 \le cC^3d\log D\) instead.

Thus using Lemma 11 to estimate \(h_\mathbf{P}({\Phi }^\sigma )=h_\mathbf{P}({\Phi })\) we get

$$\begin{aligned} 0=S \le cC^3d\log D+S_0. \end{aligned}$$

Finally in \(S_0\) we have \(|Q|_v=|Q|_Q=\Vert Q\Vert ^{-1/\log p}\) and V is the complement (among v dividing Q) of the set E in Lemma 7. Now \(e_v \le d\) there because each Q prime to N does not ramify in the cyclotomic \(F_{\mathrm{c}}\) (this step would fail for an arbitrary abelian extension). Thus

$$\begin{aligned} \sum _{v \in {V}}D_v=D-\sum _{v \in {E}}e_v{D_v \over e_v}\ge D-dD{\hat{h}}(\xi ){\log p \over \log \Vert Q\Vert }\ge D-D{\log p \over \log \Vert Q\Vert }\ge {D \over 2} \end{aligned}$$

using just \({\hat{h}}(\xi ) \le 1/d\) from (30). Therefore

$$\begin{aligned} S_0 \le -{qT \over 4}{\log \Vert Q\Vert \over \log p}\le -c^{-1}C^4d\log D \end{aligned}$$

and (49) yields our desired contradiction. Thus indeed \(\eta =0\) and this completes the proof of Lemma 12. \(\square \)

We can now finish the proof of Proposition 2 by showing that the polynomial \(\Psi \) defined above by \({\Psi }(e(z))={\Phi }(e(z),e({Q_0}z))\) has too many hyperzeroes for its degree.

First note that \({\Psi } \ne 0\) because \({\Phi }={\tilde{\Phi }}^q\) and \(e({Q_0}z)\) is a polynomial of degree \(\Vert {Q_0}\Vert > L\) in e(z) by (34) and (35). By Lemma 12 its conjugate \({\Psi }^\sigma \) for \(\sigma =\sigma _Q\) has a hyperzero of order at least \(T_1\) at each \(\xi ^Q\). Let \(\tau \) be any automorphism of \({\overline{F}}\) over \(F_0\) extending \(\sigma ^{-1}\). Then

$$\begin{aligned} 0=({\Psi }^\sigma (\xi ^Q))^\tau ={\Psi }((\xi ^Q)^\tau ). \end{aligned}$$

By Lemma 6(b) these \((\xi ^Q)^\tau =(\xi ^\tau )^Q\) are all of degree d over \(F_{\mathrm{c}}\) if we exclude at most \(2{\log d}\le 2{\log D}\) exceptional Q. This is harmless because by Lemma 5 and (45) the total number of Q at our disposal is at least \(M \ge c^{-1}C^6{(\log D)^2 / (\log \log D)^2}\). We should also note that the number of monic irreducible polynomials in \({{{\mathcal {O}}}}_0\) not prime to \(N=N_1^{e_1}\cdots N_r^{e_r}\) is g, certainly at most the degree of N which by (11) is at most \(c\log \phi (N) \le c\log \Vert N\Vert \le cC\log \log D\).

Now as Q ranges over all those remaining, and \(\tau \) ranges over all extensions of \(\sigma ^{-1}=\sigma _Q^{-1}\) from \(F_{\mathrm{c}}\) to F we claim that the \((\xi ^Q)^\tau \) are all different. In fact an equation \((\xi ^{Q})^\tau =(\xi ^{Q'})^{\tau '}\) would imply that \(\xi ^{Q},\xi ^{Q'}\) are conjugate over \(F_0\). Thus by Lemma 6(a) (with \(F_*=F_0\)) we have \(Q=Q'\) and so \(\sigma =\sigma '\) for \(\sigma '=\sigma _{Q'}\). So \((\xi ^\tau )^Q=(\xi ^{\tau '})^Q\). To cancel the Q here we note that \(F_{\mathrm{c}}(\xi )=F_{\mathrm{c}}(\xi ^Q)\) by Lemma 6(b), so that \(\xi =R(\xi ^Q)\) for R in \(F_{\mathrm{c}}(X)\). Now

$$\begin{aligned} \xi ^\tau =R^{\sigma ^{-1}}((\xi ^Q)^\tau )=R^{\sigma ^{-1}}((\xi ^Q)^{\tau '})=R^{\sigma '^{-1}}((\xi ^Q)^{\tau '})=\xi ^{\tau '}. \end{aligned}$$

As we assumed that \(\xi \) generates F over \(F_{\mathrm{c}}\) and \(\sigma =\sigma '\) we conclude \(\tau =\tau '\). This settles the above claim.

Write \(\Delta _Q\) for the monic minimal polynomial over \(F_{\mathrm{c}}\) of \(\xi ^Q\). Then \({\Psi }^\sigma \) in \(F_{\mathrm{c}}[X]\) is divisible by \(\Delta _Q^{T_1}\), and so \(\Psi \) in \(F_{\mathrm{c}}[X]\) is divisible by \((\Delta _Q^{\sigma ^{-1}})^{T_1}\). Now the hyperzeroes of \(\Delta _Q^{\sigma ^{-1}}\) are the \((\xi ^Q)^\tau \) (each repeated q times) and so an equation \(\Delta _Q^{\sigma ^{-1}}=\Delta _{Q'}^{\sigma '^{-1}}\) leads to some \((\xi ^{Q})^\tau =(\xi ^{Q'})^{\tau '}\) as above. Thus \(Q=Q'\) and so \(\sigma =\sigma '\) and these \(\Delta _Q^{\sigma ^{-1}}\) in \(F_{\mathrm{c}}[X]\) are all different (and irreducible over \(F_{\mathrm{c}}\)). Once again by Lemma 6(b) they all have degree d and so we get in all

$$\begin{aligned} MdT_1 \ge c^{-1}C^8{d^2(\log D)^2 \over (\log \log D)^2} \end{aligned}$$

hyperzeroes for \(\Psi \). However \(\Psi \) has degree at most

$$\begin{aligned} qL+qL\Vert {Q_0}\Vert \le cC^7{d^2(\log D)^2 \over q(\log \log D)^2} \end{aligned}$$

(here we can ignore the q; taking it into account would lead to the tiny improvement mentioned in section 1) and the proof of Proposition 2 is complete.

Preliminaries for general K – geometry

To start with we need some purely geometric results; maybe the first three lemmas below are well-known, but as we are not over zero characteristic (where they also hold) we spell out some proof details.

Lemma 13

Suppose affine B is irreducible of dimension at least 2. Then there are at most finitely many b in B such that the intersection of B with the generic hyperplane through b is reducible.


If we are in \(\mathbf{A}^m\) then Bertini Irreducibility (see for example [35] p.212) gives non-zero (homogeneous) \(\Phi (X_0,X_1,\ldots ,X_m)\) such that the intersection of B with \(\lambda _1x_1+\cdots +\lambda _mx_m=\lambda _0\) is irreducible provided \(\Phi (\lambda _0,\lambda _1,\ldots ,\lambda _m) \ne 0\).

Now the generic hyperplane through \(b=(b_1,\ldots ,b_m)\) is

$$\begin{aligned} \mu _1(x_1-b_1)+\cdots +\mu _m(x_m-b_m)=0, \end{aligned}$$


or equivalently

$$\begin{aligned} \mu _1x_1+\cdots +\mu _mx_m=\mu _1b_1+\cdots +\mu _mb_m. \end{aligned}$$

If the intersection is reducible we must have

$$\begin{aligned} \Phi (\mu _1b_1+\cdots +\mu _mb_m,\mu _1,\ldots ,\mu _m)=0 \end{aligned}$$

identically in \(\mu _1,\ldots ,\mu _m\). This means that \(b_1X_1+\cdots +b_mX_m-X_0\) divides \(\Phi \). But that can happen for at most finitely many \((b_1,\ldots ,b_m)=b\). \(\square \)

The example of a quadric cone B in \(\mathbf{A}^3\) defined by \(uv+vw+wu=0\) through \(b=(0, 0, 0)\) shows that exceptional b may exist: here the intersection with any hyperplane through b is a union of two lines.

Lemma 14

Suppose affine B is irreducible of dimension at least 2. Then for any b in B the intersection of B with the generic hyperplane through b has codimension 1 in B.


If \(l \ge 2\) is the dimension of B, then certainly \(\dim (B \cap \Lambda _b) < l\) for \(\Lambda _b\) generic through b. Because if not, then B would be contained in \(\Lambda _b\). But we can find a bunch of such \(\Lambda _b\) whose intersection is just b, and it would follow that \(B=\{b\}\).

Let \(L_b\) be a linear polynomial defining \(\Lambda _b\). It induces a map \(L_b\) from B to \(\mathbf{A}\). This is dominant. For otherwise \(L_b\) would be a constant c on B. As \(L_b(b)=0\) we see that \(c=0\). But then B would be contained in \(\Lambda _b\), which is excluded above.

We now use the Fibre Dimension Theorem in the version quoted in [11] (p.8); there we were over zero characteristic but it holds too over positive characteristic, as the reference to [18] shows. Part (a) on this \(L_b\) from B to \(\mathbf{A}\) shows that \(L_b^{-1}(0)=B \cap \Lambda _b\) has dimension at least \(l-1\), provided it is non-empty; which it is, because it contains b. \(\square \)

Lemma 15

Suppose affine B is irreducible of dimension at least 2. Then for any non-singular b in B not in the finite set of Lemma 13 the point b remains non-singular on the intersection of B with the generic hyperplane through b.


Let \(\Phi _1,\ldots ,\Phi _N\) be generators of the ideal of B in \(\mathbf{A}^m\), so that the jacobian matrix with rows

$$\begin{aligned} \left( {\partial \Phi _i \over \partial x_1}(b),\ldots ,{\partial \Phi _i \over \partial x_m}(b)\right) ~~~(i=1,\ldots ,N) \end{aligned}$$

has rank \(m-l\), again for l the dimension of B. With \(L_b\) as in (50) defining the generic hyperplane, we adjoin the extra row

$$\begin{aligned} \left( {\partial L_b \over \partial x_1}(b),\ldots ,{\partial L_b \over \partial x_m}(b)\right) =(\mu _1,\ldots ,\mu _m) \end{aligned}$$

and then the rank increases to \(m-l+1=m-(l-1)\), because \(\mu _1,\ldots ,\mu _m\) are generic. Now \(\Phi _1,\ldots ,\Phi _N,L_b\) may not be generators of the ideal of \(B \cap \Lambda _b\), but if we extend them to include such generators then the rank will still be at least \(m-(l-1)\). By Lemmas 13 and 14 this irreducible \(B \cap \Lambda _b\) has dimension \(l-1\) and so indeed b is non-singular there (see for example [35] p.198). \(\square \)

Next we record a result of well-known type, although we could not find it precisely in the literature. It is a version of Mumford’s (3.25) and (3.26) in [50] (p. 53), which was applied in [9]. As we are over positive characteristic we cannot use his notion of “topologically unibranch”. We write \(\pi \) for the projection from affine \(\mathbf{A}^n \times \mathbf{A}^m\) to \(\mathbf{A}^m\), and for the moment we work over an arbitrary algebraically closed field.

Lemma 16

Let W be an algebraic set all of whose components have dimension \(l \ge 1\) in \(\mathbf{A}^n \times \mathbf{A}^m\) and let B be an irreducible variety of dimension l in \(\mathbf{A}^m\), with \(\pi (W)\) in B. Let T (when \(l \ge 2\)) be the finite set of Lemma 13 above for B. If b is non-singular on B and not in T (when \(l \ge 2\)) such that \(W\cap \pi ^{-1}(b)\) is finite, then its cardinality is at most that of \(W\cap \pi ^{-1}(\eta )\) for any \(\eta \) generic on B.


Here it is crucial, as in [9], that the excluded b hardly depend on W.

We will prove the result first under the assumption that the projections from each component of W to B are dominant.

We start with the case of curves \(l=1\) (for which we do not need T or the hypothesis that \(W\cap \pi ^{-1}(b)\) is finite). We proceed in three stages.

Suppose first that W is irreducible.

Then \(W \cap \pi ^{-1}(\eta )\) has exactly d elements, where d is the separable degree of \(\pi \) restricted to W. Suppose \(W \cap \pi ^{-1}(b)\) contains at least \(e>d\) points. We can find a linear form \(\lambda \) in the coordinates of \(\mathbf{A}^n\) taking e different values at these points. If q is the inseparable degree, then \(\mu =\lambda ^q\) satisfies an equation

$$\begin{aligned} \phi _0\mu ^d+\cdots +\phi _d=0 \end{aligned}$$

with \(\phi _0,\ldots ,\phi _d\) not all zero in the coordinate ring of B.

If we are lucky and \(\phi _0,\ldots ,\phi _d\) do not all vanish at b, the result is clear: (51) shows that there can be at most d values of \(\mu \), so at most d values of \(\lambda =\mu ^{1/q}\), a contradiction. Here we did not use non-singularity.

If \(\phi _0,\ldots ,\phi _d\) all vanish at b, we pick \(\phi =\phi _i \ne 0\) with \(\mathrm{ord}_b\phi _i\) minimal. Here non-singularity is implicit. Now the \(\phi _j/\phi \) are regular at b and so can be written as \(\psi _j/\psi \) for \(\psi _j,\psi \) in the coordinate ring of B with \(\psi (b) \ne 0\). Multiplying (51) by \(\psi \) gives

$$\begin{aligned} \left( \psi _0\mu ^d+\cdots +\psi _d \right) \phi =0 \end{aligned}$$

on W. But \(\phi =0\) on W would contradict dominance. As W is irreducible, it follows that

$$\begin{aligned} \psi _0\mu ^d+\cdots +\psi _d=0 \end{aligned}$$

on W. And this takes us back to the first case, because \(\psi _i=\psi \) does not vanish at b.

Next suppose, still for \(l=1\) under the dominance hypothesis, that \(W=W_1 \cup \cdots \cup W_r\) for irreducible \(W_1,\ldots ,W_r\). Then

$$\begin{aligned} \# \left( W_i\cap \pi ^{-1}(b)\right) \le \# \left( W_i\cap \pi ^{-1}(\eta )\right) ~~~~ (i=1,\ldots ,r) \end{aligned}$$

for generic \(\eta \). Thus

$$\begin{aligned} \# \left( W\cap \pi ^{-1}(b)\right) \le \sum _{i=1}^r\# \left( W_i\cap \pi ^{-1}(b)\right) \le \sum _{i=1}^r\# \left( W_i\cap \pi ^{-1}(\eta )\right) . \end{aligned}$$

Now any two \(W_i\cap \pi ^{-1}(\eta )\) are disjoint because any two \(W_i\) intersect in a finite set which cannot project to \(\eta \). Thus the last sum above is indeed \(\# \left( W\cap \pi ^{-1}(\eta )\right) \).

This settles the case \(l=1\).

We now use induction on l (still assuming dominance). Assuming the result for some dimension \(l-1 \ge 1\), we will deduce it for dimension l.

As above, we do it in stages. We may assume \(W \cap \pi ^{-1}(b)\) is non-empty and \(b \ne \eta \).

First irreducible W.

It is easy to see that a generic hyperplane constrained to pass through b and \(\eta \) is a generic hyperplane constrained only to pass through b. We call it \(\Lambda _b\). Since b is not in T, the intersection \(B \cap \Lambda _b\) is irreducible. We may denote also by \(\Lambda _b\) the product \(\mathbf{A}^n \times \Lambda _b\) in \(\mathbf{A}^n \times \mathbf{A}^m\) (it is defined by the same equation). Then \(\pi \) induces a map \(\pi _b\) from \(W \cap \Lambda _b\) to \(B \cap \Lambda _b\).

Now \(\dim (W \cap \Lambda _b) \le l-1\) else W would be contained in \(\Lambda _b\). By varying this hyperplane (still through b and \(\eta \)) we would deduce that W is contained in their intersection, which is (\(\mathbf{A}^n\) times) the line through b and \(\eta \). But then \(\pi (W)\) would be contained in this line, contradicting dominance.

Let \(L_b\) be a linear polynomial defining \(\Lambda _b\). It induces a map \(L_b\) from W to \(\mathbf{A}\). This is dominant. For otherwise \(L_b\) would be a constant c on W. As \(L_b(b)=0\) and \(W \cap \pi ^{-1}(b)\) is non-empty we see that \(c=0\). But then W would be contained in \(\Lambda _b\), which is excluded above.

Now the Fibre Dimension Theorem on this \(L_b\) from W to \(\mathbf{A}\) shows that every (non-empty) component of \(L_b^{-1}(0)=W \cap \Lambda _b\) has dimension at least \(l-1\). Note that \(W \cap \Lambda _b\) is non-empty because \(W\cap \pi ^{-1}(b)\) is.

Thus every component of \(W \cap \Lambda _b\) is irreducible of dimension \(l-1\).

Assume for the moment that there is only one component. We try to apply the induction hypothesis to the map \(\pi _b\) from \(W \cap \Lambda _b\) to \(B \cap \Lambda _b\) also irreducible of dimension \(l-1\). In fact \(\pi _b\) is dominant, otherwise \(\pi _b(W \cap \Lambda _b)\) would be of dimension at most \(l-2\) containing b and then the Fibre Dimension Theorem would imply that \(W\cap \pi ^{-1}(b)\) would be of dimension at least 1, contradicting its assumed finiteness. We see by Lemma 15 that b remains non-singular on \(B \cap \Lambda _b\). Thus by induction we have

$$\begin{aligned} \#\pi _b^{-1}(b) \le \#\pi _b^{-1}(\eta ). \end{aligned}$$

But since \(\Lambda _b\) goes through both b and \(\eta \), it is easy to see that \(\pi _b^{-1}(b)=W\cap \pi ^{-1}(b)\) and \(\pi _b^{-1}(\eta )=W\cap \pi ^{-1}(\eta )\).

A similar argument works if there are several different components \(Z^{(1)},\ldots ,Z^{(s)}\) (all necessarily of dimension \(l-1\)) of \(W \cap \Lambda _b\). Then \(\pi _b\) induces projections \(\pi ^{(1)},\ldots ,\pi ^{(s)}\) from \(Z^{(1)},\ldots ,Z^{(s)}\) to \(B \cap \Lambda _b\). If one of these is not dominant, then again \(W \cap \pi ^{-1}(b)\) would be infinite. Thus by induction again, \(\#(\pi ^{(j)})^{-1}(b) \le \#(\pi ^{(j)})^{-1}(\eta )\) for \(j=1,\ldots ,s\). Therefore

$$\begin{aligned} \#\pi _b^{-1}(b)=\sum _{j=1}^s\#(\pi ^{(j)})^{-1}(b) \le \sum _{j=1}^s\#(\pi ^{(j)})^{-1}(\eta ). \end{aligned}$$

Now any two \((\pi ^{(j)})^{-1}(\eta )\) are disjoint because any two \(Z^{(j)}\) intersect in something of dimension at most \(l-2\) which cannot project to \(\eta \). Thus the last sum above is just \(\#(W\cap \pi _b^{-1}(\eta ))\); and we have recovered (53).

Next the reducible case \(W=W_1 \cup \cdots \cup W_r\) (still under dominance) follows in a similar way. Namely

$$\begin{aligned} \# \left( W_i \cap \pi ^{-1}(b)\right) \le \# \left( W_i\cap \pi ^{-1}(\eta )\right) ~~~~ (i=1,\ldots ,r). \end{aligned}$$


Thus

$$\begin{aligned} \# \left( W \cap \pi ^{-1}(b)\right) \le \sum _{i=1}^r\# \left( W_i\cap \pi ^{-1}(b)\right) \le \sum _{i=1}^r\# \left( W_i\cap \pi ^{-1}(\eta )\right) \end{aligned}$$

and as above any two \(W_i\cap \pi ^{-1}(\eta )\) are disjoint because any two \(W_i\) intersect in something of dimension at most \(l-1\) which cannot project to \(\eta \). Thus the last sum above is indeed \(\# \left( W \cap \pi ^{-1}(\eta )\right) \).

This settles the lemma under our assumption that the projections from each component of W to B are dominant.

Finally suppose the latter fails for some component \(W_0\) of W. Then b cannot be in \(\pi (W_0)\), otherwise the Fibre Dimension Theorem would imply that \(W_0\cap \pi ^{-1}(b)\) would have dimension at least \(\dim W_0-\dim \pi (W_0) \ge 1\), contradicting finiteness. Thus \(W_0\cap \pi ^{-1}(b)\) is empty; and of course so is \(W_0\cap \pi ^{-1}(\eta )\). So these are not seen in the intersections with W. \(\square \)

Regarding the excluded b, the example of B defined by \(v^2=u^2(u+1)\) in \(\mathbf{A}^2\) and W defined by

$$\begin{aligned} v^2=u^2(u+1),~~w^2=u+1,~~wu=v \end{aligned}$$

in \(\mathbf{A}^3\), where \(b=(0,0)\) has two inverse images \((0,0,\pm 1)\), shows also that Lemma 16 can be false if b is singular.
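This example is small enough to explore by brute force. The following Python sketch is a simplified illustration and not the setting of Lemma 16 itself: it counts only points with coordinates in \(\mathbf{F}_7\) (the prime 7 is an arbitrary odd choice), whereas the lemma is over an algebraically closed field. The names `fibre_size`, `points_on_B` and `sizes` are hypothetical.

```python
P = 7  # an odd prime chosen for illustration; only F_p-rational points are counted

def fibre_size(u, v):
    """Number of w in F_p with (u, v, w) on W: v^2 = u^2(u+1), w^2 = u+1, wu = v."""
    if (v * v - u * u * (u + 1)) % P:
        return 0  # (u, v) is not on B
    return sum(
        1
        for w in range(P)
        if (w * w - (u + 1)) % P == 0 and (w * u - v) % P == 0
    )

# All F_p-points of the nodal cubic B: v^2 = u^2(u+1)
points_on_B = [
    (u, v)
    for u in range(P)
    for v in range(P)
    if (v * v - u * u * (u + 1)) % P == 0
]

sizes = {b: fibre_size(*b) for b in points_on_B}
for b, n in sorted(sizes.items()):
    print(b, n)
```

For \(u \ne 0\) the relation \(wu=v\) forces \(w=v/u\), so every point of B other than the node has a single point above it, while the node \((0,0)\) has the two points \(w=\pm 1\).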

The result can become false also if components of dimension less than l are allowed. For example if \(l=1\) and W is the single point (a, b) in \(\mathbf{A}^n \times \mathbf{A}^m\) with b non-singular on the curve B, then

$$\begin{aligned} \#(W \cap \pi ^{-1}(b))=1>0=\#(W \cap \pi ^{-1}(\eta )). \end{aligned}$$

Preliminaries for general K - Carlitz

Now we return to the Carlitz world. The next result concerns the action of a Carlitz polynomial \(A({{{\mathcal {C}}}})\) on a special sort of Laurent polynomial in a variable u. Write

$$\begin{aligned} A(T)=a_0+a_1T+\cdots +a_dT^d \end{aligned}$$

with coefficients in \(\mathbf{F}_p\). For a positive integer n denote by \(S_m^{(n)}\) the mth elementary symmetric polynomial in \(-t^p,-t^{p^2},\ldots ,-t^{p^{n}}\).

For A as above and an integer \(n \le d\), now including \(n=0\), and variables \(\lambda _0,\ldots ,\lambda _n\), we define \(X_n^{(n)},\ldots ,X_{d+n}^{(n)}\) recursively as follows. At level 0 we have

$$\begin{aligned} X_k^{(0)}=a_k\lambda _0~~~(k=0,\ldots ,d). \end{aligned}$$

Then at level n from level \(n-1 \ge 0\) we have first

$$\begin{aligned} X_{d+n}^{(n)}=a_d\lambda _n, \end{aligned}$$


then

$$\begin{aligned} X_{d+n-1}^{(n)}= & {} \left( X_{d+n-1}^{(n-1)}\right) ^p+\left( S_0^{(n)}a_{d-1}+S_1^{(n)}a_d \right) \lambda _n,\\ X_{d+n-2}^{(n)}= & {} \left( X_{d+n-2}^{(n-1)}\right) ^p+ \left( S_0^{(n)}a_{d-2}+S_1^{(n)}a_{d-1}+S_2^{(n)}a_{d} \right) \lambda _n, \end{aligned}$$

and so on, down to

$$\begin{aligned} X_{d+1}^{(n)}=(X_{d+1}^{(n-1)})^p+(S_0^{(n)}a_{d-n+1}+S_1^{(n)}a_{d-n+2}+\cdots +S_{n-1}^{(n)}a_{d})\lambda _n, \end{aligned}$$

which define the \(X_k^{(n)}\) for \(k>d\). And finally for \(k=d,d-1,\ldots ,n\) we define

$$\begin{aligned} X_{k}^{(n)}=\left( X_{k}^{(n-1)}\right) ^p+ \left( S_0^{(n)}a_{k-n}+S_1^{(n)}a_{k-n+1}+\cdots +S_{n}^{(n)}a_{k}\right) \lambda _n. \end{aligned}$$

By induction on n we verify the following

Remark 1

For \(k=n,\ldots ,d+n\) and \(l=\min \{k,d\} \ge n\) we can write \(X_{k}^{(n)}\) as a linear form in \(a_l,\ldots ,a_{l-n}\) whose coefficients are polynomials over \(\mathbf{F}_p\) in \(t,\lambda _0,\ldots ,\lambda _n\) of degree at most \(p^{n+1}\) in each variable.

It will be crucial that for each n, k the number of \(a_i\) appearing as well as the polynomial degree are bounded only in terms of n. For example

$$\begin{aligned} X_k^{(1)}= & {} (a_k\lambda _0)^p+\left( a_{k-1}-t^pa_k \right) \lambda _1=\left( \lambda _0^p-t^p\lambda _1 \right) a_k+\lambda _1a_{k-1}~~~(k=1,\ldots ,d), \\ X_k^{(2)}= & {} \left( \lambda _0^{p^2}-t^{p^2}\lambda _1^p+t^{p+p^2}\lambda _2 \right) a_k+\left( \lambda _1^p-(t^p+t^{p^2})\lambda _2 \right) a_{k-1}+\lambda _2a_{k-2}~~~(k=2,\ldots ,d). \end{aligned}$$
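The recursion defining the \(X_k^{(n)}\) can be run mechanically. The Python sketch below is an illustration under stated assumptions: \(p=3\) and \(d=4\) are arbitrary, each \(X_k^{(n)}\) is stored as a linear form in formal symbols \(a_i\) (using \(a_i^p=a_i\)), coefficients are polynomials in \(t,\lambda _0,\lambda _1,\lambda _2\) stored as dictionaries on exponent vectors, and the separate cases of the definition are packaged into one loop by treating \(a_i\) as zero outside \(0 \le i \le d\). In particular, since \(S_1^{(2)}=-(t^p+t^{p^2})\), the recursion yields \(\lambda _1^p-(t^p+t^{p^2})\lambda _2\) as the coefficient of \(a_{k-1}\) in \(X_k^{(2)}\).

```python
from itertools import combinations

P, D = 3, 4  # characteristic p and degree d of A; small illustrative values

def add(f, g):
    """Sum of polynomials in (t, lambda_0, lambda_1, lambda_2) over F_p."""
    out = dict(f)
    for m, c in g.items():
        out[m] = (out.get(m, 0) + c) % P
    return {m: c for m, c in out.items() if c}

def frob(f):
    """p-th power: coefficients lie in F_p, so only the exponents are scaled."""
    return {tuple(P * e for e in m): c for m, c in f.items()}

def S_times_lambda(m, n):
    """S_m^{(n)} * lambda_n, with S_m^{(n)} elementary symmetric in -t^p, ..., -t^{p^n}."""
    out = {}
    for idx in combinations(range(1, n + 1), m):
        mono = [sum(P**i for i in idx), 0, 0, 0]
        mono[n + 1] = 1                      # the factor lambda_n
        out = add(out, {tuple(mono): (-1) ** m % P})
    return out

def level(n):
    """{k: {i: coefficient of a_i in X_k^{(n)}}} for k = n, ..., d+n (here n <= 2)."""
    if n == 0:
        return {k: {k: {(0, 1, 0, 0): 1}} for k in range(D + 1)}  # a_k * lambda_0
    prev = level(n - 1)
    out = {}
    for k in range(n, D + n + 1):
        form = {i: frob(c) for i, c in prev.get(k, {}).items()}   # (X_k^{(n-1)})^p
        for m in range(n + 1):
            i = k - n + m
            if 0 <= i <= D:                  # a_i is taken to be zero otherwise
                form[i] = add(form.get(i, {}), S_times_lambda(m, n))
        out[k] = {i: c for i, c in form.items() if c}
    return out

X1, X2 = level(1), level(2)
print(X2[2])  # coefficients of a_2, a_1, a_0 in X_2^{(2)}
```

Exponent vectors are ordered as \((\deg _t,\deg _{\lambda _0},\deg _{\lambda _1},\deg _{\lambda _2})\), and the coefficient 2 stands for \(-1\) modulo 3.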

We need the “Carnomial coefficients" \(T_{ij}\) in \({{{\mathcal {O}}}}_0=\mathbf{F}_p[t]\) defined by

$$\begin{aligned} {{{\mathcal {C}}}}^iZ=\sum _{j=0}^iT_{ij}Z^{p^j}. \end{aligned}$$

Lemma 17

We have

$$\begin{aligned} A({{{\mathcal {C}}}})\left( {\lambda _0 \over u}+{\lambda _1 \over u^p}+\cdots +{\lambda _n \over u^{p^n}}\right) =\sum _{j=0}^{d+n}{P_j^{(n)} \over u^{p^j}} \end{aligned}$$


with

$$\begin{aligned} P_j^{(n)}=\sum _{k=j}^{d+n}T_{kj}(X_k^{(n)})^{p^{j-n}}~~~(j=n,\ldots ,d+n). \end{aligned}$$


Note that we do not specify \(P_j^{(n)}\) for \(j=0,\ldots ,n-1\). It is not difficult to do so but we found it is not useful for applications.

From \({{{\mathcal {C}}}}^{i+1}Z={{{\mathcal {C}}}}^i({{{\mathcal {C}}}}Z)\) we derive a simple recurrence

$$\begin{aligned} T_{i ~j-1}=T_{i+1 ~j}-t^{p^j}T_{ij}, \end{aligned}$$

where the \(T_{ij}\) are considered zero if \(0 \le j \le i\) does not hold.

Then iterating n times gives

$$\begin{aligned} T_{i~j-1}=\left( S_0^{(n)}\right) ^{p^{j-1}}T_{i+n~j+n-1}+\left( S_1^{(n)}\right) ^{p^{j-1}} T_{i+n-1~j+n-1}+\cdots +\left( S_n^{(n)}\right) ^{p^{j-1}}T_{i~j+n-1}. \end{aligned}$$
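Both the simple recurrence and its n-fold iterate can be tested on explicit Carnomial coefficients. In the following Python sketch (an illustration, with \(p=3\) an arbitrary choice and all function names hypothetical) the \(T_{ij}\) are computed in the other order, from \({{{\mathcal {C}}}}^{i+1}Z={{{\mathcal {C}}}}({{{\mathcal {C}}}}^iZ)\), so that the comparison with the recurrences is a genuine check rather than a restatement.

```python
from itertools import combinations

P = 3  # illustrative characteristic

def add(f, g):
    """Sum in F_p[t]; polynomials are dicts {exponent: coefficient}."""
    out = dict(f)
    for e, c in g.items():
        out[e] = (out.get(e, 0) + c) % P
    return {e: c for e, c in out.items() if c}

def mul(f, g):
    """Product in F_p[t]."""
    out = {}
    for a, x in f.items():
        for b, y in g.items():
            out[a + b] = (out.get(a + b, 0) + x * y) % P
    return {e: c for e, c in out.items() if c}

def frob(f, times=1):
    """Raise to the power p^times (only exponents change, coefficients are in F_p)."""
    return {e * P**times: c for e, c in f.items()}

def carnomial(imax):
    """T[i][j] via C^{i+1} Z = (C^i Z)^p + t C^i Z, i.e. T_{i+1,j} = T_{i,j-1}^p + t T_{i,j}."""
    T = [[{0: 1}]]
    for i in range(imax):
        row = [{} for _ in range(i + 2)]
        for j, c in enumerate(T[i]):
            row[j] = add(row[j], mul({1: 1}, c))
            row[j + 1] = add(row[j + 1], frob(c))
        T.append(row)
    return T

def S(m, n):
    """m-th elementary symmetric polynomial in -t^p, ..., -t^{p^n}."""
    total = {}
    for idx in combinations(range(1, n + 1), m):
        total = add(total, {sum(P**i for i in idx): (-1) ** m % P})
    return total

T = carnomial(6)

def Tij(i, j):
    """T_{ij}, taken to be zero unless 0 <= j <= i."""
    return T[i][j] if 0 <= j <= i else {}

def iterated_rhs(i, j, n):
    """Right-hand side of the n-fold iterated recurrence for T_{i, j-1} (j >= 1)."""
    total = {}
    for m in range(n + 1):
        total = add(total, mul(frob(S(m, n), j - 1), Tij(i + n - m, j + n - 1)))
    return total

print(Tij(2, 1), iterated_rhs(2, 2, 2))  # T_{21} computed two ways
```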

We use of course induction on n to prove (58).

For \(n=0\) the left-hand side is

$$\begin{aligned} \sum _{k=0}^da_k\sum _{j=0}^kT_{kj}{\lambda _0^{p^j} \over u^{p^j}}=\sum _{j=0}^d{1 \over u^{p^j}}\sum _{k=j}^da_kT_{kj}\lambda _0^{p^j}=\sum _{j=0}^d{1 \over u^{p^j}}\sum _{k=j}^dT_{kj}(a_k\lambda _0)^{p^j} \end{aligned}$$

which is the right-hand side thanks to (55).

Now we assume the thing done for \(n-1 \ge 0\) and we deduce it for \(n \ge 1\).

Splitting \(\lambda _n/u^{p^n}\) off the left-hand side of (58), and using the case \(n=0\) with \(u^{p^n}\) in place of u we find

$$\begin{aligned} \sum _{j=0}^{d+n-1}{P_j \over u^{p^j}}+\sum _{j=0}^d{Q_j \over u^{p^{j+n}}} \end{aligned}$$


with

$$\begin{aligned} P_j=P_j^{(n-1)}=\sum _{k=j}^{d+n-1}T_{kj}\left( X_k^{(n-1)}\right) ^{p^{j-n+1}},~~~(j=n-1,\ldots ,d+n-1) \end{aligned}$$

and

$$\begin{aligned} Q_j=\sum _{k=j}^dT_{kj}(a_k\lambda _n)^{p^{j}}~~~(j=0,\ldots ,d) \end{aligned}$$

(with \(\lambda _n\) in place of \(\lambda _0\)). For \(j=d+n\) in (58) we get at once

$$\begin{aligned} P_{d+n}^{(n)}=Q_d=(a_d\lambda _n)^{p^d} \end{aligned}$$

as required in (59), thanks to (56).


For the remaining j we get

$$\begin{aligned} P_j^{(n)}=P_j+Q_{j-n}~~~(j=n,\ldots ,d+n-1) \end{aligned}$$

which is

$$\begin{aligned} \sum _{k=j}^{d+n-1}T_{kj}\left( X_k^{(n-1)}\right) ^{p^{j-n+1}}+\sum _{k=j-n}^{d}T_{k~j-n}\left( a_k\lambda _n \right) ^{p^{j-n}}. \end{aligned}$$

We use (60) to see that the second sum is

$$\begin{aligned} \sum _{k=j-n}^{d}((S_0^{(n)})^{p^{j-n}}T_{k+n~j}+(S_1^{(n)})^{p^{j-n}}T_{k+n-1~j}+\cdots +(S_n^{(n)})^{p^{j-n}}T_{kj})(a_k\lambda _n)^{p^{j-n}}. \end{aligned}$$


Thus

$$\begin{aligned} P_j^{(n)}=U+U_0+U_1+\cdots +U_n \end{aligned}$$

with (now adjusting k)

$$\begin{aligned} U=\sum _{k=j}^{d+n-1}T_{kj}(X_k^{(n-1)})^{p^{j-n+1}} \end{aligned}$$


and

$$\begin{aligned} U_m=(S_m^{(n)})^{p^{j-n}}\sum _{k=j-m}^{d+n-m}T_{kj}(a_{k-n+m}\lambda _n)^{p^{j-n}}. \end{aligned}$$

Here in \(U_m\) we can restrict the sum to \(k \ge j\), because \(T_{kj}=0\) for \(k<j\).

We already checked \(P_{d+n}^{(n)}\) in (59) using (56). Now for each \(j=n,\ldots ,d+n-1\) we pick out the coefficient of the various \(T_{kj}\) in (61).

The biggest k is \(d+n\) and now we see \(T_{kj}\) only in \(U_0\), with coefficient \((S_0^{(n)})^{p^{j-n}}(a_d\lambda _n)^{p^{j-n}}\), which fits with (59) again thanks to (56).

Next for \(k=d+n-1\) we see \(T_{kj}\) in U as well as in \(U_0,U_1\). This also fits with (59) thanks to the displayed formula just after (56).

We carry on with \(k=d+n-2\) down to \(k=d+1\) and these also fit, thanks to the later formulae preceding (57).

Then we go further with \(k=d,d-1,\ldots ,n\) which appear in all of \(U,U_0,U_1,\ldots ,U_n\) and these fit with (59) because of (57). This completes the proof. \(\square \)
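Lemma 17 can also be confirmed numerically in a small case by expanding both sides. The Python sketch below is an illustration only, assuming \(p=3\), \(d=2\), \(n=2\) and \(A(T)=1+2T+T^2\) (all arbitrary choices); the helper names are hypothetical, and the cases in the definition of the \(X_k^{(n)}\) are again merged by treating \(a_i\) as zero outside \(0 \le i \le d\). The left-hand side is obtained by applying \({{{\mathcal {C}}}}\) repeatedly to the coefficients of \(u^{-p^i}\), without using the \(T_{kj}\).

```python
from itertools import combinations

P, D, N_LEVEL = 3, 2, 2      # p, d and n; illustrative choices
A_COEFFS = [1, 2, 1]         # a_0, a_1, a_2 in F_p, i.e. A(T) = 1 + 2T + T^2
NV = N_LEVEL + 2             # exponent vectors: (deg_t, deg_lambda_0, ..., deg_lambda_n)

def add(f, g):
    out = dict(f)
    for m, c in g.items():
        out[m] = (out.get(m, 0) + c) % P
    return {m: c for m, c in out.items() if c}

def mul(f, g):
    out = {}
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            m = tuple(x + y for x, y in zip(m1, m2))
            out[m] = (out.get(m, 0) + c1 * c2) % P
    return {m: c for m, c in out.items() if c}

def scal(c, f):
    return {m: (c * x) % P for m, x in f.items() if (c * x) % P}

def frob(f, times=1):
    """Raise to the power p^times; coefficients lie in F_p, so only exponents scale."""
    return {tuple(P**times * e for e in m): c for m, c in f.items()}

def var(i, e=1):
    """t^e for i = 0, and lambda_{i-1}^e for i >= 1."""
    m = [0] * NV
    m[i] = e
    return {tuple(m): 1}

def carlitz(z):
    """Apply C(Z) = Z^p + tZ to Z = sum_i z[i] w^{p^i} (w = X or w = 1/u)."""
    out = [{} for _ in range(len(z) + 1)]
    for i, c in enumerate(z):
        out[i] = add(out[i], mul(var(0), c))
        out[i + 1] = add(out[i + 1], frob(c))
    return out

def S(m, n):
    total = {}
    for idx in combinations(range(1, n + 1), m):
        total = add(total, scal((-1) ** m % P, var(0, sum(P**i for i in idx))))
    return total

def X_level(n):
    """{k: X_k^{(n)}} for k = n, ..., d+n."""
    if n == 0:
        return {k: scal(A_COEFFS[k], var(1)) for k in range(D + 1)}
    prev = X_level(n - 1)
    out = {}
    for k in range(n, D + n + 1):
        acc = frob(prev[k]) if k in prev else {}
        for m in range(n + 1):
            i = k - n + m
            if 0 <= i <= D:
                acc = add(acc, scal(A_COEFFS[i], mul(S(m, n), var(n + 1))))
        out[k] = acc
    return out

# Carnomial coefficients T[k][j], from iterating C on X
T = [[var(0, 0)]]
for _ in range(D + N_LEVEL):
    T.append(carlitz(T[-1]))

# Left-hand side: coefficients of u^{-p^j} in A(C)(lambda_0/u + ... + lambda_n/u^{p^n})
z = [var(i + 1) for i in range(N_LEVEL + 1)]
lhs = [{} for _ in range(D + N_LEVEL + 1)]
for k in range(D + 1):
    for j, c in enumerate(z):
        lhs[j] = add(lhs[j], scal(A_COEFFS[k], c))
    z = carlitz(z)

# Right-hand side: P_j^{(n)} = sum_k T_{kj} (X_k^{(n)})^{p^{j-n}} for j = n, ..., d+n
X = X_level(N_LEVEL)
rhs = {}
for j in range(N_LEVEL, D + N_LEVEL + 1):
    acc = {}
    for k in range(j, D + N_LEVEL + 1):
        acc = add(acc, mul(T[k][j], frob(X[k], j - N_LEVEL)))
    rhs[j] = acc

print(all(lhs[j] == rhs[j] for j in rhs))
```

The list `lhs` also contains the coefficients for \(j<n\), which the lemma does not specify; for instance `lhs[0]` is \(A(t)\lambda _0\).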

As mentioned, the \(P_j^{(n)}\) for \(j=0,\ldots ,n-1\) are not useful for applications. For example one finds

$$\begin{aligned} P_0^{(n)}=A(t)\lambda _0=(a_0+a_1t+\cdots +a_dt^d)\lambda _0 \end{aligned}$$

where the number of \(a_i\) appearing is not bounded in terms of n as in Remark 1.

Remark 2

Because \(T_{kk}=1\) the system (59) has a triangular nature; for example the equations

$$\begin{aligned} P_{d+n}^{(n)}=\cdots =P_{e}^{(n)}=0 \end{aligned}$$

for some \(e \ge n\) are equivalent to the equations

$$\begin{aligned} X_{d+n}^{(n)}=\cdots =X_{e}^{(n)}=0. \end{aligned}$$

It is not difficult to see from Remark 1 that (we will be more precise later) these are equations for \(\lambda _0,\ldots ,\lambda _n\) again essentially independent of A (as in the examples given just after that Remark).

More on curves

From now on \(K=\overline{\mathbf{F}_p(t,s_1,\ldots ,s_l)}\) for some \(l \ge 1\) variables \(s_1,\ldots ,s_l\) algebraically independent over \(F_0=\mathbf{F}_p(t)\). We define a height \(h_s\) on K by regarding it as the algebraic closure of \({\overline{F_0}(s_1,\ldots ,s_l)}\) (see for example [22], p.1053); thus \(h_s(s_1)=\cdots =h_s(s_l)=1\) but \(h_s(t)=0\).

The next result is in the style of Proposition 1.

Lemma 18

Let C be an irreducible curve in \(\mathbf{G}_{\mathrm{a}}^n\) defined over K. Assume for any non-zero \((\rho _1,\ldots ,\rho _n)\) in \({{{\mathcal {R}}}}^n\) that the form \(\rho _1x_1+\cdots +\rho _nx_n\) is not identically constant on C. Then there is \({{\mathfrak {B}}}\) such that

$$\begin{aligned} h_s(\xi _1)+\cdots +h_s(\xi _n)\le {\mathfrak {B}} \end{aligned}$$

for all \((\xi _1,\ldots ,\xi _n)\) on C(K) for which there exists non-zero \((\alpha _1,\ldots ,\alpha _n)\) in \({{{\mathcal {R}}}}^n\) and \(\lambda \) in \(\overline{F_0}\) with

$$\begin{aligned} \alpha _1\xi _1+\cdots +\alpha _n\xi _n=\lambda . \end{aligned}$$


Our \(h_s\) is not a canonical height in the sense of Denis but it does have the same property that \(h_s(\xi ^P)=\Vert P\Vert h_s(\xi )\) as in (14) above. For example \(h_s(\xi ^p+t\xi )=ph_s(\xi )\) because now t appears as a constant. To see this we can go through the Nullstellensatz argument, observing that \(|t|=1\); or we can argue directly using \(\max \{1,|\xi ^p+t\xi |\}=\max \{1,|\xi |\}^p\) (which would work better for general P). And of course \(h_s(\xi +\eta ) \le h_s(\xi )+h_s(\eta )\) as in (15) above.
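
To illustrate the scaling property in the simplest case (a sketch only, assuming that \(\Vert P\Vert =p^{\deg P}\), as the later use of \(p^e\) suggests, and that the Carlitz action is \({\mathcal {C}}(\xi )=\xi ^p+t\xi \)), take \(\xi =s_1\). Then

$$\begin{aligned} {\mathcal {C}}(s_1)=s_1^p+ts_1,~~{\mathcal {C}}^2(s_1)=s_1^{p^2}+(t^p+t)s_1^p+t^2s_1 \end{aligned}$$

are polynomials in \(s_1\) of degrees \(p\) and \(p^2\) whose coefficients involve only the constant t. This is consistent with \(h_s({\mathcal {C}}(s_1))=p\) and \(h_s({\mathcal {C}}^2(s_1))=p^2\).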

Now we can follow the proof of Proposition 1 above. Lemma 1 above goes through with \(h_s\) instead of \({\hat{h}}\) because of the remarks above; the extra \(\lambda \) in (62) makes no trouble because it leads to an extra term \(Q({{{\mathcal {C}}}})\lambda \) on the left-hand side of (21), and this has zero height. The subsequent proof also goes through with \(h_s\) instead of h (actually with \(c'{=}c''{=}nc\)). \(\quad \square \)

And now an analogue of Proposition 2 of [9] (p. 452).

Lemma 19

Let C be an irreducible curve in \(\mathbf{G}_{\mathrm{a}}^3\) defined over K but not over \(\overline{F_0}\). Assume for any non-zero \((\rho _1,\rho _2,\rho _3)\) in \({{{\mathcal {R}}}}^3\) that the form \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is not identically constant on C. Then given any D there are at most finitely many \((\xi _1,\xi _2,\xi _3)\) on C(K) for which there exists non-zero \((\alpha _1,\alpha _2,\alpha _3)\) in \({{{\mathcal {R}}}}^3\) and \(\lambda \) in \(\overline{F_0}\) with

$$\begin{aligned} \alpha _1\xi _1+\alpha _2\xi _2+\alpha _3\xi _3=\lambda \end{aligned}$$

and

$$\begin{aligned}{}[\overline{F_0}(\xi _1,\xi _2,\xi _3,s_1,\ldots ,s_l):\overline{F_0}(s_1,\ldots ,s_l)] \le D. \end{aligned}$$


Assume first that the group of the \((\alpha _1,\alpha _2,\alpha _3)\) for which \(\lambda \) exists in (63) has rank 1. Then much as in (23) and (24) above we can find \(\xi ,\xi '\) in K with

$$\begin{aligned} \overline{F_0}(\xi _1,\xi _2,\xi _3)=\overline{F_0}(\xi ,\xi ') \end{aligned}$$

and

$$\begin{aligned} \xi _1=\sigma _1\xi +\sigma _1'\xi '+\lambda _1,~~\xi _2=\sigma _2\xi +\sigma '_2\xi '+\lambda _2,~~\xi _3=\sigma _3\xi +\sigma _3'\xi '+\lambda _3 \end{aligned}$$

for \(\sigma _1,\sigma _1',\sigma _2,\sigma _2',\sigma _3,\sigma _3'\) in \({{\mathcal {R}}}\) and \(\lambda _1,\lambda _2,\lambda _3\) in \(\overline{F_0}\). Now \(\xi ,\xi '\) are linearly independent, as are the rows \((\sigma _1,\sigma _2,\sigma _3),(\sigma _1',\sigma _2',\sigma _3')\). With \(\sigma _i=S_i({{{\mathcal {C}}}}),\sigma _i'=S_i'({{{\mathcal {C}}}})\) the rows

$$\begin{aligned} \mathbf{s}=(S_1,S_2,S_3),~~\mathbf{s}'=(S_1',S_2',S_3') \end{aligned}$$

in \({{{\mathcal {O}}}}_0^3\) (recall \({{{\mathcal {O}}}}_0=\mathbf{F}_p[t]\) here) satisfy \(M\mathbf{s}^t=M\mathbf{s}'^t=0\) for the \(1 \times 3\) matrix M whose entries are the minors

$$\begin{aligned} T_1=S_2S_3'-S_3S_2',~~T_2=S_3S_1'-S_1S_3',~~T_3=S_1S_2'-S_2S_1'. \end{aligned}$$
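
For convenience, the vanishing \(M\mathbf{s}^t=0\) can be checked by direct expansion (here we read M as the row \((T_1,T_2,T_3)\), which is our interpretation of the matrix above):

$$\begin{aligned} T_1S_1+T_2S_2+T_3S_3=S_1(S_2S_3'-S_3S_2')+S_2(S_3S_1'-S_1S_3')+S_3(S_1S_2'-S_2S_1')=0, \end{aligned}$$

since the six monomials cancel in pairs. The same computation with \(S_1',S_2',S_3'\) in place of the leading factors \(S_1,S_2,S_3\) gives \(M\mathbf{s}'^t=0\).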

We shall show (in a fairly familiar way) that, with

$$\begin{aligned} d=\max \{\deg S_1,\deg S_2,\deg S_3\},~~d'= \max \{\deg S_1',\deg S_2',\deg S_3'\} \end{aligned}$$

and

$$\begin{aligned} e=\max \{\deg T_1,\deg T_2,\deg T_3\} \end{aligned}$$

we may assume that

$$\begin{aligned} d+d' \le e. \end{aligned}$$

Namely, by [57] Corollary 2 (p.148) over \(F_0\) there are independent \(\tilde{\mathbf{s}},\tilde{\mathbf{s}}'\) in \(F_0^3\) with \(M\tilde{\mathbf{s}}^t=M\tilde{\mathbf{s}}'^t=0\) and

$$\begin{aligned} h_\mathbf{P}(\tilde{\mathbf{s}})+h_\mathbf{P}(\tilde{\mathbf{s}}') \le h_\mathbf{P}(M). \end{aligned}$$

We can normalize \(\tilde{\mathbf{s}},\tilde{\mathbf{s}}'\) to lie in \({{{\mathcal {O}}}}_0^3\) and be primitive. Then

$$\begin{aligned} h_\mathbf{P}(\tilde{\mathbf{s}})=\max \{\deg {\tilde{S}}_1,\deg {\tilde{S}}_2,\deg {\tilde{S}}_3\},~~h_\mathbf{P}(\tilde{\mathbf{s}}')=\max \{\deg {\tilde{S}}_1',\deg {\tilde{S}}_2',\deg {\tilde{S}}_3'\} \end{aligned}$$

for the corresponding polynomials in \({{{\mathcal {O}}}}_0\). Also

$$\begin{aligned} h_\mathbf{P}(M)\le \max \{\deg {\tilde{T}}_1,\deg {\tilde{T}}_2,\deg {\tilde{T}}_3\} \end{aligned}$$

for the corresponding minors. Thus (66) indeed holds for the new polynomials. And the old \((\sigma _1,\sigma _2,\sigma _3),(\sigma _1',\sigma _2',\sigma _3')\) are linear combinations of the new rows \(({{\tilde{\sigma }}}_1,{{\tilde{\sigma }}}_2,{{\tilde{\sigma }}}_3),({{\tilde{\sigma }}}_1',{{\tilde{\sigma }}}_2',{{\tilde{\sigma }}}_3')\) with coefficients in \(F_0\). This leads to (65) for new \({{\tilde{\xi }}},{{\tilde{\xi }}}'\) with these new rows.

So it suffices to rename new as old, thus achieving (65).

We may suppose \(e=\deg T_3\) above. Eliminating \(\xi '\) between the first two equations of (65) gives

$$\begin{aligned} \sigma _2'\xi _1-\sigma _1'\xi _2=(\sigma _2'\sigma _1-\sigma _1'\sigma _2)\xi +\mu =T_3({{{\mathcal {C}}}})\xi +\mu \end{aligned}$$

for \(\mu \) in \(\overline{F_0}\). Taking heights \(h_s\) and estimating the left-hand side by Lemma 18, we get the inequality \(p^{d'}{{\mathfrak {B}}} \ge p^eh_s(\xi )\).
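
In more detail (a sketch, assuming \(\Vert S\Vert =p^{\deg S}\) in the scaling property of \(h_s\)), substituting (65) and using that \({\mathcal {R}}\) is commutative gives

$$\begin{aligned} \sigma _2'\xi _1-\sigma _1'\xi _2=(\sigma _2'\sigma _1-\sigma _1'\sigma _2)\xi +(\sigma _2'\sigma _1'-\sigma _1'\sigma _2')\xi '+(\sigma _2'\lambda _1-\sigma _1'\lambda _2), \end{aligned}$$

where the coefficient of \(\xi '\) vanishes and \(\mu =\sigma _2'\lambda _1-\sigma _1'\lambda _2\). Then

$$\begin{aligned} p^eh_s(\xi )=h_s(T_3({\mathcal {C}})\xi +\mu )=h_s(\sigma _2'\xi _1-\sigma _1'\xi _2)\le p^{d'}(h_s(\xi _1)+h_s(\xi _2))\le p^{d'}{\mathfrak {B}}, \end{aligned}$$

the first equality holding because \(h_s(\mu )=0\) and \(e=\deg T_3\).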

Now \(\xi \) is not in \(\overline{F_0}\) by (65) and our rank 1 assumption. Thus Lemma 2.1 (p.1053) of [22] shows that

$$\begin{aligned} h_s(\xi ) \ge \left[ \overline{F_0}(s_1,\ldots ,s_l)(\xi ):\overline{F_0}(s_1,\ldots ,s_l)\right] ^{-1}. \end{aligned}$$

By (64) the degree here is at most D. It follows that \(p^{d'}{{\mathfrak {B}}}{D} \ge p^e\).

A similar argument eliminating \(\xi \) in (65) leads to \(p^{d}{{\mathfrak {B}}}{D} \ge p^e\). Multiplying the two inequalities and using (66) we find \(p^e \le ({{\mathfrak {B}}}{D})^2\).
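
This merely spells out the step just made. The product of the two inequalities is

$$\begin{aligned} p^{d+d'}({\mathfrak {B}}D)^2\ge p^{2e}, \end{aligned}$$

and since \(d+d'\le e\) by (66) the left-hand side is at most \(p^e({\mathfrak {B}}D)^2\); dividing by \(p^e\) gives \(p^e\le ({\mathfrak {B}}D)^2\). In particular e, and hence also \(d,d'\), are bounded in terms of \({\mathfrak {B}}\) and D alone.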

By Grassmannian theory this means that there are at most finitely many possibilities for the \({{{\mathcal {O}}}}_0\)-module generated by \(\mathbf{s,s}'\) in \({{{\mathcal {O}}}}_0^3\). Thus we can regard \(\mathbf{s,s}'\) as fixed.

Now the original relation (63) implies

$$\begin{aligned} \alpha _1\sigma _1+\alpha _2\sigma _2+\alpha _3\sigma _3=\alpha _1\sigma _1'+\alpha _2\sigma _2'+\alpha _3\sigma _3'=0 \end{aligned}$$

of which we need only one. We can apply \(\mathrm{GL}_3({{{\mathcal {R}}}})\) to assume it is \(\alpha _3=0\). This reduces the problem to \(\mathbf{G}_{\mathrm{a}}^2\).

Then a similar argument brings us down to \(\mathbf{G}_{\mathrm{a}}\). But the lemma is then empty, because now \(C=\mathbf{G}_{\mathrm{a}}\) is defined over \(\overline{F_0}\).

Right at the start of the proof we made an assumption about rank 1. But if the rank is bigger then things only get easier.

For example if the rank of the \((\alpha _1,\alpha _2,\alpha _3)\) in (63) is 2, then we can argue as in (65) without \(\xi '\) and there is no longer any need for minors.

And if the rank is 3, then \(\xi _1,\xi _2,\xi _3\) lie in \(\overline{F_0}\). But because C is not defined over \(\overline{F_0}\), this implies that \(C(\overline{F_0})\) is at most finite anyway. This completes the proof (and on the way we proved the analogue in \(\mathbf{G}_{\mathrm{a}}^2\)). \(\square \)

Completions and specializations

During this section we assume that C satisfies the conditions of Lemma 19. Of course these are more restrictive than the conditions of the Conjecture for \(n=3\), but we will see that this will be no problem. As in [9] we regard the field of definition of C as the function field \(\overline{F_0}(s_1,\ldots ,s_m)\) of an irreducible variety B, say of dimension \(l \ge 1\), in affine \(\mathbf{A}^m\) defined over \(\overline{F_0}\). We assume as in the previous section that \(s_1,\ldots ,s_l\) are algebraically independent over \({F_0}\). As in [9] we complete C to \({\hat{C}}\) in projective \(\mathbf{P}_3\) and then take a non-singular model \({\tilde{C}}\).

In [9] (p.463) we informally defined a variety \(C_B\) in \(\mathbf{A}^3 \times \mathbf{A}^m\); here this amounts to writing the equations of C in \(\mathbf{A}^3\) with coefficients in \(\overline{F_0}[s_1,\ldots ,s_m]\) and adjoining the equations of B. Of course one should more formally define it as the \(\overline{F_0}\)-Zariski closure of a point \((P,\eta )\), where \(\eta \) is generic on B and P is generic on C over \(\overline{F_0}(\eta )\). This makes it clear that \(C_B\) is irreducible of dimension \(l+1\). The natural projection \(\pi \) from \(\mathbf{A}^3 \times \mathbf{A}^m\) to \(\mathbf{A}^m\) then takes \(C_B\) to B. There is also a natural projection \(\gamma \) from \(\mathbf{A}^3 \times \mathbf{A}^m\) to \(\mathbf{A}^3\).

Then for a point b of B we define the specialization \(C_b\) by

$$\begin{aligned} C_b=\gamma (C_B \cap \pi ^{-1}(b)) \end{aligned}$$

and similarly \({\hat{C}}_b,{\tilde{C}}_b\). As in [9] we can assume that these specializations retain enough properties of \(C,{\hat{C}},{\tilde{C}}\) that we shall need, at least when b is restricted to a non-empty open subset \(B_0\) of B. And also for \(b=\eta \) a generic point; here we shall identify \(\eta \) with \((s_1,\ldots ,s_m)\).

For b in \(B_0\) we can regard \(x_1,x_2,x_3\) as functions on \(C_b\) but for simplicity we omit any subscript b that may possibly be more precise. Equally we omit the subscript for

$$\begin{aligned} f=\alpha _1x_1+\alpha _2x_2+\alpha _3x_3 \end{aligned}$$

or also \(f-\lambda \) with a constant function \(\lambda \). But we put it back in the notation \(\deg _b(f-\lambda )\) for the degree of \(f-\lambda \) on \(C_b\) (unless \(f-\lambda \) is identically zero on \(C_b\), a possibility that we shall soon essentially discount). Note that by our condition on C, this \(f-\lambda \) is certainly non-zero on \(C_\eta \) as long as \(\alpha _1,\alpha _2,\alpha _3\) are not all zero.

The next result replaces the simple argument in the last paragraph of [9] (p.463); here we lack any multiplicative structure.

Lemma 20

There is a non-empty open subset \(B_{00}\) of \(B_0\) with the following property. For any \(\lambda \) in \(\overline{F_0}\), any b in \(B_{00}\) and any \(\alpha _1,\alpha _2,\alpha _3\) not all zero the function \(f-\lambda \) is not identically zero on \(C_b\) and

$$\begin{aligned} \deg _b(f-\lambda ) \ge \deg _\eta (f-\lambda )-\deg C. \end{aligned}$$


We shall first prove the lemma under the assumption that \(f-\lambda \ne 0\) on \(C_b\). Then at the end of the proof we shall show that in fact this follows almost automatically.

Counting by poles we have now

$$\begin{aligned} \deg _b(f-\lambda )=\sum _{{{\tilde{P}}} \in {\tilde{C}}_b}\max \{0,-\mathrm{ord}_{{\tilde{P}}}(f-\lambda )\}. \end{aligned}$$

This holds also for \(b=\eta \), but in fact we shall identify \(C_\eta \) with C over \(\overline{F_0}[s_1,\ldots ,s_m]\). In this generic case we can restrict to \({{\tilde{P}}}\) in \({\tilde{C}}\) with

$$\begin{aligned} R=R_{{\tilde{P}}}=\max \{-\mathrm{ord}_{{\tilde{P}}}x_1,-\mathrm{ord}_{{\tilde{P}}}x_2,-\mathrm{ord}_{{\tilde{P}}}x_3\}>0 \end{aligned}$$

because f is a polynomial in \(x_1,x_2,x_3\).

With a uniformizer \(u=u_{{\tilde{P}}}\) we have a local expansion

$$\begin{aligned} x_i=\underline{x_i}+\overline{x_i} \end{aligned}$$

where \(\underline{x_i}\) involves only non-negative exponents and \(\overline{x_i}\) only (finitely many) negative exponents.

We are going to split the negative exponents into subsets of various sets \(\{r,rp,rp^2,\ldots \}\) with r prime to p. These sets are disjoint. Of course we will see \(rp^m\) only with

$$\begin{aligned} 1 \le r \le R,~~0 \le m \le n_r=\left[ {\log (R/r) \over \log p}\right] \le {\log R \over \log p}. \end{aligned}$$

Thus we can write

$$\begin{aligned} \overline{x_i}=\sum _r\sum _{m=0}^{n_r}{\lambda _{irm} \over u^{rp^m}}. \end{aligned}$$
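
As a small illustration (with the values \(p=3\), \(R=10\) chosen by us purely for concreteness), the exponents \(1,\ldots ,10\) split into the classes

$$\begin{aligned} \{1,3,9\},~\{2,6\},~\{4\},~\{5\},~\{7\},~\{8\},~\{10\} \end{aligned}$$

for \(r=1,2,4,5,7,8,10\), with \(n_1=2\), \(n_2=1\) and \(n_r=0\) otherwise. Each exponent appears exactly once in the form \(rp^m\) with r prime to p.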

At first the \(\lambda _{irm}\) are in the algebraic closure of \(\overline{F_0}(s_1,\ldots ,s_m)\) but by taking a finite cover of B and introducing more variables we can suppose that they lie in \(\overline{F_0}[s_1,\ldots ,s_m]\) itself (this is just for painless specialization).

We have corresponding \(f={\underline{f}}+\overline{f}\) with

$$\begin{aligned} \overline{f}=\alpha _1\overline{x_1}+\alpha _2\overline{x_2}+\alpha _3\overline{x_3} \end{aligned}$$

which is

$$\begin{aligned} \sum _r\left( \alpha _1\left( \sum _{m=0}^{n_r}{\lambda _{1rm} \over u^{rp^m}}\right) +\alpha _2\left( \sum _{m=0}^{n_r}{\lambda _{2rm} \over u^{rp^m}}\right) +\alpha _3\left( \sum _{m=0}^{n_r}{\lambda _{3rm} \over u^{rp^m}}\right) \right) . \end{aligned}$$

We calculate these using Lemma 17 with u there replaced by the various \(u^r\) and a suitably large d. We find

$$\begin{aligned} \overline{f}=\sum _r\sum _{j=0}^{d+n_r}{1 \over u^{rp^j}}(P_{1rj}+P_{2rj}+P_{3rj}) \end{aligned}$$

where


$$\begin{aligned} P_{1rj}= & {} \sum _{k=j}^{d+n_r}T_{kj}X_{1rk}^{p^{j-n_r}},~P_{2rj}=\sum _{k=j}^{d+n_r}T_{kj}X_{2rk}^{p^{j-n_r}},\\ ~P_{3rj}= & {} \sum _{k=j}^{d+n_r}T_{kj}X_{3rk}^{p^{j-n_r}}~~(j=n_r,\ldots ,d+n_r) \end{aligned}$$

and the \(X_{1rk},X_{2rk},X_{3rk}\) are defined as in (56)-(57) with \(n=n_r\) on taking the \(A_1,A_2,A_3\) in \(F_0\) with \(\alpha _1=A_1({{{\mathcal {C}}}}),\alpha _2=A_2({{{\mathcal {C}}}}),\alpha _3=A_3({{{\mathcal {C}}}})\). Thus we need d at least \(\deg A_1,\deg A_2,\deg A_3\) and \(p^d \ge R\). We get

$$\begin{aligned} \overline{f}=\sum _{s=0}^S{P_s \over u^s} \end{aligned}$$

for say \(S=p^dR^2\).

Up to now we are in the generic situation but soon we shall specialize.

Suppose first that \(\omega _{{\tilde{P}}}=-\mathrm{ord}_{{\tilde{P}}}(f-\lambda ) >0\). Then it equals \(-\mathrm{ord}_{{\tilde{P}}}{\overline{f}}\) and so has the form \({\tilde{s}}={\tilde{r}}p^{e-1}\) for some unique \({\tilde{r}}\) and \(e \ge 1\). This means of course

$$\begin{aligned} P_s=0~~~(s > {\tilde{s}}) \end{aligned}$$

so in particular

$$\begin{aligned} P_s=0~~~(s={\tilde{r}}p^e,\ldots ,{\tilde{r}}p^{d+{\tilde{n}}}) \end{aligned}$$

for \({\tilde{n}}=n_{{\tilde{r}}}\), but

$$\begin{aligned} P_{{\tilde{s}}} \ne 0. \end{aligned}$$

We aim to specialize these to b in \(B_0\) with corresponding \(P_s(b)\). Of course (71) and (72) are trivially done, and we must pay attention only to \(P_{{\tilde{s}}}(b)\).

By (70) we have for \(s={\tilde{r}}p^j\)

$$\begin{aligned} P_s(b)=P_{1{\tilde{r}} j}(b)+P_{2{\tilde{r}} j}(b)+P_{3{\tilde{r}} j}(b)=\sum _{k=j}^{d+{\tilde{n}}}T_{kj}X_k^{p^{j-{\tilde{n}}}}(b)~~~(j={\tilde{n}},\ldots ,d+{\tilde{n}}) \end{aligned}$$


with

$$\begin{aligned} X_k(b)=X_{1{\tilde{r}} k}+X_{2{\tilde{r}} k}+X_{3{\tilde{r}} k}. \end{aligned}$$

Thus at \(\eta \), the equations (72),(73) together with triangularity as in Remark 2 imply

$$\begin{aligned} X_k(\eta )=0~~~~(k=e,\ldots ,d+{\tilde{n}}) \end{aligned}$$

provided \(e-1 \ge {\tilde{n}}\), but

$$\begin{aligned} X_{e-1}(\eta ) \ne 0. \end{aligned}$$

Thus for any b in \(B_0\) not a zero of \(X_{e-1}\) we can specialize (75),(76); and doing the thing backwards leads to the required specializations of (72),(73). It follows that the specialized \(\omega _{{\tilde{P}}}(b)\) of \(f-\lambda \) on \(C_b\) is the same as the generic \(\omega _{{\tilde{P}}}\).

By Remark 1 and (74), this \(X_{e-1}(\eta )\) is a linear form in at most \(3({\tilde{n}}+1)\) of the coefficients of \(A_1,A_2,A_3\), whose coefficients are themselves of total degree at most \(p^{{\tilde{n}}+1}\) in the

$$\begin{aligned} \lambda _{1{\tilde{r}}0},\ldots ,\lambda _{1{\tilde{r}}{\tilde{n}}},\lambda _{2{\tilde{r}}0},\ldots ,\lambda _{2{\tilde{r}}{\tilde{n}}},\lambda _{3{\tilde{r}}0},\ldots ,\lambda _{3{\tilde{r}}{\tilde{n}}}. \end{aligned}$$

Thus indeed we can take any b in a set \(B_{00}\) independent of \(\alpha _1,\alpha _2,\alpha _3\). For example, if we want \(X=a_1\Lambda _1+a_2\Lambda _2 \ne 0\) at b for all non-zero \((a_1,a_2)\) in \(\mathbf{F}_p^2\) then it suffices that \(Y=\Lambda _1\Lambda _2(\Lambda _1^{p-1}-\Lambda _2^{p-1}) \ne 0\) (related to the Moore determinant in [30] p.8) at b, easy if \(Y \ne 0\) already at \(\eta \).
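
The sufficiency claimed in this example can be seen from a factorization (a standard fact, recorded here only as a sketch). Homogenizing \(\prod _{c \in \mathbf{F}_p}(z-c)=z^p-z\) gives

$$\begin{aligned} \Lambda _1\prod _{c \in \mathbf{F}_p}(\Lambda _2-c\Lambda _1)=\Lambda _1(\Lambda _2^p-\Lambda _1^{p-1}\Lambda _2)=-\Lambda _1\Lambda _2(\Lambda _1^{p-1}-\Lambda _2^{p-1})=-Y, \end{aligned}$$

and every \(a_1\Lambda _1+a_2\Lambda _2\) with non-zero \((a_1,a_2)\) in \(\mathbf{F}_p^2\) is a multiple, by an element of \(\mathbf{F}_p^*\), of one of the \(p+1\) factors on the left. So \(Y \ne 0\) at b forces all these X to be non-zero at b.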

Above we assumed that \(e-1 \ge {\tilde{n}}\). If this is not the case, then our counting by poles gives

$$\begin{aligned} \omega _{{\tilde{P}}}(b) \ge 0 \ge {\tilde{r}}p^{e-1}-{\tilde{r}}p^{{\tilde{n}}-1}=\omega _{{\tilde{P}}}-{\tilde{r}}p^{{\tilde{n}}-1}\ge \omega _{{\tilde{P}}}-R. \end{aligned}$$

Thus in both cases for e we get

$$\begin{aligned} \omega _{{\tilde{P}}}(b) \ge \ \omega _{{\tilde{P}}}-R_{{\tilde{P}}}. \end{aligned}$$

So at each \({{\tilde{P}}}\) on \({\tilde{C}}\) where at least one of \(x_1,x_2,x_3\) has a pole, if \(\omega _{{\tilde{P}}}>0\) we have

$$\begin{aligned} \max \{0,\omega _{{\tilde{P}}}(b)\}\ge \max \{0,\omega _{{\tilde{P}}}\}-\max \{0,-\mathrm{ord}_{{\tilde{P}}}x_1,-\mathrm{ord}_{{\tilde{P}}}x_2,-\mathrm{ord}_{{\tilde{P}}}x_3\}; \end{aligned}$$

and this holds trivially if \(\omega _{{\tilde{P}}} \le 0\).

Thus summing over all such \({{\tilde{P}}}\) we get

$$\begin{aligned} \deg _b(f-\lambda ) \ge \deg _\eta (f-\lambda )-\sum _{{\tilde{P}}}\max \{0,-\mathrm{ord}_{{\tilde{P}}}x_1,-\mathrm{ord}_{{\tilde{P}}}x_2,-\mathrm{ord}_{{\tilde{P}}}x_3\}. \end{aligned}$$

The sum on the right is the total number of poles (with multiplicity) of a generic linear combination of \(x_1,x_2,x_3\). So the sum is \(\deg C\), the total number of zeroes.

As promised we now show that for any b in our present \(B_{00}\), indeed \(f-\lambda \) is not identically zero on \(C_b\). The corresponding assertion in the multiplicative situation is proved in [9] at the bottom of page 463.

Suppose on the contrary \(f=\lambda \) on \(C_b\). Choose any non-torsion \(\tau \) in \(\overline{F_0}\); then \(f \ne \lambda +\tau \) on \(C_b\). Thus \({{{\mathcal {C}}}}^kf \ne \lambda _k={{{\mathcal {C}}}}^k(\lambda +\tau )\) on \(C_b\) for any \(k \ge 0\). We apply (69) to \({{{\mathcal {C}}}}^kf-\lambda _k={{{\mathcal {C}}}}^k\lambda -\lambda _k=-{{{\mathcal {C}}}}^k\tau \ne 0\), getting

$$\begin{aligned} 0 \ge \deg _\eta ({{{\mathcal {C}}}}^kf-\lambda _k)-\deg C. \end{aligned}$$

But \(f-\lambda -\tau \) is not constant on \(C_\eta \) by our basic hypothesis, so \(f-\lambda -\tau \) has at least one pole there. Thus \({{{\mathcal {C}}}}^k(f-\lambda -\tau )={{{\mathcal {C}}}}^kf-\lambda _k\) has a pole of order at least \(p^k\) on \(C_\eta \). Therefore the first degree on the right-hand side of (77) is at least \(p^k\). Now we obtain a contradiction by making k tend to infinity. \(\square \)
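
To make the final step explicit (a sketch, assuming as in the example \(\xi ^p+t\xi \) earlier that \({\mathcal {C}}(x)=x^p+tx\)): if \(g=f-\lambda -\tau \) has a pole of order \(\omega \ge 1\) at some point of \({\tilde{C}}\), then

$$\begin{aligned} {\mathcal {C}}^k(g)=g^{p^k}+\cdots +t^kg \end{aligned}$$

has a pole of order \(p^k\omega \ge p^k\) there, since the leading term dominates. Thus (77) would give \(0 \ge p^k-\deg C\), which fails as soon as \(p^k>\deg C\).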

The next result essentially replaces an argument in the proof of Lemma 6.1 of [9] (p.464); here we lack Mason’s abc Theorem.

Lemma 21

For b in \(B_{00}(\overline{F_0})\) or \(b=\eta \) there is a finite union \({{{\mathcal {E}}}}_b\) (possibly empty) of rank 2 submodules of \({{{\mathcal {R}}}}^3\) with the following property. Suppose the non-zero \((\alpha _1,\alpha _2,\alpha _3)\) in \({{{\mathcal {R}}}}^3\) is not in \({{{\mathcal {E}}}}_b\) (if non-empty). Then if \(\mathrm{ord}_{{\tilde{P}}}f >0\) for some \({{\tilde{P}}}\) in \({\tilde{C}}_b\), we have

(a) \(\mathrm{ord}_{{\tilde{P}}}f =1\) for any \({{\tilde{P}}}\) over a non-singular point P of \(C_b\),

(b) \(\mathrm{ord}_{{\tilde{P}}}f \le \deg C\) for any \({{\tilde{P}}}\) over a singular point of \(C_b\),

(c) \(\mathrm{ord}_{{\tilde{P}}}f \le (\deg C)(1+\deg u)\) for any \({{\tilde{P}}}\) over an infinite point of \(C_b\), where u is a corresponding uniformizer.


We assume first that b is in \(B_{00}(\overline{F_0})\).

With as usual \(\alpha _i=A_i({{{\mathcal {C}}}})\) we have by (33)

$$\begin{aligned} \mathrm{d}f=A_1(t)\mathrm{d}x_1+A_2(t)\mathrm{d}x_2+A_3(t)\mathrm{d}x_3 \end{aligned}$$

on C or \(C_b\). This is the analogue of the multiplicative

$$\begin{aligned} {\mathrm{d}(x_1^{a_1}x_2^{a_2}x_3^{a_3}) \over x_1^{a_1}x_2^{a_2}x_3^{a_3}}=a_1{\mathrm{d}x_1\over x_1}+a_2{\mathrm{d}x_2 \over x_2}+a_3{\mathrm{d}x_3 \over x_3}. \end{aligned}$$

In case (a) at least one of \(\mathrm{d}x_1,\mathrm{d}x_2,\mathrm{d}x_3\) must be non-zero at \({{\tilde{P}}}\) on \(C_b\). Say it is \(\mathrm{d}x \ne 0\). Then

$$\begin{aligned} {\mathrm{d}f \over \mathrm{d}x}=A_1(t)g_1+A_2(t)g_2+A_3(t)g_3=\phi \end{aligned}$$

say, with fixed functions \(g_i=\mathrm{d}x_i/\mathrm{d}x\).

If (a) is false we have \(\phi ({{\tilde{P}}})=0\) on \(C_b\). But this implies

$$\begin{aligned} {[}F_0(P):F_0] \le D_b \end{aligned}$$

for some \(D_b\) possibly depending on b; unless, that is, \(\phi \) is identically zero on \(C_b\). We deal with this last possibility first.

If \(\phi \) is identically zero on \(C_b\) then \((A_1(t),A_2(t),A_3(t))\) in \({{{\mathcal {O}}}}_0^3\) is in the additive relation group (with an obvious extension of the notion in [41] especially section 3) of \(g_1,g_2,g_3\) in \(F_0(b)(C_b)\). This group is of course a \({{{\mathcal {O}}}}_0\)-module. If it were the full \({{{\mathcal {O}}}}_0^3\) then \(g_1,g_2,g_3\) would all be identically zero on \(C_b\) which is absurd, because actually one of them is 1. So there is non-zero \((\varepsilon _1,\varepsilon _2,\varepsilon _3)\) in \({{{\mathcal {R}}}}^3\), possibly depending on b, such that \(\varepsilon _1\alpha _1+\varepsilon _2\alpha _2+\varepsilon _3\alpha _3 = 0\). We put the corresponding rank 2 submodule into \({{{\mathcal {E}}}}_b\).

Thus indeed we may assume that (78) holds.

Now \(f({{\tilde{P}}})=0\), and so by Proposition 1 above together with the Northcott property we deduce that there are at most finitely many possibilities for \({{\tilde{P}}}\). For each of those \({{\tilde{P}}}\) which are over a non-singular point we still have \(\phi ({{\tilde{P}}})=0\), and again this leads to \((\varepsilon _1,\varepsilon _2,\varepsilon _3)\) as above. This completes case (a).

In case (b) for \({{\tilde{P}}}\) over a singular point \((\xi _1,\xi _2,\xi _3)\) of \(C_b\) we have

$$\begin{aligned} x_i=\xi _i+\lambda _iu^r+\cdots \end{aligned}$$

for a suitably chosen uniformizer u and with \(\lambda _1,\lambda _2,\lambda _3\) values, not all zero, of fixed functions on C evaluated at \({{\tilde{P}}}\) and specialized at b. Here \(r \le \deg C\) because

$$\begin{aligned} \mathrm{ord}_{{\tilde{P}}}(x_i-\xi _i) \le \deg (x_i-\xi _i) = \deg x_i \le \deg C. \end{aligned}$$

So \(f=\mu u^r+\cdots \) for \(\mu =A_1(t)\lambda _1+A_2(t)\lambda _2+A_3(t)\lambda _3\). This gives the result unless \(\mu =0\); in which case as above this leads again to \((\varepsilon _1,\varepsilon _2,\varepsilon _3)\).

Finally in case (c) we go back to the decomposition \(x_i=\underline{x_i}+\overline{x_i}\) in the proof of Lemma 20, where \(\underline{x_i}\) involves only non-negative exponents and \(\overline{x_i}\) only (finitely many) negative exponents. As \(f({{\tilde{P}}})=0\) we must have \(\alpha _1\overline{x_1}+\alpha _2\overline{x_2}+\alpha _3\overline{x_3}=0\) identically on \(C_b\). This implies that \(\underline{x_1},\underline{x_2},\underline{x_3}\) cannot all be zero on \(C_b\), otherwise so would \(f=\alpha _1x_1+\alpha _2x_2+\alpha _3x_3\) be, already impossible for b in \(B_{00}\) thanks to Lemma 20.

So some \(\underline{x_i} \ne 0\). If also \(\overline{x_i} \ne 0\) then

$$\begin{aligned} \mathrm{ord}_{{\tilde{P}}}\underline{x_i} = \mathrm{ord}_{{\tilde{P}}}(x_i-\overline{x_i})\le \deg (x_i-\overline{x_i})\le \deg x_i+\deg \overline{x_i}. \end{aligned}$$


Also

$$\begin{aligned} \deg \overline{x_i}\le |\mathrm{ord}_{{\tilde{P}}}~x_i|\deg u \le (\deg x_i)(\deg u) \end{aligned}$$

and so

$$\begin{aligned} \mathrm{ord}_{{\tilde{P}}}\underline{x_i} \le (\deg C)(1+\deg u)=s \end{aligned}$$

say. And this clearly holds even if \(\overline{x_i} = 0\).

So for each i we can write \(\underline{x_i}=\lambda _iu^s+\cdots \) as in (79), and as

$$\begin{aligned} \mathrm{ord}_{{\tilde{P}}}f=\mathrm{ord}_{{\tilde{P}}}(\alpha _1\underline{x_1}+\alpha _2\underline{x_2}+\alpha _3\underline{x_3}) \end{aligned}$$

we can follow the arguments in case (b).

If \(b=\eta \) then in (a) the only change is as follows. Now (78) becomes \([F_0(s_1,\ldots ,s_l,P):F_0(s_1,\ldots ,s_l)] \le D_\eta \), so the index in Lemma 19 is no bigger; this lemma shows that there are at most finitely many \({{\tilde{P}}}\). The argument for (b) is essentially unchanged. Similarly for (c); here we already know that f is not identically zero on \(C_\eta \). \(\square \)

The result would become false without the condition involving \({{{\mathcal {E}}}}_b\). For example with C as the line parametrized by \((x,tx,sx)\) and \(\alpha _1={{{\mathcal {C}}}},\alpha _2=-1,\alpha _3=0\) we have \(f=x^p\) contradicting (a) at \(P=(0,0,0)\).
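
Indeed (a sketch, assuming \({\mathcal {C}}(x)=x^p+tx\) and reading the first two coordinates of the line as \(x_1=x\), \(x_2=tx\)) we have

$$\begin{aligned} f={\mathcal {C}}(x_1)-x_2=(x^p+tx)-tx=x^p,~~\mathrm{d}f=t\,\mathrm{d}x_1-\mathrm{d}x_2=t\,\mathrm{d}x-t\,\mathrm{d}x=0. \end{aligned}$$

Thus f vanishes to order p at the origin, and the function \(\phi \) in the proof above is identically zero, which is exactly the situation that puts \((\alpha _1,\alpha _2,\alpha _3)\) into \({\mathcal {E}}_b\).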

In fact we are not so very far from the classical abc Mason result. For example over \(\mathbf{C}\) consider \(x^{a_1}(x-1)^{a_2}(x-2)^{a_3}-1\) for positive integer exponents. This is a non-zero polynomial of degree \(D=a_1+a_2+a_3\); and abc implies at once that it has at least \(D-2\) zeroes without multiplicity. A Carlitz analogue might be

$$\begin{aligned} A_1({{{\mathcal {C}}}})(x)+A_2({{{\mathcal {C}}}})(tx)+A_3({{{\mathcal {C}}}})(t^2x)-1 \end{aligned}$$

for monic \(A_1,A_2,A_3\). If \(p \ne 2,3\) it is easy to see that this has degree \(D=\max \{||A_1||,||A_2||,||A_3||\}\) in x. Using (33) as above we see that its derivative is simply \(A_1+tA_2+t^2A_3\). If \(p \ne 2,3\) that is non-zero; thus we see that the polynomial now has exactly D zeroes (without multiplicity).
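
The derivative computation rests on the fact that for \(A=a_0+a_1t+\cdots \) the polynomial \(A({\mathcal {C}})(x)\) has linear term \(A(t)x\) and otherwise only terms in \(x^p,x^{p^2},\ldots \), whose derivatives vanish. As a sketch with the choice (ours, purely for illustration, and assuming \({\mathcal {C}}(x)=x^p+tx\)) \(A_1=t\), \(A_2=A_3=1\):

$$\begin{aligned} {\mathcal {C}}(x)+tx+t^2x-1=x^p+(2t+t^2)x-1,~~{\mathrm{d} \over \mathrm{d}x}\left( x^p+(2t+t^2)x-1\right) =2t+t^2=A_1+tA_2+t^2A_3. \end{aligned}$$

Here the degree in x is p, and since the derivative is a non-zero constant in x the p zeroes are distinct.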

Now we can give an analogue of Lemma 6.1 of [9] (p.464). We write \(N_{\mathrm{sing}}\) for the number of points of \({\tilde{C}}\) above singular points of C and \(S_{\mathrm{inf}}\) for the sum of \(1+\deg u\) taken over all points above infinite points of C with u a corresponding uniformizer. Both quantities remain unchanged when we replace C by \(C_b\). For \(\alpha _1,\alpha _2,\alpha _3\) as above we define f as usual and then H in \(\mathbf{A}^3 \times \mathbf{A}^m\) by the equation \(f=0\). It is convenient during this section to assume that \(C_B \cap H\) is non-empty.

Lemma 22

For b in \(B_{00}(\overline{F_0})\) or \(b=\eta \) suppose the nonzero \((\alpha _1,\alpha _2,\alpha _3)\) in \({{{\mathcal {R}}}}^3\) is not in \({{{\mathcal {E}}}}_b\) (if non-empty). Then \(C_B \cap H \cap \pi ^{-1}(b)\) is a finite set whose cardinality satisfies

$$\begin{aligned} \deg _bf-(\deg C)(N_{\mathrm{sing}}+S_{\mathrm{inf}}) \le \#(C_B \cap H \cap \pi ^{-1}(b)) \le \deg _bf. \end{aligned}$$


We estimate

$$\begin{aligned} \deg _bf=\sum _{{{\tilde{P}}} \in {\tilde{C}}_b}\max \{0,\mathrm{ord}_{{\tilde{P}}}f\} \end{aligned}$$

from above and below. We need to take into account only zeroes \({{\tilde{P}}}\) of f.

By Lemma 21, each \({{\tilde{P}}}\) in (81) above a non-singular point of \(C_b\) contributes 1. The number of such \({{\tilde{P}}}\) is at most the cardinality of \(C_b \cap \gamma (H)\) (finite by Lemma 20), which is the same as that of \(C_B \cap H \cap \pi ^{-1}(b)\) because \(\gamma \) is an injection even on \(C_B \cap \pi ^{-1}(b)\). Similarly each \({{\tilde{P}}}\) above a singular point contributes at most \(\deg C\), and each \({{\tilde{P}}}\) above an infinite point at most \((\deg C)(1+\deg u)\). The left-hand inequality of (80) follows.

On the other hand (81) is at least the number of zeroes (without multiplicity) of f over finite points of \(C_b\), and this proves the right-hand inequality. \(\square \)

Next we give an analogue of Lemma 6.2 of [9] (p.464). The Dimension Theorem (see for example [35] p.36) shows that \(C_B \cap H\) (here assumed non-empty) has all its components of dimension l unless \(C_B\) is in H; but this last possibility is excluded by our original hypothesis on C.

Lemma 23

For b in \(B_{00}(\overline{F_0})\) or \(b=\eta \) suppose the nonzero \((\alpha _1,\alpha _2,\alpha _3)\) in \({{{\mathcal {R}}}}^3\) is not in \({{{\mathcal {E}}}}_b\) (if non-empty). Then for any finite union W of components of \(C_B \cap H\) we have

$$\begin{aligned} \# \left( W \cap \pi ^{-1}(b)\right) \ge \# \left( W \cap \pi ^{-1}(\eta )\right) -(\deg C) \left( 1+N_{\mathrm{sing}}+S_{\mathrm{inf}}\right) . \end{aligned}$$


Write \(s(b),s(\eta )\) for the two cardinalities to be compared. Let \(W'\) be the union of the components of \(C_B \cap H\) not in the union W, and write \(s'(b),s'(\eta )\) analogously. By Lemma 22 we have

$$\begin{aligned} s(b)+s'(b) \ge \# \left( C_B \cap H \cap \pi ^{-1}(b)\right) \ge \deg _bf-(\deg C) \left( N_{\mathrm{sing}}+S_{\mathrm{inf}}\right) . \end{aligned}$$


Further

$$\begin{aligned} s(\eta )+s'(\eta ) = \# \left( C_B \cap H \cap \pi ^{-1}(\eta )\right) \end{aligned}$$

since the sets \(W \cap \pi ^{-1}(\eta ),W' \cap \pi ^{-1}(\eta )\) are disjoint. This is because \(W \cap W'\) has dimension at most \(l-1\), so cannot project to \(\eta \). Also by Lemma 22 we have

$$\begin{aligned} \# \left( C_B \cap H \cap \pi ^{-1}(\eta )\right) \le \deg _\eta f \end{aligned}$$

which by Lemma 20 is at most \(\deg _bf+\deg C\).

Comparing these inequalities we see that it suffices now only to verify \(s'(b) \le s'(\eta )\). But this follows from Lemma 16 (with \(W'\) not W). We just have to note that \(W' \cap \pi ^{-1}(b)\) is finite by Lemma 22. \(\square \)
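
For convenience, the comparison in this proof can be written as a single chain. With \(\kappa =(\deg C)(N_{\mathrm{sing}}+S_{\mathrm{inf}})\) (our abbreviation) and granted \(s'(b) \le s'(\eta )\), we have

$$\begin{aligned} s(b)&\ge \deg _bf-\kappa -s'(b) \ge \deg _\eta f-\deg C-\kappa -s'(b)\\ &\ge s(\eta )+s'(\eta )-s'(b)-\deg C-\kappa \ge s(\eta )-(\deg C)\left( 1+N_{\mathrm{sing}}+S_{\mathrm{inf}}\right) . \end{aligned}$$

The successive steps use the lower bound of Lemma 22 at b, Lemma 20 with \(\lambda =0\), the upper bound of Lemma 22 at \(\eta \), and finally \(s'(b) \le s'(\eta )\).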

However the next result has no analogue in [9], which is solidly over zero characteristic, so that all inseparable degrees \(\deg ^{\mathrm{ins}}\) there are 1.

Lemma 24

Suppose the nonzero \((\alpha _1,\alpha _2,\alpha _3)\) in \({{{\mathcal {R}}}}^3\) is not in \({{{\mathcal {E}}}}_\eta \) (if non-empty). Then if

$$\begin{aligned} \deg _\eta f > 2(\deg C)(N_{\mathrm{sing}}+S_{\mathrm{inf}}) \end{aligned}$$

we have

$$\begin{aligned} \deg ^{\mathrm{ins}}_\eta f=1. \end{aligned}$$


By Lemma 22 we have

$$\begin{aligned} \#(C_B \cap H \cap \pi ^{-1}(\eta )) \ge d-(\deg C)(N_{\mathrm{sing}}+S_{\mathrm{inf}}) \end{aligned}$$

for \(d=\deg _\eta f\). Now \(d^{\mathrm{sep}}=\deg ^{\mathrm{sep}}_\eta f\) is the number of solutions of \(f=\lambda \) (without multiplicity) for generic \(\lambda \). But these solutions are simply translates of the solutions of \(f=0\) by a single point. Thus

$$\begin{aligned} \#(C_B \cap H \cap \pi ^{-1}(\eta ))=d^{\mathrm{sep}}={d \over \deg ^{\mathrm{ins}}_\eta f} \end{aligned}$$

and the lemma follows at once. \(\square \)
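
To spell out the last step (a sketch): write \(\kappa =(\deg C)(N_{\mathrm{sing}}+S_{\mathrm{inf}})\) and \(q=\deg ^{\mathrm{ins}}_\eta f\), which is a power of p. If \(q>1\) then \(q \ge p \ge 2\), and the two displays in the proof give

$$\begin{aligned} {d \over 2} \ge {d \over q} \ge d-\kappa , \end{aligned}$$

that is \(d \le 2\kappa \), contrary to the hypothesis \(d>2\kappa \). Hence \(q=1\).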

Almost finishing

Here we prove our Theorem for general K, which as above we can take as \(K=\overline{F_0(s_1,\ldots ,s_l)}\) with \(l \ge 1\), and C defined over \(\overline{F_0}(s_1,\ldots ,s_m)=\overline{F_0}(\eta )\). We may assume that C is not defined over \(\overline{F_0}\), else every point with just one non-trivial relation would already be over \(\overline{F_0}\) and so our Theorem for that field suffices.

Throughout this section (as in the previous section) we shall assume that no non-zero \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is identically constant on C. In the next section we shall relax this as required.

Now C is defined over some field \(K_*\), finitely generated over \(\mathbf{F}_p\), which lies in \(\overline{F_0}(\eta )\). So we can find a finite extension F of \(F_0\) with \(K_*\) inside \(F(\eta )\). Both of these latter fields are finitely generated over \(F_0\) with transcendence degree l and so the index \([F(\eta ):K_*]=e\) is finite.

Fix once and for all some b in \(B_{00}(\overline{F_0})\). By Lemma 20 the hypothesis of our Theorem for \(C_b\) over \(\overline{F_0}\) is satisfied. Thus there are at most finitely many points on \(C_b(\overline{F_0})\) satisfying two independent relations. Let \(r \ge 0\) be their cardinality.

Let P be any point on C with two independent relations (if it exists at all), and let W be the F-Zariski closure of \((P,\eta )\) on \(C_B\) in \(\mathbf{A}^3 \times \mathbf{A}^m\).

Now there is a submodule \({{\mathcal {M}}}\) of \({{{\mathcal {R}}}}^3\) of rank at least 2 that kills P. It may be that \({{\mathcal {M}}}\) lies in the unions \({{{\mathcal {E}}}}_b,{{{\mathcal {E}}}}_\eta \) (if non-empty) appearing in Lemma 21. But then \({{\mathcal {M}}}\) would be in one of the members making up \({{{\mathcal {E}}}}_b \cup {{{\mathcal {E}}}}_\eta \). Using \(\mathrm{GL}_3({{{\mathcal {R}}}})\) as at the end of the proof of Lemma 19, we can assume that \({{\mathcal {M}}}\) actually lies in \({{{\mathcal {R}}}}^2\). But now the problem in \(\mathbf{G}_{\mathrm{a}}^3\) is reduced to one in \(\mathbf{G}_{\mathrm{a}}^2\), an easy torsion problem.

Thus we can assume that \({{\mathcal {M}}}\) does not lie in \({{{\mathcal {E}}}}_b \cup {{{\mathcal {E}}}}_\eta \). Pick any \((\alpha _1,\alpha _2,\alpha _3)\) in \({{\mathcal {M}}}\) not in \({{{\mathcal {E}}}}_b \cup {{{\mathcal {E}}}}_\eta \). This gives an H as above; but of course there is an element of \({{\mathcal {M}}}\) independent of \((\alpha _1,\alpha _2,\alpha _3)\), and this gives analogously some \(H'\).

Thus \((P,\eta )\) lies in \(C_B \cap H \cap H'\) (and in particular \(C_B \cap H\) is non-empty as we assumed in the preceding section). The dimension of W is at least l. It follows that W (which is irreducible over F) is a finite union of \(\overline{F_0}\)-irreducible components of \(C_B \cap H\) (all of dimension l as we saw above). Therefore by Lemma 23 we have

$$\begin{aligned} \#(W \cap \pi ^{-1}(b)) \ge \#(W \cap \pi ^{-1}(\eta ))-c_1 \end{aligned}$$

for some \(c_1\) depending only on C.

But \(\#(W \cap \pi ^{-1}(\eta ))\) is just the degree

$$\begin{aligned} {[}F(P,\eta ):F(\eta )]^{\mathrm{sep}}={[F(P,\eta ):F(\eta )] \over [F(P,\eta ):F(\eta )]^{\mathrm{ins}}}. \end{aligned}$$

Now \(F(P,\eta )\) lies in the field \(K_f\) obtained by adjoining to \(F(\eta )\) all solutions of \(f=0\) on \(C_\eta \). So

$$\begin{aligned} {[}F(P,\eta ):F(\eta )]^{\mathrm{ins}} \le [K_f:F(\eta )]^{\mathrm{ins}}=\deg _\eta ^{\mathrm{ins}}f. \end{aligned}$$

If \(d=\deg _\eta f >c_2\), where \(c_2 \ge 1\) again depends only on C, then by Lemma 24 we have \(\deg _\eta ^{\mathrm{ins}}f=1\). But otherwise

$$\begin{aligned} {[}F(P,\eta ):F(\eta )]^{\mathrm{ins}} \le \deg _\eta ^{\mathrm{ins}}f \le d \le c_2. \end{aligned}$$

Thus in both cases

$$\begin{aligned} \#(W \cap \pi ^{-1}(\eta ))=[F(P,\eta ):F(\eta )]^{\mathrm{sep}} \ge c_2^{-1}[F(P,\eta ):F(\eta )]. \end{aligned}$$
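To spell out the step just taken: in the first case the inseparable degree is at most \(\deg _\eta ^{\mathrm{ins}}f=1 \le c_2\), and in the second case it is at most \(c_2\) directly, so that in either case

$$\begin{aligned} {[}F(P,\eta ):F(\eta )]^{\mathrm{sep}}={[F(P,\eta ):F(\eta )] \over [F(P,\eta ):F(\eta )]^{\mathrm{ins}}} \ge {[F(P,\eta ):F(\eta )] \over c_2}. \end{aligned}$$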

And in turn

$$\begin{aligned} {[}F(P,\eta ):F(\eta )]={[F(P,\eta ):K_*(P)][K_*(P):K_*] \over [F(\eta ):K_*]} \ge {[K_*(P):K_*] \over e}. \end{aligned}$$
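The equality here is just the multiplicativity of degrees in the two towers \(K_* \subseteq F(\eta ) \subseteq F(P,\eta )\) and \(K_* \subseteq K_*(P) \subseteq F(P,\eta )\), which we may write as

$$\begin{aligned} {[}F(P,\eta ):F(\eta )]\,[F(\eta ):K_*]=[F(P,\eta ):K_*]=[F(P,\eta ):K_*(P)]\,[K_*(P):K_*]; \end{aligned}$$

the inequality then follows on using \([F(\eta ):K_*]=e\) and \([F(P,\eta ):K_*(P)] \ge 1\).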

Collecting these together, we see that \(W \cap \pi ^{-1}(b)\) contains at least

$$\begin{aligned} c_2^{-1}e^{-1}[K_*(P):K_*] -c_1 \end{aligned}$$

different points. These project under \(\gamma \) to different points of \(C_b \cap \gamma (H) \cap \gamma (H')\), whose cardinality is at most r. Therefore

$$\begin{aligned} {[}K_*(P):K_*] \le (c_1+r)c_2e \end{aligned}$$

is bounded above independently of P. Now \(K_*(t,s_1,\ldots ,s_l)\) contains both \(K_*\) and \(F_0(s_1,\ldots ,s_l)\), and all three fields are finitely generated over \(F_0\) with transcendence degree l. So all the indices between them are finite, and hence the index \([F_0(s_1,\ldots ,s_l,P):F_0(s_1,\ldots ,s_l)]\) is also bounded above independently of P.

Thus we can use Lemma 19 (in which the index is no bigger) to conclude as desired that there are at most finitely many P. This completes the proof of the Theorem when no non-zero \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is identically constant on C.
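For convenience we record the estimates of this section as a single chain. Since the points of \(W \cap \pi ^{-1}(b)\) project under \(\gamma \) to different points of a set of cardinality at most r, we have

$$\begin{aligned} r \ge \#(W \cap \pi ^{-1}(b)) \ge \#(W \cap \pi ^{-1}(\eta ))-c_1 \ge c_2^{-1}[F(P,\eta ):F(\eta )]-c_1 \ge c_2^{-1}e^{-1}[K_*(P):K_*]-c_1, \end{aligned}$$

and solving for \([K_*(P):K_*]\) gives the bound \((c_1+r)c_2e\) used above.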

Relaxing the condition

Finally if some non-zero \(\rho _1x_1+\rho _2x_2+\rho _3x_3\) is identically constant on C, then again via \(\mathrm{GL}_3\) we can assume that \(x_3\) is a constant \(\xi _3\) on C. It is then necessarily non-torsion. Now the problem reduces to the analogue of Mordell-Lang on the projection to \(\mathbf{G}_{\mathrm{a}}^2\): both \(\xi _1\) and \(\xi _2\) are in the division hull of \({{{\mathcal {R}}}}\xi _3\). In Sect. 4 we had reached a similar stage, but that was over \(\overline{F_0}\). Oddly enough the extension to \(\overline{F_0(s_1,\ldots ,s_l)}\) is not in the literature, although Ghioca comes very close in [27]. In the personal communication [28] he does establish what we need here. But we can also use the ideas of [9] as follows.

Indeed we argue as in Sect. 4.

Namely as in (23) we can find \(\xi \) and a torsion \(\zeta \) such that

$$\begin{aligned} F_0(s_1,\ldots ,s_l)(\xi _1,\xi _2,\xi _3)=F_0(s_1,\ldots ,s_l)(\xi ,\zeta ) \end{aligned}$$

and (24). Further if \(\zeta \) has order \(\nu \) then \(\tau _1,\tau _2,\tau _3\) are polynomials in \({{{\mathcal {R}}}}\) of degree less than that of \(\nu \).

Then we can proceed down to (26), now in the form

$$\begin{aligned} D=\left[ F_0(s_1,\ldots ,s_l)(\xi _1,\xi _2,\xi _3):F_0(s_1,\ldots ,s_l)\right] \ll (\Vert \nu \Vert M)^{1/2} \end{aligned}$$

with implied constants depending only on C (and \(\epsilon \) soon to appear).

On the other hand (24) gives

$$\begin{aligned} h_s(\xi _1)=\Vert \sigma _1\Vert h_s(\xi ),~~h_s(\xi _2)=\Vert \sigma _2\Vert h_s(\xi ),~~h_s(\xi _3)=\Vert \sigma _3\Vert h_s(\xi ) \end{aligned}$$

as \(h_s=0\) on all of \(\overline{F_0}\).

Now \(h_s(\xi _3) \ll 1\) as \(\xi _3\) is fixed. But we claim that also

$$\begin{aligned} h_s(\xi _1) \ll 1,~~~h_s(\xi _2) \ll 1. \end{aligned}$$

To see this, project C down to a curve \(C'\) in \(\mathbf{G}_{\mathrm{a}}^2\). There is a non-trivial Carlitz relation between \(\xi _1,\xi _2\) and so by Lemma 18 we indeed get (84) unless some non-trivial \(\rho _1x_1+\rho _2x_2\) is constant on \(C'\). With \(\mathrm{GL}_2\) we could then assume \(x_2=\xi _2\) is constant on \(C'\). But now \(\xi _2,\xi _3\) must be independent and so we could not have had two independent relations among \(\xi _1,\xi _2,\xi _3\).

Now (83) implies

$$\begin{aligned} h_s(\xi ) \ll M^{-1}. \end{aligned}$$
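Here is a sketch of why, under the assumption (our reading of the notation of Sect. 4, not restated in this section) that M stands for the maximum of \(\Vert \sigma _1\Vert ,\Vert \sigma _2\Vert ,\Vert \sigma _3\Vert \). Choosing i with \(\Vert \sigma _i\Vert =M\), the relations between \(h_s(\xi _i)\) and \(h_s(\xi )\) above, together with the boundedness of \(h_s(\xi _1),h_s(\xi _2),h_s(\xi _3)\), give

$$\begin{aligned} h_s(\xi )={h_s(\xi _i) \over \Vert \sigma _i\Vert }={h_s(\xi _i) \over M} \ll {1 \over M}. \end{aligned}$$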

We can assume that \(\xi \) is not in \(\overline{F_0}\), because otherwise by (24) the point \((\xi _1,\xi _2,\xi _3)\) would be in \(C(\overline{F_0})\); and because C is not defined over \(\overline{F_0}\) this would give the required finiteness at once. Now in (68) we have

$$\begin{aligned} \left[ \overline{F_0}(s_1,\ldots ,s_l,\xi ):\overline{F_0}(s_1,\ldots ,s_l)\right] =\left[ \overline{F_0}(s_1,\ldots ,s_l,\xi ):\overline{F_0}(s_1,\ldots ,s_l,\zeta )\right] \end{aligned}$$

because \(\zeta \) is in \(\overline{F_0}\). This in turn is at most

$$\begin{aligned} \left[ {F_0}(s_1,\ldots ,s_l,\xi ):{F_0}(s_1,\ldots ,s_l,\zeta )\right] \le \left[ {F_0}(s_1,\ldots ,s_l,\xi _1,\xi _2,\xi _3):{F_0}(s_1,\ldots ,s_l,\zeta )\right] \end{aligned}$$

which is

$$\begin{aligned} {D \over [F_0(s_1,\ldots ,s_l,\zeta ):F_0(s_1,\ldots ,s_l)]}. \end{aligned}$$

Here the denominator is \([F_0(\zeta ):F_0]=\phi (\nu ) \gg \Vert \nu \Vert ^{1-\epsilon }\) for any \(\epsilon >0\).
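Here we are reading \(\phi \) as the analogue for \({{{\mathcal {R}}}}\) of Euler's function, and \(\Vert \nu \Vert \) as the cardinality of \({{{\mathcal {R}}}}/\nu {{{\mathcal {R}}}}\); both are assumptions about notation fixed earlier in the paper. With this reading one has the product formula

$$\begin{aligned} \phi (\nu )=\Vert \nu \Vert \prod _{\pi | \nu }\left( 1-{1 \over \Vert \pi \Vert }\right) \end{aligned}$$

taken over the monic irreducible \(\pi \) dividing \(\nu \). For instance if \(\nu =\pi ^k\) is a prime power then \(\phi (\nu )=\Vert \pi \Vert ^k-\Vert \pi \Vert ^{k-1} \ge \Vert \nu \Vert /2\), as \(\Vert \pi \Vert \ge 2\).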

So we can indeed assume

$$\begin{aligned} h_s(\xi ) \gg {\Vert \nu \Vert ^{1-\epsilon } \over D} \end{aligned}$$

a sort of “cyclotomic Lehmer” in the sense of Proposition 2. Comparing with (85) we find \(D \gg M\Vert \nu \Vert ^{1-\epsilon } \ge (M\Vert \nu \Vert )^{1-\epsilon }\). We choose \(\epsilon <1/2\) to get by (82) \(D \ll 1\), and now we conclude with Lemma 19 (which we already noted holds for \(\mathbf{G}_{\mathrm{a}}^2\)) applied to \((\xi _1,\xi _2)\) on \(C'\).
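The final comparison of exponents can be written out as follows. Here we assume \(M \ge 1\), which is automatic if M is a maximum of norms of non-zero elements of \({{{\mathcal {R}}}}\) (our reading of the notation). Then

$$\begin{aligned} (M\Vert \nu \Vert )^{1-\epsilon } \le M\Vert \nu \Vert ^{1-\epsilon } \ll D \ll (M\Vert \nu \Vert )^{1/2}, \end{aligned}$$

so that \((M\Vert \nu \Vert )^{1/2-\epsilon } \ll 1\). With \(\epsilon \) fixed below 1/2, for example \(\epsilon =1/4\), this bounds \(M\Vert \nu \Vert \) and therefore \(D \ll (M\Vert \nu \Vert )^{1/2} \ll 1\).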

This completes the proof of the Theorem.