Unlikely intersections for curves in products of Carlitz modules

The conjectures associated with the names of Zilber-Pink greatly generalize results associated with the names of Manin-Mumford and Mordell-Lang, but unlike the latter they are almost exclusively restricted to zero characteristic. Not so long ago the second author made a start on removing this restriction by studying multiplicative groups over positive characteristic, and recently both authors went further for additive groups with extra Frobenius structure. Here we study additive groups with extra structure coming instead from the Carlitz module. We state a conjecture for curves in general dimension and we prove it in three dimensions. The main tool is a new relative version (for cyclotomic fields) of Denis’s analogue of Dobrowolski’s classical lower bound for heights, as well as a suitable upper bound. We also work out a couple of special cases in two dimensions: for example with respect to prime fields there are exactly 23 Carlitz roots of unity whose reciprocals are also roots of unity.


Introduction
For over two decades now much has been written on the study of what happens when a fixed algebraic variety sitting inside a fixed commutative group variety is intersected with the union of group subvarieties of suitable dimension. When the group variety is the multiplicative group G_m^n, we may refer to the work of Bombieri, Zannier and the second author (for example the early paper [8] on curves, our later paper [11] on varieties of codimension 2, and our paper [12] on planes) and the wide-ranging extension of Habegger to arbitrary varieties (see [31] for example). When the group variety is projectively complete there are the results of Viada about powers of a fixed elliptic curve (see [58] for example) as well as those of Rémond generalizing to abelian varieties (see [54] for example); see especially the paper [32] of Habegger and Pila. There are also investigations of Zannier and the second author inside varying group varieties such as elliptic and abelian schemes (see [45][46][47] for example). All this work on "unlikely intersections" takes place over zero characteristic, and one may consult the book [59] of Zannier for a comprehensive survey. The general conjectures are due to Zilber [60] and Pink [51].
Over positive characteristic it is well-known that related simpler problems, such as those associated with the names Manin-Mumford about torsion points, can become false. For example over zero characteristic the equation x + y = 1 (1) has only two solutions in roots of unity x and y (involving primitive sixth roots). However over characteristic p there are infinitely many; indeed we can take any x ≠ 0, 1 in the algebraic closure of F_p and then y accordingly. Another special kind of unlikely intersection occurs when we intersect the variety with a finitely generated group, an area often associated with the names Mordell-Lang. For example over zero characteristic we can ask for solutions of (1) with x a power of 3 and y a power of −2, amounting essentially to the equation 3^a − 2^b = 1. This has for centuries been known to have only two solutions in integers a, b. However over characteristic p inside the function field F_p(t), with x a power of t and y a power of 1 − t, we have infinitely many solutions, for instance x = t^q, y = (1 − t)^q for any power q of p. For much more see for example the papers [33] of Hrushovski and [49] of Moosa and Scanlon. And the torsion situation can be combined with the finitely generated situation by allowing finite rank; under this heading see for example the papers [29] of Ghioca and Moosa and [26] of Ghioca.
The second author [43] made a start on Zilber-Pink problems over positive characteristic, formulating a conjecture for curves in G_m^n and proving it for G_m^3. Then in [15] we continued the study of such problems, but now for the additive group G_a^n. Over zero characteristic the naive conjectures for G_a^n become false, because they implicitly involve group subvarieties (of codimension 2), and there are simply far too many of these. For example the union of all of codimension 1 (and even of codimension n − 1) is the whole G_a^n. Over positive characteristic it is well-known that problems of Manin-Mumford or Mordell-Lang type can be formulated for G_a^n by imposing some extra structure.
One immediately thinks of Drinfeld modules (on which the literature is already substantial); but there is an easier way using Frobenius (also see [26], in particular Theorem 2.6 p.3841). It is these "Frobenius modules" or "F-modules" that we recently studied in [15].
In the present paper we go in the direction of Drinfeld, but we restrict ourselves to the simplest and most attractive forerunner, the Carlitz module.
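The two phenomena recalled above for the line x + y = 1 can be checked by direct computation. The following is only a minimal sketch: the function names are ours, the search range for the exponents is an arbitrary choice, and the family x = t^q, y = (1 − t)^q with q a power of p is our reading of the characteristic-p solutions alluded to above.

```python
from math import comb


def small_solutions(bound):
    """Pairs (a, b) with 0 <= a, b < bound and 3^a + (-2)^b = 1.

    This is equation x + y = 1 with x a power of 3 and y a power of -2;
    it is only a finite search, not a proof that no further solutions exist.
    """
    return [(a, b)
            for a in range(bound)
            for b in range(bound)
            if 3 ** a + (-2) ** b == 1]


def frobenius_identity_holds(p, e):
    """Check t^q + (1 - t)^q = 1 in F_p[t] for q = p^e.

    The coefficient of t^k in (1 - t)^q is (-1)^k * binom(q, k); we reduce
    it mod p, add the extra term t^q, and compare with the constant 1.
    """
    q = p ** e
    coeffs = [comb(q, k) * (-1) ** k % p for k in range(q + 1)]
    coeffs[q] = (coeffs[q] + 1) % p  # contribution of x = t^q
    return coeffs[0] == 1 and all(c == 0 for c in coeffs[1:])


if __name__ == "__main__":
    print(small_solutions(40))
    print(all(frobenius_identity_holds(p, e)
              for p in (2, 3, 5) for e in (1, 2, 3)))
```

The first function only illustrates the zero-characteristic statement on a bounded range; the second verifies the identity coefficient by coefficient rather than appealing to the additivity of Frobenius.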
To fix ideas, let us first review the situation for the multiplicative G n m over zero characteristic. The decisive result was obtained by Maurin [48] (see [7] also), and, taking into account [13], we now know the following best possible result.
Theorem A Let K be an algebraically closed field of characteristic 0, and let C in G_m^n be an irreducible curve defined over K. Assume for any non-zero (r_1, ..., r_n) in Z^n that the monomial x_1^{r_1} ··· x_n^{r_n} is not identically 1 on C. Then there are at most finitely many (ξ_1, ..., ξ_n) in C(K) for which there exist linearly independent (a_1, ..., a_n), (b_1, ..., b_n) in Z^n such that ξ_1^{a_1} ··· ξ_n^{a_n} = ξ_1^{b_1} ··· ξ_n^{b_n} = 1.
It was already pointed out in [43] (p.506) that the naive analogue of this over positive characteristic is false, in that a (stronger) hypothesis about two monomials, not one, is needed. There is an exactly analogous situation in [15] for G_a^n with Frobenius structure associated with x^p.
Our Carlitz structure is associated with x^p + tx instead, and it may be found surprising that such a small change makes the situation revert back to that of the original multiplicative result in Theorem A above, with only one monomial.
Let us now recall this G_a^n with Carlitz structure. We use a distinguished parameter t. Write C = tF_0 + F_1 where F_r is the Frobenius taking x to x^{p^r}. Thus C(x) = tx + x^p; or we shall usually write just Cx. We have the non-twisted R = F_p[C] inside the twisted K{F_1}, where K is any field of characteristic p. Of course K{F_1} acts on G_a by αx = a_0 x + a_1 x^p + a_2 x^{p^2} + ··· for α = a_0 F_0 + a_1 F_1 + a_2 F_2 + ··· in K{F_1}. Here we have used the same juxtaposition notation for the module action, as in αx, and the field action, as in ax. In general throughout this paper the first will be used mainly with Greek letter coefficients, and the second mainly with Roman letter coefficients; and the two actions will rarely be side-by-side.
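To make the module action concrete, here is a small sketch of C(x) = tx + x^p and of αx for α = A(C) acting on elements of F_p[t]. The representation (a polynomial in t as a list of coefficients, lowest degree first) and all function names are our own choices for illustration and are not taken from the paper.

```python
def trim(f):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def add(f, g, p):
    """Sum of two polynomials in F_p[t]."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return trim([(a + b) % p for a, b in zip(f, g)])


def mul(f, g, p):
    """Product of two polynomials in F_p[t]."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)


def frobenius(f, p):
    """x -> x^p on F_p[t]: coefficients are fixed, exponents multiplied by p."""
    if not f:
        return []
    out = [0] * (p * (len(f) - 1) + 1)
    for i, a in enumerate(f):
        out[p * i] = a % p
    return trim(out)


def carlitz(f, p):
    """The Carlitz action C(x) = t*x + x^p."""
    return add(mul([0, 1], f, p), frobenius(f, p), p)


def act(A, f, p):
    """Apply alpha = A(C) to f, where A = [a_0, a_1, ...] has entries in F_p."""
    total, power = [], trim([c % p for c in f])
    for a in A:
        total = add(total, mul([a % p], power, p), p)
        power = carlitz(power, p)  # next power C^k applied to f
    return total


if __name__ == "__main__":
    print(carlitz([0, 1], 3))       # C(t) for p = 3
    print(act([0, 1, 1], [1], 2))   # (C^2 + C) applied to 1 for p = 2
```

Since the coefficients of A lie in F_p they commute with C, which is why the loop can simply accumulate a_k times the k-th iterate.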
There is an action on G_a^n by α(x_1, ..., x_n) = (αx_1, ..., αx_n). Any algebraic subgroup of G_a^n that is an R-module is then defined by several equations of the form α_1 x_1 + ··· + α_n x_n = 0 where α_1, ..., α_n are in R. The codimension is the rank of the various (α_1, ..., α_n) in R^n. We believe in the following version of Theorem A.
The case n = 1 is empty. The case n = 2 amounts to an analogue of Manin-Mumford. It was proved in the general context of Drinfeld modules by Scanlon [55], using techniques from model theory. Here we will sketch a more elementary method in the Carlitz context.
Already the case n = 3, going beyond torsion, is in the sense of Zilber-Pink. The main result of this paper is a proof for n = 3.
It is possible that the cases n = 4, 5 can be handled by adapting the methods of [10] and the results of Amoroso and David [1].
But for n ≥ 6 quite different methods will probably be needed, maybe following [7,48] or [13].
In fact Theorem A above for n = 3 and K = Q was first proved in [8]. There the concept of height was unavoidable, and we needed also results on upper bounds as well as considerably deeper results on lower bounds.
By contrast the proofs in [43] and [15] do not use heights at all. It turns out that the proof of our Theorem above follows much more closely [8] and in particular we need heights h(ξ ) on F p (t) (see later).
Of course the condition of algebraic closure can be omitted in all the above statements, but its retention is meant to emphasize that we are considering points of unbounded degree (over F p (t) for example).
We will prove the following upper bound, a Carlitz analogue of Theorem 1 of [8] (p.1120).

Proposition 1
For K = F p (t) let C be an irreducible curve in G n a defined over K . Assume for any non-zero (ρ 1 , . . . , ρ n ) in R n that the form ρ 1 x 1 +· · ·+ρ n x n is not identically constant on C. Then there is B such that for all (ξ 1 , . . . , ξ n ) on C(K ) for which there exists non-zero (α 1 , . . . , α n ) in R n with α 1 ξ 1 + · · · + α n ξ n = 0.
For the lower bound we have to go beyond [8] with a Carlitz analogue of the Néron-Tate height on an elliptic curve. This was constructed by Denis [21] (even for Drinfeld modules), and we shall denote it by ĥ(ξ) ≥ 0 for ξ in F_p(t) (see later). It is well-known that ĥ(ζ) = 0 if and only if ζ is torsion in the sense of Carlitz (see later). Then F_p(t, ζ) is a cyclotomic (see later) extension of F_p(t), and every cyclotomic extension F_c has this form (see later). They are separable (see later). We have to consider extensions of F_c that are not necessarily separable. Thus we will prove the following lower bound, where from now on we write F_0 = F_p(t).
Proposition 2 There is a positive constant c depending only on p with the following property. Let F_c be a finite cyclotomic extension of F_0 and let F be an extension of F_c of degree d. Then for any non-torsion ξ in F we have
In fact the lower bound can be multiplied by where q is the inseparable degree of F over F_c; but this seems such a tiny improvement that we did not bother to include the details.
In the case F_c = F_0 (so no cyclotomy) a very slightly stronger result with (log log 16D)^{−3} in place of (log log 16D)^{−2} was proved by Denis [21] as Théorème 2 (p.218); but only for extensions which are regular (apparently not essential) and separable, a genuine restriction. This restriction was lifted by Demangos [20], even for a class of Drinfeld modules including Carlitz. The lower bound of his Theorem 2 (p.153) involves an extra negative power of the inseparable degree of F over F_0. But it has the advantage that all the constants appearing are explicitly calculated. See also Bosser and Galateau [14] for several simplifications and improvements, especially Theorem 1.8 (p.168).
In the case F = F_c (so only cyclotomy) David and Pacheco [19] have shown in Théorème 1.0.1 (p.1046) that in fact ĥ(ξ) ≥ c^{−1} (and even the generalizations to abelian extensions and Drinfeld modules). See also Bauchère [4] for further generalizations.
We now describe our proofs. That of our Conjecture for n = 2 follows one of the classical proofs over Q. For n = 3, the proof of our Theorem for K = F 0 follows the general strategy of [8], using height upper and lower bounds.
We prove Proposition 1 by adopting the slightly simplified exposition in [42]. As for Proposition 2, it is an analogue of a result of Amoroso and Zannier [2] over Q (they actually treated abelian extensions). We have at our disposal the proof in [21] for F_c = F_0; this is the natural analogue of the classical result of Dobrowolski [23]. But in fact this proof in [21] resembles much more Laurent's analogue [37] for elliptic curves with complex multiplication. It was Ratazzi [53] who extended Laurent's result to abelian extensions. Meanwhile Pontreau [52] gave a simpler proof of the result of [2] restricted to cyclotomic extensions (whose discriminants are known), and it is this that we adapt here to the Carlitz context. However our choice of parameters in the auxiliary polynomial is rather different from his; for example we differentiate about d times and he only about log d times.
In much of the early work it was possible in analogues of Proposition 2 to restrict to ξ that are integral in some sense. But already this made some trouble in the elliptic case [37] and in the original Carlitz work [21]. Here we are obliged to distinguish between valuations of small and large ramification and exploit the known ramification properties of cyclotomic extensions (this argument may fail for abelian extensions).
We have mentioned that over positive characteristic there may well be problems with inseparability. In principle this causes trouble here, especially in the use of an analogue of Siegel's Lemma. We overcome this by adapting a version due to Thunder [57]. This version also involves the genus of the function field F c analogous to well-known results of Bombieri and Vaaler [6] involving the discriminant; for us it is fortunate that the genus of F c is also known (just as for cyclotomic extensions of Q).
On the other hand inseparability can be an advantage. For example Theorem 6.6 (p.64) of Ghioca [25] about Drinfeld modules implies that ĥ(ξ) ≥ p^{−19} for any non-torsion ξ in any purely inseparable extension of F_0. So there is no dependence at all on field degrees. In fact here the only possible torsion ξ are 0 when p > 2 and 0, 1, t, t + 1 when p = 2 (see later).
In the earlier work [43] and [15] over positive characteristic it was relatively easy to extend the results from F 0 to general K by means of transcendence degree arguments. This does not seem possible here. Instead we follow a specialization strategy of Bombieri, Zannier and the second author in [9]. But the additive situation diverges somewhat from the multiplicative situation and there are several new elements. For example the zeroes and poles of x a 1 1 x a 2 2 x a 3 3 are on an equal footing, but this is not true of the Carlitz analogue f = α 1 x 1 + α 2 x 2 + α 3 x 3 , whose poles are clear but whose zeroes are far from evident. In [9] we used Mason's abc inequality to settle similar problems. We do not yet have a Carlitz abc but we can exploit the underlying differentiating idea by noting that the derivative of t x + x p is just t. Using certain systems of identities we can in this way show that the degree of f does not drop too much under specialization. Also in [9] we made an innocent-looking appeal to a result in Mumford [50] for counting inverse images of algebraic maps. This result used the concept of topologically unibranch. In the literature we could not find a suitable positive characteristic analogue and so we developed our own substitute. The rest of our paper is arranged as follows.
In Sect. 2 we consider the case n = 2 of the Conjecture. After warming-up with a simple explicit example, we turn to the general proof; but in view of previous work we feel justified with just a sketch. It also follows from our Theorem by "lifting" the curve in G 2 a to G 3 a by introducing a sufficiently general constant value of x 3 .
Then in Sect. 3 we prove Proposition 1, also after giving an explicit example. We postpone the comparatively technical proof of Proposition 2, and in Sect. 4 we show that Propositions 1 and 2 imply the Theorem for K = F 0 .
Section 5 contains preliminary material on Siegel's Lemma, and Sect. 6 more preparations for the proof of Proposition 2, which then follows in Sect. 7.
In Sects. 8 and 9 we start some preliminaries for the general K , including the identities mentioned above. After that Sect. 10 contains an extension of Proposition 1. The main specialization arguments follow in Sect. 11.
We are then able almost to complete the proof in Sect. 12, with a final extra argument in Sect. 13 because certain statements of Mordell-Lang type are not quite in the literature.
As a matter of fact inseparability turns out to be not quite such a problem for the application to our Theorem, because we show in an Appendix that the relevant inseparable degree is bounded. Nevertheless, we have to take it into account in the proof of Proposition 2.
It will be clear to the experts that everything in this paper extends immediately from F p to arbitrary finite fields.
And it should go without saying that all our results are effective, and indeed we shall make no further reference to such matters.
We have mentioned Drinfeld modules several times already, so there naturally occurs the problem of generalizing this paper to those (or even to t-modules). It is reasonable to expect some sort of analogue of our Conjecture to hold.
We heartily thank Umberto Zannier for valuable correspondence regarding some of the estimates in Sect. 8.

Manin-Mumford
As examples we start with an example for the line x + y = 1 as in (1) and also the hyperbola x y = 1, with K arbitrary as in the Conjecture. In fact we go further and determine the solutions for every p, as Leitner did in [38,39] with G 4 m (confirming an expectation of Hrushovski [33] p.669). This leads to the 23 mentioned in the abstract.
If p = 2 there are eleven, corresponding to and
Verification. For (a) we start by checking that the line defined by x + y = 1 does not lie in a proper Carlitz submodule of G_a^2 defined by say ρx + σy = 0. Otherwise, substituting y = 1 − x and using additivity, we would get (ρ − σ)x + σ1 = 0 identically in x. Thus ρ − σ = 0 and we are left with σ1 = 0. But if σ = S(C) for a non-constant polynomial S of degree d then as p ≠ 2 we can quickly check that σ1 is a polynomial in t of degree p^{d−1}. This fails for p = 2 and indeed then σ_0 1 = 0 for σ_0 = C^2 + C (compare [30] p.61). So here the line lies in σ_0 x + σ_0 y = 0. This even makes the above example (a) fail for p = 2: simply take any torsion element ξ; then because 1 is torsion so also is η = 1 − ξ.
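The degree claim for σ1 in odd characteristic, and its failure for p = 2, can be tested on small cases. This is a sketch under our own conventions: an element of F_p[t] is stored as a dictionary from exponents to non-zero coefficients, and the names carlitz, apply_poly and degree are hypothetical helpers, not notation from the paper.

```python
def carlitz(f, p):
    """C(f) = t*f + f^p for f in F_p[t], stored as {exponent: coefficient}."""
    out = {}
    for e, c in f.items():
        # t * (c t^e) contributes to exponent e + 1, (c t^e)^p to exponent p * e
        for e2 in (e + 1, p * e):
            out[e2] = (out.get(e2, 0) + c) % p
    return {e: c for e, c in out.items() if c}


def apply_poly(S, f, p):
    """Compute S(C) f for S = [s_0, s_1, ...] with entries in F_p."""
    total, power = {}, dict(f)
    for s in S:
        for e, c in power.items():
            total[e] = (total.get(e, 0) + s * c) % p
        power = carlitz(power, p)
    return {e: c for e, c in total.items() if c}


def degree(f):
    """Degree in t, or None for the zero polynomial."""
    return max(f) if f else None


if __name__ == "__main__":
    one = {0: 1}
    for p in (3, 5):
        for d in range(1, 5):
            S = [1] * (d + 1)  # an arbitrary S of degree d
            print(p, d, degree(apply_poly(S, one, p)), p ** (d - 1))
    print(apply_poly([0, 1, 1], one, 2))  # (C^2 + C) 1 for p = 2
```

The choice S = 1 + X + ··· + X^d is only one test polynomial; any S of exact degree d could be substituted.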
In fact this calculation proves (a) at once: if ξ, η are torsion then so is ξ + η = 1 (it is the Carlitz analogue of say xy = 2 in zero characteristic G_m^2). For (b) it is clear that the hyperbola defined by xy = 1 does not lie in a proper Carlitz submodule of G_a^2 defined by ρx + σy = 0, because if ρ ≠ 0 then ρx has positive degree in x and if σ ≠ 0 then σy for y = 1/x has negative degree in x.
Now we proceed to the main argument, by diophantine approximation as in the classical case of G_m^2 over zero characteristic. There is a monic polynomial N in F_p[X] of minimal degree, say n, such that the torsion elements ξ, η (if any) satisfy N(C)ξ = N(C)η = 0. (5) If ζ is a primitive N-torsion element, then there are polynomials R, S with ξ = R(C)ζ, η = S(C)ζ (6) (see for example [30] p.55); further we can assume R, S are of degree at most n − 1. We can find polynomials A, B, D not all zero with AR + BS = DN; and a simple counting argument shows that we can take A, B, D of degree at most n/2. Indeed there are 3(1 + [n/2]) coefficients at our disposal subject to n + [n/2] linear conditions, and the difference is 3 + 2[n/2] − n ≥ 2. It follows that αξ + βη = 0 (7) for α = A(C), β = B(C), an equation of total degree at most p^{n/2} in ξ, η. We can apply Bezout to this and ξη = 1 by our opening observation. We find that for this particular N there are at most 2p^{n/2} pairs (ξ, η).
On the other hand replacing ζ by any of its conjugates over F_0 in (6) gives a conjugate pair also on the same hyperbola. Further these pairs are all different, because by the minimality of N and (5), (6) the polynomials R, S, N can have no common factor; and so we can solve GR + HS + LN = 1 giving ζ = G(C)ξ + H(C)η. Now the degree of ζ over F_0 is well-known to be the Carlitz-Euler φ(N) (see [16] p.173). It follows that 2p^{n/2} ≥ φ(N). (8) In terms of a prime factorization N = ∏_{i=1}^r N_i^{e_i}, with n_i the degree of N_i, it is φ(N) = p^n ∏_{i=1}^r (1 − p^{−n_i}) ≥ (p − 1)^n. (9) So from (8) and the fact that 2p^{1/2} < p − 1 for p > 5 we reduce to the cases p = 2, 3, 5.
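The reduction to small p, and the degree restrictions used in the case analysis that follows, come from comparing 2p^{n/2} with (p − 1)^n. A minimal sketch of that comparison in exact integer arithmetic follows; the function name and the cut-off n_max are our own choices.

```python
def admissible_degrees(p, n_max=50):
    """Degrees 1 <= n <= n_max with 2 * p^(n/2) >= (p - 1)^n.

    Squaring both sides gives the equivalent integer test
    4 * p^n >= (p - 1)^(2n), which avoids floating point.
    """
    return [n for n in range(1, n_max + 1)
            if 4 * p ** n >= (p - 1) ** (2 * n)]


if __name__ == "__main__":
    for p in (2, 3, 5, 7, 11):
        print(p, admissible_degrees(p, n_max=10))
```

For p = 2 the right-hand side is identically 1, so this crude comparison gives no restriction at all, which is consistent with the extra work needed in that case.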
If p = 5 then 2p^{n/2} ≥ (p − 1)^n forces n = 1, so N = X + a (a = 0, 1, 2, 3, 4). But the polynomials N(C)T = T^5 + tT + a for a ≠ 0 are irreducible and non-reciprocal and none is the reciprocal of another, and for a = 0 it is irreducible and non-reciprocal after dividing by T. So by (5) there are no solutions if p = 5.
If p = 3 then 2p^{n/2} ≥ (p − 1)^n forces n ≤ 4. Checking each of the 81 possibilities for N we find that give rise via (5) to the solutions indicated in (2). We find no other ξ.
If p = 2 we have to work a bit harder, and we divide into eight cases according to which of the three irreducible polynomials X , X + 1, X 2 + X + 1 of degree at most 2 divide N .
So now n ≥ 4 and we get , which implies n ≤ 7. Therefore N = N 1 N 2 N 3 (X 3 + a X 2 + bX + c) leading to just eight possibilities for N . The other seven cases lead to n ≤ 6 and better. For example when none of N 1 , N 2 , N 3 divide N , then every n i ≥ 3 and so the above product is at least (7/8) n .
Here the factorization of each N (C)T on Maple can take up to two hours. Spotting reciprocal factors or reciprocal pairs by eye is not too easy and so ad hoc methods using resultants were developed. Thus if the resultant of N (C)T and its own reciprocal is non-zero, then such factors cannot exist. Already all solutions come from The first N gives ξ = 1 of course in (3) and the next two give the next two there; but the last N provides the reciprocal pair leading to the last two in (4).
This completes the verification of Example 1 (we checked that allowing powers of primes does not yield any additional solutions).
It is not much more difficult to prove the general conjecture above in G_a^2 by these means. We just sketch the details, as it also follows rather easily from our Theorem by "lifting" the curve in G_a^2 to G_a^3 by introducing a sufficiently general constant value of x_3. It suffices to treat the case K = F 0 . We can assume that C is defined over a finite extension F of F_0, otherwise it contains anyway at most finitely many points algebraic over F_0 and in particular torsion points. We obtain as above (7) and now it is by assumption that we can apply Bezout. This time we are allowed to use the conjugates of ζ only over F, but that hardly affects their number. However as in Example 1 we need a lower bound for φ(N) better than (9) and at least of the form c(p^n)^θ for some θ > 1/2. The only trouble is at p = 2 but in general we can argue for the number r_j of monic irreducible polynomials of degree j in F_0. Taking logarithms, using − log(1 − x) ≤ 2x for 0 ≤ x ≤ 1/2, and also as well as Σ_{j=1}^n 1/j ≤ log n + 1 we find φ(N) ≥ p^n/(55n^4) (11) (with n as the degree of N) comfortably of the required form. We shall need both (10) and (11) later. It will now be seen that |N| is a useful notation for p^n.
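As a numerical sanity check of the shape of (11), one can compare p^n/(55n^4) with the crudest possible lower bound for φ(N), namely the one obtained by pretending that every monic irreducible of degree at most n divides N. This is a sketch only: the product p^n ∏_{j ≤ n} (1 − p^{−j})^{r_j} is our assumption about the form of the intermediate estimate, the count r_j is computed by the classical Möbius formula, and floating point is used for the product.

```python
def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # a square divides the original n
            result = -result
        d += 1
    return -result if n > 1 else result


def count_monic_irreducibles(p, j):
    """Number r_j of monic irreducible polynomials of degree j over F_p."""
    total = sum(mobius(d) * p ** (j // d)
                for d in range(1, j + 1) if j % d == 0)
    return total // j


def crude_phi_ratio(p, n):
    """Product over j <= n of (1 - p^(-j))^(r_j), a lower bound for phi(N)/p^n."""
    ratio = 1.0
    for j in range(1, n + 1):
        ratio *= (1.0 - p ** (-j)) ** count_monic_irreducibles(p, j)
    return ratio


if __name__ == "__main__":
    for p in (2, 3, 5):
        for n in (1, 5, 10):
            print(p, n, crude_phi_ratio(p, n), 1.0 / (55 * n ** 4))
```

The check covers only small p and n and proves nothing in general; it merely illustrates that the constant 55 and the exponent 4 leave a comfortable margin in that range.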

Proof of Proposition 1
We use the standard height function h on taken over all w on F_0 (which correspond to monic irreducible P in F_0, with |P|_P = e^{−d} for d the degree of P, and |t|_∞ = e). We extend it in the usual way to the algebraic closure, so that if ξ is in an extension F of degree D over F_0, then over all valuations v on F which extend those on F_0 (that is, for all x in F_0 we have |x|_v = |x|_w for some w as above) and D_v = e_v f_v the local degrees. We shall often say that v divides P or ∞. For example when θ is rational. First here is an illustration in the spirit of Example 1. However we cannot use the line x + y = 1 here because 1x + 1y is constant on it. And indeed the points with coordinates have h(ξ) = p^{m−1} going to infinity with m provided p ≠ 2, in view of our remarks above about the degree of σ1 (again the analogue of xy = 2 in zero characteristic G_m^2). But the (Carlitz) hyperbola xy = 1 is fine.
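With the normalizations |P|_P = e^{−d} and |t|_∞ = e, summing log max(1, |θ|_w) over all places of F_p(t) for a rational θ = a/b in lowest terms gives max(deg a, deg b). We assume this standard evaluation is what the example for rational θ records. The sketch below computes it; the coefficient-list representation and the function names are ours, and the modular inverse via pow requires Python 3.8 or later.

```python
def norm(f, p):
    """Reduce coefficients mod p and drop trailing zeros; zero is []."""
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f


def polymod(f, g, p):
    """Remainder of f on division by a non-zero g in F_p[t]."""
    f, g = norm(f, p), norm(g, p)
    inv = pow(g[-1], -1, p)  # inverse of the leading coefficient
    while len(f) >= len(g):
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % p
        f = norm(f, p)
    return f


def polygcd(f, g, p):
    """A greatest common divisor in F_p[t] (not normalized to be monic)."""
    f, g = norm(f, p), norm(g, p)
    while g:
        f, g = g, polymod(f, g, p)
    return f


def height(a, b, p):
    """h(a/b) = max(deg a', deg b') after cancelling gcd(a, b); b non-zero."""
    g = polygcd(a, b, p)
    # len(x) - len(g) equals deg(x) - deg(g) for non-zero x
    return max(len(norm(a, p)), len(norm(b, p))) - len(g)


if __name__ == "__main__":
    print(height([-1, 0, 1], [-1, 1], 3))   # (t^2 - 1)/(t - 1) = t + 1
    print(height([0, 0, 0, 1], [1, 1], 5))  # t^3/(t + 1)
```

In particular non-zero constants and 0 have height 0, in agreement with the usual conventions.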
Verification. We use the canonical height introduced by Denis for Drinfeld modules (for which he proved his lower bound, at least in the Carlitz case). This is defined as ĥ(ξ) = lim_{m→∞} p^{−m} h(C^m ξ). Not only do we have the obvious ĥ(Cξ) = p ĥ(ξ) but even (in the notation at the end of Sect. 2) ĥ(P(C)ξ) = |P| ĥ(ξ) for any non-zero P in F_p[X]. Also since C is additive we get ĥ(ξ + η) ≤ ĥ(ξ) + ĥ(η) (but not ĥ(ξη) ≤ ĥ(ξ) + ĥ(η) or even ĥ(ξ^2) ≤ 2ĥ(ξ), for example ξ can be torsion while ξ^2 is not, as for ξ = √(−t) with p = 3; and even ĥ(1/ξ) need not be ĥ(ξ), as for ξ = t with p = 2).
To compare with (13), it may be shown for p = 2 that respectively (we shall not need these values). Denis showed thatĥ(ξ ) differs from h(ξ ) by a bounded amount, and indeed we now check that independently of p.
In the first place we have an upper bound For a corresponding lower bound we use the standard Nullstellensatz argument. With ρ = ξ t p+1 and Thus for any ultrametric valuation we deduce Cancelling and taking the product with suitable exponents leads in the usual way to The standard telescoping sum gives (16). Thus h(ξ) = h(η) ≤ 3 too, and we are done. Similarly if β = 0; so we will henceforth assume that α ≠ 0, β ≠ 0. Write α = A(C), β = B(C). Then (14) leads to with l the degree of A and m the degree of B. As in the situation over Q (see for example Theorem 14.9 of [44] p.176) everything depends on the relation between l and m. We note from (16) that First suppose l > m. Then p ĥ(ξ) ≤ ĥ(η) which by (17) leads to ĥ(ξ) ≤ 6 and so By symmetry we get the same result if l < m. So it remains only to consider l = m.
We can now write with nonzero a, b in F p and α 0 , β 0 of degree in C smaller than m. Then and we proceed to compare the canonical heights.
To begin with the left-hand side, (15) giveŝ which is by (14) and (17) at most To continue with the right-hand sidê The same argument gives the same bound for h(η) and by addition we get something stronger than (18). So the verification is complete.

Lemma 1
If ξ 1 , . . . , ξ n are in F 0 for which there exist α 1 , . . . , α n not all zero in R with α 1 ξ 1 + · · · + α n ξ n = 0, then for any non-negative integer m there are β 1 , . . . , β n in R, not all zero, such thatĥ . It follows that for any positive integer l we can find of degree at most m such that as long as m +1 > n(l −1). Thus we can choose l = [m/n]+1 > m/n. Now the polynomials We now put and note that Taking canonical heights gives This is the required result since l > m/n. Note that β 1 , . . . , β n are indeed not all zero otherwise (20) would give a contradiction, because l > 0 and some A i has the same degree as A.
We can now prove Proposition 1. Suppose that C is defined over a finite extension F of F 0 . We fix some m with p m/n ≥ 2nd (22) where now d is the degree of the curve C. For any point P = (ξ 1 , . . . , ξ n ) as there we construct . . , n) as in Lemma 1. Then the function y = β 1 x 1 + · · · + β n x n is not constant on C by hypothesis. Thus for any i there is for c , c also independent of P. The required result h ≤ 2c now follows from (22). This completes the proof of Proposition 1.

Proof of Theorem for K = F 0
Here we deduce the Theorem for K = F 0 from Proposition 1 just proved together with Proposition 2 whose proof will follow in Sect. 7. To avoid logarithmic pedantries we reformulate Proposition 2 as follows.
Assertion. Given ε > 0, there is a positive constant c depending only on p and ε with the following property. Let F c be a finite cyclotomic extension of F 0 and let F be a finite extension of F c . Then for any non-torsion ξ in F we havê It will be clear that all we need is any ε < 1/3. But during this section we will use instead of c.
In what follows we shall be relatively brief, as it follows the strategy over Q (see for example [42] p.330).
Given (ξ 1 , ξ 2 , ξ 3 ) as in the Theorem, we can find ξ and a torsion point ζ such that and for σ 1 , τ 1 , σ 2 , τ 2 , σ 3 , τ 3 in R, somewhat as in (6). Here it is because the R-module generated by ξ 1 , ξ 2 , ξ 3 in F 0 has rank at most 1, and so is Rξ + Z for a finitely generated torsion module Z, which further has the form Rζ . And if ζ has order ν then we can take τ 1 , τ 2 , τ 3 as polynomials in C of degree less than that of ν.
We now want to find small γ 1 , γ 2 , γ 3 , δ in R, not all zero, such that Using any form of Siegel's Lemma over F p [X ], or by counting as above, we find a solution with It follows from (24) that Clearly γ 1 , γ 2 , γ 3 are not all zero, and we deduce from (25) and Bezout that On the other hand (24) giveŝ Let us temporarily assume that no non-trivial ρ 1 x 1 + ρ 2 x 2 + ρ 3 x 3 is constant on our curve C, as in Proposition 1. Then summing we get Assuming further that ξ is non-torsion, we are now set up to apply the Assertion, with By (11) we have φ(ν) ν 1−ε , and comparison gives ν 1−ε M D 1+ε . But for ε < 1/3 this contradicts (26) or better gives an upper bound for D which implies everything (by the usual Northcott).
If ξ is torsion, then ξ 1 , ξ 2 , ξ 3 are, and Manin-Mumford on a suitable projection to two dimensions settles the thing.
Finally, what if some non-trivial ρ 1 x 1 + ρ 2 x 2 + ρ 3 x 3 is constant on C? We can assume the coefficients are coprime in R and then use GL 3 (R) to assume that it is x 3 that is some constant, which we can call ξ 3 . Then ξ 3 is non-torsion. Now the thing reduces to Mordell-Lang in two dimensions, that is, a group of finite Q-dimension (in fact dimension 1). This was done by Ghioca [27]. But we can too. Namely, we can just eliminate ξ 3 from the two relations between ξ 1 , ξ 2 , ξ 3 to get a relation between ξ 1 , ξ 2 on the projected curve C in G 2 a . By Proposition 1 for this projection we see that ξ 1 , ξ 2 have bounded heights (unless some non-trivial φ = κ 1 x 1 + κ 2 x 2 is constant on C ); and of course so does ξ 3 . We still have (24) and we can argue as before. And really finally, if some non-trivial φ is constant on C then it may as well be x 2 = ξ 2 . But now ξ 2 , ξ 3 must be independent and we cannot have two relations. All this is exactly parallel to the situation over Q.

Siegel's Lemma
Denis [21] and others use an ad hoc version for function fields, rather as in the proof of our Lemma 1. We need a relative version. That for number fields involves discriminants. The correct analogue for function fields involves the genus and was found by Thunder [57].
We stick with our F 0 = F p (t), and a finite extension F of F 0 . We have a genus g(F) (see [3] for this and much more). In [57] (pp.148,150) a projective absolute height h P on F n is defined; it involves the integer where F is the algebraic closure of F p in F. It is not hard to see that it coincides with the natural extension of (12) to non-zero vectors by Also it is convenient to define the height of the zero vector as zero.
We have the following extension of Corollary 3 of [57] (p.149), in which a condition about full rank is eliminated. Of course we cannot afford the luxury of a Grassmannian height anymore.
Proof If M = 0 the result is obvious (for example with any l standard basis elements of F n * ), so we assume M = 0. We follow the argument of Bombieri and Gubler [5] (pp.75,79,80). For that we define h P (M ) for any matrix M with m rows and n columns and m < n of rank s ≥ 1 to be the Grassmannian height h P (M) as in [57] (p.151), whereM consists of any s independent rows of M . This is the analogue of the definition in [5] (p.75).
Pick a basis (λ 1 , . . . , λ r ) of F s /F * and write M = λ 1 M 1 + · · · + λ r M r with M 1 , . . . , M r over F * . Let σ 1 , . . . , σ r be the different (here we use separability) embeddings of F s , fixing F * , into the algebraic closure of F * . Then we check that σ (M) = M σ for (say), and because h P (M σ ) is at most the sum of the heights of its rows we get Ordering by increasing height we deduce and finally recalling j ≥ l we get the required result.
We next drop to a single solution.
Proof This follows from Lemma 2 after taking the b ν with smallest height.
Finally we allow inseparable extensions. This seems to be new.
Proof We first solve M q c t = 0 using Lemma 3, where M q is not the q-th power but just has entries the q-th powers of those of M. Now we are over the separable extension F q of degree r = d/q and so there is a non-zero solution c in F n * with To finish we take b = c 1/q , so that the height gets divided by q.

More preliminaries
Some of these are analogues of those occurring in the original Dobrowolski proof [23]. We Lemma 5 For j ≥ 8 the number r j of monic irreducible Q in O 0 of degree j satisfies Proof The upper bound is (10) and the lower bound follows by similar arguments. Of course the Prime Number Theorem for O 0 suffices for our purpose.
From now on we find it more convenient to use the notation ξ Q instead of Q(C)ξ for our Carlitz C.

Lemma 6
Let F * be an extension of F 0 of degree d and let F be an extension of F * . Suppose Proof This is essentially Lemma 4 of [21] (p.219), where it is merely said that the proof is identical to Dobrowolski's (and was for the separable case). We supply some details.
are all conjugate over F * . So two must coincide. As ξ is non-torsion this leads to Q l 1 = Q l 2 for some positive integer l. And as Q 1 , Q 2 are monic this leads to But this would contradict (28). Taking i = m, . . . , 1 we deduce that the fields form a strictly increasing chain. Thus The next result reflects the need in some of the previous literature to distinguish between small and large ramification e v as in (12).
Proof If Q has degree n, the value group of the valuation on F 0 corresponding to Q is generated by g = e n = Q 1/ log p . So that of v by (16). Making m tend to infinity gives the result.
From now on the intermediate field F * will be cyclotomic over F 0 , so it has the form ; this is in fact the ring of integers of F c , that is, the integral closure of O 0 in F c (see for example [56] p.82).
Proof Part (a) for = 1 is Lemma 3 of [21] (p.219) -and then it holds even in F p [X ]. It follows immediately for general . For part (b) we need to note that every coefficient α of has the form (t, ζ ) X ) and then by substituting X = ζ . This in turn is congruent to The following is an analogue of an estimate of Amoroso-David [1] (p.157) restricted to a single variable. Again F is a finite extension of F c . For a polynomial in F[X ] or F[X , Y ] we write | | v for the maximum of | f | v as f runs over the coefficients (at first sight this could be confused with |Q| v below -but in fact it is the same notation because Q, even though a polynomial in t, is already in F).

Lemma 9 Suppose in O c [X ] of degree at most M vanishes at some ξ in F to order at least S. Then for any monic irreducible Q in O 0 prime to N and any valuation v on F dividing Q we have
Proof Let (X ) = α 0 X d + · · · + α d be a minimal polynomial of ξ over O c . We use Strong Approximation but not quite as in [1] (whose argument for one variable uses already a special case for two variables). In fact this allows us to assume that | | w = 1 for all w on F c dividing Q. Namely for each such w there is i = i(w) with 0 < |α i | w = μ w = | | w ≤ 1. Now by the Theorem of [17] (p.67; see also first paragraph of proof) there is β in F c with |β − α −1 i | w < μ −1 w for each w dividing Q and |β| w ≤ 1 for all other w on F c not dividing ∞. The first of these implies |β| w = μ −1 w (in particular β ≠ 0) and so |β | w = 1 for each w dividing Q; and by the second |β | w ≤ 1 for all other w not dividing ∞. So βα 0 , . . . , βα d are all in O c . Thus we just have to replace by β .
In fact since |x σ | w = |x| w(σ ) for any x in F c and some other w(σ ), we see that | σ | w = 1 for all w dividing Q. In particular | σ | v = 1 for our v as well.
By Lemma 8 we see that σ (X Q ) − (X ) Q = Q (X ) for some in O c [X ] of degree at most d Q . Putting X = ξ we deduce σ (ξ Q ) = Q (ξ ) and so The result now follows from this and (29).
For the application of Lemma 4 we need information about g(F c ) and m(F c , F 0 ).

Lemma 10 (a) We have
where F is the algebraic closure of F p in F c . But according to Gebhardt [24] (p.91) F = F p , and the result follows. For (b) we need a formula given by Keller [34] (see also [24] p.92). Namely N r of degrees δ 1 , . . . , δ r with q 1 = p δ 1 , . . . , q r = p δ r . Clearly this is at most n c r i=1 δ i e i = φ(N )n for the degree n of N . Now the result follows without difficulty from (11) above, using n ≤ 8000 log(1 + 2 n /(55n 4 )).
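The numerical inequality quoted at the end of this proof is easy to sanity-check by machine. The sketch below reads it as n ≤ 8000 log(1 + 2^n/(55 n^4)) with the natural logarithm; both the exponents and the choice of logarithm are assumptions about the extracted formula, and the function names are illustrative.

```python
import math


def rhs(n):
    """Right-hand side 8000 * log(1 + 2^n / (55 n^4)), natural logarithm assumed."""
    return 8000 * math.log1p(2 ** n / (55 * n ** 4))


def holds(n):
    """Whether n <= 8000 * log(1 + 2^n / (55 n^4))."""
    return n <= rhs(n)


def smallest_margin(limit):
    """Return (margin, n) minimising rhs(n) - n over 1 <= n <= limit."""
    return min((rhs(n) - n, n) for n in range(1, limit + 1))


if __name__ == "__main__":
    print(all(holds(n) for n in range(1, 1001)))
    print(smallest_margin(1000))
```

For large n the right-hand side grows like 8000 n log 2, so only small n need a direct check; the range up to 1000 keeps 2^n within floating-point limits.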

Proof of Proposition 2
We may assume that F c (ξ ) = F because making F smaller decreases d. We may also assume D = [F : F 0 ] ≥ 16 from the remarks in [21] (p.218). We continue the notation F c = F 0 (ζ ) for ζ of order N . Thus D = dφ(N ).
We will suppose thatĥ It then suffices to deduce a contradiction if the constant C (which cannot possibly be mistaken for our curve C) is sufficiently large as a function of p. Generally below c will denote various positive quantities depending only on p.
We use the Carlitz exponential function where A(0) = 1 and can be taken as the a i of Lemma 2(ii) of [21] (p.218). We pick any u with e(u) = ξ . It is well-known that F c is separable over F 0 . Actually it follows at once from the identity which we shall use later. Let q be the inseparable degree of F over F c . We fix any monic irreducible and we define Now for a non-zero polynomial in F[X , Y ] we define the height (27). Later we will have to be slightly careful about the projectivity. We also use the term hyperzero to remind the reader that we are considering Taylor series expansions rather than actually differentiating.

Lemma 11
There is a non-zero polynomial˜ (X , Y ) of degree at most L in each of X and and such that the function has a hyperzero of order at least qT at z = u.

Proof
With˜ and z = w + u we havẽ in readiness for an application of this lemma.
Here (31) gives Similarly so b l involves an apparent denominator (forgetting ξ, ξ Q 0 ) the lowest common multiple of all subject to p i 1 + · · · + p i r + p j 1 + · · · + p j s = l.
It is clear from (32) that A(i 1 ) · · · A(i r ) for p i 1 + · · · + p i r ≤ p I contains the factor t p j − t at most p i 1 − j + · · · + p i r − j ≤ p I − j times ( j = 1, . . . , I ), so their lowest common multiple has degree at most I j=1 p I − j p j = I p I . Thus we get a contribution cT log T to the height of the rows of the matrix of linear equations in the a i j ; here Similarly for the A( j 1 ) · · · A( j s ).
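The degree count I p I above can be checked on small cases. The sketch assumes that A(i) is the classical Carlitz denominator D_i, the product of t^(p^i) − t^(p^j) over 0 ≤ j < i; this identification is an assumption, since the display (32) is not reproduced here. Under it, D_i is the formal product of the factors (t^(p^j) − t)^(p^(i−j)) for j = 1, . . . , i, which is exactly the multiplicity p^(i−j) used in the text.

```python
def poly_mul(a, b, p):
    """Product in F_p[t]; coefficient lists, lowest degree first."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    while out and out[-1] == 0:
        out.pop()
    return out


def poly_sub(a, b, p):
    """Difference in F_p[t]."""
    n = max(len(a), len(b))
    out = [((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
           for i in range(n)]
    while out and out[-1] == 0:
        out.pop()
    return out


def poly_pow(a, e, p):
    """a^e in F_p[t] by repeated multiplication (small e only)."""
    out = [1]
    for _ in range(e):
        out = poly_mul(out, a, p)
    return out


def t_power(e):
    """The monomial t^e."""
    return [0] * e + [1]


def bracket(j, p):
    """The polynomial t^(p^j) - t."""
    return poly_sub(t_power(p ** j), t_power(1), p)


def carlitz_D(i, p):
    """Assumed form of A(i): product of t^(p^i) - t^(p^j) for 0 <= j < i."""
    out = [1]
    for j in range(i):
        out = poly_mul(out, poly_sub(t_power(p ** i), t_power(p ** j), p), p)
    return out


def bracket_product(i, p):
    """Product of (t^(p^j) - t)^(p^(i-j)) for j = 1, ..., i."""
    out = [1]
    for j in range(1, i + 1):
        out = poly_mul(out, poly_pow(bracket(j, p), p ** (i - j), p), p)
    return out
```

Both constructions agree in the cases tested, and the degree of `carlitz_D(i, p)` is i p^i, matching the sum of p^(I−j) p^j over j = 1, . . . , I.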
Finally the Q 0 p j 1 +···+ p js in (38) contributes cT log Q 0 ; here Taking into account (41), (42), (44) and not forgetting (36), we get using Lemma 4 (the extra g/m there is at most c log D by Lemma 10)˜ of degree at most L with coefficients in F 1/q c of projective height at most cC 3 (d/q) log D, such thatφ has a zero of order at least T . The present lemma follows by raising to the power q, a dirty cheap trick which one might well think very wasteful. At first the a q i j are in F c = F 0 (ζ ) but we can clear denominators to We next define this is quite a bit smaller than qT because later estimates as in the proof of Lemma 11 will not be helped by a small Siegel exponent as in (36). We also fix n satisfying Lemma 12 For every monic irreducible Q in O 0 prime to N of degree n and σ = σ Q the function ϕ σ (z) = σ (e(z), e(Q 0 z)) has a hyperzero of order at least T 1 at z = Qu.
Proof We show by induction on k that there is a hyperzero of order at least k (k = 0, 1, . . . , T 1 ). The case k = 0 is empty, so we assume it holds up to k − 1 for some k with 1 ≤ k ≤ T 1 . We can write ϕ(z) = (e(z)) for (X ) in O c [X ] of degree at most M = q L + q L Q 0 (incidentally it is the second term here that forces our non-integrality considerations). It follows from Lemma 11 that vanishes at ξ to order at least qT . So the kth hyperderivative = [k] vanishes at ξ to order at least T = qT − k ≥ qT /2. Let V be the set of valuations v on F dividing Q such that ξ and so ξ Q is v-integral. For these we have | σ | v ≤ | σ | v and clearly also | σ | v ≤ | σ | v . Thus from Lemma 9 we deduce for η = σ (ξ Q ) the estimate For v on F not in V we argue analytically, using our induction on k. From ϕ(z) = (e(z)) and the fact that the z-derivative of e(z) is 1, we deduce ϕ [k] (z) = [k] (e(z)) + · · · , where the missing terms involve lower hyperderivatives of . Applying σ , putting z = Qu and using our induction we see that η is the k-th Taylor coefficient of ϕ σ (z) = σ (e(z), e(Q 0 z)) at Qu. Estimating as we did in the proof of Lemma 11 with the analogues of (37) and (38) for (instead of˜ ) we get with the inner maximum running over all A in (39) subject to (40).
We are trying to prove η = 0. If η ≠ 0 then the sum S of D −1 D v log |η| v over all v on F should be zero by the Product Formula. We will deduce a contradiction. By (46) and (47) The first three of the latter are easily estimated. We find as in (42) and (43); and even the same for S N , as Q 0 Q ĥ (ξ ) ≤ 1/q ≤ 1. Also instead of (44). It would be tedious to estimate each A v explicitly. But S A is at most the height of the corresponding vector of 1 with 1/A in (48), so the calculation can be done in F 0 , which we already did in (41), now getting T 1 log T 1 ≤ cC 3 d log D instead.
Thus using Lemma 11 to estimate h P ( σ ) = h P ( ) we get Finally in S 0 we have |Q| v = |Q| Q = Q −1/ log p and V is the complement (among v dividing Q) of the set E in Lemma 7. Now e v ≤ d there because each Q prime to N does not ramify in the cyclotomic F c (this step would fail for an arbitrary abelian extension). Thus using justĥ(ξ ) ≤ 1/d from (30). Therefore and (49) yields our desired contradiction. Thus indeed η = 0 and this completes the proof of Lemma 12.
We can now finish the proof of Proposition 2 by showing that the polynomial defined above by (e(z)) = (e(z), e(Q 0 z)) has too many hyperzeroes for its degree.
First note that = 0 because =˜ q and e(Q 0 z) is a polynomial of degree Q 0 > L in e(z) by (34) and (35). By Lemma 12 its conjugate σ for σ = σ Q has a hyperzero of order at least T 1 at each ξ Q . Let τ be any automorphism of F over F 0 extending σ −1 . Then By Lemma 6(b) these (ξ Q ) τ = (ξ τ ) Q are all of degree d over F c if we exclude at most 2log d ≤ 2log D exceptional Q. This is harmless because by Lemma 5 and (45) the total number of Q at our disposal is at least M ≥ c −1 C 6 (log D) 2 /(log log D) 2 . We should also note that the number of monic irreducible polynomials in O 0 not prime to N = N e 1 1 · · · N e r r is g, certainly at most the degree of N which by (11) is at most c log φ(N ) ≤ c log ||N || ≤ cC log log D . Now as Q ranges over all those remaining, and τ ranges over all extensions of σ −1 = σ −1 Q from F c to F we claim that the (ξ Q ) τ are all different. In fact an equation (ξ Q ) τ = (ξ Q ) τ would imply that ξ Q , ξ Q are conjugate over F 0 . Thus by Lemma 6(a) (with F * = F 0 ) we have Q = Q and so σ = σ for σ = σ Q . So (ξ τ ) Q = (ξ τ ) Q . To cancel the Q here we note that F c (ξ ) = F c (ξ Q ) by Lemma 6(b), so that ξ = R(ξ Q ) for R in F c (X ). Now As we assumed that ξ generates F over F c and σ = σ we conclude τ = τ . This settles the above claim. Write Q for the monic minimal polynomial over F c of ξ Q . Then σ in F c [X ] is divisible by T 1 Q , and so in F c [X ] is divisible by ( σ −1 Q ) T 1 . Now the hyperzeroes of σ −1 Q are the (ξ Q ) τ (each repeated q times) and so an equation (ξ Q ) τ as above. Thus Q = Q and so σ = σ and these σ −1 Q in F c [X ] are all different (and irreducible over F c ). Once again by Lemma 6(b) they all have degree d and so we get in all 2 (log log D) 2 hyperzeroes for . However has degree at most q L + q L Q 0 ≤ cC 7 d 2 (log D) 2 q(log log D) 2 (here we can ignore the q -taking it into account would lead to the tiny improvement mentioned in section 1) and the proof of Proposition 2 is complete.

Preliminaries for general K -geometry
To start with we need some purely geometric results; maybe the first three lemmas below are well-known, but as we are not over zero characteristic (where they also hold) we spell out some proof details.

Lemma 13 Suppose affine B is irreducible of dimension at least 2. Then there are at most finitely many b in B such that the intersection of B with the generic hyperplane through b is reducible.
Proof If we are in A m then Bertini Irreducibility (see for example [35] p.212) gives a non-zero (homogeneous) polynomial $\Phi(X_0, X_1, \ldots, X_m)$ such that the intersection of B with $\lambda_1 x_1 + \cdots + \lambda_m x_m = \lambda_0$ is irreducible provided $\Phi(\lambda_0, \lambda_1, \ldots, \lambda_m) \neq 0$. Now the generic hyperplane through $b = (b_1, \ldots, b_m)$ is $\mu_1(x_1 - b_1) + \cdots + \mu_m(x_m - b_m) = 0$, i.e. $\mu_1 x_1 + \cdots + \mu_m x_m = \mu_1 b_1 + \cdots + \mu_m b_m$, with $\mu_1, \ldots, \mu_m$ generic.
If the intersection is reducible we must have $\Phi(\mu_1 b_1 + \cdots + \mu_m b_m, \mu_1, \ldots, \mu_m) = 0$ identically in $\mu_1, \ldots, \mu_m$. This means that $b_1 X_1 + \cdots + b_m X_m - X_0$ divides $\Phi$. But that can happen for at most finitely many b. The example of a quadric cone B in A 3 defined by uv + vw + wu = 0 through (0, 0, 0) shows that exceptional b may exist: here the intersection with any hyperplane through this point is a union of two lines.

Lemma 14 Suppose affine B is irreducible of dimension at least 2. Then for any b in B the intersection of B with the generic hyperplane through b has codimension 1 in B.
Proof If l ≥ 2 is the dimension of B, then certainly dim(B ∩ b ) < l for b generic through b. Because if not, then B would be contained in b . But we can find a bunch of such b whose intersection is just b, and it would follow that B = {b}.
Let L b be a linear polynomial defining b . It induces a map L b from B to A. This is dominant. For otherwise L b would be a constant c on B. As L b (b) = 0 we see that c = 0. But then B would be contained in b , which is excluded above.
We now use the Fibre Dimension Theorem in the version quoted in [11] (p.8); there we were over zero characteristic but it holds too over positive characteristic, as the reference to [18] shows. Part (a) on this L b from B to A shows that L −1 b (0) = B ∩ b has dimension at least l − 1, provided it is non-empty; which it is, because it contains b.

Lemma 15 Suppose affine B is irreducible of dimension at least 2. Then for any non-singular b in B not in the finite set of Lemma 13 the point b remains non-singular on the intersection of B with the generic hyperplane through b.
Proof Let 1 , . . . , N be generators of the ideal of B in A m , so that the jacobian matrix with rows has rank m − l, again for l the dimension of B. With L b as in (50) defining the generic hyperplane, we adjoin the extra row and then the rank increases to m − l + 1 = m − (l − 1), because μ 1 , . . . , μ m are generic. Now 1 , . . . , N , L b may not be generators of the ideal of B ∩ b , but if we extend them to include such generators then the rank will still be at least m − (l − 1). By Lemmas 13 and 14 this irreducible B ∩ b has dimension l − 1 and so indeed b is non-singular there (see for example [35] p.198).
Next we record a result of well-known type, although we could not find it precisely in the literature. It is a version of Mumford's (3.25) and (3.26) in [50] (p. 53), which was applied in [9]. As we are over positive characteristic we cannot use his notion of "topologically unibranch". We write π for the projection from affine A n × A m to A m , and for the moment we work over an arbitrary algebraically closed field.

Lemma 16 Let W be an algebraic set all of whose components have dimension l ≥ 1 in A n × A m and let B be an irreducible variety of dimension l in A m , with π(W ) in B. Let T (when l ≥ 2) be the finite set of Lemma 13 above for B. If b is non-singular on B and not in T (when l ≥ 2) such that W ∩ π −1 (b) is finite, then its cardinality is at most that of W ∩ π −1 (η) for any η generic on B.
Proof Here it is crucial, as in [9], that the excluded b hardly depend on W .
We will prove the result first under the assumption that the projections from each component of W to B are dominant.
We start with the case of curves l = 1 (for which we do not need T or the hypothesis that W ∩ π −1 (b) is finite). We proceed in three stages.
Suppose first that W is irreducible. Then W ∩ π −1 (η) has exactly d elements, where d is the separable degree of π restricted to W . Suppose W ∩ π −1 (b) contains at least e > d points. We can find a linear form λ in the coordinates of A n taking e different values at these points. If q is the inseparable degree, then μ = λ q satisfies an equation φ 0 μ d + · · · + φ d = 0 (51) on W , with φ 0 , . . . , φ d not all zero in the coordinate ring of B.
If we are lucky and φ 0 , . . . , φ d do not all vanish at b, the result is clear: (51) shows that there can be at most d values of μ, so at most d values of λ = μ 1/q , a contradiction. Here we did not use non-singularity.
If φ 0 , . . . , φ d all vanish at b, we pick φ = φ i ≠ 0 with ord b φ i minimal. Here non-singularity is implicit. Now the φ j /φ are regular at b and so can be written as ψ j /ψ for ψ j , ψ in the coordinate ring of B with ψ(b) ≠ 0. Multiplying (51) by ψ gives (ψ 0 μ d + · · · + ψ d )φ = 0 on W . But φ = 0 on W would contradict dominance. As W is irreducible, it follows that ψ 0 μ d + · · · + ψ d = 0 on W . And this takes us back to the first case, because ψ i = ψ does not vanish at b.
Next suppose, still for l = 1 under the dominance hypothesis, that W = W 1 ∪ · · · ∪ W r for irreducible W 1 , . . . , W r . Then $\#(W_i \cap \pi^{-1}(b)) \le \#(W_i \cap \pi^{-1}(\eta))$ $(i = 1, \ldots, r)$ for generic η. Thus $\#(W \cap \pi^{-1}(b)) \le \sum_{i=1}^{r} \#(W_i \cap \pi^{-1}(b)) \le \sum_{i=1}^{r} \#(W_i \cap \pi^{-1}(\eta))$. Now any two W i ∩ π −1 (η) are disjoint because any two W i intersect in a finite set which cannot project to η. Thus the last sum above is indeed #(W ∩ π −1 (η)). This settles the case l = 1. We now use induction on l (still assuming dominance). Assuming the result for some dimension l − 1 ≥ 1, we will deduce it for dimension l.
As above, we do it in stages. We may assume W ∩ π −1 (b) is non-empty and b ≠ η.
It is easy to see that a generic hyperplane constricted to pass through b and η is a generic hyperplane constricted only to pass through b. We call it b . Since b is not in T , the intersection B ∩ b is irreducible. We may denote also by b the product A n × b in A n × A m (it is defined by the same equation). Then π induces a map π By varying this hyperplane (still through b and η) we would deduce that W is contained in their intersection, which is (A n times) the line through b and η. But then π(W ) would be contained in this line, contradicting dominance.
Let L b be a linear polynomial defining b . It induces a map L b from W to A. This is dominant. For otherwise L b would be a constant c on W . As L b (b) = 0 and W ∩ π −1 (b) is non-empty we see that c = 0. But then W would be contained in b , which is excluded above.
Now the Fibre Dimension Theorem on this L b from W to A shows that every (non-empty) Assume for the moment that there is only one component. We try to apply the induction hypothesis to the map π b from W ∩ b to B ∩ b also irreducible of dimension l − 1. In fact π b is dominant, otherwise π b (W ∩ b ) would be of dimension at most l − 2 containing b and then the Fibre Dimension Theorem would imply that W ∩ π −1 (b) would be of dimension at least 1, contradicting its assumed finiteness. We see by Lemma 15 that b remains non-singular on B ∩ b . Thus by induction we have But since b goes through both b and η, it is easy to see that A similar argument works if there are several different components Z (1) , . . . , Z (s) (all necessarily of dimension l − 1) of W ∩ b . Then π b induces projections π (1) , . . . , π (s) from Z (1) , . . . , Z (s) to B ∩ b . If one of these is not dominant, then again W ∩ π −1 (b) would be infinite. Thus by induction again, #(π ( j) ) −1 (b) ≤ #(π ( j) ) −1 (η) for j = 1, . . . , s. Therefore Now any two (π ( j) ) −1 (η) are disjoint because any two Z ( j) intersect in something of dimension at most l − 2 which cannot project to η. Thus the last sum above is just #(W ∩ π −1 b (η)); and we have recovered (53).
Next the reducible case W = W 1 ∪ · · · ∪ W r (still under dominance) follows in a similar way. Namely 1, . . . , r ). Thus and as above any two W i ∩ π −1 (η) are disjoint because any two W i intersect in something of dimension at most l − 1 which cannot project to η. Thus the last sum above is indeed # W ∩ π −1 (η) . This settles the lemma under our assumption that the projections from each component of W to B are dominant.
Finally suppose the latter fails for some component W 0 of W . Then b cannot be in π(W 0 ), otherwise the Fibre Dimension Theorem would imply that W 0 ∩ π −1 (b) would have dimension at least dim W 0 − dim π(W 0 ) ≥ 1, contradicting finiteness. Thus W 0 ∩ π −1 (b) is empty; and of course so is W 0 ∩ π −1 (η). So these are not seen in the intersections with W .
Regarding the excluded b, the example of B defined by v 2 = u 2 (u + 1) in A 2 and W defined by

Preliminaries for general K -Carlitz
Now we return to the Carlitz world. The next result concerns the action of a Carlitz polynomial A(C) on a special sort of Laurent polynomial in a variable u. Write  k = 0, . . . , d).
Then at level n from level n − 1 ≥ 0 we have first and so on, down to By induction on n we verify the following Remark 1 For k = n, . . . , d + n and l = min{k, d} ≥ n we can write X (n) k as a linear form in a l , . . . , a l−n whose coefficients are polynomials over F p in t, λ 0 , . . . , λ n of degree at most p n+1 in each variable.
It will be crucial that for each n, k the number of a i appearing as well as the polynomial degree are bounded only in terms of n. For example We need the "Carnomial coefficients"

Lemma 17 We have
Proof Note that we do not specify P (n) j for j = 0, . . . , n − 1. It is not difficult to do so but we found it is not useful for applications.
From C i+1 Z = C i (CZ ) we derive a simple recurrence where the T i j are considered zero if 0 ≤ j ≤ i does not hold.
Then iterating n times gives We use of course induction on n to prove (58).
which is the right-hand side thanks to (55). Now we assume the thing done for n − 1 ≥ 0 and we deduce it for n ≥ 1. Splitting λ n /u p n off the left-hand side of (58), and using the case n = 0 with u p n in place of u we find (with λ n in place of λ 0 ). For j = d + n in (58) we get at once (59), thanks to (56). Also P (n) We use (60) to see that the second sum is 1 ) p j−n T k+n−1 j + · · · + (S (n) n ) p j−n T k j )(a k λ n ) p j−n .
Thus P (n) with (now adjusting k) Here in U m we can restrict the sum from k = j. We already checked P (n) d+n in (59) using (56). Now for each j = n, . . . , d + n − 1 we pick out the coefficient of the various T k j in (61).
The biggest k is d+n and now we see T k j only in U 0 , with coefficient (S (n) 0 ) p j−n (a d λ n ) p j−n , which fits with (59) again thanks to (56).
Next for k = d + n − 1 we see T k j in U as well as in U 0 , U 1 . This also fits with (59) thanks to the displayed formula just after (56).
We carry on with k = d + n − 2 down to k = d + 1 and these also fit, thanks to the later formulae preceding (57).
Then we go further with k = d, d − 1, . . . , n which appear in all of U , U 0 , U 1 , . . . , U n and these fit with (59) because of (57). This completes the proof.
As mentioned, the P (n) j for j = 0, . . . , n − 1 are not useful for applications. For example one finds where the number of a i appearing is not bounded in terms of n as in Remark 1.

Remark 2
Because T kk = 1 the system (59) has a triangular nature; for example the equations P (n) d+n = · · · = P (n) e = 0 for some e ≥ n are equivalent to the equations X (n) d+n = · · · = X (n) e = 0. It is not difficult to see from Remark 1 that (we will be more precise later) these are equations for λ 0 , . . . , λ n again essentially independent of A (as in the examples given just after that Remark).
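The coefficients T i j and their triangular shape can be tabulated mechanically. The sketch below assumes that they are defined by C^i Z = sum over j of T i j Z^(p^j) with CZ = Z^p + tZ, so that C^(i+1) Z = C^i (CZ) gives the recurrence T_{i+1,j} = T_{i,j−1} + t^(p^j) T_{i,j}. This form of the recurrence and all function names are assumptions made for illustration. As a cross-check the same table is rebuilt from C^(i+1) Z = C(C^i Z), which gives T_{i+1,j} = (T_{i,j−1})^p + t T_{i,j}.

```python
def poly_add(a, b, p):
    """Sum in F_p[t]; coefficient lists, lowest degree first."""
    n = max(len(a), len(b))
    out = [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
           for i in range(n)]
    while out and out[-1] == 0:
        out.pop()
    return out


def poly_mul(a, b, p):
    """Product in F_p[t]."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    while out and out[-1] == 0:
        out.pop()
    return out


def frobenius(a, p):
    """a(t)^p = a(t^p) for a in F_p[t]."""
    out = [0] * ((len(a) - 1) * p + 1) if a else []
    for i, x in enumerate(a):
        out[i * p] = x
    return out


def t_power(e):
    """The monomial t^e."""
    return [0] * e + [1]


def carnomial_table(p, n):
    """T[i][j] for 0 <= j <= i <= n from T_{i+1,j} = T_{i,j-1} + t^(p^j) T_{i,j}."""
    T = [[[1]]]                                   # C^0 Z = Z
    for i in range(n):
        row = []
        for j in range(i + 2):
            left = T[i][j - 1] if j >= 1 else []  # zero outside 0 <= j <= i
            same = poly_mul(t_power(p ** j), T[i][j], p) if j <= i else []
            row.append(poly_add(left, same, p))
        T.append(row)
    return T


def carnomial_table_alt(p, n):
    """Same table from T_{i+1,j} = (T_{i,j-1})^p + t T_{i,j}."""
    T = [[[1]]]
    for i in range(n):
        row = []
        for j in range(i + 2):
            left = frobenius(T[i][j - 1], p) if j >= 1 else []
            same = poly_mul([0, 1], T[i][j], p) if j <= i else []
            row.append(poly_add(left, same, p))
        T.append(row)
    return T
```

In both tables the diagonal entries are 1, in line with T kk = 1, and the first column is T_{i,0} = t^i; for p = 2 one finds T_{2,1} = t^2 + t.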

More on curves
From now on K is the algebraic closure of F p (t, s 1 , . . . , s l ) for some l ≥ 1 variables s 1 , . . . , s l algebraically independent over F 0 = F p (t). We define a height h s on K by regarding it as the algebraic closure of F 0 (s 1 , . . . , s l ); see for example [22] (p.1053). Thus h s (s 1 ) = · · · = h s (s l ) = 1 but h s (t) = 0.
The next result is in the style of Proposition 1.

Lemma 18
Let C be an irreducible curve in G n a defined over K . Assume for any non-zero (ρ 1 , . . . , ρ n ) in R n that the form ρ 1 x 1 + · · · + ρ n x n is not identically constant on C. Then there is B such that h s (ξ 1 ) + · · · + h s (ξ n ) ≤ B for all (ξ 1 , . . . , ξ n ) on C(K ) for which there exists non-zero (α 1 , . . . , α n ) in R n and λ in F 0 with α 1 ξ 1 + · · · + α n ξ n = λ. (62) Proof Our h s is not a canonical height in the sense of Denis but it does have the same property that h s (ξ P ) = P h s (ξ ) as in (14) above. For example h s (ξ p + tξ) = ph s (ξ ) because now t appears as a constant. To see that we can go through the Nullstellensatz argument observing that |t| = 1. Or just directly using max{1, |ξ p + tξ |} = max{1, |ξ |} p (that would work better for general P). And of course h s (ξ + η) ≤ h s (ξ ) + h s (η) as in (15) above. Now we can follow the proof of Proposition 1 above. Lemma 1 above goes through with h s instead ofĥ because of the remarks above; the extra λ in (62) makes no trouble because it leads to an extra term Q(C)λ on the left-hand side of (21), and this has zero height. The subsequent proof also goes through with h s also instead of h (actually with c =c =nc).
So it suffices to rename new as old, thus achieving (65). We may suppose e = deg T 3 above. Eliminating ξ between the first two equations of (65) gives By (64) the degree here is at most D. It follows that p d BD ≥ p e . A similar argument eliminating ξ in (65) leads to p d BD ≥ p e . Multiplying the two inequalities and using (66) we find p e ≤ (BD) 2 .
By grassmannian theory this means that there are at most finitely many possibilities for the O 0 -module generated by s, s in O 3 0 . Thus we can regard s, s as fixed. Now the original relation (63) implies of which we need only one. We can apply GL 3 (R) to assume it is α 3 = 0. This reduces the problem to G 2 a . Then a similar argument brings us down to G a . But the lemma is then empty, because now C = G a is defined over F 0 .
Right at the start of the proof we made an assumption about rank 1. But if the rank is bigger then things only get easier.
For example if the rank of the (α 1 , α 2 , α 3 ) in (63) is 2, then we can argue as in (65) without ξ and there is no longer any need for minors.
And if the rank is 3, then ξ 1 , ξ 2 , ξ 3 lie in F 0 . But because C is not defined over F 0 , this implies that C(F 0 ) is at most finite anyway. This completes the proof (and on the way we proved the analogue in G 2 a ).

Completions and specializations
During this section we assume that C satisfies the conditions of Lemma 19. Of course these are more restrictive than the conditions of the Conjecture for n = 3, but we will see that this will be no problem. As in [9] we regard the field of definition of C as the function field F 0 (s 1 , . . . , s m ) of an irreducible variety B, say of dimension l ≥ 1, in affine A m defined over F 0 . We assume as in the previous section that s 1 , . . . , s l are algebraically independent over F 0 . As in [9] we complete C to Ĉ in projective P 3 and then take a non-singular model C̃.
In [9] (p.463) we informally defined a variety C B in A 3 ×A m ; here this amounts to writing the equations of C in A 3 with coefficients in F 0 [s 1 , . . . , s m ] and adjoining the equations of B. Of course one should more formally define it as the F 0 -Zariski closure of a point (P, η), where η is generic on B and P is generic on C over F 0 (η). This makes it clear that C B is irreducible of dimension l + 1. The natural projection π from A 3 × A m to A m then takes C B to B. There is also a natural projection γ from A 3 × A m to A 3 .
Then for a point b of B we define the specialization C b by and similarly Ĉ b , C̃ b . As in [9] we can assume that these specializations retain enough properties of C, Ĉ, C̃ that we shall need, at least when b is restricted to a non-empty open subset B 0 of B. And also for b = η a generic point; here we shall identify η with (s 1 , . . . , s m ).
For b in B 0 we can regard x 1 , x 2 , x 3 as functions on C b but for simplicity we omit any subscript b that may possibly be more precise. Equally we omit the subscript for or also f − λ with a constant function λ. But we put it back in the notation deg b ( f − λ) for the degree of f − λ on C b (unless f − λ is identically zero on C b , a possibility that we shall soon essentially discount). Note that by our condition on C, this f − λ is certainly non-zero on C η as long as α 1 , α 2 , α 3 are not all zero.
The next result replaces the simple argument in the last paragraph of [9] (p.463); here we lack any multiplicative structure.

Lemma 20
There is a non-empty open subset B 00 of B 0 with the following property. For any λ in F 0 , any b in B 00 and any α 1 , α 2 , α 3 not all zero the function f − λ is not identically zero on C b and Proof We shall first prove the lemma under the assumption that f − λ is not identically zero on C b . Then at the end of the proof we shall show that in fact this follows almost automatically.
Counting by poles we have now This holds also for b = η, but in fact we shall identify C η with C over F 0 [s 1 , . . . , s m ]. In this generic case we can restrict toP inC with With a uniformizer u = uP we have a local expansion where x i involves only non-negative exponents and x i only (finitely many) negative exponents.
We are going to split the negative exponents into subsets of various sets {r , r p, r p 2 , . . .} with r prime to p. These sets are disjoint. Of course we will see r p m only with Thus we can write At first the λ irm are in the algebraic closure of F 0 (s 1 , . . . , s m ) but by taking a finite cover of B and introducing more variables we can suppose that they lie in F 0 [s 1 , . . . , s m ] itself (this is just for painless specialization). We We calculate these using Lemma 17 with u there replaced by the various u r and a suitably large d. We find f = r d+n r j=0 1 u r p j (P 1r j + P 2r j + P 3r j ) ( j = n r , . . . , d + n r ) and the X 1rk , X 2rk , X 3rk are defined as in (56)-(57) with n = n r on taking the Up to now we are in the generic situation but soon we shall specialize.
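The splitting of pole orders into the classes {r , r p, r p 2 , . . .} with r prime to p is elementary but central to the bookkeeping, since an additive polynomial applied to a term in u^(−r p^m) only produces terms whose exponents lie in the same class. A minimal sketch follows; the function names are illustrative and not taken from the text.

```python
def split_exponent(e, p):
    """Write a positive integer e as r * p**m with r prime to p; return (r, m)."""
    m = 0
    while e % p == 0:
        e //= p
        m += 1
    return e, m


def group_pole_orders(orders, p):
    """Group positive pole orders by their p-free part r.

    Returns a dict mapping r to the sorted list of exponents m such that
    r * p**m occurs among the given orders.
    """
    classes = {}
    for e in orders:
        r, m = split_exponent(e, p)
        classes.setdefault(r, []).append(m)
    return {r: sorted(ms) for r, ms in classes.items()}


if __name__ == "__main__":
    # Pole orders 1, 2, 3, 4, 6, 12 in characteristic 2 fall into two classes.
    print(group_pole_orders([1, 2, 3, 4, 6, 12], 2))
```

The classes are disjoint because the factorisation e = r p^m with r prime to p is unique.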
Thus indeed we can take any b in a set B 00 independent of α 1 , α 2 , α 3 . For example, if we want X = a 1 1 + a 2 2 = 0 at b for all non-zero (a 1 , a 2 ) in F 2 p then it suffices that Y = 1 2 ( Above we assumed that e − 1 ≥ñ. If this is not the case, then our counting by poles gives Thus in both cases for e we get So at eachP onC where at least one of x 1 , x 2 , x 3 has a pole, if ωP > 0 we have and this holds trivially if ωP ≤ 0.
Thus summing over all suchP we get The sum on the right is the total number of poles (with multiplicity) of a generic linear combination of x 1 , x 2 , x 3 . So the sum is deg C, the total number of zeroes. As promised we now show that for any b in our present B 00 , indeed f −λ is not identically zero on C b . The corresponding assertion in the multiplicative situation is proved in [9] at the bottom of page 463.
Suppose on the contrary f = λ on C b . Choose any non-torsion τ in F 0 ; then But f − λ − τ is not constant on C η by our basic hypothesis, so f − λ − τ has at least one pole there. Thus C k ( f − λ − τ ) = C k f − λ k has a pole of order at least p k on C η . Therefore the first degree on the right-hand side of (77) is at least p k . Now we obtain a contradiction by making k tend to infinity.
The next result essentially replaces an argument in the proof of Lemma 6.1 of [9] (p.464); here we lack Mason's abc Theorem. Proof We assume first that b is in B 00 (F 0 ).
With as usual α i = A i (C) we have by (33) on C or C b . This is the analogue of the multiplicative In case (a) at least one of dx 1 , dx 2 , dx 3 must be non-zero at P̃ on C b . Say it is dx ≠ 0.
say, with fixed functions g i = dx i /dx. If (a) is false we have φ(P) = 0 on C b . But this implies for some D b possibly depending on b; unless, that is, φ is identically zero on C b . We deal with this last possibility first.
If φ is identically zero on C b then (A 1 (t), A 2 (t), A 3 (t)) in O 3 0 is in the additive relation group (with an obvious extension of the notion in [41] especially section 3) of g 1 , g 2 , g 3 in F 0 (b) (C b ). This group is of course a O 0 -module. If it were the full O 3 0 then g 1 , g 2 , g 3 would all be identically zero on C b which is absurd, because actually one of them is 1. So there is non-zero (ε 1 , ε 2 , ε 3 ) in R 3 , possibly depending on b, such that ε 1 α 1 + ε 2 α 2 + ε 3 α 3 = 0. We put the corresponding rank 2 submodule into E b .
Thus indeed we may assume that (78) holds. Now f (P) = 0, and so by Proposition 1 above together with the Northcott property we deduce that there are at most finitely many possibilities forP. For each of thoseP which are over a non-singular point we still have φ(P) = 0, and again this leads to (ε 1 , ε 2 , ε 3 ) as above. This completes case (a).
In case (b) for P̃ over a singular point (ξ 1 , ξ 2 , ξ 3 ) of C b we have for a suitably chosen uniformizer u and with λ 1 , λ 2 , λ 3 values, not all zero, of fixed functions on C evaluated at P̃ and specialized at b. Here r ≤ deg C because 3 . This gives the result unless μ = 0; in which case as above this leads again to (ε 1 , ε 2 , ε 3 ).
Finally in case (c) we go back to the decomposition x i = x′ i + x″ i in the proof of Lemma 20, where x′ i involves only non-negative exponents and x″ i only (finitely many) negative exponents. As f (P̃) = 0 we must have α 1 x″ 1 + α 2 x″ 2 + α 3 x″ 3 = 0 identically on C b . This implies that x′ 1 , x′ 2 , x′ 3 cannot all be zero on C b , otherwise so would f = α 1 x′ 1 + α 2 x′ 2 + α 3 x′ 3 be, already impossible for b in B 00 thanks to Lemma 20. So some x′ i ≠ 0. If also x″ i ≠ 0 then Here and so say. And this clearly holds even if x″ i = 0. So for each i we can write x i = λ i u s + · · · as in (79), and as we can follow the arguments in case (b).
If b = η then in (a) the only change is as follows. Now (78) becomes [F 0 (s 1 , . . . , s l , P̃) : F 0 (s 1 , . . . , s l )] ≤ D η , so the index in Lemma 19 is no bigger; this lemma shows that there are at most finitely many P̃. The argument for (b) is essentially unchanged. Similarly for (c); here we already know that f is not identically zero on C η .
The result would become false without the condition involving E b . For example with C as the line parametrized by (x, t x, sx) and α 1 = C, α 2 = −1, α 3 = 0 we have f = x p contradicting (a) at P = (0, 0, 0).
In fact we are not so very far from the classical abc Mason result. For example over C consider x a 1 (x − 1) a 2 (x − 2) a 3 − 1 for positive integer exponents. This is a non-zero polynomial of degree D = a 1 + a 2 + a 3 ; and abc implies at once that it has at least D − 2 zeroes without multiplicity. A Carlitz analogue might be 3 it is easy to see that this has degree D = max{||A 1 ||, ||A 2 ||, ||A 3 ||} in x. Using (33) as above we see that its derivative is simply A 1 + t A 2 + t 2 A 3 . If p = 2, 3 that is non-zero; thus we see that the polynomial now has exactly D zeroes (without multiplicity).
Now we can give an analogue of Lemma 6.1 of [9] (p.464). We write N sing for the number of points of C̃ above singular points of C and S inf for the sum of 1 + deg u taken over all points above infinite points of C with u a corresponding uniformizer. Both quantities remain unchanged when we replace C by C b . For α 1 , α 2 , α 3 as above we define f as usual and then H in A 3 × A m by the equation f = 0. It is convenient during this section to assume that C B ∩ H is non-empty.
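The two Carlitz facts used in this discussion (the x-derivative of A(C)(x) and its degree in x) can be checked by direct computation. The following Python sketch assumes that the Carlitz module is C(x) = t x + x p over F p [t], which is consistent with the example f = x p on the line above, and that ||A|| stands for p raised to the degree of A. The list representation and all function names are illustrative and are not taken from the paper.

```python
# Elements of F_p[t] are lists of residues mod p, lowest degree first ([] is 0).
# An additive polynomial sum_j c_j(t) x^(p^j) is the list [c_0, c_1, ...].

def trim(c):
    c = list(c)
    while c and c[-1] == 0:
        c.pop()
    return c

def add(a, b, p):
    n = max(len(a), len(b))
    get = lambda v, i: v[i] if i < len(v) else 0
    return trim([(get(a, i) + get(b, i)) % p for i in range(n)])

def scale(a, s, p):
    return trim([(s * x) % p for x in a])

def mul_t(a):
    return [0] + list(a) if a else []

def frob(a, p):
    # Over F_p one has c(t)^p = c(t^p).
    if not a:
        return []
    out = [0] * ((len(a) - 1) * p + 1)
    for i, x in enumerate(a):
        out[i * p] = x
    return out

def add_additive(F, G, p):
    n = max(len(F), len(G))
    get = lambda v, j: v[j] if j < len(v) else []
    return [add(get(F, j), get(G, j), p) for j in range(n)]

def carlitz_t(F, p):
    # Apply C(x) = t*x + x^p to the additive polynomial F.
    out = [mul_t(c) for c in F] + [[]]
    for j, c in enumerate(F):
        out[j + 1] = add(out[j + 1], frob(c, p), p)
    return out

def carlitz(A, p):
    # A(C)(x) = sum_i a_i C^i(x) for A = sum_i a_i t^i (assumed convention).
    result, power = [[]], [[1]]          # power starts as the polynomial x
    for a in A:
        result = add_additive(result, [scale(c, a, p) for c in power], p)
        power = carlitz_t(power, p)
    while len(result) > 1 and not result[-1]:
        result.pop()
    return result

def derivative(F):
    # d/dx kills every x^(p^j) with j >= 1, leaving the coefficient of x.
    return F[0]

def degree_in_x(F, p):
    return p ** (len(F) - 1)

p = 3
A = [2, 0, 1]                            # A = t^2 + 2
F = carlitz(A, p)
print(derivative(F) == A, degree_in_x(F, p))

# The line (x, t x, s x) with alpha_1 = C, alpha_2 = -1, alpha_3 = 0:
f_line = add_additive(carlitz([0, 1], p), [scale([0, 1], -1, p)], p)
print(f_line)                            # only the x^p coefficient survives
```

In this representation the derivative is just the coefficient of x, and the leading coefficient of A(C)(x) in x is a constant, so a pole of order m of x becomes a pole of order m times the x-degree after applying A(C).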

Lemma 22 For b in B
is a finite set whose cardinality satisfies
Proof We estimate from above and below. We need to take into account only zeroes P̃ of f . By Lemma 21, each P̃ in (81) above a non-singular point of C b contributes 1. The number of such P̃ is at most the cardinality of C b ∩ γ (H ) (finite by Lemma 20), which is the same as that of C B ∩ H ∩ π −1 (b) because γ is an injection even on C B ∩ π −1 (b). Similarly each P̃ above a singular point contributes at most deg C, and each P̃ above an infinite point at most (deg C)(1 + deg u). The left-hand inequality of (80) follows.
On the other hand (81) is at least the number of zeroes (without multiplicity) of f over finite points of C b , and this proves the right-hand inequality.
Next we give an analogue of Lemma 6.2 of [9] (p.464). The Dimension Theorem (see for example [35] p.36) shows that C B ∩ H (here assumed non-empty) has all its components of dimension l unless C B is in H ; but this last possibility is excluded by our original hypothesis on C.

empty). Then for any finite union W of components of C B ∩ H we have
Proof Write s(b), s(η) for the two cardinalities to be compared. Let W′ be the union of the components of C B ∩ H not in the union W , and write s′(b), s′(η) analogously. By Lemma 22 we have since the sets W ∩ π −1 (η), W′ ∩ π −1 (η) are disjoint. This is because W ∩ W′ has dimension at most l − 1, so cannot project to η. Also by Lemma 22 we have Comparing these inequalities we see that it suffices now only to verify s′(b) ≤ s′(η). But this follows from Lemma 16 (with W′ not W ). We just have to note that W′ ∩ π −1 (b) is finite by Lemma 22.
However the next result has no analogue in [9] which is solidly over zero characteristic and so all inseparable degrees deg ins are 1.

Lemma 24 Suppose the nonzero
Proof By Lemma 22 we have

Almost finishing
Here we prove our Theorem for general K , which as above we can take as K = F 0 (s 1 , . . . , s l ) with l ≥ 1, and C defined over F 0 (s 1 , . . . , s m ) = F 0 (η). We may assume that C is not defined over F 0 , else every point with just one non-trivial relation would already be over F 0 and so our Theorem for that field suffices.
Throughout this section (as in the previous section) we shall assume that no non-zero ρ 1 x 1 + ρ 2 x 2 + ρ 3 x 3 is identically constant on C. In the next section we shall relax this as required. Now C is defined over some field K * , finitely generated over F p , which lies in F 0 (η). So we can find a finite extension F of F 0 with K * inside F(η). Both of these latter fields are finitely generated over F 0 with transcendence degree l and so the index [F(η) : K * ] = e is finite.
Fix once and for all some b in B 00 (F 0 ). By Lemma 20 the hypothesis of our Theorem for C b over F 0 is satisfied. Thus there are at most finitely many points on C b (F 0 ) satisfying two independent relations. Let r ≥ 0 be their cardinality.
Let P be any point on C with two independent relations (if it exists at all), and let W be the F-Zariski closure of (P, η) on C B in A 3 × A m . Now there is a submodule M of R 3 of rank at least 2 that kills P. It may be that M lies in the unions E b , E η (if non-empty) appearing in Lemma 21. But then M would be in one of the members making up E b ∪ E η . Using GL 3 (R) as at the end of the proof of Lemma 19, we can assume that M actually lies in R 2 . But now the problem in G 3 a is reduced to one in G 2 a , an easy torsion problem.
Thus we can assume that M does not lie in E b ∪ E η . Pick any (α 1 , α 2 , α 3 ) in M not in E b ∪ E η . This gives an H as above; but of course there is an element of M independent of (α 1 , α 2 , α 3 ), and this gives analogously some H′.
Thus (P, η) lies in C B ∩ H ∩ H′ (and in particular C B ∩ H is non-empty as we assumed in the preceding section). The dimension of W is at least l. It follows that W (which is irreducible over F) is a finite union of F 0 -irreducible components of C B ∩ H (all of dimension l as we saw above). Therefore by Lemma 23 we have #(W ∩ π −1 (b)) ≥ #(W ∩ π −1 (η)) − c 1 for some c 1 depending only on C.
Thus we can use Lemma 19 (in which the index is no bigger) to conclude as desired that there are at most finitely many P. This completes the proof of the Theorem when no non-zero ρ 1 x 1 + ρ 2 x 2 + ρ 3 x 3 is identically constant on C.

Relaxing the condition
Finally if some non-zero ρ 1 x 1 + ρ 2 x 2 + ρ 3 x 3 is identically constant on C, then again via GL 3 we can assume that x 3 is a constant ξ 3 on C. It is then necessarily non-torsion. Now the thing reduces to the analogue of Mordell-Lang on the projection to G 2 a : both ξ 1 and ξ 2 are in the division hull of Rξ 3 . In Sect. 4 we had reached a similar stage, but that was over F 0 . Oddly enough the extension to F 0 (s 1 , . . . , s l ) is not in the literature, although Ghioca comes very close in [27]. In the personal communication [28] he does establish what we need here. But we can also use the ideas of [9] as follows.
Indeed we argue as in Sect. 4. Namely as in (23) we can find ξ and a torsion ζ such that and (24). Further if ζ has order ν then τ 1 , τ 2 , τ 3 are polynomials in R of degree less than that of ν.
Then we can proceed down to (26), now in the form as h s = 0 on all of F 0 . Now h s (ξ 3 ) ≪ 1 as ξ 3 is fixed. But we claim that also h s (ξ 1 ) ≪ 1, h s (ξ 2 ) ≪ 1.
To see this, project C down to a curve C′ in G 2 a . There is a non-trivial Carlitz relation between ξ 1 , ξ 2 and so by Lemma 18 we indeed get (84) unless some non-trivial ρ 1 x 1 + ρ 2 x 2 is constant on C′. With GL 2 we could then assume x 2 = ξ 2 is constant on C′. But now ξ 2 , ξ 3 must be independent and so we could not have had two independent relations among ξ 1 , ξ 2 , ξ 3 . Now (83) implies We can assume that ξ is not in F 0 , because otherwise by (24) the point (ξ 1 , ξ 2 , ξ 3 ) would be in C(F 0 ); and because C is not defined over F 0 this would give the required finiteness at once. Now in (68) we have .
Funding Open access funding provided by University of Basel.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
at (ξ 1 , ξ 2 , ξ 3 ), where P 1 , P 2 , P 3 are the partial derivatives and α 1 = A 1 (C) etc.
If Q ≠ 0 on C then by Bezout we get the required bound for the entire degree [F 0 (ξ 1 , ξ 2 , ξ 3 ) : F 0 ] and we are done.
Thus we can assume Q = 0 on C. We write Q = U 1 (t)P 1 + U 2 (t)P 2 + U 3 (t)P 3 for the appropriate minors (not all zero) of the Jacobian matrix. Then with as usual O 0 = F p [t] the O 0 -module ∇ of (Q 1 , Q 2 , Q 3 ) in O 3 0 such that Q 1 P 1 + Q 2 P 2 + Q 3 P 3 = 0 on C has rank r with r = 1, 2, 3. We consider each case in turn.
If r = 3 then P 1 = P 2 = P 3 = 0 on C. As the degree of P is minimal this implies P 1 = P 2 = P 3 = 0 identically. But then P = P̃ p , contradicting this very minimality.
If r = 2 (the crucial case), then this means there are fixed R 1 , R 2 , R 3 in O 0 , not all zero, such that R 1 Q 1 + R 2 Q 2 + R 3 Q 3 = 0 on ∇. In particular which also relates the minors of (86). Now we could operate with GL 3 (R) in the usual way to produce from (86) a relation γ 3 ξ 3 = 0 on a different curve. But it is more straightforward to assume U 3 ≠ 0, and with ρ 1 = R 1 (C), ρ 2 = R 2 (C), ρ 3 = R 3 (C) then multiply the first relation in (86) by ρ 1 β 2 − ρ 2 β 1 and the second relation by ρ 1 α 2 − ρ 2 α 1 and subtract. What comes out using (87) is U 3 (C)ξ = 0 for ξ = ρ 1 ξ 1 + ρ 2 ξ 2 + ρ 3 ξ 3 . Thus ξ is torsion. The corresponding function x = ρ 1 x 1 + ρ 2 x 2 + ρ 3 x 3 on C is by hypothesis not zero on C. It cannot be constant either, because then its value would be torsion also contradicting the hypothesis. Thus F 0 (x 1 , x 2 , x 3 ) is a fixed finite extension of F 0 (x). Similarly for the specializations F 0 (ξ 1 , ξ 2 , ξ 3 ), F 0 (ξ ). But we already noted that torsion is separable. It follows that
Finally if r = 1 then (U 1 (t), U 2 (t), U 3 (t)) in projective P 2 (F 0 ) is uniquely determined by P. But these are the Grassmann coordinates of the space of relations (86). So this space has generators of bounded degree. In other words we can consider each of (86) as a multiple of a fixed relation. Now we can repeat the above argument for just one of them.
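The elimination step in the case r = 2 is a purely formal identity in a commutative ring, and it can be sanity-checked numerically. The sketch below makes several assumptions that are not visible in the text as extracted: that (86) consists of two relations with coefficient vectors (α 1 , α 2 , α 3 ) and (β 1 , β 2 , β 3 ), that U 1 , U 2 , U 3 are their 2 × 2 minors with the sign convention of a cross product, and that (87) reads ρ 1 U 1 + ρ 2 U 2 + ρ 3 U 3 = 0. Ordinary integers stand in for the commutative ring of Carlitz operators, and all names are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def minors(alpha, beta):
    # Assumed sign convention: U = alpha x beta (cross product).
    a1, a2, a3 = alpha
    b1, b2, b3 = beta
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def combine(alpha, beta, rho, xi):
    # (rho1*beta2 - rho2*beta1) * (first relation)
    #   - (rho1*alpha2 - rho2*alpha1) * (second relation)
    first = dot(alpha, xi)
    second = dot(beta, xi)
    return ((rho[0] * beta[1] - rho[1] * beta[0]) * first
            - (rho[0] * alpha[1] - rho[1] * alpha[0]) * second)

def check_identity(seed):
    rng = random.Random(seed)
    alpha = [rng.randint(-9, 9) for _ in range(3)]
    beta = [rng.randint(-9, 9) for _ in range(3)]
    xi = [rng.randint(-9, 9) for _ in range(3)]
    U = minors(alpha, beta)
    # Build rho orthogonal to U, mimicking the assumed form of (87).
    r1, r2 = rng.randint(-9, 9), rng.randint(-9, 9)
    rho = (r1 * U[2], r2 * U[2], -(r1 * U[0] + r2 * U[1]))
    assert dot(rho, U) == 0
    # The combination collapses to U3 * (rho1*xi1 + rho2*xi2 + rho3*xi3).
    return combine(alpha, beta, rho, xi) == U[2] * dot(rho, xi)

print(all(check_identity(seed) for seed in range(100)))
```

Under these assumptions the left-hand side vanishes whenever both relations in (86) hold, so U 3 times ρ 1 ξ 1 + ρ 2 ξ 2 + ρ 3 ξ 3 vanishes, which is the shape of the conclusion U 3 (C)ξ = 0 drawn in the text.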
It seems an interesting problem to extend the lemma to G n a in the situation of the Conjecture (for K = F 0 again).