Cracking the problem with 33

Inspired by the Numberphile video"The uncracked problem with 33"by Tim Browning and Brady Haran, we investigate solutions to $x^3+y^3+z^3=k$ for a few small values of $k$. We find the first known solution for $k=33$.


Introduction
Let k be a positive integer with k ≡ ±4 (mod 9). Then Heath-Brown [HB92] has conjectured that there are infinitely many triples (x, y, z) ∈ Z 3 such that Various numerical investigations of (1) have been carried out, beginning as early as 1954 [MW55]; see [BPTYJ07] for a thorough account of the history of these investigations up to 2000. The computations performed since that time have been dominated by an algorithm due to Elkies [Elk00]. The latest that we are aware of is the paper of Huisman [Hui16] (based on the implementation by Elsenhans and Jahnel [EJ09]), which determined all solutions to (1) with k < 1000 and max{|x|, |y|, |z|} ≤ 10 15 . In particular, Huisman reports that solutions are known for all but 13 values of k < 1000: (2) 33, 42, 114, 165, 390, 579, 627, 633, 732, 795, 906, 921, 975.
Elkies' algorithm works by finding rational points near the Fermat curve X 3 + Y 3 = 1 using lattice basis reduction; it is well suited to finding solutions for many values of k simultaneously. In this paper we describe a different approach that is more efficient when k is fixed. It has the advantage of provably finding all solutions with a bound on the smallest coordinate, rather than the largest as in Elkies' algorithm. This always yields a nontrivial expansion of the search range since, apart from finitely many exceptions that can be accounted for separately, one has max{|x|, |y|, |z|} > 3 √ 2 min{|x|, |y|, |z|}.
Moreover, empirically it is often the case that one of the variables is much smaller than the other two, so we expect the gain to be even greater in practice. Our strategy is similar to some earlier approaches (see especially [HBLtR93], [Bre95], [KTS97] and [BPTYJ07]), and is based on the observation that in any solution, k − z 3 = x 3 + y 3 has x + y as a factor. Our main contribution over the earlier investigations is to note that with some time-space tradeoffs, the running time is very nearly linear in the height bound, and it is quite practical when implemented on modern 64-bit computers.
This work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol, http://www.bris.ac.uk/acrc/. The author was partially supported by EPSRC Grant EP/K034383/1. No data were created in the course of this study.

1
If k − z 3 = 0 then y = −x, and every value of x yields a solution. Otherwise, setting d = |x + y| = |x| + y sgn x, we see that d divides |k − z 3 |, and Thus, given a candidate value for z, there is an effective procedure to find all corresponding values of x and y, by running through all divisors of |k − z 3 |. Already this basic algorithm finds all solutions with min{|x|, |y|, |z|} ≤ B in time O(B 1+ε ), assuming standard heuristics for the time complexity of integer factorization. In the next section we explain how to avoid factoring and achieve the same ends more efficiently.
Acknowledgements. I thank Roger Heath-Brown for helpful comments and suggestions.

Methodology
For ease of presentation, we will assume that k ≡ ±3 (mod 9); note that this holds for all k in (2). Since the basic algorithm described above is reasonable for finding small solutions, we will assume henceforth that |z| > √ k. Also, if we specialize (1) to solutions with y = z, then we get the Thue equation x 3 + 2y 3 = k, which is efficiently solvable. Using the Thue solver in PARI/GP [The18], we verify that no such solutions exist for the k in (2). Hence we may further assume that y = z. Since Likewise, since x 3 +z 3 = k−y 3 and |y| ≥ |z|, we have sgn y = − sgn x = sgn z. Multiplying both sides of (1) by − sgn z, we thus obtain Since 3α(α + 2) > 1, this is incompatible with our assumptions that y = z and |z| > √ k. Thus we must have 0 < d < α|z|.
Next, reducing (4) modulo 3 and recalling our assumption that k ≡ ±3 (mod 9), we see that d = |x| − |y| ≡ |z| (mod 3). Let ǫ ∈ {±1} be so that k ≡ 3ǫ (mod 9). Then, since every cube is congruent to 0 or ±1 (mod 9), we must have x ≡ y ≡ z ≡ ǫ (mod 3), so that sgn z = ǫ |z| 3 = ǫ d 3 . In view of (3), we get a solution to (1) if and only if d | z 3 − k and 3d(4|z In summary, to find all solutions to (1) with |x| ≥ |y| ≥ |z| > √ k, y = z and |z| ≤ B, it suffices to solve the following system for each d ∈ Z ∩ (0, αB) coprime to 3: Our approach to solving this is straightforward: we work through the values of d recursively by their prime factorizations, and apply the Chinese remainder theorem to reduce the solution of z 3 ≡ k (mod d) to the case of prime power modulus, to which standard algorithms apply. Let r d (k) = #{z (mod d) : z 3 ≡ k (mod d)} denote the number of cube roots of k modulo d. By standard analytic estimates, since k is not a cube, we have Thus, we can work out all z satisfying the first line of (5), as a union of arithmetic progressions, in linear time. To detect solutions to the final line, it is crucial to have a quick method of determining whether ∆ := 3d 4ǫ d 3 (z 3 − k) − d 3 is a square. We first note that for fixed d this condition reduces to finding an integral point on an elliptic curve; specifically, writing X = 12d|z| and Y = (6d) 2 |x − y|, from (3) we see that (X, Y ) lies on the Mordell curve Thus, for fixed d there are at most finitely many solutions, and they can be effectively bounded. For some small values of d it is practical to find all the integral points on Next we note that some congruence and divisibility constraints come for free: Lemma. Let z be a solution to (5), let p be a prime number, and set s = ord p d, t = ord p (z 3 − k). Then: (i) z ≡ 4 3 k(2 − d 2 ) + 9(k + d) (mod 18); (ii) if p ≡ 2 (mod 3) then t ≤ 3s; (iii) if t ≤ 3s then s ≡ t (mod 2); (iv) if ord p k ∈ {1, 2} then s ∈ {0, ord p k}.
If 3s < t then p −4s ∆ ≡ −3u 4 (mod 4p), but this is impossible when p ≡ 2 (mod 3), since −3 is not a square modulo 4p. Hence we must have t ≤ 3s in that case. Next suppose that t ≤ 3s. We consider the following cases, which cover all possibilities: • If p = 3 then s = t = 0, so s ≡ t (mod 2).
Finally, suppose that p | k and p 3 ∤ k. If s = 0 then there is nothing to prove, so assume otherwise. Since d | z 3 − k, we must have p | z, whence 0 < s ≤ t = ord p (z 3 − k) = ord p k < 3s.
By part (iii) it follows that s ≡ ord p k (mod 2), and thus s = ord p k.
Thus, once the residue class of z (mod d) is fixed, its residue modulo lcm(d, 18) is determined. Note also that conditions (ii) and (iii) are efficient to test for p = 2.
However, even with these optimizations there are ≫ B log B pairs d, z satisfying the first line of (5) and conclusions (i) and (iv) of the lemma. To achieve better than O(B log B) running time therefore requires eliminating some values of z from the start. We accomplish this with a standard time-space tradeoff. To be precise, set P = 3(log log B)(log log log B), and let M = 5≤p≤P p be the product of primes in the interval [5, P ]. By the prime number theorem, we have log M = (1 + o(1))P . If ∆ is a square, then for any prime p | M we have where c ≡ ǫ d 3 k + d 3 4 (mod M). When lcm(d, 18) ≤ αB/M, we first compute this function for every residue class |z| (mod M), and select only those residues for which (7) holds for every p | M. By Hasse's bound, the number of permissible residues is at most For the z that are not eliminated in this way, we follow a similar strategy with a few other auxiliary moduli M ′ composed of larger primes, in order to accelerate the square testing. We precompute tables of cubes modulo M ′ and Legendre symbols modulo p | M ′ , so that testing (7) is reduced to table lookups. Only when all of these tests pass do we compute ∆ in multi-precision arithmetic [Gt16] and apply a general square test, and this happens for a vanishingly small proportion of candidate values. In fact we expect the number of Legendre tests to be bounded on average, so in total, finding all solutions with |z| ≤ B should require no more than O k B(log log B)(log log log B) table lookups and arithmetic operations on integers in [0, B].
Thus, when B fits within the machine word size, we expect the running time to be nearly linear, and this is what we observe in practice for B < 2 64 .

Implementation
We implemented the above algorithm in C, with a few inline assembly routines for Montgomery arithmetic [Mon85] written by Ben Buhrow [Buh19], and Kim Walisch's primesieve library [Wal19] for enumerating prime numbers.
The algorithm is naturally split between values of d with a prime factor exceeding √ αB and those that are √ αB-smooth. The former set of d consumes more than twothirds of the running time, but is more easily parallelized. We ran this part on the massively parallel cluster Bluecrystal Phase 3 at the Advanced Computing Research Centre, University of Bristol. For the smooth d we used a separate small cluster of 32and 64-core nodes.