Batch NFS
Abstract
This paper shows, assuming standard heuristics regarding the number-field sieve, that a “batch NFS” circuit of area \(L^{1.181\ldots +o(1)}\) factors \(L^{0.5+o(1)}\) separate \(B\)-bit RSA keys in time \(L^{1.022\ldots +o(1)}\). Here \(L=\exp ((\log 2^B)^{1/3}(\log \log 2^B)^{2/3})\). The circuit’s area-time product (price-performance ratio) is just \(L^{1.704\ldots +o(1)}\) per key. For comparison, the best area-time product known for a single key is \(L^{1.976\ldots +o(1)}\).
This paper also introduces new “early-abort” heuristics implying that “early-abort ECM” improves the performance of batch NFS by a superpolynomial factor, specifically \(\exp ((c+o(1))(\log 2^B)^{1/6}(\log \log 2^B)^{5/6})\) where \(c\) is a positive constant.
Keywords
Integer factorization · Number-field sieve · Price-performance ratio · Batching · Smooth numbers · Elliptic curves · Early aborts
1 Introduction
The cryptographic community reached consensus a decade ago that a 1024-bit RSA key can be broken in a year by an attack machine costing significantly less than \(10^9\) dollars. See [51], [38], [24], and [23]. The attack machine is an optimized version of the number-field sieve (NFS), a factorization algorithm that has been intensively studied for twenty years, starting in [36]. The runtime analysis of NFS relies on various heuristics, but these heuristics have been confirmed in a broad range of factorization experiments using several independent NFS software implementations: see, e.g., [29], [30], [31], and [4].

signed by that domain’s 1024-bit “zone-signing key”, which in turn is

signed by that domain’s 2048-bit “key-signing key”, which in turn is

signed by .org’s 1024-bit zone-signing key, which in turn is

signed by .org’s 2048-bit key-signing key, which in turn is

signed by the DNS root’s 1024-bit zone-signing key, which in turn is

signed by the DNS root’s 2048-bit key-signing key.
A report [41] last year indicated that, out of the 112 top-level domains using DNSSEC, 106 used the same key sizes as .org. We performed our own survey of zone-signing keys in September 2014, after many new top-level domains were added. We found 286 domains using 1024-bit keys; 4 domains using 1152-bit keys; 192 domains using 1280-bit keys; and just 22 domains using larger keys. Almost all of the 1280-bit keys are for obscure domains such as .boutique and .rocks; high-volume domains practically always use 1024-bit keys.

A typical RSA key is believed to be worth less than the cost of the attack machine.

Building the attack machine means building a huge farm of application-specific integrated circuits (ASICs). Standard computer clusters costing the same amount of money are believed to take much longer to perform the same calculations.

It is believed that switching RSA signature keys after (e.g.) a month will render the attack machine useless, since the attack machine requires a full year to run.
DNSSEC signing keys should be large enough to avoid all known cryptographic attacks during the effectivity period of the key. To date, despite huge efforts, no one has broken a regular 1024-bit key; in fact, the best completed attack is estimated to be the equivalent of a 700-bit key. An attacker breaking a 1024-bit signing key would need to expend phenomenal amounts of networked computing power in a way that would not be detected in order to break a single key. Because of this, it is estimated that most zones can safely use 1024-bit keys for at least the next ten years.
This quote illustrates the first and third beliefs reported above: the attack cost would be “phenomenal” and would break only “a single key”; furthermore, the attack would have to be completed “during the effectivity period of the key”. A typical DNSSEC key is valid for just one month and is then replaced by a new key.

“Many”: The attacker is faced not with a single target, but with many targets. The algorithmic task here is not merely to break, e.g., a single 1024-bit RSA key; it is to break more than one hundred 1024-bit RSA keys for DNSSEC top-level domains, many more 1024-bit RSA keys at lower levels of DNSSEC, millions of 1024-bit RSA keys in SSL (as in [25] and [35]; note that upgrading SSL to 2048-bit RSA does nothing to protect the confidentiality of previously recorded SSL traffic), etc. This is important if there are ways to share attack work across the keys.

“Price-performance ratio”: As in [53], [18], [50], [15], [54], [7], [51], [56], [23], [24], etc., our main interest is not in the number of “operations” carried out by an algorithm, but in the actual price and performance of a machine carrying out those operations. Parallelism increases price but often improves performance; large storage arrays are a problem for both price and performance. We use price-performance ratio as our primary cost metric, but we also report time separately, since signature-key rotation puts a limit upon time.

“Asymptotic”: The cost improvements that we present are superpolynomial in the size of the numbers being factored. We thus systematically suppress all polynomial factors in our cost analyses, simplifying the analyses.
For comparison (see Table 1.4), the best area-time product known for factoring a single key (without quantum computers) is \(L^{1.976\ldots +o(1)}\), even if nonuniform precomputations such as Coppersmith’s “factorization factory” are allowed. The literature is reviewed below.
This paper also looks more closely at the \(L^{o(1)}\). The main bottleneck in batch NFS is not traditional sieving, but rather low-memory factorization, motivating new attention to the complexity of low-memory factorization. Traditional ECM, the elliptic-curve method of recognizing \(y\)-smooth integers, works in low memory and takes time \(\exp (\sqrt{(2+o(1)){\log y\log \log y}})\). One can reasonably guess that, compared to traditional ECM, “early-abort ECM” saves a subexponential factor here, but the complexity of early-abort ECM has never been analyzed. Section 3 of this paper introduces new early-abort heuristics implying that the complexity of early-abort ECM is \(\exp \left( \sqrt{\left( {8\over 9}+o(1)\right) {\log y\log \log y}}\right) \). Using early aborts somewhat increases the number of auxiliary integers that need to be factored, producing a further increase in cost, but this cost is outweighed by the faster factorization.
The ECM cost is obviously bounded by \(L^{o(1)}\): more precisely, the cost is \(\exp (\varTheta ((\log N)^{1/6}(\log \log N)^{5/6}))\) in the context of batch NFS, since \(y\in L^{\varTheta (1)}\). This cost is invisible at the level of detail of \(L^{1.704\ldots +o(1)}\). The speedup from ECM to earlyabort ECM is nevertheless superpolynomial and directly translates into the same speedup in batch NFS.

Users comparing the value of an RSA key to the cost of an attack machine need to know the per-key cost of batch NFS. This has not been seriously studied. What the literature has actually studied in detail is the cost of NFS attacking one key at a time; this is not the same question. Our asymptotic results do not rule out the possibility that these costs are the same for 1024-bit RSA, but there is also no reason to be confident about any such possibility.

Most of the literature on single-key NFS relies heavily on operations that—for large key sizes—are not handled efficiently by current CPUs and that become much more efficient on ASICs: consider, for example, the routing circuit in [51]. Batch NFS relies much more heavily on massively parallel elliptic-curve scalar multiplication, exactly the operation that is shown in [12], [11], and [17] to fit very well into off-the-shelf graphics cards. The literature supports the view that off-the-shelf hardware is much less cost-effective than ASICs for single-key NFS, but there is no reason to think that the same is true for batch NFS.

The natural machine size for batch NFS (i.e., the circuit area if price-performance ratio is optimized) is larger than the natural machine size for single-key NFS, but the natural time is considerably smaller. As above, these asymptotic results undermine any confidence that one can obtain from comparing the natural time for single-key NFS to the rotation interval for signature keys: there is no reason to think that the latency of batch NFS will be as large as the latency of single-key NFS. Note that, even though this paper emphasizes optimal price-performance ratio for simplicity, there are also techniques to further reduce the time below the natural time, hitting much lower latency targets without severely compromising price-performance ratio: in particular, for the core sorting subroutines inside linear algebra, one can replace time \(T\) with \(T/f\) at the expense of replacing area \(A\) with \(Af^2\).
There are many other examples of cryptosystems where the attack cost does not grow linearly with the number of targets. For example, it is well known that exhaustive search finds preimages for \(U\) hash outputs in about the same time as a preimage for a single hash output; furthermore, the first preimage that it finds appears after only \(1/U\) of the total time, reducing actual security by \(\lg U\) bits. However, most cryptosystems have moved up to at least a “\(128\)-bit” security level, giving them a buffer against losing some bits of security. RSA is an exception: its poor performance at high security levels has kept it at a bleeding-edge “80-bit security” level. Even when users can be convinced to move away from 1024-bit keys, they normally move to \({\le }2048\)-bit keys. We question whether it is appropriate to view 1024-bit keys as “80-bit” security and 2048-bit keys as “112-bit” security if the attacker’s costs per key are not so high.
1.3 Previous work. In the NFS literature, as in the algorithm literature in general, there is a split between traditional analyses of “operations” (adding two \(64\)-bit integers is one “operation”; looking up an element of a \(2^{64}\)-byte array is one “operation”) and modern analyses of more realistic models of computation. We follow the terminology of our paper [14]: the “RAM metric” counts traditional operations, while the “\(AT\) metric” multiplies the area of a circuit by the time taken by the same circuit.
Buhler, H. Lenstra, and Pomerance showed in [19] (assuming standard NFS heuristics, which we now stop mentioning) that NFS factors a single key \(N\) with RAM cost \(L^{1.922\ldots +o(1)}\). As above, \(L\) means \(\exp ((\log N)^{1/3}(\log \log N)^{2/3})\). This exponent \(1.922\ldots \) is the most frequently quoted cost exponent for NFS.
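To give a feel for the scale of \(L\), the following sketch (ours, not the paper’s; all \(o(1)\) terms are simply dropped, so the numbers are rough illustrations rather than cost predictions) evaluates \(L\) and the single-key RAM exponent \(1.922\ldots \) at \(B=1024\):

```python
import math

def L_of(nbits):
    # L = exp((log N)^(1/3) * (log log N)^(2/3)) with N = 2^nbits;
    # natural logarithms throughout, o(1) terms ignored.
    logN = nbits * math.log(2)
    return math.exp(logN ** (1 / 3) * math.log(logN) ** (2 / 3))

L1024 = L_of(1024)                       # roughly 4 * 10^13
cost = L1024 ** ((64 / 9) ** (1 / 3))    # exponent 1.922..., as in [19]
bits = math.log2(cost)                   # roughly 2^87 "operations"
```

Because the suppressed \(o(1)\) factors shift such estimates by large factors in practice, these numbers should not be read as predictions of concrete attack costs.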
Table 1.4. Asymptotic exponents for several variants of NFS, assuming standard heuristics. “Exponent” \(e\) means asymptotic cost \(L^{e+o(1)}\) per key factored. “Precomp” \(2\theta \) means that there is a precomputation involving integer pairs \((a,b)\) up to \(L^{\theta +o(1)}\), for total precomputation cost \(L^{2\theta +o(1)}\); algorithms without precomputation have \(2\theta =0\). “Batch” \(\beta \) means batch size \(L^{\beta +o(1)}\); algorithms handling each key separately have \(\beta =0\). See Sect. 2 for further details.
metric  exponent  precomp  batch  source 

\(AT\)  \(1.976\ldots \)  \(0\)  \(0\)  2001 Bernstein [7] 
RAM (unrealistic)  \(1.922\ldots \)  \(0\)  \(0\)  1993 Buhler–H. Lenstra–Pomerance [19] 
RAM (unrealistic)  \(1.901\ldots \)  \(0\)  \(0\)  1993 Coppersmith [20] 
\(AT\)  \(1.900\ldots \)  \(0\)  \(0.1\)  batch NFS; this paper 
\(AT\)  \(1.829\ldots \)  \(0\)  \(0.2\)  batch NFS; this paper 
\(AT\)  \(1.763\ldots \)  \(0\)  \(0.3\)  batch NFS; this paper 
\(AT\)  \(1.710\ldots \)  \(0\)  \(0.4\)  batch NFS; this paper 
\(AT\)  \(1.704\ldots \)  \(0\)  \(0.5\)  batch NFS; this paper 
RAM (unrealistic)  \(1.638\ldots \)  \(2.006\ldots \)  \(0\)  1993 Coppersmith [20] 

The bottleneck in [19] is sieving, while the bottleneck in [20] is ECM. Both of these algorithms use \(L^{o(1)}\) operations in the RAM metric, but the \(o(1)\) is considerably smaller for sieving than for ECM.

Even if the \(o(1)\) in [20] were as small as the \(o(1)\) in [19], there would not be much benefit in \(1.901\ldots +o(1)\) compared to \(1.922\ldots +o(1)\). For example, \((2^{50})^{1.922}\approx 2^{96}\) while \((2^{50})^{1.901}\approx 2^{95}\).

The change from \(1.901\ldots +o(1)\) to \(1.638\ldots +o(1)\) is much larger, but it comes at the cost of massive memory consumption. Specifically, [20] requires space \(L^{1.638\ldots +o(1)}\), while [19] uses space just \(L^{0.961\ldots +o(1)}\). This is not visible in the RAM metric but is obviously a huge problem in reality, and it becomes increasingly severe as computations grow larger. As a concrete illustration of the realworld costs of storage and computation, paying for \(2^{70}\) bytes of slow storage (about \(30\cdot 10^9\) USD in hard drives) is much more troublesome than paying for \(2^{80}\) floatingpoint multiplications (about \(0.02\cdot 10^9\) USD in GPUs plus \(0.005\cdot 10^9\) USD for a year of electricity).
At the time there was already more than a decade of literature showing how to analyze algorithm asymptotics in more realistic models of computation that account for memory consumption, communication, etc.; see, e.g., [18]. Bernstein in [7] analyzed the circuit performance of NFS, concluding that an optimized circuit of area \(L^{0.790\ldots +o(1)}\) would factor \(N\) in time \(L^{1.185\ldots +o(1)}\), for price-performance ratio \(L^{1.976\ldots +o(1)}\). [7] did not analyze the factorization factory but did analyze multiple number fields, concluding that they did not reduce \(AT\) cost. The gap between the RAM exponent \(1.901\ldots +o(1)\) from [20] and the \(AT\) exponent \(1.976\ldots +o(1)\) from [7] is explained primarily by communication overhead inside linear algebra, somewhat moderated by parameter choices that reduce the cost of linear algebra at the expense of relation collection.
We pointed out in [14] that the factorization factory does not reduce \(AT\) cost. In Sect. 2 we review the reason for this and explain how batch NFS works around it. We also presented in [14] a superpolynomial improvement to the factorization factory in the RAM metric, by eliminating ECM in favor of batch trial division, but this is not useful in the \(AT\) metric.
2 Exponents
This section reviews NFS and then explains how to drastically reduce the \(AT\) cost of NFS through batching. The resulting cost exponent, \(1.704\ldots \) in Table 1.4, is new. All costs in this section are expressed as \(L^{e+o(1)}\) for various exponents \(e\). Section 3 looks more closely at the \(L^{o(1)}\) factor.
2.1 QS: the Quadratic sieve (1982). As a warmup for NFS we briefly review the general idea of combining congruences, using QS as an example.
QS writes down a large collection of congruences modulo the target integer \(N\) and tries to find a nontrivial subcollection whose product is a congruence of squares. One can then reasonably hope that the difference of square roots has a nontrivial factor in common with \(N\).
Specifically, QS computes \(s\approx \sqrt{N}\) and writes down the congruences \(s^2\equiv s^2-N\), \((s+1)^2\equiv (s+1)^2-N\), etc. The left side of each congruence is already a square. The main problem is to find a nontrivial set of integers \(a\) such that the product of \((s+a)^2-N\) is a square.
If \((s+a)^2-N\) is divisible by a very large prime then it is highly unlikely to participate in a square: the prime would have to appear a second time. QS therefore focuses on smooth congruences: congruences where \((s+a)^2-N\) factors completely into small primes. Applying linear algebra modulo 2 to the matrix of exponents in these factorizations is guaranteed to find nonempty subsets of the congruences with square product once the number of smooth congruences exceeds the number of small primes.
The integers \(a\) such that \((s+a)^2-N\) is divisible by a prime \(p\) form a small number of arithmetic progressions modulo \(p\). “Sieving” means jumping through these arithmetic progressions to mark divisibility, the same way that the sieve of Eratosthenes jumps through arithmetic progressions to mark nonprimality.
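The combining-of-congruences idea, including the linear algebra modulo 2, can be illustrated by a toy QS for very small \(N\). This is our own sketch, not the paper’s: it trial-divides instead of sieving, and the helper name `toy_qs` is hypothetical.

```python
from math import isqrt, gcd

def toy_qs(N, fb_bound=30, search=200):
    # Collect a-values with (s+a)^2 - N smooth over a tiny factor base,
    # then find a subset with square product by Gaussian elimination mod 2.
    primes = [p for p in range(2, fb_bound)
              if all(p % q for q in range(2, p))]
    s = isqrt(N) + 1
    rows, avals = [], []
    for a in range(search):
        v, exps = (s + a) ** 2 - N, [0] * len(primes)
        for i, p in enumerate(primes):
            while v % p == 0:
                v //= p
                exps[i] += 1
        if v == 1:                       # smooth congruence found
            rows.append(exps)
            avals.append(a)
    # Bit-packed elimination mod 2, tracking which congruences were combined.
    sets = [{i} for i in range(len(rows))]
    vecs = [sum((e & 1) << i for i, e in enumerate(r)) for r in rows]
    pivots = {}                          # lowest set bit -> pivot row index
    for i in range(len(vecs)):
        v = vecs[i]
        while v:
            b = v & -v
            if b not in pivots:
                pivots[b] = i
                vecs[i] = v
                break
            j = pivots[b]
            v ^= vecs[j]
            sets[i] ^= sets[j]
        else:
            # v == 0: the subset sets[i] has an all-even exponent vector,
            # so the product of its congruences is a congruence of squares.
            x, tot = 1, [0] * len(primes)
            for k in sets[i]:
                x = x * (s + avals[k]) % N
                for t, e in enumerate(rows[k]):
                    tot[t] += e
            y = 1
            for t, p in enumerate(primes):
                y = y * pow(p, tot[t] // 2, N) % N
            g = gcd(abs(x - y), N)       # hope for a nontrivial factor
            if 1 < g < N:
                return g
    return None
```

For example, `toy_qs(2041)` combines the congruences for \(a\in \{0,1,3,5\}\), whose smooth values satisfy \(75\cdot 168\cdot 360\cdot 560=50400^2\), and recovers the factor 13 of \(2041=13\cdot 157\).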
2.2 NFS: the number-field sieve (1993). NFS applies the same idea, but instead of congruences modulo \(N\) it uses congruences modulo a related algebraic number \(m-\alpha \). This algebraic number is chosen to have norm \(N\) (divided by a certain denominator shown below), and one can reasonably hope to obtain a factorization of \(N\) by obtaining a random factorization of this algebraic number.
Specifically, NFS chooses a positive integer \(m\), and writes \(N\) as a polynomial in radix \(m\): specifically, \(N=f(m)\) where \(f\) is a degree-\(d\) polynomial with coefficients \(f_d,f_{d-1},\ldots ,f_0\in \mathord {\left\{ {0,1,\ldots ,m-1}\right\} }\). NFS then takes \(\alpha \) as a root of \(f\). The norm of \(a-b\alpha \) is then \(f_da^d+f_{d-1}a^{d-1}b+\cdots +f_0b^d\) (divided by \(f_d\)), and in particular the norm of \(m-\alpha \) is \(N\) (again divided by \(f_d\)).
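The radix-\(m\) construction and the homogeneous norm can be written down directly; the following brief sketch is our illustration (the helper names are ours), not the paper’s code:

```python
def iroot(n, k):
    # largest integer r with r**k <= n
    r = int(round(n ** (1.0 / k)))
    while r ** k > n:
        r -= 1
    while (r + 1) ** k <= n:
        r += 1
    return r

def base_m_poly(N, d):
    # Write N in radix m, N = f(m), with m chosen just large enough that
    # f has degree d and coefficients in {0, 1, ..., m-1}.
    m = iroot(N, d + 1) + 1          # smallest m with m^(d+1) > N
    f, n = [], N
    for _ in range(d + 1):
        f.append(n % m)
        n //= m
    assert n == 0                    # N fits in d+1 base-m digits
    return f, m                      # f[i] is the coefficient of x^i

def norm(f, a, b):
    # b^d * f(a/b) = f_d*a^d + f_{d-1}*a^{d-1}*b + ... + f_0*b^d;
    # in particular norm(f, m, 1) recovers f(m) = N.
    d = len(f) - 1
    return sum(f[i] * a ** i * b ** (d - i) for i in range(d + 1))

# hypothetical example: a 30-digit N with a degree-5 polynomial
f, m = base_m_poly(314159265358979323846264338327, 5)
```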
The range of pairs \((a,b)\) searched for smooth congruences is the set of integer pairs in the rectangle \([-H,H]\times [1,H]\). Here \(H\) is chosen so that there will be enough smooth congruences to produce squares at the end of the algorithm. Standard heuristics state that \(a-bm\) has smoothness probability \(L^{-\mu /(3\gamma )+o(1)}\) if \(a\) and \(b\) are on much smaller scales than \(m\); in particular, if \(H\in L^{\theta +o(1)}\) for some positive real number \(\theta \) then the number of congruences with \(a-bm\) smooth is \(L^{\phi +o(1)}\) with \(\phi =2\theta -\mu /(3\gamma )\). Standard heuristics also state the simultaneous smoothness probability of \(a-bm\) and \(a-b\alpha \), implying that to obtain enough smooth congruences one can take \(H\in L^{\theta +o(1)}\) with \(\theta =(3\mu \gamma ^2+2\mu ^2)/(6\mu \gamma -1)\) and \(\phi =(18\mu \gamma ^3+6\mu ^2\gamma +\mu )/(18\mu \gamma ^2-3\gamma )\). See, e.g., [19]. We henceforth assume these formulas for \(\theta \) and \(\phi \) in terms of \(\mu \) and \(\gamma \).
2.3 RAM cost analysis (1993). Sieving for \(y\)-smoothness of \(H^{2+o(1)}\) polynomial values uses \(H^{2+o(1)}\) operations, provided that \(y\) is bounded by \(H^{2+o(1)}\). The point here is that the pairs \((a,b)\) with congruences divisible by \(p\) form a small number of shifted lattices of determinant \(p\), usually with basis vectors of length \(O(\sqrt{p})\), making it easy to find all the lattice points inside the rectangle \([-H,H]\times [1,H]\). The number of operations is thus essentially the number of points marked, and each point is marked just \(\sum _{p\le y} 1/p\approx \log \log y\) times.
Sparse techniques for linear algebra involve \(y^{1+o(1)}\) matrixvector multiplications, each involving \(y^{1+o(1)}\) operations, for a total of \(y^{2+o(1)}\) operations. Other subroutines in NFS take negligible time, so the overall RAM cost of NFS is \(L^{\mathrm{max }\mathopen {}\mathord {\left\{ {2\theta ,2\gamma }\right\} }+o(1)}\).
It is not difficult to see that the exponent \(\mathrm{max }\mathopen {}\mathord {\left\{ {2\theta ,2\gamma }\right\} }\) achieves its minimum value \((64/9)^{1/3}=1.922\ldots \) with \(\mu =(1/3)^{1/3}=0.693\ldots \) and \(\theta =\gamma =(8/9)^{1/3}=0.961\ldots \). This exponent \(1.922\ldots \) is the NFS exponent from [19], and as mentioned earlier is the most frequently quoted NFS exponent. We do not review the multiplenumberfields improvement to \(1.901\ldots \) from [20]; as far as we know, multiple number fields do not improve any of the exponents analyzed below.
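This minimization is easy to check numerically. The following brief grid search (our sanity check, not from the paper) uses the formulas for \(\theta \) and \(\phi \) in terms of \(\mu \) and \(\gamma \) given earlier and recovers the exponent \((64/9)^{1/3}\) and the optimal parameters:

```python
def theta_phi(mu, g):
    # theta = (3*mu*g^2 + 2*mu^2) / (6*mu*g - 1), phi = 2*theta - mu/(3*g)
    th = (3 * mu * g ** 2 + 2 * mu ** 2) / (6 * mu * g - 1)
    return th, 2 * th - mu / (3 * g)

# Minimize the RAM exponent max{2*theta, 2*gamma} over a (mu, gamma) grid;
# the grid bounds and step are our arbitrary choices.
best = min(
    (max(2 * theta_phi(mu, g)[0], 2 * g), mu, g)
    for mu in [i / 1000 for i in range(500, 1000)]
    for g in [i / 1000 for i in range(700, 1100)]
)
# best[0] should be close to (64/9)**(1/3) = 1.9229...,
# attained near mu = (1/3)**(1/3) = 0.693 and gamma = (8/9)**(1/3) = 0.961.
```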
2.4 \(\boldsymbol{AT}\) cost analysis (2001). In the \(AT\) metric there is an important obstacle to cost \(H^{2+o(1)}\) for sieving: namely, communicating across area \(H^{2+o(1)}\) takes time at least \(H^{1+o(1)}\). One can efficiently split the sieving problem into \(H^{2+o(1)}/y^{1+o(1)}\) tasks, running one task after another on a smaller array of size \(y^{1+o(1)}\), but communicating across this array still takes time at least \(y^{0.5+o(1)}\), so \(AT\) is at least \(H^{2+o(1)}y^{0.5+o(1)}\).
Fortunately, there is a much more efficient alternative to sieving: ECM, explained in Appendix A of the full version of this paper online. What matters in this section is that ECM tests \(y\)smoothness in time \(y^{o(1)}\) on a circuit of area \(y^{o(1)}\). A parallel array of ECM units, each handling a separate number, tests \(y\)smoothness of \(H^{2+o(1)}\) polynomial values in time \(H^{2+o(1)}/y^{1+o(1)}\) on a circuit of area \(y^{1+o(1)}\), achieving \(AT=H^{2+o(1)}\).
Unfortunately, the same obstacle shows up again for linear algebra, and this time there is no efficient alternative. Multiplying a sparse matrix by a vector requires time \(y^{0.5+o(1)}\) on a circuit of area \(y^{1+o(1)}\), and must be repeated \(y^{1+o(1)}\) times. The overall \(AT\) cost of NFS is \(L^{\mathrm{max }\mathopen {}\mathord {\left\{ {2\theta ,2.5\gamma }\right\} }+o(1)}\).
The exponent \(\mathrm{max }\mathopen {}\mathord {\left\{ {2\theta ,2.5\gamma }\right\} }\) achieves its minimum value \(1.976\ldots \) with \(\mu =0.702\ldots \), \(\gamma =0.790\ldots \), and \(\theta =0.988\ldots \). This exponent \(1.976\ldots \) is the NFS exponent from [7]. Notice that \(\gamma \) is much smaller here than it was in the RAM optimization: \(y\) has been reduced to keep the cost of linear algebra under control, but this also forced \(\theta \) to increase.
2.5 The factorization factory (1993). Coppersmith in [20] precomputes “tables which will be useful for factoring any integers in a large range ... after the precomputation, an individual integer can be factored in time \(L[1/3,1.639]\)”, i.e., \(L^{{\approx }1.639+o(1)}\).
Coppersmith’s table is simply the set of \((a,b)\) such that \(a-bm\) is smooth. One reuses \(m\), and thus this table, for any integer \(N\) between (e.g.) \(m^d\) and \(m^{d+1}\).
Coppersmith’s method to factor “an individual integer” is to test smoothness of \(a-b\alpha \) for each \((a,b)\) in the table. At this point Coppersmith has found the same smooth congruences as conventional NFS, and continues with linear algebra in the usual way.
Coppersmith uses ECM to test smoothness. The problem with sieving here is not efficiency, as in the (subsequent) paper [7], but functionality: sieving can handle polynomial values only at regularly spaced inputs, and the pairs \((a,b)\) in this table are not regularly spaced.
Recall that the size of this table is \(L^{\phi +o(1)}\) with \(\phi =2\theta -\mu /(3\gamma )\). ECM uses \(L^{o(1)}\) operations per number, for a total smoothness cost of \(L^{\phi +o(1)}\), asymptotically a clear improvement over the \(L^{2\theta +o(1)}\) for conventional NFS.
The overall RAM cost of the factorization factory is \(L^{\mathrm{max }\mathopen {}\mathord {\left\{ {\phi ,2\gamma }\right\} }+o(1)}\). The exponent achieves its minimum value \(1.638\ldots \) with \(\mu =0.905\ldots \), \(\gamma =0.819\ldots \), \(\theta =1.003\ldots \), and \(\phi =1.638\ldots \). This is the exponent from [20].
The \(AT\) metric tells a completely different story, as we pointed out in [14]. The area required for the table is \(L^{\phi +o(1)}\). This area is easy to reuse for very fast parallel smoothness detection, finishing in time \(L^{o(1)}\). Unfortunately, collecting the smooth results then takes time \(L^{0.5\phi +o(1)}\), for an \(AT\) cost of at least \(L^{\mathrm{max }\mathopen {}\mathord {\left\{ {1.5\phi ,2.5\gamma }\right\} }+o(1)}\), never mind the problem of matching the table area with the linearalgebra area. The minimum exponent here is above \(2.4\).
2.6 Batch NFS (new). We drastically reduce \(AT\) cost by sharing work across many \(N\)’s in a different way: we process a batch of \(N\)’s in parallel, rather than performing precomputation to be used for one \(N\) at a time. We dynamically enumerate the pairs \((a,b)\) with \(a-bm\) smooth, distribute each pair across all the \(N\)’s in the batch, and remove each pair as soon as possible, rather than storing a complete table of the pairs. To avoid excessive communication costs we completely reorganize data in the middle of the computation: at the beginning each \(N\) is repeated many times to bring \(N\) close to the pairs \((a,b)\), while at the end the pairs \((a,b)\) relevant to each \(N\) are moved much closer together. The rest of this subsection presents the details of the algorithm.
Consider as input a batch of \(L^{\beta +o(1)}\) simultaneous targets \(N\) within the large range described above. We require \(\beta \le \mathrm{min }\mathopen {}\mathord {\left\{ {2\phi -2\gamma ,4\theta -2\phi }\right\} }\); if there are more targets available at once then we actually process those targets in batches of size \(L^{\mathrm{min }\mathopen {}\mathord {\left\{ {2\phi -2\gamma ,4\theta -2\phi }\right\} }+o(1)}\), storing no data between runs.
Consider a square mesh of \(L^{\beta +o(1)}\) small parallel processors. This mesh is large enough to store all of the targets \(N\). Use each processor in parallel to test smoothness of \(a-bm\) for \(L^{2\theta -\phi -0.5\beta +o(1)}\) pairs \((a,b)\) using ECM; by hypothesis \(2\theta -\phi -0.5\beta \ge 0\). The total number of pairs here is \(L^{2\theta -\phi +0.5\beta +o(1)}\). Each smoothness test takes time \(L^{o(1)}\). Overall the mesh takes time \(L^{2\theta -\phi -0.5\beta +o(1)}\) and produces a total of \(L^{0.5\beta +o(1)}\) pairs \((a,b)\) with \(a-bm\) smooth, i.e., only \(L^{o(1)}\) pairs for each column of the mesh. See Fig. 2.7.
Move these pairs to the top row of the mesh (spreading them evenly across that row) by a standard sorting algorithm, say the Schnorr–Shamir algorithm from [50], taking time \(L^{0.5\beta +o(1)}\). Then broadcast each pair to its entire column, taking time \(L^{0.5\beta +o(1)}\). Actually, it will suffice for each pair to appear once somewhere in the first two rows, once somewhere in the next two rows, etc.
Observe that each pair has now visited each position in the mesh. When a pair \((a,b)\) visits a mesh position holding a target \(N\), use ECM to check whether \(a-b\alpha \) is smooth, taking time \(L^{o(1)}\). The total time to check all \(L^{0.5\beta +o(1)}\) pairs against all \(L^{\beta +o(1)}\) targets is just \(L^{0.5\beta +o(1)}\), plus the time \(L^{2\theta -\phi -0.5\beta +o(1)}\) to generate the pairs in the first place. See Fig. 2.8.
Repeat this entire procedure \(L^{\phi -\gamma -0.5\beta +o(1)}\) times; by hypothesis \(\phi -\gamma -0.5\beta \ge 0\). This covers a total of \(L^{2\theta -\gamma +o(1)}\) pairs \((a,b)\), of which \(L^{\phi -\gamma +o(1)}\) have \(a-bm\) smooth, so for each \(N\) there are \(L^{o(1)}\) pairs \((a,b)\) for which \(a-bm\) and \(a-b\alpha \) are both smooth. The total number of smooth congruences found this way across all \(N\) is \(L^{\beta +o(1)}\). Store each smooth congruence as \((N,a,b)\); all of these together fit into a mesh of area \(L^{\beta +o(1)}\). The time spent is \(L^{\mathrm{max }\mathopen {}\mathord {\left\{ {\phi -\gamma ,2\theta -\gamma -\beta }\right\} }+o(1)}\).
We are not done yet: we still need to perform linear algebra for each \(N\). To keep the communication costs of linear algebra under control we pack the linear algebra for each \(N\) into the smallest possible area. Allocate a separate square of area \(L^{\gamma +o(1)}\) to each \(N\), and route each smooth congruence \((N,a,b)\) in parallel to the corresponding square; this is another standard sorting step, taking total time \(L^{0.5\beta +0.5\gamma +o(1)}\) for all \(L^{\beta +\gamma +o(1)}\) smooth congruences. Finally, perform linear algebra separately in each square, and complete the factorization of each \(N\) as usual. This takes time \(L^{1.5\gamma +o(1)}\). See Fig. 2.10.
The overall time exponent is \(\mathrm{max }\mathopen {}\mathord {\left\{ {\phi -\gamma ,2\theta -\gamma -\beta ,0.5\beta +0.5\gamma ,1.5\gamma }\right\} }\), and the area exponent is \(\beta +\gamma \). The final price-performance ratio, \(AT\) per integer factored, has exponent \(\mathrm{max }\mathopen {}\mathord {\left\{ {\phi ,2\theta -\beta ,0.5\beta +1.5\gamma ,2.5\gamma }\right\} }\).
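As a sanity check (ours, not from the paper), one can plug the \((\mu ,\gamma )\) values listed in Table 2.12 into the formulas for \(\theta \) and \(\phi \) given earlier and recover the tabulated per-key \(AT\) exponents:

```python
def at_exponent(mu, g, beta):
    # theta and phi from the standard heuristics, then the per-key AT
    # exponent max{phi, 2*theta - beta, 0.5*beta + 1.5*gamma, 2.5*gamma}.
    th = (3 * mu * g ** 2 + 2 * mu ** 2) / (6 * mu * g - 1)
    ph = 2 * th - mu / (3 * g)
    return max(ph, 2 * th - beta, 0.5 * beta + 1.5 * g, 2.5 * g)

rows = [  # (beta, mu, gamma, expected e) copied from Table 2.12
    (0.0, 0.702860, 0.790420, 1.976052),
    (0.3, 0.718160, 0.705210, 1.763034),
    (0.5, 0.921150, 0.681600, 1.704000),
]
for beta, mu, g, e in rows:
    assert abs(at_exponent(mu, g, beta) - e) < 1e-4
```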
2.11 Comparison and numerical parameter optimization. Bernstein’s \(AT\) exponent from [7] was \(\mathrm{max }\mathopen {}\mathord {\left\{ {2\theta ,2.5\gamma }\right\} }\). Batch NFS replaces \(2\theta \) with \(2\theta -\beta \), allowing \(\gamma \) to be correspondingly reduced, at least until \(\beta \) becomes large enough for \(2.5\gamma \) to cross below \(\phi \). In principle one should also watch for \(2.5\gamma \) to cross below \(0.5\beta +1.5\gamma \), but Table 2.12 shows that \(\phi \) is more important.
Of course, even if we ignore the cost of finding the smooth \(a-bm\) (the term \(2\theta -\beta \)), our \(AT\) exponent is not as small as Coppersmith’s RAM exponent \(\mathrm{max }\mathopen {}\mathord {\left\{ {\phi ,2\gamma }\right\} }\) from [20]. We have an extra \(0.5\beta +1.5\gamma \) term, reflecting the cost of communicating smooth congruences across a batch, and, more importantly, \(2.5\gamma \) instead of \(2\gamma \), reflecting the communication cost of linear algebra.
Table 2.12 shows the smallest exponents that we obtained for various \(\beta \), in each case from a brief search through 2500000000 pairs \((\mu ,\gamma )\). The exponent of the price-performance ratio for batch NFS drops below Bernstein’s \(1.976\ldots \) as soon as \(\beta \) increases past \(0\), and reaches a minimum of \(1.704\ldots \) as the batch size increases. (The minimum is actually very slightly below \(1.704\), but our table does not include enough precision to show this.) Finding all \((a,b)\) with \(a-bm\) smooth is still a slight bottleneck for \(\beta =0.4\) but disappears for \(\beta =0.5\). When there are more inputs we partition them into batches of size \(L^{0.5+o(1)}\), preserving exponent \(1.704\ldots \) for the price-performance ratio.
Table 2.12. Cost exponents for batch NFS in the \(AT\) metric. The batch size is \(L^{\beta +o(1)}\). The \(AT\) cost is \(L^{e+o(1)}\). The parameter \(m\) is chosen as \(\exp ((\mu +o(1))(\log N)^{2/3}(\log \log N)^{1/3})\). The prime bound \(y\) is chosen as \(L^{\gamma +o(1)}\). The \((a,b)\) bound \(H\) is chosen as \(L^{\theta +o(1)}\). The \(AT\) cost of the \(a-b\alpha \) smoothness tests is \(L^{\phi +o(1)}\) per target. The \(AT\) cost of the \(a-bm\) smoothness tests is \(L^{2\theta -\beta +o(1)}\) per target. The \(AT\) cost of routing is \(L^{0.5\beta +1.5\gamma +o(1)}\) per target. The \(AT\) cost of linear algebra is \(L^{2.5\gamma +o(1)}\) per target. All operations take place on a circuit of size \(L^{\beta +\gamma +o(1)}\).
batch  \(AT\)  \(m\) size  primes  \((a,b)\)  \(a-b\alpha \)  \(a-bm\)  route  linear 

\(\beta \)  \(e\)  \(\mu \)  \(\gamma \)  \(\theta \)  \(\phi \)  \(2\theta -\beta \)  \(0.5\beta +1.5\gamma \)  \(2.5\gamma \) 
0.0  1.976052  0.702860  0.790420  0.988026  1.679645  1.976052  1.185630  1.976050 
0.1  1.900575  0.705460  0.760230  1.000287  1.691256  1.900575  1.190345  1.900575 
0.2  1.829615  0.712320  0.731840  1.014808  1.705173  1.829615  1.197760  1.829600 
0.3  1.763034  0.718160  0.705210  1.031517  1.723580  1.763034  1.207815  1.763025 
0.4  1.710375  0.820920  0.684150  1.055172  1.710374  1.710345  1.226225  1.710375 
0.5  1.704000  0.921150  0.681600  1.077242  1.704000  1.654484  1.272400  1.704000 
3 Early-abort ECM
Section 2 used ECM as a low-area smoothness test for auxiliary integers \(c=f_d a^d+\cdots +f_0 b^d\). Each curve in ECM catches a fraction of the primes \(p\le y\) dividing \(c\), and many curves in sequence catch essentially all of the primes \(p\le y\).
This section analyzes a much faster smoothness-detection method, “early-abort ECM”. Not all smooth numbers are detected by early-abort ECM, but new heuristics introduced in this section imply that this loss is much smaller than the speedup factor. The overall improvement grows as a superpolynomial function of \(\log y\), and therefore grows as a superpolynomial function of the NFS input size.
We emphasize again that this paper’s analyses are asymptotic. We do not claim that earlyabort ECM is better than ECM for any particular value of \(y\).
The rest of this section uses the word “time” to count simple arithmetic operations, such as multiplication and division, on integers with \(O(\lg c)\) bits. Each of these operations actually takes time \((\lg c)^{1+o(1)}\), but this extra factor is absorbed into other \(o(1)\) terms when \(c\) is bounded by the usual functions of \(y\).
3.1 Early-abort trial division. Early aborts predate ECM. They became popular in the 1970s as a component of CFRAC [43], a subexponential-time factorization method that, like batch NFS, generates many “random” numbers that need to be tested for smoothness.
The simplest form of early aborts is single-early-abort trial division. Trial division simply checks divisibility of \(c\) by each prime \(p\le y\), taking time \(y^{1+o(1)}\). Single-early-abort trial division first checks divisibility of \(c\) by each prime \(p\le y^{1/2}\); then throws \(c\) away (this is the early abort) if the unfactored part of \(c\) is too large; and then, if \(c\) has survived the early abort, checks divisibility of \(c\) by each prime \(p\le y\).
The definition of “too large” is chosen so that \(1/y^{1/2+o(1)}\) of all inputs survive the abort, balancing the cost of the stages before and after the abort. In other words, single-early-abort trial division checks divisibility of each input by each prime \(p\le \sqrt{y}\); keeps the smallest \(1/y^{1/2+o(1)}\) of all inputs; and, for each of those inputs, checks divisibility by each prime \(p\le y\).
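As a concrete sketch (ours, purely illustrative), single-early-abort trial division for one input can be written as follows in Python; the function names are our own, and the explicit `threshold` parameter stands in for the “too large” cutoff, which in the batch setting is calibrated so that \(1/y^{1/2+o(1)}\) of all inputs survive:

```python
from math import isqrt

def primes_up_to(n):
    """All primes p <= n (trial division; fine for the small bounds here)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, isqrt(p) + 1))]

def strip_primes(c, primes):
    """Divide every prime in `primes` out of c, with multiplicity."""
    for p in primes:
        while c % p == 0:
            c //= p
    return c

def single_early_abort_smooth(c, y, threshold):
    """Single-early-abort trial division for one input c.  Returns True if
    c survives the abort and is found to be y-smooth."""
    half = [p for p in primes_up_to(y) if p * p <= y]   # primes <= y^(1/2)
    c = strip_primes(c, half)
    if c > threshold:                                   # the early abort
        return False
    rest = [p for p in primes_up_to(y) if p * p > y]    # primes in (y^(1/2), y]
    return strip_primes(c, rest) == 1
```

For example, with \(y=100\), the input \(2\cdot 3\cdot 97\) survives a generous abort threshold and is recognized as smooth, while \(2\cdot 3\cdot 101\) is discarded.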
More generally, \((k-1)\)-early-abort trial division removes each prime \(p\le y^{1/k}\) from each input (by dividing by factors found); reduces the number of inputs by a factor of \(y^{1/k}\), keeping the smallest inputs; removes each prime \(p\le y^{2/k}\) from each remaining input; reduces the number of inputs by another factor of \(y^{1/k}\), keeping the smallest inputs; and so on through \(y^{k/k}=y\).
The time per input for \((k-1)\)-early-abort trial division is only \(y^{1/k+o(1)}\), saving a factor \(y^{1-1/k+o(1)}\), if \(k\) is limited to a slowly growing function of \(y\). The method does not detect all smooth numbers, but Pomerance’s analysis in [45, Sect. 4] shows that the loss factor is only \(y^{(1-1/k)/2+o(1)}\), i.e., that the method detects \(1\) out of every \(y^{(1-1/k)/2+o(1)}\) smooth numbers. The overall improvement factor in price-performance ratio is \(y^{(1-1/k)/2+o(1)}\); if \(k\) is chosen so that \(k\rightarrow \infty \) as \(y\rightarrow \infty \) then the improvement factor is \(y^{1/2+o(1)}\).
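The staged structure can be sketched for one input as follows (ours, purely illustrative). The `thresholds` list is our stand-in for the “keep the smallest \(y^{1/k}\) fraction of inputs” rule, which strictly speaking is defined over a whole batch of inputs rather than one at a time:

```python
from math import isqrt

def primes_in(lo, hi):
    """Primes p with lo < p <= hi (trial division; fine for small hi)."""
    return [p for p in range(max(lo + 1, 2), hi + 1)
            if all(p % q for q in range(2, isqrt(p) + 1))]

def multi_early_abort_smooth(c, y, k, thresholds):
    """(k-1)-early-abort trial division for one input c; thresholds has
    length k-1, one abort cutoff after each of the first k-1 stages."""
    lo = 0
    for i in range(1, k + 1):
        hi = round(y ** (i / k))
        for p in primes_in(lo, hi):   # stage i: primes in (y^((i-1)/k), y^(i/k)]
            while c % p == 0:         # divide out factors found
                c //= p
        if i < k and c > thresholds[i - 1]:
            return False              # early abort after stage i
        lo = hi
    return c == 1                     # survived all aborts and is y-smooth
```

With \(k=2\) this reduces to the single-early-abort procedure described above.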
3.2 Early aborts in more generality. One can replace trial division with any method, or combination of methods, of checking for primes \(\le y^{1/k}\), primes \(\le y^{2/k}\), etc.
In particular, Pomerance considered an early-abort version of Pollard’s rho method. The original method takes time \(y^{1/2+o(1)}\) to find all primes \(p\le y\). Early-abort rho takes time only \(y^{1/(2k)+o(1)}\), and Pomerance’s analysis shows that it has a loss factor of only \(y^{(1-1/k)/4+o(1)}\).
Pomerance actually considered a different method by Pollard and Strassen. The Pollard–Strassen method takes essentially the same amount of time as Pollard’s rho method, and has the advantage of a proof of speed without any conjectures, but has the disadvantage of using much more memory.
Pomerance’s paper was published in 1982, so of course it did not analyze the elliptic-curve method. After seeing early aborts improve trial division from \(y\) to \(y^{1/2}\), and improve Pollard’s rho method from \(y^{1/2}\) to \(y^{1/4}\), one might guess that early aborts improve ECM from \(\exp (\sqrt{(2+o(1))\log y\log \log y})\) to \(\exp ((1/2)\sqrt{(2+o(1))\log y\log \log y})\), but our heuristics do not agree with this guess.
3.3 Performance of early aborts. Recall that ECM takes time \(T(y)^{1+o(1)}\) to find primes \(p\le y\), where \(T(y)=\exp (\sqrt{2\log y\log \log y})\). We actually consider, in much more generality, any factorization method \(M\) taking time \(T(y)^{1+o(1)}\) to find primes \(p\le y\), where \(T\) is any sufficiently smooth function.
3.4 Understanding the heuristics. Let \(y\) and \(u\) be real numbers larger than \(1\), define \(x=y^u\), and define \(S_0=\{1,2,\ldots ,\lfloor x\rfloor \}\). Define \(\varPsi (x,y)\) as the number of \(y\)-smooth integers in \(S_0\). Then \(\varPsi (x,y)\) is approximately \(x/u^u\). See [45, Theorem 2.1] for a precise statement. The same approximation is still valid for \(\varPsi (x,y,z)\), the number of \(y\)-smooth integers in \(S_0\) having no prime factor \(\le z\), assuming that \(z<y^{1-1/\log u}\); see [45, Theorem 2.2].
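A small numeric illustration (ours, not from [45]) of the \(x/u^u\) approximation: the snippet below counts \(y\)-smooth integers up to \(x\) directly and prints the count next to \(x/u^u\). Since the approximation carries a \(u^{o(u)}\) error factor, only rough agreement should be expected at this tiny scale:

```python
from math import log

def largest_prime_factor(n):
    """Largest prime factor of n (1 for n = 1), by trial division."""
    p, big = 2, 1
    while p * p <= n:
        while n % p == 0:
            n //= p
            big = p
        p += 1
    return n if n > 1 else big

def Psi(x, y):
    """Psi(x, y): number of y-smooth integers in {1, ..., x}, by enumeration."""
    return sum(1 for n in range(1, x + 1) if largest_prime_factor(n) <= y)

x, y = 10**5, 100
u = log(x) / log(y)                  # here u = 2.5
print(Psi(x, y), round(x / u**u))    # exact count vs. the x/u^u approximation
```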
Consider any vector \((s_1,s_2,\ldots ,s_k)\) such that each \(s_i\) is a \(y_i\)-smooth positive integer \({\le }x_i\) having no prime factors \({\le }y_{i-1}\). For any such \((s_1,s_2,\ldots ,s_k)\), the product \(c=s_1s_2\cdots s_k\) is a positive integer bounded by \(x_1x_2\cdots x_k=x\), so \(c\in S_0\). Dividing \(c\) by its \(y_1\)-smooth part produces \(s_2\cdots s_k\le x/x_1\), so \(c\in S_1\). Similarly \(c\in S_2\) and so on through \(c\in S_k\).
The map from \((s_1,s_2,\ldots ,s_k)\) to \(s_1s_2\cdots s_k\in S_k\) is injective: the \(y_1\)-smooth part of \(s_1s_2\cdots s_k\) is exactly \(s_1\), the \(y_2\)-smooth part is exactly \(s_1s_2\), etc. Hence \(\#S_k\) is at least the number of such vectors \((s_1,s_2,\ldots ,s_k)\), which is exactly \(\varPsi (x_1,y_1,y_0) \varPsi (x_2,y_2,y_1) \varPsi (x_3,y_3,y_2) \cdots \varPsi (x_k,y_k,y_{k-1})\). Pomerance’s early-abort analysis in [45] says, in some cases, that \(\#S_k\) is not much larger than this. We heuristically assume that this is true in more generality.
The approximation \(\varPsi (x_i,y_i,y_{i1})\approx x_i/u_i^{u_i}\), where \(u_i=(\log x_i)/\log y_i\), now implies that \(\#S_k\) is approximately \(x/(u_1^{u_1}\cdots u_k^{u_k})\). More generally, \(\#S_i\) is approximately \(x/(u_1^{u_1}\cdots u_i^{u_i})\).
Write \(T_i\) for the cost of finding the \(y_i\)-smooth part of an integer. The early-abort factorization method, applied to a uniform random element of \(S_0\), always takes time \(T_1\) to find primes \({\le }y_1\); with probability \(\#S_1/\#S_0\approx 1/u_1^{u_1}\) takes additional time \(T_2\) to find primes \({\le }y_2\); with probability \(\#S_2/\#S_0\approx 1/(u_1^{u_1}u_2^{u_2})\) takes additional time \(T_3\) to find primes \({\le }y_3\); and so on. With probability \(\#S_k/\#S_0\approx 1/(u_1^{u_1}\cdots u_k^{u_k})\) an integer survives all aborts and is \(y\)-smooth.
Balancing the time for the early-abort stages, i.e., ensuring that each stage takes time approximately \(T_1\), requires choosing \(x_1\) (depending on \(y_1\)) so that \(u_1^{u_1}\approx T_2/T_1\), choosing \(x_2\) (depending on \(y_2\)) so that \(u_2^{u_2}\approx T_3/T_2\), and so on through choosing \(x_{k-1}\) (depending on \(y_{k-1}\)) so that \(u_{k-1}^{u_{k-1}}\approx T_k/T_{k-1}\). Then \(x_k\) is determined as \(x/(x_1\cdots x_{k-1})\), and \(u_k\) is determined as \((\log x_k)/\log y_k =u-(\log x_1\cdots x_{k-1})/\log y =u-(\theta _1 u_1+\theta _2 u_2+\cdots +\theta _{k-1} u_{k-1})\) where \(\theta _i=(\log y_i)/\log y\).
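The balancing conditions \(u_i^{u_i}\approx T_{i+1}/T_i\) can be solved numerically, since \(u\mapsto u^u\) is increasing for \(u\ge 1\). The following snippet (ours, purely illustrative) does this for a toy two-stage instance with \(y_1=y^{1/2}\) and \(y_2=y\); the particular choice \(y=2^{100}\) is our own, not a parameter from the paper:

```python
from math import exp, log, sqrt

def T(y):
    """T(y) = exp(sqrt(2 log y log log y)), the ECM cost from Sect. 3.3."""
    return exp(sqrt(2 * log(y) * log(log(y))))

def solve_u(target):
    """Solve u^u = target for u >= 1 by bisection (u^u is increasing there)."""
    lo, hi = 1.0, 2.0
    while hi ** hi < target:   # grow the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid ** mid < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy instance: k = 2 stages, y_1 = y^(1/2), y_2 = y.  Balancing picks u_1
# with u_1^(u_1) = T(y_2)/T(y_1).
y = 2.0 ** 100
u1 = solve_u(T(y) / T(sqrt(y)))
print(u1)
```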
We see two obstacles to proving the formula \((T_1T_2\cdots T_k)^{1/k}\) for early-abort ECM. First, the assumption \(u_i\in u^{1+o(1)}\) is correct for exponential-time smoothness tests for standard ranges of \(x\) and \(y\); but \(u_i\in u^{0.5+o(1)}\) for ECM, except for \(i=k\). Second, the error factor \(u^{o(u)}\) in the standard \(u^u\) approximation is larger than the entire ECM running time. Despite these caveats we conjecture that the heuristics apply beyond the case of exponential-time smoothness tests, and in particular apply to early-abort ECM.
Even when smoothness theorems are available, one should not overstate the extent to which they constitute rigorous analyses of NFS. There is no proof that NFS congruences have similar smoothness probability to uniform random integers; this is one of the NFS heuristics. There is no proof that ECM finds all small primes at similar speed; this is another heuristic. As mentioned earlier, Pomerance’s analysis in [45] actually uses the provable Pollard–Strassen smoothness-detection method, and Bernstein’s batch trial-division method [8] is proven to run in polynomial time per input; but both of these methods perform poorly in the \(AT\) metric. Similarly, Pomerance proved in [45] that Dixon’s random squares have similar smoothness probability to uniform random integers; but Dixon’s method is much slower than NFS, and proving something similar about NFS is an open problem.
3.5 Impact of early aborts on smoothness probabilities. Because early-abort ECM does not find all smooth values, it forces batch NFS to consider more pairs \((a,b)\), and therefore slightly larger pairs \((a,b)\). This increase means that the auxiliary integers \(c\) are larger and less likely to be smooth. We conclude by showing that this effect does not eliminate the (heuristic) asymptotic gain produced by early aborts.
Recall that the smoothness probability of \(c\) is heuristically \(1/v^v\), where \(v\) is the ratio of the number of bits in \((f_d+\cdots +f_0)H^d\) to the number of bits in \(y\). The derivative of \(v\) with respect to \(\log H\) is \(d/\log y\), so the derivative of \(\log (v^v)\) with respect to \(\log H\) is \(d(1+\log v)/\log y\in 1/(3\gamma \mu )+o(1)\); here we have used the asymptotics \(d\in (1/\mu +o(1))(\log N)^{1/3}(\log \log N)^{-1/3}\), \(\log y\in (\gamma +o(1))(\log N)^{1/3}(\log \log N)^{2/3}\), and \(\log v\in (1/3+o(1))\log \log N\).
Write \(\delta =(2/3)/(2-1/(3\gamma \mu ))\). Multiplying \(H\) by a factor \(T^{\delta +o(1)}\) means multiplying the number of pairs \((a,b)\) by a factor \(T^{2\delta +o(1)}\) and thus multiplying the number of smoothness tests by a factor \(T^{2\delta +o(1)}\). Meanwhile it multiplies \(v^v\) by a factor \(T^{\delta /(3\gamma \mu )+o(1)}\), and thus multiplies the final number of smooth congruences by a factor \(T^{(2-1/(3\gamma \mu ))\delta +o(1)}=T^{2/3+o(1)}\). Our heuristics state that switching from ECM to early-abort ECM reduces the number of smooth congruences found by a factor \(T^{2/3+o(1)}\), producing just enough smooth congruences for a successful factorization, while decreasing the cost of each smoothness test by a factor \(T^{1+o(1)}\). The overall speedup factor is \(T^{1-2\delta +o(1)}\).
For example, [7] took \(\gamma \approx 0.790420\) and \(\mu \approx 0.702860\), so the speedup factor is \(T^{0.047\ldots +o(1)}\). As another example, batch NFS with \(\beta =0.5\) takes \(\gamma \approx 0.681600\) and \(\mu \approx 0.921150\), so the speedup factor is \(T^{0.092\ldots +o(1)}\).
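The two speedup exponents quoted here follow mechanically from the formulas above; the following snippet (ours) recomputes them from \(\delta =(2/3)/(2-1/(3\gamma \mu ))\) and the exponent \(1-2\delta \):

```python
def speedup_exponent(gamma, mu):
    """Exponent s in the overall early-abort speedup factor T^(s+o(1)),
    following the formulas above: delta = (2/3)/(2 - 1/(3*gamma*mu)) and
    s = 1 - 2*delta."""
    delta = (2 / 3) / (2 - 1 / (3 * gamma * mu))
    return 1 - 2 * delta

print(speedup_exponent(0.790420, 0.702860))  # parameters of [7]: ~0.0476
print(speedup_exponent(0.681600, 0.921150))  # batch NFS, beta = 0.5: ~0.0924
```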
References
 [4]Bai, S., Bouvier, C., Filbois, A., Gaudry, P., Imbert, L., Kruppa, A., Morain, F., Thomé, E., Zimmermann, P.: CADO-NFS—Crible Algébrique: Distribution, Optimisation—Number Field Sieve (2013). http://cado-nfs.gforge.inria.fr/. Citations in this document: §1
 [6]Barker, E., Barker, W., Burr, W., Polk, W., Smid, M.: Recommendation for key management—part 1: general (revision 3) (2012). http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_part1_rev3_general.pdf. Citations in this document: §1.2
 [7]Bernstein, D.J.: Circuits for integer factorization: a proposal (2001). http://cr.yp.to/papers.html#nfscircuit. Citations in this document: §1.1, §1.3, §1.3, §1.3, §1.3, §2.4, §2.5, §2.11, §3.5
 [8]Bernstein, D.J.: How to find small factors of integers (2002). http://cr.yp.to/papers.html#sf. Citations in this document: §3.4
 [10]Bernstein, D.J., Birkner, P., Lange, T., Peters, C.: ECM using Edwards curves. Math. Comput. 82, 1139–1179 (2013). Citations in this document: §A
 [11]Bernstein, D.J., Chen, H.-C., Chen, M.-S., Cheng, C.-M., Hsiao, C.-H., Lange, T., Lin, Z.-C., Yang, B.-Y.: The billion-mulmod-per-second PC. In: Workshop Record of SHARCS’09: Special-Purpose Hardware for Attacking Cryptographic Systems, pp. 131–144 (2009). http://www.hyperelliptic.org/tanja/SHARCS/record2.pdf. Citations in this document: §1.2
 [12]Bernstein, D.J., Chen, T.-R., Cheng, C.-M., Lange, T., Yang, B.-Y.: ECM on graphics cards. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 483–501. Springer, Heidelberg (2009). Citations in this document: §1.2
 [14]Bernstein, D.J., Lange, T.: Non-uniform cracks in the concrete: the power of free precomputation. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013, Part II. LNCS, vol. 8270, pp. 321–340. Springer, Heidelberg (2013). Citations in this document: §1.2, §1.3, §1.3, §2.5
 [15]Bilardi, G., Preparata, F.P.: Horizons of parallel computation. J. Parallel Distrib. Comput. 27, 172–182 (1995). Citations in this document: §1.1
 [17]Bos, J.W., Kleinjung, T.: ECM at work. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 467–484. Springer, Heidelberg (2012). Citations in this document: §1.2
 [18]Brent, R.P., Kung, H.T.: The area-time complexity of binary multiplication. J. ACM 28, 521–534 (1981). Citations in this document: §1.1, §1.3
 [19]Buhler, J.P., Lenstra Jr., H.W., Pomerance, C.: Factoring integers with the number field sieve. See [36], pp. 50–94 (1993). Citations in this document: §1.3, §1.3, §1.3, §1.3, §1.3, §1.3, §2.2, §2.3
 [20]Coppersmith, D.: Modifications to the number field sieve. J. Cryptol. 6, 169–180 (1993). Citations in this document: §1.3, §1.3, §1.3, §1.3, §1.3, §1.3, §1.3, §1.3, §2.3, §2.5, §2.5, §2.11
 [21]Crandall, R., Pomerance, C.: Prime Numbers: A Computational Perspective. Springer, New York (2001). Citations in this document: §3
 [23]Franke, J., Kleinjung, T., Paar, C., Pelzl, J., Priplata, C., Stahlke, C.: SHARK: a realizable special hardware sieving device for factoring 1024-bit integers. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 119–130. Springer, Heidelberg (2005). Citations in this document: §1, §1.1
 [24]Geiselmann, W., Shamir, A., Steinwandt, R., Tromer, E.: Scalable hardware for sparse systems of linear equations, with applications to integer factorization. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 131–146. Springer, Heidelberg (2005). Citations in this document: §1, §1.1
 [25]Heninger, N., Durumeric, Z., Wustrow, E., Halderman, J.A.: Mining your Ps and Qs: detection of widespread weak keys in network devices. In: USENIX Security Symposium (2012). Citations in this document: §1.1
 [29]Kleinjung, T.: On polynomial selection for the general number field sieve. Math. Comput. 75, 2037–2047 (2006). Citations in this document: §1
 [30]Kleinjung, T.: Polynomial selection. Slides presented at the CADO workshop, Nancy, France (2008). http://cado.gforge.inria.fr/workshop/slides/kleinjung.pdf. Citations in this document: §1
 [31]Kleinjung, T., Aoki, K., Franke, J., Lenstra, A.K., Thomé, E., Bos, J.W., Gaudry, P., Kruppa, A., Montgomery, P.L., Osvik, D.A., te Riele, H.J.J., Timofeev, A., Zimmermann, P.: Factorization of a 768-bit RSA modulus. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 333–350. Springer, Heidelberg (2010). Citations in this document: §1
 [32]Kolkman, O.M., Mekking, M., Gieben, M.: RFC 6781: DNSSEC operational practices, version 2 (2012). http://tools.ietf.org/html/rfc6781. Citations in this document: §1
 [35]Lenstra, A.K., Hughes, J.P., Augier, M., Bos, J.W., Kleinjung, T., Wachter, C.: Public keys. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 626–642. Springer, Heidelberg (2012). Citations in this document: §1.1
 [36]Lenstra, A.K., Lenstra Jr., H.W. (eds.): The Development of the Number Field Sieve. Lecture Notes in Mathematics, vol. 1554. Springer, Berlin (1993). Citations in this document: §1. See [19]
 [37]Lenstra, A.K., Lenstra Jr., H.W., Manasse, M.S., Pollard, J.M.: The factorization of the ninth Fermat number. Math. Comput. 61, 319–349 (1993). Citations in this document: §1.3
 [38]Lenstra, A.K., Tromer, E., Shamir, A., Kortsmit, W., Dodson, B., Hughes, J., Leyland, P.: Factoring estimates for a 1024-bit RSA modulus. In: Laih, C.-S. (ed.) ASIACRYPT 2003. LNCS, vol. 2894, pp. 55–74. Springer, Heidelberg (2003). Citations in this document: §1
 [40]Lenstra Jr., H.W., Tijdeman, R. (eds.): Computational Methods in Number Theory I. Mathematical Centre Tracts, vol. 154. Mathematisch Centrum, Amsterdam (1982). See [45]
 [41]Lewis, E.: DNSSEC at TLDs, start of 4Q 2013 (2013). https://elists.isoc.org/pipermail/dnssec-coord/2013-October/000172.html. Citations in this document: §1
 [43]Morrison, M.A., Brillhart, J.: A method of factoring and the factorization of \(F_7\). Math. Comput. 29, 183–205 (1975). Citations in this document: §3.1
 [45]Pomerance, C.: Analysis and comparison of some integer factoring algorithms. In: [40], pp. 89–139 (1982). http://cr.yp.to/bib/1982/pomerance.html. Citations in this document: §3.1, §3.4, §3.4, §3.4, §3.4, §3.4
 [50]Schnorr, C.P., Shamir, A.: An optimal sorting algorithm for mesh-connected computers. In: STOC 1986, pp. 255–261 (1986). Citations in this document: §1.1, §2.6
 [51]Shamir, A., Tromer, E.: Factoring large numbers with the TWIRL device. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 1–26. Springer, Heidelberg (2003). Citations in this document: §1, §1.1, §1.2
 [53]Thompson, C.D., Kung, H.T.: Sorting on a mesh-connected parallel computer. Commun. ACM 20, 263–271 (1977). Citations in this document: §1.1
 [54]van Oorschot, P.C., Wiener, M.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12, 1–28 (1999). Citations in this document: §1.1
 [56]Wiener, M.J.: The full cost of cryptanalytic attacks. J. Cryptol. 17, 105–124 (2004). Citations in this document: §1.1