In this section, we describe and improve upon two variants of the celebrated “baby-step giant-step” algorithm [28] for computing DLs. These algorithm variants have been specially adapted for cases in which the exponent is known to have low Hamming weight. The most basic form of each algorithm is described and analyzed in a paper by Stinson [29], who credits the first to Heiman [8] and Odlyzko [24] and the second to Coppersmith (by way of unpublished correspondence with Vanstone [4]). In both cases, we obtain modest-yet-notable performance improvements—both concrete and asymptotic—over the more basic forms of the algorithms; indeed, our improvements to the second algorithm yield a worst-case computation complexity superior to that of any known algorithm for the low-Hamming-weight DL problem. In Sect. 4, we propose and analyze a simple transformation that generalizes each low-Hamming-weight DL algorithm in this paper to a corresponding low-radix-b-weight DL algorithm, where the radix \(b>1\) can be arbitrary.
3.1 The Basic Algorithm
Algorithm 3.1 gives pseudocode for the most basic form of the algorithm, which is due to Heiman and Odlyzko [8, 24].
Theorem 3
Algorithm 3.1 is correct: If there is an m-bit integer x with Hamming weight t such that \(g^{x\!}=h\), then the algorithm returns a DL of h to the base g.
Proof
(sketch). This follows directly from Lemmas 1 and 2. Specifically, Lemma 1 ensures that any value returned on Line 12 of Algorithm 3.1 satisfies \(g^{x\!}=h\), while Lemma 2 ensures that the baby-step loop (Lines 8–14) will indeed find the requisite pair \((Y_1,Y_2)\) if such a pair exists. \(\square \)
Remark
When the order q is unknown, one can set m to be any upper bound on \(\lceil \lg {q}\rceil \), and then omit the modular reduction on Line 12 of Algorithm 3.1. Indeed, one may even set \(m>\lceil \lg {q}\rceil \) when q is known if, for example, the canonical representation of the desired DL has large Hamming weight but is known to be congruent (modulo q) to an m-bit integer with low Hamming weight.
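To make the preceding description concrete, the following minimal Python sketch (our own illustration with toy parameters, not a transcription of Algorithm 3.1) carries out the same giant-step/baby-step strategy in an order-q subgroup of \(\mathbb {Z}_p^*\):

from itertools import combinations
from math import ceil, floor

def val(Y):
    # Integer whose one bits sit exactly at the (1-indexed) positions in Y.
    return sum(1 << (i - 1) for i in Y)

def low_weight_dlog_basic(g, h, p, q, m, t):
    # Giant steps: tabulate g^val(Y1) for every size-ceil(t/2) subset Y1 of [m].
    table = {pow(g, val(Y1), p): Y1
             for Y1 in combinations(range(1, m + 1), ceil(t / 2))}
    # Baby steps: look for a collision with h * g^(-val(Y2)) over the
    # size-floor(t/2) subsets Y2 of [m]; any collision yields a valid DL mod q.
    g_inv = pow(g, -1, p)
    for Y2 in combinations(range(1, m + 1), floor(t / 2)):
        y2 = (h * pow(g_inv, val(Y2), p)) % p
        if y2 in table:
            return (val(table[y2]) + val(Y2)) % q
    return None

# Toy parameters: q = 101 divides p - 1 = 606, and g = 122 = 3^((p-1)/q) mod p
# generates the order-q subgroup of Z_p^*.
p, q, g = 607, 101, 122
x = 0b1000101                       # a weight-3, 7-bit exponent
h = pow(g, x, p)
assert low_weight_dlog_basic(g, h, p, q, m=7, t=3) == x % q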
The next theorem follows easily by inspection of Algorithm 3.1.
Theorem 4
The storage cost and (both average- and worst-case) computation cost of Algorithm 3.1, counted respectively in group elements and group exponentiations, each scale as \(\binom{m}{\lceil t/2\rceil }\).
Remark
Each exponentiation counted in Algorithm 3.1 is to a power with Hamming weight at most \(\lceil t/2\rceil \). By pre-computing \(g^{\mathrm{val}(\{i\})}\) for \(i\in [m]\), one can evaluate these exponentiations using at most \(\lceil t/2\rceil \) group operations apiece. The (both average- and worst-case) computation complexity becomes \(O\bigl (t\binom{m}{\lceil t/2\rceil }\bigr )\) group operations. Going a step further, one can pre-compute \(g^{\mathrm{val}(\{i\})-\mathrm{val}(\{j\})}\) for each \(i\ne j\), and then iterate through the size-\(\lceil t/2\rceil \) and size-\(\lfloor t/2\rfloor \) subsets of \([m]\) following a “minimal-change ordering” [19, Sect. 2.3.3] wherein each successive pair of subsets differs by exactly two elements [30]. Then all but the first iteration of the baby-step (respectively, giant-step) loop uses a single group operation to “update” the \(y_2\) (respectively, \(y_1\)) from the previous iteration. The worst-case computation cost becomes \(\binom{m}{\lceil t/2\rceil }+\binom{m}{\lfloor t/2\rfloor }\) group operations (plus one inversion and \(m^2\) group operations for pre-computation).
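To illustrate the idea (under our own naming, and using a simple recursive construction of a revolving-door ordering rather than the specific ordering of [30]), the following Python sketch tabulates every giant-step value \(g^{\mathrm{val}(Y)}\) over the size-k subsets of \([m]\) using one group multiplication per subset after the first:

def revolving_door(n, k):
    # Size-k subsets of {1, ..., n} in a minimal-change ("revolving door") order:
    # consecutive subsets differ by removing one element and adding another.
    if k == 0:
        return [frozenset()]
    if k == n:
        return [frozenset(range(1, n + 1))]
    keep_n_out = revolving_door(n - 1, k)
    bring_n_in = [s | {n} for s in reversed(revolving_door(n - 1, k - 1))]
    return keep_n_out + bring_n_in

def giant_steps_minimal_change(g, p, m, k):
    # Tabulate g^val(Y) for every size-k subset Y of [m] = {1, ..., m}, spending
    # one group multiplication per subset after the first (plus pre-computation).
    def val(Y):
        return sum(1 << (e - 1) for e in Y)
    step = {(i, j): pow(g, (1 << (i - 1)) - (1 << (j - 1)), p)   # g^(val({i}) - val({j}))
            for i in range(1, m + 1) for j in range(1, m + 1) if i != j}
    order = revolving_door(m, k)
    y = pow(g, val(order[0]), p)                 # the only "full" exponentiation
    table = {y: order[0]}
    for prev, cur in zip(order, order[1:]):
        (j,) = prev - cur                        # element removed
        (i,) = cur - prev                        # element added
        y = (y * step[(i, j)]) % p               # single group operation per update
        table[y] = cur
    return table

# Toy check in the order-101 subgroup of Z_607^* generated by g = 122:
table = giant_steps_minimal_change(122, 607, m=7, k=2)
assert len(table) == 21                          # all C(7,2) subsets tabulated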
3.2 Improved Complexity via Interleaving
Next, we propose and analyze an alternative way to implement the basic algorithm (i.e., Algorithm 3.1), which interleaves the baby-step and giant-step calculations in a manner reminiscent of Pollard’s interleaved variant of the classic baby-step giant-step algorithm [27, Sect. 3]. Although such interleaving is a well-known technique for achieving constant-factor average-case speedups in baby-step giant-step algorithms, it had not previously been applied in the context of low-Hamming-weight DLs. Our analysis reveals that interleaving can, in fact, yield a surprisingly large (super-constant) speedup in this context.
The interleaved variant comprises a single loop and two lookup tables, \(H_1\) and \(H_2\). The loop iterates simultaneously over the subsets \(Y_1\subseteq [m]\) with \(|Y_1|=\lceil t/2\rceil \) and \(Y_2\subseteq [m]\) with \(|Y_2|=\lfloor t/2\rfloor \) in respectively increasing and decreasing order. (To keep the following analysis simple, we assume the order is lexicographic; however, we note that one can obtain a factor t speedup by utilizing some pre-computation and a minimal-change ordering, exactly as we suggested in the above remarks following the non-interleaved algorithm.) In each iteration, the algorithm computes both \(y_1=g^{\mathrm{val}(Y_1)}\) and \(y_2=h\cdot g^{-\mathrm{val}(Y_2)}\), storing \((y_1,Y_1)\) in \(H_1\) and \((y_2,Y_2)\) in \(H_2\), and also checking if \(y_1\) collides with a key in \(H_2\) or \(y_2\) with a key in \(H_1\). Upon discovering a collision, it computes and outputs \(x\equiv \log _gh\bmod {q}\) using Lemma 1 (cf. Line 12 of Algorithm 3.1) and then halts. A pseudocode description of our interleaved algorithm is included in our extended technical report [11, Sect. B.1].
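The following rough Python sketch (ours, not the pseudocode of [11, Sect. B.1]) illustrates the interleaving: a single loop advances the giant-step and baby-step enumerations in lockstep and checks both lookup tables in every iteration.

from itertools import combinations, zip_longest
from math import ceil, floor

def val(Y):
    return sum(1 << (i - 1) for i in Y)

def low_weight_dlog_interleaved(g, h, p, q, m, t):
    H1, H2 = {}, {}                                     # giant-step and baby-step tables
    giants = combinations(range(1, m + 1), ceil(t / 2))                  # increasing order
    babies = reversed(list(combinations(range(1, m + 1), floor(t / 2))))  # decreasing order
    g_inv = pow(g, -1, p)
    for Y1, Y2 in zip_longest(giants, babies):
        if Y1 is not None:
            y1 = pow(g, val(Y1), p)                     # giant step: g^val(Y1)
            H1[y1] = Y1
            if y1 in H2:                                # collision => DL found
                return (val(Y1) + val(H2[y1])) % q
        if Y2 is not None:
            y2 = (h * pow(g_inv, val(Y2), p)) % p       # baby step: h * g^(-val(Y2))
            H2[y2] = Y2
            if y2 in H1:
                return (val(H1[y2]) + val(Y2)) % q
    return None

# Same toy subgroup as in the previous sketch; here the collision already
# occurs in the second loop iteration.
p, q, g, x = 607, 101, 122, 0b1000101
assert low_weight_dlog_interleaved(g, pow(g, x, p), p, q, 7, 3) == x % q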
Despite its simplicity, this modification appears to be novel and has a surprisingly large impact on the average-case complexity. Indeed, if we assume that the interleaved loop iterates through the size-\(\lceil t/2\rceil \) and size-\(\lfloor t/2\rfloor \) subsets of \([m]\) in increasing and decreasing lexicographic order (for the giant-step and baby-step calculations, respectively), then the worst possible costs arise when the t one bits in the binary representation of x occur consecutively in either the t highest-order or the t lowest-order bit positions (i.e., when \(x=1^t0^{m-t}\) or \(x=0^{m-t}1^t\)). In this case, the algorithm produces a collision and halts after about \(\binom{m-\lfloor t/2\rfloor }{\lceil t/2\rceil }\) iterations of the loop. For \(t\in O(\sqrt{m})\), this gives a worst-case constant factor speedup compared to the non-interleaved algorithm; for \(t\in \omega (\sqrt{m})\), the worst-case speedup is asymptotic (alas, we are unable to derive a precise characterization of the speedup in terms of m and t). The average-case speedup can be much more dramatic, depending on the distribution of the targeted \(x\equiv \log _gh\bmod {q}\). For a uniform distribution (among the set of all m-bit exponents with Hamming weight t) on x, we heuristically expect the one bits in x to be distributed evenly throughout its binary representation; that is, we expect to find the \(\lceil t/2\rceil \)th lowest-order and \(\lfloor t/2\rfloor \)th highest-order one bits in x in or around bit positions \(m/2\) and \(m/2+1\), respectively. Therefore, we expect the interleaved algorithm to produce a collision and halt after at most around \(\binom{m/2}{\lceil t/2\rceil }\) loop iterations. (Contrast this with the original average-case \(\binom{m}{\lceil t/2\rceil }\) complexity of the non-interleaved algorithm.) We summarize our above analysis in Theorem 5.
Theorem 5
The worst-case storage and computation costs of the interleaved algorithm described above, counted respectively in group elements and group operations, each scale as \(\binom{m-\lfloor t/2\rfloor }{\lceil t/2\rceil }\). If x is uniform among m-bit exponents with Hamming weight t, then the average-case storage and computation complexities scale as \(\binom{m/2}{\lceil t/2\rceil }\).
3.3 The Coppersmith Algorithms
Algorithm 3.1 and our interleaved variant are “direct” algorithmic instantiations of Lemmas 1 and 2 with a fixed radix \(b=2\). Such direct instantiations perform poorly in the worst case because Lemma 2 guarantees only existence—but not uniqueness—of the subsets \(Y_1\) and \(Y_2\) and, as a result, the collections of subsets over which these direct instantiations ultimately iterate are only guaranteed to be sufficient—but not necessary—to compute the desired logarithm. Indeed, given \(Y\subseteq [m]\) with \(|Y|=t\) such that \(\log _gh\equiv \mathrm{val}(Y)\bmod {q}\), there exist \(\binom{t}{\lceil t/2\rceil }\) distinct ways to partition Y into \(Y_1\subseteq Y\) with \(|Y_1|=\lceil t/2\rceil \) and \(Y_2=Y\setminus {Y_1}\) to satisfy the congruence \(\log _gh\equiv \bigl (\mathrm{val}(Y_1)+\mathrm{val}(Y_2)\bigr )\bmod {q}\) arising in Lemma 2. Stirling's approximation implies that \(\binom{t}{\lceil t/2\rceil }\) approaches \(2^t\big /\sqrt{\pi t/2}\) as t grows large, so that the number of “redundant” values these basic algorithms may end up computing (and storing) grows exponentially with t. We now describe a more efficient variant of this algorithm, originally proposed by Coppersmith [4], that improves on the complexity of the basic algorithms by taking special care to iterate over significantly fewer redundant subsets. (Actually, Coppersmith proposed two related algorithms—one deterministic and the other randomized; however, due to space constraints, we discuss only the deterministic algorithm in this section, relegating our discussion of the randomized algorithm to our extended technical report [11, Sect. D].)
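As a quick numerical illustration of this redundancy (our own example): for \(t=20\), Stirling's approximation gives
\[\binom{20}{10}=184{,}756\approx \frac{2^{20}}{\sqrt{\pi \cdot 20/2}}=\frac{1{,}048{,}576}{\sqrt{10\pi }}\approx 187{,}079,\]
so already at this modest weight there are nearly \(2\times 10^{5}\) equally acceptable ways to split the target exponent's one bits between \(Y_1\) and \(Y_2\).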
Coppersmith’s Deterministic Algorithm. The first variant of Algorithm 3.1 proposed by Coppersmith is based on the following observation.
Observation 6
(Coppersmith and Seroussi [5]). Let t and m be even positive integers with \(t\le m\) and, for each \(i\in [m/2]\), define \(B_i=\{i,i+1,\ldots ,i+m/2-1\}\) and \(\bar{B}_i=[m]\setminus B_i\). For any \(Y\subseteq [m]\) with \(|Y|=t\), there exists some \(i\in [m/2]\) and (disjoint) subsets \(Y_1\subseteq B_i\) with \(|Y_1|=t/2\) and \(Y_2\subseteq \bar{B}_i\) with \(|Y_2|=t/2\) such that \(Y=Y_1\cup Y_2\).
A proof of Observation 6 is included in our extended technical report [11, Sect. A.4]. The following analog of Lemma 2 is an immediate corollary to Observation 6.
Corollary 7
Let t and m be even positive integers with \(t\le m\) and, for each \(i\in [m/2]\), define \(B_i=\{i,i+1,\ldots ,i+m/2-1\}\) and \(\bar{B}_i=[m]\setminus B_i\). If there is an \(x\equiv \log _gh\bmod {q}\) with Hamming weight t and bit length at most m, then there exists some \(i\in [m/2]\) and (disjoint) subsets \(Y_1\subseteq B_i\) with \(|Y_1|=t/2\) and \(Y_2\subseteq \bar{B}_i\) with \(|Y_2|=t/2\) such that \(g^{\mathrm{val}(Y_1)}=h\cdot g^{-\mathrm{val}(Y_2)}\).
Using Corollary 7 to improve on the worst-case complexity of the basic algorithm is straightforward. The giant-step and baby-step loops (i.e., Lines 3–6 and 8–14) from Algorithm 3.1 are respectively modified to iterate over only the subsets \(Y_1\subseteq B_i\) with \(|Y_1|=t/2\) and \(Y_2\subseteq \bar{B}_i\) with \(|Y_2|=t/2\) for each \(i\in [m/2]\) in turn. In particular, the algorithm populates a lookup table H in the giant-step loop using only the \((y_1,Y_1)\) pairs with \(Y_1\subseteq B_i\), and then it searches for a collision within H in the baby-step loop using only the \((y_2,Y_2)\) pairs with \(Y_2\subseteq \bar{B}_i\); if the baby-step loop for \(i=1\) generates no collisions, then the algorithm clears the lookup table and repeats the process for \(i=2\), and so on up to \(i=m/2\). Observation 6 guarantees that the algorithm finds a collision and halts at some point prior to completing the baby-step loop for \(i=m/2\), provided a DL with the specified Hamming weight and bit length exists. Pseudocode for the above-described algorithm is included in our extended technical report [11, Sect. B.2].
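The following compact Python sketch (ours; the actual pseudocode is in [11, Sect. B.2]) implements this outer loop over the windows \(B_i\), taking \(B_i=\{i,\ldots ,i+m/2-1\}\) as in Observation 6:

from itertools import combinations

def val(Y):
    return sum(1 << (j - 1) for j in Y)

def low_weight_dlog_coppersmith(g, h, p, q, m, t):
    # m and t are assumed even; the target exponent has bit length <= m, weight t.
    g_inv = pow(g, -1, p)
    positions = set(range(1, m + 1))
    for i in range(1, m // 2 + 1):
        B = set(range(i, i + m // 2))                    # window B_i of m/2 consecutive positions
        H = {pow(g, val(Y1), p): Y1                      # giant steps over B_i
             for Y1 in combinations(sorted(B), t // 2)}
        for Y2 in combinations(sorted(positions - B), t // 2):  # baby steps over the complement
            y2 = (h * pow(g_inv, val(Y2), p)) % p
            if y2 in H:
                return (val(H[y2]) + val(Y2)) % q
    return None                                          # no DL of weight t and length <= m

# Toy check in the order-101 subgroup of Z_607^* (g = 122), weight-4, 8-bit exponent:
p, q, g, x = 607, 101, 122, 0b10010110
assert low_weight_dlog_coppersmith(g, pow(g, x, p), p, q, 8, 4) == x % q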
The next theorem follows easily from Corollary 7 and an inspection of the algorithm.
Theorem 8
Coppersmith’s deterministic algorithm is correct; moreover, its storage cost scales as \(\binom{m/2}{t/2}\) group elements and its (worst-case) computation cost as \(m\binom{m/2}{t/2}\) group exponentiations.
Remark
The average-case complexity requires a delicate analysis, owing to the fact that there may be several indices i for which the one bits of x split evenly across \(B_i\) and \(\bar{B}_i\), and the algorithm will always halt upon encountering the first such index. Interested readers can find a detailed analysis of the average-case complexity in Stinson's paper [29, Sect. 3]. Stinson's paper also proposes a generalization of Coppersmith's deterministic algorithm utilizing a family of combinatorial set systems called splitting systems [29, Sect. 2.1] (of which the Coppersmith–Seroussi set system defined in Observation 6 and Corollary 7 is an example). A discussion of splitting systems and Stinson's improvements to the above algorithm is included in our extended technical report [11, Sect. C].
3.4 Improved Complexity via Pascal’s Lemma
A methodical analysis of the Coppersmith–Seroussi set system suggests an optimization to Coppersmith's deterministic algorithm that yields an asymptotically lower computation complexity than that indicated by Theorem 8. Indeed, the resulting optimized algorithm has a worst-case computation complexity of just \(O\bigl (t\binom{m/2}{t/2}\bigr )\) group operations, which is asymptotically lower than that of any low-Hamming-weight DL algorithm in the literature. Moreover, the hidden constant in the optimized algorithm (i.e., essentially 1) seems to be about as low as one could realistically hope for. Our improvements follow from Observation 9, an immediate consequence of Pascal's Lemma for binomial coefficients, which states that \(\binom{n}{k}=\binom{n-1}{k-1}+\binom{n-1}{k}\).
Observation 9
Let \(\{(B_i,\bar{B}_i)\}_{i\in [m/2]}\) be the Coppersmith–Seroussi set system, as defined in Observation 6 and Corollary 7. For each \(i\in \{2,\ldots ,m/2\}\), we have that \(\bigl |\{Y\subseteq B_i : |Y|=t/2\}\cap \{Y\subseteq B_{i-1} : |Y|=t/2\}\bigr |=\binom{m/2-1}{t/2}\).
A simple corollary to Observation 9 is that the baby-step and giant-step loops for \(i\ge 2\) in a naïve implementation of Coppersmith's deterministic algorithm each recompute \(\binom{m/2-1}{t/2}\) values that were also computed in the immediately preceding invocation, or, equivalently, that these loops each produce just \(\binom{m/2-1}{t/2-1}\) new values. Carefully avoiding these redundant computations can therefore reduce the per-iteration computation cost of all but the first iteration of the outer loop to \(2\binom{m/2-1}{t/2-1}\) group operations. The first (i.e., \(i=1\)) iteration of the outer loop must, of course, still produce \(2\binom{m/2}{t/2}\) values; thus, in the worst case, the algorithm must produce \(2\binom{m/2}{t/2}+(m-2)\binom{m/2-1}{t/2-1}\) distinct group elements. Note that in order to avoid all redundant computations in subsequent iterations, it is necessary to provide both the giant-step and baby-step loops with access to the \((y_1,Y_1)\) and \((y_2,Y_2)\) pairs, respectively, that arose in the immediately preceding invocation. Coppersmith's deterministic algorithm already stores each \((y_1,Y_1)\) pair arising in the giant-step loop, but it does not store the \((y_2,Y_2)\) pairs arising in the baby-step loop; hence, fully exploiting Observation 9 doubles the storage cost of the algorithm (in a similar vein to interleaving the loops). The upshot of this increased storage cost is a notable asymptotic improvement to the worst-case computation cost, which we characterize in Lemma 10 and Corollary 11. A proof of Lemma 10 is located in Appendix A.1.
Lemma 10
Let \(\{(B_i,\bar{B}_i)\}_{i\in [m/2]}\) be the Coppersmith–Seroussi set system, as defined in Observation 6 and Corollary 7. We have
\[\Bigl |\,\bigcup \nolimits _{i\in [m/2]}\bigl \{Y\subseteq B_i : |Y|=t/2\bigr \}\Bigr |=\binom{m/2}{t/2}+\Bigl (\frac{m}{2}-1\Bigr )\binom{m/2-1}{t/2-1}\le \Bigl (\frac{t}{2}+1\Bigr )\binom{m/2}{t/2}.\]
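As a sanity check on this counting (using the consecutive-window form of the \(B_i\) from Observation 6), the following brute-force Python snippet confirms that each window beyond the first contributes exactly \(\binom{m/2-1}{t/2-1}\) previously unseen subsets:

from itertools import combinations
from math import comb

m, t = 16, 6
windows = [set(range(i, i + m // 2)) for i in range(1, m // 2 + 1)]   # B_1, ..., B_{m/2}
seen, new_counts = set(), []
for B in windows:
    subsets = {frozenset(c) for c in combinations(sorted(B), t // 2)}
    new_counts.append(len(subsets - seen))
    seen |= subsets
assert new_counts[0] == comb(m // 2, t // 2)             # first window: every subset is new
assert all(n == comb(m // 2 - 1, t // 2 - 1) for n in new_counts[1:])
assert len(seen) == comb(m // 2, t // 2) + (m // 2 - 1) * comb(m // 2 - 1, t // 2 - 1)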
To realize the speedup promised by Lemma 10, the optimized algorithm must do some additional bookkeeping; specifically, in each iteration \(i\ge 2\), it must have an efficient way to determine which of the \(Y_1\subseteq B_i\) and \(Y_2\subseteq \bar{B}_i\)—as well as the associated \(y_1=g^{\mathrm{val}(Y_1)}\) and \(y_2=h\cdot g^{-\mathrm{val}(Y_2)}\)—arose in the \((i-1)\)th iteration, and which of them will arise for the first time in the ith iteration. To this end, the algorithm keeps two sequences of hash tables, say \(H_1,\ldots ,H_{m}\) and \(I_1,\ldots ,I_m\), one for the giant-step pairs and another for the baby-step pairs. Into which hash table a given \((Y_1,y_1)\) pair gets stored is determined by the smallest integer in \(Y_1\): a \((Y_1,y_1)\) pair that arose in the \((i-1)\)th iteration of the outer loop will also arise in the ith iteration if and only if the smallest element in \(Y_1\) is not \(i-1\); thus, all values from the \((i-1)\)th iteration not in the hash table \(H_{i-1}\) can be reused in the next iteration. Moreover, each \((Y_1,y_1)\) pair that will arise for the first time in the ith iteration has a corresponding \((Y_1',y_1')\) pair that is guaranteed to reside in \(H_{i-1}\) at the end of the \((i-1)\)th iteration. Indeed, one can efficiently “update” each such \((Y_1',y_1')\) in \(H_{i-1}\) to a required \((Y_1,y_1)\) pair by setting \(Y_1=\bigl (Y_1'\setminus \{i-1\}\bigr )\cup \{i+m/2-1\}\) and \(y_1=y_1'\cdot g^{\mathrm{val}(\{i+m/2-1\})-\mathrm{val}(\{i-1\})}\). Note that because \(Y_1\) no longer contains \(i-1\), the hash table in which the updated \((Y_1,y_1)\) pair should be stored changes from \(H_{i-1}\) to \(H_j\) for some \(j\ge i\). An analogous method is used for keeping track of and “updating” the \((Y_2,y_2)\) pairs arising in the baby-step loop. Pseudocode for the above-described algorithm is included as Algorithm B.1 in Appendix B. The following corollary is an immediate consequence of Lemma 10.
Corollary 11
Algorithm B.1 is correct; moreover, its storage cost scales as \(\binom{m/2}{t/2}\) group elements and its worst-case computation cost as \(t\binom{m/2}{t/2}\) group exponentiations.
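Returning to the bookkeeping described before Corollary 11, the following simplified Python sketch (ours, not Algorithm B.1 itself) illustrates the giant-step side: pairs are bucketed by the smallest element of \(Y_1\), and advancing the window from \(B_{i-1}\) to \(B_i\) rewrites only the bucket keyed by \(i-1\), at one group multiplication per rewritten pair; the baby-step side is symmetric.

from itertools import combinations

def val(Y):
    return sum(1 << (j - 1) for j in Y)

def giant_step_windows(g, p, m, t):
    # Yield, for i = 1, ..., m/2, a table {g^val(Y1): Y1} covering every size-t/2
    # subset Y1 of B_i = {i, ..., i + m/2 - 1}; m and t are assumed even.
    half = m // 2
    buckets = {}                                          # keyed by min(Y1)
    for Y1 in combinations(range(1, half + 1), t // 2):   # window B_1, built directly
        buckets.setdefault(min(Y1), []).append((frozenset(Y1), pow(g, val(Y1), p)))
    yield {y1: Y1 for pairs in buckets.values() for (Y1, y1) in pairs}
    for i in range(2, half + 1):
        old_elem, new_elem = i - 1, i + half - 1
        step = pow(g, (1 << (new_elem - 1)) - (1 << (old_elem - 1)), p)
        for (Y1, y1) in buckets.pop(old_elem, []):        # only this bucket must change
            Y1 = (Y1 - {old_elem}) | {new_elem}
            y1 = (y1 * step) % p                          # one multiplication per updated pair
            buckets.setdefault(min(Y1), []).append((Y1, y1))
        yield {y1: Y1 for pairs in buckets.values() for (Y1, y1) in pairs}

# Sanity check: the i-th yielded table covers exactly the size-t/2 subsets of B_i.
m, t, p, g = 8, 4, 607, 122
for i, tbl in enumerate(giant_step_windows(g, p, m, t), start=1):
    expected = {frozenset(c) for c in combinations(range(i, i + m // 2), t // 2)}
    assert set(tbl.values()) == expected
    assert all(pow(g, val(Y1), p) == y1 for y1, Y1 in tbl.items())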
Note that the worst-case complexity obtained in Corollary 11 improves on a naïve implementation of Coppersmith's algorithm by a factor \(\tfrac{m}{t}\) (and it improves on the previously best-known upper bound, due to Stinson [29, Theorem 4.1], by a factor \(\sqrt{t}\lg {m}\)). As with the basic algorithm, one can leverage pre-computation and a minimal-change ordering to replace all but two of the exponentiations counted by Corollary 11 with a single group operation each; hence, the worst-case computation complexity is in fact just about \((t+2)\binom{m/2}{t/2}\) group operations.