1 Introduction

Sieving algorithms for the shortest vector problem (SVP) in a lattice have received a great deal of attention recently  [1, 2, 8, 17, 33, 40]. The attention mostly stems from lattice based cryptography, as many attacks on lattice based cryptographic constructions involve finding short lattice vectors  [3, 36, 39].

Lattice based cryptography is thought to be secure against quantum adversaries. None of the known algorithms to solve SVP (to a small approximation factor) do so in subexponential time, but this is not to say that there is no gain to be had given a large quantum computer. Lattice sieve algorithms use near neighbour search (NNS) as a subroutine; near neighbour search algorithms use black box search as a subroutine; and Grover’s quantum search algorithm  [25] gives a square root improvement to the query complexity of black box search. A black box search that is expected to take \(\varTheta (N)\) queries on classical hardware will take \(\varTheta (\sqrt{N})\) queries on quantum hardware using Grover’s algorithm.

Previous work has analysed the effect of quantum search on the query complexity of lattice sieves  [34, 35]. Of course, one must implement the queries efficiently in order to realise the improvement in practice. Recent work has given concrete quantum resource estimates for the black box search problems involved in key recovery attacks on AES  [23, 28] and preimage attacks on SHA-2 and SHA-3  [4]. In this work, we give explicit quantum circuits that implement the black box search subroutines of several quantum lattice sieves. Our quantum circuits are efficient enough to yield a cost improvement in dimensions of cryptanalytic interest. However, for the most performant sieve that we analyse the cost improvement is small and several barriers stand in the way of achieving it.

Outline and Contributions. We start with some preliminaries in Sect. 2. In particular, we discuss the “XOR and Population Count” operation (henceforth popcount), which is our primary optimisation target. The popcount operation is used to identify pairs of vectors that are likely to lie at a small angle to each other. It is typically less expensive than a full inner product computation.

In Sect. 3 we introduce and analyse a filtered quantum search procedure. We present our quantum circuit for popcount in Sect. 4. In Sect. 5 we provide a heuristic analysis of the probability that popcount successfully identifies pairs of vectors that are close to each other. This analysis may be of independent interest; previous work  [2, 17] has relied largely on experimental data for choosing popcount parameters.

In Sect. 6, we rederive the overall cost of the NNS subroutines of three lattice sieves. Our cost analysis exposes the impact of the \(\mathtt {popcount}\) parameters so that we can numerically optimise these in parallel with the sieve parameters. We have chosen to profile the Nguyen–Vidick sieve  [40], the bgj1 specialisation  [2] of the Becker–Gama–Joux sieve  [9], and the Becker–Ducas–Gama–Laarhoven sieve  [8]. We have chosen these three sieves as they are, respectively, the earliest and most conceptually simple, the most performant yet implemented, and the fastest known asymptotically.

Finally, we optimise the cost of classical and quantum search under various cost metrics to produce Fig. 2 of Sect. 7. We conclude by discussing barriers to obtaining the reported quantum advantages in NNS, the relationship between SVP and NNS, and future work. Both the data produced, and the source code used to compute it, are available at https://github.com/jschanck/eprint-2019-1161. We consider our software a contribution in its own right; it is documented, easily extensible and allows for the inclusion of new nearest neighbour search strategies and cost models.

Interpretation. Quantum computation seems to be more difficult than classical computation. As such, there will likely be some minimal dimension, a crossover point, below which classical sieves outperform quantum ones. Our estimates give non-trivial crossover points for the sieves we consider. Yet, our results do not rule out the relevance of quantum sieves to lattice cryptanalysis. The crossover points that we estimate are well below the dimensions commonly thought to achieve \(128\) bits of security against quantum adversaries. However, our initial logical circuit level analysis (Fig. 2, q: depth-width) is optimistic. It ignores the costs of quantum random access memory and quantum error correction.

To illustrate the potential impact of error correction, we apply a cost model developed by Gidney and Ekerå to our quantum circuits. The Gidney–Ekerå model was developed as part of a recent analysis of Shor’s algorithm  [20]. In the Gidney–Ekerå model, the crossover point for the NNS algorithm underlying the Becker–Ducas–Gama–Laarhoven sieve  [8] is dimension . In this dimension, the classical and quantum variants both perform operations and need at least bits of (quantum accessible) random access memory. A large cost improvement is obtained asymptotically, but for cryptanalytically relevant dimensions the improvement is tenuous. Between dimensions and our estimate for the quantum cost grows from approximately \(2^{128}\) to approximately \(2^{256}\). In dimension this is an improvement of a factor of over our estimate for the classical cost. In dimension the improvement is by a factor of .

We caution that a memory constraint would significantly reduce the range of cryptanalytically relevant dimensions. For instance, an adversary with no more than \(2^{128}\) bits of quantum accessible classical memory is limited to dimension and below. In these dimensions we estimate a cost improvement of no more than a factor of at the logical circuit level and no more than in the Gidney–Ekerå metric.

A depth constraint would also reduce the range of cryptanalytically relevant dimensions. The quantum algorithms that we consider would be more severely affected by a depth constraint than their classical counterparts, due to the poor parallelisability of Grover’s algorithm.

2 Preliminaries

2.1 Models of Computation

We describe quantum algorithms as circuits using the Clifford+T gate set, but we augment this gate set with a table lookup operation (qRAM). We describe classical algorithms as programs for RAM machines (random access memory machines).

Clifford+T+qRAM Quantum Circuits. Quantum circuits can be described at the logical layer, wherein an array of n qubits encodes a unit vector in \(\mathbb {C}^{2^n}\), or at the physical layer, wherein the state space may be much larger. Ignoring qubit initialisation and measurement, a circuit is a sequence of unitary operations, one per unit time. Each unitary in the sequence is constructed by parallel composition of gates. At most one gate can be applied to each qubit per time step. The Clifford+T gate set

$$\begin{aligned} \{\mathbf {H}, \mathbf {S}, \mathbf {CNOT}\} \cup \{\mathbf {T}\} \end{aligned}$$

is commonly used to describe circuits at the logical layer due to its relationship with some quantum error correcting codes. This gate set is universal for quantum computation when combined with qubit initialisation (of \(|0\rangle \) and \(|1\rangle \) states) and measurement in the computational basis.

In addition to Clifford+T gates, we allow unit cost table lookups in the form of qRAM (quantum access to classical RAM). The difference between RAM and qRAM is that qRAM can construct arbitrary superpositions of table entries. Suppose that \((R_0,\dots , R_{2^n-1})\) are registers of a classical RAM and that each register encodes an \(\ell \) bit binary string. We allow our Clifford+T circuits access to these registers in the form of an \((n + \ell )\) qubit qRAM gate that enacts

$$\begin{aligned} \sum _{i} \alpha _i |i\rangle |x\rangle \;\mapsto \; \sum _{i} \alpha _i |i\rangle |x \oplus R_i\rangle . \end{aligned}$$
(1)

Here \(\sum _{i} \alpha _i |i\rangle \) is a superposition of addresses and x is an arbitrary \(\ell \) bit string.

Quantum access to classical RAM is a powerful resource, and the algorithms we describe below fail to achieve an advantage over their classical counterparts when qRAM is not available. We discuss qRAM at greater length in Sect. 7.

RAM Machines. We describe classical algorithms in terms of random access memory machines. For comparability with the Clifford+T gate set, we will work with a limited instruction set, e.g. {NOT, AND, OR, XOR, LOAD, STORE}. For comparability with qRAM, LOAD and STORE act on \(\ell \) bit registers.

Cost. The cost of a RAM program is the number of instructions that it performs. One can similarly define the gate cost of a quantum circuit to be the number of gates that it performs. Both metrics are reasonable in isolation, but it is not clear how one should compare the two. Jaques and Schanck recommend that quantum circuits be assigned a cost in the unit of RAM instructions to account for the role that classical computers play in dispatching gates to quantum memories  [29]. They also recommend that the identity gate be assigned unit cost to account for error correction. The depth-width cost of a quantum circuit is the total number of gate operations that it performs when one includes identity gates in the count.

2.2 Black Box Search

A predicate on \(\{0,1,\dots , N-1\}\) is a function \(f : \{0,1,\dots , N-1\} \rightarrow \{0,1\}\). The kernel, or set of roots, of f is \({\text {Ker}}(f) = \{x : f(x) = 0\}\). We write for . A black box search algorithm finds a root of a predicate without exploiting any structure present in the description of the predicate itself. Of course, black box search algorithms can be applied when structure is known, and we will often use structure such as “f has M roots” or “f is expected to have no more than M roots” in our analyses. We will also use the fact that the set of predicates on any given finite set can be viewed as a Boolean algebra. We write \(f\cup g\) for the predicate with kernel \({\text {Ker}}(f) \cup {\text {Ker}}(g)\) and \(f\cap g\) for the predicate with kernel \({\text {Ker}}(f) \cap {\text {Ker}}(g)\).

Exhaustive Search. An exhaustive search evaluates f(0), f(1), f(2), and so on until a root of f is found. The order does not matter so long as each element of the search space is queried at most once. If f is a uniformly random predicate with M roots, then this process has probability \(1-\left( {\begin{array}{c}N-M\\ j\end{array}}\right) /\left( {\begin{array}{c}N\\ j\end{array}}\right) \ge 1-{(1-M/N)}^j\) of finding a root during j evaluations of f. This is true even if M is not known.
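For intuition, the bound can be checked numerically; the following sketch uses an illustrative search space size, root count and query counts of our own choosing.

```python
from math import comb

N, M = 10**4, 10        # illustrative search space size and number of roots
for j in (10, 100, 1000):
    exact = 1 - comb(N - M, j) / comb(N, j)   # success probability after j distinct queries
    bound = 1 - (1 - M / N) ** j              # the stated lower bound
    assert exact >= bound
    print(f"j={j}: exact={exact:.4f}, bound={bound:.4f}")
```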

Filtered Search. If f is expensive to evaluate, we may try to decrease the cost of exhaustive search by applying a search filter. We say that a predicate g is a filter for f if \(f \ne g\) and \({\text {Ker}}(f \cap g) \ne \emptyset \). We say that g recognises f with a false positive rate of

$$\begin{aligned} \frac{\left| {\text {Ker}}(g) \setminus {\text {Ker}}(f)\right| }{N - \left| {\text {Ker}}(f)\right| } \end{aligned}$$

and a false negative rate of

$$\begin{aligned} \frac{\left| {\text {Ker}}(f) \setminus {\text {Ker}}(g)\right| }{\left| {\text {Ker}}(f)\right| }. \end{aligned}$$
A filtered search evaluates g(0), f(0), g(1), f(1), g(2), f(2), and so on until a root of \(f\cap g\) is found. The evaluation of f(i) can be skipped when i is not a root of g, which may reduce the cost of filtered search below that of exhaustive search.
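The following sketch illustrates the filtered search pattern in Python; the toy predicates f and g are stand-ins for the inner product test and popcount filter of later sections, not part of the original text.

```python
def filtered_search(f, g, N):
    """Scan 0..N-1, evaluating the cheap filter g first and the expensive
    predicate f only on elements that pass the filter.
    Returns the first root of (f AND g), or None."""
    for i in range(N):
        if g(i) == 0 and f(i) == 0:   # roots are the inputs mapping to 0
            return i
    return None

# toy example: f is "expensive", g is a cheap filter with false positives
f = lambda i: 0 if i % 1000 == 7 else 1
g = lambda i: 0 if i % 10 == 7 else 1
print(filtered_search(f, g, 10**4))
```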

Quantum Search. Grover’s quantum search algorithm is a black box search algorithm that provides a quadratic advantage over exhaustive search in terms of query complexity. Suppose that f is a predicate with M roots. Let \(\mathbf {D}\) be any unitary transformation that maps \(|0\rangle \) to \(\frac{1}{\sqrt{N}}\sum _i |i\rangle \), let \(\mathbf {R}_0 = \mathbf {I}_N - 2 |0\rangle \langle 0|\) and let \(\mathbf {R}_f\) be the unitary \(|x\rangle \mapsto {(-1)}^{f(x)} |x\rangle \). Measuring \(\mathbf {D}|0\rangle \) yields a root of f with probability M/N. Grover’s quantum search algorithm amplifies this to probability \(\approx 1\) by repeatedly applying the unitary \(\mathbf {G}(f) = \mathbf {D}\mathbf {R}_0\mathbf {D}^{-1}\mathbf {R}_f\)  [25]. Suppose that j repetitions are applied. The analysis in  [25] shows that measuring the state \(\mathbf {G}{(f)}^j\mathbf {D}|0\rangle \) yields a root of f with probability \(\sin ^2((2j+1)\cdot \theta )\) where \(\sin ^2(\theta ) = M/N\). Assuming \(M \ll N\), the probability of success is maximised at \(j \approx \frac{\pi }{4}\sqrt{N/M}\) iterations. Boyer, Brassard, Høyer, and Tapp (BBHT) show that a constant success probability can be obtained after \(O(\sqrt{N/M})\) iterations.

The same complexity can be obtained when M is not known. One simply runs the algorithm repeatedly with j chosen uniformly from successively larger intervals. The following lemma contains the core observation.

Lemma 1

(Lemma 2 of  [12]). Suppose that measuring \(\mathbf {D}|0\rangle \) would yield a root of f with probability \(\sin ^2(\theta )\). Fix a positive integer m. Let j be chosen uniformly from \(\{0, \dots , m-1\}\). The expected probability that measuring \(\mathbf {G}{(f)}^j\mathbf {D}|0\rangle \) yields a root of f is \(\frac{1}{m}\sum _{j=0}^{m-1}\sin ^2((2j + 1)\cdot \theta ) = \frac{1}{2} - \frac{\sin (4m\theta )}{4m\,\sin (2\theta )}\). If \(m > 1/\sin (2\theta )\) then this quantity is at least 1/4.

The complete strategy is made precise by  [12, Theorem 3].
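The averaging identity in Lemma 1 is easy to confirm numerically; a small sketch with arbitrary, illustrative values of \(\theta \) and m satisfying \(m > 1/\sin (2\theta )\):

```python
import math

theta, m = 0.07, 25   # illustrative values with m > 1/sin(2*theta)
lhs = sum(math.sin((2 * j + 1) * theta) ** 2 for j in range(m)) / m
rhs = 0.5 - math.sin(4 * m * theta) / (4 * m * math.sin(2 * theta))
print(lhs, rhs)       # the two expressions agree to floating point precision
assert abs(lhs - rhs) < 1e-12 and lhs >= 0.25
```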

Amplitude Amplification. Brassard, Høyer, Mosca, and Tapp observed that the \(\mathbf {D}\) subroutine of Grover’s algorithm can be replaced with any algorithm that finds a root of f with positive probability  [13]. This generalisation of Grover’s algorithm is called amplitude amplification. Let \(\mathbf {A}\) be a quantum algorithm that makes no measurements and let p be the probability that measuring \(\mathbf {A}|0\rangle \) yields a root of f. Let \(\mathbf {G}(\mathbf {A}, f) = \mathbf {A}\mathbf {R}_0 \mathbf {A}^{-1}\mathbf {R}_f\), where \(\mathbf {R}_0\) and \(\mathbf {R}_f\) are as in Grover’s algorithm. Let \(\theta \) be such that \(\sin ^2(\theta ) = p\). Suppose that j iterations of \(\mathbf {G}(\mathbf {A}, f)\) are applied to \(\mathbf {A}|0\rangle \). The analysis in  [13] shows that measuring the state \(\mathbf {G}{(\mathbf {A}, f)}^j\mathbf {A}|0\rangle \) yields a root of f with probability \(\sin ^2((2j+1)\cdot \theta )\). The BBHT strategy for handling an unknown number of roots generalises to an unknown \(p\).

2.3 Lattice Sieving and Near Neighbour Search on the Sphere

A Euclidean lattice of rank m and dimension d is an abelian group generated by integer sums of \(m \le d\) linearly independent vectors in \(\mathbb {R}^d\). In this paper we only consider full rank lattices, i.e. \(m = d\). The shortest vector problem in a lattice \(\Lambda \) is the problem of finding a non-zero \(v \in \Lambda \) of minimal Euclidean norm. Norms in this work are Euclidean and denoted \(\Vert \,\cdot \,\Vert \). The angular distance of \(u, v \in \mathbb {R}^d\) is denoted \(\theta (u, v) = \arccos \left( \langle u, v \rangle / \Vert u \Vert \Vert v \Vert \right) \), \(\arccos (x) \in [0, \pi ]\).

A lattice sieve takes as input a list of lattice points, \(L \subset \Lambda \), and searches for integer combinations of these points that are short. If the initial list is sufficiently large, SVP can be solved by performing this process recursively. Each point in the initial list can be sampled at a cost polynomial in \(d\)  [31]. Hence the initial list can be sampled at a cost of \(N \cdot {\text {poly}}(d)\).

Sieves that combine k points at a time are called k-sieves. The sieves that we consider in this paper are 2-sieves. They take integer combinations of the form \(u \pm v\) with \(u, v \in L\) and \(u \ne \pm v\). If \(\Vert u \pm v \Vert \ge \max \{\Vert u \Vert , \Vert v \Vert \}\) then we say that (uv) is a reduced pair, else it is a reducible pair.

We analyse 2-sieves under the heuristic that the points in L are independent and identically distributed (i.i.d.) uniformly in a thin spherical shell. This heuristic was introduced by Nguyen and Vidick in  [40]. As a further simplification, we assume that the shell is very thin and normalise such that \(L \subset \mathcal {S}^{d - 1}\), the unit sphere in \(\mathbb {R}^{d}\). As such, (u, v) is reducible if and only if \(\theta (u, v) < \pi /3\). The popcount filter, introduced in Sect. 2.4, acts as a first approximation to \(\theta (\cdot \, , \cdot )\).

When we model L as a subset of \(\mathcal {S}^{d-1}\), we can translate some lattice sieves into the language of (angular) near neighbour search on the sphere. For example, the Nguyen–Vidick sieve  [40], which checks all pairs in L for reducibility, becomes Algorithm 1 with \(\theta =\pi /3\).

[Algorithm 1: \({\text {AllPairSearch}}\) (pseudocode figure not reproduced here)]
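For concreteness, the following is our own minimal classical rendering of this all-pairs strategy; it is a sketch, not the paper's pseudocode, and it does not reproduce the listing's line numbering. It simply reports all pairs at angle at most \(\theta \).

```python
import numpy as np

def all_pair_search(L, theta):
    """Naive NNS on the sphere: report every pair (i, j), i < j, whose
    angular distance is at most theta. L is an (N, d) array of unit vectors."""
    N = len(L)
    cos_theta = np.cos(theta)
    pairs = []
    for i in range(N):
        for j in range(i + 1, N):                 # inner search, roughly what Sect. 6.1 filters
            if np.dot(L[i], L[j]) >= cos_theta:   # theta(u, v) <= theta  <=>  <u, v> >= cos(theta)
                pairs.append((i, j))
    return pairs

# toy usage in a small, illustrative dimension
rng = np.random.default_rng(0)
L = rng.standard_normal((50, 8))
L /= np.linalg.norm(L, axis=1, keepdims=True)
print(len(all_pair_search(L, np.pi / 3)))
```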

2.4 The popcount Filter

Charikar’s locality sensitive hashing (LSH) scheme  [15] is a family of hash functions \(\mathcal {H}\), defined on \(\mathcal {S}^{d-1}\), for which

$$\begin{aligned} \Pr _{h \leftarrow \mathcal {H}}[h(u) = h(v)] = 1 - \frac{\theta (u,v)}{\pi }. \end{aligned}$$
(2)

The hash function family is defined by

$$\begin{aligned} \mathcal {H} = \left\{ u \mapsto \mathrm {sgn}(\langle r, u\rangle ) : r \in \mathcal {S}^{d-1} \right\} , \end{aligned}$$

where \(\mathrm {sgn}(x) = 1\) if \(x \ge 0\) and \(\mathrm {sgn}(x) = 0\) if \(x < 0\). Equation 2 follows from the fact that \(\theta (u,v)/\pi \) is the probability that a uniformly random hyperplane through the origin separates u and v, i.e. that u and v lie in opposite hemispheres with respect to a uniformly random \(r \in \mathcal {S}^{d-1}\).

Charikar observed that one can estimate \(\theta (u,v)/\pi \) by choosing a random hash function \(h = (h_1, \dots , h_n) \in \mathcal {H}^n\) and measuring the Hamming distance between \(h(u) = (h_1(u), \dots , h_n(u))\) and \(h(v) = (h_1(v), \dots , h_n(v))\). Each bit \(h_i(u) \oplus h_i(v)\) is Bernoulli distributed with parameter \(p = \theta (u,v)/\pi \). In the limit of large n, the normalised Hamming weight \(wt(h(u) \oplus h(v))/n\) converges to a normal distribution with mean p and standard deviation \(\sqrt{p(1-p)/n}\).

In the sieving literature, the process of filtering a \(\theta (\cdot , \cdot )\) test using a threshold on the value of \(wt(h(u) \oplus h(v))\) is known as the “XOR and population count trick”  [2, 17, 18]. Functions in \(\mathcal {H}^n\) are also used in Laarhoven’s HashSieve  [33]. We write \(\mathtt {popcount}_{k,n}(u, v; h)\) for a search filter of this type

$$\mathtt {popcount}_{k,n}(u, v; h) = {\left\{ \begin{array}{ll} 0 &{} \text {if }\sum _{i=1}^{n} h_i(u) \oplus h_i(v) \le k,\\ 1 &{} \text {otherwise.} \end{array}\right. }$$

When the n hash functions are fixed we write \(\mathtt {popcount}_{k,n}(u, v)\). The threshold, k, is chosen based on the desired false positive and false negative rates. Heuristically, if one’s goal is to detect points at angle at most \(\theta \), one should take \(k/n \approx \theta /\pi \). If \(k/n \ll \theta /\pi \) then the false negative rate will be large, and many neighbouring pairs will be missed. An important consequence of missing potential reductions is that the N required to iterate Algorithms 1, 3, 4 increases. In Sect. 6 this increase is captured in the quantity \(\ell (k, n)\). If \(k/n \gg \theta /\pi \) then the false positive rate will be large, and the full inner product test will be applied often. We calculate these false positive and negative rates in Sect. 5. These calculations, and the fact that popcount is significantly cheaper than an inner product, make popcount a good candidate for use as a filter under the techniques of Sect. 2.2. Furthermore, it is the filter used in the most performant sieves to date  [2, 17].
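As an illustration, the following sketch implements the hash family and the popcount filter directly in Python; the dimension and the parameters n and k are illustrative choices of ours, not the optimised values of Sect. 7, and the circuit-level versions are described in Sect. 4.

```python
import numpy as np

def simhash(u, R):
    """n-bit sketch of u: the sign pattern of <r_i, u> for the rows r_i of R."""
    return (R @ u >= 0).astype(np.uint8)

def popcount_filter(hu, hv, k):
    """Return 0 ("pass") if the Hamming distance of the sketches is <= k, else 1."""
    return 0 if int(np.sum(hu ^ hv)) <= k else 1

d, n, k = 64, 256, 96                      # illustrative d, n, k with k/n close to (pi/3)/pi
rng = np.random.default_rng(1)
R = rng.standard_normal((n, d))            # rows play the role of the random vectors r
u, v = rng.standard_normal(d), rng.standard_normal(d)
print(popcount_filter(simhash(u, R), simhash(v, R), k))
```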

2.5 Geometric Figures on the Sphere

Our analysis of the \(\mathtt {popcount}\) filter requires some basic facts about the size of some geometric figures on the sphere. We measure the volume of subsets of \(\mathcal {S}^{d-1} = \{v \in \mathbb {R}^{d} :\Vert v \Vert = 1\}\) using the \((d-1)\) dimensional spherical probability measure \(\mu ^{d-1}\). The spherical cap of angle \(\theta \) about \(u\in \mathcal {S}^{d-1}\) is \(\mathcal {C}^{d-1}(u,\theta ) = \{ v \in \mathcal {S}^{d-1} : \theta (u, v) \le \theta \}\). The measure of a spherical cap is

$$C_d(u, \theta ) := \mu ^{d-1}(\mathcal {C}^{d-1}(u,\theta )) = \frac{1}{\sqrt{\pi }}\frac{\Gamma (\tfrac{d}{2})}{\Gamma (\tfrac{d-1}{2})}\int _{0}^{\theta } \sin ^{d-2}(t)~\mathrm {d}t.$$

We will often interpret \(C_d(u, \theta )\) as the probability that v drawn uniformly from \(\mathcal {S}^{d-1}\) satisfies \(\theta (u, v) \le \theta \). We denote the density of the event \(\theta (u, v) = \theta \) by

$$A_d(u, \theta ) := \frac{\partial }{\partial \theta } C_d(u, \theta ) = \frac{1}{\sqrt{\pi }}\frac{\Gamma (\tfrac{d}{2})}{\Gamma (\tfrac{d-1}{2})}\sin ^{d-2}(\theta ).$$

Note that \(C_d(u,\theta )\) does not depend on u, so we may write \(C_d(\theta )\) and \(A_d(\theta )\) without ambiguity. The wedge formed by the intersection of two caps is \(\mathcal {W}^{d-1}(u,\theta _u,v,\theta _v) = \mathcal {C}^{d-1}(u, \theta _u) \cap \mathcal {C}^{d-1}(v,\theta _v)\). The measure of a wedge only depends on \(\theta =\theta (u,v)\), \(\theta _u\), and \(\theta _v\), so we denote it

$$\begin{aligned} W_d(\theta , \theta _u, \theta _v) = \mu ^{d-1}(\mathcal {W}^{d-1}(u,\theta _u,v,\theta _v)). \end{aligned}$$

We will often interpret \(W_d(\theta , \theta _u, \theta _v)\) as the probability that w drawn uniformly from \(\mathcal {S}^{d-1}\) satisfies \(\theta (u, w) \le \theta _u\) and \(\theta (v, w) \le \theta _v\). Note that \(\theta \ge \theta _u + \theta _v \Rightarrow W_d(\theta , \theta _u, \theta _v) = 0\). An integral representation of \(W_d(\theta , \theta _u, \theta _v)\) is given in Appendix A of the full version.
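These quantities are straightforward to evaluate numerically. The following sketch computes \(C_d(\theta )\) and \(A_d(\theta )\) with scipy; the log-Gamma form avoids overflow for large d, and the dimension shown is illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

def A(d, theta):
    """Density A_d(theta) of the event theta(u, v) = theta for uniform v."""
    lognorm = gammaln(d / 2) - gammaln((d - 1) / 2) - 0.5 * np.log(np.pi)
    return np.exp(lognorm) * np.sin(theta) ** (d - 2)

def C(d, theta):
    """Measure C_d(theta) of a spherical cap of angle theta."""
    return quad(lambda t: A(d, t), 0, theta)[0]

# e.g. the probability that a uniformly random pair lies at angle <= pi/3 in dimension 96
print(C(96, np.pi / 3))
```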

3 Filtered Quantum Search

A filter can reduce the cost of a search because a classical computer can branch to avoid evaluating an expensive predicate. A quantum circuit cannot branch inside a Grover search in this way. Nevertheless, a filter can be used to reduce the cost of a quantum search.

The idea is to apply amplitude amplification to a Grover search. The inner Grover search prepares the uniform superposition over roots of the filter, g. The outer amplitude amplification searches for a root of f among the roots of g. We present pseudocode for this strategy in Algorithm 2.

If \(\left| {\text {Ker}}(g)\right| \) and \(\left| {\text {Ker}}(f\cap g)\right| \) are known, then we can choose the number of iterations of the inner Grover search and the outer amplitude amplification optimally. When these quantities are not known, we can attempt to guess them as in the BBHT algorithm. In our applications, we have some information about \(\left| {\text {Ker}}(g)\right| \) and \(\left| {\text {Ker}}(f\cap g)\right| \), which we can use to fine-tune a BBHT-like strategy.

Proposition 1 gives the cost of Algorithm 2 when we know 1. a lower bound, Q, on the size of \({\text {Ker}}(f\cap g)\), and 2. the value of \(\left| {\text {Ker}}(g)\right| \) up to relative error \(\gamma \). In essence, when a filter with a low false positive rate is used to search a space with few true positives, Algorithm 2 can be tuned such that it finds a root of f with probability at least 1/14 and at a cost of roughly \(\frac{\gamma }{2} \sqrt{N/Q}\) iterations of \(\mathbf {G}(g)\).

[Algorithm 2: filtered quantum search (pseudocode figure not reproduced here)]

If we know that the inner Grover search succeeds with probability \(x < 1\), we can compensate with a factor of \(\sqrt{1/x}\) more iterations of the outer amplitude amplification. We do not know x. However, in our applications, we do know that the value of \(\theta \) for which \(\sin ^2(\theta ) = \left| {\text {Ker}}(g)\right| /N\) will be fairly small, e.g. \(\theta < 1/10\). The following technical lemma shows that, when \(\theta \) is small, we may assume that \(x = 1/5\) with little impact on the overall cost of the search.

Let j and \(\mathbf {A}_j\) be as in Algorithm 2. Let \(p_\theta (j)\) be the probability that measuring \(\mathbf {A}_j|0\rangle \) would yield a root of g. For any \(x \in (0,1)\), there is some probability \(q_x(m_1)\) that the choice of j is insufficient, i.e. that \(p_\theta (j) < x\). We expect to repeat Algorithm 2 a total of \({(1-q_x(m_1))}^{-1}\) times to avoid this type of failure.

Lemma 2

Fix \(\theta \in [0, \pi /2]\) and \(x \in [0,1)\). Let \(p_\theta \) and \(q_x\) be defined by \(p_\theta (j) = \sin ^2((2j+1)\cdot \theta )\) and \(q_x(m) = \frac{1}{m}\left| \{0 \le j < m : p_\theta (j) < x\}\right| \). If \(m > \frac{\pi }{4\theta }\), then \(q_x(m) < \frac{3\arcsin (\sqrt{x})}{\pi - \arcsin (\sqrt{x})} + \frac{6\theta }{\pi }\).

Proof

Observe that \(p_\theta (j) < x\) when \((2j+1)\cdot \theta \) lies within \(\arcsin (\sqrt{x})\) of an integer multiple of \(\pi \). Let \(I_0\) be the interval \([0, \arcsin (\sqrt{x}))\). For integers \(t \ge 1\) let \(I_t = (t\pi - \arcsin (\sqrt{x}), t\pi + \arcsin (\sqrt{x}))\). Let \(c = c(m)\) be the largest integer for which \(\left[ 0, (2m-1)\cdot \theta \right) \) intersects \(I_{c}\). The quantity \(mq_x(m)\) counts the number of non-negative integers \(i < m\) for which \((2i+1)\cdot \theta \) lies in \(I_0 \cup I_1 \cup \dots \cup I_{c}\). This is no more than \((c+1) + \lfloor (2c+1)\arcsin (\sqrt{x})/(2\theta ) \rfloor \). It follows that \(q_x(m) < (c+1)/m + (2c+1)\arcsin (\sqrt{x})/2m\theta \). Note that \(2m\theta> (2m-1)\theta > c\pi - \arcsin (\sqrt{x})\) and \((c+1)/m < 2\theta /\pi + 1/m\). Hence \(q_x(m) < (2c+1)\arcsin (\sqrt{x})/(c\pi - \arcsin (\sqrt{x})) + 2\theta /\pi + 1/m\). Moreover, \(q_x(m) > q_x(m-1)\) when \((2m-1)\cdot \theta \) lies in \(I_c\), and \(q_x(m) < q_x(m-1)\) otherwise. The upper bound on \(q_x(m)\) that we have derived is decreasing as a function of c. Hence the claim holds when \(c \ge 1\). Finally, when \(m = \frac{\pi }{4\theta }\) and \(c=0\) we have \(q_x(m) < 2\arcsin (\sqrt{x})/\pi + 4\theta /\pi \) and \(q_x(m)\) is decreasing until \(c = 1\).    \(\square \)

There are situations in which filtering is not effective, e.g. when the false positive rate of g is very high, when evaluating g is not much less expensive than evaluating f, or when f has a very large number of roots. In these cases, other algorithms will outperform Algorithm 2. We remark on these below. Proposition 1 optimises the choice of \(m_1\) and \(m_2\) in Algorithm 2 for a large class of filters that are typical of our applications.

Proposition 1

Suppose that f and g are predicates on a domain of size N and that g is a filter for f. Let Q be such that \(\left| {\text {Ker}}(f\cap g)\right| \ge Q\). Let P and \(\gamma \) be real numbers such that \(P \le \left| {\text {Ker}}(g)\right| \le \gamma P\). If \(\gamma P / N < 1/100\) and \(\gamma Q/P < 1/4\), then there are parameters \(m_1\) and \(m_2\) for Algorithm 2 such that Algorithm 2 finds a root of f with probability at least 1/14 and has a cost that is dominated by \(\approx \frac{\gamma }{2}\sqrt{N/Q}\) times the cost of \(\mathbf {G}(g)\) or by \(\approx \frac{2}{3}\sqrt{\gamma P/Q}\) times the cost of \(\mathbf {R}_{f\cap g}\).

Proof

Fix \(x \in (0,1)\). We will analyse Algorithm 2 with respect to the parameters \(m_1 = \left\lceil \frac{\pi }{4}\sqrt{\gamma N/P}\right\rceil \) and \(m_2 = \left\lceil \sqrt{\gamma P/3xQ}\right\rceil \). Let \(\theta _g\) be such that \(\sin ^2(\theta _g) = \left| {\text {Ker}}(g)\right| /N\). Let \(j\) and \(k\) be chosen as in Algorithm 2. Let \(p = p_{\theta _g}(j)\) and \(q = q_x(m_1)\) be defined as in Lemma 2. Note that since \(\sin ^2(\theta _g) \le \gamma P/N < 1/100\) we can use \(6\theta _g/\pi < 1/5\) in applying Lemma 2. Let \(\theta _h(j)\) be such that \(\sin ^2(\theta _h(j)) = p \cdot \left| {\text {Ker}}(f\cap g)\right| /\left| {\text {Ker}}(g)\right| \). With probability at least \(1-q\) we have \(p \ge x\), which implies that \(\sin (\theta _h(j)) > \sqrt{xQ/\gamma P}\). Since \(\gamma Q/P< 1/4 \Rightarrow \sin ^2(\theta _h(j)) < 1/4\), then \(\cos (\theta _h(j)) > \sqrt{3/4}\). Thus \(1/\sin (2\theta _h(j)) < \sqrt{\frac{\gamma P}{3xQ}} \le m_2\). By Lemma 1 measuring \(\mathbf {G}{(\mathbf {A}_j, f\cap g)}^k\mathbf {A}_j|0\rangle \) yields a root of \(f\cap g\) with probability at least 1/4. It follows that Algorithm 2 succeeds with probability at least \((1-q)/4\).

The algorithm evaluates \(\mathbf {G}(g)\) exactly \(k\cdot j+1\) times and evaluates \(\mathbf {G}{(g)}^{-1}\) exactly \(k\cdot j\) times. The expected value of \(2kj+1\) is \(c_1(x) \cdot \gamma \cdot \sqrt{N/Q}\) where \(c_1(x) \approx {(\pi /8)}/\sqrt{3x}\). Likewise the algorithm evaluates \(\mathbf {R}_{f\cap g}\) exactly k times, which is \(c_2(x) \cdot \sqrt{\gamma P/Q}\) in expectation where \(c_2(x) \approx (1/2)/\sqrt{3x}\). Taking \(x = 1/5\), and applying the upper bound on \(q_x(m_1)\) from Lemma 2, we have \((1-q_x(m_1))/4 \ge 1/14\), \(c_1(x) \approx 1/2\) and \(c_2(x) \approx 2/3\).    \(\square \)

Remark 1

When \(\gamma P/N \ge 1/100\) or \(\gamma Q/P \ge 1/4\) there are better algorithms. If both inequalities hold then classical search finds a root of f quickly. If \(\gamma Q/P \ge 1/4\) then finding a root of f is not much harder than finding a root of g, so one can search on g directly. If \(\gamma P/N \ge 1/100\) then the filter has little effect and one can search on f directly.

Remark 2

It is helpful to understand when we can ignore the cost of \(\mathbf {R}_{f\cap g}\) in Proposition 1. Roughly speaking, if evaluating f is c times more expensive than evaluating g, then the cost of calls to \(\mathbf {G}(g)\) will dominate when \(c \lesssim \sqrt{\gamma N/P}\). In a classical filtered search the cost of evaluating g dominates when \(c \lesssim N/\gamma P\).
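For later reference, the following sketch turns Proposition 1 into a cost estimate: it computes \(m_1\) and \(m_2\) and the approximate numbers of \(\mathbf {G}(g)\) and \(\mathbf {R}_{f\cap g}\) applications, with x = 1/5 as in the proof. The input sizes are illustrative.

```python
from math import ceil, pi, sqrt

def filtered_quantum_search_costs(N, P, Q=1, gamma=1, x=0.2):
    """Iteration counts suggested by Proposition 1: m1, m2 and the
    approximate number of G(g) and R_{f cap g} applications."""
    m1 = ceil(pi / 4 * sqrt(gamma * N / P))       # inner Grover iterations
    m2 = ceil(sqrt(gamma * P / (3 * x * Q)))      # outer amplitude amplification iterations
    g_calls = gamma / 2 * sqrt(N / Q)             # ~ (gamma/2) sqrt(N/Q) calls to G(g)
    f_calls = 2 / 3 * sqrt(gamma * P / Q)         # ~ (2/3) sqrt(gamma P/Q) calls to R_{f cap g}
    return m1, m2, g_calls, f_calls

print(filtered_quantum_search_costs(N=2**30, P=2**20))
```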

4 Circuits for popcount

Consider a program for \(\mathtt {popcount}_{k,n}(u, v)\). This program loads u and v from specified memory addresses, computes h(u) and h(v), computes the Hamming weight of \(h(u) \oplus h(v)\), and checks whether it is less than or equal to k. Recall h(u) is defined by n inner products. If the popcount procedure is executed many times for each u, then it may be reasonable to compute h(u) once and store it in memory. Moreover, if u is fixed for many sequential calls to the procedure, then it may be reasonable to cache h(u) between calls. The algorithms that we consider in Sect. 6 use both of these optimisations.

In this section we describe RAM programs and quantum circuits that compute \(\mathtt {popcount}_{k,n}(u, \cdot )\) for a fixed u. These circuits have the value of h(u) hard-coded. They load h(v) from memory, compute the Hamming weight of \(h(u) \oplus h(v)\), and check whether the Hamming weight is less than or equal to k. We ignore the initial, one time, cost of computing h(u) and h(v).

4.1 Quantum Circuit for popcount

Loading h(v) costs a single qRAM gate. Computing \(h(u) \oplus h(v)\) can then be done in-place using a sequence of \(\mathbf {X}\) gates that encode h(u). The bulk of the effort is in computing the Hamming weight. For that we use a tree of in-place adders. The final comparison is also computed with an adder, although only one bit of the output is needed. See Fig. 1 for a full description of the circuit.

We use the Cuccaro–Draper–Kutin–Petrie adder  [16], with “incoming carry” inputs, to compute the Hamming weight. We argue in favour of this choice of adder in Appendix C of the full version. We use the Häner–Roetteler–Svore  [26] carry bit circuit for implementing the comparison.

We will later use popcount within filtered quantum searches by defining predicates of the form \(g(i) = \mathtt {popcount}_{k,n}(u, v_i)\), \(i \in \{1, \dots , N\}\). To simplify that later discussion, we cost the entire Grover iteration \(\mathbf {G}(g) = \mathbf {D}\mathbf {R}_0\mathbf {D}^{-1}\mathbf {R}_g\) here. In Appendix B of the full version we introduce the (possibly multiply controlled) Toffoli gate and discuss the Toffoli count for \(\mathbf {G}(g)\), which in turn gives the \(\mathbf {T}\) count for \(\mathbf {G}(g)\).

Fig. 1.

A quantum circuit for popcount. This circuit computes \(h(u) \oplus h(v)\) for a fixed n bit h(u), computes the Hamming weight of \(h(u) \oplus h(v)\), and checks whether the Hamming weight is less than or equal to k. Here \(n = 2^\ell -1 = 31\). The input qubits are represented as lines ending with a black diamond. The dashed lines represent incoming carry inputs, and the dotted lines represent carry outputs. Not all of the output wires are drawn. For space efficiency, some of the input qubits are fed into the incoming carry qubits of the adders (dashed lines). The notation \(\mathbf {X}^{i}\) means that the gate \(\mathbf {X}\) is applied to input qubit i if bit i of h(u) is 1. The circuit uses a depth \(\ell -1\) binary tree of full bit adders from  [16], where \(\text {ADD}_i\) denotes an i bit full adder. The output \(wt(h(u) \oplus h(v))\) from the tree of adders together with the binary representation of the number \(n-k\) are finally fed into the input of the CARRY circuit from  [26], which computes the carry bit of \(n-k+wt(h(u) \oplus h(v))\) (the carry bit will be 0 if \(wt(h(u) \oplus h(v)) \le k\), and 1 otherwise). The final \(\mathbf {CNOT}\) is for illustration only. In actuality, the carry bit is computed directly into an ancilla that is initialised in the \(|-\rangle \) state, so we can obtain the needed phase kickback. The tree of adders and the initial \(\mathbf {X}\) gates, but not the CARRY circuit, are run in reverse to clean up scratch space and return the inputs to their initial state. The uncomputation step is not depicted here.

The Cost of \(\mathbf {R}_g\). The \(\mathbf {R}_g\) subroutine is computed by running the popcount circuit in Fig. 1 and then uncomputing the addition tree and \(\mathbf {X}\) gates. The circuit uses in-place i bit adders for \(i\in \{1,\dots ,\ell -1\}\). The width of the circuit is given in Appendix B of the full version. The depth of the circuit is

$$\begin{aligned} \text {depth} = 2 + d(\text {CARRY} ) + \sum _{i=1}^{\ell -1}2\cdot d(\text {ADD}_i), \end{aligned}$$
(3)

where \(d(\cdot )\) denotes the depth of its argument. The factor of 2 accounts for uncomputation of the \(\text {ADD}_{i}\) circuits. The CARRY circuit is only counted once, as the carry bit is computed directly into the \(|-\rangle \) ancilla during the CARRY circuit itself. The summand 2 accounts for the \(\mathbf {X}\) gates used to compute, and later uncompute, \(h(u) \oplus h(v)\).

The Cost of \(\mathbf {D}\mathbf {R}_0\mathbf {D}^{-1}\). Recall that \(\mathbf {D}\) can be any circuit that maps \(|0\rangle \) to the uniform superposition over the domain of the search predicate. While there is no serious difficulty in sampling from the uniform distribution on \(\{0, \dots , N-1\}\) for any integer N, when costing the circuit we assume that N is a power of two. In this case \(\mathbf {D}\) is simply \(\log _2 N\) parallel \(\mathbf {H}\) gates. The reflection \(\mathbf {R}_0\) is implemented as a multiply controlled Toffoli gate that targets an ancilla initialised in the \(|-\rangle \) state. We use Maslov’s multiply controlled Toffoli from  [37]. The depth and width of \(\mathbf {D}\mathbf {R}_0\mathbf {D}^{-1}\) are both \(O(\log N)\); our software calculates the exact value.

4.2 RAM Program for popcount

Recall that we use a RAM instruction set that consists of simple bit operations and table lookups. A Boolean circuit for popcount is schematically similar to Fig. 1. Let \(\ell = \lceil \log _{2} n \rceil \). Loading h(v) has cost 1. Computing \(h(u) \oplus h(v)\) takes \(n\) XOR instructions and has depth 1. Following  [41, Table II], with \(c_{FA} = 5\) the number of instructions in a full adder, \((n - \ell - 1)c_{FA} + \ell \) lower bounds the instruction cost of computing the Hamming weight and comparing it with a fixed \(k\). This has depth \((\ell -1)(\delta _{\text {sum}} + \delta _{\text {carry}}) + 1\). We assume \(\delta _{\text {sum}} = \delta _{\text {carry}} = 1\). Thus, the overall instruction count is \(6n - 4\ell - 5\) and the overall depth is \(2\ell \).

4.3 Cost of Inner Products

The optimal \(\mathtt {popcount}\) parameters will depend on the cost of computing an inner product in dimension d. The cost of one inner product is amortised over many popcounts, and a small change in the \(\mathtt {popcount}\) parameters will quickly suppress the ratio of inner products to popcounts (see Remark 2). Hence we only need a rough estimate for the cost of an inner product. We assume 32 bits of precision are sufficient. We then assume schoolbook multiplication is used for scalar products, which costs approximately \(32^{2}\) AND instructions. We then assume the cost of a full inner product is approximately \(32^{2}\, d\), i.e. we ignore the cost of the final summation, assuming it is dwarfed by the ANDs.
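The classical cost constants of this section can be tabulated as in the following sketch; the formulas follow Sects. 4.2 and 4.3, the 32-bit precision assumption is carried over, and the parameter values shown are illustrative.

```python
from math import ceil, log2

def popcount_ram_cost(n):
    """Instruction count and depth of the classical popcount filter (Sect. 4.2)."""
    ell = ceil(log2(n))
    return 6 * n - 4 * ell - 5, 2 * ell

def inner_product_cost(d, precision=32):
    """Rough instruction count for one inner product in dimension d (Sect. 4.3)."""
    return precision ** 2 * d

print(popcount_ram_cost(511), inner_product_cost(512))
```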

5 The Accuracy of popcount

Here we give an analysis of the popcount technique based on some standard simplifying assumptions. We are particularly interested in the probability that a popcount filter identifies a random pair of points as potential neighbours. We are also interested in the probability that a pair of actual neighbours are not identified as potential neighbours, i.e. the false negative rate. Our software computes all of the quantities in this section to high precision.

Let \(P_{k,n}(u,v)\) be the probability that \(\mathtt {popcount}_{k,n}(u,v; h) = 0\) for a uniformly random h (recall \(\mathtt {popcount}_{k,n}(u,v; h) = 0\) if u and v pass the filter). In other words, let \(h = (h_1, \dots , h_n)\) be a collection of independent random variables that are distributed uniformly on the sphere, and define

$$\begin{aligned} P_{k,n}(u,v) = 1 - \mathbb {E}\left[ \mathtt {popcount}_{k,n}(u,v; h)\right] . \end{aligned}$$

The hyperplane defined by \(h_i\) separates u and v with probability \(\theta (u,v)/\pi \), and \(\mathtt {popcount}_{k,n}(u,v; h) = 0\) if no more than k of the hyperplanes separate u and v. Hence,

$$\begin{aligned} P_{k,n}(u,v) = \sum _{i=0}^{k}\left( {\begin{array}{c}n\\ i\end{array}}\right) \cdot {\left( \frac{\theta (u,v)}{\pi }\right) }^i\cdot {\left( 1-\frac{\theta (u,v)}{\pi }\right) }^{n-i}. \end{aligned}$$

Note that \(P_{k,n}(u,v)\) depends only on the angle between u and v, so it makes sense to define \(P_{k,n}(\theta )\). The main heuristic in our analysis of \(\mathtt {popcount}\) is that \(P_{k,n}(u,v)\) is a good approximation to the probability that \(\mathtt {popcount}_{k,n}(u,v; h) = 0\) for fixed h and varying u and v. Under this assumption, all of the quantities in question can be determined by integrating \(P_{k,n}(u,v)\) over different regions of the sphere.
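Concretely, \(P_{k,n}(\theta )\) is a binomial cumulative distribution function and can be evaluated as in the following sketch (the parameters shown are illustrative):

```python
from math import pi
from scipy.stats import binom

def P_kn(k, n, theta):
    """Pr[popcount_{k,n}(u, v; h) = 0] for a pair at angle theta and random h."""
    return binom.cdf(k, n, theta / pi)

print(P_kn(96, 256, pi / 3))   # illustrative k, n and angle
```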

Let \(\hat{P}_{k,n}\) denote the event that \(\mathtt {popcount}_{k,n}(u,v; h) = 0\) for uniformly random u, v, and h. Let \(\hat{R}_\theta \) be the event that \(\theta (u,v) \le \theta \). Recall that \(\Pr [\hat{R}_\theta ] = C_d(\theta )\), and observe that \(\Pr [\hat{R}_\theta ]\) is a cumulative distribution with associated density \(A_d(\theta ) = \frac{\partial }{\partial \theta }C_d(\theta )\). We find, letting \(\mathcal {S}= \mathcal {S}^{d-1}\) for some implicit \(d\),

$$\begin{aligned} \Pr [\hat{P}_{k,n}]&= \int _{\mathcal {S}} \int _{\mathcal {S}} P_{k,n}(u,v)~\mathrm {d}\mu (v)~\mathrm {d}\mu (u) \nonumber \\&= \int _{\mathcal {S}}\left( \int _{0}^{\pi } P_{k,n}(\theta ) \cdot A_d(\theta ) ~\mathrm {d}\theta \right) ~\mathrm {d}\mu (u) \nonumber \\&= \int _{0}^{\pi } P_{k,n}(\theta ) \cdot A_d(\theta )~\mathrm {d}\theta . \end{aligned}$$
(4)

We call u, v with \(\theta (u, v) \le \varphi \) neighbours. The false negative rate is \(1 - \Pr [\hat{P}_{k,n}~\vert ~\hat{R}_\varphi ]\). The quantity \(\Pr [\hat{P}_{k,n} \wedge \hat{R}_\varphi ]\) can be calculated by changing the upper limit of integration in Eq. 4. It follows that

$$\begin{aligned} 1 - \Pr [\hat{P}_{k,n}~\vert ~\hat{R}_\varphi ] = 1 - \frac{1}{C_d(\varphi )}\int _{0}^{\varphi } P_{k,n}(\theta ) \cdot A_d(\theta )~\mathrm {d}\theta . \end{aligned}$$
(5)
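Putting the pieces together, the false negative rate of Eq. 5 reduces to a one dimensional numerical integral; a sketch follows, in which the choices of d, k, n and \(\varphi \) are illustrative (our software evaluates these quantities to higher precision).

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln
from scipy.integrate import quad

def A(d, t):
    """Density A_d(t) of the angle between uniformly random points."""
    return np.exp(gammaln(d / 2) - gammaln((d - 1) / 2) - 0.5 * np.log(np.pi)) * np.sin(t) ** (d - 2)

def false_negative_rate(d, k, n, phi):
    """1 - Pr[pass | theta(u,v) <= phi], Eq. 5, for points uniform on the sphere."""
    num = quad(lambda t: binom.cdf(k, n, t / np.pi) * A(d, t), 0, phi)[0]
    den = quad(lambda t: A(d, t), 0, phi)[0]     # = C_d(phi)
    return 1 - num / den

print(false_negative_rate(d=96, k=96, n=256, phi=np.pi / 3))
```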

In Sect. 6 we consider u and v that are uniformly distributed in a cap of angle \(\beta < \pi /2\), rather than uniformly distributed on the sphere. Let \(\hat{B}_{w,\beta }\) be the event that u and v both lie in the cap of angle \(\beta \) about w. We have

$$\begin{aligned} \Pr [\hat{B}_{w,\beta }]&= \int _{0}^{\pi } W_d(\theta ,\beta ,\beta ) \cdot A_d(\theta )~\mathrm {d}\theta \nonumber \\&= \int _{0}^{2\beta } W_d(\theta ,\beta ,\beta ) \cdot A_d(\theta )~\mathrm {d}\theta . \end{aligned}$$
(6)

In the second line we have used the fact that \(\beta < \pi /2\) and \(W(\theta , \theta _1, \theta _2)\) is zero when \(\theta \ge \theta _1 + \theta _2\). The quantity \(\Pr [\hat{B}_{w,\beta } \wedge \hat{R}_\varphi ]\) can be computed by changing the upper limit of integration in Eq. 6 from \(2\beta \) to \(\min \{2\beta , \varphi \}\). We note that \(\hat{B}_{w, \beta }\) has no dependence on w and therefore may also be written \(\hat{B}_{\beta }\). The conditional probability that \(\mathtt {popcount}_{k, n}(u, v; h) = 0\), given that \(u, v\) are uniformly distributed in a cap \(B_{\beta }\), \(\Pr [\hat{P}_{k, n}~\vert ~\hat{B}_{\beta }]\), can be computed using Eq. 6 and

$$\begin{aligned} \Pr [\hat{P}_{k,n} \wedge \hat{B}_{\beta }]&= \int _{0}^{2\beta } P_{k,n}(\theta ) \cdot W_d(\theta ,\beta ,\beta ) \cdot A_d(\theta )~\mathrm {d}\theta . \end{aligned}$$
(7)

The quantity \(\Pr [\hat{P}_{k,n} \wedge \hat{B}_{\beta } \wedge \hat{R}_\varphi ]\) can be computed by changing the upper limit of integration in Eq. 7 from \(2\beta \) to \(\min \{2\beta , \varphi \}\). The false negative rate for popcount when restricted to a cap is \(1-\Pr [\hat{P}_{k,n}~\vert ~\hat{B}_{w,\beta } \wedge \hat{R}_\varphi ]\).

6 Tuning popcount for NNS

We now use the circuit sizes from Sect. 4 and the probabilities from Sect. 5 to optimise popcount for use in NNS algorithms. Our analysis is with respect to points sampled independently from the uniform distribution on the sphere. We further restrict our attention to list-size preserving parameterisations, which take an input list of size N and return an output list of (expected) size N.

We use the notation for events introduced in Sect. 5. In particular, we write \(\hat{R}_\theta \) for the event that a uniformly random pair of vectors are neighbours, i.e. that they lie at angle less than or equal to \(\theta \) of one another; \(\hat{P}_{k,n}\) for the event that popcount identifies a uniformly random pair of vectors as potential neighbours; \(\hat{B}_{\beta }\) for the event that a uniformly random pair of vectors lie in a uniformly random cap of angle \(\beta \); and \(\hat{B}_{w, \beta }\) for the same event except we highlight the cap is centred on w. Throughout this section we use \(\mathtt {popcount}_{k,n}(u,\cdot )\), for various fixed u, as a filter for the search predicate \(\theta (u, \cdot ) \le \theta \). We write \(\eta (k,n)\) for the false negative rate of popcount. We assume that \(\theta (u, v) \le \theta \) is computed using an inner product test. Throughout this section, \(c_1\) represents the instruction cost of the inner product test from Sect. 4.3, \(c_2(k, n)\) the instruction cost of \(\mathtt {popcount}\) from Sect. 4.2, \(q_1\) the quantum cost of the reflection \(\mathbf {R}_{f \cap g}\), and \(q_2(k, n)\) the quantum cost of \(\mathbf {G}(g)\) from Sect. 4.1. We note that \(c_1, q_1\) have a dependence on d that we suppress. We write \(q_0(m)\) for the number of \(\mathbf {G}(g)\) iterations that are applied during a quantum search on a set of size m.

Our goal is to minimise the cost of list-size preserving NNS algorithms as a function of the input list size, the popcount parameters k and n, and the other NNS parameters. In a list of N points there are \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \) ordered pairs. We expect \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \cdot \Pr [\hat{R}_\theta ] = \left( {\begin{array}{c}N\\ 2\end{array}}\right) \cdot C_d(\theta )\) of these to be neighbours, and we expect a \(1-\eta (k,n)\) fraction of neighbours to be detected by popcount. List-size preserving parameterisations that use a popcount filter must therefore take an input list of size at least

$$\begin{aligned} \ell (k,n) = \frac{2}{1-\eta (k,n)} \cdot \frac{1}{C_d(\theta )}. \end{aligned}$$
(8)

The optimised costs reported in Fig. 2 typically use popcount parameters for which \(\ell (k,n) \in \left( 2/C_d(\pi /3), 4/C_d(\pi /3)\right) \). Here we assume that list-size preserving parameterisations take \(N = \ell (k,n)\). Note that \(\eta (k,n) = 1 - \Pr [\hat{P}_{k,n}~\vert ~\hat{R}_\theta ]\) when the search is over a set of points uniformly distributed on the sphere, and \(\eta (k,n) = 1 - \Pr [\hat{P}_{k,n}~\vert ~\hat{R}_\theta \wedge \hat{B}_{\beta }]\) when the search is over a set of points uniformly distributed in a cap of angle \(\beta \) (left implicit).

In each of the quantum analyses, we apply Proposition 1 with \(\gamma =1\), and \(Q = 1\) to estimate \(q_0(m)\). We assume that filtered quantum search succeeds with probability 1 instead of probability at least 1/14, as guaranteed by Proposition 1. In practice, one will not know \(\left| {\text {Ker}}(g)\right| \) exactly and one will therefore take \(\gamma > 1\). Our use of \(\gamma =1\) is a systematic underestimate of the true cost of the search. There may be searches where our lower bound of \(Q = 1\) on \(\left| {\text {Ker}}(f\cap g)\right| \) is too pessimistic. However, the probability of success in filtered quantum search decreases quadratically with \(\left| {\text {Ker}}(f\cap g)\right| \). In Sects. 6.1 and 6.3 we expect \(\left| {\text {Ker}}(f\cap g)\right| \) to be small so the effect of taking \(Q=1\) is negligible. In Sect. 6.2, where \(Q\) may be larger, an optimistic analysis using the expected value of \(Q\) makes negligible savings in dimension \(512\) and small savings in dimension \(1024\). This analysis does not decrement \(Q\) when a neighbour is found in, then removed from, a search space and ignores the quadratic decrease in success probability.

6.1 \({\text {AllPairSearch}}\)

As a warmup, we optimise \({\text {AllPairSearch}}\). Asymptotically its complexity is \(2^{(0.415\cdots + o(1))d}\) classically and \(2^{(0.311\cdots +o(1))d}\) quantumly. We describe implementations of Line 5 of Algorithm 1 based on filtered search and filtered quantum search, and optimise popcount relative to these implementations.

Filtered Search. Suppose that Line 5 applies \(\mathtt {popcount}_{k,n}(v_i, \cdot )\) to each of \(v_{i+1}\) through \(v_{N}\) and then applies an inner product test to each vector that passes. With an input list of size \(N = \ell (k,n)\), we expect this implementation to test all \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \) pairs before finding N neighbouring pairs. Moreover, we expect the popcount filter to identify \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \cdot \Pr [\hat{P}_{k,n}]\) potential neighbours, and to perform an equal number of inner product tests. The optimal parameters are obtained by minimising

$$\begin{aligned} \left( c_1\cdot \Pr [\hat{P}_{k,n}] + c_2(k,n)\right) \cdot \left( {\begin{array}{c}\ell (k,n)\\ 2\end{array}}\right) . \end{aligned}$$
(9)
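As an illustration of the kind of numerical optimisation our software performs, the following sketch minimises Eq. 9 by brute force over a small grid of (k, n); the cost constants follow Sects. 4.2 and 4.3, and the dimension and search ranges are illustrative choices of ours (the real code also handles the quantum metrics).

```python
import numpy as np
from math import ceil, log2, pi
from scipy.stats import binom
from scipy.special import gammaln
from scipy.integrate import quad

def A(d, t):
    return np.exp(gammaln(d / 2) - gammaln((d - 1) / 2) - 0.5 * np.log(np.pi)) * np.sin(t) ** (d - 2)

def all_pair_cost(d, k, n, theta=pi / 3):
    c1 = 32 ** 2 * d                                   # inner product cost, Sect. 4.3
    c2 = 6 * n - 4 * ceil(log2(n)) - 5                 # popcount cost, Sect. 4.2
    C = quad(lambda t: A(d, t), 0, theta)[0]           # C_d(theta)
    hit = quad(lambda t: binom.cdf(k, n, t / np.pi) * A(d, t), 0, np.pi)[0]    # Pr[pass], Eq. 4
    eta = 1 - quad(lambda t: binom.cdf(k, n, t / np.pi) * A(d, t), 0, theta)[0] / C   # Eq. 5
    ell = 2 / ((1 - eta) * C)                          # list size, Eq. 8
    return (c1 * hit + c2) * ell * (ell - 1) / 2       # Eq. 9

d = 96
best = min(((all_pair_cost(d, k, n), k, n) for n in (127, 255, 511)
            for k in range(n // 4, n // 2)), key=lambda t: t[0])
print(best)
```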

Filtered Quantum Search. Suppose that Line 5 is implemented using the search routine Algorithm 2. Specifically, we take the predicate f to be \(\theta (v_i, \cdot ) \le \theta \) with domain \(L_i\). We take the filter g to be \(\mathtt {popcount}_{k,n}(v_i, \cdot )\). Each call to the search routine returns at most one neighbour of \(v_i\). To find all detectable neighbours of \(v_i\) in \(L_i\) we must repeat the search at least \(\left| {\text {Ker}}(f \cap g)\right| \) times. This is expected to be \((N-i)\cdot \Pr [\hat{P}_{k,n}\wedge \hat{R}_\theta ]\). Known neighbours of \(v_i\) can be removed from \(L_i\) to avoid a coupon collector scenario. We consider an implementation in which searches are repeated until a search fails to find a neighbour of \(v_i\).

We expect to call the search subroutine \(1 + (N-i)\cdot \Pr [\hat{P}_{k,n}\wedge \hat{R}_\theta ]\) times in iteration i. Proposition 1 with \(P = (N-i)\cdot \Pr [\hat{P}_{k,n}]\), \(Q=1\), and \(\gamma =1\) gives \(q_0(N-i) \approx \frac{1}{2}\sqrt{N-i}\) iterations of \(\mathbf {G}(g)\) per search. As i ranges from 1 to \(N-1\) the quantity \(N-i\) takes each value in \(\{1, \dots , N-1\}\). Our proposed implementation therefore performs an expected

$$\begin{aligned} \sum _{i=1}^{N-1} \left( 1 + (N-i)\cdot \Pr [\hat{P}_{k,n}\wedge \hat{R}_\theta ]\right) \cdot \frac{1}{2}\sqrt{N-i} = \frac{1}{3}N^{3/2} + \frac{1}{5}\Pr [\hat{P}_{k,n}\wedge \hat{R}_\theta ]\cdot N^{5/2} + O\left( \left( 1 + N\cdot \Pr [\hat{P}_{k,n}\wedge \hat{R}_\theta ]\right) \sqrt{N}\right) \end{aligned}$$
(10)

applications of \(\mathbf {G}(g)\); the expansion is obtained by the Euler–Maclaurin formula. When \(N = \ell (k,n)\) we expect \(N\cdot \Pr [\hat{P}_{k,n}\wedge \hat{R}_\theta ] = 2 + O(1/N)\). The right hand side of Eq. 10 is then \(\frac{11}{15} N^{3/2} + O(\sqrt{N})\).

Proposition 1 also provides an estimate for the rate at which reflections about the true positives, \(\mathbf {R}_{f\cap g}\) are performed. With P and Q as above, we find that \(\mathbf {R}_{f\cap g}\) is performed at roughly \(p(k,n) = \sqrt{\Pr [\hat{P}_{k,n}]}\) the rate of calls to \(\mathbf {G}(g)\). The optimal popcount parameters (up to some small error due to the \(O(\sqrt{N})\) term in Eq. 10) are obtained by minimising the total cost

$$\begin{aligned} \frac{11}{15}\left( {q_1p(k,n) + q_2(k,n)}\right) \cdot {\ell (k,n)}^{3/2}. \end{aligned}$$
(11)

6.2 \({\text {RandomBucketSearch}}\)

One can improve \({\text {AllPairSearch}}\) by bucketing the search space such that vectors in the same bucket are more likely to be neighbours  [33]. For example, one could pick a hemisphere H and divide the list into \(L_1 = L \cap H\) and \(L_2 = L \backslash L_1\). These lists would be approximately half the size of the original and the combined cost of \({\text {AllPairSearch}}\) within \(L_1\) and then within \(L_2\) would be half the cost of an \({\text {AllPairSearch}}\) within L. However, this strategy would fail to detect the expected \(\theta /\pi \) fraction of neighbours that lie in opposite hemispheres.

Becker, Gama, and Joux  [9] present a very efficient generalisation of this strategy. They propose bucketing the input list into subsets of the form \(\{v \in L : \mathtt {popcount}_{k,n}(0, v; h) = 0\}\) with varying choices of h. This bucketing strategy is applied recursively until the buckets are of a minimum size. Neighbouring pairs are then found by an \({\text {AllPairSearch}}\).

A variant of the Becker–Gama–Joux algorithm that uses buckets of the form \(L \cap \mathcal {C}^{d - 1}(f, \theta _1)\), with randomly chosen f and fixed \(\theta _1\), was proposed and implemented in  [2]. This variant is sometimes called bgj1. Here we call it \({\text {RandomBucketSearch}}\). This algorithm has asymptotic complexity \(2^{(0.349\cdots +o(1))d}\) classically  [2] and \(2^{(0.301\cdots +o(1))d}\) quantumly. This is worse than the Becker–Gama–Joux algorithm, but \({\text {RandomBucketSearch}}\) is conceptually simple and still provides an enormous improvement over \({\text {AllPairSearch}}\). Pseudocode is presented in Algorithm 3.

[Algorithm 3: \({\text {RandomBucketSearch}}\) (pseudocode figure not reproduced here)]

Description of Algorithm 3. The algorithm takes as input a list of N points uniformly distributed on the sphere. A random bucket centre f is drawn uniformly from \(\mathcal {S}^{d-1}\) in each of the t iterations of the outer loop. The choice of f defines a bucket in Line 5, \(L_f = L \cap \mathcal {C}^{d-1}(f, \theta _1)\), which is of expected size \(N \cdot C_d(\theta _1)\). For each \(v_j \in L_f\), the inner loop searches a set \(L_{f,j} \subset L_f\) for neighbours of \(v_j\). The quantity \(\left| L_{f,j}\right| \) takes each value in \(\{0, \dots , \left| L_f\right| - 1\}\) as \(v_j\) ranges over \(L_f\). The inner loop is identical to the loop in \({\text {AllPairSearch}}\) apart from indexing and the fact that elements of \(L_f\) are known to be in the cap \(\mathcal {C}^{d-1}(f, \theta _1)\).

A bucket \(L_f\) is expected to contain \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \cdot \Pr [\hat{R}_\theta \wedge \hat{B}_{f,\theta _1}]\) neighbouring pairs. Only a \(1-\eta (k,n)\) fraction of these are expected to be identified by the popcount filter. When \(\theta _1 > \theta \) it is reasonable to assume that \(\Pr [\hat{R}_\theta \wedge \hat{B}_{f,\theta _1}] \approx C_d(\theta ) \cdot W_d(\theta ,\theta _1,\theta _1)\). We use this approximation. The expected number of neighbouring pairs in \(L_f\) that are detected by the popcount filter is therefore approximately \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \cdot (1-\eta (k,n)) \cdot C_d(\theta ) \cdot W_d(\theta , \theta _1, \theta _1)\). When \(N = \ell (k,n)\) this is \(N \cdot W_d(\theta ,\theta _1,\theta _1)\). If all detectable neighbours are found by the search routine then the algorithm is list-size preserving when \(N = \ell (k,n)\) and \(t = 1/W_d(\theta ,\theta _1,\theta _1)\).

We can now derive optimal popcount parameters for various implementations of Line 8.

Filtered Search. Suppose that Line 8 of Algorithm 3 applies \(\mathtt {popcount}_{k,n}(v_j, \cdot )\) to each element of \(L_{f,j}\) and then applies an inner product test to each vector that passes. This implementation applies popcount tests to all pairs of elements in \(L_f\) and finds all of the neighbouring pairs that pass. In the process it applies inner product tests to a \(p(\theta _1,k,n) = \Pr [\hat{P}_{k,n}~\vert ~\hat{B}_{f,\theta _1}]\) fraction of pairs. The cost of populating buckets in one iteration of Line 5 is \(c_1 \cdot \ell (k,n)\). The cost of all searches on Line 8 is \(\left( c_1 \cdot p(\theta _1,k,n) + c_2(k,n)\right) \cdot \left( {\begin{array}{c}NC_d(\theta _1)\\ 2\end{array}}\right) \). With the list-size preserving parameters N and t given above, the optimal \(\theta _1\), k, and n can be obtained by minimising the total cost

$$\begin{aligned} \frac{c_1\cdot \ell (k,n) + \left( c_1\cdot p(\theta _1,k,n) + c_2(k,n)\right) \cdot \left( {\begin{array}{c}\ell (k,n) \cdot C_d(\theta _1)\\ 2\end{array}}\right) }{W_d(\theta ,\theta _1,\theta _1)}. \end{aligned}$$
(12)

Filtered Quantum Search. Suppose that Line 8 is implemented using the search routine Algorithm 2. We take the predicate f to be \(\theta (v_j, \cdot ) \le \theta \) with domain \(L_{f,j}\). We take the filter g to be \(\mathtt {popcount}_{k,n}(v_j, \cdot )\). Each call to the search routine returns at most one neighbour of \(v_j\). To find all detectable neighbours of \(v_j\) in \(L_{f,j}\) we must repeat the search several times. Known neighbours of \(v_j\) can be removed from \(L_{f,j}\) to avoid a coupon collector scenario. Proposition 1 with \(P = \left| L_{f,j}\right| \cdot \Pr [\hat{P}_{k,n}~\vert ~\hat{B}_{f,\theta _1}]\), \(Q = 1\), and \(\gamma =1\) gives us that the number of \(\mathbf {G}(g)\) iterations in a search on a set of size \(\left| L_{f,j}\right| \) is \(q_0(\left| L_{f,j}\right| ) \approx \frac{1}{2}\sqrt{\left| L_{f,j}\right| }\).

We consider an implementation of Line 8 in which searches are repeated until a search fails to find a neighbour of \(v_j\). With \(N = \ell (k,n)\), the set \(L_{f}\) is of expected size \(\ell (k,n) \cdot C_d(\theta _1)\) and contains an expected \(\ell (k,n) \cdot W_d(\theta ,\theta _1,\theta _1)\) neighbouring pairs detectable by \(\mathtt {popcount}\). The set \(L_{f,j}\) is expected to contain a proportional fraction of these pairs. As such, we expect to call the search subroutine

$$\begin{aligned} 1 + \frac{\left| L_{f,j}\right| }{\left( {\begin{array}{c}\left| L_f\right| \\ 2\end{array}}\right) } \cdot \ell (k,n) \cdot W_d(\theta ,\theta _1,\theta _1) \end{aligned}$$

times in iteration j. The inner loop therefore makes an expected

$$\begin{aligned} \sum _{j} \left( 1 + \frac{\left| L_{f,j}\right| }{\left( {\begin{array}{c}\left| L_f\right| \\ 2\end{array}}\right) } \cdot \ell (k,n) \cdot W_d(\theta ,\theta _1,\theta _1)\right) \cdot \frac{1}{2}\sqrt{\left| L_{f,j}\right| } \end{aligned}$$

applications of \(\mathbf {G}(g)\). This admits an asymptotic expansion similar to that of Eq. 10. If we assume that \(\left| L_f\right| \) takes its expected value of \(\ell (k,n) \cdot C_d(\theta _1)\), then the inner loop makes

$$\begin{aligned} q_3(\theta _1,k,n) \cdot {\left( \ell (k,n) \cdot C_d(\theta _1)\right) }^{3/2} \end{aligned}$$

applications of \(\mathbf {G}(g)\), where

$$\begin{aligned} q_3(\theta _1,k,n) = \frac{2\,W_d(\theta ,\theta _1,\theta _1)}{5\,C_d(\theta _1)} + \frac{1}{3}. \end{aligned}$$

Proposition 1 also provides an estimate for the rate at which reflections about the true positives, \(\mathbf {R}_{f\cap g}\) are performed. With P and Q as above, we find that \(\mathbf {R}_{f\cap g}\) is applied at roughly \(p(\theta _1,k,n) = \sqrt{\Pr [\hat{P}_{k,n}~\vert ~\hat{B}_{f,\theta _1}]}\) the rate of \(\mathbf {G}(g)\) iterations. The total cost of searching for neighbouring pairs in \(L_f\) is therefore

$$\begin{aligned} s(\theta _1,k,n) = \left( q_1 \cdot p(\theta _1,k,n) + q_2(k,n)\right) \cdot q_3(\theta _1,k,n)\cdot {\big (\ell (k,n)\cdot C_d(\theta _1)\big )}^{3/2}. \end{aligned}$$
(13)

Populating \(L_f\) has a cost of \(c_1\cdot \ell (k,n)\). With the list-size preserving t given above, the optimal parameters \(\theta _1\), k, and n can be obtained by minimising the total cost

$$\begin{aligned} \frac{c_1\cdot \ell (k,n) + s(\theta _1, k, n)}{W_d(\theta ,\theta _1,\theta _1)}. \end{aligned}$$
(14)

6.3 \({\text {ListDecodingSearch}}\)

The optimal choice of \(\theta _1\) in \({\text {RandomBucketSearch}}\) balances the cost of \(N\cdot t\) cap membership tests against the cost of all calls to the search subroutine. It can be seen that reducing the cost of populating the buckets would allow us to choose a smaller \(\theta _1\), which would reduce the cost of searching within each bucket.

Algorithm 4, \({\text {ListDecodingSearch}}\), is due to Becker, Ducas, Gama, and Laarhoven  [8]. Its complexity is \(2^{(0.292\cdots + o(1))d}\) classically and \(2^{(0.265\cdots +o(1))d}\) quantumly  [34, 35]. Like \({\text {RandomBucketSearch}}\), it computes a large number of list-cap intersections. However, these list-cap intersections involve a structured list—the list-cap intersections in \({\text {RandomBucketSearch}}\) involve the inherently unstructured input list.

[Algorithm 4: \({\text {ListDecodingSearch}}\) (pseudocode figure not reproduced here)]

Description of Algorithm 4. The algorithm first samples a t point random product code F. See  [8] for background on random product codes. In our analysis, we treat F as a list of uniformly random points on \(\mathcal {S}^{d-1}\). A formal statement is given as  [8, Theorem 5.1], which shows that this heuristic is essentially accurate, up to a subexponential loss in the probability of finding the intended pairs.

The first loop populates t buckets that have as centres the points \(f\) of F. Bucket \(L_f\) stores elements of L that lie in the cap of angle \(\theta _2\) about f. Each bucket is of expected size \(N\cdot C_d(\theta _2)\).

The second loop iterates over \(v_j \in L\) and searches for neighbours of \(v_j\) in the disjoint union of buckets with centres within an angle \(\theta _1\) of \(v_j\). The set \(F_j\) constructed on Line 8 contains an expected \(t \cdot C_d(\theta _1)\) bucket centres. The disjoint union of certain elements from the corresponding buckets, denoted \(L_{F,j}\), is of expected size \((N - j) \cdot C_d(\theta _2) \cdot t \cdot C_d(\theta _1)\). We note that simplifying by assuming the expected size of \(L_{F, j}\) is \(N \cdot C_d(\theta _2) \cdot t \cdot C_d(\theta _1)\) changes the costs given below by no more than a factor of two.

Suppose that w is a neighbour of \(v_j\), so \(\theta (v_j, w) \le \theta \). The measure of the wedge formed by a cap of angle \(\theta _1\) about \(v_j\) and a cap of angle \(\theta _2\) about w is at least \(W_d(\theta , \theta _1, \theta _2)\). Assuming that the points of a random product code are indistinguishable from points sampled uniformly on the sphere, the probability that the bucket \(L_f\) of some \(f \in F_j\) contains w is at least \(t\cdot W_d(\theta , \theta _1, \theta _2)\).

The second loop is executed N times. Iteration j searches \(L_{F,j}\) for neighbours of \(v_j\). With \(N = \ell (k,n)\) there are expected to be N detectable neighbouring pairs in L. With \(t = 1/W_d(\theta , \theta _1, \theta _2)\) we expect that each neighbouring pair is of the form \((v_j, w)\) with \(w \in L_{F,j}\).

The angles \(\theta _1, \theta _2\) relate to the spherical cap parameters \(\alpha , \beta \) respectively in  [8], and are such that \(\theta _1 \ge \theta _2\). Optimal time complexity is achieved when \(\theta _1 = \theta _2\).

We have omitted the list decoding mechanism by which list-cap intersections are computed. In our analysis we assume that the cost of a list-cap intersection such as \(F_i = F \cap \mathcal {C}^{d-1}(v_i, \theta _2)\) is proportional to the size of the intersection, but otherwise independent of the size of F, i.e. we are in the “efficient list-decodability regime” of  [8, Section 5.1] and may take their parameter \(m = \log d\). In particular, of the two costs stated in  [8, Lemma 5.1], the cost of inner products and the cost of other operations, we assume that the first dominates. In  [8] these costs relate to \(O(m\cdot M\cdot \mathcal {C}_n(\alpha ))\) and \(O(nB + mB \log B)\) respectively. We therefore assume the cost of forming \(F_i = F \cap \mathcal {C}^{d-1}(v_i, \theta _2)\) is \(t\cdot C_d(\theta _2)\) inner product tests.

Filtered Search. Suppose that the implementation of Line 12 of Algorithm 4 applies \(\mathtt {popcount}_{k,n}(v_j, \cdot )\) to each element of \(L_{F,j}\) and then applies an inner product test to each vector that passes. This implementation applies popcount tests to all \(N \cdot C_d(\theta _2) \cdot t \cdot C_d(\theta _1)\) elements of \(L_{F,j}\) and finds all of the neighbours of \(v_j\) that pass. Note that \(w \in L_{F,j}\) implies that there exists some \(f \in F\) such that both \(v_j\) and w lie in a cap of angle \(\theta _1\) around f. Inner product tests are applied to a \(p(\theta _1,k,n) \ge \Pr [\hat{P}_{k,n}~\vert ~\hat{B}_{f,\theta _1}]\) fraction of all pairs.Footnote 6
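
As a point of reference, here is a minimal classical sketch of this strategy. The function and variable names are ours, popcount is passed in as an abstract predicate, and the list-cap intersections are computed by brute force rather than by the list decoding mechanism of  [8], so this reflects the control flow only, not the costs.

import math

def angle(u, v):
    # Angle between unit vectors u and v.
    ip = sum(x * y for x, y in zip(u, v))
    return math.acos(max(-1.0, min(1.0, ip)))

def filtered_list_decoding_pass(L, F, theta, theta1, theta2, popcount):
    # Sketch of Algorithm 4 with the filtered implementation of Line 12.
    # popcount(u, v) stands in for popcount_{k,n}(u, v).
    # First loop: bucket (indices into) L by the caps of angle theta2 about each f in F.
    buckets = [[j for j, v in enumerate(L) if angle(f, v) <= theta2] for f in F]
    pairs = []
    for j, vj in enumerate(L):
        # Line 8: bucket centres within angle theta1 of v_j.
        Fj = [i for i, f in enumerate(F) if angle(f, vj) <= theta1]
        # L_{F,j}: disjoint union of the corresponding buckets, unprocessed vectors only.
        LFj = [k for i in Fj for k in buckets[i] if k > j]
        # Line 12, filtered: popcount test first, inner product test only on survivors.
        for k in LFj:
            if popcount(vj, L[k]) and angle(vj, L[k]) <= theta:
                pairs.append((j, k))
    return pairs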

The cost of preparing all t buckets in the first loop is \(c_1 \cdot N \cdot t \cdot C_d(\theta _2)\). The cost of constructing the search spaces in the second loop is \(c_1 \cdot N \cdot t \cdot C_d(\theta _1)\). Each search applies a popcount test to each of the \(N \cdot C_d(\theta _2) \cdot t \cdot C_d(\theta _1)\) elements of \(L_{F,j}\), and an inner product test to each element that passes. With the list-size preserving parameterisation given above, the optimal \(\theta _1\), \(\theta _2\), k, and n can be obtained by minimising the total cost

$$\begin{aligned} \ell (k,n)\left( c_1\frac{C_d(\theta _1) + C_d(\theta _2)}{W_d(\theta ,\theta _1,\theta _2)} + \big (c_1\cdot p(\theta _1,k,n) + c_2(k,n)\big )\frac{\ell (k,n)\, C_d(\theta _1)\, C_d(\theta _2)}{W_d(\theta ,\theta _1,\theta _2)}\right) . \end{aligned}$$
(15)

Filtered Quantum Search. Suppose that Line 12 is implemented using Algorithm 2. We take the predicate f to be \(\theta (v_j, \cdot ) \le \theta \) with domain \(L_{F,j}\). We take the filter g to be \(\mathtt {popcount}_{k,n}(v_j, \cdot )\). Each call to the search routine returns at most one neighbour of \(v_j\). Known neighbours of \(v_j\) can be removed from \(L_{F,j}\) to avoid a coupon collector scenario. Proposition 1 with \(Q = 1\) and \(\gamma =1\) gives us that the number of \(\mathbf {G}(g)\) iterations in a search on a set of size \(N \cdot C_d(\theta _2) \cdot t \cdot C_d(\theta _1)\) is on the order of the square root of this size.

Assuming that computing \(F_j = F\cap C(v_j,\theta _1)\) has a cost of \(c_1\cdot t\cdot C_d(\theta _1)\), the N iterations of Lines 5 and 8 have a total cost of

$$\begin{aligned} c_1\cdot N \cdot t \cdot \left( C_d(\theta _1) + C_d(\theta _2)\right) \end{aligned}$$
(16)

Each search applies an expected

$$\begin{aligned} \sqrt{\ell (k,n)\cdot C_d(\theta _1)\cdot C_d(\theta _2)/W_d(\theta ,\theta _1,\theta _2)} \end{aligned}$$

applications of \(\mathbf {G}(g)\). Reflections about the true positives, \(\mathbf {R}_{f\cap g}\), are performed at roughly \(p(\theta _1,k,n) = \sqrt{\Pr [\hat{P}_{k,n}~\vert ~\hat{B}_{f,\theta _1}]}\) times the rate of \(\mathbf {G}(g)\) iterations. We consider an implementation of Line 12 in which searches are repeated until a search fails to find a neighbour of \(v_j\). With the list-size preserving parameters given above, we expect to perform two filtered quantum searches per iteration of the second loop. The optimal parameters can be obtained by minimising the total cost

$$ \ell (k,n)\left( c_1\frac{C_d(\theta _1) + C_d(\theta _2)}{W_d(\theta ,\theta _1,\theta _2)} + (q_1p(\theta _1,k,n) +q_2(k,n))\sqrt{\frac{\ell (k,n)C_d(\theta _1)C_d(\theta _2)}{W_d(\theta ,\theta _1,\theta _2)}}\right) . $$
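
The square root term is the expected number of Grover iterations per search: with the list-size preserving choices \(N = \ell (k,n)\) and \(t = 1/W_d(\theta ,\theta _1,\theta _2)\), the search space has expected size

$$\begin{aligned} N\cdot C_d(\theta _2)\cdot t\cdot C_d(\theta _1) = \frac{\ell (k,n)\, C_d(\theta _1)\, C_d(\theta _2)}{W_d(\theta ,\theta _1,\theta _2)}, \end{aligned}$$

and a filtered quantum search over a set of this size applies on the order of the square root of this many \(\mathbf {G}(g)\) iterations.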

7 Cost Estimates

Our software numerically optimises the cost functions in Sects. 6.1, 6.2 and 6.3 with respect to several classical and quantum cost metrics. The classical cost metrics that we consider are: c (unit cost), which assigns unit cost to popcount; c (RAM), which uses the classical circuits of Sect. 4. The quantum cost metrics that we consider are: q (unit cost), which assigns unit cost to a Grover iteration; q (depth-width), which assigns unit cost to every gate (including the identity) in the quantum circuits of Sect. 4; q (gates), which assigns unit cost only to the non-identity gates; q (T count), which assigns unit cost only to T gates; and q (GE19), which is described in Sect. 7.1.

We stress that our software, and Fig. 2, give estimates for the cost of each algorithm. These estimates are neither upper bounds nor lower bounds. As we mention above, we have systematically omitted and underestimated some costs; for instance, we have omitted the list decoding mechanism in our costing of Algorithm 4. We have approximated other costs, for instance the cost that we assign to an inner product in Sect. 4.3. We have also not explored the entire optimisation space. We only consider values of the popcount parameter n that are one less than a power of two. Moreover, following the discussion in Sect. 2.4, we set \(k = \lfloor n/3 \rfloor \).

While we have omitted and approximated some costs, we have tried to ensure that these omissions and approximations will ultimately lead our software to underestimate the total cost of the algorithm. For instance, if our inner product cost is accurate, our optimisation procedure ensures that we satisfy Remark 2 and can ignore costs relating to \(\mathbf {R}_{f \cap g}\).

Our results are presented in Fig. 2. We also plot the leading term of the asymptotic complexity of the respective algorithms, as these are routinely referred to in the literature. The source code and the raw data for all considered cost metrics are available at https://github.com/jschanck/eprint-2019-1161.

Fig. 2. Quantum ("q") and classical ("c") resource estimates for NNS search.

7.1 Barriers to a Quantum Advantage

As expected, our results in Fig. 2 indicate that quantum search provides substantial savings over classical search asymptotically. Our plots fully contain the range of costs from \(2^{128}\) to \(2^{256}\) that is commonly thought to be cryptanalytically interesting. Modest cost improvements are attained in this range.

The range of parameters in which a sieve could conceivably be run, however, is much narrower. If one assumes a memory density of one petabyte per gram (\(2^{53}\) bits per gram), a \(2^{140}\) bit memory would have a mass comparable with that of the Moon. Supposing that a 2-sieve stores \(1/C_d(\pi /3)\) vectors, and that each vector is \(d\cdot \log _2(d)\) bits, an adversary with a \(2^{140}\) bit memory could only run a sieve in dimension 608 or lower. The potential cost improvement in dimension 608 is smaller than the potential cost improvement in, say, dimension 1000. The potential cost improvement that can be actualised is likely smaller still.
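
As a quick check of the mass estimate (a back-of-the-envelope calculation, taking the Moon's mass to be about \(7.3\times 10^{22}\) kg):

bits = 2 ** 140
grams = bits / 2 ** 53      # assumed density: 2^53 bits (one petabyte) per gram
kilograms = grams / 1000
print(f"{kilograms:.2e} kg of memory, versus about 7.3e22 kg for the Moon")
# prints roughly 1.5e23 kg, i.e. about twice the mass of the Moon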

We expect that our cost estimates are underestimates. However, the quantum advantage could grow, shrink, or even be eliminated if our underestimates do not affect quantum and classical costs equally. In this section, we list several reasons to think that the advantage might shrink or disappear.

Error Correction Overhead. By using the depth-width metric for quantum circuits, we assume that dispatching a logical gate to a logical qubit costs one RAM instruction. In practice, however, the cost depends on the error correcting code that is used for logical qubits. This cost may be significant.

Gidney and Ekerå have estimated the resources required to factor a 2048 bit RSA modulus using Shor’s algorithm on a surface code based quantum computer  [20]. Under a plausible assumption on the physical qubit error rate, they calculate that a factoring circuit with \(2^{12.6}\) logical qubits and depth \(2^{31}\) requires a distance \(\delta = 27\) surface code. Each logical qubit is encoded in \(2\,\delta ^2 = 1458\) physical qubits, and the error tracking routine applies at least \(\delta ^2 = 729\) bit instructions, per logical qubit per layer of logical circuit depth, to read its input.

In general, a circuit of depth D and width W requires a distance \(\delta = \varTheta (\log (DW))\) surface code. To perform a single logical gate, classical control hardware dispatches several instructions to each of the \(\varTheta (\log ^2(DW))\) physical qubits. The classical control hardware also performs a non-trivial error tracking routine between logical gates, which takes measurement results from half of the physical qubits as input.Footnote 7 Consequently, the cost of surface code computation grows like \(\varOmega (DW\log ^2(DW))\).
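
As a rough illustration of this scaling, the sketch below picks a code distance under the textbook assumption that each logical qubit fails per layer of depth with probability about \(0.1\cdot (p/p_{th})^{(\delta +1)/2}\). The constants are arbitrary and the model is far cruder than that of  [20], so it only reproduces the ballpark of their \(\delta = 27\).

def code_distance(depth, width, p_phys=1e-3, p_th=1e-2, target=1e-2):
    # Smallest odd surface code distance such that depth*width logical
    # qubit-layers, each failing with probability ~0.1*(p_phys/p_th)^((delta+1)/2),
    # keep the overall failure probability below `target`. Simplified model.
    budget = target / (depth * width)
    delta = 3
    while 0.1 * (p_phys / p_th) ** ((delta + 1) / 2) > budget:
        delta += 2
    return delta

# At roughly the scale of the factoring circuit in [20]:
# about 2^12.6 logical qubits and logical depth 2^31.
delta = code_distance(2 ** 31, 2 ** 12.6)
print(delta, 2 * delta ** 2)  # code distance, physical qubits per logical qubit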

We have adapted scripts provided by Gidney and Ekerå to estimate \(\delta \) for our circuits. The last plot of Fig. 2 shows the cost of \({\text {ListDecodingSearch}}\) when every logical gate (including the identity) is assigned a cost of \(\delta ^2\). For \({\text {ListDecodingSearch}}\) the cost in the Gidney–Ekerå metric grows from \(2^{128}\) to \(2^{256}\) between dimensions and , and we calculate a \(2^{128}\) bit memory is sufficient to run in dimension . We find that the advantage of quantum search over classical search is a factor of in dimension , a factor of in dimension , and a factor of in dimension . Compare this with the naïve estimate for the advantage, \(2^{0.292d-0.265d}\), which is a factor of \(2^{9.5}\) in dimension , a factor of \(2^{14.7}\) in dimension , and a factor of \(2^{22.5}\) in dimension .

One should also note that error correction for the surface code sets a natural clock speed, which Gidney and Ekerå estimate at one cycle per microsecond. Gidney and Ekerå estimate that their factoring circuit, the cost of which is dominated by a single modular exponentiation, would take 7.44 hours to run. This additional overhead in terms of time is not reflected in the instruction count.

On the positive side, the cost estimate used in Fig. 2 is specific to the surface code architecture. Significant improvements may be possible. Gottesman has shown that an overhead of \(\varTheta (1)\) physical qubits per logical qubit is theoretically possible  [22]. Whether this technique offers lower overhead than the surface code in practice is yet to be seen.

Dependence on qRAM. Quantum accessible classical memories are used in many quantum algorithms. For example, they are used in black box search algorithms  [25], in collision finding algorithms  [14], and in some algorithms for the dihedral hidden subgroup problem  [32]. The use of qRAM is not without controversy  [11, 24]. Previous work on quantum lattice sieve algorithms  [34, 35] has noted that constructing practical qRAM seems challenging.

Morally, looking up an \(\ell \) bit value in a table with \(2^{n}\) entries should have a cost that grows at least with \(n+\ell \). Recent results  [5, 6, 38] indicate that realistic implementations of qRAM have costs that grow much more quickly than this. When ancillary qubits are kept to a minimum, the best known Clifford+T implementation of a qRAM has a \(\mathbf {T}\) count of \(4\cdot (2^n-1)\)  [6]. While it is conceivable that a qRAM could be constructed at lower cost on a different architecture, as has been suggested in  [21], a unit cost qRAM gate should be seen as a powerful, and potentially unrealistic, resource.

One can argue that classical RAMs also have a large cost. This is not to say that classical and quantum RAMs have the same cost. A qRAM can be used to construct an arbitrary superposition over the elements of a memory. This process relies on quantum interference and necessarily takes at least as long as a worst case memory access. This is in contrast with classical RAM, where careful programming and attention to a computer's caches can mask the fact that accessing an N bit memory laid out in a 3-dimensional space necessarily takes \(\varOmega (N^{1/3})\) time.

If the cost of a qRAM gate is equivalent to \(\varTheta (N^{1/3})\) Clifford+T gates, then the asymptotic cost of quantum AllPair search is \(2^{(0.380\ldots + o(1))d}\), the asymptotic cost of quantum RandomBucket search is \(2^{(0.336\ldots + o(1))d}\), and the asymptotic cost of quantum ListDecoding search is \(2^{(0.284\ldots + o(1))d}\). If memory is constrained to two dimensions, and qRAM costs \(\varTheta (N^{1/2})\) Clifford+T gates, the quantum asymptotics match the classical RAM asymptotics.
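
For intuition, the AllPair exponent can be recovered directly. With a list of size \(N = 2^{(0.2075\ldots )d}\), the quantum variant performs N searches of roughly \(\sqrt{N}\) Grover iterations each; charging each iteration \(\varTheta (N^{1/3})\), respectively \(\varTheta (N^{1/2})\), for its qRAM access gives

$$\begin{aligned} N\cdot \sqrt{N}\cdot N^{1/3} = N^{11/6} = 2^{(0.380\ldots )d} \quad \text {and}\quad N\cdot \sqrt{N}\cdot N^{1/2} = N^{2} = 2^{(0.415\ldots )d}, \end{aligned}$$

the latter being the classical RAM exponent for AllPair search.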

Quantum Sampling Routines. We have assumed that \(\mathbf {D}\) in Sect. 4.1 (the uniform sampling subroutine in Grover’s algorithm) is implemented using parallel \(\mathbf {H}\) gates. This is the smallest possible circuit that might implement \(\mathbf {D}\), and may be a significant underestimate. In Line 12 of Algorithm 4 we must construct a superposition (ideally uniform) over \(\{k : v_k \in L_{F,j}\}\). The set \(L_{F,j}\) is presented as a disjoint union of smaller sets. Copying the elements of these smaller sets to a flat array would be more expensive than our estimate for the cost of search. While we do not expect the cost of sampling near uniformly from \(L_{F,j}\) to be large, it could easily exceed the cost of popcount.

7.2 Relevance to SVP

The NNS algorithms that we have analysed are closely related to lattice sieves for SVP. While the asymptotic cost of NNS algorithms are often used as a proxy for the asymptotic cost of solving SVP, we caution the reader against making this comparison in a non-asymptotic setting. On the one hand, our estimates might lead one to underestimate the cost of solving SVP:

  • the costs given in Fig. 2 represent one iteration of NNS within a sieve, while sieve algorithms make polynomially many such iterations;

  • the costs given in Fig. 2 do not account for all of the subroutines within each NNS algorithm.

On the other hand, our estimates might lead one to overestimate the cost of solving SVP:

  • it is a mistake to conflate the cost of NNS in dimension d with the cost of SVP in dimension d. The “dimensions for free” technique of  [17] can be used to solve SVP in dimension d by calling an NNS routine polynomially many times in dimension \(d' < d\). Our analysis seamlessly applies to dimension \(d'\);

  • there are heuristics that exploit structure present in applications to SVP not captured in our general setting, e.g. the vector space structure allowing both \(\pm u\) to be tested for the cost of \(u\), and keeping the vectors sorted by length.

7.3 Future Work

The sieving techniques considered here are not exhaustive. While it would be relatively easy to adapt our software to other \(2\)-sieves, like the cross polytope sieve  [10], future work might consider \(k\)-sieves such as  [7, 30].

Future work might also address the barriers to a quantum advantage discussed in Sect. 7.1. Two additional barriers are worth mentioning here. First, as Grover search does not parallelise well, one might consider depth restrictions for classical and quantum circuits. Second, our estimates might be refined by including some of the classical subroutines, present in both the classical and quantum variants of the same sieve, that we have ignored, e.g. the cost of sampling lattice vectors or the cost of list-decoding in Algorithm 4. Any cost increase will reduce the range of cryptanalytically relevant dimensions, leaving fewer dimensions in which to overcome quantum overheads.

Finally, our estimates should be checked against experiments. Our analysis of Algorithm 3 recommends a database of size \(N(d) \approx 2/C_d(\pi /3)\), while the largest sieving experiments to date  [2] run Algorithm 3 with a database of size \(N'(d) = 3.2 \cdot 2^{0.2075d}\) up to dimension \(d=127\). There is a factor of 8 gap between \(N'(127)\) and N(127). A factor of two can be explained by the fact that  [2] treats each database entry u as \(\pm u\). It is possible that the remaining factor of four can be explained by the other heuristics used in  [2]. As \(d\) increases, \(N(d)\) and \(N'(d)\) continue to diverge, so future work could attempt to determine the required list size more accurately.
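
The gap can be checked numerically. The snippet below assumes that \(C_d(\theta )\) is the usual spherical cap ratio \(\frac{1}{2} I_{\sin ^2\theta }(\frac{d-1}{2},\frac{1}{2})\), with I the regularised incomplete beta function; this may differ from the exact convention in our software by lower order terms.

import math
from scipy.special import betainc

def C(d, theta):
    # Ratio of the measure of a spherical cap of angle theta to the measure of S^{d-1}.
    return 0.5 * betainc((d - 1) / 2, 0.5, math.sin(theta) ** 2)

d = 127
N = 2 / C(d, math.pi / 3)           # recommended database size N(d)
N_prime = 3.2 * 2 ** (0.2075 * d)   # database size N'(d) used in [2]
print(N / N_prime)                  # close to the factor of 8 noted above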