Appendix
A Proof of Proposition 1
To prove Proposition 1, we will show how to choose a sequence of parameters \(\{(k_n, t_n)\}_{n \in \mathbb {N}}\) such that for large n, the following holds:
1. The probability that a list vector \(\varvec{w}\) close (see footnote 5) to a target vector \(\varvec{v}\) collides with \(\varvec{v}\) in at least one of the \(t\) hash tables is at least constant in \(n\):
$$\begin{aligned} p_1^* = \mathbb {P}_{\{h_{i,j}\} \subset \mathcal {H}}(\varvec{v}, \varvec{w} \text { collide} \mid \theta (\varvec{v}, \varvec{w}) \le \tfrac{\pi }{3}) \ge 1 - \varepsilon . \quad (\varepsilon \ne \varepsilon (n)) \end{aligned}$$
(6)
2. The average probability that a list vector \(\varvec{w}\) far away (see footnote 5) from a target vector \(\varvec{v}\) collides with \(\varvec{v}\) is exponentially small:
$$\begin{aligned} p_2^* = \mathbb {P}_{\{h_{i,j}\} \subset \mathcal {H}}(\varvec{v}, \varvec{w} \text { collide} \mid \theta (\varvec{v}, \varvec{w}) > \tfrac{\pi }{3}) \le N^{-0.5681 + o(1)}. \end{aligned}$$
(7)
3. The number of hash tables grows as \(t = N^{0.4319 + o(1)}\).
This would imply that for each search, the number of candidate vectors is of the order \(N \cdot N^{-0.5681} = N^{0.4319}\). Overall we search the list \(\tilde{O}(N)\) times, so after substituting \(N = (4/3)^{n/2 + o(n)}\) this leads to the following time and space complexities:
- Time (hashing): \(O(N \cdot t) = 2^{0.2972n + o(n)}\).
- Time (searching): \(O(N^2 \cdot p_2^*) = 2^{0.2972n + o(n)}\).
- Space: \(O(N \cdot t) = 2^{0.2972n + o(n)}\).
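The three exponents above can be checked with a few lines of arithmetic. The following is a minimal sketch that plugs in the rounded constants 0.4319 and 0.5681 from the text and ignores all \(o(n)\) terms; the variable names are illustrative only.

```python
import math

# Rounded constants taken from the text: t = N^0.4319 and p_2^* = N^-0.5681,
# with N = (4/3)^(n/2). All o(n) terms are ignored in this sketch.
log2_N_per_n = 0.5 * math.log2(4 / 3)      # (1/n) * log2(N), about 0.2075

hashing = log2_N_per_n * (1 + 0.4319)      # exponent of N * t
searching = log2_N_per_n * (2 - 0.5681)    # exponent of N^2 * p_2^*
space = log2_N_per_n * (1 + 0.4319)        # exponent of N * t

for name, value in [("hashing", hashing), ("searching", searching), ("space", space)]:
    print(f"{name}: 2^({value:.4f} n)")
```

All three values agree with the stated exponent 0.2972 up to rounding in the fourth decimal.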
The next two subsections are dedicated to proving Eqs. (6) and (7).
A.1 Good Vectors Collide with Constant Probability
The following lemma shows how to choose k (in terms of t) to guarantee that (6) holds.
Lemma 3
Let \(\varepsilon > 0\) and let \(k = 6 n^{-1/2} (\ln t - \ln \ln (1/\varepsilon )) \approx (6 \ln t) / \sqrt{n}\). Then the probability that reducing vectors collide in at least one of the hash tables is at least \(1 - \varepsilon \).
Proof
The probability that a reducing vector \(\varvec{w}\) is a candidate vector, given the angle \(\varTheta = \varTheta (\varvec{v}, \varvec{w}) \in (0, \frac{\pi }{3})\), is \(p_1^* = \mathbb {E}_{\varTheta \in (0, \frac{\pi }{3})} \left[ p^*(\varTheta )\right] \), where we recall that \(p^*(\theta ) = 1 - (1 - p(\theta )^k)^t\) and \(p(\theta ) = \mathbb {P}_{h \in \mathcal {H}}[h(\varvec{v}) = h(\varvec{w})]\) is given in Lemma 2. Since \(p^*(\varTheta )\) is strictly decreasing in \(\varTheta \), we can obtain a lower bound by substituting \(\varTheta = \frac{\pi }{3}\) above. Using the bound \(1 - x \le e^{-x}\) which holds for all x, and inserting the given expression for k, we obtain \(p_1^* \ge p^*\left( \tfrac{\pi }{3}\right) = 1 - (1 - \exp (\ln \ln (\tfrac{1}{\varepsilon }) - \ln t))^t = 1 - \left( 1 - \tfrac{\ln (1/\varepsilon )}{t}\right) ^t \ge 1 - \varepsilon \).
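The final inequality of the proof can be sanity-checked numerically. The sketch below evaluates the choice of \(k\) from Lemma 3 and the lower bound \(1 - (1 - \ln (1/\varepsilon )/t)^t\) for a few sample values; the function names and the sample parameters are assumptions made for illustration, and the \(1 + o(1)\) factor of Lemma 2 is ignored.

```python
import math

def k_from_lemma3(n: int, t: float, eps: float) -> float:
    """k = 6 n^(-1/2) (ln t - ln ln(1/eps)), as stated in Lemma 3."""
    return 6.0 / math.sqrt(n) * (math.log(t) - math.log(math.log(1.0 / eps)))

def collision_lower_bound(t: int, eps: float) -> float:
    """Lower bound 1 - (1 - ln(1/eps)/t)^t on p_1^* from the proof."""
    return 1.0 - (1.0 - math.log(1.0 / eps) / t) ** t

eps = 0.1  # sample value, chosen only for illustration
for t in (10, 1000, 10**6):
    bound = collision_lower_bound(t, eps)
    print(f"t={t}: k={k_from_lemma3(100, t, eps):.3f}, bound={bound:.6f}, target={1 - eps}")
```

For each sample \(t\) the bound stays at or above \(1 - \varepsilon \), as the inequality \(1 - x \le e^{-x}\) predicts.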
A.2 Bad Vectors Collide with Low Probability
We first recall a lemma about the density of angles between random vectors. In short, the density at an angle \(\theta \) is proportional to \((\sin \theta )^{n-2}\).
Lemma 4
[24, Lemma 4] Assuming Heuristic 1 holds, the pdf \(f(\theta )\) of the angle between target vectors and list vectors satisfies
$$\begin{aligned} f(\theta ) = \sqrt{\frac{2n}{\pi }} \ (\sin \theta )^{n-2} \left[ 1 + o(1)\right] = 2^{n\log _2\sin \theta + o(n)}. \end{aligned}$$
(8)
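As a quick plausibility check of the leading constant in (8), one can verify that \(\sqrt{2n/\pi } \int _0^{\pi /2} (\sin \theta )^{n-2} d\theta \) tends to 1. The sketch below uses the standard closed form \(\int _0^{\pi /2} (\sin \theta )^m d\theta = \frac{\sqrt{\pi }}{2} \varGamma (\frac{m+1}{2}) / \varGamma (\frac{m}{2} + 1)\); the function name is an assumption for illustration.

```python
import math

def normalization_ratio(n: int) -> float:
    """sqrt(2n/pi) times the integral of (sin x)^(n-2) over (0, pi/2).

    The integral is evaluated with the Gamma-function closed form,
    using lgamma to avoid overflow for large n.
    """
    integral = 0.5 * math.sqrt(math.pi) * math.exp(
        math.lgamma((n - 1) / 2) - math.lgamma(n / 2)
    )
    return math.sqrt(2 * n / math.pi) * integral

for n in (10, 100, 1000, 10000):
    print(n, normalization_ratio(n))
```

The ratio approaches 1 as \(n\) grows, consistent with the \(1 + o(1)\) factor in (8).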
The following lemma relates the collision probability \(p_2^*\) of (7) to the parameters k and t. Since Lemma 3 relates k to t, this means that only t ultimately remains to be chosen.
Lemma 5
Suppose \(N = 2^{c_n \cdot n}\) with \(c_n \ge \gamma _1 = \frac{1}{2} \log _2(\frac{4}{3}) \approx 0.2075\), and suppose \(t = 2^{c_t \cdot n}\). Let \(k = \frac{6 \ln t}{\sqrt{n}}(1 - o(1))\). Then, for large n, under Heuristic 1 we have
$$\begin{aligned} p_2^* = \mathbb {P}_{\{h_{i,j}\} \subset \mathcal {H}}(\varvec{v}, \varvec{w} \text { collide} \mid \theta (\varvec{v}, \varvec{w}) > \tfrac{\pi }{3}) \le O(N^{-\alpha }), \end{aligned}$$
(9)
where \(\alpha \in (0,1)\) is defined as
$$\begin{aligned} \alpha = \frac{-1}{c_n}\left[ \max _{\theta \in (\frac{\pi }{3}, \frac{\pi }{2})} \left\{ \log _2 \sin \theta - \left( 3 \tan ^2 \left( \frac{\theta }{2}\right) - 1\right) c_t\right\} \right] + o(1). \end{aligned}$$
(10)
Proof
First, if we know the angle \(\theta \in (\frac{\pi }{3}, \frac{\pi }{2})\) between two bad vectors, then according to Lemma 2 the probability of a collision in at least one of the hash tables is equal to
$$\begin{aligned} p^*(\theta ) = 1 - \left( 1 - \exp \left[ -\frac{k\sqrt{n}}{2} \tan ^2\left( \frac{\theta }{2}\right) (1 + o(1))\right] \right) ^t\!\!. \end{aligned}$$
(11)
Letting \(f(\theta )\) denote the density of angles \(\theta \) on \((\frac{\pi }{3}, \frac{\pi }{2})\), we have
$$\begin{aligned} p_2^* = \mathbb {E}_{\varTheta \in (\frac{\pi }{3}, \frac{\pi }{2})}\left[ p^*(\varTheta )\right] = \int _{\pi /3}^{\pi /2} f(\theta ) p^*(\theta ) d\theta . \end{aligned}$$
(12)
Substituting \(p^*(\theta )\) and the expression of Lemma 4 for \(f(\theta )\), noting that \(\int _{\pi /3}^{\pi /2} f(\theta ) d\theta \approx \int _0^{\pi /2} f(\theta ) d\theta = 1\), we get
$$\begin{aligned} p_2^* = \int _{\pi /3}^{\pi /2} (\sin \theta )^n \left[ 1 - \left( 1 - \exp \left[ -3 \ln t\tan ^2\left( \tfrac{\theta }{2}\right) (1 + o(1))\right] \right) ^t\right] d\theta . \end{aligned}$$
(13)
For convenience, let us write \(w(\theta ) = -3 \ln t\tan ^2\left( \frac{\theta }{2}\right) (1 + o(1))\). Note that for \(\theta \gg \frac{\pi }{3}\) we have \(w(\theta ) \ll -\ln t\) so that \((1 - \exp w(\theta ))^t \approx 1 - t \exp w(\theta )\), in which case we can simplify the expression between square brackets. However, the integration range includes \(\frac{\pi }{3}\) as well, so to be careful we will split the integration interval at \(\frac{\pi }{3} + \delta \), where \(\delta = \varTheta (n^{-1/2})\). (Note that any value \(\delta \) with \(\frac{1}{n} \ll \delta \ll 1\) suffices.)
$$\begin{aligned} p_2^* = \underbrace{\int _{\pi /3}^{\pi /3 + \delta } f(\theta ) p^*(\theta ) d\theta }_{I_1} + \underbrace{\int _{\pi /3 + \delta }^{\pi /2} f(\theta ) p^*(\theta ) d\theta }_{I_2}. \end{aligned}$$
(14)
Bounding \(I_1\). Using \(f(\theta ) \le f(\frac{\pi }{3} + \delta )\), \(p^*(\theta ) \le 1\), and \(\sin (\frac{\pi }{3} + \delta ) = \frac{1}{2} \sqrt{3} \left[ 1 + O(\delta )\right] \) (which follows from a Taylor expansion of \(\sin x\) around \(x = \frac{\pi }{3}\)), we obtain
$$\begin{aligned} I_1 \le \text {poly}(n) \sin ^n(\tfrac{\pi }{3} + \delta ) = \text {poly}(n) (\tfrac{\sqrt{3}}{2})^n \left( 1 + O(\delta )\right) ^n = 2^{-\gamma _1 n + o(n)}. \end{aligned}$$
(15)
Bounding \(I_2\). For \(I_2\), our choice of \(\delta \) is sufficient to make the aforementioned approximation work (see footnote 6). Thus, for \(I_2\) we obtain the simplified expression
$$\begin{aligned} I_2&\le \text {poly}(n) \int _{\pi /3 + \delta }^{\pi /2} (\sin \theta )^n t \exp \left[ -3 \ln t\tan ^2\left( \frac{\theta }{2}\right) (1 + o(1))\right] d\theta \end{aligned}$$
(16)
$$\begin{aligned}&\le \int _{\pi /3}^{\pi /2} 2^{n \log _2 \sin \theta - (3 \tan ^2\left( \frac{\theta }{2}\right) - 1) \log _2 t + o(n)} d\theta . \end{aligned}$$
(17)
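The step from (16) to (17) is a change of base in the exponent. Written out, and assuming as in Lemma 5 that \(\log _2 t = O(n)\) so that the \(1 + o(1)\) factor and the \(\text {poly}(n)\) prefactor can be absorbed into \(2^{o(n)}\), it reads:

```latex
$$\begin{aligned}
t \exp \left[ -3 \ln t \tan ^2\left( \tfrac{\theta }{2}\right) \right]
  &= t \cdot t^{-3 \tan ^2(\theta /2)}
   = t^{\,1 - 3 \tan ^2(\theta /2)}
   = 2^{-\left( 3 \tan ^2(\theta /2) - 1\right) \log _2 t}, \\
(\sin \theta )^n &= 2^{n \log _2 \sin \theta }.
\end{aligned}$$
```

Multiplying the two lines gives the integrand of (17).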
Note that the integrand is exponential in \(n\) and that the exponent \(E(\theta ) = n \log _2 \sin \theta - (3 \tan ^2 \frac{\theta }{2} - 1) \log _2 t\) is a continuous, differentiable function of \(\theta \). So the asymptotic behavior of the entire integral \(I_2\) is the same as the asymptotic behavior of the integrand’s maximum value:
$$\begin{aligned} \log _2 I_2&\le \max _{\theta \in (\frac{\pi }{3}, \frac{\pi }{2})} \big \{n \log _2 \sin \theta - \left( 3 \tan ^2 \tfrac{\theta }{2} - 1\right) \log _2 t \big \} + o(n). \end{aligned}$$
(18)
Bounding \(p_2^* = I_1 + I_2\). Combining (15), (18), and \(c_t = \frac{1}{n} \log _2 t\), we have
$$\begin{aligned} \tfrac{\log _2 p_2^*}{n} \le \max \{-\gamma _1, \ \max _{\theta \in (\frac{\pi }{3}, \frac{\pi }{2})} \{\log _2 \sin \theta - (3 \tan ^2 \tfrac{\theta }{2} - 1) c_t \}\} + o(1). \end{aligned}$$
(19)
The assumption \(c_n \ge \gamma _1\) and the definition of \(\alpha \) (which satisfies \(\alpha \le 1\)) now give \(\log _2 p_2^* \le -\alpha c_n n + o(n)\), which completes the proof.
A.3 Balancing the Parameters
Recall that the overall time and space complexities are given by \(O(N \cdot t) = 2^{(c_n + c_t)n + o(n)}\) (time for hashing), \(O(N^2 \cdot p_2^*) = 2^{(c_n + (1 - \alpha ) c_n)n + o(n)}\) (time for comparing vectors), and \(O(N \cdot t) = 2^{(c_n + c_t)n + o(n)}\) (memory requirement). For the overall time and space complexities \(2^{c_{\text {time}} n}\) and \(2^{c_{\text {space}} n}\) we find
$$\begin{aligned} c_{\text {time}}&= c_n + \max \{c_t, (1 - \alpha ) c_n\} + o(1), \quad c_{\text {space}} = c_n + c_t + o(1). \end{aligned}$$
(20)
Further recall that from Nguyen and Vidick’s analysis, we have \(N = (4/3)^{n/2 + o(n)}\) or \(c_n = \gamma _1\). To balance the time complexities of hashing and searching, so that the overall time complexity is minimized, we solve \((1 - \alpha ) \gamma _1 = c_t\) numerically (see footnote 7) for \(c_t\) to obtain the following corollary. Here \(\theta ^*\) denotes the dominant angle \(\theta \) maximizing the expression in (10). Note that the final result takes into account the density at \(\theta = \theta ^*\) as well, and so the result does not simply follow from Lemma 2.
Corollary 1
Taking \(c_t \approx 0.089624\) leads to:
$$\begin{aligned} \theta ^* \approx 0.42540 \pi , \ \alpha \approx 0.56812, \ c_{{\text {time}}} \approx 0.29714, \ c_{{\text {space}}} \approx 0.29714. \end{aligned}$$
(21)
Thus, setting \(t \approx 2^{0.08962 n}\) and \(k = \varTheta (\sqrt{n})\), the heuristic time and space complexities of the SphereSieve algorithm are balanced at \(2^{0.29714n + o(n)}\).
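The constants of Corollary 1 can be reproduced with a short numerical experiment. The sketch below is one possible way to do this and is not necessarily the procedure used for the paper: it evaluates (10) by a plain grid search over \((\frac{\pi }{3}, \frac{\pi }{2})\) and solves \((1 - \alpha ) \gamma _1 = c_t\) by bisection, ignoring all \(o(1)\) terms. The function names, grid size, and tolerance are assumptions.

```python
import math

GAMMA1 = 0.5 * math.log2(4 / 3)  # c_n = gamma_1 for the Nguyen-Vidick list size

def exponent(theta: float, c_t: float) -> float:
    """Expression inside the maximum in (10)."""
    return math.log2(math.sin(theta)) - (3 * math.tan(theta / 2) ** 2 - 1) * c_t

def alpha(c_t: float, c_n: float = GAMMA1, steps: int = 20000):
    """Return (alpha, theta_star) from (10) via grid search on (pi/3, pi/2)."""
    lo, hi = math.pi / 3, math.pi / 2
    thetas = (lo + (hi - lo) * i / steps for i in range(1, steps))
    theta_star = max(thetas, key=lambda th: exponent(th, c_t))
    return -exponent(theta_star, c_t) / c_n, theta_star

def balanced_c_t(tol: float = 1e-9) -> float:
    """Bisection on (1 - alpha(c_t)) * gamma_1 - c_t, which decreases in c_t."""
    lo, hi = 0.0, GAMMA1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        a_mid, _ = alpha(mid)
        if (1 - a_mid) * GAMMA1 > mid:
            lo = mid   # searching still dominates: more hash tables help
        else:
            hi = mid   # hashing dominates: fewer hash tables needed
    return (lo + hi) / 2

c_t = balanced_c_t()
a, theta_star = alpha(c_t)
print(f"c_t ~ {c_t:.6f}, theta* ~ {theta_star / math.pi:.5f} pi, alpha ~ {a:.5f}")
print(f"c_time ~ c_space ~ {GAMMA1 + c_t:.5f}")
```

The bisection is valid because the coefficient \(3 \tan ^2 \frac{\theta }{2} - 1\) is positive on \((\frac{\pi }{3}, \frac{\pi }{2})\), so the maximum in (10) decreases as \(c_t\) grows.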
A.4 Trade-Off Between the Space and Time Complexities
Finally, note that \(c_t = 0\) leads to the original Nguyen-Vidick sieve algorithm, while \(c_t \approx 0.089624\) minimizes the heuristic time complexity at the cost of more space. One can obtain a continuous trade-off between these two extremes by considering values \(c_t \in (0, 0.089624)\). Numerically evaluating the resulting complexities for this range of values of \(c_t\) leads to the curve shown in Fig. 1.
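The trade-off can be tabulated directly from (10) and (20). The following sketch assumes \(c_n = \gamma _1\), drops all \(o(1)\) terms, and uses a simple grid search for the maximum in (10); the function names and the sample values of \(c_t\) are illustrative assumptions.

```python
import math

GAMMA1 = 0.5 * math.log2(4 / 3)  # c_n = gamma_1

def alpha(c_t: float, c_n: float = GAMMA1, steps: int = 5000) -> float:
    """alpha from (10), with the maximum found by grid search on (pi/3, pi/2)."""
    lo, hi = math.pi / 3, math.pi / 2
    best = max(
        math.log2(math.sin(th)) - (3 * math.tan(th / 2) ** 2 - 1) * c_t
        for th in (lo + (hi - lo) * i / steps for i in range(1, steps))
    )
    return -best / c_n

def tradeoff(c_t: float, c_n: float = GAMMA1):
    """Return (c_time, c_space) as in (20), without the o(1) terms."""
    a = alpha(c_t, c_n)
    c_time = c_n + max(c_t, (1 - a) * c_n)
    c_space = c_n + c_t
    return c_time, c_space

for i in range(6):
    c_t = 0.089624 * i / 5
    c_time, c_space = tradeoff(c_t)
    print(f"c_t={c_t:.4f}  c_time={c_time:.4f}  c_space={c_space:.4f}")
```

At \(c_t = 0\) this gives \(c_{\text {time}} \approx 2\gamma _1\) and \(c_{\text {space}} = \gamma _1\), and at \(c_t \approx 0.089624\) both exponents meet near 0.29714, matching the two extremes described above.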