1 Introduction

Among all the candidates submitted in 2017 to the NIST standardization of post-quantum cryptography, the majority are based on hard lattice problems, such as the LWE and NTRU problems. Unfortunately, security estimates for lattice problems are known to be difficult: many different assessments exist in the research literature, which is reflected in the wide range of security estimates in NIST submissions (see [2]), depending on the model used. One reason is that the performance of lattice algorithms depends on many parameters: we do not know how to select these parameters optimally, and we do not know how far from optimal current parameter selections are. The most sensitive issue is the evaluation of the cost of a subroutine that finds shortest or nearly shortest lattice vectors in certain dimensions (typically the blocksize of blockwise reduction algorithms). In state-of-the-art lattice reduction software [7, 9, 11], this subroutine is implemented by lattice enumeration with extreme pruning, introduced at Eurocrypt ’10 by Gama, Nguyen and Regev [16] as a generalization of pruning methods introduced by Schnorr et al. [34, 35] in the 90s. Yet, most lattice-based NIST submissions chose their parameters based on the assumption that sieving [1, 8, 20, 22, 28] (rather than enumeration) is the most efficient algorithm for this subroutine. This choice goes back to the analysis of NewHope [3, Sect. 6], which states that sieving is more efficient than enumeration in dimension \(\ge 250\) for both classical and quantum computers, based on a lower bound on the cost of sieving (ignoring subexponential terms) and an upper bound on the cost of enumeration (either [11, Table 4] or [10, Table 5.2]). In dimensions around \(140-150\), this upper bound is very close to the actual running times for solving the largest record SVP challenges [32], which does not leave much margin for future progress; and for dimensions \(\ge 250\), a numerical extrapolation has been used, which is also debatable.

It would be more consistent to compare the sieving lower bound with a lower bound on lattice enumeration with extreme pruning. Unfortunately, no such lower bound is known: the performance of extreme pruning strongly depends on the choice of bounding function, and it is unknown how good such a function can be. There is only a partial lower bound on the cost of extreme pruning in [11], assuming that the choice of step bounding function analyzed in [16] is optimal. And this partial lower bound is much lower than the upper bound given in [10, 11].

Our results. We study the limitations of lattice enumeration with extreme pruning. We prove the first lower bound on the cost of extreme pruning, given a lower bound on the global success probability. This is done by studying the case of a single enumeration with cylinder pruning, and generalizing it to the extreme pruning case of multiple enumerations, possibly infinitely many. Our results are based on geometric properties of cylinder intersections and a probabilistic form of isoperimetry: usually, isoperimetry refers to a geometric inequality involving the surface area of a set and its volume.

Our lower bounds are easy to compute and appear to be reasonably tight in practice, at least in the single enumeration case: we introduce a cross-entropy-based method which experimentally finds upper bounds somewhat close to our lower bounds.

Impact. By combining our lower bounds with models of strongly-reduced lattice bases introduced in [7, 11, 26] and quantum speed-ups for enumeration [6], we obtain more sound comparisons with sieving: see Fig. 1 for an overview. It suggests that enumeration is faster than sieving up to higher dimensions than previously considered by lattice-based submissions to NIST post-quantum cryptography standardization: the cost lower bound used by many NIST submissions is not as conservative as previously believed, especially in the quantum setting. Concretely, in the quantum setting, the lower bounds of enumeration and sieving cross in dimensions roughly 300–400 in the HKZ-basis model or beyond 500 in the Rankin-basis model, depending on how many enumerations are allowed. We note that in high dimension, our lower bound for enumeration with \(10^{10}\) HKZ bases is somewhat close to the numerical extrapolation of [17, (2)], called Core-Enum+O(1) in [2].

Fig. 1. Upper/lower bounds on the classical/quantum cost of enumeration with cylinder pruning, using strongly-reduced basis models. See Sect. 5 for the exact meaning of these curves: the lower bounds correspond to (16) and (17), and the upper bounds are found by the algorithm of Sect. 4. For comparison, we also display several curves from [2]: \(2^{0.292 n}\) and \(2^{0.265 n}\) as the simplified classical/quantum complexities of sieve algorithms, and the numerical extrapolation of the enumeration cost of [17, (2)]. (Color figure online)

Technical overview. Enumeration is the simplest algorithm to solve hard lattice problems: it outputs \(L \cap B\), given a lattice L and an n-dimensional ball \(B \subseteq \mathbb {R}^n\). It dates back to the early 1980s [15, 18, 29] but has been significantly improved in practice in the past twenty years, thanks to pruning methods introduced by Schnorr et al. [33,34,35], later revisited and generalized as, respectively, cylinder pruning by Gama, Nguyen and Regev [16] and discrete pruning by Aono and Nguyen [5]: pruning methods offer a trade-off by enumerating over a special subset \(S \subseteq B\), at the expense of missing solutions. Gama et al. [16] introduced the idea of extreme pruning, where one repeats pruned enumeration many times over different sets S: this can be globally faster than full enumeration, even if a single enumeration has a negligible probability of returning solutions. In the case of cylinder pruning, [16] showed that the speed-up can be asymptotically exponential for simple choices of the pruning subset S.

Cylinder pruning uses the intersection S of n cylinders defined by a lattice basis and a bounding function f: by using different lattice bases B, one obtains different sets S. The running time and the success probability of cylinder pruning depend on the quality of the basis and on the bounding function f. But when one uses different bases, these bases typically have approximately the same quality, which allows us to focus on f, which determines the radii of S.

The probability of success of cylinder pruning is related to the volume of S, whereas its cost is related to the volumes of the ‘canonical’ projections of S. We show that if the success probability is lower bounded, that is, if S is sufficiently big (with respect to its volume, or its Gaussian measure for the case of solving LWE), then the function f defining S can be lower bounded: as a special case, if S occupies a large fraction of the ball, f is lower bounded by essentially the linear pruning function of [16]. This immediately gives lower bounds on the volumes of the projections of S, but we significantly improve these direct lower bounds using the following basic form of isoperimetry: for certain distributions such as the Gaussian distribution, among all Borel sets of a given volume, the ball centered at the origin has the largest probability. The extreme pruning case is obtained by a refinement of isoperimetry over finitely many sets: it is somewhat surprising that we obtain a lower bound even in the extreme case where we allow infinitely many sets S.

All our lower bounds are easy to compute. To evaluate their tightness, we introduce a method based on cross-entropy to compute good upper bounds in practice, i.e., good choices of f. This is based on earlier work by Chen [10].

Open problem. Our lower bounds are specific to cylinder pruning [16]. It would be interesting to obtain tight lower bounds for discrete pruning [5].

Roadmap. In Sect. 2, we introduce background and notation on lattices, enumeration and its cost estimations. Section 3 presents our lower bounds as geometric properties of cylinder intersections. Section 4 shows how to obtain good upper bounds in practice, by finding nice cylinder intersections using cross-entropy. Finally, in Sect. 5, we evaluate the tightness of our lower bounds and discuss security estimates for the hardness of finding nearly shortest lattice vectors. The appendix includes proofs of technical results. The full version of this paper on eprint also includes sage scripts to compute our lower bounds.

2 Background

2.1 Notation

Throughout the paper, we use row representations of matrices. The Euclidean norm of a vector \(\mathbf {v}\in \mathbb {R}^{n}\) is denoted \(\left\| \mathbf {v}\right\| \). The ‘canonical’ projection of \(\mathbf {u} \in \mathbb {R}^n\) onto \(\mathbb {R}^k\) for \(1 \le k \le n\) is the truncation \(\tau _k(\mathbf {u}) = (u_1,\dots ,u_k)\).

Measures. We denote by \(\mathrm {vol}\) the standard Lebesgue measure over \(\mathbb {R}^n\). We denote by \(\rho _{n,\sigma }\) the centered Gaussian measure of variance \(\sigma ^2\), whose pdf over \(\mathbb {R}^n\) is

$$ (2\pi \sigma ^2)^{-n/2} e^{-\Vert \mathbf {x} \Vert ^2/(2\sigma ^2)}.$$

The standard Gaussian measure is \(\rho _n = \rho _{n,1}\).

Balls. We denote by \(\mathrm {Ball}_n(R)\) the n-dimensional zero-centered ball of radius R. Let \(V_n(R)=\mathrm {vol}( \mathrm {Ball}_n(R))\). Let \(\mathbf {u}=(u_1,\dots ,u_n)\) be a point chosen uniformly at random from the unit sphere \(S^{n-1}\), e.g. \(u_i = x_i/\sqrt{\sum _{j=1}^n x_j^2}\), where \(x_1,\dots ,x_n\) are independent, normally distributed random variables with mean 0 and variance 1. Then \( \Vert \tau _k(\mathbf {u})\Vert ^2 = \frac{\sum _{i=1}^k x_i^2}{\sum _{i=1}^k x_i^2 + \sum _{i=k+1}^n x_i^2 }=\frac{X}{X+Y},\) where X and Y have distributions \(\text {Gamma}(k/2,\theta =2)\) and \(\text {Gamma}((n-k)/2,\theta =2)\) respectively. Here, we use the scale parametrization to represent Gamma distributions. Hence, \(\Vert \tau _k(\mathbf {u})\Vert ^2\) has distribution \(\text {Beta}(k/2,(n-k)/2)\). In particular, \(\Vert \tau _{n-2}(\mathbf {u})\Vert ^2\) has distribution \(\text {Beta}(n/2-1,1)\), whose pdf is \(x^{(n/2)-2}/B(n/2-1,1)=(n/2-1) x^{(n/2)-2}\). It follows that the truncation \(\tau _{n-2}(\mathbf {u})\) is uniformly distributed over \(\mathrm {Ball}_{n-2}(1)\), which allows us to transfer our results to random points in balls.
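As a sanity check of this distributional fact, the following minimal Python sketch (assuming numpy and scipy are available; it is an illustration, not one of the scripts from the full version) samples uniform points on \(S^{n-1}\) and tests \(\Vert \tau _k(\mathbf {u})\Vert ^2\) against \(\text {Beta}(k/2,(n-k)/2)\):

```python
import numpy as np
from scipy import stats

n, k, N = 20, 8, 100_000
rng = np.random.default_rng(0)

# Uniform points on the unit sphere S^{n-1}: normalized standard Gaussians.
x = rng.standard_normal((N, n))
u = x / np.linalg.norm(x, axis=1, keepdims=True)

# Squared norm of the truncation tau_k(u) = (u_1, ..., u_k).
t = np.sum(u[:, :k] ** 2, axis=1)

# Kolmogorov-Smirnov test against Beta(k/2, (n-k)/2): a small statistic
# (and a non-tiny p-value) is consistent with the distribution stated above.
print(stats.kstest(t, stats.beta(k / 2, (n - k) / 2).cdf))
```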

Recall that the cumulative distribution function of the \(\text {Beta}(a,b)\) distribution is the regularized incomplete beta function \(I_x(a,b)\) defined as:

$$\begin{aligned} I_x(a,b) = \frac{1}{B(a,b)}\int _0^x u^{a-1} (1-u)^{b-1} du , \end{aligned}$$
(1)

where \(B(a,b) = \frac{\Gamma (a)\Gamma (b)}{\Gamma (a+b)}\) denotes the beta function. We have the following elementary bounds (by integrating by parts):

$$\begin{aligned} \frac{x^a (1-x)^{b-1}}{a B(a,b)} \le I_x(a,b) \qquad \forall a> 0,\ b \ge 1,\ 0 \le x \le 1 \end{aligned}$$
(2)
$$\begin{aligned} I_x(a,b) \le \frac{x^a}{a\cdot B(a,b)} \qquad \forall a > 0,\ b \ge 1,\ 0 \le x \le 1 \end{aligned}$$
(3)

For \(z\in [0,1]\) and \(a,b>0\), we have \(I_z^{-1}(a,b) + I_{1-z}^{-1}(b,a)=1\), which is immediate from the relation \(I_x(a,b)+I_{1-x}(b,a)=1\).

Finally, \(P(s,x)=\int _{0}^x t^{s-1}e^{-t}dt/\Gamma (s)\) is the regularized incomplete gamma function.
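All the special functions used in this paper (\(I_x(a,b)\), its inverse, and \(P(s,x)\)) are available in standard libraries; for instance, a short scipy-based sketch (our illustrative choice of toolchain):

```python
from scipy.special import betainc, betaincinv, gammainc, gammaincinv

# Regularized incomplete beta function I_x(a, b) and its inverse in x.
print(betainc(2.0, 3.0, 0.4))        # I_{0.4}(2, 3)
print(betaincinv(2.0, 3.0, 0.5))     # x such that I_x(2, 3) = 0.5

# Regularized lower incomplete gamma function P(s, x) and its inverse.
print(gammainc(0.5, 0.5))            # P(1/2, 1/2) ~ 0.6827
print(gammaincinv(0.5, 0.6827))      # x such that P(1/2, x) ~ 0.6827
```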

Lattices. A lattice L is a discrete subgroup of \(\mathbb {R}^{m}\), or equivalently the set \(L(\mathbf {b}_{1},\dots ,\mathbf {b}_{n})=\left\{ \sum _{i=1}^{n}x_{i}\mathbf {b}_{i} ~:~ x_{i}\in \mathbb {Z}\right\} \) of all integer combinations of n linearly independent vectors \(\mathbf {b}_{1},\dots ,\mathbf {b}_{n} \in \mathbb {R}^{m}\). Such \(\mathbf {b}_i\)’s form a basis of L. All the bases of L have the same number n of elements, called the dimension or rank of L, and the same n-dimensional volume of the parallelepiped \(\left\{ \sum _{i=1}^{n}a_{i}\mathbf {b}_{i} ~:~ a_{i}\in [0,1) \right\} \) they generate. We call this volume the co-volume, or determinant, of L, and denote it by \(\mathrm {covol}(L)\). The lattice L is said to be full-rank if \(n=m\). The most famous lattice problem is the shortest vector problem (SVP), which asks to find a non-zero lattice vector of minimal Euclidean norm. The closest vector problem (CVP) asks to find a lattice vector closest to a target vector.

Orthogonalization. For a basis \(B=(\mathbf {b}_{1},\dots ,\mathbf {b}_{n})\) of a lattice L and \(i\in \{1,\ldots ,n\}\), we denote by \(\pi _{i}\) the orthogonal projection on \(\mathrm {span}(\mathbf {b}_{1},\dots ,\mathbf {b}_{i-1})^{\perp }\). The Gram-Schmidt orthogonalization of the basis B is defined as the sequence of orthogonal vectors \(B^{\star } = (\mathbf {b}^{\star }_{1},\dots ,\mathbf {b}^{\star }_{n})\), where \(\mathbf {b}^{\star }_i := \pi _i(\mathbf {b}_i)\). We can write each \(\mathbf {b}_i\) as \(\mathbf {b}^{\star }_{i}+\sum _{j=1}^{i-1}\mu _{i,j}\mathbf {b}^{\star }_{j}\) for some unique \(\mu _{i,1},\ldots ,\mu _{i,i-1} \in \mathbb {R}\). Thus, we may represent the \(\mu _{i,j}\)’s by a lower-triangular matrix \(\mu \) with unit diagonal. The projection of a lattice may not be a lattice, but \(\pi _{i}(L)\) is an \((n+1-i)\)-dimensional lattice generated by \(\pi _{i}(\mathbf {b}_{i}),\dots ,\pi _{i}(\mathbf {b}_{n})\), with \(\mathrm {covol}(\pi _{i}(L))=\prod _{j=i}^{n}\big \Vert \mathbf {b}^{\star }_{j} \big \Vert \).
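For concreteness, here is a minimal numpy sketch of row-wise Gram-Schmidt orthogonalization returning \(B^\star \) and \(\mu \); it is an illustration of the definitions above, neither optimized nor numerically robust:

```python
import numpy as np

def gram_schmidt(B):
    """Row-wise Gram-Schmidt: returns (B_star, mu) with B = mu @ B_star."""
    n = B.shape[0]
    B_star = np.array(B, dtype=float)
    mu = np.eye(n)
    for i in range(n):
        for j in range(i):
            mu[i, j] = B[i] @ B_star[j] / (B_star[j] @ B_star[j])
            B_star[i] -= mu[i, j] * B_star[j]
    return B_star, mu

B = np.array([[2.0, 0.0, 1.0], [1.0, 2.0, 0.0], [0.0, 1.0, 2.0]])
B_star, mu = gram_schmidt(B)
# covol(L) equals the product of the Gram-Schmidt norms (= |det B| here).
print(np.prod(np.linalg.norm(B_star, axis=1)), abs(np.linalg.det(B)))
```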

The Gaussian Heuristic. For a full-rank lattice L in \(\mathbb {R}^n\) and a measurable set \(C\subset \mathbb {R}^n\), the Gaussian heuristic estimates the number of lattice points inside of C to be approximately \(\mathrm {vol}(C)/\mathrm {covol}(L)\). Accordingly, we would expect the first minimum \(\lambda _1(L)\) (the norm of a shortest non-zero vector of L) to be close to \(\mathrm {GH}(L) = V_n(1)^{-1/n} \mathrm {covol}(L)^{1/n}\), which holds for a random lattice L.

Cylinders. The performance of cylinder pruning is directly related to the following bodies. Define the (k-dimensional) cylinder-intersection of radii \(R_{1}\le \dots \le R_{k}\) as the set

$$ C_{R_{1},\dots ,R_{k}}=\left\{ \left( x_{1},\dots ,x_{k}\right) \in \mathbb {R}^{k},\ \forall j\le k,\ \sum _{\ell =1}^{j}x_{\ell }^{2}\le R_{j}^{2}\right\} \subseteq \mathrm {Ball}_k(R_k). $$

Gama et al. [16] showed how to efficiently compute tight lower and upper bounds for \(\mathrm {vol}(C_{R_{1},\dots ,R_{k}})\), thanks to the Dirichlet distribution and special integrals.
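Those bounds require some care to implement; as a crude but self-contained alternative, the volume ratio \(\mathrm {vol}(C_{R_{1},\dots ,R_{k}})/V_k(R_k)\) can be estimated by plain Monte-Carlo sampling, as in the following sketch (the radii and sample size are arbitrary illustrations, and this is not the method of [16]):

```python
import numpy as np

def cylinder_volume_ratio(R, N=200_000, seed=0):
    """Monte-Carlo estimate of vol(C_{R_1..R_k}) / V_k(R_k), R_1 <= ... <= R_k."""
    R = np.asarray(R, dtype=float)
    k = len(R)
    rng = np.random.default_rng(seed)
    # Uniform samples in Ball_k(R_k): Gaussian direction, radius R_k * U^{1/k}.
    x = rng.standard_normal((N, k))
    x *= (R[-1] * rng.random(N) ** (1.0 / k) / np.linalg.norm(x, axis=1))[:, None]
    # Keep samples satisfying every prefix constraint sum_{l<=j} x_l^2 <= R_j^2.
    return np.mean(np.all(np.cumsum(x ** 2, axis=1) <= R ** 2, axis=1))

R = np.sqrt(np.arange(1, 41) / 40.0)   # linear radii with R_k = 1, k = 40
print(cylinder_volume_ratio(R))
```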

2.2 Enumeration with Cylinder Pruning

To simplify notation, we focus on the SVP setting, i.e., finding short lattice vectors, rather than the more general CVP setting. Let L be a full-rank lattice in \(\mathbb {R}^n\). Given a basis \(B=(\mathbf {b}_1,\dots ,\mathbf {b}_n)\) of L and a radius \(R>0\), Enumeration [15, 18, 29] outputs \(L \cap S\) where \(S=\mathrm {Ball}_{n}(R)\), by a depth-first tree search: by comparing the norms of all the vectors obtained, one extracts a shortest non-zero lattice vector.

We follow the general pruning framework of [5], which replaces S by a subset of S depending on B. Given a function \(f: \{1,\dots ,n\} \rightarrow [0,1]\), Gama et al. [16] introduced the following set to generalize the pruned enumeration of [34, 35]:

$$\begin{aligned} P_f(B,R) = \{ \mathbf {x} \in \mathbb {R}^n \,\,\text {s.t.}\,\, \Vert \pi _{n+1-i}(\mathbf {x})\Vert \le f(i) R \,\,\text {for all}\,\, 1 \le i \le n \}, \end{aligned}$$
(4)

where \(\pi _i\) is the orthogonal projection over \(\mathrm {span}(\mathbf {b}_1,\dots ,\mathbf {b}_{i-1})^{\perp }\). The set \(P_f(B,R)\) should be viewed as a random variable. Note that \( P_f(B,R) \subseteq \mathrm {Ball}_{n}(R) \) and if g is the constant function equal to 1, then \(P_g(B,R)= \mathrm {Ball}_{n}(R)\).

Gama et al. [16] noticed that the basic enumeration algorithm can actually compute \(L \cap P_f(B,R)\) instead of \(L \cap \mathrm {Ball}_{n}(R)\), just by changing its parameters. We call this form of pruned enumeration cylinder pruning, because \(P_f(B,R)\) is an intersection of cylinders: each inequality \(\Vert \pi _{n+1-i}(\mathbf {x})\Vert \le f(i) R\) defines a cylinder. Cylinder pruning was historically introduced in the SVP setting, but its adaptation to CVP is straightforward, as was shown by Liu and Nguyen [21].

Complexity of Enumeration. The advantage is that for suitable choices of f, enumerating \(L \cap P_f(B,R)\) is much cheaper than enumerating \(L \cap \mathrm {Ball}_{n}(R)\): indeed, [16] shows that cylinder pruning runs in \(\sum _{k=1}^n N_k\) poly-time operations, where \(N_k\) is the number of points of \(\pi _{n+1-k}(L \cap P_f(B,R))\): this is because \(N_k\) is exactly the number of nodes at depth \(n-k+1\) of the enumeration tree which is searched by cylinder pruning. By the Gaussian heuristic, \(N_k \approx H_k\) where:

$$ H_k = \frac{\mathrm {vol}(\pi _{n+1-k}(P_f(B,R)))}{\mathrm {covol}(\pi _{n+1-k}(L))} = \frac{\mathrm {vol}(C_{Rf(1),\dots ,Rf(k)})}{\mathrm {covol}(\pi _{n+1-k}(L))}.$$

It follows that the complexity of cylinder pruning is heuristically:

$$\begin{aligned} N = \sum _{k=1}^n \frac{\mathrm {vol}(C_{Rf(1),\dots ,Rf(k)})}{\prod _{i=n-k+1}^n \Vert \mathbf {b}^{\star }_i\Vert } \end{aligned}$$
(5)

This N is a heuristic estimate of the number of nodes in the tree searched by cylinder pruning. It depends on the one hand on R and the bounding function f, and on the other hand on the quality of the basis B, because of the term \(\prod _{i=n-k+1}^n \Vert \mathbf {b}^{\star }_i\Vert \). In the SVP setting, one can further divide (5) by two, because of symmetries in the enumeration tree.
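To make (5) concrete, here is a hedged sketch that assembles the heuristic cost from the pieces above, using Monte-Carlo volume estimates and a toy geometric Gram-Schmidt profile (an arbitrary assumption for illustration, not a model of actual reduced bases):

```python
import numpy as np
from scipy.special import gammaln

def vol_ratio(R, N=50_000, rng=None):
    """Monte-Carlo estimate of vol(C_{R_1..R_k}) / V_k(R_k)."""
    k = len(R)
    x = rng.standard_normal((N, k))
    x *= (R[-1] * rng.random(N) ** (1.0 / k) / np.linalg.norm(x, axis=1))[:, None]
    return np.mean(np.all(np.cumsum(x ** 2, axis=1) <= R ** 2, axis=1))

def log_ball_vol(k, R):
    """log V_k(R)."""
    return k * np.log(R) + (k / 2) * np.log(np.pi) - gammaln(k / 2 + 1)

def log2_enum_cost(gso, radii):
    """log2 of the heuristic node count (5), halved for the SVP symmetry."""
    rng = np.random.default_rng(0)
    n, log_gso, terms = len(gso), np.log(gso), []
    for k in range(1, n + 1):
        r = vol_ratio(radii[:k], rng=rng)
        if r > 0:
            terms.append(np.log(r) + log_ball_vol(k, radii[k - 1])
                         - np.sum(log_gso[n - k:]))
    return (np.logaddexp.reduce(terms) - np.log(2)) / np.log(2)

n = 40
gso = 1.05 ** np.arange(n, 0, -1)       # toy profile: ||b*_i|| = 1.05^{n+1-i}
log_gh = (gammaln(n / 2 + 1) - (n / 2) * np.log(np.pi) + np.sum(np.log(gso))) / n
radii = np.exp(log_gh) * np.sqrt(np.arange(1, n + 1) / n)   # linear pruning
print(log2_enum_cost(gso, radii))
```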

Success Probability. We consider two settings:  

Approximation Setting:

The algorithm is successful if and only if we find at least one non-zero point of \(L \cap P_f(B,R)\), that is \(L \cap P_f(B,R) \not \subseteq \{0\}\). This is the situation studied in [5] and corresponds to the use of cylinder pruning in blockwise lattice reduction. By the Gaussian heuristic, the number of points of \(L \cap P_f(B,R)\) is heuristically:

$$ \frac{\mathrm {vol}(P_f(B,R))}{\mathrm {covol}(L)} = \frac{\mathrm {vol}(C_{Rf(1),\dots ,Rf(n)})}{\mathrm {covol}(L)}.$$

So we estimate the probability of success as:

$$\begin{aligned} \mathop {\Pr }\limits _{\text {succ}} = \min \left( 1, \frac{\mathrm {vol}(C_{Rf(1),\dots ,Rf(n)})}{\mathrm {covol}(L)}\right) . \end{aligned}$$
(6)

Since \(\mathrm {covol}(L) = V_n(\mathrm {GH}(L))\), if \(R=\beta \mathrm {GH}(L)\), then (6) becomes

$$\begin{aligned} \mathop {\Pr }\limits _{\text {succ}} = \min \left( 1,\beta ^n \frac{\mathrm {vol}(C_{Rf(1),\dots ,Rf(n)})}{V_n(R)}\right) . \end{aligned}$$
(7)
Unique Setting:

This corresponds to the situation studied in [16] and to bounded distance decoding (BDD). There is a secret vector \(\mathbf {v} \in L\), whose distribution is assumed to be the Gaussian distribution over \(\mathbb {R}^n\) of parameter \(\sigma \). The algorithm is successful if and only if \(\mathbf {v}\) is returned by the algorithm, i.e. if and only if \(\mathbf {v} \in P_f(B,R)\). So we estimate the probability of success as:

$$\begin{aligned} \mathop {\Pr }\limits _{\text {succ}} = \rho _{n,\sigma }( P_f(B,R)) = \rho _{n,\sigma }(C_{f(1)R,\dots ,f(n)R}). \end{aligned}$$
(8)
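For instance, (8) is simply the Gaussian measure of a cylinder intersection, which can be estimated by sampling; a minimal sketch (all parameters illustrative):

```python
import numpy as np

def gaussian_success_prob(R, sigma=1.0, N=200_000, seed=0):
    """Monte-Carlo estimate of rho_{n,sigma}(C_{R_1..R_n}) as in (8)."""
    R = np.asarray(R, dtype=float)
    x = np.random.default_rng(seed).normal(0.0, sigma, (N, len(R)))
    return np.mean(np.all(np.cumsum(x ** 2, axis=1) <= R ** 2, axis=1))

n = 40
R = np.sqrt(np.arange(1, n + 1))   # linear pruning with R_n = sqrt(n)
print(gaussian_success_prob(R))    # estimated probability that the secret survives
```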

 

3 Lower Bounds for Cylinder Pruning

In this section, we prove novel geometric properties of cylinder intersections: if a cylinder intersection is sufficiently big (with respect to its volume or its Gaussian measure), we can lower bound the radii defining the intersection, as well as the volume of all its canonical projections, which are also cylinder intersections.

A basic ingredient behind these properties is a special case of cylinder intersections, corresponding to the step-bounding functions used in [16]. More precisely, we consider the intersection of a ball with a cylinder, which we call a ball-cylinder:

$$ D_{k,n}(R,R') = \left\{ \left( x_{1},\dots ,x_{n}\right) \in \mathbb {R}^{n},\ \sum _{l=1}^{k}x_{l}^{2}\le R^{2} \ \ \text {and} \ \ \sum _{l=1}^{n}x_{l}^{2}\le R'^{2}\right\} .$$

In other words, \(D_{k,n}(R,R')=C_{R,\dots ,R,R',\dots ,R'}\) where R is repeated k times, and \(R'\) is repeated \(n-k\) times. The following result is trivial:

Lemma 1

Let \(R_1 \le R_2 \le \cdots \le R_n\) and \(1 \le k \le n\). Then:

$$ C_{R_{1},\dots ,R_{n}} \subseteq D_{k,n}(R_k,R_n).$$

Note that for fixed k, n and \(R'\), \(\mathrm {vol}(D_{k,n}(R,R'))\) is an increasing function of R. The following lemma gives properties of the volume and Gaussian measure of ball-cylinders, based on the background of Sect. 2.1:

Lemma 2

Let \(R \le R'\) and \(1 \le k \le n\). Then:

$$\begin{aligned} \mathrm {vol}( D_{k,n}(R,R'))&= V_n(R') \times I_{(R/R')^2}(k/2,1+(n-k)/2) \\ \rho _{k,\sigma }(\mathrm {Ball}_k(R))&\ge \rho _{n,\sigma }( D_{k,n}(R,R')) \ge \rho _{k,\sigma }(\mathrm {Ball}_k(R)) \rho _{n,\sigma }(\mathrm {Ball}_n(R')) \\ \rho _{n,\sigma }(\mathrm {Ball}_n(R))&= P(n/2,R^2 /(2\sigma ^2)) \end{aligned}$$

Proof

Because \(D_{k,n}(R,R') \subseteq \mathrm {Ball}_n(R')\), \( \mathrm {vol}( D_{k,n}(R,R'))/V_n(R')\) is the probability that a random vector \((x_1,\dots ,x_n)\) (chosen uniformly at random from the n-dimensional ball of radius \(R'\)) satisfies \(\sum _{l=1}^{k }x_{l}^{2}\le R^{2},\) that is, \( \sum _{l=1}^{k }(x_{l}/R')^{2}\le (R/R')^{2}.\) It follows that this probability is also the probability that a random vector \((y_1,\dots ,y_n)\) (chosen uniformly at random from the n-dimensional unit ball) satisfies: \( \sum _{l=1}^{k }y_{l}^{2}\le (R/R')^{2}.\) From the background, we know that \( \sum _{l=1}^{k }y_{l}^{2}\) has distribution Beta\((k/2,(n+2-k)/2)\), which proves the first equality.

Note that \(D_{k,n}(R,R') \subseteq D_{k,n}(R,+\infty )\), which proves that \(\rho _{n,\sigma }( D_{k,n}(R,R')) \le \rho _{k,\sigma }(\mathrm {Ball}_k(R))\). Furthermore, by the Gaussian correlation inequality on convex symmetric sets, we have:

$$\begin{aligned} \rho _{n,\sigma }( D_{k,n}(R,R'))&\ge \rho _{n,\sigma }(\mathrm {Ball}_n(R')) \times \rho _{n,\sigma }\left( \{ (x_1,\dots ,x_n) \in \mathbb {R}^n: \sum _{i=1}^k x_i^2 \le R^2 \}\right) \\&= \rho _{k,\sigma }(\mathrm {Ball}_k(R)) \rho _{n,\sigma }(\mathrm {Ball}_n(R')) \end{aligned}$$

which proves that \(\rho _{n,\sigma }( D_{k,n}(R,R')) \ge P(k/2,R^2/(2\sigma ^2)) P(n/2,R'^2/(2\sigma ^2))\).

Finally, let \(x_1,\dots ,x_n\) be independent, normally distributed random variables with mean 0 and variance 1. Then \(X=\sum _{i=1}^n x_i^2\) has the distribution \(\text {Gamma}(n/2,\theta =2)\), whose CDF is \(P(n/2,x/2)\). Therefore \(\rho _n(\mathrm {Ball}_n(R)) = P(n/2,R^2/2)\), and the case of general \(\sigma \) follows by the scaling \(R \mapsto R/\sigma \).    \(\square \)

3.1 Lower Bounds on Cylinder Radii

The following theorem lower bounds the radii of any cylinder intersection covering a fraction of the ball:

Theorem 1

Let \(0 \le R_1 \le \dots \le R_n\) be such that \(\mathrm {vol}( C_{R_{1},\dots ,R_{n}} ) \ge \alpha V_n(R_n),\) where \(0 \le \alpha \le 1\). If for all \(1 \le k \le n\), we define \(\alpha _k>0\) by \(I_{\alpha _k}(k/2,1+(n-k)/2)=\alpha \), then \(\mathrm {vol}( D_{k,n}(\sqrt{\alpha _k} R_n,R_n)) \le \mathrm {vol}( C_{R_{1},\dots ,R_{n}})\) and:

$$ R_k \ge \sqrt{\alpha _k} R_n.$$

Proof

Lemma 1 shows that:

$$ \mathrm {vol}( C_{R_{1},\dots ,R_{n}} )\le \mathrm {vol}(D_{k,n}(R_k,R_n)).$$

On the other hand, Lemma 2 shows that by definition of \(\alpha _k\):

$$\begin{aligned} \begin{array}{l} \mathrm {vol}( D_{k,n}(\sqrt{\alpha _k} R_n,R_n)) \\ \displaystyle = V_n(R_n) \times I_{\alpha _k}\left( \frac{k}{2},1+\frac{n-k}{2}\right) = \alpha V_n(R_n) \le \mathrm {vol}( C_{R_{1},\dots ,R_{n}} ), \end{array} \end{aligned}$$

which proves the first statement. Hence:

\( \mathrm {vol}( D_{k,n}(\sqrt{\alpha _k}R_n,R_n)) \le \mathrm {vol}(D_{k,n}(R_k,R_n)),\) which implies that \(R_k \ge \sqrt{\alpha _k} R_n\).

   \(\square \)
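Numerically, each \(\alpha _k\) in Theorem 1 is one call to an inverse regularized incomplete beta function, as in the following scipy-based sketch (the parameters are illustrative; the relation \(\alpha = \gamma /\beta ^n\) used in the example is explained in the next paragraph):

```python
import numpy as np
from scipy.special import betaincinv

def radii_lower_bound(n, alpha, Rn=1.0):
    """Theorem 1: R_k >= sqrt(alpha_k) R_n with I_{alpha_k}(k/2, 1+(n-k)/2) = alpha."""
    k = np.arange(1, n + 1)
    alpha_k = betaincinv(k / 2, 1 + (n - k) / 2, alpha)
    return np.sqrt(alpha_k) * Rn

# Illustrative numbers: in the approximation setting, alpha = gamma / beta^n.
n, beta, gamma = 100, 1.05, 1.0
print(radii_lower_bound(n, gamma / beta ** n)[:5])
```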

The parameter \(\alpha \) in Theorem 1 is directly related to our success probability (7) in the approximation setting: indeed, if \(R_n=\beta \,\mathrm {GH}(L)\) and \(\Pr _{\text {succ}} \ge \gamma \), then \(\alpha = \gamma /\beta ^n\) satisfies the condition of Theorem 1. We have the following Gaussian analogue of Theorem 1, where the lower bound on the volume is replaced by a lower bound on the Gaussian measure:

Theorem 2

Let \(0 \le R_1 \le \dots \le R_n\) be such that \( \rho _{n,\sigma }( C_{R_{1},\dots ,R_{n}} ) \ge \beta ,\) where \(0 \le \beta \le 1\). If for all \(1 \le k \le n\) we define \(\beta _k>0\) by \(P(k/2,\beta _k/(2\sigma ^2))=\beta \), then \(\rho _{n,\sigma }( D_{k,n}(\sqrt{\beta _k},R_n)) \le \rho _{n,\sigma }( C_{R_{1},\dots ,R_{n}})\) and \( R_k \ge \sqrt{\beta _k}.\)

Proof

On the one hand, Lemma 1 shows that:

$$\begin{aligned} \rho _{n,\sigma }( C_{R_{1},\dots ,R_{n}} )\le \rho _{n,\sigma }(D_{k,n}(R_k,R_n)). \end{aligned}$$

On the other hand, Lemma 2 shows that by definition of \(\beta _k\):

$$\begin{aligned} \rho _{n,\sigma }( D_{k,n}(\sqrt{\beta _k},R_n)) \le P(k/2,\beta _k/(2\sigma ^2)) = \beta \le \rho _{n,\sigma }( C_{R_{1},\dots ,R_{n}} ), \end{aligned}$$

which proves the first statement. Hence:

$$\begin{aligned} \rho _{n,\sigma }( D_{k,n}(\sqrt{\beta _k},R_n)) \le \rho _{n,\sigma }(D_{k,n}(R_k,R_n)), \end{aligned}$$

which implies that \(R_k \ge \sqrt{\beta _k}\).    \(\square \)

In Theorem 2, \(\beta \) can be chosen as any lower bound on the success probability in the unique setting (8).

Theorem 1 allows us to derive numerical lower bounds on the radii from any lower bound on the success probability. However, there is a special case for which the lower bound has a simple algebraic form, thanks to the following technical lemma (proved in Appendix A):

Lemma 3

If \(1 \le k \le n\), then:

$$\begin{aligned} 1-P(1/2,1/2) \le I_{k/n}(k/2,(n-k)/2) \le P(1/2,1/2) \end{aligned}$$
(9)

By coupling Theorem 1 and Lemma 3, we obtain that the squared radii of any high-volume cylinder intersection are lower bounded by linear functions:

Theorem 3

Let \(0 \le R_1 \le \dots \le R_n\) be such that \( \mathrm {vol}( C_{R_{1},\dots ,R_{n}} ) \ge P(1/2,1/2) \times V_n(R_n).\) Then for all \(1 \le k \le n\):

$$\begin{aligned} R_k \ge \sqrt{\frac{k}{n+2}} R_n. \end{aligned}$$

Proof

The assumption and (9) (applied with n replaced by \(n+2\)) imply that

$$\begin{aligned} \mathrm {vol}( C_{R_{1},\dots ,R_{n}} ) \ge I_{k/(n+2)}(k/2,1+(n-k)/2) V_n(R_n). \end{aligned}$$

Hence, we can apply Theorem 1 with \(\alpha _k = k/(n+2)\).    \(\square \)

Note that \(P(1/2,1/2) \approx 0.6827\), so any bounding function with high success probability must have a cost lower bounded by that of some linear pruning, which means that its speed-up (compared to full enumeration) is at most single-exponential (see [16]).
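This can be checked numerically: at \(\alpha = P(1/2,1/2)\), the \(\alpha _k\) of Theorem 1 indeed dominates \(k/(n+2)\), e.g. with the following short scipy sketch (n is an arbitrary illustration):

```python
import numpy as np
from scipy.special import betaincinv, gammainc

n = 100
alpha = gammainc(0.5, 0.5)               # P(1/2, 1/2) ~ 0.6827
k = np.arange(1, n + 1)
alpha_k = betaincinv(k / 2, 1 + (n - k) / 2, alpha)
print(np.all(alpha_k >= k / (n + 2)))    # True: matches R_k >= sqrt(k/(n+2)) R_n
```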

3.2 Lower Bounds on Cylinder Volumes from Isoperimetry

The lower bounds on radii given by Theorems 1 and 2 provide lower bounds on \( \mathrm {vol}( C_{R_{1},\dots ,R_{k}} )\) for all \(1 \le k \le n-1\). Indeed, if \(R_k \ge \sqrt{\alpha _k} R_n\), then:

$$ \mathrm {vol}( C_{R_{1},\dots ,R_{k}} ) \ge \mathrm {vol}(C_{ \sqrt{\alpha _1} R_n,\dots ,\sqrt{\alpha _k}R_n}).$$

Such lower bounds immediately provide a lower bound on the cost of enumeration with cylinder pruning, because of (5).

In this subsection, we show that this direct lower bound can be significantly improved, namely it can be replaced by \(V_k( \sqrt{\alpha _k}R_n)\). Our key ingredient is the following isoperimetric result, which says that among all Borel sets of given volume, the ball centered at the origin has the largest measure, for any isotropic measure which decays monotonically radially away:

Theorem 4

(Isoperimetry). Let A be a Borel set of \(\mathbb {R}^k\). Let \(\mathcal {D}\) be a distribution over \(\mathbb {R}^k\) such that its probability density function f is radial and decays monotonically radially away: \(f(\mathbf {x}) \le f(\mathbf {y})\) whenever \(\Vert \mathbf {x}\Vert \ge \Vert \mathbf {y}\Vert \). If a random variable X has distribution \(\mathcal {D}\), then:

$$\begin{aligned} \Pr (X \in A) \le \Pr (X \in B), \end{aligned}$$

where B is the ball of \(\mathbb {R}^k\) centered at the origin such that \(\mathrm {vol}(B) = \mathrm {vol}(A)\).

Proof

The statement is proved in [38, pp. 498–499] for the special case where \(\mathcal {D}\) is the Gaussian distribution over \(\mathbb {R}^k\). However, the proof actually works for any radial probability density function which decays monotonically radially away.    \(\square \)

It implies the following:

Lemma 4

Let \(1 \le k \le n\). Let \(\pi =\tau _k\) be the canonical projection of \(\mathbb {R}^n\) over \(\mathbb {R}^k\). Let C be a subset of the n-dimensional ball of radius \(R'\) such that both C and \(\pi (C)\) are measurable. If R is the radius of the k-dimensional ball of volume \(\mathrm {vol}(\pi (C))\), then:

$$\begin{aligned} \mathrm {vol}(C) \le \mathrm {vol}(D_{k,n}(R,R')) \,\,\text {and}\,\, \rho _{n,\sigma }(C) \le \rho _{n,\sigma }(D_{k,n}(R,R')). \end{aligned}$$

Proof

Let \(B'\) be the n-dimensional centered ball of radius \(R'\). Let B be the k-dimensional centered ball of radius R. Let \(\mathbf {x}\) be chosen uniformly at random from \(B'\). Since \(C \subseteq B'\), \(\mathrm {vol}(C)/V_n(R')\) is exactly \(\Pr (\mathbf {x} \in C)\), and we have:

$$\begin{aligned} \Pr (\mathbf {x} \in C) \le \Pr (\pi (\mathbf {x}) \in \pi (C)). \end{aligned}$$

Let \(\mathcal {D}\) be the distribution of \(\mathbf {y}=\pi (\mathbf {x}) \in \mathbb {R}^k\). Then by Theorem 4,

$$\begin{aligned} \Pr (\mathbf {y} \in \pi (C)) \le \Pr (\mathbf {y} \in B). \end{aligned}$$

Hence:

$$\begin{aligned} \Pr (\mathbf {x} \in C) \le \Pr (\mathbf {y} \in B) = \frac{\mathrm {vol}(D_{k,n}(R,R'))}{V_n(R')}, \end{aligned}$$

which proves the first statement. Similarly, if \(\mathbf {x}\) is chosen from the Gaussian distribution corresponding to \(\rho _{n,\sigma }\), then

$$\begin{aligned} \rho _{n,\sigma }(C)/\rho _{n,\sigma }(B')=\Pr (\mathbf {x} \in C) \le \Pr (\pi (\mathbf {x}) \in \pi (C)). \end{aligned}$$

Let \(\mathcal {D}'\) be the distribution of \(\mathbf {y}=\pi (\mathbf {x}) \in \mathbb {R}^k\): this is a Gaussian distribution. Then by Theorem 4,

$$\begin{aligned} \Pr (\mathbf {y} \in \pi (C)) \le \Pr (\mathbf {y} \in B) = \frac{\rho _{n,\sigma }(D_{k,n}(R,R'))}{\rho _{n,\sigma }(B')}. \end{aligned}$$

   \(\square \)

It has the following geometric consequence:

Corollary 1

Let \(R_1 \le R_2 \le \cdots \le R_n\) and \(1 \le k \le n\). Let \(R>0\) such that \( \mathrm {vol}(C_{R_{1},\dots ,R_{n}}) \ge \mathrm {vol}(D_{k,n}(R,R_n))\) or \( \rho _{n,\sigma }(C_{R_{1},\dots ,R_{n}}) \ge \rho _{n,\sigma }(D_{k,n}(R,R_n))\). Then:

$$\begin{aligned} \mathrm {vol}(C_{R_{1},\dots ,R_{k}}) \ge V_k(R). \end{aligned}$$

Proof

Let \(C=C_{R_{1},\dots ,R_{n}}\) and \(\pi =\tau _k\) be the canonical projection of \(\mathbb {R}^n\) over \(\mathbb {R}^k\). Then \(\pi (C) = C_{R_{1},\dots ,R_{k}}\). If r is the radius of the k-dimensional ball of volume \(\mathrm {vol}(\pi (C))\), Lemma 4 implies that: \( \mathrm {vol}(C) \le \mathrm {vol}(D_{k,n}(r,R_n))\) and \( \rho _{n,\sigma }(C) \le \rho _{n,\sigma }(D_{k,n}(r,R_n))\). Thus, by definition of R, we have either \(\mathrm {vol}(D_{k,n}(R,R_n)) \le \mathrm {vol}(C) \le \mathrm {vol}(D_{k,n}(r,R_n))\) or \(\rho _{n,\sigma }(D_{k,n}(R,R_n)) \le \rho _{n,\sigma }(C) \le \rho _{n,\sigma }(D_{k,n}(r,R_n))\), each of which implies that \(r \ge R\), since both measures of \(D_{k,n}(\cdot ,R_n)\) are increasing in the first radius.    \(\square \)

Note that \(C_{R_{1},\dots ,R_{k}}\) and \(\mathrm {Ball}_k(R)\) are the projections of respectively \(C_{R_{1},\dots ,R_{n}}\) and \(D_{k,n}(R,R_n)\) over \(\mathbb {R}^k\). So the corollary is a bit surprising: if one particular body is “bigger” than the other, then so are their projections. Obviously, this cannot hold for arbitrary bodies in the worst case.

This corollary implies the following lower bound, which strengthens Theorem 1:

Corollary 2

Under the same assumptions as Theorem 1, we have:

$$\begin{aligned} \mathrm {vol}( C_{R_{1},\dots ,R_{k}} ) \ge V_k(\sqrt{\alpha _k} R_n). \end{aligned}$$

Proof

From Theorem 1, we have: \(\mathrm {vol}( C_{R_{1},\dots ,R_{n}} ) \ge \mathrm {vol}(D_{k,n}(\sqrt{\alpha _k} R_n,R_n)).\) And we apply Corollary 1.    \(\square \)

Similarly, we obtain:

Corollary 3

Under the same assumptions as Theorem 2, we have:

$$\begin{aligned} \mathrm {vol}( C_{R_{1},\dots ,R_{k}} ) \ge V_k(\sqrt{\beta _k} R_n). \end{aligned}$$

It would be interesting to study if the lower bounds of the last two corollaries can be further improved.

3.3 Generalisation to Finitely Many Cylinder Intersections

In this section, we give an analogue of the results of Sect. 3.2 for finitely many cylinder intersections, which corresponds to the extreme pruning setting. The key ingredient is the following refinement of isoperimetry:

Theorem 5

(Isoperimetry). Let \(A_1,\dots ,A_m\) be Borel sets of \(\mathbb {R}^k\). Let \(\mathcal {D}\) be a distribution over \(\mathbb {R}^k\) such that its probability density function f is radial and decays monotonically radially away: \(f(\mathbf {x}) \le f(\mathbf {y})\) whenever \(\Vert \mathbf {x}\Vert \ge \Vert \mathbf {y}\Vert \). If a random variable X has distribution \(\mathcal {D}\), then:

$$\begin{aligned} \frac{1}{m} \sum _{i=1}^m \Pr (X \in A_i) \le \Pr (X \in B), \end{aligned}$$

where B is the ball of \(\mathbb {R}^k\) centered at the origin such that \(\mathrm {vol}(B) = \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(A_i)\).

Proof

The statement is proved in [38, pp. 499–500] for the special case where \(\mathcal {D}\) is the Gaussian distribution over \(\mathbb {R}^k\). However, the proof actually works for any radial probability density function which decays monotonically radially away.    \(\square \)

Lemma 5

Let \(1 \le k \le n\). Let \(\pi =\tau _k\) be the canonical projection of \(\mathbb {R}^n\) over \(\mathbb {R}^k\). Let \(C_1,\dots ,C_m \subseteq \mathrm {Ball}_n(R')\) such that all the \(C_i\)’s and \(\pi (C_i)\)’s are measurable. If R is the radius of the k-dimensional ball of volume \(\frac{1}{m} \sum _{i=1}^m \mathrm {vol}(\pi (C_i))\), then:

$$\begin{aligned} \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(C_i) \le \mathrm {vol}(D_{k,n}(R,R')) \,\,\text {and}\,\, \frac{1}{m} \sum _{i=1}^m \rho _{n,\sigma }(C_i) \le \rho _{n,\sigma }(D_{k,n}(R,R')). \end{aligned}$$

Proof

Let \(B'\) be the n-dimensional centered ball of radius \(R'\). Let B be the k-dimensional centered ball of radius R such that \(\mathrm {vol}(B) = \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(\pi (C_i))\). Let \(\mathbf {x}\) be chosen uniformly at random from \(B'\). Since \(C_i \subseteq B'\), \(\mathrm {vol}(C_i)/V_n(R')\) is exactly \(\Pr (\mathbf {x} \in C_i)\), and we have:

$$\begin{aligned} \Pr (\mathbf {x} \in C_i) \le \Pr (\pi (\mathbf {x}) \in \pi (C_i)). \end{aligned}$$

Let \(\mathcal {D}\) be the distribution of \(\mathbf {y}=\pi (\mathbf {x}) \in \mathbb {R}^k\). Then by Theorem 5,

$$\begin{aligned} \frac{1}{m} \sum _{i=1}^m \Pr (\mathbf {y} \in \pi (C_i)) \le \Pr (\mathbf {y} \in B). \end{aligned}$$

Hence:

$$\begin{aligned} \frac{1}{m} \sum _{i=1}^m \Pr (\mathbf {x} \in C_i) \le \Pr (\mathbf {y} \in B) = \frac{\mathrm {vol}(D_{k,n}(R,R'))}{V_n(R')}, \end{aligned}$$

which proves the first statement. The second statement follows similarly, choosing \(\mathbf {x}\) from the Gaussian distribution corresponding to \(\rho _{n,\sigma }\), as in the proof of Lemma 4.    \(\square \)

It has the following geometric consequence:

Corollary 4

Let \(C_1,\dots ,C_m \subseteq \mathrm {Ball}_n(R_n)\) be n-dimensional cylinder intersections. Let \(1 \le k \le n\) and denote by \(\pi =\tau _k\) the canonical projection of \(\mathbb {R}^n\) over \(\mathbb {R}^k\). Let \(R>0\) such that \( \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(C_i) \ge \mathrm {vol}(D_{k,n}(R,R_n))\) or \( \frac{1}{m} \sum _{i=1}^m \rho _{n,\sigma }(C_i)\ge \rho _{n,\sigma }(D_{k,n}(R,R_n))\). Then:

$$ \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(\pi (C_i)) \ge V_k(R).$$

Proof

If r is the radius of the k-dimensional ball of volume \(\frac{1}{m} \sum _{i=1}^m \mathrm {vol}(\pi (C_i))\), Lemma 5 implies that: \( \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(C_i) \le \mathrm {vol}(D_{k,n}(r,R_n))\) and \( \frac{1}{m} \sum _{i=1}^m \rho _{n,\sigma }(C_i) \le \rho _{n,\sigma }(D_{k,n}(r,R_n))\). Thus, by definition of R, we have either \(\mathrm {vol}(D_{k,n}(R,R_n)) \le \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(C_i) \le \mathrm {vol}(D_{k,n}(r,R_n))\) or \(\rho _{n,\sigma }(D_{k,n}(R,R_n)) \le \frac{1}{m} \sum _{i=1}^m \rho _{n,\sigma }(C_i) \le \rho _{n,\sigma }(D_{k,n}(r,R_n))\), which each imply that \(r \ge R\).    \(\square \)

Theorem 6

Let \(C_1,\dots ,C_m \subseteq \mathrm {Ball}_n(R_n)\) be n-dimensional cylinder intersections such that \( \sum _{i=1}^m \mathrm {vol}(C_i) \ge m\alpha V_n(R_n),\) where \(0 \le \alpha \le 1\). If for all \(1 \le k \le n\), we define \(\alpha _k>0\) by \(I_{\alpha _k}(k/2,1+(n-k)/2)=\alpha \), then \(\mathrm {vol}( D_{k,n}(\sqrt{\alpha _k} R_n,R_n)) \le \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(C_i)\) and:

$$\begin{aligned} \sum _{i=1}^m \mathrm {vol}(\pi (C_i)) \ge m V_k(\sqrt{\alpha _k}R_n), \end{aligned}$$

where \(\pi =\tau _k\) denotes the canonical projection of \(\mathbb {R}^n\) over \(\mathbb {R}^k\).

Proof

Lemma 2 shows that by definition of \(\alpha _k\):

$$\begin{aligned} \mathrm {vol}( D_{k,n}(\sqrt{\alpha _k} R_n,R_n)) = \alpha V_n(R_n) \le \frac{1}{m} \sum _{i=1}^m \mathrm {vol}(C_i) , \end{aligned}$$

which proves the first statement. And the rest follows by Corollary 4.    \(\square \)

Again, the parameter \(\alpha \) in Theorem 6 is directly related to our global success probability (7) in the approximation setting: the global success probability is \(\le \sum _{i=1}^m \mathrm {vol}(C_i)/\mathrm {covol}(L)\), so if \(R_n=\beta \,\mathrm {GH}(L)\) and the global success probability is \(\ge \gamma \), then \(\alpha = \gamma /(m \beta ^n)\) satisfies the condition of Theorem 6.

We have the following Gaussian analogue of Theorem 6:

Theorem 7

Let \(C_1,\dots ,C_m \subseteq \mathrm {Ball}_n(R_n)\) be n-dimensional cylinder intersections such that \( \sum _{i=1}^m \rho _{n,\sigma }(C_i) \ge m\beta ,\) where \(0 \le \beta \le 1/m\). If for all \(1 \le k \le n\), we define \(\beta _k>0\) by \(P(k/2,\beta _k/(2\sigma ^2))=\beta \), then \(\rho _{n,\sigma }( D_{k,n}(\sqrt{\beta _k},R_n)) \le \frac{1}{m} \sum _{i=1}^m \rho _{n,\sigma }(C_i)\) and:

$$\begin{aligned} \sum _{i=1}^m \mathrm {vol}(\pi (C_i)) \ge m V_k(\sqrt{\beta _k}), \end{aligned}$$

where \(\pi =\tau _k\) denotes the canonical projection of \(\mathbb {R}^n\) over \(\mathbb {R}^k\).

In the unique setting, the global success probability is \(\le \sum _{i=1}^m \rho _{n,\sigma }(C_i) \), so if the global success probability is \(\ge \gamma \), then \(\beta = \gamma /m\) satisfies the condition of Theorem 7.

Surprisingly, we will show that Theorems 6 and 7 imply that we can lower bound the cost of extreme pruning, independently of the number m of cylinder intersections:

Lemma 6

Let \(0 \le \alpha ' \le 1\) be the global probability and \(1 \le k \le n\). Let \(\alpha = \alpha '/m\) and \(\alpha _k >0\) such that \(I_{\alpha _k}(k/2,1+(n-k)/2)=\alpha \). Then, \(m V_k(\sqrt{\alpha _k})\) is strictly decreasing w.r.t. m, yet lower bounded by some linear function of \(\alpha '\):

$$\begin{aligned} m V_k(\sqrt{\alpha _k}) > \alpha ' \cdot \frac{k V_k(1)}{2} \cdot B\left( \frac{k}{2},1+\frac{n-k}{2}\right) . \end{aligned}$$

Furthermore, for fixed \(\alpha '\), k and n, the left-hand side converges to the right-hand side when m goes to infinity and \(\alpha _k\) is defined as above.

Lemma 6 implies that the cost of enumeration decreases as the number of cylinder intersections increases, if the global probability \(\alpha '\) is fixed. However, there is a limit given by some linear function of \(\alpha '\) which depends only on n.

To prove the lemma, we use the following two lemmas:

Lemma 7

For \(a\ge 0, b\ge 1, 0<z\le 1\):

$$\begin{aligned} \frac{\partial }{\partial z} I_z^{-1} (a,b) \ge \frac{1}{az} I_z^{-1} (a,b) \end{aligned}$$

Proof

Substituting \(x=I_z^{-1}(a,b)\) in (2), we obtain:

$$\begin{aligned} \frac{(1-I_z^{-1}(a,b))^{b-1} (I_z^{-1}(a,b))^{a}}{aB(a,b)} \le z. \end{aligned}$$

This implies that

$$\begin{aligned} \frac{\partial }{\partial z} I_z^{-1} (a,b) = B(a,b)(1-I_z^{-1}(a,b))^{1-b} (I_z^{-1}(a,b))^{1-a} \ge \frac{1}{az} I_z^{-1}(a,b). \end{aligned}$$

   \(\square \)

Lemma 8

For \(a\ge 0, b\ge 1\):

$$\begin{aligned} \lim _{y\rightarrow 0+} \frac{y}{(I^{-1}_y(a,b))^a} = \frac{1}{a\cdot B(a,b)} \end{aligned}$$

Proof

Inequalities (2) and (3) bound \(I_x(a,b)\) from both sides, which implies that

$$\begin{aligned} \lim _{x\rightarrow 0+} \frac{I_x(a,b)}{x^a} = \frac{1}{a\cdot B(a,b)}. \end{aligned}$$

Letting \(x=I^{-1}_y(a,b)\), the claim holds.    \(\square \)

Proof of Lemma 6

We have \(I_{\alpha _k}(k/2,1+(n-k)/2)=\alpha '/m\) and \(\alpha _k = I_{\alpha '/m}^{-1} (k/2,1+(n-k)/2)\). This gives:

$$\begin{aligned} m V_k(\sqrt{\alpha _k}) = V_k(1) m \cdot \left( I_{\alpha '/m}^{-1} (k/2,1+(n-k)/2) \right) ^{k/2}. \end{aligned}$$

Thus, to show the first claim, it suffices to prove that

$$\begin{aligned} g(y) = \frac{1}{y} \left( I_{\alpha ' y}^{-1} (k/2,1+(n-k)/2) \right) ^{k/2} \end{aligned}$$

is strictly increasing over \(0<y\le 1\).

For simplicity, we write \(I:=I_{\alpha ' y}^{-1} (k/2,1+(n-k)/2)\) and we have:

$$\begin{aligned} g'(y) = \frac{\alpha ' k}{2y}I^{k/2-1} \cdot \frac{\partial I}{\partial y}- \frac{I^{k/2}}{y^2} \end{aligned}$$

By Lemma 7, we can see that \(\frac{\partial I}{\partial y} \ge \frac{2}{\alpha ' ky}\, I\), hence \(g'(y) > 0\), which proves the first claim. The lower bound can be derived from the limit of the function. By the relationship

$$\begin{aligned} \lim _{m \rightarrow \infty } mV_k(\sqrt{\alpha _k}) = V_k(1) \cdot \lim _{y \rightarrow 0+} g(y), \end{aligned}$$

and the straightforward consequence of Lemma 8,

$$\begin{aligned} \lim _{y \rightarrow 0+} g(y) = \alpha ' \cdot \frac{k}{2} \cdot B\left( \frac{k}{2},1+\frac{n-k}{2}\right) , \end{aligned}$$

we obtain the second claim.    \(\square \)
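The monotonic decrease and the limit of Lemma 6 are easy to observe numerically; the following sketch (scipy, with illustrative parameters) evaluates \(\log m V_k(\sqrt{\alpha _k})\) for growing m against the linear limit:

```python
import numpy as np
from scipy.special import betaincinv, betaln, gammaln

def log_Vk(k, R):
    """log V_k(R)."""
    return k * np.log(R) + (k / 2) * np.log(np.pi) - gammaln(k / 2 + 1)

n, k, alpha_p = 100, 50, 1.0
a, b = k / 2, 1 + (n - k) / 2
for m in [1, 10, 10**4, 10**8]:
    alpha_k = betaincinv(a, b, alpha_p / m)
    print(m, np.log(m) + log_Vk(k, np.sqrt(alpha_k)))   # log of m V_k(sqrt(alpha_k))
# The linear limit of Lemma 6, alpha' (k V_k(1)/2) B(k/2, 1+(n-k)/2), in log form:
print("limit", np.log(alpha_p * k / 2) + betaln(a, b) + log_Vk(k, 1.0))
```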

By a similar technique, we can show a similar result for the Gaussian case: the proof is postponed to Appendix A.3.

Lemma 9

Let \(0 \le \beta ' \le 1\) be the global probability and \(1 \le k \le n\). Let \(\beta = \beta '/m\) and \(\beta _k >0\) such that \(P(k/2,\beta _k/(2\sigma ^2)) = \beta \). Then, \(m V_k(\sqrt{\beta _k})\) is strictly decreasing w.r.t. m, yet lower bounded by some linear function of \(\beta '\):

$$\begin{aligned} m V_k(\sqrt{\beta _k}) > \beta ' (2\pi \sigma ^2)^{k/2}. \end{aligned}$$

Moreover, for fixed \(\beta '\), k and \(\sigma \), the left-hand side converges to the right-hand side when m goes to infinity and \(\beta _k\) is defined as above.

4 Efficient Upper Bounds Based on Cross-Entropy

In order to assess how tight our lower bounds are in practice, we need to efficiently find very good bounding functions for cylinder pruning. Different methods have been used over the years (see [7, 9, 10, 16]). In this section, we present the method that we used to generate bounding functions that try to minimize the enumeration cost, under the constraint that the success probability is greater than a given \(p_0>0\). In our experience, different methods usually give rise to close bounding functions, but their running times can vary significantly.

4.1 Our Formulation and Previous Algorithms

The problem of finding the optimal cost usually has two formulations, and our algorithm targets the first one:

  1. [7, 11]: for a given basis B, radius R, and target probability \(p_0\), minimize the cost (5) subject to the constraint that the probability (6) is greater than \(p_0\). The variables are \(R_1,\ldots ,R_n\). This kind of constrained optimization is known as monotonic optimization because the objective function and constraint functions are both monotonic, i.e., \(f(x_1,\ldots ,x_n) \le f(x'_1,\ldots ,x'_n)\) if \(x_i \le x'_i\) for all i. It is known that the optimal value is on the border (see, for example, [12]). A heuristic random perturbation is implemented in the progressive BKZ library [7], and an outline of the cross-entropy method is mentioned in Chen’s thesis [10].

  2. [9]: for a given basis B and radius, minimize the expected cost of extreme pruning [16]: \(m\cdot EnumCost + (m-1)\cdot PreprocessCost\), where m is a variable defining the number of bases, and therefore the success probability of the enumeration. The variables are \(R_1,\ldots ,R_n\) and m. This is an unconstrained optimization problem. A heuristic gradient descent and the Nelder-Mead method are implemented in the fpLLL library [9].

We now explain why we introduce a new approach. All known approaches try to minimize an approximate upper bound of the enumeration cost: this approximation is a sum of n terms, where each term can be derived from the computation of a simplex volume (following [16]), which costs \(O(n^2)\) floating-point operations at a precision that might be linear in n. Although there exists an \(O(n^2)\) algorithm to compute the approximate upper bound [4, Sect. 3.3], a naive random perturbation strategy is too slow to converge.

Besides, we think that the Nelder-Mead method and gradient descent are not suitable for our optimization problem. To apply such methods to the constrained problem, the usual approach converts it into a corresponding global optimization problem by introducing penalty functions; one then finds a near-optimal solution to the original problem by using the optimized variables of the converted problem. However, we know that the optimal point is on the border, at which the penalty functions must change drastically. This could make the optimal point of the new problem far from the original one. Hence, we need an algorithm that solves our constrained problem directly.

For this purpose, we revisit Chen’s partial description [10] of the cross-entropy method to solve the first formulation. In Sect. 4.2, we give a brief overview of the cross-entropy method, and in Sect. 4.3, we explain how we modify it for our purpose.

4.2 A Brief Introduction to the Cross-Entropy Method

The original motivation of the cross-entropy method was to speed up Monte-Carlo simulation for approximating a probability. If the target probability is extremely small, the number of sampling points must be huge. To address this issue, Rubinstein [30] introduced the cross-entropy method and showed that the algorithm could also be used for combinatorial optimization problems. This subsection gives a general presentation of the cross-entropy method; we will apply it to the optimization of pruning functions. For more information, see for example [14, 30].

Let \(\chi \) be the whole space of combinations and consider a cost function \(S: \chi \rightarrow \mathbb {R}_{\ge 0}\) that we want to minimize. Assume that we have a probability distribution \(D_{\chi ,\mathbf{u}}\) defined over \(\chi \) and parametrized by a vector \(\mathbf {u}\), with corresponding probability density function \(f_{\mathbf {u}}(x)\). A cross-entropy algorithm to find the optimal combination \(X^* := \mathrm{argmin}_{X\in \chi } S(X)\) is outlined in Algorithm 1; here we use the description in the textbook [14, Algorithm 2.3].

[Algorithm 1]

The stochastic program in Step 4 is the problem of finding the parameter vector \(\mathbf {v}\) which solves

$$\begin{aligned} \arg \max _{\mathbf{v}} \sum _{i=1}^N I_{S(X_i) \le \gamma _t} \log f_{\mathbf{v}}(X_i) \end{aligned}$$
(10)

where

$$\begin{aligned} I_{S(X_i) \le \gamma _t} = \left\{ \begin{array}{ll} 1 \ \ &{} \mathrm{if} \ \ S(X_i) \le \gamma _t \\ 0 &{} \mathrm{if} \ \ S(X_i) > \gamma _t \\ \end{array} \right. \end{aligned}$$

is the characteristic function. It is known that the new distribution \(D_{\chi ,\mathbf {v}_t}\) derived from the solution is closer to the ideal distribution \(D_{\chi ,\mathbf{opt}}\), which outputs the optimal \(X^{\mathbf{opt}} = \arg \min _X S(X)\) with probability 1, than the previous distribution \(D_{\chi ,\mathbf {v}_{t-1}}\). In other words, the costs of elements sampled from \(D_{\chi ,\mathbf {v}_t}\) are likely to be smaller than those of samples from \(D_{\chi ,\mathbf {v}_{t-1}}\). This is quantified by the following function measuring the distance between two probability distributions:

$$ D(g,f_{\mathbf{v}}) := \int g(x) \log \frac{g(x)}{f_{\mathbf{v}}(x) } dx $$

which is known as the cross-entropy, or Kullback-Leibler distance. The algorithm aims to minimize the distance from the optimal state g by changing the parameter vector \(\mathbf {v}\).

The stochastic program (10) can be easily solved analytically if the family of distribution functions \(\{ f_{\mathbf{v}}(x) \}_{\mathbf{v}\in V} \) is a natural exponential family (NEF) [31]. In particular, if the function \(f_{\mathbf{v}}(x)\) is convex and differentiable with respect to \(\mathbf{v}\), the solution of (10) is obtained by solving the simultaneous equations

$$\begin{aligned} \sum _{i=1}^N I_{S(X_i) \le \gamma _t} \nabla \log f_{\mathbf{v}}(X_i) = \mathbf{0}. \end{aligned}$$
(11)

The Gaussian product (12) used in the next section is one of the simplest examples of such functions.

4.3 Our Algorithm

For the generic algorithm (Algorithm 1), we substitute our cost function and constraints. Then, we modify the sampler and introduce the FACE strategy as explained in this section. Recall that the input is a lattice basis and its Gram-Schmidt lengths, a radius R and a target probability \(p_0\). We mention that our algorithm follows [19, Algorithm 2] for optimization over a subset of \(\mathbb {R}^m\) by Kroese, Porotsky and Rubinstein.

Modified sampler: The sampling parameter is \(\mathbf {u} = (c_1,\ldots ,c_{n-1},\sigma _1,\ldots ,\sigma _{n-1})\in \mathbb {R}_{\ge 0}^{2n-2}\), where the \(c_i\) and \(\sigma _i\) correspond to centers and deviations respectively.

Since the bounding radii must increase and the last coordinate is \(R_n=1\), the search space is

$$ \chi = \{ (x_1,\ldots ,x_{n-1}) \in (0,1]^{n-1} :x_1 \le x_2 \le \cdots \le x_{n-1} \} \subset \mathbb {R}^{n-1}. $$

To sample from this space following the parameter \(\mathbf {u}\), define the corresponding probability distribution \(D_{\chi ,\mathbf{u}}\) as follows: sample each \(u_i\) from \(N(c_i,\sigma _i^2)\) independently; if all \(u_i \ge 0\), then let \((x_1,\ldots ,x_{n-1})\) be \((u_1,\ldots ,u_{n-1})\) sorted in increasing order and output it. We sort the output because we do not know a suitable distribution from which sampling from \(\chi \) is easy. As we will see later, when the algorithm is about to converge, the Gaussian parameters \(\sigma _i\) become small, and the distributions of the \(u_i\)’s and \(x_i\)’s become close. Below we assume that the probability density function of \(D_{\chi ,\mathbf {u}}\) is sufficiently close to that of the Gaussian product

$$\begin{aligned} f_{\mathbf{u}}(X) = \frac{1}{(2\pi )^{(n-1)/2}}\prod _{i=1}^{n-1} \left( \frac{1}{\sigma _i} \exp (-(x_i-c_i)^2 / (2\sigma _i^2)) \right) . \end{aligned}$$
(12)

The gradients of log of the function are

$$ \frac{\partial }{\partial c_i} \log f_{\mathbf{u}}(X) = \frac{x_i-c_i}{\sigma _i^2}, $$

and

$$ \frac{\partial }{\partial \sigma _i} \log f_{\mathbf{u}}(X) = -\frac{1}{\sigma _i} + \frac{(x_i-c_i)^2}{\sigma _i^3}. $$

Substituting them into (11), we obtain the formulas to update \(c_i\) and \(\sigma _i\) as follows

$$\begin{aligned} \begin{array}{l} \displaystyle c_i^{new} \leftarrow \frac{\sum _{j: S(X_j) \le \gamma _t } x_{j,i} }{|\{j: S(X_j) \le \gamma _t \}|} \\ \displaystyle \sigma _i^{new} \leftarrow \sqrt{ \frac{\sum _{j: S(X_j) \le \gamma _t } (x_{j,i} -c_i)^2 }{|\{j: S(X_j) \le \gamma _t \}|}} \end{array} \end{aligned}$$
(13)

where \(x_{j,i}\) denotes the i-th coordinate of \(X_j\).
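For illustration, here is a minimal self-contained sketch of a cross-entropy loop with the update (13); the cost function is a stand-in placeholder with a known minimizer, not the enumeration cost (5), and all parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, rho = 20, 100, 0.1          # dimension, samples per round, elite fraction

def cost(x):                      # placeholder cost with known minimizer
    return np.sum((x - np.sqrt(np.arange(1, n + 1) / n)) ** 2)

c, s = 0.5 * np.ones(n), 0.3 * np.ones(n)   # centers c_i and deviations sigma_i
for t in range(100):
    # Sample from D_{chi,u}: clip to (0,1] (a simplification of the rejection
    # step in the text) and sort each sample to enforce increasing radii.
    X = np.sort(np.clip(rng.normal(c, s, (N, n)), 1e-9, 1.0), axis=1)
    elite = X[np.argsort([cost(x) for x in X])[: int(rho * N)]]
    c, s = elite.mean(axis=0), elite.std(axis=0)   # the update formulas (13)
print(cost(c))                    # should be close to 0
```

Here the elite threshold \(\gamma _t\) is taken implicitly as the \(\rho \)-quantile of the sampled costs, a common concrete choice.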

The FACE strategy: For a practical speedup, we can employ the fully-automated cross-entropy (FACE) strategy described in [14, Sect. 4.2]. It simply replaces the full sampling in Step 2 of Algorithm 1 by a recycling strategy. Consider a list \(L=\{ X_1,\ldots ,X_N\}\). If the cost of a new sample is less than \(\max _{i\in [N]} S(X_i)\), replace the maximum element in the list by the new sample, and update the parameter vector by (13) using all items in the list, i.e., with \(\gamma _t = +\infty \).

We did preliminary experiments on this strategy and found that our problem has a typical trend: if the size N of the list is small \((\approx 10)\), the minimum cost \(\min _{i\in [N]} S(X_i)\) decreases very fast but seems to stay near a local minimum. On the other hand, if we choose a large N (\(\approx 1000\)), convergence is slow, but the pruning function found is better than in the small case if we use many loop iterations. Hence, we start with a small N and increase it little by little.

Integrating the above, we give the pseudocode of our optimization algorithm in Algorithm 2. We used a heuristic parameter set \(N_{init}=10\) and \(N_{max}=50\), and terminate the computation if \(\mathbf {v}\) is not updated during the last 10 loop iterations.

[Algorithm 2]

5 Tightness and Applications to Security Estimates

In this section, we study the heuristic cost N of (5) divided by two (SVP setting).

5.1 Modeling Strongly Reduced Bases

The cost (5) of cylinder pruning over \(P_f(B,R)\) depends on the quality of the basis B, the radius R and the pruning function f. The results of Sect. 3 allow us to lower bound the numerator of each term of (5), but we also need to lower bound the part depending on the basis B. This was already discussed in [7, 11, 25] using two models of strongly reduced bases: the Rankin model used in [11, 25], which provides conservative bounds by anticipating progress in lattice reduction, and the HKZ model used in [7, 11], which is closer to the state of the art. This part is more heuristic than Sect. 3.

The HKZ model. The BKZ algorithm tries to approximate HKZ-reduced bases, which are bases B such that \(\Vert \mathbf {b}^{\star }_i\Vert = \lambda _1(\pi _i(L))\) for all \(1 \le i \le n\). When running BKZ, an HKZ basis is the best output one can hope for. On the other hand, a BKZ-reduced basis with large blocksize will be close to an HKZ basis, so this model is somewhat close to the state of the art. It corresponds to an idealized version of Kannan’s algorithm [18] where enumerations are only performed over HKZ-reduced bases (see [23] for more practical variants). Unfortunately, in theory, we do not know what the \(\Vert \mathbf {b}^{\star }_i\Vert \)’s of an HKZ basis look like exactly, except for \(i=1\), but we can make a guess. Following [7, 11], we assume that for \(1 \le i \le n-50\), \(\Vert \mathbf {b}^{\star }_i\Vert \approx \mathrm {GH}(\pi _i(L)) =V_{n-i+1}(1)^{-1/(n-i+1)} \left( \prod _{k=i}^n \Vert \mathbf {b}^{\star }_k\Vert \right) ^{1/(n-i+1)}\), which means that we assume that \(\pi _i(L)\) behaves like a random lattice. Then we can simulate the \(\Vert \mathbf {b}^{\star }_i\Vert \) for \(1 \le i \le n-50\) by a simple recursive formula. We stop at \(n-50\), because Chen and Nguyen [11] reported that the last projected lattices do not behave like random lattices. For the remaining indices, they proposed to use a numerical table from experimental results in low dimension: we use the same table. Note that for a large dimension such as 200, errors in the last coordinates are not an issue because the contribution of the terms \(k\le 50\) to N is negligible.
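A minimal sketch of this recursive simulation, in Python for illustration; the last 50 indices are filled with a crude flat placeholder instead of the experimental table of [11], which we do not reproduce here:

```python
import numpy as np
from scipy.special import gammaln

def simulate_hkz_log_gso(n, log_covol=0.0):
    """Simulated log ||b*_i|| for an HKZ basis via ||b*_i|| ~ GH(pi_i(L)).

    Indices i <= n-50 follow the recursion in the text; the last 50 entries
    are a crude flat placeholder standing in for the table of [11]."""
    logs, rem = np.zeros(n), log_covol       # rem = log covol(pi_i(L))
    for i in range(n - 50):
        d = n - i                            # dimension of the projected lattice
        # log GH = (log V_d(1)^{-1} + log covol) / d
        logs[i] = (gammaln(d / 2 + 1) - (d / 2) * np.log(np.pi) + rem) / d
        rem -= logs[i]
    logs[n - 50:] = rem / 50                 # placeholder tail, preserves covol
    return logs

print(np.exp(simulate_hkz_log_gso(150))[:5])   # first Gram-Schmidt norms
```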

The Rankin model. It is known that HKZ bases are not optimal for minimizing the running time of enumeration. For instance, Nguyen [27, Chap. 3] noticed a link between the cost of enumeration and the Rankin invariants of a lattice, which provides lower bounds on heuristic estimates of the number of nodes and identifies better bases than HKZ. However, finding these better bases is currently more expensive [13] than finding HKZ-reduced bases. Recall that the Rankin invariants \(\gamma _{n,m}(L)\) of an n-rank lattice L satisfy:

$$\begin{aligned} \gamma _{n,m}(L) := \min _{\begin{array}{c} S:\text { sublattice of }L \\ \mathrm {rank}(S)=m \end{array}} \left( \frac{\mathrm {vol}(S)}{\mathrm {covol}(L)^{m/n}} \right) ^2 \le \frac{\prod _{i=1}^m \Vert \mathbf {b}^{\star }_i\Vert ^2 }{\mathrm {covol}(L)^{2m/n} }, \end{aligned}$$
(14)

for any basis \((\mathbf {b}_1,\dots ,\mathbf {b}_n)\) of L. We have the following lower bound [37, Corollary 1] for Rankin’s constant \(\gamma _{n,m} := \max _{L} \gamma _{n,m}(L)\):

$$\begin{aligned} \gamma _{n,m} \ge \left( n\cdot \frac{\prod _{j=n-m+1}^n Z(j)}{\prod _{j=2}^m Z(j)} \right) ^{2/n} \ \ \mathrm{where} \ \ Z(j) := \zeta (j) \Gamma (j/2) \pi ^{-j/2}. \end{aligned}$$
(15)

According to [36], it seems plausible that most lattices come close to realizing Rankin constants: for any \(\varepsilon > 0\) and sufficiently large n, most lattices L “should” verify \(\gamma _{n,m}(L)^{1/(2m)} \ge \gamma _{n,m}^{1/(2m)} - \varepsilon \) for all m.

Ignoring \(\varepsilon \), if we lower bound any term of the form \(\frac{\prod _{i=1}^m \Vert \mathbf {b}^{\star }_i\Vert ^2}{\mathrm {covol}(L)^{2m/n}}\) in the simplified cost (5) by the right-hand side of (15), we obtain the following heuristic lower bound formula:

$$\begin{aligned} N = \frac{1}{2} \sum _{k=1}^n \frac{\displaystyle \mathrm {vol}(C_{R_1,\ldots ,R_k})\prod _{i=1}^{n-k} \Vert \mathbf {b}^{\star }_i\Vert }{\mathrm {covol}(L)} > \frac{1}{2} \sum _{k=1}^n \frac{\mathrm {vol}(C_{R_1,\ldots ,R_k})}{\mathrm {covol}(L)^{k/n}} \left( n\cdot \frac{ \displaystyle \prod _{j=k+1}^n Z(j)}{\displaystyle \prod _{j=2}^{n-k} Z(j)} \right) ^{1/n} \end{aligned}$$

by applying (15) with \(m=n-k\).

In both cases, substituting the volume lower bounds of Sects. 3.2 and 3.3, we obtain closed formulas for the lower-bound complexity which are suitable for numerical analysis.

On the other hand, for any n-rank lattice L, and any fixed \(m \in \{1,\dots ,n-1\}\), there is a basis \((\mathbf {b}_1,\dots ,\mathbf {b}_n)\) of L such that \(\frac{\prod _{i=1}^m \Vert \mathbf {b}^{\star }_i\Vert ^2 }{\mathrm {covol}(L)^{2m/n}} = \gamma _{n,m}(L)\). This existence is only guaranteed for a fixed m, such as the m maximizing the corresponding number \(N_{n+1-m}\) of nodes in the enumeration tree at depth m. By idealization, we call a Rankin basis a basis such that for all \(m \in \{1,\dots ,n-1\}\), \(\frac{\prod _{i=1}^m \Vert \mathbf {b}^{\star }_i\Vert ^2 }{\mathrm {covol}(L)^{2m/n}}\) is approximately less than the right-hand side of (15): since such bases may not exist, this is an over-simplification to guess how much speed-up might be possible with the best bases. We use Rankin bases to compute speculative upper bounds, anticipating progress in lattice reduction.

5.2 Explicit Lower Bounds

We summarize the applications of the results of Sects. 3.2 and 3.3, to compute lower bounds on the number of nodes searched by cylinder pruning with lower bounded success probability.

Single Enumeration. By Corollary 2, if \(\alpha \) is a lower bound on the success probability,

$$\begin{aligned} N \ge \frac{1}{2}\sum _{k=1}^n \frac{V_k(\sqrt{\alpha _k} R_n)}{\prod _{i=n-k+1}^n \Vert \mathbf {b}^{\star }_i \Vert } \end{aligned}$$
(16)

where \(\alpha _k\) is defined by \(I_{\alpha _k}(k/2,1+(n-k)/2)=\alpha \).

For the Gaussian case with success probability \(\ge \beta \), from Corollary 3,

$$\begin{aligned} N \ge \frac{1}{2}\sum _{k=1}^n \frac{V_k(\sqrt{\beta _k})}{\prod _{i=n-k+1}^n \Vert \mathbf {b}^{\star }_i \Vert } \end{aligned}$$

where \(\beta _k\) is defined by \(P(k/2,\beta _k/(2\sigma ^2))= \beta \).

Multiple Enumerations. For the situation where one can use m bases, let \(\alpha '\) be a lower bound on the global success probability. Then by Lemma 6,

$$\begin{aligned} N \ge \frac{\alpha '}{4}\sum _{k=1}^n \frac{ k V_k(R_n) B(k/2,1+(n-k)/2)}{\prod _{i=n-k+1}^n \Vert \mathbf {b}^{\star }_i \Vert } \end{aligned}$$
(17)

where \(\alpha '\) satisfies \(\mathrm {vol}(\cup _{i=1}^m C_i) \ge \alpha ' V_n(R_n)\).

Lemma 9 also implies a lower bound for the Gaussian setting with global success probability \(\rho _{n,\sigma }(\cup _{i=1}^m C_i) \ge \beta '\):

$$\begin{aligned} N \ge \frac{\beta '}{2} \sum _{k=1}^n \frac{(2\pi \sigma ^2)^{k/2}}{\prod _{i=n-k+1}^n \Vert \mathbf {b}^{\star }_i \Vert }. \end{aligned}$$
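Combining these formulas with a simulated basis profile yields short scripts; the following Python sketch (using scipy, and the same crude placeholder tail as in the Sect. 5.1 sketch) illustrates how curves like those of Fig. 1 can be evaluated. It is an illustration, not the exact script of the full version:

```python
import numpy as np
from scipy.special import betaincinv, betaln, gammaln

def log_Vk(k, logR):
    return k * logR + (k / 2) * np.log(np.pi) - gammaln(k / 2 + 1)

def simulate_hkz(n):                         # as in the sketch of Sect. 5.1
    logs, rem = np.zeros(n), 0.0
    for i in range(n - 50):
        d = n - i
        logs[i] = (gammaln(d / 2 + 1) - (d / 2) * np.log(np.pi) + rem) / d
        rem -= logs[i]
    logs[n - 50:] = rem / 50                 # crude placeholder tail
    return logs

def log2_lower_bounds(n, alpha, alpha_p):
    """log2 of (16) with single probability alpha, and of (17) with alpha'."""
    logs = simulate_hkz(n)
    log_Rn = logs[0]                          # R_n = GH(L) = ||b*_1|| here
    suf = np.concatenate(([0.0], np.cumsum(logs[::-1])))  # log prod of last k norms
    t16, t17 = [], []
    for k in range(1, n + 1):
        a_k = betaincinv(k / 2, 1 + (n - k) / 2, alpha)
        t16.append(log_Vk(k, 0.5 * np.log(a_k) + log_Rn) - suf[k])
        t17.append(np.log(alpha_p * k / 2) + betaln(k / 2, 1 + (n - k) / 2)
                   + log_Vk(k, log_Rn) - suf[k])
    f = lambda t: (np.logaddexp.reduce(t) - np.log(2)) / np.log(2)
    return f(t16), f(t17)

print(log2_lower_bounds(150, 1e-10, 1.0))
```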

5.3 Radii Tightness

To check tightness, we give two plots (Fig. 2) comparing the radii lower bounds of Theorem 1 with the best radii generated by our cross-entropy method. The comparison covers two regimes: high and low success probability. Note that the probability 0.6827 used in the top plot is an approximation of \(P(\frac{1}{2},\frac{1}{2})\), for which linear pruning is the best known proved lower bound (Theorem 3).

We see that the radii bounds are reasonably tight in both cases. We deduce that in these examples, the enumeration cost bounds will also be tight, because the cost is dominated by what happens around \(k\approx n/2\).

We note that it is easier to compute lower bounds than upper bounds.

Fig. 2. Comparison of lower-bound and near-optimal radii: for a 150-dimensional simulated HKZ basis, we compute near-optimal radii and lower-bound radii for \(\alpha =0.6827 \gtrsim P(\frac{1}{2},\frac{1}{2})\) (top) and \(\alpha =10^{-10}\) (bottom).

5.4 Security Estimates for Enumeration

Figure 1 (in the introduction) displays four bounds on the cost of enumeration in several situations, for varying dimension and simulated HKZ bases and Rankin bases:

  • The thin red curve is an upper bound of the enumeration cost using \(M=10^{10}\) bases with single success probability \(\alpha =10^{-10}\), computed by the cross-entropy method.

  • The bold red curve is a lower bound of the enumeration cost using \(M=10^{10}\) bases with single success probability \(\alpha =10^{-10}\), computed as M times (16).

  • The thin green curve is an upper bound of the enumeration cost w.r.t. infinitely many bases with global success probability \(\alpha ' = 1\). This is computed as M times an upper bound of the enumeration cost with single success probability \(1/M\), for a very large M chosen so that the single-enumeration cost remains greater than the lattice dimension.

  • The bold green curve is a lower bound of the enumeration cost w.r.t. infinitely many bases with a large global success probability. This is computed by (17) with \(\alpha '=1\).

In all experiments, we take the radius \(R_n=\mathrm {GH}(L)\). The cost is the number of nodes of the enumeration tree in the classical computing model. The security level is the base-2 logarithm of the cost, which is divided by two in the quantum computing model [6, 24].

We also draw the curves \(2^{0.292n}\) and \(2^{0.265n}\), which are the simplified lower bounds on the cost of solving SVP in dimension n used in [2] for classical and quantum computers, respectively.

In all the situations where we use \(10^{10}\) bases, the upper bounds (thin red curve) and the lower bounds (bold red curve) are close to each other, which demonstrates the tightness of our lower bound.

In the classical setting, our lower bounds for enumeration are higher than the sieve lower bounds. On the other hand, in the quantum setting, there are cases where enumeration is faster than quantum sieving. For instance, if an attacker could find many quasi-Rankin bases by some new lattice reduction algorithm, the claimed \(2^{128}\) quantum security might drop to about \(2^{96}\) security. In such a situation, the required blocksize would increase from about 480 to 580.

5.5 Experimental Environments

All experiments were performed on a standard server with two Intel Xeon E5-2660 CPUs and 256 GB of RAM. We used the boost library version 1.56.0, which provides efficient subroutines to compute the (incomplete) beta, (incomplete) gamma and zeta functions with high precision.