1 Introduction

The cost of (strong) lattice reduction has received renewed attention in recent years due to its relevance in cryptography. Indeed, lattice-based constructions are presumed to achieve security against quantum adversaries and enable powerful functionalities such as computation on encrypted data. Concrete parameters for such schemes are derived from the difficulty of finding relatively short non-zero vectors in a lattice: the parameters are chosen based on extrapolations of the cost of the BKZ algorithm  [SE94] and its variants  [CN11, AWHT16, MW16]. These algorithms make repeated calls to an oracle that solves the Shortest Vector Problem (SVP), i.e. that finds a shortest non-zero vector in any lattice. Concretely, BKZ with block size k finds relatively short vectors in lattices of dimensions \(n \ge k\) using a k-dimensional SVP solver. The cost of this SVP solver is the dominating component of the cost of BKZ and its variants.

The SVP solver can be instantiated with enumeration-based algorithms, whose asymptotically most efficient variant is Kannan’s algorithm  [Kan83]. It has a worst-case complexity of \(k^{k/(2{{\,\mathrm{\mathrm {e}}\,}}) + o(k)}\), where k is the dimension of the lattice under consideration  [HS07]. This bound is sharp, up to the o(k) term in the exponent  [HS08]. If called on an n-dimensional lattice, then BKZ with block size k outputs a vector of norm \(\approx {(k^{1/(2k)})}^{n} \cdot {{{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}})}^{1/n}\) in time \(\approx k^{k/(2{{\,\mathrm{\mathrm {e}}\,}})}\), when n is sufficiently large compared to k. The \(k^{1/(2k)}\) term is called the root Hermite factor and quantifies the strength of BKZ. The trade-off between root Hermite factor and running-time achieved by BKZ has remained the best known for enumeration-based SVP solvers since the seminal work of Schnorr and Euchner almost 30 years ago. (The analysis of Kannan’s algorithm and hence \({{\,\mathrm{BKZ}\,}}\) was improved in  [HS07], but not the algorithm itself.) Other algorithms, such as  [GN08a, MW16, ALNS19], achieve the same asymptotic trade-off with milder conditions on n/k.

We note that while lattice reduction libraries, such as FPLLL  [dt19a], the Progressive BKZ Library (PBKZ)  [AWHT18] and NTL  [Sho18], implement BKZ with an enumeration-based SVP solver, they do not rely on Kannan’s algorithm: NTL implements enumeration with LLL preprocessing; FPLLL and PBKZ implement enumeration with stronger preprocessing (typically BKZ with a smaller block size) but not with sufficiently strong preprocessing to satisfy the conditions of  [Kan83, HS07]. Hence, the running-times of these implementations are not established by the theorems in these works.

It has been suggested that the running-time achieved by BKZ for a given output quality might be improved. In  [HS10], it was argued that the same root Hermite factor would be achieved by BKZ in time \(\approx k^{k/8}\) if the Gram–Schmidt norms of so-called HKZ-reduced bases were decreasing geometrically. In  [Ngu10], the same quantity was suggested as a cost lower bound for enumeration-based lattice reduction algorithms. On this basis, several works have speculatively assumed this cost  [ANS18, ACD+18]. However, so far, no lattice reduction algorithm achieving root Hermite factor \(k^{1/(2k)}\) in time \(\approx k^{k/8}\) was known.

Contributions. Our main contribution is an enumeration-based lattice reduction algorithm that runs in time \(k^{k/8}\) and achieves root Hermite factor \(k^{\frac{1}{2k} (1+o(1))}\), where k is a cost parameter akin to the “block size” of the BKZ algorithm (the notion of “block size” for our algorithm is less straightforward, see below). It uses polynomial memory and can be quantumly accelerated to time \(k^{k/16}\) using  [ANS18]. Our analysis relies on a strengthened version of the Gaussian Heuristic.

To estimate the cost of lattice reduction algorithms, the literature typically relies on concrete experiments and simulations that extrapolate them (see, e.g.  [CN11, Che13, MW16, BSW18]). Indeed, the data given in  [Che13] is very widely relied upon. However, this data only covers block sizes up to 250 (below cryptographically relevant block sizes) and no source code is available. As an intermediate contribution, we reproduce and extend the data in  [Che13] using publicly available tools such as  [dt19a, dt19b] (see Sect. 2.5). Using this extended dataset, we then argue that \({{\,\mathrm{BKZ}\,}}\) as implemented in public lattice reduction libraries has running-time closely matching \(k^{k/(2{{\,\mathrm{\mathrm {e}}\,}})}\) (Fig. 2). Our cost improvement hence required a different algorithm, and not just an improved analysis of the state of the art.

In Sect. 4 we propose a variant of our improved lattice reduction algorithm that works well in practice. We run simulations and conduct concrete experiments to verify its efficiency, while leaving its formal analysis to future work. The simulations suggest that it achieves root Hermite factors \(\approx k^{\frac{1}{2k}}\) in time \(k^{k/8}\), at least up to \(k \approx 1,000\) (which covers cryptographic parameters). Our implementation of this algorithm beats FPLLL’s SVP enumeration from dimension \(\approx 100\) onward. We consider the difference between these two variants to be similar to the difference between Kannan’s algorithm and what is routinely implemented in practice, such as in FPLLL and PBKZ. We will refer to the former as the “asymptotic variant” and the latter as the “practical variant”. Since our results rely on empirical evidence and simulations, we provide the source code used to produce our figures, and the data being plotted, as an attachment to the electronic version of the full version of this work.

Key idea. Our new algorithms decouple the preprocessing context from the enumeration context: they preprocess a projected sublattice of larger dimension than they aim to enumerate over (as a result, the notion of “block size” is less obvious than in prior works). More concretely, assume that the basis of the preprocessed projected sublattice is \({{\,\mathrm{SDBKZ}\,}}\)-reduced. Then, as shown in  [MW16] under the Gaussian Heuristic, the first Gram–Schmidt norms \(\Vert \textit{\textbf{b}}_i^*\Vert \) satisfy Schnorr’s Geometric Series Assumption (GSA)  [Sch03]: \(\Vert \textit{\textbf{b}}_i^*\Vert /\Vert \textit{\textbf{b}}_{i+1}^*\Vert \approx r\) for some common r, for all i’s corresponding to the start of the basis. On that “GSA part” of the lattice basis, the enumeration runs faster than on a typical preprocessed BKZ block of the same dimension. To achieve \({{\,\mathrm{SDBKZ}\,}}\)-reducedness at a low cost, our algorithms call themselves recursively.

As a side contribution, we show in the full version of this work that the bases output by \({{\,\mathrm{BKZ}\,}}\) do not satisfy the GSA (under the Gaussian Heuristic), contrary to a common belief (see e.g.  [YD17, ANS18, BSW18]). This is why we use \({{\,\mathrm{SDBKZ}\,}}\) in the asymptotic algorithm. Nevertheless, for tractable dimensions, BKZ seems to deviate only slightly from the GSA and, as it is a little simpler to implement than \({{\,\mathrm{SDBKZ}\,}}\), it seems to remain preferable in practice. This is why we use \({{\,\mathrm{BKZ}\,}}\) as preprocessing in the practical algorithm.

To illustrate the idea of our new algorithms, we consider a BKZ-reduced (resp. SDBKZ-reduced) basis with block size k, in a lattice of dimension \(n=\lceil (1+c) \cdot k\rfloor \) for various k and c. We choose \(n \ge k\) to demonstrate the impact of the GSA region on the enumeration cost. We then estimate the enumeration cost (without pruning) for re-checking that the first basis vector is a shortest non-zero vector in the very first block of size k. The implementation of this simulation is attached to the electronic version of the full version of this work. We let c range from \(c = 0\) to 1 with a step size of 0.01. For each c, we take k from \(k = 100\) to 50,000 with a step size of 10. Then, for each fixed c, we fit the coefficients \(a_{0},a_{1},a_{2}\) of \(a_{0}\,k \log k + a_{1}\,k + a_{2}\) over all k to the enumeration cost of the first block of size k. The result is plotted in Fig. 1. The x-axis denotes the value of c and the y-axis denotes the interpolated constant in front of the \(k \log k\) term.
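
The fitting step itself is a standard least-squares problem. The sketch below illustrates only that step, on synthetic data of the exact form \(a_{0}\,k \log k + a_{1}\,k + a_{2}\) (the coefficients below are hypothetical stand-ins for the simulated enumeration costs):

```python
import numpy as np

# Synthetic stand-in for the simulated log-enumeration costs: an exact
# a0*k*log2(k) + a1*k + a2 curve (the coefficients are hypothetical; the
# paper fits costs produced by its simulator instead).
ks = np.arange(100, 5000, 10, dtype=float)
true_a0, true_a1, true_a2 = 0.125, -0.5, 10.0
costs = true_a0 * ks * np.log2(ks) + true_a1 * ks + true_a2

# Least-squares fit of the model a0*k*log2(k) + a1*k + a2 over all k.
A = np.column_stack([ks * np.log2(ks), ks, np.ones_like(ks)])
(a0, a1, a2), *_ = np.linalg.lstsq(A, costs, rcond=None)
print(round(a0, 3))  # recovers the a0 used to generate the data
```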

Let us make several remarks about Fig. 1. First, we stress that all leading constants in Fig. 1 are hypothetical (they do not correspond to efficient algorithms), as they assume an already (SD)BKZ-reduced basis with block size k, i.e. they ignore the preprocessing cost. With that in mind, for \(c = 0\), the simulations show that the interpolated constant for both BKZ and \({{\,\mathrm{SDBKZ}\,}}\) is close to \(1/(2{{\,\mathrm{\mathrm {e}}\,}})\), which corresponds to  [HS07]. For \(c = 1\), the interpolated constant is close to 1/8. This illustrates the impact of enumeration in the GSA region (corresponding to Theorem 1). As noted above, in the following section we will describe an algorithm that achieves the corresponding cost of \(k^{k/8 (1+o(1))}\). It is worth noting that for certain values of c around 0.3, the \(a_0\) of the re-examination cost can be below \(0.125\). We stress that we do not know how to construct an algorithm that achieves a cost of \(k^{a_{0} \cdot k (1+o(1))}\) with \(a_{0} < 0.125\). However, our practical variant of the algorithm seems to achieve cost \(k^{0.125 \cdot k}\) using the region corresponding to those \(c \approx 0.3\).

Fig. 1. Interpolated dominating constant \(a_{0}\) on \(k \log k\).

Discussion. At first sight, the endeavour in this work might appear pointless, since lattice sieving algorithms asymptotically outperform lattice enumeration. Indeed, the fastest SVP solver currently known  [BDGL16] has a cost of \(2^{0.292\,n + o(n)}\), where n is the lattice dimension. Furthermore, a sieving implementation  [ADH+19] now dominates the Darmstadt SVP Challenge’s Hall of Fame, indicating that the crossover between enumeration and sieving is well below cryptographic parameter sizes. However, the study of enumeration algorithms remains relevant to cryptography.

Sieving algorithms have a memory cost that grows exponentially with the lattice dimension n. For dimensions that are currently tractable, the space requirement remains moderate. The impact of this memory cost is unclear for cryptographically relevant dimensions. For instance, it has yet to be established how well sieving algorithms parallelise on non-uniform memory access architectures. In particular, the exponential memory requirement might present a serious obstacle in some scenarios. In contrast, the memory cost of enumeration grows as a small polynomial in the dimension.

Comparing sieving and enumeration for cryptographically relevant dimensions becomes even more complex in the context of quantum computations. Quantum computers asymptotically enable a quadratic speed-up for enumeration, and much less for sieving  [Laa15, Sec. 14.2.10] even assuming free quantum-accessible RAM, which would a priori favour enumeration. However, it is unclear how to compare parallelisable classical operations with strictly sequential Grover iterations, and establishing the significant lower-order terms in the quantum costs of these algorithms is an ongoing research programme (see e.g.  [AGPS19]).

Further, recent advances in sieving algorithms  [LM18, Duc18, ADH+19] apply lessons learned from enumeration algorithms to the sieving context: while sieving algorithms are fairly oblivious to the Gram–Schmidt norms of the basis at hand, the cost of enumeration algorithms critically depends on their limited decrease. Current sieving strategies employ a simple form of enumeration (Babai’s lifting  [Bab86]) to exploit the lattice shape by sieving in a projected sublattice and lifting candidates for short vectors to the full lattice. Here, more sophisticated hybrid algorithms permitting flexible trade-offs between memory consumption and running time seem plausible.

Finally, as illustrated in Fig. 1, our work suggests potential avenues for designing faster enumeration algorithms based on further techniques relying on the graph of Gram–Schmidt norms.

Open problems. It would be interesting to remove the heuristics utilised in our analysis to produce a fully proved variant, and to extend the technique to other lattice reduction algorithms such as slide reduction  [GN08a]. Further, establishing lower bounds on the root Hermite factor achievable in time \(k^{k/8 + o(k)}\) for a given dimension of the lattice is an interesting open problem suggested by this work.

2 Preliminaries

Matrices are denoted in bold uppercase and vectors are denoted in bold lowercase. By \(\textit{\textbf{B}} _{\left[ {i:j}\right) } \) we refer to the submatrix spanned by the columns \(\textit{\textbf{b}}_{i},\ldots ,\textit{\textbf{b}}_{j-1}\) of \(\textit{\textbf{B}} \). We let matrix indices start at 0. We let \(\pi _i(\cdot )\) denote the orthogonal projection onto the linear subspace \({(\textit{\textbf{b}}_0,\ldots ,\textit{\textbf{b}}_{i-1})}^{\perp }\) (this depends on a matrix \(\textit{\textbf{B}} \) that will always be clear from context). We let \(v_n = \frac{\pi ^{n/2}}{\varGamma (1 + n/2)} \approx \frac{1}{\sqrt{n \pi }} {\left( \frac{2 \pi {{\,\mathrm{\mathrm {e}}\,}}}{n}\right) }^{n/2}\) denote the volume of the n-dimensional unit ball. We let the logarithm to base 2 be denoted by \(\log \) and the natural logarithm be denoted by \(\ln \).

Below, we may refer to the cost or enumeration parameter \(k\) of our algorithms as a “block size”.

2.1 Lattices

Let \(\textit{\textbf{B}} \in \mathbb {Q}^{m \times n}\) be a full column rank matrix. The lattice \({{\,\mathrm{\mathcal {L}}\,}}\) generated by \(\textit{\textbf{B}} \) is \({{\,\mathrm{\mathcal {L}}\,}}(\textit{\textbf{B}}) = \{\textit{\textbf{B}} \cdot \textit{\textbf{x}} \mid \textit{\textbf{x}}\in \mathbb {Z}^n\}\) and the matrix \(\textit{\textbf{B}} \) is called a basis of \({{\,\mathrm{\mathcal {L}}\,}}(\textit{\textbf{B}})\). As soon as \(n \ge 2\), any given lattice \(\mathcal {L}\) admits infinitely many bases, and full column rank matrices \(\textit{\textbf{B}},\textit{\textbf{B}} ' \in \mathbb {Q}^{m \times n}\) span the same lattice if and only if there exists \(\textit{\textbf{U}} \in \mathbb {Z}^{n \times n}\) such that \(\textit{\textbf{B}} ' = \textit{\textbf{B}} \cdot \textit{\textbf{U}} \) and \(|\det (\textit{\textbf{U}})|=1\). The Euclidean norm of a shortest non-zero vector in \({{\,\mathrm{\mathcal {L}}\,}}\) is denoted by \(\lambda _1(\mathcal {L})\) and called the minimum of \({{\,\mathrm{\mathcal {L}}\,}}\). The task of finding a shortest non-zero vector of \({{\,\mathrm{\mathcal {L}}\,}}\) from an arbitrary basis of \({{\,\mathrm{\mathcal {L}}\,}}\) is called the Shortest Vector Problem (\({{\,\mathrm{SVP}\,}}\)).

We let \(\textit{\textbf{B}} ^* = (\textit{\textbf{b}}_0^*,\ldots ,\textit{\textbf{b}}_{n-1}^*)\) denote the Gram–Schmidt orthogonalisation of \(\textit{\textbf{B}} \) where \(\textit{\textbf{b}}_i^* = \pi _i(\textit{\textbf{b}}_i)\). We write \(\rho _{\left[ a:b \right) }\) for the slope of the \(\log \Vert \textit{\textbf{b}}_{i}^{*}\Vert \)’s with \(i=a,\ldots ,b-1\), under a least-squares linear fit. We let \(\pi _i (\textit{\textbf{B}} _{\left[ {i:j}\right) })\) denote the local block \((\pi _i(\textit{\textbf{b}}_i), \ldots , \pi _i(\textit{\textbf{b}}_{j-1}))\) and let \(\pi _i ({{\,\mathrm{\mathcal {L}}\,}}_{[i:j)})\) denote the lattice generated by \(\pi _i (\textit{\textbf{B}} _{\left[ {i:j}\right) })\). We will also write \(\pi ({{\,\mathrm{\mathcal {L}}\,}})\) if the index i and \({{\,\mathrm{\mathcal {L}}\,}}\) are clear from the context. The volume of a lattice \(\mathcal {L}\) with basis \(\textit{\textbf{B}} \) is defined as \({{\,\mathrm{Vol}\,}}(\mathcal {L}) = \prod _{i< n}\Vert \textit{\textbf{b}}_i^*\Vert \); it does not depend on the choice of basis of \(\mathcal {L}\). Minkowski’s convex body theorem states that \(\lambda _1(\mathcal {L}) \le 2\cdot v_n^{-1/n} \cdot {{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n}\). We define the root Hermite factor of a basis \(\textit{\textbf{B}} \) of a lattice \(\mathcal {L}\) as \(\mathtt {rhf} (\textit{\textbf{B}}) = (\Vert \textit{\textbf{b}}_0\Vert /{{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n})^{1/(n-1)}\). The normalization by the \((n-1)\)-th root is justified by the fact that the lattice reduction algorithms we consider in this work achieve root Hermite factors that are bounded independently of the lattice dimension n.
Given as input an arbitrary basis of \({{\,\mathrm{\mathcal {L}}\,}}\), the task of finding a non-zero vector of \({{\,\mathrm{\mathcal {L}}\,}}\) of norm \(\le \gamma \cdot {{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n}\) is called Hermite-\({{\,\mathrm{SVP}\,}}\) with parameter \(\gamma \) (\(\gamma \)-\({{\,\mathrm{HSVP}\,}}\)).
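
The quantities just defined are easy to compute on toy examples. The following sketch (with rows as basis vectors and plain floating-point arithmetic, for illustration only) derives the Gram–Schmidt norms, the volume and the root Hermite factor from the definitions above:

```python
import numpy as np

def gso(B):
    """Gram-Schmidt orthogonalisation of the rows b_0, ..., b_{n-1} of B."""
    B = np.array(B, dtype=float)
    Bstar = np.zeros_like(B)
    for i in range(B.shape[0]):
        Bstar[i] = B[i]
        for j in range(i):
            mu = B[i] @ Bstar[j] / (Bstar[j] @ Bstar[j])
            Bstar[i] -= mu * Bstar[j]
    return Bstar

def rhf(B):
    """Root Hermite factor (||b_0|| / Vol(L)^{1/n})^{1/(n-1)}."""
    Bstar = gso(B)
    n = Bstar.shape[0]
    norms = np.linalg.norm(Bstar, axis=1)
    vol = np.prod(norms)                 # Vol(L) = prod_i ||b_i^*||
    return (norms[0] / vol ** (1.0 / n)) ** (1.0 / (n - 1))

B = [[2, 0, 0], [1, 2, 0], [1, 1, 2]]    # toy basis, rows are vectors
print(round(rhf(B), 4))                  # all ||b_i^*|| = 2 here, so rhf = 1.0
```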

Lattice reduction algorithms and their analyses often rely on heuristic assumptions. Let \(\mathcal {L}\) be an n-dimensional lattice and \(\mathcal {S}\) a measurable set in the real span of \(\mathcal {L}\). The Gaussian Heuristic states that the number of lattice points in \(\mathcal {S}\) is \(|\mathcal {L} \cap \mathcal {S}| \approx {{\,\mathrm{Vol}\,}}(\mathcal {S})/{{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}})\). If \(\mathcal {S}\) is an n-ball of radius r, then the latter is \(\approx v_n \cdot r^n / {{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}})\). By setting \(v_n \cdot r^n \approx {{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}})\), we see that \(\lambda _1(\mathcal {L})\) is close to \(\mathrm {GH}(\mathcal {L}) := v_n^{-1/n}\cdot {{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n}\). Asymptotically, we have \(\mathrm {GH}(\mathcal {L}) \approx \sqrt{\frac{n}{2 \pi {{\,\mathrm{\mathrm {e}}\,}}}} \cdot {{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n}\).
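
Both the unit-ball volume and the Gaussian Heuristic are easy to check numerically. The sketch below compares the exact values with the asymptotic estimates above for a volume-1 lattice; at moderate n the agreement is already within a few percent:

```python
from math import pi, gamma, sqrt, e, log

def ball_volume(n):
    """v_n = pi^(n/2) / Gamma(1 + n/2), the volume of the n-dim unit ball."""
    return pi ** (n / 2) / gamma(1 + n / 2)

n = 100
# asymptotic estimate v_n ~ (2*pi*e/n)^(n/2) / sqrt(n*pi), compared in logs
# to avoid underflow
log_v_exact = (n / 2) * log(pi) - log(gamma(1 + n / 2))
log_v_approx = -0.5 * log(n * pi) + (n / 2) * log(2 * pi * e / n)

# GH of a volume-1 lattice: v_n^{-1/n}, versus sqrt(n / (2*pi*e))
gh_exact = ball_volume(n) ** (-1 / n)
gh_approx = sqrt(n / (2 * pi * e))
print(round(gh_exact, 3), round(gh_approx, 3))
```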

2.2 Enumeration and Kannan’s Algorithm

The \({{\,\mathrm{Enum}\,}}\) algorithm  [Kan83, FP83] is an SVP solver. It takes as input a basis matrix \({\textit{\textbf{B}}}\) of a lattice \(\mathcal {L}\) and consists in enumerating all \((x_i,\ldots , x_{n-1}) \in \mathbb {Z}^{n-i}\) such that \(\Vert \pi _i(\sum _{j\ge i} x_j \cdot \textit{\textbf{b}}_j)\Vert \le A \) for every \(i < n\), where A is an a priori upper bound on or estimate of \(\lambda _1(\mathcal {L})\) (such as \(\Vert \textit{\textbf{b}}_0\Vert \) and \(\mathrm {GH}(\mathcal {L})\), respectively). It may be viewed as a depth-first search of an optimal leaf in a tree indexed by tuples \((x_i,\ldots ,x_{n-1})\), where the singletons \(x_{n-1}\) lie at the top and the full tuples \((x_0,\ldots ,x_{n-1})\) are the leaves. The running-time of \({{\,\mathrm{Enum}\,}}\) is essentially the number of tree nodes (up to a small polynomial factor), and its space cost is polynomial. As argued in  [HS07], the tree size can be estimated as \(\max _{i<n} (v_i \cdot A^i / \prod _{j \ge n-i} \Vert \textit{\textbf{b}}_j^*\Vert )\), under the Gaussian Heuristic. In  [ANS18], it was shown that a quadratic speedup can be obtained quantumly using Montanaro’s quantum backtracking algorithm (and the space cost remains polynomial). We will rely on the following (classical) cost bound, derived from  [HS07, Subsection 4.1]. It is obtained by optimising the tree size \(\max _{i<n} (v_i \cdot A^i / \prod _{j \ge n-i} \Vert \textit{\textbf{b}}_j^*\Vert )\). We can replace A by twice the Gaussian Heuristic \(\mathrm {GH}(\mathcal {L}) = v_n^{-1/n}\cdot {{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n}\), where \({{\,\mathrm{Vol}\,}}(\mathcal {L}) = \prod _{j<n} \Vert \textit{\textbf{b}}_j^*\Vert \). By using the bounds \(\Vert \textit{\textbf{b}}_i^*\Vert \in c \cdot \delta ^{-i} \cdot [1/2,2]\), this optimisation problem boils down to maximising \(\delta ^{ni/2-i^2/2}\) for \(i<n\). The maximum is \(\delta ^{n^2/8}\) (for \(i=n/2\)).
The other terms are absorbed in the \(2^{O(n)}\) factor.

Theorem 1

Let \({\textit{\textbf{B}}}\) be a basis matrix of an n-dimensional rational lattice \(\mathcal {L}\). Assume that there exist \(c>0\) and \(\delta >1\) such that \(\Vert \textit{\textbf{b}}_i^*\Vert \in c \cdot \delta ^{-i} \cdot [1/2,2]\), for all \(i < n\). Then, given \({\textit{\textbf{B}}}\) as input (with \(A = 2\cdot v_n^{-1/n} \cdot {{{\,\mathrm{Vol}\,}}(\mathcal {L})}^{1/n}\)), the \({{\,\mathrm{Enum}\,}}\) algorithm returns a shortest non-zero vector of \(\mathcal {L}\) within \(\delta ^{\frac{n^2}{8}}\cdot 2^{O(n)} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\textit{\textbf{B}}}))\) bit operations. Its space cost is \({{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\textit{\textbf{B}}}))\).
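
The tree-size estimate underlying Theorem 1 can be evaluated directly on a given Gram–Schmidt profile. The sketch below does this in logs, on a hypothetical GSA profile \(\Vert \textit{\textbf{b}}_i^*\Vert = \delta ^{-i}\) with \(A = 2 \cdot \mathrm {GH}(\mathcal {L})\); at these toy dimensions the terms absorbed in the \(2^{O(n)}\) factor are still clearly visible, so this is a rough illustration only:

```python
from math import lgamma, log, log2, pi

def log2_ball_volume(d):
    """log2 of v_d = pi^(d/2) / Gamma(1 + d/2)."""
    return ((d / 2) * log(pi) - lgamma(1 + d / 2)) / log(2)

def log2_tree_size(gs_norms):
    """log2 of the [HS07] estimate max_{i<n} v_i * A^i / prod_{j>=n-i} ||b_j^*||,
    with A = 2 * GH(L)."""
    n = len(gs_norms)
    log_norms = [log2(x) for x in gs_norms]
    log_vol = sum(log_norms)
    log_A = 1 + (log_vol - log2_ball_volume(n)) / n   # A = 2 * v_n^{-1/n} * Vol^{1/n}
    best = 0.0
    for i in range(1, n):
        level = log2_ball_volume(i) + i * log_A - sum(log_norms[n - i:])
        best = max(best, level)
    return best

# hypothetical GSA-shaped profiles ||b_i^*|| = delta^{-i}
n = 60
for delta in (1.01, 1.02):
    profile = [delta ** (-i) for i in range(n)]
    print(delta, round(log2_tree_size(profile), 1))
```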

Kannan’s algorithm  [Kan83] relies on recursive calls to \({{\,\mathrm{Enum}\,}}\) to improve the quality of the Gram–Schmidt orthogonalisation of \(\textit{\textbf{B}} \), so that calling \({{\,\mathrm{Enum}\,}}\) on the preprocessed \(\textit{\textbf{B}} \) is less expensive. Its cost bound was lowered in  [HS07] and that cost upper bound was later shown to be sharp in the worst case, up to lower-order terms  [HS08].

Theorem 2

Let \({\textit{\textbf{B}}}\) be a basis matrix of an n-dimensional rational lattice \(\mathcal {L}\). Given \({\textit{\textbf{B}}}\) as input, Kannan’s algorithm returns a shortest non-zero vector of \(\mathcal {L}\) within \(n^{\frac{n}{2 \mathrm {e}} (1+o(1))} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\textit{\textbf{B}}}))\) bit operations. Its space cost is \({{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\textit{\textbf{B}}}))\).

In practice, enumeration is accelerated using two main techniques. The first one, inspired by Kannan’s algorithm, consists in preprocessing the basis with a strong lattice reduction algorithm, such as BKZ (see next subsection). Note that BKZ uses an SVP solver in a lower dimension, so these algorithms can be viewed as calling themselves recursively, in an intertwined manner. The second one is tree pruning  [SE94, GNR10]. The justifying observation is that some tree nodes are much more unlikely than others to have leaves in their subtrees, and are hence discarded. More concretely, one considers the strengthened condition \(\Vert \pi _i(\sum _{j\ge i} x_j \cdot \textit{\textbf{b}}_j)\Vert \le t_i \cdot A \), for some pruning coefficients \(t_i \in (0,1)\). These coefficients can be used to extract a refined estimated enumeration cost as well as an estimated success probability (see, e.g.  [Che13, Sec. 3.3]). By making the probability extremely small, the cost-over-probability ratio can be lowered, and the probability can be boosted by re-randomising the basis and repeating the pruned enumeration. This strategy is called extreme pruning  [GNR10].

2.3 Lattice Reduction

Given a basis matrix \(\textit{\textbf{B}} \in \mathbb {Q}^{m \times n}\) of a lattice \(\mathcal {L}\), the LLL algorithm  [LLJL82] outputs in polynomial time a basis \(\textit{\textbf{C}} \) of \(\mathcal {L}\) whose Gram–Schmidt norms cannot decrease too fast: \(\Vert \textit{\textbf{c}}_i^*\Vert \ge \Vert \textit{\textbf{c}}_{i-1}^*\Vert /2\) for every \(0< i < n\). In particular, we have \(\mathtt {rhf} (\textit{\textbf{C}}) \le 2\). A lattice basis \(\textit{\textbf{B}} \) is size-reduced if it satisfies \(|\mu _{i,j}|\le 1/2\) for \(j< i < n\) where \(\mu _{i,j} = \langle \textit{\textbf{b}}_i^{},\textit{\textbf{b}}_j^*\rangle /\langle \textit{\textbf{b}}_j^*,\textit{\textbf{b}}_j^*\rangle \). A lattice basis \(\textit{\textbf{B}} \) is HKZ-reduced if it is size-reduced and satisfies \(\Vert \textit{\textbf{b}}_i^*\Vert = \lambda _1(\pi _i(\mathcal {L}_{\left[ {i:n}\right) }))\), for all \(i < n\). A basis \(\textit{\textbf{B}} \) is BKZ-k reduced for block size \(k \ge 2\) if it is size-reduced and further satisfies \(\Vert \textit{\textbf{b}}_i^*\Vert = \lambda _1(\pi _i(\mathcal {L}_{\left[ {i:\min (i+k,n)}\right) }))\), for all \(i < n\).
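
The size-reduction condition can be checked numerically from the \(\mu _{i,j}\) defined above. A toy sketch (rows as basis vectors, floating-point arithmetic for illustration only):

```python
import numpy as np

def mu_matrix(B):
    """Gram-Schmidt coefficients mu_{i,j} = <b_i, b_j^*> / <b_j^*, b_j^*>."""
    B = np.array(B, dtype=float)
    n = B.shape[0]
    Bstar = np.zeros_like(B)
    mu = np.zeros((n, n))
    for i in range(n):
        Bstar[i] = B[i]
        for j in range(i):
            mu[i, j] = B[i] @ Bstar[j] / (Bstar[j] @ Bstar[j])
            Bstar[i] -= mu[i, j] * Bstar[j]
    return mu

def is_size_reduced(B):
    """|mu_{i,j}| <= 1/2 for all j < i."""
    mu = mu_matrix(B)
    return bool(np.all(np.abs(np.tril(mu, -1)) <= 0.5))

print(is_size_reduced([[2, 0], [1, 2]]))   # mu_{1,0} = 1/2, so True
```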

The Schnorr–Euchner \({{\,\mathrm{BKZ}\,}}\) algorithm  [SE94] is the lattice reduction algorithm that is commonly used in practice to obtain bases of better quality than those output by LLL (there exist algorithms that admit better analyses, such as  [GN08a, MW16, ALNS19], but BKZ remains the best in terms of practical performance reported in the current literature). \({{\,\mathrm{BKZ}\,}}\) takes as input a block size k and a basis matrix \(\textit{\textbf{B}} \) of a lattice \({{\,\mathrm{\mathcal {L}}\,}}\), and outputs a basis which is “close” to being \({{\,\mathrm{BKZ}\,}}\)-k reduced, up to algorithm parameters. The \({{\,\mathrm{BKZ}\,}}\) algorithm calls an SVP solver in dimensions \(\le k\) on projected sublattices of the working basis of an n-dimensional input lattice. A \({{\,\mathrm{BKZ}\,}}\) sweep consists in SVP solver calls for \(\pi _i(\mathcal {L}_{\left[ {i:\min (i+k,n)}\right) })\) for i from 0 to \(n-2\). BKZ proceeds by repeating such sweeps, and typically a small number of sweeps suffices. At each execution of the SVP solver, if we have \(\lambda _1(\pi _i(\mathcal {L}_{\left[ {i:\min (i+k,n)}\right) })) < \delta \cdot \Vert \textit{\textbf{b}}_i^*\Vert \) where \(\delta <1\) is a relaxing parameter that is close to 1, then \({{\,\mathrm{BKZ}\,}}\) updates the block \(\pi _i(\textit{\textbf{B}} _{\left[ {i:\min (i+k,n)}\right) })\) by inserting the vector found by the SVP solver at index i. It then removes the created linear dependency, e.g. using a gcd computation (see, e.g.  [GN08a]). Whether there was an insertion or not, \({{\,\mathrm{BKZ}\,}}\) finally calls LLL on the local block \(\pi _{i}\left( \textit{\textbf{B}} _{\left[ {i:\min (i+k,n)}\right) } \right) \). The procedure terminates when no change occurs at all during a sweep or after a certain termination condition is fulfilled.
The higher k, the better the \({{\,\mathrm{BKZ}\,}}\) output quality, but the higher the cost: for large n, \({{\,\mathrm{BKZ}\,}}\) achieves root Hermite factor essentially \(k^{1/(2k)}\) (see  [HPS11]) using an SVP-solver in dimensions \(\le k\) a polynomially bounded number of times.

Schnorr  [Sch03] introduced a heuristic on the shape of the Gram–Schmidt norms of BKZ-reduced bases, called the Geometric Series Assumption (GSA). The GSA asserts that the Gram–Schmidt norms \(\{\Vert \textit{\textbf{b}}_i^*\Vert \}_{i < n}\) of a BKZ-reduced basis behave as a geometric series, i.e., there exists \(r>1\) such that \(\Vert \textit{\textbf{b}}_i^*\Vert /\Vert \textit{\textbf{b}}_{i+1}^*\Vert \approx r\) for all \(i < n-1\). In this situation, the root Hermite factor is \(\sqrt{r}\). It was experimentally observed  [CN11] that the GSA is a good first approximation to the shape of the Gram–Schmidt norms of BKZ. However, as observed in  [CN11] and studied in  [YD17], the GSA does not provide an exact fit to the experiments of BKZ for the last k indices; similarly, as observed in  [YD17] and studied in  [BSW18], the GSA also does not fit for the first few indices (the latter phenomenon seems to vanish for large k, as opposed to the former).
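
The relation between the GSA ratio r and the root Hermite factor follows directly from the definitions, and is exact on an ideal GSA profile. A quick numerical confirmation with a hypothetical ratio \(r = 1.01\):

```python
from math import log, exp, sqrt

# Under the GSA with ||b_i^*|| = r^{-i} * ||b_0||, the definitions give
# Vol(L)^{1/n} = ||b_0|| * r^{-(n-1)/2}, hence rhf(B) = sqrt(r) exactly.
n, r = 200, 1.01                                      # hypothetical values
log_norms = [-i * log(r) for i in range(n)]           # with ||b_0|| = 1
log_vol = sum(log_norms)
rhf = exp((log_norms[0] - log_vol / n) / (n - 1))
print(round(rhf, 6), round(sqrt(r), 6))               # both equal sqrt(r)
```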

We will use the self-dual \({{\,\mathrm{BKZ}\,}}\) algorithm (\({{\,\mathrm{SDBKZ}\,}}\)) from  [MW16]. \({{\,\mathrm{SDBKZ}\,}}\) proceeds similarly to \({{\,\mathrm{BKZ}\,}}\), except that it intertwines forward and backward sweeps (for choosing the inputs to the SVP solver), whereas \({{\,\mathrm{BKZ}\,}}\) uses only forward sweeps. Further, it only invokes the SVP solver in dimension exactly k, so that a forward sweep consists in considering \(\pi _i(\mathcal {L}_{\left[ {i:i+k}\right) })\) for i from 0 to \(n-k\) and a backward sweep consists in considering (the duals of) \(\pi _i(\mathcal {L}_{\left[ {i:i+k}\right) })\) for i from \(n-k\) down to 0. We assume that the final sweep is a forward sweep. We use \({{\,\mathrm{SDBKZ}\,}}\) in the theoretical analysis rather than \({{\,\mathrm{BKZ}\,}}\) because, under the Gaussian Heuristic and after polynomially many sweeps, the first \(n - k\) Gram–Schmidt norms of the basis (almost) decrease geometrically, i.e. satisfy the GSA. This may not be necessary for our result to hold, but this simplifies the computations significantly. We adapt  [MW16] by allowing \({{\,\mathrm{SDBKZ}\,}}\) to rely on a \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) solver \(\mathcal {O}\) rather than on an exact SVP solver (which in particular is a \(\sqrt{k}\)-\({{\,\mathrm{HSVP}\,}}\) solver). We let \({{\,\mathrm{{{\,\mathrm{SDBKZ}\,}}^{\mathcal {O}}}\,}}\) denote the modified algorithm. The analysis of  [MW16] can be readily adapted. We will rely on the following heuristic assumption, which extends the Gaussian Heuristic.

Heuristic 1

Let \(\mathcal {O}\) be a \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) solver in dimension k. During the \({{\,\mathrm{{{\,\mathrm{SDBKZ}\,}}^{\mathcal {O}}}\,}}\) execution, each call to \(\mathcal {O}\) for a projected k-dimensional sublattice \(\pi ({{\,\mathrm{\mathcal {L}}\,}})\) of the input lattice \({{\,\mathrm{\mathcal {L}}\,}}\) returns a vector of norm \(\approx \gamma \cdot {({{\,\mathrm{Vol}\,}}( \pi ({{\,\mathrm{\mathcal {L}}\,}})))}^{\frac{1}{k}}\).

The \({{\,\mathrm{{{\,\mathrm{SDBKZ}\,}}^{\mathcal {O}}}\,}}\) algorithm makes the Gram–Schmidt norms converge to a fixed point, very fast in terms of the number of \({{\,\mathrm{HSVP}\,}}\) calls  [MW16, Subsection 4.2]. That fixed point is described in  [MW16, Corollary 2]. Adapting these results leads to the following.

Theorem 3

(Under Heuristic 1). Let \(\mathcal {O}\) be a \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) solver in dimension k. Given as input a basis of an n-dimensional rational lattice \({{\,\mathrm{\mathcal {L}}\,}}\), \({{\,\mathrm{{{\,\mathrm{SDBKZ}\,}}^{\mathcal {O}}}\,}}\) outputs a basis \({\textit{\textbf{B}}}\) of \({{\,\mathrm{\mathcal {L}}\,}}\) such that, for all \(i < n-k\), we have

$$ \Vert \textit{\textbf{b}}_i^*\Vert \approx \gamma ^{\frac{n-1-2i}{k-1}} \cdot {({{\,\mathrm{Vol}\,}}{{\,\mathrm{\mathcal {L}}\,}})}^{\frac{1}{n}}. $$

The number of calls to \(\mathcal {O}\) is \(\le {{\,\mathrm{\mathrm {poly}}\,}}(n)\) and the bit-size of the output basis is \(\le {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\textit{\textbf{B}}}))\).
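
The shape in Theorem 3 can be sanity-checked numerically. Extending it, for illustration only, to all \(i < n\) with \({{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}}) = 1\): the exponents \(n-1-2i\) sum to zero, so the profile is volume-consistent, and the implied root Hermite factor \(\gamma ^{1/(k-1)}\) is close to \(k^{1/(2k)}\) when \(\mathcal {O}\) is an exact SVP solver (\(\gamma = \sqrt{k}\)):

```python
from math import sqrt, log, exp

n, k = 100, 40                         # hypothetical toy parameters
gam = sqrt(k)                          # an exact SVP solver is sqrt(k)-HSVP
log_norms = [(n - 1 - 2 * i) / (k - 1) * log(gam) for i in range(n)]
assert abs(sum(log_norms)) < 1e-9      # consistent with Vol(L) = 1
rhf = exp(log_norms[0] / (n - 1))      # (||b_0|| / Vol^{1/n})^{1/(n-1)}
print(round(rhf, 4), round(k ** (1 / (2 * k)), 4))
```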

2.4 Simulating Lattice Reduction

To understand the behaviour of lattice reduction algorithms in practice, a useful approach is to conduct simulations. The underlying idea is to model the practical behaviour of the evolution of the Gram–Schmidt norms during the algorithm execution, without running a costly lattice reduction. Note that this requires only the Gram–Schmidt norms and not the full basis. Chen and Nguyen first provided a BKZ simulator  [CN11] based on the Gaussian Heuristic and with an experiment-driven modification for the blocks at the end of the basis. It relies on the assumption that each SVP solver call in the projected blocks (except the ones at the end of the basis) finds a vector whose norm corresponds to the Gaussian Heuristic applied to that local block. The remaining Gram–Schmidt norms of the block are updated to keep the determinant of the block constant. (Note that in the original  [CN11] simulator, these Gram–Schmidt norms are not updated to keep the determinant of the block constant, but are adjusted at the end of the sweep to keep the global determinant constant; our variant helps for taking enumeration costs into account.)
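
A much-simplified sketch of this simulation style is given below: one forward sweep, where each SVP call sets the first block norm to the Gaussian Heuristic of its block and the remaining block norms are rescaled to keep the block volume constant. The [CN11] end-of-basis correction, termination logic and cost accounting are all omitted, and the starting profile is hypothetical:

```python
from math import lgamma, log, pi

def log_gh(log_vol, d):
    """log of GH for a d-dimensional lattice of the given log-volume."""
    log_v_d = (d / 2) * log(pi) - lgamma(1 + d / 2)   # log unit-ball volume
    return (log_vol - log_v_d) / d

def simulate_sweep(log_norms, k):
    """One forward BKZ-style sweep on the log Gram-Schmidt profile."""
    log_norms = list(log_norms)
    n = len(log_norms)
    for i in range(n - 1):
        d = min(i + k, n) - i
        if d < 2:
            continue
        block_vol = sum(log_norms[i:i + d])
        new_first = log_gh(block_vol, d)
        if new_first < log_norms[i]:
            diff = log_norms[i] - new_first
            log_norms[i] = new_first
            for t in range(i + 1, i + d):     # keep the block volume constant
                log_norms[t] += diff / (d - 1)
    return log_norms

# hypothetical steep (LLL-like) starting profile, then a few sweeps
n, k = 80, 30
profile = [-i * log(1.05) for i in range(n)]
for _ in range(5):
    profile = simulate_sweep(profile, k)
print(round(profile[0], 3))   # the first vector only gets shorter across sweeps
```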

We extend this simulator in two ways: first, we adapt it to estimate the cost and not only the evolution of the Gram–Schmidt norms; second, we adapt it to other reduction algorithms, such as \({{\,\mathrm{SDBKZ}\,}}\). To estimate the cost, we use the estimates of the full enumeration cost, or the estimated cost of an enumeration with (extreme) pruning. The full enumeration cost estimate is used in Sect. 3 to model our first algorithm, for which we can heuristically analyse the quality/cost trade-off. The pruned enumeration cost estimate is used in Sect. 4, which aims to provide a more precise study for practical and cryptographic dimensions. To find the enumeration cost with pruning, we make use of FPyLLL’s pruning module, which numerically optimises pruning parameters for a time/success probability trade-off using a gradient descent.

In small block sizes, the enumeration cost is dominated by calls to LLL. In our code, we simply assume that one LLL call in dimension \(k\) costs the equivalent of visiting \(k^3\) nodes. This is an oversimplification, but it avoids completely ignoring this polynomial factor. We will compare our concrete estimates with empirical evidence from timing experiments with the implementation in FPLLL, to measure the effect of this imprecision. This assumption enables us to bootstrap our cost estimates: BKZ with block size up to, say, \(40\) only requires LLL preprocessing, allowing us to estimate the cost of preprocessing with block size up to 40, which in turn enables us to estimate the cost (including preprocessing) for larger block sizes, and so on. To extend the simulation to \({{\,\mathrm{SDBKZ}\,}}\), we simply run the simulation on the Gram–Schmidt norms of the dual basis \(1/\Vert \textit{\textbf{b}}_{n}^{*}\Vert , \ldots , 1/\Vert \textit{\textbf{b}}_{1}^{*}\Vert \). Our simulation source code is available as an attachment to the electronic version of the full version of this work.

We give pseudocode for our costed simulation in Algorithm 1. For \({{\,\mathrm{BKZ}\,}}\) simulation, we call Algorithm 1 with \(d=k\), \(c=0\) and with \({{\,\mathrm{tail}\,}}(x,y,z)\) simply outputting \(x\). For our simulations we prepared Gram–Schmidt shapes for LLL-reduced lattices in increasing dimensions \(d\), on which we then estimated the cost of running the algorithm in question for increasingly heavy preprocessing parameters \(k'\), selecting the least expensive one. In our search, we initialise \(c_{2} = 2^{3}\) and then iteratively compute \(c_{j+1}\) given \(c_{2},\ldots ,c_{j}\). When we instantiate Algorithm 1, we either manually pick some small \(t\) (Sect. 4) or pick \(t = \infty \) (Sect. 3.3), which means running the algorithm until no more changes are made to the basis.
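The bootstrapping search just described can be sketched as follows. Here `enum_cost` stands in for the simulator-backed estimate of the enumeration cost after preprocessing (Algorithm 1); the node-count conventions and per-sweep call counts are simplifying assumptions of this sketch, not the paper's exact accounting.

```python
def bootstrap_costs(max_k, enum_cost, lll_cost=lambda k: k ** 3, sweeps=2):
    """Bootstrap SVP cost estimates c_j (in enumeration nodes).

    enum_cost(j, kp, costs) must return the estimated node count for
    enumeration in dimension j after preprocessing with block size kp;
    in the paper this comes from the costed simulator, here it is a
    parameter.  Returns the cost table and the chosen preprocessing
    block size for each dimension.
    """
    costs = {2: 2.0 ** 3}                   # initialise c_2 = 2^3
    strategies = {2: 2}
    for j in range(3, max_k + 1):
        best, best_kp = float("inf"), 2
        for kp in range(2, j):              # candidate preprocessing sizes
            # preprocessing: ~ one SVP-kp call per position per sweep + LLL
            pre = sweeps * (j - kp + 1) * costs[kp] + lll_cost(j)
            total = pre + enum_cost(j, kp, costs)
            if total < best:
                best, best_kp = total, kp
        costs[j], strategies[j] = best, best_kp
    return costs, strategies
```

Because `costs[kp]` is always filled in before it is needed for a larger `j`, the table is built bottom-up, exactly the bootstrapping described above.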

[Algorithm 1 (figure)]

2.5 State-of-the-Art Enumeration-Based SVP Solving in Practice

To the best of our knowledge, there is no extrapolated running-time for state-of-the-art lattice reduction implementations. Furthermore, the simulation data in  [CN11, Che13] is only available up to a block size of 250. The purpose of this section is to fill this gap by providing extended simulations (and the source code used to produce them) and by reporting running times using the state-of-the-art FPyLLL   [dt19b] and FPLLL   [dt19a] libraries.

First, in Fig. 2 we reproduce the data from  [Che13, Table 5.2] for the estimated cost of solving SVP up to dimension 250, using enumeration.

We then also computed the expected cost (expressed as the number of visited enumeration nodes) up to dimension 500 for Fig. 2; see Algorithm 1 and the source attached to the electronic copy of the full version of this work. We note that the preprocessing strategy adopted in our code is to always run two sweeps of preprocessing, but that preprocessing proceeds recursively, e.g. preprocessing block size 80 with block size 60 may trigger a preprocessing with block size 40 if we previously found that preprocessing to be most efficient for solving SVP-60, as outlined above. This approach matches that of the FPLLL/FPyLLL strategizer  [dt17], which selects the default preprocessing and pruning strategies used in FPLLL/FPyLLL. Thus, the simulation approach resembles that of the actual implementation.

In Fig. 2, we also fitted the coefficients \(a_{1}, a_{2}\) of \(1/(2{{\,\mathrm{\mathrm {e}}\,}})\, n\, \log n + a_{1} \cdot n + a_{2}\) to dimensions \(n\) from 150 to 249.
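Such a fit reduces to a standard least-squares solve once the fixed leading term is subtracted. The sketch below assumes the costs are given as base-2 logarithms of node counts (an assumption of this sketch); the data in the test is synthetic.

```python
import numpy as np

def fit_tail_coefficients(ns, log2_costs):
    """Fit a1, a2 in  log2(cost) = n*log2(n)/(2e) + a1*n + a2.

    The leading n*log2(n)/(2e) term is fixed, so only the affine part
    a1*n + a2 is fitted, by ordinary least squares.
    """
    ns = np.asarray(ns, dtype=float)
    y = np.asarray(log2_costs, dtype=float) - ns * np.log2(ns) / (2 * np.e)
    A = np.vstack([ns, np.ones_like(ns)]).T   # design matrix for a1*n + a2
    (a1, a2), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a1, a2
```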

Furthermore, we plot the chosen preprocessing block sizes and success probability of a single enumeration (FPLLL uses extreme pruning) in Fig. 3. This highlights that, even in dimension 500, preprocessing is still well below the \(n - o(n)\) required for Kannan’s algorithm  [Kan83, MW15].

Fig. 2.

Expected number of nodes visited during enumeration in dimension \(n\).

Figure 4 plots the running-times of FPLLL in terms of enumeration nodes, timed using the code available as an attachment to the electronic version of the full version of this work. Concretely, the running-time in seconds is first converted to CPU cycles by multiplying by the clock speed of 2.6 GHz, and we then convert from cycles to nodes by assuming that visiting one node takes about 64 clock cycles. Fig. 4 illustrates that our simulation is reasonably accurate. We note that for running the timing experiments with FPLLL we relied on FPLLL’s own (recursive call and pruning) strategies, not those produced by our simulator.
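In code, the conversion just described is simply (constants as in the text; they are rough conventions, not measurements):

```python
# Rough conversion used above: 2.6 GHz clock, ~64 cycles per enumeration node.
CLOCK_HZ = 2.6e9
CYCLES_PER_NODE = 64

def seconds_to_nodes(seconds):
    """Approximate number of enumeration nodes visited in the given time."""
    return seconds * CLOCK_HZ / CYCLES_PER_NODE
```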

Fig. 3.

Reduction strategies used for Fig. 2.

Fig. 4.

Number of nodes visited during enumeration in dimension \(n\).

The largest known computational results for finding short vectors in unstructured lattices are those of the Darmstadt SVP Challenge  [SG10]. This challenge asks contestants to find a vector of norm at most \(1.05\) times the Gaussian Heuristic. Thus, the challenge does not require solving SVP exactly, but the easier \((0.254 \sqrt{n})\)-HSVP problem. The strategy we used for SVP can be adapted to this problem as well; see the attachment to the electronic version of the full version of this work. To validate our simulation methodology against this data, we compare our estimates with various entries from the Hall of Fame of  [SG10] and the literature in Fig. 5.

Fig. 5.

Darmstadt SVP Challenge.

We conclude this section by interpreting our simulation results in the context of BKZ. The output quality of BKZ in practice has been studied in the literature  [GN08b, Che13, AGVW17, YD17, BSW18]. Thus, our simulations imply that the running time with which BKZ as implemented in  [dt19a] achieves root Hermite factor \(k^{1/(2k)}\) is bounded by \(k^{k/(2{{\,\mathrm{\mathrm {e}}\,}}) + o(k)}\). Indeed, this bound is tight, i.e. BKZ does not achieve a lower running time. To see this, consider the sandpile model of BKZ’s behaviour  [HPS11]. It implies that even if we start with a GSA line, this line deteriorates from index \(i\) onward as we perform updates on indices \(<i\). Furthermore, extreme pruning, which involves rerandomising local blocks, destroys the GSA shape. Thus, we can conclude that in practice BKZ

  • achieves root Hermite factor \(\approx {(\frac{k}{2\pi {{\,\mathrm{\mathrm {e}}\,}}} \cdot {(\pi \, k)}^{\frac{1}{k}})}^{\frac{1}{2(k-1)}}\)  [Che13]

  • in time \({{\,\mathrm{\mathrm {poly}}\,}}(d) \cdot 2^{1/(2{{\,\mathrm{\mathrm {e}}\,}})\,k \log k - 0.995\,k + 16.25} \approx {{\,\mathrm{\mathrm {poly}}\,}}(d) \cdot 2^{1/(2{{\,\mathrm{\mathrm {e}}\,}})\,k \log k - k + 16}\)

where the unit of time is the number of nodes visited during enumeration. We note that a similar conclusion was already drawn in  [APS15] and discussed in  [ABD+16]. However, that conclusion was based on the unpublished implementation and the limited data in  [Che13].
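To make the two expressions above concrete, the following sketch evaluates them in base-2 logarithms; the functions are direct transcriptions of the formulas above. For instance, for \(k = 100\) they give a root Hermite factor of about 1.0093 and roughly \(2^{39}\) enumeration nodes.

```python
from math import e, log2, pi

def log2_root_hermite_factor(k):
    """log2 of ((k/(2*pi*e)) * (pi*k)**(1/k)) ** (1/(2*(k-1))), from [Che13]."""
    return (log2(k / (2 * pi * e)) + log2(pi * k) / k) / (2 * (k - 1))

def log2_bkz_nodes(k):
    """log2 of the practical BKZ-k cost estimate, in enumeration nodes."""
    return k * log2(k) / (2 * e) - 0.995 * k + 16.25
```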

3 Reaching Root Hermite Factor \(k^{\frac{1}{2k}(1+o(1))}\) in Time \(k^{\frac{k}{8}}\)

This section contains our main contribution: a lattice reduction algorithm that achieves root Hermite factor \(k^{\frac{1}{2k}(1+o(1))}\) in time \(k^{\frac{k}{8}}\). We start with a quality/running-time trade-off boosting theorem, based on \({{\,\mathrm{SDBKZ}\,}}\). We then give and analyse the main algorithm, \({{\,\mathrm{FastEnum}\,}}\), and finally propose a simulator for that algorithm.

3.1 A Boosting Theorem

We first show that \({{\,\mathrm{SDBKZ}\,}}\) allows us to obtain a reduction from a \(\gamma '\)-\({{\,\mathrm{HSVP}\,}}\) solver in dimension \(n'\) to a \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) solver in dimension n, with the former achieving a smaller root Hermite factor than the latter. This reduction is not polynomial-time, but we will later aim at making it no more costly than our \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) solver.

Theorem 4

(Under Heuristic 1). Let \(\mathcal {O}\) be a \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) solver in dimension n. Assume we are given as input a basis \({\mathbf{B}}\) of an \(n'\)-dimensional lattice \({{\,\mathrm{\mathcal {L}}\,}}\), with \(n'>n\). We first call \({{\,\mathrm{{{\,\mathrm{SDBKZ}\,}}^{\mathcal {O}}}\,}}\) on \({\mathbf{B}}\): let \({\mathbf{C}}\) denote the output basis. Then we call the \({{\,\mathrm{Enum}\,}}\) algorithm on the sublattice basis made of the first \(n'-n\) vectors of \({\mathbf{C}}\). This provides a \(\gamma '\)-\({{\,\mathrm{HSVP}\,}}\) solver in dimension \(n'\), with

$$ \gamma ' \le \sqrt{n'-n} \; \gamma ^{\frac{n}{n-1}}. $$

The total cost is bounded by \({{\,\mathrm{\mathrm {poly}}\,}}(n')\) calls to \(\mathcal {O}\) and \(\gamma ^{\frac{{(n'-n)}^2}{4(n-1)}} \cdot 2^{O(n'-n)} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\) bit operations.

Proof

By Theorem 3, we have \(\Vert {\mathbf{c}}_i^*\Vert \in \gamma ^{\frac{n'-1-2i}{n-1}} \cdot {({{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}}))}^{\frac{1}{n'}} \cdot [1/2,2]\), for all \(i < n'-n\). Also, the number of calls to \(\mathcal {O}\) is \(\le {{\,\mathrm{\mathrm {poly}}\,}}(n')\) and the bit-size of \({\mathbf{C}}\) is \(\le {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\).

By Theorem 1 (with “\(\delta = \gamma ^{\frac{2}{n-1}}\)”), the cost of the call to \({{\,\mathrm{Enum}\,}}\) is bounded as \(\gamma ^{\frac{(n'-n)^2}{4(n-1)}} \cdot 2^{O(n'-n)} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{C}}))\), which, by the above is \(\le \gamma ^{\frac{(n'-n)^2}{4(n-1)}} \cdot 2^{O(n'-n)} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\). Further, by Minkowski’s theorem, the vector output by \({{\,\mathrm{Enum}\,}}\) has norm bounded from above by:

$$ \sqrt{n'-n} \cdot \prod _{i=0}^{n'-n-1} {\left( \gamma ^{\frac{n'-1-2i}{n-1}} {({{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}}))}^{\frac{1}{n'}} \right) }^{\frac{1}{n'-n}} = \sqrt{n'-n} \cdot \gamma ^{\frac{n}{n-1}} \cdot {({{\,\mathrm{Vol}\,}}({{\,\mathrm{\mathcal {L}}\,}}))}^{\frac{1}{n'}}. $$

This completes the proof of the theorem.    \(\square \)

Note that the result is not interesting if \(n'-n\) is chosen too small, as such a choice results in an increased root Hermite factor. Also, if \(n'-n\) is chosen too large, then the cost grows very fast. We consider the following instructive application of Theorem 4. By Theorem 2, Kannan’s algorithm finds a shortest non-zero vector of \({{\,\mathrm{\mathcal {L}}\,}}\) in time \(n^{\frac{n}{2{{\,\mathrm{\mathrm {e}}\,}}} (1+o(1))} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\), when given as input a basis \({\mathbf{B}}\) of an n-dimensional lattice \({{\,\mathrm{\mathcal {L}}\,}}\). In particular, it solves \(\gamma \)-\({{\,\mathrm{HSVP}\,}}\) with \(\gamma = \sqrt{n}\) and provides a root Hermite factor \(\le n^{\frac{1}{2n}}\). We want to achieve a similar root Hermite factor, but for a lower cost. Now, for a cost parameter k, we would like to restrict the cost to \(k^{\frac{k}{8}} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\) (ideally, while still achieving root Hermite factor \(k^{\frac{1}{2k}}\)). We hence choose an integer \(k_0 := \frac{{{\,\mathrm{\mathrm {e}}\,}}}{4} (1+o(1)) k\). This indeed provides a cost bounded as \(k^{\frac{k}{8}} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\), but it only solves \(\gamma _0\)-\({{\,\mathrm{HSVP}\,}}\) with \(\gamma _0 = \varTheta (\sqrt{k_0})\) in dimension \(k_0\), i.e. it only provides a root Hermite factor \(\approx \sqrt{k_0}^{\frac{1}{k_0}} = k^{\frac{2}{k{{\,\mathrm{\mathrm {e}}\,}}} (1+o(1))} \approx k^{\frac{0.74}{k}}\), which is much more than \(k^{\frac{1}{2k}}\).

So far, we have not done anything but a change of variable. Now, let us see how Theorem 4 can help. We use it with \(\mathcal {O}\) being Kannan’s algorithm in dimension “\(n = k_0\)”. We set “\(n' = k_1\)” with \(k_1 = k_0 + \lceil \sqrt{k_0 k} \rceil \). This value is chosen so that the total cost bound of Theorem 4 remains \(k^{\frac{k}{8}} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\). The achieved root Hermite factor is \(\le k^{ \frac{1}{k({{\,\mathrm{\mathrm {e}}\,}}/4 + \sqrt{{{\,\mathrm{\mathrm {e}}\,}}/4})} (1+o(1)) } \approx k^{\frac{0.66}{k}}\). Overall, for a similar cost bound, we have decreased the achieved root Hermite factor.
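The constants \(0.74\) and \(0.66\) can be checked numerically. The sketch below uses a large finite \(k\) and drops the \(1+o(1)\) factors, so the first exponent sits slightly below its limit \(2/{{\,\mathrm{\mathrm {e}}\,}} \approx 0.7358\).

```python
from math import e, log, sqrt

k = 1e4                           # a large, finite cost parameter
k0 = (e / 4) * k                  # Kannan dimension: k0^(k0/(2e)) ~ k^(k/8)
k1 = k0 + sqrt(k0 * k)            # boosted dimension, as in Theorem 4

# exponents c in root Hermite factors of the form k^(c/k)
c_kannan = (k / k0) * (0.5 * log(k0) / log(k))   # -> 2/e ~ 0.7358 as k grows
c_boosted = 1 / (e / 4 + sqrt(e / 4))            # ~ 0.66, from Theorem 4
```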

3.2 The \({{\,\mathrm{FastEnum}\,}}\) Algorithm

We iterate the process above to obtain the \({{\,\mathrm{FastEnum}\,}}\) algorithm, described in Algorithm 2. To this end, we define \(k_0 = x_0 \cdot k\) with \(x_0 = \frac{{{\,\mathrm{\mathrm {e}}\,}}}{4}(1+o(1))\) and, for all \(i\ge 1\):

$$\begin{aligned} k_i = \lceil x_i \cdot k \rceil \ \text{ with } \ x_i = x_{i-1} + \sqrt{\frac{x_{i-1}}{i}}. \end{aligned}$$
(1)

We first study the sequence of \(x_i\)’s.
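As a quick numerical companion (a sketch; we take \(x_0 = {{\,\mathrm{\mathrm {e}}\,}}/4\), dropping the \(1+o(1)\) factor), the sequence can be computed directly:

```python
from math import e, sqrt

def x_sequence(levels, x0=e / 4):
    """Compute x_0, ..., x_{levels} via x_i = x_{i-1} + sqrt(x_{i-1} / i)."""
    xs = [x0]
    for i in range(1, levels + 1):
        xs.append(xs[-1] + sqrt(xs[-1] / i))
    return xs
```

Evaluating it confirms the bounds \(i+1-\sqrt{i}< x_i < i+1\) of the lemma below over a wide range of levels.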

Lemma 1

We have \(i +1 - \sqrt{i}< x_i < i+1\) for all \(i \ge 1\).

Proof

The upper bound can be readily proved by induction based on (1). It may be numerically checked that the lower bound holds for \(i \in \{1, 2, 3\}\). We prove by induction that \(1-\frac{x_i}{i} < \frac{1}{\sqrt{i}} - \frac{2}{i}\) for \(i \ge 4\), which is a stronger statement. It may be numerically checked that the latter holds for \(i=4\). Now, assume it holds for some \(i-1 \ge 4\); we aim to prove it for i. We have

$$\begin{aligned} 1 - \frac{x_{i}}{i} =&\frac{1}{i} \left( (i-1) \Big (1-\frac{x_{i-1}}{i-1}\Big ) + \Big (1-\sqrt{\frac{x_{i-1}}{i}}\Big ) \right) \\ =&\frac{1}{i} \left( (i-1) \Big (1-\frac{x_{i-1}}{i-1}\Big ) + \sqrt{\frac{i-1}{i}} \Big (1-\sqrt{\frac{x_{i-1}}{i-1}}\Big ) +1- \sqrt{\frac{i-1}{i}} \right) . \end{aligned}$$

Now, note that \(\sqrt{\frac{x_{i-1}}{i-1}} > 0.2\) (using our induction hypothesis). Using the bound \(1- \sqrt{t} < \frac{1}{2}(1-t) + \frac{1}{4}(1-t)^2\) which holds for all \(t>0.2\), we can bound \(1 - \frac{x_{i}}{i}\) from above by:

$$ \frac{1}{i} \left( (i-1) \Big (1-\frac{x_{i-1}}{i-1}\Big ) + 1 + \sqrt{\frac{i-1}{i}} \Big ( -1+ \frac{1}{2} \Big (1-\frac{x_{i-1}}{i-1}\Big ) + \frac{1}{4} {\Big (1-\frac{x_{i-1}}{i-1}\Big )}^2 \Big ) \right) . $$

It now suffices to observe that the right hand side is smaller than \(\frac{1}{\sqrt{i}} - \frac{2}{i}\), when \(1-\frac{x_{i-1}}{i-1}\) is replaced by \(\frac{1}{\sqrt{i-1}} - \frac{2}{i-1}\). This may be checked with a computer algebra software.    \(\square \)

The \({{\,\mathrm{FastEnum}\,}}\) algorithm (Algorithm 2) consists of calling the process described in Theorem 4 several times, to improve the root Hermite factor while staying within a \(k^{\frac{k}{8}}\) cost bound.

[Algorithm 2 (figure)]

Theorem 5

(Under Heuristic 1). Let \(k \ge 4\) tending to infinity, and \(i \le 2^{o(k)}\). The \({{\,\mathrm{FastEnum}\,}}\) algorithm with parameters k and i solves \(\gamma _i\)-\({{\,\mathrm{HSVP}\,}}\) in dimension \(k_i\), with \(\gamma _i \le k^{\frac{i+1}{2} (1+o(1))}\). For \(i\ge 1\), the corresponding root Hermite factor is below \(k^{\frac{i+1}{2(i+1-\sqrt{i})k} (1+o(1))}\). Further, \({{\,\mathrm{FastEnum}\,}}\) runs in time \(k^{\frac{k}{8}(1+o(1)) + i \cdot O(1)} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\).

For constant values of i (as a function of k), the root Hermite factor is not quite \(k^{\frac{1}{2k}(1+o(1))}\), but it is so for any choice of \(i = \omega (1)\). For i satisfying both \(i= \omega (1)\) and \(i=o(k)\), \({{\,\mathrm{FastEnum}\,}}\) reaches a root Hermite factor \(k^{\frac{1}{2k}(1+o(1))}\) in time \(k^{\frac{k}{8}(1+o(1))} \cdot {{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\).

Proof

We have \(\gamma _0 = \varTheta (\sqrt{k})\) and, by Theorem 4, \(\gamma _i \le \sqrt{k_i - k_{i-1}} \cdot \gamma _{i-1}^{\frac{k_{i-1}}{k_{i-1}-1}}\) for all \(i\ge 1\). Using the definition of the \(k_i\)’s and the bounds of Lemma 1, we obtain, for \(i \ge 1\):

$$ \gamma _i \le {\Big (1+\frac{x_{i-1}}{i} k^2 \Big )}^{1/4} \cdot \gamma _{i-1}^{\frac{k (i-\sqrt{i-1})+1}{k (i-\sqrt{i-1})-1}} \le \sqrt{2k} \cdot \gamma _{i-1}^{1+\frac{2}{ki/2 -1}}. $$

Using \(k \ge 4\), we see that the latter is \(\le \sqrt{2k} \gamma _{i-1}^{1+\frac{8}{ki}}\). By unfolding the recursion, we get, for \(i \ge 1\):

$$ \gamma _i \le \sqrt{2k}^{1 + \sum _{j=0}^{i-1} \prod _{\ell =j}^{i-1} (1+\frac{8}{k (\ell +1)})} \cdot \gamma _0^{\prod _{\ell =0}^{i-1} (1+\frac{8}{k (\ell +1)})}. $$

Now, note that we have (using the bound \(\sum _{\ell =0}^{i-1} \frac{1}{\ell +1} \le \ln (i)+1\), and the inequalities \(1+x \le \exp (x) \le 1+2x\) for \(x \in [0,1]\))

$$ \prod _{\ell =j}^{i-1} (1+\frac{8}{k (\ell +1)}) \le \exp \left( {\sum _{\ell =j}^{i-1}\frac{8}{k (\ell +1)}}\right) \le \exp \left( {\frac{8}{k} (\ln (i) + 1)}\right) \le 1+ \frac{16}{k} (\ln (i) + 1).$$

As \(i \le 2^{o(k)}\), the latter is \(\le 1+o(1)\). Overall, this gives \(\gamma _i \le k^{\frac{i+1}{2} (1+o(1))}\). The claim on the root Hermite factor follows from the lower bound of Lemma 1.

We now consider the run-time of the algorithm, and in particular the term \(\gamma _{i-1}^{\frac{(k_i-k_{i-1})^2}{4(k_{i-1}-1)}} \cdot 2^{O(k_i-k_{i-1})}\) from Theorem 4. Recall that by definition of the \(k_i\)’s, we have \(k_i - k_{i-1} \le 1+ \sqrt{\frac{k_{i-1} k }{i}}\). Using the upper bound of Lemma 1, we obtain that \(k_i -k_{i-1} \le O(k)\), and hence that \(2^{O(k_i-k_{i-1})} \le 2^{O(k)}\). We also have

$$ \gamma _{i-1}^{\frac{(k_i-k_{i-1})^2}{4(k_{i-1}-1)}} \le k^{\frac{i}{2} (1+o(1)) \frac{k_{i-1} k}{4i(k_{i-1}-1)}} \le k^{\frac{k}{8}(1+o(1))}. $$

Further, the number of recursive calls is bounded as \({{\,\mathrm{\mathrm {poly}}\,}}(\prod _{j \le i} k_j)\). By Lemma 1, this is \(\le k^{i \cdot O(1)}\). To complete the proof, it may be shown using standard techniques that all bases occurring during the algorithm have bit-sizes bounded as \({{\,\mathrm{\mathrm {poly}}\,}}({{\,\mathrm{\mathrm {size}}\,}}({\mathbf{B}}))\) (where the bound is independent from i).    \(\square \)

3.3 Simulation of Asymptotic Behaviour

In this subsection, we instantiate the \({{\,\mathrm{FastEnum}\,}}\) algorithm as described in Algorithm 2 and confirm its asymptotic behaviour via simulations. Note that the \({{\,\mathrm{FastEnum}\,}}\) algorithm requires \({{\,\mathrm{SDBKZ}\,}}\) subroutines. To simulate this subroutine, we use the costed simulation of Algorithm 1 with flags: \({{\,\mathrm{SDBKZ}\,}}\) and full enumeration cost. We also omit the cost of \({{\,\mathrm{LLL}\,}}\) in the simulation as the enumeration cost dominates in the parameter range considered in this subsection.

To compare the simulation with the theorems, we consider two scenarios. In the first one, called “Theoretical”, we numerically compute the \(k_i\)’s, \(\gamma _i\)’s and the slope of the Gram–Schmidt log-norms of the enumeration block (i.e. the first \(k_i - k_{i-1}\) vectors) according to Theorem 5. Here the index i denotes the recursion level, and \(k_i\) and \(\gamma _i\) are defined as in (1) and Theorem 5, respectively.

In the second one, called “Simulated”, we still set the \(k_i\)’s according to (1). However, at the i-th level, we first run an \({{\,\mathrm{SDBKZ}\,}}\) simulation on a lattice of dimension \(k_i\), using the \(\gamma _{i-1}\)-HSVP (simulated) oracle from the previous level. Here, the Hermite factor \(\gamma _{i-1}\) is computed from the simulated basis at the \((i-1)\)-th level. The initial \(\gamma _0\) is computed from a simulated HKZ-reduced basis of dimension \(k_0\). During the \({{\,\mathrm{SDBKZ}\,}}\) simulation, we assume that every HSVP call achieves the same Hermite factor \(\gamma _{i-1}\). We let the simulated \({{\,\mathrm{SDBKZ}\,}}\) run until no change occurs to the basis, or until it has achieved the theoretical root Hermite factor at the same level, as guided by the proof of Theorem 5. After the simulated \({{\,\mathrm{SDBKZ}\,}}\) preprocessing, we simulate an enumeration in the first block, of dimension \(k_i - k_{i-1}\). The enumeration cost is estimated using the full enumeration cost model (see Sect. 2.4), since here we are only interested in the asymptotic behaviour (we defer to Sect. 4 for the concrete behaviour). For a fixed cost parameter k, we consider \(\lfloor \ln k\rceil \) recursion levels \(i=0, \ldots , (\lfloor \ln k\rceil -1)\). The implementation used for these experiments is attached to the electronic version of the full version of this work; this simulation algorithm is an instantiation of Algorithm 1.

Using the simulator described above, we computed the achieved simulated root Hermite factors for various cost parameters k from 100 to 2,999. The results are plotted in Fig. 6. We also computed the theoretical root Hermite factors as established by Theorem 5. More precisely, we used the proof of Theorem 5 to update the root Hermite factors recursively, replacing the term \(\sqrt{n'-n}\) of Theorem 4 by \(v_{n'-n}^{-1/(n'-n)}\), where \(v_d\) is the volume of the d-dimensional unit Euclidean ball (this corresponds to using the Gaussian Heuristic). It can be observed that the theoretical and simulated root Hermite factors agree closely.
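The Gaussian-Heuristic replacement term can be computed as follows (a sketch; \(v_d\) as above):

```python
from math import exp, lgamma, log, pi

def gh_factor(d):
    """v_d^(-1/d): the Gaussian-Heuristic replacement for sqrt(d) above."""
    log_vd = (d / 2) * log(pi) - lgamma(d / 2 + 1)   # log volume of unit ball
    return exp(-log_vd / d)
```

For large \(d\), \(v_d^{-1/d} \approx \sqrt{d/(2\pi {{\,\mathrm{\mathrm {e}}\,}})}\), noticeably smaller than the worst-case \(\sqrt{d}\) of Minkowski's bound.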

Fig. 6.

Simulated and theoretical root Hermite factors for \(k = 100\) to 2,999 after \(\ln k\) levels of recursion.

Fig. 7.

Number of nodes in full enumeration visited during simulation, and a fit.

Figure 7 shows the number of nodes visited during the simulation from \(k=100\) to 2,999, as well as a curve fit. As an example of the output, Fig. 8 plots the Gram–Schmidt log-norms of the (simulated) reduced basis for \(k=1,000\) right after 7 levels of recursion. Note that the last Gram–Schmidt norms of the basis have the shape of those of an HKZ-reduced basis, since we use Kannan’s algorithm at level 0. Also, the successive segments correspond to levels of recursion; their lengths decrease and their respective (negative) slopes decrease with the indices of the Gram–Schmidt norms.

Fig. 8.

Gram–Schmidt log-norms of simulated experiments with \(k=1,000\) after \(7 \approx \ln k\) recursion levels.

Finally, we plot the slope of the Gram–Schmidt log-norms for \(k=1,000\) during the first 20 recursion levels. At level i, we compute the slope over the enumeration region (i.e. the first block, of size \(k_i - k_{i-1}\)). It can be observed that the simulated slope is indeed increasing (Fig. 9).

Fig. 9.

Simulated and theoretical Gram–Schmidt log-norms slope of enumeration region, for \(k=1,000\) and during the first 20 iterations.

4 A Practical Variant

It can be observed that, in our analysis of Algorithm 2, the dimension of the lattice is relatively large. It is thus interesting to investigate algorithms that require smaller dimensions. In this section, we describe a practical strategy that works with dimensions \(d = O(k)\), where the hidden constants are small. As mentioned in the introduction, practical implementations of lattice reduction algorithms often deviate from the asymptotically efficient variants, e.g. by applying much weaker preprocessing than required asymptotically. We use numerically optimised preprocessing and enumeration strategies to parameterise Algorithm 3, which we view as a practical variant of Algorithm 2, working with dimensions \(d = \lceil (1+c)\cdot k\rceil \) for some small constant \(c\ge 0\). It differs from Algorithm 2 in two respects. First, it applies BKZ preprocessing instead of \({{\,\mathrm{SDBKZ}\,}}\) preprocessing. This is merely an artefact of the latter seemingly not providing an advantage in the parameter ranges we considered. Second, the algorithm adapts the enumeration dimension based on the “space available” for preprocessing. This enforces that it stays within \(d\) dimensions, instead of requiring \(\approx i k\) dimensions, where i is the number of recursion levels.

We use the following functions in Algorithm 3:

  • The function \({{\,\mathrm{pre}\,}}({k})\) returns a preprocessing cost parameter for a given \(k\).

  • The function \({{\,\mathrm{tail}\,}}(k, c, d)\) returns a new cost parameter \(k^{\star }\) such that enumeration in dimension \(k^{\star }\) after preprocessing with \({{\,\mathrm{pre}\,}}({k^{\star }})\) in dimension \(d\) costs at most as much as enumeration in dimension \(k\) after preprocessing in dimension \(\lceil (1+c)\cdot k \rceil \). In particular, if \(d \ge \lceil (1+c) \cdot k \rceil \) then \(k^{\star } = k\).

  • Preprocessing (Step 4) calls Algorithm 4, perhaps restricted to a small number of while loops. Algorithm 4 is simply the BKZ algorithm where the SVP oracle is replaced by Algorithm 3.
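A possible realisation of \({{\,\mathrm{tail}\,}}\) is sketched below. Here `cost(k, d)` stands in for the simulator's estimate of solving with cost parameter \(k\) inside \(d\) dimensions; it is an assumption of this sketch, not an interface of the actual implementation.

```python
from math import ceil

def tail(k, c, d, cost):
    """Largest k* <= k whose cost in d dimensions stays within the budget
    of enumeration with parameter k preprocessed in ceil((1+c)*k) dims."""
    full = ceil((1 + c) * k)
    if d >= full:
        return k          # enough room: keep the enumeration dimension
    budget = cost(k, full)
    k_star = k
    while k_star > 2 and cost(k_star, d) > budget:
        k_star -= 1       # shrink until the budget is respected
    return k_star
```

Note that, as required above, `tail` returns \(k^{\star } = k\) whenever \(d \ge \lceil (1+c) \cdot k \rceil\).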

[Algorithm 3 (figure)]
[Algorithm 4 (figure)]

We plot the output of our simulations for Algorithm 3 in Fig. 10. These simulations are instantiations of Algorithm 1 with \(d>k\), \(c>0\) and \({{\,\mathrm{tail}\,}}(x,y,z)\) matching those used in Algorithm 3; they were produced using the code attached to the electronic version of the full version of this work. Our strategy search follows the same blueprint as described in Sect. 2.4. Through such simulation experiments we manually established that \(c=0.25\), four sweeps of preprocessing and using BKZ over \({{\,\mathrm{SDBKZ}\,}}\) seem to provide the best performance, which is why we report data for these choices. We also fitted the coefficients \(a_{0}, a_{1}, a_{2}\) of \(a_{0} \cdot k \log k + a_{1} \cdot k + a_{2}\) to points from 100 to 249. Furthermore, we plot the data from Fig. 2 to provide a reference point for the performance of the new algorithm, and we also provide some data on the hypothetical performance of Algorithm 3 assuming that all preprocessing costs only as much as LLL, regardless of the choice of \(k'\). This can be considered the best-case scenario for Algorithm 3 and thus a rough lower bound on its running time.

In Fig. 11 we give the preprocessing cost parameters and the probabilities of success of a single enumeration selected by our optimisation. In particular, Fig. 11b suggests that the success probability per enumeration does not drop exponentially fast. This is consistent with the second-order term in the time complexity being closer to \(1/2\) (corresponding to standard pruning) than to \(1\) (corresponding to extreme pruning). Similarly, in contrast to Fig. 3a, the preprocessing cost parameter (or “block size”) \(k'\) in Fig. 11a does not seem to follow an affine function of \(k\), i.e. it seems to grow faster for larger dimensions.

We also give experimental data comparing our implementation of Algorithm 3 (attached to the electronic version of the full version of this work) with our simulations in Fig. 12. We note that our implementation of Algorithm 3 is faster than FPyLLL’s SVP solver from dimension 82 onward. As in Sect. 2.5, we do not use the strategies produced by our simulation to run the implementation, but rely on a variant of FPLLL’s strategizer  [dt17] to optimise these strategies.

Fig. 10.

Cost of one call to Algorithm 3 with enumeration dimension \(k\), \(c=1/4\), \(d=\lceil (1+c)\cdot k \rceil \) and four preprocessing sweeps.

Fig. 11.

Reduction strategies used for Fig. 10.

Comparing Figs. 2 and 10 is meaningless without taking the obtained root Hermite factors into account. First, Algorithm 3 is not an SVP solver but an Approx-HSVP solver. Second, if \(d<\lceil (1+c)\cdot k \rceil \) then Algorithm 3 will reduce the enumeration dimension, further decreasing the quality of the output.

Since we are interested in running Algorithm 3 as a subroutine of Algorithm 4, we compare the latter against plain BKZ. For this comparison we consider the case \(d = 2\cdot k\), which corresponds to a typical setting encountered in cryptographic applications.

In Fig. 13, we plot the slope of the Gram–Schmidt log-norms as predicted by our simulations for \({{\,\mathrm{BKZ}\,}}\) on the one hand, and for a self-dual variant of Algorithm 4 on the other. This variant first runs Algorithm 4 on the dual basis, followed by running Algorithm 4 on the original basis; each run is capped at half the number of sweeps used for BKZ. The rationale for this strategy is that it mitigates the quality degradation as the BKZ index \(i\) surpasses \(d-\lceil (1+c)\cdot k \rceil \), where \(k^{\star } < k\). As Fig. 13 illustrates, the quality obtained by the two algorithms is very close. Indeed, our self-dual variant slightly outperforms BKZ, but we note that the ratio of the two is increasing, i.e. the quality advantage will invert as d increases.

Fig. 12.

Number of nodes visited during one Approx-HSVP call with enumeration dimension \(k\), \(c=1/4\), \(d=\lceil (1+c)\cdot k \rceil \) and four sweeps of preprocessing.

Fig. 13.

Basis quality (BKZ vs SD-Algorithm 4)