1 Introduction

Lattice cryptography provides powerful techniques to build a wide range of advanced cryptographic primitives, like identity based encryption [2,3,4, 11, 23, 31], attribute based encryption [14, 16, 17, 19, 33], some types of fully homomorphic encryption and signatures [12, 13, 24, 32, 35], group signatures [21, 36, 44, 45, 54] and much more (e.g., see [6, 10, 34, 46, 51, 56, 57, 60]). Most of the advanced applications of lattice cryptography rely on a notion of strong lattice trapdoor, introduced in [31], which makes it possible to sample points from an n-dimensional lattice L with a gaussian-like distribution. This gaussian sampling operation is often the main bottleneck in the implementation of advanced cryptographic functions that make use of strong lattice trapdoors, and improving the methods to generate and use lattice trapdoors has been the subject of several investigations [5, 7, 31, 55].

The current state of the art in lattice trapdoor generation and sampling is given by the work of Micciancio and Peikert [51], which introduces a new notion of lattice trapdoor, specialized to the type of q-ary lattices used in cryptography, i.e., integer lattices \(L \subseteq \mathbb {Z}^{n}\) that are periodic modulo \(q\cdot \mathbb {Z}^{n}\). The trapdoor is then used to efficiently sample lattice points with gaussian distribution around a given target. Building on techniques from [55], the sampling algorithm of [51] includes both an on-line and an off-line stage, and [51] focuses on improving the complexity of the on-line stage, which is far more critical in applications. Unfortunately, the most efficient algorithms proposed in [51] for (the on-line stage of) preimage sampling only apply to lattices with modulus \(q=2^{k}\) equal to a power of 2 (or, more generally, a power \(q=p^{k}\) of a small prime p), which is not compatible with the functional or efficiency requirements of many applications. Moreover, only the on-line stage of [51] takes full advantage of the structure of algebraic lattices [48,49,50] typically employed in the efficient instantiation of lattice cryptography, and essential to reduce the running time of lattice operations from quadratic (in the lattice dimension) to quasi-linear. A straightforward implementation of the off-line stage (e.g., using a generic Cholesky decomposition algorithm) completely destroys the algebraic structure, and degrades the running time of the (off-line) algorithm from quasi-linear to quadratic or worse. For lattices over “power-of-two” cyclotomic rings (the most popular class of algebraic lattices used in cryptography), a much faster algorithm for the off-line stage was proposed by Ducas and Nguyen in [27, Sect. 6], and subsequently simplified, improved and extended to a more general class of cyclotomic rings by the Fast Fourier Orthogonalization (FFO) of Ducas and Prest [28].

Our Contribution. We present improved algorithms for gaussian preimage sampling using the lattice trapdoors of [51]. Specifically, we present a new algorithm (for the on-line stage) capable of handling any modulus q (including the large prime moduli required by some applications) while still achieving the same level of performance as the specialized algorithm of [51] for power-of-two modulus \(q=2^{k}\). This improves the running time of [51] for arbitrary modulus from cubic \(\log ^{3} q\) (or quadratic \(\log ^{2} q\), using precomputation and a substantial amount of storage) to just linear in \(\log q\), with minimal storage requirements.

As an additional contribution, we present an improved algorithm for the off-line perturbation generation problem which takes full advantage of the algebraic structure of ring lattices. We remark that this problem can already be solved (in quasilinear time \(\tilde{O}(n)\)) using the FFO algorithm of [28], which first produces a compact representation of the orthogonalized lattice basis (or covariance matrix), and then uses it to quickly generate lattice samples. We improve on the algorithm of [28] on two fronts. First, the FFO algorithm is quasi-linear in the ring dimension, but quadratic in the module dimension (which, in our application, is \(\log q\)). We combine [28] with the “sparse matrix” optimization of [9] to yield an algorithm that is linear in both the ring dimension and \(\log q\). Moreover, we provide a variant of the FFO algorithm that performs essentially the same operations as [28], but without requiring the precomputation and storage of the FFO (structured) matrix, thereby simplifying the implementation and improving the space complexity of [28].

The G-sampling improvements are summarized in Table 1. The improvements are not just asymptotic: our new algorithms are fairly simple, with small hidden constants, and include a careful choice of the parameters that allows most steps to be implemented using only integer arithmetic on very small numbers. In Sect. 3.3, we provide an experimental comparison showing that the new algorithm outperforms the generic method of [51] already for small values of the moduli, making it an attractive choice for implementations even in applications where the modulus \(q=n^{O(1)}\) has logarithmic bit-size. For applications using an exponentially large \(q = \exp (n)\), the projected performance improvements are dramatic. The concrete efficiency of our algorithms in the context of full-blown cryptographic applications has recently been confirmed by independent implementation efforts [25, 37, 38].

Table 1. Running time and storage of the G-sampling algorithms. G-sampling running times are scaled by a factor n to take into account that each sample requires n independent calls to the underlying G-sampling operation.

Technical details. In order to describe our techniques, we need first to provide more details on the lattice trapdoor sampling problem. Given a lattice L and a target point \(\mathbf {t}\), the lattice gaussian sampling problem asks to generate (possibly with the help of some trapdoor information) a random lattice point \(\mathbf {v} \in L\) with probability proportional to \(\exp (-c \Vert \mathbf {v} - \mathbf {t}\Vert ^{2})\). Building on techniques from [55], this problem is solved in [51] by mapping L to a fixed (key independent) lattice \(G^{n}\), generating a gaussian sample in \(G^{n}\), and then mapping the result back to L. (The linear function T mapping \(G^{n}\) to L serves as the trapdoor.) Without further adjustments, this produces a lattice point in L with ellipsoidal gaussian distribution, with covariance which depends on the linear transformation T. In order to produce spherical samples (as required by applications), [51] employs a perturbation technique of Peikert [55] which adds some noise (with complementary covariance) to the target \(\mathbf {t}\), before using it as a center for the \(G^{n}\)-lattice sampling operation. In summary, the sampling algorithm of [51, 55] consists of two stages:

  • an off-line (target independent) stage, which generates perturbation vectors with covariance matrix defined by the trapdoor transformation T, and

  • an on-line (target dependent) stage which generates gaussian samples from an (easy to sample) lattice \(G^{n}\).

Not much attention is paid in [51] to the perturbation generation, as it does not depend on the target vector \(\mathbf {t}\), and it is far less time critical in applications. As for the on-line stage, one of the properties that make the lattice \(G^{n}\) easy to sample is that it is the orthogonal sum of n copies of a \((\log q)\)-dimensional lattice G. So, even using generic algorithms with quadratic running time, G-sampling takes a total of \(O(n\log ^{2} q)\) operations. For moduli \(q = n^{O(1)}\) polynomial in the lattice dimension n, this results in quasilinear running time \(O(n \log ^{2} n)\). However, since the G-sampling operation directly affects the on-line running time of the signing algorithm, even a polylogarithmic term \(\log ^{2} q\) can be highly undesirable. To this end, [51] gives a particularly efficient (and easy to implement) algorithm for G-lattice sampling when the lattice modulus \(q=2^{k}\) is a power of 2 (or, more generally, a power \(q=p^{k}\) of a small prime p). The running time of this specialized G-sampling algorithm is \(O(\log q)\), linear in the dimension of the lattice G, and it has minimal (constant) storage requirements. Thanks to its simplicity and efficiency, this algorithm has quickly found its way into concrete implementations of lattice based cryptographic primitives (e.g., see [9]), largely solving the problem of efficient lattice sampling for \(q=2^{k}\). However, setting q to a power of 2 (or, more generally, the power of a small prime) may be incompatible with applications and other techniques used in lattice cryptography, like attribute based encryption (ABE) schemes [14] and fast implementation via the number theoretic transform [47, 49]. For arbitrary modulus q, [51] falls back to generic algorithms (for arbitrary lattices) with quadratic complexity. This may still be acceptable when the modulus q is relatively small. But it is nevertheless undesirable, as even polylogarithmic factors have a significant impact on the practical performance of cryptographic functions (easily increasing running times by an order of magnitude), and can make applications completely unusable when the modulus \(q = \exp (n)\) is exponentially large. The concrete example that best illustrates the limitations of [51] is the recent conjunction obfuscator of [20], which requires the modulus q to be prime with bitsize \(\log (q) = O(n)\) linear in the security parameter. In this setting, the specialized algorithm of [51] (for \(q=2^{k}\)) is not applicable, and using a generic algorithm slows down the on-line stage by a factor O(n), or, more concretely, several orders of magnitude for typical parameter settings. Another, less drastic, example is the arithmetic circuit ABE scheme of [14], where q is \(O(2^{n^{\epsilon }})\) for some fixed \(0< \epsilon < 1/2\). Here the slowdown is asymptotically smaller, \(n^{\epsilon }\), but still polynomial in the security parameter n.

Unfortunately, the specialized algorithm from [51] makes critical use of the structure of the G-basis when \(q=2^{k}\), and is not easily adapted to other moduli. (See Sect. 3 for details.) In order to solve this problem we resort to the same approach used in [51, 55] to generate samples from arbitrary lattices: we map G to an even simpler lattice D using an easy-to-compute linear transformation \(T'\), perform the gaussian sampling in D, and map the result back to G. As usual, the error shape is corrected by including a perturbation term with appropriate covariance matrix. The main technical problem to be solved is to find a suitable linear transformation \(T'\) such that D can be efficiently sampled and perturbation terms can be easily generated. In Sect. 3 we demonstrate a choice of transformation \(T'\) with all these desirable properties. In particular, using a carefully chosen transformation \(T'\), we obtain lattices D and perturbation matrices that are triangular, sparse, and whose entries admit a simple (and efficiently computable) closed formula expression. So, there is not even a need to store these sparse matrices explicitly, as their entries can be easily computed on the fly. This results in a G-sampling algorithm with linear running time, and minimal (constant) space requirements, beyond the space necessary to store the input, output and randomness of the algorithm.

Next, in Sect. 4, we turn to the problem of efficiently generating the perturbations of the off-line stage. Notice that generating these perturbations is a much harder problem than the one faced when mapping G to D (via \(T'\)). The difference is that while \(G,D,T'\) are fixed (sparse, carefully designed) matrices, the transformation T is a randomly chosen matrix that is used as secret key. In this setting, there is no hope to reduce the computation time to linear in the lattice dimension, because even reading/writing the matrix T can in general take quadratic time. Still, when using algebraic lattices, matrix T admits a compact (linear size) representation, and one can reasonably hope for faster perturbation generation algorithms. As already noted, this can be achieved using the Fast Fourier Orthogonalization algorithm of Ducas and Prest [28], which has running time quasilinear in the ring dimension, but quadratic in the dimension (over the ring) of the matrix T, which is \(O(\log q)\) in our setting. As usual, for polynomial moduli \(q=n^{O(1)}\) this is only a polylogarithmic slowdown, but it can be quite significant in practice [9]. We improve on a direct application of the FFO algorithm by first employing an optimization of Bansarkhani and Buchmann [9] to exploit the sparsity of T. (This corresponds to the top level function SamplePz in Fig. 4.) This optimization makes the computation linear in the dimension of T (\(\log q\) in our setting), while keeping the quasilinear dependency on the ring dimension n from [28]. We further improve this combined algorithm by presenting a variant of FFO (described by the two mutually recursive functions SampleFz/Sample2z in Fig. 4) that does not require the precomputation and storage of the FFO matrix.

Comparison with FFO. Since our SamplePz function (Fig. 4) uses a subprocedure SampleFz which is closely related to the FFO algorithm [28], we provide a detailed comparison between the two. We recall that FFO works by first computing a binary tree data structure [28, Algorithm 3], where the root node is labeled by an n-dimensional vector, its two children are labeled by (n / 2)-dimensional vectors, and so on, all the way down to n leaves which are labeled with 1-dimensional vectors. Then, [28, Algorithm 4] uses this binary tree data structure within a block/recursive variant of Babai’s nearest plane algorithm. Our SampleFz is based on the observation that one can blend/interleave the computation of [28, Algorithm 3] and [28, Algorithm 4], leading to a substantial (asymptotic) memory saving. Specifically, combining the two algorithms avoids the need to precompute and store the FFO binary tree data structure altogether: the tree is generated implicitly, on the fly, one node/vector at a time, and each node/vector is discarded as soon as possible, in a way similar to a depth-first tree traversal. The resulting reduction in space complexity is easily estimated. The original FFO builds a tree with \(\log n\) levels, where level l stores \(2^{l}\) vectors in dimension \(n/2^{l}\). So, the total storage requirement for each level is n, giving an overall space complexity of \(n \log n\). Our FFO variant only stores one node/vector per level, and has space complexity \(\sum _l (n/2^{l}) = 2n\), an \(O(\log n)\) improvement over the space complexity of the original FFO algorithm. Moreover, the nodes/vectors are stored implicitly in the execution stack of the program, rather than in an explicitly constructed binary tree data structure, yielding lower overhead and an algorithm that is easier to implement. For simplicity we specialized our presentation to power-of-two cyclotomics, which are the most commonly used in lattice cryptography, but everything works equally well for the larger class of cyclotomic rings, in the canonical embedding, considered in [28].

2 Preliminaries

We denote the complex numbers as \(\mathbb C\), the real numbers as \(\mathbb R\), the rational numbers as \(\mathbb {Q}\), and the integers as \(\mathbb {Z}\). A number is denoted by a lower case letter, \(z \in \mathbb {Z}\) for example. We denote the conjugate of a complex number y as \(y^{*}\). When q is a positive integer, \(\log q\) is short for its rounded up logarithm in base two, \(\lceil \log _2 q \rceil \). A floating point number with mantissa length m representing \(x \in \mathbb {R}\) is denoted as \(\bar{x}\). The index set of the first n natural numbers is \([n] = \{1, \dots , n\}\). Vectors are denoted by bold lower case letters, \(\mathbf {v}\), and are in column form (\(\mathbf {v}^{T}\) is a row vector) unless stated otherwise. The inner product of two vectors is \(\left\langle {\mathbf {x},\mathbf {y}}\right\rangle = \mathbf {x}^{T} \mathbf {y}\). We denote matrices with bold upper case letters \(\mathbf {B}\) or with upper case Greek letters (for positive-definite matrices). The transpose of a matrix is \(\mathbf {B}^{T}\). The entry of \(\mathbf {B}\) in row i and column j is denoted \(B_{i,j}\). Unless otherwise stated, the norm of a vector is the \(\ell _2\) norm. The norm of a matrix \(\left\| {\mathbf {B}}\right\| = \max _i \left\| {\mathbf {b}_i}\right\| \) is the maximum norm of its column vectors. Given two probability distributions over a countable domain D, the statistical distance between them is \(\varDelta _{\textsc {sd}}(X,Y) = \frac{1}{2}\sum _{\omega \in D}|X(\omega ) - Y(\omega )|\). In order to avoid tracing irrelevant terms in our statistical distance computations, we define \(\hat{\epsilon } = \epsilon + O(\epsilon ^{2})\).

We denote a random variable x sampled from a distribution \(\mathcal {D}\) as \(x \leftarrow \mathcal {D}\). A random variable distributed as \(\mathcal {D}\) is denoted \(x \sim \mathcal {D}\). We denote an algorithm \(\mathcal {A}\) with oracle access to another algorithm \(\mathcal {B}\) (distribution \(\mathcal {D}\)) as \(\mathcal {A}^{\mathcal {B}}\) (\(\mathcal {A}^{\mathcal {D}}\)).

The max-log, or ML, distance between two distributions was recently introduced by [53] in order to prove tighter bounds for concrete security. The ML distance between two discrete distributions \(\mathcal {P}\) and \(\mathcal {Q}\) over the same support S is defined as

$$\varDelta _{\textsc {ml}}(\mathcal {P}, \mathcal {Q}) = \max \limits _{x \in S} |\ln \mathcal {Q}(x) - \ln \mathcal {P}(x)|.$$

Again, let \(\mathcal {P}, \mathcal {Q}\) be distributions over a countable domain, and let S be the support of \(\mathcal {P}\).

The Rényi divergence of order infinity of \(\mathcal {Q}\) from \(\mathcal {P}\) is

$$R_{\infty }(\mathcal {P} || \mathcal {Q}) = \max _{x \in S}\frac{\mathcal {P}(x)}{\mathcal {Q}(x)}.$$

Rényi divergence is used in [8] to yield a tighter security analysis than one using statistical distance.
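For concreteness, the following minimal Python sketch (our own illustration, not part of [8] or [53]) computes all three quantities for a pair of toy distributions:

```python
import math

# Two toy distributions over the common support {0, 1, 2}.
P = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {0: 0.45, 1: 0.35, 2: 0.2}

sd = 0.5 * sum(abs(P[x] - Q[x]) for x in P)                # statistical distance
ml = max(abs(math.log(Q[x]) - math.log(P[x])) for x in P)  # ML distance
r_inf = max(P[x] / Q[x] for x in P)                        # Renyi divergence of order infinity

print(sd, ml, r_inf)  # 0.05, ~0.154, ~1.111
```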

2.1 Linear Algebra

The (forward) Gram-Schmidt orthogonalization of an ordered set of linearly independent vectors \(\mathbf {B} = \{\mathbf {b}_1, \dots , \mathbf {b}_k\}\) is \(\widetilde{\mathbf {B}} = \{ \mathbf {\widetilde{b}}_1, \dots , \mathbf {\widetilde{b}}_k\}\) where each \(\mathbf {\widetilde{b}}_i\) is the component of \(\mathbf {b}_i\) orthogonal to \(\hbox {span}(\mathbf {b}_1, \dots , \mathbf {b}_{i-1})\) (and the backward GSO is defined as \(\mathbf {b}_i^{\dagger } = \mathbf {b}_i \perp \text {span}(\mathbf {b}_{i+1}, \dots , \mathbf {b}_n)\)). An anti-cyclic matrix is an \(n \times n\) matrix of the form

$$ \begin{bmatrix} a_0&-a_{n-1}&\dots&-a_1 \\ a_1&a_0&\dots&-a_2 \\ \vdots&\vdots&\ddots&\vdots \\ a_{n-1}&a_{n-2}&\dots&a_0 \end{bmatrix}. $$

For any two (symmetric) matrices \(\varSigma ,\varGamma \in \mathbb {R}^{n\times n}\), we write \(\varSigma \succeq \varGamma \) if \(\mathbf {x}^{T}(\varSigma -\varGamma )\mathbf {x} \ge 0\) for all vectors \(\mathbf {x}\in \mathbb {R}^{n}\), and \(\varSigma \succ \varGamma \) if \(\mathbf {x}^{T}(\varSigma -\varGamma )\mathbf {x} > 0\) for all nonzero \(\mathbf {x}\in \mathbb {R}^{n}\). It is easy to check that \(\succeq \) is a partial order relation. Relations \(\preceq \) and \(\prec \) are defined symmetrically. When one of the two matrices \(\varGamma = s \mathbf {I}\) is scalar, we simply write \(\varSigma \succeq s\) or \(\varSigma \preceq s\). A symmetric matrix \(\varSigma \in \mathbb {R}^{n \times n}\) is called positive definite if \(\varSigma \succ 0\), and positive semidefinite if \(\varSigma \succeq 0\). Equivalently, \(\varSigma \) is positive semidefinite if and only if it can be written as \(\varSigma = \mathbf {B}\mathbf {B}^{T}\) for some (square) matrix \(\mathbf {B}\), called a square root of \(\varSigma \) and denoted \(\mathbf {B} = \sqrt{\varSigma }\). (Notice that any \(\varSigma \succ 0\) has infinitely many square roots \(\mathbf {B} = \sqrt{\varSigma }\).) \(\varSigma \) is positive definite if and only if its square root \(\mathbf {B}\) is a square nonsingular matrix. When \(\mathbf {B}\) is upper (resp. lower) triangular, the factorization \(\varSigma = \mathbf {B}\mathbf {B}^{T}\) is called the upper (resp. lower) triangular Cholesky decomposition of \(\varSigma \). The Cholesky decomposition of any positive definite \(\varSigma \in \mathbb {R}^{n \times n}\) can be computed with \(O(n^{3})\) floating point arithmetic operations. For any scalar s, \(\varSigma \succ s\) if and only if all eigenvalues of \(\varSigma \) are strictly greater than s. In particular, positive definite matrices are nonsingular.

For any \(n\times n\) matrix \(\mathbf {S}\) and non-empty index sets \(I,J \subseteq \{1,\ldots ,n\}\), we write \(\mathbf {S}[I,J]\) for the \(|I| \times |J|\) matrix obtained by selecting the elements at positions \((i,j)\in I\times J\) from \(\mathbf {S}\). When \(I=J\), we write \(\mathbf {S}[I]\) as a shorthand for \(\mathbf {S}[I,I]\). For any nonsingular matrix \(\mathbf {S} \in \mathbb {R}^{n\times n}\) and index partition \(I \cup \bar{I} = \{1,\ldots ,n\}\), \(I\cap \bar{I} = \emptyset \), the \(I \times I\) matrix

$$\begin{aligned} \mathbf {S} / I = \mathbf {S}[I] - \mathbf {S}[I,\bar{I}]\cdot \mathbf {S}[\bar{I}]^{-1}\cdot \mathbf {S}[\bar{I},I] \end{aligned}$$

is called the Schur complement of \(\mathbf {S}[\bar{I}]\), often denoted by \(\mathbf {S} / \mathbf {S}[\bar{I}] = \mathbf {S} / I\). In particular, if \(\mathbf {S} = \begin{bmatrix} \mathbf {A}&\mathbf {B} \\ \mathbf {B}^{T}&\mathbf {D} \end{bmatrix}\) then the Schur complement of \(\mathbf {A}\) is the matrix \(\mathbf {S} / \mathbf {A} = \mathbf {D} - \mathbf {B}^{T} \mathbf {A}^{-1}\mathbf {B}\). For any index set I, a symmetric matrix \(\mathbf {S}\) is positive definite if and only if both \(\mathbf {S}[I]\) and its Schur complement \(\mathbf {S}/\mathbf {S}[I]\) are positive definite.

Let \(\varSigma = \begin{bmatrix} \mathbf {A}&\mathbf {B} \\ \mathbf {B}^{T}&\mathbf {D} \end{bmatrix} \succ 0\). We can factor \(\varSigma \) in terms of a principal submatrix, say \(\mathbf {D}\), and its Schur complement, \(\varSigma /\mathbf {D} = \mathbf {A} - \mathbf {B} \mathbf {D}^{-1} \mathbf {B}^{T}\), as follows:

$$ \varSigma = \begin{bmatrix} \mathbf {I}&\mathbf {BD}^{-1} \\ \mathbf {0}&\mathbf {I} \end{bmatrix} \begin{bmatrix} \varSigma /\mathbf {D}&\mathbf {0} \\ \mathbf {0}&\mathbf {D} \end{bmatrix} \begin{bmatrix} \mathbf {I}&\mathbf {0} \\ \mathbf {D}^{-1} \mathbf {B}^{T}&\mathbf {I} \end{bmatrix}.$$
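As a sanity check, the factorization and the definiteness claims above can be verified numerically; the following small numpy sketch (our own illustration) does so for a random positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
S = M @ M.T + 5 * np.eye(5)                    # a random positive definite matrix
A, B, D = S[:2, :2], S[:2, 2:], S[2:, 2:]      # S = [[A, B], [B^T, D]]

SD = A - B @ np.linalg.inv(D) @ B.T            # Schur complement S/D
U = np.block([[np.eye(2), B @ np.linalg.inv(D)],
              [np.zeros((3, 2)), np.eye(3)]])
Mid = np.block([[SD, np.zeros((2, 3))],
                [np.zeros((3, 2)), D]])
assert np.allclose(U @ Mid @ U.T, S)           # the factorization above
assert np.all(np.linalg.eigvalsh(SD) > 0)      # S/D is again positive definite
# Eigenvalues of S/D stay within the spectrum of S (cf. Theorems 1 and 2 below).
ev_S, ev_SD = np.linalg.eigvalsh(S), np.linalg.eigvalsh(SD)
assert ev_S[0] <= ev_SD[0] and ev_SD[-1] <= ev_S[-1]
```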

The next two theorems regarding the spectra of principal submatrices and Schur complements of positive definite matrices are used in Sect. 4. In both theorems, \(\lambda _i\) is the ith (in non-increasing order, with multiplicity) eigenvalue of a symmetric matrix.

Theorem 1

(Cauchy). For any symmetric matrix \(\mathbf {S} \in \mathbb {R}^{n \times n}\), \(I \subseteq \{1,\ldots ,n\}\) and \(1 \le i \le |I|\)

$$\begin{aligned} \lambda _i(\mathbf {S}) \ge \lambda _i(\mathbf {S}[I]) \ge \lambda _{i+n-|I|}(\mathbf {S}). \end{aligned}$$

Theorem 2

([61, Corollary 2.3]). For any positive definite \(\varSigma \in \mathbb {R}^{n \times n}\), \(I\subseteq \{1,\dots , n\}\) and \(1 \le i \le |I|\)

$$\begin{aligned} \lambda _i(\varSigma ) \ge \lambda _i(\varSigma /I) \ge \lambda _{i + n - |I|}(\varSigma ). \end{aligned}$$

In other words, the eigenvalues of principal submatrices and Schur complements of a positive definite matrix are bounded from below and above by the smallest and largest eigenvalues of the original matrix.

2.2 Gaussians and Lattices

A lattice \(\varLambda \subset \mathbb {R}^{n}\) is a discrete subgroup of \(\mathbb {R}^{n}\). Specifically, a lattice of rank k is the integer span \(\mathcal {L}(\mathbf {B}) = \{z_1\mathbf {b}_1 + \dots + z_k\mathbf {b}_k \mid z_i \in \mathbb {Z}\}\) of a basis \(\mathbf {B} = \{\mathbf {b}_1, \dots , \mathbf {b}_k\} \subset \mathbb {R}^{n}\) \((k\le n)\). There are infinitely many bases for a given lattice since right-multiplying a basis by a unimodular transformation gives another basis. The dual lattice of \(\varLambda \), denoted by \(\varLambda ^{*}\), is the lattice \(\{\mathbf {x} \in \hbox {span}(\varLambda ) | \left\langle {\mathbf {x}, \varLambda }\right\rangle \subseteq \mathbb {Z}\}\). It is easy to see that \(\mathbf {B}^{-T}\) is a basis for \(\mathcal {L}(\mathbf {B})^{*}\) for a full rank lattice (\(n=k\)).

The n-dimensional gaussian function \(\rho :\mathbb {R}^{n} \rightarrow (0,1]\) is defined as \(\rho (\mathbf {x}) := \exp (-\pi \Vert \mathbf {x}\Vert ^{2})\). Applying an invertible linear transformation \(\mathbf {B}\) to the gaussian function yields

$$ \rho _\mathbf {B}(\mathbf {x}) = \rho (\mathbf {B}^{-1} \mathbf {x}) = \exp (-\pi \cdot \mathbf {x}^{T}\varSigma ^{-1}\mathbf {x}) $$

with \(\varSigma = \mathbf {B}\mathbf {B}^{T} \succ 0\). For any \(\mathbf {c} \in \hbox {span}(\mathbf {B}) = \hbox {span}(\varSigma )\), we also define the shifted gaussian function (centered at \(\mathbf {c}\)) as \(\rho _{\sqrt{\varSigma },\mathbf {c}}(\mathbf {x}) = \rho _{\sqrt{\varSigma }}(\mathbf {x} - \mathbf {c})\). Normalizing the function \(\rho _{\mathbf {B},\mathbf {c}}(\mathbf {x})\) by the measure of \(\rho _{\mathbf {B},\mathbf {c}}\) over the span of \(\mathbf {B}\) gives the continuous gaussian distribution with covariance \(\varSigma /(2\pi )\), denoted by \(D_{\sqrt{\varSigma },\mathbf {c}}\). Let \(S \subset \mathbb {R}^{n}\) be any discrete set in \(\mathbb {R}^{n}\), then \(\rho _{\sqrt{\varSigma }}(S) = \sum _{\mathbf {s} \in S} \rho _{\sqrt{\varSigma }}(\mathbf {s})\). The discrete gaussian distribution over a lattice \(\varLambda \), denoted by \(D_{\varLambda , \sqrt{\varSigma }, \mathbf {c}}\), is defined by restricting the support of the distribution to \(\varLambda \). Specifically, a sample \(\mathbf {y} \leftarrow D_{\varLambda , \sqrt{\varSigma }, \mathbf {c}}\) has probability mass function \(\rho _{\sqrt{\varSigma },\mathbf {c}}(\mathbf {x})/\rho _{\sqrt{\varSigma }, \mathbf {c}}(\varLambda )\) for all \(\mathbf {x} \in \varLambda \). Discrete gaussians on lattice cosets \(\varLambda + \mathbf {c}\), for \(\mathbf {c} \in \hbox {span}(\varLambda )\), are defined similarly setting \(\Pr \{\mathbf {y} \leftarrow D_{\varLambda + \mathbf {c}, \sqrt{\varSigma },\mathbf {p}}\} = \rho _{\sqrt{\varSigma },\mathbf {p}}(\mathbf {y})/\rho _{\sqrt{\varSigma },\mathbf {p}}(\varLambda + \mathbf {c})\) for all \(\mathbf {y} \in \varLambda + \mathbf {c}\). For brevity we let \(D_{\varLambda + \mathbf {c}, \sqrt{\varSigma },\mathbf {p}}(\mathbf {y}) := \Pr \{\mathbf {y} \leftarrow D_{\varLambda + \mathbf {c}, \sqrt{\varSigma },\mathbf {p}}\}\).

For a lattice \(\varLambda \) and any (typically small) positive \(\epsilon >0\), the smoothing parameter \(\eta _{\epsilon }(\varLambda )\) [52] is the smallest \(s>0\) such that \(\rho (s \cdot \varLambda ^{*}) \le 1+\epsilon \). A one-dimensional discrete gaussian with a tail-cut, t, is a discrete gaussian \(D_{\mathbb {Z}, c,s}\) restricted to a support of \(\mathbb {Z}\cap [c - t \cdot s, c + t \cdot s]\). We denote this truncated discrete gaussian as \(D^{t}_{\mathbb {Z}, c,s}\). In order to use the ML distance in Sect. 3, we will restrict all tail-cut discrete gaussians to a universal support of \(\mathbb {Z}\cap [c - t \cdot s_{max}, c + t \cdot s_{max}]\) for some \(s_{max}\).
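As an illustration, a tail-cut discrete gaussian over such a universal support can be realized by simple rejection; this is a minimal sketch of ours (production implementations use faster, constant-time samplers such as the one discussed in Sect. 3.3):

```python
import math, random

def sample_z_trunc(sigma, c, s_max, t=12):
    """Rejection sampler for a discrete gaussian with parameter sigma <= s_max and
    center c, over the universal support Z intersect [c - t*s_max, c + t*s_max]."""
    lo, hi = int(math.floor(c - t * s_max)), int(math.ceil(c + t * s_max))
    while True:
        x = random.randint(lo, hi)
        # Accept with probability proportional to rho_{sigma,c}(x).
        if random.random() < math.exp(-math.pi * (x - c) ** 2 / sigma ** 2):
            return x
```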

Lemma 1

([31, Lemma 4.2]). For any \(\epsilon >0\), any \(s \ge \eta _{\epsilon }(\mathbb {Z})\), and any \(t > 0\),

$$\begin{aligned} \Pr _{x \leftarrow D_{\mathbb {Z},s,c}}\left[ |x - c| \ge t\cdot s\right] \le 2e^{-\pi t^{2}}\cdot \frac{1+\epsilon }{1-\epsilon }. \end{aligned}$$

More generally, for any positive definite matrix \(\varSigma \) and lattice \(\varLambda \subset \hbox {span}(\varSigma )\), we write \(\sqrt{\varSigma } \ge \eta _{\epsilon }(\varLambda )\), or \(\varSigma \succeq \eta _{\epsilon }^{2}(\varLambda )\), if \(\rho (\sqrt{\varSigma }^{T} \cdot \varLambda ^{*}) \le 1+\epsilon \). The reader is referred to [31, 52, 55] for additional information on the smoothing parameter.

Here we recall two bounds and a discrete gaussian convolution theorem to be used later.

Lemma 2

([31, Lemma 3.1]). Let \(\varLambda \subset \mathbb {R}^{n}\) be a lattice with basis \(\mathbf {B}\), and let \(\epsilon > 0\). Then,

$$\begin{aligned} \eta _{\epsilon }(\varLambda ) \le \Vert \mathbf {\widetilde{B}}\Vert \sqrt{\log (2n(1+1/\epsilon ))/\pi }. \end{aligned}$$

Lemma 3

([55, Lemma 2.5]). For any full rank n-dimensional lattice \(\varLambda \), vector \(\mathbf {c} \in \mathbb {R}^{n}\), real \(\epsilon \in (0,1)\), and positive definite \(\varSigma \succeq \eta ^{2}_{\epsilon }(\varLambda )\),

$$ \rho _{\sqrt{\varSigma }}(\varLambda + \mathbf {c}) \in \left[ \frac{1-\epsilon }{1+\epsilon }, 1 \right] \cdot \rho _{\sqrt{\varSigma }}(\varLambda ).$$

Theorem 3

([55, Theorem 3.1]). For any vectors \(\mathbf {c}_1, \mathbf {c}_2 \in \mathbb {R}^{n}\), lattices \(\varLambda _1, \varLambda _2 \subset \mathbb {R}^{n}\), and positive definite matrices \(\varSigma _1, \varSigma _2 \succ 0\), \(\varSigma = \varSigma _1 + \varSigma _2 \succ 0\), \(\varSigma _3^{-1} = \varSigma _1^{-1} + \varSigma _2^{-1} \succ 0\), if \(\sqrt{\varSigma _1} \succeq \eta _{\epsilon }(\varLambda _1)\) and \(\sqrt{\varSigma _3} \succeq \eta _{\epsilon }(\varLambda _2)\) for some \(0 < \epsilon \le 1/2\), then the distribution

$$ X = \{ \mathbf {x} \mid \mathbf {p} \leftarrow D_{\varLambda _2 + \mathbf {c}_2, \sqrt{\varSigma _2}}, \mathbf {x} \leftarrow D_{\varLambda _1 + \mathbf {c}_1,\sqrt{\varSigma _1},\mathbf {p}} \} $$

is within statistical distance \(\varDelta (X,Y) \le 8\epsilon \) from the discrete gaussian \(Y=D_{\varLambda _1 + \mathbf {c}_1, \sqrt{\varSigma }}\).

Below we have the correctness theorem for the standard, randomized version of Babai’s nearest plane algorithm. The term statistically close is the standard cryptographic notion of negligible statistical distance. Precisely, a function \(f: \mathbb N \rightarrow \mathbb R_{\ge 0}\) is negligible if for every \(c>1\) there exists an N such that for all \(n > N\), \(f(n) < n^{-c}\). We emphasize that the algorithm reduces to sampling \(D_{\mathbb {Z}, s, c}\).

Theorem 4

([31, Theorem 4.1]). Given a full-rank lattice basis \(\mathbf {B} \in \mathbb {R}^{n \times n}\), a parameter \(s \ge \Vert \tilde{\mathbf {B}} \Vert \omega (\sqrt{\log n})\), and a center \(\mathbf {c} \in \mathbb {R}^{n}\), there is a probabilistic algorithm, running in \(O(n^{2})\) time after an \(O(n^{3})\)-time preprocessing, whose output is statistically close to \(D_{\mathcal {L}(\mathbf {B}), s, \mathbf {c}}\).
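To make the statement concrete, here is a short Python sketch of the randomized nearest plane algorithm (our own rendering of the algorithm of [31], using a QR factorization for the Gram-Schmidt preprocessing and the rejection-sampler idiom from Sect. 2.2):

```python
import math, random
import numpy as np

def sample_z(sigma, c, t=12):
    """Rejection sampler for the 1-D discrete gaussian D_{Z, sigma, c}."""
    lo, hi = int(math.floor(c - t * sigma)), int(math.ceil(c + t * sigma))
    while True:
        x = random.randint(lo, hi)
        if random.random() < math.exp(-math.pi * (x - c) ** 2 / sigma ** 2):
            return x

def randomized_nearest_plane(B, s, c):
    """Sample (approximately) from D_{L(B), s, c}; basis vectors are the columns of B.

    The O(n^3) preprocessing is the QR step; each sample then costs O(n^2)."""
    n = B.shape[1]
    Q, R = np.linalg.qr(B)              # B = QR; |R[i,i]| are the GSO lengths
    tgt = Q.T @ c                       # target expressed in the orthonormal frame
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):      # randomized back-substitution
        ci = (tgt[i] - R[i, i + 1:] @ z[i + 1:]) / R[i, i]
        z[i] = sample_z(s / abs(R[i, i]), ci)
    return B @ z                        # a lattice point, gaussian around c
```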

2.3 Cyclotomic Fields

Let n be a positive integer. The n-th cyclotomic field over \(\mathbb {Q}\) is the number field \(\mathcal {K}_n = \mathbb {Q}[x]/(\varPhi _n(x)) \cong \mathbb {Q}(\zeta )\) where \(\zeta \) is a primitive n-th root of unity and \(\varPhi _n(x)\) is the minimal polynomial of \(\zeta \) over \(\mathbb {Q}\). The n-th cyclotomic ring is \(\mathcal {O}_n = \mathbb {Z}[x]/(\varPhi _n(x))\). Let \(\varphi (n)\) be Euler’s totient function. \(\mathcal {K}_n\) is a \(\varphi (n)\)-dimensional \(\mathbb {Q}\)-vector space, and we can view \(\mathcal {K}_n\) as a subset of \(\mathbb C\) by viewing \(\zeta \) as a complex primitive n-th root of unity.

Multiplication by a fixed element f, \(g \mapsto f\cdot g\), is a linear transformation on \(\mathcal {K}_n\) as a \(\mathbb {Q}\)-vector space. We will often view field elements as \(\varphi (n)\)-dimensional rational vectors via the coefficient embedding. This is defined by \(f(x) = \sum _{i = 0}^{\varphi (n)-1}f_ix^{i} \mapsto (f_0, \cdots , f_{\varphi (n)-1})^{T}\) mapping a field element to its vector of coefficients under the power basis \(\{1, x, \cdots , x^{\varphi (n)-1}\}\) (or equivalently \(\{1, \zeta , \cdots , \zeta ^{\varphi (n)-1}\}\)). We can represent a field element as the matrix in \(\mathbb {Q}^{\varphi (n) \times \varphi (n)}\) that represents the linear transformation by its multiplication in the coefficient embedding. This matrix is called a field element’s coefficient multiplication matrix. When n is a power of two, an element’s coefficient multiplication matrix is anti-cyclic.

An isomorphism from the field F to the field K is a bijection \(\theta : F \rightarrow K\) such that \(\theta (fg) = \theta (f)\theta (g)\), and \(\theta (f + g) = \theta (f) + \theta (g)\) for all \(f,g \in F\). An automorphism is an isomorphism from a field to itself. For example, if we view the cyclotomic field \(\mathcal {K}_n\) as a subset of the complex numbers, then the conjugation map \(f(\zeta ) \mapsto f(\zeta )^{*} = f(\zeta ^{*})\) is an automorphism and can be computed in linear time O(n). In power-of-two cyclotomic fields, the conjugation of a field element corresponds to the matrix transpose of an element’s anti-cyclic multiplication matrix.

Another embedding is the canonical embedding which maps an element \(f\in \mathcal {K}_n\) to the vector of evaluations of f, as a polynomial, at each root of \(\varPhi _n(x)\). When n is a power of two, the linear transformation between the coefficient embedding and the canonical embedding is a scaled isometry.

Let n be a power of two; then the field \(\mathcal {K}_{2n}\) is a two-dimensional \(\mathcal {K}_n\)-vector space, as seen by splitting a polynomial \(f(x) \in \mathcal {K}_{2n}\) into \(f(x) = f_0(x^{2}) + x\cdot f_1(x^{2})\) for \(f_i \in \mathcal {K}_n\). Now, we can view the linear transformation given by multiplication by some \(f \in \mathcal {K}_{2n}\) as a linear transformation over \(\mathcal {K}_n \oplus \mathcal {K}_n \cong \mathcal {K}_{2n}\). Let \(\phi _{2n}: \mathcal {K}_{2n} \rightarrow \mathbb {Q}^{n \times n}\) be the injective ring homomorphism mapping a field element to its anti-cyclic multiplication matrix. Then we have the following relationship, where \(\mathbf {P}\) is a simple re-indexing matrix known as a stride permutation (increasing evens followed by increasing odds in \(\{0,1, \dots , n-1\}\)),

$$ \mathbf {P} \phi _{2n}(f) \mathbf {P}^{T} = \begin{bmatrix} \phi _{n}(f_0) & \phi _{n}(x\cdot f_1) \\ \phi _{n}(f_1) & \phi _{n}(f_0) \end{bmatrix}.$$
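The following numpy sketch (ours) checks this identity for the anti-cyclic case \(\mathbb {Z}[x]/(x^{n}+1)\) with a small ring dimension:

```python
import numpy as np

def anticyclic(f):
    """Multiplication matrix of f in Z[x]/(x^n + 1), coefficient embedding."""
    n = len(f)
    M = np.zeros((n, n))
    for j in range(n):                 # column j holds the coefficients of x^j * f
        for i in range(n):
            M[(i + j) % n, j] = f[i] if i + j < n else -f[i]
    return M

n = 8
f = np.arange(1.0, n + 1)              # f = 1 + 2x + ... + 8x^7
f0, f1 = f[0::2], f[1::2]              # f(x) = f0(x^2) + x * f1(x^2)
P = np.eye(n)[list(range(0, n, 2)) + list(range(1, n, 2))]   # stride permutation
xf1 = np.roll(f1, 1); xf1[0] = -xf1[0]                       # x * f1 in Z[x]/(x^{n/2} + 1)
rhs = np.block([[anticyclic(f0), anticyclic(xf1)],
                [anticyclic(f1), anticyclic(f0)]])
assert np.allclose(P @ anticyclic(f) @ P.T, rhs)
```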

3 Sampling G-Lattices

For any positive integers \(b\ge 2\), \(k\ge 1\) and non-negative integer \(u<b^{k}\), we write \([{u}]_{b}^{k}\) for the base-b expansion of u, i.e., the unique vector \((u_0,\ldots ,u_{k-1})\) with entries \(0\le u_i < b\) such that \(u = \sum _i u_i b^{i}\). Typically, \(b=2\) and \([{u}]_{2}^{k}\) is just the k-digit binary representation of u, but larger values of b may be used to obtain interesting efficiency trade-offs. Throughout this section, we consider the values of b and k as fixed, and all definitions and algorithms are implicitly parametrized by them.

In this section we study the so-called G-lattice sampling problem, i.e., the problem of sampling the discrete Gaussian distribution on a lattice coset

$$\begin{aligned} \varLambda _u^{\perp }(\mathbf {g}^{T}) = \{\mathbf {z} \in \mathbb {Z}^{k} : \mathbf {g}^{T}\mathbf {z} = u \bmod q\} \end{aligned}$$

where \(q\le b^{k}\), \(u \in \mathbb {Z}_q\), \(k = \lceil \log _b q \rceil \), and \(\mathbf {g} = (1,b,\dots , b^{k-1})\). G-lattice sampling is used in many lattice schemes employing a trapdoor: schemes with polynomial modulus, like IBE [2, 4, 11, 18], group signatures [36, 44, 45, 54], and others (double authentication preventing and predicate authentication preventing signatures, constraint-hiding PRFs) [15, 22], as well as schemes with super-polynomial modulus [1, 17, 19, 20, 33, 35, 42] (ABE, obfuscation, watermarking, etc.) and [39].

A very efficient algorithm to solve this problem is given in [51] for the special case when \(q = b^{k}\) is a power of the base b. The algorithm, shown in Fig. 1, is very simple. This algorithm reduces the problem of sampling the k-dimensional lattice coset \(\varLambda _u^{\perp }(\mathbf {g}^{T})\) for \(u\in \mathbb {Z}_q\) to the much simpler problem of sampling the one-dimensional lattice cosets \(u + b\mathbb {Z}\) for \(u \in \mathbb {Z}_b\). The simplicity of the algorithm is due to the fact that, when \(q=b^{k}\) is an exact power of b, the lattice \(\varLambda ^{\perp }(\mathbf {g}^{T})\) has a very special basis

$$\mathbf {B}_{b^{k}} = \begin{bmatrix} b & & & \\ -1 & b & & \\ & \ddots & \ddots & \\ & & -1 & b \end{bmatrix}$$

which is sparse, triangular, and with small integer entries. (In particular, its Gram-Schmidt orthogonalization \(\widetilde{\mathbf {B}}_{b^{k}} = b\mathbf {I}\) is a scalar matrix.) As a result, the general lattice sampling algorithm of [31, 43] (which typically requires \(O(k^{3})\)-time preprocessing, and \(O(k^{2})\) storage and online running time) can be specialized to the much simpler algorithm in Fig. 1 that runs in linear time O(k), with minimal memory requirements and no preprocessing at all.

We give a specialized algorithm to solve the same sampling problem when \(q<b^{k}\) is an arbitrary modulus. This is needed in many cryptographic applications where the modulus q is typically a prime. As already observed in [51], the lattice \(\varLambda ^{\perp }(\mathbf {g}^{T})\) still has a fairly simple and sparse basis matrix

$$\mathbf {B}_{q} = \begin{bmatrix} b & & & & q_0 \\ -1 & b & & & q_1 \\ & -1 & \ddots & & \vdots \\ & & \ddots & b & q_{k-2} \\ & & & -1 & q_{k-1} \end{bmatrix}$$

Fig. 1. A sampling algorithm for G-lattices when the modulus q is a perfect power of the base b. The algorithm is implicitly parametrized by a base b and dimension k.
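In code, the Fig. 1 algorithm amounts to a loop over base-b digits; the following Python sketch (our paraphrase of the figure, with a naive rejection subroutine standing in for the 1-D sampler) illustrates it:

```python
import math, random

def sample_z(sigma, c, t=12):
    lo, hi = int(math.floor(c - t * sigma)), int(math.ceil(c + t * sigma))
    while True:
        x = random.randint(lo, hi)
        if random.random() < math.exp(-math.pi * (x - c) ** 2 / sigma ** 2):
            return x

def sample_g_power(b, k, u, s):
    """Sample D over Lambda_u^perp(g^T) for q = b^k, one base-b digit at a time."""
    x = []
    for _ in range(k):
        ui = u % b
        xi = ui + b * sample_z(s / b, -ui / b)   # a gaussian sample from the coset ui + b*Z
        x.append(xi)
        u = (u - xi) // b                        # exact division: xi = ui (mod b)
    return x

b, k, s = 2, 10, 20.0
x = sample_g_power(b, k, 711, s)
assert (sum(xi * b ** i for i, xi in enumerate(x)) - 711) % b ** k == 0
```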

where \((q_0,\ldots ,q_{k-1}) = [{q}]_{b}^{k} = \mathbf {q}\) is the base-b representation of the modulus q. This basis still has good geometric properties, as all vectors in its (left-to-right) Gram-Schmidt orthogonalization have length at most O(b). So, it can be used with the algorithm of [31, 43] to generate good-quality gaussian samples on the lattice cosets with small standard deviation. However, since the basis is no longer triangular, its Gram-Schmidt orthogonalization is not sparse anymore, and the algorithm of [31, 43] can no longer be optimized to run in linear time as in Fig. 1. In applications where \(q = n^{O(1)}\) is polynomial in the security parameter n, the matrix dimension \(k = O(\log n)\) is relatively small, and the general sampling algorithm (with \(O(k^{2})\) storage and running time) can still be used with an acceptable (albeit significant) performance degradation. However, for larger q this becomes prohibitive in practice. Moreover, even for small q, it would be nice to have an optimal sampling algorithm with O(k) running time, linear in the matrix dimension, as for the exact power case. Here we give such an algorithm, based on the convolution methods of [55], but specialized with a number of concrete technical choices that result in a simple and very fast implementation, comparable to the specialized algorithm of [51] for the exact power case.

The reader may notice that the alternating columns of \(\mathbf {B}_q\), i.e., \(\mathbf {b}_1, \mathbf {b}_3, \dots \) and \(\mathbf {b}_2, \mathbf {b}_4, \dots \), are pairwise orthogonal. Call these sets \(\mathbf {B}_1\) and \(\mathbf {B}_2\), respectively. Then \((\mathbf {B}_1, \mathbf {B}_2, \mathbf {q})\) is another basis for \(\varLambda ^{\perp }(\mathbf {g}^{T})\), and this might suggest that its GSO is sparse. Unfortunately, this leads to a GSO of \((\mathbf {B}_1, \mathbf {B}_2^{*}, \mathbf {q}^{*})\) where \(\mathbf {B}_2^{*}\) is a dense, upper triangular block. Indeed, let \(\mathbf {b}\) be the i-th vector in \(\mathbf {B}_2\); then there are \(2 + i - 1\) vectors preceding \(\mathbf {b}\) in \(\mathbf {B}_1\) and \(\mathbf {B}_2^{*}\) that are not orthogonal to it, filling in the upper portion of \(\tilde{\mathbf {b}}\).

Overview. The idea is the following. Instead of sampling \(\varLambda ^{\perp }_u(\mathbf {g}^{T})\) directly, we express the lattice basis \(\mathbf {B}_q = \mathbf {T} \mathbf {D}\) as the image (under a linear transformation \(\mathbf {T}\)) of some other matrix \(\mathbf {D}\) with very simple (sparse, triangular) structure. Next, we sample the discrete gaussian distribution (say, with variance \(\sigma ^{2}\)) on an appropriate coset of \(\mathcal {L}(\mathbf {D})\). Finally, we map the result back to the original lattice applying the linear transformation \(\mathbf {T}\) to it. Notice that, even if \(\mathcal {L}(\mathbf {D})\) is sampled according to a spherical gaussian distribution, the resulting distribution is no longer spherical. Rather, it follows an ellipsoidal gaussian distribution with (scaled) covariance \(\sigma ^{2} \mathbf {T} \mathbf {T}^{T}\). This problem is solved using the convolution method of [55], i.e., initially adding a perturbation with complementary covariance \(s^{2}\mathbf {I} - \sigma ^{2}\mathbf {T}\mathbf {T}^{T}\) to the target, so that the final output has covariance \(\sigma ^{2}\mathbf {T}\mathbf {T}^{T} + (s^{2}\mathbf {I} - \sigma ^{2}\mathbf {T}\mathbf {T}^{T}) = s^{2}\mathbf {I}\). In summary, at a very high level, the algorithm performs (at least implicitly) the following steps:

  1. Compute the covariance matrix \(\varSigma _1 = \mathbf {T}\mathbf {T}^{T}\) and an upper bound r on the spectral norm of \(\mathbf {T} \mathbf {T}^{T}\).

  2. Compute the complementary covariance matrix \(\varSigma _2 = r^{2}\mathbf {I} - \varSigma _1\).

  3. Sample \(\mathbf {p} \leftarrow D_{\varLambda _1, \sigma \sqrt{\varSigma _2}}\) from some convenient lattice \(\varLambda _1\), using the Cholesky decomposition of \(\varSigma _2\).

  4. Compute the preimage \(\mathbf {c} = \mathbf {T}^{-1}(\mathbf {u} - \mathbf {p})\).

  5. Sample \(\mathbf {z} \leftarrow D_{\mathcal {L}(\mathbf {D}), -\mathbf {c},\sigma }\).

  6. Output \(\mathbf {u} + \mathbf {T} \mathbf {z}\).

The technical challenge is to find appropriate matrices \(\mathbf {T}\) and \(\mathbf {D}\) that lead to a very efficient implementation of all the steps. In particular, we would like \(\mathbf {T}\) to be a very simple matrix (say, sparse, triangular, and with small integer entries) so that \(\mathbf {T}\) has small spectral norm, and both linear transformations \(\mathbf {T}\) and \(\mathbf {T}^{-1}\) can be computed efficiently. The matrix \(\mathbf {D}\) (which is uniquely determined by \(\mathbf {B}\) and \(\mathbf {T}\)) should also be sparse and triangular, so that the discrete gaussian distribution on the cosets of \(\mathcal {L}(\mathbf {D})\) can be efficiently sampled. Finally (and this is the trickiest part in obtaining an efficient instantiation) the complementary covariance matrix \(\varSigma _2 = r^{2}\mathbf {I} - \varSigma _1\) should also have a simple Cholesky decomposition \(\varSigma _2 = \mathbf {L}\mathbf {L}^{T}\) where \(\mathbf {L}\) is triangular, sparse and with small entries, so that perturbations can be generated efficiently. Ideally, all matrices should also have a simple, regular structure, so that they do not need to be stored explicitly, and can be computed on the fly with minimal overhead.

In the next subsection we provide an instantiation that satisfies all of these properties. Next, in Subsect. 3.2 we describe the specialized sampling algorithm resulting from the instantiation, and analyze its correctness and efficiency properties.

3.1 Instantiation

In this subsection, we describe a specific choice of linear transformations and matrix decompositions that satisfies all our desiderata, and results in a very efficient instantiation of the convolution sampling algorithm on G-lattices.

A tempting idea may be to map the lattice basis \(\mathbf {B}_q\) to the basis \(\mathbf {B}_{b^{k}}\), and then use the efficient sampling algorithm from Fig. 1. However, this does not quite work because it results in a pretty bad transformation \(\mathbf {T}\) which has both poor geometrical properties and a dense matrix representation. It turns out that a very good choice for a linear transformation \(\mathbf {T}\) is given precisely by the matrix \(\mathbf {T} = \mathbf {B}_{b^{k}}\) describing the basis when q is a power of b. We remark that \(\mathbf {T}\) is used as a linear transformation, rather than a lattice basis. So, the fact that it equals \(\mathbf {B}_{b^{k}}\) does not seem to carry any special geometric meaning, it just works! In particular, what we do here should not be confused with mapping \(\mathbf {B}_q\) to \(\mathbf {B}_{b^{k}}\). The resulting factorization is

$$\mathbf {B}_q = \mathbf {B}_{b^{k}} \cdot \mathbf {D} = \begin{bmatrix} b & & & \\ -1 & b & & \\ & \ddots & \ddots & \\ & & -1 & b \end{bmatrix} \cdot \begin{bmatrix} 1 & & & d_0 \\ & 1 & & d_1 \\ & & \ddots & \vdots \\ & & & d_{k-1} \end{bmatrix}$$

where the entries of the last column of \(\mathbf {D}\) are defined by the recurrence \(d_i = \frac{d_{i-1}+q_i}{b}\) with initial condition \(d_{-1}=0\). Notice that all the \(d_i\) are in the range [0, 1), and \(b^{i+1}\cdot d_i\) is always an integer. In some sense, sampling from \(\mathcal {L}(\mathbf {D})\) is even easier than sampling from \(\mathcal {L}(\mathbf {B}_{b^{k}})\) because the first \(k-1\) columns of \(\mathbf {D}\) are orthogonal and the corresponding coordinates can be sampled independently in parallel. (This should be contrasted with the sequential algorithm in Fig. 1).
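The factorization is easy to verify numerically; the following numpy sketch (ours) builds \(\mathbf {B}_q\), \(\mathbf {T} = \mathbf {B}_{b^{k}}\) and \(\mathbf {D}\) from the recurrence and checks \(\mathbf {B}_q = \mathbf {T}\mathbf {D}\):

```python
import numpy as np

b, k, q = 3, 8, 5167                          # q < b^k = 6561
qd = [(q // b ** i) % b for i in range(k)]    # base-b digits of q
T = b * np.eye(k) - np.eye(k, k=-1)           # T = B_{b^k}: b on the diagonal, -1 below
Bq = b * np.eye(k) - np.eye(k, k=-1)
Bq[:, -1] = qd                                # last column of B_q: the digits of q
d = np.zeros(k)
for i in range(k):
    d[i] = ((d[i - 1] if i else 0.0) + qd[i]) / b   # d_i = (d_{i-1} + q_i)/b
D = np.eye(k)
D[:, -1] = d                                  # D: identity with last column (d_0,...,d_{k-1})
assert np.allclose(T @ D, Bq)
assert np.all((0 <= d) & (d < 1))             # each d_i lies in [0, 1)
```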

We now look at the geometry and algorithmic complexity of generating perturbations. The covariance matrix of \(\mathbf {T} = \mathbf {B}_{b^{k}}\) is given by

$$\varSigma _1 = \mathbf {B}_{b^{k}}\mathbf {B}_{b^{k}}^{T} = \begin{bmatrix} b^{2} & -b & & \\ -b & b^{2}+1 & \ddots & \\ & \ddots & \ddots & -b \\ & & -b & b^{2}+1 \end{bmatrix}.$$

The next step is to find an upper bound \(r^{2}\) on the spectral norm of \(\varSigma _1\), and compute the Cholesky decomposition \(\mathbf {L} \mathbf {L}^{T}\) of the complementary covariance matrix \(\varSigma _2 = r^{2}\mathbf {I} - \varSigma _1\). By the Gershgorin circle theorem, all eigenvalues of \(\varSigma _1\) are in the interval \([(b-1)^{2}, (b+1)^{2}]\). So, we may set \(r = b+1\). Numerical computations also suggest that this choice of r is optimal, in the sense that the spectral norm of \(\varSigma _1\) approaches \((b+1)^{2}\) as k tends to infinity. The Cholesky decomposition is customarily defined by taking \(\mathbf {L}\) to be a lower triangular matrix. However, for sampling purposes, an upper triangular \(\mathbf {L}\) works just as well. It turns out that using an upper triangular \(\mathbf {L}\) in the decomposition process leads to a much simpler solution, where all (squared) entries have a simple, closed form expression, and can be easily computed on-line without requiring any preprocessing computation or storage. (By contrast, numerical computations suggest that the standard Cholesky decomposition with lower triangular \(\mathbf {L}\) is far less regular, and even precomputing it requires exponentially higher precision arithmetic than our upper triangular solution.) So, we let \(\mathbf {L}\) be an upper triangular matrix, and set \(r = b+1\).

For any r, the perturbation’s covariance matrix \(\varSigma _2 = r^{2} \mathbf {I} - \varSigma _1\) has Cholesky decomposition \(\varSigma _2 = \mathbf {L} \cdot \mathbf {L}^{T}\), where \(\mathbf {L}\) is the sparse upper bidiagonal matrix (with diagonal entries \(l_0, \dots , l_{k-1}\) and superdiagonal entries \(h_1, \dots , h_{k-1}\)) defined by the following equations:

$$\begin{aligned} l_0^{2} + h_1^{2} = r^{2} - b^{2}, \qquad l_i^{2} + h_{i+1}^{2} = r^{2} - (b^{2}+1) \ \ (1 \le i \le k-1, \text { with } h_k = 0), \qquad h_i l_i = b \ \ (1 \le i \le k-1). \end{aligned}$$

It can be easily verified that these equations have the following simple closed form solution:

$$\begin{aligned} r = b+1, \quad l_0^{2} = b\left( 1+\frac{1}{k}\right) + 1, \quad l_i^{2} = b \left( 1 + \frac{1}{k-i}\right) , \quad h_{i+1}^{2} = b\left( 1 - \frac{1}{k-i}\right) \end{aligned}$$
(1)
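These closed forms can be checked directly against the matrices above; a short numpy sketch of ours:

```python
import numpy as np

b, k = 3, 8
Sigma1 = np.diag([b * b] + [b * b + 1] * (k - 1)) \
       - b * (np.eye(k, k=1) + np.eye(k, k=-1))        # Sigma1 = B_{b^k} B_{b^k}^T
Sigma2 = (b + 1) ** 2 * np.eye(k) - Sigma1             # r = b + 1
l2 = [b * (1 + 1 / k) + 1] + [b * (1 + 1 / (k - i)) for i in range(1, k)]
h2 = [b * (1 - 1 / (k - i)) for i in range(k - 1)]     # h_{i+1}^2 for i = 0,...,k-2
L = np.diag(np.sqrt(l2)) + np.diag(np.sqrt(h2), k=1)   # sparse upper bidiagonal
assert np.allclose(L @ L.T, Sigma2)
assert np.linalg.eigvalsh(Sigma1)[-1] <= (b + 1) ** 2  # Gershgorin bound r^2
```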

We observe that also the inverse transformation \(\mathbf {B}_{b^{k}}^{-1}\) has a simple, closed-form solution: the ith column of \(\mathbf {B}_{b^{k}}^{-1}\) equals \((0, \cdots , 0, \frac{1}{b}, \dots , (\frac{1}{b})^{k-i})\). Notice that this matrix is not sparse, as it has \(O(k^{2})\) nonzero entries. However, there is no need to store it, and the associated transformation can still be computed in linear time by solving the sparse triangular system \(\mathbf {T} \mathbf {x} = \mathbf {b}\) by forward substitution.
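For instance, the inverse transformation can be applied by the following one-pass loop (a sketch of ours), in O(k) time and constant extra space:

```python
def apply_T_inverse(b, v):
    """Solve B_{b^k} c = v by substitution: b*c_0 = v_0 and b*c_i - c_{i-1} = v_i."""
    c, prev = [], 0.0
    for vi in v:
        prev = (vi + prev) / b
        c.append(prev)
    return c
```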

3.2 The Algorithm

The sampling algorithm, SampleG, is shown in Fig. 2. It takes as input a modulus q, an integer variance s, a coset u of \(\varLambda ^{\perp }(\mathbf {g}^{T})\), and outputs a sample statistically close to \(D_{\varLambda _u^{\perp }(\mathbf {g}^{T}),s}\). SampleG relies on subroutines Perturb and SampleD where Perturb\((\sigma )\) returns a perturbation, \(\mathbf {p}\), statistically close to \(D_{\mathcal {L}(\varSigma _2), \sigma \cdot \sqrt{\varSigma _2}}\), and SampleD\((\sigma , \mathbf {c})\) returns a sample \(\mathbf {z}\) such that \(\mathbf {D}\mathbf {z}\) is statistically close to \(D_{\mathcal {L}(\mathbf {D}), - \mathbf {c}, \sigma }\).

Both Perturb and SampleD are instantiations of the randomized nearest plane algorithm [31, 43]. Consequently, both algorithms rely on a subroutine SampleZ\(_{t}(\sigma , c, \sigma _{max})\) which returns a sample statistically close to the one-dimensional discrete gaussian with tail-cut t, \(D^{t}_{\mathbb {Z}, \sigma , c}\), over the fixed support \(\mathbb {Z}\cap [c - t\cdot \sigma _{max}, c + t\cdot \sigma _{max}]\). We fix the support of all one-dimensional discrete gaussians for compatibility with the ML distance. In addition, we only feed SampleZ centers \(c \in [0,1)\) since we can always shift by an integer.

Storage. The scalars \(c_i\) in SampleG, representing \(\mathbf {c} = \mathbf {B}_{b^{k}}^{-1}(\mathbf {u} - \mathbf {p})\), and \(d_i\) in SampleD, representing the last column of \(\mathbf {D}\), are rational numbers of the form \(x/b^{i}\) for a small integer x and \(i \in [k]\). The numbers \(l_i, h_i\) are positive numbers of magnitude less than \(\sqrt{2b + 1}\).

A naive implementation of the algorithms stores the floating point numbers \(c_i\), \(d_i\), \(h_i\), and \(l_i\), for a total storage of 4k floating point numbers. However, this can be reduced to constant storage, since these values are determined by simple recurrence relations (\(c_i\), \(d_i\)) or simple closed formulas (\(h_i\), \(l_i\)) and can be computed on the fly.

Time Complexity. Assuming constant-time sampling for SampleZ and constant-time scalar arithmetic, SampleG runs in time O(k). In more detail, there are 6k integer additions/subtractions, \(3k+2\) integer multiplications, \(3(k+1)\) floating point divisions, 2k floating point multiplications, and 2k floating point additions. The analysis below shows we can use double precision floating point numbers for most applications.

Fig. 2. Sampling algorithm for G-lattices for any modulus \(q < b^{k}\). The algorithms take b and k as implicit parameters, and SampleG outputs a sample with distribution statistically close to \(D_{\varLambda ^{\perp }_u(\mathbf {g}^{T}),s}\). Any scalar with an index out of range is 0, i.e. \(c_{-1} = z_{-1} = z_k = 0\). SampleZ\(_t(\sigma , c, \sigma _{max})\) is any algorithm that samples from a discrete gaussian over \(\mathbb {Z}\) exactly or approximately with centers in [0, 1) and a fixed truncated support \(\mathbb {Z}\cap [c - t\cdot \sigma _{max}, c + t\cdot \sigma _{max}]\) (t is the tail-cut parameter). We denote \(x -\lfloor x \rfloor \) as \(\lfloor x \rceil _{[0,1)}\).
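Figure 2 itself is not reproduced here. As a point of reference, the sketch below (our own naive rendering, not the optimized O(k) algorithm of Fig. 2) chains the six overview steps of this section end to end, using a continuous perturbation in place of Perturb and a generic rejection sampler in place of SampleZ:

```python
import math, random
import numpy as np

def sample_z(sigma, c, t=12):
    lo, hi = int(math.floor(c - t * sigma)), int(math.ceil(c + t * sigma))
    while True:
        x = random.randint(lo, hi)
        if random.random() < math.exp(-math.pi * (x - c) ** 2 / sigma ** 2):
            return x

def sample_g_naive(b, k, q, u, s):
    """Sample x in Lambda_u^perp(g^T) for any q < b^k, following the six overview steps."""
    sigma = s / (b + 1)                              # r = b + 1
    T = b * np.eye(k) - np.eye(k, k=-1)              # T = B_{b^k}
    qd = [(q // b ** i) % b for i in range(k)]
    d = np.zeros(k)
    for i in range(k):
        d[i] = ((d[i - 1] if i else 0.0) + qd[i]) / b
    # Steps 1-3: perturbation with covariance sigma^2 (r^2 I - T T^T)
    # (a continuous gaussian here, for simplicity of illustration).
    Sigma2 = (b + 1) ** 2 * np.eye(k) - T @ T.T
    p = sigma * np.linalg.cholesky(Sigma2) @ np.random.standard_normal(k)
    # Step 4: c = T^{-1}(u*e_1 - p), by substitution in O(k).
    uvec = np.zeros(k); uvec[0] = u
    c = np.zeros(k)
    for i in range(k):
        c[i] = ((uvec[i] - p[i]) + (c[i - 1] if i else 0.0)) / b
    # Step 5: z ~ D_{L(D), -c, sigma}; since D = [e_1, ..., e_{k-1}, d],
    # nearest-plane sampling needs one scaled 1-D sample plus k-1 plain ones.
    wk = sample_z(sigma / d[-1], -c[-1] / d[-1])     # integer coefficient of column d
    z = np.array([sample_z(sigma, -c[i] - wk * d[i]) + wk * d[i] for i in range(k - 1)]
                 + [wk * d[-1]])
    # Step 6: output u*e_1 + T z, an integer vector of Lambda_u^perp(g^T).
    return np.rint(uvec + T @ z).astype(int)

b, k, q, s = 3, 8, 5167, 100.0
x = sample_g_naive(b, k, q, 1234, s)
g = np.array([b ** i for i in range(k)])
assert (int(g @ x) - 1234) % q == 0
```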

Statistical Analysis and Floating Point Precision. We now perform a statistical analysis on SampleG with a perfect one-dimensional sampler (and no tail-bound), then with a tail-bounded imperfect sampler in terms of ML distance. This allows us to measure loss in concrete security. We direct the reader to [53, Sect. 3] for more details on the ML distance and a complete concrete security analysis.

The following lemma is needed in order to make sense of the “\(\varSigma _3\) condition” in Theorem 3.

Lemma 4

Let \(\varSigma _3\) be defined by \(\varSigma _3^{-1} = \frac{(b+1)^{2}}{s^{2}} [\varSigma _1^{-1} + [(b+1)^{2}\mathbf {I} - \varSigma _1]^{-1}]\); then its eigenvalues are \(\varTheta (s^{2}/b)\). Moreover, if \(\lambda _i\) is the i-th eigenvalue of \(\varSigma _1\), then the i-th eigenvalue of \(\varSigma _3\) is \((s/[b+1])^{2} \cdot \frac{\lambda _i[(b+1)^{2} - \lambda _i]}{(b+1)^{2}}\).

Proof

Let \(\varSigma _1 = \mathbf {Q}^{T} \mathbf {D}\mathbf {Q}\) be its diagonalization. Then, \(\varSigma _1^{-1} = \mathbf {Q}^{T} \mathbf {D}^{-1}\mathbf {Q}\) and the rest follows from algebraic manipulations of the individual eigenvalues along with the Gershgorin circle theorem on \(\varSigma _1\).    \(\square \)

Let \(C_{\epsilon , k} = \sqrt{\log (2k(1+1/\epsilon ))/\pi }\). Now we can easily bound s from below. We need the following three conditions for s: \(s \ge (b+1) \eta _\epsilon (\mathbf {D})\), \(\sqrt{\varSigma _3} \ge \eta _{\epsilon }(\varSigma _2)\), and \(s \ge (b+1)\eta _\epsilon (\mathbf {L})\). The middle condition determines s with a lower bound of \(s\ge \sqrt{2b}\cdot (2b+1)\cdot C_{\epsilon , k}\) (the last two conditions both have \(s = \varOmega (b^{1.5} \cdot C_{\epsilon , k})\)).

Corollary 1

Fix \(0 < \epsilon \le 1/2\) and let \(s\ge \sqrt{2b}\cdot (2b+1)\cdot C_{\epsilon , k}\). Then, SampleG returns a sample within statistical distance \(\varTheta (k \hat{\epsilon })\) of \(D_{\varLambda ^{\perp }_u(\mathbf {g}^{T}), s}\) for any \(q < b^{k}\) when Perturb and SampleD use a perfect one-dimensional sampler, SampleZ. In addition, the Rényi divergence of order infinity of \(D_{\varLambda ^{\perp }_u(\mathbf {g}^{T}), s}\) from SampleG with a perfect one-dimensional sampler is at most \(1 + \varTheta (k\hat{\epsilon })\).

The statistical distance bound of \(\varTheta (k \hat{\epsilon })\) results in about a loss of \(\log \log q\) bits in security if \(\epsilon = 2^{-\kappa }\) for a security parameter \(\kappa \) by [53, Lemma 3.1]. (The multiplicative factor of k comes from the randomized nearest plane algorithm’s analysis: see [31, Theorem 4.1].)

Next, we turn to the ML distance for a tighter analysis on the bits of security lost in using SampleG with an imperfect one-dimensional sampler. Since the centers, c, and variances, s, given to SampleZ are computed from two or three floating point computations, we assume both \(\bar{c}\) and \(\bar{s}\) are within a relative error of \(2^{-m}\) of c and s.

Proposition 1

Fix an \(\epsilon > 0\) and let \(s \ge (b+1) \cdot \eta _{\epsilon }(\mathbb {Z})\). For any one-dimensional sampler SampleZ\(_t(\bar{\sigma }, \bar{c}, s)\) that takes as inputs approximated centers \(\bar{c} \in [0,1)\) and variances \(\bar{\sigma } \in [s/(b+1), s \cdot b/(b+1)]\) represented as floating point numbers with mantissa length m, \(\varDelta _{\textsc {ml}}(\textsc {SampleG}^{D^t_{\mathbb {Z}, \sigma , c}}, \textsc {SampleG}^{\textsc {SampleZ}_t(\bar{\sigma }, \bar{c})}) \le \) \(2k[O(b^{2}t^{2} 2^{-m}) + \max _{\bar{\sigma }, \bar{c}}\varDelta _{\textsc {ml}}(\textsc {SampleZ}_t(\bar{\sigma }, \bar{c}, s), D^{t}_{\mathbb {Z}, \bar{\sigma }, \bar{c}})].\)

Assuming a cryptosystem using a perfect sampler for \(D_{\varLambda ^{\perp }_{u}(g^{T}), s}\) has \(\kappa \) bits of security, we can combine the results of Corollary 1, Proposition 1, and [53, Lemma 3.3] to conclude that swapping \(D_{\varLambda ^{\perp }_{u}(g^{T}), s}\) with SampleG yields about \(\kappa - 2 \log (tb^{2}) - 3\log \log q - 5\) bits of security when \(m = \kappa /2\), \(\varDelta _{\textsc {ml}}(\textsc {SampleZ}_t(\bar{s}, \bar{c}), D^{t}_{\mathbb {Z}, \bar{s}, \bar{c}})\) \( < 2^{-\kappa /2}\), and \(\epsilon = 2^{-\kappa }\).

3.3 Implementation and Comparison

In this subsection, we compare simple implementations of both SampleG and the generic randomized nearest plane algorithm [31, Sect. 4] used in the G-lattice setting. The implementations were carried out in C++ with double precision floating point numbers for non-integers on an Intel i7-2600 3.4 GHz CPU. Clock cycles were measured with the “time.h” library and the results are charted in Fig. 3.

Fig. 3. Measured clock cycles with \(q = \{4093, 12289, 1676083, 8383498, 4295967357, \approx 9\cdot 10^{18}\}\) and \(s=100\), averaged over 100,000 runs. The clock cycles for the last three moduli are \(\{19.4, 31.9, 73.9\}\) for GPV and \(\{5.5, 7.5, 13.1\}\) for SampleG with pre-computation.

The one-dimensional sampler, SampleZ, was an instantiation of a discrete version of Karney’s sampler [41], which is a modified rejection sampler. The moduli q were chosen from the common parameters subsection of [40, Sect. 4.2], in addition to an arbitrary 60-bit modulus. Most practical schemes require no more than a 30-bit modulus [9] for lattice dimension (n) up to 1024. More advanced schemes however, like ABE-encryption [14, 19], predicate encryption [34], and obfuscation [20, 26], require a super-polynomial modulus often 90 or more bits (assuming the circuits in the ABE and predicate schemes are of log-depth).

For the generic, randomized nearest plane sampler, we pre-computed and stored the Gram-Schmidt orthogonalization of the basis \(\mathbf {B}_q\) and we only counted the clock cycles to run the algorithm thereafter. We had two versions of SampleG: the first was the algorithm as-is, and the second would store pre-computed perturbations from Perturb\((\sigma )\), one for each G-lattice sample. This version of SampleG with pre-computation saved about a factor of two in clock cycles.

4 Perturbation Sampling in Cyclotomic Rings

The lattice preimage sampling algorithm of [51] requires the generation of \(n(2+\log q)\)-dimensional gaussian perturbation vectors \(\mathbf {p}\) with covariance \(\varSigma _p = s^{2}\cdot \mathbf {I} - \alpha ^{2} \mathbf {T}\cdot \mathbf {T}^{T}\) where \(\mathbf {T} \in \mathbb {Z}^{(2+\log q)n \times n\log q}\) is a matrix with small entries serving as a lattice trapdoor, \(\alpha \) is a small constant factor and s is an upper bound on the spectral norm of \(\alpha \mathbf {T}\). In [51] this is accomplished using the Cholesky factorization of \(\varSigma _p\), which takes \(O(n\log q)^{3}\) precomputation and \(O(n\log q)^{2}\) storage and running time.

The trapdoor matrix \(\mathbf {T}\) of [51] has some additional structure: \(\mathbf {T}^{T} = [\mathbf {{\bar{T}}}^{T} , \mathbf {I}]\) for some \(\mathbf {\bar{T}} \in \mathbb {Z}^{2n \times n\log q}\). Moreover, when working with algebraic lattices, \(\mathbf {\bar{T}} = \phi _n(\mathbf {\tilde{T}})\) is the image (under a ring embedding \(\phi _n:R_n \rightarrow \mathbb {Z}^{n\times n}\)) of some matrix \(\mathbf {\tilde{T}} \in R_n^{2\times \log q}\) with entries in a ring \(R_n\) of rank n. (Most commonly, \(R_{n} = \mathcal {O}_{2n} = \mathbb {Z}[x] / (x^{n} + 1)\) is the ring of integers of the (2n)th cyclotomic field \(\mathcal {K}_{2n}\) for \(n=2^{k}\) a power of two.) In [9] it is observed that, using the sparsity of \(\varSigma _p\), the preprocessing storage and on-line computation cost of perturbation generation reduce to \(O(n^{2} \log q)\). This is a factor \(\log q\) improvement over a generic implementation, and can be significant in practice, but the cost is still quadratic in the main security parameter n. When using generic trapdoors \(\mathbf {\bar{T}} \in \mathbb {Z}^{2n\times n\log q}\), there is little hope to improve the running time below \(O(n^{2} \log q)\), because just reading the matrix \(\mathbf {\bar{T}}\) takes this much time. However, when using algebraic lattices, the trapdoor \(\mathbf {\bar{T}} = \phi _n(\mathbf {\tilde{T}})\) admits a compact representation \(\mathbf {\tilde{T}}\) consisting of only \(2n\log q\) integers, so one may hope to reduce the running time to linear or quasi-linear in n.

In this section we give an alternative algorithm to generate integer perturbation vectors \(\mathbf {p}\) with covariance \(\varSigma _p\) when \(\mathbf {\bar{T}} = \phi _n(\mathbf {{\tilde{T}}})\). Our algorithm takes full advantage of the ring structure of \(R_n\), compactly representing \(\varSigma _p\) and all other matrices generated during the execution of the algorithm as the image of matrices with entries in the ring \(R_n\). In particular, similarly to [27, 28], our algorithm has time and space complexity quasi-linear in n, but does not require any preprocessing/storage. The algorithm can be expressed in a modular way as the combination of three steps:

  1. First, the problem of sampling an \(O(n\log q)\)-dimensional integer vector \(\mathbf {p}\) with covariance \(\varSigma _p\) is reduced to the problem of sampling a 2n-dimensional integer vector with covariance expressed by a \(2\times 2\) matrix over \(R_n\).

  2. Next, the problem of sampling an integer vector with covariance in \(R_{n}^{2\times 2}\) is reduced to sampling two n-dimensional integer vectors, each with a covariance expressed by a single ring element in \(R_{n}\).

  3. Finally, if \(n>1\), the sampling problem with covariance in \(R_{n}\) is reduced to sampling an n-dimensional perturbation with covariance expressed by a \(2\times 2\) matrix over the smaller ring \(R_{n/2}\).

Iterating the last two steps \(\log n\) times reduces the original problem to sampling in \(R_1 = \mathbb {Z}\). Details about each step are given in the next subsections. We remark that the algorithm is described as a recursive procedure only for simplicity of presentation and analysis, and it can be implemented just as easily using a simple nested loop, similarly to many FFT-like algorithms.

4.1 Discrete Perturbation Algorithm for Power of Two Cyclotomics

In this subsection we present the perturbation sampling algorithm, which produces \(n(2+\log q)\)-dimensional perturbations from a discrete gaussian on \(\mathbb {Z}^{n(2+\log q)}\) in time \(\tilde{O}(n \log q)\).

The entry point of the algorithm is the SamplePz procedure, which takes as input two integer parameters n and q, matrices \(\mathbf {{\tilde{T}}} \in R_{n}^{2 \times \log q}\) and \(\varSigma _2 \in R_n^{2 \times 2}\), and three positive real numbers \(s^{2}\), \(\alpha ^{2}\), and \(z = (\alpha ^{-2} - s^{-2})^{-1}\); it is expected to produce an \(n(2+\log q)\)-dimensional vector \(\mathbf {p}\) with (non-spherical) discrete gaussian distribution \(D_{\mathbb {Z}^{n(2+\log q)}, \sqrt{\varSigma _p}}\) of covariance

$$\begin{aligned} \varSigma _p&= s^{2}\cdot \mathbf {I} - \alpha ^{2} \begin{bmatrix} \phi _n(\mathbf {{\tilde{T}}}) \\ \mathbf {I} \end{bmatrix} \cdot \left[ \phi _n(\mathbf {{\tilde{T}}})^{T} \quad \mathbf {I}\right] \\&= \begin{bmatrix} \varSigma _2&-\alpha ^{2}\phi _n(\mathbf {{\tilde{T}}}) \\ -\alpha ^{2}\phi _n(\mathbf {{\tilde{T}}})^{T}&(s^{2} - \alpha ^{2})\mathbf {I} \end{bmatrix}. \end{aligned}$$
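To make the role of the input z explicit (a short check of ours, using the Schur-complement factorization of Lemma 5 below): taking \(\mathbf {D} = (s^{2}-\alpha ^{2})\mathbf {I}\) and \(\mathbf {B} = -\alpha ^{2}\phi _n(\mathbf {{\tilde{T}}})\) in \(\varSigma _p\), the Schur complement governing the top 2n coordinates is

$$\varSigma _p/\mathbf {D} = \varSigma _2 - \frac{\alpha ^{4}}{s^{2} - \alpha ^{2}}\,\phi _n(\mathbf {{\tilde{T}}})\phi _n(\mathbf {{\tilde{T}}})^{T} = s^{2}\mathbf {I} - z\,\phi _n(\mathbf {{\tilde{T}}})\phi _n(\mathbf {{\tilde{T}}})^{T}, \qquad z = \frac{\alpha ^{2}s^{2}}{s^{2}-\alpha ^{2}},$$

and the corresponding center is \(\mathbf {B}\mathbf {D}^{-1}\mathbf {q} = -(z/s^{2})\,\phi _n(\mathbf {{\tilde{T}}})\mathbf {q}\), where \(\mathbf {q}\) is the integer vector sampled with covariance \(\mathbf {D}\). Thus, after one convolution step, everything is expressed by \(2\times 2\) matrices over \(R_n\), as in step 1 of the outline above.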

The algorithm calls two subroutines:

  • \(\textsc {SampleZ}(s^{2} - \alpha ^{2})\) which samples a one-dimensional discrete gaussian variable of variance \(s^{2}-\alpha ^{2}\) centered at 0, and can be implemented using any standard technique, and

  • \(\textsc {Sample2z}(a,b,d)\), which, on input three ring elements a, b, d compactly describing a positive definite matrix

    $$\begin{aligned} \varSigma _2 = \begin{bmatrix} \phi _n(a)&\phi _n(b) \\ \phi _n(b)^{T}&\phi _n(d) \end{bmatrix}, \end{aligned}$$

    is expected to sample a (2n)-dimensional vector \(p \leftarrow D_{\mathbb {Z}^{2n},\sqrt{\varSigma _2}}\).

In turn, \(\textsc {Sample2z}\) (also described in Fig. 4) makes use of a procedure \(\textsc {SampleFz}(f)\) which on input a ring element f with positive definite \(\phi _n(f)\), returns a sample \(p \leftarrow D_{\mathbb {Z}^{n}, \sqrt{\phi _n(f)}}\).

Efficiency. Multiplications of ring elements of dimension \(i \in \{1,2, \dots , 2n\}\) are done in the field \(\mathcal {K}_{i}\) in time \(\varTheta (i \log i)\) using the Chinese remainder transform (CRT) [49].

By treating scalar arithmetic as constant time, SamplePz has a time complexity of \(\varTheta (n \log n \log q)\): the transformation by \(\mathbf {\tilde{T}}\) costs \(\varTheta (n\log n \log q)\), and SampleFz has complexity \(\varTheta (n \log ^{2} n)\) (it satisfies a recurrence of the form \(R(n) = 2R(n/2) + \varTheta (n \log n)\), where the additive term accounts for the multiplications in \(\mathcal {K}_{n/2}\) and O(n) additional scalar operations). The algorithm requires \(2 n\log q\) scalars of storage for the trapdoor \(\mathbf {\tilde{T}}\).

Note, SampleFz is even more efficient, \(\varTheta (n \log n)\), if one stores the polynomials in \(\mathcal {K}_{i}\) in the canonical embedding (Fourier domain). One would change SamplePz to give Sample2z the Fourier/canonical representations of \(a,b,d,c_0, c_1\) and perform an inverse CRT/FFT on \(\mathbf {p} = (\mathbf {p}_0, \mathbf {p}_1)\). This allows us to use the FFT’s butterfly transformation to convert from the Fourier representation of \(f(x) = f_0(x^{2}) + xf_1(x^{2}) \in \mathcal {K}_{2n}\) to the Fourier representations of \(f_0, f_1 \in \mathcal {K}_n\), and multiplication/inversion is now linear time; we only invert the non-zero entries in the Fourier domain, since this corresponds to pulling back to the field, inverting, then pushing forward to the cyclic ring via the embedding given by the Chinese remainder theorem [28, Lemma 1]. (Moving from the canonical embedding to the FFT domain is linear time since we place zeros at the non-primitive roots of unity [28, Sect. A.2].) This, however, does not change the asymptotic time complexity of SamplePz since generating \(\mathbf {q}\) in the canonical embedding is now \(\varTheta (n \log n \log q)\).
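Concretely (in our notation), the split is the standard inverse FFT butterfly: if \(f(x) = f_0(x^{2}) + x f_1(x^{2})\) and \(\zeta \) ranges over the roots of \(x^{n}+1\) (so that \(-\zeta \) is also such a root and \(\zeta ^{2}\) is a root of \(x^{n/2}+1\)), then

$$f_0(\zeta ^{2}) = \frac{f(\zeta ) + f(-\zeta )}{2}, \qquad f_1(\zeta ^{2}) = \frac{f(\zeta ) - f(-\zeta )}{2\zeta },$$

so the Fourier representations of \(f_0, f_1\) are obtained from that of f with O(n) operations, and merged back the same way.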

Fig. 4. Sampling algorithm SamplePz for integer perturbations, where \(\mathbf {T} = \phi _n(\mathbf {\tilde{T}})\) is a compact trapdoor over a power-of-two cyclotomic ring. Note, \(\mathbf {\tilde{T}}_i\) is a row vector over \(R_n\) for each \(i \in \{0,1\}\). The algorithm uses a subroutine SampleZ\((\sigma ^{2}, t)\) which samples a discrete gaussian over \(\mathbb {Z}\) with variance \(\sigma ^{2}\) centered at t. The scalar \(z = (\alpha ^{-2} - s^{-2})^{-1}\).

Correctness. A first attempt to prove the correctness of the algorithms in Fig. 4 would use Peikert’s convolution theorem, Theorem 3. However, this would only ensure the correctness of the marginal distributions of \(\mathbf {p}\) in SamplePz and \(q_0\) in Sample2z, and not of the respective joint distributions, \((\mathbf {p}, \mathbf {q})\) and \((q_0, q_1)\). Even if it were enough, tracking the \(\varSigma _3\) condition in Theorem 3 through the recursive calls of the algorithms above is tedious. Instead, we derive a convolution lemma without a \(\varSigma _3\) condition for the joint distribution of our discrete gaussian convolutions on the simple lattice \(\mathbb {Z}^{n}\).

First, we show the gaussian function \(\rho _{\sqrt{\varSigma }}(\cdot )\) factors in a useful manner with respect to a Schur complement decomposition.

Lemma 5

Let \(\varSigma = \begin{bmatrix} \mathbf {A}&\mathbf {B} \\ \mathbf {B}^{T}&\mathbf {D} \end{bmatrix} \succ \mathbf {0}\) be positive definite with \(\mathbf {A} \in \mathbb {R}^{n \times n}\) and \(\mathbf {D} \in \mathbb {R}^{m \times m}\), let \(\varSigma /\mathbf {D} = \mathbf {A} - \mathbf {B}\mathbf {D}^{-1}\mathbf {B}^{T}\) be \(\mathbf {D}\)’s Schur complement, and let \(\mathbf {x}_1 \in \mathbb {R}^{n}\) and \(\mathbf {x}_2 \in \mathbb {R}^{m}\) be arbitrary. Then the gaussian function \(\rho _{\sqrt{\varSigma }}(\mathbf {x})\) factors as \(\rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbf {x}_1 - \mathbf {BD}^{-1}\mathbf {x}_2) \cdot \rho _{\sqrt{\mathbf {D}}}(\mathbf {x}_2) = \rho _{\sqrt{\varSigma }}(\mathbf {x})\), where \(\mathbf {x} = (\mathbf {x}_1, \mathbf {x}_2) \in \mathbb {R}^{n+m}\).

Proof

(Sketch). This is seen by expressing the inverse of \(\varSigma \) in terms of \(\varSigma /\mathbf {D}\) and writing out \(\rho _{\sqrt{\varSigma }}(\mathbf {x})\) accordingly. The matrix factorization

$$\varSigma = \begin{bmatrix} \mathbf {I}&\mathbf {BD}^{-1} \\ \mathbf {0}&\mathbf {I} \end{bmatrix} \begin{bmatrix} \varSigma /\mathbf {D}&\mathbf {0} \\ \mathbf {0}&\mathbf {D} \end{bmatrix} \begin{bmatrix} \mathbf {I}&\mathbf {0} \\ \mathbf {D}^{-1} \mathbf {B}^{T}&\mathbf {I} \end{bmatrix}$$

yields the formula for \(\varSigma ^{-1}\) needed to show the result. \(\square \)
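In more detail (filling in the standard computation), the factorization gives

$$\varSigma ^{-1} = \begin{bmatrix} \mathbf {I}&\mathbf {0} \\ -\mathbf {D}^{-1}\mathbf {B}^{T}&\mathbf {I} \end{bmatrix} \begin{bmatrix} (\varSigma /\mathbf {D})^{-1}&\mathbf {0} \\ \mathbf {0}&\mathbf {D}^{-1} \end{bmatrix} \begin{bmatrix} \mathbf {I}&-\mathbf {BD}^{-1} \\ \mathbf {0}&\mathbf {I} \end{bmatrix},$$

so \(\mathbf {x}^{T}\varSigma ^{-1}\mathbf {x} = (\mathbf {x}_1 - \mathbf {BD}^{-1}\mathbf {x}_2)^{T}(\varSigma /\mathbf {D})^{-1}(\mathbf {x}_1 - \mathbf {BD}^{-1}\mathbf {x}_2) + \mathbf {x}_2^{T}\mathbf {D}^{-1}\mathbf {x}_2\), and exponentiating (\(\rho _{\sqrt{\varSigma }}(\mathbf {x}) = e^{-\pi \mathbf {x}^{T}\varSigma ^{-1}\mathbf {x}}\)) gives exactly the claimed factorization of \(\rho _{\sqrt{\varSigma }}\).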

A consequence of the above lemma is that the gaussian sum \(\rho _{\sqrt{\varSigma }}(\mathbb {Z}^{n+m})\) expands in terms of the gaussian functions \(\rho _{\sqrt{\mathbf {D}}}(\cdot )\) and \(\rho _{\sqrt{\varSigma /\mathbf {D}}}(\cdot )\),

$$\rho _{\sqrt{\varSigma }}(\mathbb {Z}^{n+m}) = \sum _{\mathbf {y}_2 \in \mathbb {Z}^{m}}\rho _{\sqrt{\mathbf {D}}}(\mathbf {y}_2)\cdot \rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbb {Z}^{n} - \mathbf {BD}^{-1}\mathbf {y}_2).$$

We will use the following lemma in the correctness proof. It states that if a discrete gaussian on the integer lattice is wide enough in its slimmest direction, then the lower-dimensional discrete gaussians whose covariances are shaped by principal submatrices (or Schur complements) of the original are wide enough on their respective lattices \(\mathbb {Z}^{n'}\).

Lemma 6

Let \(\epsilon >0\), let \(\varSigma \succ 0\) be a positive definite matrix in \(\mathbb {R}^{n \times n}\), and let \(I_0 \subset [n]\) be an arbitrary, non-empty subset. If \(\varSigma \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n})\), then \(\varSigma [I_0] \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{|I_0|})\) and \(\varSigma /\bar{I}_0 \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n - |I_0|})\) for any principal submatrix–Schur complement pair \((\varSigma [I_0], \varSigma /\bar{I}_0)\) of \(\varSigma \).

Proof

Note, a consequence of \(\varSigma \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n})\) is that \(\varSigma \)’s minimum eigenvalue, \(\lambda _{min}(\varSigma )\), is greater than \(\eta _{\epsilon }^{2}(\mathbb {Z}^{n})\). Let \(\mathbf {M} := \varSigma [I_0] \in \mathbb {R}^{n_0 \times n_0}\) for \(n_0 = |I_0|\). Since \(\mathbf {M}\) is symmetric, it is diagonalizable, and the eigenvalue interlacing theorems give \(\lambda _{min}(\mathbf {M}) \ge \lambda _{min}(\varSigma )\).

Next, we can bound the quantity \(\rho _{\sqrt{\mathbf {M}^{-1}}}((\mathbb {Z}^{n_0})^*) = \rho _{\sqrt{\mathbf {M}^{-1}}}(\mathbb {Z}^{n_0})\) by \(1+\epsilon \):

$$\begin{aligned} \rho _{\sqrt{\mathbf {M}^{-1}}}(\mathbb {Z}^{n_0})&= \sum _{\mathbf {x} \in \mathbb {Z}^{n_0}} e^{-\pi \mathbf {x}^{T} \mathbf {M} \mathbf {x}} \le \sum _{\mathbf {x} \in \mathbb {Z}^{n_0}} e^{-\pi \lambda _{min}(\varSigma ) \Vert \mathbf {x}\Vert ^{2}} \\&\le \sum _{\mathbf {x} \in \mathbb {Z}^{n}} e^{-\pi \lambda _{min}(\varSigma ) \Vert \mathbf {x}\Vert ^{2}} \le 1 + \epsilon . \end{aligned}$$

The jump from \(\mathbb {Z}^{n_0}\) to \(\mathbb {Z}^{n}\) comes from the inclusion \(\mathbb {Z}^{n_0} \subset \mathbb {Z}^{n}\) (padding with zeros), and the final inequality follows from \(\lambda _{min}(\varSigma ) \ge \eta _{\epsilon }^{2}(\mathbb {Z}^{n})\) and the definition of the smoothing parameter (recall \(\mathbb {Z}^{n}\) is self-dual). The proof for the Schur complement is identical. \(\square \)

Next, we state and prove our main convolution lemma.

Lemma 7

For any real \(0 < \epsilon \le 1/2\), positive integers n and m, vector \(\mathbf {c} = (\mathbf {c}_1, \mathbf {c}_2) \in \mathbb {R}^{n+m}\), and positive definite matrix \(\varSigma = \begin{bmatrix} \mathbf {A}&\mathbf {B} \\ \mathbf {B}^{T}&\mathbf {D} \end{bmatrix} \succeq \eta ^{2}_{\epsilon }(\mathbb {Z}^{n+m})\) with \(\mathbf {A} \in \mathbb {Z}^{n\times n}\), \(\mathbf {B} \in \mathbb {Z}^{n\times m}\), and \(\mathbf {D} \in \mathbb {Z}^{m\times m}\) (where \(\varSigma /\mathbf {D} = \mathbf {A} - \mathbf {B}\mathbf {D}^{-1}\mathbf {B}^{T}\) is the Schur complement of \(\mathbf {D}\)), the random process

  • \(\mathbf {x}_2 \leftarrow D_{\mathbb {Z}^{m},\sqrt{\mathbf {D}}, \mathbf {c}_2}\).

  • \(\mathbf {x}_1 \leftarrow D_{\mathbb {Z}^{n},\sqrt{\varSigma /\mathbf {D}}, \mathbf {c}_1 + \mathbf {B}\mathbf {D}^{-1}(\mathbf {x}_2 - \mathbf {c}_2)}\).

produces a vector \(\mathbf {x} = (\mathbf {x}_1, \mathbf {x}_2) \in \mathbb {Z}^{n+m}\) such that the Rényi divergence of order infinity of \(D_{\mathbb {Z}^{n + m}, \sqrt{\varSigma }, \mathbf {c}}\) from \(\mathbf {x}\) is less than or equal to \(1 + 4 \epsilon \).
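As a concrete illustration (ours): for \(n = m = 1\) and \(\varSigma = \begin{bmatrix} a&b \\ b&d \end{bmatrix}\), the process samples \(x_2 \leftarrow D_{\mathbb {Z}, \sqrt{d}, c_2}\) and then \(x_1 \leftarrow D_{\mathbb {Z}, \sqrt{a - b^{2}/d},\, c_1 + (b/d)(x_2 - c_2)}\); this is exactly the convolution performed coordinate-wise (in the Fourier domain) by Sample2z.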

Proof

First, we write out the probability and use Lemma 5 to simplify the numerator. Let \(\mathbf {x'} = (\mathbf {x'}_1, \mathbf {x'}_2)\) below.

$$\begin{aligned} \Pr [\mathbf {x}_1 = \mathbf {x'}_1, \mathbf {x}_2 = \mathbf {x'}_2]&= \frac{\rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbf {x'}_1 - \mathbf {c}_1 - \mathbf {BD}^{-1}(\mathbf {x'}_2 - \mathbf {c}_2))\cdot \rho _{\sqrt{\mathbf {D}}}(\mathbf {x'}_2 - \mathbf {c}_2)}{\rho _{\sqrt{\varSigma /\mathbf {D}}} (\mathbb {Z}^{n} -\mathbf {c}_1- \mathbf {BD}^{-1}(\mathbf {x'}_2 - \mathbf {c}_2))\cdot \rho _{\sqrt{\mathbf {D}}}(\mathbb {Z}^{m} - \mathbf {c}_2)} \\&= \frac{\rho _{\sqrt{\varSigma }}(\mathbf {x'} - \mathbf {c})}{\rho _{\sqrt{\varSigma /\mathbf {D}}} (\mathbb {Z}^{n} -\mathbf {c}_1- \mathbf {BD}^{-1}(\mathbf {x'}_2 - \mathbf {c}_2))\cdot \rho _{\sqrt{\mathbf {D}}}(\mathbb {Z}^{m} - \mathbf {c}_2)} \end{aligned}$$

Regarding the denominator, we use Lemma 6 to see that \(\mathbf {D} \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{m})\) and \(\varSigma /\mathbf {D} \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n})\), since \(\varSigma \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n+m})\). Now, we can use Lemma 3 on the first gaussian sum (the one dependent on \(\mathbf {x'}_2\)) in the denominator to see,

$$\Pr [\mathbf {x}_1 = \mathbf {x'}_1, \mathbf {x}_2 = \mathbf {x'}_2] \in \alpha \cdot D_{\mathbb {Z}^{n+m}, \sqrt{\varSigma }, \mathbf {c}}(\mathbf {x'}) \cdot \left[ \left( \frac{1-\epsilon }{1+\epsilon }\right) , 1 \right] ^{-1}$$

where \(\alpha = \frac{\rho _{\sqrt{\varSigma }}(\mathbb {Z}^{n+m} - \mathbf {c})}{\rho _{ \sqrt{\varSigma /\mathbf {D}} } (\mathbb {Z}^{n}) \cdot \rho _{\sqrt{\mathbf {D}}}(\mathbb {Z}^{m} - \mathbf {c}_2)}.\)

Next, we show \(\alpha \approx 1\). Using Lemma 5 we expand

$$\rho _{\sqrt{\varSigma }}(\mathbb {Z}^{n+m} - \mathbf {c}) = \sum _{\mathbf {y}_2 \in \mathbb {Z}^{m}} \rho _{\sqrt{\mathbf {D}}}(\mathbf {y}_2 - \mathbf {c}_2)\cdot \rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbb {Z}^{n} - \mathbf {c}_1 - \mathbf {BD}^{-1}(\mathbf {y}_2-\mathbf {c}_2)).$$

The sum \(\rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbb {Z}^{n} -\mathbf {c}_1- \mathbf {BD}^{-1}(\mathbf {y}_2 - \mathbf {c}_2))\) is approximately \(\rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbb {Z}^{n})\) because \(\varSigma /\mathbf {D} \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n})\) as a consequence of Lemma 6 and \(\varSigma \succeq \eta ^{2}_{\epsilon }(\mathbb {Z}^{n+m})\). In other words,

$$\rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbb {Z}^{n} -\mathbf {c}_1- \mathbf {BD}^{-1}(\mathbf {y}_2 - \mathbf {c}_2))\in \left[ \frac{1-\epsilon }{1+\epsilon },1 \right] \cdot \rho _{\sqrt{\varSigma /\mathbf {D}}}(\mathbb {Z}^{n})$$

and \(\alpha \in \left[ \left( \frac{1-\epsilon }{1+\epsilon }\right) , 1 \right] \).

Finally, we have the approximation

$$\Pr [\mathbf {x}_1 = \mathbf {x'}_1, \mathbf {x}_2 = \mathbf {x'}_2] \in \left[ \left( \frac{1-\epsilon }{1+\epsilon }\right) , \left( \frac{1+\epsilon }{1-\epsilon }\right) \right] \cdot D_{\mathbb {Z}^{n+m}, \sqrt{\varSigma }, \mathbf {c}}(\mathbf {x'}).$$

Given the restriction \(\epsilon \in (0, 1/2]\), we obtain the desired relation

$$\Pr [\mathbf {x}_1 = \mathbf {x'}_1, \mathbf {x}_2 = \mathbf {x'}_2] \in [1-4\epsilon , 1+4\epsilon ] \cdot D_{\mathbb {Z}^{n+m}, \sqrt{\varSigma }, \mathbf {c}}(\mathbf {x'}).$$

\(\square \)

Next, we bound the Rényi divergence of order infinity between the output of SamplePz and the desired distribution. We need to ensure each discrete gaussian convolution in the algorithm is non-degenerate. We do not analyze the statistical loss from the floating point computations. As shown in Lemma 7, we need \(\varSigma /\mathbf {D} \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n_0})\) and \(\mathbf {D} \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n_1})\) at each discrete gaussian convolution. This is met through a simple condition on \(\varSigma _p\), as hinted at in Lemma 6.

Theorem 5

Let \(0 < \epsilon \le 1/2\). If \(\varSigma _p \succeq \eta _{\epsilon }^{2}(\mathbb {Z}^{n(2 + \log q)})\), then \(\textsc {SamplePz}\) returns a perturbation with a Rényi divergence of order infinity \(R_{\infty }(D_{\mathbb {Z}^{n(2 + \log q)}, \sqrt{\varSigma _p}}|| \textsc {SamplePz}) \le 1 + 12n \hat{\epsilon }.\)

Proof

Since each covariance given to SampleFz is a Schur complement or a principal submatrix of a Schur complement of \(\varSigma _p\), Lemma 6 and the interlacing theorems (Theorems 1 and 2) imply the conditions for Lemma 7 are met. As there are \(n-1\) convolutions (inner nodes of a full binary tree of depth \(\log n\)), a quick induction argument shows the probability distribution of the output of \(\textsc {SamplePz}\) is in the interval \([(1-4 \epsilon )^{3(n-1)}, (1+4 \epsilon )^{3(n-1)}] \cdot D_{\mathbb {Z}^{n(2 + \log q)}, \sqrt{\varSigma _p}}(\mathbf {x}).\) Then, we have \(R_{\infty }(D_{\mathbb {Z}^{n(2 + \log q)}, \sqrt{\varSigma _p}}|| \textsc {SamplePz}) \le (1+4 \epsilon )^{3(n-1)} \approx 1 + 12n \hat{\epsilon }.\) \(\square \)

For common parameters \(\epsilon = 2^{-128}\) and \(n = 1024\), we have \((1+4 \epsilon )^{3(n-1)} - 1 \approx 12(n-1)\epsilon \approx 2^{-114}\).

In summary, this section shows how an FFT-like recursion of discrete gaussian convolutions samples integer-lattice perturbations with an algebraically structured covariance over power-of-two cyclotomic rings. The relative simplicity of the power-of-two case relies on the fact that matrix transpose corresponds to the conjugation field automorphism. In the general cyclotomic case, it is the Hermitian transpose that corresponds to the conjugation automorphism; therefore, one would use the canonical embedding for efficient perturbation sampling in general cyclotomic rings.