# Short, Invertible Elements in Partially Splitting Cyclotomic Rings and Applications to Lattice-Based Zero-Knowledge Proofs


## Abstract

When constructing practical zero-knowledge proofs based on the hardness of the Ring-LWE or the Ring-SIS problems over polynomial rings \(\mathbb {Z}_p[X]/(X^n+1)\), it is often necessary that the challenges come from a set \(\mathcal {C}\) that satisfies three properties: the set should be large (around \(2^{256}\)), the elements in it should have small norms, and all the non-zero elements in the difference set \(\mathcal {C}-\mathcal {C}\) should be invertible. The first two properties are straightforward to satisfy, while the third one requires us to make efficiency compromises. We can either work over rings where the polynomial \(X^n+1\) only splits into two irreducible factors modulo *p*, which makes the speed of the multiplication operation in the ring sub-optimal; or we can limit our challenge set to polynomials of smaller degree, which requires them to have (much) larger norms.

In this work we show that one can use the optimal challenge sets \(\mathcal {C}\) and still have the polynomial \(X^n+1\) split into more than two factors. This comes as a direct application of our more general result that states that all non-zero polynomials with “small” coefficients in the cyclotomic ring \(\mathbb {Z}_p[X]/(\varPhi _m(X))\) are invertible (where “small” depends on the size of *p* and how many irreducible factors the \(m^{th}\) cyclotomic polynomial \(\varPhi _m(X)\) splits into). We furthermore establish sufficient conditions for *p* under which \(\varPhi _m(X)\) will split in such fashion.

For the purposes of implementation, if the polynomial \(X^n+1\) splits into *k* factors, we can run FFT for \(\log {k}\) levels until switching to Karatsuba multiplication. Experimentally, we show that increasing the number of levels from one to three or four results in a speedup by a factor of \(\approx 2\) – 3. We point out that this improvement comes completely for free simply by choosing a modulus *p* that has certain algebraic properties. In addition to the speed improvement, having the polynomial split into many factors has other applications – e.g. when one embeds information into the Chinese Remainder representation of the ring elements, the more the polynomial splits, the more information one can embed into an element.

## 1 Introduction

Cryptography based on the presumed hardness of the Ring/Module-SIS and Ring/Module-LWE problems [Mic07, PR06, LM06, LPR10, LS15] is seen as a very likely replacement for traditional cryptography once quantum computers arrive. There already exist very efficient basic public key primitives, such as encryption schemes and digital signatures, based on the hardness of these problems. For added efficiency, most practical lattice-based constructions work over polynomial rings \(\mathbb {Z}_p[X]/(f(X))\) where *f*(*X*) is the cyclotomic polynomial \(f(X)=X^n+1\) and *p* is chosen in such a way that \(X^n+1\) splits into *n* linear factors modulo *p*. With such a choice of parameters, multiplication in the ring can be performed very efficiently via the Number Theoretic Transform (NTT), which is an analogue of the Fast Fourier Transform that works over a finite field. Practical NTT-based implementations of digital signatures and public key encryption based on the Ring-LWE problem can be found in [GLP12, PG13, ADPS16, BDK+17, DLL+17].

Constructions of more advanced lattice-based primitives sometimes require that the underlying ring has additional properties. In particular, *practical* protocols that utilize zero-knowledge proofs often require that elements with small coefficients are invertible (e.g. [BKLP15, BDOP16, LN17, DLNS17]). This restriction, which precludes using rings where \(X^n+1\) splits completely modulo *p*, stems from the structure of *approximate* zero-knowledge proofs, and we sketch this intuition below.

### 1.1 Approximate Zero-Knowledge Proofs

In a zero-knowledge proof of knowledge, the Prover wants to prove knowledge of a secret *s* that satisfies the relation \(f(s)=t\), where *f* and *t* are public. In the lattice setting, the function is

\[ f(s) := As, \tag{1} \]

where *A* is a random matrix over some ring (the ring is commonly \(\mathbb {Z}_p\) or \(\mathbb {Z}_p[X]/(X^n+1)\)) and *s* is a vector over that same ring, where the coefficients of all (or almost all) the elements comprising *s* are bounded by some small value \(\ll p\).

The function *f* in (1) satisfies the property that \(f(s_1)+f(s_2)=f(s_1+s_2)\) and for any *c* in the ring and any vector *s* over the ring we have \(f(sc)=c\cdot f(s)\). The zero-knowledge proof for attempting to prove the knowledge of *s* proceeds as follows:

The Prover first chooses a “masking parameter” *y* and sends \(w:=f(y)\) to the Verifier. The Verifier picks a random challenge *c* from a subset of the ring and sends it to the Prover (in a non-interactive proof, the Prover himself would generate \(c:=\text {H}(t,w)\), where \(\text {H}\) is a cryptographic hash function). The Prover then computes \(z:=sc+y\) and sends it to the Verifier.

The Verifier checks that \(f(z)=ct+w\) and that the coefficients of *z* are small. If these checks pass, then the Verifier accepts the proof. To show that the protocol is a proof of knowledge, one can rewind the Prover to just after his first move and send a different challenge \(c'\), and get a response \(z'\) such that \(f(z')=c't+w\). Combined with the first response, we extract the equation

\[ f(\bar{s})=\bar{c}t, \tag{2} \]

where \(\bar{s}=z-z'\) and \(\bar{c}=c-c'\).

Notice that while the prover started with the knowledge of an *s* with small coefficients such that \(f(s)=t\), he only ends up proving the knowledge of an \(\bar{s}\) with larger coefficients such that \(f(\bar{s})=\bar{c}t\). If \(\bar{c}\) also has small coefficients, then this type of proof is good enough in many (but not all) situations.
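The three moves and the extraction step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not from the text: the ring is \(\mathbb {Z}_p\) itself rather than a polynomial ring, the challenges are the scalars \(\pm 1\), the matrix dimensions and the masking range are arbitrary, and no care is taken to make the response distribution independent of *s*. The modulus is one of the primes used in the experiments later in the paper.

```python
import random

P = 2**23 - 2**13 + 1  # modulus; the ring here is Z_P (a simplifying assumption)

def f(A, s):
    """Linear map f(s) = A*s over Z_P, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, s)) % P for row in A]

def centered(x):
    """Representative of x mod P in the range [-(P-1)/2, (P-1)/2]."""
    x %= P
    return x - P if x > P // 2 else x

def respond(s, y, c):
    """Prover's response z = s*c + y."""
    return [(si * c + yi) % P for si, yi in zip(s, y)]

def verify(A, t, w, c, z, bound):
    """Verifier's checks: f(z) = c*t + w and z has small coefficients."""
    lhs = f(A, z)
    rhs = [(c * ti + wi) % P for ti, wi in zip(t, w)]
    return lhs == rhs and all(abs(centered(zi)) <= bound for zi in z)

rng = random.Random(1)
rows, cols = 2, 6                                      # hypothetical dimensions
A = [[rng.randrange(P) for _ in range(cols)] for _ in range(rows)]
s = [rng.choice([-1, 0, 1]) for _ in range(cols)]      # small secret
t = f(A, s)

y = [rng.randrange(-1000, 1001) for _ in range(cols)]  # masking vector
w = f(A, y)
c1, c2 = 1, -1                 # two challenges answered for the same w (rewinding)
z1, z2 = respond(s, y, c1), respond(s, y, c2)

# Extraction: s_bar = z - z' and c_bar = c - c' satisfy f(s_bar) = c_bar * t.
s_bar = [(a - b) % P for a, b in zip(z1, z2)]
c_bar = c1 - c2
```

Here the extracted \(\bar{s}\) equals \(2s\), which shows in miniature why the extractor only recovers a preimage of \(\bar{c}t\) with larger coefficients.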

**Applications of Approximate Zero-Knowledge Proofs.** As a simple example of the utility of approximate zero-knowledge proofs, we consider commitment schemes where a commitment to a message *m* involves choosing some randomness *r*, and outputting \(f(s)=t\), where *s* is defined as \(\begin{bmatrix} r\\ m\end{bmatrix}\) where *r* and *m* have small coefficients. Using the zero-knowledge proof from Sect. 1.1, one can prove the knowledge of an \(\bar{s}\) and \(\bar{c}\) such that \(f(\bar{s})=\bar{c}t\). If \(\bar{c}\) is invertible in the ring, then we can argue that this implies that if *t* is later opened to any valid commitment \(s'\) where \(f(s')=t\), then it must be \(s'=\bar{s}/\bar{c}\).

Indeed, \(f(\bar{s})=\bar{c}t=\bar{c}f(s')=f(\bar{c}s')\), so if \(\bar{s}\ne \bar{c}s'\), then we have found a collision for the function *f*. Such a collision implies a solution to the (Ring-)SIS problem, or, depending on the parameters, may simply not exist (and the scheme can thus be based on (Ring-)LWE).

There are more intricate examples involving commitment schemes (see e.g. [BKLP15, BDOP16]) as well as other applications of such zero knowledge proofs, (e.g. to verifiable encryption [LN17] and voting protocols [DLNS17]) which require that the \(\bar{c}\) be invertible.

**The Challenge Set and its Effect on the Proof.** The challenge *c* is drawn uniformly from some domain \(\mathcal {C}\) which is a subset of \(\mathbb {Z}_p[X]/(X^n+1)\). In order to have small soundness error, we would like \(\mathcal {C}\) to be large. When building non-interactive schemes that should remain secure against quantum computers, one should have \(|\mathcal {C}|\) be around \(2^{256}\). On the other hand, we also would like *c* to have a small norm. The reason for the latter is that the honest prover computes \(z:=sc+y\) and so the \(\bar{s}\) that is extracted from the Prover in (2) is equal to \(z-z'\), and must also therefore depend on \(\Vert sc\Vert \). Thus, the larger the norms of \(c,c'\) are, the larger the extracted solution \(\bar{s}\) will be, and the easier the corresponding (Ring-)SIS problem will be.

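If we only take the size of the set and the norm of its elements into consideration, then in the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\) one can take the challenges to be the polynomials with exactly 60 coefficients equal to \(\pm 1\) and all other coefficients equal to zero:

\[ \mathcal {C}=\left\{ c\in \mathbb {Z}_p[X]/(X^{256}+1) : \Vert c\Vert _\infty =1,\ \Vert c\Vert _1=60\right\} . \tag{4} \]

This set has size \(\binom{256}{60}\cdot 2^{60}>2^{256}\), and the \(\ell _2\) norm of its elements is \(\sqrt{60}\).

The remaining requirement is that the non-zero elements of \(\mathcal {C}-\mathcal {C}\) be invertible. If the polynomial \(X^{256}+1\) splits into *k* irreducible polynomials modulo *p*, then all of these polynomials must have degree 256/*k*. It is then easy to see, via the Chinese Remainder Theorem, that every non-zero polynomial of degree less than 256/*k* is invertible in the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\). We can therefore take the challenge set to consist of polynomials of degree less than 256/*k*; to keep the set large, however, their coefficients (and hence their norms) must be much larger than those of the elements in (4).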

### 1.2 Our Contribution

Our main result is a general theorem (Theorem 1.1) about the invertibility of polynomials with small coefficients in polynomial rings \(\mathbb {Z}_p[X]/(\varPhi _m(X))\), where \(\varPhi _m(X)\) is the \(m^{th}\) cyclotomic polynomial. The theorem states that if a non-zero polynomial has small coefficients (where “small” is related to the prime *p* and the number of irreducible factors of \(\varPhi _m(X)\) modulo *p*), then it’s invertible in the ring \(\mathbb {Z}_p[X]/(\varPhi _m(X))\). For the particular case of \(\varPhi _m(X)=X^n+1\), we show that the polynomial \(X^n+1\) can split into several (in practice up to 8 or 16) irreducible factors and we can still use the optimal challenge sets, like ones of the form from (4). This generalizes and extends a result in [LN17] which showed that one can use the optimal set when \(X^n+1\) splits into two factors. We also show, in Sect. 3.3, some methods for creating challenge sets that are slightly sub-optimal, but allow for the polynomial to split further.

The statement of Theorem 1.1 uses notation from Definition 2.1, while the particular case of \(X^n+1\) in Corollary 1.2 is self-contained. We therefore recommend the reader to first skim the Corollary statement. The proofs of the Theorem and the Corollary are given at the end of Sect. 3.2. For completeness, we also state sufficient conditions for invertibility based on the \(\ell _2\)-norm of the polynomial. This is an intermediate result that we need on the way to obtaining our main result about the invertibility of polynomials with small coefficients (i.e. based on the \(\ell _\infty \) norm of the polynomial), but it could be of independent interest.

### Theorem 1.1

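Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). If *p* is a prime such that \(p\equiv 1 \pmod {z}\) and \(\text{ ord }_m(p)=m/z\), then the polynomial \(\varPhi _m(X)\) factors as

\[ \varPhi _m(X)\equiv \prod _{j=1}^{\phi (z)}(X^{m/z}-r_j)\pmod {p} \]

for distinct \(r_j\in \mathbb {Z}_p^*\), where the \(X^{m/z}-r_j\) are irreducible in \(\mathbb {Z}_p[X]\). Furthermore, any \(\mathbf{y}\in \mathbb {Z}_p[X]/(\varPhi _m(X))\) that satisfies either

\[ 0<\Vert \mathbf{y}\Vert _\infty <\frac{1}{s_1(z)}\cdot p^{1/\phi (z)} \quad \text {or}\quad 0<\Vert \mathbf{y}\Vert <\frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)} \]

has an inverse in the ring.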

The above theorem gives sufficient conditions for *p* so that all polynomials with small coefficients in \(\mathbb {Z}_p[X]/(\varPhi _m(X))\) are invertible, but it does not state anything about whether there exist such *p*. In Theorem 2.5, we show that if we additionally put the condition on *m* and *z* that \(8|m \Rightarrow 4|z\), then there are indeed infinitely many primes *p* that satisfy these conditions. In practical lattice constructions involving zero-knowledge proofs, we would normally use a modulus of size at least \(2^{20}\), and we experimentally confirmed (for various cyclotomic polynomials) that one can indeed find many such primes that are of that size.

Specializing the above to the ring \(\mathbb {Z}_p[X]/(X^n+1)\), we obtain the following corollary:

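### Corollary 1.2

Let \(n\ge k>1\) be powers of 2 and let *p* be a prime with \(p\equiv 2k+1 \pmod {4k}\). Then the polynomial \(X^n+1\) factors as

\[ X^n+1\equiv \prod _{j=1}^{k}(X^{n/k}-r_j)\pmod {p} \]

for distinct \(r_j\in \mathbb {Z}_p^*\), where the \(X^{n/k}-r_j\) are irreducible in \(\mathbb {Z}_p[X]\). Furthermore, any \(\mathbf{y}\in \mathbb {Z}_p[X]/(X^n+1)\) that satisfies either

\[ 0<\Vert \mathbf{y}\Vert _\infty <\frac{1}{\sqrt{k}}\cdot p^{1/k} \quad \text {or}\quad 0<\Vert \mathbf{y}\Vert <p^{1/k} \]

has an inverse in the ring.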

As an application of this result, suppose that we choose \(k=8\) and a prime *p* congruent to \(17 \pmod {32}\) such that \(p>2^{20}\). Furthermore, suppose that we perform our zero-knowledge proofs over the ring \(\mathbb {Z}_p[X]/(X^n+1)\) (where *n* is a power of 2 greater than 8), and prove the knowledge of \(\bar{s},\bar{c}\) such that \(f(\bar{s})=\bar{c} t\) where \(\Vert \bar{c}\Vert _\infty \le 2\) (i.e. the challenges *c* are taken such that \(\Vert c\Vert _\infty =1\)). Then the above theorem states that \(X^n+1\) factors into 8 polynomials and \(\bar{c}\) will be invertible in the ring since \(\frac{1}{\sqrt{8}}\cdot p^{1/8} > 2 \).

Having \(p>2^{20}\) is quite normal for the regime of zero-knowledge proofs, and therefore having the polynomial \(X^n+1\) split into 8 factors should be possible in virtually every application. If we would like it to split further into 16 or 32 factors, then we would need \(p>2^{48}\) or, respectively, \(p>2^{112}\). In Sect. 3.3 we describe how our techniques used to derive Theorem 1.1 can also be used in a somewhat “ad-hoc” fashion to create different challenge sets \(\mathcal {C}\) that are nearly-optimal (in terms of the maximal norm), but allow \(X^n+1\) to split with somewhat smaller moduli than implied by Theorem 1.1.
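The thresholds quoted above follow directly from the \(\ell _\infty \) bound \(\frac{1}{\sqrt{k}}\cdot p^{1/k}\). A short Python sketch (the function names are ours, chosen for illustration) solves the inequality \(2<\frac{1}{\sqrt{k}}\cdot p^{1/k}\) for \(\log _2 p\):

```python
import math

def linf_bound(p, k):
    """Invertibility threshold (1/sqrt(k)) * p^(1/k) for the infinity norm."""
    return p ** (1.0 / k) / math.sqrt(k)

def min_log2_p(k, diff_norm=2):
    """log2 of the value that p must exceed so that
    diff_norm < (1/sqrt(k)) * p^(1/k), i.e. k * log2(diff_norm * sqrt(k))."""
    return k * math.log2(diff_norm * math.sqrt(k))

for k in (2, 4, 8, 16, 32):
    print(f"k = {k:2d}: need p > 2^{min_log2_p(k):.1f}")
```

With `diff_norm=2` (differences of challenges with coefficients in \(\{-1,0,1\}\)) this gives the exponents 20, 48 and 112 for \(k=8,16,32\), matching the figures in the text.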

In Sect. 4, we describe how one would combine the partially-splitting FFT algorithm with a Karatsuba multiplication algorithm to efficiently multiply in a partially-splitting ring. For primes of size between \(2^{20}\) and \(2^{29}\), one obtains a speed-up of about a factor of 2 by working over rings where \(X^n+1\) splits into 8 versus just 2 factors.

In addition to the speed improvement, there are applications whose usability can be improved by the fact that we work over rings \(\mathbb {Z}_p[X]/(X^n+1)\) where \(X^n+1\) splits into more factors. For example, [BKLP15] constructed a commitment scheme and zero-knowledge proofs of knowledge that allows to prove the fact that \(\mathbf{c}=\mathbf{a}\mathbf{b}\) when Commit(\(\mathbf{a}\)), Commit(\(\mathbf{b}\)), Commit(\(\mathbf{c}\)) are public (the same holds for addition). An application of this result is the verifiability of circuits. For this application, one only needs commitments of 0’s and 1’s, thus if we work over a ring where \(X^n+1\) splits into *k* irreducible factors, one can embed *k* bits into each Chinese Remainder coefficient of \(\mathbf{a}\) and \(\mathbf{b}\), and therefore proving that \(\mathbf{c}=\mathbf{a}\mathbf{b}\) implies that all *k* multiplications of the bits were performed correctly. Thus the larger *k* is, the more multiplications one can prove in parallel. Unfortunately *k* cannot be set too large without ruining the necessary property that the difference of any two distinct challenges is invertible or increasing the \(\ell _2\)-norm of the challenges as described in Sect. 1.1. Our result therefore allows to prove products of 8 (or 16) commitments in parallel without having to increase the parameters of the scheme to accommodate the larger challenges.

## 2 Cyclotomics and Lattices

### 2.1 Cyclotomic Polynomials

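### Definition 2.1

For any integer \(m>1\), we write

\[ \phi (m)=m\cdot \prod _{p\mid m}\frac{p-1}{p},\qquad \delta (m)=\prod _{p\mid m}p,\qquad \tau (m)={\left\{ \begin{array}{ll} m &{} \text {if } m \text { is odd}\\ m/2 &{} \text {if } m \text { is even} \end{array}\right. } \]

where the products run over the primes *p* dividing *m*. For an integer *n* coprime to *m*, we write \(\text{ ord }_m(n)=\min \{k>0 : n^k \bmod m=1\}\).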

The function \(\phi (m)\) is the Euler phi function, \(\delta (m)\) is sometimes referred to as the *radical* of *m*, and \(\tau (m)\) is a function that sometimes comes into play when working with the geometry of cyclotomic rings. The function \(\text{ ord }_m(n)\) is the order of an element *n* in the multiplicative group \(\mathbb {Z}_m^*\). In the special case of \(m=2^k\), we have \(\phi (m)=\tau (m)=2^{k-1}\) and \(\delta (m)=2\).

If *p* is some prime and \(r_1,\ldots ,r_{\phi (m)}\) are distinct elements in \(\mathbb {Z}_p^*\) such that \(\text{ ord }_p(r_j)=m\), then one can write

\[ \varPhi _m(X)\equiv \prod _{j=1}^{\phi (m)}(X-r_j)\pmod {p}. \]

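### Lemma 2.2

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). Then \(\varPhi _m(X)=\varPhi _z(X^{m/z})\).

### Proof

Both polynomials are monic of degree \(\phi (m)=\phi (z)\cdot m/z\). If \(\omega \) is a primitive \(m^{th}\) root of unity, then \(\omega ^{m/z}\) is a primitive \(z^{th}\) root of unity, so every root of \(\varPhi _m(X)\) is also a root of \(\varPhi _z(X^{m/z})\). Since \(\varPhi _m(X)\) has \(\phi (m)\) distinct roots, the two polynomials are equal. \(\square \)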

### 2.2 The Splitting of Cyclotomic Polynomials

In Theorem 2.3, we give the conditions on the prime *p* such that the polynomial \(\varPhi _m(X)\) splits into irreducible factors \(X^{m/k}-r\) modulo *p*. In Theorem 2.5, we then show that when *m* and *k* satisfy an additional relation, there are infinitely many *p* that satisfy the necessary conditions of Theorem 2.3.

### Theorem 2.3

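Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). If *p* is a prime such that \(p\equiv 1 \pmod {z}\) and \(\text{ ord }_m(p)=m/z\), then the polynomial \(\varPhi _m(X)\) factors as

\[ \varPhi _m(X)\equiv \prod _{j=1}^{\phi (z)}(X^{m/z}-r_j)\pmod {p} \]

for distinct \(r_j\in \mathbb {Z}_p^*\), where the \(X^{m/z}-r_j\) are irreducible in \(\mathbb {Z}_p[X]\).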

### Proof

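Since *p* is a prime and \(p \equiv 1 \pmod {z}\), there exists an element *r* such that \(\text{ ord }_p(r)=z\). Furthermore, for all the \(\phi (z)\) integers \(1\le i<z\) such that \(\gcd (i,z)=1\), we also have \(\text{ ord }_p(r^{i})=z\). Writing \(r_1,\ldots ,r_{\phi (z)}\) for these elements, we therefore have, by definition of \(\varPhi \), that

\[ \varPhi _z(X)\equiv \prod _{j=1}^{\phi (z)}(X-r_j)\pmod {p}. \]

Applying Lemma 2.2, we obtain

\[ \varPhi _m(X)=\varPhi _z(X^{m/z})\equiv \prod _{j=1}^{\phi (z)}(X^{m/z}-r_j)\pmod {p}. \]

It remains to show that the factors \(X^{m/z}-r_j\) are irreducible modulo *p*. Suppose they are not and \(X^{m/z} - r_j\) has an irreducible divisor *f* of degree \(d < \frac{m}{z}\). Then *f* defines an extension field of \(\mathbb {Z}_p\) of degree *d*, i.e. a finite field with \(p^d\) elements that all satisfy \(X^{p^d} = X\). Hence *f* divides \(X^{p^d} - X\). Now, from \({\text {ord}}_m(p) = \frac{m}{z} > d\) it follows that we can write \(p^d = am + b\) where \(0\le b<m\) and \(b \ne 1\). Thus

\[ X^{p^d}-X=X^{am+b}-X=X\left( X^{am+(b-1)}-1\right) . \]

In an extension field in which *f* splits, the roots of *f* are also roots of \(X^{am + (b - 1)} - 1\) and therefore have order dividing \(am + (b-1)\). This is a contradiction: as a divisor of \(X^{m/z} - r_j\) (and therefore of \(\varPhi _m\)), *f* has only roots of order *m*, and *m* does not divide \(am+(b-1)\). \(\square \)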
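As a sanity check of the theorem, the following Python sketch verifies one tiny instance by brute force. The parameters \(m=32\), \(z=8\), \(p=41\) are our own toy choice (not from the paper), and the exhaustive irreducibility test is only feasible for such small primes.

```python
from itertools import product

def order(a, m):
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def poly_mul(f, g, p):
    """Product of coefficient lists (lowest degree first) modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def poly_rem(f, g, p):
    """Remainder of f divided by a monic polynomial g, modulo p."""
    f = [c % p for c in f]
    dg = len(g) - 1
    for i in range(len(f) - 1, dg - 1, -1):
        c = f[i]
        if c:
            for j in range(dg + 1):
                f[i - dg + j] = (f[i - dg + j] - c * g[j]) % p
    return f[:dg]

def is_irreducible(f, p):
    """Exhaustive test: f has no monic divisor of degree 1..deg(f)//2."""
    deg = len(f) - 1
    for d in range(1, deg // 2 + 1):
        for low in product(range(p), repeat=d):
            if not any(poly_rem(f, list(low) + [1], p)):
                return False
    return True

m, z, p = 32, 8, 41                  # Phi_32(X) = X^16 + 1 (toy parameters)
conditions_hold = (p % z == 1) and (order(p, m) == m // z)
roots = [r for r in range(1, p) if order(r, p) == z]            # phi(z) elements
factors = [[-r % p] + [0] * (m // z - 1) + [1] for r in roots]  # X^(m/z) - r_j
prod_all = [1]
for fac in factors:
    prod_all = poly_mul(prod_all, fac, p)
all_irreducible = all(is_irreducible(fac, p) for fac in factors)
```

Here `prod_all` is the coefficient list of the product of the four factors \(X^4-r_j\), which can be compared against \(X^{16}+1\).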

In the proof of Theorem 2.5 we need a small result about the multiplicative order of odd integers modulo powers of 2. Since we also need this later in the proof of Corollary 1.2, we state this result in the next lemma.

### Lemma 2.4

Let \(a \equiv 1 + 2^f \pmod {2^{f+1}}\) for \(f \ge 2\). Then the order of *a* in the group of units modulo \(2^e\) for \(e \ge f\) is equal to \(2^{e-f}\), i.e. \(\text{ ord }_{2^e}(a) = 2^{e-f}\).

### Proof

We can write \(a = 1 + 2^fk_1\) with some odd \(k_1 \in \mathbb {Z}\). Then notice \(a^2 = 1 + 2^{f+1}k_1 + 2^{2f}k_1^2 = 1 + 2^{f+1}(k_1 + 2^{f-1}k_1^2) = 1 + 2^{f+1}k_2\) with odd \(k_2 = k_1 + 2^{f-1}k_1^2\). It follows iteratively that \(a^{2^{e-f}} = 1 + 2^ek_{2^{e-f}} \equiv 1 \pmod {2^e}\), which implies the order of *a* modulo \(2^e\) divides \(2^{e-f}\), but \(a^{2^{e-f-1}} = 1 + 2^{e-1}k_{2^{e-f-1}} \not \equiv 1 \pmod {2^e}\) since \(k_{2^{e-f-1}}\) is odd. So, the multiplicative order of *a* modulo \(2^e\) must be \(2^{e-f}\).
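The statement of Lemma 2.4 is easy to confirm numerically for small exponents. The sketch below (the search ranges are arbitrary choices of ours) checks every \(a\equiv 1+2^f \pmod {2^{f+1}}\) below a cutoff:

```python
def order(a, m):
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def check_lemma(max_f=5, max_e=10):
    """Return True if ord_{2^e}(a) = 2^(e-f) for all tested a, f, e."""
    for f in range(2, max_f + 1):
        # a runs over the residue class 1 + 2^f modulo 2^(f+1)
        for a in range(1 + 2**f, 2**(max_e + 1), 2**(f + 1)):
            for e in range(f, max_e + 1):
                if order(a, 2**e) != 2**(e - f):
                    return False
    return True

lemma_ok = check_lemma()
```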

### Theorem 2.5

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). Furthermore, assume that if *m* is divisible by 8, then *z* is divisible by 4. Then there are infinitely many primes *p* such that \(p \equiv 1 \pmod {z}\) and \(\text{ ord }_m(p)=m/z\).

### Proof

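By the Chinese Remainder Theorem, it suffices to find, for every *i*, an integer \(a_i\) with \(a_i \bmod p_i^{f_i} = 1\) and \({\text {ord}}_{p_i^{e_i}}(a_i)=p_i^{e_i-f_i}\). First let \(p_i\) be odd. If *g* is a generator modulo \(p_i\) then either *g* or \(g + p_i\), say \(g'\), is a generator modulo every power of \(p_i\) (c.f. [Coh00, Lemma 1.4.5]). Define \(a_i = (g')^{(p_i - 1)p_i^{f_i - 1}}\). Then, since \(g'\) has order \((p_i - 1)p_i^{f_i - 1}\) modulo \(p_i^{f_i}\) and order \((p_i - 1)p_i^{e_i - 1}\) mod \(p_i^{e_i}\), it follows that \(a_i \bmod p_i^{f_i} = 1\) and

\[ {\text {ord}}_{p_i^{e_i}}(a_i)=\frac{(p_i-1)p_i^{e_i-1}}{(p_i-1)p_i^{f_i-1}}=p_i^{e_i-f_i}. \]

Now let \(p_1=2\). If \(e_1\le 2\), a suitable \(a_1\in \{1,3\}\) is found by inspection. Otherwise *m* is divisible by 8; that is, \(e_1 \ge 3\). This implies \(f_1 \ge 2\). From Lemma 2.4 we see that 5 is a generator of a cyclic subgroup of \(\mathbb {Z}_{2^e}^\times \) of index 2 for every \(e \ge 3\), i.e. \({\text {ord}}_{2^{e}}(5) = 2^{e - 2}\). Therefore, \(5^{2^{f_1 - 2}} \bmod 2^{f_1} = 1\) and

\[ {\text {ord}}_{2^{e_1}}\left( 5^{2^{f_1-2}}\right) =2^{e_1-f_1}. \]

By the Chinese Remainder Theorem, there exists an integer *a* with \(a\equiv a_i \pmod {p_i^{e_i}}\) for all *i*. This *a* fulfills our two conditions, \(a\equiv 1 \pmod {z}\) and \({\text {ord}}_m(a)=m/z\), and in fact every integer congruent to \(a \bmod m\) does. By Dirichlet’s theorem on arithmetic progressions, there are infinitely many primes among the \(a + lm\) (\(l \in \mathbb {Z}\)). \(\square \)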

As an experimental example consider \(m = 2^2 3^3 7 = 756\) and \(z = 2\cdot 3\cdot 7 = 42\). Then \(\varPhi _m\) splits into 12 polynomials modulo primes of the form in Theorem 2.5. There are 2058 primes of this form between \(2^{20}\) and \(2^{21}\).
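An experiment of this kind can be reproduced with a simple sieve. The Python sketch below is our own illustration (the helper names are hypothetical); it lists the primes in a range that satisfy the two conditions of Theorem 2.5 and then counts those between \(2^{20}\) and \(2^{21}\) for \(m=756\), \(z=42\). The text above reports 2058 for this count.

```python
def order(a, m):
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def primes_between(lo, hi):
    """Primes p with lo <= p < hi, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * hi
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, hi, i)))
    return [n for n in range(lo, hi) if sieve[n]]

def good_primes(m, z, lo, hi):
    """Primes in [lo, hi) with p = 1 (mod z) and ord_m(p) = m/z.
    Since z contains every prime factor of m, p = 1 (mod z) already
    guarantees gcd(p, m) = 1, so order() is well defined."""
    return [p for p in primes_between(lo, hi)
            if p % z == 1 and order(p, m) == m // z]

count = len(good_primes(756, 42, 2**20, 2**21))
print(count)
```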

### 2.3 The Vandermonde Matrix

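Let \(s_1(m)\) denote the largest singular value of the Vandermonde matrix associated with the complex roots of \(\varPhi _m(X)\). If \(m=p^k\) for a prime *p* and positive integer *k*, then

\[ s_1(m)=\sqrt{\tau (m)}. \tag{9} \]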

Table 1. Values of *m* less than 600 for which \(s_1(m)\ne \sqrt{\tau (m)}\).

| *m* | \(s_1(m)\) | \(\sqrt{\tau (m)}/s_1(m)\) |
|---|---|---|
| \(105=3\cdot 5\cdot 7\) | 9.952 | 1.0296172 |
| \(165=3\cdot 5\cdot 11\) | 12.785 | 1.0046612 |
| \(195=3\cdot 5\cdot 13\) | 13.936 | 1.0019718 |
| \(210=2\cdot 3\cdot 5\cdot 7\) | 9.952 | 1.0296172 |
| \(315=3^2\cdot 5\cdot 7\) | 17.237 | 1.0296172 |
| \(330=2\cdot 3\cdot 5\cdot 11\) | 12.785 | 1.0046612 |
| \(390=2\cdot 3\cdot 5\cdot 13\) | 13.936 | 1.0019718 |
| \(420=2^2\cdot 3\cdot 5\cdot 7\) | 14.074 | 1.0296172 |
| \(495=3^2\cdot 5\cdot 11\) | 22.145 | 1.0046612 |
| \(525=3\cdot 5^2\cdot 7\) | 22.253 | 1.0296172 |
| \(585=3^2\cdot 5\cdot 13\) | 24.139 | 1.0019718 |

We do not know of a theorem analogous to (9) that holds for all *m*, and so we numerically computed \(s_1(m)\) for all \(m<3000\) and observed that \(s_1(m)\le \sqrt{\tau (m)}\) was always satisfied. Furthermore, for most *m*, we still had the equality \(s_1(m)=\sqrt{\tau (m)}\). The only exceptions where \(s_1(m)<\sqrt{\tau (m)}\) were integers that have at least 3 distinct odd prime factors. As an example, Table 1 contains a list of all such values up to 600 for which \(s_1(m)< \sqrt{\tau (m)}\). We point out that while it appears that having three prime factors is a necessary condition for *m* to appear in the table, it is not sufficient. For example, \(255=3\cdot 5\cdot 17\), but still \(s_1(255) = \sqrt{\tau (255)} = \sqrt{255}\).

For all practical sizes of *m* used in cryptography, the value \(s_1(m)\) is fairly easy to compute numerically using basic linear algebra software (e.g. MATLAB, Scilab, etc.), and we will state all our results in terms of \(s_1(m)\). Nevertheless, being able to relate \(s_1(m)\) to \(\tau (m)\) certainly simplifies the calculation. Based on our numerical observations, we formulate the following conjecture:

### Conjecture 2.6

For all positive integers *m*, \(s_1(m)\le \sqrt{\tau (m)}\).
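The numerical computation mentioned above can be sketched without any external library, taking \(s_1(m)\) to be the largest singular value of the Vandermonde matrix \(V_m\) of the primitive \(m^{th}\) complex roots of unity (our reading of the definition). The sketch uses power iteration on the Gram matrix \(V_m^*V_m\) as a stand-in for the linear algebra packages named in the text; the iteration count and starting vector are arbitrary choices, and convergence can be slow when the top eigenvalues are close.

```python
import math

def gram_matrix(m):
    """G = V^* V for the Vandermonde matrix of the primitive m-th roots of
    unity. Entry (a, b) equals sum_j omega_j^(b - a), which is real."""
    units = [j for j in range(1, m) if math.gcd(j, m) == 1]
    n = len(units)
    ram = {d: sum(math.cos(2 * math.pi * j * d / m) for j in units)
           for d in range(-(n - 1), n)}
    return [[ram[b - a] for b in range(n)] for a in range(n)]

def s1(m, iters=2000):
    """Approximate largest singular value of V_m: square root of the top
    eigenvalue of G, estimated by power iteration (a sketch)."""
    G = gram_matrix(m)
    n = len(G)
    v = [1.0 + 0.1 * i for i in range(n)]      # arbitrary starting vector
    for _ in range(iters):
        w = [sum(G[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(G[a][b] * v[b] for b in range(n)) for a in range(n)]
    return math.sqrt(sum(x * y for x, y in zip(v, w)))   # Rayleigh quotient

def tau(m):
    """tau(m) = m for odd m and m/2 for even m."""
    return m if m % 2 else m // 2
```

Calling `s1(m)` for a prime power such as `m = 8` or `m = 9` and comparing it with `math.sqrt(tau(m))` illustrates (9); values such as those in Table 1 would be checked the same way, at a higher cost per call.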

### 2.4 Cyclotomic Rings and Ideal Lattices

Throughout the paper, we will write \(R_m\) to be the *cyclotomic ring* \(\mathbb {Z}[X]/(\varPhi _m(X))\) and \(R_{m,p}\) to be the ring \(\mathbb {Z}_p[X]/(\varPhi _m(X))\), with the usual polynomial addition and multiplication operations. We will denote by normal letters elements in \(\mathbb {Z}\) and by bold letters elements in \(R_m\). For an odd *p*, an element \(\mathbf{w}\in R_{m,p}\) can always be written as \(\sum \limits _{i=0}^{\phi (m)-1}{w_i X^i}\) where \(|w_i|\le (p-1)/2\). Using this representation, for \(\mathbf{w}\in R_{m,p}\) (and in \(R_m\)), we will define the lengths of elements as

\[ \Vert \mathbf{w}\Vert _\infty =\max _i |w_i| \quad \text {and}\quad \Vert \mathbf{w}\Vert =\sqrt{\sum _i |w_i|^2}. \]

Another useful notion of length is the *embedding norm* of an element in \(R_m\). If \(\omega _1,\ldots ,\omega _{\phi (m)}\) are the complex roots of \(\varPhi _m(X)\), then the embedding norm of \(\mathbf{w}\in R_m\) is

\[ \Vert \mathbf{w}\Vert _e=\sqrt{\sum _{j=1}^{\phi (m)} |\mathbf{w}(\omega _j)|^2}. \]

An integer lattice of dimension *n* is an additive sub-group of \(\mathbb {Z}^n\). For the purposes of this paper, all lattices will be full-rank. The determinant of a full-rank integer lattice \(\varLambda \) of dimension *n* is the size of the quotient group \(|\mathbb {Z}^n/\varLambda |\). We write \(\lambda _1(\varLambda )\) to denote the Euclidean length of the shortest non-zero vector in \(\varLambda \).

Ideals of \(R_m\), viewed as sets of coefficient vectors, are additive sub-groups of \(\mathbb {Z}^{\phi (m)}\) and are called *ideal lattices*. For any ideal lattice \(\varLambda \) of the ring \(R_m\), there exists a lower bound on the embedding norm of its vectors (c.f. [PR07, Lemma 6.2]).

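### Lemma 2.7

If \(\varLambda \) is an ideal lattice in \(R_m\), then \(\lambda _1(\varLambda )\ge \frac{\sqrt{\phi (m)}}{s_1(m)}\cdot \det (\varLambda )^{1/\phi (m)}\).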

## 3 Invertible Elements in Cyclotomic Rings

The main goal of this section is to prove Theorem 1.1. To this end, we first prove Lemma 3.1, which proves the Theorem for the \(\ell _2\) norm. Unfortunately directly applying this Lemma to prove the \(\ell _\infty \) part of the Theorem 1.1 by using the relationship between the \(\ell _2\) and \(\ell _\infty \) norms is sub-optimal. In Sect. 3.2 we instead show that by writing elements of partially-splitting rings \(R_{m,p}\) as sums of polynomials over smaller, fully-splitting rings, one can obtain a tighter bound. We prove in Lemma 3.2 that if any of the parts of \(\mathbf{y}\in R_{m,p}\) is invertible in the smaller fully-splitting ring, then the polynomial \(\mathbf{y}\) is invertible in \(R_{m,p}\). The full proof of Theorem 1.1 will follow from this Lemma, the special case of Lemma 3.1 applicable to fully-splitting rings, and Theorem 2.3.

### 3.1 Invertibility and the \(\ell _2\) Norm

Our main result only needs a special case of the below Lemma corresponding to when \(\varPhi _m(X)\) fully splits, but we prove a more general statement since it doesn’t bring with it any additional complications.

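### Lemma 3.1

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\), and suppose that for some prime *p* the polynomial \(\varPhi _m(X)\) factors as \(\varPhi _m(X)\equiv \prod _{i=1}^{\phi (z)}(X^{m/z}-r_i)\pmod {p}\) for distinct \(r_i\in \mathbb {Z}_p^*\), where the \(X^{m/z}-r_i\) are irreducible in \(\mathbb {Z}_p[X]\). Then any \(\mathbf{y}\in R_{m,p}\) that satisfies \(0<\Vert \mathbf{y}\Vert <\frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)}\) has an inverse in \(R_{m,p}\).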

### Proof

Suppose that \(\mathbf{y}\) is not invertible in \(R_{m,p}\). By the Chinese Remainder Theorem, this implies that for at least one *i*, \(\mathbf{y}\bmod \left( X^{m/z}-r_i,p\right) =0\). For an *i* for which \(\mathbf{y}\bmod \left( X^{m/z}-r_i,p\right) =0\) (if there is more than one such *i*, pick one of them arbitrarily) define the set

\[ \varLambda =\left\{ \mathbf{z}\in R_m : \mathbf{z}\bmod \left( X^{m/z}-r_i,p\right) =0\right\} . \]

The set \(\varLambda \) is an additive group. Moreover, because \(X^{m/z}-r_i\) is a factor of \(\varPhi _m(X)\) modulo *p*, for any polynomial \(\mathbf{z}\in \varLambda \), the polynomial \(\mathbf{z}\cdot X \in R_m\) is also in \(\varLambda \). This implies that \(\varLambda \) is an ideal of \(R_m\), and so an ideal lattice in the ring \(R_m\). By looking at the Chinese Remainder representation modulo *p* of all the elements in \(\varLambda \) (they have 0 in the coefficient corresponding to modulo \(X^{m/z}-r_i\), and are arbitrary in all other coefficients), one can see that \(\left| \mathbb {Z}^{\phi (m)}/\varLambda \right| =p^{m/z}=p^{\phi (m)/\phi (z)}\), which is the determinant of \(\varLambda \). By Lemma 2.7, we then know that \(\lambda _1(\varLambda )\ge \frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)}\).

Since \(\mathbf{y}\bmod \left( X^{m/z}-r_i,p\right) =0\) and \(0<\Vert \mathbf{y}\Vert \), we know that \(\mathbf{y}\) is a non-zero vector in \(\varLambda \). But we also have by our hypothesis that \(\Vert \mathbf{y}\Vert <\frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)}\le \lambda _1(\varLambda )\), which is impossible.

\(\square \)

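Lemma 3.1, together with the inequality \(\Vert \mathbf{y}\Vert \le \sqrt{\phi (m)}\cdot \Vert \mathbf{y}\Vert _\infty \), implies that every \(\mathbf{y}\in R_{m,p}\) with \(0<\Vert \mathbf{y}\Vert _\infty <\frac{1}{s_1(m)}\cdot p^{1/\phi (z)}\) is invertible. This bound is weaker than the bound \(\frac{1}{s_1(z)}\cdot p^{1/\phi (z)}\) of Theorem 1.1 whenever \(s_1(m)>s_1(z)\) (e.g. if *m*, *z* are powers of 2, then \(s_1(m)=\sqrt{m/2}\) and \(s_1(z)=\sqrt{z/2}\)). In Sect. 3.2, we instead break up \(\mathbf{y}\) into a sum of elements in smaller rings \(R_{z,p}\) and prove that only some of these parts need to be invertible in \(R_{z,p}\) in order for the entire element \(\mathbf{y}\) to be invertible in \(R_{m,p}\).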

We point out that Lemma 3.1 was already implicit in [SS11, Lemma 8] for \(\varPhi _m(X)=X^n+1\). To obtain a bound in the \(\ell _\infty \) norm, the authors of that work then applied the norm inequality between the \(\ell _2\) and \(\ell _\infty \) norms to obtain the bound that we described above. Using the more refined approach in the current paper, however, that bound can be tightened and would immediately produce an improvement in the main result of [SS11] which derives the statistical closeness of a particular distribution to uniform. Such applications are therefore another area in which our main result can prove useful.

### 3.2 Partially-Splitting Rings

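In this section, we work with rings \(R_{m,p}\) where *p* is chosen such that the polynomial \(\varPhi _m(X)\) factors into *k* irreducible polynomials of the form \(X^{\phi (m)/k}-r_i\). Theorem 2.3 states the sufficient conditions on *m*, *k*, *p* in order to obtain such a factorization. Throughout this section, we will use the following notation: suppose that

\[ \mathbf{y}=\sum _{j=0}^{\phi (m)-1} y_jX^j \]

is an element of the ring \(R_{m,p}\), where *p* is chosen as above. Then for all integers \(0\le i\le \phi (m)/k-1\), we define the polynomials \(\mathbf{y}'_i\) as

\[ \mathbf{y}'_i=\sum _{j=0}^{k-1} y_{j\phi (m)/k+i}X^j. \tag{11} \]

With this notation,

\[ \mathbf{y}=\sum _{i=0}^{\phi (m)/k-1} \mathbf{y}'_i\left( X^{\phi (m)/k}\right) \cdot X^i, \tag{12} \]

and so, modulo \(X^{\phi (m)/k}-r_j\) and *p*, we have

\[ \mathbf{y}\bmod \left( X^{\phi (m)/k}-r_j,p\right) =\sum _{i=0}^{\phi (m)/k-1} \mathbf{y}'_i(r_j)\cdot X^i. \tag{13} \]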

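### Lemma 3.2

Let *m*, *z* and *p* be as in Theorem 2.3, and for \(\mathbf{y}\in R_{m,p}\) let the \(\mathbf{y}_i'\) be defined as in (11) with \(k=\phi (z)\). If some \(\mathbf{y}_i'\) is invertible in \(R_{z,p}\), then \(\mathbf{y}\) is invertible in \(R_{m,p}\).

### Proof

By the Chinese Remainder Theorem, \(\mathbf{y}\) is invertible in \(R_{m,p}\) if and only if \(\mathbf{y}\bmod \left( X^{m/z}-r_j,p\right) \ne 0\) for all *j*. Let *i* be such that \(\mathbf{y}_i'\) is invertible in the ring \(R_{z,p}\). From (13) and Lemma 2.2 we have that the coefficient of \(X^i\) in \(\mathbf{y}\bmod \left( X^{m/z}-r_j,p\right) \) is \(\mathbf{y}_i'(r_j)=\mathbf{y}_i'\bmod \left( X-r_j,p\right) \), where the \(X-r_j\) are the factors of \(\varPhi _z(X)\) modulo *p*. Since \(\mathbf{y}_i'\) is invertible in \(R_{z,p}\), this value is non-zero for every *j*, and therefore \(\mathbf{y}\bmod \left( X^{m/z}-r_j,p\right) \ne 0\) for all *j*.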

\(\square \)

Theorem 1.1 now follows from the combination of Theorem 2.3, and Lemmas 3.1 and 3.2.

### Proof

*(Theorem* 1.1*)*. For the conditions on *m*, *z*, and *p*, it follows from Theorem 2.3 that the polynomial \(\varPhi _m(X)\) can be factored into irreducible factors modulo *p* as \(\prod \limits _{j=1}^{\phi (z)} (X^{m/z}-r_j)\). Lemma 2.2 then states that \(\varPhi _{z}(X)\equiv \prod \limits _{j=1}^{\phi (z)} (X-r_j)\pmod {p}\).

For any \(\mathbf{y}\in R_{m,p}\), let the \(\mathbf{y}_i'\) be defined as in (11) where \(k=\phi (z)\). If \(0<\Vert \mathbf{y}\Vert _\infty <\frac{1}{s_1(z)}\cdot p^{1/\phi (z)}\), then because each \(\mathbf{y}_i'\) consists of \(\phi (z)\) coefficients, we have that for all *i*, \(\Vert \mathbf{y}_i'\Vert <\frac{\sqrt{\phi (z)}}{s_1(z)}\cdot p^{1/\phi (z)}\). Since \(\mathbf{y}\ne 0\), it must be that for some *i*, \(\mathbf{y}_i'\ne 0\).

Lemma 3.1 therefore implies that the non-zero \(\mathbf{y}_i'\) is invertible in \(R_{z,p}\). In turn, Lemma 3.2 implies that \(\mathbf{y}\) is invertible in \(R_{m,p}\). \(\square \)
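The decomposition into the polynomials \(\mathbf{y}_i'\) and the resulting invertibility test can be tried out exhaustively on a toy ring. The Python sketch below uses our own small parameters \(n=8\), \(k=4\), \(p=41\) (so \(m=16\), \(z=8\)); for these, the \(\ell _\infty \) threshold \(\frac{1}{\sqrt{k}}\cdot p^{1/k}\) is about 1.27, so the claim is that every non-zero polynomial with coefficients in \(\{-1,0,1\}\) is invertible.

```python
from itertools import product

def order(a, m):
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def parts(y, k):
    """Split the coefficient list y (length n) into n/k polynomials y'_i
    with k coefficients each: y'_i = sum_j y[i + j*(n/k)] X^j."""
    step = len(y) // k
    return [[y[i + j * step] for j in range(k)] for i in range(step)]

def reduce_mod_factor(y, k, r, p):
    """y mod (X^(n/k) - r, p), computed as sum_i y'_i(r) X^i."""
    return [sum(c * pow(r, j, p) for j, c in enumerate(yi)) % p
            for yi in parts(y, k)]

n, k, p = 8, 4, 41                      # toy parameters with p = 2k+1 (mod 4k)
roots = [r for r in range(1, p) if order(r, p) == 2 * k]
bound = p ** (1.0 / k) / k ** 0.5       # (1/sqrt(k)) * p^(1/k)

def is_invertible(y):
    """By the CRT, y is invertible iff it is non-zero modulo every factor."""
    return all(any(reduce_mod_factor(y, k, r, p)) for r in roots)

small = [y for y in product((-1, 0, 1), repeat=n) if any(y)]
all_invertible = all(is_invertible(list(y)) for y in small)
```

For contrast, the polynomial \(X^2-3\) has infinity norm 3, above the threshold, and is a zero divisor in this ring because 3 is one of the \(r_j\).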

### Proof

(Of Corollary 1.2) If \(n \ge k > 1\) are powers of 2, then we set \(m=2n\) and \(z=2k\) in Theorem 1.1. Then \(\varPhi _m(X)=X^n+1\) and the condition that \(p\equiv 2k+1 \pmod {4k}\), i.e. \(p \equiv z+1 \pmod {2z}\), implies \(p\equiv 1 \pmod {z}\). Now we need to show that \(\text{ ord }_{m}(p)=m/z\), but this follows immediately from Lemma 2.4 by setting \(m = 2^e\) and \(z = 2^f\) and noting that \(f \ge 2\). Finally, from (9) we have \(s_1(z) = \sqrt{\tau (z)} = \sqrt{\frac{z}{2}} = \sqrt{k}\) and \(s_1(m) = \sqrt{n}\). Therefore the upper bounds for the \(\Vert \cdot \Vert _\infty \) and \(\Vert \cdot \Vert \) inequalities read \(\frac{1}{\sqrt{k}}p^{1/k} = \frac{1}{s_1(z)}p^{1/k}\) and \(p^{1/k} = \frac{\sqrt{n}}{s_1(m)}p^{1/k}\), respectively, as in Theorem 1.1. \(\square \)

### 3.3 Example of “Ad-Hoc” Applications of Lemma 3.2

Using Lemma 3.2, as we did in the proof of Theorem 1.1 above, gives a clean statement as to a sufficient condition under which polynomials are invertible in a partially-splitting ring. One thing to note is that putting a bound on the \(\ell _\infty \) norm does not take into account the other properties that our challenge space may have. For example, our challenge space in (4) is also sparse, in addition to having the \(\ell _\infty \) norm bounded by 1. Yet we do not know how to use this sparseness to show that one can let \(\varPhi _m(X)\) split further while still maintaining the invertibility of the set \(\mathcal {C}-\mathcal {C}\).

In some cases, however, there are ways to construct challenge sets that are more in line with Lemma 3.2 and will allow further splitting. We do not see a simple way in which to systematize these ideas, and so one would have to work out the details on a case-by-case basis. Below, we give such an example for the case in which we are working over the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\) and would like to have the polynomial \(X^{256}+1\) split into 16 irreducible factors. If we would like to have \(X^n+1\) split into 16 factors modulo *p* and the set \(\mathcal {C}-\mathcal {C}\) to have elements whose infinity norm is bounded by 2, then applying Theorem 1.1 directly implies that we need to have \(2<\frac{1}{\sqrt{16}}\cdot p^{1/16}\), which implies \(p>2^{48}\).

We will now show how one can lower the requirement on *p* in order to achieve a split into 16 factors by altering the challenge set \(\mathcal {C}\) in (4).

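With the challenge set \(\mathcal {D}\) from (14) and a suitable prime \(p>2^{30.5}\), the polynomial \(X^{256}+1\) splits into 16 irreducible factors modulo *p* and all the non-zero elements in \(\mathcal {D}-\mathcal {D}\) will be invertible. Therefore if our application calls for a modulus that is larger than \(2^{30.5}\) but smaller than \(2^{48}\), we can use the challenge set \(\mathcal {D}\) and the below lemma.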

### Lemma 3.3

Suppose that \(p>2^{16\log _2{\sqrt{14}}}\approx 2^{30.5}\) is a prime congruent to \(33 \pmod {64}\). Then the polynomial \(X^{256}+1\) splits into 16 irreducible polynomials of the form \(X^{16}+r_j\) modulo *p*, and any non-zero polynomial \(\mathbf{y}\in \mathcal {D}-\mathcal {D}\) (as defined in (14)) is invertible in the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\).

### Proof

The fact that \(X^{256}+1\) splits into 16 irreducible factors follows directly from Theorem 2.3. Notice that for any \(\mathbf{y}\in \mathcal {D}-\mathcal {D}\), the \(\ell _2\) norm of each \(\mathbf{y}_i'\) is at most 4. Furthermore, each \(\mathbf{y}_i'\) has \(256/16=16\) coefficients. Thus an immediate consequence of Lemmas 3.2 and 3.1 is that if \(p>2^{32}\), then any non-zero element in \(\mathcal {D}-\mathcal {D}\) is invertible. To slightly improve the lower bound, we can observe that the \(\mathbf{y}_i'\) of norm 4 are polynomials in \(\mathbb {Z}_p[X]/(X^{16}+1)\) with exactly four \(\pm 2\)’s in them. But such elements can be written as a product of 2 and a polynomial with four \(\pm 1\)’s in it, and if both of those are invertible, so is the product. The norm of these polynomials with four \(\pm 1\)’s is 2, and so they are not the elements that set the lower bound. The next largest \(\mathbf{y}_i'\) is one that has three \(\pm 2\)’s and two \(\pm 1\)’s. The norm of such elements is \(\sqrt{14}\). Thus for all \(p>2^{16\cdot \log _2(\sqrt{14})}\approx 2^{30.5}\), the \(\mathbf{y}_i'\) will be invertible in \(\mathbb {Z}_p[X]/(X^{16}+1)\), and thus every non-zero element in \(\mathcal {D}-\mathcal {D}\) will be invertible in \(\mathbb {Z}_p[X]/(X^{256}+1)\). \(\square \)

CPU cycles of our FFT-accelerated multiplication algorithm for \(\mathbb {Z}_p[X]/(X^{256} + 1)\) using Karatsuba multiplication for the base case. Both the FFT and Karatsuba are plain C implementations. Columns are indexed by the prime *p*.

| Number of FFT levels | \(2^{20} - 2^{14} + 1\) | \(2^{23} - 2^{13} + 1\) | \(2^{25} - 2^{12} + 1\) | \(2^{27} - 2^{11} + 1\) |
|---|---|---|---|---|
| 0 | 123677 | 123717 | 134506 | 144913 |
| 1 | 83820 | 83778 | 91775 | 97641 |
| 2 | 55378 | 55700 | 63148 | 65778 |
| 3 | 38111 | 38061 | 43116 | 43282 |
| 4 | 27374 | 27626 | 31782 | 30836 |
| 5 | 21968 | 21955 | 26406 | 24937 |
| 6 | 17076 | 17007 | 21518 | 19811 |
| 7 | 15149 | 15144 | 20483 | 18026 |
| 8 | 16875 | 16893 | 22329 | 20299 |

Table 3. CPU cycles of our FFT-accelerated multiplication algorithm for \(\mathbb {Z}_p[X]/(X^{256} + 1)\) using FLINT for base case multiplication. The FFT implementation is a highly optimized AVX2-based implementation. Each column corresponds to one prime *p*.

| Number of FFT levels | \(2^{20} - 2^{14} + 1\) | \(2^{23} - 2^{13} + 1\) | \(2^{25} - 2^{12} + 1\) | \(2^{27} - 2^{11} + 1\) |
|---|---|---|---|---|
| 0 | 28245 | 31574 | 33642 | 35397 |
| 1 | 27168 | 29343 | 31419 | 32613 |
| 2 | 20989 | 23158 | 24915 | 25677 |
| 3 | 20521 | 22038 | 23582 | 23757 |
| 4 | 22543 | 23695 | 25016 | 24628 |
| 5 | 24473 | 24715 | 25337 | 30366 |
| 6 | 13578 | 13572 | 14307 | 13543 |
| 7 | 13981 | 14020 | 14522 | 13986 |
| 8 | 3873 | 3844 | 3847 | 3857 |

## 4 Polynomial Multiplication Implementation

We now describe in more detail the computational advantage of having the modulus \(\varPhi _m\) split into as many factors as possible and present our experimental results. We focus on the case where *m* is a power of two and write \(n = \phi (m) = m/2\). In this case one can use the standard radix-2 FFT-trick to speed up the multiplication. Note that for other *m*, one can also exploit the splitting in a divide-and-conquer fashion similar to the radix-2 FFT.

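Suppose that \(\mathbb {Z}_p\) contains a primitive fourth root of unity *r* so that we can write \(X^n + 1 = (X^{n/2} + r)(X^{n/2} - r)\). By the Chinese remainder theorem, multiplication in \(R_{m,p}\) then reduces to multiplication in the two smaller rings \(\mathbb {Z}_p[X]/(X^{n/2} \pm r)\). Reducing a polynomial of degree less than *n* modulo the two sparse polynomials \(X^{n/2} \pm r\) is very easy and takes only \(\frac{n}{2}\) multiplications, \(\frac{n}{2}\) additions and \(\frac{n}{2}\) subtractions. If \(\mathbb {Z}_p\) contains higher roots so that \(X^n + 1\) splits further, then one can apply the FFT-trick recursively to the smaller rings. What is usually referred to as the number theoretic transform (NTT) is the case where \(\mathbb {Z}_p\) contains a primitive 2*n*-th root of unity so that \(X^n + 1\) splits completely into linear factors. This reduces multiplication in \(R_{m,p}\) to just multiplication in \(\mathbb {Z}_p\).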
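
The recursive FFT-trick can be sketched in a few lines. This is a minimal illustration, not the implementation measured in this section: it assumes a tiny prime (so that square roots can be found by brute force), uses schoolbook multiplication as the base case, and all function names (`split`, `merge`, `mul_fft_trick`, and so on) are hypothetical. Coefficient lists are stored lowest degree first.

```python
def mul_mod_binomial(a, b, c, p):
    """Schoolbook product of a and b in Z_p[X]/(X^h - c), with h = len(a)."""
    h = len(a)
    out = [0] * h
    for i in range(h):
        for j in range(h):
            k = i + j
            if k < h:
                out[k] = (out[k] + a[i] * b[j]) % p
            else:  # X^h wraps around to c
                out[k - h] = (out[k - h] + c * a[i] * b[j]) % p
    return out

def sqrt_mod(c, p):
    """Brute-force square root modulo a small prime (illustration only)."""
    for s in range(1, p):
        if s * s % p == c % p:
            return s
    raise ValueError("no square root: the modulus does not split further")

def split(a, s, p):
    """Reduce a modulo X^(h/2) - s and X^(h/2) + s, where h = len(a)."""
    half = len(a) // 2
    t = [s * x % p for x in a[half:]]                    # h/2 multiplications
    minus = [(l + u) % p for l, u in zip(a[:half], t)]   # h/2 additions
    plus = [(l - u) % p for l, u in zip(a[:half], t)]    # h/2 subtractions
    return minus, plus

def merge(minus, plus, s, p):
    """Invert split(): recover the low and high halves by the CRT."""
    inv2 = pow(2, -1, p)
    inv2s = pow(2 * s, -1, p)
    lo = [(m + q) * inv2 % p for m, q in zip(minus, plus)]
    hi = [(m - q) * inv2s % p for m, q in zip(minus, plus)]
    return lo + hi

def mul_fft_trick(a, b, c, p, levels):
    """Multiply in Z_p[X]/(X^h - c) using `levels` FFT-trick levels."""
    if levels == 0:
        return mul_mod_binomial(a, b, c, p)
    s = sqrt_mod(c, p)  # X^h - c = (X^(h/2) - s)(X^(h/2) + s)
    a_m, a_p = split(a, s, p)
    b_m, b_p = split(b, s, p)
    c_m = mul_fft_trick(a_m, b_m, s, p, levels - 1)
    c_p = mul_fft_trick(a_p, b_p, (-s) % p, p, levels - 1)
    return merge(c_m, c_p, s, p)
```

For example, with \(p = 17\) and \(n = 8\), calling `mul_fft_trick(a, b, 16, 17, levels)` multiplies in \(\mathbb {Z}_{17}[X]/(X^8+1)\) (the constant 16 represents \(-1\)); `levels = 3` corresponds to the full NTT, since 17 admits a primitive 16th root of unity. The modular inverse via `pow(x, -1, p)` requires Python 3.8 or later.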

As we are interested in the case where the modulus does not split completely, we need to be able to multiply in rings of the form \(\mathbb {Z}_p[X]/(X^{n/k} - r_j)\) with \(k < n\). As is common in cryptographic applications (see, for example, [BCLvV17]), we will use the Karatsuba multiplication algorithm to perform this operation. For both the FFT and the Karatsuba multiplication, we have written a relatively straightforward C implementation.
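
As a rough sketch of the base case, the following shows Karatsuba multiplication over \(\mathbb {Z}_p\) followed by reduction modulo \(X^{h} - c\). It is an illustration only and not the C code measured in this section: the function names are hypothetical, the input length is assumed to be a power of two, and no attempt is made at the low-level optimizations a real implementation would use.

```python
def karatsuba(a, b, p):
    """Product of two equal-length coefficient lists over Z_p.

    Assumes len(a) == len(b) is a power of two; returns 2*len(a) - 1 coefficients.
    """
    n = len(a)
    if n <= 2:  # small inputs: plain schoolbook multiplication
        out = [0] * (2 * n - 1)
        for i in range(n):
            for j in range(n):
                out[i + j] = (out[i + j] + a[i] * b[j]) % p
        return out
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    z0 = karatsuba(a0, b0, p)                     # low * low
    z2 = karatsuba(a1, b1, p)                     # high * high
    sa = [(x + y) % p for x, y in zip(a0, a1)]
    sb = [(x + y) % p for x, y in zip(b0, b1)]
    z1 = karatsuba(sa, sb, p)                     # (low + high) * (low + high)
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] = (out[i] + z0[i]) % p
        out[i + 2 * h] = (out[i + 2 * h] + z2[i]) % p
        # middle term z1 - z0 - z2, shifted by h
        out[i + h] = (out[i + h] + z1[i] - z0[i] - z2[i]) % p
    return out

def mul_mod_binomial_karatsuba(a, b, c, p):
    """Multiply a and b in Z_p[X]/(X^h - c), h = len(a), via Karatsuba."""
    h = len(a)
    full = karatsuba(a, b, p)
    out = full[:h]
    for i in range(h, 2 * h - 1):
        out[i - h] = (out[i - h] + c * full[i]) % p  # X^h wraps around to c
    return out
```

Karatsuba replaces the four half-size products of the schoolbook method with three, at the cost of a few extra additions, which is why it is attractive for the moderately sized factors that remain after a few FFT levels.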

In Table 2 we give the measurements of our experiments. We have performed multiplications in \(R_{512,p} = \mathbb {Z}_p[X]/(X^{256} + 1)\) for four completely splitting primes between \(2^{20}\) and \(2^{30}\). For each prime we have used between 0 and 8 levels of FFT before switching to Karatsuba multiplication. 0 levels of FFT means that no FFT stage was used at all and the input polynomials were directly multiplied via Karatsuba multiplication. At the other extreme of 8 levels of FFT, no Karatsuba multiplication was used and the corresponding measurements reflect the speed of our full number theoretic transform down to linear factors with pointwise multiplication as the base case. As one more example, when performing 3 levels of FFT, we were multiplying 8 polynomials each of degree less than 32 via Karatsuba multiplication. The listed numbers are the numbers of CPU cycles needed for the whole multiplication. They are the medians of 10000 multiplications each. The tests were performed on a laptop equipped with an Intel Skylake i7 CPU running at 3.4 GHz. The cycle counter in this CPU ticks at a constant rate of 2.6 GHz. As one can see, being able to use a prime *p* so that \(X^n + 1\) splits into more than two factors is clearly advantageous. For instance, by allowing \(X^n + 1\) to split into 8 factors compared to just 2, we achieve a speedup of about a factor of two.
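
The speedup quoted above can be recomputed directly from the cycle counts in Table 2. The snippet below is a small arithmetic check; the dictionary layout and the names `table2`, `speedup` and `best_level` are choices made here for illustration.

```python
# Cycle counts copied from Table 2; list index = number of FFT levels (0..8).
table2 = {
    "2^20 - 2^14 + 1": [123677, 83820, 55378, 38111, 27374, 21968, 17076, 15149, 16875],
    "2^23 - 2^13 + 1": [123717, 83778, 55700, 38061, 27626, 21955, 17007, 15144, 16893],
    "2^25 - 2^12 + 1": [134506, 91775, 63148, 43116, 31782, 26406, 21518, 20483, 22329],
    "2^27 - 2^11 + 1": [144913, 97641, 65778, 43282, 30836, 24937, 19811, 18026, 20299],
}

def speedup(cycles, from_level, to_level):
    """Ratio of cycle counts between two FFT depths for one prime."""
    return cycles[from_level] / cycles[to_level]

# 1 level = 2 factors, 3 levels = 8 factors.
speedups_1_to_3 = {prime: speedup(c, 1, 3) for prime, c in table2.items()}

# FFT depth with the lowest median cycle count for each prime.
best_level = {prime: c.index(min(c)) for prime, c in table2.items()}
```

For all four primes the ratio between 1 and 3 levels lies between 2.1 and 2.3, and in this plain C setting the lowest cycle count is reached at 7 levels rather than at the full 8-level transform.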

We have also experimented with the highly optimized polynomial multiplication algorithms provided by the popular computer algebra libraries FLINT [HJP13] and PARI [The16]. FLINT employs various forms of Kronecker substitution for the task of polynomial multiplication. For these experiments we used a fast vectorized FFT implementation written in assembly language with AVX2 instructions. For completeness, Table 3 gives the measurements for the tests with FLINT. Unfortunately, each call of the FLINT multiplication function produces additional overhead costs such as deciding on one of several algorithms and computing complex roots for the FFT used in Kronecker substitution. These additional costs are highly significant for our small polynomials. So for every additional stage of our FFT, one needs to multiply twice as many polynomials with FLINT, and hence FLINT spends twice as much time on these auxiliary tasks that one would not have in an actual cryptographic implementation specialized to a particular prime and modulus. This is especially inefficient when the number of FFT levels is large. There, nearly all of the time is spent on these tasks, as one can see in Table 3 by comparing the cycle counts of 7 and 8 stages of FFT. Note that for 7 stages of FFT, FLINT is used for the trivial task of multiplying polynomials of degree one.

While we were not able to do a meaningful analysis for the combination of our highly optimized FFT with FLINT, one can see that at level 0 (where the amount of overhead it incurs is the lowest), FLINT outperforms our unoptimized Karatsuba multiplication by a factor between 4 and 5, while looking at level 8 shows that our AVX-optimized FFT outperforms the non-optimized version by approximately the same margin. It is then reasonable to assume that one can improve non-FFT multiplication by approximately the same factor as we improved the FFT multiplication, and therefore the improvement going from level 1 to level 3 would still be approximately a factor of 2 in a routine where both Karatsuba and FFT multiplication were highly optimized.
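
The two comparisons in the preceding paragraph can be read off Tables 2 and 3. The snippet below simply divides the corresponding entries; the variable names are chosen here for illustration and the numbers are copied from the tables, in the same order of primes as the table columns.

```python
# Level 0: plain C Karatsuba (Table 2) versus FLINT (Table 3).
karatsuba_level0 = [123677, 123717, 134506, 144913]
flint_level0 = [28245, 31574, 33642, 35397]

# Level 8: plain C full NTT (Table 2) versus AVX2 full NTT (Table 3).
plain_c_level8 = [16875, 16893, 22329, 20299]
avx2_level8 = [3873, 3844, 3847, 3857]

base_case_ratio = [k / f for k, f in zip(karatsuba_level0, flint_level0)]
fft_ratio = [c / v for c, v in zip(plain_c_level8, avx2_level8)]
```

The level 0 ratios come out between about 3.9 and 4.4, and the level 8 ratios between about 4.4 and 5.8, so both optimizations gain a factor of the same order, which is what the extrapolation above relies on.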

## Footnotes

- 1. In lattice-based schemes, it is important to keep the coefficients of *z* small, and so *y* must be chosen to have small coefficients as well. This can lead to the distribution of *z* being dependent on *sc*, which leaks some information about *s*. This problem is solved in [Lyu09, Lyu12] via various rejection-sampling procedures. How this is done is not important to this paper, and so we ignore this step.
- 2.
- 3. The size of this set is \({256\atopwithdelims ()60}\cdot 2^{60}>2^{256}\).

## Notes

### Acknowledgements

We thank Rafaël del Pino for pointing out an improvement to Lemma 3.3. We also thank the anonymous reviewers for their advice on improving the paper. This work is supported by the SNSF ERC Transfer Grant CRETP2-166734 – FELICITY and the H2020 Project Safecrypto.

## References

- [ADPS16] Alkim, E., Ducas, L., Pöppelmann, T., Schwabe, P.: Post-quantum key exchange - a new hope. In: USENIX, pp. 327–343 (2016)
- [BCLvV17] Bernstein, D.J., Chuengsatiansup, C., Lange, T., van Vredendaal, C.: NTRU prime: reducing attack surface at low cost. In: Adams, C., Camenisch, J. (eds.) SAC 2017. LNCS, vol. 10719, pp. 235–260. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72565-9_12
- [BDK+17] Bos, J.W., Ducas, L., Kiltz, E., Lepoint, T., Lyubashevsky, V., Schanck, J.M., Schwabe, P., Stehlé, D.: CRYSTALS - Kyber: a CCA-secure module-lattice-based KEM. IACR Cryptology ePrint Archive, 2017:634 (2017). To appear in Euro S&P 2018
- [BDOP16] Baum, C., Damgård, I., Oechsner, S., Peikert, C.: Efficient commitments and zero-knowledge protocols from ring-sis with applications to lattice-based threshold cryptosystems. IACR Cryptology ePrint Archive, 2016:997 (2016)
- [Ber01] Bernstein, D.J.: Multidigit Multiplication for Mathematicians (2001)
- [BKLP15] Benhamouda, F., Krenn, S., Lyubashevsky, V., Pietrzak, K.: Efficient zero-knowledge proofs for commitments from learning with errors over rings. In: Pernul, G., Ryan, P.Y.A., Weippl, E. (eds.) ESORICS 2015. LNCS, vol. 9326, pp. 305–325. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24174-6_16
- [Coh00] Cohen, H.: A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics. Springer, Heidelberg (2000)
- [DLL+17] Ducas, L., Lepoint, T., Lyubashevsky, V., Schwabe, P., Seiler, G., Stehlé, D.: CRYSTALS - Dilithium: digital signatures from module lattices. IACR Cryptology ePrint Archive, 2017:633 (2017). To appear in TCHES 2018
- [DLNS17] Del Pino, R., Lyubashevsky, V., Neven, G., Seiler, G.: Practical quantum-safe voting from lattices. In: CCS (2017)
- [GLP12] Güneysu, T., Lyubashevsky, V., Pöppelmann, T.: Practical lattice-based cryptography: a signature scheme for embedded systems. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 530–547. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33027-8_31
- [HJP13] Hart, W., Johansson, F., Pancratz, S.: FLINT: Fast Library for Number Theory, Version 2.4.0 (2013). http://flintlib.org
- [LM06] Lyubashevsky, V., Micciancio, D.: Generalized compact knapsacks are collision resistant. ICALP **2**, 144–155 (2006)
- [LN17] Lyubashevsky, V., Neven, G.: One-shot verifiable encryption from lattices. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10210, pp. 293–323. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56620-7_11
- [LPR10] Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 1–23. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_1
- [LPR13] Lyubashevsky, V., Peikert, C., Regev, O.: A toolkit for ring-LWE cryptography. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 35–54. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_3
- [LS15] Langlois, A., Stehlé, D.: Worst-case to average-case reductions for module lattices. Des. Codes Crypt. **75**(3), 565–599 (2015)
- [Lyu09] Lyubashevsky, V.: Fiat-shamir with aborts: applications to lattice and factoring-based signatures. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 598–616. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10366-7_35
- [Lyu12] Lyubashevsky, V.: Lattice signatures without trapdoors. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 738–755. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_43
- [Mic07] Micciancio, D.: Generalized compact knapsacks, cyclic lattices, and efficient one-way functions. Comput. Complex. **16**(4), 365–411 (2007)
- [PG13] Pöppelmann, T., Güneysu, T.: Towards practical lattice-based public-key encryption on reconfigurable hardware. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 68–85. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_4
- [PR06] Peikert, C., Rosen, A.: Efficient collision-resistant hashing from worst-case assumptions on cyclic lattices. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 145–166. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_8
- [PR07] Peikert, C., Rosen, A.: Lattices that admit logarithmic worst-case to average-case connection factors. In: STOC, pp. 478–487 (2007)
- [SS11] Stehlé, D., Steinfeld, R.: Making NTRU as secure as worst-case problems over ideal lattices. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 27–47. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4_4
- [SSTX09] Stehlé, D., Steinfeld, R., Tanaka, K., Xagawa, K.: Efficient public key encryption based on ideal lattices. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 617–635. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10366-7_36
- [The16] The PARI Group, Univ. Bordeaux. PARI/GP version 2.9.0 (2016)