1 Introduction

Cryptography based on the presumed hardness of the Ring/Module-SIS and Ring/Module-LWE problems [Mic07, PR06, LM06, LPR10, LS15] is seen as a very likely replacement of traditional cryptography after the eventual coming of quantum computing. There already exist very efficient basic public key primitives, such as encryption schemes and digital signatures, based on the hardness of these problems. For added efficiency, most practical lattice-based constructions work over polynomial rings \(\mathbb {Z}_p[X]/(f(X))\) where f(X) is the cyclotomic polynomial \(f(X)=X^n+1\) and p is chosen in such a way that \(X^n+1\) splits into n linear factors modulo p. With such a choice of parameters, multiplication in the ring can be performed very efficiently via the Number Theoretic Transform (NTT), which is an analogue of the Fast Fourier Transform that works over a finite field. Some examples of practical NTT-based implementations of digital signatures and public key encryption based on the Ring-LWE problem can be found in [GLP12, PG13, ADPS16, BDK+17, DLL+17].

Constructions of more advanced lattice-based primitives sometimes require that the underlying ring has additional properties. In particular, practical protocols that utilize zero-knowledge proofs often require that elements with small coefficients are invertible (e.g. [BKLP15, BDOP16, LN17, DLNS17]). This restriction, which precludes using rings where \(X^n+1\) splits completely modulo p, stems from the structure of approximate zero-knowledge proofs, and we sketch this intuition below.

1.1 Approximate Zero-Knowledge Proofs

Abstractly, in a zero-knowledge proof the prover wants to prove the knowledge of s that satisfies the relation \(f(s)=t\), where f and t are public. In the lattice setting, the function is

$$\begin{aligned} f(s):=As \end{aligned}$$
(1)

where A is a random matrix over some ring (the ring is commonly \(\mathbb {Z}_p\) or \(\mathbb {Z}_p[X]/(X^n+1)\)) and s is a vector over that same ring, where the coefficients of all (or almost all) the elements comprising s are bounded by some small value \(\ll p\).

The function f in (1) satisfies the property that \(f(s_1)+f(s_2)=f(s_1+s_2)\) and for any c in the ring and any vector s over the ring we have \(f(sc)=c\cdot f(s)\). The zero-knowledge proof for attempting to prove the knowledge of s proceeds as follows:

The Prover first chooses a “masking parameter” y and sends \(w:=f(y)\) to the Verifier. The Verifier picks a random challenge c from a subset of the ring and sends it to the Prover (in a non-interactive proof, the Prover himself would generate \(c:=\text {H}(t,w)\), where \(\text {H}\) is a cryptographic hash function). The Prover then computes \(z:=sc+y\) and sends it to the Verifier.\(^{1}\)

The Verifier checks that \(f(z)=ct+w\) and, crucially, it also checks to make sure that the coefficients of z are small. If these checks pass, then the Verifier accepts the proof. To show that the protocol is a proof of knowledge, one can rewind the Prover to just after his first move and send a different challenge \(c'\), and get a response \(z'\) such that \(f(z')=c't+w\). Combined with the first response, we extract the equation

$$\begin{aligned} f(\bar{s})=\bar{c}t \end{aligned}$$
(2)

where \(\bar{s}=z-z'\) and \(\bar{c}=c-c'\).

Notice that while the prover started with the knowledge of an s with small coefficients such that \(f(s)=t\), he only ends up proving the knowledge of an \(\bar{s}\) with larger coefficients such that \(f(\bar{s})=\bar{c}t\). If \(\bar{c}\) also has small coefficients, then this type of proof is good enough in many (but not all) situations.
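The three-move protocol and the extraction step can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions that are not part of the text: the ring is \(\mathbb {Z}_p\) itself, the challenges are the scalars \(\pm 1\), the modulus, dimensions and bounds are hypothetical, and nothing is done to hide s in the response z (so the sketch shows correctness and extraction, not zero-knowledge).

```python
import random

p = 8380417          # illustrative prime modulus (an assumption, not from the text)
ROWS, COLS = 2, 4    # hypothetical dimensions of the public matrix A
rng = random.Random(0)

def f(A, s):
    """The linear map f(s) = A*s over Z_p."""
    return [sum(a * x for a, x in zip(row, s)) % p for row in A]

A = [[rng.randrange(p) for _ in range(COLS)] for _ in range(ROWS)]
s = [rng.randint(-1, 1) for _ in range(COLS)]        # secret with small coefficients
t = f(A, s)                                          # public image t = f(s)

y = [rng.randint(-1000, 1000) for _ in range(COLS)]  # masking vector
w = f(A, y)                                          # first message w = f(y)

def respond(c):
    """Prover's response z = s*c + y (computed over the integers)."""
    return [si * c + yi for si, yi in zip(s, y)]

def verify(c, z, bound=1001):
    """Verifier's checks: f(z) = c*t + w and z has small coefficients."""
    small = all(abs(zi) <= bound for zi in z)
    return small and f(A, z) == [(c * ti + wi) % p for ti, wi in zip(t, w)]

# Two accepting transcripts with the same first message w (rewinding).
c, c2 = 1, -1
z, z2 = respond(c), respond(c2)

# Extraction: s_bar = z - z2 and c_bar = c - c2 satisfy f(s_bar) = c_bar * t.
s_bar = [a - b for a, b in zip(z, z2)]
c_bar = c - c2
```

Here the extracted \(\bar{s}\) equals 2s, which illustrates why the extracted witness is larger than the original s by a factor that depends on \(\bar{c}\).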

Applications of Approximate Zero-Knowledge Proofs. As a simple example of the utility of approximate zero-knowledge proofs, we consider commitment schemes where a commitment to a message m involves choosing some randomness r, and outputting \(f(s)=t\), where s is defined as \(\begin{bmatrix} r\\ m\end{bmatrix}\) and r and m have small coefficients.\(^{2}\) Using the zero-knowledge proof from Sect. 1.1, one can prove the knowledge of an \(\bar{s}\) and \(\bar{c}\) such that \(f(\bar{s})=\bar{c}t\). If \(\bar{c}\) is invertible in the ring, then we can argue that this implies that if t is later opened to any valid commitment \(s'\) where \(f(s')=t\), then it must be \(s'=\bar{s}/\bar{c}\).

The sketch of the argument is as follows: If we extract \(\bar{s},\bar{c}\) and the commitment is opened with \(s'\) such that \(f(s')=t\), then multiplying both sides by \(\bar{c}\) results in \( f(\bar{c}s')=\bar{c}t. \) Combining this with what was extracted from the zero-knowledge proof, we obtain that

$$\begin{aligned} f(\bar{c}s')=f(\bar{s}). \end{aligned}$$
(3)

If \(s'\ne \bar{s}/\bar{c}\), then \(\bar{c}s'\ne \bar{s}\) and we found a collision (with small coefficients) for the function f. Such a collision implies a solution to the (Ring-)SIS problem, or, depending on the parameters, may simply not exist (and the scheme can thus be based on (Ring-)LWE).

There are more intricate examples involving commitment schemes (see e.g. [BKLP15, BDOP16]) as well as other applications of such zero-knowledge proofs (e.g. to verifiable encryption [LN17] and voting protocols [DLNS17]), which require that \(\bar{c}\) be invertible.

The Challenge Set and its Effect on the Proof. The challenge c is drawn uniformly from some domain \(\mathcal {C}\) which is a subset of \(\mathbb {Z}_p[X]/(X^n+1)\). In order to have small soundness error, we would like \(\mathcal {C}\) to be large. When building non-interactive schemes that should remain secure against quantum computers, one should have \(|\mathcal {C}|\) be around \(2^{256}\). On the other hand, we also would like c to have a small norm. The reason for the latter is that the honest prover computes \(z:=sc+y\) and so the \(\bar{s}\) that is extracted from the Prover in (2) is equal to \(z-z'\), and must also therefore depend on \(\Vert sc\Vert \). Thus, the larger the norms of \(c,c'\) are, the larger the extracted solution \(\bar{s}\) will be, and the easier the corresponding (Ring-)SIS problem will be.

As a running example, suppose that we’re working over the polynomial ring \(\mathbb {Z}_p[X]/(X^{256}+1)\). If invertibility were not an issue, then a simple and nearly optimal way (this way of choosing the challenge set dates back to at least the original paper that proposed a Fiat-Shamir protocol over polynomial rings [Lyu09]) to choose \(\mathcal {C}\) of size \(2^{256}\) would be to define

$$\begin{aligned} \mathcal {C}=\{c\in R_p^{256}\,:\,\Vert c\Vert _\infty =1, \Vert c\Vert _1 = 60\}. \end{aligned}$$
(4)

In other words, the challenges are ring elements consisting of exactly 60 non-zero coefficients which are \(\pm 1\).\(^{3}\) The \(\ell _2\) norm of such elements is \(\sqrt{60}\).
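As a quick sanity check of the size of the set in (4), one can count the ring elements with exactly 60 coefficients equal to \(\pm 1\) among 256 positions. The sketch below only does this counting; the variable names are ours.

```python
from math import comb

n, weight = 256, 60

# Choose the 60 non-zero positions, then a sign for each of them.
size = comb(n, weight) * 2**weight

# The set is large enough if it has more than 2^256 elements.
large_enough = size > 2**256
```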

If we take invertibility into consideration, then we need the difference set \(\mathcal {C}-\mathcal {C}\) (excluding 0) to consist only of invertible polynomials. There are some folklore ways of creating sets all of whose non-zero differences are invertible (cf. [SSTX09, BKLP15]). If the polynomial \(X^{256}+1\) splits into k irreducible polynomials modulo p, then all of these polynomials must have degree 256/k. It is then easy to see, via the Chinese Remainder Theorem, that every non-zero polynomial of degree less than 256/k is invertible in the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\). We can therefore define the set

$$\mathcal {C}'=\{c\in R_p^{256}\,:\,\deg (c)<256/k,\,\Vert c\Vert _\infty \le \gamma \},$$

where \(\gamma \approx 2^{k-1}\) in order for the size of the set to be greater than \(2^{256}\). The \(\ell _2\) norm of elements in this set is \(\sqrt{256/k}\cdot \gamma \). If we, for example, take \(k=8\), then this norm becomes \(\sqrt{32}\cdot 2^7\approx 724,\) which is around 90 times larger than the norms of the challenges in the set defined in (4). It is therefore certainly not advantageous to increase the norm of the challenge by this much only to decrease the running time of the computation. In particular, the security of the scheme will decrease and one will need to increase the ring dimension to compensate, which will in turn negate any savings in running time. For example, the extracted solution to the SIS instance in (3) is \(\bar{c}s'-\bar{s}\), and its size heavily depends on the size of the coefficients in \(\bar{c}\). A much more desirable solution would be to have the polynomial \(X^n+1\) split, but still be able to use the challenge set from (4).
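The numbers in this comparison can be reproduced with a short calculation. The sketch below, with names of our choosing, finds the smallest \(\gamma \) for which \(\mathcal {C}'\) has more than \(2^{256}\) elements when \(k=8\), and compares the largest \(\ell _2\) norm in \(\mathcal {C}'\) with the norm \(\sqrt{60}\) of the challenges in (4).

```python
from math import sqrt

n, k = 256, 8
deg = n // k                     # challenges in C' have degree < 256/k

# |C'| = (2*gamma + 1)^(256/k); find the smallest gamma with |C'| > 2^256.
gamma = 1
while (2 * gamma + 1) ** deg <= 2**256:
    gamma += 1

worst_norm = sqrt(deg) * gamma   # largest l2 norm of an element of C'
ratio = worst_norm / sqrt(60)    # compared with the challenges from (4)
```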

1.2 Our Contribution

Our main result is a general theorem (Theorem 1.1) about the invertibility of polynomials with small coefficients in polynomial rings \(\mathbb {Z}_p[X]/(\varPhi _m(X))\), where \(\varPhi _m(X)\) is the \(m^{th}\) cyclotomic polynomial. The theorem states that if a non-zero polynomial has small coefficients (where “small” is related to the prime p and the number of irreducible factors of \(\varPhi _m(X)\) modulo p), then it’s invertible in the ring \(\mathbb {Z}_p[X]/(\varPhi _m(X))\). For the particular case of \(\varPhi _m(X)=X^n+1\), we show that the polynomial \(X^n+1\) can split into several (in practice up to 8 or 16) irreducible factors and we can still use the optimal challenge sets, like ones of the form from (4). This generalizes and extends a result in [LN17] which showed that one can use the optimal set when \(X^n+1\) splits into two factors. We also show, in Sect. 3.3, some methods for creating challenge sets that are slightly sub-optimal, but allow for the polynomial to split further.

The statement of Theorem 1.1 uses notation from Definition 2.1, while the particular case of \(X^n+1\) in Corollary 1.2 is self-contained. We therefore recommend the reader to first skim the Corollary statement. The proofs of the Theorem and the Corollary are given at the end of Sect. 3.2. For completeness, we also state sufficient conditions for invertibility based on the \(\ell _2\)-norm of the polynomial. This is an intermediate result that we need on the way to obtaining our main result about the invertibility of polynomials with small coefficients (i.e. based on the \(\ell _\infty \) norm of the polynomial), but it could be of independent interest.

Theorem 1.1

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). If p is a prime such that \(p\equiv 1 \pmod {z}\) and \(\text{ ord }_m(p)=m/z\), then the polynomial \(\varPhi _m(X)\) factors as

$$\varPhi _m(X) \equiv \prod \limits _{j=1}^{\phi (z)} (X^{m/z}-r_j)\pmod {p}$$

for distinct \(r_j\in \mathbb {Z}_p^*\) where \(X^{m/z}-r_j\) are irreducible in the ring \(\mathbb {Z}_p[X]\). Furthermore, any \(\mathbf{y}\) in \(\mathbb {Z}_p[X]/(\varPhi _m(X))\) that satisfies either

$$\begin{aligned} 0<\Vert \mathbf{y}\Vert _\infty&< \frac{1}{s_1(z)}\cdot p^{1/\phi (z)}\\&\text {or}\\ 0<\Vert \mathbf{y}\Vert&< \frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)} \end{aligned}$$

has an inverse in \(\mathbb {Z}_p[X]/(\varPhi _m(X))\).

The above theorem gives sufficient conditions for p so that all polynomials with small coefficients in \(\mathbb {Z}_p[X]/(\varPhi _m(X))\) are invertible, but it does not state anything about whether there exist such p. In Theorem 2.5, we show that if we additionally put the condition on m and z that \(8|m \Rightarrow 4|z\), then there are indeed infinitely many primes p that satisfy these conditions. In practical lattice constructions involving zero-knowledge proofs, we would normally use a modulus of size at least \(2^{20}\), and we experimentally confirmed (for various cyclotomic polynomials) that one can indeed find many such primes that are of that size.

Specializing the above to the ring \(\mathbb {Z}_p[X]/(X^n+1)\), we obtain the following corollary:

Corollary 1.2

Let \(n\ge k>1\) be powers of 2 and \(p\equiv 2k+1 \pmod {4k}\) be a prime. Then the polynomial \(X^n+1\) factors as

$$X^n+1 \equiv \prod \limits _{j=1}^k (X^{n/k}-r_j)\pmod {p}$$

for distinct \(r_j\in \mathbb {Z}_p^*\) where \(X^{n/k}-r_j\) are irreducible in the ring \(\mathbb {Z}_p[X]\). Furthermore, any \(\mathbf{y}\) in \(\mathbb {Z}_p[X]/(X^n+1)\) that satisfies either

$$\begin{aligned} 0<\Vert \mathbf{y}\Vert _\infty&< \frac{1}{\sqrt{k}}\cdot p^{1/k}\\&\text {or}\\ 0<\Vert \mathbf{y}\Vert&< p^{1/k} \end{aligned}$$

has an inverse in \(\mathbb {Z}_p[X]/(X^n+1)\).

As an application of this result, suppose that we choose \(k=8\) and a prime p congruent to \(17 \pmod {32}\) such that \(p>2^{20}\). Furthermore, suppose that we perform our zero-knowledge proofs over the ring \(\mathbb {Z}_p[X]/(X^n+1)\) (where n is a power of 2 greater than 8), and prove the knowledge of \(\bar{s},\bar{c}\) such that \(f(\bar{s})=\bar{c} t\) where \(\Vert \bar{c}\Vert _\infty \le 2\) (i.e. the challenges c are taken such that \(\Vert c\Vert _\infty =1\)). Then the above theorem states that \(X^n+1\) factors into 8 polynomials and \(\bar{c}\) will be invertible in the ring since \(\frac{1}{\sqrt{8}}\cdot p^{1/8} > 2 \).
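The factorization in this example can be checked numerically. The following sketch searches for the smallest prime \(p>2^{20}\) with \(p\equiv 17 \pmod {32}\), computes the eight roots \(r_j\) of \(Y^8+1\) modulo p, and multiplies out \(\prod (Y-r_j)\); substituting \(Y=X^{n/8}\) then gives the factorization of \(X^n+1\). The helper names are ours, trial division is used only because the numbers are small, and the irreducibility of the factors is not checked here.

```python
def is_prime(q):
    """Trial division; adequate for moduli around 2^20."""
    if q < 2:
        return False
    i = 2
    while i * i <= q:
        if q % i == 0:
            return False
        i += 1
    return True

def poly_mul(a, b, p):
    """Product of two polynomials over Z_p (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def roots_of_minus_one(k, p):
    """All r in Z_p with r^k = -1, assuming p = 1 (mod 2k)."""
    g = 2
    while True:
        r = pow(g, (p - 1) // (2 * k), p)
        if pow(r, k, p) == p - 1:      # r has order exactly 2k
            break
        g += 1
    return sorted(pow(r, e, p) for e in range(1, 2 * k, 2))

k = 8
p = 2**20 + 1
while not (p % (4 * k) == 2 * k + 1 and is_prime(p)):
    p += 1

roots = roots_of_minus_one(k, p)

# Multiply out prod_j (Y - r_j); the result should be Y^k + 1.
product = [1]
for r in roots:
    product = poly_mul(product, [(-r) % p, 1], p)

# Exact form of the condition (1/sqrt(k)) * p^(1/k) > 2, i.e. p > (2*sqrt(k))^k.
bound_holds = p > 2**k * k ** (k // 2)
```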

Having \(p>2^{20}\) is quite normal for the regime of zero-knowledge proofs, and therefore having the polynomial \(X^n+1\) split into 8 factors should be possible in virtually every application. If we would like it to split further into 16 or 32 factors, then we would need \(p>2^{48}\) or, respectively, \(p>2^{112}\). In Sect. 3.3 we describe how our techniques used to derive Theorem 1.1 can also be used in a somewhat “ad-hoc” fashion to create different challenge sets \(\mathcal {C}\) that are nearly-optimal (in terms of the maximal norm), but allow \(X^n+1\) to split with somewhat smaller moduli than implied by Theorem 1.1.
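The thresholds quoted above follow from requiring \(\frac{1}{\sqrt{k}}\cdot p^{1/k}>2\) in Corollary 1.2, which is equivalent to \(p>(2\sqrt{k})^k=2^k\cdot k^{k/2}\). A small sketch (the function name is ours) evaluates this exactly for \(k=8,16,32\):

```python
def modulus_threshold(k, bound=2):
    """Smallest value that p must exceed so that (1/sqrt(k)) * p^(1/k) > bound.

    Assumes k is even, so that k/2 is an integer and the arithmetic stays exact.
    """
    return bound**k * k ** (k // 2)

# For k a power of 2 and bound = 2 the threshold is itself a power of 2,
# so bit_length() - 1 gives its exact base-2 logarithm.
thresholds = {k: modulus_threshold(k).bit_length() - 1 for k in (8, 16, 32)}
```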

In Sect. 4, we describe how one would combine the partially-splitting FFT algorithm with a Karatsuba multiplication algorithm to efficiently multiply in a partially-splitting ring. For primes of size between \(2^{20}\) and \(2^{29}\), one obtains a speed-up of about a factor of 2 by working over rings where \(X^n+1\) splits into 8 versus just 2 factors.

In addition to the speed improvement, there are applications whose usability can be improved by the fact that we work over rings \(\mathbb {Z}_p[X]/(X^n+1)\) where \(X^n+1\) splits into more factors. For example, [BKLP15] constructed a commitment scheme and zero-knowledge proofs of knowledge that allow one to prove that \(\mathbf{c}=\mathbf{a}\mathbf{b}\) when Commit(\(\mathbf{a}\)), Commit(\(\mathbf{b}\)), Commit(\(\mathbf{c}\)) are public (the same holds for addition). An application of this result is the verifiability of circuits. For this application, one only needs commitments of 0’s and 1’s, thus if we work over a ring where \(X^n+1\) splits into k irreducible factors, one can embed k bits into each Chinese Remainder coefficient of \(\mathbf{a}\) and \(\mathbf{b}\), and therefore proving that \(\mathbf{c}=\mathbf{a}\mathbf{b}\) implies that all k multiplications of the bits were performed correctly. Thus the larger k is, the more multiplications one can prove in parallel. Unfortunately, k cannot be set too large without ruining the necessary property that the difference of any two distinct challenges is invertible or increasing the \(\ell _2\)-norm of the challenges as described in Sect. 1.1. Our result therefore allows one to prove products of 8 (or 16) commitments in parallel without having to increase the parameters of the scheme to accommodate the larger challenges.

2 Cyclotomics and Lattices

2.1 Cyclotomic Polynomials

Definition 2.1

For any integer \(m>1\), we write

$$\begin{aligned} \phi (m)&= m\cdot \prod \limits _{p \, is \, prime \, \wedge \,p\,|\,m} \frac{p-1}{p}\\ \delta (m)&=\prod \limits _{p \, is \, prime \,\wedge \,p\,|\,m} p\\ \tau (m)&={\left\{ \begin{array}{ll} m, &{} if \, m \, is \, odd \\ m/2, &{} if \, m \, is \, even \end{array}\right. }\\ s_1(m)&= largest \, singular \, value \, of \, the \, matrix \, in \, (7)\\ \text{ ord }_m(n)&=\min \{k~:~ k>0 \, and \, n^k \bmod m = 1\} \end{aligned}$$

The function \(\phi (m)\) is the Euler phi function, \(\delta (m)\) is sometimes referred to as the radical of m, and \(\tau (m)\) is a function that sometimes comes into play when working with the geometry of cyclotomic rings. The function \(\text{ ord }_m(n)\) is the order of an element n in the multiplicative group \(\mathbb {Z}_m^*\). In the special case of \(m=2^k\), we have \(\phi (m)=\tau (m)=2^{k-1}\) and \(\delta (m)=2\).
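For experimentation, the number-theoretic functions of Definition 2.1 (all except \(s_1\)) can be written directly in Python. This is a plain sketch based on trial-division factoring, which is sufficient for the sizes of m used in practice; the function names are ours.

```python
from math import gcd

def prime_factors(m):
    """Set of distinct primes dividing m (trial division)."""
    factors, q = set(), 2
    while q * q <= m:
        while m % q == 0:
            factors.add(q)
            m //= q
        q += 1
    if m > 1:
        factors.add(m)
    return factors

def phi(m):
    """Euler phi function: m * prod over primes q | m of (q - 1)/q."""
    result = m
    for q in prime_factors(m):
        result = result // q * (q - 1)
    return result

def delta(m):
    """Radical of m: product of the distinct primes dividing m."""
    result = 1
    for q in prime_factors(m):
        result *= q
    return result

def tau(m):
    """tau(m) = m for odd m, and m/2 for even m."""
    return m if m % 2 else m // 2

def ord_mod(m, n):
    """Multiplicative order of n modulo m (n must be coprime to m)."""
    if gcd(n, m) != 1:
        raise ValueError("n must be invertible modulo m")
    k, x = 1, n % m
    while x != 1:
        x = x * n % m
        k += 1
    return k
```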

The \(m^{th}\) cyclotomic polynomial, written as \(\varPhi _m(X)\), is formally defined to be

$$\varPhi _m(X)=\prod \limits _{i=1}^{\phi (m)}(X-\omega _i),$$

where \(\omega _i\) are the \(m^{th}\) complex primitive roots of unity (of which there are \(\phi (m)\) many). Of particular interest in practical lattice cryptography is the cyclotomic polynomial \(\varPhi _{2^k}(X)=X^{2^{k-1}}+1\).

If p is some prime and \(r_1,\ldots ,r_{\phi (m)}\) are distinct elements in \(\mathbb {Z}_p^*\) such that \(\text{ ord }_p(r_j)=m\), then one can write

$$\varPhi _m(X)\equiv \prod _{j=1}^{\phi (m)}(X-r_j)\pmod {p}.$$

For any \(m>1\), it is known that we can express the cyclotomic polynomial \(\varPhi _m(X)\) as

$$\begin{aligned} \varPhi _m(X)=\varPhi _{\delta (m)}\left( X^{m/\delta (m)}\right) , \end{aligned}$$
(5)

and the below Lemma is a generalization of this statement.

Lemma 2.2

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). Then

$$\varPhi _m(X)=\varPhi _z(X^{m/z}).$$

Proof

By (5), and the fact that \(\delta (m)=\delta (z)\), we can rewrite \(\varPhi _m(X)\) as

$$\begin{aligned} \varPhi _m(X)=\varPhi _{\delta (m)}\left( X^{m/\delta (m)}\right)&=\varPhi _{\delta (m)}\left( \left( X^{m/z}\right) ^{z/\delta (m)}\right) \nonumber \\ {}&=\varPhi _{\delta (z)}\left( \left( X^{m/z}\right) ^{z/\delta (z)}\right) =\varPhi _z\left( X^{m/z}\right) . \end{aligned}$$
(6)

   \(\square \)
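Lemma 2.2 is easy to test on concrete values. The sketch below computes cyclotomic polynomials over the integers through the standard identity \(X^m-1=\prod _{d\,|\,m}\varPhi _d(X)\) and compares \(\varPhi _m(X)\) with \(\varPhi _z(X^{m/z})\) for \(m=756\), \(z=42\). The helper names are ours and no effort is made to be efficient.

```python
from functools import lru_cache

def divide_exact(num, den):
    """Exact division of integer polynomials (lowest degree first); den is monic."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        q = num[i + len(den) - 1]        # current leading coefficient
        quot[i] = q
        for j, d in enumerate(den):
            num[i + j] -= q * d
    if any(num):
        raise ValueError("division was not exact")
    return quot

@lru_cache(maxsize=None)
def cyclotomic(m):
    """Coefficients of Phi_m(X), from X^m - 1 = prod_{d | m} Phi_d(X)."""
    poly = [-1] + [0] * (m - 1) + [1]    # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)

def substitute_power(poly, t):
    """Coefficients of poly(X^t)."""
    out = [0] * ((len(poly) - 1) * t + 1)
    for i, c in enumerate(poly):
        out[i * t] = c
    return out

m, z = 756, 42
lemma_holds = list(cyclotomic(m)) == substitute_power(cyclotomic(z), m // z)
```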

2.2 The Splitting of Cyclotomic Polynomials

In Theorem 2.3, we give the conditions on the prime p such that the polynomial \(\varPhi _m(X)\) splits into irreducible factors \(X^{m/z}-r\) modulo p. In Theorem 2.5, we then show that when m and z satisfy an additional relation, there are infinitely many p that satisfy the necessary conditions of Theorem 2.3.

Theorem 2.3

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). If p is a prime such that \(p\equiv 1 \pmod {z}\) and \(\text{ ord }_m(p)=m/z\), then the polynomial \(\varPhi _m(X)\) factors as

$$\varPhi _m(X)\equiv \prod \limits _{j=1}^{\phi (z)} (X^{m/z}-r_j)\pmod {p}$$

for distinct \(r_j\in \mathbb {Z}_p^*\) where \(X^{m/z}-r_j\) are irreducible in \(\mathbb {Z}_p[X]\).

Proof

Since p is a prime and \(p \equiv 1 \pmod {z}\), there exists an element r such that \(\text{ ord }_p(r)=z\). Furthermore, for all the \(\phi (z)\) integers \(1\le i<z\) such that \(\gcd (i,z)=1\), we also have \(\text{ ord }_p(r^{i})=z\). Writing \(r_1,\ldots ,r_{\phi (z)}\) for these powers of r, we therefore have, by definition of \(\varPhi \), that

$$\varPhi _{z}(X)\equiv \prod \limits _{j=1}^{\phi (z)}(X-r_j) \pmod {p}.$$

Applying Lemma 2.2, we obtain that

$$\varPhi _{m}(X)\equiv \prod \limits _{j=1}^{\phi (z)}(X^{m/z}-r_j)\pmod {p}.$$

We now need to prove that the terms \(X^{m/z}-r_j\) are irreducible modulo p. Suppose they are not and \(X^{m/z} - r_j\) has an irreducible divisor f of degree \(d < \frac{m}{z}\). Then f defines an extension field of \(\mathbb {Z}_p\) of degree d, i.e. a finite field with \(p^d\) elements that all satisfy \(X^{p^d} = X\). Hence f divides \(X^{p^d} - X\). Now, from \({\text {ord}}_m(p) = \frac{m}{z} > d\) it follows that we can write \(p^d = am + b\) where \(b \ne 1\). Thus

$$X^{p^d} - X = X^{am + b} - X = X(X^{am + (b - 1)} - 1).$$

If we now consider an extension field of \(\mathbb {Z}_p\) in which f splits, the roots of f are also roots of \(X^{am + (b - 1)} - 1\) and therefore have order dividing \(am + (b-1)\). On the other hand, as a divisor of \(X^{m/z} - r_j\) (and therefore of \(\varPhi _m\)), f has only roots of order m. Hence m divides \(am + (b-1)\), i.e. \(b \equiv 1 \pmod {m}\), which contradicts \(b\ne 1\) (taking \(0\le b<m\)).   \(\square \)
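A small instance makes the theorem concrete. Take \(m=9\), \(z=3\) and \(p=7\) (parameters chosen by us for illustration): then \(p\equiv 1 \pmod {3}\) and \(\text{ ord }_9(7)=3=m/z\), so \(\varPhi _9(X)=X^6+X^3+1\) should split into \(\phi (3)=2\) irreducible factors \(X^3-r_j\) modulo 7. The sketch below checks this; irreducibility is tested by the absence of roots, which is only valid because the factors have degree 3.

```python
def order(a, n):
    """Multiplicative order of a modulo n (a must be coprime to n)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def poly_mul(a, b, p):
    """Product of two polynomials over Z_p (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

m, z, p = 9, 3, 7
t = m // z

# Hypotheses of Theorem 2.3.
conditions_hold = (p % z == 1) and order(p, m) == t

# The r_j are the elements of order z in Z_p^*.
roots = [r for r in range(1, p) if order(r, p) == z]

# Multiply out prod_j (X^t - r_j) modulo p.
product = [1]
for r in roots:
    product = poly_mul(product, [(-r) % p] + [0] * (t - 1) + [1], p)

phi_9 = [1, 0, 0, 1, 0, 0, 1]            # X^6 + X^3 + 1

# A cubic over Z_p is irreducible iff it has no root in Z_p.
factors_irreducible = all(pow(x, t, p) != r for r in roots for x in range(p))
```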

In the proof of Theorem 2.5 we need a small result about the multiplicative order of odd integers modulo powers of 2. Since we also need this later in the proof of Corollary 1.2, we state this result in the next lemma.

Lemma 2.4

Let \(a \equiv 1 + 2^f \pmod {2^{f+1}}\) for \(f \ge 2\). Then the order of a in the group of units modulo \(2^e\) for \(e \ge f\) is equal to \(2^{e-f}\), i.e. \(\text{ ord }_{2^e}(a) = 2^{e-f}\).

Proof

We can write \(a = 1 + 2^fk_1\) with some odd \(k_1 \in \mathbb {Z}\). Then notice \(a^2 = 1 + 2^{f+1}k_1 + 2^{2f}k_1^2 = 1 + 2^{f+1}(k_1 + 2^{f-1}k_1^2) = 1 + 2^{f+1}k_2\) with odd \(k_2 = k_1 + 2^{f-1}k_1^2\). It follows iteratively that \(a^{2^{e-f}} = 1 + 2^ek_{2^{e-f}} \equiv 1 \pmod {2^e}\), which implies the order of a modulo \(2^e\) divides \(2^{e-f}\), but (for \(e>f\)) \(a^{2^{e-f-1}} = 1 + 2^{e-1}k_{2^{e-f-1}} \not \equiv 1 \pmod {2^e}\) since \(k_{2^{e-f-1}}\) is odd. So, the multiplicative order of a modulo \(2^e\) must be \(2^{e-f}\).   \(\square \)
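Lemma 2.4 can also be confirmed by brute force for small exponents. The sketch below (function names and the search range are ours) runs over several representatives a of the class \(1+2^f \pmod {2^{f+1}}\) and compares the computed order modulo \(2^e\) with \(2^{e-f}\).

```python
def order(a, n):
    """Multiplicative order of a modulo n (a must be coprime to n)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def lemma_holds_up_to(max_e=12):
    """Check ord_{2^e}(a) = 2^(e-f) for 2 <= f <= e <= max_e."""
    for f in range(2, max_e + 1):
        for e in range(f, max_e + 1):
            for j in range(4):
                a = 1 + 2**f + j * 2 ** (f + 1)   # a = 1 + 2^f (mod 2^(f+1))
                if order(a, 2**e) != 2 ** (e - f):
                    return False
    return True
```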

Theorem 2.5

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\). Furthermore, assume that if m is divisible by 8, then z is divisible by 4. Then there are infinitely many primes p such that \(p \equiv 1 \pmod {z}\) and \(\text{ ord }_m(p)=m/z\).

Proof

First we show that an integer, not necessarily prime, exists that fulfills the two conditions. By the Chinese remainder theorem it suffices to find integers \(a_i\) such that \(a_i \bmod p_i^{f_i} = 1\) and \({\text {ord}}_{p_i^{e_i}}(a_i) = p_i^{e_i - f_i}\). First consider the odd primes \(p_i \ne 2\). It is easy to show that if g is a generator modulo \(p_i\) then either g or \(g + p_i\), say \(g'\), is a generator modulo every power of \(p_i\) (cf. [Coh00, Lemma 1.4.5]). Define \(a_i = (g')^{(p_i - 1)p_i^{f_i - 1}}\). Then, since \(g'\) has order \((p_i - 1)p_i^{f_i - 1}\) modulo \(p_i^{f_i}\) and order \((p_i - 1)p_i^{e_i - 1}\) modulo \(p_i^{e_i}\), it follows that \(a_i \bmod p_i^{f_i} = 1\) and

$$\begin{aligned} {\text {ord}}_{p_i^{e_i}}(a_i) = \frac{(p_i - 1)p_i^{e_i - 1}}{(p_i - 1)p_i^{f_i - 1}} = p_i^{e_i - f_i} \end{aligned}$$

as we wanted. Next, consider \(p_1 = 2\) and the case where m is divisible by 8; that is, \(e_1 \ge 3\). By the assumption of the theorem, this implies \(f_1 \ge 2\). From Lemma 2.4 we see that 5 is a generator of a cyclic subgroup of \(\mathbb {Z}_{2^e}^\times \) of index 2 for every \(e \ge 3\), i.e. \({\text {ord}}_{2^{e}}(5) = 2^{e - 2}\). Therefore, \(5^{2^{f_1 - 2}} \bmod 2^{f_1} = 1\) and

$${\text {ord}}_{2^{e_1}}(5^{2^{f_1 - 2}}) = \frac{2^{e_1 - 2}}{2^{f_1 - 2}} = 2^{e_1 - f_1}.$$

Hence \(a_1 = 5^{2^{f_1 - 2}}\) is a valid choice in this case. If \(e_1 = 2\), note that 3 is a generator modulo 4 and \(a_1 = 3^{2^{f_1 - 1}}\) is readily seen to work. When \(e_1 = f_1 = 1\), take \(a_1 = 1\). So, there exists an integer a that fulfills our two conditions and in fact every integer congruent to \(a \bmod m\) does. By Dirichlet’s theorem on arithmetic progressions, there are infinitely many primes among the \(a + lm\) (\(l \in \mathbb {Z}\)).    \(\square \)

As an experimental example consider \(m = 2^2 3^3 7 = 756\) and \(z = 2\cdot 3\cdot 7 = 42\). Then \(\varPhi _m\) splits into 12 polynomials modulo primes of the form in Theorem 2.5. There are 2058 primes of this form between \(2^{20}\) and \(2^{21}\).
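The experiment in this example can be repeated with a simple sieve. The sketch below (helper names ours) lists the primes p between \(2^{20}\) and \(2^{21}\) with \(p\equiv 1 \pmod {42}\) and \(\text{ ord }_{756}(p)=18\); the length of the resulting list can then be compared with the count reported above.

```python
from math import isqrt

def sieve(limit):
    """Sieve of Eratosthenes; flags[i] is 1 exactly when i is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return flags

def order(a, n):
    """Multiplicative order of a modulo n (a must be coprime to n)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

m, z = 756, 42
lo, hi = 2**20, 2**21
flags = sieve(hi)

# p = 1 (mod z) already guarantees gcd(p, m) = 1, so order() terminates.
start = lo + (1 - lo) % z                 # first integer >= lo that is 1 mod z
good_primes = [p for p in range(start, hi, z)
               if flags[p] and order(p % m, m) == m // z]
count = len(good_primes)
```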

2.3 The Vandermonde Matrix

To each cyclotomic polynomial \(\varPhi _m(X)\) with roots of unity \(\omega _1,\ldots ,\omega _{\phi (m)}\), we associate the Vandermonde matrix

$$\begin{aligned} \mathbf{V}_m=\begin{bmatrix} 1&\omega _1&\omega _1^2&~\ldots ~&\omega _1^{\phi (m)-1}\\ 1&\omega _2&\omega _2^2&~\ldots ~&\omega _2^{\phi (m)-1}\\&&\ldots&\\ 1&~\omega _{\phi (m)}&~\omega _{\phi (m)}^2&~\ldots ~&\omega _{\phi (m)}^{\phi (m)-1} \end{bmatrix}\in \mathbb {C}^{\phi (m)\times \phi (m)}. \end{aligned}$$
(7)

The important property for us in this paper is the largest singular value of \(\mathbf{V}_m\), which we write as

$$\begin{aligned} s_1(m)=\max _{\mathbf{u}\in \mathbb {C}^{\phi (m)}}\frac{\Vert \mathbf{V}_m\mathbf{u}\Vert }{\Vert \mathbf{u}\Vert }. \end{aligned}$$
(8)

It was shown in [LPR13, Lemma 4.3] that when \(m=p^k\) for any prime p and positive integer k, then

$$\begin{aligned} s_1(m)=\sqrt{\tau (m)}. \end{aligned}$$
(9)
Table 1. Values of m less than 600 for which \(s_1(m)\ne \sqrt{\tau (m)}\).

We do not know of a theorem analogous to (9) that holds for all m, and so we numerically computed \(s_1(m)\) for all \(m<3000\) and observed that \(s_1(m)\le \sqrt{\tau (m)}\) was always satisfied. Furthermore, for most m, we still had the equality \(s_1(m)=\sqrt{\tau (m)}\). The only exceptions where \(s_1(m)<\sqrt{\tau (m)}\) were integers that have at least 3 distinct odd prime factors. As an example, Table 1 contains a list of all such values up to 600 for which \(s_1(m)< \sqrt{\tau (m)}\). We point out that while it appears that having three distinct odd prime factors is a necessary condition for m to appear in the table, it is not sufficient. For example, \(255=3\cdot 5\cdot 17\), but still \(s_1(255) = \sqrt{\tau (255)} = \sqrt{255}\).

For all practical sizes of m used in cryptography, the value \(s_1(m)\) is fairly easy to compute numerically using basic linear algebra software (e.g. MATLAB, Scilab, etc.), and we will state all our results in terms of \(s_1(m)\). Nevertheless, being able to relate \(s_1(m)\) to \(\tau (m)\) certainly simplifies the calculation. Based on our numerical observations, we formulate the following conjecture:

Conjecture 2.6

For all positive integers m, \(s_1(m)\le \sqrt{\tau (m)}\).
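For readers without access to a linear algebra package, \(s_1(m)\) can be approximated with the standard library alone. The sketch below is not the computation used for Table 1: it swaps in plain power iteration on the Gram matrix \(\mathbf{V}_m^H\mathbf{V}_m\), with a fixed iteration count and a fixed start vector that are assumptions of ours. Convergence can be slow when the two largest singular values are close, so it is meant for small m only.

```python
import cmath
import math

def s1(m, iterations=500):
    """Approximate the largest singular value of the Vandermonde matrix V_m."""
    roots = [cmath.exp(2j * cmath.pi * k / m)
             for k in range(1, m + 1) if math.gcd(k, m) == 1]
    n = len(roots)                                   # n = phi(m)
    V = [[w**j for j in range(n)] for w in roots]
    # Gram matrix G = V^H V; its largest eigenvalue equals s_1(m)^2.
    G = [[sum(V[i][a].conjugate() * V[i][b] for i in range(n))
          for b in range(n)] for a in range(n)]
    # Fixed, non-symmetric start vector so that it is unlikely to be
    # orthogonal to the dominant eigenspace.
    u = [complex(math.sin(j + 1), math.cos(2 * j + 1)) for j in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        norm = math.sqrt(sum(abs(x) ** 2 for x in u))
        u = [x / norm for x in u]
        u = [sum(G[a][b] * u[b] for b in range(n)) for a in range(n)]
        eigenvalue = math.sqrt(sum(abs(x) ** 2 for x in u))
    return math.sqrt(eigenvalue)
```

For prime powers the result can be compared against (9); for instance, `s1(9)` should be close to \(\sqrt{\tau (9)}=3\).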

2.4 Cyclotomic Rings and Ideal Lattices

Throughout the paper, we will write \(R_{m}\) to be the cyclotomic ring \(\mathbb {Z}[X]/(\varPhi _m(X))\) and \(R_{m,p}\) to be the ring \(\mathbb {Z}_p[X]/(\varPhi _m(X))\), with the usual polynomial addition and multiplication operations. We will denote by normal letters elements in \(\mathbb {Z}\) and by bold letters elements in \(R_m\). For an odd p, an element \(\mathbf{w}\in R_{m,p}\) can always be written as \(\sum \limits _{i=0}^{\phi (m)-1}{w_i X^i}\) where \(|w_i|\le (p-1)/2\). Using this representation, for \(\mathbf{w}\in R_{m,p}\) (and in \(R_m\)), we will define the lengths of elements as

$$\Vert \mathbf{w}\Vert _\infty =\max _i|w_i|\text { and }\Vert \mathbf{w}\Vert =\sqrt{\sum _i|w_i|^2}.$$

Just as for vectors over \(\mathbb {Z}\), the norms satisfy the inequality \(\Vert \mathbf{w}\Vert \le \sqrt{\phi (m)}\cdot \Vert \mathbf{w}\Vert _\infty \).
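The two norms are computed on the centered representatives of the coefficients. A minimal sketch, with function names of our choosing and an element given as a list of coefficients modulo an odd p:

```python
import math

def centered(c, p):
    """Representative of c modulo odd p in the range [-(p-1)/2, (p-1)/2]."""
    r = c % p
    return r - p if r > (p - 1) // 2 else r

def norm_inf(w, p):
    """l_infinity norm of an element given by its coefficient list."""
    return max(abs(centered(c, p)) for c in w)

def norm_2(w, p):
    """l_2 (Euclidean) norm of an element given by its coefficient list."""
    return math.sqrt(sum(centered(c, p) ** 2 for c in w))

# Example in R_{8,17} = Z_17[X]/(X^4 + 1), where phi(8) = 4.
w = [16, 3, 9, 0]            # centered representatives: -1, 3, -8, 0
inequality_holds = norm_2(w, 17) <= math.sqrt(4) * norm_inf(w, 17)
```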

Another useful definition of length is with respect to the embedding norm of an element in \(R_m\). If \(\omega _1,\ldots ,\omega _{\phi (m)}\) are the complex roots of \(\varPhi _m(X)\), then the embedding norm of \(\mathbf{w}\in R_m\) is

$$\Vert \mathbf{w}\Vert _e = \sqrt{\sum _i |\mathbf{w}(\omega _i)|^2}.$$

If we view \(\mathbf{w}=\begin{bmatrix} w_0 \\ w_1 \\ \ldots \\w_{\phi (m)-1}\end{bmatrix}\) as a vector in \(\mathbb {Z}^{\phi (m)}\), then the above definition is equivalent to

$$\Vert \mathbf{w}\Vert _e = \sqrt{\sum _i |\mathbf{w}(\omega _i)|^2}=\Vert \mathbf{V}_m\mathbf{w}\Vert $$

due to the fact that the \(i^{th}\) position of \(\mathbf{V}_m\mathbf{w}\) is \(\mathbf{w}(\omega _i)\). This gives a useful relationship between the \(\Vert \cdot \Vert _e\) and \(\Vert \cdot \Vert \) norms as

$$\begin{aligned} \Vert \mathbf{w}\Vert _e\le s_1(m)\cdot \Vert \mathbf{w}\Vert . \end{aligned}$$
(10)

An integer lattice of dimension n is an additive sub-group of \(\mathbb {Z}^n\). For the purposes of this paper, all lattices will be full-rank. The determinant of a full-rank integer lattice \(\varLambda \) of dimension n is the size of the quotient group \(|\mathbb {Z}^n/\varLambda |\). We write \(\lambda _1(\varLambda )\) to denote the Euclidean length of the shortest non-zero vector in \(\varLambda \).

If \(\mathcal {I}\) is an ideal in the polynomial ring \(R_m\), then it is also an additive sub-group of \(\mathbb {Z}^{\phi (m)}\), and therefore a \(\phi (m)\)-dimensional lattice (it can be shown that such lattices are always full-rank). Such lattices are therefore sometimes referred to as ideal lattices. For any ideal lattice \(\varLambda \) of the ring \(R_m\), there exists a lower bound on the embedding norm of its non-zero vectors (cf. [PR07, Lemma 6.2])

$$\forall \mathbf{w}\in \varLambda \setminus \{0\},\,\Vert \mathbf{w}\Vert _e\ge \sqrt{\phi (m)}\cdot \det (\varLambda )^{1/\phi (m)}.$$

Combining the above with (10) yields the following lemma:

Lemma 2.7

If \(\varLambda \) is an ideal lattice in \(R_m\), then

$$\lambda _1(\varLambda )\ge \frac{\sqrt{\phi (m)}}{s_1(m)}\cdot \det (\varLambda )^{1/\phi (m)}.$$

3 Invertible Elements in Cyclotomic Rings

The main goal of this section is to prove Theorem 1.1. To this end, we first prove Lemma 3.1, which proves the Theorem for the \(\ell _2\) norm. Unfortunately directly applying this Lemma to prove the \(\ell _\infty \) part of the Theorem 1.1 by using the relationship between the \(\ell _2\) and \(\ell _\infty \) norms is sub-optimal. In Sect. 3.2 we instead show that by writing elements of partially-splitting rings \(R_{m,p}\) as sums of polynomials over smaller, fully-splitting rings, one can obtain a tighter bound. We prove in Lemma 3.2 that if any of the parts of \(\mathbf{y}\in R_{m,p}\) is invertible in the smaller fully-splitting ring, then the polynomial \(\mathbf{y}\) is invertible in \(R_{m,p}\). The full proof of Theorem 1.1 will follow from this Lemma, the special case of Lemma 3.1 applicable to fully-splitting rings, and Theorem 2.3.

3.1 Invertibility and the \(\ell _2\) Norm

Our main result only needs a special case of the below Lemma corresponding to when \(\varPhi _m(X)\) fully splits, but we prove a more general statement since it doesn’t bring with it any additional complications.

Lemma 3.1

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\) such that

$$\varPhi _m(X)\equiv \prod \limits _{i=1}^{\phi (z)}(X^{m/z}-r_i) \pmod {p}$$

for some distinct \(r_i\in \mathbb {Z}_p^*\) where \(X^{m/z}-r_i\) are irreducible in \(\mathbb {Z}_p[X]\), and let \(\mathbf{y}\) be any element in the ring \(R_{m,p}\). If \(0<\Vert \mathbf{y}\Vert <\frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)}\), then \(\mathbf{y}\) is invertible in \(R_{m,p}\).

Proof

Suppose that \(\mathbf{y}\) is not invertible in \(R_{m,p}\). By the Chinese Remainder Theorem, this implies that for (at least) one i, \(\mathbf{y}\bmod \left( X^{m/z}-r_i,p\right) =0\). For an i for which \(\mathbf{y}\bmod \left( X^{m/z}-r_i,p\right) =0\) (if there is more than one such i, pick one of them arbitrarily) define the set

$$\varLambda =\left\{ \mathbf{z}\in R_m : \mathbf{z}\bmod \left( X^{m/z}-r_i,p\right) =0\right\} .$$

Notice that \(\varLambda \) is an additive group. Also, because \(X^{m/z}-r_i\) is a factor of \(\varPhi _m(X)\) modulo p, for any polynomial \(\mathbf{z}\in \varLambda \), the polynomial \(\mathbf{z}\cdot X \in R_m\) is also in \(\varLambda \). This implies that \(\varLambda \) is an ideal of \(R_m\), and so an ideal lattice in the ring \(R_m\). By looking at the Chinese Remainder representation modulo p of all the elements in \(\varLambda \) (they have 0 in the coefficient corresponding to modulo \(X^{m/z}-r_i\), and are arbitrary in all other coefficients), one can see that \(\left| \mathbb {Z}^{\phi (m)}/\varLambda \right| =p^{m/z}=p^{\phi (m)/\phi (z)}\), which is the determinant of \(\varLambda \). By Lemma 2.7, we then know that \(\lambda _1(\varLambda )\ge \frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)}\).

Since \(\mathbf{y}\bmod \left( X^{m/z}-r_i,p\right) =0\) and \(0<\Vert \mathbf{y}\Vert \), we know that \(\mathbf{y}\) is a non-zero vector in \(\varLambda \). But we also have by our hypothesis that \(\Vert \mathbf{y}\Vert <\frac{\sqrt{\phi (m)}}{s_1(m)}\cdot p^{1/\phi (z)}\le \lambda _1(\varLambda )\), which is impossible.

   \(\square \)
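Lemma 3.1 can be checked exhaustively in a tiny ring. Take \(m=8\), \(z=4\) and \(p=13\) (a toy choice of ours), so that \(R_{m,p}=\mathbb {Z}_{13}[X]/(X^4+1)\), \(X^4+1\equiv (X^2-5)(X^2-8) \pmod {13}\), and the bound of the Lemma is \(\frac{\sqrt{4}}{\sqrt{4}}\cdot 13^{1/2}=\sqrt{13}\). The sketch below tests invertibility independently of the Lemma, through the determinant modulo p of the multiplication-by-\(\mathbf{y}\) matrix, for every non-zero \(\mathbf{y}\) with \(\Vert \mathbf{y}\Vert ^2<13\).

```python
from itertools import product

def mult_matrix(y, p):
    """Matrix of multiplication by y in Z_p[X]/(X^n + 1); column j is y * X^j."""
    n = len(y)
    cols, col = [], [c % p for c in y]
    for _ in range(n):
        cols.append(col)
        col = [(-col[-1]) % p] + col[:-1]   # multiply by X, using X^n = -1
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def det_mod_p(M, p):
    """Determinant modulo a prime p by Gaussian elimination."""
    M = [row[:] for row in M]
    n, det = len(M), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)           # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            factor = M[r][c] * inv % p
            M[r] = [(a - factor * b) % p for a, b in zip(M[r], M[c])]
    return det % p

def is_invertible(y, p):
    return det_mod_p(mult_matrix(y, p), p) != 0

p, n = 13, 4
# Every y with ||y||^2 < 13 has coefficients in [-3, 3].
short_vectors = [y for y in product(range(-3, 4), repeat=n)
                 if 0 < sum(c * c for c in y) < p]
all_short_invertible = all(is_invertible(y, p) for y in short_vectors)

# The bound is tight here: 3 + 2X^2 has squared norm 13 and vanishes modulo (X^2 - 5, 13).
boundary_case_invertible = is_invertible((3, 0, 2, 0), p)
```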

One can see that a direct application of Lemma 3.1 gives a weaker bound than what we are claiming in Theorem 1.1 – we can only conclude that all vectors \(\mathbf{y}\) such that

$$\Vert \mathbf{y}\Vert _\infty \le \frac{1}{s_1(m)}\cdot p^{1/\phi (z)}$$

are invertible. Since \(z\ll m\), having \(s_1(m)\) vs. \(s_1(z)\) in the denominator makes a very noticeable difference in the tightness of the result (for example, if m and z are powers of 2, then \(s_1(m)=\sqrt{m/2}\) and \(s_1(z)=\sqrt{z/2}\)). In Sect. 3.2, we instead break up \(\mathbf{y}\) into a sum of elements in smaller rings \(R_{z,p}\) and prove that only some of these parts need to be invertible in \(R_{z,p}\) in order for the entire element \(\mathbf{y}\) to be invertible in \(R_{m,p}\).

We point out that Lemma 3.1 was already implicit in [SS11, Lemma 8] for \(\varPhi _m(X)=X^n+1\). To obtain a bound in the \(\ell _\infty \) norm, the authors of that work then applied the norm inequality between the \(\ell _2\) and \(\ell _\infty \) norms to obtain the bound that we described above. Using the more refined approach in the current paper, however, that bound can be tightened and would immediately produce an improvement in the main result of [SS11] which derives the statistical closeness of a particular distribution to uniform. Such applications are therefore another area in which our main result can prove useful.

3.2 Partially-Splitting Rings

In this section, we will be working with rings \(R_{m,p}\) where p is chosen such that the polynomial \(\varPhi _m(X)\) factors into k irreducible polynomials of the form \(X^{\phi (m)/k}-r_i\). Theorem 2.3 states sufficient conditions on m, k, and p in order to obtain such a factorization. Throughout this section, we will use the following notation: suppose that

$$\mathbf{y}=\sum \limits _{j=0}^{\phi (m)-1}y_j X^j$$

is an element of the ring \(R_{m,p}\), where the value p is chosen as above. Then for all integers \(0\le i\le \phi (m)/k-1\), we define the polynomials \(\mathbf{y}'_i\) as

$$\begin{aligned} \mathbf{y}'_i = \sum \limits _{j=0}^{k-1}y_{j\phi (m)/k+i} X^j. \end{aligned}$$
(11)

For example, if \(\phi (m)=8\) and \(k=4\), then for \(\mathbf{y}=\sum \limits _{i=0}^7 y_i X^i\), we have \(\mathbf{y}_0'=y_0+y_2 X+y_4 X^2 +y_6 X^3\) and \(\mathbf{y}_1'=y_1+y_3 X+ y_5 X^2 + y_7 X^3\).

The intuition behind the definition in (11) is that one can write \(\mathbf{y}\) in terms of the \(\mathbf{y}_i'\) as

$$\mathbf{y}=\sum \limits _{i=0}^{\phi (m)/k-1}\mathbf{y}_i'(X^{\phi (m)/k})\cdot X^i.$$

Then to calculate \(\mathbf{y}\bmod \, (X^{\phi (m)/k}-r_j)\) where \((X^{\phi (m)/k}-r_j)\) is one of the irreducible factors of \(\varPhi _m(X)\) modulo p, we have

$$\begin{aligned} \mathbf{y}\bmod \, (X^{\phi (m)/k}-r_j)=\sum \limits _{i=0}^{\phi (m)/k-1}\mathbf{y}'_i(r_j)\cdot X^i \end{aligned}$$
(12)

simply because we plug in \(r_j\) for every \(X^{\phi (m)/k}\).
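As an illustration, the following Python sketch computes the \(\mathbf{y}_i'\) from (11) and checks identity (12) numerically. The function names and the values \(p=17\), \(r=3\) are chosen here for illustration only; the identity is purely algebraic and holds for any r, whether or not \(X^{\phi (m)/k}-r\) divides \(\varPhi _m(X)\) modulo p.

```python
def split_parts(y, k):
    """Return [y'_0, ..., y'_{d-1}] as in (11), where d = len(y) // k.

    y is the coefficient list of y (lowest degree first), len(y) = phi(m).
    y'_i collects the coefficients y_{j*d + i} for j = 0, ..., k-1.
    """
    d = len(y) // k
    return [[y[j * d + i] for j in range(k)] for i in range(d)]


def evaluate(poly, x, p):
    """Evaluate a coefficient list at x modulo p (Horner's rule)."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % p
    return acc


def reduce_mod_binomial(y, d, r, p):
    """Reduce y modulo (X^d - r, p) directly, replacing X^d by r."""
    out = [0] * d
    for idx, c in enumerate(y):
        q, i = divmod(idx, d)
        out[i] = (out[i] + c * pow(r, q, p)) % p
    return out


# The example from the text: phi(m) = 8 and k = 4, so there are two parts.
y = [10, 11, 12, 13, 14, 15, 16, 17]
parts = split_parts(y, 4)  # [[10, 12, 14, 16], [11, 13, 15, 17]]

# Identity (12): the i-th coefficient of y mod (X^2 - r) equals y'_i(r).
p, r = 17, 3  # illustrative values only
lhs = reduce_mod_binomial(y, 2, r, p)
rhs = [evaluate(part, r, p) for part in parts]
```

Here `lhs` and `rhs` agree coefficient by coefficient, which is exactly the statement of (12).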

Lemma 3.2

Let \(m=\prod p_i^{e_i}\) for \(e_i\ge 1\) and \(z=\prod p_i^{f_i}\) for any \(1\le f_i\le e_i\), and suppose that we can write

$$\begin{aligned} \varPhi _m(X)\equiv \prod \limits _{j=1}^{\phi (z)}(X^{m/z}-r_j)\pmod {p} \end{aligned}$$
(13)

for distinct \(r_j\in \mathbb {Z}_p^*\) where \((X^{m/z}-r_j)\) are irreducible in \(\mathbb {Z}_p[X]\). Let \(\mathbf{y}\) be a polynomial in \(R_{m,p}\) and define the associated \(\mathbf{y}_i'\) as in (11), where \(k=\phi (z)\). If some \(\mathbf{y}_i'\) is invertible in \(R_{z,p}\), then \(\mathbf{y}\) is invertible in \(R_{m,p}\).

Proof

By the Chinese Remainder Theorem, the polynomial \(\mathbf{y}\) is invertible in \(R_{m,p}\) if and only if \(\mathbf{y}\bmod (X^{m/z}-r_j)\ne 0\) for all \(r_1,\ldots ,r_k\). When we use \(k=\phi (z)\), (12) can be rewritten as

$$\mathbf{y}\bmod \, (X^{m/z}-r_j)=\sum \limits _{i=0}^{m/z-1}\mathbf{y}'_i(r_j)\cdot X^i.$$

To show that \(\mathbf{y}\) is invertible, it is therefore sufficient to show that

$$\exists i\text { s.t }\forall j,\,\,\,\mathbf{y}'_i(r_j)\bmod \,p\ne 0.$$

Let i be such that \(\mathbf{y}_i'\) is invertible in the ring \(R_{z,p}\). From (13) and Lemma 2.2 we have that

$$\varPhi _{z}(X)\equiv \prod \limits _{j=1}^{\phi (z)} (X-r_j)\pmod {p},$$

and so the ring \(R_{z,p}\) is fully-splitting. Since \(\mathbf{y}'_i\) is invertible in \(R_{z,p}\), the Chinese Remainder Theorem implies that for all \(1\le j\le \phi (z)\), \(\mathbf{y}'_i(r_j)\bmod \,p\ne 0,\) and therefore \(\mathbf{y}\) is invertible in \(R_{m,p}\).

   \(\square \)
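Lemma 3.2 can be sanity-checked on a toy instance. The sketch below assumes the parameters \(m=16\), \(z=4\), \(p=13\), for which \(X^8+1\equiv (X^4-5)(X^4-8) \pmod {13}\) and \(X^2+1\equiv (X-5)(X-8)\pmod {13}\); these values, and all function names, are chosen here for illustration. It compares the sufficient condition of the lemma against an independent invertibility test based on the polynomial gcd.

```python
def evaluate(poly, x, p):
    """Evaluate a coefficient list (lowest degree first) at x modulo p."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % p
    return acc


def some_part_invertible(y, roots, p):
    """Condition of Lemma 3.2: some y'_i is non-zero at every root r_j."""
    k = len(roots)
    d = len(y) // k
    for i in range(d):
        part = [y[j * d + i] for j in range(k)]  # y'_i as in (11)
        if all(evaluate(part, r, p) != 0 for r in roots):
            return True
    return False


def trim(f):
    """Drop trailing zero coefficients in place and return the list."""
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_rem(a, b, p):
    """Remainder of a modulo b in Z_p[X]; b must be non-zero and trimmed."""
    a = trim([c % p for c in a])
    inv_lead = pow(b[-1], -1, p)
    while len(a) >= len(b):
        factor = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        trim(a)
    return a


def is_invertible(y, n, p):
    """y is a unit in Z_p[X]/(X^n + 1) iff gcd(y, X^n + 1) is a constant."""
    a = [1] + [0] * (n - 1) + [1]  # X^n + 1
    b = trim([c % p for c in y])
    while b:
        a, b = b, poly_rem(a, b, p)
    return len(a) == 1


p, roots = 13, [5, 8]
unit = [1, 1, 0, 0, 0, 0, 0, 0]           # 1 + X
zero_divisor = [-5, 0, 0, 0, 1, 0, 0, 0]  # X^4 - 5
```

The condition is sufficient but not necessary: `some_part_invertible` returning `False` does not by itself mean that the element is a zero divisor.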

Theorem 1.1 now follows from the combination of Theorem 2.3, and Lemmas 3.1 and 3.2.

Proof

(Theorem 1.1). For the conditions on m, z, and p, it follows from Theorem 2.3 that the polynomial \(\varPhi _m(X)\) can be factored into irreducible factors modulo p as \(\prod \limits _{j=1}^{\phi (z)} (X^{m/z}-r_j)\). Lemma 2.2 then states that \(\varPhi _{z}(X)\equiv \prod \limits _{j=1}^{\phi (z)} (X-r_j)\pmod {p}\).

For any \(\mathbf{y}\in R_{m,p}\), let the \(\mathbf{y}_i'\) be defined as in (11) where \(k=\phi (z)\). If \(0<\Vert \mathbf{y}\Vert _\infty <\frac{1}{s_1(z)}\cdot p^{1/\phi (z)}\), then because each \(\mathbf{y}_i'\) consists of \(\phi (z)\) coefficients, we have that for all i, \(\Vert \mathbf{y}_i'\Vert <\frac{\sqrt{\phi (z)}}{s_1(z)}\cdot p^{1/\phi (z)}\). Since \(\mathbf{y}\ne 0\), it must be that for some i, \(\mathbf{y}_i'\ne 0\).

Lemma 3.1 therefore implies that the non-zero \(\mathbf{y}_i'\) is invertible in \(R_{z,p}\). In turn, Lemma 3.2 implies that \(\mathbf{y}\) is invertible in \(R_{m,p}\).    \(\square \)

Proof

(Of Corollary 1.2) If \(n \ge k > 1\) are powers of 2, then we set \(m=2n\) and \(z=2k\) in Theorem 1.1. Then \(\varPhi _m(X)=X^n+1\) and the condition that \(p\equiv 2k+1 \pmod {4k}\), i.e. \(p \equiv z+1 \pmod {2z}\), implies \(p\equiv 1 \pmod {z}\). Now we need to show that \(\text{ ord }_{m}(p)=m/z\), but this follows immediately from Lemma 2.4 by setting \(m = 2^e\) and \(z = 2^f\) and noting that \(f \ge 2\). Finally, from (9) we have \(s_1(z) = \sqrt{\tau (z)} = \sqrt{\frac{z}{2}} = \sqrt{k}\) and \(s_1(m) = \sqrt{n}\). Therefore the upper bounds for the \(\Vert \cdot \Vert _\infty \) and \(\Vert \cdot \Vert \) inequalities read \(\frac{1}{\sqrt{k}}p^{1/k} = \frac{1}{s_1(z)}p^{1/k}\) and \(p^{1/k} = \frac{\sqrt{n}}{s_1(m)}p^{1/k}\), respectively, as in Theorem 1.1.    \(\square \)
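The conditions used in this proof are easy to check numerically for concrete parameters. The following Python sketch does so for \(n=256\), \(k=16\) and the prime \(p=97\); this prime is picked here only because it satisfies the congruence and keeps the numbers small, and the helper names are hypothetical.

```python
def multiplicative_order(a, modulus):
    """Smallest e >= 1 with a**e = 1 (mod modulus); needs gcd(a, modulus) = 1."""
    e, power = 1, a % modulus
    while power != 1:
        power = power * a % modulus
        e += 1
    return e


def corollary_conditions(n, k, p):
    """Check the conditions of Theorem 1.1 for m = 2n and z = 2k."""
    m, z = 2 * n, 2 * k
    return {
        "p = 2k+1 mod 4k": p % (4 * k) == 2 * k + 1,
        "p = 1 mod z": p % z == 1,
        "ord_m(p)": multiplicative_order(p, m),
        "m/z": m // z,
    }


report = corollary_conditions(256, 16, 97)
```

For these parameters the computed order \(\text{ ord }_{512}(97)\) coincides with \(m/z=16\), as the proof predicts.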

3.3 Example of “Ad-Hoc” Applications of Lemma 3.2

Using Lemma 3.2, as we did in the proof of Theorem 1.1 above, gives a clean statement as to a sufficient condition under which polynomials are invertible in a partially-splitting ring. One thing to note is that putting a bound on the \(\ell _\infty \) norm does not take into account the other properties that our challenge space may have. For example, our challenge space in (4) is also sparse, in addition to having the \(\ell _\infty \) norm bounded by 1. Yet we do not know how to use this sparseness to show that one can let \(\varPhi _m(X)\) split further while still maintaining the invertibility of the set \(\mathcal {C}-\mathcal {C}\).

In some cases, however, there are ways to construct challenge sets that are more in line with Lemma 3.2 and will allow further splitting. We do not see a simple way in which to systematize these ideas, and so one would have to work out the details on a case-by-case basis. Below, we give such an example for the case in which we are working over the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\) and would like to have the polynomial \(X^{256}+1\) split into 16 irreducible factors. If we would like to have \(X^n+1\) split into 16 factors modulo p and the set \(\mathcal {C}-\mathcal {C}\) to have elements whose infinity norm is bounded by 2, then applying Theorem 1.1 directly implies that we need to have \(2<\frac{1}{\sqrt{16}}\cdot p^{1/16}\), which implies \(p>2^{48}\).
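For reference, the arithmetic behind the bound \(p>2^{48}\) can be unwound as follows, using \(s_1(z)=\sqrt{16}=4\) for \(z=32\) as in the proof of Corollary 1.2:

$$2<\frac{1}{\sqrt{16}}\cdot p^{1/16}\iff p^{1/16}>8=2^{3}\iff p>2^{48}.$$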

We will now show how one can lower the requirement on p in order to achieve a split into 16 factors by altering the challenge set \(\mathcal {C}\) in (4).

For a polynomial \(\mathbf{y}\in \mathbb {Z}_p[X]/(X^{256}+1)\), define the \(\mathbf{y}_i'\) as in (11). Define \(\mathcal {D}\) as

$$\begin{aligned} \mathcal {D}=\{\mathbf{y}\in \mathbb {Z}_p[X]/(X^{256}+1)\,:\,\Vert \mathbf{y}\Vert _\infty =1\text { and }\forall \, 0\le i\le 15\,, \Vert \mathbf{y}'_i\Vert = 2\} \end{aligned}$$
(14)

In other words, \(\mathcal {D}\) is the set of polynomials \(\mathbf{y}\), such that every \(\mathbf{y}_i'\) has exactly 4 non-zero elements that are \(\pm 1\). The size of \(\mathcal {D}\) is \(\left( {16\atopwithdelims ()4}\cdot 2^4\right) ^{16}\approx 2^{237}\), which should be enough for practical quantum security. The \(\ell _2\) norm of every element in \(\mathcal {D}\) is exactly \(\sqrt{64}=8\). For a fair comparison, we should redefine the set \(\mathcal {C}\) so that it also has size \(2^{237}\). The only change that one must make to the definition in (4) is to lower the \(\ell _1\) norm to 53 from 60. Thus all elements in \(\mathcal {C}\) have \(\ell _2\) norm \(\sqrt{53}\). The elements in set \(\mathcal {D}\) therefore have norm that is larger by a factor of about 1.1. It then depends on the application as to whether having \(X^n+1\) split into 16 rather than 8 factors is worth this modest increase. We will now prove that for primes \(p>2^{30.5}\) of a certain form, \(X^{256}+1\) will split into 16 irreducible factors modulo p and all the non-zero elements in \(\mathcal {D}-\mathcal {D}\) will be invertible. Therefore if our application calls for a modulus that is larger than \(2^{30.5}\) but smaller than \(2^{48}\), we can use the challenge set \(\mathcal {D}\) and the below lemma.
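The counts in this paragraph can be reproduced with a few lines of Python. The sketch below assumes that the redefined set \(\mathcal {C}\) consists of all polynomials with coefficients in \(\{-1,0,1\}\) and exactly 53 non-zero coefficients, which is our reading of the definition in (4) with the \(\ell _1\) norm lowered to 53; the variable names are illustrative.

```python
import math

parts = 16              # number of polynomials y'_i
coeffs_per_part = 16    # coefficients in each y'_i
nonzeros_per_part = 4   # each y'_i has exactly four entries equal to +1 or -1

# Size of D: choose positions and signs independently in every part.
choices_per_part = math.comb(coeffs_per_part, nonzeros_per_part) * 2**nonzeros_per_part
log2_size_D = parts * math.log2(choices_per_part)

# Size of C under the assumption stated above: 53 signed non-zero positions.
log2_size_C = math.log2(math.comb(256, 53) * 2**53)

# Ratio of the l2 norms of elements of D and of C.
l2_norm_D = math.sqrt(parts * nonzeros_per_part)
l2_norm_C = math.sqrt(53)
norm_ratio = l2_norm_D / l2_norm_C
```

Both `log2_size_D` and `log2_size_C` come out slightly above 237, and `norm_ratio` is close to 1.1, in line with the figures quoted above.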

Lemma 3.3

Suppose that \(p>2^{16\log _2{\sqrt{14}}}\approx 2^{30.5}\) is a prime congruent to \(33 \pmod {64}\). Then the polynomial \(X^{256}+1\) splits into 16 irreducible polynomials of the form \(X^{16}+r_j\) modulo p, and any non-zero polynomial \(\mathbf{y}\in \mathcal {D}-\mathcal {D}\) (as defined in (14)) is invertible in the ring \(\mathbb {Z}_p[X]/(X^{256}+1)\).

Proof

The fact that \(X^{256}+1\) splits into 16 irreducible factors follows directly from Theorem 2.3. Notice that for any \(\mathbf{y}\in \mathcal {D}-\mathcal {D}\), the \(\ell _2\) norm of each \(\mathbf{y}_i'\) is bounded by 4. Furthermore, each \(\mathbf{y}_i'\) has \(256/16=16\) coefficients. Thus an immediate consequence of Lemmas 3.2 and 3.1 is that if \(p>2^{32}\), then any non-zero element in \(\mathcal {D}-\mathcal {D}\) is invertible. To slightly improve the lower bound, we can observe that the \(\mathbf{y}_i'\) of norm 4 are polynomials in \(\mathbb {Z}_p[X]/(X^{16}+1)\) with exactly four \(\pm 2\)’s in them. But such elements can be written as a product of 2 and a polynomial with four \(\pm 1\)’s in it, so if both of those are invertible, so is the product. The norm of these latter polynomials is 2, and so they are not the elements that set the lower bound. The next largest \(\mathbf{y}_i'\) is one that has three \(\pm 2\)’s and two \(\pm 1\)’s. The norm of such elements is \(\sqrt{14}\). Thus for all \(p>2^{16\cdot \log _2(\sqrt{14})}\approx 2^{30.5}\), the non-zero \(\mathbf{y}_i'\) will be invertible in \(\mathbb {Z}_p[X]/(X^{16}+1)\), and thus every non-zero element in \(\mathcal {D}-\mathcal {D}\) will be invertible in \(\mathbb {Z}_p[X]/(X^{256}+1)\).   \(\square \)
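The case analysis in this proof can be double-checked by listing every squared \(\ell _2\) norm that a difference of two vectors with four \(\pm 1\) entries can have. The Python sketch below does this by counting overlaps rather than enumerating vectors; the function name and parametrization are ours.

```python
import math


def squared_norms_of_differences(weight=4):
    """Squared l2 norms of a - b, where a and b each have `weight` entries +-1.

    t = number of positions where both a and b are non-zero,
    s = number of those positions where the signs differ (entries +-2 in a - b);
    the other t - s common positions cancel to 0.
    The 2 * (weight - t) positions used by only one of a, b contribute +-1.
    """
    norms = set()
    for t in range(weight + 1):
        for s in range(t + 1):
            norms.add(4 * s + 2 * (weight - t))
    return sorted(norms, reverse=True)


norms = squared_norms_of_differences()  # largest values first

# Thresholds on log2(p) from the condition norm < p^(1/16).
naive_bits = 16 * math.log2(math.sqrt(norms[0]))     # from the norm-4 case
improved_bits = 16 * math.log2(math.sqrt(norms[1]))  # from the next largest norm
```

The two largest squared norms are 16 and 14, which gives the thresholds \(2^{32}\) and roughly \(2^{30.5}\) used in the proof.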

Table 2. CPU cycles of our FFT-accelerated multiplication algorithm for \(\mathbb {Z}_p[X]/(X^{256} + 1)\) using Karatsuba multiplication for the base case. Both the FFT and Karatsuba are plain C implementations.
Table 3. CPU cycles of our FFT-accelerated multiplication algorithm for \(\mathbb {Z}_p[X]/(X^{256} + 1)\) using FLINT for base case multiplication. The FFT implementation is a highly optimized AVX2-based implementation.

4 Polynomial Multiplication Implementation

We now describe in more detail the computational advantage of having the modulus \(\varPhi _m\) split into as many factors as possible and present our experimental results. We focus on the case where m is a power of two and write \(n = \phi (m) = m/2\). In this case one can use the standard radix-2 FFT-trick to speed up the multiplication. Note that for other m, one can also exploit the splitting in a divide-and-conquer fashion similar to the radix-2 FFT.

Suppose that \(\mathbb {Z}_p\) contains a primitive fourth root of unity r, i.e. \(r^2=-1\), so that we can write

$$\begin{aligned} X^{n} + 1 = (X^{n/2} + r)(X^{n/2} - r). \end{aligned}$$

Then, in algebraic language, the FFT (or NTT) is based on the Chinese remainder theorem, which says that \(R_{m,p} = \mathbb {Z}_p[X]/(X^{n} + 1)\) is isomorphic to the direct product of \(\mathbb {Z}_p[X]/(X^{n/2} + r)\) and \(\mathbb {Z}_p[X]/(X^{n/2} - r)\). To multiply two polynomials in \(R_{m,p}\) one can first reduce them modulo the two factors of the modulus, then multiply the resulting polynomials in the smaller rings, and finally invert the Chinese remainder map in order to obtain the product of the original polynomials. This is called the (radix-2) FFT-trick (see [Ber01] for a very good survey). Note that reducing a polynomial of degree less than n modulo the two sparse polynomials \(X^{n/2} \pm r\) is very easy and takes only \(\frac{n}{2}\) multiplications, \(\frac{n}{2}\) additions and \(\frac{n}{2}\) subtractions. If \(\mathbb {Z}_p\) contains higher roots so that \(X^n + 1\) splits further, then one can apply the FFT-trick recursively to the smaller rings. What is usually referred to as the number theoretic transform (NTT) is the case where \(\mathbb {Z}_p\) contains a 2n-th root of unity so that \(X^n + 1\) splits completely into linear factors. This reduces multiplication in \(R_{m,p}\) to just multiplication in \(\mathbb {Z}_p\).
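A single level of the FFT-trick can be sketched in Python as follows. This is a minimal illustration rather than the implementation measured in this paper: the function names are ours, the toy parameters \(n=8\), \(p=13\), \(r=5\) are chosen so that \(r^2\equiv -1 \pmod {13}\), and the products in the two smaller rings are computed by schoolbook multiplication.

```python
def mul_mod_binomial(a, b, c, p):
    """Schoolbook product of a and b in Z_p[X]/(X^d - c), d = len(a) = len(b)."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            wrap, idx = divmod(i + j, d)
            # X^(i+j) = c * X^(i+j-d) once the exponent reaches d.
            out[idx] = (out[idx] + ai * bj * (c if wrap else 1)) % p
    return out


def fft_trick_multiply(a, b, r, p):
    """Product in Z_p[X]/(X^n + 1) via X^n + 1 = (X^h - r)(X^h + r), h = n/2."""
    h = len(a) // 2

    def forward(f):
        # h multiplications (r * f[i + h]), h additions and h subtractions.
        scaled = [r * f[i + h] % p for i in range(h)]
        plus = [(f[i] + scaled[i]) % p for i in range(h)]   # f mod (X^h - r)
        minus = [(f[i] - scaled[i]) % p for i in range(h)]  # f mod (X^h + r)
        return plus, minus

    a_plus, a_minus = forward(a)
    b_plus, b_minus = forward(b)
    u = mul_mod_binomial(a_plus, b_plus, r, p)         # in Z_p[X]/(X^h - r)
    v = mul_mod_binomial(a_minus, b_minus, -r % p, p)  # in Z_p[X]/(X^h + r)

    # Invert the Chinese remainder map: u = low + r*high, v = low - r*high.
    inv2, inv2r = pow(2, -1, p), pow(2 * r, -1, p)
    low = [(x + y) * inv2 % p for x, y in zip(u, v)]
    high = [(x - y) * inv2r % p for x, y in zip(u, v)]
    return low + high


p, r = 13, 5
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
product_via_fft = fft_trick_multiply(a, b, r, p)
product_direct = mul_mod_binomial(a, b, p - 1, p)  # X^8 + 1 = X^8 - (p - 1)
```

Both computations return the same coefficient list, while the FFT-trick variant replaces one product of length 8 by two products of length 4.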

As we are interested in the case where the modulus does not split completely, we need to be able to multiply in rings of the form \(\mathbb {Z}_p[X]/(X^{n/k} - r_j)\) with \(k < n\). As is common in cryptographic applications (see, for example [BCLvV17]), we will use the Karatsuba multiplication algorithm to perform this operation. For both the FFT and the Karatsuba multiplication, we have written a relatively straightforward C implementation.

In Table 2 we give the measurements of our experiments. We have performed multiplications in \(R_{512,p} = \mathbb {Z}_p[X]/(X^{256} + 1)\) for four completely splitting primes between \(2^{20}\) and \(2^{30}\). For each prime we have used between 0 and 8 levels of FFT before switching to Karatsuba multiplication. 0 levels of FFT means that no FFT stage was used at all and the input polynomials were directly multiplied via Karatsuba multiplication. In the other extreme of 8 levels of FFT, no Karatsuba multiplication was used and the corresponding measurements reflect the speed of our full number theoretic transform down to linear factors with pointwise multiplication as the base case. As one more example, when performing 3 levels of FFT, we were multiplying 8 polynomials each of degree less than 32 via Karatsuba multiplication. The listed numbers are numbers of CPU cycles needed for the whole multiplication. They are the medians of 10000 multiplications each. The tests were performed on a laptop equipped with an Intel Skylake i7 CPU running at 3.4 GHz. The cycle counter in this CPU ticks at a constant rate of 2.6 GHz. As one can see, being able to use a prime p so that \(X^n + 1\) splits into more than two factors is clearly advantageous. For instance, by allowing \(X^n + 1\) to split into 8 factors compared to just 2, we achieve a speedup of about a factor of two.
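The notion of "levels of FFT" can be made concrete with the following Python sketch. It is not the C code that was benchmarked: the names are ours, square roots modulo p are found by brute force (only sensible for toy primes), and schoolbook multiplication is swapped in for Karatsuba as the base case.

```python
def mul_mod_binomial(a, b, c, p):
    """Schoolbook product of a and b in Z_p[X]/(X^d - c), d = len(a) = len(b)."""
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            wrap, idx = divmod(i + j, d)
            out[idx] = (out[idx] + ai * bj * (c if wrap else 1)) % p
    return out


def sqrt_mod(c, p):
    """Brute-force square root of c modulo a small prime p (illustration only)."""
    for s in range(p):
        if s * s % p == c % p:
            return s
    raise ValueError(f"{c} has no square root modulo {p}")


def multiply(a, b, c, p, levels):
    """Product in Z_p[X]/(X^d - c), using `levels` FFT-trick splits first."""
    d = len(a)
    if levels == 0 or d == 1:
        return mul_mod_binomial(a, b, c, p)  # base case (Karatsuba in the paper)
    h = d // 2
    s = sqrt_mod(c, p)  # X^d - c = (X^h - s)(X^h + s)

    def reduce(f, root):
        # f mod (X^h - root): replace X^h by root.
        return [(f[i] + root * f[i + h]) % p for i in range(h)]

    u = multiply(reduce(a, s), reduce(b, s), s, p, levels - 1)
    v = multiply(reduce(a, -s), reduce(b, -s), -s % p, p, levels - 1)
    inv2, inv2s = pow(2, -1, p), pow(2 * s, -1, p)
    low = [(x + y) * inv2 % p for x, y in zip(u, v)]
    high = [(x - y) * inv2s % p for x, y in zip(u, v)]
    return low + high


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
# X^8 + 1 = X^8 - (p - 1); p = 17 splits it completely, so up to 3 levels work.
results = [multiply(a, b, 16, 17, levels) for levels in range(4)]
```

All entries of `results` coincide. With a partially splitting prime such as \(p=13\), the call with `levels=1` succeeds while `levels=2` raises `ValueError`, because \(X^4-5\) has no further factorization of this form modulo 13.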

We have also experimented with highly-optimized polynomial multiplication algorithms provided by the popular computer algebra libraries FLINT [HJP13] and PARI [The16]. FLINT employs various forms of Kronecker substitution for the task of polynomial multiplication. For these experiments we used a fast vectorized FFT implementation written in assembly language with AVX2 instructions. For completeness, Table 3 gives the measurements for the tests with FLINT. Unfortunately, each call of the FLINT multiplication function produces additional overhead costs such as deciding on one of several algorithms and computing complex roots for the FFT used in Kronecker substitution. These additional costs are highly significant for our small polynomials. So for every additional stage of our FFT, one needs to multiply twice as many polynomials with FLINT, and hence FLINT spends twice as much time on these auxiliary tasks that one would not have in an actual cryptographic implementation specialized to a particular prime and modulus. This is especially inefficient when the number of FFT levels is large. There, nearly all of the time is spent on these tasks, as one can see in Table 3 by comparing the cycle counts of 7 and 8 stages of FFT. Note that for 7 stages of FFT, FLINT is used for the trivial task of multiplying polynomials of degree one.

While we were not able to do a meaningful analysis for the combination of our highly-optimized FFT with FLINT, one can see that at level 0 (where the amount of overhead FLINT incurs is the lowest), FLINT outperforms our un-optimized Karatsuba multiplication by a factor between 4 and 5, while looking at level 8 shows that our AVX-optimized FFT outperforms the non-optimized version by approximately the same margin. It is then reasonable to assume that one can improve non-FFT multiplication by approximately the same factor as we improved the FFT multiplication, and therefore the improvement going from level 1 to level 3 would still be approximately a factor of 2 in a routine where both Karatsuba and FFT multiplication were highly optimized.