Faster Fully Homomorphic Encryption: Bootstrapping in Less Than 0.1 Seconds
Abstract
In this paper, we revisit fully homomorphic encryption (FHE) based on GSW and its ring variants. We notice that the internal product of GSW can be replaced by a simpler external product between a GSW and an LWE ciphertext.
We show that the bootstrapping scheme FHEW of Ducas and Micciancio [11] can be expressed only in terms of this external product. As a result, we obtain a speed up from less than 1 s to less than 0.1 s. We also reduce the 1 GB bootstrapping key size to 24 MB, preserving the same security levels, and we improve the noise propagation overhead by replacing exact decomposition algorithms with approximate ones.
Moreover, our external product explains the unique asymmetry in the noise propagation of GSW samples and makes it possible to evaluate deterministic automata homomorphically, as in [13], in an efficient way with a noise overhead only linear in the length of the tested word.
Finally, we provide an alternative practical analysis of LWE-based schemes, which directly relates the security parameter to the error rate of LWE and the entropy of the LWE secret key.
Keywords
Fully homomorphic encryption, Bootstrapping, Lattices, LWE, GSW
1 Introduction
Fully homomorphic encryption (FHE) allows computations to be performed over encrypted data without decrypting them. This concept was long regarded as an open problem until the breakthrough paper of Gentry in 2009 [15], which demonstrated the feasibility of computing any function on encrypted data. Since then, many constructions have appeared involving new mathematical and algorithmic concepts and improving efficiency.
In homomorphic encryption, messages are encrypted with a noise that grows at each homomorphic evaluation of an elementary operation. In a somewhat homomorphic encryption scheme, the number of homomorphic operations is limited, but can be made asymptotically large using bootstrapping [15]. This technique, introduced by Gentry, allows arbitrary circuits to be evaluated by essentially evaluating the decryption function on encrypted secret keys. This step remained very costly until the recent paper of Ducas and Micciancio [11], which presented a very fast bootstrapping procedure running in around 0.69 s, making an important step towards practical FHE for arbitrary NAND circuits. In this paper, we further improve the bootstrapping procedure.
We first provide an intuitive formalization of LWE/RingLWE on numbers or polynomials over the real torus, obtained by combining the Scale-Invariant-LWE problem of [9] or the LWE normal form of [10] with the General-LWE problem of Brakerski-Gentry-Vaikuntanathan [5]. We call \({\mathrm {TLWE}}\) this unified representation of LWE ciphertexts, which encode polynomials over the torus. Its security relies on the hardness of either general or ideal lattice reduction, depending on the choice of dimensions. Using the same formalism, we extend the GSW/RingGSW ciphertexts to \({\mathrm {TGSW}}\), which is the combined analogue of the Gentry-Sahai-Waters ciphertexts from [3, 16], and which can also instantiate the ring version used in the Ducas-Micciancio scheme [11] in the FHEW cryptosystem. Similarly, a \({\mathrm {TGSW}}\) ciphertext encodes an integer polynomial message, and depending on the choice of dimensions, its security is also based on (worst-case) generic or ideal lattice reduction algorithms. \({\mathrm {TLWE}}\) and \({\mathrm {TGSW}}\) are basically dual to each other, and the main idea of our efficiency result comes from the fact that these two schemes can directly be combined to map the external product of their two messages into a \({\mathrm {TLWE}}\) sample. Since a \({\mathrm {TGSW}}\) sample is essentially a matrix whose individual rows are \({\mathrm {TLWE}}\) samples, our external product \({\mathrm {TGSW}}\) times \({\mathrm {TLWE}}\) is much quicker than the usual internal product \({\mathrm {TGSW}}\) times \({\mathrm {TGSW}}\) used in previous work. This can mostly be understood as comparing the speed of a matrix-vector product to that of a matrix-matrix product. As a result, we obtain a significant improvement (12 times faster) over the most efficient bootstrapping procedure [11]; it now runs in less than 0.052 s.
We also analyze the case of leveled encryption. Using an external product means that we lose some composability properties in the design of homomorphic circuits. This corresponds to circuits where boolean gates have different kinds of wires that cannot be freely interconnected. Still, we show that we maintain the expressiveness of the whole binary decision diagram and automata-based logic, which was introduced in [13] with the \({\mathrm {GSW}}\)-\({\mathrm {GSW}}\) internal product, and we tighten the analysis. Indeed, while it was impractical (10 transitions per second in the ring case, and impractical in the non-ring case), we show that the \({\mathrm {TGSW}}\)-\({\mathrm {TLWE}}\) external product makes it possible to evaluate up to 5000 transitions per second, in a leveled homomorphic manner. We also refine the mapping between automata and homomorphic gates, and reduce the number of homomorphic operations needed to test a word with a deterministic automaton. This makes it possible to compile and evaluate constant-time algorithms (i.e. with data-independent control flow) in a leveled homomorphic manner, with only sub-linear noise overhead in the running time.
We also propose a new security analysis where the security parameter is directly expressed as a function of the entropy of the secret and the error rate. For the parameters that we propose in our implementation, we predict 188 bits of security for both the bootstrapping key and the key-switching key.
Roadmap. In Sect. 2, we give mathematical definitions and a quick overview of the classical version of LWE-based schemes. In Sect. 3, we generalize LWE and GSW schemes using a torus representation of the samples. We also review the arithmetic operations over the torus and introduce our main theorem characterizing the new morphism between \({\mathrm {TLWE}}\) and \({\mathrm {TGSW}}\). As a proof of concept, we present two main applications: in Sect. 4, we explain our fast bootstrapping procedure, and in Sect. 5, we present efficient leveled evaluation of deterministic automata and apply it to a constant-time algorithm with logarithmic memory. Finally, we provide a practical security analysis in Sect. 6.
2 Background
Notation. In the rest of the paper we will use the following notations. The security parameter will be denoted as \(\lambda \). The set \(\{0,1\}\) (without any structure) will be written \(\mathbb {B}\). The real torus \(\mathbb {R}/\mathbb {Z}\), denoted \(\mathbb {T}\), is the set of real numbers modulo 1. \(\mathfrak {R}\) denotes the ring of polynomials \(\mathbb {Z}[X]/(X^N+1)\). \(\mathbb {T}_N[X]\) denotes \(\mathbb {R}[X]/(X^N+1)\;\mod \;1\). Finally, we denote by \(\mathcal {M}_{p,q}(E)\) the set of \(p\times q\) matrices with entries in E.
This section combines some algebra theory, namely abelian groups, commutative rings, and R-modules, with some metrics on the continuous field \(\mathbb {R}\).
Definition 2.1
(R-module). Let \((R,+,\times )\) be a commutative ring. We say that a set M is an R-module when \((M,+)\) is an abelian group, and when there exists an external operation \(\cdot \) which is bi-distributive and homogeneous. Namely, \(\forall r,s\in R\) and \(x,y\in M\), \(1_R\cdot x=x\), \((r+s)\cdot x= r\cdot x+s\cdot x\), \(r\cdot (x+y)= r\cdot x+r\cdot y\), and \((r\times s)\cdot x=r\cdot (s\cdot x)\).
Any abelian group is by construction a \(\mathbb {Z}\)-module for the iteration (or exponentiation) of its own law. In this paper, one of the most important abelian groups we use is the real torus \(\mathbb {T}\), composed of all reals modulo 1 (\(\mathbb {R}\;\mod \;1\)). The torus is not a ring, since the real internal product is not compatible with the modulo 1 projection (expressions like \(0\times \frac{1}{2}\) are undefined). But as an additive group, it is a \(\mathbb {Z}\)-module, and the external product \(\cdot \) from \(\mathbb {Z}\times \mathbb {T}\) to \(\mathbb {T}\), as in \(0\cdot \frac{1}{2}=0\), is well defined. More importantly, we recall that for all positive integers N and k, \(\mathbb {T}_N[X]^k\) is an \(\mathfrak {R}\)-module.
An R-module M shares many arithmetic operations and constructions with vector spaces: vectors \(M^n\) or matrices \(\mathcal {M}_{n,m}(M)\) are also R-modules, and their left dot product with a vector in \(R^n\) or left matrix product in \(\mathcal {M}_{k,n}(R)\) are both well defined.
Gaussian Distributions. Let \(\sigma \in \mathbb {R}^{+}\) be a parameter and \(k\ge 1\) the dimension. For all \(\varvec{x}, \varvec{c}\in \mathbb {R}^k\), we note \(\rho _{\sigma ,\varvec{c}}(\varvec{x})=\exp (-\pi \left\| \varvec{x}-\varvec{c}\right\| ^2/\sigma ^2)\). If \(\varvec{c}\) is omitted, then it is implicitly 0. Let S be a subset of \(\mathbb {R}^k\), \(\rho _{\sigma ,\varvec{c}}(S)\) denotes \(\sum _{\varvec{x}\in S}\rho _{\sigma ,\varvec{c}}(\varvec{x})\) or \(\int _{\varvec{x}\in S}\rho _{\sigma ,\varvec{c}}(\varvec{x})\,d\varvec{x}\). For every closed (continuous or discrete) additive subgroup \(M\subseteq \mathbb {R}^k\), \(\rho _{\sigma ,\varvec{c}}(M)\) is finite, and defines a (restricted) Gaussian Distribution of parameter \(\sigma \), standard deviation \(\sqrt{2/\pi }\sigma \) and center \(\varvec{c}\) over M, with the density function \(\mathcal {D}_{M,\sigma ,\varvec{c}}(\varvec{x})=\rho _{\sigma ,\varvec{c}}(\varvec{x})/\rho _{\sigma ,\varvec{c}}(M)\). Let L be a discrete subgroup of M, then the Modular Gaussian distribution over M/L exists and is defined by the density \(\mathcal {D}_{M/L,\sigma ,\varvec{c}}(\varvec{x})=\mathcal {D}_{M,\sigma ,\varvec{c}}(\varvec{x}+L)\). Furthermore, when \(\text {span}(M)= \text {span}(L)\), then M/L admits a uniform distribution of constant density \(\mathcal {U}_{M/L}\). In this case, the smoothing parameter \(\eta _{M,\varepsilon }(L)\) of L in M is defined as the smallest \(\sigma \in \mathbb {R}\) such that \(\sup _{\varvec{x}\in M/L}\left| \mathcal {D}_{M/L,\sigma ,\varvec{c}}(\varvec{x})-\mathcal {U}_{M/L}\right| \le \varepsilon \cdot \mathcal {U}_{M/L}\). If M is omitted, it implicitly means \(\mathbb {R}^k\).
Subgaussian Distributions. A distribution X over \(\mathbb {R}\) is \(\sigma \)-subgaussian iff it satisfies the Laplace-transform bound: \(\forall t\in \mathbb {R}, \mathbb {E}(\exp (tX))\le \exp (\sigma ^2t^2/2)\). By Markov’s inequality, this implies that the tails of X are bounded by the Gaussian function of standard deviation \(\sigma \): \(\forall x>0, \mathbb {P}(|X|\ge x)\le 2\exp (-x^2/2\sigma ^2)\). As an example, the Gaussian distribution of standard deviation \(\sigma \) (i.e. parameter \(\sqrt{\pi /2}\sigma \)), the equidistribution on \(\{-\sigma ,\sigma \}\), and the uniform distribution over \([-\sqrt{3}\sigma ,\sqrt{3}\sigma ]\), which all have standard deviation \(\sigma \), are \(\sigma \)-subgaussian^{1}. If X and \(X'\) are two independent \(\sigma \)- and \(\sigma '\)-subgaussian variables, then for all \(\alpha ,\beta \in \mathbb {R}\), \(\alpha X+\beta X'\) is \(\sqrt{\alpha ^2\sigma ^2+\beta ^2\sigma '^2}\)-subgaussian.
Distance and Norms. We use the standard \(\left\| \cdot \right\| _p\) and \(\left\| \cdot \right\| _\infty \) norms for scalars and vectors over the real field or over the integers. By extension, the norm \(\left\| P(X)\right\| _p\) of a real or integer polynomial \(P\in \mathbb {R}[X]\) is the norm of its coefficient vector. If the polynomial is modulo \(X^N+1\), we take the norm of its unique representative of degree \(\le N-1\).
By abuse of notation, we write \(\left\| \varvec{x}\right\| _p= \min _{\varvec{u}\in \varvec{x}+\mathbb {Z}^k}(\left\| \varvec{u}\right\| _p)\) for all \(\varvec{x}\in \mathbb {T}^k\). It is the p-norm of the representative of \(\varvec{x}\) with all coefficients in \(]-\frac{1}{2},\frac{1}{2}]\). Although it satisfies the separation and triangle inequalities, this notation is not a norm, because it lacks homogeneity^{2}, and \(\mathbb {T}^k\) is not a vector space either. But we have \(\forall m\in \mathbb {Z}, \left\| m\cdot \varvec{x}\right\| _p\le |m| \left\| \varvec{x}\right\| _p\). By extension, we define \(\left\| a\right\| _p\) for a polynomial \(a\in \mathbb {T}_N[X]\) as the p-norm of its unique representative in \(\mathbb {R}[X]\) of degree \(\le N-1\) and with coefficients in \(]-\frac{1}{2},\frac{1}{2}]\).
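As an illustration of this torus "norm", the following minimal Python sketch computes the centered representative and the resulting p-norm and infinity norm. The function names (`centered`, `torus_norm_p`, `torus_norm_inf`) are hypothetical and torus elements are approximated by floating-point numbers, which is an assumption of this sketch rather than part of the paper.

```python
import math

def centered(x: float) -> float:
    """Representative of x modulo 1 in the interval ]-1/2, 1/2]."""
    r = x - math.floor(x)              # r lies in [0, 1)
    return r - 1.0 if r > 0.5 else r

def torus_norm_inf(xs):
    """Infinity 'norm' of a vector of torus elements."""
    return max(abs(centered(x)) for x in xs)

def torus_norm_p(xs, p):
    """p-'norm' of a vector of torus elements (p >= 1)."""
    return sum(abs(centered(x)) ** p for x in xs) ** (1.0 / p)

# Homogeneity fails: doubling 0.4 gives 0.8, whose centered representative
# is -0.2, so the norm shrinks instead of doubling.
x = [0.4]
doubled = [2 * v for v in x]
print(torus_norm_inf(x), torus_norm_inf(doubled))
```

The final lines show why only the inequality with \(|m|\) holds: multiplying by an integer can wrap around the torus and reduce the distance to 0.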
Definition 2.2
(Concentrated Distribution on the Torus, Expectation and Variance). A distribution \(\mathcal {X}\) on the torus is concentrated iff its support is included in a ball of radius \(\frac{1}{4}\) of \(\mathbb {T}\), except with negligible probability. In this case, we define the variance \(\textsf {Var}(\mathcal {X})\) and the expectation \(\mathbb {E}(\mathcal {X})\) of \(\mathcal {X}\) as, respectively, \(\textsf {Var}(\mathcal {X}) = \min _{\bar{x}\in \mathbb {T}} \sum p(x) |x-\bar{x}|^2\) and the position \(\bar{x}\in \mathbb {T}\) which minimizes this expression. By extension, we say that a distribution \(\mathcal {X}^\prime \) over \(\mathbb {T}^n\) or \(\mathbb {T}_N[X]^k\) is concentrated iff each coefficient has an independent concentrated distribution on the torus. Then the expectation \(\mathbb {E}(\mathcal {X}^\prime )\) is the vector of expectations of each coefficient, and \(\textsf {Var}(\mathcal {X}^\prime )\) denotes the maximum of the coefficients' variances.
These expectation and variance over \(\mathbb {T}\) follow the same linearity rules as their classical equivalents over the reals.
Fact 2.3
Let \(\mathcal {X}_1,\mathcal {X}_2\) be two independent concentrated distributions on either \(\mathbb {T}, \mathbb {T}^n\) or \(\mathbb {T}_N[X]^k\), and \(e_1,e_2\in \mathbb {Z}\) such that \(\mathcal {X}= e_1\cdot \mathcal {X}_1+e_2\cdot \mathcal {X}_2\) remains concentrated, then \(\mathbb {E}(\mathcal {X})=e_1\cdot \mathbb {E}(\mathcal {X}_1)+e_2\cdot \mathbb {E}(\mathcal {X}_2)\) and \(\textsf {Var}(\mathcal {X})\le e_1^2\cdot \textsf {Var}(\mathcal {X}_1)+e_2^2\cdot \textsf {Var}(\mathcal {X}_2)\).
Also, subgaussian distributions with small enough parameters are necessarily concentrated:
Fact 2.4
Every distribution \(\mathcal {X}\) on either \(\mathbb {T}, \mathbb {T}^n\) or \(\mathbb {T}_N[X]^k\) where each coefficient is \(\sigma \)-subgaussian with \(\sigma \le 1/\sqrt{32\log (2)(\lambda +1)}\) is a concentrated distribution: a fraction \(1-2^{-\lambda }\) of its mass is in the interval \([-\frac{1}{4},\frac{1}{4}]\).
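The threshold on \(\sigma \) in Fact 2.4 can be checked from the subgaussian tail bound \(\mathbb {P}(|X|\ge x)\le 2\exp (-x^2/2\sigma ^2)\). The short computation below is a sketch for a single coefficient X, taking \(x=\frac{1}{4}\) and reading \(\log \) as the natural logarithm (an assumption about the notation):

```latex
\begin{align*}
\mathbb{P}\left(|X|\ge \tfrac{1}{4}\right)
  &\le 2\exp\left(-\frac{(1/4)^2}{2\sigma^2}\right)
   = 2\exp\left(-\frac{1}{32\sigma^2}\right) \\
  &\le 2\exp\bigl(-\log(2)(\lambda+1)\bigr)
   = 2\cdot 2^{-(\lambda+1)} = 2^{-\lambda},
\end{align*}
```

where the second inequality uses \(\sigma ^2\le 1/(32\log (2)(\lambda +1))\), i.e. \(1/(32\sigma ^2)\ge \log (2)(\lambda +1)\).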
2.1 Learning with Error Problem
The Learning With Errors (\(\mathsf {LWE}\)) problem was introduced by Regev in 2005 [21]. The ring variant, called \({\mathrm {RingLWE}}\), was introduced by Lyubashevsky, Peikert and Regev in 2010 [19]. Both variants are nowadays extensively used for the construction of lattice-based homomorphic encryption schemes. In the original definition [21], an \(\mathsf {LWE}\) sample has its right member on the torus and is defined using continuous Gaussian distributions. Here, we will work entirely on the real torus, employing the same formalism as the Scale Invariant \(\mathsf {LWE}\) (\(\mathsf {SI}\)-\(\mathsf {LWE}\)) scheme in [9], or the \(\mathsf {LWE}\) scale-invariant normal form in [10]. Without loss of generality, we refer to it as \(\mathsf {LWE}\).
Definition 2.5
((Homogeneous) LWE). Let \(n \ge 1\) be an integer, \(\alpha \in \mathbb {R}^+\) be a noise parameter and \(\varvec{s}\) be a uniformly distributed secret in some bounded set \(\mathcal {S}\subseteq \mathbb {Z}^n\). Denote by \(\mathcal {D}^\mathsf {LWE}_{\varvec{s}, \alpha }\) the distribution over \(\mathbb {T}^n \times \mathbb {T}\) obtained by sampling a couple \((\varvec{a},b)\), where the left member \(\varvec{a} \in \mathbb {T}^n\) is chosen uniformly at random and the right member is \(b=\varvec{a}\cdot \varvec{s} + e\). The error e is a sample from a Gaussian distribution with parameter \(\alpha \).

Search problem: given access to polynomially many \(\mathsf {LWE}\) samples, find \(s\in \mathcal {S}\).

Decision problem: distinguish between \(\mathsf {LWE}\) samples and uniformly random samples from \(\mathbb {T}^n \times \mathbb {T}\).
The \(\mathsf {LWE}\) search and decision problems are reducible to each other, and their average case is asymptotically as hard as worst-case lattice problems. In practice, both problems are also intractable, and their hardness increases with the entropy of the key set \(\mathcal {S}\) (i.e. n if keys are binary) and with \(\alpha \in ]0,\eta _\varepsilon (\mathbb {Z})[\).
Regev’s encryption scheme [21] is the following: given a discrete message space \(\mathcal {M}\subseteq \mathbb {T}\), for instance \(\{0,\frac{1}{2}\}\), a message \(\mu \in \mathcal {M}\) is encrypted by adding the trivial \(\mathsf {LWE}\) sample \((\varvec{0}, \mu )\) of \(\mu \) to a homogeneous \(\mathsf {LWE}\) sample \((\varvec{a},b)\in \mathbb {T}^{n+1}\) with respect to a secret key \(\varvec{s} \in \mathbb {B}^n\) and a noise parameter \(\alpha \in \mathbb {R}^+\). The semantic security of the scheme is equivalent to the \(\mathsf {LWE}\) decisional problem. The decryption of a sample \(\varvec{c}=(\varvec{a},b)\) consists in computing the quantity \(\varphi _s(\varvec{a},b)=b-\varvec{s}\cdot \varvec{a}\), which we call the phase of \(\varvec{c}\), and rounding it to the nearest element in \(\mathcal {M}\). Decryption is correct with overwhelming probability \(1-2^{-p}\) provided that the parameter \(\alpha \) is \(O(R/\sqrt{p})\), where R is the packing radius of \(\mathcal {M}\).
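The encryption and decryption steps just described can be sketched in a few lines of Python. This is a toy illustration only, under several assumptions that are not in the paper: torus elements are floating-point numbers, `alpha` is used directly as a standard deviation, the dimension is far too small to be secure, and all function names are hypothetical.

```python
import random

def keygen(n, rng):
    """Binary secret key s in B^n."""
    return [rng.randrange(2) for _ in range(n)]

def encrypt(mu, s, alpha, rng):
    """Trivial sample (0, mu) plus a homogeneous LWE sample (a, a.s + e)."""
    a = [rng.random() for _ in s]            # uniform mask on the torus
    e = rng.gauss(0.0, alpha)                # assumption: alpha as std deviation
    b = (sum(ai * si for ai, si in zip(a, s)) + mu + e) % 1.0
    return a, b

def phase(s, c):
    """phi_s(a, b) = b - s.a on the torus."""
    a, b = c
    return (b - sum(ai * si for ai, si in zip(a, s))) % 1.0

def decrypt(s, c, messages):
    """Round the phase to the nearest element of the message space."""
    phi = phase(s, c)
    def dist(m):
        d = abs(phi - m) % 1.0
        return min(d, 1.0 - d)
    return min(messages, key=dist)

def add(c1, c2):
    """Coordinate-wise sum of two samples; the phases (and noises) add up."""
    (a1, b1), (a2, b2) = c1, c2
    return [(x + y) % 1.0 for x, y in zip(a1, a2)], (b1 + b2) % 1.0

rng = random.Random(0)
s = keygen(64, rng)
c = encrypt(0.5, s, 0.01, rng)
print(decrypt(s, c, (0.0, 0.5)))
```

With the message space \(\{0,\frac{1}{2}\}\) the packing radius is \(\frac{1}{4}\), so a noise of standard deviation 0.01 leaves a very large margin; adding two encryptions of \(\frac{1}{2}\) yields an encryption of 0, since the phase is linear.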
3 Generalization
In this section we extend this presentation to rings, following the generalization of [5], and also to \({\mathrm {GSW}}\) [16].
3.1 TLWE
We first define \({\mathrm {TLWE}}\) samples, together with the search and decision problems. In the following, ciphertexts are viewed as normal samples.
Definition 3.1
(TLWE samples). Let \(k\ge 1\) be an integer, N a power of 2, and \(\alpha \ge 0\) be a noise parameter. A \({\mathrm {TLWE}}\) secret key \(\varvec{s}\in \mathbb {B}_N[X]^k\) is a vector of k polynomials in \(\mathfrak {R}=\mathbb {Z}[X]/(X^N+1)\) with binary coefficients. For security purposes, we assume that private keys are uniformly chosen, and that they actually contain \(n\approx Nk\) bits of entropy. The message space of \({\mathrm {TLWE}}\) samples is \(\mathbb {T}_N[X]\). A fresh \({\mathrm {TLWE}}\) sample of a message \(\mu \in \mathbb {T}_N[X]\) with noise parameter \(\alpha \) under the key \(\varvec{s}\) is an element \((\varvec{a},b)\in \mathbb {T}_N[X]^k\times \mathbb {T}_N[X]\), where \(b \in \mathbb {T}_N[X]\) has Gaussian distribution \(\mathcal {D}_{\mathbb {T}_N[X],\alpha ,\mu +\varvec{s}\cdot \varvec{a}}\) around \(\mu +\varvec{s}\cdot \varvec{a}\). The sample is random iff its left member \(\varvec{a}\) (also called mask) is uniformly random \(\in \mathbb {T}_N[X]^k\) (or a sufficiently dense submodule^{3}), trivial if \(\varvec{a}\) is fixed to \(\varvec{0}\), noiseless if \(\alpha =0\), and homogeneous iff its message \(\mu \) is 0.

Search problem: given access to polynomially many fresh random homogeneous \({\mathrm {TLWE}}\) samples, find their key \(\varvec{s} \in \mathbb {B}_N[X]^k\).

Decision problem: distinguish fresh random homogeneous \({\mathrm {TLWE}}\) samples from uniformly random samples from \(\mathbb {T}_N[X]^{k+1}\).
This definition is the analogue on the torus of the General-LWE problem of [5]. It allows both LWE and RingLWE to be considered as a single problem. Choosing N large and \(k=1\) corresponds to the classical (bin)RingLWE (over cyclotomic rings, and up to a scaling factor q). When \(N=1\) and k is large, \(\mathfrak {R}\) and \(\mathbb {T}_N[X]\) respectively collapse to \(\mathbb {Z}\) and \(\mathbb {T}\), and \({\mathrm {TLWE}}\) is simply bin-LWE (up to the same scaling factor q). Other choices of N, k give some continuum between the two extremes, with a security that varies between worst-case ideal lattices and worst-case regular lattices.
Thanks to the underlying \(\mathfrak {R}\)module structure, we can sum TLWE samples, or we can make integer linear or polynomial combinations of samples with coefficients in \(\mathfrak {R}\). However, each of these combinations increases the noise inside the samples. They are therefore limited to small coefficients.
We additionally define a function called the phase of a \({\mathrm {TLWE}}\) sample, that will be used many times. The phase computation is the first step of the classical decryption algorithm, and uses the secret key.
Definition 3.2
(Phase). Let \(\varvec{c}=(\varvec{a},b)\in \mathbb {T}_N[X]^{k}\times \mathbb {T}_N[X]\) and \(\varvec{s}\in \mathbb {B}_N[X]^k\), we define the phase of the sample as \(\varphi _{\varvec{s}}(\varvec{c})=b-\varvec{s}\cdot \varvec{a}\).
The phase is linear over \(\mathbb {T}_N[X]^{k+1}\) and is \((kN+1)\)-lipschitzian for the \(\ell _{\infty }\) distance: \(\forall \varvec{x},\varvec{y}\in \mathbb {T}_N[X]^{k+1}, \left\| \varphi _{\varvec{s}}(\varvec{x})-\varphi _{\varvec{s}}(\varvec{y})\right\| _\infty \le (kN+1)\left\| \varvec{x}-\varvec{y}\right\| _\infty \).
Note that a TLWE sample contains noise, that its semantics depends only on its phase, and that the phase has the nice property of being lipschitzian. Together, these properties have many interesting implications. In particular, we can always work with approximations, since two samples at a short distance on \(\mathbb {T}_N[X]^{k+1}\) share the same properties: they encode the same message, and they can in general be swapped. This fact explains why we can work and describe our algorithms on the infinite torus.
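To make the phase concrete, here is a small Python sketch of \(\varphi _{\varvec{s}}(\varvec{a},b)=b-\varvec{s}\cdot \varvec{a}\) over \(\mathbb {T}_N[X]\), using a schoolbook negacyclic product modulo \(X^N+1\). It is a toy under stated assumptions: polynomials are lists of floating-point coefficients, `alpha` is treated as a standard deviation, and the names `negacyclic_mul`, `tlwe_encrypt` and `tlwe_phase` are hypothetical. A practical implementation would use an FFT-based product instead.

```python
import random

def negacyclic_mul(s, a):
    """Product of an integer polynomial s and a torus polynomial a,
    reduced modulo X^N + 1 and modulo 1 (schoolbook, O(N^2))."""
    N = len(a)
    out = [0.0] * N
    for i, si in enumerate(s):
        if si == 0:
            continue
        for j, aj in enumerate(a):
            if i + j < N:
                out[i + j] += si * aj
            else:                       # X^N = -1: wrap around with a sign flip
                out[i + j - N] -= si * aj
    return [x % 1.0 for x in out]

def tlwe_encrypt(mu, s, alpha, rng):
    """Fresh random TLWE sample (a, b) with b = s.a + mu + e."""
    N = len(mu)
    a = [[rng.random() for _ in range(N)] for _ in s]
    b = list(mu)
    for si, ai in zip(s, a):
        b = [(x + y) % 1.0 for x, y in zip(b, negacyclic_mul(si, ai))]
    b = [(x + rng.gauss(0.0, alpha)) % 1.0 for x in b]
    return a, b

def tlwe_phase(s, a, b):
    """phi_s(a, b) = b - sum_i s_i * a_i in T_N[X]."""
    acc = list(b)
    for si, ai in zip(s, a):
        acc = [(x - y) % 1.0 for x, y in zip(acc, negacyclic_mul(si, ai))]
    return acc

rng = random.Random(0)
N, k = 8, 2
s = [[rng.randrange(2) for _ in range(N)] for _ in range(k)]
mu = [0.25, 0.0, 0.5, 0.75, 0.0, 0.125, 0.0, 0.5]
a, b = tlwe_encrypt(mu, s, 1e-3, rng)
print(tlwe_phase(s, a, b))   # each coefficient is close to mu on the torus
```

Setting `N = 1` reduces the sketch to scalar LWE samples, while `k = 1` with larger `N` corresponds to the ring case.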
Given a finite message space \(\mathcal {M}\subseteq \mathbb {T}_N[X]\), the (classical) decryption algorithm computes the phase \(\varphi _s(\varvec{c})\) of the sample, and returns the closest \(\mu \in \mathcal {M}\). It is easy to see that if \(\varvec{c}\) is a fresh TLWE sample of \(\mu \in \mathcal {M}\) with Gaussian noise parameter \(\alpha \), the decryption of \(\varvec{c}\) over \(\mathcal {M}\) is equal to \(\mu \) as soon as \(\alpha \) is \(\varTheta (\sqrt{\lambda })\) times smaller than the packing radius of \(\mathcal {M}\). However, decryption is harder to define for non-fresh samples. In this case, correctness of the decryption procedure involves a recurrence formula between the decryption of the sum and the sum of the decryption of the inputs, conditioned by the noise parameters. In addition, message spaces of the input samples can be in different subgroups of \(\mathbb {T}\). To lift the limitations of the decryption function, we will instead use a mathematical definition of message and error by reasoning directly on the following \(\varOmega \)-probability space.
Definition 3.3
(The \(\varOmega \)-probability space). Since samples are either independent (random, noiseless, or trivial) fresh samples \(\varvec{c}\leftarrow TLWE_{\varvec{s},\alpha }(\mu )\), or linear combinations \(\tilde{\varvec{c}}=\sum _{i=1}^p e_i \cdot \varvec{c_i}\) of other samples, the probability space \(\varOmega \) is the product of the probability spaces of each individual fresh sample \(\varvec{c}\) with the TLWE distributions defined in Definition 3.1, and of the probability spaces of all the coefficients \((e_1,\dots ,e_p)\in \mathfrak {R}^p\) or \(\mathbb {Z}^p\) that are obtained by randomized algorithms.
In other words, instead of viewing a TLWE sample as a fixed value which is the result of one particular event in \(\varOmega \), we will consider all the possible values at once, and make statistics on them.
We now define functions on \({\mathrm {TLWE}}\) samples: message, error, noise variance, and noise norm. These functions are well defined mathematically, and can be used in the analysis of various algorithms. However, they cannot be directly computed or approximated in practice.
Definition 3.4

the message of \(\varvec{c}\), denoted as \(\textsf {msg}(\varvec{c})\), is the expectation of \(\varphi _{\varvec{s}}(\varvec{c})\);

the error, denoted \(\textsf {Err}(\varvec{c})\), is equal to \(\varphi _{\varvec{s}}(\varvec{c})-\textsf {msg}(\varvec{c})\);

\(\textsf {Var}(\textsf {Err}(\varvec{c}))\) denotes the variance of \(\textsf {Err}(\varvec{c})\), which is by definition also equal to the variance of \(\varphi _{\varvec{s}}(\varvec{c})\);

finally, \(\left\| \textsf {Err}(\varvec{c})\right\| _\infty \) denotes the maximum amplitude of \(\textsf {Err}(\varvec{c})\) (possibly with overwhelming probability).
Unlike the classical decryption algorithm, the message function can be viewed as an ideal black box decryption function, which works with infinite precision even if the message space is continuous. Provided that the noise amplitude remains smaller than \(\frac{1}{4}\), the message function is perfectly linear. Using these intuitive and intrinsic functions will considerably ease the analysis of all algorithms in this paper. In particular, we have:
Fact 3.5
Given p valid and independent \({\mathrm {TLWE}}\) samples \(\varvec{c_1}, \ldots , \varvec{c_p}\) under the same key \(\varvec{s}\), and p integer polynomials \(e_1, \ldots , e_p\in \mathfrak {R}\), if the linear combination \(\varvec{c}=\sum _{i=1}^{p} e_i\cdot \varvec{c_i}\) is a valid \({\mathrm {TLWE}}\) sample, it satisfies: \(\textsf {msg}(\varvec{c})=\sum _{i=1}^{p} e_i\cdot \textsf {msg}(\varvec{c_i})\), with variance \(\textsf {Var}(\textsf {Err}(\varvec{c})) \le \sum _{i=1}^{p} \Vert e_i\Vert _2^2 \cdot \textsf {Var}(\textsf {Err}(\varvec{c_i}))\) and noise amplitude \(\left\| \textsf {Err}(\varvec{c})\right\| _\infty \le \sum _{i=1}^{p} \left\| e_i\right\| _1 \cdot \left\| \textsf {Err}(\varvec{c_i})\right\| _\infty \). If the last bound is \(<\frac{1}{4}\), then \(\varvec{c}\) is necessarily a valid TLWE sample (under the same key \(\varvec{s})\).
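The two bounds of Fact 3.5 are simple to evaluate numerically. The helper below is a minimal sketch (the name `combination_bounds` and the input layout are hypothetical): each integer polynomial \(e_i\) is given as a list of coefficients, together with the variance and the amplitude bound of the corresponding input sample.

```python
def combination_bounds(coeffs, variances, amplitudes):
    """Bounds for c = sum_i e_i * c_i.

    coeffs[i]     : integer coefficients of the polynomial e_i
    variances[i]  : Var(Err(c_i))
    amplitudes[i] : bound on ||Err(c_i)||_inf
    Returns (variance bound, amplitude bound, validity flag).
    """
    var_bound = sum(sum(c * c for c in e) * v          # ||e_i||_2^2 * Var
                    for e, v in zip(coeffs, variances))
    amp_bound = sum(sum(abs(c) for c in e) * m         # ||e_i||_1 * amplitude
                    for e, m in zip(coeffs, amplitudes))
    return var_bound, amp_bound, amp_bound < 0.25      # < 1/4 guarantees validity

# Example: c = (1 - X) * c_1 + 2 * c_2
print(combination_bounds([[1, -1, 0], [2]], [1e-6, 4e-6], [0.01, 0.02]))
```

The variance bound grows with the squared Euclidean norm of the coefficients, while the worst-case amplitude grows with their 1-norm, which is why the average-case analysis is typically much tighter.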
In order to characterize the average-case behaviour of our homomorphic operations, we shall rely on the heuristic assumption of independence below. This heuristic will only be used for practical average-case bounds. Our worst-case theorems and lemmas based on the infinity norm do not use it at all.
Assumption 3.6
(Independence Heuristic). All the coefficients of the error of \({\mathrm {TLWE}}\) or \({\mathrm {TGSW}}\) samples that occur in all the linear combinations we consider are independent and concentrated. More precisely, they are \(\sigma \)-subgaussian where \(\sigma \) is the square root of their variance.
This assumption allows us to bound the variance of the noise instead of its norm, and to provide realistic average-case bounds which often correspond to the square root of the worst-case ones. The error can easily be proved subgaussian, since each coefficient is always obtained by convolving Gaussians or zero-centered bounded uniform distributions. But the independence assumption between all the coefficients remains heuristic. Dependencies between coefficients may affect the variance of their combinations in both directions. The independence of coefficients can be obtained by adding enough entropy in all our decomposition algorithms and by increasing some parameters accordingly, but as noticed in [11], this workaround seems more like a proof artefact, and is experimentally not needed. Since average-case corollaries should reflect practical results, we leave the independence of subgaussian samples as a heuristic assumption.
3.2 TGSW
In this section we present a generalized scale-invariant version of the FHE scheme \({\mathrm {GSW}}\) [16], that we call \({\mathrm {TGSW}}\). \({\mathrm {GSW}}\) was proposed by Gentry, Sahai and Waters in 2013 [16] and improved in [3]; its security is based on the \(\mathsf {LWE}\) problem. The scheme relies on a gadget decomposition function, which we also extend to polynomials. Most importantly, the novelty is that our function is an approximate decomposition, up to some precision parameter. This improves running time and memory requirements at the cost of a small amount of additional noise.
Definition 3.7
(Approximate Gadget Decomposition). Let \(\varvec{h}\in \mathcal {M}_{p,k+1}(\mathbb {T}_N[X])\) as in (1). We say that \(Dec_{\varvec{h},\beta ,\epsilon }(\varvec{v})\) is a decomposition algorithm on the gadget \(\varvec{h}\) with quality \(\beta \) and precision \(\epsilon \) if and only if, for any \({\mathrm {TLWE}}\) sample \(\varvec{v}\in \mathbb {T}_N[X]^{k+1}\), it efficiently and publicly outputs a small vector \(\varvec{u}\in \mathfrak {R}^{(k+1)\ell }\) such that \(\left\| \varvec{u}\right\| _\infty \le \beta \) and \(\left\| \varvec{u}\cdot \varvec{h}-\varvec{v}\right\| _\infty \le \epsilon \). Furthermore, the expectation of \(\varvec{u}\cdot \varvec{h}-\varvec{v}\) must be 0 when \(\varvec{v}\) is uniformly distributed in \(\mathbb {T}_N[X]^{k+1}\).
Lemma 3.8
Let \(\ell \in \mathbb {N}\) and \(B_g\in \mathbb {N}\). Then for \(\beta =B_g/2\) and \(\epsilon =1/2B_g^\ell \), Algorithm 1 is a valid \(Dec_{\varvec{h},\beta ,\epsilon }\).
Proof
Let \(\varvec{v} = (a,b) = (a_1, \ldots , a_{k}, b=a_{k+1}) \in \mathbb {T}_N[X]^{k+1}\) be a \({\mathrm {TLWE}}\) sample, given as input to Algorithm 1. Let \(\varvec{u} = [e_{1,1},\dots ,e_{k+1,\ell }]\in \mathfrak {R}^{(k+1)\ell }\) be the corresponding output. By construction, \(\left\| \varvec{u}\right\| _\infty \le B_g/2 = \beta \).
Let \(\varvec{\epsilon _\mathbf{dec }}=\varvec{u}\cdot \varvec{h}-\varvec{v}\). For all \(i\in [\![1,k+1 ]\!]\) and \(j\in [\![1,\ell ]\!]\), we have by construction \(\varvec{\epsilon _\mathbf{dec }}_{i,j}=\bar{a}_{i,j}-a_{i,j}\). Since \(\bar{a}_{i,j}\) is defined as the nearest multiple of \(\frac{1}{B_g^\ell }\) on the torus, we have \(|\bar{a}_{i,j}-a_{i,j}|\le 1/2B_g^\ell =\epsilon \). \(\varvec{\epsilon _\mathbf{dec }}\) has therefore a concentrated distribution when \(\varvec{v}\) is uniform. We now verify that it is zero-centered. Call f the function from \(\mathbb {T}\) to \(\mathbb {T}\) which rounds an element x to its closest multiple of \(\frac{1}{B_g^\ell }\), and g the symmetry defined by \(g(x)=2f(x)-x\) on the torus, which satisfies \(f(g(x))=f(x)\). The expectation \(\mathbb {E}(\varvec{\epsilon _\mathbf{dec }}_{i,j})\) is equal to \(\mathbb {E}(f(a_{i,j})-a_{i,j})\) when \(a_{i,j}\) has uniform distribution. Since \(g(a_{i,j})\) then also has uniform distribution, this is equal to \(\mathbb {E}(f(g(a_{i,j}))-g(a_{i,j}))\), which is in turn equal to \(\mathbb {E}(a_{i,j}-f(a_{i,j}))=-\mathbb {E}(\varvec{\epsilon _\mathbf{dec }}_{i,j})\). Thus, the expectation of \(\varvec{\epsilon _\mathbf{dec }}\) is 0. \(\square \)
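The following Python sketch illustrates the kind of approximate decomposition that Lemma 3.8 is about, for a single torus scalar (the case \(N=1\) and one coordinate). Algorithm 1 itself is not reproduced in this section, so the code is an assumption-laden illustration rather than the paper's algorithm: it assumes the gadget entries are \(1/B_g, \ldots , 1/B_g^\ell \), that \(B_g\) is even, and that signed digits are taken in \([-B_g/2, B_g/2)\). The names `approx_decompose` and `recompose` are hypothetical.

```python
def approx_decompose(a, Bg, ell):
    """Signed digits (e_1, ..., e_ell), each in [-Bg/2, Bg/2), such that
    sum_p e_p / Bg**p is within 1/(2*Bg**ell) of a on the torus.
    Assumes Bg is even."""
    q = Bg ** ell
    abar = round(a * q) % q        # numerator of the nearest multiple of 1/Bg^ell
    digits = []
    for _ in range(ell):           # least significant digit first
        d = abar % Bg
        abar //= Bg
        if d >= Bg // 2:           # recenter the digit and carry 1 upwards
            d -= Bg
            abar += 1
        digits.append(d)
    # A carry left in abar is an integer, so it vanishes modulo 1.
    return digits[::-1]

def recompose(digits, Bg):
    """u . h for the scalar gadget (1/Bg, ..., 1/Bg^ell), reduced modulo 1."""
    return sum(d / Bg ** (p + 1) for p, d in enumerate(digits)) % 1.0

a = 0.3141592
u = approx_decompose(a, 16, 3)
print(u, recompose(u, 16), a)
```

With \(B_g=16\) and \(\ell =3\), the digits are bounded by \(\beta =8\) in absolute value and the recomposed value differs from `a` by at most \(\epsilon =1/8192\). Dropping the lower-order digits in this way is what trades a small additional noise for fewer, smaller digits.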
We are now ready to define \({\mathrm {TGSW}}\) samples, and to extend the notions of phase of valid sample, message and error of the samples.
Definition 3.9
(TGSW samples). Let \(\ell \) and \(k\ge 1\) be two integers, \(\alpha \ge 0\) be a noise parameter and \(\varvec{h}\) the gadget defined in Eq. (1). Let \(\varvec{s}\in \mathbb {B}_N[X]^k\) be a \({\mathrm {RingLWE}}\) key. We say that \(\varvec{C}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) is a fresh \({\mathrm {TGSW}}\) sample of \(\mu \in \mathfrak {R}/\varvec{h}^\perp \) with noise parameter \(\alpha \) iff \(\varvec{C}=\varvec{Z}+\mu \cdot \varvec{h}\), where each row of \(\varvec{Z}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) is a homogeneous TLWE sample (of 0) with Gaussian noise parameter \(\alpha \). Reciprocally, we say that an element \(\varvec{C}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) is a valid TGSW sample iff there exists a unique polynomial \(\mu \in \mathfrak {R}/\varvec{h}^\perp \) and a unique key \(\varvec{s}\) such that each row of \(\varvec{C}-\mu \cdot \varvec{h}\) is a valid TLWE sample of 0 for the key \(\varvec{s}\). We call the polynomial \(\mu \) the message of \(\varvec{C}\), and we denote it by \(\textsf {msg}(\varvec{C})\).
Definition 3.10
(Phase, Error). Let \(A\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) be a \({\mathrm {TGSW}}\) sample for a secret key \(\varvec{s}\in \mathbb {B}_N[X]^k\) and noise parameter \(\alpha \ge 0\).
We define the phase of A, denoted as \(\varphi _{\varvec{s}}(A)\in (\mathbb {T}_N[X])^{(k+1)\ell }\), as the list of the \((k+1)\ell \) \({\mathrm {TLWE}}\) phases of each line of A. In the same way, we define the error of A, denoted \(\textsf {Err}(A)\), as the list of the \((k+1)\ell \) \({\mathrm {TLWE}}\) errors of each line of A.
Since \({\mathrm {TGSW}}\) samples are essentially vectors of \({\mathrm {TLWE}}\) samples, they are naturally compatible with linear operations. And both phase and message functions remain linear.
Fact 3.11
Given p valid \({\mathrm {TGSW}}\) samples \(C_1, \ldots , C_p\) of messages \(\mu _1, \ldots , \mu _p\) under the same key, and with independent error coefficients, and given p integer polynomials \(e_1, \ldots , e_p\), the linear combination \(C=\sum _{i=1}^{p} e_i\cdot C_i\) is a sample of \(\mu = \sum _{i=1}^{p} e_i \cdot \mu _i\), with variance \(\textsf {Var}(C) \le \sum _{i=1}^{p} \Vert e_i\Vert _2^2 \cdot \textsf {Var}(C_i)\) and noise infinity norm \(\left\| \textsf {Err}(C)\right\| _\infty \le \sum _{i=1}^{p} \left\| e_i\right\| _1 \cdot \left\| \textsf {Err}(C_i)\right\| _\infty \).
Also, the phase remains \((1+kN)\)-lipschitzian for the infinity norm.
Fact 3.12
For all \(A\in \mathcal {M}_{p,k+1}(\mathbb {T}_N[X])\), \(\left\Vert \varphi _{\varvec{s}}(A)\right\Vert _\infty \le (Nk+1)\left\Vert A\right\Vert _\infty \).
We finally define the homomorphic product between \({\mathrm {TGSW}}\) and \({\mathrm {TLWE}}\) samples, whose corresponding message is simply the product of the two messages of the initial samples. Since the left member encodes an integer polynomial, and the right one a torus polynomial, this operator performs a homomorphic evaluation of their external product. Theorem 3.14 (resp. Corollary 3.15) analyzes the worst-case (resp. average-case) noise propagation of this product. Then, Corollary 3.16 relates this new morphism to the classical internal product between \({\mathrm {TGSW}}\) samples.
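Definition 3.13
(External Product). We define the product \(\boxdot \) as \(\boxdot :{\mathrm {TGSW}}\times {\mathrm {TLWE}}\rightarrow {\mathrm {TLWE}}\), \((A,\varvec{b})\mapsto A\boxdot \varvec{b}=Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b})\cdot A\).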
The formula is almost identical to the classical product defined in the original GSW scheme in [16], except that only one vector needs to be decomposed. For this reason, we get almost the same noise propagation formula, with an additional term that comes from the approximations in the decomposition.
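To make the role of the decomposition parameters concrete, the following is a minimal Python sketch of an approximate gadget decomposition for a single torus element (a scalar, not a polynomial). It assumes a 32-bit fixed-point torus representation and, as an example, \(\ell =3\) digits in base \(B_g=2^{10}\); the function names are hypothetical and this is not the authors' implementation. Under these assumptions the digits have magnitude at most \(\beta =B_g/2\) and the recomposition error is at most \(\epsilon =2^{-31}\).

```python
MASK = (1 << 32) - 1  # torus elements are stored as integers modulo 2^32


def gadget_decompose(x32, bg_bits=10, ell=3):
    """Approximate signed base-2^bg_bits decomposition of a torus element.

    Returns digits d_1..d_ell in [-Bg/2, Bg/2) such that sum_j d_j / Bg^j
    is within 2^-(ell*bg_bits + 1) of x32 / 2^32 (modulo 1).
    Requires ell * bg_bits < 32.
    """
    bg = 1 << bg_bits
    half = bg >> 1
    # Adding Bg/2 at every digit position lets us read unsigned digits
    # and recentre them afterwards.
    offset = sum(half << (32 - j * bg_bits) for j in range(1, ell + 1))
    # Half of the last kept position: round to nearest instead of truncating.
    rounding = 1 << (32 - ell * bg_bits - 1)
    tmp = (x32 + offset + rounding) & MASK
    return [((tmp >> (32 - j * bg_bits)) & (bg - 1)) - half
            for j in range(1, ell + 1)]


def gadget_recompose(digits, bg_bits=10):
    """Inner product of the digits with the gadget (1/Bg, 1/Bg^2, ...), modulo 1."""
    return sum(d << (32 - j * bg_bits) for j, d in enumerate(digits, 1)) & MASK


def torus_distance(a32, b32):
    """Distance on the torus between two representatives, in units of 2^-32."""
    diff = (a32 - b32) & MASK
    return min(diff, (1 << 32) - diff)
```

For instance, `torus_distance(gadget_recompose(gadget_decompose(x)), x)` never exceeds 2 (that is, \(2^{-31}\) on the torus) for a 32-bit input `x`, which is the approximation term that shows up in the noise formula.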
Theorem 3.14
(Worst-case External Product). Let A be a valid \({\mathrm {TGSW}}\) sample of message \(\mu _A\) and let \(\varvec{b}\) be a valid \({\mathrm {TLWE}}\) sample of message \(\mu _{\varvec{b}}\). Then \(A \boxdot \varvec{b}\) is a \({\mathrm {TLWE}}\) sample of message \(\mu _A \cdot \mu _{\varvec{b}}\) and \(\left\Vert \textsf {Err}(A\boxdot \varvec{b})\right\Vert _\infty \le (k+1)\ell N\beta \left\Vert \textsf {Err}(A)\right\Vert _\infty + \left\Vert \mu _A\right\Vert _1(1+kN)\epsilon + \left\Vert \mu _A\right\Vert _1\left\Vert \textsf {Err}(\varvec{b})\right\Vert _\infty \) (worst case), where \(\beta \) and \(\epsilon \) are the parameters used in the decomposition \(Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b})\). If \(\left\Vert \textsf {Err}(A\boxdot \varvec{b})\right\Vert _\infty \le 1/4\), we are guaranteed that \(A \boxdot \varvec{b}\) is a valid \({\mathrm {TLWE}}\) sample.
Proof
We similarly obtain the more realistic average-case noise propagation, based on the independence heuristic, by bounding the Gaussian variance instead of the amplitude.
Corollary 3.15
(Average-case External Product). Under the same conditions as Theorem 3.14 and assuming Heuristic 3.6, we have that \(\textsf {Var}(\textsf {Err}(A\boxdot \varvec{b})) \le (k+1)\ell N\beta ^2\textsf {Var}(\textsf {Err}(A)) + (1+kN)\left\Vert \mu _A\right\Vert _2^2 \epsilon ^2 + \left\Vert \mu _A\right\Vert _2^2 \textsf {Var}(\textsf {Err}(\varvec{b}))\).
Proof
The last corollary describes exactly the classical internal product between two \({\mathrm {TGSW}}\) samples, already presented in [3, 11, 13, 16] with adapted notations. As we mentioned before, it is much slower to evaluate, because it consists in \((k+1)\ell \) independent computations of the \(\boxdot \) product, which we illustrate now.
Corollary 3.16
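(Internal Product). Let \(\boxtimes \) be the product \({\mathrm {TGSW}}\times {\mathrm {TGSW}}\rightarrow {\mathrm {TGSW}}\) that maps \((A,B)\) to the matrix \(A\boxtimes B\) whose ith row is \(A\boxdot \varvec{b_i}\), where A and B are two valid \({\mathrm {TGSW}}\) samples of messages \(\mu _A\) and \(\mu _B\) and \(\varvec{b_i}\) is the ith row of B. Then \(A\boxtimes B\) is a \({\mathrm {TGSW}}\) sample of message \(\mu _A\cdot \mu _B\) and \(\left\Vert \textsf {Err}(A\boxtimes B)\right\Vert _\infty \le (k+1)\ell N\beta \left\Vert \textsf {Err}(A)\right\Vert _\infty + \left\Vert \mu _A\right\Vert _1(1+kN)\epsilon + \left\Vert \mu _A\right\Vert _1\left\Vert \textsf {Err}(B)\right\Vert _\infty \) (worst case).
Furthermore, assuming Heuristic 3.6, we have that \(\textsf {Var}(\textsf {Err}(A\boxtimes B)) \le (k+1)\ell N\beta ^2\textsf {Var}(\textsf {Err}(A)) + (1+kN)\left\Vert \mu _A\right\Vert _2^2 \epsilon ^2 + \left\Vert \mu _A\right\Vert _2^2 \textsf {Var}(\textsf {Err}(B))\) (average case).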
Proof
Let A and B be two \({\mathrm {TGSW}}\) samples, and \(\mu _A\) and \(\mu _B\) their messages. By definition, the ith row of B encodes \(\mu _B\cdot \varvec{h_i}\), where \(\varvec{h_i}\) is the ith row of \(\varvec{h}\), so the ith row of \(A\boxtimes B\) encodes \(\mu _A\mu _B\cdot \varvec{h_i}\). This proves that \(A\boxtimes B\) encodes \(\mu _A\mu _B\). Since the internal product \(A \boxtimes B\) consists of \((k+1)\ell \) independent runs of the external products \(A\boxdot \varvec{b_i}\), the noise propagation formula directly follows from Theorem 3.14 and Corollary 3.15. \(\square \)
In the next section, we show that all internal products in the bootstrapping procedure can be replaced with the external one. Consequently, we expect a speed-up by a factor of at least \((k+1)\ell \).
4 Application: Single Gate Bootstrapping in Less Than 0.1 Seconds
In this section, we show how to use Theorem 3.14 to speed up the bootstrapping presented in [11]. With additional optimizations, we drastically reduce the bootstrapping key size, and also slightly reduce the noise overhead. To bootstrap an LWE sample \((\varvec{a},b)\in \mathbb {T}^{n+1}\), which is rescaled as \((\bar{\varvec{a}},\bar{b}) \bmod 2N\), using relevant encryptions of its secret key \(\varvec{s}\in \mathbb {B}^n\), the overall idea is the following. We start from a fixed polynomial \(\text {testv}\in \mathbb {T}_N[X]\), which is our phase detector: its ith coefficient is set to the value that the bootstrapping should return if \(\varphi _{\varvec{s}}(\varvec{a},b)=i/2N\). \(\text {testv}\) is first encoded in a trivial \({\mathrm {TLWE}}\) sample. Then, we iteratively rotate its coefficients, using external multiplications with \({\mathrm {TGSW}}\) encryptions of the hidden monomials \(X^{-s_i\bar{a}_i}\). By doing so, the original \(\text {testv}\) gets rotated by the (hidden) phase of \((\varvec{a},b)\), and in the end, we simply extract the constant term as an \(\mathsf {LWE}\) sample.
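The phase-detector idea can be illustrated on plaintext polynomials. The sketch below is a toy model under stated assumptions, not the bootstrapping itself: it takes the test vector to be \((1+X+\dots +X^{N-1})\cdot X^{-N/2}\cdot \mu \) with an integer stand-in \(\mu =1\), works in \(\mathbb {Z}[X]/(X^N+1)\) without any encryption, and uses hypothetical function names. It shows that the constant term of the rotated test vector only depends on whether the rotation amount is close to 0 modulo 2N.

```python
def mul_by_monomial(poly, p):
    """Return X^p * poly in Z[X]/(X^N + 1), with N = len(poly) and p any integer."""
    n = len(poly)
    p %= 2 * n  # X has multiplicative order 2N in this ring
    out = [0] * n
    for i, c in enumerate(poly):
        j, sign = i + p, 1
        while j >= n:  # every wrap past X^N flips the sign, since X^N = -1
            j -= n
            sign = -sign
        out[j] = sign * c
    return out


def make_testv(n, mu=1):
    """Assumed test vector: (1 + X + ... + X^(N-1)) * X^(-N/2) * mu."""
    return mul_by_monomial([mu] * n, -(n // 2))


def detect(testv, p):
    """Constant term of X^p * testv, i.e. the value that would be extracted."""
    return mul_by_monomial(testv, p)[0]
```

With this choice, `detect(make_testv(N), p)` equals `mu` when p lies in the half-open interval from \(-N/2\) (excluded) to \(N/2\) (included) modulo 2N, and `-mu` otherwise.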
4.1 TLWE to LWE Extraction
As in previous work, extracting an LWE sample from a TLWE sample simply means rewriting polynomials into their list of coefficients, and discarding the \(N-1\) last coefficients of b. This yields an LWE encryption of the constant term of the initial polynomial message.
Definition 4.1
(TLWE Extraction). Let \((\varvec{a^{\prime \prime }},b^{\prime \prime })\) be a \({\mathrm {TLWE}}_{\varvec{s^{\prime \prime }}}(\mu )\) sample with key \(\varvec{s^{\prime \prime }}\in \mathfrak {R}^k\). We call \(\mathsf {KeyExtract}(\varvec{s^{\prime \prime }})\) the integer vector \(\varvec{s'}=\left( \mathsf {coefs}(s_1^{\prime \prime }(X)),\dots ,\mathsf {coefs}(s_k^{\prime \prime }(X))\right) \in \mathbb {Z}^{kN}\) and \(\mathsf {SampleExtract}(\varvec{a^{\prime \prime }},b^{\prime \prime })\) the \(\mathsf {LWE}\) sample \((\varvec{a'},b')\in \mathbb {T}^{kN+1}\) where \(\varvec{a'}=\left( \mathsf {coefs}(a_1^{\prime \prime }(1/X)),\dots ,\mathsf {coefs}(a_k^{\prime \prime }(1/X))\right) \) and \(b'=b^{\prime \prime }_0\) is the constant term of \(b^{\prime \prime }\). Then \(\varphi _{\varvec{s'}}(\varvec{a'},b')\) (resp. \(\textsf {Err}(\varvec{a'},b')\)) is equal to the constant term of \(\varphi _{\varvec{s^{\prime \prime }}}(\varvec{a^{\prime \prime }},b^{\prime \prime })\) (resp. to the constant term of \(\textsf {Err}(\varvec{a^{\prime \prime }},b^{\prime \prime })\)). Moreover, \(\left\Vert \textsf {Err}(\varvec{a'},b')\right\Vert _\infty \le \left\Vert \textsf {Err}(\varvec{a^{\prime \prime }},b^{\prime \prime })\right\Vert _\infty \) and \(\textsf {Var}(\textsf {Err}(\varvec{a'},b'))\le \textsf {Var}(\textsf {Err}(\varvec{a^{\prime \prime }},b^{\prime \prime }))\).
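The extraction can be checked numerically. The following Python sketch is an illustration under simplifying assumptions: k = 1, torus elements stored as integers modulo \(2^{32}\), a naive quadratic negacyclic product, and hypothetical function names. It uses the identity \(X^{-j}=-X^{N-j}\) modulo \(X^N+1\) to read off the coefficients of \(a(1/X)\); for k = 1 the extracted key is simply the coefficient list of the polynomial key.

```python
Q = 1 << 32  # torus elements are stored as integers modulo 2^32


def negacyclic_mul(s, a):
    """Product of an integer polynomial s and a torus polynomial a mod X^N + 1."""
    n = len(a)
    out = [0] * n
    for i, si in enumerate(s):
        for j, aj in enumerate(a):
            if i + j < n:
                out[i + j] = (out[i + j] + si * aj) % Q
            else:  # X^N = -1, so wrapped terms change sign
                out[i + j - n] = (out[i + j - n] - si * aj) % Q
    return out


def tlwe_phase(a, b, s):
    """Phase b - s * a of a TLWE sample with k = 1 (a list of N coefficients)."""
    sa = negacyclic_mul(s, a)
    return [(bi - x) % Q for bi, x in zip(b, sa)]


def sample_extract(a, b):
    """LWE sample (a', b') whose phase is the constant term of the TLWE phase."""
    n = len(a)
    # Coefficients of a(1/X): a_0 stays, and X^-j = -X^(N-j) for j >= 1.
    a_ext = [a[0]] + [(-a[n - j]) % Q for j in range(1, n)]
    return a_ext, b[0]


def lwe_phase(a, b, s):
    """Phase b - <a, s> of an LWE sample."""
    return (b - sum(ai * si for ai, si in zip(a, s))) % Q
```

For any binary key `s` and any polynomials `a`, `b`, the value `lwe_phase(*sample_extract(a, b), s)` coincides with `tlwe_phase(a, b, s)[0]`, so the extraction indeed adds no noise.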
4.2 LWE to LWE KeySwitching Procedure
Given an \(\mathsf {LWE}_{\varvec{s'}}\) sample of a message \(\mu \in \mathbb {T}\), the key switching procedure initially proposed in [5, 7] outputs an \(\mathsf {LWE}_{\varvec{s}}\) sample of the same \(\mu \) without increasing the noise too much. Contrary to previous exact key-switching procedures, here we tolerate approximations.
Definition 4.2
Let \(\varvec{s}^\prime \in \{0,1\}^{n^\prime }\) and \(\varvec{s}\in \{0,1\}^{n}\), let \(\gamma \in \mathbb {R}\) be a noise parameter and \(t\in \mathbb {N}\) a precision parameter. We call key switching secret \(\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\) a sequence of fresh \(\mathsf {LWE}\) samples \(\mathsf {KS}_{i,j}\in \mathsf {LWE}_{\varvec{s},\gamma }(s_i'\cdot 2^{-j})\) for \(i\in [1,n']\) and \(j\in [1,t]\).
Lemma 4.3
(Key switching). Given \((\varvec{a'},b')\in \mathsf {LWE}_{\varvec{s}'}(\mu )\) where \(\varvec{s}'\in \{0,1\}^{n'}\) with noise \(\eta '=\left\Vert \textsf {Err}(\varvec{a'},b')\right\Vert _\infty \) and a key-switching key \(\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\), where \(\varvec{s}\in \{0,1\}^n\), the key switching procedure outputs an \(\mathsf {LWE}\) sample \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\) where \(\left\Vert \textsf {Err}(\varvec{a},b)\right\Vert _\infty \le \eta ' + n't\gamma + n'2^{-(t+1)}\).
Proof
Corollary 4.4
Let t be an integer parameter. Under Assumption 3.6, given \((\varvec{a'},b')\in \mathsf {LWE}_{\varvec{s'}}(\mu )\) with noise variance \(\eta '=\textsf {Var}(\textsf {Err}(\varvec{a'},b'))\) and a key switching key \(\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\), the key switching procedure outputs an LWE sample \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\) where \(\textsf {Var}(\textsf {Err}(\varvec{a},b))\le \eta ' + n't\cdot \textsf {Var}(\textsf {Err}(\mathsf {KS}_{i,j})) + n'2^{-2(t+1)}\).
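As an illustration of the approximate key switching, here is a small Python sketch. It is a toy model built on assumptions: torus elements are integers modulo \(2^{32}\), the key-switching samples encrypt \(s_i'\cdot 2^{-j}\), each mask coefficient is rounded to t bits, and all function names are hypothetical. With noiseless key-switching samples, the only error left is the rounding error, which is at most \(n'\cdot 2^{-(t+1)}\).

```python
import random

Q = 1 << 32  # torus elements are stored as integers modulo 2^32


def lwe_encrypt(mu32, s, rng, noise=0):
    """Toy LWE encryption of mu32 under the binary key s (uniform bounded noise)."""
    a = [rng.randrange(Q) for _ in s]
    e = rng.randint(-noise, noise)
    b = (sum(ai * si for ai, si in zip(a, s)) + mu32 + e) % Q
    return a, b


def lwe_phase(sample, s):
    a, b = sample
    return (b - sum(ai * si for ai, si in zip(a, s))) % Q


def make_ks_key(s_in, s_out, t, rng, noise=0):
    """ks[i][j-1] encrypts s_in[i] * 2^-j under s_out, for j = 1..t."""
    return [[lwe_encrypt(si * (Q >> j), s_out, rng, noise)
             for j in range(1, t + 1)] for si in s_in]


def key_switch(sample, ks, t):
    """Switch an LWE sample from s_in to s_out using t bits of each mask entry."""
    a_in, b_in = sample
    n_out = len(ks[0][0][0])
    a_out, b_out = [0] * n_out, b_in
    for i, ai in enumerate(a_in):
        # Round ai to the nearest multiple of 2^-t (result taken modulo 1).
        ai_rounded = ((ai + (1 << (31 - t))) >> (32 - t)) % (1 << t)
        for j in range(1, t + 1):
            if (ai_rounded >> (t - j)) & 1:  # bit of weight 2^-j
                ka, kb = ks[i][j - 1]
                a_out = [(x - y) % Q for x, y in zip(a_out, ka)]
                b_out = (b_out - kb) % Q
    return a_out, b_out
```

Calling `key_switch(lwe_encrypt(mu, s_in, rng), make_ks_key(s_in, s_out, t, rng), t)` yields a sample whose phase under `s_out` differs from `mu` by at most `len(s_in) * 2**(31 - t)` in 32-bit units.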
4.3 Bootstrapping Procedure
Given a \(\mathsf {LWE}\) sample \(\mathsf {LWE}_{\varvec{s}}(\mu )=(\varvec{a},b)\), the bootstrapping procedure constructs an encryption of \(\mu \) under the same key \(\varvec{s}\) but with a fixed amount of noise. As in [11], we will use \({\mathrm {TLWE}}\) as an intermediate encryption scheme to perform a homomorphic evaluation of the phase but here we will use its external product from Theorem 3.14 with a \({\mathrm {TGSW}}\) encryption of the key \(\varvec{s}\).
Definition 4.5
Let \(\varvec{s}\in \mathbb {B}^n\), \(\varvec{s^{\prime \prime }}\in \mathbb {B}_N[X]^k\) and \(\alpha \) be a noise parameter. We define the bootstrapping key \(\text {BK}_{\varvec{s}\rightarrow \varvec{s^{\prime \prime }},\alpha }\) as the sequence of n \({\mathrm {TGSW}}\) samples where \(\text {BK}_i\in {\mathrm {TGSW}}_{\varvec{s^{\prime \prime }},\alpha }(s_i)\).
We first provide a comparison between the bootstrapping of Algorithm 3 and the proposal of [11, Algorithms 1 and 2].

Like [11], we rescale the computation of the phase of the input \(\mathsf {LWE}\) sample so that it is modulo 2N (line 2) and we map all the corresponding operations in the multiplicative cyclic group \(\{ 1,X,\dots ,X^{2N-1} \}\). Since our \(\mathsf {LWE}\) samples are described over the real torus, the rescaling is done explicitly in line 2. This rescaling may induce a cumulated rounding error of amplitude at most \(\delta \approx \sqrt{n}/4N\) in the average case and \(\delta \le (n+1)/4N\) in the worst case. In the best case, this amplitude can decrease to zero (\(\delta =0\)) if, in the actual representation of \(\mathsf {LWE}\) samples, all the coefficients are restricted to multiples of \(\frac{1}{2N}\), which would be the analogue of the setting of [11].

As in [11], messages are encoded as roots of unity in \(\mathcal {R}\). Our accumulator is a \({\mathrm {TLWE}}\) sample instead of a \({\mathrm {TGSW}}\) sample as in [11]. Also, accumulator operations use the external product from Theorem 3.14 instead of the slower classical internal product. The test vector \((1+X+\dots +X^{N-1})\) is embedded in the accumulator from the very start, when the accumulator is still noiseless, while in [11] it is added at the very end. This removes a factor \(\sqrt{N}\) from the final noise overhead.

All the \({\mathrm {TGSW}}\) ciphertexts of \(X^{-\bar{a}_i s_i}\) required to update the accumulator internal value are computed dynamically as a very small polynomial combination of \(BK_i\) in the for loop (line 5). This completely removes the need to decompose each \(\bar{a}_i\) on an additional base \(B_r\), and to precompute all possibilities in the bootstrapping key. In other words, this makes our bootstrapping key 46 times smaller than in [11], for the exact same noise overhead. Besides, due to this squashing technique, two accumulator operations were performed per iteration in [11], instead of one in our case. This gives us an additional 2x speed-up.
Theorem 4.6
(Bootstrapping Theorem). Let \(\varvec{h}\in \mathcal {M}_{\ell (k+1),k+1}(\mathbb {T}_N[X])\) be the gadget defined in Eq. (1) and let \(Dec_{\varvec{h},\beta ,\epsilon }\) be the associated vector gadget decomposition function.
Let \(\varvec{s}\in \mathbb {B}^n\), \(\varvec{s^{\prime \prime }}\in \mathbb {B}_N[X]^k\) and \(\alpha ,\gamma \) be noise amplitudes. Let \(\text {BK}=\text {BK}_{\varvec{s}\rightarrow \varvec{s^{\prime \prime }},\alpha }\) be a bootstrapping key, let \(\varvec{s'}=\mathsf {KeyExtract}(\varvec{s^{\prime \prime }})\in \mathbb {B}^{kN}\) and \(\mathsf {KS}=\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\) be a key-switching secret.
Given \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\) for \(\mu \in \mathbb {T}\), and two fixed messages \(\mu _0,\mu _1\), Algorithm 3 outputs a sample in \(\mathsf {LWE}_{\varvec{s}}(\mu ')\) s.t. \(\mu '=\mu _0\) if \(|\varphi _{\varvec{s}}(\varvec{a},b)|<1/4-\delta \) and \(\mu '=\mu _1\) if \(|\varphi _{\varvec{s}}(\varvec{a},b)|> 1/4+\delta \), where \(\delta \) is the cumulated rounding error, equal to \(\frac{n+1}{4N}\) in the worst case and \(\delta =0\) if all the coefficients of \((\varvec{a},b)\) are multiples of \(\frac{1}{2N}\). Let \(\varvec{v}\) be the output of Algorithm 3. Then \(\left\Vert \textsf {Err}(\varvec{v})\right\Vert _\infty \le \sum _{j=1}^{n} \Big ( 2(k+1)\ell N \beta \left\Vert \textsf {Err}(\text {BK}_j)\right\Vert _\infty +(1+kN)\epsilon \Big ) + kNt\gamma + kN2^{-(t+1)}\).
Proof
Line 1: the division by two over the torus gives two possible values for \((\bar{\mu },\bar{\mu }')\). In both cases, \(\bar{\mu }+\bar{\mu }'=\mu _0\) and \(\bar{\mu }-\bar{\mu }'=\mu _1\).
At line 3, the test vector \(\text {testv}\) is defined such that for all \(p\in [0,2N]\), the constant term of \(X^{p}\cdot \text {testv}\) is \(\bar{\mu }'\) if \(p\in ]\!]-\frac{N}{2},\frac{N}{2}[\![\) (modulo 2N) and \(-\bar{\mu }'\) otherwise.
In the for loop (lines 5 to 6), we prove the following invariant: at the beginning of iteration \(i+1\in [1,n+1]\) (i.e. at the end of iteration i), \(\textsf {msg}({ACC}_i)=X^{\bar{b}-\sum _{j=1}^{i}\bar{a}_j s_j}\cdot \text {testv}\) and \(\left\Vert \textsf {Err}({ACC}_i)\right\Vert _\infty \le \sum _{j=1}^i \Big ( 2(k+1)\ell N \beta \left\Vert \textsf {Err}(\text {BK}_j)\right\Vert _\infty +(1+kN)\epsilon \Big )\).
At the beginning of iteration \(i=1\), the accumulator contains a trivial ciphertext \((\varvec{0},X^{\bar{b}}\cdot \text {testv})\), so \(\left\Vert \textsf {Err}({ACC}_0)\right\Vert _\infty =0\).
After \(\textsf {SampleExtract}\) (line 7), the message of u is equal to the constant term of the message of \({ACC}_n\), i.e. of \(X^{\bar{\varphi }}\cdot \text {testv}\) where \(\bar{\varphi }=\bar{b}-\sum _{i=1}^n \bar{a}_i s_i\). If \(\bar{\varphi }\in [\![-N/2,N/2[\![\), the constant term is equal to \(\bar{\mu }'\), and to \(-\bar{\mu }'\) otherwise.
In other words, if \(|\varphi _{\varvec{s}}(\varvec{a},b)|< 1/4-\delta \), then \(\varphi _{\varvec{s}}(\varvec{a},b) < 1/4-\delta \) and \(\varphi _{\varvec{s}}(\varvec{a},b) \ge -1/4+ \delta \), and thus, using Eq. (2), we obtain that \(\bar{\varphi }\in ]\!]-\frac{N}{2},\frac{N}{2}[\![\), so the message of u is equal to \(\bar{\mu }'\). And if \(|\varphi _{\varvec{s}}(\varvec{a},b)|> 1/4+\delta \), then \(\varphi _{\varvec{s}}(\varvec{a},b)>1/4+\delta \) or \(\varphi _{\varvec{s}}(\varvec{a},b)<-1/4-\delta \), and using Eq. (2), we obtain that the message of u is equal to \(-\bar{\mu }'\).
Since \(\textsf {SampleExtract}\) does not add extra noise, \(\left\Vert \textsf {Err}(\varvec{u})\right\Vert _\infty \le \left\Vert \textsf {Err}({ACC}_n)\right\Vert _\infty \). Since the KeySwitch procedure preserves the message, the message of \(v=\mathsf {KeySwitch}_{\mathsf {KS}}(\varvec{u})\) is equal to the message of u, and \(\left\Vert \textsf {Err}(\varvec{v})\right\Vert _\infty \le \left\Vert \textsf {Err}(\varvec{u})\right\Vert _\infty +kNt\gamma +kN2^{-(t+1)}\). \(\square \)
Corollary 4.7
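Let \(V_{\text {BK}}=\textsf {Var}(\textsf {Err}(\text {BK}_i))=2/\pi \cdot \alpha ^2\) and \(V_{\mathsf {KS}}=\textsf {Var}(\textsf {Err}(\mathsf {{KS}_i}))=2/\pi \cdot \gamma ^2\). Under the same conditions as Theorem 4.6, and under Assumption 3.6, the variance of the output v of Algorithm 3 satisfies \(\textsf {Var}(\textsf {Err}(\varvec{v}))\le 2n(k+1)\ell N\beta ^2 V_{\text {BK}} + n(1+kN)\epsilon ^2 + kNtV_{\mathsf {KS}} + kN2^{-2(t+1)}\).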
Proof
The proof is the same as that of the bound on \(\left\Vert \textsf {Err}(\varvec{v})\right\Vert _\infty \), replacing all \(\left\Vert \cdot \right\Vert _\infty \) inequalities by \(\textsf {Var}(\cdot )\) inequalities. \(\square \)
4.4 Application to Circuits

\(\mathrm {HomNOT}(\varvec{c}) = (\varvec{0},\frac{1}{4})-\varvec{c}\) (no bootstrapping is needed);

\(\mathrm {HomAND}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( (\varvec{0},-\frac{1}{8})+\varvec{c_1}+\varvec{c_2} \right) \);

\(\mathrm {HomNAND}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( (\varvec{0},\frac{5}{8})-\varvec{c_1}-\varvec{c_2} \right) \);

\(\mathrm {HomOR}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( (\varvec{0},\frac{1}{8})+\varvec{c_1}+\varvec{c_2} \right) \);

\(\mathrm {HomXOR}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( 2\cdot (\varvec{c_1}-\varvec{c_2}) \right) \).
The \(\mathrm {HomXOR}(\varvec{c_1},\varvec{c_2})\) gate can also be achieved by performing \(\textsf {Bootstrap}\left( 2\cdot (\varvec{c_1}+\varvec{c_2}) \right) \).
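The gate formulas can be sanity-checked on noiseless phases. The Python sketch below is only a plaintext model under explicit assumptions: bits are encoded as 0 (false) and 1/4 (true), Bootstrap is replaced by a noise-free threshold returning \(\mu _0=0\) when the centered phase has absolute value below 1/4 and \(\mu _1=1/4\) otherwise, and the constants are \(-\frac{1}{8}\) for AND, \(\frac{5}{8}\) for NAND and \(\frac{1}{8}\) for OR. Function names are hypothetical.

```python
from fractions import Fraction as F


def centered(x):
    """Representative of a torus element in [-1/2, 1/2)."""
    return (x + F(1, 2)) % 1 - F(1, 2)


def bootstrap(phase, mu0=F(0), mu1=F(1, 4)):
    """Noise-free stand-in for Bootstrap: mu0 if |phase| < 1/4, mu1 otherwise."""
    return mu0 if abs(centered(phase)) < F(1, 4) else mu1


def encode(bit):
    """Assumed encoding: false -> 0, true -> 1/4."""
    return F(1, 4) if bit else F(0)


def decode(x):
    return centered(x) == F(1, 4)


def hom_not(c):
    return (F(1, 4) - c) % 1  # no bootstrapping needed


def hom_and(c1, c2):
    return bootstrap(F(-1, 8) + c1 + c2)


def hom_nand(c1, c2):
    return bootstrap(F(5, 8) - c1 - c2)


def hom_or(c1, c2):
    return bootstrap(F(1, 8) + c1 + c2)


def hom_xor(c1, c2):
    return bootstrap(2 * (c1 - c2))


def hom_xor_alt(c1, c2):
    return bootstrap(2 * (c1 + c2))
```

In every gate, the phase handed to the threshold sits at distance at least 1/8 from the decision boundary \(\pm \frac{1}{4}\), which is the margin that real ciphertext noise has to stay within.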
4.5 Parameters Implementation and Timings
In this section, we review our implementation parameters and provide a comparison with previous works.
Samples. From a theoretical point of view, our scale-invariant scheme is defined over the real torus \(\mathbb {T}\), where all the operations are modulo 1. In practice, since we can work with approximations, we chose to rescale the elements over \(\mathbb {T}\) by a factor \(2^{32}\), and to map them to 32-bit integers. Thus, we take advantage of the native and automatic mod \(2^{32}\) operations, including for the external multiplication with integers. Except for some FFT operations, this seems more stable and efficient than working with floating point numbers and reducing modulo 1 regularly. Polynomials mod \(X^N+1\) are represented either as the classical list of the N coefficients, or using the Lagrange half-complex representation, which consists of the complex (\(2\cdot 64\)-bit) evaluations of the polynomial over the roots of unity \(\exp (i(2j+1)\pi /N)\) for \(j\in [\![0,\frac{N}{2}[\![\). Indeed, the \(\frac{N}{2}\) other evaluations are the conjugates of the first ones, and do not need to be stored. The conversion between both representations is done via Fast Fourier Transform (FFT) (using the library FFTW [12], also used by [11]). Note that the direct FFT transform is \(\sqrt{2N}\)-Lipschitz, so the Lagrange half-complex representation tolerates approximations, and 53 bits of precision is indeed more than enough, provided that the real representative remains small. However, the modulo 1 reduction of the coefficients of torus polynomials cannot be applied in the Lagrange representation: we need to perform regular transformations to and from the classical representation. Luckily, this does not represent an overhead, since these conversions are needed anyway, at each iteration of the bootstrapping, in order to decompose the accumulator in base \(\varvec{h}\).
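The fixed-point torus representation described above can be sketched in a few lines of Python. Since Python integers do not wrap around, the reduction modulo \(2^{32}\) that a native 32-bit type performs for free is written explicitly here; the helper names are hypothetical.

```python
Q = 1 << 32  # scaling factor: a torus element x is stored as round(x * 2^32)


def to_torus32(x):
    """Map a real number (taken modulo 1) to its 32-bit representative."""
    return round((x % 1.0) * Q) % Q


def from_torus32(t):
    """Map a 32-bit representative back to a real number in [0, 1)."""
    return t / Q


def torus_add(a, b):
    """Addition modulo 1 becomes addition modulo 2^32."""
    return (a + b) % Q


def torus_mul_int(k, a):
    """External multiplication of a torus element by an integer k."""
    return (k * a) % Q
```

For example, `torus_add(to_torus32(0.75), to_torus32(0.5))` gives the representative of 0.25, and `torus_mul_int(3, to_torus32(0.375))` gives the representative of 0.125, without any explicit reduction modulo 1.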
Parameters. We take the same or even stronger security parameters as [11], but we adapt them to our notations. We used \(n = 500\), \(N = 1024\), \(k = 1\).

\(\mathsf {LWE}\) samples: \(32 \cdot (n+1)\) bits \(\approx \) 2 KBytes.
The masks of all \(\mathsf {LWE}\) samples (initial and KeySwitch) are clamped to multiples of \(\frac{1}{2048}\). Therefore, the phase computation in the bootstrapping is exact (\(\delta =0\)).

\({\mathrm {TLWE}}\) samples: \((k+1) \cdot N \cdot 32\) bits \(\approx \) 8 KBytes.

\({\mathrm {TGSW}}\) samples: \((k+1) \cdot \ell \) \({\mathrm {TLWE}}\) samples \(\approx \) 48 KBytes.
To define \(\varvec{h}\) and \(\text {Dec}_{\varvec{h},\beta ,\epsilon }\), we used \(\ell = 3\), \(B_g = 1024\), so \(\beta =512\) and \(\epsilon =2^{-31}\).

Bootstrapping Key: n \({\mathrm {TGSW}}\) samples \(\approx \) 23.4 MBytes.
We used \(\alpha = 9.0 \cdot 10^{-9}\). Since we have a lower noise overhead, our parameter is higher than the parameter \(\approx 3.25\cdot 10^{-10}\) of [11] (i.e. ours is more secure), but in return, our \({\mathrm {TLWE}}\) key is binary. See Sect. 6 for more details on the security analysis.

Key Switching Key: \(k \cdot N \cdot t\) \(\mathsf {LWE}\) samples \(\approx \) 29.2 MBytes.
We used \(\gamma = 3.05 \cdot 10^{-5}\), \(t = 15\) (the decomposition in the key switching has a precision of \(2^{-16}\)).

Correctness: The final error variance after bootstrapping is \(9.24\cdot 10^{-5}\), by Corollary 4.7. It corresponds to a standard deviation of \(\sigma =0.00961\). In [11], the final standard deviation is larger (0.01076). In other words, the noise amplitude after our bootstrapping is \(<\frac{1}{16}\) with very high probability \(\mathsf {erf}(1/(16\sqrt{2}\sigma ))\ge 1-2^{-33.56}\) (this is comparable to the probability \(\ge 1-2^{-32}\) in [11]).
Note that the size of the key switching key can be reduced by a factor \(n+1=501\) if all the masks are the output of a pseudo random function; we may for instance just give the seed. The same technique can be applied to the bootstrapping key, on which the size is only reduced by a factor \(k+1=2\).
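The correctness figures above can be reproduced with a short computation. The Python sketch below rests on assumptions that should be read as such: the variance bound of Corollary 4.7 is taken to be \(2n(k+1)\ell N\beta ^2 V_{\text {BK}} + n(1+kN)\epsilon ^2 + kNtV_{\mathsf {KS}} + kN2^{-2(t+1)}\) with \(V_{\text {BK}}=2/\pi \cdot \alpha ^2\) and \(V_{\mathsf {KS}}=2/\pi \cdot \gamma ^2\), and the parameters are read as \(\alpha =9.0\cdot 10^{-9}\), \(\gamma =3.05\cdot 10^{-5}\), \(\epsilon =2^{-31}\). Function and variable names are hypothetical.

```python
import math


def bootstrap_variance(n, N, k, ell, beta, eps, alpha, gamma, t):
    """Assumed average-case bound on the output variance of the bootstrapping."""
    v_bk = 2 / math.pi * alpha ** 2   # variance of a bootstrapping-key sample
    v_ks = 2 / math.pi * gamma ** 2   # variance of a key-switching sample
    return (2 * n * (k + 1) * ell * N * beta ** 2 * v_bk   # external products
            + n * (1 + k * N) * eps ** 2                   # approximate decomposition
            + k * N * t * v_ks                             # key-switching samples
            + k * N * 2.0 ** (-2 * (t + 1)))               # key-switching rounding


PARAMS = dict(n=500, N=1024, k=1, ell=3, beta=512, eps=2.0 ** -31,
              alpha=9.0e-9, gamma=3.05e-5, t=15)


def failure_log2(sigma, bound=1 / 16):
    """log2 of the Gaussian probability that the noise amplitude exceeds bound."""
    return math.log2(math.erfc(bound / (math.sqrt(2) * sigma)))
```

With these inputs, `math.sqrt(bootstrap_variance(**PARAMS))` is close to the standard deviation 0.00961 quoted above, and `failure_log2` of that value lies between -34 and -33. The first term, coming from the n external products, accounts for roughly nine tenths of the total.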
Operation  #(Classical products)  #(FFT + Lagrange repr.)
External product  12  8
Bootstrapping  6000  4006
Bootstrapping in [11]  (72000)  48000
In practice, we obtained a running time of 52 ms per bootstrapping using the Lagrange half-complex representation. This is consistent with the 12x speed-up predicted by the table. Profiling the execution shows that the FFTs and complex multiplications still take more than 90 % of the total time. Other operations like the key switching have a negligible running time compared to the main loop of the bootstrapping.
5 Leveled Homomorphic Encryption
In the previous section, we showed how to accelerate the bootstrapping computation in FHE. In this section, we focus on the improvement of Leveled Homomorphic encryption schemes. We present an efficient way to evaluate any deterministic automata homomorphically.
5.1 Boolean Circuits Interpretation
In order to express our external product in a circuit, we consider two kinds of wires: control wires, which encode either a small integer or a small integer polynomial and are represented by a \({\mathrm {TGSW}}\) sample; and data wires, which encode either a sample in \(\mathbb {T}\) or in \(\mathbb {T}_N[X]\) and are represented by a \({\mathrm {TLWE}}\) sample. The gates we present contain three kinds of slots: control input, data input and data output. In the following, the rule to build valid circuits is that all control wires are freshly generated by the user, and the data input ports of our gates can be either freshly generated or connected to a data output of another gate.
We now give an interpretation of our leveled scheme, to simulate boolean circuits only. In this case, the message space of the input \({\mathrm {TLWE}}\) samples will be restricted to \(\{0,\frac{1}{2}\}\), and the message space of control gates to \(\{0,1\}\).

The constant source \(\mathtt {Cst}(\mu )\) for \(\mu \in \{0,\frac{1}{2}\}\) is defined with a single data output equal to \((\varvec{0},\mu )\).

The negation gate \(\mathtt {Not}(\varvec{d})\) takes a single data input \(\varvec{d}\) and outputs \((\varvec{0},\frac{1}{2})-\varvec{d}\).

The controlled And gate \(\mathtt {CAnd}(C,\varvec{d})\) takes one control input C and one data input \(\varvec{d}\), and outputs \(C\boxdot \varvec{d}\).

The controlled Mux gate \(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0})\) takes one control input C and two data inputs \(\varvec{d_1},\varvec{d_0}\) and returns \(C \boxdot (\varvec{d_1}-\varvec{d_0}) + \varvec{d_0}\).
Unlike classical circuits, these gates have to be composed with each other depending on the type of inputs/outputs. In our applications, the \({\mathrm {TGSW}}\) encryptions are always fresh ciphertexts.
Theorem 5.1
(Correctness). Let \(\mu \in \{0,\frac{1}{2}\}\), \(\varvec{d},\varvec{d_1},\varvec{d_0}\in {\mathrm {TLWE}}_{\varvec{s}}(\{0,\frac{1}{2}\})\) and \(C\in {\mathrm {TGSW}}_{\varvec{s}}(\{0,1\})\).
Theorem 5.2
Proof

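\(Dec_{\varvec{h},\beta ,\epsilon }(\varvec{d_1}-\varvec{d_0})\cdot \varvec{z_C}\), where \(\varvec{z_C}\) denotes the error of C, of norm \(\le (k+1)\ell N\beta \eta _C\);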

\(\mu _C \varvec{\epsilon _{\text {dec}}}\) of norm \(\le (kN+1)\epsilon \);

\(z_{d_0} + \mu _C(z_{d_1}-z_{d_0})\), which is either \(z_{d_1}\) or \(z_{d_0}\), depending on the value of \(\mu _C\);

\(\mu _{d_0}+\mu _C\cdot (\mu _{d_1}-\mu _{d_0})\), which is the output message \(\mu _C\text {?}\mu _{d_1}\text {:}\mu _{d_0}\), and is not part of the noise.
Thus, summing the three terms concludes the proof. \(\square \)
Corollary 5.3
Proof
Same as Theorem 5.2, replacing all norm inequalities by Variance inequalities. \(\square \)
We now obtain theorems which are analogous to those of [13], with a bit less noise on the mux gate, but with the additional restriction that CAnd and CMux have a control wire, which must necessarily be a fresh \({\mathrm {TGSW}}\) ciphertext.
The next step is to understand the meaning of this additional restriction in terms of expressiveness of the resulting homomorphic circuits.
It is clear that we cannot build an arbitrary boolean circuit and just apply the noise recurrence formula from Theorem 5.2 or Corollary 5.3 to get the output noise level. Indeed, it is not allowed to connect a data wire to a control input.
In the following section, we will show that we can still obtain the two most important circuits of [13], namely the deterministic automata circuits, which can evaluate any permutation of regular languages with noise propagation linear in the word length, and the lookup table, which evaluates arbitrary functions with sublinear noise propagation.
5.2 Deterministic Automata
It is folklore that every deterministic program which reads its input bit-by-bit in a predetermined order, uses less than B bits of memory, and produces a boolean answer, is equivalent to a deterministic automaton of at most \(2^B\) states (independently of the time complexity). This is in particular the case for every boolean function of p variables, which can be trivially executed with \(p-1\) bits of internal memory by reading and storing its input bit-by-bit before returning the final answer. It is of particular interest for most arithmetic functions, like addition, multiplication, or CRT operations, whose naive evaluation only requires \(O(\log (p))\) bits of internal memory.
Let \(\mathcal {A}=(Q,i,T_0,T_1,F)\) be a deterministic automaton (over the alphabet \(\{0,1\}\)), where Q is the set of states, \(i\in Q\) denotes the initial state, \(T_0,T_1\) are the two (deterministic) transition functions from Q to Q and \(F\subset Q\) is the set of final states. Such an automaton is used to evaluate (rational) boolean functions on words, where the image of \((w_1,\dots ,w_p)\in \mathbb {B}^p\) is equal to 1 iff \(T_{w_p}(T_{w_{p-1}}(\dots (T_{w_{1}}(i))))\in F\), and 0 otherwise.
Following the construction of [13], we show that we are able to efficiently evaluate any deterministic automaton homomorphically using only constant and CMux gates. The noise propagation remains linear in the length of the word w, but compared to [13, Theorem 7.11], we reduce the number of evaluated CMux gates by a factor \(|w|\) for a specific class of acyclic automata that are linked to fixed-time algorithms.
Theorem 5.4
(Evaluating Deterministic Automata). Let \(\mathcal {A}=(Q,i,T_0,T_1,F)\) be a deterministic automaton. Given p valid \({\mathrm {TGSW}}\) samples \(C_1,\dots ,C_p\) encrypting the bits of a word \(\varvec{w}\in \mathbb {B}^p\), with noise amplitude \(\left\Vert \textsf {Err}(C_i)\right\Vert _\infty \le \eta \) and variance \(\textsf {Var}(\textsf {Err}(C_i))\le \vartheta \), by evaluating at most \(p\#Q\) CMux gates, one can produce a TLWE sample \(\varvec{d}\) which encrypts \(\frac{1}{2}\) iff \(\mathcal {A}\) accepts \(\varvec{w}\), and 0 otherwise, such that \(\left\Vert \textsf {Err}(\varvec{d})\right\Vert _\infty \le p\cdot ((k+1)\ell N\beta \eta + (kN+1)\epsilon )\). Assuming Heuristic 3.6, \(\textsf {Var}(\textsf {Err}(\varvec{d}))\le p\cdot ((k+1)\ell N\beta ^2\vartheta + (kN+1)\epsilon ^2)\). Furthermore, the number of evaluated \(\mathtt {CMux}\) gates can be decreased to \(\le \#Q\) if \(\mathcal {A}\) satisfies either one of the conditions:
(i) for all \(q\in Q\) (except KO states), all the words that connect i to q have the same length;
(ii) \(\mathcal {A}\) only accepts words of the same length.
Proof
We initialize \(\#Q\) noiseless ciphertexts \(\varvec{d}_{q,p}\) for \(q \in Q\) with \(\varvec{d}_{q,p}=(\varvec{0},\frac{1}{2})=\mathtt {Cst}(\frac{1}{2})\) if \(q\in F\) and \(\varvec{d}_{q,p}=(\varvec{0},0)=\mathtt {Cst}(0)\) otherwise. Then for each letter of \(\varvec{w}\), we map the transitions as follows, for all \(q\in Q\) and \(j\in [\![1,p ]\!]\): \(\varvec{d}_{q,j-1}=\mathtt {CMux}(\varvec{C_j},\varvec{d_{T_1(q),j}},\varvec{d_{T_0(q),j}})\). Finally, we output \(\varvec{d_{i,0}}\).
For the complexity, each \(\varvec{d}_{q,j}\) for \(q\in Q\) and \(j\in [\![0,p-1 ]\!]\) is computed with a single \(\mathtt {CMux}\). By applying the noise propagation inequalities of Theorem 5.2 and Corollary 5.3, it follows by an immediate induction on j from p down to 0 that for all \(j\in [\![0,p ]\!]\), \(\left\Vert \textsf {Err}(\varvec{d_{q,j}})\right\Vert _\infty \le (p-j)\cdot ((k+1)\ell N\beta \eta + (kN+1)\epsilon )\) and \(\textsf {Var}(\textsf {Err}(\varvec{d_{q,j}}))\le (p-j)\cdot ((k+1)\ell N\beta ^2\vartheta + (kN+1)\epsilon ^2)\).
Note that it is sufficient to evaluate only the \(\varvec{d_{q,j}}\) for which q is accessible by at least one word of length j. Thus, if \(\mathcal {A}\) satisfies the additional condition (i), then for each \(q\in Q\), we only need to evaluate \(\varvec{d_{q,j}}\) for at most one position j. Thus, we evaluate less than \(\#Q\) CMux gates in total.
Finally, if \(\mathcal {A}\) satisfies (ii), then we first compute the minimal deterministic automaton of the same language (removing the KO state if it is present). By an immediate proof by contradiction, this minimal automaton satisfies (i), and has less than \(\#Q\) states. \(\square \)
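The backward evaluation used in this proof can be mimicked on plaintexts. The Python sketch below is a toy model: control bits are plain integers in \(\{0,1\}\), data values are plain numbers in \(\{0,\frac{1}{2}\}\), and `cmux` is a plaintext stand-in for \(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0})=C\boxdot (\varvec{d_1}-\varvec{d_0})+\varvec{d_0}\). Function names and the dictionary-based automaton encoding are hypothetical choices for illustration.

```python
def cmux(c, d1, d0):
    """Plaintext model of CMux: returns d1 if c == 1 and d0 if c == 0."""
    return c * (d1 - d0) + d0


def evaluate_automaton(states, initial, t0, t1, final, word):
    """Evaluate a deterministic automaton with CMux gates only.

    d[q] at step j is the answer we would get if the automaton were in
    state q after reading j letters. We start from j = p and go down to 0.
    Returns the output value (1/2 if accepted, 0 otherwise) and the number
    of CMux gates evaluated.
    """
    d = {q: (0.5 if q in final else 0.0) for q in states}  # d_{q,p}
    n_cmux = 0
    for bit in reversed(word):  # letters w_p, w_{p-1}, ..., w_1
        # d_{q,j-1} = CMux(C_j, d_{T1(q),j}, d_{T0(q),j})
        d = {q: cmux(bit, d[t1[q]], d[t0[q]]) for q in states}
        n_cmux += len(states)
    return d[initial], n_cmux
```

As a usage example, the automaton with states 0, 1, 2, transitions `t0 = {0: 0, 1: 0, 2: 2}` and `t1 = {0: 1, 1: 2, 2: 2}`, initial state 0 and final set `{2}` accepts exactly the words containing two consecutive ones; evaluating it on a word of length p uses 3p plaintext CMux operations, matching the \(p\#Q\) count of the generic case.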
For the sake of completeness, since every boolean function with p variables can be evaluated by an automaton (one accepting only words of length p), we obtain the evaluation of arbitrary boolean functions as an immediate corollary, which is the leveled variant of [13, Corollary 7.9].
Lemma 5.5
(Arbitrary Functions). Let f be any boolean function with p inputs, and \(\varvec{c_1},\dots ,\varvec{c_p}\) be p \({\mathrm {TGSW}}_{\varvec{s}}(\{0,1\})\) ciphertexts of \(x_1,\dots ,x_p\in \{0,1\}\), with noise \(\left\Vert \textsf {Err}(\varvec{c_i})\right\Vert _\infty \le \eta \). Then the CMux-based Reduced Binary Decision Diagram of f computes a \({\mathrm {TLWE}}_{\varvec{s}}\) ciphertext \(\varvec{d}\) of \(\frac{1}{2}f(x_1, \dots ,x_p)\) with noise \(\left\Vert \textsf {Err}(\varvec{d})\right\Vert _\infty \le p\cdot ((k+1)\ell N\beta \eta + (kN+1)\epsilon )\) by evaluating \(\mathcal {N}(f)\le 2^p\) CMux gates, where \(\mathcal {N}(f)\) is the number of distinct partial functions \((x_l,\dots ,x_p)\rightarrow f(x_1,\dots ,x_p)\) for all \(l\in [\![1,p+1]\!], (x_1,\dots ,x_{l-1})\in \mathbb {B}^{l-1}\).
Proof
(sketch). A trivial automaton which evaluates f consists of its full binary decision tree, with the initial state \(i=q_{0,0}\) as the root. Each state \(q_{l,j}\) at depth \(l\in [\![0,p-1]\!]\), with \(j\in [\![0,2^l-1]\!]\), is connected with \(T_0(q_{l,j})=q_{l+1,2j}\) and \(T_1(q_{l,j})=q_{l+1,2j+1}\), and at depth p, \(q_{p,j}\in F\) iff \(f(x_1,\dots ,x_p)=1\) where \(j=\sum _{l=1}^{p} x_l2^{p-l}\). The minimal version of this automaton has at most \(\mathcal {N}(f)\) states; the rest follows from Theorem 5.4. \(\square \)
Application: Compilation for Leveled Homomorphic Circuits. We now give an example of how we can map a problem to an automaton in order to perform a leveled homomorphic evaluation. We will illustrate this concept on the computation of the p-th bit of an integer product \(a\times b\) where a and b are given in base 2. We do not claim that the automata approach is the fastest way to solve the problem: arithmetic circuits based on bitDecomp/recomposition are likely to be faster. But the goal is to clarify the generality and simplicity of the process. All we need is a fixed-time algorithm that solves the problem using the least possible memory. Among all algorithms that compute a product, the most naive ones are in general the best: here, we choose the elementary-school multiplication algorithm that computes the product bit-by-bit, starting from the LSB, and counting the current carry with the fingers. The pseudocode of this algorithm is recalled in Algorithm 4. The pseudocode is almost given as a deterministic automaton, since each step reads a single input bit, and uses it to update its internal state (x, y), which can be stored in only \(M=\log _2(4p)\) bits of memory. More precisely, the states Q of the corresponding automaton \(\mathcal {A}\) would be all (j, (x, y)) where \(j\in [\![0,j_{\max }]\!]\) is the step number (i.e. number of reads from the beginning) and \((x,y)\in \mathbb {B}\times [\![0,2p[\![\) are the 4p possible values of the internal memory. The initial state is (0, 0, 0), the total number of reads \(j_{\max }\) is \(\le p^2\), and the final states are all \((j_{\max },x,y)\) where y is odd. This automaton satisfies condition (i), since a state (j, x, y) can only be reached after reading j inputs, so by Theorem 5.4, the output can be homomorphically computed by evaluating less than \(\#Q\le 4p^3\) CMux gates, with some O(p) noise overhead. The number of CMux gates can decrease by a factor 8 by minimizing the automaton.
Using the same parameters as the bootstrapping key, for \(p=32\), evaluating one CMux gate takes about 0.0002 s, so the whole program (16384 CMux gates) would be homomorphically evaluated in about 3.2 s.
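The arithmetic behind this estimate can be retraced as follows. This is a back-of-the-envelope sketch: the variable names are illustrative, and the per-gate time is the rounded value quoted above.

```python
p = 32
states_bound = 4 * p ** 3                     # bound on #Q: 131072
cmux_after_minimization = states_bound // 8   # factor-8 reduction: 16384
seconds_per_cmux = 0.0002                     # rounded timing from the text
total_seconds = cmux_after_minimization * seconds_per_cmux
print(cmux_after_minimization, total_seconds)
```

With the rounded per-gate time the product comes out at roughly 3.3 s, in line with the order of magnitude quoted in the text.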
6 Practical Security Parameters
For an asymptotic security analysis, since the phase is Lipschitz, \({\mathrm {TLWE}}\) samples can be equivalently mapped to their closest bin-LWE (or bin-RingLWE) samples, which in turn can be reduced to standard LWE/ring-LWE with full secret using the modulus-dimension reduction [6] or group-switching techniques [13]. These can then be reduced to worst-case BDD instances. It is also easy to write a direct and tighter search-to-decision reduction for \({\mathrm {TLWE}}\), or a direct worst-case to average-case reduction from \({\mathrm {TLWE}}\) to GapSVP or BDD.
In this section, we rather focus on the practical hardness of LWE, and ultimately express the security parameter \(\lambda \) directly as a function of the entropy of the secret n and the error rate \(\alpha \).
Our analysis is based on the work described in [2]. That paper studies many attacks against LWE, ranging from a direct BDD approach with standard lattice reduction or sieving, to a variant of BKW [4], to resolution via meet-in-the-middle attacks. Unfortunately, the authors found that there is no single best attack. According to their results [2, Sect. 8, Tables 7 and 8], for the range of dimensions and noise used for FHE, the SIS-distinguisher attack is often the best candidate (it is related to the Lindner-Peikert model [17], and is also used in the parameter estimation of [11]). However, since q is not a parameter in our definition of \({\mathrm {TLWE}}\), we need to adapt their results. This section relies on the following heuristics concerning the experimental behaviour of lattice reduction algorithms. They have been extensively verified and used in practice.
 1.
The fastest lattice reduction algorithms in practice are blockwise lattice algorithms (like BKZ-2.0 [8], DBKZ [20], or slide reduction with large blocksize [14, 20]).
 2.
Practical blockwise lattice reduction algorithms have an intrinsic quality \(\delta >1\) (which depends on the blocksize): given an m-dimensional real basis B of volume V, they compute short vectors of norm \(\delta ^m V^{1/m}\).
 3.
The running time of BKZ-2.0 (expressed in bit operations) as a function of the quality parameter is \(\log _2(t_{\text {BKZ}})(\delta ) = \frac{0.009}{\log _2(\delta )^2}-27\) (according to the extrapolation by Albrecht et al. [1] of the Liu-Nguyen datasets [18]).
 4.
The coordinates of vectors produced by lattice reduction algorithms are balanced. Namely, if the algorithm produces vectors of norm \(\left\| v\right\| _2\), each coefficient has a marginal Gaussian distribution of standard deviation \(\left\| v\right\| _2/\sqrt{n}\). Provided that the geometry of the lattice is not too skewed in particular directions, this fact can sometimes be proved, especially if the reduction algorithm samples vectors with Gaussian distribution over the input lattice. This simple fact is at the heart of many attacks based on Coppersmith techniques with lattices.
 5.
For mid-range dimensions and polynomially small noise, the SIS-distinguisher plus lattice reduction, combined with the search-to-decision reduction, is the best attack against LWE. This point is less clear, but according to the analysis of [1], this attack model at least tends to overestimate the power of the attacker, so it should produce more conservative parameters.
 6.
Except for small polynomial speedups in the dimension, we do not know better algorithms to find short vectors in random anti-circulant lattices than generic algorithms. This folklore assumption still seems up to date at the time of writing.
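Heuristics 2 and 3 are simple closed-form expressions, so they can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, and the two values of \(\delta \) are the ones compared with [11] later in this section.

```python
import math

def log2_bkz_time(delta):
    """log2 of the BKZ-2.0 running time for quality delta > 1 (heuristic 3)."""
    return 0.009 / math.log2(delta) ** 2 - 27.0

def short_vector_norm(delta, m, volume):
    """Norm delta^m * V^(1/m) of the short vector found in an
    m-dimensional lattice of volume V (heuristic 2)."""
    return delta ** m * volume ** (1.0 / m)

for delta in (1.0055, 1.0064):
    print(delta, round(log2_bkz_time(delta), 1))
```

Under this model, a smaller \(\delta \) (better reduction quality) is sharply more expensive: moving from 1.0064 to 1.0055 raises the estimated cost of the reduction from about \(2^{79}\) to about \(2^{117}\).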
If one finds a small integer combination that cancels the mask of homogeneous LWE samples, one may use it to distinguish them from uniformly chosen random samples. If this distinguisher has a small advantage \(\varepsilon \), we repeat it about \(1/\varepsilon ^2\) times. Then, thanks to the search-to-decision reduction (which is particularly tight with our TLWE formulation), each successful answer of the distinguisher reveals one secret key bit. To handle the continuous torus, and since q is not a parameter of \({\mathrm {TLWE}}\) either, we show how to extend the analysis of [2] to our scheme.
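The effect of the \(1/\varepsilon ^2\) repetitions on the attack cost can be sketched as follows. The function names are hypothetical, and `log2_total_cost` assumes that each run of the distinguisher costs one lattice reduction; this additive model is an assumption of the sketch, not a formula taken from the text.

```python
import math

def repetitions(eps):
    """Number of distinguisher runs needed for advantage eps (about 1/eps^2)."""
    return math.ceil(1.0 / eps ** 2)

def log2_repetitions(eps):
    """Same quantity on a log2 scale: 2 * log2(1/eps)."""
    return -2.0 * math.log2(eps)

def log2_total_cost(log2_reduction_cost, eps):
    """Assumed model: one lattice reduction per distinguisher run."""
    return log2_reduction_cost + log2_repetitions(eps)
```

For example, an advantage of \(2^{-10}\) adds 20 bits to the cost of a single lattice reduction in this model.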
The direct approach is to apply the fastest algorithm (BKZ-2.0 or slide reduction) directly to \(f_q(B)\), which outputs a vector \(f_q(\varvec{w})\) of standard deviation \(\delta ^{n+m}/\sqrt{n+m}\) where \(\delta \in ]1,1.1]\) is the quality of the reduction.
Once we have a vector \(\varvec{w}\), all we need is to analyse the term \(\sum _{i=1}^m v_i b_i = \sum _{i=1}^m v_i (\varvec{a_i}\cdot \varvec{s} + e_i) = \varvec{s}\cdot \sum _{i=1}^m (v_i \varvec{a_i}) + \sum _{i=1}^m v_i e_i = \varvec{s}\cdot \varvec{x} + \varvec{v}\cdot \varvec{e}\).
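The identity above is a plain rearrangement of sums and can be checked numerically on the torus (reals modulo 1) with exact rational arithmetic. The sketch below makes several assumptions for the sake of a small demo: the name `check_combination`, the toy dimensions, and the denominator `q`, which only discretizes the torus here and is not a parameter of the scheme.

```python
from fractions import Fraction
import random

def check_combination(n=8, m=12, seed=0):
    """Check sum_i v_i*b_i = s.x + v.e (mod 1), with x = sum_i v_i*a_i."""
    rng = random.Random(seed)
    q = 2 ** 16                                   # demo-only discretization
    s = [rng.randint(0, 1) for _ in range(n)]     # binary secret
    a = [[Fraction(rng.randrange(q), q) for _ in range(n)] for _ in range(m)]
    e = [Fraction(rng.randint(-3, 3), q) for _ in range(m)]   # small errors
    # b_i = a_i . s + e_i on the torus
    b = [(sum(aij * sj for aij, sj in zip(ai, s)) + ei) % 1
         for ai, ei in zip(a, e)]
    v = [rng.randint(-5, 5) for _ in range(m)]    # small integer combination
    lhs = sum(vi * bi for vi, bi in zip(v, b)) % 1
    x = [sum(v[i] * a[i][j] for i in range(m)) for j in range(n)]
    rhs = (sum(sj * xj for sj, xj in zip(s, x))
           + sum(vi * ei for vi, ei in zip(v, e))) % 1
    return lhs == rhs
```

When the combination \(\varvec{v}\) makes \(\varvec{x}\) small, both terms on the right-hand side are small, which is what the distinguisher exploits.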
The table shows that the strength of the lattice reduction is compatible with the values announced in [11]. Our model predicts that the lattice reduction phase is harder (\(\delta = 1.0055\) in our analysis and \(\delta = 1.0064\) in [11]), but the value of \(\varepsilon \) is bigger in our case. Overall, our model evaluates the security of their parameter set at 136 bits, which is larger than the \(\ge 100\) bits of security announced in [11]. The main reason is that we take into account the number of times we need to run the SIS-distinguisher to obtain a non-negligible advantage. Since our scheme has a smaller noise propagation overhead, we were able to raise the input noise levels in order to strengthen the system. With the parameters we chose in our implementation, our model predicts 194 bits of security for the bootstrapping key and 136 bits for the key-switching key (which remains the bottleneck).
7 Conclusion
In this paper, we presented a generalization of the \(\mathsf {LWE}\) and \({\mathrm {GSW}}\) homomorphic encryption schemes. We improved the execution time of the bootstrapping procedure and reduced the size of the keys while keeping at least the same security as in previous fast implementations. This result was obtained by simplifying the multiplication morphism, which is the main operation used in the scheme we described. As a proof of concept, we implemented the scheme and gave concrete parameters and timings. Furthermore, we extended the applicability of the external product to leveled homomorphic encryption. Finally, we gave a detailed security analysis. The main remaining obstacle to using our scheme in real-life applications is the ciphertext expansion factor of around 400000, together with fairly limited batching capabilities.
Footnotes
 1.
For the first two distributions, it is tight, but the uniform distribution over \([-\sqrt{3}\sigma ,\sqrt{3}\sigma ]\) is even \(0.78\sigma \)-subgaussian.
 2.
Mathematically speaking, a more accurate notion would be \(\text {dist}_p(\varvec{x},\varvec{y})=\left\| \varvec{x}-\varvec{y}\right\| _p\), which is a distance. However, the norm symbol is clearer for almost all practical purposes.
 3.
A submodule G is sufficiently dense if there exists an intermediate submodule H such that \(G\subseteq H\subseteq \mathbb {T}^n\), the relative smoothing parameter \(\eta _{H,\varepsilon }(G)\) is \(\le \alpha \), and H is the orthogonal in \(\mathbb {T}^n\) of at most \(n-1\) vectors of \(\mathbb {Z}^n\). This definition makes it possible to convert any (Ring)-LWE instance with non-binary secret to a TLWE instance via binary decomposition.
Notes
Acknowledgements
This work has been supported in part by the CRYPTOCOMP project.
References
 1. Albrecht, M.R., Cid, C., Faugère, J., Fitzpatrick, R., Perret, L.: On the complexity of the BKW algorithm on LWE. Des. Codes Crypt. 74(2), 325–354 (2015)
 2. Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Crypt. 9(3), 169–203 (2015)
 3. Alperin-Sheriff, J., Peikert, C.: Faster bootstrapping with polynomial error. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8616, pp. 297–314. Springer, Heidelberg (2014). doi: 10.1007/978-3-662-44371-2_17
 4. Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM 50(4), 506–519 (2003)
 5. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. In: ITCS, pp. 309–325 (2012)
 6. Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehlé, D.: Classical hardness of learning with errors. In: Proceedings of 45th STOC, pp. 575–584. ACM (2013)
 7. Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: FOCS, pp. 97–106 (2011)
 8. Chen, Y., Nguyen, P.Q.: BKZ 2.0: better lattice security estimates. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 1–20. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-25385-0_1
 9. Cheon, J.H., Stehlé, D.: Fully homomophic encryption over the integers revisited. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 513–536. Springer, Heidelberg (2015). doi: 10.1007/978-3-662-46800-5_20
 10. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: A homomorphic LWE based e-voting scheme. In: Takagi, T. (ed.) PQCrypto 2016. LNCS, vol. 9606, pp. 245–265. Springer, Heidelberg (2016). doi: 10.1007/978-3-319-29360-8_16
 11. Ducas, L., Micciancio, D.: FHEW: bootstrapping homomorphic encryption in less than a second. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 617–640. Springer, Heidelberg (2015). doi: 10.1007/978-3-662-46800-5_24
 12. Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. In: Proceedings of the IEEE, vol. 93, no. 2, pp. 216–231 (2005). Special issue on “Program Generation, Optimization, and Platform Adaptation”
 13. Gama, N., Izabachène, M., Nguyen, P.Q., Xie, X.: Structural lattice reduction: generalized worst-case to average-case reductions. IACR Crypt. ePrint Arch. 2014, 48 (2014)
 14. Gama, N., Nguyen, P.Q.: Predicting lattice reduction. In: Smart, N. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 31–51. Springer, Heidelberg (2008). doi: 10.1007/978-3-540-78967-3_3
 15. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: 41st ACM STOC, pp. 169–178 (2009)
 16. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-40041-4_5
 17. Lindner, R., Peikert, C.: Better key sizes (and attacks) for LWE-based encryption. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-19074-2_21
 18. Liu, M., Nguyen, P.Q.: Solving BDD by enumeration: an update. In: Dawson, E. (ed.) CT-RSA 2013. LNCS, vol. 7779, pp. 293–309. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-36095-4_19
 19. Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 1–23. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-13190-5_1
 20. Micciancio, D., Walter, M.: Practical, predictable lattice basis reduction. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 820–849. Springer, Heidelberg (2016). doi: 10.1007/978-3-662-49890-3_31
 21. Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: STOC, pp. 84–93 (2005)