Stream Ciphers: A Practical Solution for Efficient Homomorphic-Ciphertext Compression
Abstract
In typical applications of homomorphic encryption, the first step is for Alice to encrypt some plaintext m under Bob’s public key \(\mathsf {pk}\) and to send the ciphertext \(c = \mathsf {HE}_{\mathsf {pk}}(m)\) to some third-party evaluator Charlie. This paper specifically considers that first step, i.e. the problem of transmitting c as efficiently as possible from Alice to Charlie. As previously noted, a form of compression is achieved using hybrid encryption. Given a symmetric encryption scheme \(\mathsf {E}\), Alice picks a random key k and sends a much smaller ciphertext \(c' = (\mathsf {HE}_{\mathsf {pk}}(k), \mathsf {E}_k(m))\) that Charlie decompresses homomorphically into the original c using a decryption circuit \(\mathcal {C}_{{\mathsf {E}^{-1}}}\).
In this paper, we revisit that paradigm in light of its concrete implementation constraints; in particular \(\mathsf {E}\) is chosen to be an additive IV-based stream cipher. We investigate the performance offered in this context by Trivium, which belongs to the eSTREAM portfolio, and we also propose a variant with 128-bit security: Kreyvium. We show that Trivium, whose security has been firmly established for over a decade, and the new variant Kreyvium both perform very well in this setting.
Keywords
Stream ciphers · Homomorphic cryptography · Trivium
1 Introduction
Since the breakthrough result of Gentry [30] achieving fully homomorphic encryption (FHE), many works have been published on simpler and more efficient schemes based on homomorphic encryption. Because they allow arbitrary computations on encrypted data, FHE schemes suddenly opened the way to exciting new applications, in particular cloud-based services in several areas (see e.g. [33, 42, 46]).
Compressed Encryption. In these cloud applications, it is often assumed that some data is sent encrypted under a homomorphic encryption (HE) scheme to the cloud to be processed in one way or another. It is thus typical to consider, in the first step of these applications, that a user (Alice) encrypts some data m under some other user’s public key \(\mathsf {pk}\) (Bob) and sends some homomorphic ciphertext \(c=\mathsf {HE}_\mathsf {pk}(m)\) to a third-party evaluator in the Cloud (Charlie). The roles of Alice and Bob are clearly distinct, even though they might be played by the same entity in some applications.
Prior Art. The cost of homomorphically evaluating several symmetric primitives has been investigated, including optimized implementations of AES [18, 23, 31], and of the lightweight block ciphers Simon [43] and Prince [24]. Usually lightweight block ciphers seem natural candidates for efficient evaluations in the encrypted domain. However, they may also lead to much worse performance than a homomorphic evaluation of, say, AES. Indeed, contemporary HE schemes use noisy ciphertexts, where a fresh ciphertext includes a noise component which grows along with homomorphic operations. Usually a homomorphic multiplication increases the noise by much larger proportions than a homomorphic addition. The maximum allowable level of noise (determined by the system parameters) then depends mostly on the multiplicative depth of the circuit. Many lightweight block ciphers balance out their simplicity by a large number of rounds, e.g. KATAN and KTANTAN [11], with the effect of considerably increasing their multiplicative depth. This type of design is therefore prohibitive in an HE context. Still, Prince appears to be a much more suitable block cipher for homomorphic evaluation than AES (and than Simon), because it specifically targets applications that require a low latency; it is designed to minimize the cost of an unrolled implementation [9] rather than to optimize e.g. silicon area.
At Eurocrypt 2015, Albrecht, Rechberger, Schneider, Tiessen and Zohner observed that the usual criteria that rule the design of lightweight block ciphers are not appropriate when designing a symmetric encryption scheme with a low-cost homomorphic evaluation [2]. Indeed, both the number of rounds and the number of binary multiplications required to evaluate an S-box have to be taken into account. Minimizing the number of rounds is a crucial issue for low-latency ciphers like Prince, while minimizing the number of multiplications is a requirement for efficient masked implementations.
These two criteria have been considered together for the first time by Albrecht et al. in the recent design of a family of block ciphers called LowMC [2] with very small multiplicative size and depth^{1}. However, the proposed instances of LowMC, namely LowMC-80 and LowMC-128, have recently been shown to have security issues [21]. They actually present some weaknesses inherent in their low multiplicative complexity. Indeed, the algebraic normal forms (i.e., the multivariate polynomials) describing the encryption and decryption functions are sparse and have a low degree. This type of feature is usually exploited in algebraic attacks, cube attacks and their variants, e.g. [4, 20, 22]. While these attacks are rather general, the improved variant used for breaking LowMC [21], named interpolation attack [38], specifically applies to block ciphers. Indeed it exploits the sparse algebraic normal form of some intermediate bit within the cipher, using the fact that this bit can be evaluated both from the plaintext in the forward direction and from the ciphertext in the backward direction. This technique leads to several attacks including a key-recovery attack against LowMC-128 with time complexity \(2^{118}\) and data complexity \(2^{73}\), implying that the cipher does not provide the expected \(128\)-bit security level.
Our Contributions. We emphasize that beyond the task of designing an HE-friendly block cipher, revisiting the whole compressed encryption scheme (in particular its internal mode of operation) is what is really needed in order to take these concrete HE-related implementation constraints into account.
First, we identify that homomorphic decompression is subject to an offline phase and an online phase. The offline phase is plaintext-independent and therefore can be performed in advance, whereas the online phase completes decompression upon reception of the plaintext-dependent part of the compressed ciphertext. Making the online phase as quick as technically doable leads us to choose an additive IV-based stream cipher to implement \(\mathsf {E}\). However, we note that the use of a lightweight block cipher as the building block of that stream cipher usually provides a security level limited to \(2^{n/2}\) where n is the block size [48], thus limiting the number of encrypted blocks to (typically) less than \(2^{32}\) (i.e. 32 GB for 64-bit blocks).
As a result, we propose our own candidate for \(\mathsf {E}\): the keystream generator Trivium [13], which belongs to the eSTREAM portfolio of recommended stream ciphers, and a new proposal called Kreyvium, which shares the same internal structure but allows for bigger keys of 128 bits^{2}. The main advantage of Kreyvium over Trivium is that it provides \(128\)-bit security (instead of \(80\)-bit) with the same multiplicative depth, and inherits the same security arguments. It is worth noticing that the design of a variant of Trivium which guarantees a \(128\)-bit security level has been raised as an open problem for the last ten years [1, p. 30]. Besides a higher security level, it also accommodates longer IVs, so that it can encrypt up to \(46 \cdot 2^{128}\) plaintext bits under the same key, with multiplicative depth only 12. Moreover, both Trivium and Kreyvium are resistant against the interpolation attacks used for breaking LowMC since these ciphers do not rely on a permutation which would enable the attacker to compute backwards.
We implemented our construction and instantiated it with Trivium, Kreyvium and LowMC in CTR mode. Our results show that the promising performance attained by the HE-dedicated block cipher LowMC can be achieved with well-known primitives whose security has been firmly established for over a decade.
Organization of the Paper. We introduce a general model and a generic construction to compress homomorphic ciphertexts in Sect. 2. Our construction using Trivium and Kreyvium is described in Sect. 3. Subsequent experimental results are presented in Sect. 4.
2 A Generic Design for Efficient Decompression
In this section, we describe our model and generic construction to transmit compressed homomorphic ciphertexts between Alice and Charlie. We use the same notation as in the introduction: Alice wants to send some plaintext m, encrypted under Bob’s public key \(\mathsf {pk}\) (of a homomorphic encryption scheme \(\mathsf {HE}\)) to a third-party evaluator Charlie.
2.1 Offline/Online Phases in Ciphertext Decompression
1. an offline key-setup phase which only depends on Bob’s public key and can be performed once and for all before Charlie starts receiving compressed ciphertexts encrypted under Bob’s key;
2. an offline decompression phase which can be performed only based on some plaintext-independent material found in the compressed ciphertext;
3. an online decompression phase which aggregates the result of the offline phase with the plaintext-dependent part of the compressed ciphertext and (possibly very quickly) recovers the decompressed ciphertext c.
2.2 Our Generic Construction
1. Set \(t= \lceil \ell _m / N \rceil \),
2. Set \((x_1, \dots , x_t) = G(IV ; t\ell _x)\),
3. Randomly pick \(k\leftarrow \{0,1\}^{\ell _k}\),
4. For \(1\le i \le t\), compute \(z_i = F_k(x_i)\),
5. Set \(\mathsf {keystream}\) to the \(\ell _m\) leftmost bits of \(z_1 \,\Vert \, \dots \,\Vert \, z_t\),
6. Output \(c' = (\mathsf {HE}_{\mathsf {pk}}(k), m\oplus \mathsf {keystream})\).
1. Set \(t= \lceil \ell _m / N \rceil \),
2. Set \((x_1, \dots , x_t) = G(IV ; t\ell _x)\),
3. For \(1\le i \le t\), compute \(\mathsf {HE}_{\mathsf {pk}}(z_i) = \mathcal {C}_{{F}}\left( \mathsf {HE}_{\mathsf {pk}}(k), x_i\right) \) with some circuit \(\mathcal {C}_{{F}}\),
4. Deduce \(\mathsf {HE}_{\mathsf {pk}}(\mathsf {keystream})\) from \(\mathsf {HE}_{\mathsf {pk}}(z_1),\dots ,\mathsf {HE}_{\mathsf {pk}}(z_t)\),
5. Compute \(c = \mathsf {HE}_{\mathsf {pk}}(m) = \mathcal {C}_{{\oplus }}\left( \mathsf {HE}_{\mathsf {pk}}(\mathsf {keystream}), m\oplus \mathsf {keystream}\right) \).
The circuit \(\mathcal {C}_{{\oplus }}\) computes \(\mathsf {HE}(a \oplus b)\) given \(\mathsf {HE}(a)\) and b where a and b are bitstrings of the same size. In our construction, the cost of decompression per plaintext block is fixed and roughly equals one single evaluation of the circuit \(\mathcal {C}_{{F}}\); most importantly, the multiplicative depth of the decompression circuit is also fixed, and set to the depth of \(\mathcal {C}_{{F}}\).
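The two procedures above can be sketched in Python as follows. This is only an illustration of the data flow under loud assumptions: `MockHE` is a hypothetical stand-in that stores values in the clear (it provides no security and performs no homomorphic computation), `F` is instantiated with SHA-256 purely for convenience (it is not one of the ciphers studied here), `G` is assumed to be a simple counter, and `N = 64` is an arbitrary choice.

```python
import hashlib
import math
import secrets

N = 64      # keystream bits per call to F (arbitrary choice for this sketch)
L_K = 128   # key length in bits (assumption)

class MockHE:
    """Hypothetical placeholder for HE_pk(.): the value is NOT encrypted."""
    def __init__(self, value):
        self.value = value

def he_encrypt(value):
    return MockHE(value)

def he_decrypt(ct):
    return ct.value

def G(iv, t):
    """Expansion function, assumed here to be a plain counter appended to the IV."""
    return [iv + i.to_bytes(8, "big") for i in range(1, t + 1)]

def F(k, x):
    """Stand-in PRF returning N bits (SHA-256 based, for illustration only)."""
    digest = hashlib.sha256(k + x).digest()
    return [(digest[j // 8] >> (7 - j % 8)) & 1 for j in range(N)]

def eval_F(he_k, x):
    """Plays the role of the circuit C_F; a real evaluator never sees k."""
    return MockHE(F(he_k.value, x))

def eval_xor(he_a, b):
    """Plays the role of the circuit C_xor: HE(a), b -> HE(a xor b)."""
    return MockHE([u ^ v for u, v in zip(he_a.value, b)])

def compress(m_bits, iv):
    """Alice: output c' = (HE(k), m xor keystream)."""
    t = math.ceil(len(m_bits) / N)
    k = secrets.token_bytes(L_K // 8)
    keystream = [bit for x in G(iv, t) for bit in F(k, x)][:len(m_bits)]
    return he_encrypt(k), [a ^ b for a, b in zip(m_bits, keystream)]

def decompress(c_prime, iv):
    """Charlie: recover HE(m) from c'."""
    he_k, masked = c_prime
    t = math.ceil(len(masked) / N)
    # Offline part: depends only on HE(k) and the IV, not on the plaintext.
    he_blocks = [eval_F(he_k, x) for x in G(iv, t)]
    he_keystream = MockHE([b for blk in he_blocks for b in blk.value][:len(masked)])
    # Online part: a single XOR with the plaintext-dependent material.
    return eval_xor(he_keystream, masked)

message = [secrets.randbelow(2) for _ in range(150)]
iv = bytes(16)
assert he_decrypt(decompress(compress(message, iv), iv)) == message
```

The split inside `decompress` mirrors the offline and online phases: everything up to `he_keystream` can be computed before the masked plaintext arrives.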
How Secure are Compressed Ciphertexts? From a high-level perspective, compressed homomorphic encryption is just hybrid encryption and relates to the generic KEM-DEM construct. A complete characterization of the security results attached to the KEM-DEM framework is presented in [35]. In particular when the KEM and the DEM are IND-CPA, the resulting hybrid PKE scheme is at least IND-CPA. This result applies directly here: assuming the semantic security of our homomorphic KEM^{3}, and a general-purpose IND-CPA secure DEM, our compressed encryption scheme is IND-CPA secure.
Interestingly, the output of an iterated PRF used in CTR mode is computationally indistinguishable from random [6, Theorem 13]. Hence, under the assumption that Trivium or Kreyvium is a PRF^{4}, the keystream \(z_1 \,\Vert \, \dots \,\Vert \, z_t\) produced by our construction is also indistinguishable from random. It follows directly from [35] that the compressed encryption scheme is IND-CPA. Although the security of Trivium and Kreyvium is empirical, Sect. 3 provides a strong rationale for the Kreyvium design and makes it the solution with the smallest homomorphic evaluation latency known so far.
Why not Use a Block Cipher for F ? Although not specifically in these terms, the use of lightweight block ciphers like Prince and Simon has been proposed in the context of compressed homomorphic ciphertexts e.g. [24, 43]. However, a complete encryption scheme based on these ciphers has not been defined. This is a major issue since the security provided by all classical modes of operation (including all variants of CBC, CTR, CFB, OFB, OCB...) is inherently limited to \(2^{n/2}\) where n is the block size [48] (see also e.g. [39, p. 95]). Only very few modes providing beyond-birthday security have been proposed, e.g. [37, 50], but they induce a higher implementation cost and their security is usually upper-bounded by \(2^{2n/3}\).
In other words, the use of a block cipher operating on 64-bit blocks like Prince or Simon-32/64 implies that the number of blocks encrypted under the same key should be significantly less than \(2^{32}\) (i.e. 32 GB for 64-bit blocks). Therefore, only block ciphers with a large enough block size, like the LowMC instantiation with a \(256\)-bit block proposed in [2], are suitable in applications which may require the encryption of more than \(2^{32}\) bits under the same key.
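The "32 GB" figure follows from simple arithmetic on the birthday bound. A short Python check, where the helper name `birthday_limit_bytes` is ours and introduced only for illustration:

```python
def birthday_limit_bytes(block_bits):
    """Data volume reached after 2^(n/2) blocks of n bits, in bytes."""
    return (2 ** (block_bits // 2)) * block_bits // 8

GIB = 2 ** 30

# 64-bit blocks: 2^32 blocks of 8 bytes each.
print(birthday_limit_bytes(64) // GIB)            # prints 32

# 256-bit blocks: 2^128 blocks of 32 bytes each, i.e. 2^133 bytes.
print(birthday_limit_bytes(256) == 2 ** 133)      # prints True
```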
3 Trivium and Kreyvium, Two Low-Depth Stream Ciphers

- a resynchronization function, \(\mathsf {Sync}\), which takes as input the IV and the key (possibly expanded by some precomputation phase), and outputs some \(n\)-bit initial state;
- a transition function \(\varPhi \) which computes the next state of the generator;
- a filtering function f which computes a keystream segment from the internal state.
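As an illustration of this decomposition, the sketch below writes Trivium in terms of the three components, following the public Trivium specification (288-bit state, 1152 blank rounds). It operates on abstract lists of bits; the function names `sync`, `phi`, `f` and `keystream` are ours, and no claim is made about byte or bit ordering compatibility with published test vectors.

```python
def phi(s):
    """Transition function: one Trivium round on a 288-bit state (list of 0/1)."""
    t1 = s[65] ^ s[92] ^ (s[90] & s[91]) ^ s[170]
    t2 = s[161] ^ s[176] ^ (s[174] & s[175]) ^ s[263]
    t3 = s[242] ^ s[287] ^ (s[285] & s[286]) ^ s[68]
    # Shift the three registers (93, 84 and 111 bits) and insert the feedback bits.
    return [t3] + s[:92] + [t1] + s[93:176] + [t2] + s[177:287]

def f(s):
    """Filtering function: one keystream bit from the current state."""
    return s[65] ^ s[92] ^ s[161] ^ s[176] ^ s[242] ^ s[287]

def sync(key_bits, iv_bits):
    """Resynchronization: load key and IV, then run 4 * 288 = 1152 blank rounds."""
    assert len(key_bits) == 80 and len(iv_bits) == 80
    state = (list(key_bits) + [0] * 13        # register 1: key, then zeros
             + list(iv_bits) + [0] * 4        # register 2: IV, then zeros
             + [0] * 108 + [1] * 3)           # register 3: zeros, then three ones
    for _ in range(4 * 288):
        state = phi(state)
    return state

def keystream(key_bits, iv_bits, n):
    """Produce n keystream bits for a given key/IV pair."""
    state = sync(key_bits, iv_bits)
    out = []
    for _ in range(n):
        out.append(f(state))
        state = phi(state)
    return out

key = [1, 0] * 40
iv = [0] * 80
z = keystream(key, iv, 64)
```

Each round uses only three AND gates, which is what makes this structure attractive when the multiplicative depth is the limiting resource.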
Size of the Internal State. A major specificity of our context is that a large internal state can be easily handled. Indeed, in most classical stream ciphers, the internal-state size usually appears as a bottleneck because the overall size of the quantities to be stored highly influences the number of gates in the implementation. This is not the case in our context. It might seem, a priori, that increasing the size of the internal state automatically increases the number of nonlinear operations (because the number of inputs of \(\varPhi \) increases). But this is not the case if a part of this larger internal state is used, for instance, for storing the secret key. This strategy can be used for increasing the security at no implementation cost. Indeed, the complexity of all generic attacks aiming at recovering the internal state of the generator is \(\mathcal {O}(2^{n/2})\) where \(n\) is the size of the secret part of the internal state even if some part is not updated during the keystream generation. For instance, the time-memory-data-tradeoff attacks in [5, 8, 32] aim at inverting the function which maps the internal state of the generator to the first keystream bits. But precomputing some values of this function must be feasible by the attacker, which is not the case if the filtering or transition function depends on some secret material. On the other hand, the size \(n'\) of the non-constant secret part of the internal state determines the data complexity for finding a collision on the internal state: the length of the keystream produced from the same key is limited to \(2^{n'/2}\). But, if the transition function or the filtering function depends on the IV, this limitation corresponds to the maximal keystream length produced from the same key/IV pair.
It is worth noticing that many attacks require a very long keystream generated from the same key/IV pair and do not apply in our context since the keystream length is strictly limited by the multiplicative depth of the circuit.
3.1 Trivium in the HE Setting
Trivium [13] is one of the 7 stream ciphers recommended by the eSTREAM project [25]. Due to the small number of nonlinear operations in its transition function, it appears as a natural candidate in our context.
No attack better than an exhaustive key search is known so far on the full Trivium. It can then be considered as secure. The family of attacks that seems to provide the best result on round-reduced versions is the cube attack and its variants [4, 22, 28]. They recover some key bits (resp. provide a distinguisher on the keystream) if the number of initialization rounds is reduced to 799 (resp. 885) rounds out of 1152. The highest number of initialization rounds that can be attacked is 961: in this case, a distinguisher exists for a class of weak keys [41].
Multiplicative Depth. It is easy to see that the multiplicative depth grows quite slowly with the number of iterations. An important observation is that, in the internal state, only the first \(80\) bits in Register 1 (the key bits) are initially encrypted under the HE and that, as a consequence, performing hybrid clear and encrypted data calculations is possible (this is done by means of the following simple rules: \(0\cdot [x]=0\), \(1\cdot [x]=[x]\), \(0+[x]=[x]\) and \(1+[x]=[1]+[x]\), where the square brackets denote encrypted bits and where in all but the latter case, a homomorphic operation is avoided, which is especially desirable for multiplications). This optimization makes it possible, for instance, to increase the number of bits which can be generated (after the \(1152\) blank rounds) at depth \(12\) from \(42\) to \(57\) (i.e., a 35 % increase). Then, the relevant quantity in our context is the multiplicative depth of the circuit which computes \(N\) keystream bits from the \(80\)-bit key. The proof of the following proposition is given in [14].
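The effect of the hybrid clear/encrypted rules on the depth can be explored with a small tracker. The Python sketch below is our own illustration, not the authors' tool: it runs the Trivium round function (taken from the public specification) on depth labels instead of bit values. It makes the conservative assumption that a clear bit multiplied by an encrypted bit always yields an encrypted bit of unchanged depth (the case \(0\cdot [x]=0\) is ignored), and all function names are hypothetical.

```python
CLEAR = -1  # depth label for a bit known in the clear (IV bit or constant)

def add(a, b):
    """XOR: costs no multiplicative depth; clear + [x] keeps the depth of [x]."""
    return max(a, b)

def mul(a, b):
    """AND: depth grows only when both operands are encrypted."""
    if a == CLEAR or b == CLEAR:
        return max(a, b)          # clear * [x] is 0 or [x]; we keep [x] (conservative)
    return max(a, b) + 1

def step(s):
    """One Trivium round on depth labels; returns (next state, keystream depth)."""
    t1 = add(s[65], s[92])
    t2 = add(s[161], s[176])
    t3 = add(s[242], s[287])
    z = add(add(t1, t2), t3)
    t1 = add(add(t1, mul(s[90], s[91])), s[170])
    t2 = add(add(t2, mul(s[174], s[175])), s[263])
    t3 = add(add(t3, mul(s[285], s[286])), s[68])
    return [t3] + s[:92] + [t1] + s[93:176] + [t2] + s[177:287], z

def keystream_bits_at_depth(d, hybrid=True, init_rounds=1152, cap=4096):
    """Count the leading keystream bits whose depth does not exceed d."""
    if hybrid:
        s = [0] * 80 + [CLEAR] * 208   # only the 80 key bits start encrypted
    else:
        s = [0] * 288                  # every state bit treated as encrypted
    for _ in range(init_rounds):
        s, _ = step(s)
    n = 0
    while n < cap:
        s, z = step(s)
        if z > d:
            break
        n += 1
    return n

print(keystream_bits_at_depth(12, hybrid=False))  # prints 42
print(keystream_bits_at_depth(12, hybrid=True))   # prints 57
```

Under these assumptions the tracker reproduces the 42 and 57 bits quoted above for depth 12; other depths can be explored by changing the first argument.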
Proposition 1
3.2 Kreyvium
Our first aim is to offer a variant of Trivium with 128-bit key and IV, without increasing the multiplicative depth of the corresponding circuit. Besides a higher security level, another advantage of this variant is that the number of possible IVs, and then the maximal length of data which can be encrypted under the same key, increases from \(2^{80}N_{\mathsf {trivium}}(d)\) to \(2^{128}N_{\mathsf {kreyvium}}(d)\). Increasing the key and IV size in Trivium is a challenging task, mentioned as an open problem in [1, p. 30] for instance. In particular, Maximov and Biryukov [45] pointed out that increasing the key size in Trivium without any additional modification cannot be secure due to some attack with complexity less than \(2^{128}\). A first attempt in this direction has been made in [45] but the resulting cipher accommodates 80-bit IVs only, and its multiplicative complexity is higher than in Trivium since the number of AND gates is multiplied by \(2\).
Description. Our proposal, Kreyvium, accommodates a key and an IV of 128 bits each. The only difference with the original Trivium is that we have added to the \(288\)-bit internal state a \(256\)-bit part corresponding to the secret key and the IV. This part of the state aims at making both the filtering and transition functions key- and IV-dependent. More precisely, these two functions \(f\) and \(\varPhi \) depend on the key bits and IV bits, through the successive outputs of two shift registers \(K^*\) and \(IV^*\) initialized by the key and by the IV respectively. The internal state is then composed of five registers of sizes 93, 84, 111, 128 and 128 bits, having an internal state size of 544 bits in total, among which \(416\) become unknown to the attacker after initialization.
Related Ciphers. KATAN [11] is a lightweight block cipher with a lot in common with Trivium. It is composed of two registers, whose feedback functions are very sparse, and have a single nonlinear term. The key, instead of being used for initializing the state, is introduced by XORing two key information bits per round to each feedback bit. The recently proposed stream cipher Sprout [3], inspired by Grain but with much smaller registers, also inserts the key in a similar way: instead of using the key for initializing the state, one key information bit is XORed at each clock to the feedback function. We can see the parallelism between these two ciphers and our newly proposed variant. In particular, the previous security analysis on KATAN shows that this type of design does not introduce any clear weakness. Indeed, the best attacks on round-reduced versions of KATAN so far [29] are meet-in-the-middle attacks, that exploit the knowledge of the values of the first and the last internal states (due to the block-cipher setting). As this is not the case here, such attacks, as well as the recent interpolation attacks against LowMC [21], do not apply. The best attacks against KATAN, when excluding MitM techniques, are conditional differential attacks [40, 41].
Design Rationale. We have decided to XOR the key bit \(K^*_0\) to the feedback function of the register that interacts with the content of \((s_1, \ldots , s_{63})\) the latest, since \((s_1, \ldots , s_{63})\) is initialized with some key bits. The same goes for the \(IV^*\) register. Moreover, as the key bits that start entering the state are the ones that were not in the initial state, all the key bits affect the state as early as possible.
We also decided to initialize the state with some key bits and with all the IV bits, and not with a constant value, as this way the mixing will be performed more quickly. Then we can expect that the internal-state bits after initialization are expressed as more complex and less sparse functions in the key and IV bits.
Our change of constant is motivated by the conditional differential attacks from [41]: the conditions needed for a successful attack are that 106 bits from the IV or the key are equal to ’0’ and a single one needs to be ’1’. This suggests that values set to zero “encourage” non-random behaviors, leading to our new constant. In other words, in Trivium, an all-zero internal state is always updated in an all-zero state, while an all-one state will change through time. The 0 at the end of the constant is added for preventing slide attacks.
Multiplicative Depth. Exactly as for Trivium, we can compute the number of keystream bits which can be generated from the key at a given depth (see [14]).
Proposition 2
Security Analysis. We investigate how all the known attacks on Trivium can apply to Kreyvium. A more detailed analysis is provided in [14].
TMDTO. TMDTO attacks aiming at recovering the initial state of the cipher do not apply since the size of the secret part of the internal state (416 bits) is much larger than twice the key size: the size of the whole secret internal state has to be taken into account, even if the additional \(128\)-bit part corresponding to \(K^*\) is independent from the rest of the state. On the other hand, TMDTO aiming at recovering the key have complexity larger than exhaustive key search since the key and the IV have the same size [12, 36].
Internal-State Collision. A distinguisher may be built if the attacker is able to find two colliding internal states, since the two keystreams produced from colliding states are identical. Finding such a collision requires around \(2^{144}\) keystream bits generated from the same key/IV pair, which is much longer than the maximal keystream length allowed by the multiplicative depth of the circuit. We also show in [14] that, for a given key, finding two internal states colliding on all bits except on \(IV^*\) does not provide any valid distinguisher. The birthday bound of \(2^{144}\) then provides a limit on the number of bits produced from the same key/IV pair, not on the bits produced from the same key.
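The state sizes and bounds quoted for Kreyvium can be checked with a few lines of Python. The register labels `A`, `B`, `C` below are our own shorthand for the three Trivium registers and are not notation from the text.

```python
# Register sizes in bits, as given in the description of Kreyvium.
registers = {"A": 93, "B": 84, "C": 111, "K*": 128, "IV*": 128}

total_state = sum(registers.values())                            # whole internal state
trivium_part = registers["A"] + registers["B"] + registers["C"]  # updated, non-constant part
secret_part = trivium_part + registers["K*"]                     # unknown to the attacker
key_size = 128

print(total_state)        # prints 544
print(secret_part)        # prints 416
print(trivium_part // 2)  # prints 144, the exponent of the birthday bound

# The TMDTO argument needs the secret part to exceed twice the key size.
print(secret_part > 2 * key_size)   # prints True
```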
Cube Attacks [22, 28] and Cube Testers [4]. They provide the best attacks for round-reduced Trivium. In our case, as we keep the same main function, but we have two additional XORs per round, thus a better mixing of the variables, we can expect the relations to get more involved and hamper the application of previously defined round-reduced distinguishers. One might wonder if the fact that more variables are involved could ease the attacker’s task, but we point out here that the limitation in the previous attacks was not the IV size, but the size of the cubes themselves. Therefore, having more variables available is of no help with respect to this point. We can conclude that the resistance of Kreyvium to these types of attacks is at least that of Trivium, and possibly better.
Conditional Differential Cryptanalysis. Because of its applicability to Trivium and KATAN, the attack from [41] is definitely of interest in our case. In particular, the highest number of blank rounds is reached if some conditions on two registers are satisfied at the same time (and not only conditions on the register controlled by the IV bits in the original Trivium). In our case, as we have IV bits in two registers, it is important to elucidate whether an attacker can take advantage of introducing differences in two registers simultaneously. First, let us recall that we have changed the constant to one containing mostly 1s. We previously saw that the conditions that favor the attacks are values set to zero in the initial state. In Trivium, we have \((108+4+13)=125\) bits already fixed to zero in the initial state, 3 are fixed to one and the others can be controlled by the attacker in the weak-key setting (and the attacker will force them to be zero most of the time). Now, instead, we have 64 bits forced to be 1, 1 equal to zero, and \((128+93)=221\) bits of the initial state controlled by the attacker in the weak-key setting, plus potentially 21 additional bits from the key still not used, that will be inserted during the first rounds. We can conclude that, while in Trivium it is possible in the weak-key setting to introduce zeros in the whole initial state except in 3 bits, in Kreyvium we will never be able to set to zero 64 bits, implying that applying the techniques from [41] becomes much harder.
Algebraic Attacks. Several algebraic attacks have been proposed against Trivium, aiming at recovering the \(288\)-bit internal state at the beginning of the keystream generation (i.e., at time \(t=1153\)) from the knowledge of the keystream bits. The most efficient attack of this type is due to Maximov and Biryukov [45]. It exploits the fact that the \(22\) keystream bits at time \(3t'\), \(0 \le t'< 22\), are determined by all bits of the initial state at indexes divisible by \(3\) (starting from the leftmost bit in each register). Moreover, once all bits at positions \(3i\) are known, then guessing that the outputs of the three AND gates at time \(3t'\) are zero provides 3 linear relations between the bits of the internal state and the keystream bits. The attack then consists of an exhaustive search for some bits at indexes divisible by \(3\). The other bits in such positions are then deduced by solving the linear system derived from the keystream bits at positions \(3t'\). Once all these bits have been determined, the other \(192\) bits of the initial state are deduced from the other keystream equations. This process must be iterated until the guess for the outputs of the AND gates is correct. In the case of Trivium, the outputs of at least \(125\) AND gates must be guessed in order to get \(192\) linear relations involving the \(192\) bits at indexes \(3i+1\) and \(3i+2\). This implies that the attack has to be repeated \((4/3)^{125}\approx 2^{52}\) times. From these guesses, we get many linear relations involving the bits at positions \(3i\) only, implying that only an exhaustive search with complexity \(2^{32}\) for the other bits at positions \(3i\) is needed. Therefore, the overall complexity of the attack is around \(2^{32}\times 2^{52}=2^{84}\). A similar algorithm can be applied to Kreyvium, but the main difference is that every linear equation corresponding to a keystream bit also involves one key bit.
Moreover, the key bits involved in the generation of any \(128\) consecutive output bits are independent. It follows that each of the first \(128\) linear equations introduces a new unknown in the system to solve. For this reason, it is not possible to determine all bits at positions \(3i\) by an exhaustive search on less than \(96\) bits like for Trivium. Moreover, the outputs of more than \(135\) AND gates must be guessed for obtaining enough equations on the remaining bits of the initial state. Therefore the overall complexity of the attack exceeds \(2^{96}\times 2^{52}=2^{148}\) and is much higher than the cost of the exhaustive key search. It is worth noticing that the attack would have been more efficient if only the feedback bits, and not the keystream bits, had been dependent on the key. In this case, 22 linear relations independent from the key would have been available to the attacker.
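The complexity figures in this paragraph are easy to reproduce. The Python lines below only redo the arithmetic stated in the text (each guess that an AND gate outputs zero is correct with probability 3/4, hence the factor 4/3 per gate); the variable names are ours.

```python
import math

# Number of repetitions for Trivium: (4/3)^125, expressed as a power of two.
repeat_log2 = 125 * math.log2(4 / 3)
print(round(repeat_log2, 2))          # about 51.88, i.e. roughly 2^52

# Trivium: exhaustive search on 2^32 candidates times the repetitions.
trivium_log2 = 32 + round(repeat_log2)
print(trivium_log2)                   # prints 84

# Kreyvium: at least 96 bits of exhaustive search, with the same 2^52 factor
# used as a lower bound in the text.
kreyvium_log2 = 96 + 52
print(kreyvium_log2, kreyvium_log2 > 128)   # prints 148 True
```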
4 Experimental Results
We now discuss and compare the practicality of our generic construction when instantiated with Trivium, Kreyvium and LowMC. The expansion function G implements a mere counter, and the aforementioned algorithms are used to instantiate the function F that produces N bits of keystream per iteration as defined by Propositions 1 and 2.^{5}
HE Framework. In our experiments, we considered two HE schemes: the BGV scheme [10] and the FV scheme [26] (a scale-invariant version of BGV). The BGV scheme is implemented in the library HElib [34] and has become a de facto standard benchmarking library for HE applications. Similarly, the FV scheme was previously used in several HE benchmarks [15, 27, 43], is conceptually simpler than the BGV scheme, and is one of the most efficient HE schemes.^{6} Additionally, batching was used [49], i.e. the HE schemes were set up to encrypt vectors in an SIMD fashion (component-wise operations, and rotations via the Frobenius endomorphism). The number of elements that can be encrypted depends on the number of terms in the factorization modulo 2 of the cyclotomic polynomial used in the implementation. This batching allowed us to perform several Trivium/Kreyvium/LowMC evaluations in parallel in order to increase the throughput.
Parameter Selection for Subsequent Homomorphic Processing. In all the previous works on the homomorphic evaluation of symmetric encryption schemes, the parameters of the underlying HE scheme were selected for the exact multiplicative depth required and not beyond [2, 19, 24, 31, 43]. This means that once the ciphertext is decompressed, no further homomorphic computation can actually be performed by Charlie; this makes the claimed timings considerably less meaningful in a real-world context.
We benchmarked both parameters for the exact multiplicative depth and parameters able to handle circuits of the minimal multiplicative depth plus 7 to allow further homomorphic processing by Charlie (which is obviously what is expected in applications of homomorphic encryption). We chose 7 because, in practice, numerous applications use algorithms of multiplicative depth smaller than 7 (see e.g. [33, 42]). In what follows we compare the results we obtain using Trivium, Kreyvium and also the LowMC cipher. For LowMC, we benchmarked not only our own implementation but also the LowMC implementation of [2] available at https://bitbucket.org/malb/lowmc-helib. Minor changes to this implementation were made in order to obtain an equivalent parametrization of HElib. The main difference is that the implementation from [2] uses an optimized method for multiplying a Boolean vector and a Boolean matrix, namely the “Method of Four Russians”. This explains why our implementation is approximately \(6\,\%\) slower, as it performs 2–3 times more ciphertext additions.
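Latency and throughput using HElib on a single core of a mid-end 48-core server (4 x AMD Opteron 6172 processors with 64 GB of RAM).

| Algorithm | Security level \(\kappa \) | N | Multiplicative depth used | #slots | Latency (sec.) | Throughput (bits/min) |
|---|---|---|---|---|---|---|
| Trivium12 | 80 | 45 | 12 | 600 | 1417.4 | 1143.0 |
| Trivium12 | 80 | 45 | 19 | 720 | 4420.3 | 439.8 |
| Trivium13 | 80 | 136 | 13 | 600 | 3650.3 | 1341.3 |
| Trivium13 | 80 | 136 | 20 | 720 | 11379.7 | 516.3 |
| Kreyvium12 | 128 | 42 | 12 | 504 | 1715.0 | 740.5 |
| Kreyvium12 | 128 | 42 | 19 | 756 | 4956.0 | 384.4 |
| Kreyvium13 | 128 | 124 | 13 | 682 | 3987.2 | 1272.6 |
| Kreyvium13 | 128 | 124 | 20 | 480 | 12450.8 | 286.8 |
| LowMC128 | \(\text{?}\le 118\) | 256 | 13 | 682 | 3608.4 | 2903.1 |
| LowMC128 | \(\text{?}\le 118\) | 256 | 20 | 480 | 10619.6 | 694.3 |
| LowMC128 [2] | \(\text{?}\le 118\) | 256 | 13 | 682 | 3368.8 | 3109.6 |
| LowMC128 [2] | \(\text{?}\le 118\) | 256 | 20 | 480 | 9977.1 | 739.0 |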
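The throughput figures in the HElib table appear to be consistent with N × #slots bits per run, divided by the latency and scaled to one minute. This relation is inferred from the tabulated numbers rather than stated in this section, and reading N as the number of bits produced per slot is an assumption. A minimal sketch that recomputes three rows:

```python
def throughput_bits_per_min(n_bits, slots, latency_sec):
    """Bits produced per minute, assuming n_bits per slot and one run per latency."""
    return n_bits * slots * 60.0 / latency_sec

# (algorithm, N, #slots, latency in seconds, throughput reported in the table)
rows = [
    ("Trivium12", 45, 600, 1417.4, 1143.0),
    ("Kreyvium13", 124, 682, 3987.2, 1272.6),
    ("LowMC128", 256, 480, 10619.6, 694.3),
]

for name, n_bits, slots, latency, reported in rows:
    computed = throughput_bits_per_min(n_bits, slots, latency)
    print(f"{name}: computed {computed:.1f} bits/min, reported {reported}")
```

Under this reading, moving to a larger depth lowers throughput mainly through the increase in latency, since N is unchanged and the number of slots varies much less.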
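Latency of our construction when using the FV scheme on a mid-end 48-core server (4 x AMD Opteron 6172 processors with 64 GB of RAM).

| Algorithm | Security level \(\kappa \) | N | Multiplicative depth used | Latency, 1 core (sec.) | Latency, 48 cores (sec.) | Speed gain |
|---|---|---|---|---|---|---|
| Trivium12 | 80 | 57 | 12 | 681.5 | 26.8 | \(\times \) 25.4 |
| Trivium12 | 80 | 57 | 19 | 2097.1 | 67.6 | \(\times \) 31.0 |
| Trivium13 | 80 | 136 | 13 | 888.2 | 33.9 | \(\times \) 26.2 |
| Trivium13 | 80 | 136 | 20 | 2395.0 | 77.2 | \(\times \) 31.0 |
| Kreyvium12 | 128 | 46 | 12 | 904.4 | 35.3 | \(\times \) 25.6 |
| Kreyvium12 | 128 | 46 | 19 | 2806.3 | 82.4 | \(\times \) 34.1 |
| Kreyvium13 | 128 | 125 | 13 | 1318.6 | 49.7 | \(\times \) 26.5 |
| Kreyvium13 | 128 | 125 | 20 | 3331.4 | 97.9 | \(\times \) 34.0 |
| LowMC128 | \(\text{?}\le 118\) | 256 | 14 | 1531.1 | 171.0 | \(\times \) 9.0 |
| LowMC128 | \(\text{?}\le 118\) | 256 | 21 | 3347.8 | 329.0 | \(\times \) 10.2 |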
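The speed gain in the FV table is the ratio of the single-core latency to the 48-core latency. The short sketch below recomputes it for three rows and also derives a parallel efficiency (speed gain divided by the number of cores), which is an illustrative quantity added here and not a figure reported in the table.

```python
CORES = 48

def speed_gain(latency_1_core, latency_n_cores):
    """Ratio of sequential to parallel latency."""
    return latency_1_core / latency_n_cores

def parallel_efficiency(gain, cores=CORES):
    """Fraction of the ideal linear speed-up actually obtained."""
    return gain / cores

# (algorithm, depth, latency on 1 core, latency on 48 cores, reported gain)
fv_rows = [
    ("Trivium12", 12, 681.5, 26.8, 25.4),
    ("Kreyvium12", 19, 2806.3, 82.4, 34.1),
    ("LowMC128", 14, 1531.1, 171.0, 9.0),
]

for name, depth, t1, t48, reported in fv_rows:
    gain = speed_gain(t1, t48)
    eff = parallel_efficiency(gain)
    print(f"{name} (depth {depth}): gain x{gain:.1f} "
          f"(reported x{reported}), efficiency {eff:.0%}")
```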
Interpretation. First, we would like to recall that LowMC128 must be considered in a different category because of the existence of a key-recovery attack with time complexity \(2^{118}\) and data complexity \(2^{73}\) [21]. However, it has been included in the tables in order to show that the performances achieved by Trivium and Kreyvium are of the same order of magnitude. An increase in the number of rounds of LowMC128 (typically by 4 rounds) is needed to achieve 128-bit security, but this would have a non-negligible impact on its homomorphic evaluation performance, as it would require increasing the depth of the cryptosystem supporting the execution. For instance, a back-of-the-envelope estimation for four additional rounds leads to a degradation of its homomorphic execution performances by a factor of about 2 to 3 (more computations with larger parameters). It is also worth noticing that the minimal multiplicative depth for which valid LowMC output ciphertexts were obtained was 14 for the FV scheme and 13 for the BGV scheme. The theoretical multiplicative depth is 12, but the high number of additions explains this difference^{7}.
Our results show that Trivium and Kreyvium have a smaller latency than LowMC, but a slightly smaller throughput. As already emphasized in [43], real-world applications of homomorphic encryption (which are often cloud-based applications) should be implemented in a transparent and user-friendly way. In the context of our approach, the latency of the offline phase remains an important parameter for ensuring an acceptable experience for the end-user, even when a sufficient amount of homomorphic keystream could not be precomputed early enough because of overall system dimensioning issues.
Trivium and Kreyvium are also more parallelizable than LowMC. Therefore, our work shows that the promising performances obtained by the recently proposed HE-dedicated cipher LowMC can also be achieved with Trivium, a well-analyzed stream cipher, and a variant aiming at achieving 128 bits of security. Last but not least, we recall that our construction aims at compressing the size of transmissions between Alice and Charlie. We support an expansion rate \(|c'|/|m|\) that becomes asymptotically close to 1 for long messages: e.g. for a message length of \(\ell _m = 1\,\text {GB}\), our construction instantiated with Trivium (resp. Kreyvium) yields an expansion rate of 1.08 (resp. 1.16).
5 Conclusion
Our work shows that the promising performances obtained by the recent HE-dedicated cipher LowMC can also be achieved with Trivium, a well-known primitive whose security has been thoroughly analyzed, e.g. [4, 22, 28, 41, 45]. The 10-year analysis effort from the community, initiated by the eSTREAM competition, enables us to gain confidence in its security. Our variant Kreyvium also benefits from this analysis, since the core of the cipher is essentially the same.
Footnotes
 1.
It is worth noting that in a HE context, reducing the multiplicative size of a symmetric primitive might not be the first concern (while it is critical in a multiparty computation context, which also motivated the work of Albrecht et al. [2]), whereas minimizing the multiplicative depth is of prime importance.
 2.
Independently of our results, another variant of Trivium named TriviA has been proposed [16]. It handles larger keys but uses longer registers. It therefore needs more rounds for mixing the internal state, which means that it is much less suited to our setting than Kreyvium.
 3.
Note that it is usual that HE schemes succeed in achieving CPA security, but often grossly fail to realize any form of CCA1 security, to the point of admitting simple key recovery attacks [17].
 4.
Note that this is equivalent to saying that Kreyvium instantiated with a random key and mapping the IVs to the keystream is secure [7, Sect. 3.2].
 5.
 6.
We used the Armadillo compiler implementation of FV [15]. This source-to-source compiler turns a C++ algorithm into a Boolean circuit, optimizes it, and generates OpenMP parallel code which can then be combined with a HE scheme.
 7.
The multiplicative depth is only an approximation of the homomorphic depth required to absorb the noise generated by the execution of an algorithm [44]. It neglects the noise induced by additions and thus does not hold for algorithms that are too addition-intensive, such as those in the LowMC family.
Notes
Acknowledgments
We thank Yannick Seurin for informing us about the complete characterization of secure hybrid encryption.
References
 1. Algorithms, key size and parameters report 2014. Technical report, ENISA (2014)
 2. Albrecht, M.R., Rechberger, C., Schneider, T., Tiessen, T., Zohner, M.: Ciphers for MPC and FHE. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 430–454. Springer, Heidelberg (2015)
 3. Armknecht, F., Mikhalev, V.: On lightweight stream ciphers with shorter internal states. In: Leander, G. (ed.) FSE 2015. LNCS, vol. 9054, pp. 451–470. Springer, Heidelberg (2015)
 4. Aumasson, J.-P., Dinur, I., Meier, W., Shamir, A.: Cube testers and key recovery attacks on reduced-round MD6 and Trivium. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 1–22. Springer, Heidelberg (2009)
 5. Babbage, S.: A space/time tradeoff in exhaustive search attacks on stream ciphers. In: Proceedings of European Convention on Security and Detection, No. 408. IEEE (1995)
 6. Bellare, M., Desai, A., Jokipii, E., Rogaway, P.: A concrete security treatment of symmetric encryption. In: Proceedings of FOCS, pp. 394–403. IEEE Computer Society (1997)
 7. Berbain, C., Gilbert, H.: On the security of IV dependent stream ciphers. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 254–273. Springer, Heidelberg (2007)
 8. Biryukov, A., Shamir, A.: Cryptanalytic time/memory/data tradeoffs for stream ciphers. In: Okamoto, T. (ed.) ASIACRYPT 2000. LNCS, vol. 1976, pp. 1–13. Springer, Heidelberg (2000)
 9. Borghoff, J., Canteaut, A., Güneysu, T., Kavun, E.B., Knezevic, M., Knudsen, L.R., Leander, G., Nikov, V., Paar, C., Rechberger, C., Rombouts, P., Thomsen, S.S., Yalçin, T.: PRINCE – a low-latency block cipher for pervasive computing applications. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 208–225. Springer, Heidelberg (2012)
 10. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) Fully homomorphic encryption without bootstrapping. TOCT 6(3), 13 (2014)
 11. De Cannière, C., Dunkelman, O., Knežević, M.: KATAN and KTANTAN – a family of small and efficient hardware-oriented block ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272–288. Springer, Heidelberg (2009)
 12. De Cannière, C., Lano, J., Preneel, B.: Comments on the rediscovery of time memory data tradeoffs. Technical report, eSTREAM-ECRYPT Stream Cipher Project (2005). www.ecrypt.eu.org/stream/papersdir/040.pdf
 13. De Cannière, C., Preneel, B.: Trivium. In: Robshaw, M., Billet, O. (eds.) New Stream Cipher Designs. LNCS, vol. 4986, pp. 244–266. Springer, Heidelberg (2008)
 14. Canteaut, A., Carpov, S., Fontaine, C., Lepoint, T., Naya-Plasencia, M., Paillier, P., Sirdey, R.: How to compress homomorphic ciphertexts. IACR Cryptol. ePrint Arch. 2015, 113 (2015). https://eprint.iacr.org/2015/113
 15. Carpov, S., Dubrulle, P., Sirdey, R.: Armadillo: a compilation chain for privacy preserving applications. In: Proceedings of ACM CCSW. ACM (2015)
 16. Chakraborti, A., Chattopadhyay, A., Hassan, M., Nandi, M.: TriviA: a fast and secure authenticated encryption scheme. In: Güneysu, T., Handschuh, H. (eds.) CHES 2015. LNCS, vol. 9293, pp. 330–353. Springer, Heidelberg (2015)
 17. Chenal, M., Tang, Q.: On key recovery attacks against existing somewhat homomorphic encryption schemes. In: Aranha, D.F., Menezes, A. (eds.) LATINCRYPT 2014. LNCS, vol. 8895, pp. 239–258. Springer, Heidelberg (2015)
 18. Cheon, J.H., Coron, J.-S., Kim, J., Lee, M.S., Lepoint, T., Tibouchi, M., Yun, A.: Batch fully homomorphic encryption over the integers. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 315–335. Springer, Heidelberg (2013)
 19. Coron, J.-S., Lepoint, T., Tibouchi, M.: Scale-invariant fully homomorphic encryption over the integers. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 311–328. Springer, Heidelberg (2014)
 20. Courtois, N., Meier, W.: Algebraic attacks on stream ciphers with linear feedback. In: Biham, E. (ed.) Advances in Cryptology – EUROCRYPT 2003. LNCS, vol. 2656, pp. 345–359. Springer, Heidelberg (2003)
 21. Dinur, I., Liu, Y., Meier, W., Wang, Q.: Optimized interpolation attacks on LowMC. IACR Cryptol. ePrint Arch. 2015, 418 (2015)
 22. Dinur, I., Shamir, A.: Cube attacks on tweakable black box polynomials. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 278–299. Springer, Heidelberg (2009)
 23. Doröz, Y., Hu, Y., Sunar, B.: Homomorphic AES evaluation using NTRU. IACR Cryptol. ePrint Arch. 2014, 39 (2014)
 24. Doröz, Y., Shahverdi, A., Eisenbarth, T., Sunar, B.: Toward practical homomorphic evaluation of block ciphers using Prince. In: Böhme, R., Brenner, M., Moore, T., Smith, M. (eds.) FC 2014 Workshops. LNCS, vol. 8438, pp. 208–220. Springer, Heidelberg (2014)
 25. ECRYPT - European Network of Excellence in Cryptology: The eSTREAM Stream Cipher Project (2005). http://www.ecrypt.eu.org/stream/
 26. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptol. ePrint Arch. 2012, 144 (2012)
 27. Fau, S., Sirdey, R., Fontaine, C., Aguilar, C., Gogniat, G.: Towards practical program execution over fully homomorphic encryption schemes. In: IEEE International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, pp. 284–290 (2013)
 28. Fouque, P.-A., Vannet, T.: Improving key recovery to 784 and 799 rounds of Trivium using optimized cube attacks. In: Moriai, S. (ed.) FSE 2013. LNCS, vol. 8424, pp. 502–517. Springer, Heidelberg (2014)
 29. Fuhr, T., Minaud, B.: Match box meet-in-the-middle attack against KATAN. In: Cid, C., Rechberger, C. (eds.) FSE 2014. LNCS, vol. 8540, pp. 61–81. Springer, Heidelberg (2015)
 30. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Proceedings of STOC, pp. 169–178. ACM (2009)
 31. Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the AES circuit. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 850–867. Springer, Heidelberg (2012)
 32. Golić, J.D.: Cryptanalysis of alleged A5 stream cipher. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 239–255. Springer, Heidelberg (1997)
 33. Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013)
 34. Halevi, S., Shoup, V.: Algorithms in HElib. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014, Part I. LNCS, vol. 8616, pp. 554–571. Springer, Heidelberg (2014)
 35. Herranz, J., Hofheinz, D., Kiltz, E.: Some (in)sufficient conditions for secure hybrid encryption. Inf. Comput. 208(11), 1243–1257 (2010)
 36. Hong, J., Sarkar, P.: New applications of time memory data tradeoffs. In: Roy, B. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 353–372. Springer, Heidelberg (2005)
 37. Iwata, T.: New blockcipher modes of operation with beyond the birthday bound security. In: Robshaw, M. (ed.) FSE 2006. LNCS, vol. 4047, pp. 310–327. Springer, Heidelberg (2006)
 38. Jakobsen, T., Knudsen, L.R.: The interpolation attack on block ciphers. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 28–40. Springer, Heidelberg (1997)
 39. Katz, J., Lindell, Y.: Introduction to Modern Cryptography, 2nd edn. Chapman and Hall/CRC Press, Boca Raton (2014)
 40. Knellwolf, S., Meier, W., Naya-Plasencia, M.: Conditional differential cryptanalysis of NLFSR-based cryptosystems. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 130–145. Springer, Heidelberg (2010)
 41. Knellwolf, S., Meier, W., Naya-Plasencia, M.: Conditional differential cryptanalysis of Trivium and KATAN. In: Miri, A., Vaudenay, S. (eds.) SAC 2011. LNCS, vol. 7118, pp. 200–212. Springer, Heidelberg (2012)
 42. Lauter, K., López-Alt, A., Naehrig, M.: Private computation on encrypted genomic data. In: Aranha, D.F., Menezes, A. (eds.) LATINCRYPT 2014. LNCS, vol. 8895, pp. 3–27. Springer, Heidelberg (2015)
 43. Lepoint, T., Naehrig, M.: A comparison of the homomorphic encryption schemes \({\sf FV}\) and \({\sf YASHE}\). In: Pointcheval, D., Vergnaud, D. (eds.) AFRICACRYPT 2014. LNCS, vol. 8469, pp. 318–335. Springer, Heidelberg (2014)
 44. Lepoint, T., Paillier, P.: On the minimal number of bootstrappings in homomorphic circuits. In: Adams, A.A., Brenner, M., Smith, M. (eds.) FC 2013. LNCS, vol. 7862, pp. 189–200. Springer, Heidelberg (2013)
 45. Maximov, A., Biryukov, A.: Two trivial attacks on Trivium. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 36–55. Springer, Heidelberg (2007)
 46. Naehrig, M., Lauter, K.E., Vaikuntanathan, V.: Can homomorphic encryption be practical? In: Proceedings of ACM CCSW, pp. 113–124. ACM (2011)
 47. National Institute of Standards and Technology: Recommendation for Block Cipher Modes of Operation. NIST Special Publication 800-38A (2001)
 48. Rogaway, P.: Evaluation of some blockcipher modes of operation. Cryptrec (2011). http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf
 49. Smart, N.P., Vercauteren, F.: Fully homomorphic SIMD operations. Des. Codes Crypt. 71(1), 57–81 (2014)
 50. Yasuda, K.: A new variant of PMAC: beyond the birthday bound. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 596–609. Springer, Heidelberg (2011)