Polar coding for Ring-LWE-based public key encryption

The ring learning with errors (RLWE) problem can be used to construct efficient post-quantum public key encryption schemes. An error distribution, normally a Gaussian-like distribution, is involved in the RLWE problem. In this work we focus on using polar codes to alleviate a natural trade-off present in RLWE public key encryption schemes: a wider error distribution increases security, but it also increases the probability of decryption error. The motivation of this work is to improve the bit-security level by using a wider error distribution while keeping the target decryption failure rate achievable. The approach we propose is twofold. Firstly, we formulate RLWE public key encryption as a channel model with some noise terms known by the decoder. This distinguishes our approach from existing research of this kind, which ignores these known terms. Secondly, we design polar codes for the derived channel model. Theoretically and numerically, we show that the proposed modeling and polar coding scheme contribute to a considerable bit-security level improvement compared with NewHope, a submission to the National Institute of Standards and Technology (NIST), with almost the same parameters. Moreover, polar encoding and decoding support isochronous implementations in the sense that the timings of the associated operations are independent of the sensitive information.


The contributions of this paper are summarized as follows.
1. We formulate the RLWE-based PKE as an i.i.d. fading channel with CSI available to the receiver, without any "independence" assumptions. This formulation is the prerequisite of the proposed polar coding scheme.
(a) As explained earlier in this section, the coefficient correlation of the residue noise term $e \cdot t - s \cdot e_1 + e_2$ is removed by the canonical embedding, leading to an i.i.d. channel model. We view $e$ and $s$ as CSI, which is known to Alice at the decryption stage, whilst Bob on the other side only knows its distribution.
(b) Taking a telecommunication system as an analogy, mapping a single bit 0 or 1 of the plaintext to a symbol on the constellation $\{0, \lfloor q/2 \rfloor\}$ is called modulation. To make the modulation scheme fit the i.i.d. fading channel in the canonical basis, we propose a new modulation scheme at the cost of error tolerance.
2. We then give the explicit construction of polar codes for the RLWE-based PKE channel model. Experimental results and a theoretical estimate of the DFR are also given. Specifically, we derive a new DFR of $2^{-298}$ for $q = 12289$, $n = 1024$, $r = 2$ ($r = \sqrt{k/2}$) and code rate 0.25, while NewHope gives a DFR of $2^{-216}$ in the same setting; we derive a new DFR of $2^{-156}$ for $r = 2.83$ ($k = 16$) and code rate 0.25, while NewHope is proved to give a DFR of $2^{-137}$ in almost the same setting [32]. Thanks to the new DFR margin, the proposed RLWE-based PKE achieves a better bit-security level than NewHope while achieving the same target DFR. Besides, the encoding and decoding of polar codes support quasi-linear (i.e., $O(n \log n)$, with $n$ the degree of the cyclotomic field of RLWE) and isochronous implementations, which will be discussed in detail in Section 7.2.

Roadmap
This paper is organized as follows. A review of the necessary algebraic number theory, fading channels and polar codes can be found in Section 2. In Section 3 we explain how to formulate a typical RLWE-based PKE scheme as an i.i.d. fading channel. How to handle the dependency in canonical basis is also demonstrated. Section 4 gives a high-level description of RLWE-based PKE with the proposed polar coding scheme. Section 5 gives the explicit construction of polar codes for RLWE. Section 6 analyzes the DFR theoretically and experimentally when polar coding is applied. Section 7 discusses the bit-security improvement derived by the new DFR margin as well as the isochrony of polar codes. Section 8 concludes this paper.

Algebraic number theory
We review the necessary concepts from algebraic number theory required for our discussion of ring-LWE. In particular, we will relate many of our definitions to power-of-two cyclotomic fields, which are popular in modern cryptography.

A number field $K = \mathbb{Q}(\zeta)$ can be defined by adjoining an element $\zeta \in \mathbb{C}$ to the field $\mathbb{Q}$, where $\zeta$ satisfies $f(\zeta) = 0$ for some irreducible polynomial $f(X) \in \mathbb{Q}[X]$. Then, the degree of $K$ over $\mathbb{Q}$ is precisely the degree $n$ of $f(X)$. Because $f(\zeta) = 0$, $K$ can be seen as a vector space over $\mathbb{Q}$ endowed with a basis $\{1, \zeta, \ldots, \zeta^{n-1}\}$ known as the power basis of $K$. Let $\zeta_m$ be a primitive $m$-th complex root of unity with minimal polynomial
$$f(X) = \prod_{i \in \mathbb{Z}_m^*} (X - \zeta_m^i),$$
where $\mathbb{Z}_m^*$ is the group of invertible elements in $\mathbb{Z}_m$. Then, the $m$-th cyclotomic number field is defined as $K = \mathbb{Q}(\zeta_m)$. When $m \ge 2$ is a power of two, $f(X) = X^n + 1$ and $n = m/2$.
A number field $K$ of degree $n$ permits $n$ distinct ring embeddings $\sigma_i : K \to \mathbb{C}$, $i = 1, \ldots, n$, which correspond to $n$ automorphisms of $K$ mapping $\zeta$ to each root of its minimal polynomial $f(X)$. The $n$ embeddings include $s_1$ real embeddings and $s_2$ pairs of complex conjugate embeddings. The concatenation of the $n$ embeddings is called the canonical embedding $\sigma(\cdot)$, which is a map from $K$ into the space
$$H = \{(x_1, \ldots, x_n) \in \mathbb{R}^{s_1} \times \mathbb{C}^{2 s_2} : x_{s_1 + s_2 + j} = \overline{x_{s_1 + j}},\ j = 1, \ldots, s_2\}.$$
For power-of-two cyclotomics, $s_1 = 0$. Because the complex embeddings come in pairs of conjugates, $H$ is isomorphic to $\mathbb{R}^n$. We also remark that under the embedding $\sigma$, multiplication in $K$ maps to coordinate-wise multiplication in $H$.
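Let $\mathcal{O}_K$ be the set of all the algebraic integers in $K$. It forms a ring and is called the ring of integers of the number field. For the above power-of-two cyclotomics, the ring of integers is $\mathcal{O}_K = \mathbb{Z}[X]/(X^n + 1)$, and the canonical embedding maps $\mathcal{O}_K$ to an algebraic lattice in the space $H$. With respect to the power basis, the lattice generator matrix is defined as
$$B = \big(\sigma_i(\zeta^{\,j-1})\big)_{1 \le i, j \le n}.$$
Moreover, because of the conjugate pairs of the embeddings, we can rewrite $\sigma$ as a real-valued map $\sigma' : K \to \mathbb{R}^n$ that records the real and imaginary parts of one embedding from each conjugate pair,
$$\sigma'(x) = \big(\mathrm{Re}\,\sigma_1(x), \mathrm{Im}\,\sigma_1(x), \ldots, \mathrm{Re}\,\sigma_{n/2}(x), \mathrm{Im}\,\sigma_{n/2}(x)\big),$$
and we denote by $B'$ the corresponding basis of the images of the mapping. Note that both $B$ and $B'$ are orthogonal matrices up to scaling: the determinant of $B$ is $\sqrt{n}^{\,n}$ while that of $B'$ is $(\sqrt{n/2})^{n}$.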
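As a numerical sanity check of the properties stated in this subsection, the following sketch builds the two embedding matrices for a small power-of-two cyclotomic field with NumPy. The function names (`embedding_matrix`, `real_embedding_matrix`, `negacyclic_mul`), the ordering of the roots and the interleaving of real and imaginary parts are illustrative assumptions, not notation taken from the text.

```python
import numpy as np

def embedding_matrix(n):
    """Vandermonde matrix whose rows evaluate a polynomial of degree < n
    at the n roots of X^n + 1 (the primitive 2n-th roots of unity)."""
    ks = np.arange(1, n, 2)                    # exponents 1, 3, ..., n-1
    # Assumed ordering: rows n/2..n-1 are the conjugates of rows 0..n/2-1.
    exps = np.concatenate([ks, 2 * n - ks])
    roots = np.exp(1j * np.pi * exps / n)
    return np.vander(roots, n, increasing=True)  # entry (i, j) = roots[i] ** j

def real_embedding_matrix(n):
    """Real counterpart: interleave Re and Im of one embedding per conjugate pair."""
    half = embedding_matrix(n)[: n // 2]
    real = np.empty((n, n))
    real[0::2] = half.real
    real[1::2] = half.imag
    return real

def negacyclic_mul(a, b):
    """Multiply two coefficient vectors modulo X^n + 1."""
    n = len(a)
    full = np.convolve(a, b)       # ordinary product, length 2n - 1
    res = full[:n].copy()
    res[: n - 1] -= full[n:]       # fold back using X^n = -1
    return res
```

For example, with `n = 8` the product `embedding_matrix(8) @ negacyclic_mul(a, b)` agrees with the coordinate-wise product of the two embedded vectors, and the rows of `real_embedding_matrix(8)` are mutually orthogonal with squared norm `n / 2`.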

Ring-LWE public key encryption scheme and the coefficient dependency
For concreteness, we give an example of a public key scheme based on ring-LWE which was first described in [21]. Many ring-LWE schemes and protocols including NewHope closely resemble this one. The scheme is parameterized by an integer modulus $q$, dimension $n$, and error distribution $\chi$ over $R_q$. We will take the example of NewHope and view $R_q$ as $\mathbb{Z}_q[x]/(x^n + 1)$, and define sampling from $\chi$ to be sampling each coefficient of a polynomial from the discrete Gaussian over $\mathbb{Z}$. The scheme proceeds as follows.
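- Alice samples a secret key $s \leftarrow \chi$ and publishes as a public key a ring-LWE sample $(a, b) = (a, a \cdot s + e) \in R_q \times R_q$, where $a$ is uniformly random and $e \leftarrow \chi$.
- Bob encrypts a message $m \in R_2$ as $(c_1, c_2) = (a \cdot t + e_1,\ b \cdot t + e_2 + \lfloor q/2 \rfloor m)$, where $e_1, e_2, t$ are sampled independently from $\chi$.
- Alice computes $d = c_2 - c_1 \cdot s = \lfloor q/2 \rfloor m + e \cdot t - s \cdot e_1 + e_2$ and then recovers the message $m$ by decoding: if the $i$-th coordinate of $d$ is closer to 0 than to $\lfloor q/2 \rfloor$, Alice assumes the $i$-th coordinate of $m$ was 0, otherwise she assumes it was 1.
The dependency between the coefficients of the residue noise term $e \cdot t - s \cdot e_1 + e_2$ becomes obvious if we rewrite it in vector form using the coefficient embedding of $R_q$, i.e.,
$$e \cdot t \;\mapsto\; \begin{pmatrix} e^{(0)} & -e^{(n-1)} & \cdots & -e^{(1)} \\ e^{(1)} & e^{(0)} & \cdots & -e^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ e^{(n-1)} & e^{(n-2)} & \cdots & e^{(0)} \end{pmatrix} \mathbf{t},$$
where $e^{(i)}$ is the $i$-th coefficient of the polynomial $e$ and $\mathbf{t}$ is the coefficient embedding of the polynomial $t$. The row vectors of the negacyclic matrix generated by $e$ have identical norm and they are all multiplied by the same vector $\mathbf{t}$; the same holds for $s$ and $e_1$.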
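The scheme can be exercised end to end with a toy sketch. The code below is only an illustration under stated assumptions: the dimension `N = 16` is far too small to be secure, the error sampler is a rounded Gaussian standing in for the discrete Gaussian, and all function names are hypothetical. It uses the standard form of the ciphertext, $(c_1, c_2) = (a \cdot t + e_1,\ b \cdot t + e_2 + \lfloor q/2 \rfloor m)$, which is the form consistent with the residue noise term $e \cdot t - s \cdot e_1 + e_2$.

```python
import numpy as np

Q = 12289   # modulus used in the text
N = 16      # toy dimension (assumption for illustration only, not secure)

def negacyclic_mul(a, b, q=Q):
    """Multiply in Z_q[x]/(x^n + 1) on coefficient vectors."""
    n = len(a)
    full = np.convolve(a, b)
    res = full[:n].copy()
    res[: n - 1] -= full[n:]       # x^n = -1
    return res % q

def sample_error(rng, r=2.0, n=N):
    # Rounded Gaussian used as a stand-in for the discrete Gaussian chi.
    return np.rint(rng.normal(0.0, r, n)).astype(np.int64)

def keygen(rng):
    a = rng.integers(0, Q, N)
    s, e = sample_error(rng), sample_error(rng)
    b = (negacyclic_mul(a, s) + e) % Q
    return (a, b), (s, e)

def encrypt(pk, m, rng):
    a, b = pk
    t, e1, e2 = sample_error(rng), sample_error(rng), sample_error(rng)
    c1 = (negacyclic_mul(a, t) + e1) % Q
    c2 = (negacyclic_mul(b, t) + e2 + (Q // 2) * m) % Q
    return (c1, c2), (t, e1, e2)    # randomness returned only for inspection

def decrypt(s, ct):
    c1, c2 = ct
    d = (c2 - negacyclic_mul(c1, s)) % Q
    dist_to_0 = np.minimum(d, Q - d)        # distance to 0 modulo q
    dist_to_half = np.abs(d - Q // 2)       # distance to floor(q/2)
    return (dist_to_half < dist_to_0).astype(np.int64), d
```

The value `d` returned by `decrypt` equals $\lfloor q/2 \rfloor m$ plus the residue noise term modulo $q$, so decryption succeeds whenever every coefficient of that term stays below roughly $q/4$ in absolute value.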

Fading channel
In wireless communications, a fading channel arises due to a time-varying attenuation of signal quality caused by either the propagation environment or by movement of the transmitter/receiver. We consider a discrete-time fading channel model $W$,
$$y_i = h_i x_i + z_i, \quad i = 1, \ldots, N,$$
where $x_i$ and $y_i$ are the channel input and output, $h_i$ is the channel gain, $z_i$ is additive white Gaussian noise (AWGN) and $N$ is the signal length. We highlight two facts about CSI which are relevant to the RLWE channel model we will discuss in Section 3. Firstly, a few consecutive $h_i$ may be correlated, and this period is called the coherence interval of the fading channel $W$, denoted by $T_c$. In the context of a fading channel with memory, the channel gain $h_i$ is assumed to be constant within one coherence interval and to vary independently from one coherence interval to the next. Secondly, the realization of $h_i$ is called channel state information (CSI) and the distribution of $h_i$ is called channel distribution information (CDI). CSI is sometimes known to the decoder.
When designing a telecommunication system, we prefer i.i.d. fading channels where h i are independent. There are a few methods to deal with the correlation. Let m = T c > 1 and N/m = n. Since a fading channel with coherence interval T c can be seen as m parallel sub-channels, a bit-interleaved coded modulation (BICM) technique can be used to handle the correlation between sub-channels [7,22]. Another solution is to use multilevel codes [11] to design a coded modulation scheme with signal points in an m-dimensional signal space. In [18], a properly chosen lattice partition chain Λ 1 /⋯/Λ l− 1 /Λ l is employed to design multilevel polar codes to achieve fading channel capacity. In this case, the dimension m of Λ 1 is properly chosen such that the channel gain h i is assumed to be a constant amid the whole transmission of m symbols, i.e. T c = m. A component code C i at the i-th level of the partition chain is designed in order to achieve the capacity of a Λ i /Λ i+ 1 fading channel. The component codes are combined by construction D giving rise to a lattice. More information about the multilevel construction and the Λ i /Λ i+ 1 channel can be found in [18] and [11]. We give an example of a mod ℤ channel and a ℤ∕2ℤ channel as follows and the fading version will be given in Section 3.

Example 1
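A mod $\mathbb{Z}$ channel is an AWGN channel with input $x$ restricted to $V(\mathbb{Z})$, the Voronoi region of $\mathbb{Z}$. At the receiver's end, there is a mod $V(\mathbb{Z})$ operation giving the equivalent channel output as
$$y = (x + z) \bmod \mathbb{Z} = (x + z') \bmod \mathbb{Z},$$
where $z$ is AWGN noise and $z' = z \bmod \mathbb{Z}$.

Example 2
A $\mathbb{Z}/2\mathbb{Z}$ channel is an AWGN channel with input restricted to $x \in (\mathbb{Z} + a) \cap V(2\mathbb{Z})$ for some offset $a \in \mathbb{R}$. At the receiver's end, the equivalent channel output is
$$y = (x + z) \bmod 2\mathbb{Z} = (x + z') \bmod 2\mathbb{Z},$$
where $z$ is AWGN noise and $z' = z \bmod 2\mathbb{Z}$. It can be viewed as a mod $2\mathbb{Z}$ channel with input restricted to the set of elements of $\mathbb{Z} + a$ that fall in $V(2\mathbb{Z})$.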
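The two examples can be simulated in a few lines. The sketch below is a hypothetical illustration: it takes the Voronoi region of the lattice $a\mathbb{Z}$ to be the half-open interval $[-a/2, a/2)$, and it picks the offset $1/2$ so that the two inputs of the $\mathbb{Z}/2\mathbb{Z}$ channel are $\pm 0.5$. Both choices, like the function names, are assumptions made for the example.

```python
import numpy as np

def mod_lattice(x, a=1.0):
    """Reduce x modulo the one-dimensional lattice aZ into [-a/2, a/2)."""
    x = np.asarray(x, dtype=float)
    return x - a * np.floor(x / a + 0.5)

def z_mod_2z_channel(bits, sigma, rng, offset=0.5):
    """Send bits over a Z/2Z channel: inputs lie in (Z + offset) within V(2Z)."""
    x = mod_lattice(np.asarray(bits) + offset, 2.0)   # bit 0 -> 0.5, bit 1 -> -0.5
    z = rng.normal(0.0, sigma, size=x.shape)          # AWGN
    y = mod_lattice(x + z, 2.0)                       # receiver-side mod 2Z
    return x, y
```

Because reduction modulo a lattice is idempotent, `mod_lattice(x + z, 2.0)` and `mod_lattice(x + mod_lattice(z, 2.0), 2.0)` coincide, which is the equivalence between the noise $z$ and the folded noise $z'$ used in both examples.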
In the special case of T c = 1, channel W is referred to as an i.i.d. fading channel. The design and performance of error-correcting codes for i.i.d. fading channels with/without CSI is well studied [6,36]. In [18], Liu et al. proposed a polar coding scheme for i.i.d. fading channels to achieve the ergodic capacity. Unlike previous work of [6] in which CSI is given to both ends of communication, in Liu et al.'s scheme CSI is only known to the receiver which is more feasible in practice.

Polar codes
Polar codes, introduced by Arıkan in [5], are linear block codes of length $n = 2^l$ for a positive integer $l$ that achieve the capacity of any binary-input discrete memoryless symmetric (BDMS) channel asymptotically.⁵ We first review some basics of polar codes for a BDMS channel. A binary-input channel $W$ is symmetric if there exists a permutation $\pi$ of the output alphabet $Y$ such that $W(y|1) = W(\pi(y)|0)$ and $\pi^{-1} = \pi$ for $y \in Y$. Given a BDMS channel $W$, there are two commonly used metrics in information theory to measure the quality of $W$: the mutual information⁶ and the reliability.

Definition 1 (Mutual information of BDMS channels)
The mutual information $I(W) \in [0,1]$ of a BDMS channel $W : X \to Y$ is the maximum rate at which information can be reliably transmitted from the transmitter to the receiver. We define $I(W)$ as
$$I(W) = \sum_{y \in Y} \sum_{x \in X} \frac{1}{2} W(y|x) \log_2 \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)}.$$
Here we use the definition of symmetric mutual information assuming uniform channel input, which is also the capacity of the BDMS channel. We use the notations $I(W)$ and $I(Y;X)$ interchangeably to denote the mutual information of $W$.

Definition 2 (Bhattacharyya parameter of BDMS channels)
The Bhattacharyya parameter $Z(W) \in [0,1]$ is a measure of channel reliability for a BDMS channel $W$ defined as
$$Z(W) = \sum_{y \in Y} \sqrt{W(y|0)\, W(y|1)},$$
where a small $Z(W)$ indicates a more reliable channel while a large $Z(W)$ indicates a noisier, less reliable channel.
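Both metrics are easy to evaluate for a channel with a finite output alphabet. The sketch below assumes the channel is given as a $2 \times |Y|$ array of transition probabilities whose row $x$ holds $W(\cdot|x)$; this representation and the function names are choices made for the example. It uses the standard definitions of symmetric mutual information and of the Bhattacharyya parameter.

```python
import numpy as np

def mutual_information(W):
    """Symmetric mutual information (uniform input) of a binary-input channel."""
    W = np.asarray(W, dtype=float)
    out = 0.5 * (W[0] + W[1])        # output distribution under uniform input
    total = 0.0
    for x in (0, 1):
        mask = W[x] > 0              # terms with W(y|x) = 0 contribute nothing
        total += 0.5 * np.sum(W[x][mask] * np.log2(W[x][mask] / out[mask]))
    return float(total)

def bhattacharyya(W):
    """Bhattacharyya parameter Z(W) = sum_y sqrt(W(y|0) W(y|1))."""
    W = np.asarray(W, dtype=float)
    return float(np.sum(np.sqrt(W[0] * W[1])))
```

For a binary erasure channel with erasure probability $\epsilon$, written as `[[1 - eps, eps, 0], [0, eps, 1 - eps]]`, these functions return $1 - \epsilon$ and $\epsilon$ respectively, which illustrates that a reliable channel has large $I(W)$ and small $Z(W)$.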
The capacity-achieving nature of polar codes arises from the so-called channel polarization phenomenon as a result of recursive applications of Arıkan's transform to two identical $W$ channels and their synthesized derivatives. The overall recursive transform can be done in a channel combining phase and a channel splitting phase. In the channel combining phase, a linear transformation defined as $X_{1:n} = U_{1:n} G_n$ is performed on a vector $U_{1:n} \in X^n$ over GF(2), where
$$G_n = B_n \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes l}.$$
$B_n$ is a bit-reversal permutation matrix: if $U'_{1:n} = U_{1:n} B_n$ and $l = \log_2 n$, the $i' = ((b_l, \cdots, b_2, b_1)_2 + 1)$-th coordinate of $U'_{1:n}$ is the $i = ((b_1, b_2, \cdots, b_l)_2 + 1)$-th coordinate of $U_{1:n}$, where $(\cdots)_2$ denotes the integer with the given binary expansion. By taking $X_{1:n}$ as the raw input of $W$, one derives a combined channel $W_n : X^n \to Y^n$ with transition probability
$$W_n(y_{1:n} | u_{1:n}) = \prod_{i=1}^{n} W\big(y^{(i)} \,\big|\, (u_{1:n} G_n)^{(i)}\big),$$
where $(\cdot)^{(i)}$ denotes the $i$-th coordinate. Since $G_n$ induces a one-to-one mapping between $U_{1:n}$ and $X_{1:n}$, the mutual information of $W_n$ is $I(W_n) = I(Y_{1:n}; X_{1:n}) = n I(W)$.

⁵ Polar codes have been generalized to a larger class of channels, e.g., the binary-input memoryless symmetric (BMS) channel.
⁶ The maximum mutual information over all possible channel input distributions is the channel capacity.
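In the channel splitting phase, $W_n$ is further split back into $n$ synthesized channels $W_n^{(i)} : X \to Y^n \times X^{i-1}$ whose transition probability is defined by
$$W_n^{(i)}(y_{1:n}, u_{1:i-1} \,|\, u^{(i)}) = \sum_{u_{i+1:n} \in X^{n-i}} \frac{1}{2^{n-1}} W_n(y_{1:n} | u_{1:n}).$$
It is proved in [5] that Arıkan's transform preserves the mutual information in the sense that
$$\sum_{i=1}^{n} I(W_n^{(i)}) = n I(W).$$
More importantly, the quality of the synthesized channels polarizes asymptotically as the recursion proceeds.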

Theorem 1 (Channel polarization of mutual information [5])
For any BDMS channel $W$, the synthesized channels $W_n^{(i)}$ polarize in the sense that, for any fixed $\delta \in (0,1)$, as $n$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \cdots, n\}$ for which $I(W_n^{(i)}) \in (1 - \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_n^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.
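Polarization can be observed numerically on the binary erasure channel (BEC), the one case where the parameters of the synthesized channels have a simple closed-form recursion: a BEC with erasure probability $z$ splits into two BECs with erasure probabilities $2z - z^2$ and $z^2$, and $I = 1 - Z$. The sketch below uses the BEC purely as an illustration; it is not the channel studied in this paper, and the function names are hypothetical.

```python
import numpy as np

def bec_bhattacharyya(eps, l):
    """Bhattacharyya parameters of the n = 2**l synthesized channels of a BEC(eps)."""
    z = np.array([eps], dtype=float)
    for _ in range(l):
        nxt = np.empty(2 * len(z))
        nxt[0::2] = 2 * z - z ** 2   # degraded ("minus") channel
        nxt[1::2] = z ** 2           # upgraded ("plus") channel
        z = nxt
    return z

def polarized_fractions(eps, l, delta):
    """Fractions of indices with I in (1 - delta, 1] and with I in [0, delta)."""
    info = 1.0 - bec_bhattacharyya(eps, l)   # for a BEC, I = 1 - Z
    return float(np.mean(info > 1 - delta)), float(np.mean(info < delta))
```

Summing `1 - bec_bhattacharyya(eps, l)` gives $n(1 - \epsilon)$ for every `l`, which is the conservation of mutual information, while the first value returned by `polarized_fractions` moves towards $I(W) = 1 - \epsilon$ as `l` grows.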
The channel polarization theorem can also be stated in the metric of the Bhattacharyya parameter by replacing $I(W_n^{(i)})$ by $Z(W_n^{(i)})$. For any desired transmission rate $R < I(W)$, we can partition $\{1, \cdots, n\}$ into a subset $A$ and its complement $A^c$ such that (i) $|A| = \lfloor nR \rfloor$ and (ii) for any $i \in A$ and $j \in A^c$, $Z(W_n^{(i)}) \le Z(W_n^{(j)})$. Given the "best" $\lfloor nR \rfloor$ channels indexed by $A$, one can construct polar codes following the encoding rule
$$X_{1:n} = U_A G_n(A) \oplus U_{A^c} G_n(A^c),$$
where $G_n(A)$ denotes the submatrix of $G_n$ formed by the rows with indices in $A$, $\oplus$ is the XOR operation, $U_A$ is called the information vector and $U_{A^c}$ is called the frozen vector, known by both encoder and decoder. A typical realization of the frozen vector is $U_{A^c} = 0$ for BDMS channels. In this manner, the useful information is transmitted via the most reliable synthesized channels. A question may arise on how to efficiently calculate $Z(W_n^{(i)})$. A brief review can be found in Sections 2.5 and 5.3, but detailed descriptions of these methods are beyond the scope of this work.
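The encoding rule can be made concrete with a small sketch. It builds $G_n = B_n F^{\otimes l}$ explicitly with a Kronecker product, which is convenient for illustration but costs $O(n^2)$ rather than the $O(n \log n)$ of a butterfly implementation. Indices are 0-based in the code, the frozen bits are set to zero, and the function names are hypothetical.

```python
import numpy as np

def bit_reverse_indices(n):
    """Index permutation that reverses the l-bit binary expansion of 0..n-1."""
    l = n.bit_length() - 1
    return np.array([int(format(i, f"0{l}b")[::-1], 2) for i in range(n)])

def generator_matrix(n):
    """G_n = B_n F^{(x) l} over GF(2), for n a power of two."""
    F = np.array([[1, 0], [1, 1]], dtype=np.int64)
    Fl = np.array([[1]], dtype=np.int64)
    for _ in range(n.bit_length() - 1):
        Fl = np.kron(Fl, F)
    # Left-multiplying by the permutation matrix B_n reorders the rows.
    return Fl[bit_reverse_indices(n)]

def polar_encode(info_bits, info_set, n):
    """Place info_bits on the (0-based) positions in info_set, freeze the rest to 0."""
    u = np.zeros(n, dtype=np.int64)
    u[sorted(info_set)] = info_bits
    return u, (u @ generator_matrix(n)) % 2
```

For `n = 2` the encoder maps $(u_1, u_2)$ to $(u_1 \oplus u_2, u_2)$, which is Arıkan's basic transform.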
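The successive cancellation (SC) decoder is the original decoding algorithm for polar codes. Let $u^{(i)}$ be the $i$-th coordinate of $U_{1:n}$. Given a channel output $y_{1:n}$ of a polar code, the SC decoder yields the estimate $\bar{u}^{(i)}$ of $u^{(i)}$ in sequential order of the index $i$ according to the decoding rule
$$\bar{u}^{(i)} = \begin{cases} u^{(i)}, & i \in A^c, \\ 0, & i \in A \text{ and } \dfrac{W_n^{(i)}(y_{1:n}, \bar{u}_{1:i-1} | 0)}{W_n^{(i)}(y_{1:n}, \bar{u}_{1:i-1} | 1)} \ge 1, \\ 1, & \text{otherwise}, \end{cases}$$
where $\bar{u}_{1:i-1}$ is the estimate of $u_{1:i-1}$ recovered before $\bar{u}^{(i)}$. Details of the SC decoder can be found in Appendix A.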
Denote by P e the averaged probability of frame errors. As a result of polar encoding and SC decoding, it is proved in [5] that P e is upper bounded as follows.

Theorem 2 (Decoding performance [5]) For any BDMS channel $W$ and any choice of parameters $(n, R, A)$,
$$P_e \le \sum_{i \in A} Z(W_n^{(i)}).$$
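The bound on the frame error probability, the sum of $Z(W_n^{(i)})$ over the information set, is what makes code construction practical: once the Bhattacharyya parameters are known, selecting $A$ and evaluating the bound take a few lines. The sketch below again uses the binary erasure channel as a stand-in, because its parameters are available in closed form; the channel, the rate and the function names are illustrative assumptions and do not reproduce the DFR figures of this paper.

```python
import numpy as np

def bec_bhattacharyya(eps, l):
    """Bhattacharyya parameters of the n = 2**l synthesized channels of a BEC(eps)."""
    z = np.array([eps], dtype=float)
    for _ in range(l):
        nxt = np.empty(2 * len(z))
        nxt[0::2] = 2 * z - z ** 2
        nxt[1::2] = z ** 2
        z = nxt
    return z

def polar_construction(eps, l, rate):
    """Pick the floor(n * rate) most reliable indices and the resulting bound on P_e."""
    z = bec_bhattacharyya(eps, l)
    k = int(np.floor(len(z) * rate))
    info_set = np.sort(np.argsort(z, kind="stable")[:k])   # 0-based indices of A
    return info_set, float(np.sum(z[info_set]))
```

Calling `polar_construction(0.5, 10, 0.25)` selects 256 of the 1024 synthesized channels; lowering the rate can only shrink the bound, since fewer and more reliable channels are summed.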

Channel degradation and upgradation
The construction of polar codes can be addressed if all the Bhattacharyya parameters $Z(W_n^{(i)})$ of the synthesized channels can be efficiently calculated. To this end, a quantization method was proposed in [34] to construct a degraded or upgraded approximation of a binary-input memoryless symmetric (BMS) channel. In this way, one can approximate $Z(W_n^{(i)})$ efficiently with tractable and minor distortion. We define the degradation and upgradation relations as follows; they will be discussed further in the sequel.

Definition 3 (Degraded and Upgraded Channel, [34])
A channel $Q : X \to Z$ is degraded with respect to a channel $W : X \to Y$ if there exists a channel $P : Y \to Z$ such that
$$Q(z|x) = \sum_{y \in Y} W(y|x)\, P(z|y)$$
for all $z \in Z$ and $x \in X$. We denote by $Q \preceq W$ the relation that $Q$ is degraded with respect to $W$. Conversely, we denote by $Q' \succeq W$ the relation that $Q'$ is upgraded with respect to $W$ if there exists a channel $Q' : X \to Z'$ and a channel $P : Z' \to Y$ such that
$$W(y|x) = \sum_{z' \in Z'} Q'(z'|x)\, P(y|z')$$
for all $y \in Y$ and $x \in X$.
Moreover, Lemma 1 indicates that the synthesized channels of $Q$, $W$, $Q'$ under Arıkan's transform also fulfill the channel degradation and upgradation relation. This implies that a polar code constructed for $Q$ also fits $W$.

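Lemma 1 (restatement of Lemma 4.7 in [17])
Given BMS channels $W$, $Q$, and $Q'$, we have that if $Q \preceq W \preceq Q'$, then $Q_n^{(i)} \preceq W_n^{(i)} \preceq Q_n'^{(i)}$ for all $i$.
If the channel degradation or upgradation relation is set up, their channel capacity, reliability and error probability will be related as follows. For $Q$ degraded with respect to $W$,
$$I(Q) \le I(W), \qquad Z(Q) \ge Z(W), \qquad P_e(Q) \ge P_e(W).$$
The inequalities reverse if we replace "degraded" by "upgraded".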

Definition 4
The real multivariate normal distribution $N(\mu, \Sigma)$ on $\mathbb{R}^n$ has density function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\Big(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Big),$$
where $|\cdot|$ denotes the determinant.
In formula (4), $B$ is the orthogonal basis defined in Section 2.1 and the multiplications (i.e., $\sigma(e)\sigma(t)$ and $\sigma(s)\sigma(e_1)$) and additions are both coordinate-wise as explained in Section 2.1. Due to the conjugate pairs, formula (4) can be refined as formula (5), where $B'_j$ represents the $j$-th row of $B'$, the vectors $\mathbf{y}$ and $\mathbf{m}$ are the vector forms of the polynomials $y$ and $m$, $B'$ and $\sigma'$ are introduced in Section 2.1, $B'\mathbf{y} = \sigma'(y)$, and $B'\lfloor q/2 \rfloor \mathbf{m} = \sigma'(\lfloor q/2 \rfloor m)$. To see how the noise term $N$ is distributed, we rewrite formula (5) for all the odd indices $i = 1, 3, 5, \cdots, n/2 - 1$ as formula (6).
Under the embedding $\sigma : K \to \mathbb{C}^n$, the spherical normal distributed vectors $e$ and $t$ are mapped to complex spherical normal vectors, $\sigma(e), \sigma(t) \sim N_{\mathbb{C}}(0, n r^2)$. As for the embedding $\sigma' : K \to \mathbb{R}^n$, the spherical normal distribution $N(0, r^2)$ is transformed to a new spherical normal distribution $N(0, n r^2/2)$. Since $e, t$ are coordinate-wise i.i.d., their embeddings $\sigma(e)$, $\sigma(t)$, $\sigma'(e)$, $\sigma'(t)$ are coordinate-wise independent as well. We observe from formula (6) that every odd-indexed coordinate and the next even-indexed coordinate are correlated because they share the same embedding terms.
To further refine the RLWE channel model, we rewrite formulas (5) and (6) as formula (7), expressed in terms of a gain $H_i$ and a noise term $Z_i$. Because of the correlation between every two coordinates, $H_i$ and $H_j$ are independent for two different indices $i, j$ as long as $\lceil i/2 \rceil \ne \lceil j/2 \rceil$; otherwise $H_i = H_j$. Similarly, $Z_i$ and $Z_j$ are correlated if $\lceil i/2 \rceil = \lceil j/2 \rceil$; otherwise they are independent. Unlike in NewHope and other RLWE-based encryption schemes where the plaintext is encoded and decoded in the polynomial basis, we will carry out encoding and decoding in the canonical basis. Observe that the channel given by formula (7) is a fading channel with coherence interval $T_c = 2$ coordinates, except that the symbols to be transmitted after modulation, i.e., $B'\lfloor q/2 \rfloor \mathbf{m}$, are not coordinate-wise independent.
In the next subsection, we will adjust the modulation scheme such that a tailored constellation diagram can fit the fading channel.

A tailored constellation diagram
The RLWE channel in formula (3) can be interpreted as $n$ parallel $\mathbb{Z}/2\mathbb{Z}$ channels where a message $m \in \{0,1\}^n$ is mapped to a symbol on the constellation diagram $\{0, \lfloor q/2 \rfloor\}^n$. The mod $R_q$ operation defines a valid constellation space as an $n$-dimensional cube $\Lambda$ with vertices $\{0, q\}^n$. To ease the description of how we design a new constellation diagram in the canonical basis, we make a modification to the modulation scheme in formula (3): the message $m \in \{-1, 1\}^n$ is mapped onto the constellation diagram $\{\pm\lfloor q/4 \rfloor\}^n$ and the valid constellation space is a cube $\Lambda$ with vertices $\{\pm\lfloor q/2 \rfloor\}^n$. This modification preserves the capacity of the $\mathbb{Z}/2\mathbb{Z}$ channel because the two schemes are statistically equivalent if we ignore the geometrical approximation caused by the round-off operation $\lfloor \cdot \rfloor$. According to formula (7), after applying the canonical embedding, the constellation diagram turns into $B'\{\pm\lfloor q/4 \rfloor\}^n$. Similarly, we can obtain the new constellation space $\Lambda' = B'\Lambda$ by rotating $\Lambda$ and scaling it up by a factor of $\sqrt{n/2}$. As discussed in the previous subsection, the coherence interval $T_c$ of the residue noise equals 2 coordinates while the constellation symbol $B'\lfloor q/4 \rfloor \mathbf{m}$ has memory throughout $n$ coordinates. In a communication system, the interleaving technique can be used to alleviate the correlation of the source by permuting symbols of different code blocks. Unfortunately, interleaving is impractical in the RLWE channel because there is only one code block of length $n$. At the cost of distance between the constellation symbols, we tailor the constellation space $\Lambda'$ to fit the fading channel.
Essentially, we are looking for a new modulation scheme meeting two conditions: (a) we desire the symbols after modulation (or the modulated message) to be coordinate-wise i.i.d.; in other words, we expect a valid constellation diagram inside the space $\Lambda'$ such that for a coordinate-wise i.i.d. message $m$, the modulated message is coordinate-wise i.i.d. as well; (b) the new modulation scheme gives us a $\mathbb{Z}/2\mathbb{Z}$ channel. Conceptually, the maximal $n$-dimensional cube $\Lambda''$ enclosed in $\Lambda'$ and parallel to $\Lambda$ is our target constellation space. In this case, the symbols to be transmitted can easily be made binary and i.i.d. if we divide the cube $\Lambda''$ equally into $2^n$ small cubes and select all the centers of the small cubes to be the constellation diagram. However, looking for such a $\Lambda''$ in practice is intractable when the dimension $n$ is large and it is unclear in what direction and by what degree the cube $\Lambda'$ is rotated with respect to $\Lambda$. Instead, we compromise on the constellation size and use the cube $\Lambda''$ which is parallel to $\Lambda$ and is enclosed in the maximal ball inscribed in $\Lambda'$. In this manner, we can make sure there always exists such a constellation space $\Lambda''$ and it is straightforward to calculate its size. Figure 1 illustrates this idea in the 2-dimensional case. If the side length of $\Lambda$ is $q$, the side of $\Lambda'$ turns out to have length $q\sqrt{n/2}$, and the side of $\Lambda''$ will be $q/\sqrt{2}$.
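The side length of $\Lambda''$ can be checked with elementary geometry. The short derivation below assumes that $\Lambda''$ is the largest cube inscribed in the ball inscribed in $\Lambda'$, i.e., that its vertices touch the ball; the symbols $\rho$ and $a$ are introduced here only for this calculation.

```latex
\begin{align*}
\text{radius of the ball inscribed in } \Lambda' &: \quad \rho = \tfrac{1}{2}\, q \sqrt{n/2}, \\
\text{diagonal of an $n$-cube of side } a &: \quad a\sqrt{n} = 2\rho, \\
\text{hence the side of } \Lambda'' &: \quad a = \frac{q\sqrt{n/2}}{\sqrt{n}} = \frac{q}{\sqrt{2}}.
\end{align*}
```

The result is independent of $n$: moving to the tailored constellation always shrinks the side of the original cube $\Lambda$ by a factor of $\sqrt{2}$, which is the loss in error tolerance paid for an i.i.d. modulated message.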

Tailored RLWE channel model in canonical basis
Given the tailored constellation space $\Lambda''$ and its corresponding constellation diagram, we now have a tailored RLWE channel model in the canonical basis, given by formula (8). As discussed for formula (7), $H_i$ and $H_j$ are independent for two different indices $i, j$ as long as $\lceil i/2 \rceil \ne \lceil j/2 \rceil$; otherwise $H_i = H_j$. Similarly, $Z_i$ and $Z_j$ are independent if $\lceil i/2 \rceil \ne \lceil j/2 \rceil$; otherwise they are correlated. We observe that the tailored channel model in formula (8) can be seen as a fading channel where $H_i$ is the channel gain and $Z_i$ is the additive noise. A family of fading channels (e.g., i.i.d. fading, block fading, compound fading) is well studied in the existing work of [6,18,36] and explicit constructions of error-correcting codes are given. In this work, since $H_i$ and $Z_i$ have the same coherence interval of two coordinates, our strategy is to divide the $n$ parallel channels into two groups of i.i.d. channels and to construct two parallel polar codes of equal block length $n/2$ for the two $\mathbb{Z}/2\mathbb{Z}$ fading channels. Note that in this work we use parameters similar to NewHope, e.g., $q = 12289$, $n = 1024$, $r \in \{1, 2, 6, 9\}$, where the values of $r$ correspond to the "Short" and "Tall" parameters in [8].
Denote by $L$ and $L'$ two one-dimensional lattices forming the partition $L/L'$. A polar coding strategy for fading channels of this kind is given in [18], and we adapt that strategy to our tailored RLWE channel model. A diagram of a fading $L/L'$ channel with CSI available to the decoder is shown in Fig. 2. The pdf of $H$ for various choices of the parameter $r$ is depicted in Fig. 3.
As discussed in [12] and [18], the capacity of the fading $L/L'$ channel is given by
$$C = \mathbb{E}_H\big[\mathfrak{h}(L, (H\sigma)^2) - \mathfrak{h}(L', (H\sigma)^2)\big] + \log_2 |L/L'|,$$
where $\mathbb{E}_H[\cdot]$ denotes the expectation over the fading coefficient, $\mathfrak{h}(L, \sigma^2)$ and $\mathfrak{h}(L', \sigma^2)$ are the differential entropies of the mod-$L$ and mod-$L'$ channels respectively, and $|L/L'|$ is the order of the partition $L/L'$. Specifically, $\mathfrak{h}(L, (h\sigma)^2)$ is given by
$$\mathfrak{h}(L, (h\sigma)^2) = -\int_{R} f_{L,(h\sigma)^2}(t) \log_2 f_{L,(h\sigma)^2}(t)\, dt, \qquad f_{L,(h\sigma)^2}(t) = \sum_{\lambda \in L} g_{(h\sigma)^2}(t - \lambda),$$
where $R$ is a fundamental region of the lattice $L$ and $g_{(h\sigma)^2}(\cdot)$ is the density function of $N(0, h^2\sigma^2)$. We refer to $f_{L,(h\sigma)^2}$ as an $L$-periodic Gaussian density function, which is defined by summing up copies of a Gaussian density function centered at every lattice point of $L$. The value of an $L$-periodic Gaussian variable $z'$ is restricted to any fundamental region $R(L)$ of $L$, so that the integral of its density function over $R(L)$ is 1. See Fig. 4 for the ergodic capacity of the fading $L/L'$ channel $W : X \to (\tilde{Y}, H)$ with respect to different choices of $r$. In a communication system, the signal-to-noise ratio (SNR) is a measure of the reliability of a channel. It is defined as the ratio of the signal strength over the noise strength.⁸
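The quantities in the capacity expression can be evaluated numerically. The sketch below is an illustration under simplifying assumptions: it takes $L = a\mathbb{Z}$ and $L' = 2a\mathbb{Z}$, so that $|L/L'| = 2$, truncates the periodic sum to finitely many lattice points, integrates with a midpoint rule, and replaces the expectation over $H$ by an average over supplied samples. It uses the standard expression $C(L/L', \sigma^2) = \log_2|L/L'| + \mathfrak{h}(L, \sigma^2) - \mathfrak{h}(L', \sigma^2)$ for a lattice partition. The function names and the distribution of the samples are hypothetical.

```python
import numpy as np

def periodic_gaussian(t, a, sigma):
    """Density of an N(0, sigma^2) variable reduced modulo the lattice aZ."""
    k_max = int(np.ceil(10 * sigma / a)) + 2          # truncation of the lattice sum
    shifts = a * np.arange(-k_max, k_max + 1)
    d = t[:, None] - shifts[None, :]
    return np.sum(np.exp(-d ** 2 / (2 * sigma ** 2)), axis=1) / (sigma * np.sqrt(2 * np.pi))

def mod_entropy(a, sigma, grid=20000):
    """Differential entropy (bits) of the mod-aZ Gaussian noise, midpoint rule on [0, a)."""
    t = (np.arange(grid) + 0.5) * a / grid
    f = periodic_gaussian(t, a, sigma)
    safe = np.where(f > 0, f, 1.0)                    # avoid log2(0)
    return float(np.sum(-f * np.log2(safe)) * a / grid)

def partition_capacity(a, sigma):
    """Capacity of the aZ / 2aZ channel for a fixed noise standard deviation."""
    return 1.0 + mod_entropy(a, sigma) - mod_entropy(2 * a, sigma)

def fading_capacity(a, sigma, h_samples):
    """Monte Carlo average of the capacity over realizations h of the channel gain."""
    return float(np.mean([partition_capacity(a, h * sigma) for h in h_samples]))
```

With `a = 1`, the capacity returned by `partition_capacity` is close to 1 bit when `sigma` is much smaller than the spacing of the lattice and falls towards 0 as `sigma` grows.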
Recall the definition of a symmetric channel in Section 2.4. It is observed that a permutation satisfying this definition exists on the outputs $(\tilde{y}, h)$. Therefore, the fading $L/L'$ channel $W$ is symmetric and we can achieve its capacity using polar codes.
Table 1 gives a high-level description of the RLWE-based PKE scheme using polar codes which are customized for our tailored RLWE channel model in the canonical basis. The functions PolarEnc(⋅) and PolarDec(⋅) are the encoding and decoding algorithms of polar codes, which will be explicitly introduced in the sequel.

Remark 1
Unlike most RLWE encryption schemes where the error distribution χ is defined over ℤ (e.g., the central binomial distribution in NewHope), we use the definition of χ given when the ideal learning with errors problem was first proposed in [33], where χ is defined on ℝ∕[0, q). Moreover, according to the formal definition of ring-LWE in [21], the error distribution is also continuous over the field tensor product K ⊗_ℚ ℝ.

Remark 2
A plaintext $m$ is uniquely mapped, via PolarEnc($m$) and a scaling, to a symbol on the constellation diagram in the canonical basis. Then it is switched to the polynomial basis and turned into a vector $v$. Note that $v \in (\mathbb{R}/[0, q))^n$ but is not in $R_q$. We consider this reasonable since χ is also real and continuous.
One may notice in Table 1 that Alice finally derives a mod-$B'R_q$ channel (or equivalently a mod-$\Lambda'$ channel) as in Fig. 1 rather than the mod-$\Lambda''$ channel in formula (8) (or equivalently the mod-$L'$ channel in (9)). Questions arise whether the tailored RLWE channel model in formula (8) makes sense and how it will behave if we construct a polar code for the mod-$\Lambda''$ channel when we actually have a mod-$\Lambda'$ channel. Lemma 3 illustrates the channel degradation relation between the two channels.

Lemma 3 (Channel Degradation Relation Between RLWE Channel and Its Tailored Variant)
Let $\Lambda'$ be the constellation space and let $\Lambda''$ be its tailored variant as in Fig. 1. Given the tailored RLWE channel model as in formula (8) with CSI $H_i$ known to the decoder as in Fig. 2, the fading $L^n/\Lambda''$ channel is degraded with respect to the fading $L^n/\Lambda'$ channel.
Proof Denote by $W'$ the fading $L^n/\Lambda'$ channel $y' = x + h * z \bmod \Lambda'$, where $y' \in R(\Lambda')$, $x \in L^n \cap R(\Lambda')$ is the channel input, $h$ is the channel gain and $z$ is the Gaussian noise. In the same fashion, we define the fading $L^n/\Lambda''$ channel $W''$ as $y'' = x + h * z \bmod \Lambda''$, where $y'' \in R(\Lambda'')$ and $x \in L^n \cap R(\Lambda'')$.
As formula (10) indicates, the $L/L'$ fading channel with CSI known to the receiver in formula (9) can be viewed as an independent combination of the channel gain $h$ and an $L/L'$ Gaussian channel. Therefore, with no loss of generality, we can view the channel gain $h$ as a constant. We can rewrite channel $W'$ as $W' : y' = x + z' \bmod \Lambda'$ and rewrite $W''$ as $W'' : y'' = x + z' \bmod \Lambda''$, where $z' \sim N(0, h^2\sigma^2)$. The channel transition probability of $W'$ is given by equation (12), where $g_{(h\sigma)^2}$ represents the density function of $N(0, h^2\sigma^2)$ and $n' = z' \bmod \Lambda'$. The channel transition probability of $W''$ is given by equation (13), where $n'' = z' \bmod \Lambda''$ and the equality (a) is due to the relation $\Lambda' = \sqrt{2} B' \Lambda''$, $n' \in R(\Lambda')$ and $n'' \in R(\Lambda'')$. We observe from equations (12) and (13) that $W''$ can be viewed as the concatenation of $W'$ with an intermediate channel. This concatenation satisfies the definition of channel degradation (Definition 3). □
Given the channel degradation relation between the fading $L^n/\Lambda'$ channel $W'$ and the fading $L^n/\Lambda''$ channel $W''$, it is guaranteed by Lemma 1 that the polar codes constructed for $W''$ also fit $W'$. How to explicitly construct polar codes will be shown in the next section.

Polar coding for the tailored RLWE channel
As discussed in Section 2.4, we need a BDMS channel before we can adapt the polar coding method, including calculating the Bhattacharyya parameters of the synthesized channels, defining the information set $A$ and frozen set $A^c$, encoding and SC decoding. We have already shown in Section 3.3 that the fading $L/L'$ channel $W : X \to (\tilde{Y}, H)$ in formula (9) is symmetric. Since the channel gain $H$ and the Gaussian noise $Z$ are assumed to be continuous, and so is the channel output, we need to discretize the channel output $(H, \tilde{Y})$ before constructing polar codes. An elegant channel quantization scheme was proposed in [18] where the two outputs $H$ and $\tilde{Y}$ are discretized independently with tractable loss of channel capacity. Basically, the channel gain $H$ is discretized into a series of discrete values with uniform occurrence probability. As for the output $\tilde{Y}$, we decompose the $L/L'$ channel into multiple BDMS channels such that the overall channel capacity is preserved up to a negligible loss.

Quantization of the fading coefficient
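As discussed in previous sections, the fading $L/L'$ channel with CSI available to the decoder is statistically equivalent to an independent combination of the fading coefficient $H$ and an $L/L'$ channel with additive Gaussian noise of variance $(h\sigma)^2$. Therefore, we first quantize $H$ and then the $L/L'$ channel. Let $\{\alpha_i\}$ be an ascending sequence $\alpha_1 < \alpha_2 < \cdots < \alpha_{m+1}$ covering the support of $H$, so that for $1 \le i \le m$ we have
$$\Pr(\alpha_i \le H < \alpha_{i+1}) = \frac{1}{m}.$$
We take the centroid with respect to the interval $(\alpha_i, \alpha_{i+1})$ as the discretized alphabet $H_q = \{h_i\}$ for $i = 1, \cdots, m$, where $h_i$ is calculated as
$$h_i = \mathbb{E}[H \mid \alpha_i \le H < \alpha_{i+1}] = m \int_{\alpha_i}^{\alpha_{i+1}} h\, f_H(h)\, dh,$$
with $f_H$ the pdf of $H$.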
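An empirical version of this quantizer is sketched below: the fading coefficient is split into $m$ bins of equal occurrence probability and each bin is represented by its centroid. The sketch works from samples of $H$ rather than from its pdf; the sample-based bin edges and the function names are assumptions made for illustration, and any sample distribution passed in is a stand-in for the actual distribution of $H$.

```python
import numpy as np

def quantize_fading(h_samples, m):
    """Equal-probability bins of H with one centroid per bin (empirical version)."""
    h = np.sort(np.asarray(h_samples, dtype=float))
    bins = np.array_split(h, m)                       # m bins with (nearly) equal counts
    upper_edges = np.array([b[-1] for b in bins])     # largest sample of each bin
    centroids = np.array([b.mean() for b in bins])    # conditional mean within each bin
    return upper_edges, centroids

def discretize_gain(h, upper_edges, centroids):
    """Map realizations of H to the centroid of the bin they fall into."""
    idx = np.searchsorted(upper_edges, h, side="left")
    idx = np.clip(idx, 0, len(centroids) - 1)         # values above the last edge
    return centroids[idx]
```

With `m = 20` bins, as in the verification reported later in this section, every centroid is used with probability close to $1/20$, and the average of the centroids equals the sample mean of $H$ when the bins have equal counts.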

Degrading transform quantization
As in Fig. 2 we view the tailored RLWE channel as an i.i.d. fading channel. For such a channel, polar codes are constructed in [18] to achieve the ergodic capacity $C(W)$ as long as the receiver knows the CSI and the transmitter knows the CDI. Given $n$ ($n = 2^l$, $l \in \mathbb{Z}$) i.i.d. tailored RLWE channels $W : X \to (\tilde{Y}, H)$, we define the channel input as $X_{1:n} = U_{1:n} G_n$ where $U_{1:n} \in \{0,1\}^n$ and $G_n$ is the generator matrix.⁹ We obtain $n$ synthesized channels $W_n^{(i)} : U^{(i)} \to (U_{1:i-1}, \tilde{Y}_{1:n}, H_{1:n})$ for $1 \le i \le n$ by performing channel combining and channel splitting. The Bhattacharyya parameter for $W$ is defined as
$$Z(W) = \iint \sqrt{W(\tilde{y}, h | 0)\, W(\tilde{y}, h | 1)}\; d\tilde{y}\, dh.$$
To compute $Z(W_n^{(i)})$ efficiently, we employ the degrading transform proposed in [34] to quantize a BMS channel $W$ with continuous output alphabet into a degraded and approximated BDMS channel $W_Q$ with finite output alphabet size. Intuitively, the finer the discretized output alphabet is, the better $W_Q$ approximates $W$.
Since we have already discretized $H$ as $h_i$ for $i = 1, \cdots, m$, we can consider $h_i$ as a constant and quantize the $L/L'$ channel $W_{h_i} : X, h_i \to \tilde{Y}$. We define the likelihood ratio (LR) of the channel as
$$\lambda(\tilde{y}, h_i) = \frac{W(\tilde{y} | 0, h_i)}{W(\tilde{y} | 1, h_i)},$$
where $W(\tilde{y} | x, h_i)$ is the transition probability of $W_{h_i}$. We can see in Fig. 5 that the channel $W_{h_i}$ is BMS with $\tilde{Y}$ continuously located over the interval $[0, q/\sqrt{2})$. There exists a permutation function on the output alphabet under which the two conditional densities are exchanged, so the BMS channel $W_{h_i}$ can be decomposed into infinitely many binary symmetric channels (BSCs). Each BSC is associated with an output $\tilde{y}$ and its conjugate, and its crossover probability is determined by the corresponding probability density, where the integral interval is restricted to $\tilde{y}$ such that $\lambda(\tilde{y}, h_i) \ge 1$. If we ignore the subtle geometrical error introduced by the rounding $\lfloor \cdot \rfloor$, we can observe a symmetry feature in the graphs in Fig. 5, from which we identify the valid integral interval, denoted by $A$. We divide the interval $A$ into $\nu$ segments $A_j$ for $j \in [\nu]$ such that
$$A_j = \Big\{\tilde{y} \in A : \frac{j-1}{\nu} \le 1 - h_2\Big(\frac{1}{1 + \lambda(\tilde{y}, h_i)}\Big) < \frac{j}{\nu}\Big\},$$
where $h_2(\cdot)$ is the binary entropy function. Each $A_j$ corresponds to a BSC whose crossover probability is determined by the probability mass of $W(\cdot | 0, h_i)$ and $W(\cdot | 1, h_i)$ on $A_j$. Since the lattice $L'$ is infinite, we numerically approximate the $L'$-periodic Gaussian density $f_{L', (h_i\sigma)^2}$ by summing over a finite number of lattice points. If we define $z_j$ and its conjugate $\bar{z}_j$ to be the channel outputs of the BSC associated with $A_j$, we obtain the discretized output alphabet of $W_{h_i}$ as $Z = \{z_1, \bar{z}_1, \cdots, z_\nu, \bar{z}_\nu\}$. We denote by $W_Q : X \to Z \otimes H_q$ the discretized version of the original fading $L/L'$ channel, where $\otimes$ denotes the Cartesian product of two sets.

Lemma 4
The channel $W_Q: X \to (Z, H_q)$ is degraded with respect to $W$.

Corollary 1
Given that $W_Q: X \to (Z, H_q)$ is degraded with respect to $W$, the capacity, Bhattacharyya parameter and frame error rate of the two channels are related as

$$C(W_Q) \le C(W), \quad Z(W_Q) \ge Z(W), \quad P_e(W_Q) \ge P_e(W).$$

Proof As a corollary of Lemmas 2 and 4. □

It is indicated in [34] that the capacity loss introduced by the degrading transform is no greater than $1/\nu$. If we choose large alphabet sizes $m$ and $2\nu$, the loss of capacity is negligible, and so are the changes in $Z(\cdot)$ and $P_e(\cdot)$. We also verified our channel quantization scheme with respect to the channel capacity. As shown in Fig. 6, for $m = 20$, $\nu = 50$ and multiple choices of $r$, $C(W_Q)$ is close to $C(W)$ with only a negligible difference.
To summarize, the degrading transform converts the RLWE channel $W$ with continuous output alphabet into a BDMS channel $W_Q$ with finite output alphabet, which can be viewed as a combination of $m \times \nu$ BSCs. In this way, one can construct polar codes for $W_Q$ that are also suitable for $W$.
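The effect of viewing a quantized channel as a mixture of BSCs can be illustrated with a small numerical sketch. It is not the quantizer of [34]: the helper names and the toy mixture below are hypothetical, and merging is done by simply pooling all component BSCs into one. It only shows that merging output symbols lowers the capacity and raises the Bhattacharyya parameter, in line with Corollary 1.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits; defined as 0 at the end points."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def capacity(mixture):
    """Capacity of a mixture of BSCs given as (weight, crossover) pairs."""
    return sum(w * (1 - binary_entropy(p)) for w, p in mixture)


def bhattacharyya(mixture):
    """Bhattacharyya parameter of the same mixture."""
    return sum(w * 2 * math.sqrt(p * (1 - p)) for w, p in mixture)


def merge_all(mixture):
    """Pool every component into a single BSC (a degrading operation)."""
    total = sum(w for w, _ in mixture)
    pooled = sum(w * p for w, p in mixture) / total
    return [(total, pooled)]


# Hypothetical two-component mixture, chosen only for illustration.
fine = [(0.5, 0.01), (0.5, 0.2)]
coarse = merge_all(fine)
print(capacity(fine), capacity(coarse))
print(bhattacharyya(fine), bhattacharyya(coarse))
```

A finer partition (more components) keeps the two quantities closer to those of the unquantized channel, which is the reason for choosing large $m$ and $\nu$.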

Encoding algorithm PolarEnc(⋅)
Given the BDMS channel $W_Q$ derived by channel quantization, we can adapt the polar encoding and decoding method introduced in Section 2.4 to $W_Q$. Recall that the output alphabet size of $W_Q$ is $m \times 2\nu$. As channel combining and splitting proceed, the alphabet size of the synthesized channels $W_{Q_n}^{(i)}$ increases exponentially. To handle this problem, we employ an approximation method proposed in [27], which reduces the alphabet size of a BDMS channel with negligible and tractable loss of performance by merging some of the output symbols.

[Fig. 6: capacity (bits/symbol) $C(W)$ and $C(W_Q)$ for $r$ between 0.5 and 6]
After we finish computing the Bhattacharyya parameters of all the $W_{Q_n}^{(i)}$, we can define the information set $\mathcal{A}$ and the frozen set $\mathcal{A}^c$. Recall the encoding algorithm PolarEnc(m) in Table 1. We construct polar codes for the plaintext $m = u_{1:n}$ as

where $u_{\mathcal{A}}$ is the information vector and $u_{\mathcal{A}^c}$ is the frozen vector. The complexity of encoding is $O(n \log n)$, where $n$ is equal to the degree of the cyclotomic field of RLWE.
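The encoding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's PolarEnc: the function names are hypothetical, the generator matrix is taken to be the plain Kronecker power of $F = [[1,0],[1,1]]$ without a bit-reversal permutation, and the matrix is built explicitly for readability, which costs $O(n^2)$ rather than $O(n \log n)$.

```python
def generator_matrix(l):
    """Kronecker power F^{(x)l} of F = [[1, 0], [1, 1]] (assumed convention)."""
    g = [[1]]
    for _ in range(l):
        size = len(g)
        top = [row + [0] * size for row in g]   # block row [G, 0]
        bottom = [row + row for row in g]       # block row [G, G]
        g = top + bottom
    return g


def polar_encode(message_bits, info_set, n, frozen_bit=0):
    """Place the message on the information set and compute x = u G_n over GF(2)."""
    positions = sorted(info_set)
    if len(positions) != len(message_bits):
        raise ValueError("message length must equal the information set size")
    u = [frozen_bit] * n
    for pos, bit in zip(positions, message_bits):
        u[pos] = bit
    g = generator_matrix(n.bit_length() - 1)    # n is assumed to be a power of two
    return [sum(u[i] & g[i][j] for i in range(n)) % 2 for j in range(n)]


# Toy example: n = 4, information set {1, 3}, frozen bits set to zero.
print(polar_encode([1, 1], {1, 3}, 4))
```

In a practical implementation the matrix is never formed; the same product is obtained with a butterfly network of XOR operations.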

Decoding algorithm PolarDec(⋅)
The decoding algorithm PolarDec(⋅) is exactly the same as the so-called successive cancellation (SC) decoding initially proposed in [5]. Upon receiving the signal $\tilde{y}_{1:n}$ (i.e. $\tilde{y}_{1:n}$ = B in Table 1) and invoking their knowledge of the CSI $h_{1:n}$, the recipient applies SC decoding to $\tilde{y}_{1:n}$, $h_{1:n}$ and gives an estimate $\bar{u}_{1:n}$ of $u_{1:n}$ as

where the transition probabilities of the synthesized channels $W_n^{(i)}(\cdot|\cdot)$ can be recursively calculated by the SC decoding algorithm with complexity $O(n \log n)$. Details of SC decoding can be found in Appendix A. A frame error occurs if $\bar{u}_{1:n} \ne u_{1:n}$; we use frame error probability and DFR interchangeably in this work. Additionally, PolarEnc(⋅) and PolarDec(⋅) require a constant number of operations for fixed choices of $n$ and $\mathcal{A}$, making isochronous implementations possible. Details about isochrony will be discussed in Section 7.2.
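A compact way to see how SC decoding works in the probability domain is the recursive sketch below. It is an illustration under assumptions, not Algorithms 1 to 3 of Appendix A: the names are hypothetical, the natural-order transform $x = u F^{\otimes l}$ is assumed (no bit reversal), the channel is summarized by per-symbol likelihoods `p0[i]` and `p1[i]`, and no rescaling is applied, so the values underflow for large $n$. The Python conditionals are also not constant-time.

```python
def encode(u):
    """Natural-order polar transform: x = [a XOR b, b] with a, b the encoded halves."""
    n = len(u)
    if n == 1:
        return list(u)
    h = n // 2
    a, b = encode(u[:h]), encode(u[h:])
    return [ai ^ bi for ai, bi in zip(a, b)] + b


def sc_decode(p0, p1, frozen):
    """Return (u_hat, x_hat); p0[i], p1[i] are the likelihoods of x_i = 0 and x_i = 1.

    frozen[i] is True when u_i is a frozen bit, assumed here to be 0.
    """
    n = len(p0)
    if n == 1:
        bit = 0 if (frozen[0] or p0[0] >= p1[0]) else 1
        return [bit], [bit]
    h = n // 2
    # Likelihoods of a_j = x_j XOR x_{j+h}, marginalized over the unknown b_j.
    a0 = [0.5 * (p0[j] * p0[j + h] + p1[j] * p1[j + h]) for j in range(h)]
    a1 = [0.5 * (p1[j] * p0[j + h] + p0[j] * p1[j + h]) for j in range(h)]
    u_left, a_hat = sc_decode(a0, a1, frozen[:h])
    # Likelihoods of b_j = x_{j+h}, given the already decided a_hat.
    b0 = [0.5 * (p1[j] if a_hat[j] else p0[j]) * p0[j + h] for j in range(h)]
    b1 = [0.5 * (p0[j] if a_hat[j] else p1[j]) * p1[j + h] for j in range(h)]
    u_right, b_hat = sc_decode(b0, b1, frozen[h:])
    x_hat = [a ^ b for a, b in zip(a_hat, b_hat)] + b_hat
    return u_left + u_right, x_hat


# Toy run: n = 8, information set {3, 5, 6, 7}, error-free observation of a
# BSC with crossover probability 0.1 (a stand-in for the RLWE channel).
u = [0, 0, 0, 1, 0, 1, 1, 0]
frozen = [True, True, True, False, True, False, False, False]
x = encode(u)
p0 = [0.9 if bit == 0 else 0.1 for bit in x]
p1 = [0.1 if bit == 0 else 0.9 for bit in x]
u_hat, x_hat = sc_decode(p0, p1, frozen)
print(u_hat == u, x_hat == x)
```

Only multiplications, additions and comparisons appear in the recursion, which matches the motivation given in Section 7.2 for preferring transition probabilities over likelihood ratios.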

Results: Performance analysis and improvement
According to Theorem 2, the frame error probability $P_e(n, R, \mathcal{A})$ of SC decoding is upper bounded by the sum of $Z(W_n^{(i)})$. Since $W_Q \preceq W$ and $W_{Q_n}^{(i)} \preceq W_n^{(i)}$ according to Lemma 1, we derive

$$P_e(n, R, \mathcal{A}) \le \sum_{i \in \mathcal{A}} Z(W_n^{(i)}) \le \sum_{i \in \mathcal{A}} Z(W_{Q_n}^{(i)}). \qquad (16)$$

Recall from Fig. 6 that the capacity of our tailored RLWE channel deteriorates dramatically because we use a tailored and shrunk constellation diagram. As a result, for most choices of $r$ which are believed to be secure in RLWE-based PKE, we cannot obtain a DFR lower than $2^{-128}$, which is used as a benchmark in NIST standardization. As explained in Section 3.2, we carefully and conservatively choose a cube ′′ which is enclosed in the maximal sphere inscribed in ′. Almost surely there are other valid choices of ′′ larger than the one we choose, though it is not easy to find the optimal one. A pragmatic solution is to gradually scale ′′ up by a factor $t \ge 1$ and run simulations for each $t$ to check whether the numerical results of $P_e$ agree with the upper bound in formula (16). We highlight that if $t$ is not larger than some critical point, the channel degradation relation in Lemma 3 will still hold. Therefore, the theoretical upper bound on $P_e$ will still apply after we scale the modulation constellation ′′. Please refer to Remark 3 for further explanation.

Figure 7 compares the upper bounds on the frame error probability $P_e$ with our simulation results in the setting $q = 12289$, $n = 1024$, $r = 1$. The solid lines indicate the upper bounds on $P_e$ with respect to different code rates $R$. The solid lines with stars represent our simulation results which, for reasonably small DFR, comply with the upper bound. We aim to achieve $P_e = 2^{-128}$ at code rate $R = 0.25$. Clearly, this is unachievable when the scale factor is $t = 1$. We gradually increase $t$ and obtain the corresponding estimate of $P_e$.
We can see that the decoding performance improves significantly for a slightly larger $t$; e.g., $P_e$ is smaller than $10^{-60}$ ($\approx 2^{-200}$) at $R = 0.25$ for $t = 2$. When $t = 2$, the experimental result represented by the red star also complies with its corresponding theoretical estimate, i.e., the red solid line. This implies that our estimate of $P_e$ for $t = 2$ is reliable to some extent. Please note that all these experiments target a relatively large $P_e$, which is feasible to verify.

Figure 8 can be interpreted in the same manner as Fig. 7. The only different parameter used here is $r = 2$. The solid lines in different colors represent our estimates of $P_e$ and the stars are our simulation results. By making the scale factor $t$ as large as 6, the target $R$ and $P_e$ can be achieved. For the relatively large $P_e$ shown in the graph, we observed that our simulation results comply with our estimates when $t = 6, 7, 9, 11, 12$. However, when $t = 14$, the simulation results are worse than our estimate, implying that the constellation diagram ′′ is too large and goes beyond the valid domain.
In Fig. 9, r = 2.83. We can observe that our estimations are effective for t = 8,12 but fail for t > 12. We can see that none of our simulation results comply with the estimations in Fig. 10. It implies that the scaling method does not apply for r ≥ 3.46.
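The way the upper bound on $P_e$ is evaluated from Bhattacharyya parameters can be sketched numerically. The example below deliberately uses a binary erasure channel as a stand-in, because its Bhattacharyya recursion ($Z \mapsto 2Z - Z^2$ and $Z \mapsto Z^2$) is exact; the tailored RLWE channel requires the quantization and merging steps described earlier instead. All function names are hypothetical.

```python
def bec_bhattacharyya(n, erasure_prob):
    """Bhattacharyya parameters of the n synthesized channels of a BEC.

    n is assumed to be a power of two; channel 2i is the degraded split and
    channel 2i + 1 the upgraded split of channel i at the previous level.
    """
    z = [erasure_prob]
    while len(z) < n:
        nxt = []
        for v in z:
            nxt.append(2 * v - v * v)
            nxt.append(v * v)
        z = nxt
    return z


def choose_information_set(z, k):
    """Indices of the k synthesized channels with the smallest Z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(order[:k])


def union_bound(z, info_set):
    """Upper bound on the SC frame error probability: sum of Z over the set."""
    return sum(z[i] for i in info_set)


z = bec_bhattacharyya(8, 0.5)
info = choose_information_set(z, 4)
print(info, union_bound(z, info))
```

For cryptographic block lengths such as $n = 1024$ and rate $R = 0.25$, the same procedure would pick the 256 most reliable synthesized channels and compare the resulting sum with the target DFR.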

Remark 3
The error sources for the scaled and tailored RLWE channel model are summarized as follows.
(a) As $t$ increases, the constellation space ′′ may go beyond ′ and our model will fail to describe the statistical features of the real channel.
(b) The SC decoder takes B y to be the channel output of a fading $L^n$/′′ channel while it is actually a fading $L^n$/′/′′ channel according to Table 1. This is because Alice first performs a mod $R_q$ operation and then calculates B y upon receiving y from Bob. For small $r$, the two channels have quite close distributions, but they become less alike as $r$ grows. This explains why our model fails when $r \ge 3.46$ in Fig. 10.
(c) It might be misunderstood that for any $t > 1$ the theoretical estimate in formula (16) would not apply. This is not the case. As stated in Section 3.2, the constellation ′ shrinks $\sqrt{n}$ times in length and becomes the tailored one ′′. As a result of (a) and (b), slightly increasing ′′ will not affect the soundness of the channel degradation relation and formula (16) if $t$ does not go beyond some critical point. Finding such a point is nontrivial. That is why we run simulations to explore the relation between $t$, $r$ and DFR. The disadvantage of this pragmatic method is that we cannot verify the small $P_e$ of cryptographic interest.

Security improvement by new DFR
We define the concrete bit-security to be $\log_2$ of the time complexity of certain attacks breaking a scheme with specific parameters of interest. We analyze the concrete bit-security of the proposed RLWE-based PKE by considering the best known generic attacks against ring-LWE and the corresponding cost models. A comprehensive survey of a variety of generic attacks and cost models can be found in [1,2]. Since the proposed RLWE-based PKE differs from NewHope solely in the way the plaintext is encoded and decoded, and the error-correcting code itself does not affect the security reduction, the security estimation of NewHope [4] extends to our case.
Following the security estimation in [4,19], we focus on two generic attacks: (a) a primal attack, which consists of constructing a unique shortest vector problem (uSVP) instance from the LWE samples and solving it using the block Korkine-Zolotarev (BKZ) algorithm with classical/quantum sieving; and (b) a dual attack, which searches for the shortest vector in a dual lattice constructed from the LWE samples using BKZ with classical/quantum sieving. We employ the cost model in [4], where the cost of BKZ with classical/quantum sieving is $0.292\beta$/$0.265\beta$ with $\beta$ the block dimension of BKZ. In Table 2, we summarize the security estimates of the two attacks, where the cost is defined as $\log_2$ of the time complexity of BKZ$^{10}$. Note that a variant of the dual attack is used by the estimator, which makes the cost different from [4].
There exists a trade-off between the DFR and the bit-security level of RLWE-based PKE. Basically, a larger error term (or a larger binomial parameter $k$ in NewHope) gives better security but a worse DFR. The motivation of this work is to employ polar codes to give a safer DFR margin such that we can improve the bit-security level while achieving the target DFR. In NIST standardization, this target DFR is $2^{-128}$. A more conservative target of $2^{-140}$ is used in the literature [13,32]. Table 2 illustrates the DFR and bit-security level of RLWE-based PKE using our polar coding scheme for different choices of the binomial parameter $k$ ($r = \sqrt{k/2}$) and the scale factor $t$. As discussed in the previous section, the scale factor of the constellation diagram cannot be larger than 12 for $k = 8$; otherwise the estimate of the DFR is no longer valid. We select the more conservative choice $t = 11$ and achieve DFR $= 2^{-298}$ for $n = 1024$, $q = 12289$, $k = 8$ using our polar coding scheme, which is smaller than the DFR of $2^{-216}$ of NewHope round 2 in the same setting. As discussed for Fig. 10, our calculation of the DFR for $k \ge 24$ ($r \ge 3.46$) no longer applies.
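The parameter relations used in this section are simple enough to check numerically. The sketch below uses hypothetical helper names; it evaluates $r = \sqrt{k/2}$ for the binomial parameters discussed in the text, the sieving cost exponents $0.292\beta$ and $0.265\beta$ of the cost model, and the conversion of a decimal DFR into a power of two. The block dimension passed to the cost function is an arbitrary illustrative value, not a figure from Table 2.

```python
import math


def noise_parameter(k):
    """r = sqrt(k / 2) for the binomial parameter k."""
    return math.sqrt(k / 2)


def bkz_cost_bits(beta, quantum=False):
    """log2 cost of BKZ with block dimension beta under the sieving model."""
    return (0.265 if quantum else 0.292) * beta


def dfr_exponent_bits(dfr):
    """Express a failure rate as a power of two (returns the exponent)."""
    return math.log2(dfr)


for k in (8, 16, 24):
    print(k, round(noise_parameter(k), 2))
print(bkz_cost_bits(1000), bkz_cost_bits(1000, quantum=True))
print(round(dfr_exponent_bits(1e-60), 1))
```

The values of $r$ obtained for $k = 8, 16, 24$ agree with the values 2, 2.83 and 3.46 quoted for Figs. 8 to 10.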
In conclusion, our polar coding scheme and the selected parameters provide the RLWE-based PKE with a bit-security of at least 256 bits while achieving the target DFR of $2^{-140}$ (and also $2^{-128}$). This is a considerable improvement compared with NewHope round 2, which offers a bit-security of 235 bits with the same parameters. The state-of-the-art studies of this kind can be classified into two categories. In [13], LDPC and BCH codes are used to increase the bit-security to 309 bits while achieving a DFR of $2^{-140}$. However, their DFR estimation relies heavily on an "independence" assumption and their error-correcting algorithms are not isochronous. The other approach was proposed by Song et al. in [32], which gave a tighter bound on the DFR of NewHope and increased the bit-security to 252 bits.

Resilience against timing-based attacks
When error-correcting codes are adapted to RLWE-based PKE, a major concern is the resilience against timing-based attacks. Discussions of this kind can be found in [19,31]. We employ a semi-formal definition of constant-time algorithms which is called "isochrony" in [15]. We view an algorithm as isochronous if its execution time is independent of the sensitive part of its input and output. This is a weaker notion than the conventional definition but suffices to argue security against timing attacks. We will justify the isochrony of polar encoding and decoding in this section.

Encoding. As introduced in Section 2.4 as well as Section 5.3, the encoding of polar codes takes the plaintext $u_{1:n}$ as input and yields codewords as in equation (1). The block length $n$ is equal to the degree of the cyclotomic field of RLWE. The encoding process comprises exactly $\frac{n \log n}{2}$ XOR logical operations no matter what the plaintext $u_{1:n}$ is. This can be verified by some trivial examples as in Fig. 11. Note that it is sensible to carry out the calculation of the Bhattacharyya parameters of the synthesized channels offline, because they are determined by the distribution of the residue noise term $e \cdot t - s \cdot e_1 + e_2$ and can be computed once and for all. Therefore, the encoding is isochronous.
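The claim about the number of XOR operations can be checked with a butterfly implementation that counts its own operations. This is a sketch under assumptions (hypothetical function name, natural-order transform without bit reversal, logarithm taken to base 2); it illustrates that the operation count depends only on $n$, and does not by itself demonstrate constant wall-clock time, which depends on the platform and the implementation language.

```python
def polar_transform_counting(u):
    """In-place butterfly for x = u F^{(x)l}; returns (x, number of XORs)."""
    x = list(u)
    n = len(x)              # assumed to be a power of two
    xor_count = 0
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]     # one XOR per butterfly
                xor_count += 1
        step *= 2
    return x, xor_count


# The count is the same for any input of a given length.
_, count_zeros = polar_transform_counting([0] * 8)
_, count_ones = polar_transform_counting([1] * 8)
print(count_zeros, count_ones)
_, count_1024 = polar_transform_counting([0] * 1024)
print(count_1024)
```

Each of the $\log_2 n$ stages performs $n/2$ XORs, so the loop structure, and hence the sequence of memory accesses, is fixed by $n$ alone.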
Decoding. As detailed in Appendix A, SC decoding comprises three types of operations: (1) recursive calculation of the transition probabilities $W_n^{(i)}$ as in Algorithm 2; (2) comparisons of two transition probabilities as in line 9 of Algorithm 1; (3) XOR logical operations as in Algorithm 3. As in [15], we prove SC decoding to be isochronous by showing that its timing is independent of the sensitive information of the protocol. Regarding the decoding of RLWE PKE, the sensitive information includes the input B in Table 1 (we use the shorthand notation $y_{1:n}$) and the output $\bar{u}_{1:n}$ (i.e. the decoding result of the plaintext $u_{1:n}$) of SC decoding, and the secret terms $e, s, t, e_1, e_2$ separately generated by each side of the protocol. Note that the information set $\mathcal{A}$ and its complement $\mathcal{A}^c$ are determined by the distribution of the secret terms and the block length $n$, which are publicly known. The frozen vector (e.g., an all-zero vector) is also publicly known. Table 3 illustrates which types of operations are isochronous with respect to the sensitive information.
Firstly, the recursive calculations of $W_n^{(i)}$ are isochronous because their timings are independent of any sensitive information. As described in Appendix A, for any fixed $n$ an SC decoder carries out exactly $n \log n$ transition probability assemblies as in equations (17) and (18). Normally, these assemblies are floating-point operations. We use transition probabilities rather than the more popular likelihood ratio recursions to avoid floating-point divisions, which are considered difficult for isochronous implementations [29,37].
Secondly, the floating-point comparisons of two transition probabilities in Algorithm 1 are the decision-making process which yields the output $\bar{u}_{1:n}$. Generally speaking, comparing two close floating-point values would take longer, but it is equally likely to return True and False nonetheless. Therefore, it makes sense to consider the timings of this type of operation independent of $\bar{u}_{1:n}$. In addition, the overall running time taken by comparisons depends on $\mathcal{A}$ and $n$, because comparisons only take place for the information set $\mathcal{A}$. Other sensitive information is not related to the comparison operations.
Thirdly, the XOR logical operations in Algorithm 3 are the same as those in encoding. The number of XOR operations carried out by Algorithm 3 is uniquely determined by the block length $n$. We conclude that the encoding and decoding are isochronous with respect to the sensitive information, including the plaintext, the input and output of SC decoding and the secret terms $e, s, t, e_1, e_2$.

Conclusions
We have presented the first example of a polar coding technique to improve the DFR of RLWE-based PKE which takes advantage of viewing the protocol as a fading channel with CSI known to the decoder. Moreover, switching from polynomial basis to canonical basis unfastens the dependency existing in the residue noise term. The constellation space is tailored to derive an i.i.d. fading channel at the cost of decoding performance and a scaling method is employed to counteract the performance loss. Both numerical and theoretical results are given to verify the DFR estimation. The advantages of our method are as follows.
• We derive an i.i.d. channel model of the residue noise term in H space using canonical embedding. The fact that some knowledge of the noise term is available to the decoder is exploited to improve the decoding performance.
• The bit-security is increased to 256 bits while achieving the target DFR of $2^{-140}$ in the setting $n = 1024$, $q = 12289$, $k = 16$ ($r = 2.83$). This improvement is better than the benchmark of 252 bits in [32]. Though it does not break the record of 309 bits in [13], their results rely on an "independence" assumption that may not hold.
• Polar codes support isochronous implementations of encoding and decoding while the LDPC and BCH codes employed in [13] do not. We show the encoding and decoding of polar codes to be isochronous with respect to the sensitive information of the protocol.
The disadvantages are also given as follows.
• Switching between the two bases by multiplying by the matrices $B$ and $B^{-1}$ as in Table 1 increases the complexity of the protocol.
• To derive an i.i.d. channel model, we designed a tailored modulation diagram which gives a smaller code distance than the original modulation diagram $\{0, \lfloor q/2 \rfloor\}$. It hurts the decoding performance, but the power of polar codes and the proposed scaling method counteract this effect to some extent.
• The critical points of the scale factor $t$ and the noise parameter $r$ beyond which the theoretical upper bound on $P_e$ no longer applies are currently missing.

Appendix A Successive cancellation decoding
SC decoding is proposed in [5, Section VIII] and modularized in [35, Section II]. Upon observing the signal $y_{1:n}$, SC decoding works as in Algorithm 1 and gives the estimate of $u_{1:n}$. We now illustrate SC decoding with the example $n = 8$ in Fig. 12. In Fig. 12, an SC decoder is described as a circuit consisting of $n \times (l + 1)$ nodes for $n = 2^l$. These nodes are pairwise connected by " " wires. We also define two probability arrays PReg$_0$ and PReg$_1$. Each array consists of $n \times (l + 1)$ elements. In the circuit, every node can be specified by a phase parameter $\phi$, a branch parameter $\psi$ and a layer number $m$, where $0 \le m \le l$, $1 \le \phi \le 2^m$, $0 \le \psi < 2^{l-m}$. Any integer $1 \le i \le n$ has a unique representation as

where we use the shorthand notation $i = \langle\phi,\psi\rangle$ when $m$ is clear. As shown in Fig. 12, nodes on layer $m$ are classified into $2^m$ phases and $2^{l-m}$ branches. Nodes in the same phase are drawn in the same color.

Algorithm 2 CalP(⋅)
In CalP(⋅), a node at layer $m$ assembles the outputs of two nodes at layer $m − 1$ of the same phase but different branches; it then yields two probability values and stores them in the corresponding positions in the arrays PReg$_0$ and PReg$_1$. This process takes place in the circuit from right to left in a recursive manner, in the sense that the probabilities of the same phase but different branches on the previous layer (the two nodes on the right-hand side of " ") are assembled and turned into probabilities of different phases but the same branch (the two nodes on the left-hand side of " ") on the current layer.
To initiate, the nodes on layer $m = 0$ take $y_{1:n}$ as input and store $W(y^{(i)}|0)$ and $W(y^{(i)}|1)$

Then, node A$\langle\phi,\psi\rangle_m$ at layer $m > 0$ updates the two probability arrays as follows. Let = ⌈ 2 ⌉ . If $\phi$ is odd, then for $\alpha = 0, 1$

If $\phi$ is even, then for $\alpha = 0, 1$ and $\beta$ = UReg[$\langle\phi − 1,\psi\rangle$][$m$]

The above process proceeds recursively from right to left along the " " in the circuit until every node on layer $m = l$ finishes its work.

where λ(e) and λ(e) refer to the diagonal elements in Theorem 3.