A new class of irreducible pentanomials for polynomial based multipliers in binary fields

We introduce a new class of irreducible pentanomials over $\mathbb{F}_2$ of the form $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$. Let $m=2b+c$ and use $f$ to define the finite field extension of degree $m$. We give the exact number of operations required for computing the reduction modulo $f$. We also provide a multiplier based on the Karatsuba algorithm in $\mathbb{F}_2[x]$ combined with our reduction process. We give the total cost of the multiplier and find that the bit-parallel multiplier defined by this new class of polynomials has improved XOR and AND complexity. Our multiplier has a time delay comparable to that of other multipliers based on the Karatsuba algorithm.


Introduction
Finite field extensions $\mathbb{F}_{2^m}$ of the binary field $\mathbb{F}_2$ play a central role in many engineering applications and areas such as cryptography. Elements in these extensions are commonly represented using polynomial or normal bases. In this paper we focus on polynomial bases for bit-parallel multipliers.
When using polynomial bases, since $\mathbb{F}_{2^m} \cong \mathbb{F}_2[x]/(f)$ for an irreducible polynomial $f$ over $\mathbb{F}_2$ of degree $m$, we write elements in $\mathbb{F}_{2^m}$ as polynomials over $\mathbb{F}_2$ of degree smaller than $m$. When multiplying two elements in $\mathbb{F}_{2^m}$, a polynomial of degree up to $2m-2$ may arise. In this case, a modular reduction is necessary to bring the resulting element back to $\mathbb{F}_{2^m}$. Mathematically, any irreducible polynomial can be used to define the extension. In practice, however, the choice of the irreducible $f$ is crucial for fast and efficient field multiplication.
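As an illustration of the representation just described, the following minimal Python sketch stores a polynomial over $\mathbb{F}_2$ as an integer whose bit $i$ is the coefficient of $x^i$, multiplies two elements, and then reduces the product modulo $f$. The function names (`clmul`, `poly_mod`, `pentanomial`, `field_mul`) are hypothetical helpers chosen for this sketch, and the reduction shown is plain long division, not the optimized reduction developed later in the paper.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two polynomials over F_2 (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in F_2[x] is XOR
        a <<= 1
        b >>= 1
    return result


def poly_mod(d: int, f: int) -> int:
    """Remainder of d modulo f, by schoolbook long division over F_2."""
    m = f.bit_length() - 1       # degree of f
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)   # cancel the leading term of d
    return d


def pentanomial(b: int, c: int) -> int:
    """x^(2b+c) + x^(b+c) + x^b + x^c + 1 encoded as an integer."""
    return (1 << (2 * b + c)) | (1 << (b + c)) | (1 << b) | (1 << c) | 1


def field_mul(u: int, v: int, f: int) -> int:
    """Two-step multiplication: multiply in F_2[x], then reduce modulo f."""
    return poly_mod(clmul(u, v), f)


f = pentanomial(2, 1)            # x^5 + x^3 + x^2 + x + 1
product = clmul(0b11111, 0b11111)
print(product.bit_length() - 1)  # degree of the unreduced product
print(bin(field_mul(0b10, 0b10000, f)))
```

With $m = 5$, the unreduced product of two elements of degree 4 has degree $8 = 2m - 2$, which is why the reduction step is needed.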
There are two types of multipliers in $\mathbb{F}_{2^m}$: one-step algorithms and two-step algorithms. Algorithms of the first type perform modular reduction while the elements are being multiplied. In this paper, we are interested in two-step algorithms, that is, in the first step the multiplication of the elements is performed, and in the second step the modular reduction is executed. Many algorithms have been proposed for both types. An interesting application of two-step algorithms is in several cryptographic implementations that use the lazy reduction method [23,2]. For example, [15] shows the impact of lazy reduction on operations for binary elliptic curves. The second part of our algorithm, the reduction process, is also relevant to protection against side-channel attacks. Indeed, we prove that our modular reduction requires a constant number of arithmetic operations, independent of the input, and as a consequence it avoids timing side-channel leakage in the reduction.
The complexity of hardware circuits for finite field arithmetic in $\mathbb{F}_{2^m}$ is related to the amount of space and the time delay needed to perform the operations. Normally, the number of exclusive-or (XOR) and AND gates is a good estimate of the space complexity. The time complexity is the delay due to the use of these gates.
Several special types of irreducible polynomials have been considered before, including polynomials with few nonzero terms like trinomials and pentanomials (three and five nonzero terms, respectively), equally spaced polynomials, all-one polynomials [6,11,19], and other special families of polynomials [27]. In general, trinomials are preferred, but for degrees where there are no irreducible trinomials, pentanomials are considered.
Like our family, these previous families focus on bit operations, i.e., operations that use only AND and XOR gates. In the literature it is possible to find studies that use computer words to perform the operations [21,17] but this is not the focus of our work.

Contributions of this paper
In this paper, we introduce a new class of irreducible pentanomials with the following format:
$$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1, \qquad b > c > 0. \tag{1}$$
We compare our pentanomial with the first two families from the list above (Type 1 and Type 2 in Table 2). The reason to choose these two families is that [18] presents a multiplier for them with complexity 25% smaller than the other existing works in the literature using quadratic algorithms. Since our multiplier is based on Karatsuba's algorithm, we also compare our method with Karatsuba type algorithms.
An important reference for previously used polynomials and their complexities is the recent survey on bit-parallel multipliers by Fan and Hasan [4]. Moreover, we observe that all finite field results used in this paper can be found in the classical textbook by Lidl and Niederreiter [12]; see [14] for recent research in finite fields.
We prove that the complexity of the reduction depends on the exponents $b$ and $c$ of the pentanomial. A consequence of our result is that for a given degree $m = 2b + c$, with integers $b > c > 0$, all irreducible polynomials in our family have the same space and time complexity. We provide the exact number of XORs and gate delay required for the reduction of a polynomial of degree $2m-2$ by our pentanomials. The number of XORs needed is $3m - 2 = 6b + 3c - 2$ when $b \neq 2c$; for $b = 2c$ this number is $\frac{12}{5}m - 1 = 12c - 1$. We also show that AND gates are not required in the reduction process. It is easy to verify that our reduction algorithm is "constant-time", since it runs the same number of operations independently of the inputs, and so it avoids timing side-channel attacks [5].
For comparison purposes with other pentanomials proposed in the literature, since the operation considered in those papers is the product of elements in $\mathbb{F}_{2^m}$, we also consider the number of ANDs and XORs used in the multiplication prior to the reduction. In the literature, one can find works that use the standard product or use some more efficient method of multiplication, such as Karatsuba, and then add the complexity of the reduction.
In this paper, we use a Karatsuba multiplier combined with our fast reduction method. The total cost is then $C m^{\log_2 3} + 3m - 2$ or $C m^{\log_2 3} + \frac{12}{5}m - 1$, depending on whether $b \neq 2c$ or $b = 2c$, respectively. The constant $C$ of the Karatsuba multiplier depends on the implementation. In our experiments, $C$ is strictly less than 6 for all practical degrees, up to degree 1024. For the reduction, we give algorithms that achieve the above number of operations using any irreducible pentanomial in our family. We compare the complexity of the Karatsuba multiplier with our reduction against the method proposed by Park et al. [18], as well as against the Karatsuba variants given in [4].
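The cost expressions above can be evaluated with a short Python sketch. The function names are hypothetical, the case split assumes the two XOR counts apply to $b \neq 2c$ and $b = 2c$ respectively, and the default value of the Karatsuba constant `C` is only the upper bound quoted in the text, not a measured value.

```python
import math


def reduction_xors(b: int, c: int) -> int:
    """XOR count of the reduction for x^(2b+c) + x^(b+c) + x^b + x^c + 1."""
    if not b > c > 0:
        raise ValueError("the family requires b > c > 0")
    m = 2 * b + c
    if b == 2 * c:
        return 12 * c - 1        # equals (12/5) * m - 1 because m = 5c
    return 3 * m - 2             # equals 6b + 3c - 2


def total_cost_estimate(b: int, c: int, C: float = 6.0) -> float:
    """C * m^(log2 3) plus the reduction cost; C = 6 is an assumed upper bound."""
    m = 2 * b + c
    return C * m ** math.log2(3) + reduction_xors(b, c)


print(reduction_xors(3, 1))      # general case, m = 7
print(reduction_xors(4, 2))      # special case b = 2c, m = 10
print(round(total_cost_estimate(4, 2)))
```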

Structure of the paper
The structure of this paper is as follows. In Section 2 we give the number of required reduction steps when using a pentanomial $f$ from our family. We show that for our pentanomials this number is 2 or 3. This fact is crucial, since such a low number of reduction steps allows not only an exact count of the XOR operations but also a reduced time delay. Our strategy consists of describing the reduction process through equations, removing the redundant operations and presenting the final optimized algorithm. Section 3 provides the first component of our strategy. In this section, we simply reduce a polynomial of degree at most $2m-2$ to a polynomial of degree smaller than $m$. The second component of our strategy is more delicate, and it allows us to derive the exact number of operations involved when our pentanomial $f$ is used to define $\mathbb{F}_{2^m}$. Sections 4 and 5 provide those analyses for the cases when two and three steps of reduction are needed, that is, when $c = 1$ and $c > 1$, respectively. We give algorithms and exact estimates for the space and time complexities in those cases. Also, we describe a Karatsuba multiplier implementation combined with our reduction. In Section 6, based on our implementation, we show that the number of XOR and AND gates is better than the known space complexity in the literature. On the other hand, the time complexity (delay) in our implementation is worse than that of quadratic methods but comparable with Karatsuba implementations. Hence, our multiplier would be preferable in situations where space complexity and saving energy are more relevant than time complexity. We demonstrate that our family contains many polynomials, including degrees where pentanomials are suggested by NIST. Conclusions are given in Section 7.

The number of required reductions
When multiplying two elements in $\mathbb{F}_{2^m}$, represented by polynomials, we obtain a polynomial of degree at most $2m-2$. In order to obtain the corresponding element in $\mathbb{F}_{2^m}$, a further division with remainder by an irreducible polynomial $f$ of degree $m$ is required. We can see this reduction as a process that brings the coefficients in positions $m$ through $2m-2$ to positions below $m$. This is done in steps. In each step, the coefficients in positions $m$ through $2m-2$ of the polynomial are replaced by the equivalent lower-position bits following the congruence $x^m \equiv x^a + x^b + x^c + 1 \pmod{f}$. Once the coefficient in position $2m-2$ has been brought to a position less than $m$, the reduction is complete.
In this section, we carefully look into the number of steps needed to reduce the polynomial by our polynomial f given in Equation (1). The most important result of this section is that we need at most 3 steps of this reduction process using our polynomials. This information is used in the next sections to give the exact number of operations when the irreducible pentanomial given in Equation (1) is employed. This computation was possible because our family has a small number of required reduction steps.
Let $D_0 = \sum_{i=0}^{2m-2} d_i x^i$ be a polynomial over $\mathbb{F}_2$. We want to compute $D_{red}$, the remainder of the division of $D_0$ by $f$, where $f$ has the form $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$ with $2b + c = m$ and $b > c > 0$. The maximum number $k_a$ of reduction steps for a pentanomial $x^m + x^a + x^b + x^c + 1$ in terms of the exponent $a$ is given by Sunar and Koç [22]:
$$k_a = \left\lfloor \frac{m-2}{m-a} \right\rfloor + 1.$$
In our case $m = 2b + c$ and $a = b + c$, thus
$$k_{b+c} = \left\lfloor \frac{2b+c-2}{b} \right\rfloor + 1 = \begin{cases} 2 & \text{if } c = 1, \\ 3 & \text{if } c > 1. \end{cases} \tag{2}$$
Using the same method as in [22], we can derive the number of steps associated with the exponents $b$ and $c$. These numbers are needed in Section 3. We get
$$k_b = \left\lfloor \frac{2b+c-2}{b+c} \right\rfloor + 1 = 2 \tag{3}$$
and
$$k_c = \left\lfloor \frac{2b+c-2}{2b} \right\rfloor + 1 = \begin{cases} 1 & \text{if } c = 1, \\ 2 & \text{if } c > 1. \end{cases} \tag{4}$$
Thus, the reduction process for our family of pentanomials involves at most three steps. This is a special property that our family enjoys. The general process for the reduction proposed in this paper is given in the next section. The special case $c = 1$, that is, when our polynomials have the form $f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$, requires two steps. This family is treated in detail in Section 4. The general case of our family, $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$ for $c > 1$, involves three steps and is treated in Section 5.
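The step counts can be checked numerically. The sketch below assumes that the bound of Sunar and Koç reads $k_e = \lfloor (m-2)/(m-e) \rfloor + 1$ for an exponent $e$ of the pentanomial; the function names are hypothetical and chosen only for this illustration.

```python
def reduction_steps(m: int, e: int) -> int:
    """Assumed bound floor((m - 2) / (m - e)) + 1 for exponent e of the pentanomial."""
    return (m - 2) // (m - e) + 1


def family_steps(b: int, c: int) -> tuple:
    """Return (k_{b+c}, k_b, k_c) for x^(2b+c) + x^(b+c) + x^b + x^c + 1."""
    m = 2 * b + c
    return tuple(reduction_steps(m, e) for e in (b + c, b, c))


print(family_steps(2, 1))   # c = 1: two steps for the exponent b + c
print(family_steps(5, 3))   # c > 1: three steps for the exponent b + c

# The largest of the three counts never exceeds 3 over a range of parameters.
assert all(max(family_steps(b, c)) <= 3
           for b in range(2, 200) for c in range(1, b))
```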

The general reduction process
The general process that we follow to get the original polynomial $D_0$ reduced to a polynomial of degree smaller than $m$ is depicted in Figure 1. Without loss of generality, we consider the polynomial to be reduced as always having degree $2m-2$. Indeed, the cost to determine the degree of the polynomial to be reduced is equivalent to checking if the leading coefficient is zero.
The polynomial $D_0$ to be reduced is split into two parts: $A_0$ is the piece of the original polynomial with degree at least $m$, which therefore requires extra work, while $B_0$ is formed by the terms of $D_0$ with exponents smaller than $m$, which do not need to be reduced. Dividing the leading term of $A_0$ by $f$ with remainder, we obtain $D_1$. In the same way as before, we split $D_1$ into two parts $A_1$ and $B_1$ and repeat the process, obtaining the tree of Figure 1.
Figure 1: Tree representing the general reduction strategy.
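The splitting strategy can be sketched in Python as a loop that repeatedly separates the part of degree at least $m$ from the already reduced part and accumulates the latter. This is only an illustration of the tree of Figure 1 under the integer-as-bit-vector encoding (bit $i$ is the coefficient of $x^i$); the function name `reduce_by_steps` is hypothetical, and the loop is not the optimized algorithm of the later sections.

```python
def reduce_by_steps(d: int, b: int, c: int):
    """Reduce d modulo x^(2b+c) + x^(b+c) + x^b + x^c + 1.

    Returns (remainder, number of substitution passes).
    """
    m = 2 * b + c
    low_mask = (1 << m) - 1
    reduced = d & low_mask            # B_0: terms of degree < m
    high = d >> m                     # A_0 divided by x^m
    steps = 0
    while high:
        # Replace x^m by x^(b+c) + x^b + x^c + 1 in A_r to obtain D_{r+1}.
        d_next = (high << (b + c)) ^ (high << b) ^ (high << c) ^ high
        reduced ^= d_next & low_mask  # accumulate B_{r+1}
        high = d_next >> m            # A_{r+1} divided by x^m
        steps += 1
    return reduced, steps


print(reduce_by_steps(1 << 8, 2, 1))    # m = 5, input x^(2m-2)
print(reduce_by_steps(1 << 14, 3, 2))   # m = 8, input x^(2m-2)
```

For the input $x^{2m-2}$ the loop runs two passes when $c = 1$ and three when $c > 1$; other inputs may need fewer passes when their high coefficients vanish.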

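Determining $A_0$ and $B_0$
We trivially have
$$D_0 = \sum_{i=0}^{2m-2} d_i x^i = \sum_{i=m}^{2m-2} d_i x^i + \sum_{i=0}^{m-1} d_i x^i,$$
and hence
$$A_0 = \sum_{i=m}^{2m-2} d_i x^i, \qquad B_0 = \sum_{i=0}^{m-1} d_i x^i. \tag{5}$$

Determining $A_1$ and $B_1$
Using for clarity the generic form of a pentanomial over $\mathbb{F}_2$, $f(x) = x^m + x^a + x^b + x^c + 1$, dividing the leading term of $A_0$ by $f$ and taking the remainder, we get
$$D_1 = \sum_{i=0}^{m-2} d_{i+m}\left(x^{i+a} + x^{i+b} + x^{i+c} + x^{i}\right).$$
Separating the already reduced part of $D_1$ from the piece of $D_1$ that still requires more work, we obtain
$$A_1 = \sum_{i=m-a}^{m-2} d_{i+m} x^{i+a} + \sum_{i=m-b}^{m-2} d_{i+m} x^{i+b} + \sum_{i=m-c}^{m-2} d_{i+m} x^{i+c}$$
and
$$B_1 = \sum_{i=0}^{m-a-1} d_{i+m} x^{i+a} + \sum_{i=0}^{m-b-1} d_{i+m} x^{i+b} + \sum_{i=0}^{m-c-1} d_{i+m} x^{i+c} + \sum_{i=0}^{m-2} d_{i+m} x^{i}.$$
Since $m = 2b + c$ and $a = b + c$, we have
$$A_1 = \sum_{i=b}^{2b+c-2} d_{i+m} x^{i+b+c} + \sum_{i=b+c}^{2b+c-2} d_{i+m} x^{i+b} + \sum_{i=2b}^{2b+c-2} d_{i+m} x^{i+c} \tag{6}$$
and
$$B_1 = \sum_{i=0}^{b-1} d_{i+m} x^{i+b+c} + \sum_{i=0}^{b+c-1} d_{i+m} x^{i+b} + \sum_{i=0}^{2b-1} d_{i+m} x^{i+c} + \sum_{i=0}^{2b+c-2} d_{i+m} x^{i}. \tag{7}$$

Determining $A_2$ and $B_2$
As before, we divide the leading term of $A_1$ by $f$ and we obtain the remainder
$$D_2 = D_{2a} + D_{2b} + D_{2c},$$
where $D_{2a}$, $D_{2b}$ and $D_{2c}$ refer to the reductions of the sums in Equation (6). We start with $D_{2a}$:
$$D_{2a} = \sum_{i=b}^{2b+c-2} d_{i+m}\left(x^{i+c} + x^{i} + x^{i-b+c} + x^{i-b}\right),$$
so that
$$A_{2a} = \sum_{i=2b}^{2b+c-2} d_{i+m} x^{i+c}, \qquad B_{2a} = \sum_{i=b}^{2b-1} d_{i+m} x^{i+c} + \sum_{i=b}^{2b+c-2} d_{i+m}\left(x^{i} + x^{i-b+c} + x^{i-b}\right).$$
Proceeding now with the reduction of the second sum in Equation (6), we obtain
$$D_{2b} = \sum_{i=b+c}^{2b+c-2} d_{i+m}\left(x^{i} + x^{i-c} + x^{i-b} + x^{i-b-c}\right).$$
Clearly, $D_{2b}$ is already reduced, and thus $A_{2b} = 0$ and $B_{2b} = D_{2b}$. We finally reduce the third and last sum in Equation (6):
$$D_{2c} = \sum_{i=2b}^{2b+c-2} d_{i+m}\left(x^{i-b+c} + x^{i-b} + x^{i-2b+c} + x^{i-2b}\right).$$
Again, we easily check that $D_{2c}$ is reduced and so $A_{2c} = 0$ and $B_{2c} = D_{2c}$. Concluding, $A_2$ is given by
$$A_2 = A_{2a} = \sum_{i=2b}^{2b+c-2} d_{i+m} x^{i+c} \tag{8}$$
and
$$B_2 = B_{2a} + B_{2b} + B_{2c}. \tag{9}$$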

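Determining $A_3$ and $B_3$
Dividing the leading term of $A_2$ in Equation (8) by $f$, we have
$$D_3 = \sum_{i=2b}^{2b+c-2} d_{i+m}\left(x^{i-b+c} + x^{i-b} + x^{i-2b+c} + x^{i-2b}\right).$$
We have that $D_3$ is reduced, so $A_3 = 0$ and
$$B_3 = D_3. \tag{10}$$

The number of terms in $A_r$ and $B_r$
Let $r$ be a reduction step. It is clear now that the precise number of terms for $A_r$ and $B_r$, for $r \geq 0$, can be obtained using $k_{b+c}$, $k_b$ and $k_c$ given in Equations (2), (3) and (4). We have:
i) The number of terms of $A_0$ and $B_0$ is 1.
ii) For $r \geq 1$, the number of terms of $A_r$ is determined by $k_{b+c}$, $k_b$ and $k_c$, while the number of terms of $B_r$ is 4 times the number of terms of $A_{r-1}$.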
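The partial sums $B_0$, $B_1$, $B_2$ and $B_3$ of this section can be cross-checked against a direct reduction. In the Python sketch below, the summation ranges are derived here from the congruence $x^m \equiv x^{b+c} + x^b + x^c + 1$ and are an assumption of this sketch, as are the function names; coefficients are stored in a list where entry $i$ is the coefficient of $x^i$.

```python
import random


def reduce_explicit(d, b, c):
    """Sum B_0 + B_1 + B_2 + B_3 for a list d of 2m-1 coefficients over F_2."""
    m = 2 * b + c
    out = list(d[:m])                      # B_0

    def add(lo, hi, shifts):
        # XOR d[i+m] * x^(i+s) into the result for lo <= i <= hi and each shift s.
        for i in range(lo, hi + 1):
            for s in shifts:
                out[i + s] ^= d[i + m]

    # B_1: the parts of D_1 whose degree is already below m
    add(0, b - 1, [b + c])
    add(0, b + c - 1, [b])
    add(0, 2 * b - 1, [c])
    add(0, m - 2, [0])
    # B_2a: reduced part of the first sum of A_1
    add(b, 2 * b - 1, [c])
    add(b, m - 2, [0, c - b, -b])
    # B_2b: reduction of the second sum of A_1
    add(b + c, m - 2, [0, -c, -b, -b - c])
    # B_2c and B_3 are the same sum in this derivation, so they cancel over F_2.
    return out


def reduce_reference(d, b, c):
    """Direct reduction: clear each high coefficient using x^m = x^(b+c) + x^b + x^c + 1."""
    m = 2 * b + c
    r = list(d)
    for i in range(len(r) - 1, m - 1, -1):
        if r[i]:
            r[i] = 0
            for e in (b + c, b, c, 0):
                r[i - m + e] ^= 1
    return r[:m]


random.seed(1)
for b, c in [(2, 1), (3, 1), (3, 2), (4, 2), (5, 3), (7, 4)]:
    m = 2 * b + c
    for _ in range(200):
        d = [random.randint(0, 1) for _ in range(2 * m - 1)]
        assert reduce_explicit(d, b, c) == reduce_reference(d, b, c)
```

Each coefficient of the result is a fixed XOR of input coefficients, which reflects the observation that the reduction needs no AND gates.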

The family of polynomials $f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$
In this section, we consider the case when $c = 1$, that is, when $k_{b+c} = 2$, as given in Equation (2). The polynomials in this subfamily have the form
$$f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1.$$
For the subfamily treated in this section, since $k_{b+c} = 2$, we immediately get $A_2 = 0$ and the expressions in the previous section simplify. As a consequence, the desired reduction is given by
$$D_{red} = B_0 + B_1 + B_2.$$
Using Equations (5), (7) and (9), we obtain the explicit expression for $D_{red}$, Equation (11). A crucial issue that allows us to give improved results for our family of pentanomials is the fact that redundancies occur for $D_{red}$ in Equation (11). Let $T_1$, $T_2$, $T_3$ and $T_4$ denote the auxiliary sums into which Equation (11) is grouped. A careful analysis of Equation (11) reveals that $T_1$, $T_2$ and $T_3$ are used more than once, and hence savings can occur. We rewrite Equation (11) in terms of these sums as Equation (12). One can check that by plugging $T_1$, $T_2$, $T_3$ and $T_4$ into Equation (12) we recover Equation (11). Figure 2 shows these operations. We remark that even though the first row in this figure is $B_0$, the following two rows are not $B_1$ and $B_2$. Indeed, those rows are obtained from $B_1$ and $B_2$ together with the optimizations provided by $T_1$, $T_2$, $T_3$ and $T_4$. Using Equation (12), the number $N_\oplus$ of XOR operations is $N_\oplus = 3m - 2 = 6b + 1$. It is also easy to see from Figure 2 that the time delay is $3T_X$, where $T_X$ is the delay of one 2-input XOR gate.
We are now ready to provide Algorithm 1 for computing $D_{red}$ given in Equation (12), and as explained in Figure 2, for the pentanomials $f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$. Putting all pieces together, we give next the main result of this section.
Theorem 1 Algorithm 1 correctly gives the reduction of a polynomial of degree at most $2m-2$ over $\mathbb{F}_2$ by $f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$ involving $N_\oplus = 3m - 2 = 6b + 1$ XOR operations and a time delay of $3T_X$.

Family $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$, $c > 1$
For polynomials of the form $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$, $c > 1$, we have that $k_{b+c} = 3$, implying that $A_3 = 0$. The reduction is given by
$$D_{red} = B_0 + B_1 + B_2 + B_3.$$
Using Equations (5), (7), (9) and (10), we obtain the explicit expression that $D_{red}$ satisfies, Equation (13). Let $T_1$, $T_2$ and $T_3$ denote the auxiliary sums into which Equation (13) is grouped. Again, a careful analysis of Equation (13) shows that $T_1$, $T_2$ and $T_3$ are used more than once. Thus, we can rewrite Equation (13) for $D_{red}$ in terms of these sums as Equation (14). The time delay is $3T_X$; after removal of redundancies and not counting repeated terms, we obtain that the number $N_\oplus$ of XORs is $N_\oplus = 3m - 2 = 6b + 3c - 2$.
Columns $0$ to $c - 1$ of the first three lines of Fig. 3 ...
Theorem 2 Algorithm 2 correctly gives the reduction of a polynomial of degree at most $2m-2$ over $\mathbb{F}_2$ by $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$ involving $N_\oplus = 3m - 2 = 6b + 3c - 2$ XOR operations and a time delay of $3T_X$.

Almost equally spaced pentanomials: the special case b = 2c
Consider the special case $b = 2c$. In this case we obtain the almost equally spaced polynomials $f(x) = x^{5c} + x^{3c} + x^{2c} + x^c + 1$. Our previous analysis, when applied to these polynomials, yields the expression for $D_{red}$ given in Equation (15). In the computation of $D_{red}$, $T_1$, $T_2$, $T_3$ and $T_4$ are used more than once. Figure 4 shows these operations graphically. After removal of redundancies, the number of XORs is $N_\oplus = 12c - 1 = \frac{12}{5}m - 1$. Using Equation (15) and Figure 4, we naturally have Algorithm 3.

Multiplier in $\mathbb{F}_2[x]$, complexity analysis and comparison
So far, we have focused on the second step of the algorithm, that is, on the reduction part. For the first step, the multiplication part, we simply use a standard recursive Karatsuba implementation; see Algorithm 4. Recursion in hardware can be an issue; see [24] and [13], for example, for efficient hardware implementations of polynomial multiplication in finite fields using Karatsuba type algorithms.
As can be seen, our multiplier consists of two steps. The first is the multiplication itself using Karatsuba arithmetic or, if necessary, the schoolbook method, and the second is the reduction described in the previous sections. The choice of the first step method will basically depend on whether the application requirement is to minimize area (Karatsuba), i.e., the number of AND and XOR gates, or to minimize the arithmetic delay (schoolbook); see [4] for several variants of both the schoolbook and Karatsuba algorithms. Minimizing the area is interesting in applications that need to save power at the expense of a longer runtime. We chose the Karatsuba multiplier since our goal is to minimize the area, i.e., to minimize the number of AND and XOR gates. A summary of our results compared with related works is given in Tables 1 and 2. Table 1 presents comparison costs among multipliers that perform two steps for the multiplication, that is, they execute a multiplication followed by a reduction. The table shows the multiplication algorithm used in each case. Table 2 gives a comparison among the state-of-the-art bit multipliers in the literature. The main target for us is [18], since it presents the smallest area in the literature. However, Type 3 polynomials are also considered; this is another practically relevant family of polynomials. With respect to Karatsuba variants, Table 3 of survey [4] shows asymptotic complexities of several Karatsuba multiplication algorithms without reduction.
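The first step can be illustrated with a minimal recursive Karatsuba sketch in $\mathbb{F}_2[x]$, written in Python with polynomials stored as integers (bit $i$ is the coefficient of $x^i$). It is a software illustration under that encoding, not the hardware implementation evaluated in the paper; the function names and the choice to split at $\lfloor n/2 \rfloor$ are assumptions of this sketch. A schoolbook product is included for comparison.

```python
import random


def schoolbook(a: int, b: int) -> int:
    """Quadratic shift-and-XOR product in F_2[x]."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba(a: int, b: int, n: int) -> int:
    """Product in F_2[x] of two polynomials with at most n coefficients each."""
    if n <= 1:
        return a & b                      # one AND at the bottom of the recursion
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h         # a = a_lo + a_hi * x^h
    b_lo, b_hi = b & mask, b >> h
    lo = karatsuba(a_lo, b_lo, h)
    hi = karatsuba(a_hi, b_hi, n - h)
    mid = karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, n - h)
    # Three sub-products instead of four; subtraction equals XOR over F_2.
    return lo ^ ((mid ^ lo ^ hi) << h) ^ (hi << (2 * h))


random.seed(0)
for n in (1, 2, 3, 5, 8, 13, 31):
    for _ in range(100):
        u, v = random.getrandbits(n), random.getrandbits(n)
        assert karatsuba(u, v, n) == schoolbook(u, v)
print(bin(karatsuba(0b11111, 0b11111, 5)))
```

The output of either product has degree at most $2m-2$ for inputs of degree below $m$, and it is then passed to the reduction step.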
For each entry in Table 1, we give the multiplication algorithm and the number of AND and XOR gates, as well as its delay. We point out that for [19] and [25], their multipliers are general for any pentanomial with $a \leq \frac{m}{2}$ instead of for a specific family such as [20]. In the case of our family, in addition to the number of XORs for the reduction, we include the cost for the multiplication due to the recursive Karatsuba multiplier implementation, that is, the XOR count is the sum of the XORs of the Karatsuba multiplier and those of the reduction part. In our implementation, the constant of Karatsuba is strictly less than 6; see Figure 5 for degrees up to 1024. As can be seen, for degrees of the form $2^k - 1$, $k \geq 1$, the constant achieves a local minimum. For the number of AND gates, we provide an interval. The actual number of AND gates depends on the value of $m$; it only reaches a maximum when $m = 2^k - 1$, for $k \geq 1$.
In Table 2, we provide the number of XOR and AND gates for the Type 1 and Type 2 families in [18] and [20], Type 3 in [19] and our family of pentanomials. We point out that in [18] the authors compute multiplication and reduction as a unique block with a divide-and-conquer approach using squaring. In contrast, we separate these two parts and use Karatsuba for the multiplier followed by our reduction algorithm.

Table 1: Two-step multipliers cost comparison for different families of pentanomials.
Family: $x^m + x^a + x^b + x^c + 1$
[25,20], Multiplication algorithm: Schoolbook.
[20], Multiplication algorithm: Mastrovito-like multiplier.
[19], Multiplication algorithm: Mastrovito-like multiplier.
[20], Multiplication algorithm: Dual basis.

Table 2 (columns: reference and condition; #XOR; #AND; delay)
Type 1: $x^m + x^{b+1} + x^b + x + 1$, $1 < b \leq \frac{m}{2} - 1$
[18], $b$ is odd; $\frac{3m^2 + 24m + 8b + 21}{4}$; $\frac{3m^2 + 2m - 1}{4}$; $T_A + (3 + \log_2(m+1))T_X$
[18], $b$ is even; $\frac{3m^2 + 24m + 8b + 17}{4}$; $\frac{3m^2 + 2m - 1}{4}$; $T_A + (3 + \log_2(m+1))T_X$
Type 2: $x^m + x^{c+2} + x^{c+1} + x^c + 1$
[18], $c$ is odd, $c \leq \frac{3}{8}(m-7)$; $\frac{3m^2 + 24m + 14c + 35}{4}$; $\frac{3m^2 + 2m - 1}{4}$; $T_A + (3 + \log_2(m+1))T_X$
[18], $c$ is even, $c \leq \frac{m}{2} - 1$; $\frac{3m^2 + 24m + 14c + 45}{4}$

The costs for using our pentanomials for degrees proposed by NIST can be found in Table 3. The numbers of XOR and AND gates are the exact values obtained from Table 1. The delay costs can be separated into $T_A$ and $T_X$, the delays for AND gates and XOR gates, respectively. The delay for AND gates is due to only 1 AND gate at the lowest level of the Karatsuba recursion. The delay for the XOR gates in the Karatsuba multiplier is $3\log_2(m-1)$, since there are 3 XOR delays per level of the Karatsuba recursion. For the reduction part, we only have 3 XOR delays. Hence, the total number of XOR delays is $3\log_2(m-1) + 3$.
Table 4 shows the number of irreducible pentanomials of degrees 163, 283 and 571 for the families considered, since those are NIST degrees where pentanomials have been recommended [16]. Analyzing the table, we see that family Type 1 has the most irreducible pentanomials, but few of them have degrees recommended by NIST [16]. The first family of Type 2, proposed in [18], has restrictions on the range of $c$; this family presents the highest number of representatives with NIST degrees of interest. The second family of Type 2, proposed in [20], has no restrictions on $c$; this family presents the largest number of irreducible polynomials.
Type 3 is the special case from [19]. Our family for $b = 2c$ has fewer irreducible polynomials and it has no irreducible polynomials with degrees 163, 283 and 571. On the other hand, when $b \neq 2c$ our family has 730 polynomials of degrees up to 1024 and it presents 5 pentanomials of NIST degrees.
In the following we comment on the density of irreducible pentanomials in our family. Table 5 lists all irreducible pentanomials of our family for degrees up to 1024; $N_\oplus$ is the number of XORs required for the reduction. We leave as an open problem to mathematically characterize under which conditions our pentanomials are irreducible.
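A search for members of the family can be sketched in Python with a standard irreducibility test over $\mathbb{F}_2$: a polynomial $f$ of degree $m$ is irreducible exactly when $\gcd(f, x^{2^i} - x) = 1$ for all $1 \leq i \leq \lfloor m/2 \rfloor$. This is a generic test swapped in for illustration, not the procedure the authors used to build their tables; all function names are hypothetical, and polynomials are stored as integers with bit $i$ the coefficient of $x^i$.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product in F_2[x]."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(d: int, f: int) -> int:
    """Remainder of d modulo a nonzero polynomial f over F_2."""
    m = f.bit_length() - 1
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)
    return d


def poly_gcd(a: int, b: int) -> int:
    """Euclidean algorithm in F_2[x]."""
    while b:
        a, b = b, poly_mod(a, b)
    return a


def is_irreducible(f: int) -> bool:
    """Check gcd(f, x^(2^i) - x) = 1 for i = 1, ..., floor(deg f / 2)."""
    m = f.bit_length() - 1
    if m < 1:
        return False
    x = 0b10
    power = poly_mod(x, f)
    for _ in range(m // 2):
        power = poly_mod(clmul(power, power), f)   # x^(2^i) mod f
        if poly_gcd(f, power ^ x) != 1:
            return False
    return True


def family_members(max_degree: int):
    """List (m, b, c) with b > c > 0, m = 2b + c <= max_degree and f irreducible."""
    found = []
    for m in range(5, max_degree + 1):
        for c in range(1, m):
            if (m - c) % 2:
                continue
            b = (m - c) // 2
            if b <= c:
                break
            f = (1 << m) | (1 << (b + c)) | (1 << b) | (1 << c) | 1
            if is_irreducible(f):
                found.append((m, b, c))
    return found


print(family_members(32))
```

The quadratic-time arithmetic used here is adequate for small degrees; a search up to degree 1024 would benefit from faster polynomial arithmetic.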

Conclusions
In this paper, we present a new class of pentanomials over $\mathbb{F}_2$, defined by $x^{2b+c} + x^{b+c} + x^b + x^c + 1$. We give the exact number of XORs in the reduction process; we note that in the reduction process no ANDs are required. It is interesting to point out that even though the cases $c = 1$ and $c > 1$, as shown in Figures 2 and 3, are quite different, the final result in terms of the number of XORs is the same. We also consider a special case when $b = 2c$ where further reductions are possible.
There are irreducible pentanomials of this shape for several degree extensions of practical interest. We provide a detailed analysis of the space and time complexity involved in the reduction using the pentanomials in our family. For the multiplication process, we simply use the standard Karatsuba algorithm.
The proven complexity analysis of the multiplier and reduction for the family proposed in this paper, together with our comparisons, suggests that these pentanomials are as good as, or possibly better than, the ones already proposed.
We leave for future work to produce a one-step algorithm using our pentanomials, that is, a multiplier that performs multiplication and reduction in a single step using our family of polynomials, as well as a detailed study of the delay obtained using this algorithm.