## 1 Introduction

Finite field extensions $${\mathbb F}_{2^m}$$ of the binary field $${\mathbb F}_{2}$$ play a central role in many engineering applications and in areas such as cryptography. Elements in these extensions are commonly represented using polynomial or normal bases. In this paper, we focus on polynomial bases for bit-parallel multipliers.

When using polynomial bases, since $${\mathbb F}_{2^m}\cong {\mathbb F}_{2}[x]/(f)$$ for an irreducible polynomial f over $${\mathbb F}_{2}$$ of degree m, we write elements in $${\mathbb F}_{2^m}$$ as polynomials over $${\mathbb F}_{2}$$ of degree smaller than m. When two elements of $${\mathbb F}_{2^m}$$ are multiplied, a polynomial of degree up to $$2m-2$$ may arise. In this case, a modular reduction is necessary to bring the resulting element back to $${\mathbb F}_{2^m}$$. Mathematically, any irreducible polynomial can be used to define the extension. In practice, however, the choice of the irreducible f is crucial for fast and efficient field multiplication.
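To make the two operations concrete, the following minimal sketch multiplies two elements and then reduces the product. It assumes a representation that is not part of the paper: a polynomial over $${\mathbb F}_{2}$$ is stored as a Python integer whose bit i is the coefficient of $$x^i$$. The helper names `clmul` and `poly_mod` and the example modulus $$x^5+x^2+1$$ are illustrative choices only.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook product in F_2[x]; bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in F_2[x] is XOR
        a <<= 1             # multiply a by x
        b >>= 1
    return r


def poly_mod(d: int, f: int) -> int:
    """Remainder of d divided by f in F_2[x] (generic long division)."""
    m = f.bit_length() - 1
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)   # cancel the leading term of d
    return d


f = 0b100101                 # x^5 + x^2 + 1, an irreducible polynomial with m = 5
u = 0b10001                  # x^4 + 1
v = 0b10010                  # x^4 + x
product = clmul(u, v)        # x^8 + x^5 + x^4 + x, degree 2m - 2 = 8
reduced = poly_mod(product, f)
print(bin(product), bin(reduced))
```

The product has degree $$2m-2=8$$, which is the worst case mentioned above, and the generic division brings it back to degree smaller than m.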

There are two types of multipliers in $$\mathbb {F}_{2^m}$$: one-step algorithms and two-step algorithms. Algorithms of the first type perform modular reduction while the elements are being multiplied. In this paper, we are interested in two-step algorithms, that is, in the first step the multiplication of the elements is performed, and in the second step the modular reduction is executed. Many algorithms have been proposed for both types. An interesting application of two-step algorithms is in cryptographic implementations that use the lazy reduction method [2, 23]. For example, the impact of lazy reduction on operations for binary elliptic curves is shown in [15]. An important application of the second part of our algorithm, the reduction process, is protection against side-channel attacks. Indeed, we prove that our modular reduction requires a constant number of arithmetic operations and, as a consequence, it helps prevent timing side-channel attacks.

The complexity of hardware circuits for finite field arithmetic in $${\mathbb F}_{2^m}$$ is related to the amount of space and the time delay needed to perform the operations. Normally, the number of exclusive-or (XOR) and AND gates is a good estimation of the space complexity. The time complexity is the delay due to the use of these gates.

Several special types of irreducible polynomials have been considered before, including polynomials with few nonzero terms like trinomials and pentanomials (three and five nonzero terms, respectively), equally spaced polynomials, all-one polynomials [7, 12, 19], and other special families of polynomials [27]. In general, trinomials are preferred, but for degrees where there are no irreducible trinomials, pentanomials are considered.

The analysis of the complexity using trinomials is known [26]. However, there is no general complexity analysis of a generic pentanomial in the literature. Previous results (see [5] for details) have focused on special classes of pentanomials, including:

• $$x^m + x^{b+1} + x^{b} + x^{b-1} + 1$$, where $$2 \le b \le m/2-1$$ [9, 11, 18, 20, 28];

• $$x^m + x^{b+1} + x^{b} + x + 1$$, where $$1< b < m-1$$ [9, 10, 18,19,20, 28];

• $$x^{m} + x^{m-c} + x^{b} + x^{c} + 1$$, where $$1 \le c< b < m-c$$ [3];

• $$x^m + x^a + x^b + x^c + 1$$, where $$1 \le c< b < a \le m/2$$ [19];

• $$x^{m} + x^{m-s} + x^{m-2s} + x^{m-3s} + 1$$, where $$(m-1)/8 \le s \le (m-1)/3$$ [19];

• $$x^{4c} + x^{3c} + x^{2c} + x^{c} + 1$$, where $$c = 5^i$$ and $$i \ge 0$$ [7, 8].

Like our family, these previous families focus on bit operations, i.e., operations that use only AND and XOR gates. In the literature, it is possible to find studies that use computer words to perform the operations [17, 21], but this is not the focus of our work.

### 1.1 Contributions of this paper

In this paper, we introduce a new class of irreducible pentanomials with the following format:

\begin{aligned} f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1, b> c > 0. \end{aligned}
(1)

We compare our pentanomial with the first two families from the list above. The reason for choosing these two families is that [18] presents a multiplier for these families with complexity 25% smaller than the other existing works in the literature that use quadratic algorithms. Since our multiplier is based on Karatsuba’s algorithm, we also compare our method with Karatsuba-type algorithms.

An important reference for previously used polynomials and their complexities is the recent survey on bit-parallel multipliers by Fan and Hasan [5]. Moreover, we observe that all finite field results used in this paper can be found in the classical textbook by Lidl and Niederreiter [13]; see [14] for recent research in finite fields.

We prove that the complexity of the reduction depends on the exponents b and c of the pentanomial. A consequence of our result is that for a given degree $$m=2b+c$$, for any positive integers $$b>c>0$$, all irreducible polynomials in our family have the same space and time complexity. We provide the exact number of XORs and gate delay required for the reduction of a polynomial of degree $$2m-2$$ by our pentanomials. The number of XORs needed is $$3m-2=6b+3c-2$$ when $$b \ne 2c$$; for $$b=2c$$ this number is $$\frac{12}{5}m - 1=12c-1$$. We also show that AND gates are not required in the reduction process. It is easy to verify that our reduction algorithm is “constant-time” since it runs the same amount of operations independent of the inputs and it avoids timing side-channel attacks [6].
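As a small worked check of the two counts quoted above, the sketch below evaluates them for given exponents. The function name `reduction_xor_count` is hypothetical, and the function only encodes the stated formulas; it does not test whether the corresponding pentanomial is irreducible.

```python
def reduction_xor_count(b: int, c: int) -> int:
    """XOR count of the reduction as stated in the text, for m = 2b + c."""
    if not b > c > 0:
        raise ValueError("the family requires b > c > 0")
    m = 2 * b + c
    if b == 2 * c:
        return 12 * c - 1      # equals (12/5) m - 1 because m = 5c
    return 3 * m - 2           # equals 6b + 3c - 2


print(reduction_xor_count(3, 2))     # m = 8
print(reduction_xor_count(62, 31))   # m = 155, the case b = 2c
```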

For comparison purposes with other pentanomials proposed in the literature, since the operation considered in those papers is the product of elements in $${\mathbb F}_{2^m}$$, we also consider the number of ANDs and XORs used in the multiplication prior to the reduction. In the literature, one can find works that use the standard product or use some more efficient method of multiplication, such as Karatsuba, and then add the complexity of the reduction.

In this paper, we use a Karatsuba multiplier combined with our fast reduction method. The total cost is then $$C m^{\log _2{3}} + 3m-2$$ or $$C m^{\log _2{3}} + \frac{12}{5}m-1$$, depending on whether $$b \not = 2c$$ or $$b = 2c$$, respectively. The constant C of the Karatsuba multiplier depends on the implementation. In our experiments, C is strictly less than 6 for all practical degrees, up to degree 1024. For the reduction, we give algorithms that achieve the above number of operations using any irreducible pentanomial in our family. We compare the complexity of the Karatsuba multiplier combined with our reduction against the method proposed by Park et al. [18], as well as against the Karatsuba variants given in [5].

### 1.2 Structure of the paper

The structure of this paper is as follows. In Sect. 2, we give the number of required reduction steps when using a pentanomial f from our family. We show that for our pentanomials this number is 2 or 3. This fact is crucial, since such a low number of reduction steps allows not only an exact count of the XOR operations but also a reduced time delay. Our strategy consists of describing the reduction process through equations, removing the redundant operations and presenting the final optimized algorithm. Section 3 provides the first component of our strategy. In this section, we simply reduce a polynomial of degree at most $$2m-2$$ to a polynomial of degree smaller than m. The second component of our strategy is more delicate, and it allows us to derive the exact number of operations involved when our pentanomial f is used to define $${\mathbb F}_{2^m}$$. Sections 4 and 5 provide those analyses for the cases when two and three steps of reduction are needed, that is, when $$c=1$$ and $$c>1$$, respectively. We give algorithms and exact estimates for the space and time complexities in those cases. Also, we describe a Karatsuba multiplier implementation combined with our reduction. In Sect. 6, based on our implementation, we show that the number of XOR and AND gates is better than the known space complexity in the literature. On the other hand, the time complexity (delay) in our implementation is worse than that of quadratic methods but comparable with that of Karatsuba implementations. Hence, our multiplier would be preferable in situations where space complexity and saving energy are more relevant than time complexity. We demonstrate that our family contains many polynomials, including degrees where pentanomials are suggested by NIST. Conclusions are given in Sect. 7.

## 2 The number of required reductions

When multiplying two elements in $${\mathbb F}_{2^m}$$, represented by polynomials, we obtain a polynomial of degree at most $$2m-2$$. In order to obtain the corresponding element in $${\mathbb F}_{2^m}$$, a further division with remainder by an irreducible polynomial f of degree m is required. We can see this reduction as a process that brings the coefficients in the interval $$[m,2m-2]$$ to positions less than m. This is done in steps. In each step, the coefficients in the interval $$[m,2m-2]$$ of the polynomial are substituted by the equivalent bits following the congruence $$x^m \equiv x^a+x^b+x^c+1$$. Once the coefficient in position $$2m-2$$ is brought to a position less than m, the reduction is complete.

In this section, we carefully look into the number of steps needed to reduce the polynomial by our polynomial f given in Eq. (1). The most important result of this section is that we need at most 3 steps of this reduction process using our polynomials. This information is used in the next sections to give the exact number of operations when the irreducible pentanomial given in Equation (1) is employed. This computation was possible because our family has a small number of required reduction steps.

Let $$D_0(x) = \sum _{i=0}^{2m-2}d_ix^i$$ be a polynomial over $${\mathbb F}_{2}$$. We want to compute $$D_{red}$$, the remainder of the division of $$D_0$$ by f, where f has the form $$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$$ with $$2b+c=m$$ and $$b>c>0$$. The maximum number $$k_a$$ of reduction steps for a pentanomial $$x^{m} + x^{a} + x^b + x^c + 1$$ in terms of the exponent a is given by Sunar and Koç [22]

\begin{aligned} k_a = \left\lfloor \frac{m-2}{m-a} \right\rfloor + 1. \end{aligned}

In our case $$m=2b+c$$ and $$a=b+c$$, thus

\begin{aligned} k_{b+c}= & {} \left\lfloor \frac{2b+c-2}{2b+c-b-c} \right\rfloor + 1 = \left\lfloor \frac{c-2}{b} \right\rfloor + 3 \nonumber \\= & {} {\left\{ \begin{array}{ll} 2 &{}\quad \text {if } c = 1, \\ 3 &{}\quad \text {if } c > 1. \end{array}\right. } \end{aligned}
(2)

Using the same method as in [22], we can derive the number of steps required for the exponents b and c. These numbers are needed in Sect. 3. We get

\begin{aligned} k_{b} = \left\lfloor \frac{2b+c-2}{2b+c-b} \right\rfloor + 1 = \left\lfloor \frac{b-2}{b+c} \right\rfloor + 2 = 2, \end{aligned}
(3)

and

\begin{aligned} k_c= & {} \left\lfloor \frac{2b+c-2}{2b+c-c} \right\rfloor + 1 = \left\lfloor \frac{c-2}{2b} \right\rfloor + 2 \nonumber \\= & {} {\left\{ \begin{array}{ll} 1 &{}\quad \text {if } c = 1, \\ 2 &{}\quad \text {if } c > 1. \end{array}\right. } \end{aligned}
(4)

Thus, the reduction process for our family of pentanomials involves at most three steps. This is a special property that our family enjoys.
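The three step counts in Eqs. (2), (3) and (4) can be evaluated directly from the Sunar and Koç formula. The sketch below does this with integer floor division; the function names `reduction_steps` and `family_steps` are hypothetical.

```python
def reduction_steps(m: int, e: int) -> int:
    """k_e = floor((m - 2) / (m - e)) + 1 for an exponent e of the pentanomial."""
    return (m - 2) // (m - e) + 1


def family_steps(b: int, c: int) -> tuple[int, int, int]:
    """(k_{b+c}, k_b, k_c) for f = x^(2b+c) + x^(b+c) + x^b + x^c + 1."""
    if not b > c > 0:
        raise ValueError("the family requires b > c > 0")
    m = 2 * b + c
    return (reduction_steps(m, b + c), reduction_steps(m, b), reduction_steps(m, c))


print(family_steps(2, 1))   # case c = 1
print(family_steps(3, 2))   # case c > 1
```

For every admissible pair the first component is 2 when $$c=1$$ and 3 otherwise, which matches the case distinction in Eq. (2).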

The general process for the reduction proposed in this paper is given in the next section. The special case $$c=1$$, that is when our polynomials have the form $$f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$$, requires two steps. This family is treated in detail in Sect. 4. The general case of our family $$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$$ for $$c>1$$ involves three steps and is treated in Sect. 5.

## 3 The general reduction process

The general process that we follow to get the original polynomial $$D_0$$ reduced to a polynomial of degree smaller than m is depicted in Fig. 1. Without loss of generality, we consider the polynomial to be reduced as always having degree $$2m-2$$. Indeed, the cost to determine the degree of the polynomial to be reduced is equivalent to checking if the leading coefficient is zero.

The polynomial $$D_0$$ to be reduced is split into two parts: $$A_0$$ collects the terms of $$D_0$$ of degree at least m and hence requires extra work, while $$B_0$$ is formed by the terms of $$D_0$$ with exponents smaller than m and so does not need to be reduced. Dividing the leading term of $$A_0$$ by f with remainder, we obtain $$D_1$$. In the same way as before, we split $$D_1$$ into two parts $$A_1$$ and $$B_1$$ and repeat the process, obtaining the tree of Fig. 1.
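The split-and-substitute process described above can be sketched as a loop. This is an illustration under assumptions, not one of the paper's algorithms: polynomials are Python integers with bit i holding the coefficient of $$x^i$$, and the names `clmul` and `reduce_by_steps` are hypothetical. Each pass splits $$D_r$$ into $$A_r$$ and $$B_r$$, accumulates $$B_r$$, and replaces $$x^m$$ in $$A_r$$ by $$x^{b+c}+x^b+x^c+1$$.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook product in F_2[x]; bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_by_steps(d0: int, b: int, c: int) -> tuple[int, int]:
    """Return (D_red, number of substitution steps) for m = 2b + c."""
    m = 2 * b + c
    low_f = (1 << (b + c)) | (1 << b) | (1 << c) | 1   # f without its leading term
    mask = (1 << m) - 1
    d, acc, steps = d0, 0, 0
    while True:
        a_r, b_r = d >> m, d & mask    # A_r (divided by x^m) and B_r
        acc ^= b_r                     # B_r is already reduced
        if a_r == 0:
            return acc, steps
        d = clmul(a_r, low_f)          # D_{r+1}: substitute x^m by low_f
        steps += 1


print(reduce_by_steps(1 << 8, 2, 1))    # x^8 with m = 5, case c = 1
print(reduce_by_steps(1 << 14, 3, 2))   # x^14 with m = 8, case c > 1
```

On inputs of degree $$2m-2$$ the loop stops after two passes when $$c=1$$ and after three when $$c>1$$, in line with Eq. (2).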

### 3.1 Determining $$A_0$$ and $$B_0$$

We trivially have

\begin{aligned} D_0 (x) = A_0(x) + B_0(x) = \sum _{i=m}^{2m-2}d_ix^i + \sum _{i=0}^{m-1}d_ix^i, \end{aligned}

and hence

\begin{aligned} A_0 = \sum _{i=m}^{2m-2}d_ix^i \quad \text{ and } \quad B_0 = \sum _{i=0}^{m-1}d_ix^i. \end{aligned}
(5)

### 3.2 Determining $$A_1$$ and $$B_1$$

Using for clarity the generic form of a pentanomial over $${\mathbb F}_{2}$$, $$f(x) = x^{m} + x^{a} + x^b + x^c + 1$$, dividing the leading term of $$A_0$$ by f and taking the remainder, we get

\begin{aligned} D_1= & {} \sum _{i=0}^{m-2}d_{i+m}x^{i+a} + \sum _{i=0}^{m-2}d_{i+m}x^{i+b}\\&+ \sum _{i=0}^{m-2}d_{i+m}x^{i+c} + \sum _{i=0}^{m-2}d_{i+m}x^i. \end{aligned}

Separating the already reduced part of $$D_1$$ from the piece of $$D_1$$ that still requires more work, we obtain

\begin{aligned} A_1= & {} \sum _{i=m}^{m+a-2}d_{i+(m-a)}x^i + \sum _{i=m}^{m+b-2}d_{i+(m-b)}x^i\nonumber \\&+ \sum _{i=m}^{m+c-2}d_{i+(m-c)}x^i, \end{aligned}
(6)

and

\begin{aligned} B_1= & {} \sum _{i=a}^{m-1}d_{i+(m-a)}x^i + \sum _{i=b}^{m-1}d_{i+(m-b)}x^i \\&+ \sum _{i=c}^{m-1}d_{i+(m-c)}x^i + \sum _{i=0}^{m-2}d_{i+m}x^i. \end{aligned}

Since $$m=2b+c$$ and $$a=b+c$$, we have

\begin{aligned} A_1= & {} \sum _{i=2b+c}^{3b+2c-2}d_{i+b}x^i + \sum _{i=2b+c}^{3b+c-2}d_{i+b+c}x^i + \sum _{i=2b+c}^{2b+2c-2}d_{i+2b}x^i, \nonumber \\ B_1= & {} \sum _{i=b+c}^{2b+c-1}d_{i+b}x^i + \sum _{i=b}^{2b+c-1}d_{i+b+c}x^i \nonumber \\&+ \sum _{i=c}^{2b+c-1}d_{i+2b}x^i + \sum _{i=0}^{2b+c-2}d_{i+2b+c}x^i. \end{aligned}
(7)

### 3.3 Determining $$A_2$$ and $$B_2$$

As before, we divide the leading term of $$A_1$$ by f and we obtain the remainder $$D_2$$. We get $$D_2 = D_{2_a} + D_{2_b} + D_{2_c}$$, where $$D_{2_a}$$, $$D_{2_b}$$ and $$D_{2_c}$$ refer to the reductions of the sums in Eq. (6).

We start with $$D_{2_{a}}$$:

\begin{aligned} D_{2_{a}} = \sum _{i=0}^{a-2}d_{i+2m-a}x^i(x^a+x^b+x^c+1). \end{aligned}

Separating $$D_{2_a}$$ in the pieces $$A_{2_{a}}$$ and $$B_{2_{a}}$$, we get $$A_{2_{a}} = \sum _{i=m}^{2a-2}d_{i+2m-2a}x^i$$ since $$b+a-2 < m$$, and

\begin{aligned} B_{2_{a}}= & {} \sum _{i=a}^{m-1}d_{i+2m-2a}x^i + \sum _{i=b}^{a+b-2}d_{i+2m-a-b}x^i \\&+ \sum _{i=c}^{a+c-2}d_{i+2m-a-c}x^i + \sum _{i=0}^{a-2}d_{i+2m-a}x^i. \end{aligned}

Substituting $$m=2b+c$$ and $$a=b+c$$, we get $$A_{2_{a}} = \sum _{i=2b+c}^{2b+2c-2}d_{i+2b}x^i$$, and

\begin{aligned} B_{2_{a}}= & {} \sum _{i=b+c}^{2b+c-1}d_{i+2b}x^i + \sum _{i=b}^{2b+c-2}d_{i+2b+c}x^i \\&+ \sum _{i=c}^{b+2c-2}d_{i+3b}x^i + \sum _{i=0}^{b+c-2}d_{i+3b+c}x^i. \end{aligned}

Proceeding with the reduction now of the second sum in Eq. (6), we obtain

\begin{aligned} D_{2_{b}}= & {} \sum _{i=a}^{a+b-2}d_{i+2m-a-b}x^i + \sum _{i=b}^{2b-2}d_{i+2m-2b}x^i \\&+ \sum _{i=c}^{b+c-2}d_{i+2m-b-c}x^i + \sum _{i=0}^{b-2}d_{i+2m-b}x^i. \end{aligned}

Clearly, $$D_{2_b}$$ is already reduced, and thus $$A_{2_{b}} = 0$$, and

\begin{aligned} B_{2_{b}}= & {} \sum _{i=b+c}^{2b+c-2}d_{i+2b+c}x^i + \sum _{i=b}^{2b-2}d_{i+2b+2c}x^i\\&+ \sum _{i=c}^{b+c-2}d_{i+3b+c}x^i + \sum _{i=0}^{b-2}d_{i+3b+2c}x^i. \end{aligned}

We finally reduce the third and last sum in Eq. (6):

\begin{aligned} D_{2_{c}}= & {} \sum _{i=a}^{a+c-2}d_{i+2m-a-c}x^i + \sum _{i=b}^{b+c-2}d_{i+2m-b-c}x^i \\&+ \sum _{i=c}^{2c-2}d_{i+2m-2c}x^i + \sum _{i=0}^{c-2}d_{i+2m-c}x^i. \end{aligned}

Again, we easily check that $$D_{2_c}$$ is reduced and so $$A_{2_{c}} = 0$$, and

\begin{aligned} B_{2_{c}}= & {} \sum _{i=b+c}^{b+2c-2}d_{i+3b}x^i + \sum _{i=b}^{b+c-2}d_{i+3b+c}x^i \\&+ \sum _{i=c}^{2c-2}d_{i+4b}x^i + \sum _{i=0}^{c-2}d_{i+4b+c}x^i. \end{aligned}

Concluding, $$A_2$$ is given by

\begin{aligned} A_2 = A_{2_a} + A_{2_b} + A_{2_c} = \sum _{i=m}^{2a-2}d_{i+2m-2a}x^i, \end{aligned}
(8)

and $$B_2 = B_{2_a} + B_{2_b} + B_{2_c}$$ is

\begin{aligned} \begin{aligned} B_2 =&\sum _{i=b+c}^{2b+c-1}d_{i+2b}x^i + \sum _{i=c}^{b+2c-2}d_{i+3b}x^i + \sum _{i=b+c}^{b+2c-2}d_{i+3b}x^i \\&+ \sum _{i=c}^{2c-2}d_{i+4b}x^i + \sum _{i=b}^{2b+c-2}d_{i+2b+c}x^i + \sum _{i=b+c}^{2b+c-2}d_{i+2b+c}x^i\\&+ \sum _{i=b}^{2b-2}d_{i+2b+2c}x^i + \sum _{i=0}^{b+c-2}d_{i+3b+c}x^i + \sum _{i=c}^{b+c-2}d_{i+3b+c}x^i\\&+\sum _{i=b}^{b+c-2}d_{i+3b+c}x^i + \sum _{i=0}^{b-2}d_{i+3b+2c}x^i + \sum _{i=0}^{c-2}d_{i+4b+c}x^i. \end{aligned} \end{aligned}
(9)

### 3.4 Determining $$A_3$$ and $$B_3$$

Dividing the leading term of $$A_2$$ in Eq. (8) by f, we have

\begin{aligned} D_{3}= & {} \sum _{i=b+c}^{b+2c-2}d_{i+3b}x^{i} + \sum _{i=b}^{b+c-2}d_{i+3b+c}x^{i} + \sum _{i=c}^{2c-2}d_{i+4b}x^{i}\\&+ \sum _{i=0}^{c-2}d_{i+4b+c}x^{i}. \end{aligned}

We have that $$D_3$$ is reduced so $$A_3=0$$ and

\begin{aligned} B_{3}= & {} \sum _{i=b+c}^{b+2c-2}d_{i+3b}x^{i} + \sum _{i=b}^{b+c-2}d_{i+3b+c}x^{i} + \sum _{i=c}^{2c-2}d_{i+4b}x^{i}\nonumber \\&+ \sum _{i=0}^{c-2}d_{i+4b+c}x^{i}. \end{aligned}
(10)
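The sums in Eqs. (5), (7), (9) and (10) can be checked mechanically. In the sketch below, each sum $$\sum _{i=lo}^{hi}d_{i+off}x^i$$ is encoded as a triple (lo, hi, off), with an empty sum when hi is smaller than lo, and the total $$B_0+B_1+B_2+B_3$$ is compared with a generic polynomial division. The integer encoding of polynomials and the names `range_sum`, `reduce_from_equations` and `poly_mod` are assumptions made for illustration.

```python
def poly_mod(d: int, f: int) -> int:
    """Remainder of d divided by f in F_2[x]; bit i is the coefficient of x^i."""
    m = f.bit_length() - 1
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)
    return d


def range_sum(d: int, lo: int, hi: int, off: int) -> int:
    """sum_{i=lo}^{hi} d_{i+off} x^i as a bit-vector (zero when hi < lo)."""
    n = hi - lo + 1
    if n <= 0:
        return 0
    return ((d >> (lo + off)) & ((1 << n) - 1)) << lo


def reduce_from_equations(d: int, b: int, c: int) -> int:
    """B_0 + B_1 + B_2 + B_3 for an input d of degree at most 2m - 2."""
    m = 2 * b + c
    b0 = [(0, m - 1, 0)]                                              # Eq. (5)
    b1 = [(b + c, m - 1, b), (b, m - 1, b + c),
          (c, m - 1, 2 * b), (0, m - 2, m)]                           # Eq. (7)
    b2 = [(b + c, m - 1, 2 * b), (c, b + 2 * c - 2, 3 * b),
          (b + c, b + 2 * c - 2, 3 * b), (c, 2 * c - 2, 4 * b),
          (b, m - 2, m), (b + c, m - 2, m),
          (b, 2 * b - 2, 2 * b + 2 * c), (0, b + c - 2, 3 * b + c),
          (c, b + c - 2, 3 * b + c), (b, b + c - 2, 3 * b + c),
          (0, b - 2, 3 * b + 2 * c), (0, c - 2, 4 * b + c)]           # Eq. (9)
    b3 = [(b + c, b + 2 * c - 2, 3 * b), (b, b + c - 2, 3 * b + c),
          (c, 2 * c - 2, 4 * b), (0, c - 2, 4 * b + c)]               # Eq. (10)
    out = 0
    for lo, hi, off in b0 + b1 + b2 + b3:
        out ^= range_sum(d, lo, hi, off)
    return out


b, c = 3, 2
f = (1 << (2 * b + c)) | (1 << (b + c)) | (1 << b) | (1 << c) | 1
print(reduce_from_equations(1 << 14, b, c) == poly_mod(1 << 14, f))
```

Since both sides are linear over $${\mathbb F}_{2}$$, agreement on the monomials $$x^0,\ldots ,x^{2m-2}$$ is enough to conclude agreement on every input of degree at most $$2m-2$$. For $$c=1$$ all four ranges of Eq. (10) are empty, so the same function also covers the two-step case.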

### 3.5 The number of terms in $$A_r$$ and $$B_r$$

Let $$G(i) = 1$$ if $$i > 0$$ and $$G(i) = 0$$ if $$i \le 0$$. Let r be a reduction step. It is clear now that the precise number of terms for $$A_r$$ and $$B_r$$, for $$r \ge 0$$, can be obtained using $$k_{b+c}$$, $$k_b$$ and $$k_c$$ given in Eqs. (2), (3) and (4). We have:

(i) The number of terms of $$A_0$$ and $$B_0$$ is 1.

(ii) For $$r >0$$, the number of terms of $$A_r$$ is $$G(k_{b+c} - r) + G(k_b -r) + G(k_c -r)$$, while the number of terms of $$B_r$$ is 4 times the number of terms of $$A_{r-1}$$.
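The counting rule can be evaluated as follows. This is a small sketch with hypothetical function names; a "term" here means one of the sums appearing in the expressions for $$A_r$$ and $$B_r$$ in Sects. 3.1 to 3.4.

```python
def G(i: int) -> int:
    """Indicator used in the text: 1 if i > 0, otherwise 0."""
    return 1 if i > 0 else 0


def terms_A(r: int, b: int, c: int) -> int:
    """Number of sums in A_r for f = x^(2b+c) + x^(b+c) + x^b + x^c + 1."""
    if r == 0:
        return 1
    m = 2 * b + c
    ks = [(m - 2) // (m - e) + 1 for e in (b + c, b, c)]   # k_{b+c}, k_b, k_c
    return sum(G(k - r) for k in ks)


def terms_B(r: int, b: int, c: int) -> int:
    """Number of sums in B_r: each sum of A_{r-1} produces four sums."""
    return 1 if r == 0 else 4 * terms_A(r - 1, b, c)


print([terms_A(r, 3, 2) for r in range(4)], [terms_B(r, 3, 2) for r in range(4)])
```

For $$c>1$$ this gives 3, 1 and 0 sums for $$A_1$$, $$A_2$$ and $$A_3$$ and 4, 12 and 4 sums for $$B_1$$, $$B_2$$ and $$B_3$$, which agrees with Eqs. (6) to (10).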

## 4 The family of polynomials $$f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$$

In this section, we consider the case when $$c=1$$, that is, when $$k_{b+c}=2$$, as given in Eq. (2). The polynomials in this subfamily have the form $$f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$$. For the subfamily treated in this section, since $$k_{b+c}=2$$, we immediately get $$A_2 = 0$$ and the expressions in the previous section simplify. As a consequence, the desired reduction is given by

\begin{aligned} D_{red} = B_0 + B_1 + B_2. \end{aligned}

Using Eqs. (5), (7) and (9), we obtain

\begin{aligned} \begin{aligned} D_{red} =&\sum _{i=0}^{2b}d_ix^i + \sum _{i=b+1}^{2b}d_{i+b}x^i + \sum _{i=1}^{b}d_{i+2b}x^i\\&+ \sum _{i=1}^{b}d_{i+3b}x^i + \sum _{i=b}^{2b}d_{i+b+1}x^i \\&+ \sum _{i=0}^{b-1}d_{i+2b+1}x^i + \sum _{i=b+1}^{2b-1}d_{i+2b+1}x^i\\&+ \sum _{i=b}^{2b-2}d_{i+2b+2}x^i + \sum _{i=0}^{b-2}d_{i+3b+2}x^i + d_{3b+1}. \end{aligned} \end{aligned}
(11)

A crucial issue that allows us to give improved results for our family of pentanomials is the fact that redundancies occur for $$D_{red}$$ in Eq. (11). Let

\begin{aligned} T_1(j)&= \sum _{i=0}^{b-2}(d_{i+2b+1} + d_{i+3b+2})x^{i+j}, \nonumber \quad T_2(j) = d_{3b}x^{j}, \\ T_3(j)&= d_{3b+1}x^{j}, \quad T_4(j) = \sum _{i=0}^{b-1}(d_{i+2b+1} + d_{i+3b+1})x^{i+j}. \end{aligned}

A careful analysis of Eq. (11) reveals that $$T_1$$, $$T_2$$ and $$T_3$$ are used more than once, and hence, savings can occur. We rewrite Eq. (11) as

\begin{aligned} \begin{aligned} D_{red}&= B_0 + T_1(0) + T_1(b) + T_1(b+1) + T_2(b-1) \\&\quad +\,T_2(2b-1) + T_2(2b) + T_3(0) + T_3(2b) + T_4(1). \end{aligned} \end{aligned}
(12)

One can check that by plugging $$T_1$$, $$T_2$$, $$T_3$$ and $$T_4$$ in Eq. (12) we recover Eq. (11). Figure 2 shows these operations. We remark that even though the first row in this figure is $$B_0$$, the following two rows are not $$B_1$$ and $$B_2$$. Indeed, those rows are obtained from $$B_1$$ and $$B_2$$ together with the optimizations provided by $$T_1$$, $$T_2$$, $$T_3$$ and $$T_4$$.

Using Eq. (12), the number $$N_{\oplus }$$ of XOR operations is

\begin{aligned} N_{\oplus } = 6b + 1 = 3m -2. \end{aligned}

It is also easy to see from Fig. 2 that the time delay is $$3T_X$$, where $$T_X$$ is the delay of one 2-input XOR gate.

We are now ready to provide Algorithm 1 for computing $$D_{red}$$ given in Eq. (12), and as explained in Fig. 2, for the pentanomials $$f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$$.

Putting all pieces together, we give next the main result of this section.

### Theorem 1

Algorithm 1 correctly gives the reduction of a polynomial of degree at most $$2m-2$$ over $${\mathbb F}_{2}$$ by $$f(x) = x^{2b+1} + x^{b+1} + x^b + x + 1$$, involving $$N_{\oplus } = 3m -2 = 6b + 1$$ XOR operations and a time delay of $$3T_X$$.
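Algorithm 1 itself is not reproduced here. As a hedged illustration, the sketch below is a direct software transcription of Eq. (12) on integers (bit i holds $$d_i$$), checked against generic polynomial division. The names `bits`, `reduce_c1` and `poly_mod` are hypothetical, and word-level shifts in software do not model the gate sharing or the $$3T_X$$ delay of the hardware circuit.

```python
def poly_mod(d: int, f: int) -> int:
    """Remainder of d divided by f in F_2[x]; bit i is the coefficient of x^i."""
    m = f.bit_length() - 1
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)
    return d


def bits(d: int, lo: int, n: int) -> int:
    """Coefficients d_lo, ..., d_{lo+n-1} packed into an n-bit integer."""
    return (d >> lo) & ((1 << n) - 1)


def reduce_c1(d: int, b: int) -> int:
    """Eq. (12): reduce d (degree <= 4b) by x^(2b+1) + x^(b+1) + x^b + x + 1."""
    if b < 2:
        raise ValueError("the family requires b > c = 1")
    t1 = bits(d, 2 * b + 1, b - 1) ^ bits(d, 3 * b + 2, b - 1)   # T_1
    t2 = bits(d, 3 * b, 1)                                       # T_2 = d_{3b}
    t3 = bits(d, 3 * b + 1, 1)                                   # T_3 = d_{3b+1}
    t4 = bits(d, 2 * b + 1, b) ^ bits(d, 3 * b + 1, b)           # T_4
    b0 = bits(d, 0, 2 * b + 1)                                   # B_0
    return (b0 ^ t1 ^ (t1 << b) ^ (t1 << (b + 1))
            ^ (t2 << (b - 1)) ^ (t2 << (2 * b - 1)) ^ (t2 << (2 * b))
            ^ t3 ^ (t3 << (2 * b)) ^ (t4 << 1))


b = 2
f = (1 << (2 * b + 1)) | (1 << (b + 1)) | (1 << b) | 0b11
print(bin(reduce_c1(1 << 8, b)), bin(poly_mod(1 << 8, f)))
```

The function executes the same sequence of shifts and XORs for every input, which mirrors the input-independent operation count claimed for the reduction.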

## 5 Family $$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1, c>1$$

For polynomials of the form $$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$$, $$c>1$$, we have that $$k_{b+c}=3$$, implying that $$A_3 = 0$$. The reduction is given by

\begin{aligned} D_{red} = B_0 + B_1 + B_2 + B_3. \end{aligned}

Using Eqs. (5), (7), (9) and  (10), we have that $$D_{red}$$ satisfies

\begin{aligned} \begin{aligned} D_{red} =&\sum _{i=0}^{2b+c-1}d_ix^i + \sum _{i=b+c}^{2b+c-1}d_{i+b}x^i \\&+ \sum _{i=c}^{b+c-1}d_{i+2b}x^i + \sum _{i=c}^{b+2c-2}d_{i+3b}x^i \\&+ \sum _{i=b}^{2b+c-1}d_{i+b+c}x^i + \sum _{i=0}^{b-1}d_{i+2b+c}x^i \\&+ \sum _{i=b+c}^{2b+c-2}d_{i+2b+c}x^i + \sum _{i=b}^{2b-2}d_{i+2b+2c}x^i\\&+ \sum _{i=0}^{c-1}d_{i+3b+c}x^i + \sum _{i=0}^{b-2}d_{i+3b+2c}x^i. \end{aligned} \end{aligned}
(13)

Let

\begin{aligned} T_{1}(j)&= \sum _{i=0}^{b-2}(d_{i+2b+c} + d_{i+3b+2c})x^{i+j}, \qquad T_2(j) = d_{3b+c-1}x^j,\\ T_3(j)&= \sum _{i=0}^{c-1}d_{i+3b+c}x^{i+j}, \quad T_4(j) = \sum _{i=0}^{b-2}d_{i+2b+c}x^{i+j}, \\ T_5(j)&= \sum _{i=0}^{b-2}d_{i+3b+2c}x^{i+j}. \end{aligned}

Again, a careful analysis of Eq. (13) shows that $$T_1$$, $$T_2$$ and $$T_3$$ are used more than once. Thus, we can rewrite Eq. (13) for $$D_{red}$$ as

\begin{aligned} \begin{aligned} D_{red} =&B_0 + T_1(0) + T_1(b) + T_1(b+c) \\&+ T_2(b-1) + T_2(b+c-1) \\&+ T_2(2b-1) + T_2(2b+c-1) \\&+ T_3(0) + T_3(c) + T_3(2b) + T_4(c) +T_5(2c). \end{aligned} \end{aligned}
(14)

Figure 3 depicts these operations. Using Eq. (14) and Fig. 3, we have Algorithm 2. For code efficiency reasons, in contrast to Algorithm 1, in Algorithm 2 we separate the last line before the equality in Fig. 3. The additions of this last line are done in lines 17 to 20 of Algorithm 2. As a consequence, lines 3 to 16 of Algorithm 2 include only the additions per column from 0 to $$2b+c-1$$ of the first three lines in Fig. 3.

The time delay is $$3T_X$$; after removal of redundancies and not counting repeated terms, we obtain that the number $$N_\oplus$$ of XORs is

\begin{aligned} N_\oplus = 6b + 3c - 2 = 3m-2. \end{aligned}

### Theorem 2

Algorithm 2 correctly gives the reduction of a polynomial of degree at most $$2m-2$$ over $${\mathbb F}_{2}$$ by $$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$$, involving $$N_{\oplus } = 3m -2 = 6b + 3c-2$$ XOR operations and a time delay of $$3T_X$$.
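In the same spirit, Eq. (14) can be transcribed into a few shifts and XORs. The sketch below is not Algorithm 2; it is an illustrative software version under the assumption that polynomials are Python integers with bit i holding $$d_i$$, and the names `bits`, `reduce_general` and `poly_mod` are hypothetical.

```python
def poly_mod(d: int, f: int) -> int:
    """Remainder of d divided by f in F_2[x]; bit i is the coefficient of x^i."""
    m = f.bit_length() - 1
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)
    return d


def bits(d: int, lo: int, n: int) -> int:
    """Coefficients d_lo, ..., d_{lo+n-1} packed into an n-bit integer."""
    return (d >> lo) & ((1 << n) - 1)


def reduce_general(d: int, b: int, c: int) -> int:
    """Eq. (14): reduce d (degree <= 2m - 2) by the pentanomial with c > 1."""
    if not b > c > 1:
        raise ValueError("this case requires b > c > 1")
    t1 = bits(d, 2 * b + c, b - 1) ^ bits(d, 3 * b + 2 * c, b - 1)   # T_1
    t2 = bits(d, 3 * b + c - 1, 1)                                   # T_2
    t3 = bits(d, 3 * b + c, c)                                       # T_3
    t4 = bits(d, 2 * b + c, b - 1)                                   # T_4
    t5 = bits(d, 3 * b + 2 * c, b - 1)                               # T_5
    b0 = bits(d, 0, 2 * b + c)                                       # B_0
    return (b0 ^ t1 ^ (t1 << b) ^ (t1 << (b + c))
            ^ (t2 << (b - 1)) ^ (t2 << (b + c - 1))
            ^ (t2 << (2 * b - 1)) ^ (t2 << (2 * b + c - 1))
            ^ t3 ^ (t3 << c) ^ (t3 << (2 * b))
            ^ (t4 << c) ^ (t5 << (2 * c)))


b, c = 3, 2
f = (1 << (2 * b + c)) | (1 << (b + c)) | (1 << b) | (1 << c) | 1
print(bin(reduce_general(1 << 14, b, c)), bin(poly_mod(1 << 14, f)))
```

The identity behind Eq. (14) is a congruence modulo f and does not use irreducibility, so the comparison with `poly_mod` can be run for any exponents with $$b>c>1$$.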

### 5.1 Almost equally spaced pentanomials: the special case $$b=2c$$

Consider the special case $$b=2c$$. In this case, we obtain the almost equally spaced polynomials $$f(x) = x^{5c} + x^{3c} + x^{2c} + x^c + 1$$. Our previous analysis when applied to these polynomials entails

\begin{aligned} \begin{aligned} D_{red} =&\sum _{i=0}^{5c-1}d_ix^i + \sum _{i=3c}^{5c-1}d_{i+2c}x^i + \sum _{i=c}^{3c-1}d_{i+4c}x^i \\&+ \sum _{i=c}^{4c-2}d_{i+6c}x^i + \sum _{i=2c}^{5c-1}d_{i+3c}x^i \\&+ \sum _{i=0}^{2c-1}d_{i+5c}x^i + \sum _{i=3c}^{5c-2}d_{i+5c}x^i + \sum _{i=2c}^{4c-2}d_{i+6c}x^i\\&+ \sum _{i=0}^{c-1}d_{i+7c}x^i + \sum _{i=0}^{2c-2}d_{i+8c}x^i. \end{aligned} \end{aligned}
(15)

Let

\begin{aligned} T_{1}(j)&= \sum _{i=c}^{2c-2}(d_{i+5c} + d_{i+4c})x^{i+j}, \\ T_2(j)&= \sum _{i=c}^{2c-2}(d_{i+8c} + d_{i+6c})x^{i+j}, \quad \\ T_3(j)&= d_{8c-1}x^j, \quad T_4(j) = \sum _{i=0}^{c-1}d_{i+8c}x^{i+j}, \\ T_5(j)&= \sum _{i=0}^{c-1}d_{i+5c}x^{i+j}, \quad \\ T_6(j)&= \sum _{i=0}^{c-2}d_{i+7c}x^{i+j}, \quad T_7(j) = \sum _{i=4c}^{5c-1}d_{i+2c}x^{i+j}. \end{aligned}

In the computation of $$D_{red}$$, $$T_1$$, $$T_2$$, $$T_3$$ and $$T_4$$ are used more than once. Figure 4 shows, graphically, these operations. After removal of redundancies, the number $$N_\oplus$$ of XORs is $$N_\oplus = 12c - 1 = \displaystyle \frac{12}{5}m -1.$$ This number of XORs is close to 2.4m providing a saving of about $$20\%$$ with respect to the other pentanomials in our family. Irreducible pentanomials of this form are rare but they do exist, for example, for degrees 5, 155 and 4805. We observe that the extension 155 is used in [1].
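Whether a given member of the family is irreducible can be tested by computer. The sketch below uses the Ben-Or irreducibility test, a standard technique that is swapped in here for illustration and is not described in the paper: f of degree m is irreducible over $${\mathbb F}_{2}$$ exactly when $$\gcd (x^{2^i}+x, f)=1$$ for $$1 \le i \le \lfloor m/2 \rfloor$$. All function names are hypothetical.

```python
def poly_mod(d: int, f: int) -> int:
    """Remainder of d divided by f in F_2[x]; bit i is the coefficient of x^i."""
    m = f.bit_length() - 1
    while d.bit_length() - 1 >= m:
        d ^= f << (d.bit_length() - 1 - m)
    return d


def poly_gcd(a: int, b: int) -> int:
    """Euclidean algorithm in F_2[x]."""
    while b:
        a, b = b, poly_mod(a, b)
    return a


def square_mod(a: int, f: int) -> int:
    """a^2 mod f; squaring over F_2 moves coefficient i to position 2i."""
    sq, i = 0, 0
    while a:
        if a & 1:
            sq |= 1 << (2 * i)
        a >>= 1
        i += 1
    return poly_mod(sq, f)


def is_irreducible(f: int) -> bool:
    """Ben-Or test: gcd(x^(2^i) + x, f) must be 1 for i = 1..floor(m/2)."""
    m = f.bit_length() - 1
    if m < 1:
        return False
    x = 0b10
    h = poly_mod(x, f)
    for _ in range(m // 2):
        h = square_mod(h, f)           # h = x^(2^i) mod f
        if poly_gcd(f, h ^ x) != 1:
            return False
    return True


def pentanomial(b: int, c: int) -> int:
    """x^(2b+c) + x^(b+c) + x^b + x^c + 1 as a bit-vector."""
    return (1 << (2 * b + c)) | (1 << (b + c)) | (1 << b) | (1 << c) | 1


print(is_irreducible(pentanomial(2, 1)))   # degree 5, b = 2c
print(is_irreducible(pentanomial(3, 1)))   # degree 7
```

The degree-7 member $$x^7+x^4+x^3+x+1$$ is divisible by $$x^3+x^2+1$$, which illustrates that not every choice of b and c yields an irreducible pentanomial. The same call can be pointed at $$(b,c)=(62,31)$$ to examine the degree-155 case mentioned above.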

Using Eq. (15) and Fig. 4, we naturally have Algorithm 3.

## 6 Multiplier in $$\mathbb {F}_{2}[x]$$, complexity analysis and comparison

So far, we have focused on the second step of the algorithm, that is, on the reduction part. For the first step, the multiplication part, we simply use a standard Karatsuba recursive algorithm implementation; see Algorithm 4.
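Algorithm 4 is not reproduced in this section. As a generic illustration of the recursive Karatsuba scheme over $${\mathbb F}_{2}[x]$$, and not necessarily the variant used by the authors, the sketch below splits each operand into a low and a high half and uses three recursive products. Polynomials are again Python integers, and the names `schoolbook` and `karatsuba` are hypothetical.

```python
def schoolbook(a: int, b: int) -> int:
    """Quadratic product in F_2[x]; bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba(a: int, b: int, n: int) -> int:
    """Product in F_2[x] of a and b, each with at most n coefficients (n >= 1)."""
    if n == 1:
        return a & b                       # a single AND gate at the lowest level
    h = (n + 1) // 2                       # size of the low halves
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h              # a = a0 + a1 * x^h
    b0, b1 = b & mask, b >> h
    lo = karatsuba(a0, b0, h)
    hi = karatsuba(a1, b1, n - h)
    mid = karatsuba(a0 ^ a1, b0 ^ b1, h)   # (a0 + a1)(b0 + b1)
    # Over F_2 subtraction is XOR, so the middle coefficient is mid + lo + hi.
    return lo ^ ((mid ^ lo ^ hi) << h) ^ (hi << (2 * h))


u = (1 << 163) - 1
v = (1 << 162) | 0b1011
print(karatsuba(u, v, 163) == schoolbook(u, v))
```

Recursing down to single coefficients keeps the sketch close to a bit-level circuit; a software implementation would normally stop earlier and switch to the schoolbook product for small sizes.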

Recursivity in hardware can be an issue; see [24] and [4], for example, for efficient hardware implementations of polynomial multiplication in finite fields using Karatsuba’s type algorithms.

As can be seen, our multiplier consists of two steps. The first is the multiplication itself using Karatsuba arithmetic or, if necessary, the schoolbook method, and the second is the reduction described in the previous sections. The choice of method for the first step basically depends on whether the application requirement is to minimize area (Karatsuba), i.e., the number of AND and XOR gates, or to minimize the arithmetic delay (schoolbook); see [5] for several variants of both the schoolbook and Karatsuba algorithms. Minimizing the area is of interest in applications that need to save power at the expense of a longer runtime.

We chose the Karatsuba multiplier since our goal is to minimize the area, i.e., to minimize the number of AND and XOR gates. A summary of our results compared with related works is given in Tables 1 and 2. Table 1 presents comparison costs among multipliers that perform two steps for the multiplication, that is, they execute a multiplication followed by a reduction. The table shows the multiplication algorithm used in each case. Table 2 gives a comparison among the state-of-the-art bit multipliers in the literature. The main target for us is [18] since it presents the smallest area in the literature. However, Type 3 polynomials are also considered; this is another practically relevant family of polynomials. With respect to Karatsuba variants, Table 3 of the survey [5] shows asymptotic complexities of several Karatsuba multiplication algorithms without reduction.

For each entry in Table 1, we give the multiplication algorithm and the number of AND and XOR gates, as well as its delay. We point out that the multipliers in [19] and [25] are general for any pentanomial with $$a \le \frac{m}{2}$$, rather than for a specific family as in [20]. In the case of our family, in addition to the number of XORs for the reduction, we include the cost for the multiplication due to the recursive Karatsuba multiplier, that is, the XOR count is the sum of the XORs of the Karatsuba multiplier and those of the reduction part. In our implementation, the constant of Karatsuba is strictly less than 6; see Fig. 5 for degrees up to 1024. As can be seen, for degrees of the form $$2^k -1$$, $$k \ge 1$$, the constant achieves a local minimum. For the number of AND gates, we provide an interval. The actual number of AND gates depends on the value of m; it only reaches the maximum when $$m = 2^{k}-1$$, for $$k \ge 1$$.

In Table 2, we provide the number of XOR and AND gates for the Type 1 and Type 2 families in [18] and [20], Type 3 in [19] and our family of pentanomials. We point out that in [18] the authors compute multiplication and reduction as a unique block with a divide-and-conquer approach using squaring. In contrast, we separate these two parts and use Karatsuba for the multiplier followed by our reduction algorithm.

The costs for using our pentanomials for degrees proposed by NIST can be found in Table 3. The numbers of XOR and AND gates are the exact values obtained from Table 1. The delay costs can be separated into $$T_A$$ and $$T_X$$, the delays for AND gates and XOR gates, respectively. The delay for AND gates is due to only 1 AND gate at the lowest level of the Karatsuba recursion. The delay for the XOR gates in the Karatsuba multiplier is $$3 \lceil \log _2{(m-1)} \rceil$$ since there are 3 XOR delays per level of the Karatsuba recursion. For the reduction part, we only have 3 XOR delays. Hence, the total number of XOR delays is $$3 \lceil \log _2{(m-1)} \rceil + 3$$.
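The XOR delay formula can be evaluated with exact integer arithmetic. In the sketch below, `ceil_log2` and `xor_delay` are hypothetical helper names; the result is the number of XOR gate delays $$T_X$$ only and does not include the single AND delay $$T_A$$.

```python
def ceil_log2(n: int) -> int:
    """Exact ceil(log2(n)) for an integer n >= 1, without floating point."""
    return (n - 1).bit_length()


def xor_delay(m: int) -> int:
    """3 * ceil(log2(m - 1)) + 3 XOR delays: Karatsuba levels plus the reduction."""
    return 3 * ceil_log2(m - 1) + 3


for m in (163, 283, 571):
    print(m, xor_delay(m))
```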

Table 4 shows the number of irreducible pentanomials of degrees 163, 283 and 571 for the families considered since those are NIST degrees where pentanomials have been recommended [16]. Analyzing the table, we have that family Type 1 has the most irreducible pentanomials, but few of them have degrees recommended by NIST [16]. The first family of Type 2, proposed in [18], has restrictions in the range of c; this family presents the highest number of representatives with NIST degrees of interest. The second family of Type 2, proposed in [20], has no restrictions for c; this family presents the largest number of irreducible polynomials. Type 3 is the special case from [19]. Our family for $$b = 2c$$ has fewer irreducible polynomials, and it has no irreducible polynomials with degrees 163, 283 and 571, since in this case the degree $$m=5c$$ is a multiple of 5. On the other hand, when $$b \ne 2c$$ our family has 730 polynomials of degrees up to 1024 and it presents 5 pentanomials of NIST degrees.

In the following, we comment on the density of irreducible pentanomials in our family. Table 5 lists all irreducible pentanomials of our family for degrees up to 1024; $$N_\oplus$$ is the number of XORs required for the reduction. We leave as an open problem to mathematically characterize under which conditions our pentanomials are irreducible.

## 7 Conclusions

In this paper, we present a new class of pentanomials over $$\mathbb {F}_2$$, defined by $$x^{2b+c}+ x^{b+c}+ x^b + x^c + 1$$. We give the exact number of XORs in the reduction process; we note that in the reduction process no ANDs are required.

It is interesting to point out that even though the cases $$c=1$$ and $$c>1$$, as shown in Figs. 2 and 3, are quite different, the final result in terms of number of XORs is the same. We also consider a special case when $$b=2c$$ where further reductions are possible.

There are irreducible pentanomials of this shape for several degree extensions of practical interest. We provide a detailed analysis of the space and time complexity involved in the reduction using the pentanomials in our family. For the multiplication process, we simply use the standard Karatsuba algorithm.

The proved complexity analysis of the multiplier and reduction for the family proposed in this paper, together with our comparisons, suggests that these pentanomials are as good as, or possibly better than, the ones already proposed.

We leave for future work to produce a one-step algorithm using our pentanomials, that is, a multiplier that performs multiplication and reduction in a single step using our family of polynomials, as well as a detailed study of the delay obtained using this algorithm.