ECC on Your Fingertips: A Single Instruction Approach for Lightweight ECC Design in GF(p)
Abstract
Lightweight implementation of Elliptic Curve Cryptography on FPGA has been a popular research topic due to the boom of ubiquitous computing. In this paper we propose a novel single instruction based ultralight ECC cryptoprocessor coupled with dedicated hardIPs of the FPGAs. We show that by using the proposed single instruction framework together with the available block RAMs and DSPs of FPGAs, we can design an ECC cryptoprocessor for NIST curve P256 requiring only 81 and 72 logic slices on Virtex5 and Spartan6 devices, respectively. To the best of our knowledge, this is the first implementation of ECC which requires fewer than 100 slices on any FPGA device family.
Keywords
Elliptic curve, Single instruction, URISC, SBN, FPGA, HardIPs
1 Introduction
With the recent boom in ubiquitous computing, especially in the Internet-of-Things (IoT), the need for lightweight cryptographic algorithms, at either the algorithmic or the implementation level, has increased significantly. Although researchers have proposed various lightweight symmetric ciphers, the most popular options for public key cryptography remain RSA and Elliptic Curve Cryptography (ECC). ECC-based cryptosystems are preferred over RSA because they offer a higher security level per key bit. Any ECC-based protocol or algorithm relies on elliptic curve scalar multiplication, whose computation involves a large number of field operations, making it computationally intensive. Software implementations of ECC running on smart cards or AVR microcontrollers are slow and can become a performance bottleneck for many applications. As an alternative, dedicated ECC cryptoprocessors are built on hardware platforms like ASICs (Application Specific Integrated Circuits) and FPGAs (Field Programmable Gate Arrays).
Although ASIC implementations are faster than those based on FPGAs, FPGAs are sometimes preferred over ASICs for cryptographic applications due to their inherent reconfigurability, short time to market and in-house security. The entire design cycle of an FPGA based system can be completed inside a single lab, unlike ASIC based systems where several different parties are involved in the design cycle. Moreover, modern FPGAs with various device families provide interesting design choices to the designer. Additionally, these FPGAs are now equipped with dedicated hard IPs like DSP blocks and block RAMs which, when properly utilized, result in efficient designs of dedicated ECC-based cryptoprocessors in GF(p) with improved timing performance and reduced area overhead.
There have been many works in the literature which focus on efficient implementation of ECC cryptoprocessors in GF(p) on FPGAs. An overview of such implementations can be found in [1]. A lightweight ASIC design was reported in [2]. A considerably high speed design for FPGAs can be found in [3], which is significantly faster than the previous designs reported in [4, 5]. However, although that design requires much less area than its predecessors (1715 logic slices on Virtex4 platform for NIST P256), it is still considerably large for lightweight applications. A fast pipelined modular multiplier for ECC field multiplication was proposed in [6], whereas an optimized tiling methodology targeting the rectangular DSP blocks of Virtex5 FPGAs was proposed in [7]. However, both of them have considerable area overhead and hence cannot be applied in lightweight applications.
A lightweight ECC algorithm for RFID tags was presented in [8], and authentication and ID transfer protocols based on lightweight ECC were introduced in [9]. At the implementation level, a lightweight architecture known as MicroECC was proposed in [10]. That design shows significant improvement in terms of area-time product compared to the previous implementations [11, 12, 13]. However, MicroECC was implemented on the VirtexII platform, which is no longer recommended as a design platform by Xilinx [14]. Moreover, unlike [11, 12, 13], the MicroECC architecture does not support generalized ECC scalar multiplication on arbitrary prime fields. Nevertheless, for the fixed P256 and P224 curves, MicroECC outperforms the others by a big margin. A lightweight implementation of IPsec protocols, comprising the lightweight block cipher PRESENT, the lightweight hash function PHOTON and an ECC cryptoprocessor (P160 and P256), was presented in [15]. The ECC implementation requires 670 logic slices on the Spartan6 platform for the NIST P256 curve. Subsequently, a lightweight architecture supporting both RSA and ECC along with some side channel countermeasures was proposed in [16]. That design consumes 1914 logic slices on the Virtex5 platform, which is quite low considering its dual support for RSA and ECC. As an alternative to the standard NIST specified curves, many researchers have recommended the use of Edwards curves and hyperelliptic curves (HECC). Efficient lightweight implementations of scalar multiplication on such curves can be found in [17, 18].
In this paper, we propose an alternative single instruction approach for designing a lightweight ECC scalar multiplier, which has not been adopted in previous works. It is well known that a Turing complete processor can be constructed using a single instruction such as SBN (subtract and branch if negative) or SUBLEQ (subtract and branch if the answer is negative or equal to zero). However, although a single instruction processor can execute any arithmetic or logical operation, the execution time of some operations becomes so large that it cannot be used in practical scenarios. Hence, a standalone URISC processor cannot be used for computationally intensive ECC applications. In this paper we show that, by using the dedicated hardIPs of the FPGA and some simple modifications of a URISC processor, it is possible to design an extremely lightweight and yet practical ECC architecture.
This architecture is extremely lightweight and, to the best of our knowledge, it is the first implementation of ECC scalar multiplication which requires fewer than 100 slices on the Virtex5 and Spartan6 platforms. This significant reduction in slice consumption has been achieved through the lightweight architecture of the single instruction processor along with intensive usage of the hardIPs of modern FPGAs. ECC scalar multiplication requires computing and storing multiple temporary variables along with the inputs and outputs. This leads to significant register usage and hence increases the slice consumption. In this paper, we show an alternative design approach where we intensively use the block RAMs and reduce the slice consumption significantly. Further reduction is obtained by replacing LUT logic with high speed DSP blocks wherever possible. The strategy of using block RAMs to reduce the slice consumption has already been applied to lightweight block ciphers like PRESENT [19], where the authors have shown that a block RAM based block cipher design can be extremely lightweight, leaving more slices for other applications.

We propose a single instruction ECC cryptoprocessor for NIST P256 curve, and analyze various challenges along with their solutions that a designer will face while applying single instruction approach in the context of lightweight implementation of ECC designs.

We show that a single instruction based ECC cryptoprocessor, coupled with intensive usage of block RAMs and DSP blocks, can yield an extremely lightweight design for ECC scalar multiplication. The proposed processor requires fewer than 100 slices on both the Virtex5 and Spartan6 families and makes thorough use of FPGA hardIPs.
The rest of the paper is structured as follows: Sect. 2 gives a brief introduction to ECC and single instruction processors. Section 3 gives a detailed description of the single instruction processor along with the modifications required for efficient ECC scalar multiplication. Section 4 describes the lightweight field multiplier, and Sect. 5 presents the architecture of the complete ECC cryptoprocessor. In Sect. 6, we discuss the timing and area performance of our design, followed by the conclusion in Sect. 7.
2 Preliminaries
In this section, we will give a brief summary of ECC and single instruction processors.
2.1 Elliptic Curve Cryptography
As previously mentioned, elliptic curve cryptography (ECC) is a public key cryptosystem based on elliptic curves over finite fields. The security of ECC depends on the intractability of computing the discrete logarithm of an elliptic curve point with respect to a known base point.
Each point addition and point doubling operation involves multiple field multiplications, making field multiplication the most critical operation for efficient scalar multiplication. NIST specified curves are efficient for hardware implementation because modular reduction on those curves is simple, involving only a few additions and subtractions. The fast modular reduction algorithm for NIST P256 is shown in Appendix A.
In our proposed design, we have concentrated on the NIST P256 curve. Nevertheless, our approach can be extended to other NIST certified curves also.
2.2 Single Instruction Processor
The concept of a single instruction computer, or one instruction set computer (OISC), was first proposed in [21]. It has been shown in [22] that a Turing complete machine can be built using just a single instruction. The idea of applying URISC to cryptographic applications was proposed in [23]. In a similar direction, the application of a one instruction set computer to encrypted data computation was analyzed in [24]; however, the authors of that paper investigated OISC in the context of homomorphic encryption and did not consider elliptic curves, which are the precise objective of the present paper.
1. ADDLEQ (Add the operands and branch if the answer is less than or equal to zero)
2. SUBLEQ (Subtract the operands and branch if the answer is less than or equal to zero)
3. SBN (Subtract the operands and branch if the answer is less than zero)
4. RSSB (Reverse subtract and skip if borrow)
5. SBNZ (Subtract the operands and branch if the answer is nonzero)
Table 1. SBN and addition using SBN
Using this instruction, we can execute any mathematical, logical, flow-control, memory-control or load-store type of instruction. For example, Table 1 (code 1.2) shows how to perform addition of two operands using the SBN instruction.
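As a minimal sketch of this idea (not the code of Table 1, which is not reproduced here), the following Python model assumes the semantics SBN A, B, C: Mem[A] becomes Mem[A] − Mem[B], and execution branches to C if the result is negative. The function name `run_sbn`, the scratch location `T` and the halting convention (stop when the program counter leaves the program) are illustrative assumptions.

```python
def run_sbn(program, mem, max_steps=10_000):
    """Execute a list of (a, b, c) SBN instructions on the dict `mem`.

    Assumed semantics: mem[a] -= mem[b]; jump to c if the result is negative,
    otherwise fall through. Execution halts when pc leaves the program.
    """
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return mem
        a, b, c = program[pc]
        mem[a] -= mem[b]
        pc = c if mem[a] < 0 else pc + 1
    raise RuntimeError("step limit exceeded")


# X <- X + Y using only SBN and one scratch location T.
# Every jump target equals pc + 1, so the branch outcome does not matter here.
ADD_PROGRAM = [
    ("T", "T", 1),  # T = T - T = 0
    ("T", "Y", 2),  # T = 0 - Y = -Y
    ("X", "T", 3),  # X = X - (-Y) = X + Y
]

mem = run_sbn(ADD_PROGRAM, {"X": 5, "Y": 7, "T": 99})
print(mem["X"])  # 12
```

Addition therefore costs three subtractions in this sketch, because the only arithmetic primitive is subtraction and the operand must first be negated into a scratch location.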
In this section we have given a brief idea about elliptic curves and OISC. In the next section, we will go into more details of OISC based on SBN instruction and will analyze it from the point of view of elliptic curve applications.
3 SBNOISC and Elliptic Curve Scalar Multiplication
In the previous section, we gave a brief introduction to ECC and to OISC based on the SBN instruction (hereafter referred to as SBNOISC). In this section we focus on SBNOISC in the context of ECC implementation. We identify the critical challenges that a designer faces while implementing ECC using SBNOISC and provide solutions to tackle them. We first describe a standalone SBNOISC processor in the next subsection.
3.1 StandAlone SBNOISC Processor

Instruction Memory: The instruction memory stores the instructions to be executed and can be implemented on an FPGA using block RAMs configured as a single port ROM. In Fig. 2, the instruction memory can store up to \(2^{11}\) instructions, each 21 bits wide. The instruction format is similar to Fig. 1, where the addresses of the two operands are 5 bits wide each and the jump address is 11 bits long.

Data Memory: The data memory stores the final result of any computation, along with the inputs and all the temporary results required during the computation. It is implemented using block RAM configured as true dual port RAM. The data memory has 32 entries, each of which is 260 bits wide. While implementing scalar multiplication on NIST P256, the partially reduced output can be up to 259 bits long, which requires a 260 bit signed representation. Hence we have chosen a 260 bit wide data path.

ALU: The arithmetic logic unit (ALU) of SBNOISC contains a subtractor, which computes the difference between the two inputs. If the result is negative, the program counter is updated with the jump address specified in the instruction. Otherwise, the program counter advances to the immediately following instruction.
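The instruction format above (two 5 bit operand addresses and an 11 bit jump address, 21 bits in total) can be modelled as follows. The field widths come from the text; the order of the fields inside the word and the names `pack_sbn` and `unpack_sbn` are assumptions made for this sketch, since Fig. 1 is not reproduced here.

```python
A_BITS, B_BITS, JUMP_BITS = 5, 5, 11  # widths stated in the text (21 bits total)


def pack_sbn(a, b, jump):
    """Pack operand addresses and jump address into one 21-bit word.

    Assumed layout (MSB to LSB): a | b | jump. The real field order may differ.
    """
    if not (0 <= a < 1 << A_BITS and 0 <= b < 1 << B_BITS
            and 0 <= jump < 1 << JUMP_BITS):
        raise ValueError("field out of range")
    return (a << (B_BITS + JUMP_BITS)) | (b << JUMP_BITS) | jump


def unpack_sbn(word):
    """Inverse of pack_sbn: return (a, b, jump)."""
    jump = word & ((1 << JUMP_BITS) - 1)
    b = (word >> JUMP_BITS) & ((1 << B_BITS) - 1)
    a = (word >> (B_BITS + JUMP_BITS)) & ((1 << A_BITS) - 1)
    return a, b, jump


word = pack_sbn(3, 17, 1024)
print(f"{word:021b}", unpack_sbn(word))
```

The 5 bit operand addresses match the 32 entries of the data memory, and the 11 bit jump address matches the \(2^{11}\) entries of the instruction memory.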
The architecture described above is simple and extremely lightweight, requiring 66 logic slices on a Virtex5 platform. However, ECC operations can be optimized further by introducing different variants of the SBN instruction. The next subsection concentrates on these variants and discusses how they accelerate the ECC implementation.
3.2 Instruction Level Optimizations
Generally, though an OISC processor executes only a single instruction, it is possible to realize different versions of that single instruction to accelerate the desired operation. This approach helps us to reduce the size of instruction memory and consequently, results in faster execution of the aimed design. This is extremely helpful for computationally intensive ECC applications, as illustrated in the following discussion.
Switching Off Memory Write-Back. In the traditional SBN instruction (SBN A, B, C), the memory location A always gets updated with the result \(D.Mem[A] - D.Mem[B]\) (D.Mem is the data memory). However, we can reduce the instruction count considerably if we can switch off this memory write-back operation in some cases.
Let us consider a prime field addition operation. We assume that we need to add two operands stored at memory locations A and B and that the modulus of the field is stored at memory location P. Table 2 (code 1.3) shows the realization of this operation using SBN. Implementing prime field addition in this way requires 11 SBN instructions. If each SBN instruction takes n clock cycles to execute, the field addition takes 11n clock cycles in total.
Table 2. Field addition using different SBN instructions
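To illustrate why switching off the write-back helps, the sketch below extends a simple SBN model with a per-instruction write-back flag and uses it for field addition. This is not the code of Table 2; the program, the scratch location `T` and the name `run_sbn_wb` are hypothetical. The key step is a compare-only subtraction of P that leaves A intact, so no instructions are needed to restore A when the correction turns out to be unnecessary.

```python
def run_sbn_wb(program, mem, max_steps=10_000):
    """Run (a, b, c, write_back) instructions.

    The difference mem[a] - mem[b] always drives the branch decision, but it
    is stored to mem[a] only when write_back is True.
    """
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return mem
        a, b, c, write_back = program[pc]
        diff = mem[a] - mem[b]
        if write_back:
            mem[a] = diff
        pc = c if diff < 0 else pc + 1
    raise RuntimeError("step limit exceeded")


# A <- (A + B) mod P for 0 <= A, B < P, with scratch location T.
FIELD_ADD = [
    ("T", "T", 1, True),   # T = 0
    ("T", "B", 2, True),   # T = -B
    ("A", "T", 3, True),   # A = A + B
    ("A", "P", 5, False),  # compare only: if A - P < 0, skip the correction
    ("A", "P", 5, True),   # A = A - P
]

print(run_sbn_wb(FIELD_ADD, {"A": 10, "B": 9, "P": 13, "T": 0})["A"])  # 6
print(run_sbn_wb(FIELD_ADD, {"A": 3, "B": 4, "P": 13, "T": 0})["A"])   # 7
```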
Right Shift on SBN Processor. Right shift is an important operation for elliptic curve scalar multiplication, as it is required during field inversion. A right shift can be executed through the SBN instruction by repeated subtraction. For example, to right shift an operand by one bit position, we repeatedly subtract 2 from the operand until the result becomes less than 2; the number of subtractions performed is the shifted value. As we are concentrating on the NIST P256 curve, the operands are typically 256 bits long, making this sequence of repeated subtractions extremely time consuming. On the other hand, a shifter on an FPGA has zero LUT overhead if the number of bits to be shifted is fixed. Hence, it is better to implement the right shift using a dedicated right shifter module instead of SBN.
To facilitate this in our architecture, we have introduced another flag in our instruction format (\(SBN_{rs}\) and \(SBN_{\overline{rs}}\)). When this flag is set, the dedicated right shifter module reads the operand and shifts it right by one bit position.
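The cost of the subtraction-based shift can be made concrete with a small model. The function name `right_shift_by_subtraction` is illustrative; the loop mirrors the repeated subtraction of 2 described above and reports how many subtractions were needed.

```python
def right_shift_by_subtraction(x):
    """Return (x >> 1, number_of_subtractions) using only subtract-and-test."""
    count = 0
    while x >= 2:   # on SBN this test is the "branch if negative" check
        x -= 2
        count += 1  # the subtraction count is also the shifted value
    return count, count


shifted, steps = right_shift_by_subtraction(1000)
print(shifted, steps)  # 500 500
print(f"a 256-bit operand needs on the order of 2**255 = {2**255:.3e} subtractions")
```

The number of subtractions grows linearly with the operand value, that is, exponentially with its bit length, whereas a fixed one-bit shifter is pure wiring.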
Shifting Key Register. As we have stated in Algorithm 1, the elliptic curve scalar multiplication operation involves point addition and point doubling operation. Point doubling happens for every key bit, but point addition happens only when the key bit value is one. Hence we need to scan the key value bit by bit to execute scalar multiplication operation. On a standard processor this can be implemented using shift and logical AND operation. However, executing logical operations using only SBN instruction is again time consuming and hence practically infeasible.
To solve this challenge, we use a dedicated key register, separate from the data memory shown in Fig. 2. We have also introduced another flag in our instruction format (\(SBN_{ks}\) and \(SBN_{\overline{ks}}\)) which, when enabled, left shifts the key register by one bit. The bit shifted out of the key register decides whether the point doubling is followed by a point addition.
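The role of the key register can be sketched as a left-to-right double-and-add loop in which each step shifts the register left and inspects the bit that falls out. The function `scalar_multiply` and its callback parameters are hypothetical; to keep the example runnable without curve arithmetic, the toy group below is the integers under addition, so the expected result is simply key × base.

```python
def scalar_multiply(key, key_bits, base, double, add, identity):
    """Left-to-right double-and-add driven by a left-shifting key register."""
    mask = (1 << key_bits) - 1
    key_reg = key & mask
    acc = identity
    for _ in range(key_bits):
        msb = (key_reg >> (key_bits - 1)) & 1   # bit shifted out this step
        key_reg = (key_reg << 1) & mask         # effect of the ks flag
        acc = double(acc)                       # doubling for every key bit
        if msb:
            acc = add(acc, base)                # addition only for a 1 bit
    return acc


# Toy group: integers under addition, so k * P is ordinary multiplication.
result = scalar_multiply(
    key=0b1011, key_bits=4, base=7,
    double=lambda p: 2 * p, add=lambda p, q: p + q, identity=0,
)
print(result)  # 77
```

In the actual processor the same control flow would call the point doubling and point addition routines instead of the integer lambdas.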
Multiplication Using SBN. Field multiplication using SBN is carried out by repeated addition. For example, to multiply operand A by operand B we need to add operand A to itself B times. We have already shown how to implement field addition using SBN in Table 2. To complete the multiplication, we need to run that code B times in a loop. In the worst case, the operand values on the NIST P256 curve are in the range of \(2^{256}\), which makes repeated addition impractical as the loop would need to run about \(2^{256}\) times. Hence, we cannot implement field multiplication using only SBN for ECC scalar multiplication.
To solve this problem, we have designed a lightweight multiplier using DSP blocks, which acts as an external multiplier core and executes the field multiplication. However, to reset this multiplier core and to provide operand data to it, we need another variant of the SBN instruction, which we refer to as \(SBN_{mul}\) and \(SBN_{\overline{mul}}\). The \(SBN_{mul}\) instruction resets the multiplier, whereas \(SBN_{\overline{mul}}\) initiates the multiplication. A detailed description of this external multiplier core, along with its interfacing with the SBNOISC processor, is provided in the next section.
Table 3. Different variants of the SBN instruction

| Instruction | Memory write-back | Multiplier reset | Key shift | Right shift |
|---|---|---|---|---|
| \(SBN_{w\overline{mul}\overline{ks}\overline{rs}}\) | \(\checkmark\) | x | x | x |
| \(SBN_{nw\overline{mul}\overline{ks}\overline{rs}}\) | x | x | x | x |
| \(SBN_{nw\overline{mul}ks\overline{rs}}\) | x | x | \(\checkmark\) | x |
| \(SBN_{w\overline{mul}\overline{ks}rs}\) | \(\checkmark\) | x | x | \(\checkmark\) |
| \(SBN_{nw{mul}\overline{ks}\overline{rs}}\) | x | \(\checkmark\) | x | x |
4 Lightweight Field Multiplier for SBNOISC
As stated in the previous sections, the SBNOISC processor needs a dedicated lightweight multiplier core for efficient execution of the ECC operations. In this section we focus on the architecture of this dedicated field multiplier and describe the design strategies behind it.
The architecture of the field multiplier is shown in Fig. 4. The architecture requires two DSP blocks, one for integer multiplication and another for modular reduction. DSP blocks of Virtex5 FPGAs support \(25 \times 18\) signed multiplication and also provide 48 bit adder/accumulator support. In our implementation, we use the DSP block as a \(16 \times 16\) unsigned multiplier configured in multiply and accumulate mode. Moreover, for addition, the DSP block is configured as a 32 bit adder.
4.1 Integer Multiplication
The integer multiplier receives two 256 bit operands as input. The operands are divided into 16 bit words and passed to the first DSP block through two multiplexers. The DSP block is configured in multiply and accumulate mode and supports two different operations. In the first operation, the DSP block computes \(A*B+P\), where A and B are the two multiplexer outputs and P is the accumulator output. This operation computes the sum of the partial products which are aligned with each other. Let us illustrate this with a small example in Eq. 2.
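A word-level model may help to see how one multiply-accumulate unit can produce the full product. The sketch below uses column-wise (product scanning) multiplication on 16 bit words: all aligned partial products of a column are accumulated with \(A*B+P\), the low 16 bits are emitted, and the accumulator is shifted right to carry into the next column. The shift step is an assumption about the second DSP operation, which is not described in the text above, and the names `to_words` and `mac_multiply` are illustrative.

```python
WORD = 16
MASK = (1 << WORD) - 1


def to_words(x, n):
    """Split x into n 16-bit words, least significant first."""
    return [(x >> (WORD * i)) & MASK for i in range(n)]


def mac_multiply(x, y, n=16):
    """Multiply two n-word integers column by column with one accumulator."""
    a, b = to_words(x, n), to_words(y, n)
    acc = 0
    out = []
    for col in range(2 * n - 1):
        for i in range(max(0, col - n + 1), min(col, n - 1) + 1):
            acc += a[i] * b[col - i]      # first DSP operation: A*B + P
            assert acc < 1 << 48          # fits the 48-bit accumulator
        out.append(acc & MASK)            # low word of the column is final
        acc >>= WORD                      # assumed second operation: carry
    out.append(acc)                       # most significant word
    return sum(w << (WORD * k) for k, w in enumerate(out))


x = (1 << 256) - 189
y = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF
print(mac_multiply(x, y) == x * y)  # True
```

With 16 words per operand, a column sums at most 16 products of 32 bits each, so the running value stays far below the 48 bit accumulator limit checked by the assertion.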
4.2 Modular Reduction
Once the memory is loaded with the integer multiplication result, the modular reduction is initiated. For NIST curves, modular reduction requires a combination of additions and subtractions, as shown in Algorithm 2 in Appendix A. In Algorithm 2, the modular reduction needs to add the operands \(T, S_1, S_2, S_3, S_4\) and subtract \(D_1, D_2, D_3, D_4\) from them. We have split the operands into 32 bit words and used a DSP adder to execute the additions and subtractions. The memory produces 32 bits of output in a single clock cycle, which are added or subtracted depending on the control signal add/sub. Like the previous DSP block, this one also supports two operations: \(P \pm C\) and \(C+CONCAT\), where \(CONCAT=P>>32\) and P is the accumulator output. The first operation adds a 32 bit operand to, or subtracts it from, the accumulator result, and the second operation adds the carry bits generated by the previous additions.
The sequence of additions and subtractions is determined by the modular reduction algorithm for the NIST P256 curve, shown in Appendix A. The produced result is not fully reduced but lies within the range [\(-4p,5p\)] [3], where p is the modulus of the curve. The partial modular reduction requires 68 clock cycles, making the total clock cycle requirement for field multiplication 327.
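Since Appendix A is not reproduced here, the following sketch shows the standard NIST fast reduction formula for P-256 with the operand names used above: the 512 bit product is split into sixteen 32 bit words, rearranged into \(T, S_1, \ldots, S_4, D_1, \ldots, D_4\), and combined as \(T + 2S_1 + 2S_2 + S_3 + S_4 - D_1 - D_2 - D_3 - D_4\). It models only the arithmetic, not the word-serial DSP schedule or its cycle count; the function names are illustrative.

```python
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def words_to_int(words):
    """Combine 32-bit words, most significant first, into one integer."""
    value = 0
    for w in words:
        value = (value << 32) | w
    return value


def p256_partial_reduce(c):
    """Return T + 2*S1 + 2*S2 + S3 + S4 - D1 - D2 - D3 - D4 for c < 2**512.

    The result is congruent to c modulo P256 but is not fully reduced.
    """
    a = [(c >> (32 * i)) & 0xFFFFFFFF for i in range(16)]  # a[0] is the LSW
    t = words_to_int([a[7], a[6], a[5], a[4], a[3], a[2], a[1], a[0]])
    s1 = words_to_int([a[15], a[14], a[13], a[12], a[11], 0, 0, 0])
    s2 = words_to_int([0, a[15], a[14], a[13], a[12], 0, 0, 0])
    s3 = words_to_int([a[15], a[14], 0, 0, 0, a[10], a[9], a[8]])
    s4 = words_to_int([a[8], a[13], a[15], a[14], a[13], a[11], a[10], a[9]])
    d1 = words_to_int([a[10], a[8], 0, 0, 0, a[13], a[12], a[11]])
    d2 = words_to_int([a[11], a[9], 0, 0, a[15], a[14], a[13], a[12]])
    d3 = words_to_int([a[12], 0, a[10], a[9], a[8], a[15], a[14], a[13]])
    d4 = words_to_int([a[13], 0, a[11], a[10], a[9], 0, a[15], a[14]])
    return t + 2 * s1 + 2 * s2 + s3 + s4 - d1 - d2 - d3 - d4


x = (P256 - 1) ** 2
r = p256_partial_reduce(x)
print(r % P256 == x % P256)  # True
```

The returned value can be negative or larger than p, which is why a signed representation wider than 256 bits is needed for the partially reduced result.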
5 Complete ECC SBNOISC Processor
In this section, we present a detailed description of our proposed ECC SBNOISC processor. The complete architecture of the processor is shown in Fig. 5. The architecture and operation of the proposed processor are similar to those of the standalone SBN processor shown in Fig. 2, with a few modifications which are described below.
The ECC SBNOISC processor is coupled with the multiplier core described in the previous section. The multiplier core is controlled by the mul flag of the instruction. As long as the mul flag is set to one, the multiplier stays in its initial state. Once the flag is set low, the multiplier starts its operation and produces the partially reduced output along with the signal web, which indicates the completion of the multiplication. In the standalone SBN processor (Fig. 2), the data memory is updated only through port A. In our case, we also use the otherwise unused port B for writing the multiplier output into the memory. Note that when the rs flag is set high, the multiplier module produces the right shifted version of the input available through port A.
As mentioned earlier, we introduced a flag ks in our instruction format for shifting the key register. The key is stored in a separate register which undergoes a single bit left shift when the ks flag is set high. If the MSB of the key register is one, we select the address of the memory location containing the value 1 (addr_1) and pass it to the data memory. Otherwise, if the MSB is zero, we select the memory location containing the value 0 (addr_0). Once this is done, we can easily switch between point doubling and point addition depending on the memory location passed to the data memory.
The ALU of the proposed SBNOISC processor is a subtractor implemented through cascaded DSP blocks. The subtraction requires 6 clock cycles to complete. Instruction fetch, memory read and memory write-back require a single clock cycle each. Hence a single SBN instruction requires 9 clock cycles in total.
6 Result and Comparison
In this section we analyze the performance of the proposed ECC SBNOISC processor in terms of timing and area. Table 4 shows the timing and area performance of the proposed processor. The slice count required by the design is very small on both Virtex5 and Spartan6. This is achieved by in-depth usage of block RAMs and DSP blocks. The standalone SBN processor is itself very lightweight, and the dedicated multiplier core is designed through judicious use of DSPs and block RAMs, making the slice count extremely small. The block RAMs are used to implement both the data memory and the instruction memory of the SBNOISC processor. Moreover, all the temporary storage and the control units are also implemented through block RAMs, which increases the block RAM consumption but reduces the slice count considerably. A designer can choose a budget of slices and block RAMs and then design the ECC cryptoprocessor according to that budget. In this paper, we wanted to explore the limit up to which the slice count can be reduced by increasing the block RAM usage. The results in Table 4 show that the saving in slice usage is significant, and hence the objective of the paper is achieved.
Table 4. Area and timing performance of the proposed ECC SBNOISC processor

| Platform | Freq. (MHz) | Slices | LUTs | FlipFlops | DSP for ALU | DSP for Multiplier | BlockRAM | Time (ms) |
|---|---|---|---|---|---|---|---|---|
| Virtex5 | 171.5 | 81 | 212 | 35 | 6 | 2 | 22 | 11.1 |
| Spartan6 | 156.25 | 72 | 193 | 35 | 6 | 2 | 24 | 12.2 |
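As a quick sanity check on these figures, the reported latency and clock frequency can be converted into an approximate cycle count for one scalar multiplication. The cycle counts below are derived here from Table 4 and are not reported in the paper; the helper name `cycles_from_latency` is illustrative, and the per-instruction breakdown restates the 9 cycles given in Sect. 5.

```python
CYCLES_PER_SBN = 1 + 1 + 6 + 1   # fetch + memory read + subtract + write-back


def cycles_from_latency(freq_mhz, time_ms):
    """Clock cycles elapsed in time_ms milliseconds at freq_mhz megahertz."""
    return round(freq_mhz * time_ms * 1000)


for platform, freq_mhz, time_ms in [("Virtex5", 171.5, 11.1),
                                    ("Spartan6", 156.25, 12.2)]:
    print(platform, cycles_from_latency(freq_mhz, time_ms), "cycles")
```

Both platforms come out at roughly 1.9 million cycles, which is what one would expect if the same instruction sequence runs on both devices and only the clock frequency differs.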
Table 5. Comparison of ECC SBNOISC processor with existing designs

| Reference | Slices | MULTs | BRAMs | Freq (MHz) | Latency (ms) | FPGA |
|---|---|---|---|---|---|---|
| MicroECC P256 16 bit [10] | 773 | 1 | 3 | 210 | 10.02 | VirtexII Pro |
| MicroECC P256 32 bit [10] | 1158 | 4 | 3 | 210 | 4.52 | VirtexII Pro |
| [11] 16 bit any prime curve | 1832 | 2 | 9 | 108.20 | 29.83 | VirtexII Pro |
| [11] 32 bit any prime curve | 2085 | 7 | 9 | 68.17 | 15.76 | VirtexII Pro |
| [3] P256 | 1715 | 32 (DSP) | 11 | 490 | 0.62 | Virtex4 |
| [15] P256 | 221 | 1 | 3 | Not shown | Not shown | Spartan6 |
| Present work, P256 | 81 | 8 (DSP) | 22 | 171.5 | 11.1 | Virtex5 |
| Present work, P256 | 72 | 8 (DSP) | 24 | 156.25 | 12.2 | Spartan6 |
7 Conclusion
In this paper we have merged two design strategies to create an extremely lightweight ECC cryptoprocessor for scalar multiplication on the NIST P256 curve. The first strategy is to use a single instruction processor (the ECC SBNOISC processor) as a lightweight framework for ECC scalar multiplication. We have equipped this processor with a dedicated field multiplier, along with some simple modifications of the processor architecture and instruction format, to make scalar multiplication feasible in practical time. The second strategy is to use the dedicated hardIPs of the FPGA to reduce the slice consumption further. We have shown that thorough usage of DSP blocks and block RAMs decreases the slice requirement significantly. For Virtex5 and Spartan6, we have achieved a slice consumption of fewer than 100. To the best of our knowledge, this is the first implementation to achieve this.
References
1. Daly, A., Marnane, W., Kerins, T., Popovici, E.: An FPGA implementation of a GF(p) ALU for encryption processors. Microprocess. Microsyst. 28(5–6), 253–260 (2004). Special Issue on FPGAs: Applications and Designs
2. Batina, L., Mentens, N., Sakiyama, K., Preneel, B., Verbauwhede, I.: Low-cost elliptic curve cryptography for wireless sensor networks. In: Buttyán, L., Gligor, V.D., Westhoff, D. (eds.) ESAS 2006. LNCS, vol. 4357, pp. 6–17. Springer, Heidelberg (2006)
3. Güneysu, T., Paar, C.: Ultra high performance ECC over NIST primes on commercial FPGAs. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 62–78. Springer, Heidelberg (2008)
4. Satoh, A., Takano, K.: A scalable dual-field elliptic curve cryptographic processor. IEEE Trans. Comput. 52, 449–460 (2003)
5. Orlando, G., Paar, C.: A scalable GF(p) elliptic curve processor architecture for programmable hardware. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 356–371. Springer, Heidelberg (2001)
6. Alrimeih, H., Rakhmatov, D.: Pipelined modular multiplier supporting multiple standard prime fields. In: 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 48–56, June 2014
7. Roy, D.B., Mukhopadhyay, D., Izumi, M., Takahashi, J.: Tile before multiplication: an efficient strategy to optimize DSP multiplier for accelerating prime field ECC for NIST curves. In: The 51st Annual Design Automation Conference, DAC 2014, San Francisco, CA, USA, 1–5 June 2014, pp. 177:1–177:6 (2014)
8. Kim, C.J., Yun, S.Y., Park, S.C.: A lightweight ECC algorithm for mobile RFID service. In: Proceedings of the 5th International Conference on Ubiquitous Information Technologies and Applications (CUTE 2010), pp. 1–6, December 2010
9. He, D., Kumar, N., Chilamkurti, N., Lee, J.H.: Lightweight ECC based RFID authentication integrated with an ID verifier transfer protocol. J. Med. Syst. 38(10), 116 (2014)
10. Varchola, M., Güneysu, T., Mischke, O.: MicroECC: a lightweight reconfigurable elliptic curve cryptoprocessor. In: International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, Cancun, Mexico, November 30–December 2, 2011, pp. 204–210 (2011)
11. Vliegen, J., Mentens, N., Genoe, J., Braeken, A., Kubera, S., Touhafi, A., Verbauwhede, I.: A compact FPGA-based architecture for elliptic curve cryptography over prime fields. In: 21st IEEE International Conference on Application-Specific Systems Architectures and Processors, ASAP 2010, Rennes, France, 7–9 July 2010, pp. 313–316 (2010)
12. Tawalbeh, L.A., Mohammad, A., Gutub, A.A.A.: Efficient FPGA implementation of a programmable architecture for GF(p) elliptic curve crypto computations. Signal Process. Syst. 59(3), 233–244 (2010)
13. Ghosh, S., Alam, M., Chowdhury, D.R., Gupta, I.S.: Parallel crypto-devices for GF(P) elliptic curve multiplication resistant against side channel attacks. Comput. Electr. Eng. 35(2), 329–338 (2009)
14. Xilinx Inc.: VirtexII and VirtexII Pro X FPGA User Guide, 14 February 2011
15. Driessen, B., Güneysu, T., Kavun, E.B., Mischke, O., Paar, C., Pöppelmann, T.: IPSecco: a lightweight and reconfigurable IPSec core. In: International Conference on Reconfigurable Computing and FPGAs, ReConFig 2012, Cancun, Mexico, 5–7 December 2012, pp. 1–7 (2012)
16. Pöpper, C., Mischke, O., Güneysu, T.: MicroACP - a fast and secure reconfigurable asymmetric cryptoprocessor. In: Goehringer, D., Santambrogio, M.D., Cardoso, J.M.P., Bertels, K. (eds.) ARC 2014. LNCS, vol. 8405, pp. 240–247. Springer, Heidelberg (2014)
17. Himmighofen, A., Jungk, B., Reith, S.: On a FPGA-based method for authentication using Edwards curves. In: 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), Darmstadt, Germany, 10–12 July 2013, pp. 1–7 (2013)
18. Fan, J., Batina, L., Verbauwhede, I.: Light-weight implementation options for curve-based cryptography: HECC is also ready for RFID. In: ICITST, pp. 1–6. IEEE (2009)
19. Kavun, E.B., Yalcin, T.: RAM-based ultra-lightweight FPGA implementation of PRESENT. In: International Conference on Reconfigurable Computing and FPGAs (ReConFig 2011), pp. 280–285, November 2011
20. Hankerson, D., Menezes, A.J., Vanstone, S.: Guide to Elliptic Curve Cryptography. Springer, New York (2003)
21. Mavaddat, F., Parhami, B.: URISC: the ultimate reduced instruction set computer. Int. J. Electr. Eng. Educ. 25, 327–334 (1988)
22. Gilreath, W.F., Laplante, P.A.: Computer Architecture: A Minimalist Perspective. The Springer International Series in Engineering and Computer Science. Springer, New York (2003)
23. Naccache, D.: Is theoretical cryptography any good in practice? In: CHES (2010)
24. Tsoutsos, N.G., Maniatakos, M.: Investigating the application of one instruction set computing for encrypted data computation. In: Gierlichs, B., Guilley, S., Mukhopadhyay, D. (eds.) SPACE 2013. LNCS, vol. 8204, pp. 21–37. Springer, Heidelberg (2013)
25. Liu, A., Ning, P.: TinyECC: a configurable library for elliptic curve cryptography in wireless sensor networks. In: IPSN, pp. 245–256. IEEE Computer Society (2008)