The Conflicted Usage of RLUTs for Security-Critical Applications on FPGA

  • Debapriya Basu Roy
  • Shivam Bhasin
  • Jean-Luc Danger
  • Sylvain Guilley
  • Wei He
  • Debdeep Mukhopadhyay
  • Zakaria Najm
  • Xuan Thuy Ngo

Abstract

Modern field programmable gate arrays (FPGAs) have evolved significantly in recent years and have found applications in various fields like cryptography, defense, aerospace, and many more. The integration of FPGAs with highly efficient modules like DSPs and block RAMs has increased their performance significantly. This paper addresses a lesser explored feature of modern FPGAs called the reconfigurable LUT (RLUT), whose content can be updated internally, even at run-time. We describe the basic functionality of RLUTs and discuss their potential applications for security from both destructive and constructive points of view, highlighting the conflicted usage of RLUTs. Several use cases exploiting the RLUT feature in security-critical scenarios (in particular, related to physical attacks) are studied in detail. The paper proposes the design of stealthy hardware Trojans with zero payload overhead to highlight destructive applications. On the other hand, it also highlights several constructive applications based on RLUT features, ranging from lightweight side-channel countermeasures to a kill switch that protects the FPGA hardware from environmental hazards and malicious attack attempts.

Keywords

Reconfigurable LUT (RLUT) · FPGA · CFGLUT5 · Hardware Trojans · Side-channel countermeasures · Secret ciphers

1 Introduction

Field programmable gate arrays (FPGAs) have emerged as a popular platform for implementing complex digital logic.

FPGAs entered the VLSI industry as successors of programmable read-only memories (PROMs) and programmable logic devices (PLDs) and have been highly successful due to their reconfigurable nature. A standard FPGA can be described as islands of configurable logic blocks (CLBs) in a sea of programmable interconnect. With time, FPGAs have become more sophisticated due to the addition of several on-chip features such as high-density block memories, DSP cores, PLLs, etc. These features, coupled with their core advantage of reconfigurability and low time-to-market, have made FPGAs an integral part of the semiconductor industry, as an attractive economic solution for low- to medium-scale markets like defense, space, automotive, medical, etc.

The key parameters for designers have traditionally been area, performance, and power.

With pervasive applications in security-critical scenarios, designers have started considering security as a fourth parameter. Most recent FPGAs support bitstream protection through authentication and encryption schemes [1]. Other security features like tamper resistance, blocking of bitstream read-back, temperature/voltage sensing, etc. are also available [2]. FPGAs have also been a popular design platform for implementations of cryptographic algorithms due to their reconfigurability and in-house security.

Apart from the built-in security features, designers can use FPGA primitives and constraints to implement their own designs in a secure manner. In [3], the authors show several side-channel countermeasures which can be realized on FPGAs to protect a design. Another work [4] demonstrates the efficient use of block RAMs to implement complex countermeasures like masking and dual-rail logic. DSPs in FPGAs have also been widely used to design public-key cryptographic algorithms like ECC [5, 6] and other post-quantum algorithms [7]. Moreover, works like [8] have used FPGA constraints like KEEP and Lock_PINS, or low-level languages like XDL, to design efficient physical countermeasures.

The basic building block of an FPGA is a logic slice. Typically, a logic slice contains look-up tables (LUTs) and flip-flops. LUTs are used to implement combinational logic, whereas flip-flops are used to design sequential architectures. Every LUT contains an INIT value (the LUT contents), which is basically the truth table of the combinational function implemented on that LUT. The INIT value is set when the FPGA is programmed through the bitstream. Generally, this INIT value is considered constant until the FPGA is reprogrammed. However, in recent years, a new feature has been added to FPGAs which allows the user to modify the INIT value of some special LUTs at run-time, without reprogramming the FPGA. These special LUTs are known as reconfigurable LUTs, or RLUTs, as they can be reconfigured during the operation phase to change the input-output mapping of the LUT. To the best of our knowledge, RLUTs have found relevant use in pattern matching and filter applications [9]. A side-channel protection methodology using RLUTs is presented in [10], where the authors combine different side-channel protection strategies with RLUTs to develop leakage-resilient designs. In [11], the authors propose a hardware Trojan to leak the secret key of the PRESENT [12] cipher. That Trojan needs 16 RLUTs and is always active. Both the size and the activity of the Trojan make it susceptible to detection techniques; sixteen RLUTs is a non-negligible overhead, as PRESENT is a lightweight cipher and itself needs very few LUTs for implementation. In this work, by contrast, RLUTs are exploited to design a Trojan with negligible overhead. Additionally, the Trojan proposed in this paper is not always active and hence provides an added advantage over [11].

In this paper, we aim to study the impacts and ramifications of these RLUTs on cryptographic implementations. We conduct an in-depth analysis of the RLUT feature present in Xilinx FPGAs and report its true potential for security-related applications. We propose several industry-relevant applications of RLUTs, both of constructive and destructive nature. For example, an RLUT can easily be (ab)used by an FPGA IP designer to insert a stealthy hardware Trojan. On the other hand, using RLUTs, a designer can provide several enhanced features, such as the flexibility of programming secret data on the client side.

1.1 Contributions of the Paper

The contribution of the paper can be listed as follows:
  • This paper provides a detailed analysis of RLUTs and shows how they can be exploited to create extremely stealthy and serious hardware security threats like hardware Trojans (destructive applications).

  • Moreover, it also proposes a design methodology which uses RLUTs to redesign existing efficient and lightweight side-channel countermeasures that mitigate power-based side-channel attacks (constructive applications).

  • Thus, in this work, we provide efficient exploits and defenses which can be built using RLUTs. We highlight, to the best of our knowledge for the first time, the double-edged and conflicted nature of RLUT usage in FPGA-based designs.

1.2 Organization of the Paper

The rest of the paper is organized as follows: Section 2 describes the rationale of an RLUT and discusses its advantages and disadvantages. Thereafter, several destructive and constructive applications of RLUT are demonstrated in Section 3 and Section 4 respectively. Finally, conclusions are drawn in Section 5.

2 Rationale of the RLUT

RLUT is a feature found essentially in Xilinx FPGAs. A Xilinx RLUT can be inferred in a design by using a primitive cell called CFGLUT5 from the Xilinx library. This primitive implements a 5-input, single-output LUT whose configuration can be changed. CFGLUT5 was first introduced in the Virtex-5 and Spartan-6 families of Xilinx FPGAs. The working principle of CFGLUT5 is similar to that of a shift register, or the more popularly known SRL primitives.

Moreover, some older Xilinx families which do not support CFGLUT5 as a primitive can still implement RLUTs using the SRL16 primitive. In the following, for the sake of demonstration, we stick to the CFGLUT5 primitive. Nevertheless, the results should directly apply to its alternatives as well.

The basic block diagram of CFGLUT5 is shown in Fig. 1. It is a 5-input, 1-output LUT. Alternatively, a CFGLUT5 can also be modeled as a 4-input, 2-output function. The main feature of CFGLUT5 is that it can be configured dynamically at run-time. Every LUT is loaded with an INIT value, which represents the truth table of the function implemented on that LUT. A CFGLUT5 allows the user to change this INIT value at run-time, thus giving the user the power of internal dynamic reconfiguration. This reconfiguration is performed using the CDI port. A 1-bit reconfiguration data input is shifted serially into INIT on each clock cycle if the reconfiguration enable signal (CE) is set high. The previous INIT value is flushed out serially through the CDO port, 1 bit per clock cycle. Several CFGLUT5s can be cascaded together using the reconfiguration data cascaded output port (CDO).
Fig. 1

Block diagram of CFGLUT5

The reconfiguration property of CFGLUT5 is illustrated in Fig. 2 with the help of a small example. In this figure, we show how the INIT value gets modified:
  • From value O = (O0,O1,O2,…,O30,O31)

  • To a new value N = (N0,N1,N2,…,N30,N31)

Fig. 2

INIT value reconfiguration in CFGLUT5

This reconfiguration requires 32 clock cycles. As evident from the figure, the reconfiguration steps are basic shift-register operations. Hence, if required, reconfiguration of the LUT content can be executed using shift register primitives (SRL16E_1) in earlier device families. The CDO pin can also be fed back to the CDI pin of the same CFGLUT5. In this case, the original INIT value can be restored after at most 32 clock cycles without any overhead logic. We will exploit this property of RLUTs later to design hardware Trojans.
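To make the serial reconfiguration mechanism concrete, the following Python sketch models the CFGLUT5 INIT register as a 32-entry shift register; the bit-ordering convention (the outgoing CDO bit taken from the most significant position) is a modelling assumption, not a statement about the silicon.

```python
def reconfigure(init, cdi_bits):
    """Model of serial reconfiguration of a CFGLUT5-like RLUT.

    init     : current 32-bit INIT value
    cdi_bits : bits presented on CDI, one per clock cycle (CE assumed high)
    Returns (new_init, cdo_bits), where cdo_bits are the old bits flushed out on CDO.
    """
    cdo_bits = []
    for b in cdi_bits:
        cdo_bits.append((init >> 31) & 1)               # old content leaves via CDO
        init = ((init << 1) | (b & 1)) & 0xFFFFFFFF     # new bit enters via CDI
    return init, cdo_bits

old_init, new_init = 0x0000FFFF, 0x12345678

# 32 clock cycles with the new truth table shifted in serially replace the whole INIT
loaded, _ = reconfigure(old_init, [(new_init >> (31 - i)) & 1 for i in range(32)])
assert loaded == new_init

# With CDO fed back to CDI, every clock cycle rotates the INIT content by one
# position, so the original value returns after at most 32 cycles with no extra logic.
state = old_init
for _ in range(32):
    state, _ = reconfigure(state, [(state >> 31) & 1])
assert state == old_init
```

The second property (restoration through the CDO-to-CDI feedback) is the one exploited later for the zero-overhead Trojan payloads.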

There are two different kinds of slices in a Xilinx FPGA, namely SLICE_M and SLICE_L. Whereas a simple LUT can be synthesized in either kind of slice, a CFGLUT5 can be implemented only in a SLICE_M. A SLICE_M contains LUTs which can be configured as memory elements, such as shift registers or distributed memory, in addition to implementing combinational logic functions. The LUTs of a SLICE_L can only implement combinational logic.

When instantiated, a CFGLUT5 is essentially mapped into a SLICE_M LUT configured as a shift register (SRL32), as shown in Fig. 3.
Fig. 3

CFGLUT5 mapped into an LUT as SRL32, as shown in the Xilinx FPGA Editor

2.1 Comparison with Dynamic Configuration

Another way to reconfigure an FPGA at run-time is to use partial or dynamic reconfiguration. This form of reconfiguration can also be exploited to implement secure architectures [13]. In partial reconfiguration, a portion of the implemented design is changed without disrupting the operation of the rest of the FPGA. This operation deploys an Internal Configuration Access Port (ICAP), and the design needing reconfiguration must be mapped into a special reconfigurable region [14]. The reconfiguration latency is on the order of milliseconds. Partial reconfiguration is helpful when significant modification of the design is required. However, for small modifications, using RLUTs is advantageous, as the latency is very small (at most 32 clock cycles) compared to partial reconfiguration. An RLUT is configured internally, and no external access to either JTAG or Ethernet ports is required for reconfiguring it. Additionally, traditional DPR (dynamic partial reconfiguration) requires conveying an extra bit file, which is not needed in the case of RLUTs, making RLUTs ideal for small reconfigurations of the design, in particular for crafting Trojans.

2.2 RLUT and Security

Having described the functioning of RLUTs in detail, we can now recognize some properties which could be helpful or critical for security. A typical problem of cryptographic implementations is their vulnerability to statistical attacks like correlation power analysis (CPA) [15]. For instance, CPA tries to extract secret information from static cryptographic implementations by correlating side-channel leakages with estimated leakage models. A desirable feature to protect such implementations is the reconfiguration of a few internal features. An RLUT is a great solution in this case, as it has the capability to provide reconfigurability at minimal overhead and with no external access. It is important to reconfigure internally to avoid the risk of any eavesdropping. On the other hand, RLUTs can also become a security pitfall. For example, a designer can simply replace an LUT with an RLUT in a design, keeping the same INIT value. Until reconfiguration, the RLUT computes normally. However, upon reconfiguration, the RLUT can be turned into a potential Trojan. The routing of the design remains static; only the functionality of the LUT is modified upon reconfiguration. In the following sections, we show some relevant applications of constructive or destructive nature. Of course, this is only a non-exhaustive list of RLUT applications in security.

3 Destructive Applications of RLUT

In the earlier sections, we presented the basic concepts of RLUTs with major emphasis on the CFGLUT5 of Xilinx FPGAs. Though CFGLUT5 provides the user a unique opportunity to reconfigure and modify the design at run-time, it also gives an adversary an excellent option to design efficient and stealthy hardware Trojans. In this section, we focus on designing tiny but effective hardware Trojans exploiting the reconfigurability of RLUTs.

A hardware Trojan is a malevolent modification of a design, intended either to disrupt the operation of an algorithm or to leak secret information from it. The design of a hardware Trojan involves the efficient design of the Trojan circuitry (known as the payload) and the design of the trigger circuitry which activates the Trojan. A stealthy hardware Trojan should have negligible overhead, ideally zero, compared to the original golden circuit. Moreover, the probability of the Trojan getting triggered during functional testing should be very low, preventing accidental discovery of the Trojan. The threat of hardware Trojans is very realistic due to the fabless model followed by modern semiconductor companies. In this model, the design is sent to remote fabrication facilities for chip fabrication. It is very easy for an adversary to make some small modification to the design without altering its nominal functionality. The affected chip will give the desired output under normal conditions but will leak sensitive information upon being triggered. More detailed analyses of hardware Trojans can be found in [16, 17, 18]. An FPGA-based design is less threatened by hardware Trojans, as the entire design cycle can be completed inside the designer’s laboratory. Nevertheless, researchers have shown that it is possible to design efficient hardware Trojans on FPGAs. In [19], the authors designed a Trojan on a Basys FPGA board which gets triggered depending on the “content and timing” of the signals. On the other hand, the authors of [20] designed a hardware Trojan which can be deployed on the FPGA via dynamic partial reconfiguration to inject faults into an AES circuit for differential fault analysis.

In this section, we focus on the effective design of hardware Trojan payloads using RLUTs. But before going into the payload design methodologies, we first describe the other two important aspects of the proposed hardware Trojans: the adversary model and the trigger methodologies.

3.1 Adversary Model

It is a common trend in the semiconductor industry to acquire proven IPs to reduce time to market and stay competitive. We consider an adversary model where a user buys specific proven IPs from a third-party IP vendor. By proven IPs, we mean IPs with well-established performance and area figures. Let us consider that the IP under consideration is a cryptographic algorithm and the target device is an FPGA. An untrusted vendor can easily insert a Trojan in the IP, which can act as a backdoor to access sensitive information from other components of the user circuit. For instance, an IP vendor can provide the user with an obfuscated or even encrypted netlist (encrypted EDIF, Electronic Design Interchange Format). Such techniques are popular and often used to protect the rights of the IP vendor. A Trojan in an IP is very serious for two major reasons. First, the Trojan will affect all samples of the final product, and second, it is almost impossible to obtain a golden model. Moreover, research on Trojan detection under the given attack model is quite limited. The user does not have a golden circuit to compare against, making hardware Trojan detection using side-channel methodologies highly unlikely [21]. Additionally, this adversary model also makes the Trojan design challenging. Generally, before buying an IP, a user will analyze IPs from different vendors for performance comparison. This competitive scenario does not leave a big margin (gate count) for Trojans.

Using RLUTs, we can design extremely lightweight hardware Trojan payloads, as we can reconfigure the same LUTs used in the crypto-algorithm implementation from the correct configuration to a malicious configuration. This reduces the overhead of the hardware Trojan and makes it less susceptible to detection techniques based on visual inspection [22]. We can also restore the original value of the RLUT to remove any trace of the Trojan, of course at a minor overhead. An IP designer can easily replace a normal LUT with an RLUT. In this case, the designer has only one restriction: the replaced LUT must be implemented in a SLICE_M. It is not difficult to find such an LUT in a medium- to large-scale FPGA, which is often the target of cryptographic modules.

Additionally, if access to the client bitstream is available, the adversary can reverse engineer the bitstream [23] and replace a normal LUT with an RLUT.

Instantiation of a CFGLUT5 does not show up as any special element in the design summary report; it appears only as an LUT modeled as an SRL32. Shift registers have many uses in a circuit. For example, a counter can be designed very efficiently on a shift register using one-hot encoding. Moreover, lightweight ciphers make extensive use of shift registers for serialized architectures. High-resource designs may use SLICE_M for normal LUT operations as well.

The only requirement is efficient triggering and a reconfiguration logic which generates the malicious value upon receiving the trigger signal. However, in this paper, we will show that, once triggered, the malicious value for the hardware Trojan can be generated without any overhead, thus giving us extremely lightweight and stealthy hardware Trojan designs. The basic methodology is the same for all the Trojans and can be summarized as follows:
  • Choose a sensitive sub-module of the crypto-algorithm. For example, one can choose a 4 × 4 Sbox (which can be implemented using 2 LUTs) as the sensitive sub-module.

  • Replace the LUTs of the chosen sub-module with CFGLUT5s without altering the functionality. A 4 × 4 Sbox can also be implemented using two CFGLUT5s (a packing sketch is given after this list).

  • Modify the INIT value upon trigger. As shown in Fig. 1, reconfiguration of a CFGLUT5 takes place upon receiving the CE signal. By connecting the trigger output to the CE port, an adversary can tweak the INIT value of the CFGLUT5 and change it to a malicious value. For example, the 4 × 4 Sboxes implemented using CFGLUT5s can be modified in such a way that the non-linear properties of the Sboxes are lost and the crypto-system becomes vulnerable to standard cryptanalysis. The malicious INIT value can easily be generated by some nominal extra logic. However, in the subsequent sections, we will show that it is possible to generate the malicious INIT value without any extra logic.

  • Upon exploitation, restore the original INIT value.
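The following Python sketch illustrates the second step above: a 4 × 4 Sbox (the PRESENT Sbox is used purely as an example) is packed into the two 32-bit INIT values of a CFGLUT5 pair, one device producing output bits {1,0} and the other bits {3,2}. The packing convention (lower output bit in INIT[15:0], upper output bit in INIT[31:16]) is an assumption of this model.

```python
SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]  # example 4x4 Sbox (PRESENT)

def pack_init(bit_lo, bit_hi):
    """Pack the truth tables of two Sbox output bits into one 32-bit INIT value
    (assumed convention: bit_lo table in INIT[15:0], bit_hi table in INIT[31:16])."""
    init = 0
    for x in range(16):
        init |= ((SBOX[x] >> bit_lo) & 1) << x
        init |= ((SBOX[x] >> bit_hi) & 1) << (16 + x)
    return init

init_low  = pack_init(0, 1)   # CFGLUT5 producing Sbox output bits 0 and 1
init_high = pack_init(2, 3)   # CFGLUT5 producing Sbox output bits 2 and 3

def lookup(init, x):
    """Read the two outputs of one CFGLUT5 for input x under the same convention."""
    return (init >> x) & 1, (init >> (16 + x)) & 1

for x in range(16):
    b0, b1 = lookup(init_low, x)
    b2, b3 = lookup(init_high, x)
    assert (b3 << 3) | (b2 << 2) | (b1 << 1) | b0 == SBOX[x]
```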

3.2 Design of Trigger for Hardware Trojan

A trigger for a hardware Trojan is designed in such a way that the Trojan gets activated only in very rare cases. The trigger stimulus can be generated either by the output of a sensor under physical stress or by some well-controlled internal logic. The complexity of the trigger circuit also depends on the needed precision of the trigger in time and space. Several innovative and efficient methods were introduced as part of the Embedded Systems Challenge (2008), where participants were asked to insert Trojans into FPGA designs. For instance, one of the propositions was a content-and-timing trigger [19], which activates on a correct combination of input and time. Such triggers are considered practically impossible to simulate. Other triggers activate on a specific input pattern. A more detailed analysis, with examples of different triggering methodologies and their pros and cons, can be found in [24].

Moreover, modern devices are loaded with physical sensors to ensure correct operating conditions. It is not difficult to find voltage or temperature sensors in smart cards or micro-controllers. Similarly, FPGAs also come with monitors to protect the system from undesired environmental conditions; Virtex-5 FPGAs, for instance, contain a system monitor. Though a system monitor is not part of the cipher, it is often included in the SoC for tamper/fault/temperature-variation detection. These sensors are programmed to raise an alarm in the event of unexpected physical conditions like overheating or high/low voltage. Now, an adversary can use this system monitor to design an efficient and stealthy hardware Trojan trigger. The trick is to choose a trigger condition which is below the threshold value but much higher than nominal conditions. For instance, on a chip with a nominal temperature of 20-30 °C and a safety threshold of 80 °C, the Trojan can be triggered in a small window chosen from the range 40-79 °C. Similarly, user-deployed sensors, like the one proposed in [25], can also be used to trigger a Trojan.

It must be noted that the proposed sensor-based trigger circuitry is only one of many available efficient triggering methodologies. The adversary can choose any of them, as long as the triggering mechanism is stealthy and hard to detect. State-of-the-art IoT devices are often paired with physical sensors. Conventionally, an AES core or other third-party cores are not connected to such sensors. However, in recent times, there have been works which recommend the usage of physical sensors in cryptographic IPs to detect the presence of electromagnetic probes and prevent side-channel attacks [25]. Hence, we believe the presented case study, which uses FPGA sensors, offers an interesting perspective in this context. If such sensors are not available, the malicious third-party IP designer can choose any state-of-the-art triggering methodology.

3.2.1 Proposed Trigger Circuit

In the following, we focus mainly on the payload design of the Trojan using RLUTs and directly use the trigger methodology proposed in [26]. Any other triggering methodology could be used instead. We exploit the temperature sensor measurement to generate the trigger signal. The device used for this experiment is a Xilinx Virtex-5 FPGA mounted on a SASEBO-GII board. As described in the documentation [27], the temperature measurement is read directly from the 10-bit output of the system monitor. This output takes a value between 0 and 1023. The system monitor can sense temperatures in the range [−273 °C, +230 °C]; hence, the LSB of the 10-bit output corresponds to roughly \(\frac {1}{2}~^{\circ }\)C. More specifically, the ADC (analog-to-digital conversion) transfer function for recording the temperature of the FPGA is written as follows [27]:
$$ \text{Temperature}(^{\circ}\mathrm{C})=\frac{\text{sysmon output} \times 503.975}{1024}-273.15 $$
(1)

At the normal operating temperature (25 °C), the system monitor output is around 605 = 0b1001011101. Thanks to this observation, we decided to directly use bit 7 of the system monitor output as the hardware Trojan trigger signal. The hardware Trojan is activated when bit 7 of the monitor output is high, i.e., when the monitor output reaches 640 = 0b1010000000. This value corresponds to approximately 42 °C. Therefore, the trigger signal is active when the FPGA temperature is higher than 42 °C. The triggering temperature could be another value; 42 °C is used only for demonstration purposes.
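A short Python check of Eq. (1) confirms the operating points discussed above:

```python
def sysmon_to_celsius(code):
    """ADC transfer function of Eq. (1): 10-bit system monitor code to temperature (deg C)."""
    return code * 503.975 / 1024 - 273.15

print(round(sysmon_to_celsius(605), 1))  # ~24.6 C: normal operating point, bit 7 low
print(round(sysmon_to_celsius(640), 1))  # ~41.8 C: code 0b1010000000, bit 7 goes high
```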

In our case study, a simple hair dryer costing $5 is enough to heat the FPGA and reach this temperature. We assume that a system monitor is already instantiated in the design to monitor device working conditions, and that the alarm is raised at a temperature higher than 42 °C. In such a scenario, the hardware Trojan trigger does not consume much extra logic and results in a very low-cost hardware Trojan.

Whenever we need to trigger the Trojan, we bring the heater close to the FPGA. The FPGA heats up slowly to a temperature of 42 °C and raises the output bit to “1”. At this point, we switch off the heater. This output bit now stays “1” until the FPGA cools down below 42 °C; therefore, we cannot precisely control the duration of the trigger in terms of cycle count. We further process this output bit of the system monitor to generate a trigger of precise duration. This can be done with some extra logic. In other words, we need a small circuit which can generate a precise trigger signal when the output bit of the system monitor goes to “1”. For the Trojans in Table 1, we need either a trigger of 1 clock cycle or one of 12 clock cycles. Both of these triggers can be generated by deploying one LUT and one flip-flop to process the output bit of the system monitor. Thus, we can build a very small trigger circuit to trigger a zero-overhead hardware Trojan.
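A simplified behavioral sketch of such a pulse-shaping circuit is given below; it is a hypothetical Python model (edge detector plus down-counter), not the exact one-LUT/one-flip-flop implementation used on the FPGA.

```python
def trigger_pulse(sensor_bits, width):
    """Emit a trigger of exactly `width` cycles on each rising edge of the sensor bit."""
    out, prev, count = [], 0, 0
    for b in sensor_bits:
        if b and not prev:          # rising edge: FPGA just crossed the temperature threshold
            count = width
        out.append(1 if count > 0 else 0)
        count = max(count - 1, 0)
        prev = b
    return out

# the sensor bit stays high for as long as the FPGA is hot; the trigger lasts 12 cycles,
# as required by Trojans 1b and 2 (see Table 1)
sensor = [0] * 3 + [1] * 20 + [0] * 5
print(trigger_pulse(sensor, 12))
```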

3.3 Trojan Payload Design

Before designing the Trojan payload for a given hardware, we first demonstrate the potential of RLUTs for inserting malicious functionality. Let us consider a buffer, which is a very basic gate. Buffers are often inserted in a circuit by CAD tools to achieve the desired timing requirements. For FPGA designers, another equivalent of a buffer is a route-only LUT. Such buffers can be inserted on any sensitive wire without raising an alarm. In fact, sometimes the buffers might already exist.

These buffers are implemented in an LUT6 with INIT = 0xAAAAAAAAAAAAAAAA and can easily be replaced by a CFGLUT5. A simple Trojan consists in setting the INIT value of the CFGLUT5 to 0xAAAAAAAA and feeding the CDO output back to the CDI input (see Fig. 1). The CE input is connected to the trigger of the Trojan. Now, when the Trojan is triggered once (one clock cycle), the INIT value changes to 0x55555555, which changes the functionality of the gate to an inverter. Another trigger brings the INIT value back to 0xAAAAAAAA, i.e., a buffer. The operations are illustrated in Fig. 4, where the red block shows the Trojan inverter and the black blocks show a normal buffer. Thus, by precisely controlling the trigger, an adversary can switch between a buffer and an inverter. Such a Trojan can be used in many scenarios, like injecting single-bit faults for differential fault attacks [28], controlling data multiplexers, or misreading status flags.
Fig. 4

Operations of CFGLUT5 to switch from a buffer to inverter and back

In the above example, we see how a buffer can be converted to an inverter by reconfiguring the CFGLUT5 upon receiving the trigger signal. One important observation is that we do not need any extra reconfiguration logic to modify the INIT value of the CFGLUT5. The modification of the INIT value is achieved by connecting the reconfiguration data input port CDI to the reconfiguration data output port CDO. In other words, we can define the malicious INIT value in the following way:
$$INIT_{malicious}=CS_{i}(INIT_{normal}) $$
where CS_i denotes a cyclic right shift by i bits. The RLUT approach is harder to detect because the malicious payload does not exist in the design: it is configured when needed and immediately removed upon exploitation. With a normal LUT, the malicious design is hard-wired (requiring extra logic) and risks detection, whereas an RLUT modifies existing resources and enables us to design hardware Trojans without any extra reconfiguration logic. We will use similar methodologies for all the hardware Trojans proposed in this paper.
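The buffer-to-inverter example can be checked with a few lines of Python. The rotation below is the one that reproduces the INIT values quoted in the text; whether it is called a left or a right shift depends only on the bit-ordering convention chosen for the truth table.

```python
def rotate(init, cycles=1):
    """INIT evolution of a CFGLUT5 whose CDO is fed back to CDI, clocked with CE high."""
    for _ in range(cycles):
        init = ((init << 1) | (init >> 31)) & 0xFFFFFFFF
    return init

buffer_init = 0xAAAAAAAA
inverter_init = rotate(buffer_init)           # one trigger cycle: the buffer becomes an inverter
assert inverter_init == 0x55555555
assert rotate(inverter_init) == buffer_init   # one more cycle restores the buffer
```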
Next, we consider a basic AES IP as the Trojan target. The architecture of the AES design is shown in Fig. 5. The AES takes 128 bits of plaintext and key as input and produces a 128-bit ciphertext in 11 clock cycles. The control unit of the AES encryption engine is governed by a 4-bit mod-12 counter and generates three different control signals, which are as follows:
  1. load: It is used to switch between the plaintext and the MixColumns output. At the start of the encryption, this signal is set high to load the plaintext into the AES encryption engine.

  2. S.R/M.C: It is used to switch between the ShiftRows and MixColumns outputs in the last round of AES.

  3. done: It is used to indicate the end of encryption.
Fig. 5

AES architecture without Trojans

These signals are set high for different values of the counter. In our Trojan design, we mainly target the control unit of the AES architecture to disrupt the flow of the encryption scheme so that we can retrieve the AES encryption key.

For this, we have developed four different Trojans and deployed them on the AES implementation. The objective of the developed Trojans is to retrieve the AES key with only one execution of the hardware Trojan, i.e., a single bad encryption. Indeed, it has been shown that only one faulty encryption, if it is accurate in time, suffices to extract a full 128-bit key [29, 30]. The triggering conditions can be further relaxed if several bad encryptions are acceptable. Each Trojan has a trigger with a different pulse width, i.e., number of clock cycles. The RLUT content varies for different payloads, hence the variation in the trigger.

The detailed descriptions of the developed Trojans are as follows.

3.3.1 Trojan 1

As stated earlier, the control unit of the AES is based on a counter which also generates a done flag to indicate completion of the encryption cycle; this flag is set high only if the counter value reaches 11. The signal done, as shown in Fig. 5, is driven by an LUT6_2 which takes the 4-bit counter value as input, and under normal operation it contains the INIT value 0x00000800 (meaning that only bit 11 is set to one, i.e., the condition required for the done signal). To insert a Trojan, we replace this LUT with a CFGLUT5 with INIT = 0x80000800. It must be noted that, although the INIT value of the LUT6_2 and the INIT value of the CFGLUT5 are not the same, both essentially produce the same output for the 4-bit counter value. This is because the truth table of a function of four variables consists of only 16 bits; hence, any change in the upper 16 bits of the INIT value does not change the functionality of the LUT.

The CDO output is fed back into the CDI input, as in the example above. A trigger of 2 clock cycles at the CE input activates the Trojan (INIT = 0x00002002) and produces the round 0 output (at round 0, the counter value is 1) as the ciphertext. The round 0 output is actually the plaintext XOR key, and knowing the plaintext, one can easily extract the full key with one wrong encryption. Again, the malicious INIT value is generated by a cyclic shift of the original INIT value of the CFGLUT5; hence, we do not need any extra logic to generate the new INIT value.

After extracting the key, a trigger of 10 clock cycles restores the normal operation of the AES (INIT = 0x00800800). This INIT value need not be the same value with which we started the computation (INIT = 0x80000800), as long as the LUT generates the correct output. The transitions of INIT to activate the Trojan and restore normal operation are shown in Fig. 7a, and the modifications to the AES architecture are shown in Fig. 6.
Fig. 6

AES architecture with Trojan 1

In the above Trojan description, we need 2 clock cycles to modify the CFGLUT5 into the malicious Trojan configuration and 10 clock cycles to restore it to a correct value, so in total we require 12 clock cycles.
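The INIT sequence of Trojan 1 can be verified with the same rotation model as before (again, the rotation direction is the one matching the values quoted above):

```python
def rotate(init, cycles):
    for _ in range(cycles):
        init = ((init << 1) | (init >> 31)) & 0xFFFFFFFF
    return init

normal = 0x80000800                # done asserted only when the counter reaches 11
malicious = rotate(normal, 2)      # 2 trigger cycles activate the Trojan
assert malicious == 0x00002002     # done now also asserted at counter value 1 (round 0)
restored = rotate(malicious, 10)   # 10 further cycles restore a functionally correct value
assert restored == 0x00800800      # lower 16 bits again encode "done at 11" only
```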

Keeping this in mind, we have implemented three different versions of the same Trojan, depending on the precision of the trigger.

  1. Trojan 1a needs a 1-cycle trigger synchronized with the start of the encryption. This trigger is used to enable an FSM which generates 12 clock cycles for the CE of the CFGLUT5, in order to activate the Trojan and restore it back after exploitation. Because of this, the overhead of the developed Trojan is 6 LUTs and 4 flip-flops.

  2. Trojan 1b is a zero-overhead Trojan. It assumes an adversary slightly stronger than in Trojan 1a, who can generate a trigger signal active for precisely 12 cycles and synchronized with the start of encryption. The FSM overhead is absent in Trojan 1b, as the trigger itself acts as the CE signal of the RLUT.

  3. Trojan 1c relaxes the restriction on the adversary seen in the previous case. It assumes that there is a delay of n ≫ 10 clock cycles between two consecutive encryptions. The choice of n ≫ 10 is due to the fact that we need 2 clock cycles to reconfigure the RLUT into the malicious Trojan payload and 10 clock cycles to restore it back to a good value; hence, the gap between two consecutive AES encryptions should be greater than 10 cycles. The adversary provides a trigger of two clock cycles (not necessarily consecutive) before the start of the current encryption. After the faulty encryption is complete, the adversary generates 10 trigger cycles (again not necessarily consecutive) to restore the cipher operations. The overhead of this Trojan is 2 LUTs, due to the routing of the RLUT.

3.3.2 Trojan 2

This Trojan targets a different signal in the control unit of the AES design. As shown in Fig. 5, the design contains a multiplexer which switches between the MixColumns output and the input plaintext depending on the round/count value. The output of the multiplexer is fed to the input of the AddRoundKey operation. Under normal operation, the multiplexer passes the input plaintext in round 0 (the load signal of the multiplexer is set to 1) and the MixColumns output (ShiftRows output in the last round) in the other rounds (the load signal is set to 0). To design the Trojan, we replace the LUT6_2 (with INIT = 0x00000002) which generates the load signal of the multiplexer with a CFGLUT5 containing INIT = 0x00400002. As observed for Trojan 1, although the INIT values of the LUT6_2 and the CFGLUT5 differ, both essentially produce the same output.

In this case too, the CDO port of the CFGLUT5 is connected to its CDI port, enabling a cyclic shift of the INIT value. Upon a trigger of 10 clock cycles, the INIT value is modified to INIT = 0x80000400 (meaning that load is set to one during the last round). This changes the multiplexer operation, causing it to select the plaintext in the last round computation. From the resulting ciphertext of this faulted encryption, we can easily obtain the last round key, given the plaintext. Further, a trigger of 2 clock cycles restores normal operation (INIT = 0x00001002), as shown in Fig. 7b. Again, the value at bit position 12 is not a problem, as the load signal is controlled by a mod-12 counter and that value is never reached. The counter value 0 indicates the idle state, 1-10 the encryption rounds, and 11 the end of encryption. This Trojan also has zero overhead, as the reconfiguration of the CFGLUT5 is obtained by cyclic right shifting of its INIT value. But the trigger signal needs to be precise and must be available for 12 consecutive clock cycles. Hence, the triggering cost is the same as for Trojan 1b.
Fig. 7

Operations of CFGLUT5 to activate the Trojan and restore normal operation for (a) Trojan 1 and (b) Trojan 2. Bit positions not shown contain “0”

Table 1 summarizes the nature, trigger condition, and cost of the four Trojans. The secret key can be directly recovered by performing an XOR with the known plaintext.

The above-described Trojans can also be designed using normal LUTs. The zero-overhead Trojans described above can be designed with a 2-LUT overhead (one LUT for the Trojan operation and the other for selecting between Trojan and normal operation). But such Trojan designs would be easier to detect, as the Trojan LUT is always present in the design, unlike with CFGLUT5, where the malicious LUT configuration is created by run-time reconfiguration.

Detectability

As we have seen in Table 1, Trojan 1b and Trojan 2 have zero overhead compared to the golden circuit of the AES IP. This is because no extra circuitry is required for the design of these Trojans. Rather, we modify the content of RLUTs (the INIT value) to convert a legitimate LUT into a malicious one. Additionally, as mentioned in Section 3.1, the usage of RLUTs does not show up as any special element in the design summary report generated by FPGA design tools; an RLUT in the design is reported as an LUT used as a shift register. LUT-based shift registers are extensively used for the efficient design of counters, FSMs, and lightweight crypto-algorithm implementations [31]. Moreover, in the subsequent section, we will show many constructive applications based on RLUTs. The dual nature of RLUTs, along with the wide range of applications based on LUT-based shift registers, makes the detection of RLUT-based Trojans difficult.
Table 1

Area overhead of the Trojans on Virtex-5 FPGA

Trojan            Trigger   LUT    FF    Payload                 Freq. (MHz)
AES (No Trojan)   –         1594   260   Not applicable          212.85
Trojan 1a         1 s       1600   264   6 LUTs, 4 flip-flops    212.85
Trojan 1b         12 s      1594   260   0                       212.85
Trojan 1c         12        1596   260   2 LUTs                  212.85
Trojan 2          12 s      1594   260   0                       212.85

Trigger is given in clock cycles; the s subscript indicates that the trigger must be consecutive and synchronized with the start of encryption

In this section, we have presented different scenarios where CFGLUT5 can be employed to build hardware Trojans that leak secret information from crypto-IPs like AES. We have specifically targeted the multiplexers and FSM of the circuit. It is also possible to design more sophisticated Trojans using CFGLUT5, where the developed Trojan works in conjunction with side-channel attacks or fault injection to increase the vulnerability of the underlying crypto-system.

4 Constructive Applications for RLUT

In the previous section, we discussed some applications of RLUTs for inserting hardware Trojans into third-party IPs. However, RLUTs do have a brighter side to their portfolio. The easy and internal reconfigurability of RLUTs can surely be exploited by designers to solve certain design issues. In the following, we detail distinct cases, with several applications, where RLUTs can be put to good use.

4.1 Customizable Sboxes

A common requirement in several industrial applications is dynamic or customizable substitution boxes (Sboxes) in a cipher. One such scenario, often encountered by IP designers, is the design of secret ciphers for industrial applications. A majority of secret ciphers use a standard algorithm like AES with modified specifications, such as custom Sboxes or linear operations. Sometimes the client is not comfortable disclosing these custom specifications to the IP designer. Common solutions either have a time-space overhead or resort to dynamic reconfiguration to allow the client to program the secret parameters at their own facilities. An RLUT can come in handy in this case.

There are several algorithms in which the Sboxes can be secret. The former Soviet encryption algorithm GOST 28147-89, standardized by the Russian standardization agency in 1989, is a prominent example [32]. The A3/A8 GSM algorithm for European mobile telecommunications is another example. In the field of digital rights management, the Cryptomeria cipher (C2) has a secret set of Sboxes which are generated and distributed to licensees only.

There are also encryption schemes like DRECON [33] which offer DPA resistance by construction, exploiting tweakable ciphers. In this scheme, users exchange a tweak during the key exchange. The tweak is used to choose the set of Sboxes from a bigger pool of precomputed Sboxes. In the proposed implementation [33], the entire pool of Sboxes must be stored on-chip. Using RLUTs, the Sboxes can easily be computed as a function of the tweak and stored on the fly. Similarly, the low-cost masking scheme RSM [4] can also benefit from RLUTs to achieve the desired rotation, albeit at the cost of latency. Thus, there exist several applications where customizable Sboxes are needed.

4.1.1 Architecture of Sbox Generator

As a proof of concept, we implement the Sbox generation scheme of [33]. The original implementation generates a pool of 32 4 × 4 Sboxes and stores it in BRAMs, while only 16 are used for a given encryption. It uses a set of Sboxes which are affine transformations of each other. For a given cryptographically strong Sbox S(⋅), one can generate \(2^n\) strong Sboxes as follows: \(F_i(x) = \alpha S(x) \oplus i\) for all \(i = 0,\dots,2^n-1\), where α is an invertible matrix of dimension n × n. α can also be considered a function of the tweak value t, i.e., α = f(t). Since an affine transformation does not change most of the cryptographic properties of an Sbox, all the generated Sboxes are of equal cryptographic strength [33].
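A minimal Python sketch of this pool generation follows; the reference Sbox (PRESENT's) and the matrix α used below are arbitrary illustrative choices, not the parameters of [33].

```python
SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]  # reference 4x4 Sbox

def mat_vec_gf2(alpha, x):
    """Multiply a 4x4 binary matrix (rows given as 4-bit masks) by a 4-bit vector over GF(2)."""
    y = 0
    for r, row in enumerate(alpha):
        y |= (bin(row & x).count("1") & 1) << r
    return y

def affine_sbox(alpha, i):
    """F_i(x) = alpha * S(x) XOR i : one member of the tweak-selected Sbox pool."""
    return [mat_vec_gf2(alpha, SBOX[x]) ^ i for x in range(16)]

alpha = [0b0001, 0b0011, 0b0111, 0b1111]   # lower-triangular with 1s on the diagonal -> invertible
pool = [affine_sbox(alpha, i) for i in range(16)]

# every generated Sbox is still a bijection, so the whole pool is usable as cipher Sboxes
assert all(sorted(s) == list(range(16)) for s in pool)
```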

The Sbox computation scheme of [33] can be implemented very well using RLUTs, as follows. The main objective of this Sbox generator is to compute a new affine Sbox from a given reference Sbox and store it in the same location. The architecture is shown in Fig. 8. As stated earlier, each CFGLUT5 can be modeled as a 4-input, 2-output function generator, so we can implement a 4 × 4 Sbox using two CFGLUT5s, as shown in Fig. 8. We consider that the reference 4 × 4 Sbox is implemented using two CFGLUT5s. We compute the new Sbox and program it into the same two CFGLUT5s. The reconfiguration of the Sbox is carried out through the following steps:
  1. Read the value of the Sbox for input 15.

  2. Compute the new value (4 bits {3,2,1,0}) of the Sbox using the affine transformation for Sbox input 15.

  3. The CFGLUT5s are then updated with the computed value, 2 bits for each CFGLUT5 ({3,2}, {1,0}). However, only one bit can be shifted into a CFGLUT5 per clock cycle. Hence, we shift in two bits, 1 bit into each CFGLUT5 ({0,2}), and store the other 2 bits ({1,3}) in two 16-bit registers.

  4. After the 2 bits ({0,2}) of the new Sbox value are shifted into position 0 of each CFGLUT5, the old value at position 15 is flushed out. The old value at position 14 moves up to position 15. Thus, the read address is hard-coded to 4'd15.

  5. Repeat steps 1-4 until the whole old Sbox has been read out, i.e., for 16 clock cycles.

  6. After 16 clock cycles, we start to shift in the data that we stored in the shift registers (bits {1,3}) for the 16 Sbox entries, which takes another 16 clock cycles. This completes the Sbox reconfiguration.
Fig. 8

Architecture of Sbox computation using affine transformation and storing in RLUT

The architecture requires 56 LUTs and 38 flip-flops, with a maximum operating frequency of 271 MHz. To reconfigure one Sbox, we need 32 clock cycles. Depending on the application and the desired security, the Sbox recomputation can be done after several encryptions, after every encryption, or after every round; it is purely a security-performance trade-off.

4.2 Sbox Scrambling for DPA Resistance

RLUTs also have the potential to provide side-channel resistance. The reconfiguration provided by RLUTs can be used to confuse attackers. A beneficial target would be the much-studied masking countermeasures [3], which suffer from high overhead due to the requirement of regular mask refresh. One of the masking countermeasures fine-tuned for FPGA implementation is Block Memory content Scrambling (BMS [3]). This scheme claims first-order security and, to our knowledge, no practical attack has been published against it. However, Sbox scrambling using BRAM is inefficient for lightweight ciphers with 4 × 4 Sboxes due to underutilization of resources. Hence, we propose a novel architecture using RLUTs to address this. This mechanism can easily be translated to AES as well; however, it remains most interesting for lightweight algorithms like PRESENT.

The authors in [34] have also implemented an Sbox scrambling scheme using LUT-based distributed memory. There, they show that, from a security point of view, the BRAM-based Sbox scrambling scheme outperforms the LUT-based one. Nevertheless, an LUT-based Sbox scrambling scheme could be an attractive choice for lightweight ciphers due to its lower overhead. Additionally, RLUT-based side-channel resistant solutions can also be combined with other side-channel countermeasures, as shown in [10].

The side-channel countermeasure using RLUTs shown in [10] is different from the proposed architecture. The design of [10] implements a standard Boolean masking scheme, where each round uses a different mask. Here, we propose a lightweight architecture for the Sbox scrambling scheme presented in [3]. These two countermeasures have similar objectives but quite different designs.

The BMS scheme works as follows: let Y(X) = P(SL(X)) be a round of the block cipher, where X is the data, P(⋅) is the linear layer, and SL(⋅) is the non-linear layer of the block cipher. For example, in the PRESENT cipher [35], the non-linear layer is composed of 16 4 × 4 Sboxes and the linear layer is a bit permutation. According to the BMS scheme, the masked round can be written as \(Y_M(X) = P(SL_M(X_M))\), where \(X_M = X \oplus M\) is the masked data and \(SL_M(\cdot)\) is the Sbox layer composed of 16 scrambled Sboxes. Each Sbox \(S_m(\cdot)\) in \(SL_M\) is scrambled with one nibble m of the 64-bit mask M. The scrambled Sbox can be written as \(S_m(x_m) = S(x_m \oplus m) \oplus P^{-1}(m)\), where \(x_m = x \oplus m\) and x is one nibble of the round input X. A dual-port BRAM is divided into an active and an inactive segment: the active segment contains \(SL_{M_0}(\cdot)\), i.e., the Sbox layer scrambled with mask \(M_0\), which is used for encryptions. In parallel, another Sbox layer \(SL_{M_1}(\cdot)\), scrambled with mask \(M_1\), is computed in an encryption-independent process and stored in the inactive segment. Every few encryptions, the active and inactive contents are swapped, and a new Sbox layer scrambled with a fresh mask is computed and stored in the current inactive segment. This functioning is illustrated in Fig. 9.
Fig. 9

Architecture of Modified PRESENT Round. SLM0 is the (precomputed) active SLayer while SLM1 is being computed as in Fig. 10
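The scrambling relation can be illustrated with a short Python model of one PRESENT round (Sbox layer followed by the bit permutation). It checks that the scrambled layer applied to masked data, followed by the permutation, equals the unmasked round output XORed with the mask; this is a sketch of the BMS idea, not the FPGA implementation.

```python
import random

SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]  # PRESENT Sbox

def p_layer(state):          # PRESENT bit permutation P: bit i moves to 16*i mod 63 (bit 63 stays)
    out = 0
    for i in range(64):
        j = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << j
    return out

def p_layer_inv(state):      # inverse permutation P^-1 (16^-1 = 4 mod 63)
    out = 0
    for j in range(64):
        i = 63 if j == 63 else (4 * j) % 63
        out |= ((state >> j) & 1) << i
    return out

def sub_layer(state, sboxes):
    out = 0
    for n in range(16):
        out |= sboxes[n][(state >> (4 * n)) & 0xF] << (4 * n)
    return out

def scrambled_layer(mask):
    """Per-nibble scrambled Sboxes: S_m(x_m) = S(x_m ^ m) ^ (nibble of P^-1(mask))."""
    pinv = p_layer_inv(mask)
    layer = []
    for n in range(16):
        m = (mask >> (4 * n)) & 0xF
        c = (pinv >> (4 * n)) & 0xF
        layer.append([SBOX[x ^ m] ^ c for x in range(16)])
    return layer

random.seed(0)
plain_layer = [SBOX] * 16
for _ in range(100):
    X, M = random.getrandbits(64), random.getrandbits(64)
    masked_round = p_layer(sub_layer(X ^ M, scrambled_layer(M)))
    assert masked_round == p_layer(sub_layer(X, plain_layer)) ^ M   # Y_M(X_M) = Y(X) ^ M
```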

BMS is an efficient countermeasure, shown to have a reasonable overhead of 44% in LUTs, 2× BRAMs, and roughly 3× extra flip-flops on FPGA. Another advantage of BMS is that it is generic, i.e., it can be applied to any cryptographic algorithm. BMS can be viewed as a leakage-resilient implementation, where the cipher is not called enough times with a fixed mask for an attack to succeed before the memory contents are swapped again with a fresh mask. However, for certain algorithms BMS can become unattractive. For example, in a lightweight algorithm like PRESENT, a 4 × 4 Sbox can easily be implemented in 4 LUTs; in newer FPGA families which support 2-output LUTs, 2 LUTs are enough to implement an Sbox. Using a BRAM in such a scenario would lead to a huge wastage of resources.

4.2.1 Sbox Scrambling using RLUT

In the following, we use RLUTs to implement a BMS-like countermeasure. Precisely, we design a PRESENT crypto-processor protected with a BMS-like scrambling scheme, but using RLUTs to store the scrambled Sboxes. The rest of the scheme is kept the same as in [3]. The architecture of the Sbox scrambler using RLUTs is shown in Fig. 10. SBOX_P is the PRESENT Sbox. A mod-16 counter generates the Sbox address ADDR, which is masked with the 4-bit mask m. The output of the Sbox is scrambled with the inverse permutation of the mask to scramble the Sbox value. Note that the permutation must be applied to the whole 64 bits of the mask to obtain the 4-bit scrambling constant for each Sbox. Each output of the scrambler is 4 bits. As stated before, each 4 × 4 Sbox can be implemented in two CFGLUT5s, each producing 2 bits of the Sbox computation. Let us call the CFGLUT5 producing bits 0,1 SBOX_ML and the one producing bits 2,3 SBOX_MH. The 4-bit output of the scrambler is split into two 2-bit buses ({3,2}, {1,0}). Bits {3,2} and {1,0} are then fed to the CDI of SBOX_MH and SBOX_ML, respectively, through a FIFO. The same scrambler is used to generate all 16 Sboxes one after the other and program the CFGLUT5s. In total, it requires 16 × 32 clock cycles to refresh all 16 inactive Sboxes. We implement two parallel layers of Sboxes: when the active layer is computing the cipher, the inactive one is being refreshed, so the cipher operation is not stalled. 16 × 32 clock cycles (16 encryptions) are needed to refresh the inactive layer, which means that we can swap the active and inactive Sboxes after every 16 encryptions. A swap means that the active Sboxes become inactive and vice versa; the cipher design uses the active Sboxes only. The area overhead comes from the scrambler circuit and the multiplexers used to swap the active/inactive Sboxes. We implemented a PRESENT crypto-processor and protected it with the Sbox scrambling countermeasure. The area and performance figures of the original design and its protected version are summarized in Table 2. It should be noted that the proposed design has more overhead than the original BMS scheme in terms of LUTs and flip-flops, but does not require any block RAMs, which are an essential part of the original BMS scheme.
Fig. 10

Architecture of Sbox Scrambler

Table 2

The Area and Performance Overhead of Scrambling Scheme on Virtex-5 FPGA

Architecture   LUTs    Flip-flops   Frequency (MHz)
Original       208     150          196
Scrambled      557     552          189
Overhead       2.67×   3.68×        1.03×

A direct comparison with the implementation of [3] is not possible, as we present our case study on the lightweight cipher PRESENT, whereas in [3] the authors presented their case study on AES. Each Sbox of PRESENT can be stored in a 16 × 4 memory, requiring 64 bits. For the Sbox scrambling scheme, we require two such Sboxes, amounting to 128 bits of memory. These two Sboxes can be stored in a single dual-port BRAM. However, a single BRAM on an FPGA can store up to 36 Kbits of data, i.e., up to about 8000 4-bit words, while storing two PRESENT Sboxes needs only 32 such words. Hence, the BRAM would be massively underutilized. It must be noted that BRAM is a highly scarce resource compared to LUTs: the ratio of BRAMs to LUTs varies from 1:600 to 1:720 [36]. Hence, the percentage overhead of the LUT-based Sbox scrambling scheme is much lower than that of the BRAM-based scheme in the case of lightweight ciphers.

4.2.2 Security Validation

In this part, the security of the proposed Sbox scrambling scheme on a PRESENT co-processor is evaluated against side-channel attacks. To evaluate the side-channel resistance of the proposed scheme, leakage detection techniques are used. Leakage detection [37, 38] is an alternative to traditional side-channel attacks like correlation power analysis (CPA [15]). Side-channel attacks like CPA mount a key-recovery attack on the target device to test it. The efficiency of such an attack highly depends on the estimation of the underlying leakage model. While generic leakage models work fine for unprotected targets, the leakage model of a protected device can be quite complex and difficult to estimate, and attacking with a wrongly estimated model degrades the efficiency of the attack. As an alternative, researchers have adopted leakage detection techniques. Such techniques do not mount an attack on the device but only detect whether any visible leakage is present in the side-channel traces. In other words, the leakage is detected but not exploited. The advantage of leakage detection techniques is that they are leakage-model agnostic and can provide a pass/fail criterion for a device under test.

In this work, we have used test vector leakage assessment (TVLA [38]) as the leakage detection test. Precisely, we use the non-specific Fixed vs Random (FvR) test from the TVLA suite. The test is conducted as follows: for a given device, N side-channel traces are collected for random plaintext inputs and a fixed key (referred to as set R). Thereafter, under the same measurement conditions, another set of N side-channel measurements is collected for a fixed plaintext input and the same fixed key (referred to as set F). We then compute the Welch t-test on the collected traces as \(T=\frac {\mu _{r}-\mu _{f}}{\sqrt {\frac {{\sigma _{r}^{2}}+{\sigma _{f}^{2}}}{N}}}\), where \(\mu_r, \sigma_r\) are the mean and standard deviation of set R and \(\mu_f, \sigma_f\) are the mean and standard deviation of set F. If the absolute T-value is > 4.5, the collected traces carry side-channel leakage. The process is repeated for each sample point in the collected traces. Moreover, to avoid any change in environmental conditions between the two sets of traces, the random and fixed plaintexts are interleaved. It is also recommended to run TVLA on the middle rounds to avoid misleading leakage from plaintext and ciphertext loading, which is not sensitive in nature. For a properly masked implementation, TVLA should not be able to distinguish random plaintexts from fixed plaintexts and should thus give an absolute TVLA value < 4.5.
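A compact Python/NumPy sketch of the per-sample Welch t-test used here (with equal set sizes it reduces to the formula above):

```python
import numpy as np

def tvla_t(traces_fixed, traces_random):
    """Per-sample Welch t-statistic between the fixed (F) and random (R) trace sets.

    traces_fixed, traces_random : 2-D arrays with one trace per row.
    |t| > 4.5 at any sample point is taken as evidence of detectable leakage."""
    mf, mr = traces_fixed.mean(axis=0), traces_random.mean(axis=0)
    vf, vr = traces_fixed.var(axis=0, ddof=1), traces_random.var(axis=0, ddof=1)
    nf, nr = traces_fixed.shape[0], traces_random.shape[0]
    return (mr - mf) / np.sqrt(vf / nf + vr / nr)

# toy example with synthetic traces (independent Gaussian noise, so no leakage expected)
rng = np.random.default_rng(0)
t = tvla_t(rng.normal(size=(1000, 200)), rng.normal(size=(1000, 200)))
print(np.abs(t).max() > 4.5)   # almost always False for independent noise
```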

To perform the evaluation, the PRESENT coprocessor protected with the Sbox scrambling scheme is implemented on a Virtex-5 LX50 FPGA soldered on a SASEBO-GII board. The side-channel traces are collected from the back side of the board by placing a high-precision EM probe over a chosen decoupling capacitor. The EM probe is further shielded to reduce interference from neighboring components on the PCB. The measurement setup is illustrated in Fig. 11, and we capture traces corresponding to the three middle rounds of the PRESENT cipher, as seen in Fig. 12. We collected 150,000 traces each for random (set R) and fixed (set F) plaintexts while the mask was set to random values. For comparison, we repeated the collection of 150,000 traces for random and fixed plaintexts with the mask fixed to 0; the latter is the case of the countermeasure turned off. While the T-value for the device with the mask off crosses the ± 4.5 mark with as few as 3000 traces, the T-value with the mask on stays below the ± 4.5 threshold up to 145,000 traces. At 150,000 traces, we observe that the T-value with the mask on finally crosses the ± 4.5 threshold. This represents a security gain of about 50 × for the Sbox scrambling scheme. The TVLA results with 50,000 and 150,000 traces are plotted in Fig. 13. It can be observed that, with 50,000 traces, the design with the mask off has a huge leakage, while the design with the mask on has a maximum absolute TVLA value of 3.6. With 150,000 traces, the design with the mask off still has significant leakage, while the design with the mask on merely crosses the ± 4.5 threshold. Thus, the Sbox scrambling scheme provides enhanced side-channel protection; however, some non-linearities in the implementation still leak. This result differs from the original BRAM scrambling scheme (BMS), as BMS uses block RAMs, which have a totally different leakage profile compared to RLUTs.
Fig. 11

The EM measurement setup on SASEBO-GII FPGA board

Fig. 12

The measurement window covers the 3 middle rounds out of the 32 encryption rounds of PRESENT

Fig. 13

TVLA results with mask on and mask off for (a) 50,000 traces and (b) 150,000 traces

4.3 Non-Proprietary Kill Switch

The previous subsections presented constructive applications of RLUTs in cryptographic circuits. In this part, a generic protection scheme is presented to protect FPGA hardware from environmental hazards and malicious attack attempts. Often, FPGAs are deployed in hostile environments where the FPGA chip might be subjected to extreme environmental conditions (high temperature, etc.). An adversary can even put the FPGA chip under such conditions deliberately to disturb its activity. A preventive solution in this case is a kill switch which disables the device either permanently or until the next intervention. FPGA vendors sometimes provide an in-built kill switch based on fuses. Our focus is on developing a non-proprietary, fully digital, and extremely lightweight kill switch on FPGAs for wide deployment.

Modern FPGAs offer many security features which an FPGA user can access to increase the security and tamper resistance of the design. One such feature is zeroization, which is part of the FPGA security IP core proprietary to the corresponding FPGA vendor [39]. The zeroization feature erases sensitive information to prevent its disclosure if the system is attacked or is at an increased risk of unauthorized access. It clears the FPGA configuration memory and the AES-GCM encryption key if it detects an attack. Our approach is a non-proprietary hardware design technique which a user can integrate easily with an existing design, whereas the zeroization feature has to be enabled by instantiating the security primitive. More specifically, to take advantage of zeroization and the other security features provided by the security IP core, the user needs to collaborate with the FPGA vendor. Additionally, at present, the security IP core can only read the operating voltage and temperature sensor data. Such restrictions do not apply to the proposed kill-switch design, as in this scenario the user can decide the condition under which the kill switch is activated. The overhead of the security IP core varies from 1 to 8% of the overall design [39], whereas the overhead of the proposed kill switch is only one RLUT and one multiplexer. Combining the proposed kill switch with the existing security features of modern FPGAs would be an interesting research direction.

As discussed before, an RLUT can be reconfigured with a new truth-table mapping at run-time, during the execution of the design. This specific property was exploited for the insertion of hardware Trojans in Section 3. In this section, however, we use the same property to design a protective kill switch. The term kill switch has also been used in the context of hardware Trojans which destroy the FPGA design when triggered [40]. Here, on the other hand, we focus on a constructive kill switch which acts as a protective barrier over the design.

The concept of a kill switch for the protection of digital designs has existed in the VLSI industry for a long time. However, most of these designs are either patented or proprietary. Moreover, they are based on non-standard technology (fuses) and hence cannot be easily integrated with a digital design. For FPGAs, Altera provides the designer the option of adding a kill switch to the design [41], but the designer cannot integrate it in his own IP. On the contrary, the solution proposed in this paper is not only non-proprietary but also easy to integrate with a digital design at very low area overhead.

The block diagram of the proposed kill switch is shown in Fig. 14.
Fig. 14

Block Diagram of Kill Switch

This module takes the user-provided reset signal as input and generates the global system reset signal. The sensor attached to the kill-switch module generates the kill trigger signal which acts as the trigger of the kill-switch module. Whenever the sensor observes an abnormal environmental condition, the kill trigger signal goes high and triggers the kill-switch module. In the normal condition, when the kill trigger signal is low, the global system reset signal is the same as the user-provided reset signal. However, once the kill trigger signal becomes high, the global reset signal is overwritten by the kill signal, which forces the device into a permanent state of reset. Thus, the circuit is bricked upon sensing a hostile condition. This deadlock can only be broken by removing the abnormal environmental condition and reconfiguring the FPGA with the bitstream again. The abnormal environmental condition can occur for various reasons. Apart from high supply voltage and high temperature variation, the sensor can be configured to sense the presence of side-channel probing equipment like [25]. Thus, with a well-designed sensor, the kill-switch module can even protect the system from side-channel attacks.

The detailed operation of the kill-switch module is described in Figs. 15 and 16; Fig. 16 refers to the kill-switch configuration in the triggered state. In the normal condition (see Fig. 15), the truth-table mapping of the RLUT (INIT or LUT contents) is set to 0x80000000 and all five input pins (I0–I4) are tied to 1. Hence the output of the RLUT is always 1. This sets \(\overline{kill\_signal}\) to 1, and thus the global system reset signal is the same as the user-provided reset signal. Under a hostile condition, when the kill trigger signal is set by the sensor (see Fig. 16), the INIT (LUT contents) value is modified from 0x80000000 to 0x00000000 in a single clock cycle. This makes the output of the RLUT 0, consequently assigning a permanent 0 to the global reset signal. Here we assume that the reset condition of the circuit is active low; if the reset of the circuit is active high, the reset polarity must be adjusted accordingly. The RLUT connections are hard-coded such that it is not possible to alter its value once it has been triggered by the sensor. The only way to make the circuit functional again is by reconfiguring the netlist.
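To make the mechanism concrete, the following is a small behavioural model in Python (not HDL, and not the paper's implementation), assuming the RLUT output is simply the INIT bit addressed by its five inputs; the class and signal names are illustrative only.

```python
class KillSwitchModel:
    """Behavioural view of the RLUT + multiplexer kill switch (active-low reset assumed)."""

    def __init__(self):
        self.init = 0x80000000          # normal-condition truth table

    def rlut_out(self, i4, i3, i2, i1, i0):
        # A 5-input LUT returns the INIT bit addressed by its inputs.
        index = (i4 << 4) | (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
        return (self.init >> index) & 1

    def trigger(self):
        # Sensor asserts the kill trigger: the truth table is cleared.
        self.init = 0x00000000

    def global_reset(self, user_reset):
        not_kill = self.rlut_out(1, 1, 1, 1, 1)   # inputs hard-wired to 1
        return user_reset if not_kill else 0      # forced low once triggered


ks = KillSwitchModel()
print(ks.global_reset(user_reset=1))   # 1: design follows the user reset
ks.trigger()                           # hostile condition detected by the sensor
print(ks.global_reset(user_reset=1))   # 0: device held in permanent reset
```

With INIT = 0x80000000 and all inputs at 1, bit 31 is read and the output stays 1; once the truth table is cleared, the output (and hence the global reset) is stuck at 0 until the FPGA is reconfigured.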

The overhead of the entire kill-switch module is only one RLUT and one multiplexer. The multiplexer, shown in Figs. 15 and 16, can also be implemented using the multiplexers present in the CLB. Moreover, the design does not have any timing overhead, making it extremely efficient.
Fig. 15

Kill Switch in Normal Condition

Fig. 16

Kill Switch in Triggered Condition

5 Conclusions

Due to the recent progress in the performance of modern FPGAs, they are getting integrated into many security-critical applications. Hence, a comprehensive security analysis of modern FPGAs and their advanced features is of paramount importance. In this paper, we have analyzed a particular feature of modern FPGAs and proposed new design methodologies which can impact the security of the embedded system both positively and negatively. This feature, which allows run-time reconfiguration of LUTs, provides a unique opportunity for a designer to come up with novel and efficient design strategies for cryptographic implementations using RLUTs. We proposed a few such design strategies in this paper, from the point of view of both constructive and destructive applications.

First, it has been shown that the RLUT can be used by an attacker to create hardware Trojans. Indeed, the payload of stealthy Trojans can easily be inserted in an IP by untrusted vendors. The Trojans can be used to inject faults or modify control signals in order to facilitate key extraction. This is illustrated by a few examples of Trojans in AES. Run-time reconfiguration of RLUTs allows the Trojan payload to be extremely lightweight, making Trojan detection extremely difficult in the absence of a golden model. Second, the protective property of RLUT has been illustrated by increasing the resiliency of the Sboxes of cryptographic algorithms. This is accomplished either by dynamically changing the Sboxes of customized algorithms or by scrambling the Sboxes of standard algorithms. These types of design techniques are extremely useful for lightweight block ciphers with Sboxes of small dimension. Moreover, generating Sboxes at run-time is an attractive design choice for a designer employing ciphers with secret Sboxes. The low area and nearly negligible timing overhead of these solutions make them highly attractive for IoT applications. The effectiveness of the RLUT-based Sbox scrambling methodology has been verified by the leakage detection test TVLA on a hardware implementation of the cipher PRESENT. The results show a clear improvement in side-channel resistance compared to unprotected PRESENT. Finally, we proposed a non-proprietary kill-switch design methodology using RLUT which can be easily integrated with third-party IP designs.

To sum up, this paper clearly shows the ramifications of RLUT usage on the security of FPGA-based embedded systems. As we have shown, RLUTs can be used in both productive and harmful applications, acting as a double-edged sword for security applications on FPGAs. This conflicted usage motivates further research in two principal directions. Firstly, there is a need for Trojan detection techniques at the IP level; these detection techniques should be capable of distinguishing RLUT-based optimizations from potential Trojans. Additionally, RLUT-based side-channel resistant designs can be combined with other side-channel countermeasures to increase the security guarantees.

References

  1. Trimberger S, Moore J (2014) FPGA security: Motivations, features, and applications. Proc IEEE 102(8):1248–1265
  2. Trimberger SM, Moore JJ (2014) FPGA security: Motivations, features, and applications. Proc IEEE 102(8):1248–1265
  3. Güneysu T, Moradi A (2011) Generic side-channel countermeasures for reconfigurable devices. In: Preneel B, Takagi T (eds) CHES, ser. LNCS, vol 6917. Springer, pp 33–48
  4. Bhasin S, He W, Guilley S, Danger J-L (2013) Exploiting FPGA block memories for protected cryptographic implementations. In: ReCoSoC. IEEE, pp 1–8
  5. Güneysu T, Paar C (2008) Ultra high performance ECC over NIST primes on commercial FPGAs. In: CHES, pp 62–78
  6. Roy DB, Mukhopadhyay D, Izumi M, Takahashi J (2014) Tile before multiplication: An efficient strategy to optimize DSP multiplier for accelerating prime field ECC for NIST curves. In: The 51st annual design automation conference 2014, DAC '14. ACM, San Francisco, CA, pp 1–6. http://doi.acm.org/10.1145/2593069.2593234
  7. Güneysu T. Getting post-quantum crypto algorithms ready for deployment
  8. He W, Otero A, de la Torre E, Riesgo T (2012) Automatic generation of identical routing pairs for FPGA implemented DPL logic. In: ReConFig. IEEE, pp 1–6
  9. Kumm M, Möller K, Zipf P (2013) Reconfigurable FIR filter using distributed arithmetic on FPGAs. In: 2013 IEEE international symposium on circuits and systems (ISCAS 2013). IEEE, Beijing, China, pp 2058–2061. https://doi.org/10.1109/ISCAS.2013.6572277
  10. Sasdrich P, Moradi A, Mischke O, Güneysu T (2015) Achieving side-channel protection with dynamic logic reconfiguration on modern FPGAs. In: IEEE international symposium on hardware oriented security and trust, HOST 2015. Washington, DC, pp 130–136
  11. Kutzner S, Poschmann A, Stöttinger M (2013) TROJANUS: an ultra-lightweight side-channel leakage generator for FPGAs. In: 2013 international conference on field-programmable technology, FPT 2013. Kyoto, Japan, pp 160–167
  12. Bogdanov A, Knudsen LR, Leander G, Paar C, Poschmann A, Robshaw MJB, Seurin Y, Vikkelsoe C (2007) PRESENT: an ultra-lightweight block cipher. Springer, Berlin Heidelberg, Berlin, pp 450–466
  13. Madlener F, Stöttinger M, Huss S (2009) Novel hardening techniques against differential power analysis for multiplication in GF(2^n). In: International conference on field-programmable technology, FPT 2009, pp 328–334
  14. Xilinx. Xilinx Partial Reconfiguration User Guide (UG702). http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_1/ug702.pdf
  15. Brier É, Clavier C, Olivier F (2004) Correlation power analysis with a leakage model, vol 3156. Springer, Cambridge, pp 16–29
  16. Ali S, Chakraborty RS, Mukhopadhyay D, Bhunia S (2011) Multi-level attacks: an emerging security concern for cryptographic hardware. In: Design, automation and test in Europe, DATE 2011. Grenoble, France, pp 1176–1179
  17. Chakraborty RS, Narasimhan S, Bhunia S (2009) Hardware trojan: threats and emerging solutions. In: IEEE international high level design validation and test workshop, HLDVT 2009. San Francisco, CA, pp 166–171
  18. Tehranipoor M, Forte D (2014) Tutorial T4: All you need to know about hardware Trojans and counterfeit ICs. In: 2014 27th international conference on VLSI design and 2014 13th international conference on embedded systems. Mumbai, India, pp 9–10
  19. Chen Z, Guo X, Nagesh R, Reddy A, Gora M, Maiti A. Hardware trojan designs on Basys FPGA board
  20. Johnson AP, Saha S, Chakraborty RS, Mukhopadhyay D, Gören S (2014) Fault attack on AES via hardware trojan insertion by dynamic partial reconfiguration of FPGA over ethernet. In: Proceedings of the 9th workshop on embedded systems security, ser. WESS '14. ACM, New York, NY, pp 1:1–1:8. http://doi.acm.org/10.1145/2668322.2668323
  21. Shende R, Ambawade DD (2016) A side channel based power analysis technique for hardware trojan detection using statistical learning approach. In: 2016 thirteenth international conference on wireless and optical communications networks (WOCN), pp 1–4
  22. Bhasin S, Danger J-L, Guilley S, Ngo XT, Sauvage L (2013) Hardware trojan horses in cryptographic IP cores. In: Fischer W, Schmidt J-M (eds) FDTC. IEEE, pp 15–29
  23. Note J-B, Rannaud E (2008) From the bitstream to the netlist. In: Proceedings of the 16th international ACM/SIGDA symposium on field programmable gate arrays, ser. FPGA '08. ACM, New York, NY, pp 264–264. http://doi.acm.org/10.1145/1344671.1344729
  24. Benchmarks. https://www.trust-hub.org/resources/benchmarks. Accessed 30 Jan 2015
  25. Homma N, Hayashi Y, Miura N, Fujimoto D, Tanaka D, Nagata M, Aoki T (2014) EM attack is non-invasive? - Design methodology and validity verification of EM attack sensor. In: Proceedings of the 16th international workshop on cryptographic hardware and embedded systems - CHES 2014. Busan, South Korea, pp 1–16
  26. Ngo XT, Najm Z, Bhasin S, Roy DB, Danger J-L, Guilley S (2015) Integrated sensor: a backdoor for hardware trojan insertions? In: 2015 Euromicro conference on digital system design (DSD). IEEE, pp 415–422
  27.
  28. Piret G, Quisquater J-J (2003) A differential fault attack technique against SPN structures, with application to the AES and Khazad. In: CHES, ser. LNCS, vol 2779. Springer, Cologne, Germany, pp 77–88
  29. Tunstall M, Mukhopadhyay D, Ali S (2011) Differential fault analysis of the advanced encryption standard using a single fault. In: Ardagna CA, Zhou J (eds) WISTP, ser. Lecture notes in computer science, vol 6633. Springer, pp 224–233
  30. Ali S, Mukhopadhyay D, Tunstall M (2013) Differential fault analysis of AES: towards reaching its limits. J Cryptogr Eng 3(2):73–97
  31. Aysu A, Gulcan E, Schaumont P (2014) SIMON says: Break area records of block ciphers on FPGAs. IEEE Embed Syst Lett 6(2):37–40
  32. Poschmann A, Ling S, Wang H (2010) 256 bit standardized crypto for 650 GE - GOST revisited. In: Mangard S, Standaert FS (eds) Cryptographic hardware and embedded systems, CHES 2010, ser. Lecture notes in computer science, vol 6225. Springer, Berlin Heidelberg, pp 219–233. https://doi.org/10.1007/978-3-642-15031-9_15
  33. Hajra S, Rebeiro C, Bhasin S, Bajaj G, Sharma S, Guilley S, Mukhopadhyay D (2014) DRECON: DPA resistant encryption by construction. In: Pointcheval D, Vergnaud D (eds) AFRICACRYPT, ser. Lecture notes in computer science, vol 8469. Springer, pp 420–439. https://doi.org/10.1007/978-3-319-06734-6
  34. Sasdrich P, Mischke O, Moradi A, Güneysu T (2015) Side-channel protection by randomizing look-up tables on reconfigurable hardware - pitfalls of memory primitives. Cryptology ePrint Archive, Report 2015/198. http://eprint.iacr.org/2015/198
  35. Bogdanov A, Knudsen LR, Leander G, Paar C, Poschmann A, Robshaw MJB, Seurin Y, Vikkelsoe C (2007) PRESENT: an ultra-lightweight block cipher. In: CHES, ser. LNCS, vol 4727. Springer, Vienna, Austria, pp 450–466
  36. Virtex-5 family overview (DS100). https://www.xilinx.com/support/documentation/data_sheets/ds100.pdf. Accessed 1 Jan 2018
  37. Bhasin S, Danger J-L, Guilley S, Najm Z (2014) NICV: normalized inter-class variance for detection of side-channel leakage. In: International symposium on electromagnetic compatibility (EMC '14/Tokyo). IEEE, Session OS09: EM Information Leakage. Hitotsubashi Hall (National Center of Sciences), Chiyoda, Tokyo, Japan
  38. Goodwill G, Jun B, Jaffe J, Rohatgi P (2011) A testing methodology for side-channel resistance validation. NIST Non-Invasive Attack Testing Workshop. http://csrc.nist.gov/news_events/non-invasive-attack-testing-workshop/papers/08_Goodwill.pdf
  39.
  40. Adee S (2008) The hunt for the kill switch. IEEE Spectr 45(5):34–39. https://doi.org/10.1109/MSPEC.2008.4505310
  41. Pedersen B, Reese D, Joyce J (2012) Method and apparatus for securing a programmable device using a kill switch. US Patent App. 13/097,816. http://www.google.com/patents/US20120274351

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Debapriya Basu Roy (1)
  • Shivam Bhasin (2)
  • Jean-Luc Danger (3)
  • Sylvain Guilley (3)
  • Wei He (4)
  • Debdeep Mukhopadhyay (1)
  • Zakaria Najm (2)
  • Xuan Thuy Ngo (5)

  1. SEAL, Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
  2. Temasek Laboratories, Nanyang Technological University, Singapore, Singapore
  3. Institut MINES-TELECOM and Secure-IC SAS, Paris, France
  4. Shield Lab, Central Research Institute, Huawei International Pte. Ltd, Singapore, Singapore
  5. Secure-IC SAS, Cesson-Sévigné, France