ESCALATION: Leveraging Logic Masking to Facilitate Path-Delay-Based Hardware Trojan Detection Methods
Abstract
Hardware Trojans (HTs), intellectual property (IP) piracy, and overproduction of integrated circuits (ICs) are three threats that may arise in untrusted fabrication foundries. HTs are malicious circuitry changes in the IC layout. They affect side-channels (IC parameters) such as path-delay or power consumption. Therefore, HT detection methods based on side-channel analysis have been proposed. They can detect an HT only if its effects on side-channels are significant compared with the side-channel alterations caused by process^{1} and environment^{2} variations. IC design modifications at different abstraction levels have been proposed to facilitate HT detection methods after fabrication, such as modifying a circuit to make the paths^{3} of the circuit more sensitive to HTs. Such modifications are known as design-for-trust (DfTr). In addition, key-based modifications have been proposed to protect IPs/ICs from IP piracy and IC overproduction. This approach is known as masking or obfuscation, and it modifies a circuit such that it does not work correctly unless the correct key is applied. In this work, we propose a DfTr method that leverages the masking approach. It improves HT detection methods based on path-delay analysis. As a matter of fact, the delay of shorter paths varies less than that of longer ones. Therefore, the objective of the proposed DfTr is to generate fake short paths for nets that only belong to long paths. Our layout-level experiments show that the proposed DfTr masks the functionality of circuits and, on average, increases the HT detectability of path-delay-based detection methods by 10%.
Keywords
Hardware security, Design-for-trust, Logic masking, Hardware Trojan detection, IP/IC piracy
1 Introduction
The fabless business model has progressively become the main model within the semiconductor industry over the last two decades. Although this model reduces development costs, it raises security challenges among its participants [1], such as design houses, IP developers, system integrators, and fabrication foundries. Fabrication foundries play a particularly sensitive role because they can take advantage of their access to the ordered layouts and insert a malicious functionality, known as a Hardware Trojan (HT). HTs may cause a failure or leak confidential information [2]. In addition, untrusted fabrication foundries can overproduce the layout, or reverse engineer its functionality and then extract and pirate the intellectual properties (IPs) of the layout [1].
HT detection is a challenging task [2, 3] because smart and skillful attackers insert hard-to-trigger and well-designed HTs into the ordered layout. In other words, HT attackers try to use low-controllable signals for HT activation (i.e., the trigger part of the HT), as well as low-observable signals for the HT mission (i.e., the payload part of the HT). Such HTs rarely change the circuit outputs. Moreover, well-designed HTs affect the circuit side-channels, such as power or the delays of the paths^{3} of a circuit, as little as possible. In such cases, the HT effects on side-channels are not easily distinguishable because side-channels do not have a fixed value; they vary due to process variations (PV^{1}) and environment variations (EV^{2}). These HT detection difficulties push designers toward design modifications that hinder HT attackers or facilitate HT detection methods. This approach is called design-for-trust (DfTr) [3, 4]. For instance, in order to improve path-delay-based HT detection methods, one can modify a circuit so that it has fewer paths with high delay variations. In this case, the choices for an HT attacker to insert a hard-to-detect HT are more limited. Another type of DfTr is to add calibration structures to a circuit in order to accurately measure side-channels and PV effects [4].
Key-based design modifications have been proposed to prevent IP/IC piracy [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. Their basic approach is to modify a circuit such that it does not function properly unless the correct key is applied [1]; therefore, pirating the circuit is pointless as long as the key is not revealed. Key-based design modifications change the finite-state machine of circuits, at the register-transfer level (RTL), and/or the combinational part of circuits, at the gate level (GL). Authors have used different names for this approach according to the implementation in GL or RTL, such as logic/state obfuscation [6, 7, 8], logic/state encryption [10, 13], logic/state locking [11, 12, 13], and logic/state masking [5, 14]. Brice et al. argued that the term masking is more accurate [15], and it is used hereafter.
The second benefit of key-based modifications is to hinder HT insertion, so they are also considered DfTr. Indeed, during the fabrication stage, if HT designers do not know the correct key and functionality, they have more difficulty designing a hard-to-detect and efficient HT. For instance, a poorly designed HT might be activated by a sequence of events that never happens when the circuit is fed the correct key [6, 16].
In [17], we proposed a new benefit of key-based modifications. The proposed approach is to leverage logic masking to improve path-delay-based HT detection methods. We called the proposed approach “ESCALATION: rEusing logic maSking to faCilitate pAth-deLay-bAsed Trojan detectION”. ESCALATION masks the functionality (combinational part) of circuits while improving the efficiency of path-delay-based HT detection methods. Apart from the logic masking benefits, ESCALATION’s objective is to generate fake short paths for nets that only belong to long paths, because shorter paths have smaller delay variation [18]. In [17], we reported the improvement of HT detection probability (HDP) obtained by the proposed algorithm for a few technology-mapped circuits. In addition, the masking quality of the proposed approach was compared with one masking algorithm proposed in [6], namely HARPOON. This paper extends that work with the following contributions:

We detail the proposed algorithm accompanied by three ideas for improvement. The ideas are explained and shown in examples.

We compare the results and overheads of the proposed algorithm with trust-driven retiming (TDR) proposed in [18]. TDR has the same objective as the proposed algorithm, but TDR neither changes nor masks the original functionality. The comparisons are performed at the gate level because the TDR results and overheads are reported at the gate level.

We validate the HT detection efficiency of the proposed algorithm at the layout level. Our experimental results show that our algorithm improves the HDP based on path-delay analysis. It also provides logic masking.

We validate the logic masking efficiency of the proposed algorithm. We calculate the Hamming distance (HD) between the outputs of masked circuits under the correct key and under wrong keys. We use HD, as a usual metric, to compare the masking quality of our algorithm with that of previous work. The results show that the proposed algorithm achieves a better HD than the random masking presented in the following sections.
The rest of the paper is organized as follows: In Section II, we survey related work on both HT detection and logic masking methods. In Section III, we explain the ESCALATION approach and an algorithm based on it. In Section IV, we evaluate the efficiency and the costs of our methodology. The experimental results are given in Section V. We present layout-level validation in Section VI. Section VII includes discussions about logic masking in general, and the ESCALATION approach, such as how one can use logic masking in million-gate circuits, and how one can improve HD in the proposed algorithm. Finally, in Section VIII, we draw some conclusions and propose future perspectives on this work.
2 Related Previous Work
In this section, the flow of side-channel-based HT detection methods is explained first. Then, previous work on path-delay-based HT detection methods and masking algorithms is reviewed in order to introduce the concepts used in the proposed algorithm.
2.1 The Flow of Side-Channel-Based Hardware Trojan Detection
In order to identify the HT effects on a side-channel, one needs to generate and apply some test vectors. For instance, the authors of [19, 20] proposed approaches and hardware structures for path-delay measurement and HT detection; they used the test vectors generated by traditional path-delay-fault detection algorithms.
The flow of HT detection methods based on side-channel analyses has four main steps, detailed in [2]. First of all, sufficient and efficient input vectors must be generated. This is very challenging because these vectors must be as few as possible and also stimulate all of the circuit elements. For instance, with path-delay analysis, the test vectors must cover a collection of paths in such a way that each net of the circuit belongs to at least one path of the collection. The second step consists of applying the generated test vectors to the ICs and measuring the signals of the targeted side-channel. These signals are used as a fingerprint for each IC. When the targeted side-channel is path-delay, the generated test vectors are applied to the ICs and the selected path-delays are measured. Afterwards, the variation of the targeted side-channel must be calculated. This variation is referred to as the “Golden reference”.
The Golden reference can be derived from a few genuine ICs (Golden ICs). These ICs can be found by randomly selecting a few ICs and then performing destructive reverse engineering (including decapsulating, delayering, imaging, and image processing [2]). Later research has shown that reverse engineering can be avoided by adding some embedded structures to the circuit and using them to obtain the IC fingerprints and side-channel variations. This approach is known as Golden-free HT detection [21]. The final step is to compare the other ICs’ fingerprints with the Golden reference.
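The last two steps of this flow can be illustrated with a minimal sketch. It assumes that the fingerprint is the measured delay of a single selected path, that the Golden reference is summarized by a mean and a standard deviation, and that an IC is flagged when it leaves a k-sigma band around the mean. The function names, the delay values, and the choice k = 3 are hypothetical and only serve the illustration.

```python
from statistics import mean, stdev

def golden_reference(golden_delays):
    """Return (mean, std) of one path's delay measured on the Golden ICs."""
    return mean(golden_delays), stdev(golden_delays)

def is_suspicious(measured_delay, reference, k=3.0):
    """Flag an IC whose path delay leaves the mean +/- k*sigma band."""
    mu, sigma = reference
    return abs(measured_delay - mu) > k * sigma

# Hypothetical delays (ns) of one selected path on five Golden ICs.
golden = [1.00, 1.02, 0.98, 1.01, 0.99]
ref = golden_reference(golden)

# Hypothetical fingerprints of two ICs under test for the same path.
under_test = {"ic_a": 1.01, "ic_b": 1.12}
flags = {name: is_suspicious(d, ref) for name, d in under_test.items()}
```

With these illustrative numbers only `ic_b` falls outside the band. A real flow would repeat the comparison for every selected path of every IC.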
2.2 Path-Delay-Based Hardware Trojan Detection
Jin et al. proposed delay fingerprinting in earlier work [22]. They simulated the presence of PV by assuming a 15% variation in the parameters of a 130 nm technology. Their simulation results show that their approach can detect the HTs that they inserted. However, more PV must be taken into account with newer technologies.
The authors of [20, 23] proposed leveraging a path-delay measurement structure, named Shadow Register, to detect HTs. The implementation of Shadow Register includes duplicating the clock signals and all the flip-flops (FFs) of the circuit. The input of each original FF feeds the added (shadow) FF. The shadow FFs capture their input using the added clock signal. The added and original clock signals have the same frequency, with a specific phase difference. The phase difference should be equal to the maximum path-delay variation. In this case, if an FF and its shadow have two different values, one can suspect the presence of an HT. The comparison between each FF and its shadow FF requires an additional XOR gate. The results show that the area overhead and resolution of this structure, as well as HT detectability, are very high [20, 23].
Nejat et al. have proposed using another structure with the same objective [19]. This structure requires just one additional multiplexer for each FF in the circuit. The results show that this structure has less area overhead, but also lower delay-measurement accuracy, than the shadow-register-based measurement approach [19].
Cha et al. have proposed the use of a ring oscillator^{1} (RO) in addition to the shadow register structure [20]. In this work, an RO is used as a calibration device. Indeed, accurate calibration is necessary to remove (or at least decrease) process variation effects and then be able to detect HT side-channel effects. Briefly, process variations have two different components: die-to-die variation components, which alter side-channels in each instance of an IC; and within-die variation components, which alter side-channels at different points of an IC [24]. For instance, due to die-to-die variations, the frequency of one RO is different in each instance of the IC. Die-to-die variations do not introduce any difference between ROs in different places of one IC. If, due to die-to-die variations, an RO in an IC is X% slower (or faster) than the expected value, one can be sure that all ROs (with different sizes and structures) in the IC are X% slower (or faster). Thus, one can calculate the die-to-die variation effect on path-delay using one RO, and then be sure that the rest of the path-delay variation is due to within-die variation or HT effects.
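The calibration argument can be sketched numerically. The snippet assumes a purely multiplicative model in which the die-to-die component scales every delay on the die by the same factor, and in which that factor is the ratio of the nominal to the measured RO frequency. Both the model and the numbers are illustrative assumptions, not the procedure of [20].

```python
def die_to_die_factor(ro_freq_measured, ro_freq_nominal):
    """Delay scale factor of this die: a slower RO means larger delays."""
    return ro_freq_nominal / ro_freq_measured

def residual_delay(path_delay_measured, path_delay_nominal, factor):
    """Delay left once the die-to-die component has been removed.

    What remains is attributed to within-die variation or to an HT.
    """
    return path_delay_measured - factor * path_delay_nominal

# Hypothetical die: the RO runs at 400 MHz instead of the nominal 500 MHz.
factor = die_to_die_factor(400.0, 500.0)

# Hypothetical path: 2.0 ns nominal, 2.6 ns measured on this die.
residual = residual_delay(2.6, 2.0, factor)
```

Here the die is 25% slower, so 2.5 ns of the measured 2.6 ns is explained by die-to-die variation and only about 0.1 ns has to be examined further.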
Shadow registers empower defenders to investigate HTs in shorter paths. The results in [20] show that shorter paths are better choices than long ones for delay-based HT investigations. Due to within-die variations, the delays of two given gates differ when the gates are at two different places of an IC. The two gates differ less if they are close to each other [24]. If the gates of a path are placed very far from each other, they differ more, and they create a long path. This is the reason for less delay variation in shorter paths. Shekarian et al. theoretically proved this fact [18]. Thus, in order to enhance the HDP of path-delay-based HT detection methods, shorter paths should be investigated instead of longer ones. The experiments in [19] also show that performance-driven technology mapping increases the success of path-delay-analysis-based HT detection methods because it generates shorter paths.
ROs have also been used with the aim of HT detection based on path-delay analysis. With this method, all the circuit gates must belong to at least one RO. RO frequency deviation warns of the presence of HTs. In order to have less area overhead, RO-embedding algorithms aim to cover all of the circuit gates with fewer ROs. Decreasing the number of ROs increases RO lengths [25]. Charles et al. proposed a delay chain, named REBEL, which bypasses circuit FFs in order to measure the delay of different parts of the circuit and then detect HTs [26]. Whereas REBEL and RO-based structures have less area overhead in comparison to the earlier-mentioned structures, they create long paths for investigating HTs. Long paths (or long ROs) have more delay variations; thus, their HT detection probabilities (HDP) are lower.
Shekarian et al. proposed a new DfTr approach based on a trust-driven retiming algorithm, named TDR [18]. The aim was to reduce the number of vulnerable points. Vulnerable points are nets that only belong to long paths. For each net (N) of the circuit (C), there are some paths (P) that pass through the net (N). Accordingly, the definition of the most vulnerable net (MVN) of a circuit with n nets is obtained by the f function:
Definition 1
In Def. 1, the MSP of a circuit is the maximum (longest) path among the shortest paths (SP). In other words, MSP is the shortest path of the most vulnerable net, and its value is greater than that of the shortest path of other nets.
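Written out, this description can be read as follows. This is a sketch of one consistent formalization: the symbols SP(N_i) and delay(P) are notation assumed here for illustration and are not necessarily the authors' own.

```latex
\mathrm{SP}(N_i) = \min_{P \ni N_i} \mathrm{delay}(P), \qquad
\mathrm{MSP}(C) = \max_{1 \le i \le n} \mathrm{SP}(N_i), \qquad
\mathrm{MVN}(C) = f(C) = \operatorname*{arg\,max}_{1 \le i \le n} \mathrm{SP}(N_i)
```

Under this reading, a net is vulnerable exactly when SP(N_i) is large, that is, when even its best (shortest) path is long.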
The TDR algorithm reduces the MSP value by adding extra FFs. It increases the HDP of path-delay-based HT detection methods, but since functionality is not changed and no logic/state masking is generated, it does not prevent IP/IC piracy. ESCALATION, on the other hand, does protect circuits against IP/IC piracy as it uses the logic masking approach. It also uses the potential of masking to improve the HDP of path-delay-analysis-based HT detection methods.
2.3 Logic Masking
Key-based modifications have been proposed against IP/IC piracy and overproduction threats. A masked circuit has a masked mode and a functional mode. In the masked mode, the circuit has an incorrect key and consequently generates an incorrect functionality. In the functional mode, the key is correct and the circuit works correctly [1].
Key-based modifications include modifying the finite-state machine (FSM), at RTL, and the combinational part of a circuit, at the gate level. FSM modification is known as state masking (or state obfuscation). The modified FSM has a few more flip-flops (FFs) added to extend the state space. It also has a new start state on power-up, and one needs to know a specific input sequence to bring the circuit back to the original start state. Modifications of the combinational part, known as logic masking, change the state transition graph and the outputs of the circuits. They modify the circuit such that even if an attacker reaches one of the original states by chance, the circuit outputs and the transition to the next state are incorrect.
The use of MUX primitives as key-gates has been proposed instead of XOR/XNOR gates [10, 11]. The select input of the MUX plays the key-input role. One of the two MUX inputs is randomly selected and connected to a net of the combinational part. Net selection is done according to the objective of the masking algorithm. This input pin defines the correct value of the key. The MUX passes the correct input if its select input is fed by the correct value; otherwise, the MUX can generate a wrong value coming from another part of the circuit.
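The behavior of the two kinds of key-gates can be sketched at the bit level. The functions below are illustrative models, not code from the cited works: the XOR/XNOR functions model the usual key-gates that invert a net under a wrong key bit, and the parameter `true_on` is a hypothetical name for the select value that passes the true net through the MUX.

```python
def xor_keygate(net, key):
    """XOR key-gate: transparent when key == 0, inverts the net otherwise."""
    return net ^ key

def xnor_keygate(net, key):
    """XNOR key-gate: transparent when key == 1, inverts the net otherwise."""
    return 1 - (net ^ key)

def mux_keygate(true_net, decoy_net, key, true_on=0):
    """MUX key-gate: the key drives the select pin.

    `true_on` is the select value that passes the true net, so it defines
    the correct key bit. A wrong key passes the decoy net instead.
    """
    return true_net if key == true_on else decoy_net

# A wrong key bit always corrupts an XOR key-gate ...
xor_wrong = xor_keygate(1, 1)
# ... but corrupts a MUX key-gate only when the decoy net differs.
mux_wrong_visible = mux_keygate(1, 0, key=1)
mux_wrong_hidden = mux_keygate(1, 1, key=1)
```

The last two lines show why a MUX "can" generate a wrong value: the corruption is visible only for input vectors on which the true and decoy nets disagree.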
Roy et al. have proposed a random net selection for key-gate insertion [27]. We call this method “random masking”. Whereas this approach is very easy to implement, there is no guarantee that the circuit will malfunction for all wrong keys. Muhammad et al. have presented another logic masking method in which key-gates have more effect on each other [12]. They use fault excitation/propagation knowledge to insert the key-gates in such a way that the attackers cannot use the inputs to excite and propagate the corruption effect of one key-gate only. This hinders the attackers’ effort to reveal the key by structural analysis of the masked circuit.
Subhra et al. proposed an algorithm, namely HARPOON, focusing on the fan-in and fan-out cones of a circuit [6]. Their objective was to maximize the mismatch points between a masked circuit and its original circuit. A mismatch point is likely to create an inversion. If an inversion arrives at the circuit outputs, it consequently produces an incorrect output. Obviously, the output failure might depend on input vectors. As a result, the number of failing patterns can be used as a simple metric (criterion) for masking qualification. However, this metric is not very accurate since it counts failed outputs even with single-bit failures.
If a fab that fabricates a masked circuit finds an activated IC including the masked circuit in the market, then the fab can guess the correct key by comparing the masked-layout output bits (for different primary input and key-input vectors) with the correct output bits of the activated IC. The correct output bits, in each trial of the attack, lead the attacker to find the correct key [10]. Thus, an effective masking approach must produce nearly equal numbers of correct and incorrect output bits when it is driven by wrong keys. In other words, the Hamming distance (HD) between the correct and incorrect output bits (when a circuit is fed by incorrect keys) should be close to 50%. Rajendran et al. also mathematically proved that with 50% HD, the attacker faces the most difficult situation for key guesses [10]. With this metric, 0% HD means that the masked circuit outputs are always correct and independent of the key value. Moreover, 100% HD means that all output bits are always wrong (and equal to the inversion of the correct value) when wrong keys are applied to the key-inputs. Thus, neither extreme is suitable for masking.
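A minimal sketch of the HD metric follows. It assumes that the outputs are available as bit strings, one per applied input vector, for the correct key and for one wrong key. The function name and the sample vectors are hypothetical.

```python
def hamming_distance_percent(correct_outputs, observed_outputs):
    """Percentage of differing output bits over all applied input vectors."""
    total = differing = 0
    for good, seen in zip(correct_outputs, observed_outputs):
        if len(good) != len(seen):
            raise ValueError("output vectors must have the same width")
        total += len(good)
        differing += sum(g != s for g, s in zip(good, seen))
    return 100.0 * differing / total

# Hypothetical 4-bit outputs for two input vectors.
correct = ["1010", "0110"]
wrong_key = ["1001", "0101"]   # half of the bits differ
inverted = ["0101", "1001"]    # every bit differs

hd_half = hamming_distance_percent(correct, wrong_key)
hd_full = hamming_distance_percent(correct, inverted)
hd_none = hamming_distance_percent(correct, correct)
```

The three calls correspond to the three cases discussed above: the desirable 50% case, the 100% case in which an attacker only has to invert the outputs, and the 0% case in which the key has no effect.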
If a key-input has the wrong value, its key-gate creates a fault (an inversion). The fault can propagate through the circuit. This behaves as if the value at the key-gate location in the original circuit were ‘1’ (‘0’) with a stuck-at-‘0’ (‘1’) fault on it. Based on this similarity, Rajendran et al. proposed an algorithm that relies on knowledge of fault sensitization and propagation [10]. In their fault-based analysis (FBA) algorithm, a net has a high fault-impact if one can easily sensitize a stuck-at fault on this net, and also easily propagate the fault to the circuit outputs. In each iteration, the proposed algorithm greedily searches all nets to find the one with the highest fault-impact. Instead of fault analysis, Samimi et al. proposed a variant of this algorithm that directly checks the key-gate effects on the output HD [14]. Both algorithms achieve high HD, but they face a scalability problem because in each iteration, the designer has to apply many random input vectors to calculate the key-gate effect or fault-impact.
Logic masking can be a good DfTr; however, its first goal is to hinder IP/IC pirates. Samimi et al. tried to remove rare signals using key-gates [29]. Such signals barely have transitions during IC operation either in the normal mode or test mode. They are good choices to feed an HT (so that the HT is rarely activated, in particular, during circuit testing). Dupuis et al. proposed the use of AND/NAND/OR/NOR gates instead of XOR/XNOR key-gates in order to decrease the number of low-controllable nets [28]. Removing rare and low-controllable signals does not guarantee that HTs would be fully activated and detected, because activation may depend on a lot of nets, with normal activity, such that only one rare combination of them activates the HT. It is noteworthy that if a circuit has no rare or low-controllable signals, making a rare event using normal events can be difficult, and HT attackers probably need a bigger trigger part. The bigger the HT, the more side-channel deviation appears. Thus, removing rare signals hinders HT attackers.
3 Escalation-Based Algorithm
3.1 Escalation
As mentioned in Section II, an HT might increase the delay of some paths. These paths can include either the payload part of the HT or the nets that drive the trigger part. Note that most of the nets in a circuit usually belong to more than one path. Furthermore, it was already mentioned that the additional delay is less detectable if it is investigated among long paths. Consequently, a net is not a suitable choice for the HT attacker if the net belongs to at least one short path. In other words, for each net, the shortest path passing through the net is the best option to investigate a potential HT interacting with the net. This path selection approach can be used in all path-delay-based HT detection methods. However, there may be nets whose shortest path is not short enough for a high HT detection probability. As a result, in ESCALATION, we aim to generate short fake paths using key-gates for such nets.
As in most logic masking methods, in the ESCALATION approach, modifications are done at the gate level and in the combinational part of circuits. The key-gates used in ESCALATION are XOR/XNOR and MUX cells. The ESCALATION flow has three main steps. In the first step, for each net, the shortest path passing through the net is found and selected. The second step is to sort the selected paths and find the longest one (MSP). The MSP has the least HDP among all the selected paths, and it includes the most vulnerable net. The third step is to insert a key-gate such that the most vulnerable net belongs to a new path shorter than the MSP. These three steps are repeated until all nets belong to at least one short-enough path or until all key-gates are inserted.
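The first two steps can be sketched on a small netlist. The sketch assumes a unit-delay model (one delay unit per cell, zero interconnect delay), identifies each net with the cell or primary input that drives it, and uses memoized recursion over the combinational DAG for brevity. The netlist, the function names, and the net names are hypothetical.

```python
from collections import defaultdict

def shortest_path_per_net(fanins, primary_outputs):
    """Return {net: (dpi, dpo)} in unit delays (one unit per cell).

    fanins maps the output net of each cell to the nets feeding that cell.
    Nets that are never a key are primary inputs. Every net is assumed to
    reach at least one primary output.
    """
    fanouts = defaultdict(list)
    for net, sources in fanins.items():
        for src in sources:
            fanouts[src].append(net)
    nets = sorted(set(fanins) | set(fanouts))
    dpi, dpo = {}, {}

    def to_pi(net):
        # Cells crossed from the nearest primary input up to this net.
        if net not in dpi:
            sources = fanins.get(net, [])
            dpi[net] = 1 + min(to_pi(s) for s in sources) if sources else 0
        return dpi[net]

    def to_po(net):
        # Cells crossed from this net down to the nearest primary output.
        if net not in dpo:
            if net in primary_outputs:
                dpo[net] = 0
            else:
                dpo[net] = 1 + min(to_po(f) for f in fanouts[net])
        return dpo[net]

    return {net: (to_pi(net), to_po(net)) for net in nets}

def find_msp(sp):
    """Return (MSP delay, nets whose shortest path has that delay)."""
    value = max(d_pi + d_po for d_pi, d_po in sp.values())
    nets = sorted(n for n, (d_pi, d_po) in sp.items() if d_pi + d_po == value)
    return value, nets

# Hypothetical netlist: a, b, c are primary inputs; n3 and n4 are outputs.
fanins = {"n1": ["a", "b"], "n2": ["n1"], "n3": ["n2", "c"], "n4": ["c"]}
sp = shortest_path_per_net(fanins, primary_outputs={"n3", "n4"})
msp_value, vulnerable = find_msp(sp)
```

In this toy netlist the nets `a`, `b`, `n1`, and `n2` only belong to the three-cell path ending at `n3`, so they share the MSP value of 3, whereas `c` also reaches `n4` through a single cell. The third step, key-gate insertion, would then target one of the vulnerable nets.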
3.2 Examples
In order to show how ESCALATION works, and to highlight some effective empirical hints, examples are presented below. The examples are at the gate level. The unit delay model is used in these examples. It is thus assumed that each cell in the netlist has 1 delay unit, and the interconnection delays are negligible.
1. Inserting an XOR/XNOR key-gate makes a shorter path if the delay from the nearest PI to the target net (Dpi) is greater than the delay from the target net to the nearest PO (Dpo) in the MSP. Otherwise, it is better to use a MUX key-gate.
2. Each inserted key-gate adds a delay unit (one level of gate) to all the paths that include the inserted key-gate. This may defeat the purpose. In order to avoid this problem, the defender should avoid inserting key-gates in selected paths longer than the maximum accepted MSP.
3. There is often more than one vulnerable net in the circuit. Thus, key-gate insertion should be done in such a way as to decrease the number of vulnerable nets as much as possible.
These problems and ideas are shown in the following examples and illustrated in Fig. 3.
Example 2: assume the path in Fig. 3b, including {PI_{2}, A, B, C, D, E, F, PO_{2}}, is the MSP with six delay units in the original circuit. In addition, nets ‘N_{2}’ and ‘N_{5}’ are the most vulnerable nets. In this figure, ‘X_{2}’ and ‘M_{2}’ are added to make shorter paths for ‘N_{2}’ and ‘N_{5}’. As shown, the insertion of ‘X_{2}’ before ‘B’ does not generate any shorter paths for ‘N_{2}’. Likewise, the insertion of ‘M_{2}’ before a PO and making a fan-out from ‘N_{6}’ does not make any shorter paths for ‘N_{5}’. For a net like N_{2}, whose Dpi is smaller than its Dpo, a MUX must be used. Otherwise, as for N_{5}, an XOR/XNOR key-gate must be used.
Example 3: assume that the path in Fig. 3c, including {PI_{3}, A, B, C, D, E, PO_{3}}, is the MSP with five delay units in the original circuit. In addition, nets ‘N_{3}’ and ‘N_{5}’ are the most vulnerable nets. In this case, the insertion of MUX ‘M_{3}’ before PO_{3} and making a fan-out from ‘N_{4}’ makes a shorter path for ‘N_{3}’ including {PI_{3}, A, B, C, M_{3}, PO_{3}}. It has 4 delay units. But this modification adds a delay unit to the shortest path of ‘N_{5}’. This path includes {PI_{3}, A, B, C, D, E, M_{3}, PO_{3}} and has 6 delay units. As a result, such POs must not be used.
Example 4: assume that the path in Fig. 3d, including {PI_{4}, A, B, C, D, F, G, PO_{4}}, is the MSP with 6 delay units in the original circuit. In addition, net ‘N_{5}’ is the most vulnerable net. Also assume that the shortest path of ‘N_{4}’ has 5 delay units. It includes {PI_{4}, A, B, C, D, E, PO_{5}}. If ‘X_{3}’ is inserted before ‘F’, a short path with 3 delay units, including {KI_{6}, X_{3}, F, G, PO_{4}}, would be made for N_{5}. The shortest path passing through ‘N_{4}’ would then be the MSP with 5 delay units. As seen in Fig. 3d, ‘X_{3}’ is inserted before ‘E’; thus, the shortest paths for ‘N_{5}’ and ‘N_{4}’ have 4 and 3 delay units, respectively. They include {KI_{6}, X_{3}, D, F, G, PO_{4}} and {KI_{6}, X_{3}, D, E, PO_{5}}. The lesson from this example is that it is sometimes better to consider a few paths other than just the MSP. As seen in this example, it is better to insert the XOR key-gate before ‘E’ to solve the MSP problem, because the shortest path of ‘N_{4}’ is a potential MSP.
3.3 Proposed Algorithm
In this section, we explain an algorithm based on the ESCALATION approach. Algorithm 1 shows the pseudocode of the proposed algorithm. The algorithm takes a gate-level netlist, a number as the key length (the number of key-gates), and an integer as the targeted MSP value (line 1). At first, the shortest path for each net of the circuit is found and stored in the set SIPSet (line 2). To find the shortest path for each net, a breadth-first search (BFS) is done from the target net to the primary inputs and primary outputs.
A structure is used to store the information on the shortest path (SIPInfo) for each net. It includes two pointers to the selected path (SP) and the target net (TN), and also two variables. One variable contains the delay from the net to its nearest PI (Dpi), and the other one contains the delay of the path from the net to its nearest PO (Dpo). The sum of these two variables is the delay of the path (Value). In order to find the MSP, the paths in SIPSet must be sorted according to their delay (Value). The MSP is stored in MSPSet[0]. In addition, the potential MSPs are gathered in a set named MSPSet (line 3). The potential MSPs are the paths that belong to SIPSet and have delays greater than the targeted delay for the MSP value.
The next step of the algorithm is a loop that includes the key-gate insertion functions (lines 8 and 10). As shown in example 1, if Dpi is greater than Dpo, many nets belonging to the fan-in cone of the target net can be used to insert an XOR/XNOR key-gate. Likewise, there are many candidates for inserting a MUX key-gate in the fan-out of the targeted net if Dpo is greater than Dpi. Please note that there is no need to search all the nets in the cones. A BFS can be performed in the cones with the maximum DeepSearch according to (1) (line 6):
After a key-gate is inserted, SIPSet is updated (line 14), and again, the MSP and the potential MSPs of the modified circuit are identified and stored in MSPSet (line 15). The loop is finished when MSPSet is empty or the number of inserted key-gates is more than the key length (line 5).
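The bookkeeping described in this section can be sketched as follows. This is a simplified illustration and not Algorithm 1 itself: the field names mirror SIPInfo (TN, SP, Dpi, Dpo, Value), the helper names are hypothetical, the key-gate choice follows the Dpi/Dpo rule stated in the examples, and the actual insertion and the cone search bounded by DeepSearch are omitted.

```python
from dataclasses import dataclass

@dataclass
class SIPInfo:
    target_net: str   # TN
    path: tuple       # SP: the selected shortest path through the net
    dpi: int          # delay from the nearest PI to the net
    dpo: int          # delay from the net to the nearest PO

    @property
    def value(self):
        """Delay of the selected path (Value = Dpi + Dpo)."""
        return self.dpi + self.dpo

def build_msp_set(sip_set, target_msp):
    """Paths longer than the targeted MSP, longest first (index 0 is the MSP)."""
    over = [s for s in sip_set if s.value > target_msp]
    return sorted(over, key=lambda s: s.value, reverse=True)

def choose_keygate(info):
    """XOR/XNOR in the fan-in cone when Dpi > Dpo, otherwise a MUX."""
    return "XOR/XNOR" if info.dpi > info.dpo else "MUX"

# Hypothetical SIPSet loosely modeled on a six-unit MSP.
sip_set = [
    SIPInfo("N2", ("PI2", "A", "B", "C", "D", "E", "F", "PO2"), 1, 5),
    SIPInfo("N5", ("PI2", "A", "B", "C", "D", "E", "F", "PO2"), 4, 2),
    SIPInfo("N7", ("PI1", "H", "J", "K", "PO1"), 1, 2),
]
msp_set = build_msp_set(sip_set, target_msp=4)
choices = {s.target_net: choose_keygate(s) for s in msp_set}
```

With a targeted MSP of 4, only the paths of `N2` and `N5` remain in the set. The rule selects a MUX for `N2`, which is close to a PI, and an XOR/XNOR for `N5`, which is close to a PO.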
The time and memory complexities of the proposed algorithm depend on the BFSs done to find the shortest path for each net. Please note that inside the loop, BFSs are also performed in order to find the most appropriate net for key-gate insertion, but the number of nets in the cones (considering DeepSearch and DeepRecalculate) is much smaller than the number of nets in the circuit, and their cost is thus negligible. The other parts of the algorithm have a fixed cost as well. Assuming we are working with an extracted graph of the circuit, BFS takes O(b^{d+1}) time and memory [30], where b is the branching factor (it is equal to the number of nodes in the biggest logic cone), and d is the distance from the starting node (it is equal to the maximum logic depth in the circuit). As a BFS must be done for each net and in each iteration, the order of time complexity of ESCALATION is O(nkb^{d+1}), where n and k are the number of nets in the circuit and the number of inserted key-gates, respectively. In a typical case, as n ≫ b ≫ d, the ESCALATION complexity order can be estimated as O(kn). This means that the time complexity of the ESCALATION algorithm increases linearly with the size of the circuit.
4 Measuring ESCALATION Efficiency
In the previous section, we presented an algorithm for key-gate insertion based on the ESCALATION approach. Since ESCALATION has two aims, there are two criteria that must be considered. First, how much logic masking is obtained. This is the primary aim in all masking methods. Second, how much the HDP of path-delay-based detection methods can be improved, an important goal of the ESCALATION approach. In addition, the key-gate overheads such as area and circuit performance must be noted.
4.1 Masking Quality
In Section II, it was mentioned that the number of output failures can be a simple metric to measure masking quality. The more output failures there are, the more masking is obtained. However, a more accurate metric is the average of the HD of correct and incorrect output bits while the circuit is fed by the correct and wrong key vectors. As a result, the masking quality of two circuits obtained by applying two different masking methods on one circuit can be evaluated by comparing these masking metrics.
4.2 HDP Calculation
In order to know the HDP improvement when path-delay analysis is used, the HDPs of the original circuit and of its masked circuits must be compared. Moreover, the MSP has the least HDP among the selected shortest paths. Hence, the HDP of the MSP can be considered as a metric.
In addition, according to PDF properties
where in (5), F_{x} is the cumulative distribution function of the HT-suspected PDF. Equation (5) can be used to calculate the HDP with less than 0.2% error. This negligible error comes from the use of the 3-sigma rule. In fact, the Golden PDF can have a value higher than (μ_{p} + 3σ_{p}) with less than 0.002 probability. In other words, two out of 1000 HT-free ICs are reported as HT-infected ICs. This fraction, illustrated by the dotted area in Fig. 4, is known as the false positive rate (FPR).
FPR is the fraction of HT-free ICs which are reported as HT-infected ICs. A higher FPR can be accepted in order to have a higher HDP. For example, in Fig. 4, HDP is increased to 100% by accepting more FPR, the dotted area. To avoid a high FPR, more accurate and costly HT detection methods, such as layout image processing, must be used. There can be a trade-off between the costs of FPR, other HT detection methods, and the trustworthiness gained.
In order to get a 100% HDP, F_{x} in eq. (5) should be zero. As a result, μ_{p} + 3σ_{p} in eq. (5) should be less than μ_{T} − 3σ_{T} of the HTsuspected PDF. The FPR required for this can be calculated by eq. (6), which is derived in the same way as the HDP equation.
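As an illustration, the HDP and the FPR required for a 100% HDP can be evaluated numerically. The sketch below assumes that both the Golden and the HTsuspected path delays are normally distributed, an assumption suggested by the use of the 3sigma rule. The function names, the example delay values, and the closed form used for the required FPR (the Golden probability mass above μ_{T} − 3σ_{T}) are assumptions made for this illustration and are not quoted from the equations of this work.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def hdp(mu_p, sigma_p, mu_t, sigma_t):
    """HDP = 1 - F_x(mu_p + 3*sigma_p), with F_x the HT-suspected CDF."""
    threshold = mu_p + 3.0 * sigma_p
    return 1.0 - normal_cdf(threshold, mu_t, sigma_t)

def required_fpr(mu_p, sigma_p, mu_t, sigma_t):
    """Assumed form of the required FPR.

    The detection threshold is lowered to mu_t - 3*sigma_t so that nearly
    all HT-infected ICs are caught; the FPR is then the fraction of the
    Golden distribution that lies above this threshold.
    """
    threshold = mu_t - 3.0 * sigma_t
    return 1.0 - normal_cdf(threshold, mu_p, sigma_p)

# Hypothetical path: Golden delay N(10, 1), HT adds one unit of delay.
example_hdp = hdp(10.0, 1.0, 11.0, 1.0)
example_rfpr = required_fpr(10.0, 1.0, 11.0, 1.0)
print(f"HDP  = {100 * example_hdp:.1f}%")
print(f"RFPR = {100 * example_rfpr:.1f}%")
```

With equal standard deviations, the two quantities computed this way add up to 100%, which agrees with the pattern of the HDP and RFPR columns reported for the gatelevel experiments.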
It is noteworthy that the spread of the Golden and HTsuspected PDFs depends on dietodie and withindie variation. For example, in 45 nm technology, these variations are 36 and 12%, respectively [31]. Together, these values make HT detection very difficult. Fortunately, they can be decreased using calibration structures. For instance, dietodie effects can be removed from the pathdelay analysis using the method proposed in [20].
5 Experimental Results
5.1 Experiment Setups and Assumptions
The experiments have been carried out on gatelevel circuits from ISCAS89 [32] and ISCAS85 [33]. First, the circuits were elaborated by the VERIFIC API [34]. Then, the proposed algorithm (using the unit delay model) was executed for different targeted MSP values. The algorithm was implemented in C++ using the VERIFIC API. Afterwards, all the modified circuits were synthesized by Design Compiler [35] and then placed and routed by SOCEncounter [36]. The NANGATE 45 nm technology was used during the synthesis and physical design [37].
In order to perform a fair comparison between the proposed algorithm and previous works, we tried to use the same experiment flows and assumptions. We compare both MSP reduction and HDP improvement with the results of the TDR algorithm [18]. We also compare the logic masking quality of the algorithm with that of [6, 10], based on the number of output failures and HD. Finally, we report layoutlevel results.
5.2 HDP Results in Gate Level
Shekarian et al. used the unit delay model in their TDR algorithm [18]. They also reported the HDP improvement and MSP reduction at this level. In this model, zero correlation among the delay variations of cells is assumed, which is unrealistic. The authors tried to compensate for this by assuming a higher delay variability: 60% cell delay variation due to process variation.
Among all MSP values reachable by ESCALATION, we select the two for which the area overhead of ESCALATION is as close as possible to the area overhead of TDR; these are used in Fig. 6. The corresponding ESCALATION executions are named ESCALATION1 and ESCALATION2 in Fig. 6: ESCALATION1 needs slightly less area overhead than TDR, and ESCALATION2 slightly more. Figure 6a shows the MSP values obtained by these two ESCALATION executions and by TDR. In Fig. 6b, the area overheads of the two executions and of TDR are compared. Shekarian et al. reported only the number of flipflops (FFs) added by TDR as its area overhead [18]. In fact, the area overhead of an added FF (including its area and the required clock routing) is much larger than that of a keygate. Thus, we calculate the TDR area overhead by multiplying the reported percentage of added FFs by the percentage of the circuit area occupied by the sequential part. For example, for circuit s9234, the TDR algorithm adds 36 FFs, which equals 17% of the number of FFs in the circuit. As the sequential part of s9234 corresponds to 58% of the circuit area after synthesis, the modified circuit has a 9.9% area overhead. In Fig. 6b, the area overheads of ESCALATION1 and ESCALATION2 are taken from the area reports of Synopsys Design Compiler [35].
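The TDR area overhead estimate for s9234 can be reproduced with a few lines. The function name is hypothetical; the two percentages are the ones quoted above.

```python
def tdr_area_overhead_percent(added_ff_percent, sequential_area_percent):
    """Estimate the TDR area overhead as a percentage of the circuit area.

    added_ff_percent: added FFs as a percentage of the original FF count.
    sequential_area_percent: share of the circuit area taken by the
    sequential part after synthesis.
    """
    return added_ff_percent * sequential_area_percent / 100.0

# s9234: 36 added FFs are 17% of the FFs; sequential part is 58% of the area.
s9234_overhead = tdr_area_overhead_percent(17.0, 58.0)
print(f"s9234 TDR area overhead = {s9234_overhead:.1f}%")  # prints 9.9%
```

This estimate treats every added FF as having the average FF area of the circuit, which is the implicit assumption behind multiplying the two percentages.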
Figure 6 illustrates that the TDR algorithm achieves a slightly smaller MSP value with almost the same area overhead for the three smallest circuits; however, in one circuit (s13207), the ESCALATION algorithm achieves a better MSP reduction with less area overhead. Note that heuristic algorithms do not always achieve the optimal result. Since the TDR algorithm is itself a heuristic, it is difficult to determine why it does not always give better results than the ESCALATION algorithm.
MSP value, HDP, and required FPR (RFPR, the FPR needed to obtain 100% HDP) before and after performing the ESCALATION algorithm on sequential circuits, accepting 20% area overhead and assuming the unit delay model and 60% cell delay variation
Circuit  Original MSP value  Original HDP (%)  Original RFPR (%)  Masked MSP value  Masked HDP (%)  Masked RFPR (%)  No. of used keygates  Area overhead (%)
S13207  35  10  91  14  38  63  741  17.7
S35814  38  8  93  13  41  60  297  17.2
S9234  43  5  96  16  31  70  165  22
S5378  21  23  78  11  51  50  109  21
S1423  20  22  78  7  80  21  67  21.5
S1196  18  24  77  10  55  46  31  21.1
Average  30  15  86  12  49  52  –  20
5.3 Masking Quality
Comparing Hamming distance results of random masking, ESCALATION, and FBA algorithms
Circuit  No. of keygates  Random (%)  ESCALATION (%)  FBA (%)
C432  17  26  37  50
C499  40  3  22  50
C880  28  16  17  50
C1355  42  13  25  50
C1908  28  9  25  50
C3540  22  15  13  50
C5315  97  10  20  45
C6288  27  24  8  50
C7552  89  8  18  48
Average  –  12  21  49
6 LayoutLevel Validation
In Section 5, we used the same gatelevel assumptions as the authors of [18], in order to compare fairly with their work. Two of these assumptions are (1) 60% cell delay variation and (2) no correlation among the delay variations of the cells in a path. In fact, some components of withindie variation are physically dependent and correlated [24]. The lack of layoutlevel information at the gate level and RTL forces designers to use simple delay and variation models, as the authors of [18] have done.
In order to achieve more accurate results, we performed experiments at the layout level, after placement and routing. The experiments include HT insertion in the MSP and HDP calculation. The masked circuits are placed and routed with SOCEncounter [36]. The shortest path for each net and the MSP of each circuit are then found using TCL scripts. Afterward, an AND gate is inserted in the MSP as an HT. Note that an AND gate is the smallest functional HT that one can use. The MSP delays before and after HT insertion are obtained. The HDP of the MSP is calculated according to formula (5).
For the HDP calculation, we consider 12% pathdelay variation due to withindie variation, according to [31]. In that work, the authors fabricated ICs containing ROs of different lengths. The ROs were inserted in different locations of the layout design. The ICs were fabricated using different layout design styles in the 45 nm technology. The reports in [31] show that there are around 36 and 12% of pathdelay variations due to dietodie and withindie variation, respectively. Thanks to calibration methods such as [20], the effects of dietodie variations can be removed from the pathdelay analysis. In addition, defenders can use any structure for measuring pathdelay, such as the ones given in [19, 20].
HDP of MSP in the original and masked circuits; performance and area overhead in layout level
Circuit  Original performance  Original MSP  Original HDP (%)  Masked performance  Masked MSP  Masked HDP (%)  Performance overhead (%)  No. of used keygates  Area overhead (%)
C432  1.47 ns  0.45 ns  93  1.47 ns  0.31 ns  100  0  17  16
C499  0.79 ns  0.39 ns  100  –  –  –  –  –  –
C880  1.06 ns  0.32 ns  100  –  –  –  –  –  –
C1355  1.02 ns  0.40 ns  99  –  –  –  –  –  –
C1908  1.08 ns  0.32 ns  100  –  –  –  –  –  –
C3540  1.51 ns  0.47 ns  90  1.66 ns  0.47 ns  90  10  22  13
C5315  1.11 ns  0.45 ns  93  1.25 ns  0.38 ns  99  13  97  19
C6288  4.32 ns  1.23 ns  8  4.55 ns  0.81 ns  30  5  27  3
C7552  2.15 ns  0.77 ns  34  2.19 ns  0.66 ns  51  2  89  21
Average  –  –  64  –  –  74  6  –  14
Regarding the power overhead of masking methods, it is noteworthy that masked circuits work in their functional mode with the correct key; thus, the added keygates are totally transparent in this mode. They do not add any transitions in functional mode. Hence, they only contribute leakage power and do not add any dynamic power. The power overhead is not reported in this work because the static power of a few keygates in 45 nm technology is negligible.
7 Discussions
 1. ESCALATION is a DfTr approach and hinders HT insertion in two ways. First, it masks a circuit; thus, HT attackers cannot have good knowledge of the original functionality of the circuit. Second, using the ESCALATION approach, if all nets of a circuit belong to at least one shortenough path, the circuit is more sensitive to HT delay effects. In such situations, in order to hide these effects, an HT attacker can increase the drive strength and capacitive load of the cells that precede and follow the HT. Cells having more drive strength and capacitive load consume more power. As a result, increasing the drive strength and capacitive load increases the success probability of powerbased HT detection methods. Thus, pathdelay and powerbased HT detection methods should be combined.
 2. The ESCALATION approach is based on logic masking, and it only modifies the combinational part of circuits at the gate level. As explained in Section 2, logic masking can be used as one step of masking an RTL netlist, where it is used to change the state transition graph of the RTL netlist [16]. If we change the combinational part of a sequential circuit, then wrong keys create wrong values in the FFs and also in the POs. Output failures in a masked sequential circuit are the result of wrong keys and, consequently, of wrong values in the FFs. Thus, key detection in sequential circuits is more difficult than in simple combinational ones.
 3. In order to have better HDP results in the ESCALATION approach, an ESCALATIONbased algorithm can be implemented at designabstraction levels lower than the gate level, such as after performing synthesis, placement, or routing. At the lower levels, there is more information about the delay components of nets and cells; thus, pathdelay calculation and finding the shortest path for the nets are more accurate (and certainly more complex) than at the gate level. If designers implement any logic masking method after placement and routing, they must incrementally perform a logic optimization after inserting all the keygates. This optimization is done in order to dissolve the keygates into the combinational part of the circuit. For example, if an inverter in the original circuit is preceded by an XOR keygate, a logic optimizer might merge the two into an XNOR keygate. As a result, the masked circuit has an XNOR keygate for which the correct key is ‘0’, although the correct key of an XNOR keygate would seem to be ‘1’ at first glance.
 4. In order to improve the HD results in the ESCALATION approach, both HD achievement and MSP reduction can be considered simultaneously. As mentioned in Section 3, the objective of the proposed ESCALATION algorithm is just to reduce the MSP. In each iteration of the proposed algorithm, there might be more than one suitable net for this objective. Among these nets, other objectives can be investigated. For instance, we have often seen that there are a few nets for which keygate insertion can make a shorter path for the MSP. Currently, one of these nets is selected randomly, but one can instead investigate which one has a better effect on HD. This would, however, increase the time and memory complexity of the algorithm.
 5. The area overheads reported in the previous section seem high; however, it must be taken into account that there is no need to mask a whole large circuit. HT and IP piracy threats are important in the securitycritical parts of circuits. An HT inserted in unmasked parts will not have critical effects. In systemsonchips, there are many IPs that are freely available, so their theft is not a concern. A large circuit can therefore be partitioned into subcircuits, and only the securitycritical parts masked. In this case, the area overhead is much less than that reported in this work. For instance, Rajendran et al. masked some small parts of a microprocessor (e.g., thread switch, DMA controller, FP unit, etc.) [10]. As shown in Table 3, in the worst case, we have 21% area overhead, which is considerable. But if the securitycritical part of a circuit, which should be masked, occupies just 10% of the circuit area, the area overhead is only 2.1%.
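The keypolarity remark in discussion item 3 can be checked exhaustively on a onebit example. The sketch below is a hypothetical illustration with made-up function names: the original logic is a single inverter, an XOR keygate with correct key ‘0’ is inserted before it, and the optimizer is assumed to merge the two gates into one XNOR keygate.

```python
def inverter(x):
    return 1 - x

def xor_keygate(net, key):
    return net ^ key

def xnor_keygate(net, key):
    return 1 - (net ^ key)

def original(a):
    """Original (unmasked) logic: a single inverter."""
    return inverter(a)

def masked_before_optimization(a, key):
    """XOR keygate inserted in front of the inverter."""
    return inverter(xor_keygate(a, key))

def masked_after_optimization(a, key):
    """Assumed optimizer result: inverter and XOR merged into one XNOR."""
    return xnor_keygate(a, key)

def correct_keys(masked):
    """Keys for which the masked logic matches the original for all inputs."""
    return [k for k in (0, 1)
            if all(masked(a, k) == original(a) for a in (0, 1))]

# The merge preserves the function for every input and key value.
for a in (0, 1):
    for k in (0, 1):
        assert masked_before_optimization(a, k) == masked_after_optimization(a, k)

# An XNOR gate is transparent for key 1, which is why '1' looks like the
# natural guess, yet the key that restores the original function is 0.
print("correct key(s) after optimization:", correct_keys(masked_after_optimization))
```

The check shows that the gate type visible in the optimized netlist does not by itself reveal the correct key bit.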
8 Conclusion and Future Works
In this paper, we have presented a new DfTr approach, called ESCALATION, which leverages logic masking in order to enhance HT detection based on pathdelay analysis. Its objective is to reduce the MSP value of the circuit. MSP value reduction is of major interest for HT detection: it increases the HDP of the most vulnerable net. Based on the ESCALATION approach, we proposed an algorithm that identifies the most vulnerable net in the circuit and then inserts a keygate before or after this net. According to the delay of the target net to the PIs or POs, an XOR or MUX keygate is used by the algorithm.
Simple formulas for calculating both HDP and RFPR have been proposed and proven. Using the formulas, HDP has been calculated considering a 60% cell delay variation at the gate level. Furthermore, at the layout level, HDP has been calculated considering 12% pathdelay variation. The layoutlevel experiments and results show that the ESCALATION algorithm is capable of improving the HDP of the MSP by 35%.
In addition, the logic masking quality of the ESCALATION algorithm was investigated according to two metrics: the number of failed outputs and the HD of the output bits. We compared the ESCALATION algorithm to the HARPOON algorithm [6]. Experiments show that ESCALATION can reach a good level of logic masking quality, as good as HARPOON’s, by accepting a bit more area overhead. Moreover, the HD of masked circuits using the ESCALATION algorithm was calculated. The results are much better than those attained by random masking [27]. However, they are not as good as the faultbasedanalysis (FBA) results [10]. In addition, we have also discussed how to improve the HD of masked circuits obtained by the ESCALATION approach.
Footnotes
 1. Ring oscillators (ROs) generate oscillations. They consist of an odd number of NOT gates (or gates having an inversion function, such as NOR/NAND gates) connected in a loop, so that the output of the last NOT gate is fed back into the first NOT gate.
References
 1. Mishra P, Tehranipoor M, Bhunia S (2017) Security and trust vulnerabilities in thirdparty IPs. In: Hardware IP security and trust. Springer, Cham, pp 3–14
 2. Agrawal D, Baktir S, Karakoyunlu D, Rohatgi P, Sunar B (2007) Trojan detection using IC fingerprinting. In: Security and privacy, 2007. SP'07. IEEE Symposium on, pp 296–310. IEEE
 3. Li H, Liu Q, Zhang J (2016) A survey of hardware Trojan threat and defense. Integr VLSI J 55:426–437
 4. Lecomte M, Fournier J, Maurine P (2017) An onchip technique to detect hardware Trojans and assist counterfeit identification. IEEE Trans Very Large Scale Integr (VLSI) Syst 25(12):3317–3330
 5. Yu Q, Dofe J, Zhang Y, Frey J (2017) Hardware hardening approaches using camouflaging, encryption, and obfuscation. In: Hardware IP security and trust. Springer, Cham, pp 135–163
 6. Chakraborty RS, Bhunia S (2009) HARPOON: an obfuscationbased SoC design methodology for hardware protection. IEEE Trans Comput Aided Des Integr Circuits Syst 28(10):1493–1502
 7. Dofe J, Yu Q (2017) Novel dynamic statedeflection method for gatelevel design obfuscation. IEEE Trans Comput Aided Des Integr Circuits Syst
 8. Rajendran J, Pino Y, Sinanoglu O, Karri R (2012) Security analysis of logic obfuscation. In: Proceedings of the 49th Annual Design Automation Conference, pp 83–89. ACM
 9. Zhang J (2016) A practical logic obfuscation technique for hardware security. IEEE Trans Very Large Scale Integr (VLSI) Syst 24(3):1193–1197
 10. Rajendran J, Zhang H, Zhang C, Rose GS, Pino Y, Sinanoglu O, Karri R (2015) Fault analysisbased logic encryption. IEEE Trans Comput 64(2):410–424
 11. Plaza SM, Markov IL (2015) Solving the thirdshift problem in IC piracy with testaware logic locking. IEEE Trans Comput Aided Des Integr Circuits Syst 34(6):961–971
 12. Yasin M, Rajendran JJ, Sinanoglu O, Karri R (2016) On improving the security of logic locking. IEEE Trans Comput Aided Des Integr Circuits Syst 35(9):1411–1424
 13. Dutta RG, Guo X, Jin Y (2017) IP trust: the problem and design/validationbased solution. In: Fundamentals of IP and SoC security. Springer, Cham, pp 49–65
 14. Samimi SMS, Aerabi E, Nejat A, Fazeli M, Hely D, Beroulle V (2016) High output hammingdistance achievement by a greedy logic masking approach. In: EastWest Design & Test Symposium (EWDTS), 2016 IEEE, pp 1–4. IEEE
 15. Colombier B, Bossuet L, Hély D (2017) Logic modificationbased IP protection methods: an overview and a proposal. In: Foundations of hardware IP protection. Springer, Cham, pp 37–64
 16. Chakraborty RS, Bhunia S (2011) Security against hardware Trojan attacks using keybased design obfuscation. J Electron Test 27(6):767–785
 17. Nejat A, Hely D, Beroulle V (2016) How logic masking can improve path delay analysis for Hardware Trojan detection. In: Computer Design (ICCD), 2016 IEEE 34th International Conference on, pp 424–427. IEEE
 18. Shekarian SMH, Zamani MS (2015) Improving hardware Trojan detection by retiming. Microprocess Microsyst 39(3):145–156
 19. Nejat A, Shekarian SMH, Zamani MS (2014) A study on the efficiency of hardware Trojan detection based on pathdelay fingerprinting. Microprocess Microsyst 38(3):246–252
 20. Cha B, Gupta SK (2013) Trojan detection via delay measurements: a new approach to select paths and vectors to maximize effectiveness and minimize cost. In: Proceedings of the conference on design, automation and test in Europe, pp 1265–1270. EDA Consortium
 21. Hoque T, Narasimhan S, Wang X, MalSarkar S, Bhunia S (2017) Goldenfree hardware Trojan detection with high sensitivity under process noise. J Electron Test 33(1):107–124
 22. Jin Y, Makris Y (2008) Hardware Trojan detection using path delay fingerprint. In: Hardwareoriented security and trust, 2008. HOST 2008. IEEE International Workshop on, pp 51–57. IEEE
 23. Rai D, Lach J (2009) Performance of delaybased Trojan detection techniques under parameter variations. In: Hardwareoriented security and trust, 2009. HOST'09. IEEE International Workshop on, pp 58–65. IEEE
 24. Blaauw D, Chopra K, Srivastava A, Scheffer L (2008) Statistical timing analysis: from basic principles to state of the art. IEEE Trans Comput Aided Des Integr Circuits Syst 27(4):589–607
 25. Ferraiuolo A, Zhang X, Tehranipoor M (2012) Experimental analysis of a ring oscillator network for hardware Trojan detection in a 90nm ASIC. In: Proceedings of the International Conference on ComputerAided Design, pp 37–42. ACM
 26. Lamech C, Plusquellic J (2012) Trojan detection based on delay variations measured using a highprecision, lowoverhead embedded test structure. In: HardwareOriented Security and Trust (HOST), 2012 IEEE International Symposium on, pp 75–82. IEEE
 27. Roy JA, Koushanfar F, Markov IL (2010) Ending piracy of integrated circuits. Computer 43(10):30–38
 28. Dupuis S, Ba PS, Di Natale G, Flottes ML, Rouzeyre B (2014) A novel hardware logic encryption technique for thwarting illegal overproduction and hardware trojans. In: OnLine Testing Symposium (IOLTS), 2014 IEEE 20th International, pp 49–54. IEEE
 29. Samimi MS, Aerabi E, Kazemi Z, Fazeli M, Patooghy A (2016) Hardware enlightening: nowhere to hide your hardware Trojans! In: OnLine Testing and Robust System Design (IOLTS), 2016 IEEE 22nd International Symposium on, pp 251–256. IEEE
 30. Russell SJ, Norvig P, Canny JF, Malik JM, Edwards DD (2003) Artificial intelligence: a modern approach (Vol. 2, No. 9). Prentice Hall, Upper Saddle River
 31. Pang LT, Qian K, Spanos CJ, Nikolic B (2009) Measurement and analysis of variability in 45 nm strainedSi CMOS technology. IEEE J Solid State Circuits 44(8):2233–2243
 32. The ISCAS89 Benchmark Circuits. [Online]. Available: http://www.pld.ttu.ee/~maksim/benchmarks/iscas89/
 33. The ISCAS85 Benchmark Circuits. [Online]. Available: http://pld.ttu.ee/~maksim/benchmarks/iscas85/
 34. Verific Design Automation Inc. [Online]. Available: http://www.verific.com
 35. Synopsys Design Compiler. [Online]. Available: http://www.synopsys.com/Tools/Implementation/RTLSynthesis/DesignCompiler/Pages/default.aspx
 36. Cadence SOC Encounter. [Online]. Available: https://www.cadence.com
 37. NanGate - The Standard Cell Library Optimization Company. [Online]. Available: http://www.nangate.com/