Identification and Rejuvenation of NBTI-Critical Logic Paths in Nanoscale Circuits
Abstract
The Negative Bias Temperature Instability (NBTI) phenomenon is widely regarded as one of the main reliability concerns in nanoscale circuits. It increases the threshold voltage of pMOS transistors and thus slows down signal propagation along logic paths between flip-flops. NBTI may cause intermittent faults and, ultimately, permanent functional failures of the circuit. In this paper, we propose an innovative NBTI mitigation approach that rejuvenates the nanoscale logic along NBTI-critical paths. The method is based on hierarchical identification of NBTI-critical paths and the generation of rejuvenation stimuli using an Evolutionary Algorithm. A new, fast, yet accurate model for computing NBTI-induced delays at gate level is developed. This model is based on intensive SPICE simulations of individual gates. The generated rejuvenation stimuli are used to drive the pMOS transistors that are most critical for the NBTI-induced path delay into the recovery phase. The rejuvenation procedure is intended to be applied to the circuit periodically, as an execution overhead. Experimental results obtained on a set of designs demonstrate a reduction of NBTI-induced delays by up to a factor of two with an execution overhead of 0.1 % or less. The proposed approach is aimed at extending the reliable lifetime of nanoelectronics.
Keywords
Hardware rejuvenation, Aging, NBTI, Critical path identification, Logic circuit, Evolutionary computation, MicroGP, zamiaCAD
1 Introduction
Guaranteeing lifetime reliability is a key challenge in current nanoscale semiconductor manufacturing processes. One of the most critical downsides of technology scaling beyond the 65 nm node is the non-determinism of the devices’ electrical parameters caused by time-dependent variations [18] in the operating characteristics of the device. Two essential sources of time-dependent variations have been identified: Bias Temperature Instability (BTI), and Hot Carrier Injection (HCI) [28]. These physical/chemical effects result in the degradation of the oxide thus causing a drift of the Threshold Voltage (V_{TH}) over time. In terms of magnitude, BTI has become the most prominent effect. In fact, BTI creates the interface traps along the entire silicon-oxide interface and is thus sensitive to temperature and the vertical electric field. On the contrary, hot carrier generation only damages the interface near the drain side, which makes it basically depend on the lateral electric field. As devices shrink, the influence of the temperature and the vertical electric field has largely surpassed the influence of the lateral field [4].
BTI manifests in two distinct forms, depending on the type of transistor involved: Negative BTI (NBTI), which affects pMOS transistors, and its counterpart Positive BTI (PBTI), which affects nMOS devices. In current technologies, the impact of PBTI is much lower than NBTI. Therefore, this paper specifically addresses the Negative Bias Temperature Instability (NBTI) phenomenon [5]. It is worth mentioning that the importance of PBTI is expected to increase, particularly with the adoption of high-k, Hafnium-based dielectrics in the gate-oxide for leakage reduction [24].
1.1 Preliminaries
NBTI is defined as the effect that occurs when a pMOS transistor is negatively biased. The effect manifests itself as an increase of the pMOS transistor threshold voltage |V_{THp}| over time. This leads to drive current reduction and noise increase, which in turn causes a degradation of the device delay. NBTI’s impact on the long-term stability of functional logic is expressed through the incapability of storing a correct value in memory elements such as flip-flops. This effect is due to the de-synchronization between clock distribution and signal propagation through logic paths of a circuit. Therefore, after several years of circuit operation time, the NBTI-induced delays may cause, first, intermittent faults and, ultimately, permanent functional failures in the circuit [17].
The variation of V_{THp} of pMOS transistors due to dynamic NBTI (i.e. when the stress and recovery phases alternate) is estimated to be 5–15 % per year [8, 13]. The exact value depends on the targeted technology and the environment (e.g. ambient temperature and user workload). The V_{THp} shift due to static NBTI (i.e. when a transistor is under constant stress) can be significantly higher. The path delay degradation follows the same trend, though with a smaller magnitude. It has been shown that NBTI depends on many factors [22], but its strongest correlation is with the signal probability P_{z} (input duty cycle). The signal probability P_{z}(x_{i}) for a logic gate’s input x_{i} is defined as the fraction of time during which the input signal x_{i} is set to logic 0.
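The definition of P_{z} can be illustrated with a minimal sketch. It assumes a uniformly sampled trace (one sample per clock cycle), so that the fraction of time at logic 0 equals the fraction of samples at 0; the function name and the trace are hypothetical and not part of the original flow.

```python
from typing import Sequence


def signal_probability(samples: Sequence[int]) -> float:
    """Return P_z: the fraction of samples in which the signal is logic 0.

    Assumption: samples are taken at uniform intervals, so the sample
    ratio equals the time ratio used in the definition of P_z.
    """
    if not samples:
        raise ValueError("need at least one sample")
    zeros = sum(1 for value in samples if value == 0)
    return zeros / len(samples)


# Hypothetical trace of one gate input, one sample per clock cycle.
trace = [0, 0, 1, 0, 1, 1, 0, 0]
p_z = signal_probability(trace)  # 5 zeros out of 8 samples
```

A value of P_{z} close to 1 means the corresponding pMOS transistor is stressed almost all the time, which approaches the static NBTI case described above.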
1.2 Contributions
In this paper, we propose a novel approach to mitigate NBTI using rejuvenation of nanoscale logic at NBTI-critical paths with dedicated stimuli sequences. The method is based on fast and hierarchical identification of NBTI-critical paths at gate level and the rejuvenation stimuli generation using an Evolutionary Algorithm. Note that this work does not aim at contributing to the development of a new transistor-level model for the underlying NBTI physical and chemical processes and assumes the existing models as an input.
First, based on SPICE electrical simulations, an accurate gate-level model for fast computation of NBTI-induced path delay degradation Δt_{path} and identification of NBTI-critical paths is developed. In more detail, this method is based on the estimation of NBTI-induced delays for individual gates Δt_{gate} along selected timing-critical paths followed by a static timing analysis. Experiments with an industrial ALU circuit show a good match between the gate-level approach and the electrical simulation results, with a simulation speed-up of several orders of magnitude.
Second, an approach to create rejuvenation stimuli for the identified NBTI-critical paths using an evolutionary stimuli generation algorithm is proposed. In this paper, we exploit a general-purpose evolutionary toolkit called μGP [30, 31] and find a suitable fitness function using the open source hardware analysis framework zamiaCAD [6]. The advantage of such a flow lies in its flexibility in resolving the dependencies among the impacts of individual gates on the most critical NBTI-induced path delay by means of evolutionary optimization.
The generated rejuvenation stimuli are applied at predefined periods in order to drive pMOS transistors to the recovery phase. Thus, the proposed approach aims at extending the reliable lifetime of nanoelectronics.
The remainder of the paper is organized as follows. Section 2 provides an overview of the related work. Section 3 introduces SPICE-inspired models for NBTI-induced gate delay calculation. Section 4 proposes a method for fast gate-level identification of NBTI-critical logic paths. Section 5 introduces a flow for evolutionary generation of rejuvenation stimuli and presents the corresponding experimental results. Section 6 discusses the applicability and limitations of the rejuvenation procedure. Section 7 concludes the paper.
2 Overview of Related Work
Previous works appearing in the literature address the NBTI problem both for memories [3, 13] and for functional logic [22]. Usually, to mitigate the impact of NBTI on the circuit’s lifetime these approaches adopt redesign strategies, voltage and frequency scaling and internal node control guided by monitoring attributes or design structure analysis.
The work in [20] proposes a redesign approach for functional logic based on transistor sizing technique that not only mitigates NBTI induced delay of the gate under consideration, but also minimizes its impact on the adjacent gates. This technique appears to be very effective, but it is mandatory to provide the critical gates and paths to which it should be applied. Otherwise, this technique will result in an unacceptable area overhead and eventually, excessive power consumption. In [23] the authors present a method, which characterizes the delay of every gate in a standard cell library as a function of the signal probability (P_{z}) of each of its inputs and suggests an NBTI-aware synthesis accordingly. It demonstrates an average of 10 % area recovery for 65 nm technology under the pessimistic assumption that all pMOS transistors in the design are under constant static NBTI stress. Although the calculation process in [23, 40], allowing the derivation of aging curves for logic components, is handy, it is also prohibitively time consuming. This is due to an extremely large number of stress recovery cycles that have to be computed. The work in [35] proposes an approach for temporarily hiding NBTI-induced aging by applying changes to voltage and frequency of the circuit.
Approaches to analyze the efficiency of controlling input signal probability for mitigating NBTI at circuit level were proposed in [14, 38]. Works in [2, 26] propose to exploit the idle time of processors and unused bits in source operands [16]. A very relevant approach for processor circuits’ rejuvenation is presented in [15], where authors propose to replace the default NOP with a special “maximum aging reduction NOP instruction” that, while having no effect on the program state, minimizes the NBTI effect. The results show that this method can extend circuit lifetime by an average of 37 %, with performance, power, and area overhead within 1 %.
Unlike the state of the art, this paper uses a novel approach of approximation by mathematically convenient functions to calculate the gate and path delay degradations caused by NBTI aging. In addition, to the best of our knowledge, it is the first time that evolutionary algorithms are applied to the task of rejuvenation stimuli generation.
- a) it is based on fast, yet accurate hierarchical gate-level identification of NBTI-critical paths where rejuvenation has to be applied;
- b) it proposes efficient rejuvenation stimuli generation using an evolutionary algorithm;
- c) it does not require redesign and can be applied to an existing circuit, exploiting, if necessary, the existing design-for-testability instruments.
3 NBTI-Induced Gate Delay Models
Step 1: Obtaining a curve for ∆V_{THp} as a function of P_{z} for selected environmental variables, technology and a given time of operation in years. (Section 3.1 presents the details for deriving the equation).
Step 2: Performing extensive electrical simulations of individual gates in SPICE to obtain curves of degradation of delay Δt for each input of each gate type under the stressed conditions, i.e. when the corresponding input switches from 1 to 0. Δt will be represented as the percentage of change of the nominal delay t for the given gate. (Section 3.2 explains the process of obtaining the curves for Δt).
Step 3: Developing approximated polynomial equations for the aging curves obtained in Steps 1 and 2. This step enables mathematically convenient calculation of NBTI-critical paths at the gate-level, as to be presented in Section 4.
Note that, as a preprocessing step, complex gates are flattened into NAND, NOR and inverter stages (e.g. an AND gate is represented by a NAND gate followed by an inverter).
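Step 3 can be sketched as an ordinary least-squares polynomial fit. The example below is only an illustration under stated assumptions: the helper names `polyfit` and `polyval` are hypothetical, the polynomial degree is chosen arbitrarily, and the sample points are synthetic stand-ins for the SPICE-derived aging curves, not data from the paper.

```python
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit; returns coefficients c[0] + c[1]*x + c[2]*x**2 + ..."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting (assumes a non-singular system).
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted polynomial with Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Synthetic samples (NOT measured data): threshold shift in volts versus
# delay degradation in percent, generated from an exactly quadratic curve.
dvth = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
delta_t = [100.0 * v * v + 20.0 * v for v in dvth]
coeffs = polyfit(dvth, delta_t, degree=2)
```

Once the coefficients are stored per gate type and per input, evaluating `polyval` replaces an electrical simulation during path analysis, which is where the speed-up of a gate-level model would come from.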
3.1 Modeling Dependency of pMOS Transistor V_{THp} Shift on Signal Probability P_{z}
3.2 Modeling NBTI-induced Gate Delay Degradation Based on Electrical Simulations in SPICE
Modeling of NBTI-induced gate delay degradation relies on intensive SPICE electrical simulations, which in this work were performed using the Synopsys HSPICE simulator. The selected technology was the 65 nm PTM [44] with V_{DD} = 1.1 V and V_{THp} = −0.365 V. For an alternative technology, it would be necessary to repeat the same gate aging characterization procedure in SPICE.
The SPICE simulation process consisted of simulating the basic cells of the technology library for different V_{THp} shift values ∆V_{THp} in order to capture the dependence of gate output delay on V_{THp}. The output load capacitance C_{L} at the gate output was chosen to be 1.0 fF in accordance with the selected 65 nm technology. Figure 3 displays the typical n- and p-network device interconnections for the Inverter, NAND and NOR gates considered for simulation.
According to Eq. (3), the maximum V_{THp} shift under static NBTI can be up to 0.27 V (for 10 years of NBTI-induced aging). Therefore, we performed SPICE simulations increasing the ∆V_{THp}(x_{i}) value step by step from 0 V to 0.27 V with a 2.5 % sampling step to obtain the gate output delay degradation for basic gates.
- a) IN_{A} = 0 → 1, IN_{B} = 1 (poly1);
- b) IN_{A} = 1, IN_{B} = 0 → 1 (poly2);
- c) IN_{A} = 0 → 1; IN_{B} = 0 → 1 (poly3).
The largest difference appears between the values of curves a) and b). It increases up to 15 % when ∆V_{THp} surpasses 0.24 V, which corresponds to a signal probability P_{z}(x_{i}) very close to 1. Moreover, for P_{z}(x_{i}) values below 0.9, the mean difference is still significant, up to 4 %. This difference is caused by the location of the stressed pMOS transistor in the net (see Fig. 4) and shows the necessity of considering which pMOS transistor of the gate is under NBTI stress, since this affects the gate output delay degradation.
- a) IN_{A} = 0 → 1, IN_{B} = 0 (poly1);
- b) IN_{A} = 0, IN_{B} = 0 → 1 (poly2);
- c) IN_{A} = 0 → 1; IN_{B} = 0 → 1 (poly3).
Here, the difference between curves b) and c) is over 20 % if V_{THp} is shifted by 0.2 V or more.
The SPICE experiments show that only the 0 → 1 transition at gate inputs reveals the NBTI-induced gate delay, while the 1 → 0 transition is computed by the gate with no or negligible NBTI-induced gate delay Δt_{gate}. This can be explained by the fact that during the time the pMOS device in the p-network ages, it facilitates the task of discharging the gate output capacitance by the nMOS device, placed in the n-network. The exceptions are NOR gates, especially the NOR gates with multiple inputs, where gate delay degradation Δt_{gate} for input 1 → 0 transition becomes even negative, i.e. the transition delay is decreased compared to the nominal one.
4 NBTI-Critical Logic Paths
The approximation of the curves to mathematically convenient polynomial equations proposed in Section 3 enables fast hierarchical identification of NBTI-critical logic paths at the gate-level. It is based on simulation of signal probabilities, static timing analysis with nominal delays and calculation of the longest NBTI-degraded path using NBTI-induced path delays. In order to explain the main concepts, let us introduce some basic definitions.
Definition 1
An NBTI-critical path is a path in the circuit whose delay d is greater than or equal to the ratio B/D, where B is the time budget for the computations along the path to complete, i.e. the time during which a signal can arrive without making the clock cycle longer than desired, and D is the coefficient by which a path can be slowed down by NBTI in the current technology and for the given workload.
For example, if it is known that in the particular case NBTI may cause up to 20 % of delay degradation for the given time period of interest, then D is equal to 1.2.
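As a worked instance of Definition 1 for the 20 % case above (the symbols d, B and D are those of the definition; the decimal bound is rounded and only illustrative):

```latex
d \ge \frac{B}{D}, \qquad D = 1 + \frac{20}{100} = 1.2
\quad\Longrightarrow\quad
d \ge \frac{B}{1.2} \approx 0.833\,B .
```

In other words, under this assumption any path that already uses about 83 % or more of the time budget at time zero is treated as NBTI-critical.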
Definition 2
The longest NBTI-degraded path is the path that has the longest total delay when considering NBTI-induced additional delays Δt for the gates along that path. Here, NBTI-critical paths have to be analyzed both for 0 → 1 and 1 → 0 transitions at their primary inputs.
This Section is organized as follows. First, we introduce an algorithm to identify NBTI-critical logic paths and the longest NBTI-degraded path. This is followed by an example explaining the identification process. Finally, we provide experimental results for accuracy assessment of the hierarchical NBTI-critical paths calculation.
4.1 Identification of NBTI-critical Logic Paths
- 1) The first task is to calculate the values of signal probabilities P_{z}(x_{i}) for signal lines x_{i} by applying logic simulation of the expected workload to netlist N.
- 2) The next task is to characterize each signal line x_{i} of the circuit by a restricted number of L paths represented by pairs M_{k}(x_{i}) = (d_{k}(x_{i}), s_{k}(x_{i}) = {x_{PI}, …, x_{i}}), where k = 1, …, L and x_{PI} is the primary input at the start of the corresponding path.
- d_{k}(x_{i}) is the delay of the k-th selected path from the primary input x_{PI} to the line x_{i}
- s_{k}(x_{i}) is the set of signal lines on the path from a primary input x_{PI} to the signal x_{i}.
The calculation of the pairs M_{k}(x_{i}) starts from the primary inputs of the netlist N. The delay value d_{k}(x_{i}) for an output signal x_{i} of a gate g_{j} is calculated as d_{k}(x_{i}) = {max d_{k}(x_{in,l})} + t(g_{j}), where x_{in,l} is the l-th input signal of the gate g_{j} and t(g_{j}) is the nominal delay of the gate g_{j}. In order to avoid an explosion of the number of paths to be analyzed, we introduce a threshold L for the number of longest paths traced up to the current signal line x_{i} under analysis for continuing the calculations of the pairs. As a result we obtain a list of L_{i} pairs M_{k}(x_{i}) for each primary output signal x_{i}∈OUT, where L_{i} ≤ L.
- 3) As the next task, all the obtained pairs {M_{k}(x_{i}) | x_{i}∈OUT} for which d_{k}(x_{i}) ≥ B/D, where B is the time budget for the path to complete and D is the maximum expected delay degradation ratio, will be added to the set of NBTI-critical paths C.
- 4) Finally, all the paths s_{k}(x_{i}) in C = {M_{k}(x_{i}) = (d_{k}(x_{i}), s_{k}(x_{i}) = {x_{PI}, …, x_{i}, …, x_{PO}} | k = 0,…,K} will be analyzed for both 0 → 1 and 1 → 0 transitions at their primary inputs in order to calculate their delays after NBTI-degradation. We will calculate the NBTI-degraded delay for the paths s_{k}(x_{i}) by summing up the delays of gates along these paths obtaining the NBTI-degraded path delay d’(s_{k}(x_{i})) for the given transition. Since all the gate stages g_{j} along the paths invert the values at their input, for 0 → 1 (1 → 0) transition at the primary input of the path s_{k}(x_{i}), in the case of even order (odd order) gates on path s_{k}(x_{i}) their nominal delays t(g_{j}) are summed, while NBTI-degraded delays τ(g_{j}) are summed for the odd order (even order) gates, respectively. This is due to the fact that the NBTI-induced delay Δt(g_{j}) manifests itself only under the stressed condition, i.e. when the output of g_{j} is switching from 1 to 0. The NBTI-degraded gate delay τ(g_{j}) is calculated as t(g_{j}) · (100 % + Δt(g_{j})), where Δt(g_{j}) is provided as percentage of delay degradation (see Section 3). Degraded delays for the different gate inputs are calculated separately.
As a result, we obtain NBTI-degraded delay values d’(s_{k}(x_{i})) for both input transitions for all the NBTI-critical paths in the set C and we identify the overall delay of the longest NBTI-degraded path. The latter will be applied as the fitness value to the evolutionary optimization presented in Section 5.
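Tasks 2 and 3 above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the netlist encoding (a topologically ordered list of gates) and the tiny example circuit are all hypothetical. At each signal line only the `limit` longest paths are kept, mirroring the threshold L.

```python
from typing import Dict, List, Sequence, Tuple

Path = Tuple[float, Tuple[str, ...]]  # (delay d_k, signal lines s_k)
Gate = Tuple[str, Tuple[str, ...], float]  # (output line, input lines, nominal delay)


def longest_paths(gates: Sequence[Gate], primary_inputs: Sequence[str],
                  limit: int) -> Dict[str, List[Path]]:
    """Keep up to `limit` longest paths per signal line.

    Assumption: `gates` is given in topological order, so every input
    line has already been characterized when its gate is processed.
    """
    paths: Dict[str, List[Path]] = {pi: [(0.0, (pi,))] for pi in primary_inputs}
    for out, inputs, delay in gates:
        candidates = [
            (d + delay, lines + (out,))
            for inp in inputs
            for d, lines in paths[inp]
        ]
        candidates.sort(key=lambda p: p[0], reverse=True)
        paths[out] = candidates[:limit]  # prune to avoid path explosion
    return paths


def critical_paths(paths: Dict[str, List[Path]], outputs: Sequence[str],
                   budget: float, degradation: float) -> List[Path]:
    """Select paths at primary outputs whose nominal delay is >= B/D."""
    threshold = budget / degradation
    return [p for out in outputs for p in paths[out] if p[0] >= threshold]


# Hypothetical three-gate netlist with primary inputs a, b, c and output y.
netlist = [
    ("n1", ("a", "b"), 10.0),
    ("n2", ("b", "c"), 12.0),
    ("y", ("n1", "n2"), 20.0),
]
all_paths = longest_paths(netlist, ["a", "b", "c"], limit=3)
critical = critical_paths(all_paths, ["y"], budget=37.0, degradation=1.2)
```

In this toy circuit the two paths through n2 have a nominal delay of 32 time units and exceed the threshold 37/1.2 ≈ 30.8, while the kept path through n1 (30 time units) does not.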
An example identification of NBTI-critical paths and of the longest NBTI degraded path
4.2 NBTI-critical Path Identification Example
Consider the example circuit shown in Fig. 9 consisting of 7 gates. The nominal delays for each gate t(g_{j}) are marked next to them. As the starting point, we set a limit L for the number of paths to be considered at each signal line x_{i}. In other words, only up to L paths with the most significant delay d(x_{i}) will be propagated to the next gate from x_{i}. In our simple example, L is set to 4. An additional parameter is the time budget B. Let the time budget for the circuit in Fig. 9 be 45 time units. Assuming that the maximum NBTI degradation for a path is estimated to be, e.g., 20 %, we can calculate a threshold for a path to be considered NBTI-critical. According to the values in this example, any path with a delay greater than or equal to B/D = 45/1.2 = 37.5 time units will be NBTI-critical.
The first task is to characterize a restricted number of L paths represented by pairs M_{k}(x_{i}) for each primary output signal line x_{i}∈OUT of the circuit. As we can see from Fig. 9, four paths (represented by pairs M_{1}(x_{10}), M_{2}(x_{10}), M_{1}(x_{11}) and M_{2}(x_{11})) exceed the threshold value of 37.5 time units and are therefore included in the set of K = 4 NBTI-critical paths (highlighted by bold lines in the figure).
Table 1 presents the identification of NBTI-critical paths and calculation of the longest NBTI degraded path. The third row “t(g_{j})” provides the nominal delays for each gate g_{j}. The fourth row “d(x_{i})” shows the longest delay from primary inputs to the signal corresponding to the gate output.
In order to calculate the delays of the NBTI-critical paths after the NBTI-degradation we apply Δt(g_{j}) derived for each gate input based on the corresponding P_{z} values at these inputs obtained by gate-level simulation of the user workload. Rows 5 and 6 in Table 1 show the NBTI-degraded delay τ(g_{j}) for gates in the stressed state, i.e. when the input is switching from 0 to 1. Degraded delays for both gate inputs IN_{A} and IN_{B} are calculated separately (g_{3} being an inverter has only one input).
Subsequently, the degraded delay d’ of each NBTI-critical path is calculated separately with τ(g_{j}) used for 0 → 1 transitions (stressed condition) and t(g_{j}) used for 1 → 0 transitions (relaxed condition) at the respective gate inputs. Two degraded path delay calculations will be performed for each NBTI-critical path, one for primary input transition 0 → 1 and one for the transition 1 → 0. Thus, for K = 4 NBTI-critical paths 2 · K = 8 calculations are made. The Table lists the NBTI-degraded path delay calculations for the eight paths on six rows, due to the fact that the calculation results for the paths starting with x_{2} and x_{3} (denoted by x_{2/3} in the Table) are equivalent with 1 → 0 primary input transition and are consequently combined into single rows. As it can be seen from the Table, the path M_{2}(x_{10}) = (46, {x_{3},x_{6},x_{7},x_{10}}) for a 0 → 1 primary input transition is the longest NBTI degraded path with the given user workload.
In order to explain obtaining the delay of the longest NBTI degraded path M_{2}(x_{10}) = (46, {x_{3},x_{6},x_{7},x_{10}}), consider the last row “d’_{0→1}({x_{3},x_{6},x_{7},x_{10}})” of Table 1. For the delay from input x_{3} to the output of gate g_{1}, we have to select the “τ(g_{j}) for IN_{B}” since 0 → 1 transition represents the stressed condition for the gate g_{1}. Thus, the delay at g_{1} is 12. For the next gate, gate g_{2}, we have to apply the nominal delay “t(g_{j})” because the transition at the gate input is 1 → 0 and the gate is in a non-stressed state. Thus, the delay at g_{2} is 12 + 14 = 26. Finally, for g_{5} we have to select again the “τ(g_{j}) for IN_{B}” since x_{7} has the 0 → 1 transition, which represents the stressed condition for the gate g_{5}. Thus, the final NBTI degraded path delay for path d'_{0→1}({x_{3},x_{6},x_{7},x_{10}}) at the output of g_{5} is 12 + 14 + 20 = 46.
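The calculation walked through above can be reproduced with a short sketch. The values τ(g_{1}) = 12, t(g_{2}) = 14 and τ(g_{5}) = 20 are taken from the text; the remaining entries (the nominal delays of g_{1} and g_{5} and the degraded delay of g_{2}) are not stated here and are filled with hypothetical placeholders, as are the function names.

```python
from typing import Sequence, Tuple


def degraded_gate_delay(nominal: float, delta_pct: float) -> float:
    """tau(g) = t(g) * (100 % + delta_t(g)), with delta_t given in percent."""
    return nominal * (100.0 + delta_pct) / 100.0


def degraded_path_delay(stages: Sequence[Tuple[float, float]],
                        first_input_rising: bool) -> float:
    """Sum gate delays along a path of inverting stages.

    stages: (nominal_delay, degraded_delay) per gate, in path order, where
    the degraded delay is the one for the gate input lying on the path.
    A stage contributes its degraded delay when its input rises (0 -> 1,
    stressed condition) and its nominal delay when its input falls.
    """
    total = 0.0
    rising = first_input_rising
    for nominal, degraded in stages:
        total += degraded if rising else nominal
        rising = not rising  # every stage inverts the transition
    return total


# Path {x3, x6, x7, x10} through gates g1, g2, g5.
# From the text: tau(g1) = 12, t(g2) = 14, tau(g5) = 20.
# Hypothetical placeholders: t(g1) = 10, tau(g2) = 16, t(g5) = 17.
stages = [(10.0, 12.0), (14.0, 16.0), (17.0, 20.0)]
d_rise = degraded_path_delay(stages, first_input_rising=True)   # 12 + 14 + 20
d_fall = degraded_path_delay(stages, first_input_rising=False)  # uses placeholders
```

Only `d_rise` corresponds to a figure in the text (46 time units); `d_fall` depends entirely on the placeholder values and is shown just to illustrate the alternation.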
4.3 Experimental Results for Accuracy Assessment of Model-based NBTI-critical Paths Calculation
According to the circuit specification [1] all input stimuli combinations are allowed. In our experiment, an exhaustive set of input patterns (16,384 vectors) has been generated. This set has been repetitively applied in order to represent a potential user workload. Firstly, input signal probabilities P_{z} for all related pMOS transistors are calculated based on logic simulation results. Secondly, these values are used as input in order to compute the ∆V_{THp} values for pMOS transistors using Eq. (1). The resulting ∆V_{THp} values serve as input for: a) Eq. (2) followed by application of Algorithm 1 and b) path delay simulations in SPICE.
NBTI-induced path delays at selected NBTI-critical paths in the 74HC/HCT181 design calculated by the proposed fast gate-level approach and simulations in SPICE
Path | Output transition 0 → 1 (rise-edge delay), Proposed, ∆t, % | Output transition 0 → 1 (rise-edge delay), SPICE, ∆t, % | Output transition 1 → 0 (fall-edge delay), Proposed, ∆t, % | Output transition 1 → 0 (fall-edge delay), SPICE, ∆t, % |
---|---|---|---|---|
F3#26 | 13.76 | 13.95 | 11.12 | 11.71 |
F3#38 | 13.16 | 13.73 | 12.23 | 10.92 |
F2#61 | 13.06 | 12.09 | 12.27 | 13.91 |
F3#74 | 13.26 | 12.93 | 11.68 | 11.76 |
F1#77 | 7.78 | 9.76 | 17.20 | 16.38 |
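The agreement between the two columns can be summarized directly from the table. The sketch below only restates the tabulated values and computes absolute deviations in percentage points; the variable names and the choice of summary statistics are ours, not the paper's.

```python
# (proposed rise, SPICE rise, proposed fall, SPICE fall), all in percent,
# copied from the table above.
table = {
    "F3#26": (13.76, 13.95, 11.12, 11.71),
    "F3#38": (13.16, 13.73, 12.23, 10.92),
    "F2#61": (13.06, 12.09, 12.27, 13.91),
    "F3#74": (13.26, 12.93, 11.68, 11.76),
    "F1#77": (7.78, 9.76, 17.20, 16.38),
}

# Absolute deviation between the gate-level model and SPICE per path and edge.
deviations = {
    path: (abs(rise_model - rise_spice), abs(fall_model - fall_spice))
    for path, (rise_model, rise_spice, fall_model, fall_spice) in table.items()
}

all_devs = [d for pair in deviations.values() for d in pair]
max_dev = max(all_devs)                   # largest single deviation
mean_dev = sum(all_devs) / len(all_devs)  # average over all ten entries
worst_path = max(deviations, key=lambda p: max(deviations[p]))
```

For these five paths the deviation stays below two percentage points, with the largest gap on the rise edge of F1#77 and a mean of roughly 0.85 percentage points.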
5 Evolutionary Generation of Rejuvenation Stimuli
While several test generation techniques can be applied to create rejuvenation stimuli for the identified NBTI-critical paths, an evolutionary algorithm is an efficient option due to its inherent properties. Primarily, it is by construction “blind” regarding the structure of the circuit under analysis and is therefore able to resolve dependencies caused by the impacts of individual gates and capable of obtaining a cost-effective global solution with respect to all critical paths.
5.1 Evolutionary Algorithms
Over the years, EAs have proven capable of solving difficult problems even within highly complex fitness landscapes, such as open problems related to networking or protocols. Evolutionary optimizers have been successfully exploited both in stationary and dynamic situations. They have been shown to identify either single optima or Pareto sets in multi-objective problems. Since the 1990s, the complexity of electronic circuits has increased dramatically, and evolutionary heuristics have come to be seen as alternatives to classic approaches in the EDA area. Researchers have proposed EA-based methodologies for tackling several well-known NP-hard problems, such as placement, floor-planning, routing [11], and automatic test-pattern generation [9, 10]. EAs have also proven efficient for evolving assembly test programs for microprocessors, for validation, post-silicon verification, and test [32].
The task of generating rejuvenation stimuli calls for the use of an EA. The evolutionary approach is intentionally “blind” towards the structure of the circuit. As a result, the EA might be able to sort out the dependencies among the impacts of individual gates and obtain a global solution with regard to all critical paths in a cost-effective manner. Note that, in general, internal information about the circuit is still required by the external tools that provide the EA with feedback on the quality (i.e. fitness) of the generated solutions.
5.2 The Evolutionary Toolkit μGP
The approach presented here makes use of an evolutionary toolkit named μGP. μGP was developed at the Politecnico di Torino in the early 2000s [30, 31, 33], and it is now available under the GNU Public License from Sourceforge.
μGP allows a high degree of customization, but most of its parameters can also be self-adapted, that is, it can autonomously set them to a reasonable value during the optimization process. This self-adaptation mechanism is also used to shift the algorithm’s focus between exploration, i.e., seeking new solutions changing significantly the current ones, and exploitation, i.e., tweaking the current good solutions changing them slightly, increasing both the speed and the quality of result.
μGP implements a large variety of genetic operators that can handle the specific characteristics of the individuals. Moreover, two operators mimic differential evolution [33] to more efficiently handle real-valued parameters, while one operator performs a pseudo exhaustive search on a single element of the solution. All operators may be activated with a specific probability, and, further, self-adaptation regulates these probabilities [6].
5.3 Flow for Evolutionary Generation of Rejuvenation Stimuli
The approach for rejuvenation stimuli generation proposed in this paper was implemented on top of the open source scalable hardware design and analysis framework zamiaCAD [36, 43]. The front-end of zamiaCAD includes a parser and an elaboration engine that supports full VHDL-2002 standard specification and a set of VHDL-2008 extensions. On the back-end side, the framework allows design simulation, static analysis and other applications for debug [19]. zamiaCAD has an Eclipse IDE plug-in based graphical user interface for advanced design entry and navigation.
The evolutionary optimizer evolves the population until a steady state condition is detected (i.e., non-improvement is recorded for a given number of generations), or a maximum number of steps has been performed. The outer loop (see Fig. 12), on the other hand, is repeated until the RSS reaches a satisfying rejuvenation capability.
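The two stopping conditions of the inner loop can be illustrated with a generic sketch. This is not μGP and does not use its operators or API; it is a plain elitist bit-string EA with hypothetical names and parameters, and the fitness function is a toy stand-in for the delay of the longest NBTI-degraded path (lower is better), which in the described flow is supplied by an external evaluator.

```python
import random
from typing import Callable, List

Individual = List[int]  # one stimulus encoded as a bit vector (assumption)


def evolve(fitness: Callable[[Individual], float], length: int,
           pop_size: int = 20, max_generations: int = 200,
           patience: int = 15, seed: int = 0) -> Individual:
    """Minimize `fitness`; stop on stagnation or after max_generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    stagnant = 0
    for _ in range(max_generations):
        if stagnant >= patience:  # steady state: no recent improvement
            break
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            parent = a if fitness(a) <= fitness(b) else b  # binary tournament
            child = parent[:]
            child[rng.randrange(length)] ^= 1  # single bit-flip mutation
            offspring.append(child)
        # Elitist survivor selection over parents and offspring.
        population = sorted(population + offspring, key=fitness)[:pop_size]
        if fitness(population[0]) < fitness(best):
            best = population[0]
            stagnant = 0
        else:
            stagnant += 1
    return best


def toy_fitness(individual: Individual) -> float:
    """Toy stand-in for the longest NBTI-degraded path delay."""
    return float(sum(individual))


best_stimulus = evolve(toy_fitness, length=16)
```

In the flow described above, the fitness call would instead trigger logic simulation of the workload plus candidate stimuli and the NBTI-critical path identification, which is why the evolutionary core can remain unaware of the circuit structure.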
5.4 Experimental Results for Rejuvenation Stimuli Generation
The proposed approach is applied to several combinational circuits representing sets of logic paths between flip-flops. The benchmarks include 4-bit and 32-bit implementations of the ALU (Arithmetic Logic Unit) core extracted from the MIPS processor design Plasma [29]. The designs were initially described in VHDL at the RT level, and the gate level was synthesized with Synopsys Design Compiler. The 4-bit ALU implementation utilized 161 basic gates (INV, 2-input NAND and 2-input NOR) and had 28 gates along the longest path at time-zero (pre-NBTI stress). For the 32-bit ALU implementation, these parameters were 1002 and 138, respectively.
- Functional User Workloads are realistic exploitation scenarios of the ALU circuit, when only one, two or three functions out of the 16 implemented by the ALU logic were used.
The workload 1F-OR repeatedly uses only the function OR with all possible combinations for 4-bit operands. 1F-NOR, 1F-AND and 1F-ADD are similar workloads for corresponding single functions.
The workloads 2F-ADD_NOR, 2F-ADD_OR and 2F-OR_AND activate only the two corresponding functions during their execution.
The workloads 3F-OR_ADD_NOR and 3F-OR_AND_NOR exercise 3 functions each.
The Random workload is a set of stimuli repeating 150 random vectors.
The Artificial workloads were generated to represent near-maximum and near-minimum aging scenarios (i.e. the P_{z} values for a large number of gate inputs are either very high or very low).
Rejuvenation Stimuli Generation for 4-bit ALU design
In case of different workloads, the NBTI-induced path delay over 10 years was estimated to reach 9 to 51 %. The highest path delay increase due to NBTI (the “artificial near-max” workload) is less realistic (just 5 deterministic stimuli vectors are repeatedly applied keeping many gates on the NBTI-critical paths in the static NBTI state). It is provided here mainly in order to present one possible worst-case scenario. However, the increased path delays resulting from the Functional User Workloads can be considered realistic and still provide very high delay increments Δt_{path}, in the range of 15 to 30 %. The workload with random stimuli produces a very smooth distribution of P_{z} probabilities close to 0.5 in the whole design structure and therefore in case of longer random sequences can cause only small and well-distributed NBTI-induced path delays (around 12 %, which is also a common estimation found in literature). The last column shows an NBTI-induced path delay of 9 % that is possible only in case of deterministic workloads targeted at minimal NBTI.
Two of the most notable parameters characterizing the workloads are depicted in the third row. The first one is the relative number of those nodes (i.e. gate inputs) whose NBTI was observed to be static in relation to the total number of nodes in the circuit. The second is the subset of these nodes, which fall on the longest NBTI-degraded path, i.e. the one whose NBTI-delay is presented in the red row (the value is also given in percentage relative to the total number of nodes along the path and it is put in brackets). It can be observed that the functional workloads, indeed, include a significant number of nodes at static NBTI, many of them are on the longest NBTI-degraded path. Larger and smaller NBTI-induced path delays caused by the Artificial near-maximum and Random workloads, correspondingly, also correlate with these parameters. The Artificial near-minimal workload illustrates that even with the presence of nodes at static NBTI in the circuit, but being located beyond the NBTI-critical paths, combined with moderate dynamic NBTI elsewhere, may generate an small overall impact on circuit delay degradation.
A use case of rejuvenation can be illustrated by the following example. Consider a scenario where the circuit’s time slack is set to 15 %. In this case, during 10 years of operation, 7 out of 9 user workloads (columns 4–10 in Table 3) will result in larger NBTI-induced delays and may functionally fail because of desynchronization (the red row). Here, application of the generated rejuvenation stimuli limited, for example, to 0.1 % execution overhead (the green row) will mitigate NBTI and reduce the induced delays to fit into the time slack margin for 4 functional user workloads (columns 4, 7, 9, 10). For workloads with a large number of gate inputs at close-to-1 signal probability (i.e. column 13), rejuvenation may reduce the NBTI-induced path delay by a factor of two. The NBTI-induced path delays caused by the random and artificial near-min workloads cannot be reduced efficiently by application of rejuvenation stimuli.
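The slack check and the overhead budget in this use case reduce to simple arithmetic, sketched below. The helper names and the workload labels are hypothetical, and the delay figures are placeholders chosen for illustration only; they are not the values from Table 3. The sketch also assumes that execution overhead is defined as the ratio of rejuvenation cycles to functional cycles.

```python
import math
from fractions import Fraction


def workloads_at_risk(delay_increase_pct, slack_pct):
    """Names of workloads whose NBTI-induced delay increase exceeds the slack."""
    return [name for name, d in delay_increase_pct.items() if d > slack_pct]


def min_functional_cycles(rejuvenation_cycles, overhead):
    """Smallest number of functional cycles between two rejuvenation runs
    such that rejuvenation_cycles / functional_cycles <= overhead.

    `overhead` is passed as a decimal string (e.g. "0.001" for 0.1 %)
    so that the division is exact.
    """
    return math.ceil(Fraction(rejuvenation_cycles) / Fraction(overhead))


SLACK_PCT = 15  # time slack from the use case

# Placeholder delay increases in percent (NOT the paper's measurements).
before = {"wl_a": 12.0, "wl_b": 18.0, "wl_c": 27.0}
after = {"wl_a": 11.5, "wl_b": 14.0, "wl_c": 21.0}

print(workloads_at_risk(before, SLACK_PCT))  # exceed the slack without rejuvenation
print(workloads_at_risk(after, SLACK_PCT))   # still exceed it with rejuvenation
print(min_functional_cycles(100, "0.001"))   # spacing for a 100-cycle sequence
```

Under the assumed definition, a rejuvenation sequence of 100 cycles fits a 0.1 % budget if it is applied no more often than once every 100 000 functional cycles.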
Rejuvenation stimuli generation for 32-bit Plasma ALU core
Generating the individual rejuvenation stimuli sequences for each user profile took about 20 min on a moderate workstation (a 3 GHz Intel Core i7 running 64-bit Windows, with 1 GB of memory used by the JVM). This time includes the iterative execution of the evolutionary algorithm together with the circuit simulation under the dedicated workload stimuli and the calls to NBTI-critical path identification.
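The iterative loop described above can be illustrated with a toy evolutionary search. This is only a sketch under strong assumptions: it uses a plain elitist (mu + lambda) scheme written from scratch rather than the MicroGP toolkit used in the paper, and the `fitness` function is a stand-in for the circuit simulation and NBTI-critical path identification. The circuit, the node list and all parameter values are hypothetical.

```python
import random

N_INPUTS = 8   # primary inputs of a hypothetical circuit
SEQ_LEN = 4    # input vectors per candidate rejuvenation sequence

# Hypothetical critical nodes: each is the AND of the listed primary inputs,
# so it reaches logic 1 (pMOS recovery) only when all of them are 1.
CRITICAL_NODES = [(0, 1), (2, 3, 4), (5,), (6, 7), (1, 6)]


def node_value(node, vector):
    """Logic value of a critical node under one input vector."""
    return int(all(vector[i] for i in node))


def fitness(sequence):
    """Number of critical nodes driven to logic 1 by at least one vector."""
    return sum(any(node_value(n, v) for v in sequence) for n in CRITICAL_NODES)


def random_sequence(rng):
    return [[rng.randint(0, 1) for _ in range(N_INPUTS)] for _ in range(SEQ_LEN)]


def mutate(sequence, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [[bit ^ (rng.random() < rate) for bit in v] for v in sequence]


def evolve(generations=50, mu=10, lam=20, seed=0):
    """Elitist (mu + lambda) search; returns the best sequence found."""
    rng = random.Random(seed)
    population = [random_sequence(rng) for _ in range(mu)]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(lam)]
        # Parents compete with offspring, so the best fitness never decreases.
        population = sorted(population + offspring, key=fitness, reverse=True)[:mu]
    return max(population, key=fitness)


best = evolve()
print(fitness(best), "of", len(CRITICAL_NODES), "critical nodes driven to recovery")
```

In a real flow each fitness evaluation would involve a simulation run and a path analysis, which is why the evaluation step rather than the search logic is likely to dominate the reported runtime.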
6 Discussion
Although static NBTI may occur significantly more rarely than dynamic NBTI in powered nanoscale logic, it is still very probable in practice. Even a correctly designed circuit with no redundancy may be used in applications that keep parts of the logic unused for a very long time. Consider an ALU circuit, or its logic as part of a complete processor design. Because of system-level architecture properties or the end user’s habits and needs [25], an ALU implementing, e.g., 16 functions may be used for computations with only one or two of them, leaving part of the ALU logic unused and under constant static NBTI stress for years. The same may happen with parts of the control logic that activate some service regimes of a product. Long-life reliability-critical applications [34] are a particular concern for such vulnerable logic. In both cases, the user profile can change after a long period of product use (e.g. a car changing owner, or an airplane/satellite changing its route); the logic degraded over years under static NBTI may then be activated and potentially cause a functional failure.
Unlike other known NBTI mitigation approaches, the proposed rejuvenation method does not require redesign of the physical layer before fabrication and can be introduced to a product in the field even after years of operation (e.g. through a firmware update). It can also be an attractive alternative for products that aim to avoid frequency or voltage scaling techniques, which may in turn reduce performance or accelerate transistor aging.
7 Conclusion
The NBTI mitigation approach presented in this paper:
- is based on accurate and fast hierarchical gate-level identification of NBTI-critical paths and of the particular gates where rejuvenation has to be applied;
- proposes efficient rejuvenation stimuli generation with an evolutionary algorithm;
- does not require redesign and can be applied to an existing circuit, exploiting, if necessary, the existing design-for-testability instruments.
The proposed rejuvenation approach may have a very significant impact on the circuit’s lifetime extension. The experimental results clearly demonstrate the feasibility and the efficiency of the proposed approach to generate rejuvenation stimuli, as well as efficacy of the rejuvenation stimuli sequences to mitigate NBTI, especially in cases with static NBTI or dynamic NBTI in which a high number of extreme close-to-1 signal probabilities are involved in the pMOS V_{TH} degradation. It was demonstrated that NBTI-induced path delays can be reduced by up to two times with an execution overhead of 0.1 % or less.
Notes
Acknowledgments
This work has been supported in part by projects EU FP7 CP BASTION, H2020 RIA IMMORTAL and H2020 TWINN TUTORIAL, by CNPq (Science and Technology Foundation, Brazil) under contract n. 303701/2011-0 (PQ) and FAPERGS/CAPES under contract n. 014/2012.
The authors would like to acknowledge Dr. Christoph Werner, from TU Munich, Germany, for valuable comments regarding the proposed approach.
References
- 1. (1998) Data sheet “74HC/HCT181 4-bit arithmetic logic unit”, Philips
- 2. Abella J, Vera X et al (2007) Penelope: the NBTI-aware processor. Proc. 40th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 85–96
- 3. Ahmed F, Milor L (2010) Reliable cache design with on-chip monitoring of NBTI degradation in SRAM cells using BIST. Proc. 28th IEEE VLSI Test Symposium, pp. 63–68
- 4. Alam MA (2005) Reliability- and process-variation aware design of integrated circuits. Microelectron Reliab 48(8):1114–1122
- 5. Alam MA, Mahapatra S (2005) A comprehensive model of PMOS NBTI degradation. Microelectron Reliab 45(1):71–81
- 6. Belluz J, Gaudesi M, Squillero G, Tonda A (2015) Operator selection using improved dynamic multi-armed bandit. Proc. ACM Genetic and Evolutionary Computation Conference, pp. 1311–1317
- 7. Bhardwaj S, Wang W, Vattikonda R, Cao Y, Vrudhula S (2006) Predictive modeling of the NBTI effect for reliable design. Proc. IEEE Custom Integr. Circuits Conf., pp. 189–192
- 8. Ceratti A, Copetti T, Bolzani L, Vargas F (2012) Investigating the use of an on-chip sensor to monitor NBTI effect in SRAM. Proc. IEEE 13th Latin American Test Workshop, pp. 1–6, Apr. 10–13
- 9. Corno F, Sanchez E, Sonza Reorda M, Squillero G (2005) Automatic test generation for verifying microprocessors. IEEE Potentials 24(1):34–37
- 10. Corno F, Sonza Reorda M, Squillero G (2000) RT-level ITC’99 benchmarks and first ATPG results. IEEE Des Test Comput 17(3):44–53
- 11. Drechsler R (1998) Evolutionary Algorithms for VLSI CAD. Springer, New York
- 12. Eiben AE, Smith J (2015) Introduction to Evolutionary Computing. Springer, New York
- 13. Ferri C, Papagiannopoulou D, Bahar RI, Calimera A (2011) NBTI-aware data allocation strategies for scratchpad memory based embedded systems. Proc. IEEE 12th Latin American Test Workshop, pp. 1–6, Mar. 27–30
- 14. Firouzi F, Kiamehr S, Tahoori MB (2011) A linear programming approach for minimum NBTI vector selection. Proc. Great Lakes Symposium on VLSI, pp. 253–258
- 15. Firouzi F, Kiamehr S, Tahoori MB (2012) NBTI mitigation by optimized NOP assignment and insertion. Proc. ACM/IEEE Conference on Design, Automation and Test in Europe, pp. 218–223
- 16. Fu X, Li T, Fortes J (2008) NBTI tolerant microarchitecture design in the presence of process variation. Proc. Int. Symposium on Microarchitecture, pp. 399–410
- 17. Grasser T, Kaczer B (2007) Negative bias temperature instability: recoverable versus permanent degradation. Proc. 37th European Solid State Device Research Conference, Munich, pp. 127–130
- 18. Hamdioui S, Gizopoulos D, Guido G, Nicolaidis M, Grasset A, Bonnot P (2013) Reliability challenges of real-time systems in forthcoming technology nodes. Proc. ACM/IEEE Conference on Design, Automation and Test in Europe, pp. 129–134
- 19. Jenihhin M, Tsepurov A, Tihhomirov V, Raik J, Hantson H, Ubar R, Bartsch G, Escobar JM, Wuttke H-D (2014) Automated design error localization in RTL designs. IEEE Des Test 31(1):83–92
- 20. Khan S, Hamdioui S (2011) Modeling and mitigating NBTI in nanoscale circuits. Proc. 17th International On-Line Testing Symposium, pp. 1–6
- 21. Kostin S, Raik J, Ubar R, Jenihhin M, Vargas F, Bolzani Poehls LM, Copetti T (2014) Hierarchical identification of NBTI-critical gates in nanoscale logic. Proc. IEEE 15th Latin American Test Workshop, pp. 1–6
- 22. Kukner H, Khan S, Weckx P, Raghavan P, Hamdioui S, Kaczer B, Catthoor F, Van der Perre L, Lauwereins R, Groeseneken G (2014) Comparison of reaction–diffusion and atomistic trap-based BTI models for logic gates. IEEE Trans Device Mater Reliab 14(1):182–193
- 23. Kumar SV, Kim CH, Sapatnekar SS (2007) NBTI-aware synthesis of digital circuits. Proc. Design Automation Conference, pp. 370–375
- 24. Kumar S, Kim S, Sapatnekar S (2009) Adaptive techniques for overcoming performance degradation due to aging in digital circuits. Proc. Asia and South Pacific Design Automation Conference, pp. 284–289
- 25. Li Q, Han Q, Sun L (2013) Context-aware handoff on smartphones. Proc. IEEE 10th International Conference on Mobile Ad-Hoc and Sensor Systems, pp. 470–478
- 26. Li L, Zhang Y, Yang J, Zhao J (2010) Proactive NBTI mitigation for busy functional units in out-of-order microprocessors. Proc. ACM/IEEE Conference on Design, Automation and Test in Europe, pp. 411–416
- 27. Lin I-C, Lin C-H, Li K-H (2013) Leakage and aging optimization using transmission gate-based technique. IEEE Trans Comput Aided Des Integr Circ Syst 32(1):87–99
- 28. Mahapatra S, Saha D, Varghese D, Kumar PB (2006) On the generation and recovery of interface traps in MOSFETs subjected to NBTI, FN, and HCI stress. IEEE Trans Electron Devices 53(7):1583–1592
- 29. Open Cores Plasma CPU project, [http://opencores.org/project,plasma]. Accessed 2015-09-01
- 30. Sanchez E, Schillaci M, Squillero G (2011) Evolutionary Optimization: the μGP toolkit. Springer, New York
- 31. Squillero G (2005) MicroGP - an evolutionary assembly program generator. Genet Program Evolvable Mach 6(3):247–263
- 32. Squillero G (2011) Artificial evolution in computer aided design: from the optimization of parameters to the creation of assembly programs. Computing 93(2):102–120
- 33. Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
- 34. Tai AT, Alkalai L, Chau SN (1998) On-board preventive maintenance for long-life deep-space missions: a model-based analysis. Proc. Computer Performance and Dependability Symposium, pp. 196–205, Sep 7–9
- 35. Tiwari A, Torrellas J (2008) Facelift: hiding and slowing down aging in multicores. Proc. International Symposium on Microarchitecture, pp. 129–140
- 36. Tšepurov A, Bartsch G, Dorsch R, Jenihhin M, Raik J, Tihhomirov V (2012) A scalable model based RTL framework zamiaCAD for static analysis. Proc. IFIP/IEEE International Conference on Very Large Scale Integration, pp. 171–176
- 37. Ubar R, Vargas F, Jenihhin M, Raik J, Kostin S, Bolzani Poehls L (2013) Identifying NBTI-Critical Paths in Nanoscale Logic. Proc. Euromicro Conference on Digital System Design, pp. 136–141
- 38. Wang Y, Chen X, Wang W, Balakrishnan V, Cao Y, Xie Y, Yang H (2009) On the efficacy of input Vector Control to mitigate NBTI effects and leakage power. Proc. Quality Electronic Design Int’l Symp., pp. 19–26
- 39. Wang W, Reddy V, Krishnan A, Vattikonda R, Krishnan S, Cao Y et al (2007) Compact modeling and simulation of circuit reliability for 65nm CMOS technology. IEEE Trans Device Mater Reliab 7(4):509–517
- 40. Wang W, Yang S, Bhardwaj S, Vrudhula S, Liu F, Cao Y (2010) The impact of NBTI effect on combinational circuit: modeling, simulation, and analysis. IEEE Trans VLSI 18(2):173–183
- 41. Wirth GI, da Silva R, Kaczer B (2011) Statistical model for MOSFET bias temperature instability component due to charge trapping. IEEE Trans Electron Devices 58:2743–275
- 42. Yu C, Velamala J, Sutaria K, Chen MS-W, Ahlbin J, Sanchez EI, Bajura M, Fritze M (2014) Cross-Layer Modeling and Simulation of Circuit Reliability. IEEE Trans Comput Aided Des Integr Circuits Syst 33(1):8–23
- 43. zamiaCAD framework web page, [http://zamiaCAD.sf.net]. Accessed 2015-09-01
- 44. Zhao W, Cao Y (2007) Predictive technology model for Nano-CMOS design exploration. J Emerging Technol Comput Syst 3(1), Article 1, (http://ptm.asu.edu/modelcard/2006/65nm_bulk.pm)