# Selective Flip-Flop Optimization for Circuit Reliability



Mohammad Saber Golanbari, Mojtaba Ebrahimi, Saman Kiamehr, and Mehdi B. Tahoori

## 1 Introduction, Motivation, and Contributions

VLSI circuits are influenced by several sources of process and runtime variabilities [16]. Among them, supply voltage fluctuation and transistor aging due to BTI are the most important factors [2, 30, 36]. They degrade the performance of VLSI circuits by increasing the delay, and consequently deteriorate lifetime.

The impacts of both voltage-drop and aging are significant on sequential elements such as flip-flops and latches. Due to particular aspects of flip-flops, such as the internal feedback structure, degradation of the transistors of a flip-flop as well as supply voltage fluctuation may lead to serious timing degradation or even functional failure (inability to capture the input independent of timing) [24]. Furthermore, many flip-flops are on the critical paths of a circuit because logic synthesis tools balance the delays of circuit paths to achieve the best performance, area, and power. Therefore, it is necessary to employ design-time mitigation techniques to consider and control such gradual degradation, e.g. by adding appropriate timing margins (*aging and voltage-drop guardband*) [20, 28].

Our analysis shows that in a typical digital design such as a microprocessor, based on the functionality of different components, some flip-flops operate under static or near-static BTI stress, irrespective of the workload. These flip-flops experience large timing degradation because the flip-flop input Signal Probability (SP) is very close to 0.0 or 1.0. Being subject to severe BTI stress, the aforementioned flip-flops degrade faster, imposing a large *aging guardband* to the entire circuit. Flip-flops also experience a large temporally localized voltage-drop, because they are synchronized with the clock edge and supposedly operate at the same time (at clock edge), hence, drawing substantial current leading to a significant voltage-drop over Power Delivery Network (PDN) [22]. Moreover, recent studies have shown that the voltage-drop impact gets more severe by technology scaling [2, 21, 38]. Therefore, in a conventional design flow, costly *voltage-drop timing guardband* is considered for reliable circuit operation [22].

M. S. Golanbari · M. Ebrahimi · S. Kiamehr · M. B. Tahoori (⊠) Karlsruhe Institute of Technology, Karlsruhe, Germany e-mail: golanbari@kit.edu; mehdi.tahoori@kit.edu

J. Henkel, N. Dutt (eds.), *Dependable Embedded Systems*, Embedded Systems, https://doi.org/10.1007/978-3-030-52017-5\_14

In this chapter, we explore methods to improve circuit reliability by addressing the timing degradation of flip-flops under severe aging<sup>1</sup> and voltage-drop, i.e. *selective flip-flop optimization*. The idea is to find timing-critical flip-flops under high aging and/or voltage-drop impact, and selectively re-optimize them for operating under such stress by appropriately sizing their transistors. This effectively improves the reliability and lifetime of circuits without imposing much overhead, because these flip-flops constitute a small portion of all flip-flops.

Simulation results obtained by applying the proposed method to a processor show that the flip-flops optimized with the proposed method exhibit much less delay degradation, while imposing less than 0.1% leakage power overhead to the processor. As a result, the required timing guardband of the processor using the proposed method is significantly less compared to the original processor. Therefore, given a specific clock period, the optimized processor design with the proposed method has 36.9% longer lifetime and better reliability compared to the original processor design.

## 2 Variability Impact on Flip-Flops

## 2.1 Flip-Flop Timing

Flip-flop timing metrics such as setup-time (U), hold-time (H), clock-to-q ($D_{CQ}$), and data-to-q ($D_{DQ}$) are well discussed in [31, 34]. When the setup-time is large enough, the clock-to-q value is almost constant, but further reduction of the setup-time will increase the clock-to-q value monotonically until a value after which the flip-flop is unable to capture and latch the input [31]. Based on this, the *optimum setup-time* is defined as the setup-time value which causes the clock-to-q value to increase by 10% from its minimum value [32]. Moreover, each flip-flop has two internal paths: one for transferring the input state "zero" to the output, i.e. High-to-Low (HL) input transition, and the other for transferring the input state "one" to the output, i.e. Low-to-High (LH) input transition. The timing parameters for these two internal paths can be different [24], as shown in Fig. 1, meaning that there are two sets of timing parameters for the internal LH and HL paths of a flip-flop:

$\{U_{LH}, D_{CQ_{LH}}, D_{DQ_{LH}}\}$ for LH transition,

$\{U_{HL}, D_{CQ_{HL}}, D_{DQ_{HL}}\}$ for HL transition.

<sup>1</sup>We consider the impact of Negative Bias Temperature Instability (NBTI) on PMOS transistors, and Positive Bias Temperature Instability (PBTI) on NMOS transistors.

Fig. 1 Different flip-flop timing parameters. The correct functionality is guaranteed by considering the flip-flop delay as illustrated

Flip-flop delay should be defined such that the correct functionality of the flip-flop is guaranteed regardless of the transition. Therefore, we define the flip-flop delay as the *summation of the worst setup-time and the worst clock-to-q of both transitions* as shown in Fig. 1.

$$delay = \max\{U_{LH}, U_{HL}\} + \max\{D_{CQ_{LH}}, D_{CQ_{HL}}\}. \tag{1}$$

This guarantees that in both transitions the input signal is correctly captured and propagated to the flip-flop output.
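As a minimal illustration of Eq. (1), the following Python sketch computes the flip-flop delay from the per-transition setup-time and clock-to-q values. The function name and the numeric values are hypothetical and only serve to show that the worst setup-time and the worst clock-to-q may come from different transitions.

```python
def flip_flop_delay(u_lh, u_hl, dcq_lh, dcq_hl):
    """Eq. (1): worst setup-time plus worst clock-to-q over both transitions."""
    return max(u_lh, u_hl) + max(dcq_lh, dcq_hl)


# Hypothetical timing values in picoseconds (not taken from the chapter).
# The worst setup-time comes from the HL path, the worst clock-to-q from LH.
delay_ps = flip_flop_delay(u_lh=30, u_hl=35, dcq_lh=60, dcq_hl=55)
print(delay_ps)  # 35 + 60 = 95
```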

## 2.2 Runtime Variation Impacts on Flip-Flops

Several parameters such as supply voltage, workload, and temperature affect the performance of flip-flops in a circuit. Parameters such as temperature and supply voltage affect all the transistors of a flip-flop in the same way, whereas the impact of the input SP is different for the transistors of a flip-flop [23]. This results in an asymmetric aging of transistors according to their stress duty cycles. Therefore, the delay degradation of internal LH and HL paths inside an aged flip-flop depends on the input SP [24].



**Fig. 2** Separate internal LH (red)/HL (blue) paths of a C2MOS flip-flop [31] (**a**), and delay of internal LH/HL paths of an aged C2MOS flip-flop after 5 years (optimized for Power Delay Product (PDP) in the fresh state) for different input SPs (**b**)

In the C2MOS flip-flop<sup>2</sup> depicted in Fig. 2a, the internal LH and HL paths consist of two separate groups of transistors, which makes the aging of these two paths independent according to the input SP. Figure 2b illustrates the delay of LH and HL transitions of an aged C2MOS flip-flop [31] for different input SPs. When the flip-flop is aged under input SP = 0.0 (SP0), the worst delay degradation happens on the flip-flop HL path; however, the delay of the flip-flop LH path is only slightly affected. On the other hand, aging under input SP = 1.0 (SP1) greatly degrades the delay of the flip-flop LH path while slightly affecting the delay of the flip-flop HL path. For a moderate aging condition, i.e. 0.1 < SP < 0.9, the delay degradation of both LH and HL paths is moderate. The reason is that under SP0 and SP1 conditions, *Static BTI* (S-BTI) asymmetrically alters the threshold voltages, leading to unbalanced aging of the LH and HL paths of the flip-flop, as the stress duty cycle of some transistors is 1.0, i.e., they are always under BTI stress. However, in a moderate aging condition, the transistors can partially recover as the stress duty cycle is less than 1.0.

The impact of supply voltage fluctuation on the flip-flops of a circuit depends on the workload variation and dynamic power consumption of the circuit. Therefore, each flip-flop may experience a specific amount of voltage-drop. A voltage-drop causes performance degradation of the flip-flops, which is typically larger than the degradation of simpler combinational gates in the standard cell library. Figure 3 compares the impact of a voltage-drop up to 10% on the delay of an aged flip-flop and an aged inverter. Compared to a no-voltage-drop condition, the delay of the flip-flop increases by 23.6% whereas the delay of the inverter is increased by 15%.

Moreover, the flip-flops of a circuit generally experience a higher amount of voltage-drop compared to combinational gates [37]. As a result of temporally localized switching of flip-flops at the positive (or negative) edge of the clock signal, the instantaneous current drawn from the PDN at the synchronized clock edge is comparatively high. This leads to high voltage-drop at the clock edge, when the flip-flops are processing their input signals. This peak current consumption is damped over the rest of the clock period, when the combinational cells are active. Therefore, in this work we focus on dealing with the impact of voltage fluctuation on the flip-flops.

<sup>2</sup>A C2MOS flip-flop design is a master–slave flip-flop built of two connected C2MOS latches. It is one of the commonly used flip-flops in modern processor designs [31].

**Fig. 3** Comparison between the voltage-drop induced delay degradation of a flip-flop and an inverter, which are aged under same condition (Aging under SP1 for 5 years)

Temporal and spatial temperature variations can also affect the circuit performance. The temporal temperature change could be rather high and has been the subject of research since it affects the reliability of the VLSI circuits. It is demonstrated in [17] that the circuit performance can be changed by up to 10% for 110 °C temperature variation. Therefore, in order to meet the reliability constraints, the circuit timing should be adjusted according to the worst temperature corner, which is typically at high temperature. On-chip spatial temperature gradient puts different stress on circuit components across a chip. The amount of on-chip spatial temperature difference (only on cores) based on simulation [3, 7], sensor measurements [33], and thermal camera [3] is reported to be up to  $\sim$ 30°C. Since the delay change is approximately 4% for every 40 °C [17, 29], the overall difference between the delay degradation of core flip-flops due to such spatial temperature gradient is expected to be less than 3%, and hence, much smaller compared to voltage-drop variation [11].

The combined impact of voltage-drop and aging significantly degrades the performance of flip-flops. As an example, the delay of a fresh flip-flop optimized with balanced HL/LH delay increases from 98.5 ps to 165.7 ps due to the combined impact of voltage-drop (10%) and S-BTI (5-years under SP0). This is equivalent to 68% delay increase. If such a flip-flop is in a critical path of the circuit, a large timing guardband is required for timing closure considering the reliability constraints. Therefore, it is necessary to find such flip-flops at design-time and optimize them for operating under such conditions.
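The quoted percentage follows directly from the two delay values given above:

$$\frac{165.7\,\text{ps} - 98.5\,\text{ps}}{98.5\,\text{ps}} = \frac{67.2}{98.5} \approx 0.682 \approx 68\%.$$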

[Plot for Fig. 4a. Axes: input SP range (SP = 0 to SP = 1), # of flip-flops, flip-flop delay increase (%)]

(a) The average flip-flop SPs during the execution of some MiBench workloads on Leon3, and the corresponding delay degradation in 5 years. In order to be fair in the analysis, the flip-flops which are not exercised by the workloads are excluded from this analysis.



(b) The distribution of voltage-drop on the Leon3 flip-flops. The voltage-drop values are normalized to the maximum voltage-drop across the processor.

Fig. 4 (a) Input signal probabilities and (b) voltage-drop analysis of Leon3 flip-flops executing MiBench Workloads

## 2.3 Significance of Flip-Flops in Circuit Reliability

In a properly designed circuit, the timing of circuit paths is balanced during the synthesis process. Therefore, many flip-flops are *timing-critical* as they lie on the circuit critical paths. Studies [12, 37] have shown that in VLSI circuits, some flip-flops are under severe static BTI leading to a large timing degradation over time. Furthermore, the impact of voltage-drop on flip-flops could be very high as a result of localized power consumption at a specific time (e.g. positive clock edge) or at a specific location on the circuit layout.

The large impact of S-BTI and voltage-drop on flip-flops has a significant impact on the reliability of a circuit when such flip-flops are timing-critical. In order to investigate the likelihood of having such a scenario in a typical digital design, we use the flow presented in Sect. 4 to extract the voltage-drop and the aging of the Leon3 flip-flops by executing six MiBench workloads [15] namely stringsearch, qsort, basicmath, bitcount, fft, and crc32 on Leon3 processor [10]. In order to be fair, we excluded the flip-flops belonging to the parts which are not exercised by the employed workloads such as interrupt handler, timers, and UART controller. The synthesized netlist of the Leon3 processor has 2352 flip-flops, but the results demonstrated in this section contain only 1686 flip-flops belonging to the parts which are exercised by all employed workloads.

Figure 4a demonstrates the input SP distribution of the aforementioned 1686 flip-flops. The results show that 181 flip-flops always experience input SP0, whereas 29 flip-flops are under input SP1. Our analysis shows that the flip-flops with such behavior typically belong to either the error checking and exception handling registers or higher bits of address registers which are constant due to temporal and spatial locality of the executed instructions. Besides, the SP of a considerable number of flip-flops is very close to either 0.0 or 1.0. Please note that the results reported in Fig. 4a are the average of the six employed workloads, and hence, the flip-flops with SP=0 or SP=1 have such SP across all executed workloads. A similar experiment has been carried out in [18] to study the impact of workload in real systems, which shows that some flip-flops are always under S-BTI across different workloads.

Figure 4b shows the distribution of the maximum voltage-drop impacting the flip-flops of Leon3 processor compared to the peak voltage-drop across all the executed workloads. Please note that it is necessary to consider the maximum voltage-drop over the execution of all workloads, because it eventually impacts the flip-flop characteristics. A significant portion of flip-flops experience on average 41% of the maximum amount of voltage-drop; however, there are flip-flops at the right side tail of the distribution which experience large voltage-drop comparable to the maximum voltage-drop in the circuit.

According to the observations in Fig. 4, there are flip-flops experiencing both S-BTI and high voltage-drop which leads to high-degradation. If such flip-flops are on a critical path of the processor (i.e. timing-critical flip-flops), the degradation of the flip-flops should be reflected in the timing guardband of the circuit. Timing-critical flip-flops can be categorized into different groups based on the impact of voltage-drop and aging as follows:

- low voltage-drop and low aging,
- low voltage-drop but S-BTI aging (SP0/SP1)\*,
- high voltage-drop but typical aging\*,
- high voltage-drop and S-BTI aging (SP0/SP1)\*.

Therefore, we propose to generate flip-flops specifically optimized for such high-degradation conditions (marked by \*) and add them to the standard cell library. Using the proposed flow in Sect. 4, we determine such high-degradation and timing-critical flip-flops and replace them with the optimized versions to improve the timing and reliability of the circuit.
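The four groups above can be sketched as a small classifier. This is only an illustration: the function name, the `sp_margin` tolerance, and the `vdrop_threshold` cut-off for "high" voltage-drop are assumptions, since the text does not specify numeric thresholds.

```python
def classify_flip_flop(sp, vdrop_norm, sp_margin=0.0, vdrop_threshold=0.8):
    """Return the stress group of a timing-critical flip-flop.

    sp         -- input signal probability in [0, 1]
    vdrop_norm -- maximum voltage-drop normalized to the circuit peak, in [0, 1]
    sp_margin and vdrop_threshold are assumed cut-offs, not values from the text.
    """
    static_bti = sp <= sp_margin or sp >= 1.0 - sp_margin
    high_vdrop = vdrop_norm >= vdrop_threshold
    if high_vdrop and static_bti:
        return "high voltage-drop and S-BTI aging"
    if high_vdrop:
        return "high voltage-drop but typical aging"
    if static_bti:
        return "low voltage-drop but S-BTI aging"
    return "low voltage-drop and low aging"


def needs_optimized_cell(group):
    """Only the three starred groups call for an optimized flip-flop."""
    return group != "low voltage-drop and low aging"


print(classify_flip_flop(sp=0.0, vdrop_norm=0.41))
```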

## 3 Reliability-Aware Flip-Flop Design

In a typical reliability-aware circuit design, one should consider the delay of the elements under variation impacts to ensure the correct functionality of the circuit during the expected lifetime. Therefore, higher delay degradation of timing-critical flip-flops imposes a large timing guardband. In our proposed methodology, we create optimized versions of the flip-flops for different stress conditions based on aging and voltage fluctuations, and use these optimized versions only when a flip-flop is timing-critical and subject to such stress conditions to avoid unnecessary over design. This means that in the cell library, we add the following resilient versions of the flip-flops:

- Aging-resilient flip-flops, optimized for different aging corners (SP0 and SP1),
- Voltage-drop resilient flip-flops, optimized to have lower performance degradation under voltage fluctuation,
- Aging and voltage-drop resilient flip-flops.

Fig. 5 Delay of a C2MOS flip-flop which is aged under SP = 0 over 5 years for LH/HL transitions, compared to the flip-flop optimized for SP = 0 showing how the unbalanced aging of internal LH/HL paths worsens the degradation in original flip-flop

## 3.1 Aging-Resilient Flip-Flop Design

When the fresh delays of internal paths of a flip-flop (i.e., LH and HL paths) are designed to be similar (depicted as solid lines in Fig. 5), the internal path with higher degradation rate eventually becomes dominant and determines the total delay of the flip-flop. In this case, a significant aging in flip-flop characteristics is observed over time (corresponding to the internal path with higher degradation). On the other hand, if the internal path with higher degradation rate is initially faster (by design) than the internal path with lower degradation rate, the dominant internal path would be the slower one, and hence the higher degradation rate of the faster internal path is masked. Consequently, the overall aging of the flip-flop would be rather small. The delay of the optimized flip-flop, shown in Fig. 5 by dashed lines, exhibits such characteristics. The post-aging delay of the optimized flip-flop would increase by $\sim 10$ ps, which is much lower than the $\sim 40$ ps increase in the delay of the original flip-flop. We exploit this method for designing aging-resilient flip-flops.

In order to decrease the overall BTI-induced aging inflicted on a flip-flop, our proposed method balances the delay of internal HL and LH paths of the flip-flop for the *post-aging state of the flip-flop*, by resizing the transistors of internal HL and LH paths. In other words, the proposed method increases the fresh delay (t = 0) of the flip-flop internal path which has lower degradation rate in order to compensate the overall degradation of the flip-flop after aging. Although the fresh delay of the optimized flip-flop might be slightly larger compared to the fresh delay of the original flip-flop, the overall post-aging delay of the optimized flip-flop is smaller since its aging rate is much smaller.
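The masking effect can be illustrated with a deliberately simplified model in which each internal path is reduced to a single delay number and the flip-flop delay is set by the slower path. All numbers below are hypothetical, chosen only to echo the order of magnitude mentioned for Fig. 5, and the sketch assumes that the aging shifts do not change with sizing, which a real flip-flop would not satisfy exactly.

```python
def fresh_and_aged_delay(fresh_lh, fresh_hl, shift_lh, shift_hl):
    """Flip-flop delay is set by the slower internal path, before and after aging."""
    fresh = max(fresh_lh, fresh_hl)
    aged = max(fresh_lh + shift_lh, fresh_hl + shift_hl)
    return fresh, aged


# Hypothetical values in ps for an SP0-like corner where the HL path ages fastest.
# Balanced fresh design: both paths start at 100 ps.
fresh_a, aged_a = fresh_and_aged_delay(100, 100, shift_lh=10, shift_hl=40)
# Re-sized design: HL path made faster and LH path slower, so that both paths
# meet at the same delay after aging.
fresh_b, aged_b = fresh_and_aged_delay(105, 75, shift_lh=10, shift_hl=40)

print(aged_a - fresh_a)  # 40: the fast-aging HL path dominates
print(aged_b - fresh_b)  # 10: HL degradation is masked by the slower LH path
```

In this toy example the re-sized design starts slightly slower (105 ps instead of 100 ps) but ends up faster after aging (115 ps instead of 140 ps).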

Please note that this method reduces the degradation for a given SP, but inevitably worsens the aging at the other corners of SP. For example, if we optimize the flip-flop for SP0, the degradation would be much higher if the optimized flip-flop operates at SP1. Nevertheless, these flip-flops under S-BTI will not operate at other SP corners, because their SP is determined by the circuit structure and functionality. Therefore, we only optimize for the given SP corner. This means that we intentionally sacrifice other corners, which never occur due to the specific functionality of the circuit, to gain a larger improvement.

## 3.2 Voltage-Drop Resilient Flip-Flop Design

Unlike aging, which affects each flip-flop transistor based on the input signal probability, a drop in the supply voltage of the flip-flop slows down all flip-flop transistors in the same way. However, a slight upsizing of specific transistors can compensate for the degradation in the flip-flop timing. Therefore, we evaluate the delay of the flip-flop when operating under the impact of voltage-drop, and optimize the flip-flop with the goal of improving the delay. Consequently, the optimized flip-flop would have better timing at the cost of higher power consumption.

## 3.3 Aging and Voltage-Drop Resilient Flip-Flop Design

The degradation in the flip-flop timing due to both S-BTI and voltage-drop is very large. Such timing degradation may not be effectively compensated by resizing the transistors within a flip-flop area without upsizing the entire flip-flop. Therefore, in addition to targeting for better timing under the impact of the aging and voltage-drop, we allow the optimization algorithm to increase the area of the flip-flop by a small percentage. Please note that an extra Engineering Change Order (ECO) might be needed to replace the original flip-flop with the optimized version in this case. However, since there exist only a few flip-flops under such degradation, it would not be an issue to perform an ECO on placement.

## 3.4 Problem Formulation for Flip-Flop Resiliency Optimization

The delay of a flip-flop under a specific working condition (including temperature, voltage, and input SP) can be presented as a function of the transistors' widths:

$$delay = f(W), \quad W = [w_i], \tag{2}$$

where  $[w_i]$  is a vector containing the width of flip-flop transistors. Here, *delay* is the delay (Data-to-q) of the flip-flop, according to Eq. (1), under variation impact, which could be S-BTI stress, voltage-drop, or both depending on the optimization approach.

The delay function f is a complicated function of transistors' widths. Our experimental results for flip-flops with different sizing show that f cannot be presented with any general linear function. Therefore, we use Sequential Quadratic Programming (SQP) which is a non-linear programming technique [19]. In SQP, the problem is converted into quadratic sub-problems and solved in order to find a better sizing in each iteration. For this purpose, we follow an iterative approach in order to minimize the delay of Eq. (2). Given an initial sizing, the delay function f is approximated with a quadratic function:

$$f(W) \approx f(W_0) + \nabla f(W_0)^T \cdot (W - W_0) + \frac{1}{2}(W - W_0)^T \cdot H_f(W_0) \cdot (W - W_0), \tag{3}$$

where  $\nabla f(W)$  and  $H_f(W)$  are the gradient and the Hessian of the delay function f, respectively. Minimizing the quadratic approximation of Eq. (3), with respect to some constraints, which will be discussed later in this section, yields an optimized transistor sizing. Thereafter, the obtained sizing is used as the initial sizing, and a new iteration is launched. This cycle continues until the optimization reaches the required precision, i.e. the difference between the optimized delays of two consecutive iterations becomes smaller than a predefined threshold  $\epsilon_{delay}$ . Therefore, the solver continues by checking the precision of the resulting delay:

$$|\text{delay}_{i-1} - \text{delay}_i| < \epsilon_{\text{delay}} \tag{4}$$

where $delay_i$ represents the delay of the $i$th iteration.
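The iteration described above can be sketched in plain Python. This is a simplified illustration, not the chapter's implementation: it omits the constraints (so it is a Newton-type iteration on the quadratic model of Eq. (3) rather than a full SQP solver), handles only two transistor widths, estimates the gradient and Hessian by finite differences, and uses a hypothetical closed-form `toy_delay` in place of the SPICE-characterized delay function.

```python
def gradient(f, w, h=1e-4):
    """Central-difference gradient of f at the point w."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def hessian(f, w, h=1e-4):
    """Finite-difference Hessian, built from differences of gradients."""
    rows = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        g_up, g_down = gradient(f, up, h), gradient(f, down, h)
        rows.append([(a - b) / (2 * h) for a, b in zip(g_up, g_down)])
    return rows


def solve_2x2(a, b):
    """Solve the 2x2 linear system a * x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def minimize_delay(f, w0, eps_delay=1e-6, max_iter=50):
    """Build the quadratic model of Eq. (3), move to its minimizer,
    and stop once the delay change satisfies Eq. (4)."""
    w = list(w0)
    prev = cur = f(w)
    for _ in range(max_iter):
        g, h_mat = gradient(f, w), hessian(f, w)
        step = solve_2x2(h_mat, [-g[0], -g[1]])  # minimizer of the quadratic model
        w = [w[0] + step[0], w[1] + step[1]]
        cur = f(w)
        if abs(prev - cur) < eps_delay:  # termination test of Eq. (4)
            break
        prev = cur
    return w, cur


# Toy two-transistor delay model (an assumption): wider devices drive faster
# (the 1/w terms) but add load (the linear terms). Its minimum is at (2, 3).
def toy_delay(w):
    return 4.0 / w[0] + 9.0 / w[1] + w[0] + w[1]


widths, delay = minimize_delay(toy_delay, [1.0, 1.5])
print([round(x, 3) for x in widths], round(delay, 3))
```

Note that the minimizer of the toy model lies in the interior of the search space rather than on a boundary, which is the kind of solution a linear model could not produce.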

Another reason to use the quadratic approximation is that the optimum result of a linear problem always lies on the boundaries, while the optimum result of a quadratic problem can be any point within the boundaries as well as the boundaries themselves. In Sect. 5, we demonstrate that the optimum result does not necessarily lie on the boundaries, and hence a non-linear programming technique is needed to find a better result. Table 1 summarizes the optimization problem.

Table 1 Flip-flop optimization method summary

| Item          | Expression                                                        | Description                  |
|---------------|-------------------------------------------------------------------|------------------------------|
| Parameters    | $W = (w_1, \ldots, w_n)$                                          |                              |
| Initial guess | $W_0$                                                             | optimized $W$ for PDP (fresh) |
| Constraints   | $w_i \ge w_{\min}$                                                |                              |
|               | $\sum_{1}^{n} w_i \le (1+\lambda)\sum W_0$                        |                              |
|               | $\operatorname{power}(W) \le (1+\beta)\operatorname{power}(W_0)$ |                              |
| Target        | minimize: delay $= f(W)$                                          |                              |
| Constants     | $w_{\min}$                                                        | Minimum size                 |
|               | $\lambda$                                                         | Acceptable excessive area    |
|               | $\beta$                                                           | Acceptable excessive leakage |

Several constraints are applied to the optimization problem, relating to transistor size, flip-flop area, and leakage. The first constraint shown in Table 1 limits the minimum size of transistors. The second constraint limits the area of the optimized flip-flop. In case of optimizing for S-BTI or voltage-drop, we consider $\lambda = 0$ to keep the flip-flop area within the area of the original flip-flop, which also facilitates keeping the aspect ratio almost equal to the aspect ratio of the original flip-flop. This way, the optimized flip-flop can easily replace the original flip-flop without any layout modifications at the circuit-level. This is achieved by limiting the summation of transistor widths $w_i$. However, for flip-flops under both S-BTI and voltage-drop, we assume a value of $\lambda > 0$ to better compensate for the delay degradation. The third constraint sets an upper limit for the excessive leakage of the flip-flop by parameter $\beta$. This constraint is applied to the optimization problem to limit the leakage power of the optimized flip-flops within an acceptable range. The initial guess of optimization $W_0$ is the optimum sizing for minimum PDP in the fresh state.
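The three constraints described above can be written as a simple feasibility check on a candidate sizing. The sketch below is an illustration only: the function and argument names are hypothetical, and the `power` callable stands in for the leakage obtained from cell characterization (here replaced by a toy model in which leakage is proportional to total width).

```python
def is_feasible(w, w0, power, w_min, lam, beta):
    """Check the minimum-size, area, and leakage constraints for sizing w.

    w, w0 -- candidate and initial transistor widths
    power -- callable returning the leakage power of a sizing (assumed model)
    lam   -- acceptable excessive area, beta -- acceptable excessive leakage
    """
    min_size_ok = all(wi >= w_min for wi in w)
    area_ok = sum(w) <= (1 + lam) * sum(w0)
    leakage_ok = power(w) <= (1 + beta) * power(w0)
    return min_size_ok and area_ok and leakage_ok


# Toy leakage model (assumption): leakage proportional to total width.
def toy_leakage(w):
    return sum(w)


w0 = [2, 2, 2]
# Same total width as w0: allowed even with lam = 0.
print(is_feasible([3, 2, 1], w0, toy_leakage, w_min=1, lam=0.0, beta=0.1))
# Larger total width: needs lam > 0 and enough leakage headroom.
print(is_feasible([3, 3, 1], w0, toy_leakage, w_min=1, lam=0.0, beta=0.1))
print(is_feasible([3, 3, 1], w0, toy_leakage, w_min=1, lam=0.2, beta=0.2))
```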

## 3.5 Reliability-Aware Flip-Flop Optimization Flow

Figure 6 presents our proposed reliability-aware optimization flow. For a given input SP, the SPs of all transistors are calculated once using SPICE simulations. Afterwards, based on the extracted SP for transistors and the operating corner of the flip-flop (temperature, supply voltage, etc.), the BTI-induced threshold voltage shifts of all transistors ($\Delta V_{th}$) are obtained. Then, the $\Delta V_{th}$ values are back-annotated into the original flip-flop SPICE netlist, and the SPICE netlist of the aged flip-flop is generated.

In each SQP iteration, the quadratic sub-problems are created and solved to generate further improved flip-flop sizing. Subsequently, the new sizing is back-annotated into the aged flip-flop netlist extracted before, and a new aged flip-flop with the given sizing is generated. Then, Cadence Virtuoso Liberate [6] is used to characterize the new flip-flop and extract its delay and power consumption. When the improvement is small enough and the condition in Eq. (4) is met, the SQP method terminates and returns the last sizing as the optimum solution for the problem.

Fig. 6 Overall flow to find the optimum flip-flop sizing under S-BTI stress and voltage-drop at a specific working corner (voltage, temperature)

As the process is executed at a specific supply voltage ($V_{dd}$), it can inherently be used to optimize for a voltage-drop as well, when the given supply voltage includes the impact of the voltage-drop. We can also create a voltage-drop resilient version of a flip-flop for typical aging, by considering an input SP of 0.5. Therefore, we execute the flow presented in Fig. 6 for these conditions in order to create variation-resilient versions of the flip-flop, assuming a supply voltage of $V_{dd}$ and a maximum voltage-drop of R%:

| Flip-flop version      | Supply voltage (V)          | Aging condition          |
|------------------------|-----------------------------|--------------------------|
| Aging                  | $V_{dd}$                    | S-BTI (SP0, SP1)         |
| Voltage-drop           | $(1 - \frac{R}{100})V_{dd}$ | Typical aging (SP = 0.5) |
| Aging and voltage-drop | $(1 - \frac{R}{100})V_{dd}$ | S-BTI (SP0, SP1)         |

After the optimization process, it is necessary to re-characterize the flip-flops for different supply voltages ($(1 - \frac{R}{100})V_{dd}$ to $V_{dd}$) and aging conditions ($0 \le SP \le 1$). The characterization results are then used to obtain the overall circuit timing under supply voltage fluctuation and aging impacts.
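The optimization corners listed in the table above can be enumerated programmatically, for example to drive the flow of Fig. 6 once per corner. The following is a small sketch under assumptions: the function name and the dictionary layout are hypothetical, and the values `vdd = 1.0` V and `r_percent = 10` are placeholders.

```python
def optimization_corners(vdd, r_percent):
    """Return (supply voltage, input SP) pairs for each resilient flip-flop version."""
    v_drop = (1 - r_percent / 100) * vdd  # supply voltage including the maximum drop
    return {
        "aging": [(vdd, 0.0), (vdd, 1.0)],
        "voltage-drop": [(v_drop, 0.5)],
        "aging and voltage-drop": [(v_drop, 0.0), (v_drop, 1.0)],
    }


# Placeholder values: 1.0 V nominal supply, 10% maximum voltage-drop.
corners = optimization_corners(vdd=1.0, r_percent=10)
for version, points in corners.items():
    print(version, points)
```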

## 4 Selective Flip-Flop Optimization

This section explains how the optimized flip-flops in Sect. 3 can be employed to improve the reliability of a circuit. The idea is to find the flip-flops affected by S-BTI and/or voltage-drop impacts which are also influential on circuit reliability, i.e. the timing-critical flip-flops, and replace them with the optimized versions. The reason for this *selective flip-flop optimization* is that the reliability-aware flip-flop optimization is costly in terms of leakage overhead per flip-flop. Therefore, to be cost-effective, flip-flop replacement should be done only for the timing-critical flip-flops which experience S-BTI and/or large voltage-drop. Since they constitute a small subset of all the flip-flops in the design, the proposed method is able to reduce the overall timing guardband in a cost-effective way.

The overall flow of the proposed selective flip-flop optimization methodology is presented in Fig. 7. The flow uses the results of the Synthesis and Place & Route steps of a VLSI design flow and is composed of (I) Aging and Voltage-Drop Analysis and (II) Selective Flip-flop Replacement steps. The optimization flow updates the gate-level netlist and the circuit layout to improve the reliability of the circuit under voltage-drop and aging impacts. The outputs of the optimization method can be further used in the rest of the VLSI design flow. Therefore, the proposed method is transparent to the VLSI design flow and can be easily integrated into it.

## 4.1 Aging and Voltage-Drop Analysis

In this step, the results of Synthesis and Place & Route steps of the VLSI design flow are used to discover the flip-flops which are *aging-critical*, *voltage-drop critical*, and *timing-critical*.

Aging-critical flip-flops are those flip-flops which experience a large impact of aging, i.e. flip-flops under S-BTI. To find the aging-critical flip-flops we need to extract the SP of the flip-flops. Therefore, we perform a gate-level simulation running some representative workloads. The representative workloads are pieces of workloads which are typically executed on the circuit. The result of the gate-level simulation is the Value Change Dump (VCD) of all nets inside the circuit. Based on this information we can collect the SP of all flip-flops and determine the aging-critical flip-flops.
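To make the SP extraction concrete, the sketch below computes the signal probability of a net from a list of value changes and flags the flip-flops under static stress. It is a simplified illustration: the `(time, value)` list is a stand-in for an already parsed VCD trace, and the `margin` parameter that decides how close to 0 or 1 counts as aging-critical is an assumption.

```python
def signal_probability(changes, t_end):
    """Fraction of the interval [0, t_end] during which the signal is at logic 1.

    changes -- list of (time, value) pairs sorted by time, with the first entry
               at time 0 and value equal to 0 or 1.
    """
    high_time = 0
    following = changes[1:] + [(t_end, None)]
    for (t, value), (t_next, _) in zip(changes, following):
        if value == 1:
            high_time += t_next - t
    return high_time / t_end


def aging_critical(sp_by_ff, margin=0.0):
    """Flip-flops whose input SP is within `margin` of 0 or 1 (assumed criterion)."""
    return sorted(ff for ff, sp in sp_by_ff.items()
                  if sp <= margin or sp >= 1 - margin)


# Hypothetical traces over 40 time units.
sp_by_ff = {
    "ff_toggle": signal_probability([(0, 0), (10, 1), (30, 0)], t_end=40),
    "ff_zero": signal_probability([(0, 0)], t_end=40),
    "ff_one": signal_probability([(0, 1)], t_end=40),
}
print(sp_by_ff)
print(aging_critical(sp_by_ff))
```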

Dynamic power profiles of circuit components can be extracted from the VCD reports. We estimate the dynamic voltage-drop in the circuit based on the power profiles and the layout and packaging of the circuit. This accounts for the resistive and inductive components of the voltage fluctuation. We generate a *voltage-drop map* of the circuit by evaluating the *maximum voltage-drop* of each cell (gates, flip-flops, etc.) over the time and over different workloads. As a result, we find the maximum amount of voltage-drop that each flip-flop experiences over time. Accordingly, the flip-flops which experience a large amount of voltage-drop are extracted.

Fig. 7 Circuit optimization flow using the proposed selective flip-flop optimization method
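The construction of the voltage-drop map amounts to taking, for each cell, the maximum over time and over workloads. A minimal sketch follows, assuming the per-cell voltage-drop samples are already available from the power and PDN analysis; the data layout, the sample values (in mV), and the `fraction` threshold used to select voltage-drop-critical cells are all hypothetical.

```python
def voltage_drop_map(drops_by_workload):
    """Maximum voltage-drop per cell over time and over all workloads.

    drops_by_workload -- {workload: {cell: [voltage-drop samples over time]}}
    """
    worst = {}
    for per_cell in drops_by_workload.values():
        for cell, samples in per_cell.items():
            worst[cell] = max(worst.get(cell, 0), max(samples))
    return worst


def voltage_drop_critical(vmap, fraction=0.8):
    """Cells whose maximum drop reaches `fraction` of the circuit peak (assumed rule)."""
    peak = max(vmap.values())
    return sorted(cell for cell, drop in vmap.items() if drop >= fraction * peak)


# Hypothetical samples in mV for two workloads.
samples = {
    "qsort": {"ff1": [20, 45, 30], "ff2": [10, 25, 15]},
    "fft": {"ff1": [62, 40, 35], "ff2": [30, 12, 18]},
}
vmap = voltage_drop_map(samples)
print(vmap)
print(voltage_drop_critical(vmap))
```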

Furthermore, the gate-level simulation results are used to perform a *voltage-drop and aging-aware timing analysis* which obtains the delay of circuit paths under variability impacts. We extended the aging-aware timing analysis in [8] by considering voltage-drop related information. This is done by characterizing the cells at two different supply voltages: the nominal $V_{dd}$ and the supply voltage considering the maximum drop $(1 - \frac{R}{100})V_{dd}$. Then, for each gate/flip-flop in the gate-level netlist, based on the amount of voltage-drop on the gate, we perform a linear interpolation between the standard cell library entries for the two supply voltages and find the corresponding timing information. The linear interpolation is a valid method under the assumption of limited change in the supply voltage, as shown in Fig. 3. For a more aggressive voltage fluctuation, it could be necessary to characterize the standard cell libraries for a few intermediate supply voltage values and employ a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) method. Accordingly, we find the timing-critical flip-flops, which are parts of the critical and near-critical paths of the circuit considering the impact of variations.
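The linear interpolation between the two characterized supply-voltage corners can be sketched as follows. The function name and the example delays are hypothetical; the sketch only assumes that a cell delay is known at $V_{dd}$ and at $(1 - \frac{R}{100})V_{dd}$ and that the cell's own voltage-drop lies between 0 and R%.

```python
def interpolate_delay(delay_nominal, delay_max_drop, drop_percent, r_percent):
    """Linearly interpolate a cell delay between the two characterized corners.

    delay_nominal  -- delay characterized at Vdd
    delay_max_drop -- delay characterized at (1 - R/100) * Vdd
    drop_percent   -- voltage-drop seen by this cell, 0 <= drop_percent <= r_percent
    """
    if not 0 <= drop_percent <= r_percent:
        raise ValueError("voltage-drop outside the characterized range")
    fraction = drop_percent / r_percent
    return delay_nominal + fraction * (delay_max_drop - delay_nominal)


# Hypothetical cell: 100 ps at Vdd and 120 ps at a 10% drop, seeing a 5% drop.
print(interpolate_delay(100.0, 120.0, drop_percent=5, r_percent=10))  # 110.0
```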

## 4.2 Selective Flip-Flop Replacement

In the selective flip-flop replacement step, we replace the flip-flops which are timing-critical, aging-critical, or voltage-drop-critical with their optimized counterparts for such aging and/or voltage-drop conditions. Although a small portion of the flip-flops are replaced during the flip-flop replacement process, the circuit layout, timing, and power properties change since the replaced flip-flops are timing-critical and may have different area and power characteristics. Therefore, the proposed flip-flop replacement is an iterative process which replaces a number of flip-flops with the optimized versions in each iteration. The iterative process continues until no flip-flop needs to be replaced by an optimized version anymore.

In iteration *i* of the method, we assume that the circuit delay is  $D^i$  based on timing analysis results, and  $d^i_j$  is the maximum delay of the paths terminating at flip-flop *j* (including the delay of the flip-flop as well). Therefore, in each iteration:

1. We choose the timing-critical flip-flops with a timing slack value of less than k% of the circuit delay, i.e. when:

$$d_j^i \ge \left(1 - \frac{k}{100}\right) D^i, \quad j : \text{index of flip-flops.}$$

2. Among these flip-flops, those which are also aging-critical and/or voltage-drop-critical are replaced with the optimized versions.
3. A trial voltage-drop and aging-aware timing analysis is performed and the circuit delay  $(D^{i++})$  is determined considering the replaced flip-flops.
4. We keep the optimized flip-flops only when the corresponding path delay of the flip-flops *before optimization* is larger than a percentage of the evaluated circuit delay  $(D^{i++})$ :

$$d_j^i > r \times D^{i++} \Longrightarrow FF_j \to FF_{j,opt}. \tag{5}$$

The rest of the updated flip-flops in this iteration are rolled back to the original versions. Please note that we include a ratio r < 1 in Eq. (5) to compensate for the calculation errors of the timing analysis.

5. The layout and gate-level netlist of the circuit are updated. The layout is only updated if a cell with larger area is used (particularly applicable to the flip-flops under both aging and voltage-drop as explained in Sect. 3).
6. In case any flip-flop is replaced by an optimized version during this iteration, we need to start a new iteration because the timing and power specifications of the circuit are modified. This is done by re-executing the aging and voltage-drop analysis, as explained in Sect. 4.1. The gate-level simulation, which is a time-consuming process, does not need to be repeated as its results are not affected by the flip-flop replacement.

The above flow replaces a minimum number of flip-flops with the optimized versions and imposes a minimum amount of overhead on the circuit. In our simulations the flow terminates within a few iterations, since the changes in the circuit layout, power, and timing are not extensive.
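The iterative steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `timing_analysis` is a hypothetical callback standing in for the voltage-drop and aging-aware timing analysis of Sect. 4.1, the layout update of step 5 is not modeled, and `toy_timing` with its delay factors is an invented model used only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FlipFlop:
    aging_critical: bool
    vdrop_critical: bool
    optimized: bool = False


# Hypothetical interface: returns d_j, the maximum delay of the paths
# terminating at each flip-flop, for the current choice of cells.
TimingAnalysis = Callable[[Dict[str, FlipFlop]], Dict[str, float]]


def selective_replacement(ffs: Dict[str, FlipFlop],
                          timing_analysis: TimingAnalysis,
                          k: float = 15.0, r: float = 0.95,
                          max_iters: int = 50) -> List[str]:
    """Return the names of the flip-flops that end up replaced."""
    kept_total: List[str] = []
    for _ in range(max_iters):
        d = timing_analysis(ffs)             # d_j^i (step 6: re-analysis)
        circuit_delay = max(d.values())      # D^i
        # Steps 1-2: timing-critical and aging- or voltage-drop-critical.
        candidates = [name for name, ff in ffs.items()
                      if not ff.optimized
                      and d[name] >= (1 - k / 100) * circuit_delay
                      and (ff.aging_critical or ff.vdrop_critical)]
        if not candidates:
            break
        # Step 3: trial replacement and trial timing analysis.
        for name in candidates:
            ffs[name].optimized = True
        trial_delay = max(timing_analysis(ffs).values())
        # Step 4: keep a replacement only if Eq. (5) holds, else roll back.
        kept = [name for name in candidates if d[name] > r * trial_delay]
        for name in candidates:
            if name not in kept:
                ffs[name].optimized = False
        if not kept:
            break
        kept_total.extend(kept)
    return kept_total


# Toy model (assumption): fixed base delays in ps, scaled by invented factors.
BASE_PS = {"ff0": 1000.0, "ff1": 950.0, "ff2": 700.0}


def toy_timing(ffs: Dict[str, FlipFlop]) -> Dict[str, float]:
    delays = {}
    for name, ff in ffs.items():
        if ff.optimized:
            factor = 1.04
        elif ff.aging_critical or ff.vdrop_critical:
            factor = 1.10
        else:
            factor = 1.02
        delays[name] = BASE_PS[name] * factor
    return delays
```

In this toy setup, only `ff0` is both timing-critical and aging-critical, so it is the only flip-flop replaced; `ff2` is aging-critical but has enough slack and keeps its original cell.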

## 5 Results and Discussions

In this section, we evaluate the efficiency of the proposed selective flip-flop optimization based on simulation results.

## 5.1 Simulation Setup

We applied the method to several flip-flop topologies, namely C2MOS latch, Dynamic/Static Single Transistor Clocked latch (DSTC/SSTC), and Semi-Dynamic flip-flop (SDFF) [31]. The flip-flops are implemented using 45 nm Bulk CMOS Predictive Technology Model (PTM) transistors [39]. All flip-flops are initially optimized for the minimum PDP in the fresh state (original design). The aging parameters of the model proposed in [4] are tuned so that the post-aging delay of a Fan-Out 4 (FO4) inverter increases by 10% at SP = 0.5 over 5 years. For delay and leakage measurements, the output load of flip-flops is set to FO4, and the cells are characterized at room temperature and at different supply voltages, ranging from 80 to 100% of the nominal supply voltage of the technology node.

We used the Leon3 processor as a case study for our proposed method. We used the Nangate 45 nm open cell library for combinational logic, and the aging assumptions are the same as described at the beginning of this section. The processor is synthesized using Synopsys Design Compiler, and placement and routing are done using Cadence EDI [5]. We executed various MiBench workloads on the synthesized Leon3 processor and extracted the VCD files. Based on the VCD files, the SP of each node of the synthesized circuit is calculated, and the power consumption of the gates and flip-flops is calculated using Synopsys Power Compiler. The voltage-drop map of the processor is also extracted using the VoltSpot tool [38], which is able to extract the voltage-drop caused by both resistive and inductive components.

Please note that the proposed technique is not restricted to a specific working condition or flip-flop topology. We proceed with presenting detailed results and analysis for a C2MOS flip-flop. Then, we discuss the results for other types of flip-flops concisely. Afterwards, the dependency of the improvement achieved by the proposed method on the excessive leakage is investigated. At the end of this section, the impact of using optimized flip-flops on the Leon3 processor lifetime is demonstrated.

## 5.2 Detailed Optimization Results of C2MOS Flip-Flop

We apply the proposed optimization flow presented in Sect. 3.5 (see Fig. 6) to the C2MOS flip-flop design to create optimized flip-flops for aging and voltage-drop resilience. In order to create the aging-resilient versions of the C2MOS flip-flop, we let the optimizer consider designs with up to 25% more leakage compared to the original flip-flop by setting the coefficient  $\beta$  in Table 1 to 0.25. At this point, we limit the area of the flip-flop to the area of the original flip-flop, i.e.  $\lambda = 0$ . Please note that the total leakage power overhead for the entire circuit would be negligible since the number of optimized flip-flops in the design would be limited. For example, if according to Sect. 2.3, 12.45% of flip-flops are working under S-BTI, and the leakage overhead of an optimized flip-flop is less than 25%, the leakage overhead imposed on the flip-flops would be at most 3.11% (and much less when considering the entire processor design). The aging and voltage-drop resilient version of the C2MOS flip-flop can be created by allowing up to 20% extra area and more leakage overhead. For this, we assume  $\lambda = 0.2$ ,  $\beta = 1$ . Using the extra area, the optimizer is able to find a better design for those flip-flops which are timing-critical and are under large impact of aging and voltage-drop. Since these flip-flops are very rare, but have significant impact on the overall processor lifetime and reliability, it is effective to spend more area for large reliability and lifetime gains.
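The leakage bound quoted above follows from multiplying the fraction of flip-flops under S-BTI by the maximum allowed per-cell leakage increase. Assuming, for simplicity, that all flip-flops have a similar original leakage, the arithmetic is:

$$\text{leakage overhead over all flip-flops} \le 0.1245 \times 0.25 \approx 0.0311 = 3.11\%.$$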

Table 2 compares the characteristics of an original and optimized C2MOS flip-flop (such as setup-time (U), clock-to-q  $(D_{CQ})$ , data-to-q  $(D_{DQ})$ , delay, and leakage) in three different optimization scenarios:

- Scenario 1: *Post-aging PDP*, in which the flip-flop is optimized for PDP in post-aging.
- Scenario 2: The proposed method (optimized for aging), in which the flip-flop is optimized for aging resiliency, by minimizing its delay for post-aging. The acceptable excessive area and leakage are 0% and 25%, respectively ( $\beta = 0.25, \lambda = 0$ ).
- Scenario 3: The proposed method (optimized for aging + vdrop), in which the flip-flop is optimized for aging and voltage-drop resiliency, by minimizing its delay for post-aging and under voltage-drop impact. The acceptable excessive area and leakage are 20% and 100%, respectively ( $\beta = 1, \lambda = 0.2$ ).

| Parameters<sup>c</sup> | Original, Fresh | Original, Aged | Original, Aged+vdrop<sup>b</sup> | Scenario 1, Fresh | Scenario 1, Aged | Scenario 1, Aged+vdrop | Scenario 2, Fresh | Scenario 2, Aged | Scenario 2, Aged+vdrop | Scenario 3, Fresh | Scenario 3, Aged | Scenario 3, Aged+vdrop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $U_{LH}$ (ps) | 20.2 | 22.6 | 29.2 | 25.6 | 30.0 | 32.9 | 23.0 | 24.6 | 30.3 | 30.0 | 30.0 | 37.7 |
| $D_{CQ,LH}$ (ps) | 78.3 | 101.8 | 123.3 | 91.8 | 97.6 | 117.5 | 83.3 | 88.8 | 103.9 | 72.5 | 77.5 | 92.2 |
| $D_{DQ,LH}$ (ps) | 98.5 | 124.4 | 152.5 | 117.4 | 125.6 | 150.4 | 106.3 | 113.4 | 134.2 | 102.5 | 107.6 | 129.9 |
| $U_{HL}$ (ps) | 16.6 | 30.8 | 42.4 | 13.7 | 28.2 | 39.6 | 16.0 | 30.6 | 40.6 | 11.1 | 24.6 | 30.6 |
| $D_{CQ,HL}$ (ps) | 78.1 | 100.5 | 121.0 | 82.9 | 97.9 | 117.2 | 75.9 | 88.5 | 106.4 | 65.8 | 75.6 | 91.7 |
| $D_{DQ,HL}$ (ps) | 94.7 | 131.3 | 163.4 | 96.6 | 126.1 | 156.8 | 91.9 | 119.1 | 147.0 | 76.9 | 100.2 | 122.3 |
| Delay (ps) (Sect. 2.1) | **98.5** | 132.6 | 165.7 | 117.4 | 126.1 | 157.1 | 106.3 | 119.4 | 147.0 | 102.5 | 107.6 | 129.9 |
| Leakage (nW) | 44.3 | 30.7 | 15.5 | 42.0 | 27.9 | 14.1 | 46.4 | 31.4 | 15.8 | 67.8 | 46.1 | 23.1 |
| PDP | 4368 | 4074 | 2573 | 4936 | 3525 | 2215 | 4928 | 3744 | 2318 | 6952 | 4963 | 3000 |
| Delay degradation, Eq. (6)<sup>a</sup> | – | 35% | 68% | – | 28% | 60% | – | 21% | 49% | – | 9.2% | 32% |
| Excessive leakage | – | | | -5.2% | | | 4.7% | | | 53% | | |

**Table 2** C2MOS flip-flop characteristics for (1) the original flip-flop (optimized for PDP in the fresh state), (2) the flip-flop optimized for PDP in post-aging (Scenario 1, similar to [1]), (3) the flip-flop optimized for aging by the proposed method (Scenario 2), and (4) the flip-flop optimized for aging + vdrop by the proposed method (Scenario 3)

<sup>a</sup>The reference for calculating the delay degradation is 98.5 ps (Original, fresh flip-flop)

<sup>b</sup>Measurements are done under 10% delay degradation assumption (Sect. 5.1) and 10% voltage-drop

<sup>c</sup>Dynamic power is not reported because it is irrelevant for flip-flops under S-BTI as these flip-flops do not change state frequently

<sup>d</sup>Optimization results for SP1 are much better. For example, the delay degradation of the proposed method for aging is only 11% (for SP0 it is 21%)

The optimization results in Table 2 are reported for "fresh" state (no aging or voltage-drop), for "aged" state (under S-BTI aging SP0 for 5 years), and for "aging + vdrop" state (when the flip-flop is aged under S-BTI for 5 years, and when the supply voltage is dropped by 10%). Setup-time, clock-to-q, and data-to-q values are presented for LH/HL transitions and the delay is calculated according to Sect. 2.1. The delay degradation is the *relative post-aging delay increase of a design compared to the fresh delay of the original design* (marked as bold in the table):

$$\text{delay degradation} = \frac{\text{delay}_{\text{opt.,aged}} - \text{delay}_{\text{orig.,fresh}}}{\text{delay}_{\text{orig.,fresh}}}. \tag{6}$$

Since the optimized flip-flop will replace the corresponding flip-flop in the design, the delay degradation is compared to the fresh delay of the original flip-flop in order to give a better understanding of how close the aged delay of the optimized flip-flop is to the fresh delay of the original design.
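As a quick check of Eq. (6), the short Python sketch below recomputes the delay degradation of the aged designs from the "Delay" row of Table 2. The dictionary and function names are illustrative only.

```python
ORIG_FRESH_DELAY_PS = 98.5  # fresh delay of the original flip-flop (Table 2)

# Aged delays (ps) from the "Delay" row of Table 2.
AGED_DELAY_PS = {
    "original": 132.6,
    "scenario 1": 126.1,
    "scenario 2": 119.4,
    "scenario 3": 107.6,
}


def delay_degradation(delay_ps: float, ref_ps: float = ORIG_FRESH_DELAY_PS) -> float:
    """Eq. (6): relative delay increase with respect to the original fresh delay."""
    return (delay_ps - ref_ps) / ref_ps


for design, delay in AGED_DELAY_PS.items():
    print(f"{design}: {100 * delay_degradation(delay):.1f}%")
```

The computed values agree, after rounding, with the 35%, 28%, 21%, and 9.2% reported in the table.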

Scenario 1 is similar to many existing flip-flop optimization methods, such as [1, 13], in the sense that they use a product of energy and delay (e.g., the PDP or the Energy Delay Product (EDP)) as the optimization target. Scenario 1 is able to effectively reduce the PDP by increasing the delay and reducing the leakage, but this may result in unacceptable timing for S-BTI corners. Table 2 shows that, because the flip-flop delay is not the optimization target, the PDP methods cannot find the optimum aging-resilient sizing for S-BTI corners.

As presented, for the original flip-flop, the fresh delay of the LH and HL paths is almost identical (see  $D_{DQ,LH}$  and  $D_{DQ,HL}$ ), but after aging the HL path is much slower than the LH path. This leads to 35% delay degradation due to aging alone and about 68% when both aging and voltage-drop affect the flip-flop. When this flip-flop is optimized for scenario 1, the delay is not reduced well enough because the main concern is the PDP, not the delay. On the other hand, in scenario 2 (proposed method, only for aging), the optimizer alters the sizing to equalize the post-aging delay of the LH/HL paths to achieve the smallest possible post-aging delay with respect to the constraints (119.4 ps). In this case, the post-aging delay is increased by 21% compared to the fresh delay of the original flip-flop. Also, the leakage overhead is limited to 4.7%. Since the flip-flop operates in the S-BTI zone, the switching rate of the flip-flop is very small. This means that its dynamic power is almost negligible. Therefore, the total power of flip-flops under S-BTI is determined by the leakage power.



**Fig. 8** Performance of the original flip-flop vs. the flip-flop optimized by the proposed method at SP0 and SP1, before and after aging (5 years)

Even though scenario 2's design is much better for flip-flops which are only under the aging impact compared to the original and the state-of-the-art [1] flip-flop designs, the impact of 10% voltage-drop is significant on the delay, i.e. 49% delay degradation. The flip-flop optimization results for scenario 3 show that such flip-flops are more resilient against both aging and voltage-drop impacts. These flip-flops have about 53% more leakage; however, the delay degradation is only 32% under both aging and voltage-drop. Please note that the number of flip-flops under such conditions is very small. Therefore, using flip-flops optimized by scenario 3 has negligible impact on the overall processor power consumption.

## 5.3 Optimization Results for Other Flip-Flops

Figure 8 provides the optimization results for a set of representative flip-flops. It compares the delay and the leakage of the original and optimized flip-flops, for both fresh and post-aging states. All delay values are normalized to the fresh delay of the corresponding original flip-flops (which are 114.8 ps for C2MOS, 28.5 ps for SDFF, and 71.0 ps for SSTC).

For the C2MOS flip-flop, the proposed method reduces the delay degradation in Eq. (6) to 21%, while the delay degradation of the original design is 35% (an improvement of 14 percentage points). This flip-flop has a symmetric structure, which means it can have balanced timing for LH/HL transitions (shown in Fig. 2b), while some flip-flop topologies, such as SDFF, always have unbalanced timing for LH/HL transitions due to their internal structure. For example, in an SDFF, the delay of the HL transition is always smaller than that of the LH transition. The reason is that an intermediate precharged node in this flip-flop must be discharged in the LH transition in order to transfer the input "one" to the output, while no such discharging is required for the HL transition. Hence, the slower path is always the LH path. This imbalance may worsen the degradation if it is coupled with unbalanced aging. For these flip-flops, the optimizer minimizes the delay of the slower path by taking as much area as it can from the faster path and giving it to the slower path. For SDFF, this is attained with 15.8% additional leakage at SP0, but it leads to better S-BTI resiliency.

Fig. 9 Delay of C2MOS flip-flops optimized for SP0 aging using extra leakage (scenario 2). Delay degradation saturates as  $\beta$  increases (after  $\beta = 0.25$ )

## 5.4 Delay-Leakage Trade-Off

In order to understand the trade-off between additional leakage and delay, we optimized a C2MOS flip-flop with several excessive leakage amounts ranging from 0 to 50% (i.e.  $\beta \in \{0, 0.125, 0.25, 0.5\}$ ). As shown by Fig. 9, lower delay degradation can be achieved by allowing the optimization method to design flip-flops with higher leakage. However, the improvement saturates as  $\beta$  increases. Hence, providing extra leakage to the optimizer is only beneficial up to about 25%; beyond that, the improvement in the delay is not significant. Please note that the flip-flops designed with looser leakage constraints, i.e. higher  $\beta$, do not necessarily have very high leakage. As shown in Table 2, the optimized flip-flop in scenario 2 (only aging) has only 4.7% extra leakage while providing much better resiliency against S-BTI aging compared to the original flip-flop and scenario 1 (state-of-the-art work).



**Fig. 10** Comparison of the aging-induced delay degradation under impact of voltage-drop, for original flip-flop, optimized flip-flop with 0% extra area allowance (scenario 2), and optimized flip-flop with 20% extra area allowance (scenario 3). The voltage-drop induced delay increase may be compensated by 20% upsizing of the flip-flop cell during the optimization

## 5.5 Delay-Area Trade-Off

The impact of a small amount of extra area on the resiliency of the flip-flops against both aging and voltage-drop impacts is studied by changing the *excessive area overhead* parameter  $\lambda$  (see Table 1). We run the optimization flow in Sect. 3 for  $\lambda \in \{0, 0.2\}$  and compare the results to the original flip-flop design. Based on the results shown in Fig. 10, the flip-flop design with no extra area, i.e. scenario 2, exhibits good resiliency against aging; however, under the impact of 10% voltage-drop it has up to 49% delay degradation. Under the impact of voltage-drop, the flip-flop designed with 20% extra area exhibits much better characteristics, with a maximum of 32% delay degradation. This observation confirms that using flip-flops with 20% extra area can be beneficial for the cases when both aging and voltage-drop impacts are severe.

## 5.6 Circuit-Level Results

The proposed selective flip-flop optimization method presented in Sect. 4 is applied to the Leon3 processor with the setup presented in Sect. 5.1 to evaluate the overall impact on the processor timing and reliability. The "original flip-flop" designs are optimized for different output loads for minimum PDP in the fresh state, while the "optimized flip-flop" designs for "aging" and "aging + vdrop" are obtained by applying the proposed method. Therefore, for each original flip-flop design for a specific output load, there are different optimized designs for S-BTI corners SP0 and SP1 as well as no-vdrop and max-vdrop conditions (according to Sect. 3.5).

**Fig. 11** The layout map of the Leon3 flip-flops during the execution of some MiBench workloads on Leon3, showing relative voltage-drop criticality, timing criticality, and aging-criticality of different flip-flops. Values close to "1" correspond to higher criticality, and values closer to "0" represent the non-critical parts. The top-left part of the processor layout is filled by combinational gates. (a) Relative voltage-drop criticality of flip-flops. (b) Relative timing criticality of flip-flops. (c) Relative aging-criticality of flip-flops

The timing of Leon3 processor is evaluated using the "aging and voltage-drop analysis" step of the proposed flow (see Fig. 7). This incorporates using an improved version of an aging-aware timing analysis tool [8] which also considers the impact of supply voltage variation as explained in Sect. 4.1. This timing analysis determines the processor delay under runtime variation impacts.

Figure 11 illustrates the timing of Leon3 flip-flops on the processor layout as well as the calculated impacts of voltage-drop and aging on the processor timing. The presented plots are all normalized to the maximum values (maximum voltage-drop, maximum delay, maximum aging) for better visualization. Therefore, higher values (darker colors) represent a critical situation. Figure 11a presents voltage-drop of the flip-flops extracted using the "aging and voltage-drop analysis" step. The voltage-drop values are normalized to the maximum voltage-drop value extracted during the simulations. As shown, many flip-flops experience at least a moderate voltage-drop during the workload execution. However, the flip-flops on the top-left corner of the layout experience heavy voltage-drop. The timing criticality of the flip-flops is also shown in Fig. 11b. The flip-flops with lower timing slack have values closer to 1.0 in this figure (darker). Interestingly, some of the flip-flops on the top-left corner are also timing-critical. Additionally, the aging-criticality of the flip-flops is presented in Fig. 11c. It is shown that many flip-flops which are under S-BTI are also timing-critical. Most importantly, a few timing-critical flip-flops are affected by both aging and voltage-drop impacts.

Table 3 presents processor delays obtained in the fresh state, i.e. no aging or voltage-drop, and under aging and voltage-drop impacts. We compare the delay of the original processor (before applying the proposed method) with the delay of the optimized processors, under runtime variation impacts (aging and voltage-drop) after 7 years. The results are reported for:

1. "Original processor": using only original flip-flops,
2. "Optimized processor for aging": when only the impact of aging is considered during optimization,
3. "Optimized processor for aging and voltage-drop": when the impacts of aging and voltage-drop are considered during optimization.

|                                 | Processor delay in fresh state | Processor delay after 7 years + voltage-drop | Delay degradation | Guardband reduction | Equivalent lifetime improvement |
|---------------------------------|--------------------------------|----------------------------------------------|-------------------|---------------------|---------------------------------|
| Using original flip-flops       | 1389.6 ps                      | 1528.2 ps                                    | 9.97%             | –                   | –                               |
| Proposed (only aging)           | 1391.3 ps                      | 1494.8 ps                                    | 7.44%             | 33.4 ps             | 30.8%                           |
| Proposed (aging + voltage-drop) | 1379.7 ps                      | 1486.7 ps                                    | 7.75%             | 41.5 ps             | 36.9%                           |

**Table 3** Processor delay comparison when (1) using only original flip-flops, and (2) using the proposed method

The "original processor" is synthesized using the original flip-flop designs in Table 2. Then, we apply the proposed selective flip-flop optimization in two modes: (I) when only aging is considered, and (II) when both aging and voltage-drop are considered. This yields two versions of the optimized processor, i.e. "Optimized processor for aging" and "Optimized processor for aging and voltage-drop." In the optimization flow presented in Sect. 4.2, we assume k = 15. Therefore, all flip-flops with a slack value less than 15% of the processor delay are considered timing-critical. Additionally, we assume r = 0.95, which means that a calculation error of up to 5% in the timing analysis method is tolerated. In fact, the value of r depends on the accuracy of the timing analysis method. After replacing the critical flip-flops according to the proposed method, the processor delay is obtained again using the "aging and voltage-drop analysis" step.

According to the table, the delay of the "original processor" increases by 9.97% after 7 years. This translates into a 138.6 ps timing guardband for 7 years of circuit operation, i.e.  $T_{clk} \ge 1528.2$  ps. The "optimized processor for aging" has a better delay of 1494.8 ps under the impacts of aging and voltage-drop, which reduces the required timing guardband by 33.4 ps for 7 years of operation and hence improves the performance. Alternatively, the degradation rate of this optimized processor is such that it can operate for 9.2 years (30.8% lifetime improvement) if it is used with the timing margin of  $T_{clk} = 1528.2$  ps. Finally, the required timing guardband of the "Optimized processor for aging and voltage-drop" is reduced by 41.5 ps compared to the original processor. Therefore, the lifetime of the processor is improved by 36.9% (9.6 years).
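The guardband figures above can be reproduced from the delays in Table 3 with the short Python sketch below. The names are illustrative, and the equivalent lifetime is obtained by simply scaling the 7-year lifetime with the reported improvement percentage, which is an assumption about how the two reported numbers relate rather than a degradation model.

```python
# (fresh delay, delay after 7 years + voltage-drop) in ps, from Table 3.
TABLE3_PS = {
    "original": (1389.6, 1528.2),
    "aging": (1391.3, 1494.8),
    "aging+vdrop": (1379.7, 1486.7),
}
BASE_LIFETIME_YEARS = 7.0


def original_guardband_ps() -> float:
    """Timing guardband required by the original processor."""
    fresh, degraded = TABLE3_PS["original"]
    return degraded - fresh


def guardband_reduction_ps(design: str) -> float:
    """Reduction of the degraded delay relative to the original processor."""
    return TABLE3_PS["original"][1] - TABLE3_PS[design][1]


def equivalent_lifetime_years(improvement_pct: float) -> float:
    """Lifetime at the original clock period for a given improvement (assumed scaling)."""
    return BASE_LIFETIME_YEARS * (1.0 + improvement_pct / 100.0)


print(round(original_guardband_ps(), 1))
print(round(guardband_reduction_ps("aging"), 1))
print(round(guardband_reduction_ps("aging+vdrop"), 1))
print(round(equivalent_lifetime_years(30.8), 1))
print(round(equivalent_lifetime_years(36.9), 1))
```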

The reason for the achieved improvements in Table 3 is explained by Fig. 12. Here, we only plotted the delay of timing-critical flip-flops with a slack smaller than 15% of the processor delay (under aging and voltage-drop impacts). With this assumption, there are 261 timing-critical flip-flops. Among the timing-critical flip-flops, 92 flip-flops are under S-BTI impact (i.e.  $0 \le SP < 0.01$  or  $0.99 < SP \le 1$ ), and 235 flip-flops experience at least 33% relative voltage-drop. After applying the selective flip-flop optimization method, 96 flip-flops are replaced with optimized versions, of which 39 flip-flops are upsized (due to both aging and voltage-drop impact).

Fig. 12 Fresh delay (no aging, no voltage-drop) vs. increased delay (aged and 10% voltage-drop) of critical paths of Leon3 processor. The proposed selective flip-flop optimization method replaces the original flip-flops under S-BTI (red) with the optimized flip-flops (green) and suppresses the aging and voltage-drop degradation of the most critical paths

As the optimized flip-flops constitute about 4% of all flip-flops in Leon3, the overall leakage overhead with this method is 0.22% according to power analysis results using Synopsys Design Compiler. Moreover, there is virtually no dynamic power overhead because the replaced flip-flops are mostly under S-BTI impact and they rarely switch. The additional area overhead is also negligible because only 39 flip-flops are replaced by the upsized versions (less than 0.1% area overhead). The ECO process easily fits these flip-flops into the existing layout by slightly moving other cells. Please note that the impact of the voltage-drop and aging on the driving logic paths is much smaller than on the flip-flops. Therefore, these paths are degraded at a much lower rate.

## 6 Comparison with the Related Work

Various methods have been proposed to address the impact of aging and voltage-drop on flip-flops [1, 13, 23, 25]. For example, [1] proposes a method to improve flip-flop reliability for a set of corners with different working conditions such as temperatures and voltages by altering the sizing of transistors. These studies mostly optimize flip-flops for dynamic BTI stress condition, and flip-flops under static BTI are mostly overlooked. As explained, the traditional optimization techniques such as optimization for the PDP or EDP cannot effectively address the delay increase of flip-flops under such stress. There are techniques to reduce the overall impact of voltage-drop on VLSI circuits by skewing the clock input of the flip-flops at design-time in order to reduce the peak current at clock edge [9, 35]. However, these methods are not applicable to flip-flops with zero (or close to zero) timing slack on the critical paths. The techniques at high abstraction level by software-guided thread scheduling [27] or by voltage emergency prediction [26] also impose additional overhead at another abstraction level than circuit-level, in order to address a circuit-level problem.

## 7 Summary

In many cases, NTC circuits are required to operate over a wide voltage range in order to achieve energy efficiency and satisfy performance constraints as needed. Therefore, an NTC circuit may be exposed to reliability issues such as aging and voltage-drop which are significant in the super-threshold region.

In this chapter, we discussed that a non-negligible portion of circuit flip-flops may be under severe aging or large voltage-drop impact, which leads to timing and functional failures. Therefore, these flip-flops need to be treated separately and specific stress-tolerant designs should be used in order to improve the reliability and lifetime. Accordingly, we propose a method to selectively optimize the flip-flops operating under severe aging stress and/or voltage-drop conditions. The proposed optimization flow resizes the flip-flop transistors to obtain the variability-resilient cells. Then, flip-flops which are under the impact of aging and/or voltage-drop are determined using a variation-aware static timing analysis tool, and are replaced by the optimized flip-flops which can withstand aging and voltage-drop impacts much better. Simulation results show that the proposed selective flip-flop optimization method can reduce Leon3 processor timing guardband, and improve the lifetime of the processor by 36.9%, with negligible power and area overhead.

## References

- 1. Abrishami, H., Hatami, S., Pedram, M.: Multi-corner, energy-delay optimized, NBTI-aware flip-flop design. In: International Symposium on Quality Electronic Design (ISQED), pp. 652–659 (2010). https://doi.org/10.1109/ISQED.2010.5450509
- 2. Ajami, A.H., Banerjee, K., Mehrotra, A., Pedram, M.: Analysis of IR-drop scaling with implications for deep submicron P/G network designs. In: International Symposium on Quality Electronic Design (ISQED), pp. 35–40 (2003)
- 3. Amrouch, H., Ebi, T., Schneider, J., Parameswaran, S., Henkel, J.: Analyzing the thermal hotspots in FPGA-based embedded systems. In: International Conference on Field Programmable Logic and Applications (FPL), pp. 1–4 (2013)
- 4. Bhardwaj, S., Wang, W., Vattikonda, R., Cao, Y., Vrudhula, S.: Predictive modeling of the NBTI effect for reliable design. In: Custom Integrated Circuits Conference (CICC), pp. 189–192. IEEE, Piscataway (2006)
- 5. Cadence Encounter Timing System. http://www.cadence.com
- 6. Cadence Virtuoso Liberate Characterization Solution. http://www.cadence.com/products/cic/liberate/pages/default.aspx
- 7. Denney, J., Ramsey, C.: Comparison of finite-difference and spice tools for thermal modeling of the effects of nonuniform power generation in high-power CPUs. Hewlett-Packard J. 50, 37–45 (1998)
- 8. Ebrahimi, M., Oboril, F., Kiamehr, S., Tahoori, M.B.: Aging-aware logic synthesis. In: International Conference on Computer-Aided Design (ICCAD), pp. 61–68 (2013)
- 9. Fishburn, J.P.: Clock skew optimization. IEEE Trans. Comput. 39(7), 945–951 (1990)
- 10. Gaisler, A., Göteborg, S.: Leon3 multiprocessing CPU core. Aeroflex Gaisler (2010)
- 11. Gnad, D.R.E., Oboril, F., Kiamehr, S., Tahoori, M.B.: An experimental evaluation and analysis of transient voltage fluctuations in FPGAs. IEEE Trans. Very Large Scale Integr. VLSI Syst. 26(10), 1817–1830 (2018)
- 12. Golanbari, M.S., Kiamehr, S., Ebrahimi, M., Tahoori, M.B.: Aging guardband reduction through selective flip-flop optimization. In: IEEE European Test Symposium (ETS) (2015)
- 13. Golanbari, M.S., Kiamehr, S., Tahoori, M.B., Nassif, S.: Analysis and optimization of flip-flops under process and runtime variations. In: International Symposium on Quality Electronic Design (ISQED) (2015)
- 14. Golanbari, M.S., Kiamehr, S., Ebrahimi, M., Tahoori, M.B.: Selective flip-flop optimization for reliable digital circuit design. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(7), 1484–1497 (2020)
- 15. Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B.: MiBench: a free, commercially representative embedded benchmark suite. In: IEEE International Workshop on Workload Characterization, pp. 3–14. IEEE, Piscataway (2001)
- 16. International Technology Roadmap for Semiconductors (ITRS). http://www.itrs2.net
- 17. Kaul, H., Anders, M.A., Mathew, S.K., Hsu, S.K., Agarwal, A., Krishnamurthy, R.K., Borkar, S.: A 320 mV 56 μW 411 GOPS/watt ultra-low voltage motion estimation accelerator in 65 nm CMOS. IEEE J. Solid State Circuits 44(1), 107–114 (2009)
- 18. Kiamehr, S., Ebrahimi, M., Firouzi, F., Tahoori, M.B.: Extending standard cell library for aging mitigation. IET Comput. Digit. Tech. 9(4), 206–212 (2015)
- 19. Kraft, D.: A software package for sequential quadratic programming. Forschungsbericht-Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt 88–28, 1–33 (1988)
- 20. Krishnan, A.T., Cano, F., Chancellor, C., Reddy, V., Qi, Z., Jain, P., Carulli, J., Masin, J., Zuhoski, S., Krishnan, S., et al.: Product drift from NBTI: Guardbanding, circuit and statistical effects. In: International Electron Devices Meeting, pp. 4–3 (2010)
- 21. Mezhiba, A.V., Friedman, E.G.: Scaling trends of on-chip power distribution noise. IEEE Trans. Very Large Scale Integr. VLSI Syst. 12(4), 386–394 (2004)
- 22. Nithin, S., Shanmugam, G., Chandrasekar, S.: Dynamic voltage (IR) drop analysis and design closure: Issues and challenges. In: International Symposium on Quality Electronic Design (ISQED), pp. 611–617 (2010)
- 23. Nunes, C., Butzen, P.F., Reis, A.I., Ribas, R.P.: BTI, HCI and TDDB aging impact in flip-flops. Microelectron. Reliab. 53(9–11), 1355–1359 (2013)
- 24. Ramakrishnan, K., Wu, X., Vijaykrishnan, N., Xie, Y.: Comparative analysis of NBTI effects on low power and high performance flip-flops. In: International Conference on Computer Design (ICCD), pp. 200–205 (2008)
- 25. Rao, V.G., Mahmoodi, H.: Analysis of reliability of flip-flops under transistor aging effects in nano-scale CMOS technology. In: International Conference on Computer Design (ICCD), pp. 439–440 (2011)
- 26. Reddi, V.J., Gupta, M.S., Holloway, G., Wei, G.Y., Smith, M.D., Brooks, D.: Voltage emergency prediction: using signatures to reduce operating margins. In: International Symposium on High-Performance Computer Architecture (HPCA), pp. 18–29 (2009)
- 27. Reddi, V.J., Kanev, S., Kim, W., Campanoni, S., Smith, M.D., Wei, G.Y., Brooks, D.: Voltage smoothing: Characterizing and mitigating voltage noise in production processors via software-guided thread scheduling. In: International Symposium on Microarchitecture, pp. 77–88 (2010)
- 28. Reddy, V., Carulli, J., Krishnan, A., Bosch, W., Burgess, B.: Impact of negative bias temperature instability on product parametric drift. In: International Conference on Test, pp. 148–155 (2004)
- 29. Sato, T., Ichimiya, J., Ono, N., Hachiya, K., Hashimoto, M.: On-chip thermal gradient analysis and temperature flattening for SoC design. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88(12), 3382–3389 (2005)
- 30. Schlunder, C., Aresu, S., Georgakos, G., Kanert, W., Reisinger, H., Hofmann, K., Gustin, W.: HCI vs. BTI? - Neither one's out. In: IEEE International Reliability Physics Symposium (IRPS), pp. 2F.4.1–2F.4.6 (2012)
- 31. Stojanovic, V., Oklobdzija, V.G.: Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems. IEEE J. Solid State Circuits 34(4), 536–548 (1999)
- 32. Sundareswaran, S.: Statistical characterization for timing sign-off: from silicon to design and back to silicon. Ph.D. Thesis, UT Austin (2009)
- 33. Tradowsky, C., Cordero, E., Deuser, T., Hübner, M., Becker, J.: Determination of on-chip temperature gradients on reconfigurable hardware. In: International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1–8 (2012)
- 34. Unger, S.H., et al.: Clocking schemes for high-speed digital systems. IEEE Trans. Comput. C-35(10), 880–895 (1986)
- 35. Vittal, A., Ha, H., Brewer, F., Marek-Sadowska, M.: Clock skew optimization for ground bounce control. In: International Conference on Computer-Aided Design (ICCAD), pp. 395–399 (1996)
- 36. Wang, W., Yang, S., Bhardwaj, S., Vrudhula, S., Liu, F., Cao, Y.: The impact of NBTI effect on combinational circuit: modeling, simulation, and analysis. IEEE Trans. Very Large Scale Integr. VLSI Syst. 18(2), 173–183 (2010). https://doi.org/10.1109/TVLSI.2008.2008810
- 37. Wu, J.K., Wu, T.Y., Lu, L.Y., Chen, K.Y.: IR drop reduction via a flip-flop resynthesis technique. In: International Symposium on Quality Electronic Design (ISQED), pp. 78–83 (2008)
- 38. Zhang, R., Wang, K., Meyer, B.H., Stan, M.R., Skadron, K.: Architecture implications of pads as a scarce resource. In: International Symposium on Computer Architecture (ISCA), pp. 373–384 (2014)
- 39. Zhao, W., Cao, Y.: New generation of predictive technology model for sub-45nm design exploration. In: International Symposium on Quality Electronic Design (ISQED), pp. 585–590. IEEE Computer Society, Washington (2006)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

