Impact of Negative Capacitance Field-Effect Transistor (NCFET) on Many-Core Systems

Negative capacitance field-effect transistor (NCFET) is one of the very promising emerging technologies that may replace existing CMOS technology in the future. This is due to its ability to push the sub-threshold swing of transistors to below 60 mV/decade, which what fundamentally limits voltage scaling in conventional CMOS technology and what has led in the past to the discontinuation of Dennard’s scaling. In this chapter, we demonstrate how NCFET technology can reshape the future of many-core systems—especially when it comes to thermally constrained processors. Compared to conventional CMOS technology, NCFET technology allows processors (1) to be clocked at higher frequency without any increase in the power density or (2) to operate at lower voltages, without sacrificing performance requirements. The latter leads to a significant power saving that plays a major role in increasing the total number of processor cores that can be simultaneously powered on without exceeding the predetermined thermal constraints. In addition, we demonstrate how NCFET technology makes the leakage power in processors increase, instead of decrease as in conventional CMOS technology, when voltage is scaled down. This fundamentally breaks down the core assumption of all existing power management techniques. Hence, novel runtime power management techniques need to be devolved in which such a new inverse leakage-voltage dependency is carefully considered. Otherwise, the efficiency of NCFET-based processors cannot be optimized. In this chapter, we also demonstrate the first NCFET-aware dynamic voltage scaling that optimizes the power of processor while taking the inverse leakage-voltage dependency into account.


Introduction
More than a decade ago, the semiconductor technology had entered the so-called nano-CMOS era, in which the transistor's feature sizes became below 90 nm. Since then, the prior trend of voltage scaling came to an end leading to the discontinuation of Dennard's scaling [7]. In Dennard's scaling, both the dimensions of transistor and the operating voltage are typically scaled by the same factor in order to ensure a constant electric field. Due to the non-scalable voltage, ever-increasing power densities in chips became a substantial obstacle for technology scaling due to the limited ability of existing cooling solutions to dissipate the generated heat [8]. To overcome this fundamental problem, the maximum frequency of processors had stopped increasing with every new generation in order to keep the on-chip power densities under acceptable levels and since 2005 the era of many-core processors had started.
To understand the inability of technology to scale voltage, we need to understand what determines the speed of a processor. As a matter of fact, the drive current (ON current) of a transistor dictates its switching speed and hence it ultimately determines the maximum delay of logic paths that form the processor's netlist. The ON current of a transistor is proportional to (V DD − V T ), where V T denotes the threshold voltage of transistor and V DD denotes the operating voltage. In order to maintain the same level of current, while V DD is scaled down, V T must also be reduced by almost the same amount. However, reducing V T comes with an exponential increase in the leakage current (OFF current) of transistor. This is primarily because that the sub-threshold swing of transistor is fundamentally limited to 60 mV/decade at room temperature akin to "Boltzmann tyranny" [21]. Such a fundamental limit inevitably restricts the minimum possible V T to be at least 300 mV. To ensure a reliable operation, different kinds of safety margins need to be added on top of the minimum voltage, which enforces the operating voltage to remain almost the same with every new technology generation. As abovementioned, the inability to scale voltage has led to the discontinuation of Dennard's scaling, which, in turn, had led to preventing the frequency of processors from increasing.
In summary, the fundamental limit of sub-threshold swing of transistor is the primary reason behind not scaling voltage and it is the origin of on-chip power density problems that processor's designers are facing since more than a decade ago.

Negative Capacitance Field-Effect Transistor (NCFET)
NCFET integrates a ferroelectric layer inside the gate stack of a transistor, which acts as a negative capacitance. Such a layer provides an amplification of the vertical electric field that the transistor perceives. This, in turn, allows the transistor to overcome the fundamental limitation of sub-threshold swing of 60 mV at room temperature. The principle of NCFET was first proposed in 2008 by S. Salahuddin and S. Datta [16]. After which, it very rapidly gained a large popularity due to the remarkable steep switching and high ON current of transistors [1]. Many experiments have consistently proved NCFET [10]. A breakthrough has recently occurred when GlobalFoundries demonstrated NCFET-based circuits using their state-of-theart industrial 14 nm FinFET technology [9]. This showed, for the first time, that NCFET technology has become compatible with the existing CMOS fabrication process. In fact, such a compatibility is essential for any emerging technology to be adopted by semiconductor companies. Otherwise, massive production will never be possible.
In practice, NCFET technology enables the transistor to reach the same ON current, without increasing the OFF current, but at a much lower voltage [2]. This is only possible due to steeper sub-threshold swing. Therefore, in an NCFET technology, the processor can still meet the same performance (as in the conventional FET) but at a lower operating voltage leading to a significant power saving. Beside the low-power usage scenario of NCFET, high-performance usage scenario does also exist. NCFET enables the processor to be clocked at a higher frequency (compared to the conventional FET), while it still be operated at the same voltage due to the increase in the ON current. NCFET technology comes with an important side effect in which it increases the total capacitance of transistor. Such an increase can lead to reliability problems caused by IR-drop and voltage fluctuation during circuit's operation [2,18]. At the same time, because NCFET technology enables circuits to operate at lower voltages, it is expected that other reliability problems, related to lifetime, to become much less because all the underlying aging mechanisms, such as negative bias temperature instability (BTI) and hot-carrier injection (HCI), strongly depend on the operating voltage [20].
In the following sections, we explain how modeling the NCFET effects from physics all the way to the system level can be done. Then, we explore how a manycore system can profit from the NCFET technology. Finally, we explore the impact that NCFET has on power management schemes and how existing assumptions w.r.t voltage-leakage dependency become not valid anymore when it comes to NCFET, which creates the necessity to develop novel power management techniques.

Modeling NCFET at the System Level
In the following we provide an overview of how NCFET is modeled at the system level, i.e., for the purpose of simulating many-core processors. Fundamentally, the properties of the ferroelectric layer are modeled at the physics level [12]. Figure 8.1 presents our methodology in which we traverse all layers from physics, through device, gate, and processor level, to model NCFET at the system level. The behavior of transistors with varying thickness of the ferroelectric layer is modeled following the industrial-standard compact model (BSIM-CMG) [5,14]. Based on this model, we created NCFET-aware cell libraries supporting four different thicknesses of the ferroelectric layer under a wide range of the operating voltage [1]. The thickness ranges from 1 nm (called TFE1) up to 4 nm (TFE4). We then implemented a single many-core tile to the GDSII level and performed timing and power signoffs. The results are explained in detail in the next section. Signoff tools allow to compare power and performance of a processor implemented in different NCFET configurations and are used to extract frequency-dependent scaling factors for dynamic and leakage power. These factors serve as an abstraction at the system level and allow to estimate the power of an NCFET-based processor if the power of 1 Modeling NCFET at the system level (many-core processors) requires to traverse the whole stack from the physics level, where the effects of the ferroelectric layer are modeled, to the system level, where performance and power of many-core processors are affected a baseline implementation (conventional FinFET) is known. Finally, these factors are used to simulate a many-core processor (further details in Sect. 8.2.2).

Processor-Level Investigation
This section shows how NCFET affects the performance and power of a single processor. The insights gained from this evaluation are important to build systemlevel NCFET models and explain observations from system-level simulations. We implemented the layout (GDSII level) of a single tile of the OpenPiton manycore [3], which contains a CPU, caches, and a NoC router. Power and timing signoffs are performed for different NCFET configurations (TFE1 to TFE4) and different operating voltages. Further details of the experimental setup can be found in [15]. Figure 8.2a shows how NCFET increases the performance of a processor. It allows to clock a processor at a higher frequency at the same operating voltage or allows to reduce the voltage while still maintaining the same performance (frequency). This is due to the inherent voltage amplification provided by the additional NCFET increases the frequency of a processor at a certain operating voltage, but (b) also increases the dynamic power consumption due to the increase in the transistor gate capacitance and frequency. (c) While leakage increases almost linearly with the operating voltage with conventional FinFET (baseline), this dependency gets weaker with a thin ferroelectric layer and even reverses with TFE4 due to a negative DIBL effect ferroelectric layer. Like explained earlier, the ferroelectric layer increases the total gate capacity. Together with increased frequency, this increases the dynamic power consumption ( Fig. 8.2b). The thicker the ferroelectric layer gets, the higher get the gains in the frequency, but also the higher gets the dynamic power. Figure 8.2c shows that leakage power is affected more severely. NCFET fundamentally changes the trend. With conventional FinFET (baseline), leakage power increases strongly with increasing voltage. When a thin ferroelectric layer is added (TFE1 and TFE2), this dependency becomes weaker, until at TFE3, leakage is almost independent of the voltage. With a thick ferroelectric layer (TFE4), an effect called negative drain-induced barrier lowering (negative DIBL) reverses the leakage dependency on the voltage [13]. Here, leakage increases at lower voltages. We will explain later (Sect. 8.3.3) how this necessitates developing novel power management techniques.

Simulation of NCFET-Based Many-Core
We use the Sniper many-core simulator [6] to simulate many-core processors.
McPAT [11] is used to periodically estimate the power consumption of each core. Since McPAT does not support NCFET, it is used to estimate the power with conventional FinFET instead. We develop frequency-dependent scaling factors for dynamic and leakage power based on the processor-level investigation explained earlier.

Fig. 8.3 (a)
While NCFET technologies increase the dynamic power at iso-voltage, they also lower the required operating voltage at iso-frequency, which in total decreases the dynamic power at the same frequency. (b) NCFET technologies with a thin ferroelectric layer lower the leakage power, whereas leakage increases with a thick layer (TFE4). Most importantly, the negative DIBL effect reverses the leakage dependency, where lowering the V/f-levels increases leakage allow to go to a lower operating voltage (Fig. 8.2a) while still maintaining the same frequency. Lowering the operating voltage has a stronger effect on the dynamic power. Consequently, NCFET technologies lower the dynamic power when operating at the same frequency ( Fig. 8.3a). Figure 8.3b shows how leakage power depends on the V/f-level. The reverse leakage dependency with TFE4 strongly increases the leakage power. Below 700 MHz, TFE4 would allow to reduce the voltage below 0.2 V, which is the lower limit of the cell library. Figure 8.3a,b allows to estimate the dynamic and leakage power consumption of a processor that is implemented in NCFET, if the power consumption in the baseline (conventional FinFET) is known. We extract frequency-dependent scaling factors for both dynamic and leakage power. These factors serve as an abstraction that allows simulation of complex benchmark applications, like PARSEC [4], on manycore processors with dozens of cores. We thereby scale the leakage and dynamic power that is estimated by McPAT to estimate the power consumption of NCFETbased many-cores. For brevity, details on this approach are omitted here and can be found in [15].

Performance, Power, and Cooling Trade-Offs with NCFET-based Many-Cores
NCFET fundamentally changes the characteristics of transistors and therefore also changes the performance and power of circuits [19], single-core processors [1], and many-core processors [15]. This section demonstrates the impact of the thickness of the ferroelectric layer on the power and performance of a many-core processor. We show that the optimal thickness depends on many factors, such as the application characteristics and the cooling scenario. This section evaluates performance, power, and cooling of a 25-core many-core operating under a thermal constraint of 80 • C. We study PARSEC [4] tasks with up to eight slave threads. Their characteristics range from highly memory-bound (e.g., canneal) to highly compute-bound (e.g., swaptions).

Impact of NCFET on Performance
Due to high power densities (failure of Dennard's scaling) and limited cooling capabilities, it is not always possible in modern technology nodes to simultaneously operate all cores at the peak V/f-levels without violating the thermal constraint. This study investigates the use-case in which cores with an active thread are operated at the peak V/f-levels and cores without a thread mapped to it are power-gated. In this use-case, four factors affect the thermally sustainable utilization (i.e., the number of cores that can be turned on): the application characteristics (power consumption), the mapping of threads to cores, the cooling system, and the transistor technology. We use an Integer Linear Program to obtain the thermally-optimal mapping of threads to cores, which minimizes the formation of hotspots and, thereby, maximizes the thermally sustainable utilization. We study the use-case of a passive cooling, i.e., there is no fan on top of the heat sink. Figure 8.4 shows the thermally sustainable utilization of two benchmarks bodytrack and swaptions during the parallel section of the benchmarks (Region of Interest) for different NCFET technologies. Other benchmarks are available in [15]. Swaptions is a highly compute-intensive task, which results in high power consumption and therefore, the thermally sustainable utilization in the baseline is low (only 8 out of 25 cores). Dynamic power forms the major part of the total power consumption and therefore, thicker ferroelectric layers increase the thermally sustainable utilization because dynamic power is reduced (compare Fig. 8.3a). Consequently, the highest performance is observed with the thickest ferroelectric layer (TFE4). Bodytrack is less compute-intense and has lower dynamic power consumption and consequently lower total power. This results in a higher thermally sustainable utilization compared to swaptions. However, due to lower dynamic power, leakage power accounts for a larger fraction of the total power. As demonstrated in Fig. 8.3b, TFE4 increases the leakage significantly over TFE3. Consequently, TFE4 results in a lower thermally sustainable utilization than TFE3 for bodytrack and the highest performance is observed with TFE3.
These investigations show that the optimal thickness of the ferroelectric layer depends on the application characteristics. Further investigations on how NCFET affects the performance in the case that cores are not operated at the peak V/f-levels can be read in [15]. These investigations additionally study forced-convection cooling (a heat sink with a fan) and reveal that the optimal thickness of the ferroelectric layer also depends on the cooling scenario.

Impact of NCFET on Cooling Requirements
This section studies how NCFET reshapes the existing trade-off between cooling costs and achievable performance, where higher performance comes at the cost of higher power dissipation and therefore higher cooling costs. We study the use-case in which the many-core is operated at its peak performance, i.e., all cores are active at the peak V/f-levels and determine the cooling capabilities that are required to make this use-case thermally safe. The cooling capabilities are measured by the inverse of thermal resistance of the heat sink 1/R th . Varying this value corresponds to changing the air convection. Figure 8.5 shows the required cooling capabilities for the three PARSEC benchmarks swaptions, bodytrack, and canneal during the parallel section of the benchmarks (Region of Interest). NCFET technologies allow to reduce the cooling capabilities over the baseline (conventional FinFET). Most importantly, the required cooling capabilities are minimized at different thicknesses of the ferroelectric layer depending on the application. Swaptions is highly computeintensive and consequently, dynamic power accounts for the majority of the total power. Increasing the thickness of the ferroelectric layer reduces the dynamic power (see Fig. 8.3) and therefore reduces the required cooling. Canneal on the other side is highly memory-bound and therefore, the power consumption is dominated by leakage. Leakage is minimized at TFE2, which consequently minimizes the cooling requirements. Bodytrack shows intermediate values for the dynamic power and therefore, TFE3 is optimal. This investigation shows again that the optimal thickness of the ferroelectric layer depends on the application characteristics and ranges from 2 nm to 4 nm.

Impact of NCFET on Power Management Techniques
The above investigations use the well-established concept of V/f-pairs that are determined at design time by selecting the operating voltage for a given frequency as the lowest voltage that makes operating at this frequency reliable. This is a reasonable approach with conventional transistor technologies, because using a higher voltage would unnecessarily increase both dynamic and leakage power. However, this is no longer true with NCFET with a thick ferroelectric layer (TFE4). Here, increasing the voltage decreases the leakage power. This leads to new optimization potential by selecting the operating voltage for a given frequency, which is demonstrated in the next section.

NCFET-Aware Voltage Scaling
Dynamic voltage scaling (DVS) technique for processor power management is considered to be one of the most effective ways to reduce the energy consumption of an application. DVS technique typically selects the minimum operating voltage V min that sustains the operating frequency of the processor at runtime based on the frequency demands of the application being executed. Reducing the operating voltage, in conventional FET, results in reducing the total power consumption, which implicitly reduces both dynamic and leakage power. However, such a wellknown voltage dependency becomes inverse with respect to leakage power in NCFET due to the negative DIBL effect (see Sect. 8.2.1). With such opposed dependencies (dynamic and leakage) to the operating voltage, total power follows the dominant component when voltage changed which leads to a novel trade-off. Consequently, power is not necessarily minimized at the minimum voltage V min , which traditional DVS selects, but at another voltage V opt . Unawareness of NCFET and its trade-off could lead to not minimize the total power consumption. Therefore, in this section, a novel NCFET-aware voltage scaling technique is presented [17] to overcome the shortness that traditional DVS has in NCFET-based processors.

Importance of NCFET-Aware DVS
With traditional DVS, a set of voltage-frequency pairs are typically selected at design time and later are employed by the DVS technique at runtime to optimize the power. In this case, the lower the selected voltage is, the lower the total power is. Due to the new inverse dependency in leakage power that NCFET exhibits, this is not always valid with respect to NCFET. To demonstrate the consequence of such an inverse dependency at the system level, we plot the total power consumption and its components of the master thread of PARSEC canneal benchmark in Fig. 8.6.  Fig. 8.6 Total power and its components (i.e., leakage and dynamic) of canneal master thread starting from the minimum voltage V min required to sustain 1.0 GHz frequency. The total power is not minimized at V min . The operating voltage required to minimize total power V opt appears at a higher voltage than V min due to leakage increases in NCFET The power examined starting from V min , that traditional DVS selects to sustain the required frequency, and then to overscale the operating voltage. The result shows that the power is not minimized at V min (i.e., V opt =V min ).
Different workloads exhibit different characteristics and hence different total power. Therefore, the contribution of power components differs. Traditional DVS neglects this difference as both contributions (leakage and dynamic) are affected in the same manner with voltage (both are reduced). With NCFET, the contribution of leakage to the total power cannot be neglected because it affects the operating voltage selection when DVS tries to minimize total power. Hence, based on the leakage share, V opt could differ from V min .
For the aforementioned reasons, NCFET-aware DVS is crucial due to the change in the behavior of total power consumption over voltage scaling which emerges from the inverse dependency with respect to leakage power in NCFET.

NCFET-Aware DVS Technique
To enable runtime voltage selection, DVS first needs to determine workload characteristics and then V opt can be correctly selected. Therefore, determining V/fpairs at runtime, like in traditional DVS techniques, is not possible here. Instead, the results from Sect. 8.2.1 have been used to build the power and performance analytical models at design time. Then, these models can be integrated with our new NCFET-aware DVS technique for runtime voltage selection.

Design-Time Models
Power and Performance Modeling The maximum operating frequency f max (V ) depends on the voltage V over the minimum delay d min (V ): a del >0, b del <0, c del ≥0 are constants fitting parameters obtained at design time. Peak leakage and peak dynamic power consumption results by operating at maximum frequency are a dyn >0, b dyn >1, c dyn ≥0, a leak >0, b leak <0 are constant fitting parameters obtained at design time. Both P peak dyn (V , d min (V )) and P leak (V ) are convex in V . By lowering the operating frequency of the CPU (higher delay), dynamic power decreases. However, since leakage power is independent from CPU activity, it is not affected.

Workload-Dependent Power Modeling Dynamic power consumption P dyn (V , d)
is affected by the running workload, which is reduced by a factor 0≤r dyn ≤1 from the peak dynamic power P peak dyn (V , d): r dyn is not constant since it represents the current workload activity. Therefore, total power consumption P total (V c , d) at the current voltage V c , r dyn is Optimal Voltage Computing V opt that minimizes the total power can be obtained from the power and performance models: Algorithm 1 NCFET-aware voltage scaling algorithm to select the optimal voltage (V opt ) at runtime [17] Require: Power and performance models: P peak dyn (c, d)

Operating Voltage Selection
Both DVS techniques differ in the way they select the operating voltage. Therefore, to show the different behavior between both techniques in operating voltage selection, the design space of the operating voltage selection with NCFET-aware (V opt ) and NCFET-unaware DVS (V min ) has been explored in Fig. 8.7. NCFETunaware DVS sets V min that is needed to sustain the required frequency and therefore workload characteristic is not considered. Contrarily, NCFET-aware DVS considers the workload characteristic as it depends on the ratio of leakage to total power measured at V min . The explored design space in Fig. 8.7 shows two distinct regions: (1) For low leakage to total power ratio and for high frequencies, the same voltage is selected (similar action) by both techniques (i.e., V opt =V min ). (2) For high ratios of leakage to total power or low frequencies, NCFET-aware DVS selects a higher voltage (V opt >V min ). Moreover, Fig. 8.7 reveals that: the higher the required frequency or the higher the leakage to total power ratio, the higher V opt is. (2) Similar action is done by both DVS as they select the same operating voltage (V min =V opt ). NCFET-unaware DVS selects V min (that sustains the required frequency) and NCFET-aware selects V opt to minimize the power depending on the frequency and the ratio of leakage to total power. NCFET-aware DVS selects higher voltages when leakage power becomes prominent or at lower frequency

Experimental Setup
Using the same setup in Sect. 8.2.1, power and delay results were examined using the highest ferroelectric thickness (4 nm). Afterwards, the power and performance analytical models have been developed as described in Sect. 8.4.2.
For system-level simulation, relying on the setup described in Sect. 8.2.2, the NCFET-aware DVS technique (Algorithm 1) has been used to select the operating voltage when a set of tasks were examined from the PARSEC benchmark suite [4]. The frequencies are set between 1.0 GHz and 2.4 GHz. V dd is set between 0.2 V and 0.7 V. The low operating voltages V dd in NCFET are lower than traditional FET due to the inherent voltage amplification in NCFET provided by the negative capacitance. For fair comparisons, simulators for both DVS cases were configured to have: the same frequencies, and architecture, in addition to running the same benchmarks. Hence, only voltage selection differs based on DVS decision.

NCFET-Aware DVS Results and Analysis
To show the effectiveness of the NCFET-aware DVS, we first show how NCFETaware DVS actually operates to save power and later to report the energy savings  In phase-1, in Fig. 8.8b, it shows the total power consumption when the frequency is set at 1.7 GHz. Traditional DVS sets V dd to the minimum voltage (0.28 V) which required to sustain this frequency. Thus, dynamic power is minimized but the leakage power is not. NCFET-aware DVS sets V dd to a higher value to guarantee a better trade-off. This will increase the dynamic power but strongly decreases leakage power resulting in a power saving. In phase-2, the master thread is idle and waits for the termination of the slave threads. Therefore, frequency is reduced to the minimum frequency (1.0 GHz). Traditional DVS reduces V dd to 0.2 V due to the low required frequency in which it increases the leakage power. NCFET-aware DVS, instead of reducing V dd , increases the voltage to 0.53 V, which decreases the leakage power. Thereby, the total power consumption in phase-2 is reduced by 67 % compared to the traditional DVS. In phase-3, after the slaves terminated, the master resumes operation and its frequency is boosted again to 1.7 GHz. It is worth to mention that the performance obtained with both DVS techniques is the same. This is because they do not affect the frequency, but only set the V dd under performance constraint.
To reveal the energy savings, different PARSEC benchmarks were examined when active threads are operated at 1.7 GHz and idle cores are suppressed to 1.0 GHz. Figure 8.9 summarizes the energy savings. Energy savings range as shown in Fig. 8.9 from 14 % up to 27 % and in average are up to 20 %.

Conclusion
In this chapter, we investigated how NCFET technology impacts the existing trade-offs in processors and how it can reshape the future of many-core systems. Compared to the existing FinFET technology, NCFET technology allows the processor to operate at a much lower voltage while it still meets the same performance. This results in a considerable power saving and as a result the total number of cores, that can be simultaneously turned on at the maximum frequency, increases without violating the predetermined thermal constraints. We also showed how NCFET inverses the leakage-voltage dependency and proposed a new NCFETaware DVS technique that provides an energy saving of 20% on average compared to conventional DVS techniques, which are unaware of the new leakage-voltage dependency that NCFET brings. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.