Thermal Challenges of 3D ICs

Lin, Sheng-Chih; Banerjee, Kaustav

doi:10.1007/978-0-387-76534-1_14


Part of the book series: Integrated Circuits and Systems (ICIR)



14.1 Introduction

During the past few decades, complementary metal-oxide-semiconductor (CMOS) technology scaling following Moore’s law has been the classical means by which the semiconductor industry has met the ever-increasing demand for lower cost and higher performance [1–4]. However, in the nanometer regime, the pace of transistor scaling has been slowing down because of severe short-channel effects, increasing variability, and power/thermal problems [5–7]. Also, as the functionality (transistor count) of a planar (single active-layer) integrated circuit (IC) increases, the complexity of interconnecting the devices grows dramatically and requires a large number of metal layers. Consequently, the performance improvement from transistor scaling cannot be fully exploited and has gradually become constrained by interconnects [8].

Under this scenario, three-dimensional (3D) integration has been proposed as a promising technology to overcome the interconnect bottleneck in future advanced nanoscale ICs [9, 10]. The 3D integration scheme involves monolithic stacking of multiple active layers, which considerably reduces the number and average lengths of the longest global wires found in traditional planar (2D) chips by providing shorter “vertical” connection paths (Fig. 14.1). Besides improving interconnect performance [9–11], the 3D architecture (with its different active layers) is expected to make heterogeneous integration of CMOS and various non-silicon technologies (SiGe, GaAs, InP, etc.) in a single chip easier to realize than in existing 2D (planar) chips [9].

Although 3D technology promises significant benefits, thermal issues are expected to offset the gains from this technology because of the resulting degradation of performance and reliability [9, 13]. The heat problem clearly becomes worse when a 2D chip is redesigned into a 3D structure with the same functionality, because the same power is then dissipated in a smaller footprint and the power density increases dramatically. Similarly, heat and thermal problems are expected to be exacerbated when many planar (2D) designs are integrated by stacking one layer on top of another [13–15]. Because the dielectrics between active layers have low thermal conductivity [16], heat generated in the stacked active layers located away from the heat-sink is difficult to remove, which leads to temperature gradients in the vertical direction of a 3D chip. Therefore, heat and thermal considerations are imperative for determining the practical applicability of 3D technology and for evaluating various 3D design options.

Traditionally, thermal infrared (IR) imaging systems have been used to acquire thermal profiles of 2D ICs [17, 18]. Such systems offer limited resolution of substrate thermal profiles and are not suitable for 3D designs with multiple layers. Similarly, integrated thermal sensors are commonly employed to ensure that hot-spots do not exceed the specified maximum temperature in high-performance ICs. However, only a limited number of sensors can be integrated into each active layer because of routing and pin-out constraints. Most importantly, these techniques can provide thermal profiles only after fabrication, which is too late for early design optimization. Hence, an accurate chip-level thermal profile estimation methodology is necessary, especially for 3D applications.

In this chapter, we focus on the thermal challenges of 3D (multilayer) ICs and discuss their implications for thermal management, as well as methods to mitigate them. First, the impact of heat on device and interconnect reliability is briefly described. Then, an analytical die temperature model of a multilayer IC is reviewed. In addition, the origin of various electrothermal couplings between chip power, substrate (die) temperature, operating frequency, and supply voltage is discussed. Subsequently, an accurate chip-level, leakage-aware methodology for 3D IC thermal profile estimation is presented. It self-consistently takes the various electrothermal couplings into consideration and uses a realistic package thermal model that comprehends the different packaging layers and the noncubic structure of the package. Finally, the implications of the temperature profiles generated by the proposed methodology for 3D IC power estimation and thermal management are discussed.

14.2 Thermal Effects in 3D ICs in the Nanometer Regime

While continued scaling of CMOS technologies provides substantial benefits in the form of higher transistor packing density, higher circuit performance, and lower cost of ICs, power consumption and power density (watts per unit chip area) have been increasing steadily [5, 6]. Historically, as CMOS has scaled from generation to generation, power dissipation has increased in proportion to transistor density and switching speed. However, with the minimum feature size of the transistor entering the nanometer regime (<100 nm), leakage power has become a significant fraction of the overall chip power [7]. Moreover, most leakage mechanisms are strongly temperature dependent, and this strong coupling between temperature and leakage can further increase the total power dissipation. When several active layers are stacked (3D architecture), these heat and thermal problems are expected to be exacerbated [13, 15].

14.2.1 Impact of Heat on Device and Interconnect Reliability

Elevated substrate temperature is widely known to have a strong impact on the performance and lifetime of devices and interconnects under “field”, “accelerated testing”, and “burn-in” conditions. Because the major back-end and front-end reliability mechanisms, including electromigration (EM), time-dependent dielectric breakdown (TDDB), and negative-bias temperature instability (NBTI), depend strongly on temperature, higher temperature increases the risk of damaging devices and interconnects, even with advanced thermal management technologies [19–21]. Moreover, the increase in the number of interconnect levels and the introduction of low-κ dielectric materials with poor thermal conductivity have made chip-level thermal problems even worse [13, 16, 22]. Hence, there is a critical need to accurately estimate the silicon substrate thermal gradients and temperature profile for the development and thermal management of future generations of all high-performance ICs, including 3D chips.

14.2.2 Analytical Average Die Temperature Model

A schematic diagram of a 3D IC with n active layers is illustrated in Fig. 14.2a. Each active layer consists of one chip layer and one metallization layer, and adjacent active layers are separated by a glue layer. The 3D IC can be represented by the first-order equivalent thermal circuit shown in Fig. 14.2b, where P and T denote the power dissipation and the average die temperature of each active layer, respectively [13].

A preliminary estimate of the average die temperature of a 3D IC can be obtained from this equivalent thermal circuit. Assuming that heat flows from layer n toward layer 1 and then to the heat-sink, the die temperature of the first layer can be estimated as follows:

$$ T_1 = T_{\textrm{amb}} + \theta_{\textrm{ja}} \cdot \left( \sum_{k=1}^{n} P_k \right), \tag{14.1} $$

where $T_{\textrm{amb}}$ represents the ambient temperature and $\theta_{\textrm{ja}}$ denotes the effective junction-to-ambient thermal resistance. Similarly, the temperature rise (above $T_{\textrm{amb}}$) of each active layer can be calculated by the following analytical expression, where $\theta_i$ is $\theta_{\textrm{ja}}$ when $i = 1$ and $\theta_{\textrm{layer}}$ when $i > 1$:

$$ \Delta T_j = \sum_{i=1}^{j} \left[ \theta_i \cdot \left( \sum_{k=i}^{n} P_k \right) \right]. \tag{14.2} $$
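Equations (14.1) and (14.2) can be evaluated directly. The following Python sketch does so for an arbitrary list of layer powers. The function name and the example values (three 10 W layers, $\theta_{\textrm{ja}}$ = 0.5 °C/W, $\theta_{\textrm{layer}}$ = 0.2 °C/W, $T_{\textrm{amb}}$ = 45°C) are hypothetical and are chosen only for illustration.

```python
def layer_temperatures(powers, theta_ja, theta_layer, t_amb):
    """Average die temperature of each active layer, per Eqs. (14.1)-(14.2).

    powers[0] is layer 1 (adjacent to the heat-sink) and powers[-1] is layer n.
    Powers in W, thermal resistances in degC/W, t_amb in degC.
    """
    temps = []
    rise = 0.0
    for i in range(len(powers)):  # index i corresponds to layer i + 1
        theta_i = theta_ja if i == 0 else theta_layer
        # all heat generated in this layer and in the layers above it crosses theta_i
        rise += theta_i * sum(powers[i:])
        temps.append(t_amb + rise)
    return temps


# Hypothetical example: three 10 W layers
print(layer_temperatures([10.0, 10.0, 10.0], theta_ja=0.5, theta_layer=0.2, t_amb=45.0))
```

With these inputs the sketch prints `[60.0, 64.0, 66.0]`. The temperature rises monotonically toward layer n, as the 1D circuit necessarily implies. The first value equals (14.1): 45 + 0.5 × 30.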

According to the first-order equivalent thermal circuit in Fig. 14.2b, heat can flow only toward $T_{\textrm{amb}}$ (package and heat-sink). Thus, the highest temperature is that of the uppermost (nth) layer. From [13], the temperature rise of the uppermost layer is expected to increase with the square of the number of active layers (~$n^2$), assuming identical power dissipation in each layer and identical thermal resistance between adjacent layers. Hence, heat and thermal problems are expected to worsen (the highest temperature increases quadratically) as the number of active layers in a 3D IC increases [13].
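The quadratic dependence follows directly from (14.2). Set $P_k = P$ for every layer, $\theta_1 = \theta_{\textrm{ja}}$, and $\theta_i = \theta_{\textrm{layer}}$ for $i > 1$. The rise of the uppermost layer ($j = n$) is then:

$$ \Delta T_n = \theta_{\textrm{ja}}\, nP + \theta_{\textrm{layer}}\, P \sum_{i=2}^{n} (n - i + 1) = \theta_{\textrm{ja}}\, nP + \theta_{\textrm{layer}}\, P\, \frac{n(n-1)}{2} $$

The first term grows linearly with n. The second term accounts for heat from the upper layers crossing every glue layer below them, grows as $n^2/2$, and dominates for large n.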

Although the first-order analytical model comprehends the thermal couplings between different active layers, it simply employs a 1D equivalent thermal circuit with constant, average power dissipation in each layer, which imposes a fixed direction of heat transfer. This assumption (average power for each active layer) always yields higher temperatures in the layers farther from the heat-sink and can mislead the temperature estimation of a multilayer (3D) IC. As shown in Fig. 14.3, when the power dissipation is nonuniform, the temperature can be lower at a point that is farther from the heat-sink. The first-order 1D analysis also ignores the electrothermal couplings within each active layer (e.g., correlations between temperature and power), which become critical in nanoscale designs [23]. Furthermore, as the power dissipation of the entire 3D IC increases, the thermal solution (e.g., package and heat-sink in Fig. 14.2a) must be considered in more detail, as described in the following subsections.

14.2.3 Origin and Significance of Electrothermal Couplings

Typically, switching power and leakage power are the two major contributors to total chip power dissipation. The short-circuit component is relatively small and can be treated as a fixed fraction of total power [24, 25].

The switching power results from charging and discharging circuit capacitances between different voltage levels and increases with chip frequency and supply voltage. The leakage power, especially subthreshold leakage, used to be negligible but is rapidly becoming a dominant contributor to total chip power. It is highly temperature sensitive (being based on thermionic emission) (Fig. 14.4a), and it grows with technology scaling (Fig. 14.4b). Note that gate leakage (tunneling based) is essentially temperature independent and can be mitigated by gate engineering [26]. Also, junction (diode) leakage is relatively small compared with subthreshold leakage [27].

Subthreshold leakage increases significantly for the following reason. According to International Technology Roadmap for Semiconductors (ITRS) projections, scaling the supply voltage ($V_{\textrm{dd}}$) necessitates scaling the threshold voltage ($V_{\textrm{th}}$) to maintain the required performance (Fig. 14.4c).

In addition, elevated temperature lowers the threshold voltage of the transistor and thus increases the leakage further [29]. Moreover, the gap between the wavelength of light used in optical lithography and the polysilicon gate length is increasing (Fig. 14.5a) [30]. As a result, the device channel length exhibits a significant amount of within-die variation [31], which in turn has a significant impact on the distribution of leakage, as shown in Fig. 14.5b.
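The sketch below makes this temperature sensitivity concrete. It uses a commonly cited first-order form of the subthreshold current at $V_{gs} = 0$ (with $V_{ds} \gg kT/q$): $I_{\textrm{sub}} \propto (kT/q)^2 \exp[-V_{\textrm{th}}(T)/(m\,kT/q)]$. It combines this with a linear decrease of $V_{\textrm{th}}$ with temperature. The model, the slope factor m, and all parameter values (0.30 V, 1 mV/°C, m = 1.5) are assumptions for illustration and are not taken from this chapter. Mobility temperature dependence and DIBL are ignored.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge, V/K


def subthreshold_leakage_ratio(t_c, t0_c=25.0, vth0=0.30, dvth_dt=1.0e-3, m_ss=1.5):
    """Subthreshold leakage at t_c relative to its value at t0_c (both in degC).

    Assumed first-order model with VGS = 0 (illustrative, not from the chapter):
        I_sub ~ (kT/q)**2 * exp(-Vth(T) / (m_ss * kT/q))
        Vth(T) = vth0 - dvth_dt * (T - T0)
    Mobility temperature dependence and DIBL are ignored; parameter values
    are hypothetical.
    """
    def i_sub(t_celsius):
        vt = K_B_OVER_Q * (t_celsius + 273.15)      # thermal voltage kT/q
        vth = vth0 - dvth_dt * (t_celsius - t0_c)   # Vth falls as T rises
        return vt ** 2 * math.exp(-vth / (m_ss * vt))

    return i_sub(t_c) / i_sub(t0_c)


for t in (25.0, 45.0, 65.0, 85.0):
    print(f"{t:5.1f} degC: {subthreshold_leakage_ratio(t):6.2f}x")
```

With these hypothetical parameters, leakage at 85°C comes out roughly 19 times its 25°C value. Two factors contribute: the larger thermal voltage and the lower threshold voltage. Both effects are named in the text above.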

Performance itself depends on temperature because the transistor on-current depends on operating temperature. The threshold voltage decreases at higher temperature and partially offsets the performance degradation caused by lower carrier mobility. Even so, the transistor on-current still decreases at higher operating temperatures (Fig. 14.5c).

An increase in total chip power raises the die temperature, which further increases subthreshold leakage. A strong feedback loop therefore builds up, giving rise to various electrothermal couplings [23] that were inconspicuous in earlier generations of ICs. Fig. 14.6 illustrates such electrothermal couplings between performance, power dissipation, supply voltage, threshold voltage, and die temperature.
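The feedback loop can be illustrated with a single-node (lumped) model: $T = T_{\textrm{amb}} + \theta_{\textrm{ja}}\,[P_{\textrm{dyn}} + P_{\textrm{leak}}(T)]$. The sketch below solves it by fixed-point iteration and assumes an exponential leakage fit, $P_{\textrm{leak}}(T) = P_{\textrm{leak0}}\exp[\beta(T - T_0)]$. The following are all hypothetical: the function name, β = 0.04 /°C, $\theta_{\textrm{ja}}$, and the 125°C limit. The 93.1 W and 2.9 W nominal powers are borrowed from the example in Section 14.3.4 only as plausible magnitudes.

```python
import math


def self_consistent_temperature(p_dyn, p_leak0, theta_ja, t_amb,
                                beta=0.04, t0=45.0, t_max=125.0,
                                tol=1e-6, max_iter=1000):
    """Fixed-point solution of T = t_amb + theta_ja * (p_dyn + P_leak(T)).

    Assumes (hypothetically) P_leak(T) = p_leak0 * exp(beta * (T - t0)).
    Returns (T, total_power); raises RuntimeError on thermal runaway.
    """
    t = t_amb
    for _ in range(max_iter):
        p_total = p_dyn + p_leak0 * math.exp(beta * (t - t0))  # power at current T
        t_new = t_amb + theta_ja * p_total                     # temperature from power
        if t_new > t_max:
            raise RuntimeError(f"thermal runaway: T exceeded {t_max} degC")
        if abs(t_new - t) < tol:
            return t_new, p_total
        t = t_new
    raise RuntimeError("no convergence: possible thermal runaway")


t, p = self_consistent_temperature(p_dyn=93.1, p_leak0=2.9, theta_ja=0.2, t_amb=45.0)
print(f"T = {t:.2f} degC, P = {p:.2f} W")
```

With $\theta_{\textrm{ja}}$ = 0.2 °C/W, the loop settles near 65°C with a total power of roughly 99.5 W. This is higher than the 96 W obtained if the nominal values are simply added. Raising $\theta_{\textrm{ja}}$ to 2 °C/W drives the estimate past the limit, and the sketch reports runaway.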

14.3 Self-Consistent Temperature Estimation for 3D ICs

Elevated and nonuniform temperature in a 3D IC strongly affects reliability, performance, and thermal management. An accurate temperature profile of each active layer must therefore be obtained in the early design stage (before the 3D chip is fabricated). In this section, a self-consistent 3D temperature profile estimation methodology is presented. The method incorporates the electrothermal couplings, as well as a realistic package thermal model, to improve the accuracy of the thermal profile estimation. It is implemented with one of the widely used efficient algorithms for solving heat diffusion equations.

14.3.1 Typical Chip Package Structure and Heat Transfer Mechanisms

Due to the increase in silicon junction temperature for nanometer-scale technologies, packaging has been transformed from playing the traditional role of a protective mechanical enclosure to a sophisticated thermal management platform [32, 33]. Fig. 14.7 illustrates a cross-sectional view of a typical package structure of a planar high-performance IC including a “flip-chip land grid array” package and a socket that interfaces with the printed circuit board. The die is mounted on a package substrate (carrier).

Along the main heat transfer path shown in Fig. 14.7, the die and the package substrate are attached to an integrated heat spreader (IHS). The IHS has a larger area than the die and spreads the nonuniform heat from the die region to the top of the IHS. The heat flux from the small die area is thus distributed over a larger surface, which serves as the mating surface for the heat-sink. The surfaces of the three major components (die, IHS, and heat-sink) are never smooth enough to make perfect contact, so they are bonded together with a thermal interface material (TIM) applied between them. The TIM compensates for the poor thermal contact caused by surface roughness, since its conductivity is much larger than that of the air it displaces. It thus enhances the overall thermal performance of the packaging stack-up and cooling mechanisms.

There is a second heat transfer path from the die to the printed circuit board, through the interconnect and dielectric layers, input/output (I/O) pads, and carrier as shown in Fig. 14.7. The thermal resistance of this path (from junction to the printed circuit board) is normally several orders of magnitude higher than that of the major heat transfer path [34]. Therefore, this path can be neglected in the analysis because of the small fraction of heat it can transfer.

Heat is a form of energy that is transferred as a result of a temperature difference by three different modes. (1) In conduction, heat passes through the matter itself. (2) In convection, heat is transferred by the motion of a fluid (e.g., air) relative to the heated surface. (3) In radiation, heat is transferred directly between distant surfaces by electromagnetic radiation. Radiative heat losses can be neglected, and only conduction and convection are considered, because radiation's influence is negligible when forced convection is employed, as in most high-performance ICs [35]. The silicon die is the main source of heat generation. Heat is transferred by conduction within the entire packaging stack-up and by convection at the surface of the heat-sink.

14.3.2 Full-Chip Package Thermal Model

Practical packaging structures typically employ a heat spreader and heat-sink with larger dimensions than the die to improve the thermal performance of the main heat transfer path (the realistic package thermal model is shown in Fig. 14.8a). In practice, the areas of the heat spreader and heat-sink are at least 9× and 30× larger than the area of the die, respectively. Note that the packaging structure involves different materials with different thermal properties. Its layers also have dimensions different from those of the silicon die, which significantly influences the heat transfer as well as the substrate thermal profile. The cubic package thermal model, on the other hand, refers to a model in which all package layers have identical areas and dimensions.

The temperature profile cannot be obtained analytically because of the complex geometry and complicated boundary conditions. Thus, numerical approaches are employed for thermal profile estimation.

The fundamental physics of heat transfer in a chip is governed by the following 3D heat conduction equation, subject to a convective boundary condition [36]:

$$ \rho C_p \frac{\partial}{\partial t} T(x,y,z,t) = \nabla \cdot \left[ k(x,y,z,t)\, \nabla T(x,y,z,t) \right] + g(x,y,z,t) \tag{14.3} $$

$$ -k(x,y,z,t)\frac{\partial}{\partial n_i} T(x,y,z,t) = h\left[ T(x,y,z,t) - T_{\textrm{amb}} \right] \tag{14.4} $$

where ρ is the density of the material (kg m⁻³), $C_p$ is the specific heat of the material (J kg⁻¹ °C⁻¹), T is the temperature (°C), k is the thermal conductivity of the material (W m⁻¹ °C⁻¹), g is the internal heat generation (W m⁻³), $n_i$ is the outward direction normal to the boundary surface, h is the convective heat transfer coefficient (W m⁻² °C⁻¹), and $T_{\textrm{amb}}$ is the temperature of the ambient air surrounding the package, measured at a specified distance sufficiently far from the surface of the entire package. The minus sign in (14.4) makes the conductive heat flux leaving through the boundary equal to the convective loss $h(T - T_{\textrm{amb}})$. Note that k measures the ability of the material to conduct heat. Although k varies with temperature, the variation is relatively small within the operating range [36]. Hence, a constant value of k, evaluated at the nominal temperature, is employed for each material in the packaging structure. Also, the material of each packaging layer is considered isotropic and homogeneous, so its thermal conductivity is identical in all directions.

With a constant k in each material, the partial differential equation and boundary condition above can be rewritten as (14.5) and (14.6), where the temperature T is a function of position (x, y, z) and time t.

$$ \frac{\partial T}{\partial t} = \left( \frac{k}{\rho C_p} \right)\left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} \right) + \frac{p}{\rho C_p} \tag{14.5} $$

$$ \frac{\partial T}{\partial n_i} = -\frac{h}{k}\left[ T - T_{\textrm{amb}} \right] \tag{14.6} $$

Electrothermal couplings are incorporated into the thermal model through the parameter p in (14.5), which is a function of temperature, time, and position within the die. Unlike g in (14.3), which is prescribed independently of temperature, p represents the heat generation including electrothermal couplings. It is recalculated at each evaluation step in a self-consistent manner.

The entire thermal packaging stack-up (packaging material layers) is discretized based on a typical high-performance package structure such as that in Fig. 14.7. The relationships between discretized cells are governed by the heat partial differential equation and boundary condition in (14.5) and (14.6). Physical thermal parameters, such as the thermal conductivity, density, and specific heat of the different layers, depend on material properties. Note that the discretized cells are chosen to have equal dimensions (i.e., dx = dy = dz). Thus, the effective thermal conductivity ($k_{\textrm{eff}}$) of cells between two adjacent layers, represented by the darker nodes between layer 1 and layer 2 in Fig. 14.8b, can be simply determined by (14.7).

$$ \frac{2}{k_{\textrm{eff}}} = \frac{1}{k_1} + \frac{1}{k_2}, \tag{14.7} $$

where $k_1$ and $k_2$ represent the thermal conductivities of the materials in layer 1 and layer 2, respectively. A perfect thermal contact between the TIM layer and the adjacent materials is assumed, since the TIM is applied between two different layers precisely to reduce the thermal contact resistance caused by surface roughness.
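Equation (14.7) is the harmonic mean of $k_1$ and $k_2$. Two half-cells of thickness dx/2 conduct in series, so $dx/k_{\textrm{eff}} = (dx/2)/k_1 + (dx/2)/k_2$. The helper below is a hypothetical illustration. The value of 148 W m⁻¹ °C⁻¹ is a typical room-temperature figure for silicon, and the 4 W m⁻¹ °C⁻¹ TIM conductivity is an assumed value.

```python
def interface_conductivity(k1, k2):
    """Effective conductivity between two equal-size cells, Eq. (14.7):
    2/k_eff = 1/k1 + 1/k2, i.e. two half-cells conducting in series."""
    return 2.0 / (1.0 / k1 + 1.0 / k2)


# Hypothetical example: silicon (about 148 W/(m K)) next to a TIM assumed at 4 W/(m K)
print(round(interface_conductivity(148.0, 4.0), 2))
```

The example prints `7.79`. The interface conductance is therefore dominated by the poorer conductor: it stays within a factor of two of the TIM value, far below that of silicon.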

14.3.3 Numerical Approach and Methodology Overview

Partial differential equations (PDEs) of the general form shown in (14.8), where φ is a function of x, y, z, and t, are classified as parabolic PDEs [36, 37]. They can be solved with finite difference approximations using two well-known approaches: explicit and implicit methods.

$$ \frac{\partial \varphi}{\partial t} = \alpha \left( \frac{\partial^2 \varphi}{\partial x^2} + \frac{\partial^2 \varphi}{\partial y^2} + \frac{\partial^2 \varphi}{\partial z^2} \right) \tag{14.8} $$

The explicit method is simple and straightforward [36, 37]: it calculates the state of the system at the next time step directly from the state at the current time step. However, in many cases the time step must be very small to maintain stability, which results in long computation times for a steady-state analysis. The implicit method overcomes this disadvantage by considering both the current state and the state at the next time step [36, 37], so stability is maintained for much larger time steps. However, the implicit method is more complicated to set up. The large matrix manipulations it requires also consume considerable memory and runtime at each time step.
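The stability restriction of the explicit method can be quantified. For the forward-time, centered-space (FTCS) scheme with equal grid spacing, stability requires $\alpha\,\Delta t/\Delta x^2 \le 1/(2d)$ in d dimensions. The sketch below computes this limit and shows the blow-up in 1D. The silicon properties are approximate textbook values. The 100 µm cell size assumes a 10 mm die split into 100 cells, and the function names are hypothetical.

```python
import numpy as np


def explicit_max_dt(alpha, dx, ndim=3):
    """Largest stable step of the explicit (FTCS) scheme for dT/dt = alpha*lap(T)
    on a grid with equal spacing dx: alpha*dt/dx**2 <= 1/(2*ndim)."""
    return dx * dx / (2.0 * ndim * alpha)


def ftcs_step_1d(t, r):
    """One explicit step in 1D, r = alpha*dt/dx**2; end values are held fixed."""
    t_new = t.copy()
    t_new[1:-1] = t[1:-1] + r * (t[2:] - 2.0 * t[1:-1] + t[:-2])
    return t_new


# Approximate room-temperature properties of silicon (textbook values)
k_si, rho_si, cp_si = 148.0, 2330.0, 700.0       # W/(m K), kg/m^3, J/(kg K)
alpha_si = k_si / (rho_si * cp_si)                # thermal diffusivity, m^2/s
dt_max = explicit_max_dt(alpha_si, dx=100e-6)     # 100 um cubic cells, 3D
print(f"alpha = {alpha_si:.3e} m^2/s, dt_max = {dt_max:.2e} s")
```

The limit comes out at about 18 µs, which means tens of thousands of explicit steps per simulated second. Repeatedly applying `ftcs_step_1d` to a temperature spike with r = 0.4 keeps the profile bounded. With r = 0.6, above the 1D limit of 0.5, the same spike oscillates and grows without bound.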

The alternating direction implicit (ADI) method is a widely used algorithm for the numerical solution of parabolic PDEs involving multiple spatial variables [38, 39]. Its advantage is that it transforms a multidimensional parabolic PDE into a succession of 1D problems, each of which is a tridiagonal linear system. Therefore, no large-scale matrix has to be solved, and the method is easy to implement. The ADI method is thus employed as the core algorithm for solving the heat PDEs to achieve higher computational efficiency. It is important to note that although other computationally efficient methods exist, choosing any one of them over the others does not affect the accuracy of the results.
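The ADI idea can be sketched in two dimensions with the classic Peaceman–Rachford scheme. Each time step is split into two half-steps, implicit along x and then along y. Every half-step reduces to independent tridiagonal systems solved by the Thomas algorithm. This is a simplified illustration, not the chapter's simulator. It assumes a uniform material, equal grid spacing, and fixed-temperature (Dirichlet) edges in place of the convective boundary condition (14.6). A 3D version adds a third sweep (e.g., a Douglas-type splitting). The function names are hypothetical.

```python
import numpy as np


def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.
    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused)."""
    n = len(d)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def adi_step(T, src, r, dt, tb):
    """One Peaceman-Rachford ADI step for dT/dt = alpha*(Txx + Tyy) + src.

    T   : (nx, ny) array whose outer ring is a fixed (Dirichlet) boundary at tb.
    src : heat-source term p/(rho*Cp), same shape as T.
    r   : alpha*dt/dx**2 with dx == dy.
    """
    nx, ny = T.shape
    h = 0.5 * r

    def tri(m):
        # -r/2 off the diagonal, 1 + r on the diagonal
        return np.full(m, -h), np.full(m, 1.0 + r), np.full(m, -h)

    # Half-step 1: implicit along x (one solve per column j), explicit along y
    T_half = T.copy()
    a, b, c = tri(nx - 2)
    for j in range(1, ny - 1):
        rhs = (T[1:-1, j] + h * (T[1:-1, j + 1] - 2.0 * T[1:-1, j] + T[1:-1, j - 1])
               + 0.5 * dt * src[1:-1, j])
        rhs[0] += h * tb                       # known boundary neighbours moved to the RHS
        rhs[-1] += h * tb
        T_half[1:-1, j] = thomas(a, b, c, rhs)

    # Half-step 2: implicit along y (one solve per row i), explicit along x
    T_new = T_half.copy()
    a, b, c = tri(ny - 2)
    for i in range(1, nx - 1):
        rhs = (T_half[i, 1:-1]
               + h * (T_half[i + 1, 1:-1] - 2.0 * T_half[i, 1:-1] + T_half[i - 1, 1:-1])
               + 0.5 * dt * src[i, 1:-1])
        rhs[0] += h * tb
        rhs[-1] += h * tb
        T_new[i, 1:-1] = thomas(a, b, c, rhs)
    return T_new


# Hypothetical usage: uniform unit source on a 21 x 21 grid with zero-temperature edges
T = np.zeros((21, 21))
src = np.zeros_like(T)
src[1:-1, 1:-1] = 1.0
for _ in range(600):
    T = adi_step(T, src, r=1.0, dt=1.0, tb=0.0)
```

The Peaceman–Rachford scheme is unconditionally stable for this linear 2D problem, so large steps can be taken. A fixed point of the update satisfies the discretized steady-state equation. Iterating `adi_step` until the grid stops changing therefore yields the steady-state profile.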

In order to accurately estimate on-chip thermal gradients and the power dissipation profile, a self-consistent temperature profile estimation methodology is proposed with the capability of incorporating precise layout geometry and the power dissipation of individual circuit blocks in a chip [40].

Fig. 14.9 illustrates an overview of the methodology for substrate temperature profile estimation. The chip is partitioned into a mesh according to the layout geometry and the power distribution map. The nominal power dissipation (including switching and leakage power) of each functional block is used as the initial value. This nominal power depends on the block's activity and thus on the specific circuit implementation and application. Note that in a 3D IC, each active layer has its own layout geometry and power distribution. Physical parameters such as specific heat, thermal conductivity, and heat transfer coefficient depend on the specific packaging material properties and the applied cooling techniques. The full-chip realistic package thermal model, which comprehends both vertical and lateral heat transfer paths, is then incorporated. Boundary conditions are determined by the operating environment. The simulator uses the layout geometry, nominal power dissipation, boundary conditions, and physical thermal/packaging parameters as initial values to formulate the PDEs. It then solves these equations self-consistently for every mesh element using the ADI method, which converts a multidimensional parabolic PDE into a succession of 1D linear equations. The electrothermal couplings are embedded in the core of the simulator, which updates the temperature-dependent quantities at each simulation step. Once the temperature difference between two successive steps falls within a specified tolerance, the evaluation stops and the steady-state temperature profile is obtained. However, in certain extreme cases, poor packaging solutions or high power dissipation can drive the temperature above the maximum criterion (defined by reliability constraints). In that case, the evaluation terminates and thermal runaway is reported.
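The control flow of Fig. 14.9 can be sketched on a coarse die grid. Temperature-dependent power is recomputed inside the time-stepping loop, the loop stops when successive steps agree, and runaway is reported above a limit. The sketch below is a hypothetical lumped RC network, not the chapter's full package model. Each cell couples laterally to its neighbours through `g_lat` and vertically to ambient through `g_vert`. Leakage is assumed to follow $P_{\textrm{leak0}}\exp[\beta(T - T_0)]$. For brevity, a conservative explicit pseudo-time step replaces the ADI solver.

```python
import numpy as np


def self_consistent_profile(p_dyn, p_leak0, g_lat, g_vert, c_cell, t_amb,
                            beta=0.04, t0=45.0, t_max=125.0,
                            tol=1e-6, max_steps=200_000):
    """Leakage-aware steady-state temperature of a 2D die grid (hypothetical model).

    Each cell obeys
        c_cell * dT/dt = P_dyn + P_leak(T) + g_lat * sum_nbrs(T_nbr - T) - g_vert * (T - t_amb)
    with P_leak(T) = p_leak0 * exp(beta * (T - t0)). Die edges are adiabatic.
    Returns (temperature array, power array) at convergence.
    """
    t = np.full(p_dyn.shape, float(t_amb))
    # Conservative explicit step, below the limit 2*c/(8*g_lat + g_vert) of the linear part
    dt = 0.9 * c_cell / (4.0 * g_lat + g_vert)
    for _ in range(max_steps):
        p_leak = p_leak0 * np.exp(beta * (t - t0))     # power re-evaluated every step
        tp = np.pad(t, 1, mode="edge")                 # edge padding = zero lateral flux
        lateral = tp[:-2, 1:-1] + tp[2:, 1:-1] + tp[1:-1, :-2] + tp[1:-1, 2:] - 4.0 * t
        net = p_dyn + p_leak + g_lat * lateral - g_vert * (t - t_amb)
        t_new = t + dt * net / c_cell
        if t_new.max() > t_max:
            raise RuntimeError("thermal runaway: temperature limit exceeded")
        if np.abs(t_new - t).max() < tol:
            return t_new, p_dyn + p_leak0 * np.exp(beta * (t_new - t0))
        t = t_new
    raise RuntimeError("no convergence within max_steps")


# Hypothetical 10 x 10 die: a high-power block and a leaky block
p_dyn = np.full((10, 10), 0.5)
p_dyn[2:4, 2:4] = 2.0
p_leak0 = np.full((10, 10), 0.05)
p_leak0[6:8, 6:8] = 0.5
temps, power = self_consistent_profile(p_dyn, p_leak0, g_lat=1.0, g_vert=0.5,
                                       c_cell=1.0, t_amb=45.0)
```

The lateral terms cancel when summed over the die because the edges carry zero flux. The converged profile therefore satisfies a global energy balance: total power equals `g_vert` times the summed temperature rise. This gives a quick sanity check on any such solver.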

The key difference between the proposed approach and traditional methods is illustrated in Fig. 14.10. The traditional evaluation does produce an entire thermal profile, but it is evidently misleading because it ignores the correlation between power and temperature. One might think of applying the traditional evaluation iteratively by updating the temperature-dependent power (as shown by the dotted arrows). However, this dramatically increases the computation time. In addition, once the steady-state temperature has been evaluated without considering the electrothermal couplings, iterations based on this inaccurate information (as shown by the dotted arrows) are meaningless. In contrast, the proposed self-consistent approach evaluates the steady-state temperature profile with the ADI method, so that the correlation between power and temperature is incorporated at each time step. Hence, the self-consistent method inherently generates a more accurate power profile, which efficient PDE solvers can then use to generate an accurate temperature profile.

**Fig. 14.10**

14.3.4 Setup and Implementation: An Example of a 2D IC Thermal Profile Estimation

Fig. 14.11 shows a design with a die size of 10 × 10 mm² (discretized into 100 × 100 grid cells) and a given power density for each functional block. The power dissipation of the chip and of each functional block depends on the application (workload, activity, etc.). In this analysis, however, the power distribution map is assumed to be known. The nominal total power consumption of the chip at the ambient temperature (45°C) is 96 W (nominal active power = 93.1 W, leakage power = 2.9 W). The short-circuit component is relatively small and is therefore neglected for simplicity. The physical and thermal properties of all packaging layers are based on a practical packaged high-performance microprocessor [40].
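For orientation, the setup above implies the following grid and power-density figures. These are simple arithmetic on the quoted values, not additional results:

$$ \Delta x = \frac{10\ \textrm{mm}}{100} = 100\ \mu\textrm{m}, \qquad \bar{q} = \frac{96\ \textrm{W}}{1\ \textrm{cm}^2} = 96\ \textrm{W/cm}^2, \qquad \frac{P_{\textrm{leak}}}{P_{\textrm{total}}} = \frac{2.9\ \textrm{W}}{96\ \textrm{W}} \approx 3.0\% $$

At nominal conditions, each grid cell therefore dissipates 96 W / 10⁴ = 9.6 mW on average.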

**Fig. 14.11**

To demonstrate the importance of incorporating electrothermal couplings and a realistic package thermal model when estimating the substrate temperature profile, four simulation scenarios are compared using the design shown in Fig. 14.11. The results of the proposed methodology have not been verified against direct measurements. However, the method simply enforces self-consistency between power and temperature during each iteration of the PDE solver, and the solver itself has been validated against industrial-quality computational fluid dynamics (CFD) software [41]. The same heat equations are employed. Including the electrothermal couplings does not change the fundamental equations governing thermal transport via heat conduction and convection; it only provides an algorithm to solve for temperature and leakage power self-consistently. Hence, once the core of the solver has been validated against CFD, the results of the methodology can be trusted even with the electrothermal couplings included.

Although the results are specific to the aforementioned 2D IC, the conclusions are more general. A region indicated by a circle in Fig. 14.11 contains the blocks with the highest power density. In addition, a region indicated by a triangle contains blocks whose leakage power dissipation is 10 times that of the other functional blocks. However, the average power density of the circuit blocks in the triangle is only around 60% of that of the blocks in the circle.

Figs. 14.12a, 14.12b, 14.13a, and 14.13b show the silicon substrate temperature profiles generated under the following four scenarios, respectively:

1. Traditional method + cubic package thermal model
2. Traditional method + realistic package thermal model
3. Self-consistent method + cubic package thermal model
4. Self-consistent method + realistic package thermal model

Note that all temperature profiles in Fig. 14.12 and Fig. 14.13 are plotted on the same temperature scale (56–66°C) for ease of comparison.

**Fig. 14.12**

**Fig. 14.13**

The impact of electrothermal couplings on the substrate temperature evaluation can be observed by comparing Fig. 14.12b and Fig. 14.13b, which both employ the realistic package thermal model (Fig. 14.8a) and the same cooling conditions. The substrate thermal profile in Fig. 14.12b is generated by a traditional thermal simulator that does not consider electrothermal couplings. Its highest temperature (hot-spot) is approximately 64.23°C and is located in the region with the highest power density (indicated by a circle in Fig. 14.11). However, the proposed self-consistent methodology yields a different substrate temperature profile (Fig. 14.13b). Two hot-spots can be observed in Fig. 14.13b: one in the region with the highest power density and the other in the region with a higher percentage of leakage power. In contrast to the traditional evaluation, the highest temperature, around 63.81°C, is located in the region with a higher percentage of leakage power (indicated by the triangle in Fig. 14.11). Note that the self-consistent methodology comprehends the couplings between power (active and leakage) and temperature. The steady-state power dissipation (active and leakage) is self-consistent with the temperature and may differ from the nominal power dissipation.

As explained in [28], regions with higher switching power density do not necessarily yield a higher temperature due to the various electrothermal couplings. Although the highest temperature values are similar in Fig. 14.12b and Fig. 14.13b, the temperature profile obtained by the self-consistent evaluation shows an additional hot-spot and thus a different temperature distribution. The traditional estimation is clearly misleading in terms of hot-spot count, location, and the overall spatial temperature profile as it neglects the electrothermal couplings between power dissipation and temperature.

Comparing Fig. 14.13a and Fig. 14.13b shows the impact of the two different package thermal models for the cooling path on the temperature profile estimation. For a fair comparison, the layout, power density distribution, and discretization of the die are kept identical, as are the physical and thermal properties of each packaging layer material. Fig. 14.13a shows the substrate temperature profile estimated with a cubic (unrealistic) package thermal model. Although the electrothermal couplings are considered, the unrealistic package thermal model underestimates the lateral heat spreading in the packaging layers (particularly in the IHS and heat-sink). It thus yields higher maximum and average substrate temperatures. However, with the realistic package thermal model, the maximum temperature is lower but the temperature gradient from the hot-spot to the edges of the chip is larger. For example, $T_{\textrm{max}}$ is 65.69°C in Fig. 14.13a and 63.81°C in Fig. 14.13b, while $T_{\textrm{max}} - T_{\textrm{min}}$ is about 8°C and 11°C, respectively. The realistic package thermal model uses a larger heat spreader and heat-sink, and the resulting better lateral heat spreading lowers the maximum temperature but lowers the temperatures at the chip edges even more. This, in turn, is expected to affect physical design issues such as partitioning and placement schemes for high-performance ICs, including multicore designs.

14.3.5 3D IC Thermal Profile Estimation: Analysis and Implications

Several possible applications of this revolutionary 3D technology have been explored in [9, 42–44]. One of the most promising is the integration of a processor-and-memory system on a single 3D chip. A preliminary thermally aware performance analysis of the 3D processor-memory hierarchy was performed in [12], using different benchmarks at different processor frequencies and assuming an average temperature for each active layer. The impact of the thermal constraint on the performance of the processor-memory hierarchy is summarized in Fig. 14.14.

**Fig. 14.14** Impact of the thermal constraint on the performance of the 2D and 3D processor-memory hierarchy: (a) mcf; (b) twolf

For a highly memory-intensive application (e.g., the mcf benchmark), the execution time per instruction (t_exe) is lower for the 3D system (Fig. 14.14a). For a less memory-intensive application (e.g., the twolf benchmark), the difference in execution time between the 2D and 3D systems is negligible (Fig. 14.14b).

When the system is constrained by a maximum allowable temperature (which arises from reliability concerns), its maximum allowable frequency (f_max) is limited. For memory-intensive applications (Fig. 14.14a), thermal considerations impose a lower f_max in 3D. Even so, the 3D system still outperforms the 2D system running at a higher frequency, because the 2D system cannot overcome the memory interface bottleneck. For applications that are not memory-intensive (Fig. 14.14b), system performance is not dominated by memory accesses. In this case, the 2D system, which can operate at a higher f_max, performs better than the 3D system, which is constrained to a lower frequency [12].

Note that this analysis uses an average temperature for each active layer, which implies that the temperature is higher in the layer farther from the heat-sink. When the detailed temperature profile is taken into account, however, the average-temperature model can mislead both the temperature estimation and the performance analysis.
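The trade-off between memory latency and thermally limited frequency can be seen in a toy analytical model. It is not the simulation methodology of [12]. The model assumes that:

- the time per instruction splits into a core part that scales as 1/f and a memory-stall part that does not;
- 3D stacking shortens the memory latency;
- the thermal limit lowers f_max for the 3D system.

Every number below (frequencies, latencies, miss rates) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    f_max_ghz: float        # thermally limited maximum clock frequency (hypothetical)
    mem_latency_ns: float   # average memory access latency (hypothetical)


def time_per_instruction_ns(system, core_cpi, misses_per_instr):
    """Toy model: core time scales with 1/f; memory stall time does not."""
    core_ns = core_cpi / system.f_max_ghz
    stall_ns = misses_per_instr * system.mem_latency_ns
    return core_ns + stall_ns


planar = System("2D", f_max_ghz=3.0, mem_latency_ns=100.0)
stacked = System("3D", f_max_ghz=2.5, mem_latency_ns=40.0)

# Misses per instruction for two hypothetical workloads.
workloads = {"memory-intensive": 0.02, "compute-bound": 0.0001}

for label, mpi in workloads.items():
    t2d = time_per_instruction_ns(planar, 1.0, mpi)
    t3d = time_per_instruction_ns(stacked, 1.0, mpi)
    winner = "3D" if t3d < t2d else "2D"
    print(f"{label}: 2D {t2d:.3f} ns, 3D {t3d:.3f} ns -> {winner} faster")
```

Under these assumptions, the memory-intensive workload runs faster on the 3D system (1.2 ns versus about 2.3 ns per instruction). The compute-bound workload runs faster on the 2D system (about 0.34 ns versus 0.40 ns), which mirrors the qualitative trend described above.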

Because planar (2D) technology is already constrained by power and the associated thermal problems, the success of 3D integration depends on two things. The first is the development of processing technologies. The second is thorough and accurate estimation of the thermal profiles in 3D ICs.

Chip-level thermal and reliability issues of planar (2D) IC designs can be assessed with the aforementioned thermal profile estimation methodology, which accounts for the packaging and the electrothermal couplings. As Fig. 14.7 shows, the major heat transfer path of a planar (2D) IC runs from the active layer through the thermal packaging to the heat-sink. In a 3D IC, however, the active layers differ in power dissipation and power distribution. The direction of heat flow therefore depends strongly on the arrangement and placement of the active layers.

A generic 3D IC with three active layers is considered in this subsection. One active layer is shown in Fig. 14.11, and the power density maps of the other two are shown in Fig. 14.15. Note that, for practical integration, each active layer in a 3D IC is much thinner (around 50 μm; see Fig. 14.1) than a planar (2D) IC (several hundred microns). The power dissipation of the two active layers in Fig. 14.15 is assumed to be 20 W and 10 W, respectively.

To reduce the thermal resistance between the active layers and the heat-sink, layers with higher power dissipation are typically placed closer to the heat-sink than layers with lower power dissipation. In this scenario, the layer of Fig. 14.11 is therefore attached directly to the package structure, followed by the layer of Fig. 14.15a and then the layer of Fig. 14.15b. Apart from the boundary that is attached to the heat-sink and exposed to the ambient, all other boundaries are treated as adiabatic in the analysis.

**Fig. 14.15** Power density maps of the other two active layers of the 3D IC: (a) 20 W layer; (b) 10 W layer

With the same packaging and environmental conditions as in the previous subsection, the steady-state temperature profiles of the layers in the 3D IC can be estimated using the self-consistent methodology (Fig. 14.16). The maximum temperatures of the active layers are 64.19°C (layer 1), 63.89°C (layer 2), and 63.47°C (layer 3). Although layers 2 and 3 dissipate much less power than layer 1, their steady-state temperature profiles are influenced and raised by layer 1. As discussed in Section 14.2, a first-order analytical thermal model that uses the average power and temperature of each active layer clearly misleads the 3D IC temperature estimation. It also overestimates the maximum temperature of the layers farther from the heat-sink.
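The first-order behavior discussed here can be reproduced with a minimal 1D thermal ladder, in the spirit of the first-order equivalent thermal circuit mentioned in Section 14.2. Each active layer is a single node carrying its average power. All heat flows toward the heat-sink, and the other boundaries are adiabatic. The 60 W value for layer 1 and the two thermal resistances are assumptions for illustration; only the 20 W and 10 W values come from the text.

```python
def layer_temperatures(powers_w, r_sink, r_interlayer, t_amb=25.0):
    """First-order 1D thermal ladder for a stack of active layers.

    powers_w[0] is the layer attached to the package/heat-sink; higher
    indices are farther away. Each layer is one node carrying its average
    power, and all heat is assumed to flow toward the heat-sink.
    """
    # Every watt in the stack crosses the package/heat-sink resistance.
    temp = t_amb + r_sink * sum(powers_w)
    temps = [temp]
    for k in range(1, len(powers_w)):
        # Heat from layer k and all layers above it crosses the interface below layer k.
        temp += r_interlayer * sum(powers_w[k:])
        temps.append(temp)
    return temps


R_SINK, R_INT = 0.3, 0.1  # K/W, hypothetical
high_power_near_sink = layer_temperatures([60.0, 20.0, 10.0], R_SINK, R_INT)
high_power_far = layer_temperatures([10.0, 20.0, 60.0], R_SINK, R_INT)
print([round(t, 1) for t in high_power_near_sink])  # [52.0, 55.0, 56.0]
print([round(t, 1) for t in high_power_far])        # [52.0, 60.0, 66.0]
```

Placing the high-power layer next to the heat-sink lowers the predicted peak, consistent with the ordering rule of thumb in the text. However, every layer's heat must pass through the layers below it. This model therefore always predicts that the hottest layer is the one farthest from the heat-sink, whereas the self-consistent profiles of Fig. 14.16 place the maximum in layer 1.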

**Fig. 14.16** Steady-state temperature profiles of the three active layers of the 3D IC

14.4 Implications and Opportunities for 3D IC Thermal Management

Unlike thermal management for planar (2D) ICs, thermal management for 3D ICs requires careful consideration of each active layer (2D level) and of the interactions between active layers (3D level).

At the 2D level, power-reduction and thermal management techniques for conventional planar (2D) ICs can be directly employed to reduce power dissipation and thermal gradients. These include device-, circuit-, and architecture-level techniques.

At the device level, short-channel effects in CMOS technologies [29], which lead to higher subthreshold leakage, have been shown to be mitigated by substrate engineering. For instance, vertically nonuniform doping (a retrograde channel profile) enhances inversion-layer mobility because of the lower surface doping [45, 46]. Laterally nonuniform channel implants (halo doping) reduce threshold-voltage roll-off by compensating 2D charge-sharing effects in short-channel transistors [47–49]. Gate-tunneling leakage increases as the silicon dioxide gate dielectric becomes ever thinner [50]. It can be alleviated by replacing the thin silicon dioxide with a thicker insulating material that has a higher dielectric constant (high-κ). In addition, metal gate electrodes have been used in place of polysilicon gates to provide better control of the threshold voltage [51, 52].

At the circuit level, low-power design methodologies can be applied [53]. These include dual- or multi-V_dd and V_th schemes as well as adaptive body-biasing techniques. Gating techniques are also used in low-power or power-constrained designs. For instance, clock gating reduces clock-tree power dissipation [54], and power gating with sleep-transistor insertion reduces leakage by turning off idle circuitry [55]. Thermally aware (within-layer) placement schemes that optimize performance and operating temperature are also available in the literature [56, 57]. Moreover, a 3D IC thermal placement method using an iterative force-directed approach is presented in [58].

At the architecture level, pipelined and parallel (including multicore) structures are often used in low-power designs. With a parallel implementation, throughput can be maintained at a lower V_dd. Pipelining can also reduce power consumption by allowing the switching rate and V_dd to be reduced [59]. Note that these methods reduce power consumption at the cost of area, performance, or noise margin.
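The parallel-implementation argument can be quantified with the standard dynamic-power expression P = αCV_dd²f. The sketch below compares a reference datapath with two parallel copies, each clocked at half the rate, so throughput is unchanged. The capacitance overhead (extra copies plus multiplexing and routing) and the achievable supply scaling are assumed values chosen for illustration, as are the reference activity, capacitance, voltage, and frequency.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power of a CMOS block: P = alpha * C * Vdd^2 * f (watts)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Reference datapath (hypothetical values).
ALPHA, C_REF, VDD_REF, F_REF = 0.1, 1e-9, 1.0, 1e9

# Parallel version: N copies, each clocked at F_REF / N, so throughput is unchanged.
N_UNITS = 2
CAP_OVERHEAD = 2.15  # assumed total switched capacitance vs. one unit (copies + mux/routing)
VDD_SCALE = 0.58     # assumed supply reduction allowed by the longer cycle time

p_ref = dynamic_power(ALPHA, C_REF, VDD_REF, F_REF)
p_par = dynamic_power(ALPHA, CAP_OVERHEAD * C_REF, VDD_SCALE * VDD_REF, F_REF / N_UNITS)
print(f"relative power: {p_par / p_ref:.2f}")  # relative power: 0.36
```

The quadratic dependence on V_dd is what makes the trade worthwhile. The switched capacitance, and with it the area, roughly doubles, yet the assumed supply reduction more than compensates. This is the area penalty mentioned in the text.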

Besides the aforementioned techniques, chip cooling has always been considered an effective knob for power and thermal management [60]. However, conventional cooling and thermal management techniques for planar (2D) ICs cannot be directly applied to 3D ICs. They require a holistic treatment that includes all active layers and the thermal solution (packaging, etc.). For instance, the boundary conditions of each active layer are determined by its adjacent layers, so the aforementioned 2D-level (within-layer) placement schemes must account for layer-to-layer effects.

**Fig. 14.17** Steady-state temperature profiles of three identical stacked active layers

In the preceding example (Fig. 14.15 and Fig. 14.16), layer 1 has the highest power dissipation. The highest temperature also occurs in layer 1, even though this layer is closest to the heat-sink. However, the temperature profiles and the maximum temperature of the active layers in a 3D IC change with the arrangement of the layers.

Consider, for example, a 3D IC with three identical stacked active layers. The power distribution of each active layer is similar to that of Fig. 14.11, but each functional block dissipates one-third of the power of its counterpart in Fig. 14.11. Stacking three high-power layers with the same thermal solution would lead to thermal runaway. Fig. 14.17 shows the steady-state temperature profiles of these three layers. The highest temperature now occurs in layer 3, which is farthest from the heat-sink.
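The contrast between these two cases can be sketched by combining a 1D ladder with temperature-dependent leakage and iterating to self-consistency. Stacking full-power layers leads to thermal runaway, while one-third-power layers settle.

This is a hypothetical lumped model, not the simulation behind Fig. 14.17. Its assumptions are:

- Each layer is one node, and layer 0 sits on the heat-sink.
- Leakage per layer follows `p_leak_ref * exp(beta * (T - t_amb))`.
- All resistances, leakage parameters, and powers are illustrative. The comparison uses 90 W versus 30 W of dynamic power per layer.

```python
import math


def ladder(powers_w, r_sink, r_int, t_amb):
    """1D ladder: layer 0 is on the heat-sink; all heat flows toward it."""
    temps = [t_amb + r_sink * sum(powers_w)]
    for k in range(1, len(powers_w)):
        temps.append(temps[-1] + r_int * sum(powers_w[k:]))
    return temps


def stack_steady_state(n_layers, p_dyn_per_layer, r_sink=0.3, r_int=0.1,
                       t_amb=25.0, p_leak_ref=5.0, beta=0.02,
                       tol=1e-6, max_iter=500, t_limit=300.0):
    """Self-consistent layer temperatures for identical stacked layers.

    Returns a list of temperatures (deg C), or None if any layer exceeds
    t_limit or the iteration does not settle (treated as thermal runaway).
    """
    temps = [t_amb] * n_layers
    for _ in range(max_iter):
        # Each layer's power depends on its own current temperature via leakage.
        powers = [p_dyn_per_layer + p_leak_ref * math.exp(beta * (t - t_amb))
                  for t in temps]
        new_temps = ladder(powers, r_sink, r_int, t_amb)
        if max(new_temps) > t_limit:
            return None
        if max(abs(a - b) for a, b in zip(new_temps, temps)) < tol:
            return new_temps
        temps = new_temps
    return None


full_power = stack_steady_state(3, p_dyn_per_layer=90.0)
third_power = stack_steady_state(3, p_dyn_per_layer=30.0)
print(full_power)                          # None
print([round(t, 1) for t in third_power])
```

Under these assumptions, the one-third-power stack converges with its hottest layer farthest from the heat-sink, while the full-power stack has no steady state. Lateral hot-spots, such as the one that makes layer 1 hottest in Fig. 14.16, need a spatially resolved solver such as the self-consistent methodology of this chapter.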

Moreover, alternative materials with higher thermal conductivity can improve heat removal in 3D ICs. Thermal and interlayer vias have been shown to mitigate thermal problems in 3D ICs [61]. Replacing copper with metallic carbon nanotube (CNT) bundle vias at different locations in the interconnect stack offers substantial benefits in controlling the back-end temperature [62]. In practice, CNTs have been fabricated as thermal and source bumps for flip-chip high-power amplifiers [63]. It has also been shown experimentally that the thermal conductivity of thermal interface materials (TIMs) can be improved by using free-standing CNT arrays or combinations of CNT arrays and existing TIMs [64].
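A first-order way to see why high-conductivity vias help is a parallel (rule-of-mixtures) estimate of the vertical conductivity of an interlayer region in which vias occupy a fraction of the area. This sketch ignores heat spreading and constriction around individual vias, and every value below is an assumption. The dielectric and copper conductivities are approximate textbook magnitudes. The CNT-bundle value is hypothetical, since reported values vary widely. The sketch is neither the via-placement method of [61] nor the interconnect analysis of [62].

```python
def effective_vertical_conductivity(k_via, k_matrix, via_area_fraction):
    """Parallel (rule-of-mixtures) estimate for heat flowing vertically through
    a layer in which vias occupy a given fraction of the cross-sectional area."""
    if not 0.0 <= via_area_fraction <= 1.0:
        raise ValueError("via_area_fraction must be in [0, 1]")
    return via_area_fraction * k_via + (1.0 - via_area_fraction) * k_matrix


def layer_thermal_resistance(thickness_m, area_m2, k_eff):
    """1D conduction resistance R = t / (k * A), in K/W."""
    return thickness_m / (k_eff * area_m2)


K_ILD = 1.4       # W/(m K), roughly SiO2-like dielectric (approximate)
K_CU = 400.0      # W/(m K), bulk copper (approximate)
K_CNT = 3000.0    # W/(m K), hypothetical CNT-bundle value
PHI = 0.01        # 1% of the area occupied by thermal vias (assumed)
T_LAYER = 10e-6   # 10 um interlayer region (assumed)
AREA = 1e-4       # 1 cm^2 die (assumed)

for label, k_via in [("no vias", None), ("Cu vias", K_CU), ("CNT vias", K_CNT)]:
    k_eff = K_ILD if k_via is None else effective_vertical_conductivity(k_via, K_ILD, PHI)
    r = layer_thermal_resistance(T_LAYER, AREA, k_eff)
    print(f"{label}: k_eff = {k_eff:.2f} W/(m K), R = {r:.4f} K/W")
```

With 1% via coverage, copper vias raise the effective vertical conductivity by roughly a factor of four over the bare dielectric in this idealization (about 5.4 versus 1.4 W/(m·K)). A higher-conductivity via material raises it further.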

14.5 Summary

Three-dimensional integration technology with multiple active layers is considered a promising candidate for alleviating the interconnect delay problems of nanoscale VLSI circuits and for realizing heterogeneous integration on the same chip. Heat and thermal effects already significantly impact the reliability and performance of high-performance planar (2D) ICs. Thermal problems are therefore even more severe in 3D ICs, which are formed by stacking 2D active layers and thus inherit their thermal problems. Accurate thermal profile estimation is therefore critical in the early design stage, before the 3D chip is fabricated.

It was shown that first-order analysis, which uses a 1D equivalent thermal circuit with constant, average power dissipation in each active layer, predicts higher temperatures for the layers farther from the heat-sink. It thus misleads the estimation of temperature in a 3D IC. In contrast, the proposed self-consistent 3D temperature profile estimation methodology incorporates the electrothermal couplings that become increasingly prominent as technology scales. In addition, a realistic package thermal model is used to improve the accuracy of the thermal profile estimation. The impact of layer stacking on the temperature profile of a 3D IC is also presented.

The 3D thermal profiles are also strongly influenced by the nature of the application running on the chip. Various power-saving and thermal management techniques for planar (2D) ICs can be applied to 3D ICs. However, the arrangement of the active layers and 3D thermally aware placement schemes can strongly influence the steady-state temperature profile of each active layer. Furthermore, the overall thermal conductivity of 3D ICs can be improved by using higher-thermal-conductivity materials (e.g., CNTs) between active layers or in the packaging structure.

References

[1] Moore GE (1965) Cramming more components onto integrated circuits. Electronics 114–117
[2] Moore GE (1975) Progress in digital integrated electronics. IEEE International Electron Devices Meeting, pp 11–13
[3] Dennard RH, Gaensslen FH, Rideout VL, Bassous E, LeBlanc AR (1974) Design of ion-implanted MOSFETs with very small physical dimensions. IEEE J Solid-State Circuits 9:256–268
[4] International Technology Roadmap for Semiconductors (ITRS), http://www.itrs.net
[5] Borkar S (1999) Design challenges of technology scaling. IEEE Micro 19:23–29
[6] De V, Borkar S (1999) Technology and design challenges for low power and high performance. IEEE International Symposium on Low Power Electronics and Design, pp 163–168
[7] Gelsinger PP (2001) Microprocessors for the new millennium: challenges, opportunities, and new frontiers. IEEE International Solid-State Circuits Conference, pp 22–25
[8] Meindl JD (2003) Beyond Moore's law: the interconnect era. Comput Sci Eng 5:20–24
[9] Banerjee K et al (2001) 3-D ICs: a novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration. Proceedings of the IEEE 89:602–633
[10] Topol AW et al (2006) Three-dimensional integrated circuits. IBM J Res Dev 50:491–506
[11] Rahman A, Reif R (2000) System-level performance evaluation of three-dimensional integrated circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 8:671–678
[12] Loi GL, Agrawal B, Srivastava N, Lin S-C, Sherwood T, Banerjee K (2006) A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy. ACM Design Automation Conference, pp 991–996
[13] Im S, Banerjee K (2000) Full chip thermal analysis of planar (2-D) and vertically integrated (3-D) high performance ICs. IEEE International Electron Devices Meeting, pp 727–730
[14] Kleiner MB, Kühn SA, Ramm P, Weber W (1995) Thermal analysis of vertically integrated circuits. IEEE International Electron Devices Meeting, pp 487–490
[15] Rahman A, Reif R (2001) Thermal analysis of three-dimensional (3-D) integrated circuits (ICs). IEEE Interconnect Technology Conference, pp 157–159
[16] Banerjee K, Amerasekera A, Dixit G, Hu C (1996) The effect of interconnect scaling and low-k dielectric on the thermal characteristics of the IC metal. IEEE International Electron Devices Meeting, pp 65–68
[17] Hamann HF, Weger A, Lacey JA, Cohen E, Atherton C (2006) Power distribution measurements of the dual core PowerPC™ 970MP microprocessor. IEEE International Solid-State Circuits Conference, pp 2172–2179
[18] Hamann HF, Weger A, Lacey JA, Hu Z, Bose P, Cohen PE, Wakil J (2007) Hotspot-limited microprocessors: direct temperature and power distribution measurements. IEEE J Solid-State Circuits 42:56–65
[19] Tadayon P (2000) Thermal challenges during microprocessor testing. Intel Technology Journal, 3rd quarter
[20] Viswanath R, Wakharkar V, Watwe A, Lebonheur V (2000) Thermal performance challenges from silicon to system. Intel Technology Journal, 3rd quarter
[21] Prasher RS, Chang JY, Sauciuc I, Narasimhan S, Chau D, Chrysler G, Myers A, Prstic S, Hu C (2005) Nano and micro technology-based next-generation package-level cooling solutions. Intel Technology Journal, 4th quarter
[22] Banerjee K, Mehrotra A (2001) Global (interconnect) warming. IEEE Circuits Devices Mag 17:16–32
[23] Banerjee K, Lin S-C, Keshavarzi A, Narendra S, De V (2003) A self-consistent junction temperature estimation methodology for nanometer scale ICs with implications for performance and thermal management. IEEE International Electron Devices Meeting, pp 887–890
[24] Chatterjee A, Nandakumar M, Chen IC (1996) An investigation of the impact of technology scaling on power wasted as short-circuit current in low voltage static CMOS circuits. IEEE International Symposium on Low Power Electronics and Design, pp 145–150
[25] Banerjee K, Mehrotra A (2002) A power-optimal repeater insertion methodology for global interconnects in nanometer designs. IEEE Trans Electron Devices 49:2001–2007
[26] Zeitzoff PM (2004) MOSFET scaling trends and challenges through the end of the roadmap. Custom Integrated Circuits Conference, pp 233–240
[27] Lin Y-S, Wu C-C, Chang C-S, Yang R-P, Chen W-M, Liaw J-J, Diaz CH (2002) Leakage scaling in deep submicron CMOS for SoC. IEEE Trans Electron Devices 49:1034–1041
[28] Lin S-C, Chrysler G, Mahajan R, De V, Banerjee K (2007) A self-consistent substrate thermal profile estimation technique for nanoscale ICs – Part I: electrothermal couplings and full-chip package thermal model. IEEE Trans Electron Devices 54(12):3342–3350
[29] Taur Y, Ning TH (1998) Fundamentals of modern VLSI devices. Cambridge University Press
[30] Gelsinger P (2004) Gigascale integration for teraops performance: challenges, opportunities, and new frontiers. 41st DAC keynote
[31] Borkar S, Karnik T, Narendra S, Tschanz J, Keshavarzi A, De V (2003) Parameter variations and impact on circuits and microarchitecture. Design Automation Conference, pp 338–342
[32] Adam J, Chang C-S, Stankus JJ, Iyer MK, Chen WT (2002) Addressing packaging challenges. IEEE Circuits Devices Mag 18:40–49
[33] Mahajan R, Nair R, Wakharkar V, Swan J, Tang J, Vandentop G (2002) Emerging directions for packaging technologies. Intel Technology Journal, 2nd quarter
[34] Im S, Srivastava N, Banerjee K, Goodson KE (2005) Scaling analysis of multilevel interconnect temperatures for high performance ICs. IEEE Trans Electron Devices 52:2710–2719
[35] Cess RD (1961) The effect of radiation upon forced-convection heat transfer. Appl Sci Res 10:430–438
[36] Özişik MN (2002) Boundary value problems of heat conduction. Dover Publications
[37] Haberman R (1983) Elementary applied partial differential equations with Fourier series and boundary value problems. Prentice Hall
[38] Peaceman DW, Rachford HH (1955) The numerical solution of parabolic and elliptic differential equations. J Soc Ind Appl Math, pp 28–41
[39] Douglas J, Rachford HH (1956) On the numerical solution of heat conduction problems in two or three space variables. Trans Am Math Soc, pp 421–439
[40] Lin S-C, Chrysler G, Mahajan R, De V, Banerjee K (2007) A self-consistent substrate thermal profile estimation technique for nanoscale ICs – Part II: implementation and implications for power estimation and thermal management. IEEE Trans Electron Devices 54:3351–3360
[41] Icepak, http://www.icepak.com/
[42] Davis WR, Wilson J, Mick S, Xu J, Hua H, Mineo C, Sule AM, Steer M, Franzon PD (2005) Demystifying 3D ICs: the pros and cons of going vertical. IEEE Design & Test of Computers 22:498–510
[43] Zeng A, Lü J, Rose K, Gutmann RJ (2005) First-order performance prediction of cache memory with wafer-level 3D integration. IEEE Design & Test of Computers 22:548–555
[44] Kühn SA, Kleiner MB, Ramm P, Weber W (1996) Performance modeling of the interconnect structure of a three-dimensional integrated RISC processor/cache system. IEEE Transactions on Components, Packaging, and Manufacturing Technology, Part B 19:719–727
[45] Wong HSP, Frank DJ, Solomon PM, Wann CHJ, Welser JJ (1999) Nanoscale CMOS. Proceedings of the IEEE 87:537–570
[46] De I, Osburn CM (1999) Impact of super-steep-retrograde channel doping profiles on the performance of scaled devices. IEEE Trans Electron Devices 46:1711–1717
[47] Codella CF, Ogura S (1985) Halo doping effects in submicron DI-LDD device design. IEEE International Electron Devices Meeting, pp 230–233
[48] Shahidi GG, Warnock J, Fischer S, McFarland PA, Acovic A, Subbanna S, Ganin E, Crabbe E, Comfort J, Sun Y-C, Ning TH, Davari B (1993) High-performance devices for a 0.15-μm CMOS technology. IEEE Electron Device Lett 14:466–468
[49] Su L, Subbanna S, Crabbe E, Agnello P, Nowak E, Schulz R, Rauch S, Ng H, Newman T, Ray A, Hargrove M, Acovic A, Snare J, Crowder S, Chen B, Sun J, Davari B (1996) A high-performance 0.08-μm CMOS. IEEE Symposium on VLSI Technology, pp 12–13
[50] Lo SH, Buchanan DA, Taur Y, Wang W (1997) Quantum-mechanical modeling of electron tunneling current from the inversion layer of ultra-thin-oxide nMOSFETs. IEEE Electron Device Lett 18:209–211
[51] Chau R, Brask J, Datta S, Dewey G, Doczy M, Doyle B, Kavalieros J, Jin B, Metz M, Majumdar A, Radosavljevic M (2005) Application of high-κ gate dielectrics and metal gate electrodes to enable silicon and non-silicon logic nanotechnology. Microelectron Eng 80:1–6
[52] Gusev EP, Narayanan V, Frank MM (2006) Advanced high-κ dielectric stacks with polySi and metal gates: recent progress and current challenges. IBM J Res Dev 50:387–410
[53] Tschanz JW, Narendra S, Nair R, De V (2003) Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors. IEEE J Solid-State Circuits 38:826–829
[54] Pedram M, Rabaey J (2002) Power aware design methodologies. Kluwer
[55] Mutoh S, Douseki T, Matsuya Y, Aoki T, Shigematsu S, Yamada J (1995) 1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS. IEEE J Solid-State Circuits 30:847–854
[56] Chao KY, Wong DF (1995) Thermal placement for high-performance multichip modules. International Conference on Computer Design, pp 218–223
[57] Tsai CH, Kang SM (2000) Cell-level placement for improving substrate thermal distribution. IEEE Transactions on Computer-Aided Design 19:253–266
[58] Goplen B, Sapatnekar S (2003) Efficient thermal placement of standard cells in 3D ICs using a force directed approach. International Conference on Computer Aided Design, pp 86–89
[59] Chandrakasan AP, Sheng S, Brodersen RW (1992) Low-power CMOS digital design. IEEE J Solid-State Circuits 27:474–484
[60] Lin S-C, Banerjee K (2008) Cool chips: opportunities and implications for power and thermal management. IEEE Trans Electron Devices 55:245–255
[61] Goplen B, Sapatnekar S (2007) Placement of 3D ICs with thermal and interlayer via considerations. ACM Design Automation Conference, pp 626–631
[62] Srivastava N, Joshi RV, Banerjee K (2005) Carbon nanotube interconnects: implications for performance, power dissipation and thermal management. IEEE International Electron Devices Meeting, pp 257–260
[63] Iwai T, Shioya H, Kondo D, Hirose S, Kawabata A, Sato S, Nihei M, Kikkawa T, Joshin K, Awano Y, Yokoyama N (2005) Thermal and source bumps utilizing carbon nanotubes for flip-chip high power amplifiers. IEEE International Electron Devices Meeting, pp 257–260
[64] Xu J, Fisher TS (2006) Enhancement of thermal interface materials with carbon nanotube arrays. Int J Heat Mass Transf 49:1658–1666


Author information

Authors and Affiliations

University of California, Santa Barbara, CA, USA
Sheng-Chih Lin

Authors

Sheng-Chih Lin
Kaustav Banerjee

Editor information

Editors and Affiliations

School Electrical & Electronic Eng., Photonics Res. Centre, Nanyang Technological University, Singapore, 639798, Singapore
Chuan Seng Tan
Center for Integrated Electronics, Rensselaer Polytechnic Institute, 8th Street 110, Troy, 12180, U.S.A.
Ronald J. Gutmann
Dept. Electrical Engineering, Massachusetts Institute of Technology, Massachusetts Ave. 77, Cambridge, 02139, U.S.A.
L. Rafael Reif



About this chapter

Cite this chapter

Lin, SC., Banerjee, K. (2008). Thermal Challenges of 3D ICs. In: Tan, C., Gutmann, R., Reif, L. (eds) Wafer Level 3-D ICs Process Technology. Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76534-1_14


DOI: https://doi.org/10.1007/978-0-387-76534-1_14
Published: 11 August 2008
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76532-7
Online ISBN: 978-0-387-76534-1
eBook Packages: Engineering (R0)
