Power metallization degradation monitoring on power MOSFETs by means of concurrent degradation processes

An on-chip solution for health monitoring of semiconductor power switches subjected to thermo-mechanical metal fatigue degradation is proposed. The fatigue detection relies on the correlation between the progress of the main failure mechanism, which is critical to the functionality of the device, and a parallel degradation of a non-critical sensing structure using a different mechanism. Both mechanisms are driven by the same cyclic thermo-mechanical load. This study specifically develops a sensing structure for detecting power metallization aging through electrically detectable ratcheting behavior in the routing metal layer underneath. Experiments have been carried out on a dedicated test structure with electrical sensing of the health monitoring structure. Meanwhile, the progress of the main degradation was observed via scanning electron microscopy at regular intervals. Results show that the proposed approach works reliably only for detecting degradation driven by repeated high overload events.


Introduction
The automotive industry in Europe must comply with the functional safety requirements defined for electronics in ISO 26262. Semiconductor device manufacturers need to provide products that enable the automotive industry to comply with such regulations. An example of how functional safety issues may involve power electronics is the failure of a safety switch. Safety switches may be exposed to a large number of overload events during their lifetime. Such events result in thermo-mechanical fatigue of the materials composing the switches themselves.
Thermo-mechanical fatigue of power metallization is the main cause of failures related to repetitive overload events in the Vertical Double-Diffused Metal Oxide Semiconductor (VDMOS) power technology considered in this study. This concern is aggravated by the current trend, in which power semiconductor manufacturers strive to reduce device footprints and power densities increase accordingly [1]. For this reason, during technology development, semiconductor manufacturers extensively test the robustness of power switches against thermo-mechanical fatigue. Device lifetime and robustness against thermo-mechanical fatigue can be estimated by means of accelerated aging test systems [2]. However, the stress history of devices in the field is not tracked and depends on the specific application environment, which increases the variability in lifetime. Real-time, on-chip health monitoring could be a viable method to address this reliability issue. A successful implementation could also be used during technology development to acquire better knowledge about the effect of thermo-mechanical fatigue on power metallization. The feasibility of an implementation strategy for such a method is investigated here.
Several studies about health condition monitoring in power electronics have been proposed in the literature to date. Studies that addressed the problem with a more fundamental approach are focused on failure precursors [3,4]. The goal of this research branch is to identify electrical parameters which can indicate an imminent failure.

Failure precursors are the basis of many implementation studies, and some examples are listed as follows. Dusmez [5] proposed a software frequency response measurement method by means of a digital signal processor to detect on-state resistance variations. Anderson [6] presented an algorithm-based approach to extract several electrical parameters of a device and compared them to the ones of a healthy device. Chen [7] proposed a condition monitoring technique specific for power metallization by measuring the voltage between Kelvin and power emitter on insulated gate bipolar transistor devices during turn-on transients. Panchal [8] proposed a thermal measurement system that allows the estimation of junction-to-case thermal resistance, which is a reliable indicator for the thermal performance of a device. Pu [9] used the turn-on delay time to assess the health condition of SiC power devices.
The aforementioned studies belong to the well-established field of system-level or in-lab health condition monitoring. Our work is focused on on-chip level implementation, which is considered a pioneering field that has been less explored in the literature. Ritter and Pfost [10][11][12] made a relevant contribution in the field of DMOS power technologies.
Ritter and Pfost investigated the use of non-vital, inner layer metal meanders to detect aging of vital metal structures in the same layers as the meanders. This study was performed on a lateral DMOS technology with promising results. The monitoring solution proposed here is also based on the implementation of an on-chip non-vital structure that degrades due to thermo-mechanical stress. Unlike the proposal shown in [10], a non-vital structure in an inner aluminum layer is meant to be used for detecting aging of the top copper metallization. This choice is made because the solution proposed in [10][11][12] is not directly applicable to the sheet power metallization of a vertical technology. Degradation of the inner non-vital structure is driven by thermo-mechanical stress, as is the degradation of power metallization. Nonetheless, the degradation mechanism is different. Such a structure has an electrical signature that varies with its degradation status, and a dedicated on-chip circuit can read it. The goal of this research is to investigate the relation between a change in the electrical signature and the degradation status of the power metallization. Given that power metallization fatigue is a common reliability concern [13], the testing method proposed in this study can be relevant to several different power technologies.

Novel health monitoring concept
This research considers two effects of thermo-mechanical stress: fatigue of the power metallization and ratcheting of the aluminum lines. The first effect, power metallization fatigue, is driven by the mismatch of thermal expansion between the metal and the substrate during a heat-up and cool-down cycle. Fatigue affects the thermal coupling between the power metallization and the active area of the power switch. Consequently, thermal resistance locally increases, resulting in higher peak temperatures [14]. This phenomenon eventually results in the failure of the device. The second effect, ratcheting, is a one-directional shift of the mechanical stress-strain hysteresis curves of a material through cycling when asymmetrical loading is applied [15].
Ratcheting is enabled on the metal lines of an MOS technology device by a combination of two factors [16]: temperature gradients and the difference between the thermal expansion coefficients of silicon and metals. Ratcheting results in a net material transport and accumulation through thermal cycling for the technology under consideration [16]. The accumulation of metal, in turn, results in a pressure increase, which breaks the surrounding dielectric, leading to metal protrusions [17]. If such a phenomenon affects a metal line that runs close to another, then a short between the two lines occurs. This transition from an open circuit to a short between the two lines is the electrical signature used for the proposed monitoring. The monitoring can be successfully implemented if two conditions are met. The first condition is that fatigue and aluminum ratcheting must both be enabled by thermal cycling and progress together. The second condition is the possibility to tune the monitoring structure to ensure that a short will occur before the switch fails, and only if failure is imminent.
To the best of the authors' knowledge, no studies covering the relation between ratcheting in metal wires and power metallization fatigue have been carried out to date. Other studies about the use of metal layer degradation mechanisms for diagnostic health management purposes are available in the literature [11].

Methods
We investigate a dedicated test-chip, which reproduces the thermo-mechanical behavior of a DMOS power switch. The test-chip contains the fatigue-monitoring structures under investigation. An on-chip polysilicon resistor generates heat. This heat source allows us to perform thermal cycling and subject the samples to thermo-mechanical fatigue. The test-chips designed for this technique are called poly-heaters [18]. A more detailed description of the test-chip is provided in Sect. 4.
Monitoring structures implemented on the test-chip consist of an aluminum line that is prone to ratcheting (hereinafter referred to as sensing line) [16] running between two other aluminum lines. During thermal cycling, the measurement equipment checks whether a short between a sensing line and a nearby aluminum line has occurred. The fatigue detection-related electrical signature given by the structure is a sudden impedance drop. In the case of a final on-chip implementation, a dedicated circuit would provide this information through an output bit. Previous experiments investigated the occurrence of short-circuits between metal lines through thermal cycling. These studies have proven that aluminum lines can be designed to undergo a short-circuit in a practical number of cycles (~10^5) [16]. The geometry of the sensing lines has been chosen accordingly. Reference [16] also showed the effect of aluminum line width on the number of cycles to short. This result allows us to tune the sensitivity of the structure to thermo-mechanical stress.
An experiment is performed to carry out the presented investigation. The main concept of the experiment consists of applying thermal cycling at different temperatures to a set of samples until a short occurs and the resulting impedance drop is detected. The degradation statuses of the power metallization of the samples after cycling are then compared. The degradation status is estimated with a scanning electron microscope (SEM).
Section 5 describes the equipment used for thermal cycling and microscopy analysis. Thermal cycling is performed in an airtight chamber, with forming gas flowing to prevent chip oxidation. The SEM is used for taking a picture of the power metallization of each sample before and after cycling (Figs. 4, 5, 6, 7, 8, 9). Variations of the thermal cycling parameters, such as peak and base temperatures and pulse duration and period, affect the degradation processes. A way to validate the proposed monitoring principle is to compare the effect of cycling parameter variations on ratcheting-induced short-circuit occurrence and power metal degradation. If the variation of a parameter has a modest effect on one of the two degradation processes and a dramatic effect on the other, then the proposed health monitoring principle is unreliable.
Cycling base and peak temperatures are known to significantly affect the fatigue life of materials (Ref. [19] provides a general overview of this topic). Accordingly, the experiment is focused on studying the effect of different peak temperatures on the power metallization degradation and on the number of cycles needed to cause a short. All of the samples are cycled at a base temperature of 80 °C, and two different peak temperatures have been chosen, namely 400 °C and 460 °C. The experiment is repeated on four samples for each of the two peak temperatures. A resistance measurement checks every 50 cycles whether a short-circuit has occurred. If so, thermal cycling is immediately interrupted. When all the samples have been cycled, the pictures taken after cycling are processed to quantify the power metallization damage. Section 6 explains how such quantification is conducted. If the power metallization layers of samples cycled with different peak temperatures show a significantly different degradation level, then the feasibility of the proposed monitoring concept is disproved.

Test structure and measurement setup
The test-chip used for this experiment is a simplified structure that emulates the thermo-mechanical behavior of a DMOS power switch without including any DMOS active area. As anticipated in Sect. 3, a resistive polysilicon layer heats the chip. Figure 1 shows the basic structure of the test-chip. The large black rectangle connected to the corresponding pad represents the perimeter of the power metallization. The blue rectangles represent the aluminum lines. The light-brown area represents the polysilicon resistor. Its contacts can be found at the top and bottom. The yellow squares represent the vias connecting the aluminum lines to the power metallization. The figure is not to scale: the real aluminum lines are longer relative to their width, and they are present in a larger number. In the technology considered here, aluminum lines may or may not have vias connecting them to the power metallization. The power metallization acts as a source terminal.
The majority of the aluminum lines of the DMOS active area of a switch are source interconnect wires; hence, they are connected to the power metallization through vias. A sensing line needs to be electrically isolated from other aluminum lines; thus, it must not have vias connecting it to the power metallization (Figs. 1, 2). The sensing structures are implemented on the test-chip. Accordingly, shorts are detected by resistance measurements between the power metallization and the terminal of the sensing lines. The test structure is provided with a thermal sensor. This sensor consists of an aluminum meander used for four-point resistance measurements. The thermal sensor allows us to set the desired base and peak temperature for the thermal cycling. Section 5 explains how this sensor is used. Figure 3 shows a schematic of the test-chip connected to the rest of the setup.

Experimental test protocol
Before setting the cycling parameters and starting the thermal cycling itself, a picture of the power metallization is taken via SEM. Then, thermal cycling is performed by applying power pulses to the polysilicon resistor through a voltage source and a switch driver (Fig. 3).
The voltage source V1 provides a constant voltage that keeps the test-chip at a base temperature of 80 °C. The switch driver opens the switch S1 and closes switch S2 to apply a pulse. Then, switch S2 is opened, and switch S1 is closed to end the pulse.
A low-noise, battery-supplied constant current source provides the force current for the thermal sensor's meander. A digital oscilloscope is utilized for the voltage measurement.
A software architecture based on Lua [20] scripts controls the switch driver and the measurement equipment. In the following, ∆T indicates the difference between the peak temperature read during a pulse through the meander and the base temperature read through the digital multimeter. The equipment automatically performs resistance measurements based on a user-defined schedule. This feature is used for reading the ∆T applied to a specimen during a pulse and detecting short-circuits between any sensing line and the power metallization. A transition of the impedance between the sensing line terminal and the power metallization from an out-of-range reading to a finite value in the range of some tens of ohms reveals that a short occurred (Fig. 3).
The power dissipated by the polysilicon resistor must be calibrated for every specimen to obtain the desired base temperature T0 and the desired initial ∆T, hereinafter ∆Ti.
As a first step, the sample is plugged into a dedicated PCB, and the latter is put into an airtight chamber, which has an external connector that allows the measurement setup to be connected to the PCB. The chamber has an internal volume of approximately 1.5 l. Once the chamber is closed, it is purged with a forming gas flow of approximately 5 l/min. During this process, the tap on the outlet is set to have an internal overpressure of at least 5 kPa. After approximately 30 s, the gas flow is gradually decreased, and the tap on the outlet is gradually closed until an overpressure between 6 and 14 kPa can be maintained with the minimum gas flow possible.
Once the forming gas atmosphere is set, the meander's resistance at room temperature must be measured to calibrate the temperature sensor.
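The calibration step above can be illustrated with a minimal sketch. It assumes a linear resistance-temperature model for the aluminum meander; the model, the function names, the temperature coefficient, and all numeric values are assumptions made for illustration and are not taken from the setup described here.

```python
def calibrate_meander(r_room_ohm, t_room_c, tcr_per_k):
    """Return a function that converts meander resistance (ohm) to temperature (deg C).

    Assumes the linear model R(T) = R_room * (1 + tcr * (T - T_room)).
    tcr_per_k is a hypothetical temperature coefficient of resistance.
    """
    def temperature_c(r_ohm):
        # Invert the linear model to obtain T from a four-point resistance reading
        return t_room_c + (r_ohm / r_room_ohm - 1.0) / tcr_per_k
    return temperature_c


# Hypothetical calibration values, for illustration only
to_temperature = calibrate_meander(r_room_ohm=100.0, t_room_c=25.0, tcr_per_k=0.004)
print(round(to_temperature(122.0), 2))
```

With such a function, both the base temperature and the peak temperature during a pulse could be derived from the same room-temperature reference measurement.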
Switch S2 is kept open and S1 closed to set T0. V1 is set to a reasonable value according to the ones used for similar poly-heaters during previous experiments. A digital multimeter performs a voltage measurement every 60 s at the sensing terminals of the meander until the read value stabilizes. V1 is then adjusted when necessary during heating. T0 is considered stable when the read value is between 1.5 °C and 2 °C below the desired value, and the temperature increase is below 0.05 °C/min.
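The stability criterion can be written as a short check. The following is a sketch only: the function name and argument names are hypothetical, readings are assumed to be already converted to temperatures, and the magnitude of the drift is used so that a falling reading is also treated as unstable, which is a slightly stricter reading of the criterion than the text states.

```python
def base_temperature_is_stable(readings_c, target_c, interval_min=1.0,
                               min_offset_c=1.5, max_offset_c=2.0,
                               max_rate_c_per_min=0.05):
    """Check the two conditions on the most recent base temperature readings.

    readings_c: temperatures in deg C, one per measurement interval (60 s in the text).
    """
    if len(readings_c) < 2:
        return False  # a rate cannot be computed from a single reading
    last, previous = readings_c[-1], readings_c[-2]
    # Condition 1: reading lies between 1.5 and 2 deg C below the desired value
    offset = target_c - last
    in_window = min_offset_c <= offset <= max_offset_c
    # Condition 2: drift between consecutive readings is below 0.05 deg C/min
    rate = abs(last - previous) / interval_min
    return in_window and rate < max_rate_c_per_min


print(base_temperature_is_stable([77.0, 78.2, 78.23], target_c=80.0))
```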
Pulse duration is first set to determine ∆Ti. Afterward, we must determine which V2 (Fig. 3) must be applied to the polysilicon resistor to obtain the desired ∆Ti. This step is performed by setting the source-meter to an initial V2 value, based on previous experience with poly-heaters. A first pulse is then applied to the test-chip, and a digital oscilloscope performs a four-point resistance measurement of the aluminum meander. According to the resulting ∆T, V2 is increased or decreased as necessary, and the procedure is repeated until V2 is within 3 mV of the value that corresponds to the desired ∆Ti.
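The text does not state how V2 is updated between pulses. One possible way to automate the adjustment is a bisection search, sketched below under the assumption that ∆T increases monotonically with V2 inside a known bracket. The function names, the bracket, and the quadratic stand-in for the hardware are all hypothetical.

```python
def find_pulse_voltage(measure_delta_t, target_delta_t, v_low, v_high, tol_v=0.003):
    """Bisection on V2 until the bracket is narrower than tol_v (3 mV by default).

    measure_delta_t(v) is assumed to apply one pulse at voltage v and return the
    measured temperature swing; it must be monotonically increasing in v.
    """
    if not (measure_delta_t(v_low) <= target_delta_t <= measure_delta_t(v_high)):
        raise ValueError("target temperature swing is not bracketed by v_low and v_high")
    while v_high - v_low > tol_v:
        v_mid = 0.5 * (v_low + v_high)
        if measure_delta_t(v_mid) < target_delta_t:
            v_low = v_mid   # swing too small: raise the lower bound
        else:
            v_high = v_mid  # swing large enough: lower the upper bound
    return 0.5 * (v_low + v_high)


def fake_chip(v):
    """Stand-in for the hardware: swing proportional to V^2 (hypothetical coefficient)."""
    return 0.5 * v ** 2


v2 = find_pulse_voltage(fake_chip, target_delta_t=320.0, v_low=10.0, v_high=40.0)
print(round(v2, 3))
```

Each call to the measurement function corresponds to one applied pulse, so a search of this kind needs only a handful of pulses before cycling starts.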
Once V2 is set, power cycling begins. Every 50 cycles, a resistance measurement between the aluminum line and the power metallization reveals whether a short has occurred. If so, thermal cycling is automatically stopped. A time limit of 60 h (corresponding to 216,000 cycles) is set for the experiment. Thereafter, the SEM is used again to evaluate the condition of the power metallization.
In case shorts occur on one or more samples before cracking of the power metallization, one of those samples is cycled further until cracks are visible. More pictures are taken by means of SEM during this additional cycling. This procedure is meant to estimate how early the health warning signal comes.
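The cycling loop with periodic short detection can be sketched as follows. This is not the Lua control software used in the setup; the function names, the simulated sample, and the 1 kΩ decision threshold are assumptions for illustration. The text only states that a short shows up as a finite value of some tens of ohms, and that 60 h corresponds to 216,000 cycles, which implies a period of 1 s per cycle.

```python
def run_cycling(apply_pulse, measure_resistance_ohm, max_cycles=216_000,
                check_every=50, short_threshold_ohm=1_000.0):
    """Apply pulses until a short is detected or the cycle limit is reached.

    Returns the cycle count at which the short was detected, or None.
    short_threshold_ohm is a hypothetical decision level between an
    out-of-range reading and a reading of some tens of ohms.
    """
    for cycle in range(1, max_cycles + 1):
        apply_pulse()
        if cycle % check_every == 0:
            if measure_resistance_ohm() < short_threshold_ohm:
                return cycle  # stop cycling as soon as the short is seen
    return None


class FakeSample:
    """Simulated sample whose sensing line shorts after a given number of pulses."""

    def __init__(self, short_at_cycle):
        self.cycles = 0
        self.short_at_cycle = short_at_cycle

    def apply_pulse(self):
        self.cycles += 1

    def measure_resistance_ohm(self):
        # Out-of-range reading modeled as infinity, short as 40 ohm
        return 40.0 if self.cycles >= self.short_at_cycle else float("inf")


sample = FakeSample(short_at_cycle=2230)
print(run_cycling(sample.apply_pulse, sample.measure_resistance_ohm))  # prints 2250
```

Because the check runs only every 50 cycles, the reported number of cycles to short overestimates the true value by up to 49 cycles, as the simulated example shows.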

Power metallization damage estimation
Fiji, an image processing package distribution of ImageJ [21], has been used to process SEM images.
The image processing concept is that voids and cracks appear as dark spots and lines on the power metallization. Consequently, a threshold on the gray values (on a scale from 0 to 255) that distinguishes between damaged and non-damaged metal needs to be applied, and a valid choice of this threshold is needed. Figure 4 shows the power metallization of a sample before cycling. Backscattered electrons are used for imaging, showing the material difference between the gold nail head, the copper power metal, and the surrounding imide cover. Figure 5 shows a sample cycled with a peak temperature of 460 °C. The imide cover is not known to undergo any significant degradation process and is darker than the power metallization. Using the imide area to define the threshold for image processing allows the threshold to be defined on a stable reference within the image itself. Consequently, the threshold is defined as the average gray level of a small rectangular area of the imide cover in the bottom right corner of the image (Fig. 5).
Once the threshold level is defined, a manual selection of the copper area is performed. Shadowed areas immediately close to the nail head and the imide edge are excluded from selection (Fig. 6).
Once the selection is saved, the threshold is applied. The selection is then restored (Fig. 7), and the "analyze particles" tool is used. The percentage of the selected area occupied by particles is used as a damage estimator, which can be utilized to compare the damage level of different samples after cycling. This image processing has also been applied to the images of all samples in pristine condition to check consistency, revealing no particles at all for any sample. Table 1 shows the number of cycles to short, the peak temperatures applied for each sample, and the image processing results. Figures 5, 6, and 7 show the power metallization of a sample cycled with a peak temperature of 460 °C. Among the samples cycled with such a peak temperature, this is the one that gave the highest damage estimation value. Nevertheless, no significant cracks are visible, and the damaged area is significantly smaller compared with those of the samples cycled with a peak temperature of 400 °C.
Fig. 5 Image of a sample (P1012) cycled with a peak temperature of 460 °C. The yellow arrow indicates the small rectangular selection used to define the threshold
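The damage estimator can be approximated with a few lines of code. The sketch below is not the Fiji workflow itself: it counts every selected pixel at or below the threshold, whereas the "analyze particles" tool groups pixels into particles and may apply additional filters. The function name, the data layout, and the tiny example image are assumptions for illustration.

```python
def damage_percent(image, reference_box, selection_mask):
    """Percentage of selected pixels that are at least as dark as the reference area.

    image: 2D list of gray values (0 to 255).
    reference_box: (row_start, row_end, col_start, col_end) inside the imide cover,
                   end indices exclusive.
    selection_mask: 2D list of booleans marking the manually selected copper area.
    """
    r0, r1, c0, c1 = reference_box
    reference = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    threshold = sum(reference) / len(reference)  # average imide gray level
    selected = dark = 0
    for row, mask_row in zip(image, selection_mask):
        for value, inside in zip(row, mask_row):
            if inside:
                selected += 1
                if value <= threshold:
                    dark += 1  # pixel counted as void or crack
    return 100.0 * dark / selected


# Hypothetical 4x4 image: copper = 200, imide = 60, one crack pixel = 30
image = [
    [200, 200, 200, 200],
    [200, 30, 200, 200],
    [200, 200, 200, 200],
    [200, 200, 60, 60],
]
mask = [[True] * 4, [True] * 4, [True] * 4, [False] * 4]
print(round(damage_percent(image, (3, 4, 2, 4), mask), 2))
```

Taking the threshold from inside each image, as described above, keeps the estimator comparable between images acquired with different brightness and contrast settings.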

Results
Shorts between aluminum lines were detected after a number of cycles ranging from 2200 to 2950 for all samples cycled with a peak temperature of 460 °C, thus showing a consistent behavior (boxplot in Fig. 11). Continued cycling up to 5300 cycles (with intermediate SEM imaging every 500 cycles) eventually provokes significant cracking in the power metal, as shown in Fig. 8.
According to the particle analysis in Fig. 8, the damaged area is 1.101%, a value compatible with those of the samples cycled with a peak temperature of 400 °C (Table 1). Figure 9 shows the image of the power metallization of a sample cycled with a peak temperature of 400 °C and a threshold applied. Cracks are apparent in this sample, as well as in the other ones cycled with the same parameters, and the area damage estimator is significantly higher compared with those of the samples cycled with a peak temperature of 460 °C. The box plot in Fig. 10 shows the damage estimation for the two groups; for an ideally working monitoring structure, it should show no significant differences between the two groups.
The number of cycles required for a short between aluminum lines varies across the samples cycled with a peak temperature of 400 °C: no shorts were detected at all for two samples, and 11,050 cycles were sufficient to cause a short in one sample (boxplot in Fig. 11).
... the gray area is discarded, and the percentage of the total area of the black pixels inside the remaining area is the damage estimator reported in Table 1
Fig. 8 Image of one sample (P1012) cycled with a peak temperature of 460 °C and applied threshold. Sample cycled further after impedance drop detection up to 5300 cycles. The same image processing routine as Fig. 7 was used

Discussion
Two different systems, namely, a copper film and aluminum lines encapsulated in oxide, follow two different fatigue models. Figure 12 shows the Wöhler plot of two hypothetical systems with mutually crossing cycle stress versus time to failure curves. We hypothesized that the relation between the behaviors of the copper film and the encapsulated aluminum system is comparable with the behaviors of systems 1 and 2, respectively (Fig. 12). In such a scenario, power metallization is always mildly damaged when a short between aluminum lines occurs at high peak temperatures. By contrast, a high scatter in the number of cycles to provoke a short in the sensing structure is observed for samples cycled at low peak temperatures. Meanwhile, power metallization is already severely damaged when this short occurs for all samples.
Apart from the aforementioned general differences in fatigue behavior between copper and encapsulated aluminum lines, a further explanation of the large spread of time-to-short in the sensing structure can be an interference effect: at reduced peak temperature, progressing copper power metallization fatigue results in a loss of stiffness of the system, which in turn slows down aluminum displacement. This presumable interaction between power copper degradation and the aluminum-oxide system degradation would make health monitoring at reduced thermo-mechanical loading impossible.
Fig. 9 Power metallization of a sample (P1020) cycled with a peak temperature of 400 °C and applied threshold. Impedance drop occurred at 31,850 cycles. The same image processing routine as Fig. 7 was used
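The crossing-curve hypothesis can be made concrete with a purely illustrative sketch. It assumes that both systems follow a power law N = C · ∆T^(-m) with the temperature swing as a stress proxy; the power-law form, the function names, and every parameter value are hypothetical and are not fitted to the data in Table 1.

```python
def cycles_to_failure(delta_t, c, m):
    """Power-law lifetime model N = c * delta_t**(-m) (assumed form)."""
    return c * delta_t ** (-m)


def crossover_delta_t(c1, m1, c2, m2):
    """Temperature swing at which both lifetime curves intersect.

    From c1 * x**(-m1) = c2 * x**(-m2) it follows that x**(m2 - m1) = c2 / c1.
    """
    return (c2 / c1) ** (1.0 / (m2 - m1))


# Hypothetical parameters: system 1 (copper film), system 2 (encapsulated aluminum)
C_CU, M_CU = 1.0e12, 3.0
C_AL, M_AL = 4.2875e19, 6.0

x_cross = crossover_delta_t(C_CU, M_CU, C_AL, M_AL)
print(round(x_cross, 1))
for delta_t in (320.0, 380.0):
    sensor_first = cycles_to_failure(delta_t, C_AL, M_AL) < cycles_to_failure(delta_t, C_CU, M_CU)
    print(delta_t, "sensor fails first" if sensor_first else "power metal fails first")
```

In this picture, the sensing structure gives an early warning only for loads above the crossover, which is qualitatively in line with the behavior reported for the two peak temperatures.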

Conclusion
The experimental results indicate that the extent to which the two degradation processes correlate depends on the cycling temperatures. During high peak temperature cycling, power metallization exhibits cracks only after a short-circuit between aluminum lines has occurred, while it remains acceptably intact until that moment. However, short-circuits may or may not occur under low peak temperatures, and health monitoring becomes impossible. This finding identifies a range of cycling parameters for which the proposed monitoring structure is reliable. Further investigations may be conducted to determine whether it is possible, by adjusting sensing line features, to extend the temperature range in which the proposed structures respond consistently or to shift it to the application-relevant interval. This step would allow us to perform health monitoring over a wider range of stress conditions.