Introduction

Nondestructive inspection and analysis techniques are an essential link in the quality assessment and failure analysis chain. Because they operate nondestructively, inspected parts and components remain unaltered, which allows features and conditions to be screened and monitored. Such monitoring capabilities are highly valuable when investigating the formation and propagation of anomalies that may develop into defects, and they support the exploration of how the involved materials and structures interact under common and exceptional conditions. With the continuously increasing level of assistance and automation across the industrial, automotive and consumer sectors, high reliability expectations meet increasingly complex power- and microelectronic components and systems. To ensure the stated performance and the required safety of electronic products, defect affinity, evolution and propagation need to be investigated and understood. Besides repetitive analyses for research and exploration purposes, broad-range inspection abilities with accurate recognition performance are key to quality assessment and monitoring and are indispensable for high-reliability and premium products. Against this background, the high complexity of electronic components, the potential requirement for high-throughput inspection, the aim of nondestructive operation, and the need for high-accuracy detection coincide, resulting in large amounts of data that need to be analyzed at an elevated performance level.

Because it operates nondestructively, microscopic acoustic imaging is a widely used technique for defect inspection in quality assessment, technology development and failure analysis. Depending on the specimen’s material properties, it allows for lateral resolutions from hundreds of microns down to the single-digit micron range. Its contrast mechanism is based on the interaction of an incident pulsed acoustic wave with the sample’s internal materials and their acoustic properties. Because feature sizes continuously decrease while the variety of materials in such devices increases, the incident signal is reflected and scattered, resulting in echo signals that commonly contain trains of individual but distorted pulses. Interpretation of such signals, which are received by the acoustic transducer, becomes increasingly difficult and requires profound expertise of the machine operator. Previous studies have positively evaluated the potential of machine learning for an operator-independent analysis of acoustic echo signals, with the purpose of increasing inspection reliability and the level of automation for defect detection and localization in quality assessment and failure analysis [1,2,3,4,5]. Some of these studies employed 1D convolutional neural network architectures for analyzing the shape of the acoustic signals in the time domain [1, 2]. The present work continues those studies and extends the analysis from the pure time domain to the spectral and wavelet domains and the model architectures from 1D to 2D. Even though the information content is the same as in the time signals, representation in the spectral and wavelet domains may provide additional degrees of freedom that have the potential to contribute to increased performance and robustness of the analyses.
In [2] it was reported that signal distortion caused by even slight defocusing of the acoustic transducer strongly decreased the classification accuracy and thus compromised the defect detection ability when analyzing flip chip devices. The present study also addressed this shortcoming and investigated the performance of 1D and 2D CNN architectures when signals recorded at varying defocus were included in the training data. Furthermore, the concepts developed for acoustic microscopy-based analyses were extended and applied to the transient thermal responses to electrical excitation recorded in lock-in thermography, for quantitatively and qualitatively estimating the depth of buried thermally active electrical defects.

Materials and Methods

Analysis of Acoustic Microscopy Data

Scanning acoustic microscopy was conducted on three sample sets, and echo signals were acquired in the time domain for further processing and for evaluating ML-based analyses for operator-independent, automated signal classification for defect detection.

Description of Samples and Equipment

Acoustic data were acquired from two sets of nonmolded flip chip samples and a power device (thin-die setup). Samples in “Set-I” contained flip chips with a Si-die of 6 × 7 mm edge length and a thickness of 350 µm. Flip chip bumps of this sample type were 50 µm in lateral diameter with a pitch of 135 µm. The samples of “Set-II” were flip chip devices with a Si-die of 7 × 7 mm edge length and a thickness of 85 µm. The flip chip bumps of these samples were 80 and 120 µm in lateral diameter at varying pitch. Both sample sets were purposely taken out of the manufacturing process prior to applying underfill material between the Si-die and the PCB-substrate. The samples of “Set-I” and “Set-II” were severely stressed to induce accelerated aging and subsequent defect formation due to thermo-mechanical forces. To induce defects in the flip chip contacts, samples of “Set-II” were placed alternately in boiling water and liquid nitrogen to generate sufficient thermo-mechanical stress to initiate bump fractures. Samples of “Set-I” were extracted from reliability testing, where they underwent multiple flip chip attach reflow cycles in which samples are exposed to temperature gradients between room temperature and 240 °C. Devices in the sample “Set-III” were power devices that contained a thin Si-die glued onto a metal backplate. The die-attach contained delaminated areas, which resulted in decreased electrical and thermal conductivity. Due to the thin Si-thicknesses, a 2.4 mm 200 MHz acoustic transducer was employed for data acquisition for the sample Sets “I” and “III”, while a 3.2 mm focal length 200 MHz transducer was used for inspection and signal acquisition of sample “Set-II”. Acoustic excitation was performed using a commercial pulser/receiver unit (DPR 500, JSR Ultrasonics, Pittsford, NY, USA).
Received signals were digitized at a rate of 1 GS/s and a resolution of 8 bit using an M3i.21xx-x8 ADC (Spectrum Instrumentation GmbH, Grosshansdorf, Germany). Scanning was performed using a commercial 3-axis scanner to which the transducer was mounted.

Signal Pre-processing and ML-Based Analysis

The spectral domain represents the frequency-related amplitudes and their distribution across the frequency band of the analyzed signal. The spectrum of a time series of measurements can be computed through the Fourier transformation and is complex valued. Even though the information contained in the spectrum is the same as in the time-domain representation, the spectral characteristics may allow for a more distinct interpretation, since they represent the signal in terms of periodicities, and thus may allow for a more accurate class assignment. The left column of Fig. 1 contains the acoustic time-domain signals received from three distinct objects of a flip chip device of a sample of “Set-I”. These objects are, top: a fractured bump, vertical center: an intact bump, and bottom: a location with an underfill delamination. The center column of Fig. 1 contains the spectral representation of the signals shown in the left column. To have the spectra retain temporal information, the real part and the imaginary part of the complex-valued spectra were summed. This signal processing is referred to as “real + imag” in the remainder of this paper. A more intuitive way of representing signals in the spectral domain would be in terms of the power distribution as a function of frequency. The power would be computed from the absolute values of the complex-valued spectrum, which would erase all temporal information. This was considered non-beneficial for the current purpose of class assignment of the acoustic signals, so the “real + imag” computation described above was chosen instead.
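The difference between the “real + imag” spectrum and a magnitude spectrum can be illustrated with a minimal sketch. It assumes a naive discrete Fourier transform and a hypothetical two-sample toy pulse (`toy_echo`); neither the function names nor the signal are taken from the study, and the code is not the authors' implementation. It only shows that shifting a pulse in time leaves the magnitude spectrum unchanged while the “real + imag” spectrum changes.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform; returns a list of complex bins."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def real_plus_imag(spectrum):
    """Sum of real and imaginary part per bin (the "real + imag" processing)."""
    return [c.real + c.imag for c in spectrum]


def magnitude(spectrum):
    """Absolute value per bin; this discards the phase."""
    return [abs(c) for c in spectrum]


def toy_echo(n, position):
    """Hypothetical echo: a short bipolar pulse starting at sample `position`."""
    signal = [0.0] * n
    signal[position] = 1.0
    signal[position + 1] = -0.5
    return signal


# Same pulse shape, two different arrival times.
early = dft(toy_echo(32, 4))
late = dft(toy_echo(32, 12))

# Largest per-bin difference between the two spectra in each representation.
diff_magnitude = max(
    abs(a - b) for a, b in zip(magnitude(early), magnitude(late))
)
diff_real_imag = max(
    abs(a - b) for a, b in zip(real_plus_imag(early), real_plus_imag(late))
)

print(f"magnitude difference:   {diff_magnitude:.2e}")
print(f"real + imag difference: {diff_real_imag:.2e}")
```

For real-valued signals the sum of real and imaginary parts equals the discrete Hartley transform with a reversed frequency index, which is one way to see why this representation does not lose the arrival-time information that the magnitude spectrum discards.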

Fig. 1

Preparation of training data for investigating the influence of the data representation domain onto the achievable accuracy and performance of 1D- and 2D model architectures. LEFT COLUMN: Acoustic echo signals obtained by SAM from: TOP: a defective flip chip bump, VERTICAL CENTER: an intact flip chip bump and BOTTOM: an underfill delamination. Graphs in the HORIZONTALLY CENTRAL COLUMN show the linear spectra (Real + Imaginary part of the FFT) of the signals in the LEFT COLUMN. 2D-maps in RIGHT COLUMN show the wavelet coefficients obtained from the signals in the LEFT COLUMN through continuous wavelet transformation

Two different types of 1D architectures were trained to classify the acoustic signals into four classes. These classes were: “defective Bumps”, “intact Bumps”, “delamination of underfill” and “background/undefined”. It has previously been observed that the number of misclassifications can be reduced substantially by providing a separate class for signals that do not belong to any of the other classes, and the present work therefore adopted this approach. For the 1D architectures the input layer was of dimension one. The first type of architecture employed an image input layer of dimension [1, #of_samplepoints_in_signal, 1]. However, this input layer did not allow complex-valued input signals. Therefore, the “real + imag” processing of the spectrum was applied to retain the temporal information of the complex-valued data, as elaborated above. The second type of 1D architecture contained a sequential input layer, which allowed the use of complex-valued data. Here the complex spectra obtained from Fourier transformation were employed as training data. Both 1D networks contained three convolutional blocks followed by a global-max-pooling layer and two fully connected layers, with a classification layer as the output. In the following, analyses based on the 1D architecture with the image input layer are referred to as “FFT (real + imag)”, while the analyses with the 1D network containing the sequential input layer are referred to as “FFT (complex)”.
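The role of the global-max-pooling layer after the convolutional blocks can be sketched in a few lines. The following is a toy forward pass with a single hand-picked filter, not the trained networks of this study; the kernel and the two example echoes are hypothetical. It shows that convolution followed by global max pooling yields one value per filter, independent of the signal length and of where the matching pattern occurs.

```python
def conv1d_valid(signal, kernel):
    """Cross-correlate signal with kernel without padding, as CNN layers do."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Collapse a feature map of any length to a single number."""
    return max(values)


def feature(signal, kernel):
    """One convolution, ReLU and global max pooling for a single filter."""
    return global_max_pool(relu(conv1d_valid(signal, kernel)))


# Hypothetical filter matched to a bipolar pulse, and two toy echoes that
# contain the same pulse at different positions.
kernel = [1.0, -1.0]
echo_early = [0, 0, 1, -1, 0, 0, 0, 0, 0, 0]
echo_late = [0, 0, 0, 0, 0, 0, 1, -1, 0, 0]

feature_early = feature(echo_early, kernel)
feature_late = feature(echo_late, kernel)
print(feature_early, feature_late)
```

In a full network many learned filters are applied in each block, so the pooled feature vector has one entry per filter of the last convolutional block before it is passed to the fully connected layers.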

A more explicit, however redundant signal representation can be obtained in the time-frequency domain. Here signal spectra are plotted versus the time axis and thus the temporal distribution of the spectral content is exhibited explicitly. In the present study signals underwent a continuous wavelet transformation (CWT) using the Morse wavelet prior to ML-processing. For each signal this transformation provides a two-dimensional map of the wavelet coefficients versus time and frequency, as shown in the right column in Fig. 1. In the CWT-maps the variation of the spectral shape for a signal is contained as a function of time. The shape of an acoustic signal directly corresponds to the interaction of the acoustic wave with the sample. Therefore, with the CWT minor deviations in the shape of a time signal are presented time-resolved in the more explicit representation of the spectral domain, which results in a two-dimensional representation in the time-frequency domain. The maps in Fig. 1 illustrate the distinct differences between the shapes of the wavelet coefficients of the signals recorded from the three different objects.
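How a time signal becomes a two-dimensional time-frequency map can be illustrated with a small, unoptimized sketch. Note the assumptions: it uses a complex Morlet wavelet instead of the Morse wavelet employed in this study, a direct summation instead of an FFT-based implementation, and a hypothetical tone burst as input. The names `morlet`, `cwt_magnitude` and all parameter values are illustrative only.

```python
import cmath
import math


def morlet(t, scale, omega0=6.0):
    """Complex Morlet wavelet, dilated by `scale` and sampled at time t."""
    u = t / scale
    return cmath.exp(1j * omega0 * u) * math.exp(-0.5 * u * u) / math.sqrt(scale)


def cwt_magnitude(signal, scales, omega0=6.0):
    """Return |wavelet coefficients| as rows (one per scale) over time."""
    n = len(signal)
    rows = []
    for scale in scales:
        half = int(4 * scale)  # truncate the wavelet at +/- 4 standard deviations
        row = []
        for b in range(n):
            acc = 0j
            for k in range(max(0, b - half), min(n, b + half + 1)):
                acc += signal[k] * morlet(k - b, scale, omega0).conjugate()
            row.append(abs(acc))
        rows.append(row)
    return rows


# Hypothetical test signal: a tone burst with a period of 16 samples that is
# present only between samples 32 and 95.
n, period, omega0 = 128, 16, 6.0
signal = [
    math.sin(2 * math.pi * (k - 32) / period) if 32 <= k < 96 else 0.0
    for k in range(n)
]

# Scale whose wavelet oscillates with the burst period, plus three others.
matched = omega0 * period / (2 * math.pi)
scales = [matched / 4, matched / 2, matched, 2 * matched]

cwt_map = cwt_magnitude(signal, scales, omega0)

center = 64
strongest_scale_index = max(range(len(scales)), key=lambda i: cwt_map[i][center])
print("strongest scale index at burst center:", strongest_scale_index)
```

Each row of `cwt_map` corresponds to one scale (one frequency band) and each column to one time sample, which is the kind of matrix a 2D input layer can process like an image.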

For processing such two-dimensional matrices, a machine learning architecture requires a two-dimensional input layer. The model employed here was a modified version of the GoogleNet model [6]. This pretrained model was designed for recognizing structures and shapes in two-dimensional images and is therefore expected to be well suited for characterizing and classifying the wavelet matrices obtained from the acoustic signals through CWT. The dropout layer “pool5-drop_7×7_s1” of the GoogleNet model was replaced by a dropout layer with a dropout probability of 0.6. In addition, the classification layer “loss3-classifier” and the output layers were modified according to the number of classes of the individual training data sets.

Figure 2 summarizes the processing and analysis flow of the acoustics-related part of this study. SAM data acquired from the three sample sets were transformed to the spectral and wavelet domains, respectively, and then used separately for training the three ML models shown in Fig. 2. For benchmarking and performance comparison, the 1D (real + imag) network was also trained and evaluated using unprocessed time-domain signals. The performance of the individual ML-based analysis approaches and models was assessed through the classification results obtained by applying labeled test data to the trained models.

Fig. 2

Concept of the machine learning based acoustic analysis. For three different case studies acoustic data were acquired and stored. Unprocessed rf-signals were then transformed into the frequency and time-frequency domains, respectively. The transformed data were subsequently used for training different machine learning models in 1D and 2D

Lock-In Thermography Data Acquisition and ML-Analysis

Lock-in thermography is a highly sensitive technique for the detection and localization of weak thermal sources that correspond to electrically conductive but resistive defects. Its sensitivity reaches down to several µW of dissipated power. While highly magnifying infrared optics enable a high lateral resolution down to below 10 µm, a precise localization in the axial dimension can still be difficult to obtain. In novel microelectronic systems, three-dimensional interconnect structures and multi-chip assemblies make precise defect localization for further physical failure analysis highly challenging. The present study investigates the applicability of machine learning based analyses of thermal signals for estimating the depth of a buried thermally active defect beneath the surface. Two approaches of machine learning based analysis were pursued. The first performs classification-based analyses to assign the thermal source to an axial region (e.g., die-interfaces) at increasing depths. The second employs a regression-based analysis to quantitatively derive the depth of the thermal source underneath the sample surface. As shown in Fig. 3, the same models were employed for analyzing the time-resolved thermal responses as for the acoustic analyses. For deriving quantitative depth estimates, the terminal layer in all three models was replaced by a regression layer.

Fig. 3

Concept of the machine learning based analysis of TRTR-signals acquired by LIT for depth localization of thermal sources

Sample Description and Used Equipment

The setup employed for the experimental work was an in-house modified lock-in thermography system of the type ELITE (ThermoFisher Scientific, Fremont, CA, USA). The modifications enabled the acquisition of the transient thermal signals for post-processing and machine learning based analyses. The investigated samples were bulk monocrystalline Si of increasing thickness, which provided thermal transients of increasing propagation length and thus path-length-related variations in the thermal signal that mimic stacked-die assemblies. It should be noted that this sample type was chosen to investigate the fundamental applicability of machine learning based signal analysis for defect-depth localization and that it comes with limitations. The samples’ backsides were coated with Pt, and thermal sources were created through needle probing and the application of electrical currents of differing strengths and frequencies.

Data Pre-processing

Because Si is transparent in the infrared range, artifacts were observed in the experimental data that result from the interference of the conducted and radiated thermal components. To compensate for these artifacts, the acquired transient thermal signals were decomposed into their source signals by independent component analysis (ICA), as previously described by Kögel [7], and then recombined without the radiation-related component in order to suppress the radiated part in the resulting signal. All machine learning based analysis approaches were performed on both the raw and the ICA-preprocessed versions of the thermal signals, resulting in a total of 12 training and test analyses, as shown in Fig. 3. Subsequently, the pre-processing steps described above, fast Fourier transformation and continuous wavelet transformation, respectively, were performed on the signals.
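The count of 12 analyses can be reproduced by enumerating the combinations. The factorization below (three models, two signal versions, two output types) is an assumption that is consistent with the description in the text; the labels are illustrative and not taken from Fig. 3.

```python
from itertools import product

# Assumed factors: three model types, raw versus ICA-preprocessed signals,
# and classification versus regression output.
models = [
    "1D image input, FFT (real + imag)",
    "1D sequential input, FFT (complex)",
    "2D modified GoogleNet, CWT",
]
signal_versions = ["raw", "ICA-preprocessed"]
tasks = ["classification", "regression"]

runs = list(product(models, signal_versions, tasks))

for index, (model, version, task) in enumerate(runs, start=1):
    print(f"{index:2d}. {model} | {version} | {task}")
```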

Employed Software and Toolboxes

All analyses, processing and computations were conducted in MATLAB (The MathWorks, Inc., Natick, MA, USA). Custom software was developed in house to enable convenient and intuitive management of the acoustic and thermographic data, signal extraction and preprocessing. Besides the signal and image processing toolboxes, the license included MATLAB’s deep learning toolbox, which formed the foundation for the development and evaluation of the machine learning based analysis approaches shown here.

Results and Discussion

The purpose of the work underlying the present paper is the development of machine learning based solutions and the evaluation of their potential for the analysis of transient measurement data with respect to their application for interpretation and classification in the context of nondestructive defect identification and localization in complex microelectronic packages. Furthermore, the study aims at investigating the analysis performances of different model architectures for the analysis of signals obtained from differing samples and inspection methods. The methods addressed here are acoustic microscopy and lock-in thermography. The following section presents and discusses the results obtained with the materials and methods described above.

Acoustic Microscopy Analysis

The investigations related to acoustic analysis included the inspection of three different sample sets (2× different types of flip chips, 1× thin-die power device), three different model architectures (2× 1-D, 1× 2-D), three (four) different methods of pre-processing (none, 2× FFT, 1× CWT) plus the investigation of the robustness behavior upon defocus related signal distortion and the effect of inclusion of corresponding signals in the training sequence.

Performance Comparison of Model Architectures and Pre-processing

As shown above, sequences of measurement data can be represented in different domains without changing the information content. Figure 4 contains the test accuracies of all model architectures employed here for classifying acoustic signal data. Values obtained from the data of the individual sample sets are grouped. Test accuracies obtained from the models trained on the unprocessed time-domain signals are represented by the blue bars in all three groups. Orange bars indicate the test accuracies obtained when signals were preprocessed using the fast Fourier transform (FFT) and the real and imaginary parts were added. The yellow bars represent the test accuracies obtained when the model was fed the complex-valued spectra from the FFT. In these three cases the models had the same complexity and differed only in the type of input layer. For the complex-valued spectra, the model’s input layer was a sequential input layer, while for the two others the model contained an image input layer. In both cases the input layer received the data as one-dimensional vectors. The fourth bar in each group, colored purple, shows the test accuracy obtained with the data of each sample type from the modified GoogleNet architecture. Here, a continuous wavelet transform was applied to each signal prior to training and analysis, which resulted in a two-dimensional representation of the wavelet coefficients over time and frequency. The GoogleNet model, developed and pretrained for shape recognition in 2D images, contained a 2D input layer that allowed handling the maps of wavelet coefficients. Figure 4 shows that the test accuracy was lowest when the signals were processed in the time domain.
For all three sample sets, the models that learned and classified the signals in the frequency domain showed almost equal performance regardless of the input layer type (image versus sequential), and slightly better performance than the processing in the time domain. In all cases, the highest test accuracies were observed when using the GoogleNet model and performing training and analysis in 2D. It should be noted that for the samples of “Set-II” (“FC type II”) all four models performed similarly well, with test accuracies above 99%. It was observed, but is not shown here, that the computational effort was substantially higher for the 2D analysis, which can be explained by the higher complexity of the GoogleNet model. Therefore, where computational performance is a concern, it is recommended to perform the classification after fast Fourier transformation of the signals, employing a model with a 1D input dimension.

Fig. 4

Comparison of classification performances of 1D- and 2D CNN architectures for the two different types of flip chip samples and the power device analyzed here. Left: Test accuracies obtained with sample “Set-I”. Center: Test accuracies obtained with sample “Set-II”. Right: Test accuracies obtained with sample “Set-III”. For all sample sets, models trained with preprocessed data (spectral and wavelet domain) showed test accuracies higher than 90%. Values above bars are in [%]. Analyses of unprocessed data showed the lowest performance values

Figures 5 and 6 show examples of the signal classification results obtained with the trained 2D GoogleNet model for a flip chip sample of “Set-I” and a power device of “Set-III”. In both images the map of the classes assigned with the highest probability is superimposed onto the acoustic microscopy image computed from the signal energy. In Fig. 5 red pixels indicate where the model assigned the signal to the class “Defective Bump”. Green corresponds to an assignment to the class “Intact Bump”. As described in the methods section, the models were trained on four classes. This approach followed a previous observation [2] that the introduction of dedicated classes for separate structures substantially decreases the number of misclassifications. To keep the map intuitive, the classes “No Bump” and “Background” are blanked in Fig. 5. Figure 6 contains the classification result for the detection of delamination defects in a power device of sample “Set-III”. The graph on the left shows the acoustic image; the center graph shows the classification map, with green pixels indicating an intact interface and red pixels a delaminated interface region. The graph on the right in Fig. 6 contains the certainty map of the classifications in the center graph. High certainty values are observed in more homogeneous regions, and fluctuations occur in areas that are rather nonhomogeneous. This is likely explained by signals at edges between delaminated and adhering regions exhibiting varying signal shapes. Such signals are not represented in the training data as frequently as signals from the central regions of the individual classes and are thus not used in the training to the same extent. A common procedure to handle this issue is to blank results that correspond to classifications with a certainty value below a defined reliability threshold.
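The blanking of low-certainty classifications can be sketched as follows. The class names follow the four classes used for the flip chip samples, while the function name, the example probabilities and the threshold of 0.85 are illustrative assumptions rather than the procedure implemented in this study.

```python
CLASS_NAMES = ["defective bump", "intact bump", "delamination", "background"]


def classify_with_threshold(probabilities, class_names, threshold):
    """Return (label, certainty); label is None when certainty is too low."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    certainty = probabilities[best]
    if certainty < threshold:
        return None, certainty  # blanked: not shown in the classification map
    return class_names[best], certainty


# Hypothetical per-pixel class probabilities as produced by a softmax output.
pixel_probabilities = [
    [0.97, 0.01, 0.01, 0.01],  # clear defective bump
    [0.40, 0.45, 0.10, 0.05],  # ambiguous signal, e.g. at an edge
    [0.05, 0.90, 0.03, 0.02],  # clear intact bump
]

results = [
    classify_with_threshold(p, CLASS_NAMES, threshold=0.85)
    for p in pixel_probabilities
]

for label, certainty in results:
    print(label if label is not None else "(blanked)", f"{certainty:.2f}")
```

Raising the threshold trades coverage for reliability: more edge pixels are blanked, but the remaining assignments are those the model is most certain about.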

Fig. 5

ML-based automated detection of bump fractures employing the 2D-CNN approach using the modified GoogleNet model. Example was taken from sample “Set-I”. Classification was performed in four classes: “Defective Bump”, “Intact Bump”, “no Bump” and “undefined”. Red indication corresponds to defective bumps and green to intact bumps, while the other two classes were blanked (Color figure online)

Fig. 6

Results of the ML-based acoustic analysis for the detection and classification of delamination defects in a power device from sample Set-III. LEFT: Acoustic image of the device. CENTER: Classification map with green indicating adhesion and red indicating delamination. RIGHT: Certainty map of the classifications in the CENTER graph (Color figure online)

Defect Detection Performance Upon Defocus Related Signal Distortion

A reduced classification certainty and an increased number of misclassifications have previously been observed when the transducer-sample distance varied between individual data acquisitions [2]. Since multiple operators conduct the data analysis in a production or testing environment, an increased likelihood of such unintended deviations is expected. For this reason, it was investigated whether the classification robustness against defocus-induced signal distortion can be increased when defocused data are included in the training procedure. Figure 7 contains the results of these investigations for data recorded from a flip chip sample of “Set-II”. Data were acquired at six different transducer-sample spacings, labeled, and included in the training of all three models. The defocus varied between +290 and −390 µm, where negative values correspond to the transducer moving closer to the sample surface. Since the 1D models showed nearly identical test accuracies, only the results of the 1D real + imag model and the 2D modified GoogleNet model are provided to keep the figure intuitive. The graphs in the left column of Fig. 7 show the acoustic micrographs of the same region of a sample of “Set-II”, recorded with different defocus. The illustration is limited to the outermost defocus positions to show the classification behavior at the most extreme signal distortion. It can be seen in the left column that the appearance and sharpness of the bumps and the under-die structures in the acoustic micrographs vary with the defocus. The second column from the left in Fig. 7 contains the classification results obtained by the 1D model employing the real plus imaginary part of the Fourier transform of the signal, superimposed onto the acoustic micrograph. The third column from the left contains the results of the 2D modified GoogleNet model in the same representation.
In these graphs green indicates intact bumps, red corresponds to defective bumps and magenta shows underfill delaminations (trapped air underneath the die). As in Fig. 5, the additional classes are blanked to keep the images clear and intuitive. The color maps in the fourth column from the left contain the certainty values obtained for each pixel using the 2D GoogleNet model. Red indicates a high certainty, while yellow–green–blue correspond to decreasing values in that order. Certainty values below 85% are blanked since they are not used for indication here. It is noticeable that the defective bumps are correctly classified regardless of the defocus with which the signals were recorded. The delaminated spot is also clearly detected by both models at all defocus positions. However, the 2D model seems to perform less accurately under defocus, as “intact Bump” assignments are made outside the bump structures. It is also noticeable that the rim of the defective bumps (bottom-left in the images) is classified as intact, while these bumps are clearly defective. It seems that the 2D modified GoogleNet model shows a tendency toward misclassification into the class “intact Bump”. The rightmost column in Fig. 7 illustrates the pulse distortion that occurs upon defocusing the acoustic lens. The signals were recorded from a defective bump, and major alterations in the signal shape can be recognized with changing defocus, which challenges the robustness of the classification. The results in Fig. 7 clearly show an increase in the classification robustness against defocus-related distortions when the training data contain signals that exhibit such variations in signal shape.

Fig. 7

Defect detection employing distorted signals. The 1D ML based analysis (with FFT signal preprocessing) was investigated. Signals have been acquired with increasing defocus starting at 500 µm above the focus until 600 µm beyond the focus. LEFT: Map of the signal energy; second COLUMN FROM LEFT: Classification map obtained using the 1D (real + imag) processing superimposed on energy image—green: intact bump, red: defective bump, magenta: delamination, background classification has been blanked. Third COLUMN FROM LEFT: Classification map obtained using the 2D modified GoogleNet Model and the CWT-processing superimposed on energy image—green: intact bump, red: defective bump, magenta: delamination, background classification has been blanked. Second COLUMN FROM RIGHT: classification certainty maps of the 1D model results—all bumps have been classified with certainty values near 100%. Right: Signals of a defective bump illustrating signal distortion upon defocusing. TOP-RIGHT: 290 µm above interface in focus, RIGHT-VERTICAL CENTER: Interface in focus; BOTTOM-RIGHT: 390 µm closer than interface in focus (Color figure online)

Lock-In Thermography

Lock-in thermography is a highly sensitive technique for the detection and localization of even minor electrical dysfunction when accompanied by thermal emissions. As three-dimensional architectures become increasingly relevant in microelectronics, defect localization also needs to extend to the third dimension. Previous approaches employed the temporal delay between electrical excitation and the reception of the thermal emissions, in terms of phase, to pinpoint the defect as the thermal source in depth [8, 9]. While this approach is generally valid, its practical application requires complex calibration and knowledge of the sample’s structure in order to obtain a precise depth estimate. The present research investigated the general applicability of machine learning based approaches to pinpoint defect-related sources of thermal emission in the depth dimension inside a specific sample. The three ML models (2× 1D, 1× 2D) described above were evaluated for this purpose. As in the acoustic data analysis, time-resolved thermal responses were preprocessed by fast Fourier transformation and applied to the 1D model architectures, and continuous wavelet transformed into the time-frequency domain for the 2D modified GoogleNet model, respectively. In extension of the previously described ML-based analyses, the models were additionally varied by implementing a regression output layer (instead of a classification output layer) for directly deriving quantitative depth estimates. Figure 8 contains the results of the evaluation of both the classification-based and the regression-based analyses of all three models. The graph at the top shows the test accuracies for the classification. Here, the signals had to be assigned to one of six depth classes. It can be noted that the 1D-based analyses exhibited a considerably lower accuracy than the 2D-based model.
Since Si is largely transparent in the infrared band, it is expected that thermal radiation emitted at the thermal source propagates directly through the sample and interferes with the conducted component that radiates from the sample surface. To investigate whether this interference causes the low accuracy values, the raw TRTR signals were decomposed using independent component analysis (ICA) [7] to remove the signal components related to the direct radiation. The results of the classification of the ICA-processed signals are shown on the right in the top graph of Fig. 8. In the classification case the results obtained with the 1D models appear rather inconsistent; however, classification using the 2D model, with an accuracy close to 90%, seems promising. The bottom graph in Fig. 8 shows the root-mean-squared error (RMSE) of the regression analysis in µm. Here a lower RMSE corresponds to a higher accuracy. While the 1D models achieve an RMSE of 16–17 µm, the 2D model achieves 5–6 µm. For failure localization in a stacked device the results of all models should be acceptable, since the errors lie within the thickness of a common Si-die. Figure 9 shows in detail the results of applying a test data set to the regression models. The upper two graphs contain the predictions made by the 1D models, while the bottom diagram shows the predictions of the 2D model. The dashed red lines indicate equality between label and prediction and would thus correspond to 0 µm RMSE or 100% accuracy. These graphs reveal the trend of the false predictions. The 1D models underestimate the actual depths in a nonlinear manner. The 2D modified GoogleNet model, however, exhibits an almost linear relation between label and prediction, although with a certain variance.
These graphs suggest that a prediction using the 2D model may be sufficiently reliable for a quantitative depth estimation of the thermal source in the mono-Si bulk sample. However, further research is necessary to transfer these findings into a practical application.
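For reference, the root-mean-squared error used to assess the regression models can be computed as in the following sketch. The depth values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
import math


def rmse(labels, predictions):
    """Root-mean-squared error between labeled and predicted depths."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical depths in µm: errors are 3, -4, 0 and 5 µm.
label_depths = [100.0, 200.0, 300.0, 400.0]
predicted_depths = [103.0, 196.0, 300.0, 405.0]

error = rmse(label_depths, predicted_depths)
print(f"RMSE = {error:.2f} µm")  # sqrt((9 + 16 + 0 + 25) / 4) = sqrt(12.5)
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones, so it should be read together with the scatter of predictions around the line of equality.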

Fig. 8

Results of the Model Tests on labeled data, which the model has not encountered during training. Data have been left raw and preprocessed to remove IR-radiation related signal components. TOP: Results of assigning signals into one of six depth classes. BOTTOM: Results of the regression analysis for quantitatively estimating the depth

Fig. 9

Results of the predictions obtained from the three trained regression models employing test data. TOP: Test results of the model with the 1D image input layer. CENTER: Test results of the model containing the sequential input layer. BOTTOM: Test results obtained with the 2D modified GoogleNet model. The dashed line indicates equality between input and output and would correspond to 100% accuracy

Summary and Conclusions

The paper describes the development, application and evaluation of machine learning based signal analysis in 1D and 2D for nondestructive defect detection, identification and localization in failure analysis of microelectronic components. The methods addressed are acoustic microscopy and lock-in thermography. The potential of the approach is demonstrated by four case studies, including automated bump inspection in flip chip devices and delamination recognition in a power device containing a thin Si-die by high-resolution acoustic analysis. Furthermore, the quantitative localization of the depth of a thermally active defect using lock-in thermography was investigated through both classification and regression. While the potential for the acoustics-related analysis is quite evident, a reliable application of machine learning in lock-in thermography, although promising, still requires further research.