Introduction

Microscopy and image analysis has been an important tool for investigating defects in materials and their microstructures. Manually detecting, e.g., cracks, grains or dislocations in photographs and digital images requires expertise, experience, and—depending on the size of the image or the number of features of interest—also a significant amount of time. Recently, high-throughput data analysis, which is concerned with handling huge numbers of specimens (e.g., even exceeding thousands of specimens) or with very large regions of interest (e.g., consisting of several thousands of images) is becoming more prominent in the era of data. Example of such approaches can be found in the work of Xiang et al.[1] where combinatorial high-throughput methods were introduced to discover new materials. Another example is the work by Castelli et al.[2] who analyzed a large space of 5400 different materials to obtain 15 promising candidates for developing new photoelectrochemical cells with improved light absorption.

In the area of wide-bandgap semiconductor material, silicon carbide (SiC) is the leading candidate with a high mechanical, chemical, and thermal stability. It has been shown to be highly suitable for power device applications. Therefore, efforts to produce high-quality SiC by reducing defects (especially dislocations) during the physical vapor transport (PVT) crystal growth process are important and require further optimization. Visual inspection and analyzing dislocation in semiconductors helps to understand the formation of defects during the PVT growth process and ultimately ensures the wafer’s quality during the production process. Such analysis is based on images produced by optical and scanning electron microscopy of the wafer surface or transmission electron microscopy techniques and X-ray topography which allow to follow single dislocation lines within the wafer. Those tasks are commonly done manually or based on classical image analysis methods that perform pixel-based operations, such as thresholding, line thinning, watersheding, to name but a few. These methods typically work only for a particular image and require significant experimentation for determining suitable parameters. In recent years, machine learning approaches have become popular in many research areas, e.g., to automate and accelerate the process of material discovery and design,[3,4,5,6] or to automate computer vision tasks such as object detection or segmentation.[7,8,9] However, obtaining large enough datasets for supervised training is still a challenge since it is time-consuming to annotate large datasets with potentially vast numbers of objects. Thus, “synthetically generated data” becomes helpful and enables the algorithm to train more effectively, e.g., Tremblay et al.[10 generated synthetic objects on top of a random image background and shows that training with synthetic data performs better than with real images. Trampert et al.[11] created artificial grain structures and used only a few hand-labeled real microscopy images to train a convolutional neural network (CNN). Similarly, Govind et al.[12] created artificial transmission electron microscopy images and used those to train a CNN for image segmentation tasks.

In this paper, we propose a combination of different data analysis techniques and deep learning methods in the material science domain for solving two main tasks: The first task is to generalize and automatize the process of creating a dictionary pool of etch-pit images that can be used for creating artificial training data. The second task is to use a deep learning framework to segment and count the occurrence of three different dislocation types (basal plane dislocation (BPD), threading edge dislocation (TED), and threading screw dislocation (TSD)) that commonly appear as etch-pits in KOH etching microscopy images of a 4H-SiC wafer. This allows to analyze a huge number of microscopy images with high fidelity to estimate dislocation distributions of different types and thereby helps to understand mechanisms that lead to defect formation and organization.

The paper is organized as follows: after the introduction section, in “Materials and methods” section, we describe all methods that were used in our work. In “Results and discussion” section, we present and discuss the results from the automated clustering process and the instance segmentation of various dislocation types. “Conclusion” section is the conclusion.

Materials and methods

The goal of this work is to analyze the dislocation content of a large SiC wafer of 10\({\hbox {cm}}\) in diameter. Subsequently, we start by describing the growth and preparation of the SiC crystal. This is then followed by introducing the imaging of the wafer as well as the automated machine learning pipeline for image analysis.

Materials, specimen preparation and microscopy

The SiC sample was cut from a crystal boule grown via the PVT method utilizing a RTD-6800 diamond wire saw. The crystal was sliced parallel to the seed direction, resulting in samples with a 4° off-axis angle with respect to the (0001)-plane, the same as the employed seed.

As a next step, the wafer was polished to remove the deformation zone induced by the sawing process. KOH platelets are heated up to 520°C. The sample is preheated and subsequently lowered into the melt inside a nickel sample holder for 7 min. After the etching step is completed, the sample is taken out of the melt, cooled down and cleaned with HCl and de-ionized water to remove any KOH residue. The employed setup is an in-house development of the group of one of the co-authors. The process is automated, spanning the preheating phase to the cool-down phase. One of the two resulting pieces came into close contact with the sample holder, inducing the sample holder’s geometry as a pattern, as seen in the microscopic images. This pattern is, therefore, not indicative of the sample’s crystal lattice property but, instead, a result of the reduced exposure to the KOH melt.

To reveal the location of dislocations the sample was etched with molten KOH. For this, the Si-face of a SiC wafer was cut parallel to the (0001)-plane and was etched selectively by KOH etching, revealing etch pits where dislocation lines pierce the surface. Both Si-face and C-face are etched simultaneously, but only the Si-face will be considered due to its anisotropic etching nature of etch pits. BPDs are located in the basal plane, i.e., parallel to the (0001)-plane. Thus, they can be detected by KOH etching due to the present small off-axis angle which is 4\(^\circ\) with respect to the basal plane.

Microscopy images of the KOH-etched sample’s Si-faces were taken utilizing a Zeiss Axio Imager.M1m microscope. The magnification was set to 20 times, and the mapping and stitching were carried out by the accompanying Zeiss Zen microscopy software. The whole wafer is shown in Fig. 1(a) and is divided into 20 sections 3 for the scanning process. Each of them again consists of around 2000 images with \(1292\times 968\) pixels [(see Fig. 1(b) and (c)], resulting in a total of 40000 images covering the whole wafer.

Figure 1
figure 1

Micrograph of the investigated SiC waver. The magnified region shows dislocation lines piercing the surface, revealed by KOH etching. The whole wafer image consist of altogether 40,000 images, one of them is shown in sub-figure (c).

Automated creation of an etch pit dictionary

In the following we introduce a data analysis pipeline for obtaining image regions that contain a single etch pit for a BPD, a TSD, and a TED with various Burgers vectors. In this part, the objective is not to identify all etch pits in the wafer; the objective is to automatically find a number of good examples (i.e., image sections with a single, clearly visible etch pit) that are suitable for creating semi-synthetic training data by superimposing such etch pit examples in an artificial image.

Identification of image regions that contain an etch pit

Each gray scale microscopy image [Fig. 2(b)] consists of \(1292\times 968\) pixels. As a preprocessing step, distortions and inhomogeneous contrast were removed in each image through the rolling-ball[13] and CLAHE (Contrast limited adaptive histogram equalization technique) method[14] (see Appendix A for a brief explanation). A binarization threshold was then applied to the image to reveal the darker etch pits (note, that during these steps it is not important that no all etch pits are identified). Additional image processing techniques such as erosion, opening, and dilation using the OpenCV library were used for separating contiguous pixel groups [Fig. 2(c)]. To estimate the shape, a possibly rotated ellipse is fitted to each of the obtained pixel groups [Fig. 2(d)]. By characterizing the shape and size of the ellipse it is possible to exclude pixel groups that do not correspond to etch pits. As shown in Ref. 15 this requires the calculation of three parameters, the “lengthiness” of the ellipse, the “copactness” of the pixel cluster and the “circularity.” If these characteristic values exceed certain values (here: 3, 0.6, and 0.6, respectively), then the pixel groups have a too extreme shape to be etch pits (see Ref. 15 for further details). These pixel groups act as a mask through which individual regions of interest containing a single etch pit are obtained (the masks were additionally expanded by a boarder of 10 pixels). This process is done automatically for all 40000 images covering the entire wafer and resulting in a total number of approximately 1.7million etch pit images. However, not all of these etch pit images can be used for further analysis, because, e.g., two etch pits might overlap (class 0 in Fig. 2(f) shows such examples), and therefore it would not be possible to uniquely determine the dislocation line character. To exclude these “bad examples” from the further analysis steps, a deep learning-based classification method is used, as introduced subsequently.

Figure 2
figure 2

Data analysis pipeline of the two main tasks: the automated creation of an etch-pit dictionary pool (top box) and predicting dislocations in the full wafer (bottom box). Further explanations are given in the text.

Binary classification for selecting good candidates of etch pit images

Generating synthetic training data by superimposing a number of each etch-pit images requires images of high-quality. Therefore, it is necessary to differentiate between a suitable and unsuitable candidate from the pool of extracted etch pits datasets obtained above. Suitable images contain only a single etch pit and no artifacts, see the examples shown in Fig. 2(f). To decide which of the images is a suitable image for further analysis, a CNN is used as a classifier in a supervised learning setup.

As network architecture a multiple-channel CNN with a ResNet34 as a backbone was used. The input channels used for this type of CNN consists of the original gray scale image, a magnitude spectrum of the Fast-Fourier transform of the image as well as of the wavelet transform [see Fig. 2(e)]. These three channels together contain significantly more information than just the gray scale values used in regular CNNs and are very beneficial for the training and testing accuracy.

As training dataset we select (by quick visual inspection which takes less than an hour) 1000 arbitrary etch pit images that are clearly good candidates and 1000 that are clearly unsuitable. The size of each image is \(64\times 64\) pixels. We also perform data augmentation such as rotating the image by various degrees as well as adding noise to the image to ensure that the model generalizes well. The resulting dataset is split into 5 parts that are used for a k-fold cross-validation, where we ensured that there is no class imbalance. The test dataset is chosen as one of the five parts. The performance of the trained network is evaluated in terms of classification accuracy, defined as the ratio between the sum of the correctly predicted records and the total number of predicted records.

In the data analysis pipeline, the input to the trained network is one of the image sections Fig. 2(c) and the output, i.e., the prediction of the network is the class label 0 or 1. There, 0 indicates that the image section contains multiple overlapping etch pits or even other (image or crystallographic) defects while 1 indicates an image section that contains exactly one etch pit and is suitable for further processing.

Automated clustering of different dislocation types

Subsequently, we continue with only those image section that were identified as good candidates for further analysis. The next goal is to identify the exact dislocation type (i.e., BPD, TED or TSD), which is so far not known. In principle this is again a classification task. However, labeling the training data (i.e., sorting all training images into several different dislocation categories) is very time consuming and additionally prone to errors.

A different approach consists in unsupervised learning where the training is not based on pre-assigned class labels. To sort etch pits images into various categories, clustering methods can help by automatically grouping similar images. For determining what “similar” means, clustering methods often operate in a feature space which represents the images by a reduced set of essential features. Effectively, this implies an automated feature extraction followed by a dimensionality reduction, cf. Figure 2(g).

For constructing this reduced feature space we firstly use VGG-16 neural network[16] to convert etch pit images into feature vectors. No training is done; instead, a pretrained network with weights taken from ImageNet-1k[17] has been use for this purpose. VGG-16 is a simple convolutional neural network that consists of 13 convolutional layers followed by three fully connected layers. The first fully connected layer is taken as the 4096-dimensional feature vector and contains a multi-scale lower-level representation of the image which is suitable for clustering. However, the number of dimensions is still high and can cause problems for clustering methods due to the sparsity of the feature space. Thus, a further dimensionality reduction of the feature space is performed.

A sophisticated method for unsupervised dimensionality reduction is the so-called uniform manifold approximation and projection UMAP.[18] It preserves the global data structure and is still computationally manageable. The mathematical background of this technique is related to Laplacian eigenmaps[19] which distribute data uniformly on (sub)manifolds of the data space. For more detail on the algorithm and description refer to[18]. As hyper-parameter of UMAP we used 10 neighbors, 32 random states, and a minimum distance of 0.3. The other parameters are taken as the default values of the library package.[18] The 4096-dimensional dataset from the VGG-16 feature vectors are then reduced to only three components (two of which are shown in Fig. 2(g).

Finally, we use HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)[20] to automatically cluster the data of the three-dimensional feature space into three separate groups, which can then easily be identified as BPD, TED, and TSD as discussed below.

Predicting dislocations in all sections of the wafer

At the end of the previous steps we have obtained a number of small images each of which contains a single etch pit. We additionally know to which kind of dislocation this etch pit corresponds. The images together with the types are contained in the “etch-pit dictionary.” This allows now to generate synthetic image data based on these small images.

Synthetic image generation

Artificial images with etch pits are created by (i) creating a background and (ii) randomly or systematically placing images from the dictionary on the background and simultaneously creating a mask for the semantic segmentation.

Background images (\(1024\times 1024\) pixels) are grown by a non-parametric sampling method for texture synthesis.[21] From an initial seed (\(200\times 200\) pixels), which is taken from a real microscopy image, the texture of the background image is grown one pixel at a time by a Markov random field model. Each newly synthesized pixel is generated from the center of the chosen neighborhood, which is similar to the pixel of the neighborhood. These neighborhoods are already found by the algorithm.

As a parameter, the minimum and maximum number of BPDs, TEDs, and TSDs that can appear in each of the images is defined. The algorithm randomly chooses the number of each dislocation type within these ranges. Each image is randomly selected from the image pool and pasted on the randomly chosen background image. An additional method is to place dislocations along a straight line segment, mimicking low angle grain boundaries. Altogether, around 580000 etch pit images were used to create randomized training datasets. The resulting synthetic images are obtained together with the mask for semantic segmentation and with the position of each dislocation etch pit. We generate about 10000 images for the training datasets and 1000 images for validation. The size of each synthetic image is \(512\times 512\) pixels.

Instance segmentation and classification of dislocation types

The instance segmentation and classification of dislocation types are done by training a Mask R-CNN[22] deep learning model, which has been implemented in Detectron2[23] (a Facebook AI Research’s next-generation library that provides detection and segmentation algorithms), which is built based on the Pytorch platform. Mask R-CNN[22] extends Faster R-CNN by adding a branch for predicting segmentation masks on each Region of Interest (RoI) in parallel with the existing branch for classification and bounding box regression. The mask branch is a small fully connected network applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner. Mask R-CNN is simple to implement and to train given the Faster R-CNN framework, which facilitates a wide range of flexible architecture designs. Additionally, the mask branch only adds a small computational overhead, enabling a fast experimentation.

In our implementation we train the model with a ResNet101 as backbone using pre-trained COCO weights. The dataset is converted into COCO format, in which the annotations contain a list of dictionaries with the required information, suitable for Detectron2. The performance of the segmentation is evaluated in terms of the root mean squared error (RMSE), which is defined as

$$\begin{aligned} e = \sqrt{\frac{1}{N} \sum _{i=1}^{N} (y_{i}-\hat{y}_{i})^2}\;, \end{aligned}$$
(1)

where N is the number of values, \(y_{i}\) is the i-th truth value and \(\hat{y}_{i}\) is the i-th predicted value. For the evaluation of the performance, 5-fold cross-validation was conducted giving for each fold the accuracy of 0.89, 0.92, 0.96, 0.94, and 0.92, respectively, which results in an average accuracy of \(\approx 0.92\).

This was the last step in this lengthy data analysis pipeline. We are now able to detect the position and dislocation type in each of the images of the full wafer, as shown in Fig. 2(k). The Burgers vector of the respective defect can be obtained in a conventional manner by computing the radius of the etch pit (compare).[15]

Results and discussion

In the following we start by investigating the physical soundness of the data analysis results. Afterward, the dislocation predictions are discussed within the materials scientific context.

Robustness of the clustering

The first part of above introduced data analysis pipeline (etch pit dictionary creation) was designed to be as robust as possible. In particular, it was designed such that missing even a larger fraction of etch pits should not have any impact on the quality of the dictionary. As an important part we will now investigate the clustering results, Fig. 2(h) in more detail.

There are three different dislocation types, which commonly appear in microscopy images of SiC after etching with molten KOH: BPDs have a sea-shell-like shape, while TED and TSD have a round or hexagonal shape with a core in the center region. Larger cross sections indicate larger Burgers vectors. In Fig. 2(g) and the magnification in Fig. 7 Appendix C we observe that similar etch pits are located close to each other in the latent space generated by UMAP, which is beneficial for the clustering process. Figure 2(h) shows three clusters which are obtained by applying the HDBSCAN clustering method to the three-dimensional UMAP latent space. For visualization purposes, they are further reduced to two dimensions, shown in gray, orange, and blue color, respectively. Note, that (again for visualization purposes) in Fig. 2(g) we only show 1000 data points from part 9 of the wafer with embedded images of those etch pits.

How accurate was the clustering? Do the different colors in fact correspond to the three types of dislocation (BPDs, TEDs, and TSDs)? To answer this question, we take a look at the correlation between the three feature components shown in Fig. 3. First of all, we observe that three components are sufficient to distinctly separate the point clusters, e.g., in the plot of component 1 vs. 3 TSDs are even linearly separable from the other dislocation types. Similarly, plotting components 1 vs 2 and ignoring the TSDs makes BPDs and TEDs linearly separable as well. The three components can not directly be interpreted in terms of physical features. However, this is not a shortcoming of the method as it allows to detect patterns and structure in the data that would otherwise be difficult to obtain.

Figure 3
figure 3

Left: Visualization of all three components after clustering. Right: Magnification of the data projected on the first two components (i.e., the middle plot in the leftmost column of the left figure) along with examples of the etch pit images. The value ranges of the three latent space components are the same, and therefore, no scales are given.

Exploring the structure of the latent space can also be done by taking a look at how the corresponding images change when moving from one cluster to the next. Figure 3 (right) shows several example images in the latent space. Position A and G both indicate very extreme shapes: both are rather large and have a strong contrast. However, A belongs to the group of lengthy shapes (the basal plane dislocations) while G belongs to the threading screw dislocations and is rather round. In between is the group of threading edge dislocations. They are smaller than the BPDs but still roundish. The clustering method successfully separates even those images that are close to each other, e.g., E and F. Visual inspection shows that E and F are indeed different. The same holds for B and C which implies that the three groups of dislocations can be unambiguously separated, making the method very robust for this application. However, it should be mentioned that clustering as an unsupervised learning approach is not able to give information about the degree of reliability of the results. This still requires to check the plausibility of the results, e.g., by an expert taking a look at a few images for their correct assignment to clusters.

Reliability of predicting various dislocation types

For the purpose of validating the trained segmentation model three different datasets were created that were not used in the training process:

  • The first dataset contains 1000 synthetic images with a low etch pit density where the number of etch pits were randomly varied within the ranges of 0..5 TSDs, 0..10 TEDs, and 0..20 BPDs.

  • The second dataset again consists of 1000 synthetic images. However, each of them contain, on average, a larger number of etch pits. Here, we used the ranges of 0..20, 0..50, and 0..200 for the quantities of the three dislocation types.

  • The third dataset consists of 100 real microscopy images that were manually annotated by a domain expert.

Figure 4(a) and (d) of panel A show examples of synthetic microscopy images for the case of low dislocation density and high dislocation density, respectively. The images contain all three types of dislocations, for which a number of etch-pit images are randomly chosen from the dictionary. The dislocation images of high density are unrealistic since such densities are typically not found in commercial SiC wafers. However, including this extreme case turned out to be beneficial for the training process of the instance segmentation since the difficult cases increase the variance of the dataset and thereby helps the model to generalize better.

Figure 4
figure 4

Panel A: Results comparison between ground truth and predicted segmentation. (a) and (d) grayscale images of synthetic low and high dislocation density; (g) real microscopy image; b and e: ground truth segmentation from the synthetic image, (h) ground truth segmentation from hand labeling, (c), (f) and (i) predicted results from the deep learning. Panel B: Visualization of the three types of dislocation density in the whole wafer.

The ground truths and predictions are shown in the middle and right column of Fig. 4 panel A. To quantify the error the RMSE is calculated for all dislocation types resulting in \(e_\text {BPD}=13\), \(e_\text {TED}=23\), and \(e_\text {TSD}=3\) for the case of high dislocation density. These values are much smaller for the case of lower dislocation density which are \(e_\text {BPD}=1\), \(e_\text {TED}=4\), and \(e_\text {TSD}=0.5\), respectively. For the real images, the RMSE values are obtained as in \(e_\text {BPD}=3\), \(e_\text {TED}=5\), and \(e_\text {TSD}=2\). Analyzing the prediction performance of the model for all images of the validation datasets results in the distributions of absolute classification errors as shown in Fig. 6. The dataset with the high dislocation results by design in the worst performance: the dataset was created to be the most challenging one. By contrast, the validation dataset with the lower density the model shows a very good prediction accuracy. The ultimate test, however, is the real dataset since it is a subset of the whole wafer. For this, the variance of the errors is larger than that for the low density synthetic dataset. However, on average, only 1.9 BPDs, 0.9 TEDs, and 1.3 TSDs were misclassified. In many cases, there even was no misclassification at all. Assuming that this dataset is statistically representative of the full waver it can be assumed that this high level of accuracy also will hold for the analysis of the full wafer.

Analysis of the full wafer

Analyzing the full waver is a formidable task since it consists of altogether about 40000 individual images. The above introduced data analysis pipeline is suitable for such a task because even though training of the Mask R-CNN network is computationally expensive, making predictions about location and type of etch pits in new images is significantly faster. The total time required for the prediction across entire 100\({\hbox {mm}}\) wafer is roughly 75\({\hbox {min}}\) with the batch size of 4 when running on an 8-core workstation with Intel Core i9-10850K CPU, 32 GB RAM and a Nvidia GeForce RTX 3060 GPU with 12 GB RAM.

Altogether 1.7 millions etch pits were identified and located. Figure 8 in the appendix summarizes the dislocation content for each of the 20 wafer subdivision, each of which again consists of a large number of individual images. BPDs are the predominant type of dislocation in most of the regions while on average TEDs occur 50% less. The number of TSDs is the lowest in all wafer parts.

To illustrate the result of this analysis the resulting dislocation density distribution of the three dislocation types obtained for the whole wafer is shown in Fig. 4 panel B. The density distribution of BPD is the highest with a maximum value of around 1.6. 10\(^{5}\) \({\hbox {cm}}^{-2}\), while the maxima of TED and TSD are approximately 1.2 \(10^5\)\({\hbox {cm}}^{-2}\) and 0.8 \({10^5}\)\({\hbox {cm}}^{-2}\), respectively. We observe that TEDs and TSDs are mainly concentrated in the central region and on the boundary of the wafer. The center of the wafer is mainly BPDs-free and then takes a high value. Due to the overlapping of the image sections, artifacts from imperfect image registration in form of two vertical lines occur (cf. middle and right figure of Fig. 4(b). The slightly tilted vertical lines in the left plot are arrays of BPDs, forming several low angle grain boundaries that are distributed approximately in an equidistant manner. They occurred due to the high curvature of the growth interface and the shear stress acting on the grown crystal[24] and became visible only through this analysis. Due to the inhomogeneous etching process, the cross-hatched patterns in the TED and TSD defect maps do not correspond to a real variation in defect density. Furthermore, a varying TED density can be seen on the upper left side approximately every \(30^0\) originating from the center of the wafer.

Are there any dislocations missing?

In our study, we assumed that most of the etch pits are either BPDs, TEDs or TSDs. However, in reality, there can also be dislocations of mixed line characters. This type of etch-pit occurs because of the switching direction of dislocations, e.g., a TED can be deflected into a BPD, or vice versa, due to macro-step formation.[25] Similarly, TSDs can bent into BPDs and vice versa (“macro step flow”)[26] Furthermore, “shallow dislocations” were observed to form a round pit (with a curved bottom) without a core. Since dislocations cannot start or end inside the crystal, this structure can result from a dislocation half-loop[27] For those pits, all dislocations are somewhat tilted.[28]

Clearly, all of these cases are not explicitly considered in our analysis and are rather assigned to one of the three classes. Here, the clustering results might help to identify particular etch pit shapes by, e.g., taking a look at the extreme ends of a cluster, similar to what was done in Fig. 3. Another option to extend this investigation is to further cluster the content of a single cluster of images and thereby to obtain further subdivisions of that cluster. This might be a good starting point to detect new sub-groups of etch pit images that are similar in some aspects, which could be, e.g., the case for a mixed dislocation.

Conclusion

We have proposed a complex image analysis pipeline that consists of a number of classical image analysis techniques, supervised deep learning methods as well as unsupervised approaches. A particular emphasize was the robustness and accuracy of the framework. Generation of semi-synthetic training data was key for training an instance segmentation neural network without manually creating data annotation. This helped to rapidly analyze thousands of images and obtain accurate and spatially resolved information about the distribution of different dislocation types.

Our method can be applied to quickly identify areas of defect density inhomogeneities, for example, areas of high BPD density due to localized stress during crystal growth. The increasing diameter standard of SiC wafers of up to 200\({\hbox {mm}}\) directly implies that KOH etching images have to be analyzed automatically since such a high amount of data cannot be handled otherwise.

Ultimately, high-throughput analysis methods as the one in this work will contribute to understanding of defects in SiC which is the basis for optimizing the process parameters during growth.