Introduction

In semiconductor manufacturing, the wafer is the fundamental building block of integrated circuits (ICs). A single wafer can carry several hundred ICs after hundreds of sophisticated processes (Alam & Kehtarnavaz, 2022). Any abnormality in these production processes may introduce defects into the wafer map. Given the complexity of these processes, it is practically impossible to produce wafers without any defects (Jin et al., 2019; Liu & Chien, 2013).

After completing the wafer fabrication processes, every wafer undergoes a testing procedure consisting of a series of electrical tests that determine whether each individual chip (or die) meets its product specifications. Specifically, a probe test bench measures the electrical characteristics of the dies (Cheng et al., 2021). According to the resulting quality level, the dies are marked in different colors on the wafer map. The captured defect patterns are typically divided into two types: global random defects and local systematic defects. In global patterns, defects are distributed randomly across the wafer without any spatial arrangement, even under normal production conditions. In contrast, local patterns exhibit spatial correlations in specific regions of a wafer, producing shapes such as center circles, edge rings, local zones, and scratches (Wu et al., 2014).

Integrated circuit manufacturing requires high investment, precise technology, and a complex manufacturing process. Thus, analyzing the wafer map is essential to improve the yield, quality, and reliability of the IC manufacturing process. Even so, manually annotating wafer maps with their defect types is time-consuming and expensive, especially on large production lines (Shankar & Zhong, 2005). Moreover, engineers judge the defect types of a wafer map based on their professional knowledge; working long hours exposes them to visual fatigue and raises the risk of erroneous classification. Hence, automatic inspection of wafer map defects is a necessary step that can reduce both time and cost.

With the advancement of machine learning and deep learning algorithms, building an effective automatic fault detection model has become a hot topic in the research community (Kim & Behdinan, 2023; Theodosiou et al., 2023). These wafer fault detection models can be categorized into segmentation models (Cheng et al., 2021; Chu et al., 2022; Jin et al., 2019; Kim & Kang, 2021; Lee et al., 2010; Nag et al., 2022; Yan et al., 2023) and classification models (Baly & Hajj, 2012; Kyeong & Kim, 2018; Saqlain et al., 2019; Yu et al., 2019; Jin et al., 2020; Chen et al., 2021, 2022; Kang & Kang, 2021; Kim et al., 2021; Wang et al., 2021, 2022; Yu et al., 2021a, b, c; Zheng et al., 2021; Shin et al., 2022; Xuen et al., 2022; Yoon & Kang, 2022; Yu et al., 2022; Zhang et al., 2022; Alqudah et al., 2023). Classical supervised recognizers have achieved good results in wafer map defect recognition (Alqudah et al., 2023; Baly & Hajj, 2012; Cheng et al., 2021; Saqlain et al., 2019). Nevertheless, their performance relies on the effectiveness of the feature extraction step, and the spatial resolution and noise of the wafer maps significantly affect such techniques. Accordingly, deep feature learning and extraction has recently been widely applied in the field of wafer defect recognition (Kyeong & Kim, 2018; Yu et al., 2019, 2022; Jin et al., 2020; Chen et al., 2021, 2022; Kang & Kang, 2021; Kim et al., 2021; Wang et al., 2021, 2022; Yu et al., 2021a, b, c; Zheng et al., 2021; Shin et al., 2022; Xuen et al., 2022; Yoon & Kang, 2022; Zhang et al., 2022; Xu et al., 2023). However, employing a traditional 2D convolutional neural network (CNN) for the direct classification of defects can easily lead to instability in the classification results (Xu et al., 2023), especially at very small image resolutions, as in the employed WM-811K wafer map dataset; this instability arises from the lack of spatial information for learning features and making proper predictions. In addition, 2D CNNs generally produce high-dimensional features that may contain much redundant information, which raises challenges at the final classification stage, such as increased computational complexity, reduced memory efficiency, and a higher chance of overfitting (Yu et al., 2019; Jin et al., 2020; Chen et al., 2021, 2022; Kang & Kang, 2021; Wang et al., 2021, 2022; Yu et al., 2021a, b; Zheng et al., 2021; Xuen et al., 2022; Yoon & Kang, 2022; Yu et al., 2022; Zhang et al., 2022). Many studies have tried different feature engineering steps with deep models to overcome these challenges (Jin et al., 2020; Yu et al., 2021b; Zheng et al., 2021). Table 1 summarizes the most recent studies that employ deep models, indicating the main procedure, strengths, and issues of each one.

Table 1 Summary of the state-of-the-art techniques in wafer map fault detection that follow hybrid inspection methods

Accordingly, the main target of this work is to build a deep recognition system that provides precise salient features, despite the very low resolution of wafer maps, with the highest recognition performance. In addition, this recognition system should avoid the redundant information that leads to instability and poor interpretability. Thus, we convert the 2D wafer map fault detection problem into a 1D one, obtaining the most salient features with the least dimensionality and thereby reducing feature redundancy and dimensionality. Nonetheless, new challenges arise, such as designing a suitable embedding that keeps the 2D spatial information in the 1D representation, proper 1D feature ranking, and a suitable 1DCNN classifier. Accordingly, the main contributions of the proposed wafer map defect classification model can be summarized as follows.

  1.

    We exploit encoder-decoder networks, i.e., autoencoders (AEs), in two ways. First, an AE is employed as a new convolutional synthetization model to overcome the high class imbalance in the employed wafer map dataset. This model proves effective, reconstructing the original wafer maps with a total loss of 0.0011. Second, an AE is introduced in a new structure as an embedding representation step with dimensionality-reduction behavior, named the sparsity-boosted autoencoder (SBAE). The resultant encoded sparse maps from the SBAE guarantee more discriminative features with a 50% reduction in size compared to the original wafer maps. Despite this reduction, an inspection accuracy of 99.48% is obtained when working with an initial wafer map resolution of 27 × 25.

  2.

    An enhanced red deer optimization (ERD) algorithm with a new tinkering strategy is proposed. ERD is applied to 1D squeezed sinograms of the previous sparse maps and yields a final average feature pool of ~ 15 bases, i.e., ~ 1.5% of the initial wafer map size at a resolution of 33 × 29. The performance of ERD is compared, in an ablation manner, with other metaheuristic algorithms, namely Genetic (GA), Equilibrium (EO), Grey Wolf (GWO), Sine Cosine (SCA), and Particle Swarm (PSO) algorithms. The proposed ERD achieves the smallest feature pool size with approximately the same accuracy as its alternatives, because the proposed tinkering strategy drives the ERD algorithm toward the global optimum with the fewest discriminative features, avoiding possible redundant information.

  3.

    Intensive experiments with a new predictive 1DCNN model are performed on different resolutions of wafer maps for 8- and 9-fault-type prediction. An average accuracy of 95.2% is achieved on an unseen 62% testing part of the dataset, while an average accuracy of 98.1% is achieved on an unseen 20% testing part in a train–validation–test evaluation. Despite the aggressive dimensionality reduction, the proposed inspection model shows efficient generalization. In addition, the proposed 1DCNN network strikes a good balance between the number of parameters and the targeted accuracy in fault detection compared with other common 1DCNNs, such as 1D-VGG16, 1D-ResNet50, 1D-LeNet-5, and 1D-Inception.

The rest of the paper is organized as follows. Details about the targeted wafer map dataset are given in "Wafer map dataset" section. "Methodology" section describes the details of the proposed methodology of wafer map fault detection. "Experimental results and discussion" section presents the performed experiments and their results. Finally, the conclusion is offered in "Conclusion" section.

Wafer map dataset

The WM-811K (Wu et al., 2014) is the employed wafer map dataset; it is publicly available on the Kaggle website (WM-811K, 2014). It contains 811,457 instances collected from 46,293 lots during the semiconductor fabrication process (Wu et al., 2014). Only a subset of 21.3% (172,950 maps) is labeled by professionals with one of the following nine categories: Center, Donut, Edge-Loc, Edge-Ring, Loc, Random, Scratch, Near-Full, and None, while the rest remains unlabeled; see Fig. 1. As indicated in Fig. 1, the dataset is mostly unlabeled, most of the labeled part is fault-free ("None"), and the faulty part is highly imbalanced. Table 2 lists the distribution and the main cause of each defect. This dataset provides a single-type defect per wafer map (Baly & Hajj, 2012; Saqlain et al., 2019; Yu et al., 2019, 2022; Jin et al., 2020; Chen et al., 2021; Kang & Kang, 2021; Wang et al., 2021, 2022; Yu et al., 2021a, b; Zheng et al., 2021; Chen et al., 2022; Xuen et al., 2022; Yoon & Kang, 2022; Zhang et al., 2022; Alqudah et al., 2023), but multiple studies target mixed-type defect patterns in their inspection models, such as Kyeong and Kim (2018), Kim et al. (2021), Shin et al. (2022), Yu et al. (2022), and Xu et al. (2023).

Fig. 1
figure 1

Labeling and categories distribution of WM-811K dataset

Table 2 The observed defect patterns in WM-811K with counts and causes

Methodology

The main steps of the proposed wafer map fault detection model are shown in the graphical abstract of Fig. 2, which combines a flow chart with the design purpose of each block. These steps are detailed in the following subsections, and Algorithm 1 gives pseudocode for the whole proposed wafer fault detection model.

Fig. 2
figure 2

Graphical abstract of the proposed fault type prediction in wafer maps

Wafer data synthetization model

As presented in Table 2 and Fig. 1, WM-811K is a highly imbalanced dataset. Each wafer map consists of only three pixel values: 0 for the background, 1 for normal dies, and 2 for defective ones. Consequently, two main preprocessing steps are performed. The first is one-hot encoding, which converts the grey wafer maps \(X(m,n)\) into colored ones \(XX(m,n,c)\), where \(c\) is the number of channels, by setting \(xx\left(m,n,{c}_{i}\right)=1\) for \(i=x\left(m,n\right)\) and 0 otherwise, with \(i\in \left\{0,1,2\right\}\); here \(x\left(m,n\right)\) and \(xx(m,n,c)\) denote the grey and colored pixel values, respectively. RGB wafer maps help extract multi-scale features in the subsequent procedures. The second preprocessing step is a synthetization (augmentation or balancing) model; in this work, an autoencoder-based synthesizing model is used for data augmentation.
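To make the encoding step concrete, the following minimal NumPy sketch (the function name is ours, for illustration) converts a grey wafer map with pixel values {0, 1, 2} into its three-channel one-hot form:

```python
import numpy as np

def one_hot_wafer(x):
    """One-hot encode a grey wafer map X(m, n) with values {0, 1, 2}
    (background, normal die, defective die) into XX(m, n, 3), where
    xx(m, n, c) = 1 iff x(m, n) == c."""
    xx = np.zeros(x.shape + (3,), dtype=np.float32)
    for c in range(3):
        xx[..., c] = (x == c)
    return xx

# Example on a tiny 3x3 map
x = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1]])
xx = one_hot_wafer(x)
assert xx.shape == (3, 3, 3) and xx[0, 2, 2] == 1.0
```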

An autoencoder (AE; Li et al., 2023) is a type of artificial neural network that learns efficient mappings or codings of unlabeled data in an unsupervised manner. The AE reconstructs the input data at its output and compares the reconstruction with the original input. After numerous iterations, the cost function approaches its optimum, meaning the reconstructed data approximates the original input to the maximum extent. The introduced convolutional autoencoder (CAE) is superior to the traditional AE because its convolutional layers preserve local image structure by exploiting the spatial relationships between pixels.

The introduced CAE consists of two main parts, an encoder and a decoder; see Fig. 3. The encoder converts the input map to a low-dimensional bottleneck feature map, while the decoder performs deconvolution operations to expand the latent feature map and reconstruct the original wafer map. The encoder in the proposed architecture consists of one 2D convolutional layer with 64 filters of kernel size \(\left(3\times 3\right)\), followed by a MaxPooling layer. The feature maps extracted by the convolutional layer of the encoder are represented as

$$H=\mathcal{A}\left(XX*W+B\right),$$
(1)

where \(\mathcal{A}\) is the activation function, employed here as ReLU, \(XX\) denotes the one-hot-encoded colored wafer maps, and \(W\) and \(B\) are the weights (convolutional kernel) and the bias, respectively.

Fig. 3
figure 3

The CAE-based synthetization model for wafer maps augmentation

The feature maps extracted by the convolutional layer are fed into a MaxPooling layer to produce the targeted low-dimensional bottleneck feature map \({H}_{c}\), as

$${H}_{c}\left({x}^{\prime},{y}^{\prime}\right)=\max_{i,j=0,\dots ,r-1}H\left({x}^{\prime}+i,{y}^{\prime}+j\right),$$
(2)

where \(r\) is the MaxPooling operator size and \({x}^{\prime}\) and \({y}^{\prime}\) are the pixel coordinates. At this point, random Gaussian noise (\(\mu =0, \sigma =0.1\)) is added to the bottleneck embedded map \({H}_{c}\) to give the synthetization model more robustness. The decoder network then attempts to reconstruct the input maps \(XX\) through two transposed convolution (deconvolutional) layers and an UpSampling layer; see Fig. 3. The output of the decoder is the restored feature map of \({H}_{c}\), as

$$\mathcal{X}=\mathcal{A}\left({H}_{c}*/*W+B\right),$$
(3)

where \(*/*\) is the deconvolution (Conv2DTranspose) operator. The first Conv2DTranspose (TConv2D) layer in the decoder network employs 64 filters of kernel size \(\left(3\times 3\right)\) with ReLU activation \(\mathcal{A}\); the second employs three filters of kernel size \(\left(3\times 3\right)\) with Sigmoid activation. The proposed CAE is trained over a number of epochs to minimize the reconstruction error between the original wafer maps and the synthesized ones, in terms of the mean squared error

$$MSE=\frac{1}{N}\sum_{i=1}^{N}{\left({XX}_{i}-{\mathcal{X}}_{i}\right)}^{2},$$
(4)

where \(\mathcal{X}=\widehat{XX}\) denotes the newly synthesized wafer maps and \(N\) is the number of input wafer maps. Figure 4 shows a sample wafer map and its synthesized counterpart; they look very similar in the zoomed-in versions of the maps. The same figure also shows the reconstruction error over the training epochs, with a final loss of 0.0011 at epoch 30.
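A minimal Keras sketch of the described CAE is given below. The layer types and sizes follow the text (one Conv2D + MaxPooling encoder, Gaussian noise at the bottleneck, two Conv2DTranspose layers with an UpSampling layer in the decoder), while the `padding` choices and the even 26 × 26 input are assumptions we make so the reconstruction matches the input shape exactly:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cae(m, n, c=3, noise_std=0.1):
    inp = layers.Input(shape=(m, n, c))
    # Encoder: Conv2D (Eq. 1) followed by MaxPooling (Eq. 2)
    h = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(inp)
    hc = layers.MaxPooling2D((2, 2), padding="same")(h)
    # Gaussian noise at the bottleneck (mu = 0, sigma = 0.1); active during training only
    hc = layers.GaussianNoise(noise_std)(hc)
    # Decoder: two Conv2DTranspose layers with UpSampling (Eq. 3)
    d = layers.Conv2DTranspose(64, (3, 3), activation="relu", padding="same")(hc)
    d = layers.UpSampling2D((2, 2))(d)
    out = layers.Conv2DTranspose(c, (3, 3), activation="sigmoid", padding="same")(d)
    cae = Model(inp, out)
    cae.compile(optimizer="adam", loss="mse")  # MSE reconstruction loss (Eq. 4)
    return cae

cae = build_cae(26, 26)  # e.g., the "All" set resolution; even dims reconstruct exactly
```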

Fig. 4
figure 4

The performance of the proposed CAE-based synthetization model―Sample wafer map with its synthesized one and the training reconstruction error over epochs

Sparse feature learning and encoding

At this stage, since low-dimensional 2D wafer maps are targeted, a new sparse autoencoder model is introduced. Sparsity is the property of being sparse, i.e., having many zero entries (Sun et al., 2022); in the context of machine learning, it often refers to the number of zero weights in a neural network. An autoencoder is called sparse when its hidden layer activations are encouraged to be sparse (Ng, 2011): a sparsity constraint is added to the loss function of the traditional convolutional autoencoder. This constraint can be based on the L1 norm of the hidden layer activations or on the KL divergence between the distribution of activations in the hidden layer and a sparse target distribution. For the main difference between the traditional autoencoder and the sparse one, see Fig. 5.

Fig. 5
figure 5

Visual comparison between the traditional autoencoder in a and the sparsity-induced autoencoder in b. The dark green nodes are firing whereas the red nodes are constrained, i.e., we are in effect reducing the number of firing neurons

Algorithm 1.
figure a

The proposed wafer map fault detection

In the proposed sparsity-boosted autoencoder (SBAE), a sparsity-reinforced layer is added as the last layer of the encoding phase. The network is therefore encouraged to learn an encoding that activates only a small number of nodes. The cost function of the proposed SBAE utilizes three terms: a reconstruction term combined with two regularizers, a weight decay term and a sparsity-boosting term. The reconstruction term is the same as in the previous CAE. The weight decay term helps decrease the magnitude of the weights and prevents overfitting. The sparsity-boosting term induces a sparsity penalty in the training criterion. For the configuration of the proposed SBAE, see Table 3.

Table 3 Detailed configuration of the proposed SBAE

Assume the \(N\) synthesized wafer map samples \(\left({\mathcal{X}}^{1}, {\mathcal{X}}^{2},\dots , {\mathcal{X}}^{N}\right)\), where \({x}_{i}\) represents the ith input of sample \({\mathcal{X}}^{i}\). The cost function considering only the reconstruction term together with the weight decay term can be expressed as

$$J\left(W,B\right)=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}{\left\Vert {\mathcal{A}}_{W,B}\left({\mathcal{X}}^{i}\right)-{\mathcal{X}}^{i}\right\Vert }^{2}+\frac{\lambda }{2}\sum_{l=1}^{{n}_{l}-1}\sum_{i=1}^{{o}_{l}}\sum_{j=1}^{{o}_{l+1}}{\left({W}_{ji}^{\left(l\right)}\right)}^{2},$$
(5)

where \({\mathcal{A}}_{W,B}\left({\mathcal{X}}^{i}\right)=\mathcal{A}\left(W*{\mathcal{X}}^{i}+B\right)\) is the activation or mapping of the input \({\mathcal{X}}^{i}\) at layer \(l\), \({n}_{l}\) denotes the number of layers in the targeted network, and \({o}_{l}\) is the number of nodes or units in layer \(l\). \(\lambda\) adjusts the weight of the decay term; a large \(\lambda\) can cause underfitting, while small values may cause overfitting. In the proposed SBAE configuration, the value of \(\lambda\) is adjusted empirically via multiple experiments.

For the sparsity-boosting term, \(\mathcal{K}\left(\mathcal{P}\Vert \widehat{\mathcal{P}}\right)\) is added to the cost function in Eq. 5, which is reformulated as

$${J}_{sparse}\left(W,B\right)=J\left(W,B\right)+\gamma \sum_{j}\mathcal{K}\left(\mathcal{P}\Vert {\widehat{\mathcal{P}}}_{j}\right),$$
(6)
$$\mathcal{K}\left(\mathcal{P}\Vert {\widehat{\mathcal{P}}}_{j}\right)=\mathcal{P}{\text{log}}\frac{\mathcal{P}}{{\widehat{\mathcal{P}}}_{j}}+\left(1-\mathcal{P}\right){\text{log}}\frac{1-\mathcal{P}}{1-{\widehat{\mathcal{P}}}_{j}},$$
(7)

where \(\mathcal{K}\left(\mathcal{P}\Vert \widehat{\mathcal{P}}\right)\) is the Kullback–Leibler divergence, which reduces the deviation between \(\widehat{\mathcal{P}}\) and \(\mathcal{P}\). \(\mathcal{P}\) denotes the targeted sparsity parameter, typically a small value close to zero. \(\widehat{\mathcal{P}}\) is the average output of all hidden neurons, \({\widehat{\mathcal{P}}}_{j}=\frac{1}{N}\sum_{i}{\mathcal{A}}_{W,B}\left({\mathcal{X}}^{i}\right)\). \(\gamma\) is the sparse penalty coefficient.
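As an illustration, a TensorFlow sketch of the sparsity-boosting term of Eqs. 6–7 is shown below; the values of \(\mathcal{P}\) (`rho`) and \(\gamma\) (`gamma`) are placeholders, since the paper tunes them empirically:

```python
import tensorflow as tf

def kl_sparsity_penalty(activations, rho=0.05, gamma=1e-3):
    """Sparsity-boosting term of Eqs. 6-7: KL divergence between the target
    sparsity rho and the batch-averaged hidden activations rho_hat, scaled
    by the sparse penalty coefficient gamma."""
    rho_hat = tf.reduce_mean(activations, axis=0)           # average activation per unit
    rho_hat = tf.clip_by_value(rho_hat, 1e-7, 1.0 - 1e-7)   # numerical safety
    kl = rho * tf.math.log(rho / rho_hat) + \
         (1.0 - rho) * tf.math.log((1.0 - rho) / (1.0 - rho_hat))
    return gamma * tf.reduce_sum(kl)  # added to the loss of Eq. 5
```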

Using the introduced SBAE, a wafer map of size \((m\times n\times c)\) is converted into a sparse encoded wafer map \({\mathcal{X}}^{s}\) of size \(\frac{m}{2}\times \frac{n}{2}\times c\), i.e., each spatial dimension of the encoded map is halved. The encoded map \({\mathcal{X}}^{s}\) is obtained from the bottleneck layer of the SBAE. To visualize the clustering performance of the resultant encoded maps, t-SNE (Van der Maaten & Hinton, 2008) is used; see Fig. 6. It visualizes high-dimensional data by mapping the clustered features into a low-dimensional space. As indicated, the encoded feature maps show better clustering than the original maps.

Fig. 6
figure 6

t-SNE comparison before (a) and after (b) the sparse encoding by SBAE

A new sinogramic red deer feature ranking

The main target of the feature engineering steps is to reduce feature dimensionality while keeping the best performance. At this stage, we intend to convert the sparse encoded wafer maps \({\mathcal{X}}^{s}\) into 1D signals without losing the spatial information of the 2D maps. Therefore, the sparse encoded feature maps \({\mathcal{X}}^{s}\) are converted to sinograms via the Radon transform (Leavers, 1992). Each sinogram is then flattened into a 1D signal \({y}^{s}\) of size \(\left(1\times \frac{mnc}{4}\right)\), so for \(N\) samples we have a 1D feature pool \({Y}^{s}\) of size \(\left(N\times \frac{mnc}{4}\right)\). Then, the proposed enhanced red deer (ERD) algorithm is applied to the resultant sinograms to select the optimal reduced features.
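A sketch of this conversion using scikit-image is given below; the set of projection angles is an assumption, as the paper does not list the angles used, and the resulting vector length depends on that choice (the paper reports a 1D signal of size mnc/4):

```python
import numpy as np
from skimage.transform import radon

def sparse_map_to_1d(xs):
    """Convert one sparse encoded map X^s of shape (m/2, n/2, c) into a 1D
    feature vector: one Radon sinogram per channel, then flattening."""
    theta = np.linspace(0.0, 180.0, max(xs.shape[:2]), endpoint=False)  # assumed angles
    sinos = [radon(xs[..., ch], theta=theta, circle=False)
             for ch in range(xs.shape[-1])]
    return np.concatenate([s.ravel() for s in sinos])
```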

The conventional red deer algorithm

The red deer (RD) algorithm is a recent nature-inspired optimization technique (Fathollahi-Fard et al., 2020) that belongs to the family of population-based metaheuristics. Its main advantage is that it maintains the exploitation and exploration phases equally, which helps assign the salient features with low complexity.

A red deer is either male or female (hind). A group of hinds is called a harem, and each harem is assigned a male commander. A competition is set among male RDs, via roaring and fighting, to win the harem with the most hinds. According to the strength of the roaring phase, male RDs are categorized into commanders and stags; only the strongest male, after a fierce fight with the other males, becomes the commander of the harem. Here, the \(\frac{mnc}{4}\) sparse encoded 1D features, each a feature vector of size \(\left(N\times 1\right)\), are considered a group of RDs. Figure 7 shows a flow chart of the RD algorithm. The main objective of the optimization problem is to determine a near-global or optimal solution evaluated with respect to the variables associated with the problem.

Fig. 7
figure 7

Flow chart of the main Red Deer algorithm for sinogramic feature selection

Stage 1: Initialize the population

At this stage, the sparse sinogramic features are used to initialize the red deer population. Among this population, the best features are chosen as the male red deer features, \({G}_{male}\), according to their fitness values, while the rest form the hind group, \({G}_{hind}\). The proposed objective (fitness) function combines the classification accuracy of a KNN classifier with the proportion of selected red deers (features) through a weighted sum as

$$f=\omega \cdot acc+\left(1-\omega \right)\frac{\vartheta }{\Gamma },$$
(8)

where \(acc\) denotes the classification accuracy of the currently selected red deers (features), \(\vartheta\) is the number of currently selected red deers, and \(\Gamma\) is the total number of features in the targeted feature pool. \(\omega \in \left[0,1\right]\) is a weighting coefficient.
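A minimal sketch of this fitness evaluation is shown below. Note one assumption: Eq. 8 as written adds the feature proportion \(\vartheta /\Gamma\), so for a maximized fitness we credit \(1-\vartheta /\Gamma\) instead, which rewards smaller feature subsets; the KNN settings and \(\omega\) are also assumed values:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, Y, labels, omega=0.9):
    """Weighted-sum fitness in the spirit of Eq. 8: KNN accuracy on the
    selected columns of the 1D feature pool Y plus a term that favors
    fewer selected features (sign convention assumed, see text)."""
    selected = np.flatnonzero(mask)          # mask is a binary selection vector
    if selected.size == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          Y[:, selected], labels, cv=3).mean()
    return omega * acc + (1.0 - omega) * (1.0 - selected.size / Y.shape[1])
```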

Stage 2: Roaring phase

The male agents are currently the superior solutions. Roaring is a local search for neighboring better features. The update rule is as follows.

$${RD}_{male}^{new}=\left\{\begin{array}{ll}{RD}_{male}^{old}+{\alpha }_{1}\times \left(\left(u-\ell \right)\times {\alpha }_{2}+\ell \right),& \text{if } {\alpha }_{3}\ge 0.5,\\ {RD}_{male}^{old}-{\alpha }_{1}\times \left(\left(u-\ell \right)\times {\alpha }_{2}+\ell \right),& \text{if } {\alpha }_{3}<0.5,\end{array}\right.$$
(9)

where \({RD}_{male}^{new}\) and \({RD}_{male}^{old}\) are the current and previous positions of the male red deer solution, \(u\) and \(\ell\) are the upper and lower limits of the local search for neighboring solutions, and \({\alpha }_{1}\), \({\alpha }_{2}\), and \({\alpha }_{3}\) are random coefficients drawn from a uniform distribution on \([0,1]\).
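For illustration, a direct NumPy transcription of Eq. 9 (the function and argument names are ours):

```python
import numpy as np

def roar(male, lower, upper, rng=np.random.default_rng()):
    """Roaring update of Eq. 9: a local move of a male solution within
    [lower, upper], with alpha1, alpha2, alpha3 ~ U(0, 1)."""
    a1, a2, a3 = rng.random(3)
    step = a1 * ((upper - lower) * a2 + lower)
    return male + step if a3 >= 0.5 else male - step
```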

Then, the male red deer solutions are categorized into commander and stag solutions. The number of male commanders is taken as \({N}_{com}=\mathrm{round}\left(\delta \cdot {N}_{male}\right)\), where \(\delta\) is a random value in the range from 0 to 1, and the number of stags is \({N}_{stag}={N}_{male}-{N}_{com}\).

Stage 3: The fighting phase between stags and commanders

Here, every commander is allowed to fight with randomly chosen stags. In the solution space, the group solution of commanders \({G}_{com}\) approaches that of stags \({G}_{stag}\). Accordingly, two new groups of solutions are generated as

$${G}_{{new}_{1}}=\frac{{G}_{com}+{G}_{stag}}{2}+{\beta }_{1}\times \left(\left(u-\ell \right)\times {\beta }_{2}+\ell \right),$$
(10)
$${G}_{{new}_{2}}=\frac{{G}_{com}+{G}_{stag}}{2}-{\beta }_{1}\times \left(\left(u-\ell \right)\times {\beta }_{2}+\ell \right),$$
(11)

where \({G}_{{new}_{1}}\) and \({G}_{{new}_{2}}\) denote the new solutions generated by the fighting process, \(u\) and \(\ell\) are the limits of the search space, and \({\beta }_{1}\) and \({\beta }_{2}\) are random coefficients drawn from a uniform distribution on \([0,1]\). Among the four solutions \({G}_{com}\), \({G}_{stag}\), \({G}_{{new}_{1}}\), and \({G}_{{new}_{2}}\), the one with the best cost function value \(f\) is selected as the final commander.

Stage 4: Forming harems

Here, each newly assigned commander is responsible for forming a harem. A harem consists of a male commander and a group of female deer (hinds). The hinds are distributed into separate harems in a random manner, based on the power of the commander in roaring and fighting, i.e., its fitness value. Hence, the number of hinds in harem \(i\) is calculated as

$${N}_{harem}^{i}=\mathrm{round}\left({f}_{i}\cdot {N}_{hind}\right),$$
(12)

where \({f}_{i}\) denotes the normalized power (fitness value) of commander \(i\) and \({N}_{hind}\) is the total number of hinds.

Stage 5: Mating phase

After forming the harems, there are three mating possibilities. First, the commander of harem \(i\) mates with a proportion \(\rho\) of its harem's hinds. Second, the commander mates with a proportion \(\vartheta\) of the hinds of other harems; commanders attack other harems to expand their command area. Third, each stag mates with the closest hind, regardless of harem restrictions. Through the mating phase, new offspring RDs (solutions), \({G}_{{\text{OS}}}\), are generated as

$${G}_{{\text{OS}}}=\frac{{G}_{com}+{G}_{hind}}{2}+\theta \times \left(u-\ell \right),$$
(13)

where \({G}_{com}\) and \({G}_{hind}\) are the groups of solutions representing commanders and hinds, and \(\theta\) is a random number between 0 and 1. In the third mating possibility, \({G}_{com}\) is replaced by the stag solutions \({G}_{stag}\).

Algorithm 2.
figure b

Detailed steps about the proposed enhanced red deer algorithm (ERD)

Finally, the next generation is assembled from all the best commanders (a certain percentage of the best solutions) and the best hinds from all hinds and offspring, selected via a roulette wheel. These stages are repeated until the maximum number of iterations (Maxiter) is reached and the optimal features are defined.

The proposed enhanced red deer optimizer (ERD)

The main idea of the proposed ERD is a tinkering strategy that enhances the traditional mating phase of the conventional red deer optimizer in Eq. 13. The tinkering strategy moves members from the worst harems, i.e., those with the worst fitness values, toward the "best-so-far" harems, which have the best fitness values. This strategy is applied based on the degree of diversity among harems.

The tinkering strategy

For the tinkering strategy, we need to define the worst harems \(U\) and the best harems \(Q\) according to their fitness values. The worst of the worst hinds of the \(U\) group tinker the best harems \(Q\), i.e., they join their original hinds. However, this tinkering strategy may lead to entrapment in local minima. Accordingly, we need to control the targeted number of worst harems, as follows.

(14)

where the first symbol denotes the total number of constructed harems, and \({F}_{harem}^{worst}\) and \({F}_{harem}^{best}\) denote the overall worst and best fitness values, respectively. Ï is the current iteration number out of the maximum assigned number of iterations, \(maxiter\), and ṋ is a fixed number of harems to be updated within each iteration.

After determining the worst harems, the best harems are assigned. The hinds from the worst harems are then distributed equally, in number, among the best harems, while the best of the best harems is assigned the worst hinds in fitness, simulating the effect of mutation in metaheuristic algorithms. The harems are now reformulated according to the tinkering strategy, and in the mating phase the new offspring RDs, \({G}_{{\text{OS}}}\), are generated as

$${G}_{{\text{OS}}}=\left\{\begin{array}{ll}\frac{{G}_{com}+{G}_{hind}}{2}+\theta \times \left(u-\ell \right),& \text{if } v>\zeta ,\\ \frac{{G}_{com}+{G}_{hind}+{G}_{tink}}{3}+\theta \times \left(u-\ell \right),& \text{otherwise},\end{array}\right.$$
(15)

where \({G}_{tink}\) denotes the \(\xi \%\) of new tinkering hinds, and \(v\) is the diversity in fitness between the tinkered harem and the tinkering harem, defined as

(16)

where \({f}_{{harem}_{i}}\) denotes the set of fitness values of the hinds in harem \(i\). Algorithm 2 summarizes the main steps of the proposed ERD.
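A direct transcription of the tinkered mating rule of Eq. 15 reads as follows; the averaging of \(G_{tink}\) into the offspring follows the equation, while the default value of `zeta` is an assumption:

```python
import numpy as np

def tinkered_offspring(g_com, g_hind, g_tink, lower, upper, v, zeta=0.5,
                       rng=np.random.default_rng()):
    """Tinkered mating of Eq. 15: when the fitness diversity v between the
    tinkered and tinkering harems is at most zeta, the tinkering hinds
    G_tink contribute to the offspring."""
    theta = rng.random()
    if v > zeta:
        return (g_com + g_hind) / 2.0 + theta * (upper - lower)
    return (g_com + g_hind + g_tink) / 3.0 + theta * (upper - lower)
```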

The proposed 1DCNN classification model

For the final prediction stage, which differentiates between the different fault types in the wafer map, we propose a new 1DCNN-based network that receives the resultant feature pool from the proposed ERD algorithm. A 1DCNN is a specific type of CNN designed to operate on one-dimensional signals; it employs one-dimensional convolution and sub-sampling layers for feature mapping and extraction. Following the majority of CNNs, a basic 1DCNN is formed from an input layer, a CNN layer group (a convolution layer and a pooling layer), a fully connected layer, and an output layer. The vector produced by each convolutional, activation, or pooling layer can be considered a one-dimensional feature vector for the targeted 1D task. For the configuration of the proposed 1DCNN network, see Fig. 8.
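To illustrate the general layout (not the exact proposed configuration, which is specified in Fig. 8), a generic Keras 1DCNN over a selected feature vector might look as follows; the filter counts and dense width are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_1dcnn(n_features, n_classes):
    """Basic 1DCNN: input -> Conv1D/pooling group -> fully connected -> softmax."""
    inp = layers.Input(shape=(n_features, 1))
    x = layers.Conv1D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(64, 3, activation="relu", padding="same")(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_1dcnn(n_features=15, n_classes=9)  # e.g., the 33x29 set's 15 features
```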

Fig. 8
figure 8

The proposed 1DCNN for the final classification stage of wafer map fault types

Experimental results and discussion

In this section, different experiments are performed to test the performance of the proposed wafer map fault detection model. Intensive visual and quantitative comparisons are introduced, together with an ablation study indicating the impact of the different steps of the introduced model.

Experimental setup. All experiments were performed in the Google Colab Pro environment using different Python libraries, such as Keras, NumPy, TensorFlow, and Sklearn. For the feature reduction stages, see Table 3 for the configuration of the SBAE and Algorithm 2 for the parameters of the proposed tinkered red deer optimization (ERD). For the final prediction stage, we used the Adam optimizer with its default parameters to find the optimum weights, with the learning rate set to 0.0001. The 1DCNN is trained with batches of 32 due to memory limitations. For model assessment (see Fig. 9), different metrics are used: accuracy, precision, recall, and F1-score.
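For reference, these metrics can be computed with scikit-learn as follows; macro averaging is an assumed choice for the multi-class setting:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def evaluation_report(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score (see Fig. 9),
    macro-averaged over the fault classes."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall":    recall_score(y_true, y_pred, average="macro"),
        "f1":        f1_score(y_true, y_pred, average="macro"),
    }
```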

Fig. 9
figure 9

The employed metrics in the evaluation process of fault detection

Comparison to the state of the art

In Table 4, a comparison is made among different wafer map fault detection methods (Jin et al., 2020; Chen et al., 2021; Yu et al., 2021a), DenseNet-GCF (Yu and Chen et al., 2022; Wang et al., 2021, 2022; Zheng et al., 2021), WDP-BNN (Alqudah et al., 2023; Zhang et al., 2022), and the proposed method. All these competitors are based on deep neural networks with supportive modules to enhance detection performance. Most of them employ cross-validation (Jin et al., 2020; Yu et al., 2021b; Chen et al., 2022) or just a train–test evaluation (Chen et al., 2021; Yu et al., 2021a; Xuen et al., 2022; Wang et al., 2022; Zhang et al., 2022). Following cross-validation, Chen et al. (2022) achieved a high accuracy of 98.34% for nine fault classes, but with a very complicated model of two 2DCNNs employing more than 53,000 samples. On the other hand, Jin et al. (2020) reported 98.43% accuracy, but for 8 fault types, employing a hybrid model of a 2DCNN with error-correcting output codes and support vector machines for classification on 20,000 samples.

Table 4 Quantitative comparison between the state-of-the-art methods and the proposed detection method

The other type of evaluation, the train–valid split, is fragile because it can easily lead to overfitting and does not guarantee that the model will perform well on unseen data from the same distribution. Wang et al. (2022), with a 95:5 split for 9 fault types on 13,435 samples, reported 98.35% accuracy but with a complicated model of a 2DCNN with residual blocks; residual blocks can also make networks more prone to overfitting. Chen et al. (2022), with an 80:20 split for 8 fault types on 33,256 samples, obtained 96% accuracy with a very complicated model of two 2DCNNs with error-correcting output codes and support vector machines for classification.

In Table 4, two methods, Zheng et al. (2021) and Alqudah et al. (2023), target the same evaluation and feature reduction concepts as ours. Zheng et al. (2021) follow a train–validation–test evaluation (60:20:20) with their own 2DCNN classifier and report an accuracy of 93.8% on a sample size of 4000 for 9 fault types; this accuracy is adequate given the limited sample size and the lack of preprocessing steps. The proposed model achieves 98.77% and 99.24% accuracy for 9 and 8 fault types, respectively, under 60:20:20 evaluation with around 16,000–18,000 samples.

Alqudah et al. (2023) followed the concept of feature reduction, but their features are extracted from the 2D wafer maps, contrary to the proposed method, which extracts them from 1D sinograms. Alqudah et al. (2023) utilized 59 feature bases in total (density-, Radon-, and geometry-based features) and reported an accuracy of 82.7%, despite using around 29,000 samples, because they employed a simple SVM classifier. In the proposed fault detection method, three different train–validation–test evaluations are introduced: 60:20:20, 36:15:49, and 23:15:62. We obtained a minimum accuracy of 98.5% when the detection model is tested with 62% unseen data for 9 fault types within around 18,000 samples and 15 features. On the other side, we gained 98.61% accuracy for the 62% unseen part for 8 fault types within around 16,000 samples and 19 features. In addition, with the proposed method, the accuracy changed only from 98.77 to 98.50% and from 99.24 to 98.61% when the unseen testing part changed from 20 to 62% for 9 and 8 fault types, respectively. This very small change proves that the proposed model generalizes well by exploiting the most suitable discriminative features through the assigned feature engineering steps (SBAE and ERD). For detailed classification reports, confusion matrices, and train–validation performance of the proposed detection method for 9 fault types, check Figs. 10, 11, and 12. From Figs. 10 and 11, we can see that the "none" fault type has the lowest metrics; it is the most confusing class, as it shares similar structures (features) with other faults (check the fault images in Table 2). In addition, the "none" class is totally unbalanced with respect to the other faults, so many studies (Alqudah et al., 2023; Chen et al., 2021; Jin et al., 2020) ignore this fault to avoid additional preprocessing steps. Figure 12 shows the train–validation performance, whose near-identical curves refute the chance of overfitting.

Fig. 10
figure 10

The confusion matrices of the proposed wafer map fault detection model for 60:20:20 evaluation (ours1) in a, 36:15:49 evaluation (ours2) in b, and 23:15:62 evaluation (ours3) in c

Fig. 11
figure 11

The corresponding classification reports of the confusion matrices in Fig. 10 of the proposed wafer map fault detection model for 60:20:20 evaluation (ours1) in a, 36:15:49 evaluation (ours2) in b, and 23:15:62 evaluation (ours3) in c

Fig. 12
figure 12

Train–validation performance over epochs of the proposed wafer map fault detection in terms of accuracy and losses for 60:20:20 evaluation (ours1) in a, 36:15:49 evaluation (ours2) in b, and 23:15:62 evaluation (ours3) in c

The impact of wafer map resolution in performance

The WM-811K dataset originally contains wafer maps of different sizes, 632 in total, varying in resolution from 6 × 21 to 300 × 202. Figure 13 shows the resolution distribution in terms of the number of wafer maps. Only 5 resolutions have more than 10,000 samples each; the largest among them is \(39\times 37\). As the wafer maps are not rich in color, i.e., each map contains only three colors, initial preprocessing steps to adjust the size can easily distort the fault type. Hence, different experiments are performed on the 7 resolutions that have more than 8000 samples. The sample counts for each resolution are given in Table 5 before and after (W/O and W/) the CAE-based synthetization model used for balancing. The "All" wafer map set groups the seven other sets and maps them to a resolution of 26 × 26. As shown in Table 5, the "none" fault type has the largest number of maps in each resolution. In addition, the resolution \(33\times 29\) is the most challenging, as it has the fewest samples in each fault type.

Fig. 13
figure 13

The resolution distribution of the maps in WM-811K

Table 5 The number of samples under each employed wafer map resolution, from the labeled maps, before and after the CAE-based synthetization model

Having different resolutions affects the structure of the fault types and, accordingly, the discriminative features, which in turn affects the recognition rate of each class and hence the final metrics. Table 6 shows the performance of different resolutions under different split ratios. As demonstrated, two resolutions have only 8 fault types (classes), \(25\times 27\) and \(27\times 25\), but they do not contain the same fault types: the resolution \(25\times 27\) misses the "Donut" fault, while \(27\times 25\) misses the "Edge-Ring" fault (see Table 4). The other resolutions have 9 fault types. In the 8-class category, the resolution \(27\times 25\) outperforms \(25\times 27\): with only 19 features, \(27\times 25\) provides 99.24% accuracy with 20% testing and 98.61% with 62% testing, versus 98.36% and 94.82% for \(25\times 27\) with 22 features. The accuracy thus changes only slightly from 99.24 to 98.61% (very good generalization) for \(27\times 25\) but sharply from 98.36 to 94.82% (good generalization) for \(25\times 27\) when the unseen part grows from 20 to 62%. As indicated by the classification reports in Fig. 14, the recognition metrics of the "Loc" and "none" classes degrade slightly at resolution \(27\times 25\). On the other side, at resolution \(25\times 27\), the recognition metrics of three classes degrade sharply, i.e., "none", "Loc", and "Edge-Loc", while recognition of "Center" degrades slightly. The main cause of degradation in recognition metrics, which affects the grouped performance, is that the extracted discriminative features fail to recognize these classes as effectively as the others because they share similar structures. See Fig. 15 for sample wafers of "none", "Loc", "Edge-Loc", and "Center".

Table 6 Quantitative comparison between the performance of different wafer map resolutions at different split ratios
Fig. 14
figure 14

Visual comparison between different resolutions of wafer maps in terms of the classification report, \(27\times 25\) (8 classes) in the first row and \(25\times 27\) (8 classes) in the second row. The first column at 60:20:20 while the second at 23:15:62 (train–validation–test) evaluation

Fig. 15
figure 15

Samples of wafer maps from fault types: “none”, “Loc”, “Edge-Loc”, and “Center”

For the resolution sets with 9 fault types, as presented in Table 6, the resolution \(33\times 29\) comes first, with the minimum feature size (15) and the best performance in terms of the recognition metrics regardless of the split ratio; it also generalizes well when the unseen part grows from 20 to 62%. The resolution \(39\times 37\) is the worst, with the largest feature size (27), the lowest metrics, and the weakest generalization. Figure 16 shows the classification reports of the resolutions \(33\times 29\) and \(39\times 37\) at 20% and 62% unseen parts. As demonstrated, the recognition metrics of "none", "Loc", and "Edge-Loc" are high for the \(33\times 29\) set, which means its fault shapes differ discriminatively from each other, contrary to the \(39\times 37\) set. This means the recognition rate depends not on higher-resolution maps but on the fault type structure: the \(39\times 37\) set has more complicated, less discriminative fault types that are not as efficiently recognizable as those of the \(33\times 29\) set. For more results on other resolution sets, see Table 10 in "Appendix".

Fig. 16
figure 16

Visual comparison between different resolutions of wafer maps in terms of the classification report, \(33\times 29\) (9 classes, 15 features) in the first row and \(39\times 37\) (9 classes, 27 features) in the second row. The first column at 60:20:20 while the second at 23:15:62 (train–validation–test) evaluation

The impact of feature engineering steps

In the proposed fault detection method, two main feature engineering techniques are used after the data balancing step. The first is sparse feature learning and encoding by the proposed SBAE, where the resultant sparse encoded feature maps are extracted from the SBAE bottleneck. Inducing sparse regularization in a traditional convolutional autoencoder enhances the reconstruction process, so an efficient sparse embedding can be obtained at the SBAE bottleneck. Figure 17 compares the performance of the SBAE with a traditional convolutional autoencoder (CAE) of the same configuration but without the sparsity regularization. As shown, without the induced sparsity, the wafer maps reconstructed by the CAE are blurry and lack fine details, contrary to the SBAE. The second feature engineering step applies the proposed tinkered red deer feature ranking (ERD) to the 1D sinograms of the sparse encoded features from the SBAE bottleneck. Figure 18 compares the convergence of fitness over iterations between the conventional red deer optimization and the proposed ERD; the better convergence and higher fitness belong to the proposed ERD.

Fig. 17
figure 17

Comparison between the performance of SBAE and CAE

Fig. 18
figure 18

The convergence of fitness over iterations for the traditional red deer in a and the tinkered version in b

The impact of the aforementioned feature engineering steps on performance can be checked across four conditions: the first uses no feature engineering step, the second and third adopt only one step (SBAE or ERD), and the fourth applies both. These conditions are compared quantitatively in Table 7. As shown, the worst performance belongs to the first condition (W/O ERD, W/O SBAE), despite having the largest 1D feature pool: the redundant information in this large pool limits the performance of the proposed 1DCNN classifier. The wafer maps most affected by the absence of the feature engineering steps are the "\(33\times 29\)" set, while the least affected are the "\(30\times 34\)" set, which means the fault type structures of the \(33\times 29\) set carry more redundant information than those of the "\(30\times 34\)" set. The second and third conditions of applying one feature engineering step, i.e., (W/ ERD, W/O SBAE) and (W/O ERD, W/ SBAE), respectively, perform better than the first condition with a smaller 1D feature pool.

Table 7 Comparison between the different cases of applying the adopted feature engineering steps, i.e., ERD and SBAE

As pointed out, in (W/ ERD, W/O SBAE), ranking the original features without the sparse learning of SBAE increases the number of features selected by the proposed ERD algorithm as it seeks the globally optimal feature ranking. The "\(30\times 34\)" set has the largest feature pool but high accuracy, while the "\(33\times 29\)" set has the smallest feature pool but low accuracy. On the other hand, relying on the sparsity-boosted features from the SBAE bottleneck without ERD, the third condition (W/O ERD, W/ SBAE), yields a fixed-size 1D feature pool that performs adequately on some sets, like \(27\times 25\) and \(33\times 29\), but fails on others, like \(26\times 26\) and \(30\times 34\). Adequate performance means the extracted features guarantee suitable discrimination between classes, while failing means the features are not discriminative enough. Finally, the fourth condition (W/ ERD, W/ SBAE) achieves the smallest 1D feature pool sizes with the best performance: the sparsity induced in the SBAE yields sparser fine details that simplify ERD's task of finding the global optimum with the fewest features. For more quantitative results of the four conditions at other resolutions, see Table 11 in "Appendix". Figure 19 presents the classification reports of the "\(26\times 26\)" map set under the four conditions, and Fig. 20 shows the corresponding train–validation performance over epochs; a small gap between training and validation means good generalization, while a large gap means poor generalization.

Fig. 19
figure 19

Visual classification reports comparison of different conditions of applying the feature engineering steps, i.e., SBAE and ERD, to wafer maps of size \(26\times 26\); a W/O ERD and W/O SBAE (676 features), b W/ ERD and W/O SBAE (73 features), c W/O ERD and W/ SBAE (169 features), and d W/ ERD and W/ SBAE (22 features)

Fig. 20
figure 20

Train–validation performance of different conditions of applying the feature engineering steps, i.e., SBAE and ERD, to wafer maps of size \(26\times 26\); a W/O ERD and W/O SBAE (676 features), b W/ ERD and W/O SBAE (73 features), c W/O ERD and W/ SBAE (169 features), and d W/ ERD and W/ SBAE (22 features)

The impact of the proposed ERD compared to other metaheuristic algorithms

Here, the impact of the enhanced tinkered red deer algorithm (ERD) is discussed. Table 8 compares different metaheuristic algorithms with the proposed ERD. A metaheuristic algorithm (Abdel-Basset et al., 2018) is a high-level procedure or heuristic designed to find, generate, tune, or select a heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem. In this comparison, Genetic (GA; Sohail, 2023), Equilibrium (EO; Altantawy & Kishk, 2023; Houssein et al., 2022), Grey Wolf (GWO; Faris et al., 2018), Sine Cosine (SCA; Zhou et al., 2022), and Particle Swarm (PSO; Shami et al., 2022) algorithms are utilized. As demonstrated in Table 8, all the assigned metaheuristic algorithms provide superior performance in fault type prediction, but the proposed ERD yields by far the smallest feature pool size at approximately the same accuracy. For the 8-fault-type detection problem, the average drop in accuracy from the best performer is 0.63%, while for the 9-fault-type problem it is 1.22%. In the "All" set, the accuracy degrades slightly compared to the other sets because resizing the wafer maps to a fixed common size distorts the shape of the fault patterns. Table 8 presents the results for the resolution sets \(26\times 26\), \(27\times 25\), and All; for more results on other resolution sets, see Table 12 in "Appendix". For a fair comparison in Tables 8 and 12, we used the same population size and sought the parameters that maintain the best possible fitness score for all employed metaheuristic algorithms.

Table 8 Comparison of different common metaheuristic algorithms for the proposed fault detection in wafer maps

The main reason the proposed ERD selects fewer features is its general tendency to emphasize exploitation more than other metaheuristic algorithms, such as GA, PSO, EO, GWO, and SCA. This emphasis on exploitation arises from the mating behavior of red deer, where the dominant stag (leader) mates with the most fertile hinds. This mechanism symbolizes the selection of features with higher fitness, driving the algorithm toward refining a smaller set of relevant features. In contrast, algorithms like GA and PSO tend to explore the search space more extensively, potentially selecting a larger number of features, because their mechanisms encourage the exploration of diverse solutions and the recombination of features, which can introduce less relevant features into the selected set. While ERD's focus on exploitation can be beneficial in reducing the risk of overfitting, it may also limit the algorithm's ability to capture complex relationships between features, which could affect its performance on datasets where such relationships are crucial for accurate prediction or classification.

The impact of the classification stage

For the prediction stage, different common 1D deep networks (Kiranyaz et al., 2021) are tested against the proposed 1DCNN: 1D-VGG16, 1D-ResNet50, 1D-LeNet-5, and 1D-Inception. Table 9 compares these networks. As demonstrated, 1D-VGG16 provides the best average accuracy of 98.27%, but with three times as many parameters as the proposed 1DCNN, which comes second with an average accuracy of 98.08%. The most parameters and the worst performance belong to 1D-ResNet50 (16 M, 96.68%), while the fewest parameters, 16 K, belong to 1D-Inception, with an average accuracy of 97.23%. The introduced 1D-LeNet-5 shows an average accuracy of 97.96% with 240 K parameters. For more quantitative results, see Table 13 in "Appendix". Figure 21 gives a visual comparison between the aforementioned classifiers in terms of the classification report; the main difference between classifiers affecting the total performance lies in the recognition metrics of "none", "Loc", and "Edge-Loc", the most difficult fault types to recognize. In Fig. 22, the training and validation losses are shown over epochs. As shown, the proposed 1DCNN and 1D-LeNet-5 demonstrate the lowest losses and the smallest gap between validation and training, indicating good generalization.

Table 9 Comparison of different common 1D CNNs for the proposed fault detection in wafer maps
Fig. 21
figure 21

Visual classification reports of different 1D deep classifiers for the prediction of fault type in wafer maps with resolution of \(26\times 26\). a 1D-VGG-16, b 1D-ResNet50, c 1D-LeNet-5, d 1D-Inception, and e the proposed 1DCNN

Fig. 22
figure 22

Training and validation losses over epochs of different 1D deep classifiers for the prediction of fault type in wafer maps with resolution of \(26\times 26\). a 1D-VGG-16, b 1D-ResNet50, c 1D-LeNet-5, d 1D-Inception, and e the proposed 1DCNN

Conclusion

In this paper, a hybrid deep model for fault type prediction in wafer maps is proposed. The proposed model targets three objectives. The first is overcoming the highly imbalanced dataset. The second is obtaining more discriminative, reduced features in 1D form instead of the 2D form of the original wafer maps. The third is an effective classifier that achieves an adequate balance between accuracy and complexity. For the first objective, a new unsupervised CAE-based synthetization model is proposed, which succeeds in reconstructing the inserted wafer maps with a very low loss of 0.0011. For the second objective of more discriminative features, a new sparsity-boosted autoencoder (SBAE) is first proposed to obtain sparse encoded maps with a 50% reduction in spatial size compared to the original maps; then, an enhanced tinkered red deer optimization (ERD) is applied to 1D sinograms of the obtained sparse maps to produce an average final 1D feature pool of ~ 25 feature bases (~ 1.5% of the original maps). In an ablation study, the adopted feature engineering steps prove effective at finding the fewest possible feature bases with high accuracy, especially when compared to other metaheuristic algorithms, such as Genetic (GA), Equilibrium (EO), Grey Wolf (GWO), Sine Cosine (SCA), and Particle Swarm (PSO) algorithms. For the third objective, a new 1DCNN model is proposed for the 9- and 8-fault-type prediction, achieving an average accuracy of 98.1% with 180 K trainable parameters. The proposed 1DCNN is compared with other common 1DCNNs, such as 1D-VGG16 (98.26%, 590 K), 1D-ResNet50 (96.7%, 16 M), 1D-LeNet-5 (98%, 240 K), and 1D-Inception (97.23%, 16 K). Despite its achievements, the proposed wafer map inspection model, being a hybrid deep model, still increases the computational cost of the inspection procedure. Accordingly, as future work, we intend to reduce the complexity of the feature engineering steps and to extend the proposed detection model to other classification problems in wafer maps and to other datasets.