1 Introduction

Chip quality is crucial to the semiconductor industry. To guarantee it, thousands of test parameters have been defined, and a single test process typically covers hundreds or even thousands of chips. Hence, one batch of test data contains at least hundreds, and often thousands, of floating-point values. The units of these test parameters vary, and the relationships among them are complex, so the test data set represents a typical multivariable, nonlinear, coupled system. Useful information, such as chip quality, the relationships among fault parameters, and the fault distribution, is hidden inside this “data sea” and is difficult to extract. Essentially, obtaining useful information from a test data set is a problem of nonlinear multivariate series analysis, which has been a challenge for decades.

The conventional data-driven approaches to process monitoring are multivariate statistical analysis (MSA) (Amin et al. 2018; He and Tan 2018; Tootooni et al. 2018; Jiang et al. 2017), support vector machines (SVM) (Nguyen and Lee 2018; Du et al. 2018; Ma et al. 2017; Galmeanu et al. 2016) and inductive data mining (IDM). Partial least squares (PLS) (Nomikos and MacGregor 1995; Dong et al. 2015; Foo et al. 2018) and principal component analysis (PCA) (Wold et al. 1987; Gharahbagheri et al. 2017; Dong and Qin 2018; Bakdi and Kouadri 2018; Burgas et al. 2018; Cai and Zhang 2016) are the representative techniques of MSA. Both methods are challenged by processes with non-Gaussian distributions and by nonlinear multivariate time series. Independent component analysis (ICA) focuses on non-Gaussian characteristics (Lee et al. 2004; Lee et al. 2006; Wang et al. 2012; Habchi et al. 2018). Neural-network-related PCA (NN-PCA) (Chen and Liao 2002; Dong and McAvoy 1996), kernel principal component analysis (KPCA) (Alcala and Qin 2010; Ge et al. 2009; Lee et al. 2004; Lee et al. 2007; Mania Navi et al. 2018; Wang et al. 2017; Arroyo-Hernandez 2016) and kernel independent component analysis (KICA) (Zhang 2008; Dyer et al. 2017; Li et al. 2016) can be applied to nonlinear problems. SVM (Jemwa and Aldrich 2005; Zeng et al. 2006) establishes a robust nonlinear model that offers accurate quantitative predictions in classification.

These common methods reduce the dimension of their input variables. Their first step is to select the ‘primary’ and ‘useful’ elements and omit the others. But are the omitted elements really useless? Investigation shows that most chip faults arise from seemingly insignificant equipment. It is also difficult to select an appropriate function for choosing the “kernel” parameters. Therefore, all the data should be analyzed simultaneously to obtain an accurate picture of the operating state of the system.

It is thus a challenge to find an appropriate method that handles huge, nonlinear, multivariate time series simultaneously and efficiently. All fields that must deal with high-dimensional nonlinear series data face the same challenge. In the process industry, a color-spectrum is defined by coloring the monitoring data set using the RGB color model (Kai et al. 2015; Gao et al. 2016). This method has the advantage of displaying the whole data set in one picture; that is, it avoids selecting “primary” parameters. Regardless of the number of parameters, each parameter has the same significance and can be taken into account by analyzing the color-spectrum.

Hence, considering the special requirements of semiconductor chip quality analysis, the color-spectrum method is developed further. By combining the theory of the Shewhart-type control chart (Tong et al. 2004; Djauhari et al. 2017; Iglesias et al. 2016) with scientific data visualization (Montgomery 2005; Chu and Wu 2016), a new method named quality-spectrum is proposed. The chip data set is a huge, nonlinear, multivariate series that contains all the information about chip quality. The primary aim of analyzing the test data set is to explore the distribution of its abnormal data. The advantage of the Shewhart-type control chart is that it classifies data into different regions, while scientific visualization can display in a picture the unseen information hidden inside the data sea (Brodlie et al. 1992). By analyzing the statistical characteristics of each series individually, the Shewhart-type control chart bounds the value regions, collected in Q_Region, for the whole test data set. It thus separates the test data into several groups, from worst to best. The quality-spectrum is formed by coloring the groups with different colors. In this way, a complicated floating-point data set is transformed into a simple digital color image, and the distribution of chip quality can be assessed qualitatively by observing the quality-spectrum.

The outline of this paper is as follows. In Sect. 2, a matrix Q_Region representing the filters for different quality degrees is defined. In Sect. 3, the quality-spectrum is defined. In Sect. 4, a use case verifies the validity of the method. Sect. 5 concludes the paper.

2 Quality classified matrix

Suppose a chip’s quality is characterized by n test parameters and a test batch includes m chips. The test data set of the batch is then an m × n floating-point matrix.

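Definition 1. The test data of a batch of chips can be arranged as a test data matrix X, an m × n matrix, as Eq. 1 shows.

$$ X = \left[ {\begin{array}{*{20}c} {x_{11} } & {x_{12} } & \cdots & {x_{1n} } \\ {x_{21} } & {x_{22} } & \cdots & {x_{2n} } \\ \vdots & \vdots & \ddots & \vdots \\ {x_{m1} } & {x_{m2} } & \cdots & {x_{mn} } \\ \end{array} } \right] $$
(1)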

In Eq. 1, each element \( x_{ij} (x_{ij} \in X,1 \le i \le m,1 \le j \le n,i,j \in N) \) is a single test value, where i is the chip’s ID and j indexes the test parameter.

Each test parameter has a defined perfect value. The best possible chip quality is therefore described by a 1 × n vector named Perfect_vector, as Eq. (2) shows.

(2)

Perfect_vector is the ideal quality value for the test data and serves as the baseline for real test data. A tolerance region is formed around Perfect_vector, delimited by two vectors. One is the Up_quality_vector, as Eq. (3) shows.

(3)

The other is a Low_quality_vector as Eq. (4) shows.

(4)

Based on Eqs. (3) and (4), the baseline for perfect-quality chips follows Eq. (5).

$$ X_{Perfect} :\theta_{j} - \delta_{j} < x_{ij} < \theta_{j} + \gamma_{j} ,x_{ij} \in X $$
(5)

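Suppose k chips satisfy the baseline. They form a k × n perfect data set \( X_{Perfect} \), as Eq. 6 shows.

$$ X_{Perfect} = \left[ {\begin{array}{*{20}c} {x_{11} } & {x_{12} } & \cdots & {x_{1n} } \\ {x_{21} } & {x_{22} } & \cdots & {x_{2n} } \\ \vdots & \vdots & \ddots & \vdots \\ {x_{k1} } & {x_{k2} } & \cdots & {x_{kn} } \\ \end{array} } \right] $$
(6)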

Each column \( x_{j} \in X_{Perfect} \) is the vector of the k chips’ test data for parameter j. The mean of each column is the benchmark for parameter j, and the standard deviation of each column is used to measure the distance of a test value from its benchmark.
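The selection of benchmark chips described by Eq. (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a chip qualifies only if every one of its parameters lies strictly inside its band, and the function name `select_perfect` and the sample values are hypothetical.

```python
def select_perfect(X, theta, delta, gamma):
    """Return the rows of X whose every value lies strictly inside the
    band theta_j - delta_j < x_ij < theta_j + gamma_j (assumed reading of Eq. 5)."""
    perfect = []
    for row in X:
        if all(t - d < x < t + g for x, t, d, g in zip(row, theta, delta, gamma)):
            perfect.append(row)
    return perfect


# Hypothetical batch: 4 chips (rows), 2 test parameters (columns).
X = [
    [1.00, 5.1],
    [1.30, 5.0],  # parameter 1 is outside its band
    [0.95, 4.9],
    [1.02, 6.0],  # parameter 2 is outside its band
]
theta = [1.0, 5.0]  # perfect values (hypothetical)
delta = [0.1, 0.3]  # lower tolerances (hypothetical)
gamma = [0.1, 0.3]  # upper tolerances (hypothetical)

X_perfect = select_perfect(X, theta, delta, gamma)
print(X_perfect)  # [[1.0, 5.1], [0.95, 4.9]]
```

Here k = 2 of the m = 4 chips pass the filter and form the benchmark set.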

Definition 2. Chip_E is a 1 × n vector composed of the n parameters’ benchmarks, as Eq. 7 shows.

$$ Chip\_E = \left[ {\mu_{1} ,\mu_{2} , \ldots ,\mu_{n} } \right] $$
(7)

where \( \mu_{j} \left( {\mu_{j} = \frac{1}{k}\sum\nolimits_{i = 1}^{k} {x_{ij} ,x_{ij} \in X_{Perfect} } } \right) \) is the mean of parameter j over the k benchmark chips.

Definition 3. Chip_Std is a 1 × n vector of the parameters’ standard deviations, computed from the benchmark data matrix \( X_{Perfect} \), as Eq. 8 shows.

$$ Chip\_Std = \left[ {\sigma_{1} ,\sigma_{2} , \ldots ,\sigma_{n} } \right] $$
(8)

where \( \sigma_{j} = \sqrt {\frac{1}{k - 1}\sum\nolimits_{i = 1}^{k} {(x_{ij} - \mu_{j} )^{2} } } \) is the standard deviation of parameter j.
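As an illustration of Definitions 2 and 3, the sketch below computes the column means and sample standard deviations of a small benchmark matrix. The function name `chip_benchmarks` and the data are hypothetical. `statistics.stdev` divides by k − 1, where k is the number of benchmark chips (rows).

```python
from statistics import mean, stdev


def chip_benchmarks(X_perfect):
    """Return (chip_e, chip_std): per-parameter mean and sample standard
    deviation over the benchmark chips (the rows of X_perfect)."""
    columns = list(zip(*X_perfect))  # one tuple per test parameter
    chip_e = [mean(col) for col in columns]
    chip_std = [stdev(col) for col in columns]  # needs at least 2 chips
    return chip_e, chip_std


# Hypothetical benchmark set: k = 3 chips, n = 2 parameters.
X_perfect = [
    [1.0, 4.0],
    [2.0, 6.0],
    [3.0, 8.0],
]
chip_e, chip_std = chip_benchmarks(X_perfect)
print(chip_e)    # [2.0, 6.0]
print(chip_std)  # [1.0, 2.0]
```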

Chip quality can be divided into h degrees: for each test parameter, there are h quality grade intervals between perfect and faulty. Figure 1 shows the arrangement of the quality grade intervals for test parameter j.

Fig. 1 The quality region of parameter j

As Fig. 1 shows, the deviation of parameter j from its mean is the accepted quality standard, following the theory of the Shewhart control chart. In this way, the quality regions of a chip can be defined from the test data set.

Definition 4. Q_Region is a matrix containing the upper and lower control limits of all test parameters, as Eq. 9 shows.

(9)

In Eq. (9), h denotes the number of quality regions. Q_Region gives the quality lines that classify the chip test data set, as Fig. 2 shows.

Fig. 2 Semiconductor chip’s multiple quality region

By calculating the mean and standard deviation of each parameter from the benchmark matrix \( X_{Perfect} \), the quality classification standard Q_Region is obtained. As discussed above, the test data of a batch of semiconductor chips form a data matrix X of m × n floating-point values. The number of test variables n is generally on the order of thousands. Using Q_Region as a filter, each element of the test data set is classified, and chip quality is divided into h categories following the definition of Q_Region in Eq. (9). The floating-point test data set is thereby transformed into an integer quality classifier matrix Q, as Eq. (10) shows.

(10)

As Eq. (10) shows, each element of the data matrix X is marked with a specific grade number, so the data matrix is transformed into the quality classifier matrix Q.
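The conversion from X to Q can be sketched as below. The exact control limits of Q_Region are not reproduced here, so the sketch makes a loud assumption: grade g covers deviations between g and g + 1 standard deviations from the benchmark mean, and everything beyond the last band receives the worst grade h − 1. The function name `classify`, the default h = 7 and the data are hypothetical.

```python
def classify(X, chip_e, chip_std, h=7):
    """Map each float x_ij to an integer grade in 0..h-1.

    ASSUMPTION (not taken from the paper): the grade is the number of whole
    standard deviations between x_ij and the benchmark mean, capped at h - 1.
    """
    Q = []
    for row in X:
        q_row = []
        for x, mu, sigma in zip(row, chip_e, chip_std):
            if sigma == 0:
                # Degenerate benchmark: any deviation counts as the worst grade.
                grade = 0 if x == mu else h - 1
            else:
                grade = min(int(abs(x - mu) / sigma), h - 1)
            q_row.append(grade)
        Q.append(q_row)
    return Q


chip_e = [2.0, 6.0]    # hypothetical benchmark means
chip_std = [1.0, 2.0]  # hypothetical benchmark standard deviations
X = [
    [2.5, 6.0],
    [4.2, 1.0],
    [30.0, 9.9],
]
Q = classify(X, chip_e, chip_std)
print(Q)  # [[0, 0], [2, 2], [6, 1]]
```

A different band width or an asymmetric set of limits would only change the grade computation; the output remains an m × n integer matrix.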

3 Chip’s quality-spectrum

3.1 Form quality-spectrum

Since human eyes are more sensitive to color changes than to floating-point numbers, the distribution of chip quality is easy to observe once the quality classifier matrix Q is colored. Visualizing data values with color corresponds to mapping a single parameter distribution to color (McCormick et al. 1987). The advantage of color expression is that the number of noticeable differences is high. The RGB color model, a typical color model, is composed of three basic colors: red, green and blue. It is a three-dimensional color phase space \( R \times G \times B \in C^{3} \). Fig. 3 shows the RGB color cube, with red values increasing to the left along the horizontal x-axis, blue increasing to the lower right along the y-axis, and green increasing towards the top along the vertical z-axis. The origin of (r, g, b) is black (0, 0, 0), which is the vertex hidden from view. Since the elements of \( \varPi \) are the color indices corresponding to the data matrix X, the matrix can be colored according to the RGB color cube.

Fig. 3 RGB color cube

Based on the theory of data visualization and the RGB color cube, a color table \( \varPi \in C^{3} \) is defined to color the quality classifier matrix Q, as Table 1 shows.

Table 1 Color table

The color rule is \( \chi :x_{ij} \to \varPi \subset C^{3} \), which maps a sample value \( x_{ij} \in Q \) to the color phase space \( \varPi \subset C^{3} \). \( \varGamma \) is generated by rendering the data matrix mapped through the color table \( \varPi \), \( p_{ij} \in \varPi ,i \in [1,m],\;j \in [1,n] \), using a special RGB model \( \xi \), as Eq. 11 shows.

$$ \xi :Q \times \varPi \to \varGamma $$
(11)

Definition 5. Quality-spectrum \( \varGamma \) is a two-dimensional digital image which represents the distribution of abnormal monitoring data, as Eq. 12 shows.

$$ \varGamma = \left\{ {Pixel\_color_{ij}\left| {Pixel\_color_{ij} } \right. = \left\{ {\begin{array}{*{20}c} {Green} & {x_{ij} = 0} \\ {Yellow} & {x_{ij} = 1} \\ {White} & {x_{ij} = 2} \\ {Blue} & {x_{ij} = 3} \\ {Cyan} & {x_{ij} = 4} \\ {Black} & {x_{ij} = 5} \\ {Red} & {x_{ij} = 6} \\ \end{array} } \right.,\quad \forall x_{ij} \in Q,i \in [1,m],j \in [1,n]} \right\} $$
(12)

The quality-spectrum defines green as the first level, which represents perfect chip quality, and red as the worst level, which is absolutely unacceptable. Yellow, white, blue, cyan and black represent quality levels decreasing from near perfect to near failure. The quality state of the chips can be observed intuitively by converting the test data matrix into a quality-spectrum.
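The color mapping of Eq. (12) can be sketched as below. The grade-to-color assignment follows Eq. (12), while the 8-bit RGB triples are an assumption (pure colors of the RGB cube), since the values of Table 1 are not reproduced here. The helper names `quality_spectrum` and `to_ppm` are hypothetical, and the plain-text PPM (P3) format is used only because it needs no imaging library.

```python
# Grade -> color as in Eq. (12); the RGB triples are ASSUMED pure 8-bit colors.
COLOR_TABLE = {
    0: (0, 255, 0),      # Green  (perfect)
    1: (255, 255, 0),    # Yellow
    2: (255, 255, 255),  # White
    3: (0, 0, 255),      # Blue
    4: (0, 255, 255),    # Cyan
    5: (0, 0, 0),        # Black
    6: (255, 0, 0),      # Red    (worst)
}


def quality_spectrum(Q):
    """Replace every grade in Q with its RGB pixel."""
    return [[COLOR_TABLE[q] for q in row] for row in Q]


def to_ppm(pixels):
    """Serialize a pixel grid as a plain-text PPM (P3) image."""
    height, width = len(pixels), len(pixels[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in pixels:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


Q = [[0, 0], [2, 2], [6, 1]]  # hypothetical 3 chips x 2 parameters
pixels = quality_spectrum(Q)
ppm_text = to_ppm(pixels)
print(ppm_text)
```

Each row of the image corresponds to one chip and each column to one test parameter, so writing `ppm_text` to a `.ppm` file gives a picture with one pixel per test value.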

3.2 Fault distribution of parameters

There are hundreds of test parameters for each chip. The impact of each parameter on chip quality is the basis for fault tracing and quality improvement. Since the quality classifier matrix Q represents the fault degree of each parameter of each chip, the fault score of each parameter is obtained as Eq. (13) shows.

(13)

As Eq. (13) shows, the fault score of each parameter is the sum of the corresponding column of the quality classifier matrix Q.
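A minimal sketch of this column sum, with a hypothetical function name and a hypothetical Q:

```python
def parameter_fault_scores(Q):
    """Sum each column of Q: one fault score per test parameter."""
    return [sum(col) for col in zip(*Q)]


# Hypothetical Q: 3 chips (rows) x 2 test parameters (columns).
Q = [[0, 0], [2, 2], [6, 1]]
param_scores = parameter_fault_scores(Q)
print(param_scores)  # [8, 3]
```

In this toy case the first parameter accumulates the larger score, so it would be the first candidate for fault tracing.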

3.3 Quantitative chip’s quality

The chip’s quality is determined by its test parameters. Suppose each test parameter has the same impact on chip quality. The fault score of a chip is then obtained by summing the quality scores of all its test parameters, as Eq. (14) shows.

(14)

Therefore, chip quality can be calculated quantitatively from the fault score: the lower a chip’s fault score, the better its quality.
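The row sum and the resulting ordering of chips can be sketched as follows, under the equal-weight assumption stated above. The function names and the sample Q are hypothetical.

```python
def chip_fault_scores(Q):
    """Sum each row of Q: one fault score per chip (equal parameter weights)."""
    return [sum(row) for row in Q]


def rank_chips(Q):
    """Return chip indices ordered from best (lowest score) to worst."""
    scores = chip_fault_scores(Q)
    return sorted(range(len(Q)), key=lambda i: scores[i])


# Hypothetical Q: 3 chips (rows) x 2 test parameters (columns).
Q = [[6, 1], [0, 0], [2, 2]]
chip_scores = chip_fault_scores(Q)
ranking = rank_chips(Q)
print(chip_scores)  # [7, 0, 4]
print(ranking)      # [1, 2, 0]
```

If some parameters were known to matter more than others, a weighted row sum would be the natural extension, but that goes beyond the equal-impact assumption used here.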

4 Use case

A batch of chip test data looks like Table 2. The number of tested chips is in the thousands, and there are hundreds of test parameters. Furthermore, the data are floating-point values. It is difficult to judge chip quality directly from Table 2.

Table 2 Chip’s test data set

Following the definitions and algorithm proposed in Sects. 2 and 3, the chip test data set is first converted to a quality classifier matrix Q. Then a quality-spectrum is formed, as Fig. 4 shows.

Fig. 4 The semiconductor chip’s quality-spectrum

Figure 4 shows the quality-spectrum of a batch of chips. It includes 1000 chips, each with 543 test parameters. The green pixels display the distribution of perfect test data, and the red pixels mark the worst. The color bar on the right side of Fig. 4 displays the color transition from green to red, which represents test data quality decreasing from best to worst. Defective chips and the distribution of problematic parameters can thus be observed intuitively. Furthermore, the quality-spectrum can be analyzed quantitatively by calculating the fault scores of its parameters and chips, as Eqs. (13) and (14) show. Figure 5 shows the fault score of each test parameter, giving a comprehensive view of the faults of all 543 test parameters.

Fig. 5 makes it easy to see the impact of each test parameter on chip quality, which is useful for fault root tracing and fault diagnosis. The fault scores provide a comprehensive and quantitative assessment of the impact of all test parameters. In this case, the fault score of each chip represents its quality across all test parameters. The fault scores of all chips are shown in Fig. 6.

Fig. 5 Fault score of test parameter

Fig. 6 Chip’s fault score

As Fig. 6 shows, many chips have high fault scores, which indicates hidden critical faults that degrade chip quality. A double check of the chip production process confirmed that the faults do exist. These faults are subtle and deeply hidden, and would be difficult to notice without Fig. 6.

5 Conclusions

The advantage of the quality-spectrum is its ability to handle multivariable series data sets regardless of their dimension. The quality-spectrum displays the quality information of a huge, high-dimensional data set as a digital color picture, with each test datum transformed into a specific color. It is easier to obtain useful information by observing the quality-spectrum than from a huge floating-point data set, because human eyes are more sensitive to color changes than to floating-point numbers. Consequently, no parameter needs to be neglected, whereas neglecting parameters is a necessary step in conventional methods. Furthermore, the quality-spectrum can be analyzed with the abundant methods of digital image processing; the proposed method thus bridges chip quality analysis and digital image processing. The use case in this paper, which uses fault scores to reveal the quality patterns hidden inside a huge floating-point data set, is a preliminary exploration.