1.1 Introduction

Fault detection and diagnosis (FDD) technology is a scientific field that emerged in the middle of the twentieth century with the rapid development of science and data technology. It manifests itself as the accurate sensing of abnormalities in the manufacturing process, or the health monitoring of equipment, sites, or machinery at a specific operating site. FDD includes abnormality monitoring, abnormal cause identification, and root cause location. Through qualitative and quantitative analysis of field process data and historical data, operators and managers can detect alarms that affect product quality or cause major industrial accidents, which helps to cut off failure paths and repair abnormalities in a timely manner.

1.1.1 Process Monitoring Method

In general, the FDD technique is divided into several parts: fault detection, fault isolation, fault identification, and fault diagnosis (Hwang et al. 2010; Zhou and Hu 2009). Fault detection determines whether a fault has appeared. Once a fault (or error) has been successfully detected, damage assessment needs to be performed, i.e., fault isolation (Yang et al. 2006). Fault isolation lies in determining the type, location, magnitude, and time of the fault (i.e., the observed out-of-threshold variables). It should be noted that fault isolation here does not mean isolating specific components of a system in order to stop errors from propagating. In that sense, the term fault identification may be a better choice, as it also conveys the ability to determine how the fault changes over time. Isolation and identification are commonly used in the FDD process without strict distinction. Fault diagnosis determines the cause of the observed out-of-threshold variables; in this book, it is therefore also called fault root tracing. During the process of fault tracing, efforts are made to locate the source of the fault and find the root cause.

FDD involves control theory, probability and statistics, signal processing, machine learning, and many other research areas. Many effective methods have been developed, and they are usually classified into three categories: knowledge-based, analytical, and data-driven (Chiang et al. 2001). Figure 1.1 shows the classification of fault diagnosis methods.

Fig. 1.1 Classification of fault diagnosis methods

(1) Analytical Method

The analytical model of an engineering system is obtained from its mathematical and physical mechanism. Analytical model-based methods monitor the process in real time according to mathematical models, often constructed from first principles and physical characteristics. Most analytical measures include state estimation (Wang et al. 2020), parameter estimation (Yu 1997), parity space (Ding 2013), and analytical redundancy (Suzuki et al. 1999). The analytical method appears to be relatively simple and is usually applied to systems with a relatively small number of inputs, outputs, and states. It is impractical for modern complex systems, since it is not easy to establish an accurate mathematical model due to complex characteristics such as nonlinearity, strong coupling, uncertainty, and ultra-high-dimensional input and output.

(2) Knowledge-Based Method

Knowledge-based fault diagnosis does not require an accurate mathematical model. Its basic idea is to use expert knowledge or qualitative relationships to develop fault detection rules. The common approaches mainly include fault tree diagnosis (Hang et al. 2006), expert system diagnosis (Gath and Kulkarn 2014), directed graphs, fuzzy logic (Miranda and Felipe 2015), etc. The application of knowledge-based models strongly relies on complete empirical knowledge of the process. Once the information about the diagnosed object is known from expert experience and historical data, a variety of rules for appropriate reasoning are constructed. However, the accumulation of process experience and knowledge is time-consuming and even difficult. Therefore, this method is not universal and can only be applied to engineering systems with which people are familiar.

(3) Data-Driven Method

The data-driven method is based on the rise of modern information technology. In fact, it involves a variety of disciplines and techniques, including statistics, mathematical analysis, and signal processing. Generally speaking, industrial data in the field are collected and stored by intelligent sensors. Data analysis can mine the hidden information contained in the data, establish a data model between input and output, help the operator monitor the system status in real time, and achieve the purpose of fault diagnosis. Data-driven fault diagnosis methods can be divided into three categories: signal processing-based, statistical analysis-based, and artificial intelligence-based (Zhou et al. 2011; Bersimis et al. 2007). The commonality of these methods is that high-dimensional variables are projected into a low-dimensional space while extracting the key features of the system. The data-driven method does not require an accurate model, so it is more universal.

Both analytical techniques and data-driven methods have their own merits, but also certain limitations. Therefore, a fusion-driven approach combining mechanistic knowledge and data can compensate for the shortcomings of a single technique. This book explores the fault detection, fault isolation/identification, and fault root tracing problems mainly with multivariate statistical analysis as the mathematical foundation.

1.1.2 Statistical Process Monitoring

Fault detection and diagnosis based on multivariate statistical analysis has developed rapidly, and a large number of results have emerged recently. This class of methods, based on historical data, uses multivariate projection to decompose the sample space into a low-dimensional principal component subspace and a residual subspace. Then the corresponding statistics are constructed to monitor the observation variables. Thus, this method is also called the latent variable projection method.

(1) Fault Detection

The common multivariate statistical fault detection methods include principal component analysis (PCA), partial least squares (PLS), canonical correlation analysis (CCA), canonical variate analysis (CVA), and their extensions. Among them, PCA and PLS, as the most basic techniques, are usually used for monitoring processes with Gaussian distributions. These methods usually use Hotelling’s \({\mathrm{{T}}}^{2}\) and squared prediction error (SPE) statistics to detect variation of process information.

It is worth noting that these techniques extract the process features by maximizing the variance or covariance of process variables. They only utilize the information of first-order statistics (mathematical expectation) and second-order statistics (variance and covariance) while ignoring higher order statistics (higher order moments and higher order cumulants). Actually, few processes in practice follow the Gaussian distribution. The traditional PCA and PLS are unable to extract effective features from non-Gaussian processes because they omit the higher order statistics, which reduces the monitoring efficiency.

Numerous practical production conditions, such as strong nonlinearity, strong dynamics, and non-Gaussian distribution, make it difficult to directly apply the basic multivariate monitoring methods. To solve these practical problems, various extended multivariate statistical monitoring methods have flourished. For example, to deal with the process dynamics, dynamic PCA and dynamic PLS methods have been developed, which take into account the autocorrelation and cross-correlation among variables (Li and Gang 2006). To deal with the non-Gaussian distribution, independent component analysis (ICA) methods have also been developed (Yoo et al. 2004). To deal with the process nonlinearity, some extended kernel methods such as kernel PCA (KPCA), kernel PLS (KPLS), and kernel ICA (KICA) have emerged (Cheng et al. 2011; Zhang and Chi 2011; Zhang 2009).
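For instance, dynamic PCA operates on a time-lagged version of the data matrix so that autocorrelation is captured. The following minimal sketch (with synthetic placeholder data and an illustrative lag order) shows how such an augmented matrix can be built before ordinary PCA monitoring is applied.

```python
import numpy as np

def augment_with_lags(X, lags=2):
    """Stack each sample with its previous `lags` samples (input for dynamic PCA)."""
    n, m = X.shape
    cols = [X[lags - i: n - i] for i in range(lags + 1)]  # X_t, X_{t-1}, ..., X_{t-lags}
    return np.hstack(cols)                                # shape (n - lags, m * (lags + 1))

X = np.random.default_rng(0).standard_normal((100, 3))    # placeholder process data
print(augment_with_lags(X, lags=2).shape)                 # (98, 9)
```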

(2) Fault Isolation or Identification

A common approach for isolating faults is the contribution plot. It is an unsupervised approach that uses only the process data to find faulty variables and does not require other prior knowledge. Successful isolation based on the contribution plot relies on the following properties: (1) each variable has the same mean contribution value under normal operation and (2) the faulty variables have much larger contribution values under fault conditions than the other, normal variables. Alcala and Qin summarized the commonly used contribution plot techniques, such as complete decomposition contributions (CDC), partial decomposition contributions (PDC), and reconstruction-based contributions (RBC) (Alcala and Qin 2009, 2011).
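As a minimal numerical sketch (not the exact formulation of the cited papers), the snippet below computes CDC- and RBC-style contributions of the SPE index for one sample, assuming a PCA loading matrix is already available; all variable names and data are illustrative.

```python
import numpy as np

# Assumed inputs (illustrative): x is one scaled sample (m,), P holds the k
# retained PCA loadings (m, k); both would normally come from a fitted model.
rng = np.random.default_rng(0)
m, k = 5, 2
P = np.linalg.qr(rng.standard_normal((m, k)))[0]   # orthonormal loading matrix
x = rng.standard_normal(m)

C_res = np.eye(m) - P @ P.T            # projection onto the residual subspace
spe = x @ C_res @ x                    # SPE = ||(I - P P^T) x||^2

# Complete decomposition contribution of SPE: squared residual of each variable
cdc = (C_res @ x) ** 2

# Reconstruction-based contribution of SPE: squared residual rescaled by the
# corresponding diagonal element of the projection matrix
rbc = (C_res @ x) ** 2 / np.diag(C_res)

print(np.isclose(cdc.sum(), spe))      # the CDCs sum exactly to SPE
print(np.argmax(rbc))                  # variable singled out as the likely fault
```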

However, contribution plots usually suffer from the smearing effect, a situation in which non-faulty variables show larger contribution values while the contribution values of the faulty variables are smaller. Westerhuis et al. pointed out that one variable may affect other variables during the execution of PCA, thus creating a smearing effect (Westerhuis et al. 2000). Kerkhof et al. analyzed the smearing effect in three types of contribution indices, CDC, PDC, and RBC, respectively (Kerkhof et al. 2013). It was pointed out that, from the perspective of mathematical decomposition, the smearing effect is caused by the compression and expansion operations on the variables, so it cannot be avoided during the transformation of data from the measurement space to the latent variable space. In order to eliminate the smearing effect, several new contribution indices have been given based on dynamically calculating the average value of the current and previous residuals (Wang et al. 2017).

If the collected historical data have been previously categorized into separate classes, where each class pertains to a particular fault, fault isolation or identification can be transformed into a pattern classification problem. Statistical methods, such as Fisher’s discriminant analysis (FDA) (Chiang et al. 2000), have also been successfully applied in industrial practice to solve this problem. FDA assigns the data into two or more classes via three steps: feature extraction, discriminant analysis, and maximum selection, as sketched below. If the historical data have not been previously categorized, unsupervised cluster analysis may classify the data into separate classes accordingly (Jain et al. 2000), e.g., via the K-means algorithm. More recently, neural network and machine learning techniques imported from statistical learning theory have been receiving increasing attention, such as the support vector data description (SVDD) covered in this book.
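As an illustration of this supervised route, the sketch below uses scikit-learn's LinearDiscriminantAnalysis as a stand-in for FDA on synthetic labeled data; the fault classes, shifts, and values are placeholders, not taken from any real process.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Synthetic labeled history: class 0 = normal, classes 1 and 2 = two known faults
X_normal = rng.standard_normal((100, 4))
X_fault1 = rng.standard_normal((100, 4)) + np.array([3.0, 0.0, 0.0, 0.0])
X_fault2 = rng.standard_normal((100, 4)) + np.array([0.0, 0.0, -3.0, 0.0])
X = np.vstack([X_normal, X_fault1, X_fault2])
y = np.repeat([0, 1, 2], 100)

# Feature extraction + discriminant analysis; prediction picks the class
# with the maximum discriminant score
fda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
x_new = np.array([[2.8, 0.1, 0.2, -0.3]])
print(fda.predict(x_new))        # most likely fault class for the new sample
```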

(3) Fault Diagnosis or Root Tracing

Fault root tracing based on the Bayesian network (BN) is a typical diagnostic method that combines mechanism knowledge and process data. The BN, also known as a probabilistic network or causal network, is a typical probabilistic graphical model. Since the end of the last century, it has gradually become a research hotspot due to its superior theoretical properties in describing and reasoning about uncertain knowledge. The BN was first proposed in 1988 by Pearl, a professor at the University of California, to solve the problem of uncertain information in artificial intelligence. A BN represents the relationships between causal variables in the form of a directed acyclic graph. In the fault diagnosis process of an industrial system, the observed variables are used as nodes containing the information about the equipment, control quantities, and faults in the system. The causal connection between variables is quantitatively described as a directed edge with a conditional probability distribution function (Cai et al. 2017). The fault diagnosis procedure with BNs consists of BN structure modeling, BN parameter modeling, BN forward inference, and BN inverse tracing, as illustrated below.
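To make the inference steps concrete, the following sketch hand-codes a two-node network (a hypothetical fault F causing an alarm symptom S) and performs forward inference P(S) and inverse tracing P(F | S = 1) by direct enumeration; the probability values are purely illustrative.

```python
# Hypothetical two-node BN: F (fault) -> S (symptom); probabilities are illustrative
p_f = 0.02                        # prior probability of the fault, P(F = 1)
p_s_given_f = {1: 0.95, 0: 0.05}  # conditional probability table P(S = 1 | F)

# Forward inference: marginal probability of observing the symptom
p_s = p_s_given_f[1] * p_f + p_s_given_f[0] * (1 - p_f)

# Inverse tracing: posterior probability of the fault given the symptom (Bayes' rule)
p_f_given_s = p_s_given_f[1] * p_f / p_s

print(f"P(S=1) = {p_s:.4f}")                # ~0.068
print(f"P(F=1 | S=1) = {p_f_given_s:.4f}")  # ~0.279
```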

In addition to probabilistic graphical models such as the BN, other causal graphical models have also developed vigorously. These approaches aim at determining the causal relationships among the operating units of the system based on hypothesis testing (Zhang and Hyvärinen 2008; Shimizu et al. 2006). A generative model (linear or nonlinear) is built to explain the data generation process, i.e., the causality. Then the direction of causality is tested under certain assumptions. The most typical one is the linear non-Gaussian acyclic model (LiNGAM) and its improved version (Shimizu et al. 2006, 2011). It has the advantage of determining the causal structure of variables without pre-specifying their causal order. All these results serve as a driving force for the development of probabilistic graphical models and play an increasingly important role in the field of fault diagnosis.

1.2 Fault Detection Index

The effectiveness of data-driven measures often depends on the characterization of process data changes. Generally, there are two types of changes in process data: common and special. Common changes are entirely caused by random noise, while special changes refer to all data changes that are not caused by common causes, such as impulse disturbances. Common process control strategies may be able to remove most of the data changes with special causes, but these strategies cannot remove the common-cause changes inherent in the process data. As process data changes are inevitable, statistical theory plays an important role in most process monitoring programs.

By defining faults as abnormal process conditions, it is easy to see that the application of statistical theory in the monitoring process actually relies on a reasonable assumption: unless the system fails, the data change characteristics remain almost unchanged. This means that the characteristics of data fluctuations, such as the mean and variance, are repeatable for the same operating conditions, although the actual values of the data may not be very predictable. The repeatability of statistical attributes allows automatic determination of thresholds for certain measures, effectively defining out-of-control conditions. This is an important step in automating the process monitoring program. Statistical process monitoring (SPM) relies on the use of normal process data to build the process model. Here, we discuss the main points of SPM, i.e., the fault detection indices.

In multivariate process monitoring, the variability in the residual subspace (RS) is typically represented by the squared sum of the residuals, namely the Q statistic or the squared prediction error (SPE). The variability in the principal component subspace (PCS) is typically represented by Hotelling’s \({\mathrm{{T}}}^{2}\) statistic. Owing to the complementary nature of the two indices, combined indices have also been proposed for fault detection and diagnosis. Another statistic that measures the variability in the RS is Hawkins’ statistic (Hawkins 1974). The global Mahalanobis distance can also be used as a combined measure of the variability in the PCS and RS. Individual tests of the PCs can also be conducted (Hawkins 1974), but they are often not preferred in practice, since one has to monitor many statistics. In this section, we summarize several fault detection indices and provide a unified representation.

1.2.1 \({\mathrm{{T}}}^{2}\) Statistic

Consider the sampled data with m observation variables \(\boldsymbol{x} = [x_1,x_2,\ldots ,x_m]\) and n observations for each variable. The data are stacked into a matrix \(\boldsymbol{X}\in \mathcal {R}^{n\times m}\), given by

$$\begin{aligned} \boldsymbol{X}=\left[ \begin{array}{cccc} x_{11} &{} x_{12} &{} \cdots &{} x_{1m}\\ x_{21} &{} x_{22} &{} \cdots &{} x_{2m}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ x_{n1} &{} x_{n2} &{} \cdots &{} x_{nm}\\ \end{array} \right] . \end{aligned}$$
(1.1)

First, the matrix \(\boldsymbol{X}\) is scaled to zero mean, and the sample covariance matrix is equal to

$$\begin{aligned} \boldsymbol{S}=\frac{1}{n-1}\boldsymbol{X}^\mathrm {T}\boldsymbol{X}. \end{aligned}$$
(1.2)

An eigenvalue decomposition of the matrix \(\boldsymbol{S}\) gives

$$\begin{aligned} \boldsymbol{S}=\bar{\boldsymbol{P}} \bar{\boldsymbol{\varLambda }} \bar{\boldsymbol{P}} ^\mathrm {T}= [\boldsymbol{ P} \;\tilde{\boldsymbol{P}}]\;diag\{\boldsymbol{\varLambda }, \tilde{\boldsymbol{\varLambda }}\}\;[\boldsymbol{ P} \;\tilde{\boldsymbol{P}}]^\mathrm {T}. \end{aligned}$$
(1.3)

This reveals the correlation structure of the covariance matrix \(\boldsymbol{S}\), where \(\bar{\boldsymbol{P}}\) is orthogonal (\(\bar{\boldsymbol{P}}\bar{\boldsymbol{P}}^{\mathrm {T}}=\boldsymbol{I}\), in which \(\boldsymbol{I}\) is the identity matrix) (Qin 2003), and

$$\begin{aligned} \boldsymbol{\varLambda }&= \frac{1}{n-1} \boldsymbol{T}^\mathrm {T}\boldsymbol{T}=diag\{\lambda _1,\lambda _2,\ldots ,\lambda _k\}\\ \tilde{\boldsymbol{\varLambda }}&= \frac{1}{n-1}{\tilde{\boldsymbol{T}}}^\mathrm {T}\tilde{\boldsymbol{T}}=diag\{\lambda _{k+1},\lambda _{k+2},\ldots ,\lambda _m\}\\ \quad \quad&\lambda _1\ge \lambda _2\ge \cdots \ge \lambda _m,\; \quad \sum _{i=1}^k \lambda _i > \sum _{j=k+1}^m \lambda _j\\ \lambda _i&= \frac{1}{n-1} \boldsymbol{t}_i^\mathrm {T}\boldsymbol{t}_i\approx \mathrm{var}(\boldsymbol{t}_i) \end{aligned}$$

when n is very large. The score vector \(\boldsymbol{t}_i\) is the i-th column of \(\bar{\boldsymbol{T}}=[\boldsymbol{T,\tilde{T}}]\). The PCS is \(S_p=\text {span}\{\boldsymbol{P}\}\) and the RS is \(S_r=\text {span}\{\tilde{\boldsymbol{P}}\}\). Therefore, the matrix \(\boldsymbol{X}\) is decomposed into a score matrix \(\bar{\boldsymbol{T}}\) and a loading matrix \(\bar{\boldsymbol{P}}=[\boldsymbol{P,\tilde{P}}]\), that is

$$\begin{aligned} \boldsymbol{X}= \bar{\boldsymbol{T}} \bar{\boldsymbol{P}}^\mathrm {T}= \hat{\boldsymbol{X}} +\tilde{\boldsymbol{X}}=\boldsymbol{TP}^\mathrm {T}+\tilde{\boldsymbol{T}} \tilde{\boldsymbol{P}}^\mathrm {T}= \boldsymbol{X PP}^\mathrm {T}+\boldsymbol{X} \left( \boldsymbol{I}-\boldsymbol{P}\boldsymbol{P}^\mathrm {T}\right) . \end{aligned}$$
(1.4)

The sample vector \(\boldsymbol{x}\) can be projected on the PCS and RS, respectively:

$$\begin{aligned}&\boldsymbol{x}=\hat{\boldsymbol{x}}+\tilde{\boldsymbol{x}}\end{aligned}$$
(1.5)
$$\begin{aligned}&\hat{\boldsymbol{x}} =\boldsymbol{P} \boldsymbol{P}^\mathrm {T}\boldsymbol{x}\end{aligned}$$
(1.6)
$$\begin{aligned}&\tilde{\boldsymbol{x}}= \tilde{\boldsymbol{P}}\tilde{\boldsymbol{P}}^\mathrm {T}\boldsymbol{x}= \left( \boldsymbol{I}-\boldsymbol{P}\boldsymbol{P}^\mathrm {T}\right) \boldsymbol{x}. \end{aligned}$$
(1.7)
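To make (1.1)–(1.7) concrete, the following sketch (with synthetic data and illustrative variable names) centers a data matrix, performs the eigendecomposition of its sample covariance, splits the loadings into PCS and RS parts, and projects one sample onto both subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 4, 2                       # samples, variables, retained PCs

# Synthetic correlated data, scaled to zero mean as in (1.1)
X = (rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))
     + 0.1 * rng.standard_normal((n, m)))
X = X - X.mean(axis=0)

S = X.T @ X / (n - 1)                     # sample covariance (1.2)

eigval, eigvec = np.linalg.eigh(S)        # eigendecomposition (1.3)
order = np.argsort(eigval)[::-1]          # eigenvalues in descending order
lam, P_bar = eigval[order], eigvec[:, order]

P, P_tilde = P_bar[:, :k], P_bar[:, k:]   # PCS and RS loadings

x = X[0]                                  # one sample vector
x_hat = P @ (P.T @ x)                     # projection onto the PCS (1.6)
x_res = (np.eye(m) - P @ P.T) @ x         # projection onto the RS (1.7)
print(np.allclose(x, x_hat + x_res))      # the decomposition (1.5) holds
```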

Assume that \(\boldsymbol{S}\) is invertible and define

$$\begin{aligned} \boldsymbol{z}=\boldsymbol{\varLambda }^{-\frac{1}{2}}\boldsymbol{P}^{\mathrm {T}}\boldsymbol{x}. \end{aligned}$$
(1.8)

The Hotelling’s \({\mathrm{{T}}}^{2}\) statistic is given by Chiang et al. (2001)

$$\begin{aligned} {\mathrm{{T}}}^{2}=\boldsymbol{z}^{\mathrm {T}}\boldsymbol{z}=\boldsymbol{x}^\mathrm {T}\boldsymbol{P} \boldsymbol{\varLambda }^{-1}\boldsymbol{P}^\mathrm {T}\boldsymbol{x}. \end{aligned}$$
(1.9)

The observation vector \(\boldsymbol{x}\) is projected into a set of uncorrelated variables \(\boldsymbol{y}\) by \(\boldsymbol{y}=\boldsymbol{P}^\mathrm {T}\boldsymbol{x}\). The rotation matrix \(\boldsymbol{P}\), taken directly from the covariance matrix of \(\boldsymbol{x}\), guarantees that the elements of \(\boldsymbol{y}\) are aligned with the principal directions of \(\boldsymbol{x}\). The scaling by \(\boldsymbol{\varLambda }^{-\frac{1}{2}}\) then produces the variables \(\boldsymbol{z}\) with unit variance. The conversion of the covariance matrix is demonstrated graphically in Fig. 1.2 for a two-dimensional observation space (\(m = 2\)) (Chiang et al. 2001).

Fig. 1.2 A graphical illustration of the covariance conversion for the \({\mathrm{{T}}}^{2}\) statistic

The \(\mathrm{{T}}^2\) statistic is a scaled squared 2-norm of an observation vector \(\boldsymbol{x}\) from its mean. An appropriate scalar threshold, determined from an appropriate probability distribution at a given significance level \(\alpha \), is used to monitor the variability of the data in the entire m-dimensional observation space. In general, it is assumed that

  • the observations are randomly sampled and subject to a multivariate normal distribution.

  • the mean vector and covariance matrix of observations sampled in the normal operations are equal to the actual ones, respectively.

Then the \(\mathrm{{T}}^2\) statistic follows a \(\chi ^{2}\) distribution with m degrees of freedom, and the threshold is given by (Chiang et al. 2001)

$$\begin{aligned} \mathrm{{T}}_{\alpha }^{2}=\chi _{\alpha }^{2}(m). \end{aligned}$$
(1.10)

The set \({\mathrm{{T}}}^{2}\le \mathrm{{T}}_{\alpha }^{2}\) is an elliptical confidence region in the observation space, as illustrated in Fig. 1.3 for two process variables. The threshold (1.10) is applied to monitor unusual changes. An observation vector projected within the confidence region indicates that the process data are in control, whereas a projection outside it indicates that a fault has occurred (Chiang et al. 2001).

Fig. 1.3 An elliptical confidence region for the \({\mathrm{{T}}}^{2}\) statistic

When the actual covariance matrix for the normal status is not known but instead estimated from the sample covariance matrix (1.2), the threshold for fault detection is given by

$$\begin{aligned} {\mathrm{{T}}}^{2}_{\alpha }=\frac{m(n-1)(n+1)}{n(n-m)}F_{\alpha }(m,n-m), \end{aligned}$$
(1.11)

where \(F_{\alpha }(m,n-m)\) is the upper 100\(\alpha \%\) critical point of the F-distribution with m and \(n - m\) degrees of freedom (Chiang et al. 2001). For the same significance level \(\alpha \), the upper in-control limit in (1.11) is larger (more conservative) than that in (1.10). The two limits approach each other as the number of observations increases \((n\rightarrow \infty )\) (Tracy et al. 1992).
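As a numerical check of (1.9)–(1.11), the sketch below computes \({\mathrm{{T}}}^{2}\) and both control limits; for simplicity it retains all \(m\) components, so the limits with \(m\) degrees of freedom apply directly (with a reduced model, \(m\) is commonly replaced by the number of retained components). scipy is assumed to be available, and all data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m, alpha = 500, 4, 0.05

# Synthetic zero-mean training data and its sample covariance (1.2)
X = (rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))
     + 0.1 * rng.standard_normal((n, m)))
X = X - X.mean(axis=0)
S = X.T @ X / (n - 1)

eigval, eigvec = np.linalg.eigh(S)            # eigendecomposition (1.3)
P, Lam_inv = eigvec, np.diag(1.0 / eigval)    # all m components retained here

def t2(x):
    """Hotelling's T^2 statistic of a scaled sample vector, Eq. (1.9)."""
    return x @ P @ Lam_inv @ P.T @ x

# Control limits: chi-square limit (1.10) and F-based limit (1.11)
lim_chi2 = stats.chi2.ppf(1 - alpha, df=m)
lim_f = m * (n - 1) * (n + 1) / (n * (n - m)) * stats.f.ppf(1 - alpha, m, n - m)

x_fault = X[0] + np.array([3.0, 0.0, 0.0, 0.0])   # sample with a simulated sensor bias
print(t2(X[0]), t2(x_fault))                      # normal vs. biased statistic values
print(lim_chi2, lim_f)                            # the F-based limit is slightly larger
```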

1.2.2 Squared Prediction Error

The SPE index measures the projection of the sample vector on the residual subspace:

$$\begin{aligned} \mathrm{SPE}:=\Vert \tilde{\boldsymbol{x}}\Vert ^2=\Vert ( \boldsymbol{I}-\boldsymbol{P}\boldsymbol{P}^\mathrm {T}) \boldsymbol{x}\Vert ^2. \end{aligned}$$
(1.12)

The process is considered normal if

$$\begin{aligned} \mathrm{SPE} \le \delta _{\alpha }^{2}, \end{aligned}$$
(1.13)

where \(\delta _{\alpha }^{2}\) denotes the upper control limit of SPE at a significance level of \(\alpha \). Jackson and Mudholkar gave an expression for \(\delta _{\alpha }^{2}\) (Jackson and Mudholkar 1979):

$$\begin{aligned} \delta _{\alpha }^{2}=\theta _{1}\left( \frac{z_{\alpha }\sqrt{2\theta _{2}h_{0}^2}}{\theta _{1}}+1+\frac{\theta _{2}h_{0}(h_{0}-1)}{\theta _{1}^{2}}\right) ^{1/h_{0}}, \end{aligned}$$
(1.14)

where

$$\begin{aligned} \theta _{i}=\sum _{j=k+1}^{m}\lambda _{j}^{i},\qquad i=1,2,3, \end{aligned}$$
(1.15)
$$\begin{aligned} h_{0}=1-\frac{2\theta _{1}\theta _{3}}{3\theta _{2}^{2}}, \end{aligned}$$
(1.16)

where k is the number of retained principal components and \(z_{\alpha }\) is the normal deviate corresponding to the upper \(1-\alpha \) percentile. Note that the above result is obtained under the following conditions.

  • The sample vector \(\boldsymbol{x}\) follows a multivariate normal distribution.

  • In deriving the control limits, an approximation is made to this distribution that is valid when \(\theta _{1}\) is very large.

  • This result holds regardless of the number of principal components retained in the model.

When a fault occurs, the fault sample vector \(\boldsymbol{x}\) consists of the normal part superimposed on the faulty part. The fault causes the SPE to be larger than the threshold \(\delta _{\alpha }^{2}\), which results in the fault being detected.

Nomikos and MacGregor (1995) used the results in Box (1954) to derive an alternative upper control limit for SPE:

$$\begin{aligned} \delta _{\alpha }^{2}=g\chi _{h;\alpha }^{2} \end{aligned}$$
(1.17)

where

$$\begin{aligned} g=\theta _{2}/\theta _{1},\qquad h=\theta _{1}^{2}/\theta _{2}. \end{aligned}$$
(1.18)

The relationship between the SPE thresholds (1.14) and (1.17) is as follows (Nomikos and MacGregor 1995):

$$ \delta _{\alpha }^{2}\cong gh\left( 1-\frac{2}{9h}+z_\alpha \sqrt{\frac{2}{9h}}\right) ^3 $$
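The following sketch computes the SPE statistic (1.12) and both control limits, the Jackson–Mudholkar limit (1.14)–(1.16) and the \(g\chi ^2_h\) approximation (1.17)–(1.18), from the residual eigenvalues of a fitted PCA model; scipy supplies the normal and \(\chi ^2\) quantiles, and the data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m, k, alpha = 500, 4, 2, 0.05

# Synthetic zero-mean data, sample covariance, and eigendecomposition
X = (rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))
     + 0.1 * rng.standard_normal((n, m)))
X = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(X.T @ X / (n - 1))
order = np.argsort(eigval)[::-1]
lam, P = eigval[order], eigvec[:, order][:, :k]   # sorted eigenvalues, retained loadings

def spe(x):
    """Squared prediction error of a sample vector, Eq. (1.12)."""
    r = x - P @ (P.T @ x)
    return r @ r

# Jackson-Mudholkar limit (1.14)-(1.16), built from the residual eigenvalues
theta1, theta2, theta3 = (np.sum(lam[k:] ** i) for i in (1, 2, 3))
h0 = 1 - 2 * theta1 * theta3 / (3 * theta2 ** 2)
z_a = stats.norm.ppf(1 - alpha)
delta2_jm = theta1 * (z_a * np.sqrt(2 * theta2 * h0 ** 2) / theta1
                      + 1 + theta2 * h0 * (h0 - 1) / theta1 ** 2) ** (1 / h0)

# g * chi-square limit (1.17)-(1.18)
g, h = theta2 / theta1, theta1 ** 2 / theta2
delta2_box = g * stats.chi2.ppf(1 - alpha, df=h)

print(delta2_jm, delta2_box)                             # the two limits are close
print(spe(X[0] + np.array([0, 0, 1.0, 0])) > delta2_jm)  # check a simulated disturbance
```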

1.2.3 Mahalanobis Distance

Define the following Mahalanobis distance which forms the global Hotelling’s \({\mathrm{{T}}}^{2}\) test:

$$\begin{aligned} \mathrm{{D}}=\boldsymbol{x}^{\mathrm{{T}}}\boldsymbol{S}^{-1}\boldsymbol{x}\sim \frac{m(n^{2}-1)}{n(n-m)}F_{m,n-m}, \end{aligned}$$
(1.19)

where \(\boldsymbol{S}\) is the sample covariance of \(\boldsymbol{X}\). When \(\boldsymbol{S}\) is singular with \(\mathrm{rank}(\boldsymbol{S})=r<m\), Mardia discusses the use of the pseudo-inverse of \(\boldsymbol{S}\), which in turn yields the Mahalanobis distance of the reduced-rank covariance matrix (Brereton 2015):

$$\begin{aligned} \mathrm{{D}_{r}}=\boldsymbol{x}^{\mathrm{{T}}}\boldsymbol{S}^{+}\boldsymbol{x}\sim \frac{r(n^{2}-1)}{n(n-r)}F_{r,n-r} \end{aligned}$$
(1.20)

where \(\boldsymbol{S}^{+}\) is the Moore-Penrose pseudo-inverse. It is straightforward to show that the global Mahalanobis distance is the sum of \({\mathrm{{T}}}^{2}\) in the PCS and Hawkins’ statistic \(\mathrm{{T}}_{H}^{2}= \boldsymbol{x}^\mathrm {T}\tilde{\boldsymbol{P}}\tilde{\boldsymbol{\varLambda }}^{-1}\tilde{\boldsymbol{P}}^\mathrm {T}\boldsymbol{x}\) (Hawkins 1974) in the RS:

$$\begin{aligned} \mathrm{{D}}={\mathrm{{T}}}^{2}+\mathrm{{T}}_{H}^{2}. \end{aligned}$$
(1.21)

When the number of observations n is quite large, the global Mahalanobis distance approximately obeys the \(\chi ^{2}\) distribution with m degrees of freedom:

$$\begin{aligned} \mathrm{{D}}\sim \chi _{m}^{2}. \end{aligned}$$
(1.22)

Similarly, the reduced-rank Mahalanobis distance follows:

$$\begin{aligned} \mathrm{{D}_{r}}\sim \chi _{r}^{2}. \end{aligned}$$
(1.23)

Therefore, faults can be detected using the correspondingly defined control limits for \(\mathrm{{D}}\) and \(\mathrm{{D}}_{r}\).
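A quick numerical verification of the decomposition (1.21): the sketch below computes the global Mahalanobis distance \(\mathrm{D}\), the PCS statistic \({\mathrm{{T}}}^{2}\), and Hawkins’ RS statistic \(\mathrm{{T}}_{H}^{2}\) for one sample and confirms that the first equals the sum of the other two (synthetic data, illustrative variable names).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 4, 2

# Synthetic zero-mean data and its sample covariance
X = (rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))
     + 0.1 * rng.standard_normal((n, m)))
X = X - X.mean(axis=0)
S = X.T @ X / (n - 1)

eigval, eigvec = np.linalg.eigh(S)
order = np.argsort(eigval)[::-1]
lam, V = eigval[order], eigvec[:, order]
P, P_t = V[:, :k], V[:, k:]                              # PCS and RS loadings

x = X[0]
D = x @ np.linalg.inv(S) @ x                             # global Mahalanobis distance (1.19)
T2 = x @ P @ np.diag(1.0 / lam[:k]) @ P.T @ x            # T^2 in the PCS, Eq. (1.9)
T2_H = x @ P_t @ np.diag(1.0 / lam[k:]) @ P_t.T @ x      # Hawkins' statistic in the RS

print(np.isclose(D, T2 + T2_H))                          # decomposition (1.21) holds
```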

1.2.4 Combined Indices

In practice, better monitoring performance can be achieved in some cases by using a combined index instead of two separate indices to monitor the process. Yue and Qin proposed a combined index for fault detection that combines SPE and \({\mathrm{{T}}}^{2}\) as follows (Yue and Qin 2001), where \(l\) denotes the number of retained principal components:

$$\begin{aligned} \varphi =\frac{\mathrm{{SPE}}(\boldsymbol{x})}{\delta _{\alpha }^2}+\frac{{\mathrm{{T}}}^{2}(\boldsymbol{x})}{\chi _{l;\alpha }^2}=\boldsymbol{x}^\mathrm {T}{\boldsymbol{\varPhi }} {\boldsymbol{x}}, \end{aligned}$$
(1.24)

where

$$\begin{aligned} \boldsymbol{\varPhi }=\frac{\boldsymbol{P}\boldsymbol{\varLambda }^{-1}\boldsymbol{P}^{\mathrm{{T}}}}{\chi _{l;\alpha }^2} +\frac{\boldsymbol{I}-\boldsymbol{P}\boldsymbol{P}^{\mathrm{{T}}}}{\delta _{\alpha }^2} =\frac{\boldsymbol{P}\boldsymbol{\varLambda }^{-1}\boldsymbol{P}^{\mathrm{{T}}}}{\chi _{l;\alpha }^2} +\frac{\tilde{\boldsymbol{P}}\tilde{\boldsymbol{P}}^{\mathrm{{T}}}}{\delta _{\alpha }^2}. \end{aligned}$$
(1.25)

Notice that \(\boldsymbol{\varPhi }\) is symmetric and positive definite. To use this index for fault detection, the upper control limit of \(\varphi \) is derived from the results of Box (1954), which provide an approximate distribution with the same first two moments as the exact distribution. Using the approximation given in Box (1954), the statistic \(\varphi \) is distributed approximately as

$$\begin{aligned} \varphi =\boldsymbol{x}^\mathrm {T}\boldsymbol{\varPhi }\boldsymbol{x} \sim g\chi _{h}^2, \end{aligned}$$
(1.26)

where the coefficient

$$\begin{aligned} g=\frac{{tr}\left[ (\boldsymbol{S\varPhi })^{2}\right] }{{tr}(\boldsymbol{S\varPhi })} \end{aligned}$$
(1.27)

and the degrees of freedom of the \(\chi _{h}^{2}\) distribution are given by

$$\begin{aligned} h=\frac{\left[ {tr}(\boldsymbol{S\varPhi })\right] ^{2}}{{tr}\left[ (\boldsymbol{S\varPhi })^{2}\right] }, \end{aligned}$$
(1.28)

in which,

$$\begin{aligned} {tr}(\boldsymbol{S\varPhi })&=\frac{l}{\chi _{l;\alpha }^2}+\frac{\sum _{i=l+1}^{m}\lambda _{i}}{\delta _{\alpha }^{2}}\end{aligned}$$
(1.29)
$$\begin{aligned} {tr}\left[ (\boldsymbol{S\varPhi })^{2}\right]&=\frac{l}{\chi _{l;\alpha }^{4}}+\frac{\sum _{i=l+1}^{m}\lambda _{i}^{2}}{\delta _{\alpha }^{4}} \end{aligned}$$
(1.30)

After computing g and h, for a given significance level \(\alpha \), an upper control limit for \(\varphi \) can be obtained. A fault is detected by \(\varphi \) if

$$\begin{aligned} \varphi >g\chi _{h;\alpha }^{2}. \end{aligned}$$
(1.31)
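The sketch below assembles the combined index \(\varphi \) of (1.24)–(1.25), its Box approximation parameters \(g\) and \(h\) from (1.27)–(1.30), and the detection rule (1.31). The number of retained components is denoted \(l\) as in the text; the data are synthetic, the SPE limit is taken from (1.17) for convenience, and scipy is assumed for the \(\chi ^2\) quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m, l, alpha = 500, 4, 2, 0.05                 # l retained principal components

# Synthetic zero-mean data, covariance, and eigendecomposition
X = (rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))
     + 0.1 * rng.standard_normal((n, m)))
X = X - X.mean(axis=0)
S = X.T @ X / (n - 1)
eigval, eigvec = np.linalg.eigh(S)
order = np.argsort(eigval)[::-1]
lam, V = eigval[order], eigvec[:, order]
P, P_t, Lam_inv = V[:, :l], V[:, l:], np.diag(1.0 / lam[:l])

# Control limits of the individual indices
chi2_l = stats.chi2.ppf(1 - alpha, df=l)                             # T^2 limit
th1, th2 = np.sum(lam[l:]), np.sum(lam[l:] ** 2)
delta2 = (th2 / th1) * stats.chi2.ppf(1 - alpha, df=th1 ** 2 / th2)  # SPE limit (1.17)

# Combined index matrix Phi (1.25) and the index itself (1.24)
Phi = P @ Lam_inv @ P.T / chi2_l + P_t @ P_t.T / delta2
phi = lambda x: x @ Phi @ x

# Box approximation parameters g, h (1.27)-(1.30) and the control limit (1.31)
tr1 = np.trace(S @ Phi)
tr2 = np.trace(np.linalg.matrix_power(S @ Phi, 2))
g, h = tr2 / tr1, tr1 ** 2 / tr2
limit = g * stats.chi2.ppf(1 - alpha, df=h)

x_fault = X[0] + np.array([0.0, 0.0, 1.0, 0.0])          # simulated disturbance
print(phi(X[0]) > limit, phi(x_fault) > limit)
```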

It is worth noting that Raich and Cinar suggest another combined statistic (Raich and Cinar 1996),

$$\begin{aligned} \varphi =c\frac{\mathrm{{SPE}}(\boldsymbol{x})}{\delta _{\alpha }^2}+(1-c)\frac{{\mathrm{{T}}}^{2}(\boldsymbol{x})}{\chi _{l;\alpha }^2}, \end{aligned}$$
(1.32)

where \(c\in (0,1)\) is a constant. They further give a rule that a statistic value less than 1 is considered normal. However, this may lead to wrong results, because even if the above statistic is less than 1, it is possible that \(\mathrm{SPE}(\boldsymbol{x})>\delta _{\alpha }^2\) or \({\mathrm{{T}}}^{2}(\boldsymbol{x})>\chi _{l;\alpha }^2\) (Qin 2003).

1.2.5 Control Limits under Non-Gaussian Distributions

Nonlinear characteristics are a hotspot of current process monitoring research. Many nonlinear methods, such as kernel principal components, neural networks, and manifold learning, are widely used for component extraction in process monitoring. The components extracted by such methods may not follow a Gaussian distribution. In this case, the control limits of the \(\mathrm{{T}}^2\) and \(\mathrm Q\) statistic series are calculated from the probability density function (PDF), which can be estimated by the nonparametric kernel density estimation (KDE) method. KDE applies to the \(\mathrm{{T}}^2\) and \(\mathrm {Q}\) statistics because they are univariate, although the processes represented by these statistics are multivariate. Therefore, the control limits for the monitoring statistics (\(\mathrm{{T}}^2\) and \(\mathrm {SPE}\)) are calculated from their respective PDF estimates, given by

$$\begin{aligned} \begin{aligned}&\int ^{\mathrm {Th}_{\mathrm{{T}}^2,\alpha }}_{-\infty }g(\mathrm{{T}}^2)d\mathrm{{T}}^2=1-\alpha \\&\int ^{\mathrm {Th}_\mathrm{{SPE},\alpha }}_{-\infty }g(\mathrm {SPE})d\mathrm {SPE}=1-\alpha , \end{aligned} \end{aligned}$$
(1.33)

where

$$ g(z) = \frac{1}{nh}\sum ^n_{j=1}\mathbb {K}\left( \frac{z-z_j}{h}\right) $$

\(\mathbb {K}\) denotes a kernel function, h denotes the bandwidth or smoothing parameter, and \(z_j\,(j=1,\ldots ,n)\) are the values of the statistic computed from the normal training data.
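A minimal sketch of the KDE route in (1.33): a Gaussian-kernel density is fitted to the \({\mathrm{{T}}}^{2}\) (or SPE) values computed from normal training data, and the control limit is taken as the point where the estimated cumulative distribution reaches \(1-\alpha \). The statistic values here are synthetic placeholders and the bandwidth choice is illustrative.

```python
import numpy as np

alpha = 0.05
rng = np.random.default_rng(0)
t2_train = rng.chisquare(df=3, size=1000)     # placeholder T^2 values from normal data

# Gaussian-kernel density estimate g(z) with a simple rule-of-thumb bandwidth
h = 1.06 * t2_train.std() * len(t2_train) ** (-1 / 5)
z = np.linspace(t2_train.min() - 3 * h, t2_train.max() + 3 * h, 4000)
g = np.exp(-0.5 * ((z[:, None] - t2_train[None, :]) / h) ** 2).sum(axis=1)
g /= len(t2_train) * h * np.sqrt(2 * np.pi)

# Control limit: smallest z at which the estimated CDF reaches 1 - alpha, Eq. (1.33)
cdf = np.cumsum(g) * (z[1] - z[0])
th_t2 = z[np.searchsorted(cdf, 1 - alpha)]
print(th_t2)                                  # compare with chi2.ppf(0.95, 3) ~ 7.81
```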

Finally, the fault detection logic for the PCS and RS is as follows:

$$\begin{aligned} \begin{aligned}&\mathrm{{T}}^2> \mathrm {Th}_{\mathrm{{T}}^2,\alpha }\;\text {or}\;\mathrm{SPE} > \mathrm {Th}_\mathrm{{SPE},\alpha },&\;\text {Faults}\\&\mathrm{{T}}^2 \le \mathrm {Th}_{\mathrm{{T}}^2,\alpha }\;\text {and}\;\mathrm{SPE} \le \mathrm {Th}_\mathrm{{SPE},\alpha },&\;\text {Fault-free}. \end{aligned} \end{aligned}$$
(1.34)