Quality variables are measured much less frequently and usually with a significant time delay compared with process variables. Monitoring process variables and their associated quality variables is an essential undertaking, since undetected faults may cause system shutdowns and thus huge economic losses. Partial least squares (PLS) extracts the maximum correlation between quality variables and process variables (Kruger et al. 2001; Song et al. 2004; Li et al. 2010; Hu et al. 2013; Zhang et al. 2015). In order to deal with the nonlinear correlation of industrial data, this chapter proposes two further nonlinear PLS methods. The first, the Local Linear Embedded Projection of Latent Structure (LLEPLS), performs an oblique projection on the input data space. By further decomposing the LLEPLS model, the Local Linear Embedded Orthogonal Projection of Latent Structure (LLEOPLS) is proposed, which yields an orthogonal projection on the input space. Both LLEPLS and LLEOPLS extract the maximum relevant information and simultaneously preserve the local nonlinear structure between input and output.

From the viewpoint of statistical analysis, LLEPLS and LLEOPLS project the input and output spaces into three subspaces: (1) the joint input-output subspace, which aims at finding the nonlinear relationship between the input and output and can also be used for quality prediction; (2) the output-residual subspace, which aims at monitoring quality-related faults that cannot be predicted from the process data; and (3) the orthogonal input-residual subspace, which aims at identifying whether a predictable fault is quality related. The corresponding monitoring strategies are established based on the LLEPLS and LLEOPLS models, respectively.

11.1 Comparison of GPLPLS, LPPLS, and LLEPLS

PLS has better performance than PCA in detecting quality-relevant faults. As shown in Fig. 11.1, the output space (\(\boldsymbol{Y}\)) and input space (\(\boldsymbol{X}\)) are decomposed by the PLS model. Here the external relationship is the “foundation” and the internal relationship is the “result”. For nonlinear PLS, the desired “results” cannot be obtained by internal structure adjustment (Zhang and Qin 2008) if the external relationships are linear. Therefore, it is possible to build better internal relationships by starting with the analysis of external relationships. The nonlinear function is usually approximated by a series of locally weighted linear models. For example, Wang et al. (2014) and Yin et al. (2016, 2017) use locally weighted projection regression (LWPR) or a few univariate regressions to learn the nonlinearity of the external relationships. To some extent, this PLS regression can be considered a multi-KPLS regression with Gaussian kernels.

Fig. 11.1
figure 1

Outer- and inner-model presentation for PLS decomposition

The locality-preserving partial least squares (LPPLS) model (given in Chap. 10) is another external nonlinear PLS model, and its structure is relatively simple compared to the KPLS model (Wang et al. 2017). However, the LPPLS model has at least two limitations. The first is that the local geometric structure cannot be well preserved with uniform weights, while the \(\sigma \) parameter of Gaussian weights (Kokiopoulou and Saad 2007) is difficult to select properly. The second is that it performs an oblique decomposition of the measured process variables. The LPPLS model extracts the principal components and retains local structure by locality-preserving projection (LPP). LLE, another nonlinear dimensionality reduction technique, transforms the global nonlinear problem into a combination of several local linear problems by introducing local geometric information. Compared with the LLE method, the locality-preserving strategy of LPP is more complex, and its parameters (Gaussian weights) are more numerous and harder to tune.

The global plus local projection to latent structure (GPLPLS) model (given in Chap. 9) integrates the advantages of the PLS and LLE methods. The distinctive feature of the GPLPLS model is that the local nonlinear features are enhanced by LLE in the PLS decomposition (Zhou et al. 2018). GPLPLS uses a “plus” rather than an embedding strategy, in which the new feature space is divided into a linear part (global projection) and a nonlinear part (local preservation). It confirms that the LLE-plus-PLS algorithm is able to decompose the input and output spaces while effectively preserving the local geometric structure. However, this combination needs further research, for example on how to combine the two parts more effectively, how to complete the orthogonal decomposition, and how to quantitatively evaluate the monitoring performance.

Based on the above analysis, the Local Linear Embedded Projection of Latent Structure (LLEPLS) is proposed. It extracts the maximum correlation information between input and output and, at the same time, reveals and preserves the intrinsic nonlinear structure of the original data. The principal components of the input space (or measured-variable space) extracted by LLEPLS still contain variations orthogonal to \({\boldsymbol{Y}}\). These variations are output irrelevant and do not contribute to the output prediction. Moreover, LLEPLS is an oblique projection on the input space. Orthogonalization is an alternative solution to these issues. The local linear embedded orthogonal projection to latent structure (LLEOPLS) model is therefore proposed in order to further explain the LLEPLS prediction model and to detect quality-related faults. LLEOPLS eliminates the variations orthogonal to the output from the \(T^2\) statistic. LLEOPLS differs significantly from other existing nonlinear PLS models in that it combines orthogonal projection with local geometric structure preservation and has fewer, more easily determined parameters.

11.2 A Brief Review of the LLE Method

Consider the normalized data set \(\boldsymbol{X} = \left[ \boldsymbol{x}^\mathrm {T}(1),\boldsymbol{x}^\mathrm {T}(2),\ldots ,\boldsymbol{x}^\mathrm {T}(n)\right] ^\mathrm {T}\in R^{n\times m}\), \(({\boldsymbol{x}}=[x_1,x_2,\ldots ,x_m] \in R^{1\times m})\), where n is the number of samples and m is the number of input variables. The LLE algorithm introduces local structural information and transforms the global nonlinear problem into a combination of multiple local linear problems. It performs particularly well on locally nonlinear processes.

The size of the neighborhood \(k_x\) is crucial for the local geometric structure. The K nearest neighbors (KNN) of each sample are selected according to a distance measure such as the Euclidean distance, and the optimal neighborhood size is determined by (Kouropteva et al. 2002),

$$\begin{aligned} k_{x,opt}=\arg \min _{k_x} (1-\rho ^2 _{\boldsymbol{D}_x \boldsymbol{D}_{\phi _x}}), \end{aligned}$$
(11.1)

where \(\boldsymbol{D}_x\) and \(\boldsymbol{D}_{\phi _x}\) denote the distance matrices (between point pairs) in \({\boldsymbol{X}}\) and \(\boldsymbol{\Phi }_x\) (\(\boldsymbol{\Phi }_x \) given in (11.4)), and \(\rho \) denotes the standard linear correlation coefficient between \(\boldsymbol{D}_x\) and \(\boldsymbol{D}_{\phi _x}\).
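As a concrete illustration, the residual-variance term \(1-\rho ^2\) in (11.1) can be evaluated with a short numerical routine. The following Python sketch assumes an LLE embedding routine (here called `lle_embed`, a hypothetical helper) that returns \(\boldsymbol{\Phi }_x\) for a given neighborhood size; only the criterion itself is implemented.

```python
import numpy as np
from scipy.spatial.distance import pdist

def residual_variance(high, low):
    """1 - rho^2 between the pairwise-distance vectors of the original data
    (rows of `high`) and of its low-dimensional embedding (rows of `low`),
    as used in the neighborhood-selection criterion (11.1)."""
    d_high = pdist(high)   # condensed distance matrix D_x
    d_low = pdist(low)     # condensed distance matrix D_phi_x
    rho = np.corrcoef(d_high, d_low)[0, 1]
    return 1.0 - rho ** 2

# Hypothetical usage: lle_embed(X, k) is assumed to return the embedding
# Phi_x computed with k nearest neighbors (e.g., via (11.3)-(11.7)):
#   k_opt = min(range(4, 41), key=lambda k: residual_variance(X, lle_embed(X, k)))
```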

Next, the \(k_x\) nearest neighbors of the sample \(\boldsymbol{x}(i)\) can be obtained. Then \(\boldsymbol{x}(i)\) can be linearly expressed in terms of its \(k_x\) nearest neighbors \(\boldsymbol{x}(j)\) through the following optimization objective,

$$\begin{aligned} \begin{aligned} J(\boldsymbol{A}_x)&= \min \sum \limits _{i = 1}^n\left\| \boldsymbol{x}(i)-\sum \limits _{j = 1}^{k_x} a_{ij,x}\boldsymbol{x}(j)\right\| ^2 \\ {\mathrm {s.t.}}&\quad \sum \limits _{j = 1}^{k_x} a_{ij,x}=1, \end{aligned} \end{aligned}$$
(11.2)

where \([a_{ij,x}]:=\boldsymbol{A}_x\in R^{n\times {k_x}}, (i=1,2,\ldots ,n,j=1,2,\ldots ,{k_x})\) denotes the weight coefficients. Usually, points belonging to the space \({\boldsymbol{X}}\) are projected onto a new low-dimensional reduced space \(\boldsymbol{\Phi }_{x}=\left[ \boldsymbol{\phi }_{x}^\mathrm {T}(1), \boldsymbol{\phi }_{x}^\mathrm {T}(2), \ldots ,\boldsymbol{\phi }_{x}^\mathrm {T}(n) \right] ^\mathrm {T}\in R^{n \times d},\;(d<m,\; \boldsymbol{\phi }_{x} \in R^{1 \times d})\) determined by the following optimization:

$$\begin{aligned} \begin{aligned} J_\mathrm{{LLE}}(\boldsymbol{W})&=\min \sum _{i=1}^{n}\left\| \boldsymbol{\phi }_{x}(i)-\sum _{j=1}^{k_{x}} a_{i j, x} \boldsymbol{\phi }_{x}(j)\right\| ^{2} \\ \text{ s.t. }&\quad \boldsymbol{\Phi }_{x}^\mathrm {T}\boldsymbol{\Phi }_{x}=\boldsymbol{I}. \end{aligned} \end{aligned}$$
(11.3)
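A minimal sketch of how the reconstruction weights in (11.2) might be computed is given below. The weights are stored in a full \(n\times n\) matrix (neighbor columns hold the weights, all other entries are zero) so that \(\boldsymbol{M}_x=\boldsymbol{I}-\boldsymbol{A}_x\) used later in (11.5) is well defined; the small regularization term is an assumption added for numerical stability.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Reconstruction weights of (11.2): each sample x(i) is expressed as a
    weighted combination of its k nearest neighbors, weights summing to one."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[:k]            # k nearest neighbors of x(i)
        Z = X[idx] - X[i]                      # neighbors in local coordinates
        G = Z @ Z.T                            # local Gram matrix (k x k)
        G += reg * np.trace(G) * np.eye(k)     # regularization for stability
        w = np.linalg.solve(G, np.ones(k))     # solve G w = 1
        A[i, idx] = w / w.sum()                # enforce the sum-to-one constraint
    return A
```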

For further analysis, a linear mapping matrix \(\boldsymbol{W} = \left[ \boldsymbol{w}_{1},\ldots ,\boldsymbol{w}_{d}\right] \in {R^{m \times d}}\) is introduced under the guarantee of local embedding,

$$\begin{aligned} {\boldsymbol{\phi }_{x}(i)} = \boldsymbol{x}(i)\boldsymbol{W},\;\; (i = 1,2,\ldots ,n). \end{aligned}$$
(11.4)

where \(\boldsymbol{w}_j, j=1,\ldots ,d\) denote the projection vectors. Then the optimization (11.3) is rewritten as

$$\begin{aligned} \begin{aligned} J_{\text{ LLE } }(\boldsymbol{W})&=\min \mathrm{tr}\left( \boldsymbol{W}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}\boldsymbol{M}_{x}^\mathrm {T}\boldsymbol{M}_{x} {\boldsymbol{X}} \boldsymbol{W}\right) \\ \text{ s.t. }&\quad \boldsymbol{W}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}} \boldsymbol{W}=\boldsymbol{I}, \end{aligned} \end{aligned}$$
(11.5)

where \(\boldsymbol{M}_x=(\boldsymbol{I}-\boldsymbol{A}_x) \in {R^{n \times n}}\), with the weights of \(\boldsymbol{A}_x\) placed in the columns of the corresponding neighbors and zeros elsewhere. An SVD is performed on \(\boldsymbol{M}_x\) in order to simplify the dimensionality reduction problem,

$$ \boldsymbol{M}_{x}=\left[ \begin{array}{ll} \boldsymbol{U}_{x}&\bar{\boldsymbol{U}}_{x} \end{array}\right] ^\mathrm {T}\left[ \begin{array}{ll} \boldsymbol{S}_{x} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} \end{array}\right] \left[ \begin{array}{l} \boldsymbol{V}_{x} \\ \bar{\boldsymbol{V}}_{x} \end{array}\right] . $$

Then, the minimization problem (11.5) is transformed into the following maximization problem:

$$\begin{aligned} \begin{aligned} J_{\mathrm {LLE}}(\boldsymbol{W})&=\max \mathrm{tr}\left( \boldsymbol{W}^\mathrm {T}{\boldsymbol{X}}_{M}^\mathrm {T}{\boldsymbol{X}}_{M} \boldsymbol{W}\right) \\ \text{ s.t. }&\quad \boldsymbol{W}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}} \boldsymbol{W}=\boldsymbol{I}, \end{aligned} \end{aligned}$$
(11.6)

where \({\boldsymbol{X}}_{M}:= \begin{bmatrix}\boldsymbol{S}_{x}^{-1} & \boldsymbol{0}\\ \boldsymbol{0} &\boldsymbol{0} \end{bmatrix} \begin{bmatrix}\boldsymbol{V}_{x}\\ \bar{\boldsymbol{V}}_x\end{bmatrix} {\boldsymbol{X}}=\boldsymbol{S}_{V_x}\boldsymbol{X}\). Generally, LLE has to choose the reduced dimension d in (11.3) in advance, whereas PCA can determine the corresponding dimension based on specific criteria such as the cumulative contribution. The optimization problem (11.6) is further rewritten,

$$\begin{aligned} \begin{aligned} J_{\text{ LLE } }(\boldsymbol{w})&=\max \boldsymbol{w}^\mathrm {T}{\boldsymbol{X}}_{M}^\mathrm {T}{\boldsymbol{X}}_{M} \boldsymbol{w} \\ \text{ s.t. }&\quad \boldsymbol{w}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}} \boldsymbol{w}=1, \end{aligned} \end{aligned}$$
(11.7)

where \(\boldsymbol{w}\in R^{m\times 1}\). The criteria for determining the number of principal components in PCA can thus be directly applied to LLE. Based on the SVD algorithm, the matrix \({\boldsymbol{X}}_{M}\) is decomposed into a “load matrix” \(\boldsymbol{P}_d=[\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots ,\boldsymbol{p}_d]\) and a “score matrix” \(\boldsymbol{T}_d=[\boldsymbol{t}_1,\boldsymbol{t}_2,\ldots ,\boldsymbol{t}_d]\)

$$ {\boldsymbol{X}}_{M}^\mathrm {T}{\boldsymbol{X}}_{M}=\left[ \boldsymbol{P}_{d 0}\; \boldsymbol{P}_{r 0}\right] \left[ \begin{array}{ll} \boldsymbol{\varLambda }_{d} & \\ & \boldsymbol{\varLambda }_{r} \end{array}\right] \left[ \begin{array}{l} \boldsymbol{P}_{d 0}^\mathrm {T} \\ \boldsymbol{P}_{r 0}^\mathrm {T} \end{array}\right] $$

and define \(\boldsymbol{P}_d=\boldsymbol{P}_{d0}/\Vert {\boldsymbol{X} \boldsymbol{P}_{d0}}\Vert \), \(\boldsymbol{P}_r=\boldsymbol{P}_{r0}/\Vert {\boldsymbol{X}\boldsymbol{P}_{r0}}\Vert \), and

$$\begin{aligned} \begin{aligned} {\boldsymbol{X}}_{M}&=\boldsymbol{T}_{d} \boldsymbol{P}_{d}^\mathrm {T}+\boldsymbol{T}_{r} \boldsymbol{P}_{r}^\mathrm {T}\\&=\boldsymbol{P}_{d} \boldsymbol{P}_{d}^\mathrm {T}{\boldsymbol{X}}_{M}+\left( \boldsymbol{I}-\boldsymbol{P}_{d} \boldsymbol{P}_{d}^\mathrm {T}\right) {\boldsymbol{X}}_{M}, \end{aligned} \end{aligned}$$
(11.8)

where \(\boldsymbol{T}_d={\boldsymbol{X}}_{M}\boldsymbol{P}_d,\boldsymbol{T}_r={\boldsymbol{X}}_{M}\boldsymbol{P}_r\).

It is observed from (11.7) and (11.8) that the projection directions of LLE can be obtained by maximizing the variance. Thus, LLE can be used to construct a new PLS regression with local geometric structure-preserving ability according to the component extraction criteria.
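The construction of \(\boldsymbol{X}_M\) in (11.6) and the load/score decomposition in (11.8) can be sketched as follows. The zero singular-value block of \(\boldsymbol{M}_x\) contributes nothing to \(\boldsymbol{X}_M^\mathrm {T}\boldsymbol{X}_M\) and is therefore dropped, and the normalization \(\boldsymbol{P}_d=\boldsymbol{P}_{d0}/\Vert \boldsymbol{X}\boldsymbol{P}_{d0}\Vert \) is interpreted column-wise (an assumption) so that each direction satisfies the constraint in (11.7).

```python
import numpy as np

def locality_scores(X, A, d):
    """Sketch of (11.6)-(11.8): build X_M = S_Vx X from the LLE weight matrix A
    (n x n) and extract d leading load/score directions of X_M^T X_M."""
    n = X.shape[0]
    M = np.eye(n) - A                               # M_x = I - A_x
    U, s, Vt = np.linalg.svd(M)                     # SVD of M_x
    r = int(np.sum(s > 1e-10))                      # number of non-zero singular values
    X_M = np.diag(1.0 / s[:r]) @ Vt[:r] @ X         # X_M = S_x^{-1} V_x X
    evals, evecs = np.linalg.eigh(X_M.T @ X_M)      # eigen-decomposition, ascending order
    P_d0 = evecs[:, np.argsort(evals)[::-1][:d]]    # d leading eigenvectors ("loads")
    P_d = P_d0 / np.linalg.norm(X @ P_d0, axis=0)   # column-wise scaling by ||X p_i||
    T_d = X_M @ P_d                                 # "score matrix" T_d = X_M P_d
    return X_M, P_d, T_d
```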

Variance (factor variation) is used to extract the latent variables in the PLS algorithm. It transforms the original data \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) into a set of t-scores \(\boldsymbol{T}\) and u-scores \(\boldsymbol{U}\). The latent factors \(\boldsymbol{T}\) and \(\boldsymbol{U}\) are chosen by maximizing the factor variation. The aim is to use fewer dimensions while retaining more features of the original data. PLS is a linear dimensionality reduction technique and does not explore the intrinsic structure of the original data. This is not conducive to data classification and may mix data from different modes together. The phenomena that may occur with PLS, similar to those with PCA, are illustrated in Fig. 11.2. Figure 11.2a shows a two-mode data space \(\boldsymbol{X}\) and Fig. 11.2b gives its first principal component \(\boldsymbol{t}_1\) in PCA. The contribution of the first principal component \(\boldsymbol{t}_1\) is 99%. As shown in Fig. 11.2b, the blue \('o'\) and black \('*'\) points are mixed together in the one-dimensional coordinate system. The second principal component is discarded due to its small contribution, although it maintains the local geometric structure.

Fig. 11.2
figure 2

PCA decomposition and its projection of a two-mode data space

11.3 LLEPLS Models and LLEPLS-Based Fault Detection

11.3.1 LLEPLS Models

In order to extract the first component pair \((\boldsymbol{t}_1,\boldsymbol{u}_1)\), the traditional PLS optimization is expressed as

$$\begin{aligned} \begin{aligned} J_{\mathrm {PLS}}\left( \boldsymbol{w}_{1}, \boldsymbol{c}_{1}\right)&=\max \boldsymbol{w}_{1}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}\boldsymbol{Y} \boldsymbol{c}_{1} \\ \mathrm{s.t. }&\quad \boldsymbol{w}_{1}^\mathrm {T}\boldsymbol{w}_{1}=1, \boldsymbol{c}_{1}^\mathrm {T}\boldsymbol{c}_{1}=1. \end{aligned} \end{aligned}$$
(11.9)

Define \(\boldsymbol{E}_{0}= {\boldsymbol{X}}\) and \(\boldsymbol{F}_{0}=\boldsymbol{Y}\). The PLS latent variables \(\boldsymbol{t}_1\) and \(\boldsymbol{u}_1\) are obtained from \(\boldsymbol{t}_1=\boldsymbol{E}_0\boldsymbol{w}_1\) and \(\boldsymbol{u}_1=\boldsymbol{F}_0\boldsymbol{c}_1\). Here \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\) are the eigenvectors corresponding to the maximum eigenvalues of the matrices,

$$\begin{aligned} \boldsymbol{E}_{0}^\mathrm {T}\boldsymbol{F}_{0} \boldsymbol{F}_{0}^\mathrm {T}\boldsymbol{E}_{0} \boldsymbol{w}_{1}&=\theta _{1}^{2} \boldsymbol{w}_{1}\end{aligned}$$
(11.10)
$$\begin{aligned} \boldsymbol{F}_{0}^\mathrm {T}\boldsymbol{E}_{0} \boldsymbol{E}_{0}^\mathrm {T}\boldsymbol{F}_{0} \boldsymbol{c}_{1}&=\theta _{1}^{2} \boldsymbol{c}_{1}. \end{aligned}$$
(11.11)

The locally linear embedded partial least squares (LLEPLS) model is proposed by optimizing the following objective:

$$\begin{aligned} \begin{aligned}&J_{\text{ LLEPLS } }\left( \boldsymbol{w}_{1}, \boldsymbol{c}_{1}\right) =\max \boldsymbol{w}_{1}^\mathrm {T}{\boldsymbol{X}}_{M}^\mathrm {T}\boldsymbol{Y}_{M} \boldsymbol{c}_{1} \\&\mathrm{s.t. }\quad \boldsymbol{w}_{1}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}} \boldsymbol{w}_{1}=1, \boldsymbol{c}_{1}^\mathrm {T}\boldsymbol{Y}^\mathrm {T}\boldsymbol{Y} \boldsymbol{c}_{1}=1 \end{aligned} \end{aligned}$$
(11.12)

in which,

$$ \begin{aligned} \boldsymbol{Y}_{M}&=\left[ \begin{array}{ll} \boldsymbol{S}_{y}^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} \end{array}\right] \left[ \begin{array}{l} \boldsymbol{V}_{y} \\ \bar{\boldsymbol{V}}_{y} \end{array}\right] \boldsymbol{Y}=\boldsymbol{S}_{V_{y}} \boldsymbol{Y} \\ \boldsymbol{M}_{y}&=\boldsymbol{I}-\boldsymbol{A}_{y}=\left[ \begin{array}{ll} \boldsymbol{U}_{y}&\bar{\boldsymbol{U}}_{y} \end{array}\right] ^\mathrm {T}\left[ \begin{array}{ll} \boldsymbol{S}_{y} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} \end{array}\right] \left[ \begin{array}{l} \boldsymbol{V}_{y} \\ \bar{\boldsymbol{V}}_{y} \end{array}\right] , \end{aligned} $$

where \(\boldsymbol{A}_y\) is the weight matrix constructed from the \(k_y\) nearest neighbors of the output samples, analogous to \(\boldsymbol{A}_x\). \(\boldsymbol{S}_{y}, \boldsymbol{V}_{y}\), and \(\boldsymbol{U}_{y}\) are defined similarly to \(\boldsymbol{S}_{x}, \boldsymbol{V}_{x}\), and \(\boldsymbol{U}_{x}\).

The criteria for LLEPLS component decomposition and latent factor extraction are given as follows:

  1. The latent factors \(\boldsymbol{u}_i\) and \(\boldsymbol{t}_i\) are chosen to maximize the nonlinear variation of the factors (by local linear embedding).

  2. The correlation between the latent factors \(\boldsymbol{u}_i\) and \(\boldsymbol{t}_i\) should be as strong as possible.

Then, the latent variable calculation process of the LLEPLS model is given as follows. Denote \(\boldsymbol{E}_{0L} = \boldsymbol{X}_M\) and \(\boldsymbol{F}_{0L} = \boldsymbol{Y}_M\), similar to the traditional linear PLS. The constrained optimization problem (11.12) is transformed by introducing Lagrange multipliers,

$$\begin{aligned} \begin{aligned} \varPsi \left( \boldsymbol{w}_{1}, \boldsymbol{c}_{1}\right) =&\boldsymbol{w}_{1}^\mathrm {T}\boldsymbol{E}_{0\,L}^\mathrm {T}\boldsymbol{F}_{0\,L} \boldsymbol{c}_{1}-\lambda _{1}\left( \boldsymbol{w}_{1}^\mathrm {T}{\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}} \boldsymbol{w}_{1}-1\right) \\&-\lambda _{2}\left( \boldsymbol{c}_{1}^\mathrm {T}\boldsymbol{Y}^\mathrm {T}\boldsymbol{Y} \boldsymbol{c}_{1}-1\right) . \end{aligned} \end{aligned}$$
(11.13)

The optimal \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\) are obtained by setting \(\frac{{\partial \varPsi }}{{\partial {\boldsymbol{w}_1}}}=0\) and \(\frac{{\partial \varPsi }}{{\partial {\boldsymbol{c}_1}}}=0\). The optimization problem (11.13) then reduces to the maximum eigenvalue problems,

$$\begin{aligned} \left( {\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}}\right) ^{-1} \boldsymbol{E}_{0\,L}^\mathrm {T}\boldsymbol{F}_{0\,L}\left( \boldsymbol{Y}^\mathrm {T}\boldsymbol{Y}\right) ^{-1} \boldsymbol{F}_{0\,L}^\mathrm {T}\boldsymbol{E}_{0\,L} \boldsymbol{w}_{1}=\theta _{1}^{2} \boldsymbol{w}_{1} \end{aligned}$$
(11.14)
$$\begin{aligned} \left( \boldsymbol{Y}^\mathrm {T}\boldsymbol{Y}\right) ^{-1} \boldsymbol{F}_{0\,L}^\mathrm {T}\boldsymbol{E}_{0\,L}\left( {\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}}\right) ^{-1} \boldsymbol{E}_{0\,L}^\mathrm {T}\boldsymbol{F}_{0\,L} \boldsymbol{c}_{1}=\theta _{1}^{2} \boldsymbol{c}_{1}. \end{aligned}$$
(11.15)

The first optimal weight vector \(\boldsymbol{w}_1\) in the conventional linear PLS (11.10) corresponds to the matrix \(\boldsymbol{E}_{0}^\mathrm {T}\boldsymbol{F}_{0} \boldsymbol{F}_{0}^\mathrm {T}\boldsymbol{E}_{0}\). For LLEPLS (11.14), the optimal \(\boldsymbol{w}_1\) is derived from the corresponding matrix \(\left( {\boldsymbol{X}}^\mathrm {T}{\boldsymbol{X}}\right) ^{-1} \boldsymbol{E}_{0\,L}^\mathrm {T}\boldsymbol{F}_{0\,L}\left( \boldsymbol{Y}^\mathrm {T}\boldsymbol{Y}\right) ^{-1} \boldsymbol{F}_{0\,L}^\mathrm {T}\boldsymbol{E}_{0\,L}\). These two matrices have very similar structures. The extraction and modeling of the residual components can be done by traditional PLS methods.

It is worth pointing out that the input matrix \(\boldsymbol{X}\) and/or the output matrix \(\boldsymbol{Y}\) may not have full column rank, in which case the inverse of \(\boldsymbol{X}^\mathrm {T}\boldsymbol{X}\) and/or \(\boldsymbol{Y}^\mathrm {T}\boldsymbol{Y}\) does not exist. Similar to the treatment of \(S_x\) in (11.6), a corresponding generalized inverse can be obtained for \(\boldsymbol{X}\) and/or \(\boldsymbol{Y}\). This does not affect the following analysis, so both cases are treated indiscriminately in the rest of this chapter.
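For illustration, the first LLEPLS weight vector \(\boldsymbol{w}_1\) in (11.14) can be obtained numerically as below. Pseudo-inverses are used for \(\boldsymbol{X}^\mathrm {T}\boldsymbol{X}\) and \(\boldsymbol{Y}^\mathrm {T}\boldsymbol{Y}\), in line with the rank-deficient case just discussed; this is only a sketch of the first component, not the full deflation-based algorithm.

```python
import numpy as np

def llepls_first_direction(E0L, F0L, X, Y):
    """First LLEPLS weight vector w_1 from the eigenvalue problem (11.14)."""
    Gx = np.linalg.pinv(X.T @ X)                # (X^T X)^{-1}, pseudo-inverse if rank-deficient
    Gy = np.linalg.pinv(Y.T @ Y)                # (Y^T Y)^{-1}
    M = Gx @ E0L.T @ F0L @ Gy @ F0L.T @ E0L     # matrix of (11.14)
    evals, evecs = np.linalg.eig(M)
    w1 = np.real(evecs[:, np.argmax(np.real(evals))])
    w1 /= np.sqrt(w1 @ (X.T @ X) @ w1)          # enforce w_1^T X^T X w_1 = 1
    return w1
```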

The first d components are used to build the regression model, where d is determined by cross-validation tests. Similar to the outer- and inner-model presentation of the PLS decomposition, the corresponding LLEPLS decomposition is shown in Fig. 11.3. It is found that the new feature spaces \({\boldsymbol{X}_F}\) and \({\boldsymbol{Y}_F}\) are both constructed from the nonlinear part, i.e., the local structure information. Compared with the decomposition of GPLPLS shown in Fig. 9.2, the global linear part is eliminated.

Fig. 11.3
figure 3

Outer- and inner-model presentation for LLEPLS decomposition

11.3.2 LLEPLS for Process and Quality Monitoring

The linear local embedding in the low-dimensional spaces of \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) is formed by a few latent variables \((\boldsymbol{t}_1,\ldots ,\boldsymbol{t}_d)\) in the LLEPLS model. The neighborhood mappings \(\boldsymbol{E}_{0L}\) and \(\boldsymbol{F}_{0L}\) are decomposed as follows:

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0\,L}&=\sum _{i=1}^{d} \boldsymbol{t}_{i} \boldsymbol{p}_{i}^\mathrm {T}+\bar{\boldsymbol{E}}_{0\,L}=\boldsymbol{T} \boldsymbol{P}^\mathrm {T}+\bar{\boldsymbol{E}}_{0\,L} \\ \boldsymbol{F}_{0\,L}&=\sum _{i=1}^{d} \boldsymbol{t}_{i} \boldsymbol{q}_{i}^\mathrm {T}+\bar{\boldsymbol{F}}_{0\,L}=\boldsymbol{T} \boldsymbol{Q}^\mathrm {T}+\bar{\boldsymbol{F}}_{0\,L}, \end{aligned} \end{aligned}$$
(11.16)

where \(\boldsymbol{T}=\left[ \boldsymbol{t}_1,\boldsymbol{t}_2,\ldots ,\boldsymbol{t}_d\right] \) denotes the score vectors, \(\boldsymbol{P} = \left[ \boldsymbol{p}_1,\ldots , \boldsymbol{p}_d\right] \) and \(\boldsymbol{Q}= \left[ \boldsymbol{q}_1, \ldots , \boldsymbol{q}_d\right] \) denote the loading matrices of \(\boldsymbol{E}_{0L}\) and \(\boldsymbol{F}_{0L}\), respectively. Score \(\boldsymbol{T}\) is represented in terms of the neighboring mapping data \(\boldsymbol{E}_{0L}\),

$$\begin{aligned} \boldsymbol{T}=\boldsymbol{E}_{0\,L} \boldsymbol{R}=\boldsymbol{S}_{V_{x}} \boldsymbol{E}_{0} \boldsymbol{R}, \end{aligned}$$
(11.17)

where \(\boldsymbol{R} = [{\boldsymbol{r}_1},\ldots ,{\boldsymbol{r}_d}] \), and

$${\boldsymbol{r}_i} = \prod \limits _{j = 1}^{i - 1} \left({\boldsymbol{I}_m} - {\boldsymbol{w}_j}\boldsymbol{p}_j^\mathrm {T}\right){\boldsymbol{w}_i}.$$

Equations (11.16) and (11.17) are difficult to apply directly in practice due to the calculation of the locality-preserving matrix \(\boldsymbol{S}_{V_x}\), so the decomposition of the scaled and mean-centered \(\boldsymbol{E}_{0}\) and \(\boldsymbol{F}_{0}\) is given,

$$\begin{aligned} \boldsymbol{E}_{0}&=\boldsymbol{T}_{0} \boldsymbol{P}^{\mathrm {T}}+\bar{\boldsymbol{E}}_0\end{aligned}$$
(11.18)
$$\begin{aligned} \boldsymbol{F}_{0}&=\boldsymbol{T}_{0} \bar{\boldsymbol{Q}}^\mathrm {T}+\bar{\boldsymbol{F}}_{0}\nonumber \\&=\boldsymbol{E}_{0} \boldsymbol{R} \bar{\boldsymbol{Q}}^\mathrm {T}+\bar{\boldsymbol{F}}_{0}, \end{aligned}$$
(11.19)

where \(\boldsymbol{T}_0= \boldsymbol{E}_0 \boldsymbol{R}\), \(\bar{\boldsymbol{Q}} =\boldsymbol{T}_0^+ {\boldsymbol{F}}_0\).
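In practice, the quantities in (11.18)-(11.19) can be assembled from the scaled and mean-centered data once \(\boldsymbol{R}\) is available, as in the following sketch. The matrix \(\bar{\boldsymbol{Q}}\) is computed here in the orientation that makes \(\boldsymbol{F}_0\approx \boldsymbol{T}_0\bar{\boldsymbol{Q}}^\mathrm {T}\) hold (i.e., the code variable plays the role of \(\bar{\boldsymbol{Q}}^\mathrm {T}\)), which is the convention assumed for the later code examples.

```python
import numpy as np

def llepls_regression(E0, F0, R):
    """Practical decomposition (11.18)-(11.19): scores T_0, the matrix Qbar_T
    (playing the role of Qbar^T so that F0 ~ T0 @ Qbar_T), and the prediction."""
    T0 = E0 @ R                               # scores of the scaled, centered data
    Qbar_T = np.linalg.pinv(T0) @ F0          # least-squares fit, Qbar^T = T0^+ F0
    F0_hat = T0 @ Qbar_T                      # predictable part of the output
    F0_res = F0 - F0_hat                      # output-residual part
    return T0, Qbar_T, F0_hat, F0_res
```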

Now consider the monitoring of a new sample \({\boldsymbol{x}}\) and, subsequently, of \({\boldsymbol{y}}\). The sample is first scaled and mean-centered, and then an oblique projection is performed on the input data space,

$$\begin{aligned} \begin{aligned} \boldsymbol{x}&=\hat{\boldsymbol{x}}+\boldsymbol{x}_{e} \\ \hat{\boldsymbol{x}}&=\boldsymbol{P} \boldsymbol{R}^\mathrm {T}\boldsymbol{x} \\ \boldsymbol{x}_{e}&=\left( \boldsymbol{I}-\boldsymbol{P} \boldsymbol{R}^\mathrm {T}\right) \boldsymbol{x}. \end{aligned} \end{aligned}$$
(11.20)

The statistics \(T^2\) and Q are calculated as follows:

$$\begin{aligned} \begin{aligned} \boldsymbol{t}&=\boldsymbol{R}^\mathrm {T}\boldsymbol{x} \\ \mathrm {T}^{2}&=\boldsymbol{t}^\mathrm {T}\boldsymbol{\varLambda }^{-1} \boldsymbol{t}=\boldsymbol{t}^\mathrm {T}\left( \frac{1}{n-1} \boldsymbol{T}_{0}^\mathrm {T}\boldsymbol{T}_{0}\right) ^{-1} \boldsymbol{t} \\ \mathrm {Q}&=\left\| \boldsymbol{x}_{e}\right\| ^{2}=\boldsymbol{x}^\mathrm {T}\left( \boldsymbol{I}-\boldsymbol{P} \boldsymbol{R}^\mathrm {T}\right) \boldsymbol{x}, \end{aligned} \end{aligned}$$
(11.21)

where \(\boldsymbol{\varLambda }\) is the sample covariance matrix.
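A minimal sketch of the on-line computation in (11.20)-(11.21) is given below; the new sample is assumed to be already scaled and mean-centered, and the Q statistic is computed as the squared norm of the residual \(\boldsymbol{x}_e\).

```python
import numpy as np

def llepls_monitor(x, P, R, T0):
    """T^2 and Q statistics of (11.21) for one scaled, mean-centered sample x."""
    n = T0.shape[0]
    t = R.T @ x                               # score t = R^T x
    Lam = (T0.T @ T0) / (n - 1)               # sample covariance of the scores
    T2 = t @ np.linalg.inv(Lam) @ t           # T^2 = t^T Lam^{-1} t
    x_hat = P @ t                             # oblique projection x_hat = P R^T x
    Q = float(np.sum((x - x_hat) ** 2))       # Q = ||x_e||^2
    return T2, Q
```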

The space of measured variables, i.e., the input space, is divided into two subspaces: the score subspace and the residual subspace. LLEPLS detects quality-related faults by the \(\mathrm{T}^2\) statistic in the score subspace and quality-irrelevant faults by the \(\mathrm {Q}\) statistic in the residual subspace. However, the scores that constitute the \(\mathrm{T}^2\) statistic still include variations orthogonal to \(\boldsymbol{Y}\). Therefore, LLEPLS still has deficiencies in quality-related fault detection.

11.4 LLEOPLS Models and LLEOPLS-Based Fault Detection

As demonstrated in (Li et al. 2010) and (Ding et al. 2013), the standard PLS performs an oblique decomposition of the measured process variables. The LLEPLS model (11.16) is also an oblique decomposition operation (11.20) on the measured process variables, similar to the standard PLS model. Thus, the principal part of the measured process variables may include variations orthogonal to the output variables. In other words, the principal components still include output-irrelevant variations, and the residual part may include a large amount of output-related variation. In addition, the number of principal components often depends on the operator's decision and is likely to cause component redundancy. In order to solve these problems, it is necessary to further decompose the LLEPLS model in (11.18) and obtain an orthogonal decomposition of the measured process variables. In this model, the regression coefficient matrix \(\boldsymbol{R} \bar{\boldsymbol{Q}}^\mathrm {T}\) in (11.19) is used to describe the relationship between \(\boldsymbol{E}_{0}\) and \(\boldsymbol{F}_{0}\). An SVD is performed on \(\boldsymbol{R} \bar{\boldsymbol{Q}}^\mathrm {T}\) to obtain the orthogonal decomposition,

$$\begin{aligned} \boldsymbol{R} \bar{\boldsymbol{Q}}^\mathrm {T}=\boldsymbol{U}_{p c} \boldsymbol{S}_{p c} \boldsymbol{V}_{p c}^\mathrm {T}, \end{aligned}$$
(11.22)

where \(\boldsymbol{S}_{pc}\) contains all non-zero singular values in descending order. \(\boldsymbol{V}_{pc}\) and \(\boldsymbol{U}_{pc}\) are the corresponding right and left singular vectors. Then,

$$\begin{aligned} \begin{aligned} \boldsymbol{F}_{0}&=\boldsymbol{E}_{0} \boldsymbol{U}_{p c} \boldsymbol{S}_{p c} \boldsymbol{V}_{p c}^\mathrm {T}+\bar{\boldsymbol{F}}_{0} \\&=\boldsymbol{T}_{p c} \boldsymbol{Q}_{p c}^\mathrm {T}+\bar{\boldsymbol{F}}_{0}, \end{aligned} \end{aligned}$$
(11.23)

where \(\boldsymbol{T}_{pc}=\boldsymbol{E}_0\boldsymbol{U}_{pc}\), \(\boldsymbol{Q}_{pc}= \boldsymbol{V}_{pc}\boldsymbol{S}_{pc}\). The output-residual subspace \(\bar{\boldsymbol{F}}_{0}\) represents the part of the output that cannot be predicted from the process data but may still contain some variation.

Furthermore, \(\boldsymbol{E}_{0}\) is decomposed into two orthogonal parts based on \(\boldsymbol{T}_{pc}\),

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0}&=\hat{\boldsymbol{E}}_{0}+\boldsymbol{X}_e \\&=\boldsymbol{T}_{p c} \boldsymbol{U}_{p c}^\mathrm {T}+\boldsymbol{E}_{0}\left( \boldsymbol{I}-\boldsymbol{U}_{p c} \boldsymbol{U}_{p c}^\mathrm {T}\right) , \end{aligned} \end{aligned}$$
(11.24)

where \(\hat{\boldsymbol{E}}_{0}:=\boldsymbol{T}_{p c} \boldsymbol{U}_{p c}^\mathrm {T}\) and \(\boldsymbol{X}_e=\boldsymbol{E}_{0}\left( \boldsymbol{I}-\boldsymbol{U}_{p c} \boldsymbol{U}_{p c}^\mathrm {T}\right) \). \(\boldsymbol{X}_e\) denotes the orthogonal input-residual subspace. For process and quality monitoring, a new data sample \({\boldsymbol{x}}\) and subsequently \({\boldsymbol{y}}\) are orthogonally projected on the input data space,

$$\begin{aligned} \begin{aligned} \boldsymbol{x}&=\hat{\boldsymbol{x}}+\boldsymbol{x}_{e} \\ \hat{\boldsymbol{x}}&=\boldsymbol{U}_{p c} \boldsymbol{U}_{p c}^\mathrm {T}\boldsymbol{x} \\ \boldsymbol{x}_{e}&=\left( \boldsymbol{I}-\boldsymbol{U}_{p c} \boldsymbol{U}_{p c}^\mathrm {T}\right) \boldsymbol{x} \\ \boldsymbol{t}_{p c}&=\boldsymbol{U}_{p c}^\mathrm {T}\boldsymbol{x} \\ \boldsymbol{y}_{e}&=\boldsymbol{y}-\boldsymbol{Q}_{p c} \boldsymbol{t}_{p c}. \end{aligned} \end{aligned}$$
(11.25)
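The training-stage decomposition (11.22)-(11.24) and the on-line projection (11.25) amount to a few linear-algebra steps, sketched below; \(\bar{\boldsymbol{Q}}^\mathrm {T}\) is passed in the orientation used in the earlier sketch, so that \(\boldsymbol{R}\bar{\boldsymbol{Q}}^\mathrm {T}\) has the dimensions of the regression coefficient matrix.

```python
import numpy as np

def lleopls_decompose(E0, R, Qbar_T):
    """SVD of R Qbar^T (11.22) and the orthogonal decomposition (11.23)-(11.24)."""
    U, s, Vt = np.linalg.svd(R @ Qbar_T, full_matrices=False)
    r = int(np.sum(s > 1e-10))                # keep the non-zero singular values
    U_pc = U[:, :r]                           # left singular vectors
    Q_pc = Vt[:r].T * s[:r]                   # Q_pc = V_pc S_pc
    T_pc = E0 @ U_pc                          # output-related scores
    X_e = E0 - T_pc @ U_pc.T                  # orthogonal input-residual part
    return U_pc, Q_pc, T_pc, X_e

def lleopls_project(x, y, U_pc, Q_pc):
    """On-line orthogonal projection of a new sample, as in (11.25)."""
    t_pc = U_pc.T @ x
    x_e = x - U_pc @ t_pc
    y_e = y - Q_pc @ t_pc
    return t_pc, x_e, y_e
```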

The LLEOPLS model is given in (11.23) and (11.24), with several parameters to be determined in advance. The selection of the optimal parameters has been described for LLE (Kouropteva et al. 2002). The optimal parameters \([k_{x},k_y]\) of the LLEOPLS model are determined by simultaneously considering the characteristics of LLE itself and the relationship between the input and output spaces. The following optimization is given for determining the parameters \([k_{x},k_y]\):

$$\begin{aligned} \begin{aligned} \left[ k_{x}, k_{y}\right] _{\text{ opt }}=\arg \min _{k_{x}, k_{y}}&\left( 1-\rho _{\boldsymbol{D}_{x} \boldsymbol{D}_{\phi _{x}}}^{2}+1-\rho _{\boldsymbol{D}_{y} \boldsymbol{D}_{\phi _y}}^{2}\right. \\&\;\left. +1-\left. \rho _{\boldsymbol{D}_{\hat{y}} \boldsymbol{D}_{y}}^{2}\right| _{\text{ train } }+1-\left. \rho _{\boldsymbol{D}_{\hat{y}} \boldsymbol{D}_{y}}^{2}\right| _{\text{ pre } }\right) , \end{aligned} \end{aligned}$$
(11.26)

where \(\hat{\boldsymbol{y}}= \boldsymbol{Q}_{pc}\boldsymbol{t}_{pc}\), and \(\left. \cdot \right| _{train}\) and \(\left. \cdot \right| _{pre}\) denote evaluation on the training and testing data sets, respectively. The first two terms in (11.26), \(1-\rho _{\boldsymbol{D}_{x} \boldsymbol{D}_{\phi _{x}}}^{2}\) and \(1-\rho _{\boldsymbol{D}_{y} \boldsymbol{D}_{\phi _y}}^{2}\), evaluate the geometric similarity between the embedding space and the high-dimensional space. The last two terms, \(1-\left. \rho _{\boldsymbol{D}_{\hat{y}} \boldsymbol{D}_{y}}^{2}\right| _{\text{ train } }\) and \(1-\left. \rho _{\boldsymbol{D}_{\hat{y}} \boldsymbol{D}_{y}}^{2}\right| _{\text{ pre } }\), indicate the prediction performance of the model, which indirectly reflects the role of the first two terms. Cross-validation is used to validate the training results of the model. The last term is the most important part of (11.26),

$$\begin{aligned} \left[ k_{x}, k_{y}\right] _{\text{ opt } }=\arg \min _{k_{x} k_{y}}\left( 1-\left. \rho _{\boldsymbol{D}_{\hat{y} }\boldsymbol{D}_y}^{2} \right| _{\text{ pre } }\right) . \end{aligned}$$
(11.27)
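The selection of \([k_x,k_y]\) by (11.26) can be organized as a simple grid search, as sketched below. The routine `fit_and_predict` is a hypothetical helper that, for given neighborhood sizes, returns the embeddings \(\boldsymbol{\Phi }_x,\boldsymbol{\Phi }_y\) and the output predictions on the training and testing sets.

```python
import numpy as np
from scipy.spatial.distance import pdist

def res_var(A, B):
    """1 - rho^2 between the pairwise-distance vectors of two data sets."""
    return 1.0 - np.corrcoef(pdist(A), pdist(B))[0, 1] ** 2

def select_neighborhood(X, Y, Y_pre, k_grid, fit_and_predict):
    """Grid search over (k_x, k_y) minimizing criterion (11.26)."""
    best, best_J = None, np.inf
    for kx in k_grid:
        for ky in k_grid:
            Phi_x, Phi_y, Yhat_train, Yhat_pre = fit_and_predict(kx, ky)
            J = (res_var(X, Phi_x) + res_var(Y, Phi_y)
                 + res_var(Yhat_train, Y) + res_var(Yhat_pre, Y_pre))
            if J < best_J:
                best, best_J = (kx, ky), J
    return best
```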

A generalized LLEOPLS model with the optimal parameters \(k_x\) and \(k_y\) can be used to monitor the operation of the system. The \(\mathrm{T}^2\) statistics monitor the output-related scores (\(\boldsymbol{T}_{pc}\)), the input-residual part, and the output-residual part,

$$\begin{aligned} \begin{aligned} \mathrm{T}_{p c}^{2}&=\boldsymbol{t}_{p c}^\mathrm {T}\boldsymbol{\varLambda }_{p c}^{-1} \boldsymbol{t}_{p c}=\boldsymbol{t}_{p c}^\mathrm {T}\left\{ \frac{1}{n-1} \boldsymbol{T}_{p c}^\mathrm {T}\boldsymbol{T}_{p c}\right\} ^{-1} \boldsymbol{t}_{p c} \\ \mathrm{T}_{e}^{2}&=\boldsymbol{x}_{e}^\mathrm {T}\boldsymbol{\varLambda }_{x, e}^{-1} \boldsymbol{x}_{e}=\boldsymbol{x}_{e}^\mathrm {T}\left\{ \frac{1}{n-1} \boldsymbol{X}_{e}^\mathrm {T}\boldsymbol{X}_{e}\right\} ^{-1} \boldsymbol{x}_{e} \\ \mathrm{T}_{y, e}^{2}&=\boldsymbol{y}_{e}^\mathrm {T}\boldsymbol{\varLambda }_{y, e}^{-1} \boldsymbol{y}_{e}=\boldsymbol{y}_{e}^\mathrm {T}\left\{ \frac{1}{n-1} \boldsymbol{Y}_{e}^\mathrm {T}\boldsymbol{Y}_{e}\right\} ^{-1} \boldsymbol{y}_{e}, \end{aligned} \end{aligned}$$
(11.28)

where \(\boldsymbol{\varLambda }_{pc}\), \(\boldsymbol{\varLambda }_{x,e}\), and \(\boldsymbol{\varLambda }_{y,e}\) denote the sample covariance matrices, and \(\boldsymbol{Y}_{e}:=\bar{\boldsymbol{F}}_{0}=\boldsymbol{F}_{0}-\boldsymbol{T}_{p c} \boldsymbol{Q}_{p c}^\mathrm {T}\).

The scores \(\boldsymbol{T}_{pc}\) of the LLEOPLS method are not obtained from the scaled and mean-centered matrix \(\boldsymbol{E}_{0L}\). The control limits of the \(\mathrm{T}^2\) statistics are therefore calculated from the probability density functions estimated by the non-parametric KDE method. The \(\mathrm{T}_{pc}^2\) and \(\mathrm{T}_{e}^2\) statistics are both univariate, although the processes they represent are multivariate. The control limits for the monitoring statistics (\(\mathrm{T}_{pc}^2\), \(\mathrm{T}_e^2\), and \(\mathrm{T}_{y,e}^2\)) are then obtained from the corresponding PDF estimates,

$$\begin{aligned} \int ^{\mathrm {Th}_{pc,\alpha }}_{-\infty }g(\mathrm{T}_{pc}^2)d\mathrm{T}_{pc}^2=\alpha \end{aligned}$$
$$\begin{aligned} \int ^{\mathrm {Th}_{x_e,\alpha }}_{-\infty }g(\mathrm{T}_{e}^2)d\mathrm{T}_{e}^2=\alpha \end{aligned}$$
$$\begin{aligned} \int ^{\mathrm {Th}_{y_e,\alpha }}_{-\infty }g(\mathrm{T}_{y,e}^2)d\mathrm{T}_{y,e}^2=\alpha , \end{aligned}$$

where

$$g(z) = \frac{1}{lh}\sum ^l_{j=1}\mathcal {K}\left( \frac{z-z_j}{h}\right) , $$

where \(\mathcal {K}(\cdot )\) and h are the kernel function and its bandwidth (smoothing parameter), respectively.
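The KDE-based control limit can be computed, for instance, by numerically integrating a Gaussian-kernel density estimate of the training \(\mathrm{T}^2\) series, as in the following sketch (the bandwidth is chosen by scipy's default rule for `gaussian_kde`, an assumption).

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_control_limit(t2_series, alpha=0.9975):
    """Control limit Th_alpha: alpha-quantile of the KDE-estimated PDF of a
    T^2 series, obtained by numerical integration of g(z)."""
    kde = gaussian_kde(t2_series)                   # Gaussian kernel, default bandwidth h
    grid = np.linspace(0.0, 3.0 * np.max(t2_series), 5000)
    pdf = kde(grid)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]                                  # normalize the numerical CDF
    return float(grid[np.searchsorted(cdf, alpha)])
```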

Finally, the fault detection logic for the output-residual subspace is given,

$$\begin{aligned} \begin{aligned} \mathrm{T}_{y,e}^2 > \mathrm {Th}_{y_e,\alpha }&\quad \text {Unpredictable output faults}\\ \mathrm{T}_{y,e}^2 \le \mathrm {Th}_{y_e,\alpha }&\quad \text {Fault-free in unpredictable output}. \end{aligned} \end{aligned}$$
(11.29)

\(\mathrm T_{y,e}^2\) includes the output information, so it is suitable for monitoring the output-residual subspace. However, this a posteriori quality monitoring is not the focus here; process-based quality monitoring is of greater interest. The fault detection logic for the input space is (Zhou et al. 2018):

$$ \begin{aligned} \begin{aligned} \mathrm{T}_{pc}^2&> \mathrm {Th}_{pc,\alpha } &&\text {Quality-relevant faults}\\ \mathrm{T}_{pc}^2&> \mathrm {Th}_{pc,\alpha } \; \text {or}\; \mathrm{T}_{e}^2 > \mathrm {Th}_{x_e,\alpha } &&\text {Process-relevant faults}\\ \mathrm{T}_{pc}^2&\le \mathrm {Th}_{pc,\alpha } \; \& \; \mathrm{T}_{e}^2 \le \mathrm {Th}_{x_e,\alpha } &&\text {Fault-free}. \end{aligned} \end{aligned} $$
(11.30)
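The input-space detection logic (11.30) for a single sample can be stated compactly in code; note that, by (11.30), a quality-relevant alarm is also a process-relevant alarm.

```python
def detect(T2_pc, T2_e, Th_pc, Th_xe):
    """Fault detection logic of (11.30) for one sample in the input space."""
    if T2_pc > Th_pc:
        return "quality-relevant fault"           # also process-relevant
    if T2_e > Th_xe:
        return "process-relevant, quality-irrelevant fault"
    return "fault-free"
```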

The monitoring procedure of the LLEOPLS algorithm for a complex industrial system is given as follows:

  1. The original data \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) are scaled to zero mean and unit variance.

  2. The LLE and PLS optimization objectives ((11.4) and (11.9)) are combined. The LLEPLS operation is then performed on \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) to yield \(\boldsymbol{T}_0\), \(\bar{\boldsymbol{Q}}\), and \(\boldsymbol{R}\), as well as the output-residual subspace \(\boldsymbol{Y}_e\), based on (11.18) and (11.19).

  3. The number of LLEPLS factors d is determined by cross-validation.

  4. Perform SVD on \(\boldsymbol{R} \bar{\boldsymbol{Q}}^\mathrm {T}\) to obtain \(\boldsymbol{U}_{pc}\), \(\boldsymbol{T}_{pc}\), and \(\boldsymbol{Q}_{pc}\).

  5. Build the input-residual subspace \({\boldsymbol{X}_e}\).

  6. Calculate the control limits (11.28) and perform fault monitoring according to the fault detection logic (11.30).

11.5 Case Study

The fault detection strategy based on the proposed LLEPLS and LLEOPLS models is applied to the Tennessee Eastman Process (TEP) simulation platform (Lyman and Georgakis 1995). To better demonstrate the effectiveness and rationality of the proposed monitoring strategy, the PLS monitoring strategy and the concurrent projection to latent structure (CPLS) model (Qin and Zheng 2012) are compared. In the CPLS algorithm, the input and output spaces are projected into five subspaces: the input-principal subspace, the input-residual subspace, the output-principal subspace, the output-residual subspace, and the joint input-output subspace. When only the monitoring capability for quality-related faults is considered, the input-residual subspace replaces the input-principal and input-residual subspaces of the CPLS model, and \(\mathrm{T}_{e}^2\) replaces the corresponding monitoring statistics. In order to emphasize process-based quality monitoring, the output-residual subspace of the LLEOPLS model is not considered; similarly, the output-principal and output-residual subspaces of the CPLS model are not considered.

11.5.1 Models and Discussion

All process measurement variables (XMEAS(1:22)) and manipulated variables (XMV(1:11)) form the input variable matrix \(\boldsymbol{X}\). The quality variable matrix \(\boldsymbol{Y}\) consists of XMEAS(35) and XMEAS(38). The training data set is the normal data IDV(0) and the testing data consist of the 21 fault data sets IDV(1-21). The optimal parameters of LLEPLS and LLEOPLS are \(k_x=24\) and \(k_y=20\). The numbers of principal components of the PLS, CPLS, LLEPLS, and LLEOPLS models are 6, 6, 5, and 5, respectively.

Table 11.1 FDR of PLS, LLEPLS, CPLS, and LLEOPLS
Table 11.2 FAR of PLS, LLEPLS, CPLS, and LLEOPLS

From the analysis in Chaps. 9 and 10, it is known that, when component G (XMEAS(35)) and component E (XMEAS(38)) are selected as product quality variables, faults IDV(3,4), IDV(9,11), IDV(14,15), and IDV(19) have almost no effect on product quality, while the other faults produce significant variations in the quality variables. The FDR and FAR of PLS, LLEPLS, CPLS, and LLEOPLS at the control limit with confidence level \(99.75\%\) are shown in Tables 11.1 and 11.2, respectively. Based on the two tables, the monitoring results of LLEOPLS differ somewhat from those of the other methods, whose FARs are almost the same, for example for IDV(14) and IDV(17). These faults are considered quality-related by the PLS method, whereas the LLEOPLS method indicates that they are quality-irrelevant.

Which monitoring results are more credible? The posterior quality alarm rate (PQAR) is introduced below to assess whether the final fault detection results are reasonable.

$$\begin{aligned} \mathrm{PQAR}=\frac{ \text{ No. } \text{ of } \text{ samples } \left( \left| \boldsymbol{Y}_{F}\right| >3 \mid f \ne 0\right) }{ \text{ total } \text{ samples } (f \ne 0)} \times 100, \end{aligned}$$
(11.31)

where \(\boldsymbol{Y}_{F}\) denotes the scaled and mean-centered output data of the fault cases. The PQAR is also given in Table 11.1. The 21 faults are divided into two categories by PQAR. Type I faults are quality-independent (\(\mathrm{PQAR}_i < 6, i=1,2,\ldots ,21\)), including IDV(3,4,9,11,14,15,17,19,20). Type II faults are quality-relevant and are further classified into three categories: IDV(16) has a slight effect on quality; IDV(1, 2, 5, 6, 7, 8, 10, 12, 13, 18) have a serious effect on quality; and IDV(21) causes a slow drift of the output variable. Apparently, the LLEOPLS method reaches a consistent conclusion (\(\mathrm{T}_{pc}^2\)). That is, the LLEOPLS model can better eliminate quality-independent interference alarms. However, there are still some differences in alarm rates between PQAR and \(\mathrm{T}_{pc}^2\), such as for IDV(5), IDV(7), and IDV(20). What causes this difference? Next, the differences between the LLEOPLS method and the other methods are further analyzed based on the PQAR and \(T_{pc}^2\) alarm rates.
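For reference, the PQAR of (11.31) can be computed as below; the sketch assumes that a faulty sample is counted as a quality alarm when any scaled output variable exceeds the three-sigma bound, with the scaling taken from the normal training data.

```python
import numpy as np

def pqar(Y_fault, mu, sigma):
    """PQAR of (11.31): percentage of faulty samples whose scaled output
    exceeds 3 in absolute value (mu, sigma from the normal training outputs)."""
    Y_scaled = (Y_fault - mu) / sigma             # scale and mean-center Y_F
    alarmed = np.any(np.abs(Y_scaled) > 3, axis=1)
    return 100.0 * alarmed.mean()
```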

11.5.2 Fault Detection Analysis

The differences in fault detection results are discussed for the PLS (CPLS) model and the LLEPLS (LLEOPLS) model, respectively. Several cases exist in which the output or process variables show no faults or only minor faults (IDV(3,9,15)); for these, both approaches provide consistent conclusions. For the other faults, there are some differences in the diagnostic results. Two fault categories, quality-recoverable faults and quality-irrelevant faults, are analyzed as follows. Subplots (a-d) of Figs. 11.4, 11.5, and 11.6 show the monitoring results based on the statistics \(\mathrm{T}_{pc}^2\) and \(\mathrm{T}_{e}^2\), respectively. The blue line shows the monitored value and the red dashed line shows the 99.75% control limit. The corresponding subplots (e) and (f) give the output predictions, where the blue dashed line is the measured value and the green line is the predicted value.

Experiment 1: Quality-Recoverable Faults

Consider faults IDV(1), IDV(5), and IDV(7). All of these are step faults, but the in-process feedback or cascade controllers can compensate for the changes in the output variables; therefore, the product quality variables under fault conditions IDV(1), IDV(5), and IDV(7) tend to return to normal. The monitoring results of the PLS, LLEPLS, CPLS, and LLEOPLS methods for IDV(1) are shown in Fig. 11.4.

Fig. 11.4
figure 4

PLS, LLEPLS, CPLS, and LLEOPLS monitoring result for IDV(1) and the output predicted values

Fig. 11.5
figure 5

PLS and LLEPLS monitoring result for IDV(17) and the output predicted values

Fig. 11.6
figure 6

PLS and LLEPLS monitoring result for IDV(20) and the output predicted values

It is easy to find that the \(\mathrm{T}_e^2\) statistics of the CPLS and LLEOPLS methods can detect the process-related faults. The \(\mathrm{T}_{pc}^2\) statistic of the LLEOPLS model returns below the control limit, which indicates that these faults are quality recoverable. Existing work in the literature reports high detection rates for these faults. For example, the PLS, CPLS, and LLEPLS methods give many false alarms based on \(\mathrm{T}^2\) for IDV(1). In this case, the LLEOPLS method can accurately reflect the changes in both the process variables and the quality variables.

For IDV(1), a huge difference between FDR(\(\mathrm{T}^2\)) and PQAR can be observed. On the one hand, FDR(\(\mathrm{T}^2\) or \(\mathrm{T}^2_{pc}\)) is based on the principal components of the process variables (without time delay), while PQAR is obtained from the actual output values (with time delay); they are not equivalent. Moreover, the data used for modeling are collected under normal operating conditions, not under fault conditions. The nonlinearity may therefore not be fully excited (i.e., the nonlinearities appear to be linear in normal, steady operation). When a fault occurs, the nonlinearity is fully excited and may lead to false and missed alarms because the original model can no longer predict the output. In fact, using \(\mathrm{T}^2\) to monitor quality-related faults implies the assumption that the output of the system can still be well predicted by the model in case of a fault. Although the variation of the predicted value of the PLS model (XMEAS(38)) follows the variation of the actual output value, the predicted value is too large, which results in a much larger FDR (\(\mathrm{T}^2\) in the PLS, CPLS, and LLEPLS models) than the PQAR. Nevertheless, the monitoring results of CPLS and LLEPLS are closer to reality thanks to the orthogonalization strategy and the local linear embedding strategy, respectively.

Experiment 2: Quality-Irrelevant Faults

Faults IDV(4,11,14,17,19,20) are quality-irrelevant, among which IDV(4), IDV(11), IDV(14), and IDV(17) are considered quality-independent but process-related. The monitoring results and output predictions for IDV(17) are shown in Fig. 11.5. As shown in Fig. 11.5e, f, the PLS model cannot predict the output values well, while the LLEPLS model predicts them very accurately. As a result, many false alarms are generated by the \(\mathrm{T}^2\) statistic of the PLS method. There are two possible reasons: the PLS model does not map the nonlinear functions well, and its principal components contain variations orthogonal to the output variables. Although CPLS improves on the orthogonal part of PLS, its ability to extract nonlinearity is still poor. In contrast, the LLEPLS model captures the nonlinear structure well and filters out these false alarms through LLE.

IDV(20) is another touchstone for fault detection. The monitoring results and output predictions are shown in Fig. 11.6. Judged by PQAR, none of the methods detects this fault well, but the LLEOPLS method is the best. The prediction results show that the LLEPLS model can predict the output variation well. With the orthogonal component removed, the question remains why \(\mathrm{T}^2_{pc}\) still fails to yield consistent results. One underlying reason is that the nonlinear dynamics excited by IDV(20) cannot be well described by the parameters \([k_x,k_y]=[24,20]\), which in turn leads to a wrong classification. Another reason could be the different control limits used for PQAR and \(\mathrm{T}^2_{pc}\). The statistical results of PQAR are obtained by assuming that the output variables obey a Gaussian distribution, so their control limits are determined by the threefold standard deviation criterion. However, the 99.75% control limit of \(\mathrm{T}^2_{pc}\) is obtained by non-parametric estimation, which differs from the result under the Gaussian assumption (the control limit of \(\mathrm{T}^2_{pc}\) with confidence level 99.75% is 9.9583 for the non-parametric KDE but 12.0708 under the Gaussian assumption). In fact, the monitoring results of \(\mathrm{T}^2_{pc}\) of LLEOPLS show that most of the alarms are transient and few are continuous, where the transient alarms may be caused by noise.

Experiment 3: Other Quality-Related Faults

For the other quality-related faults, the FDR results of these methods, given in Table 11.1, are essentially the same. However, the FDR results differ significantly for IDV(2), IDV(8), IDV(21), etc. The superiority of the proposed method is further verified by comparing the PQAR of IDV(2), IDV(8), and IDV(21). The monitoring results are shown in Fig. 11.7. Although faults IDV(2) and IDV(8) are quality-related, the quality still meets the production requirements even under these fault conditions, so the quality-related alarm rate is not high. The monitoring results of the proposed LLEOPLS method are consistent with the PQAR.

Fig. 11.7
figure 7

PQAR and the corresponding LLEOPLS monitoring results

11.6 Conclusions

Nonlinear regression modeling and analysis is a particularly tricky task. The LLEPLS model transforms the nonlinear regression problem into a combination of multiple local linear regression problems using the local linear embedding feature. It not only preserves the local properties of the original data, but also maximizes the correlation between the input space and the output space, thereby predicting the quality variables more accurately. However, the \(\mathrm{T}^2\) statistic of the LLEPLS model still contains variations orthogonal to the output. In order to eliminate them, the input space of LLEPLS is further orthogonally decomposed and the corresponding statistical criteria are established, i.e., LLEOPLS is obtained. The characteristics of the LLEOPLS model, with its nonlinear mapping and orthogonal decomposition, are further clarified by comparison with the PLS, CPLS, and LLEPLS models on the TEP benchmark simulation. The simulation results show that the LLEOPLS model is more effective for nonlinear systems and yields better (more consistent) fault detection performance than the PLS, CPLS, and LLEPLS models. Although LLEOPLS has good quality-related monitoring performance for nonlinear processes, it has some limitations, such as the assumptions that the low-dimensional manifold on which the sampled data lie is locally linear and that the noise follows a Gaussian distribution. These are directions for our further research.