Owing to increasing demands on process operation and product quality, modern industrial processes have become more complicated and generate large numbers of process and quality variables. Quality-related fault detection and diagnosis are therefore essential for complex industrial processes. Data-driven statistical process monitoring plays an important role in this topic by extracting useful information from these highly correlated process and quality variables, because the quality variables are measured at a much lower frequency and usually with a significant time delay (Ding 2014; Aumi et al. 2013; Peng et al. 2015; Zhang et al. 2016; Yin et al. 2014). Monitoring the process variables related to the quality variables is important for finding potential faults that may lead to system shutdown and possibly enormous economic loss.

PLS is a typical multivariate statistical analysis technique operating in two coordinate spaces, and it is well suited to quality-related fault detection and process monitoring. However, actual industrial data often exhibit strong nonlinearity, dynamics, and coupling. Since PLS considers only a static linear mapping between multiple data sources, it is difficult to obtain accurate detection results by applying PLS directly. How to introduce a local structure-preserving capability into the global structure projection of PLS, so that the complex features of industrial data can be extracted, has become an important research direction. The fusion of global structure and local structure can usually be implemented by two strategies: plus and embedding. This chapter focuses on the plus strategy: the global and local partial least squares (GLPLS) method is introduced first, and the global plus local projection to latent structure (GPLPLS) method is then proposed, with three different performance functions given according to whether the projection requirements are imposed on the input measurement space, the output measurement space, or both. The next two chapters focus on the embedding strategy; two different embedding methods, locality-preserving partial least squares (LPPLS) and local linear embedded projection of latent structure (LLEPLS), are proposed, which use LPP and LLE as the local structure-preserving technique, respectively.

9.1 Fusion Motivation of Global Structure and Local Structure

Currently, partial least squares (PLS), one of the data-driven methods (Severson et al. 2016; Ge et al. 2012; Li et al. 2010; Zhao 2014; Zhang and Qin 2008), is widely used for quality-relevant process monitoring because of its ability to extract latent variables that establish the relationship between the input and output spaces (Qin 2010). It maintains the maximum correlation between quality and process variables and thus has good quality-related fault detection capability. However, PLS is by nature a linear projection, which is not applicable to nonlinear systems. It uses only global structural information, such as mean and variance, and performs poorly for systems with strong local nonlinear characteristics.

Nonlinear PLS methods can be divided into two categories: external nonlinear PLS models and internal nonlinear PLS models, as shown in Fig. 9.1.

Fig. 9.1 Outer and inner model presentation for linear PLS decomposition

External nonlinear PLS models are a class of nonlinear PLS models that introduce nonlinear transformations of the input and/or output variables. An example is kernel partial least squares (KPLS) (Rosipal and Trejo 2001; Godoy et al. 2014), which describes the nonlinear relationship among the independent variables and extends the linear relationship between the inputs and outputs. KPLS effectively solves the nonlinear problem between the principal components of the input space and output space, but the selection of the kernel function is difficult in practical applications. Similarly, the kernel concurrent canonical correlation analysis (KCCCA) algorithm has been proposed for quality-relevant nonlinear process monitoring that considers the nonlinearity in the quality variables (Zhu et al. 2017). Kernel-based methods map the original data into a (possibly high-dimensional) Hilbert space (eigenspace), but the projection in the eigenspace is complex, the direction and length of the projection cannot be determined, and the choice of kernel function is not straightforward.

Inner nonlinear PLS models replace the internal linear model between latent variables with a nonlinear model while keeping the external model unchanged; examples include quadratic partial least squares (QPLS) (Wold et al. 1989), spline function PLS (SPLS) (Wold 1992), and neural network PLS (NNPLS) (Qin and McAvoy 1992, 1996). Recursive nonlinear PLS (RNPLS) models are built by extending the input and output matrices on top of PLS (Li et al. 2005); nonlinear PLS based on the slice transformation (NPLSSLT) can be used for nonlinear correction, where SLT-based piecewise linear mapping functions construct the nonlinear relationship between input and output score vectors (Shan et al. 2015); and the nonlinear iterative partial least squares (NIPALS) algorithm is improved by assuming that the score vector is a linear projection of the original variables in the internal nonlinear PLS, at the cost of increased computational and optimization complexity.

Some PLS methods introduce nonlinearities in both the outer model and the inner model. An example is the orthogonal nonlinear PLS method (O-NLPLS), which considers orthogonally correlated nonlinearities between the input and output variables (Doymaz et al. 2003). Although it is based on a neural network architecture, this method retains the orthogonality properties of the PCA method. Similarly, an RBF network has been used to identify the nonlinearity of the input variables and to establish the nonlinear relationship between the input and output variables (Zhao et al. 2006; Shimizu et al. 2006).

The different linear PLS representations are mathematically equivalent, whereas different nonlinear PLS methods lead to different performance and characteristics. Existing nonlinear PLS methods have some shortcomings, such as the difficulty of choosing kernel functions or latent structures for unknown nonlinear systems, the increased computational complexity when neural networks are used for nonlinear mapping, and the lack of a superior PLS decomposition algorithm. Therefore, simplifying the nonlinear PLS modeling problem is an urgent issue to be solved.

PLS and its extended algorithms focus only on the global structural information and cannot extract the local neighborhood structure of the data well, so they are not suitable for extracting nonlinear features. Therefore, local linearization methods for dealing with nonlinear problems are considered. In recent years, locality-preserving projections (LPP) (He and Niyogi 2003; He et al. 2005), which belong to the family of manifold learning methods, have been proposed to capture local neighborhood structure and effectively make up for this deficiency. There are also many other manifold learning methods, such as isometric feature mapping (Tenenbaum et al. 2000), local linear embedding (LLE) (Roweis and Saul 2000), the Laplacian eigenmap (Belkin and Niyogi 2003), etc.

Manifold learning methods preserve the local features by projecting the global structure to an approximate linear space, and by constructing a neighborhood graph to explore the inherent geometric features and manifold structure from the sample data sets. But these methods cannot consider the overall structure and lack a detailed analysis and explanation of the correlation between process and quality variables. Therefore, combining the global projection methods, such as PLS, and the manifold learning method, such as LPP and LLE, has become a new topic of concern for a growing number of engineers.

Regarding the combination of global and local information, Zhong et al. proposed a quality-related global and local partial least squares (GLPLS) model (Zhong et al. 2016). The GLPLS method integrates the advantages of the LPP and PLS methods and extracts meaningful low-dimensional representations from the high-dimensional process and quality data. The principal components in GLPLS preserve the local structural information in their respective data sets as much as possible. However, the correlation between the process and quality variables is not enhanced, and the constraints of LPP are removed from the optimization objective function. As a result, the monitoring results are seriously affected.

After further analysis of the geometric characteristics of LPP and PLS, Wang et al. proposed a new integration method, the locality-preserving partial least squares (LPPLS) model, which pays more attention to the locality-preserving characteristics (Wang et al. 2017). LPPLS can exploit the underlying geometric structure, which contains the local characteristics, in the input and output spaces. Although the maximization of the correlation between the process and quality variables is considered, the global characteristics are converted into a combination of multiple locally linearized characteristics rather than being expressed directly. In many processes, the linear relationship may be the most important one, and the best way is to describe it directly rather than through a combination of multiple locally linearized characteristics.

9.2 Mathematical Description of Dimensionality Reduction

9.2.1 PLS Optimization Objective

The PLS algorithm models the relationship between the normalized data sets \({\boldsymbol{X}=[\boldsymbol{x}(1),\boldsymbol{x}(2),\ldots ,\boldsymbol{x}(n)]^\mathrm {T}}\in R^{n\times m}\; (\boldsymbol{x} =[x_1,x_2,\ldots ,x_m]^\mathrm {T})\) and \({\boldsymbol{Y}}=[\boldsymbol{y}(1),\boldsymbol{y}(2),\ldots ,\boldsymbol{y}(n)]^\mathrm {T}\in R^{n\times l}\;(\boldsymbol{y} =[y_1,y_2,\ldots ,y_l]^\mathrm {T})\), where \({\boldsymbol{X}}\) collects the process variables and \({\boldsymbol{Y}}\) the quality variables, m and l are the dimensionalities of the input and output spaces, and n is the number of samples. \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) are decomposed as follows:

$$\begin{aligned} {\boldsymbol{X}}&= \boldsymbol{TP}^{\mathrm {T}} +\bar{\boldsymbol{X}} \end{aligned}$$
(9.1)
$$\begin{aligned} {\boldsymbol{Y}}&= \boldsymbol{UQ}^{\mathrm {T}} + \bar{\boldsymbol{Y}}, \end{aligned}$$
(9.2)

where \({\boldsymbol{T}}=[\boldsymbol{t}_1,\boldsymbol{t}_2,\ldots ,\boldsymbol{t}_d]\in R^{n\times d}\), and \({\boldsymbol{U}}=[\boldsymbol{u}_1,\boldsymbol{u}_2,\ldots ,\boldsymbol{u}_d]\in R^{n\times d}\) are the score matrices of \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\), respectively. \({\boldsymbol{P}}=[{\boldsymbol{p}}_1,{\boldsymbol{p}}_2,\ldots ,{\boldsymbol{p}}_d]\in R^{m\times d}\) and \({\boldsymbol{Q}}=[{\boldsymbol{q}}_1,{\boldsymbol{q}}_2,\ldots ,{\boldsymbol{q}}_d]\in R^{l\times d}\) are the load matrices of \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\). \(\bar{\boldsymbol{X}}\in R^{n\times m}\) and \(\bar{\boldsymbol{Y}}\in R^{n\times l}\) are the residual matrices of \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\). d is the number of latent variables. The weight vectors \({\boldsymbol{w}}\) and \({\boldsymbol{c}}\) are derived by the NIPALS algorithm such that the covariance of score vectors \({\boldsymbol{t}}\) and \({\boldsymbol{u}}\) is maximized.

$$\begin{aligned} \begin{aligned} \max cov({\boldsymbol{t}},{\boldsymbol{u}})&= \sqrt{Var({\boldsymbol{t}})Var({\boldsymbol{u}})} r({\boldsymbol{t}},{\boldsymbol{u}})\\&=\sqrt{Var(\boldsymbol{Xw})Var(\boldsymbol{Yc})} r(\boldsymbol{Xw},\boldsymbol{Yc}). \end{aligned} \end{aligned}$$
(9.3)

Equation (9.3) is actually equivalent to solving the following optimization problem:

$$\begin{aligned} \begin{aligned}&\max _{\boldsymbol{w},\boldsymbol{c}}<\boldsymbol{Xw},\;\boldsymbol{Yc}>\\&\mathrm{s.t.}\; \Vert \boldsymbol{w} \Vert =1,\Vert \boldsymbol{c} \Vert =1 \end{aligned} \end{aligned}$$
(9.4)

or

$$\begin{aligned} \begin{aligned}&J_\mathrm{PLS} =\max \boldsymbol{w}^\mathrm {T}\boldsymbol{X}^\mathrm {T}\boldsymbol{Yc}\\&\mathrm{s.t.}\; \Vert \boldsymbol{w}\Vert =1,\Vert \boldsymbol{c}\Vert =1. \end{aligned} \end{aligned}$$
(9.5)
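For concreteness, the following is a minimal NumPy sketch of the NIPALS iteration that extracts the weight vectors \(\boldsymbol{w}\), \(\boldsymbol{c}\) and score vectors \(\boldsymbol{t}\), \(\boldsymbol{u}\) solving (9.4)-(9.5); the function name, tolerance, and initialization are illustrative choices rather than part of the original formulation.

```python
import numpy as np

def nipals_pls(X, Y, d, max_iter=500, tol=1e-10):
    """Minimal NIPALS sketch: extract d latent variables maximizing w'X'Yc, cf. Eq. (9.5)."""
    E, F = X.copy(), Y.copy()
    W, C, T, U, P, Q = [], [], [], [], [], []
    for _ in range(d):
        u = F[:, [0]]                                 # initialize u with a column of F
        for _ in range(max_iter):
            w = E.T @ u; w /= np.linalg.norm(w)       # weight vector, ||w|| = 1
            t = E @ w                                 # X-score
            c = F.T @ t; c /= np.linalg.norm(c)       # Y-weight, ||c|| = 1
            u_new = F @ c                             # Y-score
            if np.linalg.norm(u_new - u) < tol:
                u = u_new; break
            u = u_new
        p = E.T @ t / (t.T @ t)                       # X-loading
        q = F.T @ t / (t.T @ t)                       # Y-loading (regression on t)
        E, F = E - t @ p.T, F - t @ q.T               # deflation
        W.append(w); C.append(c); T.append(t); U.append(u); P.append(p); Q.append(q)
    return [np.hstack(M) for M in (T, U, P, Q, W, C)]
```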

9.2.2 LPP and PCA Optimization Objectives

LPP aims to project points in space \({\boldsymbol{X}}\) into low-dimensional space \({\boldsymbol{\varPhi }} = \left[ {\boldsymbol{\phi }}^{\mathrm {T}}(1), \boldsymbol{\phi }^{\mathrm {T}}(2), \ldots , \boldsymbol{\phi }^{\mathrm {T}}(n) \right] ^{\mathrm {T}} \in {R^{n \times d}}(d < m, \boldsymbol{\phi }=[\phi _1,\ldots ,\phi _d] )\) via the projection matrix \({\boldsymbol{W}} = [{\boldsymbol{w}_1},\ldots ,{\boldsymbol{w}_d}] \in {R^{m \times d}}\), that is,

$$\begin{aligned} {\boldsymbol{\phi }}(i) = \boldsymbol{x}(i){\boldsymbol{W}}, (i = 1,2,\ldots ,n). \end{aligned}$$
(9.6)

The optimal mapping of the input space can be obtained by solving the following minimization problem:

$$\begin{aligned} \begin{aligned} J_\mathrm{{LPP}}({\boldsymbol{w}})&= \min \frac{1}{2}\sum \limits _{i,j = 1}^n {||{\boldsymbol{\phi }_i} - {\boldsymbol{\phi }_j}|{|^2}} {s_{xij}}\\&=\min \left( {\boldsymbol{w}}^{\mathrm {T}}{\boldsymbol{X}}^{\mathrm {T}}{\boldsymbol{D}}_x \boldsymbol{Xw} - {\boldsymbol{w}}^{\mathrm {T}}{\boldsymbol{X}}^{\mathrm {T}}{\boldsymbol{S}} _x \boldsymbol{Xw} \right) \\ {\mathrm{s.t.}}&\;\;{\boldsymbol{w}}^{\mathrm {T}}{\boldsymbol{X}}^{\mathrm {T}}{\boldsymbol{D}}_x\boldsymbol{Xw} = 1, \end{aligned} \end{aligned}$$
(9.7)

where \(\boldsymbol{S}_x=[s_{xij}] \in {R^{n \times n}}\) is the neighboring relationship matrix between \(x_i\) and \(x_j\). \(\boldsymbol{D}_x=[d_{xij}]\) is a diagonal matrix, \({d_{xii}} = \sum \limits _j {{s_{xij}}}\), and

$$\begin{aligned} s_{xij}=\left\{ \begin{aligned}&e^{-\frac{||{\boldsymbol{x}(i)} - {\boldsymbol{x}(j)}|{|^2}}{2\delta _x^2 }},&{\boldsymbol{x}(i)} \;\text {and} \;{\boldsymbol{x}(j)} \in \text {``neighbors''}\\ {}&0,&otherwise \end{aligned}\right. \end{aligned}$$
(9.8)

\(\delta _x\) is the neighborhood width parameter. The “neighbors” of \(\boldsymbol{x}(i)\) and \(\boldsymbol{x}(j)\) are determined by the K-nearest neighbors method.
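As an illustration, a minimal sketch of constructing \(\boldsymbol{S}_x\) and \(\boldsymbol{D}_x\) with a K-nearest-neighbor search and the heat kernel of (9.8) is given below; the function name and the parameters k and delta are illustrative choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lpp_weights(X, k=10, delta=1.0):
    """Neighborhood matrix S_x (Eq. 9.8) and degree matrix D_x for LPP."""
    n = X.shape[0]
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)        # +1: each point is its own neighbor
    dist, idx = nbrs.kneighbors(X)
    S = np.zeros((n, n))
    for i in range(n):
        for j, d_ij in zip(idx[i, 1:], dist[i, 1:]):          # skip the point itself
            S[i, j] = np.exp(-d_ij**2 / (2.0 * delta**2))     # heat-kernel weight
    S = np.maximum(S, S.T)                                    # symmetrize the neighbor relation
    D = np.diag(S.sum(axis=1))                                # d_ii = sum_j s_ij
    return S, D
```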

The LPP problem (9.7) in space \({\boldsymbol{X}}\) is updated as follows:

$$\begin{aligned} \begin{aligned} J_{\mathrm{LPP}}(\boldsymbol{w})&= \max \; {\boldsymbol{w}^{\mathrm {T}}}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{S}_x\boldsymbol{Xw}\\ \mathrm{s.t.} \;&\;{\boldsymbol{w}^{\mathrm {T}}}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{D}_x\boldsymbol{Xw} = 1. \end{aligned} \end{aligned}$$
(9.9)

The local structure information of \({\boldsymbol{X}}\) is contained in the matrices \({\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{S}_x\boldsymbol{X}\) and \({\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{D}_x\boldsymbol{X}\). The magnitude of the diagonal element values indicates the magnitude of the role of the corresponding variables in preserving the local structure. The non-diagonal elements correspond to the correlation between the observed variables. Similarly, the optimization problem for PCA can be expressed as follows:

$$\begin{aligned} \begin{aligned} J_\mathrm{{PCA}}(\boldsymbol{w})&= \max {\boldsymbol{w}^{\mathrm {T}}}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{Xw}\\ \mathrm{s.t.}&\;\; {\boldsymbol{w}^{\mathrm {T}}}\boldsymbol{w} = 1. \end{aligned} \end{aligned}$$
(9.10)
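Problem (9.9) is a generalized eigenvalue problem in \(\boldsymbol{X}^{\mathrm{T}}\boldsymbol{S}_x\boldsymbol{X}\) and \(\boldsymbol{X}^{\mathrm{T}}\boldsymbol{D}_x\boldsymbol{X}\). A minimal sketch of solving it numerically is shown below; the small ridge term used to keep the right-hand matrix positive definite is an assumption made for numerical robustness, not part of the original formulation.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_directions(X, S, D, d=2, reg=1e-8):
    """Solve the LPP problem (9.9): X'S X w = lambda X'D X w, keeping the d largest eigenvalues."""
    A = X.T @ S @ X                                  # local similarity structure
    B = X.T @ D @ X + reg * np.eye(X.shape[1])       # constraint matrix, regularized
    evals, evecs = eigh(A, B)                        # generalized eigenproblem, ascending order
    return evecs[:, ::-1][:, :d]                     # d directions with largest eigenvalues
```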

Based on the similarity of the optimization goals of LPP and PCA, and noting that the component extraction idea of PCA is contained in PLS, it is natural to fuse the LPP features into PLS to compensate for the lack of local feature extraction capability in PLS. The simplest feature fusion method is to combine the two optimization goals into a new one through trade-off parameters, as in GLPLS (Zhong et al. 2016).

9.3 Introduction to the GLPLS

The GLPLS method is introduced in this section to obtain the relationship between the quality and measurement variables while preserving the local characteristics as much as possible. The main idea is to integrate the LPP method, which preserves the local structural characteristics, with the PLS method, which performs the quality-related statistical analysis. As a result, the GLPLS method not only identifies the latent characteristic directions for both the measurement and the quality data spaces but also preserves, to the greatest extent possible, the local structural characteristics in the two latent subspaces.

Consider both the manifold structure for process variables \({\boldsymbol{X}}\) and the product output variables \({\boldsymbol{Y}}\) by introducing parameters \(\lambda _1\) and \(\lambda _2\) to control the trade-off between the extraction of the global and local features. Therefore, the objective of GLPLS-based method is defined as

$$\begin{aligned} \begin{aligned} {J_\mathrm{GLPLS}}(\boldsymbol{w},\boldsymbol{c})&= \arg \max \{ {\boldsymbol{w}^\mathrm {T}}{\boldsymbol{X}^\mathrm {T}}\boldsymbol{Yc} + {\lambda _1}{\boldsymbol{w}^\mathrm {T}}{\boldsymbol{\theta }_x}\boldsymbol{w} + {\lambda _2}{\boldsymbol{c}^\mathrm {T}}{\boldsymbol{\theta }_y}\boldsymbol{c}\} \\ \mathrm{s.t.}&\;\;{\boldsymbol{w}^\mathrm {T}}\boldsymbol{w} = 1,{\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} = 1, \end{aligned} \end{aligned}$$
(9.11)

where \({\boldsymbol{\theta }_x} = {\boldsymbol{X}^\mathrm {T}}{\boldsymbol{S}_x}\boldsymbol{X}\) and \({\boldsymbol{\theta }_y} = {\boldsymbol{Y}^\mathrm {T}}{\boldsymbol{S}_y}\boldsymbol{Y}\) represent the local structure information of the process variables and quality variables, respectively. \({\boldsymbol{S}_x}\), \({\boldsymbol{S}_y}\), \({\boldsymbol{D}_x}\), and \({\boldsymbol{D}_y}\) are the local feature matrices of the LPP algorithm. The parameters \({\lambda _1}\) and \({\lambda _2}\) control the weights between the global and local features.

It can be found from (9.11) that the objective function of GLPLS contains the objective function of the PLS algorithm \({\boldsymbol{w}^\mathrm {T}}{\boldsymbol{X}^\mathrm {T}}\boldsymbol{Yc}\) and a part of the optimization problem of LPP algorithm \({\boldsymbol{w}^\mathrm {T}}{\boldsymbol{X}^\mathrm {T}}{\boldsymbol{S}_x}\boldsymbol{Xw}\) and \({\boldsymbol{c}^\mathrm {T}}{\boldsymbol{Y}^\mathrm {T}}{\boldsymbol{S}_y}\boldsymbol{Yc}\).

The optimization function (9.11) seems to be a good combination of the PLS algorithm global characteristics and the LPP algorithm local persistence characteristics. Is that really the case? Let us analyze the solution of the optimization problem first. To solve the optimization objective function (9.11), the following Lagrange function is introduced:

$$\begin{aligned} \begin{aligned} \psi (\boldsymbol{w},\boldsymbol{c}) =&{\boldsymbol{w}^\mathrm {T}}{\boldsymbol{X}^\mathrm {T}}\boldsymbol{Yc} + {\lambda _1}{\boldsymbol{w}^\mathrm {T}}{\boldsymbol{\theta }_x}\boldsymbol{w} + {\lambda _2}{\boldsymbol{c}^\mathrm {T}}{\boldsymbol{\theta }_y}\boldsymbol{c} \\ {}&- {\eta _1}({\boldsymbol{w}^\mathrm {T}}\boldsymbol{w} - 1) - {\eta _2}({\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} - 1). \end{aligned} \end{aligned}$$
(9.12)

Then, according to the conditions for extremum, (9.11) is resolved as follows (Zhong et al. 2016):

$$\begin{aligned} {J_\mathrm{GLPLS}}(\boldsymbol{w},\boldsymbol{c}) = {\eta _1} + {\eta _2}. \end{aligned}$$
(9.13)

Let \({\lambda _1} = {\eta _1},{\lambda _2} = {\eta _2}\). The best projection vector \(\boldsymbol{w}\) is the eigenvector corresponding to the largest eigenvalue of \(\left( \boldsymbol{I} - \boldsymbol{\theta }_x\right) ^{ - 1} \boldsymbol{X}^\mathrm {T}\boldsymbol{Y} \left( \boldsymbol{I} - \boldsymbol{\theta }_y\right) ^{ - 1} \boldsymbol{Y}^\mathrm {T}\boldsymbol{X}\), and the best projection vector \(\boldsymbol{c}\) is the eigenvector corresponding to the largest eigenvalue of \( {(\boldsymbol{I} - {\boldsymbol{\theta }_y})^{ - 1}}{\boldsymbol{Y}^\mathrm {T}}\boldsymbol{X}{(\boldsymbol{I} - {\boldsymbol{\theta }_x})^{ - 1}}{\boldsymbol{X}^\mathrm {T}}\boldsymbol{Y}\), that is,

$$\begin{aligned} \begin{aligned} {(\boldsymbol{I} - {\boldsymbol{\theta }_x})^{ - 1}}{\boldsymbol{X}^\mathrm {T}}\boldsymbol{Y}{(\boldsymbol{I} - {\boldsymbol{\theta }_y})^{ - 1}}{\boldsymbol{Y}^\mathrm {T}}\boldsymbol{Xw}&= 4{\eta _1}{\eta _2}\boldsymbol{w}\\ {(\boldsymbol{I} - {\boldsymbol{\theta }_y})^{ - 1}}{\boldsymbol{Y}^\mathrm {T}}\boldsymbol{X}{(\boldsymbol{I} - {\boldsymbol{\theta }_x})^{ - 1}}{\boldsymbol{X}^\mathrm {T}}\boldsymbol{Yc}&= 4{\eta _1}{\eta _2}\boldsymbol{c}. \end{aligned} \end{aligned}$$
(9.14)

Equation (9.13) shows that the optimal solution of GLPLS is \(\eta _1+\eta _2\), but in the actual calculation process (9.14), the optimal solution obtained by GLPLS algorithm is \(\eta _1\eta _2\). Obviously, in most cases, the conditions for maximizing \(\eta _1+\eta _2\) and \(\eta _1\eta _2\) are different.
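For reference, the following is a minimal sketch of computing \(\boldsymbol{w}\) and \(\boldsymbol{c}\) from the eigenvalue problems in (9.14); the function name is illustrative, and \(\boldsymbol{\theta}_x\), \(\boldsymbol{\theta}_y\) are built as in (9.11).

```python
import numpy as np

def glpls_directions(X, Y, Sx, Sy):
    """Sketch of the GLPLS solution (9.14): w and c are the eigenvectors
    associated with the largest eigenvalues of the indicated matrix products."""
    theta_x = X.T @ Sx @ X                       # local structure of X
    theta_y = Y.T @ Sy @ Y                       # local structure of Y
    Ix, Iy = np.eye(theta_x.shape[0]), np.eye(theta_y.shape[0])
    A = np.linalg.solve(Ix - theta_x, X.T @ Y)   # (I - theta_x)^{-1} X'Y
    B = np.linalg.solve(Iy - theta_y, Y.T @ X)   # (I - theta_y)^{-1} Y'X
    lw, Vw = np.linalg.eig(A @ B)
    lc, Vc = np.linalg.eig(B @ A)
    w = np.real(Vw[:, np.argmax(lw.real)])
    c = np.real(Vc[:, np.argmax(lc.real)])
    return w / np.linalg.norm(w), c / np.linalg.norm(c)
```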

In order to explain the reason for this result, we return once again to the GLPLS optimization objective (9.11). Equation (9.11) is a combined global (PLS) and local (LPP) feature optimization problem, and this combination is reasonable to a certain extent. However, the latent variables of PLS are chosen so that they manifest as much variation as possible and so that the correlation between the latent variables is as strong as possible, whereas the LPP method only requires that the local structure information be preserved as much as possible when constructing its latent variables. In other words, although the local features of the process variables (\({\boldsymbol{\theta }_x} = {\boldsymbol{X}^\mathrm {T}}{\boldsymbol{S}_x}\boldsymbol{X}\)) and the quality variables (\({\boldsymbol{\theta }_y} = {\boldsymbol{Y}^\mathrm {T}}{\boldsymbol{S}_y}\boldsymbol{Y}\)) are enhanced, the correlation between the local features is not enhanced. Therefore, this direct combination of global and local features may lead to erroneous results.

In the GLPLS method, LPP is used to maintain the local structural features. Locally linear embedding (LLE) is another commonly used manifold learning algorithm. Like LPP, LLE converts a global nonlinear problem into a combination of multiple local linear problems by preserving local structural information, but LLE has fewer adjustable parameters than LPP. Therefore, LLE is another good solution for strongly locally nonlinear process systems. The LLE algorithm is briefly introduced in Chap. 11, where its optimization objective function is transformed into a general maximization form. In the next section, we therefore combine the PLS method and the LLE/LPP method in a new way, trying to maintain the global and local structural information of the process variables and quality variables at the same time while enhancing the correlation between them.

9.4 Basic Principles of GPLPLS

9.4.1 The GPLPLS Model

According to the Taylor series expansion, a nonlinear function can be written as follows:

$$\begin{aligned} F(Z) = A(Z - {Z_0}) + g(Z - {Z_0}), \end{aligned}$$
(9.15)

where \(A(Z - {Z_0})\) and \(g(Z - {Z_0})\) represent the linear part and the nonlinear part, respectively. In many real systems, especially near the equilibrium point (\({Z_0}\)), the linear part is primary and the nonlinear part is secondary. It is difficult for the PLS method to model nonlinear systems well, because PLS uses the linear dimensionality reduction method PCA to obtain the principal components, which only establishes the relationship between the linear parts of the input variable space (\(\boldsymbol{X}\)) and the output variable space (\(\boldsymbol{Y}\)). In order to obtain a better model with local nonlinear features, the KPLS model (Rosipal and Trejo 2001) maps the original data to a high-dimensional feature space, while the LPPLS model (Wang et al. 2017) transforms nonlinear features into a combination of multiple locally linearized features. Both of these methods can solve some nonlinear problems. However, the feature space of the KPLS model is not easy to determine, and the main linear part of the LPPLS model would be more suitably described directly by global structural features.

In fact, the PLS optimization (9.5) includes two goals for the selected latent variable: one is that the latent variable contains variance varying as much as possible and the other is that the correlation between the latent variables of the input space and the output space is as strong as possible. Although the GLPLS model combines global and local feature information, the combination of the two is not coordinated. How does one combine the two features to maintain the same objective? According to the expression of a nonlinear function (9.15), the input and output spaces can both be divided into two parts: the linear and nonlinear parts. By introducing local structure information, the nonlinear part can be transformed into a combination of multiple local linear problems.

Inspired by the role of the PCA model (\(\underline{\boldsymbol{w}^\mathrm {T}\boldsymbol{X}^\mathrm {T}} \boldsymbol{Xw}\)) in the PLS model (\(\underline{\boldsymbol{w}^\mathrm {T}\boldsymbol{X}^\mathrm {T}} \boldsymbol{Yc}\)) and the limitation of the GLPLS algorithm, this section proposes a novel dimensionality reduction method. It combines global (PCA) and local (LLE/LPP) features to extract latent variables of nonlinear systems. Therefore, the input space \(\boldsymbol{X}\) or the output space \(\boldsymbol{Y}\) is mapped to the new feature space \({\boldsymbol{X}_F}\) and \({\boldsymbol{Y}_F}\), respectively. The new feature space contains a global linear subspace and multiple local linear subspaces. Use the new feature space \({\boldsymbol{X}_F}\) and \({\boldsymbol{Y}_F}\) to replace the original space \(\boldsymbol{X}\) and \(\boldsymbol{Y}\), respectively. Consequently, a new objective function of the global plus local projection to latent structure (GPLPLS) method is shown in the following new optimization objective

$$\begin{aligned} \begin{aligned} {J_{\mathrm{GPLPLS}}}(\boldsymbol{w},\boldsymbol{c})&= \arg \max \{ {\boldsymbol{w}^\mathrm {T}}\boldsymbol{X}_F^\mathrm {T}{\boldsymbol{Y}_F}\boldsymbol{c}\} \\ s.t.\;{\boldsymbol{w}^\mathrm {T}}\boldsymbol{w}&= 1,{\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} = 1, \end{aligned} \end{aligned}$$
(9.16)

where \({\boldsymbol{X}_F}\) and \({\boldsymbol{Y}_F}\) satisfy \({\boldsymbol{X}_F} = \boldsymbol{X} + {\lambda _x}\boldsymbol{\theta }_x^{\frac{1}{2}}\) and \({\boldsymbol{Y}_F} = \boldsymbol{Y} + {\lambda _y}\boldsymbol{\theta }_y^{\frac{1}{2}}\).
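A minimal sketch of constructing \({\boldsymbol{X}_F}\) and \({\boldsymbol{Y}_F}\) is given below. It reads \(\boldsymbol{\theta }_x^{\frac{1}{2}}\) as the square-root factor \(\boldsymbol{S}_x^{\frac{1}{2}}\boldsymbol{X}\) (so that \(\boldsymbol{\theta }_x^{\frac{1}{2}\mathrm{T}}\boldsymbol{\theta }_x^{\frac{1}{2}}=\boldsymbol{\theta }_x\)), which is consistent with (9.27); the function name and the values of \(\lambda _x\) and \(\lambda _y\) are illustrative choices.

```python
import numpy as np
from scipy.linalg import sqrtm

def gplpls_feature_spaces(X, Y, Sx, Sy, lam_x=1.0, lam_y=1.0):
    """Build the augmented feature spaces X_F and Y_F (a sketch under the
    reading theta_x^(1/2) = Sx^(1/2) X, theta_y^(1/2) = Sy^(1/2) Y)."""
    Sx_half = np.real(sqrtm(Sx))            # symmetric square root of the neighbor matrix
    Sy_half = np.real(sqrtm(Sy))
    X_F = X + lam_x * Sx_half @ X           # global part (X) plus local part (Sx^(1/2) X)
    Y_F = Y + lam_y * Sy_half @ Y
    return X_F, Y_F
```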

It is found that the new feature spaces \({\boldsymbol{X}_F}\) and \({\boldsymbol{Y}_F}\) are both divided into a linear part (\({\boldsymbol{X}}\), \({\boldsymbol{Y}}\)) and a nonlinear part (\({\lambda _x}\boldsymbol{\theta }_x^{\frac{1}{2}}\), \({\lambda _y}\boldsymbol{\theta }_y^{\frac{1}{2}}\)), similar to (9.15). Figure 9.2 shows the principle of the GPLPLS method. Here \(\boldsymbol{X}_{global}\) and \(\boldsymbol{Y}_{global}\) are the corresponding linear parts of the input space and the output space, respectively; they are projected to the dimensionality reduction space by the traditional global projection method, PLS. \(\boldsymbol{X}_{local}\) and \(\boldsymbol{Y}_{local}\) are the corresponding nonlinear parts, which are projected by the locality-preserving projection method (LPP).

Fig. 9.2 The schematic diagram of the GPLPLS method

The core of extracting the principal components is PCA, and the linear model of \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) is established by (9.16). It actually contains two relations: one is the division of the input and output spaces into “scores” and “loads” (the external relationship), and the other is the relationship between the latent variables of the input space and output space (the internal relationship). These two relationships can also be seen from the schematic diagram of the GPLPLS model (Fig. 9.2). Obviously, we can retain the local structure information only in the internal model, only in the external model, or in both at the same time. Therefore, by setting four different combinations of \({\lambda _x}\) and \({\lambda _y}\), four different optimization objective functions can be set as follows:

(1) PLS optimization objective function: \({\lambda _x} = 0,{\lambda _y} = 0\).

(2) \({\mathrm{GPLPLS}}_{x}\) optimization objective function: \({\lambda _x} > 0,{\lambda _y} = 0\).

(3) \({\mathrm{GPLPLS}}_{y}\) optimization objective function: \({\lambda _x} = 0,{\lambda _y} > 0\).

(4) \({\mathrm{GPLPLS}}_{x+y}\) optimization objective function: \({\lambda _x}> 0,{\lambda _y} > 0\).

9.4.2 Relationship Between GPLPLS Models

The optimization objective function of the GPLPLS method is given by (9.16). There are three GPLPLS models according to different values of \({\lambda _x}\) and \({\lambda _y}\). What is the relationship between the three GPLPLS models? What is the difference between their modeling? These issues will be discussed in this section.

Suppose the original relationship is \(\boldsymbol{Y} = f(\boldsymbol{X})\). The local feature points used by local linear embedding or locality-preserving projection can be regarded as equilibrium points about which the system is linearized. From this perspective, the models with different combinations of \({\lambda _x}\) and \({\lambda _y}\) are as follows:

(1) PLS model: \({\hat{\boldsymbol{Y}}} = {\boldsymbol{A}}_{0}\boldsymbol{X}\).

(2) \({\mathrm{GPLPLS}}_{x}\) model: \({\hat{\boldsymbol{Y}}}= {\boldsymbol{A}_1}[\boldsymbol{X},{\boldsymbol{x}_{{z_i}}}]\).

(3) \({\mathrm{GPLPLS}}_{y}\) model: \({\hat{\boldsymbol{Y}}} = {\boldsymbol{A}_2}[\boldsymbol{X},\boldsymbol{f}({\boldsymbol{x}_{{l_j}}})]\).

(4) \({\mathrm{GPLPLS}}_{x+y}\) model: \({\hat{\boldsymbol{Y}}} = {\boldsymbol{A}_3}[\boldsymbol{X},{\boldsymbol{x}_{{z_i}}},\boldsymbol{f}({\boldsymbol{x}_{{l_j}}})]\).

Here \({\boldsymbol{x}_{{z_i}}}(i = 1,2,\ldots ,{k_x})\) and \({\boldsymbol{y}_{{l_j}}} = \boldsymbol{f}({\boldsymbol{x}_{{l_j}}})(j = 1,2,\ldots ,{k_y})\) are the local feature points of the input space and output space, respectively, and \({\boldsymbol{A}_0},{\boldsymbol{A}_1},{\boldsymbol{A}_2}\), and \({\boldsymbol{A}_3}\) are the model coefficient matrices. Obviously, PLS uses a simple linear approximation of the original system, which is generally not good for a system with relatively strong nonlinearity. GPLPLS uses spatial local decomposition and approximates the original system with the sum of multiple simple linear models. \(\mathrm{GPLPLS}_{x}\) and \(\mathrm{GPLPLS}_{y}\) are special cases of \(\mathrm{GPLPLS}_{x+y}\). It seems that these three combinations cover all possible GPLPLS models, but let us return to the optimization function of the \(\mathrm{GPLPLS}_{x+y}\) model.

$$\begin{aligned} \begin{aligned} J_{{\mathrm{GPLPLS}_{x+y}}}(\boldsymbol{w},\boldsymbol{c})=&\arg \max _{\boldsymbol{w},\boldsymbol{c}} \{ {\boldsymbol{w}^\mathrm {T}}\left( \boldsymbol{X}+\lambda _x\boldsymbol{\theta }_x^\frac{1}{2}\right) ^\mathrm {T}\left( \boldsymbol{Y}+\lambda _y\boldsymbol{\theta }_y^\frac{1}{2}\right) \boldsymbol{c}\}\\ =&\arg \max _{\boldsymbol{w},\boldsymbol{c}} \left\{ \boldsymbol{w}^\mathrm {T}\boldsymbol{X}^\mathrm {T}\boldsymbol{Y c}+ \lambda _x \boldsymbol{w}^\mathrm {T}\boldsymbol{\theta }_x^{\frac{1}{2}\mathrm {T}} \boldsymbol{Y c}\right. \\&+ \left. \lambda _y \boldsymbol{w}^\mathrm {T}\boldsymbol{X}^\mathrm {T}\boldsymbol{\theta }_y^\frac{1}{2} \boldsymbol{c} + \lambda _x \lambda _y \boldsymbol{w}^\mathrm {T}\boldsymbol{\theta }_x^{\frac{1}{2}\mathrm {T}} \boldsymbol{\theta }_y^\frac{1}{2} \boldsymbol{c} \right\} \\ \mathrm{s.t.}&\quad {\boldsymbol{w}^\mathrm {T}}\boldsymbol{w} = 1,{\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} = 1. \end{aligned} \end{aligned}$$
(9.17)

Obviously, (9.17) contains two coupled components (\(\boldsymbol{\theta }_{x}^{\frac{1}{2}\mathrm {T}}\boldsymbol{Y}\) and \(\boldsymbol{X}^{\mathrm {T}} \boldsymbol{\theta }_{y}^{\frac{1}{2}}\)), which represent the correlation between the linear primary part and the nonlinear part. In some cases, these coupled components may have a negative impact on modeling. On the other hand, in addition to the external relationship between the input and output space which can be extended to a combination of linear and nonlinear, the internal relationship between the input and output space (the final model) can also be described as a combination of linear and nonlinear. Therefore, it is natural that we can model the linear and nonlinear parts without considering the coupling component between the two parts. Correspondingly, there is no need to consider the coupling component between the linear and nonlinear parts in the optimization function of the model. Therefore, the optimization objective of the following \(\mathrm{GPLPLS}_{xy}\) model can be obtained:

$$\begin{aligned} \begin{aligned} J_\mathrm{{\mathrm{GPLPLS}_{xy}}}(\boldsymbol{w},\boldsymbol{c})&= \arg \max \{ \boldsymbol{w}^\mathrm {T}\boldsymbol{X}^\mathrm {T}\boldsymbol{Y} \boldsymbol{c} + \lambda _{xy} \boldsymbol{w}^\mathrm {T}\boldsymbol{\theta }_x^{\frac{1}{2}\mathrm {T}} \boldsymbol{\theta }_y^\frac{1}{2} \boldsymbol{c} \}\\ \mathrm{s.t.}&\quad {\boldsymbol{w}^\mathrm {T}}\boldsymbol{w} = 1,{\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} = 1. \end{aligned} \end{aligned}$$
(9.18)

Here the parameter \(\lambda _{xy}\) controls the trade-off between global and local features.

9.4.3 Principal Components of the GPLPLS Model

In this section, we show how to obtain the principal components of the GPLPLS model. To facilitate comparison with the traditional linear PLS model, denote \(\boldsymbol{E}_{0F} = {\boldsymbol{X}_F}\) and \(\boldsymbol{F}_{0F} = {\boldsymbol{Y}_F}\). The optimization objective functions of the four GPLPLS models are all included in the following optimization objective:

$$\begin{aligned} \begin{aligned} {J_{\mathrm{GPLPLS}}}(\boldsymbol{w},\boldsymbol{c})&= \arg \max \{ {\boldsymbol{w}^\mathrm {T}}\boldsymbol{X}_F^\mathrm {T}{\boldsymbol{Y}_F}\boldsymbol{c} + \lambda _{xy}{\boldsymbol{w}^\mathrm {T}} \boldsymbol{\theta }_{x}^{\frac{1}{2}\mathrm {T}} \boldsymbol{\theta }_{y}^{\frac{1}{2}}\boldsymbol{c} \} \\ \mathrm{s.t.}&\quad {\boldsymbol{w}^\mathrm {T}}\boldsymbol{w} = 1,{\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} = 1, \end{aligned} \end{aligned}$$
(9.19)

where at least one of \([{\lambda _x},{\lambda _y}]\) and \({\lambda _{xy}}\) is nonzero. The steps of obtaining latent variables of the GPLPLS model (9.19) are as follows.

First, the Lagrangian multiplier factor is introduced to transform the objective function (9.19) into the following unconstrained form:

$$\begin{aligned} \begin{aligned} \varPsi ({\boldsymbol{w}_1},{\boldsymbol{c}_1}) =&\boldsymbol{w}_1^\mathrm {T}\boldsymbol{E}_{0F}^\mathrm {T}{\boldsymbol{F}_{0F}}{\boldsymbol{c}_1} + {\lambda _{xy}}\boldsymbol{w}_1^\mathrm {T}\boldsymbol{\theta }_{x}^{\frac{1}{2}\mathrm {T}} \boldsymbol{\theta }_{y}^{\frac{1}{2}}\boldsymbol{c}_{1}\\ {}&- {\lambda _1}(\boldsymbol{w}_1^\mathrm {T}{\boldsymbol{w}_1} - 1) - {\lambda _2}(\boldsymbol{c}_1^\mathrm {T}{\boldsymbol{c}_1} - 1). \end{aligned} \end{aligned}$$
(9.20)

Setting \((\partial \varPsi )/(\partial {\boldsymbol{w}_1}) = 0\) and \((\partial \varPsi )/(\partial {\boldsymbol{c}_1}) = 0\) yields the optimal \({\boldsymbol{w}_1}\) and \({\boldsymbol{c}_1}\). The objective function (9.19) is then transformed as

$$\begin{aligned} \left[ \boldsymbol{E}_{0F}^\mathrm {T}{\boldsymbol{F}_{0F}} + {\lambda _{xy}} \boldsymbol{\theta }_{x}^{\frac{1}{2}\mathrm {T}}\boldsymbol{\theta }_{y}^{\frac{1}{2}}\right] \left[ \boldsymbol{E}_{0F}^\mathrm {T}{\boldsymbol{F}_{0F}} + {\lambda _{xy}} \boldsymbol{\theta }_{x}^{\frac{1}{2}\mathrm {T}}\boldsymbol{\theta }_{y}^{\frac{1}{2}}\right] ^\mathrm {T}{\boldsymbol{w}_1}= {\theta ^2}{\boldsymbol{w}_1} \end{aligned}$$
(9.21)
$$\begin{aligned} \left[ \boldsymbol{F}_{0F}^\mathrm {T}{\boldsymbol{E}_{0F}} + {\lambda _{xy}}\boldsymbol{\theta }_{y}^{\frac{1}{2}\mathrm {T}}\boldsymbol{\theta }_{x}^{\frac{1}{2}}\right] \left[ \boldsymbol{F}_{0F}^\mathrm {T}{\boldsymbol{E}_{0F}} + {\lambda _{xy}}\boldsymbol{\theta }_{y}^{\frac{1}{2}\mathrm {T}}\boldsymbol{\theta }_{x}^{\frac{1}{2}}\right] ^\mathrm {T}{\boldsymbol{c}_1} = {\theta ^2}{\boldsymbol{c}_1}, \end{aligned}$$
(9.22)

where \(\boldsymbol{\theta }= {\boldsymbol{w}^\mathrm {T}}\boldsymbol{X}_F^\mathrm {T}{\boldsymbol{Y}_F}\boldsymbol{c} + \lambda _{xy}{\boldsymbol{w}^\mathrm {T}} \boldsymbol{\theta }_{x}^{\frac{1}{2}\mathrm {T}} \boldsymbol{\theta }_{y}^{\frac{1}{2}}\boldsymbol{c}\). The target vectors \(\boldsymbol{w}_{1}\) and \(\boldsymbol{c}_{1}\) are calculated from (9.21) and (9.22). After obtaining the target vector (that is, the direction vector of the latent variables), the latent variables \(\boldsymbol{t}_{1}\) and \({\boldsymbol{u}_1}\), the load vectors \(\boldsymbol{p}_{1}\) and \(\boldsymbol{q}_{1}\), and the residual matrices \(\boldsymbol{E}_{1}\) and \(\boldsymbol{F}_{1}\) can be calculated as follows:

$$\begin{aligned} {\boldsymbol{t}_1}&= {\boldsymbol{E}_{0F}}{\boldsymbol{w}_1},&{\boldsymbol{u}_1}&= {\boldsymbol{F}_{0F}}{\boldsymbol{c}_1} \end{aligned}$$
(9.23)
$$\begin{aligned} \boldsymbol{p}_1&= \frac{\boldsymbol{E}_{0F}^\mathrm {T}\boldsymbol{t}_1}{\left\| \boldsymbol{t}_1 \right\| ^2},&\boldsymbol{q}_1&= \frac{\boldsymbol{F}_{0F}^\mathrm {T}\boldsymbol{t}_1}{\left\| \boldsymbol{t}_1 \right\| ^2}\end{aligned}$$
(9.24)
$$\begin{aligned} {\boldsymbol{E}_{1F}}&= {\boldsymbol{E}_{0F}} - {\boldsymbol{t}_1}\boldsymbol{p}_1^\mathrm {T},&{\boldsymbol{F}_{1F}}&= \boldsymbol{F}_{0F} - {\boldsymbol{t}_1}\boldsymbol{q}_1^\mathrm {T}. \end{aligned}$$
(9.25)

Similar to the PLS method, the other latent variables of the GPLPLS model can be obtained by continuing to decompose the residual matrices \({\boldsymbol{E}_{iF}}\) and \({\boldsymbol{F}_{iF}}\) (\(i = 1,2,\ldots ,d - 1\)). Usually, the first d latent variables are used to produce a good predictive regression model, and d can be determined by the cross-validation test (Zhou et al. 2010).
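The extraction procedure (9.20)-(9.25) can be sketched as follows. The leading singular vectors of the augmented cross-product matrix provide \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\); the function name is illustrative, \(\boldsymbol{\theta}_x^{\frac{1}{2}}\) and \(\boldsymbol{\theta}_y^{\frac{1}{2}}\) are passed in as the square-root factors used in (9.17), and keeping them fixed during deflation is an assumption of this sketch.

```python
import numpy as np

def gplpls_latents(E0F, F0F, theta_x_half, theta_y_half, lam_xy, d):
    """Sketch of GPLPLS latent-variable extraction, Eqs. (9.20)-(9.25)."""
    E, F = E0F.copy(), F0F.copy()
    T, P, Q, W, C = [], [], [], [], []
    for _ in range(d):
        M = E.T @ F + lam_xy * (theta_x_half.T @ theta_y_half)   # augmented cross-product
        U_, _, Vt = np.linalg.svd(M, full_matrices=False)
        w, c = U_[:, [0]], Vt.T[:, [0]]        # leading singular vectors solve (9.21)-(9.22)
        t = E @ w                              # latent variable, Eq. (9.23)
        p = E.T @ t / (t.T @ t)                # loading vectors, Eq. (9.24)
        q = F.T @ t / (t.T @ t)
        E, F = E - t @ p.T, F - t @ q.T        # deflation, Eq. (9.25)
        T.append(t); P.append(p); Q.append(q); W.append(w); C.append(c)
    return [np.hstack(M_) for M_ in (T, P, Q, W, C)]
```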

The above is the establishment of the GPLPLS model and its principal component extraction process. Now let’s compare GPLPLS model with the GLPLS model.

First of all, GPLPLS shares the main idea of the GLPLS method, i.e., combining local and global structural features (covariance), but the GPLPLS method integrates global and local structural features better than the GLPLS method. Different from the GLPLS method, the GPLPLS method not only maintains the local structural features but also extracts as much of the relevant information in the input and output spaces as possible. Therefore, the GPLPLS method can extract the largest global correlation as much as possible while extracting the local structural correlation between process and quality variables.

Compared with the LPPLS method (Chap. 10) and the LLEPLS method (Chap. 11), which describe all characteristics by local features, the GPLPLS method treats the two kinds of features differently. The indiscriminate local description has advantages in strongly nonlinear systems, but not necessarily in systems that are linearly dominant with only local nonlinearity. The GPLPLS method proposed in this chapter is aimed at such linearly dominant processes while still maintaining some nonlinear relationships: it integrates global features (covariance) and nonlinear correlation (multivariance) as much as possible.

9.5 GPLPLS-Based Quality Monitoring

9.5.1 Process and Quality Monitoring Based on GPLPLS

The GPLPLS-based monitoring method is very similar to the PLS-based one. The common monitoring indicators of PLS are \(\mathrm{{T}^2}\) and \(\mathrm {SPE}\). As explained in detail in Chap. 11, the \(\mathrm {SPE}\) statistic is not suitable for monitoring the residual space of PLS. Therefore, in this chapter, the GPLPLS-based process monitoring uses \(\mathrm{T}^2\)-type statistics to monitor both the principal component subspace and the remaining subspace. The monitoring procedure is divided into two parts, offline training and online monitoring, as detailed below.

The input space \(\boldsymbol{X}\) and the output space \(\boldsymbol{Y}\) of the GPLPLS model are mapped to a low-dimensional space defined by a small number of latent variables \([{\boldsymbol{t}_1},\ldots ,{\boldsymbol{t}_d}]\). The decomposition of \(\boldsymbol{E}_{0F}\) and \(\boldsymbol{F}_{0F}\) is as follows:

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0F}&= \sum \limits _{i = 1}^d {\boldsymbol{t}_i}\boldsymbol{p}_i^\mathrm {T}+ \overline{\boldsymbol{E}}_{0F} = \boldsymbol{T}\boldsymbol{P}^\mathrm {T}+ {\overline{\boldsymbol{E}}}_{0F} \\ \boldsymbol{F}_{0F}&= \sum \limits _{i = 1}^d \boldsymbol{t}_i \boldsymbol{q}_i^\mathrm {T}+ \overline{\boldsymbol{F}}_{0F} = \boldsymbol{T} \boldsymbol{Q}^\mathrm {T}+ {\overline{\boldsymbol{F}}}_{0F}, \end{aligned} \end{aligned}$$
(9.26)

where \(\boldsymbol{T} = [{\boldsymbol{t}_1},{\boldsymbol{t}_2},\ldots ,{\boldsymbol{t}_d}]\) is the score matrix. \(\boldsymbol{P} = [{\boldsymbol{p}_1},\ldots ,{\boldsymbol{p}_d}]\) and \(\boldsymbol{Q} = [{\boldsymbol{q}_1},\ldots ,{\boldsymbol{q}_d}]\) are the load matrices of the process variable \(\boldsymbol{E}_{0F}\) and the quality variable \(\boldsymbol{F}_{0F}\), respectively. The score matrix can be expressed directly in terms of \(\boldsymbol{E}_{0F}\):

$$\begin{aligned} \boldsymbol{T} = \boldsymbol{E}_{0F}\boldsymbol{R} = \left( \boldsymbol{I} + {\lambda _x}\boldsymbol{S}_x^{\frac{1}{2}}\right) \boldsymbol{E}_0 \boldsymbol{R}, \end{aligned}$$
(9.27)

where \(\boldsymbol{R} = \left[ {\boldsymbol{r}_1},\ldots ,{\boldsymbol{r}_d}\right] \) is the decomposition matrix, and

$$\begin{aligned} {\boldsymbol{r}_i} = \prod \limits _{j = 1}^{i - 1} \left( \boldsymbol{I}_n - \boldsymbol{w}_j\boldsymbol{p}_j^\mathrm {T}\right) \boldsymbol{w}_i. \end{aligned}$$

Note that \(\boldsymbol{E}_{0F}\) contains the results of locality-preserving learning. Operations (9.26) and (9.27) can be executed during model training, but during online monitoring the data are sampled in real time, and an individual real-time sample cannot be used to construct the transformation matrix \(\boldsymbol{S}_x\) or \(\boldsymbol{S}_y\) required for locality learning. Considering the practical application of (9.26) and (9.27), they should therefore be transformed into the decomposition of the normalized matrices \(\boldsymbol{E}_{0}\) and \(\boldsymbol{F}_{0}\),

$$\begin{aligned} {\boldsymbol{E}_0} = {\boldsymbol{T}_0}{\boldsymbol{P}^\mathrm {T}} + \bar{\boldsymbol{E}}_0 \end{aligned}$$
(9.28)
$$\begin{aligned} {\boldsymbol{F}_0} = {\boldsymbol{T}_0}{\bar{\boldsymbol{Q}} ^\mathrm {T}} + {\bar{\boldsymbol{F}} _0} = {\boldsymbol{E}_0}R{\bar{\boldsymbol{Q}} ^\mathrm {T}} + {\overline{\boldsymbol{F}} _0}, \end{aligned}$$
(9.29)

where \({\boldsymbol{T}_0} = {\boldsymbol{E}_0}\boldsymbol{R}\), \(\bar{\boldsymbol{E}}_0= {\boldsymbol{E}_0} - {\boldsymbol{T}_0}{\boldsymbol{P}^\mathrm {T}}\), and \(\bar{\boldsymbol{Q}}^\mathrm{T} = \boldsymbol{T}_0^ + {\boldsymbol{F}_0}\).
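A minimal sketch of building \(\boldsymbol{R}\), \({\boldsymbol{T}_0}\), and \(\bar{\boldsymbol{Q}}\) from the training decomposition is given below; \(\boldsymbol{W}\) and \(\boldsymbol{P}\) are the weight and loading matrices obtained from the GPLPLS extraction, and the function name is illustrative.

```python
import numpy as np

def training_matrices(E0, F0, P, W):
    """Build R, T0, and Q_bar from the training data, Eqs. (9.27)-(9.29)."""
    m, d = W.shape
    R = np.zeros((m, d))
    for i in range(d):
        r = W[:, [i]]
        for j in reversed(range(i)):                  # r_i = prod_{j<i}(I - w_j p_j') w_i
            r = (np.eye(m) - W[:, [j]] @ P[:, [j]].T) @ r
        R[:, [i]] = r
    T0 = E0 @ R                                       # scores of the normalized data
    Qbar_T = np.linalg.pinv(T0) @ F0                  # Q_bar^T, so that F0 ≈ T0 @ Qbar_T
    E0_res = E0 - T0 @ P.T                            # input residual matrix
    return R, T0, Qbar_T, E0_res
```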

During the online monitoring for new samples \(\boldsymbol{x}\) and \(\boldsymbol{y}\) (standardized data), an oblique projection is introduced in the input space \(\boldsymbol{x}\):

$$\begin{aligned} \boldsymbol{x}&= \boldsymbol{\hat{x}} + {\boldsymbol{x}_e}\end{aligned}$$
(9.30)
$$\begin{aligned} \boldsymbol{\hat{x}}&= \boldsymbol{P}{\boldsymbol{R}^\mathrm {T}}\boldsymbol{x}\end{aligned}$$
(9.31)
$$\begin{aligned} {\boldsymbol{x}_e}&= (\boldsymbol{I} - \boldsymbol{P}{\boldsymbol{R}^\mathrm {T}})\boldsymbol{x}. \end{aligned}$$
(9.32)

The statistics \(\mathrm {T}_{pc}^2\) and \(\mathrm {T}_{e}^2\) of the principal component space and the remaining subspace are calculated as follows:

$$\begin{aligned} \boldsymbol{t}&= {\boldsymbol{R}^\mathrm {T}}\boldsymbol{x} \end{aligned}$$
(9.33)
$$\begin{aligned} \mathrm{{T}}_{pc}^2:= {\boldsymbol{t}^\mathrm {T}}{\boldsymbol{\varLambda }^{ - 1}}\boldsymbol{t}&= {\boldsymbol{t}^\mathrm {T}}{\left\{ \frac{1}{{n - 1}}\boldsymbol{T}_0^\mathrm {T}{\boldsymbol{T}_0}\right\} ^{ - 1}}\boldsymbol{t} \end{aligned}$$
(9.34)
$$\begin{aligned} \mathrm{{T}}_e^2 := \boldsymbol{x}_e^\mathrm {T}\boldsymbol{\varLambda }_e^{ - 1}{\boldsymbol{x}_e}&= \boldsymbol{x}_e^\mathrm {T}{\left\{ \frac{1}{{n - 1}}\boldsymbol{x}_e^\mathrm {T}{\boldsymbol{x}_e}\right\} ^{ - 1}}{\boldsymbol{x}_e}, \end{aligned}$$
(9.35)

where \(\boldsymbol{\varLambda }\) and \({\boldsymbol{\varLambda }_e}\) are covariance matrices, and \(\mathrm{T}_{pc}^2\) and \(\mathrm {T}_{e}^2\) are statistics with thresholds \(\mathrm{Th}_{pc,\alpha }\) and \(\mathrm {Th}_{e,\alpha }\), respectively. Since the statistics \(\mathrm {T}_{pc}^2\) and \(\mathrm {T}_{e}^2\) are not obtained from the normalized data \({\boldsymbol{E}_{0}}\) alone and the output variables may not obey a Gaussian distribution, the corresponding thresholds cannot be calculated from the F-distribution. Their probability density functions should therefore be estimated first by non-parametric kernel density estimation (KDE) (Lee et al. 2010).
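A minimal sketch of such a KDE-based control limit, taken as the \(\alpha\) quantile of the estimated distribution of a monitoring statistic on the training data, is shown below; the grid construction and the function name are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_threshold(stat_train, alpha=0.9975, grid_size=2000):
    """Control limit of a monitoring statistic via kernel density estimation:
    the threshold is the point where the estimated CDF reaches alpha."""
    kde = gaussian_kde(stat_train)
    grid = np.linspace(0.0, stat_train.max() * 3.0, grid_size)
    pdf = kde(grid)
    cdf = np.cumsum(pdf); cdf /= cdf[-1]        # numerical CDF on the grid
    return grid[np.searchsorted(cdf, alpha)]
```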

The fault diagnosis logic based on the GPLPLS model is as follows:

$$\begin{aligned} \begin{aligned} \mathrm{T}_{pc}^2&> \mathrm{Th}_{pc,\alpha }&\text {Quality-relevant faults}\\ \mathrm{T}_{pc}^2&> \mathrm{Th}_{pc,\alpha } \;\; \text {or}\;\; \mathrm{T}_{e}^2 > \mathrm{Th}_{e,\alpha }&\text {Process-relevant faults}\\ \mathrm{T}_{pc}^2&\le \mathrm{Th}_{pc,\alpha } \;\; \text {and} \;\; \mathrm{T}_{e}^2 \le \mathrm{Th}_{e,\alpha }&\text {Fault free} \end{aligned} \end{aligned}$$
(9.36)

The GPLPLS-based process monitoring procedure for multiple-input, multiple-output data is as follows:

(1) Standardize the original data \(\boldsymbol{X}\) and \(\boldsymbol{Y}\). Calculate \({\boldsymbol{T}_0}\), \(\bar{\boldsymbol{Q}}\), and \(\boldsymbol{R}\) based on the GPLPLS algorithm, (9.28) and (9.29). Determine the number of principal components d by cross-validation.

(2) Construct the input remaining subspace \({\boldsymbol{x}_e}\).

(3) Calculate the thresholds by non-parametric KDE, and perform fault diagnosis with the detection logic (9.36); a sketch of this online computation is given below.
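The following sketch computes the statistics (9.33)-(9.35) for one standardized sample and applies the diagnosis logic (9.36), reporting the most specific verdict. Estimating the residual covariance from the training residuals \(\bar{\boldsymbol{E}}_0\) is an assumption of this sketch (a single online sample cannot provide it), and the function name and threshold arguments are illustrative.

```python
import numpy as np

def monitor_sample(x, R, P, T0, E0_res, th_pc, th_e):
    """T^2_pc and T^2_e for one standardized sample x, Eqs. (9.33)-(9.36)."""
    n = T0.shape[0]
    t = R.T @ x                                          # score vector, Eq. (9.33)
    Lam = (T0.T @ T0) / (n - 1)                          # score covariance
    T2_pc = float(t.T @ np.linalg.inv(Lam) @ t)          # Eq. (9.34)
    x_e = x - P @ (R.T @ x)                              # residual part of x, cf. (9.32)
    Lam_e = (E0_res.T @ E0_res) / (n - 1)                # covariance of training residuals
    T2_e = float(x_e.T @ np.linalg.pinv(Lam_e) @ x_e)    # Eq. (9.35)
    if T2_pc > th_pc:
        verdict = "quality-relevant fault"
    elif T2_e > th_e:
        verdict = "process-relevant fault"
    else:
        verdict = "fault free"
    return T2_pc, T2_e, verdict
```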

9.5.2 Posterior Monitoring and Evaluation

Many quality-related process monitoring methods have been verified on the well-known TE process simulation platform. The goal of most methods is to make the quality-related alarm rate as high as possible, but the reasonableness of the monitoring results seems to receive little attention. Therefore, similar to the performance evaluation indices for control loops, we introduce a posterior monitoring assessment (PMA) index to evaluate the reasonableness of the quality-related alarm rate. PMA is defined as follows:

$$\begin{aligned} \mathrm{PMA} = \frac{\mathbb {E}\left( \boldsymbol{y}_N^2\right) }{\mathbb {E}\left( \boldsymbol{y}_F^2\right) }, \end{aligned}$$
(9.37)

where \(\mathbb {E}(\cdot )\) is the mathematical expectation, and \({\boldsymbol{y}_N}\) and \({\boldsymbol{y}_F}\) are the output data of the training data set and of the fault data set, respectively; both are normalized by the mean and standard deviation of \({\boldsymbol{y}_N}\). \(\mathrm{PMA} \rightarrow 1\) indicates that the quality under the fault is close to normal operation, and \(\mathrm{PMA} > 1\) indicates that the quality is better than normal. Moreover, a \(\mathrm{PMA}\) far from 1 means that the quality is very different from normal, so the corresponding quality-related index \(\mathrm{{T}^2}\) (PLS method) or \(\mathrm{T}_{pc}^2\) (GPLPLS method) should be higher, and the others should be lower.

However, widespread feedback controllers reduce the impact of certain faults, especially small ones, so a single PMA indicator cannot truly reflect the dynamic changes. Two PMA indicators are therefore adopted to describe the dynamic and steady-state effects, respectively,

$$\begin{aligned} \mathrm{PMA}_1 = \min \left\{ \frac{\mathbb {E}\left( \boldsymbol{Y}^2_N(k_0:k_1,i)\right) }{\mathbb {E}\left( \boldsymbol{Y}_{F}^2(k_0:k_1,i)\right) }\right\} ,\;\;i=1,2,\ldots ,l \end{aligned}$$
(9.38)
$$\begin{aligned} \mathrm{PMA}_2 = \min \left\{ \frac{\mathbb {E}\left( \boldsymbol{Y}_N^2(k_2:n,i)\right) }{\mathbb {E}\left( \boldsymbol{Y}_{F}^2(k_2:n,i)\right) }\right\} ,\;\;i=1,2,\ldots ,l, \end{aligned}$$
(9.39)

where \(k_0\), \(k_1\), and \(k_2\) are constant sample indices. Note that the worst case is selected in order to ensure the rationality of the evaluation. Moreover, the two PMA indicators are only used to test whether the previous fault detection results are reasonable. Their evaluations are objective, but unlike the detection based on the GPLPLS model, they do not indicate whether a fault is quality related. Quality testing is necessary for further diagnosis.
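A minimal sketch of computing the two indices (9.38)-(9.39) is given below; Y_N and Y_F denote the normal and faulty output blocks, already normalized by the statistics of the normal data, and the function name and the use of Python half-open slices are illustrative choices.

```python
import numpy as np

def pma_indices(Y_N, Y_F, k0, k1, k2):
    """Posterior monitoring assessment indices, Eqs. (9.38)-(9.39).
    The minimum over the output variables implements the worst-case strategy."""
    # dynamic (transient) index over samples k0..k1 (Python slice excludes k1)
    pma1 = np.min(np.mean(Y_N[k0:k1] ** 2, axis=0) / np.mean(Y_F[k0:k1] ** 2, axis=0))
    # steady-state index over samples k2..n
    pma2 = np.min(np.mean(Y_N[k2:] ** 2, axis=0) / np.mean(Y_F[k2:] ** 2, axis=0))
    return pma1, pma2
```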

9.6 TE Process Simulation Analysis

Process monitoring and fault diagnosis based on the GPLPLS model are tested on the TE simulation platform. The monitoring performance of several models, namely PLS, concurrent projection to latent structures (CPLS) (Qin 2012), and GPLPLS, is compared. In CPLS, the input and output spaces are projected and decomposed into five subspaces: the input principal subspace, input residual subspace, output principal subspace, output residual subspace, and joint input-output subspace. Since only quality-related faults are of interest here, the input principal and residual subspaces of the CPLS model are replaced by the input remaining subspace \({\boldsymbol{x}_e}\), and the corresponding monitoring statistic is replaced by \(\mathrm{T}_e^2\). The output principal and residual subspaces of the CPLS model are not considered, in order to highlight process-based quality monitoring. Two different data sets, from Zhang et al. (2017) and Wang et al. (2017), are used.

9.6.1 Model and Discussion

The input matrix is composed of the process variables [XMEAS(1:22)] and the manipulated variables [XMV(1:11), except XMV(5) and XMV(9)]. The output matrix is composed of the quality variables [XMEAS(35), XMEAS(36)]. The training data are the normal data IDV(0), and the test data are the 21 fault data sets IDV(1-21). The threshold is calculated for a confidence level of 99.75%.

The simulation parameters of the GPLPLS model (specifically, the \({\mathrm{GPLPLS}}_{xy}\) model) are \({k_x} = 22,{k_y} = 23,{\lambda _x} = {\lambda _y} = 0,{\lambda _{xy}} = 1, {k_0} = 161\). Note that the local nonlinear structural features are extracted by the LLE method. The numbers of principal components of the PLS, CPLS, and GPLPLS models are 6, 6, and 2, respectively, determined by the cross-validation method, and \({k_1} = n = 960,{k_2} = 701\). The detection results, including the FDR, FAR, and \(\mathrm PMA\) indicators, are listed in Table 9.1.

Table 9.1 FDRs of PLS, CPLS, \(\mathrm{GPLPLS}_{xy}\), and \(\mathrm PMA\)

With these two PMA indices in Table 9.1, 21 faults are divided into two types: quality-independent faults (\(\mathrm{PMA}_1 > 0.9\) or \(\mathrm{PMA}_1+ \mathrm{PMA}_2 > 1.5\) ) including IDV(3,4,9,11,14,15,19) and quality-related faults. Furthermore, the quality-related faults are further subdivided into four types:

Type 1: fault has a slight impact on quality, [IDV(10,16,17, and 20)], \(0.5<\mathrm{PMA}_i < 0.8\; i=1,2\).

Type 2: fault is quality recoverable, [IDV(1,5, and 7)], \(\mathrm{PMA}_1 < 0.35\; \text {and}\;\mathrm{PMA}_2> 0.65\).

Type 3: fault has a serious impact on quality, [IDV(2, 6, 8, 12, 13, and 18)], \(\mathrm{PMA}_i < 0.1\; i=1,2\).

Type 4: fault causes the output variables to drift slowly, [IDV(21)].

Although this classification is only a preliminary result that depends on the choice of the parameters \(k_0\), \(k_1\), and \(k_2\), it still has reference value. All methods show consistent results for the seriously quality-related faults, which are therefore not discussed in the following fault detection analysis.

9.6.2 Fault Diagnosis Analysis

From the above results, it is found that for some faults the detection results of the different methods are not consistent; these include quality-recoverable faults, slightly quality-related faults, and quality-independent faults. A detailed analysis of the three situations is given below. In all monitoring graphs, the horizontal axis represents the sample, the vertical axis represents the statistic (the upper plot shows \(\mathrm{T}_{pc}^2\) and the lower plot shows \(\mathrm{T}^2_e\)), the red dotted line is the threshold at a confidence level of 99.75%, and the blue line is the actual monitoring value. In all prediction graphs, the horizontal axis represents the sample, the vertical axis represents the output value, the blue dashed line is the actual value, and the green line is the prediction.

Fig. 9.3 Output prediction for IDV(1), IDV(5), and IDV(7) using the \(\mathrm{GPLPLS}_{xy}\) method

Fig. 9.4 PLS, CPLS, and \({\mathrm{GPLPLS}}_{xy}\) monitoring results for IDV(7)

(1) Quality-recoverable fault

Quality-recoverable faults include IDV(1), IDV(5), and IDV(7). They are all step-change faults, but the feedback or cascade controllers reduce their effect on quality during actual operation. Therefore, the quality variables under faults IDV(1), IDV(5), and IDV(7) should return to normal. The output predictions are shown in Fig. 9.3. As an example, the corresponding fault monitoring results for IDV(7) are shown in Fig. 9.4 for the PLS, CPLS, and \({\mathrm{GPLPLS}}_{xy}\) models, respectively. Here the statistics \({\mathrm{T}_{pc}^2}\) and \(\mathrm{T}_e^2\) monitor the input space for process-related faults. For the \({\mathrm{GPLPLS}}_{xy}\) model, the value of the \({\mathrm{T}_{pc}^2}\) statistic returns to its normal level, while the \(\mathrm{T}_e^2\) statistic still maintains a high value. This means that these faults are quality-recoverable faults. PLS and CPLS report these faults as quality-related but give many false alarms, especially for IDV(7): their \({\mathrm{T}_{pc}^2}\) statistic is very close to the threshold yet still exceeds it, so the fault alarm persists even after the operation has returned to normal under the controllers. These methods fail to grasp the essence of the fault detection problem with recoverable quality. In this case, the \({\mathrm{GPLPLS}}_{xy}\) method can accurately reflect the process and quality changes.

(2) Quality-independent fault

Quality-independent faults include IDV(4), IDV(11), and IDV(14); they are process related but not quality related. All these faults are related to the reactor cooling water, and these disturbances hardly affect the quality of the output products. The corresponding output quality predictions of the \({\mathrm{GPLPLS}}_{xy}\) method are shown in Fig. 9.5, and the monitoring results for IDV(14) by the PLS, CPLS, and \({\mathrm{GPLPLS}}_{xy}\) methods are shown in Fig. 9.6. In the \({\mathrm{GPLPLS}}_{xy}\) model, \({\mathrm{T}_{pc}^2}\) remains almost entirely below the threshold, which indicates that these faults are not related to quality. For the PLS and CPLS models, however, these faults are detected in both \({\mathrm{T}_{pc}^2}\) and \(\mathrm{T}_e^2\); in other words, PLS and CPLS indicate that these disturbances are quality related. Compared with PLS, the CPLS method can filter out fault alarms in \({\mathrm{T}_{pc}^2}\) to a certain extent, but it still gives more alarms than \({\mathrm{GPLPLS}}_{xy}\). For quality-independent faults, PLS and CPLS have high detection rates but fail to indicate that the faults are quality independent.

Fig. 9.5 Output prediction for IDV(4), IDV(11), and IDV(14) using the \({\mathrm{GPLPLS}}_{xy}\) method

Fig. 9.6 PLS, CPLS, and \({\mathrm{GPLPLS}}_{xy}\) monitoring results for IDV(14)

(3) Slight quality-related faults

Faults such as IDV(10), IDV(16), IDV(17), and IDV(20) have only a slight impact on quality, and this type of fault has received little study. Their quality-related alarm rates are similar to those of the quality-recoverable faults. Although they are quality related, their impact on quality is small, and the corresponding \({\mathrm{T}_{pc}^2}\) values are relatively low; to some extent, these faults can also be regarded as quality-independent. Many methods, such as the PLS method, fail to detect them accurately. The output predictions of the \({\mathrm{GPLPLS}}_{xy}\) model are shown in Fig. 9.7, and the monitoring results of the three models for fault IDV(20) are shown in Fig. 9.8. It can be seen that the monitoring results of the \({\mathrm{GPLPLS}}_{xy}\) model are the most accurate, while the PLS and CPLS models give false alarms. In the \({\mathrm{GPLPLS}}_{xy}\) model, the process changes better match the quality changes.

From the three situations analyzed above, it can be seen that the GPLPLS method can filter out harmful alarms. It is applicable to slightly quality-related faults, quality-independent faults, and quality-recoverable faults. There are two possible reasons for the good fault diagnosis performance of the GPLPLS method: first, the principal components of the GPLPLS method are based on both global features and nonlinear local structural features, which enhances its nonlinear mapping ability; second, the GPLPLS method uses a non-Gaussian threshold, which allows it to handle signals that do not necessarily satisfy the Gaussian assumption.

Fig. 9.7 Output predicted values for IDV(16), IDV(17), and IDV(20) using the \({\mathrm{GPLPLS}}_{xy}\) method

Fig. 9.8 PLS, CPLS, and \({\mathrm{GPLPLS}}_{xy}\) monitoring results for IDV(20)

9.6.3 Comparison of Different GPLPLS Models

For the same data set as above, the FDRs of the other three models, \({\mathrm{GPLPLS}}_{x}\), \({\mathrm{GPLPLS}}_{y}\), and \({\mathrm{GPLPLS}}_{x+y}\) (local nonlinear structural features are all extracted by the LLE method), are shown in Table 9.2, where \(K = [{k_x},{k_y}]\). It can be seen from the table that the results of these methods are very good and lead to consistent conclusions; in particular, the FDRs of the \({\mathrm{GPLPLS}}_{x+y}\) model and the \({\mathrm{GPLPLS}}_{xy}\) model are very similar.

Table 9.2 FDRs of GPLPLS methods with LLE local feature

In order to discuss these models more clearly, fault IDV(7) is selected for further analysis. It can be seen from Table 9.2 that the monitoring results for IDV(7) by the \({\mathrm{GPLPLS}}_{y}\) model are obviously inconsistent with those of the other methods: its \(\mathrm T^2\) statistic gives a high alarm rate (79.25%). According to the previous analysis, this alarm is an annoying false alarm. The other three models have relatively low alarm rates for fault IDV(7), near 26%, which means that their monitoring performance is very good. A possible reason for the false alarm is that the \({\mathrm{GPLPLS}}_{y}\) model only enhances the local nonlinear structural characteristics in the output space; for a process that is linear in the input space and nonlinear only in the output space, its monitoring results might be better. However, the input space of the TE simulation process may also have strong nonlinearity, which leads to the poor monitoring results of the \({\mathrm{GPLPLS}}_{y}\) model, whereas the other three models show higher consistency for this type of fault.

The above results of the GPLPLS models are obtained by combining them with the LLE method to retain the local nonlinear structural features. The monitoring results of the GPLPLS models combined with the other local structure-preserving algorithm, the LPP method, are given in Table 9.3, where \(\varSigma = [{\sigma _x},{\sigma _y}]\). Table 9.3 leads to consistent conclusions, so a detailed analysis is not repeated here.

Table 9.3 FDRs of GPLPLS methods with LPP local feature

Many methods have the similar fusion idea of global projection and local preserving, such as GLPLS, LPPLS, and others. These methods all need to adjust parameters, and different parameters have different results. In order to be as consistent as possible with the existing results of other methods, we chose the same data set in Wang et al. (2017) for the following tests.

In the following comparison experiment, the input variable matrix \(\boldsymbol{X}\) is composed of the process variables \([\text {XMEAS}\left( {1:22} \right) ]\) and 11 manipulated variables \([\text {XMEAS}\left( {23:33}\right) ]\), excluding XMV(12). The quality variable matrix \(\boldsymbol{Y}\) includes XMEAS(35) and XMEAS(38). The parameters of the models based on the combination of manifold learning algorithms and PLS are set as follows:

(1) The GLPLS model: \({\delta _x} = 0.1,{\delta _y} = 0.8,{k_x} = 12,{k_y} = 12\).

(2) The LPPLS model: \({\delta _x} = 1.5,{\delta _y} = 0.8,{k_x} = 20,{k_y} = 15\).

(3) The GPLPLS model: \({k_x} = 11,{k_y} = 16\) (mainly refers to the \({\mathrm{GPLPLS}}_{xy}\) model).

Table 9.4 FDRs comparison for different quality-related methods

Table 9.4 lists the FDR values of different quality-related monitoring methods, corresponding to PLS, CPLS, GLPLS, and GPLPLS models, and the corresponding detection threshold is calculated with confidence level of 99.75%. The last two columns are FDRs calculated based on the PMA value of this data set.

It can be seen from Tables 9.1 and 9.4 that, although the data sets are different, the PMA results are similar. Therefore, the quality-related monitoring results should also be similar, and it is obvious that the GPLPLS model gives consistent conclusions. The higher FDRs of the other models relative to GPLPLS arise because they cannot properly distinguish whether these faults are quality related. Although GLPLS follows a similar idea of fusing global features and local structure, its weak monitoring performance is caused by inappropriate parameters and model construction. Because it is difficult to select suitable parameters, the parameter determination method remains an open issue.

In summary, the GPLPLS model shows good monitoring performance. It combines global structural features with local structural features in a suitable way, so both the output prediction results and the fault monitoring results of the model are better than those of the other models.

9.7 Conclusions

This chapter proposes a new statistical monitoring model based on the global plus local projection to latent structure (GPLPLS) model. This model not only maintains the global and local structural characteristics of the data but also pays more attention to the correlation between the extracted principal components. First, the GLPLS method is introduced and it is pointed out that its model construction is unreasonable; the GPLPLS method is then proposed to maintain the global and local features with a new structure. A monitoring model based on the GPLPLS method is established, and its monitoring performance is verified on the TE process simulation platform. The results show that, compared with PLS, CPLS, and GLPLS, the GPLPLS method has better process monitoring performance for quality-related faults.