As mentioned in the previous chapter, industrial data are usually divided into two categories, process data and quality data, which belong to different measurement spaces. The vast majority of smart manufacturing problems, such as soft measurement, control, monitoring, and optimization, inevitably require modeling the relationships between these two kinds of measurement variables. The subject of this chapter is to discover the correlation between data sets in different observation spaces.

Multivariate statistical analysis methods that rely on the correlation among variables mainly include canonical correlation analysis (CCA) and partial least squares regression (PLS). Both perform linear dimensionality reduction aimed at relating the variables in the two measurement spaces; the difference is that CCA maximizes the correlation between the extracted components, while PLS maximizes their covariance.

3.1 Canonical Correlation Analysis

Canonical correlation analysis (CCA) was first proposed by Hotelling in 1936 (Hotelling 1936). It is a multivariate statistical analysis method that uses the correlation between two composite variables to reflect the overall correlation between two sets of variables. The CCA algorithm is widely used in the analysis of data correlation and forms the basis of partial least squares. It has also been applied to feature fusion, data dimensionality reduction, and fault detection (Yang et al. 2015; Zhang and Dou 2015; Zhang et al. 2020; Hou 2013; Chen et al. 2016a, b).

3.1.1 Mathematical Principle of CCA

Assume that there are l dependent variables \(\boldsymbol{y}=\left( y_{1}, y_{2}, \ldots , y_{l}\right) ^\mathrm {T}\) and m independent variables \(\boldsymbol{x}=\left( x_{1}, x_{2}, \ldots , x_{m}\right) ^\mathrm {T}\). In order to capture the correlation between the dependent and independent variables, n sample points are observed, which constitute two data sets

$$\begin{aligned} \boldsymbol{X}=\left[ \boldsymbol{x}(1), \boldsymbol{x}(2), \ldots , \boldsymbol{x}(n)\right] ^\mathrm {T}\in R^{n \times m} \end{aligned}$$
$$\begin{aligned} \boldsymbol{Y}=\left[ \boldsymbol{y}(1), \boldsymbol{y}(2), \ldots , \boldsymbol{y}(n)\right] ^\mathrm {T}\in R^{n \times l} \end{aligned}$$

CCA draws on the idea of component extraction to find a canonical component u, which is a linear combination of the variables \(x_i\), and a canonical component v, which is a linear combination of the \(y_i\). During the extraction, the correlation between u and v is required to be maximized. The degree of correlation between u and v can then roughly reflect the correlation between \(\boldsymbol{X}\) and \(\boldsymbol{Y}\).

Without loss of generality, assume that the original variables are standardized, i.e., each column of the data sets \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) has mean 0 and variance 1. The covariance matrix \(\mathrm{cov}(\boldsymbol{X},\boldsymbol{Y})\) is then equal to the correlation coefficient matrix, in which

$$\begin{aligned} \mathrm{cov}(\boldsymbol{X},\boldsymbol{Y})=\frac{1}{n} \begin{bmatrix} \boldsymbol{X}^\mathrm {T}\boldsymbol{X} &{} \quad \boldsymbol{X}^\mathrm {T}\boldsymbol{Y}\\ \boldsymbol{Y}^\mathrm {T}\boldsymbol{X} &{} \quad \boldsymbol{Y}^\mathrm {T}\boldsymbol{Y} \end{bmatrix} =\begin{bmatrix} \boldsymbol{\varSigma }_{xx} &{} \boldsymbol{\varSigma }_{xy}\\ \boldsymbol{\varSigma }_{xy}^\mathrm {T}&{} \boldsymbol{\varSigma }_{yy} \end{bmatrix} \end{aligned}$$

PCA analyzes \(\boldsymbol{\varSigma }_{xx}\) or \(\boldsymbol{\varSigma }_{yy}\) alone, while CCA analyzes the cross-covariance block \(\boldsymbol{\varSigma }_{xy}\).
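As a quick illustration of the quantities involved, the following numpy sketch computes the blocks \(\boldsymbol{\varSigma }_{xx}\), \(\boldsymbol{\varSigma }_{xy}\), and \(\boldsymbol{\varSigma }_{yy}\) from standardized data matrices. The function name is illustrative and the \(1/n\) scaling follows the convention used above; this is a sketch, not code from the original text.

```python
import numpy as np

def covariance_blocks(X, Y):
    """Covariance blocks of standardized data sets X (n x m) and Y (n x l).

    Assumes each column of X and Y has already been scaled to zero mean
    and unit variance, as in the text.
    """
    n = X.shape[0]
    Sxx = X.T @ X / n   # covariance of X with itself
    Syy = Y.T @ Y / n   # covariance of Y with itself
    Sxy = X.T @ Y / n   # cross-covariance between X and Y
    return Sxx, Sxy, Syy
```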

Now the problem is how to find the direction vectors \(\boldsymbol{\alpha }\) and \(\boldsymbol{\beta }\), and then use them to construct the canonical components:

$$\begin{aligned} \begin{aligned} u&= {\alpha _1}{x_1} + {\alpha _2}{x_2} + \cdots + {\alpha _m}{x_m}\\ v&= {\beta _1}{y_1} + {\beta _2}{y_2} + \cdots + {\beta _l}{y_l}, \end{aligned} \end{aligned}$$
(3.1)

where \( \boldsymbol{\alpha }= [{\alpha _1},{\alpha _2},\ldots ,{\alpha _m}]^{\mathrm {T}} \in {R^{m \times 1}} \), \( \boldsymbol{\beta }= {[{\beta _1},{\beta _2},\ldots ,{\beta _l}]^{\mathrm {T}}} \in {R^{l \times 1}} \), such that the correlation between u and v is maximized. Obviously, the sample means of u and v are zero, and their sample variances are as follows:

$$\begin{aligned} \mathrm{var} (u)&= \boldsymbol{\alpha }^\mathrm {T}\boldsymbol{\varSigma }_{xx} \boldsymbol{\alpha }\\ \mathrm{var} (v)&= \boldsymbol{\beta }^\mathrm {T}\boldsymbol{\varSigma }_{yy} \boldsymbol{\beta }\end{aligned}$$

The covariance of u and v is

$$\begin{aligned} \mathrm{cov} (u, v) = \boldsymbol{\alpha }^\mathrm {T}\boldsymbol{\varSigma }_{xy} \boldsymbol{\beta }. \end{aligned}$$

One way to maximize the correlation of u and v is to make the corresponding correlation coefficient maximum, i.e.,

$$\begin{aligned} \max {\rho }(u, v) = \frac{\mathrm{cov} (u, v)}{\sqrt{\mathrm{var} (u)\mathrm{var} (v)}}. \end{aligned}$$
(3.2)

In CCA, the following optimization objective is used:

$$\begin{aligned} \begin{aligned}&J_\mathrm{CCA}=\max \langle u,v\rangle = \boldsymbol{\alpha }^\mathrm {T}\boldsymbol{\varSigma }_{xy} \boldsymbol{\beta }\\&\mathrm{s.t.}\; \boldsymbol{\alpha }^\mathrm {T}\boldsymbol{\varSigma }_{xx} \boldsymbol{\alpha }=1; \boldsymbol{\beta }^\mathrm {T}\boldsymbol{\varSigma }_{yy} \boldsymbol{\beta }=1. \end{aligned} \end{aligned}$$
(3.3)

This optimization objective can be summarized as follows: seek a unit vector \(\boldsymbol{\alpha }\) on the subspace of \(\boldsymbol{X}\) and a unit vector \(\boldsymbol{\beta }\) on the subspace of \(\boldsymbol{Y}\) such that the correlation between u and v is maximized. Geometrically, \({\rho }(u, v)\) is equal to the cosine of the angle \(\omega \) between u and v. Thus, (3.3) is equivalent to minimizing the angle \(\omega \) between u and v.

It can be seen from (3.3) that the goal of the CCA algorithm is finally transformed into a constrained optimization problem. The maximum value of this objective is the first canonical correlation coefficient between \(\boldsymbol{X}\) and \(\boldsymbol{Y}\), and the corresponding \( \boldsymbol{\alpha }\) and \(\boldsymbol{\beta }\) are the projection vectors, or linear coefficients. After the first pair of canonical correlation variables is obtained, the second to kth pairs of canonical correlation variables, which are uncorrelated with the pairs already extracted, can be calculated in the same way.

Figure 3.1 shows the basic principle of the CCA algorithm.

Fig. 3.1 Basic principle diagram of the CCA algorithm

There are two main methods for optimizing the above objective function to obtain \( \boldsymbol{\alpha }\) and \( \boldsymbol{\beta }\): eigenvalue decomposition and singular value decomposition.

3.1.2 Eigenvalue Decomposition of CCA Algorithm

Using the Lagrangian function, the objective function of (3.3) is transformed as follows:

$$\begin{aligned} \max J_\mathrm{CCA}(\boldsymbol{\alpha },\boldsymbol{\beta }) = {\boldsymbol{\alpha }^{\mathrm {T}}}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\beta }- \frac{{{\lambda _1}}}{2}({\boldsymbol{\alpha }^{\mathrm {T}}}{\boldsymbol{\varSigma }_{xx}}\boldsymbol{\alpha }- 1) - \frac{{{\lambda _2}}}{2}({\boldsymbol{\beta }^{\mathrm {T}}}{\boldsymbol{\varSigma }_{yy}}\boldsymbol{\beta }- 1). \end{aligned}$$
(3.4)

Setting \(\frac{\partial J}{\partial \boldsymbol{\alpha }} =0 \) and \(\frac{\partial J}{\partial \boldsymbol{\beta }} =0 \) gives

$$\begin{aligned} \begin{aligned} {\boldsymbol{\varSigma }_{xy}}\boldsymbol{\beta }- {\lambda _1}{\boldsymbol{\varSigma }_{xx}}\boldsymbol{\alpha }= 0\\ {\boldsymbol{\varSigma }_{xy}^\mathrm {T}}\boldsymbol{\alpha }- {\lambda _2}{\boldsymbol{\varSigma }_{yy}}\boldsymbol{\beta }= 0. \end{aligned} \end{aligned}$$
(3.5)

Multiplying the first equation of (3.5) on the left by \(\boldsymbol{\alpha }^\mathrm {T}\) and the second by \(\boldsymbol{\beta }^\mathrm {T}\), and using the constraints in (3.3), shows that \({\lambda _1} = {\lambda _2} = \boldsymbol{\alpha }^\mathrm {T}\boldsymbol{\varSigma }_{xy} \boldsymbol{\beta }\). Let \( \lambda = {\lambda _1} = {\lambda _2} \), multiply the two equations of (3.5) on the left by \( \boldsymbol{\varSigma }_{xx}^{ - 1} \) and \( \boldsymbol{\varSigma }_{yy}^{ - 1} \), respectively, and get:

$$\begin{aligned} \begin{aligned} \boldsymbol{\varSigma }_{xx}^{ - 1}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\beta }= \lambda \boldsymbol{\alpha }\\ \boldsymbol{\varSigma }_{yy}^{ - 1}{\boldsymbol{\varSigma }_{yx}}\boldsymbol{\alpha }= \lambda \boldsymbol{\beta }. \end{aligned} \end{aligned}$$
(3.6)

Substituting the second formula in (3.6) into the first formula, we can get

$$\begin{aligned} \boldsymbol{\varSigma }_{xx}^{ - 1}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1}{\boldsymbol{\varSigma }_{yx}}\boldsymbol{\alpha }= {\lambda ^2}\boldsymbol{\alpha }\end{aligned}$$
(3.7)

From (3.7), the largest eigenvalue \(\lambda ^2\) and the corresponding eigenvector \( \boldsymbol{\alpha }\) can be obtained by an eigenvalue decomposition of the matrix \( \boldsymbol{\varSigma }_{xx}^{ - 1}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1}{\boldsymbol{\varSigma }_{yx}} \). In a similar way, the vector \( \boldsymbol{\beta }\) can be obtained. The projection vectors \( \boldsymbol{\alpha }\) and \( \boldsymbol{\beta }\) of the first pair of canonical correlation variables are thus available.
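The eigenvalue-decomposition route can be sketched in a few lines of numpy. The function below is an illustrative sketch (the name cca_eig and the normalization details are not from the text); it assumes standardized data and invertible covariance blocks, eigen-decomposes \( \boldsymbol{\varSigma }_{xx}^{ - 1}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1}{\boldsymbol{\varSigma }_{yx}} \) as in (3.7), and recovers \(\boldsymbol{\beta }\) from the second equation of (3.6).

```python
import numpy as np

def cca_eig(X, Y, k=1):
    """CCA projection vectors via the eigenvalue problem (3.7).

    X: n x m standardized inputs, Y: n x l standardized outputs.
    Returns alpha (m x k), beta (l x k), and the canonical correlations.
    """
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    # Eigen-decompose Sxx^{-1} Sxy Syy^{-1} Syx; its eigenvalues are lambda^2.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:k]
    lam = np.sqrt(np.clip(eigvals.real[order], 0, None))  # canonical correlations
    alpha = eigvecs[:, order].real

    # Second equation of (3.6): beta = Syy^{-1} Syx alpha / lambda.
    beta = np.linalg.solve(Syy, Sxy.T @ alpha) / lam

    # Rescale so that alpha^T Sxx alpha = beta^T Syy beta = 1, as required by (3.3).
    alpha /= np.sqrt(np.sum(alpha * (Sxx @ alpha), axis=0))
    beta /= np.sqrt(np.sum(beta * (Syy @ beta), axis=0))
    return alpha, beta, lam
```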

3.1.3 SVD Solution of CCA Algorithm

Let \( \boldsymbol{\alpha }= \boldsymbol{\varSigma }_{xx}^{ - 1/2} \boldsymbol{a} \), \( \boldsymbol{\beta }= \boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b} \), and then we can get

$$\begin{aligned} \begin{aligned} {\boldsymbol{\alpha }^{\mathrm {T}}}{\boldsymbol{\varSigma }_{xx}}\boldsymbol{\alpha }&= 1 \rightarrow {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{\varSigma }_{xx}^{ - 1/2}{\boldsymbol{\varSigma }_{xx}}\boldsymbol{\varSigma }_{xx}^{ - 1/2}\boldsymbol{a} = 1 \rightarrow {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{a} = 1\\ {\boldsymbol{\beta }^{\mathrm {T}}}{\boldsymbol{\varSigma }_{yy}}\boldsymbol{\beta }&= 1 \rightarrow {\boldsymbol{b}^{\mathrm {T}}}\boldsymbol{\varSigma }_{yy}^{ - 1/2}{\boldsymbol{\varSigma }_{yy}}\boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b} = 1 \rightarrow {\boldsymbol{b}^{\mathrm {T}}}\boldsymbol{b} = 1\\ {\boldsymbol{\alpha }^{\mathrm {T}}}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\beta }&= {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{\varSigma }_{xx}^{ - 1/2}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b}. \end{aligned} \end{aligned}$$
(3.8)

In other words, the objective function of (3.3) can be transformed as follows:

$$\begin{aligned} \begin{aligned}&J_\mathrm{CCA} (\boldsymbol{a}, \boldsymbol{b})= \arg \max _{a,b} {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{\varSigma }_{xx}^{ - 1/2}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b}\\&\mathrm{s.t.}\;\;{\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{a} = {\boldsymbol{b}^{\mathrm {T}}}\boldsymbol{b} = 1. \end{aligned} \end{aligned}$$
(3.9)

A singular value decomposition for matrix \(\boldsymbol{M}\) yields

$$\begin{aligned} \boldsymbol{M}={\boldsymbol{\varSigma }}_{xx}^{-1 / 2} {\boldsymbol{\varSigma }}_{xy} {\boldsymbol{\varSigma }}_{yy}^{-1 / 2}={\boldsymbol{\varGamma }} {\boldsymbol{\varSigma }} {\boldsymbol{\varPsi }}^{\mathrm {T}}, {\boldsymbol{\varSigma }}=\begin{bmatrix}{\boldsymbol{\varLambda }}_{\kappa } &{} 0 \\ 0 &{} 0\end{bmatrix} \end{aligned}$$
(3.10)

where \(\kappa \) is the number of principal elements or non-zero singular values, \(\kappa \le \min (l,m)\), \({\boldsymbol{\varLambda }}_{\kappa }={\text {diag}}\left( \lambda _{1}, \ldots , \lambda _{\kappa }\right) \), and \(\lambda _{1}\ge \cdots \ge \lambda _{\kappa }>0\).

Since the columns of \(\boldsymbol{\varGamma }\) and \(\boldsymbol{\varPsi }\) form orthonormal bases, choosing \(\boldsymbol{a}\) as a column of \(\boldsymbol{\varGamma }\) and \(\boldsymbol{b}\) as a column of \(\boldsymbol{\varPsi }\) makes \( {\boldsymbol{a}^{\mathrm {T}}} \boldsymbol{\varGamma }\) and \( {\boldsymbol{\varPsi }}^{\mathrm {T}} \boldsymbol{b} \) vectors with a single entry equal to 1 and all remaining entries equal to 0. So, we can get

$$\begin{aligned} {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{\varSigma }_{xx}^{ - 1/2}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b} = {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{\varGamma }\boldsymbol{\varSigma }{\boldsymbol{\varPsi }^{\mathrm {T}}}\boldsymbol{b} = {\sigma _{ab}}. \end{aligned}$$
(3.11)

From (3.11), it can be seen that \( {\boldsymbol{a}^{\mathrm {T}}}\boldsymbol{\varSigma }_{xx}^{ - 1/2}{\boldsymbol{\varSigma }_{xy}}\boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b} \) is maximized when \(\boldsymbol{a}\) and \(\boldsymbol{b}\) are the left and right singular vectors corresponding to the maximum singular value of \(\boldsymbol{M}\). Thus, using these left and right singular vectors from \( \boldsymbol{\varGamma }\) and \( {\boldsymbol{\varPsi }}\), we can obtain the projection vectors \( \boldsymbol{\alpha }\) and \( \boldsymbol{\beta }\) of a pair of canonical correlation variables, namely,

$$\begin{aligned} \begin{aligned} \boldsymbol{\alpha }= \boldsymbol{\varSigma }_{xx}^{ - 1/2}\boldsymbol{a}\\ \boldsymbol{\beta }= \boldsymbol{\varSigma }_{yy}^{ - 1/2}\boldsymbol{b}. \end{aligned} \end{aligned}$$
(3.12)
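A minimal numpy sketch of the SVD route is given below, under the same assumptions as before (standardized data, positive definite covariance blocks). The inverse square roots are computed through a symmetric eigendecomposition; the helper names are illustrative.

```python
import numpy as np

def _inv_sqrt(S, eps=1e-12):
    """Inverse square root of a symmetric positive (semi)definite matrix."""
    w, V = np.linalg.eigh(S)
    w = np.maximum(w, eps)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_svd(X, Y, k=1):
    """CCA projection vectors via the SVD of M = Sxx^{-1/2} Sxy Syy^{-1/2}, cf. (3.10)-(3.12)."""
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    Sxx_is, Syy_is = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    M = Sxx_is @ Sxy @ Syy_is

    Gamma, sing, PsiT = np.linalg.svd(M)   # M = Gamma * Sigma * Psi^T
    alpha = Sxx_is @ Gamma[:, :k]          # (3.12): alpha = Sxx^{-1/2} a
    beta = Syy_is @ PsiT.T[:, :k]          # (3.12): beta  = Syy^{-1/2} b
    return alpha, beta, sing[:k]           # singular values = canonical correlations
```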

3.1.4 CCA-Based Fault Detection

When there is a clear input-output relationship between two types of data measurable online, CCA can be used to design an effective fault detection system. The CCA-based fault detection method can be regarded as an alternative to the PCA-based method and an extension of the PLS-based method (Chen et al. 2016a).

Let

$$\begin{aligned} \boldsymbol{J}_{s}&={\boldsymbol{\varSigma }}_{xx}^{-1 / 2} {\boldsymbol{\varGamma }}(:, 1: \kappa ) \\ \boldsymbol{L}_{s}&={\boldsymbol{\varSigma }}_{yy}^{-1 / 2} {\boldsymbol{\varPsi }}(:, 1: \kappa )\\ \boldsymbol{J}_{\mathrm {res}}&={\boldsymbol{\varSigma }}_{xx}^{-1 / 2} {\boldsymbol{\varGamma }}(:, \kappa +1: m)\\ \boldsymbol{L}_{\mathrm {res}}&={\boldsymbol{\varSigma }}_{yy}^{-1 / 2} {\boldsymbol{\varPsi }}(:, \kappa +1: l). \end{aligned}$$

According to CCA method, \(\boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{x}}\) and \(\boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{y}}\) are closely related. However, in actual systems, measurement variables are inevitably affected by noise, and the correlation between \(\boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{x}}\) and \(\boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{y}}\) can be expressed as

$$\begin{aligned} \boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{y}}(k)={\boldsymbol{\varLambda }}_{\kappa }^{\mathrm {T}} \boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{x}}(k)+{v}_{s}(k), \end{aligned}$$
(3.13)

where \({v}_{s}\) is a noise term that is only weakly correlated with \(\boldsymbol{J}_{s}^{\mathrm {T}}{\boldsymbol{x}}\). Based on this, the residual vector is

$$\begin{aligned} \boldsymbol{r}_{1}(k)=\boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{y}}(k)-{\boldsymbol{\varLambda }}_{\kappa }^{\mathrm {T}} \boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{x}}(k).\end{aligned}$$
(3.14)

Assume that the input and output data obey Gaussian distributions. Since a linear transformation of Gaussian variables is still Gaussian, the residual signal \(\boldsymbol{r}_{1}\) also obeys a Gaussian distribution, and its covariance matrix is

$$\begin{aligned} {\boldsymbol{\varSigma }}_{r 1}=\frac{1}{N-1}\left( \boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{Y}}-{\boldsymbol{\varLambda }}_{\kappa }^{\mathrm {T}} \boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{X}}\right) \left( \boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{Y}}-{\boldsymbol{\varLambda }}_{\kappa }^{\mathrm {T}} \boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{X}}\right) ^{\mathrm {T}}=\frac{{\boldsymbol{I}}_{\kappa }-{\boldsymbol{\varLambda }}_{\kappa }^{2}}{N-1}.\end{aligned}$$
(3.15)

Similarly, another residual vector can be obtained

$$\begin{aligned} \boldsymbol{r}_{2}(k)=\boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{x}}(k)-{\boldsymbol{\varLambda }}_{\kappa } \boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{y}}(k). \end{aligned}$$
(3.16)

Its covariance matrix is

$$\begin{aligned} {\boldsymbol{\varSigma }}_{r 2}=\frac{1}{N-1}\left( \boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{X}}-{\boldsymbol{\varLambda }}_{\kappa } \boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{Y}}\right) \left( \boldsymbol{J}_{s}^{\mathrm {T}} {\boldsymbol{X}}-{\boldsymbol{\varLambda }}_{\kappa } \boldsymbol{L}_{s}^{\mathrm {T}} {\boldsymbol{Y}}\right) ^{\mathrm {T}}=\frac{{\boldsymbol{I}}_{\kappa }-{\boldsymbol{\varLambda }}_{\kappa }^{2}}{N-1}. \end{aligned}$$
(3.17)

It can be seen from (3.15) and (3.17) that the covariance matrices of the residuals \(\boldsymbol{r}_{1}\) and \(\boldsymbol{r}_{2}\) are the same. For fault detection, the following two statistics can be constructed:

$$\begin{aligned} \mathrm{{T}}_{1}^{2}(k)=(N-1) \boldsymbol{r}_{1}^{\mathrm {T}}(k)\left( {\boldsymbol{I}}_{\kappa }-{\boldsymbol{\varLambda }}_{\kappa }^{2}\right) ^{-1} \boldsymbol{r}_{1}(k) \end{aligned}$$
(3.18)
$$\begin{aligned} \mathrm{{T}}_{2}^{2}(k)=(N-1) \boldsymbol{r}_{2}^{\mathrm {T}}(k)\left( {\boldsymbol{I}}_{\kappa }-{\boldsymbol{\varLambda }}_{\kappa }^{2}\right) ^{-1} \boldsymbol{r}_{2}(k). \end{aligned}$$
(3.19)
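The sketch below assembles (3.13)-(3.19) for a single standardized test sample. It reuses the _inv_sqrt helper assumed in the SVD sketch above; the function name, the refitting of the model inside the function, and the absence of a detection threshold are implementation choices for illustration, not part of the original method description.

```python
import numpy as np

def cca_fault_stats(X, Y, x_new, y_new, kappa):
    """Residual-based T^2 statistics (3.18)-(3.19) from a CCA model.

    X, Y: standardized training data (N x m, N x l).
    x_new, y_new: one standardized test sample pair.
    kappa: number of retained canonical pairs.
    """
    N = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / N, Y.T @ Y / N, X.T @ Y / N
    Sxx_is, Syy_is = _inv_sqrt(Sxx), _inv_sqrt(Syy)   # helper from the SVD sketch above
    Gamma, sing, PsiT = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)

    Js = Sxx_is @ Gamma[:, :kappa]            # J_s
    Ls = Syy_is @ PsiT.T[:, :kappa]           # L_s
    Lam = np.diag(sing[:kappa])               # Lambda_kappa

    r1 = Ls.T @ y_new - Lam.T @ Js.T @ x_new  # residual (3.14)
    r2 = Js.T @ x_new - Lam @ Ls.T @ y_new    # residual (3.16)

    W = np.linalg.inv(np.eye(kappa) - Lam @ Lam)  # (I_kappa - Lambda_kappa^2)^{-1}
    T1 = (N - 1) * r1 @ W @ r1                # statistic (3.18)
    T2 = (N - 1) * r2 @ W @ r2                # statistic (3.19)
    return T1, T2
```

In practice the two statistics are compared against thresholds (e.g., from a chi-squared distribution with \(\kappa \) degrees of freedom under the Gaussian assumption) to decide whether a fault has occurred.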

3.2 Partial Least Squares

Multiple linear regression analysis is widely used, and the regression coefficients in this type of model are generally estimated by the least squares method. However, least squares often fails when the independent variables are strongly collinear or when the number of samples is smaller than the number of variables. Partial least squares was developed to resolve this problem. S. Wold, C. Albano, et al. first proposed the partial least squares method and applied it in the field of chemistry (Wold et al. 1989). It performs regression modeling between two sets of highly correlated multivariate data and integrates the basic functions of multiple linear regression analysis, principal component analysis, and canonical correlation analysis. PLS is also called the second-generation regression analysis method because it simplifies the data structure and the correlation analysis (Hair et al. 2016). It has developed rapidly and has been widely used in various fields in recent years (Okwuashi et al. 2020; Ramin et al. 2018).

3.2.1 Fundamental of PLS

Suppose there are l dependent variables \(\left( y_{1}, y_{2}, \ldots , y_{l}\right) \) and m independent variables \(\left( x_{1}, x_{2}, \ldots , x_{m}\right) \). In order to study the statistical relationship between the dependent and independent variables, n sample points are observed, which constitute a data set \(\left( \boldsymbol{X}=\left[ x_{1}, x_{2}, \ldots , x_{m}\right] \in R^{n \times m}\right. \), \(\left. \boldsymbol{Y}=\left[ y_{1}, y_{2}, \ldots , y_{l}\right] \in R^{n \times l}\right) \) of the independent and dependent variables.

To address the problems encountered in least squares multiple regression between \(\boldsymbol{X}\) and \(\boldsymbol{Y}\), the concept of component extraction is introduced in PLS regression analysis. Recall that principal component analysis, for a single data matrix \(\boldsymbol{X}\), finds the composite variable that best summarizes the information in the original data. The principal component \(\boldsymbol{T}\) in \(\boldsymbol{X}\) is extracted with the maximum variance information of the original data:

$$\begin{aligned} \begin{aligned} \max {\text {var}}\left( \boldsymbol{T} \right) , \end{aligned} \end{aligned}$$
(3.20)

PLS extracts component vectors \(\boldsymbol{t}_{i}\) and \(\boldsymbol{u}_{i}\) from \(\boldsymbol{X}\) and \(\boldsymbol{Y}\), which means that \(\boldsymbol{t}_{i}\) is a linear combination of \(\left( x_{1}, x_{2}, \ldots , x_{m}\right) \) and \(\boldsymbol{u}_{i}\) is a linear combination of \(\left( y_{1}, y_{2}, \ldots , y_{l}\right) \). During the extraction of the components, in order to meet the needs of regression analysis, the following two requirements should be satisfied:

  (1) \(\boldsymbol{t}_{i}\) and \(\boldsymbol{u}_{i}\) carry as much of the variation information in their respective data sets as possible;

  (2) The correlation between \(\boldsymbol{t}_{i}\) and \(\boldsymbol{u}_{i}\) is maximized.

The two requirements indicate that \(\boldsymbol{t}_{i}\) and \(\boldsymbol{u}_{i}\) should represent the data sets \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) as well as possible, and that the component \(\boldsymbol{t}_{i}\) of the independent variables has the best ability to explain the component \(\boldsymbol{u}_{i}\) of the dependent variables.

3.2.2 PLS Algorithm

The most popular algorithm used in PLS to compute the vectors in the calibration step is known as nonlinear iterative partial least squares (NIPALS). First, the data are normalized to simplify the calculations: normalize \(\boldsymbol{X}\) to get the matrix \(\boldsymbol{E}_{0}\) and normalize \(\boldsymbol{Y}\) to get the matrix \(\boldsymbol{F}_{0}\):

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0}=\begin{bmatrix} x_{11} &{} \cdots &{} x_{1 m} \\ \vdots &{} &{} \vdots \\ x_{n 1} &{} \cdots &{} x_{n m} \end{bmatrix}, \quad \boldsymbol{F}_{0}= \begin{bmatrix} y_{11} &{} \cdots &{} y_{1 l} \\ \vdots &{} &{} \vdots \\ y_{n 1} &{} \cdots &{} y_{n l} \end{bmatrix} \end{aligned} \end{aligned}$$
(3.21)

In the first step, set \(\boldsymbol{t}_{1}\left( \boldsymbol{t}_{1}=\boldsymbol{E}_{0} \boldsymbol{w}_{1}\right) \) to be the first component of \(\boldsymbol{E}_{0}\), where \(\boldsymbol{w}_{1}\) is the first direction vector of \(\boldsymbol{E}_{0}\) and a unit vector, \(\left\| \boldsymbol{w}_{1}\right\| =1\). Similarly, set \(\boldsymbol{u}_{1}\left( \boldsymbol{u}_{1}=\boldsymbol{F}_{0} \boldsymbol{c}_{1}\right) \) to be the first component of \(\boldsymbol{F}_{0}\), where \(\boldsymbol{c}_{1}\) is the first direction vector of \(\boldsymbol{F}_{0}\) and also a unit vector, \(\left\| \boldsymbol{c}_{1}\right\| =1\).

According to the principle of principal component analysis, \(\boldsymbol{t}_{1}\) and \(\boldsymbol{u}_{1}\) should meet the following conditions in order to be able to represent the data variation information in \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) well:

$$\begin{aligned} \begin{aligned} \max {\text {var}}\left( \boldsymbol{t}_{1}\right) \\ \max {\text {var}}\left( \boldsymbol{u}_{1}\right) \end{aligned} \end{aligned}$$
(3.22)

On the other hand, \(\boldsymbol{t}_{1}\) is further required to have the best explanatory ability for \(\boldsymbol{u}_{1}\) due to the needs of regression modeling. Following the idea of canonical correlation analysis, the correlation between \(\boldsymbol{t}_{1}\) and \(\boldsymbol{u}_{1}\) should reach its maximum value:

$$\begin{aligned} \max r\left( \boldsymbol{t}_{1}, \boldsymbol{u}_{1}\right) . \end{aligned}$$
(3.23)

The covariance of \(\boldsymbol{t}_{1}\) and \(\boldsymbol{u}_{1}\) is usually used to describe the correlation in partial least squares regression:

$$\begin{aligned} \begin{aligned} \max {\text {Cov}}\left( \boldsymbol{t}_{1}, \boldsymbol{u}_{1}\right) =\sqrt{{\text {var}}\left( \boldsymbol{t}_{1}\right) {\text {var}}\left( \boldsymbol{u}_{1}\right) } r\left( \boldsymbol{t}_{1}, \boldsymbol{u}_{1}\right) \end{aligned} \end{aligned}$$
(3.24)

Converted into a formal mathematical expression, the direction vectors \(\boldsymbol{w}_{1}\) and \(\boldsymbol{c}_{1}\) are obtained by solving the following optimization problem:

$$\begin{aligned} \begin{aligned} \max _{\boldsymbol{w}_{1}, \boldsymbol{c}_{1}}\left\langle \boldsymbol{E}_{0} \boldsymbol{w}_{1}, \boldsymbol{F}_{0} \boldsymbol{c}_{1}\right\rangle \\ \text{ s.t } \left\{ \begin{aligned} \boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{w}_{1}=1 \\ \boldsymbol{c}_{1}^{\mathrm {T}} \boldsymbol{c}_{1}=1. \end{aligned}\right. \end{aligned} \end{aligned}$$
(3.25)

Therefore, it needs to calculate the maximum value of \(\boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}\) under the constraints of \(\left\| \boldsymbol{w}_{1}\right\| ^{2}=1\) and \(\left\| \boldsymbol{c}_{1}\right\| ^{2}=1\).

In this case, the Lagrangian function is

$$\begin{aligned} \begin{aligned} s=\boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}-\lambda _{1}\left( \boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{w}_{1}-1\right) -\lambda _{2}\left( \boldsymbol{c}_{1}^{\mathrm {T}} \boldsymbol{c}_{1}-1\right) . \end{aligned} \end{aligned}$$
(3.26)

Calculate the partial derivatives of s with respect to \(\boldsymbol{w}_{1}\), \(\boldsymbol{c}_{1}\), \(\lambda _{1}\), and \(\lambda _{2}\), and let them be zero

$$\begin{aligned} \begin{aligned} \frac{\partial s}{\partial \boldsymbol{w}_{1}}=\boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}-2 \lambda _{1} \boldsymbol{w}_{1}=0, \end{aligned} \end{aligned}$$
(3.27)
$$\begin{aligned} \begin{aligned} \frac{\partial s}{\partial \boldsymbol{c}_{1}}=\boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0} \boldsymbol{w}_{1}-2 \lambda _{2} \boldsymbol{c}_{1}=0, \end{aligned} \end{aligned}$$
(3.28)
$$\begin{aligned} \begin{aligned} \frac{\partial s}{\partial \lambda _{1}}=-\left( \boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{w}_{1}-1\right) =0, \end{aligned} \end{aligned}$$
(3.29)
$$\begin{aligned} \begin{aligned} \frac{\partial s}{\partial \lambda _2}=-\left( \boldsymbol{c}^{\mathrm {T}}_{1} \boldsymbol{c}_{1}-1\right) =0. \end{aligned} \end{aligned}$$
(3.30)

It can be derived from the above formulas that

$$\begin{aligned} \begin{aligned} 2 \lambda _{1}=2 \lambda _{2}=\boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}=\left\langle \boldsymbol{E}_{0} \boldsymbol{w}_{1}, \boldsymbol{F}_{0} \boldsymbol{c}_{1}\right\rangle \end{aligned} \end{aligned}$$
(3.31)

Let \(\theta _{1}=2 \lambda _{1}=2 \lambda _{2}=\boldsymbol{w}_{1}^{\mathrm {T}} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}\), so \(\theta _1\) is the value of the objective function of the optimization problem (3.25). Then (3.27) and (3.28) are rewritten as

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}=\theta _{1} \boldsymbol{w}_{1}, \end{aligned} \end{aligned}$$
(3.32)
$$\begin{aligned} \begin{aligned} \boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0} \boldsymbol{w}_{1}=\theta _{1} \boldsymbol{c}_{1}. \end{aligned} \end{aligned}$$
(3.33)

Substitute (3.33) into (3.32),

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0} \boldsymbol{w}_{1}=\theta _{1}^{2} \boldsymbol{w}_{1}. \end{aligned} \end{aligned}$$
(3.34)

Substitute (3.32) into (3.33) simultaneously,

$$\begin{aligned} \begin{aligned} \boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{c}_{1}=\theta _{1}^{2} \boldsymbol{c}_{1}. \end{aligned} \end{aligned}$$
(3.35)

Equation (3.34) shows that \(\boldsymbol{w}_{1}\) is the eigenvector of matrix \(\boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0}\) with the corresponding eigenvalue \(\theta _{1}^{2}\). Here, \(\theta _{1}\) is the objective function. If we want to get its maximum value, \(\boldsymbol{w}_{1}\) should be the unit eigenvector of the maximum eigenvalue of matrix \(\boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0} \boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0}\). Similarly, \(\boldsymbol{c}_{1}\) should be the unit eigenvector of the largest eigenvalue of the matrix \(\boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{E}_{0} \boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{F}_{0}\).

Then the first components \(\boldsymbol{t}_{1}\) and \(\boldsymbol{u}_{1}\) are calculated from the direction vectors \(\boldsymbol{w}_{1}\) and \(\boldsymbol{c}_{1}\):

$$\begin{aligned} \begin{aligned} \boldsymbol{t}_{1}=\boldsymbol{E}_{0} \boldsymbol{w}_{1} \\ \boldsymbol{u}_{1}=\boldsymbol{F}_{0} \boldsymbol{c}_{1}. \end{aligned} \end{aligned}$$
(3.36)

The regression equations of \(\boldsymbol{E}_{0}\) and \(\boldsymbol{F}_{0}\) on \(\boldsymbol{t}_{1}\) and \(\boldsymbol{u}_{1}\) are then established:

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0}=\boldsymbol{t}_{1} \boldsymbol{p}_{1}^{\mathrm {T}}+\boldsymbol{E}_{1} \\ \boldsymbol{F}_{0}=\boldsymbol{u}_{1} \boldsymbol{q}_{1}^{\mathrm {T}}+\boldsymbol{F}_{1}^{*} \\ \boldsymbol{F}_{0}=\boldsymbol{t}_{1} \boldsymbol{r}_{1}^{\mathrm {T}}+\boldsymbol{F}_{1}. \end{aligned} \end{aligned}$$
(3.37)

The regression coefficient vectors in (3.37) are

$$\begin{aligned} \begin{aligned} \boldsymbol{p}_{1}&=\frac{\boldsymbol{E}_{0}^{\mathrm {T}} \boldsymbol{t}_{1}}{\left\| \boldsymbol{t}_{1}\right\| ^{2}} \\ \boldsymbol{q}_{1}&=\frac{\boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{u}_{1}}{\left\| \boldsymbol{u}_{1}\right\| ^{2}} \\ \boldsymbol{r}_{1}&=\frac{\boldsymbol{F}_{0}^{\mathrm {T}} \boldsymbol{t}_{1}}{\left\| \boldsymbol{t}_{1}\right\| ^{2}}. \end{aligned} \end{aligned}$$
(3.38)

\(\boldsymbol{E}_{1}\), \(\boldsymbol{F}_{1}^{*}\) and \(\boldsymbol{F}_{1}\) are the residual matrices of the three regression equations.

The second step is to replace \(\boldsymbol{E}_{0}\) and \(\boldsymbol{F}_{0}\) with the residual matrices \(\boldsymbol{E}_{1}\) and \(\boldsymbol{F}_{1}\), respectively, and then find the second pair of direction vectors \(\boldsymbol{w}_{2}\), \(\boldsymbol{c}_{2}\) and the second pair of components \(\boldsymbol{t}_{2}\), \(\boldsymbol{u}_{2}\):

$$\begin{aligned} \begin{aligned} \boldsymbol{t}_{2}&=\boldsymbol{E}_{1} \boldsymbol{w}_{2} \\ \boldsymbol{u}_{2}&=\boldsymbol{F}_{1} \boldsymbol{c}_{2} \\ \theta _{2}&=\boldsymbol{w}_{2}^{\mathrm {T}} \boldsymbol{E}_{1}^{\mathrm {T}} \boldsymbol{F}_{1} \boldsymbol{c}_{2}. \end{aligned} \end{aligned}$$
(3.39)

Similarly, \(\boldsymbol{w}_{2}\) is the unit eigenvector corresponding to the largest eigenvalue of matrix \(\boldsymbol{E}_{1}^{\mathrm {T}} \boldsymbol{F}_{1} \boldsymbol{F}_{1}^{\mathrm {T}} \boldsymbol{E}_{1}\), and \(\boldsymbol{c}_{2}\) is the unit eigenvector of the largest eigenvalue of matrix \(\boldsymbol{F}_{1}^{\mathrm {T}} \boldsymbol{E}_{1} \boldsymbol{E}_{1}^{\mathrm {T}} \boldsymbol{F}_{1}\). Calculate the regression coefficient

$$\begin{aligned} \begin{aligned} \boldsymbol{p}_{2}&=\frac{\boldsymbol{E}_{1}^{\mathrm {T}} \boldsymbol{t}_{2}}{\left\| \boldsymbol{t}_{2}\right\| ^{2}} \\ \boldsymbol{r}_{2}&=\frac{\boldsymbol{F}_{1}^{\mathrm {T}} \boldsymbol{t}_{2}}{\left\| \boldsymbol{t}_{2}\right\| ^{2}}. \end{aligned} \end{aligned}$$
(3.40)

The regression equation is updated:

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{1}&=\boldsymbol{t}_{2} \boldsymbol{p}_{2}^{\mathrm {T}}+\boldsymbol{E}_{2} \\ \boldsymbol{F}_{1}&=\boldsymbol{t}_{2} \boldsymbol{r}_{2}^{\mathrm {T}}+\boldsymbol{F}_{2}. \end{aligned} \end{aligned}$$
(3.41)

Repeating the above steps, if the rank of \(\boldsymbol{X}\) is R, the following regression equations are obtained:

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0}&=\boldsymbol{t}_{1} \boldsymbol{p}_{1}^{\mathrm {T}}+\cdots +\boldsymbol{t}_{R} \boldsymbol{p}_{R}^{\mathrm {T}} \\ \boldsymbol{F}_{0}&=\boldsymbol{t}_{1} \boldsymbol{r}_{1}^{\mathrm {T}}+\cdots +\boldsymbol{t}_{R} \boldsymbol{r}_{R}^{\mathrm {T}}+\boldsymbol{F}_{R}. \end{aligned} \end{aligned}$$
(3.42)

If the number of components used in the PLS modeling is large enough, the residuals can be reduced to zero. In general, it is sufficient to select only \(a (a \ll R)\) components to form a regression model with good prediction performance. The number of principal components required for modeling is determined by the cross-validation procedure discussed in Sect. 3.2.3. Once the appropriate number of components is determined, the external relationship of the input variable matrix \(\boldsymbol{X}\) can be written as

$$\begin{aligned} \begin{aligned} \boldsymbol{X}=\boldsymbol{T} \boldsymbol{P}^{\mathrm {T}}+\bar{\boldsymbol{X}}=\sum _{h=1}^{a} \boldsymbol{t}_{h} \boldsymbol{p}_{h}^{\mathrm {T}}+\bar{\boldsymbol{X}}. \end{aligned} \end{aligned}$$
(3.43)

The external relationship of the output variable matrix \(\boldsymbol{Y}\) can be written as

$$\begin{aligned} \begin{aligned} \boldsymbol{Y}=\boldsymbol{U}\boldsymbol{Q}^{\mathrm {T}}+\bar{\boldsymbol{Y}}=\sum _{h=1}^{a} \boldsymbol{u}_{h} \boldsymbol{q}_{h}^{\mathrm {T}}+\bar{\boldsymbol{Y}}. \end{aligned} \end{aligned}$$
(3.44)

The internal relationship is expressed as

$$\begin{aligned} \begin{aligned} \hat{\boldsymbol{u}}_{h}=\boldsymbol{b}_{h} \boldsymbol{t}_{h}, \quad \boldsymbol{b}_{h}=\boldsymbol{t}_{h}^{\mathrm {T}} \boldsymbol{u}_{h} / \boldsymbol{t}_{h}^{\mathrm {T}} \boldsymbol{t}_{h}. \end{aligned} \end{aligned}$$
(3.45)
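The complete extraction loop of this subsection can be sketched in numpy as follows. The code uses the eigenvector form derived above: the dominant left singular vector of \(\boldsymbol{E}_{h-1}^{\mathrm {T}} \boldsymbol{F}_{h-1}\) is the required unit eigenvector of \(\boldsymbol{E}_{h-1}^{\mathrm {T}} \boldsymbol{F}_{h-1} \boldsymbol{F}_{h-1}^{\mathrm {T}} \boldsymbol{E}_{h-1}\). Function and variable names are illustrative, and the sketch assumes pre-normalized \(\boldsymbol{X}\) and \(\boldsymbol{Y}\).

```python
import numpy as np

def pls_nipals(X, Y, a):
    """PLS by the eigenvector form of NIPALS (a sketch).

    X (n x m) and Y (n x l) are assumed normalized as E0, F0 in (3.21).
    Extracts a components and returns scores T, weights W, loadings P,
    and the Y-regression vectors R such that F0 ~ T R^T, cf. (3.42).
    """
    E, F = X.copy(), Y.copy()
    n, m = X.shape
    T, W, P = np.zeros((n, a)), np.zeros((m, a)), np.zeros((m, a))
    R = np.zeros((Y.shape[1], a))

    for h in range(a):
        # w_h: unit eigenvector of E^T F F^T E with the largest eigenvalue (3.34),
        # i.e., the dominant left singular vector of E^T F.
        U_, s_, Vt_ = np.linalg.svd(E.T @ F, full_matrices=False)
        w = U_[:, 0]
        t = E @ w                      # score t_h = E w_h
        p = E.T @ t / (t @ t)          # loading p_h, cf. (3.38)
        r = F.T @ t / (t @ t)          # Y-regression vector r_h, cf. (3.38)
        E = E - np.outer(t, p)         # deflation of E, cf. (3.41)
        F = F - np.outer(t, r)         # deflation of F, cf. (3.41)
        T[:, h], W[:, h], P[:, h], R[:, h] = t, w, p, r

    return T, W, P, R
```

With a components, \(\boldsymbol{T}\) and \(\boldsymbol{R}\) reproduce the fitted part of (3.42), i.e., \(\boldsymbol{F}_{0}\approx \boldsymbol{T}\boldsymbol{R}^{\mathrm {T}}\).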

3.2.3 Cross-Validation Test

In many cases, the PLS model does not require all principal components for regression modeling. Rather, as in principal component analysis, the first \(d (d\le R)\) principal components can be selected in a truncated manner, and a model with good prediction performance can be obtained using only these d principal components. In fact, if the subsequent principal components no longer provide meaningful information for explaining the dependent variables, using too many principal components will only obscure the statistical trend and lead to wrong prediction conclusions. The number of principal components required for modeling can be determined by cross-validation.

Cross-validation is used to prevent the over-fitting caused by overly complex models. Sometimes referred to as rotation estimation, it is a statistical method that cuts the data sample into smaller subsets: the analysis is first performed on one subset, while another subset is used for subsequent confirmation and validation of this analysis. The subset used for the analysis is called the training set; the other is called the validation set and is generally kept separate from the testing set. Two cross-validation methods often used in practice are K-fold cross-validation (K-CV) and leave-one-out cross-validation (LOO-CV).

K-CV divides the n original samples into K groups (generally of equal size), and each subset serves once as the validation set while the remaining \(K-1\) subsets form the training set, so K-CV results in K models, as sketched below. In general, K is selected between 5 and 10. LOO-CV is essentially n-fold CV (K = n). The process of determining the number of principal components is described in detail using LOO-CV as an example.
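A minimal sketch of the K-CV splitting logic (illustrative names only; LOO-CV corresponds to K = n):

```python
import numpy as np

def kfold_indices(n, K, seed=0):
    """Yield (train, validation) index arrays for K-fold cross-validation.

    Each fold serves once as the validation set; the remaining K-1 folds
    form the training set, so K models are fitted in total.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        yield train, val
```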

All n samples are divided into two parts. The first part is the set of all samples excluding a certain sample i (containing \(n-1\) samples in total), and a regression equation with d principal components is fitted to this data set. In the second part, the excluded ith sample is substituted into the fitted regression equation to obtain the predicted value \(\hat{y}_{(i) j}(d),\; j=1,2, \ldots , l\), of \(y_j\). Repeating this test for each \(i=1,2,\ldots ,n\), the sum of squared prediction errors for \(y_j\) is defined as \(\mathrm{PRESS}_{j}(d)\):

$$\begin{aligned} \mathrm{PRESS}_{j}(d)=\sum _{i=1}^{n}\left( y_{i j}-\hat{y}_{(i) j}(d)\right) ^{2}, j=1,2, \ldots , l. \end{aligned}$$
(3.46)

The sum of squared prediction errors of \(Y=\left( y_{1}, \ldots , y_{l}\right) ^{\mathrm {T}}\) can be obtained as

$$\begin{aligned} \begin{aligned} \mathrm{PRESS}(d)=\sum _{j=1}^{l} \mathrm{PRESS}_{j}(d). \end{aligned} \end{aligned}$$
(3.47)

Obviously, if the robustness of the regression equation is poor, it is very sensitive to changes in the samples, and this perturbation error increases the value of \(\mathrm{PRESS}(d)\).

On the other hand, all sample points are used to fit a regression equation containing d components. In this case, the fitted value of the ith sample point is \(\hat{y}_{i j}(d)\), and the fitted error sum of squares for \(y_{j}\) is defined as \(\mathrm{S S}_{j}(d)\):

$$\begin{aligned} \mathrm{SS}_{j}(d)=\sum _{i=1}^{n}\left( y_{i j}-\hat{y}_{i j}(d)\right) ^{2}. \end{aligned}$$
(3.48)

The sum of squared errors of \(\boldsymbol{Y}\) is

$$\begin{aligned} \mathrm{SS}(d)=\sum _{j=1}^{l} \mathrm{SS}_{j}(d) \end{aligned}$$
(3.49)

Generally, \(\mathrm{PRESS}(d)\) is greater than \(\mathrm{SS}(d)\) because \(\mathrm{PRESS}(d)\) contains an unknown perturbation error, and the fitting error decreases as more components are added, i.e., \(\mathrm{SS}(d)\) is less than \(\mathrm{SS}(d-1)\). Next, compare \(\mathrm{SS}(d-1)\) with \(\mathrm{PRESS}(d)\): \(\mathrm{SS}(d-1)\) is the fitting error of the regression equation fitted with all samples using \(d-1\) components, while \(\mathrm{PRESS}(d)\) contains the perturbation error of the samples but uses one more component. If the prediction error of the d-component regression equation, which includes the perturbation error, can be made somewhat smaller than the fitting error of the \((d-1)\)-component regression equation, then adding the component \(\boldsymbol{t}_d\) is considered to bring a significant improvement in prediction accuracy. Therefore, it is always desirable for the ratio \(\frac{\mathrm{PRESS}(d)}{\mathrm{SS}(d-1)}\) to be as small as possible. A common criterion is

$$\begin{aligned} \frac{\mathrm{PRESS}(d)}{\mathrm{SS}(d-1)} \le (1-0.05)^{2}=0.95^2. \end{aligned}$$
(3.50)

If \(\mathrm{PRESS}(d)\le 0.95^2\, \mathrm{SS}(d-1)\), adding the component \(\boldsymbol{t}_d\) is considered beneficial; conversely, if \(\mathrm{PRESS}(d) > 0.95^2\, \mathrm{SS}(d-1)\), the newly added component brings no significant improvement in reducing the prediction error of the regression equation.

In practice, the following cross-validation index is used. For each dependent variable \(y_j\), define

$$\begin{aligned} \mathrm{{Q}}_{dj}^{2}=1-\frac{\mathrm{PRESS}_j(d)}{\mathrm{SS}_j(d-1)}. \end{aligned}$$
(3.51)

For the full dependent variable \(\boldsymbol{Y}\), the cross-validation index of component \(\boldsymbol{t}_d\) is defined as

$$\begin{aligned} \mathrm{{Q}}_{d}^{2}=1-\frac{\mathrm{PRESS}(d)}{\mathrm{SS}(d-1)}. \end{aligned}$$
(3.52)

The marginal contribution of component \(\boldsymbol{t}_d\) to the prediction accuracy of the regression model is judged by the following two criteria (cross-validation indices); a computational sketch follows the list.

  (1) If \(\mathrm {Q}_{d}^{2}>1-0.95^2 =0.0975\), the marginal contribution of component \(\boldsymbol{t}_d\) is significant.

  (2) If there is at least one k (\(k=1,2,\ldots ,l\)) such that \(\mathrm {Q}_{dk}^{2}>0.0975\), then adding component \(\boldsymbol{t}_d\) leads to a significant improvement in the prediction accuracy of at least one dependent variable \(y_k\), and it can therefore also be argued that adding component \(\boldsymbol{t}_d\) is clearly beneficial.
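Putting the pieces of this subsection together, the sketch below computes \(\mathrm{PRESS}(d)\), \(\mathrm{SS}(d)\), and \(\mathrm{Q}_{d}^{2}\) by LOO-CV using the pls_nipals sketch from Sect. 3.2.2. It assumes centered (normalized) data, takes \(\mathrm{SS}(0)\) as the error of predicting zero, and uses the standard PLS identity \(\boldsymbol{T}=\boldsymbol{X}\boldsymbol{W}(\boldsymbol{P}^{\mathrm {T}}\boldsymbol{W})^{-1}\) to score a held-out sample; these implementation choices are not stated in the text itself.

```python
import numpy as np

def press_and_q2(X, Y, d_max):
    """LOO-CV selection of the number of PLS components via PRESS and Q^2.

    Returns, for d = 1..d_max, PRESS(d), SS(d), and
    Q^2_d = 1 - PRESS(d)/SS(d-1), cf. (3.46)-(3.52).
    """
    n = X.shape[0]
    press = np.zeros(d_max)
    ss = np.zeros(d_max + 1)
    ss[0] = np.sum(Y ** 2)   # SS(0): with no component, the prediction of centered Y is 0

    for d in range(1, d_max + 1):
        # Fitting error with all samples and d components, cf. (3.48)-(3.49).
        T, W, P, R = pls_nipals(X, Y, d)
        ss[d] = np.sum((Y - T @ R.T) ** 2)

        # Leave-one-out prediction error, cf. (3.46)-(3.47).
        for i in range(n):
            mask = np.arange(n) != i
            T_i, W_i, P_i, R_i = pls_nipals(X[mask], Y[mask], d)
            # Scores of the held-out sample from the undeflated x via W (P^T W)^{-1}.
            W_star = W_i @ np.linalg.inv(P_i.T @ W_i)
            y_hat = (X[i] @ W_star) @ R_i.T
            press[d - 1] += np.sum((Y[i] - y_hat) ** 2)

    q2 = 1.0 - press / ss[:-1]   # Q^2_d = 1 - PRESS(d)/SS(d-1)
    return press, ss[1:], q2
```

A component \(\boldsymbol{t}_d\) is retained while \(\mathrm{Q}_{d}^{2} > 1-0.95^2 = 0.0975\), which corresponds to criterion (1) above.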