1 Introduction

An active field of research in the process control domain, process monitoring-fault detection (PM-FD) aims to quickly detect sensor faults in chemical process plants (Venkatasubramanian et al. 2003; Ramakrishna Kini and Madakyaru 2020). Until recently, the task of detecting process deviations depended on human supervision, which eventually led to higher costs owing to the increased complexity of modern process plants. To deal with process deviations or faults, model-based and process history methods have been applied in practice. Model-based PM-FD methods rely on comparing information from measured variables with an explicit, known mathematical model of the process. A few commonly used methods include parity relations, expert systems and Kalman filters (Sanjula and Zukui 2019). In contrast, process history PM-FD schemes rely purely on available historical data. Owing to the tremendous advances in computer technology, acquiring data from sensors has become an easy task. Once large amounts of data are available, the current status of the process plant can be studied, and this aids in diagnosing faults.

Process history methods are further classified into two categories: univariate and multivariate techniques. While univariate techniques monitor a single variable at a time, multivariate methods monitor multiple variables simultaneously. In recent years, commonly applied multivariate methods include principal component analysis (PCA), principal component regression (PCR) and partial least squares (PLS) (Yin et al. 2014). PCA, a classic dimensionality reduction method, is one of the oldest multivariate methods used in fault detection problems (Nor et al. 2020; Alauddin et al. 2018). In PCA, high-dimensional data is transformed into a new lower-dimensional subspace with the aim of retaining the original data information in the transformed variables (Harkat et al. 2020). Despite being a core method in fault detection problems, the PCA FD strategy carries a few disadvantages. First, PCA modeling considers only the second-order statistics of mean and variance. Second, PCA assumes that the latent variables follow a Gaussian distribution, which is often not true for industrial processes.

More recently, a PM-FD method based on independent component analysis (ICA) has been proposed for capturing non-Gaussian characteristics of data through higher-order statistical representation. The ICA method, initially developed for blind source separation problems, has progressed to find applications in PM-FD problems (Tong et al. 2017; Li and Hongbo 2010). In ICA, data is decomposed into a linear combination of independent components (ICs) without any orthogonality constraint (Hyvarinen 2013). ICA modeling involves computing the ICs with an iterative algorithm in which the separating matrix is randomly initialized, and this usually results in different solutions across runs of the ICA algorithm (Lee et al. 2006). In this work, the de-mixing orthogonal matrix V is initialized to the identity matrix instead of a random matrix, as this provides a repeatable solution. Once the ICA model is developed, it is used to monitor new data through three fault indicators, namely, \(\textit{I}^{2}_{d}\), \(\textit{I}_{e}^{2}\) and the squared prediction error (SPE). Many extensions of the conventional ICA technique have been developed in recent years. These include dynamic ICA (DICA), which considers the dynamics of the data via lagged variables; modified ICA (MICA) (Tong et al. 2017; Yingwei and Yang 2010), where dominant ICs are extracted from data to reduce computational complexity; kernel ICA (KICA) (Tian et al. 2009), which handles both nonlinear and non-Gaussian industrial data; and multi-block ICA, where ICA process monitoring is carried out in different blocks (Yingwei and Chi 2012). A multi-way kernel FastICA based on feature samples was developed for diagnosing batch processes, while nonlinear detection using integrated kernel PCA and locality preserving projections (LPPs) was also developed (Lianfang and Xuemin 2014).

Though ICA and its variant methods have been applied successfully in the process monitoring domain, they still face the stiff challenge of extracting information from high-dimensional noisy data available from industrial plants. In this work, the robustness of ICA and PCA fault detection strategies is assessed by their ability to monitor process data in the presence of different levels of noise. To test the robustness of an FD strategy towards different noise realizations of sensor data, stochastic simulations have been carried out. The main aim is to show that ICA-based models are more robust to different levels of noise than the conventional PCA method. Stochastic simulations involving 1000 runs are performed for noise with a signal-to-noise ratio (SNR) of 20, 10 and 5. The SNR is defined as the ratio of the variance of the signal to the variance of the noise. SNR = 20 corresponds to high-quality data with little noise, SNR = 10 corresponds to a medium level of noise, while SNR = 5 indicates a high amount of noise in the data. Comparisons have been carried out between the ICA-, DICA-, MICA- and PCA-based fault detection strategies for the different noise levels.
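
To make the noise-injection step concrete, a minimal Python sketch is shown below. It is an illustrative implementation, not the code used in this work; the function name and the per-variable scaling of the noise variance are assumptions.

```python
import numpy as np

def add_noise_snr(X, snr, rng=None):
    """Add zero-mean Gaussian noise to each column (variable) of X so that
    SNR = var(signal) / var(noise), matching the definition above."""
    rng = np.random.default_rng() if rng is None else rng
    noise_var = X.var(axis=0) / snr                      # per-variable noise variance
    noise = rng.standard_normal(X.shape) * np.sqrt(noise_var)
    return X + noise

# One stochastic simulation: 1000 independent noise realizations at SNR = 5
# realizations = [add_noise_snr(X_test, snr=5) for _ in range(1000)]
```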

The organization of this paper is as follows: Section 2 provides a detailed overview of the ICA, DICA, MICA and PCA modeling methods along with their fault indicators. Next, robust fault detection based on ICA, involving stochastic simulation with different noise realizations, is presented through a block diagram representation in Sect. 3. To test the effectiveness of the different FD strategies against noise realizations, two case studies are considered in Sect. 4: a simulated quadruple tank process (QTP) and a simulated distillation column (DC) process. The paper concludes in Sect. 5.

2 Data-Driven Modeling Framework

2.1 Independent Component Analysis

ICA is a multivariate technique with the objective of extracting hidden independent components (ICs) from non-Gaussian data. The available training data, \(\mathbf {X_{a}} = [\mathbf {x_{1}},\mathbf {x_{2}}\ldots ,\mathbf {x_{n}}]^{T}\) \(\in \mathfrak {R}^{\textit{m}\times \textit{n}}\), is represented as a linear mixture of \(\textit{p}(\le \textit{m})\) unknown ICs \(\textit{s}_{1}, \textit{s}_{2}\ldots \textit{s}_{p}\) through a mixing matrix. The original data and the ICs are related as:

$$\begin{aligned} \mathbf {X_{a} = AS+E} \end{aligned}$$
(1)

where n and m represent the number of observations and variables, respectively, A = \([\mathbf {{a_{1}}\ldots {a_{ p }}}]^{T}\in \mathfrak {R}^{{m}\times {p}}\) is a deterministic mixing matrix, S = \([\mathbf {{s_{1},s_{2}\ldots s_{ n }}}]\in \mathfrak {R}^{{p\times n}}\) is the IC matrix and \({\mathbf {E}}\in \mathfrak {R}^{{m}\times {n}}\) is a residual matrix. Since the two unknown matrices S and A cannot be computed directly, the aim is to find a separating matrix W such that the reconstructed IC matrix is given by:

$$\begin{aligned} {\hat{\mathbf {S}} = \mathbf {WX}_{\mathbf {a}}} \end{aligned}$$
(2)

The ICA model development stage consists of four steps, namely, centering, whitening, preprocessing and iterative computation. Initially, the data \(\mathbf {X_{a}}\) is centered to have a mean of zero. Next, whitening is performed by singular value decomposition (SVD) of the covariance of \(\mathbf {X_{a}}\), which generates the transformation \(\mathbf {Z = QX_{a}}\) with \(\mathbf {Q} = \varvec{\Lambda }^{-1}\mathbf {U}^{\mathbf {T}}\), where \(\varvec{\Lambda }\) is a diagonal matrix of eigenvalues and U is the eigenvector matrix, both obtained from the covariance of \(\mathbf {X_{a}}\). After the whitening stage, the transformation becomes:

$$\begin{aligned} \mathbf {Z = QX_{a} = QAS = VS} \end{aligned}$$
(3)

where V is an orthogonal matrix. Next, the preprocessing step involves developing important relationships between original data and the separating matrix W, which is useful for determining the ICs. From Eq. (3), S can be estimated as follows:

$$\begin{aligned} {\hat{\mathbf {S}} = \mathbf {V}^{\mathbf {T}}\mathbf {Z} = \mathbf {V}^{\mathbf {T}}\mathbf {QX}_{\mathbf {a}}} \end{aligned}$$
(4)

Comparing Eqs. (2) and (4), a relation between W and V is obtained, given by:

$$\begin{aligned} \mathbf {W = V^{T}Q} \end{aligned}$$
(5)
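
As an illustration, a minimal Python sketch of the centering and whitening steps (Eqs. (3)–(5)) is given below. It is a sketch under stated assumptions rather than the exact implementation: the eigen-decomposition of the covariance matrix stands in for the SVD, and the exponent \(-1/2\) on the eigenvalue matrix follows the standard whitening convention so that the whitened data has identity covariance.

```python
import numpy as np

def center_and_whiten(Xa):
    """Center the m x n data matrix Xa and whiten it so that cov(Z) = I.
    Returns the whitened data Z and the whitening matrix Q of Eq. (3)."""
    Xa = Xa - Xa.mean(axis=1, keepdims=True)   # centering step
    C = np.cov(Xa)                             # m x m covariance matrix
    lam, U = np.linalg.eigh(C)                 # eigenvalues / eigenvectors
    idx = np.argsort(lam)[::-1]                # sort eigenvalues, descending
    lam, U = lam[idx], U[:, idx]
    Q = np.diag(lam ** -0.5) @ U.T             # whitening matrix (standard convention)
    return Q @ Xa, Q

# The separating matrix of Eq. (5) then follows as W = V.T @ Q once V is estimated.
```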

To avoid different solutions in each run, instead of random initialization, \(\mathbf {V}\) is initialized to the identity matrix \({\mathbf {I}}\in \mathfrak {R}^{\textit{m}\times \textit{m}}\). Next, each column \(\mathbf {v_{i}}\) of V is updated one by one through an iterative algorithm. The computation of V requires all ICs to be independent, and the criterion for independence is based on non-Gaussianity, which is measured through kurtosis and negentropy. Kurtosis, the fourth-order moment, is sensitive to outliers, which makes negentropy the better choice. For a variable Y having density function f(Y), the entropy H is represented as:

$$\begin{aligned} H(Y) = -\int f(Y)\log f(Y) \,\text {d}Y \end{aligned}$$
(6)

Negentropy computes the difference in entropy between the given distribution and a normal distribution having the same mean and variance; it is mathematically given by:

$$\begin{aligned} J(Y) = H(Y_{gauss}) - H(Y) \end{aligned}$$
(7)

\(Y_{gauss}\) is a random Gaussian variable with the same covariance matrix as Y. For estimating negentropy, a simpler approximation was proposed (Aapo and Erkki 2000):

$$\begin{aligned} J(Y)\approx [E\{G({\hat{Y}})\} - E\{G(y_{g})\}]^{2} \end{aligned}$$
(8)

where Y and \(y_{g}\) are assumed to have a mean of zero and unity variance, and G(.) is a non-quadratic function which can be evaluated as (Aapo and Erkki 2000):

$$\begin{aligned} G(k) = \mathrm{exp}(-ck^{2}/2) \end{aligned}$$
(9)

where \(c\approx 1\). Independent components are extracted sequentially by maximizing the objective function:

$$\begin{aligned} J({\mathbf {v}})\approx [E\{G(\mathbf {v^{T}x_{a}})\} - E\{G(y_{g})\}]^{2} \end{aligned}$$
(10)

Each column of V corresponds to one IC and can be estimated by means of a deflation scheme. The one-unit cost function is generalized to determine V by maximizing the sum of N one-unit cost functions. The resulting optimization problem can be described as:

$$\begin{aligned} \max \sum _{i = 1}^{N}J({\mathbf {v}}_{i}), \quad i = 1,2,\ldots ,N \end{aligned}$$
(11)

subject to \(E\{(\mathbf {v^{T}_{ k }x_{a}})(\mathbf {v^{T}_{ j }x_{a}})\} = \delta _{jk}\) and \(\mathbf {s}_{i} = \mathbf {v}_{i}^{T}\mathbf {x_{a}}\), where \(\mathbf {v_{i}}\) is the row of V at the maximum of \(J(\mathbf {v_{i}})\). The maximization occurs at certain optima of \(E\{g(\mathbf {v^{T}x_{a}})\}\) under the constraint \(E\{(\mathbf {v^{T}x_{a}})^{2}\} = ||{\mathbf {v}}||^{2} = 1\), which is described by:

$$\begin{aligned} F({\mathbf {v}}) = E\{\mathbf {x_{a}g}(\mathbf {v^{T}x_{a}})\} - \gamma {\mathbf {v}} = 0 \end{aligned}$$
(12)

where \( \gamma = E\{\mathbf {v_{0}^{T}x_{a}}g(\mathbf {v^{T}x_{a}})\}\) and \(v_{0}\) is the optimum value of v. Solving the above using the Newton method yields the fixed-point update (Ali and Mahdi 2012):

$$\begin{aligned} \mathbf {v}^{+} = E\{{\mathbf {z}}g(\mathbf {v^{T}z})\}-E\{g'(\mathbf {v^{T}z})\}{\mathbf {v}} \end{aligned}$$
(13)

The above update extracts only one independent component at a time, so the fixed-point algorithm has to be run repeatedly to estimate the remaining ICs. After each iteration, the vectors have to undergo orthogonalization to ensure that they do not converge to the same maxima. Hence, deflationary orthogonalization is performed through the Gram–Schmidt method:

$$\begin{aligned} {\mathbf {v}}_{j+1}= & {} {\mathbf {v}}_{j+1}-\sum _{i = 1}^{j}({\mathbf {v}}_{j+1}^{T}{\mathbf {v}}_{i}){\mathbf {v}}_{i} \end{aligned}$$
(14)
$$\begin{aligned} {\mathbf {v}}_{j+1}= & {} \dfrac{{\mathbf {v}}_{j+1}}{||{\mathbf {v}}_{j+1}||} \end{aligned}$$
(15)
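
A compact Python sketch of this deflationary fixed-point procedure (Eqs. (13)–(15)) is given below, assuming the data has already been whitened. It is illustrative only: the contrast function is taken as \(g = G'\) for the Gaussian G of Eq. (9) with \(c = 1\), and the convergence test and iteration limits are assumptions.

```python
import numpy as np

def fastica_deflation(Z, n_ics, max_iter=200, tol=1e-6):
    """Estimate columns of the de-mixing matrix V from whitened data Z (m x n)
    one at a time, with Gram-Schmidt deflation (Eqs. (13)-(15)).
    V is initialized to the identity, as described above, for repeatability."""
    m, _ = Z.shape
    V = np.eye(m)[:, :n_ics].copy()
    g = lambda u: u * np.exp(-u ** 2 / 2)              # g = G' for G in Eq. (9), c = 1
    dg = lambda u: (1 - u ** 2) * np.exp(-u ** 2 / 2)  # derivative g'
    for j in range(n_ics):
        v = V[:, j]
        for _ in range(max_iter):
            u = v @ Z                                  # v^T z over all samples
            v_new = (Z * g(u)).mean(axis=1) - dg(u).mean() * v   # Eq. (13)
            v_new -= V[:, :j] @ (V[:, :j].T @ v_new)   # deflation step, Eq. (14)
            v_new /= np.linalg.norm(v_new)             # normalization, Eq. (15)
            converged = abs(abs(v_new @ v) - 1.0) < tol
            v = v_new
            if converged:                              # direction no longer changes
                break
        V[:, j] = v
    return V
```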

where j is the number of previously estimated vectors. Once the ICs are computed using the fixed-point algorithm, the next task is the selection of dominant ICs, which is needed for two reasons: robust performance and reduction of complexity. A few methods are commonly preferred for the selection of dominant ICs (Lee et al. 2006); the cumulative percentage variance (CPV) method is applied in the current work to determine the optimum ICs. First, the rows of the separating matrix W are sorted in descending order of their \(L_{2}\) norm, \(\lambda = ||W||_{2}\). Next, the CPV criterion is applied to determine the p dominant ICs, which is expressed as:

$$\begin{aligned} \mathrm{CPV}(p) = \dfrac{\sum _{i = 1}^{p}\lambda _{i}}{\sum _{i = 1}^{m}\lambda _{i}} * 100 \end{aligned}$$
(16)
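
A minimal Python sketch of this selection step (Eq. (16)) is given below; the 90% CPV threshold is an assumed value for illustration.

```python
import numpy as np

def select_dominant_ics(W, cpv_limit=90.0):
    """Sort rows of W by L2 norm (descending) and keep the smallest p
    rows whose cumulative percentage variance reaches cpv_limit (Eq. (16))."""
    lam = np.linalg.norm(W, axis=1)                 # row-wise L2 norms
    order = np.argsort(lam)[::-1]                   # descending order
    cpv = 100.0 * np.cumsum(lam[order]) / lam.sum()
    p = int(np.searchsorted(cpv, cpv_limit)) + 1    # smallest p with CPV >= limit
    return W[order[:p]], W[order[p:]], p            # W_p, W_{m-p}, p
```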

Once a model is developed, determining its prediction capability is crucial for its acceptance. Generally, the prediction capability of a model is reduced when large noise is present in the data. In this work, the \(R^{2}\) test has been used to evaluate the goodness of the developed PCA and ICA models. The coefficient of determination, denoted \(R^{2}\), is the ratio of the regression sum of squares to the total sum of squares, denoted mathematically as:

$$\begin{aligned} R^{2} = \dfrac{\mathrm{SSQ}}{\mathrm{TSSQ}} = \dfrac{\sum _{i = 1}^{n}({\hat{x}}_{i}-{\bar{x}})^{2}}{\sum _{i = 1}^{n}(x_{i}-{\bar{x}})^{2}} \end{aligned}$$
(17)

SSQ is the regression sum of squares, which measures how far the predicted values lie from the mean of the original data, and TSSQ is the total sum of squares, which measures how the original data points vary about their mean.
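
For reference, a short Python sketch of this computation (Eq. (17)) is shown below; it is an illustrative helper, not code from this work.

```python
import numpy as np

def r_squared(x_obs, x_pred):
    """Coefficient of determination: ratio of the regression sum of squares
    (SSQ) to the total sum of squares (TSSQ), per Eq. (17)."""
    x_mean = x_obs.mean()
    ssq = np.sum((x_pred - x_mean) ** 2)    # predictions about the mean
    tssq = np.sum((x_obs - x_mean) ** 2)    # observations about their mean
    return ssq / tssq
```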

Once an ICA model is developed from data exhibiting normal process operation, the behavior of newly measured variables is compared against the nominal behavior captured by the model through the fault detection indicators. The ICA model may be represented as:

$$\begin{aligned} {\mathbf {X}} = \underbrace{{\mathbf {A}}_{p}{\mathbf {S}}_{p}}_\text {Systematic space} + \underbrace{{\mathbf {A}}_{m-p}{\mathbf {S}}_{m-p}}_\text {Excluded space} + \underbrace{{\mathbf {E}}}_\text {Residual space} \end{aligned}$$
(18)

The ICA model consists of three parts: the systematic space represents the model with respect to the p optimum ICs, the excluded space represents the model with respect to the ignored m−p ICs, and the residual space captures the modeling error. To monitor the three parts, three fault indicators are employed, namely, \( I ^{2}_{d}\) to monitor the systematic space, \( I ^{2}_{e}\) to monitor the excluded space and SPE to monitor the residual space. The reconstructed vectors \(\hat{{\mathbf {f}}}_{\mathrm{new} p }\) = \({\mathbf {W}}_{\textit{p}}{\mathbf {X}}_{\mathrm{new}}\) and \(\hat{{\mathbf {f}}}_{\mathrm{new} m-p } = {\mathbf {W}}_{ m-p }{\mathbf {X}}_{\mathrm{new}}\) are computed for new data \({\mathbf {X}}_{\mathrm{new}}\), where \({\mathbf {W}}_{ p }\) is the matrix corresponding to the p retained rows and \({\mathbf {W}}_{ m-p }\) corresponds to the excluded m−p rows of the separating matrix. The fault detection indicators of ICA at sample i are represented as (Yuan et al. 2019):

$$\begin{aligned} I ^{2}_{d}(i)= & {} \hat{{\mathbf {f}}}_{\mathrm{new} p }^{T}(i)\hat{{\mathbf {f}}}_{\mathrm{new} p }(i) \end{aligned}$$
(19)
$$\begin{aligned} I _{e}^{2}(i)= & {} \hat{{\mathbf {f}}}_{\mathrm{new} m-p }^{T}(i)\hat{{\mathbf {f}}}_{\mathrm{new} m-p }(i) \end{aligned}$$
(20)
$$\begin{aligned} SPE (i)= & {} ({\mathbf {X}}_{\mathrm{new}}(i)-\hat{{\mathbf {X}}}_{\mathrm{new}}(i))^{T}({\mathbf {X}}_{\mathrm{new}}(i)-\hat{{\mathbf {X}}}_{\mathrm{new}}(i)) \end{aligned}$$
(21)

where \(\hat{{\mathbf {X}}}_{\mathrm{new}} = {\mathbf {Q}}^{-1}{\mathbf {B}}_{p}\hat{{\mathbf {f}}}_{\mathrm{new}}(i) = {\mathbf {Q}}^{-1}{\mathbf {B}}_{p}{\mathbf {W}}_{p}{\mathbf {X}}_{\mathrm{new}}(i)\). The kernel density estimation (KDE) technique is employed to compute the threshold limits for the fault indicators (Lee et al. 2004).
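
The following Python sketch illustrates how the three indicators (Eqs. (19)–(21)) and a KDE-based control limit might be computed. It is a sketch under stated assumptions: \({\mathbf {B}}_{p}\) is passed in as given in the expression above (its construction follows the cited ICA monitoring literature), and the 99% confidence level is an assumed choice.

```python
import numpy as np
from scipy.stats import gaussian_kde

def ica_indicators(X_new, W_p, W_mp, Q, B_p):
    """I2_d, I2_e and SPE for each sample (columns of X_new), Eqs. (19)-(21)."""
    f_p = W_p @ X_new                             # systematic-part scores
    f_mp = W_mp @ X_new                           # excluded-part scores
    X_hat = np.linalg.inv(Q) @ B_p @ f_p          # reconstruction, as defined above
    I2d = np.sum(f_p ** 2, axis=0)                # Eq. (19)
    I2e = np.sum(f_mp ** 2, axis=0)               # Eq. (20)
    spe = np.sum((X_new - X_hat) ** 2, axis=0)    # Eq. (21)
    return I2d, I2e, spe

def kde_threshold(stat_train, alpha=0.99):
    """Control limit as the alpha-quantile of a Gaussian KDE fitted to a
    fault-free statistic, following Lee et al. (2004)."""
    kde = gaussian_kde(stat_train)
    grid = np.linspace(stat_train.min(), 2 * stat_train.max(), 4000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, alpha)]
```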

2.2 Dynamic ICA

In chemical process plants, variables do not remain at a fixed state; they tend to move around a nominal operating range, which results in variables with strong autocorrelation. Conventional process monitoring methods perform modeling under the assumption that the present sampling instants are independent of the past sampling instants. However, owing to the autocorrelation exhibited by the variables, present sampling instants depend on past sampling instants, and this information can be included in the modeling stage through the dynamic ICA strategy. In the dynamic ICA strategy, the variables are supplemented with past observations to generate an augmented matrix, which is represented by (Lee et al. 2004):

$$\begin{aligned} {\mathbf {X}}(\textit{l}) = \begin{bmatrix} {\mathbf {x}}_{t}^{T} &{} {\mathbf {x}}_{t-1}^{T} &{} \cdots &{} {\mathbf {x}}_{t-l}^{T} \\ {\mathbf {x}}_{t-1}^{T} &{} {\mathbf {x}}_{t-2}^{T} &{} \cdots &{} {\mathbf {x}}_{t-1-l}^{T} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {\mathbf {x}}_{t+l-n}^{T} &{} {\mathbf {x}}_{t+l-n-1}^{T} &{} \cdots &{} {\mathbf {x}}_{t-n}^{T} \end{bmatrix} \end{aligned}$$
(22)

where \(\mathbf{x} _{t}^{T}\) is the observation vector at sampling instant t, and l denotes the number of lagged measurements. The selection of a proper value for the time lag l is crucial for capturing dynamic information. Methods available in the literature for choosing the right number of lags include subspace identification criteria and the Akaike information criterion; however, selecting l = 1 or l = 2 usually yields good results for dynamic multivariate techniques (Lee et al. 2004). Once a reference dynamic ICA model is constructed using fault-free data, the fault indicators described in Eqs. (19)–(21) are used for detecting faults in a new data set.
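
Constructing the lagged matrix of Eq. (22) is mechanical; the following Python sketch (an illustrative helper) shows one way to do it for an n × m data matrix whose rows are observations.

```python
import numpy as np

def augment_with_lags(X, l=1):
    """Build the lagged data matrix of Eq. (22): each row stacks an
    observation with its l previous observations."""
    n, _ = X.shape
    blocks = [X[l - k : n - k, :] for k in range(l + 1)]   # lags 0 .. l
    return np.hstack(blocks)                               # shape (n - l, m(l + 1))

# Example: a DICA data matrix with one lag, as used in this work
# X_dyn = augment_with_lags(X_train, l=1)
```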

2.3 Modified ICA

The modified ICA (MICA) algorithm was proposed as an improvement of the ICA algorithm in order to obtain a structured ordering of the independent components (ICs). In the MICA algorithm, the ICs are initially estimated from the normalized principal components (PCs) of a PCA model, and the FastICA algorithm is then used to update the retained ICs while maintaining the same variance as the PCs. The first step of MICA is to apply PCA to extract PCs from the data \(\mathbf {X_{a}} = [\mathbf {x_{1}},\mathbf {x_{2}}\ldots ,\mathbf {x_{n}}]^{T}\) \(\in \mathfrak {R}^{\textit{m}\times \textit{n}}\):

$$\begin{aligned} {\mathbf {X_{a} = TP^{T}}} \end{aligned}$$
(23)

where T and \({\mathbf {P}}\) are the score and loading matrices, respectively, generated from the covariance of \(\mathbf {X_{a}}\). The last few eigenvalues of \(\varvec{\Lambda } = \mathrm{diag}(\lambda _{1},\lambda _{2},\ldots \lambda _{m})\) are close to zero and can be left out. The retained PCs undergo whitening, expressed as \(\mathbf {Z = QX_{a}}\) with \(\mathbf {Q} = \varvec{\Lambda }^{-1}\mathbf {P}^{\mathbf {T}}\). The whitened components of Z serve as initial estimates of the ICs. The main objective of MICA is to make the elements of S as independent as possible, and the MICA model is defined as:

$$\begin{aligned} {\mathbf {S = C^{T}Z}} \end{aligned}$$
(24)

where C\(\in \mathfrak {R}^{\textit{m}\times \textit{p}}\) is such that \(\mathbf {C^{T}C = D}\), with \({\mathbf {D}} = \mathrm{diag}(\lambda _{1},\lambda _{2},\ldots \lambda _{p})\) containing the dominant eigenvalues. The variance of each element of S is the same as the variance of the corresponding PCA score elements, since the following condition is satisfied:

$$\begin{aligned} {\dfrac{\mathbf {SS}^{T}}{n-1} = {\mathbf {D}}} \end{aligned}$$
(25)

This is followed by normalization of S such that:

$$\begin{aligned} {\mathbf {S_{no} = D^{\frac{-1}{2}}S = C_{no}^{T}Z}} \end{aligned}$$
(26)

where \(E(\mathbf {S_{no}S_{no}^{T}}) = {\mathbf {I}}\). Hence, the objective of MICA reduces to determining \(\mathbf {C_{no}}\). The elements of the \(\mathbf {C_{no}}\) matrix are computed by a fixed-point algorithm using the negentropy approximation (Lee et al. 2007). Before the fixed-point algorithm is executed, the following initialization is ensured:

$$\begin{aligned} {\mathbf {C_{no} = [I_{m}\vdots 0]}} \end{aligned}$$
(27)

where \(\mathbf {I_{m}}\) is an m-dimensional identity matrix and 0 is an \( m \times (p-m) \) zero matrix. After the computation of \(\mathbf {C_{no}}\), the W and A matrices are computed using:

$$\begin{aligned} \mathbf {W}= & {} \mathbf {D}^{1/2}\mathbf {C}_\mathbf {no}^\mathbf {T}\mathbf {Q} \end{aligned}$$
(28)
$$\begin{aligned} {\mathbf {A}}= & {} {\mathbf {P}}{\varvec{\Lambda }}^{1/2}\mathbf {C}_\mathbf {no}\mathbf {D}^{-1/2} \end{aligned}$$
(29)

The next step is to determine the dominant ICs. As the variance of each IC in S is the same as that of the corresponding PC, any standard criterion for determining the optimum PCs in the PCA strategy can be employed; the CPV technique is used here. Two fault indices are defined for the MICA strategy: \(T^{2}\) and SPE, for monitoring the systematic and residual parts, respectively. For new data \(\mathbf {X_{new}}\), they are defined as follows:

$$\begin{aligned} \textit{T}^{2}= & {} \mathbf {s^{T}D^{-1}s}\end{aligned}$$
(30)
$$\begin{aligned} \textit{SPE}= & {} (\mathbf {X_{new}}-\hat{\mathbf {X}})^{T}(\mathbf {X_{new}}-\hat{\mathbf {X}}) \end{aligned}$$
(31)

where \(\mathbf {s = WX_{new}}\) and \({\hat{\mathbf {X}} = \mathbf{As} }\). If a fault index exceeds its predetermined reference threshold, a fault is declared (Lee et al. 2006).
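
A short Python sketch of these two MICA indices (Eqs. (30)–(31)) is given below for a single sample; it assumes W, A and D have already been obtained via Eqs. (28)–(29) and is illustrative rather than definitive.

```python
import numpy as np

def mica_indicators(x_new, W, A, D):
    """MICA T^2 and SPE of Eqs. (30)-(31) for one sample vector x_new,
    with D = diag(lambda_1, ..., lambda_p) of the dominant eigenvalues."""
    s = W @ x_new                               # modified ICs for the sample
    x_hat = A @ s                               # reconstruction of the sample
    t2 = s @ np.linalg.inv(D) @ s               # Eq. (30)
    spe = (x_new - x_hat) @ (x_new - x_hat)     # Eq. (31)
    return t2, spe
```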

2.4 Principal Component Analysis

PCA is the most widely applied technique for capturing correlation between process data variables. After normalization to zero mean and unit variance, the scaled data X \(\in \mathfrak {R}^{\textit{n}\times \textit{m}}\) is used to estimate the principal components (PCs) through singular value decomposition:

$$\begin{aligned} \mathbf {X = TP^{T}} \end{aligned}$$
(32)

where T and P are the PC (score) and loading matrices, respectively (Harrou et al. 2013). The loading matrix is related to the covariance of X, which is described mathematically as:

$$\begin{aligned} {\pmb {\Sigma }} = {\mathbf {P}}\varvec{\Lambda }{{\mathbf {P}}}^{T} \quad \text {with} \quad {\mathbf {P}}{\mathbf {P}}^{T} = {\mathbf {P}}^{T}{\mathbf {P}} = \mathbf{I} _{m} \end{aligned}$$
(33)

where \(\varvec{\Lambda } = \mathrm{diag}(\lambda _{1},\lambda _{2},\ldots \lambda _{m})\) is a diagonal matrix comprising the eigenvalues of \(\mathbf {\Sigma }\) arranged in descending order. Once the p optimum PCs are determined via the CPV technique, the PCA model can be represented as:

$$\begin{aligned} {\mathbf {X}} = \underbrace{\hat{{\mathbf {T}}}\hat{{\mathbf {P}}}^{T}}_\text {modeled space} + \underbrace{\tilde{{\mathbf {T}}}\tilde{{\mathbf {P}}}^{T}}_\text {residual space} = \hat{{\mathbf {X}}}+{\mathbf {F}} \end{aligned}$$
(34)

Here, \(\hat{{\mathbf {X}}}\) and \(\hat{{\mathbf {T}}}\) represent the model parameters for the p retained PCs, while the residual matrix F represents the model parameters for the m−p ignored PCs.

The \(T^{2}\) statistic monitors the modeled part of the developed PCA model. For new data \(\mathbf {X_{new}}\), it is computed as:

$$\begin{aligned} \textit{T}^{2} = \mathbf {X_{new}}^{T}\hat{{\mathbf {P}}}\hat{\varvec{\Lambda }}^{-1}\hat{{\mathbf {P}}}^{T}\mathbf {X_{new}} \end{aligned}$$
(35)

The matrix \( \hat{\varvec{\Lambda }}\) contains the eigenvalues of the retained PCs on its diagonal. The Q statistic monitors the residual part of the developed PCA model, and it is expressed as:

$$\begin{aligned} Q = \mathbf {X_{new}}^{T}({\mathbf {I}}-\hat{{\mathbf {P}}}\hat{{\mathbf {P}}}^{T}){\mathbf {X_{new}}} \end{aligned}$$
(36)

For \(\mathbf {X_{new}}\), a fault is declared if the value of the \(T^{2}\) or Q fault indicator exceeds its threshold (Sanjula and Zukui 2019).
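
For completeness, a minimal Python sketch of the two PCA statistics (Eqs. (35)–(36)) is shown below for a single scaled sample; the argument names are assumptions.

```python
import numpy as np

def pca_indicators(x_new, P_hat, lam_hat):
    """PCA T^2 and Q of Eqs. (35)-(36) for one scaled sample x_new, where
    P_hat holds the p retained loading vectors and lam_hat their eigenvalues."""
    t2 = x_new @ P_hat @ np.diag(1.0 / lam_hat) @ P_hat.T @ x_new   # Eq. (35)
    resid = (np.eye(len(x_new)) - P_hat @ P_hat.T) @ x_new          # residual projection
    q = x_new @ resid                                               # Eq. (36)
    return t2, q
```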

3 ICA-Based Robust Fault Detection Strategy

Once a multivariate model is developed using ICA, evaluating the model for robustness and prediction capability is crucial for its acceptance. The developed multivariate model should be robust enough to perform in harsh and noisy industrial environments. It has been observed that a multivariate FD strategy that performs well in a less noisy environment may not be able to replicate the same performance in a noisier one; the presence of noise can thus degrade the fault handling capacity of an FD strategy. In this paper, we present a multivariate FD strategy based on ICA and study the behavior of the developed ICA model when noise is introduced into the underlying data. The aim of this work is to study the effect of different levels of measurement noise on the performance of the ICA, DICA and MICA models. Similar noise realizations are then performed on a PCA model, followed by comparisons between the PCA- and ICA-based FD methods.

Algorithm 1 Robust fault detection strategy

It is unfair to draw a conclusion regarding the robustness of a fault detection scheme from a few simulation runs. In the fields of system identification and signal processing, when estimation must be performed under significant uncertainty in a process model, stochastic simulation has proved to be a good solution. Thus, in this study, stochastic simulations have been carried out with 1000 realizations. The noise realizations involve adding noise with a defined signal-to-noise ratio (SNR), i.e., SNR = 20, SNR = 10 and SNR = 5, to the data during the stochastic simulation. Data with SNR = 20 corresponds to quality data with little noise, data with SNR = 10 corresponds to a medium level of noise and SNR = 5 indicates very noisy data. The robust fault detection strategy is described in Algorithm 1.

Fig. 1 Representation of the proposed strategy

Figure 1 shows a block diagram representation of the proposed strategy. The available process data is initially split into training and testing groups. The training group data is preprocessed to zero mean and is then used to develop the ICA model. After determining the optimum ICs, the KDE technique is employed to compute the thresholds for the fault indicators. For the testing data group, a stochastic simulation of 1000 runs is performed for noise with the SNR fixed at 20, 10 and 5. Next, the fault indicators for ICA are computed and compared with the thresholds. In the absence of a fault, the value of a fault indicator stays below the control limit; a fault is declared when the fault indicator exceeds the control limit. Similar noise realizations are also performed on the DICA, MICA and PCA strategies for comparison. In process monitoring problems, the performance of an FD strategy is assessed using the false alarm rate (FAR) and the missed detection rate (MDR). The FAR is defined as the ratio of the total number of false alarms to the total number of samples in the normal operating range. The MDR is defined as the ratio of the total number of faulty samples that do not exceed the control limits to the total number of samples in the faulty range. A fault detection strategy is deemed robust if its FAR and MDR values remain low for different levels of noise.
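
These two metrics are straightforward to compute from an indicator time series; the following Python sketch (an illustrative helper, with assumed argument names) makes the definitions explicit.

```python
import numpy as np

def far_mdr(stat, limit, fault_start):
    """FAR and MDR (in %) for one fault-indicator time series `stat`, given
    its control limit and the sample index where the fault begins."""
    alarms = stat > limit
    far = 100.0 * alarms[:fault_start].sum() / fault_start                   # false alarms
    mdr = 100.0 * (~alarms[fault_start:]).sum() / (len(stat) - fault_start)  # misses
    return far, mdr
```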

4 Case Studies

In this section, two case studies, a simulated quadruple tank process (QTP) and a simulated distillation column (DC) process, are considered to demonstrate the effectiveness of the ICA-based strategies over the PCA-FD strategy.

4.1 Types of Sensor Faults

A fault is defined as a continuous series of events in which a variable deviates from its regular operating range. Consider a process having a large number of variables; a measurement M(t) associated with a process variable may be represented as:

$$\begin{aligned} M(t) = M_{n}(t) + F(t) \end{aligned}$$
(37)

where \(M_{n}(t)\) is the true value of the measurement M(t), and F(t) is a possible fault at time instant t. The value of F(t) is zero when the process variable is operating normally and non-zero in the presence of a fault. A bias fault is recognized by a sudden jump of a variable from its normal operating value to a higher or lower abnormal value; the measurement shifts from its normal value \(M_{n}(t)\) to a new value \(M_{n}(t) + F\). This can be represented in time series format as:

$$\begin{aligned} M(t) = {\left\{ \begin{array}{ll} M_{n}(t) &{} \hbox { if}\ t<t_{a}; \\ M_{n}(t) + F &{} \hbox { if}\ t \ge t_{a};\\ \end{array}\right. } \end{aligned}$$
(38)

where F is the bias magnitude and \(t_{a}\) is the time instant at which the fault appears. The aging or drift fault occurs due to the aging of a sensor and can be represented in time series format as:

$$\begin{aligned} M(t) = {\left\{ \begin{array}{ll} M_{n}(t) &{} \hbox { if}\ t<t_{a}; \\ M_{n}(t) + V(t-t_{a}) &{} \hbox { if}\ t \ge t_{a};\\ \end{array}\right. } \end{aligned}$$
(39)

where \(M_{n}(t)\) is the true value of the measurement, V is the rate of the slow drift behavior and \(t_{a}\) is the time instant where the fault occurs.
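
Both fault types are easy to inject into simulated sensor data; a minimal Python sketch of Eqs. (38)–(39) follows (the function names are illustrative).

```python
import numpy as np

def inject_bias(M, t_a, F):
    """Bias fault, Eq. (38): add a constant offset F from sample t_a onward."""
    M_f = M.copy()
    M_f[t_a:] += F
    return M_f

def inject_drift(M, t_a, V):
    """Drift fault, Eq. (39): add a ramp of slope V starting at sample t_a."""
    M_f = M.copy()
    M_f[t_a:] += V * np.arange(len(M) - t_a)
    return M_f

# e.g., a bias of +1 in the tank 1 level at sample 175, as in Sect. 4.2
# h1_faulty = inject_bias(h1, t_a=175, F=1.0)
```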

Fig. 2 A schematic of the quadruple tank process

Table 1 QTP: MDR & FAR for bias fault
Fig. 3 QTP: monitoring results of PCA with bias fault for SNR = 5

Fig. 4 QTP: monitoring results of ICA with bias fault for SNR = 5

Fig. 5 QTP: monitoring results of DICA with bias fault for SNR = 5

4.2 Simulation Study on a Quadruple Tank Process

In recent years, the quadruple tank system has been used in many control and process monitoring studies since it exhibits useful multivariable interactions between the four tanks. In this process, the pump voltages \( v _{1}\) and \( v _{2}\) are the inputs, while the liquid levels \(h_{1}\), \(h_{2}\), \(h_{3}\) and \(h_{4}\) are the measured outputs. The equations that govern the quadruple tank process (QTP) shown in Fig. 2 are (Karl 2000):

$$\begin{aligned} \frac{\text {d}h_{1}}{\text {d}t}= & {} \frac{q_{1}k_{1}V_{1}}{A_{1}}+\frac{a_{3}\sqrt{2gh_{3}}}{A_{1}}-\frac{a_{1}\sqrt{2gh_{1}}}{A_{1}} \end{aligned}$$
(40)
$$\begin{aligned} \frac{\text {d}h_{2}}{\text {d}t}= & {} \frac{q_{2}k_{2}V_{2}}{A_{2}}+\frac{a_{4}\sqrt{2gh_{4}}}{A_{2}}-\frac{a_{2}\sqrt{2gh_{2}}}{A_{2}} \end{aligned}$$
(41)
$$\begin{aligned} \frac{\text {d}h_{3}}{\text {d}t}= & {} \frac{(1-q_{2})k_{2}V_{2}}{A_{3}}-\frac{a_{3}\sqrt{2gh_{3}}}{A_{3}} \end{aligned}$$
(42)
$$\begin{aligned} \frac{\text {d}h_{4}}{\text {d}t}= & {} \frac{(1-q_{1})k_{1}V_{1}}{A_{4}}-\frac{a_{4}\sqrt{2gh_{4}}}{A_{4}} \end{aligned}$$
(43)

In the above equations, \(a_{1} \cdots a_{4}\) represent the areas of the outlet pipes; \(A_{1} \cdots A_{4}\) represent the cross-sectional areas of the tanks; \(q_{1}\) and \(q_{2}\) are valve ratios; \(k_{1}\) and \(k_{2}\) denote pump constants; and g is the acceleration due to gravity.
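
For readers who wish to reproduce such data, the model of Eqs. (40)–(43) can be integrated numerically; the Python sketch below is illustrative only, and the parameter values in the usage comment are placeholders rather than the nominal operating point used in this study.

```python
import numpy as np
from scipy.integrate import solve_ivp

def qtp_rhs(t, h, v1, v2, a, A, q1, q2, k1, k2, g=981.0):
    """Right-hand side of Eqs. (40)-(43). a and A are length-4 arrays of
    outlet-pipe and tank cross-sectional areas; g defaults to cm/s^2 units."""
    out = a * np.sqrt(2.0 * g * np.maximum(h, 0.0))   # outflow terms, guarded at h >= 0
    dh1 = (q1 * k1 * v1 + out[2] - out[0]) / A[0]     # Eq. (40)
    dh2 = (q2 * k2 * v2 + out[3] - out[1]) / A[1]     # Eq. (41)
    dh3 = ((1.0 - q2) * k2 * v2 - out[2]) / A[2]      # Eq. (42)
    dh4 = ((1.0 - q1) * k1 * v1 - out[3]) / A[3]      # Eq. (43)
    return [dh1, dh2, dh3, dh4]

# Illustrative call with placeholder parameters (h0, a, A chosen by the user):
# sol = solve_ivp(qtp_rhs, (0, 1000), h0, args=(3.0, 3.0, a, A, 0.7, 0.6, 3.33, 3.35))
```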

Table 2 QTP: MDR & FAR for drift fault
Fig. 6 Schematic of a distillation column

In this case, dynamic simulations are carried out to generate a data set around standard nominal operating points (Karl 2000). The inputs are perturbed about their normal operating range using a pseudo-random binary signal in the frequency band [0, 0.03\(\omega _{n}\)], where \(\omega _{n} = \pi /T\) is the Nyquist frequency. The idinput function in the System Identification Toolbox of MATLAB is used to generate 2048 samples of data, which are then split equally into training and testing data. For the ICA and modified ICA (MICA) strategies, three optimum ICs are selected, while three PCs are selected for the PCA strategy. For the dynamic ICA strategy, a lag of l = 1 is considered and six optimum ICs are selected. The model performance of ICA is assessed using \(R^{2}\) values, which are 0.975, 0.941 and 0.891 for SNR = 20, SNR = 10 and SNR = 5, respectively. A value of \(R^{2}\) = 0.975 indicates that about 97.5% of the total sum of squares in the testing data is described by the ICA model, with only about 2.5% left in the residuals.

First, a sensor bias fault with a magnitude of \(+\)1 is introduced in the tank 1 measurement at sample 175 of the testing data. The fault indicators of the different strategies, evaluated through FAR and MDR for different noise realizations, are presented in Table 1. While the PCA-Q statistic has no false alarms, the PCA-\(\textit{T}^{2}\) statistic has many more false alarms than the fault indicators of the ICA, DICA and MICA methods. The PCA strategy also has a very high MDR at increased noise levels in comparison to the other strategies. The performance of the ICA, MICA and DICA methods remains acceptable even though the MDR value increases with added noise. For clarity, the monitoring plots of the PCA, ICA and DICA strategies for the bias fault with SNR = 5 are presented in Figs. 3, 4 and 5, respectively.

Fig. 7 DC process: variation of output signals

Fig. 8 DC process: scatter plot, observed vs. predicted values

In the next case, a fault with a rate of 0.008, resembling a drift sensor fault, is introduced in sensor 3 (i.e., the tank 3 level measurement) from sampling instant 150 until the end of the testing data. The performance of the different fault indicators, evaluated through FAR/MDR values for SNR = 20, SNR = 10 and SNR = 5, is presented in Table 2. The results demonstrate that the MDR values of the PCA and ICA fault indicators are almost comparable, except for PCA-\(T^{2}\), which has a high MDR value. The FAR values of all methods are also in the same range, and the FAR increases as more noise is added to the data. Thus, it can be concluded that the monitoring performance of the ICA, DICA and MICA strategies is slightly better than that of the PCA strategy for bias and drift faults, and that ICA and its variant fault detection methods are more robust to different levels of noise in the quadruple tank process case study.

4.3 Distillation Column Simulation Study

A distillation column (DC) is an energy-intensive unit in any chemical process plant and is used for separating components from mixtures based on differences in vapor pressure. Proper monitoring of distillation columns is necessary to avoid accidents and loss of product quality. A schematic of a DC process applied in an industrial setup is shown in Fig. 6. In this example, the distillation column consists of 32 plates, and 10 resistance temperature detectors (RTDs) are used to monitor the temperature at different locations of the column. The flow rates of feed and reflux are used to perturb the distillation column.

The Aspen simulator is used for generating distillation column process data. To begin, flow rates of the feed as well as reflux streams are perturbed from their nominal operating ranges (Harrou et al. 2018; Madakyaru et al. 2013). Once the system has reached a steady-state condition, these perturbations are used to generate data. The input variables consist of 10 temperatures corresponding to measurements at various locations of the column along with flow rates of feed and reflux. The variations of the output (i.e., \(x_{B}\) and \(x_{D}\)) for changes in the input perturbations are presented in Fig. 7. A total data length of 4096 samples is generated with 14 variables, which are then split equally into training and testing data sets.

The training data is used for developing the PCA- and ICA-based models, which are then used for monitoring faults in the testing data. For the ICA and MICA strategies, 8 and 10 ICs, respectively, are selected, whereas 8 PCs are selected for the PCA strategy. For the DICA strategy, a lag of l = 1 is considered and 14 ICs are selected. Figure 8 shows a scatter plot of observed vs. predicted values for the case SNR = 5, where the predicted values are observed to follow the observed values closely. The goodness of the model in fitting the training data is evaluated using the \(R^{2}\) statistic, whose value for SNR = 20, SNR = 10 and SNR = 5 is found to be 0.978, 0.955 and 0.9152, respectively. Since the \(R^{2}\) value decreases as more noise is present in the data, it may be concluded that noise affects the prediction capability of the ICA model.

First, the ability of the different FD strategies to handle a sensor bias fault is investigated. To simulate this fault, a bias with a magnitude of 20 is introduced in temperature variable 5 of the testing data at sampling instant 750. To obtain a valid conclusion, 1000 stochastic simulations with different noise realizations have been carried out, and the results are tabulated in Table 3. From the table, it is observed that, despite having no false alarms, the \(T^{2}\) and Q statistics have a very high MDR, which indicates that they are unable to detect the bias fault in the presence of noise. In contrast to PCA, the fault indicators of the ICA, DICA and MICA strategies have lower FAR and MDR values, demonstrating enhanced performance in detecting the bias fault even as the noise level increases.

Table 3 DC process: MDR & FAR for bias fault

Next, the capability of the different FD strategies in monitoring a sensor drift fault is examined. A slow ramp fault with a rate of 0.015, resembling an aging sensor fault, is introduced in temperature variable 7 at sampling instant 900. Stochastic simulations are carried out to assess the performance of the fault indicators in monitoring this fault for different noise realizations. The time evolution of the PCA, ICA and MICA fault indicators can be observed in Figs. 9, 10 and 11, respectively. The results of the stochastic simulation, evaluated using FAR and MDR, are presented in Table 4. It is observed that the PCA fault indicators have zero FAR but very high MDR values, which clearly suggests poor performance in monitoring the drift fault in the presence of noise. In contrast, the ICA, DICA and MICA strategies demonstrate better detection of the drift fault as the noise level in the data increases. Hence, it can be inferred that the performance of the ICA, DICA and MICA strategies in handling sensor faults of the DC process in the presence of noise is commendable.

Table 4 DC process: MDR & FAR for drift fault
Fig. 9 DC process: monitoring results of PCA with drift fault for SNR = 10

Fig. 10 DC process: monitoring results of ICA with drift fault for SNR = 10

Fig. 11 DC process: monitoring results of MICA with drift fault for SNR = 10

5 Conclusion

The aim of this study was to evaluate the effectiveness of ICA-based fault detection strategies against different levels of noise in the measured variables, in comparison with the PCA-based fault detection strategy. When data is contaminated with noise, separating the useful information becomes difficult, and this leads to poor model prediction. Hence, to check the robustness of the FD strategies, stochastic simulations were performed for noise with SNR = 20, SNR = 10 and SNR = 5, where SNR = 20 corresponded to quality data with little noise, SNR = 10 to a medium level of noise and SNR = 5 to very noisy data. The performance of the ICA FD, dynamic ICA FD and modified ICA FD strategies was compared with the PCA-FD strategy through two case studies, the quadruple tank process and the distillation column process, with the FAR and MDR metrics used to assess the monitoring performance of the different strategies. It was observed that both the MDR and FAR values became higher with increased levels of noise in the data for all the FD strategies. The MDR value for the PCA strategy was very high and increased further with the noise level, whereas the ICA-based FD strategies had much lower MDR values across the different noise levels. Hence, it can be inferred that the ICA, dynamic ICA and modified ICA strategies are more effective than the PCA strategy in dealing with sensor faults for different levels of noise. In future work, the proposed strategy can be extended to wavelet-based ICA for handling process plants with multi-resolution data.