1 Introduction

In modern petrochemical industries, process abnormalities or faults hamper smooth functioning and this needs to be addressed before they pose a further threat. In this regard, process monitoring plays a vital role in ensuring safe process operation as well as good product quality. A survey has suggested that the USA’s petrochemical industries could save billions of dollars each year if they were to have an effective fault detection (FD) strategy [1]. Similarly, in other parts of the world, lots of cost saving is possible if they were to have an effective FD strategy. Different techniques have been proposed in the literature to tackle process faults and they are broadly divided into two families: model based and data-driven methods [2,3,4]. In the last decade, there has been rapid development in the field of computer-aided systems and advanced sensor technologies which has lead to quick and improved extraction of data from the process. Due to the complex nature of process plants, model-based FD methods are difficult to develop practically. In contrast, large process data is sufficient for modeling data-driven methods which have found themselves in the core of the industrial big data revolution [5]. The multi-variate statistical process monitoring (MSPM) belongs to the group of data-driven monitoring with the luxury of monitoring multiple variables simultaneously and has been applied in FD domain. Good research has been carried out in various domains of neural networks and soft computing in recent years [6,7,8,9,10,11].

Principal component analysis (PCA), a popular multi-variate strategy, handles large dimensional correlated data by projecting it into a lower dimensional subspace without compromising original data information [12]. In the last few years, several works have been carried out as an extension to conventional PCA technique that includes dynamic PCA(DPCA), recursive PCA, multi-way PCA, moving window PCA, multi-scale PCA and nonlinear PCA techniques [13,14,15,16]. The PCA technique relies only on second-order statistics of mean and variance, thus providing restricted representation of non-gaussian data. An MSPM technique based on Independent component analysis (ICA) has been investigated in the last few years to handle the problem of monitoring non-gaussian processes. The ICA technique, originated initially from blind source separation problems, has progressed very well over the time to find itself in many applications that also include process fault detection [17, 18]. ICA represents data using higher-order statistics, thus enabling it to capture non-gaussian features in the process data. In ICA, the data is decomposed into a linear combination of latent variables or Independent Components(ICs) which are independent. The model development in ICA strategy requires maximizing non-gaussianity or minimizing the mutual information. The ICA modeling technique has been combined with \( I ^{2}_{d}\), \( I ^{2}_{e}\) and squared prediction error (SPE) as fault indicators to detect abnormalities in process data. Many variations of conventional ICA strategy have been proposed in the literature including dynamic ICA, multi-way ICA, modified ICA, Kernel ICA and noisy ICA strategies [19,20,21,22].

Most process monitoring problems face the issue of handling measurement noise since it degrades the performance of an FD strategy. The noise masks important features in the data, thus reducing the task of extracting information from process data. The industrial data acquired from a process consists of information (features) occupying different regions on a time-frequency scale [23, 24]. The measured data is termed multi-scale since it is a blend of infectious noise as well rich data that is concentrated in the time and frequency domain. The best way to achieve multi-scale data representation is through the application of wavelets that possesses ability to separate deterministic from stochastic features. Wavelets are mathematical functions having multiple scale features with the ability to synchronize their scale with the nature of signal [25]. The biggest advantage of wavelets is their capacity in discriminating measurement noise from useful data features by applying low pass and high pass filters to data during the multi-scale decomposition stage. In the early years when multi-scale filtering was proposed for fault detection domain, it was integrated with PCA to have an MSPCA strategy [26]. In the work by [26], the PCA strategy was developed at each scale followed by combining the results at individual scales where significant events were present. This enabled wavelets’ ability to detect deterministic changes and capture those features where abnormal operations were evident.

The advantages of multi-scale filtering were used to enhance the monitoring capabilities of Shewart chart, exponentially weighted moving average (EWMA) chart and cumulative sum (CUSUM) chart depending on the type of wavelet selected [27]. A multi-scale PCA algorithm (MSPCA) was proposed in which the contributions were collected from each scale in separate matrices and then, the PCA model was developed to capture correlation at each scale [28]. The multi-scale partial least squares (PLS) was combined with multi-block PLS to have a novel FD strategy which enhanced monitoring performance [29]. The advantage of wavelet based de-noising was used to enhance the prediction capability of linear variable regression (LVR) models such as partial least squares (PLS) and principal component regression (PCR) in inferential model development [30]. In recent years, multi-scale PLS and multi-scale PCA have been combined with a new fault detection index based on generalized likelihood ratio (GLR) to have improved detection of faults [16, 31, 32]. In our earlier research, the MSPCA strategy was combined with dynamic PCA to have multi-scale dynamic PCA (MSDPCA) strategy that enhanced fault detection in benchmark Tennessee Eastman process [33]. Contrary to the eminent advances in MSPCA and MSPLS fault detection methods, ICA has received significantly less attention in the field of wavelet-based process monitoring despite ICA being a better choice for monitoring non-gaussian data. The ICA strategy was integrated with wavelet analysis to have a wavelet-ICA strategy where the Daubechies-3 wavelet was used with three levels of decomposition for the Tennessee Eastman process [34]. In the work proposed by [34], the decomposition depth was fixed to three without a specified criteria.

The conventional ICA FD strategy fails to provide good monitoring results in presence of heavy measurement noise. Hence, if the data is de-noised before developing an ICA model, better extraction of detailed non-gaussian features is possible. Hence, this work aims to develop an FD strategy by demonstrating the ability of wavelets in de-noising and capturing essential non-gaussian features from process data through ICA modeling technique. The proposed technique is expected to enhance FD capability of ICA strategy when applied to data in noisy environment. Noise realizations are carried out where noise with specified SNR (signal to noise ratio) is introduced in the data. The SNR is defined as ratio of variance of the signal to variance of the noise. In the present work, noise with a defined SNR, i.e. SNR=20, SNR=10 and SNR=5 are used to perform noise realizations. The data with SNR=20 corresponds to a quality data with less amount of noise, the data with SNR=10 corresponds to medium level of noise and SNR=5 indicates a very noisy data. As the noise levels in the data increases, performance of the MSICA strategy is expected to be superior to other conventional strategies. One of the major challenges in this work is to compute optimum decomposition depth for multi-scale representation. Decomposing the data too deep over-smoothens the data and decomposing it too fewer results in noise being retained. Also, a fixed level of decomposition depth cannot be used for all process applications and hence, a novel method is proposed in this work. In the proposed method, ICA models are developed at each decomposition depths and the depth that gives best prediction capability is chosen as the optimum level of decomposition.

In sect. 2 of this paper, the PCA- and ICA-based fault detection strategies are presented. This is followed by discussion on wavelets, multi-scale representation of data and proposed MSICA fault detection strategy in sect. 3. Section 4 demonstrates the effectiveness of the proposed MSICA FD strategy through three case studies, namely: dynamic multi-variate process, quadruple tank process and distillation column (DC) process. The performance of the proposed MSICA strategy is compared with ICA, PCA and MSPCA fault detection strategies. Finally, sect. 5 provides a conclusion of the work.

2 Data-Driven Model Development

This section provides brief overview of PCA and ICA parameter estimation strategies. The various steps pertaining to off-line model development and process monitoring for both the strategies are presented in detail.

2.1 Principal Component Analysis

The PCA is a traditional dimension reduction technique where large correlated variables are decomposed into lesser number of latent variables (LV) while maintaining original information present in the original data. Consider a multi-variate data, \(\mathbf{X }=[\mathbf {x}_{\mathbf {1}},\mathbf {x}_{\mathbf {2}}....,\mathbf {x}_{\mathbf {n}}]\) \(\in \mathfrak {R}^{\textit{n}\times \textit{m}}\) where m and n represents number of variables and observations in X. The PCA model can be expressed mathematically as [35]:

$$\begin{aligned} \mathbf {X}={\mathbf {TP}}^{T} \end{aligned}$$
(1)

where T=\([\mathbf {t}_{\mathbf {1}},\mathbf {t}_{\mathbf {2}}......\mathbf {t}_{\mathbf {m}}]\) is latent variable matrix and \(\mathbf{P }=[\mathbf {p}_{\mathbf {1}},\mathbf {p}_{\mathbf {2}}.....\mathbf {p}_{\mathbf {m}}]\) is loading matrix. The off-line monitoring of PCA consists of three stages:

  1. 1.

    Each variable of X is centered to mean of zero and unity variance and then, singular value decomposition (SVD) is performed:

    $$\begin{aligned} \dfrac{1}{\textit{n}-1}{\mathbf {X}}^{T}{\mathbf {X}} = \mathbf {P}{\varvec{\Lambda }}{\mathbf {P}}^{T} \end{aligned}$$
    (2)

    where \({{\varvec{\Lambda }}}=diag(\lambda _{1},\lambda _{2},...\lambda _{m})\) is eigenvalue matrix.

  2. 2.

    The optimum principal components (PCs) p are selected using cumulative percentage variance (CPV) technique:

    $$\begin{aligned} CPV(p) = \dfrac{\sum _{i=1}^{p}\lambda _{i}}{\sum _{i=1}^{m}\lambda _{i}} * 100 \end{aligned}$$
    (3)
  3. 3.

    Next, the threshold limits are determined for \(T^{2}\) statistics and SPE fault indicators - \(T^{2}_\alpha \) and \(SPE_\alpha \).

The online monitoring of PCA consists of three stages:

  1. 1.

    A new data \(\mathbf {X}_{\mathbf {new}}\) is centered to zero mean and unit variance.

  2. 2.

    The following fault indicators are computed for \(\mathbf {X}_{\mathbf {new}}\) using developed PCA model parameters [35]:

    $$\begin{aligned} T^{2}= & {} \mathbf {X}_{\mathbf {new}}^{\mathbf {T}}{\hat{\mathbf {P}}}{\varvec{\Lambda }}^{\mathbf {-1}}{\hat{\mathbf {P}}}^{\mathbf {T}}\mathbf {X}_{\mathbf {new}} \end{aligned}$$
    (4)
    $$\begin{aligned} SPE= & {} \mathbf {X}_{\mathbf {new}}^{\mathbf {T}}(\mathbf {I}-{\hat{\mathbf {P}}}{\hat{\mathbf {P}}}^{\mathbf {T}})\mathbf {X}_{\mathbf {new}} \end{aligned}$$
    (5)

    where \({\hat{P}}\) and \(\varLambda ^{-1}\) are matrices corresponding to p optimum PCs.

  3. 3.

    A fault is declared if fault indicators exceed the threshold:

    • \(T^{2}\ge T^{2}_\alpha \)

    • SPE \(\ge SPE_\alpha \)

2.2 Independent Component Analysis (ICA)

The ICA is a multi-variate statistical strategy for extracting non-gaussian independent components (ICs) from the data by employing higher order statistical parameters [36]. An industrial multi-variate data, \(\mathbf{X }=[\mathbf {x}_{\mathbf {1}},\mathbf {x}_{\mathbf {2}}....,\mathbf {x}_{\mathbf {n}}]^{T}\) with \({\mathbf {X}}\in \mathfrak {R}^{\textit{m}\times \textit{n}}\) is a combination of \(\textit{k}(\le \textit{m})\) unknown independent components (ICs). The ICA model can be represented mathematically as:

$$\begin{aligned} \mathbf {X}=\mathbf {AS} + \mathbf {F} \end{aligned}$$
(6)

where A=\([{\mathbf {a}_{\mathbf {1}}}.....{\mathbf {a}_{ k }}]^{T}\in \mathfrak {R}^{{m}\times {k}}\) is a deterministic mixing matrix, S=\([{\mathbf {s}_{\mathbf {1}},\mathbf {s}_{\mathbf {2}}.....\mathbf {s}_{ n }}]^{T}\in \mathfrak {R}^{{k\times n}}\) is a matrix with ICs, k represents the ICs, and \({\mathbf {F}}\in \mathfrak {R}^{{m}\times {n}}\) is residual matrix. The task of ICA is focused on finding a separating matrix W such that reconstructed matrix is given by:

$$\begin{aligned} {\hat{\mathbf {S}}}={\mathbf {WX}} \end{aligned}$$
(7)

The off-line monitoring of ICA consists of following stages:

  1. 1.

    The normalized data \(\mathbf {X}_{\mathbf {c}}\) undergoes whitening where \(\mathbf {Z}=\mathbf {QX}_{\mathbf {c}}\) is computed with Q = \(\varLambda ^{-1}{\mathbf {B}}^{T}\), Z is whitening matrix, \(\varLambda \) is diagonal matrix and B is eigenvector matrix calculated from the covariance of \(\mathbf {X}_{\mathbf {c}}\). The transformation obtained post whitening stage is expressed as \(\mathbf {Z} = \mathbf {QX}_{\mathbf {c}} = \mathbf {QAS} = \mathbf {VS}\)

  2. 2.

    The following iteration is computed (i=1,2....m):

    $$\begin{aligned} v _{i}=\arg \max (J( y )) \end{aligned}$$
    (8)

    subject to

    $$\begin{aligned}&v _{i}, E( yy ^{T})=I \end{aligned}$$
    (9)
    $$\begin{aligned}&J(Y) \approx [E\{G(y)\} - E\{G(b)\}]^{2} \end{aligned}$$
    (10)

    where

    $$\begin{aligned} y=v_{i}^{T}z \end{aligned}$$
    (11)

    Maximizing non-gaussianity by negentropy approximation is a preferred choice for extracting the IC’s [37]. In Eq. (10), \( J(Y) \) is the negentropy function, b is a gaussian variable with zero mean and unit variance and G is a non-gaussian function used to compute independent components. After performing the iteration m times, \({\mathbf {V}}=[ v_{1} . . . . v_{m}]\in \mathfrak {R}^{{m}\times {m}} \) and separating matrix \(\mathbf {W}=\mathbf {V}^{\mathbf {T}}\mathbf {Q}\) are determined.

  3. 3.

    The CPV method described by Eq. (3) is used to determine optimum ICs [38]. For training data, the fault indicators are constructed the following way [39]:

    $$\begin{aligned} I^{2}_{d}= & {} \mathbf {X}_{\mathbf {c}}^{\mathbf {T}}\mathbf {W}_{\mathbf {k}}^{\mathbf {T}}\mathbf {W}_{\mathbf {k}}\mathbf {X}_{\mathbf {c}} \end{aligned}$$
    (12)
    $$\begin{aligned} I^{2}_{e}= & {} \mathbf {X}_{\mathbf {c}}^{\mathbf {T}}\mathbf {W}_{\mathbf {m-k}}^{\mathbf {T}}\mathbf {W}_{\mathbf {m-k}}\mathbf {X}_{\mathbf {c}} \end{aligned}$$
    (13)
    $$\begin{aligned} SPE= & {} \mathbf {e.e}^{\mathbf {T}} \end{aligned}$$
    (14)

    where \(\mathbf {W}_{\mathbf {k}}\in \mathfrak {R}^{{k}\times {m}}\) represents matrix with retained ICs, \(\mathbf {W}_{\mathbf {m-k}}\in \mathfrak {R}^{(m-k)\times {m}}\) represents matrix with ignored IC’s and SPE represents residual part where e is the residual error with \({\mathbf {e}}=\mathbf {X}_{\mathbf {c}}(i)-\hat{\mathbf {X}_{\mathbf {c}}}\) with \(\hat{\mathbf {X}_{\mathbf {c}}}={\mathbf {Q}}^{-1}{\mathbf {V}}_{k}{\mathbf {W}}_{k}\mathbf {X_{c}}\).

  4. 4.

    Next, thresholds \(Th_{1}\), \(Th_{2}\) and \(Th_{3}\) are developed for fault indicators in Eqs. (12), (13) and (14) using Kernel density estimation (KDE) approach [19].

The online monitoring of ICA fault detection strategy consists of following steps:

  1. 1.

    A new process data \(\mathbf {X}_{\mathbf {new}}\) is centered to zero mean.

  2. 2.

    Fault indicators in Eqs. (12), (13) and (14) are computed for \(\mathbf {X}_{\mathbf {new}}\).

  3. 3.

    A fault is declared if the value of fault indicators exceeds the threshold:

    • \( I^{2}_{d} \ge Th_{1}\)

    • \( I^{2}_{e} \ge Th_{2}\)

    • \(SPE \ge Th_{3}\)

3 Multi-Scale ICA Modeling Using Wavelets

The data from most industrial processes is correlated in time due to the presence of measurement noise and its effect has to be nullified to extract useful information from the data. Many filtering methods like mean filter, EWMA filter, and median filter have been applied in practice. The data variables extracted from the industrial process have different time-frequency localization and this requires multi-scale filtering. It has been observed that wavelet functions are useful in capturing multi-scale features present in the process data. An advantage carried by wavelets is that they can separate useful features of data from the noise through the application of high pass and low pass filters. The other advantage of wavelets is that they decorrelate noise at individual decomposition depths [31]. In this paper, multi-scale representation using wavelets are integrated with the ICA strategy to have MSICA FD strategy.

In multi-scale representation, the wavelet transform is used to transform original process data into multi-scale components based on time and frequency attributes. The wavelet transform is used to project data x(t) on mathematical basis functions which is described as follows:

$$\begin{aligned} (x,\varPsi _{e,d})=\dfrac{1}{\sqrt{e}}\int _{-\infty }^{\infty }{} \mathbf{x}(t) \varPsi ^{*}(\dfrac{t-d}{e}) dt \end{aligned}$$
(15)

where e and d represent dilation and translation parameters and * represents complex conjugate of the mother wavelet \(\varPsi (t)\). The dilation parameter e describes scaling property of the wavelet where wavelet can be either compressed or expanded in frequency domain to extract sharp changes or slow varying behavior of data. The translational parameter d describes the shifting property where the wavelet is either delayed or advanced along the length of the signal to align with useful features in the data. Different class of wavelet basis functions are available that includes Haar basis function and Daubechies filters [40, 41].

The expression in Eq. (15) is described based on continuous wavelet transform (CWT) which gives redundant information since the parameters e and d are continuously changing. In most practical applications, multi-scale decomposition is performed using discrete wavelet transform (DWT) which also provides an additional benefit of computational efficiency [24]. In wavelet-based data-representation, an input signal can be decomposed into two parts, namely: the scaled or approximation coefficients that captures low-frequency components of the signal and detailed coefficients that captures high-frequency components of the signal. A signal can be decomposed into multi-scale components that consist of scaled coefficient vector at depth L and L detail coefficients at all the scales (finest to coarsest). Some advantages of multi-scale filtering are summed up below:

  1. 1.

    The presence of auto-correlated measurement noise in the data degrades the detection capability of an FD strategy. The wavelet-based data representation involves decomposing the data at multiple scales and wavelet coefficients at each scale L would decorrelate the noise component present in the original data.

  2. 2.

    Most filtering techniques like mean, EWMA and median filters represent data at either time or frequency scale, thus making them inefficient to be used in complex process data. In contrast, the multi-scale representation using wavelets can separate noise from information-rich data by applying low and high pass filters during the multi-scale decomposition process.

  3. 3.

    Malfunctioning sensors generate sudden skewness and added to this, modeling errors introduce lots of variations that deviates towards non-gaussianity. Few data-driven FD methods assume that the data is gaussian in nature and the effectiveness reduces as the data deviates from gaussianity. However, multi-scale decomposition using wavelets causes data to be gaussian at multiple levels and this unique property of wavelets can improve the performance of FD methods when non-gaussian errors are present in the data.

To have an efficient MSICA strategy, the scaled data is decomposed to different depths and represented in multi-scale matrix \({\mathbf {X}}_{d}\in \mathfrak {R}^{\textit{m}\times \textit{n}(L+1)}\) using DWT [29]:

$$\begin{aligned} {\mathbf {X}}_{d} = [{\mathbf {X}}_{1}, {\mathbf {X}}_{2}, \cdots {\mathbf {X}}_{L}, {\mathbf {X}}_{L+1}] \end{aligned}$$
(16)

where the components \({\mathbf {X}}_{1}, {\mathbf {X}}_{2} \cdots {\mathbf {X}}_{L}\) contain detailed coefficients at each scale, while the last component \({\mathbf {X}}_{L+1}\) contains the scaled or coefficients at the coarsest scale L. The above representation using wavelet coefficients helps in capturing localized features at each scale which can enhance monitoring performance of the proposed FD strategy.

3.1 Multi-Scale ICA Modeling

The selection of optimum decomposition depth or scale L is an important factor in multi-scale de-noising using wavelets. Decomposing the data too deep on different coefficients results in information being lost. In contrast, if data is not decomposed too deep, unwanted noise is retained in the data and this masks important information in the data. Hence, the level of decomposition has to be decided cautiously for ensuring that the noise is removed and information is retained. In this work, a novel technique is proposed that involves development of ICA model at individual scales followed by performing prediction to compute MSE at individual scales. The scale at which the MSE is minimum is regarded as the optimum scale [31].

The proposed method for computing decomposition depth is presented in Fig. 1. Initially, the data is split into training and testing data groups, namely: \(\mathbf {X}_{\mathbf {tr}}\) and \(\mathbf {X}_{\mathbf {te}}\). The training data is decomposed to different depths using wavelets and the ICA model is developed at each decomposition depth L. Next, the MSE is computed on testing data at depth L using the ICA model:

$$\begin{aligned} MSE = \dfrac{1}{n}[{\mathbf {e}}^{T}{\mathbf {e}}] \end{aligned}$$
(17)

where,

$$\begin{aligned} {\mathbf {e}}= & {} \mathbf {X}_{\mathbf {te}}-\hat{\mathbf {X}_{\mathbf {te}}} \end{aligned}$$
(18)
$$\begin{aligned} \hat{\mathbf {X}_{\mathbf {te}}}= & {} {\mathbf {Q}}^{-1}{\mathbf {V}}_{k}{\mathbf {W}}_{k}\mathbf {X}_{\mathbf {te}} \end{aligned}$$
(19)

where Q is the whitening matrix, V and W are model parameters corresponding to k retained optimum IC’s. In this proposed method, the ICA model is developed at each scale L and prediction capability of developed model is utilized at each scale. This leads to improved de-noising and extraction of features in process data which enhances detection of faults in the process.

Fig. 1
figure 1

MSICA modeling algorithm for calculating decomposition depth

Fig. 2
figure 2

A schematic depiction of MSICA fault detection strategy

3.2 The MSICA Fault Detection Strategy

This paper aims to develop a robust MSICA FD strategy that amalgamates ICA strategy with multi-scale wavelet filtering. Since data from petro-chemical and other process industries is nonlinear and non-gaussian, the data-driven methods based on PCA and PLS fail to perform fault detection efficiently. The ICA technique is preferred over the other data-driven methods since it has a meaningful representation of non-gaussian data. Due to harsh environments in petrochemical process plants, a large amount of noise is embedded into the measurements. If the noisy data is used directly for model development stage without any pre-treatment, important features in the data will be masked by noise, resulting in an information-poor ICA model. The parameters of ICA model would not be able to capture information related to faults in a new process data precisely. Hence, the process data is pre-treated where it undergoes simultaneous extraction of information in time-frequency region through wavelet functions. The de-noised data will be used to construct the ICA model. This would lead to the development of information rich model and eventually enhances fault detection capability.

The block diagram of the proposed MSICA-based fault detection strategy is presented in Fig. 2. The task of fault detection is divided into offline monitoring and online monitoring stages. In offline monitoring stage, once the data under normal process operation is acquired from a process, it undergoes pre-treatment of scaling to zero-mean. This is followed by decomposing the data to different scales using the wavelet functions and reconstruction to get filtered data. This is followed by developing an reference ICA model. From the ICA model parameters, thresholds are computed for the three fault indicators using KDE approach. In online monitoring stage, the new data is processed to mean of zero and fault indicators described in Eqs. 1213 and 14 are computed. Next, the value of fault indicators is compared with the thresholds computed in offline monitoring stage. If the value of fault indicators is lesser than the threshold, the new data is fault-free. However, if the value of fault indicators exceeds the threshold, it indicates presence of fault in the data.

4 Case Studies

Three case studies are considered to illustrate the potentiality of the proposed MSICA FD strategy in monitoring the sensor faults: (i) dynamic multi-variate process (ii) simulated quadruple tank process and (iii) simulated distillation column process. The fault detection rate (FDR) and false alarm rate (FAR) indices are used to evaluate the performance of each fault detection strategy and they are calculated in percentage using the below representation:

$$\begin{aligned} \text {FDR}= & {} \dfrac{\text {Total samples }(J>J_{th})|\text { abnormal condition}}{\text {Total samples in faulty region}}\nonumber \\ \end{aligned}$$
(20)
$$\begin{aligned} \text {FAR}= & {} \dfrac{\text {Total samples }(J>J_{th})|\text { normal condition}}{\text {Total samples in non-faulty region}}\nonumber \\ \end{aligned}$$
(21)

The proposed MSICA monitoring strategy is compared against PCA, ICA and MSPCA monitoring strategies through FAR and FDR indices.

Fig. 3
figure 3

Dynamic multi-variate process: Monitoring results of PCA and MSPCA techniques for intermittent fault

4.1 Dynamic Multi-Variate Process

Here, we present the monitoring of proposed MSICA strategy on a dynamic multi-variate process which is represented as:

$$\begin{aligned} {{\mathbf {z}}}_{a}( i )= & {} \begin{bmatrix} \mathbf {0.118} &{} \mathbf {-0.191} &{} \mathbf {0.28} \\ \mathbf {0.847} &{} \mathbf {0.264} &{} \mathbf {0.94} \\ \mathbf {-0.333} &{} \mathbf {0.514} &{} \mathbf {-0.21} \end{bmatrix}\times {{\mathbf {z}}}_{a}(\textit{i-1}) \nonumber \\&+ \begin{bmatrix} {\mathbf {1}} &{} {\mathbf {2}} \\ {\mathbf {3}} &{} \mathbf {-4} \\ \mathbf {-2} &{} {\mathbf {1}} \end{bmatrix}\times {{\mathbf {u}}}_{a}(\textit{i-1}) \end{aligned}$$
(22)
$$\begin{aligned} {{\mathbf {y}}}_{a}(\textit{i})= & {} {{\mathbf {z}}}_{a}(\textit{i}) + {{\mathbf {v}}}_{a}(\textit{i}) \end{aligned}$$
(23)
$$\begin{aligned} {{\mathbf {u}}}_{a}( i )= & {} \begin{bmatrix} \mathbf {0.811} &{} \mathbf {-0.226} \\ \mathbf {0.477} &{} \mathbf {0.415} \end{bmatrix}\times {{\mathbf {u}}}_{a}(\textit{i-1}) \nonumber \\&+ \begin{bmatrix} \mathbf {0.193} &{} \mathbf {0.689} \\ \mathbf {-0.320} &{} \mathbf {-0.749} \end{bmatrix}\times {{\mathbf {w}}}_{a}(\textit{i-1}) \end{aligned}$$
(24)

Based on parameter values described in [42], a dynamic simulator is used to generate data of 2000 samples involving five variables \([{\mathbf {y}}_{a1},\mathbf{y }_{a2},\mathbf{y }_{a3},\mathbf{u }_{a1},\mathbf{u }_{a2}]\). The data is then split into 1000 samples each of training and testing groups. After normalization of training data, PCA, MSPCA, ICA and MSICA models are developed using the CPV approach. For PCA and MSPCA models, four optimum PCs are retained while three optimum ICs are retained for ICA and MSICA models, respectively. The optimum decomposition depth was found to be four for both MSPCA and MSICA strategies.

The effectiveness of the MSICA FD strategy is illustrated on three types of faults, namely: sustained bias, drift, and intermittent fault. In the first case, a bias is injected in variable \({\mathbf {y}}_{a1}\) of the testing data for sampling times ranging from 400 to 1000. Next, a sensor drift fault is considered where a ramp signal with a slope of 0.075 is introduced in variable \({\mathbf {u}}_{a2}\) between samples 400 to the end of testing data. Finally, an intermittent fault is introduced in variable \({\mathbf {y}}_{a2}\) at sampling times [80,170], [475,565] and [850,940] respectively. To provide clarity to the reader, performance of the proposed strategy in monitoring an intermittent fault is presented for the case with SNR=5. The results can be observed in Figs. 3 and 4. From Fig. 3, the PCA-\(T^{2}\) and MSPCA-\(T^{2}\) strategies fail to detect this fault. While the PCA-SPE method detects the fault partially with small number of false alarms, the MSPCA-SPE technique has better detection ability. From Fig. 4, ICA-\(I^{2}_{d}\) and ICA-\(I^{2}_{e}\) detect the fault better than ICA-SPE. In contrast, the proposed MSICA-\(I^{2}_{d}\), MSICA-\(I^{2}_{e}\) and MSICA-SPE strategies detect the fault clearly with minimum missed detections. The performance of the proposed MSICA FD strategy for different noise realizations is presented in Tables 1 and 2. The results indicate that at lower noise levels, the performance of all the strategies is very good. As the level of noise is increased in the data, the performance of MSICA strategy is superior to other strategies. This is because the multi-scale representation in the MSICA strategy captures important non-gaussian features from the noisy data, which enhances monitoring performance.

Fig. 4
figure 4

Dynamic multi-variate process: Monitoring results of ICA and MSICA techniques for intermittent fault

Table 1 Dynamic multi-variate process: False alarm rate for different noise realizations
Table 2 Dynamic multi-variate process: Fault Detection Rate for different noise realizations

4.2 Quadruple Tank Process

The quadruple tank is a simple process that has been widely used in control system related research over the last few years. The process consists of four tanks which are interconnected and they reveal an interesting multi-variable phenomena. The two pump voltages \( v _{1}\) and \( v _{2}\) serve as the inputs while \(h_{1}\), \(h_{2}\), \(h_{3}\) and \(h_{4}\) heights of the liquid in tanks. The schematic of quadruple tank process is presented in Fig. 5. The dynamic description of quadruple tank process is represented through mathematical equations as follows [43]:

$$\begin{aligned} \frac{dh_{1}}{dt}= & {} \frac{q_{1}k_{1}V_{1}}{A_{1}}+\frac{a_{3}\sqrt{2gh_{3}}}{A_{1}}-\frac{a_{1}\sqrt{2gh_{1}}}{A_{1}} \end{aligned}$$
(25)
$$\begin{aligned} \frac{dh_{2}}{dt}= & {} \frac{q_{2}k_{2}V_{2}}{A_{2}}+\frac{a_{4}\sqrt{2gh_{4}}}{A_{2}}-\frac{a_{2}\sqrt{2gh_{2}}}{A_{2}} \end{aligned}$$
(26)
$$\begin{aligned} \frac{dh_{3}}{dt}= & {} \frac{(1-q_{2})k_{2}V_{2}}{A_{3}}-\frac{a_{3}\sqrt{2gh_{3}}}{A_{3}} \end{aligned}$$
(27)
$$\begin{aligned} \frac{dh_{4}}{dt}= & {} \frac{(1-q_{1})k_{1}V_{1}}{A_{4}}-\frac{a_{4}\sqrt{2gh_{4}}}{A_{4}} \end{aligned}$$
(28)

where \(q_{1}\) and \(q_{2}\) are ratio of valves; \(k_{1}\) and \(k_{2}\) are pump constants; g is gravity of earth ; \(a_{1}\), \(a_{2}\), \(a_{3}\) and \(a_{4}\) represent area of outlet pipes and \(A_{1}\), \(A_{2}\),\(A_{3}\) and \(A_{4}\) are the area of individual tank respectively.

Fig. 5
figure 5

A schematic of quadruple tank process

MATLAB is used to perform dynamic simulations for generating the data, which consists of 2000 observations with six variables. Perturbation of the inputs around the nominal operating point with pseudo random binary signal in the range of frequency [0 0.03 \(\omega _{n}\)] is carried out. Here, \(\omega _{n}=\pi /T\) represents the Nyquist frequency. The six variables in the case study include two pump voltages and the height of liquids in the tanks. Next, the data is split equally into training and testing data groups. After normalization of training data, PCA, MSPCA, ICA and MSICA models are developed. For PCA and MSPCA models, five optimum PCs are retained while four optimum ICs are retained for ICA and MSICA models using the CPV approach. The optimum decomposition depth was found to be three in both MSPCA and MSICA monitoring strategies.

This section describes the performance of the proposed MSICA FD strategy in monitoring sensor faults in simulated quadruple tank set up. First, a sustained bias is injected into tank 2 height variable \(h_{2}\) for sampling times ranging from 400 to 1000 of the testing data. Next, an intermittent fault is introduced into tank 2 height variable \(h_{2}\) between sampling times [150,250], [495,595] and [880,980] respectively. Finally, a ramp signal resembling a sensor drift fault is injected in tank 3 height variable \(h_{3}\) for sampling times ranging from 450 to 1000. To provide clarity to the reader, the results of proposed strategy in monitoring drift fault for the case SNR=10 are presented in detail. Figure 6 suggests that the MSPCA strategy has better detection capability compared to PCA strategy. The MSPCA-\(T^{2}\) detects the fault with a large false alarm rate while MSPCA-SPE detects smoothly with a small delay. From Fig. 7, it is observed that fault indicators of the ICA technique detect the fault with a small delay and lesser false alarm rate. In contrast, the fault indicators of MSICA strategy can identify the fault clearly by providing a smooth transition. The performance of FD strategies for different noise realizations are presented in Tables 3 and 4. As the level of noise is increased, the proposed MSICA strategy over-performs other methods as observed in the case of data with SNR=5. The proposed MSICA FD strategy has good advantage in terms of both FAR as well as FDR. The multi-scale representation of data ensures efficient information being captured from the process data and hence, the proposed MSICA strategy can provide better detection of faults in a noisy environment.

Fig. 6
figure 6

Quadruple tank process: Monitoring results of PCA and MSPCA techniques for drift fault

Fig. 7
figure 7

Quadruple tank process: Monitoring results of ICA and MSICA techniques for drift fault

Table 3 Quadruple tank process: False Alarm Rate for different noise realizations
Table 4 Quadruple tank process: Fault Detection Rate for different noise realizations

4.3 Distillation Column Process

Distillation column (DC) is a high energy-consuming unit in a chemical process plant for separating components from the mixture of components based on their difference in vapor pressure. Proper monitoring of distillation column process is necessary to avoid any accident and loss of product quality in the industry. The distillation column setup consists of 32 plates and 10 resistance temperature detector sensors to monitor the temperature at different location of the process. The schematic of a DC process is presented in Fig. 8.

Fig. 8
figure 8

A schematic of Distillation Column process

The Aspen simulator is used for generating distillation column dynamic data [44]. To begin with, feed and reflux flow rates are perturbed around their nominal operating ranges. Once the system has reached steady state condition, these perturbations would be used to generate data. The input variables consist of temperatures corresponding to measurements at ten locations of the column along with flow rates of feed and reflux. A total data length of 1024 samples with 14 variables (two input variables, ten measured variables and two output variables) is generated which is then split equally into training and testing data groups. The PCA, MSPCA, ICA and MSICA models are developed after the normalization of training data. The CPV approach is used to select optimum latent variables that results in seven and six PCs for PCA and MSPCA techniques while seven and seven optimum ICs for ICA and MSICA models, respectively. The optimum decomposition depth is found to be three and two in the case of MSPCA and MSICA models. Figure 9 illustrates optimum decomposition depth computation for MSICA strategy.

A bias is introduced in temperature variable 5 for sampling times ranging from 150 to 350 of the testing data. The monitoring results of PCA and MSPCA strategies are presented in Fig. 10 and monitoring results of ICA and MSICA strategies in Fig. 11. The PCA-\(T^{2}\) and PCA-SPE strategies are unable to detect the bias fault introduced in the testing data. While MSPCA-\(T^{2}\) has slightly better performance, the MSPCA-SPE strategy can detect the bias fault in the given fault range, however, with few false alarms. As observed in Fig. 11, the three fault indicators of the ICA strategy cannot detect the fault clearly. Comparatively, the three fault indicators of MSICA strategy detect the fault precisely with very minimum false alarms, thus, displaying a clear advantage.

Fig. 9
figure 9

Computation of optimum decomposition depth for simulated distillation column process

Fig. 10
figure 10

DC process: Results of PCA and MSPCA strategies in monitoring bias fault

Fig. 11
figure 11

DC process: Results of ICA and MSICA strategies in monitoring bias fault

Fig. 12
figure 12

DC process: Results of PCA and MSPCA strategies in monitoring intermittent fault

Fig. 13
figure 13

DC process: Results of ICA and MSICA strategies in monitoring intermittent fault

Next, the performance of the MSICA FD strategy in monitoring an intermittent fault is demonstrated. A sustained bias is injected in temperature variable 5 for time instants [100,200] and [350,450]. The monitoring results of PCA and MSPCA strategies are presented in Fig. 12. It is observed that PCA-based fault indicators are unable to detect the fault. While MSPCA-\(T^{2}\) detects the fault partially, the MSPCA-SPE detects the fault with few false alarms. Next, the performances of ICA and MSICA strategies in monitoring this fault are presented in Fig. 13. The ICA-\(I^{2}_{d}\), ICA-\(I^{2}_{e}\) and ICA-SPE based methods detects the fault partially with few missed detections. In contrast, the MSICA-\(I^{2}_{d}\), MSICA-\(I^{2}_{e}\) and MSICA-SPE based fault indicators detect the fault clearly with minimum false alarms.

The performance of FD strategies for different noise realizations is presented in Tables 5 and 6. The results indicate that all FD strategies provides good monitoring results for the case of SNR=20. However, as the level of noise is increased in the data, the MSICA strategy over-performs other methods because of the powerful de-noising scheme provided by the wavelets. Hence, it can be concluded that the MSICA model captures useful process information in presence of noise where as other model structure fails to do so. This is because wavelet-based data representation eliminates the effect of noise and has better representation of data that enhances fault detection.

Table 5 Distillation column process: False Alarm Rate for different noise realizations
Table 6 Distillation column process: Fault detection rates for different noise realizations

5 Conclusion

The measured data from chemical processes are non-gaussian in nature and this has led to frequent usage of ICA FD strategy in process monitoring problems. However, the presence of measurement noise masks important process information and degrades the monitoring efficiency of ICA based strategy. Among various data-filtering methods available, multi-scale filtering using wavelets has found to be useful owing to its ability to capture information from the time-frequency scale. In this work, ICA FD strategy was integrated with wavelet filtering to have a novel MSICA strategy that can enhance detection of faults in noisy process environments. The optimum decomposition depth was obtained by using ICA model estimation at each depth and selecting the depth that gives the minimum MSE of model prediction. The performance of the MSICA FD strategy was illustrated using three case studies, namely: dynamic multi-variate process, quadruple tank process and distillation column process. A study was carried to assess the performance of FD strategies for different levels of noise. As the level of noise in the measured data was increased, the proposed MSICA strategy over-performed other methods with a good FDR value. This is because the multi-scale decomposition using wavelets filters noise and extracts important process information that would enhance fault detection. Hence, it can be concluded that the proposed MSICA strategy satisfies the desired properties of a good fault detection scheme. The proposed MSICA fault detection strategy is linear in nature and is restricted to linear processes only. As a part of future work, the MSICA strategy can be applied for industrial processes that are nonlinear in nature.