1 Introduction

Accurate and reliable online measurement of quality variables is important for effective process monitoring and control in intelligent factories [1]. Most quality variables are measured by offline laboratory analysis or online analyzers [2]. Although laboratory analysis provides precise measurements, it requires a long sampling cycle, which results in a large measurement delay [3, 4]. Online analyzers can measure quality variables in real time, but they are expensive and lack reliability [5]. To resolve these problems, soft sensors employing virtual sensing techniques can be used to estimate quality variables from other process variables available online, such as flow rate, pressure, and temperature [6]. Over the past decade, soft sensors have received considerable attention owing to their rapid response and low maintenance costs [7].

Soft sensors can be broadly categorized into two types: white-box models (first-principles models) and black-box models (data-driven models). White-box models are based on mass and energy balances and various chemical and physical equations. However, they require strong domain knowledge of the process and demand an enormous computational cost to build. Furthermore, they focus on ideal steady states, so they cannot describe actual operating conditions. Black-box models can represent actual process conditions with little domain knowledge. As data-driven modeling has developed, various machine learning algorithms have enabled efficient soft sensor development, such as PLS [8], SVR [9], fuzzy systems [10], deep kernel learning [12], and Gaussian process regression [13]. In particular, the artificial neural network (ANN) is widely used in soft sensor development because it can capture complicated relationships between input and output variables [7, 11]. Deep learning-based soft sensing models, which have a powerful ability to learn the essential features of data, have also been developed and have shown better prediction performance [14, 15]. To consider the dynamic states of a process, hybrid methods that integrate machine learning algorithms with regression models such as ARMA [16] and NARX [17] have been proposed. Recurrent neural networks (RNNs), which process time-series data from past steps, have been used to develop dynamic soft sensor models [18]. RNNs can extract the sequential information available in the input data and exhibit better performance in dynamic modeling [19]. However, standard RNNs have difficulty modeling long sequences because of the vanishing gradient problem [20]. Long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks address this problem using memory cells that store long-sequence information [21].
The LSTM network is employed to extract the hidden dynamics from the input sequence and quality variables, which shows a better quality prediction accuracy compared with that of RNNs [22]. Furthermore, sequence-to-sequence LSTM networks using encoder–decoder architectures can learn sequential information of both output and input variables simultaneously [23]. Attention-based sequence-to-sequence networks enable the development of an explainable model with importance weights of input variables to predict quality variables [24].

However, existing soft sensors focus on predicting the current quality value and use a recursive structure to consider the interaction between quality variables. These soft sensors therefore distinguish between on-spec and off-spec conditions based only on the current quality value, and with such limited information they cannot prevent off-spec occurrences because process control can only follow off-spec detection. For appropriate decision making, it is important to predict the future dynamic behavior of quality variables as well as the current value, because practical control systems contain dead time, the time required to transport materials in industrial processes [25]. Dead time causes a lag between off-spec detection and the return to the on-spec condition, resulting in significant product loss during the associated recovery time. If quality deterioration could be identified in advance, a preventive policy could be established and losses minimized. Quality variables are determined by the process states and the corresponding dynamics in the process unit, which means that future quality changes can be predicted through process dynamics analysis. Although existing networks significantly improve the prediction performance and robustness of soft sensors, they remain single-time prediction models with quality-relevant sequences and thus face several problems in multi-step prediction. For a single-time model, a direct prediction method that constructs multiple independent models from the same historical values is used to predict multiple steps directly [26]. Direct prediction outputs multi-step results simultaneously without error accumulation, but the predicted outputs lose their sequential information [27]. To accurately predict future process state changes, the temporal continuity of the output sequence should be maintained to reflect the dynamic state of the process. Therefore, a robust dynamic model with a recursive structure that accurately predicts the dynamics of future quality variables is needed to enable reliable future predictions.

In this study, we developed an early off-spec detection system with a dynamic soft sensor for multi-step prediction of quality variables with temporal correlation. Accordingly, a hybrid sequence-to-sequence recurrent neural network–deep neural network (Seq2Seq RNN–DNN) model is proposed to address two problems in soft sensor modeling. First, an RNN encoder–decoder architecture is employed to handle the sequence-to-sequence dataset with process dynamics. The encoder extracts dynamic hidden states from the input sequence, and the decoder uses this dynamic information to predict the output sequence while maintaining the temporal correlation of the predicted values. Unlike the direct prediction method, this approach improves the prediction performance for long time series. Second, a combined dataset obtained from offline measurements and process simulation is used to solve the insufficient-data problem. Because of the long cycle of offline laboratory analysis, data for soft sensor modeling are limited, and a small dataset can reduce the predictive performance of a data-driven model. Therefore, laboratory measurement data are combined with simulation data to train the DNN.

The remainder of this paper is structured as follows: Sect. 2 provides a brief introduction to the background of the direct prediction method and sequence-to-sequence network. Sect. 3 describes the proposed hybrid Seq2Seq RNN–DNN used for constructing a dynamic soft sensor. A case study is carried out for an industrial process to evaluate the performance of the proposed model in Sect. 4. Finally, conclusions are presented in Sect. 5.

2 Background

2.1 Direct method for multi-step prediction

To predict multiple steps with a single-time prediction model, the number of models must equal the number of prediction steps. This approach is referred to as the direct prediction method. Its key advantage is structural simplicity; however, it requires considerable computation time to build a separate model for each time step. Furthermore, the direct method loses the temporal information of the predicted values, which decreases the prediction accuracy for long time series because it predicts the output at each time step using an independent model, as shown in Fig. 1. Given the initial dataset, \(N\) training sets are created first, each having the same input sequence \(\mathbf{X}=[{\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{L}]\) but a different output \({\mathbf{y}}_{k}\). For example, the output variables for the first prediction model are \({\mathbf{y}}_{1}\), those for the second prediction model are \({\mathbf{y}}_{2}\), and so on. By training on each dataset independently, \(N\) regression models \({f}_{k}\) \((k=1,2,\dots ,N)\) are obtained, and the models are used to predict the \(N\) future values as follows:

$${\mathbf{y}}_{k}={f}_{k}\left({\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{L}\right) \quad \left(k=1,2,\dots ,N\right),$$
(1)

where \(L\) and \(N\) represent the history window size and prediction window size, respectively.
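As a minimal sketch of Eq. (1) (NumPy only, with synthetic data; ordinary least squares stands in for an arbitrary regressor \(f_k\), and all sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

L, N, m = 5, 3, 2   # history window, prediction window, variables per step
S = 200             # number of training samples

# Synthetic dataset: each sample is a flattened input sequence
# X = [x_1, ..., x_L] and N future scalar outputs y_1, ..., y_N.
X = rng.normal(size=(S, L * m))
true_W = rng.normal(size=(L * m, N))
Y = X @ true_W + 0.01 * rng.normal(size=(S, N))

# Direct method: fit one independent model f_k per future step k.
models = []
for k in range(N):
    W_k, *_ = np.linalg.lstsq(X, Y[:, k], rcond=None)
    models.append(W_k)

# Each model predicts its own step independently: no error accumulates,
# but the N outputs share no sequential information.
x_new = rng.normal(size=(1, L * m))
y_hat = np.array([x_new @ W_k for W_k in models]).ravel()
print(y_hat.shape)  # (3,)
```

Note that the \(N\) fits share the same input matrix but are otherwise unrelated, which is exactly why the predicted sequence loses its temporal continuity.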

Fig. 1

Schematic diagram of direct prediction method

2.2 Deep neural network

A deep neural network (DNN) is an ANN with multiple fully connected layers consisting of input, hidden, and output layers. DNNs have previously been applied to predict and simulate many physical problems with high performance. In the DNN structure, each layer receives the output from the previous layer and transfers its own output to the next layer. The hidden layers with feedforward connections are trained using backpropagation with stochastic gradient descent. The accuracy of the model depends on the chosen architecture, its hyperparameters, the nature of the data, and the learning process. The outputs \((\mathbf{h})\) of the first hidden layer, the \(n\)th hidden layer, and the output layer of the DNN are expressed as:

$${\mathbf{h}}_{1}=\sigma \left({\mathbf{W}}_{1}^{T}\mathbf{x}+{\mathbf{b}}_{1}\right),$$
(2)
$${\mathbf{h}}_{n}=\sigma \left({\mathbf{W}}_{n}^{T}{\mathbf{h}}_{n-1}+{\mathbf{b}}_{n}\right),$$
(3)
$$\widehat{\mathbf{y}}={\mathbf{W}}_{o}^{T}{\mathbf{h}}_{N}+{\mathbf{b}}_{o},$$
(4)

where \(\mathbf{W}\) and \(\mathbf{b}\) represent the weight matrix and bias vector of each layer, respectively, and \(\sigma\) is the activation function. For the first hidden layer, the input variable vector \((\mathbf{x})\) is used instead of \({\mathbf{h}}_{n-1}\). For the output layer, which has no activation function, the output of the last hidden layer \(({\mathbf{h}}_{N})\) is used to compute the predicted values \((\widehat{\mathbf{y}})\).
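Eqs. (2)–(4) can be sketched as a forward pass (NumPy; the choice of ReLU for \(\sigma\) and the layer sizes are illustrative assumptions, not from the original):

```python
import numpy as np

def relu(x):
    # ReLU stands in for the unspecified activation sigma (an assumption).
    return np.maximum(0.0, x)

def dnn_forward(x, weights, biases):
    """Forward pass of Eqs. (2)-(4): activated hidden layers followed by
    a linear output layer with no activation."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):   # Eqs. (2)-(3)
        h = relu(W.T @ h + b)
    W_o, b_o = weights[-1], biases[-1]
    return W_o.T @ h + b_o                        # Eq. (4)

rng = np.random.default_rng(0)
sizes = [8, 30, 30, 1]   # illustrative: input, two hidden layers, output
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

y_hat = dnn_forward(rng.normal(size=8), weights, biases)
print(y_hat.shape)  # (1,)
```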

2.3 Recurrent neural network

An RNN stores past data and forwards the information to calculate the output of the next step [28]. Unlike a feedforward DNN, an RNN can model temporal dynamics using a form of memory. LSTM and GRU are variants of the standard RNN developed to handle the vanishing and exploding gradient problems that arise with the long-term dependencies observed in RNNs [29].

As shown in Fig. 2, a typical LSTM cell is configured primarily by three gates: the input gate (\({i}_{t}\)), forget gate (\({f}_{t}\)), and output gate (\({o}_{t}\)). The input gate takes newly incoming data and stores the new information in the cell state. The forget gate decides what to forget from the cell state. The output gate receives the calculated cell state and outputs the result of the LSTM cell. Equations (5)–(10) represent the input gate, forget gate, output gate, candidate cell state (\({\widetilde{C}}_{t})\), cell state (\({C}_{t}\)), and final output (\({h}_{t}\)), respectively:

$${i}_{t}=\sigma \left({\mathbf{W}}_{xi}{\mathbf{x}}_{t}+{\mathbf{W}}_{hi}{\mathbf{h}}_{t-1}+{b}_{i}\right),$$
(5)
$${f}_{t}=\sigma \left({\mathbf{W}}_{xf}{\mathbf{x}}_{t}+{\mathbf{W}}_{hf}{\mathbf{h}}_{t-1}+{b}_{f}\right),$$
(6)
$${o}_{t}=\sigma \left({\mathbf{W}}_{xo}{\mathbf{x}}_{t}+{\mathbf{W}}_{ho}{\mathbf{h}}_{t-1}+{b}_{o}\right),$$
(7)
$${\widetilde{C}}_{t}=\mathrm{tan}h\left({\mathbf{W}}_{xc}{\mathbf{x}}_{t}+{\mathbf{W}}_{hc}{\mathbf{h}}_{t-1}+{b}_{c}\right),$$
(8)
$$C_{t} = f_{t} \odot C_{t - 1} + i_{t} \odot \tilde{C}_{t} ,$$
(9)
$$h_{t} = o_{t} \odot {\text{tan}}h\left( {C_{t} } \right),$$
(10)

where \(\mathbf{W}, b\), and \(\sigma\) represent the weight matrix, bias vector, and sigmoid function, respectively, and \(\odot\) is the pointwise multiplication of two vectors.
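A single LSTM step following Eqs. (5)–(10) might be sketched as follows (NumPy; the weight scale, sizes, and input sequence are arbitrary illustrative values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, C_prev, p):
    """One LSTM step following Eqs. (5)-(10)."""
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])        # Eq. (5)
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])        # Eq. (6)
    o = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])        # Eq. (7)
    C_tilde = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # Eq. (8)
    C = f * C_prev + i * C_tilde                                     # Eq. (9)
    h = o * np.tanh(C)                                               # Eq. (10)
    return h, C

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for g in "ifoc":
    p[f"Wx{g}"] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p[f"Wh{g}"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p[f"b{g}"] = np.zeros(n_hid)

h, C = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):   # run over a short input sequence
    h, C = lstm_cell(rng.normal(size=n_in), h, C, p)
print(h.shape)  # (4,)
```

Because \(o_t \in (0,1)\) and \(\tanh(C_t) \in (-1,1)\), the output \(h_t\) is always bounded, while the cell state \(C_t\) can carry information over long sequences.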

Fig. 2

LSTM cell structure

Meanwhile, as shown in Fig. 3, a GRU cell has two gates: an update gate \(({z}_{t})\) and a reset gate \(({r}_{t})\). The reset gate determines which previous information is to be kept and combined with the new input data. The update gate decides how much prior memory is retained for the future time steps. Both gates determine how much of the past and present information to use and generate new hidden state information. Equations (11)–(14) represent the update gate, reset gate, candidate hidden state (\({\widetilde{\mathbf{h}}}_{t})\), and hidden state (\({\mathbf{h}}_{t}\)), respectively:

$${\mathbf{z}}_{t}=\sigma \left({\mathbf{W}}_{xz}{\mathbf{x}}_{t}+{\mathbf{W}}_{hz}{\mathbf{h}}_{t-1}\right),$$
(11)
$${\mathbf{r}}_{t}=\sigma \left({\mathbf{W}}_{xr}{\mathbf{x}}_{t}+{\mathbf{W}}_{hr}{\mathbf{h}}_{t-1}\right),$$
(12)
$${\tilde{\mathbf{h}}}_{t} = {\text{tan}}h({\mathbf{W}}_{xh} {\mathbf{x}}_{t} + {\mathbf{W}}_{rh} \left( {{\mathbf{r}}_{t} \odot {\mathbf{h}}_{t - 1} } \right)),$$
(13)
$${\mathbf{h}}_{t} = \left( {1 - {\mathbf{z}}_{t} } \right) \odot {\mathbf{h}}_{t - 1} + {\mathbf{z}}_{t} \odot {\tilde{\mathbf{h}}}_{t} .$$
(14)
Fig. 3

GRU cell structure

2.4 Sequence-to-sequence network

The sequence-to-sequence network was originally proposed for machine translation tasks [30]. Fig. 4. shows a generalized sequence-to-sequence network consisting of an encoder and a decoder. The encoder comprises a stack of RNN layers that output the hidden state using the input vector and the last hidden state from the previous time step. The hidden state at the final time step is converted to a fixed-length vector (\(\mathbf{C}\)), which is then fed into another stack of RNN layers called the decoder. The decoder predicts the output sequence using the final hidden state from the encoder and the last output state. The encoder–decoder units can be any RNN variant, such as LSTM or GRU.

Fig. 4

Structure of sequence-to-sequence network

3 Dynamic soft sensor based on hybrid Seq2Seq RNN–DNN

In this section, the hybrid Seq2Seq RNN–DNN is developed to improve long time-series prediction and solve the insufficient-data problem. The Seq2Seq RNN is a powerful prediction method; however, quality data obtained irregularly from laboratory analysis cannot be used in the Seq2Seq structure because the RNN requires time-series data for model training. Therefore, a hybrid model is required to predict future quality variables in two steps: prediction of the sensor variables of the dynamic system, and quality measurement using the predicted sensor variables. Fig. 5. illustrates the structure of the Seq2Seq RNN–DNN. First, the Seq2Seq RNN is trained on time-series data and uses the historical information and temporal correlation of the process variables to predict future sensor variables in dynamic states. Second, a DNN trained on a combined dataset, comprising laboratory analysis data and process simulation data, measures future quality values from the output sequence of the Seq2Seq RNN. In detail, the RNN encoder extracts the dynamic features from the history of the process states; the RNN decoder predicts the output sequence of the sensor variables while maintaining the temporal correlation of the time series; and finally, the DNN is utilized as a soft sensor that converts the sensor variables into quality variables at each time step of the output sequence. These steps are described in detail below.

Fig. 5

Framework of the Seq2Seq RNN–DNN for multi-step forecasting

3.1 Sequence-to-sequence RNN network

The network uses a sequence encoder and a sequence decoder structure. The sequential inputs allow the encoder to extract dynamic information from the historical process data, and the sequential outputs of the decoder enable prediction with temporal correlation. First, the process data are reshaped to train the Seq2Seq RNN, as shown in Fig. 6. All process variables \((\mathbf{p}\mathbf{v})\) are divided into two categories: manipulated and sensor variables \((\mathbf{s}\mathbf{v})\). Manipulated variables are adjusted by an operator; sensor variables, which support measurements related to the quality variables, are responses to the manipulation. Therefore, the dynamic state of the sensor variables is related to the input sequence of process variables. The reshaped datasets can be denoted as \(\{{\mathbf{P}\mathbf{V}}_{t},{\mathbf{S}\mathbf{V}}_{t}\}, t=1,2,\dots ,S\), where \(S\) denotes the number of training samples. The input sequence \({\mathbf{P}\mathbf{V}}_{t}=[{\mathbf{p}\mathbf{v}}_{t-L+1},{\mathbf{p}\mathbf{v}}_{t-L+2},\dots ,{\mathbf{p}\mathbf{v}}_{t}]\) is a past time series of process variables with an \(L\)-step time window, where \({\mathbf{p}\mathbf{v}}_{t}=[{pv}_{t}^{1},{pv}_{t}^{2},\dots ,{pv}_{t}^{m}]\) denotes the \(m\) process variables at time \(t\). The output sequence \({\mathbf{S}\mathbf{V}}_{t}=[{\mathbf{s}\mathbf{v}}_{t+1},{\mathbf{s}\mathbf{v}}_{t+2},\dots ,{\mathbf{s}\mathbf{v}}_{t+N}]\) is a future time series of sensor variables with an \(N\)-step time window, where \({\mathbf{s}\mathbf{v}}_{t}=[{sv}_{t}^{1},{sv}_{t}^{2},\dots ,{sv}_{t}^{n}]\) denotes the \(n\) sensor variables at time \(t\).
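The windowing described above could be implemented as follows (NumPy; `make_sequences`, the toy data matrix, and the sensor-column indices are illustrative names and values, not from the original):

```python
import numpy as np

def make_sequences(pv, sv_idx, L, N):
    """Reshape a time-ordered process-data matrix pv (T x m) into
    (PV_t, SV_t) pairs: L past steps of all process variables as input,
    N future steps of the sensor variables (columns sv_idx) as output."""
    X, Y = [], []
    T = pv.shape[0]
    for t in range(L - 1, T - N):
        X.append(pv[t - L + 1 : t + 1, :])        # PV_t: L x m
        Y.append(pv[t + 1 : t + N + 1, sv_idx])   # SV_t: N x n
    return np.stack(X), np.stack(Y)

pv = np.arange(100.0).reshape(20, 5)   # toy data: 20 time steps, m = 5 variables
X, Y = make_sequences(pv, sv_idx=[1, 2, 4], L=6, N=3)
print(X.shape, Y.shape)  # (12, 6, 5) (12, 3, 3)
```

Each of the \(T - N - L + 1\) windows pairs an \(L\)-step history of all process variables with the next \(N\) steps of the selected sensor variables.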

Fig. 6

Serialized dataset for sequence-to-sequence structure model

Second, an RNN module is used as the encoder; the plain RNN can be replaced with its variants, such as LSTM and GRU. In the encoder, the output hidden states from the previous time step are utilized as the initial states at the next time step to extract and transfer the dynamic information. The dynamic hidden states of the input sequence \({\mathbf{P}\mathbf{V}}_{t}\) are propagated forward through \(L\) time steps. The complex dynamic and nonlinear features can be extracted using (15), where \({\mathbf{C}}_{t}\) denotes the feature output of the RNN encoder after \(L\) time steps:

$${\mathbf{C}}_{t}={\mathrm{RNN}}_{\mathrm{encoder}}\left({\mathbf{p}\mathbf{v}}_{t-L+1},{\mathbf{p}\mathbf{v}}_{t-L+2},...,{\mathbf{p}\mathbf{v}}_{t}\right).$$
(15)

After the encoder completes the dynamic feature extraction, the decoder is used to predict the dynamic state of the sensor variables while exploiting the sequential dependence. The input of the RNN decoder at time step \(t+k\) consists of two parts: the extracted dynamic feature \({\mathbf{C}}_{t}\) and the output hidden states at time step \(t+k-1\). The dependence among different time steps is forward-propagated through the output sequence of the RNN decoder. Consequently, the output sequence is decoded from the features and previous outputs as follows:

$${\mathbf{s}\mathbf{v}}_{t+k}={\mathrm{RNN}}_{\mathrm{decoder}}\left({\mathbf{C}}_{t},{\mathbf{s}\mathbf{v}}_{t+1},{\mathbf{s}\mathbf{v}}_{t+2},\dots ,{\mathbf{s}\mathbf{v}}_{t+k-1}\right).$$
(16)
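Eqs. (15)–(16) can be sketched with plain tanh-RNN cells standing in for the LSTM/GRU units (NumPy; untrained, randomly initialized weights, and all names and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 5, 3, 8   # process vars, sensor vars, hidden size
L, N = 6, 4         # history and prediction windows

# Hypothetical, untrained weights for a plain tanh-RNN encoder/decoder.
We_x = rng.normal(scale=0.1, size=(d, m)); We_h = rng.normal(scale=0.1, size=(d, d))
Wd_x = rng.normal(scale=0.1, size=(d, n)); Wd_h = rng.normal(scale=0.1, size=(d, d))
W_out = rng.normal(scale=0.1, size=(n, d))

def encode(PV):
    """Eq. (15): run the encoder over the input sequence; the final
    hidden state serves as the context C_t."""
    h = np.zeros(d)
    for pv_t in PV:
        h = np.tanh(We_x @ pv_t + We_h @ h)
    return h

def decode(C, N):
    """Eq. (16): unroll the decoder; each step is conditioned on the
    context and on the previously predicted sensor vector."""
    h, sv = C, np.zeros(n)
    outputs = []
    for _ in range(N):
        h = np.tanh(Wd_x @ sv + Wd_h @ h)
        sv = W_out @ h
        outputs.append(sv)
    return np.stack(outputs)

PV = rng.normal(size=(L, m))
SV_hat = decode(encode(PV), N)
print(SV_hat.shape)  # (4, 3)
```

Feeding each predicted \(\mathbf{sv}\) back into the decoder is what preserves the temporal correlation that the direct method discards.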

3.2 DNN soft sensor with combined dataset

A sequence of quality variables is predicted using the sequence of sensor variables output by the RNN decoder. The relationship between sensor and quality variables is commonly highly nonlinear; thus, a soft sensor model is developed based on the DNN algorithm. Quality variables at time step \(t+k\) are calculated by the DNN model as follows:

$${\mathbf{q}\mathbf{v}}_{t+k}=\mathrm{DNN}\left({\mathbf{s}\mathbf{v}}_{t+k}\right) \quad \left(k=1,2,\dots ,N\right).$$
(17)

The performance of a data-driven method depends significantly on the quantity and quality of the data. The quality variables are infrequently measured by offline sample analysis, conducted 4–6 times a day. Such a limited number of samples cannot represent the complicated relationship between sensor and quality variables. Therefore, simulation data are utilized as complementary data to resolve the insufficient-data problem. A simulation model of the target process generates an additional dataset covering various operating conditions. The combined dataset from laboratory analysis and the simulation model enables the DNN model to properly learn the nonlinear physical properties.
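A minimal sketch of training on the combined dataset (NumPy; synthetic arrays with the sample counts quoted in Sect. 4.3, and a one-hidden-layer network trained by batch gradient descent standing in for the full DNN):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: sparse lab measurements plus denser simulation data
# covering off-spec operating regions (shapes only; values are synthetic).
sv_lab, qv_lab = rng.normal(size=(80, 3)), rng.normal(size=(80, 1))
sv_sim, qv_sim = rng.normal(size=(140, 3)), rng.normal(size=(140, 1))

# Combined training set for the quality-variable soft sensor, Eq. (17).
SV = np.vstack([sv_lab, sv_sim])
QV = np.vstack([qv_lab, qv_sim])

# One-hidden-layer stand-in for the DNN, trained by gradient descent on MSE.
W1 = rng.normal(scale=0.1, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)
lr = 0.01
for _ in range(200):
    H = np.tanh(SV @ W1 + b1)
    err = H @ W2 + b2 - QV
    gW2 = H.T @ err / len(SV); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H**2)
    gW1 = SV.T @ dH / len(SV); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

qv_hat = np.tanh(SV @ W1 + b1) @ W2 + b2
print(qv_hat.shape)  # (220, 1)
```

The point of the `vstack` is that the simulation rows expose the network to operating regions the lab samples never cover.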

4 Case study

The performance of the proposed RNN–DNN network-based soft sensor model was validated using a 2,3-butanediol (2,3-BDO) distillation column. To investigate the effectiveness of predictions considering temporal correlation, we compared the performance of the proposed sequence-to-sequence network for sensor variable prediction against direct prediction by independent LSTM networks (direct LSTMs). Then, the benefit of the simulation data was validated by comparing the combined-data-driven model with a laboratory-data-driven model. The root-mean-square error (\(\mathrm{RMSE}\)) and coefficient of determination (\({R}^{2}\)) were used as prediction performance indicators for the soft sensor model.

$$\mathrm{RMSE}= \sqrt{\frac{1}{N}{\sum }_{k=1}^{N}{\left({y}_{k}-{\widehat{y}}_{k}\right)}^{2}},$$
(18)
$${R}^{2}=1-\frac{\sum_{k}{\left({y}_{k}-{\widehat{y}}_{k}\right)}^{2}}{\sum_{k}{\left({y}_{k}-{\overline{y} }_{k}\right)}^{2}},$$
(19)

where \({y}_{k}\) and \({\widehat{y}}_{k}\) denote the actual and predicted values at time \(k\), respectively, and \(\overline{y}\) is the average of the actual values. \(\mathrm{RMSE}\) is a measure of absolute error; therefore, a lower value is preferred. In contrast, \({R}^{2}\) is a statistical measure of fit with a maximum of 1; therefore, a larger \({R}^{2}\) value is preferred.
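Eqs. (18)–(19) translate directly to code (the sample values are illustrative):

```python
import numpy as np

def rmse(y, y_hat):
    """Eq. (18)."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def r2(y, y_hat):
    """Eq. (19)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(round(rmse(y, y_hat), 4), round(r2(y, y_hat), 4))  # 0.1581 0.98
```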

4.1 Process description and model parameters

The target process was operated as a demonstration plant to produce bio-based 2,3-BDO via natural fermentation. Fig. 7. shows the process flow diagram of the target process, which produces a 99 wt% 2,3-BDO product at the bottom of the column. It removes water and acetoin from the top, while 2,3-BDO and a small amount of residual acetoin leave from the bottom. Because the bottom acetoin concentration has a significant influence on the column, it needs to be closely monitored and controlled. To improve control quality, real-time estimation of the acetoin concentration is required; however, the impurity content is difficult to measure directly. Thus, the proposed soft sensor was applied to predict the acetoin content as the quality variable of the bottom product.

Fig. 7
figure 7

Process diagram of 2,3-BDO distillation unit

Over 20 process variables, such as temperatures, column pressure, bottom liquid level, and flow rates of the feed, top, and bottom streams, were collected through the distributed control system every minute. Eight process variables were selected as input variables for the Seq2Seq RNN network, and three of them were selected as sensor variables to predict the quality variable, that is, the acetoin content in the bottom stream. Table 1 lists the eight process variables and the quality variable.

Table 1 Process variable description

The hyperparameters of each network were chosen using a grid search. The hyperparameters of the temporal correlative GRU include the numbers of encoder and decoder hidden layers and hidden neurons, whose candidate sets are {1, 2, 3, 4, 5} and {10, 20, 30, 40}, respectively. The candidate numbers of hidden layers in the multivariate DNN are the same as for the temporal correlative GRU, and those of hidden neurons are {10, 20, 30, 40, 50}. Hyperparameters were selected based on the average value over three iterations of each case; the other hyperparameters are listed in Table 2.

Table 2 Hyperparameter setting
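The grid search over the layer/neuron candidates, averaged over three repetitions, can be sketched as follows (the scoring function is a deterministic placeholder; a real run would train and validate the network at each grid point):

```python
from itertools import product

# Candidate grids from the text (hidden layers x hidden neurons).
layer_grid = [1, 2, 3, 4, 5]
neuron_grid = [10, 20, 30, 40]

def fit_and_score(n_layers, n_neurons, seed):
    """Placeholder for training the network and returning a validation
    RMSE; here a synthetic surface with its minimum at (2, 30)."""
    return abs(n_layers - 2) * 0.01 + abs(n_neurons - 30) * 0.001 + 0.001 * seed

best = None
for n_layers, n_neurons in product(layer_grid, neuron_grid):
    # Average over three repeated runs, as in the text.
    score = sum(fit_and_score(n_layers, n_neurons, s) for s in range(3)) / 3
    if best is None or score < best[0]:
        best = (score, n_layers, n_neurons)

print(best[1:])  # (2, 30)
```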

4.2 Result of multi-step prediction for sensor variables

Fig. 8. shows the actual values from the past thirty steps to the future twenty steps, together with the future values of the first sensor variable predicted by the direct GRU and Seq2Seq GRU. The direct method produced trajectories similar to the actual values at near-term steps, but the difference between the actual and predicted values grew as the time step increased. In contrast, the Seq2Seq network followed the actual values well even at later time steps because the model preserved the sequential information in the time-series prediction.

Fig. 8

Multi-step prediction result of first sensor variable on a sample dataset by Seq2Seq GRU

To compare the prediction accuracy of the direct methods and Seq2Seq networks, the \(\mathrm{RMSE}\) values of the predicted sensor variables at each time step were calculated; the results for the four types of models are shown in Fig. 9. The errors of all models increased with the target time step. The direct methods showed better prediction performance than the Seq2Seq networks at time step \(t+1\), with an average \(\mathrm{RMSE}\) value of 0.01. Meanwhile, after time step \(t+9\), the \(\mathrm{RMSE}\) values of the Seq2Seq GRU were smaller than those of the direct GRU, and after time step \(t+12\), the Seq2Seq LSTM showed smaller \(\mathrm{RMSE}\) values than the direct LSTM. At the last time step (\(t+20\)), the \(\mathrm{RMSE}\) value of the Seq2Seq GRU was 0.1316, the lowest among the four models. In contrast to the direct prediction models, which minimize single-step errors, the Seq2Seq networks minimize multi-step errors and consequently showed higher errors at near-term steps but lower average errors over all time steps. In addition, the results indicate that the reliability of direct prediction decreases for long time series.

Fig. 9

Average RMSE values of a direct LSTM, b direct GRU, c Seq2Seq LSTM, and d Seq2Seq GRU

The overall results in Table 3 show that the Seq2Seq GRU model outperformed the other models in both accuracy and training time. The models using the LSTM network showed higher errors and required longer training than those using the GRU. The direct methods required multiple models for multi-step prediction, each trained independently on a different training dataset, which resulted in training times of over 30 min. In contrast, the Seq2Seq networks required only one prediction model trained on sequential data, with training times of 4.80 and 3.77 min for the LSTM and GRU, respectively.

Table 3 Results of overall RMSE values and training time

4.3 Prediction of impurity content

The 80 data samples obtained from offline laboratory analysis and the 140 data samples from the mathematical simulation were used to develop the DNN soft sensor. The laboratory data were measured while the bottom flow met the specification of over 99 wt% 2,3-BDO, whereas the simulation data covered a wide operating range below 99 wt% 2,3-BDO. The dataset for the DNN soft sensor model was split into 80% training data and 20% test data.

Fig. 10. shows the comparison between the actual and predicted values of the developed soft sensor on the test data. Given only the experimental dataset obtained under on-spec conditions, the DNN gained little knowledge of off-spec conditions in the 2,3-BDO distillation process. Consequently, that model performed poorly, as it could not distinguish between on-spec and off-spec conditions. In contrast, the soft sensor trained on the combined dataset showed outstanding prediction performance in both the on-spec \((<0.3\ \mathrm{g/L})\) and off-spec \((\ge 0.3\ \mathrm{g/L})\) ranges.

Fig. 10

Comparison between the actual and predicted values of two predictive models, left model a is trained only on experimental dataset, and right model, b is trained on the combined dataset

The sensing performance of the developed soft sensor was validated using the RMSE values of the test samples under on-spec and off-spec conditions, as shown in Table 4. The results show that the prediction error decreases when the model is trained on the combined dataset. For off-spec data generated from the simulation model, the RMSE values of the combined-data model are much lower than those of the model trained only on experimental data. Moreover, the simulation data also increase the prediction accuracy under on-spec conditions: the RMSE value decreases from 0.0437 to 0.0394 when the model is trained on the combined dataset.

Table 4 Comparison of RMSE values for training and test dataset

4.4 Result of early off-spec detection

To investigate the feasibility of the early off-spec detection system, the developed models were tested using 650 samples of time-series data. These data were divided into two parts: in the first part of the process history, the process was controlled before the off-spec condition occurred, while in the second part it was controlled after the off-spec condition occurred. The criterion for an off-spec product in the target process was 0.3 g/L acetoin; that is, products containing more than 0.3 g/L acetoin were classified as off-spec. In Fig. 11, the future qualities are predicted by the direct RNN–DNN models and Seq2Seq RNN–DNN models for target time steps \(t+10\) and \(t+20\), respectively. For the target time step \(t+10\), both the direct RNN–DNN and Seq2Seq RNN–DNN models properly detected the on- and off-spec conditions. In contrast, for target time step \(t+20\), the Seq2Seq RNN–DNN models detected the off-spec occurrence, whereas the direct RNN–DNN models did not: they misclassified the on-spec condition as off-spec in the first part and failed to detect the off-spec condition in the second part. Consequently, the Seq2Seq RNN–DNN models predicted the future quality variable more accurately and are, therefore, more reliable for early detection of off-spec products than the direct RNN–DNN models.
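The off-spec flagging itself reduces to thresholding the predicted quality sequence against the 0.3 g/L criterion (the predicted values below are hypothetical):

```python
import numpy as np

OFF_SPEC_LIMIT = 0.3   # g/L acetoin criterion from the text

def detect_off_spec(qv_pred, limit=OFF_SPEC_LIMIT):
    """Flag future time steps whose predicted quality variable
    meets or exceeds the off-spec criterion."""
    return np.asarray(qv_pred) >= limit

# Hypothetical multi-step prediction from the Seq2Seq RNN-DNN.
qv_hat = np.array([0.1, 0.15, 0.2, 0.28, 0.31, 0.4])
flags = detect_off_spec(qv_hat)
print(int(np.argmax(flags)))  # 4  (index of the first off-spec step)
```

With an \(N\)-step prediction horizon and one-minute sampling, the first flagged index gives the operator up to \(N\) minutes of lead time before the off-spec condition materializes.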

Fig. 11

Future quality predictions of two models for target time step a \({\varvec{t}}+10\) and b \({\varvec{t}}+20\)

5 Conclusion

In this research, we have proposed a hybrid Seq2Seq RNN–DNN model designed to predict the multi-step dynamics of quality variables for early off-spec detection. The proposed model consists of two sections: (1) a sequence-to-sequence recurrent-unit section that captures the process dynamics and (2) a fully connected deep neural network used as a soft sensor and trained on a combined dataset. The results from applying the proposed soft sensor to the 2,3-BDO distillation column can be summarized as follows.

  1.

    The results show that our approach allows accurate predictions for long time series and computationally cheap modeling with a short training time. The Seq2Seq RNN model, especially the Seq2Seq GRU with an average \(\mathrm{RMSE}\) value of 0.0813, outperformed the direct prediction model for 20-step predictions, and the training time for the Seq2Seq GRU was 3.77 min, significantly shorter than the 36.69 min of the direct method. Consequently, the Seq2Seq RNN–DNN models were able to properly detect an off-spec product 20 min in advance.

  2.

    In addition, we verified the effectiveness of combining offline analysis and simulation data to deal with a limited amount of data for soft sensor modeling.

This model can be used to support timely operation by providing an indication of off-spec products in advance. Process variables can then be adjusted accordingly, shortening the return time to the on-spec state. Moreover, the short training time makes the model more practical for adoption in real processes. In future work, the proposed model can be applied to a control system for regulating production, and the two time-series prediction methods can be integrated in a hybrid model framework for improved prediction.