1 Introduction

Accurate and reliable online measurement of quality variables is important for effective process monitoring and control in intelligent factories [1]. Most quality variables are measured by offline laboratory analysis or online analyzers [2]. Although laboratory analysis provides precise measurements, it requires a long sampling cycle, which results in a large measurement delay [3, 4]. Online analyzers can measure quality variables in real time, but they are expensive and lack reliability [5]. To resolve these problems, soft sensors employing virtual sensing techniques can be used to estimate quality variables from other process variables available online, such as flow rate, pressure, and temperature [6]. Over the past decade, soft sensors have received considerable attention owing to their rapid response and low maintenance costs [7].

Soft sensors can be broadly categorized into two types: white-box models (first-principles models) and black-box models (data-driven models). White-box models are based on mass and energy balances and various chemical and physical equations. However, they require strong domain knowledge of the process and demand an enormous computational cost to build. Furthermore, they focus on ideal steady states, so they cannot describe actual operating conditions. Black-box models can represent actual process conditions with little domain knowledge. As data-driven modeling has developed, various machine learning algorithms have enabled efficient soft sensor development, such as PLS [8], SVR [9], fuzzy systems [10], deep kernel learning [12], and Gaussian process regression [13]. In particular, the artificial neural network (ANN) is widely used in soft sensor development because it can capture complicated relationships between input and output variables [7, 11]. Deep learning-based soft sensing models, which have a powerful ability to learn the essential features of data, have also been developed and have shown better prediction performance [14, 15]. To consider the dynamic states of a process, hybrid methods that integrate machine learning algorithms with regression models such as ARMA [16] and NARX [17] have been proposed. Recurrent neural networks (RNNs), which process time-series data from past steps, have been used to develop dynamic soft sensor models [18]. RNNs can extract the sequential information available in the input data and exhibit better performance in dynamic modeling [19]. However, standard RNNs have difficulty modeling long sequences because of the vanishing gradient problem [20]. Long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks address this problem using memory cells that store long-sequence information [21].
The LSTM network is employed to extract the hidden dynamics from the input sequence and quality variables, which shows a better quality prediction accuracy compared with that of RNNs [22]. Furthermore, sequence-to-sequence LSTM networks using encoder–decoder architectures can learn sequential information of both output and input variables simultaneously [23]. Attention-based sequence-to-sequence networks enable the development of an explainable model with importance weights of input variables to predict quality variables [24].

However, existing soft sensors focus on predicting the current quality value and use a recursive structure to consider the interaction between quality variables. These soft sensors therefore distinguish between on-spec and off-spec conditions based only on the current quality value, and with such limited information they cannot prevent off-spec occurrences because process control can only follow off-spec detection. For appropriate decision making, it is important to predict the future dynamic behavior of quality variables as well as the current value, because practical control systems contain dead time, the time required to transport materials in industrial processes [25]. Dead time causes a lag between off-spec detection and the return to the on-spec condition, resulting in significant product loss during the associated recovery time. If quality deterioration could be identified in advance, a preventive policy could be established and losses minimized. Quality variables are determined by the process states and the corresponding dynamics in the process unit, which means that future quality changes can be predicted through process dynamics analysis. Although existing networks significantly improve the prediction performance and robustness of soft sensors, they remain single-time prediction models with quality-relevant sequences and thus face several problems in multi-step prediction. For a single-time model, a direct prediction method that constructs multiple independent models from the same historical values is used to predict multiple steps directly [26]. Direct prediction outputs multi-step results simultaneously without error accumulation, but the predicted outputs lose their sequential information [27]. To accurately predict future process state changes, the temporal continuity of the output sequence should be maintained to reflect the dynamic state of the process. Therefore, a robust dynamic model with a recursive structure that accurately predicts the dynamics of future quality variables is needed to enable reliable future predictions.

In this study, we developed an early off-spec detection system with a dynamic soft sensor for multi-step prediction of quality variables with temporal correlation. Accordingly, a hybrid sequence-to-sequence recurrent neural network–deep neural network (Seq2Seq RNN–DNN) model is proposed to address two problems in soft sensor modeling. First, an RNN encoder–decoder architecture is employed to handle the sequence-to-sequence dataset with process dynamics. The encoder extracts dynamic hidden states from the input sequence, and the decoder uses this dynamic information to predict the output sequence while maintaining the temporal correlation of the predicted values. Unlike the direct prediction method, this approach improves the prediction performance for long time series. Second, a combined dataset obtained from offline measurements and process simulation is used to solve the insufficient-data problem. Because of the long cycle of offline laboratory analysis, data for soft sensor modeling are limited, and a small dataset can reduce the predictive performance of a data-driven model. Therefore, laboratory measurement data are combined with simulation data to train the DNN.

The remainder of this paper is structured as follows: Sect. 2 provides a brief introduction to the background of the direct prediction method and sequence-to-sequence network. Sect. 3 describes the proposed hybrid Seq2Seq RNN–DNN used for constructing a dynamic soft sensor. A case study is carried out for an industrial process to evaluate the performance of the proposed model in Sect. 4. Finally, conclusions are presented in Sect. 5.

2 Background

2.1 Direct method for multi-step prediction

To predict multiple steps with a single-time prediction model, the number of models must equal the number of prediction steps. This approach is referred to as the direct prediction method. Its key advantage is structural simplicity; however, it requires considerable computation time to build a separate model for each time step. Furthermore, the direct method loses the temporal information of the predicted values, which decreases the prediction accuracy for long time series because it predicts the output at each time step using an independent model, as shown in Fig. 1. Given the initial dataset, \(N\) training sets are created first, each having the same input sequence \(\mathbf{X}=[{\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{L}]\) but a different output \({\mathbf{y}}_{k}\). For example, the output variables for the first prediction model are \({\mathbf{y}}_{1}\), those for the second prediction model are \({\mathbf{y}}_{2}\), and so on. By training on each dataset independently, \(N\) regression models \({f}_{k}\) \((k=1,2,\dots ,N)\) are obtained, and the models are used to predict the \(N\) future values as follows:

$${\mathbf{y}}_{k}={f}_{k}\left({\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{L}\right) \quad \left(k=1,2,\dots ,N\right),$$
(1)

where \(L\) and \(N\) represent the history window size and prediction window size, respectively.
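As a minimal sketch of Eq. (1) (NumPy only, with synthetic data; ordinary least squares stands in for an arbitrary regressor \(f_k\), and all sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

L, N, m = 5, 3, 2   # history window, prediction window, variables per step
S = 200             # number of training samples

# Synthetic dataset: each sample is a flattened input sequence
# X = [x_1, ..., x_L] and N future scalar outputs y_1, ..., y_N.
X = rng.normal(size=(S, L * m))
true_W = rng.normal(size=(L * m, N))
Y = X @ true_W + 0.01 * rng.normal(size=(S, N))

# Direct method: fit one independent model f_k per future step k.
models = []
for k in range(N):
    W_k, *_ = np.linalg.lstsq(X, Y[:, k], rcond=None)
    models.append(W_k)

# Each model predicts its own step independently: no error accumulates,
# but the N outputs share no sequential information.
x_new = rng.normal(size=(1, L * m))
y_hat = np.array([x_new @ W_k for W_k in models]).ravel()
print(y_hat.shape)  # (3,)
```

Note that the \(N\) fits share the same input matrix but are otherwise unrelated, which is exactly why the predicted sequence loses its temporal continuity.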

Fig. 1

Schematic diagram of direct prediction method

2.2 Deep neural network

A deep neural network (DNN) is an ANN with multiple fully connected layers consisting of input, hidden, and output layers. DNNs have previously been applied to predict and simulate many physical problems with high performance. In the DNN structure, each layer receives the output from the previous layer and transfers its own output to the next layer. The hidden layers with feedforward connections are trained using backpropagation with stochastic gradient descent. The accuracy of the model depends on the chosen architecture, its hyperparameters, the nature of the data, and the learning process. The outputs \((\mathbf{h})\) of the first hidden layer, the \(n\)th hidden layer, and the output layer of the DNN are expressed as:

$${\mathbf{h}}_{1}=\sigma \left({\mathbf{W}}_{1}^{T}\mathbf{x}+{\mathbf{b}}_{1}\right),$$
(2)
$${\mathbf{h}}_{n}=\sigma \left({\mathbf{W}}_{n}^{T}{\mathbf{h}}_{n-1}+{\mathbf{b}}_{n}\right),$$
(3)
$$\widehat{\mathbf{y}}={\mathbf{W}}_{o}^{T}{\mathbf{h}}_{N}+{\mathbf{b}}_{o},$$
(4)

where \(\mathbf{W}\) and \(\mathbf{b}\) represent the weight matrix and bias vector of each layer, respectively, and \(\sigma\) is the activation function. For the first hidden layer, the input variable vector \((\mathbf{x})\) is used instead of \({\mathbf{h}}_{n-1}\). For the output layer, which has no activation function, the output of the last hidden layer \(({\mathbf{h}}_{N})\) is used to compute the predicted values \((\widehat{\mathbf{y}})\).
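Eqs. (2)–(4) can be sketched as a forward pass (NumPy; the choice of ReLU for \(\sigma\) and the layer sizes are illustrative assumptions, not from the original):

```python
import numpy as np

def relu(x):
    # ReLU stands in for the unspecified activation sigma (an assumption).
    return np.maximum(0.0, x)

def dnn_forward(x, weights, biases):
    """Forward pass of Eqs. (2)-(4): activated hidden layers followed by
    a linear output layer with no activation."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):   # Eqs. (2)-(3)
        h = relu(W.T @ h + b)
    W_o, b_o = weights[-1], biases[-1]
    return W_o.T @ h + b_o                        # Eq. (4)

rng = np.random.default_rng(0)
sizes = [8, 30, 30, 1]   # illustrative: input, two hidden layers, output
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

y_hat = dnn_forward(rng.normal(size=8), weights, biases)
print(y_hat.shape)  # (1,)
```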

2.3 Recurrent neural network

An RNN stores past data and forwards the information to calculate the output of the next step [28]. Unlike a feedforward DNN, an RNN can model temporal dynamics using a form of memory. LSTM and GRU are variants of the standard RNN developed to handle the vanishing and exploding gradient problems that arise with the long-term dependencies observed in RNNs [29].

As shown in Fig. 2, a typical LSTM cell is configured primarily by three gates: the input gate (\({i}_{t}\)), forget gate (\({f}_{t}\)), and output gate (\({o}_{t}\)). The input gate takes newly incoming data and stores the new information in the cell state. The forget gate decides what to forget from the cell state. The output gate receives the calculated cell state and outputs the result of the LSTM cell. Equations (5)–(10) represent the input gate, forget gate, output gate, candidate cell state (\({\widetilde{C}}_{t})\), cell state (\({C}_{t}\)), and final output (\({h}_{t}\)), respectively:

$${i}_{t}=\sigma \left({\mathbf{W}}_{xi}{\mathbf{x}}_{t}+{\mathbf{W}}_{hi}{\mathbf{h}}_{t-1}+{b}_{i}\right),$$
(5)
$${f}_{t}=\sigma \left({\mathbf{W}}_{xf}{\mathbf{x}}_{t}+{\mathbf{W}}_{hf}{\mathbf{h}}_{t-1}+{b}_{f}\right),$$
(6)
$${o}_{t}=\sigma \left({\mathbf{W}}_{xo}{\mathbf{x}}_{t}+{\mathbf{W}}_{ho}{\mathbf{h}}_{t-1}+{b}_{o}\right),$$
(7)
$${\widetilde{C}}_{t}=\mathrm{tan}h\left({\mathbf{W}}_{xc}{\mathbf{x}}_{t}+{\mathbf{W}}_{hc}{\mathbf{h}}_{t-1}+{b}_{c}\right),$$
(8)
$$C_{t} = f_{t} \odot C_{t - 1} + i_{t} \odot \tilde{C}_{t} ,$$
(9)
$$h_{t} = o_{t} \odot {\text{tan}}h\left( {C_{t} } \right),$$
(10)

where \(\mathbf{W}, b\), and \(\sigma\) represent the weight matrix, bias vector, and sigmoid function, respectively, and \(\odot\) is the pointwise multiplication of two vectors.
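A single LSTM step following Eqs. (5)–(10) might be sketched as follows (NumPy; the weight scale, sizes, and input sequence are arbitrary illustrative values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, C_prev, p):
    """One LSTM step following Eqs. (5)-(10)."""
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])        # Eq. (5)
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])        # Eq. (6)
    o = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])        # Eq. (7)
    C_tilde = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # Eq. (8)
    C = f * C_prev + i * C_tilde                                     # Eq. (9)
    h = o * np.tanh(C)                                               # Eq. (10)
    return h, C

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for g in "ifoc":
    p[f"Wx{g}"] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p[f"Wh{g}"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p[f"b{g}"] = np.zeros(n_hid)

h, C = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):   # run over a short input sequence
    h, C = lstm_cell(rng.normal(size=n_in), h, C, p)
print(h.shape)  # (4,)
```

Because \(o_t \in (0,1)\) and \(\tanh(C_t) \in (-1,1)\), the output \(h_t\) is always bounded, while the cell state \(C_t\) can carry information over long sequences.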

Fig. 2

LSTM cell structure

Meanwhile, as shown in Fig. 3, a GRU cell has two gates: an update gate \(({z}_{t})\) and a reset gate \(({r}_{t})\). The reset gate determines which previous information is to be kept and combined with the new input data. The update gate decides how much prior memory is retained for the future time steps. Both gates determine how much of the past and present information to use and generate new hidden state information. Equations (11)–(14) represent the update gate, reset gate, candidate hidden state (\({\widetilde{\mathbf{h}}}_{t})\), and hidden state (\({\mathbf{h}}_{t}\)), respectively:

$${\mathbf{z}}_{t}=\sigma \left({\mathbf{W}}_{xz}{\mathbf{x}}_{t}+{\mathbf{W}}_{hz}{\mathbf{h}}_{t-1}\right),$$
(11)
$${\mathbf{r}}_{t}=\sigma \left({\mathbf{W}}_{xr}{\mathbf{x}}_{t}+{\mathbf{W}}_{hr}{\mathbf{h}}_{t-1}\right),$$
(12)
$${\tilde{\mathbf{h}}}_{t} = {\text{tan}}h({\mathbf{W}}_{xh} {\mathbf{x}}_{t} + {\mathbf{W}}_{rh} \left( {{\mathbf{r}}_{t} \odot {\mathbf{h}}_{t - 1} } \right)),$$
(13)
$${\mathbf{h}}_{t} = \left( {1 - {\mathbf{z}}_{t} } \right) \odot {\mathbf{h}}_{t - 1} + {\mathbf{z}}_{t} \odot {\tilde{\mathbf{h}}}_{t} .$$
(14)
Fig. 3

GRU cell structure

2.4 Sequence-to-sequence network

The sequence-to-sequence network was originally proposed for machine translation tasks [30]. Fig. 4. shows a generalized sequence-to-sequence network consisting of an encoder and a decoder. The encoder comprises a stack of RNN layers that output the hidden state using the input vector and the last hidden state from the previous time step. The hidden state at the final time step is converted to a fixed-length vector (\(\mathbf{C}\)), which is then fed into another stack of RNN layers called the decoder. The decoder predicts the output sequence using the final hidden state from the encoder and the last output state. The encoder–decoder units can be any RNN variant, such as LSTM or GRU.

Fig. 4

Structure of sequence-to-sequence network

3 Dynamic soft sensor based on hybrid Seq2Seq RNN–DNN

In this section, the hybrid Seq2Seq RNN–DNN is developed to improve long time-series prediction and solve the insufficient-data problem. The Seq2Seq RNN is a powerful prediction method; however, quality data obtained irregularly from laboratory analysis cannot be used in the Seq2Seq structure because the RNN requires time-series data for model training. Therefore, a hybrid model is required to predict future quality variables in two steps: prediction of the sensor variables of the dynamic system, and quality measurement using the predicted sensor variables. Fig. 5. illustrates the structure of the Seq2Seq RNN–DNN. First, the Seq2Seq RNN is trained on time-series data and uses the historical information and temporal correlation of the process variables to predict future sensor variables in dynamic states. Second, a DNN trained on a combined dataset, comprising laboratory analysis data and process simulation data, measures future quality values from the output sequence of the Seq2Seq RNN. In detail, the RNN encoder extracts the dynamic features from the history of the process states; the RNN decoder predicts the output sequence of the sensor variables while maintaining the temporal correlation of the time series; and finally, the DNN is utilized as a soft sensor that converts the sensor variables into quality variables at each time step of the output sequence. These steps are described in detail below.

Fig. 5

Framework of the Seq2Seq RNN–DNN for multi-step forecasting

3.1 Sequence-to-sequence RNN network

The network uses a sequence encoder and a sequence decoder structure. The sequential inputs allow the encoder to extract dynamic information from the historical process data, and the sequential outputs of the decoder enable prediction with temporal correlation. First, the process data are reshaped to train the Seq2Seq RNN, as shown in Fig. 6. All process variables \((\mathbf{p}\mathbf{v})\) are divided into two categories: manipulated and sensor variables \((\mathbf{s}\mathbf{v})\). Manipulated variables are adjusted by an operator; sensor variables, which support measurements related to the quality variables, are responses to the manipulation. Therefore, the dynamic state of the sensor variables is related to the input sequence of process variables. The reshaped datasets can be denoted as \(\{{\mathbf{P}\mathbf{V}}_{t},{\mathbf{S}\mathbf{V}}_{t}\}, t=1,2,\dots ,S\), where \(S\) denotes the number of training samples. The input sequence \({\mathbf{P}\mathbf{V}}_{t}=[{\mathbf{p}\mathbf{v}}_{t-L+1},{\mathbf{p}\mathbf{v}}_{t-L+2},\dots ,{\mathbf{p}\mathbf{v}}_{t}]\) is a past time series of process variables with an \(L\)-step time window, where \({\mathbf{p}\mathbf{v}}_{t}=[{pv}_{t}^{1},{pv}_{t}^{2},\dots ,{pv}_{t}^{m}]\) denotes the \(m\) process variables at time \(t\). The output sequence \({\mathbf{S}\mathbf{V}}_{t}=[{\mathbf{s}\mathbf{v}}_{t+1},{\mathbf{s}\mathbf{v}}_{t+2},\dots ,{\mathbf{s}\mathbf{v}}_{t+N}]\) is a future time series of sensor variables with an \(N\)-step time window, where \({\mathbf{s}\mathbf{v}}_{t}=[{sv}_{t}^{1},{sv}_{t}^{2},\dots ,{sv}_{t}^{n}]\) denotes the \(n\) sensor variables at time \(t\).
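The windowing described above could be implemented as follows (NumPy; `make_sequences`, the toy data matrix, and the sensor-column indices are illustrative names and values, not from the original):

```python
import numpy as np

def make_sequences(pv, sv_idx, L, N):
    """Reshape a time-ordered process-data matrix pv (T x m) into
    (PV_t, SV_t) pairs: L past steps of all process variables as input,
    N future steps of the sensor variables (columns sv_idx) as output."""
    X, Y = [], []
    T = pv.shape[0]
    for t in range(L - 1, T - N):
        X.append(pv[t - L + 1 : t + 1, :])        # PV_t: L x m
        Y.append(pv[t + 1 : t + N + 1, sv_idx])   # SV_t: N x n
    return np.stack(X), np.stack(Y)

pv = np.arange(100.0).reshape(20, 5)   # toy data: 20 time steps, m = 5 variables
X, Y = make_sequences(pv, sv_idx=[1, 2, 4], L=6, N=3)
print(X.shape, Y.shape)  # (12, 6, 5) (12, 3, 3)
```

Each of the \(T - N - L + 1\) windows pairs an \(L\)-step history of all process variables with the next \(N\) steps of the selected sensor variables.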

Fig. 6

Serialized dataset for sequence-to-sequence structure model

Second, an RNN module is used as the encoder; the plain RNN can be replaced with its variants, such as LSTM and GRU. In the encoder, the output hidden states from the previous time step are utilized as the initial states at the next time step to extract and transfer the dynamic information. The dynamic hidden states of the input sequence \({\mathbf{P}\mathbf{V}}_{t}\) are propagated forward through \(L\) time steps. The complex dynamic and nonlinear features can be extracted using (15), where \({\mathbf{C}}_{t}\) denotes the feature output of the RNN encoder after \(L\) time steps:

$${\mathbf{C}}_{t}={\mathrm{RNN}}_{\mathrm{encoder}}\left({\mathbf{p}\mathbf{v}}_{t-L+1},{\mathbf{p}\mathbf{v}}_{t-L+2},...,{\mathbf{p}\mathbf{v}}_{t}\right).$$
(15)

After the encoder completes the dynamic feature extraction, the decoder is used to predict the dynamic state of the sensor variables while exploiting the sequential dependence. The input of the RNN decoder at time step \(t+k\) consists of two parts: the extracted dynamic feature \({\mathbf{C}}_{t}\) and the output hidden states at time step \(t+k-1\). The dependence among different time steps is forward-propagated through the output sequence of the RNN decoder. Consequently, the output sequence is decoded from the features and previous outputs as follows:

$${\mathbf{s}\mathbf{v}}_{t+k}={\mathrm{RNN}}_{\mathrm{decoder}}\left({\mathbf{C}}_{t},{\mathbf{s}\mathbf{v}}_{t+1},{\mathbf{s}\mathbf{v}}_{t+2},\dots ,{\mathbf{s}\mathbf{v}}_{t+k-1}\right).$$
(16)
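Eqs. (15)–(16) can be sketched with plain tanh-RNN cells standing in for the LSTM/GRU units (NumPy; untrained, randomly initialized weights, and all names and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 5, 3, 8   # process vars, sensor vars, hidden size
L, N = 6, 4         # history and prediction windows

# Hypothetical, untrained weights for a plain tanh-RNN encoder/decoder.
We_x = rng.normal(scale=0.1, size=(d, m)); We_h = rng.normal(scale=0.1, size=(d, d))
Wd_x = rng.normal(scale=0.1, size=(d, n)); Wd_h = rng.normal(scale=0.1, size=(d, d))
W_out = rng.normal(scale=0.1, size=(n, d))

def encode(PV):
    """Eq. (15): run the encoder over the input sequence; the final
    hidden state serves as the context C_t."""
    h = np.zeros(d)
    for pv_t in PV:
        h = np.tanh(We_x @ pv_t + We_h @ h)
    return h

def decode(C, N):
    """Eq. (16): unroll the decoder; each step is conditioned on the
    context and on the previously predicted sensor vector."""
    h, sv = C, np.zeros(n)
    outputs = []
    for _ in range(N):
        h = np.tanh(Wd_x @ sv + Wd_h @ h)
        sv = W_out @ h
        outputs.append(sv)
    return np.stack(outputs)

PV = rng.normal(size=(L, m))
SV_hat = decode(encode(PV), N)
print(SV_hat.shape)  # (4, 3)
```

Feeding each predicted \(\mathbf{sv}\) back into the decoder is what preserves the temporal correlation that the direct method discards.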

3.2 DNN soft sensor with combined dataset

A sequence of quality variables is predicted using the sequence of sensor variables output by the RNN decoder. The relationship between sensor and quality variables is commonly highly nonlinear; thus, a soft sensor model is developed based on the DNN algorithm. Quality variables at time step \(t+k\) are calculated by the DNN model as follows:

$${\mathbf{q}\mathbf{v}}_{t+k}=\mathrm{DNN}\left({\mathbf{s}\mathbf{v}}_{t+k}\right) \quad \left(k=1,2,\dots ,N\right).$$
(17)

The performance of a data-driven method depends significantly on the quantity and quality of the data. The quality variables are infrequently measured by offline sample analysis, conducted 4–6 times a day. Such a limited number of samples cannot represent the complicated relationship between sensor and quality variables. Therefore, simulation data are utilized as complementary data to resolve the insufficient-data problem. A simulation model of the target process generates an additional dataset covering various operating conditions. The combined dataset from laboratory analysis and the simulation model enables the DNN model to properly learn the nonlinear physical properties.
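A minimal sketch of training on the combined dataset (NumPy; synthetic arrays with the sample counts quoted in Sect. 4.3, and a one-hidden-layer network trained by batch gradient descent standing in for the full DNN):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: sparse lab measurements plus denser simulation data
# covering off-spec operating regions (shapes only; values are synthetic).
sv_lab, qv_lab = rng.normal(size=(80, 3)), rng.normal(size=(80, 1))
sv_sim, qv_sim = rng.normal(size=(140, 3)), rng.normal(size=(140, 1))

# Combined training set for the quality-variable soft sensor, Eq. (17).
SV = np.vstack([sv_lab, sv_sim])
QV = np.vstack([qv_lab, qv_sim])

# One-hidden-layer stand-in for the DNN, trained by gradient descent on MSE.
W1 = rng.normal(scale=0.1, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)
lr = 0.01
for _ in range(200):
    H = np.tanh(SV @ W1 + b1)
    err = H @ W2 + b2 - QV
    gW2 = H.T @ err / len(SV); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H**2)
    gW1 = SV.T @ dH / len(SV); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

qv_hat = np.tanh(SV @ W1 + b1) @ W2 + b2
print(qv_hat.shape)  # (220, 1)
```

The point of the `vstack` is that the simulation rows expose the network to operating regions the lab samples never cover.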

4 Case study

The performance of the proposed RNN–DNN network-based soft sensor model was validated using a 2,3-butanediol (2,3-BDO) distillation column. To investigate the effectiveness of predictions considering temporal correlation, we compared the performance of the proposed sequence-to-sequence network for sensor variable prediction against direct prediction by independent LSTM networks (direct LSTMs). Then, the benefit of the simulation data was validated by comparing the combined-data-driven model with a laboratory-data-driven model. The root-mean-square error (\(\mathrm{RMSE}\)) and coefficient of determination (\({R}^{2}\)) were used as prediction performance indicators for the soft sensor model.

$$\mathrm{RMSE}= \sqrt{\frac{1}{N}{\sum }_{k=1}^{N}{\left({y}_{k}-{\widehat{y}}_{k}\right)}^{2}},$$
(18)
$${R}^{2}=1-\frac{\sum_{k}{\left({y}_{k}-{\widehat{y}}_{k}\right)}^{2}}{\sum_{k}{\left({y}_{k}-{\overline{y} }_{k}\right)}^{2}},$$
(19)

where \({y}_{k}\) and \({\widehat{y}}_{k}\) denote the actual and predicted values at time \(k\), respectively, and \(\overline{y}\) is the average of the actual values. \(\mathrm{RMSE}\) is a measure of absolute error; therefore, a lower value is preferred. In contrast, \({R}^{2}\) is a statistical measure of fit with a maximum of 1; therefore, a larger \({R}^{2}\) value is preferred.
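Eqs. (18)–(19) translate directly to code (the sample values are illustrative):

```python
import numpy as np

def rmse(y, y_hat):
    """Eq. (18)."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def r2(y, y_hat):
    """Eq. (19)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(round(rmse(y, y_hat), 4), round(r2(y, y_hat), 4))  # 0.1581 0.98
```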

4.1 Process description and model parameters

The target process was operated as a demonstration plant to produce bio-based 2,3-BDO via natural fermentation. Fig. 7. shows the process flow diagram of the target process, which produces a 99 wt% 2,3-BDO product at the bottom of the column. It removes water and acetoin from the top, while 2,3-BDO and a small amount of residual acetoin leave from the bottom. Because the bottom acetoin concentration has a significant influence on the column, it needs to be closely monitored and controlled. To improve control quality, real-time estimation of the acetoin concentration is required; however, the impurity content is difficult to measure directly. Thus, the proposed soft sensor was applied to predict the acetoin content as the quality variable of the bottom product.

Fig. 7
figure 7

Process diagram of 2,3-BDO distillation unit

Over 20 process variables, such as temperatures, column pressure, bottom liquid level, and flow rates of the feed, top, and bottom streams, were collected through the distributed control system every minute. Eight process variables were selected as input variables for the Seq2Seq RNN network, and three of them were selected as sensor variables to predict the quality variable, that is, the acetoin content in the bottom stream. Table 1 lists the eight process variables and the quality variable.

Table 1 Process variable description

The hyperparameters of each network were chosen using a grid search. The hyperparameters of the temporal correlative GRU include the numbers of encoder and decoder hidden layers and hidden neurons, whose candidate sets are {1, 2, 3, 4, 5} and {10, 20, 30, 40}, respectively. The candidate numbers of hidden layers in the multivariate DNN are the same as for the temporal correlative GRU, and those of hidden neurons are {10, 20, 30, 40, 50}. Hyperparameters were selected based on the average value over three iterations of each case; the other hyperparameters are listed in Table 2.

Table 2 Hyperparameter setting
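The grid search over the layer/neuron candidates, averaged over three repetitions, can be sketched as follows (the scoring function is a deterministic placeholder; a real run would train and validate the network at each grid point):

```python
from itertools import product

# Candidate grids from the text (hidden layers x hidden neurons).
layer_grid = [1, 2, 3, 4, 5]
neuron_grid = [10, 20, 30, 40]

def fit_and_score(n_layers, n_neurons, seed):
    """Placeholder for training the network and returning a validation
    RMSE; here a synthetic surface with its minimum at (2, 30)."""
    return abs(n_layers - 2) * 0.01 + abs(n_neurons - 30) * 0.001 + 0.001 * seed

best = None
for n_layers, n_neurons in product(layer_grid, neuron_grid):
    # Average over three repeated runs, as in the text.
    score = sum(fit_and_score(n_layers, n_neurons, s) for s in range(3)) / 3
    if best is None or score < best[0]:
        best = (score, n_layers, n_neurons)

print(best[1:])  # (2, 30)
```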

4.2 Result of multi-step prediction for sensor variables

Fig. 8. shows the actual values from the past thirty steps to the future twenty steps, together with the future values of the first sensor variable predicted by the direct GRU and Seq2Seq GRU. The direct method produced trajectories similar to the actual values at near-term steps, but the difference between the actual and predicted values grew as the time step increased. In contrast, the Seq2Seq network followed the actual values well even at later time steps because the model preserved the sequential information in the time-series prediction.

Fig. 8

Multi-step prediction result of first sensor variable on a sample dataset by Seq2Seq GRU

To compare the prediction accuracy of the direct methods and Seq2Seq networks, the \(\mathrm{RMSE}\) values of the predicted sensor variables at each time step were calculated; the results for the four types of models are shown in Fig. 9. The errors of all models increased with the target time step. The direct methods showed better prediction performance than the Seq2Seq networks at time step \(t+1\), with an average \(\mathrm{RMSE}\) value of 0.01. Meanwhile, after time step \(t+9\), the \(\mathrm{RMSE}\) values of the Seq2Seq GRU were smaller than those of the direct GRU, and after time step \(t+12\), the Seq2Seq LSTM showed smaller \(\mathrm{RMSE}\) values than the direct LSTM. At the last time step (\(t+20\)), the \(\mathrm{RMSE}\) value of the Seq2Seq GRU was 0.1316, the lowest among the four models. In contrast to the direct prediction models, which minimize single-step errors, the Seq2Seq networks minimize multi-step errors and consequently showed higher errors at near-term steps but lower average errors over all time steps. In addition, the results indicate that the reliability of direct prediction decreases for long time series.

Fig. 9

Average RMSE values of a direct LSTM, b direct GRU, c Seq2Seq LSTM, and d Seq2Seq GRU

The overall results in Table 3 show that the Seq2Seq GRU model outperformed the other models in both accuracy and training time. The models using the LSTM network showed higher errors and required longer training than those using the GRU. The direct methods required multiple models for multi-step prediction, each trained independently on a different training dataset, which resulted in training times of over 30 min. In contrast, the Seq2Seq networks required only one prediction model trained on sequential data, with training times of 4.80 and 3.77 min for the LSTM and GRU, respectively.

Table 3 Results of overall RMSE values and training time

4.3 Prediction of impurity content

The 80 data samples obtained from offline laboratory analysis and the 140 data samples from the mathematical simulation were used to develop the DNN soft sensor. The laboratory data were measured while the bottom flow met the specification of over 99 wt% 2,3-BDO, whereas the simulation data covered a wide operating range below 99 wt% 2,3-BDO. The dataset for the DNN soft sensor model was split into 80% training data and 20% test data.

Fig. 10. shows the comparison between the actual and predicted values of the developed soft sensor on the test data. Given only the experimental dataset obtained under on-spec conditions, the DNN gained little knowledge of off-spec conditions in the 2,3-BDO distillation process. Consequently, that model performed poorly, as it could not distinguish between on-spec and off-spec conditions. In contrast, the soft sensor trained on the combined dataset showed outstanding prediction performance in both the on-spec \((<0.3\ \mathrm{g/L})\) and off-spec \((\ge 0.3\ \mathrm{g/L})\) ranges.

Fig. 10

Comparison between the actual and predicted values of two predictive models, left model a is trained only on experimental dataset, and right model, b is trained on the combined dataset

The sensing performance of the developed soft sensor was validated using the RMSE values of the test samples under on-spec and off-spec conditions, as shown in Table 4. The results show that the prediction error decreases when the model is trained on the combined dataset. For off-spec data generated from the simulation model, the RMSE values of the combined-data model are much lower than those of the model trained only on experimental data. Moreover, the simulation data also increase the prediction accuracy under on-spec conditions: the RMSE value decreases from 0.0437 to 0.0394 when the model is trained on the combined dataset.

Table 4 Comparison of RMSE values for training and test dataset

4.4 Result of early off-spec detection

To investigate the feasibility of the early off-spec detection system, the developed models were tested using 650 samples of time-series data. These data were divided into two parts: in the first part of the process history, the process was controlled before the off-spec condition occurred, while in the second part it was controlled after the off-spec condition occurred. The criterion for an off-spec product in the target process was 0.3 g/L acetoin; that is, products containing more than 0.3 g/L acetoin were classified as off-spec. In Fig. 11, the future qualities are predicted by the direct RNN–DNN models and Seq2Seq RNN–DNN models for target time steps \(t+10\) and \(t+20\), respectively. For the target time step \(t+10\), both the direct RNN–DNN and Seq2Seq RNN–DNN models properly detected the on- and off-spec conditions. In contrast, for target time step \(t+20\), the Seq2Seq RNN–DNN models detected the off-spec occurrence, whereas the direct RNN–DNN models did not: they misclassified the on-spec condition as off-spec in the first part and failed to detect the off-spec condition in the second part. Consequently, the Seq2Seq RNN–DNN models predicted the future quality variable more accurately and are, therefore, more reliable for early detection of off-spec products than the direct RNN–DNN models.
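The off-spec flagging itself reduces to thresholding the predicted quality sequence against the 0.3 g/L criterion (the predicted values below are hypothetical):

```python
import numpy as np

OFF_SPEC_LIMIT = 0.3   # g/L acetoin criterion from the text

def detect_off_spec(qv_pred, limit=OFF_SPEC_LIMIT):
    """Flag future time steps whose predicted quality variable
    meets or exceeds the off-spec criterion."""
    return np.asarray(qv_pred) >= limit

# Hypothetical multi-step prediction from the Seq2Seq RNN-DNN.
qv_hat = np.array([0.1, 0.15, 0.2, 0.28, 0.31, 0.4])
flags = detect_off_spec(qv_hat)
print(int(np.argmax(flags)))  # 4  (index of the first off-spec step)
```

With an \(N\)-step prediction horizon and one-minute sampling, the first flagged index gives the operator up to \(N\) minutes of lead time before the off-spec condition materializes.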

Fig. 11

Future quality predictions of two models for target time step a \({\varvec{t}}+10\) and b \({\varvec{t}}+20\)

5 Conclusion

In this research, we have proposed a hybrid Seq2Seq RNN–DNN model designed to predict the multi-step dynamics of quality variables for early off-spec detection. The proposed model consists of two sections: (1) a sequence-to-sequence recurrent-unit section that captures the process dynamics and (2) a fully connected deep neural network used as a soft sensor and trained on a combined dataset. The results from applying the proposed soft sensor to the 2,3-BDO distillation column can be summarized as follows.

  1.

    The results show that our approach allows accurate predictions for long time series and computationally cheap modeling with a short training time. The Seq2Seq RNN model, especially the Seq2Seq GRU with an average \(\mathrm{RMSE}\) value of 0.0813, outperformed the direct prediction model for 20-step predictions, and the training time for the Seq2Seq GRU was 3.77 min, significantly shorter than the 36.69 min of the direct method. Consequently, the Seq2Seq RNN–DNN models were able to properly detect an off-spec product 20 min in advance.

  2.

    In addition, we verified the effectiveness of combining offline analysis and simulation data to deal with a limited amount of data for soft sensor modeling.

This model can be used to support timely operation by providing an indication of off-spec products in advance. Process variables can then be adjusted accordingly, shortening the return time to the on-spec state. Moreover, the short training time makes the model more practical for adoption in real processes. In future work, the proposed model can be applied to a control system for regulating production, and the two time-series prediction methods can be integrated in a hybrid model framework for improved prediction.