Developing Soft Sensors for Polymer Melt Index in an Industrial Polymerization Process Using Deep Belief Networks

This paper presents the development of soft sensors for polymer melt index in an industrial polymerization process using deep belief networks (DBN). The melt index, an important quality variable of polypropylene, is hard to measure in industrial processes, and the lack of online measurement instruments is a problem in polymer quality control. One effective solution is to use soft sensors to estimate the quality variables from process data. In recent years, deep learning has achieved many successful applications in image classification and speech recognition. DBN, as one novel deep learning technique, has strong generalization capability for modelling complex dynamic processes due to its deep architecture, and can meet the accuracy demands of actual processes. Unlike conventional neural networks, the training of a DBN consists of an unsupervised training phase followed by a supervised training phase. To mine the valuable information in process data, the DBN can be trained on process data without existing labels in the unsupervised phase, improving the estimation performance. Selection of the DBN structure is investigated in the paper. The modelling results achieved by DBN and feedforward neural networks are compared. It is shown that the DBN models give very accurate estimations of the polymer melt index.


Introduction
Much work on soft sensors in the research area of process control has been done in the past few decades, and this technique is widely implemented in industrial chemical processes. Soft sensors are very effective for estimating important product quality variables in industrial processes that cannot be measured effectively. In the area of process control, issues with hardware instruments, such as unavailability or high cost, hinder product quality control. To overcome these issues, empirical models can be developed from process operational data obtained from real industrial processes. With such models, the difficult-to-measure quality variables can be estimated from easy-to-measure process variables [1]. This modelling technique based on historical process data has become increasingly popular in chemical processes in recent years. Such data-driven models can effectively reduce production costs in industrial processes and improve efficiency.
Much successful research on process modelling based on multivariate statistical techniques was completed in the last century. In 1901, principal component analysis (PCA) was proposed by Pearson [2]. The method was further developed by Harold Hotelling in the 1930s [3,4]. Based on PCA, principal component regression (PCR) and partial least squares (PLS) emerged as useful modelling methods to address the problem of co-linearity among the input variables [2]. Data-driven soft sensors based on PCR can be developed by using principal components as the predictor variables. As an improvement over PCR, PLS regression models both the process data and the quality data at the same time [5]. PLS was first introduced by Wold et al. [6] and further developed by Wold. There have been many applications of the PLS technique in process modelling. One limitation of PLS and PCR is that they are both linear techniques, so they are not very effective when applied to nonlinear process modelling.
With the development of machine learning, much research on developing soft sensors based on machine learning techniques has been reported in the past few years. There are many successful process modelling techniques based on machine learning, such as support vector machines (SVM) and artificial neural networks (ANN). McCulloch and Pitts [7] proposed the original neural network in the 1940s. Some twenty years later, with the vast improvement in computer capability, neural networks became a popular research topic. The back-propagation algorithm was applied to ANN by Werbos [8] in 1975. The advantage of ANN is that they can approximate arbitrary nonlinear functions, and they give very good performance in the estimation and prediction of quality data. The back-propagation algorithm can deal with problems such as exclusive-or. In the back-propagation training algorithm, the network weights between neurons are modified to distribute the errors back from the output layer [8]. However, conventional ANN suffer from local optima and a lack of generalization capability. SVM can achieve good training optima even when there is little training data [9]; however, when applying SVM to processes with a large amount of modelling data, the computational burden increases. In 2006, Hinton et al. [10] first introduced deep learning. The deep belief network (DBN) is one of the most well-known data-driven modelling techniques based on deep learning. It shows strong generalization capability in modelling highly nonlinear processes, and the model is established with a deep architecture. Deep learning has many applications in speech recognition and image classification [11]. There are two phases in the DBN training procedure: unsupervised training followed by supervised training. Before supervised training, the DBN captures information from the nonlinear process input data to achieve more accurate prediction or estimation of quality data. It has shown significant performance in many other applications [12,13].
In this study, soft sensors for the polymer melt index (MI) are established using DBN and applied to an industrial polypropylene polymerization process. By using deep learning techniques, a large number of industrial process data samples without pre-existing labels can also be used by DBN models in the unsupervised training phase. Such data are useless for training conventional feed-forward neural networks, which use only supervised training. These process data samples help the DBN model adjust its weights into a desirable region, and the information in the process data is captured during unsupervised training. It is shown in this paper that the DBN models give very accurate estimations of MI.
The rest of this paper is organized as follows. An introduction to ANN is given in Section 2. In Section 3, the DBN model and the main principles of restricted Boltzmann machines (RBMs) and back-propagation are introduced. Section 4 introduces the case study of an industrial polypropylene polymerization process. The selection of DBN model architectures is discussed and the polymer melt index estimation results are given in Section 5. Section 6 summarizes the conclusions of this paper.

Artificial neural networks
The feed-forward neural network is one of the most well-known machine learning techniques. It can be used to solve many problems of prediction, classification, and pattern recognition, and much research on ANN has been reported in the past decades. In the initial form of the simple perceptron invented by McCulloch and Pitts, the model calculates the weighted sum of the input variables and then passes it to an activation function. Fig. 1 shows a simple perceptron structure.
As can be seen from Fig. 1, x1, x2, …, xn are the input variables and w1, w2, …, wn are the corresponding weights for these input variables. McCulloch and Pitts [7] used the threshold function as the activation function. They proved that universal computations can be performed by simple perceptrons if the weights are chosen appropriately; however, many complicated systems cannot be represented by this method [14]. Many other activation functions can be used, such as the Heaviside step function, the sigmoid function and the Gaussian function. Activation functions are sometimes also called transfer functions in ANN research. The most popular activation function is the sigmoid function.
The characteristic of the sigmoid function is that it is an "S"-shaped curve as shown in Fig. 2.
The sigmoid function maps its input values into the region from 0 to 1. In Fig. 2, the output approaches 1 as x approaches +∞, whereas it approaches 0 as x approaches −∞. It has appropriate asymptotic properties. The sigmoid function is given by (1):

f(x) = 1/(1 + e^(−βx))    (1)

where x represents the sum of weighted input values and β is a slope parameter.
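To make the shape of the sigmoid concrete, here is a minimal sketch in Python; the function and parameter names (`sigmoid`, `beta`) are illustrative, not from the paper:

```python
import math

def sigmoid(x, beta=1.0):
    """Sigmoid activation: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-beta * x))

print(sigmoid(0.0))    # midpoint of the "S" curve: 0.5
print(sigmoid(10.0))   # approaches 1 for large positive x
print(sigmoid(-10.0))  # approaches 0 for large negative x
```

A larger slope parameter β makes the transition around x = 0 steeper.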
The structure of an ANN can be regarded as neurons arranged in layers. Multilayer perceptrons are the most classic type of feed-forward networks and can deal with more complicated problems than simple perceptrons. Neurons in adjacent layers are connected unidirectionally without feedback loops. A multilayer perceptron model has at least three layers, commonly an input layer, a hidden layer and an output layer. The relationship between the network input and output variables is learnt during supervised training and stored as the trained network weights. The structure of a multilayer perceptron with two hidden layers is shown in Fig. 3.
Each unit in the input layer is a network input. The output of a unit in a hidden or output layer is calculated by passing the sum of the weighted outputs of the previous layer to an activation function as follows:

Oj = f(Σi wij Ii + bj)    (2)

where Oj is the output value of unit j in a particular layer, wij is the weight between this unit and the i-th unit of the immediately previous layer, Ii is the i-th input of this unit (i.e., the output value of the i-th unit in the previous layer), bj is a bias, and f is the activation function. During network training, the weights and biases are initialized to random values, typically in the range between −0.1 and 0.1. The network weights are adjusted by training algorithms to minimize the errors between the network outputs and the target labels. After training, the relationship between the system input and output variables is represented by the trained neural network.
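The unit computation just described can be sketched as follows; the helper names are ours, and the sigmoid is used as the activation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias):
    """Output of one unit: pass the weighted input sum plus bias to f."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

# Two inputs whose weighted sum cancels to 0, so the output is sigmoid(0) = 0.5
print(unit_output([1.0, 2.0], [0.5, -0.25], 0.0))
```

Stacking such units layer by layer, with each layer's outputs feeding the next, gives the multilayer perceptron of Fig. 3.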
The training process of a multilayer feedforward neural network is supervised, and the most commonly used supervised training algorithm is back-propagation. Multilayer feedforward neural networks are capable of modelling nonlinear processes; however, the polymerization process is highly nonlinear, while the structure of commonly used multilayer neural networks is shallow. When a feedforward neural network with more than three layers is trained by back-propagation, the model often suffers from poor generalization, so this modelling technique cannot meet the accuracy demands of the estimation. To achieve more accurate estimation of MI, DBN models are established in this study. A DBN has a deep architecture and stronger generalization capability.

Structure of deep belief network
The limitation of traditional neural networks is that they usually have shallow structures, typically no more than three layers. With this limitation, a shallow neural network may not achieve satisfactory estimation performance when applied to highly nonlinear industrial processes, which are common in practice. The shallow architecture of feedforward neural networks can lead to a lack of representation capability [15,16]: to approximate various regions of a process, the model needs more hidden neurons in its hidden layers. Recent research suggests that networks with a deep structure can achieve reliable results [15]. DBN has been successfully applied to many research areas, such as classification and recognition [17]. In a DBN model, several restricted Boltzmann machines (RBMs) are stacked and combined into one learning network, giving the DBN a deep structure based on a deep learning technique. Fig. 4 presents the basic architecture of a DBN.
The DBN shown in Fig. 4 has five layers: an input layer, an output layer and three hidden layers. In Fig. 4, W denotes the network weights, and b and c are the network biases. The DBN can be considered as a stack of RBMs, with each hidden layer of the DBN regarded as a single RBM. Compared with the traditional Boltzmann machine, the neurons within a hidden layer of a DBN are not connected to each other; however, adjacent layers have symmetrical connections with each other. The units in the hidden layers are binary units and the visible input layer units are Gaussian units. The first phase of training is unsupervised: the process operational data are used to train the DBN model without any target variables involved. The unsupervised training helps the DBN mine more correlations than feed-forward neural networks can, and the weights are adjusted into an appropriate region.

Restricted Boltzmann machines
In the 1980s, Smolensky [18] developed the restricted Boltzmann machine. Hinton et al. [10] developed the DBN by stacking RBMs as its layers. A DBN contains stacked RBMs as shown in Fig. 4.
To understand the basics of the RBM, the joint probability of the visible units and hidden units needs to be introduced first. Equation (3) shows the probability function:

P(v, h) = exp(−E(v, h))/Z    (3)

where Z represents a normalizing factor (the partition function), v represents the vector of the visible layer, and h represents the vector of the hidden layer. The probability P(v, h) increases when the energy function decreases. In the binary RBM, the energy function is given by

E(v, h) = −Σi bi vi − Σj cj hj − Σi,j vi wij hj    (4)

where W, b and c are the parameters of the function. It should be noted that the vectors v and h are both binary-valued. Binary RBMs are used as the hidden layers in a DBN model; however, they cannot deal with continuous variables. To overcome this issue, (4) can be extended to the energy function of a Gaussian RBM:

E(v, h) = Σi (vi − ai)²/(2σi²) − Σj cj hj − Σi,j (vi/σi) wij hj    (5)

where ai is the mean and σi the standard deviation of the Gaussian distribution for input neuron i. The samples of input data are commonly normalized to zero mean and unit variance in practical applications. Therefore, with σi = 1, (5) simplifies to

E(v, h) = Σi (vi − ai)²/2 − Σj cj hj − Σi,j vi wij hj    (6)

Hinton [19] also described other forms of RBM, but the DBN in this paper only uses the Gaussian RBM and the binary RBM.
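A small sketch of the binary-RBM energy of (4) and the corresponding unnormalized probability exp(−E); the helper names are ours, and Z is deliberately omitted since it is intractable for realistic sizes:

```python
import math

def rbm_energy(v, h, W, b, c):
    """Binary RBM energy: E(v,h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j."""
    vis = sum(bi * vi for bi, vi in zip(b, v))
    hid = sum(cj * hj for cj, hj in zip(c, h))
    pair = sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -(vis + hid + pair)

def unnormalized_p(v, h, W, b, c):
    """exp(-E(v,h)); dividing by the partition function Z would give P(v,h)."""
    return math.exp(-rbm_energy(v, h, W, b, c))

# Lower energy corresponds to higher (unnormalized) probability
W = [[1.0], [0.5]]
print(rbm_energy([1, 1], [1], W, [0.0, 0.0], [0.0]))  # -(0 + 0 + 1.5) = -1.5
```

The configuration [1, 1], [1] activates both positive couplings, so it has lower energy (and hence higher probability) than the all-zero configuration.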

Learning algorithm for RBM
The objective of training an RBM is to maximize the probability P(v), which can be achieved by minimizing the energy function. In Gibbs sampling, h can only be sampled given v of the visible layer. Based on previous work, the gradient of the log-likelihood at a visible point v can be formulated as

∂log P(v)/∂θ = E_data[∂(−E(v, h))/∂θ] − E_model[∂(−E(v, h))/∂θ]    (7)

where θ is the vector of network parameters. Computing the positive (first) term in (7) is easy because the vector v is known, but calculating the negative (second) term is intractable. Contrastive divergence is a useful method to overcome the issue of calculating the negative term and offers an effective approximate solution [20,21]. The process of training an RBM starts with a training vector v(0) on the visible units. Hidden units h(0) are then generated from v(0) by Gibbs sampling, and the visible units are updated to v(1) from h(0). This forms a Markov chain. After infinitely many iterations of Gibbs sampling, the visible units v(∞) and hidden units h(∞) are sampled, and the correlation of v(∞) and h(∞) can be measured after sampling for a long time. In practical situations, however, just one iteration of Gibbs sampling achieves a satisfactory result and the learning algorithm works well.
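The one-step contrastive-divergence procedure just described (CD-1) can be sketched as follows for a binary RBM. This is a generic CD-1 sketch, not the paper's exact implementation, and all names are ours:

```python
import math
import random

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 step: positive phase on the data, one Gibbs step for the
    negative phase, then update W, b, c with (positive - negative) correlations."""
    rng = rng or random.Random(0)
    n_v, n_h = len(b), len(c)
    # positive phase: hidden probabilities given the data vector v0
    ph0 = [sigm(c[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # one Gibbs step: reconstruct the visibles, then the hidden probabilities again
    pv1 = [sigm(b[i] + sum(W[i][j] * h0[j] for j in range(n_h))) for i in range(n_v)]
    ph1 = [sigm(c[j] + sum(pv1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    # parameter updates
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(n_h):
        c[j] += lr * (ph0[j] - ph1[j])

W = [[0.0, 0.0] for _ in range(3)]
b, c = [0.0] * 3, [0.0] * 2
cd1_update([1.0, 0.0, 1.0], W, b, c)
print(b)  # visible biases move toward the data vector
```

Repeating this update over many training vectors drives the model distribution toward the data distribution without ever computing Z.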

Supervised training through backpropagation
Back-propagation is the most commonly used supervised training approach for neural networks. After the unsupervised training phase, the back-propagation algorithm fine-tunes the whole network in the supervised training phase. The errors between the network outputs and the corresponding labels are computed and back-propagated to the previous layers. Equation (8) shows the error term of the output layer:

Errj = Oj (1 − Oj)(Tj − Oj)    (8)

where Oj represents the network output for a training sample and Tj is the corresponding target value for the j-th output neuron. The error term of the hidden layers is formulated as

Errj = Oj (1 − Oj) Σk Errk wjk    (9)

where wjk is the vector of weights connecting the output layer and the last hidden layer, and Errk is the error term of the output layer. During training, the weight updating propagates from the output layer back to the input layer. The formulas for weight updating are given as

wij = wij + η Errj Oi    (10)
cj = cj + η Errj    (11)

where η is the learning rate of the training process, and wij and cj are the weights and biases respectively. The learning rate needs to be properly selected: a large learning rate may overshoot the minimum, whereas a small learning rate usually leads to slow training.
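The output-layer error term of (8), the hidden-layer error term, and the weight and bias updates just described can be sketched directly; the helper names are ours:

```python
def output_error(o, t):
    """Output-layer error term: Err_j = O_j (1 - O_j) (T_j - O_j)."""
    return o * (1.0 - o) * (t - o)

def hidden_error(o, next_errs, next_weights):
    """Hidden-layer error term: Err_j = O_j (1 - O_j) * sum_k Err_k w_jk."""
    return o * (1.0 - o) * sum(e * w for e, w in zip(next_errs, next_weights))

def update_weight(w, eta, err, o_prev):
    """w_ij <- w_ij + eta * Err_j * O_i."""
    return w + eta * err * o_prev

def update_bias(cj, eta, err):
    """c_j <- c_j + eta * Err_j."""
    return cj + eta * err

print(output_error(0.5, 1.0))  # 0.5 * 0.5 * 0.5 = 0.125
```

The O(1 − O) factor is the derivative of the sigmoid, which is why these expressions apply to sigmoid-activated units.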
As described earlier, the training of a DBN contains an unsupervised training phase and a supervised training phase. The initial weights are adjusted into an appropriate region during unsupervised training. The whole network is then fine-tuned by back-propagation in the supervised training phase to achieve accurate modelling results. The rich latent information extracted from the input variables during unsupervised training also makes the model more interpretable. This semi-supervised method improves the robustness and generalization capability of the deep-architecture model.

Polypropylene polymerization process
Advanced monitoring, control, and optimization techniques are essential in modern industrial chemical processes to overcome high costs and improve production efficiency [22]. In this paper, DBN is used to develop soft sensors for a polypropylene production plant in China. In this plant, two continuous stirred tank reactors (CSTR) and two fluidized-bed reactors (FBR) are used to produce polypropylene, as shown in Fig. 5. Propylene, hydrogen, and catalyst are fed to the reactors. The gases and liquids fed to the reactors are the reactants for the growing polymer particles and also act as the heat-transfer media. The melt index is a key polymer quality variable and should be closely monitored and controlled. The MI of polypropylene depends on many factors such as the catalyst, the reactor temperature and the concentrations of the reaction materials. For example, hydrogen can increase the polymerization rate of polypropylene; it mainly increases the initial polymerization rate of propylene [23]. The hydrogen concentration regulates the molecular weight of polypropylene, and hydrogen can also delay the decay rate of the catalyst. Because polymer MI is difficult to measure in this process, the relationship between MI and some easily measured process variables needs to be found, so that inferential estimates of MI can be obtained by soft sensors. As this industrial process is very complicated, it is difficult to develop first-principle models linking polymer MI with easy-to-measure process variables. Therefore, nonlinear data-driven models need to be utilised in developing soft sensors for this process.
The polypropylene grades are related to some key variables, such as reactant composition, reactor temperature and catalyst properties. The feedstock of D201 comprises propylene, hydrogen and catalyst, and the co-monomer is added to D204. Several grades of polymer were produced within one month, and industrial process operational data covering this period are available for this application. In this process, polymer MI was logged every two hours and process samples were logged every half hour. In fact, MI is only highly correlated with a few process variables. Based on the research of Zhang et al. [24], there are strong correlations between the MI of the polymer in reactor D204 and the hydrogen concentrations in reactors D201 and D202, while the MI of the polymer in reactor D201 is strongly correlated with the hydrogen concentration and feed rate in reactor D201 [24]. The concentrations of hydrogen in D201 and D202, the feed rate of hydrogen, and the MI of polypropylene in reactors D201 and D204 are shown in Figs. 6-8, respectively. Due to industrial confidentiality, the units of these variables are not disclosed.
From Fig. 8, it can be observed that the MI data cover quite a wide range; thus, the data are suitable for developing data-driven models. From Figs. 6-8, it can be seen that MI is highly correlated with the hydrogen feed rate and concentration. The time delays of the industrial process can be found through cross-correlation analysis [24]. The data-driven models for inferential estimation of MI can be represented as

MI1 = f1(H1, F),  MI2 = f2(H1, H2)

where MI1 and MI2 are the MI in D201 and D204, respectively, H1 and H2 are the concentrations of hydrogen in D201 and D202, respectively, and F is the hydrogen feed rate to D201. The original process data set contains 1534 samples of process operational data and 383 samples of quality data (MI) available for establishing the data-driven DBN models. The number of process variable samples is thus larger than the number of quality variable samples: only 383 of the process samples have corresponding quality values. However, the remaining process samples can be utilized by the DBN in the unsupervised training phase. By such means, the DBN can capture much valuable information from the process data, improving its estimation of MI.
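Because MI was logged every two hours and process samples every half hour, roughly one process sample in four has a corresponding label. A hypothetical sketch of how such data might be partitioned for the two training phases (the paper's actual pairing is by time stamp; the names and the fixed ratio here are our assumptions):

```python
def partition_samples(process_samples, mi_samples, ratio=4):
    """Pair every `ratio`-th process sample with an MI value (labeled set);
    the remainder form the 'unlabeled' set usable only for the
    unsupervised pre-training of the DBN."""
    labeled = list(zip(process_samples[::ratio], mi_samples))
    unlabeled = [x for k, x in enumerate(process_samples) if k % ratio != 0]
    return labeled, unlabeled

# 16 process samples and 4 MI samples -> 4 labeled pairs, 12 unlabeled samples
labeled, unlabeled = partition_samples(list(range(16)), ["a", "b", "c", "d"])
print(len(labeled), len(unlabeled))
```

With 1534 process samples and 383 MI samples the same 4:1 proportion holds, which is why the unlabeled pool is so much larger than the labeled one.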
The data set for the supervised training phase was separated into a training data set, a testing data set and an unseen validation data set. The partition of the data sets for estimating MI1 is presented in Table 1, and the partition for estimating MI2 is presented in Table 2.
The model structure can be determined using the training and testing data sets through cross-validation, while the unseen validation data are used to test the performance of the final developed DBN model. Input data samples without corresponding output data are termed "unlabeled" process data. In the unsupervised training phase of the DBN models, samples of process variables without corresponding MI data can therefore also be utilized. However, such "unlabeled" process variables cannot be used by traditional neural networks for inferential estimation of product quality. For comparison, conventional neural network models were also developed.

Results and discussions
The model structures need to be determined first. In this study, 25 DBN models with different architectures were developed and compared, and the one giving the best performance on the testing data set was regarded as having the appropriate structure. These DBN models have one visible layer (input layer), one additional top layer (output layer) and two hidden layers. The learning rate in the unsupervised training phase was selected as 0.01 and the learning rate in the supervised training phase as 0.0015. The structures of the 25 DBN models are shown in Table 3. Figs. 9 and 10 present the sums of squared errors (SSE) on the training and testing data sets, respectively, for these 25 DBN models when estimating MI1.
From Figs. 9 and 10, the 7th DBN model gives the best generalization performance on the testing data set, and the 6th DBN model gives the second lowest testing error. The 12th to 25th DBN models have lower training errors than the 7th DBN model but larger testing errors; they are therefore likely to have suffered from overfitting, and their structures should not be selected. From the results in Figs. 9 and 10, the number of neurons in the first hidden layer can be taken as 5. From Table 3, it can be seen that these 25 DBN models have similar numbers of neurons in the first and second hidden layers. The first step of this investigation confirmed that the 7th DBN gave the best performance among these 25 DBN models. To avoid the possibility that some DBN models not included in Table 3 might give better performance, the second step further investigated the number of neurons in the second hidden layer: nine additional DBN models with 2 to 10 neurons in the second hidden layer were constructed. The training and testing errors of these DBN models are shown in Table 4.
From Table 4, it can be seen that the training error of the 7th DBN is the smallest, but its testing error is not. The testing errors from the 6th to the 9th DBN increase, so the 6th to 9th DBNs have overfitted the training data. The 4th DBN (i.e., the 7th DBN model in Table 3) has the lowest testing error among all the DBN models. This indicates that the 4th DBN model performs better than the other models and its structure should be adopted.
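The selection procedure above amounts to a grid search that keeps the structure with the lowest testing SSE, not the lowest training SSE. A generic sketch; the scoring numbers below are stand-ins, not the paper's results:

```python
def sse(y_true, y_pred):
    """Sum of squared errors, the comparison criterion used here."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def select_structure(candidates, train_and_score):
    """Return the candidate structure with the lowest testing SSE.
    `train_and_score(structure)` -> (training_sse, testing_sse)."""
    return min(candidates, key=lambda s: train_and_score(s)[1])

# Stand-in scores: the largest net has the lowest training SSE but overfits
scores = {(5, 3): (2.0, 1.1), (5, 4): (1.5, 0.9), (6, 4): (1.0, 1.4)}
best = select_structure(list(scores), lambda s: scores[s])
print(best)  # (5, 4): lowest testing SSE despite not the lowest training SSE
```

Ranking on testing rather than training error is exactly what rules out the overfitted larger structures in Tables 3 and 4.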
To demonstrate the advantage of using input data samples without corresponding target values as additional training data in the unsupervised training phase, two DBN models are compared in Table 5, where DBN No. 2 was built using the "unlabeled" process data without corresponding MI samples as well. DBN No. 2 in Table 5 is in fact the 4th DBN model in Table 4, and the two DBN models in Table 5 have the same structure. It can be seen from Table 5 that the first DBN model has larger SSE values on the training, testing and validation data sets than the second DBN model. Therefore, the DBN can extract more features from the "unlabeled" data, and DBN No. 2 gives better performance than DBN No. 1.

Seven conventional single-hidden-layer feedforward neural network models were also established for comparison. The SSE values of these conventional feedforward neural networks with different structures on the training and testing data are given in Table 6. From Table 6, the 4th neural network has the lowest testing SSE for estimating MI1 and the 3rd neural network has the lowest testing SSE for estimating MI2.

The estimations of MI1 on the unseen validation data by the DBN and the conventional feedforward neural network are shown in Fig. 11, where the solid, dashed, and dotted lines represent, respectively, the actual values of MI1, the estimations by the DBN, and the estimations by the conventional feedforward neural network. It can be seen from Fig. 11 that the estimations by the DBN model are generally closer to the corresponding actual values of MI1 than those by the feedforward neural network. The SSE values of both the DBN and the neural network are presented in Table 7. It can be seen from Table 7 that the SSE of the DBN on the training data set is larger than that of the neural network; however, the SSE values of the DBN on the testing and unseen validation data sets are much smaller than those of the neural network. The strong generalization capability of the DBN is thus demonstrated by the inferential estimation of MI1: the rich latent information in the process data was extracted by the DBN during the unsupervised training phase, and overall the DBN model gives more accurate estimations of MI1.

Fig. 12 compares the estimations of MI2 by the DBN and the conventional feedforward neural network on the unseen validation data. In Fig. 12, the solid, dashed, and dotted lines represent, respectively, the actual values of MI2, the estimations by the DBN, and the estimations by the conventional feedforward neural network. From Fig. 12, it can be seen that both models give similar performance when the MI values are high; however, when the MI values are low, the DBN model gives better estimations. Table 8 shows the SSE values for the estimation of MI2. The SSE of the DBN on the training data is larger than that of the neural network, but the SSE values of the DBN on the testing and unseen validation data sets are much smaller than those of the neural network model. The results in Fig. 12 and Table 8 indicate that the estimations of MI2 by the DBN are more reliable and accurate than those from the conventional feedforward neural network.

Conclusions
DBN models for the online inferential estimation of the polymer melt index in an industrial polymerization process were developed in this paper. A DBN can be developed with a deep structure, allowing rich latent information to be extracted from the process variables. The "unlabeled" process data, which are useless to conventional neural network models, can be used in the unsupervised training stage of the DBN, and it is shown in this paper that the accuracy of the inferential estimation of polymer MI can be improved by this means. The selection of the DBN structure was also investigated, and appropriate structures of the DBN for the estimation of MI1 and MI2 were selected. The DBN gives much better performance than conventional feedforward neural networks. The study demonstrates that DBN is very suitable for developing nonlinear data-driven models for the inferential estimation of polymer melt index. The proposed DBN model could be extended to multi-step-ahead prediction models in the future, and the network structure of the DBN could be further optimized to improve robustness.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2 The sigmoid function
Fig. 1 Model of a simple perceptron

Fig. 9 SSE on training data for estimating MI 1

Fig. 10 SSE on testing data for estimating MI 1

Table 1
Partition of data sets for estimating MI 1

Table 2
Partition of data sets for estimating MI 2

Table 3
DBN models with different structures

Table 4
Errors of DBN models with different structures for estimating MI 1

Table 6
Errors of neural networks with different structures

Table 5
Errors of DBN models for estimating MI 1 with different input data

Table 7
SSE of estimating MI 1

Table 8
SSE of estimating MI 2