Modelling of a post-combustion CO2 capture process using extreme learning machine

This paper presents the modelling of a post-combustion CO2 capture process using bootstrap aggregated extreme learning machines (ELM). ELM randomly assigns the weights between the input and hidden layers and obtains the weights between the hidden and output layers using a regression type approach in one step. This feature allows an ELM model to be developed very quickly. This paper proposes using principal component regression to obtain the weights between the hidden and output layers, in order to address the collinearity issue among hidden neuron outputs. Because the weights between the input and hidden layers are randomly assigned, ELM models can vary in performance. This paper therefore also proposes combining multiple ELM models to enhance model prediction accuracy and reliability. To predict the CO2 production rate and CO2 capture level, eight process parameters were used as model input variables: inlet gas flow rate, CO2 concentration in the inlet flue gas, inlet gas temperature, inlet gas pressure, lean solvent flow rate, lean solvent temperature, lean loading and reboiler duty. Bootstrap re-sampling of the training data was applied to build each single ELM, and the individual ELMs were then stacked, thereby enhancing model accuracy and reliability. The bootstrap aggregated extreme learning machine provides fast learning speed and good generalization performance, and will be used to optimize the CO2 capture process.


Introduction
Greenhouse emissions (GHE), mainly carbon dioxide (CO2), are identified as the chief cause of global climate change, especially global warming. The growing energy demand, due to the rapidly increasing population and the development of industrialization, is directly linked to the increasing release of GHE. A target of a 50% reduction in CO2 emissions by 2050, compared with the level in 1950, has been set by the Intergovernmental Panel on Climate Change.
Carbon capture and storage (CCS), which captures, transports and stores CO2, is widely regarded as an advanced technology for achieving CO2 emission reduction. There are three major types of technologies applied for CCS: post-combustion, pre-combustion and oxyfuel combustion. Among these, the post-combustion CO2 capture (PCC) process is considered the most convenient way to reduce CO2 emissions from coal-fired power plants, as it can be retrofitted to existing power plants and integrated into new ones. However, the PCC process imposes a large energy penalty, which reduces the efficiency and effectiveness of the power plant. The energy requirement is strongly influenced by the operating conditions, equipment dimensions and capture target of the PCC process. Therefore, it is necessary to apply process optimisation in order to enhance the efficiency of CCS systems.
In order to optimize the operation of the post-combustion CO2 capture process, a reliable and accurate process model is necessary. In the past, researchers have proposed various kinds of modelling technologies, such as mechanistic models (Lawal et al. 2010; Biliyok et al. 2012; Posch and Haider 2013; Cormos and Daraban 2015) and data-driven models (Zhou et al. 2009, 2010; Sipocz and Assadi 2011). However, some problems arise with the above-mentioned methods. For instance, the development of a mechanistic model is not only time consuming but also requires extensive knowledge of the underlying first principles of the process. It is also computationally very demanding to use a detailed mechanistic model in process optimisation. Statistical models can overcome these problems and are efficient for building data-driven models, but they still have a few shortcomings. It is shown in (Zhou et al. 2010) that linear statistical models are unable to describe the nonlinear relationships that possibly exist among the parameters. In this case, another advanced modelling method, artificial neural networks (ANNs), is proposed to address the above weakness. However, feedforward neural networks trained by the back propagation (BP) learning algorithm have some issues: firstly, the most appropriate learning rate is difficult to determine; secondly, the presence of local minima affects the modelling results; thirdly, networks can be over-trained, leading to poor generalization performance; lastly, gradient-based learning is time consuming (Huang et al. 2006).
Extreme learning machine (ELM) was proposed to address the issue of slow training in conventional feedforward neural networks (Huang et al. 2006). An ELM is basically a single hidden layer feedforward neural network with randomly assigned weights between the input and hidden layers. The weights between the hidden and output layers are determined in a one-step regression type approach using the generalised inverse. Thus, an ELM can be built very quickly. As the weights between the input and hidden layers are randomly assigned, correlations can exist among the hidden neuron outputs and model performance can vary between ELMs. This paper proposes using principal component regression (PCR) to obtain the weights between the hidden and output layers in order to overcome the correlation issue among hidden neuron outputs. This paper also proposes building multiple ELMs on bootstrap re-sampling replications of the original training data and then combining these ELMs in order to enhance model accuracy and reliability. The proposed method is applied to dynamic model development for the whole post-combustion capture plant.
This paper is structured as follows: Sect. 2 briefly presents the post-combustion CO2 capture process through chemical absorption. Extreme learning machine, a method for calculating the output layer weights of ELM using PCR, and the aggregation of multiple ELMs are given in Sect. 3. Application results and discussions are presented in Sect. 4. Section 5 draws some concluding remarks.

CO2 capture process through chemical absorption
Figure 1 shows a typical post-combustion CO2 capture process through chemical absorption. It consists of two major parts: an absorber and a stripper. In detail, the flue gas from the power plant is pressured into the bottom of the absorber and contacts counter-currently with lean MEA solution fed from the top. The lean MEA solution chemically absorbs the CO2 in the flue gas, forming rich amine solution. The treated gas stream, containing a much lower CO2 content, leaves from the top of the absorber. The rich amine solution is then preheated in the cross heat exchanger before being pressured into the regenerator. In the stripper, CO2 is separated from the rich amine solution by the heat provided from the reboiler. The regenerated CO2 is cooled in the condenser and compressed for storage, and the remaining solution (lean solution) is recycled to the cross heat exchanger to exchange heat with the rich amine. The heat supplied in the reboiler, coming from the low pressure steam of the power plant, is used to increase the temperature of the solution, separate CO2 from the rich amine and vaporize the gas in the stripper. This results in a large energy consumption. Two parameters are identified to characterize the process performance: CO2 capture level and CO2 production rate. CO2 capture level is the fraction of CO2 extracted from the inlet flue gas in the absorber column, which is calculated in Eq. (1).
CO2 capture level = (1 − (m_CO2^outlet · V_gas^outlet) / (m_CO2^inlet · V_gas^inlet)) × 100%   (1)

where m_CO2^outlet, V_gas^outlet, m_CO2^inlet and V_gas^inlet represent the CO2 mass fraction in the gas out of the absorber, the gas flow rate out of the absorber, the CO2 mass fraction in the inlet flue gas of the absorber, and the inlet gas flow rate of the absorber, respectively. CO2 production rate represents the amount of CO2 captured after the condenser, which is an indicator for the whole process because it is not affected by a single component of the process. It is calculated as in Eq. (2):

o_CO2 = m̃_CO2 · Ṽ_gas^outlet   (2)

where o_CO2 is the CO2 production rate after the condenser, and m̃_CO2 and Ṽ_gas^outlet are the CO2 mass fraction and gas flow rate of the outlet gas from the stripper, respectively.
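As a small illustration, the two performance indices can be computed directly from the stream measurements. This is a sketch with hypothetical function and variable names; consistent units for the mass fractions and flow rates are assumed.

```python
def co2_capture_level(m_out, v_out, m_in, v_in):
    """CO2 capture level (%): fraction of the inlet CO2 removed in the
    absorber, from outlet/inlet CO2 mass fractions (m) and gas flow
    rates (v), following Eq. (1)."""
    return (1.0 - (m_out * v_out) / (m_in * v_in)) * 100.0


def co2_production_rate(m_stripper_out, v_stripper_out):
    """CO2 production rate: CO2 mass flow in the gas leaving the
    stripper (mass fraction times outlet gas flow rate), Eq. (2)."""
    return m_stripper_out * v_stripper_out
```

For example, an absorber that reduces the CO2 mass fraction from 0.2 to 0.01 while the gas flow drops from 100 to 90 achieves a 95.5% capture level.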

Single hidden layer neural networks
Figure 2 shows the structure of a single hidden layer feedforward neural network (SLFN). Consider N arbitrary distinct samples (x_i, t_i), where x_i = [x_i1, x_i2, …, x_in]^T ∈ R^n is a vector of network inputs and t_i = [t_i1, t_i2, …, t_im]^T ∈ R^m is a vector of the target values of the network outputs. The output of a standard SLFN with Ñ hidden nodes and activation function g(x) is given by the following equation:

o_j = Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i),   j = 1, …, N

where w_i = [w_i1, w_i2, …, w_in]^T is a vector of the weights between the ith hidden node and the input nodes, b_i is the bias of the ith hidden node, x_j is the jth input sample, and β_i is a vector of the weights linking the ith hidden node and the output nodes. In this paper the output nodes have a linear activation function and the hidden layer neurons use the sigmoid activation function.
In theory, a standard SLFN can approximate the N training samples with zero error, which means Σ_{j=1}^{N} ||o_j − t_j|| = 0. Specifically, there exist β_i, w_i and b_i such that:

Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = t_j,   j = 1, …, N

The above equations can be written compactly as Hβ = T, where H ∈ R^{N×Ñ} is the hidden layer output matrix of the neural network, whose ith column is the output of the ith hidden node with respect to inputs x_1, x_2, …, x_N, β ∈ R^{Ñ×m} collects the output layer weights, and T ∈ R^{N×m} collects the targets. Training an SLFN amounts to finding the minimum of E = min ||H_{N×Ñ} β_{Ñ×m} − T_{N×m}||. SLFNs are usually trained by gradient-based learning algorithms, such as the BP algorithm, which typically need many iterations and are typically slow. The training process searches for the minimum of ||H_{N×Ñ} β_{Ñ×m} − T_{N×m}|| by numerical optimisation methods. In this procedure, the parameter set θ = (β, w, b) is iteratively adjusted as:

θ_{k+1} = θ_k − η ∂E(θ)/∂θ

where η is the learning rate. With the BP algorithm, the parameters are updated by propagating the error from the output layer back to the input layer.
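The hidden layer output matrix H and the SLFN forward pass can be sketched in a few lines of NumPy. This is a sketch with hypothetical names, assuming sigmoid hidden neurons and a linear output layer as stated above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hidden_output_matrix(X, W, b):
    """H[j, i] = g(w_i . x_j + b_i): the N x Ntilde hidden layer output
    matrix for inputs X (N x n), input weights W (n x Ntilde) and
    biases b (Ntilde,)."""
    return sigmoid(X @ W + b)


def slfn_forward(X, W, b, beta):
    """Network outputs O = H beta for output weights beta (Ntilde x m)."""
    return hidden_output_matrix(X, W, b) @ beta
```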

Bootstrap aggregated ELM
Huang et al. have proved that, if the activation function g(x) is infinitely differentiable in any interval and the number of hidden nodes is large enough, it is not necessary to adjust all the weighting parameters of the network (Huang et al. 2006). In other words, the weights and biases between the input and hidden layers can be randomly chosen. To obtain good performance, the required number of hidden nodes is no more than the number of input samples. Huang et al. obtain the weights between the hidden and output layers by finding a least squares solution of the linear equation Hβ = T.
The least squares solution is β̂ = H†T, where H† is the Moore–Penrose generalised inverse of H. However, as the hidden layer outputs can be collinear, using this least squares solution to find the weights between the hidden and output layers can give poor modelling performance. This is especially true for ELM, which has randomly assigned hidden layer weights and typically requires a large number of hidden neurons. This paper proposes using PCR to obtain the weights between the hidden and output layers to overcome the multicollinearity problem. Instead of regressing T on H directly, the principal components of the H matrix are used as regressors.
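A basic ELM in the sense described above can be sketched as follows. The names are hypothetical, uniform random weights in [-1, 1] are an assumption, and `np.linalg.pinv` computes the Moore–Penrose generalised inverse.

```python
import numpy as np

rng = np.random.default_rng(0)


def train_elm(X, T, n_hidden):
    """Basic ELM: randomly assign input weights and biases, then solve
    for the output weights in one step as beta = pinv(H) @ T."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden layer outputs
    beta = np.linalg.pinv(H) @ T             # one-step least squares solution
    return W, b, beta


def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

There is no iterative weight updating at all, which is what makes ELM training so fast compared with BP.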
The matrix H can be decomposed into the sum of a series of rank-one matrices through principal component decomposition:

H = u_1 p_1^T + u_2 p_2^T + … = Σ_i u_i p_i^T
In the above equation, u_i and p_i are the ith score vector and loading vector, respectively. The score vectors are orthogonal, as are the loading vectors, which are additionally of unit length. The loading vector p_1 defines the direction of greatest variability, and the score vector u_1, also known as the first principal component, represents the projection of the columns of H onto p_1. The first principal component is thus the linear combination of the columns of H explaining the greatest amount of variability (u_1 = Hp_1). The second principal component is the linear combination of the columns of H explaining the next greatest amount of variability (u_2 = Hp_2), subject to the condition that it is orthogonal to the first principal component. Principal components are arranged in decreasing order of variability explained. Since the columns of H are highly correlated, the first few principal components can explain the majority of the data variability in H. Retaining only the first k components gives:

H = Σ_{i=1}^{k} u_i p_i^T + E

where k is the number of principal components to retain and E is a matrix of residuals of unfitted variation.
If the first k principal components can adequately represent the original data set H, then the regression can be performed on the first k principal components. The model output is obtained as a linear combination of the first k principal components of H:

ŷ = U_k w

where U_k = [u_1, u_2, …, u_k] and w is a vector of model parameters in terms of the principal components. The least squares estimate of w is:

ŵ = (U_k^T U_k)^{−1} U_k^T T

The hidden-to-output weights calculated through PCR are then given by:

β̂ = P_k ŵ

where P_k = [p_1, p_2, …, p_k]. The number of principal components, k, to be retained in the model is usually determined through cross-validation (Wold 1978). The data set for building a model is partitioned into a training data set and a testing data set. PCR models with different numbers of principal components are developed on the training data and then tested on the testing data. The model with the smallest testing errors is considered to have the most appropriate number of principal components.
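The PCR computation of the output layer weights can be sketched as below. The names are hypothetical; the loading vectors of H are obtained here via the SVD, which is equivalent to the score/loading decomposition above, and for brevity the columns of H are assumed already scaled.

```python
import numpy as np


def pcr_output_weights(H, T, k):
    """Output layer weights by principal component regression:
    keep the first k loading vectors P_k of H, form the scores
    U_k = H P_k, regress T on U_k, and map back to beta = P_k w."""
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    P_k = Vt[:k].T                        # loading vectors p_1..p_k
    U_k = H @ P_k                         # score vectors (principal components)
    w = np.linalg.lstsq(U_k, T, rcond=None)[0]
    return P_k @ w                        # hidden-to-output weights beta
```

Dropping the trailing components (small k) is what suppresses the near-collinear directions of H that destabilise the plain least squares solution.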
As shown in (Zhang 1999; Li et al. 2015), combining several networks can improve the prediction accuracy on unseen data and give better generalization performance. Bootstrap re-sampling replications of the original training data are used for training the individual networks, and the overall output of the aggregated neural networks is a weighted combination of the individual neural network outputs (Fig. 3).
Therefore, the procedure of building a bootstrap aggregated ELM model can be summarized as follows. Given an activation function g(x) and a number of hidden nodes Ñ:

Step 1: Apply bootstrap re-sampling to produce n (e.g. n = 50) replications of the original training data set.

Step 2: Develop an individual ELM model on each replication.

Step 3: Combine the individual ELM outputs to form the aggregated model output.

It has been suggested that model prediction confidence bounds can be calculated from the individual predictions of bootstrap aggregated neural networks (Zhang 1999; Li et al. 2015). The standard error of the ith predicted value is calculated as

σ_e = sqrt( Σ_{b=1}^{n} (y(x_i; W_b) − ȳ(x_i))² / (n − 1) )

where ȳ(x_i) = Σ_{b=1}^{n} y(x_i; W_b)/n and n is the number of neural networks. The 95% prediction confidence bounds are calculated as ȳ(x_i) ± 1.96σ_e. This gives an interval that contains the true process output with a probability of 0.95. A narrower confidence bound is preferred, as it indicates that the associated model prediction is more reliable.
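The resampling, aggregation and confidence bound steps above can be sketched as follows. The names are hypothetical, and equal weighting of the individual predictions is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)


def bootstrap_resample(X, T):
    """One bootstrap replication: draw N samples with replacement."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], T[idx]


def aggregate_predictions(preds):
    """Combine n individual model predictions (an n x N array) into the
    aggregated output with 95% confidence bounds, using
    sigma_e = sqrt(sum_b (y_b - y_bar)^2 / (n - 1))."""
    preds = np.asarray(preds)
    n = preds.shape[0]
    y_bar = preds.mean(axis=0)
    sigma_e = np.sqrt(((preds - y_bar) ** 2).sum(axis=0) / (n - 1))
    return y_bar, y_bar - 1.96 * sigma_e, y_bar + 1.96 * sigma_e
```

The spread of the individual predictions at each point directly yields the confidence bound width, so no extra model is needed to quantify prediction reliability.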

Performance evaluation
The simulated dynamic process operation data in (Li et al. 2015) were used to build the data-driven models. The simulated data were generated from the mechanistic model implemented in gPROMS at the University of Hull with a sampling time of 5 s. The data were divided into three groups: training data (56%), testing data (24%), and unseen validation data (20%). Furthermore, the constructed model used the input data of the second batch, in which the lean solution flow rate has a step change, to verify its accuracy. To demonstrate the good performance of bootstrap aggregated ELM, its results are compared with those from (Li et al. 2015). Before training, the data are scaled to zero mean and unit variance. Both the bootstrap aggregated neural network (BA-NNs) and BA-ELM models combine 30 neural networks. The numbers of hidden neurons used in BA-NNs and BA-ELM were selected within the ranges 2-20 and 40-100, respectively. All models with numbers of hidden neurons in these ranges were developed and tested on the testing data. The models giving the smallest mean squared errors (MSE) were considered to have the appropriate number of hidden neurons. The reason ELM requires more hidden neurons is the random nature of its hidden layer weights: a small number of hidden neurons would usually not provide an adequate function representation. The form of the dynamic model is shown in Eq. (15).
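Scaling to zero mean and unit variance should use statistics computed on the training data only, which are then applied unchanged to the testing and unseen validation data. A minimal sketch with hypothetical names:

```python
import numpy as np


def fit_scaler(X_train):
    """Per-variable mean and standard deviation, from training data only."""
    return X_train.mean(axis=0), X_train.std(axis=0)


def apply_scaler(X, mean, std):
    """Scale any data set with the training statistics."""
    return (X - mean) / std
```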
y(t) = f(y(t − 1), u_1(t − 1), u_2(t − 1), …, u_8(t − 1))   (15)

where y represents the CO2 capture level or CO2 production rate, and u_1 to u_8 are, respectively, the inlet gas flow rate, CO2 concentration in the inlet flue gas, inlet gas temperature, inlet gas pressure, MEA circulation rate, lean loading, lean solution temperature, and reboiler temperature. Equation (15) represents a first order nonlinear dynamic model, which is of the lowest order. For practical applications, the model of least complexity is generally preferred. If a low order nonlinear dynamic model cannot give satisfactory performance, then higher order nonlinear dynamic models should be considered.
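The first order dynamic model structure can be realised by building lagged regressors and, for long range (multi-step-ahead) prediction, feeding each prediction back in place of the measured output. This is a sketch with hypothetical names.

```python
import numpy as np


def build_narx_data(y, U):
    """Regressors for a first order dynamic model: predict y(t) from
    [y(t-1), u1(t-1), ..., u8(t-1)].  y: (T,) output series,
    U: (T, 8) input series."""
    X = np.hstack([y[:-1, None], U[:-1]])   # lagged output and inputs
    return X, y[1:]                          # regressors, targets


def multi_step_predict(predict, y0, U):
    """Multi-step-ahead prediction: the model's own output is fed back
    as the lagged output instead of the measurement."""
    y_hat = [float(y0)]
    for t in range(len(U) - 1):
        x = np.concatenate(([y_hat[-1]], U[t]))
        y_hat.append(float(predict(x)))
    return np.array(y_hat)
```

One-step-ahead prediction simply evaluates the model on the measured regressors, whereas the recursive loop above exposes any accumulated model error, which is why accurate long range predictions are a stronger test of the model.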
When developing the two different models, it is clear that the BA-ELM model is very simple to build because its training needs only one iteration. The performance comparison of the bootstrap aggregated neural networks and bootstrap aggregated ELM is shown in Table 1. The training CPU time of BA-ELM is about nine times lower than that of BA-NNs. The short training time of BA-ELM is due to the fact that each individual ELM is trained in one step without the need for gradient-based iterative training. The verification time of BA-ELM is longer than that of BA-NNs, as the individual ELMs have more hidden neurons than the individual networks in BA-NNs. The MSE value on the unseen validation data from BA-NNs is higher than that from BA-ELM. This could be because the training of some neural networks in BA-NNs was trapped in local minima or over-fitted the noise. The results given in Table 1 demonstrate that BA-ELM trains faster and performs better than BA-NNs. The one-step-ahead and multi-step-ahead predictions of CO2 production rate from BA-ELM and BA-NNs are shown in Fig. 4. Clearly, the prediction using the BA-ELM model is much better than that using the BA-NNs model, especially after 92 steps for the long range prediction.
The MSE values of CO2 production rate for the individual ELM models can be seen in Fig. 5. The performance on the unseen validation data is not in accordance with that on the training and testing data. For instance, the prediction on the unseen validation data by the 20th ELM is the worst, even though its performance on the training and testing data is better than that of many of the individual ELM models. This clearly demonstrates the non-robust nature of a single network. Nevertheless, when several individual networks are combined to build the model, this weakness can be easily addressed. Figure 6 shows the MSE values on model building data when aggregating different numbers of ELM models. The first bar in Fig. 6 represents the first individual ELM model shown in Fig. 5, the second bar represents the combination of the first two individual ELM models, and the last bar represents the combination of all the individual ELM models. Looking at the trends of the top and bottom plots in Fig. 6, the prediction performance of bootstrap aggregated ELM on the unseen validation data is consistent with that on the training and testing data. In other words, combining several ELM models gives more accurate predictions on the training and testing data, as well as on the unseen validation data, than single ELM models. Furthermore, comparing the MSE values in Fig. 6 with those in Fig. 5 confirms that the aggregated ELM model provides more accurate predictions than the single ELM models.
Figure 7 compares the one-step-ahead and multi-step-ahead predictions of CO2 capture level using the BA-ELM and BA-NNs models. It is clearly seen from the bottom graph that both the one-step-ahead and multi-step-ahead predictions from BA-NNs are reasonably accurate, though some errors are observable, but the long range predictions (green line) are not accurate after 82 steps (410 s). In the top graph, however, the accurate one-step-ahead and multi-step-ahead predictions from BA-ELM are very encouraging, indicating that the model has captured the underlying dynamics of the process. Such accurate long range predictions can be further used for model predictive control and real-time optimisation applications.
The performance comparison of the bootstrap aggregated neural networks and bootstrap aggregated ELM for CO2 capture level is shown in Table 2. The training CPU time of BA-ELM is six times lower than that of BA-NNs, while its verification CPU time is slightly longer, because each network in BA-ELM has more hidden neurons than each network in BA-NNs. Looking at the accuracy, the mean squared error (MSE) values on the training data of the two models are almost the same, while the MSE value of BA-ELM on the validation data is three times lower than that of BA-NNs. This shows that BA-ELM has a faster training speed and better generalization performance than BA-NNs, in line with Huang et al. (2006). The faster training speed of BA-ELM is because the individual ELMs are trained in a one-step procedure without the need for a gradient-based iterative procedure.

Conclusions
BA-ELM is demonstrated to be a powerful tool for modelling the post-combustion CO2 capture process: it can be trained much faster and is more accurate than the BA-NNs models. It gives good generalization performance on unseen data, because the aggregation of multiple ELMs helps the model avoid being trapped in local minima and over-fitting. As an ELM can be trained very quickly without iterative network weight updating, aggregating multiple ELMs does not pose any computational issues in model development. The model will be used to optimize the CO2 capture process in the future. The model prediction confidence bounds provided by BA-ELM can be incorporated in the optimisation objective function to enhance the reliability of the optimisation (Zhang 2004). Nevertheless, BA-ELM still has some problems. For instance, the number of hidden neurons is quite large, which may increase the computational burden of the model.

Fig. 1
Fig. 1 Simplified process flow diagram of the chemical absorption process for a post-combustion capture plant

Fig. 4
Fig. 4 Dynamic model prediction of CO2 production rate using BA-ELM (top) and BA-NNs (bottom)

Fig. 5
Fig. 5 MSE of CO2 production rate for individual ELM models

Fig. 7
Fig. 7 Dynamic model prediction of CO2 capture level using BA-ELM (top) and BA-NNs (bottom)

Table 1
Performance comparison of BA-ELM and BA-NNs for CO2 production rate

Table 2
Performance comparison of BA-ELM and BA-NNs for CO2 capture level