The pollutant concentration prediction model of NNP-BPNN based on the INI algorithm, AW method and neighbor-PCA

At present, numerical prediction models fail to predict effectively due to the lack of basic pollutant concentration data over the short term in China. Therefore, it is necessary to study statistical prediction methods based on historical data. The traditional Back Propagation Neural Network (BPNN) has been used to predict pollutant concentration. Missing data also affects modeling, and how to use the historical data of multiple monitoring stations in a city effectively should be considered. In this study, the Improved Newton Interpolation (INI) algorithm is adopted to solve the problem of missing data, and an assigning-weight (AW) method is proposed to enrich the data of each station. The Neighbor-Principal Component Analysis (Neighbor-PCA) algorithm is employed to reduce the dimension of the data in order to avoid overfitting caused by the high dimension and linear correlation of multiple factors. An early-stopping strategy and a gradient descent algorithm are utilized to avoid the slow convergence and overfitting of the traditional BPNN. These methods (INI, AW, Neighbor-PCA) are integrated into a prediction model named NNP-BPNN. Forecasting experiments on PM2.5 have shown that the NNP-BPNN model improves the accuracy and generalization ability of the traditional BPNN model. Specifically, the average root mean square error (RMSE) is reduced by 24% and the average correlation is increased by 9.4%. It took 20 s to implement the BPNN model, 170 s to implement the NN-BPNN model and 47 s to implement the NNP-BPNN model; the time used by the NNP-BPNN model is thus 72% less than that of the NN-BPNN model.


Introduction
Currently, environmental pollution poses a serious threat to human health (Zheng et al. 2016). Air quality prediction mainly adopts classic regression statistical models and the Numerical Air Prediction (NAP) model (Demuzere et al. 2009). The NAP uses mathematical models of the atmosphere and oceans to predict air quality based on current atmospheric conditions and pollutant sources (Zhang et al. 2014). Using cloud computing to achieve high-performance NAP computation will be a key research area in the future (Li et al. 2017; Liu et al. 2017; Li et al. 2017; Cui et al. 2016). However, the NAP cannot be adopted in many Chinese cities due to the lack of basic pollutant concentration data over a short period (Huchao et al. 2015; Niska et al. 2005). Hence, statistical prediction methods based on historical data should be studied in order to obtain high prediction performance. Shi et al. (2012) proved that an Artificial Neural Network (ANN) can provide better results for pollutant concentration prediction based on historical monitoring data than traditional multiple linear regression models. The combined ARMA and BPNN model based on historical monitoring data studied by Zhu and Lu (2016) has a smaller prediction error than the traditional BPNN. Feng et al. (2013) decomposed time series data into wavelet coefficients; prediction experiments with three types of neural network models (Multi-layer Perceptron, Elman and Support Vector Machine) based on historical data showed that the improved models yield lower RMSE and mean absolute error than the original models. Grivas and Chaloulakou (2006) employed a novel ANN that selected factors of historical monitoring data via a genetic algorithm, showing that the model produces smaller errors than a regression model in the prediction of PM10 concentration. Shi et al. (2012) used a feed-forward BPNN with the hyperbolic tangent sigmoid activation function and the Levenberg-Marquardt optimization method to predict PM10 concentration at each station. However, these studies do not explain the interaction between a central station and its neighbor stations.
The monitoring data from multiple stations in a city is often incomplete. Nejadkoorki and Baroutian (2011) removed missing data, resulting in a waste of data. The Newton interpolation algorithm has a good interpolation effect when the data loss ratio is not too high (Breu et al. 2016). Thus, in this study, the missing data is interpolated effectively by the INI algorithm. In order to enlarge the data set, we have taken the data of neighbor stations into consideration. Because different distances have different effects on the central station, a weight is assigned to each station's data. An oversized input dimension would affect the generalization ability of the BPNN; therefore, the PCA algorithm is utilized to extract the eigenvectors from the neighbor stations effectively (Skrobot et al. 2016). The eigenvectors extracted from the neighbor stations and the eigenvectors of the central station are together used as the input to the BPNN, which not only keeps as much information as possible but also ensures that the input dimension of the BPNN is suitable, increasing its generalization ability. Finally, an early-stopping strategy (Yang and Zhang 2017) is performed during training: when the error reaches the value set by the program, training is terminated in order to avoid overfitting. A learning-rate gradient descent strategy (Huang and Lin 2009) is employed to avoid slow convergence.
Next, the INI algorithm is described in detail for dealing with missing data. Multiple stations' data and the AW method enlarge the data available for modeling; however, this affects the speed and efficiency of the BPNN. Thus, the Neighbor-PCA algorithm is employed to reduce the dimension of the data in order to avoid overfitting. The BPNN model and the NN-BPNN model are used for comparison. The effectiveness of the NN-BPNN model shows that the integration of the INI algorithm and the AW method is persuasive, and the effectiveness of the NNP-BPNN model shows that the combination of the INI algorithm, the AW method and the Neighbor-PCA algorithm is persuasive.

Related algorithm
In this section, the basic principles of BPNN, Newton interpolation algorithm and PCA algorithm are respectively introduced.

BPNN principles
A BPNN is composed of a series of simple units densely connected with each other. In data mining, neural networks have been employed by Chang and Yang (2017). Each unit has a certain number of inputs and outputs. The structure of a neuron is shown in Fig. 1 and the structure of the BPNN is shown in Fig. 2.
The stimulation delivered to the neuron is called x_i. The connection weight is called w_i. The accumulation is called a_j. Sigmoid(x) is used as the activation function. The error is utilized to modify the connection weights as feedback to complete the learning process, which is the feedback mechanism of the BPNN (Guan et al. 2016). The interior of the neuron also affects the output, so an extra bias named b is introduced, giving formula (1):

a_j = Σ_i w_i x_i + b, output = sigmoid(a_j) (1)
In practice, however, the traditional BPNN has some shortcomings, such as long training time, slow convergence, local minima and poor stability. It is hard to adjust the initial weights and the learning rate (Xiao et al. 2012). If the dimension of the input is high, training becomes an even greater challenge.
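The neuron computation of formula (1) can be sketched as follows; this is a minimal illustration of a single unit, not the paper's implementation:

```python
import math

def neuron_output(x, w, b):
    """Forward pass of a single neuron as in formula (1):
    a_j = sum_i w_i * x_i + b, output = sigmoid(a_j)."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-a))

# With zero weights and zero bias, sigmoid(0) = 0.5.
print(neuron_output([1.0, 2.0], [0.0, 0.0], 0.0))  # 0.5
```

In the full network, the error between the output and the target propagates backward to adjust each w_i and b, which is the feedback mechanism described above.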

Newton interpolation algorithm
Interpolation functions have many different types; Li et al. (2017) introduced multiple methods of interpolation. Using basis functions to obtain the Lagrange interpolation polynomial is common in theoretical analysis, but the basis functions change with the nodes, which changes the formula. The Newton interpolation algorithm overcomes this shortcoming (Varsamis and Karampetakis 2012). The Newton interpolation function is determined by independent variables and dependent variables. The first-order, second-order and kth-order divided differences are shown in formulas (2), (3) and (4) respectively. The Newton interpolation formula, defined in formula (5), is deduced from formulas (2), (3) and (4).
However, the interpolation results become unstable due to the high power of formula (5) during the interpolation process (Hlbach 1979).
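A standard divided-difference implementation of Newton interpolation, of the kind formulas (2)-(5) describe, can be sketched as follows:

```python
def newton_interpolate(xs, ys, x):
    """Newton interpolation via divided differences.
    After the update loop, coef[j] holds f[x_0, ..., x_j]."""
    n = len(xs)
    coef = list(ys)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Evaluate the Newton form with Horner's scheme.
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

# Exact on polynomials of matching degree: y = x^2 through (0,0), (1,1), (2,4).
print(newton_interpolate([0, 1, 2], [0, 1, 4], 3))  # 9.0
```

Adding a node only appends a new divided difference; the existing coefficients are unchanged, which is the advantage over the Lagrange form noted above.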

PCA algorithm
The PCA algorithm maps high-dimensional data to a low-dimensional space (Nejadkoorki and Baroutian 2011). The steps of the PCA algorithm are listed in Table 1.
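The four steps of Table 1 (center, covariance, eigendecomposition, top-k projection) can be sketched with NumPy:

```python
import numpy as np

def pca_reduce(X, k):
    """PCA following the steps in Table 1.
    X has shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)                    # step 1: center the data
    cov = np.cov(Xc, rowvar=False)             # step 2: covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # step 3: eigendecomposition
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # step 4: top-k eigenvectors
    return Xc @ top                            # projected data, shape (n_samples, k)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(pca_reduce(X, 2).shape)  # (100, 2)
```

`eigh` is used because the covariance matrix is symmetric; its eigenvalues come back in ascending order, hence the reversed argsort.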

NNP-BPNN model
In order to improve data utilization and prediction accuracy, the NNP-BPNN model is proposed for pollutant concentration prediction. In this paper, the INI algorithm is utilized to handle missing data, the AW method is calculated based on geographical location, and the Neighbor-PCA algorithm is used to deal with the data of the neighbor stations.

INI algorithm
There will be some outliers or even missing values in the data for various reasons. If we used the Newton interpolation algorithm directly, large errors would be introduced. The INI algorithm is given as formula (6).
f_newton(x) is the interpolation result of formula (5). f(x) is the expectation of the k samples. x_i (i = 1, 2, 3, …, k − 1) are the independent variables from the nearest k hours, and the data of the kth hour is the dependent variable. p_i is the probability, a constant equal to 1/k. If condition I holds, then the (k − 1) samples are used as the independent variables and the kth sample as the dependent variable to obtain f_newton(x); the expectation of the (k − 1) samples is used as the input to f_newton(x). If condition II holds, then f(x) is calculated by formula (7). The data before interpolation is named D_original, and the data after interpolation is named D.
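The INI idea of formulas (6) and (7) can be sketched as follows. The exact form of conditions I and II is not reproduced here; this sketch assumes a simple deviation bound as the plausibility test, with the expectation f(x) = (1/k) Σ samples (each sample weighted by p_i = 1/k) as the fallback:

```python
from statistics import mean

def _newton_interpolate(xs, ys, x):
    """Helper: Newton interpolation via divided differences."""
    n = len(xs)
    coef = list(ys)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

def ini_fill(samples, x_missing, threshold):
    """Sketch of INI: interpolate the missing value with Newton's
    polynomial built from the last k samples, but fall back to the
    expectation when the result looks implausible. The threshold test
    is an assumed stand-in for the paper's conditions I/II."""
    xs = list(range(len(samples)))
    y_newton = _newton_interpolate(xs, samples, x_missing)
    expectation = mean(samples)              # f(x), with p_i = 1/k
    if abs(y_newton - expectation) <= threshold:  # condition I (assumed form)
        return y_newton
    return expectation                       # condition II: fall back to f(x)
```

On a smooth stretch the polynomial value is kept; when it swings far from the local mean, which is exactly the instability of formula (5) noted earlier, the mean is used instead.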

Multiple stations data and the dimension reduction
In this part, we mainly introduce the data composition from multiple stations, Neighbor-PCA algorithm and the training process of BPNN model.

Multiple stations data and AW method
Due to different distances, neighbor stations have different impacts on the central station. The air pollutant concentration of all monitoring stations is recalculated by the AW method defined in formula (8).

Table 1 The steps of the PCA algorithm
Input: data X = {X_1, X_2, …, X_p}; target dimension k
Steps:
1. Center all the data: X_i − (1/p) Σ_{i=1}^{p} X_i
2. Calculate the covariance matrix of the dataset: XX^T
3. Perform eigenvalue decomposition of the matrix XX^T
4. Take the eigenvectors of the largest k eigenvalues
k_ij is the weight between the ith station and the jth station. d_ij is the distance between the ith station and the jth station; it is calculated from the latitude and longitude and defined in formula (9).
C is the latitude of the ith station, B is the longitude of the ith station, F is the latitude of the jth station and E is the longitude of the jth station. The closer the distance, the greater the impact on the central station: k_ij in formula (8) is a decreasing function of distance, so neighbor stations closer to the central station receive greater weights, enhancing the ability of the BPNN model.
D_i is the data of the ith station and its composition is given in formula (10). D is the collection of the data of the k stations, given in formula (11). data is D_i × k_ij where i ranges from 0 to n, as given in formula (12), and data_neighbor is D_i × k_ij where i ranges from 1 to n.
Compared with the traditional methods, we have considered the neighbor stations based on geographic information, which improves the accuracy of the BPNN model.
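The distance and weighting steps can be sketched as follows. The paper does not reproduce the exact forms of formulas (8) and (9) in this text, so the haversine great-circle distance and the weight 1/(1 + d) are assumptions standing in for them; both are decreasing-with-distance, as formula (8) requires:

```python
import math

EARTH_RADIUS_KM = 6371.0

def station_distance(lat_i, lon_i, lat_j, lon_j):
    """Great-circle distance from latitude/longitude (haversine formula),
    assumed here as the form of formula (9)."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    dphi = math.radians(lat_j - lat_i)
    dlmb = math.radians(lon_j - lon_i)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def station_weight(d_ij):
    """Assumed form of formula (8): a weight that decreases with distance,
    so closer neighbor stations influence the central station more."""
    return 1.0 / (1.0 + d_ij)
```

Each neighbor station's data D_i is then scaled by its weight k_ij before being combined with the central station's data.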

Dimension reduction process
If the dimension is too high or the sample size is too small, the BPNN is unable to learn the general rule (Meng and Meng 2010). In this paper, the PCA algorithm is used to handle data_neighbor.
The eigenvector extracted from data_neighbor using the PCA algorithm is named V, defined in formula (13). The data defined in formula (14) is used as the input to the BPNN. The three-layer BPNN structure mentioned above is applied in this paper. The number of nodes in the hidden layer is determined by formula (15), where m is the number of nodes in the input layer, q is the number of nodes in the output layer, and a is a constant from 1 to 10 (Wang et al. 2017). The relevancy, error and RMSE between the prediction and the actual value are defined in formulas (16), (17) and (18) respectively.

Modeling is performed in Matlab 7.0, and the data D is used as input to the models. We choose the traingda function as the gradient descent function and set the interval to 100. The learning rate is 0.1. We set 1000 as the largest number of training epochs and 0.0001 as the target error. The number of input nodes is 15, the number of output nodes is 1, and a is 8, so the number of hidden layer nodes is 12 according to formula (15). One half of the data is used for training and the other half for evaluating the model.
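Formula (15) is not printed in this text, but the commonly used rule h = sqrt(m + q) + a reproduces the paper's configuration (sqrt(15 + 1) + 8 = 12 hidden nodes), so it is assumed here; RMSE follows the standard definition of formula (18):

```python
import math

def hidden_nodes(m, q, a):
    """Assumed form of formula (15): h = sqrt(m + q) + a.
    With the paper's m = 15 inputs, q = 1 output, a = 8, this gives 12."""
    return round(math.sqrt(m + q) + a)

def rmse(pred, actual):
    """Root mean square error, formula (18)."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(pred))

print(hidden_nodes(15, 1, 8))  # 12
```

The same RMSE function serves as the evaluation criterion used to compare the three models in the experiments.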
The learning rate (lr) takes a value between 0 and 1 and determines the step size of each update iteration. If lr is too large, training oscillates easily; if lr is too small, convergence is too slow. The gradient descent method with a decaying learning rate can, to a certain extent, solve this problem and bring the results closer to the global minimum (Table 2).
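The training schedule described above, gradient descent with a periodically adjusted learning rate plus early stopping at the target error, can be sketched as follows. The paper fixes lr = 0.1, 1000 epochs, an interval of 100 and a target error of 0.0001; the decay factor 0.7 is an assumption for illustration:

```python
def train(step_fn, error_fn, lr=0.1, max_epochs=1000,
          target_error=0.0001, decay=0.7, interval=100):
    """Sketch of the training loop: step_fn(lr) performs one
    gradient-descent update, error_fn() returns the current error.
    Returns the epoch at which training stopped."""
    for epoch in range(1, max_epochs + 1):
        step_fn(lr)                      # one gradient-descent update
        if error_fn() <= target_error:   # early stopping at target error
            return epoch
        if epoch % interval == 0:        # periodic learning-rate adjustment
            lr *= decay
    return max_epochs
```

Early stopping caps the training time and reduces overfitting, while the decaying rate damps the oscillation that a large fixed lr would cause.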

Modeling of NN-BPNN model
Assuming the data of the ith hour is missing, we take the data from the (i−4)th to the (i−2)th hour, named x, and the data of the (i−1)th hour, named y, to calculate the Newton interpolation polynomial. The value of f_newton(x) can then be calculated. The value ranged from 0 to 1. The missing data can be obtained from formulas (6) and (7). Different weights are then assigned to each station's data (Table 3).

Modeling of NNP-BPNN model
Throughout this section, the PCA algorithm is performed on data_neighbor to obtain the data V; the NNP-BPNN model is then trained as described in Table 4.

Experiments and analysis
Here, we mainly talk about data in experiment and discussions about results.

Data description
Data from 16 stations in one city, including pollutant concentrations and meteorological parameters for 254 days from July 30, 2015 to April 10, 2016, have been employed in this paper. The basic information consists of station name, time, longitude and latitude. The pollutants consist of CO, SO2, O3, NO, NO2, NOX, PM2.5, PM10 and VOC. The meteorological parameters consist of relative humidity, wind direction, wind speed, temperature, pressure, visibility and total rainfall. Among them, the longitude and latitude are used to calculate the distance. A continuous 24 h of data is missing among the 16 stations.

Results and discussions
Three models have been established: the BPNN model, the NN-BPNN model and the NNP-BPNN model, in which the same data and parameters are used. Here, we first discuss the effectiveness of the NN-BPNN model with the INI algorithm and the AW method; then we discuss the validity of the NNP-BPNN model with the INI algorithm, the AW method and the Neighbor-PCA algorithm. The RMSE, defined in formula (18), is regarded as the evaluation criterion.

The effectiveness of NN-BPNN model
Figure 3 shows the comparison of RMSE between the BPNN model and the NN-BPNN model. In the neural network, the local minimum value of the error function is not the global minimum.

Conclusions
In this paper, we study how to handle missing data and how to utilize historical data effectively for prediction with a BPNN. There are several achievements in this paper. Firstly, the INI algorithm is adopted to deal with missing data, so as to avoid the waste of data. Secondly, the AW method not only enriches the experimental data, improving the utilization rate of the data, but also reduces the RMSE: the average RMSE of the NN-BPNN model is 18% lower than that of the BPNN model. Thirdly, the NNP-BPNN model with the INI algorithm, the AW method and the Neighbor-PCA algorithm improves the prediction accuracy and relevancy: the average RMSE of the NNP-BPNN model is 7% lower than that of the NN-BPNN model and 24% lower than that of the BPNN model.
There are still some limitations in this research. More dimension reduction methods should be studied in future work. Further research on the NNP-BPNN model is needed to find the global minimum error. Multiple machine learning models should be explored in the area of air pollutant concentration prediction.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Fig. 1 The neuron map of the BPNN
Fig. 2 The structure of the BPNN

In order to verify the effectiveness of the NNP-BPNN model, three models have been established: the BPNN model, the NN-BPNN model with the INI algorithm and the AW method, and the NNP-BPNN model with the INI algorithm, the AW method and the Neighbor-PCA algorithm.

Table 2
The steps of BPNN model

Table 3
The steps of NN-BPNN model Input Original data D orinalSteps 1.The missing value is calculated to obtain the data D according to formulas (6) and (7). 2. Calculate the distance d ij of each station according to the formula (9) 3. Calculate the weight k ij according to d ij and formula (10) 4. We can get the data according to the k ij and formula (11) 5.The data is used as the input for BPNN model 6.Calculate the number of hidden layer nodes of BPNN according to formula (15) and value of the error function is not the global minimum.Ten times of experiments have been performed to prove that the NN-BPNN model has a smaller value of sum of RMES than that of BPNN model, as shown in Fig.5.The RMSE of three models is shown in Fig.4.Meanwhile, experimental results on the concentration of PM 2.5 show that the sum of RMES of NNP-BPNN model is relatively minimal, as shown in Fig.5.The relevancy of the three models is shown in Table5.In ten times of experiments,the area of RMSE produced by BPNN model is the largest among that of three models, the average RMSE of NN-BPNN model is reduced by 18% than that ofBPNN model, theaverage RMSE of NNP-BPNN model is reduced by 24% than that of BPNN model and the average RMSE of NNP-BPNN model is reduced by 7% than that of NN-BPNN model.It took 20 s to implement BPNN model, it took 170 s to implement NN-BPNN model and it took 47 s to implement NNP-BPNN model.The time used by NNP-BPNN model is reduced by 72% than that of NN-BPNN model.Empirically, the NNP-BPNN model of statistical forecasting method based on historical monitoring data is simple and practical. minimum

Table 5
The value of relevancy