A Deep Learning-Based Multi-objective Optimization Model for PM2.5 Prediction

Air pollution caused by particulate matter with a diameter of less than 2.5 μm (PM2.5) poses a serious threat to human health and the environment. Predicting PM2.5 concentrations and controlling emissions are crucial for pollution prevention and control. This study proposes a comprehensive solution based on weight-sharing deep learning and multi-objective optimization. The proposed approach first uses a model that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) neural network to analyze data from 13 air quality monitoring stations in Xi'an City. By taking data from different monitoring stations as simultaneous inputs, the model extracts highly correlated spatiotemporal features, and the LSTM then produces accurate PM2.5 concentration predictions for specific monitoring stations. In addition, a multi-objective optimization model is established with maximum total emission reduction as its primary goal. This model takes four key factors into account: the total emission reduction, the emission reduction task, the government subsidy, and the total cost of emission reduction. Five classical intelligent optimization algorithms are employed to solve the model and obtain the PM2.5 emission reduction at each of the 13 monitoring stations. Experimental results demonstrate the effectiveness of the proposed prediction model, which achieves an average Root Mean Square Error (RMSE) of 12.820 and an average coefficient of determination (R²) of 0.907, outperforming all comparison models. The proposed model exhibits strong generalization ability, making it applicable under different temporal and spatial conditions, and it can be adapted to calculate emission reductions of other air pollutants. Finally, the multi-objective optimization model performs well in terms of total emission reduction. This study provides a new reference for the application of artificial intelligence to air pollution control, and its findings are significant for promoting public health and environmental protection.


Introduction
Air pollution poses a threat to health and has aroused strong public concern [1]. Among all kinds of air pollutants, PM2.5 is the most serious [2]. Existing studies show that an average increase of 100 μg/m³ in suspended particulate concentration can lead to a 14% increase in mortality [3]. According to data from the Health Effects Institute (HEI) and the Institute for Health Metrics and Evaluation (IHME), in 2019, 92% of the global population lived in areas where the annual average PM2.5 concentration exceeded the World Health Organization (WHO) guideline value (i.e., 10 μg/m³) [4]. The urban transportation sector is the main source of PM2.5 [5]. In addition, waste incineration, the conversion of volatile organic compounds, and meteorological factors also affect PM2.5 concentration [6]. Therefore, predicting PM2.5 concentrations and formulating PM2.5 emission reduction measures in advance have important theoretical and practical significance for preventing air pollution, supporting intelligent decision analysis and application [47], and protecting people's health and the ecological environment.
Predicting PM2.5 concentration is therefore of great significance. Commonly used prediction methods can be grouped into physical models and data-driven methods. Physical models are built on strict mathematical formulas and meteorological knowledge and can achieve good results for specific problems. However, they often generalize poorly, are highly complex, and take a long time to build [7][8][9]. Data-driven methods [10] exploit the correlation between historical air quality data and related factors and are simpler and more efficient than physical models. With the development of computer technology and the wide application of big data in the environmental field, researchers are increasingly focusing on data-driven models [11]. For example, in 2013 Kumar and Goyal [12] combined principal component analysis with multiple linear regression to predict the air quality of New Delhi, and in 2021 Janarthanan et al. [13] combined the gray level co-occurrence matrix with a Long Short-Term Memory (LSTM) neural network to predict the air quality of Chennai, India. These studies achieved good results.
With the continuous development of machine learning, data-driven machine learning models have become increasingly popular, and methods such as neural networks have been widely used in air quality prediction [14][15][16][17]. For example, Hu et al. [18] combined wavelet decomposition, simulated annealing, and a back propagation neural network to predict PM2.5 concentration, and Ibrir et al. [19] designed a PM2.5 concentration prediction model with good performance based on the Dragonfly algorithm and support vector machines. Among the various neural networks, the LSTM network has particular advantages in time series prediction because of its gate structure, and previous studies have shown that LSTM achieves good results in PM2.5 concentration prediction [20,21]. However, because the factors affecting air quality are diverse and complex, a single prediction model may fail to extract the spatiotemporal features in the data, making ideal prediction results difficult to achieve. Meanwhile, the Convolutional Neural Network (CNN), a popular deep learning model, has strong feature extraction capabilities for spatial information and has been successfully applied in image recognition and air quality prediction [24,25]. Some researchers have therefore combined CNNs with LSTM or its variant, the Gated Recurrent Unit (GRU), to predict PM2.5 concentration, with good results [22,23]. A combined CNN-LSTM model can thus better extract spatiotemporal features from the data and yield more accurate PM2.5 concentration predictions.
However, most existing data-driven machine learning models, whether single or hybrid, predict air quality by optimizing the objective function of a specific prediction task and often ignore potential nonlinear spatial correlations between air quality monitoring stations. Hybrid models also tend to have too many parameters and low training efficiency. To make better use of spatial correlation, extract spatiotemporal features, and improve prediction, it is appropriate to borrow the idea of multi-task learning [26]. Multi-task learning and the related idea of weight sharing have achieved satisfactory results in speech recognition [27], image processing [28], and sequence processing [29].
Based on the predicted PM2.5 concentrations, the amount of PM2.5 emission reduction must then be calculated, taking emission reduction costs and government subsidies into account. Current research on PM2.5 emission reduction mainly relies on econometric methods or on models and software from physical chemistry. For example, Wu et al. [30] used a two-way fixed effects regression model to study the relationship between the promotion of electric vehicles and the reduction of PM2.5 emissions. Xu et al. [31] combined surface NH3 measurements, satellite NH3 observations, and GEOS-Chem simulations, and their results showed that reducing NH3 emissions plays an important role in PM2.5 emission reduction. Hu et al. [32] used a nested grid air quality forecasting model system to analyze the temporal evolution of air pollutants and their components in the Beijing-Tianjin-Hebei region, with good results. These methods are highly targeted and solve specific practical problems well, but they generalize poorly and are costly. Moreover, multi-objective operational research models formulated mathematically have hardly been used to calculate PM2.5 emission reduction. Multi-objective optimization models have, however, been widely used in research on wind power generation [33], petroleum [34], air pollutant concentration prediction [14], and the optimization of metal-working parameters [48], which demonstrates the soundness and feasibility of the approach. Applying a multi-objective optimization model to the calculation of PM2.5 emission reduction is therefore a new idea.
To sum up, this paper considers the correlation between the regions where different monitoring stations are located, the time series characteristics of air quality data, the complexity of model parameters, and the correlation and influence between different regions, and proposes a PM2.5 concentration prediction and emission reduction calculation model based on weight-sharing and multi-objective optimization. The proposed method has important theoretical and practical significance: it provides a new reference for research on PM2.5 concentration prediction and emission reduction calculation, and a useful idea for air pollution control and environmental protection.
The four innovations of this paper are as follows:
1. A PM2.5 concentration prediction model is constructed. It uses deep learning and multi-objective optimization technology to analyze the data of 13 air quality monitoring stations in Xi'an and predict PM2.5 concentration. Specifically, a convolutional neural network and a long short-term memory neural network are combined to extract highly correlated spatiotemporal features and predict the PM2.5 concentration at each monitoring station.
2. A multi-objective optimization model of PM2.5 emission reduction is designed with maximum total emission reduction as the primary goal; it comprehensively considers four factors: the total emission reduction, the emission reduction task, the government subsidy, and the total cost of emission reduction. Five classical intelligent algorithms are used to solve the model and obtain the PM2.5 emission reduction of the 13 monitoring stations.
3. The proposed method considers the correlation between the regions of different monitoring stations, the time series characteristics of air quality data, the complexity of model parameters, and the correlation and influence between different regions, which improves prediction accuracy and emission reduction effectiveness.
4. The proposed method provides a new and effective way to predict PM2.5 concentration and calculate emission reduction, which is of great significance for air pollution control and environmental protection. The paper also offers suggestions for extending and verifying the applicability of the model so that it can be applied to other cities and regions.
The rest of the paper is organized as follows. Sect. 2 describes the study area and the datasets used. Sect. 3 describes the theory and methods used in this paper. Sect. 4 describes the overall process of the proposed model. Sect. 5 presents the empirical study and analyzes and discusses the experimental results in detail. Finally, Sect. 6 draws conclusions.

Study Area and Data
Xi'an, the capital of Shaanxi Province, is located in northwest China, in the middle of the Guanzhong Plain, bounded by the Weihe River to the north and the Qinling Mountains to the south. By the end of 2022, Xi'an had 11 districts and 2 counties under its jurisdiction, a total area of 10,752 square kilometers, and a permanent population of 12.9959 million. Air pollution in Xi'an has been so severe that it was once the most polluted city in China. To control air pollution, the city has set up 13 air quality monitoring stations in various districts. These 13 stations, numbered 1462A to 1474A, are the research object of this paper. Their geographical distribution, drawn with ArcGIS software, is shown in Fig. 1.
Six kinds of air quality data collected by the 13 monitoring stations were obtained from the China National Environmental Monitoring Station. The data cover January 1 to December 30, 2020, at an hourly sampling frequency. For monitoring station p, the original data in period t can be expressed as Eq. (1):

$$X_p(t) = \left[x^{p}_{i,j}(t)\right]_{n \times m} \tag{1}$$

In Eq. (1), X_p(t) is an n × m matrix, meaning that there are n records and each record has m features. In this paper, n is the number of records in one year and m = 6. For example, the first record in the data set of the first monitoring station, 1462A, can be expressed as $X_1(0) = \left[x^{1}_{1}(0)\right]_{1 \times 6}$. Taking monitoring station 1462A as an example, part of the original data is given in Table 1.

Research Methods and Models
In this section, the methods used in this paper are briefly introduced, including the principle and construction process of the CNN-LSTM model based on weight-sharing, and the construction process of the multi-objective optimization model for PM 2.5 concentration emission reduction.

Input Layer
Following the data preprocessing method in reference [37], after preprocessing, the data sets of all monitoring stations except the one to be predicted are used as training data. The data set of the monitoring station to be predicted is divided into a training set, a validation set, and a test set in a 6:2:2 ratio and fed into the model.
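The data arrangement described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each station's records are stored as a list of rows in time order and that the 6:2:2 split is chronological (the text does not say whether the split is chronological or random). The function name `split_target_station` and the dictionary layout keyed by station ID are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]  # one hourly record with m = 6 features


def split_target_station(
    data: Dict[str, List[Row]],
    target: str,
    ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
) -> Tuple[List[Row], List[Row], List[Row], List[Row]]:
    """Return (other_stations, target_train, target_val, target_test).

    Assumption: the target station's rows are in time order and are split
    chronologically (no shuffling) in the given train:val:test ratio.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Every station except the target contributes its full data set as training data.
    others = [row for sid, rows in data.items() if sid != target for row in rows]
    rows = data[target]
    n_train = int(len(rows) * ratios[0])
    n_val = int(len(rows) * ratios[1])
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder (about 20%) is held out for testing
    return others, train, val, test
```

For example, with the 13 station IDs generated as `f"{1462 + k}A" for k in range(13)`, calling `split_target_station(data, "1462A")` pools the other 12 stations and splits station 1462A's own records 6:2:2. A chronological split avoids leaking future hours into training, which is why it is assumed here.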

Weight-Sharing Layer
The air quality data of monitoring stations located within a small area are strongly spatially correlated. In this paper, a CNN is used to extract these correlations. A CNN is a feedforward neural network with special connections that is suitable for processing data with a Euclidean (grid-like) structure. The hidden layers of a CNN usually consist of convolutional layers, pooling layers, and fully connected layers. According to [10], a pooling layer reduces the number of parameters but also discards some feature information. Similarly, a fully connected layer loses location information and thereby destroys the long-term dependence characteristics of the time series. To avoid these problems, only the convolutional layers of the CNN are used as the weight-sharing layer of the prediction model.
For any monitoring station p, feature extraction with the CNN proceeds as follows. Step 1: convolution calculation.
The main function of the convolutional layer is to extract local information: a convolution kernel of a given size slides over successive local regions of the input data. The computation is given by Eq. (2):

$$X^{l+1}_{p,t,c} = \sum_{i=1}^{C_{in}} \sum_{k=1}^{K} W^{l}_{p,k,i,c}\, X^{l}_{p,(t-1)s+k,\,i} + B^{p}_{c} \tag{2}$$

In Eq. (2), p is the index of the monitoring station; X^l is the two-dimensional input matrix of size L × C_in, where L is the input length and C_in is the number of input channels; X^{l+1}_{p,t,c} is the t-th element in the c-th channel of layer l + 1; W^l_{p,k,i,c} is the k-th weight coefficient of the i-th channel of the c-th convolution kernel; B^p_c is the bias coefficient of the corresponding convolution kernel; K is the kernel size; and s is the stride of the convolution kernel.
Step 2: activation. The activation layer applies a non-linear transformation to its input, and a model with non-linear activations has more expressive power than a linear model. Commonly used activation functions include tanh, sigmoid, and ReLU. ReLU is widely used because it converges quickly and avoids the gradient saturation problem of the sigmoid function. It is defined in Eq. (3):

$$\mathrm{ReLU}(x) = \max(0, x) \tag{3}$$

When ReLU is used as the activation function, the post-activation output is given by Eq. (4):

$$X^{l+1}_{p} = \mathrm{ReLU}\left(W^{l}_{p} * X^{l}_{p} + B^{p}\right), \quad l = 0, 1, \ldots, L-1 \tag{4}$$

In Eq. (4), X^{l+1}_p is the output matrix of monitoring station p at layer l + 1, which serves as input to the next convolutional layer; W^l_p * X^l_p + B^p denotes the convolution of Eq. (2); and L (here, unlike in Eq. (2)) is the total number of layers of the network, excluding the input layer.
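Eqs. (2)-(4) and the weight-sharing idea can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not the paper's TensorFlow implementation: it uses no padding, stores the kernel as an array of shape (K, C_in, C_out), and realizes weight sharing by applying one and the same `(w, b)` to every station's input matrix. The helper names are hypothetical.

```python
from typing import Dict

import numpy as np


def conv1d(x: np.ndarray, w: np.ndarray, b: np.ndarray, stride: int = 1) -> np.ndarray:
    """Eq. (2): x is (L, C_in), w is (K, C_in, C_out), b is (C_out,).

    Returns (L_out, C_out) with L_out = (L - K) // stride + 1 (no padding assumed).
    """
    K = w.shape[0]
    L_out = (x.shape[0] - K) // stride + 1
    out = np.empty((L_out, w.shape[2]))
    for t in range(L_out):
        window = x[t * stride : t * stride + K]  # local region of shape (K, C_in)
        # Sum over kernel positions k and input channels i for every output channel c.
        out[t] = np.einsum("ki,kic->c", window, w) + b
    return out


def relu(z: np.ndarray) -> np.ndarray:
    """Eq. (3): element-wise max(0, z)."""
    return np.maximum(0.0, z)


def shared_conv_layer(
    stations: Dict[str, np.ndarray], w: np.ndarray, b: np.ndarray
) -> Dict[str, np.ndarray]:
    """Eq. (4) with weight sharing: the SAME (w, b) is applied to every station."""
    return {p: relu(conv1d(x, w, b)) for p, x in stations.items()}
```

With the 1 × 1 kernel used later in the parameter settings (K = 1), the convolution reduces to a per-time-step linear map across the six input channels, so `conv1d(x, w, b)` equals `x @ w[0] + b`; it mixes features within each hour without blurring the time axis, which is consistent with the text's concern about preserving temporal dependence.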

LSTM Layer
LSTM is an improvement of the Recurrent Neural Network (RNN) [38]. RNNs are prone to vanishing and exploding gradients during training, and LSTM effectively alleviates this problem. In an LSTM, each neuron acts as a memory cell, and a "gate" mechanism controls the state of the memory cell by adding information to it or removing information from it.
The LSTM consists of a forget gate, an input gate, an output gate, and an internal memory cell, which jointly control its output. For monitoring station p, the LSTM works as follows at time t.

Step 1: The forget gate selectively discards old information. The attenuation coefficient f_t is calculated by Eq. (5):

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x^{p}_{i}(t)] + b_f\right) \tag{5}$$

In Eq. (5), σ is the sigmoid activation function, which outputs a number between 0 and 1; x^p_i(t) is the input vector at time t; and h_{t-1} is the output (prediction) vector at time t − 1.

Step 2: The input gate determines how much new information is added to the cell state. i_t is the weight given to the new information, and the tanh function generates the candidate cell state, as shown in Eqs. (6) and (7):

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x^{p}_{i}(t)] + b_i\right) \tag{6}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x^{p}_{i}(t)] + b_C\right) \tag{7}$$

Step 3: The old cell state C_{t-1} is updated to the new cell state C_t, as shown in Eq. (8):

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{8}$$

Step 4: The output gate produces the prediction vector h_t at time t. O_t expresses the trade-off between the historical information and the initially fused current input information, as shown in Eqs. (9) and (10):

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x^{p}_{i}(t)] + b_o\right) \tag{9}$$

$$h_t = O_t \odot \tanh(C_t) \tag{10}$$

In Eqs. (5)-(10), W_f, W_i, W_C, W_o and b_f, b_i, b_C, b_o are the weight matrices and bias vectors of the corresponding gates, [·, ·] denotes vector concatenation, and ⊙ denotes element-wise multiplication. The structure of the LSTM is shown in Fig. 2.

Output Layer
After the input layer, the weight-sharing layer, and the LSTM layer, a fully connected layer produces the predicted PM2.5 concentration for each air quality monitoring station. This prediction is used as the upper bound of emission reduction in the PM2.5 concentration multi-objective emission reduction model and thus forms the basis for solving that model.

PM2.5 Concentration Emission Reduction Multi-objective Optimization Model

Constraints
Suppose n air quality monitoring stations are established in a city, corresponding to n areas, each emitting a different concentration of PM2.5. Under the influence of meteorological factors, part of the PM2.5 emitted in the region corresponding to each monitoring station spreads to other regions, and the rest remains in the region. To achieve optimal emission reduction, a multi-objective optimization model of PM2.5 concentration emission reduction is established. It is a collaborative emission reduction model across time and space that considers both the emission status of the previous period and the interaction between the n regions. Suppose that PM2.5 emission reduction was carried out in period t − 1. In the current period t, the model uses the predicted total PM2.5 concentration produced in each of the n regions, the PM2.5 concentrations moving into region i from the other regions, and the PM2.5 concentrations moving out of region i into the other regions. In period t, the PM2.5 emission reductions of the n regions are E(t) = [e_1(t), e_2(t), …, e_n(t)]^T; these are the variables to be solved, and each assignment of values constitutes an emission reduction scheme for period t. The constraints are described below.
A candidate solution of the model is E(t) = [e_1(t), e_2(t), …, e_n(t)]^T, where e_i(t) is the PM2.5 emission reduction in region i in period t. Because a negative e_i(t) makes the objective function value meaningless, the emission reduction of each region must be non-negative, as shown in Eq. (11):

$$e_i(t) \ge 0, \quad i = 1, 2, \ldots, n \tag{11}$$

In period t, the PM2.5 emission reduction e_i(t) in region i should be greater than the emission reduction left unfinished in the previous period and smaller than the total concentration in region i in the current period, as shown in Eqs. (12) and (13).
PM_V_i is the set of other regions affected by region i. The model assumes that each region is affected by all other regions, so the set of regions affected by region i in period t is given by Eq. (14):

$$\mathrm{PM\_V}_i(t) = \{1, 2, \ldots, n\} \setminus \{i\} \tag{14}$$

PM_V_0 is the regional influence coefficient: under the joint influence of all regions, the ratio of the absolute value of the residual PM2.5 concentration in region i after emission reduction to the sum of the PM2.5 concentrations of all regions in period t must be less than PM_V_0. Its value is a given constant determined by the actual situation, generally PM_V_0 ∈ (0, 1), as shown in Eq. (15). PM_U is the regional emission reduction coefficient: the ratio of the emission reduction of region i in period t to the PM2.5 concentration remaining in the region after the effect of inflow must be greater than PM_U. Its value is likewise a given constant determined by the actual situation, generally PM_U ∈ (0, 1), as shown in Eq. (16).
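Because Eqs. (12), (13), (15), and (16) are described only in words here, the following feasibility check is a hedged sketch of how they might be evaluated for a candidate E(t). Several readings are assumptions and are flagged in the comments: the hypothetical inputs `own`, `inflow`, and `outflow` stand for the predicted production, inflow, and outflow of each region; the "remaining concentration after inflow" is taken as own + inflow − outflow; the "sum of PM2.5 concentrations in all regions" is taken as the sum of `own`; and the upper bound of Eq. (13) is taken as `own[i]`. The defaults PM_V_0 = 0.05 and PM_U = 0.01 are the values used in the case study.

```python
from typing import List, Optional, Sequence


def violated_constraints(
    e: Sequence[float],
    own: Sequence[float],
    inflow: Sequence[float],
    outflow: Sequence[float],
    prev_unfinished: Optional[Sequence[float]] = None,
    pm_v0: float = 0.05,
    pm_u: float = 0.01,
) -> List[str]:
    """List the constraints a candidate E(t) violates (an empty list means feasible).

    own[i]     : predicted PM2.5 produced in region i in period t (hypothetical name)
    inflow[i]  : PM2.5 moving into region i from the other regions
    outflow[i] : PM2.5 moving out of region i into the other regions
    """
    prev = prev_unfinished if prev_unfinished is not None else [0.0] * len(e)
    total = sum(own)  # assumed meaning of "sum of PM2.5 concentrations in all regions"
    problems: List[str] = []
    for i, e_i in enumerate(e):
        remaining = own[i] + inflow[i] - outflow[i]  # assumed: concentration after migration
        if e_i < 0:
            problems.append(f"region {i}: Eq. (11) non-negativity")
        if e_i < prev[i] or e_i > own[i]:  # assumed bounds for Eqs. (12)-(13)
            problems.append(f"region {i}: Eqs. (12)-(13) range")
        if abs(remaining - e_i) / total >= pm_v0:
            problems.append(f"region {i}: Eq. (15) regional influence")
        if remaining <= 0 or e_i / remaining <= pm_u:
            problems.append(f"region {i}: Eq. (16) regional reduction ratio")
    return problems
```

A function like this can serve as a penalty term when the model is handed to a population-based solver, since GA, DE, PSO, AFSA, and CS all search freely and need some way to discourage infeasible candidates.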

Objective function
(1) Objective function of total emission reduction control. In Eq. (17), f_1(E(t)) is the total PM2.5 emission reduction control function over the n regions in period t.
(2) Objective function of emission reduction task control. In Eq. (18), f_{2i}(e_i(t)) is the emission reduction task control function of region i in period t.
(3) Objective function of government subsidy control. In Eq. (19), f_{3i}(e_i(t)) is the government subsidy utility control function of region i in period t, and q_i is the subsidy coefficient of region i, determined by the actual government policy.
(4) Objective function of total cost control. In Eq. (20), f_{4i}(e_i(t)) is the total cost control function of region i in period t, and c_i(t) is the unit PM2.5 emission reduction cost of region i, determined by the actual emission reduction cost.

Multi-objective Optimization Model of PM2.5 Concentration Emission Reduction
The ultimate goal of the multi-objective optimization model established in this paper is to control the PM2.5 emission of each region within the associated area so as to minimize the impact of PM2.5 on the atmospheric environment of that area. The objective function comprises four index functions: total emission reduction control, emission reduction task control, government subsidy control, and total cost control, with weights decreasing in that order, so that total emission reduction control carries the largest weight. The complete multi-objective optimization model of PM2.5 concentration emission reduction is given in Eq. (21).
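One common way to combine the four index functions with strictly decreasing weights is a weighted sum, sketched below. This is an assumption for illustration: the exact form of Eq. (21) is not reproduced in this section, and the stand-in objectives in `example_objectives` are hypothetical (total reduction, a penalty for an unmet task `task`, subsidy utility with coefficients `q`, and negated cost with unit costs `c`). The weights in the usage note are illustrative, not the paper's.

```python
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def scalarize(objectives: Sequence[Objective], weights: Sequence[float]) -> Objective:
    """Weighted sum of objectives (to be maximized); weights must strictly decrease."""
    if len(objectives) != len(weights):
        raise ValueError("need one weight per objective")
    if any(later >= earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must strictly decrease (f1 largest, f4 smallest)")

    def combined(e: Sequence[float]) -> float:
        return sum(w * f(e) for w, f in zip(weights, objectives))

    return combined


def example_objectives(
    task: Sequence[float], q: Sequence[float], c: Sequence[float]
) -> List[Objective]:
    """HYPOTHETICAL stand-ins for f1..f4; the paper's Eqs. (17)-(20) may differ."""
    return [
        lambda e: sum(e),                                          # f1: total reduction
        lambda e: -sum(max(0.0, t - x) for t, x in zip(task, e)),  # f2: unmet task penalty
        lambda e: sum(qi * x for qi, x in zip(q, e)),              # f3: subsidy utility
        lambda e: -sum(ci * x for ci, x in zip(c, e)),             # f4: cost, negated
    ]
```

For instance, `scalarize(example_objectives([15, 15], [0.1, 0.2], [1.0, 2.0]), [0.4, 0.3, 0.2, 0.1])` evaluated at E = [10, 20] gives 0.4·30 + 0.3·(−5) + 0.2·5 + 0.1·(−50) = 6.5. Negating the cost lets all four terms be maximized together; a lexicographic ordering would be an alternative reading of "decreasing weights".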

Parameter Settings of the Model and Algorithm
For PM2.5 concentration prediction, the S-CNN-LSTM model established in this paper has two convolutional layers and two LSTM layers; the convolution kernel size is 1 × 1, each LSTM layer has 100 neurons, and the whole network is trained for 100 iterations. Four models, MLP, SVR, LSTM, and CNN-LSTM, are used for comparison. CNN-LSTM uses exactly the same model parameters as S-CNN-LSTM. SVR uses the Gaussian kernel function with penalty factor C = 1.0. MLP has two hidden layers with 100 neural units each, and LSTM likewise has two hidden layers with 100 neural units each; both MLP and LSTM are trained for 100 iterations. In the S-CNN-LSTM, MLP, SVR, and LSTM models, the data of all other monitoring stations are used to jointly train the model for each monitoring station, whereas in the CNN-LSTM model only each station's own data set is used for training.
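To make the parameter budget of this configuration concrete, the sketch below counts trainable parameters with the standard formulas used by Keras-style Conv1D, LSTM, and Dense layers. Two assumptions are not from the text and are labeled as such: the number of convolution filters (64 here, hypothetical) and the sharing scheme, in which the two convolutional layers are shared by all 13 stations while each station keeps its own two LSTM layers and output layer. The input has m = 6 features.

```python
def conv1d_params(kernel: int, c_in: int, filters: int) -> int:
    # kernel * c_in weights per filter, plus one bias per filter
    return kernel * c_in * filters + filters


def lstm_params(input_dim: int, units: int) -> int:
    # four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    return input_dim * units + units


def s_cnn_lstm_params(
    n_stations: int = 13, m: int = 6, filters: int = 64, units: int = 100, shared: bool = True
) -> int:
    """Total parameters for n_stations models (filters=64 is a hypothetical value)."""
    conv = conv1d_params(1, m, filters) + conv1d_params(1, filters, filters)
    head = lstm_params(filters, units) + lstm_params(units, units) + dense_params(units, 1)
    # Shared: one copy of the conv layers for all stations; otherwise one copy each.
    return (conv if shared else n_stations * conv) + n_stations * head
```

Under these assumptions, 13 independent CNN-LSTMs need 1,964,417 parameters and the shared variant needs 1,909,121, so sharing only the 1 × 1 convolutions saves 12 × 4,608 = 55,296 parameters; the LSTM layers dominate the count.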
The multi-objective emission reduction model of PM2.5 concentration is solved with five classical algorithms: the Genetic Algorithm (GA) [39], Differential Evolution (DE) [40], Particle Swarm Optimization (PSO) [41], the Artificial Fish Swarm Algorithm (AFSA) [42], and the Cuckoo Search algorithm (CS) [43]. To verify the soundness and feasibility of the model, the parameter settings of the five optimization algorithms are listed in Table 2.
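As an illustration of one of the five solvers, the following is a compact DE/rand/1/bin sketch in plain Python. It is a textbook variant, not the authors' implementation, and its defaults (population 20, F = 0.5, CR = 0.9, 200 generations) are illustrative rather than the settings of Table 2. DE minimizes, so a maximization objective such as a weighted sum of f1..f4 would be passed in negated, possibly with a penalty for infeasible candidates; the box bounds would be [0, upper bound of each region].

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    F: float = 0.5,
    CR: float = 0.9,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` inside the box `bounds` with DE/rand/1/bin.

    Returns (best_vector, best_value). Defaults are illustrative, not Table 2.
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = [row[:] for row in pop], fit[:]
        for j in range(pop_size):
            # Mutation: three distinct individuals, all different from j.
            a, b, c = rng.sample([k for k in range(pop_size) if k != j], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == j_rand or rng.random() < CR:  # binomial crossover
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    v = pop[j][d]
                trial.append(min(max(v, lo), hi))  # clip back into the feasible box
            f_trial = objective(trial)
            if f_trial <= fit[j]:  # greedy one-to-one selection
                new_pop[j], new_fit[j] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]
```

Selection is synchronous (each generation is built from the previous one), which matches the classic formulation; the clipping step is one simple way of keeping e_i(t) within its bounds.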

A Weight-Sharing and Multi-objective Optimization-Based Model for Predicting PM 2.5 Concentrations and Calculating Emission Reductions
In summary, a PM 2.5 concentration prediction and emission reduction calculation model based on weight-sharing and multi-objective optimization is established in this paper.The calculation process of the model includes the following four steps.
The flowchart of the four steps is shown in Fig. 3.

Results and Discussion
In this section, the experimental design is first described, including the parameter settings of all models and algorithms and the evaluation metrics of the models. Then, following the overall process given in Sect. 4, a case study of PM2.5 concentration prediction and of solving the multi-objective optimization model of PM2.5 concentration emission reduction is carried out to verify the soundness, feasibility, and superiority of the proposed method.

Model Training
The models in this article are implemented with the TensorFlow deep learning framework. Training was performed on a server with an Intel(R) Xeon(R) Platinum CPU, an NVIDIA Tesla T4 GPU, and the CentOS operating system. For a fair comparison, all models were trained with the same training parameters, starting from scratch, for 400 epochs.

Model Evaluation Index
RMSE and R² are commonly used metrics for evaluating the performance of predictive models. RMSE is an absolute, scale-dependent error expressed in the units of the target variable, so it is only comparable across models evaluated on the same dataset; for a given dataset, a smaller RMSE indicates better performance. R² measures the overall goodness of fit, and the closer it is to 1, the better the model's performance. In the experiments of this article, RMSE and R² are used to evaluate and compare the models.
Although RMSE and R² are widely used, they have some drawbacks. Because RMSE squares the errors, a few large errors caused by outliers can dominate it, and R² can likewise be distorted by outliers even when most points are fitted well. In practical applications, multiple evaluation metrics should therefore be considered together, and the most suitable criteria should be chosen for the specific problem.
In related research, some scholars use other metrics, such as the Mean Absolute Error (MAE), Median Absolute Error (MdAE), and Mean Absolute Percentage Error (MAPE), to evaluate model performance. For example, Chen et al. [44] use MAE and the Mean Percentage Error (MPE) to evaluate predictive performance, and Li et al. [45] use MAE, MAPE, and the Median Absolute Percentage Error (MdAPE).
In this article, the predictive performance of each model is evaluated on the test set with the Root Mean Square Error (RMSE) and the Coefficient of Determination (R²). For monitoring station p, whose original data set is expressed as in Eq. (1), the two metrics are calculated with Eqs. (22) and (23):

$$\mathrm{RMSE} = \sqrt{\frac{1}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(x^{p}_{i,j} - \hat{x}^{p}_{i,j}\right)^{2}} \tag{22}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(x^{p}_{i,j} - \hat{x}^{p}_{i,j}\right)^{2}}{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(x^{p}_{i,j} - \bar{x}^{p}_{j}\right)^{2}} \tag{23}$$

In Eqs. (22) and (23), mn is the total number of test samples, x^p_{i,j} and \hat{x}^p_{i,j} are the true and predicted values of the test data, respectively, and \bar{x}^p_j is the mean of the j-th column of the data set.
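Eqs. (22) and (23) can be computed directly; below is a minimal plain-Python sketch. The matrix layout (rows are samples, columns are features) follows Eq. (1); for a single predicted quantity such as PM2.5, each row simply has one column. The function names are illustrative.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # n rows (samples) x m columns


def _check(true: Matrix, pred: Matrix) -> None:
    if not true or len(true) != len(pred) or any(len(a) != len(b) for a, b in zip(true, pred)):
        raise ValueError("true and pred must be non-empty matrices of the same shape")


def rmse(true: Matrix, pred: Matrix) -> float:
    """Eq. (22): square root of the mean squared error over all m*n entries."""
    _check(true, pred)
    sq = [(t - p) ** 2 for rt, rp in zip(true, pred) for t, p in zip(rt, rp)]
    return math.sqrt(sum(sq) / len(sq))


def r2(true: Matrix, pred: Matrix) -> float:
    """Eq. (23): 1 - SS_res / SS_tot, with SS_tot measured from each column's mean."""
    _check(true, pred)
    n, m = len(true), len(true[0])
    col_mean = [sum(row[j] for row in true) / n for j in range(m)]
    ss_res = sum((t - p) ** 2 for rt, rp in zip(true, pred) for t, p in zip(rt, rp))
    ss_tot = sum((row[j] - col_mean[j]) ** 2 for row in true for j in range(m))
    return 1.0 - ss_res / ss_tot
```

For true values [10, 20, 30, 40] and predictions [12, 18, 33, 37] (one column each), the squared errors sum to 26, so RMSE = √6.5 ≈ 2.55 and R² = 1 − 26/500 = 0.948. The per-station values obtained this way would then be averaged over the 13 stations to give figures such as the average RMSE reported in Table 3.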

PM 2.5 Concentration Prediction Results
The weight-sharing CNN-LSTM (S-CNN-LSTM) was used to predict the PM2.5 concentration in the regions corresponding to the 13 air quality monitoring stations, with CNN-LSTM, LSTM, SVR, and MLP as comparison models. Taking monitoring stations 1462A and 1474A as examples, the predicted and true values of all models are plotted in Fig. 4. The prediction performance of all models at the 13 monitoring stations is shown in Table 3.
Figure 4 shows that at monitoring stations 1462A and 1474A, the predictions of the S-CNN-LSTM model are closest to the true values, indicating that the model effectively extracts the spatiotemporal features and the correlations between different monitoring stations and achieves the best prediction performance.
Table 3 shows that the S-CNN-LSTM model has the best overall prediction performance, with an average RMSE of 12.820 and an average R² of 0.907 across the 13 monitoring stations. Although MLP, SVR, and LSTM are trained on the same data set as S-CNN-LSTM, they cannot extract the correlations between different monitoring stations, so their predictions are less accurate. CNN-LSTM has the same model parameters as S-CNN-LSTM but is trained only on each monitoring station's own data set, so its overall performance is inferior; S-CNN-LSTM outperforms CNN-LSTM at 10 of the 13 monitoring stations. At monitoring stations 1464A, 1469A, and 1472A, however, S-CNN-LSTM performs slightly worse than CNN-LSTM. This is likely because the features extracted by S-CNN-LSTM tend to be those common to all 13 monitoring stations, so at a particular station its predictions may be less accurate than those of the station-specific CNN-LSTM. Nevertheless, S-CNN-LSTM generalizes better than CNN-LSTM, as demonstrated later in this paper.

PM 2.5 Concentration Reduction Results
Based on the PM2.5 predictions obtained above for the regions corresponding to the 13 air quality monitoring stations, the last month of the test set (December 2020) is taken as an example. The daily PM2.5 concentrations of the 13 monitoring stations in December were summed to obtain the total PM2.5 concentration for the whole month, and this total was taken as the upper bound (i.e., the maximum emission reduction) of the multi-objective optimization model of PM2.5 concentration emission reduction. The GA, DE, PSO, AFSA, and CS algorithms were each used to solve the model and calculate the PM2.5 concentration reduction scheme of the 13 monitoring stations for the next month (January 2021). The emission reduction cost c and subsidy coefficient q of the 13 monitoring stations are shown in Table 4, with PM_V0 = 0.05 and PM_U = 0.01. The final PM2.5 emission reduction results are shown in Table 5, the emission reductions of the five algorithms are compared in Fig. 5, and their convergence curves are shown in Fig. 6.
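The upper-bound step above is simple arithmetic; the sketch below shows it under one reading of the text: the daily values are summed separately for each station, consistent with the earlier statement that each station's prediction serves as its own upper bound. The dictionary layout and the function name `monthly_upper_bounds` are hypothetical.

```python
from typing import Dict, Sequence


def monthly_upper_bounds(
    daily: Dict[str, Sequence[float]], days_in_month: int = 31
) -> Dict[str, float]:
    """Sum each station's daily PM2.5 values over one month (December has 31 days)."""
    bounds: Dict[str, float] = {}
    for station, values in daily.items():
        if len(values) != days_in_month:
            raise ValueError(f"{station}: expected {days_in_month} values, got {len(values)}")
        bounds[station] = float(sum(values))
    return bounds
```

The resulting values would supply the upper end of each region's search interval [0, bound] when a solver such as DE is applied to the January 2021 scheme.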
It can be seen from Table 5 that all five optimization algorithms produced good emission reduction schemes. Taking monitoring station 1462A as an example, the DE algorithm yields the best PM2.5 concentration reduction scheme, with an optimal emission reduction of 58.191 mg/m³, followed by CS, PSO, and GA. According to Fig. 5, the PM2.5 emission reductions calculated by the CS algorithm are generally superior to those of the other four algorithms and closest to the actual emission, which suggests that CS is the algorithm best suited to the multi-objective optimization model for PM2.5 emission reduction proposed in this paper. According to Fig. 6, the CS algorithm converges earliest and its convergence curve decreases fastest, indicating that it has the fastest convergence speed and is least prone to becoming trapped in local optima; its overall performance is better than that of the comparison algorithms.
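To make the comparison in Fig. 5 and Fig. 6 more concrete, the sketch below shows a standard cuckoo search: Lévy flights generated with Mantegna's algorithm, followed by abandonment of nest components with probability Pa. The defaults follow Table 2 (N = 100 nests, Pa = 0.25, β = 1). The paper's objective function (built from q, c, $PM_{V_0}$, and $PM_U$) is not reproduced in this section, so the sketch minimizes an arbitrary user-supplied function over box bounds; maximizing total emission reduction would correspond to passing its negative. The step-size factor `alpha = 0.01`, the iteration count, and the particular abandonment variant are assumptions, not values reported by the authors.

```python
import math
import random

def levy_step(beta, rng):
    """Draw one Levy-flight step length with Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)

def cuckoo_search(f, lower, upper, n_nests=100, pa=0.25, beta=1.0,
                  alpha=0.01, iters=200, seed=0):
    """Minimise f over the box [lower, upper].

    Returns (best_x, best_f, history); history holds the best value after
    each generation, i.e. a convergence curve like those plotted in Fig. 6.
    """
    rng = random.Random(seed)

    def clip(x):
        return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

    nests = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
             for _ in range(n_nests)]
    fit = [f(x) for x in nests]
    k = min(range(n_nests), key=fit.__getitem__)
    best_x, best_f = nests[k][:], fit[k]
    history = [best_f]

    def try_replace(i, cand):
        # Greedy selection: a nest is only replaced by a better solution.
        nonlocal best_x, best_f
        fc = f(cand)
        if fc < fit[i]:
            nests[i], fit[i] = cand, fc
            if fc < best_f:
                best_x, best_f = cand[:], fc

    for _ in range(iters):
        # 1) Levy flights: perturb each nest, scaled by its distance to the best nest.
        for i in range(n_nests):
            cand = clip([x + alpha * levy_step(beta, rng) * (x - xb) * rng.gauss(0.0, 1.0)
                         for x, xb in zip(nests[i], best_x)])
            try_replace(i, cand)
        # 2) Discovery: each component is abandoned with probability pa and
        #    rebuilt by a random walk along the difference of two random nests.
        for i in range(n_nests):
            p, q = rng.randrange(n_nests), rng.randrange(n_nests)
            r = rng.random()
            cand = clip([x + r * (xp - xq) if rng.random() < pa else x
                         for x, xp, xq in zip(nests[i], nests[p], nests[q])])
            try_replace(i, cand)
        history.append(best_f)
    return best_x, best_f, history

# Toy check on a 3-D sphere function (not the paper's objective).
sphere = lambda x: sum(xi * xi for xi in x)
best_x, best_f, curve = cuckoo_search(sphere, [-5.0] * 3, [5.0] * 3, iters=100)
print(best_f, len(curve))
```

With β = 1 the Mantegna step reduces to a Cauchy-distributed variable, so occasional long jumps help escape local optima, which is the property usually credited for CS's behaviour on convergence plots like Fig. 6.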
In summary, all the algorithms achieved good results, which further demonstrates the solvability, scientific soundness, and feasibility of the established model. Combining a multi-objective optimization model with intelligent-algorithm solvers provides a new idea and reference for research on PM2.5 concentration emission reduction.

Verification of the Generalization of the Prediction Model
The preceding analysis shows that the overall prediction performance of the proposed S-CNN-LSTM model is better than that of all the comparison models, and that the established multi-objective optimization model for PM2.5 emission reduction is scientific and feasible. To analyze the prediction performance of the S-CNN-LSTM model in more depth, data from four cities near Xi'an, namely Baoji, Xianyang, Weinan, and Tongchuan, are used below, corresponding to monitoring stations 1931A, 1919A, 1940A, and 1923A, respectively. Detailed information on the four monitoring stations is given in Table 6.
Using the data of the 13 monitoring stations in Xi'an together with the data of the four monitoring stations above, PM2.5 concentrations are predicted for each of the four stations. Taking the first 400 samples of the test set as an example, the predicted and true values are compared in Fig. 7.
Figure 7 shows that the S-CNN-LSTM predictions for Baoji, Xianyang, Weinan, and Tongchuan are close to the true values, including at the peaks and valleys of the data curves, indicating that the proposed S-CNN-LSTM model generalizes well. Using RMSE and R² as evaluation indexes, Fig. 8 compares the prediction performance of the S-CNN-LSTM and CNN-LSTM models in the four cities. According to Fig. 8, the S-CNN-LSTM model outperforms the CNN-LSTM model in all four cities, which further confirms its good generalization.

Conclusions
In this study, a method based on weight-sharing deep learning and multi-objective optimization is proposed for PM2.5 concentration prediction and emission reduction calculation. The S-CNN-LSTM model was constructed to predict PM2.5 concentrations, intelligent algorithms (GA, DE, PSO, AFSA, and CS) were applied to solve the multi-objective optimization model, and PM2.5 concentration emission reduction schemes were obtained for the regions where the different monitoring stations are located. Experimental results show that the proposed method outperforms the comparison models in both prediction accuracy and generalization. Taking Xi'an City as an example, the S-CNN-LSTM model is applied to the hourly data set for 2020 collected by 13 air quality monitoring stations in Xi'an. In summary, the PM2.5 concentration prediction method based on the S-CNN-LSTM model, together with the proposed multi-objective optimization model, provides an effective solution for PM2.5 concentration prediction and emission reduction calculation. This study has important theoretical and practical significance for cities with dense networks of air quality monitoring stations. However, for large-scale studies with sparse air quality monitoring stations, or studies spanning multiple cities, the influence of the correlation between monitoring stations on prediction accuracy needs further consideration.
The results of this paper have important theoretical and practical significance for public health and environmental protection. Future research could apply this method to predict the concentrations of other air pollutants, such as PM10 and SO2, and to calculate their emission reductions. In addition, the model and algorithms can be further optimized to improve prediction accuracy and emission reduction effectiveness. Finally, the research scope can be extended to explore how to improve prediction accuracy, and promote practical application, when air quality monitoring stations are sparse or spread across multiple cities.

Fig. 1
Fig. 1 Geographical distribution map showing the locations of the meteorological monitoring points in Xi'an City

Here, $W_f$, $W_i$, $W_c$, $W_o$ and $b_f$, $b_i$, $b_c$, $b_o$ denote the weight matrices and bias terms, respectively; $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent activation function.
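The symbols $W_f$, $W_i$, $W_c$, $W_o$, $b_f$, $b_i$, $b_c$, $b_o$, $\sigma$, and $\tanh$ defined above are those of the LSTM cell. For reference, the standard LSTM gate equations that use them are given below; the concatenated input $[h_{t-1}, x_t]$ and the element-wise product $\odot$ follow the usual textbook notation and may differ in detail from the authors' own formulation.

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $c_t$ is the cell state, and $h_t$ is the hidden state passed to the next time step.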

Table 2
Parameters of the five optimization algorithms
DE: variation factor F = 0.5; crossover probability = 0.3; dimension of each individual = 13; population size N = 100
GA: mutation probability = 0.01; dimension of each individual = 13; population size N = 100
PSO: acceleration coefficients c1 = c2 = 0.5; particle dimension = 13; number of particles N = 100
AFSA: maximum sensing range of fish = 0.3; maximum displacement ratio = 0.5; attenuation coefficient of sensing range = 0.98; crowdedness threshold = 0.5; maximum number of predation attempts = 100; dimension of each individual = 13; population size N = 100
CS: number of nests N = 100; optimization dimension = 13; probability of egg discovery Pa = 0.25; β = 1
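As an illustration of how the DE settings in Table 2 enter the search, the sketch below implements the classic DE/rand/1/bin scheme with F = 0.5, CR = 0.3, and a population of N = 100. The mutation strategy, the bound clipping, the in-place (asynchronous) population update, and the iteration count are assumptions, since the table does not state them, and the toy objective stands in for the paper's emission reduction model.

```python
import random

def differential_evolution(f, lower, upper, n_pop=100, F=0.5, CR=0.3,
                           iters=200, seed=0):
    """Minimise f over [lower, upper] with DE/rand/1/bin; returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n_pop)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        for i in range(n_pop):
            # Mutation: three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(n_pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one component comes from the mutant
            trial = []
            for d in range(dim):
                # Binomial crossover with probability CR.
                if d == j_rand or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lower[d]), upper[d])  # clip to the bounds
                else:
                    v = pop[i][d]
                trial.append(v)
            # Selection: the trial replaces the target (in place) if it is no worse.
            ft = f(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    k = min(range(n_pop), key=fit.__getitem__)
    return pop[k], fit[k]

# Toy run in 13 dimensions, matching the decision-variable count in Table 2.
sphere = lambda x: sum(xi * xi for xi in x)
x_best, f_best = differential_evolution(sphere, [-5.0] * 13, [5.0] * 13, iters=50)
print(f_best)
```

A low crossover rate such as CR = 0.3 changes only a few of the 13 station variables per trial, which tends to favour separable objectives; this is a general property of DE rather than a claim about the paper's model.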

Fig. 3
Fig. 3 Flowchart of the PM2.5 concentration prediction and emission reduction calculation model based on weight sharing and multi-objective optimization

Fig. 4
Fig. 4 Comparison of predicted and true values for all models at the 1462A and 1474A air quality monitoring stations in Xi'an

At the same time, the data of monitoring stations 1931A, 1919A, 1940A, and 1923A in Baoji, Xianyang, Weinan, and Tongchuan were used to verify the generalization of the S-CNN-LSTM model. The results show the following. (1) Compared with the MLP, SVR, and LSTM models, which do not use a CNN, the models that use a CNN significantly improve prediction accuracy owing to feature extraction. (2) Compared with CNN-LSTM, the S-CNN-LSTM model is better at extracting the highly correlated features common to all air quality monitoring stations, which gives it higher prediction accuracy and stronger generalization. In experiments on data from the 13 air quality monitoring stations in Xi'an, the S-CNN-LSTM model shows better prediction performance, with an average Root Mean Square Error (RMSE) of 12.820 and a fitting coefficient (R²) of 0.907, outperforming the traditional models. In addition, in the generalization experiments, the S-CNN-LSTM model outperforms the CNN-LSTM model in all four cities (Baoji, Xianyang, Weinan, and Tongchuan), indicating good generalization ability. (3) For multi-objective optimization, a model maximizing the total emission reduction was established and solved with the GA, DE, PSO, AFSA, and CS algorithms, yielding PM2.5 concentration emission reduction schemes for the different monitoring stations. The CS algorithm performs best on this multi-objective optimization model and obtains PM2.5 emission reductions closest to the actual emission. This verifies that the model is scientific and feasible.

Table 1
Original air quality data of monitoring station 1462A

A convolutional neural network (CNN) is used to extract spatially correlated features. To reduce the number of model parameters, the principle of multi-task learning is applied so that the CNN's weights are shared. The data of all monitoring stations are fed into the CNN together to extract the spatiotemporal features common to different monitoring stations, which improves the generalization of the model. Then, for each monitoring station, a Long Short-Term Memory (LSTM) neural network produces the final PM2.5 concentration prediction for that station.
(a) Step 1: After the raw data are preprocessed, the data sets of all monitoring stations are input into the CNN model for feature extraction, yielding spatiotemporal features applicable to all monitoring stations.
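The parameter saving that motivates weight sharing can be checked by simple counting. The sketch assumes hard parameter sharing, i.e. one 1-D convolution applied to every station's input followed by a separate LSTM and a one-unit output layer per station, and it uses hypothetical layer sizes (kernel 3, 6 input features, 32 filters, 64 hidden units) because the actual sizes are not given in this section. LSTM parameters are counted with one bias vector per gate, matching $b_f$, $b_i$, $b_c$, $b_o$.

```python
def conv1d_params(kernel, c_in, c_out):
    """1-D convolution: kernel * c_in weights per filter plus one bias per filter."""
    return kernel * c_in * c_out + c_out

def lstm_params(n_in, hidden):
    """Four gates (f, i, c, o), each with a weight matrix over [h, x] and a bias."""
    return 4 * (hidden * (n_in + hidden) + hidden)

def dense_params(n_in, n_out):
    """Fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

def total_params(n_stations, shared_cnn, kernel=3, n_features=6,
                 filters=32, hidden=64):
    """Per-station CNN-LSTM models vs. one shared CNN with per-station LSTMs.

    All layer sizes are hypothetical defaults, not values from the paper.
    """
    cnn = conv1d_params(kernel, n_features, filters)
    head = lstm_params(filters, hidden) + dense_params(hidden, 1)
    if shared_cnn:
        return cnn + n_stations * head      # S-CNN-LSTM style: CNN counted once
    return n_stations * (cnn + head)        # CNN-LSTM style: one full model per station

separate = total_params(13, shared_cnn=False)
shared = total_params(13, shared_cnn=True)
print(separate, shared, separate - shared)
```

With these sizes, sharing removes (S − 1) = 12 copies of the 608-parameter CNN, i.e. 7,296 parameters for 13 stations. The absolute saving grows with the size of the shared CNN, while the main benefit described in the text, learning features common to all stations, does not depend on the count.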

Table 4
Emission reduction costs and subsidy coefficients of the 13 monitoring stations

Table 5
Average value of the optimal emission reduction scheme at each monitoring point in January 2021 (mg/m³)

Table 6
Details of the four monitoring stations