Abstract
Atmospheric chemistry transport models have been extensively applied in aerosol forecasts over recent decades, whereas they are facing challenges from uncertainties in emission rates, meteorological data, and over-simplified chemical parameterizations. Here, we developed a spatial-temporal deep learning framework, named PPN (Pollution-Predicting Net for PM2.5), to accurately and efficiently predict regional PM2.5 concentrations. It has an encoder-decoder architecture and combines the preceding PM2.5 observations and numerical weather prediction. Besides, the model proposes a weighted loss function to promote the forecasting performance in extreme events. We applied the proposed model to forecast 3-day PM2.5 concentrations over the Beijing-Tianjin-Hebei region in China on a three-hour-by-three-hour basis. Overall, the model showed good performance with R2 and RMSE values of 0.7 and 17.7 μg m−3, respectively. It could capture the high PM2.5 concentration in the south and relatively low concentration in the north and exhibit better performance within the next 24 h. The use of the weighted loss function decreased the level of “high values underestimation, low values overestimation”, while incorporating the preceding PM2.5 observations into the encoder phase improved the predictive accuracy within 24 h. We also compared the model result with that from a state-of-the-art numerical model (WRF-Chem with pollutant data assimilation). The temporal R2 and RMSE from the WRF-Chem were 0.30−0.77 and 19−45 μg m−3 while those from the PPN model were 0.42−0.84 and 15−42 μg m−3. The proposed model shows powerful capacity in aerosol forecasts and provides an efficient and accurate tool for early warning and management of regional pollution events.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Introduction
Fine particulate matter (PM2.5, with an aerodynamic diameter of <2.5 µm) has been identified as a key component of atmospheric pollution in China due to large amounts of precursor emissions and unfavorable meteorological conditions1, especially during cold seasons. Inhaling PM2.5 of high concentration poses a serious threat to public health, leading to increased risks of cardiovascular and respiratory diseases2. Therefore, accurate prediction of PM2.5 concentrations is indispensable both for public health warnings and emission controls.
The main approaches of PM2.5 prediction can be divided in two categories: i.e., numerical models and statistical models. Numerical models, such as the WRF-Chem3, CMAQ4, and GEOS-Chem5, describe the essential atmospheric physical and chemical processes through mathematical formulas, which establish the explicable relationship among atmospheric components, meteorological parameters, and emissions in spatial and temporal dimensions. These numerical models, termed Chemistry Transport Models (CTMs), have been extensively used in PM2.5 prediction6,7, source apportionment8,9,10 and mechanism analysis11,12,13 worldwide. However, large uncertainties arise from emission rates14, meteorological data15, as well as the over-simplified chemical parameterizations16, may result in certain deviations from the observed PM2.5 concentrations. To reduce model uncertainties, data assimilation has been extensively utilized to provide more precise initial chemical conditions by incorporating in-situ pollutant observations and satellite retrievals into CTMs17,18,19,20,21. It showed beneficial effects on PM2.5 forecasts for up to 24 h while the benefit from data assimilation diminished rapidly with forecast range17,21. In addition, low computational efficiency is another disadvantage of the CTMs, which usually take several hours for one simulation day with fine resolution22.
Statistical methods essentially establish the relationship between multiple predictors and historically observed PM2.5 concentrations through regression models or machine learning (ML) algorithms, which do not involve the complex physical and chemical processes and usually have less demand for computing resources compared to the CTMs. Initially, linear regression models were applied to PM2.5 forecast and showed reasonable accuracy23. However, the linear assumption in these models sometimes might not precisely capture the connection between predictors and air pollutant concentrations. In recent years, ML algorithms have gained popularity in air quality forecast due to their strong power in effectively handling nonlinear relationship between predictors and targets24. For example, Ma et al.25 utilized the XGBoost algorithm to predict PM2.5 concentrations in Shanghai and reported a correlation coefficient (R) of 0.77 and Root Mean Square Error (RMSE) of about 12–18 μg m−3 for 24 h ahead prediction. Bi et al.26 developed a PM2.5 forecast system by combing the random forest algorithm with CTM model results in central China, which had R2 of 0.76 and 0.64 for the next 2 days. These ML methods show good capacities in capturing non-linear relationships between features and aerosol concentrations at single sites. However, air pollutant forecasting is usually a regional issue related to both spatial relations and time sequences. The traditional ML methods are still difficult to resolve the complex spatiotemporal correlations27.
Nowadays, deep learning (DL) networks show remarkable capabilities in forecasting air pollutant variations27,28 due to their successful applications in dealing with nonlinear spatiotemporal correlations and advances in computing resources. DL is based on multiple-layer neural networks. A kind of the representative models is Convolutional Neural Network (CNN) - Recurrent Neural Network (RNN) architectures in which CNNs extract features and spatial relations29,30 while RNNs handle the temporal dependencies31,32. Long Short-Term Memory (LSTM) is a type of RNN specially designed for overcoming gradient vanishing and exploding when managing long-term dependencies33, which has been extensively employed in spatial-temporal forecasting of air pollutant30,34,35,36,37. For example, Yan et al.30 built a hybrid DL network (CNN-LSTM) to predict PM2.5 concentrations in next 6 h in Beijing based on historical PM2.5 data, meteorological indicators, and spatiotemporal data. Pak et al.35 combined the CNN-LSTM model with a spatiotemporal feature vector to reflect the relations among parameters to forecast the next day’s daily PM2.5 concentrations in Beijing and outperformed the traditional CNN-LSTM method. Yeo et al.36 integrated the gated recurrent unit (GRU) algorithm, a similar architecture as LSTM but has fewer parameters, with a CNN accounting for the geographical correlation of nearby stations. Using the new method, they improved the 24-h PM2.5 prediction in Seoul by about 10%. Overall, the DL-based model is a promising and efficient tool for PM2.5 forecasts. However, the proposed DL models in above studies are usually for forecasting air pollution concentrations in next day on urban scale, which may not fully meet the requirements in practical applications.
In contrast to previous DL architectures, we develop a more advanced spatial-temporal DL model for short-range (0–72 h) PM2.5 forecast on regional scale, named air Pollution-Predicting Net for PM2.5 (PPN). Our model injects the feature variables in different convolutional layers in terms of their impacts on PM2.5, to imitate the behavior of CTMs. In addition, the observed PM2.5 concentrations over multiple preceding time-steps are also included to provide accurate initial field of PM2.5 forecast, like the Four-Dimensional Data Assimilation (FDDA) in CTMs. Thus, the PPN model integrates the strengths of DL network, CTMs and assimilation and is expected to achieve better performance in forecasting regional PM2.5 concentration with wide forecast time range. We apply the PPN model for forecasting hourly PM2.5 concentrations over a highly populated and industrialized region in China. Firstly, model performance for PM2.5 forecasts in both temporal and spatial distributions is evaluated, which is followed by impacts of weighted loss function and preceding observed PM2.5. Besides, we also compare the PPN model performance with the WRF-Chem results. Finally, model structure and data information are displayed.
Results
Overall performance
Using the trained PPN model and predictors, we forecasted the PM2.5 concentrations in January 2022 at every 9 km × 9 km grid cell over the Beijing-Tianjin-Hebei (BTH) region. Figure 1a displays the scatter plot of predicted and observed PM2.5 concentrations for January 2022 with the initial forecast time of UTC 00 every day. As we see, the R2 and RMSE values are 0.70 and 17.7 μg m−3, respectively. The result for June 2022 when PM2.5 pollution was light is also shown in Supplementary Fig. 1, with R2 and RMSE values of 0.49 and 6.9 μg m−3. The lower R2 and RMSE in comparison to that in winter may be a result of low PM2.5 concentration in June with small fluctuation. There is little difference among forecast results from different initial forecast time (Supplementary Table 1). The fairly good performance achieved by the PPN model suggests that it is suitable for forecasting PM2.5 concentrations. Aerosol pollution is usually severe in winter due to stagnation and intensive emissions38, so we mainly focus on model evaluation for the results in January 2022 in the following.
Figure 2 shows the spatial distributions of observed and forecasted PM2.5 concentrations. It can be found that the heavily polluted region is in the south with average concentration exceeding 100 μg m−3. The spatial pattern of PM2.5 over the BTH region is predominantly related to emission, meteorological and orographic factors39. The southern region is more industrialized than the northern region, resulting in higher emission rates of PM2.5 and its precursors. North winds can blow away haze, leading to more clean air in the north region, while south winds facilitate pollutant accumulation along mountains in the south. These determinant factors are all considered in the model training process. Therefore, the trained model can accurately capture the spatial pattern of PM2.5 with higher values in the south and lower values in the north over the BTH region.
It should be noted that RMSEs exhibit higher values in the southern region (up to 40−50 μg m−3) and lower values in the north (about 10−30 μg m−3) (Fig. 2c). Although higher base PM2.5 concentration in the south could explain part of it, the underpredicted PM2.5 with MB values of −15−−5 μg m−3 in the south may be the primary cause of high RMSEs. We further analyze the temporal variations of biases over three representative cities (Beijing, Shijiazhuang and Handan) from north to south (Supplementary Fig. 2). It is found that large biases in Handan, a southern city, mainly occurred around January 12 and 25 when cold air was moving from the north to the south (Supplementary Fig. 3). It dispelled the haze in the north whereas blew pollutants to the southern region. With regards to the southern cities, the “polluted” north winds are beneficial to increasing PM2.5 concentrations. However, the PPN model still has difficulty in capturing PM2.5 that is rapidly transported from a long distance. This deficiency may be caused by the consideration of a small range of transport process (45 × 45 km) in a timestep with limited computing resources.
We also compare observations with predicted PM2.5 concentrations for 0−72 h in advance, as shown in Fig. 3 (orange lines). The predicted mean PM2.5 concentrations are about 65 μg m−3 over the whole region, which are slightly lower than observations (Figs. 3a and d). The PPN results present decreased R2 and increased RMSE with forecast lead time, and the space-average values of 0.57−0.74 and 12−18 μg m−3, respectively. It is worth noting that the model performance gets worse with forecast lead time. This downward trend is particularly obvious during the 0–24 h and becomes stable after 24 h. As shown in Fig. 3b, the spatial R2 is about 0.74 at the initial forecast time, then rapidly reduced to 0.58 at the forecast lead time of 24 h. The RMSE value is about 12 μg m−3 in the beginning, then decreases to 17 μg m−3 at 24 h (Fig. 3c). The model performance levels off during the forecast lead time of 24−72 h with R2 of about 0.57 and RMSEs of 17−18 μg m−3, which is still an acceptable result. In a word, the model exhibits better performance at 0−24 h and achieves stable and reasonable prediction capacity when the forecast time extends to 3 days.
As mentioned in “Introduction”, ML algorithms have been extensively applied in air quality forecast due to strong capacity in effectively dealing with nonlinear relationship. Here, we compare the PPN model results with the other three models based on Random Forest, XGBoost and Multilayer Perceptron algorithms, called RF, XGB and MLP for short, respectively. Random Forest and XGBoost are both ensemble learning algorithms based on decision trees for classification and regression, and have been employed in pollutant dataset construction40 and air quality forecast25,26. Multilayer Perceptron is a basic algorithm for neural networks. In order to emphasize the advantage of the deep network of PPN, we employed a shallow network of the MLP model for comparison. The MLP model has 2 hidden layers of 32 and 8 neurons. We trained the RF, XGB and MLP models using the same features and time periods as those in the PPN model. In addition, each grid over the BTH region was utilized in the model training process, which do not fully consider the spatiotemporal correlations between the predictors and the target. The comparison results show that our PPN model has a prediction performance superior to the other three models in forecasting PM2.5 variations over 13 cities during January 2022 (Supplementary Fig. 4 and Fig. 5). This is highly related to the utilization of multi-convolution layers to capture the spatial relationship among grid cells and LSTM layers for temporal variations, as well as the introduction of preceding observation restraint and weighted loss function that will be elaborated in the following.
Impacts of weighted loss function
From the above, we can conclude that the proposed PPN model can well capture the spatiotemporal variability of PM2.5 concentration over the BTH region for the next 3 days. Interpolated PM2.5 data based on site observations were applied as fitting targets to acquire the gridded results. However, the interpolation may introduce biases into the target dataset, especially in regions with sparsely or unevenly distributed sites. In this study, we introduced a weighted loss function in the model training process to eliminate the adverse effects of PM2.5 interpolation. Detailed information about the weighted loss function can be found in “Methods” section at the end. The influence of the weighted loss function is evaluated in this section to illustrate the innovation of the PPN model. To achieve this, we conducted a comparative experiment that uses Mean Square Error (MSE) as loss function and do not consider the Inverse Distance Weighted (IDW)-based loss function adopted in the PPN model, called PPN_no_IDW for short. The comparison between PPN and PPN_no_IDW results under different PM2.5 levels is shown in Fig. 4. The PM2.5 classification is based on the Technical Regulation on Ambient Air Quality Index (HJ 633–2012). Here, MB is utilized as the evaluation metric instead of RMSE, to better differentiate between overprediction and underprediction. The comparison is based on results from the 114 monitoring sites over the BTH where direct observations are available.
From Fig. 4, we find that the PPN model tends to overpredict PM2.5 concentrations under clean condition (PM2.5 concentration ≤35 μg m−3) while underprediction occurs when the pollution level aggravates. The “high values underestimation, low values overestimation” phenomenon was also previously reported by Mao et al.41, suggesting the difficulty of capturing extreme values by deep learning algorithms at present. By comparison, the PPN model with weighted loss function outperforms the PPN_no_IDW and decreases the level of “high values underestimation, low values overestimation”. The improvement by the PPN is significant when PM2.5 concentration is ≤35 μg m−3 (clean condition) and >115 μg m−3 (moderate or heavy polluted condition). The probability of MB values >15 μg m−3 from the PPN model under clean condition is much less than that from the PPN_no_IDW (Fig. 4a). Under moderate or heavy polluted condition, the probability of large bias is also reduced by the PPN model. The average MB value by PPN is −28 μg m−3, whereas that from the PPN_no_IDW is −35 μg m−3 (Fig. 4d). The improvement by the PPN model is less significant when PM2.5 concentration is ≤115 μg m−3 and >35 μg m−3 (Fig. 4b, c). Overall, the PPN model, which considers more information of in-situ observations in the loss function, can achieve higher prediction accuracy at the monitoring sites.
Impacts of preceding observation restraint
To imitate the FDDA behavior for CTMs, another key feature of the PPN model is to consider the preceding PM2.5 observations within the encoder phase. To verify the significance of the observation restraint, our study also established a model similar to the PPN framework but without the Preceding Observation restraint in the encoder phase (called PPN_no_PO). Figure 1b shows the scatterplot of observed and predicted PM2.5 concentrations by PPN_no_PO for the test dataset. Compared to the PPN results, the PPN_no_PO model shows poorer performance with a lower R2-value of 0.63 and larger RMSE value of 19.9 μg m−3. Figure 2 also displays the predicted PM2.5 concentration by the PPN_no_PO for multi-step ahead prediction (green lines) and the comparison with the standard PPN result. The spatial R2 and RMSE values from the PPN_no_PO result remain at about 0.57 and 18 μg m−3 while those from the PPN model are 0.57−0.74 and 12−18 μg m−3, respectively. The better performance of the PPN model is readily evident during the 0−24 h forecast lead time. It highlights the importance of considering the preceding PM2.5 concentration in the encoder phase and indicates that the proposed PPN model is more advanced in PM2.5 forecast than traditional DL methods.
We also compare the PPN and PPN_no_PO results over the monitoring sites since most of these sites are close to human settlements and susceptible to attracting public attention. The predicted temporal characteristics of PM2.5 by PPN and PPN_no_PO over 13 cities of the BTH region are shown in Supplementary Fig. 6. The corresponding evaluation metrics are shown in Fig. 5. The comparisons in Supplementary Fig. 6 and Fig. 5 are based on results in 0−24 h forecast lead time when the difference between PPN and PPN_no_PO is most prominent. In general, the PPN-predicted PM2.5 concentrations are in good agreement with the observations at the monitoring sites. It well captured two regional pollution events during Jan 6−12 and Jan 22−26. The values of R2 and RMSE are 0.42−0.84 and 15−42 μg m−3, respectively. Similar to the previous results, the PPN predicted PM2.5 in the northern cities (BJ, TJ, LF, ZJK, CD, TS, QHD) performs better than those over the southern region (BD, SJZ, HD, CZ, HS). By comparison, the PPN architecture exhibits better performance than PPN_no_PO in forecasting PM2.5 concentration over most monitoring sites with larger R2 and smaller RMSE (Fig. 5). The performance improvement is more significant in the northern cities, such as BJ, TJ, LF and TS, with RMSE values decreased by 20−28% compared with those from PPN_no_PO. However, the reduction ratios are just 3−6% over the southern cities (HD, XT, HS).
Comparison with the WRF-Chem results
To further evaluate the performance of the PPN model in forecasting PM2.5 concentrations, we also compare it to the state-of-the-art CTM WRF-Chem model. In contrast to the PPN model, the WRF-Chem model integrated with data assimilation is built on the numerical methods that fully takes into account the atmospheric physical and chemical processes. It is a more interpretable tool for air quality forecast. Supplementary Fig. 7 shows the time series of predicted PM2.5 by PPN and WRF-Chem over monitoring sites during January 2022, and the corresponding evaluation indices are displayed in Fig. 6. The temporal R2 and RMSE values from the WRF-Chem are 0.30−0.77 and 19−45 μg m−3 while those from the PPN model are 0.42−0.84 and 15−42 μg m−3. Among 13 cities over the BTH, the RMSEs from the PPN model are 1−35% lower than those from the WRF-Chem. The reduction ratios in 10 cities are >10%. The PPN forecast result in June 2022 is also superior to the WRF-Chem result with larger R2 and lower RMSE (Supplementary Figs. 8 and 9). Therefore, we conclude that the PPN model outperforms the WRF-Chem. Moreover, we also evaluate the model capacity in forecasting PM2.5 concentration at different forecast lead times over the monitoring sites, as illustrated in Fig. 7. It is noted that the average RMSE from the PPN model is 25 μg m−3 at 0−24 h which is 17% lower than that from the WRF-Chem (30 μg m−3) (Fig. 7a). The reduction ratios slightly decrease to 14−16% with increased forecast lead time (24−72 h). The average R2-values of the PPN model are 0.61−0.68, which are higher than those of the WRF-Chem model (0.53−0.57) (Fig. 7d). No matter what the forecast lead time is, the PPN model outperforms the WRF-Chem in forecasting PM2.5 variations. We also find that the PPN model exhibits better predictive capacity particularly under clean and good air quality conditions (PM2.5 < 75 μg m−3), up to 25% lower than the RMSE from WRF-Chem. However, the maximum reduction ratio on polluted days is just 13%.
Summary and discussion
In this study, we developed a new spatial-temporal PPN model for short-range PM2.5 forecasting and applied it to the BTH region. The model separated the input features into “local” or “non-local” layers according to their direct impacts on PM2.5 to imitate the behavior of CTMs. We trained and validated the PPN model with meteorological, emission and derived data during 2020 and 2021, and utilized the trained model to predict PM2.5 variations in January 2022, which is served as the testing dataset. The predictive ability of the model was evaluated on a test dataset with observations.
Overall, the proposed model exhibits good accuracy in predicting spatial and temporal variations of PM2.5 concentrations over the BTH, with R2 and RMSE values of 0.70 and 17.7 μg m−3, respectively. Regarding spatial characteristics, the model successfully captures the high PM2.5 concentrations in the south and relatively low concentrations in the north. The model shows better performance in the next 24 h with spatial R2 and RMSE of 0.58−0.74 and 12−17 μg m−3, respectively. Then the performance gets stable as the forecast lead time increases. In a word, the model has good capacity in forecasting spatial and temporal variability of PM2.5 over the BTH.
The consideration of the weighted loss function and preceding PM2.5 observation in the encoder phase are two innovative points in the model structure. To further explore the advantage of the PPN model, we conducted two sensitivity tests, in which the aforesaid two points are not taken into account. Results show that the PPN model with weighted loss function outperforms the PPN_no_IDW and decreases the level of “high values underestimation, low values overestimation”. Considering the preceding PM2.5 concentration into encoder phase greatly improves the results in the next 24 h with R2 increased from 0.57 to 0.58−0.74 and RMSE decreased from 18 μg m−3 to 12−17 μg m−3. This improvement is also evident over the monitoring stations, especially over the northern cities. We also compare the model performance with the state-of-the-art WRF-Chem results. By comparison, the PPN model shows better predictive accuracy than WRF-Chem. The temporal R2 and RMSE values from the WRF-Chem are 0.30−0.77 and 19−45 μg m−3 while those from the DL model are 0.42−0.84 and 15−42 μg m−3. This better performance exists within all the forecasting lead times.
Spatial-temporal DL algorithms exhibit powerful capacity in dealing with non-linear correlations, making them suitable for air quality forecast. Several hybrid DL networks (CNN-LSTM and Graph Convolutional-LSTM) have already been successfully applied in PM2.5 forecast in winter Beijing with RMSE values of about 22−53 μg m−3 in the first 6 h30 and 24 μg m−3 for next 24 h42, respectively. Unlike the PPN model, these two results are based on a single or several surrounding sites in Beijing. The corresponding RMSEs from PPN are 11−17 μg m−3. Although the comparison is based on different forecast period, the better performance of PPN can still demonstrate the advanced network structure that imitates the behavior of CTMs with inclusion of the preceding PM2.5 observations and weighted loss function. In summary, the PPN model demonstrates strong potential for the application of spatiotemporal PM2.5 forecast over the BTH region, and it is an efficient and accurate tool for regional PM2.5 forecast.
The application of the PPN model could be extended to other components of air pollution, e.g., ozone and nitrogen oxides, with their related meteorological data and emission data as the input parameters. Moreover, it should be noted that we designed the PPN model for short-range air quality forecasts. For medium-range (over 5 days) forecasts on a day-to-day basis, spatial information would be more important than short-range one due to long-distance transport. Therefore, there would be a different DL model for the medium-range air quality forecasts.
Methods
Model structure
PPN has an encoder-decoder architecture and uses the PredRNN43 as the backbone network for capturing the spatiotemporal variation for PM2.5 concentrations. The PredRNN uses multi convolution layers to capture the spatial relationship among grid cells and LSTM layers for temporal variations. It is an updated version of ConvLSTM44, which adds a shortcut connecting the last convolutional layer and the first convolutional layer between adjacent timesteps. The model structure is shown in Fig. 8.
Spatially, the model is designed following the order of aerosol-related processes in CTMs. There is one convolutional layer with the convolutional kernel sizes of 1 × 1 (9 × 9 km), representing the local processes (i.e., chemical and turbulence diffusion processes). Two non-local layers follow, representing the short-distance transport and long-distance transport, with 3 × 3 (27 × 27 km) and 5 × 5 (45 × 45 km) respectively. Rather than considering all input features in the same layer as the PredRNN, the PPN framework firstly separate the feature variables into two parts, i.e., local variables and non-local variables (Table 1), according to their main physical-chemical effects on aerosol. The local variables including emissions, temperature, humidity, precipitation, sea level pressure (SLP) and planetary boundary layer height (PBLH), etc., which are more directly to affect local PM2.5 formation and depletion, thus are gone into the first “local layer”. These variables concerning synoptic patterns and transport process, such as wind, geopotential height at 700 hPa and terrain height, introduce to the PPN model at the beginning of “short transport layer”. It should be noted that now the PPN model only considers a slightly small range of transport process (within 5 × 5 grid cells) in a timestep due to the limitation of our computing resources. Therefore, large-scale inter-regional transport within one timestep may not be well represented by the model. More computing resources that can afford a large size of convolutional kernel could help to solve the problem in the future.
Temporally, the model uses an asymmetric encoder-decoder structure, all the timesteps before the initial time are encoding phase, like the spin-up phase in CTMs and Numerical Weather Prediction (NWP) models. The forecasting timesteps are decoding phase. The outputs of decoder in every timestep are the PM2.5 forecasts. During the decoding phase, the model inputs the meteorological variables, emissions and PM2.5 forecast from the last timestep as the feature variables. And in every timestep of the encoding phase, in addition to the above variables, the PPN model added the PM2.5 observation in the first convolutional layer. Since the LSTM is able to learn and remember the input information for the past period, we expect the PPN to provide a better initial field by adding multi observations across a preceding time period. This idea is similar to a sequential FDDA such as grid nudging45,46 or Ensemble Kalman Filter47.
Experimental region and feature selection
We took the Beijing-Tianjin-Hebei (BTH) region (36.3−42.0°N, 111.8−121.6°E) as our experimental region, which is surrounded by mountains in the north and west and consists of megacities like Beijing (BJ) and Tianjin (TJ) (Fig. 9b). The BTH region is a highly industrialized and densely populated region confronting with severe aerosol pollution48, thus has strong demands for air quality prediction to formulate emission reduction strategy and alert the public health.
PM2.5 concentration is influenced by both meteorological conditions and precursor emissions. Therefore, we classified the input parameters into three categories: i.e., meteorological variables, emission data and derived parameters, as listed in Table 1. Meteorological data were from the Weather Research and Forecasting (WRF) model simulation that provides a downscaled meteorological field based on the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis data (https://cds.climate.copernicus.eu/, last access: 2023/02/27). The WRF model was configured to cover the whole China region (Fig. 9a) following Feng et al.49, with a horizontal resolution of 9 km and 38 vertical layers up to 50 hPa. The initial and boundary conditions were obtained from the ERA5 reanalysis data with horizontal resolution of 0.25° × 0.25°. Supplementary Table 2 shows the detailed parameterization schemes used in the WRF simulation. The meteorological variables included factors that relate to synoptic pattern (SLP, geopotential height at 700 hPa), aerosol chemistry (temperature, relative humidity), transport (wind at 10 m and 850hPa, terrain height), vertical diffusion (PBLH), and wet scavenging (precipitation).
We also considered the emission data as input features to distinguish spatial and monthly variations of PM2.5 precursors. The emission data were based on the Multi-resolution Emission Inventory for China (MEIC) developed by Tsinghua University (http://meicmodel.org/, last access: 2022/6/30). It is known that SO2, NOx, VOCs and primary PM2.5 directly contribute to the formation of sulfate, nitrate, black carbon, organic carbon and other inorganics that constitute PM2.5 in the atmosphere. Thus, emission rates of these species were utilized as input features in this study. As the original emission data are in the resolution of 0.25°, we firstly interpolated them to the WRF grid with a horizontal resolution of 9 km.
In addition to the meteorological and emission variables, several derived parameters were also used as input features, as suggested by aerosol-meteorology interactions. Tai et al.50 concluded that high PM2.5 concentrations in the continents were correlated with reduced SLP due to simultaneous presence of cyclones and convergence. Therefore, the change of SLP compared to 24 h ahead was selected as an input feature in our study. Additionally, cold fronts bring a decrease in temperature while stagnation is usually associated with increasing or steady temperature. Thus, changes of temperature indicate synoptic variations thus affecting aerosol concentrations. Rather than utilizing temperature changes at surface, we employed those at 850 hPa to minimize topographic and urban effects. Besides, annual mean PM2.5 concentrations (averages during 2020–2021) in each grid were also adopted as a static feature for learning spatial correlations. Another important feature is the preceding PM2.5 observation in the encoder phase and forecast start time to reduce the forecast error that has already been previously introduced.
All of the input variables were firstly standardized to eliminate impacts of unit inconsistency on forecasting results. We assumed that all data obey the standard normal distribution with a mean value of 0 and a standard deviation of 1, thereby they could be standardized one by one following:
where \(z\) and \(x\) represent the standardized and original data, \(\bar{x}\) and \(\sigma\) are mean and standard deviation values.
Observation data
Surface PM2.5 concentrations were obtained from monitoring stations over the BTH region and its surrounding area, accessed from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn/). The distribution of these stations is shown in Fig. 9b. To keep consistent with the input feature map, we interpolated the station data of PM2.5 concentration to grid data with a horizontal resolution of 9 km using the Inverse Distance Weighted (IDW) method. The stations with high density over the BTH and its surrounding areas could reduce the bias from interpolation to some extent. The gridded PM2.5 data were applied both in the fitting target and input features. These input features are the annual mean PM2.5 and the preceding PM2.5 which have been introduced in ”Experimental region and feature selection“.
WRF-Chem results
The purpose of our study is to develop a more efficient and accurate PM2.5 forecasting system based on the PPN framework. For comparison purposes, we also adopted the WRF-Chem model to predict hourly PM2.5 concentrations with a resolution of 9 km, covering the North China Plain and its surrounding areas. The WRF-Chem model has the same meteorological configuration as the WRF described in “Experimental region and feature selection”. For chemistry, the Carbon Bond Mechanism version Z (CBMZ)51 and the Model for Simulating Aerosol Interactions and Chemistry (MOSAIC)52 were used as the gas-phase chemistry and aerosol mechanisms to conduct PM2.5 forecasting. Emission rates of anthropogenic pollutants were also from the MEIC inventory, the same as those in the PPN framework. To improve the forecast skill, the three-dimensional variational (3DVAR) algorithm was applied in the WRF-Chem model. Surface pollutant observations (PM2.5, PM10, SO2, NO2, O3, and CO) were assimilated into the model following the method in Sun et al.20. The predicted PM2.5 concentrations generated from the WRF-Chem model were compared with the PPN results to further demonstrate the performance of the proposed DL model.
Experiment
In this study, the input features together with the PM2.5 observations in 2 years (2020–2021) were adopted as the training and validating datasets, 90% of which were randomly chosen as the training set (for model training) and 10% of which were the validating set (for model hyper-parameter optimization or tuning). Data from January and June 2022 were employed as the test dataset to evaluate the model performance. Aerosol pollution is usually severe in winter due to stagnation and intensive emissions38, thus we mainly focus on model evaluation for the January 2022 results. There are 5800 sequences with 80 × 80 grids in the training and validating dataset. Each sequence in the encoder-decoder architecture contained 2 days for inputs and 3 days for prediction with a time resolution of 3 h. In the PPN model training, Adam Optimizer and one-cycle learning rate schedule with a weight decay parameter of 10−4 were utilized. This learning rate schedule can substantially speed up training process with high test accuracy53. The gradient clipping strategy was also adopted in our model to avoid “gradient explosion” phenomenon.
As described in “Observation data”, the interpolated PM2.5 data were applied as fitting targets to acquire a gridded result. However, the interpolation may introduce biases into the target dataset, especially in regions with sparsely or unevenly distributed sites. To eliminate the influence of interpolation on model training, we proposed a weighted loss function based on the IDW method and mean square error (MSE) metric (WMSE). The WMSE is defined as:
where \({w}_{i}\) denotes the weight factor for MSE in grid i, \({y}_{i}\) and \(\hat{{y}_{i}}\) represent the observed and predicted PM2.5 concentrations in grid \(i\). \({w}_{i}\) is considered as 1 if there is one or more PM2.5 monitoring stations in gird \(i\). If there is no station in gird \(i\), \({w}_{i}\) is the average of square sum of the inverse distance between gird \(i\) and the stations in 3 × 3 grids. If there is no station in the surrounding 3 × 3 grids, \({w}_{i}\) is set to a minimum value 0.1. The application of WMSE increases the weights of MSE in areas near the PM2.5 monitoring sites, thus reducing biases introduced by interpolation over regions with sparse sites during the model learning process.
Data availability
The ERA5 reanalysis data are obtained from https://cds.climate.copernicus.eu/. The observed PM2.5 concentrations are obtained from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn/). All WRF data and PPN forecast data are available upon request from the corresponding author (Dr. Jin Feng).
Code availability
The pre-trained model and inference code can be found at https://gitee.com/jfengcode/pub_ppn. They are also available upon request from the corresponding author (Dr. Jin Feng).
References
Zhai, S. et al. Fine particulate matter (PM2.5) trends in China, 2013–2018: separating contributions from anthropogenic emissions and meteorology. Atmos. Chem. Phys. 19, 11031–11041 (2019).
Xiao, Q. et al. Tracking PM2.5 and O3 pollution and the related health burden in China 2013−2020. Environ. Sci. Technol. 56, 6922–6932 (2022).
Grell, G. A. et al. Fully coupled “online” chemistry within the WRF model. Atmos. Environ. 39, 6957–6975 (2005).
Appel, K. W. et al. The Community Multiscale Air Quality (CMAQ) model versions 5.3 and 5.3.1: system updates and evaluation. Geosci. Model Dev. 14, 2867–2897 (2021).
Bey, I. et al. Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res. Atmos. 106, 23073–23095 (2001).
Goldberg, D. L. et al. Using gap-filled MAIAC AOD and WRF-Chem to estimate daily PM2.5 concentrations at 1 km resolution in the Eastern United States. Atmos. Environ. 199, 443–452 (2019).
Kong, Y. et al. Improving PM2.5 forecast during haze episodes over China based on a coupled 4D-LETKF and WRF-Chem system. Atmos. Res. 249, 105366 (2021).
Marmur, A., Park, S. K., Mulholland, J. A., Tolbert, P. E. & Russell, A. G. Source apportionment of PM2.5 in the southeastern United States using receptor and emissions-based models: Conceptual differences and implications for time-series health studies. Atmos. Environ. 40, 2533–2551 (2006).
Wang, L. et al. Source apportionment ofPM2.5 in top polluted cities in Hebei, China using the CMAQ model. Atmos. Environ. 122, 723–736 (2015).
Guo, H. et al. Source apportionment ofPM2.5 in North India using source-oriented air quality models. Environ. Pollut. 231, 426–436 (2017).
Qiu, Y., Liao, H., Zhang, R. & Hu, J. Simulated impacts of direct radiative effects of scattering and absorbing aerosols on surface-layer aerosol concentrations in China during a heavily polluted event in February 2014. J. Geophys. Res. Atmos. 122, 5955–5975 (2017).
Li, K. et al. Ozone pollution in the North China Plain spreading into the late-winter haze season. Proc. Natl Acad. Sci. USA 118, e2015797118 (2021).
Zhu, J., Chen, L. & Liao, H. Multi-pollutant air pollution and associated health risks in China from 2014 to 2020. Atmos. Environ. 268, 118829 (2022).
Aleksankina, K., Reis, S., Vieno, M. & Heal, M. R. Advanced methods for uncertainty assessment and global sensitivity analysis of an Eulerian atmospheric chemistry transport model. Atmos. Chem. Phys. 19, 2881–2898 (2019).
Vautard, R. et al. Evaluation of the meteorological forcing used for the Air Quality Model Evaluation International Initiative (AQMEII) air quality simulations. Atmos. Environ. 53, 15–37 (2012).
Foley, K. M. et al. Incremental testing of the Community Multiscale Air Quality (CMAQ) modeling system version 4.7. Geosci. Model Dev. 3, 205–226 (2010).
Jiang, Z. et al. Probing into the impact of 3DVAR assimilation of surface PM10 observations over China using process analysis. J. Geophys. Res. Atmos. 118, 6738–6749 (2013).
Dai, T., Schutgens, N. A., Goto, D., Shi, G. & Nakajima, T. Improvement of aerosol optical properties modeling over Eastern Asia with MODIS AOD assimilation in a global non-hydrostatic icosahedral aerosol transport model. Environ. Pollut. 195, 319–329 (2014).
Jung, J. et al. The impact of the direct effect of aerosols on meteorology and air quality using aerosol optical depth assimilation during the KORUS‐AQ campaign. J. Geophys. Res. Atmos. 124, 8303–8319 (2019).
Sun, W., Liu, Z., Chen, D., Zhao, P. & Chen, M. Development and application of the WRFDA-Chem three-dimensional variational (3DVAR) system: aiming to improve air quality forecasting and diagnose model deficiencies. Atmos. Chem. Phys. 20, 9311–9329 (2020).
Lee, S. et al. Seasonal dependence of aerosol data assimilation and forecasting using satellite and ground-based observations. Remote Sens. 14, 2123 (2022).
Misenis, C. & Zhang, Y. An examination of sensitivity of WRF/Chem predictions to physical parameterizations, horizontal grid spacing, and nesting options. Atmos. Res. 97, 315–334 (2010).
Thomas, S. & Jacko, R. B. Model for forecasting expressway fine particulate matter and carbon monoxide concentration: application of regression and neural network models. J. Air Waste Manage. Assoc. 57, 480–488 (2007).
Karimian, H. et al. Evaluation of different machine learning approaches to forecasting PM2.5 mass concentrations. Aerosol Air Qual. Res. 19, 1400–1410 (2019).
Ma, J., Yu, Z., Qu, Y., Xu, J. & Cao, Y. Application of the XGBoost machine learning method in PM2.5 prediction: a case study of Shanghai. Aerosol Air Qual. Res. 20, 128–138 (2020).
Bi, J., Knowland, K. E., Keller, C. A. & Liu, Y. Combining machine learning and numerical simulation for high-resolution PM2.5 concentration forecast. Environ. Sci. Technol. 56, 1544–1556 (2022).
Zhang, B. et al. Deep learning for air pollutant concentration prediction: a review. Atmos. Environ. 290, 119347 (2022).
Feng, J., Li, Y., Qiu, Y. & Zhu, F. Capturing synoptic-scale variations in surface aerosol pollution using deep learning with meteorological data. Atmos. Chem. Phys. https://doi.org/10.5194/acp-23-375-2023 (2022).
Sayeed, A. et al. Using a deep convolutional neural network to predict 2017 ozone concentrations, 24 h in advance. Neural Netw. 121, 396–408 (2020).
Yan, R. et al. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Syst. Appl. 169, 114513 (2021).
Athira, V., Geetha, P., Vinayakumar, R. & Soman, K. P. DeepAirNet: applying recurrent networks for air quality prediction. Procedia Comput. Sci. 132, 1394–1403 (2018).
Ong, B. T., Sugiura, K. & Zettsu, K. Dynamically pre-trained deep recurrent neural networks using environmental monitoring data for predicting PM2.5. Neural Comput. Appl. 27, 1553–1566 (2016).
Yu, Y., Si, X., Hu, C. & Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 31, 1235–1270 (2019).
Du, S., Li, T., Yang, Y. & Horng, S. J. Deep air quality forecasting using hybrid deep learning framework. IEEE Trans. Knowl. Data Eng. 33, 2412–2424 (2021).
Pak, U. et al. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 699, 133561 (2020).
Yeo, I., Choi, Y., Lops, Y. & Sayeed, A. Efficient PM2.5 forecasting using geographical correlation based on integrated deep learning algorithms. Neural Comput. Appl. 33, 15073–15089 (2021).
Zhu, J., Deng, F., Zhao, J. & Zheng, H. Attention-based parallel networks (APNet) for PM2.5 spatiotemporal prediction. Sci. Total Environ. 769, 145082 (2021).
Wang, S. et al. Spatial distribution, seasonal variation and regionalization of PM2.5 concentrations in China. Sci. China Chem. 58, 1435–1443 (2015).
Yan, D. et al. Evolution of the spatiotemporal pattern of PM2.5 concentrations in China –A case study from the Beijing-Tianjin-Hebei region. Atmos. Environ. 183, 225–233 (2018).
Li, H. et al. Constructing a spatiotemporally coherent long-term PM2.5 concentration dataset over China during 1980–2019 using a machine learning approach. Sci. Total Environ. 765, 144263 (2021).
Mao, W., Wang, W., Jiao, L., Zhao, S. & Liu, A. Modeling air quality prediction using a deep learning approach: Method optimization and evaluation. Sustain. Cities and Soc. 65, 102567 (2021).
Sun, Q., Zhu, Y., Chen, X., Xu, A. & Peng, X. A hybrid deep learning model with multi-source data for PM2.5 concentration forecast. Air Qual. Atmos. Health. 14, 503–513 (2021).
Wang, Y. et al. PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2022.3165153 (2022).
Shi, X. et al. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 28, 802–810 (2015).
Bowden, J. H., Otte, T. L., Nolte, C. G. & Otte, M. J. Examining interior grid nudging techniques using two-way nesting in the WRF model for regional climate modeling. J. Clim. 25, 2805–2823 (2012).
Jeon, W. et al. A quantitative analysis of grid nudging effect on each process of PM2. 5 production in the Korean Peninsula. Atmos. Environ. 122, 763–774 (2015).
Houtekamer, P. L. & Zhang, F. Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 144, 4489–4532 (2016).
Feng, J., Quan, J., Liao, H., Li, Y. & Zhao, X. An air stagnation index to qualify extreme haze events in northern China. J. Atmos. Sci. 75, 3489–3505 (2018).
Feng, J., Huang, X. & Li, Y. Improving surface wind speed forecasts using an offline surface multilayer model with optimal ground forcing. J. Adv. Model. Earth Syst. 14, 1–16 (2022).
Tai, A. P., Mickley, L. J. & Jacob, D. J. Correlations between fine particulate matter (PM2.5) and meteorological variables in the United States: Implications for the sensitivity of PM2.5 to climate change. Atmos. Environ. 44, 3976–3984 (2010).
Zaveri, R. A. & Peters, L. K. A new lumped structure photochemical mechanism for large-scale applications. J. Geophys. Res. Atmos. 104, D23 (1999).
Zaveri, R. A., Easter, R. C., Fast, J. D. & Peters, L. K. Model for Simulating Aerosol Interactions and Chemistry (MOSAIC). J. Geophys. Res. Atmos. 113, D13 (2008).
Smith, L. N. & Nicholay, T. Super-convergence: very fast training of neural networks using large learning rates. arXiv https://doi.org/10.48550/arXiv.1708.07120 (2019).
Acknowledgements
This research was supported by National Natural Science Foundation of China (Grant No. 42275009), the Open fund by Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control (Grant No. KHK2001), National Natural Science Foundation of China (Grant No. 41975168), the Beijing Natural Science Foundation (Grant No. 8194078).
Author information
Authors and Affiliations
Contributions
J.F. conceived the study and design the model. Y.Q. performed the data gathering, model training and validation, analyses, visualizations and wrote the paper. Y.Q., J.F., Z.Z., X.Z., and R.L. contributed to the WRF and WRF-Chem simulation. Z.Z., and Z.L. helped data gathering. J.F., Z.M. and J.Z. helped in editing and revising the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Qiu, Y., Feng, J., Zhang, Z. et al. Regional aerosol forecasts based on deep learning and numerical weather prediction. npj Clim Atmos Sci 6, 71 (2023). https://doi.org/10.1038/s41612-023-00397-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41612-023-00397-0
- Springer Nature Limited
This article is cited by
-
Transforming air pollution management in India with AI and machine learning technologies
Scientific Reports (2024)
-
Aerosol atmospheric rivers: patterns, impacts, and societal insights
Environmental Science and Pollution Research (2024)
-
Resilience to Air Pollution: A Novel Approach for Detecting and Predicting Aerosol Atmospheric Rivers within Earth System Boundaries
Earth Systems and Environment (2024)