Introduction

Fine particulate matter (PM2.5), which consists of particles with aerodynamic diameters < 2.5 µm, has attracted considerable scientific attention (Pope & Dockery, 2006). Previous studies have indicated that prolonged exposure to PM2.5 is associated with many human health issues, including respiratory problems, cardiovascular disease, cancer, and infectious diseases (Bartell et al., 2013; Brauer et al., 2012; Chen et al., 2017; Crouse et al., 2012; Dominici et al., 2006; Gent et al., 2009; Guo et al., 2016; Lao et al., 2019; Pope, 2000; Zhang et al., 2020). Ground monitoring sites can provide accurate PM2.5 measurements, but many regions have no monitoring network and therefore no measurements. The sparse distribution of ground sites, together with the absence of data on local meteorological effects and emission sources, limits our ability to estimate the impacts of human exposure to PM2.5. Consequently, it is important to develop models that can accurately predict the broader spatiotemporal distribution of ground-level PM2.5 concentrations.

Satellite remote sensing has been applied to monitor ground-level PM2.5 concentrations and to fill spatial gaps in ground measurement coverage (Chu et al., 2016; Hu et al., 2014; Kloog et al., 2012; Ma et al., 2014). Several approaches have been used to estimate PM2.5 concentrations from satellite-derived aerosol optical depth (AOD), including multiple linear regression (Chu et al., 2016; Gupta & Christopher, 2008, 2009; Kacenelenbogen et al., 2006; Liu et al., 2005; Paciorek et al., 2008; Schaap et al., 2009; Wang, 2003; Yao et al., 2018), mixed-effect models (Just et al., 2015; Kloog et al., 2011, 2012, 2014; Lee et al., 2012; Zheng et al., 2016), geographically weighted regressions (Bai et al., 2016; Guo et al., 2017; He & Huang, 2018a, b; Hu, 2009; Ma et al., 2014; You et al., 2015; Zou et al., 2016), and chemical transport models (Crouse et al., 2012, 2016; Hystad et al., 2012; Liu et al., 2004; van Donkelaar et al., 2006; Wang & Chen, 2016). To improve model performance, an increasing number of predictors, including meteorological information, land cover, and aerosol properties, have been integrated. Machine learning models, which are capable of fitting complex nonlinear relationships, have therefore been applied to derive PM2.5 concentrations from satellite observations. For example, random forests (Chen et al., 2018; Hu et al., 2017), deep belief networks (DBNs) (Li et al., 2018; Liu et al., 2018), deep neural networks (DNNs) (Wang & Sun, 2019), and machine learning models with high-dimensional expansion (Xue et al., 2019) have been used, and they have delivered superior prediction accuracy and applicability.

These studies were limited to polar-orbiting satellites, whereas geostationary satellites are still rarely used in the estimation of PM2.5. Geostationary satellites observe the same region far more frequently, which makes it possible to capture hourly variations in atmospheric aerosols. In particular, with the launch and operation of next-generation geostationary meteorological satellites (carrying, for example, the Advanced Geosynchronous Radiation Imager (AGRI) on board FengYun-4A, the Advanced Himawari Imager (AHI) on board Himawari-8/9, and the Advanced Baseline Imager (ABI) on board GOES-R), abundant AOD datasets have become available. The quality of these data has been validated: for example, it has been reported that the expected uncertainty for the Himawari-8 AOD is ± (0.1 + 0.3 × AOD) (Zhang et al., 2019), whereas the expected uncertainty for the MODIS (C6.1) 10 km AOD product is ± (0.05 + 0.15 × AOD) (Aldabash et al., 2020).
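To make the two uncertainty envelopes concrete, the short sketch below evaluates them at a single AOD value. The function name `expected_error` is introduced here for illustration only, and the AOD value of 0.4 is simply the mean AOD reported later in this paper.

```python
def expected_error(aod, offset, scale):
    """Half-width of an expected-error envelope of the form offset + scale * AOD."""
    return offset + scale * aod

mean_aod = 0.4  # mean AOD of the matched samples reported in this paper

# Coefficients as quoted in the text for each product
himawari = expected_error(mean_aod, offset=0.10, scale=0.30)
modis = expected_error(mean_aod, offset=0.05, scale=0.15)
```

At this AOD the quoted Himawari-8 envelope is twice as wide as the MODIS one, because both of its coefficients are twice as large.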

The North China Plain (NCP), which is known for its severe atmospheric pollution events, has experienced high PM2.5 concentrations for decades owing to rapid economic and population growth in the region. In this study, we applied the DNN methodology to estimate hourly ground-level PM2.5 concentrations over the NCP using Himawari-8 AHI AOD data.

The remainder of this paper is organized as follows: the datasets and a detailed description of the methodology are given in the “Materials and Methods” section, the results and discussion in the “Results and Discussion” section, and the conclusions in the “Summary and Conclusions” section.

Materials and Methods

Datasets

Ground-Level PM2.5 Measurements

The ground-level PM2.5 concentration dataset was obtained from the China Environmental Monitoring Center (CEMC). Hourly PM2.5 measurements from 313 air quality sites in the NCP (Fig. 1) were collated for 2017. These concentrations are hourly averages measured at the stations with tapered element oscillating microbalances (TEOM). The accuracy of the TEOM is ± 1.5 μg/m3 (You et al., 2016). The data are available at http://106.37.208.233:20035/.

Fig. 1

North China Plain elevation. Ground-level data for the study were acquired from the air quality sites (white dots) (color figure online)

Himawari-8 AOD

Himawari-8, a next-generation geostationary meteorological satellite, was launched by the Japan Meteorological Agency on October 7, 2014. The AHI on board Himawari-8 has 16 channels. Its spatial and temporal resolutions are 0.5–2 km and 5–10 min, respectively (Yumimoto et al., 2016; Zhang et al., 2019).

The AOD products are provided at three levels: “Level2” (L2), “Level3” (L3), and “Level4” (L4) (Kikuchi et al., 2018). The spatial and temporal resolutions of L3 are 0.05° and 1 h, respectively. In this study, the L3 Version 3.0 AOD data, which are available from https://www.eorc.jaxa.jp/ptree/index.html, were collected to estimate hourly PM2.5 concentrations. A comprehensive validation of the AOD product used in this study can be found in our previous work (Zhang et al., 2019).

Meteorological and Land Cover Data

Meteorological data for 2017 were obtained from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), an atmospheric reanalysis product of the National Aeronautics and Space Administration (NASA) with a spatial resolution of 0.5° × 0.625° (Gelaro et al., 2017; Rienecker et al., 2011). We extracted six hourly meteorological factors from this dataset: surface pressure (PS; in Pa), air temperature at 2 m (TMP; in K), eastward (E) and northward (N) wind speed at 10 m above ground (EW and NW; in m/s), relative humidity (RH; in %), and planetary boundary layer height (PBLH; in m). The data can be downloaded from https://disc.gsfc.nasa.gov/datasets.

Two land cover-related variables, surface albedo (ALBEDO; unitless) and surface incoming shortwave flux (SWGDN; in W/m2), were also extracted from the MERRA-2 data. Elevation (ELEV; in m) was taken from terrain data at 1 km resolution, whereas the normalized difference vegetation index (NDVI) was derived from MOD13A3, a monthly 1-km resolution dataset.

In total, 11 predictors drawn from the AOD, meteorological, and land cover data were used to derive hourly PM2.5 concentrations over the NCP: AOD, surface pressure, air temperature, E and N wind speed, PBLH, RH, ALBEDO, SWGDN, ELEV, and NDVI. Statistics for these datasets and predictors are listed in Table 1.

Table 1 Dataset information and statistics

Methods

Data Integration

Firstly, because the original data involved various coordinate systems and spatial resolutions, all independent variables were reprojected into the WGS84 coordinate system. The meteorological and land cover data were also resampled to a common 0.05° resolution to ensure consistency. After these steps, the 11 predictors were matched with ground PM2.5 measurements in a co-location procedure, in which the predictor values were taken from the station-centered pixel. This selection process produced a dataset consisting of 151,726 records.
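The co-location step can be illustrated with a minimal nearest-pixel lookup on a regular 0.05° grid. This is only a sketch under stated assumptions: the grid layout (cell centers starting at `lat0`, `lon0` and increasing along each axis), the function names, and the toy values are hypothetical, and the original processing chain is not described in enough detail to reproduce it exactly.

```python
import numpy as np

def nearest_cell(lat, lon, lat0, lon0, res=0.05):
    """Return (row, col) of the grid cell whose center is nearest to (lat, lon).

    Assumes cell centers start at (lat0, lon0) and increase by `res` degrees
    along each axis (a hypothetical layout chosen for this example).
    """
    row = int(round((lat - lat0) / res))
    col = int(round((lon - lon0) / res))
    return row, col

def colocate(grid, stations, lat0, lon0, res=0.05):
    """Sample one gridded predictor at each station; NaN if off the grid."""
    out = np.full(len(stations), np.nan)
    for i, (lat, lon) in enumerate(stations):
        r, c = nearest_cell(lat, lon, lat0, lon0, res)
        if 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]:
            out[i] = grid[r, c]
    return out

# Toy 10 x 10 AOD grid at 0.05 degrees (made-up values, not study data)
aod = np.arange(100, dtype=float).reshape(10, 10) / 100.0
stations = [(30.12, 110.26), (50.0, 110.0)]  # the second station is off-grid
values = colocate(aod, stations, lat0=30.0, lon0=110.0)
```

Repeating the lookup for each of the 11 gridded predictors and each hour would yield one record per station and hour, with records dropped wherever the AOD retrieval is missing.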

DNN Model

Ground-level PM2.5 concentrations are affected by multiple factors, such as aerosol loading, meteorological conditions, and surface cover. This complex relationship is difficult to describe accurately with a simple linear model, so deep learning, which has been widely used to fit complex nonlinear relationships, was used to estimate ground-level PM2.5 concentrations. A DNN model (Hinton et al., 2012) was fitted using Eq. (1):

$${\text{PM}}_{2.5} = f\left( \text{AOD}, \text{PS}, \text{TMP}, \text{EW}, \text{NW}, \text{PBLH}, \text{RH}, \text{ALBEDO}, \text{SWGDN}, \text{ELEV}, \text{NDVI} \right), \tag{1}$$

where \(f(\cdot)\) denotes the prediction function. The meaning of each predictor has been described above.

Figure 2 shows the structure of the DNN model. The model contained one input layer (with 11 neurons, one for each predictor in Eq. (1)), five hidden layers (with 60, 40, 30, 20, and 10 neurons), and one output layer (which produced the PM2.5 concentration estimate). This gave the proposed DNN model a structure of 11-60-40-30-20-10-1. The numbers of layers and neurons were chosen by increasing them until the best estimation results were obtained.

Fig. 2

Structure of the DNN model used for PM2.5 estimation
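The 11-60-40-30-20-10-1 structure can be sketched as a plain fully connected network. The layer sizes come from the text; the ReLU activation, the He-style initialization, and the function names are assumptions made for illustration, because the activation function and training settings are not stated.

```python
import numpy as np

LAYER_SIZES = [11, 60, 40, 30, 20, 10, 1]  # structure given in the text

def init_params(sizes, seed=0):
    """Create a (weights, bias) pair for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # He-style scaling is an assumption, not taken from the paper
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def forward(params, x):
    """Forward pass: ReLU on hidden layers (assumed), linear output layer."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return (h @ w + b).ravel()

params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)  # weights plus biases

batch = np.zeros((4, 11))      # four records of 11 predictors
pred = forward(params, batch)  # untrained network, so values are meaningless
```

With these layer sizes the network has 5,231 trainable parameters, which is small relative to the 151,726 matched records.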

Figure 3 illustrates the workflow used to estimate ground-level, hourly PM2.5. The process can be described as follows:

  (1) Conduct data integration, as described in the “Data Integration” section. The resulting AOD, meteorological, and land cover variables, which were consistent in time and space, were used as model training and validation samples.

  (2) Perform DNN model fitting. All 151,726 records (from 313 sites) were first used to train the model, after which sample- and site-based tenfold cross-validation (CV) was carried out to evaluate its performance. CV was conducted as follows:

    (a) For sample-based CV, the samples were randomly divided into ten sets, each accounting for approximately 10% of the records. In each CV round, nine sets were used for training and the tenth was used to make predictions. This was repeated ten times, until predictions had been obtained for every set;

    (b) Site-based CV was conducted to examine model sensitivity to the number of ground stations and model performance with respect to spatial variations. The 313 sites were randomly split into ten sets, each accounting for approximately 10% of the sites. As in the sample-based CV, nine sets were used to fit the model and the tenth was used to make predictions. PM2.5 concentration predictions were obtained for all sites by completing ten CV rounds;

    (c) Finally, to evaluate the level of agreement, linear fit statistics for model predictions versus observations were computed. The statistical indicators were the correlation coefficient (R), slope, y intercept, and prediction root mean squared error (RMSE).

  (3) Finally, the prediction data (locations with no ground PM2.5 observations) were input into the fitted DNN model to obtain spatial distributions of hourly PM2.5 concentrations over the NCP.

Fig. 3

Workflow applied to estimate ground-level, hourly PM2.5 concentrations
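The difference between the two cross-validation schemes in the workflow lies only in what is shuffled: individual records or whole sites. The sketch below illustrates this under stated assumptions; the function names, the generic `fit` and `predict` callables, and the toy site layout are hypothetical and stand in for the DNN training code, which is not given.

```python
import numpy as np

def sample_folds(n_records, k=10, seed=0):
    """Sample-based CV: assign each record to one of k folds at random."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_records) % k

def site_folds(site_ids, k=10, seed=0):
    """Site-based CV: assign whole sites to folds, so that every record
    from a given site lands in the same fold."""
    rng = np.random.default_rng(seed)
    sites = np.unique(site_ids)
    fold_of_site = dict(zip(sites, rng.permutation(len(sites)) % k))
    return np.array([fold_of_site[s] for s in site_ids])

def cross_validate(x, y, folds, fit, predict):
    """Hold out each fold in turn and return out-of-fold predictions."""
    y_hat = np.empty(len(y), dtype=float)
    for f in np.unique(folds):
        test = folds == f
        model = fit(x[~test], y[~test])
        y_hat[test] = predict(model, x[test])
    return y_hat

# Toy layout: 313 sites with 5 records each (made-up, not the study data)
site_ids = np.repeat(np.arange(313), 5)
folds_sample = sample_folds(len(site_ids))
folds_site = site_folds(site_ids)
```

Because site-based folds never place records from one station in both the training and the held-out set, they give a stricter test of spatial prediction than sample-based folds.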

Results and Discussion

Descriptive Statistics

Variable histograms, covering all 151,726 samples, are presented in Fig. 4. The surface pressure (PS) was found to be almost exponentially distributed, whereas air temperature (TMP) was approximately bimodal in its distribution. The E wind speed, N wind speed, PBLH, surface albedo, surface incoming shortwave flux (SWGDN), RH, and NDVI were found to have approximately normal distributions. The other variables, including elevation, AOD, and PM2.5, exhibited similar logarithmic distributions. The minimum, maximum, and mean PM2.5 concentrations were 1, 534, and 55.13 µg/m3, respectively, whereas the minimum, maximum, and mean AOD were 0, 2.99, and 0.4, respectively; the high AOD and PM2.5 maximums indicate that severe pollution was experienced over the NCP. The standard deviations of AOD and PM2.5 were 0.34 and 42.15 µg/m3, respectively, indicating significant fluctuations in the atmospheric particulate matter concentrations.

Fig. 4

Descriptive statistics for dependent and independent variables (minimums, maximums, means, and standard deviations). The meaning of each item is explained in text

Variable Importance Analysis

To evaluate the contribution of each predictor to the proposed DNN, we conducted a variable importance analysis. In Fig. 5, the predictors are listed on the y axis, and the percentage increase in RMSE obtained when the corresponding variable is not used (%IncRMSE) is shown on the x axis. The figure shows that AOD (55.07%) was the variable with the highest contribution, which is likely attributable to its strong correlation with PM2.5 concentrations. The %IncRMSE calculated without using air temperature as a predictor was 45.92%, followed by PBLH (41.21%), SWGDN (35.07%), RH (34.86%), N wind speed (32.09%), surface pressure (30.63%), E wind speed (24.22%), and surface albedo (24.01%).

Fig. 5

Variable importance analysis. The x axis indicates the percentage RMSE increase achieved without using the corresponding predictor (%IncRMSE)
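One way to compute a %IncRMSE score of this kind is to refit the model with one predictor left out and compare the held-out RMSE with that of the full model. The sketch below follows that reading. It is an illustration under stated assumptions: an ordinary least-squares model stands in for the DNN, the data are synthetic, and the helper names are hypothetical; the exact procedure behind Fig. 5 is not spelled out in the text.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def fit_predict_linear(x_train, y_train, x_test):
    """Stand-in model (ordinary least squares); NOT the DNN from the text."""
    a_train = np.column_stack([x_train, np.ones(len(x_train))])
    a_test = np.column_stack([x_test, np.ones(len(x_test))])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return a_test @ coef

def inc_rmse(x_train, y_train, x_test, y_test, names, fit_predict):
    """%IncRMSE per predictor: refit without it and compare test RMSE."""
    base = rmse(y_test, fit_predict(x_train, y_train, x_test))
    result = {}
    for j, name in enumerate(names):
        keep = [i for i in range(x_train.shape[1]) if i != j]
        dropped = rmse(y_test, fit_predict(x_train[:, keep], y_train, x_test[:, keep]))
        result[name] = 100.0 * (dropped - base) / base
    return result

# Synthetic data: y depends strongly on "A", weakly on "B", not at all on "C"
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 3))
y = 5.0 * x[:, 0] + 1.0 * x[:, 1] + rng.normal(scale=0.5, size=2000)
scores = inc_rmse(x[:1500], y[:1500], x[1500:], y[1500:],
                  ["A", "B", "C"], fit_predict_linear)
```

In this toy case the ranking of the scores recovers the ranking of the true coefficients, which is the behavior the importance plot relies on.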

Figure 5 also shows that elevation and NDVI were the weakest contributing variables for hourly PM2.5 concentrations. These relatively low contributions may have been due to the small elevation variations over the NCP (shown in Fig. 4j). Furthermore, hourly PM2.5 concentrations, usually with high frequency variations, were likely to be less affected by variables that varied only slowly over time, such as NDVI.

Model Performance and Validation

Scatter plots of ground-observed (x axis) and estimated (y axis) PM2.5 concentrations for the sample- and site-based CVs are shown in Fig. 6a, b. The R, RMSE, slope, and y intercept of the sample-based CV were 0.86, 21.40 μg/m3, 0.81, and 10.22 μg/m3, respectively; for the site-based CV, the corresponding values were 0.83, 23.65 μg/m3, 0.76, and 13.17 μg/m3. This level of agreement demonstrated that the proposed DNN model was capable of achieving satisfactory performance.

Fig. 6

Scatter plots for tenfold cross-validations (CVs): a sample-based CV; b site-based CV
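The four indicators reported for each scatter plot can be computed from paired observations and estimates as sketched below. The function name and the toy numbers are illustrative only; the fit is an ordinary least-squares line of estimates against observations, which is an assumption about how the slope and intercept were obtained.

```python
import numpy as np

def fit_statistics(obs, est):
    """Return R, slope and intercept of the est-vs-obs line, and RMSE."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    r = float(np.corrcoef(obs, est)[0, 1])
    slope, intercept = np.polyfit(obs, est, 1)  # least-squares straight line
    rmse = float(np.sqrt(np.mean((est - obs) ** 2)))
    return r, float(slope), float(intercept), rmse

obs = np.array([10.0, 50.0, 100.0, 200.0])
est = 0.8 * obs + 10.0  # made-up estimates lying exactly on a line
r, slope, intercept, rmse = fit_statistics(obs, est)
```

The toy case shows that R alone can hide bias: the points are perfectly correlated, yet the RMSE is far from zero because the slope is below unity.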

We noted that the site-based CV outcome was comparable with that of the sample-based CV, which indicated that the proposed model had good spatial prediction capability. In addition, both linear fit slopes were less than unity (0.81 and 0.76), which implied that the DNN model tended to slightly underestimate the observed PM2.5 concentrations. This underestimation was apparent for observed PM2.5 concentrations > 54 μg/m3.
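The 54 μg/m3 threshold is consistent with the point at which the fitted line crosses the 1:1 line. As a worked check, assuming the reported slope \(a\) and intercept \(b\) describe a straight-line fit of estimates \(\hat{y}\) against observations \(x\):

$$\hat{y} = a x + b < x \iff x > \frac{b}{1 - a} \quad (0 < a < 1), \qquad \frac{10.22}{1 - 0.81} \approx 53.8\ \mu\text{g/m}^3, \qquad \frac{13.17}{1 - 0.76} \approx 54.9\ \mu\text{g/m}^3.$$

On this reading, the fitted line lies below the 1:1 line above roughly 54 μg/m3 for the sample-based CV and roughly 55 μg/m3 for the site-based CV, and above the 1:1 line at lower concentrations.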

We identified two possible reasons for the model's underestimation of ground-observed PM2.5 concentrations. Firstly, spatially averaged AOD, meteorological, and land surface variables were used to estimate PM2.5 at point locations, and the meteorological parameters, with their relatively coarse spatial resolution (0.5° × 0.625°), could not characterize detailed spatial variations. Secondly, spatial averaging smooths out high values and raises low ones. Taken together, these points meant that when we compared spatially averaged estimations with ground measurements, low values appeared to be overestimated and high values underestimated.

The spatial performance of the sample-based CV is illustrated in Fig. 7, where it can be seen that both R and RMSE (Figs. 7a, b, respectively) exhibit spatial variations. The R ranged between 0.48 and 0.93, and the RMSE between 13.33 and 45.63 μg/m3; R exceeded 0.75 at 81% of the sites (254 of 313), and the RMSE was below 25 μg/m3 at 68% of the sites (213 of 313). With respect to geographic distribution, sites in Henan Province performed better, and the RMSE was lower in Anhui and Zhejiang.

Fig. 7

Cross-validation spatial performance: a R; b RMSE

Scatter plots for ground-observed (x axis) and estimated (y axis) daily and monthly PM2.5 concentrations are shown in Figs. 8a, b. The daily data R and RMSE were calculated to be 0.81 and 24.94 μg/m3, whereas the corresponding estimates for the monthly data were 0.90 and 9.91 μg/m3, respectively, showing that the model could accurately represent seasonal PM2.5 levels, with only small deviations.

Fig. 8

Model validation scatter plots: a daily data; b monthly averages
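Daily and monthly validation requires averaging the hourly observations and estimates over matching groups before the statistics are computed. The sketch below shows one way to do this with NumPy. The grouping key (site index and day of year), the function name, and the values are assumptions for illustration; the paper does not state how the hourly pairs were aggregated.

```python
import numpy as np

def group_mean(keys, values):
    """Average `values` over rows of `keys` that are identical."""
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)  # keep it 1-D regardless of NumPy version
    sums = np.bincount(inverse, weights=values, minlength=len(uniq))
    counts = np.bincount(inverse, minlength=len(uniq))
    return uniq, sums / counts

# Columns of `keys`: (site index, day of year); hourly values are made up
keys = np.array([[0, 1], [0, 1], [0, 2], [1, 1]])
obs_hourly = np.array([10.0, 30.0, 50.0, 70.0])
est_hourly = np.array([20.0, 20.0, 40.0, 80.0])

groups, obs_daily = group_mean(keys, obs_hourly)
_, est_daily = group_mean(keys, est_hourly)

hourly_rmse = float(np.sqrt(np.mean((est_hourly - obs_hourly) ** 2)))
daily_rmse = float(np.sqrt(np.mean((est_daily - obs_daily) ** 2)))
```

In this toy case the RMSE drops after averaging because hourly errors of opposite sign partly cancel within a day; longer averaging windows, such as months, allow more of this cancellation.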

Map of PM2.5 Estimation

The distributions of seasonal average PM2.5 over the NCP in 2017 are shown in Fig. 9. In Figs. 9a–d, the seasonal variations in PM2.5 are clearly observable. The highest PM2.5 estimations were in winter, followed by spring and then autumn, with the lowest values appearing in summer. Mean PM2.5 estimations for spring, summer, autumn, and winter were 48.93, 39.85, 48.06, and 74.78 μg/m3, respectively. The seasonal ground-level PM2.5 observations showed spatial distributions similar to those seen for the estimates, as shown in Figs. 9e–h. The R values for the four seasons were 0.88, 0.86, 0.91, and 0.97, respectively (Figs. 9i–l), and the corresponding RMSEs were 4.53, 5.22, 4.83, and 5.80 μg/m3.

Fig. 9

Spatial distributions for spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February) mean PM2.5 concentrations: a–d estimates; e–h ground observations; i–l scatter plots of ground observations versus estimates

Annual estimated PM2.5 patterns for the NCP are plotted in Fig. 10. Generally, low AOD and PM2.5 levels occurred in northwestern Hebei Province, which has a low population density and few industries. The spatial distributions of AOD (Fig. 10a) and PM2.5 (Fig. 10b) were not completely consistent, however, which indicated that their interrelationship was complex. Meanwhile, a heavily polluted region was revealed at the junction of five provinces (Hebei, Henan, Anhui, Jiangsu, and Shandong). The annualized ground-level observations are shown in Fig. 10c, and the scatter plot of these observations versus the estimated PM2.5 levels is shown in Fig. 10d, with the R and RMSE calculated to be 0.94 and 3.64 μg/m3, respectively.

Fig. 10

Spatial distribution for annual average PM2.5 concentrations: a AOD; b estimates; c ground-level observations; d scatter plot for ground-level observations versus estimates

Annualized PM2.5 estimate averages for ten hours (00:00–09:00 Coordinated Universal Time, UTC) in 2017 are shown in Fig. 11, whereas Fig. 12 shows the corresponding ground-level observation frequencies. Figure 11 demonstrates that the proposed model can provide at least ten hourly PM2.5 estimations in one day for any given area, i.e., it can provide PM2.5 concentration information at high frequencies. In Fig. 12, we can see that the data volume varied over the day: it peaked at noon, local time (03:00 UTC), and gradually decreased toward the morning and the afternoon. This could be explained by the fact that the ability of the satellite to capture aerosol signals decreased as the solar zenith angle increased (Zhang et al., 2019).

Fig. 11

Annualized PM2.5 estimate averages for different hours (as UTC) in 2017: a–j represent 00:00–09:00, respectively

Fig. 12

Annualized frequencies for PM2.5 concentration estimates for different hours in 2017 (as UTC): a–j represent 00:00–09:00, respectively

Variations in annual average hourly PM2.5 estimations for different cities are shown in Fig. 13. The cities were selected based on their distribution and representativeness for different regions. They displayed similar trends, i.e., over the period 00:00–09:00 (UTC), the PM2.5 levels peaked at certain times and then decreased. The peaks for Hefei (117.24, 31.84), Shijiazhuang (114.52, 38.03), Nanjing (118.80, 32.07), and Jinan (117.13, 36.64) appeared at either 02:00 or 03:00 (UTC), whereas Beijing (116.41, 39.90) and Zhengzhou (113.64, 34.75) peaked at 01:00 (UTC). Minimum values occurred in all cities at either 08:00 or 09:00 (UTC). Figure 13 shows that the proposed model could describe variations in PM2.5 concentrations at a temporal resolution of 1 h.

Fig. 13

Annual mean variations in 2017 for hourly PM2.5 concentrations in selected study area cities

Summary and Conclusions

It remains a challenge to derive ground-level PM2.5 accurately at high temporal resolution from satellite-derived AOD data, especially for regions with high particulate matter concentrations and complex compositions, such as the heavily polluted NCP. Herein, we presented a DNN model that was calibrated using an aerosol product (from the new-generation geostationary satellite Himawari-8) together with meteorological and land cover information to estimate hourly ground-level PM2.5 concentrations over the NCP. The estimated PM2.5 concentrations had spatial and temporal resolutions of 0.05° and 1 h, respectively, which captures more detailed temporal variations in PM2.5 than would have been possible using polar-orbiting satellite sensors such as MODIS. A total of 11 independent variables were used to fit the proposed model: AHI AOD, surface pressure, air temperature, E and N wind speeds, PBLH, RH, surface albedo, SWGDN, elevation, and NDVI. Through data integration, a total of 151,726 records from 313 ground stations were collected. To validate the model performance, tenfold CV was conducted, and it was found that both the sample-based (R = 0.86, RMSE = 21.40 μg/m3) and site-based (R = 0.83, RMSE = 23.65 μg/m3) CVs exhibited satisfactory performance. R values were calculated to be 0.81 and 0.90 for daily and monthly averaged PM2.5 levels, respectively.

When mapped, the estimated PM2.5 concentrations for the NCP showed clear seasonal variations, with the highest PM2.5 concentrations appearing in winter, followed by spring, autumn, and summer. The annual patterns showed that low PM2.5 concentration levels occurred in NW Hebei Province, whereas the area representing the junction of Hebei, Henan, Anhui, Jiangsu, and Shandong provinces was identified as being heavily polluted.

We also mapped annualized averages of the 2017 hourly estimates for ten hours (from 00:00 to 09:00 UTC). These results suggested that the proposed model could provide at least ten hourly PM2.5 estimations per day, and thus that it could reveal atmospheric variations at high temporal frequency. These tests allowed us to conclude that new-generation geostationary satellites have the potential to serve as useful data sources for ground-level PM2.5 estimation.