A novel PM2.5 concentrations probability density prediction model combines the least absolute shrinkage and selection operator with quantile regression

  • Research Article
  • Environmental Science and Pollution Research

Abstract

PM2.5 has a significant negative impact on human health and air quality, so accurate prediction of its concentration is necessary. Common point prediction models suffer from poor accuracy because PM2.5 concentrations are subject to many sources of uncertainty. To address this issue, this paper proposes a quantile regression neural network (QRNN) model based on the least absolute shrinkage and selection operator (LASSO), combined with kernel density estimation (KDE), for probability density prediction of PM2.5 concentrations. The model uses LASSO regression to select the influential factors; the quantiles of daily PM2.5 concentrations computed by the QRNN model are then passed to the KDE model to obtain probability density predictions of PM2.5 concentrations. Empirical analyses are carried out on data for the Chinese cities of Beijing and Jinan as well as six other datasets, and the prediction performance of the model is assessed with evaluation criteria for both point prediction and interval prediction. The simulations show that the LASSO-QRNN-KDE model performs well: it is effective at screening high-dimensional data and achieves higher accuracy than commonly used models. In addition, the model describes the uncertainty of PM2.5 concentration fluctuations and carries more information on the variation of PM2.5 concentrations, which can provide relevant policy makers with a novel PM2.5 concentration prediction tool.
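The three ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the quantile values are hypothetical numbers standing in for the output of a trained QRNN, the Gaussian kernel and the Silverman rule-of-thumb bandwidth are assumptions, and `soft_threshold` shows only the shrinkage step that lets LASSO set coefficients exactly to zero, not a full LASSO fit.

```python
import math
import statistics


def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss, the objective minimised in quantile regression."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        err = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * err if err >= 0 else (tau - 1.0) * err
    return total / len(y_true)


def soft_threshold(z, lam):
    """LASSO shrinkage step: values with |z| <= lam become exactly zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def gaussian_kde(points, bandwidth=None):
    """Return a density function built from Gaussian kernels centred on `points`."""
    n = len(points)
    if bandwidth is None:
        # Silverman's rule of thumb: an assumption, not the paper's bandwidth choice.
        bandwidth = 1.06 * statistics.stdev(points) * n ** (-0.2)

    def density(x):
        s = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return s / (n * bandwidth * math.sqrt(2.0 * math.pi))

    return density


# Hypothetical conditional quantiles for one day (ug/m^3) at tau = 0.05, 0.10, ..., 0.95.
# These made-up numbers stand in for the output of a trained QRNN.
predicted_quantiles = [31.0, 34.5, 37.0, 39.0, 40.5, 42.0, 43.5, 45.0, 46.5, 48.0,
                       49.5, 51.0, 53.0, 55.0, 57.5, 60.5, 64.0, 69.0, 77.0]

density = gaussian_kde(predicted_quantiles)

# Evaluate the density on a grid from 0 to 150 ug/m^3 in steps of 0.5.
step = 0.5
grid = [step * i for i in range(301)]
values = [density(x) for x in grid]

mode = grid[values.index(max(values))]  # most likely concentration, usable as a point forecast
area = sum(values) * step               # Riemann sum of the density; should be close to 1

print(f"mode = {mode} ug/m^3, area under density = {area:.4f}")
```

The density carries the interval information that a single point forecast lacks: integrating it between two concentrations gives the predicted probability of falling in that range. In practice one would more likely rely on established library implementations of LASSO, quantile regression, and KDE than on hand-written versions like these.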

Data availability

The datasets analysed during the current study are available in the National Meteorological Data Center of China repository, http://data.cma.cn/.

Author information

Contributions

Shaomei Yang: methodology, resources, and writing — review and editing. Haoyue Wu: conceptualization, software, validation, investigation, data curation, and writing — original draft.

Corresponding author

Correspondence to Haoyue Wu.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interest

The authors declare no competing interests.

Additional information

Responsible editor: Ilhan Ozturk

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Tables 8, 9, 10

Table 8 Statistical description of data sets
Table 9 The results of the correlation analysis (Beijing)
Table 10 The results of the correlation analysis (Jinan)

Appendix 2

Tables 11, 12

Table 11 LASSO regression results of Beijing data
Table 12 LASSO regression results of Jinan data

Appendix 3

Tables 13, 14

Table 13 Prediction results for three situations of Beijing data
Table 14 Prediction results for three situations of Jinan data

Appendix 4

Tables 15, 16, 17, 18

Table 15 Predicted values under various models of Beijing data (\(\mu\mathrm{g}/\mathrm{m}^{3}\))
Table 16 Relative errors under various models of Beijing data (%)
Table 17 Predicted values under various models of Jinan data (\(\mu\mathrm{g}/\mathrm{m}^{3}\))
Table 18 Relative errors under various models of Jinan data (%)

About this article

Cite this article

Yang, S., Wu, H. A novel PM2.5 concentrations probability density prediction model combines the least absolute shrinkage and selection operator with quantile regression. Environ Sci Pollut Res 29, 78265–78291 (2022). https://doi.org/10.1007/s11356-022-21318-3

  • DOI: https://doi.org/10.1007/s11356-022-21318-3
