Abstract
PM2.5 has a significant negative impact on human health and air quality, so accurate prediction of its concentration is necessary. Common point prediction models lose accuracy because PM2.5 concentrations are subject to many sources of uncertainty. To address this issue, this paper proposes a quantile regression neural network (QRNN) model based on the least absolute shrinkage and selection operator (LASSO), combined with kernel density estimation (KDE), for probability density prediction of PM2.5 concentrations. The model uses LASSO regression to select the influential factors; the conditional quantiles of daily PM2.5 concentrations computed by the QRNN model are then passed to the KDE model to obtain probability density predictions of PM2.5 concentrations. Empirical analyses are carried out on data from the Chinese cities of Beijing and Jinan as well as six other datasets, and predictive performance is assessed with evaluation criteria for both point prediction and interval prediction. The results show that the LASSO-QRNN-KDE model performs well: it screens high-dimensional data effectively and achieves higher accuracy than the commonly used models it is compared against. In addition, the model describes the uncertainty of PM2.5 concentration fluctuations and conveys more information about how concentrations vary, giving relevant policy makers a new tool for PM2.5 concentration prediction.
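The three stages named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the toy design matrix, the penalty `lam`, the coordinate-descent solver, the Gaussian kernel, and the rule-of-thumb bandwidth are not taken from the paper. The neural network itself is omitted; only the pinball loss that a quantile regression network would minimise is shown, and the quantile predictions fed to the KDE step are hypothetical values.

```python
import math


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by coordinate-descent LASSO."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept, centred data assumed)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j (partial residual).
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at quantile level tau."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def gaussian_kde(samples, bandwidth=None):
    """Return a density function built from Gaussian kernels centred on the samples."""
    n = len(samples)
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)  # rule-of-thumb bandwidth (assumption)
    norm_const = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm_const

    return density


# Stage 1: LASSO screening on a toy design with orthogonal, centred columns
# (hypothetical predictors; only the first one has a strong effect on y).
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.05, 2.95, -3.05, -2.95]
beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected feature indices:", selected)

# Stage 2 (stand-in): hypothetical predicted quantiles for one day at nine levels.
quantile_preds = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0]

# Stage 3: smooth the predicted quantiles into a probability density.
pdf = gaussian_kde(quantile_preds)
print("density at 50:", pdf(50.0))
```

In a full pipeline, the columns kept by the LASSO step would be the network inputs, one set of predicted quantiles would be produced per day, and a separate density would be estimated for each day.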
Data availability
The datasets analysed during the current study are available in the National Meteorological Data Center of China repository, http://data.cma.cn/.
Author information
Contributions
Shaomei Yang: methodology, resources, and writing — review and editing. Haoyue Wu: conceptualization, software, validation, investigation, data curation, and writing — original draft.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Responsible editor: Ilhan Ozturk
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yang, S., Wu, H. A novel PM2.5 concentrations probability density prediction model combines the least absolute shrinkage and selection operator with quantile regression. Environ Sci Pollut Res 29, 78265–78291 (2022). https://doi.org/10.1007/s11356-022-21318-3
DOI: https://doi.org/10.1007/s11356-022-21318-3