Nonparametric location–scale model for the joint forecasting of $\hbox {SO}_{2}$ and $\hbox {NO}_{x}$ pollution episodes

Roca-Pardiñas, J.; Ordóñez, C.; Lado-Baleato, O.

doi:10.1007/s00477-020-01901-1


Original Paper
Published: 22 October 2020

Volume 35, pages 231–244 (2021)

Stochastic Environmental Research and Risk Assessment


Abstract

We present a method to forecast pollution episodes with a bivariate response. The method simultaneously estimates the concentrations of two pollutants, using historical data. It is based on a location–scale model where the means and the standard deviations are approximated by kernel smoothers in additive models, while the variance–covariance matrix is obtained from the residuals of the previous models. The method provides not only an estimation of the concentration of both pollutants over time but also uncertainty regions covering a specific percentage of the data. The suitability of the model was tested with both simulated and real data (specifically $\hbox {SO}_2$ and $\hbox {NO}_x$ concentrations from a coal-fired power station). The results have proved highly satisfactory in both cases. The percentage of data covered by the uncertainty region, its area and a new loss function, a variant of the pinball loss function, were used as metrics to evaluate the performance of the model.
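The abstract does not state exactly how the uncertainty region is built from the fitted means, standard deviations and residual covariance matrix. Purely as an illustration, the sketch below assumes that the standardized residuals $(Y_k-\hat\mu_k)/\hat\sigma_k$, $k=1,2$, are approximately bivariate normal, so that a region of nominal level p is an ellipse bounded by the chi-square quantile with 2 degrees of freedom. It then computes the coverage and area used as evaluation metrics. All function names are hypothetical, and `pinball` is the standard pinball (quantile) loss, not the variant proposed by the authors, whose definition is not reproduced here.

```python
import math

def chi2_quantile_2df(p):
    """Chi-square quantile with 2 df: the CDF is 1 - exp(-q/2), so q = -2 log(1 - p)."""
    return -2.0 * math.log(1.0 - p)

def residual_cov(z1, z2):
    """Sample covariance (s11, s12, s22) of two series of standardized residuals."""
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    s11 = sum((a - m1) ** 2 for a in z1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in z2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (n - 1)
    return s11, s12, s22

def in_region(y, mu, sd, cov, p):
    """True if the pair y = (y1, y2) lies inside the elliptical region of nominal level p."""
    r1 = (y[0] - mu[0]) / sd[0]
    r2 = (y[1] - mu[1]) / sd[1]
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    # Mahalanobis distance r' S^{-1} r, with the 2x2 inverse written out explicitly
    d2 = (s22 * r1 * r1 - 2.0 * s12 * r1 * r2 + s11 * r2 * r2) / det
    return d2 <= chi2_quantile_2df(p)

def region_area(sd, cov, p):
    """Ellipse area pi * q * sqrt(det(D S D)) with D = diag(sd), i.e. pi * q * sd1 * sd2 * sqrt(det S)."""
    s11, s12, s22 = cov
    return math.pi * chi2_quantile_2df(p) * sd[0] * sd[1] * math.sqrt(s11 * s22 - s12 ** 2)

def coverage(ys, mus, sds, cov, p):
    """Fraction of observations inside their own (covariate-dependent) region."""
    hits = sum(in_region(y, m, s, cov, p) for y, m, s in zip(ys, mus, sds))
    return hits / len(ys)

def pinball(y, q, tau):
    """Standard pinball loss of a tau-quantile prediction q (not the paper's variant)."""
    u = y - q
    return tau * u if u >= 0 else (tau - 1.0) * u
```

Because the means and standard deviations depend on the covariates, each observation carries its own centre and scale, while the residual covariance matrix is shared; this mirrors the split between the additive models and the residual-based covariance described above.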


References

Abhilash MSK, Thakur A, Gupta D, Sreevidya B (2018) Time series analysis of air pollution in Bengaluru using ARIMA model. In: Perez G, Tiwari S, Trivedi M, Mishra K (eds) Ambient communications and computer systems. Advances in intelligent systems and computing, vol 696. Springer, Singapore
Antanasijević D, Pocajt V, Perić-Grujić A, Ristić M (2018) Multiple-input-multiple-output general regression neural networks model for the simultaneous estimation of traffic-related air pollutants. Atmospheric Pollut Res 9:388–397
Azid IA, Ripin ZM, Aris MS, Ahmad AL, Seetharamu KN, Yusoff RM (2000) Predicting combined-cycle natural gas power plant emissions by using artificial neural networks. In: TENCON proceedings. Intelligent systems and technologies for the new millennium (Cat. No.00CH37119), Kuala Lumpur, Malaysia, vol 3, pp 512–517
Eastoe EF (2008) A hierarchical model for non-stationary multivariate extremes: a case study of surface-level ozone and NOx data in the UK. Environmetrics 20:428–444
Ferretti G, Piroddi L (2001) Estimation of NOx emissions in thermal power plants using neural networks. J Eng Gas Turbines Power Trans ASME 132(2):465–471
Garcia JM, Teodoro F, Cerdeira R, Coelho LMR, Prashant K, Carvalho MG (2016) Developing a methodology to predict PM10 concentrations in urban areas using generalized linear models. Environ Technol 37(18):2316–2325
García Nieto PJ, Sánchez-Lasheras F, García-Gonzalo E, de Cos Juez FJ (2018) PM10 concentration forecasting in the metropolitan area of Oviedo (Northern Spain) using models based on SVM, MLP, VARMA and ARIMA: a case study. Sci Total Environ 621:753–761
Genest C, Rivest L (1993) Statistical inference procedures for bivariate Archimedean copulas. J Am Stat Assoc 88:1034–1043
Gilson M, Dahmen D, Moreno-Bote R, Insabato A, Helias M (2019) The covariance perceptron: a new paradigm for classification and processing of time series in recurrent neuronal networks. BioRxiv. https://doi.org/10.1101/562546
Giorgio C, Scanagatta M (2016) Air pollution prediction via multi-label classification. Environ Model Softw 80:259–264
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall/CRC Monographs on Statistics and Applied Probability, London
Hsu KJ (1992) Time series analysis of the interdependence among air pollutants. Atmospheric Environ Part B Urban Atmosphere 26:491–503
Ibrahim MZ, Roziah Z, Marzuki I, Muhd SL (2009) Forecasting and time series analysis of air pollutants in several area of Malaysia. Am J Environ Sci 5(5):625–632
Kadiyala A, Kumar A (2019) Vector time series models for prediction of air quality inside a public transportation bus using available software. Environ Prog Sustain Energy 33(22):337–341
Kreuzer A, Valle LD, Czado C (2019) A Bayesian non-linear state space copula model to predict air pollution in Beijing. arXiv:1903.08421
Martínez-Silva I, Roca-Pardiñas J, Ordóñez C (2016) Forecasting SO2 pollution incidents by means of quantile curves based on additive models. Environmetrics 27(3):147–157
Muñoz E, Martín ML, Turias IJ, Jimenez-Come MJ, Trujillo FJ (2014) Prediction of PM10 and SO2 exceedances to control air pollution in the Bay of Algeciras, Spain. Stoch Environ Res Risk Assess 28(6):1409–1420
Nelsen RB (1999) An introduction to copulas. Springer, New York
Perez P, Trier A, Reyes J (2000) Prediction of PM2.5 concentrations several hours in advance using neural networks in Santiago, Chile. Atmospheric Environ 34:1189–1196
Roca-Pardiñas J, Ordóñez C (2019) Predicting pollution incidents through semiparametric quantile regression models. Stoch Environ Res Risk Assess 33(3):673–685
Roca-Pardiñas J, González Manteiga W, Febrero-Bande M, Prada-Sánchez JM, Cadarso-Suárez C (2004) Predicting binary time series of SO2 using generalized additive models with unknown link function. Environmetrics 15(7):729–742
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B (Methodol) 53(3):683–690
Siew LY, Ching LY, Wee PMJ (2008) ARIMA and integrated ARFIMA models for forecasting air pollution index in Shah Alam Selangor. Malays J Anal Sci 12(1):257–263
Snezhana PK, Krassi VR, Todor V, Silviya BP (2012) Using copulas to measure association between air pollution and respiratory diseases. Int J Environ Ecol Eng 6(11):703–708
Yu K, Lu Z (2004) Local linear additive quantile regression. Scand J Stat 31:333–346
Zhanqiong H, Sriboonchitta S, Jing D (2013) Modeling dependence dynamics of air pollution: time series analysis using a copula based GARCH type model. In: Huynh VN, Kreinovich V, Sriboonchitta S, Suriya K (eds) Uncertainty analysis in econometrics with applications. Advances in intelligent systems and computing, vol 200. Springer, Berlin


Acknowledgements

Javier Roca-Pardiñas acknowledges financial support by the Grant MTM2017-89422-P (MINECO/AEI/FEDER, UE). He also acknowledges financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2019-2022) and the EU (ERDF), Ref. ED431G2019/06. Óscar Lado-Baleato is funded by a predoctoral grant from the Galician Government (Plan I2C)-Xunta de Galicia.

Author information

Authors and Affiliations

Department of Statistics and OR, SiDOR research group & CINBIO, University of Vigo, 36310, Vigo, Spain
J. Roca-Pardiñas
Department of Mining Exploitation and Prospecting, University of Oviedo, 33600, Mieres, Spain
C. Ordóñez
Department of Statistics, Mathematical Analysis and Optimization, University of Santiago de Compostela, Galicia, Spain
O. Lado-Baleato


Corresponding author

Correspondence to J. Roca-Pardiñas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: flexible additive transformed model estimation

To obtain the estimated additive models in Eq. (3), we used a backfitting algorithm based on local polynomial kernel smoothers. For notational simplicity, in this section we denote the response variable by Y and the vector of covariates by ${\mathbf {X}}=(X_1, \ldots , X_p)$. In this regression framework, we consider the transformed additive model:

$$\begin{aligned} E(Y|{\mathbf {X}})= H\left( { \eta _{{\mathbf {X}}} }\right) =H\left( { \alpha + \sum _{j=1}^p f_{j}(X_{j}) }\right) \end{aligned} \tag{18}$$

where H is a known link function and $\eta _{{\mathbf {X}}}=\alpha + \sum _{j=1}^p f_{j}(X_{j})$ is the systematic component. To guarantee identifiability, we also assume that $E[f_j(X_j)]=0$ for $j=1,\ldots ,p$. To estimate model (18), we propose an iterative algorithm based on the Newton–Raphson procedure, which extends the alternating conditional expectation (ACE) algorithm (Hastie and Tibshirani 1990), combined with a backfitting approach (Yu and Lu 2004).

Let $\{\mathbf {X}_i, Y_{i}\}_{i=1}^n$ be an independent random sample of $(\mathbf {X}, Y)$. Fitting (18) amounts to minimizing

$$\begin{aligned} S(\eta )=\sum _{i=1}^n \left( { Y_i - H\left( { \eta _{{\mathbf {X}}_i} }\right) }\right) ^2 \end{aligned}$$

This minimization has no closed-form solution in general, so it is carried out iteratively with a modified Newton–Raphson algorithm whose steps are as follows:

Initialize. Compute ${\hat{\alpha }}=H^{-1} ({\bar{Y}})$, with ${\bar{Y}}=n^{-1}\sum _{i=1}^n Y_i$, and set the initial estimates ${\hat{f}}_{1}^0=\cdots ={\hat{f}}_{p}^0=0$ and ${\hat{\eta }}_i^0={\hat{\alpha }}$ for $i=1,\ldots ,n$.

Step 1. Construct the linearized (working) response ${{\tilde{Y}}}$ and the weights W as

$$\begin{aligned} {{\tilde{Y}}}_i={\hat{\eta }}_i^0 + \frac{Y_i-H({\hat{\eta }}_i^0)}{H'({\hat{\eta }}_i^0) } \quad \mathrm{and} \quad W_i=\frac{H'({\hat{\eta }}_i^0)^2 }{{\hat{\sigma }}_i^2} \end{aligned}$$

where $H'(\eta )=\mathrm {d}H / \mathrm {d}\eta$ and ${\hat{\sigma }}_i^2$ is an estimate of $\mathrm {Var}(Y_i\mid {\hat{\mu }}_i^0)$, with ${\hat{\mu }}_i^0=H({\hat{\eta }}_i^0)$. This estimate can be obtained by fitting an additive model to the squared residuals $(Y_i-H\left( {{\hat{\eta }}_i^0}\right) )^2$.
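As a minimal sketch of Step 1, the function below computes the working response and weights for a user-supplied link H and its derivative H'. The name `linearize` and the choice H = exp in the usage line are illustrative assumptions, and the variance estimates ${\hat{\sigma }}_i^2$ are taken as given rather than fitted.

```python
import math

def linearize(y, eta0, H, dH, var_hat):
    """Working response and weights of Step 1.

    y, eta0, var_hat : sequences of length n (response, current predictor, variance estimates)
    H, dH            : the link function H and its derivative H'
    """
    y_tilde, w = [], []
    for yi, ei, vi in zip(y, eta0, var_hat):
        d = dH(ei)
        y_tilde.append(ei + (yi - H(ei)) / d)  # first-order Taylor expansion of H around eta0
        w.append(d ** 2 / vi)                  # W_i = H'(eta_i^0)^2 / sigma_i^2
    return y_tilde, w

# Hypothetical usage with H = exp (so H' = exp): one observation at eta0 = 1
y_t, w = linearize([2 * math.e], [1.0], math.exp, math.exp, [1.0])  # y_t == [2.0]
```

With the identity link the working response reduces to Y itself and the weights to $1/{\hat{\sigma }}_i^2$, so the outer loop only matters for nonlinear H.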

Step 2. Fit an additive model to ${{\tilde{Y}}}$ weighted by W and compute the updates ${\hat{f}}_j$, for $j=1,\ldots ,p$. At this step, we have used an inner backfitting algorithm based on local polynomial kernel smoothers:

Step 2.1: Cycle through $j=1,\ldots ,p$, calculating the partial residuals
$$\begin{aligned} R_i^j={{\tilde{Y}}}_i - {\hat{\alpha }} - \sum _{k=1}^{j-1}{{\hat{f}}_{k}(X_{ik})} - \sum _{k=j+1}^p{{\hat{f}}_{k}^0(X_{ik})} \end{aligned}$$
and computing the updates ${\hat{f}}_j (X_{ij})$ for $i=1,\ldots ,n$. The local linear kernel estimate of $f_j$ at a location x is ${\hat{f}}_j (x)={\hat{a}}$, where $({\hat{a}}, {\hat{b}})$ minimize
$$\begin{aligned} \sum _{i=1}^n W_i \left( { R_i^j - a-b(X_{ij}-x)}\right) ^2 K\left( \frac{X_{ij} - x}{h_j}\right) \end{aligned} \tag{19}$$
where $K(\cdot )$ denotes a kernel function (a symmetric density) and $h_j>0$ is the smoothing parameter (bandwidth). At this step, the estimates ${\hat{f}}_j$ are recentred to have zero sample mean, ${\hat{f}}_j (X_{ij})\leftarrow {\hat{f}}_j (X_{ij})- n^{-1}\sum _{l=1}^n {\hat{f}}_j (X_{lj})$, in line with the identifiability condition.

Here, we have used a Gaussian kernel, and the bandwidth $h_j$ is recalculated in each iteration by minimizing the cross-validation criterion
$$\begin{aligned} CV=\sum _{i=1}^n{ W_i \left( R_i^j - {\hat{f}}_j^{(-i)}(X_{ij})\right) ^2} \end{aligned} \tag{20}$$
where ${\hat{f}}_j^{(-i)}(X_{ij})$ is the leave-one-out estimate at $X_{ij}$, obtained from the sample without the i-th data vector.
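The minimization in (19) has a closed form. With $u_i=X_{ij}-x$, $c_i=W_iK(u_i/h_j)$, $S_k=\sum _i c_iu_i^k$ and $T_k=\sum _i c_iu_i^kR_i^j$, the normal equations give ${\hat{a}}=(S_2T_0-S_1T_1)/(S_0S_2-S_1^2)$. The sketch below uses this to fit one component with a Gaussian kernel, choose $h_j$ by the leave-one-out criterion (20) and recentre the fit. Searching a finite grid of candidate bandwidths is our own assumption, since the text does not say how (20) is minimized; the function names are hypothetical.

```python
import math

def local_linear(x0, xs, rs, ws, h, skip=None):
    """Weighted local linear estimate a-hat at x0 (Gaussian kernel, bandwidth h).

    Minimizes sum_i W_i (R_i - a - b (X_i - x0))^2 K((X_i - x0)/h); observation `skip` is left out.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for i, (x, r, w) in enumerate(zip(xs, rs, ws)):
        if i == skip:
            continue
        u = x - x0
        c = w * math.exp(-0.5 * (u / h) ** 2)  # Gaussian kernel; its normalizing constant cancels
        s0 += c; s1 += c * u; s2 += c * u * u
        t0 += c * r; t1 += c * u * r
    if s0 == 0.0:
        return math.inf  # all kernel weights underflowed: bandwidth far too small
    den = s0 * s2 - s1 * s1
    if den <= 1e-12 * s0 * s2:
        return t0 / s0  # (nearly) singular design: fall back to the local constant fit
    return (s2 * t0 - s1 * t1) / den

def cv_score(xs, rs, ws, h):
    """Weighted leave-one-out criterion CV(h) of Eq. (20)."""
    return sum(w * (r - local_linear(x, xs, rs, ws, h, skip=i)) ** 2
               for i, (x, r, w) in enumerate(zip(xs, rs, ws)))

def smooth_component(xs, rs, ws, grid):
    """Pick h from a candidate grid by CV, fit at the data points and recentre to zero mean."""
    h = min(grid, key=lambda hh: cv_score(xs, rs, ws, hh))
    fit = [local_linear(x, xs, rs, ws, h) for x in xs]
    m = sum(fit) / len(fit)
    return [f - m for f in fit], h
```

Because a local linear smoother reproduces straight lines exactly, partial residuals that are already linear in $X_j$ come back unchanged, apart from the recentring.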
Step 2.2: Repeat Step 2.1 replacing ${\hat{f}}_{j}^0 (X_{ij})$ by ${\hat{f}}_{j} (X_{ij})$ for $j=1,\ldots ,p$ and $i=1,\ldots ,n$, until the convergence criterion
$$\begin{aligned} \frac{ \sum _{i=1}^n \left( { {\hat{f}}_{j}(X_{ij})-{\hat{f}}_j^0(X_{ij})}\right) ^2}{ \sum _{i=1}^n \left( { {\hat{f}}_{j}^0(X_{ij})}\right) ^2+0.001} \le \varepsilon \quad \text {for all } \quad j=1,\ldots ,p \end{aligned}$$
is reached.
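The inner loop of Steps 2.1 and 2.2 can be sketched as follows. To keep the block short, the bandwidths are held fixed (one per covariate, supplied by the caller) instead of being re-selected by cross-validation at every pass; the data layout `X[i][j]` and the function names are hypothetical.

```python
import math

def ll_fit(xs, rs, ws, h):
    """Weighted local linear fit (Gaussian kernel) at every data point, recentred to zero mean."""
    out = []
    for x0 in xs:
        s0 = s1 = s2 = t0 = t1 = 0.0
        for x, r, w in zip(xs, rs, ws):
            u = x - x0
            c = w * math.exp(-0.5 * (u / h) ** 2)
            s0 += c; s1 += c * u; s2 += c * u * u; t0 += c * r; t1 += c * u * r
        den = s0 * s2 - s1 * s1
        out.append(t0 / s0 if den <= 1e-12 * s0 * s2 else (s2 * t0 - s1 * t1) / den)
    m = sum(out) / len(out)
    return [f - m for f in out]

def backfit(X, y_tilde, w, alpha, h, eps=1e-6, max_iter=100):
    """Inner backfitting of Steps 2.1-2.2; X[i][j] is covariate j of observation i."""
    n, p = len(X), len(X[0])
    f = [[0.0] * n for _ in range(p)]  # f[j][i] holds the current value of f_j(X_ij)
    for _ in range(max_iter):
        f_old = [col[:] for col in f]
        for j in range(p):
            # Partial residuals: components k < j were already updated in this cycle,
            # components k > j still hold their values from the previous cycle.
            r = [y_tilde[i] - alpha - sum(f[k][i] for k in range(p) if k != j) for i in range(n)]
            f[j] = ll_fit([X[i][j] for i in range(n)], r, w, h[j])
        # Relative-change convergence criterion of Step 2.2, required for every component
        if all(sum((a - b) ** 2 for a, b in zip(f[j], f_old[j]))
               / (sum(b ** 2 for b in f_old[j]) + 0.001) <= eps for j in range(p)):
            break
    return f
```

Updating `f[j]` in place is what makes the first sum in the partial residuals use the new estimates ${\hat{f}}_k$ for $k<j$ and the previous ones ${\hat{f}}_k^0$ for $k>j$.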

Step 3. Repeat Steps 1 and 2, replacing ${\hat{\eta }}_i^0$ by ${\hat{\eta }}_i= {\hat{\alpha }} + \sum _{j=1}^p {\hat{f}}_j(X_{ij})$ for $i=1,\ldots ,n$, until

$$\begin{aligned} \frac{ |\textit{MSE} ({\hat{\eta }}^0, Y) -\textit{MSE} ({\hat{\eta }}, Y)|}{\textit{MSE}({\hat{\eta }}^0, Y)} \le \varepsilon \end{aligned}$$

where $\varepsilon$ is a small threshold and the weighted mean squared error is defined as $\textit{MSE}({\hat{\eta }},Y)=n^{-1}\sum _{i=1}^n W_i(Y_i-H({\hat{\eta }}_i))^2$.
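The outer stopping rule only needs the weighted MSE before and after an update of ${\hat{\eta }}$. A small sketch, with hypothetical helper names, assuming the additive components are stored as their fitted values at the sample points:

```python
def update_eta(alpha, f):
    """eta_i = alpha + sum_j f_j(X_ij); f[j][i] is component j evaluated at observation i."""
    return [alpha + sum(fj[i] for fj in f) for i in range(len(f[0]))]

def weighted_mse(y, eta, w, H):
    """MSE(eta, Y) = n^{-1} sum_i W_i (Y_i - H(eta_i))^2."""
    return sum(wi * (yi - H(ei)) ** 2 for yi, ei, wi in zip(y, eta, w)) / len(y)

def outer_converged(y, eta_old, eta_new, w, H, eps=1e-4):
    """Relative change in weighted MSE between two outer (Newton-Raphson) iterations."""
    m_old = weighted_mse(y, eta_old, w, H)
    if m_old == 0.0:
        return True  # already an exact fit; avoids dividing by zero
    return abs(m_old - weighted_mse(y, eta_new, w, H)) / m_old <= eps
```

In a full implementation this check would close a loop that alternates Step 1 (new working response and weights) with Step 2 (inner backfitting).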


About this article

Cite this article

Roca-Pardiñas, J., Ordóñez, C. & Lado-Baleato, O. Nonparametric location–scale model for the joint forecasting of $\hbox {SO}_{{2}}$ and $\hbox {NO}_{{x}}$ pollution episodes. Stoch Environ Res Risk Assess 35, 231–244 (2021). https://doi.org/10.1007/s00477-020-01901-1


Accepted: 08 October 2020
Published: 22 October 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s00477-020-01901-1
