Abstract
This paper addresses the selection of the step length for the one-step-ahead prediction of ozone, referred to as the data time interval. Two approaches to predicting the maximum daily values of tropospheric ozone are compared in a case study. The first is a 1-day-ahead prediction; the second predicts the maximum values from a multi-step-ahead iteration of 1-h predictions. Gaussian-process modelling is used for this comparison, in particular evolving Gaussian-process models that are updated on-line with the incoming measurement data. Models of this kind have been used successfully in the past for the prediction of ozone pollution. The contribution of this paper is an assessment of how the maximum ozone values are predicted: the daily maximum ozone values forecast by a model based on 1-day-ahead predictions are compared with those obtained by iterated 1-h-ahead predictions of ozone made at predetermined hours of the day. The forecast results favour the on-line model based on hourly predictions when the prediction time approaches that of the real maximum values of ozone, and favour the daily predictions when the predictions are made on a daily basis.
Acknowledgements
The authors acknowledge the financial support from the Slovenian Research Agency (Projects Nos. L2-5475, L2-8174 and P2-0001). The Slovenian Environment Agency provided part of the data.
Appendix: Performance measures
The following are performance measures used in the study.
- The root-mean-square error (RMSE):

  $$\begin{aligned} \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N (E(\hat{y}_i)-y_i)^2}, \end{aligned}\tag{9}$$

  where \(y_i\) and \(\hat{y}_i\) are the observation and the prediction in the i-th step, respectively, \(E(\cdot)\) denotes the expectation, i.e., the mean value, of the random variable, and N is the number of observations used.

- The standardised mean-squared error (SMSE):

  $$\begin{aligned} \mathrm{SMSE} = \frac{1}{N}\frac{\sum_{i=1}^N (E(\hat{y}_i)-y_i)^2}{\sigma_y^2}, \end{aligned}\tag{10}$$

  where \(\sigma_y^2\) is the variance of the observations.

- Pearson's correlation coefficient (PCC):

  $$\begin{aligned} \mathrm{PCC} = \frac{\sum_{i=1}^N (E(\hat{y}_i)-E(\hat{\mathbf{y}}))(y_i-E(\mathbf{y}))}{N\sigma_y\sigma_{\hat{y}}}, \end{aligned}\tag{11}$$

  where \(E(\hat{\mathbf{y}})\) is the expectation, i.e., the mean value, of the vector of predictions, and \(\sigma_y\), \(\sigma_{\hat{y}}\) are the standard deviations of the observations and the predictions, respectively.

- The mean fractional bias (MFB):

  $$\begin{aligned} \mathrm{MFB} = \frac{1}{N}\sum_{i=1}^N \frac{E(\hat{y}_i)-y_i}{\frac{1}{2}(E(\hat{y}_i)+y_i)}. \end{aligned}\tag{12}$$

- The fraction of the modelled values within a factor of two of the observations (FAC2):

  $$\begin{aligned} \mathrm{FAC2} = \frac{1}{N}\sum_{i=1}^N n_i \ \ \mathrm{with}\ \ n_i = \left\{ \begin{array}{ll} 1 & \mathrm{for}\ \ 0.5 \le \left| \frac{E(\hat{y}_i)}{y_i} \right| \le 2,\\ 0 & \mathrm{else}. \end{array}\right. \end{aligned}\tag{13}$$
RMSE and SMSE are frequently used measures of the accuracy of the predictions' mean values; both are 0 in the case of a perfect model. SMSE is the mean-squared error standardised by the variance of the observations, so a value of 1 corresponds to a trivial model that always predicts the mean of the observations, and poorer models can exceed 1. PCC is a measure of linear association and is not sensitive to bias. Its value is between \(-\,1\) and \(+\,1\), with perfectly positively linearly correlated values resulting in a value of 1. MFB is a bounded measure of bias that gives more weight to underestimations than to overestimations of the same absolute size. For non-negative observations and predictions its value is between \(-\,2\) and \(+\,2\), with the value 0 in the case of a perfect model. FAC2 indicates the fraction of the data that satisfies the condition from Eq. (13). Its value is between 0 and 1, with the perfect model resulting in a value of 1.
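As an illustration only, the measures in Eqs. (9)–(13) could be computed along the following lines. This is a minimal sketch and not the code used in the study: the function name `performance_measures` and its argument names are hypothetical, the variances and standard deviations are taken as population values (division by N) to match the 1/N factor in Eq. (11), and the observations are assumed to be non-zero and neither series constant, since otherwise a division by zero occurs.

```python
import math


def performance_measures(y_obs, y_pred_mean):
    """Sketch of Eqs. (9)-(13); names are illustrative, not from the paper.

    y_obs       -- observations y_i (assumed non-zero, not all equal)
    y_pred_mean -- predictive means E(y_hat_i) (assumed not all equal)
    """
    y = [float(v) for v in y_obs]
    m = [float(v) for v in y_pred_mean]
    n = len(y)
    if n == 0 or n != len(m):
        raise ValueError("expected two non-empty sequences of equal length")

    mean_y = sum(y) / n
    mean_m = sum(m) / n
    # Population variances (divide by N), an assumption of this sketch.
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    var_m = sum((v - mean_m) ** 2 for v in m) / n
    mse = sum((mi - yi) ** 2 for mi, yi in zip(m, y)) / n

    rmse = math.sqrt(mse)                                    # Eq. (9)
    smse = mse / var_y                                       # Eq. (10)
    cov = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y)) / n
    pcc = cov / math.sqrt(var_y * var_m)                     # Eq. (11)
    mfb = sum((mi - yi) / (0.5 * (mi + yi))
              for mi, yi in zip(m, y)) / n                   # Eq. (12)
    fac2 = sum(1 for mi, yi in zip(m, y)
               if 0.5 <= abs(mi / yi) <= 2.0) / n            # Eq. (13)

    return {"RMSE": rmse, "SMSE": smse, "PCC": pcc, "MFB": mfb, "FAC2": fac2}


# Made-up numbers purely for demonstration: predictions offset by +1.
measures = performance_measures([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

With these made-up inputs every prediction overestimates the observation by the same amount, so PCC remains at 1 (it is insensitive to bias) while RMSE, SMSE and MFB move away from 0.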
Cite this article
Kocijan, J., Gradišar, D., Stepančič, M. et al. Selection of the data time interval for the prediction of maximum ozone concentrations. Stoch Environ Res Risk Assess 32, 1759–1770 (2018). https://doi.org/10.1007/s00477-017-1468-y
DOI: https://doi.org/10.1007/s00477-017-1468-y