Selection of the data time interval for the prediction of maximum ozone concentrations

  • Original Paper
  • Stochastic Environmental Research and Risk Assessment

Abstract

This paper highlights the problem of selecting the step length, called the data time interval, for the one-step-ahead prediction of ozone. The problem is examined through a case-study comparison of two approaches for predicting the maximum daily values of tropospheric ozone. The first approach is a 1-day-ahead prediction; the second predicts the maximum values through a multi-step-ahead iteration of 1-h predictions. Gaussian-process modelling is used for this comparison, in particular evolving Gaussian-process models that are updated on-line with the incoming measurement data. Models of this kind have been used successfully in the past for the prediction of ozone pollution. The contribution of this paper is an assessment of the way in which the maximum ozone values are predicted: the daily maximum ozone values forecast by a model based on 1-day-ahead predictions are compared with those obtained by iterated 1-h-ahead predictions of ozone made at predetermined hours of the day. The forecast results favour the on-line model based on hourly predictions as the predictions get closer to the actual ozone maximum, and favour the daily predictions when the predictions are made on a daily basis.
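The second approach, iterating a 1-h-ahead model over several steps and taking the maximum of the resulting trajectory, can be illustrated with a minimal Python sketch. The function names `iterate_one_step` and `toy_hourly_model` are hypothetical, and the toy model is a simple linear recursion standing in for a trained predictor; it is not a Gaussian-process model, and only point predictions are fed back, so no predictive uncertainty is propagated.

```python
from typing import Callable, List, Sequence


def iterate_one_step(predict_one_step: Callable[[Sequence[float]], float],
                     history: Sequence[float],
                     horizon: int,
                     n_lags: int = 1) -> List[float]:
    """Multi-step-ahead forecast obtained by iterating a one-step predictor.

    Each prediction is appended to the input window and used as if it were
    a measurement when predicting the next step.
    """
    window = list(history[-n_lags:])
    trajectory = []
    for _ in range(horizon):
        y_next = predict_one_step(window)
        trajectory.append(y_next)
        window = window[1:] + [y_next]  # slide the lag window forward
    return trajectory


def toy_hourly_model(window: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained 1-h-ahead model (assumption).
    return 0.5 * window[-1] + 10.0


# Iterate the hourly model three steps ahead from one measured value.
hourly_forecast = iterate_one_step(toy_hourly_model, history=[40.0], horizon=3)

# The forecast daily maximum is the largest value along the trajectory.
predicted_daily_max = max(hourly_forecast)
```

A 1-day-ahead model, by contrast, would map the inputs available today directly to tomorrow's maximum in a single step, with no feedback of predictions.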

Acknowledgements

The authors acknowledge the financial support from the Slovenian Research Agency (Projects Nos. L2-5475, L2-8174 and P2-0001). The Slovenian Environment Agency provided part of the data.

Author information

Corresponding author

Correspondence to Juš Kocijan.

Appendix: Performance measures

The following are performance measures used in the study.

  • The root-mean-square error (RMSE):

    $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(E(\hat{y}_i)-y_i\right)^2},$$
    (9)

    where \(y_i\) and \(\hat{y}_i\) are the observation and the prediction in the i-th step, respectively, \(E(\cdot)\) denotes the expectation, i.e., the mean value, of the random variable, and \(N\) is the number of observations used.

  • The standardised mean-squared error (SMSE):

    $$\mathrm{SMSE} = \frac{1}{N}\,\frac{\sum_{i=1}^{N}\left(E(\hat{y}_i)-y_i\right)^2}{\sigma_y^2},$$
    (10)

    where \(\sigma_y^2\) is the variance of the observations.

  • Pearson’s correlation coefficient (PCC):

    $$\mathrm{PCC} = \frac{\sum_{i=1}^{N}\left(E(\hat{y}_i)-E(\hat{\mathbf{y}})\right)\left(y_i-E(\mathbf{y})\right)}{N\sigma_y\sigma_{\hat{y}}},$$
    (11)

    where \(E(\hat{\mathbf{y}})\) and \(E(\mathbf{y})\) are the mean values of the vector of predictions and of the vector of observations, respectively, and \(\sigma_y\), \(\sigma_{\hat{y}}\) are the standard deviations of the observations and the predictions, respectively.

  • The mean fractional bias (MFB):

    $$\mathrm{MFB} = \frac{1}{N}\sum_{i=1}^{N}\frac{E(\hat{y}_i)-y_i}{\frac{1}{2}\left(E(\hat{y}_i)+y_i\right)}.$$
    (12)

  • The fraction of the modelled values within a factor of two of the observations (FAC2):

    $$\mathrm{FAC2} = \frac{1}{N}\sum_{i=1}^{N} n_i \quad \text{with} \quad n_i = \begin{cases} 1 & \text{for } 0.5 \le \left|\frac{E(\hat{y}_i)}{y_i}\right| \le 2,\\ 0 & \text{otherwise}. \end{cases}$$
    (13)

RMSE and SMSE are frequently used measures of the accuracy of the predictions’ mean values; both are 0 in the case of a perfect model. SMSE standardises the mean-squared error by the variance of the observations; its values are typically between 0 and 1, with a value near 1 corresponding to a model that does no better than predicting the mean of the observations. PCC is a measure of linear association and is not sensitive to bias. Its value is between \(-\,1\) and \(+\,1\), with perfectly positively linearly correlated values resulting in a value of 1. MFB is a bounded measure of bias that gives additional weight to underestimations and less weight to overestimations. For positive concentrations its value is between \(-\,2\) and \(+\,2\), with the value 0 in the case of a perfect model. FAC2 indicates the fraction of the data that satisfies the condition from Eq. (13). Its value is between 0 and 1, with the perfect model resulting in a value of 1.
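As an illustration, the five measures in Eqs. (9) to (13) can be computed with a short NumPy sketch. The function name `performance_measures` and the sample values are hypothetical and are not taken from the study. The sketch assumes that the predictive means \(E(\hat{y}_i)\) are already available as an array, that the variance and standard deviations are population values (divisor \(N\)), and that all observations are strictly positive so that the ratios in MFB and FAC2 are defined.

```python
import numpy as np


def performance_measures(y_obs, y_pred_mean):
    """Return RMSE, SMSE, PCC, MFB and FAC2 for paired observations/predictions."""
    y = np.asarray(y_obs, dtype=float)        # observations y_i
    m = np.asarray(y_pred_mean, dtype=float)  # predictive means E(y_hat_i)
    err = m - y
    mse = np.mean(err ** 2)

    rmse = np.sqrt(mse)                       # Eq. (9)
    smse = mse / np.var(y)                    # Eq. (10), population variance (assumption)
    pcc = (np.mean((m - m.mean()) * (y - y.mean()))
           / (np.std(y) * np.std(m)))         # Eq. (11)
    mfb = np.mean(err / (0.5 * (m + y)))      # Eq. (12)
    ratio = np.abs(m / y)                     # requires y_i != 0
    fac2 = np.mean((ratio >= 0.5) & (ratio <= 2.0))  # Eq. (13)

    return {"RMSE": float(rmse), "SMSE": float(smse), "PCC": float(pcc),
            "MFB": float(mfb), "FAC2": float(fac2)}


# Made-up values purely for illustration (not data from the study).
y_obs = [50.0, 100.0, 150.0, 200.0]
y_hat = [60.0, 90.0, 160.0, 90.0]
scores = performance_measures(y_obs, y_hat)
```

In this made-up example the last prediction is less than half of the observed value, so it fails the FAC2 condition and also dominates the squared-error measures.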

Cite this article

Kocijan, J., Gradišar, D., Stepančič, M. et al. Selection of the data time interval for the prediction of maximum ozone concentrations. Stoch Environ Res Risk Assess 32, 1759–1770 (2018). https://doi.org/10.1007/s00477-017-1468-y

  • DOI: https://doi.org/10.1007/s00477-017-1468-y
