Clustering and forecasting of dissolved oxygen concentration on a river basin

  • Marco Costa
  • A. Manuela Gonçalves
Original Paper


This contribution combines statistical methodologies to classify water quality monitoring sites into spatially homogeneous groups, based on similarities in the temporal dynamics of the dissolved oxygen (DO) concentration, in order to obtain accurate forecasts of this quality variable. DO concentration was selected because it is considered relevant for characterizing water quality. We apply clustering techniques based on Kullback information measures, which are obtained in the state space modelling process. For each homogeneous group of water quality monitoring sites, we model the DO concentration using linear and state space models, which incorporate trend and seasonality components in different ways. The two approaches are compared through the mean squared error (MSE) of their forecasts.
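As a rough illustration of the final comparison step, the sketch below contrasts the forecast MSE of a linear model (intercept, linear trend and monthly dummies) with that of a simple state space model (seasonal means plus a random-walk level, filtered with the Kalman filter). It is a minimal sketch under stated assumptions and not the authors' models: the synthetic monthly series, the function names, the 96/24 train/test split and the fixed variances `var_obs` and `var_level` are all hypothetical, and the Kullback-based clustering step is not reproduced.

```python
import numpy as np

def mse(actual, forecast):
    """Mean squared error of a set of forecasts."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean((actual - forecast) ** 2))

def linear_forecast(y_train, horizon, period=12):
    """Fit intercept + linear trend + seasonal dummies by least squares, then forecast."""
    n = len(y_train)

    def design(t):
        dummies = np.eye(period)[t % period][:, 1:]  # drop one dummy: intercept is present
        return np.column_stack([np.ones(len(t)), t, dummies])

    beta, *_ = np.linalg.lstsq(design(np.arange(n)), y_train, rcond=None)
    return design(np.arange(n, n + horizon)) @ beta

def kalman_one_step(y, n_train, period=12, var_obs=0.1, var_level=0.01):
    """One-step-ahead forecasts from seasonal means plus a random-walk level.

    The variances are fixed, assumed values; in practice they would be estimated.
    """
    y = np.asarray(y, float)
    months = np.arange(len(y)) % period
    # Seasonal means are estimated from the training part only.
    seasonal = np.array(
        [y[:n_train][months[:n_train] == m].mean() for m in range(period)]
    )
    resid = y - seasonal[months]
    level, p = 0.0, 1.0  # assumed initial state mean and variance
    preds = np.empty(len(y))
    for t in range(len(y)):
        p_pred = p + var_level                    # predict the state variance
        preds[t] = seasonal[months[t]] + level    # forecast made before seeing y[t]
        gain = p_pred / (p_pred + var_obs)        # Kalman gain
        level += gain * (resid[t] - level)        # update with the new observation
        p = (1.0 - gain) * p_pred
    return preds[n_train:]

# Synthetic monthly series standing in for DO concentration (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(120)
y = 9.0 - 0.01 * t + 1.5 * np.cos(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)

n_train = 96
mse_linear = mse(y[n_train:], linear_forecast(y[:n_train], len(y) - n_train))
mse_kalman = mse(y[n_train:], kalman_one_step(y, n_train))
print(f"linear model MSE: {mse_linear:.3f}, state space MSE: {mse_kalman:.3f}")
```

Note that the two forecasts are not produced under identical conditions: the linear model keeps its coefficients fixed after the training period, whereas the Kalman filter updates the level with each new observation, so the printed values only illustrate how such a comparison can be set up.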


Hydrological basin; Water quality; Clustering; State space model; Linear model; Kalman filter



The authors would like to thank the anonymous referees for their many helpful criticisms and suggestions, which contributed to improving this paper. The authors also thank Eng. Pimenta Machado from the Portuguese Regional Directory for the Northern Environment and Natural Resources and Eng. Cláudia Brandão from the Portuguese Institute of Water for sharing their skills and experience and for supplying the monitored data. A. Manuela Gonçalves acknowledges the financial support provided by the Research Centre of Mathematics of the University of Minho through the FCT Pluriannual Funding Program.



Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Escola Superior de Tecnologia e Gestão de Águeda, Universidade de Aveiro, Águeda, Portugal
  2. Departamento de Matemática e Aplicações, Universidade do Minho, Guimarães, Portugal
