A Mixture Model Approach for Compositional Data: Inferring Land-Use Influence on Point-Referenced Water Quality Measurements

  • Adrien Ickowicz
  • Jessica Ford
  • Keith Hayes


The assessment of water quality across space and time is of considerable interest for both agricultural and public health reasons. The standard method for assessing the water quality of a sub-catchment, or a group of sub-catchments, usually involves collecting point measurements of water quality together with auxiliary information such as the date and time of the measurements, rainfall amounts, the land use and soil type of the catchment, and the elevation. Some of this auxiliary information is point-referenced data, measured at the exact location, whereas other information, such as land use, is areal data, often recorded in a compositional format at the catchment or sub-catchment level. The spatial change of support induced by this data collection process breaks the natural link between the response variable and the predictors. In this paper, we present an approach to reconstruct this link by using a categorical latent variable that identifies the land use that most likely influences water quality in each sub-catchment. This constitutes the spatial clustering layer of the model. Each cluster is associated with an estimated temporal variability common to the water quality measurements. The strength of this approach lies in the temporal variation identifying each cluster, allowing decision makers to make informed decisions regarding land use and its influence on water quality. We demonstrate the potential of this approach with data from a water quality research study in the Mount Lofty Ranges, South Australia.
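To make the role of the categorical latent variable concrete, the following is a minimal sketch and not the model fitted in the paper. It assumes, purely for illustration, that the land-use composition of a sub-catchment acts as the prior weights of the latent variable, that each land use has a known temporal profile, and that measurement noise is Gaussian with a common standard deviation. The function name `land_use_posterior` and all numerical values are hypothetical.

```python
import math


def land_use_posterior(composition, series, cluster_means, sigma=1.0):
    """Posterior probability that each land use drives one sub-catchment.

    composition   : land-use proportions (summing to 1), used here as
                    prior weights of the latent variable (an assumption)
    series        : water quality measurements at T time points
    cluster_means : one length-T temporal profile per land use
                    (hypothetical values, not estimates from the paper)
    sigma         : assumed common Gaussian noise standard deviation
    """
    log_post = []
    for weight, mean in zip(composition, cluster_means):
        if weight == 0.0:
            # A land use absent from the sub-catchment cannot be selected.
            log_post.append(-math.inf)
            continue
        # Gaussian log-likelihood up to a constant shared by all clusters.
        log_lik = sum(-0.5 * ((y - m) / sigma) ** 2
                      for y, m in zip(series, mean))
        log_post.append(math.log(weight) + log_lik)

    # Normalise on the log scale for numerical stability.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical sub-catchment: 60% / 30% / 10% across three land uses.
composition = [0.6, 0.3, 0.1]
series = [1.0, 2.0, 3.0]
cluster_means = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(land_use_posterior(composition, series, cluster_means))
```

In this toy example the second land use receives the highest posterior probability even though it covers only 30% of the sub-catchment, because its temporal profile matches the observed series. This illustrates how temporal variation, rather than areal share alone, can identify the influential land use.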


Spatio-temporal data; Model-based clustering; Change of support problem; Bayesian analysis



Funding was provided by “Goyder Institute for Water Research”.



Copyright information

© International Biometric Society 2019

Authors and Affiliations

  1. CSIRO, Hobart, Australia
