
Groundwater contaminant source identification via Bayesian model selection and uncertainty quantification

French: Identification of the source of groundwater contaminants via Bayesian model selection and uncertainty quantification

Spanish: Identification of the source of groundwater contaminants through Bayesian model selection and uncertainty quantification

Chinese: Groundwater contamination source identification based on Bayesian model selection and uncertainty quantification

Portuguese: Identification of groundwater contaminant sources via Bayesian model selection and uncertainty quantification

  • Paper
Hydrogeology Journal

Abstract

Many deterministic and stochastic approaches have been applied to groundwater contaminant source identification (GCSI) in recent decades. These implementations are usually based on a single groundwater model or a fixed model structure and ignore model structure uncertainty. However, model structure uncertainty is inevitable in groundwater modeling, especially in complex geological environments with limited observations. This study evaluates the impact of model structure uncertainty on GCSI and proposes a GCSI approach based on Bayesian model selection. In a multiple-model analysis framework, a set of alternative model structures is used to represent the unknown groundwater system. A novel nested sampling algorithm, POLYCHORD, is then used for model selection and source identification; it estimates each model's marginal likelihood and infers the posterior distribution of the contaminant source characteristics simultaneously. The proposed approach is verified through two GCSI case studies: a synthetic groundwater contamination problem and a groundwater transport column experiment. The results demonstrate that GCSI results can be inconsistent across different model structures, and that models with higher marginal likelihoods tend to predict the contaminant source characteristics better. POLYCHORD is shown to be efficient for both marginal likelihood estimation and GCSI.
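The abstract relies on two linked computations: estimating the marginal likelihood (evidence) of each candidate model structure while sampling its posterior, and then turning the evidences into posterior model probabilities. The sketch below illustrates both under stated assumptions. It is the basic nested sampling scheme introduced by Skilling, not POLYCHORD: the worst live point is replaced by plain rejection sampling from the prior, whereas POLYCHORD uses slice sampling. The toy forward model (an exponential concentration decay with distance), the decay lengths, the observations, the error `SIGMA`, and all function names (`nested_sampling`, `make_log_likelihood`, `uniform_prior`) are hypothetical and are not taken from the paper.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def nested_sampling(log_likelihood, prior_sample, n_live=100, n_iter=700, seed=0):
    """Basic nested sampling: returns (log_evidence, samples, posterior_weights).

    The worst live point is replaced by rejection sampling from the prior,
    which is simple but inefficient; POLYCHORD uses slice sampling instead.
    """
    rng = random.Random(seed)
    live = [prior_sample(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    dead = []          # (theta, logL, log width of the prior-volume shell)
    log_x_prev = 0.0   # log X_0 = log(1): the whole prior volume
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda k: live_logl[k])
        logl_star = live_logl[worst]
        log_x = -i / n_live  # expected log prior volume after i shrinkages
        # log(X_{i-1} - X_i), computed stably
        log_width = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        dead.append((live[worst], logl_star, log_width))
        log_x_prev = log_x
        while True:  # draw from the prior under the hard constraint L > L*
            theta = prior_sample(rng)
            logl = log_likelihood(theta)
            if logl > logl_star:
                break
        live[worst], live_logl[worst] = theta, logl
    # Remaining live points share the final prior volume X_final equally.
    for theta, logl in zip(live, live_logl):
        dead.append((theta, logl, log_x_prev - math.log(n_live)))
    log_terms = [logl + lw for _, logl, lw in dead]     # log(L_i * w_i)
    log_z = logsumexp(log_terms)
    weights = [math.exp(t - log_z) for t in log_terms]  # posterior weights, sum to 1
    return log_z, [theta for theta, _, _ in dead], weights


# Hypothetical toy GCSI problem (not from the paper): unknown source strength
# theta with a uniform prior on [0, 10], observed at three wells. The two
# candidate model structures differ only in an assumed decay length.
DISTANCES = [1.0, 2.0, 3.0]
OBSERVED = [3.0, 1.8, 1.1]   # made-up concentrations
SIGMA = 0.1                  # assumed Gaussian measurement error
LOG_NORM = math.log(SIGMA * math.sqrt(2.0 * math.pi))


def make_log_likelihood(decay_length):
    """Gaussian log-likelihood for a hypothetical exponential-decay model."""
    gains = [math.exp(-x / decay_length) for x in DISTANCES]

    def log_likelihood(theta):
        return sum(-0.5 * ((obs - theta * g) / SIGMA) ** 2 - LOG_NORM
                   for obs, g in zip(OBSERVED, gains))

    return log_likelihood


def uniform_prior(rng):
    return rng.uniform(0.0, 10.0)


MODELS = {"A": 2.0, "B": 4.0}  # hypothetical decay length per model structure
results = {name: nested_sampling(make_log_likelihood(lam), uniform_prior)
           for name, lam in MODELS.items()}
log_zs = {name: res[0] for name, res in results.items()}
# With equal prior model probabilities, p(M_k | D) = Z_k / sum_j Z_j.
log_total = logsumexp(list(log_zs.values()))
model_probs = {name: math.exp(lz - log_total) for name, lz in log_zs.items()}
```

With these made-up data, structure A reproduces the observations almost exactly while B cannot, so A should receive nearly all of the posterior model probability; the weighted samples from the same run give the source-strength posterior under each structure. A practical implementation would also stop on a convergence criterion rather than a fixed `n_iter`.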

Abstract (French)

Many deterministic and stochastic approaches have been used over recent decades to identify the source of contaminants in groundwater. Their implementations are usually based on a single hydrogeological model or a fixed model structure and ignore model structure uncertainty. However, model structure uncertainty is inevitable in hydrogeological modeling, particularly for complex geological environments and limited observations. This study evaluated the impact of model structure uncertainty on the identification of groundwater contaminant sources and proposes an approach for doing so based on Bayesian model selection. Within a multiple-model analysis framework, a set of alternative model structures is used to represent the unknown part of the hydrogeological system. Next, a new nested sampling algorithm, POLYCHORD, is used for model selection and source identification. This algorithm can simultaneously estimate the marginal likelihood of the model and infer the posterior distribution of the contaminant source characteristics. Finally, the proposed approach is verified by applying it to two case studies, comprising a synthetic groundwater contamination problem and a groundwater transport experiment on a column. The results show that groundwater contaminant source identification can be inconsistent when different model structures are used. Models with the highest marginal likelihoods tend to perform better in predicting the contaminant source characteristics. It was concluded that POLYCHORD is efficient for estimating the marginal likelihood and identifying groundwater contaminant sources.

Abstract (Spanish)

Over recent decades, many deterministic and stochastic approaches have been applied to groundwater contaminant source identification (GCSI). These implementations are generally based on a single groundwater model or a fixed model structure and ignore model structure uncertainty. However, model structure uncertainty is inevitable in groundwater modeling, especially for complex geological settings and limited observations. This study evaluated the impact of model structure uncertainty on GCSI and proposes a GCSI approach based on Bayesian model selection. Within a multiple-model analysis framework, a set of alternative model structures is used to represent the unknown groundwater system. Next, a new nested sampling algorithm, POLYCHORD, is used for model selection and source identification. This algorithm can simultaneously estimate the model's marginal likelihood and infer the posterior distribution of the contaminant source characteristics. Finally, the proposed approach is verified through two GCSI case studies, which include a synthetic groundwater contamination problem and a groundwater transport column experiment. The results showed that GCSI can be inconsistent when different model structures are used. Models with higher marginal likelihoods tend to perform better in predicting the contaminant source characteristics. It was concluded that POLYCHORD is efficient for marginal likelihood estimation and GCSI.

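Abstract (Chinese)

In recent years, a large number of deterministic and stochastic methods have been used for groundwater contaminant source identification. These methods are usually based on a single groundwater model or a fixed model structure and do not consider model structure uncertainty. Under real conditions, especially for complex geological environments and limited observation data, structural uncertainty in groundwater models is unavoidable. This study evaluated the impact of model structure uncertainty on groundwater contaminant source identification and proposed a source identification method based on Bayesian model selection. First, within a multiple-model analysis framework, a set of feasible alternative model structures was proposed to describe the actual, unknown groundwater system; second, the recently proposed POLYCHORD algorithm was applied for Bayesian model selection and contaminant source identification. Finally, the proposed method was verified through two case studies: an idealized groundwater contaminant transport case and a laboratory experiment on groundwater contaminant transport. The results show that different model structures lead to inconsistent contaminant source identification results. Model structures with higher marginal likelihood values identify contaminant sources better. In addition, the POLYCHORD algorithm can effectively estimate model marginal likelihoods and identify groundwater contaminant sources.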

Abstract (Portuguese)

Many deterministic and stochastic approaches have been applied to groundwater contaminant source identification (GCSI) in recent decades. These implementations are generally based on a single groundwater model or a fixed-structure model and ignore model structure uncertainty. However, model structure uncertainty is inevitable in groundwater modeling, especially for complex geological environments and limited observations. This study evaluated the impact of model structure uncertainty on GCSI and proposes an approach for GCSI based on Bayesian model selection. Within a multiple-model analysis framework, a set of alternative model structures is used to represent the unknown groundwater system. Next, a new nested sampling algorithm, POLYCHORD, is used for model selection and source identification. This algorithm can simultaneously estimate the model's marginal likelihood and infer the posterior distribution of the contaminant source characteristics. Finally, the proposed approach is verified through two GCSI case studies, which include a synthetic groundwater contamination problem and a groundwater transport column experiment. The results showed that GCSI can be inconsistent when different model structures are used. Models with higher marginal likelihoods tend to perform better in predicting the contaminant source characteristics. It was concluded that POLYCHORD is efficient for marginal likelihood estimation and GCSI.


Figs. 1–7



Funding

This study was supported by the National Key Research and Development Program of China (2016YFC0402802) and the Fundamental Research Funds for the Central Universities (0206-14380085).

Author information

Corresponding author

Correspondence to Xiankui Zeng.

Electronic supplementary material

ESM 1 (PDF, 502 kB)


About this article


Cite this article

Cao, T., Zeng, X., Wu, J. et al. Groundwater contaminant source identification via Bayesian model selection and uncertainty quantification. Hydrogeol J 27, 2907–2918 (2019). https://doi.org/10.1007/s10040-019-02055-3

