Abstract
Many deterministic and stochastic approaches have been applied to groundwater contaminant source identification (GCSI) in recent decades. These implementations are usually based on a single groundwater model or a fixed model structure, and they ignore the uncertainty of the model structure. However, model structure uncertainty is inevitable in groundwater modeling, especially where the geological environment is complex and observations are limited. This study evaluates the impact of model structure uncertainty on GCSI and proposes an approach for GCSI based on Bayesian model selection. In the framework of multiple model analysis, a set of alternative model structures is used to represent the unknown groundwater system. Then, a novel nested sampling algorithm, POLYCHORD, is used for model selection and source identification. This algorithm can estimate a model's marginal likelihood and infer the posterior distribution of the contaminant source's characteristics simultaneously. Finally, the proposed approach is verified through two GCSI case studies: a synthetic groundwater contamination problem and a groundwater transport column experiment. The results demonstrate that GCSI outcomes can be inconsistent across different model structures. Models with higher marginal likelihoods tend to predict the contaminant source's characteristics more accurately. POLYCHORD was found to be efficient for both marginal likelihood estimation and GCSI.
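The idea of estimating a marginal likelihood and a source posterior in one run can be illustrated with a toy sketch. The code below is not POLYCHORD and not the authors' setup: it uses basic nested sampling with rejection sampling from the prior (in the spirit of Skilling 2004) where POLYCHORD uses slice sampling. The two "model structures" (a linear response of 1.0 or 0.5 between source strength and observed concentration), the single observation, the error level, the uniform prior on [0, 1], and all function names are assumptions made up for illustration.

```python
import math
import random

SIGMA = 0.1      # assumed observation-error standard deviation
OBSERVED = 0.8   # assumed single concentration observation


def make_loglike(response):
    """Gaussian log-likelihood for a toy model predicting response * s."""
    def loglike(s):
        resid = OBSERVED - response * s
        return -0.5 * (resid / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))
    return loglike


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(loglike, n_live=50, n_iter=400, seed=0):
    """Return (log marginal likelihood, posterior mean of s), s ~ U(0, 1)."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    live_logl = [loglike(s) for s in live]
    log_z, log_x_prev, weighted = -math.inf, 0.0, []
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        log_x = -i / n_live  # expected log prior mass still enclosed
        log_w = math.log(math.exp(log_x_prev) - math.exp(log_x))
        log_z = logaddexp(log_z, live_logl[worst] + log_w)
        weighted.append((live[worst], live_logl[worst] + log_w))
        threshold = live_logl[worst]
        while True:  # replace the worst point by a prior draw above the threshold
            s = rng.random()
            if loglike(s) > threshold:
                break
        live[worst], live_logl[worst] = s, loglike(s)
        log_x_prev = log_x
    # The remaining live points share the leftover prior mass equally.
    log_w_live = log_x_prev - math.log(n_live)
    for s, ll in zip(live, live_logl):
        log_z = logaddexp(log_z, ll + log_w_live)
        weighted.append((s, ll + log_w_live))
    post_mean = sum(s * math.exp(lw - log_z) for s, lw in weighted)
    return log_z, post_mean


def analytic_log_evidence(response):
    """Closed-form evidence of the toy model, used only as a check."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    z = (cdf((response - OBSERVED) / SIGMA) - cdf(-OBSERVED / SIGMA)) / response
    return math.log(z)


def model_probabilities(log_evidences):
    """Posterior model probabilities under equal prior model weights."""
    m = max(log_evidences)
    w = [math.exp(lz - m) for lz in log_evidences]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    results = {name: nested_sampling(make_loglike(r))
               for name, r in [("model_A", 1.0), ("model_B", 0.5)]}
    probs = model_probabilities([results[k][0] for k in results])
    for (name, (log_z, mean_s)), p in zip(results.items(), probs):
        print(name, "log Z =", round(log_z, 3), "E[s] =", round(mean_s, 3), "P =", round(p, 3))
```

Rejection sampling from the prior becomes very inefficient as the likelihood threshold rises, which is one reason practical implementations replace that step with a constrained sampler such as slice sampling. Under the equal prior weights assumed here, the model with the larger estimated marginal likelihood receives the larger posterior probability.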
Funding
This study was supported by the National Key Research and Development Program of China (2016YFC0402802) and the Fundamental Research Funds for the Central Universities (0206-14380085).
Cite this article
Cao, T., Zeng, X., Wu, J. et al. Groundwater contaminant source identification via Bayesian model selection and uncertainty quantification. Hydrogeol J 27, 2907–2918 (2019). https://doi.org/10.1007/s10040-019-02055-3
DOI: https://doi.org/10.1007/s10040-019-02055-3