An improved Bayesian approach linked to a surrogate model for identifying groundwater pollution sources

Abstract

Groundwater pollution source identification (GPSI) provides information about the temporal and spatial distribution of pollution sources and helps decision makers design remediation plans that protect the groundwater environment. Bayesian inference based on Markov chain Monte Carlo (MCMC) sampling provides an efficient framework for GPSI. However, MCMC sampling requires a large number of model calls to converge to the posterior probability distribution of the unknown pollution source parameters, which entails a massive computational load if the simulation model is called directly. This study aimed to develop an innovative framework in which an improved MCMC approach is linked to a surrogate model. Sensitivity analysis was incorporated into the Metropolis-Hastings MCMC (MH-MCMC) approach to form SAMH-MCMC (sensitivity analysis based Metropolis-Hastings Markov chain Monte Carlo), which uses a novel way of controlling the search step size to speed up convergence to the posterior distribution. Three computationally inexpensive surrogates for the simulation model were constructed, namely support vector regression, Kriging (KRG), and a multilayer perceptron, and the most accurate one was selected. The feasibility and advantages of the framework were evaluated and validated through two hypothetical numerical cases involving homogeneous and heterogeneous media. Because it accounts for the sensitivities of the unknown parameters that characterise the groundwater pollution sources, the proposed approach converges robustly and can achieve high identification accuracy. Furthermore, the KRG surrogate is more accurate than the other surrogate models, owing to its property of being a linear unbiased estimator. Overall, the framework developed in this study is a promising solution for identifying groundwater pollution source parameters.
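The abstract describes SAMH-MCMC only at a high level, so the following is a minimal sketch, not the authors' implementation, of how sensitivity information could set per-parameter step sizes in a random-walk Metropolis-Hastings sampler. Several assumptions are made: a cheap analytic function (`forward`, hypothetical) stands in for the trained surrogate such as the Kriging model; the likelihood is Gaussian with a known noise level `sigma`; priors are uniform within `bounds`; and the step rule, which shrinks the step of more sensitive parameters, is a hypothetical choice. All names are illustrative.

```python
import math
import random

# Hypothetical stand-in for the trained surrogate (e.g. a Kriging model):
# concentration at three observation wells from a Gaussian-shaped plume whose
# centre x0 and release strength are the unknown source parameters.
WELLS = [1.0, 2.0, 3.0]


def forward(theta):
    x0, strength = theta
    return [strength * math.exp(-((w - x0) ** 2)) for w in WELLS]


def log_likelihood(theta, observed, sigma=0.05):
    """Gaussian log-likelihood with known noise sigma (assumed), up to a constant."""
    predicted = forward(theta)
    return -0.5 * sum(((p - o) / sigma) ** 2 for p, o in zip(predicted, observed))


def local_sensitivities(theta, rel_step=1e-3):
    """Normalised one-at-a-time finite-difference sensitivities (they sum to 1)."""
    base = forward(theta)
    raw = []
    for i, value in enumerate(theta):
        scale = max(abs(value), 1.0)
        shifted = list(theta)
        shifted[i] += rel_step * scale
        out = forward(shifted)
        # RMS change of the outputs per relative change of parameter i.
        rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(out, base)) / len(base))
        raw.append(rms / rel_step)
    total = sum(raw) or 1.0
    return [r / total for r in raw]


def sensitivity_scaled_mh(observed, theta0, base_step, bounds, n_iter=5000, seed=0):
    """Random-walk Metropolis-Hastings with sensitivity-scaled step sizes."""
    rng = random.Random(seed)
    # Simplification: sensitivities are evaluated once, at the starting point.
    sens = local_sensitivities(theta0)
    n = len(theta0)
    # Hypothetical rule: a parameter of average sensitivity (s = 1/n) gets half
    # its base step; more sensitive parameters get smaller steps.
    steps = [b / (1.0 + n * s) for b, s in zip(base_step, sens)]
    theta = list(theta0)
    logp = log_likelihood(theta, observed)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, s) for t, s in zip(theta, steps)]
        # Uniform prior: proposals outside the bounds have zero density.
        if all(lo <= p <= hi for p, (lo, hi) in zip(proposal, bounds)):
            logp_new = log_likelihood(proposal, observed)
            # Symmetric proposal and flat prior: accept with prob min(1, L'/L).
            if math.log(1.0 - rng.random()) < logp_new - logp:
                theta, logp = proposal, logp_new
                accepted += 1
        chain.append(list(theta))
    return chain, accepted / n_iter


if __name__ == "__main__":
    observed = forward([1.8, 2.0])  # synthetic, noise-free "observations"
    chain, acc = sensitivity_scaled_mh(
        observed, [1.0, 1.0], [0.2, 0.2], [(0.0, 4.0), (0.1, 5.0)], n_iter=20000
    )
    post = chain[5000:]  # discard burn-in
    means = [sum(c[i] for c in post) / len(post) for i in range(2)]
    print(f"acceptance={acc:.2f}, posterior mean={means}")
```

Because the proposal is symmetric and the prior is flat inside the bounds, the acceptance test reduces to a likelihood ratio. In a surrogate-based framework, each call to `forward` would be a surrogate evaluation instead of a full simulation-model run, which is where the computational savings come from.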

Funding

This study was supported by the National Natural Science Foundation of China (Nos. 41790441, 41931285, and 41902249), the China Postdoctoral Science Foundation (No. 2020 M683399), the Fundamental Research Funds for the Central Universities, CHD (No. 300102291106) and the Key Research and Development Program of Shaanxi (Program No. 2020SF-405).

Author information

Corresponding author

Correspondence to Yongkai An.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

An, Y., Yan, X., Lu, W. et al. An improved Bayesian approach linked to a surrogate model for identifying groundwater pollution sources. Hydrogeol J (2021). https://doi.org/10.1007/s10040-021-02411-2

Keywords

  • Groundwater pollution sources
  • Sensitivity analysis
  • Bayesian
  • Inverse modeling
  • Stochastic hydrogeology