Comparative study of surrogate models for groundwater contamination source identification at DNAPL-contaminated sites

  • Paper
  • Published in: Hydrogeology Journal

Abstract

Knowledge of groundwater contamination sources is critical for protecting groundwater resources, estimating risks, mitigating disasters, and designing remediation strategies. Many methods for groundwater contamination source identification (GCSI) have been developed in recent years, including the simulation–optimization technique. This study proposes support vector regression (SVR) and kernel extreme learning machine (KELM) models as additions to the set of available surrogate models. A surrogate model replaces the simulation model and thereby reduces the heavy computational burden of the repeated simulation runs that the simulation–optimization technique requires to solve GCSI problems, especially for aquifers contaminated by dense nonaqueous phase liquids (DNAPLs). The Kriging, SVR, and KELM models are compared, and the influence of parameter optimization and of the structure of the training sample dataset on the approximation accuracy of the surrogate model is analyzed. The KELM model was the most accurate surrogate model, and its performance improved significantly after parameter optimization. The approximation accuracy of the surrogate model relative to the simulation model did not always improve as the number of training samples increased. Using an appropriate number of training samples was critical for improving the performance of the surrogate model and avoiding unnecessary computational workload. It was concluded that the KELM model developed in this work could reasonably predict system responses under given operating conditions. Replacing the simulation model with a KELM model considerably reduced the computational burden of the simulation–optimization process while maintaining high accuracy.
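To make the surrogate idea concrete, the following is a minimal sketch of a KELM regressor used in place of an expensive simulator. It is not the authors' implementation. It assumes the commonly used closed-form KELM solution, beta = (K + I/C)^(-1) y, with a Gaussian (RBF) kernel. The names `KELMSurrogate`, `rbf_kernel`, and `toy_simulator`, the parameter values `C` and `gamma`, and the two-input toy function are all hypothetical and chosen only for illustration; a real application would train on input-output pairs produced by the DNAPL simulation model.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class KELMSurrogate:
    """Kernel extreme learning machine regressor (illustrative sketch)."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization coefficient (assumed value)
        self.gamma = gamma  # RBF kernel width parameter (assumed value)

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = K.shape[0]
        # Closed-form output weights: beta = (K + I/C)^(-1) y
        self.beta = np.linalg.solve(K + np.eye(n) / self.C, y)
        return self

    def predict(self, X):
        K_new = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K_new @ self.beta


def toy_simulator(X):
    """Hypothetical stand-in for an expensive simulation model."""
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2


# Train on a small set of simulator runs, then check accuracy on unseen inputs.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(80, 2))
X_test = rng.uniform(0.0, 1.0, size=(200, 2))

model = KELMSurrogate(C=1e4, gamma=2.0).fit(X_train, toy_simulator(X_train))
predictions = model.predict(X_test)
rmse = np.sqrt(np.mean((predictions - toy_simulator(X_test)) ** 2))
print(f"Surrogate RMSE on held-out inputs: {rmse:.4g}")
```

In a simulation–optimization loop, `model.predict` would be called in place of the simulator at each iteration. The abstract's two tuning questions map onto this sketch as the choice of `C` and `gamma` (parameter optimization) and the number of rows in `X_train` (training sample size).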

Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant Nos. 41672232 and 41372237). Special thanks go to the journal editors for their efforts in evaluating the work, and the valuable comments of the anonymous reviewers are gratefully acknowledged.

Author information

Corresponding author

Correspondence to Wenxi Lu.

About this article

Cite this article

Hou, Z., Lu, W. Comparative study of surrogate models for groundwater contamination source identification at DNAPL-contaminated sites. Hydrogeol J 26, 923–932 (2018). https://doi.org/10.1007/s10040-017-1690-1

  • DOI: https://doi.org/10.1007/s10040-017-1690-1
