Abstract
Knowledge of groundwater contamination sources is critical for effectively protecting groundwater resources, estimating risks, mitigating disasters, and designing remediation strategies. Many methods for groundwater contamination source identification (GCSI) have been developed in recent years, including the simulation–optimization technique. This study proposes using a support vector regression (SVR) model and a kernel extreme learning machine (KELM) model to broaden the range of available surrogate models. A surrogate model replaces the simulation model and thereby reduces the huge computational burden of the repeated simulation runs that the simulation–optimization technique requires when solving GCSI problems, especially for aquifers contaminated by dense nonaqueous phase liquids (DNAPLs). A comparative study of the Kriging, SVR, and KELM models is reported, together with an analysis of how parameter optimization and the structure of the training sample dataset influence the approximation accuracy of the surrogate model. The KELM model was found to be the most accurate surrogate model, and its performance improved significantly after parameter optimization. The approximation accuracy of the surrogate model relative to the simulation model did not always improve as the number of training samples increased. Using an appropriate number of training samples was critical for improving the performance of the surrogate model and avoiding unnecessary computational workload. It was concluded that the KELM model developed in this work could reasonably predict system responses under given operating conditions. Replacing the simulation model with a KELM model considerably reduced the computational burden of the simulation–optimization process while maintaining high computational accuracy.
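To make the surrogate idea concrete, the following is a minimal sketch of a KELM regressor built from the commonly used closed-form solution, in which the output weights are obtained by solving (K + I/C) β = y for a kernel matrix K and a regularization coefficient C. The abstract does not state the kernel, the parameter values, or the simulator used in the study, so the RBF kernel, the values of `C` and `gamma`, and the `toy_simulator` function below are illustrative assumptions only and do not reproduce the authors' implementation.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


class KELM:
    """Kernel extreme learning machine for regression (illustrative sketch)."""

    def __init__(self, C=1e3, gamma=1.0):
        self.C = C          # regularization coefficient (assumed value)
        self.gamma = gamma  # RBF kernel width parameter (assumed value)

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = K.shape[0]
        # Closed-form output weights: solve (K + I/C) beta = y
        self.beta = np.linalg.solve(K + np.eye(n) / self.C,
                                    np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K @ self.beta


def toy_simulator(X):
    """Hypothetical stand-in for an expensive simulation model."""
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2


rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(80, 2))   # sampled input scenarios
X_test = rng.uniform(0.0, 1.0, size=(200, 2))   # unseen scenarios

model = KELM(C=1e4, gamma=2.0).fit(X_train, toy_simulator(X_train))
pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((pred - toy_simulator(X_test)) ** 2)))
print(f"surrogate RMSE on unseen inputs: {rmse:.4f}")
```

In a simulation–optimization loop, calls to the expensive simulator would be replaced by `model.predict`, and `C` and `gamma` are the kind of parameters whose tuning the abstract refers to as parameter optimization.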
Résumé
La connaissance des sources de contamination des eaux souterraines est essentielle pour protéger efficacement les ressources en eau souterraine, estimer les risques, atténuer les désastres et concevoir des stratégies de remédiation. De nombreuses méthodes d’identification de source de contamination des eaux souterraines (ISCES) ont été développées durant les dernières années, incluant une technique de simulation–optimisation. Cette étude propose l’utilisation d’un modèle de régression vectorielle de soutien (SVR) et d’un modèle basé sur l’apprentissage extrême d’un noyau (KELM) pour enrichir le contenu du modèle de substitution. Le modèle de substitution était lui-même la clef dans le remplacement du modèle de simulation, réduisant le lourd fardeau de calcul d’itérations de la technique de simulation–optimisation pour résoudre les problèmes d’ISCES, spécialement dans les problèmes d’ISCES d’aquifères contaminés par des liquides denses avec une phase non aqueuse (LDNA). Une étude comparative entre des modèles de krigeage, SVR et KELM est présentée. De plus, on analyse l’influence de l’optimisation des paramètres et la structure de l’ensemble des données échantillonnées pour l’apprentissage sur la précision de l’approximation du modèle de substitution. On a trouvé que le modèle KELM était le modèle de substitution le plus précis, et que sa performance était améliorée de façon significative après l’optimisation des paramètres. La précision de l’approximation du modèle de substitution comparativement au modèle de simulation n’a pas toujours été améliorée en augmentant le nombre d’échantillons d’apprentissage. L’utilisation d’un nombre approprié d’échantillons d’apprentissage était critique pour améliorer la performance du modèle de substitution et éviter une charge de calcul non nécessaire. Il a été conclu que le modèle KELM développé dans ce travail pouvait raisonnablement prédire les réponses du système dans des conditions opératoires données. Remplacer le modèle de simulation par un modèle KELM a considérablement réduit la charge de calcul associée à la procédure de simulation–optimisation et aussi conservé une grande précision de calcul.
Resumen
El conocimiento de las fuentes de contaminación del agua subterránea es fundamental para proteger eficazmente los recursos hídricos subterráneos, estimar los riesgos, mitigar los desastres y diseñar estrategias de remediación. En los últimos años se han desarrollado muchos métodos para la identificación de fuentes de contaminación del agua subterránea (GCSI), incluida la técnica de simulación-optimización. Este estudio propone utilizar un modelo de regresión de soporte vectorial (SVR) y un modelo de máquina de aprendizaje extremo de kernel (KELM) para enriquecer el contenido del modelo sustituto. El modelo sustituto era en sí mismo clave en la sustitución del modelo de simulación, reduciendo la enorme carga computacional de iteraciones en la técnica de simulación-optimización para resolver problemas de GCSI, especialmente en acuíferos contaminados por líquidos densos en fase no acuosa (DNAPLs). Se presenta un estudio comparativo entre los modelos de Kriging, SVR y KELM. Además, se analiza la influencia de la optimización de parámetros y la estructura del conjunto de datos de la muestra de entrenamiento sobre la precisión de aproximación del modelo sustituto. Se encontró que el modelo KELM fue el modelo sustituto más preciso, y su desempeño mejoró significativamente después de la optimización de parámetros. La precisión de aproximación del modelo sustituto al modelo de simulación no siempre mejoró con un número creciente de muestras de entrenamiento. El uso del número apropiado de muestras de entrenamiento fue crítico para mejorar el rendimiento del modelo sustituto y evitar la carga de trabajo computacional innecesaria. Se concluyó que el modelo KELM desarrollado en este trabajo podría predecir razonablemente las respuestas del sistema en determinadas condiciones de operación. El reemplazo del modelo de simulación con un modelo KELM redujo considerablemente la carga computacional del proceso de simulación-optimización y también mantuvo una alta precisión de cálculo.
摘要
掌握地下水污染源信息对于有效保护地下水资源、评估风险、减轻灾害以及设计修复策略至关重要。近年来提出了很多地下水污染源确定方法,包括模拟-最优化技术。本研究的目的就是使用支撑矢量回归模型以及内核极端学习机模型丰富替代模型的内容。替代模型本身在替代模拟模型中非常关键,减少解决地下水污染源确定问题、特别是解决重质非水相液体污染的含水层地下水污染源确定问题的模拟-最优化技术中迭代次数巨大的计算负担。论述了Kriging模型、支撑矢量回归模型和内核极端学习机模型对比研究。另外,还分析了参数最优化和培养样品集结构对替代模型近似精确度的影响。发现,内核极端学习机模型是最精确的替代模型,其性能在参数最优化后大大提高。替代模型对模拟模型的近似精确度并不总是随着培养样品的增加而提高。采用培养样品的合适数量对提高替代模型的性能以及避免不必要的计算量至关重要。结论就是,本研究开发的内核极端学习机模型可以合理地预测给定运行条件下的系统响应。用内核极端学习机模型替代模拟模型可大大减少模拟-最优化过程中的计算量,并可保持很高的计算精确度。
Resumo
O conhecimento sobre fontes de contaminação de águas subterrâneas é crítico para uma proteção efetiva dos recursos hídricos subterrâneos, estimando riscos, mitigando desastres, e elaborando estratégias de remediação. Muitos métodos para identificação de fontes de contaminação de águas subterrâneas (IFCAS) têm sido desenvolvidos nos últimos anos, incluindo a técnica de simulação-otimização. Esse estudo propôs a utilização de um modelo de regressão por vetores de suporte (RVS) e um modelo de máquina de aprendizado extremo por kernel (MAEK) para enriquecer o conteúdo do modelo substituto (surrogate). O modelo substituto foi em si chave ao substituir o modelo de simulação, reduzindo o imenso peso computacional de iterações na técnica de simulação-otimização para resolver problemas de IFCAS, especialmente problemas de IFCAS em aquíferos contaminados por compostos de Fase Líquida Densa Não Aquosa (DNAPLs). Descreve-se um estudo comparativo entre modelos de krigagem, RVS e MAEK. Além disso, fez-se análise da influência da otimização de parâmetros e da estrutura do conjunto de dados da amostra de treinamento sobre a precisão de aproximação do modelo substituto. Verificou-se que o modelo MAEK foi o modelo substituto mais preciso e seu desempenho foi significativamente melhorado após a otimização de parâmetros. A precisão de aproximação do modelo substituto ao modelo de simulação nem sempre melhorou com o aumento do número de amostras de treinamento. Usar o número adequado de amostras de treinamento foi crítico para melhorar o desempenho do modelo substituto e evitar carga de trabalho computacional desnecessária. Concluiu-se que o modelo MAEK desenvolvido neste trabalho poderia razoavelmente prever as respostas do sistema em determinadas condições de operação. A substituição do modelo de simulação por um modelo MAEK reduziu consideravelmente a carga computacional do processo de simulação-otimização e também manteve alta precisão de computação.
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grant Nos. 41672232 and 41372237). Special gratitude is given to the journal editors for their efforts in evaluating the work, and the valuable comments of the anonymous reviewers are also gratefully acknowledged.
Cite this article
Hou, Z., Lu, W. Comparative study of surrogate models for groundwater contamination source identification at DNAPL-contaminated sites. Hydrogeol J 26, 923–932 (2018). https://doi.org/10.1007/s10040-017-1690-1
DOI: https://doi.org/10.1007/s10040-017-1690-1