A case study using support vector machines, neural networks and logistic regression in a GIS to identify wells contaminated with nitrate-N

  • Report
Hydrogeology Journal

Abstract

Accurate and inexpensive identification of potentially contaminated wells is critical for water resources protection and management. The objectives of this study are to (1) assess the suitability of approximation tools such as neural networks (NN) and support vector machines (SVM), integrated in a geographic information system (GIS), for identifying contaminated wells and (2) use logistic regression and feature selection methods to identify variables that are significant for transporting contaminants in and through the soil profile to the groundwater. Fourteen GIS-derived soil hydrogeologic and land-use parameters were used as initial inputs. Well water quality data (nitrate-N) from 6,917 wells, provided by the Florida Department of Environmental Protection (USA), were used as the output target class. Logistic regression and feature selection reduced the number of input variables to nine. Receiver operating characteristic (ROC) curves were used to evaluate these approximation tools. The NN outperformed the SVM, especially on training data, while testing results were comparable. Feature selection did not improve accuracy; however, it helped increase the sensitivity, or true positive rate (TPR). Thus, a higher TPR was obtainable with fewer variables.
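The abstract separates overall accuracy from sensitivity (TPR) and evaluates the classifiers with ROC curves. The following minimal Python sketch illustrates how these quantities relate for a binary "contaminated / not contaminated" classifier. It does not reproduce the study's models or data: the labels, scores, thresholds, and function names are hypothetical and chosen only for illustration.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when wells with score >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(labels, scores, threshold):
    """Return (TPR, FPR, accuracy) at a given decision threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    fpr = fp / (fp + tn) if fp + tn else 0.0  # 1 - specificity
    accuracy = (tp + tn) / len(labels)
    return tpr, fpr, accuracy


def roc_points(labels, scores):
    """Build ROC points (FPR, TPR) by sweeping the threshold over all scores."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr, fpr, _ = rates(labels, scores, t)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: 1 = contaminated well, 0 = uncontaminated well.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical classifier outputs (higher means "more likely contaminated").
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.5, 0.15):
    tpr, fpr, acc = rates(labels, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr}, FPR={fpr}, accuracy={acc}")

print("AUC:", auc(roc_points(labels, scores)))
```

In this toy data, lowering the threshold raises the TPR while accuracy falls, which shows why the two measures can move independently and why a ROC curve, which summarizes all thresholds, is a useful complement to a single accuracy figure.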

Acknowledgements

This research was funded by FWRRC Subcontract UF-EIES-0404012-USF. I would like to thank N. Candade and R. Stetson for their help with this research (database creation).

Author information


Correspondence to Barnali Dixon.

About this article

Cite this article

Dixon, B. A case study using support vector machines, neural networks and logistic regression in a GIS to identify wells contaminated with nitrate-N. Hydrogeol J 17, 1507–1520 (2009). https://doi.org/10.1007/s10040-009-0451-1

  • DOI: https://doi.org/10.1007/s10040-009-0451-1
