Enhancing data-driven modeling of fluoride concentration using new data mining algorithms

  • Original Article

Environmental Earth Sciences

Abstract

Groundwater is an essential source of drinking water in hard-rock areas, so the sources of its contamination need to be analyzed. Fluoride contamination with large spatial variation has been reported in parts of Sindhudurg district, Maharashtra, India. This study develops data-driven models of fluoride concentration from on-site measurements of physicochemical parameters. Six machine learning (ML) architectures, i.e., data mining algorithms, were explored, including the new algorithms Gaussian process (GP) and long short-term memory (LSTM). Their results were compared with support vector machine (SVM), random forest (RF), extreme learning machine (ELM), and multilayer perceptron (MLP) models, used as benchmarks to test the robustness of the modeling process. In total, 225 water samples were collected from dug wells and bore wells in the area (latitude 15.37–16.40° N, longitude 73.19–74.18° E) between 2009 and 2016. The data were divided into two subsets: 80% for training and 20% for testing. Nine physicochemical parameters, namely pH, electrical conductivity (EC), total dissolved solids (TDS), Ca²⁺, Mg²⁺, Na⁺, Cl⁻, HCO₃⁻, and SO₄²⁻, were used to model fluoride (F⁻). A logarithmic transformation of the raw data was applied to improve the correlation between inputs and target and thereby enhance modeling accuracy. Several quantitative and qualitative (visual) measures were used to establish the predictive power of the models. The results show that GP outperformed all other models in fluoride prediction, followed by LSTM, SVM, MLP, RF, and ELM, in that order. They also show that model performance depends on both model structure and data accuracy.
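
The preprocessing described above (a logarithmic transformation of the raw data, then an 80/20 train/test split of the 225 samples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the base-10 logarithm, the `offset` added before taking the log, the purely random split, the placeholder data, and the function names `log_transform` and `train_test_split_80_20` are all hypothetical.

```python
import numpy as np


def log_transform(x, offset=1.0):
    """Return log10(x + offset) element-wise.

    `offset` is a hypothetical guard that keeps zero-valued measurements finite;
    the exact transform used in the study is not given in the abstract.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x + offset <= 0):
        raise ValueError("log transform requires x + offset > 0 for every value")
    return np.log10(x + offset)


def train_test_split_80_20(X, y, seed=0):
    """Shuffle the samples and split them into 80% training and 20% testing."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))           # random sample order
    n_train = int(round(0.8 * len(y)))      # 180 of 225 samples
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Hypothetical usage with random placeholder arrays shaped like the study's data:
# 225 samples x 9 physicochemical inputs, plus one fluoride target per sample.
rng = np.random.default_rng(42)
X_raw = rng.uniform(1.0, 500.0, size=(225, 9))
y_raw = rng.uniform(0.1, 3.0, size=225)
X_tr, X_te, y_tr, y_te = train_test_split_80_20(log_transform(X_raw), log_transform(y_raw))
print(X_tr.shape, X_te.shape)  # (180, 9) (45, 9)
```

With 225 samples the split yields 180 training and 45 testing samples. A real workflow would fit any scaling on the training subset only, so that no information from the test subset leaks into the models.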
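
Gaussian process regression, the best-performing model in this study, predicts a new value as a kernel-weighted combination of the training targets, with posterior mean k*ᵀ(K + σ²I)⁻¹y, and also returns a predictive variance. The sketch below is a zero-mean GP with a squared-exponential (RBF) kernel written directly in NumPy. The kernel choice, the centring of the targets, and the fixed hyperparameters (`length_scale`, `variance`, `noise`) are assumptions; in practice these hyperparameters are usually tuned, for example by maximizing the marginal likelihood.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between rows of A (n, d) and B (m, d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # clip small negatives caused by round-off
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and variance of a GP with Gaussian observation noise.

    Targets are centred so that a zero-mean GP prior is reasonable;
    the training mean is added back to the predicted mean.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    K += noise * np.eye(K.shape[0])            # noise variance on the diagonal
    L = np.linalg.cholesky(K)                  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)      # k(x*, x*) = variance for the RBF kernel
    return mean, var
```

Solving through the Cholesky factor rather than inverting K directly is the usual, numerically safer way to evaluate both the mean and the variance.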
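
An extreme learning machine (ELM) is a single-hidden-layer network whose input weights and biases are drawn at random and left fixed; only the output weights are fitted, in one least-squares step. The class below is a hypothetical minimal sketch of that idea in NumPy. The `tanh` activation, Gaussian weight initialization, and the default of 50 hidden units are assumptions, not settings reported in the study.

```python
import numpy as np


class ELMRegressor:
    """Minimal single-hidden-layer extreme learning machine for regression."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer outputs H = tanh(X W + b) with fixed random W and b.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Random input weights and biases: drawn once, never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

Because training reduces to a single pseudoinverse, an ELM trains very quickly, but its results vary with the random seed and the number of hidden units, so both are normally checked on held-out data.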
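
The abstract does not list which quantitative and qualitative measures were used. As an assumption, the sketch below computes measures common in hydrological model comparison (RMSE, MAE, Nash–Sutcliffe efficiency, and Pearson's r), together with the standard deviations and centred RMS difference (CRMSD) that place a model on a Taylor diagram, one widely used visual comparison. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(observed, predicted):
    """Common goodness-of-fit measures plus Taylor-diagram statistics."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    o_anom = o - o.mean()                      # centred observations
    p_anom = p - p.mean()                      # centred predictions
    return {
        "RMSE": np.sqrt(np.mean(err**2)),
        "MAE": np.mean(np.abs(err)),
        # Nash-Sutcliffe efficiency: 1 is perfect, 0 means no better than the mean
        "NSE": 1.0 - np.sum(err**2) / np.sum(o_anom**2),
        "r": np.corrcoef(o, p)[0, 1],
        # Taylor diagram: population standard deviations and centred RMS difference
        "sd_obs": o.std(),
        "sd_pred": p.std(),
        "CRMSD": np.sqrt(np.mean((p_anom - o_anom) ** 2)),
    }
```

These Taylor statistics satisfy CRMSD² = σ_pred² + σ_obs² − 2 σ_pred σ_obs r, which is why a model's point on the diagram can be located from r and σ_pred alone.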

Figs. 1–8

Acknowledgements

We thank the Director, IIT (ISM), Dhanbad, for permission to publish this work. PKG is grateful to IIT (ISM) for the SRF fellowship. SM acknowledges partial financial support for neural network research and development from the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Govt. of India, New Delhi (Grant No. CRG/2018/001368), and the TexMin project (Grant No. PSF-1H-1Y-007). We are thankful to Dr. Gautam Gupta, Indian Institute of Geomagnetism, Mumbai, for motivation and suggestions and for sharing part of the water sample data used in this study.

Author information

Corresponding author

Correspondence to Praveen Kumar Gupta.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Supplementary file 1 (DOCX 19 KB)

About this article

Cite this article

Gupta, P.K., Maiti, S. Enhancing data-driven modeling of fluoride concentration using new data mining algorithms. Environ Earth Sci 81, 89 (2022). https://doi.org/10.1007/s12665-022-10216-z

  • DOI: https://doi.org/10.1007/s12665-022-10216-z
