Abstract
Groundwater pollution from nickel (Ni) has been a severe concern in Kanchanaburi Province, Thailand. Recent assessments revealed that Ni concentrations in groundwater, particularly in urban areas, often exceeded the permissible limit. The challenge for groundwater agencies is therefore to delineate regions that are highly susceptible to Ni contamination. In this study, a novel modeling approach was applied to a dataset of 117 groundwater samples collected from Kanchanaburi Province between April and July 2021. Twenty site-specific variables were initially considered as factors influencing Ni contamination. The Random Forest (RF) algorithm with the Recursive Feature Elimination (RFE) function was used to select the fourteen most influential variables. These variables were then used as input features to train a maximum entropy (ME) model, which delineated Ni contamination susceptibility with high confidence (validation area under the curve (AUC) of 0.845). Ten input variables (altitude, geology, land use, slope, soil type, distance to industrial areas, distance to mining areas, electrical conductivity, oxidation–reduction potential, and groundwater depth) were found to explain most of the spatial variation in Ni contamination in the areas of very high (95.47 km²) and high (86.65 km²) susceptibility. This study presents a novel machine learning approach to identify the conditioning factors and map Ni contamination susceptibility in groundwater, providing a baseline dataset and reliable methods for developing a sustainable groundwater management strategy.
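The two-stage workflow summarized in the abstract (RF-based recursive feature elimination, then a classifier scored by validation AUC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic placeholders generated with scikit-learn, the binary label (Ni above or below a limit) is hypothetical, the 70/30 split is arbitrary, and logistic regression stands in for the maximum entropy model, since the abstract does not say which ME software was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (assumption): 117 samples, 20 candidate variables,
# and a hypothetical binary label for "Ni exceeds the permissible limit".
X, y = make_classification(
    n_samples=117, n_features=20, n_informative=8, n_redundant=2, random_state=0
)

# Hold out part of the samples for validation (the 70/30 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: recursive feature elimination driven by random forest importances,
# dropping one variable per iteration until 14 remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=14,
    step=1,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)

# Stage 2: fit a classifier on the retained variables. Logistic regression is
# used here only as a stand-in for the maximum entropy model (assumption).
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train[:, selected], y_train)

# Score the held-out samples with the area under the ROC curve.
auc = roc_auc_score(y_test, clf.predict_proba(X_test[:, selected])[:, 1])
print(f"retained variables: {selected.tolist()}")
print(f"validation AUC: {auc:.3f}")
```

On synthetic data the printed AUC has no bearing on the 0.845 reported in the study; the sketch only shows how the selection and validation steps fit together.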
Acknowledgements
The researchers would like to thank the Interdisciplinary Program in Environmental Science, Graduate School, Chulalongkorn University; the Centre for Agriculture and the Bioeconomy, Queensland University of Technology; and Hue University of Agriculture and Forestry, Hue University. We acknowledge financial support from the National Research Council of Thailand (NRCT): NRCT5-RSA63001-06 and partial support from the Ratchadapisek Sompoch Endowment Fund (2022), Chulalongkorn University (765007-RES02). Nguyen Ngoc Thanh received the ASEAN/NON-ASEAN Scholarship and the 90th Anniversary of Chulalongkorn University Scholarship from Chulalongkorn University for the Ph.D. program at the Graduate School, Chulalongkorn University. We are grateful to the anonymous reviewers for their thorough reviews; their valuable comments significantly improved the earlier draft of this article.
Funding
The authors have not disclosed any funding.
Author information
Contributions
Conceptualization, N.N.T. and S.C.; methodology, N.N.T. and S.C.; validation, N.N.T. and S.C.; formal analysis, N.N.T.; investigation, N.N.T. and S.C.; resources, S.C.; data curation, N.N.T. and S.C.; writing - original draft preparation, N.N.T.; writing - review and editing, S.C., H.N.T., N.H.T.; visualization, N.N.T.; supervision, S.C.; project administration, S.C.; funding acquisition, S.C. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Thanh, N.N., Chotpantarat, S., Ha, NT. et al. Determination of conditioning factors for mapping nickel contamination susceptibility in groundwater in Kanchanaburi Province, Thailand, using random forest and maximum entropy. Environ Geochem Health 45, 4583–4602 (2023). https://doi.org/10.1007/s10653-023-01512-z
DOI: https://doi.org/10.1007/s10653-023-01512-z