Abstract
Health hazard risk mapping (HHRM) is an important technique for estimating the potential health risk to an individual, a group, or an entire community in a region. In this study, 67 water samples were collected through field investigation during the dry season (March to early June 2021) from different parts of the hard-rock-dominated Purulia district. Fourteen health hazard causative factors were considered: Depth (m), pH, EC (μS/cm), HCO3 (mg/L), As (μg/L), Ca2+ (mg/L), Cl− (mg/L), F− (mg/L), K+ (mg/L), Mg2+ (mg/L), Na+ (mg/L), NO3 (mg/L), PO42− (mg/L), and SO42− (mg/L). These parameters were screened using a multicollinearity test and Pearson’s correlation test. Three machine learning algorithms, namely bagging, random forest (RF), and an ensemble of bagging and RF, were then employed to develop the HHRM. The outcomes of the models were evaluated using AUC-ROC, sensitivity, specificity, accuracy, precision, F-score, kappa, and the Taylor diagram. The validation results indicate that the ensemble technique is the most reliable on both the training dataset (AUC-ROC = 0.934, sensitivity = 0.917, specificity = 0.925, accuracy = 0.921, precision = 0.925, F-score = 0.922, kappa = 0.851) and the validation dataset (AUC-ROC = 0.911, sensitivity = 0.904, specificity = 0.905, accuracy = 0.902, precision = 0.907, F-score = 0.907, kappa = 0.819), with a Taylor diagram correlation of r = 0.94, followed by bagging and RF. The results show that the central part of the study area, especially the Bagmundi, Balarampur, Arsha, Purulia I and II, and Raghunathpur blocks, is significantly susceptible to health hazards due to poor water quality; this zone covers around 15% of the total area.
Introduction
Subsurface water, commonly known as groundwater, is a significant and relatively pure natural resource that is essential for human survival (Li et al. 2012); worldwide, more than 1.5 billion people depend directly or indirectly on groundwater for drinking (Shen et al. 2008). According to Adimalla and Li (2019) and Li et al. (2019), arid and semi-arid regions rely substantially on groundwater to meet daily needs, including drinking, domestic, and irrigation uses, because surface water is scarce. Access to safe drinking water is an internationally recognized human right. Population growth drives successive changes in the natural environment and in human activities such as agricultural practices and industrial demand, resulting in severe water stress and a gradual deterioration of water quality, which in turn gives rise to environmental problems and adverse effects on human health (Chabukdhara et al. 2017; Qian et al. 2012; Zhang et al. 2018; Li et al. 2017). Groundwater quality usually depends on the geological structure of the aquifer in a region and on the concentration of chemical constituents (Parameswari and Padmini 2018). In recent times, however, groundwater has come under threat from natural causes, such as the weathering of rock-forming minerals, and from secondary causes such as uncontrolled water use, insufficient recharge, wastewater discharge, agricultural and industrial activities, excessive fertilizer use, high evaporation, and low rainfall (Tiwari and Singh 2014; Adimalla and Venkatayogi 2016; Kumar et al. 2017; Ahmed et al. 2019). According to the World Health Organization (WHO), groundwater extraction has increased significantly in the last few decades, affecting groundwater quality and increasing health risk.
Because of their toxicity and carcinogenic nature, these pollutants pose health risks to a large number of people; contaminated water is reported to account for about 70% of diseases, and about 20% of cancers worldwide are related to water contamination. Groundwater quality and the associated health hazards have been studied in many countries, such as India (Garg et al. 2009; Srinivasan and Reddy 2009; Arora et al. 2014; Adimalla and Qian 2019; Singh et al. 2020; Sarkar and Chandra Pal 2021), South Africa (Edokpayi et al. 2018; Genthe et al. 2018; Mthembu et al. 2020; Madilonga et al. 2021), Bangladesh (Islam et al. 2018; Rahman et al. 2018; Bodrud-Doza et al. 2020; Zakir et al. 2020), Sri Lanka (Dissanayake 1991; Pinto et al. 2020), Pakistan (Bhowmik et al. 2015; Raza et al. 2017; Mazhar et al. 2019; Murtaza et al. 2020), and China (Liu et al. 2021; Nsabimana et al. 2021; Wang and Li 2022; Zhang et al. 2020b). Although groundwater contains several trace elements that are essential for human health, excessive intake of them can have adverse effects on the human body, e.g., reproductive effects, neurological disorders, breathing problems, lung disease, cancer, and hypertension (Islam et al. 2017; Nkpaa et al. 2018). Infants are more severely affected than adults by excessive intake of nitrate-contaminated water, which causes ‘blue baby syndrome’ (Tian and Wu 2019); a significant number of people suffer from dental and skeletal fluorosis due to long-term fluoride consumption above the standard limits (Bureau of Indian Standards: 1 mg/L; WHO: 1.5 mg/L) (Mondal et al. 2016; Sarkar and Chandra Pal 2021); and long-term drinking of arsenic-contaminated water causes skin disease, neurological disorders, and cancer (Rahman et al. 2005; Mazumder et al. 2010).
According to the World Health Organization, excessive Fe intake can lead to joint pain, fatigue, and haemochromatosis. Long-term excessive Mn intake through drinking water can cause reproductive damage and neurological disorders (US EPA 2015).
Water quality assessment is therefore a prime concern for human health; likewise, human, social, and economic development depends on safe drinking water (Vasanthavigar et al. 2010). About 85% of the rural population depends on subsurface water (Garg et al. 2009), which is why safe water supply and sanitation were important agenda items in the first Five-Year Plan (1951–1956) and in the Millennium Development Goals (2000–2015). Many countries, including India, face severe water contamination with elevated levels of arsenic, fluoride, salinity, heavy metals, and trace elements (Wei et al. 2021; Su et al. 2020). Ahmad et al. (2019) demonstrated how rapid urbanization increases contamination in shallow groundwater, which is more exposed to the earth's surface than intermediate and deep groundwater. According to Adimalla et al. (2018), about 21 states of India and around 6.6 crore (66 million) people have been affected by fluorosis and other diseases. Some districts of West Bengal are highly vulnerable to arsenic, fluoride, and salinity-related problems (Samal et al. 2015; Nath et al. 2021). Supplying drinking water without proper testing and mitigation therefore raises the health risk of the rural population (Mondal and Pal 2015). Many studies have assessed the water quality of particular regions, but very few have addressed HHRM based on the hydrochemical properties of water, even though proper water testing combined with HHRM can play a significant role in delineating affected areas. HHRM is an assessment tool that relates hydrochemical properties to human health and identifies the potential health risk zones of an area.
Researchers have used different methodologies and statistical techniques to assess water quality in relation to health risk for the sustainable development of a region, e.g., multivariate statistical techniques (Reghunath et al. 2002; Gulgundi and Shetty 2018), the water quality index (Sadat-Noori et al. 2014; Zhang et al. 2020a), fuzzy methods (Wu et al. 2019), and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) (Li et al. 2013). According to Sajedi-Hosseini et al. (2018), these index-based and statistical methods play a significant role in characterizing the status of groundwater around the world. Only a few data-driven models, such as neural networks and projection pursuit, were used in early groundwater studies (Salman and Abu Ruka’h 1999). Interpolation methods are also used in water quality assessment, but machine learning (ML) algorithms are comparatively new in groundwater research and are emerging as a capable, multi-purpose approach across scientific fields. Over the last three decades, various ML methods have been applied to groundwater quality assessment because they often outperform traditional statistical analysis (Lu and Ma 2020; Aldhyani et al. 2020). Recent ML applications in hydrological studies include support vector regression and regression trees for wastewater management (Granata et al. 2017), and gradient boosting, extreme gradient boosting (XGBoost), and deep neural networks for groundwater quality assessment (Raheja et al. 2021). Kouadri et al. (2021) proposed three models (long short-term memory, multi-linear regression, and artificial neural network) for predicting irrigation groundwater quality; bagging, boosting, and RF algorithms were employed to predict nitrate concentration in the coastal aquifer of Bangladesh (Islam et al. 2021). These ML algorithms are widely applicable because of their accuracy and precision.
Bagging and RF have emerged as significant tools in groundwater quality studies, but very few studies have applied them to human health risk assessment. They stand out among ML algorithms in groundwater research because of their accuracy and precision and because they have rarely been used for this purpose (He et al. 2022a, b; Pal et al. 2022). Bagging is a popular ensemble algorithm (Prasad et al. 2006; Talukdar et al. 2020) developed by Breiman (1996). It has been used with high precision to predict several natural hazards (Hong et al. 2018; Truong et al. 2018; Pham and Prakash 2019; Yariyan et al. 2020), but rarely in groundwater studies or HHRM. RF, another important ML tool, was introduced by Breiman (2001). Several researchers have applied RF successfully, e.g., Sihag et al. (2019) for hydraulic conductivity estimation, Pal (2005) for land cover classification, and Norouzi and Shahmohammadi-Kalalagh (2019) in groundwater studies.
Purulia district of West Bengal is an extension of the Chhotanagpur plateau and is characterized by scarce groundwater and poor water quality; frequent contamination by fluoride and other trace elements has been reported (GEC 2017). Existing research on Purulia district has focused mainly on groundwater potential rather than on drinking water quality in relation to human health risk. Several earlier studies (Kundu and Nag 2018; Farooq et al. 2018; Bera and Das 2021; Chowdhury et al. 2021) assessed groundwater quality, but they considered only one or two pollutants and a single block. There is therefore a lack of research on water-quality-related health hazards for the entire Purulia district. The prime objectives of this work are to identify the significant water pollutants and to relate them to water-related health hazards by developing an HHRM of the entire Purulia district using bagging, RF, and an ensemble of bagging and RF. The novelty of this study lies in applying single ML models as well as an ensemble model, with fourteen health hazard causative factors, to produce a health hazard susceptibility map for the entire study area. Furthermore, to measure the accuracy and reliability of each method, eight validation techniques are employed: the receiver operating characteristic (ROC) curve (Chowdhuri et al. 2020; Roy et al. 2020), sensitivity, specificity, accuracy, precision, F-score, kappa coefficient, and the Taylor diagram (Taylor 2001). The validated results can help local and regional authorities take precautions to reduce health risks through sustainable water management.
The output maps can also help to identify severe health risk zones, where measures should be taken to control and minimize the risk. The proposed approach, i.e., ensemble learning of bagging and RF, can also be useful in health hazard risk studies and other vulnerability studies elsewhere in the world.
Materials and Methods
Study Area
Purulia district is an important plateau region of West Bengal, located in the southwestern part of the state between 22°43′–23°42′ N latitude and 85°49′–86°54′ E longitude (Fig. 1). The district lies in a transition zone between the Chhotanagpur plateau and the Damodar alluvial plains and is structurally a part of the Ranchi peneplain; low valleys, rolling topography, and scattered hills are therefore its key physiographic characteristics. Several districts of Jharkhand (Ranchi, Singbhum, and Hazaribagh-Dhanbad) and West Bengal (Paschim Barddhaman, Bankura, and Paschim Medinipur) share borders with the study area. Rainfall occurs mainly during the SW monsoon (June–September); normal rainfall is 1028 mm in the monsoon season and 248 mm in the non-monsoon season. Most of the district nevertheless faces water scarcity: because of the hard-rock terrain, sloping surface, and very low porosity, permeability, and infiltration rate, around 50% of the water is lost as surface runoff (Ruidas et al. 2021). Several rivers, notably the Kangsabati, Damodar, Darakeswar, Kumari, and Subarnarekha, flow towards the east and southeast; the Kangsabati is the main river in the study area, and the Subarnarekha is the only river that flows towards the south of the district. Because these rivers originate in the plateau region, significant water shortages occur during the cold and hot seasons. The total area of the district is 6259 km², which ranks it 5th among the districts of West Bengal. The subtropical climate and low rainfall make the district drought-prone; moderate droughts are reported to recur every 3 years and severe droughts every 10 years.
Maximum evaporation occurs from February to May, when temperatures can reach 46 °C; the lowest recorded winter temperature is 5 °C. The district has 2,930,115 inhabitants and a low population density of 468/km²; 87.26% of the population is rural and 12.74% urban, and only 64.48% of the population is literate. The low literacy rate and the lack of water treatment plants make Purulia district highly vulnerable to health hazards (Farooq et al. 2018). Large amounts of fluoride, potash, arsenic, and calcium carbonate make the water unsuitable for drinking as well as for agriculture. Studies report that groundwater arsenic in Purulia is below 10 μg/L, which falls in the unaffected category, although even this low level of contamination contributes to groundwater-related health hazards (Chakraborti et al. 2009). The large rural and urban population faces serious health issues due to poor water quality.
Geo-hydrological Condition
The occurrence, movement, and quality of groundwater are controlled mainly by the geology, physiography, structural conditions, and aquifer setting of a region (Senthilkumar et al. 2015). Geologically, the study region comprises seven units: the Chhotanagpur gneissic complex, unclassified metamorphics, the Singbhum GP, the Dalma volcano, Manbhum granite, the Kuilapal granite series, and a few areas of sediments; the Chhotanagpur gneissic complex is the dominant unit. Purulia has several rivers but very poor groundwater resources because of its crystalline strata, with water levels of 3–12 m. Undulating, rugged topography, scattered hills, and a few monadnocks are the physiographic characteristics of the district, and they cause significant runoff of rainwater instead of sufficient infiltration. Most of the district is underlain by a consolidated and semi-consolidated crystalline basement, where groundwater occurs within 10 mbgl with a maximum discharge of 20 m³/h, whereas some parts are covered by Gondwana sandstone, with a water level of 100 mbgl and a maximum discharge of 22 m³/h. Four main types of aquifer system are observed: the weathered zone, the saprolitic zone, hard-rock fractured zones, and unconsolidated sediments. The weathered zone varies in thickness up to a maximum of 25 m, yields up to 2.75 lps, and dries up during peak summer, whereas the saprolitic zone has an average thickness of 4 m and a yield of 2.5 lps. The hard-rock fractured zones and unconsolidated sediments are restricted to 50–110 mbgl and 5–13 mbgl, respectively. In this hard-rock terrain, rainwater is the primary source of recharge, but river channels also contribute to groundwater recharge through fractured zones.
Sampling Techniques
In this study, a total of 67 water samples were collected from tube wells, dug wells, and hand pumps during the dry season (March to early June) of 2021. A portable GPS (Garmin GPS etrex10) was used to record the location of each sample, which was documented with a sample ID number. Samples were stored in two separate dry, clean 500 mL polyethylene containers, following the sampling procedures and precautions appropriate to each type of source. One container was used for testing and the other for validation and cross-checking. To obtain fresh water and avoid stagnant water, all tube wells were pumped for a few minutes before sampling. All samples were then transported with the necessary precautions to the laboratory of The University of Burdwan for chemical analysis. Cations (Na+, Ca2+, K+, Mg2+) and anions (NO3, HCO3, PO42−, F−, Cl−, SO42−) were determined with a Dionex ICS-90 Ion Chromatography system (Islam et al. 2021); a water checker U-10 (Khan et al. 2013) was used on site to measure basic properties, including pH and EC.
In addition, the collected samples were tested in different laboratories to ensure accuracy. Table 1 gives the descriptive statistics of all parameters. The sequence of steps followed to develop the HHRM is shown in Fig. 2.
Inventory Data
Preparing the inventory data is a key step in developing an HHRM, because it allows probable susceptibility zones to be predicted from the mathematical relationship between past records of health hazards and their influencing factors. In this study, past health hazard data were collected through field visits in the study region. A total of 67 points were collected; 70% of them were used for developing the models and the remaining 30% for testing the results, with reference to the Indian standard limits for water quality (BIS 2012). According to Chung and Fabbri (2008), one set of data should be used for developing the models and another set for validating the results. Binary classification is used for the HHRM: the inventory data are divided into health hazard-prone points and non-health hazard points, coded 1 and 0, respectively. Fourteen health hazard causative factors (Fig. 3) were used to delineate the HHRM after screening with two statistical approaches, the multicollinearity test (Saha et al. 2021a; Liao and Valliant 2012) and Pearson’s correlation coefficient (Tien Bui et al. 2016; Ahmad et al. 2022). Three modeling approaches (bagging, RF, and bagging-RF) were then applied, with suitable evaluation techniques to test the results. Data for all 14 causative factors were extracted in ArcGIS 10.4 and used as inputs to the models.
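The 70/30 partition of the 67 inventory points can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the point IDs, the random seed, and the function name `split_inventory` are assumptions made for the example.

```python
import random

def split_inventory(point_ids, train_fraction=0.7, seed=42):
    """Randomly split inventory points into training and testing subsets."""
    ids = list(point_ids)
    random.Random(seed).shuffle(ids)            # reproducible shuffle (seed is arbitrary)
    n_train = round(train_fraction * len(ids))  # size of the model-development subset
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 1..67 standing in for the 67 sampled locations.
train_ids, test_ids = split_inventory(range(1, 68))
print(len(train_ids), len(test_ids))
```

With 67 points and a 0.7 fraction, this yields 47 points for model development and 20 for testing.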
Multicollinearity Assessment
The causative factors used to develop the HHRM should be independent of one another; interdependence among a large number of factors creates collinearity, which causes problems in regression-based modeling. Because large and complex water quality datasets are used in modeling, removing multicollinearity is necessary for an optimal HHRM. In this study, three measures were used to assess the relationships among the selected factors: the variance inflation factor (VIF < 5), tolerance (TOL > 0.1) (Saha et al. 2021b, c), and Pearson’s correlation coefficient (\(\left|r\right|<0.7\)) (Ahmad et al. 2022). A VIF > 5, a TOL < 0.1, or \(\left|r\right|>0.7\) indicates a significant collinearity issue. The following equations help in identifying collinearity issues:
\(\mathrm{TOL}=1-{R}_{J}^{2}\)
\(\mathrm{VIF}=\frac{1}{\mathrm{TOL}}\)
where \({R}_{J}^{2}\) is the coefficient of determination obtained by regressing factor J on all the other factors in the dataset, and VIF and TOL denote the variance inflation factor and tolerance, respectively.
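A minimal sketch of the VIF and TOL screening, assuming the standard definitions TOL_j = 1 − R_j² and VIF_j = 1/TOL_j, where R_j² comes from regressing factor j on the remaining factors. It is written in plain Python for illustration (the study used R); the helper names and the small table of pH, EC, and F− values are hypothetical and are not the study's data.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def vif_and_tol(data):
    """Return {column index: (VIF, TOL)} for a list of rows of factor values."""
    out = {}
    for j in range(len(data[0])):
        y = [row[j] for row in data]
        X = [[v for k, v in enumerate(row) if k != j] for row in data]
        tol = 1.0 - r_squared(X, y)
        out[j] = (1.0 / tol, tol)
    return out

# Hypothetical values of three factors (pH, EC, F-) at six sites.
names = ["pH", "EC", "F-"]
demo = [[7.1, 410, 0.6], [7.4, 520, 1.1], [6.9, 380, 0.4],
        [7.8, 690, 1.6], [7.3, 450, 0.9], [7.6, 610, 0.8]]
for j, (vif, tol) in vif_and_tol(demo).items():
    flag = "collinear" if vif > 5 or tol < 0.1 else "ok"
    print(f"{names[j]}: VIF={vif:.2f} TOL={tol:.3f} ({flag})")
```

The thresholds in the last loop are the ones stated in the text (VIF > 5 or TOL < 0.1); a factor flagged this way would be a candidate for removal before modeling.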
Adopted Modeling Approach
In the current study, three predictive models were employed to develop the HHRM: bagging, RF, and an ensemble of bagging and RF. The models were implemented in R, with the help of ArcGIS 10.4 software.
Bagging
Bagging (bootstrap aggregating) is an ensemble algorithm developed to improve the stability and precision of ML algorithms by generating diverse classifiers. It is commonly used for both regression and classification. Bootstrapping generates classification trees from resamples of the original dataset, and the final result is formed by aggregating the classifications obtained from these randomly created training sets. Bagging has gained popularity because of its variance reduction, simple implementation, and high accuracy. A key idea is that a group of ‘weak learners’ forms a ‘strong learner’: each decision tree is a weak learner, and the combined result is the strong learner. Every weak learner has one vote, and the final prediction (the strong learner) is the class that receives the most votes. In our study, bagging is used to construct an ensemble; it handles large, unbalanced datasets efficiently and gives reliable results by reducing overfitting. Consider a learning set Q consisting of n independent elements that describe the health hazard causative factors, \(Q=\left\{\left({X}_{i},{Y}_{i}\right),i=1,2,\dots ,n\right\}\). First, \({C}_{b}\) \(\left(b=1,2,\dots ,m\right)\) denotes the b-th bootstrap sample of the training set Q, obtained by drawing n elements from Q with replacement. Next, the bootstrapped estimator \({g}^{*}\left(\bullet \right)\) is computed using the plug-in principle, i.e., \({g}^{*}\left(\bullet \right)={h}_{n}\left(\left({X}_{1}^{*},{Y}_{1}^{*}\right),\dots ,\left({X}_{n}^{*},{Y}_{n}^{*}\right)\right)\left(\bullet \right)\).
Finally, these steps are repeated m times (e.g., 50 or 100) to yield \({g}^{*k}\left(\bullet \right)\) \(\left(k=1,2,3,\dots ,m\right)\), and the associated bagging estimator (Eq. 13) is
\({g}_{Bag}\left(\bullet \right)=\frac{1}{m}\sum_{k=1}^{m}{g}^{*k}\left(\bullet \right)\)
where m is the finite number of bootstrap replicates used in the Monte Carlo approximation, and \(m\to \infty\) gives the theoretical quantity.
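The bootstrap-and-vote procedure described above can be sketched as follows. This is a simplified illustration, not the R implementation used in the study: the weak learner is a one-split decision stump rather than a full tree, and the function names, the seed, the choice m = 25, and the toy single-factor data are all assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split decision stump (a 'weak learner') by minimising training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback: predict the majority class everywhere if no split improves on it.
    best = (sum(lab != majority for lab in y), None, None, majority, majority)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0                      # candidate threshold
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    return best[1:]          # (feature, threshold, left label, right label)

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:            # constant stump (bootstrap sample had no useful split)
        return l_lab
    return l_lab if row[f] <= t else r_lab

def bagging_fit(X, y, m=25, seed=0):
    """Train m stumps, each on a bootstrap sample of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Aggregate the weak learners by majority vote."""
    votes = Counter(predict_stump(s, row) for s in models)
    return votes.most_common(1)[0][0]

# Toy data: one hypothetical factor; class 1 = hazard-prone, 0 = non-hazard.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [11.0], [12.0], [13.0], [14.0], [15.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, [0.0]), bagging_predict(models, [20.0]))
```

For a regression target, the majority vote in `bagging_predict` would be replaced by the average of the m predictions, which corresponds to the bagging estimator given in the text.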
Random Forest (RF)
RF is an advanced version of the bagging algorithm. It is made up of many decision trees and gives reliable predictions by avoiding overfitting to the training dataset, whereas a traditional ML approach usually gives lower classification accuracy and is prone to overfitting. During training, RF creates multiple trees from bootstrap samples of the data. The procedure is the same as in bagging, i.e., both build their trees by bootstrapping and develop multiple decision trees; the difference is that in RF a randomized subset of predictors is used to grow each decision tree. RF combines the series of tree classifiers, and every individual tree casts a unit vote for the most preferred result. Classification is thus decided by the votes of multiple classifiers, which together produce a final result with high classification accuracy and good tolerance of noise and outliers. The significant advantages of RF are that it needs no pruning, can use a large number of trees, and performs automatic feature selection. In the RF algorithm, the generalization error (GE) is expressed as follows:
\(GE={P}_{x,y}\left(mg\left(x,y\right)<0\right)\)
where P denotes probability over the (\(x\), \(y\)) space, \(x\) being the vector of health hazard conditioning factors and \(y\) the corresponding class, and finally \(mg\) is the margin function, expressed as:
\(mg\left(x,y\right)={av}_{k}I\left({h}_{k}\left(x\right)=y\right)-\underset{j\ne y}{\mathrm{max}}\,{av}_{k}I\left({h}_{k}\left(x\right)=j\right)\)
where \(I\left(\bullet \right)\) is the indicator function, \({av}_{k}\) is the average over the trees, j is a class other than \(y\), and \({h}_{k}\) is the k-th tree classifier.
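To make the margin concrete: following Breiman (2001), the margin of one sample is the share of trees voting for its true class minus the largest share voting for any other class, and the generalization error is the probability that this margin is negative. The sketch below estimates both from a table of tree votes; the function names and the vote table are hypothetical and are not output from the study's models.

```python
from collections import Counter

def margin(tree_votes, true_label):
    """Share of votes for the true class minus the largest share for any other class."""
    k = len(tree_votes)
    counts = Counter(tree_votes)
    share_true = counts.get(true_label, 0) / k
    share_other = max((c / k for lab, c in counts.items() if lab != true_label),
                      default=0.0)
    return share_true - share_other

def generalization_error(vote_table, labels):
    """Empirical estimate of P(mg < 0): fraction of samples with a negative margin."""
    margins = [margin(v, y) for v, y in zip(vote_table, labels)]
    return sum(m < 0 for m in margins) / len(margins)

# Hypothetical votes of five trees for three samples (1 = hazard, 0 = non-hazard).
votes = [[1, 1, 1, 0, 1],
         [0, 0, 1, 0, 1],
         [0, 0, 0, 0, 0]]
labels = [1, 1, 0]
print([round(margin(v, y), 2) for v, y in zip(votes, labels)])
print(generalization_error(votes, labels))
```

A positive margin means the forest classifies the sample correctly by majority vote; the larger the margin, the more confident the vote.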
Besides the two ML algorithms described above, this study adopts an ensemble of bagging and RF to obtain higher precision and accuracy than either individual model. Until now, this ensemble technique has not been used in research on water quality and HHRM; it may prove a very helpful ML approach in environmental management studies because it offsets the individual drawbacks of both bagging and RF.
Relative Importance
Parameter selection is a significant part of any predictive analysis, and the importance of each causative factor depends on the methods and evaluation techniques adopted. Various health hazard causative factors are used in this study, but they do not all contribute equally to health hazard susceptibility, so their relative importance needs to be measured. Here, the RF algorithm, proposed by Breiman (2001) for supervised learning, is used instead of traditional statistical techniques because of the large dataset. The mean decrease accuracy (MDA) technique is applied with RF to quantify the relative importance of each parameter. As described by Bui et al. (2012), the relative importance of an individual causative factor is determined by leaving that factor out and recalculating the accuracy of the model.
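The idea behind MDA can be sketched with a permutation variant, which is an assumption here: the information in one factor is destroyed by shuffling its values, and the resulting drop in accuracy is taken as that factor's importance. In an actual RF the accuracy would be measured on out-of-bag samples; this sketch uses a fixed, hypothetical rule (`toy_model`) and made-up fluoride and pH values in place of a trained forest.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def mean_decrease_accuracy(predict, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                      # break the link between factor j and y
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical classifier: hazard (1) if fluoride exceeds 1.5 mg/L; pH is ignored."""
    return 1 if row[0] > 1.5 else 0

# Made-up rows of [F- (mg/L), pH] with hazard labels.
X = [[0.4, 7.1], [2.1, 7.3], [0.9, 6.9], [1.8, 7.6], [2.6, 7.2], [0.7, 7.4]]
y = [0, 1, 0, 1, 1, 0]
importance = mean_decrease_accuracy(toy_model, X, y)
print(dict(zip(["F-", "pH"], importance)))
```

Because `toy_model` never looks at pH, shuffling pH leaves the accuracy unchanged and its importance is exactly zero, whereas shuffling fluoride can only lower the accuracy.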
Validation Techniques
Model evaluation is important for examining the accuracy and performance of models developed from previous data, so that the predicted results can be related to the real world. In this study, we employed eight validation techniques, one of which is graphical: the receiver operating characteristic (ROC) curve, sensitivity, specificity, accuracy, precision, F-score, kappa coefficient, and the Taylor diagram (Taylor 2001). The AUC (area under the curve) ranges from 0.5 to 1; a value close to 1 indicates a superior model, whereas a value close to 0.5 indicates poor performance (Chen et al. 2019). Four quantities are used to compute the statistical indices: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Together they quantify the accuracy (Bellu et al. 2016), whereas precision is the ratio of true positives to all predicted positives and ranges from 0 to 1. Higher values imply a more reliable model and lower values a less reliable one. The F-score combines precision and recall and has a maximum value of 1. Another important measure is the kappa coefficient, which reflects the accuracy, efficiency, and consistency of the adopted models (Hoehler 2000). Kappa values are divided into five groups according to performance (0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1) (Dou et al. 2020). The following equations are used for the aforementioned indices:
where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively; \({P}_{a}\) and \({P}_{exp}\) denote the total and random accuracy, respectively; \({O}_{i}\) and \({E}_{i}\) denote the observed and predicted health hazard values, respectively, in the training and validation datasets; and N is the total number of samples.
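The confusion-matrix indices named above can be computed as in the following sketch, which assumes their standard definitions (with the kappa coefficient taken as (P_a − P_exp)/(1 − P_exp)). The function name and the counts are made up for illustration and are not results from the study.

```python
def validation_indices(tp, tn, fp, fn):
    """Confusion-matrix indices for a binary (1 = hazard, 0 = non-hazard) model."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / n                # P_a, the observed agreement
    precision = tp / (tp + fp)              # true positives among predicted positives
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # P_exp: agreement expected by chance from the row and column totals
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_exp) / (1 - p_exp)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision,
            "f_score": f_score, "kappa": kappa}

# Made-up counts for illustration only.
scores = validation_indices(tp=40, tn=45, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts the kappa coefficient is 0.7, which falls in the 0.6–0.8 group of the five-class scale cited above.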
Result
Multicollinearity Analysis
The VIF and tolerance analysis shows that the highest VIF and the lowest tolerance are 2.51 and 0.398, respectively (Table 2), so all values are within the acceptable range (VIF < 5 and TOL > 0.1). Hence, no collinearity problem exists among the 14 health hazard causative factors. Pearson’s correlation coefficient was also used to check collinearity among the conditioning factors. The correlation matrix (Table 3 and Fig. 4) shows that most pairs of variables are only weakly to moderately correlated, which indicates that the selected independent variables are suitable for this modeling approach. The correlation between \({\mathrm{K}}^{+}\) and \({\mathrm{Ca}}^{2+}\) is 0.71, only slightly above the 0.7 threshold, and was therefore considered acceptable.
Optimal Tuning Parameters in Ensemble Models
Optimal tuning parameters (hyper-parameters) are those that give the best description of an ensemble approach and of the classification problem. In this study, the number of boosting iterations and the root mean square error (RMSE) were used to tune the bagging and RF algorithms (Fig. 5). Boosting models perform well, are simple to understand, and need only modest hyper-parameter tuning. The mean, median, and best values in a boosting model based on the bagging and RF models were used to calculate the relationship between the number of iterations and the corresponding RMSE.
Health Hazard Risk Map (HHRM)
The data above establish that groundwater quality is the primary concern in the study region. An HHRM is therefore necessary to delineate health hazard zones, because a large number of people rely on groundwater for drinking. Appropriate modeling approaches help to identify and quantify the potential health risk arising from contact with contaminants through different pathways (Varol and Davraz 2015). Various studies (Adimalla and Li 2019; Adimalla and Wu 2019; He and Li 2020) have shown the relation between water quality and the probable health risk when permissible limits are exceeded. Health hazard susceptibility assessment is thus a vital complement to traditional water quality assessment. In this study, three models, namely bagging, RF, and an ensemble of bagging and RF, were used to delineate potential health hazard susceptible zones from fourteen causative factors. These factors had diverse spatial distribution patterns across the district in the investigation year (2021), as shown in Fig. 3. Water depth is low (2.07–3.37 m) in the SE portion and in some parts of the NW and NE, at Manbazar II, Bandwan, Neturia, and Santuri, whereas most of the region has moderate to very high depth (3.92–7.97 m). pH values of around 7.40–7.74 are found in Purulia II, Kashipur, Puncha, Hura, Arsha, and many other blocks, while low and very high pH values are scattered. High to very high EC is found in parts of Bagmundi, Jaypur, Raghunathpur, and Barabazar. Ca2+ has a varied distribution, with higher values at a few isolated locations (Fig. 3). Mg2+ shows a more or less uniform pattern across the district, with higher values in only a very small area.
Purulia has several reported occurrences of fluoride (F−) contamination, found mainly in the Purulia II, Arsha, and Raghunathpur region. K+, an important property of drinking water, has a homogeneous distribution pattern throughout the district, similar to that of Mg2+; higher values are mainly found in the Bagmundi, Para, Santuri, and Manbazar blocks. Na+ has a heterogeneous distribution pattern; significant parts of the district have values above the permissible limit, including the Balarampur, Barabazar, Bandwan, Jaipur, Santuri, and Para blocks. Small parts of Bagmundi, Santuri, Puncha, and Jhalda have higher values of NO3, very similar to the distribution of PO42−, which is mainly found in the central part of the district. In addition, SO42− is mainly found, in a scattered way, in the northern and SE parts and in a small part of the Bagmundi block.
All models produced more or less similar results (Fig. 6). Most of the eastern region, including Santuri, Kashipur, Hura, Puncha, Manbazar I and II, Bandwan, and parts of Barabazar, and some parts of the western region (Jhalda I and II) fall under the safe zone, whereas the central part, especially Bagmundi, Balarampur, Arsha, Purulia I and II, and Raghunathpur, is significantly susceptible to the health hazard. The HHRM also identifies the spatial extent of each degree of hazard. In this work, the HHRM is classified into five categories, namely very low, low, moderate, high, and very high, and each model gives a different spatial extent. Taking the entire study region as 100%, a significant part falls under the high to very high health hazard susceptible zones. For the bagging model (Fig. 6a), 352.13 km2 (5.49%), 1922.82 km2 (30.02%), 2535.39 km2 (39.58%), 1316.66 km2 (20.55%), and 278.04 km2 (4.34%) fall under the very low, low, moderate, high, and very high zones, respectively; for RF (Fig. 6b), the corresponding areas are 555.35 km2 (8.67%), 2628.01 km2 (41.02%), 1931.07 km2 (30.14%), 1038.99 km2 (16.22%), and 251.99 km2 (3.93%); and for the ensemble of bagging and RF (Fig. 6c), they are 472.08 km2 (7.37%), 2878.91 km2 (44.94%), 2119.25 km2 (33.08%), 780.2 km2 (12.18%), and 154.79 km2 (2.41%).
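The class-wise shares reported above can be checked with a short sketch. It assumes that the total area for each model is the sum of its five class areas, so the computed percentages may differ from the published ones in the last decimal place; the function names are illustrative only.

```python
# Class-wise areas (km2) as reported in the text for Fig. 6a-c.
CLASSES = ["very low", "low", "moderate", "high", "very high"]
AREAS_KM2 = {
    "bagging": [352.13, 1922.82, 2535.39, 1316.66, 278.04],
    "RF": [555.35, 2628.01, 1931.07, 1038.99, 251.99],
    "ensemble": [472.08, 2878.91, 2119.25, 780.2, 154.79],
}


def class_shares(areas):
    """Percentage of the summed area that falls in each hazard class."""
    total = sum(areas)
    return [100.0 * area / total for area in areas]


def high_risk_share(areas):
    """Combined percentage of the 'high' and 'very high' classes."""
    shares = class_shares(areas)
    return shares[3] + shares[4]


for model, areas in AREAS_KM2.items():
    shares = class_shares(areas)
    line = ", ".join(f"{c}: {s:.2f}%" for c, s in zip(CLASSES, shares))
    print(f"{model}: {line} | high + very high: {high_risk_share(areas):.2f}%")
```

Under this assumption, the ensemble assigns the smallest combined share to the high and very high classes of the three models.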
Model Validation and Comparison
The performance of the employed models was validated using eight validation techniques (AUC-ROC, sensitivity, specificity, accuracy, precision, F-score, kappa coefficient, and Taylor diagram). All the results are shown in Table 4 and Figs. 7 and 8. They imply that all the models have good predictive ability, with high values for every metric, and indicate substantial similarity between the training and validation results. Across all metrics, the ensemble of bagging and RF has the best predictive ability, followed by bagging and RF. The AUC-ROC curves in Fig. 7 also indicate that the ensemble of bagging and RF is the superior modeling approach, giving the most effective result (training 0.934, validation 0.911), followed by bagging (training 0.912, validation 0.902) and RF (training 0.899, validation 0.878).
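For reference, the threshold-based metrics listed above can all be derived from a binary confusion matrix, and AUC-ROC can be computed from ranked scores. The sketch below uses the standard textbook definitions; the confusion-matrix counts are hypothetical and do not reproduce Table 4.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what is expected by chance.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_score": f_score,
        "kappa": kappa,
    }


def auc_roc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 3) for name, value in metrics.items()})
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Kappa corrects accuracy for chance agreement, which is why it is typically lower than accuracy for the same model.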
The Taylor diagram (Fig. 8), a graphical model evaluation tool, summarizes the validation of the adopted models in a single diagram. It gave approximately the same results as the aforementioned validation techniques: the ensemble of bagging and RF (r = 0.94) performs better in predicting health hazard susceptible zones than bagging (r = 0.89) and RF (r = 0.87).
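A Taylor diagram places each model according to three statistics: the standard deviations of the reference and model fields, their correlation coefficient r, and the centered root mean square difference E', which are linked by E'^2 = σ_m^2 + σ_r^2 − 2 σ_m σ_r r (Taylor 2001). The following sketch computes these quantities; the reference labels and model scores are hypothetical and are not the data behind Fig. 8.

```python
import math


def taylor_statistics(reference, model):
    """Return (std_ref, std_model, correlation, centered RMS difference)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_m = sum(model) / n
    dev_r = [x - mean_r for x in reference]
    dev_m = [x - mean_m for x in model]
    # Population (1/N) standard deviations, as used in Taylor (2001).
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_m)) / (n * std_r * std_m)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_r, dev_m)) / n)
    return std_r, std_m, corr, crmsd


# Hypothetical observed hazard labels and predicted susceptibility scores.
observed = [0, 1, 0, 1, 1, 0]
predicted = [0.1, 0.8, 0.3, 0.7, 0.9, 0.2]

std_r, std_m, corr, crmsd = taylor_statistics(observed, predicted)
# The law-of-cosines identity that underlies the diagram's geometry.
identity_rhs = std_m ** 2 + std_r ** 2 - 2 * std_m * std_r * corr
print(round(corr, 3), round(crmsd, 3), round(identity_rhs, 3))
```

A model plotted closer to the reference point has a higher correlation and a smaller centered RMS difference.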
Variable Importance
In this study, the RF algorithm was used to assess the relative importance of the causative factors, as shown in Table 5 and Fig. 9. All of the causative factors contributed to the prediction of the models. The result shows that depth has the greatest influence on health hazards with a value of 0.912, followed by pH and As with values of 0.824 and 0.793, respectively, whereas Ca2+, F−, and Mg2+, with importance values of 0.128, 0.197, and 0.401, respectively, have the least importance in the models. According to the derived result, the hierarchical order of importance in model performance is depth > pH > As > HCO3 > EC > NO3 > Na+ > K+ > Cl− > Mg2+ > SO42− > PO42− > Ca2+ > F−. All of the adopted causative factors influence health hazard susceptibility modeling to a greater or lesser degree. Therefore, all fourteen parameters were retained for the HHRM in our study.
Discussion
Potable groundwater is essential to human health in the Purulia district; however, this study showed that the study area faces potential non-carcinogenic health hazards due to the adverse physicochemical properties of its groundwater. It was therefore essential to build a proper water quality management tool, namely an HHRM of the entire Purulia district. This map will help local people as well as regional authorities to realize the sustainable use and management of groundwater. Predicting health hazard susceptibility zones is thus an important and very demanding task in water quality and health risk assessment. Worldwide, groundwater deterioration and the related health risks have been studied by different researchers (Kaur et al. 2020; Chen et al. 2021), who found similar health vulnerabilities, such as ‘blue baby syndrome’ in children, skeletal fluorosis, black foot disease, hemochromatosis, and neurological disorders caused by an abundance of different chemical components. HHRM prediction for the Purulia district, with consideration of all possible physicochemical parameters, therefore became a priority. Undoubtedly, the precision of the prediction depends significantly on the modeling approach employed. In recent times, ML methods such as ANN, SVM, logistic regression, and their ensembles have given more accurate results than traditional methods (Tien Bui et al. 2012). Identifying suitable methods to delineate the HHRM is nevertheless essential. We therefore applied three recent and widely accepted ML approaches in this work, i.e., bagging, RF, and the ensemble of bagging and RF, with appropriate evaluation and comparison.
Bagging and RF have previously been used as ML tools for prediction in different studies, which obtained reliable results that agreed closely with reality (Iverson et al. 2004; Prasad et al. 2006; Hegde et al. 2015; Alsouda et al. 2019; Islam et al. 2021). However, very few groundwater studies related to public health have been carried out, and the use of bagging, RF, and their ensemble in this field is a fairly new approach. All three have measurable predictive ability; among them, the ensemble (bagging and RF) model yielded notably better evaluation performance than the individual models and was also more consistent. To validate the adopted models, we applied the AUC-ROC curve; the ensemble performed best, with AUC values of 0.934 (training) and 0.911 (testing), followed by bagging and RF. According to Aguirre-Gutiérrez et al. (2013), AUC-ROC should not be the only evaluation technique for validating large spatial prediction models. Thus, we used seven other validation techniques to confirm the result quantitatively, and all of them gave the same ranking of the models.
Above all, the identification of health hazard causative factors is the foremost consideration, as it significantly affects the HHRM. Several methods have been introduced for selecting causative factors, such as linear regression, RF, VIF and TOL, and Pearson’s correlation coefficients. Although the selection procedure is still debated, some methods are widely accepted for factor selection. Thus, in this study we employed two separate methods rather than one, namely VIF-TOL and Pearson’s correlation coefficients, to verify the independence of the health hazard causative parameters. Generally, As, pH, depth, and heavy metals have a great influence on the deterioration of water quality, and in this study the adopted methods gave more or less the same result. RF was also used to quantify variable importance, which was very helpful for appropriate model development. According to the results, depth, pH, arsenic, and HCO3− have a crucial influence on health risk assessment. The study shows that the central part of the district, including Bagmundi, Balarampur, Arsha, Purulia I and II, and Raghunathpur, is significantly susceptible to the health hazard. This study has some limitations. First, we did not consider soil characteristics, aquifer properties, or the land use/land cover (LULC) of the study region, which are its most significant shortcomings. Second, we focused only on water quality data for health hazard assessment, although other factors such as food habits and agricultural practices may also affect public health. In future studies, health hazard susceptible zones could be developed by considering all of these parameters with newly developed data-driven models. Besides, our study may motivate others to apply an ensemble of bagging and RF for better results.
Nevertheless, the predicted results remain valid and meaningful, and they will help the local authorities to devise proper strategies for eliminating water pollution in the study region.
Conclusion
Health hazard risk mapping (HHRM) based on suitable groundwater-related causative factors is necessary to control and minimize health hazards, particularly in semi-arid areas with a sub-tropical environment. Therefore, potential HHRM was assessed in the Purulia district of West Bengal using decision tree-based RF, bagging, and their ensemble learning approach, considering fourteen appropriate causative factors. The main findings of this study are summarized as follows:
- Among the fourteen selected factors, depth (0.912), pH (0.824), As (0.793), and HCO3 (0.791) are the most dominant in controlling HHRM in the present study region.
- From the modeling perspective, the ensemble of RF and bagging is the most suitable approach (validation AUC = 0.911) for predicting HHRM in this plateau region, followed by bagging (AUC = 0.902) and RF (AUC = 0.878). The Taylor diagram likewise shows that the ensemble approach performs best, followed by bagging and RF.
- The ensemble approach achieved high precision and accuracy and can therefore be recommended for HHRM assessment in other regions, as well as for hydrological studies and environmental problem assessment in different regions of the world.
- The produced HHRM can help regional planners, decision-makers, and the government to take proper preventive measures in the high and very high susceptibility regions and thereby reduce further damage.
Data Availability
Data available on request from the authors.
References
Adimalla N, Li P (2019) Occurrence, health risks, and geochemical mechanisms of fluoride and nitrate in groundwater of the rock-dominant semi-arid region, Telangana State, India. Hum Ecol Risk Assess 25:81–103. https://doi.org/10.1080/10807039.2018.1480353
Adimalla N, Qian H (2019) Groundwater quality evaluation using water quality index (WQI) for drinking purposes and human health risk (HHR) assessment in an agricultural region of Nanganur, South India. Ecotoxicol Environ Saf 176:153–161. https://doi.org/10.1016/j.ecoenv.2019.03.066
Adimalla N, Venkatayogi S (2016) Mechanism of fluoride enrichment in groundwater of hard rock aquifers in Medak, Telangana State, South India. Environ Earth Sci 76:45. https://doi.org/10.1007/s12665-016-6362-2
Adimalla N, Wu J (2019) Groundwater quality and associated health risks in a semi-arid region of south India: implication to sustainable groundwater management. Hum Ecol Risk Assess 25:191–216. https://doi.org/10.1080/10807039.2018.1546550
Adimalla N, Vasa SK, Li P (2018) Evaluation of groundwater quality, Peddavagu in Central Telangana (PCT), South India: an insight of controlling factors of fluoride enrichment. Model Earth Syst Environ 4:841–852. https://doi.org/10.1007/s40808-018-0443-z
Aguirre-Gutiérrez J, Carvalheiro LG, Polce C et al (2013) Fit-for-purpose: species distribution model performance depends on evaluation criteria—Dutch hoverflies as a case study. PLoS ONE 8:e63708. https://doi.org/10.1371/journal.pone.0063708
Ahmad S, Umar R, Arshad I (2019) Groundwater quality appraisal and its hydrogeochemical characterization—Mathura City, Western Uttar Pradesh. J Geol Soc India 94:611–623. https://doi.org/10.1007/s12594-019-1368-5
Ahmad S, Ahmad I, Umar R, Farooq SH (2022) Spatio-temporal variation and health risk associated with trace element concentrations in groundwater of Mathura city using modified indexing approach. Arab J Geosci 15:318. https://doi.org/10.1007/s12517-022-09434-3
Ahmed N, Bodrud-Doza Md, Islam SMD-U et al (2019) Hydrogeochemical evaluation and statistical analysis of groundwater of Sylhet, North-Eastern Bangladesh. Acta Geochim 38:440–455. https://doi.org/10.1007/s11631-018-0303-6
Aldhyani THH, Al-Yaari M, Alkahtani H, Maashi M (2020) Water quality prediction using artificial intelligence algorithms. Appl Bionics Biomech 2020:e6659314. https://doi.org/10.1155/2020/6659314
Alsouda Y, Pllana S, Kurti A (2019) IoT-based urban noise identification using machine learning: performance of SVM, KNN, bagging, and random forest. In: Proceedings of the International Conference on Omni-Layer Intelligent Systems. Association for Computing Machinery, New York, pp 62–67
Arora T, Mishra A, Matta G et al (2014) Human health risk assessment of temporal and spatial variations of ground water quality at a densely industrialized commercial complex at Haridwar, India. J Appl Nat Sci 6:825–843. https://doi.org/10.31018/jans.v6i2.543
Bellu A, Sanches Fernandes LF, Cortes RMV, Pacheco FAL (2016) A framework model for the dimensioning and allocation of a detention basin system: the case of a flood-prone mountainous watershed. J Hydrol 533:567–580. https://doi.org/10.1016/j.jhydrol.2015.12.043
Bera A, Das S (2021) Water resource management in semi-arid Purulia district of West Bengal, in the context of sustainable development goals. In: Shit PK, Bhunia GS, Adhikary PP, Dash ChJ (eds) Groundwater and society: applications of geospatial technology. Springer, Cham, pp 501–519
Bhowmik AK, Alamdar A, Katsoyiannis I et al (2015) Mapping human health risks from exposure to trace metal contamination of drinking water sources in Pakistan. Sci Total Environ 538:306–316. https://doi.org/10.1016/j.scitotenv.2015.08.069
Bodrud-Doza Md, Islam SMD-U, Rume T et al (2020) Groundwater quality and human health risk assessment for safe and sustainable water supply of Dhaka City dwellers in Bangladesh. Groundw Sustain Dev 10:100374. https://doi.org/10.1016/j.gsd.2020.100374
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140. https://doi.org/10.1007/BF00058655
Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
Bui DT, Pradhan B, Lofman O, et al (2012) Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis. International Congress on Environmental Modelling and Software. International Environmental Modelling and Software Society, Leipzig
Chabukdhara M, Gupta SK, Kotecha Y, Nema AK (2017) Groundwater quality in Ghaziabad district, Uttar Pradesh, India: multivariate and health risk assessment. Chemosphere 179:167–178. https://doi.org/10.1016/j.chemosphere.2017.03.086
Chakraborti D, Das B, Rahman MM et al (2009) Status of groundwater arsenic contamination in the state of West Bengal, India: a 20-year study report. Mol Nutr Food Res 53:542–551. https://doi.org/10.1002/mnfr.200700517
Chen W, Tsangaratos P, Ilia I et al (2019) Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods. Sci Total Environ 684:31–49. https://doi.org/10.1016/j.scitotenv.2019.05.312
Chen F, Yao L, Mei G et al (2021) Groundwater quality and potential human health risk assessment for drinking and irrigation purposes: a case study in the semiarid region of North China. Water 13:783. https://doi.org/10.3390/w13060783
Chowdhuri I, Pal SC, Chakrabortty R (2020) Flood susceptibility mapping by ensemble evidential belief function and binomial logistic regression model on river basin of eastern India. Adv Space Res 65:1466–1489. https://doi.org/10.1016/j.asr.2019.12.003
Chowdhury P, Mukhopadhyay BP, Nayak S, Bera A (2021) Hydro-chemical characterization of groundwater and evaluation of health risk assessment for fluoride contamination areas in the eastern blocks of Purulia district, India. Environ Dev Sustain. https://doi.org/10.1007/s10668-021-01911-1
Chung C-J, Fabbri AG (2008) Predicting landslides for risk analysis—spatial models tested by a cross-validation technique. Geomorphology 94:438–452. https://doi.org/10.1016/j.geomorph.2006.12.036
Dissanayake CB (1991) The fluoride problem in the ground water of Sri Lanka—environmental management and health. Int J Environ Sci 38:137–155. https://doi.org/10.1080/00207239108710658
Dou J, Yunus AP, Merghadi A et al (2020) Different sampling strategies for predicting landslide susceptibilities are deemed less consequential with deep learning. Sci Total Environ 720:137320. https://doi.org/10.1016/j.scitotenv.2020.137320
Edokpayi JN, Enitan AM, Mutileni N, Odiyo JO (2018) Evaluation of water quality and human risk assessment due to heavy metals in groundwater around Muledane area of Vhembe District, Limpopo Province, South Africa. Chem Cent J 12:2. https://doi.org/10.1186/s13065-017-0369-y
Farooq SH, Prusty P, Singh RK et al (2018) Fluoride contamination of groundwater and its seasonal variability in parts of Purulia district, West Bengal, India. Arab J Geosci 11:709. https://doi.org/10.1007/s12517-018-4062-9
Garg VK, Suthar S, Singh S et al (2009) Drinking water quality in villages of southwestern Haryana, India: assessing human health risks associated with hydrochemistry. Environ Geol 58:1329–1340. https://doi.org/10.1007/s00254-008-1636-y
Genthe B, Kapwata T, Le Roux W et al (2018) The reach of human health risks associated with metals/metalloids in water and vegetables along a contaminated river catchment: South Africa and Mozambique. Chemosphere 199:1–9. https://doi.org/10.1016/j.chemosphere.2018.01.160
Granata F, Papirio S, Esposito G et al (2017) Machine learning algorithms for the forecasting of wastewater quality indicators. Water 9:105. https://doi.org/10.3390/w9020105
Gulgundi MS, Shetty A (2018) Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques. Appl Water Sci 8:43. https://doi.org/10.1007/s13201-018-0684-z
He X, Li P (2020) Surface water pollution in the middle Chinese Loess plateau with special focus on hexavalent chromium (Cr6+): occurrence, sources and health risks. Expo Health 12:385–401. https://doi.org/10.1007/s12403-020-00344-x
He S, Wu J, Wang D, He X (2022a) Predictive modeling of groundwater nitrate pollution and evaluating its main impact factors using random forest. Chemosphere 290:133388. https://doi.org/10.1016/j.chemosphere.2021.133388
He S, Li P, Su F, Wang D, Ren X (2022b) Identification and apportionment of shallow groundwater nitrate pollution in Weining Plain, Northwest China, using hydrochemical indices, nitrate stable isotopes, and the new Bayesian stable isotope mixing model (MixSIAR). Environ Pollut 298:118852. https://doi.org/10.1016/j.envpol.2022.118852
Hegde C, Wallace S, Gray K (2015) Using trees, bagging, and random forests to predict rate of penetration during drilling. OnePetro. https://doi.org/10.2118/176792-MS
Hoehler FK (2000) Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. J Clin Epidemiol 53:499–503. https://doi.org/10.1016/S0895-4356(99)00174-2
Hong H, Liu J, Bui DT et al (2018) Landslide susceptibility mapping using J48 decision tree with adaboost, bagging and rotation forest ensembles in the Guangchang area (China). CATENA 163:399–413. https://doi.org/10.1016/j.catena.2018.01.005
Islam SMD-U, Majumder RK, Uddin MJ et al (2017) Hydrochemical characteristics and quality assessment of groundwater in Patuakhali District, southern coastal region of Bangladesh. Expo Health 9:43–60. https://doi.org/10.1007/s12403-016-0221-y
Islam MdA, Romić D, Akber MdA, Romić M (2018) Trace metals accumulation in soil irrigated with polluted water and assessment of human health risk from vegetable consumption in Bangladesh. Environ Geochem Health 40:59–85. https://doi.org/10.1007/s10653-017-9907-8
Islam ARMdT, Pal SC, Chowdhuri I et al (2021) Application of novel framework approach for prediction of nitrate concentration susceptibility in coastal multi-aquifers, Bangladesh. Sci Total Environ 801:149811. https://doi.org/10.1016/j.scitotenv.2021.149811
Iverson LR, Prasad AM, Liaw A (2004) New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis. In: Smithers, Richard, ed Landscape ecology of trees and forests, proceedings of the twelfth annual IALE(UK) conference; 2004 June 21–24; Cirencester, UK [Place of publication unknown]: International Association for Landscape Ecology: 317–320
Kaur L, Rishi MS, Siddiqui AU (2020) Deterministic and probabilistic health risk assessment techniques to evaluate non-carcinogenic human health risk (NHHR) due to fluoride and nitrate in groundwater of Panipat, Haryana, India. Environ Pollut 259:113711. https://doi.org/10.1016/j.envpol.2019.113711
Khan S, Shahnaz M, Jehan N et al (2013) Drinking water quality and human health risk in Charsadda district, Pakistan. J Clean Prod 60:93–101. https://doi.org/10.1016/j.jclepro.2012.02.016
Kouadri S, Pande CB, Panneerselvam B et al (2021) Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-021-17084-3
Kumar M, Ramanathan AL, Tripathi R et al (2017) A study of trace element contamination using multivariate statistical techniques and health risk assessment in groundwater of Chhaprola Industrial Area, Gautam Buddha Nagar, Uttar Pradesh, India. Chemosphere 166:135–145. https://doi.org/10.1016/j.chemosphere.2016.09.086
Kundu A, Nag SK (2018) Assessment of groundwater quality in Kashipur Block, Purulia district, West Bengal. Appl Water Sci 8:33. https://doi.org/10.1007/s13201-018-0675-0
Li P, Wu J, Qian H (2012) Groundwater quality assessment based on rough sets attribute reduction and TOPSIS method in a semi-arid area, China. Environ Monit Assess 184:4841–4854. https://doi.org/10.1007/s10661-011-2306-1
Li P, Qian H, Wu J, Chen J (2013) Sensitivity analysis of TOPSIS method in water quality assessment: I. Sensitivity to the parameter weights. Environ Monit Assess 185:2453–2461. https://doi.org/10.1007/s10661-012-2723-9
Li P, Tian R, Xue C, Wu J (2017) Progress, opportunities and key fields for groundwater quality research under the impacts of human activities in China with a special focus on western China. Environ Sci Pollut Res 24(15):13224–13234. https://doi.org/10.1007/s11356-017-8753-7
Li P, He X, Li Y, Xiang G (2019) Occurrence and health implication of fluoride in groundwater of loess aquifer in the chinese loess plateau: a case study of Tongchuan, Northwest China. Expo Health 11:95–107. https://doi.org/10.1007/s12403-018-0278-x
Liao D, Valliant R (2012) Variance inflation factors in the analysis of complex survey data. Survey Methodol 38(1):53–62
Liu J, Gao Z, Zhang Y et al (2021) Hydrochemical evaluation of groundwater quality and human health risk assessment of nitrate in the largest peninsula of China based on high-density sampling: a case study of Weifang. J Clean Prod 322:129164. https://doi.org/10.1016/j.jclepro.2021.129164
Lu H, Ma X (2020) Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 249:126169. https://doi.org/10.1016/j.chemosphere.2020.126169
Madilonga RT, Edokpayi JN, Volenzo ET et al (2021) Water quality assessment and evaluation of human health risk, in Mutangwi river, Limpopo province, South Africa. Int J Environ Res Public Health 18:6765. https://doi.org/10.3390/ijerph18136765
Mazhar I, Hamid A, Afzal S (2019) Groundwater quality assessment and human health risks in Gujranwala District, Pakistan. Environ Earth Sci 78:634. https://doi.org/10.1007/s12665-019-8644-y
Mazumder DNG, Ghosh A, Majumdar KK et al (2010) Arsenic contamination of ground water and its health impact on population of District of Nadia, West Bengal, India. Indian J Community Med 35:331–338. https://doi.org/10.4103/0970-0218.66897
Mondal D, Pal S (2015) A multi-parametric spatial modeling of vulnerability due to arsenic pollution in Murshidabad district of West Bengal, India. Arab J Geosci 8:8047–8054. https://doi.org/10.1007/s12517-015-1809-4
Mondal D, Dutta G, Gupta S (2016) Inferring the fluoride hydrogeochemistry and effect of consuming fluoride-contaminated drinking water on human health in some endemic areas of Birbhum district, West Bengal. Environ Geochem Health 38:557–576. https://doi.org/10.1007/s10653-015-9743-7
Mthembu PP, Elumalai V, Brindha K, Li P (2020) Hydrogeochemical processes and trace metal contamination in groundwater: impact on human health in the maputaland coastal aquifer, South Africa. Expo Health 12:403–426. https://doi.org/10.1007/s12403-020-00369-2
Murtaza B, Natasha AM et al (2020) Compositional and health risk assessment of drinking water from health facilities of District Vehari, Pakistan. Environ Geochem Health 42:2425–2437. https://doi.org/10.1007/s10653-019-00465-6
Nath A, Samanta S, Banerjee S et al (2021) Threat of arsenic contamination, salinity and water pollution in agricultural practices of Sundarban Delta, India, and mitigation strategies. SN Appl Sci 3:560. https://doi.org/10.1007/s42452-021-04544-1
Nkpaa KW, Amadi BA, Wegwu MO (2018) Hazardous metals levels in groundwater from Gokana, Rivers State, Nigeria: non-cancer and cancer health risk assessment. Hum Ecol Risk Assess 24:214–224. https://doi.org/10.1080/10807039.2017.1374166
Norouzi H, Shahmohammadi-Kalalagh S (2019) Locating groundwater artificial recharge sites using random forest: a case study of Shabestar region, Iran. Environ Earth Sci 78:380. https://doi.org/10.1007/s12665-019-8381-2
Nsabimana A, Li P, He S, He X, Alam SMK, Fida M (2021) Health risk of the shallow groundwater and its suitability for drinking purpose in Tongchuan, China. Water 13(22):3256. https://doi.org/10.3390/w13223256
Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26:217–222. https://doi.org/10.1080/01431160412331269698
Pal SC, Ruidas D, Saha A et al (2022) Application of novel data-mining technique-based nitrate concentration susceptibility prediction approach for coastal aquifers in India. J Clean Prod 346:131205
Parameswari K, Padmini TK (2018) Assessment of groundwater potential in Tirukalukundram block of Southern Chennai metropolitan area. Environ Dev Sustain 20:1535–1552. https://doi.org/10.1007/s10668-017-9952-6
Pham BT, Prakash I (2019) A novel hybrid model of bagging-based Naïve Bayes Trees for landslide susceptibility assessment. Bull Eng Geol Environ 78:1911–1925. https://doi.org/10.1007/s10064-017-1202-5
Pinto U, Thoradeniya B, Maheshwari B (2020) Water quality and chronic kidney disease of unknown aetiology (CKDu) in the dry zone region of Sri Lanka: impacts on well-being of village communities and the way forward. Environ Sci Pollut Res 27:3892–3907. https://doi.org/10.1007/s11356-019-06669-8
Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199. https://doi.org/10.1007/s10021-005-0054-1
Qian H, Li P, Howard KWF et al (2012) Assessment of groundwater vulnerability in the Yinchuan Plain, Northwest China using OREADIC. Environ Monit Assess 184:3613–3628. https://doi.org/10.1007/s10661-011-2211-7
Raheja H, Goel A, Pal M (2021) Prediction of groundwater quality indices using machine learning algorithms. Water Pract Technol 17:336–351. https://doi.org/10.2166/wpt.2021.120
Rahman MM, Sengupta MK, Ahamed S et al (2005) The magnitude of arsenic contamination in groundwater and its health effects to the inhabitants of the Jalangi—one of the 85 arsenic affected blocks in West Bengal, India. Sci Total Environ 338:189–200. https://doi.org/10.1016/j.scitotenv.2004.06.022
Rahman MdM, Islam MdA, Bodrud-Doza Md et al (2018) Spatio-temporal assessment of groundwater quality and human health risk: a case study in Gopalganj, Bangladesh. Expo Health 10:167–188. https://doi.org/10.1007/s12403-017-0253-y
Raza M, Hussain F, Lee J-Y et al (2017) Groundwater status in Pakistan: a review of contamination, health risks, and potential needs. Crit Rev Environ Sci Technol 47:1713–1762. https://doi.org/10.1080/10643389.2017.1400852
Reghunath R, Murthy TRS, Raghavan BR (2002) The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Res 36:2437–2442. https://doi.org/10.1016/S0043-1354(01)00490-0
Roy P, Chandra Pal S, Chakrabortty R et al (2020) Threats of climate and land use change on future flood susceptibility. J Clean Prod 272:122757
Ruidas D, Pal SC, Islam ARMdT, Saha A (2021) Characterization of groundwater potential zones in water-scarce hardrock regions using data driven model. Environ Earth Sci 80:809. https://doi.org/10.1007/s12665-021-10116-8
Sadat-Noori SM, Ebrahimi K, Liaghat AM (2014) Groundwater quality assessment using the water quality index and GIS in Saveh-Nobaran aquifer, Iran. Environ Earth Sci 71:3827–3843. https://doi.org/10.1007/s12665-013-2770-8
Saha A, Pal SC, Arabameri A et al (2021a) Optimization modelling to establish false measures implemented with ex-situ plant species to control gully erosion in a monsoon-dominated region with novel in-situ measurements. J Environ Manage 287:112284. https://doi.org/10.1016/j.jenvman.2021.112284
Saha A, Pal SC, Arabameri A et al (2021b) Flood susceptibility assessment using novel ensemble of hyperpipes and support vector regression algorithms. Water 13:241
Saha A, Pal SC, Santosh M et al (2021c) Modelling multi-hazard threats to cultural heritage sites and environmental sustainability: the present and future scenarios. J Clean Prod 320:128713
Sajedi-Hosseini F, Malekian A, Choubin B et al (2018) A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci Total Environ 644:954–962. https://doi.org/10.1016/j.scitotenv.2018.07.054
Salman SR, Abu Ruka’h YH (1999) Multivariate and principal component statistical analysis of contamination in urban and agricultural soils from North Jordan. Environ Geol 38:265–270. https://doi.org/10.1007/s002540050424
Samal AC, Bhattacharya P, Mallick A et al (2015) A study to investigate fluoride contamination and fluoride exposure dose assessment in lateritic zones of West Bengal, India. Environ Sci Pollut Res 22:6220–6229. https://doi.org/10.1007/s11356-014-3817-4
Sarkar M, Chandra Pal S (2021) Human health hazard assessment for high groundwater arsenic and fluoride intact in Malda district, Eastern India. Groundw Sustain Dev 13:100565. https://doi.org/10.1016/j.gsd.2021.100565
Senthilkumar M, Arumugam R, Gnanasundar D et al (2015) Effects of geological structures on groundwater flow and quality in hardrock regions of northern Tirunelveli district, Southern India. J Earth Syst Sci 124:405–418. https://doi.org/10.1007/s12040-015-0538-0
Shen Y, Oki T, Utsumi N et al (2008) Projection of future world water resources under SRES scenarios: water withdrawal/Projection des ressources en eau mondiales futures selon les scénarios du RSSE: prélèvement d’eau. Hydrol Sci J 53:11–33. https://doi.org/10.1623/hysj.53.1.11
Sihag P, Mohsenzadeh Karimi S, Angelaki A (2019) Random forest, M5P and regression analysis to estimate the field unsaturated hydraulic conductivity. Appl Water Sci 9:129. https://doi.org/10.1007/s13201-019-1007-8
Singh G, Rishi MS, Herojeet R et al (2020) Evaluation of groundwater quality and human health risks from fluoride and nitrate in semi-arid region of northern India. Environ Geochem Health 42:1833–1862. https://doi.org/10.1007/s10653-019-00449-6
Srinivasan JT, Reddy VR (2009) Impact of irrigation water quality on human health: a case study in India. Ecol Econ 68:2800–2807. https://doi.org/10.1016/j.ecolecon.2009.04.019
Su Z, Wu J, He X, Elumalai V (2020) Temporal changes of groundwater quality within the groundwater depression cone and prediction of confined groundwater salinity using Grey Markov model in Yinchuan area of northwest China. Expo Health 12(3):447–468. https://doi.org/10.1007/s12403-020-00355-8
Talukdar S, Ghose B, Shahfahad et al (2020) Flood susceptibility modeling in Teesta River basin, Bangladesh using novel ensembles of bagging algorithms. Stoch Environ Res Risk Assess 34:2277–2300. https://doi.org/10.1007/s00477-020-01862-5
Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res Atmos 106:7183–7192. https://doi.org/10.1029/2000JD900719
Tian R, Wu J (2019) Groundwater quality appraisal by improved set pair analysis with game theory weightage and health risk estimation of contaminants for Xuecha drinking water source in a loess area in Northwest China. Hum Ecol Risk Assess 25:132–157. https://doi.org/10.1080/10807039.2019.1573035
Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naïve Bayes models. Math Probl Eng 2012:e974638. https://doi.org/10.1155/2012/974638
Tien Bui D, Pradhan B, Nampak H et al (2016) Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J Hydrol 540:317–330. https://doi.org/10.1016/j.jhydrol.2016.06.027
Tiwari AK, Singh AK (2014) Hydrogeochemical investigation and groundwater quality assessment of Pratapgarh district, Uttar Pradesh. J Geol Soc India 83:329–343. https://doi.org/10.1007/s12594-014-0045-y
Truong XL, Mitamura M, Kono Y et al (2018) Enhancing prediction performance of landslide susceptibility model using hybrid machine learning approach of bagging ensemble and logistic model tree. Appl Sci 8:1046. https://doi.org/10.3390/app8071046
US EPA (2015) Risk Assessment Guidance for Superfund (RAGS) 2004: Part E. https://www.epa.gov/risk/risk-assessment-guidance-superfund-rags-part-e. Accessed 3 Jan 2022
Varol S, Davraz A (2015) Evaluation of potential human health risk and investigation of drinking water quality in Isparta city center (Turkey). J Water Health 14:471–488. https://doi.org/10.2166/wh.2015.187
Vasanthavigar M, Srinivasamoorthy K, Vijayaragavan K et al (2010) Application of water quality index for groundwater quality assessment: Thirumanimuttar sub-basin, Tamilnadu, India. Environ Monit Assess 171:595–609. https://doi.org/10.1007/s10661-009-1302-1
Wang Y, Li P (2022) Appraisal of shallow groundwater quality with human health risk assessment in different seasons in rural areas of the Guanzhong Plain (China). Environ Res 207:112210. https://doi.org/10.1016/j.envres.2021.112210
Wei M, Wu J, Li W, Zhang Q, Su F, Wang Y (2021) Groundwater geochemistry and its impacts on groundwater arsenic enrichment, variation, and health risks in Yongning County, Yinchuan Plain of Northwest China. Expo Health. https://doi.org/10.1007/s12403-021-00391-y
Wu J, Zhou H, He S, Zhang Y (2019) Comprehensive understanding of groundwater quality for domestic and agricultural purposes in terms of health risks in a coal mine area of the Ordos basin, North of the Chinese loess plateau. Environ Earth Sci 78:446. https://doi.org/10.1007/s12665-019-8471-1
Yariyan P, Janizadeh S, Van Phong T et al (2020) Improvement of best first decision trees using bagging and dagging ensembles for flood probability mapping. Water Resour Manage 34:3037–3053. https://doi.org/10.1007/s11269-020-02603-7
Zakir HM, Sharmin S, Akter A, Rahman MdS (2020) Assessment of health risk of heavy metals and water quality indices for irrigation and drinking suitability of waters: a case study of Jamalpur Sadar area, Bangladesh. Environ Adv 2:100005. https://doi.org/10.1016/j.envadv.2020.100005
Zhang Y, Wu J, Xu B (2018) Human health risk assessment of groundwater nitrogen pollution in Jinghui canal irrigation area of the loess region, northwest China. Environ Earth Sci 77:273. https://doi.org/10.1007/s12665-018-7456-9
Zhang Q, Xu P, Qian H (2020a) Groundwater quality assessment using improved Water Quality Index (WQI) and Human Health Risk (HHR) Evaluation in a semi-arid region of Northwest China. Expo Health 12:487–500. https://doi.org/10.1007/s12403-020-00345-w
Zhang Q, Xu P, Qian H, Yang F (2020b) Hydrogeochemistry and fluoride contamination in Jiaokou irrigation district, Central China: assessment based on multivariate statistical approach and human health risk. Sci Total Environ 741:140460. https://doi.org/10.1016/j.scitotenv.2020.140460
Funding
This research was not supported by any funding.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest to disclose.
Cite this article
Ruidas, D., Pal, S.C., Towfiqul Islam, A.R.M. et al. Hydrogeochemical Evaluation of Groundwater Aquifers and Associated Health Hazard Risk Mapping Using Ensemble Data Driven Model in a Water Scares Plateau Region of Eastern India. Expo Health 15, 113–131 (2023). https://doi.org/10.1007/s12403-022-00480-6
DOI: https://doi.org/10.1007/s12403-022-00480-6