Introduction

Subsurface water, commonly known as groundwater, is one of the most significant and purest natural resources and is essential for human survival (Li et al. 2012); worldwide, more than 1.5 billion people depend directly or indirectly on groundwater for drinking purposes (Shen et al. 2008). According to Adimalla and Li (2019) and Li et al. (2019), arid and semi-arid regions of the world rely substantially on groundwater to meet daily drinking, domestic, and irrigation needs because surface water is scarce. Safe drinking water is an internationally accepted human right for better health. Growing populations bring successive changes to the natural environment through human activities such as agricultural practices and industrial demand, resulting in severe water emergencies and a gradual deterioration of water quality that gives rise to several environmental adversities as well as health problems (Chabukdhara et al. 2017; Qian et al. 2012; Zhang et al. 2018; Li et al. 2017). Groundwater quality usually depends on the geological structure of the aquifer in a particular region and on the assemblage of chemical elements (Parameswari and Padmini 2018), but in recent times this resource has faced many threats, both from natural causes such as the weathering of rock-forming materials and from secondary causes such as uncontrolled water usage, deficient recharge, wastewater discharge, agricultural and industrial activities, excessive fertilizer use, substantial evaporation, and low rainfall (Tiwari and Singh 2014; Adimalla and Venkatayogi 2016; Kumar et al. 2017; Ahmed et al. 2019). According to the World Health Organization (WHO), groundwater extraction has increased significantly in the last few decades, affecting the quality of groundwater and increasing the health risk.
Because of their toxicity and carcinogenic nature, pollutants pose several health risks for a significant number of people; polluted water accounts for 70% of diseases, and worldwide 20% of cancers are related to water contamination. Groundwater quality and the associated health hazards have been studied in several countries, such as India (Garg et al. 2009; Srinivasan and Reddy 2009; Arora et al. 2014; Adimalla and Qian 2019; Singh et al. 2020; Sarkar and Chandra Pal 2021), South Africa (Edokpayi et al. 2018; Genthe et al. 2018; Mthembu et al. 2020; Madilonga et al. 2021), Bangladesh (Islam et al. 2018; Rahman et al. 2018; Bodrud-Doza et al. 2020; Zakir et al. 2020), Sri Lanka (Dissanayake 1991; Pinto et al. 2020), Pakistan (Bhowmik et al. 2015; Raza et al. 2017; Mazhar et al. 2019; Murtaza et al. 2020), and China (Liu et al. 2021; Nsabimana et al. 2021; Wang and Li 2022; Zhang et al. 2020b). Although several trace elements that are crucial for human health are present in groundwater, excessive consumption of them can have adverse effects on the human body, e.g., reproductive effects, neurological disorders, breathing problems, lung disease, cancer, and hypertension (Islam et al. 2017; Nkpaa et al. 2018). Infants are more severely affected than adults by excessive intake of nitrate-contaminated water, which causes the disease known as ‘blue baby syndrome’ (Tian and Wu 2019); a significant number of people suffer from dental and skeletal fluorosis due to long-term fluoride consumption (standard limits: Bureau of Indian Standards, 1 mg/L; WHO, 1.5 mg/L) (Mondal et al. 2016; Sarkar and Chandra Pal 2021); and long-term drinking of arsenic-rich water causes various health problems, including skin disease, neurological disorders, and cancer (Rahman et al. 2005; Mazumder et al. 2010).
According to the World Health Organization, excessive Fe consumption can lead to joint pain, fatigue, and the disease ‘hemochromatosis’. Excessive long-term Mn consumption through drinking water can be responsible for reproductive damage and neurological disorders in the human body (US EPA 2015).

Water quality measurement is therefore a prime concern for human health; in parallel, human, social, and economic development depends prominently on safe drinking water (Vasanthavigar et al. 2010). About 85% of the rural population depends solely on subsurface water (Garg et al. 2009); for this reason, safe water supply and sanitation were among the important agenda items of the first 5-year plan (1951–1956) and the Millennium Development Goals (2000–2015). Many countries, including India, face severe water contamination with high amounts of arsenic, fluoride, salinity, heavy metals, and trace elements (Wei et al. 2021; Su et al. 2020). Ahmad et al. (2019) demonstrated how rapid urbanization increases the contamination level in shallow groundwater, which is more exposed to the earth's surface than intermediate and deep groundwater. According to Adimalla et al. (2018), about 21 states of India and around 6.6 crore (66 million) people have been affected by fluorosis and many other diseases. Among them, some districts of the state of West Bengal are highly vulnerable to arsenic-, fluoride-, and salinity-related problems (Samal et al. 2015; Nath et al. 2021). Supplying drinking water without proper testing and mitigation therefore raises the health risk of the rural population (Mondal and Pal 2015). To date, many studies have assessed the water quality of particular regions, but very few have produced a health hazard risk map (HHRM) based on the hydrochemical properties of water, although proper water testing and HHRM can play a significant role in determining the affected areas. HHRM is a significant assessment tool that can relate several hydrochemical properties to human health and identify the potential health risk zones of an area.

To date, researchers have followed different methodologies and statistical techniques, e.g., multivariate statistical techniques (Reghunath et al. 2002; Gulgundi and Shetty 2018), the water quality index (Sadat-Noori et al. 2014; Zhang et al. 2020a), the fuzzy method (Wu et al. 2019), and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) (Li et al. 2013), to assess water quality in relation to health risk for the sustainable development of a particular region. According to Sajedi-Hosseini et al. (2018), these index-based and statistical methods play a significant role in recognizing the status of groundwater around the world. Very few data-driven models, such as neural networks and projection pursuit statistical techniques, have been used in groundwater studies (Salman and Abu Ruka’h 1999). Interpolation methods are also used in water quality assessment, but the use of machine learning (ML) algorithms is comparatively new in groundwater research and is emerging as a capable and promising multi-functional approach across scientific fields. Over the last three decades, different ML methods have been applied extensively to assess groundwater quality because of their enhanced performance in comparison with traditional statistical analysis (Lu and Ma 2020; Aldhyani et al. 2020). New ML tools in hydrological studies include support vector regression and regression trees for wastewater management (Granata et al. 2017) and gradient boosting, extreme gradient boosting (XGBoost), and deep neural networks for groundwater quality assessment (Raheja et al. 2021). Kouadri et al. (2021) proposed three models, namely long short-term memory, multi-linear regression, and artificial neural network, for predicting irrigation groundwater quality; bagging, boosting, and random forest (RF) algorithms were also employed to predict nitrate concentration in the coastal aquifer of Bangladesh (Islam et al. 2021). All these ML algorithms are widely applicable because of their accuracy and precision.
Bagging and RF have emerged as significant tools in groundwater quality studies, yet very few studies have applied them to human health risk assessment. Among ML algorithms, bagging and RF stand out in groundwater studies for their accuracy and precision and for how rarely they have been used for this purpose (He et al. 2022a, b; Pal et al. 2022). Bagging is a popular ensemble algorithm (Prasad et al. 2006; Talukdar et al. 2020) developed by Breiman (1996); it has been used to predict several natural hazards (Hong et al. 2018; Truong et al. 2018; Pham and Prakash 2019; Yariyan et al. 2020) with high precision, but very few studies have applied it to groundwater or to HHRM. RF, another important ML tool, was introduced by Breiman (2001). Several researchers have successfully applied RF algorithms, for example Sihag et al. (2019) in hydraulic conductivity estimation, Pal (2005) in land cover classification, and Norouzi and Shahmohammadi-Kalalagh (2019) in groundwater studies.

Purulia district of West Bengal is an extension of the Chhotanagpur plateau region and is characterized by a lack of groundwater and poor water quality; frequent contamination by fluoride and other trace elements has been reported (GEC 2017). The available research on the Purulia district as a whole has focused on groundwater potentiality rather than on drinking water quality in relation to human health risk. Several previous studies (Kundu and Nag 2018; Farooq et al. 2018; Bera and Das 2021; Chowdhury et al. 2021) assessed groundwater quality, but they considered only one or two pollutants and a particular block area. There is therefore a lack of research on water quality in relation to health hazards for the entire Purulia district. Thus, the prime objectives of this research are to identify the significant water pollutants and relate them to water-related health hazards by developing an HHRM of the entire Purulia district using bagging, RF, and an ensemble of bagging and RF machine learning algorithms. With this in view, applying single ML models as well as an ensemble model with fourteen health hazard causative factors to map health hazard susceptibility is crucial for the entire study area, and this approach is the novelty of this study. Furthermore, to measure the accuracy and reliability of each method, eight validation techniques are employed, i.e., the receiver operating characteristic (ROC) curve (Chowdhuri et al. 2020; Roy et al. 2020), sensitivity, specificity, accuracy, precision, F-score, kappa coefficient, and Taylor diagram (Taylor 2001). The validation results are significant for local and regional authorities in taking proper precautions to reduce health risks through sustainable water management.
The outcome maps can also help to identify severe health risk zones, where associated measures should be taken to control and minimize the risk in the study area. The proposed approach, i.e., ensemble learning of bagging and RF, can also be useful in health hazard risk studies as well as other vulnerability studies in several parts of the world.

Materials and Methods

Study Area

Purulia district is an important plateau region of West Bengal, located in the southwestern part of the state between 22°43′–23°42′ N latitude and 85°49′–86°54′ E longitude (Fig. 1). The district lies in a transition zone between the Chhotanagpur plateau and the Damodar alluvial plains that is structurally a part of the Ranchi peneplain; low valleys, rolling topography, and scattered hills are therefore the key physiographic characteristics of the region. Several districts of Jharkhand (Ranchi, Singbhum, and Hazaribagh-Dhanbad) and West Bengal (Paschim Barddhaman, Bankura, and Paschim Medinipur) share a border with the study area. Rainfall occurs mainly during the SW monsoon in June–September; the normal rainfall is recorded as 1028 mm and 248 mm in the monsoon and non-monsoon seasons, respectively. Most of the district nevertheless faces significant water scarcity because of its hard terrain and sloping surface: around 50% of the water flows away as surface runoff owing to very low porosity, permeability, and infiltration rate (Ruidas et al. 2021). Various streams flow towards the east and southeast, specifically the Kangsabati, Damodar, Darakeswar, Kumari, and Subarnarekha; among them, the Kangsabati is the main river of the study area. The Subarnarekha is the only river that flows towards the south of the district. As these rivers originate in the plateau region, significant water shortages result during the cold and hot seasons. The total area of the district is 6259 km², which places it 5th among the districts of West Bengal. The subtropical climate and low rainfall make the district drought-prone; it has been reported that a moderate drought recurs every 3 years and a severe drought every 10 years.
Maximum evaporation occurs from February to May because of extreme temperatures, which can reach 46 °C; the lowest recorded temperature in the region is 5 °C in winter. The district has 2,930,115 inhabitants and a low population density of 468/km²; 87.26% and 12.74% of the population are rural and urban dwellers, respectively, and only 64.48% of the population is literate. This low literacy rate and the lack of water treatment plants make Purulia district highly vulnerable to health hazards (Farooq et al. 2018). The presence of large amounts of fluoride, potash, arsenic, and calcium carbonate makes the water unsuitable for drinking as well as for agricultural activities. Studies have revealed that groundwater arsenic contamination in Purulia is below 10 μg/L, which falls in the unaffected category, although even this small amount of contamination has a wide effect on groundwater-related health hazards (Chakraborti et al. 2009). The large rural and urban population faces serious health issues due to poor water quality.

Fig. 1 Location of the study area

Geo-hydrological Condition

The occurrence, movement, and quality of groundwater are prominently controlled by the geology, physiography, structural condition, and aquifer location of a particular region (Senthilkumar et al. 2015). Geologically, the study region comprises seven underlying units, namely the Chhotanagpur gneissic complex, unclassified metamorphic, Singbhum GP, Dalma volcano, Manbhum granite, the Kuilapal granite series, and a few areas characterized by sediments; among them, the Chhotanagpur gneissic complex is the most dominant. Purulia has a number of rivers but very poor groundwater resources because of its crystalline strata, with a water level of 3–12 m. Undulating rugged topography, scattered hills, and a few monadnocks are the physiographic characteristics of Purulia district; they cause significant runoff of rainwater instead of sufficient infiltration. The whole district is mainly characterized by a consolidated and semi-consolidated crystalline basement, where groundwater occurs within 10 m below ground level (mbgl) with a maximum discharge of 20 m³/h, whereas some parts of Purulia are covered with Gondwana sandstone, with a water level of 100 mbgl and a maximum discharge of 22 m³/h. Four main types of aquifer systems are observed here: the weathered zone, the saprolitic zone, hard rock fractured zones, and unconsolidated sediments. The weathered zone varies in thickness up to a maximum of 25 m and yields up to 2.75 lps, but it dries up during peak summer, whereas the saprolitic zone has an average thickness of 4 m and a yield of 2.5 lps. The hard rock fractured zones and unconsolidated sediments are restricted to 50–110 mbgl and 5–13 mbgl, respectively. Although rainwater is the primary source of recharge in this hard rock terrain, river channels also contribute to groundwater recharge through some fractured zones.

Sampling Techniques

In the current study, a total of 67 water samples were collected from tube wells, dug wells, and hand pumps during the dry season (March to early June) of 2021. A portable GPS (Garmin GPS etrex10) was used to record the location of every water sample, which was documented by sample ID number. Two separate dry and clean 500 mL polyethylene containers were used to preserve each collected sample, adopting different sampling procedures and precautions for different sampling sources. Of the two containers, one was used for testing and the other for validation and cross-checking. Furthermore, to obtain fresh water, all tube wells were pumped for a few minutes to avoid stagnant water. Afterward, all collected samples were transported with the necessary precautions to the laboratory of The University of Burdwan for examination of their chemical properties. The Dionex ICS-90 Ion Chromatography system (Islam et al. 2021) was used to estimate the cations (Na⁺, Ca²⁺, K⁺, Mg²⁺) and anions (NO₃⁻, HCO₃⁻, PO₄³⁻, F⁻, Cl⁻, SO₄²⁻); a water checker U-10 (Khan et al. 2013) was also used on site to examine basic properties including pH and EC.

Moreover, the collected samples were tested in different laboratories to ensure accuracy. Table 1 gives the descriptive statistics of all parameters. The research was carried out in the sequential order shown in Fig. 2 to produce the HHRM.

Table 1 Descriptive statistics of physico-chemical parameters of groundwater samples collected from Purulia district
Fig. 2 Methodological flow chart of this study

Inventory Data

The preparation of inventory data is a prime concern in producing an HHRM, because it allows probable susceptibility zones to be predicted from a mathematical relationship between past records of health hazards and their influencing factors. In this study, past health hazard data were collected by field visits in the study region. We collected 67 points, of which 70% were in the health hazard susceptible zone and the remaining 30% were used for testing the result against the Indian standard limits for water quality (BIS 2012). Following Chung and Fabbri (2008), one set of data is used for developing the models and another set for validating the result. A binary classification is considered for HHRM; the collected inventory data are thus classified into two classes, health hazard-prone points and non-health hazard points, coded as 1 and 0, respectively. We considered 14 health hazard causative factors (Fig. 3) to delineate the HHRM and screened them with two statistical approaches, the multicollinearity test (Saha et al. 2021a; Liao and Valliant 2012) and Pearson’s correlation coefficient (Tien Bui et al. 2016; Ahmad et al. 2022). Three modeling approaches (bagging, RF, and bagging-RF) were then employed, with suitable evaluation techniques to test the result. Data for all 14 causative factors were extracted in the ArcGIS 10.4 platform and used to run the adopted learning models.

Fig. 3 Health hazard causative factors

Multicollinearity Assessment

The causative factors need to be independent of one another for the assessment of HHRM; interdependence among a large number of adopted factors creates collinearity, which causes several regression problems. Because large and complex water quality data are used in the modeling, eliminating multicollinearity is necessary for an optimal evaluation of HHRM. In this study, three measures were adopted to assess the relationship among the selected factors: the variance inflation factor (VIF < 5), tolerance (TOL > 0.1) (Saha et al. 2021b, c), and Pearson’s correlation coefficient (\(\left|r\right|<0.7\)) (Ahmad et al. 2022). A VIF > 5, a TOL < 0.1, or \(\left|r\right|>0.7\) implies a significant collinearity issue. The following equations help in identifying collinearity issues:

$${\text{TOL}}=1-{R}_{J}^{2},$$
(1)
$${\text{VIF}}=\frac{1}{TOL},$$
(2)

where \({R}_{J}^{2}\) is the coefficient of determination obtained by regressing the J-th factor on the remaining factors in the dataset, and VIF and TOL denote the variance inflation factor and tolerance, respectively.
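As an illustration of Eqs. 1 and 2, the following Python sketch computes VIF and TOL for a set of factor columns. It relies on the standard identity that the VIF of the J-th factor equals the J-th diagonal element of the inverse of the factors' correlation matrix. The function names are assumptions made for this example, and the code is not the authors' implementation (the study itself was carried out in R).

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_and_tol(columns):
    """Return (VIF, TOL) lists with one entry per factor column."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inverse = invert(corr)
    vif = [inverse[j][j] for j in range(k)]   # VIF_J = 1 / (1 - R_J^2)
    tol = [1.0 / v for v in vif]              # TOL_J = 1 - R_J^2 = 1 / VIF_J
    return vif, tol
```

A factor would then be flagged when its VIF exceeds 5 or its TOL falls below 0.1, matching the thresholds stated above.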

Adopted Modeling Approach

In the current study, three predictive models were employed to develop the HHRM: bagging, RF, and an ensemble of the bagging and RF algorithms. They were implemented in R, with the help of ArcGIS 10.4 software.

Bagging

Bagging (bootstrap aggregating) is an ensemble classifier algorithm developed to enhance the stability and precision of ML algorithms by generating diverse classifiers. It is commonly used for regression and classification. Bootstrapping generates classification trees from resamples of the original dataset, which are then aggregated; the algorithm forms its final result by combining the classifications obtained from randomly created training sets. Bagging has gained popularity because of its variance reduction and simple implementation with high accuracy. An important part of this approach is that a group of ‘weak learners’ forms a ‘strong learner’: each decision tree is a weak learner, and the combined result is the strong learner. Each weak learner has an individual vote, and the final prediction (strong learner) is the class that gains the most votes. In our study, the bagging algorithm is used to construct an ensemble; it is efficient in handling large unbalanced datasets and gives reliable results by reducing overfitting. Consider a learning set Q consisting of n independent records of the health hazard causative factors and the hazard class, \(Q=\left\{\left({X}_{i},{Y}_{i}\right), i=1,2,\dots ,n\right\}\). First, \({C}_{b}\) \(\left(b=1,2,\dots ,m\right)\) denotes the b-th bootstrap sample of the training set Q, obtained by drawing n elements of Q with replacement. Next, the bootstrapped estimator \({g}^{*}\left(\bullet \right)\) is computed by the plug-in principle, i.e., \({g}^{*}\left(\bullet \right)={h}_{n}\left(\left({X}_{1}^{*},{Y}_{1}^{*}\right),\dots ,\left({X}_{n}^{*},{Y}_{n}^{*}\right)\right)\left(\bullet \right)\).
Finally, the above steps are repeated m times (50 or 100) to yield \({g}^{*k}\left(\bullet \right)\) \(\left(k=1,2,3,\dots ,m\right)\), and the associated bagging estimator is \({g}_{Bag}\left(\bullet \right)=\frac{1}{m}\sum_{k=1}^{m}{g}^{*k}\left(\bullet \right)\). The following equation (Eq. 3) represents the bagging approach.

$${g}_{Bag}\left(\bullet \right)={E}^{*}\left[{g}^{*}\left(\bullet \right)\right],$$
(3)

where \({E}^{*}\) denotes the expectation over the bootstrap resampling; the finite number m governs the precision of the Monte Carlo approximation of this expectation, and Eq. 3 is the theoretical quantity obtained as \(m\to \infty\).
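The bagging steps described above (bootstrap resampling, fitting one weak learner per resample, and averaging the m predictions) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: one-split decision stumps stand in for the classification trees, binary labels are coded 0/1, and all function names are hypothetical. It is not the R implementation used in the study.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records from data with replacement (one bootstrap sample)."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Fit a one-split decision stump; returns (feature, threshold, left_label, right_label)."""
    best = None
    n_features = len(sample[0][0])
    for f in range(n_features):
        for x, _ in sample:
            t = x[f]
            left = [y for xi, y in sample if xi[f] <= t]
            right = [y for xi, y in sample if xi[f] > t]
            # Majority label on each side (ties go to class 1).
            left_label = 1 if 2 * sum(left) >= len(left) else 0
            if right:
                right_label = 1 if 2 * sum(right) >= len(right) else 0
            else:
                right_label = left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def stump_predict(stump, x):
    """Predict the class of x with a fitted stump."""
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def bagging_fit(data, m=50, seed=0):
    """Fit m weak learners, each on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(m)]

def bagging_score(models, x):
    """Bagged estimate: the mean of the m bootstrap predictions."""
    return sum(stump_predict(s, x) for s in models) / len(models)

def bagging_predict(models, x):
    """Majority vote of the weak learners."""
    return 1 if bagging_score(models, x) >= 0.5 else 0
```

Here `data` is a list of `(features, label)` pairs; `bagging_score` corresponds to the averaged estimator, and `bagging_predict` to the majority vote of the weak learners.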

Random Forest (RF)

The RF algorithm is an advanced version of the bagging algorithm. It is made up of several decision trees, which support reliable predictions by reducing overfitting to the training dataset, whereas the traditional ML approach usually gives low classification accuracy together with overfitting problems. During training, it creates multiple regression trees, each from a resampled dataset. The procedure is the same as in bagging, i.e., both build their trees on bootstrap samples and develop multiple decision trees; the difference is that in RF each decision tree is grown using a randomized subset of predictors. The series of tree classifiers is then combined, and every individual tree has a unit vote for the most preferred result. Classification thus takes place by pooling the votes of multiple classifiers, which produces a final result with high classification accuracy and good tolerance of noise and outliers. The lack of any need for pruning, the large number of trees, and automatic feature selection are significant advantages of the RF algorithm. In the RF algorithm, the generalization error (GE) is expressed as follows:

$$GE={P}_{x,y}\left(mg\left(x,y\right)<0\right),$$
(4)

where \({P}_{x,y}\) indicates that the probability is taken over the \(\left(x,y\right)\) space, \(x\) is the vector of health hazard conditioning factors, \(y\) is the class label, and the margin function \(mg\) is expressed as:

$$mg\left(x , y\right)= {av}_{k}I\left({h}_{k}\left(x\right)= y\right)-\underset{j\ne y}{\mathrm{max}}{av}_{k}I\left({h}_{k}\left(x\right)= j\right),$$
(5)

where \(I\left(\bullet \right)\) is the indicator function, \({av}_{k}\) denotes the average over the trees, \({h}_{k}\) is the k-th tree classifier, and j is any class other than the true class y.
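To make Eqs. 4 and 5 concrete, the short Python sketch below computes the margin from a list of tree votes and estimates the generalization error as the share of samples with a negative margin. The function names and the vote lists are assumptions for illustration only.

```python
from collections import Counter

def margin(votes, true_label):
    """Margin mg(x, y): share of trees voting for y minus the largest share for any other class."""
    counts = Counter(votes)
    n = len(votes)
    share_correct = counts.get(true_label, 0) / n
    share_wrong = max((c / n for label, c in counts.items() if label != true_label),
                      default=0.0)
    return share_correct - share_wrong

def generalization_error(all_votes, labels):
    """Empirical estimate of GE: fraction of samples whose margin is negative."""
    margins = [margin(votes, y) for votes, y in zip(all_votes, labels)]
    return sum(m < 0 for m in margins) / len(margins)
```

A positive margin means that the forest votes for the correct class with some room to spare; the larger the margin, the more confident the classification.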

Besides these two ML algorithms, we adopt an ensemble of the bagging and RF modeling approaches to obtain higher precision and accuracy than either individual model. Until now, this ensemble technique has not been used in any research work on water quality and HHRM; it may emerge as a very helpful ML algorithm in environmental management studies by offsetting the individual drawbacks of both bagging and RF.

Relative Importance

Parameter selection is a significant part of any predictive analysis, and the importance of each causative factor is influenced by the adopted methods and evaluation techniques. Various health hazard causative factors have been used in this study, but not every factor is equally important for health hazard susceptibility; measuring relative importance is therefore necessary. In this study, the RF algorithm, proposed by Breiman (2001) for supervised learning, was used to measure the individual importance of the adopted factors instead of traditional statistical techniques because of the large dataset. The mean decrease accuracy (MDA) technique was used with the RF algorithm to quantify the relative importance of each parameter. Bui et al. (2012) state that the relative importance of an individual causative factor is determined by leaving that factor out and then calculating the accuracy of the model.
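The idea behind MDA can be sketched as follows. In common RF implementations, a factor is "left out" by randomly permuting its values and measuring how much the accuracy drops; RF implementations typically do this on out-of-bag samples, whereas this simplified Python sketch permutes a plain validation set. The `predict` callable is a hypothetical stand-in for a fitted model, and all names are assumptions rather than the study's code.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows for which the model prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def mean_decrease_accuracy(predict, X, y, feature, repeats=20, seed=0):
    """Average drop in accuracy when the values of one feature are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this factor and the labels
        permuted = [row[:feature] + [value] + row[feature + 1:]
                    for row, value in zip(X, column)]
        drops.append(baseline - accuracy(predict, permuted, y))
    return sum(drops) / repeats
```

A factor that the model does not rely on yields a decrease of zero, while an influential factor yields a larger decrease; ranking the factors by this value gives their relative importance.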

Validation Techniques

Model evaluation is important for examining the accuracy and performance of models developed from previous data, so that the predicted result can be related to the real world. In this study, we therefore employed eight validation techniques, one of them graphical: the receiver operating characteristic (ROC) curve, sensitivity, specificity, accuracy, precision, F-score, kappa coefficient, and Taylor diagram (Taylor 2001). The AUC (area under the curve) value ranges from 0.5 to 1; a value closer to 1 implies a superior model, whereas a value close to 0.5 represents poor performance (Chen et al. 2019). Four statistical indices were applied to examine the produced result: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). All these measures are used to quantify the accuracy (Bellu et al. 2016), whereas precision is measured as the ratio of true positives to all predicted positives, with values ranging from 0 to 1. Higher and lower values imply the model’s reliability and non-reliability, respectively. The F-score combines precision and recall and is one of the important validation measures; its highest value is 1. Another significant validation measure is the kappa coefficient, which can recognize the accuracy, efficiency, and consistency of the adopted models (Hoehler 2000). The derived kappa values are divided into five groups on the basis of performance (0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1) (Dou et al. 2020). The following equations are used for the aforementioned indices:

$${\text{AUC}}=\frac{(\sum TP+\sum TN)}{(P+N)}$$
(6)
$${\text{Sensitivity}}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(7)
$${\text{Specificity}}=\frac{\mathrm{TN}}{\mathrm{FP}+\mathrm{TN}}$$
(8)
$${\text{Accuracy}}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$
(9)
$${\text{Precision}}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(10)
$${\text{Recall}}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(11)
$$F {\text{score}}=2\times \frac{{\text{Precision}}\times {\text{Recall}}}{{\text{Precision}}+{\text{Recall}}}$$
(12)
$$Kappa=\frac{{P}_{a}-{P}_{exp}}{1-{P}_{exp}}$$
(13)
$${P}_{a}=\frac{\mathrm{TP}+\mathrm{TN}}{\left(\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}\right)}$$
(14)
$${P}_{exp}= \frac{\left(TP+FN\right)\left(TP+FP\right)+\left(FP+TN\right)\left(FN+TN\right)}{{\left(TP+TN+FP+FN\right)}^{2}},$$
(15)

where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively, and \({P}_{a}\) and \({P}_{exp}\) represent the total (observed) and random (expected) accuracy, respectively. \({O}_{i}\) and \({E}_{i}\) represent the observed and predicted health hazard values, respectively, in the training and validation datasets, whereas N is the total number of samples.
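The confusion-matrix indices of Eqs. 7 to 15 can be computed together, as in the Python sketch below. The function name and the counts in the usage note are assumptions for illustration and are not results of this study. The expected agreement \({P}_{exp}\) is computed with the squared sample total in the denominator, which is the standard form of Cohen's kappa and keeps \({P}_{exp}\) between 0 and 1.

```python
def validation_metrics(tp, tn, fp, fn):
    """Compute the confusion-matrix based validation indices from the four counts."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = sensitivity  # recall and sensitivity share the same formula
    f_score = 2 * precision * recall / (precision + recall)
    p_a = accuracy  # observed agreement
    # Expected (chance) agreement, from the row and column totals.
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (p_a - p_exp) / (1 - p_exp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "kappa": kappa,
    }
```

For example, with the hypothetical counts TP = 8, TN = 9, FP = 1, and FN = 2, the accuracy is 0.85, the expected agreement is 0.5, and kappa is 0.7, which falls in the 0.6–0.8 group listed above.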

Result

Multicollinearity Analysis

The VIF and tolerance analysis shows that the highest VIF and lowest tolerance values are 2.51 and 0.398, respectively (Table 2), so all values are within the acceptable range (VIF < 5 and TOL > 0.1). Hence, there is no problematic collinearity among the 14 health hazard causative factors. Pearson’s correlation coefficient was also employed to examine the collinearity among the conditioning factors. The resulting correlation matrix is shown in Table 3 and Fig. 4; it implies that most of the variables are not strongly correlated, which indicates that the selected independent variables are suitable for this modeling approach. The correlation between \({\mathrm{K}}^{+}\) and \({\mathrm{Ca}}^{2+}\) is 0.71, slightly higher than the 0.7 threshold, and is considered acceptable because the difference is small.

Table 2 Multicollinearity analysis of different health hazard causative factors
Table 3 Pearson’s correlation coefficient between pairs of health hazard causative factors
Fig. 4 Graphical presentation of Pearson’s correlation coefficient

Optimal Tuning Parameters in Ensemble Models

Optimal tuning parameters, or hyper-parameters, characterize an ensemble approach and are also needed for an appropriate description of the classification problem. In this study, the number of boosting iterations and the root mean square error (RMSE) were used to tune the bagging and RF algorithms (Fig. 5). Boosting models perform well, are simple to understand, and require only modest hyper-parameter tuning. The mean, median, and best values in a boosting model based on the bagging and RF models were used to calculate the link between the iteration and its corresponding RMSE.

Fig. 5 Evaluation parameters, max tree depth versus boosting iteration: a bagging, b RF

Health Hazard Risk Map (HHRM)

From the aforementioned data, it is established that groundwater quality is the primary concern in the study region. An HHRM is therefore necessary to evaluate the health hazard zones, because a large number of people rely on groundwater for drinking. Appropriate modeling approaches help to identify and quantify the potential health risk arising from contact with several contaminants in different ways (Varol and Davraz 2015). Various studies (Adimalla and Li 2019; Adimalla and Wu 2019; He and Li 2020) have shown the relation between water quality and the probable health risk when permissible limits are exceeded. Health hazard susceptibility assessment is therefore vital, beyond traditional water quality assessment. In this study, three models, namely bagging, RF, and an ensemble of bagging and RF, were used to delineate the potential health hazard susceptible zones with the help of fourteen causative factors, which had diverse distribution patterns throughout the district in the investigation year (2021), as shown in Fig. 3. The SE portion and some parts of the NW and NE, such as Manbazar II, Bandwan, Neturia, and Santuri, are characterized by low water depth (2.07–3.37 m), whereas most of the region has moderate to very high depth (3.92–7.97 m). A pH of around 7.40–7.74 is found in Purulia II, Kashipur, Puncha, Hura, Arsha, and many other blocks of the district, while low and very high pH levels are scattered. High to very high EC is found in some parts of Bagmundi, Jaypur, Raghunathpur, and Barabazar. Ca²⁺ has a varied distributional pattern, with higher values in some localized parts of the district (Fig. 3). Mg²⁺ shows a more or less similar pattern across the district, with higher values in a very small part.
Purulia has several recorded instances of fluoride (F⁻) contamination, found mainly in the Purulia II, Arsha, and Raghunathpur regions. K⁺ is an important property of drinking water and has an analogous distributional pattern throughout the district; in the same way, Mg²⁺ has a homogeneous distributional pattern, with higher values found mainly in the Bagmundi, Para, Santuri, and Manbazar blocks. Na⁺ has a heterogeneous distribution pattern in the study area; significant parts of the district have values higher than the permissible limit, including the Balarampur, Barabazar, Bandwan, Jaipur, Santuri, and Para blocks. Very small parts of Bagmundi, Santuri, Puncha, and Jhalda have higher values of NO₃⁻, whose distribution is very similar to that of PO₄³⁻, which is mainly found in the central part of the district. In addition, SO₄²⁻ is found in a scattered way in the northern and SE parts and in a small part of Bagmundi block.

All models produce broadly similar results (Fig. 6). Most of the eastern region, including Santuri, Kashipur, Hura, Puncha, Manbazar I and II, Bandwan, and parts of Barabazar, together with some parts of the western region (Jhalda I and II), falls in the safe zone, whereas the central part, especially Bagmundi, Balarampur, Arsha, Purulia I and II, and Raghunathpur, is significantly susceptible to health hazards. The HHRM also identifies the spatial extent of each degree of hazard. In this work, the HHRM is classified into five categories, namely very low, low, moderate, high, and very high, and each model gives a different spatial extent. Taking the entire study region as 100%, a significant part falls in the high to very high health hazard susceptible zones. For the bagging model (Fig. 6a), the very low, low, moderate, high, and very high classes cover 352.13 km2 (5.49%), 1922.82 km2 (30.02%), 2535.39 km2 (39.58%), 1316.66 km2 (20.55%), and 278.04 km2 (4.34%), respectively. For RF (Fig. 6b), the same classes cover 555.35 km2 (8.67%), 2628.01 km2 (41.02%), 1931.07 km2 (30.14%), 1038.99 km2 (16.22%), and 251.99 km2 (3.93%), respectively. For the ensemble of bagging and RF (Fig. 6c), they cover 472.08 km2 (7.37%), 2878.91 km2 (44.94%), 2119.25 km2 (33.08%), 780.2 km2 (12.18%), and 154.79 km2 (2.41%), respectively.
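The class-wise shares reported for the three models can be recomputed from the quoted areas. The following is a minimal sketch rather than the authors' procedure: the dictionary and function names are hypothetical, and each model's total is assumed to be the sum of its five class areas (the district total used in the study is not stated here), so recomputed percentages may differ from the reported ones in the second decimal place.

```python
# Class areas (km2) as quoted in the text, ordered
# very low, low, moderate, high, very high.
CLASS_AREAS_KM2 = {
    "bagging": [352.13, 1922.82, 2535.39, 1316.66, 278.04],
    "rf": [555.35, 2628.01, 1931.07, 1038.99, 251.99],
    "ensemble": [472.08, 2878.91, 2119.25, 780.2, 154.79],
}


def class_shares(areas):
    """Return each class area as a percentage of the summed area."""
    total = sum(areas)  # assumption: total = sum of the five classes
    return [100.0 * area / total for area in areas]


def high_risk_share(areas):
    """Combined share (%) of the 'high' and 'very high' classes."""
    shares = class_shares(areas)
    return shares[3] + shares[4]


if __name__ == "__main__":
    for model, areas in CLASS_AREAS_KM2.items():
        print(model, round(high_risk_share(areas), 2))
```

On these figures, the combined high and very high share is roughly 25% for bagging, 20% for RF, and 15% for the ensemble.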

Fig. 6

HHSM of Purulia district; a bagging, b RF, c ensemble of bagging and RF

Model Validation and Comparison

The performance of the employed models was validated using eight validation techniques (AUC-ROC, sensitivity, specificity, accuracy, precision, F-score, kappa coefficient, and Taylor diagram). All the results are shown in Table 4 and Figs. 7 and 8. They imply that all the models have good prediction ability, with high predictive values, and indicate substantial similarity between the training and predicted health hazards. Across the measures, the ensemble of bagging and RF has the best prediction ability, followed by bagging and RF. The AUC-ROC curves in Fig. 7 also indicate that the ensemble of bagging and RF is the best-performing approach (training 0.934, validation 0.911), followed by bagging (training 0.912, validation 0.902) and RF (training 0.899, validation 0.878).
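The threshold-based measures listed above (sensitivity, specificity, accuracy, precision, F-score, and kappa coefficient) all derive from a two-class confusion matrix. The sketch below shows the standard formulas under that assumption; the function name and the example counts are hypothetical and are not taken from Table 4.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Validation measures from a 2x2 confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives.
    """
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_score": f_score,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in confusion_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```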

Table 4 Model evaluation measures and their performance
Fig. 7

AUC-ROC of a training and b validation of adopted models
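For readers unfamiliar with the AUC-ROC values reported here, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher predicted score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition; the function name and the toy labels and scores are hypothetical, and the study's own software may compute the curve differently.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for hazard-present samples, 0 for hazard-absent samples.
    scores: predicted susceptibility for each sample.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: three of the four pairs are ranked correctly.
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration; a sorted, rank-based form would be preferable for large validation sets.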

Fig. 8

Taylor diagram for model evaluation

The Taylor diagram (Fig. 8), a graphical model evaluation tool, summarizes the validation of the adopted models in a single diagram. Its results agree closely with those of the aforementioned validation techniques. Figure 8 shows that the ensemble of bagging and RF (r = 0.94) performs better in predicting health hazard susceptible zones than bagging (r = 0.89) and RF (r = 0.87).
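A Taylor diagram places each model according to three linked statistics: the correlation coefficient r with the reference data, the standard deviation of the predictions, and the centered root-mean-square difference. The sketch below computes these quantities; the function name and sample values are hypothetical, and population (divide-by-n) standard deviations are assumed.

```python
import math


def taylor_statistics(reference, predicted):
    """Return (r, sd_reference, sd_predicted, centered_rmse)."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    dev_ref = [x - mean_ref for x in reference]
    dev_pred = [x - mean_pred for x in predicted]
    sd_ref = math.sqrt(sum(d * d for d in dev_ref) / n)
    sd_pred = math.sqrt(sum(d * d for d in dev_pred) / n)
    if sd_ref == 0 or sd_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sum(a * b for a, b in zip(dev_ref, dev_pred)) / (n * sd_ref * sd_pred)
    # Centered RMS difference: the means are removed before differencing.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_ref, dev_pred)) / n)
    return r, sd_ref, sd_pred, crmse


if __name__ == "__main__":
    # Hypothetical observed labels and predicted susceptibilities.
    ref = [0.0, 1.0, 0.0, 1.0, 1.0]
    pred = [0.2, 0.9, 0.1, 0.7, 0.8]
    print(taylor_statistics(ref, pred))
```

The three statistics satisfy crmse^2 = sd_ref^2 + sd_pred^2 - 2 * sd_ref * sd_pred * r, which is the law-of-cosines relation that lets a single diagram display all of them.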

Variable Importance

In this study, the RF algorithm was used to assess the relative importance of the causative factors; the results are shown in Table 5 and Fig. 9. All the causative factors contributed to the model predictions. Depth has the greatest influence on health hazards, with a value of 0.912, followed by pH and As with 0.824 and 0.793, respectively, whereas Ca2+, F, and Mg2+, with importance values of 0.128, 0.197, and 0.401, respectively, have the least importance in the models. According to the derived result, the hierarchical order of importance for model performance is depth > pH > As > HCO3 > EC > NO3 > Na+ > K+ > Cl > Mg2+ > SO42− > PO42− > Ca2+ > F. All the adopted causative factors thus influence health hazard susceptibility modeling to a greater or lesser degree, and all fourteen parameters were therefore retained for the HHRM.
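The text does not state which RF importance measure was used. One common model-agnostic option is permutation importance, which scores a factor by the drop in accuracy when its values are shuffled. The sketch below illustrates that idea only: the function names are hypothetical, and the `model` argument stands for any callable classifier, not the study's RF.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and labels
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: only the first feature determines the label.
    rows = [[i / 10, ((i * 7) % 10) / 10] for i in range(10)]
    labels = [int(row[0] > 0.5) for row in rows]
    toy_model = lambda row: int(row[0] > 0.5)  # stand-in for a trained model
    print(permutation_importance(toy_model, rows, labels))
```

In this toy case the second feature is ignored by the model, so shuffling it leaves accuracy unchanged and its importance is zero.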

Table 5 Variable importance of health hazard conditioning factors
Fig. 9

Variable importance of adopted conditioning factors

Discussion

Potable groundwater is essential to human health in the Purulia district; however, this study showed that the study area faces potential non-carcinogenic health hazards due to the adverse physicochemical properties of its groundwater. It was therefore essential to build a proper water quality management tool, namely an HHRM of the entire Purulia district. This map will help local people as well as regional authorities to achieve sustainable use and management of groundwater. Predicting health hazard susceptibility zones is thus an important and particularly difficult task in water quality and health risk assessment. Worldwide, the deterioration of groundwater and the related health risks have been studied by different researchers (Kaur et al. 2020; Chen et al. 2021), who found similar health vulnerabilities: ‘blue baby syndrome’ in children, skeletal fluorosis, black foot disease, hemochromatosis, and neurological disorders caused by an abundance of different chemical components. The HHRM prediction for the Purulia district was therefore carried out with consideration of all possible physicochemical parameters. Undoubtedly, the precision of the prediction depends significantly on the modeling approach employed. In recent times, ML methods such as ANN, SVM, logistic regression, and their ensembles have given more accurate results than earlier traditional methods (Tien Bui et al. 2012). Identifying suitable methods to delineate the HHRM is nevertheless essential. We therefore applied three recently developed and widely accepted ML approaches in this work, namely bagging, RF, and the ensemble of bagging and RF, with appropriate evaluation and comparison. 
Bagging and RF have previously been used as ML tools for prediction in different studies, which obtained reliable results that agreed well with reality (Iverson et al. 2004; Prasad et al. 2006; Hegde et al. 2015; Alsouda et al. 2019; Islam et al. 2021). However, very few groundwater studies related to public health have used them, so the use of bagging, RF, and their ensemble here is a fairly new approach. All three have measurable predictive ability; among them, the ensemble (bagging and RF) model yielded notably better evaluation performance than the individual models and was also more consistent. To validate the adopted models, we applied the AUC-ROC curve; the ensemble performed best, with AUC values of 0.934 (training) and 0.911 (testing), followed by bagging and RF. According to Aguirre-Gutiérrez et al. (2013), AUC-ROC should not be the only evaluation technique when validating large spatial predictions. We therefore used seven other validation techniques to confirm the result quantitatively, and they gave the same ranking of the models.

Above all, identifying the health hazard causative factors is the foremost consideration, as it significantly affects the HHRM. Several methods have been introduced for the selection of causative factors, such as linear regression, RF, VIF and TOL, and Pearson’s correlation coefficients. Although the selection procedure is still debated, some methods are widely accepted for factor selection. In this study we therefore employed two separate methods rather than one, namely VIF-TOL and Pearson’s correlation coefficients, to verify the independence of the health hazard causative parameters. Generally, As, pH, depth, and heavy metals have a great influence on the deterioration of water quality, and in this study the adopted methods gave more or less the same result. RF was also used to quantify variable importance, which was helpful for model development. According to the result, depth, pH, arsenic, and \({\mathrm{HCO}}_{3}^{-}\) have a crucial influence on health risk assessment. The study shows that the central part of the district, including Bagmundi, Balarampur, Arsha, Purulia I and II, and Raghunathpur, is significantly susceptible to health hazards. This study has two main shortcomings. Firstly, soil characteristics, aquifer properties, and the LULC of the study region were not considered. Secondly, we focused only on water quality data for the health hazard assessment, although other factors, such as food habits and agricultural practices, may also affect public health. In future studies, health hazard susceptible zones should be developed by considering all these parameters with newly developed data-driven models. Besides, our study may motivate others to apply an ensemble of bagging and RF for better results. 
Nevertheless, the predicted results remain valid and meaningful, and will help local authorities to devise proper strategies for eliminating water pollution in the region considered.
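The multicollinearity check mentioned in this discussion (Pearson’s correlation coefficients with VIF and TOL) can be illustrated for the simplest case of two factors, where the R² of regressing one factor on the other equals the squared Pearson coefficient. The sketch below covers only that two-factor special case; with fourteen factors, each VIF would instead come from a regression on all the remaining factors. The function names and sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_tol_two_factors(x, y):
    """Return (VIF, TOL) for a pair of factors.

    TOL = 1 - R^2 and VIF = 1 / TOL; with a single other factor,
    R^2 is simply the squared Pearson coefficient.
    """
    tol = 1.0 - pearson_r(x, y) ** 2
    vif = math.inf if tol == 0 else 1.0 / tol
    return vif, tol


if __name__ == "__main__":
    # Hypothetical measurements of two factors at five sampling sites.
    factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    factor_b = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(pearson_r(factor_a, factor_b))
    print(vif_tol_two_factors(factor_a, factor_b))
```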

Conclusion

Assessing health hazard risk mapping (HHRM) with suitable groundwater-related health hazard causative factors is necessary to control and minimize health hazards, particularly in semi-arid areas of the sub-tropical environment. Therefore, the potential HHRM was assessed for the Purulia district of West Bengal using decision tree-based RF, bagging, and their ensemble learning approach, considering fourteen appropriate causative factors. The main findings of this study are summarized as follows:

  • The outcomes of this study revealed that, among the fourteen selected factors, depth (0.912), pH (0.824), As (0.793), and HCO3 (0.791) are the most dominant in controlling the HHRM in the present study region.

  • From the modeling perspective, the ensemble approach of RF and bagging is the most suitable (AUC = 0.911) for predicting the HHRM in this plateau region, followed by bagging (AUC = 0.902) and RF (AUC = 0.878). In addition, the Taylor diagram also shows that the ensemble approach performs best, followed by bagging and RF.

  • The ensemble approach showed high precision and accuracy, and can be recommended for assessing the HHRM of other regions as well as for hydrological studies and environmental problem assessment in different regions of the world.

  • The produced HHRM can help regional planners, decision-makers, and the government to take proper preventive measures in the high and very high susceptible regions and thereby reduce further damage.