Introduction

Groundwater is an important water source for human beings (Salman et al. 2019). However, with ongoing urbanization, groundwater resources face enormous anthropogenic pressure and a constant threat of pollution (Wakode et al. 2018). Groundwater flow paths are complex, and the spread of contaminants is difficult to predict and control accurately, which makes groundwater difficult to remediate once it has been contaminated (Agossou and Yang 2021; Hou 2022; Siarkos et al. 2021; Li et al. 2021). Therefore, rational planning and utilization of groundwater resources to avoid pollution is far more important than post-pollution treatment (Clemens et al. 2020; Erostate et al. 2020; Thomann et al. 2020). Groundwater risk assessment is important for groundwater protection and development (Li et al. 2018; Shrestha et al. 2017; Xiong et al. 2022). It can provide valuable assistance in maintaining the sustainability of groundwater resources and remediating already contaminated sites (Jain 2023; Nanou and Zagana 2018).

Groundwater pollution risk assessment dates back to the groundwater vulnerability assessment proposed by Margat in 1968 (Machiwal et al. 2018; Taghavi et al. 2022). Traditional groundwater pollution risk assessment usually includes (1) groundwater vulnerability assessment, (2) pollution load assessment, and (3) groundwater functional value evaluation (Zhang et al. 2021; Zhu et al. 2018). Common vulnerability assessment methods for karst aquifers include EPIK, GLA, PI, COP, and PLEIK (Bordbar et al. 2023; Moreno-Gómez et al. 2024; Cusano et al. 2023). The PLEIK model is currently the most mature method for reflecting groundwater vulnerability in karst areas of southwest China, with minimal data requirements, relatively accurate results, and flexible application (Zou et al. 2014). In practice, however, extending the same weights to other regions is questionable because each assessment area has a different geography (Wang et al. 2020). Groundwater pollution load assessment methods mainly include classification discrimination, pollution risk index, and prioritization (Kazakis et al. 2015; Bai et al. 2023). Because karst aquifer systems are strongly heterogeneous and anisotropic, it is difficult to accurately quantify groundwater flow and contaminant fluxes, even at smaller scales, which poses unique challenges. Therefore, future research should focus on developing improved vulnerability assessment methods for karst systems and on addressing the other shortcomings of existing methods.

The study area is rich in manganese ore resources, but the improper exploitation and utilization of manganese ore has seriously damaged the groundwater environment. According to the special hydrogeological investigation report of the study area, the manganese concentration in the groundwater around some manganese electrolysis enterprises and slag dumps is 1.01–7620 times the manganese standard. Therefore, a regional groundwater pollution risk evaluation is urgently needed to help maintain the sustainability of groundwater resources and remediate already contaminated sites. This study combines the hydrogeological characteristics of the study area, considers surface water-groundwater interactions, optimizes the karst groundwater vulnerability evaluation model, quantitatively calculates the weights of the groundwater pollution risk indicators using the Random Forest method, and validates the groundwater pollution risk results using the level difference algorithm. The assessment results can guide the groundwater environmental regulatory authorities in formulating groundwater pollution prevention and control measures.

Study Area

The study area is located in the middle part of the Wuling Mountains, on the outer southeastern margin of the Sichuan Basin, with a total area of 2462 km² (Fig. 1). The area belongs to the northern edge of the Yunnan-Guizhou Plateau. The terrain is generally high in the southwest and low in the northeast, with undulating valleys and wide, flat terrain in the center. The altitude ranges from 245.7 to 1661.0 m. The region has a subtropical humid monsoon climate, with an average annual precipitation of 1393.17 mm from 1991 to 2020; in more than 80% of those years, precipitation was between 1100 and 1700 mm. Surface water resources are relatively abundant. In addition to the Youshui River, Huayuan River, and Longtan River, which connect with neighboring provinces and cities, 48 other rivers in the county are longer than 5 km, with a total length of 737.1 km.

Fig. 1 Hydrogeologic map of the study area

According to the aquifer medium, the groundwater in the study area can be divided into three categories: carbonate karst water, bedrock fissure water in clastic rocks, and pore water in loose Quaternary deposits. Carbonate rocks cover 51% of the area, clastic rocks and shallow metamorphic clastic rocks cover 30%, and other types cover 19%, so karst water is the most important type of groundwater in the study area. Groundwater is mainly recharged by atmospheric precipitation and is usually discharged to rivers; it also discharges to the surface through springs and underground rivers. Surface water and groundwater exchange frequently in the area. In areas of developed karst, surface water is scarce, and most precipitation infiltrates and recharges the groundwater. In the lower gullies and valleys, large springs and underground rivers recharge the surface water, and some rivers even originate at groundwater discharge points, so surface water and groundwater convert into each other in parts of the area. Controlled by topography and tectonic conditions, pore water in loose rocks flows as dispersed runoff along the topographic slope and is recharged and discharged locally. Karst fissure water and bedrock fissure water in clastic rocks are mainly stored in karst conduits and fissures and flow as net-like runoff. Following the topography, the lowest discharge datum is the Youshui River in the north, and the groundwater as a whole flows from south to north. However, the groundwater is controlled by the distribution of karst fissures, and there is no unified water table.

Research Methodology

Groundwater Pollution Risk Assessment

Groundwater pollution risk assessment usually considers three parts: groundwater vulnerability, pollution loading, and groundwater function value. However, karst groundwater in the study area is strongly heterogeneous and lies at depths of more than 50 m, so developing it requires large projects and large investments with low economic benefit. In addition, the water supply in the study area comes mainly from surface water, and groundwater accounts for less than 5% of the annual average water supply. Therefore, this study considers only the intrinsic vulnerability of groundwater and the load of pollution sources. The groundwater pollution risk index is calculated as follows.

$$R = {\text{PI }} \times {\text{PI}}_{w} + {\text{DI}} \times {\text{DI}}_{w}$$
(1)

where R is the pollution risk index, PI is the integrated load of pollution sources, DI is the vulnerability index, and the subscript w denotes the corresponding weight, which was calculated by the random forest method. The groundwater pollution risk results were classified into five levels (high, relatively high, medium, relatively low, and low) using the natural breaks method in ArcGIS. All maps were analyzed and edited in ArcGIS 10.8, and the results were processed in raster format using the raster calculation tool.
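As an illustration of Eq. (1), the following minimal Python sketch applies the weighted sum cell by cell to two small rasters. The grid values and the equal weights are placeholders chosen for readability, not values from this study; the study obtains the actual weights with Random Forest and performs the calculation with the ArcGIS raster tools.

```python
def risk_index(pi, di, pi_w, di_w):
    """Eq. (1): R = PI * PI_w + DI * DI_w for a single raster cell."""
    return pi * pi_w + di * di_w


# Hypothetical 2 x 2 rasters of pollution load (PI) and vulnerability (DI) scores.
pi_grid = [[1.0, 4.0],
           [2.0, 8.0]]
di_grid = [[2.0, 6.0],
           [3.0, 7.0]]

# Placeholder weights (assumption); the study derives the real ones with Random Forest.
PI_W, DI_W = 0.5, 0.5

# Cell-by-cell overlay of the two layers.
risk_grid = [
    [risk_index(p, d, PI_W, DI_W) for p, d in zip(pi_row, di_row)]
    for pi_row, di_row in zip(pi_grid, di_grid)
]
print(risk_grid)
```

The resulting grid would then be classified into the five risk levels; that classification step is not reproduced here.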

Groundwater Vulnerability Assessment

The study area has the typical geomorphological and hydrogeological characteristics of the karst area in southwest China. Considering the representativeness, systematicity, and objectivity of the model, the PLEIK model (Zou et al. 2014) was selected as the groundwater vulnerability assessment method. The groundwater in the study area not only receives recharge from the downward seepage of surface water bodies but is also affected by lateral recharge from and discharge to rivers. The “13th Five-Year Plan” environmental quality report for the study area shows that extensive mining and processing of manganese has a significant impact on the quality of surface water in the region and is the primary polluting factor for some surface waters. In areas with poor surface water quality and dense river networks, pollutants in rivers are more likely to threaten the groundwater environment, increasing the vulnerability of groundwater. Therefore, it is necessary to add an indicator that quantitatively describes surface water-groundwater interaction to the groundwater vulnerability evaluation model.

Because groundwater and surface water interact closely in the study area, an indicator for the degree of interaction between surface water and groundwater was added, and the PLEIKD model was constructed. The model consists of six indicators: P is the thickness of the protective cover, L is the type of land use, E is the development of the surface karst zone, I is the type of recharge, K is the development of the karst network system, and D is the degree of interaction between surface water and groundwater. Detailed explanations and data sources for the groundwater vulnerability determination are given in Table 1. Groundwater vulnerability is evaluated according to Eq. (2):

$${\text{DI}} = P \times P_{w} + L \times L_{w} + E \times E_{w} + I \times I_{w} + K \times K_{w} + D \times D_{w}$$
(2)

where Pw, Lw, Ew, Iw, Kw, and Dw are the weighting coefficients of the indicators, and DI is the vulnerability index. The larger the value of DI, the higher the vulnerability of groundwater. Table S1 lists the grading and assignment of the indicators of the PLEIKD model (Zou et al. 2014).
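The weighted sum in Eq. (2) can be sketched for a single raster cell as follows. This is only an illustrative Python sketch: the indicator scores and weights below are hypothetical placeholders (the graded scores come from Table S1 and the weights from the weighting method in use), and the function name is an assumption.

```python
PLEIKD = ("P", "L", "E", "I", "K", "D")


def vulnerability_index(scores, weights):
    """Eq. (2): DI is the sum of each indicator score times its weight."""
    missing = [k for k in PLEIKD if k not in scores or k not in weights]
    if missing:
        raise KeyError(f"missing indicators: {missing}")
    return sum(scores[k] * weights[k] for k in PLEIKD)


# Hypothetical weights that sum to 1 (assumption, not the study's values).
weights = {"P": 0.25, "L": 0.125, "E": 0.125, "I": 0.125, "K": 0.125, "D": 0.25}

# Hypothetical graded scores for one cell (assumption).
scores = {"P": 8, "L": 4, "E": 6, "I": 2, "K": 4, "D": 8}

di = vulnerability_index(scores, weights)
print(di)  # 6.0
```

A larger DI means a more vulnerable cell, so raising the score of the interaction indicator D at a polluted river reach raises DI in proportion to its weight.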

Table 1 Detailed explanations and data sources of the groundwater vulnerability indexes

Pollution Loading Assessment

The purpose of pollution load assessment is to quantitatively assess the sources of pollution that may contaminate groundwater. The risk of individual groundwater pollution load is calculated according to Eq. (3), and the integrated load of pollution sources is the weighted superposition of each type of pollution source with the following formula:

$$P_{i} = T_{i} \times L_{i} \times Q_{i}$$
(3)
$${\text{PI}} = \sum\limits_{i} W_{i} \times P_{i}$$
(4)

where Pi is the load index of the potential pollution source, Ti is the toxicity of the pollutants, Li is the source release potential, Qi is the amount of contaminant that may be released, PI is the combined pollution loading, and Wi is the weight of the ith pollution type. The larger the value of PI, the greater the load of the pollution source and the greater the impact of pollution on groundwater. Table S2 gives the scoring of the source load indicators. Ti is scored according to the type of industry. Li is closely related to the protective and sewage treatment measures taken at the source; the quality and service time of these measures are the main factors determining the likelihood of pollutant release. Qi is closely related to the scale of the source and the amount of pollutants discharged. The weights of the different types of pollution sources follow the “Guidelines for the Delineation of Groundwater Pollution Prevention and Control Zones” (Table S3) (MEPC 2019).
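The two steps of the pollution load calculation can be sketched in Python as follows, reading the integrated load as a weighted sum over source types, in line with the description of a weighted superposition. The source types, scores, and weights are hypothetical placeholders, not the values of Tables S2 and S3.

```python
def source_load(toxicity, release_potential, quantity):
    """Eq. (3): P_i = T_i * L_i * Q_i for one pollution source type."""
    return toxicity * release_potential * quantity


def integrated_load(loads, weights):
    """Eq. (4), read as a weighted sum: PI = sum of W_i * P_i over source types."""
    return sum(weights[kind] * load for kind, load in loads.items())


# Hypothetical scores (T, L, Q) for two source types affecting one cell.
loads = {
    "industrial": source_load(3.0, 2.0, 2.0),
    "landfill": source_load(2.0, 1.0, 1.0),
}

# Hypothetical type weights (assumption, not the guideline values).
weights = {"industrial": 0.5, "landfill": 0.25}

pi_cell = integrated_load(loads, weights)
print(loads, pi_cell)
```

Because Eq. (3) is a product, a source with a zero score on any of the three factors contributes nothing to the integrated load.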

Because of the combined influence of the water-bearing rock groups, geological structure, and topography in the study area, the groundwater has no uniform water table. In this study, assuming that the hydraulic head follows the topography, Darcy's law is used to calculate the buffer zone radius of a single pollution source, and the extent of the buffer zone is then adjusted according to the distribution of karst conduits and sinkholes. The buffer zone radius is calculated as follows:

$$\lambda = K \times I \times t \times 365$$
(5)

where λ is the radius of the buffer zone, m; K is the permeability coefficient (hydraulic conductivity), m/d, taken from measured values; I is the hydraulic gradient; and t is the number of years since the plant was built, taken from the Second National Census of Pollution Sources.
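A worked example of Eq. (5) in Python is sketched below. The input values are hypothetical and serve only to show the unit handling: K in m/d times a dimensionless gradient gives a velocity in m/d, and multiplying by the number of days elapsed gives a distance in metres.

```python
DAYS_PER_YEAR = 365


def buffer_radius(k_m_per_day, hydraulic_gradient, years):
    """Eq. (5): lambda = K * I * t * 365, returned in metres."""
    if min(k_m_per_day, hydraulic_gradient, years) < 0:
        raise ValueError("inputs must be non-negative")
    return k_m_per_day * hydraulic_gradient * years * DAYS_PER_YEAR


# Hypothetical inputs: K = 5 m/d, I = 0.02, plant built 10 years ago.
radius_m = buffer_radius(5.0, 0.02, 10)
print(f"buffer radius: {radius_m:.0f} m")
```

With these placeholder inputs the radius is 365 m; in the study the computed extent is further adjusted using the mapped karst conduits and sinkholes, which this sketch does not attempt.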

Calculation of Random Forest Weights

The weights of the indicators were calculated with Random Forest (RF) to construct the groundwater pollution risk assessment system. The assignments of RF are described in the “Introduction” section of the Online Resource. Because the groundwater pollution risk assessment considers both the intrinsic vulnerability of groundwater and the loading of pollution sources, the indicator system comprises seven assessment indicators: the comprehensive loading of pollution sources, protective cover, land use type, the intensity of development of the surface karst zone, the development of the karst network, infiltration conditions, and the degree of interaction between surface water and groundwater. In this study, 2/3 of the groundwater monitoring well data (66 wells in total) were selected as the training dataset, and the remaining 1/3 (32 wells) were used to validate the results. The degree of contamination at each groundwater monitoring point was used as the category attribute, with “no pollution” recorded as 1, “mild pollution” as 2, “moderate pollution” as 3, “severe pollution” as 4, and “extreme pollution” as 5. Using ArcGIS, the basic information of the 66 selected sample points was extracted to obtain the training dataset for each indicator. Each sample point in the training dataset has one category attribute value (1–5) and seven indicator attribute values.

Methods for Validation of Groundwater Pollution Risk Results

To verify the accuracy of the groundwater pollution risk assessment results in the study area, this study used the level difference method. The results of the groundwater pollution analysis and of the groundwater pollution risk assessment were divided into the same number of grades. The groundwater pollution results were divided into five grades (no pollution, mild pollution, moderate pollution, severe pollution, and extreme pollution), which were assigned the grade values 1, 2, 3, 4, and 5; the groundwater pollution risk was likewise divided into five grades (low, relatively low, medium, relatively high, and high), with the grade values 1, 2, 3, 4, and 5, respectively. The difference between the two grade values was then calculated for each sampling point, and its absolute value was taken. Stigter considered that when the absolute value of the level difference is 0 or 1, the result of the pollution risk assessment is reasonable; when the absolute value is 2 or 3, the result is considered to be overestimated or underestimated; and when the absolute value is greater than or equal to 4, the result is considered to be extremely overestimated or underestimated (Abba et al. 2024).
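The level difference rule described above can be sketched in Python as follows. The sample grades are hypothetical. One assumption is made explicit here: the sign of the difference (risk grade minus pollution grade) is used to tell overestimation from underestimation, which the text implies but does not state.

```python
def level_difference(pollution_grade, risk_grade):
    """Classify one sampling point from its two grade values (1 to 5)."""
    diff = risk_grade - pollution_grade  # sign convention is an assumption
    size = abs(diff)
    if size <= 1:
        return "reasonable"
    direction = "overestimated" if diff > 0 else "underestimated"
    return direction if size <= 3 else "extremely " + direction


# Hypothetical (pollution grade, risk grade) pairs for four sampling points.
points = [(1, 1), (2, 4), (5, 1), (3, 2)]
labels = [level_difference(p, r) for p, r in points]

# Share of points judged reasonable, in percent.
accuracy = 100 * labels.count("reasonable") / len(labels)
print(labels, accuracy)
```

The share of points labelled reasonable is what the validation reports as accuracy.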

Sensitivity Analysis Methods

Pollution risk assessment models use a large number of input data layers. This is seen as a major advantage: the larger the number of data layers, the smaller the impact of errors or uncertainties in individual parameters on the final outputs (He et al. 2018). In this study, the effective weight method was used for sensitivity analysis. This single-parameter sensitivity analysis method is suitable for evaluating the impact of each parameter on the assessment result by calculating the effective weight of each parameter. The effective weight is the percentage of the regional groundwater pollution risk index contributed by the product of each indicator score and its corresponding weight, and is calculated as follows:

$$W = \left( {P_{r} P_{w} /R} \right) \times 100$$
(6)

where W is the effective weight of each parameter; Pr and Pw are the rank and weight of each parameter; and R is the pollution risk index value.
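A small Python sketch of Eq. (6) for one raster cell is given below. The seven indicator scores and weights are hypothetical placeholders; the point is that the effective weights of a cell sum to 100% and can differ considerably from the assigned weights.

```python
def effective_weights(scores, weights):
    """Eq. (6): W = (P_r * P_w / R) * 100 for each indicator of one cell."""
    risk = sum(scores[name] * weights[name] for name in scores)
    if risk == 0:
        raise ValueError("risk index is zero; effective weights are undefined")
    return {name: scores[name] * weights[name] / risk * 100 for name in scores}


# Hypothetical scores and weights for the seven indicators (assumption).
scores = {"load": 2, "P": 8, "L": 4, "E": 4, "I": 8, "K": 8, "D": 4}
weights = {"load": 0.25, "P": 0.25, "L": 0.125, "E": 0.125,
           "I": 0.0625, "K": 0.0625, "D": 0.125}

eff = effective_weights(scores, weights)
for name, value in eff.items():
    print(f"{name}: {value:.1f}%")
```

In this placeholder cell the pollution load and the protective cover share the same assigned weight (0.25), yet their effective weights are 10% and 40% because the scores differ.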

Results and Discussion

Random Forest Weights

Evaluation Results of Groundwater Pollution

According to the groundwater pollution evaluation method in “Study Area” section of the Online Resource, the results of groundwater pollution level evaluation were obtained (Fig. 2). Among the 98 sampling points, the numbers of no pollution, mild pollution, moderate pollution, severe pollution, and extreme pollution points were 47, 18, 14, 11, and 8, respectively. Overall, the groundwater in the study area was mainly unpolluted or mildly polluted; these two classes together accounted for 66.33% of the points, while severe and extreme pollution each accounted for less than 12%. Severe and extreme groundwater pollution points occur mainly around pollution sources such as manganese electrolysis plants and slag yards, and the main pollutant is manganese. Based on the survey reports of 23 manganese electrolysis plants and slag dumps in the region collected during the study period, it is inferred that pollutants easily enter the groundwater with infiltrating rainfall, mainly because effective seepage control measures are lacking.

Fig. 2 Map of groundwater pollution degrees

Indicator Weights

The weights of indicators were calculated by RF (Table 2). According to the groundwater pollution risk evaluation system constructed in “Groundwater Pollution Risk Assessment” section, a total of seven evaluation indicators, including combined pollution loading, protective cover, type of land use, development of surface karst zone, development of karst network, the type of recharge, and the degree of interaction between surface water and groundwater, were comprehensively considered. In the study, 66 groundwater monitoring points were selected as training sample points, and about 2/3 of the points of each pollution level were selected, including 34, 12, 8, 7, and 5 points of no pollution, mild pollution, moderate pollution, severe pollution, and extreme pollution points, respectively.

Table 2 RF-based indicator weights

Groundwater Vulnerability Assessment Results

The groundwater vulnerability results (Fig. 3) were obtained based on the PLEIKD model. The calculated values of DI ranged from 1.79 to 7.55. The area of high vulnerability is 44 km², which is 1.81% of the total study area; the area of relatively high vulnerability is 314.21 km² (12.9%); the area of medium vulnerability is 590.78 km² (24.26%); the area of relatively low vulnerability is 537.48 km² (22.07%); and the area of low vulnerability is 948.5 km² (38.95%). The high vulnerability area is mainly distributed in Rongxi Town, Longchi Town, Songnong Town, and Qingxi Field Street. It is mainly controlled by the development of the surface karst zone and the karst network: the aquifer is mainly limestone, and the strong development of the surface karst zone and karst network allows surface pollution to enter the aquifer very easily. The areas of relatively high vulnerability are mainly located in Meijiang Town, Songnong Town, and Rongxi Town, where the aquifer is mainly dolomite and the moderate development of the surface karst zone and karst network allows surface pollution to enter the aquifer easily. In addition, due to the influence of surface water, vulnerability is also higher near river sections with poor surface water quality, such as the Meijiang River, Longtan River, and Rongxi River. Groundwater vulnerability scores are also higher at scattered features such as sinkholes, karst windows, and underground rivers. Areas of relatively low and low vulnerability are mainly concentrated where karst is weakly developed or undeveloped, because the ability of the overburden to retard infiltration and the small amount of net recharge, among other factors, reduce the potential for groundwater contamination (Pereira et al. 2019).

Fig. 3 Results of groundwater vulnerability assessment

Pollution Loading Assessment Results

Current pollution loading assessment models are usually composed of indicators such as pollutant toxicity, the release potential of the pollutant source, the amount of pollutant that may be released, and the radius of the buffer zone (Liu et al. 2022). The buffer zone radius of a pollutant source is important in determining the extent of the high surface load zone (Kaur et al. 2022). However, conventional methods usually adopt recommended values for the buffer zone radius and do not take the actual hydrogeologic conditions of the site into account when determining the extent of pollutant impacts. This study therefore extends the conventional pollutant loading assessment by considering the distribution and migration of pollutants. Based on the pollutant discharge time and the groundwater flow rate, the impact range of the groundwater pollution plume was estimated and used as the buffer zone radius. The results of the pollution loading assessment are shown in Fig. 4. The main sources of groundwater pollution in the study area are industrial enterprises, mines, slag dumps, gas stations, landfills, and agricultural sources.

Fig. 4 Assessment results of groundwater pollution loading

Pi values were calculated according to Eq. (3) and ranged from 0.0 to 16.2 for industrial enterprises, 0.0 to 8.4 for mining areas, 0.0 to 14.0 for slag dumps, and 0.0 to 2.5 for gas stations, with a value of 2.1 for landfills and 6.0 to 10.0 for agricultural non-point sources. PI values were calculated according to Eq. (4) and ranged from 12 to 162 in the study area. The low load zone dominated the study area (93.57%) and was mainly influenced by agricultural sources. Relatively low and medium loading zones accounted for 4.55% of the study area and are mainly located at closed mines and at industrial enterprises, such as cement plants, with better seepage-prevention measures and less pollution. Relatively high and high loading zones account for 1.88% of the study area and are mainly distributed in the towns of Gaotian, Rongxi, and Longchi. According to the Second National Pollution Census and the pollution risk prevention, control, and management program for the slag heaps in the study area, the higher loading areas are mainly of three types: ferrous heavy metal industries with large pollutant emissions; slag heaps that have existed for a long time and lack effective seepage prevention and control measures at the bottom; and clusters of mines.

Groundwater Pollution Risk Assessment Results

The groundwater pollution risk classification map (Fig. 5) is an overlay of the intrinsic vulnerability of groundwater and the pollution loading. The high risk zone of groundwater contamination accounts for only 2.21% of the study area. This zone is mainly located in the central part of the study area, specifically around the contaminated river, a silicon plant, two manganese electrolysis plants, and clustered slag dumps. The area of relatively high risk, which accounts for 9.60% of the study area, is scattered in areas of relatively gentle terrain. Areas of medium risk, accounting for 22.52% of the study area, are patchily distributed in the central basin and gully areas. The relatively low risk area, accounting for 26.49%, is mainly distributed in the central part and in the villages and towns on the east and west sides; there, the groundwater is mainly karst fissure water in clastic rock interbedded with carbonate rock, karst is undeveloped, and there is no obvious pollution source on the surface. The low risk area, accounting for 39.18%, is mainly distributed in the middle and high mountain areas, where human activity is limited and the overall risk of groundwater pollution is small.

Fig. 5 Distribution of groundwater pollution risk classification in the study area

The risk map of groundwater contamination shows that the study area has a relatively low probability of groundwater contamination, as the areas with relatively low and low risk classification account for 65.64% of the total area. The most important factors in the high groundwater pollution risk zone are pollution from industrial sources (manganese electrolysis plants, silicon plants), slag dumps, mining, and surface water pollution. The second most important factor is the relatively high intrinsic groundwater vulnerability. A variety of factors contribute to the relatively high risk of contamination in specific areas. Compared to other places, Rongxi Town and Longchi Town have a higher concentration of industrial enterprises and manganese slag sites, and the higher pollution loads overlying the aquifers put some areas at high risk of groundwater contamination. The intrinsic vulnerability of groundwater is a key factor in the higher risk areas of Songnong Town, Longchi Township, Meijiang Town, and Rongxi Town. Because their aquifers are dominated by limestone, karst landforms are well developed, and caves, depressions, funnels, and sinkholes are common, various pollutants enter the groundwater through preferential flow or infiltration with surface runoff and rainfall, threatening the safety of groundwater (Duan et al. 2023; Li et al. 2022).

Validation of Groundwater Pollution Risk Evaluation Results

Comparison of Indicators

The impact of surface water on regional groundwater is evident (Xin et al. 2018). Anthropogenic impacts, such as industrial sources, fecal pollution, livestock wastewater, and agricultural pollution, increase surface water contamination and exacerbate groundwater vulnerability, especially during the rainy season when groundwater is recharged from surface water (Adamovic et al. 2022). Therefore, surface water-groundwater interaction was added as a groundwater vulnerability parameter. To verify the applicability of the added parameter, this study compares the improved PLEIKD model of groundwater vulnerability with the usual PLEIK model, both weighted using the RF method (PLEIKD-RF and PLEIK-RF, respectively). The groundwater pollution risk map obtained by superimposing the PLEIK-based vulnerability and the pollution loading is shown in Fig. 6. The overall pollution risk in the study area is low: the areas of low and relatively low groundwater pollution risk are the largest, accounting for 68.05% of the total study area, while the area of relatively high risk accounts for 6.56% and the area of high risk for 1.33%.

Fig. 6 Distribution of groundwater pollution risk classification in the study area (PLEIK-RF)

Compared with the evaluation results of the PLEIK-RF model, the evaluation results of the PLEIKD-RF model show increases of 3.04 and 0.88 percentage points in the area with relatively high and high risk of groundwater pollution, respectively, mainly because the indicator of the degree of interaction between groundwater and surface water is additionally considered. Meijiang Town has poor surface water quality due to agricultural pollution and a dense river network; Longchi Town, Songnong Town, and Rongxi Town have poor surface water quality due to the influence of enterprises along the river, which makes groundwater easy to contaminate. Therefore, the assessed pollution risk in these areas increases when the groundwater-surface water interaction indicator is added.

Comparison of Weights

Two types of weighting methods are currently available: subjective and objective. The subjective weighting method is based solely on the preferences or judgments of the decision maker, and the potential uncertainty of this method is its main drawback (Yang et al. 2023). The objective weighting method calculates weights based on actual observations, without considering the decision maker’s preferences and without being limited by the decision maker’s expert knowledge or experience (Torkashvand et al. 2023). Therefore, the weights determined by the random forest method are used in this study (Wei et al. 2023). To verify the applicability of the optimized weights, two weight calculation methods (RF and AHP) were compared, both applied to the PLEIKD model (PLEIKD-RF and PLEIKD-AHP, respectively). The groundwater contamination risk distribution obtained with the AHP weights is shown in Fig. 7. The AHP calculation method is detailed in the literature (Soyaslan 2021). The weights of P, L, E, I, K, and D were 0.22, 0.19, 0.18, 0.11, 0.15, and 0.14, respectively. The weights of vulnerability (DI) and pollution loading (PI) are 0.54 and 0.46, respectively, with reference to existing literature (Guo et al. 2014).

Fig. 7 Distribution of groundwater pollution risk classes in the study area (PLEIKD-AHP)

As can be seen in Fig. 7, the area of low and relatively low risk of groundwater contamination is the largest, accounting for 55.97% of the total study area. The area of relatively high risk accounts for 11.27%, and the area of high risk accounts for 3.09%. The distribution of each risk level is broadly similar to that in Fig. 5, with differences in some areas. In Qingxiba Street, Zhonghe Street, and Guanzhuang Street, the AHP pollution risk is slightly higher than the RF pollution risk because enterprises and slag dumps are densely distributed, the pollution loading values are high, and the pollution loading weight is higher under AHP than under RF. In some sections of the Meijiang River and Rongxi River, where surface water quality is poor and the river network is dense, the groundwater vulnerability is high, and the groundwater vulnerability obtained with AHP is lower than that obtained with RF, so the pollution risk of AHP is slightly lower than that of RF.

Validation of Results

According to the groundwater pollution assessment in the online resources, the groundwater pollution in the study area is categorized into five levels. In the spatial analysis module of ArcGIS, the groundwater risk levels of the 32 validation sampling points were extracted and compared with the corresponding groundwater pollution levels to obtain the level difference for each point. The validation showed that 71.87% of the points were assessed as reasonable based on PLEIKD-RF, which is taken as a validation accuracy of 71.87%, and 28.13% of the points were overestimated. Thus, the proposed methodology proved to be valid for the study area. The overestimated points were mainly distributed in the western part of the study area, where no or slight pollution corresponds to a relatively high risk of groundwater contamination. Pollution source loads had the greatest influence on the distribution of relatively high pollution risk, and the overestimated points are mainly located near point sources. Based on the hydrogeologic map and topographic analysis, it is presumed that the overestimated points lie upstream of the pollution sources along the groundwater flow paths and are therefore less contaminated. The accuracy of the groundwater contamination risk assessment can be improved if the hydrogeologic conditions and pollution source information are further investigated (Machiwal et al. 2018; Nistor et al. 2021). In addition, because the number of groundwater sampling points is limited and they are mainly located in the west, validation in the eastern part of the study area is insufficient. More representative groundwater sampling is recommended to further validate the PLEIKD-RF model.

It is important to check the accuracy of groundwater contamination risk maps, but the choice of appropriate validation criteria remains controversial in the literature. The level difference method assumes that groundwater contamination results are proportional to groundwater contamination risk. Its advantages are that it is easy to use, is widely applied, has low data requirements, and gives a clear explanation of the validation. Its disadvantage is that the validation results are uncertain because groundwater contamination and contamination risk are classified with different methods.
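The level difference check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in this study: the function names, the eight example points, and in particular the tolerance rule (a difference of at most one level counts as reasonable) are assumptions introduced here, since the exact threshold is not stated in this section.

```python
from collections import Counter


def level_differences(risk_levels, pollution_levels):
    """Return (risk level - observed pollution level) for each sampling point."""
    if len(risk_levels) != len(pollution_levels):
        raise ValueError("both sequences need one entry per sampling point")
    if not risk_levels:
        raise ValueError("at least one sampling point is required")
    return [r - p for r, p in zip(risk_levels, pollution_levels)]


def classify(diff, tolerance=1):
    """Label one level difference.

    Assumed rule (not taken from the paper): |diff| <= tolerance is
    'reasonable', larger positive values are 'overestimated', and larger
    negative values are 'underestimated'.
    """
    if diff > tolerance:
        return "overestimated"
    if diff < -tolerance:
        return "underestimated"
    return "reasonable"


def validation_summary(risk_levels, pollution_levels, tolerance=1):
    """Percentage of sampling points in each validation category."""
    diffs = level_differences(risk_levels, pollution_levels)
    counts = Counter(classify(d, tolerance) for d in diffs)
    labels = ("reasonable", "overestimated", "underestimated")
    return {label: 100.0 * counts.get(label, 0) / len(diffs) for label in labels}


# Hypothetical levels (1 = lowest, 5 = highest) for eight made-up points.
risk = [1, 2, 3, 4, 5, 4, 2, 3]
pollution = [1, 2, 2, 1, 4, 2, 2, 5]
summary = validation_summary(risk, pollution)
print(summary)
```

With real data, `risk` would hold the risk level extracted from the risk map at each sampling point and `pollution` the corresponding groundwater pollution level. Changing `tolerance` shows how sensitive the reported accuracy is to the classification rule, which is the source of uncertainty noted above.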

The level differences for the three assessment results, which were used to check their reasonableness, are summarized in Table 3. The accuracy of the assessment based on PLEIKD-RF is higher than that of PLEIK-RF (50%) and PLEIKD-AHP (59.38%), which indicates that the newly constructed indicator system is better suited to the region. This is because it characterizes vulnerability more accurately by accounting for the interactions between groundwater and surface water in the study area. In addition, the RF relies mainly on the internal patterns of the indicator data, removes subjective influences, and uses the data mining capability of the model to obtain the contribution of each indicator, making the results more reasonable and objective (Zhou et al. 2024).

Table 3 Reasonableness statistics of groundwater pollution risk assessment results

Sensitivity Analysis

Sensitivity analysis identifies the indicators that most strongly influence regional groundwater vulnerability and thus guides effective groundwater pollution prevention and control measures (Emara et al. 2024). In this study, the effective weighting method was used for sensitivity analysis (Huan et al. 2018). The effective weight statistics (Table 4) show that the mean effective weights, in descending order, are protective cover (23.07%), the degree of interaction between surface water and groundwater (18.19%), pollution loading (17.32%), the intensity of development of the surface karst zone (14.79%), land use (13.75%), the development of the karst network (8.65%), and infiltration conditions (4.24%). The effective weights thus ranged from 4.24% to 23.07%, with protective cover being the most sensitive indicator of overall groundwater contamination risk in the study area and infiltration conditions having the least impact. The standard deviations of the effective weights of land use and surface karst zone development were both high, indicating that the influence of these two indicators on groundwater pollution risk varies widely in space. The mean effective weight of pollution loading is small compared with its original weight, mainly because pollution sources are unevenly distributed. Industrial enterprises, slag dumps, gas stations, and similar facilities are point sources whose influence extends only 0.1–2 km around them. High-load zones are therefore concentrated in localized areas, while low-load zones make up the vast majority of the study area. As a result, the maximum effective weight of pollution loading is the largest among the factors, while its mean is small.

Table 4 Statistics of effective weights
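To make the effective weight statistics concrete, the sketch below computes them for a handful of grid cells. It assumes the form commonly used in single-parameter sensitivity analysis, W = (Pr × Pw / V) × 100, where Pr and Pw are the rating and assigned weight of an indicator in a cell and V is the weighted sum over all indicators in that cell. The indicator names, ratings, and weights are hypothetical and are not the values used in this study.

```python
from statistics import mean, pstdev


def effective_weights(ratings, weights):
    """Effective weight (%) of each indicator in a single grid cell.

    ratings: dict mapping indicator name -> rating in this cell
    weights: dict mapping indicator name -> assigned (theoretical) weight
    """
    index = sum(ratings[name] * weights[name] for name in weights)
    if index == 0:
        raise ValueError("overall index is zero; effective weights are undefined")
    return {name: 100.0 * ratings[name] * weights[name] / index for name in weights}


def summarize(cells, weights):
    """Min, max, mean, and standard deviation of effective weights over cells."""
    per_cell = [effective_weights(cell, weights) for cell in cells]
    stats = {}
    for name in weights:
        values = [cell[name] for cell in per_cell]
        stats[name] = {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "sd": pstdev(values),
        }
    return stats


# Hypothetical weights and ratings: one cell next to a point source (high
# load rating) and three cells far from any source (low load rating).
weights = {"cover": 0.4, "load": 0.3, "infiltration": 0.3}
cells = [
    {"cover": 2, "load": 10, "infiltration": 2},
    {"cover": 2, "load": 1, "infiltration": 2},
    {"cover": 2, "load": 1, "infiltration": 2},
    {"cover": 2, "load": 1, "infiltration": 2},
]
stats = summarize(cells, weights)
for name, row in stats.items():
    print(name, {key: round(value, 2) for key, value in row.items()})
```

In this toy example the load indicator has the largest maximum effective weight but a lower mean than the cover indicator, because only one of the four cells carries a high load. This is the same pattern described for pollution loading in the study area, where high-load zones are confined to small areas around point sources.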

In general, the protective cover, the degree of interaction between surface water and groundwater, and pollution loading have a relatively large influence on the pollution risk of the study area, each with an effective weight greater than 15%. Obtaining accurate, detailed, and representative data for these high-influence indicators, and setting their grading criteria and values more objectively and realistically, is essential for more accurate groundwater pollution risk zoning (Zhao et al. 2018). The next step is to refine the groundwater pollution risk assessment in the following directions: (1) Carry out large-scale, high-precision hydrogeological surveys to clarify the distribution of karst conduits and the extent of interaction between surface water and groundwater, and obtain detailed and accurate hydrogeological parameters to improve the evaluation of intrinsic groundwater vulnerability; (2) Collect detailed data on rock and soil layers, including the structure of the soil layers and the filling of dissolution joints; (3) Regularly update the inventory of pollution sources, such as industrial, agricultural, and domestic sources, and obtain information on the intensity and duration of pollution from each source to improve the accuracy of pollution load evaluation.

Conclusion

A comprehensive PLEIKD-RF model was proposed for assessing the risk of groundwater pollution in typical karst areas in Southwest China. The model combines the hazards of pollution sources with the vulnerability of groundwater, adopts modified vulnerability assessment parameters, and calculates the weight of each parameter with the random forest method. The results showed that high and relatively high risk zones accounted for 11.81% of the study area and were mainly distributed in the western and central regions, with pollution from industrial enterprises and karst development being the main reasons for the high and relatively high risk. The low pollution risk zone is mainly distributed in the middle and high mountainous areas of the study area. Human activities increase the risk of regional groundwater pollution and make groundwater more vulnerable to pollution. As verified by the level difference method, the accuracy of the groundwater pollution risk evaluation is 71.87%, which is higher than that obtained with the unimproved vulnerability indicators or the unimproved weights. This indicates that the model is applicable to the evaluation of groundwater pollution risk in the southwestern karst area.

Groundwater pollution risk assessment maps can reveal current inappropriate land uses and help managers clearly recognize groundwater pollution risks. As groundwater pollution risks and some assessment indicators are dynamic, assessments should be conducted as regularly as possible to ensure the accuracy of the results. The accuracy of groundwater pollution risk assessment can be improved by increasing the precision of the data, including more accurate information on hydrogeological parameters and a more detailed investigation of surface pollution sources. Moreover, more representative samples can be collected in different groundwater pollution risk level areas to further test the rationality of the evaluation system.