1 Introduction

Water is considered a globally critical resource and is therefore exploited for a variety of uses ranging from industrial to domestic [1]. Entire organisms and ecosystems are completely dependent on water [2]. The earth has an abundance of this very important resource, however, only a small fraction is available for human usage. Globally, less than 1% of the finite freshwater that exists on Earth is accessible for human use. While it only makes up about 2.5 to 3.2% of the Earth’s water, most of it is in glaciers and ice caps (68.7%) or is located underground as groundwater (30%). Only a small fraction of the earth’s freshwater resources (0.3%) is easily accessible as rivers, lakes, and marshes [3, 4]. According to [5], continuous availability of groundwater is associated with the preservation of biodiversity during unconducive weather conditions. Among the various activities the earth’s freshwater resources are used for, irrigation accounts for 70% of it [6]. This has increased the stress on groundwater sources and very little time for replenishment of these sources leading to a steady decline in groundwater availability [7]. Approximately, 70% of Ghana's inhabitants rely on groundwater for household endeavours [8]. Groundwater serves as the water supply for 25% of urban dwellers and 90% of rural inhabitants, as reported by [9]. In Ghana, surface water has predominantly been treated and used as irrigable water in the urban areas. However, the rise in illegal mining also known as “galamsey” has led to a rapid depletion of the resources [10,11,12]. Another cause of strain on surface water is the climate-induced dwindling of the rainfall patterns [13, 14]. According to [15], northern Ghana, where the study area is located; is particularly prone to climatic-induced stress on the water resources. The authors posit that the Savannah zone of northern Ghana has been subjected to sustained patterns of regressions in annual rainfall values coupled with an upsurge in temperature over the last century.

Assessing and evaluating the quality of groundwater sources is integral to its conservation efforts. To determine whether a groundwater source is fit for usage, checks need to be conducted to determine its suitability. There are numerous approaches that can be applied. One of the most often accepted strategies includes water quality indices [16, 17]. The Water Quality Indices (WQIs) can determine the level of appropriateness of the water for usage and emphasize potential dangers it may inhabit. This study evaluates the appropriateness of groundwater as irrigable water and therefore considers WQIs related to irrigation. There are a wide range of irrigation water quality indices employed in assessments globally [18,19,20]. Each has its merits and drawbacks and therefore it is imperative to employ indices that prioritise accuracy over anything else.

Machine Learning Algorithms in recent years have been integral in the development of models employed in the classification and prediction of WQI’s [21, 22]. Palani et al. [23] employed Artificial Neural Networks (ANNs) a machine learning model, programmed to mimic how the human brain processes data, analyses it, and identifies relationships that exist within the data in predicting and forecasting of quantitative properties of water bodies. ANNs are efficient in processing and computing information parallelly and can handle high dimensional data. The algorithm can also discover hidden patterns and complex relationships that exist within the data, but more importantly, they are able to learn from their error, putting them in a unique class when it comes to tools employed in research.

For the above stated reasons and more, machine learning methods are taking over the hitherto used multivariate statistics in the characterization of geo-resources occurrences and as well as their future occurrence through accurate and reliable predictions. Regression techniques are also tools that have been used globally by researchers in both classification and forecasting of geo-resource occurrences [24]. Linear Regression (LR), Random Forest regression (RF), Support Vector regression (SVR), Decision Tree Regression (DT) among others have employed in number of studies worldwide. Balogun and Tella [25] used random forest, decision tree, linear and support vector regression in assessing the correlation between climatic variable and ozone concentrations.

The quality of groundwater and mineralization controls in the study area has been recently conducted by [11]. However, the suitability of the groundwater resources within the catchment’s suitability for irrigation purposes is still an enigma. To maximize the usage of this scarce resource, it is important for a study on its suitability for irrigation to conducted so as to guide reliable and accurate decision-making as well monitoring, within the catchment. To this end, multi-method machine learning techniques were adopted in this study together with a number of groundwater irrigation assessment indices. The aim of the study is to evaluate the suitability of the groundwater for irrigations purposes and also to assess the performance of the various applied machine learning methods in the prediction of the indices. This study is relevant to the attainment of the international goals such as the Strategic Development Goals (SDGs); especially SDG 6 (Clean Water and Sanitation) and SDG 2 (Zero Hunger). The continual assessment of groundwater resources closely aligns with the broader global agenda of ensuring access to clean water and promoting sustainable agriculture practices.

2 Materials and methods

2.1 Study area

The study area lies within latitudes − 2.52° and − 2.05°, and longitudes 9.72° and 10.02° of Northern Ghana. The area is situated primarily in the Wa East District and Wa West (located in the Upper West region); with the southern part of the area lying within the Sawla-Tuna-Kalba (in the Savannah region) of Ghana. (Fig. 1). According to Salifu et al. [26] and Abu et al. [11], the area is characterised by an arid climate with a singular raining season. Furthermore, the area forms part of northern Ghana’s savannah grassland zone and is characteristically arid for prolonged periods, with a relatively shorter period of precipitation lasting from May to September annually. Aabeyir and Aduah [27] reports an annual rainfall of 1000 mm and average humidity of approximately 90 mm in the area. Similar to other parts of Northern Ghana, the area witnesses temperature fluctuations which change range from 15 °C during the rainy season to upwards of 40 °C during the so-called heat season associated with the peak of the dry season. Generally, the Wa East Area is characterised by undulating hills with a low point of 180 m and a high point of around 1300 m above seas level [28]. The main river which drains the area is the Kulpawn river and its tributaries which traverses the area in a distinct dendritic manner.

Fig. 1
figure 1

Map of the study area

According to the Ghana Statistical Service, the population of the study area is approximately 90,000. Of the employed population, about 88.8% are engaged as skilled agricultural, forestry and fishery workers [29]. The area is very much dominated by small-scale farmers that practice subsistence agriculture such as, farming and livestock farming [30]. The subsistence of the area's population is greatly dependent on irrigation because of the semi-arid climate and low rainfall that make rain-fed farming difficult. The primary aim of irrigation is growing staple crops such as maize, millet, and rice, which are essential for food security in the area [30]. Additionally, irrigation enables the cultivation of cash crops such as the case of vegetables and fruits which then affect the household and local markets. Since agriculture is the main driver of people's livelihoods, access to reliable irrigation water becomes of great significance as it ensures food security and makes crops less susceptible to climatic variations.

The area falls with the western geological province of Ghana which is made up of the Paleoproterozoic Birimian rocks and the closely associated Tarkwaian formation (Fig. 1; [31]). The Birimian in Ghana is constituted by volcanic greenstone belts separated by metasedimentary basins [32]. One of these greenstone belts, the Lawra belt, falls within the area is composed of basalt, andesite as well as some metamorphosed sediments. The Tarkwaian on the hand is made up of sedimentary rocks, including sandstones and conglomerates, as well as metamorphic rocks, particularly phyllite [33]. The geology of the area is similar to the typical greenstone belts of Ghana and contains rocks such as basalts gabbros, metasedimentary units and a variety of granitoids [32, 34].

Typically, groundwater occurrence in the Birimian terrain is controlled by structural discontinuities and the weathered zone, which are in turn determined by the regolith profile [35, 36]. The groundwater in the area are contained within fractured aquifer systems [11, 26]. The degree of weathering in the Birimian varies extensively across Ghana, with eroded profiles recorded at depths between 90 and 120 m in the south where rainfall is substantial [37]. Figure 2 depicts a schematic representation of the hydrostratigraphy in the Birimian, with the respective thicknesses. The hydrogeology of the Wa East area is primarily characterised by secondary permeability which formed within a shallow weathered profile [8]. The structural discontinuities which aid in groundwater localisation and movement were mostly formed by the same geological event responsible for the pathways which localised the gold [34].

Fig. 2
figure 2

(adapted from Carrier et al. [40])

A schematic diagram of the hydrostratigraphy in the Birimian

The Goripie–Bulenga–Choggu–Du watershed of the area is inhabited by a limited number of prominent gold exploration corporations, along with scattered instances of Artisanal and Small-Scale Mining (ASGM) operations. The operation of these mines, particularly the ASGM has been a source of concern due to the level of pollution associated with it [1, 10, 38, 39]. Such activities continue to pose a present danger to both surface and groundwater resources in the area.

2.2 Methodology

2.2.1 Water sample collection and analysis

Water samples were collected during a field study specifically designed to sample from the areas most affected by ASGM activities. During the survey, a total of 105 samples were collected following laid down standards in three separate sampling exercises between November 2021 and January 2022. The collection, storage, and analysis of the water samples adhered to the recommended guidelines set by the American Public Health Association [41]. The altitudes and geographic coordinates of the sampling places were ascertained via the Global Positioning System.

A total of sixteen (16) parameters were analysed for at the Ghana Atomic Energy Commission (GAEC) laboratory. The physical parameters, pH and Electrical Conductivity (EC) were recorded using the WTW 323 model pH meter and EC meter respectively. The calcium (Ca2+), magnesium (Mg2+), and total hardness levels were measured with an Ethylenediaminetetraacetic acid (EDTA) titration. On the other hand, the Potassium (K+) and sodium (Na+), and fluoride (F) content of the samples were determined with a flame photometric and 2-(parasulfophenylazo)-1,8-dihydroxy-3,6-naphthalene-disulfonate (SPADNS) methods respectively following [42]. The alkalinity and total dissolved Solids (TDS) concentrations of the samples were recorded via by strong acid titration and electronic colorimeter methods. The levels of sulphate (SO42−) was determined with an ultraviolet spectrophotometer. Additionally, the hydrazine reduction, stannous chloride, and azomethine-H methods were employed to assess nitrate (NO3), and phosphate (PO42−) concentrations [41].

Prior to using the data for subsequent investigation, calculations for Charge Balance Error (CBE) were conducted. According to Hounslow [43], the CBE was calculated using the formula below, resulting in a yield of ± 10% for the parameters which is generally accepted as adequate for groundwater studies [44].

$$CBE = \frac{{\sum cations \left( {{\text{meq}}/{\text{L}}} \right) - |\sum anions \left( {{\text{meq}}/{\text{L}}} \right)|}}{{\sum cations \left( {{\text{meq}}/{\text{L}}} \right) + |\sum anions \left( {{\text{meq}}/{\text{L}}} \right)|}} \times 100,$$
(1)

where \(\sum cations\) is the summation of measured cations and \(\sum anions\) is the summation of the measured anions in meq/L and CBE is the charge balance error.

2.2.2 Development of the ML models

The study suggests the use of SVR, RFR, MLR, DTR and ANN models for the assessment and prediction of Irrigation Water Quality (IWQIs). Compared to traditional statistical models, ML models are more flexible, scalable, and more accurate. However, a lot of ML related statistical techniques remain sparse and lack uniformity. There is need to compare and evaluate the performances of several ML models to determine their feasibility for the specific area in which they will be applied [45]. To address this, the study incorporates the use of multiple ML models against multiple IWQIs, rather than one or two of each.

The IWQIs used within the study include SAR, SSP, Na%, MH, KR, RSC. PI and PIG. IWQI combines pH, F, NO3, Cl, SO4, Ca, Mg, TDS, and TH in its determination. PIG uses a combination of NO3, Cl, SO4, HCO3, Na, K, Ca, Mg, TDS, and TH. SSP, SAR and KR combine Na, Mg, and Ca while Na% and PI combine the same 3 elements the addition of K in the case of Na% and HCO3 in PI’s case. MH makes use of only Ca and Mg.

The first step is the data pre-processing phase. The dataset is divided into independent and dependent variables, concentrations of the variables or features are used as input and the indices output. Thus, the values for the IWQIs are shown in Table 1.

Table 1 Number of neurons in each layer of all ANN models in the study
2.2.2.1 SAR

According to Sposito and Mattigod [46], Sodium Adsorption Ratio (SAR) is an effective means of determining the sodium levels in water and is regarded as the primary factor to take into account when assessing the viability of an irrigation water source. It is determined using Eq. 2:

$$SAR = \frac{{{\text{Na}}^{ + } }}{{\sqrt{\frac{1}{2}} \left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } } \right)}},$$
(2)

where \({\text{Na}}^{ + }\), \({\text{Ca}}^{2 + }\), \({\text{Mg}}^{2 + }\) represent the concentration of sodium, calcium, and magnesium ions respectively in the sample [47].

2.2.2.2 SSP

Soluble Sodium Percentage is a parameter used in the determination of sodium levels in water in the agricultural setting. It was determined using Eq. 3:

$$SSP = \frac{{\left( {{\text{Na}}^{ + } + {\text{K}}^{ + } } \right)}}{{\left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } + {\text{Na}}^{ + } + {\text{K}}^{ + } } \right)}} \times 100,$$
(3)

where \({\text{Na}}^{ + }\), \({\text{K}}^{ + } ,\) \({\text{Ca}}^{2 + }\), and \({\text{Mg}}^{2 + }\) represent the concentration of sodium, potassium, calcium, and magnesium ions respectively in the sample [48].

2.2.2.3 MH

Magnesium hazard is another parameter employed in the assessment of water quality. Magnesium ions have the ability to disrupt the balance between calcium and magnesium ions, negatively affecting growth of plants making MH an integral tool in determining water viability for agricultural uses [49]. It is calculated using Eq. 4:

$$MH = \frac{{{\text{Mg}}^{2 + } }}{{\left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } } \right)}} \times 100,$$
(4)

where \({\text{Mg}}^{2 + } ,\) \({\text{Ca}}^{2 + }\) represent concentration of magnesium and calcium ions respectively [48].

2.2.2.4 KR

Kelly’s ratio, developed by Kelly [50] and used in the assessment of the negative effects of salt on water quality with respect to irrigation. It calculated by measuring sodium ions against calcium and magnesium ions through Eq. 5:

$$KR = \frac{{{\text{Na}}^{ + } }}{{{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } }},$$
(5)

where \({\text{Na}}^{ + }\), \({\text{Ca}}^{2 + }\), and \({\text{Mg}}^{2 + }\) represent the concentrations of sodium, calcium, and magnesium ions [51].

2.2.2.5 PI

Permeability Index is an integral tool employed in many studies a measure in evaluating how suitable groundwater is for agricultural use [52]. It is calculated using Eq. 6:

$$PI = \frac{{{\text{Na}}^{ + } + \sqrt {{\text{HCO}}_{3}^{ - } } }}{{\left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } + {\text{Na}}^{ + } } \right)}} \times 100,$$
(6)

where \({\text{Na}}^{ + } ,\) \(\sqrt {{\text{HCO}}_{3}^{ - } }\), \({\text{Ca}}^{2 + }\), and \({\text{Mg}}^{2 + }\) represents the concentrations of sodium, bicarbonate, calcium, and magnesium ions respectively [48].

2.2.2.6 RSC

Residual Sodium Carbonate is as a result of increase in levels of carbonate and bicarbonate beyond the levels of calcium and magnesium [52]. The index is used to determine the alkalinity risk the groundwater presents [53]. It is calculated using Eq. 7:

$$RSC \left( {{\text{meq}}\;{\text{L}}^{ - 1} } \right) = \left( {{\text{CO}}_{3}^{2 - } + {\text{HCO}}_{3}^{ - } } \right) - \left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } } \right),$$
(7)

where \({\text{CO}}_{3}^{2 - } , \;{\text{HCO}}_{3}^{ - }\), \({\text{Ca}}^{2 + }\), and \({\text{Mg}}^{2 + }\) represent the concentration of carbonate, bicarbonate, calcium, and magnesium ions respectively [47].

2.2.2.7 PIG

Pollution index of groundwater is considered an integral tool in the evaluation of the viability of water conveying the overall quality of the water [54]. It is calculated using Eq. 11:

$$WP = \frac{RW}{{\sum RW}},$$
(8)
$$SC = \frac{C}{\sum DS},$$
(9)
$${\text{OW}} = {\text{WP}} \times {\text{SC,}}$$
(10)
$${\text{PIG}} = \mathop \sum \nolimits_{i}^{n} OW,$$
(11)

where OW represents the overall water quality, WP represents weight parameter, SC status of concentration, C concentration of the sample, DS drinking water quality standard, and RW relative weight [55].

2.2.2.8 Na%

Sodium percentage is another parameter used in the determination of the level of salinity. It is determined by measuring the concentrations of sodium and potassium against calcium and magnesium [56]. The index is calculated using Eq. 12:

$${\text{Na}}\% = \frac{{100 \times {\text{Na}}}}{{{\text{Ca}} + {\text{Mg}} + {\text{K}}}},$$
(12)

where \({\text{Na}}^{ + } ,\;{\text{Ca}}^{2 + }\), \({\text{Mg}}^{2 + } , \;{\text{and}}\;{\text{K}}^{ + }\) represents sodium, calcium, magnesium, and potassium ions [57].

The variables used in determining the indices are used as input features and the index itself the output or dependent variable. The data was scaled by transforming them to fit within a specific range. This ensures that one or more variables do not dominate the calculations within an algorithm thus improving the accuracy [58]. Two major challenges researchers often face in the development of ML models are overfitting and underfitting. Overfitting occurs when the ML model is unable to generalize thus, unable to accurately predict new data. Several factors could contribute to this some being, limited data, too much noise in the data and the complexity of the dataset [59]. An ML model is classified as underfitted when it is unable to learn the relationships in the data used for its training. To avoid this, the dataset is split into two parts, the training set, and the testing set. 80% of the dataset was used for the training of the models and 20% for testing. A thorough evaluation of the predictive performance of the model cannot be achieved with the same data used for its training, therefore after the training process, the test set (data that the model has not seen before) is used for its evaluation. Regularization a method that penalizes generalization error and cross-validation were employed as well to reduce the possibility of overfitting and improve model accuracy as well [60].

2.2.2.9 ANN architecture

Utilizing nodes, similar to the neurons in the human brain in the analysis and generation of output, artificial neural network (ANN) is a computing system modelled after the human brain. They find utility in many different domains, including research, medicine, engineering, and architecture, where their ability to solve complicated issues is necessary. They can convert data into information that is understandable by humans by extracting noteworthy features from huge databases [61, 62]. The architecture of a neural network plays a crucial role in the network’s ability to decipher the complex relationships that potentially exist with the dataset and extract meaningful information from it. The number of layers, number of nodes, and the general design of the network itself affects its invariance and complexity [63]. To aid in the selection of the right set of hyperparameters for each model, an optimization toolkit was employed. An optimization toolkit is a tool that aids in the selection of the right set of hyperparameters for a neural network. It achieves this by running several iterations of the training of the model changing hyperparameters at each turn until it identifies the model that performed best. After identifying the best model, it takes the hyperparameters used in training it and trains our model with it. Table 1 shows the ANN architecture of the models used in the study.

Table 1 shows the makeup of all ANN models in the study, a total of 8 ANN models were generated for each index used in the study. Each model has 15 input neurons representing the variables from the dataset there is also just one output neuron due to regression being the focus of the study. However, there are differences in the number of neurons in the hidden layers for each model. ANN-KR has 16 neurons in the first hidden layers and 64 in the second. ANN-MH has 64 neurons in the first, and 16 in the second, while ANN-PI has 32 neurons in the first, and 16 in the second. ANN-PIG has 16 neurons in the first hidden layer and 64 in the second just like ANN-KR, ANN-RSC has 64 neurons in the first hidden layer and 32 in the second hidden layer. ANN-SAR also has 16 and 64 neurons in the first and second hidden layers respectively. Finally, ANN-SSP and ANN-Na% 32, 16 and 16, 32 neurons in their first and second hidden layers respectively.

2.2.2.10 Model evaluation

Evaluation metrics are statistical measures employed in the assessment of the performance and effectiveness of a ML model. They give a more holistic understanding of the performance of the model and errors it generated. It also provides a meaningful way to compare different ML models. There are a lot of metrics that could be used in the assessment of ML models; however, it is important to ensure that the appropriate evaluation metric is used [64]. In this study, RMSE, R squared, and MAE were used. Root Mean Square Error (RMSE) measures the average difference that exists between a model’s predicted values and its actual ones. It provides insight about the distribution of prediction errors. Lower RMSE indicate a more suitable model [51]. RMSE is calculated using Eq. 13.

$$RSME = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \widehat{{y_{i} }}} \right)^{2} } .$$
(13)

R value or the coefficient of determination measures the ability of the model to replicate observed outcomes. It achieves this by explaining the total variance of outcome the model is able is explain [65]. It is determined using Eq. 14.

$$R^{2} = \frac{RSS}{{TSS}}.$$
(14)

Mean Absolute Error measures the average variance that exist between the absolute values in the dataset and the predicted. This calculates the average of the residuals within the dataset. The lower the MAE, the higher the accuracy of the model [65]. The determination of MAE is achieved through Eq. 15.

$$MAE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {P_{i} - M_{i} } \right|.$$
(15)
2.2.2.11 Parameter optimization

In both ML and DL, the output a model produces is dependent on the parameters within the model. Output from models may not be the best they can be due to the fact that these parameters are set to default values. To gain the best results, hyperparameters need to be tuned to optimum values. Proper hyperparameter tuning can result in better accuracy, prevent overfitting and even make the model more robust [66]. In this study, cross validation was employed to ensure the data the model was trained on was balanced, this was achieved through k-fold cross validation [67]. Hyperparameter tuning was achieved through an optimization toolkit. The toolkit runs possible iterations of the models trying out random parameters in each case until it finds a model with the best results. It then obtains the optimum parameters and then trains the model on these parameters [68].

3 Results

3.1 Water quality indices

The results (Table 2) of the analysis show that all water samples are suitable for irrigation in terms of SAR (100% samples classed as “Excellent”), SSP (100% classed as “Good”), RSC (100% classed as “Suitable”) and KR (Fig. 3a; 100% classed as “Suitable”).

Table 2 General statistics of IWQIs from the study area
Fig. 3
figure 3

Distribution maps of A KR B MH C Na% D PI E PIG

The results of the MH analysis show a range from 17.07 to 95.04 with an average of 56.42. This translates into 30.48% of samples being classed as good while the majority of the samples (69.52%) fall under the unsuitable category. The North (Mengwe) and northeast surrounding Tayiri are characterised as good. The remaining portion of the region contains borehole samples that are categorised as unsuitable for irrigation purposes (Fig. 3b).

Results from the analysis shows a minimum Na% value of 7.93 and a maximum of 99.70, with an average of 37.58. 15% of samples are categorised as excellent, 43.81% as good and 32.38% as permissible. The results further show that 5.71% are poor whiles only 2.86% are classed as unsuitable for irrigational purposes. Both Chasea and Tiza have comparable challenges with their groundwater supplies for irrigation due to high sodium content. These sites have samples which are classified as unsuitable and of poor quality. The centre region of the area has groundwater sources that are classified as having excellent quality for irrigation purposes (Fig. 3c). The overwhelming majority of samples (91.19%) show the groundwater sources were suitable as irrigable water as per the Na% index.

The continual use of irrigable water has an accumulative impact on soil permeability. Therefore, quantifying PI serves as an important metric for evaluating irrigation water quality. This index is controlled by HCO3, Ca2+, Mg2+ and Na+. The results show an average of 68.31, the majority of the samples (69.62%) fall under the “good” category with only 29.52% and 2.86% falling under the “moderate” and “poor” categories respectively. The areas surrounding Chasea are characterised by poor to moderate groundwater resources. Similarly, the northern portion of the region, including Kunfabiala, Siroo, Kparesaga, and Ga, has moderate groundwater quality (Fig. 3d).

The pollution index of groundwater (PIG) for samples similarly paints a favourable picture for the use of groundwater from the area for irrigational purposes. The values for PIG fall between 0.20 and 2.26 with an average of 0.47. The majority of the sample (96.19%) show insignificant pollution while only 4 sample points in general are classed from low pollution to high pollution (1 sample). Only Mengwe in the northern part of the area has groundwater samples identified as having moderate to high pollution (Fig. 3e).

3.2 Correlation matrix analysis

The correlation analysis was undertaken to evaluate the correlations between the dependent variables which are the water quality indices and the independent variables (Total Hardness, Alk, TDS, Mg, Ca, K, Na, HCO3, CO3, SO4, Cl, NO3, F and pH) and also among independent variables, as shown in Fig. 4.

Fig. 4
figure 4

Inter-correlation matrix of the selected water quality variables and the independent variables

pH only shows a positive correlation with CO3 and MH. Cl and SO4 are both positively correlated with each other and with NO3. F does not show a strong positive correlation with any of the other parameters. NO3 shows a positive correlation with TDS, Mg, Ca, SO4 and Cl. SAR on the other hand is only positively correlated with KR. PI is only positively correlated with dependent variables of KR, Na%, RSC and SSP. It however shows a significant negative correlation (− 0.69) with PIG.

A total of 24 models were developed for the study, 8 each for ANN, GBR, and SVR. Table 3 shows the result for each model after testing which include the coefficient of determination (R square), root mean square error (RMSE), and mean absolute error (MAE). The results indicate that all models were able to predict the indices with high level of accuracy. SVR-RSC produced the highest R square with a score of 0.99, an RMSE of 0.09, and MAE of 0.07. GBR-RSC reported the lowest R square with a score of 0.60, RMSE of 0.60, and MAE of 0.32. Overall, models from SVR produced higher R square scores with 6 out of the 8 models reporting R squares above 0.90. SVR models produced low RMSE and MAE when compared to ANN and GBR. This indicates that SVR is a better predictor compared to ANN and GBR. Models from ANN produced better R square scores with 4 out of 8 models with R squares above 0.90. ANN overall had similar RMSE and MAE values as GBR indicating that ANN is a better predictor when compared to GBR.

Table 3 Performance and best parameters of all models after testing

A comparison between the actual and predicted values of all models is presented in Figs. 5, 6, 7, and 8. Figures 5a–f, 6a, b depict that of ANN models for the indices KR, MH, Na%, PI, PIG, RSC, SAR, and SSP respectively. Figures 6c–f, 7a–d show that of GBR models and Figs. 7e, f, 8a–f SVR models all in the same order. Expectedly the results from Figs. 5, 6, 7, and 8 mirror the results from Table 3. Data points from SVR models are closest to the regression line, indicating a high accuracy in prediction, validating the high R square score from the SVR models. Data points from GBR-RSC are sparsely arranged on the chart with many data points far away from the regression line, indicating a lower accuracy when compared to other models and this supports results from Table 3 as well, GBR-RSC produced the lowest R square score.

Fig. 5
figure 5

A scatterplot comparing the actual or observed values against the predicted in the testing set for ANN models a KR, b MH, c Na%, d PI, e PIG, and f RSC

Fig. 6
figure 6

A scatterplot comparing the actual or observed values against the predicted in the testing set for ANN models a SAR and b SSP and c GBR models KR, d MH, e Na%, and f PI

Fig. 7
figure 7

A scatterplot comparing the actual or observed values against the predicted in the testing set for GBR models a PIG, b RSC, c SAR, and d SSP and SVR models e KR and f MH

Fig. 8
figure 8

A scatterplot comparing the actual or observed values against the predicted in the testing set for SVR models a Na%, b PI, c RSC, d PIG, e SAR, and f SSP

Figures 9, 10, 11, and 12 provide integral information regarding the training and testing process. It depicts the prediction trends of both the training and testing phases and the errors generated during each process. The trends of ANN models are shown in Figs. 9a–f, 10a, b, that of GBR Figs. 10c–f, 11a–d, and SVR Figs. 11e, f, 12a–f. The results indicate no major deviation in the trends between the training and testing sets of all models and this suggests that the likelihood of overfitting occurring is quite low. Results from Figs. 10c–f, 11a–d also indicates that GBR produced the most errors during its testing phase, this can be observed through the differences in errors generated in the training phase when compared to the errors produced in the testing phase. This correlates with the results from Table 3, errors negatively impact the accuracy of the model thereby reducing R square score.

Fig. 9
figure 9

A validation curve comparing the errors generated in the training set against errors generated in the testing set for ANN models a KR, b MH, c Na%, d PI, e PIG, and f RSC

Fig. 10
figure 10

A validation curve comparing the errors generated in the training set against errors generated in the testing set for ANN models a SAR and b SSP and GBR models c KR, d MH, e Na%, and f PI

Fig. 11
figure 11

A validation curve comparing the errors generated in the training set against errors generated in the testing set for GBR models a PIG, b RSC, c SAR, and d SSP and SVR models e KR and f Na%

Fig. 12
figure 12

A validation curve comparing the errors generated in the training set against errors generated in the testing set for SVR models a MH, b PI, c PIG, d RSC, e SAR, and f SSP

Figures 13, 14, and 15 provide a platform for understanding individual predictions made by each model, and which variables contribute most to the model output and whether the impact is positive or negative. Results from Fig. 13a depict sodium as the main contributor to ANN-KR’s output. The intensity suggests sodium affects the model’s output positively. Figure 13b shows calcium as the primary contributor with negative intensity, Fig. 13c shows sodium as the primary contributor as well, similar to Fig. 13a. ANN-PI from Fig. 13d has magnesium as its primary contributor whiles ANN-PIG, Fig. 13e, has nitrate as its own. Examination of the rest of the results outlines sodium as the primary driver in relation to model output for 50% of the models considered in the study.

Fig. 13
figure 13

SHAP summary plot indicating feature importance and intensity for ANN models a KR, b MH, c Na%, d PI, e PIG and f RSC

Fig. 14
figure 14

SHAP summary plot indicating feature importance and intensity for a SVR-MH and b GBR-MH

Fig. 15
figure 15

SHAP summary plot indicating feature importance and intensity for a GBR-PI, b SVR-PI, c GBR-PIG, d SVR-PIG, e GBR-RSC, f SVR-RSC

4 Discussions

The application of diverse indices to evaluate groundwater suitability for irrigation purposes leads to policymakers having more comprehensive information than they would if they only relied upon individual chemical indices. SAR, RSC, and KR indices used in the Water Quality Indices analysis show that the water samples were good enough for irrigation in most cases. On the other side, the MH index shows that more than two-thirds of the samples (69.52%) is considered “Unsuitable” and could involve soil quality outcomes from high Mg2+ concentration as well. The strong correlation between Mg2+ with NO3, SO4, Cl, HCO3 and Ca, suggests a combined influence of geogenic and anthropogenic controls. NO3, SO4 and Cl are known pollution markers [69] which could emanate from the farm inputs used by the farmers in the area, while HCO3 and Ca may come from rock weathering or interaction with the groundwater [1, 11]. Sodium (Na%) analysis mostly indicates that they are suitable for irrigation but sites like Chasea and Tiza may face some challenges due to high sodium content reflecting localized soil permeability issues. These results are similar to findings by [1, 70] who found the majority of the groundwater sources to fall within good to permissible categories in groundwater in the Birimian and Granitoids of southwestern, Ghana. This is considered good since high Na% deflocculates and impairs soil permeability [71]. However, according to Singh et al. [72] the high Na+ found in some water sources may be due to inorganic fertilizer application and the dissolving of minerals from the underlying geology.

The Permeability Index (PI) that is controlled by HCO3, Ca2+, Mg2+, and Na+ values, generally shows that the majority is in a moderate to poor condition, except for a few places including Chasea and northern part of the area. The Pollution Index of Groundwater (PIG) indicates that groundwater quality in the study area is generally good for irrigation. The moderate to high pollution levels identified at Mengwe in the north of the study area, suggests an impact of the high level of illegal mining, which is characteristic of the area. This coupled with agricultural activities may be contributing to the deteriorating quality of irrigable water in the area.

Employing the use of 3 different ML models, ANN, GBR, and SVR, the study sought to ascertain whether IWQIs could be predicted using samples collected from the field. The performance of the model was determined using coefficient of determination (R2), root mean square (RMSE), and mean absolute error (MAE). The results from the study suggest strong predictive abilities by ANN, GBR, and SVR. This conclusion is drawn based on the overall performance of all models which is found acceptable based on the high coefficient of determination [73].

Comparing the performances of ANN models only, results show ANN-SAR as the best performing model due its high R2 of 0.96 and low RMSE and MAE of 0.07 and 0.05. The lowest performing models within ANN were ANN-MH and ANN-PIG with an R2 of 0.72 for both, RMSE of 5.71, and MAE of 4.36 for MH and RMSE of 0.06 and MAE of 0.04 for PIG, making ANN-MH overall the worst performing model in the group.

Results from GBR reveal GBR-SAR and GBR-PIG as the best performing models with high R2 of 0.91 each and RMSE and MAE of 0.10, 0.07 and 0.04, 0.02 respectively, making ANN-PIG overall the best performing model in the group. GBR-PI produced an R2 of 0.90 and RMSE and MAE of 5.40, 3.74 respectively. GBR-SSP had an R2 of 0.87 with RMSE of 2.78 and MAE of 1.83 while GBR-MH produced an R2 of 0.86, RMSE of 4.14, and MAE of 2.91. Both GBR-KR and GBR-Na% produced 0.85 R2 and for RMSE and MAE, 0.07 and 0.05 for KR, and 6.46 and 4.72 for Na%. GBR-RSC exhibited relatively low R2 value compared to the rest of the models with a value of 0.60, RMSE of 0.06 and MAE of 0.32.

SVR-RSC demonstrated excellent performance R2 of 0.99, RMSE of 0.09, and MAE of 0.07. SVR-SSP, R2 of 0.98, RMSE of 1.42, and MAE of 1.13, SVR-PIG, R2 of 0.98, RMSE of 0.04, and MAE of 0.03, SVR-SAR, R2 of 0.97, RMSE of 0.06, and MAE of 0.05, SVR-PI with R2 of 0.97, RMSE of 2.80, and MAE of 2.0, and SVR-MH, R2 of 0.95, RMSE of 2.41, and MAE of 1.94. In the lower end of the SVR models were SVR-KR and SVR-Na%. SVR-Na% had R2 of 0.89, RMSE of 5.60, and MAE of 3.90 while SVR-KR had an R2 of 0.86 with RMSE 0.06 and MAE 0.06 making SVR-KR the worst performing model.

The key observations made from the study with regard to model performance are that SVR models generally outperformed ANN and GBR and had 6 out 8 models performing high with R2 of at least 0.95. This finding closely mirrors the findings from [74, 75] where SVR was also found to have outperformed other models used in their studies indicating a consistency with previous publications. GBR models performed variably with good performances in indices like SAR, PIG, and PI but average in the others, and even bad in the case of RSC therefore lacking consistency as compared to ANN and SVR. Another key observation was ANN, GBR, and SVR all exhibited high predictive performance for the index SAR with R2 of 0.96, 0.91, and 0.97 respectively. This highlights SAR as a consistently high performing index across different predictive techniques, this finding is consistent with results from Mokhtar et al. [76] and Bilali and Taleb [77].

The importance of a predictive variable in terms of model output and its intensity was determined using the SHAP summary plot. Findings indicate sodium as the driving influence with respect to 50% of the models generated in the study. The relationship between sodium and model output was determined to be positive. The means that an increase in sodium would mean an increase in model output and a decrease in sodium would mean a decrease in model output. This highlights the importance of sodium in the evaluation of the quality of water for irrigation purposes. Calcium is outlined as the most important feature in models applied on MH reflecting its importance in the model output of ANN-MH, GBR-MH, and SVR-MH but calcium was found to be negatively correlated with model output per Figs. 13b, 14a, b. This means that a decrease in calcium would cause an increase in model output and an increase in calcium would mean a decrease in model output. The results also show magnesium as the major contributor to all models involving PI, ANN-PI, GBR-PI, and SVR-PI. The relationship observed here is also negative, indicating that an increase in magnesium would spell a decrease in model output and vice versa. 2 models namely ANN-PIG and SVR-PIG had nitrate as the main driver, but GBR-PIG had TDS as its first and nitrate as the second making GBR-PIG a standalone result. The relationship observed in all 3 cases though was positive. A similar finding is observed in the case of models involving RSC where ANN-RSC and SVR-RSC both had magnesium as the main contributor to model output, but GBR-RSC had nitrate as its primary contributor to model output and magnesium placing second. The relationship observed in all 3 cases though was negative.

The study outlines SVR and ANN as a highly effective tool for predicting IWQIs, offering really accurate and reliable predictions across a wide range of IWQIs. This opens its potential as a highly effective method for water quality assessment concerning irrigation. GBR, though effective isolating its results pales in comparison to ANN and SVR and therefore may need some form of optimization to reach the performance levels of ANN and SVR and through the SHAP summary plot sodium, calcium, and magnesium are identified as the primary drivers for the prediction of IWQIs.

5 Conclusion and recommendations

In conclusion, the irrigation water quality indices show that all the water samples are fit for use as irrigable water as per SAR, SSP, RSC and KP. However, MH, Na%, PI and PIG showed that 69.52%, 8.57%, 29.52% and 3.81% are unsuitable as irrigable water. It is noteworthy that samples from Chasea and Tiza have been classed as unsuitable for use as irrigation water as per Na% and PI. This suggests a particularly significant influence of anthropogenic factors on the water in the area. These areas must therefore be monitored closely to ensure that there is no irreparable damage done to the groundwater sources available there.

This study illustrates the application of multi-method machine learning models in estimating groundwater suitability for irrigation purposes. The results lay the groundwork for the selection of suitable models based on the needs of the application and the features of the dataset, which will ultimately help policymakers and managers of water resources make well-informed decisions for efficient management of water quality. This is demonstrated by the designation of water quality indices on areas like Chasea and Tiza which clearly represent poor water quality. The findings show that policies should be formulated in order to reduce anthropogenic impacts especially from illegal mining and agricultural practices in certain parts of the area, to secure the area’s groundwater supply, promote sustainable water usage, and guarantee long-term water safety for the agricultural sector.

Policy makers and workers in the agricultural sector in the area can also gain from the knowledge of the suitability of groundwater for irrigation to make more informed decisions regarding the selection of agricultural crops to grow and soil types to cultivate on. Based on the analysis of several indices most groundwater samples are satisfactory for irrigation purposes but there are parts of the area which can be marked as contaminated and further study and control measures are needed due to human activities. This study demonstrates that machine learning models can provide good predictions of water quality indices and can be further used as a tool for future research or direct application for water quality management. The direct way the human activities affect groundwater quality in some parts of the area necessitates the need to address these particular activities in order to help in conserving this resource.

5.1 Study limitations

It is imperative to recognize the limits of the research, such as the dependence on past data and the presumption of stationarity in the metrics pertaining to water quality. Subsequent investigations may examine the integration of data in real-time and tackle the obstacles posed by dynamic environmental circumstances. Predictive accuracy may also be improved by more research into ensemble approaches and hybrid models that combine the advantages of many algorithms. Elimination of as many outliers as possible and improvement in the quality of data may also provide more insight into how they can affect the accuracy of models.