1 Introduction

It is believed that groundwater is one of the purest forms of water, being situated several meters below the surface of the earth and protected from a number of degrading environmental conditions. The exploration of groundwater originated as a result of a pressing need for an alternative source of water for drinking, irrigation, industrial use, laundry, and other purposes. Over the past decades, groundwater has been increasingly exploited all over the world. According to reports by water researchers, over time, groundwater has become the number one source of water for most developed cities all over the world (Alizamir & Sobhanardakani, 2017a, 2017b; Egbueri et al., 2021a; Wagh et al., 2016). Therefore, studies on groundwater resources are vital for its sustainability and the well-being of water users. Water resources are faced with potential contaminants from anthropogenic and non-anthropogenic origins. Studies have shown that the rate of degradation of groundwater resources is increasing (Ayejoto et al., 2022; Egbueri et al., 2021b, 2022a, 2022b; Papazotos, 2021; Ravindra et al., 2022; Wagh et al., 2016). To mitigate the degradation of water resources, potential sources, pathways, and future possibilities of contamination need to be identified. Numerous data-driven (numerical, graphical, statistical, and machine learning) approaches have been applied to identify the possible sources of contamination (Wagh et al., 2016, 2017b; Ansari & Umar, 2019; Egbueri, 2019, 2020; Enyigwe et al. 2021), pathways of contaminants (Egbueri & Agbasi, 2022a, 2022b; Wang et al., 2012; Yang et al., 2020), and to forecast the chances of reoccurrence of these contaminants in water resources (Wagh et al., 2017b; Alizamir & Sobhanardakani, 2017a, 2017b; Egbueri & Agbasi, 2022b).

Groundwater quality is graded after consideration of various water quality parameters like pH, EC (electrical conductivity), TDS (total dissolved solids), cations, anions, metals, amongst others (Subba Rao et al., 2022a). According to the World Health Organization (WHO, 2017) and Standard Organization of Nigeria (SON, 2015), these parameters have an acceptable concentration level in water resources. When found above their acceptable concentration level, they are considered to be hazardous. Moreover, the presence of essential water quality parameters like copper, manganese, iron, and others below their required concentration level may lead to deficiencies (Eghbaljoo-Gharehgheshlaghi et al., 2020; Kumar et al., 2022; Saleem et al., 2022). Due to the numerous parameters considered to determine the overall quality of water resources, water quality indices were introduced. Water quality indices (e.g., unweighted multiplicative water quality index, national sanitation foundation water quality index, overall index of pollution, synthetic pollution index) integrate data of water quality parameters from analyzed water samples and come up with a quantitative description of water resources. The quantitative description is interpreted qualitatively with the aid of a classification scheme. Each water quality index has a different classification scheme. Nevertheless, most have a strong agreement (Egbueri & Agbasi, 2022b). Since the quality of groundwater resources is rated by the concentration levels of various water quality parameters, prediction of future occurrences of these parameters will enhance the forecasting of groundwater quality. Similarly, since water quality indices compute the overall water quality of water resources, forecasting these indices would provide more detailed information on the possible future state of water resources. 
In other words, forecasting water quality indices and parameters could assist in the effective assessment and monitoring of water resources (Agbasi & Egbueri, 2022; Wagh et al., 2017b).

Traditionally, water quality in different parts of the world has been monitored using field sampling and laboratory testing. However, this process has been hampered by labor and testing costs. The adoption of deep learning in water research has sparked great revolutions and innovations with regard to the assessment and monitoring of water quality. Deep learning is an important component of data science, which also includes statistics and predictive modelling. It assists researchers who are tasked with collecting, analyzing, and interpreting large datasets (Burns & Brush, 2021). Deep learning neural networks, which include artificial neural networks (ANNs), recurrent neural networks, and convolutional neural networks, have been used in the predictive modelling of water quality parameters (Alizamir & Sobhanardakani, 2017a, 2017b; Egbueri, 2021; Wagh et al., 2016). Moreover, other data-intelligent models like linear regression, multiple linear regression (MLR), support vector machines, amongst others, have also been utilized globally to predict various water quality parameters. The application of deep learning and data-intelligent models has significantly reduced the cost of monitoring and assessment of water quality. 
Studies conducted include the prediction of pH in water (Egbueri & Agbasi, 2022b; Huang et al., 2019; Son et al., 2021; Stackelberg et al., 2020), prediction of TDS in water (Egbueri & Agbasi, 2022b; Jamei et al., 2020; Mehrdadi et al., 2012; Salmani & Jajaei, 2016), prediction of TH in water (Azad et al., 2018; Egbueri & Agbasi, 2022b; Roy & Majumder, 2018), prediction of anions in water (Egbueri, 2021; Mousavi & Amiri, 2012; Wagh et al., 2017b; Yesilnacar et al., 2008; Zare et al., 2011), prediction of cations in water (Aghel et al., 2019; Bondarev, 2019; Katimon et al., 2018; Nhantumbo et al., 2018; Subba Rao et al., 2022b), prediction of metals in water (Alizamir & Sobhanardakani, 2017a, 2017b; Egbueri, 2021; Fard et al., 2017; Ozel et al., 2020; Rooki et al., 2011), and prediction of water quality indices (Chia et al., 2022; Egbueri, 2022a, 2022b).

ANN is a powerful tool designed to mimic the neural functions of the human nervous system (Wagh et al., 2017a). As a result, ANN has the ability to learn a dataset, and its learning ability aids in simulating complex nonlinear relationships (Agatonovic-Kustrin & Beresford, 2000; Saljooghi & Hezarkhani, 2015; Song et al., 2022; Uncuoglu et al., 2022), making it possible to produce meaning out of a dataset in a short period of time. The general structure of the ANN consists of an input layer, a hidden layer, and an output layer, each with numerous neurons (Diamantopoulou et al., 2005; Pandey et al., 2016; Rai et al., 2005). ANN has been shown to be effective in the prediction of water quality parameters in many regions of the world (Egbueri 2022b; Irvan et al., 2022; Kouadri et al., 2022). This tool has also been undoubtedly valuable in studies related to other disciplines. MLR is also a learning tool that aids in depicting the linear relationship between two or more variables. Thus, the MLR is regarded as an advanced form of simple linear regression. This model has found applications in a good number of research studies. Notably, in water resources research, MLR has been utilized successfully for the predictive modelling of water quality parameters (Agbasi & Egbueri, 2022; Kouadri et al., 2021).

Due to the toxicity and bio-accumulative nature of potentially toxic elements (PTEs), their presence in water has received special attention. In central Iran, Bayatzadeh Fard et al. (2017) predicted Fe, Mn, Pb, and Zn in groundwater using ANN, hybrid ANN with biogeography-based optimization, and a multi-output adaptive neuro-fuzzy inference system. Ucun Ozel et al. (2020) used ANN and an adaptive neuro-fuzzy inference system to model copper, iron, zinc, manganese, nickel, and lead in the Bartin River, Turkey. In Nigeria, Egbueri (2021) predicted NO3, Ni, and Pb in water using ANN. Using ANN, Kanj et al. (2022) predicted mercury in groundwater at Naameh Landfill, Lebanon. Based on literature analysis, ANN and MLR are the most commonly used predictive models for monitoring and assessing water quality. In spite of the popularity of ANN and MLR in water research, the following gaps were identified from a literature review, and they formed the basis of the present prediction study: (1) the majority of the studies that applied machine learning algorithms in water quality assessments focused on the prediction of water quality indices (i.e., the numerical indicators); (2) only a few studies have compared the performances of ANN and MLR in forecasting PTEs; and (3) the few studies that have utilized MLR and ANN to predict PTEs have not utilized sufficient input variables, which are one of the major determinants of accurate prediction. For instance, Ghadimi (2015) combined MLR and ANN for predicting Pb, Zn, and Cu in water. However, only HCO3 and SO42– were used as input variables. Despite using sufficient input variables, Egbueri and Agbasi (2022a) predicted two water quality indices but did not predict PTEs in water. Similarly, Farooq et al. (2022) employed MLR and ANN for predicting water quality parameters; however, PTEs were not predicted. In the study region, Egbueri (2021) predicted PTEs in water using only ANN and also used only a few input variables for the prediction. 
Although some studies in other regions of the world have tested the applicability of MLR and/or ANN in the prediction of PTEs in water resources, there is a dearth of literature that has simultaneously tested or implemented these algorithms for the same purpose in the Nigerian context.

Therefore, the present study aims at predicting PTEs (Cr, Fe, Ni, NO3, Pb, and Zn) in groundwater in Southeastern Nigeria using more input variables. This study integrated and compared the efficacy of ANN and MLR algorithms. Additionally, the drinking suitability of the groundwater resources was evaluated by computing the water pollution index (WPI) and pollution index of groundwater (PIG), which were in turn predicted using the MLR and ANN techniques. The sensitivity of the input variables utilized for this prediction was also analyzed to determine their significance and impact. It is anticipated that the present study will provide clearer insights into effective groundwater sustainability management and protection in the area. It is also hoped that the findings of the study could provide baseline information for future related research in Nigeria and other regions of the world with research gaps in water quality prediction.

2 Background information of the study area

2.1 Location and human activities

The study area is located between latitudes 06°00′N and 06°05′N and longitudes 06°50′E and 07°00′E. Awka-Etiti, Oba, Ojoto, Nnewi, and Nnobi are amongst the communities found within the study region. The sampling points are shown in Fig. 1. There is a high rate of industrialization and urbanization in southeastern Nigeria. Thus, the majority of people in the region make a living from commercial activities. Industrial activities include the manufacturing of agrochemicals, food processing, production of building materials, textile production, petrochemical production, etc. Agricultural activities are also prevalent in the locality, and they include the cultivation of land and planting of different crop varieties, livestock farming, fish farming, nomadic farming, etc. Commercial activities in the area involve the trading of goods produced by the agricultural, industrial, and other sectors within the region and neighboring communities. They also involve import and export activities with foreign countries. The human activities in the study region help to boost the local economy and also make the lives of the people meaningful. However, there are some negative outcomes linked with these activities. For instance, wastes and agrochemicals from diverse farming practices cause water and air pollution. The release of untreated waste materials from industries exposes the environment to PTE pollution. Due to the numerous human activities occurring in the study area, there is an increase in waste generated per capita. Waste disposal facilities in the area are insufficient. Thus, there is indiscriminate disposal of waste materials, which in turn degrades the environment. Over time, PTEs from waste materials get into the water cycle and accumulate, leading to the contamination of water resources in this region.

Fig. 1 Map of the study area showing the geographical location and the geologic formations

2.2 Climate, geology, and hydrogeology

The wet/rainy season and the dry season are the two major seasons experienced in this area. The wet season lasts from April to November and is usually cloudy and cool. The dry season lasts from December to March and is usually hot, partly cloudy, and humid. The yearly average temperature is around 75 °F, with temperatures rarely exceeding 90 °F or falling below 57 °F. The northeastern wind blowing through the Sahara and the southwest trade winds from the Atlantic induce the two main seasons (Egbueri et al., 2022a). The annual rainfall in the region has been estimated to be between 1500 and 2000 mm (Nwajide, 2013). The area has a non-uniform topography, as it rests at the apex of the gently sinking portions of the well-known Awka-Orlu Ridge.

The Eocene Nanka Formation and the Oligocene–Miocene Ogwashi Formation are the dominant geologic formations present in the area (Fig. 1). The Nanka Formation originates from the compressive movements that were fundamental in the folding of the Abakaliki Anticlinorium, which happened during the Campanian–Eocene (Nwachukwu, 1972). Retrograded deposits subsequent to the folding are the components of the current Nanka Formation (Nwajide, 2013). Towards the conclusion of the Eocene's massive tectonic movement, the Ogwashi Formation was deposited (Kelechi, 2017), and the sediments in the depocenters moved down to form the popular Niger Delta. The present study area is dominated by the Nanka Formation, which is distributed across its eastern and central parts (Fig. 1). The Nanka Formation is rich in claystones, friable sands, fine-grained fossiliferous sandstone, shales, thin bands of limestone, and sandy shales (Arua, 1986; Reyment, 1965).

Reports on the aquifer features of the Nanka Formation by Okoro et al. (2010a, 2010b) show that the aquiferous units are prolific. In chronological order, the Ogwashi Formation is made up of light-colored mudrocks, coarse-grained sandstone, and lignite seams (Kogbe, 1976). The Formation is identified with two main prolific aquifer systems (Akpoborie et al., 2011). The most desired aquifer unit, which is identified as the alluvial trace deposit, is situated at a shallower depth. The less preferred aquifer unit, however, is found at a greater depth and contains water rich in iron (Akpoborie et al., 2011).

3 Materials and methods

3.1 Water sampling and analytical procedures

To achieve the research objectives, groundwater samples were collected from boreholes (n = 17) and hand-dug wells (n = 3) within the study area. Water samples were collected in 1 L polyethylene bottles, and the locations of the sampling points were determined using a handheld GARMIN GPSMAP 78S series unit and are shown in Fig. 1. At each sampling point, the bottles were labelled and stored in a cooler. After the samples were gathered, they were arranged appropriately and sent to the laboratory for analysis. Pre- and post-sampling procedures adhered to the standard proposed by the American Public Health Association (Rice et al., 2017). Handheld tools (Testr-2, EC meter, HM Digital COM-100, TDS meter) were used in situ to measure pH, EC, TDS, and total suspended solids (TSS), respectively. Cations (calcium, sodium, magnesium, potassium), anions (bicarbonate, chloride, nitrate, sulphate), and metals (zinc, iron, nickel, lead, chromium), as well as the total hardness (TH), were analyzed in the laboratory. The methods used for the analysis of the parameters are presented in Table 1.

Table 1 Techniques used for analysis of cations, anions, and metals

3.2 Indexical methods for groundwater pollution and quality assessment

3.2.1 Computation of water pollution index (WPI)

The water pollution index (WPI) was developed by Hossain and Patra (2020). In comparison to the existing methods, this model is more adaptable and simpler to calculate. Water quality is determined primarily from the standard permissible concentration (Si) and the observed concentration (Ci) of each parameter. The WPI was designed to accommodate any number of input parameters, allowing for extensive research. Using 17 input parameters, this model was used to estimate the impact of COVID-19 on water quality in India (Chakraborty 2021). In this study, 18 parameters (pH, EC, TDS, TSS, TH, Na+, K+, Ca2+, Mg2+, Cl, SO42–, HCO3, NO3, Fe, Zn, Ni, Cr, and Pb) were used to calculate the WPI values of 20 groundwater samples from southeastern Nigeria. To apply the WPI model, the following steps were taken:

Step 1 The pollution load (PLi) of the ith parameter was calculated using the mathematical formula in Eq. 1.

$${PL}_{i} = 1 +\left(\frac{Ci -Si }{Si}\right)$$
(1)

where Ci means concentration of the ith parameter, Si represents the standard permissible limit of the ith parameter. In this study, the permissible limits of water quality parameters approved by the WHO (2017) were followed.

Because pH has both a minimum and a maximum acceptable value, PLi for pH is calculated differently, depending on whether the measured pH is below or above 7 (Hossain and Patra, 2020).

If the pH is < 7, then Eq. 2 is recommended (Hossain and Patra, 2020).

$${PL}_{i} = \left(\frac{Ci -7 }{{Si}_{a}-7}\right)$$
(2)

where Sia is the minimum acceptable pH value, i.e., 6.5.

If the pH is > 7, then Eq. 3 is recommended (Hossain and Patra, 2020).

$${PL}_{i} = \left(\frac{Ci -7 }{{Si}_{b}-7}\right)$$
(3)

where Sib is the maximum acceptable pH value, i.e., 8.5.

Step 2 The final WPI scores of the water samples were calculated by adding all PLi values from n parameters and then dividing by n (Eq. 4). Where the concentration of any analyzed parameter is zero, that parameter should be subtracted from the total number of parameters (n) in that sample (Hossain & Patra, 2020).

$$\mathrm{WPI}= \frac{1}{n} {\sum }_{i=1}^{n}{\mathrm{PL}}_{i}$$
(4)

The WPI classification schemes for water samples are as follows: excellent water (WPI < 0.50), good water (0.75 > WPI ≥ 0.50), moderately polluted water (1.00 ≥ WPI ≥ 0.75), and highly polluted water (WPI > 1).

It is also worth noting that WPI does not require weightage assignment for the calculation. This eliminates the bias associated with indices that require weighting.
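
As an illustration, the WPI steps (Eqs. 1 to 4) can be sketched in Python as follows. This is a minimal sketch and not the procedure actually used in the study: the function names, the example concentrations, and the example limits are hypothetical placeholders, and treating a pH of exactly 7 as a zero load is an assumption (Eqs. 2 and 3 both evaluate to zero at that value).

```python
# Illustrative sketch of Eqs. 1-4; all names and example values are hypothetical.

PH_MIN, PH_MAX = 6.5, 8.5  # acceptable pH bounds quoted in the text (Sia, Sib)


def pollution_load(name, conc, limit=None):
    """Pollution load PLi of one parameter (Eq. 1, or Eqs. 2-3 for pH)."""
    if name == "pH":
        if conc < 7:
            return (conc - 7) / (PH_MIN - 7)  # Eq. 2
        if conc > 7:
            return (conc - 7) / (PH_MAX - 7)  # Eq. 3
        return 0.0  # assumption: neutral pH carries no load
    return 1 + (conc - limit) / limit  # Eq. 1


def wpi(sample, limits):
    """Mean pollution load over parameters with non-zero concentration (Eq. 4)."""
    loads = [
        pollution_load(name, conc, limits.get(name))
        for name, conc in sample.items()
        if conc != 0  # zero-concentration parameters are removed from n
    ]
    return sum(loads) / len(loads)


def classify_wpi(score):
    """Classification scheme quoted in the text."""
    if score < 0.50:
        return "excellent"
    if score < 0.75:
        return "good"
    if score <= 1.00:
        return "moderately polluted"
    return "highly polluted"


# Hypothetical sample and limits, for illustration only.
example_sample = {"pH": 6.0, "TDS": 250.0, "Pb": 0.02, "Zn": 0.0}
example_limits = {"TDS": 500.0, "Pb": 0.01, "Zn": 3.0}
example_wpi = wpi(example_sample, example_limits)
example_class = classify_wpi(example_wpi)
```

In this hypothetical example, Zn is excluded because its concentration is zero, so the index is averaged over three parameters rather than four.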

3.2.2 Computation of Pollution index of groundwater (PIG)

Subba Rao (2012) formulated the PIG, which has been successfully used in various locations for monitoring and assessing variations in drinking water quality (Egbueri, 2020; Subba Rao & Chaudhary, 2019; Subba Rao et al., 2018).

Five steps are taken in the evaluation of drinking water quality using PIG (Subba Rao, 2012).

Step 1 This entails assigning the relative weight (Rw) (on a scale of 1–5) of the analyzed parameters based on their individual importance in assessing water quality and relative impact on human health (Table 2; Subba Rao, 2012).

Table 2 Weightage of parameters for the calculation of PIG

Step 2 This incorporates calculating weight parameters (Wp) for each of the water quality variables to determine their relative contributions to the overall quality of the groundwater samples (Eq. 5).

$$\mathrm{Wp}= \frac{{R}_{w}}{\sum {R}_{w}}$$
(5)

Step 3 The status of concentration (Sc) was calculated by dividing the concentration (C) of each of the analyzed water quality variables in each of the water samples by their respective drinking water quality standard limits (Ds) (Eq. 6). In the present study, the WHO (2017) permissible limits were used in the PIG assessment.

$$Sc= \frac{C}{{D}_{s}}$$
(6)

Step 4 The overall groundwater quality (Ow) was calculated by multiplying Wp by Sc, as shown in Eq. 7.

$$\mathrm{Ow}=\mathrm{Wp }\times \mathrm{Sc}$$
(7)

Step 5 The final step in the PIG assessment was to add up all of the Ow values per sample (Eq. 8).

$$\mathrm{PIG}= \sum \mathrm{Ow}$$
(8)

The PIG values represent the contributions of all chemical variables analyzed in each groundwater sample. As a result, they depict various scenarios of the impact of chemical contamination on aquifer systems (Egbueri, 2020; Subba Rao, 2012; Subba Rao & Chaudhary, 2019; Subba Rao et al., 2018). The classification scheme stated by Subba Rao et al. (2018) for the PIG scores characterizes the extent of pollution in groundwater samples as: insignificant pollution (PIG < 1.0), low pollution (1.0–1.5), moderate pollution (1.5–2.0), high pollution (2.0–2.5), and very high pollution (PIG > 2.5).
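
The five PIG steps (Eqs. 5 to 8) can likewise be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, the relative weights, concentrations, and limits in the example are placeholders rather than the Table 2 values, and assigning boundary scores (e.g., exactly 1.5) to the lower class is an assumption, since the quoted ranges share their end points.

```python
# Illustrative sketch of Eqs. 5-8; all names and example values are hypothetical.


def pig(sample, limits, relative_weights):
    """Pollution index of groundwater for one sample."""
    total_rw = sum(relative_weights[name] for name in sample)  # sum of Rw
    score = 0.0
    for name, conc in sample.items():
        wp = relative_weights[name] / total_rw  # weight parameter, Eq. 5
        sc = conc / limits[name]                # status of concentration, Eq. 6
        score += wp * sc                        # Ow (Eq. 7), summed per Eq. 8
    return score


def classify_pig(score):
    """Classification scheme quoted in the text (shared boundaries go to the lower class)."""
    if score < 1.0:
        return "insignificant pollution"
    if score <= 1.5:
        return "low pollution"
    if score <= 2.0:
        return "moderate pollution"
    if score <= 2.5:
        return "high pollution"
    return "very high pollution"


# Placeholder weights, concentrations, and limits, for illustration only.
example_weights = {"Pb": 5, "TDS": 3, "Ca": 2}
example_sample = {"Pb": 0.02, "TDS": 250.0, "Ca": 75.0}
example_limits = {"Pb": 0.01, "TDS": 500.0, "Ca": 75.0}
example_pig = pig(example_sample, example_limits, example_weights)
example_class = classify_pig(example_pig)
```

Unlike the WPI, the result depends on the chosen relative weights, so the same concentrations can yield different PIG scores under a different weighting.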

3.3 Simulation and prediction of PTEs and water quality indices

3.3.1 Multiple linear regression modelling

Statistical methods, such as regression models, are among the most effective tools for investigating any relationship between a sample's dependent and independent variables (Pai et al. 2007; Abyaneh, 2014). The MLR is a method for modelling the linear relationship between two or more independent variables and a dependent variable. The MLR algorithm is based on the least squares rule. A model is considered best fitted if the coefficient of determination (R2) is close to one and the modelling errors are small. Equation 9 represents the mathematical expression of the MLR model (Chen & Liu, 2015; Gaya et al., 2020; Kadam et al., 2019; Weisberg, 1985).

$$y= {b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{i}{x}_{i}+\varepsilon$$
(9)

Globally, the MLR model has been used to establish the relationship between multiple parameters (Pai et al. 2007; Abyaneh, 2014; Chen & Liu, 2015; Arora & Keshari, 2017). In this study, the MLR algorithm was used to simulate and predict water quality parameters and indices. All of the analyzed physicochemical parameters (i.e., pH, EC, TDS, TSS, TH, Ca2+, Na+, Mg2+, K+, Cl, HCO3, SO42–, NO3, Cr, Zn, Fe, Pb, and Ni) were used as predictors for WPI and PIG. The same input parameters, excluding the variable being predicted, were used to predict NO3, Fe, Zn, Ni, Cr, and Pb. IBM SPSS (v. 22) was used to run the MLR modelling. The performance of the MLR simulations was appraised using the multiple correlation coefficient (R), the coefficient of determination (R2), the standard error of estimate (SEE), and the adjusted R2.
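
For readers who wish to reproduce the performance measures outside SPSS, the following Python sketch computes R, R2, adjusted R2, and SEE from observed and predicted values. It assumes the conventional textbook definitions (R as the correlation between observed and predicted values, R2 as one minus the ratio of residual to total sum of squares, and n − p − 1 residual degrees of freedom for p predictors); the function name and the example data are hypothetical and are not taken from the study.

```python
import math


def fit_metrics(observed, predicted, n_predictors):
    """Return R, R2, adjusted R2, and SEE under the conventional definitions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual SS
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total SS
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))

    dof = n - n_predictors - 1  # residual degrees of freedom
    r2 = 1 - ss_res / ss_tot
    return {
        "R": cov / math.sqrt(ss_tot * ss_pred),
        "R2": r2,
        "adjusted_R2": 1 - (1 - r2) * (n - 1) / dof,
        "SEE": math.sqrt(ss_res / dof),
    }


# Made-up values for illustration only.
example_observed = [1.0, 2.0, 3.0, 4.0, 5.0]
example_predicted = [1.1, 1.9, 3.2, 3.9, 4.9]
example_metrics = fit_metrics(example_observed, example_predicted, n_predictors=1)
```

The function requires more samples than predictors plus one; otherwise the residual degrees of freedom are zero or negative and the adjusted R2 and SEE are undefined.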

3.3.2 Artificial neural network modelling

The artificial neural network (ANN) is a prominent artificial intelligence approach. ANNs are made up of computational processing elements known as neurons, which are similar to biological human neurons (Dongare et al., 2012; Maind & Wankar, 2014; Singh et al., 2022). These neurons are linked to one another through weights. The input, hidden, and output layers of ANNs are formed by these linkages. The weighted inputs, together with a bias, are summed to form the hidden layer, whose weighted outputs are in turn summed to form the output layer (Strik et al. 2005; Wagh et al., 2016; Egbueri, 2021). The predicted output variables are obtained by passing the input variables through these weights and an activation function. Due to the reliability of ANNs in predictive modelling, many water researchers have used them to forecast various water quality variables. This is because ANNs offer versatile linear and nonlinear forecasting functions that can efficiently, correctly, and reliably estimate measurable and continuous variables (Egbueri, 2021). This indicates that ANNs may be used to estimate variables that have considerable or complicated connections. As a result, ANNs are recognized to be more sophisticated than other empirical models used in environmental monitoring and evaluation, with a high computation rate, learning ability, prediction accuracy, and flexibility.
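
To make the layer description concrete, the sketch below shows a forward pass through a network with one hidden layer. It is an illustration under stated assumptions rather than the configuration used in this study: the hyperbolic tangent hidden activation, the identity output activation, the function name, and all weights and inputs are hypothetical.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> one hidden layer (tanh) -> single linear output."""
    hidden = [
        # Each hidden neuron sums its weighted inputs plus a bias, then applies tanh.
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron sums the weighted hidden activations plus its own bias.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with two inputs and two hidden neurons.
example_output = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[1.0, 0.5], [-1.0, 1.0]],
    hidden_biases=[0.0, 0.5],
    output_weights=[2.0, 1.0],
    output_bias=0.1,
)
```

Training consists of adjusting the weights and biases so that outputs computed in this way approach the measured values; that step is omitted here.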

In the current study, ANN was used to predict water quality indices and critical water quality parameters (NO3, Fe, Zn, Ni, Cr, and Pb). Table 3 contains the instructions and procedures for the ANN modelling, which was performed using the IBM SPSS software. The water quality data were divided into two sets: training and testing. To achieve the best results, the models were trained for many iterations. For optimum results, at least 70% of the dataset was used for training (Assi et al., 2018; Egbueri, 2021; Fissa et al., 2019; Ghritlahre & Prasad, 2018; Ozel et al., 2020). While the testing aided in the evaluation of the ANN models' performance, training aided in the establishment of relationships between output and input parameters.
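
The partitioning described above can be sketched as a random split in which at least 70% of the samples go to training. This is a hedged illustration only: the function name, the seed, and the use of a shuffled split are assumptions, and the partitioning actually performed by SPSS may differ.

```python
import random


def train_test_split(records, train_percent=70, seed=0):
    """Randomly split records so that at least train_percent % are used for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    # Ceiling division guarantees "not less than" the requested share.
    n_train = -(-len(shuffled) * train_percent // 100)
    return shuffled[:n_train], shuffled[n_train:]


# Twenty hypothetical sample identifiers, matching the number of samples in the study.
example_train, example_test = train_test_split(range(20))
```

With 20 samples, this gives 14 training and 6 testing samples.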

Table 3 ANN modeling instructions for the present study

Because the projected parameter scores may not always match the original values in the raw data, the model's performance and dependability need to be validated. This validation aided in the selection of the best ANN models and the discovery of the most efficient activation function methods. The validation of the ANN models in this study was based on the coefficient of determination (R2), adjusted R2, and standard error of estimate (SEE). Equation 10 expresses the R2 function (Egbueri, 2021).

$${R}^{2}=1- \frac{\sum_{i=1}^{n}{\left({X}_{\mathrm{predicted}}-{X}_{\mathrm{experiment}}\right)}^{2}}{\sum_{i=1}^{n}{\left({X}_{\mathrm{experiment}}-{X}_{\mathrm{average}}\right)}^{2}}$$
(10)

The R2 expresses the goodness of fit, i.e., how precisely the regression model predicts the actual data points.

4 Results and discussion

4.1 Physicochemical characteristics of groundwater

To better judge the general quality of a water sample, certain parameters are analyzed. In the present study, 18 parameters were analyzed and the results are presented in Table 4. The pH result revealed that all the groundwater samples were acidic. Acidic water has been linked to acidic rain, industrial pollution, improper sewage disposal, leaching of dissolved elements from dump sites, use of agrochemicals, etc. (Agbasi & Egbueri, 2022; Ayejoto et al., 2022; Egbueri & Agbasi, 2022b). The aforementioned processes contaminate recharge sources of groundwater, leading to the degradation of groundwater quality (Subba Rao et al., 2022c). Although the consumption of acidic water could be beneficial to health, there are some major disadvantages worth highlighting. Low-pH water is corrosive and promotes scaling in plumbing systems and metallic wares (Egbueri, 2021; WHO, 2017). Furthermore, acidic water is associated with health issues such as diarrhea, immune system inhibition, eye and skin irritation, vomiting, tuberculosis, fatigue, mucous membrane cell death, shortness of breath, and so on (Ayejoto et al., 2022; Egbueri et al., 2022a; McGrane, 2020). Under acidic conditions, the dissolution and adsorption of potentially toxic heavy metals are usually increased, resulting in excess bioaccumulation and bioavailability of the PTEs (Reid, 2019). Nonetheless, acidic water is thought to have antibacterial properties, making it potentially beneficial for hair, skin, and washing agricultural products such as crops, fruits, and vegetables (McGrane, 2020).

Table 4 Physicochemical registers of the analyzed groundwater samples

TDS, EC, TH, Ca2+, and Mg2+ have a direct relationship in water samples. The TDS of the water influences the EC and TH, while the Mg2+ and Ca2+ content influence the TDS, EC, and TH, in turn. The Ca2+ and Mg2+ levels in all the groundwater samples were found to be within the WHO (2017) permissible limits. This implies that the water samples are free from Ca2+ and Mg2+ pollution. The relationship between these parameters is more evident as the levels of TDS, EC, and TH were all within their permissible limits. The origin of Ca2+ and Mg2+ in the groundwater samples could be linked to silicate weathering, dissolution of dolomite, gypsum, and limestone present in the sedimentary basin (Bhakar & Singh, 2018; Egbueri, 2019; Egbueri et al., 2021c). The major reason for the abundance of calcium in water is the natural occurrence of calcium in the earth's crust. The presence of calcium and magnesium in water within their permissible limits is very beneficial. Calcium aids in bone development and strengthening, as well as hormone regulation, muscle contraction (improving heartbeat), blood clotting, and nerve impulses. When there is a calcium deficiency in the body, it begins to rely on (remove) calcium from the bones. This could lead to osteoporosis or increased susceptibility to bone fractures after even a minor fall. There is also some evidence that calcium and magnesium in drinking water may help prevent pancreatic, gastric, rectal, and colon cancer, and that magnesium may help prevent ovarian and esophageal cancer (Ada McVean, 2019). Despite these benefits, calcium and magnesium can cause severe damage when consumed in excess. Irregular heartbeat, hypotension, depressed reflexes, Sweating, flaccid paralysis, hypothermia, low blood pressure, slowed breathing, and other symptoms are associated with excess magnesium consumption. As for calcium, excess consumption could lead to osteoporosis, hypertension, stroke, and kidney stones (Sahu, 2019). 
Calcium is primarily responsible for water hardness and may affect the toxicity of other compounds. Zinc, lead, and copper, for example, are much more toxic in soft water (Sahu, 2019). Calcium may immobilize iron in limed soils. This can lead to iron shortages even if there is plenty of iron in the soil (Sahu, 2019).

TDS in water samples refers to the amount of minerals, organic material, salts, and metals dissolved in a given volume of water. TDS levels affect everyone who lives in, drinks from, or uses water, especially in industrial settings with pipes, valves, and other equipment. TDS are influenced by dissolved minerals, plankton, industrial sewage and waste, urban runoff, silt, winter road salts, fertilizers, and pesticides. TDS can also be emitted by air that contains sulfur, calcium bicarbonate, nitrogen, and other minerals, as well as from rocks. TDS levels above a certain threshold may indicate the presence of hazardous chemicals. It could also be due to hard water, which causes scale buildup in valves and pipes. High TDS levels in industrial and commercial settings can cause cooling towers, boilers, and other machinery to malfunction. TDS is used to indicate the properties of drinking water as well as an aggregate indicator of the presence of a wide range of chemical contaminants (Subba Rao et al., 2021). The major distinguishing factor between TDS and TSS is that TSS cannot pass through a two-micrometer sieve but remains suspended in solution indefinitely.

Total suspended solids (TSS) are a water quality metric defined as the measure of particles suspended in a particular volume of water that can be trapped by a filter (Egbueri & Agbasi, 2022b). TSS is a component of a water sample's total solids, with total dissolved solids serving as its counterpart: total suspended solids plus total dissolved solids equals total solids. TSS measurements are widely utilized in a variety of sectors. TSS is linked to the level of pollution in a water body. TSS measurement is critical in industrial settings because suspended particles can cause pipe obstruction and damage (Egbueri & Agbasi, 2022b). A variety of variables influence the accumulation of suspended particles in water. Soil erosion in outdoor systems causes more solid material to enter water bodies. TSS levels above a certain threshold are linked to water contamination (Egbueri & Agbasi, 2022b; WHO, 2017). When comparing TSS data, it is critical to examine the type of filter used as well as how the measurement was performed. Finer filters may trap more suspended materials, but they are more costly and filter at a considerably slower rate. There are no WHO (2017) or SON (2015) guidelines for TSS. However, levels recorded in this investigation were found to be typically low (Table 4).

The concentration of nitrate (NO3) in the groundwater samples was within its permissible limit, although S11, S12, S14, and S19 were observed to have high concentrations of nitrate. Nitrate is a naturally occurring chemical with several man-made origins. Natural processes such as plant breakdown create nitrate, which is found in safe and healthy quantities in various foods such as carrots and spinach (Anderson, 2019; Patil et al., 2013; Reinik et al., 2008). Nitrate is found in many fertilizers used on golf courses, lawns, and crops, as well as in animal manure and sewage discharge (Wakida & Lerner, 2005; Eller and Katz 2017; Katz et al. 2009). In different parts of the world, nitrate has been detected in a variety of lakes, rivers, and groundwater (Qasemi et al., 2022; Wang et al. 2012; Unigwe et al., 2022). In water, nitrate cannot be tasted, smelled, or seen (Bergren, 2022). High nitrate levels in water can be due to runoff or leakage from wastewater, fertilized soil, landfills, septic systems, urban drainage, or animal feedlots (Akhtar et al., 2021; Gautam et al., 2021; Verma et al., 2020). Because nitrate has multiple sources, determining the source of nitrate in drinking water may be challenging. Too much nitrate in drinking water can be dangerous, especially for children. Excessive nitrate consumption can alter how the blood transports oxygen, resulting in methemoglobinemia, also known as blue baby syndrome (Brender, 2020; LaVoie, 2021; Zhang et al., 2018). Bottle-fed babies under the age of six months are at the greatest risk of developing methemoglobinemia (Egbueri & Agbasi, 2022a; WHO, 2017). Methemoglobinemia is a blood disorder that causes the skin to become blue and can lead to serious sickness or death (Egbueri & Agbasi, 2022a; SON, 2015; WHO, 2017). Only recently has scientific information been gathered to examine the health effects on adults of drinking water containing high levels of nitrate.
A growing amount of evidence shows that nitrate or nitrite exposure is connected to a variety of health effects, including high heart rate, nausea, headaches, and stomach cramps (Camargo & Alonso, 2006; Hunault et al., 2009). Some studies also imply that dietary nitrate or nitrite consumption is connected with an increased risk of cancer, particularly gastric cancer. However, there is no scientific consensus on this subject.

Heavy metals (PTEs) are among the most significant pollutants in groundwater sources (Marcovecchio et al., 2007). Some of these heavy metals are required for the growth, development, and health of living beings, but others are not required, and the majority of them are poisonous to organisms (Underwood, 1956). The toxicity of heavy metals is determined by their concentration in the environment. In this study, the heavy metal analysis showed that 35%, 0%, 20%, 0%, and 40% of the groundwater samples contained Fe, Zn, Ni, Cr, and Pb, respectively, above their permissible limits (Table 4).
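As a rough illustration of how exceedance percentages of this kind can be tallied, the following Python sketch counts the share of samples whose concentration is above a permissible limit. The function name, the concentrations, and the limit are hypothetical placeholders chosen for the example; they are not the values reported in Table 4.

```python
def percent_exceeding(concentrations, limit):
    """Return the percentage of samples whose concentration exceeds the limit."""
    if not concentrations:
        raise ValueError("at least one sample is required")
    n_above = sum(1 for c in concentrations if c > limit)
    return 100.0 * n_above / len(concentrations)


# Hypothetical concentrations (mg/L) for five samples; NOT data from this study.
example_pb = [0.005, 0.02, 0.008, 0.03, 0.001]

# Illustrative limit for the sketch only; the limits applied in the study
# are those of WHO (2017) and SON (2015) as listed in Table 4.
EXAMPLE_LIMIT = 0.01

print(percent_exceeding(example_pb, EXAMPLE_LIMIT))
```

Applying the same function to each metal in turn would give one exceedance percentage per parameter.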

Iron in groundwater originates from iron dissolved from soil and rock formations as rainfall seeps, percolates, and drains through them (Orjiekwe et al., 2006). Iron in household water does not pose a direct health threat. Iron is required for the body to function. About 70% of the body's iron is found in red blood cells and muscle cells, where it is required for oxygen delivery in the blood and muscle tissue (Ayejoto et al., 2022; Egbueri & Agbasi, 2022a). People who lack it may become weary and anemic (Ayejoto et al., 2022; Egbueri & Agbasi, 2022a). High levels of iron in household water, on the other hand, may noticeably affect the taste, smell, and appearance of the water (Ayejoto et al., 2022; Egbueri & Agbasi, 2022a). Iron may also affect skin and plumbing fixtures, making them excellent breeding grounds for germs. Rarely, iron bacteria interact with iron to generate rust and bacterial slime. They are not known to cause illness. However, a study by Appenzeller et al. (2005) showed that the presence of iron in water might encourage the growth of bacteria such as Escherichia coli.

Zinc may be naturally introduced into water by the erosion of minerals from soil and rocks, but because zinc ores are only marginally soluble in water, zinc is only dissolved in low amounts (Ayejoto et al., 2022). The bulk of zinc in water is delivered through artificial routes, such as byproducts of coal-fired power plants or steel production, zinc fertilizers, or waste material combustion (Damodharan, 2013; Fuge, 2013; Raja et al., 2015). The results observed in Boji Boji Agbor might be attributable to the fact that zinc, which is a component of roofing sheets, has been carried down into the soil by rainfall before ending up in the subsurface water through leaching over decades (Oyem et al. 2015). Zinc is commonly used to coat iron, steel, and other metals to prevent rust and corrosion (Egbueri, 2022a). Zinc levels in soil may be high as a result of incorrect disposal of zinc-containing wastes by metal production firms and power utilities. The majority of the zinc in soil remains linked to solid particles. When soils have high quantities of zinc, such as at a hazardous waste site, the metal can leach into groundwater. Although severe zinc deficiency is uncommon, it can develop in people with unusual genetic abnormalities, nursing infants whose mothers are zinc deficient, anyone using certain immunosuppressive medications, and people with alcohol dependence (Kubala, 2018). Behavioral issues, delayed sexual maturity, impaired growth and development, chronic diarrhea, impaired wound healing, and skin rashes are all symptoms of severe zinc deficiency (Nicolai et al., 2016). Milder types of zinc deficiency are more prevalent, particularly in children in underdeveloped nations where diets are frequently deficient in essential elements (Kubala, 2018). It is estimated that over 2 billion individuals globally are zinc deficient owing to low dietary consumption (Kamil et al., 2014). 
Just as a zinc deficit may lead to health problems, an excess of zinc can also be harmful. Too much supplementary zinc is the most prevalent cause of zinc poisoning, which can result in both acute and chronic symptoms. Impaired immunological function, lack of appetite, nausea and vomiting, diarrhea, headaches, and stomach cramps are all symptoms of zinc toxicity (Tubek, 2007). Excessive zinc consumption might also lead to deficits in other nutrients. Chronically high zinc intake, for example, can interfere with copper and iron absorption. Although both excessive zinc consumption and zinc deficiency carry health risks, research shows that zinc has significant health benefits: it hastens wound healing, may lower the risk of certain age-related disorders, strengthens the immune system, may aid in the treatment of acne, and reduces inflammation (Kubala, 2018).

Natural discharges such as volcanic eruptions and windblown dust, as well as anthropogenic activities, cause nickel and its compounds to be released into the atmosphere (Gjikaj et al., 2015; Mohammed et al., 2011; Nagajyoti et al., 2010). The combustion of fuel and residuals accounts for about 62% of all anthropogenic emissions, followed by municipal incineration, nickel metal refining, steel manufacturing, coal combustion, and other nickel alloy manufacturing (Bennett, 1984; Schmidt & Andren, 1980). The primary anthropogenic source of nickel in streams is domestic waste water (Nriagu & Pacyna, 1988). Domestic waste water accounted for 29% of the nickel in influent streams at a water treatment facility in Stockholm, Sweden, according to Sörme and Lagerkvist (2002). Nickel is a micronutrient that is necessary for the healthy functioning of the human body since it stimulates hormonal activity and is involved in lipid metabolism (Zdrojewicz et al. 2016). Despite the fact that no evidence exists to support nickel's nutritional benefit in humans, it has been identified as an important nutrient for several microbes, plants, and animal species (Song et al., 2017). Nickel is necessary for optimal plant growth and development, as well as a number of morphological and physiological activities such as seed germination and productivity (Giuseppe et al., 2020). However, at high concentrations, nickel changes plant metabolism by reducing chlorophyll production, photosynthetic electron transport, and enzyme activity (Sreekanth et al., 2013). As an immunotoxin and carcinogenic agent, Ni can induce a number of health problems, including respiratory tract cancer, lung fibrosis, contact dermatitis, asthma, and cardiovascular disease depending on the amount and duration of exposure (Chen et al., 2017).

Many drinking water sources contain chromium in the +3 and +6 oxidation states. Concerns about public health are focused on the presence of hexavalent Cr (chromium-6), which is classified as a proven human carcinogen when inhaled (Zhitkovich, 2011). Chromium-3 is a nutrient that is required by humans. It may be found in a variety of vegetables, fruits, meats, cereals, and yeast. Chromium-6 naturally arises in the environment as a result of the erosion of natural chromium deposits (Ayejoto et al., 2022). It can also be produced by industrial processes. There have been documented cases of chromium being released into the environment as a result of leakage, poor storage, or inadequate industrial waste disposal methods (Ayejoto et al., 2022; Egbueri, 2020). Cramping, stomach and intestinal bleeding, diarrhea, and liver and kidney damage are among the effects of hexavalent chromium exposure (Ayejoto et al., 2022; Egbueri & Agbasi, 2022a; WHO, 2017). The mutagenic toxic effects of hexavalent chromium may be transmitted to offspring via the placenta (Ayejoto et al., 2022). The exact mechanism by which chromium benefits the body is unknown, and cases of deficiency in humans are uncommon. A deficiency might potentially be linked to several health issues (WHO, 2017). These may include less efficient cholesterol control, which increases the risk of atherosclerosis and heart disease, and decreased glucose tolerance, which leads to poor blood sugar management in people with type 2 diabetes (Ayejoto et al., 2022; WHO, 2017). Nonetheless, chromium is a trace mineral that can increase insulin sensitivity as well as protein, carbohydrate, and lipid metabolism (Ayejoto et al., 2022; WHO, 2017).

Over the last two decades, steps have been taken to reduce lead exposure in tap water. These steps include those carried out in line with the Safe Drinking Water Act revisions of 1986 and 1996 (Tiemann, 2014), as well as the United States Environmental Protection Agency's (EPA) Lead and Copper Rules (US-EPA, 1986). Lead in water can originate from residences that have lead service lines linking to the main water line. Homes without lead service lines may nevertheless have galvanized iron pipes, lead-soldered brass/chrome-plated brass faucets, or other plumbing (Egbueri, 2020, 2022a, 2022b; Saha et al., 2017). Lead in drinking water cannot be seen, tasted, or smelled. The easiest way to determine the risk of exposure to lead in drinking water is to identify probable lead sources in the service line and residential plumbing (US-EPA, 1986; WHO, 2017). Lead is hazardous even at very low concentrations and has no recognized health benefits (Saha et al., 2017). Lead is bio-accumulative and has the potential to cause irreparable harm to bodily organs and systems such as the reproductive system, neurological system, and kidneys (Todd et al., 1996). It has been shown that high lead levels in water are associated with acidic water (Jordana & Batista, 2004). All the water samples in this study were acidic, and this could explain why about 50% of the groundwater samples were polluted with lead.

4.2 Indexical methods for groundwater pollution and quality assessment

4.2.1 Water pollution index (WPI)

The groundwater samples were classified using the WPI, and the results are presented in Table 5. Based on the classification scheme of the WPI model, the water samples were grouped as: excellent water (S3, S6, S7, S10, S12, S13, S15, and S16), good water (S2, S4, S5, S17, and S18), moderately contaminated water (S8 and S11), and highly polluted water (S1, S9, S14, S19, and S20). It was observed that lead pollution was the most influential factor in the deterioration of groundwater quality. Therefore, it is recommended that proper water treatment measures be taken before the polluted water is consumed.
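The grouping step described above can be sketched in Python as a simple threshold lookup. The class boundaries used here (0.5, 0.75, and 1.0) are an assumption based on cut-offs commonly quoted for the WPI and may differ from the scheme applied in this study; the sample identifiers and WPI values are hypothetical and are not taken from Table 5.

```python
# Assumed upper bounds for each class; replace with the scheme actually used.
ASSUMED_WPI_CLASSES = [
    (0.5, "excellent water"),
    (0.75, "good water"),
    (1.0, "moderately contaminated"),
]


def classify_wpi(wpi):
    """Map a WPI value to a quality class using the assumed boundaries."""
    for upper_bound, label in ASSUMED_WPI_CLASSES:
        if wpi < upper_bound:
            return label
    # Anything at or above the last bound falls in the worst class.
    return "highly polluted water"


def group_samples(wpi_by_sample):
    """Group sample identifiers by their WPI class, preserving input order."""
    groups = {}
    for sample, wpi in wpi_by_sample.items():
        groups.setdefault(classify_wpi(wpi), []).append(sample)
    return groups


# Hypothetical WPI values; NOT the values reported in Table 5.
example_wpi = {"A": 0.30, "B": 1.40, "C": 0.45, "D": 0.80}
print(group_samples(example_wpi))
```

How a value lying exactly on a boundary is treated is a design choice in this sketch (it is assigned to the worse class) and should follow the original definition of the index.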

Table 5 Results of the WPI and PIG computed using the physicochemical data

4.2.2 Pollution index of groundwater (PIG)

The PIG was also used to classify the groundwater samples. The essence of using more than one indexical method is to reduce the bias associated with the use of stand-alone models. The results of the PIG model are presented in Table 5. The results suggest that pollution levels in S3, S5, S6, S7, S12, S15, S16, and S17 were minor, whereas pollution levels in S1, S9, S14, S19, and S20 were very high. The very high pollution status was greatly influenced by lead pollution. Moreover, low levels of pollution were observed in S2, S4, S10, S11, S13, and S18, whereas S8 was grouped in the moderately polluted class. Furthermore, a simple linear regression analysis was performed between the results of the two indexical models (Fig. 2). Although the PIG was computed using assigned weights, there was a significant positive correlation between the PIG and WPI. This suggests that the weightage assignment was properly done and that the two indexical models are in strong agreement.
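A simple linear regression between two index series can be sketched in Python as follows. This is a minimal least-squares illustration, not the procedure or software used to produce Fig. 2; the function name and the paired index values are hypothetical.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit y = slope * x + intercept; also returns Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain more than one distinct value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical paired index values; NOT the values reported in Table 5.
example_wpi = [0.2, 0.4, 0.9, 1.5, 2.0]
example_pig = [0.6, 0.9, 1.8, 3.2, 4.1]

slope, intercept, r = simple_linear_regression(example_wpi, example_pig)
print(f"PIG = {slope:.3f} * WPI + {intercept:.3f}, r = {r:.3f}")
```

A value of r close to +1 indicates that the two indices rank the samples in nearly the same way.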

Fig. 2 Linear regression graph and equation showing the high correlation between the WPI and PIG models

4.3 Simulation and prediction of PTEs and water quality indices

4.3.1 Multiple linear regression

Table 6 shows the statistical metrics used to evaluate the performance of the MLR models of the water quality parameters and indices. The models’ parity graphs are shown in Fig. 3. The findings demonstrated that the MLR models performed exceptionally well in predicting all variables. However, there was some variation in performance. Although all of the R and R2 values were high, the SEE differed (Table 6). Models with a coefficient of determination (R2) closer to one outperformed their peers. Furthermore, when two or more models have the same R2 value, error values may be used to compare them: models with lower error values are considered superior to models with larger error values. On this basis, the order of the MLR model performances for the water quality parameters is as follows: Cr > Zn > Pb > Ni > Fe > NO3. Furthermore, the models for WPI and PIG performed equally well. This result agrees with the linear regression analysis of the two indices. It was also found that the MLR performed better in predicting the two water quality indices than the six physicochemical parameters. Nonetheless, the overall findings imply that MLR is appropriate for the predictive modelling of water quality parameters.
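To make the evaluation metrics concrete, the following Python sketch fits a multiple linear regression by ordinary least squares and computes R, R2, and a standard error of the estimate. It is an illustration under stated assumptions, not the software or settings used in this study: the SEE is assumed to be sqrt(SSE / (n - k - 1)) for n samples and k predictors, the normal equations are solved directly for brevity, and the predictor and target values are hypothetical.

```python
from math import sqrt


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, built as a copy so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_mlr(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations."""
    design = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(rows, coef):
    """Apply fitted coefficients to each row of predictor values."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def performance(y, y_hat, n_predictors):
    """Compute R, R2, and the (assumed) standard error of the estimate."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    r2 = 1.0 - sse / sst
    see = sqrt(sse / (n - n_predictors - 1))
    return {"R": sqrt(max(r2, 0.0)), "R2": r2, "SEE": see}


# Hypothetical predictors (e.g. pH, EC) and target; NOT data from this study.
example_rows = [[6.1, 120.0], [5.8, 150.0], [6.4, 90.0],
                [5.5, 200.0], [6.0, 130.0], [5.9, 160.0]]
example_y = [0.12, 0.18, 0.08, 0.27, 0.14, 0.19]

coef = fit_mlr(example_rows, example_y)
print(performance(example_y, predict(example_rows, coef), n_predictors=2))
```

For larger or ill-conditioned problems, a QR- or SVD-based least-squares routine from a numerical library would generally be preferable to solving the normal equations directly.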

Table 6 Performance summary of the multilinear regression and artificial neural network models

Fig. 3 Parity plots showing the R2 values of the MLR predictions of a NO3, b Fe, c Zn, d Ni, e Cr, f Pb, g WPI, and h PIG

4.3.2 Artificial neural network modelling

The ANN approach was employed alongside the MLR for simulating the water quality indices and variables. Table 6 shows the performance metrics of the ANN models developed in this study. The results of the modelling revealed that the ANN models performed well. The models’ parity charts are displayed in Fig. 4, and the residual error plots are provided in Fig. 5. The ANN models have very low modelling errors. Their order of performance in modelling the water quality parameters appears to be: Pb > Zn > Ni > Cr > Fe > NO3 (Table 6). Furthermore, the models for the water quality indices performed similarly in terms of their R2 values. However, the WPI model performed slightly better than the PIG model in terms of the sum of square errors (Table 6). The MLP-NN models, like the MLR models, also proved to be efficient and cost-effective for computing and predicting groundwater quality parameters. A sensitivity analysis was also carried out to see how the input variables influenced the groundwater quality prediction. The results of the analysis are shown in Fig. 6. For the models, only input variables with a sensitivity score (normalized importance) greater than 50% were considered important. For the models of NO3, Fe, Zn, Ni, Cr, and Pb, three (K > pH > Ni), four (HCO3 > Mg2+ > pH > K+), three (Pb > TSS > Cr), seven (HCO3 > TSS > Zn > EC > Cl > TDS > Pb), seven (Ca2+ > Th > K > Zn > Na+ > Ni > Pb), and two (Zn > Ni) input variables significantly influenced them, respectively (Fig. 6a–f). On the other hand, only Pb was shown to be an important predictor in the models of WPI and PIG (Fig. 6g and h). This also corresponds to the initial findings of the WPI and PIG, which suggested that lead pollution greatly influenced the groundwater samples.
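The 50% screening rule applied to the sensitivity scores can be sketched in Python as follows. The sketch assumes that normalized importance is each input's raw importance divided by the largest raw importance and expressed as a percentage; this definition, the function names, and the raw importance values are assumptions for illustration and are not taken from Fig. 6.

```python
def normalized_importance(raw_importance):
    """Scale raw importances so that the most important input scores 100%."""
    top = max(raw_importance.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: 100.0 * value / top for name, value in raw_importance.items()}


def important_inputs(raw_importance, threshold=50.0):
    """Return inputs scoring above the threshold, most important first."""
    scores = normalized_importance(raw_importance)
    kept = [(name, score) for name, score in scores.items() if score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical raw importances for one model; NOT values from this study.
example_importance = {"Pb": 0.40, "Zn": 0.10, "pH": 0.25, "EC": 0.05}
print(important_inputs(example_importance))
```

With these placeholder values, only the inputs whose normalized score exceeds 50% are retained, in descending order of importance.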

Fig. 4 Parity plots showing the R2 values of the ANN predictions of a NO3, b Fe, c Zn, d Ni, e Cr, f Pb, g WPI, and h PIG

Fig. 5 Parity plots showing the residual errors of the ANN predictions of a NO3, b Fe, c Zn, d Ni, e Cr, f Pb, g WPI, and h PIG

Fig. 6 Bar charts showing the sensitivities of the input variables in the ANN predictions of a NO3, b Fe, c Zn, d Ni, e Cr, f Pb, g WPI, and h PIG

4.3.3 Comparing the performances of MLR and ANN models

It is vital to have trustworthy models that can predict parameters of interest in order to save money on groundwater monitoring and evaluation. In this study, both MLR and ANN techniques were employed for the computation and prediction of WPI, PIG, NO3, Zn, Fe, Ni, Cr, and Pb. Although both techniques performed well, it is important to establish which one outperformed the other. With the exception of Pb, the MLR models outperformed the ANN models in modelling the groundwater parameters and indices. As a result, in our study, we find the MLR to be the more efficient model. However, some studies suggest that the ANN algorithm is better than MLR (Abba et al., 2020; Abdullahi et al., 2020; Faloye et al., 2022; Ghadimi, 2015).
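The comparison rule used in this study (a higher R2 is better, and a lower error value breaks ties) can be expressed as a short Python sketch. The function name, the model labels, the rounding precision, and the metric values below are hypothetical and are not the values reported in Table 6.

```python
def rank_models(metrics, decimals=3):
    """Rank model names by R2 (descending), breaking ties by error (ascending).

    R2 values are rounded before comparison so that models reported with the
    same R2 at the chosen precision are treated as tied.
    """
    return sorted(
        metrics,
        key=lambda name: (-round(metrics[name]["R2"], decimals),
                          metrics[name]["error"]),
    )


# Hypothetical metrics for three models; NOT values from this study.
example_metrics = {
    "model_A": {"R2": 0.990, "error": 0.020},
    "model_B": {"R2": 0.990, "error": 0.010},
    "model_C": {"R2": 0.950, "error": 0.001},
}
print(rank_models(example_metrics))
```

Here the two models with equal R2 are separated by their error values, while the model with the lower R2 ranks last despite having the smallest error.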

The present study was also compared with previous studies in the literature that predicted the occurrence of similar PTEs in various water sources (Table 7). From the literature analysis, the following observations were made: (1) the present study may be the first to model nickel and chromium in groundwater using the MLR algorithm, (2) PTEs predicted with more input variables had better R2/R values, (3) the type of input variable influenced the models’ performances, and (4) the type of input variable had a greater influence on model performance than the number of input variables employed. In terms of the R2/R value, the present study had the best modelling result in most scenarios (Table 7). It is hoped that the information presented in this study will promote better modelling of PTEs in future studies.

Table 7 Comparing the performances of MLR and ANN prediction of PTEs in water with previous studies

5 Conclusions and recommendations

Artificial neural network and multiple linear regression models proved to be reliable for monitoring groundwater resources. Both models performed very well, although MLR (95–100%) performed better than ANN (85–99%) in modelling most of the PTEs and water quality indices. The PIG and WPI models characterized the quality of groundwater in the study area similarly. They revealed that the groundwater samples in the area fall into two major groups: suitable groundwater samples (65–70%) and unsuitable groundwater samples (30–35%). Simple linear regression also confirmed that there is significant agreement between the results of WPI and PIG. Based on the computation of the ANN, PIG, and WPI models, Pb was the most influential parameter degrading the quality of groundwater in the study area.

The sustainability of groundwater resources requires the collective efforts of the government, researchers, and inhabitants of every locality. Better environmentally friendly practices should be adopted in waste management, energy generation, and consumption. Recycling and reuse of materials should be encouraged to reduce waste generation. It is hoped that the methodologies and findings of the present research could provide insights for future works on groundwater resources. Further studies that propose new ways of assessing and monitoring the quality of water resources are encouraged.