Introduction

Water is an essential solvent that facilitates chemical processes, regulates temperature, and sustains life on Earth. Its significance underscores the need for prudent conservation and sustainable use of water resources in order to safeguard the future health of the planet. Understanding hydro-chemistry is crucial for evaluating water quality and determining whether it is suitable for various purposes (Manna and Biswas 2023). Rivers and lakes are the sources of surface water most readily available for industrial, agricultural, and human consumption, and surface water is a main source for drinking, irrigation, and industrial uses in many regions of the world. It is among the most valuable natural resources and is necessary for ecology, socioeconomic growth, and human well-being. Worldwide, 15% of water is used for industrial operations, 20% for irrigation purposes, and 65% for drinking purposes. Roughly one-third of the world’s population gets its drinking water from surface water (Ding et al. 2023). Owing to industrial and population growth, agricultural development, poor sanitation, rapid urbanization, and other human activity, surface water availability has long been declining in both quantity and quality. Environmental change has affected the weathering cycle, surface water quality, agro-animal cluster patterns, and social life. Contamination at different locations can also reach the aquifer through a variety of pathways. A significant factor is the close proximity of water sources to possible sources of pollution, including septic tanks, municipal waste landfills, industrial sites with improper waste disposal, and agricultural areas with fertilizer and pesticide runoff (Uddin et al. 2023).

Meanwhile, over the past ten years, water pollution and climate change have made the water deficit a global issue. High-altitude regions are particularly susceptible to food insecurity because 12% of the world’s population is affected by global warming. In the past few decades, surface water contamination has become a serious issue on a global scale. Surface water quality is declining due to several factors, including excessive fertilizer use in irrigated areas, rapid industrial growth, intense urbanization, and waste from humans and animals. Water pollution presents a severe threat to human existence, socioeconomic growth, and the long-term sustainability of water resources (Syeed et al. 2023). Recent urban research has also found a range of regulated and emerging contaminants, such as microplastics, antimicrobial compounds, and persistent organic pollutants, in surface water resources. The overuse of pesticides and fertilizers in agriculture steadily degrades surface water quality, which affects public health. It also causes disease, contaminated food chains, water shortages, a decline in agricultural crop productivity, and the extinction of aquatic species. Consequently, understanding the susceptibility of surface water to contamination is increasingly important, and research in metropolitan areas has become more focused on it (Krishan et al. 2022). Having enough water is also necessary for the development of enterprises and the expansion of agricultural operations. To monitor water quality for drinking and irrigation, one must therefore have a thorough understanding of surface water geochemistry, which reveals the processes that influence the chemical composition of surface water. Water resource studies, planning, and management now rely heavily on hydro-chemical characterization and quality assessment (Gani et al. 2023).
In order to solve these problems, safe water supply and environmental protection must be handled in a balanced manner, with an emphasis on eradicating pollution, enhancing infrastructure, and implementing integrated water management techniques. Therefore, it is highly recommended to conduct routine quality assessments of surface water to determine its suitability for drinking and irrigation (Sheikh Khozani et al. 2022).

These factors have led several scientific studies to focus on different approaches for controlling the quality and quantity of surface water and groundwater supplies. Managing water quality requires gathering and analyzing large water quality datasets, which can be challenging to evaluate and synthesize. A multitude of indices have been proposed for water quality analysis; they help with the assessment of data by turning the values of several parameters into a single comprehensive index. One of these is the Drinking Water Quality Index (DWQI). The WQI model is regarded as one of the most widely used numerical techniques for analyzing the quality and pollution status of water resources used for drinking and domestic purposes. Hence, it is regarded as the most effective method for determining the quality of drinking water in industrial, rural, and urban areas (Chakraborty et al. 2023). When it comes to evaluating surface water quality through machine learning, the integration of the DWQI provides a useful and applicable method. Policy makers need to be able to describe the state of surface water quality and its controlling mechanisms in order to choose the best treatment methods to address concerns. Handling the enormous and complex datasets generated by several water quality parameters at various sampling points, and extracting meaningful information from them, are major challenges in water quality monitoring (Satyaprakash et al. 2024).

Determining the DWQI requires several time-consuming and laborious calculation procedures to combine numerous water characterization data points into a single value that represents the overall water quality level (Baliarsingh et al. 2023). Multivariate statistical approaches and Spectral Reflectance (SR) may be applied to address this issue, as they are common techniques that define a linear relationship between a group of independent and response variables.

The effectiveness of several multivariate statistical approaches in extracting important information and discrete groups from hydro-chemical datasets for surface water quality evaluation has been validated recently (Zhalmagambetova et al. 2024). In addition, a combination of statistical analysis and WQI application has been used in the research area to determine the primary ions responsible for the decline in surface water quality. The Partial Least Squares Regression (PLSR) method can be used to evaluate the spatial distribution of contaminants, analyze controls on surface water composition, and interpret observed interactions among variables. It can also produce simpler associations that shed light on the underlying structure of the variables (Eti et al. 2024). Thus, PLSR has been proposed as a technique to efficiently evaluate water quality indicators and to resolve strong multicollinearity and noise effects in spectral regions. In this sense, PLSR may offer insightful data that bolster the effectiveness of spectral unmixing methods (Alqarawy et al. 2022). The aforementioned algorithms are intelligent algorithms that have produced good results when applied to models of optimal water resource allocation. Nevertheless, the majority of intelligent algorithms suffer from the following drawbacks: they are not very good at local searching, involve many uncertain parameters, and require complex determinations (Benkov et al. 2023; Jassam et al. 2024a). With these additional intelligent algorithms, the best solution can quickly approach a final scheme, resulting in a high rate of convergence.

DWQI estimation has traditionally relied on laboratory testing and point sampling. Although this method is accurate, it is too costly, time-consuming, risky, and limited in scope to be useful for routinely tracking these metrics. It also cannot give decision-makers a comprehensive analysis of significant water quality indicators. To address this issue, WQIs can be tracked using remote sensing (RS) technology (Najafzadeh and Basirian 2023). Because spectral measurements generate a large amount of data, finding the optimal correlation between spectral data and different water quality indicators still requires an appropriate statistical model to assess spectral reflectance (SR) indices. Rapid advancements in RS technologies have made a variety of data collection methods widely applicable for the integrated assessment of numerous water quality indices. Proximal hyperspectral sensing, as opposed to satellite imagery, may be a useful technique for measuring and monitoring quality while avoiding the constraints imposed by external interference (Sultana and Dewan 2021).

Geographic Information Systems (GIS), together with remote sensing (RS) and mapping, are necessary for all geographic and spatial aspects of the development and management of water resources. Geostatistical methods within GIS are used to show how spatial data can support well-informed management decisions. These methods offer strong analytical and visual aids for characterizing, evaluating, and simulating the processes and functions of natural systems (Latif et al. 2024). They allow analysis and prediction for continuous variables that are distributed over space and time. The development of interpolation techniques and spatial analytical modeling has been the focus of a great deal of investigation. The strategies for calculating and creating quantitative or qualitative water maps range from semi-empirical approaches to analytical procedures (Uslu et al. 2024). Spatial correlations between sample point values are used to create spatial distribution maps through the Inverse Distance Weighting (IDW) approach. IDW predicts values at unknown points by interpolation, which produces a continuous surface from the point data. IDW assumes that the influence of a known data point on the interpolated value decreases with distance, so nearby known points receive greater weight when the value at an unknown point is estimated. When constructing surfaces from sparsely sampled water quality data, IDW interpolation takes local variance in the measured data points into account (Patra et al. 2022). Thus, the use of GIS facilitates the construction of focused, basin-specific management plans, allowing for the creation of successful interventions that promote sustainable land use practices, safeguard the region’s water quality, and reduce soil erosion.

The study aims to determine whether the water is safe for consumption by residents of both rural and urban areas. The Mahanadi River Basin is one of the most important and widely used basins in Odisha, and it is also environmentally delicate. The river regularly provides drinking water, irrigation water, and hydroelectric power to a substantial and prosperous population. In many developing cities such as Cuttack, Odisha, the steady rise in population and urbanization over the past few years has brought a noticeable increase in the amount of waste water released into the environment. The growth of irrigation projects in the upper parts of the basin and the frequency of droughts pose a severe danger to the region’s water resources. Particularly in the lower portions of the basin, these factors have a major effect on biodiversity and on the socioeconomic activities of the local population. Like other rivers, the major tributaries have undergone substantial environmental stress. Climate change, protracted and frequent droughts, and the ensuing insecurity from flooding are all responsible for this decline. The associated drawbacks include environmental pollution, risks to public health, and greater pressure on rapidly dwindling water supplies.

The accuracy of the study can be strongly affected by the availability and quality of historical, hydrological, and meteorological data. For a variety of reasons, including a lack of infrastructure or resources for data gathering, there may be gaps or inconsistencies in the data in many regions. Because different water quality models have different structures, underlying assumptions, and parameterizations, they may also yield different findings for the same region. No prior research has been conducted on the water quality status of the chosen area. It is therefore critical to assess the surface water quality in relation to climate conditions by determining threshold levels, in order to safeguard the citizens of Cuttack and the surrounding area by reducing pollution-causing elements and implementing long-term conservation measures.

The goal of this study is to determine whether surface water in the study region is suitable for human consumption. The evaluation is carried out using the Drinking Water Quality Index (DWQI), the PLSR and SR techniques, and Geographic Information System technology. Assessing the outcomes of the methodologies considered in the study provides a more comprehensive understanding of the hydrological implications of climate change and of water quality in the area. This enables a more thorough assessment of possible changes in precipitation, temperature, and stream flow. Hence, the study aims to deliver actionable insights for evaluating drinking water quality, using water quality measurements and aesthetic qualities, in order to inform planning, policy formation, and sustainable resource management.

The water quality parameters were selected with reference to the literature and to the ease of carrying out the methods for quantifying them. In addition to the analytical data, a survey asking locals about their perceptions of the water quality, together with on-site observations, was used to assess the risk factors associated with waterborne illnesses. This study is a component of a larger initiative that aims to determine the most effective waste water treatment technique at each station. The results can therefore help local government authorities implement policy initiatives to improve the quality of surface water.

Description of the site

The investigated region lies in Cuttack, one of Odisha’s oldest districts. Cuttack city, the district’s namesake, is recognized as the state’s commercial center and also functions as the district’s administrative hub. The area experiences scorching summers and cold winters under its tropical climate (Satyaprakash et al. 2024). It lies between 20°03’ and 20°40’ N latitude and 84°58’ and 86°20’ E longitude. The yearly average precipitation is approximately 400 mm, with a peak of 1500 mm during the wet season months. The district is densely inhabited and covers an area of 3932 square kilometers. Cuttack city is bordered to the north and south by the Katha Jodi and Mahanadi rivers. The mean temperature rises to 45 °C in the dry season and falls to 10 °C in the other seasons (Baliarsingh et al. 2023). Because most of the rainfall during the dry months evaporates, its contribution is not very significant.

Hard rocks make up over 90% of the research area. Gneissic rock is the parent rock type most frequently observed throughout the research region. Gray pyroxene granulite has also been observed, along with mineral-bearing granulitic rock on weathered surfaces. The area in which the city lies was formerly thought to have been an aulacogen, a section of a graben. The delta arose from the vast amount of silt eroded from the Mio-Pliocene Sea and the Mahanadi-Gondwana basin (Das 2024a). Both fluvial and marine processes influence the redistribution of deltaic sediments (Parwin et al. 2024). Investigations of the water quality of the Mahanadi and its tributary, the Katha Jodi River, in the Cuttack region are currently underway to ascertain its effects on fish diversity as well as for sanitary and health-related purposes. The investigation was carried out between 2023 and 2024 (APHA 2005). During the South-West (SW) monsoon, waves control the deltaic coast, whereas tides and waves together dominate it during the non-monsoon season. Red soil and black cotton soil are the commonly reported soil types in the research region. The geotechnical parameters of the soil indicate high permeability and fractured rock types. Plains predominate between the hill ranges of the region. The study area is shown in Fig. 1.
Nine sampling locations were used in this investigation. Their geospatial coordinates and potential sources of pollution (PS) are as follows: S-1 (20.44 N, 85.88 E, PS = sewage); S-2 (20.44 N, 85.89 E, PS = sewage); S-3 (20.44 N, 85.74 E, PS = agricultural non-point pollution); S-4 (20.47 N, 85.77 E, PS = animal excrement, sewage, and human activities); S-5 (20.48 N, 85.85 E, PS = agricultural non-point pollution); S-6 (20.46 N, 85.83 E, PS = domestic sewage and industrial waste water); S-7 (20.46 N, 85.82 E, PS = sewage and feeding fodder); S-8 (20.46 N, 85.83 E, PS = human activities); and S-9 (20.46 N, 85.84 E, PS = human activities).

Fig. 1 Location map of the study area showing the collected water sample locations

Sampling and analytical method

The present work involves water level monitoring and hydro-geochemical analysis of surface water samples over a 1-year period (2023–2024), corresponding to the pre-monsoon period. Initially, a random sampling technique was used to distinguish the spatiotemporal variance in water quality. A handheld Garmin GPS unit was employed to record the geographic coordinates of the sampling sites. A total of nine sampling locations (S) were selected for water level measurements (WHO 2008). High-density polyethylene (HDPE) bottles that had been rinsed with deionized water and sterilized were used to collect 100 ml surface water samples directly from these sources. The samples were filtered through a membrane filter with a pore size of 0.2 µm and then acidified with dilute nitric acid to a pH below 2. A number of factors are considered in selecting parameters for determining the quality status of a particular water body.

The planned use of the water, a review of the literature, expert comments, and the geographical conditions of the study region should all be taken into consideration when choosing the parameters. On this basis, seven water quality indicators were chosen for sampling and physicochemical analysis for drinking and irrigation purposes (WHO 2008): pH, dissolved oxygen (DO), alkalinity, hardness, conductivity, nitrate (NO₃⁻), and phosphate (PO₄³⁻). All samples were collected in triplicate. The physicochemical data were obtained from the State Pollution Control Board (SPCB), Odisha. In this study, pH and conductivity were determined in situ with portable digital meters, and DO concentrations were measured using a DO meter (Jassam et al. 2024a). Ion chromatography was used to quantify the anions NO₃⁻ and PO₄³⁻. Alkalinity and hardness were determined by a titrimetric approach, which helps explain the qualitative characteristics of the surface water in the research area (Zheng et al. 2024). The quality of the analytical data was ensured by applying laboratory quality assurance and quality control techniques. The quality control samples included field blanks and duplicates.

Following standard field sampling practice, duplicate samples were taken, and any variations in concentration between replicate pairs were assessed against the method’s precision. Analysis of blank samples revealed no evidence of intrinsic bias in the analysis technique (Chakraborty et al. 2023). Data obtained from the physicochemical analysis of the water samples were used to prepare spatial distribution maps of all ions using Surfer 11, ArcGIS 10.5, and AquaChem software. The charge balance error equation was used to check the accuracy of the analysis, yielding an error of less than 5% for the major ions. The PLSR and SR calculations and plotting were conducted using MATLAB 2016 and a Python environment (Latif et al. 2024).
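The charge balance check can be illustrated with a short sketch. It assumes the conventional form of the equation, CBE (%) = 100 × (Σ cations − Σ anions) / (Σ cations + Σ anions), with all concentrations in meq/L; the text does not state the exact expression used, and the ion values below are hypothetical placeholders rather than data from this study.

```python
def charge_balance_error(cations_meq, anions_meq):
    """Return the charge balance error in percent.

    Both arguments are iterables of ion concentrations in meq/L.
    Assumed form: 100 * (sum cations - sum anions) / (sum cations + sum anions).
    """
    total_cations = sum(cations_meq)
    total_anions = sum(anions_meq)
    total = total_cations + total_anions
    if total == 0:
        raise ValueError("total ionic charge is zero")
    return 100.0 * (total_cations - total_anions) / total


# Hypothetical concentrations (meq/L), for illustration only.
example_cations = [2.10, 1.35, 0.80, 0.10]
example_anions = [2.40, 1.05, 0.70, 0.05]

cbe = charge_balance_error(example_cations, example_anions)
acceptable = abs(cbe) < 5.0  # the 5% threshold quoted in the text
```

A sample whose absolute error exceeds the threshold would normally be re-analyzed before being used in index calculations or mapping.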

Methodology

The study’s comprehensive, multi-parameter assessment of surface water quality closes a significant research gap. Prior research has occasionally evaluated surface water quality using condensed indices or criteria, which may not have fully captured the complexity of water quality dynamics. The DWQI, one of the major techniques, was implemented to address challenging spatial issues in assessing water quality. Such a mathematical technique can be applied to highly complex decision-making situations that involve several aspects, criteria, and scenarios. By merging several WQIs and applying robust statistical techniques, the study advances understanding by providing a more thorough and impartial evaluation of surface water suitability for irrigation and consumption. PLSR and SR were adapted in this study to help decision-makers incorporate a variety of options that represent the viewpoints of pertinent parties into a prospective or retrospective framework. Using the techniques shown in Fig. 2, this research provides insightful and practical suggestions for improving drinking suitability and water resource management at the selected sampling points.

Fig. 2 Flow chart of the study

Drinking water quality index (DWQI)

Water quality indices make assessments of surface water, and of the changes it requires, more dependable. By comparing water quality parameters with international standards, the WQI approach divides water sources into classes (Ding et al. 2023). The Drinking WQI is employed here to assess the quality of water. Many researchers have modified the methodology according to the weighting of each water quality criterion in locations throughout the world, and many professionals and scientists have discussed how crucial the WQI is for giving a broad picture of water quality (Zhalmagambetova et al. 2024). In addition, the DWQI is used by several countries to assess the quality of their water supplies. Its relevance lies in combining all water quality measurements into a single, easily interpretable value. The main advantage of the index is that numerous parameters can be added at the WQI calculation stages (Krishan et al. 2022). The technique assigns weights to chemical parameters on subjective criteria: the weights depend on the investigator’s judgment and experience regarding overall quality and human health. Based on seven important surface water chemistry characteristics, the conventional DWQI method is employed in this work to assess the impact of anthropogenic and natural variables on water quality (Gani et al. 2023). The WQI concept was also employed in the current study to evaluate the quality of marsh water and identify the physicochemical components that lead to water pollution. The public, technicians, managers, and decision-makers can therefore all benefit from this index’s analysis and dissemination of basic environmental information (Alqarawy et al. 2022). Finally, the formulae were used to determine the WQI value at each station, yielding a value ranging from 0 to 200.
The water quality is then placed in one of the following classes: < 50, excellent; 50–100, good; 100–150, poor; 150–200, very poor; and > 200, unsuitable.
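The index calculation and classification can be sketched as follows. The text does not reproduce the formulae, so this sketch assumes the common weighted arithmetic form, in which each parameter receives a quality rating q = 100 × C / S (measured concentration over standard) and the index is the sum of relative weight times rating. The function names, the sample values, and the weights are hypothetical, and the rating form shown suits only parameters for which a higher value is worse (pH and DO would need a different rating).

```python
def quality_rating(concentration, standard):
    """Rating q_i = 100 * C_i / S_i (assumed form; higher means worse)."""
    return 100.0 * concentration / standard


def weighted_index(measurements, standards, weights):
    """Sum of relative weight times quality rating over all parameters."""
    total_weight = sum(weights[name] for name in measurements)
    return sum(
        weights[name] / total_weight
        * quality_rating(measurements[name], standards[name])
        for name in measurements
    )


def classify(index_value):
    """Map an index value to the classes listed in the text.

    The text lists shared boundaries (50, 100, 150); assigning each shared
    boundary to the worse class is an assumption made here.
    """
    if index_value < 50:
        return "excellent"
    if index_value < 100:
        return "good"
    if index_value < 150:
        return "poor"
    if index_value <= 200:
        return "very poor"
    return "unsuitable"


# Hypothetical sample, limits, and weights (illustration only).
sample = {"alkalinity": 150.0, "hardness": 120.0, "conductivity": 900.0}
limits = {"alkalinity": 200.0, "hardness": 100.0, "conductivity": 500.0}
weights = {"alkalinity": 1.0, "hardness": 1.0, "conductivity": 2.0}

index_value = weighted_index(sample, limits, weights)
water_class = classify(index_value)
```

Because the weights are normalized inside `weighted_index`, only their ratios matter, which is where the investigator's judgment enters the index.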

PLSR model

In chemometrics, PLSR is a technique for multivariate data analysis. Choosing the ideal number of latent components to accurately reflect the calibration data without overfitting is a crucial step in PLSR analysis (Benkov et al. 2023). It generally functions well as a data processing strategy when there are more input parameters than output parameters and when the input variables show considerable collinearity and noise (Najafzadeh and Basirian 2023). It works especially well for problems like time-series prediction, where the input and output are collections of data points. Because of this feature, PLSR can display dynamic temporal behavior, which makes it very useful for tasks involving sequences, such as time-series prediction. PLSR can produce accurate models when the number of independent variables (predictors) significantly exceeds the number of dependent variables (measured qualities). Here, PLSR was applied in combination with leave-one-out cross-validation (LOOCV). The optimal number of latent factors (ONLFs) was chosen as the number giving the highest R² and the lowest RMSE, so that the calibration data were represented without over- or under-fitting (Sultana and Dewan 2021). The tool was also used to explore the relationships among the physicochemical features responsible for changes in water quality by substituting the original variables with a new set that reflects the impact of important factors on water quality. Random tenfold cross-validation was applied to the datasets to improve the model’s performance and the robustness of the results. The accuracy of the calibration and validation models was assessed using the coefficient of determination (R²) and the root mean square error (RMSE). The most pertinent feature was found by applying the algorithm outlined in Latif et al. (2024).
Prominently, prior to producing the input, it evaluates the information gained from earlier inputs as well as the current input. The PLSR may integrate memory internally because of its recursive nature. It creates an output, copies it, and then feeds it back into the network input to be processed further. The suggested system is divided into multiple phases that are shown in Fig. 3.

Fig. 3 Flow chart diagram of the methodology for estimating WQI using PLSR model
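The selection of latent components by LOOCV can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a compact single-response PLS fit (the NIPALS algorithm) written with NumPy, and the data are synthetic stand-ins (nine samples, three predictors, a noise-free linear response). The function names `pls1_fit` and `loocv_scores` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    Returns (x_mean, y_mean, coef) so that
    y_hat = (x - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean        # centered residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)          # weight vector
        t = Xr @ w                         # scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        c = yr @ t / tt                    # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def loocv_scores(X, y, n_components):
    """Leave-one-out predictions summarized as (R2, RMSE)."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i           # hold out sample i
        x_mean, y_mean, coef = pls1_fit(X[keep], y[keep], n_components)
        preds[i] = (X[i] - x_mean) @ coef + y_mean
    rmse = np.sqrt(np.mean((y - preds) ** 2))
    r2 = 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, rmse


# Synthetic stand-in data: 9 samples, 3 predictors, noise-free response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(9, 3))
y_demo = 3.0 + X_demo @ np.array([2.0, -1.0, 0.5])

results = {k: loocv_scores(X_demo, y_demo, k) for k in (1, 2, 3)}
best_k = min(results, key=lambda k: results[k][1])  # lowest RMSE
```

With real, noisy measurements the lowest cross-validated RMSE often occurs at fewer components than the maximum, which is the guard against overfitting that the selection step is meant to provide.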

Spectral reflectance (SR) model

In the Spectral Reflectance (SR) approach, spectral data were acquired with a handheld spectrometer (band range of 302–1148 nm; tec5, Oberursel, Germany). The most important aspect was that it reduced the number of dimensions in the data while improving regression and modeling accuracy for future predictions. It was used to measure the spectral reflectance of the surface water samples. The device is made up of two basic elements: one is coupled to a diffuser and detects solar radiation, while the other measures the spectral reflectance of the water samples (Mohseni et al. 2024). The model comprises three separate layers: the input layer, the hidden layer, and the output layer. The input layer supplies the input to the neural network, and the hidden layer sits between the independent input layer and the dependent output layer. To calibrate the spectrometer data, the spectral reflectance of the water samples was adjusted using a calibration factor derived from a white reference standard. The spectral reflectance of each surface water sample was measured four times, for a total of twenty scans (Cai et al. 2021). High-level attributes are extracted from the input by the hidden layer, and these are used by the output layer to generate outputs. The spectra of the water samples were collected at midday to minimize volatility and lessen the impact of variations in the sun zenith angle. Lastly, the spectral reflectance was smoothed to remove noise at both ends of the measured spectrum. This model is frequently thought of as a broad mathematical model intended to mimic human cognition, particularly in the areas of pattern recognition and prediction. To accomplish their goals, such models make use of neurons, or interconnected nodes, with weighted connections. The various phases that make up the suggested system follow Masoud (2022).
A flow chart illustrating the study’s spectral reflectance approach is shown in Fig. 4.

Fig. 4 Methodological framework of SR model
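The pre-processing of the spectra (averaging repeated scans, calibrating against a white reference, and smoothing) can be sketched as below. The text does not name the smoothing method, so a centered moving average is assumed here; the function names, the window size, and the scan values are hypothetical.

```python
import numpy as np


def calibrated_reflectance(sample_scans, reference_scan, calibration_factor=1.0):
    """Average repeated scans and ratio them against a white reference."""
    mean_scan = np.mean(sample_scans, axis=0)
    return calibration_factor * mean_scan / reference_scan


def moving_average(spectrum, window=5):
    """Smooth a spectrum with a centered moving average (odd window).

    The ends are padded with the edge values so the output keeps the
    same length as the input.
    """
    kernel = np.ones(window) / window
    padded = np.pad(spectrum, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical data: 50 bands across the instrument range, four scans.
wavelengths = np.linspace(302.0, 1148.0, 50)
white_reference = np.full(50, 1000.0)
scans = np.array([
    np.full(50, 240.0),
    np.full(50, 250.0),
    np.full(50, 260.0),
    np.full(50, 250.0),
])

reflectance = calibrated_reflectance(scans, white_reference)
smoothed = moving_average(reflectance, window=5)
```

Averaging the repeated scans before taking the ratio reduces random scan-to-scan noise, while the smoothing step targets band-to-band noise, which is usually strongest near the ends of the detector range.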

The seven physicochemical characteristics used in the DWQI were spatially analyzed using ArcGIS version 10.5 software. The spatial analysis tool performed the data interpolation using the inverse distance weighted (IDW) method. IDW estimates unknown values on the assumption that values at sampling locations closer to the unsampled point are more similar to it than values at locations farther away, and it was used to map the distribution of water quality.
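The IDW estimate at a single unsampled point can be written out directly. The sketch below is a plain-Python illustration of the weighting idea rather than the ArcGIS tool itself; it assumes planar coordinates and a distance exponent of 2 (a common choice), and the sample points are hypothetical.

```python
import math


def idw_estimate(known_points, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) samples by IDW.

    Each sample is weighted by 1 / distance**power, so nearer samples
    contribute more to the estimate.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in known_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sampled point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical samples: (x, y, measured value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 4.0, 40.0)]

midpoint = idw_estimate(samples[:2], (1.0, 0.0))  # equidistant from both
near_first = idw_estimate(samples, (0.1, 0.0))    # dominated by first sample
```

Evaluating `idw_estimate` over a regular grid of target points yields the continuous surface used for the distribution maps; a larger exponent makes the surface follow the nearest samples more closely.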

Results and discussions

The chemical quality of water, particularly surface water, plays a significant role in determining how it can be used for human consumption, agriculture, and industry, among other uses. The type of application can therefore be determined by evaluating its chemical composition. To determine the acceptability of the water for drinking, the parameters measured in the research region are compared with the standard guideline values advised by the WHO (Otorkpa et al. 2024). Descriptive statistics (minimum and maximum) of the physicochemical characteristics of the surface water samples are discussed here. The geographical distribution was obtained with the IDW method in ArcGIS 10.5 software by interpolating between the concentrations observed at the river sites. The IDW spatial analysis tool takes into account the influence of the mapped variable, which diminishes with distance from the sampled location (Sharma and Sharma 2024).

Temperature influences the rate of chemical reactions and the interactions between pollutants and aquatic life. The temperature recorded in the study varied from 28 to 31 °C. The pH of water can be used to calculate several forms of geochemical equilibrium and solubility. It also shows how the water reacts to any acidic or alkaline substances present in it (Tecklie et al. 2024). The physicochemical characteristics of the Katha Jodi, a tributary of the Mahanadi River, indicate that the pH at all nine sites was 8, which is somewhat alkaline. Every pH value found in the surface water tests fell within the WHO’s recommended range for drinking water. Patil et al. (2023) studied the pH of drinking water in the popular Palani tourist region and found that the surface water was mildly alkaline, with a pH range of 7.3 to 8.2. Alkaline conditions indicate a low concentration of hydrogen ions, which may be related to ion exchange, parent rock weathering, and rock-water interactions.

Surface water DO depletion is often caused by the biochemical oxygen demand (BOD) load of leather effluent discharge. Low DO causes acute stress, reduced aquatic fisheries, ecosystem imbalance, and the death of organisms. The DO value ranged from 3.9 to 5.8 mg/L, whereas the desirable value of DO is 6 mg/L. High BOD concentrations in surface waters, which arise from the organic and inorganic pollutants released in leather effluent, typically result in low DO concentrations (Ahmad et al. 2024). Sharma et al. (2022) stated that elevated BOD and reduced DO, resulting from anthropogenic and geogenic sources, contribute to poor water quality. An elevated concentration of these indicators would cause health issues. Moreover, since contaminated water alters soil characteristics and the state of natural ecosystems, it is imperative to monitor the supply of clean water.

The alkalinity ranged from 122 mg/L (minimum) to 162 mg/L (maximum), with an average of 145 mg/L. The upper limit of alkalinity under WHO criteria is 200 mg/L. In general, the concentration is high at all locations (> 100 mg/L). This reflects the transport mechanism in humid tropical regions with high levels of rock mineral dissolution and marshy ionic load. During the rainy season, a variety of inputs, including flora, anthropogenic agents, and the atmosphere, are mobilized. Nevertheless, according to the WHO criteria, all sites fall in the safe category for human consumption (Laaraj et al. 2024). Patel et al. (2023) investigated a study area on the basis of alkalinity values and found that modern farming practices, together with garbage disposal in residential areas close to the river basin, significantly worsened the water quality.

Calcium and magnesium cations, which are present in large amounts in surface water, directly contribute to the hardness of the water. The calculated hardness ranged from 81 to 131 mg/L, against a WHO recommended limit of 100 mg/L. High hardness was observed at six locations, S-(4) to S-(9). This could have detrimental effects on health, such as stomach problems, as well as financial and hydraulic effects, such as scaling (Lakouas et al. 2024). Hardness could also result from weathering, anthropogenic activities such as wastewater discharge, and the interaction of water with soil, rock, and other materials. Rajkumar et al. (2023) reported an average hardness of around 130 mg/L in their evaluation of water used for drinking and agricultural activities.

Conductivity values, on the other hand, ranged from 909 to 960 µmhos/cm, whereas the permissible level is taken as 500 µmhos/cm. The results show variation in the salinity of the water with respect to the various ions dissolved in it. Dissolved ions are released into the surface water body through the chemical process of silicate weathering (Fariz et al. 2024). Notably, conductivity levels above 940 µmhos/cm at S-(1), (2), (3), and (4) suggest that anthropogenic activities have contaminated the water. Higher conductivity values also result from the overuse of fertilizers in agriculture and from domestic waste. In a semi-arid area of India, Zhou et al. (2024) likewise reported conductivity values above 500 µmhos/cm over the course of their investigation.

Nitrate (NO₃⁻) is regarded as a non-lithological pollutant that enters surface water through nitrate fertilizers, animal and human waste, leaks from septic tanks, domestic wastewater, and irrigation return flows (Shaikh and Birajdar 2024). The measured NO₃⁻ values fluctuated between 10 and 135 mg/L. The desirable limit for drinking water under the WHO guidelines is given as 30 mg/L. NO₃⁻ exceeds this limit at locations S-(3) to S-(9), primarily as a result of contamination from human or animal waste, runoff from garbage dumps, or runoff from agriculture. If nitrate-rich aquifers feed the river, the concentration may fluctuate often throughout the year (Pizarro et al. 2024). Pasture and agriculture may predominate in the areas surrounding these monitoring locations. Changes in water quality can be driven by the heavy use of pesticides, fertilizers, and organic amendments in agricultural regions (Li et al. 2024). In a similar study, Din et al. (2024) conducted a risk assessment of nitrate-contaminated shallow water in Changchum New District, China. They reported that contemporary agricultural practices and domestic waste disposal are the main causes of nitrate pollution in surface water.

According to WHO guidelines, rivers should not contain more than 0.1 mg/L of phosphate, yet the PO₄³⁻ concentrations at our study sites ranged from 0.8 to 1.96 mg/L. The findings showed elevated concentrations of the chemical indicators, particularly at sampling points S-(5) to S-(9). Above these limits, phosphate can be quite dangerous. Our findings are in line with prior research showing that land cover, particularly pasture and agricultural land, is a significant factor in changes to water quality, especially during the rainy season (Sodhi et al. 2024). In addition, geological formations and soils may be related to the seasonal and spatial variability of water quality indicators along the stations, which may in turn lead to changes in the water system. Jassam et al. (2024b) reported comparable phosphate readings, which they attributed to improper sewage disposal, seepage runoff, and sanitation facilities.

Based on the physicochemical results, the anionic abundance follows the order NO₃⁻ > PO₄³⁻. Figs. 5 and 6a–g show the bar plots and spatial distribution maps of the seven water quality indicators. The spatial maps indicate that the poor areas, from sampling point S-(1) to S-(5), exhibit higher nitrate and phosphate values. Common farming practices during the cropping period, or rainy season, cause sediment runoff and nutrient leaching into water bodies. In addition, in higher-sloped locations subject to erosive processes and intensive land use, considerable rates of soil loss in the middle and lower sections of the hydrographic basin, especially during periods of heavy rainfall, can endanger the water quality of this water system (Thakur and Devi 2024). Moreover, soluble or suspended sedimentary materials can significantly alter the physical–chemical characteristics of the water. The amount of nutrients transported to water bodies can be reduced by adopting measures such as planting forests along riverbanks and using cover crops, crop rotation, contour planting, and terracing in agricultural regions (Badalge et al. 2024). Finally, environmental planning and management are essential to maintaining the study area’s water quality and its use for public supply. The findings also provide a framework upon which more in-depth research can be conducted to improve evaluations.
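The comparison of the reported ranges with the limits quoted in the text can be summarised in a short sketch. The dictionary layout, the `kind` flag, and the function name `exceedance` are illustrative assumptions; the numbers are the ranges and limits stated above, with DO treated as a minimum desirable value and the other parameters as upper limits.

```python
# Ranges and limits as quoted in the text. "kind" marks whether the limit is
# a ceiling ("max") or a floor ("min", used here for dissolved oxygen).
reported = {
    "DO (mg/L)":               {"low": 3.9,   "high": 5.8,   "limit": 6.0,   "kind": "min"},
    "Alkalinity (mg/L)":       {"low": 122.0, "high": 162.0, "limit": 200.0, "kind": "max"},
    "Hardness (mg/L)":         {"low": 81.0,  "high": 131.0, "limit": 100.0, "kind": "max"},
    "Conductivity (umhos/cm)": {"low": 909.0, "high": 960.0, "limit": 500.0, "kind": "max"},
    "Nitrate (mg/L)":          {"low": 10.0,  "high": 135.0, "limit": 30.0,  "kind": "max"},
    "Phosphate (mg/L)":        {"low": 0.8,   "high": 1.96,  "limit": 0.1,   "kind": "max"},
}

def exceedance(entry):
    """Return 'all', 'partial', or 'none' depending on how much of the
    reported range falls on the wrong side of the limit."""
    low, high, limit = entry["low"], entry["high"], entry["limit"]
    if entry["kind"] == "max":
        if low > limit:
            return "all"
        return "partial" if high > limit else "none"
    # Floor-type limit: values below the limit fail.
    if high < limit:
        return "all"
    return "partial" if low < limit else "none"

status = {name: exceedance(entry) for name, entry in reported.items()}
for name, result in status.items():
    print(f"{name}: {result}")
```

A range-level check of this kind only shows whether some or all sites can fail a limit; identifying the individual sites requires the per-site measurements.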

Fig. 5 Bar plots of the study area: a pH, b DO, c Alkalinity, d Hardness, e Conductivity, f Nitrate (NO₃⁻), and g Phosphate (PO₄³⁻)

Fig. 6 Spatial distribution map of surface water quality for different water quality indicators: a pH, b DO, c Alkalinity, d Hardness, e Conductivity, f Nitrate, and g Phosphate

Further, for the selected survey stations, the WQI, PLSR, and SRI methods can be used to interpret the data and to understand the relationships and variations between ion concentrations and the physicochemical characteristics of the surface water samples.

DWQI

The DWQI of the research area was established for drinking purposes. It was determined for the nine samples using the prescribed water quality criteria, and it accounts for the combined impact of several factors that affect water quality (Zafar et al. 2024). The DWQI values (Fig. 7a) of the surface water samples varied from 34 to 221, with a mean of 129.33. The DWQI values were used to divide water quality into five classes (Table 1). In the DWQI classification, one sample (11.11%) was excellent, three samples (33.33%) were good, one sample (11.11%) was poor, and four samples (44.44%) were very poor or unsuitable for drinking purposes. Based on the spatial analysis, polluted zones were delineated throughout the study region. The computed values indicate that the majority of surface water samples (55.56%) fell into the poor, very poor, and unsuitable categories, whereas safe drinking water accounts for only 44.44%. The unsuitability of surface water observed at five sites is due to contamination by high conductivity, very high phosphate concentration, and high nitrate content. This points to the possible health risks associated with drinking surface water directly. The observed impact on water quality highlights how anthropogenic interference and land use in the hydrographic basin interact to promote eutrophication (Gençer and Başaran 2024). Fig. 7b depicts the interpolation map across the study area. At S-(1) and (2), the surface water quality is mostly governed by geology, because most pollutants there enter surface water through geogenic processes. Increased nitrate and phosphate levels in surface water may be caused by anthropogenic pollution sources such as intensive irrigation and septic tank leaks (Xu et al. 2024). Moreover, hard rock aquifers are found in the places with poor and very poor water quality.
The longer residence time in hard rock aquifers allows for increased mineral weathering (conductivity) at S-(3) and (4). A high concentration of algae increases the turbidity of the water, lowers the DO levels, and eventually produces toxins. This condition restricts the use of this resource for public supply, as planned for location S-(5), and is detrimental to aquatic organisms. According to the results, five locations (55.56%), namely S-(1), (2), (3), (4), and (5), need to be checked before use. The widespread application of fertilizer and irrigation return flows are therefore likely contributors to the increased average nitrate and conductivity values across all locations (Ejaz et al. 2024). These locations are all close to pasture and agricultural land, a land use that contributes to sediment runoff into water bodies and lowers water quality. The map also indicates that five sites (S-1 to S-5) lie in the unsuitable category, with DWQI values of 210, 221, 177, 123, and 187, respectively. The remaining sites (S-6 to S-9) have DWQI scores of 34, 41, 82, and 89. Thus, artificial water replenishment at the household level should be used to reduce the concentration of dissolved ions and improve the WQI at each location. Additionally, anthropogenic pollution sources can be reduced and the WQI of the area improved by implementing appropriate sanitation facilities and sensible regulations on agricultural practice (Tiwari et al. 2024). Using a GIS tool, a spatial pattern map was produced that summarizes the water quality distribution across the study area. Such maps are helpful for monitoring, assessment, and future prediction (Singh et al. 2024).
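The summary figures quoted for the DWQI can be reproduced from the nine site scores listed in the text. The sketch below is only a consistency check; the variable names are illustrative, and the class boundaries of Table 1 are deliberately not assumed.

```python
from statistics import mean

# DWQI scores for S-1 ... S-9 as listed in the text.
dwqi = {"S-1": 210, "S-2": 221, "S-3": 177, "S-4": 123, "S-5": 187,
        "S-6": 34, "S-7": 41, "S-8": 82, "S-9": 89}

# Sites that the text flags as needing checks before use (S-1 to S-5).
flagged = ["S-1", "S-2", "S-3", "S-4", "S-5"]

mean_dwqi = mean(dwqi.values())
share_flagged = 100 * len(flagged) / len(dwqi)
lowest_flagged = min(dwqi[s] for s in flagged)
highest_other = max(v for s, v in dwqi.items() if s not in flagged)

print(f"range: {min(dwqi.values())}-{max(dwqi.values())}")  # range: 34-221
print(f"mean: {mean_dwqi:.2f}")                              # mean: 129.33
print(f"flagged share: {share_flagged:.2f}%")                # flagged share: 55.56%
print(lowest_flagged, highest_other)                         # 123 89
```

The lowest score among the flagged sites (123) is well above the highest score among the remaining sites (89), so the two groups are cleanly separated in the reported data.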

Fig. 7 a WQI variation of the study region and b spatial distribution map of DWQI

Table 1 Classification of the water quality indices (DWQI)

Analysis on spatial reflectance indices (SRI) in the assessment of water quality indicators

SRIs are crucial instruments for processing and analyzing geographical data related to the management of surface water sources. They can provide insightful results by streamlining and organizing large databases (Jassam et al. 2024c). Several studies have investigated how well space-based optical remote sensing instruments work for determining the quality of water. The aim of this method was to identify the key components associated with the different sources of variation in the hydro-chemical data gathered. The majority of those studies examined only seven physicochemical water quality variables and did not address estimating surface water DWQI using proximal hyperspectral imaging. The newly determined SRIs were derived from 2-D correlogram maps created from the spectral reflectance of the chosen study area (Naimaee et al. 2024). The SRI approach allows flexible function fitting and hyperparameter tuning. It can be used with both numerical and categorical data, and preprocessing is not necessary (Das 2025). Furthermore, the SRI approach can handle missing data. The correlogram maps (Fig. 8b) show the R² values for the correlations between DWQI records and SRIs generated from all possible dual-wavelength combinations across the whole spectral range (302–1148 nm). Together, these techniques make it possible to find meaningful classes in a dataset. The best correlations between SRIs and the water quality metrics are indicated by the hotspot regions with the largest R². The newly derived SRIs were extracted from the VIS and NIR regions and showed good correlations with DWQI. Relationships between the newly calculated SRIs and DWQI had R² values ranging from 0.65 to 0.82. The results highlight the importance of the visible (VIS) and near-infrared (NIR) wavelength ranges in assessing the WQIs of the region.
As expected, variations in the physical and chemical composition of the water, which are used to assess and regulate surface water quality, affected the characteristics of the radiation reflected from the water constituents in various spectral bands (Han et al. 2024). Elevated conductivity may arise from the breakdown of large amounts of organic matter introduced by wastewater from numerous neighboring companies. The high NO₃⁻ content observed at all locations suggests anthropogenic activity, namely intensive irrigation, residential and industrial waste, and excessive application of inorganic nitrogenous fertilizer. In addition, the increased quantities of PO₄³⁻ found in the surface water samples were likely caused by evaporitic formations, and may also have been connected to the nearby agro-food industry or urbanization through wastewater discharges (Rao et al. 2024). Numerous studies have shown that the VIS, red-edge, and NIR regions of the spectrum correlate more strongly than other spectral regions with particular physicochemical water constituents in different water bodies. These findings suggest that these spectral regions could be used to gauge the properties of the water. The newly calculated SRIs showed a higher R² with DWQI than the published indices.
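A 2-D correlogram of the kind described above can be sketched as follows. The normalised-difference form of the index, the five-band synthetic spectra, and the function name `r2_correlogram` are assumptions made for illustration; the index formula actually used in the study is not stated in this section, and the spectra below are random stand-ins rather than measured reflectance.

```python
import numpy as np

def r2_correlogram(reflectance, target):
    """R² between `target` and a normalised-difference index for every band pair.

    reflectance: array of shape (n_samples, n_bands) with positive values.
    target:      array of shape (n_samples,), e.g. DWQI per sample.
    Returns an (n_bands, n_bands) array. The diagonal is NaN because the
    index is identically zero when both bands are the same.
    """
    n_bands = reflectance.shape[1]
    r2 = np.full((n_bands, n_bands), np.nan)
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue
            index = (reflectance[:, i] - reflectance[:, j]) / (
                reflectance[:, i] + reflectance[:, j])
            r = np.corrcoef(index, target)[0, 1]
            r2[i, j] = r ** 2
    return r2

# Synthetic stand-in spectra: 9 samples, 5 bands spanning 302-1148 nm.
rng = np.random.default_rng(0)
wavelengths = np.linspace(302, 1148, 5)
spectra = rng.uniform(0.05, 0.6, size=(9, 5))
dwqi = np.array([210, 221, 177, 123, 187, 34, 41, 82, 89], dtype=float)

r2_map = r2_correlogram(spectra, dwqi)
best_i, best_j = np.unravel_index(np.nanargmax(r2_map), r2_map.shape)
print(wavelengths[best_i], wavelengths[best_j], r2_map[best_i, best_j])
```

The band pair with the largest R² corresponds to the hotspot region of the correlogram. With random spectra the selected pair is meaningless; the sketch only shows the mechanics of the search.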

Fig. 8 For the DWQI, a association between observed and predicted values of the validation models using PLSR, and b correlation matrices showing R² values for all possible dual-wavelength combinations of the spectra from 302 to 1148 nm with the drinking water quality index (DWQI)

Prediction of DWQIs using PLSR models

PLSR is a multivariate statistical method that is useful in chemometrics when the number of input variables differs significantly from the number of output variables and when the input data exhibit high levels of collinearity and noise. Although SRIs offer a straightforward way of evaluating WQIs and can be used to build lightweight, ground-based spectral devices for quickly and affordably assessing and regulating water quality on a large scale, each SRI considers only two or three wavebands (Sahab et al. 2024). It is therefore challenging to design SRIs that predict WQIs efficiently under a range of potentially confounding circumstances, such as large fluctuations in the quantities of water constituents and their effect on the saturation level of the water quality measurements being studied. For that reason, several SRIs were used in this study as input variables to the PLSR model for predicting water quality indicators. The PLSR model yielded a more accurate assessment, based on the R² values of both the calibration and validation datasets (Fig. 8a). The R² and RMSE values of the PLSR calibration and validation datasets for DWQI prediction are shown in Table 2. PLSR was investigated as a potential alternative method for forecasting water quality because it is quick, easy, and requires only a few computation steps. For the DWQI, the PLSR model produced robust estimates in the validation dataset, with R² = 0.93 and RMSE = 11.67. In the calibration dataset, based on the seven physicochemical parameters, the results were R² = 0.85 and RMSE = 16.32.
Similar to the WQI predictions in this study, Talukdar et al. (2024) found that the drinking water quality index and five WQIs could be predicted using the PLSR model, with R² ranging between 0.85 and 1.00 in the calibration and validation of groundwater data from Wadi Fatimah, Makkah Al-Mukarramah Province, Saudi Arabia. Thus, PLSR based on numerous SRIs can predict WQI better than an index using only two bands. PLSR models built on several SRIs can also serve as a unified method for the remote measurement of WQI in water quality assessments. The models used in this study therefore show good performance and a sufficient level of accuracy in water quality prediction. Eid et al. (2023) found that, when predicting inland water quality indicators, PLSR models with a high number of wavebands performed better than models with just one or two wavebands. In the northern Nile Delta of Egypt, Mdee et al. (2024) found that the R² value of the PLSR model reached 0.99, indicating good forecasting. Likewise, WQIs can be quantified in water quality assessments using PLSR models based on many chemical characteristics as a single, cohesive approach (Das 2024b).
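How a PLSR model handles collinear SRI inputs, and how R² and RMSE are obtained for calibration and validation subsets, can be sketched with a small single-response implementation. This is an illustrative NIPALS-style sketch on synthetic data, not the software or settings used in the study; the function names, the 30/10 split, the three components, and the noise levels are all assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression; return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)   # weight vector
        t = Xk @ w                  # scores
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        qk = (yk @ t) / tt          # y loading
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - qk * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    resid = y_true - y_pred
    centred = y_true - y_true.mean()
    return 1.0 - (resid @ resid) / (centred @ centred), np.sqrt(np.mean(resid ** 2))

# Synthetic stand-in data (not the study's measurements): three latent
# signals are duplicated with small noise to give six collinear "SRI" inputs.
rng = np.random.default_rng(42)
latent = rng.uniform(0.0, 1.0, size=(40, 3))
sri = np.hstack([latent, latent + rng.normal(0.0, 0.01, size=(40, 3))])
dwqi = 30.0 + latent @ np.array([120.0, 60.0, 40.0]) + rng.normal(0.0, 2.0, size=40)

cal, val = slice(0, 30), slice(30, 40)
coef, intercept = pls1_fit(sri[cal], dwqi[cal], n_components=3)
r2_cal, rmse_cal = r2_rmse(dwqi[cal], sri[cal] @ coef + intercept)
r2_val, rmse_val = r2_rmse(dwqi[val], sri[val] @ coef + intercept)
print(f"calibration R2={r2_cal:.2f}, RMSE={rmse_cal:.2f}")
print(f"validation  R2={r2_val:.2f}, RMSE={rmse_val:.2f}")
```

In practice a library implementation such as scikit-learn's `PLSRegression` would normally be preferred; the hand-written version is used here only to keep the steps visible. The number of components is usually chosen by cross-validation rather than fixed in advance.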

Table 2 Calibration and validation models between the observed and predicted values (R² and RMSE) based on PLSR

Implications of the study

  1. Sustainability in terms of water quantity and quality: This approach promotes sustainable water management in a water-stressed zone by utilizing available water resources in the neighborhood.

  2. It is advised that surface water contamination in the research area be reduced by using fertilizers and pesticides sparingly.

  3. Applicability: The study’s conclusions can be applied to other areas dealing with comparable issues, such as contamination and overuse of surface water. This strategy promotes efficient management and wise use of the resources at hand.

  4. In addition, rainwater harvesting will be a useful technique to raise the level of surface water and stop its degradation, hence reducing reliance on it.

  5. Alignment with Sustainable Development Goals (SDGs): This study aligns with the United Nations (UN) SDGs, particularly SDG-6, which focuses on ensuring access to water and sanitation for all. By promoting sustainable water management, it contributes to long-term water security and improved living conditions.

  6. The study is relevant to the dedicated efforts of the Government of Odisha and the Government of India (GOI) to implement water projects such as JALADHARE and the Jal Jeevan Mission (JJM) to fulfill the drinking water needs of rural areas.

  7. The present surface water quality study will provide a baseline assessment for farmers, regional planners, and other stakeholders engaged in the sustainable management of water resources and farming practices.

  8. Schemes based on surface water sources are being undertaken by multi-village drinking water programmes under the National Rural Drinking Water Programme.

  9. Steps are being taken to provide clean and safe water by purifying surface water from multiple sources and supplying it to habitations where the water is affected by high nitrate and phosphate.

  10. Initiatives have been taken to find a long-term solution to the drinking water problems under the National Drinking Water Scheme, which is based on surface water sources.

Conclusions

In the examined region, surface water, which is a primary source of drinking water, has been depleting in recent years. The current investigation therefore assessed the water quality for human consumption. The drinking water indices in this study were derived from seven physicochemical parameters. The study was carried out to provide information on the temporal and spatial variations of surface water hydro-chemistry, using an integrated approach (DWQI, PLSR, and SRI) to determine the suitability of surface water for drinking and household purposes.

The following are the investigation’s main conclusions:

  • The surface water of the research region is mostly weakly alkaline in character.

  • The main causes of the elevated concentrations of PO₄³⁻ and NO₃⁻ are water–rock interaction and human activities.

  • According to the findings, the anions follow the order NO₃⁻ > PO₄³⁻.

  • The DWQI shows that 11.11%, 33.33%, 11.11%, 22.22%, and 22.22% of the sample locations are excellent, good, poor, very poor, and unsuitable, respectively, in terms of drinking water quality. The spatial distribution plot shows that the majority of the samples, five locations, are of low water quality. The results showed that conductivity, phosphate, and nitrate concentrations were elevated as a result of increased evaporation and anthropogenic activities.

  • According to the study, the SRIs extracted from the red, red-edge, and green regions had the highest R² with DWQI. For example, the published green or red index showed a maximum R² = 0.70 for the DWQI, whereas the newly derived index gave R² = 0.72.

  • The PLSR model performed well in estimating DWQI, with the highest R² and lowest RMSE. Overall, the findings showed that surface water quality variation can generally be estimated using the proposed PLSR model. In addition, these multi-SRI based models have the potential to improve the estimation of several WQIs and to serve as a unified method for remotely measuring constituent concentrations in water quality evaluations.

  • The five sites found unsuitable for consumption by all methods are affected by efficient ion leaching, excessive surface water usage, direct effluent discharge, and agricultural influence. Remedial measures and appropriate treatment must therefore be carried out before human consumption.

An awareness of water quality may be crucial when policymakers formulate sustainable water resource management plans, such as reducing fertilizer usage and reducing reliance on surface water in water-stressed areas. The method proposed in this work, which combines WQI, SRI, PLSR, and GIS techniques, may therefore improve the reliability of surface water assessment under various conditions and provide a thorough understanding of the governing mechanisms and of surface water suitability for drinking.

Future scope of the study

  1. The study makes a number of recommendations to lessen health hazards, such as installing water treatment facilities, conducting routine surface water quality monitoring, launching public awareness programs, and encouraging better agricultural methods that use fewer toxic chemicals.

  2. Future research must include a thorough validation procedure for the GIS methodology used, especially the spline interpolation technique, in order to fully assess the accuracy and precision of our GIS-based computations and simulations. Cross-validation is used in this process to compare interpolated values with observed data points.

  3. Calculating accuracy metrics will quantify the average size of the errors in the interpolated data and provide quantitative measures of the model’s performance.

  4. Because of their affordability and scalability, the suggested water treatment technologies can be put into practice even in environments with limited resources.

  5. The viability of these interventions can be further increased through cooperation with non-governmental organizations and municipal governments.

  6. Evaluating the precision of our GIS computations is a crucial area for our upcoming research. This can be done by assessing how consistently the interpolation results hold up across various data subsets. By dividing the dataset into training and testing subsets, researchers can check the reproducibility of the findings within a narrow confidence interval. Additionally, the precision of the interpolated data can be ascertained by computing the standard deviation of repeated measurements.

  7. The suggestions made are useful and implementable for legislators and municipal officials. They are intended to be incorporated into the current frameworks for environmental protection and public health. To help ensure that these recommendations are implemented, the study also makes specific policy proposals, including tighter industrial discharge rules and incentives for safer agricultural inputs.
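The cross-validation and accuracy metrics proposed in this list can be sketched as a leave-one-out procedure: each site is withheld in turn, predicted from the remaining sites, and the errors are summarised as MAE and RMSE. The inverse-distance interpolator, the site coordinates, and the function names are assumptions made for illustration; any interpolator with the same call signature, including a spline, could be substituted.

```python
import math

def idw(samples, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (illustrative interpolator)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def leave_one_out(samples, interpolate):
    """Predict each site from all the others; return (MAE, RMSE)."""
    errors = []
    for k, (xy, observed) in enumerate(samples):
        others = samples[:k] + samples[k + 1:]
        errors.append(interpolate(others, xy) - observed)
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical coordinates (km along the river) paired with the DWQI scores
# reported for S-1 ... S-9; the positions are placeholders, not surveyed data.
scores = [210, 221, 177, 123, 187, 34, 41, 82, 89]
sites = [((float(k), 0.0), float(v)) for k, v in enumerate(scores)]

mae, rmse = leave_one_out(sites, idw)
print(f"MAE={mae:.1f}, RMSE={rmse:.1f}")
```

MAE reports the average size of the interpolation errors, while RMSE penalises large individual errors more heavily, so reporting both gives a fuller picture of accuracy.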