Groundwater quality
Identification of DOM using laser-induced fluorescence spectroscopy
The two broad shoulders at 480–500 nm and 550–570 nm, engulfing the water Raman peak within 527.0 ± 2.0 nm (Fig. 4), are consistent with previous studies by Stedmon et al. [58]. It is important to note that the sharper the water Raman peak, the better the water quality. Similar fluorescence has also been observed for coloured dissolved organic matter in a wide range of environments and in groundwater well samples [56]. In this study, which used fixed excitation and emission wavelengths, fluorescence intensity was influenced by the type of surroundings and activity [6, 23]. Allochthonous or autochthonous humic-like DOM can accordingly be classified as natural or as derived from human activity (e.g. human wastes, farm wastes, leachates), reaching the wells through the flow path of water, discharge, land-use activities and dry and wet deposition [23]. These factors account for the fluorescence spectra (Fig. 5) observed in 2016, 2017 and 2018. The trend observed was the same in all three years of the study; for the sake of brevity, however, the results of 2016 are presented hereafter.
In February 2016, the DOM fluorescence intensities of the water samples ranged from 15.19 to 250.0 a.u., with the lowest value obtained for sample wD1 and the highest for sample wC1. For the same month in 2017, the DOM fluorescence had increased, ranging between 15.57 and 638.97 a.u., with the lowest value obtained for sample wI1 and the highest for sample wE1. The 2017 pattern recurred in 2018, with the lowest value of 19.37 a.u. for sample wI1 and the highest of 803.73 a.u. for sample wE1.
In March 2016 (at the onset of the rainy season), the lowest DOM fluorescence intensity was observed for sample wI2, from a well located in arid land and not close to a septic tank or public toilet. The highest humic-like DOM fluorescence intensity was found in sample wC2, possibly because well wC is located in a wetland environment and close to a septic tank (Table 1). The age and the nature of the walls of well wC might also have contributed to the observed fluorescence intensities. A similar explanation can be given for the DOM fluorescence intensities observed in the same month of 2017 and 2018. In 2017, however, the fluorescence intensities increased, ranging from 19.75 to 689.36 a.u., with sample wI2 having the lowest value and sample wE2 the highest. In 2018, the DOM fluorescence intensities ranged from 14.70 a.u. for sample wI2 to 805.95 a.u. for sample wE2. The high DOM fluorescence intensities obtained in 2017 and 2018 could be attributed to the location of well wE in a wetland environment close to a septic tank (Table 1) and to an increase in rainfall within the period.
In April 2016, during the rainy season, the DOM fluorescence intensities ranged from 41.84 to 743.26 a.u., with sample wI3 having the lowest fluorescence intensity and sample wE3 the highest. In April 2017, the DOM fluorescence intensities ranged from 14.02 to 581.08 a.u., again with wI3 the lowest and wE3 the highest. The reasons given previously may explain the differences between the samples. The decrease in intensity, however, may be due to the lack of intense rainfall during the same month in 2017, possibly as a result of climate change. In 2018, on the other hand, the fluorescence intensities ranged from 20.53 a.u. for sample wI3 to 688.61 a.u. for sample wE3. Thus, rainfall levels appear to affect DOM fluorescence intensity, in addition to the deterioration of the walls of the wells and their locations.
The emission spectra obtained in this study are characterized by broad bands of relatively low fluorescence intensity (a.u.) and by maximum emission wavelengths that vary within a limited range, showing that the samples are of similar origin and nature [36]. The relatively low fluorescence intensities indicate dilute samples. In this study, the authors’ assessment of DOM levels on seasonal, spatial and temporal scales has been mainly qualitative. The intensity (a.u.) of the shoulders of the LIF spectra has been used as a qualitative measure of the DOM concentration in the water samples, with fluorescence intensity in arbitrary units (a.u.) taken as equivalent to ppm or ppb concentration levels [24, 38]. To discriminate between levels of contamination, the degree of water contamination can be quantified from the ratio between different fluorescence peaks or parts of the spectrum [16], without subtracting the area under the standardized peak (Fig. 4a) from the spectral area of each contaminated sample [16], although some studies have used various fluorescence indices to show the extent of discrimination among contaminated drinking water samples.
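As an illustration of the peak-ratio idea, the following minimal Python sketch divides the mean intensity of the two DOM shoulders by the mean intensity around the water Raman peak. The band limits are taken from the text above; the choice of the Raman window as the denominator, the function names and the two spectra are assumptions made for this sketch and are not the procedure of [16] or measured data.

```python
# Illustrative sketch only: band limits follow the text, everything else is assumed.

def band_mean(wavelengths, intensities, lo, hi):
    """Mean intensity of the points whose wavelength lies in [lo, hi] nm."""
    values = [i for w, i in zip(wavelengths, intensities) if lo <= w <= hi]
    if not values:
        raise ValueError(f"no data points between {lo} and {hi} nm")
    return sum(values) / len(values)


def shoulder_to_raman_ratio(wavelengths, intensities):
    """Mean DOM shoulder intensity divided by mean water Raman intensity."""
    shoulder_1 = band_mean(wavelengths, intensities, 480.0, 500.0)
    shoulder_2 = band_mean(wavelengths, intensities, 550.0, 570.0)
    raman = band_mean(wavelengths, intensities, 525.0, 529.0)  # 527.0 +/- 2.0 nm
    return (shoulder_1 + shoulder_2) / (2.0 * raman)


# Synthetic spectra on a 1 nm grid over 460-640 nm (not measured values).
wavelengths = list(range(460, 641))
# "Clean" water: flat low background with a sharp Raman peak.
clean = [100.0 if 525 <= w <= 529 else 10.0 for w in wavelengths]
# "Contaminated" water: strong shoulders that engulf the Raman peak.
contaminated = [
    300.0 if (480 <= w <= 500 or 550 <= w <= 570) else 150.0
    for w in wavelengths
]

clean_ratio = shoulder_to_raman_ratio(wavelengths, clean)
contaminated_ratio = shoulder_to_raman_ratio(wavelengths, contaminated)
print(clean_ratio, contaminated_ratio)
```

Under these assumptions a larger ratio indicates a stronger DOM signal relative to the water Raman peak, which matches the observation that sharper Raman peaks accompany better water quality.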
In general, the authors acknowledge that wells which were not covered also showed increased DOM fluorescence during the rainy season. This is because rainwater brings considerable amounts of dissolved organic matter (DOM), in fractions of low aromaticity and humification degree, from the atmosphere to freshwater and marine environments, although little is known about the chemical composition and bioavailability of rainwater DOM [64]. With regard to seasonality, higher concentrations of humic-like DOM were observed during the wet season as a result of leachates from toilets, wetlands, old refuse dumps, septic tanks and marshy areas. In contrast, the dry season resulted in much lower concentrations of humic-like DOM. Different hydrological processes therefore seem to be the dominant drivers of seasonality at the well sites studied [14], and the increase in humic-like DOM fluorescence intensity observed over the three-year period could be due to such processes, which need to be studied further. The deterioration of the walls of the wells might also have allowed leachates into the well water. Throughout the study period, DOM fluorescence intensity increased in the rainy season, whereas the dry-season intensities were mostly low. The rise in fluorescence intensity of humic DOM substances in the well waters may thus be a consequence of the input of newly formed humic substances. For example, water samples obtained from wells located in wetlands exhibited higher DOM fluorescence intensities than those obtained from wells in arid lands. However, it is well known that the labile constituents of DOM in aquatic environments provide metabolic substrates for microbial growth and would be decomposed to CO2 [7].
Previous studies on aquatic DOM have indicated that the humic fraction of DOM is mainly responsible for the absorbance of UV light and for the photoproduction of labile substrates that can subsequently be utilized by bacteria. In addition, the authors of [63] found that aromatic substances, including protein- and humic-like matter, are easily adsorbed by minerals in aquifers, that humic-like substances are more easily adsorbed than protein-like substances, and that the humic-like matter concentration exhibited no significant change even when groundwater contamination occurred [19].
Classification using multivariate statistical evaluation
Principal component analysis (PCA) based on the treated raw LIF data was performed on all the water samples for 2016, 2017 and 2018 to help discriminate DOM from different sources and/or subject to different transformations in aquatic environments and seasons. PCA is an effective statistical tool, and it was used here to quantify and categorize the similarity among the water samples. For the 2016 fluorescence intensity-emission graphs (Fig. 5), the spectral region 460–640 nm was used in order to obtain conclusive results, which amounts to 162 spectral intensities per sample; statistically, this is preferable. Figure 6 shows the scree plot of the eigenvalues against principal component number and the loadings versus wavelength of the first three principal components. The first three eigenvalues carry most of the information regarding the samples. According to the elbow test [26], the truncation (tr) value was determined to be 3. The loadings of PC1 and PC2 were considered, since eigenvalues below 10^0 (= 1) were deemed insignificant. This means that PC1 and PC2 were of high significance for the classification of the quality of the well water samples. PC1 accounts for more than 99% of the variance and is attributed to the DOM contamination in the water samples. PC2 contributes less than 1% of the variance, which might be due to the O–H bond of the water component in the samples. The loadings indicate that PC3 does not contribute significantly to the classification of the water samples. For a detailed description of principal component analysis, the reader is referred to [8, 62].
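To make the eigenvalue and explained-variance reasoning concrete, the following minimal Python sketch computes the leading eigenvalues of the covariance matrix of mean-centred spectra and expresses them as fractions of the total variance. It is not the authors' implementation: power iteration with deflation is used here only to keep the example free of external libraries, the function names are hypothetical, and the five-point spectra are synthetic (a common spectral shape scaled by an assumed DOM level plus a weak second pattern).

```python
import math


def center_columns(rows):
    """Subtract the column mean from each column (mean-centring)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centred data (rows = samples)."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    vec = [float(i + 1) for i in range(len(matrix))]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        new = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in new))
        if norm < 1e-12:  # matrix is numerically zero
            return 0.0, vec
        vec = [x / norm for x in new]
    # Rayleigh quotient v^T M v gives the eigenvalue estimate.
    value = sum(v * sum(m * u for m, u in zip(row, vec))
                for v, row in zip(vec, matrix))
    return value, vec


def explained_variance_ratios(rows, n_components):
    """Fraction of total variance carried by the first n_components PCs."""
    cov = covariance(center_columns(rows))
    total = sum(cov[i][i] for i in range(len(cov)))
    ratios = []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component just found before the next pass.
        cov = [[c - value * vi * vj for c, vj in zip(row, vec)]
               for row, vi in zip(cov, vec)]
    return ratios


# Synthetic data: each spectrum = level * shape + small * second pattern.
shape = [1.0, 2.0, 3.0, 2.0, 1.0]
pattern = [1.0, -1.0, 0.0, 1.0, -1.0]
levels = [1.0, 2.0, 5.0, 10.0, 20.0]   # assumed DOM levels
small = [0.1, -0.1, 0.05, -0.05, 0.0]  # assumed minor variation
spectra = [[a * s + b * p for s, p in zip(shape, pattern)]
           for a, b in zip(levels, small)]

ratios = explained_variance_ratios(spectra, 2)
print(ratios)
```

With data of this kind almost all the variance falls on the first component, which mirrors the situation described above in which PC1 dominates and PC2 carries only a small remainder. In practice an established PCA routine from a numerical library would normally be preferred.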
A scatter plot (Fig. 7) of PC1 and PC2 for distilled water and the well water samples shows three distinct groups, labelled Groups 1, 2 and 3. Group 1 consists of 20 water samples: 19 well water samples and the distilled water. The well water samples in Group 1 are characterized by low fluorescence backgrounds (intensities) and sharp water Raman peaks, so Group 1 is likely to consist of water samples with low concentrations of DOM. Group 2 is a cluster of samples A2, A3, H3, J3, C1, C2, E2 and F2. The fluorescence spectra of these samples are characterized by moderately high fluorescence intensities (DOM concentrations) and broad water Raman peaks. Samples C3, E3 and F3, which are located on the far right of the PCA score plot, constitute Group 3. In this group, the DOM fluorescence peaks are the highest and broadest, and the water Raman peaks are broader than those in Group 2.
Although Group 3 samples have the highest water Raman peaks, they are classified as the most contaminated water samples because of their relatively high DOM concentrations (Fig. 5). PCA based on DOM indices has been reported to reveal two principal factors, which were related to the concentration and the humic type of DOM, respectively [31]. Most significantly, the clustering in the PCA score plot (Fig. 7) is consistent with the clusters in the cluster tree (Fig. 8), which supports the ability of PCA to classify the water samples. In cluster analysis (CA), the objects (here, the water samples) are grouped based on the similarities within a cluster and the dissimilarities among different clusters [39].
Here, the Euclidean distance is used to measure the similarity and dissimilarity among the samples (Fig. 8). Group 1 (Fig. 7) corresponds to Cluster 1 (Fig. 8) in its membership; similarly, Group 2 corresponds to Cluster 2 and Group 3 to Cluster 3. The dendrogram again makes the similarities and dissimilarities among the water samples evident. Clusters 1 and 2 are very similar to each other, with little marked difference, yet both are unlike Cluster 3. Within Cluster 3, samples C3 and F3 are similar to each other and dissimilar to sample E3.
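The grouping behind such a dendrogram can be sketched with a small agglomerative clustering routine based on Euclidean distance. The linkage rule is not stated in the text, so average linkage is assumed here purely for illustration; the function names and the two-dimensional points (standing in for PC1 and PC2 scores) are hypothetical and are not the measured scores of Fig. 7.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    dists = [math.dist(points[i], points[j])
             for i in cluster_a for j in cluster_b]
    return sum(dists) / len(dists)


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical score-plot coordinates: three low, two moderate, three high.
scores = [
    (0.0, 0.0), (1.0, 0.5), (0.5, 1.0),
    (20.0, 1.0), (21.0, 0.0),
    (60.0, 2.0), (62.0, 1.0), (70.0, 3.0),
]

clusters = agglomerate(scores, 3)
print(clusters)
```

Recording the distance at which each merge occurs would give the heights of the dendrogram branches; the sketch keeps only the final membership for brevity.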
Classification based on K-nearest neighbour (K-NN)
The quality of the water samples studied in 2016 (with regard to DOM contamination levels) was classified using the K-nearest neighbour (K-NN) algorithm based on Euclidean distance. This method is based strictly on simple Euclidean distances between one reference data point (e.g. the singly distilled water) and the well water samples; the shorter the distance, the better the water quality. A bar chart (Fig. 9) of the Euclidean distances of the well water samples shows that B1 has the shortest Euclidean distance and E3 the longest. Using this range, we classified the samples as very good, good, fairly good or bad. B1, D2, G1, G2 and I2 are considered very good water samples; A1, B3, D1, D3, E1, F1, G3, H1 and H2 are deemed good; A2, A3, C1, C2, E2, F2, H3 and J3 fairly good; and C3, E3 and F3 bad. The same classification can be obtained from physical parameters such as the colour, odour and taste of the water samples. K-NN based on Euclidean distance can therefore be used as a tool for effective water quality classification, in addition to the other statistical tools used in this study. A 3 × 4 sample matrix chart (Fig. 10) extracted from Fig. 9 shows the fluorescence fingerprints of three samples from each of the above classes, namely very good (first row from top), good (second row), fairly good (third row) and bad (fourth row, bottom).
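The distance-based grading described above can be sketched as follows: each spectrum is compared with a reference spectrum by Euclidean distance, and the distance is then assigned to one of four classes. The cut-off distances, the sample names X1 to X4 and the three-point spectra are hypothetical values chosen for this sketch; the text does not report the cut-offs actually used.

```python
import bisect
import math

LABELS = ["very good", "good", "fairly good", "bad"]


def distances_to_reference(reference, samples):
    """Euclidean distance of each sample spectrum from the reference."""
    return {name: math.dist(reference, spectrum)
            for name, spectrum in samples.items()}


def grade(distance, cuts):
    """Map a distance to a label using ascending cut-off distances."""
    if len(cuts) != len(LABELS) - 1:
        raise ValueError("need exactly one cut-off between adjacent labels")
    return LABELS[bisect.bisect_left(cuts, distance)]


# Hypothetical reference (distilled water) and sample spectra, in a.u.
reference = [1.0, 5.0, 1.0]
samples = {
    "X1": [1.5, 5.0, 1.5],
    "X2": [4.0, 6.0, 4.0],
    "X3": [10.0, 9.0, 10.0],
    "X4": [30.0, 20.0, 30.0],
}
cuts = [1.0, 5.0, 20.0]  # assumed cut-offs, not taken from Fig. 9

distances = distances_to_reference(reference, samples)
grades = {name: grade(d, cuts) for name, d in distances.items()}
for name in sorted(distances, key=distances.get):
    print(name, round(distances[name], 2), grades[name])
```

Sorting by distance reproduces the ordering of a bar chart such as Fig. 9, with the sample closest to the reference first.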