Prediction of ground water quality index to assess suitability for drinking purposes using fuzzy rule-based approach

Groundwater is the most important natural source of drinking water for many people around the world, especially in rural areas where a treated water supply is not available. Drinking water resources cannot be used optimally or sustained unless the quality of the water is properly assessed. To this end, an attempt has been made to develop a suitable methodology for assessing drinking water quality on the basis of 11 physico-chemical parameters. The present study aims to apply a fuzzy aggregation approach to estimate the water quality index of a sample and so check its suitability for drinking. Based on experts' opinion and the authors' judgement, 11 water quality (pollutant) variables (Alkalinity, Dissolved Solids (DS), Hardness, pH, Ca, Mg, Fe, Fluoride, As, Sulphate, Nitrates) are selected for the quality assessment. To assess the suitability of the developed methodology, its results are compared with those of a widely used deterministic method (weighted arithmetic mean aggregation).


Introduction
Groundwater is an important and limited resource in many parts of the world, and it is heavily used in areas where surface water bodies are scarce. Groundwater quality depends on the quality of recharge water, atmospheric precipitation and inland surface water, and on subsurface geochemical processes. Temporal changes in the origin and composition of the recharge water, together with hydrologic and human factors, may cause periodic changes in groundwater quality. Water pollution not only affects water quality but also threatens human health, economic development and social prosperity (Milovanovic 2007).
A water quality index (WQI) is a mechanism for presenting a cumulatively derived numerical expression that defines a certain level of water quality. In other words, a WQI summarizes large amounts of water quality data into simple terms (e.g., excellent, good, bad) for reporting to management and the public in a consistent manner. The concept of the WQI is based on comparing the water quality parameters with the respective regulatory standards: it assigns a single value to the water quality of a source, translating the list of constituents and their concentrations in a sample into that value (Khan et al. 2003; Abbasi 2002). The index method was initially proposed by Horton (1965). Since then, the formulation and use of indices has been strongly advocated by agencies responsible for water supply and the control of water pollution. Landwehr (1979) points out that an index is a performance measurement that aggregates information into a usable form, reflecting the composite influence of significant physical, chemical and biological parameters of water quality conditions. House and Newsome (1989) state that the use of a WQI allows 'good' and 'bad' water quality to be quantified by reducing a large quantity of data on a range of physico-chemical and biological parameters to a single number in a simple, objective and reproducible manner. Various aggregation methods are used to combine quality-monitoring data into an overall quality index, and over the last three decades a number of mathematical functions for aggregating water quality and water pollution indices have been suggested (Horton 1965; Brown et al. 1970; Prati et al. 1971; Dinius 1972; Dee et al. 1973; McDuffie and Haney 1973; Inhaber 1974; Walski and Parker 1974; Truett et al. 1975; Landwehr and Deininger 1976; Ross 1977; Ott 1978; Stoner 1978; Ball and Church 1980; Bhargava 1983; Dinius 1987; House and Ellis 1987; Smith 1989, 1990; Dojlido et al. 1994; Štambuk-Giljanovic 1999; Pesce and Wunderlin 2000; Swamee and Tyagi 2000; Jonnalagadda and Mhere 2001; Cude 2001; Abbasi 2002; Nagels et al. 2002; Said et al. 2004; Debels et al. 2005; Bordalo et al. 2006; Kannel et al. 2007; Swamee and Tyagi 2007). These aggregation functions take additive, multiplicative, minimum or maximum operator forms. Each function has its own merits and demerits and is applicable only in limited situations. The most appropriate aggregation function is one that is free from, or minimizes, the overestimation (ambiguity), underestimation (eclipsing) and rigidity problems.
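To make the eclipsing problem concrete, the four operator forms can be compared on one set of sub-indices. This is a minimal illustration and is not taken from the study. It assumes pollution-type sub-indices, where higher means worse and 1.0 marks the permissible limit. The ten-plus-one sub-index values and the equal weights are hypothetical.

```python
import math

# Assumed scale: pollution-type sub-indices, higher is worse, 1.0 = permissible limit.


def additive(sub: list[float], w: list[float]) -> float:
    """Weighted arithmetic mean: sum(w_i * I_i) / sum(w_i)."""
    return sum(wi * si for wi, si in zip(w, sub)) / sum(w)


def multiplicative(sub: list[float], w: list[float]) -> float:
    """Weighted geometric mean: prod(I_i ** (w_i / sum(w))); requires I_i > 0."""
    total = sum(w)
    return math.prod(si ** (wi / total) for wi, si in zip(w, sub))


def minimum_operator(sub: list[float]) -> float:
    """Index set by the smallest sub-index."""
    return min(sub)


def maximum_operator(sub: list[float]) -> float:
    """Index set by the largest (worst) sub-index."""
    return max(sub)


# Hypothetical sample: ten parameters well inside their limits, one at three times its limit
sub = [0.2] * 10 + [3.0]
w = [1.0] * 11
print(round(additive(sub, w), 3))       # 0.455 -> looks acceptable (< 1)
print(round(maximum_operator(sub), 3))  # 3.0   -> the exceedance is flagged
```

With equal weights, the additive mean stays below the assumed threshold of 1.0 even though one parameter is at three times its limit. This is eclipsing. The maximum operator exposes the exceedance, but it ignores every other parameter.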
There is always a certain degree of arbitrariness inherent in the choice of an aggregation function. The objective of this study is to develop a fuzzy aggregation approach as a suitable technique for handling water quality data and for conducting a cumulative water pollution risk assessment of multiple pollutants under uncertainty. Fuzzy aggregation is the process by which the fuzzy sets that represent the outputs of each rule are combined into a single fuzzy set. This aggregated output fuzzy set is the input to the defuzzification process, whose output is a single number.
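The aggregation step can be sketched as follows. The sketch assumes Mamdani-style min implication, in which each rule's output set is clipped at its firing strength. It also assumes max aggregation on a discretised output axis. The 0 to 4 output universe, the three output sets and the firing strengths are hypothetical and serve only as an illustration.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


UNIVERSE = [i / 100 for i in range(401)]  # hypothetical output axis 0.00 .. 4.00
OUTPUT_SETS = {                            # hypothetical output fuzzy sets (a, b, c)
    "good": (0.0, 0.0, 2.0),
    "medium": (0.0, 2.0, 4.0),
    "poor": (2.0, 4.0, 4.0),
}


def aggregate(firing: dict[str, float]) -> list[float]:
    """Clip each rule's output set at its firing strength, then take the pointwise max."""
    return [
        max(min(strength, tri(z, *OUTPUT_SETS[label])) for label, strength in firing.items())
        for z in UNIVERSE
    ]


# Two rules fired: one concluding "good" at 0.3, one concluding "medium" at 0.8
mu = aggregate({"good": 0.3, "medium": 0.8})
print(mu[0], mu[200], mu[400])  # 0.3 0.8 0.0
```

The list `mu` is the single aggregated fuzzy set. Defuzzification then reduces it to one crisp number.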
Water pollution index design primarily involves four basic steps: selection of the key water pollutant variables; determination of a weight for each selected variable; formulation of a sub-index for each parameter; and aggregation of the sub-indices to yield an overall index.
The study aims to demonstrate the application of a soft computing approach to the prediction of the water quality index (WQI).

Development of hierarchical fuzzy model for prediction of water quality index
The present study aims to develop a hierarchical fuzzy model for predicting the water quality index. The fuzzy logic formalism has been used to determine the water quality index through fuzzy reasoning, and the result has been compared with the output of a deterministic method (conventional WQI). A number of artificial data sets have been prepared to demonstrate the water quality assessment. In each data set, the concentrations of the eleven water quality parameters considered in the study are chosen judiciously to cover their various ranges. Membership functions of the determinants and fuzzy rule bases were defined, and the model, based on a Mamdani fuzzy inference system, was evaluated with the artificial water quality data sets.
The methodology for developing the fuzzy model to predict the water quality index involves the following steps:
Step I: identification of the system's variables. The first and most important step in modelling is the identification of the system's input and output variables. The structure of the hierarchical fuzzy model is shown in Fig. 1. The first fuzzy model (FIS1) has four input parameters (Alkalinity, Dissolved Solids, Hardness and pH), the second model (FIS2) also has four input parameters (Ca, Mg, Fe and As), and the third model (FIS3) has three input parameters (sulphate, nitrate and fluoride). The outputs of FIS1, FIS2 and FIS3, denoted OG1, OG2 and OG3, respectively, are then aggregated in a subsequent model to obtain the final output, the fuzzy water quality index (FWQI). This relationship between inputs and output can be expressed mathematically as

$$\mathrm{OG}_1 = f_1(\text{Alkalinity}, \text{Dissolved Solids}, \text{Hardness}, \text{pH}), \quad \mathrm{OG}_2 = f_2(\text{Ca}, \text{Mg}, \text{Fe}, \text{As}), \quad \mathrm{OG}_3 = f_3(\text{Sulphate}, \text{Nitrate}, \text{Fluoride})$$

$$\mathrm{FWQI} = f(\mathrm{OG}_1, \mathrm{OG}_2, \mathrm{OG}_3)$$

Step II: determination of the ranges of input and output variables. The second step is to determine the ranges of the input and output variables. The minimum value of every parameter except pH is taken as zero, reflecting the best water quality; for pH, a value of 7 represents the best water quality. The maximum value of each parameter is selected on the basis of its permissible concentration in the drinking water quality standard (IS 10500). For every parameter except pH, it is four times the permissible concentration. The IS 10500 standards for the parameters considered in the study are listed in Table 1. In fuzzy modelling, these variables (water quality parameters) are defined as linguistic variables, whose linguistic values are words or sentences in a natural or synthetic language. Table 2 shows the linguistic variables, their linguistic values and the associated fuzzy intervals.
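The Fig. 1 hierarchy and the Step II ranges can be recorded compactly, as sketched below. The dictionary layout, the function name `input_range` and the placeholder limits are hypothetical. The actual IS 10500 limits are those in Table 1. pH is left out because its range is anchored at 7 and its upper bound is not stated here.

```python
# Hypothetical encoding of the Fig. 1 hierarchy: each sub-model and its inputs.
HIERARCHY = {
    "FIS1": ("Alkalinity", "Dissolved Solids", "Hardness", "pH"),
    "FIS2": ("Ca", "Mg", "Fe", "As"),
    "FIS3": ("Sulphate", "Nitrate", "Fluoride"),
    "FWQI": ("OG1", "OG2", "OG3"),  # outputs of FIS1..FIS3 feed the final model
}

RANGE_FACTOR = 4.0  # upper bound = 4 x permissible concentration (all inputs except pH)


def input_range(permissible_limit: float, factor: float = RANGE_FACTOR) -> tuple[float, float]:
    """(minimum, maximum) for a non-pH parameter: best value 0, maximum factor x limit."""
    if permissible_limit <= 0:
        raise ValueError("permissible limit must be positive")
    return (0.0, factor * permissible_limit)


# Placeholder limits in mg/l, for illustration only; Table 1 holds the real values.
placeholder_limits = {"Alkalinity": 200.0, "Fe": 0.3}
ranges = {name: input_range(limit) for name, limit in placeholder_limits.items()}
print(ranges["Alkalinity"])  # (0.0, 800.0)
```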
Step III: selection of the membership functions for the input and output variables. The next step is to express the linguistic values as fuzzy sets, each represented by its membership function. The amount of overlap and the shape of the fuzzy sets for each input variable should be decided by an expert. The triangular membership function, the simplest one, has been used because of its computational efficiency. The membership functions for all inputs and the output are shown in Fig. 2a-l.
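A triangular membership function and a three-term fuzzy partition can be written as below. This is a sketch. The term names `low`, `medium` and `high`, and their even spacing over the input range, are assumptions and are not the linguistic values of Table 2.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside [a, c], rising linearly to 1 at the peak b.

    a == b or b == c gives a left or right shoulder at the edge of the universe.
    """
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x: float, upper: float) -> dict[str, float]:
    """Degrees of x in three hypothetical terms spread evenly over [0, upper]."""
    mid = upper / 2
    return {
        "low": triangular(x, 0.0, 0.0, mid),
        "medium": triangular(x, 0.0, mid, upper),
        "high": triangular(x, mid, upper, upper),
    }


print(fuzzify(50.0, 200.0))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

Because neighbouring triangles cross at 0.5, the degrees of any input inside the range sum to 1. This is one simple way to choose the overlap that Step III leaves to the expert.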
Step IV: formation of the set of linguistic rules. The next step is to write the linguistic rules. The rule base is a set of linguistic statements in the form of IF-THEN rules with antecedents and consequents, in which the antecedent clauses are connected by the AND operator. In general, a multi-input single-output (MISO) fuzzy rule-based system can be represented as

$$R^{(i)}: \ \text{IF } X_1 \text{ is } B_1^{(i)} \text{ AND } X_2 \text{ is } B_2^{(i)} \text{ AND } \cdots \text{ AND } X_n \text{ is } B_n^{(i)} \text{ THEN } Y_1 \text{ is } D_1^{(i)}$$

where $X_1, X_2, \ldots, X_n$ are the input linguistic variables, $Y_1$ is the output linguistic variable, and $B_1^{(i)}, B_2^{(i)}, \ldots, B_n^{(i)}$ and $D_1^{(i)}$ are linguistic values defined by fuzzy sets on $X_1, X_2, \ldots, X_n$ and $Y_1$, respectively.
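The MISO rule described in Step IV can be evaluated using min as the AND operator. This is a common choice in Mamdani systems and is the one assumed in this sketch. The `Rule` class, the membership degrees and the example rule are hypothetical and do not come from Tables 3-6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF X_1 is B_1 AND ... AND X_n is B_n THEN Y_1 is D_1."""
    antecedent: tuple[str, ...]  # one linguistic value per input, in input order
    consequent: str              # linguistic value of the output


def firing_strength(rule: Rule, memberships: list[dict[str, float]]) -> float:
    """Degree to which the rule fires, using min as the AND operator.

    memberships[k][label] is the degree of input k in the fuzzy set `label`.
    """
    if len(rule.antecedent) != len(memberships):
        raise ValueError("one antecedent term is needed per input")
    return min(memberships[k][label] for k, label in enumerate(rule.antecedent))


# Hypothetical fuzzified inputs and rule
inputs = [{"low": 0.2, "high": 0.8}, {"low": 0.6, "high": 0.4}]
rule = Rule(antecedent=("high", "low"), consequent="poor")
print(firing_strength(rule, inputs))  # 0.6
```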
Rule bases for each FIS are written separately and listed in Tables 3, 4, 5 and 6 for FIS1, FIS2, FIS3 and FWQI, respectively. Tables 3 and 4 each contain 81 rules (for FIS1 and FIS2, respectively), and Tables 5 and 6 each contain 27 rules (for FIS3 and FWQI, respectively). The methodology for developing the fuzzy model for prediction of the water quality index (WQI) has been implemented in the Fuzzy Logic Toolbox of MATLAB 7.
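The rule counts are consistent with a complete rule base in which every input has three linguistic values, since 3^4 = 81 and 3^3 = 27. This section does not state the number of values per input, so that is an assumption here. Under it, the antecedents of such a rule base can be enumerated as below. The consequent of each rule is an expert judgement recorded in Tables 3-6 and is not generated here.

```python
from itertools import product

# Assumption: three linguistic values per input (consistent with 81 = 3**4, 27 = 3**3).
TERMS = ("low", "medium", "high")


def complete_antecedents(n_inputs: int) -> list[tuple[str, ...]]:
    """Every combination of input terms: one antecedent per rule of a complete rule base."""
    return list(product(TERMS, repeat=n_inputs))


four_input = complete_antecedents(4)   # FIS1 or FIS2
three_input = complete_antecedents(3)  # FIS3 or FWQI
print(len(four_input), len(three_input))  # 81 27
print(four_input[0])  # ('low', 'low', 'low', 'low')
```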
Step V: fuzzy inference and defuzzification. The final step is to select the fuzzy inference system for suitable aggregation and to defuzzify the output into a crisp value. Fuzzification is the process of transforming real-valued inputs into fuzzy values, whereas defuzzification transforms fuzzy outputs into real values. The output of the model is calculated using the centroid method, which computes the centre of gravity of the final (aggregated) membership function as follows:

$$z^{*} = \frac{\int \mu_C(z)\, z \, dz}{\int \mu_C(z)\, dz}$$

where $z^{*}$ is the crisp output and $\mu_C(z)$ is the membership function of the aggregated output fuzzy set.
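The whole Step V pipeline can be sketched for a hypothetical two-input system: min for AND, min implication, max aggregation and centroid defuzzification. This is not the study's MATLAB model. The 0 to 4 scale, the three triangular sets and the "worse input decides" rule base are illustrative assumptions. The centroid integral is approximated by a sum over a grid with step 0.01.

```python
from itertools import product

# Hypothetical two-input Mamdani system on a 0..4 scale (not the paper's FIS).
SETS = [(0.0, 0.0, 2.0), (0.0, 2.0, 4.0), (2.0, 4.0, 4.0)]  # low/good, medium, high/poor
GRID = [i / 100 for i in range(401)]                         # discretised output axis


def tri(x: float, a: float, b: float, c: float) -> float:
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Hypothetical complete rule base: the worse input term decides the output term.
RULES = [((i, j), max(i, j)) for i, j in product(range(3), repeat=2)]


def infer(x1: float, x2: float) -> float:
    mu1 = [tri(x1, *s) for s in SETS]
    mu2 = [tri(x2, *s) for s in SETS]
    aggregated = [0.0] * len(GRID)
    for (i, j), out in RULES:
        strength = min(mu1[i], mu2[j])  # AND = min
        if strength == 0.0:
            continue
        for k, z in enumerate(GRID):
            clipped = min(strength, tri(z, *SETS[out]))  # min implication
            aggregated[k] = max(aggregated[k], clipped)  # max aggregation
    area = sum(aggregated)
    if area == 0.0:
        raise ValueError("no rule fired; inputs outside the universe?")
    # Centroid: discrete counterpart of the integral ratio for z*
    return sum(z * m for z, m in zip(GRID, aggregated)) / area


print(round(infer(2.0, 2.0), 3))  # 2.0
```

By symmetry, inputs at the centre of the scale give a crisp output of 2.0. With one input fixed at 1.0, raising the other from 1.0 to 3.0 moves the output above 2.0.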

Results and discussion
The rule representations of the four models FIS1, FIS2, FIS3 and FWQI are shown in Figs. 3, 5, 7 and 9, respectively, and their surface views in Figs. 4a-f, 6a-f, 8a-c and 10a-c, respectively. The rule base representation in Fig. 3 indicates that the probable value of FIS1 is 1.73 when Alkalinity, Dissolved Solids, Hardness and pH are at their average values of 400, 1,000 and 600 mg/l and 7, respectively. The results are summarised in the 3D plots of Fig. 4a-f. Figure 4a shows FIS1 as a function of Alkalinity and Dissolved Solids, while the third (Hardness) and fourth (pH) inputs are not shown in the view. The model clearly indicates that FIS1 increases as either Alkalinity or Dissolved Solids increases. The surfaces for the other input combinations, with the remaining inputs at their default values, can be interpreted in the same way.
Similarly, the rule base representation in Fig. 5 indicates that the probable value of FIS2 is 2.03 when the average concentrations of Ca, Mg, Fe and As are 150, 60, 0.6 and 0.1 mg/l, respectively. The results are summarised in the 3D plots of Fig. 6a-f. Figure 6a shows FIS2 as a function of Ca and Mg, while the third (Fe) and fourth (As) inputs are not shown in the view. The model clearly indicates that FIS2 increases as either Ca or Mg increases. The surfaces for the other input combinations, with the remaining inputs at their default values, can be interpreted in the same way.
The rule base representation in Fig. 7 indicates that the probable value of FIS3 is 2.09 when the average concentrations of sulphate, nitrate and fluoride are 400, 90 and 3 mg/l, respectively. The results are summarised in the 3D plots of Fig. 8a-c. Figure 8a shows FIS3 as a function of sulphate and nitrate, while the third input (fluoride) is not shown in the view. The model clearly indicates that FIS3 increases as either sulphate or nitrate increases. The surfaces for the other input combinations, with the remaining input at its default value, can be interpreted in the same way.
The fuzzy water quality index (FWQI) model has been computed as a function of FIS1, FIS2 and FIS3. The rule base representation in Fig. 9 indicates that the probable value of FWQI is 2 when FIS1, FIS2 and FIS3 are all equal to 2. The results are summarised in the 3D plots of Fig. 10a-c. Figure 10a shows FWQI as a function of FIS1 and FIS2, while the third input (FIS3) is not shown in the view. The model clearly indicates that FWQI increases as either FIS1 or FIS2 increases. The water quality index is therefore a reverse scale: a higher WQI indicates poorer water quality, and vice versa.

Validation of the model
Appl Water Sci
The model has been validated by comparing the fuzzy WQI values predicted by the model with deterministic WQI values. The methodology for determining the deterministic WQI is explained in the next section. Nine artificial data sets covering each water quality parameter have been prepared to compare the water quality indices. The parameter concentrations in these data sets are chosen to cover all possible ranges of pollutant concentration. The artificial data sets thus generated are shown in Table 7.
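The notes to Table 7 define conditions B to G as fixed fractions of each parameter's permissible limit. For a non-pH parameter, the corresponding concentrations can therefore be generated as sketched below. Condition A is taken here as the zero minimum used in the model, which is an assumption. The function name and the 100 mg/l limit are hypothetical. pH (best value 7) and the arbitrarily chosen conditions H and I are left out.

```python
# Condition -> fraction of the permissible limit, following the Table 7 notes.
# A is assumed to be the model's best value (0 for non-pH parameters);
# H and I are arbitrary levels above the limit and are omitted.
CONDITION_FRACTIONS = {"A": 0.0, "B": 0.2, "C": 0.4, "D": 0.6, "E": 0.8, "F": 1.0, "G": 4.0}


def condition_levels(permissible_limit: float) -> dict[str, float]:
    """Concentrations for conditions A-G of one non-pH parameter."""
    return {cond: frac * permissible_limit for cond, frac in CONDITION_FRACTIONS.items()}


levels = condition_levels(100.0)  # hypothetical limit of 100 mg/l
print(levels["B"], levels["F"], levels["G"])  # 20.0 100.0 400.0
```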
The predicted FWQI values for all data sets are listed in Table 7. The deterministic WQI has also been determined for the same data sets and reported in Table 7, and the predicted FWQI values are compared with it. For a graphical comparison, the predicted fuzzy and deterministic water quality indices are plotted in Fig. 11. The trends of the FWQI and WQI lines reveal that the developed fuzzy model can be used to predict the water quality index, and that the fuzzy aggregation mechanism gives a better representation of water quality than the existing aggregation method.

Determination of WQI using deterministic method
The weighted arithmetic mean function (Horton 1965; Brown et al. 1970; Prati et al. 1971; Dinius 1972; Dee et al. 1973; Inhaber 1974; Ott 1978; Ball and Church 1980; Egborge and Coker 1986; Giljanovic 1999; Prasad and Bose 2001; Bordalo et al. 2001; Kumar and Alappat 2004) has been used to determine the water quality index (WQI). The weighted arithmetic mean is an ambiguity-free function that shows little eclipsing when the number of variables is large, and it is a widely used aggregation function. The formula used to determine the aggregated water quality index is given below:

$$\mathrm{WQI} = \frac{\sum_{i=1}^{n} W_i I_i}{\sum_{i=1}^{n} W_i}$$
where $I_i$ is the sub-index of the $i$th water quality parameter, WQI is the water quality index, $n$ is the number of water quality parameters considered and $W_i$ is the weightage of the $i$th water quality parameter. The sub-index of the $i$th water quality parameter can be determined by

$$I_i = \frac{C_i - C_{\min}}{C_s - C_{\min}}$$

where $C_i$ is the observed concentration of the $i$th water quality parameter, $C_s$ is its concentration limit as given in Table 1, and $C_{\min}$ is its minimum concentration, reflecting the best water quality. The minimum value of all parameters considered in the model is 0, except pH, for which $C_{\min} = 7$ represents the best water quality.

The weightage of the individual pollutants can be found using the analytic hierarchy process (AHP). AHP is a systematic method for comparing a list of objectives or alternatives. It forms a pair-wise comparison matrix $A$,

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \qquad a_{ii} = 1, \quad a_{ji} = \frac{1}{a_{ij}}$$

where the entry $a_{ij}$ in the $i$th row and $j$th column gives the relative importance of water pollutant parameter $P_i$ compared with $P_j$. The comparison matrix was generated by expert ranking using Saaty's scale (1980). The resulting weightages of the pollutants sum to

$$\sum_{i=1}^{n} W_i = 1$$

The consistency ratio (CR) of matrix $A$ was calculated to be 0.009, which is < 0.1 as per Saaty (1980); the consistency of matrix $A$ is therefore acceptable.

Water pollution parameters' concentrations/levels under different conditions
Condition A: desirable or minimum value of all parameters, reflecting the best water quality; B: 20 % of the permissible limit for all parameters; C: 40 % of the permissible limit for all parameters; D: 60 % of the permissible limit for all parameters; E: 80 % of the permissible limit for all parameters; F: the permissible limit, reflecting the threshold level of water quality for all parameters; G: the maximum limit for model application (4 times the permissible limit); H and I: concentration levels arbitrarily assumed above the permissible limit

Fig. 11 Graphical representation of FWQI and deterministic WQI (series: deterministic WQI, FWQI)
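The deterministic method can be sketched end to end. The pipeline takes AHP weights from a pair-wise comparison matrix, computes the consistency ratio, forms linear sub-indices and aggregates them with the weighted arithmetic mean WQI = Σ W_i I_i / Σ W_i. This is a minimal sketch under stated assumptions:

- The weights are taken as the principal eigenvector, found by power iteration.
- The sub-index is assumed to be I_i = (C_i − C_min)/(C_s − C_min).
- The random-index values are Saaty's commonly tabulated ones.
- The 3 x 3 matrix and the concentrations are hypothetical, not the study's eleven-parameter matrix.

```python
# Saaty's random consistency index RI for matrix orders 1..11.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}


def ahp_weights(a: list[list[float]], iterations: int = 100) -> list[float]:
    """Principal eigenvector of the pair-wise matrix by power iteration, normalised to sum 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_ratio(a: list[list[float]], w: list[float]) -> float:
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(a)
    if RANDOM_INDEX[n] == 0.0:
        return 0.0  # matrices of order 1 or 2 are always consistent
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1) / RANDOM_INDEX[n]


def sub_index(c: float, c_s: float, c_min: float = 0.0) -> float:
    """Assumed linear sub-index: 0 at the best value, 1 at the permissible limit."""
    return (c - c_min) / (c_s - c_min)


def weighted_arithmetic_wqi(sub_indices: list[float], weights: list[float]) -> float:
    """WQI = sum(W_i * I_i) / sum(W_i)."""
    return sum(w * i for w, i in zip(weights, sub_indices)) / sum(weights)


# Hypothetical 3-parameter comparison matrix on Saaty's 1-9 scale (not the study's matrix A)
A = [[1.0, 3.0, 5.0],
     [1 / 3, 1.0, 3.0],
     [1 / 5, 1 / 3, 1.0]]
W = ahp_weights(A)
print(consistency_ratio(A, W) < 0.1)  # True -> acceptable consistency
I = [sub_index(100.0, 200.0), sub_index(0.45, 0.3), sub_index(7.5, 8.5, c_min=7.0)]
print(round(weighted_arithmetic_wqi(I, W), 3))
```

Under the assumed sub-index, a pH below 7 would give a negative value. The text does not say how the study handles that case, so a real implementation would have to decide it, for example by using the absolute deviation from 7.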

Conclusions
The study suggests a robust decision-making tool for drinking water quality management in the form of the fuzzy water quality index (FWQI). The developed methodology demonstrates how a single index value can be determined, making the assessment of drinking water quality more understandable, especially for the public. This new index is believed to assist decision makers in reporting the state of groundwater quality for drinking purposes. It has been demonstrated that computing with linguistic terms within a fuzzy inference system (FIS) improves the tolerance for imprecise data. The model was developed for water quality assessment with an artificial dataset covering eleven water quality parameters (Alkalinity, Hardness, pH, Dissolved Solids, Ca, Mg, As, Fluoride, Sulphate, Nitrate and Iron). The authors believe that fuzzy logic concepts, if used logically, could be an effective tool for drinking water policy issues. The fuzzy model developed is applicable only to the specific water quality parameters and concentration ranges selected.