Water quality assessment of the Tano Basin in Ghana: a multivariate statistical approach

Multivariate statistical techniques, including principal component and factor analyses, were applied in this study to assess the quality of surface water from the Tano basin in Ghana. Water samples were obtained from three monitoring stations from January to October 2016. Spearman's rho correlation (two-tailed, P < 0.05) revealed positive correlations between pH and total dissolved solids, alkalinity, electrical conductivity, and major anions and cations such as SO4, F, Ca, K, Na and Mg. Negative correlations existed between pH and colour, turbidity and total suspended solids. The principal component analysis shows that five principal components explain more than 91.57% of the total variance and hence can be relied upon to identify the main sources of variation in the physicochemical properties of the water samples. Principal component 1 accounts for about 54.26% of the variance and has high loadings for electrical conductivity, Na, Ca, K and Mg. Principal component 2, which explains 33.94% of the total variance, has high loadings for pH, SO4, HCO3 and total alkalinity. Component 3 shows high loadings for TDS, TSS and conductivity and accounts for 3.378% of the variation in the hydrochemistry. Components 4 and 5 reflect the joint influence of anthropogenic activities and the partial ecological recovery system of the river and its basin, which affect the overall water quality within the basin.


Introduction
Water is one of the most important resources on earth and is required in its pure state for the existence of life. The growth of the human population, with its attendant intensification of industrial activities, is impacting negatively on ground and surface water quality across the globe. Indiscriminate discharge of untreated mining and industrial wastes, coupled with runoff from agricultural activities, contributes significantly to the pollution of water bodies (Armah et al. 2014; Iscen et al. 2008; Obiri 2007).
The geological nature of a particular area plays a major role in determining the quality of that area's water resources. For instance, minerals dissolution from dominant geological formations (water-rock interactions) typically determines the physicochemical characteristics of either the surface water or the groundwater of that particular area. Both geogenic and anthropogenic factors become important when discussing the overall water resources quality of any given area (Naylor 2003).
Within the Tano basin of Ghana, several land-based activities, such as gold mining by either legalised small-scale or illegal gold mines and runoff of pesticides from agricultural lands, have made significant contributions to the concentrations of contaminants in the water bodies of the basin (Baba and Gündüz 2017). Most of the scientific studies in Ghana have assessed the quality of water bodies through direct measurement of contaminant levels. For example, Akabzaa et al. (2007) measured the highest concentration of arsenic (3300 µg/L) in water samples from the Obuasi mine area within the Pra Basin, whilst the lowest arsenic concentration (0.05 µg/L) was measured by Ansa-Asare et al. (2014) in water samples from the southwestern river systems.
Statistical methods such as descriptive statistics and graphics, as well as isotopic techniques, have been used to evaluate the sources of nutrients and other contaminants in surface water (Hounslow 2018). A combination of descriptive graphics and multivariate statistical techniques is also a powerful tool for characterizing contaminant levels in water basins with complicated land use types and a history of pollution (Belkhiri et al. 2011). Examples of multivariate statistical techniques that have been used in previous water quality assessments are principal component analysis (PCA), hierarchical cluster analysis (HCA) and discriminant analysis (DA).
The use of multivariate statistical techniques to explain hidden structures in water quality data in Ghana is now gaining ground (Armah et al. 2010). The main aim of this paper is, therefore, to explore the use of multivariate statistical techniques in assessing the quality of water bodies within the Tano basin in Ghana.

The study area
The vegetation in the Tano basin is mainly tropical evergreen forests which constitute 50% of forest reserve, Celtis-Triplochiton association (20%), and rain forests (10%). The annual rainfall for the Tano basin, which is part of the southwestern rivers system of Ghana, varies from 1136.7 mm to 2156.0 mm. The basin has two rainfall seasons which peak in May/June and October/November. The annual temperature of the area has minimum and maximum values of 25.0 °C and 27.0 °C, respectively, with variations of 3 °C to 5 °C from the mean during the daytime. The relative humidity of the area ranges from 58 to 96% over the year. The geological formations underlying the Tano river basin, with their respective percentage areal coverage, are: Birimian Volcanics (25.5%), Birimian Sediments (31.9%), Tarkwaian (0.6%), Upper Voltaian (1.8%), Granite (38.1%) and Eocene & Cretaceous (2.1%) (Kesse 1985). The dominant formation is the Granitoids, and the least abundant is the Tarkwaian formation underlying the geographic boundary of the catchment (Fig. 1).
The Birimian rocks are strongly folded, foliated and jointed with associated intense weathering along with fractures and other weak zones which can facilitate percolation of water to enhance groundwater storage. The Tarkwaian formation, unlike the Birimian, is slightly metamorphosed and folded with some level of openings along joints. It is permeable and facilitates groundwater development and storage. The Upper Voltaian, otherwise known as the Kwahu-Group, is the thickest and coarsest in the southeast. The basement sandstone of the Upper Voltaian is rich in aquifer system at depth (Tetteh 2016). The granites are characterised by fractured aquifer systems; the sandstone of the tertiary cretaceous along the coast is a rich aquifer system with loose sand shallow aquifers. Eocene and Cretaceous sediments are composed of intercalated beds of sand (SiO2), clay (44% SiO2 and 40% Al2O3), fossiliferous sandy limestone interbedded with limestones, marl (CaCO3), sandstone, siltstone, mudstone, and shale with basal conglomerates (Kesse 1985).

Sampling
The water samples were collected from surface water bodies within the Tano basin from January to October 2016. A Garmin Etrex GPS was used to determine the geographic position of all the sampling locations. In all, sixty (60) samples were collected from three different locations on the Tano River (i.e. upstream, midstream and downstream) (see Fig. 2). Sampling was carried out by following the  (Kesse 1985). Before sampling, the sampling bottles were washed with detergent and rinsed with 10% hydrochloric acid followed by double-distilled water. At each sampling site, the bottles were first rinsed with the water to be collected so as to eliminate or minimise any potential cross-contamination. The samples were kept at around 4 °C by packing them in ice in ice chests and were transported to the laboratory for analysis.

Laboratory analysis
The methods used in the laboratory for the determination of all the water quality parameters followed APHA (1998) and are summarized below.
Dissolved oxygen (DO) and biochemical oxygen demand (BOD) were determined using the modified Winkler method. The nutrient contents of the samples were determined as follows: (a) orthophosphate (PO4-P), by the ammonium molybdate and ascorbic acid method; (b) ammonia-nitrogen (NH4-N), by the indophenol blue method; (c) nitrate-nitrogen (NO3-N), by hydrazine reduction followed by diazotization to form an azo dye, which was measured colorimetrically.
Magnesium (Mg) concentration was calculated using the formula (total hardness - calcium hardness) × 0.244, chloride (Cl) was determined by the argentometric method, and heavy metals were determined by atomic absorption spectrophotometry. Conductivity, turbidity and total suspended solids (TSS) were determined using a conductivity meter, a DRT 100B turbidimeter, and the membrane filtration method (glass fibre type C, dried at 105 °C), respectively. Total dissolved solids (TDS) were determined by weighing the residue after evaporating a known volume of the sample.
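The magnesium calculation can be written as a one-line conversion. The sketch below is illustrative only: the function name is hypothetical, and it assumes both hardness values are expressed in mg/L as CaCO3, which the text does not state explicitly.

```python
def magnesium_mg_per_l(total_hardness, calcium_hardness):
    """Estimate Mg (mg/L) from total and calcium hardness.

    Assumption: both inputs are in mg/L as CaCO3.
    """
    magnesium_hardness = total_hardness - calcium_hardness
    if magnesium_hardness < 0:
        raise ValueError("calcium hardness cannot exceed total hardness")
    # Factor 0.244 is the one quoted in the text.
    return magnesium_hardness * 0.244


# Hypothetical values, not taken from the study.
example_mg = magnesium_mg_per_l(100.0, 60.0)
```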

Data treatment
The data obtained from laboratory analysis were summarised using descriptive statistics: mean, standard deviation, skewness and kurtosis. Standardized skewness and standardized kurtosis were computed to assess whether the samples came from normal distributions. Values outside the range − 2 to + 2 are deemed to depart significantly from normality. The statistical analyses were performed for all variables in the original dataset using SPSS version 17.
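The normality screen can be sketched as follows. This is not the SPSS procedure used in the study: it assumes the common large-sample approximations for the standard errors, sqrt(6/n) for skewness and sqrt(24/n) for excess kurtosis, and the function names and sample values are hypothetical.

```python
import math


def standardized_shape(values):
    """Return (standardized skewness, standardized excess kurtosis).

    Assumption: standard errors are approximated by sqrt(6/n) and sqrt(24/n).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness / math.sqrt(6.0 / n), excess_kurtosis / math.sqrt(24.0 / n)


def departs_from_normality(values, limit=2.0):
    """Flag a variable whose standardized skewness or kurtosis lies outside -limit..+limit."""
    z_skew, z_kurt = standardized_shape(values)
    return abs(z_skew) > limit or abs(z_kurt) > limit


# Hypothetical readings: a symmetric set and a set with one extreme value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0] * 19 + [100.0]
```

With these inputs, the symmetric set stays inside the − 2 to + 2 band, while the set containing one extreme value is flagged.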

Principal component analysis (PCA)
PCA reduces the dimensionality of a data set comprising numerous interrelated variables while retaining as much of the variability present in the data set as possible (Iscen et al. 2008). To achieve the reduction, the data set is transformed into a new set of uncorrelated variables, referred to as the principal components (PCs), which are ordered by decreasing importance so that the first few components retain most of the variation.
The total variance of each indicator (original) variable is decomposed into two main components. The first is the variance associated with the overall chemical level in surface water. It is expressed as the square of the correlation between the indicator and the factor (pattern loading), and it is known as the communality of the indicator with the common factor (Armah et al. 2010). The second is the variance associated with a unique parameter or factor. It is expressed as the variance of the variable minus the communality, and it is known as the specific (error) variance since it is specific to that variable (Sharma 1996; Singh et al. 2004). In this study, SPSS version 17 was used for the principal component analysis.
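The decomposition described above can be illustrated with a short NumPy sketch. It is not the SPSS procedure used in the study: it assumes an unrotated PCA on the correlation matrix, the function names are hypothetical, and the input is synthetic random data rather than the Tano measurements. The eigenvalue > 1 rule is the retention criterion applied in the results section.

```python
import numpy as np


def pca_correlation(data):
    """Unrotated PCA on the correlation matrix (rows = samples, columns = variables)."""
    x = np.asarray(data, dtype=float)
    corr = np.corrcoef(x, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum() * 100.0  # percent of total variance
    # Loading = correlation between a variable and a component.
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
    return eigenvalues, explained, loadings


def communalities(loadings, n_components):
    """Sum of squared loadings of each variable on the retained components."""
    return (loadings[:, :n_components] ** 2).sum(axis=1)


# Synthetic data: 30 samples of 4 variables (illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(30, 4))
eigenvalues, explained, loadings = pca_correlation(data)
n_retained = int((eigenvalues > 1.0).sum())  # eigenvalue > 1 retention rule
common = communalities(loadings, n_retained)
specific = 1.0 - common  # specific variance of each standardized variable
```

Because each standardized variable has unit variance, the communality and the specific variance of a variable sum to 1, and retaining all components drives every communality to 1.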

Water quality index analysis
The water quality index (WQI) is a single dimensionless number from 0 to 100, calculated from selected key water quality parameters, that describes the overall water quality status of water bodies (Kankal et al. 2012). It is used to indicate the degree to which the natural water quality of a water body is impacted by human activity (WRC 2003). It is obtained by transforming the selected water quality parameters, which are in different units, into a unitless common scale, called sub-indices, by plotting graphs called rating curves. The rating curves convert the values of each parameter to a scale of 0-100 (WRC 2003; Sutadian et al. 2016). Weights are then assigned to the parameters according to their relative importance and their influence on the final index value (Sutadian et al. 2016; Ramesh et al. 2010). The sub-indices are then aggregated using a mathematical function to obtain the final WQI value. The sub-index values established from the rating curves can also be tabulated for more practical use, where scores are read from the rating table (Darko et al. 2013). Scores from tabulated sub-index values were used in the calculation of the water quality index values reported in this paper. The water quality index used in Ghana was developed in 2003 by the Water Resources Commission of Ghana (WRC) for assessing the overall water quality status of surface freshwaters (WRC 2003). The index uses ten key water quality parameters, namely dissolved oxygen (DO), biochemical oxygen demand (BOD), ammonium nitrogen (NH4-N), E. coli, pH, nitrate as nitrogen (NO3-N), phosphate as phosphorus (PO4-P), total suspended solids (TSS), conductivity and temperature. The weights of the parameters were predetermined by the WRC based on their relative importance to the freshwater quality of Ghana: DO (0.18), BOD5 (0.15), NH4-N (0.12), E. coli (0.12), pH (0.09), NO3-N (0.08), PO4-P (0.08), TSS (0.07), conductivity (0.06) and temperature (0.05). In this study, these ten parameters and their weights were used in calculating the water quality index.
Equation 1 was used for the calculation of the water quality index.
where qi = quality rating of the ith parameter, wi = relative weight of the ith parameter, n = total number of parameters.
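As an illustration of how the quantities qi, wi and n enter the calculation, the sketch below aggregates the ten sub-index scores using the WRC weights listed in the text. It rests on an assumption: the aggregation is taken to be a plain weighted sum of the sub-indices, which is the simplest form consistent with the definitions given, and the index actually applied may use a different aggregation function. The function name and the sub-index scores are hypothetical.

```python
# Weights quoted in the text for the ten WRC parameters.
WRC_WEIGHTS = {
    "DO": 0.18, "BOD5": 0.15, "NH4-N": 0.12, "E. coli": 0.12, "pH": 0.09,
    "NO3-N": 0.08, "PO4-P": 0.08, "TSS": 0.07, "Conductivity": 0.06,
    "Temperature": 0.05,
}


def weighted_sum_index(sub_indices, weights=WRC_WEIGHTS):
    """Aggregate sub-index scores (0-100) as sum(w_i * q_i).

    Assumption: a plain weighted sum; not confirmed as the study's Equation 1.
    """
    missing = set(weights) - set(sub_indices)
    if missing:
        raise KeyError(f"missing sub-index scores: {sorted(missing)}")
    return sum(weights[name] * sub_indices[name] for name in weights)


# Hypothetical sub-index scores read from a rating table.
scores = {name: 70.0 for name in WRC_WEIGHTS}
index_value = weighted_sum_index(scores)
```

Because the ten weights sum to 1, a weighted sum of scores on the 0-100 scale also stays on the 0-100 scale.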
For quality assurance purposes, blank samples and laboratory control standard solutions were analysed together with all samples. Analytical results were accepted only if the recoveries of control standard solutions were found to be within ± 5% of their respective concentrations, otherwise analysis of the batch was repeated. Also, for a batch of five samples, one duplicate sample was analysed. For analysis of BOD, samples were incubated in an incubator maintained at 20 ± 1 °C.
The WQI of Ghana classifies water quality into four categories: good (Class I), fairly good (Class II), poor (Class III), and grossly polluted (Class IV). An index of > 80 indicates Class I or good water quality; an index of 50-80 indicates Class II or fairly good water quality; an index of 25-50 indicates Class III or poor water quality; and an index of < 25 indicates Class IV or grossly polluted water (WRC 2003). In this study, the seasonal trend in the water quality of the rivers was determined to be: March > July > October. The water quality index values obtained for March, July and October showed that all the waters were typical of Class II, i.e. they were of fairly good quality (WRC 2003). The order of the seasonal trend in the quality of the rivers can be explained in part by the high total suspended solids (TSS) encountered in July and October due to high floods and run-off into the rivers.
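The classification scheme can be expressed as a small lookup. The sketch below is illustrative; the function name is hypothetical, and because the quoted ranges share the boundary value 50, it assumes that an index of exactly 50 falls in Class II and an index of exactly 25 falls in Class III.

```python
def classify_wqi(index):
    """Map a WQI value (0-100) to the four classes quoted from WRC (2003).

    Assumption: shared boundary values go to the better class
    (50 -> Class II, 25 -> Class III); 80 is Class II because Class I needs > 80.
    """
    if not 0 <= index <= 100:
        raise ValueError("WQI must lie between 0 and 100")
    if index > 80:
        return "Class I (good)"
    if index >= 50:
        return "Class II (fairly good)"
    if index >= 25:
        return "Class III (poor)"
    return "Class IV (grossly polluted)"


# Hypothetical index values, not results from the study.
labels = [classify_wqi(v) for v in (85.0, 62.5, 30.0, 10.0)]
```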

Results and discussion
The results of the descriptive statistical analysis of the laboratory data for surface water parameters from the Tano River and its basin are shown in Table 1. The temperatures of the surface waters from the study area range from 23.9 to 26.2 °C, which is typical of surface waters in Ghana. The pH values range from 6.85 to 8.93, which indicates that the pH of the water was close to the acceptable range of 6.5 to 8.5. The water samples' pH values satisfied the standard requirement for domestic use and the protection of aquatic life.
Apart from total iron concentration, all the other parameters, including the physical parameters, nutrients, major ions, heavy metals and chemical parameters, were generally found to be within the Ghana Standard (GS 175-1/WHO) acceptable limits (Authority, 2008). Table 2 presents the Spearman's rho correlation coefficients of the data obtained in this study. According to the table, no strong correlations between pH and turbidity, conductivity, sodium, potassium and sulphate are noticeable. The Spearman's rho correlation matrix nevertheless shows noteworthy associations among the physicochemical parameters (p < 0.05 and p < 0.01). Negative relationships were found between conductivity and turbidity, Cl and turbidity, Fe and conductivity, and Fe and Cl (r = 0.828, two-tailed, P < 0.05). Positive correlations (r > 0.5) were found between Fe and turbidity, Cl and conductivity, and total coliform and BOD5. The high concentration of total iron in the water from River Tano and its basin correlates very well with the elevated colour and turbidity values.
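Spearman's rho is the Pearson correlation computed on the ranks of the two variables, with tied observations sharing the average of their ranks. The sketch below shows that calculation in plain Python; it is not the SPSS routine used in the study, the function names are hypothetical, and the paired readings are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented paired readings: conductivity rises while turbidity falls.
conductivity = [120.0, 135.0, 150.0, 160.0, 180.0]
turbidity = [40.0, 35.0, 30.0, 20.0, 10.0]
rho = spearman_rho(conductivity, turbidity)
```

For these invented readings the ranks are perfectly reversed, so rho equals -1; a negative relationship in the correlation matrix therefore carries a negative coefficient.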

Principal component analysis
From the PC analysis (Table 3), five components, individually explaining between 7.441% and 35.884% of the variance, were found to account for the variation in the water quality data set obtained from the Tano basin. An eigenvalue greater than 1 was used as the criterion for retaining the principal components needed to explain the sources of variation in the data. Five principal components (eigenvalues > 1) were extracted from the water quality data set, as shown in Table 3. These components cumulatively explain about 92.559% of the variance in the data. Component 1 (factor 1), which accounts for about 35.884% of the total variance, includes mainly hydrochemical properties (pH, electrical conductivity, TDS, total hardness, total alkalinity, calcium hardness and bicarbonate) of the water samples.
As noted by Cobbina et al. (2012), component 1 is attributable to the natural hydrochemistry of the surface and groundwater systems in Ghana. It suggests the presence of bicarbonates and hydroxides. Component 2 (or factor 2) explains about 27.875% of the total variance and reflects the presence of toxic metals of anthropogenic origin, such as manganese, arsenic and total iron, as well as other physicochemical parameters such as temperature, nitrite, sulphate, total suspended solids, colour and turbidity. Component 3 accounts for 16.727% of the total variance and includes phosphate, nitrate and dissolved oxygen levels in the water samples. The high phosphate and nitrate in samples from the Tano basin may be attributed to human activities such as runoff from excessive application of fertilizers by farmers or washing of clothes along the banks of the river. The dissolved oxygen (DO) concentration of the samples ranged from 5.65 mg/L to 8 mg/L with a mean of 6.89 mg/L (Table 2). This indicates that the river was well oxygenated.
Factor 4, which explains 12.073% of the total variance, includes faecal coliform bacteria of anthropogenic origin. It also accounts for magnesium hardness and magnesium ions in the water samples. According to Leclerc et al. (2001), the presence of faecal coliform in water bodies is a good indicator of faecal contamination. The fifth component explains only 7.441% of the total variance, and it is influenced by fluoride and chemical oxygen demand. These two factors (factors 4 and 5) appear to originate from the effects of both anthropogenic activities and the partial ecological recovery system of the river and its basin.

Water quality index
Water quality index is used to indicate the degree to which the natural water quality of any source is impacted by human activity. In this study, the seasonal trend in the quality of the waters was determined to be: March > July > October (Fig. 3). The water quality index values obtained for March, July and October showed that all the waters were typical of Class II, i.e. they were of fairly good quality (WRC 2003). The order of the seasonal trend in the quality of the waters can be explained in part by the high total suspended solids (TSS) encountered in July and October due to the high floods and run-off into the waters.

Conclusion
Principal component and factor analyses have been successfully applied to the hydrological data of this study and hence to the assessment of the quality of surface water from the Tano basin in Ghana. The results of the principal component analysis show that five principal components explain more than 91.57% of the total variance and hence can be relied upon to identify the main sources of variation in the physicochemical parameters of the water samples. This study shows that factor analysis is a valuable tool that can support decision making with regard to the extent of water pollution using real pollution indicators. It can also provide a rough standard or guideline for selecting the pre-emptive actions required to manage surface water bodies appropriately.