Assessment of Surface Water Quality in Hyderabad Lakes by Using Multivariate Statistical Techniques, Hyderabad-India

Multivariate statistical techniques such as cluster analysis (CA), principal component analysis (PCA) and factor analysis (FA) were applied to evaluate temporal variations and to interpret a large, complex water quality data set for Hyderabad city, generated during 2013-14 by monitoring 16 parameters at 23 sites with an average depth of 1 m. Hierarchical cluster analysis (CA) was first applied to distinguish three general water quality patterns among the stations. The data set thus obtained was treated using R-mode factor analysis (FA) followed by principal component analysis (PCA). Factor analysis identified five factors responsible for the data structure, explaining 75% of the total variance, and allowed the selected parameters to be grouped according to common features. WT, EC, TSS and Na were associated and controlled by a mixed origin, with similar contributions from natural and anthropogenic sources, whereas NO3, PO4, SO4, FC, TC, F, K and B were derived from anthropogenic sources.


Introduction
The protection and restoration of urban lakes and wetlands is an urgent concern. Urban lakes in Hyderabad are in extremely poor condition; within the last 12 years, Hyderabad has lost 3245 ha of its water area in the form of lakes and ponds. There are endless examples in India that show the devastated state of urban water bodies (Sridhar Kumar et al., 2014). Almost all urban water bodies in India suffer from pollution and are used for disposing of untreated local sewage, industrial waste water and solid waste, and in many cases the water bodies have ultimately been turned into landfills. Point and non-point sources of pollution degrade surface and ground water and impair their use for drinking, industrial, agricultural, recreational or other purposes (Carpenter et al., 1998; Howarth et al., 1996). A number of point and non-point sources contaminate the water bodies by adding excess nutrients and heavy metals. Over the years the capacities of the lakes have decreased because of rapid urbanization, encroachment into lake areas and increased sedimentation resulting from high human interference in the catchment area (Ramachandraiah and Prasad, 2004). Urbanization, increases in population density and the intensification of agricultural activities in the upstream areas are among the main causes of water pollution. Therefore, researchers have been paying more attention to the effects of natural and human activities on water quality, in particular the key contributions of human activities to nutrients and heavy metals. The discharge of effluents and associated toxic compounds into aquatic ecosystems represents an ongoing environmental problem because of their possible impact on communities in the receiving waters and their potential effect on human health (Abbas Alkarkhi et al., 2008). Further, these materials enter the surface water, resulting in pollution of irrigation and drinking water, although the Government of India has issued a policy statement on the abatement of pollution at source (GoI, 1992).
Many investigations have been conducted on anthropogenic contaminants of ecosystems. Because of the spatial and temporal variation in water quality conditions, a monitoring program that provides a representative and reliable estimation of the quality of surface waters is necessary (Dixon and Chiswell, 1996). The monitoring results produce a large and complicated data matrix that is difficult to interpret in order to draw meaningful conclusions. Multivariate statistical techniques are powerful tools for analyzing large numbers of samples collected in surveys, classifying assemblages and assessing human impacts on water quality and ecosystem conditions.
The application of different multivariate statistical techniques, such as principal component analysis (PCA), factor analysis (FA) and cluster analysis (CA), assists in the interpretation of complex data matrices for a better understanding of the water quality and ecological status of the studied system. These techniques allow the identification of possible factors or sources that affect water environmental systems and offer a valuable tool for reliable management of water resources as well as rapid solutions to pollution issues (Palma et al., 2010; Morales et al., 1999). Multivariate statistical techniques have been widely adopted to analyze and evaluate surface and freshwater quality, and are useful for verifying temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality (Wunderlin et al., 2001; Simeonov et al., 2003).
The study area, Hyderabad, contains urban lakes situated on the Deccan Plateau at a height of 1788 feet above sea level. It is located at 17° 22' north latitude and 78° 29' east longitude and has an area of 7,100 sq km. The city is dotted with a number of lakes, almost all of which were artificially created, often some centuries ago, by constructing bunds and dams in the downstream areas of micro-catchments. From upstream of the reservoir to the downstream, these lakes form a cascading system with limited storage space (Fig. 1). Normal rainfall is 786.8 mm and increases from northwest to southeast. The mean maximum and minimum temperatures are 40°C and 14°C, respectively. The city is drained by the river Musi, and the drainage pattern is of dendritic and rectangular type.

Materials and Methods
In the present study, the data obtained during 2013-14 were subjected to different multivariate statistical techniques to extract information about the similarities or dissimilarities between sampling stations, the water quality variables responsible for spatial and temporal variations in lake water quality, the hidden factors explaining the structure of the database, and the influence of possible sources (natural and anthropogenic) on the water quality parameters of the lake basins.
The author conducted a water quality survey during 2013-14 on a few lakes and tanks, selected either because of increasing levels of critical parameters or from the point of view of conservation, so as to improve water quality and its management. These lakes have an average depth of 1 m and support major human activities such as cattle wading, boating and fishing; agriculture in the area includes melon farming, vegetables and para grass. The major water quality issues are pathogenic (bacteriological) pollution, oxygen-depleting organic pollution, agricultural runoff, salinity and trace elements.

Monitored Parameters and Analytical Methods
The data set covers 23 water quality monitoring stations and comprises 16 water quality parameters monitored during 2013-14. The selected water quality parameters, their units and methods of analysis are summarized in Table 1. The author sampled, preserved and analyzed all water quality parameters as per Indian inland surface water quality standards. All samples were collected at the center of each lake, 0.5 m below the water surface. The basic statistics of the one-year data set on Hyderabad lake water quality are summarized in Table 2. From Table 2, it is observed that stations 13, 19, 20 and 21 directly receive untreated waste water from the urbanized catchment, and parameters such as phosphates, nitrates and coliforms exceed the prescribed standards, whereas stations 2, 11, 12 and 18 are polluted by agricultural runoff from the catchment area, and parameters such as boron, potassium, conductivity, sodium, SAR and fluoride exceed the standards prescribed by the Central Pollution Control Board (CPCB).

Data Treatment and Multivariate Statistical Methods
The surface water quality data sets were subjected to three multivariate techniques: cluster analysis (CA), principal component analysis (PCA) and factor analysis (FA) (Singh et al., 2004; Dong et al., 2010; Kim et al., 2009). Summary statistics of these data sets were first calculated to evaluate the distributions. FA was applied to data standardized through Z-scale transformation in order to avoid misclassification due to wide differences in data dimensionality (Liu et al., 2003; Kim et al., 2009); standardization tends to increase the influence of variables whose variance is small, and vice versa. All mathematical and statistical computations were made using the Statistical Package for Social Sciences (SPSS, 1995).
Cluster analysis is a group of multivariate techniques whose purpose is to assemble objects based on the characteristics they possess. Hierarchical agglomerative clustering is the most common approach; it provides intuitive similarity relationships between any one sample and the entire data set, and is typically illustrated by a dendrogram (tree diagram) (McKenna, 2003). The Euclidean distance usually gives the similarity between analytical values from the samples (Otto, 1998). In this study, hierarchical agglomerative CA was performed on the normalized data set by means of Ward's method, using squared Euclidean distances as a measure of similarity. Ward's method uses an analysis of variance approach to evaluate the distances between clusters, attempting to minimize the sum of squares (SS) of any two clusters that can be formed at each step. The spatial variability of water quality in the city was determined from CA using the linkage distance, reported as D link /D max , which represents the linkage distance for a particular case divided by the maximal linkage distance. The quotient is then multiplied by 100 as a way to standardize the linkage distance represented on the y-axis.
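The Z-scale transformation mentioned above can be illustrated with a minimal sketch. It assumes the usual definition (subtract the parameter mean, divide by the sample standard deviation); the function name `z_scale` and the readings are hypothetical and are not the study's data or the SPSS procedure itself.

```python
from statistics import mean, stdev

def z_scale(columns):
    """Standardize each parameter to zero mean and unit sample standard deviation."""
    scaled = {}
    for name, values in columns.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (n - 1 denominator)
        if sd == 0:
            raise ValueError(f"parameter {name!r} is constant and cannot be standardized")
        scaled[name] = [(v - mu) / sd for v in values]
    return scaled

# Hypothetical readings for illustration only (not measured values).
readings = {
    "EC_umhos_cm": [400.0, 900.0, 1500.0, 5200.0],
    "PO4_mg_l": [0.2, 1.1, 2.4, 4.3],
}
z = z_scale(readings)
for name, values in z.items():
    print(name, [round(v, 3) for v in values])
```

After the transformation both parameters are dimensionless and on a comparable scale, so a parameter measured in thousands of µmhos/cm no longer dominates one measured in a few mg/l.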
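The clustering procedure described above can be sketched in a few lines. This is an illustrative sketch, not the SPSS routine used in the study: it assumes that the linkage distance of a merge is the Ward increase in within-cluster sum of squares, and the function names (`ward_merges`, `rescale`, `cut`) and the six points are hypothetical.

```python
def ward_merges(points):
    """Agglomerate points with Ward's criterion; return (members_a, members_b, cost) per merge."""
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (member indices, centroid)
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        merges.append((ma, mb, cost))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return merges

def rescale(merges):
    """Express each linkage distance as (D link / D max) x 100."""
    d_max = max(cost for _, _, cost in merges)
    return [(a, b, 100.0 * cost / d_max) for a, b, cost in merges]

def cut(scaled, n_points, threshold):
    """Apply only the merges below the threshold and return the resulting groups."""
    label = list(range(n_points))
    for members_a, members_b, distance in scaled:
        if distance < threshold:
            for i in members_b:
                label[i] = label[members_a[0]]
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())

# Hypothetical standardized values for six stations (two parameters each).
stations = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.2, 0.0), (2.0, 3.5), (2.2, 3.5)]
scaled = rescale(ward_merges(stations))
clusters = cut(scaled, len(stations), threshold=60.0)
print(clusters)
```

Because Ward linkage distances never decrease from one merge to the next, cutting the dendrogram at a rescaled distance of 60 simply keeps every merge made below that level.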

Factor Analysis/Principal Component Analysis (PCA)
The factor analysis technique extracts the eigenvalues and eigenvectors from the covariance matrix of the original variables. The principal components (PCs) are the uncorrelated (orthogonal) variables obtained by multiplying the original correlated variables by the eigenvectors, which are lists of coefficients (loadings or weightings). Thus the principal components are weighted linear combinations of the original variables. The PCs provide information on the most meaningful parameters, which describe the whole data set while affording data reduction with minimum loss of original information (Vega et al., 1998; Helena et al., 2000; Shrestha and Kazama, 2007). PCA is a powerful technique for pattern recognition that attempts to explain the variance of a large set of inter-correlated variables by transforming it into a smaller set of independent (uncorrelated) variables (principal components). Factor analysis further reduces the contribution of less significant variables obtained from PCA, and the new group of variables, known as varifactors, is extracted by rotating the axes defined by PCA. A varifactor can include unobservable, hypothetical, latent variables, while a PC is a linear combination of observable water quality variables (Panda et al., 2006; Davis, 1986). PCA of the normalized variables was performed to extract significant PCs and to further reduce the contribution of variables with minor significance. These PCs were subjected to varimax rotation (raw), generating varifactors (Brumelis et al., 2000; Love et al., 2004; Abdul et al., 2005).
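The extraction and rotation steps can be sketched as follows. This is a sketch under stated assumptions rather than the SPSS procedure used in the study: it relies on NumPy, works on standardized variables (so the covariance matrix equals the correlation matrix), and applies a raw varimax rotation without Kaiser normalization. The function names and the randomly generated data are hypothetical.

```python
import numpy as np

def pca_loadings(data, n_factors):
    """Return all eigenvalues and the unrotated loadings of the first n_factors PCs."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)  # covariance of z-scores = correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation of a loading matrix (orthogonal rotation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        grad = loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Hypothetical data: 23 "stations", 4 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(23, 2))
data = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=23),
    latent[:, 0] + 0.3 * rng.normal(size=23),
    latent[:, 1] + 0.3 * rng.normal(size=23),
    latent[:, 1] + 0.3 * rng.normal(size=23),
])
eigvals, loadings = pca_loadings(data, n_factors=2)
varifactors = varimax(loadings)
explained = 100 * eigvals / eigvals.sum()  # percent of total variance per component
print(np.round(explained, 2))
print(np.round(varifactors, 2))
```

The rotation is orthogonal, so it redistributes variance among the retained factors without changing the communality of any variable; only the interpretability of the loadings changes.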

Spatial Similarity and Site Grouping
Cluster analysis was used to detect similarity groups among the sampling sites. It yielded a dendrogram (Fig. 2) grouping all 23 sampling sites of the city into three statistically meaningful clusters at (D link /D max ) x 100 < 60. Since hierarchical agglomerative cluster analysis was used, the number of clusters was also decided by the practicality of the results, as ample information (e.g. land use, location of industries) is available on the study sites. The results indicate that the CA technique is useful in offering a reliable classification of surface water in the whole region and will make it possible to design a future spatial sampling strategy in an optimal manner, which can reduce the number of sampling stations and the associated cost. The main descriptive statistics are shown in Table 3. Statistical treatment of these data indicates their association and grouping into five factors in the water bodies (Table 4). Phosphate and nitrate were recorded at high levels at most of the sample stations. Phosphate varies from 0.00 to 4.85 mg/l with an average of 1.99 mg/l; at the majority of the sample stations (16 out of 23), phosphate was recorded above the permissible limit of 1.00 mg/l. Nitrate varies from 0.20 to 112.00 mg/l with an average of 19.91 mg/l; sample stations 20, 21, 22 and 23 were recorded above the permissible limit of 45 mg/l. Phosphates are often considered a primary limiting element and nitrates a secondary limiting element in most lakes, and their concentrations are positively correlated in lakes. It was observed that the high values of TSS, conductivity, Na, SAR, fecal coliforms, total coliforms, SO4, F-, K, temperature and B are due to point and non-point sources, which may be attributed to industrial and agricultural activities.
Extraction method: principal component analysis. Rotation method: varimax with Kaiser normalization.

Factor Analysis
Factor analysis was used to determine the complex linear correlations between the measured parameters, which enabled interpretation of the correlation of elements in the study area. The elements belonging to a given factor were defined by the factor matrix after varimax rotation, with those having strong correlations grouped into factors. Considering the influence they exert on the lakes by determining the distribution of parameters in the study area of Hyderabad, the multiparameter factors were divided into two groups: (i) factors with strong scattered anthropogenic influence and (ii) factors caused predominantly by natural processes or other anthropogenic influences. The identification of factors is based on the dominant influence. The distribution of the individual associations of parameters in the lake waters was determined by the principal component method (results are shown in Table 4). Based on eigenvalues and varimax rotation, five factors explained most of the variability (total variance explained was about 75.67%).

Factor 1
Factor 1 accounts for 27% of the total variance (out of the 75% explained), with positive loadings on TSS, conductivity, Na and SAR. This factor can be attributed to the influence of agricultural activity in the study area. It indicates a strong association (r = 0.6-0.94) of TSS, conductivity, Na and SAR. The high variability in the analytical data is indicative of an external source for these parameters in the water bodies. Total suspended solids were found to be high at a few stations, with concentrations ranging from 4 to 600 mg/l and an average of 36.97 mg/l. The highest TSS values were reported at station 11. This may be due to direct discharge of untreated sewage from the nearby surroundings, which lack proper diversion facilities, and indicates that the source of TSS is an anthropogenic addition.
Conductivity varies from 361 to 15740 µmhos/cm at 25 °C (average of 1969 µmhos/cm) against a permissible limit of 2250 µmhos/cm, indicating poor water quality under the irrigation water classes used in India. Na varies from 0.00 to 2175 mg/l with an average of 205 mg/l, and its tolerance limit is 60. Sample stations 2, 11 and 19 show abnormal values of conductivity and Na above the background distribution (reported as 15740 µmhos/cm and 238 mg/l); these stations are in the vicinity of an industrial area, and industrial waste was found to enter the water body. Agricultural runoff from the catchment also increases these parameters in the water body, so that both point and non-point sources of pollution contribute. SAR values vary from 0.2 to 20 with an average of 3.5, which falls under the water class Excellent to Good as per the irrigation guidelines in India.

Factor 2
Factor 2 accounts for 16% of the total variance, with positive loadings on fecal coliforms (FC), total coliforms (TC) and SO4. FC, an anthropogenic addition to the water bodies, ranges from 20 to 2000 MPN/100 ml (MPN = Most Probable Number) with an average of 612 MPN/100 ml; the CPCB surface water criteria for fecal coliforms are 500 MPN/100 ml (desirable) and 2500 MPN/100 ml (maximum permissible). TC varies from 50 to 2500 MPN with an average of 1061 MPN, the highest values being recorded at sample station 19, against the limit of 50 MPN/100 ml. Apart from their widespread occurrence in the environment, TC may also be present because of the dumping of solid waste on the lake shore. SO4 varies from 6 to 1741 mg/l with an average of 120 mg/l. The maximum SO4 value, at sample station 19 (against the tolerance limit of 400 mg/l), is due to the entry of untreated industrial and domestic waste water into the water body. Hence this factor can be attributed to an exclusively anthropogenic origin of FC, TC and SO4 in the area.

Factor 3
Factor 3 accounts for 12% of the total variance, with positive loadings on PO4, F- and K. This factor can be attributed to the influence of industrial and municipal waste waters and agricultural runoff on these parameters in the study area. PO4 varies from 0.0 to 4.85 mg/l (average = 1.99 mg/l); sample stations 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 17, 19, 20, 21, 22 and 23 are above the tolerance limit of 1 mg/l. At these stations, untreated industrial and domestic waste waters were found to enter the water bodies as point and non-point sources of pollution. F- varies from 0.10 to 2.90 mg/l (average = 1.23 mg/l); values above the tolerance limit of 1.5 mg/l were found at stations 2, 4, 5, 6, 9, 13, 14, 16, 17, 18, 19, 20, 21, 22 and 23, possibly because these water bodies receive large amounts of municipal sewage along with solid waste. K varies from 6 to 99 mg/l (average = 32.38 mg/l) against a tolerance limit of <10 mg/l; most of the sample stations exceeded the limit, and the principal source of K may be the entry of untreated industrial waste and municipal water into the lakes.

Factor 4
Factor 4 accounts for 10.7% of the total variance and has positive loadings on temperature and boron. Temperature varies from 23 to 30 °C with an average of 27.12 °C, and B from 0 to 3.2 mg/l (average = 0.71 mg/l). Sample stations 1, 2, 5, 7, 8, 9, 10, 13, 14, 16, 17, 21, 22 and 23 show B above the desirable irrigation limit of 1 mg/l; the principal source of B is agricultural runoff, and it is an anthropogenic addition. The B contamination in the water bodies represents non-point source pollution in the form of irrigation return flow from the catchment.

Factor 5
Factor 5 accounts for 8.73% of the total variance and has a positive loading on NO3. NO3 concentration varies from 0.20 to 112 mg/l with an average of 19.91 mg/l; the maximum exceeds the desirable limit of 20 mg/l, while the average lies just below it. This factor can be attributed to the influence of municipal waste waters and agricultural runoff on this parameter in the study area. Sample stations 3, 5, 10, 11, 12, 13, 18, 20, 21, 22 and 23 show comparatively higher concentrations.
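The station-by-station comparisons against tolerance limits reported for the factors above can be sketched as a simple lookup. The limits in mg/l are those quoted in the text (PO4 1, F 1.5, K 10, B 1, NO3 45, SO4 400); the function name `exceedances` and the station values are hypothetical and serve only to illustrate the comparison.

```python
# Limits in mg/l as quoted in the text.
LIMITS_MG_L = {"PO4": 1.0, "F": 1.5, "K": 10.0, "B": 1.0, "NO3": 45.0, "SO4": 400.0}

def exceedances(station_values, limits=LIMITS_MG_L):
    """Map each parameter to the sorted list of stations whose value exceeds its limit."""
    flagged = {name: [] for name in limits}
    for station, values in sorted(station_values.items()):
        for name, limit in limits.items():
            value = values.get(name)  # a parameter may be missing for a station
            if value is not None and value > limit:
                flagged[name].append(station)
    return flagged

# Hypothetical values for three stations (not measured data).
sample = {
    1: {"PO4": 0.4, "NO3": 12.0, "B": 1.3},
    2: {"PO4": 2.1, "NO3": 60.0, "B": 0.2},
    3: {"PO4": 1.0, "NO3": 45.0},
}
flags = exceedances(sample)
print(flags)
```

A value exactly equal to a limit is not flagged here; whether the limit itself counts as an exceedance is a design choice that should follow the wording of the applicable standard.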

Conclusions
In this study, the lakes were found to be polluted by uncontrolled point and non-point sources of pollution owing to the lack of a proper sewage network. Factor analysis performed on 10 parameters identified five factors controlling their variability in the study area. The multivariate statistical approaches show that pathogenic (bacteriological) pollution, organic pollution, salinity and trace elements are heavily polluting the lakes. The entry of pollutants into the lakes as untreated effluents from the catchment indicates point source pollution. Runoff from agricultural fields also contributes to lake water pollution. The present study demonstrates the usefulness of multivariate statistical techniques for the analysis and interpretation of complex data sets, water quality assessment and the identification of pollution factors. Regular water quality monitoring of surface water should be undertaken to identify pollution sources and to understand spatial variations in water quality for effective water quality management.

Recommendations
• In view of urbanization and industrialization, organizations such as municipal bodies need to conserve the water bodies around Hyderabad. The untreated effluents emerging from the catchment must be diverted to maintain the wholesomeness of the water bodies. The present study provides baseline data for the assessment of contamination in the study area.
• Sewage treatment plants (STPs) at lakes had little impact on lake water quality where diversion sewers had not first been constructed.
• Change of land use and construction activity of all types should be prohibited in all water bodies. Construction should be avoided within the maximum water spread area.

Figure 1. Map of study area and water quality monitoring stations (listed 1-23) in Hyderabad basin

Figure 2. Dendrogram showing clustering of sampling sites according to water quality characteristics

Table 2. Mean and S.D. of different lake water quality parameters at various locations during the year 2013-14

Table 3. Descriptive statistical data of lake waters

Table 4. Factor analysis of lake water quality data