Assessment of Surface Water through Multivariate Analysis

Multivariate statistical techniques, namely factor analysis (FA) and discriminant analysis (DA), were applied to evaluate spatial variations and to interpret a large, complex water quality data set from two rivers (Juru and Jejawi) in Malaysia, in which 10 parameters were monitored at 10 different sites on each river. Factor analysis yielded two factors explaining more than 82% of the total variance in the water quality data set. The factors indicate that the variances in water quality may be due either to sources of anthropogenic origin or to different biochemical processes taking place in the system. The first factor, called the pseudo-anthropogenic factor, explained 59.29% of the total variance. The second factor, called the anthropogenic factor, explained 23.03%. DA identified the relative contribution of all parameters in discriminating (distinguishing) the two rivers, affording 100% correct assignations. This study illustrates the benefit of multivariate statistical techniques for analyzing and interpreting complex data sets and for planning future studies.


Introduction
Rivers are important sources of surface water and a boon of nature to human beings. They are an inseparable component of the earth's ecology. A river, with its tributaries, is a system that sustains fish and other aquatic life. It carries, in the direction of its flow, a significant load of dissolved matter and particulate material from different sources (Shrestha and Kazama, 2007). Rivers play a major role in the assimilation or transport of municipal and industrial wastewater discharged from constant as well as occasional or seasonal polluting sources. Surface runoff is a seasonal phenomenon that is largely affected by climate within the river basin (Singh et al., 2004). Seasonal variation in precipitation, surface runoff, interflow, groundwater flow, and pumped inflows and outflows has a strong effect on river discharge and subsequently on the concentration of pollutants in river water (Vega et al., 1998). Because rivers constitute the main inland water resources for domestic, industrial and irrigation purposes, it is imperative to prevent and control river pollution and to have reliable information on water quality for effective management. Variances in water quality may be due to anthropogenic activities or to natural monthly (seasonal) variation arising from various biochemical or chemical processes. Monitoring programs result in a huge and complex data matrix consisting of a large number of physico-chemical parameters (Chapman, 1992).
The application of multivariate methods such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), and discriminant analysis (DA) has increased tremendously in recent years for analyzing environmental data and drawing meaningful information (Vega et al., 1998; Lee et al., 2001; Wunderlin et al., 2001; Reghunath et al., 2002; Saadia et al., 2005). In this paper we report the findings of our study of the water quality of two rivers in Malaysia and their statistical analysis. The analysis was carried out to explore the extent of resemblance among the sampling sites, to identify the variables responsible for spatial variations in river water quality, to uncover the hidden factors explaining the structure of the database, and to quantify the influence of possible natural and anthropogenic sources on the water parameters of the two selected rivers.

Study area
The two Malaysian rivers selected for the study, Juru and Jejawi, are located on the north-west coast of Peninsular Malaysia, in the state of Penang, within a coastal mudflat in the Juru and Bukit Tambun districts (Fig. 1). The sites are adjacent to industrial areas that were reclaimed from mangrove. The types of industry presently in operation include electronics; textiles; basic and fabricated metal products; food processing and canning; processing of agricultural products; feed mills; chemical plants; rubber-based industry; timber-based wood products; paper products and printing works; and transport equipment. Other main activities operating in the vicinity of the cultured area are a ships' harbour with petroleum unloading and a red earth quarry that extends right up to the coastline.

Analytical Methods
Ten water quality parameters were monitored monthly at the monitoring sites over one year (2006) and analysed as follows. Temperature and conductivity were measured using a HACH portable pH meter, and dissolved oxygen (DO) was measured with a YSI 1000 DO meter. Biochemical oxygen demand (BOD), chemical oxygen demand (COD), total nitrate and total phosphate concentrations were analyzed using a spectrophotometer (HACH/2010). Turbidity was measured using a nephelometer. Total suspended solids (TSS) were determined gravimetrically in the laboratory. APHA Standard Methods for the Examination of Water and Wastewater were applied for the analysis of the above-mentioned parameters.

Discriminant function
Discriminant analysis is a multivariate technique used for two purposes. The first is the description of group separation, in which linear functions of the variables (discriminant functions, DFs) are used to describe or elucidate the differences between two or more groups and to identify the relative contribution of each variable to the separation of the groups. The second is the prediction or allocation of observations to groups, in which linear or quadratic functions of the variables (classification functions, CFs) are used to assign an observation to one of the groups (Richard and Dean, 2002; Alvin, 2002). SPSS version 12 was used to carry out the statistical analysis of the data.
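As an illustration of the descriptive and predictive uses of DA outlined above, the following minimal sketch computes a two-group Fisher linear discriminant function in plain Python. It is not the SPSS procedure used in this study. The two variables (turbidity and DO), all sample values and the names river_a, river_b, fisher_discriminant and classify are hypothetical and serve only to show the mechanics; equal prior probabilities are assumed for the cutoff.

```python
# Two-group Fisher linear discriminant on two variables.
# All values below are hypothetical (turbidity, DO) pairs, not data from this study.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def scatter(rows, mean):
    # 2x2 within-group sums of squares and cross-products
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_discriminant(group_a, group_b):
    m_a, m_b = mean_vector(group_a), mean_vector(group_b)
    s_a, s_b = scatter(group_a, m_a), scatter(group_b, m_b)
    dof = len(group_a) + len(group_b) - 2
    # pooled within-group covariance matrix
    p = [[(s_a[i][j] + s_b[i][j]) / dof for j in range(2)] for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    inv = [[p[1][1] / det, -p[0][1] / det],
           [-p[1][0] / det, p[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    # discriminant weights: inverse pooled covariance times mean difference
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # cutoff: midpoint between the projected group means (equal priors assumed)
    cutoff = 0.5 * sum(w[k] * (m_a[k] + m_b[k]) for k in range(2))
    return w, cutoff

def classify(x, w, cutoff):
    score = w[0] * x[0] + w[1] * x[1]
    return "A" if score > cutoff else "B"

river_a = [(30.0, 3.1), (34.0, 2.8), (28.0, 3.5), (32.0, 3.0), (31.0, 2.6)]
river_b = [(12.0, 5.2), (15.0, 4.8), (10.0, 5.5), (14.0, 5.0), (11.0, 5.4)]

w, cutoff = fisher_discriminant(river_a, river_b)
predicted_a = [classify(x, w, cutoff) for x in river_a]
predicted_b = [classify(x, w, cutoff) for x in river_b]
```

The weights in w describe the group separation (the DF), while classify plays the role of the allocation rule. Weights computed on raw variables depend on the measurement units, so relative contributions are usually judged from standardized coefficients.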

Factor analysis
Factor analysis (FA) is designed to transform the original variables into new uncorrelated variables called factors, which are linear combinations of the original variables. FA is a data reduction technique and suggests how many variates are important for explaining the observed variances in the data. The principal components method (PCA) is used to extract the factors. The axes defined by PCA are rotated to reduce the contribution of less significant variables (Vega et al., 1998; Helena et al., 2000). This treatment provides a small number of factors that usually account for approximately the same amount of information as the original set of observations. The FA can be expressed as:
$$ z_{ji} = a_{j1} f_{1i} + a_{j2} f_{2i} + \cdots + a_{jm} f_{mi} + e_{ji} $$
where z is the measured variable, a is the factor loading, f is the factor score, and e is the residual term accounting for errors or other sources of variation; j indexes the variable, i the sample, and m is the total number of factors.
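The extraction and rotation steps described above can be sketched with NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the SPSS routine used in the study: the data are synthetic (two hidden factors driving six variables), the function names pca_loadings and varimax are hypothetical, and the rotation omits any row normalisation that a statistics package may apply, so numerical results from such a package may differ.

```python
import numpy as np

def pca_loadings(data, eig_threshold=1.0):
    """Principal-component extraction on the correlation matrix.

    Components with eigenvalue > eig_threshold are retained.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # returned in ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > eig_threshold
    # loadings: eigenvectors scaled by the square root of their eigenvalues
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation by repeated SVD updates."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        grad = loadings.T @ (rotated ** 3 - rotated @ col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break                                 # no further improvement
        objective = new_objective
    return loadings @ rotation

# Synthetic data: 120 samples, 6 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(120, 2))
pattern = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
data = hidden @ pattern.T + 0.4 * rng.normal(size=(120, 6))

eigvals, loadings = pca_loadings(data)
rotated = varimax(loadings)
explained_pct = 100 * eigvals / eigvals.sum()
```

Because the rotation is orthogonal, it redistributes variance between the retained factors without changing the communality of any variable, which is why rotated loadings are easier to interpret but explain the same total variance.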

Factor analysis
Factor analysis was carried out on the data set (10 variables) to compare the compositional patterns between the analyzed water samples and to identify the factors that influence each of them. Two factors were extracted, explaining more than 82% of the total variance in the water quality data set. Eigenvalues > 1 were taken as the criterion for extracting the principal components required to explain the sources of variance in the data set. The eigenvalues of the different factors, the percentage of variance accounted for and the cumulative percentage of variance are given in Table 1. The scree plot is shown in Fig. 3 to clarify the extraction of the different factors.
The factor analysis was performed on the correlation matrix of the parameters, followed by Varimax rotation, and the result was used to examine the interrelationships among the parameters.
The parameter loadings for the two identified factors are given in Table 2. Factor 1 accounts for 59.29% of the total variance. It is positively correlated (loading > 0.75) with turbidity, temperature and nitrate concentration, and negatively correlated with BOD and phosphate concentration. This factor appears to originate from the combined effect of anthropogenic activities and the partial ecological recovery system of the river, so it may be called the pseudo-anthropogenic factor.
Factor 2, on the other hand, explains 23.03% of the total variance and is positively loaded with conductivity and COD. Since these two parameters arise from excessive industrial activity and are not compensated for or removed instantaneously by the natural recovery system, this factor may be termed purely anthropogenic.
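The variance figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the two percentages and the number of variables reported in the text; the variable names are hypothetical. The conversion to units of variance assumes that the analysis was run on the correlation matrix, as stated, and if the percentages refer to the rotated solution the implied values are rotated sums of squared loadings and not the initial eigenvalues.

```python
# Percentages reported in the text for the two retained factors.
pct_variance = {"factor 1": 59.29, "factor 2": 23.03}
n_variables = 10  # number of water quality parameters analysed

# Cumulative percentage of variance explained by both factors.
cumulative_pct = round(sum(pct_variance.values()), 2)

# On a correlation matrix each standardized variable has unit variance,
# so the total variance equals n_variables and a share of X percent
# corresponds to X / 100 * n_variables units of variance.
implied_variance = {
    name: round(pct / 100 * n_variables, 3)
    for name, pct in pct_variance.items()
}

print(cumulative_pct)     # 82.32
print(implied_variance)   # {'factor 1': 5.929, 'factor 2': 2.303}
```

Both implied values exceed 1, which is consistent with the eigenvalue > 1 criterion used for retaining factors, and the cumulative share agrees with the statement that the two factors explain more than 82% of the total variance.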

Source Identification
An attempt was made to study the relationship between the factor scores and the samples from different sites. The scores for the first factor are shown in Fig. 3. Turbidity and nitrate concentrations were low, whereas BOD and phosphate concentrations were higher, at all sites in the Jejawi River. The high BOD and phosphate concentrations indicate relatively high waste dumping activity in the Jejawi River. A high BOD value reflects a higher concentration of microorganisms, which in turn may consume nitrate and cause the precipitation of suspended and colloidal particles, thereby reducing turbidity. It also indicates that the Jejawi River is less polluted than the Juru River, probably owing to less industrial activity and an easier natural recovery process in the former. A physical assessment of the industrial area shows that more industries are located along the Juru River, which lends further support to this conclusion.
The scores for the second factor are presented in Fig. 4 and appear to show opposite behavior in the two rivers. The higher pollution load from industrial activities in the Juru River probably weakens its natural recovery system, whereas that system functions normally in the Jejawi River. Thus the two rivers are almost opposite to each other in terms of industrial pollution and natural recovery. For the same reason, all parameters correlated with the second factor (Fig. 5) show reversed levels between the sites of the two rivers.

Discriminant analysis (DA)
Variation in the water quality parameters was evaluated through DA, applied to the raw data for the ten parameters. Only one DF was found to discriminate the two rivers, as shown in Table 3. The Wilks' Lambda test showed that the DF is statistically significant, as shown in Table 4. Furthermore, 100% of the total variance between the two locations was explained by this single DF. The relative contribution of each parameter is given in Eq. 2.
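The link between the eigenvalue of a DF and Wilks' Lambda can be made concrete with a short calculation. With two groups at most one DF exists, so that DF necessarily carries 100% of the between-group variance, and Wilks' Lambda reduces to 1 / (1 + eigenvalue). The sketch below uses Bartlett's chi-square approximation for the significance test. The eigenvalue, the number of cases and the function names are hypothetical placeholders, since the values of Tables 3 and 4 are not reproduced here.

```python
import math

def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all DFs."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result

def bartlett_chi_square(lam, n_cases, n_variables, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    factor = n_cases - 1 - (n_variables + n_groups) / 2.0
    statistic = -factor * math.log(lam)
    dof = n_variables * (n_groups - 1)
    return statistic, dof

eigenvalue = 9.0   # hypothetical eigenvalue of the single DF
n_cases = 20       # hypothetical number of observations

lam = wilks_lambda([eigenvalue])
canonical_r = math.sqrt(eigenvalue / (1.0 + eigenvalue))
chi_sq, dof = bartlett_chi_square(lam, n_cases, n_variables=10, n_groups=2)

# Critical chi-square value for 10 degrees of freedom at the 5% level.
CRITICAL_10_DOF_5PCT = 18.307
significant = chi_sq > CRITICAL_10_DOF_5PCT
```

A Lambda close to 0 means that most of the total variability lies between the groups, while a value close to 1 means that the group means are nearly indistinguishable on the DF.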
It can be seen that turbidity, temperature, DO, BOD, conductivity and TSS contributed strongly to discriminating the two locations and account for most of the expected variation in water quality, while the other parameters contributed less to explaining the variation between the Juru and Jejawi Rivers. The relative contributions of the water quality parameters can be arranged in the order: turbidity > DO > temperature > BOD > conductivity > TSS > pH > COD > nitrate > phosphate.
The classification matrix (Table 5) showed that 100% of the cases were correctly classified into their respective groups. The classification results also showed that significant differences exist between the two rivers, which are expressed in terms of one discriminant function.

Conclusion
Multivariate statistical techniques, namely factor analysis and discriminant analysis, are important analytical techniques for processing water quality parameters and powerful tools for classification as well as for identifying possible sources of pollution. The techniques are also helpful in suggesting, by simple reasoning, possible mechanisms that account for the causes of variation in water quality parameters.

Figure 1. Map of sampling locations for the study areas

Table 1. Extracted values of the factor analysis for the water quality parameters

Table 3. Eigenvalue of the DF for the two locations

Table 4. Wilks' Lambda for testing discriminant function validity

Table 5. Classification results for discriminant analysis of the two rivers