Estimation of Saturation Percentage of Soil Using Multiple Regression, ANN, and ANFIS Techniques

The saturation percentage (SP) of soils is an important index in hydrological studies. In this paper, artificial neural networks (ANNs), multiple regression (MR), and the adaptive neural-based fuzzy inference system (ANFIS) were used to estimate the saturation percentage of soils collected from the Boukan region in northwestern Iran. Percentages of clay, silt, sand, and organic carbon (OC) were used as inputs to develop the models. In addition, the contribution of each input variable to the estimation of SP was assessed. Two performance criteria, the root mean square error (RMSE) and the coefficient of determination (R²), were used to evaluate the adequacy of the models. The ANFIS method was found to be superior to the other methods, and it is therefore proposed for reasonable estimation of soil SP values.


Introduction
Saturation percentage (SP) is related to the mechanical constituents of soils and can therefore be regarded as a quantitative measure of soil texture, water-holding capacity, and cation exchange capacity. Soil profiles may be described in terms of SP, and soil maps may be developed to represent quantitative changes in soil texture within a region. Furthermore, measurement of soil water content is important for simulating all aspects of the hydrological cycle, for estimating plant water use, and for characterizing most soil physical, chemical, and biological processes. Chemically, water serves as a transport agent for dissolved inorganic chemicals and suspended biological components involved in the processes of soil development and degradation.
The saturation percentage is defined as the ratio of the mass of water added to saturate a dry soil sample to the mass of the fully dried soil, expressed as a percentage. Direct measurement of SP is time-consuming and relatively expensive: in the conventional procedure, initially dried soil samples are saturated with deionized water and then oven-dried at 105 °C for 24 h. Indirect methods are therefore used as an alternative. Numerous attempts have been made to correlate base-saturation percentage with pH in saline suspensions (Keeney and Corey, 1963; Shaw, 1952). Such efforts have been, at best, only partially successful, particularly when one attempts to estimate precise levels of base saturation from pH measurements.
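The definition of SP reduces to a one-line calculation on two weighings. The sketch below is illustrative only: it assumes both masses are measured without the container and that all mass lost on oven drying at 105 °C is water held at saturation. The function name and the sample masses are hypothetical.

```python
def saturation_percentage(saturated_mass_g: float, oven_dry_mass_g: float) -> float:
    """Return SP (%) = (mass of water held at saturation / oven-dry soil mass) * 100.

    saturated_mass_g: mass of the saturated soil paste (soil + water), grams, container excluded.
    oven_dry_mass_g: mass of the same sample after oven drying at 105 °C, grams.
    """
    if oven_dry_mass_g <= 0:
        raise ValueError("oven-dry mass must be positive")
    if saturated_mass_g < oven_dry_mass_g:
        raise ValueError("saturated mass cannot be smaller than oven-dry mass")
    water_mass_g = saturated_mass_g - oven_dry_mass_g  # water needed to saturate the sample
    return 100.0 * water_mass_g / oven_dry_mass_g


# Hypothetical sample: 240 g of saturated paste that weighs 160 g after oven drying.
print(saturation_percentage(240.0, 160.0))  # 50.0
```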
In view of the importance of accurately estimating SP from basic, readily available soil information, modern techniques such as artificial neural networks (ANNs) and fuzzy inference systems (FIS) can be a viable alternative. Because of the non-linear structure of ANN models and the ambiguity of variables in FIS models (Piotrowski et al., 1996; Mukhopadhyay, 1999), researchers have recently turned to hybrid models such as the Adaptive Neural-based Fuzzy Inference System (ANFIS) to further analyze spatially distributed variables (Lee, 2000). In this study, the efficiency of ANFIS, ANN, and multiple regression (MR) models was examined for estimating SP from measured clay, silt, sand, and organic carbon (OC) data in the Boukan plain, West Azerbaijan Province, Iran.

Description of the study area
The study area is the Boukan region, located in the southern part of West Azerbaijan Province, Iran. Boukan covers an area of 47,300 hectares (Figure 1) and lies at latitude 36°32′ N and longitude 46°13′ E. The average elevation of the region is 1330 m above sea level. The area has a semiarid climate with an average rainfall of 517 mm/year. A total of 600 measured values of clay, silt, sand, organic carbon (OC %), and saturation percentage (SP %), previously collected by the Iranian Rojhalat Soil Lab, were used in this study. The Walkley-Black method was used to determine the OC content of the samples, and soil texture (percent sand, silt, and clay) was determined using the hydrometer method (Schumacher 2002).
A summary of the measured data and their basic statistics is presented in Table 1. SP values ranged from 28% to 66%, with an average of 48.18%. The average values of effective organic carbon, clay, silt, and sand were 0.72%, 24.43%, 51.34%, and 24.23%, respectively.
Simple regression analysis was first performed to establish predictive relationships between the measured parameters. The relations between SP and each of the other measured parameters were analyzed using linear, power, logarithmic, and exponential functions. Models with statistically significant, strong correlations were then selected for further analysis (Table 2). Regression equations were also established between the index parameters and SP (Table 3). All of the obtained relationships were statistically significant according to Student's t-test at the 99% confidence level.
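The four functional forms can be fitted by ordinary least squares after the usual log transformations (power: ln y against ln x; logarithmic: y against ln x; exponential: ln y against x). The sketch below is not the authors' procedure, since the paper does not say how its fits were computed; it assumes strictly positive x and y, reports R² in the original units for every form, and uses synthetic data rather than the Boukan measurements. The helper names are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination 1 - SSE/SST in the original units of y."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def fit_simple_models(x, y):
    """Fit y on x with four forms; returns {name: ((a, b), R2)}. Requires x > 0 and y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    b, a = np.polyfit(x, y, 1)                      # linear: y = a + b x (polyfit returns slope first)
    results["linear"] = ((a, b), r_squared(y, a + b * x))
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)   # power: ln y = ln a + b ln x
    a = np.exp(ln_a)
    results["power"] = ((a, b), r_squared(y, a * x ** b))
    b, a = np.polyfit(np.log(x), y, 1)              # logarithmic: y = a + b ln x
    results["logarithmic"] = ((a, b), r_squared(y, a + b * np.log(x)))
    b, ln_a = np.polyfit(x, np.log(y), 1)           # exponential: ln y = ln a + b x
    a = np.exp(ln_a)
    results["exponential"] = ((a, b), r_squared(y, a * np.exp(b * x)))
    return results


# Synthetic illustration: SP as a noisy linear function of clay content (hypothetical numbers).
rng = np.random.default_rng(0)
clay = rng.uniform(10.0, 45.0, size=100)
sp = 30.0 + 0.75 * clay + rng.normal(0.0, 2.0, size=100)
for name, (params, r2) in fit_simple_models(clay, sp).items():
    print(f"{name:12s} a={params[0]:.3f} b={params[1]:.3f} R2={r2:.3f}")
```

Note that fitting power and exponential forms on log scales minimizes relative rather than absolute error, so their R² in original units may differ from a direct non-linear fit.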

Artificial Neural Network (ANN)
Artificial neural networks (ANNs) are based on the current understanding of biological nervous systems, although most biological details are neglected. ANNs are massively parallel systems composed of many processing elements connected by links of variable weights (Lippman, 1987).
Figure 2 illustrates a three-layer neural network consisting of layers i, j, and k, with interconnection weights $W_{ij}$ and $W_{jk}$ between the layers of neurons (Hagan and Menhaj 1994; Kisi and Uncuoglu 2005). The weights are computed through an iterative process based on the back-propagation algorithm, so that the difference between the computed and target outputs (or any error criterion, such as the mean square error) becomes sufficiently small. The number of hidden-layer nodes in each model was determined by trying various network structures, since no theory is yet available to determine how many hidden units are needed to approximate a given function. The cross-validation (checking) mode monitors the error to find the optimal point at which to stop training and to avoid overtraining. The testing mode is used to determine how accurately the network can simulate input-output relationships.
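Back-propagation for the i-j-k network with log-sigmoid activations in both the hidden and output layers can be sketched as plain batch gradient descent on the mean square error. This NumPy illustration is not the MATLAB 7.4 training routine used in the study (the paper does not name one); the learning rate, epoch count, weight initialization, hidden-layer size, and the synthetic data are all assumptions, and cross-validation stopping is left out for brevity.

```python
import numpy as np


def logsig(z):
    """Log-sigmoid transfer function; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_weights(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W_ij": rng.normal(scale=0.5, size=(n_in, n_hidden)),  # input layer i -> hidden layer j
        "b_j": np.zeros(n_hidden),
        "W_jk": rng.normal(scale=0.5, size=(n_hidden, 1)),     # hidden layer j -> output layer k
        "b_k": np.zeros(1),
    }


def predict(X, w):
    hidden = logsig(X @ w["W_ij"] + w["b_j"])
    return logsig(hidden @ w["W_jk"] + w["b_k"]).ravel()


def mse(X, t, w):
    return float(np.mean((predict(X, w) - t) ** 2))


def train_backprop(X, t, w, lr=0.5, epochs=2000):
    """Batch gradient descent on E = mean((output - t)^2); updates w in place and returns it."""
    n = X.shape[0]
    target = t.reshape(-1, 1)
    for _ in range(epochs):
        hidden = logsig(X @ w["W_ij"] + w["b_j"])                     # forward pass, layer j
        out = logsig(hidden @ w["W_jk"] + w["b_k"])                    # forward pass, layer k
        delta_k = 2.0 * (out - target) * out * (1.0 - out) / n         # dE/d(net input) at layer k
        delta_j = (delta_k @ w["W_jk"].T) * hidden * (1.0 - hidden)    # error back-propagated to layer j
        w["W_jk"] -= lr * hidden.T @ delta_k
        w["b_k"] -= lr * delta_k.sum(axis=0)
        w["W_ij"] -= lr * X.T @ delta_j
        w["b_j"] -= lr * delta_j.sum(axis=0)
    return w


# Synthetic example: four inputs scaled to [0, 1] and a scaled target (hypothetical relation).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(360, 4))
t = 0.2 + 0.15 * X[:, 0] + 0.1 * X[:, 1] + 0.2 * X[:, 3]
weights = init_weights(n_in=4, n_hidden=6)
before = mse(X, t, weights)
train_backprop(X, t, weights)
after = mse(X, t, weights)
print(f"training MSE: {before:.5f} -> {after:.5f}")
```

In practice the hidden-layer size would be varied and each candidate compared on the checking set, in line with the trial-and-error selection described above.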
All collected data were divided into three sets: training (3/5 of all data), test (1/5), and verification (the remaining 1/5). MATLAB 7.4 was used for the neural network analysis, with a three-layer feed-forward network consisting of an input layer, one hidden layer, and one output layer. Log-sigmoid transfer functions were used for both the hidden and output layers.
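With 600 samples, the 3/5 : 1/5 : 1/5 split gives 360 training, 120 test, and 120 verification samples. Because a log-sigmoid output lies in (0, 1), SP values (28% to 66% in Table 1) must be rescaled before training and mapped back afterwards. The paper does not state how samples were assigned or how data were normalized, so this sketch assumes a random shuffle and min-max scaling into [0.1, 0.9]; the 0.1 margin and the function names are hypothetical.

```python
import numpy as np


def split_indices(n_samples, seed=0):
    """Randomly split sample indices 3/5 : 1/5 : 1/5 into training, test, and verification sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = 3 * n_samples // 5
    n_test = n_samples // 5
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def minmax_scale(values, lo, hi, margin=0.1):
    """Map [lo, hi] linearly onto [margin, 1 - margin], inside the log-sigmoid output range."""
    values = np.asarray(values, dtype=float)
    return margin + (1.0 - 2.0 * margin) * (values - lo) / (hi - lo)


def minmax_unscale(scaled, lo, hi, margin=0.1):
    """Inverse of minmax_scale: convert network outputs back to SP (%)."""
    scaled = np.asarray(scaled, dtype=float)
    return lo + (scaled - margin) * (hi - lo) / (1.0 - 2.0 * margin)


train_idx, test_idx, verify_idx = split_indices(600)
print(len(train_idx), len(test_idx), len(verify_idx))  # 360 120 120

# SP range from Table 1; in practice lo/hi should be taken from the training set only.
print(minmax_scale([28.0, 47.0, 66.0], lo=28.0, hi=66.0))
```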

Adaptive Neural-based Fuzzy Inference System (ANFIS)
The Adaptive Neural-based Fuzzy Inference System (ANFIS) is capable of approximating any real continuous function on a compact set to any degree of accuracy (Jang et al., 1997). Specifically, the ANFIS of interest here is functionally equivalent to the first-order Sugeno fuzzy model (Jang et al., 1997; Drake, 2000). Its hybrid learning algorithm combines gradient descent with the least-squares method. As a simple example, assume a fuzzy inference system with two inputs, x and y, and one output, z. For a first-order Sugeno fuzzy model, a typical rule set with two fuzzy If-Then rules can be expressed as:

Rule 1: If x is $A_1$ and y is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: If x is $A_2$ and y is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$

The resulting Sugeno fuzzy reasoning system is presented in Figure 3, where the output z is the weighted average of the individual rule outputs and is itself a crisp value. The corresponding equivalent ANFIS architecture is presented in Figure 4; nodes in the same layer have similar functions, as described below. The output of the ith node in layer l is denoted $O_{l,i}$.
Layer 1: Every node i in this layer is an adaptive node with node function

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4,$$

where x (or y) is the input to the ith node and $A_i$ (or $B_{i-2}$) is a linguistic label (such as "low" or "high") associated with this node. In other words, $O_{1,i}$ is the membership grade of a fuzzy set A (= $A_1$, $A_2$, $B_1$, or $B_2$), and it specifies the degree to which the given input x (or y) satisfies the quantifier A. The membership function can be, for example, the generalized bell function

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}},$$

where $\{a_i, b_i, c_i\}$ is the parameter set. As the values of these parameters change, the bell-shaped function varies accordingly, producing various forms of membership functions for the linguistic label $A_i$. In fact, any continuous and piecewise differentiable function, such as the commonly used triangular membership function, also qualifies as a candidate node function for this layer (Jang, 1993). Parameters in this layer are referred to as premise parameters. The outputs of this layer are the membership values of the premise part.
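The generalized bell function and the triangular and Gaussian alternatives used in this study can be written as plain NumPy functions. This is an illustrative sketch; the parameter orders are intended to mirror MATLAB's `gbellmf`, `trimf`, and `gaussmf`, which is an assumption worth checking against the toolbox documentation, and the function names and the "medium clay" label are hypothetical.

```python
import numpy as np


def bell_mf(x, a, b, c):
    """Generalized bell: 1 / (1 + |(x - c) / a|^(2b)); a sets the width, b the slope, c the centre."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2.0 * b))


def triangular_mf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (requires a < b < c)."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (c - x) / (c - b)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)


def gaussian_mf(x, sigma, c):
    """Gaussian: exp(-(x - c)^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


# Hypothetical "medium clay" label centred at 25% clay, evaluated at a few clay contents.
clay = np.array([10.0, 20.0, 25.0, 30.0, 40.0])
print(bell_mf(clay, a=5.0, b=2.0, c=25.0))
print(triangular_mf(clay, a=15.0, b=25.0, c=35.0))
print(gaussian_mf(clay, sigma=5.0, c=25.0))
```

For the bell function, a is the half-width at which membership falls to 0.5, and larger b gives steeper shoulders.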
Layer 2: This layer consists of nodes labeled Π, which multiply the incoming signals and send out the product. For instance,

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2,$$

where each node output represents the firing strength of a rule.
Layer 3: In this layer, each node labeled N calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2.$$

The outputs of this layer are called normalized firing strengths.

Layer 4: The nodes in this layer are adaptive, with node functions

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right),$$

where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as consequent parameters.

Layer 5: The single fixed node in this layer, labeled ∑, computes the final output as the summation of all incoming signals:

$$O_{5,1} = z = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}.$$

In the present study, triangular and Gaussian membership functions were used. In each application, different numbers of membership functions were tested, and the configuration giving the minimum root mean square error (RMSE) and the maximum R² was selected. ANFIS models for predicting SP were trained with the help of MATLAB (version 7.4) and SPSS (version 15.0), and the two top models, ANFIS11 and ANFIS12, were selected based on RMSE and R². The ANFIS parameter types of the two models and their values are presented in Table 4.
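The five layers, together with the least-squares half of the hybrid learning rule, can be sketched for the two-input, two-rule example. This NumPy sketch is an illustration under stated assumptions, not the MATLAB implementation used in the study. It uses Gaussian membership functions (one of the two types used here, chosen because, unlike triangular ones, they are nonzero everywhere, so the layer-3 normalization is always defined up to floating-point underflow). It fixes the premise parameters and solves for the consequent parameters $\{p_i, q_i, r_i\}$ by linear least squares, which works because z is linear in them once the normalized firing strengths are known. The gradient-descent update of the premise parameters is omitted, and all names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_mf(x, sigma, c):
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def normalized_firing_strengths(x, y, premise):
    """Layers 1-3 for rules 'If x is A_i and y is B_i' (i = 1, 2); returns w_bar with shape (n, 2)."""
    mu_a = np.column_stack([gaussian_mf(x, *premise["A1"]), gaussian_mf(x, *premise["A2"])])  # layer 1
    mu_b = np.column_stack([gaussian_mf(y, *premise["B1"]), gaussian_mf(y, *premise["B2"])])  # layer 1
    w = mu_a * mu_b                              # layer 2: firing strengths w_i
    return w / w.sum(axis=1, keepdims=True)      # layer 3: w_bar_i = w_i / (w_1 + w_2)


def anfis_output(x, y, premise, consequent):
    """Layers 4-5: z = sum_i w_bar_i (p_i x + q_i y + r_i); consequent rows are (p_i, q_i, r_i)."""
    w_bar = normalized_firing_strengths(x, y, premise)
    f = np.outer(x, consequent[:, 0]) + np.outer(y, consequent[:, 1]) + consequent[:, 2]  # rule outputs f_i
    return np.sum(w_bar * f, axis=1)             # layer 5: weighted sum


def fit_consequent(x, y, z, premise):
    """Least-squares step of hybrid learning: premise parameters fixed, solve for all (p_i, q_i, r_i)."""
    w_bar = normalized_firing_strengths(x, y, premise)
    xy1 = np.column_stack([x, y, np.ones_like(x)])
    design = np.hstack([w_bar[:, [i]] * xy1 for i in range(2)])  # columns w1*x, w1*y, w1, w2*x, w2*y, w2
    theta, *_ = np.linalg.lstsq(design, z, rcond=None)
    return theta.reshape(2, 3)


# Hypothetical premise parameters (sigma, centre) on inputs scaled to [0, 1].
premise = {"A1": (0.3, 0.0), "A2": (0.3, 1.0), "B1": (0.3, 0.0), "B2": (0.3, 1.0)}
true_consequent = np.array([[1.0, -2.0, 0.5], [0.3, 0.8, -1.0]])
rng = np.random.default_rng(0)
x, y = rng.uniform(0.0, 1.0, 200), rng.uniform(0.0, 1.0, 200)
z = anfis_output(x, y, premise, true_consequent)
estimated = fit_consequent(x, y, z, premise)
print(np.round(estimated, 6))
```

On noise-free outputs generated by the same model, the least-squares step recovers the consequent parameters that produced them.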

Multiple Regression Models
Multiple regression (MR) is a statistical technique that predicts the value of one variable from the values of several other variables. MR was used to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. The general form of these models is

$$Y = C + b_1 X_1 + b_2 X_2 + \dots + b_n X_n,$$

where Y is the dependent variable, $X_1, \dots, X_n$ are the independent variables, $b_1, \dots, b_n$ are the regression coefficients, and C is the y-intercept (Milton et al., 1997; McClave et al., 1997). Eight MR analyses were carried out to correlate the measured SP with various combinations of the measured parameters, namely clay, silt, sand, and OC content.
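The general MR form can be fitted by ordinary least squares. This NumPy sketch is one way to do it, not necessarily how the authors' SPSS runs were configured, and the data are synthetic. One caveat follows from the inputs themselves: clay, silt, and sand percentages nominally sum to 100%, so a model containing all three plus the intercept C has an exactly collinear design matrix; the sketch therefore rejects rank-deficient combinations. The function name is hypothetical.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """OLS fit of Y = C + b_1 X_1 + ... + b_n X_n; returns (C, array of b_i)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # leading column of ones estimates C
    if np.linalg.matrix_rank(design) < design.shape[1]:
        raise ValueError("predictors are collinear (e.g. clay + silt + sand = 100%)")
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]


# Synthetic data (not the Boukan measurements): clay, silt, and OC in percent.
rng = np.random.default_rng(0)
clay = rng.uniform(10.0, 45.0, 200)
silt = rng.uniform(35.0, 55.0, 200)
oc = rng.uniform(0.2, 1.5, 200)
sp = 20.0 + 0.6 * clay + 0.2 * silt + 4.0 * oc + rng.normal(0.0, 1.5, 200)

C, b = fit_multiple_regression(np.column_stack([clay, silt, oc]), sp)
print(f"C = {C:.2f}, b = {np.round(b, 3)}")

sand = 100.0 - clay - silt
try:
    fit_multiple_regression(np.column_stack([clay, silt, sand]), sp)
except ValueError as err:
    print("rejected:", err)
```

Dropping any one of the three texture fractions (or the intercept) removes the collinearity.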

Application and results
In this study, various combinations of the measured parameters (clay, silt, sand, and organic carbon percentages) were examined as inputs to ANN models to evaluate the effect of each variable on SP (%). Several NF models were established, with a different variable added to the input combination each time. Thus, the input combinations evaluated here were:

Estimated and observed values were compared statistically using the root mean square error (RMSE) and the coefficient of determination (R²):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( O_i - P_i \right)^2}$$
Here, n is the number of observations, and $O_i$ and $P_i$ are the measured and predicted values, respectively. The calculated indices are presented in Table 5. The results show that the ANN model with OC, clay, silt, and sand as inputs (combination (xiii)) had the smallest RMSE and the highest R². This indicates that all of these parameters contribute to the prediction of SP (%). The two best ANN models among all those applied are presented in Table 5.
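Both performance indices take only a few lines to compute. The RMSE is the square root of the mean squared difference between the n observed values $O_i$ and predicted values $P_i$. The paper does not spell out its R² definition, so this sketch shows two common ones: the squared Pearson correlation between observed and predicted values, and 1 - SSE/SST. They coincide for an ordinary least-squares fit with an intercept evaluated on its own training data, but can differ for other predictions: predictions shifted by a constant give a squared correlation of essentially 1 but a much lower 1 - SSE/SST (0.625 in the example). The function names are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error: sqrt(mean((O_i - P_i)^2))."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def r2_pearson(observed, predicted):
    """R^2 taken as the squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)


def r2_residual(observed, predicted):
    """R^2 taken as 1 - SSE/SST, with SST computed about the observed mean."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2))


observed = np.array([40.0, 50.0, 60.0])
print(rmse(observed, [42.0, 49.0, 61.0]))  # sqrt(2), about 1.414
shifted = observed + 5.0                    # perfectly correlated but biased by +5
print(r2_pearson(observed, shifted), r2_residual(observed, shifted))
```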
The correlation between measured and predicted values is a valuable indicator of a model's predictive ability. The predicted and measured SP values, plotted in Figure 5, were highly correlated, indicating that the constructed ANN model is acceptable for predicting SP.
Multiple regression models were developed to predict SP with different combinations of inputs and the two best models were selected based on R 2 and RMSE indices (Table 6).Cross-correlation between predicted and observed values (Figure 6) indicated that the constructed MR model is acceptable for prediction of SP.
According to the RMSE and R² values (Table 7) and the cross-correlation between predicted and observed values (Figure 7), the constructed ANFIS model performs well in predicting SP.
Analysis of ANFIS models with various input combinations and different types and numbers of membership functions showed that the model with OC, clay, silt, and sand as inputs and triangular membership functions performed best, followed by the model with clay, silt, and sand as inputs and Gaussian membership functions.

Conclusion
In this study, the Adaptive Neural-based Fuzzy Inference System (ANFIS), artificial neural network (ANN), and multiple regression (MR) techniques were used to predict the saturation percentage (SP) from soil characteristics, namely percent sand, silt, clay, and organic carbon (OC). Candidate models were compared on their performance, and the model with the minimum RMSE and maximum R² was selected as the best model. The results showed that the constructed ANFIS and ANN models predicted SP effectively, and that both were superior to the MR functions in estimating SP. The proposed empirical relationships and models therefore offer a practical option for estimating SP.

Table 1. Statistical analysis of the measured soil parameters

Table 3. Predictive relationships for assessing SP using the available measured values

Table 5. Statistical analysis of the two best ANN models in the training and test periods

Table 6. Performance indices (R² and RMSE) of the two preferred MR models for prediction of SP

Table 7. Parameter types of the preferred ANFIS models and their performance indices