A Novel Electric Power Plants Performance Assessment Technique Based on Genetic Programming Approach



Introduction and Background
One of the most significant issues developing countries face is finding an appropriate way to operate and manage their power industries (Yunos & Hawdon, 1997). Electricity is extremely important in the economic development of every society (Liu et al., 2010). In 2007, Iran generated about 190 billion kilowatt-hours (BkWh) of electricity and consumed 153 BkWh. Iran relies heavily on conventional fossil-fuel power plants (especially natural gas generators). Iran's nominal electrical production capacity is about 49,000 megawatts (MW). Some power plants operate at under 10% of their nominal capacity, and most power plants in Iran are old and cannot operate at their nominal capacity. On the other hand, Iran needs to increase its generating capacity by around 10% annually to meet the 7-9 percent annual growth in demand (http://www.eia.doe.gov).
The costs of constructing power plants and producing electricity are relatively high. In addition, the environmental damage caused by burning fossil fuels for electricity generation, and its consequent costs, are considerable. Hence, to reduce such costs, it seems necessary to assess the performance and evaluate the efficiency of a group of selected homogeneous thermal power plants, referred to in the performance evaluation literature as decision-making units (DMUs). In 2007, 25.6 percent of all electricity produced came from gas turbines, 2.2 percent from hydroelectric plants, 45.4 percent from steam power plants and 26.6 percent from combined cycle power plants. The rest was produced by diesel generators. Figure 1 shows the electricity generated by each type of power plant in Iran (http://www.tavanir.org.ir).

One of the a stronger most suitab
In the fiel years, like least squa approach known non 1998). ANNs are not capable of extracting interpolation equations, and an ANN has to be implemented as a computer program. The hybrid approach combining DEA and ANNs (Athanassopoulos & Curram, 1996) has been applied in many fields (Mostafa, 2009; Pendharka, 2010; Çelebi & Bayraktar, 2008; Wu, 2009; Wu et al., 2006; Wang et al., 2009). Wu et al. (2006) (Kaboudan, 2003) such as forecasting electricity demand (Lee et al., 1997); forecasting long-term energy consumption (Karabulut et al., 2008); real-time runoff forecasting (Khu et al., 2001); predicting financial data (Iba & Sasaki, 2002); predicting stock prices (Kaboudan, 2000); fault analysis of diesel engine fuel (Sun et al., 2004); prediction of ski-jump bucket spillway scour (Azamathulla et al., 2008); river pipeline scour (Azamathulla & Ghani, 2010); and longitudinal dispersion coefficients in streams (Azamathulla & Ghani, 2011). This study presents a genetic programming procedure for evaluating the performance of, and benchmarking, a set of homogeneous steam power plants. Applying the model only to power plants of the same type helps to obtain more accurate and reliable results.

Genetic Programming
Genetic programming (GP), an extension of genetic algorithms (GAs), was first presented by Koza (1992). GP is an evolutionary computation method that creates computer programs.
The computer programs generated by GP are represented as tree structures and expressed in the functional programming language LISP (Koza, 1992). The classical GP technique is also called "tree-based GP" (Koza, 1992). The main differences between GP and GA are (Willis et al., 1997):
• GP creates solutions (chromosomes) as variable-length tree structures, while GAs generally use chromosomes of fixed length and structure.
• GP typically incorporates a domain-specific syntax that governs meaningful arrangements of information on the chromosome. For GAs, the chromosomes are typically syntax-free.
• GP maintains the syntax of its tree-structured chromosomes during the 'reproduction' step by using the genetic operators.
• GP solutions are often coded so that the chromosomes can be executed directly. GAs are rarely coded in this form.
GP can automatically generate mathematical expressions or programs (Tsakonas, 2006). Like many other areas of computer science, GP has been widely applied to real-world problems. GP creates a large random population spread over the vast space of possible solutions (computer programs), which reduces the likelihood of becoming trapped in a "local optimum" (Muttil & Lee, 2005). The functions or programs are called organisms or chromosomes. During the evolution process that searches for the best solution, the size and form of the programs in the population change dynamically (Brezocnik & Balic, 2001). Possible solutions in GP are formed recursively from a set of function and terminal genes.
The terminal set T contains the arguments for the functions and can consist of numerical constants, logical constants, variables, etc. Figure 2 shows a simple tree structure of a GP model. A GP tree has a root node, with links extending from each function node and ending at terminal nodes. N and n. An advantage of GP over conventional regression methods is that conventional regression requires the model structure to be specified in advance, and that structure is mostly suboptimal. ANNs require the network structure to be identified first, and the coefficients (weights) are then calculated during the learning process. In GP, the terminal and function sets are defined initially, and then both the optimal form of the model and its coefficients are determined by the GP algorithm (Muttil & Lee, 2005). GP models can provide additional information about the problem by finding the best-fit analytic function. In contrast, ANNs do not provide an analytical function, and the network weights are generally not interpretable. Unlike ANNs, GP is good at distinguishing effective inputs from inputs that have no effect on the solution. Therefore, GP can reduce the dimension of the model and yield a more interpretable model (Muttil & Lee, 2005).
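A tree of this kind can be represented and evaluated recursively. The Python sketch below is illustrative only and is not GPLAB's internal data structure. It assumes a function set F = {+, -, *, protected /}, which matches the four basic operators mentioned in the Case Study, and a terminal set of variable names and numeric constants. The `Node` class, the convention that `protected_div` returns 1 when the divisor is near zero, and the example tree are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple, Union


def protected_div(a: float, b: float) -> float:
    """Protected division: return 1.0 instead of failing when the divisor is ~0."""
    return a / b if abs(b) > 1e-12 else 1.0


# Hypothetical function set F: the four basic arithmetic operators (all binary).
FUNCTION_SET = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}


@dataclass(frozen=True)
class Node:
    """A GP tree node: a function symbol with two children, or a terminal leaf."""
    value: Union[str, float]           # function symbol, variable name, or constant
    children: Tuple["Node", ...] = ()  # empty for terminals

    def evaluate(self, env: Dict[str, float]) -> float:
        if self.children:  # function node: evaluate both subtrees, then apply
            left, right = (child.evaluate(env) for child in self.children)
            return FUNCTION_SET[self.value](left, right)
        if isinstance(self.value, str):  # variable terminal, looked up in env
            return env[self.value]
        return float(self.value)  # constant terminal

    def __str__(self) -> str:
        if self.children:
            return f"({self.children[0]} {self.value} {self.children[1]})"
        return str(self.value)


# Hypothetical tree for (FC * 0.5) + IC; the root node is "+".
tree = Node("+", (Node("*", (Node("FC"), Node(0.5))), Node("IC")))
print(tree, "=", tree.evaluate({"IC": 20.0, "IP": 5.0, "FC": 100.0}))  # ((FC * 0.5) + IC) = 70.0
```

Protected division is a common GP convention. It keeps every randomly generated tree executable, so crossover and mutation never produce programs that crash when evaluated.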

Methodology
In the present study, a GP-based algorithm is introduced to measure the efficiency of Iran's main electricity power plants over a specific period. The presented model is input-oriented because the selected power plants have a particular demand to fulfill; thus, the input quantities are the main decision parameters. By finding a cost function instead of a production function, the GP method can be extended to an output-oriented model. In this study, one output is considered for simplicity. The proposed algorithm is as follows:
(1) Divide the data into input (S) and output (P) sets. Assume that n power plants have to be assessed.
(2) Form S so that it contains all data on the input variables from the previous periods.
(3) Divide S into two subsets: the learning set ($S_{Learning}$) and the validation set ($S_{Validation}$).
The learning data are used for the learning process, and the validation data are used to test the capability of the model on new data. During the learning process, the performance of the evolved models on the validation set is monitored.
The learning and validation data sets are used to select the best evolved models and are included in the training process. Since good extrapolation by GP is preferred, the validation data are chosen from the periods closest to the testing data ($S_{Testing}$).
(4) Use the GP method to find the best program function, i.e. calculate the GP best-fit function to the desired precision on the validation data.
(5) Calculate the fitness value for $S_{Testing}$ using the GP best-fit function.
(6) Calculate the absolute error between the real output ($P_j$) and the value of the GP best-fit function ($\hat{P}_j$) for each power plant j in the current period: $e_j = \lvert P_j - \hat{P}_j \rvert$.
(7) Calculate the error weight for the predicted value of each power plant.
(8) Calculate the raw efficiency scores: to obtain the raw efficiency score, the real output value is divided by the sum of the effect of each absolute error ratio and the predicted value.
(9) Calculate the final efficiency scores. The efficiency scores lie between 0 and 1, and the power plant with the maximum score takes the highest rank.
The steps of the proposed algorithm are illustrated in Figure 6.
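The scoring end of the algorithm (steps 6 and 9, plus ranking) can be sketched in Python as below. This is a minimal sketch under stated assumptions. Only step 6, the absolute error between real and GP-predicted output, follows directly from the text. The text gives no closed-form formulas for steps 7 and 8, so the hypothetical helper `ratio_raw_score` substitutes a plain actual-to-predicted output ratio; this is not the paper's error-weighted formula. Step 9 is assumed to divide every raw score by the largest one, so that scores fall in (0, 1]. All function names and numbers are illustrative.

```python
from typing import List, Sequence


def absolute_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Step 6: |real output - GP best-fit value| for each power plant."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(p - q) for p, q in zip(actual, predicted)]


def ratio_raw_score(actual: float, predicted: float) -> float:
    """ASSUMPTION: stand-in for steps 7-8 (actual / predicted output);
    this is not the paper's error-weighted raw score."""
    return actual / predicted


def final_scores(raw: Sequence[float]) -> List[float]:
    """Step 9 (assumed normalization): divide by the largest raw score, so the
    best plant scores 1 and, for positive inputs, all scores lie in (0, 1]."""
    best = max(raw)
    return [r / best for r in raw]


def rank_plants(names: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Order plant names from highest to lowest final score (rank 1 first)."""
    ordered = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ordered]


# Hypothetical plants and outputs, for illustration only.
names = ["A", "B", "C"]
actual = [100.0, 90.0, 60.0]
predicted = [80.0, 100.0, 60.0]
errors = absolute_errors(actual, predicted)  # [20.0, 10.0, 0.0]
raw = [ratio_raw_score(p, q) for p, q in zip(actual, predicted)]
scores = final_scores(raw)
print(rank_plants(names, scores))  # ['A', 'C', 'B']
```

Because the raw-score function is isolated, the paper's own step 7-8 formula could replace `ratio_raw_score` without touching the error, normalization or ranking code.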

Case Study
The conventional thermal steam-electric production plant is defined by an engineering framework. In such a framework, the appropriate input parameters are the quantity of fuel consumed and the installed power. The installed power is the maximum nominal power for which the plants were originally designed. Labor input variables account for control and maintenance services, which also require funds (Azadeh et al., 2007). Electrical energy production is the output. According to some research on the performance evaluation of Iran's thermal power plants (e.g., Emami Meibodi, 1998), labor is not a major factor. Consequently, the GP-based formulation of the electric power (MWh) generated by each thermal power plant (P) is considered to be as follows:

$$P = f(IC, IP, FC)$$

where IC (MW) is capital (installed capacity), IP (MWh) is internal power (internal consumption) and FC (TJ) is fuel consumption. IC is measured in terms of installed thermal generating capacity (Hawdon, 1997; Fare et al., 1983). IP is the plant's own energy consumption (e.g., powered equipment). Various fossil fuels such as natural gas, gasoline and mazut have been used as fuel in the production process. The type of fuel depends on availability, cost and environmental issues (Azadeh et al., 2010). FC is measured in terajoules (TJ). The basic descriptive statistics of the model parameters are given in Table 1. More detailed information about Iran's thermal power plants, such as total output, generating capacity and fuel consumption, can be found in TAVANIR Management Organization (1997-2004). To start the analysis, the main data sets from several periods were separated into training and testing subsets. The training data were used for the learning process, and the testing data were used to evaluate the capability of the model on data sets that were not included in the analysis.
For the analysis, 117 data sets from 1997 to 2002 were used as training data (100 sets for learning and 17 sets for validation), and 31 data sets from 2003 to 2004 were used for testing the models. Several parameters must be set in the computerized GP predictive algorithm, and they should be tuned properly to obtain the best GP prediction model for electricity production in steam power plants.
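The chronological split of step 3 can be sketched in Python as below. In that split, the validation data are taken from the periods closest to the testing period. This is a minimal sketch that assumes each training record carries a period label such as a year. The function name `chronological_split` is hypothetical, and the text does not state exactly which 17 of the 117 training sets formed the validation subset.

```python
from typing import List, Sequence, Tuple, TypeVar

Record = TypeVar("Record")


def chronological_split(
    records: Sequence[Record], periods: Sequence[int], n_validation: int
) -> Tuple[List[Record], List[Record]]:
    """Split training records into learning and validation subsets.

    Records are ordered by period; the n_validation most recent ones (closest
    to the testing period) form the validation subset.
    """
    if len(records) != len(periods):
        raise ValueError("records and periods must have the same length")
    if not 0 < n_validation < len(records):
        raise ValueError("n_validation must be between 1 and len(records) - 1")
    # sorted() is stable, so records from the same period keep their input order.
    order = sorted(range(len(records)), key=lambda i: periods[i])
    cut = len(order) - n_validation
    learning = [records[i] for i in order[:cut]]
    validation = [records[i] for i in order[cut:]]
    return learning, validation


# Hypothetical usage with labels only (no real plant data).
records = ["r1999", "r1997", "r2002", "r1998"]
years = [1999, 1997, 2002, 1998]
learn, val = chronological_split(records, years, n_validation=1)
print(learn, val)  # ['r1997', 'r1998', 'r1999'] ['r2002']
```

For the case study, this would be called on the 117 training sets from 1997-2002 with `n_validation=17`, leaving 100 sets for learning. The 31 sets from 2003-2004 stay aside for testing.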
Table 2 shows the GP model parameters. Four basic mathematical operators were used in the procedure to keep the model simple. The population size sets the number of programs in the population that GP evolves. The number of generations sets how many generations the algorithm runs before it terminates. The appropriate values of these parameters should be selected based on the complexity of the model. Here, reasonably large values of the initial population and the number of generations were tested to find the production function with minimum error. The mutation and crossover rates for the optimal models were both 50%. The maximum tree depth was set to an optimal value of 12.
The values of the other parameters were selected by trial and error (Gandomi et al., 2010).
In this study, the tree-based GP software GPLAB (Silva, 2007) was used together with subroutines coded in MATLAB.
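GPLAB is a MATLAB toolbox. The following Python sketch is not GPLAB and does not reproduce the study's runs. It illustrates how the settings discussed above fit together in a minimal tree-based GP for symbolic regression:

- the four basic operators, with protected division;
- subtree crossover applied with probability 0.5 and subtree mutation otherwise;
- a maximum tree depth of 12.

Several other choices are assumptions made for illustration only:

- tournament selection, elitism and grow initialization;
- mean squared error as the fitness;
- the population size and number of generations;
- the synthetic data.

```python
import math
import operator
import random

# Function set: the four basic arithmetic operators (division is protected).
FUNCS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,
}
VARS = ["IC", "IP", "FC"]  # terminal variables, named as in the Case Study
# A program is a terminal (variable name or float constant) or a tuple (op, left, right).


def random_tree(rng, max_depth):
    """Grow initialization: emit a terminal at depth 0 or with probability 0.3."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(VARS) if rng.random() < 0.75 else round(rng.uniform(-2.0, 2.0), 2)
    return (rng.choice(list(FUNCS)), random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def evaluate(tree, env):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


def tree_depth(tree):
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2])) if isinstance(tree, tuple) else 0


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path lists child positions (1 or 2) from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    op, left, right = tree
    if path[0] == 1:
        return (op, replace_at(left, path[1:], new), right)
    return (op, left, replace_at(right, path[1:], new))


def mse(tree, data):
    """Fitness to minimize; non-finite errors count as infinitely bad."""
    total = 0.0
    for env, target in data:
        err = evaluate(tree, env) - target
        total += err * err
    return total / len(data) if math.isfinite(total) else math.inf


def tournament(rng, pop, fits, k=3):
    return pop[min(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]


def evolve(data, pop_size=100, generations=20, p_crossover=0.5, max_depth=12, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 4) for _ in range(pop_size)]
    fits = [mse(t, data) for t in pop]
    for _ in range(generations):
        new_pop = [pop[min(range(pop_size), key=lambda i: fits[i])]]  # elitism
        while len(new_pop) < pop_size:
            parent = tournament(rng, pop, fits)
            path, _ = rng.choice(list(subtrees(parent)))
            if rng.random() < p_crossover:  # subtree crossover with a second parent
                _, piece = rng.choice(list(subtrees(tournament(rng, pop, fits))))
            else:  # subtree mutation: graft a fresh random subtree
                piece = random_tree(rng, 3)
            child = replace_at(parent, path, piece)
            new_pop.append(child if tree_depth(child) <= max_depth else parent)
        pop = new_pop
        fits = [mse(t, data) for t in pop]
    return pop[min(range(pop_size), key=lambda i: fits[i])]


# Synthetic toy data only (not plant data): target = 2*IC - IP.
data_rng = random.Random(42)
data = []
for _ in range(20):
    env = {v: data_rng.uniform(1.0, 10.0) for v in VARS}
    data.append((env, 2 * env["IC"] - env["IP"]))
best = evolve(data)
print(best, mse(best, data))
```

With elitism, the best fitness can never get worse from one generation to the next. The depth check rejects any offspring deeper than the limit and keeps its parent instead.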

Validity Verification
The plant efficiencies are quantified from the power plant outputs estimated by the GP model. The results are shown in Tables 3 to 5. Table 3 shows the rankings of the power plants based on the approach of Athanassopoulos and Curram (1996), called "standardized efficiency" (Costa & Markellos, 1997; Delgado, 2005; Azadeh et al., 2007). Table 4 shows the results of the Azadeh et al. (2007) approach. Finally, Table 5 summarizes the efficiency scores based on the proposed GP estimation model, which can be seen in Figure 6. To compare the results and check the accuracy of the proposed method, a non-parametric inference method, the Spearman rank correlation test, is used. Specifically, the statistical significance of the difference between the rankings obtained by the proposed methodology and those obtained by the conventional and Azadeh et al. (2007) algorithms is determined using Spearman's rank correlation test. The Spearman test evaluates the similarity of the rankings of the different DMUs. To examine the null hypothesis, a test statistic Z is calculated using Equations (9) and (10) and compared with the critical value for a predetermined level of significance α. The null hypothesis is "The rankings of the two methods are not similar". With α = 0.05, the critical Z value is 1.645. If the test statistic computed by Equation (10) exceeds 1.645, the null hypothesis is rejected, and we can conclude that the alternative hypothesis, "The two rankings are similar", holds (Ic & Yurdakul, 2010).
$$r_s = 1 - \frac{6\sum_{j=1}^{K} d_j^{2}}{K\left(K^{2}-1\right)} \qquad (9)$$

$$Z = r_s\sqrt{K-1} \qquad (10)$$

In Equation (9), $d_j$ is the difference between the ranks of power plant j under the two methods, K is the number of power plants and $r_s$ is the Spearman rank correlation coefficient. Table 6 shows the calculated values of $\sum d_j^{2}$, $r_s$ and Z. The calculated Z-values, 3.4611 and 3.022, are higher than 1.645. At a significance level α of 0.05, this indicates that the ranking differences between the proposed and conventional methods, and between the proposed and Azadeh et al. (2007) methods, are statistically insignificant. Based on these results, the ranking of power plants obtained by the proposed method can be considered reliable.
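The Spearman decision rule can be sketched in Python as follows, using the standard coefficient $r_s = 1 - 6\sum d_j^2 / (K(K^2-1))$ and the large-sample statistic $Z = r_s\sqrt{K-1}$. This is a minimal sketch that assumes untied integer ranks for the same K power plants; this form of the coefficient has no tie correction. The helper names `spearman_z` and `rankings_similar` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z_CRITICAL = 1.645  # one-tailed critical value for alpha = 0.05


def spearman_z(rank_a: Sequence[int], rank_b: Sequence[int]) -> Tuple[float, float]:
    """Return (r_s, Z) for two untied rankings of the same K power plants.

    r_s = 1 - 6 * sum(d_j^2) / (K * (K^2 - 1))
    Z   = r_s * sqrt(K - 1)
    """
    k = len(rank_a)
    if k != len(rank_b) or k < 2:
        raise ValueError("need two rankings of equal length K >= 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    r_s = 1 - 6 * sum_d2 / (k * (k * k - 1))
    return r_s, r_s * math.sqrt(k - 1)


def rankings_similar(rank_a, rank_b, z_critical: float = Z_CRITICAL) -> bool:
    """True when H0 ('the rankings are not similar') is rejected, i.e. Z > z_critical."""
    return spearman_z(rank_a, rank_b)[1] > z_critical


# Hypothetical rankings of five plants by two methods (only plants 1 and 2 swap).
r_s, z = spearman_z([1, 2, 3, 4, 5], [2, 1, 3, 4, 5])
print(round(r_s, 3), round(z, 3), rankings_similar([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))  # 0.9 1.8 True
```

The normal approximation behind Z is only reasonable for moderately large K. The five-plant example just shows the arithmetic.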
Table 7 summarizes the main results, presenting the efficiency scores of the conventional algorithm, the proposed algorithm and PCA (ZPCA). Table 7 shows that the mean efficiency score of the conventional algorithm is smaller than the mean technical efficiency of the power plants under the proposed algorithm. A statistical t-test was conducted to test whether the technical efficiencies obtained from the two algorithms differ significantly. Based on Table 8, the null hypothesis cannot be rejected at the 1% level of significance. This result is consistent with the mean technical efficiency of the proposed algorithm being 25 percent larger than that of the Azadeh et al. (2007) algorithm.
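The text does not specify which form of t-test was used. As an illustration only, the sketch below assumes a paired design over the same plants and tests H0: mean($E_{proposed}$) = (1 + δ)·mean($E_{reference}$) with δ = 0.25. It uses paired differences $d_j = E_{proposed,j} - (1+\delta)E_{reference,j}$ and computes $t = \bar{d} / (s_d/\sqrt{n})$ with n - 1 degrees of freedom. The function name is hypothetical, the scores are made up, and the critical value must be looked up separately, for example from a Student's t table at the 1% level.

```python
import math
import statistics
from typing import Sequence


def paired_ratio_t(e_proposed: Sequence[float], e_reference: Sequence[float], delta: float = 0.25) -> float:
    """t statistic (n - 1 degrees of freedom) for the ASSUMED paired hypothesis
    H0: mean(e_proposed) = (1 + delta) * mean(e_reference)."""
    n = len(e_proposed)
    if n != len(e_reference) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [p - (1 + delta) * r for p, r in zip(e_proposed, e_reference)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n))


# Hypothetical efficiency scores for four plants (illustration only).
t = paired_ratio_t([0.95, 0.90, 1.00, 0.85], [0.70, 0.75, 0.80, 0.70])
print(round(t, 3))
```

Applying the test to the transformed differences $d_j$ turns a "25 percent larger" claim into an ordinary one-sample test of a zero mean.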

Conclusion
In this paper, a nine-step algorithm was proposed to measure and rank the efficiency of electricity production units (power plants) in Iran. The unique feature of the proposed algorithm is that it uses the results of a GP model to calculate efficiency. Using GP can help to better estimate the performance patterns of power plants. GP does not require explicit assumptions about the functional form linking the dependent (output) and independent (input) variables, which can lead to better estimates and results than conventional methods such as regression or neural networks. The proposed algorithm was applied to a set of steam power plants in 2004. The efficiency results and rankings were compared with those of two other methods: the conventional approach and the Azadeh et al. (2007) approach. To validate the proposed algorithm and ensure that it produces rankings statistically similar to those of the conventional method, the Spearman rank correlation test was used. The results indicate that the efficiency scores are closer to the ideal efficiency, while the rankings of the power plants remain statistically the same. Because GP recognizes performance patterns better, the proposed algorithm yields more precise and realistic results than the conventional approach. When the production function is unknown, the GP-based algorithm for measuring technical efficiency can lead to better results than other techniques.
Because of the lack of both theoretical and empirical work in efficiency analysis, more research in this field is needed. For future studies, combining other prediction techniques, such as neural networks, with the GP method is advised to better recognize the pattern of the production function. Also, considering more output and input indicators would help to obtain more realistic results and to reduce the estimation error.

Fig. (A) Choose training variables. (B) Train GP using the learning data ($S_{Learning}$). (C) Evaluate the model using the validation data ($S_{Validation}$).

Figure 6. Steps of the proposed methodology

Figure 8. Sensitivity analysis of the predictor variables in the GP model

integrated DEA and ANNs to calculate the relative efficiency of the branches of a big Canadian bank. In this study, a CCR model of DEA was used in the first stage and an NN model in the next stage to measure the relative efficiencies. By better estimation of performance patterns, this approach can

Table 1. The basic descriptive statistics of model parameters

Table 2. The GP parameter settings

Table 3. Efficiency scores estimated by the standardized efficiency algorithm

Table 4. Efficiency scores estimated by the Azadeh et al. (2007) algorithm

Table 6. Determination of the significance of the difference between the proposed method and conventional methods

Table 7. Efficiency score results

Table 8. Hypothesis testing of the mean efficiencies ($\bar{E}$) of the proposed and Azadeh et al. (2007) algorithms