Development of a Simulated Annealing-assisted System for Land-use/Land-cover Classification

Local minimum limitations in unsupervised approaches using K-means are still problematic in producing accurate land use/land cover classifications. In response, we developed Simulated Annealing (SA) algorithms based on K-means. We hypothesized that SA-based systems can reduce the likelihood of converging on a local minimum. Two automated SA-based classification systems were developed and applied to Landsat TM data: a single SA-based (S-SA) system and an integrated SA-based (I-SA) system, which reduces computational intensity. We hypothesized that the I-SA system could produce more efficient classifications than the S-SA system. Kappa statistical analysis of the resulting error matrices demonstrated that the SA-based systems significantly improved the classification accuracy over that of the K-means algorithm when an appropriate combination of cooling schedule parameters was chosen. The knowledge and insights gained can facilitate the incorporation of SA random search procedures into other approaches that are similarly limited by local minimum problems, in order to improve their accuracy.


Introduction
Remotely sensed satellite images have long been used to generate land cover inventories. Accurately classifying these images into reliable land cover types is imperative. Many classification methods have been investigated and applied (Belward et al., 1990; Benediktsson et al., 1990; Bischof et al., 1992; Foody, 1995; Khorram et al., 2011; Ratle et al., 2010; Sergi et al., 1995; Serpico and Roli, 1995; Zheng et al., 2008). Classification methods generally fall into two categories: supervised and unsupervised. In a supervised classification, a priori knowledge about each land cover type is used to support collecting training samples for each class. Training samples must be reliable and of a sufficient quantity to estimate the characteristics of classes with adequate accuracy. Frequently, however, training sample collection can be problematic due to insufficient or unreliable information and complicated or heterogeneous landscape conditions. Alternatively, unsupervised classification requires minimal information and therefore becomes necessary when reliable information and adequate knowledge about the study area are not available.
Simulated Annealing (SA) originated from an analogy between the physical annealing process of solids and large combinatorial optimization problems (Černý, 1985; Kirkpatrick et al., 1983). Geman and Geman (1984) established that SA has the potential to find or approximate the global or near-global optimum in a combinatorial optimization problem. Although SA often requires greater computational time, the improved network could self-adapt to choose momentum parameters according to annealing temperature, thus enabling the network to escape from local minimum spots and converge stably. We hypothesized that an SA-based classification system could reduce the limitation of local minima and thereby improve the accuracy of land cover classification. All unsupervised classifiers have the advantage of being computationally more efficient and thus applicable to large land areas. This investigation explored the development of algorithms and the application of SA in unsupervised classifiers for more accurate land cover classification, with specific objectives to:

1) Investigate the applicability of two SA systems for land cover classifications using remotely sensed Landsat Thematic Mapper (TM) multispectral data;
2) Study the effects of controlling parameters in these SA systems on classification accuracy and efficiency; and
3) Evaluate and compare the resultant classifications from these SA-based algorithms versus the classical unsupervised K-means algorithm.

Clustering by minimizing the distance function
In most clustering problems, one begins with N patterns which are partitioned into K clusters. Algorithms that minimize the sum of squared Euclidean distances between patterns and cluster centers have many theoretical and practical advantages (Tou and Gonzales, 1977). This minimization can be expressed as the generalized least-squared error function J(V):
$$J(V) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} D_{ij}^2 \qquad (1)$$
where $Y = \{y_1, y_2, \ldots, y_N\}$ is the data set, N is the number of patterns within the data set, $V = \{v_1, v_2, \ldots, v_k\}$ is the set of cluster centers, k is the number of clusters, $2 \le k < N$, and $n_i$ denotes the number of patterns assigned to the i-th cluster center $v_i$, which is defined as follows:
$$v_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_j \qquad (2)$$
$D_{ij}$ is the associated distance function ($D_{ij} = \|y_j - v_i\|$) between a pattern $y_j$ and a given cluster center $v_i$, where $\|\cdot\|$ is a Euclidean norm on the feature space. The aforementioned minimization is a hard k-partition of Y; any pattern in Y must be assigned to a given cluster and can only belong to this cluster. The objective of the squared distance clustering method is to find a partition containing K clusters that minimizes $J(V)$ for a fixed K. In actual applications, the objective, $D_{ij}^2$, is minimized for each pattern, $y_j$.
Based on Tou and Gonzales' (1977) minimization objective described above, the K-means algorithm (Figure 1) consists of the following steps:
1) Select an initial cluster configuration arbitrarily. Choose K initial cluster centers randomly.
2) Repeat:
a) Distribute the pixels {Y} among the K cluster domains using the relation $y \in C_i$ if $\|y - v_i\| < \|y - v_j\|$ for all $i, j = 1, 2, \ldots, k$, $i \ne j$, where $C_i$ denotes the set of pixels whose cluster center is $v_i$;
b) Calculate the new cluster centers such that the sum of the squared distances from all pixels in $C_i$ to the new cluster center is minimized;
3) Repeat 2) until convergence conditions are satisfied, which could be defined as no change in cluster centers or as changes below a selected threshold.
The K-means clustering algorithm is computationally efficient and produces good results if the clusters are compact, well-separated in the feature space, and hyperspherical in shape when the Euclidean distance is used (Jain et al., 2000). However, these conditions may not be satisfied in many complex remote sensing applications. Previous investigations have demonstrated that the classical K-means algorithm has not produced satisfactory classifications, primarily due to the limitation of the local minimum (Bratsolis, 2009; Hegarat-Mascle et al., 1996; Kolonko and Tran, 1997; Selim and Alsultan, 1991).
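The K-means steps above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`squared_distance`, `objective`, `update_centers`, `kmeans`), the `max_iter` cap, and the choice to leave an empty cluster's center unchanged are assumptions made for illustration. Pixels are represented as tuples of band values.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two band-value tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def objective(pixels, labels, centers):
    """J(V): sum of squared distances from each pixel to its assigned center."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(pixels, labels))

def update_centers(pixels, labels, centers):
    """Step 2b: each new center is the mean of the pixels assigned to it."""
    k, dim = len(centers), len(pixels[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, l in zip(pixels, labels):
        counts[l] += 1
        for d in range(dim):
            sums[l][d] += p[d]
    # Assumption: an empty cluster keeps its previous center.
    return [tuple(s / counts[i] for s in sums[i]) if counts[i] else tuple(centers[i])
            for i in range(k)]

def kmeans(pixels, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose K initial cluster centers randomly (here: K distinct pixels).
    centers = [tuple(p) for p in rng.sample(pixels, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign every pixel to its nearest center.
        new_labels = [min(range(k), key=lambda i: squared_distance(p, centers[i]))
                      for p in pixels]
        # Step 3: stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_centers(pixels, labels, centers)
    return labels, centers, objective(pixels, labels, centers)
```

Because the result depends on the random initial centers, running `kmeans` with several seeds and keeping the run with the lowest returned J(V) is one simple way to reduce, though not remove, the local minimum problem described above.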

Simulated Annealing: basic principles
Simulated annealing (SA), first described by Metropolis et al. (1953) and independently introduced by Kirkpatrick et al. (1983), originated from an analogy between the physical annealing processes of solids and large combinatorial optimization problems (Černý, 1985). In the physical annealing process, if the temperature decreases sufficiently slowly, the ground state (i.e., the minimal energy state of particles of a solid that are arranged in a highly structured lattice) can be found. The physical annealing process has been simulated in combinatorial optimization problems to search for a globally optimal configuration in order to minimize a predefined cost function. Clustering, or unsupervised classification, is one type of combinatorial optimization problem. A number of different clustering approaches have been used to perform LU/LC classifications using remotely sensed data, but were found to suffer from the local minimum problem (Hegarat-Mascle et al., 1996; Klein and Dubes, 1989; Kolonko and Tran, 1997). SA has long been successfully used in various clustering applications (Brown and Huntley, 1992; Khorram et al., 2011; McErlean et al., 1990; Selim and Alsultan, 1991). Because SA is able to overcome the local minimum problem (Dai and Khorram, 1999; Geman and Geman, 1984; Pang, Chen and Chen, 2006; Yuan, Van Der Wiele, and Khorram, 2009), it is worthwhile to develop and investigate SA-based classification systems for LU/LC classification using remotely sensed data.
Mathematically, SA can be modeled by means of a Markov chain (Feller, 1950). The basic procedure involves a cooling schedule, in which the temperature parameter T starts out sufficiently high and is gradually lowered according to a given schedule to minimize the energy or cost function associated with a specific problem formulation. At each temperature T, a small, randomly generated perturbation is repetitively applied until the system reaches thermal equilibrium. Then the algorithm moves to the next temperature in the given schedule. The rule for accepting a perturbation is based upon the Metropolis criterion (Metropolis et al., 1953): 1) if $\Delta E$, the energy difference before and after the perturbation, is less than 0, then the perturbation is accepted with probability 1 and the process is continued with the perturbed state; 2) otherwise, the perturbation is accepted with the probability $\exp(-\Delta E / T)$. As a result of a properly designed cooling schedule, the algorithm eventually evolves into a stable state with minimal energy based on the Boltzmann distribution (Geman and Geman, 1984; Laarhoven, 1988).
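The Metropolis acceptance rule described above can be written in a few lines. This is a minimal sketch under stated assumptions: the function name `metropolis_accept` and its arguments are hypothetical, the temperature is assumed to be strictly positive, and the acceptance probability for an energy increase is taken to be exp(-delta_e / temperature), the standard form of the criterion.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random):
    """Return True if a perturbation with energy change delta_e is accepted.

    Assumes temperature > 0. A decrease in energy is always accepted;
    an increase is accepted with probability exp(-delta_e / temperature).
    """
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

At a high temperature almost every uphill move passes this test, which lets the search leave a local minimum; as the temperature falls, the rule behaves more and more like a strict descent.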

Adaptation of simulated annealing for clustering issues
To adapt SA to clustering problems, the energy function is defined by Equation 1. We started with a high controlling temperature, denoted as $T_0$, and an arbitrary initial clustering assignment. For a given randomly selected pattern, $y_j$, we reassigned it from its previous cluster m to another randomly selected cluster n based on a uniform probability. The squared Euclidean distances $d_m$ and $d_n$ of the selected pattern from the cluster centers $v_m$ and $v_n$, and $\Delta d = d_n - d_m$, the distance change between $d_m$ and $d_n$, were respectively computed. We accepted the new assignment either with a distance decrease, $\Delta d < 0$, or with a distance increase according to a positive probability $\exp(-\Delta d / T)$, where T is the temperature at this state. Each trial of generating and accepting a reassignment of a randomly selected pattern was considered a "transition". The adapted SA algorithm (Figure 2) is denoted as the single SA-based (S-SA) algorithm in the following sections.
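A single "transition" of the kind described above might look like the following sketch. It is illustrative only: the names `sqdist` and `propose_transition` are hypothetical, at least two clusters are assumed, the cluster centers are assumed to stay fixed during the transition (the text does not say when centers are recomputed), and the acceptance probability for a distance increase is taken as exp(-delta / temperature).

```python
import math
import random

def sqdist(a, b):
    """Squared Euclidean distance between two band-value tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def propose_transition(pixels, labels, centers, temperature, rng):
    """Try to move one randomly chosen pixel from cluster m to a random cluster n.

    Modifies `labels` in place and returns True if the move was accepted.
    Assumes temperature > 0 and len(centers) >= 2.
    """
    j = rng.randrange(len(pixels))
    m = labels[j]
    # Pick the target cluster uniformly among the other clusters.
    n = rng.choice([c for c in range(len(centers)) if c != m])
    d_m = sqdist(pixels[j], centers[m])
    d_n = sqdist(pixels[j], centers[n])
    delta = d_n - d_m
    if delta < 0 or rng.random() < math.exp(-delta / temperature):
        labels[j] = n
        return True
    return False
```

Calling `propose_transition` repeatedly at one temperature, then lowering the temperature and repeating, gives the overall shape of an S-SA run.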

Definition of parameters in simulated annealing
The SA algorithm is quite straightforward to implement. However, the quality of the outcome depends on the convergence of the algorithm, which is governed by the cooling schedule. Cooling schedule parameters should be chosen so as to "imitate the asymptotic behavior of the homogeneous algorithm in polynomial time, thereby removing any guarantees with respect to the optimality of the configuration returned by the algorithm" (Laarhoven, 1988). The choice of cooling schedule can be difficult because the "best" choice will be different for each application. The search for an adequate cooling schedule has long been a predominant research theme and is still a work in progress (Aarts and Korst, 1989; Aarts and Laarhoven, 1987; Chainate et al., 2008; Collins et al., 1987; Laarhoven, 1988; Prajapati et al., 2009). Accordingly, the following parameters were specified and chosen for their relative simplicity and expedient computation:


The initial value of the temperature, $T_0$. The selected initial value should be sufficiently large such that virtually all transitions can be accepted. Several researchers used the average difference in cost of subsequent configurations occurring in a Markov chain (Leong, 1985; White, 1984). Some smaller candidates of $T_0$ might be applicable in simple clustering applications.


The decrement factor for decreasing the controlling parameter $T_n$, where n is the iteration number. To approximate the global optimal solution, the controlling temperature must decrease sufficiently slowly. Convergence in probability to the global minimum has only been proven for the logarithmic annealing schedule (Geman and Geman, 1984): $T_n = c / \log(1 + n)$, where c is a constant. Although this inverse-log schedule guarantees global convergence for a wide class of combinatorial problems, it is far too slow for almost any practical application. The geometric function of temperature, $T_{n+1} = \alpha T_n$, is used, where $\alpha$, the decrement factor, is a constant smaller than but close to 1, typically between 0.80 and 0.99. This decreasing function has been proposed in many SA research works and proven to be efficient and applicable (Aarts and Korst, 1989; Burkard and Rendl, 1984; Hegarat-Mascle and Olivier, 1996; Laarhoven, 1988; Lundy and Mees, 1986).


The final value of the temperature, $T_{final}$. This value is used to determine when to discontinue the annealing process. In our study, we set $T_{final}$ as a number approaching zero. Via an appropriate cooling schedule, while $T_n$ is approaching $T_{final}$, a configuration with minimal energy should be targeted.


A finite number of accepted transitions at each temperature $T_n$. This factor determines whether the thermal equilibrium at each temperature, a critical prerequisite for the globally optimum approximation, can be restored. To reduce the computational complexity in this study, this factor is specified by two parameters. The first parameter is the iterations, or the number of times an image is scanned at each temperature. The second parameter is the generation probability, which is used to randomly select a number of pixels per image. For each pixel during each image scan, a random probability is drawn from a uniform probability distribution. If the probability is larger than the predefined generation probability, this pixel will be reassigned to another randomly selected cluster; otherwise the pixel will be passed. The acceptance of the new reassignment depends on the evaluation of the incurred energy change before and after this reassignment. The iterations at each temperature and the generation probability together determine the length of the homogeneous Markov chain at each $T_n$.
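The four cooling schedule parameters listed above can be tied together in a short sketch. The names `geometric_schedule` and `select_pixels` are hypothetical, and the example values are chosen only to make the arithmetic easy to follow; they are not the values used in the study.

```python
import random

def geometric_schedule(t0, alpha, t_final):
    """Yield temperatures t0, alpha*t0, alpha^2*t0, ... while they exceed t_final.

    Assumes 0 < alpha < 1 and t_final > 0, so the loop terminates.
    """
    t = t0
    while t > t_final:
        yield t
        t *= alpha

def select_pixels(n_pixels, generation_probability, rng):
    """Indices of pixels proposed for reassignment in one image scan.

    A pixel is proposed when its uniform draw is larger than the generation
    probability, so a lower generation probability proposes more pixels.
    """
    return [j for j in range(n_pixels) if rng.random() > generation_probability]

# Example: starting at 100 with a decrement factor of 0.5 and stopping at 10.
levels = list(geometric_schedule(100, 0.5, 10))  # [100, 50.0, 25.0, 12.5]
```

Under these assumptions, the expected number of proposed reassignments at one temperature is roughly the iterations at each temperature times the number of pixels times (1 minus the generation probability), which is why those two parameters jointly set the Markov chain length.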

Integration of the SA and K-means for clustering refinement
Although straightforward to implement, SA is computationally demanding, which has been one of the primary barriers to its widespread use in remote sensing applications involving large data sets. It is therefore necessary to investigate ways of implementing SA more efficiently. We believe that if the initial clustering is as close to the optimal clustering as possible, then the search for the global optimum will be both more expeditious and more reliable. We hypothesize that if the SA starts with an initialization close to the final optimal solution, a small initial temperature value and a small number of perturbed pixels might be sufficient to find the global or near-global optimum with a more rapid cooling schedule. Although K-means may not be able to find the global optimum for a specified problem, the resulting clustering solution is likely to be closer to the global optimum than a random initialization. Based on the assumptions above, we proposed an integrated algorithm of SA and K-means called the integrated SA-based (I-SA) system. In this I-SA system, the best result from K-means was used to initialize the cluster centers in SA, and then the SA algorithm was implemented using a faster cooling schedule, by which an improved clustering solution might be found more efficiently. To some degree, the I-SA system retains the random search capability of SA, thereby reducing the chance of committing to undesirable local minima.
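The I-SA idea described above (K-means first, then SA with a faster cooling schedule) can be sketched end to end as follows. This is a hedged illustration, not the authors' system: all function names and default parameter values are hypothetical, centers are assumed to be recomputed after every image scan, and the sketch keeps the lowest-energy labeling seen during annealing, which is an added safeguard rather than something stated in the text.

```python
import math
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centers_of(pixels, labels, old_centers):
    """Mean of each cluster; an empty cluster keeps its old center (assumption)."""
    k, dim = len(old_centers), len(pixels[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, l in zip(pixels, labels):
        counts[l] += 1
        for d in range(dim):
            sums[l][d] += p[d]
    return [tuple(s / counts[i] for s in sums[i]) if counts[i] else old_centers[i]
            for i in range(k)]

def energy(pixels, labels, centers):
    """J(V) for the current labeling."""
    return sum(sqdist(p, centers[l]) for p, l in zip(pixels, labels))

def kmeans_labels(pixels, centers, max_iter=50):
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))
               for p in pixels]
        if new == labels:
            break
        labels = new
        centers = centers_of(pixels, labels, centers)
    return labels, centers

def integrated_sa(pixels, k, t0=1.0, alpha=0.8, t_final=1e-3, iet=2, gp=0.9, seed=0):
    """Return (energy after K-means, best energy found, best labels)."""
    rng = random.Random(seed)
    init = [tuple(p) for p in rng.sample(pixels, k)]
    labels, centers = kmeans_labels(pixels, init)      # K-means initialization
    initial = energy(pixels, labels, centers)
    best_e, best_labels = initial, list(labels)
    t = t0                                             # small initial temperature
    while t > t_final:
        for _ in range(iet):                           # image scans at this temperature
            for j, p in enumerate(pixels):
                if rng.random() <= gp:                 # pixel is passed
                    continue
                m = labels[j]
                n = rng.choice([c for c in range(k) if c != m])
                delta = sqdist(p, centers[n]) - sqdist(p, centers[m])
                if delta < 0 or rng.random() < math.exp(-delta / t):
                    labels[j] = n
            centers = centers_of(pixels, labels, centers)
            e = energy(pixels, labels, centers)
            if e < best_e:
                best_e, best_labels = e, list(labels)
        t *= alpha                                     # geometric cooling
    return initial, best_e, best_labels
```

Because the best labeling is tracked, the returned energy is never worse than the K-means starting point; whether it is actually better depends on the data and on the chosen parameters.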

Description of data sets and classification scheme
We developed three unsupervised classification systems based on the K-means algorithm, the single SA algorithm, and the I-SA algorithm. These systems were applied to classify two remotely sensed Landsat TM images. Landsat TM data have been successfully utilized for LU/LC mapping applications (Bai et al., 2009; Dai and Khorram, 1999; Khorram et al., 1987; Vogel and Strohbach, 2009). The first TM image, shown in Figure 3, covers an area of 2,856.15 ha. The study area was located near the Raleigh-Durham International Airport in North Carolina [USA], an area that is familiar, less complex, and close to the researchers. The TM image shown in Figure 4 covers 7,831.89 ha and is located in Georgia [USA] near Macon. For convenience in the following sections, the first TM image is denoted as TM-1 and the second one as TM-2.
Both of the TM images were analyzed visually with the assistance of aerial photos and available a priori knowledge of the study areas. Using the reference information from each specific study area, the classification scheme for each application was initially developed on the basis of the standard land use/land cover classification scheme proposed by Anderson et al. (1976). As an additional step, classification schemes were adapted depending on the image complexity and separation capability so that the spectrally derived clusters could be easily translated into real land cover classes. From the standard false color composite TM-1 image, it was easy to determine that five land cover classes were dominant: urban, mixed forest, evergreen forest, grassland/agriculture, and water. The TM-2 image was much more heterogeneous than TM-1. Seven classes were eventually derived: urban/residential, evergreen forest, deciduous forest, mixed forest, bare soil or fallow crop field, transitional area or grassland, and water.
To reduce the computational complexity, only three bands were used for classification in each application. Jensen (1996) indicated that either the combination of band 2 (Reflected Green Band), band 3 (Reflected Red Band), and band 4 (Reflected Near-infrared Band) or the combination of band 3, band 4, and band 5 (Reflected Middle-infrared Band) is the optimal three-band combination for land cover classification. In TM-1, band 2, band 3, and band 4 were used to classify the image, while band 3, band 4, and band 5 were used in TM-2. These bands were selected because of the reflective properties of the features and the land use/land cover characteristics of the study areas.
The classification applications using TM-1 and TM-2 were examined to evaluate the wide applicability and potential of the SA-based systems and their reliability in applications with varying classification complexities. To evaluate performance, the resulting classifications were first assessed by their mathematical performance index J(V). Based on the aforementioned clustering formulation, our objective was to minimize the sum of squared Euclidean distances J(V). The classified map with the lowest J(V) was considered more accurate than the others. To confirm that the lowest J(V) produced the more accurate results, the classification results were selected from the algorithms and tested against reference data that were manually derived from aerial photos, the original TM images, and other ancillary data.

Analysis of the selection and roles of parameters in SA
A large number of experiments using various parameter candidates were conducted by applying the S-SA system to image TM-1 in order to assess the role of the aforementioned parameters. A range of candidates was used for each parameter. To visualize the effects of these parameters, several groups of experimental data for each tested parameter were selected to compare the resulting J(V) and the required computational time. Each group of parameter combinations was run with four random initializations. In the following figures, J(V) and computational time for each group are the averages of these four instances. Within each selected data group, only the studied parameter was allowed to change within the selected range while all other parameters were fixed. For example, in Figure 5, each data group represents a randomly selected data series in which $T_0$ ranges from 10 to 10,000 while the other three parameters are fixed. The selected groups of data are graphed as the studied parameter versus J(V) in Figures 5 through 8; the studied parameter versus computational time (in seconds) is depicted in Figures 9 through 12.

1) Effect of the initial value of temperature and the decrement factor
In this study, the largest $T_0$ candidate (10,000) was chosen to ensure that all transitions would be accepted at the beginning of clustering, while the lowest candidate (10) can only accept a small number of transitions. The decrement factor was used to decrease the temperature in the cooling schedule by multiplying it with the current temperature. The SA implementation required that the temperature decrease sufficiently slowly, but there was no criterion to specify precisely how slow the decrease ought to be. Although Laarhoven (1988) and Aarts & Korst (1989) suggest that $T_0$ should be large enough that virtually all transitions can be accepted, there was no clear evidence from Figure 5 to indicate that the magnitude of $T_0$ made a significant difference in this clustering application in terms of reducing J(V). Lower initial values such as 10 or 100 may also be applicable. This might be attributed to the selected candidates being too high for this particular application. From Figure 6, we observed that the slowest decrement factor (0.99) had the most stable performance, but there was only a small performance improvement among the decrement factor candidates.

2) Effect of the iterations at each temperature and generation probability
In this study, the iterations at each temperature (IET) and the generation probability (GP) were combined to determine the length of the homogeneous Markov chain at each temperature. A finite number of accepted transitions (i.e., reassignments of class labels) at each temperature was essential to the convergence of SA. The GP was a uniform probability between 0.0 and 1.0. The larger the IET and the smaller the GP, the more transitions were attempted and, of course, the greater the likelihood that more transitions would be accepted at each temperature. From the behaviors of the five groups of data in Figure 7, it was found that there were no significant differences among the four IET candidates in any group except Group I. Figure 8 illustrates the effect of the GP on minimizing J(V). The lowest J(V) values in four groups were obtained with a GP of 0.85, while in the other two groups the lowest J(V) values were obtained with a GP of 0.90.
3) Effect of parameters on Computational Time (CT)
The effects of the parameters on CT were studied as well using the same graphical method. Figures 9 through 12 illustrate that CT increases when $T_0$, $\alpha$, and IET increase, and when GP decreases. There is a compromise between computational time and classification performance. Among the parameters, $T_0$ and $\alpha$ were the factors that escalated CT most significantly.
In this specific application, the magnitude of the initial temperature did not have a significant effect on the clustering performance. The slowest decrement factor candidate (0.99) in our study performed better than the other candidates, but not significantly. From the graphical analysis above, the appropriate selection of the GP was comparatively the most important among the four studied parameters, which verified that a certain number of accepted transitions is one of the essential conditions for successful SA implementations. However, the search for appropriate parameters is specific to each individual application and is certainly related to the structure and cluster numbers of the classified data. Based on our parameter analysis, we conclude that the choice of a single parameter is not vital for a successful implementation. The optimal and efficient implementation depends on an optimal combination of these four parameters. We recommend that future investigators expand on this optimization, which was beyond the scope of this project.

Classification Results and Analysis
The classification results of the three clustering algorithms using the two TM images, TM-1 and TM-2, are presented in this section. A number of experiments were performed using either different initializations in K-means or different parameter combinations in the two SA-based algorithms. The classification result with the lowest J(V) from each algorithm was selected for further accuracy assessment.
The classified maps generated from TM-1 are shown in Figure 13. The SA-related parameters and their performance in terms of J(V) and computational time are shown in Table 1. The I-SA algorithm obtained the best classification performance in TM-1. Both SA-based algorithms performed better than K-means in this application. However, K-means was the most computationally efficient algorithm. To validate the three classified maps in terms of land-cover/land-use classification accuracy, 253 sample points were randomly selected and visually interpreted based on the reference data derived from false color composite TM-1 imagery and aerial photos. The resulting error matrices are presented in Tables 2 through 4.
Among the three classified maps, the highest overall accuracy (91.3%) was obtained by the I-SA algorithm. The Kappa analysis was used to determine whether one of the derived error matrices was statistically significantly different from another (Bishop et al., 1975). The KHAT values shown in Table 5 demonstrate that there is strong agreement between the classification results and the reference data for all three error matrices. From the Z-tests shown in Table 6, it can be concluded that at the 90% confidence level, the I-SA algorithm is significantly better than K-means, while there is no significant difference between S-SA and the other two classification systems.
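The overall accuracy and KHAT statistic used above can be computed directly from an error matrix. The sketch below uses the standard KHAT estimator; the function names are hypothetical, the 2-by-2 matrix is made up purely for illustration and is not taken from the tables of this study, and the Z statistic assumes that the variance of each KHAT value has already been estimated separately.

```python
import math

def overall_accuracy(matrix):
    """Fraction of sample points on the diagonal of the error matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def khat(matrix):
    """KHAT = (N * sum(x_ii) - sum(x_i+ * x_+i)) / (N^2 - sum(x_i+ * x_+i))."""
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n * diagonal - chance) / (n * n - chance)

def z_statistic(khat_1, var_1, khat_2, var_2):
    """Pairwise test statistic for two independent KHAT values with known variances."""
    return abs(khat_1 - khat_2) / math.sqrt(var_1 + var_2)

# Hypothetical two-class error matrix (rows: classified, columns: reference).
example = [[45, 5],
           [10, 40]]
accuracy = overall_accuracy(example)  # 0.85
kappa = khat(example)                 # approximately 0.70
```

For a two-sided test at the 90% confidence level, a commonly used critical value is 1.645, so two error matrices would be judged significantly different when `z_statistic` exceeds that value.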
Figure 14 depicts the classified maps generated using TM-2. Their SA-related parameters and performance indexes are presented in Table 7. To conduct the accuracy assessments, 299 sample points were randomly selected from TM-2 and visually interpreted. The resulting error matrices are presented in Tables 8 through 10. The highest overall accuracy, 75.59%, was obtained by the S-SA algorithm. The Kappa analysis shown in Tables 11 and 12 demonstrates that the agreement between the classification results and the reference data in TM-2 is moderate, and that the S-SA algorithm significantly improved the classification of TM-2 as compared to the K-means and the I-SA algorithms. However, unlike with TM-1, the performance of the I-SA algorithm in TM-2 is poorer than that of the other two.
The results show that the classification performances of the three clustering algorithms using TM-1 and TM-2 differ in several aspects. First, the classification accuracy in TM-1 was relatively high and satisfactory while the classification accuracy in TM-2 was moderate. Second, the I-SA algorithm had the best classification performance in TM-1, while its performance in TM-2 was poor, with the S-SA algorithm producing the best classification. Third, the resulting SA-related classification improvement in TM-2 was more significant than that in TM-1. We believe that these differences may be due to differences in the complexity of data structures and distributions between TM-1 and TM-2. Visually, we observed that the landscape condition in TM-1 was less complex than in TM-2. Five typical land cover classes can be easily differentiated. The distribution plots shown in Figure 15 further demonstrate that the five classes in TM-1 are spectrally compact and well-separated. The comparatively homogeneous study area reduced the classification complexity and contributed to the relatively high classification accuracy in TM-1. The landscape conditions in TM-2 were more complex than in TM-1 because the study area was more heterogeneous and contained more LU/LC classes. The spectral complexity in TM-2 is demonstrated by the more scattered distribution among the seven classes shown in Figure 16.
The increase in spectral complexity explains why the accuracy of the resulting classification of TM-2 was lower than that of TM-1. In a more complex classification case like TM-2, the classifier often suffers from more local minima than in a less complex case. This is one of the reasons why the SA-based algorithm improved the classification accuracy more significantly in TM-2 than in TM-1. For the same reason, the I-SA algorithm did not perform as well in TM-2. In a complex classification application, $T_0$ and the number of transitions at each temperature must be specified sufficiently large to help the algorithm escape the many local minima.
In both cases (TM-1 and TM-2), the I-SA or the S-SA improved the classification as compared to K-means. This verified our assumption that the SA-based algorithms are able to overcome the local minimum limitations of K-means and improve the classification accuracy. Furthermore, the K-means algorithm is too restrictive to be optimized. The experimental results and analysis described above lead to the conclusion that SA-based classification systems are applicable for land cover classification using remotely sensed data and can significantly improve the classification accuracy when applied to relatively complex classification applications.

Discussions and Conclusion
In this study, algorithms and methodologies of SA, originating from the analogy between the physical annealing process of a solid and combinatorial optimization problems, were investigated and adapted to solve the local minimum problem in unsupervised LU/LC classification, thereby improving classification accuracy. Two SA-based unsupervised classification systems were developed. Their applicability and implementations for land use and land cover classifications were studied using two remotely sensed images, TM-1 and TM-2, and their classification performances were compared with the classical clustering algorithm K-means. Based on the experimental results and analysis, two factors were found to be essential for the success of the SA-based systems: the complexity of the application and the selection of appropriate controlling parameters. The nature of unsupervised classification is to establish decision boundaries based on the unlabeled data by optimizing certain error criteria so as to naturally group the data into clusters. Decision boundaries are often difficult to find in many applications because the search is often complex and suffers from local minima. The incorporation of the SA procedure proves to be helpful in reducing the adverse effects of these local minima. The local minimum problem may be worse for complex applications because they may suffer from more local minima than less complex applications. Our experimental analysis suggests that, for simpler applications, the I-SA system is a good option because it is likely to produce a better classification more efficiently. However, for more intricate applications (e.g., when more land classes are desired or the landscape conditions are very heterogeneous), the I-SA system may not produce improved results, in which case the S-SA system should be used.
Theoretically, SA has been proven to be able to provide the global optimal solution in clustering problems, albeit with a long run time. In practice, the globally optimizing nature of SA can be taken advantage of in order to improve the classification accuracy and reduce the uncertainties in land cover classification. This can be accomplished with appropriate selections of initial temperature and cooling schedule. K-means can be implemented efficiently and produce satisfactory results if classified patterns contain well-separated clusters such as hyper-spectral data. The performance of SA-based systems is sensitive to the choices of controlling parameters. If appropriate parameters are chosen, the resultant classification can be improved significantly. The effective implementation of SA-based systems in classification problems is still a substantial issue for which there is no definitive analytical solution to date. Based on our findings, we reached the following conclusions:
1) The selection of each individual parameter in the cooling schedule may not be as critical. It may be more important to select an optimal parameter combination for accurate and efficient classification using SA. The speed of the cooling schedule should be slowed when the application complexity increases.
2) The initial value of temperature $T_0$ should be sufficiently large, but some lower $T_0$, which may not allow all transitions to be accepted, may also be applicable in less complex applications. As application complexity increases, $T_0$ must also be increased.
3) The decrement factor plays an important role in SA. The decreasing speed must be slow enough that thermal equilibrium can be reached at each temperature. Based on our study, candidates of at least 0.80 are highly recommended.
4) The iterations at each temperature (IET) and the generation probability (GP) determine the length of the Markov chain at each temperature. That is, a greater IET and a lower GP result in a longer Markov chain. There is a trade-off between the size of the temperature decrements and the Markov chain length. One could either choose a smaller $\alpha$ with greater IET and lower GP, or a larger $\alpha$ with smaller IET and larger GP.

Recommendations for Future Research
As we indicated in part of our previous findings (Khorram et al., 2011), SA-based systems have been studied for their potential for improving classification accuracies in LU/LC classification applications using Landsat TM images. To test the general applicability and robustness of SA-based classification systems for LU/LC mapping using remotely sensed data, it is necessary to apply the SA-based classification systems to various LU/LC classification schemes as well as to many other remotely sensed data. Datasets such as SPOT should be tested for similar applications. More importantly, we believe that SA classification techniques could be very useful when applied to high resolution satellite data such as IKONOS and QuickBird. There may be merit in testing the applicability of SA to hyper-spectral data. Additional classification applications would be valuable to provide more evidence of the potential of SA-based classification systems to improve classification accuracy as compared to many traditional classification approaches suffering from the local minimum problem.
The main advantages of SA are its potential to find near-optimal solutions and its ease of implementation. The disadvantage is the potentially large amount of time required for its application in remote sensing. It is necessary to investigate possibilities of speeding up the SA-based algorithm, such as the design of a faster sequential algorithm or a parallel algorithm. However, rapidly growing and available computing power will lessen the computational problem and facilitate the use of SA in remote sensing applications. The potential of approximating a global optimum using SA-based approaches is very attractive, and it is worthwhile to explore other possible ways to combine the SA idea with other classification approaches that suffer from the local minimum problem.

Finally, it is recommended that multiple study sites with different complexities and landscape characteristics be studied on two dates in a year (growing season and leaf-off season), which could eliminate some of the issues we encountered, such as the complexity of the landscape and the seasonal variability of the vegetative cover.

Figure 1. The flowchart of the K-means clustering algorithm. [Flowchart text: 1. Assign the cluster label to every pixel arbitrarily. 2. it = 0 (it corresponds to iterations). Cluster assignment: 1. For each pixel $y_j$, calculate the Euclidean distances of the pixel to each of the cluster centers. 2. Assign the pixel $y_j$ to the nearest cluster in terms of the Euclidean distance.]

Figure 2. The adapted single SA-based clustering algorithm.

Table 1. SA-related parameters and performance indexes from TM-1

Table 3. Error matrix on the classified result of TM-1 from the single SA-based algorithm

Table 4. Error matrix on the classified result of TM-1 from the integrated SA-based algorithm

Table 8. Error matrix on the classified result of TM-2 from the K-means, where C1 = Urban/Residential, C2 =