Artificial Bee Colony based Data Mining Algorithms for Classification Tasks

The Artificial Bee Colony (ABC) algorithm is a relatively new method that is widely used to search for optimum solutions. Its distinguishing feature is that the solution to a problem emerges from the intelligent behaviour of honeybee swarms. This paper proposes the ABC algorithm as a new tool for Data Mining, particularly for classification tasks. The proposed ABC for Data Mining was implemented and tested against six traditional classification algorithms. Its performance was evaluated experimentally on UCI datasets, and the results show that ABC is a suitable candidate for classification tasks. They indicate that the ABC algorithm is competitive, not only with other evolutionary techniques, but also with industry-standard algorithms such as PART, SOM, Naive Bayes, Classification Tree and Nearest Neighbour (kNN), and that it can be successfully applied to more demanding problem domains.


Introduction
As the size of digital information grows exponentially, knowledge needs to be extracted from large volumes of raw data. Nowadays, there are several methods to customise and manipulate data according to our needs. The most common method is Data Mining (DM). DM has been used in previous years for extracting implicit, valid, and potentially useful knowledge from large volumes of raw data (Sousa, Silva, & Neves). The extracted knowledge must be accurate, readable, comprehensible, and easy to understand. The process of data mining is also called the process of knowledge discovery, and it has been used in many new inter-disciplinary areas such as databases, artificial intelligence, statistics, visualization, parallel computing and other fields (Wang & Huang, 2007). One new and extremely powerful family of algorithms used in DM is Evolutionary Algorithms (EA) (Tan, Teoh, Yu, & Goh, 2009; Whitley, 2001). This family includes biology-inspired algorithms such as Genetic Algorithms (GA) (Sumida, Houston, McNamara, & Hamilton, 1990) and Differential Evolution Algorithms (DE) (Storn & Price, 1997), and swarm-based approaches such as Ant Colonies (Colorni, Dorigo, & Maniezzo, 1991) and Particle Swarm Optimization (PSO) (Kennedy, 2006). In this paper, we propose a framework for using the Artificial Bee Colony (ABC) algorithm (D. Karaboga, 2005) as a classifier in DM. We have implemented the ABC algorithm and evaluated its performance by comparing it with other common classifiers such as PART, DPSO, SOM, Naive Bayes, Classification Tree and Nearest Neighbour (kNN).
The rest of this paper is organized as follows. Section 2 presents a brief overview of previous work on optimization algorithms related to Data Mining. Section 3 describes the methodology of ABC from an optimization perspective. Sections 4 and 5 describe the proposed ABC for Data Mining and present the experimental setup and results. Conclusions are given in Section 6.

Related Work
Many approaches, methods and goals have been tried out for DM. One approach is to use GAs, which have been applied widely to data mining for classification. For example, a GA has been used for the oil collecting vehicle routing problem, where local search and DM modules are inserted in order to discover good features in the best solutions found so far and to apply them in the generation of new solutions (Santos, Ochi, Marinho, & Drummond, 2006). As another example, Sikora and Piramuthu proposed performing mining and feature selection simultaneously by evolving a binary code alongside the chromosome structure used for evolving the rules (Sikora & Piramuthu, 2007).
Another new evolutionary method used in DM is the Particle Swarm Optimizer (PSO) (Kennedy, 2006). PSO is an evolutionary algorithm that simulates the coordinated motion of flocks of birds. Sousa, Silva, and Neves (Sousa, et al., 2004) proposed the use of PSO for data mining and showed that PSO can carry out the rule discovery process. PSO has since evolved in DM, where it can reduce complexity and speed up the data mining process. For instance, Yeh, Chang and Chung (Yeh, Chang, & Chung, 2009) applied a hybrid approach for mining breast cancer patterns using PSO. In this approach, each particle is coded in positive integer numbers and has a feasible system structure. The DPSO proposed by Chen and Hsu (Chen & Hsu, 2006) as a data mining algorithm can overcome the drawbacks of GAs.
In addition, Ant Colony Optimization (ACO) has been widely used in DM. ACO belongs to a recently developed branch of artificial intelligence called Swarm Intelligence (SI), and it is inspired by the social behaviour of ant colonies. Marco Dorigo and colleagues introduced the first ACO algorithms in the early 1990s (Blum, 2005). An example of an ACO application is the prediction of protein activity proposed by Nemati et al. (Nemati, Basiri, Ghasem-Aghaee, & Aghdam, 2009). Their approach combines genetic algorithms (GA) and ant colony optimization (ACO) to predict the postsynaptic activity of proteins. The hybrid algorithm takes advantage of both methods, which complement each other in feature selection for protein function prediction. Furthermore, Nicholas and Alex (Nicholas & Alex, 2008) proposed a hybrid PSO/ACO algorithm for discovering classification rules. In PSO/ACO, the rule discovery process is divided into two separate phases. In the first phase, ACO discovers a rule containing nominal attributes only. In the second phase, PSO takes this rule and potentially extends it with continuous attributes.
From the above, we found that many optimization algorithms have been used for classification tasks. To the best of our knowledge, previous research on the ABC algorithm has focused only on optimization, and none of it addresses classification tasks. Therefore, this paper is the first to apply the ABC algorithm to classification applications.

Artificial Bee Colony Algorithm (ABC)
The ABC algorithm is a new swarm intelligence algorithm, which was proposed by Karaboga at Erciyes University, Turkey, in 2005 (D. Karaboga, 2005). Since the ABC algorithm is simple in concept, easy to implement, and has few control parameters, it has been widely used in many optimization applications such as protein tertiary structures (H. A. A. Bahamish, R. Abdullah, & R. A. Salam, 2009), digital IIR filters (N. Karaboga, 2009), artificial neural networks (D. Karaboga & Akay, 2005) and others.
The minimal model of forage selection that leads to the emergence of collective intelligence in honey bee swarms consists of three essential components: food sources, employed foragers and unemployed foragers. There are two basic behaviours: recruitment to a food source and abandonment of a food source (D. Karaboga, 2005).
1) Food sources: a food source represents the position of a solution of the optimization problem, and the profitability of the food source is expressed as the fitness of the solution.
2) Unemployed foragers: there are two types, scouts and onlookers. Their main task is exploring and exploiting food sources. At the beginning, an unemployed forager has two choices: (i) it becomes a scout and randomly searches for new food sources around the nest; or (ii) it becomes an onlooker, determines the nectar amount of a food source after watching the waggle dances of the employed bees, and selects a food source according to its profitability.
3) Employed foragers: the honeybees that have found a food source are known as employed bees, and their number is equal to the number of food sources. The employed bees store the food source information and share it with others according to a certain probability. An employed bee becomes a scout when its food source has been exhausted.
Two equations support the algorithm. The first gives the probability with which an onlooker bee selects a food source:

$$P_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n}$$ Eq. (1.1)

where $P_i$ is the probability value associated with the $i$th food source. In this equation, $fit_i$ represents the nectar amount of the $i$th food source, which is measured by the employed bees, and SN is the number of food sources, which is equal to the number of employed bees. (In real-world problems, SN usually represents the number of possible solutions.)

In the ABC algorithm, the artificial bees also need to perform a local search to find new possible solutions. Eq. 1.2 shows the local search strategy of the original ABC optimization algorithm:

$$V_{ij} = X_{ij} + \Phi_{ij}(X_{ij} - X_{kj})$$ Eq. (1.2)

where $V_{ij}$ is the new candidate food position, and $k \in \{1, 2, \ldots, SN\}$ and $j \in \{1, 2, \ldots, D\}$ are randomly chosen indices, with $k$ different from $i$. $X_{ij}$ and $X_{kj}$ thus represent two different old food source positions, and the difference between them is the distance from one food source to the other. SN is the number of employed bees, as described previously, and D is the number of optimization parameters. $\Phi_{ij}$ is a random number in [-1, 1] that controls the distance of a neighbour food source position around $X_{ij}$.
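The two equations of the original ABC algorithm can be sketched in C as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the random numbers are passed in as arguments so that the functions stay deterministic, and the roulette-wheel choice in `pick_source` is an assumed way of "relying on" the probabilities of Eq. 1.1.

```c
#include <assert.h>
#include <stddef.h>

/* Eq. 1.1: prob[i] = fit[i] / sum of all fitness values.
   Assumes every fitness is non-negative and at least one is positive. */
void selection_probabilities(const double *fit, size_t sn, double *prob)
{
    double total = 0.0;
    for (size_t i = 0; i < sn; i++) {
        assert(fit[i] >= 0.0);
        total += fit[i];
    }
    assert(total > 0.0);
    for (size_t i = 0; i < sn; i++)
        prob[i] = fit[i] / total;
}

/* ASSUMPTION: roulette-wheel choice of a food source.
   r is a uniform random number in [0, 1) supplied by the caller. */
size_t pick_source(const double *prob, size_t sn, double r)
{
    assert(sn > 0);
    double cumulative = 0.0;
    for (size_t i = 0; i < sn; i++) {
        cumulative += prob[i];
        if (r < cumulative)
            return i;
    }
    return sn - 1; /* guard against floating-point rounding */
}

/* Eq. 1.2 for one dimension j: v_ij = x_ij + phi * (x_ij - x_kj),
   where phi is a random number in [-1, 1] and k differs from i. */
double neighbour_position(double x_ij, double x_kj, double phi)
{
    assert(phi >= -1.0 && phi <= 1.0);
    return x_ij + phi * (x_ij - x_kj);
}
```

Passing `r` and `phi` in from the caller keeps the random number generator out of the arithmetic, which makes each formula easy to check by hand.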
The most crucial factor in ABC is greedy selection: if the new food source has equal or better nectar than the previous source, it replaces the previous one in memory; otherwise, the previous one is retained. In other words, a greedy selection mechanism is employed as the selection operation between the previous and the current food sources.
Besides the SN control parameter, the basic ABC algorithm uses two other parameters: the limit value of the food source position and the maximum number of search cycles. Users should choose these two values carefully, since large values slow down the optimization process significantly, whereas small values may prevent the algorithm from finding a sufficiently good solution.
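Greedy selection amounts to a single comparison followed by a copy. The sketch below assumes that a food source is stored as an array of `dims` doubles with a separately stored fitness; the function name and this layout are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Greedy selection: the candidate replaces the stored food source only
   if its fitness is equal or better. Returns true when it was replaced. */
bool greedy_select(double *current, double *current_fit,
                   const double *candidate, double candidate_fit,
                   size_t dims)
{
    assert(current != NULL && current_fit != NULL && candidate != NULL);
    if (candidate_fit >= *current_fit) {
        memcpy(current, candidate, dims * sizeof *current);
        *current_fit = candidate_fit;
        return true;
    }
    return false; /* the previous source is retained */
}
```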

Proposed ABC Algorithm for Data Mining
The ABC algorithm is a swarm intelligence based algorithm that simulates the foraging behaviour of a honey bee colony. Most previous research has applied the ABC algorithm to global optimization. For instance, Karaboga (N. Karaboga, 2009) used the ABC algorithm to design digital Infinite Impulse Response (IIR) filters and compared its performance with that of a conventional optimization algorithm (LSQ-nonlin) and particle swarm optimization (PSO). Bahamish et al. (H. A. A. Bahamish, R. Abdullah, & R. A. Salam, 2009) applied the ABC algorithm to search the protein conformational search space and found the lowest free energy conformation for a test protein (i.e. Met-enkephalin) using ECEPP/2 force fields. The ABC algorithm has also been applied to the leaf-constrained minimum spanning tree problem (Singh, 2009).
Moreover, Basturk and Karaboga (D. Karaboga & Basturk, 2007; D. Karaboga & Basturk, 2008) compared the performance of the ABC algorithm with that of GA, PSO and PS-EA, and with that of Differential Evolution (DE), PSO and Evolutionary Algorithm (EA), on a limited number of test problems. The ABC algorithm has also been developed further over time for solving optimization problems (D. Karaboga & Basturk, 2007). In neural networks, ABC has been used as an optimization method for training (D. Karaboga, Akay, & Ozturk, 2007). As the ABC algorithm exceeded expectations in solving optimization problems, new approaches based on the intelligent behaviour of honeybee swarms emerged (Yang, 2005; D. Karaboga & Akay, 2005). Researchers took another step by developing the ABC algorithm for the optimization of two-dimensional numeric functions (Yang, 2005). For multi-variable problems, Karaboga showed that the ABC algorithm is able to optimize multi-variable and multi-modal numerical functions (D. Karaboga, 2005).
Based on the research reviewed in this section, we found that the ABC algorithm has only been used for optimization purposes. Therefore, this paper aims to apply the ABC algorithm to classification applications. The accuracy of the ABC classifier is compared with that of other common classifiers such as PART, SOM, Naive Bayes, Classification Tree and Nearest Neighbour (kNN).
First, the main components of the proposed ABC algorithm for classification tasks are introduced. There are six main components: (1) the rule format, (2) the fitness function, (3) the Exchanged local search strategy, (4) rule discovery, (5) rule pruning and (6) the prediction strategy. These components are described in the following sections.

(1) Rule Format
In classification, a classification rule contains two parts: the antecedent and the consequent. The format is shown in Figure 1.
In Figure 1, feature 1 to feature N are all attributes of the dataset. Each attribute has a lower bound, which is the lowest value accepted by the rule, and an upper bound, which is the highest value accepted by the rule. Three other values are associated with the classification rule: the predictive class (Class X), the fitness value and the cover percentage of the rule. These three values are closely related to the fitness function and the prediction strategy, which are explained in more detail in the following sections.

(2) Fitness Function
For classification, the fitness function replaces the measurement of the nectar amount when a rule is evaluated. It is defined in Eq. 2.1 in terms of TP, FN, FP and TN, which are the numbers of True Positives, False Negatives, False Positives and True Negatives associated with the rule, respectively. Before these four values are introduced, two important concepts need to be explained:
• When the algorithm examines a record, it checks every feature in the record. If the value of a feature lies between the rule's lower bound and upper bound for this feature, the feature is covered by the rule. If all features of a record are covered by the rule, the record is covered by the rule.
• If the class of the evaluated record is equal to the class predicted by the rule, the record has the class predicted by the rule.

i. True positives (TP): the number of records covered by the rule that have the class predicted by the rule;
ii. False negatives (FN): the number of records not covered by the rule that have the class predicted by the rule;
iii. False positives (FP): the number of records covered by the rule that do not have the class predicted by the rule;
iv. True negatives (TN): the number of records not covered by the rule that do not have the class predicted by the rule.
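The coverage test and the four counts can be sketched in C as below. The `Rule` and `Confusion` types and all function names are hypothetical. The combination of the counts in `rule_fitness` is an assumption: it uses the product of sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP), a measure that is common in rule-mining work, and it may differ from the exact form of Eq. 2.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rule layout: one [low, up] interval per attribute. */
typedef struct {
    const double *low;
    const double *up;
    int predicted_class;
} Rule;

typedef struct { size_t tp, fn, fp, tn; } Confusion;

/* A record is covered when every feature lies inside the rule's bounds. */
bool rule_covers(const Rule *rule, const double *record, size_t n_features)
{
    assert(n_features > 0);
    for (size_t j = 0; j < n_features; j++)
        if (record[j] < rule->low[j] || record[j] > rule->up[j])
            return false;
    return true;
}

/* Count TP, FN, FP and TN over a row-major data matrix. */
Confusion count_outcomes(const Rule *rule, const double *data,
                         const int *labels, size_t n_records,
                         size_t n_features)
{
    Confusion c = {0, 0, 0, 0};
    for (size_t r = 0; r < n_records; r++) {
        bool covered = rule_covers(rule, data + r * n_features, n_features);
        bool same_class = labels[r] == rule->predicted_class;
        if (covered && same_class)        c.tp++;
        else if (!covered && same_class)  c.fn++;
        else if (covered && !same_class)  c.fp++;
        else                              c.tn++;
    }
    return c;
}

/* ASSUMPTION: fitness = sensitivity * specificity (not confirmed as Eq. 2.1). */
double rule_fitness(Confusion c)
{
    if (c.tp + c.fn == 0 || c.tn + c.fp == 0)
        return 0.0; /* avoid division by zero for degenerate data */
    double sensitivity = (double)c.tp / (double)(c.tp + c.fn);
    double specificity = (double)c.tn / (double)(c.tn + c.fp);
    return sensitivity * specificity;
}
```

For example, a rule with TP = 2, FN = 1, FP = 1 and TN = 1 would score (2/3) × (1/2) = 1/3 under this assumed measure.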

(3) Exchanged Local Search Strategy
When an employed bee has not met the requirement or has reached the maximum cycle number, it needs to move to a new food source by means of the local search strategy. The original ABC algorithm uses Eq. 1.2 to implement the local search. However, this method consumes an enormous amount of time when the dataset to be classified is large, which makes it unsuitable for classification applications. Since time is a significant factor in classification, alongside accuracy, we propose a new and simpler local search strategy, named "Exchanged", to replace the original one.
$$V_{ij} = X_{kj}$$ Eq. (2.2)

where $V_{ij}$ represents the position of the new food source and $X_{kj}$ stands for a neighbour of the previous food source. The indices $i$ and $k$ lie between 1 and SN, and $k$ must differ from $i$. The index $j$ denotes the dimension; in a classification task, the number of dimensions equals the number of features in the dataset. $k \in \{1, 2, \ldots, SN\}$ and $j \in \{1, 2, \ldots, D\}$ are randomly chosen. For classification, Eq. 2.2 is modified as follows:

$V_{ij}$'s lower bound = $X_{k_1 j}$'s lower bound Eq. (2.3)
$V_{ij}$'s upper bound = $X_{k_2 j}$'s upper bound Eq. (2.4)

where $k_1$ and $k_2$ are two random numbers that are not equal to $i$.
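A minimal C sketch of the Exchanged step for a single attribute is given below. It assumes that the SN rules are stored as two row-major arrays of lower and upper bounds, and that the caller draws the random indices `k1`, `k2` and `j`. The final swap of crossed bounds is an added safeguard that is not part of the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Build a candidate for rule i: copy rule i, then take the lower bound of
   attribute j from rule k1 (Eq. 2.3) and the upper bound from rule k2
   (Eq. 2.4). low and up hold one row of dims values per rule. */
void exchanged_candidate(const double *low, const double *up, size_t dims,
                         size_t i, size_t k1, size_t k2, size_t j,
                         double *cand_low, double *cand_up)
{
    assert(k1 != i && k2 != i && j < dims);
    for (size_t d = 0; d < dims; d++) {
        cand_low[d] = low[i * dims + d];
        cand_up[d]  = up[i * dims + d];
    }
    cand_low[j] = low[k1 * dims + j];
    cand_up[j]  = up[k2 * dims + j];

    /* ASSUMPTION: keep the interval valid if the exchanged bounds cross. */
    if (cand_low[j] > cand_up[j]) {
        double tmp = cand_low[j];
        cand_low[j] = cand_up[j];
        cand_up[j] = tmp;
    }
}
```

Unlike Eq. 1.2, this step needs no arithmetic on the positions, only two copies per candidate, which is consistent with the motivation of reducing search time.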
To evaluate the performance of these two local search methods, we tested them on the UCI datasets. The details of the UCI test datasets and the configuration of the control parameters are explained in Section 5. The comparison between ABC with the original local search strategy and ABC with the proposed "Exchanged" local search strategy is shown in Table 1.
In Table 1, bold type highlights the better result between the original and the "Exchanged" local search strategies. For the Breast Cancer, Zoo, Can and Monk datasets, the new local search achieves the better result in all four aspects: mean, standard deviation, maximum and minimum. The only cases in which the new strategy obtained a lower value were the standard deviation and minimum on the Soybean dataset and the maximum on the Iris dataset.
From the test results in Figure 2, we can conclude that the proposed "Exchanged" local search outperforms the original local search of the ABC algorithm in data classification applications.
(4) Rule Discovery
The aim of classification rule mining is to find a set of rules that can identify a specific class among different groups. The rule discovery phase is therefore the most important phase of a classification algorithm, since the rule sets are its outcome. Rule discovery in the ABC algorithm is shown in the flowchart in Figure 5.
In the initialization stage, we set a lower bound and an upper bound for every attribute. The procedure is defined in Eq. 2.5 and Eq. 2.6, in which $F_{max}$ and $F_{min}$ are the maximum value and the minimum value of the feature, respectively. The difference between them is the range of the feature. $f$ is the original value of the feature, and $k_1$ and $k_2$ are two random values between 0 and 1.
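One plausible reading of this initialization, given the symbols defined above, places the bounds around the original value $f$ at a random fraction of the feature range: lower bound = $f - k_1 (F_{max} - F_{min})$ and upper bound = $f + k_2 (F_{max} - F_{min})$. The C sketch below implements that assumed form; the formulas, the `Interval` type and the function name are assumptions rather than a transcription of Eq. 2.5 and Eq. 2.6.

```c
#include <assert.h>

typedef struct { double low; double up; } Interval;

/* ASSUMED form of Eq. 2.5 and Eq. 2.6:
     low = f - k1 * (f_max - f_min)
     up  = f + k2 * (f_max - f_min)
   k1 and k2 are random values in [0, 1] supplied by the caller. */
Interval init_bounds(double f, double f_min, double f_max,
                     double k1, double k2)
{
    assert(f_max >= f_min);
    assert(k1 >= 0.0 && k1 <= 1.0);
    assert(k2 >= 0.0 && k2 <= 1.0);
    double range = f_max - f_min;
    Interval b = { f - k1 * range, f + k2 * range };
    return b;
}
```

With f = 5, a feature range of [0, 10], k1 = 0.5 and k2 = 0.25, this gives the interval [0, 7.5].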
The classification rule mining algorithm automatically discovers the rules for each class. For the selected class, it finds rules iteratively until the rule set covers all instances that belong to that class. Every single rule follows the rule structure, and a rule set consists of many rules.

(5) Rule Pruning
After all classes have been processed and all rule sets have been generated, every rule is passed to the rule pruning procedure. The main goal of rule pruning is to remove redundant feature limitations that might have been unnecessarily included in the rule sets. Since irrelevant attributes negatively influence the classification result, rule pruning can increase the accuracy. The process of rule pruning is shown in Figure 4. This process is repeated until all rules in the rule set have been evaluated.

(6) Prediction Strategy
The pruned rule set is used to predict new data whose classes are unknown. Sometimes, however, one test record is covered by more than one rule for different classes. When this happens, the prediction strategy determines which class should be predicted. The prediction approach has three main steps:
1) Calculate the prediction value for all rules that cover the test data record;
2) Accumulate these prediction values per possible class;
3) Select the class with the highest accumulated prediction value as the final class.
The core of the prediction strategy is the prediction function, which computes the prediction value for each rule. It is defined in Eq. 2.7:

$$\text{prediction value} = (\alpha \times \text{rule fitness value}) + (\beta \times \text{rule cover percentage})$$ Eq. (2.7)

where $\alpha$ and $\beta$ are two weight parameters associated with the rule fitness value and the rule cover percentage, respectively. Eq. 2.1 calculates the fitness value for each rule. The rule cover percentage is the proportion of records covered by the rule that have the class predicted by the rule (TP). It is calculated by Eq. 2.8:

$$\text{Cover percentage} = \frac{TP}{N}$$ Eq. (2.8)

where N is the total number of records that belong to the class predicted by the rule. The prediction strategy balances the effect of the fitness value and the cover percentage on the final predicted class. The values of $\alpha$ and $\beta$ need to be chosen carefully, since they affect the classification accuracy.
Our approach provides a new mechanism for classification rule mining based on the ABC optimization algorithm. The main procedure of the proposed ABC data mining algorithm is summarized in Figure 3. In Figure 3, each rectangle represents a single stage of the classification data mining process, and each rhombus indicates the data structure or data set generated by the previous stage. In the initialization step, all control parameters are set, such as the colony size, the maximum cycle number and the bound value limitation.
We used K-fold cross-validation to assess the accuracy of our test results. In K-fold cross-validation, the original data set is randomly divided into K subsets, and the training and testing period is executed K times. Each time, a single subset is used as the validation data for testing (testing set), and the remaining (K-1) subsets are retained as training data. Each of the K subsets is employed exactly once as the validation data. 10-fold cross-validation is the most commonly used scheme for testing the accuracy of classification (McLachlan, Kim-Anh, & Christophe, 2004). After the two data sets have been separated, the training set is used during the training period to find the classification rule set. Finally, to evaluate the algorithm, we calculate the accuracy for each validation data set and take the average over the K validations as the final accuracy.
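The K-fold procedure described above can be sketched in C as follows. The fold assignment is one simple way of obtaining a random, balanced partition and is not taken from the authors' code. The function names and the use of `rand()` are illustrative assumptions, and the modulo reduction of `rand()` introduces a slight bias that is acceptable for a sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Randomly divide n records into k subsets: fold_of[r] is the index of the
   subset that record r belongs to. Fold sizes differ by at most one. */
void assign_folds(size_t *fold_of, size_t n, size_t k, unsigned seed)
{
    assert(k >= 2 && k <= n);
    for (size_t i = 0; i < n; i++)
        fold_of[i] = i % k;          /* balanced labels 0..k-1 */
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = fold_of[i];
        fold_of[i] = fold_of[j];
        fold_of[j] = tmp;
    }
}

/* Final accuracy: the average of the k per-fold validation accuracies. */
double mean_accuracy(const double *fold_accuracy, size_t k)
{
    assert(k > 0);
    double sum = 0.0;
    for (size_t f = 0; f < k; f++)
        sum += fold_accuracy[f];
    return sum / (double)k;
}
```

In round f, the records with `fold_of[r] == f` form the testing set and all other records form the training set, so each subset is used exactly once for validation.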

Experimental Design and Result
To measure the performance of the proposed ABC data mining algorithm, we conducted experiments with different datasets selected from the UCI Machine Learning Repository. There are 187 data sets currently maintained on the homepage of the UCI Machine Learning research group ("UC Irvine Machine Learning Repository,"). We tested six popular data sets in the experiment: Breast Cancer, Iris, Zoo, Monk, Can and Soybean. To show the performance of the ABC classification algorithm, we compared it with five other classification algorithms from two data mining software tools, Orange (J. & B., 2005) and Weka (Mark, et al., 2009). We chose KNN, Classification Tree and NB from Orange 1.0, and PART and SOM from Weka 3.6.1. These datasets and algorithms have been widely used in data mining research. The details of the six UCI datasets and the configuration of the computer system used for the experiments are shown in Table 4 and Table 5, respectively.

ABC Settings
In the ABC data mining algorithm, we used the common value of 10 for the number of folds in cross-validation. During the rule discovery period, there are three basic control parameters: the number of food sources (SN), which is equal to the number of employed or onlooker bees; the value of limit; and the maximum cycle number. These are the most important parameters of the algorithm, since they determine both the time needed to find the rule set and the quality of the rule set. Suitable values are therefore needed to balance these two factors. The values of limit are automatically assigned to the maximum and the minimum value of each feature. After several attempts, SN was set to 20 and the maximum cycle number to 50. After the final rule set has been generated, class prediction requires two parameters: the quality weight (α) and the coverage weight (β). Both were set to 0.5 for the experiments in Section 5.2.
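With both weights set to 0.5, the prediction value of Eq. 2.7 and the three steps of the prediction strategy can be sketched in C as below. The `CoveringRule` type and the function names are hypothetical, classes are assumed to be encoded as indices from 0 to `n_classes - 1`, and the cover percentage is taken as the fraction TP / N of Eq. 2.8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one rule that covers the test record. */
typedef struct {
    int predicted_class;  /* class index in [0, n_classes) */
    double fitness;       /* rule fitness value */
    double cover;         /* rule cover percentage, TP / N (Eq. 2.8) */
} CoveringRule;

/* Eq. 2.7: weighted sum of fitness value and cover percentage. */
double prediction_value(const CoveringRule *r, double alpha, double beta)
{
    return alpha * r->fitness + beta * r->cover;
}

/* Steps 1 to 3: score each covering rule, accumulate the scores per class,
   and return the class with the highest total. Returns -1 when no rule
   covers the record. scores is caller-provided scratch of n_classes. */
int predict_class(const CoveringRule *rules, size_t n_rules,
                  size_t n_classes, double alpha, double beta,
                  double *scores)
{
    assert(n_classes > 0);
    if (n_rules == 0)
        return -1;
    for (size_t c = 0; c < n_classes; c++)
        scores[c] = 0.0;
    for (size_t i = 0; i < n_rules; i++) {
        int cls = rules[i].predicted_class;
        assert(cls >= 0 && (size_t)cls < n_classes);
        scores[cls] += prediction_value(&rules[i], alpha, beta);
    }
    size_t best = 0;
    for (size_t c = 1; c < n_classes; c++)
        if (scores[c] > scores[best])
            best = c;
    return (int)best;
}
```

Because the values are accumulated per class, two moderately good rules for one class can outweigh a single stronger rule for another class.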

Experimental Result
We separated the experiments on the UCI datasets into two parts. First, we examined whether the ABC classification algorithm can handle the UCI datasets. The result of this first stage is shown in Table 2.
In Table 2, rows 1 to 10 give the training and testing results of each fold, and the last row shows the average over the 10-fold cross-validation experiment. According to Table 2, the proposed ABC classification algorithm successfully classifies the UCI datasets and gives good results. For each dataset, we ran the test 10 times and analysed the results statistically, calculating the mean, the lowest and the highest accuracy, and the standard deviation. These results are shown in Table 3. We then compared the ABC classification performance, using both the original local search and the proposed Exchanged local search, with that of other popular classification algorithms, namely PART, SOM, Naive Bayes, KNN and Classification Tree, in Table 6. To show the performance of the ABC classification algorithm in Table 6 more clearly, we have reproduced it as bar charts in Figure 6 to Figure 11. From these results, we found that the ABC classification algorithm achieves the best performance on five of the six UCI datasets used in the experiments: Breast Cancer, Iris, Zoo, Monk and Soybean. For the Soybean dataset in particular, the accuracy of the ABC data mining algorithm is higher than that of the other five algorithms by at least 1%. For the one dataset on which it did not achieve the top result, "Can", the proposed algorithm obtains the second-best performance. These results show that the proposed ABC data mining algorithm handles the data classification task successfully and obtains superior results when compared with the algorithms from data mining software tools such as "Orange" and "Weka".

Conclusion
The Artificial Bee Colony (ABC) algorithm is a new search algorithm in the field of Swarm Intelligence. Many approaches, methods and goals have been tried out for DM. Biology-inspired algorithms such as GA and swarm-based approaches such as Ant Colonies have been successfully used in DM. To the best of our knowledge, previous research has never applied the ABC algorithm to DM. In this paper, we propose a novel data mining approach based on the ABC search algorithm for classification tasks. To adapt the algorithm to classification, our contributions cover the following aspects: the rule format, the fitness function, the local search strategy and the prediction strategy. For the local search, we proposed a novel strategy, named Exchanged local search, to replace the original one in (D. Karaboga, 2005); it improves the classification accuracy and obtains superior results on the test datasets. To avoid ambiguous situations, we designed two parameters for the prediction strategy: (1) the quality weight and (2) the coverage weight. These two parameters are used when a test record is covered by more than one rule for different classes. The proposed ABC algorithm was tested on six UCI datasets and compared with five other data mining algorithms selected from the data mining software tools "Orange" and "Weka". The proposed ABC data mining algorithm obtained the superior result on most (five out of six) of these UCI test datasets. Therefore, we conclude that the proposed ABC algorithm for DM is competitive with five traditional data mining algorithms and can be considered a useful and accurate classifier.

References
Bahamish, H. A. A., Abdullah, R., & Salam, R. A. (2009).
Yeh, Chang, & Chung (2009). A new hybrid approach for mining breast cancer pattern using discrete particle swarm optimization and statistical method. Expert Systems with Applications, 36(4), 8204-8211.
From the above explanation, every classification rule is designed as the structure below (C language version):

    struct RuleSet {
        double *lowb;      // the lower bound for all attributes
        double *upb;       // the upper bound for all attributes
        char cName[50];    // Class X
        double prec;       // Cover percentage
        double *fitvalue;  // Fitness value
    };

Table 1. Comparison result for UCI datasets for ABC (Original) and ABC (Exchanged)

Table 2. Comparison result for UCI datasets for ABC (Original) and ABC (Exchanged)

Table 3. The statistic result of UCI dataset for ABC algorithm