Providing a Combination Classification (Honeybee Colony and Decision Tree) Based on Developmental Learning

The aim of this study is to provide a combined classification method based on developmental learning. The proposed method uses a nature-inspired algorithm (honeybee colony) together with decision trees to build a consensus of classifiers. Each classifier is first run once, and its share in the final consensus is then set according to its detection rate on the input data; this is the innovation of this research. The proposed algorithms were implemented in MATLAB. Compared with a simple classifier ensemble, the method yields higher and more stable accuracy. This shows that building the ensemble with the help of the bee colony algorithm is appropriate and effective when the number of records in the data set is high or the number of features under study is high. The proposed algorithm was tested on 8 data sets. Training is slower than for a simple ensemble, but accuracy is higher; in other words, higher accuracy costs more time. In general, if accuracy is of great importance, this method can be a good option for obtaining near-optimal results.


Introduction
Classification specifically means finding a discriminant function that maps an n-dimensional feature space onto the decision regions of the classes. Today, classifiers are used for diagnosis and problem solving. Ensemble learning of classifiers is an effective approach in machine learning that has attracted researchers' attention in recent years. In this type of learning, the results of several classifiers (such as MLP, SVM (Note 1), etc.) are combined in order to improve accuracy. Producing these classifiers is called the first phase, or classifier production. Neural networks are the most common choice for base classifiers. Theoretical and experimental results show that ensemble learning outperforms the individual base classifiers when the base classifiers have acceptable performance and differ in their errors.
Two classifiers are diverse in error when the patterns they classify incorrectly are different. Because the base classifiers err on different cases, they cover each other's errors; diversity in error is therefore a key factor in the success of a combined classifier system. The second phase, the combination of classifiers, follows the first: the outputs of the classifiers produced in the first phase are given as input to one or more other classifiers, which make the final decision on the basis of those outputs.
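The notion of diversity in error can be made concrete with a small sketch. The code below assumes that each classifier is described only by a vector of right/wrong outcomes on the same patterns, and it reports two simple quantities: the fraction of patterns both classifiers get wrong, and the fraction on which exactly one of them is wrong (often called a disagreement measure). The function name and the two outcome vectors are hypothetical and are not taken from this study.

```python
def error_overlap(correct_a, correct_b):
    """Return (both_wrong, disagreement) as fractions of all patterns.

    correct_a, correct_b: sequences of 1/0 flags, one per pattern,
    where 1 means the classifier labelled that pattern correctly.
    """
    n = len(correct_a)
    both_wrong = sum(1 for a, b in zip(correct_a, correct_b) if not a and not b)
    disagree = sum(1 for a, b in zip(correct_a, correct_b) if a != b)
    return both_wrong / n, disagree / n


# Hypothetical outcome vectors for two classifiers on ten patterns.
d1 = [1, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d2 = [1, 0, 1, 1, 1, 1, 0, 1, 1, 1]

both_wrong, disagreement = error_overlap(d1, d2)
print(both_wrong, disagreement)
```

In this toy case the two classifiers never fail on the same pattern, so each one can in principle cover the other's errors; a large share of patterns that both get wrong would leave little for a combiner to gain.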
There are several ways to carry out the first step (classifier production) and the second step (classifier combination). In this article we compare different scenarios for producing and combining classifiers, show their outputs by simulation, and examine how best to combine them. All results use standard data sets; as the tables show, there is a clear and significant increase in accuracy both on the experimental data sets and on the validation data set.
We also study the effect of the base classifiers on the efficiency of the combined classifier.

Classifier Combination
Combining multiple classifiers can itself be treated as a general pattern recognition problem, in which the inputs are the results of the individual classifiers and the output is their combined decision. The idea stems from the fact that classifiers with different characteristics or different methodologies can complement each other and cover each other's weaknesses. If several different classifiers vote together as an ensemble, their overall error can be reduced significantly. Note that this topic is known by different names in different places:
combination of multiple classifiers, classifier fusion, mixture of experts, committee of neural networks, combinational classification systems, classifier ensemble, divide-and-conquer classifiers, decision forest, and so on. In general, every method that benefits from combining classifiers is called a classifier ensemble, while the narrower case in which the separate classifiers in the combination are applied simultaneously and independently is a parallel combination of classifiers.

Hypothesis: Why Does Classifier Combination Work?
To better understand why classifier combination is effective, consider the following theorem.

Condorcet Theorem:
If every voter independently has probability p of being correct, and m is the probability that the majority of voters is correct, then p > 0.5 implies m > p. In short, m converges to 1 for all p > 0.5 as the number of voters goes to infinity [9].
In general, it can be proved that:
• For p > 0.5, m → 1 as L → ∞.
• For p < 0.5, m → 0 as L → ∞.
• For p = 0.5, m → 0.5 as L → ∞.
Here L is the number of voters and m is the probability that the majority vote is correct [1].
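The behaviour described by the theorem can be checked numerically. The sketch below assumes independent voters that all share the same accuracy p, and it computes the probability that a strict majority is right from the binomial distribution; the function name is hypothetical.

```python
from math import comb


def majority_accuracy(p, n_voters):
    """Probability that more than half of n_voters independent voters are right.

    Each voter is assumed to be correct with the same probability p.
    """
    need = n_voters // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(need, n_voters + 1)
    )


# Majority accuracy for p = 0.6 and a growing (odd) number of voters.
for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(0.6, n), 4))
```

With p = 0.6 the value rises toward 1 as the number of voters grows, while the same loop with p = 0.4 falls toward 0, matching the three cases listed above.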

Classifier Ensemble Definition
A classifier ensemble is a set of base classifiers that are used together to solve a pattern recognition problem and whose decisions are combined in order to enhance the efficiency of the whole system. How the decisions are combined depends on the type of output produced by the base classifiers in the ensemble. If they output only class labels, voting procedures are used.

Definition of Multiple Classifier Systems
In general, classifiers can be combined at four levels (Figure 1). The first and highest level is the combination level, at which the outputs of several classifiers are combined by a variety of methods. Here we have several base classifiers whose decisions are combined, for example by majority vote. The next level is the classifier level, at which different base classifiers can be used to build the multiple classifier system. The next level is the feature level, where it is assumed that a number of features are available and a subset of them is chosen for each classifier. The last level is the data level.

Classifiers Output Combination
Which method is used to combine the outputs of different classifiers depends on the type of output that the classifiers in the ensemble produce. The types are reviewed here.
The first type: the output of each classifier $D_i$ for a given pattern is simply right or wrong. For a given data set $Z$, classifier $D_i$ produces an output vector $y_i$, where $y_{ij}$ equals one if $D_i$ classifies pattern $z_j$ correctly and zero otherwise.
The second type: each classifier $D_i$ produces a label $\delta_i$ for a test pattern $X$, so that for each pattern a vector $\delta = [\delta_1, \delta_2, \ldots, \delta_L]$ is obtained, where $L$ is the number of classifiers. Here we have no information about the reliability of the labels, and no alternative labels are offered for a pattern.
The third type: each classifier $D_i$ produces a priority ranking of the labels, from the one it considers most likely to be correct to the one it considers least likely.
The fourth type: each classifier $D_i$ produces a $c$-dimensional vector $[d_{i,1}, \ldots, d_{i,c}]$, where $d_{i,j}$ expresses the support for the hypothesis that "sample $X$ belongs to category $\omega_j$". In most cases $d_{i,j}$ is a number between zero and one.
We now review several methods for combining the outputs of the classifiers.
A) Majority vote: suppose that the classifier outputs are $c$-dimensional binary vectors $[d_{i,1}, \ldots, d_{i,c}]$, where $d_{i,j} = 1$ if $D_i$ places sample $X$ in category $\omega_j$ and $d_{i,j} = 0$ otherwise. The plurality vote selects category $\omega_k$ when

$$\sum_{i=1}^{L} d_{i,k} = \max_{j=1,\ldots,c} \sum_{i=1}^{L} d_{i,j} \tag{1}$$

In the two-class case ($c = 2$), a class $\omega_k$ wins with a simple majority (50 percent of the votes plus one), and the vote is called a majority vote. One weakness of the majority vote is that it does not guarantee a performance boost: with three classifiers of 60% accuracy each, the majority vote can in the worst case reach only 40% accuracy. The best performance obtainable from a combination method is called the pattern of success, and the worst possible performance is called the pattern of failure. Although the pattern of success in this example reaches 90 percent, there is no guarantee that it will occur. See Table 1.
B) Weighted majority vote: if the classifiers in an ensemble do not have the same accuracy, it is reasonable to give the more accurate classifiers a greater impact on the final decision. With $d_{i,j}$ defined as

$$d_{i,j} = \begin{cases} 1, & \text{if } D_i \text{ places } X \text{ in } \omega_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

the discriminant function for class $\omega_j$ is obtained as

$$g_j(X) = \sum_{i=1}^{L} b_i \, d_{i,j} \tag{3}$$

where $b_i$ is the coefficient of classifier $D_i$. For example, assume three classifiers $D_1$, $D_2$ and $D_3$ with accuracies 0.6, 0.6 and 0.7. Combining them by majority vote gives $p_{maj} = 0.696$, but the weighted majority vote with weights $b_1 = b_2 = 0$ and $b_3 = 1$ (which in practice removes the votes of $D_1$ and $D_2$) gives $p_{maj} = p_3 = 0.7$. One way of weighting the classifiers to achieve the maximum accuracy of the weighted majority vote is

$$b_i \propto \log \frac{p_i}{1 - p_i} \tag{4}$$

where $p_i$ is the accuracy of classifier $D_i$, provided that the $L$ classifiers are independent of one another [1].
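The figures in the three-classifier example can be reproduced with a short sketch. It assumes a two-class problem and independent classifiers, enumerates every right/wrong pattern, and counts a pattern as a correct ensemble decision when the total weight of the correct classifiers exceeds that of the wrong ones. The log-odds weighting used in the last call, b_i proportional to log(p_i / (1 - p_i)), is the standard choice for independent classifiers and is assumed here; the function name is hypothetical.

```python
from itertools import product
from math import log


def vote_accuracy(accuracies, weights=None):
    """Accuracy of a (weighted) majority vote on a two-class problem.

    accuracies: individual accuracies p_i of independent classifiers.
    weights:    coefficients b_i; equal weights give the plain majority vote.
    """
    if weights is None:
        weights = [1.0] * len(accuracies)
    total = 0.0
    # Enumerate every combination of right (True) / wrong (False) classifiers.
    for pattern in product([True, False], repeat=len(accuracies)):
        prob = 1.0
        for right, p in zip(pattern, accuracies):
            prob *= p if right else 1 - p
        w_right = sum(w for w, right in zip(weights, pattern) if right)
        w_wrong = sum(w for w, right in zip(weights, pattern) if not right)
        if w_right > w_wrong:  # ties are counted as errors
            total += prob
    return total


accs = [0.6, 0.6, 0.7]
log_odds = [log(p / (1 - p)) for p in accs]

print(round(vote_accuracy(accs), 3))             # plain majority vote: 0.696
print(round(vote_accuracy(accs, [0, 0, 1]), 3))  # only D3 votes: 0.7
print(round(vote_accuracy(accs, log_odds), 3))   # log-odds weights: 0.7
```

For these accuracies the log-odds weight of the third classifier exceeds the sum of the other two, so the weighted vote follows the third classifier and reaches 0.7, slightly above the plain majority vote.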

The Proposed Method
Overall, classifier combination may be done at four levels: the combination (fusion) level, the classifier level, the feature level and the data level.
This study uses the combination level, or fusion, method. The combination level is the level at which the outputs of multiple classifiers are combined by a variety of methods. Here there are several base classifiers whose decisions are combined, for example by majority vote. At this level, every class within each classifier is given a weight, which determines the participation rate of each class in the final classification, that is, the percentage with which each classifier is involved in the consensus.
In the proposed method, the decision tree algorithm first divides the data set among different classifiers, and each classifier is divided into smaller classifiers. The classifiers are then trained, and the training and test results, which show how much each decision tree classifier has learned, are stored. These results, which represent how well each classifier has learned each class, are passed to a defined performance function, and this function is improved using the bee colony algorithm. This yields the performance and the recognition percentage of each class in each classifier, and each classifier takes part in the consensus according to its recognition percentage for that class. The following example makes this clearer.

Examples:
Suppose that we have three fruit-sorting devices, or classifiers (A, B, C), and we want to use them to separate apples from pears. Each classifier then has two states, or classes, giving six states in total. Each class of each classifier is given a random value between zero and one, which represents how well that device recognizes apples or pears; these initial values are shown in Table 2. After the classes have been identified and the initial state, which expresses the accuracy of each classifier, has been chosen, each device (classifier) is given some fruit as examples (the data set), and the resulting outputs, i.e. the opinion of each classifier about each class, are saved.
In this example as shown in Figure 2, half of input data are apples and the other half are pears.
In the formula, the first quantity is the initial value of the classifier for class 2 (apple), the second is the value obtained in the first run of the device for the second class, K is the number of classifiers, and P is the proportion of the data that belongs to the class relative to all data; in this example, half of the data are apples and the other half are pears. The calculation is carried out for all classes.
In this example, according to the data values and the consensus votes of the classifiers, the input is detected as pear (class 1).
Then, because the accuracy of the classifiers is low, we update the initial values and re-run the algorithm. We repeat this several times until the accuracy of the algorithm is close to the desired value. In effect, the algorithm is trained on the training data set, and at the end the values that gave the best accuracy among all runs are chosen. Finally, the test data sets are used to test the algorithm.
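The fruit example can be sketched as a class-wise weighted vote. This is only an illustration under stated assumptions: the exact update formula of the proposed method is not reproduced here, the weights are hypothetical values between zero and one, and the rule "add the weight a classifier holds for the class it voted for, then pick the class with the largest total" is an assumed, simplified consensus rule.

```python
def consensus(votes, class_weights):
    """Class-wise weighted vote (assumed rule, not the paper's exact formula).

    votes:         {classifier name: predicted class}
    class_weights: {classifier name: {class: weight in [0, 1]}}
    Returns the winning class and the accumulated score of every voted class.
    """
    scores = {}
    for clf, label in votes.items():
        scores[label] = scores.get(label, 0.0) + class_weights[clf][label]
    winner = max(scores, key=scores.get)
    return winner, scores


# Hypothetical per-class weights for the three devices A, B and C.
class_weights = {
    "A": {"pear": 0.9, "apple": 0.4},
    "B": {"pear": 0.3, "apple": 0.3},
    "C": {"pear": 0.5, "apple": 0.4},
}
# Opinions of the three devices about one piece of fruit.
votes = {"A": "pear", "B": "apple", "C": "apple"}

label, scores = consensus(votes, class_weights)
print(label, scores)
```

Here two of the three devices vote for apple, but device A is far more reliable on pears than B and C are on apples, so the weighted consensus is pear. A plain majority vote would have chosen apple, which is the difference the class-wise weights are meant to capture.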

Decision Tree Algorithm
The decision tree algorithm analyzes several features with respect to a specific target and derives conditions for prediction, for example in forecasting and targeted sales. Decision trees used to predict categorical variables are called classification trees, because they place samples in categories or classes. Decision trees used to predict continuous variables are called regression trees [2]. Most decision tree learning algorithms are based on a greedy top-down search through the space of possible trees.
The ID3 decision tree uses a statistical quantity called information gain to determine how well a feature separates the training examples according to their classification.

Entropy:
Entropy measures the impurity (disorder, or lack of purity) of a set of examples. If the set S contains positive and negative examples of a target concept, the entropy of S with respect to this Boolean classification is defined as follows.

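$$E(S) = -\left(p_{+} \log_2 p_{+} + p_{-} \log_2 p_{-}\right) \tag{6}$$

where $p_{+}$ and $p_{-}$ are the proportions of positive and negative examples in S.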
Information Gain: The information gain of a feature is the reduction in entropy achieved by splitting the examples on that feature.
In other words, the information gain Gain(S, A) of a feature A relative to a set of examples S is defined as follows:

$$\mathrm{Gain}(S, A) = E(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, E(S_v)$$

where Values(A) is the set of all values of feature A and $S_v$ is the subset of S for which A has value v. In this definition, the first term is the entropy of the data and the second term is the expected entropy after the data have been split [2].
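Entropy and information gain are easy to compute directly, as the following sketch shows. It assumes base-2 logarithms and a toy data set of 14 examples (9 positive, 5 negative) with one two-valued feature; the data, the feature name and the function names are illustrative only and do not come from the data sets used in this study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting the examples on one feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for y, x in zip(labels, feature_values) if x == v]
        remainder += len(subset) / n * entropy(subset)  # weighted entropy of S_v
    return entropy(labels) - remainder


# Toy data: 9 positive and 5 negative examples.
labels = ["+"] * 9 + ["-"] * 5
# Feature values aligned with the labels: weak holds 6+/2-, strong holds 3+/3-.
wind = ["weak"] * 6 + ["strong"] * 3 + ["weak"] * 2 + ["strong"] * 3

print(round(entropy(labels), 3))                 # 0.94
print(round(information_gain(labels, wind), 3))  # 0.048
```

ID3 would evaluate this gain for every candidate feature and split on the one with the largest value.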

Bee Colony Algorithm
In the artificial bee colony (ABC) algorithm, bees fall into three groups: employed (worker) bees, onlooker bees and scout bees. In ABC, the first half of the colony consists of employed bees and the other half of onlooker bees. For each food source there is only one employed bee; in other words, the number of employed bees equals the number of food sources around the hive. An employed bee whose food source has been exhausted becomes a scout.
The main steps of the algorithm are given below; they are repeated until the desired condition is reached:
(A) Place the employed bees on the food sources in memory.
(B) Place the onlooker bees on the food sources in memory.
(C) Send the scout bees to search for new food sources.
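The three steps can be sketched as a minimal bee colony optimizer. This is a generic illustration rather than the implementation used in this study (which was written in MATLAB): it minimizes a non-negative cost function, uses the usual neighbour move on one randomly chosen coordinate, and all parameter names and default values (number of sources, abandonment limit, number of cycles) are assumptions.

```python
import random


def abc_minimize(f, dim, lower, upper, n_sources=10, limit=20, cycles=200, seed=0):
    """Minimal artificial bee colony sketch for a cost function f >= 0."""
    rng = random.Random(seed)

    def new_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    sources = [new_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources  # failed improvement attempts per source
    best_cost = min(costs)
    best = list(sources[costs.index(best_cost)])

    def try_neighbour(i):
        # Move one coordinate of source i relative to another random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1, 1) * (sources[i][d] - sources[k][d])
        cand[d] = min(max(cand[d], lower), upper)
        c = f(cand)
        if c < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        # (A) each employed bee explores around its own food source
        for i in range(n_sources):
            try_neighbour(i)
        # (B) onlookers choose sources with probability proportional to fitness
        fitness = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(n_sources), weights=fitness, k=n_sources):
            try_neighbour(i)
        # remember the best source found so far
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best_cost, best = costs[i_best], list(sources[i_best])
        # (C) scouts replace exhausted sources with new random ones
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = new_source()
                costs[i] = f(sources[i])
                trials[i] = 0
    return best, best_cost


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_cost = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, best_cost)
```

In the setting of this paper the position of a food source would play the role of a vector of classifier weights and the cost would be derived from the ensemble error; the sphere function is used here only to keep the sketch self-contained.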

Evaluation Criteria
Two criteria were used to evaluate the performance of the proposed algorithms: one is the error variance (the error of the error), and the other is statistical analysis. These are accurate criteria and are used in practice as the main criteria of classifier quality [1].

Evaluation of Different Methods
To evaluate the different algorithms and compare them with the proposed method, the algorithms and the proposed method were first run on the data sets of Table 4 (ten runs on each). The accuracy and error percentages of each method in each run were then entered into an Excel file, and the mean and standard deviation of the accuracy and error of each method on every data set were computed.
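The summary step described above amounts to a mean and a standard deviation per method, followed by a ranking. The sketch below assumes the sample standard deviation and uses placeholder accuracies for two hypothetical methods; the numbers are invented for illustration and are not results of this study.

```python
from statistics import mean, stdev


def summarize(run_accuracies):
    """Map each method to (mean, sample standard deviation) over its runs."""
    return {name: (mean(a), stdev(a)) for name, a in run_accuracies.items()}


# Placeholder accuracies (percent) for ten runs of two hypothetical methods.
runs = {
    "method_1": [90.0, 92.0, 88.0, 91.0, 89.0, 90.0, 93.0, 87.0, 90.0, 90.0],
    "method_2": [85.0, 95.0, 80.0, 96.0, 84.0, 90.0, 97.0, 79.0, 88.0, 86.0],
}

summary = summarize(runs)
# Rank methods by decreasing mean accuracy.
ranking = sorted(summary, key=lambda m: summary[m][0], reverse=True)
for name in ranking:
    print(name, summary[name])
```

Reporting the standard deviation next to the mean separates a method that is consistently good from one that only reaches a similar average through a few very good runs.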

Row | Set | Explanation
1 | Breast | Data related to risk factors for breast cancer in American women.
2 | Bupa | Data for the analysis of liver disorders.
3 | Galaxy | Data classified from information on radar returns from the ionosphere.
4 | Glass | Data related to manufactured glass.
5 | ionosphere | Data in eleven information categories on radar signals sent to or received from the ionosphere.
6 | SAHeart | Data on men who have been diagnosed with heart disease.
7 | wine | Data related to recognition of the wine consumed.
8 | Iris | Data for the identification of iris flowers.

The averages obtained are sorted in decreasing order of accuracy to determine which method has the best accuracy on each data set.

Accuracy of Different Methods
For this study, 20 classification algorithms were selected at random, as listed in Table 5. Each was run 10 times on the available data sets and the results were saved. The Weka software was used to run these algorithms. Weka makes a diverse and comprehensive toolbox available through a common interface, so that the user can compare different methods and identify those most appropriate for the problem at hand.

Observation Charts
For better analysis of the results, this section presents several graphs of classifier performance.

Figure 1. Classifier performance mean
Figure 1 shows the differences between the means. The graph clearly shows a significant difference between the means across the data sets. For example, consider the Galaxy data set, displayed in green: the recognition rates range from 20 percent to 80 percent, as can be seen for the KNN classifier.
Figure 2 shows the mean and standard deviation of classifier accuracy (in percent) for each data set. In this chart the red line shows the mean, and the squares above and below the red line show the standard deviation of the results.

Conclusion
Given the growing demand in recent decades for efficient intelligent computing, providing a classification algorithm with better performance is well justified and valuable. Studies have shown that simple problems can be solved by common classification algorithms, but that these algorithms are inefficient for difficult problems of high complexity. Solving a problem can be very difficult even when only our knowledge of the relations between classes is poor. Learning techniques have been proposed in this area to address this growing problem.
In this paper, we compared a variety of up-to-date combined classification methods and concluded that the proposed combined classification method, which uses nature-inspired methods, may be the best method when the number of records or the number of features is large. To continue this work, more criteria for mapping the data sets, and the number of sub-classifiers for each algorithm, could be considered and reviewed.
This research attempted to put forward a general method for classification in error-prone environments. Although the data mining literature reports ways in which numerical methods can be applied to non-numerical data, it is suggested that this method be used on data sets with numerical features.

Figure 1. The different levels of classifier ensemble production

First level: a combiner is applied to produce the final result
Second level: use of different classifiers
Third level: use of various feature subsets
Fourth level: production of various data subsets

Table 1. Summary of the output combinations of the three classifiers for ten patterns
In Table 1, combination a (111) gives the number of patterns that classifiers D 1 , D 2 and D 3 all classify correctly, and combination b (101) gives the number of patterns that classifiers D 1 and D 3 classify correctly and classifier D 2 classifies wrongly.

Figure 2. Working method of algorithms

Figure 2. Mean and standard deviation of the classifier implementation results

Table 2. Applying random weights to the chromosomes

Table 3. Calculating the new weights of the colony

Table 5. Classification algorithms
If the averages of the classifier results from the previous section are put together and sorted, the results, based on the data in Table 6, show a better result.
Attachment (6): Results of a variety of classifier ensemble methods on the different data sets (bold values are the best results for each data set)