Genetic Algorithm for Document Clustering with Simultaneous and Ranked Mutation

K. Premalatha, A.M. Natarajan

Abstract


Clustering is a division of data into groups of similar objects. Each group, called cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups.  The clustering algorithm attempts to find natural groups of components, based on some similarity. Traditional clustering algorithms will search only a small sub-set of all possible clustering and consequently, there is no guarantee that the solution found will be optimal. This paper presents the document clustering based on Genetic algorithm with Simultaneous mutation operator and Ranked mutation rate. The mutation operation is significant to the success of genetic algorithms since it expands the search directions and avoids convergence to local optima. In each stage of the genetic process in a problem, may involve aptly different mutation operators for best results. In simultaneous mutation the genetic algorithm concurrently uses several mutation operators in producing the next generation. The mutation ratio of each operator changes according to assessment from the respective offspring it produces. In ranked scheme, it adapts the mutation rate on the chromosome based on the fitness rank of the earlier population. Experiments results are examined with document corpus. It demonstrates that the proposed algorithm statistically outperforms the Simple GA and K-Means.

Full Text: PDF

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

Modern Applied Science   ISSN 1913-1844 (Print)   ISSN 1913-1852 (Online)

Copyright © Canadian Center of Science and Education

To make sure that you can receive messages from us, please add the 'ccsenet.org' domain to your e-mail 'safe list'. If you do not receive e-mail in your 'inbox', check your 'bulk mail' or 'junk mail' folders.