Representation of textual documents by the approach wordnet and n-grams for the unsupervised classification (clustering) with 2D cellular automata: a comparative study
Abstract
In this article we present a 2D cellular automaton (Class_AC) to solve a problem of text mining in the case of unsupervised classification (clustering). Before to experiment the cellular automaton, we vectorized our data indexing textual documents from the database REUTERS 21,578 by Wordnet approach and the representation of text documents by the method n-grams. Our work is to make a comparative study of two approaches to representation that is the conceptual approach (Wordnet) and the n-grams. Section 1 gives an introduction on the biomimétisme and text mining, Section 2 presents representation of texts based on Wordnet approach and the n grams, Section 3 describes the cellular automaton for clustering, Section 4 shows the experimentation and comparison results and finally Section 5 gives a conclusion and perspectives.
Full Text:
PDFDOI: http://dx.doi.org/10.5539/cis.v3n3p240
Computer and Information Science ISSN 1913-8989 (Print) ISSN 1913-8997 (Online)
Copyright © Canadian Center of Science and Education
To make sure that you can receive messages from us, please add the 'ccsenet.org' domain to your e-mail 'safe list'. If you do not receive e-mail in your 'inbox', check your 'bulk mail' or 'junk mail' folders.
Computer and Information Science


