Representation of textual documents by the approach wordnet and n-grams for the unsupervised classification (clustering) with 2D cellular automata: a comparative study


  •  HAMOU Reda Mohamed    
  •  LEHIRECHE Ahmed    
  •  LOKBANI Ahmed Chaouki    
  •  RAHMANI Mohamed    

Abstract

In this article we present a 2D cellular automaton (Class_AC) that addresses a text mining problem, namely unsupervised classification (clustering). Before running the cellular automaton, we vectorized the data by indexing text documents from the Reuters-21578 corpus in two ways: with a WordNet-based conceptual approach and with an n-gram representation. The aim of this work is a comparative study of these two representations, the conceptual approach (WordNet) and n-grams. Section 1 introduces biomimicry and text mining, Section 2 presents the representation of texts based on the WordNet approach and on n-grams, Section 3 describes the cellular automaton for clustering, Section 4 reports the experiments and comparison results, and Section 5 gives a conclusion and perspectives.


This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly
