Representation of textual documents by the approach wordnet and n-grams for the unsupervised classification (clustering) with 2D cellular automata: a comparative study


  •  HAMOU Reda Mohamed    
  •  LEHIRECHE Ahmed    
  •  LOKBANI Ahmed Chaouki    
  •  RAHMANI Mohamed    

Abstract

In this article we present a 2D cellular automaton (Class_AC) for a text mining problem: unsupervised classification (clustering). Before running the cellular automaton, we vectorized text documents from the Reuters-21578 collection in two ways: by indexing them with the WordNet approach and by representing them with the n-gram method. The aim of this work is a comparative study of these two representations, the conceptual approach (WordNet) and n-grams. Section 1 introduces biomimicry and text mining, Section 2 presents the WordNet-based and n-gram representations of texts, Section 3 describes the cellular automaton for clustering, Section 4 reports the experiments and comparison results, and Section 5 gives conclusions and perspectives.
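As a rough illustration of the n-gram representation mentioned above, the following Python sketch turns documents into count vectors over character n-grams. The abstract does not state whether the n-grams are character- or word-level, which value of n is used, or how counts are weighted, so the character trigrams, the lowercasing, and the names `char_ngrams` and `to_vector` are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams in a text."""
    # Assumption: lowercase and collapse whitespace runs to one space;
    # the abstract does not describe any preprocessing.
    normalized = " ".join(text.lower().split())
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


def to_vector(profile, vocabulary):
    """Map an n-gram profile onto a fixed, ordered vocabulary."""
    return [profile.get(gram, 0) for gram in vocabulary]


# Two toy documents standing in for corpus texts (hypothetical data).
docs = ["oil prices rise", "oil price falls"]
profiles = [char_ngrams(d) for d in docs]

# The shared vocabulary is the sorted union of all observed n-grams.
vocabulary = sorted(set().union(*profiles))
vectors = [to_vector(p, vocabulary) for p in profiles]
```

Each document becomes a vector of the same length, which is the kind of input a clustering method needs; raw counts are used here only for simplicity.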


This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly
