Clustering of Web Search Results Based on Document Segmentation


  •  Mohammad Haggag    
  •  Amal Aboutabl    
  •  Najla Mukhtar    

Abstract

Clustering documents into accurate and compact clusters is becoming increasingly important given the vast amount of information on the web. The problem is further complicated by the multi-topic nature of many documents. In this paper, we address the problem of clustering documents retrieved by a search engine, where each document covers multiple topics. Our approach segments each document into a number of segments and then clusters the segments of all documents using the Lingo algorithm. We evaluate cluster quality, both for clustering full documents directly and for clustering document segments, using the distance-based average intra-cluster similarity measure. Our results show that clustering document segments increases average intra-cluster similarity by approximately 75% compared to clustering the full documents retrieved by the search engine.
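The evaluation measure named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one common reading: the mean pairwise cosine similarity between term-frequency vectors inside each cluster, averaged over all clusters. The function names, the whitespace tokenizer, the treatment of singleton clusters, and the toy texts are all hypothetical. The clusters are written by hand and are not produced by Lingo, and the numbers this toy data yields are not the paper's results.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_cluster_similarity(texts):
    """Mean pairwise cosine similarity of the items in one cluster."""
    vectors = [tf_vector(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # assumption: a singleton cluster counts as fully cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def average_intra_cluster_similarity(clusters):
    """Average of the per-cluster scores over all non-empty clusters."""
    scores = [intra_cluster_similarity(c) for c in clusters if c]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: three documents, each covering two topics.
full_doc_clusters = [[
    "cat habitat forest car engine speed",
    "cat habitat prey stock market prices",
    "car engine price stock market index",
]]

# The same text split into single-topic segments and grouped by topic by hand.
segment_clusters = [
    ["cat habitat forest", "cat habitat prey"],
    ["car engine speed", "car engine price"],
    ["stock market prices", "stock market index"],
]

full_score = average_intra_cluster_similarity(full_doc_clusters)
segment_score = average_intra_cluster_similarity(segment_clusters)
print(f"full documents: {full_score:.3f}")
print(f"segments:       {segment_score:.3f}")
```

On this constructed data the segment clusters score higher, because each segment shares most of its terms with the other members of its cluster, while each full document shares only one of its two topics with any other document.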



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly

