Clustering of Web Search Results Based on Document Segmentation

DOI: 10.5539/cis.v6n3p89

Mohammad Haggag; Amal Aboutabl; Najla Mukhtar

Abstract

Clustering documents into accurate and compact clusters is becoming increasingly important given the vast amount of information on the web. The problem is further complicated by the multi-topic nature of many documents. In this paper, we address the problem of clustering documents retrieved by a search engine, where each document covers multiple topics. Our approach segments each document and then clusters the segments of all documents using the Lingo algorithm. Using the distance-based average intra-cluster similarity measure, we evaluate the quality of the clusters obtained by clustering full documents directly and by clustering document segments. Our results show that clustering document segments increases average intra-cluster similarity by approximately 75% compared with clustering the full documents retrieved by the search engine.
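
As a rough illustration of the evaluation idea described in the abstract, the following Python sketch computes an average intra-cluster similarity score. It is not the authors' implementation: the abstract does not define the measure precisely, so the sketch assumes it is the mean pairwise cosine similarity between term-frequency vectors within each cluster, averaged over clusters. The blank-line segmentation rule and all function names (`split_into_segments`, `tf_vector`, `cosine`, `intra_cluster_similarity`, `average_intra_cluster_similarity`) are hypothetical, and the Lingo clustering step itself is not reproduced here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def split_into_segments(document):
    """Split a document into segments at blank lines (an assumed rule)."""
    parts = re.split(r"\n\s*\n", document)
    return [p.strip() for p in parts if p.strip()]


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_cluster_similarity(texts):
    """Mean pairwise cosine similarity among the members of one cluster."""
    vectors = [tf_vector(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        # Assumption: a cluster with fewer than two members scores 1.0.
        return 1.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def average_intra_cluster_similarity(clusters):
    """Average the per-cluster score over all clusters."""
    if not clusters:
        return 0.0
    return sum(intra_cluster_similarity(c) for c in clusters) / len(clusters)
```

Under these assumptions, the comparison in the abstract would amount to calling `average_intra_cluster_similarity` once on clusters of full documents and once on clusters of segments produced by `split_into_segments`, with the clusters themselves coming from a separate clustering step.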