Clustering of Web Search Results Based on Document Segmentation
- Mohammad Haggag
- Amal Aboutabl
- Najla Mukhtar
Abstract
The process of clustering documents in a manner which produces accurate and compact clusters becomes increasingly significant mainly with the vast size of information on the web. This problem becomes even more complicated with the multi-topics nature of documents these days. In this paper, we deal with the problem of clustering documents retrieved by a search engine, where each document deals with multiple topics. Our approach is based on segmenting each document into a number of segments and then clustering segments of all documents using the Lingo algorithm. We evaluate the quality of clusters obtained by clustering full documents directly and by clustering document segments using the distance-based average intra-cluster similarity measure. Our results illustrate that average intra-cluster similarity is increased by approximately 75% as a result of clustering document segments as compared to clustering full documents retrieved by the search engine.
- Full Text: PDF
- DOI:10.5539/cis.v6n3p89
Journal Metrics
WJCI (2022): 0.636
Impact Factor 2022 (by WJCI): 0.419
h-index (January 2024): 43
i10-index (January 2024): 193
h5-index (January 2024): N/A
h5-median(January 2024): N/A
( The data was calculated based on Google Scholar Citations. Click Here to Learn More. )
Index
- Academic Journals Database
- BASE (Bielefeld Academic Search Engine)
- CiteFactor
- CNKI Scholar
- COPAC
- CrossRef
- DBLP (2008-2019)
- EBSCOhost
- EuroPub Database
- Excellence in Research for Australia (ERA)
- Genamics JournalSeek
- Google Scholar
- Harvard Library
- Infotrieve
- LOCKSS
- Mendeley
- PKP Open Archives Harvester
- Publons
- ResearchGate
- Scilit
- SHERPA/RoMEO
- Standard Periodical Directory
- The Index of Information Systems Journals
- The Keepers Registry
- UCR Library
- Universe Digital Library
- WJCI Report
- WorldCat
Contact
- Chris LeeEditorial Assistant
- cis@ccsenet.org