Multi-level Frontier based Topic-specific Crawler Design with Improved URL Ordering

Akilandeswari Jeyapal, Gopalan Palanisamy

Abstract


The rapid growth of the World Wide Web has driven the development of retrieval tools such as search engines. Topic-specific crawlers are best suited to users looking for results on a particular subject. This paper presents a novel design for a topic-specific Web crawler based on a multi-agent system. The proposed architecture employs two types of agents: retrieval agents and coordinator agents. The coordinator agent is responsible for distributing URLs from the crawling frontiers to individual retrieval agents. The URL frontier is modeled as multi-level queues to implement tunneling (following links through off-topic pages to reach relevant ones), and it is populated with URLs by a rule-based engine. The coordinator agent assigns URLs to retrieval agents dynamically to avoid downloading non-productive and duplicate Web pages. The empirical results show the advantage of multi-level frontier queues in terms of harvest ratio, time, and the download of highly relevant Web pages.
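The abstract describes the frontier and the coordinator only at a high level. The Python sketch below shows one plausible way the pieces could fit together: a rule engine assigns each candidate link to a frontier level, the frontier suppresses duplicates and always serves its highest-priority non-empty queue, off-topic links are kept in the lowest queue so the crawler can still tunnel through them, and a coordinator hands URLs to retrieval agents. The keyword rules, the three-level layout, round-robin assignment and every class and function name here are hypothetical illustrations, not the authors' design.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Set


class RuleEngine:
    """Hypothetical keyword rules mapping a candidate link to a frontier level.

    Level 0 is the most promising queue; the last level holds off-topic links,
    which are kept rather than discarded so the crawler can tunnel through them.
    """

    def __init__(self, topic_terms: Iterable[str], num_levels: int = 3) -> None:
        self.topic_terms = {t.lower() for t in topic_terms}
        self.num_levels = num_levels

    def level_for(self, url: str, anchor_text: str) -> int:
        text = f"{url} {anchor_text}".lower()
        hits = sum(1 for term in self.topic_terms if term in text)
        if hits >= 2:
            level = 0
        elif hits == 1:
            level = 1
        else:
            level = self.num_levels - 1  # off-topic: lowest level (tunneling)
        return min(level, self.num_levels - 1)  # clamp for small num_levels


class MultiLevelFrontier:
    """One FIFO queue per level plus a seen-set for duplicate suppression."""

    def __init__(self, num_levels: int) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_levels)]
        self.seen: Set[str] = set()

    def push(self, url: str, level: int) -> bool:
        if url in self.seen:  # never enqueue the same URL twice
            return False
        self.seen.add(url)
        self.queues[level].append(url)
        return True

    def pop(self) -> Optional[str]:
        for queue in self.queues:  # serve the highest-priority non-empty level
            if queue:
                return queue.popleft()
        return None


class Coordinator:
    """Hands frontier URLs to retrieval agents in round-robin order (assumed policy)."""

    def __init__(self, frontier: MultiLevelFrontier, agent_ids: Iterable[str]) -> None:
        self.frontier = frontier
        self.agent_ids = list(agent_ids)
        self._turn = 0

    def dispatch(self, batch_size: int) -> Dict[str, List[str]]:
        assignments: Dict[str, List[str]] = {a: [] for a in self.agent_ids}
        for _ in range(batch_size):
            url = self.frontier.pop()
            if url is None:  # frontier exhausted
                break
            agent = self.agent_ids[self._turn % len(self.agent_ids)]
            assignments[agent].append(url)
            self._turn += 1
        return assignments


def harvest_ratio(relevant_pages: int, downloaded_pages: int) -> float:
    """Fraction of downloaded pages judged relevant to the topic."""
    return relevant_pages / downloaded_pages if downloaded_pages else 0.0


if __name__ == "__main__":
    rules = RuleEngine(["crawler", "search"])
    frontier = MultiLevelFrontier(rules.num_levels)
    for url, anchor in [
        ("http://a.example/news", "sports"),
        ("http://b.example/crawler", "focused search"),
        ("http://c.example/search", "home"),
        ("http://b.example/crawler", "duplicate link"),
    ]:
        frontier.push(url, rules.level_for(url, anchor))
    print(Coordinator(frontier, ["agent-1", "agent-2"]).dispatch(batch_size=3))
```

Running the demo gives the two-keyword URL to agent-1, the one-keyword URL to agent-2 and the off-topic URL back to agent-1, while the duplicate link is never enqueued. A real coordinator would more likely weigh agent load or host politeness than rotate blindly; round-robin is used here only to keep the sketch short.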


This work is licensed under a Creative Commons Attribution 3.0 License.

Computer and Information Science, ISSN 1913-8989 (Print), ISSN 1913-8997 (Online)
Copyright © Canadian Center of Science and Education
