

Research Colloquium

Luciano Barbosa
School of Computing
University of Utah

Friday, October 24, 2008
3147 MEB
Refreshments 3:20 p.m.
Lecture 3:40 p.m.



Title: An Adaptive Crawler for Locating Hidden-Web Entry Points

Abstract
In this paper we describe new adaptive crawling strategies to efficiently locate the entry points to hidden-Web sources. The fact that hidden-Web sources are very sparsely distributed makes the problem of locating them especially challenging. We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit. We propose a new framework whereby crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, thus greatly reducing the amount of required manual setup and tuning. Our experiments over real Web pages in a representative set of domains indicate that online learning leads to significant gains in harvest rates: the adaptive crawlers retrieve up to three times as many forms as crawlers that use a fixed focus strategy.
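The general idea in the abstract can be illustrated with a toy sketch. This is not the algorithm from the paper; it is a minimal example under stated assumptions: a hypothetical in-memory "web" (TOY_WEB), a single topic keyword standing in for a page classifier, and a hypothetical LinkLearner that scores anchor text by a smoothed success rate per token. Links with unseen anchor words keep a neutral score, so links without immediate benefit are still followed.

```python
from collections import defaultdict

# Hypothetical toy web: url -> (page text, has a searchable form, [(target url, anchor text)])
TOY_WEB = {
    "seed": ("used cars portal", False,
             [("a", "search cars"), ("b", "sports news"), ("c", "dealer directory")]),
    "a": ("cars search page", True, [("d", "search trucks")]),
    "b": ("football scores", False, [("e", "more news")]),
    "c": ("cars dealer list", False, [("f", "search inventory")]),
    "d": ("trucks and cars search", True, []),
    "e": ("weather news", False, []),
    "f": ("cars inventory search", True, []),
}


class LinkLearner:
    """Hypothetical online learner: tracks how often each anchor token led to a form."""

    def __init__(self):
        self.hits = defaultdict(int)
        self.seen = defaultdict(int)

    def score(self, anchor):
        # Laplace-smoothed success rate per token; unseen tokens score 0.5.
        rates = [(self.hits[t] + 1) / (self.seen[t] + 2) for t in anchor.split()]
        return sum(rates) / len(rates) if rates else 0.5

    def update(self, anchor, found_form):
        for token in anchor.split():
            self.seen[token] += 1
            self.hits[token] += int(found_form)


def crawl(web, seed, topic_word, budget):
    """Visit up to `budget` pages, always following the currently best-scored link."""
    learner = LinkLearner()
    frontier = [(seed, "")]          # (url, anchor text of the link that led here)
    visited, order, forms = set(), [], []
    while frontier and len(visited) < budget:
        # Re-score the whole frontier each step so that learning changes priorities.
        best = max(frontier, key=lambda item: learner.score(item[1]))
        frontier.remove(best)
        url, anchor = best
        if url in visited or url not in web:
            continue
        visited.add(url)
        order.append(url)
        text, has_form, links = web[url]
        if anchor:
            learner.update(anchor, has_form)   # online feedback from this visit
        if has_form:
            forms.append(url)
        if topic_word not in text.split():
            continue                           # off-topic page: do not expand its links
        frontier.extend((t, a) for t, a in links if t not in visited)
    return forms, order


if __name__ == "__main__":
    found, visit_order = crawl(TOY_WEB, "seed", "cars", budget=6)
    print("forms found:", found)
    print("visit order:", visit_order)
```

In this toy run, a link whose anchor contains "search" is promoted once an earlier "search" link has led to a form, while the off-topic page is visited but never expanded. A fixed-strategy crawler would correspond to a score function that never changes.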

Joint work with Juliana Freire, presented at WWW 2007 (portal.acm.org).



School of Computing • 50 S. Central Campus Dr. Rm. 3190 • Salt Lake City, UT 84112