NAME

Harvest::Controller::URLFilter::RobotsTxt - robots.txt file parsing

DESCRIPTION

The robots.txt file is an accepted method for HTTP server administrators to advertise limits on robot activity on their sites. This module provides a means of filtering URLs so that a gatherer does not violate these limits.

The filter currently applies only to HTTP URLs (there is no accepted location for robots.txt files under other URL schemes); objects retrieved via any other scheme are automatically allowed.
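
The following is a minimal sketch of the decision just described, not the module's own code; the url_allowed helper and the $rules variable (assumed to be a WWW::RobotRules object, see ACKNOWLEDGMENTS) are hypothetical:

    use URI;

    # Allow any non-HTTP URL automatically; check HTTP URLs against
    # the previously parsed robots.txt rules.
    sub url_allowed {
        my ($rules, $url) = @_;
        my $scheme = URI->new($url)->scheme || '';
        return 1 unless $scheme eq 'http';   # only HTTP has robots.txt limits
        return $rules->allowed($url);        # defer to the parsed rules
    }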

PARAMETERS

The filter takes no parameters.

ACKNOWLEDGMENTS

This class simply provides an interface to the WWW::RobotRules package of libwww-perl, which does all of the file parsing.
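
For illustration, the sketch below shows WWW::RobotRules being used directly. The agent name, URLs, and the use of LWP::UserAgent to fetch robots.txt are assumptions for the example; the filter itself may obtain the file differently.

    use LWP::UserAgent;
    use WWW::RobotRules;

    # Placeholder agent name and URLs, not Harvest's actual values.
    my $rules = WWW::RobotRules->new('ExampleGatherer/1.0');
    my $ua    = LWP::UserAgent->new;

    my $robots_url = 'http://www.example.com/robots.txt';
    my $response   = $ua->get($robots_url);
    $rules->parse($robots_url, $response->content) if $response->is_success;

    # Ask whether a given URL may be fetched under the parsed rules.
    print $rules->allowed('http://www.example.com/some/page.html')
        ? "allowed\n"
        : "disallowed\n";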