Harvest::Controller::URLFilter::RobotsTxt - robots.txt file parsing

DESCRIPTION
The robots.txt file is an accepted method for HTTP server administrators to advertise limits on robot activity on their site. This module provides a means of filtering URLs so that a gatherer does not break those limits. It currently works only for HTTP URLs (there is no accepted location for robots.txt files under other URL schemes), and it automatically allows objects from any other scheme.
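As an illustration only, the sketch below shows how this kind of robots.txt-based URL filtering can be done in Perl using the WWW::RobotRules module from libwww-perl. WWW::RobotRules, the agent name, and the url_allowed helper are assumptions made for the example, not a description of this module's internals; non-HTTP URLs are passed through unfiltered, as described above.

    use WWW::RobotRules;
    use LWP::Simple qw(get);
    use URI;

    # Hypothetical agent name; a real gatherer would supply its own.
    my $rules = WWW::RobotRules->new('example-gatherer/1.0');

    sub url_allowed {
        my ($url) = @_;
        my $uri = URI->new($url);

        # Only HTTP URLs are filtered; any other scheme is allowed.
        return 1 unless $uri->scheme && $uri->scheme eq 'http';

        # Fetch and parse the site's robots.txt (a real filter would
        # cache the parsed rules per host rather than refetch each time).
        my $robots_url = 'http://' . $uri->host_port . '/robots.txt';
        my $robots_txt = get($robots_url);
        $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

        return $rules->allowed($url);
    }

    print url_allowed('http://example.org/some/page.html') ? "fetch\n" : "skip\n";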

PARAMETERS
The filter takes no parameters.

ACKNOWLEDGMENTS
This class simply provides an interface to the