Harvest::Controller::Workload - a class for maintaining a robot's workload

DESCRIPTION
URLs are stored in server ``buckets'' according to the server the URL comes from. At present little or no error checking is performed on the URLs. It is therefore possible to mark as ``done'' a URL that was never being fetched.
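
The bucket idea can be illustrated with a minimal sketch. It is not the module's actual code: the helper bucket_key, the host:port key format, and the example URLs are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Workload): derive a bucket key
# of the form "host:port" from a URL string.
sub bucket_key {
    my ($url) = @_;
    # Capture scheme, host, and optional port; return nothing if the URL does not match.
    my ($scheme, $host, $port) = $url =~ m{^(\w+)://([^/:?#]+)(?::(\d+))?}
        or return;
    # Assumed defaults: 443 for https, 80 for everything else.
    $port = lc($scheme) eq 'https' ? 443 : 80 unless defined $port;
    return lc($host) . ":$port";
}

# One array of URLs per server bucket.
my %buckets;
for my $url (qw(
    http://www.example.com/a.html
    http://www.example.com/b.html
    http://ftp.example.org:8080/pub/
)) {
    my $key = bucket_key($url) or next;   # skip URLs we cannot parse
    push @{ $buckets{$key} }, $url;
}

printf "%s => %d URL(s)\n", $_, scalar @{ $buckets{$_} } for sort keys %buckets;
```

Keeping one queue per server is what lets a scheduler apply a delay to each server separately.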

METHODS

$workload = new Workload($delay)
Constructor for the Workload. Constructs a new workload scheduler which
will wait for a minimum of

$workload->add($url)
Add a URL to the workload.

$workload->next
Return the next URL to be fetched, and mark that URL as being ``pending''.
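
How a per-server delay and the ``pending'' list might interact can be sketched as follows. Everything here is an assumption for illustration: the data layout, the name next_url, the reading of the delay as seconds between fetches from one server, and the explicit clock argument are not taken from the module.

```perl
use strict;
use warnings;

my $delay = 5;   # assumed: minimum seconds between fetches from one server

# Hypothetical layout: bucket key => queued URLs plus the earliest next fetch time.
my %buckets = (
    'a.example.com' => { urls => ['http://a.example.com/1', 'http://a.example.com/2'], ready_at => 0 },
    'b.example.com' => { urls => ['http://b.example.com/1'], ready_at => 0 },
);
my %pending;     # URL => bucket key, for URLs handed out but not yet done

# Return the next eligible URL at time $now, or undef if none is eligible.
sub next_url {
    my ($now) = @_;
    for my $key (sort keys %buckets) {
        my $bucket = $buckets{$key};
        next unless @{ $bucket->{urls} };      # nothing queued for this server
        next if $bucket->{ready_at} > $now;    # server is still inside its delay window
        my $url = shift @{ $bucket->{urls} };
        $bucket->{ready_at} = $now + $delay;   # start a new delay window for this server
        $pending{$url} = $key;                 # mark the URL as pending
        return $url;
    }
    return;
}

my $first  = next_url(100);   # taken from a.example.com
my $second = next_url(100);   # a.example.com is delayed, so b.example.com is used
my $third  = next_url(100);   # undef: every non-empty bucket is delayed
my $fourth = next_url(105);   # a.example.com is eligible again

print join(", ", map { defined $_ ? $_ : 'nothing eligible' } $first, $second, $third, $fourth), "\n";
```

Passing the clock in as an argument keeps the sketch deterministic; a real scheduler would more likely call time() itself.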

$workload->down($obj)
Mark the server carrying the URL as down. This removes the URL from the ``pending'' list, puts it back in the bucket for that server, and reduces the priority of the bucket until the next done call for a URL on that server.
FIXME: This makes no sense at all. We need to remove the ``down'' object and replace it with something different. Saying that a server is down should just increase that server's delay to something huge, so we don't look at it for a while (probably use exponential backoff here).
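
The exponential backoff suggested in the FIXME could be computed along these lines. This is only a sketch: the helper backoff_delay, its parameters, and the one-hour cap are hypothetical and not part of Workload.

```perl
use strict;
use warnings;

# Hypothetical helper: delay to apply after $failures consecutive failures.
# The base delay doubles with each failure and is capped at $max.
sub backoff_delay {
    my ($base, $failures, $max) = @_;
    my $delay = $base * 2 ** $failures;
    return $delay > $max ? $max : $delay;
}

# Base delay of 5 seconds, capped at one hour (both values are assumptions).
printf "%d failure(s): wait %d s\n", $_, backoff_delay(5, $_, 3600) for 0 .. 4;
```

Resetting the failure count on the next successful done call for that server would restore the normal delay.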

    sub down {
        my ($self, $obj) = @_;
        # $key identifies the server bucket for $obj; how it is derived is not shown here.
        $self->add($obj);
        $self->{$key}->status(DOWN);
        $self->_remove_from_pending($obj);
    }

$workload->block($obj)
Block the server. This will mark all URLs on the given server as being blocked.
FIXME: BLOCK IS CURRENTLY NOT IMPLEMENTED

$workload->unblock($obj)
Unblock the server. This will remove the blocked status of the server bucket.
FIXME: UNBLOCK IS CURRENTLY NOT IMPLEMENTED

$workload->done($obj)
Mark the URL as being completed. This removes the URL from the pending queue. Note that no precautions are taken to prevent the URL from being fetched again.
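
Since completed URLs are not remembered, a caller that wants to avoid refetching could keep its own record of URLs it has already queued. The sketch below shows one way to do that; add_once, %seen, and the @queued array standing in for $workload->add($url) are hypothetical names, not part of the module.

```perl
use strict;
use warnings;

my %seen;     # URLs that have already been handed to the workload
my @queued;   # stand-in for calls to $workload->add($url)

# Queue a URL only the first time it is seen; return 1 if queued, 0 if skipped.
sub add_once {
    my ($url) = @_;
    return 0 if $seen{$url}++;   # already queued (or fetched) earlier
    push @queued, $url;
    return 1;
}

add_once($_) for qw(
    http://www.example.com/
    http://www.example.com/a
    http://www.example.com/
);
print scalar(@queued), " URL(s) queued\n";
```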

$workload->clear
Clear the entire workload.