Harvest::Controller - interface to the spider controller

DESCRIPTION
The `Controller` is the external interface to the spider's scheduler and filters.
METHODS

$overseer=new Controller($database,$delay,$noims);
Create a new Controller. There should be only one instance of the controller class per spider. The controller does nothing by itself until rootnodes are entered using the `add` method. `$database` is a Harvest::Database object, the database in which all gathered data should be stored. `$delay` sets the number of seconds to wait between accesses to the same server. For internet gathering this should be at least 60 (1 minute), and probably around 300 (5 minutes). `$noims`, if set, disables If-Modified-Since based incremental gathering.
$overseer->add($rootnode);
Add a new rootnode to the list being fetched by the spider. `$rootnode` should be an object of type Harvest::Controller::RootNode.
$root->more
Returns TRUE if there are more objects left to fetch.
$obj=$root->next
Returns the next Harvest::Object to fetch. If no more objects are available at the current time, it instead returns a time in seconds to sleep until more objects should become available.
$root->done($obj)
Should be called with the results of running a Harvest::Reaper fetch operation on the object `$obj`. This method runs the appropriate post-summarising filters, extracts any URL references contained in the object, filters them and adds them to the workload, and finally adds the object to the database passed to the constructor of this instance of the Controller.
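The `more`, `next` and `done` methods together suggest a simple driving loop, sketched below. This is an illustration under stated assumptions, not the spider's actual main loop. `MockController` and `MockObject` are hypothetical stand-ins that mimic only the documented contract, so the example runs without the Harvest modules installed. `run_spider` and its `$fetch` and `$wait` callbacks are likewise invented for the sketch, with `$fetch` standing in for the Harvest::Reaper fetch operation. The sketch also assumes that the sleep time returned by `next` is a plain, unblessed number, and that the receiver written as `$root` in the signatures above is the controller instance.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for the controller: it only mimics the
# more/next/done contract so the loop can run without Harvest installed.
{
    package MockController;

    sub new {
        my ($class, @queue) = @_;
        return bless { queue => [@queue], stored => [] }, $class;
    }

    # TRUE while there is anything left to hand out.
    sub more { my ($self) = @_; return @{ $self->{queue} } ? 1 : 0 }

    # Either an object to fetch, or a plain number of seconds to wait.
    sub next { my ($self) = @_; return shift @{ $self->{queue} } }

    # The real method also runs filters and extracts URLs; the mock just stores.
    sub done { my ($self, $obj) = @_; push @{ $self->{stored} }, $obj; return }
}

# Drive any controller-like object until it reports no more work.
# $fetch and $wait are caller-supplied callbacks (assumptions of this sketch).
sub run_spider {
    my ($controller, $fetch, $wait) = @_;
    $wait ||= sub { sleep $_[0] };
    while ($controller->more) {
        my $item = $controller->next;
        if (blessed $item) {
            $fetch->($item);            # stands in for the Harvest::Reaper fetch
            $controller->done($item);   # hand the fetched object back
        }
        else {
            $wait->($item);             # not an object: seconds to sleep
        }
    }
    return;
}

my $controller = MockController->new(
    bless({ url => 'http://a.example/' }, 'MockObject'),
    2,    # "nothing ready yet, come back in 2 seconds"
    bless({ url => 'http://b.example/' }, 'MockObject'),
);

run_spider(
    $controller,
    sub { print "fetching $_[0]{url}\n" },
    sub { print "would sleep $_[0]s\n" },   # report instead of really sleeping
);
```

Omitting the third argument makes `run_spider` fall back to Perl's built-in `sleep`, which is what a real run would want. The callback is there only so the wait can be observed or skipped in a demonstration.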
Please mail any comments or questions to sxw@users.sourceforge.net
