Harvest::Controller::RootNode - store all of the information about a RootNode

DESCRIPTION

The RootNode class is used to encapsulate all of the information relevant to a RootNode of the spider.

A RootNode is a starting point for the spider. A single spider can have more than one starting point, and each starting point can have different filters associated with it, to allow fine-grained control of the gathering process.

A RootNode has a name (a unique string), a starting URL, a set of URL filters and a set of DATA filters. No filters are automatically added by this class.

METHODS

$root=new RootNode ($config);

Create a new rootnode. $config should be a Harvest::Config object containing at least the following data:

Url: a starting URL for the node.

TemporaryDir: a temporary directory to use whilst gathering.

Name: a unique name for the rootnode.
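The construction step can be sketched as follows. This is a minimal, self-contained illustration and not the Harvest source: Sketch::Config and Sketch::RootNode are hypothetical stand-ins, and the get() method used to read configuration values is an assumption, since the Harvest::Config interface is not described here. The sketch uses the arrow form Class->new($config), which is equivalent to the indirect form new Class ($config) shown above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Harvest::Config: a thin wrapper around a hash.
# The real class's interface is not documented here; get() is an assumption.
package Sketch::Config;

sub new {
    my ($class, %data) = @_;
    return bless { %data }, $class;
}

sub get {
    my ($self, $key) = @_;
    return $self->{$key};
}

# Hypothetical stand-in for Harvest::Controller::RootNode.
package Sketch::RootNode;

# The three configuration entries the documentation lists as required.
my @REQUIRED = qw(Url TemporaryDir Name);

sub new {
    my ($class, $config) = @_;
    my %self;
    for my $key (@REQUIRED) {
        my $value = $config->get($key);
        die "RootNode config is missing '$key'\n" unless defined $value;
        $self{$key} = $value;
    }
    return bless \%self, $class;
}

sub name { return $_[0]->{Name} }
sub url  { return $_[0]->{Url} }

package main;

# Example values only; none of these come from a real Harvest setup.
my $config = Sketch::Config->new(
    Url          => 'http://www.example.org/',
    TemporaryDir => '/tmp/harvest-example',
    Name         => 'example-root',
);

my $root = Sketch::RootNode->new($config);
printf "%s starts at %s\n", $root->name, $root->url;
```

Rejecting an incomplete configuration in the constructor is a design choice of this sketch; the documentation only states which entries must be present, not how the real class reacts when one is missing.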

$root->registerfilters($prefilter,$postfilter)

Register the filter chains used for pre-filtering and post-filtering respectively. Both arguments are Harvest::Controller::FilterManager objects (or objects of a subclass of that class).

$root->urlfilter

Return the URL filter object associated with this rootnode.

$root->postfilter

Return the postfilter object for this rootnode.
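How registration and the two filter accessors might fit together is sketched below. The classes are hypothetical stand-ins, not the Harvest implementation, and the sketch assumes that the first argument to registerfilters (the prefilter) is the object later returned by urlfilter; the documentation does not state that mapping explicitly. The label() method exists only to make the example observable.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Harvest::Controller::FilterManager.
package Sketch::FilterManager;

sub new {
    my ($class, $label) = @_;
    return bless { label => $label }, $class;
}

sub label { return $_[0]->{label} }

# Hypothetical stand-in for the filter-related part of a RootNode.
package Sketch::RootNode;

use Scalar::Util qw(blessed);

sub new { return bless {}, shift }

# Store both chains. ASSUMPTION: the prefilter is the URL filter.
sub registerfilters {
    my ($self, $prefilter, $postfilter) = @_;
    for my $filter ($prefilter, $postfilter) {
        # Accept the stand-in FilterManager or any subclass of it.
        die "filters must be FilterManager objects\n"
            unless blessed($filter) && $filter->isa('Sketch::FilterManager');
    }
    @{$self}{qw(urlfilter postfilter)} = ($prefilter, $postfilter);
    return $self;
}

sub urlfilter  { return $_[0]->{urlfilter} }
sub postfilter { return $_[0]->{postfilter} }

package main;

my $root = Sketch::RootNode->new;
$root->registerfilters(
    Sketch::FilterManager->new('url filters'),
    Sketch::FilterManager->new('data filters'),
);

print $root->urlfilter->label,  "\n";
print $root->postfilter->label, "\n";
```

Because the class registers no filters of its own, both accessors in this sketch return undef until registerfilters has been called.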

$url=$root->url;

Return the list of URLs that form the starting point of this rootnode.

$name=$root->name;

Return the short, unique name assigned to this rootnode.